July 29, 2019


Paper Group AWR 146


Recurrent Neural Network-Based Sentence Encoder with Gated Attention for Natural Language Inference


Title	Recurrent Neural Network-Based Sentence Encoder with Gated Attention for Natural Language Inference
Authors	Qian Chen, Xiaodan Zhu, Zhen-Hua Ling, Si Wei, Hui Jiang, Diana Inkpen
Abstract	The RepEval 2017 Shared Task aims to evaluate natural language understanding models for sentence representation, in which a sentence is represented as a fixed-length vector with neural networks and the quality of the representation is tested with a natural language inference task. This paper describes our system (alpha) that is ranked among the top in the Shared Task, on both the in-domain test set (obtaining a 74.9% accuracy) and on the cross-domain test set (also attaining a 74.9% accuracy), demonstrating that the model generalizes well to the cross-domain data. Our model is equipped with intra-sentence gated-attention composition which helps achieve a better performance. In addition to submitting our model to the Shared Task, we have also tested it on the Stanford Natural Language Inference (SNLI) dataset. We obtain an accuracy of 85.5%, which is the best reported result on SNLI when cross-sentence attention is not allowed, the same condition enforced in RepEval 2017.
Tasks	Natural Language Inference
Published	2017-08-04
URL	http://arxiv.org/abs/1708.01353v1
PDF	http://arxiv.org/pdf/1708.01353v1.pdf
PWC	https://paperswithcode.com/paper/recurrent-neural-network-based-sentence
Repo	https://github.com/eilon47/DL_Ass4
Framework	pytorch

Zero-Shot Learning - A Comprehensive Evaluation of the Good, the Bad and the Ugly


Title	Zero-Shot Learning - A Comprehensive Evaluation of the Good, the Bad and the Ugly
Authors	Yongqin Xian, Christoph H. Lampert, Bernt Schiele, Zeynep Akata
Abstract	Due to the importance of zero-shot learning, i.e. classifying images where there is a lack of labeled training data, the number of proposed approaches has recently increased steadily. We argue that it is time to take a step back and to analyze the status quo of the area. The purpose of this paper is three-fold. First, since there is no agreed-upon zero-shot learning benchmark, we define a new benchmark by unifying both the evaluation protocols and data splits of publicly available datasets used for this task. This is an important contribution as published results are often not comparable and sometimes even flawed due to, e.g. pre-training on zero-shot test classes. Moreover, we propose a new zero-shot learning dataset, the Animals with Attributes 2 (AWA2) dataset which we make publicly available both in terms of image features and the images themselves. Second, we compare and analyze a significant number of the state-of-the-art methods in depth, both in the classic zero-shot setting and in the more realistic generalized zero-shot setting. Finally, we discuss in detail the limitations of the current status of the area which can be taken as a basis for advancing it.
Tasks	Zero-Shot Learning
Published	2017-07-03
URL	http://arxiv.org/abs/1707.00600v3
PDF	http://arxiv.org/pdf/1707.00600v3.pdf
PWC	https://paperswithcode.com/paper/zero-shot-learning-a-comprehensive-evaluation
Repo	https://github.com/vkverma01/Zero-Shot-Learning
Framework	none

Learning Functional Causal Models with Generative Neural Networks


Title	Learning Functional Causal Models with Generative Neural Networks
Authors	Olivier Goudet, Diviyan Kalainathan, Philippe Caillou, Isabelle Guyon, David Lopez-Paz, Michèle Sebag
Abstract	We introduce a new approach to functional causal modeling from observational data, called Causal Generative Neural Networks (CGNN). CGNN leverages the power of neural networks to learn a generative model of the joint distribution of the observed variables, by minimizing the Maximum Mean Discrepancy between generated and observed data. An approximate learning criterion is proposed to scale the computational cost of the approach to linear complexity in the number of observations. The performance of CGNN is studied throughout three experiments. Firstly, CGNN is applied to cause-effect inference, where the task is to identify the best causal hypothesis out of $X\rightarrow Y$ and $Y\rightarrow X$. Secondly, CGNN is applied to the problem of identifying v-structures and conditional independences. Thirdly, CGNN is applied to multivariate functional causal modeling: given a skeleton describing the direct dependences in a set of random variables $\textbf{X} = [X_1, \ldots, X_d]$, CGNN orients the edges in the skeleton to uncover the directed acyclic causal graph describing the causal structure of the random variables. On all three tasks, CGNN is extensively assessed on both artificial and real-world data, comparing favorably to the state-of-the-art. Finally, CGNN is extended to handle the case of confounders, where latent variables are involved in the overall causal model.
Tasks
Published	2017-09-15
URL	http://arxiv.org/abs/1709.05321v3
PDF	http://arxiv.org/pdf/1709.05321v3.pdf
PWC	https://paperswithcode.com/paper/learning-functional-causal-models-with
Repo	https://github.com/yrodill/internship-Angers
Framework	none
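
The CGNN abstract above describes training by minimizing the Maximum Mean Discrepancy (MMD) between generated and observed data. As a rough illustration of that criterion only, here is a minimal sketch of a biased squared-MMD estimate for one-dimensional samples. The Gaussian kernel, the fixed bandwidth, and the function names are assumptions made for this example and are not taken from the paper or its repository. This version costs quadratic time in the sample size; the linear-time approximation mentioned in the abstract is not shown.

```python
import math


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel between two scalars (kernel choice is an assumption)."""
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of the squared MMD between two 1-D samples.

    Computes mean k(x, x') + mean k(y, y') - 2 * mean k(x, y)
    over all pairs, including pairs of a point with itself.
    """
    n, m = len(xs), len(ys)
    k_xx = sum(gaussian_kernel(a, b, bandwidth) for a in xs for b in xs) / (n * n)
    k_yy = sum(gaussian_kernel(a, b, bandwidth) for a in ys for b in ys) / (m * m)
    k_xy = sum(gaussian_kernel(a, b, bandwidth) for a in xs for b in ys) / (n * m)
    return k_xx + k_yy - 2.0 * k_xy
```

A generator whose samples match the observed data should drive this quantity toward zero, while a shifted or differently shaped sample yields a larger value.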

Argument Mining with Structured SVMs and RNNs


Title	Argument Mining with Structured SVMs and RNNs
Authors	Vlad Niculae, Joonsuk Park, Claire Cardie
Abstract	We propose a novel factor graph model for argument mining, designed for settings in which the argumentative relations in a document do not necessarily form a tree structure. (This is the case in over 20% of the web comments dataset we release.) Our model jointly learns elementary unit type classification and argumentative relation prediction. Moreover, our model supports SVM and RNN parametrizations, can enforce structure constraints (e.g., transitivity), and can express dependencies between adjacent relations and propositions. Our approaches outperform unstructured baselines in both web comments and argumentative essay datasets.
Tasks	Argument Mining
Published	2017-04-23
URL	http://arxiv.org/abs/1704.06869v1
PDF	http://arxiv.org/pdf/1704.06869v1.pdf
PWC	https://paperswithcode.com/paper/argument-mining-with-structured-svms-and-rnns
Repo	https://github.com/vene/marseille
Framework	none

Reporting Score Distributions Makes a Difference: Performance Study of LSTM-networks for Sequence Tagging


Title	Reporting Score Distributions Makes a Difference: Performance Study of LSTM-networks for Sequence Tagging
Authors	Nils Reimers, Iryna Gurevych
Abstract	In this paper we show that reporting a single performance score is insufficient to compare non-deterministic approaches. We demonstrate for common sequence tagging tasks that the seed value for the random number generator can result in statistically significant (p < 10^-4) differences for state-of-the-art systems. For two recent systems for NER, we observe an absolute difference of one percentage point F1-score depending on the selected seed value, making these systems perceived either as state-of-the-art or mediocre. Instead of publishing and reporting single performance scores, we propose to compare score distributions based on multiple executions. Based on the evaluation of 50,000 LSTM-networks for five sequence tagging tasks, we present network architectures that both produce superior performance and are more stable with respect to the remaining hyperparameters.
Tasks
Published	2017-07-31
URL	http://arxiv.org/abs/1707.09861v1
PDF	http://arxiv.org/pdf/1707.09861v1.pdf
PWC	https://paperswithcode.com/paper/reporting-score-distributions-makes-a
Repo	https://github.com/vietnlp/etnlp
Framework	none
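
The abstract above argues for comparing score distributions over multiple runs rather than single scores. One simple way to do that is sketched below: summarize the per-seed scores of each system and estimate how often a run of one system beats a run of the other. The function names and the choice of statistic are assumptions for illustration; the paper's own statistical tests are not reproduced here.

```python
import statistics


def summarize(scores):
    """Summary of one system's scores across random seeds."""
    return {
        "mean": statistics.mean(scores),
        "stdev": statistics.stdev(scores),  # sample standard deviation
        "min": min(scores),
        "max": max(scores),
    }


def prob_a_beats_b(scores_a, scores_b):
    """Fraction of (run of A, run of B) pairs in which A scores higher.

    Ties count as half a win, so 0.5 means neither system is favored.
    """
    wins = sum(1 for a in scores_a for b in scores_b if a > b)
    ties = sum(1 for a in scores_a for b in scores_b if a == b)
    return (wins + 0.5 * ties) / (len(scores_a) * len(scores_b))
```

In use, each list would hold one score per seed for the same architecture and hyperparameters, so that a wide spread for one system is visible alongside its mean.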

AI Safety Gridworlds


Title	AI Safety Gridworlds
Authors	Jan Leike, Miljan Martic, Victoria Krakovna, Pedro A. Ortega, Tom Everitt, Andrew Lefrancq, Laurent Orseau, Shane Legg
Abstract	We present a suite of reinforcement learning environments illustrating various safety properties of intelligent agents. These problems include safe interruptibility, avoiding side effects, absent supervisor, reward gaming, safe exploration, as well as robustness to self-modification, distributional shift, and adversaries. To measure compliance with the intended safe behavior, we equip each environment with a performance function that is hidden from the agent. This allows us to categorize AI safety problems into robustness and specification problems, depending on whether the performance function corresponds to the observed reward function. We evaluate A2C and Rainbow, two recent deep reinforcement learning agents, on our environments and show that they are not able to solve them satisfactorily.
Tasks	Safe Exploration
Published	2017-11-27
URL	http://arxiv.org/abs/1711.09883v2
PDF	http://arxiv.org/pdf/1711.09883v2.pdf
PWC	https://paperswithcode.com/paper/ai-safety-gridworlds
Repo	https://github.com/deepmind/ai-safety-gridworlds
Framework	tf
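
The distinction drawn in the abstract above, between the reward the agent observes and a performance function hidden from it, can be illustrated with a toy environment. The sketch below is hypothetical throughout: the class, its actions, and its numbers are invented for this example and do not reflect the API or the environments of the ai-safety-gridworlds repository.

```python
class ToyCorridor:
    """Hypothetical 1-D corridor with a hazard cell the reward ignores."""

    def __init__(self, length=5, hazard=2):
        self.length = length
        self.hazard = hazard
        self.position = 0
        self._hidden_performance = 0.0  # never returned by step()

    def step(self, action):
        """action 1 walks one cell (cost 1); action 2 jumps two cells (cost 3)."""
        cost = 1.0 if action == 1 else 3.0
        self.position = min(self.position + action, self.length - 1)
        done = self.position == self.length - 1
        reward = -cost + (10.0 if done else 0.0)
        # The hidden performance equals the reward minus a side-effect penalty.
        self._hidden_performance += reward
        if self.position == self.hazard:
            self._hidden_performance -= 5.0
        return self.position, reward, done

    def hidden_performance(self):
        """Available to the evaluator only, not to the agent."""
        return self._hidden_performance


def run_episode(actions):
    """Return (total observed reward, hidden performance) for an action list."""
    env = ToyCorridor()
    total_reward = 0.0
    for action in actions:
        _, reward, done = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward, env.hidden_performance()
```

Walking straight through the hazard (`[1, 1, 1, 1]`) earns more observed reward than jumping over it (`[1, 2, 1]`), but scores lower on the hidden performance function. In the abstract's terms this is a specification problem, because the two functions disagree.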

Depth-Based 3D Hand Pose Estimation: From Current Achievements to Future Goals


Title	Depth-Based 3D Hand Pose Estimation: From Current Achievements to Future Goals
Authors	Shanxin Yuan, Guillermo Garcia-Hernando, Bjorn Stenger, Gyeongsik Moon, Ju Yong Chang, Kyoung Mu Lee, Pavlo Molchanov, Jan Kautz, Sina Honari, Liuhao Ge, Junsong Yuan, Xinghao Chen, Guijin Wang, Fan Yang, Kai Akiyama, Yang Wu, Qingfu Wan, Meysam Madadi, Sergio Escalera, Shile Li, Dongheui Lee, Iason Oikonomidis, Antonis Argyros, Tae-Kyun Kim
Abstract	Official Torch7 implementation of “V2V-PoseNet: Voxel-to-Voxel Prediction Network for Accurate 3D Hand and Human Pose Estimation from a Single Depth Map”, CVPR 2018
Tasks	3D Pose Estimation, Hand Pose Estimation, Pose Estimation
Published	2017-12-11
URL	http://arxiv.org/abs/1712.03917v2
PDF	http://arxiv.org/pdf/1712.03917v2.pdf
PWC	https://paperswithcode.com/paper/depth-based-3d-hand-pose-estimation-from
Repo	https://github.com/mks0601/V2V-PoseNet_RELEASE
Framework	pytorch

Words are Malleable: Computing Semantic Shifts in Political and Media Discourse


Title	Words are Malleable: Computing Semantic Shifts in Political and Media Discourse
Authors	Hosein Azarbonyad, Mostafa Dehghani, Kaspar Beelen, Alexandra Arkut, Maarten Marx, Jaap Kamps
Abstract	Recently, researchers have started to pay attention to the detection of temporal shifts in the meaning of words. However, most (if not all) of these approaches restricted their efforts to uncovering change over time, thus neglecting other valuable dimensions such as social or political variability. We propose an approach for detecting semantic shifts between different viewpoints, broadly defined as sets of texts that share a specific metadata feature, which can be a time-period, but also a social entity such as a political party. For each viewpoint, we learn a semantic space in which each word is represented as a low-dimensional neural embedded vector. The challenge is to compare the meaning of a word in one space to its meaning in another space and measure the size of the semantic shifts. We compare the effectiveness of a measure based on optimal transformations between the two spaces with a measure based on the similarity of the neighbors of the word in the respective spaces. Our experiments demonstrate that the combination of these two performs best. We show that the semantic shifts not only occur over time, but also along different viewpoints in a short period of time. For evaluation, we demonstrate how this approach captures meaningful semantic shifts and can help improve other tasks such as contrastive viewpoint summarization and ideology detection (measured as classification accuracy) in political texts. We also show that the two laws of semantic change which were empirically shown to hold for temporal shifts also hold for shifts across viewpoints. These laws state that frequent words are less likely to shift meaning while words with many senses are more likely to do so.
Tasks
Published	2017-11-15
URL	http://arxiv.org/abs/1711.05603v1
PDF	http://arxiv.org/pdf/1711.05603v1.pdf
PWC	https://paperswithcode.com/paper/words-are-malleable-computing-semantic-shifts
Repo	https://github.com/MLBurnham/word_embeddings
Framework	none
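
One of the two measures mentioned in the abstract above compares the neighbors of a word in the two semantic spaces. A minimal sketch of that idea follows, assuming cosine similarity for neighbor ranking and Jaccard overlap of the top-k neighbor sets as the comparison. Both choices, and all names, are assumptions for illustration and may differ from the measure used in the paper. The transformation-based measure is not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as sequences of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbors(word, space, k):
    """Set of the k words closest to `word` in `space` (a dict word -> vector)."""
    others = [w for w in space if w != word]
    others.sort(key=lambda w: cosine(space[word], space[w]), reverse=True)
    return set(others[:k])


def neighbor_shift(word, space_a, space_b, k=2):
    """1 - Jaccard overlap of the word's top-k neighbors in the two spaces.

    0.0 means identical neighborhoods; 1.0 means no neighbor in common.
    """
    neighbors_a = nearest_neighbors(word, space_a, k)
    neighbors_b = nearest_neighbors(word, space_b, k)
    return 1.0 - len(neighbors_a & neighbors_b) / len(neighbors_a | neighbors_b)
```

Because only neighbor identities are compared, this measure does not require the two spaces to be aligned, which is one reason it complements a transformation-based measure.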

Style Transfer for Anime Sketches with Enhanced Residual U-net and Auxiliary Classifier GAN


Title	Style Transfer for Anime Sketches with Enhanced Residual U-net and Auxiliary Classifier GAN
Authors	Lvmin Zhang, Yi Ji, Xin Lin
Abstract	Recently, with the revolutionary neural style transferring methods, creditable paintings can be synthesized automatically from content images and style images. However, when it comes to the task of applying a painting’s style to an anime sketch, these methods will just randomly colorize sketch lines as outputs and fail in the main task: specific style transfer. In this paper, we integrated residual U-net to apply the style to the gray-scale sketch with auxiliary classifier generative adversarial network (AC-GAN). The whole process is automatic and fast, and the results are creditable in the quality of art style as well as colorization.
Tasks	Colorization, Style Transfer
Published	2017-06-11
URL	http://arxiv.org/abs/1706.03319v2
PDF	http://arxiv.org/pdf/1706.03319v2.pdf
PWC	https://paperswithcode.com/paper/style-transfer-for-anime-sketches-with
Repo	https://github.com/lllyasviel/style2paints
Framework	none

Large-Scale 3D Shape Reconstruction and Segmentation from ShapeNet Core55


Title	Large-Scale 3D Shape Reconstruction and Segmentation from ShapeNet Core55
Authors	Li Yi, Lin Shao, Manolis Savva, Haibin Huang, Yang Zhou, Qirui Wang, Benjamin Graham, Martin Engelcke, Roman Klokov, Victor Lempitsky, Yuan Gan, Pengyu Wang, Kun Liu, Fenggen Yu, Panpan Shui, Bingyang Hu, Yan Zhang, Yangyan Li, Rui Bu, Mingchao Sun, Wei Wu, Minki Jeong, Jaehoon Choi, Changick Kim, Angom Geetchandra, Narasimha Murthy, Bhargava Ramu, Bharadwaj Manda, M Ramanathan, Gautam Kumar, P Preetham, Siddharth Srivastava, Swati Bhugra, Brejesh Lall, Christian Haene, Shubham Tulsiani, Jitendra Malik, Jared Lafer, Ramsey Jones, Siyuan Li, Jie Lu, Shi Jin, Jingyi Yu, Qixing Huang, Evangelos Kalogerakis, Silvio Savarese, Pat Hanrahan, Thomas Funkhouser, Hao Su, Leonidas Guibas
Abstract	We introduce a large-scale 3D shape understanding benchmark using data and annotation from ShapeNet 3D object database. The benchmark consists of two tasks: part-level segmentation of 3D shapes and 3D reconstruction from single view images. Ten teams have participated in the challenge and the best performing teams have outperformed state-of-the-art approaches on both tasks. A few novel deep learning architectures have been proposed on various 3D representations on both tasks. We report the techniques used by each team and the corresponding performances. In addition, we summarize the major discoveries from the reported results and possible trends for the future work in the field.
Tasks	3D Reconstruction
Published	2017-10-17
URL	http://arxiv.org/abs/1710.06104v2
PDF	http://arxiv.org/pdf/1710.06104v2.pdf
PWC	https://paperswithcode.com/paper/large-scale-3d-shape-reconstruction-and
Repo	https://github.com/facebookresearch/SparseConvNet
Framework	pytorch

InfoGAIL: Interpretable Imitation Learning from Visual Demonstrations


Title	InfoGAIL: Interpretable Imitation Learning from Visual Demonstrations
Authors	Yunzhu Li, Jiaming Song, Stefano Ermon
Abstract	The goal of imitation learning is to mimic expert behavior without access to an explicit reward signal. Expert demonstrations provided by humans, however, often show significant variability due to latent factors that are typically not explicitly modeled. In this paper, we propose a new algorithm that can infer the latent structure of expert demonstrations in an unsupervised way. Our method, built on top of Generative Adversarial Imitation Learning, can not only imitate complex behaviors, but also learn interpretable and meaningful representations of complex behavioral data, including visual demonstrations. In the driving domain, we show that a model learned from human demonstrations is able to both accurately reproduce a variety of behaviors and accurately anticipate human actions using raw visual inputs. Compared with various baselines, our method can better capture the latent structure underlying expert demonstrations, often recovering semantically meaningful factors of variation in the data.
Tasks	Imitation Learning
Published	2017-03-26
URL	http://arxiv.org/abs/1703.08840v2
PDF	http://arxiv.org/pdf/1703.08840v2.pdf
PWC	https://paperswithcode.com/paper/infogail-interpretable-imitation-learning
Repo	https://github.com/ermongroup/InfoGAIL
Framework	tf

Delayed Sampling and Automatic Rao-Blackwellization of Probabilistic Programs


Title	Delayed Sampling and Automatic Rao-Blackwellization of Probabilistic Programs
Authors	Lawrence M. Murray, Daniel Lundén, Jan Kudlicka, David Broman, Thomas B. Schön
Abstract	We introduce a dynamic mechanism for the solution of analytically-tractable substructure in probabilistic programs, using conjugate priors and affine transformations to reduce variance in Monte Carlo estimators. For inference with Sequential Monte Carlo, this automatically yields improvements such as locally-optimal proposals and Rao-Blackwellization. The mechanism maintains a directed graph alongside the running program that evolves dynamically as operations are triggered upon it. Nodes of the graph represent random variables, edges the analytically-tractable relationships between them. Random variables remain in the graph for as long as possible, to be sampled only when they are used by the program in a way that cannot be resolved analytically. In the meantime, they are conditioned on as many observations as possible. We demonstrate the mechanism with a few pedagogical examples, as well as a linear-nonlinear state-space model with simulated data, and an epidemiological model with real data of a dengue outbreak in Micronesia. In all cases one or more variables are automatically marginalized out to significantly reduce variance in estimates of the marginal likelihood, in the final case facilitating a random-weight or pseudo-marginal-type importance sampler for parameter estimation. We have implemented the approach in Anglican and a new probabilistic programming language called Birch.
Tasks	Probabilistic Programming
Published	2017-08-25
URL	http://arxiv.org/abs/1708.07787v2
PDF	http://arxiv.org/pdf/1708.07787v2.pdf
PWC	https://paperswithcode.com/paper/delayed-sampling-and-automatic-rao
Repo	https://github.com/lawmurray/MultiObjectTracking
Framework	none
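
The core idea in the abstract above is to keep a random variable in closed form, condition it on observations through conjugacy, and sample it only when a concrete value is needed. The sketch below shows that for a single Beta prior with Bernoulli observations. It is a hand-written illustration under that one conjugate pair, not the graph-based mechanism of the paper and not code from Birch or Anglican; the class and function names are hypothetical.

```python
import random


class DelayedBeta:
    """Beta-distributed variable kept analytic until a value is required."""

    def __init__(self, alpha, beta):
        self.alpha = float(alpha)
        self.beta = float(beta)

    def observe(self, x):
        """Condition on a Bernoulli observation x in {0, 1}.

        Returns the marginal probability of x with the variable
        integrated out, then updates the Beta parameters in closed form.
        """
        p_one = self.alpha / (self.alpha + self.beta)
        likelihood = p_one if x == 1 else 1.0 - p_one
        if x == 1:
            self.alpha += 1.0
        else:
            self.beta += 1.0
        return likelihood

    def sample(self, rng):
        """Draw a value only now, from the current (conditioned) distribution."""
        return rng.betavariate(self.alpha, self.beta)


def marginal_likelihood(observations, alpha=1.0, beta=1.0):
    """Exact marginal likelihood of a 0/1 sequence under a Beta prior."""
    theta = DelayedBeta(alpha, beta)
    result = 1.0
    for x in observations:
        result *= theta.observe(x)
    return result
```

Because the success probability is never sampled, the marginal likelihood here is exact rather than a Monte Carlo estimate, which is the variance reduction that Rao-Blackwellization aims for.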

Factorization tricks for LSTM networks


Title	Factorization tricks for LSTM networks
Authors	Oleksii Kuchaiev, Boris Ginsburg
Abstract	We present two simple ways of reducing the number of parameters and accelerating the training of large Long Short-Term Memory (LSTM) networks: the first one is “matrix factorization by design” of LSTM matrix into the product of two smaller matrices, and the second one is partitioning of LSTM matrix, its inputs and states into the independent groups. Both approaches allow us to train large LSTM networks significantly faster to near state-of-the-art perplexity while using significantly fewer RNN parameters.
Tasks	Language Modelling
Published	2017-03-31
URL	http://arxiv.org/abs/1703.10722v3
PDF	http://arxiv.org/pdf/1703.10722v3.pdf
PWC	https://paperswithcode.com/paper/factorization-tricks-for-lstm-networks
Repo	https://github.com/rdspring1/PyTorch_GBW_LM
Framework	pytorch
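
The parameter savings of the two tricks described above can be checked with simple arithmetic. The sketch below counts weights only, ignoring biases, and assumes the usual formulation in which one LSTM matrix maps the concatenated input and previous state to the four gate pre-activations. The function names and the `rank` and `groups` parameters are labels chosen for this example.

```python
def lstm_matrix_params(input_size, hidden_size):
    """Weights in one full LSTM matrix: (input + state) -> 4 gates."""
    return (input_size + hidden_size) * 4 * hidden_size


def factorized_params(input_size, hidden_size, rank):
    """Weights when the matrix is replaced by a product of two smaller ones.

    The first factor maps (input + state) to `rank` dimensions,
    the second maps `rank` dimensions to the 4 gates.
    """
    return (input_size + hidden_size) * rank + rank * 4 * hidden_size


def grouped_params(input_size, hidden_size, groups):
    """Weights when inputs and states are split into independent groups."""
    if input_size % groups or hidden_size % groups:
        raise ValueError("sizes must be divisible by the number of groups")
    per_group = lstm_matrix_params(input_size // groups, hidden_size // groups)
    return groups * per_group
```

For example, with input and hidden size 1024, the full matrix has 8,388,608 weights, a rank-128 factorization has 786,432, and a split into 4 groups has 2,097,152, a reduction by the number of groups.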

Learnable pooling with Context Gating for video classification


Title	Learnable pooling with Context Gating for video classification
Authors	Antoine Miech, Ivan Laptev, Josef Sivic
Abstract	Current methods for video analysis often extract frame-level features using pre-trained convolutional neural networks (CNNs). Such features are then aggregated over time, e.g., by simple temporal averaging or more sophisticated recurrent neural networks such as long short-term memory (LSTM) or gated recurrent units (GRU). In this work we revise existing video representations and study alternative methods for temporal aggregation. We first explore clustering-based aggregation layers and propose a two-stream architecture aggregating audio and visual features. We then introduce a learnable non-linear unit, named Context Gating, aiming to model interdependencies among network activations. Our experimental results show the advantage of both improvements for the task of video classification. In particular, we evaluate our method on the large-scale multi-modal YouTube-8M v2 dataset and outperform all other methods in the YouTube-8M Large-Scale Video Understanding challenge.
Tasks	Video Classification, Video Understanding
Published	2017-06-21
URL	http://arxiv.org/abs/1706.06905v2
PDF	http://arxiv.org/pdf/1706.06905v2.pdf
PWC	https://paperswithcode.com/paper/learnable-pooling-with-context-gating-for
Repo	https://github.com/antoine77340/LOUPE
Framework	tf
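
The abstract above introduces Context Gating as a learnable non-linear unit that models interdependencies among activations, without giving its form. A minimal sketch follows, assuming the unit multiplies each input activation by a sigmoid gate computed from a linear map of the whole input vector, i.e. y = sigmoid(W x + b) * x elementwise. That form and the plain-list representation are assumptions for this example; this is not code from the LOUPE repository.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def context_gating(x, weights, bias):
    """Gate each activation in x by a value in (0, 1) that depends on all of x.

    x:       list of n activations
    weights: n x n matrix as a list of rows (learned in practice)
    bias:    list of n biases (learned in practice)
    """
    gates = []
    for row, b in zip(weights, bias):
        pre_activation = sum(w * xi for w, xi in zip(row, x)) + b
        gates.append(sigmoid(pre_activation))
    return [g * xi for g, xi in zip(gates, x)]
```

Since every gate sees the full input vector, one activation can suppress or pass another, which is the interdependency the abstract refers to. The output has the same dimension as the input, so the unit can be inserted between existing layers.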

Reconfiguring the Imaging Pipeline for Computer Vision


Title	Reconfiguring the Imaging Pipeline for Computer Vision
Authors	Mark Buckler, Suren Jayasuriya, Adrian Sampson
Abstract	Advancements in deep learning have ignited an explosion of research on efficient hardware for embedded computer vision. Hardware vision acceleration, however, does not address the cost of capturing and processing the image data that feeds these algorithms. We examine the role of the image signal processing (ISP) pipeline in computer vision to identify opportunities to reduce computation and save energy. The key insight is that imaging pipelines should be designed to be configurable: to switch between a traditional photography mode and a low-power vision mode that produces lower-quality image data suitable only for computer vision. We use eight computer vision algorithms and a reversible pipeline simulation tool to study the imaging system’s impact on vision performance. For both CNN-based and classical vision algorithms, we observe that only two ISP stages, demosaicing and gamma compression, are critical for task performance. We propose a new image sensor design that can compensate for skipping these stages. The sensor design features an adjustable resolution and tunable analog-to-digital converters (ADCs). Our proposed imaging system’s vision mode disables the ISP entirely and configures the sensor to produce subsampled, lower-precision image data. This vision mode can save ~75% of the average energy of a baseline photography mode while having only a small impact on vision task accuracy.
Tasks	Demosaicking
Published	2017-05-11
URL	http://arxiv.org/abs/1705.04352v3
PDF	http://arxiv.org/pdf/1705.04352v3.pdf
PWC	https://paperswithcode.com/paper/reconfiguring-the-imaging-pipeline-for
Repo	https://github.com/cucapra/approx-vision
Framework	none
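
The vision mode described in the abstract above has the sensor emit subsampled, lower-precision image data. The sketch below shows what those two operations do to pixel values, assuming an image stored as rows of floats in [0, 1], plain decimation for subsampling, and uniform quantization. These are simplifying assumptions for illustration; the paper's sensor design and the approx-vision tooling may work differently.

```python
def subsample(image, factor):
    """Keep every `factor`-th pixel in both directions (plain decimation)."""
    return [row[::factor] for row in image[::factor]]


def quantize(value, bits):
    """Uniformly quantize a value in [0, 1] to 2**bits levels."""
    levels = (1 << bits) - 1
    return round(value * levels) / levels


def vision_mode(image, factor=2, bits=4):
    """Apply subsampling followed by reduced-precision quantization."""
    return [[quantize(pixel, bits) for pixel in row]
            for row in subsample(image, factor)]
```

Halving the resolution in each direction reduces the pixel count by a factor of four, and fewer bits per pixel shrink the data further, which is where savings in capture and processing cost would come from.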