May 7, 2019

Paper Group AWR 79

Actor-critic versus direct policy search: a comparison based on sample complexity

Title Actor-critic versus direct policy search: a comparison based on sample complexity
Authors Arnaud de Froissard de Broissia, Olivier Sigaud
Abstract Sample efficiency is a critical property when optimizing policy parameters for the controller of a robot. In this paper, we evaluate two state-of-the-art policy optimization algorithms. One is a recent deep reinforcement learning method based on an actor-critic algorithm, Deep Deterministic Policy Gradient (DDPG), that has been shown to perform well on various control benchmarks. The other one is a direct policy search method, Covariance Matrix Adaptation Evolution Strategy (CMA-ES), a black-box optimization method that is widely used for robot learning. The algorithms are evaluated on a continuous version of the mountain car benchmark problem, so as to compare their sample complexity. From a preliminary analysis, we expect DDPG to be more sample efficient than CMA-ES, which is confirmed by our experimental results.
Tasks
Published 2016-06-29
URL http://arxiv.org/abs/1606.09152v2
PDF http://arxiv.org/pdf/1606.09152v2.pdf
PWC https://paperswithcode.com/paper/actor-critic-versus-direct-policy-search-a
Repo https://github.com/MOCR/DDPG
Framework tf
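
The sample-complexity accounting is easy to make concrete. Below is a minimal Python sketch, not the linked repo's code, of a simplified evolution strategy in the spirit of CMA-ES (real CMA-ES also adapts a full covariance matrix): every candidate evaluation costs one episode of interaction, which is exactly the quantity the paper compares against DDPG. The `rollout_return` objective is a hypothetical stand-in for an episode return on the continuous mountain car task.

```python
# Simplified evolution strategy, CMA-ES flavored, with an explicit
# sample-complexity counter. Not the paper's implementation.
import numpy as np

def rollout_return(theta):
    # Hypothetical stand-in for an episode return on mountain car.
    return -np.sum((theta - 1.0) ** 2)

def simple_es(dim=8, popsize=16, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    mean, sigma = np.zeros(dim), 0.5
    n_episodes = 0                      # sample-complexity counter
    for _ in range(iters):
        pop = mean + sigma * rng.standard_normal((popsize, dim))
        returns = np.array([rollout_return(p) for p in pop])
        n_episodes += popsize           # one rollout per candidate
        elite = pop[np.argsort(returns)[-popsize // 2:]]
        mean = elite.mean(axis=0)       # real CMA-ES would also adapt
        sigma *= 0.97                   # a full covariance matrix here
    return mean, n_episodes

theta, episodes = simple_es()
print(f"best return {rollout_return(theta):.4f} after {episodes} episodes")
```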

Variational Graph Auto-Encoders

Title Variational Graph Auto-Encoders
Authors Thomas N. Kipf, Max Welling
Abstract We introduce the variational graph auto-encoder (VGAE), a framework for unsupervised learning on graph-structured data based on the variational auto-encoder (VAE). This model makes use of latent variables and is capable of learning interpretable latent representations for undirected graphs. We demonstrate this model using a graph convolutional network (GCN) encoder and a simple inner product decoder. Our model achieves competitive results on a link prediction task in citation networks. In contrast to most existing models for unsupervised learning on graph-structured data and link prediction, our model can naturally incorporate node features, which significantly improves predictive performance on a number of benchmark datasets.
Tasks Graph Clustering, Link Prediction
Published 2016-11-21
URL http://arxiv.org/abs/1611.07308v1
PDF http://arxiv.org/pdf/1611.07308v1.pdf
PWC https://paperswithcode.com/paper/variational-graph-auto-encoders
Repo https://github.com/tkipf/gae
Framework tf
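
The model's two main pieces are compact enough to sketch. The numpy snippet below (the authors' TensorFlow implementation is in the linked repo) shows a GCN encoder layer and the inner-product decoder sigmoid(ZZᵀ) that scores candidate links; for brevity it shows the deterministic GAE variant, whereas the variational version outputs a mean and log-variance per node and samples Z with the reparameterization trick.

```python
# Minimal GAE sketch: one GCN layer as encoder, inner-product decoder.
import numpy as np

def normalize_adj(A):
    # Symmetric normalization D^{-1/2} (A + I) D^{-1/2} used by GCNs.
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

def gcn_layer(A_norm, X, W):
    return np.maximum(A_norm @ X @ W, 0.0)    # ReLU(ÂXW)

def inner_product_decoder(Z):
    logits = Z @ Z.T
    return 1.0 / (1.0 + np.exp(-logits))      # P(edge between i and j)

rng = np.random.default_rng(0)
A = np.array([[0,1,0,0],[1,0,1,0],[0,1,0,1],[0,0,1,0]], float)
X = rng.standard_normal((4, 5))               # node features
W = rng.standard_normal((5, 2))
Z = gcn_layer(normalize_adj(A), X, W)         # latent node embeddings
print(inner_product_decoder(Z).round(2))      # reconstructed adjacency
```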

Linear Algebraic Structure of Word Senses, with Applications to Polysemy

Title Linear Algebraic Structure of Word Senses, with Applications to Polysemy
Authors Sanjeev Arora, Yuanzhi Li, Yingyu Liang, Tengyu Ma, Andrej Risteski
Abstract Word embeddings are ubiquitous in NLP and information retrieval, but it is unclear what they represent when the word is polysemous. Here it is shown that multiple word senses reside in linear superposition within the word embedding and simple sparse coding can recover vectors that approximately capture the senses. The success of our approach, which applies to several embedding methods, is mathematically explained using a variant of the random walk on discourses model (Arora et al., 2016). A novel aspect of our technique is that each extracted word sense is accompanied by one of about 2000 “discourse atoms” that gives a succinct description of which other words co-occur with that word sense. Discourse atoms can be of independent interest, and make the method potentially more useful. Empirical tests are used to verify and support the theory.
Tasks Information Retrieval, Word Embeddings
Published 2016-01-14
URL http://arxiv.org/abs/1601.03764v6
PDF http://arxiv.org/pdf/1601.03764v6.pdf
PWC https://paperswithcode.com/paper/linear-algebraic-structure-of-word-senses
Repo https://github.com/PrincetonML/SemanticVector
Framework none
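
The recipe itself is a standard sparse-coding problem, so a hedged sketch is straightforward: factor the word-embedding matrix into sparse codes over a learned dictionary whose rows play the role of discourse atoms. This sketch uses scikit-learn's DictionaryLearning with toy sizes as placeholders; the paper works with roughly 2000 atoms over real embeddings, and is not this code.

```python
# Sparse coding over a (toy, random) word-embedding matrix so each word
# is a sparse combination of "discourse atoms".
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)
embeddings = rng.standard_normal((200, 20))   # stand-in word vectors

coder = DictionaryLearning(
    n_components=30,             # paper: ~2000 discourse atoms
    transform_algorithm="omp",   # sparse code per word
    transform_n_nonzero_coefs=5,
    random_state=0,
)
codes = coder.fit_transform(embeddings)       # (words, atoms), sparse
atoms = coder.components_                     # each row ~ one atom

# For a polysemous word, the nonzero atoms point at its distinct senses.
print("active atoms for word 0:", np.flatnonzero(codes[0]))
```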

Ancestral Causal Inference

Title Ancestral Causal Inference
Authors Sara Magliacane, Tom Claassen, Joris M. Mooij
Abstract Constraint-based causal discovery from limited data is a notoriously difficult challenge due to the many borderline independence test decisions. Several approaches to improve the reliability of the predictions by exploiting redundancy in the independence information have been proposed recently. Though promising, existing approaches can still be greatly improved in terms of accuracy and scalability. We present a novel method that reduces the combinatorial explosion of the search space by using a more coarse-grained representation of causal information, drastically reducing computation time. Additionally, we propose a method to score causal predictions based on their confidence. Crucially, our implementation also allows one to easily combine observational and interventional data and to incorporate various types of available background knowledge. We prove soundness and asymptotic consistency of our method and demonstrate that it can outperform the state-of-the-art on synthetic data, achieving a speedup of several orders of magnitude. We illustrate its practical feasibility by applying it on a challenging protein data set.
Tasks Causal Discovery, Causal Inference
Published 2016-06-22
URL http://arxiv.org/abs/1606.07035v3
PDF http://arxiv.org/pdf/1606.07035v3.pdf
PWC https://paperswithcode.com/paper/ancestral-causal-inference
Repo https://github.com/caus-am/aci
Framework none
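
The confidence-scoring idea can be illustrated in miniature. The heavily simplified Python sketch below (not the authors' implementation, which is in the linked repo) weights conflicting causal statements and scores a candidate ancestral claim by the net weight of inputs supporting it; the real method instead solves a discrete optimization over ancestral relations and scores a prediction by how much the optimal loss changes when it is forced false.

```python
# Toy weighted ancestral statements: (source, target, is_ancestor, weight).
inputs = [
    ("A", "B", True, 2.0),
    ("A", "B", False, 0.5),   # a conflicting, lower-confidence input
    ("B", "C", True, 1.0),
]

def score(src, dst, claim):
    # Net weight in favor of the claim "src is (not) an ancestor of dst".
    s = 0.0
    for a, b, anc, w in inputs:
        if (a, b) == (src, dst):
            s += w if anc == claim else -w
    return s

print(score("A", "B", True))   # 1.5: supported despite the conflict
```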

Discovering Causal Signals in Images

Title Discovering Causal Signals in Images
Authors David Lopez-Paz, Robert Nishihara, Soumith Chintala, Bernhard Schölkopf, Léon Bottou
Abstract This paper establishes the existence of observable footprints that reveal the “causal dispositions” of the object categories appearing in collections of images. We achieve this goal in two steps. First, we take a learning approach to observational causal discovery, and build a classifier that achieves state-of-the-art performance on finding the causal direction between pairs of random variables, given samples from their joint distribution. Second, we use our causal direction classifier to effectively distinguish between features of objects and features of their contexts in collections of static images. Our experiments demonstrate the existence of a relation between the direction of causality and the difference between objects and their contexts, and by the same token, the existence of observable signals that reveal the causal dispositions of objects.
Tasks Causal Discovery
Published 2016-05-26
URL http://arxiv.org/abs/1605.08179v2
PDF http://arxiv.org/pdf/1605.08179v2.pdf
PWC https://paperswithcode.com/paper/discovering-causal-signals-in-images
Repo https://github.com/kyrs/NCC-experiments
Framework tf
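
The first step, a learned classifier for causal direction, can be sketched as follows. The paper's Neural Causation Coefficient averages learned per-point embeddings of the scatter {(x_i, y_i)}; this hedged sketch swaps in hand-crafted moment features so the example stays self-contained, and trains on synthetic cause-to-effect pairs and their flipped (anticausal) versions.

```python
# Learning-based causal direction classifier on synthetic pairs.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def sample_pair(n=200):
    x = rng.standard_normal(n)
    y = np.tanh(x) + 0.3 * rng.standard_normal(n)   # X causes Y
    return x, y

def featurize(x, y):
    # Moment summary of the joint sample; NCC learns this embedding.
    x = (x - x.mean()) / x.std(); y = (y - y.mean()) / y.std()
    return [np.mean(x * y), np.mean(x**2 * y), np.mean(x * y**2),
            np.mean(x**3), np.mean(y**3)]

X_feat, labels = [], []
for _ in range(500):
    x, y = sample_pair()
    X_feat.append(featurize(x, y)); labels.append(1)   # causal order
    X_feat.append(featurize(y, x)); labels.append(0)   # anticausal
clf = LogisticRegression(max_iter=1000).fit(X_feat, labels)
print("train accuracy:", clf.score(X_feat, labels))
```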

Lecture Notes on Randomized Linear Algebra

Title Lecture Notes on Randomized Linear Algebra
Authors Michael W. Mahoney
Abstract These lecture notes are based on a class I taught on the topic of Randomized Linear Algebra (RLA) at UC Berkeley during the Fall 2013 semester.
Tasks
Published 2016-08-16
URL http://arxiv.org/abs/1608.04481v1
PDF http://arxiv.org/pdf/1608.04481v1.pdf
PWC https://paperswithcode.com/paper/lecture-notes-on-randomized-linear-algebra
Repo https://github.com/NumericalMax/Randomized-Matrix-Product
Framework none
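
As a taste of the notes' subject matter, here is a short sketch of one RLA staple, the randomized range finder behind randomized SVD: project A through a Gaussian test matrix to get an orthonormal basis Q for its approximate range, then take an exact SVD of the small matrix QᵀA. Sizes below are toy.

```python
# Randomized SVD via the Gaussian range finder.
import numpy as np

def randomized_svd(A, rank, oversample=5, seed=0):
    rng = np.random.default_rng(seed)
    m, n = A.shape
    Omega = rng.standard_normal((n, rank + oversample))  # test matrix
    Q, _ = np.linalg.qr(A @ Omega)             # orthonormal range basis
    B = Q.T @ A                                # small (k+p) x n matrix
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :rank], s[:rank], Vt[:rank]

rng = np.random.default_rng(1)
A = rng.standard_normal((300, 10)) @ rng.standard_normal((10, 200))
U, s, Vt = randomized_svd(A, rank=10)
print("relative error:", np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A))
```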

What Is the Best Practice for CNNs Applied to Visual Instance Retrieval?

Title What Is the Best Practice for CNNs Applied to Visual Instance Retrieval?
Authors Jiedong Hao, Jing Dong, Wei Wang, Tieniu Tan
Abstract Previous work has shown that feature maps of deep convolutional neural networks (CNNs) can be interpreted as feature representations of a particular image region. Features aggregated from these feature maps have been exploited for image retrieval tasks and have achieved state-of-the-art performance in recent years. The key to the success of such methods is the feature representation. However, the different factors that impact the effectiveness of features have still not been explored thoroughly, and there has been much less discussion about the best combination of them. The main contribution of our paper is a thorough evaluation of the various factors that affect the discriminative ability of the features extracted from CNNs. Based on the evaluation results, we also identify the best choices for different factors and propose a new multi-scale image feature representation method to encode the image effectively. Finally, we show that the proposed method generalises well and outperforms the state-of-the-art methods on four typical datasets used for visual instance retrieval.
Tasks Image Retrieval
Published 2016-11-05
URL http://arxiv.org/abs/1611.01640v1
PDF http://arxiv.org/pdf/1611.01640v1.pdf
PWC https://paperswithcode.com/paper/what-is-the-best-practice-for-cnns-applied-to
Repo https://github.com/hbwang1427/image_retrieval
Framework none
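
The kind of aggregation under evaluation is easy to sketch. The numpy snippet below (an illustration, not the paper's code) collapses a conv feature map into a global descriptor by spatial max- or sum-pooling and L2-normalizes it for cosine-similarity retrieval; a multi-scale variant would combine descriptors computed at several input resolutions, which is elided here.

```python
# Global descriptor from a conv feature map for instance retrieval.
import numpy as np

def aggregate(feature_map, how="max"):
    # feature_map: (channels, height, width) activations from a CNN.
    pooled = feature_map.max(axis=(1, 2)) if how == "max" \
             else feature_map.sum(axis=(1, 2))
    return pooled / (np.linalg.norm(pooled) + 1e-12)  # L2-normalize

rng = np.random.default_rng(0)
fmap_query = np.abs(rng.standard_normal((256, 14, 14)))
fmap_db = np.abs(rng.standard_normal((256, 14, 14)))
q, d = aggregate(fmap_query), aggregate(fmap_db)
print("cosine similarity:", float(q @ d))
```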

Learning Scalable Deep Kernels with Recurrent Structure

Title Learning Scalable Deep Kernels with Recurrent Structure
Authors Maruan Al-Shedivat, Andrew Gordon Wilson, Yunus Saatchi, Zhiting Hu, Eric P. Xing
Abstract Many applications in speech, robotics, finance, and biology deal with sequential data, where ordering matters and recurrent structures are common. However, this structure cannot be easily captured by standard kernel functions. To model such structure, we propose expressive closed-form kernel functions for Gaussian processes. The resulting model, GP-LSTM, fully encapsulates the inductive biases of long short-term memory (LSTM) recurrent networks, while retaining the non-parametric probabilistic advantages of Gaussian processes. We learn the properties of the proposed kernels by optimizing the Gaussian process marginal likelihood using a new provably convergent semi-stochastic gradient procedure and exploit the structure of these kernels for scalable training and prediction. This approach provides a practical representation for Bayesian LSTMs. We demonstrate state-of-the-art performance on several benchmarks, and thoroughly investigate a consequential autonomous driving application, where the predictive uncertainties provided by GP-LSTM are uniquely valuable.
Tasks Autonomous Driving, Gaussian Processes, Smart Grid Prediction
Published 2016-10-27
URL http://arxiv.org/abs/1610.08936v3
PDF http://arxiv.org/pdf/1610.08936v3.pdf
PWC https://paperswithcode.com/paper/learning-scalable-deep-kernels-with-recurrent
Repo https://github.com/alshedivat/keras-gp
Framework tf
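
The deep-kernel construction is k(x, x') = k_RBF(h(x), h(x')) with h a recurrent embedding. In the sketch below (an assumption-laden illustration, not the linked keras-gp code), a fixed vanilla RNN stands in for the LSTM, whose weights the paper actually learns by maximizing the GP marginal likelihood.

```python
# RBF kernel over recurrent sequence embeddings.
import numpy as np

rng = np.random.default_rng(0)
W_in = rng.standard_normal(4) * 0.5
W_h = rng.standard_normal((4, 4)) * 0.3

def embed(seq):
    h = np.zeros(4)
    for x in seq:                       # stand-in for the paper's LSTM
        h = np.tanh(W_in * x + W_h @ h)
    return h

def deep_kernel(seq_a, seq_b, lengthscale=1.0):
    d = embed(seq_a) - embed(seq_b)
    return np.exp(-0.5 * d @ d / lengthscale**2)

seqs = [[0.1, 0.2], [0.2, 0.1], [5.0, 5.0]]
K = np.array([[deep_kernel(a, b) for b in seqs] for a in seqs])
print(K.round(3))                       # Gram matrix for the GP
```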

Constructing a Natural Language Inference Dataset using Generative Neural Networks

Title Constructing a Natural Language Inference Dataset using Generative Neural Networks
Authors Janez Starc, Dunja Mladenić
Abstract Natural Language Inference is an important task for Natural Language Understanding. It is concerned with classifying the logical relation between two sentences. In this paper, we propose several generative neural networks for generating text hypotheses, which allow the construction of new Natural Language Inference datasets. To evaluate the models, we propose a new metric – the accuracy of a classifier trained on the generated dataset. The accuracy obtained by our best generative model is only 2.7% lower than the accuracy of the classifier trained on the original, human-crafted dataset. Furthermore, the best generated dataset combined with the original dataset achieves the highest accuracy. The best model learns a mapping embedding for each training example. By comparing various metrics we show that datasets that obtain higher ROUGE or METEOR scores do not necessarily yield higher classification accuracies. We also provide an analysis of the characteristics of a good dataset, including how distinguishable the generated datasets are from the original one.
Tasks Natural Language Inference
Published 2016-07-20
URL http://arxiv.org/abs/1607.06025v2
PDF http://arxiv.org/pdf/1607.06025v2.pdf
PWC https://paperswithcode.com/paper/constructing-a-natural-language-inference
Repo https://github.com/jstarc/nli_generation
Framework none
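
The proposed metric is simple to operationalize: train a classifier on the generated dataset and report its accuracy on the original human-written test set. A toy sketch with stand-in data (real NLI data would be premise-hypothesis pairs, not random vectors):

```python
# Generated-dataset quality metric: accuracy on the original test set.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def toy_dataset(n, noise):
    X = rng.standard_normal((n, 20))
    y = (X[:, 0] + noise * rng.standard_normal(n) > 0).astype(int)
    return X, y

X_gen, y_gen = toy_dataset(1000, noise=0.5)    # "generated" data
X_orig, y_orig = toy_dataset(500, noise=0.1)   # "original" test data

clf = LogisticRegression(max_iter=1000).fit(X_gen, y_gen)
print("metric (accuracy on original test set):", clf.score(X_orig, y_orig))
```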

Rationalizing Neural Predictions

Title Rationalizing Neural Predictions
Authors Tao Lei, Regina Barzilay, Tommi Jaakkola
Abstract Prediction without justification has limited applicability. As a remedy, we learn to extract pieces of input text as justifications – rationales – that are tailored to be short and coherent, yet sufficient for making the same prediction. Our approach combines two modular components, a generator and an encoder, which are trained to operate well together. The generator specifies a distribution over text fragments as candidate rationales, and these are passed through the encoder for prediction. Rationales are never given during training. Instead, the model is regularized by desiderata for rationales. We evaluate the approach on multi-aspect sentiment analysis against manually annotated test cases. Our approach outperforms an attention-based baseline by a significant margin. We also successfully illustrate the method on a question retrieval task.
Tasks Sentiment Analysis
Published 2016-06-13
URL http://arxiv.org/abs/1606.04155v2
PDF http://arxiv.org/pdf/1606.04155v2.pdf
PWC https://paperswithcode.com/paper/rationalizing-neural-predictions
Repo https://github.com/Gorov/three_player_for_emnlp
Framework pytorch
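
The desiderata that substitute for rationale supervision can be written down directly: a selection mask is penalized for its size (short rationales) and for its number of on/off transitions (coherent, contiguous rationales), on top of the prediction loss. A minimal numpy sketch of that objective; the trained generator and encoder are elided.

```python
# Rationale regularizers: sparsity + coherence over a binary token mask.
import numpy as np

def rationale_cost(pred_loss, z, lam_sparse=0.1, lam_coherent=0.05):
    sparsity = z.sum()                         # number of kept tokens
    coherence = np.abs(np.diff(z)).sum()       # on/off transitions
    return pred_loss + lam_sparse * sparsity + lam_coherent * coherence

z_contiguous = np.array([0, 0, 1, 1, 1, 0, 0])
z_scattered  = np.array([1, 0, 1, 0, 1, 0, 0])
# Same prediction loss and length: the contiguous rationale is cheaper.
print(rationale_cost(0.3, z_contiguous), rationale_cost(0.3, z_scattered))
```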

Connecting Generative Adversarial Networks and Actor-Critic Methods

Title Connecting Generative Adversarial Networks and Actor-Critic Methods
Authors David Pfau, Oriol Vinyals
Abstract Both generative adversarial networks (GAN) in unsupervised learning and actor-critic methods in reinforcement learning (RL) have gained a reputation for being difficult to optimize. Practitioners in both fields have amassed a large number of strategies to mitigate these instabilities and improve training. Here we show that GANs can be viewed as actor-critic methods in an environment where the actor cannot affect the reward. We review the strategies for stabilizing training for each class of models, both those that generalize between the two and those that are particular to each model. We also review a number of extensions to GANs and RL algorithms with even more complicated information flow. We hope that by highlighting this formal connection we will encourage both GAN and RL communities to develop general, scalable, and stable algorithms for multilevel optimization with deep networks, and to draw inspiration across communities.
Tasks
Published 2016-10-06
URL http://arxiv.org/abs/1610.01945v3
PDF http://arxiv.org/pdf/1610.01945v3.pdf
PWC https://paperswithcode.com/paper/connecting-generative-adversarial-networks
Repo https://github.com/170928/Multi-Reinforcement-Learning-Study-List
Framework none

Transfer String Kernel for Cross-Context DNA-Protein Binding Prediction

Title Transfer String Kernel for Cross-Context DNA-Protein Binding Prediction
Authors Ritambhara Singh, Jack Lanchantin, Gabriel Robins, Yanjun Qi
Abstract Through sequence-based classification, this paper tries to accurately predict the DNA binding sites of transcription factors (TFs) in an unannotated cellular context. Related methods in the literature fail to perform such predictions accurately, since they do not consider sample distribution shift of sequence segments from an annotated (source) context to an unannotated (target) context. We, therefore, propose a method called “Transfer String Kernel” (TSK) that achieves improved prediction of transcription factor binding sites (TFBS) using knowledge transfer via cross-context sample adaptation. TSK maps sequence segments to a high-dimensional feature space using a discriminative mismatch string kernel framework. In this high-dimensional space, labeled examples of the source context are re-weighted so that the revised sample distribution matches the target context more closely. We have experimentally verified TSK for TFBS identification on fourteen different TFs under a cross-organism setting. We find that TSK consistently outperforms the state-of-the-art TFBS tools, especially when working with TFs whose binding sequences are not conserved across contexts. We also demonstrate the generalizability of TSK by showing its cutting-edge performance on a different set of cross-context tasks for MHC peptide binding prediction.
Tasks Transfer Learning
Published 2016-09-12
URL http://arxiv.org/abs/1609.03490v1
PDF http://arxiv.org/pdf/1609.03490v1.pdf
PWC https://paperswithcode.com/paper/transfer-string-kernel-for-cross-context-dna
Repo https://github.com/QData/TransferStringKernel
Framework none
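
Both ingredients can be sketched on toy DNA strings. The snippet below uses a plain k-mer spectrum feature map (the paper's mismatch kernel additionally allows inexact k-mer matches) and re-weights source sequences so their mean feature vector matches the target context, via a crude least-squares fit rather than the paper's kernel-based sample adaptation.

```python
# k-mer spectrum features plus a toy source-reweighting step.
import itertools
import numpy as np

K = 2
KMERS = ["".join(p) for p in itertools.product("ACGT", repeat=K)]

def spectrum(seq):
    v = np.array([sum(seq[i:i+K] == km for i in range(len(seq)-K+1))
                  for km in KMERS], float)
    return v / (np.linalg.norm(v) + 1e-12)

source = ["ACGTACGT", "AACCGGTT", "ACACACAC"]
target = ["GTGTGTGT", "GGTTGGTT"]

Phi_s = np.array([spectrum(s) for s in source])      # (n_source, 16)
mu_t = np.array([spectrum(s) for s in target]).mean(axis=0)

# Weights so the weighted source mean matches the target mean profile.
w, *_ = np.linalg.lstsq(Phi_s.T, mu_t, rcond=None)
w = np.clip(w, 0, None)
print("source weights:", w.round(3))                 # up-weights GT-rich reads
```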

DeepDiary: Automatic Caption Generation for Lifelogging Image Streams

Title DeepDiary: Automatic Caption Generation for Lifelogging Image Streams
Authors Chenyou Fan, David J. Crandall
Abstract Lifelogging cameras capture everyday life from a first-person perspective, but generate so much data that it is hard for users to browse and organize their image collections effectively. In this paper, we propose to use automatic image captioning algorithms to generate textual representations of these collections. We develop and explore novel techniques based on deep learning to generate captions for both individual images and image streams, using temporal consistency constraints to create summaries that are both more compact and less noisy. We evaluate our techniques with quantitative and qualitative results, and apply captioning to an image retrieval application for finding potentially private images. Our results suggest that our automatic captioning algorithms, while imperfect, may work well enough to help users manage lifelogging photo collections.
Tasks Image Captioning, Image Retrieval
Published 2016-08-12
URL http://arxiv.org/abs/1608.03819v1
PDF http://arxiv.org/pdf/1608.03819v1.pdf
PWC https://paperswithcode.com/paper/deepdiary-automatic-caption-generation-for
Repo https://github.com/fanchenyou/deepdiary
Framework none

Fine-grained Analysis of Sentence Embeddings Using Auxiliary Prediction Tasks

Title Fine-grained Analysis of Sentence Embeddings Using Auxiliary Prediction Tasks
Authors Yossi Adi, Einat Kermany, Yonatan Belinkov, Ofer Lavi, Yoav Goldberg
Abstract There is a lot of research interest in encoding variable length sentences into fixed length vectors, in a way that preserves the sentence meanings. Two common methods include representations based on averaging word vectors, and representations based on the hidden states of recurrent neural networks such as LSTMs. The sentence vectors are used as features for subsequent machine learning tasks or for pre-training in the context of deep learning. However, not much is known about the properties that are encoded in these sentence representations and about the language information they capture. We propose a framework that facilitates better understanding of the encoded representations. We define prediction tasks around isolated aspects of sentence structure (namely sentence length, word content, and word order), and score representations by the ability to train a classifier to solve each prediction task when using the representation as input. We demonstrate the potential contribution of the approach by analyzing different sentence representation mechanisms. The analysis sheds light on the relative strengths of different sentence embedding methods with respect to these low level prediction tasks, and on the effect of the encoded vector’s dimensionality on the resulting representations.
Tasks Sentence Embedding, Sentence Embeddings
Published 2016-08-15
URL http://arxiv.org/abs/1608.04207v3
PDF http://arxiv.org/pdf/1608.04207v3.pdf
PWC https://paperswithcode.com/paper/fine-grained-analysis-of-sentence-embeddings
Repo https://github.com/facebookresearch/InferSent
Framework pytorch
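
The probing methodology itself fits in a few lines: freeze the sentence representation, train a simple classifier to predict one surface property from it, and read the test accuracy as the score. In this sketch, random averaged word vectors stand in for a real encoder and the property is (binarized) sentence length.

```python
# Probing a sentence embedding for a surface property (length).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
word_vecs = rng.standard_normal((1000, 50))          # toy vocabulary

def embed(length):
    idx = rng.integers(0, 1000, size=length)
    return word_vecs[idx].mean(axis=0)               # averaged word vectors

lengths = rng.integers(3, 20, size=2000)
X = np.stack([embed(n) for n in lengths])
y = (lengths > 10).astype(int)                       # binarized length task

Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(Xtr, ytr)
print("length-probe accuracy:", probe.score(Xte, yte))
```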

Finding Alternate Features in Lasso

Title Finding Alternate Features in Lasso
Authors Satoshi Hara, Takanori Maehara
Abstract We propose a method for finding alternate features missing from the Lasso optimal solution. In the ordinary Lasso problem, one global optimum is obtained and the resulting features are interpreted as task-relevant features. However, this can overlook possibly relevant features that the Lasso does not select. With the proposed method, we can provide not only the Lasso optimal solution but also possible alternate features to the Lasso solution. We show that such alternate features can be computed efficiently by avoiding redundant computations. We also demonstrate how the proposed method works on the 20 Newsgroups data, showing that reasonable features are found as alternates.
Tasks
Published 2016-11-18
URL http://arxiv.org/abs/1611.05940v2
PDF http://arxiv.org/pdf/1611.05940v2.pdf
PWC https://paperswithcode.com/paper/finding-alternate-features-in-lasso
Repo https://github.com/sato9hara/LassoVariants
Framework none
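
The search for alternates can be illustrated by brute force (the paper's contribution is doing this efficiently by avoiding redundant computations): fit the Lasso, then for each selected feature, refit with that feature removed and report which features newly enter the solution.

```python
# Brute-force alternate-feature search around a Lasso solution.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, d = 200, 10
X = rng.standard_normal((n, d))
X[:, 1] = X[:, 0] + 0.1 * rng.standard_normal(n)   # feature 1 ~ feature 0
y = X[:, 0] + 0.5 * X[:, 2] + 0.1 * rng.standard_normal(n)

def selected(X, y, alpha=0.1):
    return set(np.flatnonzero(Lasso(alpha=alpha).fit(X, y).coef_))

base = selected(X, y)
for j in sorted(base):
    X_masked = X.copy()
    X_masked[:, j] = 0.0                           # drop feature j
    alternates = selected(X_masked, y) - base
    print(f"feature {j}: alternates {sorted(alternates)}")
```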