October 20, 2019

Paper Group AWR 204

Scaling All-Goals Updates in Reinforcement Learning Using Convolutional Neural Networks


Title	Scaling All-Goals Updates in Reinforcement Learning Using Convolutional Neural Networks
Authors	Fabio Pardo, Vitaly Levdik, Petar Kormushev
Abstract	Being able to reach any desired location in the environment can be a valuable asset for an agent. Learning a policy to navigate between all pairs of states individually is often not feasible. An all-goals updating algorithm uses each transition to learn Q-values towards all goals simultaneously and off-policy. However the expensive numerous updates in parallel limited the approach to small tabular cases so far. To tackle this problem we propose to use convolutional network architectures to generate Q-values and updates for a large number of goals at once. We demonstrate the accuracy and generalization qualities of the proposed method on randomly generated mazes and Sokoban puzzles. In the case of on-screen goal coordinates the resulting mapping from frames to distance-maps directly informs the agent about which places are reachable and in how many steps. As an example of application we show that replacing the random actions in epsilon-greedy exploration by several actions towards feasible goals generates better exploratory trajectories on Montezuma’s Revenge and Super Mario All-Stars games.
Tasks	Montezuma’s Revenge, Q-Learning, SNES Games
Published	2018-10-06
URL	https://arxiv.org/abs/1810.02927v2
PDF	https://arxiv.org/pdf/1810.02927v2.pdf
PWC	https://paperswithcode.com/paper/q-map-a-convolutional-approach-for-goal
Repo	https://github.com/yl3829/Q-map
Framework	tf
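
The all-goals update described in the abstract can be illustrated in the small tabular setting that the paper aims to scale beyond. The sketch below is an assumption-laden toy, not the authors' convolutional method: the 1-D chain environment, the reward of 1 on reaching a goal, and the names `all_goals_update` and `train_chain` are all hypothetical choices made for illustration.

```python
import random

def all_goals_update(q, s, a, s_next, n_actions, alpha=0.5, gamma=0.9):
    """Use one transition (s, a, s_next) to update Q-values towards every goal."""
    n_states = len(q)
    for g in range(n_states):
        if s_next == g:
            # Assumed convention: reaching goal g yields reward 1 and terminates.
            target = 1.0
        else:
            # Off-policy bootstrap from the best next action for goal g.
            target = gamma * max(q[s_next][b][g] for b in range(n_actions))
        q[s][a][g] += alpha * (target - q[s][a][g])

def train_chain(n_states=5, steps=5000, seed=0):
    """Random walk on a chain of states; action 0 moves left, action 1 moves right."""
    rng = random.Random(seed)
    n_actions = 2
    # q[state][action][goal]
    q = [[[0.0] * n_states for _ in range(n_actions)] for _ in range(n_states)]
    s = 0
    for _ in range(steps):
        a = rng.randrange(n_actions)
        s_next = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
        all_goals_update(q, s, a, s_next, n_actions)
        s = s_next
    return q
```

Under these assumptions each learned value decays by a factor of gamma per step to the goal, so a table of Q-values can be read as a rough distance map, which is the interpretation the abstract gives for on-screen goal coordinates.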

Deterministic Implementations for Reproducibility in Deep Reinforcement Learning


Title	Deterministic Implementations for Reproducibility in Deep Reinforcement Learning
Authors	Prabhat Nagarajan, Garrett Warnell, Peter Stone
Abstract	While deep reinforcement learning (DRL) has led to numerous successes in recent years, reproducing these successes can be extremely challenging. One reproducibility challenge particularly relevant to DRL is nondeterminism in the training process, which can substantially affect the results. Motivated by this challenge, we study the positive impacts of deterministic implementations in eliminating nondeterminism in training. To do so, we consider the particular case of the deep Q-learning algorithm, for which we produce a deterministic implementation by identifying and controlling all sources of nondeterminism in the training process. One by one, we then allow individual sources of nondeterminism to affect our otherwise deterministic implementation, and measure the impact of each source on the variance in performance. We find that individual sources of nondeterminism can substantially impact the performance of the agent, illustrating the benefits of deterministic implementations. In addition, we also discuss the important role of deterministic implementations in achieving exact replicability of results.
Tasks	Q-Learning
Published	2018-09-15
URL	https://arxiv.org/abs/1809.05676v5
PDF	https://arxiv.org/pdf/1809.05676v5.pdf
PWC	https://paperswithcode.com/paper/deterministic-implementations-for
Repo	https://github.com/prabhatnagarajan/repro_dqn
Framework	pytorch
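
As a small illustration of controlling one source of nondeterminism, the sketch below threads a single seeded generator through a toy epsilon-greedy loop so that two runs with the same seed are identical. It is only a sketch: the function `run_experiment` and its toy "weights" are hypothetical, it covers pseudo-random number generation alone, and it is not the authors' deep Q-learning implementation.

```python
import random

def run_experiment(seed, n_steps=20, epsilon=0.1):
    """Toy run whose only randomness comes from one explicitly seeded generator."""
    # A dedicated generator avoids relying on hidden global random state.
    rng = random.Random(seed)
    # Stand-in for randomly initialised action values (hypothetical).
    values = [rng.gauss(0.0, 1.0) for _ in range(4)]
    actions = []
    for _ in range(n_steps):
        if rng.random() < epsilon:
            # Exploratory action drawn from the same seeded generator.
            actions.append(rng.randrange(len(values)))
        else:
            # Greedy action: index of the largest value.
            actions.append(max(range(len(values)), key=values.__getitem__))
    return values, actions
```

Calling `run_experiment(7)` twice returns the same values and action sequence, while a different seed gives a different initialisation. A real training pipeline has further sources to control beyond the one shown here.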

Structured Triplet Learning with POS-tag Guided Attention for Visual Question Answering


Title	Structured Triplet Learning with POS-tag Guided Attention for Visual Question Answering
Authors	Zhe Wang, Xiaoyi Liu, Liangjian Chen, Limin Wang, Yu Qiao, Xiaohui Xie, Charless Fowlkes
Abstract	Visual question answering (VQA) is of significant interest due to its potential to be a strong test of image understanding systems and to probe the connection between language and vision. Despite much recent progress, general VQA is far from a solved problem. In this paper, we focus on the VQA multiple-choice task, and provide some good practices for designing an effective VQA model that can capture language-vision interactions and perform joint reasoning. We explore mechanisms of incorporating part-of-speech (POS) tag guided attention, convolutional n-grams, triplet attention interactions between the image, question and candidate answer, and structured learning for triplets based on image-question pairs. We evaluate our models on two popular datasets: Visual7W and VQA Real Multiple Choice. Our final model achieves the state-of-the-art performance of 68.2% on Visual7W, and a very competitive performance of 69.6% on the test-standard split of VQA Real Multiple Choice.
Tasks	Question Answering, Visual Question Answering
Published	2018-01-24
URL	http://arxiv.org/abs/1801.07853v1
PDF	http://arxiv.org/pdf/1801.07853v1.pdf
PWC	https://paperswithcode.com/paper/structured-triplet-learning-with-pos-tag
Repo	https://github.com/wangzheallen/STL-VQA
Framework	tf

Identifying Sources and Sinks in the Presence of Multiple Agents with Gaussian Process Vector Calculus


Title	Identifying Sources and Sinks in the Presence of Multiple Agents with Gaussian Process Vector Calculus
Authors	Adam D. Cobb, Richard Everett, Andrew Markham, Stephen J. Roberts
Abstract	In systems of multiple agents, identifying the cause of observed agent dynamics is challenging. Often, these agents operate in diverse, non-stationary environments, where models rely on hand-crafted environment-specific features to infer influential regions in the system’s surroundings. To overcome the limitations of these inflexible models, we present GP-LAPLACE, a technique for locating sources and sinks from trajectories in time-varying fields. Using Gaussian processes, we jointly infer a spatio-temporal vector field, as well as canonical vector calculus operations on that field. Notably, we do this from only agent trajectories without requiring knowledge of the environment, and also obtain a metric for denoting the significance of inferred causal features in the environment by exploiting our probabilistic method. To evaluate our approach, we apply it to both synthetic and real-world GPS data, demonstrating the applicability of our technique in the presence of multiple agents, as well as its superiority over existing methods.
Tasks	Gaussian Processes
Published	2018-02-22
URL	http://arxiv.org/abs/1802.10446v2
PDF	http://arxiv.org/pdf/1802.10446v2.pdf
PWC	https://paperswithcode.com/paper/identifying-sources-and-sinks-in-the-presence
Repo	https://github.com/AdamCobb/GP-LAPLACE
Framework	tf

An Auto-Encoder Matching Model for Learning Utterance-Level Semantic Dependency in Dialogue Generation


Title	An Auto-Encoder Matching Model for Learning Utterance-Level Semantic Dependency in Dialogue Generation
Authors	Liangchen Luo, Jingjing Xu, Junyang Lin, Qi Zeng, Xu Sun
Abstract	Generating semantically coherent responses is still a major challenge in dialogue generation. Different from conventional text generation tasks, the mapping between inputs and responses in conversations is more complicated, which highly demands the understanding of utterance-level semantic dependency, a relation between the whole meanings of inputs and outputs. To address this problem, we propose an Auto-Encoder Matching (AEM) model to learn such dependency. The model contains two auto-encoders and one mapping module. The auto-encoders learn the semantic representations of inputs and responses, and the mapping module learns to connect the utterance-level representations. Experimental results from automatic and human evaluations demonstrate that our model is capable of generating responses of high coherence and fluency compared to baseline models. The code is available at https://github.com/lancopku/AMM
Tasks	Dialogue Generation, Text Generation
Published	2018-08-27
URL	http://arxiv.org/abs/1808.08795v1
PDF	http://arxiv.org/pdf/1808.08795v1.pdf
PWC	https://paperswithcode.com/paper/an-auto-encoder-matching-model-for-learning
Repo	https://github.com/lancopku/AMM
Framework	tf

Dialogue Learning with Human Teaching and Feedback in End-to-End Trainable Task-Oriented Dialogue Systems


Title	Dialogue Learning with Human Teaching and Feedback in End-to-End Trainable Task-Oriented Dialogue Systems
Authors	Bing Liu, Gokhan Tur, Dilek Hakkani-Tur, Pararth Shah, Larry Heck
Abstract	In this work, we present a hybrid learning method for training task-oriented dialogue systems through online user interactions. Popular methods for learning task-oriented dialogues include applying reinforcement learning with user feedback on supervised pre-training models. The efficiency of such learning methods may suffer from the mismatch of dialogue state distribution between offline training and online interactive learning stages. To address this challenge, we propose a hybrid imitation and reinforcement learning method, with which a dialogue agent can effectively learn from its interaction with users by learning from human teaching and feedback. We design a neural network based task-oriented dialogue agent that can be optimized end-to-end with the proposed learning method. Experimental results show that our end-to-end dialogue agent can learn effectively from the mistakes it makes via imitation learning from user teaching. Applying reinforcement learning with user feedback after the imitation learning stage further improves the agent’s capability in successfully completing a task.
Tasks	Dialogue State Tracking, Imitation Learning, Task-Oriented Dialogue Systems
Published	2018-04-18
URL	http://arxiv.org/abs/1804.06512v1
PDF	http://arxiv.org/pdf/1804.06512v1.pdf
PWC	https://paperswithcode.com/paper/dialogue-learning-with-human-teaching-and
Repo	https://github.com/google-research-datasets/simulated-dialogue
Framework	none

Robustness Guarantees for Bayesian Inference with Gaussian Processes


Title	Robustness Guarantees for Bayesian Inference with Gaussian Processes
Authors	Luca Cardelli, Marta Kwiatkowska, Luca Laurenti, Andrea Patane
Abstract	Bayesian inference and Gaussian processes are widely used in applications ranging from robotics and control to biological systems. Many of these applications are safety-critical and require a characterization of the uncertainty associated with the learning model and formal guarantees on its predictions. In this paper we define a robustness measure for Bayesian inference against input perturbations, given by the probability that, for a test point and a compact set in the input space containing the test point, the prediction of the learning model will remain $\delta-$close for all the points in the set, for $\delta>0.$ Such measures can be used to provide formal guarantees for the absence of adversarial examples. By employing the theory of Gaussian processes, we derive tight upper bounds on the resulting robustness by utilising the Borell-TIS inequality, and propose algorithms for their computation. We evaluate our techniques on two examples, a GP regression problem and a fully-connected deep neural network, where we rely on weak convergence to GPs to study adversarial examples on the MNIST dataset.
Tasks	Bayesian Inference, Gaussian Processes
Published	2018-09-17
URL	http://arxiv.org/abs/1809.06452v2
PDF	http://arxiv.org/pdf/1809.06452v2.pdf
PWC	https://paperswithcode.com/paper/robustness-guarantees-for-bayesian-inference
Repo	https://github.com/andreapatane/checkGP
Framework	none
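
The robustness measure that the abstract describes in words can be written compactly. The notation below is an assumption made for illustration (the symbols $\phi$, $x^*$, $T$ and the choice of norm are not taken from the paper): $f$ is the learned model under its posterior, $x^*$ the test point, and $T$ a compact set containing $x^*$.

```latex
\phi(x^*, T, \delta)
  = P\bigl( \forall x \in T :\ \lVert f(x^*) - f(x) \rVert \le \delta \bigr),
  \qquad x^* \in T,\ \delta > 0 .
```

Read this way, a value of $\phi$ close to 1 says that, with high probability, no point in $T$ changes the prediction by more than $\delta$, which is the sense in which the measure rules out adversarial examples within $T$.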

An accurate retrieval through R-MAC+ descriptors for landmark recognition


Title	An accurate retrieval through R-MAC+ descriptors for landmark recognition
Authors	Federico Magliani, Andrea Prati
Abstract	The landmark recognition problem is far from being solved, but with the use of features extracted from intermediate layers of Convolutional Neural Networks (CNNs), excellent results have been obtained. In this work, we propose some improvements on the creation of R-MAC descriptors in order to make the newly-proposed R-MAC+ descriptors more representative than the previous ones. However, the main contribution of this paper is a novel retrieval technique that exploits the fine representativeness of the MAC descriptors of the database images. Using these descriptors, called “db regions”, during the retrieval stage greatly improves performance. The proposed method is tested on different public datasets: Oxford5k, Paris6k and Holidays. It outperforms the state-of-the-art results on Holidays and reaches excellent results on Oxford5k and Paris6k, surpassed only by approaches based on fine-tuning strategies.
Tasks
Published	2018-06-22
URL	http://arxiv.org/abs/1806.08565v1
PDF	http://arxiv.org/pdf/1806.08565v1.pdf
PWC	https://paperswithcode.com/paper/an-accurate-retrieval-through-r-mac
Repo	https://github.com/fmaglia/keras_rmac_plus
Framework	tf

Learning to Learn without Forgetting by Maximizing Transfer and Minimizing Interference


Title	Learning to Learn without Forgetting by Maximizing Transfer and Minimizing Interference
Authors	Matthew Riemer, Ignacio Cases, Robert Ajemian, Miao Liu, Irina Rish, Yuhai Tu, Gerald Tesauro
Abstract	Lack of performance when it comes to continual learning over non-stationary distributions of data remains a major challenge in scaling neural network learning to more human realistic settings. In this work we propose a new conceptualization of the continual learning problem in terms of a temporally symmetric trade-off between transfer and interference that can be optimized by enforcing gradient alignment across examples. We then propose a new algorithm, Meta-Experience Replay (MER), that directly exploits this view by combining experience replay with optimization based meta-learning. This method learns parameters that make interference based on future gradients less likely and transfer based on future gradients more likely. We conduct experiments across continual lifelong supervised learning benchmarks and non-stationary reinforcement learning environments demonstrating that our approach consistently outperforms recently proposed baselines for continual learning. Our experiments show that the gap between the performance of MER and baseline algorithms grows both as the environment gets more non-stationary and as the fraction of the total experiences stored gets smaller.
Tasks	Continual Learning, Meta-Learning
Published	2018-10-29
URL	https://arxiv.org/abs/1810.11910v3
PDF	https://arxiv.org/pdf/1810.11910v3.pdf
PWC	https://paperswithcode.com/paper/learning-to-learn-without-forgetting-by
Repo	https://github.com/mattriemer/mer
Framework	pytorch
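
The transfer versus interference view in the abstract rests on gradient alignment across examples. The sketch below only measures that alignment as a dot product of per-example gradients for a toy linear model with squared loss; it does not implement Meta-Experience Replay, and the helper names `grad_squared_loss` and `alignment` are hypothetical.

```python
def grad_squared_loss(w, x, y):
    """Gradient of (w.x - y)^2 with respect to the weights w."""
    pred = sum(wi * xi for wi, xi in zip(w, x))
    return [2.0 * (pred - y) * xi for xi in x]

def alignment(w, example_a, example_b):
    """Dot product of the two per-example gradients at weights w.

    Positive: a step on one example also lowers the loss on the other (transfer).
    Negative: a step on one example raises the loss on the other (interference).
    """
    ga = grad_squared_loss(w, *example_a)
    gb = grad_squared_loss(w, *example_b)
    return sum(a * b for a, b in zip(ga, gb))
```

For instance, at `w = [0.0, 0.0]` two examples with the same input and the same target have positive alignment, while the same input paired with opposite targets gives negative alignment.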

Adversarially Learned Anomaly Detection


Title	Adversarially Learned Anomaly Detection
Authors	Houssam Zenati, Manon Romain, Chuan Sheng Foo, Bruno Lecouat, Vijay Ramaseshan Chandrasekhar
Abstract	Anomaly detection is a significant and hence well-studied problem. However, developing effective anomaly detection methods for complex and high-dimensional data remains a challenge. As Generative Adversarial Networks (GANs) are able to model the complex high-dimensional distributions of real-world data, they offer a promising approach to address this challenge. In this work, we propose an anomaly detection method, Adversarially Learned Anomaly Detection (ALAD) based on bi-directional GANs, that derives adversarially learned features for the anomaly detection task. ALAD then uses reconstruction errors based on these adversarially learned features to determine if a data sample is anomalous. ALAD builds on recent advances to ensure data-space and latent-space cycle-consistencies and stabilize GAN training, which results in significantly improved anomaly detection performance. ALAD achieves state-of-the-art performance on a range of image and tabular datasets while being several hundred-fold faster at test time than the only published GAN-based method.
Tasks	Anomaly Detection
Published	2018-12-06
URL	http://arxiv.org/abs/1812.02288v1
PDF	http://arxiv.org/pdf/1812.02288v1.pdf
PWC	https://paperswithcode.com/paper/adversarially-learned-anomaly-detection
Repo	https://github.com/houssamzenati/Efficient-GAN-Anomaly-Detection
Framework	tf
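
The scoring step in the abstract, a reconstruction error measured in a learned feature space, can be sketched as follows. Everything concrete here is a stand-in: `encode`, `generate` and `features` are hypothetical callables replacing the trained networks, and the L1 distance is an assumed choice of error.

```python
def anomaly_score(x, encode, generate, features):
    """Feature-space reconstruction error of x; larger means more anomalous."""
    x_rec = generate(encode(x))          # map to latent space and back
    fx, frec = features(x), features(x_rec)
    return sum(abs(a - b) for a, b in zip(fx, frec))

# Toy stand-ins: "normal" data is assumed to lie on the line (t, t).
def toy_encode(x):
    return (x[0] + x[1]) / 2.0           # latent code: position along the line

def toy_generate(z):
    return (z, z)                        # decode back onto the line

def toy_features(x):
    return x                             # identity in place of learned features
```

With these stand-ins a point on the line such as `(1.0, 1.0)` reconstructs exactly and scores 0, whereas an off-line point such as `(1.0, -1.0)` reconstructs to `(0.0, 0.0)` and receives a larger score.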

q-Space Novelty Detection with Variational Autoencoders


Title	q-Space Novelty Detection with Variational Autoencoders
Authors	Aleksei Vasilev, Vladimir Golkov, Marc Meissner, Ilona Lipp, Eleonora Sgarlata, Valentina Tomassini, Derek K. Jones, Daniel Cremers
Abstract	In machine learning, novelty detection is the task of identifying novel unseen data. During training, only samples from the normal class are available. Test samples are classified as normal or abnormal by assignment of a novelty score. Here we propose novelty detection methods based on training variational autoencoders (VAEs) on normal data. Since abnormal samples are not used during training, we define novelty metrics based on the (partially complementary) assumptions that the VAE is less capable of reconstructing abnormal samples well; that abnormal samples more strongly violate the VAE regularizer; and that abnormal samples differ from normal samples not only in input-feature space, but also in the VAE latent space and VAE output. These approaches, combined with various possibilities of using (e.g. sampling) the probabilistic VAE to obtain scalar novelty scores, yield a large family of methods. We apply these methods to magnetic resonance imaging, namely to the detection of diffusion-space (q-space) abnormalities in diffusion MRI scans of multiple sclerosis patients, i.e. to detect multiple sclerosis lesions without using any lesion labels for training. Many of our methods outperform previously proposed q-space novelty detection methods. We also evaluate the proposed methods on the MNIST handwritten digits dataset and show that many of them are able to outperform the state of the art.
Tasks
Published	2018-06-08
URL	http://arxiv.org/abs/1806.02997v2
PDF	http://arxiv.org/pdf/1806.02997v2.pdf
PWC	https://paperswithcode.com/paper/q-space-novelty-detection-with-variational
Repo	https://github.com/VAlex22/ND_VAE
Framework	none

Goal-based Course Recommendation


Title	Goal-based Course Recommendation
Authors	Weijie Jiang, Zachary A. Pardos, Qiang Wei
Abstract	With cross-disciplinary academic interests increasing and academic advising resources over capacity, the importance of exploring data-assisted methods to support student decision making has never been higher. We build on the findings and methodologies of a quickly developing literature around prediction and recommendation in higher education and develop a novel recurrent neural network-based recommendation system for suggesting courses to help students prepare for target courses of interest, personalized to their estimated prior knowledge background and zone of proximal development. We validate the model using tests of grade prediction and the ability to recover prerequisite relationships articulated by the university. In the third validation, we run the fully personalized recommendation for students the semester before taking a historically difficult course and observe differential overlap with our would-be suggestions. While not proof of causal effectiveness, these three evaluation perspectives on the performance of the goal-based model build confidence and bring us one step closer to deployment of this personalized course preparation affordance in the wild.
Tasks	Decision Making
Published	2018-12-25
URL	http://arxiv.org/abs/1812.10078v1
PDF	http://arxiv.org/pdf/1812.10078v1.pdf
PWC	https://paperswithcode.com/paper/goal-based-course-recommendation
Repo	https://github.com/CAHLR/goal-based-recommendation
Framework	pytorch

Bringing replication and reproduction together with generalisability in NLP: Three reproduction studies for Target Dependent Sentiment Analysis


Title	Bringing replication and reproduction together with generalisability in NLP: Three reproduction studies for Target Dependent Sentiment Analysis
Authors	Andrew Moore, Paul Rayson
Abstract	Lack of repeatability and generalisability are two significant threats to continuing scientific development in Natural Language Processing. Language models and learning methods are so complex that scientific conference papers no longer contain enough space for the technical depth required for replication or reproduction. Taking Target Dependent Sentiment Analysis as a case study, we show how recent work in the field has not consistently released code, or described settings for learning methods in enough detail, and lacks comparability and generalisability in train, test or validation data. To investigate generalisability and to enable state of the art comparative evaluations, we carry out the first reproduction studies of three groups of complementary methods and perform the first large-scale mass evaluation on six different English datasets. Reflecting on our experiences, we recommend that future replication or reproduction experiments should always consider a variety of datasets alongside documenting and releasing their methods and published code in order to minimise the barriers to both repeatability and generalisability. We have released our code with a model zoo on GitHub with Jupyter Notebooks to aid understanding and full documentation, and we recommend that others do the same with their papers at submission time through an anonymised GitHub account.
Tasks	Sentiment Analysis
Published	2018-06-13
URL	http://arxiv.org/abs/1806.05219v2
PDF	http://arxiv.org/pdf/1806.05219v2.pdf
PWC	https://paperswithcode.com/paper/bringing-replication-and-reproduction
Repo	https://github.com/apmoore1/Bella
Framework	none

Learning To Split and Rephrase From Wikipedia Edit History


Title	Learning To Split and Rephrase From Wikipedia Edit History
Authors	Jan A. Botha, Manaal Faruqui, John Alex, Jason Baldridge, Dipanjan Das
Abstract	Split and rephrase is the task of breaking down a sentence into shorter ones that together convey the same meaning. We extract a rich new dataset for this task by mining Wikipedia’s edit history: WikiSplit contains one million naturally occurring sentence rewrites, providing sixty times more distinct split examples and a ninety times larger vocabulary than the WebSplit corpus introduced by Narayan et al. (2017) as a benchmark for this task. Incorporating WikiSplit as training data produces a model with qualitatively better predictions that score 32 BLEU points above the prior best result on the WebSplit benchmark.
Tasks
Published	2018-08-28
URL	http://arxiv.org/abs/1808.09468v1
PDF	http://arxiv.org/pdf/1808.09468v1.pdf
PWC	https://paperswithcode.com/paper/learning-to-split-and-rephrase-from-wikipedia
Repo	https://github.com/google-research-datasets/wiki-split
Framework	none

Orthogonal Random Forest for Causal Inference


Title	Orthogonal Random Forest for Causal Inference
Authors	Miruna Oprescu, Vasilis Syrgkanis, Zhiwei Steven Wu
Abstract	We propose the orthogonal random forest, an algorithm that combines Neyman-orthogonality to reduce sensitivity with respect to estimation error of nuisance parameters with generalized random forests (Athey et al., 2017)–a flexible non-parametric method for statistical estimation of conditional moment models using random forests. We provide a consistency rate and establish asymptotic normality for our estimator. We show that under mild assumptions on the consistency rate of the nuisance estimator, we can achieve the same error rate as an oracle with a priori knowledge of these nuisance parameters. We show that when the nuisance functions have a locally sparse parametrization, then a local $\ell_1$-penalized regression achieves the required rate. We apply our method to estimate heterogeneous treatment effects from observational data with discrete treatments or continuous treatments, and we show that, unlike prior work, our method provably allows to control for a high-dimensional set of variables under standard sparsity conditions. We also provide a comprehensive empirical evaluation of our algorithm on both synthetic and real data.
Tasks	Causal Inference
Published	2018-06-09
URL	https://arxiv.org/abs/1806.03467v4
PDF	https://arxiv.org/pdf/1806.03467v4.pdf
PWC	https://paperswithcode.com/paper/orthogonal-random-forest-for-causal-inference
Repo	https://github.com/Microsoft/EconML
Framework	none
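
The role of orthogonality in reducing sensitivity to nuisance estimation can be illustrated with a much simpler, related technique: residual-on-residual regression for a constant treatment effect. This is not the orthogonal random forest itself (there is no forest, no heterogeneity and no cross-fitting), and the simulated data, the linear nuisance models and all function names are assumptions made for the sketch.

```python
import random

def ols_slope_intercept(x, y):
    """Ordinary least squares fit of y on a single regressor x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def residuals(x, y):
    """Part of y not explained by a linear fit on x (the nuisance model)."""
    slope, intercept = ols_slope_intercept(x, y)
    return [b - (slope * a + intercept) for a, b in zip(x, y)]

def residual_on_residual(w, t, y):
    """Estimate the effect of t on y after partialling the control w out of both."""
    t_res = residuals(w, t)
    y_res = residuals(w, y)
    return sum(a * b for a, b in zip(t_res, y_res)) / sum(a * a for a in t_res)

def simulate(n=2000, theta=2.0, seed=0):
    """Hypothetical data: w confounds both the treatment t and the outcome y."""
    rng = random.Random(seed)
    w = [rng.gauss(0, 1) for _ in range(n)]
    t = [0.8 * wi + rng.gauss(0, 1) for wi in w]
    y = [theta * ti + 1.5 * wi + rng.gauss(0, 1) for ti, wi in zip(t, w)]
    return w, t, y
```

On the simulated data, `residual_on_residual(*simulate())` should land near the true effect of 2.0, whereas regressing `y` on `t` alone would be biased upward by the confounder `w`.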