Paper Group AWR 204
Scaling All-Goals Updates in Reinforcement Learning Using Convolutional Neural Networks
Title | Scaling All-Goals Updates in Reinforcement Learning Using Convolutional Neural Networks |
Authors | Fabio Pardo, Vitaly Levdik, Petar Kormushev |
Abstract | Being able to reach any desired location in the environment can be a valuable asset for an agent. Learning a policy to navigate between all pairs of states individually is often not feasible. An all-goals updating algorithm uses each transition to learn Q-values towards all goals simultaneously and off-policy. However, the cost of performing numerous updates in parallel has so far limited the approach to small tabular cases. To tackle this problem, we propose to use convolutional network architectures to generate Q-values and updates for a large number of goals at once. We demonstrate the accuracy and generalization qualities of the proposed method on randomly generated mazes and Sokoban puzzles. In the case of on-screen goal coordinates, the resulting mapping from frames to distance maps directly informs the agent about which places are reachable and in how many steps. As an example application, we show that replacing the random actions in epsilon-greedy exploration with several actions towards feasible goals generates better exploratory trajectories on Montezuma’s Revenge and Super Mario All-Stars games. |
Tasks | Montezuma’s Revenge, Q-Learning, SNES Games |
Published | 2018-10-06 |
URL | https://arxiv.org/abs/1810.02927v2 |
PDF | https://arxiv.org/pdf/1810.02927v2.pdf |
PWC | https://paperswithcode.com/paper/q-map-a-convolutional-approach-for-goal |
Repo | https://github.com/yl3829/Q-map |
Framework | tf |
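The all-goals update described above can be illustrated in its original tabular form. This is the setting the abstract says was limited to small problems; the paper's contribution is to produce these per-goal values with a convolutional network instead. The sketch below is a minimal version under stated assumptions:

- The reward convention is assumed: reaching the goal yields reward 1 and ends the goal-conditioned episode, otherwise the target bootstraps with discount `gamma`.
- `all_goals_update` and `QTable` are hypothetical names.

```python
from typing import Dict, List, Tuple

State = Tuple[int, int]                          # (row, col) grid coordinate
QTable = Dict[Tuple[State, int, State], float]   # (state, action, goal) -> Q-value


def all_goals_update(q: QTable, goals: List[State], s: State, a: int,
                     s_next: State, n_actions: int,
                     alpha: float = 0.5, gamma: float = 0.9) -> None:
    """Use one observed transition (s, a, s_next) to update Q(s, a, g) for every goal g.

    Assumed convention: reaching goal g gives reward 1 and terminates the
    g-conditioned episode; otherwise the target is gamma * max_b Q(s_next, b, g).
    """
    for g in goals:
        if s_next == g:
            target = 1.0
        else:
            target = gamma * max(q.get((s_next, b, g), 0.0) for b in range(n_actions))
        old = q.get((s, a, g), 0.0)
        q[(s, a, g)] = old + alpha * (target - old)


# A 1x4 corridor where every cell is also a goal; action 1 moves right.
corridor = [(0, c) for c in range(4)]
q: QTable = {}
RIGHT = 1
all_goals_update(q, corridor, (0, 1), RIGHT, (0, 2), n_actions=2)
all_goals_update(q, corridor, (0, 0), RIGHT, (0, 1), n_actions=2)
print(q[((0, 0), RIGHT, (0, 2))])
```

After the second transition, the start state already holds a discounted value for a goal two steps away, even though that goal was never targeted explicitly. This is the off-policy, all-goals effect; the cost is one update per goal per transition.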
Deterministic Implementations for Reproducibility in Deep Reinforcement Learning
Title | Deterministic Implementations for Reproducibility in Deep Reinforcement Learning |
Authors | Prabhat Nagarajan, Garrett Warnell, Peter Stone |
Abstract | While deep reinforcement learning (DRL) has led to numerous successes in recent years, reproducing these successes can be extremely challenging. One reproducibility challenge particularly relevant to DRL is nondeterminism in the training process, which can substantially affect the results. Motivated by this challenge, we study the positive impacts of deterministic implementations in eliminating nondeterminism in training. To do so, we consider the particular case of the deep Q-learning algorithm, for which we produce a deterministic implementation by identifying and controlling all sources of nondeterminism in the training process. One by one, we then allow individual sources of nondeterminism to affect our otherwise deterministic implementation, and measure the impact of each source on the variance in performance. We find that individual sources of nondeterminism can substantially impact the performance of the agent, illustrating the benefits of deterministic implementations. We also discuss the important role of deterministic implementations in achieving exact replicability of results. |
Tasks | Q-Learning |
Published | 2018-09-15 |
URL | https://arxiv.org/abs/1809.05676v5 |
PDF | https://arxiv.org/pdf/1809.05676v5.pdf |
PWC | https://paperswithcode.com/paper/deterministic-implementations-for |
Repo | https://github.com/prabhatnagarajan/repro_dqn |
Framework | pytorch |
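The "control every source, then release them one at a time" procedure can be sketched on a toy problem. Each source of randomness gets its own dedicated generator, so it can be pinned or released independently. The sketch rests on several assumptions:

- The two-armed bandit loop and the name `toy_training_run` are hypothetical stand-ins for a DQN training loop.
- A real deep Q-learning setup has further sources, such as framework-level seeds, GPU kernels and environment resets, that this sketch does not cover.

```python
import random
from typing import List, Optional


def toy_training_run(explore_seed: Optional[int], env_seed: Optional[int],
                     steps: int = 50, epsilon: float = 0.3) -> List[int]:
    """Hypothetical stand-in for a training loop on a two-armed bandit.

    Each source of randomness has its own generator: an integer seed pins it,
    None releases it (random.Random(None) seeds from system entropy).
    """
    explore_rng = random.Random(explore_seed)  # source 1: epsilon-greedy exploration
    env_rng = random.Random(env_seed)          # source 2: stochastic rewards
    q, counts, actions = [0.0, 0.0], [0, 0], []
    for _ in range(steps):
        if explore_rng.random() < epsilon:
            a = explore_rng.randrange(2)
        else:
            a = 0 if q[0] >= q[1] else 1
        reward = env_rng.gauss(1.0 if a == 1 else 0.5, 1.0)
        counts[a] += 1
        q[a] += (reward - q[a]) / counts[a]    # incremental sample-mean update
        actions.append(a)
    return actions


# Fully pinned: repeated runs give the identical action sequence.
assert toy_training_run(0, 0) == toy_training_run(0, 0)
# Releasing only the reward noise isolates that source's effect across runs.
varied = toy_training_run(0, None)
```

Using separate `random.Random` instances rather than the global `random` state is the key design choice. Releasing one source then does not perturb the draws of the others.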
Structured Triplet Learning with POS-tag Guided Attention for Visual Question Answering
Title | Structured Triplet Learning with POS-tag Guided Attention for Visual Question Answering |
Authors | Zhe Wang, Xiaoyi Liu, Liangjian Chen, Limin Wang, Yu Qiao, Xiaohui Xie, Charless Fowlkes |
Abstract | Visual question answering (VQA) is of significant interest due to its potential to be a strong test of image understanding systems and to probe the connection between language and vision. Despite much recent progress, general VQA is far from a solved problem. In this paper, we focus on the VQA multiple-choice task, and provide some good practices for designing an effective VQA model that can capture language-vision interactions and perform joint reasoning. We explore mechanisms of incorporating part-of-speech (POS) tag guided attention, convolutional n-grams, triplet attention interactions between the image, question and candidate answer, and structured learning for triplets based on image-question pairs. We evaluate our models on two popular datasets: Visual7W and VQA Real Multiple Choice. Our final model achieves state-of-the-art performance of 68.2% on Visual7W, and a very competitive performance of 69.6% on the test-standard split of VQA Real Multiple Choice. |
Tasks | Question Answering, Visual Question Answering |
Published | 2018-01-24 |
URL | http://arxiv.org/abs/1801.07853v1 |
PDF | http://arxiv.org/pdf/1801.07853v1.pdf |
PWC | https://paperswithcode.com/paper/structured-triplet-learning-with-pos-tag |
Repo | https://github.com/wangzheallen/STL-VQA |
Framework | tf |
Identifying Sources and Sinks in the Presence of Multiple Agents with Gaussian Process Vector Calculus
Title | Identifying Sources and Sinks in the Presence of Multiple Agents with Gaussian Process Vector Calculus |
Authors | Adam D. Cobb, Richard Everett, Andrew Markham, Stephen J. Roberts |
Abstract | In systems of multiple agents, identifying the cause of observed agent dynamics is challenging. Often, these agents operate in diverse, non-stationary environments, where models rely on hand-crafted environment-specific features to infer influential regions in the system’s surroundings. To overcome the limitations of these inflexible models, we present GP-LAPLACE, a technique for locating sources and sinks from trajectories in time-varying fields. Using Gaussian processes, we jointly infer a spatio-temporal vector field, as well as canonical vector calculus operations on that field. Notably, we do this using only agent trajectories, without requiring knowledge of the environment, and we also obtain a metric for denoting the significance of inferred causal features in the environment by exploiting our probabilistic method. To evaluate our approach, we apply it to both synthetic and real-world GPS data, demonstrating the applicability of our technique in the presence of multiple agents, as well as its superiority over existing methods. |
Tasks | Gaussian Processes |
Published | 2018-02-22 |
URL | http://arxiv.org/abs/1802.10446v2 |
PDF | http://arxiv.org/pdf/1802.10446v2.pdf |
PWC | https://paperswithcode.com/paper/identifying-sources-and-sinks-in-the-presence |
Repo | https://github.com/AdamCobb/GP-LAPLACE |
Framework | tf |
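The vector-calculus idea behind locating sources and sinks is that the sign of a field's divergence separates the two. Positive divergence marks outward flow (a source) and negative divergence marks inward flow (a sink), which is the standard convention. GP-LAPLACE infers the field and its derivatives jointly with Gaussian processes from trajectories. This sketch does not reproduce that. It assumes the field is already known and uses central finite differences, purely to illustrate the classification step. `divergence`, `classify` and the example fields are hypothetical.

```python
from typing import Callable, Tuple

Vec = Tuple[float, float]
Field = Callable[[float, float], Vec]


def divergence(field: Field, x: float, y: float, h: float = 1e-4) -> float:
    """Central finite-difference estimate of dFx/dx + dFy/dy at (x, y)."""
    dfx_dx = (field(x + h, y)[0] - field(x - h, y)[0]) / (2 * h)
    dfy_dy = (field(x, y + h)[1] - field(x, y - h)[1]) / (2 * h)
    return dfx_dx + dfy_dy


def classify(field: Field, x: float, y: float, tol: float = 1e-6) -> str:
    """Label a point by the sign of the divergence there."""
    d = divergence(field, x, y)
    if d > tol:
        return "source"
    if d < -tol:
        return "sink"
    return "neither"


def toward_point(x: float, y: float) -> Vec:
    """Hypothetical field: agents drift toward (1, 2)."""
    return (1.0 - x, 2.0 - y)


def away_from_origin(x: float, y: float) -> Vec:
    """Hypothetical field: agents disperse from the origin."""
    return (x, y)


print(classify(toward_point, 0.3, 0.7), classify(away_from_origin, -1.0, 4.0))
```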
An Auto-Encoder Matching Model for Learning Utterance-Level Semantic Dependency in Dialogue Generation
Title | An Auto-Encoder Matching Model for Learning Utterance-Level Semantic Dependency in Dialogue Generation |
Authors | Liangchen Luo, Jingjing Xu, Junyang Lin, Qi Zeng, Xu Sun |
Abstract | Generating semantically coherent responses is still a major challenge in dialogue generation. Unlike in conventional text generation tasks, the mapping between inputs and responses in conversations is more complicated, and it requires understanding utterance-level semantic dependency, a relation between the whole meanings of inputs and outputs. To address this problem, we propose an Auto-Encoder Matching (AEM) model to learn such dependency. The model contains two auto-encoders and one mapping module. The auto-encoders learn the semantic representations of inputs and responses, and the mapping module learns to connect the utterance-level representations. Experimental results from automatic and human evaluations demonstrate that our model is capable of generating responses of higher coherence and fluency than baseline models. The code is available at https://github.com/lancopku/AMM |
Tasks | Dialogue Generation, Text Generation |
Published | 2018-08-27 |
URL | http://arxiv.org/abs/1808.08795v1 |
PDF | http://arxiv.org/pdf/1808.08795v1.pdf |
PWC | https://paperswithcode.com/paper/an-auto-encoder-matching-model-for-learning |
Repo | https://github.com/lancopku/AMM |
Framework | tf |
Dialogue Learning with Human Teaching and Feedback in End-to-End Trainable Task-Oriented Dialogue Systems
Title | Dialogue Learning with Human Teaching and Feedback in End-to-End Trainable Task-Oriented Dialogue Systems |
Authors | Bing Liu, Gokhan Tur, Dilek Hakkani-Tur, Pararth Shah, Larry Heck |
Abstract | In this work, we present a hybrid learning method for training task-oriented dialogue systems through online user interactions. Popular methods for learning task-oriented dialogues include applying reinforcement learning with user feedback on supervised pre-training models. The efficiency of such a learning method may suffer from the mismatch of dialogue state distribution between the offline training and online interactive learning stages. To address this challenge, we propose a hybrid imitation and reinforcement learning method, with which a dialogue agent can effectively learn from its interaction with users by learning from human teaching and feedback. We design a neural network based task-oriented dialogue agent that can be optimized end-to-end with the proposed learning method. Experimental results show that our end-to-end dialogue agent can learn effectively from the mistakes it makes via imitation learning from user teaching. Applying reinforcement learning with user feedback after the imitation learning stage further improves the agent’s capability in successfully completing a task. |
Tasks | Dialogue State Tracking, Imitation Learning, Task-Oriented Dialogue Systems |
Published | 2018-04-18 |
URL | http://arxiv.org/abs/1804.06512v1 |
PDF | http://arxiv.org/pdf/1804.06512v1.pdf |
PWC | https://paperswithcode.com/paper/dialogue-learning-with-human-teaching-and |
Repo | https://github.com/google-research-datasets/simulated-dialogue |
Framework | none |
Robustness Guarantees for Bayesian Inference with Gaussian Processes
Title | Robustness Guarantees for Bayesian Inference with Gaussian Processes |
Authors | Luca Cardelli, Marta Kwiatkowska, Luca Laurenti, Andrea Patane |
Abstract | Bayesian inference and Gaussian processes are widely used in applications ranging from robotics and control to biological systems. Many of these applications are safety-critical and require a characterization of the uncertainty associated with the learning model and formal guarantees on its predictions. In this paper, we define a robustness measure for Bayesian inference against input perturbations, given by the probability that, for a test point and a compact set in the input space containing the test point, the prediction of the learning model will remain $\delta$-close for all the points in the set, for $\delta > 0$. Such measures can be used to provide formal guarantees for the absence of adversarial examples. By employing the theory of Gaussian processes, we derive tight upper bounds on the resulting robustness by utilising the Borell-TIS inequality, and propose algorithms for their computation. We evaluate our techniques on two examples, a GP regression problem and a fully-connected deep neural network, where we rely on weak convergence to GPs to study adversarial examples on the MNIST dataset. |
Tasks | Bayesian Inference, Gaussian Processes |
Published | 2018-09-17 |
URL | http://arxiv.org/abs/1809.06452v2 |
PDF | http://arxiv.org/pdf/1809.06452v2.pdf |
PWC | https://paperswithcode.com/paper/robustness-guarantees-for-bayesian-inference |
Repo | https://github.com/andreapatane/checkGP |
Framework | none |
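Read literally, the abstract's verbal definition of the robustness measure can be written as a formula. Here $f$ stands for the model's prediction treated as a random function (the GP), $x^*$ is the test point, and $T$ is the compact set containing it. The norm is an assumption; for a scalar output it reduces to an absolute value. The paper's Borell-TIS upper bounds on this quantity are not reproduced here.

```latex
\phi(x^*, T, \delta) \;=\; \Pr\!\left( \forall x \in T :\; \lVert f(x^*) - f(x) \rVert \le \delta \right),
\qquad x^* \in T,\ \delta > 0
```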
An accurate retrieval through R-MAC+ descriptors for landmark recognition
Title | An accurate retrieval through R-MAC+ descriptors for landmark recognition |
Authors | Federico Magliani, Andrea Prati |
Abstract | The landmark recognition problem is far from being solved, but with the use of features extracted from intermediate layers of Convolutional Neural Networks (CNNs), excellent results have been obtained. In this work, we propose some improvements on the creation of R-MAC descriptors in order to make the newly-proposed R-MAC+ descriptors more representative than the previous ones. However, the main contribution of this paper is a novel retrieval technique that exploits the fine representativeness of the MAC descriptors of the database images. Using these descriptors, called “db regions”, during the retrieval stage greatly improves performance. The proposed method is tested on different public datasets: Oxford5k, Paris6k and Holidays. It outperforms the state-of-the-art results on Holidays and achieves excellent results on Oxford5k and Paris6k, surpassed only by approaches based on fine-tuning strategies. |
Tasks | |
Published | 2018-06-22 |
URL | http://arxiv.org/abs/1806.08565v1 |
PDF | http://arxiv.org/pdf/1806.08565v1.pdf |
PWC | https://paperswithcode.com/paper/an-accurate-retrieval-through-r-mac |
Repo | https://github.com/fmaglia/keras_rmac_plus |
Framework | tf |
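The R-MAC descriptors mentioned above build on a simple aggregation: max-pool each CNN channel inside a region (a MAC vector), L2-normalize each regional vector, sum them, and L2-normalize the result. The sketch below shows only this baseline R-MAC step under stated assumptions:

- The region list is supplied by the caller; the usual multi-scale region sampling is not reproduced.
- The PCA-whitening of regional vectors is omitted.
- The R-MAC+ changes and the "db regions" retrieval are not shown.
- All names are hypothetical.

```python
import math
from typing import List, Tuple

FeatureMap = List[List[List[float]]]   # [channel][row][col] activations
Region = Tuple[int, int, int, int]     # (row0, col0, row1, col1), end-exclusive


def l2_normalize(v: List[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def mac(fmap: FeatureMap, region: Region) -> List[float]:
    """Maximum activation of each channel inside the region."""
    r0, c0, r1, c1 = region
    return [max(ch[r][c] for r in range(r0, r1) for c in range(c0, c1)) for ch in fmap]


def rmac(fmap: FeatureMap, regions: List[Region]) -> List[float]:
    """Sum of L2-normalized regional MAC vectors, L2-normalized again."""
    total = [0.0] * len(fmap)
    for region in regions:
        total = [t + x for t, x in zip(total, l2_normalize(mac(fmap, region)))]
    return l2_normalize(total)


# Two channels on a 2x2 grid; regions: whole map, left column, right column.
fmap = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 3.0]]]
regions = [(0, 0, 2, 2), (0, 0, 2, 1), (0, 1, 2, 2)]
descriptor = rmac(fmap, regions)
```

Normalizing each region before summing keeps one strongly activated region from dominating the image-level descriptor.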
Learning to Learn without Forgetting by Maximizing Transfer and Minimizing Interference
Title | Learning to Learn without Forgetting by Maximizing Transfer and Minimizing Interference |
Authors | Matthew Riemer, Ignacio Cases, Robert Ajemian, Miao Liu, Irina Rish, Yuhai Tu, Gerald Tesauro |
Abstract | Poor performance in continual learning over non-stationary distributions of data remains a major challenge in scaling neural network learning to more human-realistic settings. In this work, we propose a new conceptualization of the continual learning problem in terms of a temporally symmetric trade-off between transfer and interference that can be optimized by enforcing gradient alignment across examples. We then propose a new algorithm, Meta-Experience Replay (MER), that directly exploits this view by combining experience replay with optimization-based meta-learning. This method learns parameters that make interference based on future gradients less likely and transfer based on future gradients more likely. We conduct experiments across continual lifelong supervised learning benchmarks and non-stationary reinforcement learning environments, demonstrating that our approach consistently outperforms recently proposed baselines for continual learning. Our experiments show that the gap between the performance of MER and baseline algorithms grows both as the environment gets more non-stationary and as the fraction of the total experiences stored gets smaller. |
Tasks | Continual Learning, Meta-Learning |
Published | 2018-10-29 |
URL | https://arxiv.org/abs/1810.11910v3 |
PDF | https://arxiv.org/pdf/1810.11910v3.pdf |
PWC | https://paperswithcode.com/paper/learning-to-learn-without-forgetting-by |
Repo | https://github.com/mattriemer/mer |
Framework | pytorch |
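The combination of experience replay with optimization-based meta-learning can be sketched in a heavily simplified form. Each incoming example is trained on together with a few replayed memories using sequential SGD. The weights then move only a fraction `gamma` of the way toward the result, a Reptile-style interpolation. The sketch simplifies in several ways, all assumptions:

- It uses a single interpolation coefficient, whereas the paper's algorithm has more detail.
- The memory is assumed to be filled by reservoir sampling.
- The model is a toy 1-D linear regression.
- All names are hypothetical.

It is not the authors' implementation.

```python
import random
from typing import List, Tuple

Example = Tuple[float, float]  # (x, y) for a 1-D linear model y ~ w * x


def reservoir_add(memory: List[Example], item: Example, seen: int,
                  capacity: int, rng: random.Random) -> None:
    """Reservoir sampling; `seen` counts items observed so far, including `item`."""
    if len(memory) < capacity:
        memory.append(item)
    else:
        j = rng.randrange(seen)
        if j < capacity:
            memory[j] = item


def sgd_step(w: float, ex: Example, lr: float) -> float:
    x, y = ex
    return w - lr * 2.0 * (w * x - y) * x  # gradient of (w*x - y)**2


def reptile_style_update(w: float, batch: List[Example], lr: float, gamma: float) -> float:
    """Sequential SGD over the batch, then interpolate back toward the start weights."""
    w_start = w
    for ex in batch:
        w = sgd_step(w, ex, lr)
    return w_start + gamma * (w - w_start)


def mer_like_train(stream: List[Example], capacity: int = 50, k: int = 5,
                   lr: float = 0.05, gamma: float = 0.5,
                   seed: int = 0) -> Tuple[float, List[Example]]:
    rng = random.Random(seed)
    memory: List[Example] = []
    w = 0.0
    for seen, item in enumerate(stream, start=1):
        replay = rng.sample(memory, min(k, len(memory)))  # mix old and new data
        w = reptile_style_update(w, replay + [item], lr, gamma)
        reservoir_add(memory, item, seen, capacity, rng)
    return w, memory


data_rng = random.Random(1)
stream = [(x, 2.0 * x) for x in (data_rng.uniform(-1.0, 1.0) for _ in range(500))]
w, memory = mer_like_train(stream)
```

Taking several sequential SGD steps before interpolating is what, in Reptile-style methods, implicitly rewards gradients that agree across examples. That is the alignment the abstract links to transfer.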
Adversarially Learned Anomaly Detection
Title | Adversarially Learned Anomaly Detection |
Authors | Houssam Zenati, Manon Romain, Chuan Sheng Foo, Bruno Lecouat, Vijay Ramaseshan Chandrasekhar |
Abstract | Anomaly detection is a significant and hence well-studied problem. However, developing effective anomaly detection methods for complex and high-dimensional data remains a challenge. As Generative Adversarial Networks (GANs) are able to model the complex high-dimensional distributions of real-world data, they offer a promising approach to address this challenge. In this work, we propose an anomaly detection method, Adversarially Learned Anomaly Detection (ALAD) based on bi-directional GANs, that derives adversarially learned features for the anomaly detection task. ALAD then uses reconstruction errors based on these adversarially learned features to determine if a data sample is anomalous. ALAD builds on recent advances to ensure data-space and latent-space cycle-consistencies and stabilize GAN training, which results in significantly improved anomaly detection performance. ALAD achieves state-of-the-art performance on a range of image and tabular datasets while being several hundred-fold faster at test time than the only published GAN-based method. |
Tasks | Anomaly Detection |
Published | 2018-12-06 |
URL | http://arxiv.org/abs/1812.02288v1 |
PDF | http://arxiv.org/pdf/1812.02288v1.pdf |
PWC | https://paperswithcode.com/paper/adversarially-learned-anomaly-detection |
Repo | https://github.com/houssamzenati/Efficient-GAN-Anomaly-Detection |
Framework | tf |
q-Space Novelty Detection with Variational Autoencoders
Title | q-Space Novelty Detection with Variational Autoencoders |
Authors | Aleksei Vasilev, Vladimir Golkov, Marc Meissner, Ilona Lipp, Eleonora Sgarlata, Valentina Tomassini, Derek K. Jones, Daniel Cremers |
Abstract | In machine learning, novelty detection is the task of identifying novel unseen data. During training, only samples from the normal class are available. Test samples are classified as normal or abnormal by assignment of a novelty score. Here we propose novelty detection methods based on training variational autoencoders (VAEs) on normal data. Since abnormal samples are not used during training, we define novelty metrics based on the (partially complementary) assumptions that the VAE is less capable of reconstructing abnormal samples; that abnormal samples more strongly violate the VAE regularizer; and that abnormal samples differ from normal samples not only in input-feature space, but also in the VAE latent space and VAE output. These approaches, combined with various possibilities of using (e.g. sampling) the probabilistic VAE to obtain scalar novelty scores, yield a large family of methods. We apply these methods to magnetic resonance imaging, namely to the detection of diffusion-space (q-space) abnormalities in diffusion MRI scans of multiple sclerosis patients, i.e. to detect multiple sclerosis lesions without using any lesion labels for training. Many of our methods outperform previously proposed q-space novelty detection methods. We also evaluate the proposed methods on the MNIST handwritten digits dataset and show that many of them are able to outperform the state of the art. |
Tasks | |
Published | 2018-06-08 |
URL | http://arxiv.org/abs/1806.02997v2 |
PDF | http://arxiv.org/pdf/1806.02997v2.pdf |
PWC | https://paperswithcode.com/paper/q-space-novelty-detection-with-variational |
Repo | https://github.com/VAlex22/ND_VAE |
Framework | none |
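Of the score families listed above, the reconstruction-error variant is the simplest to illustrate. Score each sample by how poorly a model trained on normal data reconstructs it, then compare against scores seen on normal data. The sketch rests on these assumptions:

- `diagonal_projection` is a hypothetical stand-in for a trained VAE's reconstruction.
- The quantile threshold is an assumed decision rule; the abstract describes scores, not how they are thresholded.
- The regularizer-based and latent-space scores are not shown.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def reconstruction_score(x: Vector, reconstruct: Callable[[Vector], List[float]]) -> float:
    """Squared reconstruction error; assumed larger for abnormal samples."""
    return sum((a - b) ** 2 for a, b in zip(x, reconstruct(x)))


def fit_threshold(normal_scores: List[float], quantile: float = 0.95) -> float:
    """Empirical `quantile` of the scores computed on normal data."""
    ordered = sorted(normal_scores)
    return ordered[min(len(ordered) - 1, int(quantile * len(ordered)))]


def diagonal_projection(x: Vector) -> List[float]:
    """Hypothetical stand-in for a VAE trained on data lying near x0 == x1."""
    m = sum(x) / len(x)
    return [m] * len(x)


normal = [[0.0, 0.0], [1.0, 1.1], [2.0, 1.9], [0.5, 0.5]]
threshold = fit_threshold([reconstruction_score(x, diagonal_projection) for x in normal])
is_novel = reconstruction_score([1.0, -1.0], diagonal_projection) > threshold
```

Because the threshold is fitted only on normal scores, no abnormal labels are needed. This mirrors the label-free lesion-detection setting the abstract describes.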
Goal-based Course Recommendation
Title | Goal-based Course Recommendation |
Authors | Weijie Jiang, Zachary A. Pardos, Qiang Wei |
Abstract | With cross-disciplinary academic interests increasing and academic advising resources over capacity, the importance of exploring data-assisted methods to support student decision making has never been higher. We build on the findings and methodologies of a quickly developing literature around prediction and recommendation in higher education and develop a novel recurrent neural network-based recommendation system for suggesting courses to help students prepare for target courses of interest, personalized to their estimated prior knowledge background and zone of proximal development. We validate the model using tests of grade prediction and the ability to recover prerequisite relationships articulated by the university. In the third validation, we run the fully personalized recommendation for students the semester before taking a historically difficult course and observe differential overlap with our would-be suggestions. While not proof of causal effectiveness, these three evaluation perspectives on the performance of the goal-based model build confidence and bring us one step closer to deployment of this personalized course preparation affordance in the wild. |
Tasks | Decision Making |
Published | 2018-12-25 |
URL | http://arxiv.org/abs/1812.10078v1 |
PDF | http://arxiv.org/pdf/1812.10078v1.pdf |
PWC | https://paperswithcode.com/paper/goal-based-course-recommendation |
Repo | https://github.com/CAHLR/goal-based-recommendation |
Framework | pytorch |
Bringing replication and reproduction together with generalisability in NLP: Three reproduction studies for Target Dependent Sentiment Analysis
Title | Bringing replication and reproduction together with generalisability in NLP: Three reproduction studies for Target Dependent Sentiment Analysis |
Authors | Andrew Moore, Paul Rayson |
Abstract | Lack of repeatability and generalisability are two significant threats to continuing scientific development in Natural Language Processing. Language models and learning methods are so complex that scientific conference papers no longer contain enough space for the technical depth required for replication or reproduction. Taking Target Dependent Sentiment Analysis as a case study, we show how recent work in the field has not consistently released code, or described settings for learning methods in enough detail, and lacks comparability and generalisability in train, test or validation data. To investigate generalisability and to enable state of the art comparative evaluations, we carry out the first reproduction studies of three groups of complementary methods and perform the first large-scale mass evaluation on six different English datasets. Reflecting on our experiences, we recommend that future replication or reproduction experiments should always consider a variety of datasets alongside documenting and releasing their methods and published code in order to minimise the barriers to both repeatability and generalisability. We have released our code with a model zoo on GitHub with Jupyter Notebooks to aid understanding and full documentation, and we recommend that others do the same with their papers at submission time through an anonymised GitHub account. |
Tasks | Sentiment Analysis |
Published | 2018-06-13 |
URL | http://arxiv.org/abs/1806.05219v2 |
PDF | http://arxiv.org/pdf/1806.05219v2.pdf |
PWC | https://paperswithcode.com/paper/bringing-replication-and-reproduction |
Repo | https://github.com/apmoore1/Bella |
Framework | none |
Learning To Split and Rephrase From Wikipedia Edit History
Title | Learning To Split and Rephrase From Wikipedia Edit History |
Authors | Jan A. Botha, Manaal Faruqui, John Alex, Jason Baldridge, Dipanjan Das |
Abstract | Split and rephrase is the task of breaking down a sentence into shorter ones that together convey the same meaning. We extract a rich new dataset for this task by mining Wikipedia’s edit history: WikiSplit contains one million naturally occurring sentence rewrites, providing sixty times more distinct split examples and a ninety times larger vocabulary than the WebSplit corpus introduced by Narayan et al. (2017) as a benchmark for this task. Incorporating WikiSplit as training data produces a model with qualitatively better predictions that score 32 BLEU points above the prior best result on the WebSplit benchmark. |
Tasks | |
Published | 2018-08-28 |
URL | http://arxiv.org/abs/1808.09468v1 |
PDF | http://arxiv.org/pdf/1808.09468v1.pdf |
PWC | https://paperswithcode.com/paper/learning-to-split-and-rephrase-from-wikipedia |
Repo | https://github.com/google-research-datasets/wiki-split |
Framework | none |
Orthogonal Random Forest for Causal Inference
Title | Orthogonal Random Forest for Causal Inference |
Authors | Miruna Oprescu, Vasilis Syrgkanis, Zhiwei Steven Wu |
Abstract | We propose the orthogonal random forest, an algorithm that combines Neyman-orthogonality, which reduces sensitivity with respect to estimation error of nuisance parameters, with generalized random forests (Athey et al., 2017), a flexible non-parametric method for statistical estimation of conditional moment models using random forests. We provide a consistency rate and establish asymptotic normality for our estimator. We show that under mild assumptions on the consistency rate of the nuisance estimator, we can achieve the same error rate as an oracle with a priori knowledge of these nuisance parameters. We show that when the nuisance functions have a locally sparse parametrization, a local $\ell_1$-penalized regression achieves the required rate. We apply our method to estimate heterogeneous treatment effects from observational data with discrete or continuous treatments, and we show that, unlike prior work, our method provably allows controlling for a high-dimensional set of variables under standard sparsity conditions. We also provide a comprehensive empirical evaluation of our algorithm on both synthetic and real data. |
Tasks | Causal Inference |
Published | 2018-06-09 |
URL | https://arxiv.org/abs/1806.03467v4 |
PDF | https://arxiv.org/pdf/1806.03467v4.pdf |
PWC | https://paperswithcode.com/paper/orthogonal-random-forest-for-causal-inference |
Repo | https://github.com/Microsoft/EconML |
Framework | none |
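Neyman-orthogonality can be illustrated in its simplest global form, residual-on-residual ("partialling out") estimation. Fit nuisance models for the treatment and the outcome given the controls, then regress the outcome residuals on the treatment residuals. The sketch rests on these assumptions:

- The treatment effect is a single constant; the orthogonal random forest instead localizes this moment with forest weights to get heterogeneous effects.
- Nuisances are fitted in-sample with 1-D OLS rather than with cross-fitting or $\ell_1$-penalized regression.
- The simulated data and all names are hypothetical.

```python
import random
from typing import List, Tuple


def ols_1d(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Fit y ~ a + b*x by ordinary least squares; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def residual_on_residual(w: List[float], t: List[float], y: List[float]) -> float:
    """Orthogonal estimate of theta in y = theta*t + g(w) + noise."""
    a_t, b_t = ols_1d(w, t)   # nuisance 1: E[t | w]
    a_y, b_y = ols_1d(w, y)   # nuisance 2: E[y | w]
    rt = [ti - (a_t + b_t * wi) for wi, ti in zip(w, t)]
    ry = [yi - (a_y + b_y * wi) for wi, yi in zip(w, y)]
    return sum(r1 * r2 for r1, r2 in zip(rt, ry)) / sum(r * r for r in rt)


# Simulated confounding: w drives both treatment t and outcome y; true theta = 3.
rng = random.Random(0)
w = [rng.gauss(0.0, 1.0) for _ in range(2000)]
t = [0.8 * wi + rng.gauss(0.0, 1.0) for wi in w]
y = [3.0 * ti + 2.0 * wi + rng.gauss(0.0, 1.0) for ti, wi in zip(t, w)]
theta_hat = residual_on_residual(w, t, y)
```

A naive regression of `y` on `t` alone would be biased by the confounder `w`. The residualized moment is, by construction, insensitive to first-order errors in the two nuisance fits, which is the property the abstract relies on.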