April 1, 2020

2995 words 15 mins read

Paper Group NANR 92

Discrepancy Ratio: Evaluating Model Performance When Even Experts Disagree on the Truth

Title Discrepancy Ratio: Evaluating Model Performance When Even Experts Disagree on the Truth
Authors Anonymous
Abstract In most machine learning tasks unambiguous ground truth labels can easily be acquired. However, this luxury is often not afforded to many high-stakes, real-world scenarios such as medical image interpretation, where even expert human annotators typically exhibit very high levels of disagreement with one another. While prior works have focused on overcoming noisy labels during training, the question of how to evaluate models when annotators disagree about ground truth has remained largely unexplored. To address this, we propose the discrepancy ratio: a novel, task-independent and principled framework for validating machine learning models in the presence of high label noise. Conceptually, our approach evaluates a model by comparing its predictions to those of human annotators, taking into account the degree to which annotators disagree with one another. While our approach is entirely general, we show that in the special case of binary classification, our proposed metric can be evaluated in terms of simple, closed-form expressions that depend only on aggregate statistics of the labels and not on any individual label. Finally, we demonstrate how this framework can be used effectively to validate machine learning models that we trained on two real-world tasks from medical imaging. The discrepancy ratio metric reveals what conventional metrics do not: that our models not only vastly exceed the average human performance, but even exceed the performance of the best human experts in our datasets.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=Byg-wJSYDS
PDF https://openreview.net/pdf?id=Byg-wJSYDS
PWC https://paperswithcode.com/paper/discrepancy-ratio-evaluating-model
Repo
Framework
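
A minimal numpy sketch of the general idea as the abstract states it: score a model by comparing its average disagreement with the annotators to the annotators' average disagreement with one another. The function name, the choice of disagreement measure, and the toy data are illustrative assumptions, not the paper's exact closed-form metric.

```python
import numpy as np
from itertools import combinations

def discrepancy_ratio(model_preds, annotator_labels, disagree=lambda a, b: np.mean(a != b)):
    """Sketch of a discrepancy-ratio style metric (not the paper's exact formula).

    model_preds:      (n_examples,) model predictions
    annotator_labels: (n_annotators, n_examples) labels from each annotator
    Values below 1.0 mean the model disagrees with the annotators less than the
    annotators disagree with one another.
    """
    model_vs_human = np.mean([disagree(model_preds, ann) for ann in annotator_labels])
    human_vs_human = np.mean([disagree(a, b) for a, b in combinations(annotator_labels, 2)])
    return model_vs_human / human_vs_human

# Toy usage: 3 annotators, 6 binary labels.
annotators = np.array([[1, 0, 1, 1, 0, 1],
                       [1, 0, 0, 1, 0, 1],
                       [1, 1, 1, 1, 0, 0]])
model = np.array([1, 0, 1, 1, 0, 1])
print(discrepancy_ratio(model, annotators))
```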

Combining Q-Learning and Search with Amortized Value Estimates

Title Combining Q-Learning and Search with Amortized Value Estimates
Authors Anonymous
Abstract We introduce “Search with Amortized Value Estimates” (SAVE), an approach for combining model-free Q-learning with model-based Monte-Carlo Tree Search (MCTS). In SAVE, a learned prior over state-action values is used to guide MCTS, which estimates an improved set of state-action values. The new Q-estimates are then used in combination with real experience to update the prior. This effectively amortizes the value computation performed by MCTS, resulting in a cooperative relationship between model-free learning and model-based search. SAVE can be implemented on top of any Q-learning agent with access to a model, which we demonstrate by incorporating it into agents that perform challenging physical reasoning tasks and Atari. SAVE consistently achieves higher rewards with fewer training steps, and—in contrast to typical model-based search approaches—yields strong performance with very small search budgets. By combining real experience with information computed during search, SAVE demonstrates that it is possible to improve on both the performance of model-free learning and the computational cost of planning.
Tasks Q-Learning
Published 2020-01-01
URL https://openreview.net/forum?id=SkeAaJrKDS
PDF https://openreview.net/pdf?id=SkeAaJrKDS
PWC https://paperswithcode.com/paper/combining-q-learning-and-search-with
Repo
Framework
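
Two pieces of the abstract made concrete in a short numpy sketch, with my own notation and loss form (the paper's exact search policy and amortization loss may differ): the learned Q-prior guides action selection inside MCTS before in-tree estimates are reliable, and the Q-network is trained toward both a TD target and the search-improved Q-estimates, which amortizes the value computation done by search.

```python
import numpy as np

def prior_guided_selection(q_prior, q_tree, visits, c=1.0):
    """Pick an action at a search node: blend the amortized prior with the
    in-tree estimate as visit counts grow, plus an exploration bonus."""
    total = visits.sum() + 1
    blend = visits / (visits + 1)
    score = blend * q_tree + (1 - blend) * q_prior + c * np.sqrt(np.log(total) / (visits + 1))
    return int(np.argmax(score))

def save_loss(q_net, q_mcts, action, td_target, beta=1.0):
    """q_net, q_mcts: (n_actions,) network and search value estimates for one state.
    TD error on the taken action plus an amortization term pulling the network
    toward the search-derived estimates for every action."""
    td_loss = (q_net[action] - td_target) ** 2
    amortization = np.mean((q_net - q_mcts) ** 2)
    return td_loss + beta * amortization

q_prior = np.array([0.2, 0.5, 0.1])   # network output for one state
q_tree = np.array([0.0, 0.4, 0.9])    # running MCTS value estimates
visits = np.array([0, 3, 5])
print(prior_guided_selection(q_prior, q_tree, visits))
print(save_loss(q_prior, q_tree, action=2, td_target=0.8))
```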

Improved memory in recurrent neural networks with sequential non-normal dynamics

Title Improved memory in recurrent neural networks with sequential non-normal dynamics
Authors Anonymous
Abstract Training recurrent neural networks (RNNs) is a hard problem due to degeneracies in the optimization landscape, a problem also known as the vanishing/exploding gradients problem. Short of designing new RNN architectures, various methods that have been proposed for dealing with this problem usually boil down to orthogonalization of the recurrent dynamics, either at initialization or during the entire training period. The basic motivation behind these methods is that orthogonal transformations are isometries of the Euclidean space, hence they preserve (Euclidean) norms and effectively deal with the vanishing/exploding gradients problem. However, this idea ignores the crucial effects of non-linearity and noise. In the presence of a non-linearity, orthogonal transformations no longer preserve norms, suggesting that alternative transformations might be better suited to non-linear networks. Moreover, in the presence of noise, norm preservation itself ceases to be the ideal objective. A more sensible objective is maximizing the signal-to-noise ratio (SNR) of the propagated signal instead. Previous work has shown that in the linear case, recurrent networks that maximize the SNR display strongly non-normal, sequential dynamics and orthogonal networks are highly suboptimal by this measure. Motivated by this finding, we investigate the potential of non-normal RNNs, i.e. RNNs with a non-normal recurrent connectivity matrix, in sequential processing tasks. Our experimental results show that non-normal RNNs outperform their orthogonal counterparts in a diverse range of benchmarks. We also find evidence for increased non-normality and hidden chain-like feedforward structures in trained RNNs initialized with orthogonal recurrent connectivity matrices.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=ryx1wRNFvB
PDF https://openreview.net/pdf?id=ryx1wRNFvB
PWC https://paperswithcode.com/paper/improved-memory-in-recurrent-neural-networks-1
Repo
Framework
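
A small numpy illustration of the orthogonal vs. non-normal distinction the abstract turns on. The chain (shift) matrix below is a stand-in for the sequential, feedforward-like hidden structure the paper discusses, not its exact initialization scheme.

```python
import numpy as np

n = 5
# Orthogonal recurrent matrix: preserves Euclidean norms and is normal
# (it commutes with its transpose).
W_orth, _ = np.linalg.qr(np.random.randn(n, n))

# Chain-like (shift) matrix: strictly non-normal and purely feedforward in the
# hidden units, so signals propagate along a hidden chain instead of mixing.
W_chain = np.diag(np.ones(n - 1), k=-1)

def non_normality(W):
    """Commutator norm ||W W^T - W^T W||_F; zero iff W is normal."""
    return np.linalg.norm(W @ W.T - W.T @ W)

print(non_normality(W_orth))   # ~0: orthogonal matrices are normal
print(non_normality(W_chain))  # > 0: the chain matrix is non-normal
```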

Differentiable Reasoning over a Virtual Knowledge Base

Title Differentiable Reasoning over a Virtual Knowledge Base
Authors Anonymous
Abstract We consider the task of answering complex multi-hop questions using a corpus as a virtual knowledge base (KB). In particular, we describe a neural module, DrKIT, that traverses textual data like a virtual KB, softly following paths of relations between mentions of entities in the corpus. At each step the operation uses a combination of sparse-matrix TFIDF indices and maximum inner product search (MIPS) on a special index of contextual representations. This module is differentiable, so the full system can be trained completely end-to-end using gradient-based methods, starting from natural language inputs. We also describe a pretraining scheme for the index mention encoder by generating hard negative examples using existing knowledge bases. We show that DrKIT improves accuracy by 9 points on 3-hop questions in the MetaQA dataset, cutting the gap between text-based and KB-based state-of-the-art by 70%. DrKIT is also very efficient, processing up to 10x more queries per second than existing state-of-the-art QA systems.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=SJxstlHFPH
PDF https://openreview.net/pdf?id=SJxstlHFPH
PWC https://paperswithcode.com/paper/differentiable-reasoning-over-a-virtual
Repo
Framework

Hamiltonian Generative Networks

Title Hamiltonian Generative Networks
Authors Anonymous
Abstract The Hamiltonian formalism plays a central role in classical and quantum physics. Hamiltonians are the main tool for modelling the continuous time evolution of systems with conserved quantities, and they come equipped with many useful properties, like time reversibility and smooth interpolation in time. These properties are important for many machine learning problems - from sequence prediction to reinforcement learning and density modelling - but are not typically provided out of the box by standard tools such as recurrent neural networks. In this paper, we introduce the Hamiltonian Generative Network (HGN), the first approach capable of consistently learning Hamiltonian dynamics from high-dimensional observations (such as images) without restrictive domain assumptions. Once trained, we can use HGN to sample new trajectories, perform rollouts both forward and backward in time, and even speed up or slow down the learned dynamics. We demonstrate how a simple modification of the network architecture turns HGN into a powerful normalising flow model, called Neural Hamiltonian Flow (NHF), that uses Hamiltonian dynamics to model expressive densities. Hence, we hope that our work serves as a first practical demonstration of the value that the Hamiltonian formalism can bring to machine learning. More results and video evaluations are available at: http://tiny.cc/hgn
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=HJenn6VFvB
PDF https://openreview.net/pdf?id=HJenn6VFvB
PWC https://paperswithcode.com/paper/hamiltonian-generative-networks-1
Repo
Framework
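
To make the dynamics part concrete, here is a hedged PyTorch sketch: latent phase-space coordinates (q, p) are rolled forward under a learned Hamiltonian with a leapfrog integrator, and rollouts can be run backward by flipping the momenta. The network size, integrator, and step counts are illustrative assumptions; the full HGN additionally learns an encoder from images to (q, p) and a decoder back to pixels, both omitted here.

```python
import torch
import torch.nn as nn

class Hamiltonian(nn.Module):
    """Scalar energy H(q, p) parameterized by a small MLP."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * dim, 64), nn.Softplus(), nn.Linear(64, 1))

    def forward(self, q, p):
        return self.net(torch.cat([q, p], dim=-1)).sum()

def leapfrog(H, q, p, dt=0.1, steps=10):
    """Integrate Hamilton's equations dq/dt = dH/dp, dp/dt = -dH/dq."""
    q = q.detach().clone().requires_grad_(True)
    p = p.detach().clone().requires_grad_(True)
    for _ in range(steps):
        dHdq, = torch.autograd.grad(H(q, p), q, create_graph=True)
        p = p - 0.5 * dt * dHdq                  # half "kick" on the momentum
        dHdp, = torch.autograd.grad(H(q, p), p, create_graph=True)
        q = q + dt * dHdp                        # "drift" on the position
        dHdq, = torch.autograd.grad(H(q, p), q, create_graph=True)
        p = p - 0.5 * dt * dHdq                  # second half kick
    return q, p

dim = 4
H = Hamiltonian(dim)
q0, p0 = torch.randn(1, dim), torch.randn(1, dim)
qT, pT = leapfrog(H, q0, p0)          # forward rollout in the latent space
q_rev, p_rev = leapfrog(H, qT, -pT)   # reverse-time rollout by flipping momenta
```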

GenDICE: Generalized Offline Estimation of Stationary Values

Title GenDICE: Generalized Offline Estimation of Stationary Values
Authors Anonymous
Abstract An important problem that arises in reinforcement learning and Monte Carlo methods is estimating quantities defined by the stationary distribution of a Markov chain. In many real-world applications, access to the underlying transition operator is limited to a fixed set of data that has already been collected, without additional interaction with the environment being available. We show that consistent estimation remains possible in this scenario, and that effective estimation can still be achieved in important applications. Our approach is based on estimating a ratio that corrects for the discrepancy between the stationary and empirical distributions, derived from fundamental properties of the stationary distribution, and exploiting constraint reformulations based on variational divergence minimization. The resulting algorithm, GenDICE, is straightforward and effective. We prove the consistency of the method under general conditions, provide a detailed error analysis, and demonstrate strong empirical performance on benchmark tasks, including off-line PageRank and off-policy policy evaluation.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=HkxlcnVFwB
PDF https://openreview.net/pdf?id=HkxlcnVFwB
PWC https://paperswithcode.com/paper/gendice-generalized-offline-estimation-of
Repo
Framework

Extreme Classification via Adversarial Softmax Approximation

Title Extreme Classification via Adversarial Softmax Approximation
Authors Anonymous
Abstract Training a classifier over a large number of classes, known as ‘extreme classification’, has become a topic of major interest with applications in technology, science, and e-commerce. Traditional softmax regression induces a gradient cost proportional to the number of classes C, which often is prohibitively expensive. A popular scalable softmax approximation relies on uniform negative sampling, which suffers from slow convergence due to a poor signal-to-noise ratio. In this paper, we propose a simple training method for drastically enhancing the gradient signal by drawing negative samples from an adversarial model that mimics the data distribution. Our contributions are three-fold: (i) an adversarial sampling mechanism that produces negative samples at a cost only logarithmic in C, thus still resulting in cheap gradient updates; (ii) a mathematical proof that this adversarial sampling minimizes the gradient variance while any bias due to non-uniform sampling can be removed; (iii) experimental results on large scale data sets that show a reduction of the training time by an order of magnitude relative to several competitive baselines.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=rJxe3xSYDS
PDF https://openreview.net/pdf?id=rJxe3xSYDS
PWC https://paperswithcode.com/paper/extreme-classification-via-adversarial
Repo
Framework
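
A sketch of the sampling-based objective, with the adversarial model abstracted as an arbitrary proposal distribution over classes. Subtracting the log proposal probability from the sampled logits is the standard sampled-softmax correction for non-uniform negative sampling; the paper's exact estimator and its adversarial sampler are not reproduced here, and all names below are my own.

```python
import numpy as np

def sampled_softmax_loss(logits_fn, x, y_true, proposal_probs, n_neg=5, rng=np.random):
    """logits_fn(x, classes) -> unnormalized scores for the given classes.
    Negatives are drawn from proposal_probs (a stand-in for the adversarial
    model) instead of uniformly, and logits are corrected by -log q(class)
    so the non-uniform sampling does not bias the estimator."""
    C = len(proposal_probs)
    negatives = rng.choice(C, size=n_neg, replace=False, p=proposal_probs)
    # (A real implementation would exclude the target class from the negatives.)
    classes = np.concatenate([[y_true], negatives])
    corrected = logits_fn(x, classes) - np.log(proposal_probs[classes])
    corrected -= corrected.max()
    # Softmax cross-entropy over the positive plus the sampled negatives only:
    return -corrected[0] + np.log(np.exp(corrected).sum())

# Toy usage with a linear scorer over 1000 classes.
C, d = 1000, 16
W = np.random.randn(C, d) * 0.01
proposal = np.random.dirichlet(np.ones(C))   # placeholder adversarial sampler
x, y = np.random.randn(d), 42
print(sampled_softmax_loss(lambda x, cls: W[cls] @ x, x, y, proposal))
```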

Leveraging inductive bias of neural networks for learning without explicit human annotations

Title Leveraging inductive bias of neural networks for learning without explicit human annotations
Authors Anonymous
Abstract Classification problems today are typically solved by first collecting examples along with candidate labels, second obtaining clean labels from workers, and third training a large, overparameterized deep neural network on the clean examples. The second, labeling step is often the most expensive one as it requires manually going through all examples. In this paper we skip the labeling step entirely and propose to directly train the deep neural network on the noisy raw labels and early stop the training to avoid overfitting. With this procedure we exploit an intriguing property of large overparameterized neural networks: While they are capable of perfectly fitting the noisy data, gradient descent fits clean labels much faster than the noisy ones, thus early stopping resembles training on the clean labels. Our results show that early stopping the training of standard deep networks such as ResNet-18 on part of the Tiny Images dataset, which does not involve any human labeled data, and of which only about half of the labels are correct, gives a significantly higher test performance than when trained on the clean CIFAR-10 training dataset, which is a labeled version of the Tiny Images dataset, for the same classification problem. In addition, our results show that the noise generated through the label collection process is not nearly as adversarial for learning as the noise generated by randomly flipping labels, which is the noise most prevalent in works demonstrating noise robustness of neural networks.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=HJeIX6EKvr
PDF https://openreview.net/pdf?id=HJeIX6EKvr
PWC https://paperswithcode.com/paper/leveraging-inductive-bias-of-neural-networks-1
Repo
Framework
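
The procedure itself is simple enough to sketch end to end on synthetic data: train directly on noisy labels and keep an early checkpoint, relying on the observation that gradient descent fits the clean labels before it memorizes the noisy ones. The toy task, noise rate, and stopping epoch below are illustrative assumptions, not the paper's ResNet-18 / Tiny Images setup.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n, d, noise_rate, stop_epoch = 2000, 20, 0.4, 30

# Synthetic binary task with 40% of the training labels flipped.
w_true = torch.randn(d)
X = torch.randn(n, d)
y_clean = (X @ w_true > 0).long()
flip = torch.rand(n) < noise_rate
y_noisy = torch.where(flip, 1 - y_clean, y_clean)

X_test = torch.randn(n, d)
y_test = (X_test @ w_true > 0).long()

model = nn.Sequential(nn.Linear(d, 256), nn.ReLU(), nn.Linear(256, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

early_acc = None
for epoch in range(200):
    opt.zero_grad()
    loss_fn(model(X), y_noisy).backward()   # train on the raw, noisy labels
    opt.step()
    with torch.no_grad():
        test_acc = (model(X_test).argmax(1) == y_test).float().mean().item()
    if epoch == stop_epoch:                 # keep the early-stopped checkpoint
        early_acc = test_acc
print(f"test accuracy at early stop: {early_acc:.3f}, after full training: {test_acc:.3f}")
```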

Skew-Fit: State-Covering Self-Supervised Reinforcement Learning

Title Skew-Fit: State-Covering Self-Supervised Reinforcement Learning
Authors Anonymous
Abstract Autonomous agents that must exhibit flexible and broad capabilities will need to be equipped with large repertoires of skills. Defining each skill with a manually-designed reward function limits this repertoire and imposes a manual engineering burden. Self-supervised agents that set their own goals can automate this process, but designing appropriate goal setting objectives can be difficult, and often involves heuristic design decisions. In this paper, we propose a formal exploration objective for goal-reaching policies that maximizes state coverage. We show that this objective is equivalent to maximizing the entropy of the goal distribution together with goal reaching performance, where goals correspond to full state observations. To instantiate this principle, we present an algorithm called Skew-Fit for learning a maximum-entropy goal distribution. Skew-Fit enables self-supervised agents to autonomously choose and practice reaching diverse goals. We show that, under certain regularity conditions, our method converges to a uniform distribution over the set of valid states, even when we do not know this set beforehand. Our experiments show that it can learn a variety of manipulation tasks from images, including opening a door with a real robot, entirely from scratch and without any manually-designed reward function.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=r1gIdySFPH
PDF https://openreview.net/pdf?id=r1gIdySFPH
PWC https://paperswithcode.com/paper/skew-fit-state-covering-self-supervised-1
Repo
Framework
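
The core reweighting step can be isolated in a few lines: candidate goals are weighted by their estimated density raised to a negative power, which skews the goal-sampling distribution toward uniform over visited states. The density values and exponent below are placeholders; Skew-Fit proper plugs in densities from a learned generative model of states.

```python
import numpy as np

def skewed_goal_weights(densities, alpha=-1.0):
    """densities: estimated p(g) for each candidate goal under the current model.
    alpha < 0 up-weights rarely visited states, pushing the goal distribution
    toward maximum entropy over the visited-state set."""
    w = densities ** alpha
    return w / w.sum()

replay_goal_densities = np.array([0.50, 0.30, 0.15, 0.05])   # common vs. rare states
weights = skewed_goal_weights(replay_goal_densities, alpha=-1.0)
print(weights)   # rare states are sampled (and practiced) far more often
```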

Dynamics-Aware Embeddings

Title Dynamics-Aware Embeddings
Authors Anonymous
Abstract In this paper we consider self-supervised representation learning to improve sample efficiency in reinforcement learning (RL). We propose a forward prediction objective for simultaneously learning embeddings of states and actions. These embeddings capture the structure of the environment’s dynamics, enabling efficient policy learning. We demonstrate that our action embeddings alone improve the sample efficiency and peak performance of model-free RL on control from low-dimensional states. By combining state and action embeddings, we achieve efficient learning of high-quality policies on goal-conditioned continuous control from pixel observations in only 1-2 million environment steps.
Tasks Continuous Control, Representation Learning
Published 2020-01-01
URL https://openreview.net/forum?id=BJgZGeHFPH
PDF https://openreview.net/pdf?id=BJgZGeHFPH
PWC https://paperswithcode.com/paper/dynamics-aware-embeddings-1
Repo
Framework
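
A hedged PyTorch sketch of a forward-prediction objective of this kind: embed the state and the action, predict the next state's embedding, and train all three modules on the prediction error. The architecture sizes and the squared-error loss are my assumptions, not the paper's exact objective.

```python
import torch
import torch.nn as nn

state_dim, action_dim, z_dim = 10, 3, 8
enc_s = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, z_dim))
enc_a = nn.Sequential(nn.Linear(action_dim, 64), nn.ReLU(), nn.Linear(64, z_dim))
forward_model = nn.Sequential(nn.Linear(2 * z_dim, 64), nn.ReLU(), nn.Linear(64, z_dim))

params = list(enc_s.parameters()) + list(enc_a.parameters()) + list(forward_model.parameters())
opt = torch.optim.Adam(params, lr=1e-3)

def embedding_loss(s, a, s_next):
    """Predict the next state's embedding from the current state and action embeddings."""
    z_pred = forward_model(torch.cat([enc_s(s), enc_a(a)], dim=-1))
    return ((z_pred - enc_s(s_next)) ** 2).mean()

# One update on a random batch of (s, a, s') transitions.
s, a, s_next = torch.randn(32, state_dim), torch.randn(32, action_dim), torch.randn(32, state_dim)
loss = embedding_loss(s, a, s_next)
opt.zero_grad()
loss.backward()
opt.step()
```

As written, the objective admits a trivially collapsed solution (all embeddings constant), so in practice some additional term, such as a contrastive loss or a stop-gradient on the target embedding, is needed; the paper should be consulted for how it avoids this.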

Extreme Tensoring for Low-Memory Preconditioning

Title Extreme Tensoring for Low-Memory Preconditioning
Authors Anonymous
Abstract State-of-the-art models are now trained with billions of parameters, reaching hardware limits in terms of memory consumption. This has created a recent demand for memory-efficient optimizers. To this end, we investigate the limits and performance tradeoffs of memory-efficient adaptively preconditioned gradient methods. We propose \emph{extreme tensoring} for high-dimensional stochastic optimization, showing that an optimizer needs very little memory to benefit from adaptive preconditioning. Our technique applies to arbitrary models (not necessarily with tensor-shaped parameters), and is accompanied by regret and convergence guarantees, which shed light on the tradeoffs between preconditioner quality and expressivity. On a large-scale NLP model, we reduce the optimizer memory overhead by three orders of magnitude, without degrading performance.
Tasks Stochastic Optimization
Published 2020-01-01
URL https://openreview.net/forum?id=SklKcRNYDH
PDF https://openreview.net/pdf?id=SklKcRNYDH
PWC https://paperswithcode.com/paper/extreme-tensoring-for-low-memory-1
Repo
Framework
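
To illustrate the memory/expressivity trade-off, here is a factored-preconditioner sketch for a single matrix-shaped parameter, using an Adafactor-style rank-one factorization of the Adagrad accumulator (O(m + n) memory instead of O(mn)). This is a stand-in for the general idea, not the paper's exact update rule.

```python
import numpy as np

m, n, lr, eps = 4, 6, 0.1, 1e-8
W = np.random.randn(m, n)
row_acc, col_acc = np.zeros(m), np.zeros(n)   # O(m) + O(n) second-moment statistics

def factored_adagrad_step(W, grad):
    global row_acc, col_acc
    row_acc += (grad ** 2).sum(axis=1)
    col_acc += (grad ** 2).sum(axis=0)
    # Rank-one surrogate for the full O(mn) elementwise accumulator:
    v_hat = np.outer(row_acc, col_acc) / row_acc.sum()
    return W - lr * grad / (np.sqrt(v_hat) + eps)

for _ in range(20):
    grad = W                        # gradient of the toy objective 0.5 * ||W||^2
    W = factored_adagrad_step(W, grad)
print(np.linalg.norm(W))            # the norm decreases toward the minimum at W = 0
```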

Compositional languages emerge in a neural iterated learning model

Title Compositional languages emerge in a neural iterated learning model
Authors Anonymous
Abstract The principle of compositionality, which enables natural language to represent complex concepts via a structured combination of simpler ones, allows us to convey an open-ended set of messages using a limited vocabulary. If compositionality is indeed a natural property of language, we may expect it to appear in communication protocols that are created by neural agents via grounded language learning. Inspired by the iterated learning framework, which simulates the process of language evolution, we propose an effective neural iterated learning algorithm that, when applied to interacting neural agents, facilitates the emergence of a more structured type of language. Indeed, these languages provide specific advantages to neural agents during training, which translates into a larger posterior probability that is then incrementally amplified via the iterated learning procedure. Our experiments confirm our analysis, and also demonstrate that the emergent languages substantially improve the generalization of neural agent communication.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=HkePNpVKPB
PDF https://openreview.net/pdf?id=HkePNpVKPB
PWC https://paperswithcode.com/paper/compositional-languages-emerge-in-a-neural
Repo
Framework

Knowledge Consistency between Neural Networks and Beyond

Title Knowledge Consistency between Neural Networks and Beyond
Authors Anonymous
Abstract This paper aims to analyze knowledge consistency between pre-trained deep neural networks. We propose a generic definition for knowledge consistency between neural networks at different fuzziness levels. A task-agnostic method is designed to disentangle feature components, which represent the consistent knowledge, from raw intermediate-layer features of each neural network. As a generic tool, our method can be broadly used for different applications. In preliminary experiments, we have used knowledge consistency as a tool to diagnose knowledge representations of neural networks. Knowledge consistency provides new insights to explain the success of existing deep-learning techniques, such as knowledge distillation and network compression. More crucially, knowledge consistency can also be used to refine pre-trained networks and boost performance.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=BJeS62EtwH
PDF https://openreview.net/pdf?id=BJeS62EtwH
PWC https://paperswithcode.com/paper/knowledge-consistency-between-neural-networks
Repo
Framework
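
As a rough illustration of the disentangling step, the sketch below treats the part of one network's intermediate features that can be linearly predicted from another network's features as the "consistent" knowledge, and the residual as inconsistent. The paper's actual method uses learned transforms at multiple fuzziness levels; the linear least-squares version here is my simplification.

```python
import numpy as np

def split_consistent(feats_a, feats_b):
    """feats_a, feats_b: (n_samples, d_a) and (n_samples, d_b) intermediate
    features of two networks on the same inputs. Returns the component of
    feats_b explained by feats_a and the unexplained residual."""
    M, *_ = np.linalg.lstsq(feats_a, feats_b, rcond=None)
    consistent = feats_a @ M
    return consistent, feats_b - consistent

# Toy features that share a common 16-dimensional signal plus small noise.
n, d_a, d_b = 500, 32, 24
shared = np.random.randn(n, 16)
feats_a = shared @ np.random.randn(16, d_a) + 0.1 * np.random.randn(n, d_a)
feats_b = shared @ np.random.randn(16, d_b) + 0.1 * np.random.randn(n, d_b)
consistent, residual = split_consistent(feats_a, feats_b)
print(np.linalg.norm(consistent) / np.linalg.norm(feats_b))   # most of B is "consistent" with A
```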

Convolutional Tensor-Train LSTM for Long-Term Video Prediction

Title Convolutional Tensor-Train LSTM for Long-Term Video Prediction
Authors Anonymous
Abstract Long-term video prediction is highly challenging since it entails simultaneously capturing spatial and temporal information across a long range of image frames. Standard recurrent models are ineffective since they are prone to error propagation and cannot effectively capture higher-order correlations. A potential solution is to extend to higher-order spatio-temporal recurrent models. However, such a model requires a large number of parameters and operations, making it intractable to learn in practice and prone to overfitting. In this work, we propose convolutional tensor-train LSTM (Conv-TT-LSTM), which learns higher-order Convolutional LSTM (ConvLSTM) efficiently using convolutional tensor-train decomposition (CTTD). Our proposed model naturally incorporates higher-order spatio-temporal information at a small cost of memory and computation by using efficient low-rank tensor representations. We evaluate our model on Moving-MNIST and KTH datasets and show improvements over standard ConvLSTM and better/comparable results to other ConvLSTM-based approaches, but with far fewer parameters.
Tasks Video Prediction
Published 2020-01-01
URL https://openreview.net/forum?id=Hkee1JBKwB
PDF https://openreview.net/pdf?id=Hkee1JBKwB
PWC https://paperswithcode.com/paper/convolutional-tensor-train-lstm-for-long-term
Repo
Framework

VideoFlow: A Conditional Flow-Based Model for Stochastic Video Generation

Title VideoFlow: A Conditional Flow-Based Model for Stochastic Video Generation
Authors Anonymous
Abstract Generative models that can model and predict sequences of future events can, in principle, learn to capture complex real-world phenomena, such as physical interactions. However, a central challenge in video prediction is that the future is highly uncertain: a sequence of past observations of events can imply many possible futures. Although a number of recent works have studied probabilistic models that can represent uncertain futures, such models are either extremely expensive computationally as in the case of pixel-level autoregressive models, or do not directly optimize the likelihood of the data. To our knowledge, our work is the first to propose multi-frame video prediction with normalizing flows, which allows for direct optimization of the data likelihood, and produces high-quality stochastic predictions. We describe an approach for modeling the latent space dynamics, and demonstrate that flow-based generative models offer a viable and competitive approach to generative modeling of video.
Tasks Video Generation, Video Prediction
Published 2020-01-01
URL https://openreview.net/forum?id=rJgUfTEYvH
PDF https://openreview.net/pdf?id=rJgUfTEYvH
PWC https://paperswithcode.com/paper/videoflow-a-conditional-flow-based-model-for
Repo
Framework