Paper Group NANR 51
Understanding the Limitations of Variational Mutual Information Estimators. Regularizing Predictions via Class-wise Self-knowledge Distillation. Quantifying Exposure Bias for Neural Language Generation. Lazy-CFR: fast and near-optimal regret minimization for extensive games with imperfect information. Slow Thinking Enables Task-Uncertain Lifelong and Sequential Few-Shot Learning …
Understanding the Limitations of Variational Mutual Information Estimators
Title | Understanding the Limitations of Variational Mutual Information Estimators |
Authors | Anonymous |
Abstract | Variational approaches based on neural networks are showing promise for estimating mutual information (MI) between high dimensional variables. However, they can be difficult to use in practice due to poorly understood bias/variance tradeoffs. We theoretically show that, under some conditions, estimators such as MINE exhibit variance that could grow exponentially with the true amount of underlying MI. We also empirically demonstrate that existing estimators fail to satisfy basic self-consistency properties of MI, such as data processing and additivity under independence. Based on a unified perspective of variational approaches, we develop a new estimator that focuses on variance reduction. Empirical results demonstrate that our proposed estimator exhibits improved bias/variance trade-offs on standard benchmark tasks. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=B1x62TNtDS |
https://openreview.net/pdf?id=B1x62TNtDS | |
PWC | https://paperswithcode.com/paper/understanding-the-limitations-of-variational-1 |
Repo | |
Framework | |
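The abstract above refers to variational neural estimators such as MINE. Below is a minimal sketch of a Donsker–Varadhan (MINE-style) lower bound on MI, for orientation only; the critic architecture, toy data, and training schedule are assumptions, and this is not the paper's proposed variance-reduced estimator.

```python
# Minimal sketch of a MINE-style Donsker-Varadhan lower bound on I(X; Y).
# Illustrative assumptions: critic architecture, toy Gaussian data, training loop.
import torch
import torch.nn as nn

class Critic(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, x, y):
        return self.net(torch.cat([x, y], dim=-1)).squeeze(-1)

def dv_lower_bound(critic, x, y):
    """E_p(x,y)[T] - log E_p(x)p(y)[exp T], with the marginal formed by shuffling y."""
    joint = critic(x, y).mean()
    y_shuffled = y[torch.randperm(y.shape[0])]
    marginal = torch.logsumexp(critic(x, y_shuffled), dim=0) - torch.log(torch.tensor(float(x.shape[0])))
    return joint - marginal

# Toy correlated Gaussians: the true MI grows with the correlation strength.
dim, n = 4, 512
x = torch.randn(n, dim)
y = 0.9 * x + 0.1 * torch.randn(n, dim)
critic = Critic(dim)
opt = torch.optim.Adam(critic.parameters(), lr=1e-3)
for step in range(200):
    opt.zero_grad()
    loss = -dv_lower_bound(critic, x, y)  # maximize the bound
    loss.backward()
    opt.step()
print("DV lower-bound estimate of MI:", -loss.item())
```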
Regularizing Predictions via Class-wise Self-knowledge Distillation
Title | Regularizing Predictions via Class-wise Self-knowledge Distillation |
Authors | Anonymous |
Abstract | Deep neural networks with millions of parameters may suffer from poor generalization due to overfitting. To mitigate the issue, we propose a new regularization method that penalizes the predictive distribution between similar samples. In particular, we distill the predictive distribution between different samples of the same label and augmented samples of the same source during training. In other words, we regularize the dark knowledge (i.e., the knowledge on wrong predictions) of a single network, i.e., a self-knowledge distillation technique, to force it to output more meaningful predictions. We demonstrate the effectiveness of the proposed method via experiments on various image classification tasks: it improves not only the generalization ability, but also the calibration accuracy of modern neural networks. |
Tasks | Calibration, Image Classification |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BJluPerYvB |
https://openreview.net/pdf?id=BJluPerYvB | |
PWC | https://paperswithcode.com/paper/regularizing-predictions-via-class-wise-self |
Repo | |
Framework | |
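The mechanism described in the abstract, matching predictive distributions across samples of the same class, can be sketched as a small auxiliary loss. The pairing strategy, temperature, and weighting below are illustrative assumptions, not the paper's exact recipe.

```python
# Sketch of a class-wise self-distillation penalty: match a sample's softened
# predictive distribution to that of another sample with the same label
# (the second distribution is detached, as in standard distillation).
import torch
import torch.nn.functional as F

def cs_kd_loss(logits, labels, temperature=4.0):
    batch = logits.shape[0]
    pair_idx = torch.arange(batch)
    for i in range(batch):
        # Pick another example in the batch with the same label, if one exists.
        same = (labels == labels[i]).nonzero(as_tuple=True)[0]
        same = same[same != i]
        if len(same) > 0:
            pair_idx[i] = same[torch.randint(len(same), (1,)).item()]
    student = F.log_softmax(logits / temperature, dim=1)
    teacher = F.softmax(logits[pair_idx].detach() / temperature, dim=1)
    return F.kl_div(student, teacher, reduction="batchmean") * temperature ** 2

# Tiny demo with random logits (placeholder data).
logits = torch.randn(8, 3, requires_grad=True)
labels = torch.tensor([0, 0, 1, 1, 2, 2, 0, 1])
total = F.cross_entropy(logits, labels) + 1.0 * cs_kd_loss(logits, labels)
print(total.item())
```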
Quantifying Exposure Bias for Neural Language Generation
Title | Quantifying Exposure Bias for Neural Language Generation |
Authors | Anonymous |
Abstract | The exposure bias problem refers to the training-inference discrepancy caused by teacher forcing in maximum likelihood estimation (MLE) training for auto-regressive neural network language models (LMs). It has been regarded as a central problem for natural language generation (NLG) model training. Although many algorithms have been proposed to avoid teacher forcing and thereby alleviate exposure bias, there is little work showing how serious the exposure bias problem actually is. In this work, we first identify the auto-recovery ability of MLE-trained LMs, which casts doubt on the seriousness of exposure bias. We then develop a precise, quantifiable definition for exposure bias. However, according to our measurements in controlled experiments, there is only around a 3% performance gain when the training-inference discrepancy is completely removed. Our results suggest the exposure bias problem could be much less serious than it is currently assumed to be. |
Tasks | Text Generation |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rJg2fTNtwr |
https://openreview.net/pdf?id=rJg2fTNtwr | |
PWC | https://paperswithcode.com/paper/quantifying-exposure-bias-for-neural-language-1 |
Repo | |
Framework | |
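The training-inference discrepancy the abstract refers to is the gap between conditioning on gold prefixes (teacher forcing) and conditioning on the model's own samples (free-running generation). The sketch below only illustrates that gap; the model interface is a placeholder, and the paper's quantitative exposure-bias measure is not reproduced.

```python
# Sketch of teacher forcing vs. free-running generation for an autoregressive LM.
# The "lm" callable (prefix -> next-token distribution) is an assumed interface.
import torch

def teacher_forced_logprob(lm, gold_tokens):
    """Average log-likelihood of each gold token given the *gold* prefix."""
    logps = []
    for t in range(1, len(gold_tokens)):
        dist = lm(gold_tokens[:t])                # p(. | gold prefix)
        logps.append(torch.log(dist[gold_tokens[t]]))
    return torch.stack(logps).mean()

def free_running_sample(lm, bos_token, length):
    """Generate by feeding the model's own samples back in (inference mode)."""
    tokens = [bos_token]
    for _ in range(length):
        dist = lm(tokens)                         # p(. | model-generated prefix)
        tokens.append(torch.multinomial(dist, 1).item())
    return tokens

# Toy "LM": a uniform distribution that ignores the prefix, just so the sketch runs.
vocab = 5
toy_lm = lambda prefix: torch.full((vocab,), 1.0 / vocab)
print(teacher_forced_logprob(toy_lm, [0, 1, 2, 3]).item())
print(free_running_sample(toy_lm, bos_token=0, length=4))
```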
Lazy-CFR: fast and near-optimal regret minimization for extensive games with imperfect information
Title | Lazy-CFR: fast and near-optimal regret minimization for extensive games with imperfect information |
Authors | Anonymous |
Abstract | Counterfactual regret minimization (CFR) methods are effective for solving two-player zero-sum extensive games with imperfect information, with state-of-the-art results. However, vanilla CFR has to traverse the whole game tree in each round, which is time-consuming in large-scale games. In this paper, we present Lazy-CFR, a CFR algorithm that adopts a lazy update strategy to avoid traversing the whole game tree in each round. We prove that the regret of Lazy-CFR is almost the same as that of vanilla CFR, while Lazy-CFR only needs to visit a small portion of the game tree in each round. Thus, Lazy-CFR is provably faster than CFR. Empirical results consistently show that Lazy-CFR is significantly faster than the vanilla CFR. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rJx4p3NYDB |
https://openreview.net/pdf?id=rJx4p3NYDB | |
PWC | https://paperswithcode.com/paper/lazy-cfr-fast-and-near-optimal-regret-1 |
Repo | |
Framework | |
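For context: the per-round work in CFR is a regret-matching update at each information set, and the abstract's point is that Lazy-CFR defers most of these updates. The sketch below shows only the standard regret-matching step (with made-up regrets), not the lazy scheduling that is the paper's contribution.

```python
# Sketch of the regret-matching update that CFR performs at an information set.
# Vanilla CFR applies this at every infoset each round (hence full tree traversals).
import numpy as np

def regret_matching(cumulative_regret):
    """Turn cumulative counterfactual regrets into a strategy over actions."""
    positive = np.maximum(cumulative_regret, 0.0)
    total = positive.sum()
    if total > 0:
        return positive / total
    return np.full_like(cumulative_regret, 1.0 / len(cumulative_regret))

# Toy usage: three actions with made-up accumulated regrets.
cum_regret = np.array([2.0, -1.0, 0.5])
print(regret_matching(cum_regret))   # proportional to [2.0, 0.0, 0.5]
```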
Slow Thinking Enables Task-Uncertain Lifelong and Sequential Few-Shot Learning
Title | Slow Thinking Enables Task-Uncertain Lifelong and Sequential Few-Shot Learning |
Authors | Anonymous |
Abstract | Lifelong machine learning focuses on adapting to novel tasks without forgetting the old tasks, whereas few-shot learning strives to learn a single task given a small amount of data. These two different research areas are both crucial for artificial general intelligence; however, existing studies in each have assumed impractical settings when training the models. For lifelong learning, the nature (or the quantity) of incoming tasks during inference time is assumed to be known at training time. As for few-shot learning, it is commonly assumed that a large number of tasks is available during training. Humans, on the other hand, can perform these learning tasks without regard to the aforementioned assumptions. Inspired by how the human brain works, we propose a novel model, called Slow Thinking to Learn (STL), that makes sophisticated (and slightly slower) predictions by iteratively considering interactions between current and previously seen tasks at runtime. Our experimental results demonstrate the effectiveness of STL in more realistic lifelong and few-shot learning settings. |
Tasks | Few-Shot Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HklbIerFDS |
https://openreview.net/pdf?id=HklbIerFDS | |
PWC | https://paperswithcode.com/paper/slow-thinking-enables-task-uncertain-lifelong |
Repo | |
Framework | |
Empirical Studies on the Properties of Linear Regions in Deep Neural Networks
Title | Empirical Studies on the Properties of Linear Regions in Deep Neural Networks |
Authors | Anonymous |
Abstract | A deep neural network (DNN) with piecewise linear activations can partition the input space into numerous small linear regions, where different linear functions are fitted. It is believed that the number of these regions represents the expressivity of a DNN. This paper provides a novel and meticulous perspective on DNNs: instead of just counting the number of linear regions, we study their local properties, such as the inspheres, the directions of the corresponding hyperplanes, the decision boundaries, and the relevance of the surrounding regions. We empirically observe that different optimization techniques lead to completely different linear regions, even though they result in similar classification accuracies. We hope our study can inspire the design of novel optimization techniques, and help discover and analyze the behaviors of DNNs. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SkeFl1HKwr |
https://openreview.net/pdf?id=SkeFl1HKwr | |
PWC | https://paperswithcode.com/paper/empirical-studies-on-the-properties-of-linear |
Repo | |
Framework | |
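The linear regions the abstract studies are determined by the sign pattern of a ReLU network's pre-activations: inputs that share the pattern are mapped by the same affine function. A minimal sketch of computing that pattern follows; the two-layer network and its random weights are placeholders, not anything from the paper.

```python
# Sketch: the sign pattern of the pre-activations identifies which linear
# region of a ReLU network an input falls into.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((16, 2)), rng.standard_normal(16)
W2, b2 = rng.standard_normal((16, 16)), rng.standard_normal(16)

def activation_pattern(x):
    h1 = W1 @ x + b1
    h2 = W2 @ np.maximum(h1, 0) + b2
    return tuple((h1 > 0).astype(int)) + tuple((h2 > 0).astype(int))

# Two nearby inputs usually share a region; distant inputs usually do not.
x, x_near, x_far = np.array([0.5, 0.5]), np.array([0.501, 0.5]), np.array([-3.0, 2.0])
print(activation_pattern(x) == activation_pattern(x_near))
print(activation_pattern(x) == activation_pattern(x_far))
```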
State Alignment-based Imitation Learning
Title | State Alignment-based Imitation Learning |
Authors | Anonymous |
Abstract | Consider an imitation learning problem in which the imitator and the expert have different dynamics models. Most existing imitation learning methods fail in this setting because they focus on imitating actions. We propose a novel state alignment-based imitation learning method that trains the imitator to follow the state sequences in the expert demonstrations as closely as possible. The alignment of states comes from both local and global perspectives, and we combine them in a reinforcement learning framework via a regularized policy update objective. We show the superiority of our method on standard imitation learning settings as well as on the challenging settings in which the expert and the imitator have different dynamics models. |
Tasks | Imitation Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rylrdxHFDr |
https://openreview.net/pdf?id=rylrdxHFDr | |
PWC | https://paperswithcode.com/paper/state-alignment-based-imitation-learning |
Repo | |
Framework | |
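One simplified way to read "following state sequences" is to reward the imitator for landing near the expert's demonstrated states rather than for copying actions. The sketch below is a conceptual simplification under that reading, not the paper's local/global alignment objective.

```python
# Conceptual simplification of state-alignment imitation: reward states that
# are close to the expert's state sequence, rather than matched actions.
# This is NOT the paper's alignment objective, only an illustration.
import numpy as np

def state_alignment_reward(imitator_state, expert_states, t):
    """Negative distance to the expert's state at the same time step."""
    return -np.linalg.norm(imitator_state - expert_states[t])

expert_states = np.array([[0.0, 0.0], [0.1, 0.2], [0.3, 0.5]])
print(state_alignment_reward(np.array([0.12, 0.18]), expert_states, t=1))
```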
Influence-Based Multi-Agent Exploration
Title | Influence-Based Multi-Agent Exploration |
Authors | Anonymous |
Abstract | Intrinsically motivated reinforcement learning aims to address the exploration challenge for sparse-reward tasks. However, the study of exploration methods in transition-dependent multi-agent settings is largely absent from the literature. We aim to take a step towards solving this problem. We present two exploration methods: exploration via information-theoretic influence (EITI) and exploration via decision-theoretic influence (EDTI), both of which exploit the role of interaction in the coordinated behaviors of agents. EITI uses mutual information to capture the influence of one agent's behavior on the transition dynamics of other agents. EDTI uses a novel intrinsic reward, called Value of Interaction (VoI), to characterize and quantify the influence of one agent's behavior on the expected returns of other agents. By optimizing the EITI or EDTI objective as a regularizer, agents are encouraged to coordinate their exploration and learn policies that optimize team performance. We show how to optimize these regularizers so that they can be easily integrated with policy gradient reinforcement learning. The resulting update rule draws a connection between coordinated exploration and intrinsic reward distribution. Finally, we empirically demonstrate the significant strength of our method in a variety of multi-agent scenarios. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BJgy96EYvr |
https://openreview.net/pdf?id=BJgy96EYvr | |
PWC | https://paperswithcode.com/paper/influence-based-multi-agent-exploration-1 |
Repo | |
Framework | |
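At the core of the EITI-style signal is a mutual-information term between one agent's behavior and another agent's transition dynamics. The sketch below only shows how such a quantity can be estimated from an empirical count table; the choice of variables and the counts are illustrative assumptions.

```python
# Sketch of the mutual-information quantity behind an EITI-style signal,
# estimated from an empirical joint count table (made-up counts).
import numpy as np

counts = np.array([[30.0, 5.0],    # rows: agent 1's action
                   [4.0, 25.0]])   # cols: agent 2's next state
p_joint = counts / counts.sum()
p_a = p_joint.sum(axis=1, keepdims=True)
p_s = p_joint.sum(axis=0, keepdims=True)

mask = p_joint > 0
mi = np.sum(p_joint[mask] * np.log(p_joint[mask] / (p_a @ p_s)[mask]))
print("empirical mutual information (nats):", mi)
```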
Explain Your Move: Understanding Agent Actions Using Focused Feature Saliency
Title | Explain Your Move: Understanding Agent Actions Using Focused Feature Saliency |
Authors | Anonymous |
Abstract | As deep reinforcement learning (RL) is applied to more tasks, there is a need to visualize and understand the behavior of learned agents. Saliency maps explain agent behavior by highlighting the features of the input state that are most relevant for the agent in taking an action. Existing perturbation-based approaches to compute saliency often highlight regions of the input that are not relevant to the action taken by the agent. Our approach generates more focused saliency maps by balancing two aspects (specificity and relevance) that capture different desiderata of saliency. The first captures the impact of perturbation on the relative expected reward of the action to be explained. The second downweights irrelevant features that alter the relative expected rewards of actions other than the action to be explained. We compare our approach with existing approaches on agents trained to play board games (Chess and Go) and Atari games (Breakout, Pong and Space Invaders). We show through illustrative examples (Chess, Atari, Go), human studies (Chess), and automated evaluation methods (Chess) that our approach generates saliency maps that are more interpretable for humans than existing approaches. |
Tasks | Atari Games, Board Games |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SJgzLkBKPB |
https://openreview.net/pdf?id=SJgzLkBKPB | |
PWC | https://paperswithcode.com/paper/explain-your-move-understanding-agent-actions |
Repo | |
Framework | |
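The abstract names two quantities computed from a perturbation: the change in the chosen action's relative expected reward (specificity) and a downweighting term for perturbations that mainly shift other actions (relevance). The sketch below combines them with a simple product over softmaxed Q-values; the combination and the toy numbers are assumptions, not the paper's exact formula.

```python
# Rough sketch of perturbation-based saliency for an RL agent's chosen action.
import numpy as np

def softmax(q):
    z = np.exp(q - q.max())
    return z / z.sum()

def kl(p, q):
    return float(np.sum(p * np.log(p / q)))

def saliency(q_values, q_values_perturbed, action):
    p, p_pert = softmax(q_values), softmax(q_values_perturbed)
    # Specificity: drop in the chosen action's probability under the perturbation.
    specificity = max(p[action] - p_pert[action], 0.0)
    # Relevance: downweight perturbations that mostly reshuffle the other actions.
    others = np.arange(len(p)) != action
    p_rem = p[others] / p[others].sum()
    p_rem_pert = p_pert[others] / p_pert[others].sum()
    relevance = 1.0 / (1.0 + kl(p_rem, p_rem_pert))
    return specificity * relevance

print(saliency(np.array([2.0, 0.5, 0.1]), np.array([1.0, 0.6, 0.2]), action=0))
```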
Perception-Driven Curiosity with Bayesian Surprise
Title | Perception-Driven Curiosity with Bayesian Surprise |
Authors | Anonymous |
Abstract | Intrinsic rewards in reinforcement learning provide a powerful algorithmic capability for agents to learn how to interact with their environment in a task-generic way. However, increased incentives for motivation can come at the cost of increased fragility to stochasticity. We introduce a method for computing an intrinsic reward for curiosity using metrics derived from sampling a latent variable model used to estimate dynamics. Ultimately, an estimate of the conditional probability of observed states is used as our intrinsic reward for curiosity. In our experiments, a video game agent uses our model to autonomously learn how to play Atari games using our curiosity reward in combination with extrinsic rewards from the game to achieve improved performance on games with sparse extrinsic rewards. When stochasticity is introduced in the environment, our method still demonstrates improved performance over the baseline. |
Tasks | Atari Games |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rJlBQkrFvr |
https://openreview.net/pdf?id=rJlBQkrFvr | |
PWC | https://paperswithcode.com/paper/perception-driven-curiosity-with-bayesian |
Repo | |
Framework | |
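The intrinsic reward described above is derived from the conditional probability of observed states under a learned dynamics model. A minimal surprise-style sketch follows, using the negative log-likelihood under a diagonal Gaussian prediction; the Gaussian model is a placeholder, whereas the paper samples a latent variable model to estimate this probability.

```python
# Minimal sketch of a surprise-style intrinsic reward: the less probable the
# observed next state is under the dynamics model, the larger the bonus.
import numpy as np

def surprise_bonus(predicted_mean, predicted_std, observed_next_state):
    """Negative log-likelihood of the observed state under the prediction."""
    var = predicted_std ** 2
    return 0.5 * np.sum(np.log(2 * np.pi * var)
                        + (observed_next_state - predicted_mean) ** 2 / var)

pred_mean, pred_std = np.array([0.0, 1.0]), np.array([0.2, 0.2])
print(surprise_bonus(pred_mean, pred_std, np.array([0.05, 1.02])))  # small bonus
print(surprise_bonus(pred_mean, pred_std, np.array([1.5, -0.5])))   # large bonus
```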
Striving for Simplicity in Off-Policy Deep Reinforcement Learning
Title | Striving for Simplicity in Off-Policy Deep Reinforcement Learning |
Authors | Anonymous |
Abstract | This paper advocates the use of offline (batch) reinforcement learning (RL) to help (1) isolate the contributions of exploitation vs. exploration in off-policy deep RL, (2) improve reproducibility of deep RL research, and (3) facilitate the design of simpler deep RL algorithms. We propose an offline RL benchmark on Atari 2600 games comprising all of the replay data of a DQN agent. Using this benchmark, we demonstrate that recent off-policy deep RL algorithms, even when trained solely on logged DQN data, can outperform online DQN. We present Random Ensemble Mixture (REM), a simple Q-learning algorithm that enforces optimal Bellman consistency on random convex combinations of multiple Q-value estimates. The REM algorithm outperforms more complex RL agents such as C51 and QR-DQN on the offline Atari benchmark and performs comparably in the online setting. |
Tasks | Atari Games, Q-Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=ryeUg0VFwr |
https://openreview.net/pdf?id=ryeUg0VFwr | |
PWC | https://paperswithcode.com/paper/striving-for-simplicity-in-off-policy-deep-1 |
Repo | |
Framework | |
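The REM idea in the abstract is concrete enough to sketch: draw a random convex combination of several Q-heads and apply the usual TD loss to the mixture. Head architecture, target handling, and hyperparameters below are assumptions, not the paper's Atari setup.

```python
# Sketch of Random Ensemble Mixture (REM): enforce Bellman consistency on a
# random convex combination of multiple Q-value heads.
import torch
import torch.nn as nn

class MultiHeadQ(nn.Module):
    def __init__(self, obs_dim, n_actions, n_heads=4):
        super().__init__()
        self.heads = nn.ModuleList([
            nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
            for _ in range(n_heads)])

    def forward(self, obs):
        return torch.stack([h(obs) for h in self.heads], dim=1)  # (B, heads, A)

def rem_td_loss(q_net, target_net, obs, act, rew, next_obs, done, gamma=0.99):
    batch, n_heads = obs.shape[0], len(q_net.heads)
    alpha = torch.rand(batch, n_heads)
    alpha = alpha / alpha.sum(dim=1, keepdim=True)          # random point on the simplex
    q_mix = (alpha.unsqueeze(-1) * q_net(obs)).sum(dim=1)   # (B, A)
    q_sa = q_mix.gather(1, act.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        next_mix = (alpha.unsqueeze(-1) * target_net(next_obs)).sum(dim=1)
        target = rew + gamma * (1 - done) * next_mix.max(dim=1).values
    return nn.functional.smooth_l1_loss(q_sa, target)

# Tiny demo with random transitions (placeholder data).
q, tgt = MultiHeadQ(4, 3), MultiHeadQ(4, 3)
obs, act = torch.randn(8, 4), torch.randint(0, 3, (8,))
rew, done = torch.randn(8), torch.zeros(8)
print(rem_td_loss(q, tgt, obs, act, rew, torch.randn(8, 4), done).item())
```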
Harnessing Structures for Value-Based Planning and Reinforcement Learning
Title | Harnessing Structures for Value-Based Planning and Reinforcement Learning |
Authors | Anonymous |
Abstract | Value-based methods constitute a fundamental methodology in planning and deep reinforcement learning (RL). In this paper, we propose to exploit the underlying structures of the state-action value function, i.e., the Q function, for both planning and deep RL. In particular, if the underlying system dynamics lead to some global structures of the Q function, one should be capable of inferring the function better by leveraging such structures. Specifically, we investigate the low-rank structure, which widely exists for big data matrices. We verify empirically the existence of low-rank Q functions in the context of control and deep RL tasks (Atari games). As our key contribution, by leveraging Matrix Estimation (ME) techniques, we propose a general framework to exploit the underlying low-rank structure in Q functions, leading to a more efficient planning procedure for classical control, and additionally, a simple scheme that can be applied to any value-based RL technique to consistently achieve better performance on "low-rank" tasks. Extensive experiments on control tasks and Atari games confirm the efficacy of our approach. |
Tasks | Atari Games |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rklHqRVKvH |
https://openreview.net/pdf?id=rklHqRVKvH | |
PWC | https://paperswithcode.com/paper/harnessing-structures-for-value-based-1 |
Repo | |
Framework | |
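The reconstruction step implied by the abstract can be sketched generically: evaluate only a subset of Q(state, action) entries and fill in the rest with a low-rank estimate. The iterative SVD imputation and synthetic data below are generic matrix-estimation choices, not necessarily the specific ME method used in the paper.

```python
# Sketch of exploiting low-rank structure in a Q matrix via matrix completion.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic low-rank "Q matrix": states x actions, rank 2 (made-up data).
U, V = rng.standard_normal((50, 2)), rng.standard_normal((2, 10))
Q_true = U @ V
observed = rng.random(Q_true.shape) < 0.4          # only 40% of entries evaluated

def complete_low_rank(Q_obs, mask, rank=2, iters=50):
    Q_hat = np.where(mask, Q_obs, 0.0)
    for _ in range(iters):
        u, s, vt = np.linalg.svd(Q_hat, full_matrices=False)
        low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank]
        Q_hat = np.where(mask, Q_obs, low_rank)     # keep observed entries fixed
    return low_rank

Q_est = complete_low_rank(Q_true, observed)
print("relative error:", np.linalg.norm(Q_est - Q_true) / np.linalg.norm(Q_true))
```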
Learning Key Steps to Attack Deep Reinforcement Learning Agents
Title | Learning Key Steps to Attack Deep Reinforcement Learning Agents |
Authors | Anonymous |
Abstract | Deep reinforcement learning agents are known to be vulnerable to adversarial attacks. In particular, recent studies have shown that attacking a few key steps is effective for decreasing the agent's cumulative reward. However, all existing attacking methods find those key steps with human-designed heuristics, and it is not clear how more effective key steps can be identified. This paper introduces a novel reinforcement learning framework that learns more effective key steps through interacting with the agent. The proposed framework does not require any human heuristics or knowledge, and can be flexibly coupled with any white-box or black-box adversarial attack scenario. Experiments on benchmark Atari games across different scenarios demonstrate that the proposed framework is superior to existing methods at identifying more effective key steps. |
Tasks | Adversarial Attack, Atari Games |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Hkxbz1HKvr |
https://openreview.net/pdf?id=Hkxbz1HKvr | |
PWC | https://paperswithcode.com/paper/learning-key-steps-to-attack-deep |
Repo | |
Framework | |
ON COMPUTATION AND GENERALIZATION OF GENERATIVE ADVERSARIAL IMITATION LEARNING
Title | ON COMPUTATION AND GENERALIZATION OF GENERATIVE ADVERSARIAL IMITATION LEARNING |
Authors | Anonymous |
Abstract | Generative Adversarial Imitation Learning (GAIL) is a powerful and practical approach for learning sequential decision-making policies. Different from Reinforcement Learning (RL), GAIL takes advantage of demonstration data by experts (e.g., humans), and learns both the policy and the reward function of the unknown environment. Despite significant empirical progress, the theory behind GAIL is still largely unknown. The major difficulty comes from the underlying temporal dependency of the demonstration data and the minimax computational formulation of GAIL without convex-concave structure. To bridge such a gap between theory and practice, this paper investigates the theoretical properties of GAIL. Specifically, we show: (1) For GAIL with general reward parameterization, generalization can be guaranteed as long as the class of reward functions is properly controlled; (2) When the reward is parameterized as a reproducing kernel function, GAIL can be efficiently solved by stochastic first-order optimization algorithms, which attain sublinear convergence to a stationary solution. To the best of our knowledge, these are the first results on statistical and computational guarantees of imitation learning with reward/policy function approximation. Numerical experiments are provided to support our analysis. |
Tasks | Decision Making, Imitation Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BJl-5pNKDB |
https://openreview.net/pdf?id=BJl-5pNKDB | |
PWC | https://paperswithcode.com/paper/on-computation-and-generalization-of-gener |
Repo | |
Framework | |
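For reference, the minimax formulation the abstract analyzes is of the standard GAIL type (Ho & Ermon, 2016), shown below; the paper's specific reward parameterizations (e.g., a reproducing kernel function class) plug into the inner maximization over the reward/discriminator class.

```latex
% Standard GAIL minimax objective, stated for reference only.
\min_{\pi}\ \max_{D}\;
\mathbb{E}_{(s,a)\sim\pi}\bigl[\log D(s,a)\bigr]
+ \mathbb{E}_{(s,a)\sim\pi_E}\bigl[\log\bigl(1 - D(s,a)\bigr)\bigr]
- \lambda\, H(\pi)
```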
Deep Ensembles: A Loss Landscape Perspective
Title | Deep Ensembles: A Loss Landscape Perspective |
Authors | Anonymous |
Abstract | Deep ensembles have been empirically shown to be a promising approach for improving the accuracy, uncertainty and out-of-distribution robustness of deep learning models. While deep ensembles were theoretically motivated by the bootstrap, non-bootstrap ensembles trained with just random initialization also perform well in practice, which suggests that there could be other explanations for why deep ensembles work well. Bayesian neural networks, which learn distributions over the parameters of the network, are theoretically well-motivated by Bayesian principles, but do not perform as well as deep ensembles in practice, particularly under dataset shift. One possible explanation for this gap between theory and practice is that popular scalable approximate Bayesian methods tend to focus on a single mode, whereas deep ensembles tend to explore diverse modes in function space. We investigate this hypothesis by building on recent work on understanding the loss landscape of neural networks and adding our own exploration to measure the similarity of functions in the space of predictions. Our results show that random initializations explore entirely different modes, while functions along an optimization trajectory or sampled from a subspace thereof cluster within a single mode in prediction space, even though they often deviate significantly in weight space. We demonstrate that while low-loss connectors between modes exist, they are not connected in the space of predictions. Developing the concept of the diversity–accuracy plane, we show that the decorrelation power of random initializations is unmatched by popular subspace sampling methods. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=r1xZAkrFPr |
https://openreview.net/pdf?id=r1xZAkrFPr | |
PWC | https://paperswithcode.com/paper/deep-ensembles-a-loss-landscape-perspective |
Repo | |
Framework | |
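The abstract's analysis hinges on comparing functions in the space of predictions rather than weights. One simple measure of that kind is the fraction of inputs on which two independently trained models agree; the sketch below uses placeholder prediction arrays rather than trained networks, and is not the specific diversity metric from the paper.

```python
# Sketch of a prediction-space similarity measure between ensemble members.
import numpy as np

def prediction_agreement(preds_a, preds_b):
    """Fraction of examples where two models predict the same class."""
    return float(np.mean(preds_a == preds_b))

rng = np.random.default_rng(0)
preds_model_1 = rng.integers(0, 10, size=1000)          # placeholder predictions
preds_model_2 = np.where(rng.random(1000) < 0.8,        # agrees ~80% of the time
                         preds_model_1, rng.integers(0, 10, size=1000))
print("agreement:", prediction_agreement(preds_model_1, preds_model_2))
```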