April 1, 2020

2766 words 13 mins read

Paper Group NANR 51

Understanding the Limitations of Variational Mutual Information Estimators. Regularizing Predictions via Class-wise Self-knowledge Distillation. Quantifying Exposure Bias for Neural Language Generation. Lazy-CFR: fast and near-optimal regret minimization for extensive games with imperfect information. Slow Thinking Enables Task-Uncertain Lifelong a …

Understanding the Limitations of Variational Mutual Information Estimators

Title Understanding the Limitations of Variational Mutual Information Estimators
Authors Anonymous
Abstract Variational approaches based on neural networks are showing promise for estimating mutual information (MI) between high-dimensional variables. However, they can be difficult to use in practice due to poorly understood bias/variance tradeoffs. We theoretically show that, under some conditions, estimators such as MINE exhibit variance that could grow exponentially with the true amount of underlying MI. We also empirically demonstrate that existing estimators fail to satisfy basic self-consistency properties of MI, such as data processing and additivity under independence. Based on a unified perspective of variational approaches, we develop a new estimator that focuses on variance reduction. Empirical results demonstrate that our proposed estimator exhibits improved bias-variance trade-offs on standard benchmark tasks.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=B1x62TNtDS
PDF https://openreview.net/pdf?id=B1x62TNtDS
PWC https://paperswithcode.com/paper/understanding-the-limitations-of-variational-1
Repo
Framework
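
As a concrete reference point for the variational bounds discussed in the abstract, here is a minimal numpy sketch (not the paper's estimator) that evaluates the Donsker-Varadhan (MINE-style) lower bound I(X;Y) >= E_p[f] - log E_{p(x)p(y)}[exp f] on correlated Gaussian data. The critic f is a fixed, hand-picked function standing in for the trained network MINE would use; the Monte Carlo log-sum-exp term is exactly where the variance issues highlighted above appear.

```python
import numpy as np

rng = np.random.default_rng(0)
n, rho = 5000, 0.8
x = rng.standard_normal(n)
y = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(n)

def critic(a, b):
    # optimal critic up to an additive constant for this Gaussian pair;
    # in MINE this would be a trained neural network
    return (2 * rho * a * b - rho**2 * (a**2 + b**2)) / (2 * (1 - rho**2))

joint = critic(x, y)                       # samples from p(x, y)
marginal = critic(x, rng.permutation(y))   # samples from p(x)p(y)
dv_bound = joint.mean() - np.log(np.mean(np.exp(marginal)))
print(f"DV bound estimate: {dv_bound:.3f}, true MI: {-0.5 * np.log(1 - rho**2):.3f}")
```

Re-running the last three lines with fresh permutations gives a feel for how noisy the exponential term makes the estimate, which is the bias/variance behavior the paper studies.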

Regularizing Predictions via Class-wise Self-knowledge Distillation

Title Regularizing Predictions via Class-wise Self-knowledge Distillation
Authors Anonymous
Abstract Deep neural networks with millions of parameters may suffer from poor generalization due to overfitting. To mitigate the issue, we propose a new regularization method that penalizes discrepancies between the predictive distributions of similar samples. In particular, we distill the predictive distribution between different samples of the same label and between augmented samples of the same source during training. In other words, we regularize the dark knowledge (i.e., the knowledge on wrong predictions) of a single network, i.e., a self-knowledge distillation technique, to force it to output more meaningful predictions. We demonstrate the effectiveness of the proposed method via experiments on various image classification tasks: it improves not only the generalization ability but also the calibration accuracy of modern neural networks.
Tasks Calibration, Image Classification
Published 2020-01-01
URL https://openreview.net/forum?id=BJluPerYvB
PDF https://openreview.net/pdf?id=BJluPerYvB
PWC https://paperswithcode.com/paper/regularizing-predictions-via-class-wise-self
Repo
Framework
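
A hedged PyTorch sketch of a class-wise self-distillation style regularizer, based on my reading of the abstract rather than the authors' exact loss: match the softened predictive distribution of a sample to that of another sample with the same label, with the second sample detached so it acts as the self-teacher. The function name, temperature, and weighting are assumptions.

```python
import torch
import torch.nn.functional as F

def cs_kd_loss(logits_a, logits_b, temperature=4.0):
    """KL(teacher preds of sample B || student preds of sample A).

    logits_a, logits_b: [batch, num_classes] logits for two batches whose
    i-th elements share the same ground-truth label; logits_b is detached.
    """
    log_p = F.log_softmax(logits_a / temperature, dim=1)
    q = F.softmax(logits_b.detach() / temperature, dim=1)
    return F.kl_div(log_p, q, reduction="batchmean") * temperature ** 2

# hypothetical usage alongside the usual classification loss:
# total_loss = F.cross_entropy(logits_a, labels) + lam * cs_kd_loss(logits_a, logits_b)
```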

Quantifying Exposure Bias for Neural Language Generation

Title Quantifying Exposure Bias for Neural Language Generation
Authors Anonymous
Abstract The exposure bias problem refers to the training-inference discrepancy caused by teacher forcing in maximum likelihood estimation (MLE) training for auto-regressive neural network language models (LMs). It has been regarded as a central problem for natural language generation (NLG) model training. Although many algorithms have been proposed to avoid teacher forcing and thereby alleviate exposure bias, there is little work showing how serious the exposure bias problem actually is. In this work, we first identify the auto-recovery ability of MLE-trained LMs, which casts doubt on the seriousness of exposure bias. We then develop a precise, quantifiable definition of exposure bias. According to our measurements in controlled experiments, there is only around a 3% performance gain when the training-inference discrepancy is completely removed. Our results suggest the exposure bias problem could be much less serious than it is currently assumed to be.
Tasks Text Generation
Published 2020-01-01
URL https://openreview.net/forum?id=rJg2fTNtwr
PDF https://openreview.net/pdf?id=rJg2fTNtwr
PWC https://paperswithcode.com/paper/quantifying-exposure-bias-for-neural-language-1
Repo
Framework
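
The training-inference discrepancy quantified in this paper is the gap between teacher-forced and free-running decoding. A minimal sketch of the two modes, with `model` and `bos_id` as hypothetical stand-ins for an autoregressive LM returning logits of shape [batch, seq, vocab]:

```python
import torch

def teacher_forced_logits(model, tokens):
    # condition each step on the *ground-truth* prefix (training-time behavior)
    return model(tokens[:, :-1])                      # predicts tokens[:, 1:]

@torch.no_grad()
def free_running_sample(model, bos_id, length):
    # condition each step on the model's *own* previous samples (inference-time)
    seq = torch.full((1, 1), bos_id, dtype=torch.long)
    for _ in range(length):
        probs = torch.softmax(model(seq)[:, -1], dim=-1)
        seq = torch.cat([seq, torch.multinomial(probs, 1)], dim=1)
    return seq
```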

Lazy-CFR: fast and near-optimal regret minimization for extensive games with imperfect information

Title Lazy-CFR: fast and near-optimal regret minimization for extensive games with imperfect information
Authors Anonymous
Abstract Counterfactual regret minimization (CFR) methods are effective for solving two-player zero-sum extensive games with imperfect information, with state-of-the-art results. However, vanilla CFR has to traverse the whole game tree in each round, which is time-consuming in large-scale games. In this paper, we present Lazy-CFR, a CFR algorithm that adopts a lazy update strategy to avoid traversing the whole game tree in each round. We prove that the regret of Lazy-CFR is almost the same as that of vanilla CFR, while Lazy-CFR only needs to visit a small portion of the game tree. Thus, Lazy-CFR is provably faster than CFR. Empirical results consistently show that Lazy-CFR is significantly faster than vanilla CFR.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=rJx4p3NYDB
PDF https://openreview.net/pdf?id=rJx4p3NYDB
PWC https://paperswithcode.com/paper/lazy-cfr-fast-and-near-optimal-regret-1
Repo
Framework
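
For context, the per-infoset computation that every CFR variant performs is regret matching; Lazy-CFR's contribution is deciding *when* infosets get updated, which the sketch below does not model. A minimal numpy version of the regret-matching step:

```python
import numpy as np

def regret_matching(cumulative_regret):
    """Map a vector of cumulative counterfactual regrets to a strategy."""
    positive = np.maximum(cumulative_regret, 0.0)
    total = positive.sum()
    if total > 0:
        return positive / total
    # no positive regret yet: fall back to the uniform strategy
    return np.full_like(cumulative_regret, 1.0 / len(cumulative_regret))
```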

Slow Thinking Enables Task-Uncertain Lifelong and Sequential Few-Shot Learning

Title Slow Thinking Enables Task-Uncertain Lifelong and Sequential Few-Shot Learning
Authors Anonymous
Abstract Lifelong machine learning focuses on adapting to novel tasks without forgetting old tasks, whereas few-shot learning strives to learn a single task given a small amount of data. Both research areas are crucial for artificial general intelligence, yet their existing studies rely on somewhat impractical assumptions at training time. For lifelong learning, the nature (or the quantity) of incoming tasks at inference time is assumed to be known at training time. For few-shot learning, it is commonly assumed that a large number of tasks is available during training. Humans, on the other hand, can perform these learning tasks without such assumptions. Inspired by how the human brain works, we propose a novel model, called Slow Thinking to Learn (STL), that makes sophisticated (and slightly slower) predictions by iteratively considering interactions between current and previously seen tasks at runtime. Experimental results demonstrate the effectiveness of STL in more realistic lifelong and few-shot learning settings.
Tasks Few-Shot Learning
Published 2020-01-01
URL https://openreview.net/forum?id=HklbIerFDS
PDF https://openreview.net/pdf?id=HklbIerFDS
PWC https://paperswithcode.com/paper/slow-thinking-enables-task-uncertain-lifelong
Repo
Framework

Empirical Studies on the Properties of Linear Regions in Deep Neural Networks

Title Empirical Studies on the Properties of Linear Regions in Deep Neural Networks
Authors Anonymous
Abstract A deep neural network (DNN) with piecewise linear activations can partition the input space into numerous small linear regions, where different linear functions are fitted. It is believed that the number of these regions represents the expressivity of the DNN. This paper provides a novel and meticulous perspective on DNNs: instead of just counting the number of linear regions, we study their local properties, such as the inspheres, the directions of the corresponding hyperplanes, the decision boundaries, and the relevance of the surrounding regions. We empirically observe that different optimization techniques lead to completely different linear regions, even though they result in similar classification accuracies. We hope our study can inspire the design of novel optimization techniques and help discover and analyze the behaviors of DNNs.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=SkeFl1HKwr
PDF https://openreview.net/pdf?id=SkeFl1HKwr
PWC https://paperswithcode.com/paper/empirical-studies-on-the-properties-of-linear
Repo
Framework
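
A hedged sketch of how linear regions can be probed empirically (a generic illustration, not the paper's measurements): two inputs lie in the same linear region of a ReLU network exactly when they induce the same activation pattern, so counting distinct patterns over random samples gives a crude lower bound on the number of regions hit.

```python
import numpy as np

rng = np.random.default_rng(0)
# a tiny random 2-16-16 ReLU network; weights are arbitrary for illustration
W1, b1 = rng.standard_normal((16, 2)), rng.standard_normal(16)
W2, b2 = rng.standard_normal((16, 16)), rng.standard_normal(16)

def activation_pattern(x):
    h1 = W1 @ x + b1
    h2 = W2 @ np.maximum(h1, 0) + b2
    return tuple((h1 > 0).astype(int)) + tuple((h2 > 0).astype(int))

samples = rng.uniform(-1, 1, size=(20000, 2))
patterns = {activation_pattern(x) for x in samples}
print(f"distinct linear regions hit by 20k samples: {len(patterns)}")
```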

State Alignment-based Imitation Learning

Title State Alignment-based Imitation Learning
Authors Anonymous
Abstract Consider an imitation learning problem in which the imitator and the expert have different dynamics models. Most existing imitation learning methods fail in this setting because they focus on imitating actions. We propose a novel state-alignment-based imitation learning method that trains the imitator to follow the state sequences in the expert demonstrations as closely as possible. The alignment of states comes from both local and global perspectives, and we combine the two in a reinforcement learning framework via a regularized policy update objective. We show the superiority of our method on standard imitation learning settings as well as the challenging setting in which the expert and the imitator have different dynamics models.
Tasks Imitation Learning
Published 2020-01-01
URL https://openreview.net/forum?id=rylrdxHFDr
PDF https://openreview.net/pdf?id=rylrdxHFDr
PWC https://paperswithcode.com/paper/state-alignment-based-imitation-learning
Repo
Framework
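
A heavily hedged sketch of the basic state-alignment idea, my reading of the abstract rather than the paper's local/global formulation: reward the imitator for producing next states close to the corresponding expert states, so that actions never enter the imitation signal.

```python
import numpy as np

def state_alignment_reward(imitator_next_state, expert_next_state, scale=1.0):
    # negative squared distance in state space (hypothetical reward shape);
    # the imitator's own dynamics determine how it reaches these states
    return -scale * float(np.sum((imitator_next_state - expert_next_state) ** 2))
```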

Influence-Based Multi-Agent Exploration

Title Influence-Based Multi-Agent Exploration
Authors Anonymous
Abstract Intrinsically motivated reinforcement learning aims to address the exploration challenge in sparse-reward tasks. However, the study of exploration methods in transition-dependent multi-agent settings is largely absent from the literature. We aim to take a step towards solving this problem. We present two exploration methods, exploration via information-theoretic influence (EITI) and exploration via decision-theoretic influence (EDTI), which exploit the role of interaction in the coordinated behaviors of agents. EITI uses mutual information to capture the influence of one agent’s behavior on the transition dynamics experienced by other agents. EDTI uses a novel intrinsic reward, called Value of Interaction (VoI), to characterize and quantify the influence of one agent’s behavior on the expected returns of other agents. By optimizing the EITI or EDTI objective as a regularizer, agents are encouraged to coordinate their exploration and learn policies that optimize team performance. We show how to optimize these regularizers so that they can be easily integrated with policy gradient reinforcement learning. The resulting update rule draws a connection between coordinated exploration and intrinsic reward distribution. Finally, we empirically demonstrate the significant strength of our method in a variety of multi-agent scenarios.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=BJgy96EYvr
PDF https://openreview.net/pdf?id=BJgy96EYvr
PWC https://paperswithcode.com/paper/influence-based-multi-agent-exploration-1
Repo
Framework
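
A minimal sketch of the generic building block behind an EITI-style objective, not the paper's estimator: a plug-in mutual information estimate computed from empirical visitation counts. In the paper the two discrete variables would relate one agent's state-action to another agent's transition; here the estimator is fully generic.

```python
import numpy as np

def plugin_mutual_information(counts):
    """counts[i, j] = number of times the pair (a=i, b=j) was observed."""
    p = counts / counts.sum()
    pa = p.sum(axis=1, keepdims=True)   # marginal over rows
    pb = p.sum(axis=0, keepdims=True)   # marginal over columns
    nonzero = p > 0
    return float(np.sum(p[nonzero] * np.log(p[nonzero] / (pa @ pb)[nonzero])))
```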

Explain Your Move: Understanding Agent Actions Using Focused Feature Saliency

Title Explain Your Move: Understanding Agent Actions Using Focused Feature Saliency
Authors Anonymous
Abstract As deep reinforcement learning (RL) is applied to more tasks, there is a need to visualize and understand the behavior of learned agents. Saliency maps explain agent behavior by highlighting the features of the input state that are most relevant for the agent in taking an action. Existing perturbation-based approaches to compute saliency often highlight regions of the input that are not relevant to the action taken by the agent. Our approach generates more focused saliency maps by balancing two aspects (specificity and relevance) that capture different desiderata of saliency. The first captures the impact of perturbation on the relative expected reward of the action to be explained. The second downweights irrelevant features that alter the relative expected rewards of actions other than the action to be explained. We compare our approach with existing approaches on agents trained to play board games (Chess and Go) and Atari games (Breakout, Pong and Space Invaders). We show through illustrative examples (Chess, Atari, Go), human studies (Chess), and automated evaluation methods (Chess) that our approach generates saliency maps that are more interpretable for humans than existing approaches.
Tasks Atari Games, Board Games
Published 2020-01-01
URL https://openreview.net/forum?id=SJgzLkBKPB
PDF https://openreview.net/pdf?id=SJgzLkBKPB
PWC https://paperswithcode.com/paper/explain-your-move-understanding-agent-actions
Repo
Framework
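
For orientation, here is a hedged sketch of the basic "perturb a feature, measure the change" loop that perturbation-based saliency builds on; `q_values` is a hypothetical callable mapping a state array to a vector of action values. The paper's actual contribution, balancing specificity and relevance, is not reproduced here.

```python
import numpy as np

def perturbation_saliency(q_values, state, action, noise=0.1, rng=None):
    rng = rng or np.random.default_rng(0)
    base = q_values(state)[action]
    saliency = np.zeros_like(state, dtype=float)
    for i in range(state.size):
        perturbed = state.astype(float)
        perturbed.flat[i] += noise * rng.standard_normal()
        # how much does perturbing feature i change the value of the chosen action?
        saliency.flat[i] = abs(base - q_values(perturbed)[action])
    return saliency
```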

Perception-Driven Curiosity with Bayesian Surprise

Title Perception-Driven Curiosity with Bayesian Surprise
Authors Anonymous
Abstract Intrinsic rewards in reinforcement learning provide a powerful algorithmic capability for agents to learn how to interact with their environment in a task-generic way. However, increased incentives for motivation can come at the cost of increased fragility to stochasticity. We introduce a method for computing an intrinsic reward for curiosity using metrics derived from sampling a latent variable model used to estimate dynamics. Ultimately, an estimate of the conditional probability of observed states is used as our intrinsic reward for curiosity. In our experiments, a video game agent uses our model to autonomously learn how to play Atari games using our curiosity reward in combination with extrinsic rewards from the game to achieve improved performance on games with sparse extrinsic rewards. When stochasticity is introduced in the environment, our method still demonstrates improved performance over the baseline.
Tasks Atari Games
Published 2020-01-01
URL https://openreview.net/forum?id=rJlBQkrFvr
PDF https://openreview.net/pdf?id=rJlBQkrFvr
PWC https://paperswithcode.com/paper/perception-driven-curiosity-with-bayesian
Repo
Framework
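
A minimal sketch of a Bayesian-surprise-style intrinsic reward, my reading of the abstract rather than the authors' latent variable model: the less probable the observed next state is under the learned dynamics model, the larger the curiosity bonus. `log_prob_next_state` is a hypothetical callable wrapping that model.

```python
def curiosity_bonus(log_prob_next_state, state, action, next_state, scale=1.0):
    # surprise = negative log-likelihood of the observed transition
    return -scale * log_prob_next_state(state, action, next_state)

# per step, the agent would optimize: r_total = r_extrinsic + curiosity_bonus(...)
```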

Striving for Simplicity in Off-Policy Deep Reinforcement Learning

Title Striving for Simplicity in Off-Policy Deep Reinforcement Learning
Authors Anonymous
Abstract This paper advocates the use of offline (batch) reinforcement learning (RL) to help (1) isolate the contributions of exploitation vs. exploration in off-policy deep RL, (2) improve reproducibility of deep RL research, and (3) facilitate the design of simpler deep RL algorithms. We propose an offline RL benchmark on Atari 2600 games comprising all of the replay data of a DQN agent. Using this benchmark, we demonstrate that recent off-policy deep RL algorithms, even when trained solely on logged DQN data, can outperform online DQN. We present Random Ensemble Mixture (REM), a simple Q-learning algorithm that enforces optimal Bellman consistency on random convex combinations of multiple Q-value estimates. The REM algorithm outperforms more complex RL agents such as C51 and QR-DQN on the offline Atari benchmark and performs comparably in the online setting.
Tasks Atari Games, Q-Learning
Published 2020-01-01
URL https://openreview.net/forum?id=ryeUg0VFwr
PDF https://openreview.net/pdf?id=ryeUg0VFwr
PWC https://paperswithcode.com/paper/striving-for-simplicity-in-off-policy-deep-1
Repo
Framework
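
A hedged PyTorch sketch of the Random Ensemble Mixture (REM) target described in the abstract: enforce Bellman consistency on a random convex combination of K Q-heads, using the same mixture for the online and target estimates. Tensor shapes and names are assumptions, not the authors' code.

```python
import torch
import torch.nn.functional as F

def rem_td_loss(q_heads, target_heads, actions, rewards, dones, gamma=0.99):
    """q_heads, target_heads: [K, batch, num_actions]; actions: [batch] long."""
    k = q_heads.shape[0]
    alpha = torch.rand(k)
    alpha = alpha / alpha.sum()                        # random convex combination
    mix = lambda heads: torch.einsum("k,kba->ba", alpha, heads)

    q = mix(q_heads).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target_q = mix(target_heads).max(dim=1).values
        target = rewards + gamma * (1.0 - dones.float()) * target_q
    return F.smooth_l1_loss(q, target)
```

A fresh mixture is drawn for every minibatch, so over training the heads are trained on many different convex combinations, which is what the abstract means by enforcing consistency on random mixtures.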

Harnessing Structures for Value-Based Planning and Reinforcement Learning

Title Harnessing Structures for Value-Based Planning and Reinforcement Learning
Authors Anonymous
Abstract Value-based methods constitute a fundamental methodology in planning and deep reinforcement learning (RL). In this paper, we propose to exploit the underlying structures of the state-action value function, i.e., Q function, for both planning and deep RL. In particular, if the underlying system dynamics lead to some global structures of the Q function, one should be capable of inferring the function better by leveraging such structures. Specifically, we investigate the low-rank structure, which is widespread in large data matrices. We verify empirically the existence of low-rank Q functions in the context of control and deep RL tasks (Atari games). As our key contribution, by leveraging Matrix Estimation (ME) techniques, we propose a general framework to exploit the underlying low-rank structure in Q functions, leading to a more efficient planning procedure for classical control, and additionally, a simple scheme that can be applied to any value-based RL technique to consistently achieve better performance on “low-rank” tasks. Extensive experiments on control tasks and Atari games confirm the efficacy of our approach.
Tasks Atari Games
Published 2020-01-01
URL https://openreview.net/forum?id=rklHqRVKvH
PDF https://openreview.net/pdf?id=rklHqRVKvH
PWC https://paperswithcode.com/paper/harnessing-structures-for-value-based-1
Repo
Framework
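
A minimal numpy sketch of the "exploit low-rank structure" idea: reconstruct a Q-table from a subsample of its entries via iterative truncated-SVD imputation. This stands in for the Matrix Estimation step; the integration into planning or deep RL is not shown, and the function names are assumptions.

```python
import numpy as np

def low_rank_complete(q_observed, mask, rank, iters=100):
    """q_observed: [num_states, num_actions]; mask: 1 where the entry is known."""
    q = np.where(mask, q_observed, 0.0)
    for _ in range(iters):
        # project onto rank-`rank` matrices, then restore the observed entries
        u, s, vt = np.linalg.svd(q, full_matrices=False)
        low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank]
        q = np.where(mask, q_observed, low_rank)
    return low_rank
```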

Learning Key Steps to Attack Deep Reinforcement Learning Agents

Title Learning Key Steps to Attack Deep Reinforcement Learning Agents
Authors Anonymous
Abstract Deep reinforcement learning agents are known to be vulnerable to adversarial attacks. In particular, recent studies have shown that attacking a few key steps is effective for decreasing the agent’s cumulative reward. However, all existing attacking methods find those key steps with human-designed heuristics, and it is not clear how more effective key steps can be identified. This paper introduces a novel reinforcement learning framework that learns more effective key steps through interacting with the agent. The proposed framework does not require any human heuristics or prior knowledge, and can be flexibly coupled with any white-box or black-box adversarial attack scenario. Experiments on benchmark Atari games across different scenarios demonstrate that the proposed framework is superior to existing methods at identifying more effective key steps.
Tasks Adversarial Attack, Atari Games
Published 2020-01-01
URL https://openreview.net/forum?id=Hkxbz1HKvr
PDF https://openreview.net/pdf?id=Hkxbz1HKvr
PWC https://paperswithcode.com/paper/learning-key-steps-to-attack-deep
Repo
Framework

ON COMPUTATION AND GENERALIZATION OF GENERATIVE ADVERSARIAL IMITATION LEARNING

Title ON COMPUTATION AND GENERALIZATION OF GENERATIVE ADVERSARIAL IMITATION LEARNING
Authors Anonymous
Abstract Generative Adversarial Imitation Learning (GAIL) is a powerful and practical approach for learning sequential decision-making policies. Different from Reinforcement Learning (RL), GAIL takes advantage of demonstration data by experts (e.g., humans), and learns both the policy and reward function of the unknown environment. Despite the significant empirical progress, the theory behind GAIL is still largely unknown. The major difficulty comes from the underlying temporal dependency of the demonstration data and the minimax computational formulation of GAIL without convex-concave structure. To bridge such a gap between theory and practice, this paper investigates the theoretical properties of GAIL. Specifically, we show: (1) For GAIL with general reward parameterization, generalization can be guaranteed as long as the class of reward functions is properly controlled; (2) For GAIL where the reward is parameterized as a reproducing kernel function, GAIL can be efficiently solved by stochastic first-order optimization algorithms, which attain sublinear convergence to a stationary solution. To the best of our knowledge, these are the first results on statistical and computational guarantees of imitation learning with reward/policy function approximation. Numerical experiments are provided to support our analysis.
Tasks Decision Making, Imitation Learning
Published 2020-01-01
URL https://openreview.net/forum?id=BJl-5pNKDB
PDF https://openreview.net/pdf?id=BJl-5pNKDB
PWC https://paperswithcode.com/paper/on-computation-and-generalization-of-gener
Repo
Framework
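
As background for the minimax formulation analyzed here, a hedged PyTorch sketch of the standard GAIL objective: a discriminator `D` (a hypothetical module mapping state-action features to a logit) separates expert from policy data, and the policy is rewarded for fooling it. This is the generic GAIL setup, not the paper's kernel parameterization.

```python
import torch
import torch.nn.functional as F

def discriminator_loss(D, expert_sa, policy_sa):
    # push D's logits up on expert pairs and down on policy pairs
    expert_logits, policy_logits = D(expert_sa), D(policy_sa)
    return (F.binary_cross_entropy_with_logits(expert_logits, torch.ones_like(expert_logits))
            + F.binary_cross_entropy_with_logits(policy_logits, torch.zeros_like(policy_logits)))

def gail_reward(D, policy_sa):
    # -log(1 - sigmoid(logit)) == softplus(logit): a common surrogate reward
    # fed to the RL step that updates the policy
    return F.softplus(D(policy_sa))
```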

Deep Ensembles: A Loss Landscape Perspective

Title Deep Ensembles: A Loss Landscape Perspective
Authors Anonymous
Abstract Deep ensembles have been empirically shown to be a promising approach for improving accuracy, uncertainty and out-of-distribution robustness of deep learning models. While deep ensembles were theoretically motivated by the bootstrap, non-bootstrap ensembles trained with just random initialization also perform well in practice, which suggests that there could be other explanations for why deep ensembles work well. Bayesian neural networks, which learn distributions over the parameters of the network, are theoretically well-motivated by Bayesian principles, but do not perform as well as deep ensembles in practice, particularly under dataset shift. One possible explanation for this gap between theory and practice is that popular scalable approximate Bayesian methods tend to focus on a single mode, whereas deep ensembles tend to explore diverse modes in function space. We investigate this hypothesis by building on recent work on understanding the loss landscape of neural networks and adding our own exploration to measure the similarity of functions in the space of predictions. Our results show that random initializations explore entirely different modes, while functions along an optimization trajectory, or sampled from a subspace thereof, cluster within a single mode in prediction space even though they often deviate significantly in weight space. We demonstrate that while low-loss connectors between modes exist, they are not connected in the space of predictions. Developing the concept of the diversity–accuracy plane, we show that the decorrelation power of random initializations is unmatched by popular subspace sampling methods.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=r1xZAkrFPr
PDF https://openreview.net/pdf?id=r1xZAkrFPr
PWC https://paperswithcode.com/paper/deep-ensembles-a-loss-landscape-perspective
Repo
Framework
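
A minimal sketch of the kind of function-space diversity measurement the abstract describes: the disagreement rate between the predictions of two ensemble members on held-out inputs. The logits are assumed to come from two independently initialized and trained networks, which are not shown here.

```python
import torch

def prediction_disagreement(logits_a, logits_b):
    """Fraction of inputs on which two ensemble members predict different classes.

    logits_a, logits_b: [num_examples, num_classes] logits on the same inputs.
    """
    return (logits_a.argmax(dim=1) != logits_b.argmax(dim=1)).float().mean().item()
```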