April 1, 2020

2903 words 14 mins read

Paper Group NANR 82

Maximum Likelihood Constraint Inference for Inverse Reinforcement Learning. Learning Reusable Options for Multi-Task Reinforcement Learning. AE-OT: A NEW GENERATIVE MODEL BASED ON EXTENDED SEMI-DISCRETE OPTIMAL TRANSPORT. Improving Exploration of Deep Reinforcement Learning using Planning for Policy Search. Off-policy Bandits with Deficient Support …

Maximum Likelihood Constraint Inference for Inverse Reinforcement Learning

Title Maximum Likelihood Constraint Inference for Inverse Reinforcement Learning
Authors Anonymous
Abstract While most approaches to the problem of Inverse Reinforcement Learning (IRL) focus on estimating a reward function that best explains an expert agent’s policy or demonstrated behavior on a control task, it is often the case that such behavior is more succinctly represented by a simple reward combined with a set of hard constraints. In this setting, the agent is attempting to maximize cumulative rewards subject to these given constraints on their behavior. We reformulate the problem of IRL on Markov Decision Processes (MDPs) such that, given a nominal model of the environment and a nominal reward function, we seek to estimate state, action, and feature constraints in the environment that motivate an agent’s behavior. Our approach is based on the Maximum Entropy IRL framework, which allows us to reason about the likelihood of an expert agent’s demonstrations given our knowledge of an MDP. Using our method, we can infer which constraints can be added to the MDP to most increase the likelihood of observing these demonstrations. We present an algorithm which iteratively infers the Maximum Likelihood Constraint to best explain observed behavior, and we evaluate its efficacy using both simulated behavior and recorded data of humans navigating around an obstacle.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=BJliakStvH
PDF https://openreview.net/pdf?id=BJliakStvH
PWC https://paperswithcode.com/paper/maximum-likelihood-constraint-inference-for-1
Repo
Framework
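
As a rough illustration of the greedy selection described in the abstract, the sketch below scores candidate constraints by how much forbidding them would raise the likelihood of the demonstrations under a renormalised trajectory distribution. It is a simplified stand-in for the paper's MaxEnt-based algorithm; all data and names are illustrative.

```python
# Hypothetical sketch: greedy maximum-likelihood constraint selection.
# Assumes trajectories are just lists of visited states and that demonstrations
# never visit a truly constrained state; data and names are illustrative only.
import math

def constraint_gain(candidate, nominal_trajs, demo_trajs):
    """Gain in demo log-likelihood from forbidding `candidate`, assuming the
    trajectory distribution is simply renormalised over trajectories that
    avoid the candidate state."""
    if any(candidate in traj for traj in demo_trajs):
        return -math.inf                      # would make the demos impossible
    p_violate = sum(candidate in traj for traj in nominal_trajs) / len(nominal_trajs)
    if p_violate >= 1.0:
        return -math.inf
    # Each demo's probability is divided by (1 - p_violate) after renormalising.
    return -len(demo_trajs) * math.log(1.0 - p_violate)

def most_likely_constraint(candidates, nominal_trajs, demo_trajs):
    return max(candidates, key=lambda c: constraint_gain(c, nominal_trajs, demo_trajs))

# Toy example: nominal behaviour often cuts through state "B", demos never do.
nominal = [["A", "B", "G"], ["A", "B", "G"], ["A", "C", "G"]]
demos = [["A", "C", "G"], ["A", "C", "G"]]
print(most_likely_constraint({"B", "C", "D"}, nominal, demos))  # -> "B"
```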

Learning Reusable Options for Multi-Task Reinforcement Learning

Title Learning Reusable Options for Multi-Task Reinforcement Learning
Authors Anonymous
Abstract Reinforcement learning (RL) has become an increasingly active area of research in recent years. Although there are many algorithms that allow an agent to solve tasks efficiently, they often ignore the possibility that prior experience related to the task at hand might be available. For many practical applications, it might be infeasible for an agent to learn how to solve a task from scratch, given that it is generally a computationally expensive process; however, prior experience could be leveraged to make these problems tractable in practice. In this paper, we propose a framework for exploiting existing experience by learning reusable options. We show that after an agent learns policies for solving a small number of problems, we are able to use the trajectories generated from those policies to learn reusable options that allow an agent to quickly learn how to solve novel and related problems.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=r1lkKn4KDS
PDF https://openreview.net/pdf?id=r1lkKn4KDS
PWC https://paperswithcode.com/paper/learning-reusable-options-for-multi-task
Repo
Framework
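
The option abstraction the abstract builds on is the standard initiation-set / intra-option-policy / termination triple from the options framework. The sketch below shows that interface only, not the paper's option-learning procedure; all names are illustrative.

```python
# Minimal sketch of the standard options framework (Sutton et al., 1999) that
# "reusable options" would instantiate; the option-learning step itself is not
# reproduced here.
from dataclasses import dataclass
from typing import Callable, Hashable, Set
import random

State = Hashable
Action = Hashable

@dataclass
class Option:
    initiation: Set[State]                 # states where the option may start
    policy: Callable[[State], Action]      # intra-option policy
    termination: Callable[[State], float]  # probability of terminating in a state

def run_option(option: Option, state: State, step: Callable[[State, Action], State]) -> State:
    """Execute an option in an environment given by `step` until it terminates."""
    assert state in option.initiation
    while True:
        state = step(state, option.policy(state))
        if random.random() < option.termination(state):
            return state
```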

AE-OT: A NEW GENERATIVE MODEL BASED ON EXTENDED SEMI-DISCRETE OPTIMAL TRANSPORT

Title AE-OT: A NEW GENERATIVE MODEL BASED ON EXTENDED SEMI-DISCRETE OPTIMAL TRANSPORT
Authors Anonymous
Abstract Generative adversarial networks (GANs) have attracted huge attention due to their capability to generate visually realistic images. However, most existing models suffer from mode collapse or mode mixture problems. In this work, we give a theoretical explanation of both problems via Figalli's regularity theory of optimal transportation maps. Essentially, the generator computes the transportation map between the white noise distribution and the data distribution, which is in general discontinuous. However, DNNs can only represent continuous maps. This intrinsic conflict induces mode collapse and mode mixture. To tackle both problems, we explicitly separate the manifold embedding and the optimal transportation: the first part is carried out by an autoencoder that maps the images onto the latent space; the second part is accomplished by a GPU-based convex optimization that finds the discontinuous transportation maps. Composing the extended OT map and the decoder, we can generate new images from white noise. This AE-OT model avoids representing discontinuous maps with DNNs and therefore effectively prevents mode collapse and mode mixture.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=HkldyTNYwH
PDF https://openreview.net/pdf?id=HkldyTNYwH
PWC https://paperswithcode.com/paper/ae-ot-a-new-generative-model-based-on
Repo
Framework
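
The semi-discrete optimal transport step the abstract refers to can be sketched as stochastic ascent on the Kantorovich dual between a continuous noise distribution and a fixed set of latent codes. The snippet below is an illustrative stand-in, not the paper's GPU implementation, and it omits the extension of the OT map beyond the nearest-code assignment.

```python
# Illustrative sketch of semi-discrete optimal transport between Gaussian noise
# and a fixed set of latent codes, via stochastic ascent on the Kantorovich dual.
import numpy as np

rng = np.random.default_rng(0)
codes = rng.normal(size=(64, 8))                 # stand-in latent codes from an autoencoder
weights = np.full(len(codes), 1.0 / len(codes))  # target measure on the codes
h = np.zeros(len(codes))                         # dual potentials, one per code

for it in range(2000):
    x = rng.normal(size=(256, 8))                # samples from the noise distribution
    # Power-diagram assignment: each sample goes to the code minimising
    # 0.5 * ||x - y_i||^2 - h_i.
    cost = 0.5 * ((x[:, None, :] - codes[None, :, :]) ** 2).sum(-1) - h[None, :]
    cell = cost.argmin(axis=1)
    freq = np.bincount(cell, minlength=len(codes)) / len(x)
    h += 0.1 * (weights - freq)                  # ascent step on the dual objective

def ot_map(x):
    """Transport a noise sample to its assigned latent code (a discontinuous map)."""
    cost = 0.5 * ((x - codes) ** 2).sum(-1) - h
    return codes[cost.argmin()]
```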

Improving Exploration of Deep Reinforcement Learning using Planning for Policy Search

Title Improving Exploration of Deep Reinforcement Learning using Planning for Policy Search
Authors Anonymous
Abstract Most Deep Reinforcement Learning methods perform local search and are therefore prone to getting stuck in non-optimal solutions. Furthermore, in simulation-based training, such as domain-randomized simulation training, the availability of a simulation model is not exploited, which potentially decreases efficiency. To overcome the issues of local search and exploit access to simulation models, we propose using kino-dynamic planning methods as part of a model-based reinforcement learning method and learning in an off-policy fashion from solved planning instances. We show that, even on a simple toy domain, D-RL methods (DDPG, PPO, SAC) are not immune to local optima and require additional exploration mechanisms. We show that our planning method exhibits better state-space coverage and collects data that allows for better policies than D-RL methods without additional exploration mechanisms, and that starting from the planner data and performing additional training results in policies as good as or better than vanilla D-RL methods, while also producing data that is better suited for re-use in modified tasks.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=rJe7CkrFvS
PDF https://openreview.net/pdf?id=rJe7CkrFvS
PWC https://paperswithcode.com/paper/improving-exploration-of-deep-reinforcement
Repo
Framework
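
The off-policy reuse of planner data described in the abstract follows a generic pattern: planning instances are solved offline, their transitions are stored, and a value-based learner trains on them. The toy sketch below illustrates that pattern only, with the kinodynamic planner stubbed out; it is not the paper's method.

```python
# Sketch of the generic pattern: transitions obtained by solving planning
# instances are stored and re-used for off-policy (here: tabular Q-learning)
# training. Everything is illustrative; the planner is a stub.
import random
from collections import defaultdict

replay = []                                      # (s, a, r, s_next) tuples

def solve_planning_instance(start, goal):
    """Stand-in for a kinodynamic planner returning a feasible trajectory."""
    return [(start, "right", -1.0, goal)]        # toy single-step plan

# 1) Collect planner solutions offline.
for _ in range(100):
    replay.extend(solve_planning_instance(start=0, goal=1))

# 2) Learn off-policy from the planner data.
Q = defaultdict(float)
gamma, alpha = 0.99, 0.1
for _ in range(1000):
    s, a, r, s_next = random.choice(replay)
    target = r + gamma * max(Q[(s_next, b)] for b in ("left", "right"))
    Q[(s, a)] += alpha * (target - Q[(s, a)])
```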

Off-policy Bandits with Deficient Support

Title Off-policy Bandits with Deficient Support
Authors Anonymous
Abstract Off-policy training of contextual-bandit policies is attractive in online systems (e.g. search, recommendation, ad placement), since it enables the reuse of large amounts of log data from the production system. State-of-the-art methods for off-policy learning, however, are based on inverse propensity score (IPS) weighting, which requires that the logging policy chooses all actions with non-zero probability for any context (i.e., full support). In real-world systems, this condition is often violated, and we show that existing off-policy learning methods based on IPS weighting can fail catastrophically. We therefore develop new off-policy contextual-bandit methods that can controllably and robustly learn even when the logging policy has deficient support. To this end, we explore three approaches that provide various guarantees for safe learning despite the inherent limitations of support-deficient data: restricting the action space, reward extrapolation, and restricting the policy space. We analyze the statistical and computational properties of these three approaches, and empirically evaluate their effectiveness in a series of experiments. We find that controlling the policy space is computationally efficient and robustly leads to accurate policies.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=SklcyJBtvB
PDF https://openreview.net/pdf?id=SklcyJBtvB
PWC https://paperswithcode.com/paper/off-policy-bandits-with-deficient-support
Repo
Framework
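
The IPS estimator the abstract starts from has a standard form, and its failure under deficient support is easy to see on a toy log. The sketch below illustrates that estimator only; the paper's three remedies are merely mentioned in a comment.

```python
# The standard inverse propensity score (IPS) estimator the abstract builds on,
# plus a minimal illustration of deficient support.
import numpy as np

def ips_value(rewards, logging_probs, target_probs):
    """V_hat(pi) = (1/n) * sum_i r_i * pi(a_i | x_i) / pi0(a_i | x_i)."""
    return float(np.mean(rewards * target_probs / logging_probs))

# Toy log in which the logging policy only ever plays action 0, so action 1
# has zero support and never appears in the data.
rewards       = np.array([1.0, 0.0, 1.0, 1.0])   # rewards of the logged action 0
logging_probs = np.array([1.0, 1.0, 1.0, 1.0])   # pi0(a_i | x_i)
target_probs  = np.array([0.5, 0.5, 0.5, 0.5])   # pi(a_i | x_i): the target policy
                                                  # puts half its mass on the
                                                  # unsupported action 1
print(ips_value(rewards, logging_probs, target_probs))   # 0.375
# The estimate silently ignores whatever reward the unsupported action would
# earn; the remedies studied in the paper (restricting the action space, reward
# extrapolation, restricting the policy space) target exactly this gap.
```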

Composing Task-Agnostic Policies with Deep Reinforcement Learning

Title Composing Task-Agnostic Policies with Deep Reinforcement Learning
Authors Anonymous
Abstract The composition of elementary behaviors to solve challenging transfer learning problems is one of the key elements in building intelligent machines. To date, there has been plenty of work on learning task-specific policies or skills, but almost no focus on composing necessary, task-agnostic skills to find a solution to new problems. In this paper, we propose a novel deep reinforcement learning-based skill transfer and composition method that uses the agent's primitive policies to solve unseen tasks. We evaluate our method in difficult cases where training a policy through standard reinforcement learning (RL) or even hierarchical RL is either not feasible or exhibits high sample complexity. We show that our method not only transfers skills to new problem settings but also solves challenging environments requiring both task planning and motion control with high data efficiency.
Tasks Transfer Learning
Published 2020-01-01
URL https://openreview.net/forum?id=H1ezFREtwH
PDF https://openreview.net/pdf?id=H1ezFREtwH
PWC https://paperswithcode.com/paper/composing-task-agnostic-policies-with-deep
Repo
Framework

Making Sense of Reinforcement Learning and Probabilistic Inference

Title Making Sense of Reinforcement Learning and Probabilistic Inference
Authors Anonymous
Abstract Reinforcement learning (RL) combines a control problem with statistical estimation: the system dynamics are not known to the agent, but can be learned through experience. A recent line of research casts 'RL as inference' and suggests a particular framework to generalize the RL problem as probabilistic inference. Our paper surfaces key shortcomings in that approach, and clarifies the sense in which RL can be coherently cast as an inference problem. In particular, an RL agent must consider the effects of its actions upon future rewards and observations: the exploration-exploitation tradeoff. In all but the simplest settings, the resulting inference is computationally intractable, so practical RL algorithms must resort to approximation. We show that the popular 'RL as inference' approximation can perform poorly in even the simplest settings. Despite this, we demonstrate that with a small modification the RL as inference framework can provably perform well, and we connect the resulting algorithm with Thompson sampling and the recently proposed K-learning algorithm.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=S1xitgHtvS
PDF https://openreview.net/pdf?id=S1xitgHtvS
PWC https://paperswithcode.com/paper/making-sense-of-reinforcement-learning-and
Repo
Framework
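
Thompson sampling, which the abstract connects its corrected algorithm to, can be stated in a few lines for a Bernoulli bandit. The sketch below is that classical baseline only, not K-learning or the paper's modified objective.

```python
# Thompson sampling on a two-armed Bernoulli bandit: sample one plausible world
# from the posterior, act greedily in it, and update the posterior.
import numpy as np

rng = np.random.default_rng(0)
true_means = [0.3, 0.7]                    # unknown to the agent
alpha = np.ones(2)                         # Beta posterior parameters per arm
beta = np.ones(2)

for t in range(1000):
    theta = rng.beta(alpha, beta)          # posterior sample per arm
    arm = int(np.argmax(theta))            # act greedily in the sampled world
    reward = rng.random() < true_means[arm]
    alpha[arm] += reward
    beta[arm] += 1 - reward

print(alpha / (alpha + beta))              # posterior means; the better arm dominates
```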

Attention Privileged Reinforcement Learning for Domain Transfer

Title Attention Privileged Reinforcement Learning for Domain Transfer
Authors Anonymous
Abstract Applying reinforcement learning (RL) to physical systems presents notable challenges, given requirements on sample efficiency, safety, and physical constraints compared to simulated environments. To enable transfer of policies trained in simulation, randomising simulation parameters leads to more robust policies, but also results in significantly extended training time. In this paper, we exploit access to privileged information (such as environment states) often available in simulation, in order to improve and accelerate learning over randomised environments. We introduce Attention Privileged Reinforcement Learning (APRiL), which equips the agent with an attention mechanism and makes use of state information in simulation, learning to align attention between state- and image-based policies while additionally sharing generated data. During deployment, we apply only the image-based policy, removing the requirement of access to additional information. We experimentally demonstrate accelerated and more robust learning on a number of diverse domains, leading to improved final performance for environments both within and outside the training distribution.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=HygW26VYwS
PDF https://openreview.net/pdf?id=HygW26VYwS
PWC https://paperswithcode.com/paper/attention-privileged-reinforcement-learning
Repo
Framework

Relation-based Generalized Zero-shot Classification with the Domain Discriminator on the shared representation

Title Relation-based Generalized Zero-shot Classification with the Domain Discriminator on the shared representation
Authors Anonymous
Abstract Generalized zero-shot learning (GZSL) is the task of predicting a test image from seen or unseen classes using pre-defined class attributes and images from the seen classes. Typical ZSL models assign the class corresponding to the most relevant attribute as the predicted label of the test image, based on the learned relation between the attribute and the image. However, this relation-based approach presents a difficulty: many test images are predicted with a bias toward the seen domain, i.e., the domain bias problem. Recently, many methods have addressed this difficulty using a synthesis-based approach, which, however, requires generating large amounts of high-quality unseen-class images after training, plus additional training of a classifier on them. In this study, we therefore aim to alleviate this difficulty within the relation-based approach. First, we consider the requirements for good performance in the ZSL setting and introduce a new model based on a variational autoencoder that learns to embed attributes and images into a shared representation space satisfying those requirements. Next, we assume that the domain bias problem in GZSL derives from a situation in which the embedding of the unseen domain overlaps that of the seen one. We introduce a discriminator that distinguishes domains in the shared space and is learned jointly with the embedding model to prevent this situation. After training, the discriminator provides prior knowledge of which domain is more likely to be embedded at any point in the shared space. We propose combining this knowledge with relation-based classification on the shared embedding space, as a mixture model that compensates class prediction. Experimental results confirm that the proposed method significantly alleviates the domain bias problem in relation-based settings and achieves accuracy nearly equal to that of high-cost synthesis-based methods.
Tasks Zero-Shot Learning
Published 2020-01-01
URL https://openreview.net/forum?id=BJl8ZlHFwr
PDF https://openreview.net/pdf?id=BJl8ZlHFwr
PWC https://paperswithcode.com/paper/relation-based-generalized-zero-shot
Repo
Framework
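
The prediction-time mixture described in the abstract (relation scores reweighted by a domain discriminator) might look roughly like the hypothetical sketch below; the specific score function, embeddings, and discriminator output are assumptions for illustration, not the paper's model.

```python
# Hypothetical sketch of prediction with a domain-discriminator prior: relation
# scores between an embedded test image and class embeddings are reweighted by
# the discriminator's estimate P(seen | z). The embedding networks and the
# discriminator are assumed to be pre-trained elsewhere.
import numpy as np

def predict(z, class_embeddings, seen_mask, p_seen):
    """z: embedded test image; class_embeddings: one embedding per class;
    seen_mask: boolean per class; p_seen: discriminator output P(seen | z)."""
    # Relation-based scores (here, simply a softmax over negative distances).
    d = np.linalg.norm(class_embeddings - z, axis=1)
    scores = np.exp(-d) / np.exp(-d).sum()
    # Mixture with the domain prior to counteract the seen-class bias.
    prior = np.where(seen_mask, p_seen, 1.0 - p_seen)
    return int(np.argmax(scores * prior))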

Robust saliency maps with distribution-preserving decoys

Title Robust saliency maps with distribution-preserving decoys
Authors Anonymous
Abstract Saliency methods help to make deep neural network predictions more interpretable by identifying particular features, such as pixels in an image, that contribute most strongly to the network's prediction. Unfortunately, recent evidence suggests that many saliency methods perform poorly when gradients are saturated or in the presence of strong inter-feature dependence or noise injected by an adversarial attack. In this work, we propose a data-driven technique that uses distribution-preserving decoys to infer robust saliency scores in conjunction with a pre-trained convolutional neural network classifier and any off-the-shelf saliency method. We formulate the generation of decoys as an optimization problem, potentially applicable to any convolutional network architecture. We also propose a novel decoy-enhanced saliency score, which provably compensates for gradient saturation and considers joint activation patterns of pixels in a single-layer convolutional neural network. Empirical results on the ImageNet data set using three different deep neural network architectures (VGGNet, AlexNet, and ResNet) show both qualitatively and quantitatively that decoy-enhanced saliency scores outperform raw scores produced by three existing saliency methods.
Tasks Adversarial Attack
Published 2020-01-01
URL https://openreview.net/forum?id=Syl89aNYwS
PDF https://openreview.net/pdf?id=Syl89aNYwS
PWC https://paperswithcode.com/paper/robust-saliency-maps-with-distribution
Repo
Framework

Implicit Rugosity Regularization via Data Augmentation

Title Implicit Rugosity Regularization via Data Augmentation
Authors Anonymous
Abstract Deep (neural) networks have been applied productively in a wide range of supervised and unsupervised learning tasks. Unlike classical machine learning algorithms, deep networks typically operate in the overparameterized regime, where the number of parameters is larger than the number of training data points. Consequently, understanding the generalization properties and the role of (explicit or implicit) regularization in these networks is of great importance. In this work, we explore how the oft-used heuristic of data augmentation imposes an implicit regularization penalty on a novel measure of rugosity, or "roughness", based on the tangent Hessian of the function fit to the training data.
Tasks Data Augmentation
Published 2020-01-01
URL https://openreview.net/forum?id=HJg4qxSKPB
PDF https://openreview.net/pdf?id=HJg4qxSKPB
PWC https://paperswithcode.com/paper/implicit-rugosity-regularization-via-data
Repo
Framework

Infinite-Horizon Differentiable Model Predictive Control

Title Infinite-Horizon Differentiable Model Predictive Control
Authors Anonymous
Abstract This paper proposes a differentiable linear quadratic Model Predictive Control (MPC) framework for safe imitation learning. The infinite-horizon cost is enforced using a terminal cost function obtained from the discrete-time algebraic Riccati equation (DARE), so that the learned controller can be proven to be stabilizing in closed-loop. A central contribution is the derivation of the analytical derivative of the solution of the DARE, thereby allowing the use of differentiation-based learning methods. A further contribution is the structure of the MPC optimization problem: an augmented Lagrangian method ensures that the MPC optimization is feasible throughout training whilst enforcing hard constraints on state and input, and a pre-stabilizing controller ensures that the MPC solution and derivatives are accurate at each iteration. The learning capabilities of the framework are demonstrated in a set of numerical studies.
Tasks Imitation Learning
Published 2020-01-01
URL https://openreview.net/forum?id=ryxC6kSYPr
PDF https://openreview.net/pdf?id=ryxC6kSYPr
PWC https://paperswithcode.com/paper/infinite-horizon-differentiable-model
Repo
Framework
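
The terminal cost from the discrete-time algebraic Riccati equation mentioned in the abstract can be computed directly with SciPy; the sketch below shows the DARE solution and the associated stabilising gain for a toy linear system. Differentiating through the DARE, the paper's central contribution, is not reproduced here.

```python
# The DARE solution P gives the infinite-horizon terminal penalty x_N^T P x_N,
# and the associated LQR gain is a natural pre-stabilising controller. This is
# a plain SciPy computation, not the paper's differentiable implementation.
import numpy as np
from scipy.linalg import solve_discrete_are

A = np.array([[1.0, 0.1],
              [0.0, 1.0]])                 # double-integrator-like dynamics
B = np.array([[0.0],
              [0.1]])
Q = np.eye(2)                              # state cost
R = np.eye(1)                              # input cost

P = solve_discrete_are(A, B, Q, R)         # terminal weight for the MPC cost
K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)   # u = -K x stabilises (A, B)

print(np.max(np.abs(np.linalg.eigvals(A - B @ K))))  # spectral radius < 1
```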

Improved Sample Complexities for Deep Neural Networks and Robust Classification via an All-Layer Margin

Title Improved Sample Complexities for Deep Neural Networks and Robust Classification via an All-Layer Margin
Authors Anonymous
Abstract For linear classifiers, the relationship between (normalized) output margin and generalization is captured in a clear and simple bound: a large output margin implies good generalization. Unfortunately, for deep models, this relationship is less clear: existing analyses of the output margin give complicated bounds which sometimes depend exponentially on depth. In this work, we propose to instead analyze a new notion of margin, which we call the "all-layer margin." Our analysis reveals that the all-layer margin has a clear and direct relationship with generalization for deep models. We present three concrete applications of the all-layer margin: 1) by analyzing the all-layer margin, we obtain tighter generalization bounds for neural nets which depend on Jacobian and hidden-layer norms and remove the exponential dependency on depth; 2) our neural net results easily translate to the adversarially robust setting, giving the first direct analysis of robust test error for deep networks; and 3) we present a theoretically inspired training algorithm for increasing the all-layer margin and demonstrate that it improves test performance over strong baselines in practice.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=HJe_yR4Fwr
PDF https://openreview.net/pdf?id=HJe_yR4Fwr
PWC https://paperswithcode.com/paper/improved-sample-complexities-for-deep-neural
Repo
Framework
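
For context, the normalized output margin of a linear classifier mentioned in the abstract's opening sentence is simply y * <w, x> / ||w||; a small sketch follows. The all-layer margin itself, defined layer by layer, is not reproduced here.

```python
# The normalized output margin for a linear classifier, the quantity the
# abstract contrasts with its all-layer margin.
import numpy as np

def output_margins(w, X, y):
    """y in {-1, +1}; a larger minimum margin implies better generalization bounds."""
    return y * (X @ w) / np.linalg.norm(w)

w = np.array([2.0, -1.0])
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1, -1, 1])
print(output_margins(w, X, y).min())       # the (normalized) margin of the dataset
```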

The asymptotic spectrum of the Hessian of DNN throughout training

Title The asymptotic spectrum of the Hessian of DNN throughout training
Authors Anonymous
Abstract The dynamics of DNNs during gradient descent is described by the so-called Neural Tangent Kernel (NTK). In this article, we show that the NTK allows one to gain precise insight into the Hessian of the cost of DNNs: we obtain a full characterization of the asymptotics of the spectrum of the Hessian, at initialization and during training.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=SkgscaNYPS
PDF https://openreview.net/pdf?id=SkgscaNYPS
PWC https://paperswithcode.com/paper/the-asymptotic-spectrum-of-the-hessian-of-dnn-1
Repo
Framework

Padé Activation Units: End-to-end Learning of Flexible Activation Functions in Deep Networks

Title Padé Activation Units: End-to-end Learning of Flexible Activation Functions in Deep Networks
Authors Anonymous
Abstract The performance of deep network learning strongly depends on the choice of the non-linear activation function associated with each neuron. However, deciding on the best activation is non-trivial, and the choice depends on the architecture, hyper-parameters, and even on the dataset. Typically these activations are fixed by hand before training. Here, we demonstrate how to eliminate the reliance on first picking fixed activation functions by using flexible parametric rational functions instead. The resulting Padé Activation Units (PAUs) can both approximate common activation functions and learn new ones while providing compact representations. Our empirical evidence shows that learning deep networks end-to-end with PAUs can increase predictive performance. Moreover, PAUs pave the way to approximations with provable robustness.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=BJlBSkHtDS
PDF https://openreview.net/pdf?id=BJlBSkHtDS
PWC https://paperswithcode.com/paper/pade-activation-units-end-to-end-learning-of-1
Repo
Framework
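
A learnable rational activation in the spirit of PAUs can be sketched in a few lines of PyTorch; the polynomial degrees and the "safe" absolute-value denominator below are assumptions based on common practice for rational activations, not necessarily the paper's exact parameterization.

```python
# A minimal "safe" rational activation: a learnable ratio of polynomials
# P(x)/Q(x) whose denominator is kept away from zero. Degrees (m=5, n=4) and
# the random initialisation are illustrative assumptions.
import torch
import torch.nn as nn

class PadeActivation(nn.Module):
    def __init__(self, m: int = 5, n: int = 4):
        super().__init__()
        self.a = nn.Parameter(torch.randn(m + 1) * 0.1)  # numerator coefficients
        self.b = nn.Parameter(torch.randn(n) * 0.1)      # denominator coefficients

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        num = sum(a_k * x.pow(k) for k, a_k in enumerate(self.a))
        # "Safe" denominator: 1 + |b_1 x + ... + b_n x^n| can never vanish.
        den = 1.0 + torch.abs(sum(b_k * x.pow(k + 1) for k, b_k in enumerate(self.b)))
        return num / den

# Drop-in replacement for a fixed activation:
layer = nn.Sequential(nn.Linear(16, 32), PadeActivation(), nn.Linear(32, 1))
out = layer(torch.randn(4, 16))
```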