Paper Group NANR 82
Maximum Likelihood Constraint Inference for Inverse Reinforcement Learning. Learning Reusable Options for Multi-Task Reinforcement Learning. AE-OT: A NEW GENERATIVE MODEL BASED ON EXTENDED SEMI-DISCRETE OPTIMAL TRANSPORT. Improving Exploration of Deep Reinforcement Learning using Planning for Policy Search. Off-policy Bandits with Deficient Support …
Maximum Likelihood Constraint Inference for Inverse Reinforcement Learning
Title | Maximum Likelihood Constraint Inference for Inverse Reinforcement Learning |
Authors | Anonymous |
Abstract | While most approaches to the problem of Inverse Reinforcement Learning (IRL) focus on estimating a reward function that best explains an expert agent’s policy or demonstrated behavior on a control task, it is often the case that such behavior is more succinctly represented by a simple reward combined with a set of hard constraints. In this setting, the agent is attempting to maximize cumulative rewards subject to these given constraints on their behavior. We reformulate the problem of IRL on Markov Decision Processes (MDPs) such that, given a nominal model of the environment and a nominal reward function, we seek to estimate state, action, and feature constraints in the environment that motivate an agent’s behavior. Our approach is based on the Maximum Entropy IRL framework, which allows us to reason about the likelihood of an expert agent’s demonstrations given our knowledge of an MDP. Using our method, we can infer which constraints can be added to the MDP to most increase the likelihood of observing these demonstrations. We present an algorithm which iteratively infers the Maximum Likelihood Constraint to best explain observed behavior, and we evaluate its efficacy using both simulated behavior and recorded data of humans navigating around an obstacle. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BJliakStvH |
https://openreview.net/pdf?id=BJliakStvH | |
PWC | https://paperswithcode.com/paper/maximum-likelihood-constraint-inference-for-1 |
Repo | |
Framework | |
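A rough illustration of the greedy selection step this entry's abstract describes: under the Maximum Entropy IRL model, forbidding a state the expert never visits removes probability mass from trajectories the expert avoided, so a natural greedy choice of constraint is the never-demonstrated state with the largest expected visitation under the nominal model. The sketch below assumes precomputed visitation statistics; the helper name and toy numbers are illustrative, not the authors' implementation.

```python
# Minimal sketch of one greedy constraint-selection step (illustrative only).
import numpy as np

def select_constraint(nominal_visits: np.ndarray, demo_visits: np.ndarray):
    """Return the state whose exclusion most increases demonstration likelihood,
    under the rough heuristic described above: the expert-unvisited state with
    the highest expected visitation under the nominal (unconstrained) model."""
    candidates = np.where(demo_visits == 0)[0]      # states the expert never uses
    if candidates.size == 0:
        return None                                 # nothing left to constrain
    return int(candidates[np.argmax(nominal_visits[candidates])])

# Toy usage: six states; the expert detours around state 2.
nominal = np.array([0.9, 0.8, 0.7, 0.2, 0.1, 0.05])  # expected visits under nominal MDP
demos   = np.array([12,  10,  0,   9,   8,   7])      # empirical visits in demonstrations
print(select_constraint(nominal, demos))              # -> 2
```

In the iterative algorithm the abstract describes, the selected constraint would be added to the MDP, the MaxEnt visitations recomputed, and the procedure repeated.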
Learning Reusable Options for Multi-Task Reinforcement Learning
Title | Learning Reusable Options for Multi-Task Reinforcement Learning |
Authors | Anonymous |
Abstract | Reinforcement learning (RL) has become an increasingly active area of research in recent years. Although there are many algorithms that allow an agent to solve tasks efficiently, they often ignore the possibility that prior experience related to the task at hand might be available. For many practical applications, it might be unfeasible for an agent to learn how to solve a task from scratch, given that it is generally a computationally expensive process; however, prior experience could be leveraged to make these problems tractable in practice. In this paper, we propose a framework for exploiting existing experience by learning reusable options. We show that after an agent learns policies for solving a small number of problems, we are able to use the trajectories generated from those policies to learn reusable options that allow an agent to quickly learn how to solve novel and related problems. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=r1lkKn4KDS |
https://openreview.net/pdf?id=r1lkKn4KDS | |
PWC | https://paperswithcode.com/paper/learning-reusable-options-for-multi-task |
Repo | |
Framework | |
AE-OT: A NEW GENERATIVE MODEL BASED ON EXTENDED SEMI-DISCRETE OPTIMAL TRANSPORT
Title | AE-OT: A NEW GENERATIVE MODEL BASED ON EXTENDED SEMI-DISCRETE OPTIMAL TRANSPORT |
Authors | Anonymous |
Abstract | Generative adversarial networks (GANs) have attracted huge attention due to their capability to generate visually realistic images. However, most of the existing models suffer from the mode collapse or mode mixture problems. In this work, we give a theoretical explanation of both problems using Figalli’s regularity theory of optimal transportation maps. Basically, the generator computes the transportation maps between the white noise distributions and the data distributions, which are in general discontinuous. However, DNNs can only represent continuous maps. This intrinsic conflict induces mode collapse and mode mixture. In order to tackle both problems, we explicitly separate the manifold embedding and the optimal transportation; the first part is carried out using an autoencoder to map the images onto the latent space; the second part is accomplished using a GPU-based convex optimization to find the discontinuous transportation maps. Composing the extended OT map and the decoder, we can finally generate new images from the white noise. This AE-OT model avoids representing discontinuous maps with DNNs, and therefore effectively prevents mode collapse and mode mixture. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HkldyTNYwH |
https://openreview.net/pdf?id=HkldyTNYwH | |
PWC | https://paperswithcode.com/paper/ae-ot-a-new-generative-model-based-on |
Repo | |
Framework | |
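To make the second stage of this entry's pipeline concrete, here is a hedged numpy sketch of semi-discrete optimal transport between a continuous noise source and a discrete set of latent codes: a dual potential is fit by stochastic ascent so that noise samples are assigned to codes in proportion to their target weights. It is an illustration of the standard semi-discrete dual, not the paper's GPU solver or its extended OT map.

```python
# Illustrative numpy sketch of the semi-discrete OT step (not the paper's GPU solver).
import numpy as np

rng = np.random.default_rng(0)
latents = rng.normal(size=(64, 8))               # stand-in for encoder outputs y_i
nu = np.full(len(latents), 1.0 / len(latents))   # target weight of each latent code
h = np.zeros(len(latents))                       # dual potential, one entry per code

def assign(z, h):
    """Send each noise sample to the code minimizing 0.5*||z - y_i||^2 - h_i."""
    cost = 0.5 * ((z[:, None, :] - latents[None, :, :]) ** 2).sum(-1) - h[None, :]
    return np.argmin(cost, axis=1)

for step in range(2000):                         # stochastic ascent on the OT dual
    z = rng.normal(size=(256, 8))                # white-noise batch
    counts = np.bincount(assign(z, h), minlength=len(latents)) / len(z)
    h += 0.5 * (nu - counts)                     # dual gradient: nu_i - P(cell i)

# Generation would then map fresh noise to its assigned latent code (the paper
# further extends the map to interpolate between cells) and run the decoder.
```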
Improving Exploration of Deep Reinforcement Learning using Planning for Policy Search
Title | Improving Exploration of Deep Reinforcement Learning using Planning for Policy Search |
Authors | Anonymous |
Abstract | Most Deep Reinforcement Learning methods perform local search and are therefore prone to getting stuck on non-optimal solutions. Furthermore, in simulation-based training, such as domain-randomized simulation training, the availability of a simulation model is not exploited, which potentially decreases efficiency. To overcome the issues of local search and to exploit access to simulation models, we propose using kinodynamic planning methods as part of a model-based reinforcement learning method and learning in an off-policy fashion from solved planning instances. We show that, even on a simple toy domain, D-RL methods (DDPG, PPO, SAC) are not immune to local optima and require additional exploration mechanisms. We show that our planning method exhibits better state-space coverage, collects data that allows for better policies than D-RL methods without additional exploration mechanisms, and that starting from the planner data and performing additional training results in policies as good as or better than vanilla D-RL methods, while also creating data that is more fit for re-use in modified tasks. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rJe7CkrFvS |
https://openreview.net/pdf?id=rJe7CkrFvS | |
PWC | https://paperswithcode.com/paper/improving-exploration-of-deep-reinforcement |
Repo | |
Framework | |
Off-policy Bandits with Deficient Support
Title | Off-policy Bandits with Deficient Support |
Authors | Anonymous |
Abstract | Off-policy training of contextual-bandit policies is attractive in online systems (e.g. search, recommendation, ad placement), since it enables the reuse of large amounts of log data from the production system. State-of-the-art methods for off-policy learning, however, are based on inverse propensity score (IPS) weighting, which requires that the logging policy chooses all actions with non-zero probability for any context (i.e., full support). In real-world systems, this condition is often violated, and we show that existing off-policy learning methods based on IPS weighting can fail catastrophically. We therefore develop new off-policy contextual-bandit methods that can controllably and robustly learn even when the logging policy has deficient support. To this effect, we explore three approaches that provide various guarantees for safe learning despite the inherent limitations of support-deficient data: restricting the action space, reward extrapolation, and restricting the policy space. We analyze the statistical and computational properties of these three approaches, and empirically evaluate their effectiveness in a series of experiments. We find that controlling the policy space is computationally efficient and robustly leads to accurate policies. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SklcyJBtvB |
https://openreview.net/pdf?id=SklcyJBtvB | |
PWC | https://paperswithcode.com/paper/off-policy-bandits-with-deficient-support |
Repo | |
Framework | |
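To make the failure mode and one of the remedies from this entry's abstract concrete, here is a small hedged sketch of vanilla IPS and of the "restrict the action space" idea. The function names and the simple masking rule are illustrative, not the paper's estimators.

```python
# Illustrative sketch: vanilla IPS and a support-restricted policy (not the paper's code).
import numpy as np

def ips_value(rewards, logging_probs, target_probs):
    """Vanilla inverse-propensity-score estimate of the target policy's value.
    With deficient support, actions the logger never takes (logging_prob = 0)
    carry no importance weight, so any reward the target policy would earn
    there is invisible to the estimator and learning can fail badly."""
    w = np.where(logging_probs > 0, target_probs / np.maximum(logging_probs, 1e-12), 0.0)
    return float(np.mean(w * rewards))

def restrict_to_support(target_scores, logging_probs, eps=1e-6):
    """Roughly, the 'restrict the action space' remedy: keep the learned policy's
    probability mass only on actions the logging policy could have taken, then
    renormalize."""
    mask = (logging_probs > eps).astype(float)
    scores = np.exp(target_scores - target_scores.max(axis=-1, keepdims=True)) * mask
    return scores / scores.sum(axis=-1, keepdims=True)
```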
Composing Task-Agnostic Policies with Deep Reinforcement Learning
Title | Composing Task-Agnostic Policies with Deep Reinforcement Learning |
Authors | Anonymous |
Abstract | The composition of elementary behaviors to solve challenging transfer learning problems is one of the key elements in building intelligent machines. To date, there has been plenty of work on learning task-specific policies or skills, but almost no focus on composing the necessary, task-agnostic skills to find a solution to new problems. In this paper, we propose a novel deep reinforcement learning-based skill transfer and composition method that uses the agent’s primitive policies to solve unseen tasks. We evaluate our method in difficult cases where training a policy through standard reinforcement learning (RL) or even hierarchical RL is either not feasible or exhibits high sample complexity. We show that our method not only transfers skills to new problem settings but also solves challenging environments requiring both task planning and motion control with high data efficiency. |
Tasks | Transfer Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=H1ezFREtwH |
https://openreview.net/pdf?id=H1ezFREtwH | |
PWC | https://paperswithcode.com/paper/composing-task-agnostic-policies-with-deep |
Repo | |
Framework | |
Making Sense of Reinforcement Learning and Probabilistic Inference
Title | Making Sense of Reinforcement Learning and Probabilistic Inference |
Authors | Anonymous |
Abstract | Reinforcement learning (RL) combines a control problem with statistical estimation: the system dynamics are not known to the agent, but can be learned through experience. A recent line of research casts ‘RL as inference’ and suggests a particular framework to generalize the RL problem as probabilistic inference. Our paper surfaces key shortcomings in that approach, and clarifies the sense in which RL can be coherently cast as an inference problem. In particular, an RL agent must consider the effects of its actions upon future rewards and observations: the exploration-exploitation tradeoff. In all but the most simple settings, the resulting inference is computationally intractable so that practical RL algorithms must resort to approximation. We show that the popular ‘RL as inference’ approximation can perform poorly in even the simplest settings. Despite this, we demonstrate that with a small modification the RL as inference framework can provably perform well, and we connect the resulting algorithm with Thompson sampling and the recently proposed K-learning algorithm. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=S1xitgHtvS |
https://openreview.net/pdf?id=S1xitgHtvS | |
PWC | https://paperswithcode.com/paper/making-sense-of-reinforcement-learning-and |
Repo | |
Framework | |
Attention Privileged Reinforcement Learning for Domain Transfer
Title | Attention Privileged Reinforcement Learning for Domain Transfer |
Authors | Anonymous |
Abstract | Applying reinforcement learning (RL) to physical systems presents notable challenges, given requirements regarding sample efficiency, safety, and physical constraints compared to simulated environments. To enable transfer of policies trained in simulation, randomising simulation parameters leads to more robust policies, but also results in significantly extended training time. In this paper, we exploit access to privileged information (such as environment states) often available in simulation, in order to improve and accelerate learning over randomised environments. We introduce Attention Privileged Reinforcement Learning (APRiL), which equips the agent with an attention mechanism and makes use of state information in simulation, learning to align attention between state- and image-based policies while additionally sharing generated data. During deployment we can apply the image-based policy to remove the requirement of access to additional information. We experimentally demonstrate accelerated and more robust learning on a number of diverse domains, leading to improved final performance for environments both within and outside the training distribution. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HygW26VYwS |
https://openreview.net/pdf?id=HygW26VYwS | |
PWC | https://paperswithcode.com/paper/attention-privileged-reinforcement-learning |
Repo | |
Framework | |
Relation-based Generalized Zero-shot Classification with the Domain Discriminator on the shared representation
Title | Relation-based Generalized Zero-shot Classification with the Domain Discriminator on the shared representation |
Authors | Anonymous |
Abstract | Generalized zero-shot learning (GZSL) is the task of predicting a test image from seen or unseen classes using pre-defined class attributes and images from the seen classes. Typical ZSL models assign the class corresponding to the most relevant attribute as the predicted label of the test image, based on the learned relation between the attribute and the image. However, this relation-based approach presents a difficulty: many of the test images are predicted as biased to the seen domain, i.e., the “domain bias problem”. Recently, many methods have addressed this difficulty using a synthesis-based approach that, however, requires generating large amounts of high-quality unseen-class images after training and additionally training a classifier on them. Therefore, in this study, we aim to alleviate this difficulty within the relation-based approach. First, we consider the requirements for good performance in a ZSL setting and introduce a new model based on a variational autoencoder that learns to embed attributes and images into a shared representation space which satisfies those requirements. Next, we assume that the domain bias problem in GZSL derives from a situation in which the embedding of the unseen domain overlaps that of the seen one. We introduce a discriminator that distinguishes domains in the shared space and learns jointly with the above embedding model to prevent this situation. After training, the discriminator provides prior knowledge of which domain is more likely to be embedded at any point in the shared space. We propose combining this knowledge with the relation-based classification on the embedded shared space as a mixture model to compensate class prediction. Experimentally obtained results confirm that the proposed method significantly improves the domain bias problem in relation-based settings and achieves almost equal accuracy to that of high-cost synthesis-based methods. |
Tasks | Zero-Shot Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BJl8ZlHFwr |
https://openreview.net/pdf?id=BJl8ZlHFwr | |
PWC | https://paperswithcode.com/paper/relation-based-generalized-zero-shot |
Repo | |
Framework | |
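The mixture-style prediction outlined in this entry's abstract can be written compactly: treat the discriminator output as a prior over domains and mix per-domain relation-based softmaxes. The sketch below is a guess at the shape of the idea under that reading, not the paper's model, and all names are illustrative.

```python
# Illustrative mixture of a domain prior and relation-based classification (not the paper's model).
import numpy as np

def gzsl_predict(relation_scores, seen_mask, p_seen):
    """relation_scores: similarity of the embedded test image to each class-attribute
    embedding; seen_mask: boolean flag per class; p_seen: the domain discriminator's
    probability that the embedding lies in the seen region of the shared space."""
    scores = np.asarray(relation_scores, dtype=float)
    seen_mask = np.asarray(seen_mask, dtype=bool)

    def softmax_over(mask):
        p = np.zeros_like(scores)
        e = np.exp(scores[mask] - scores[mask].max())
        p[mask] = e / e.sum()
        return p

    # p(y | x) = p(seen) p(y | x, seen) + p(unseen) p(y | x, unseen)
    mix = p_seen * softmax_over(seen_mask) + (1.0 - p_seen) * softmax_over(~seen_mask)
    return int(np.argmax(mix))
```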
Robust saliency maps with distribution-preserving decoys
Title | Robust saliency maps with distribution-preserving decoys |
Authors | Anonymous |
Abstract | Saliency methods help to make deep neural network predictions more interpretable by identifying particular features, such as pixels in an image, that contribute most strongly to the network’s prediction. Unfortunately, recent evidence suggests that many saliency methods perform poorly when gradients are saturated or in the presence of strong inter-feature dependence or noise injected by an adversarial attack. In this work, we propose a data-driven technique that uses distribution-preserving decoys to infer robust saliency scores in conjunction with a pre-trained convolutional neural network classifier and any off-the-shelf saliency method. We formulate the generation of decoys as an optimization problem, potentially applicable to any convolutional network architecture. We also propose a novel decoy-enhanced saliency score, which provably compensates for gradient saturation and considers joint activation patterns of pixels in a single-layer convolutional neural network. Empirical results on the ImageNet data set using three different deep neural network architectures (VGGNet, AlexNet, and ResNet) show both qualitatively and quantitatively that decoy-enhanced saliency scores outperform raw scores produced by three existing saliency methods. |
Tasks | Adversarial Attack |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Syl89aNYwS |
https://openreview.net/pdf?id=Syl89aNYwS | |
PWC | https://paperswithcode.com/paper/robust-saliency-maps-with-distribution |
Repo | |
Framework | |
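A hedged PyTorch sketch of the overall recipe in this entry's abstract: take any pretrained classifier and off-the-shelf saliency method, evaluate it on a population of decoys, and aggregate the per-decoy maps into a more robust score. The decoy generator (`make_decoys`) and the mean aggregation shown here are illustrative placeholders, not the paper's optimization problem or its provable score.

```python
# Illustrative decoy-averaged saliency (placeholder decoy generator and aggregation).
import torch

def gradient_saliency(model, x, target):
    """Plain input-gradient saliency for one image; any saliency method could be used."""
    x = x.clone().requires_grad_(True)
    model(x.unsqueeze(0))[0, target].backward()
    return x.grad.abs()

def decoy_enhanced_saliency(model, x, target, make_decoys, n_decoys=16):
    """Aggregate saliency maps over decoys of x (here: a simple per-pixel mean)."""
    maps = [gradient_saliency(model, d, target) for d in make_decoys(x, n_decoys)]
    return torch.stack(maps).mean(dim=0)
```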
Implicit Rugosity Regularization via Data Augmentation
Title | Implicit Rugosity Regularization via Data Augmentation |
Authors | Anonymous |
Abstract | Deep (neural) networks have been applied productively in a wide range of supervised and unsupervised learning tasks. Unlike classical machine learning algorithms, deep networks typically operate in the overparameterized regime, where the number of parameters is larger than the number of training data points. Consequently, understanding the generalization properties and the role of (explicit or implicit) regularization in these networks is of great importance. In this work, we explore how the oft-used heuristic of data augmentation imposes an implicit regularization penalty on a novel measure of rugosity, or “roughness,” of the function fit to the training data, based on its tangent Hessian. |
Tasks | Data Augmentation |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HJg4qxSKPB |
https://openreview.net/pdf?id=HJg4qxSKPB | |
PWC | https://paperswithcode.com/paper/implicit-rugosity-regularization-via-data |
Repo | |
Framework | |
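To make the notion of a curvature penalty concrete, the sketch below computes a generic input-Hessian "roughness" of a scalar-valued function in PyTorch. It is only a stand-in: the paper's rugosity measure is based on the tangent Hessian (curvature along the data manifold), not the full input Hessian used here.

```python
# Generic input-Hessian roughness penalty (a stand-in, not the paper's tangent-Hessian rugosity).
import torch

def roughness_penalty(f, x):
    """Frobenius norm of the Hessian of a scalar-valued function f at input x.
    Penalizing this quantity during training would explicitly discourage
    curvature of the fitted function around the training points."""
    H = torch.autograd.functional.hessian(f, x)
    return H.pow(2).sum().sqrt()

# Toy usage on a scalar function of a 3-dimensional input.
f = lambda x: (x ** 3).sum()
print(roughness_penalty(f, torch.randn(3)))
```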
Infinite-Horizon Differentiable Model Predictive Control
Title | Infinite-Horizon Differentiable Model Predictive Control |
Authors | Anonymous |
Abstract | This paper proposes a differentiable linear quadratic Model Predictive Control (MPC) framework for safe imitation learning. The infinite-horizon cost is enforced using a terminal cost function obtained from the discrete-time algebraic Riccati equation (DARE), so that the learned controller can be proven to be stabilizing in closed-loop. A central contribution is the derivation of the analytical derivative of the solution of the DARE, thereby allowing the use of differentiation-based learning methods. A further contribution is the structure of the MPC optimization problem: an augmented Lagrangian method ensures that the MPC optimization is feasible throughout training whilst enforcing hard constraints on state and input, and a pre-stabilizing controller ensures that the MPC solution and derivatives are accurate at each iteration. The learning capabilities of the framework are demonstrated in a set of numerical studies. |
Tasks | Imitation Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=ryxC6kSYPr |
https://openreview.net/pdf?id=ryxC6kSYPr | |
PWC | https://paperswithcode.com/paper/infinite-horizon-differentiable-model |
Repo | |
Framework | |
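A minimal sketch of the terminal-cost construction this entry's abstract describes for a discrete-time problem: solve the discrete-time algebraic Riccati equation (DARE) offline and add x_N^T P x_N as the terminal cost, so the finite-horizon MPC cost matches the infinite-horizon value and the controller can be shown to be stabilizing. The matrices below are toy values, and the paper's central contribution, the analytical derivative of the DARE solution used for learning, is omitted here.

```python
# Illustrative DARE terminal cost for a toy double-integrator MPC problem.
import numpy as np
from scipy.linalg import solve_discrete_are

A = np.array([[1.0, 0.1], [0.0, 1.0]])   # double integrator, dt = 0.1
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.array([[0.1]])

P = solve_discrete_are(A, B, Q, R)        # DARE solution = infinite-horizon cost-to-go
K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)   # associated stabilizing LQR gain

def mpc_cost(xs, us):
    """Finite-horizon stage cost plus the DARE terminal cost on the last state."""
    stage = sum(x @ Q @ x + u @ R @ u for x, u in zip(xs[:-1], us))
    return stage + xs[-1] @ P @ xs[-1]
```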
Improved Sample Complexities for Deep Neural Networks and Robust Classification via an All-Layer Margin
Title | Improved Sample Complexities for Deep Neural Networks and Robust Classification via an All-Layer Margin |
Authors | Anonymous |
Abstract | For linear classifiers, the relationship between (normalized) output margin and generalization is captured in a clear and simple bound: a large output margin implies good generalization. Unfortunately, for deep models, this relationship is less clear: existing analyses of the output margin give complicated bounds which sometimes depend exponentially on depth. In this work, we propose to instead analyze a new notion of margin, which we call the “all-layer margin.” Our analysis reveals that the all-layer margin has a clear and direct relationship with generalization for deep models. We present three concrete applications of the all-layer margin: 1) by analyzing the all-layer margin, we obtain tighter generalization bounds for neural nets which depend on Jacobian and hidden-layer norms and remove the exponential dependency on depth, 2) our neural net results easily translate to the adversarially robust setting, giving the first direct analysis of robust test error for deep networks, and 3) we present a theoretically inspired training algorithm for increasing the all-layer margin and demonstrate that our algorithm improves test performance over strong baselines in practice. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HJe_yR4Fwr |
https://openreview.net/pdf?id=HJe_yR4Fwr | |
PWC | https://paperswithcode.com/paper/improved-sample-complexities-for-deep-neural |
Repo | |
Framework | |
The asymptotic spectrum of the Hessian of DNN throughout training
Title | The asymptotic spectrum of the Hessian of DNN throughout training |
Authors | Anonymous |
Abstract | The dynamics of DNNs during gradient descent is described by the so-called Neural Tangent Kernel (NTK). In this article, we show that the NTK allows one to gain precise insight into the Hessian of the cost of DNNs: we obtain a full characterization of the asymptotics of the spectrum of the Hessian, at initialization and during training. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SkgscaNYPS |
https://openreview.net/pdf?id=SkgscaNYPS | |
PWC | https://paperswithcode.com/paper/the-asymptotic-spectrum-of-the-hessian-of-dnn-1 |
Repo | |
Framework | |
Padé Activation Units: End-to-end Learning of Flexible Activation Functions in Deep Networks
Title | Padé Activation Units: End-to-end Learning of Flexible Activation Functions in Deep Networks |
Authors | Anonymous |
Abstract | The performance of deep network learning strongly depends on the choice of the non-linear activation function associated with each neuron. However, deciding on the best activation is non-trivial, and the choice depends on the architecture, hyper-parameters, and even on the dataset. Typically these activations are fixed by hand before training. Here, we demonstrate how to eliminate the reliance on first picking fixed activation functions by using flexible parametric rational functions instead. The resulting Padé Activation Units (PAUs) can both approximate common activation functions and also learn new ones while providing compact representations. Our empirical evidence shows that learning deep networks end-to-end with PAUs can increase predictive performance. Moreover, PAUs pave the way to approximations with provable robustness. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BJlBSkHtDS |
https://openreview.net/pdf?id=BJlBSkHtDS | |
PWC | https://paperswithcode.com/paper/pade-activation-units-end-to-end-learning-of-1 |
Repo | |
Framework | |
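A hedged PyTorch sketch of a learnable rational activation in the spirit of this entry's abstract: y = P(x) / Q(x) with learnable numerator and denominator coefficients, the denominator kept positive so training is stable. The degrees and the absolute-value "safe" denominator follow a common (m = 5, n = 4) choice; this is an illustration, not the authors' released implementation (which, for example, initializes the coefficients to approximate a standard activation rather than randomly).

```python
# Illustrative learnable rational activation (not the authors' released PAU code).
import torch
import torch.nn as nn

class RationalActivation(nn.Module):
    def __init__(self, m: int = 5, n: int = 4):
        super().__init__()
        self.a = nn.Parameter(torch.randn(m + 1) * 0.1)  # numerator coefficients a_0..a_m
        self.b = nn.Parameter(torch.randn(n) * 0.1)      # denominator coefficients b_1..b_n

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        powers = torch.stack([x ** k for k in range(len(self.a))], dim=-1)
        num = (powers * self.a).sum(-1)
        # Positive denominator 1 + sum_j |b_j| |x|^j keeps the unit free of poles.
        den = 1.0 + (powers[..., 1:len(self.b) + 1].abs() * self.b.abs()).sum(-1)
        return num / den

# Drop-in usage: replace a fixed nonlinearity in any network.
layer = nn.Sequential(nn.Linear(16, 32), RationalActivation(), nn.Linear(32, 1))
print(layer(torch.randn(4, 16)).shape)   # torch.Size([4, 1])
```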