April 1, 2020

3194 words 15 mins read

Paper Group NANR 56

Meta-Learning with Warped Gradient Descent. Generalized Domain Adaptation with Covariate and Label Shift CO-ALignment. Domain Aggregation Networks for Multi-Source Domain Adaptation. Using Explainabilty to Detect Adversarial Attacks. Domain-invariant Learning using Adaptive Filter Decomposition. VILD: Variational Imitation Learning with Diverse-qua …

Meta-Learning with Warped Gradient Descent

Title Meta-Learning with Warped Gradient Descent
Authors Anonymous
Abstract Learning an efficient update rule from data that promotes rapid learning of new tasks from the same distribution remains an open problem in meta-learning. Typically, previous works have approached this issue either by attempting to train a neural network that directly produces updates or by attempting to learn better initialisations or scaling factors for a gradient-based update rule. Both these approaches pose challenges. On one hand, directly producing an update forgoes a useful inductive bias and can easily lead to non-converging behaviour. On the other hand, approaches that try to control a gradient-based update rule typically resort to computing gradients through the learning process to obtain their meta-gradients, leading to methods that cannot scale beyond few-shot task adaptation. In this work we propose Warped Gradient Descent (WarpGrad), a method that intersects these approaches to mitigate their limitations. WarpGrad meta-learns an efficiently parameterised preconditioning matrix that facilitates gradient descent across the task distribution. Preconditioning arises by interleaving non-linear layers, referred to as warp-layers, between the layers of a task-learner. Warp-layers are meta-learned without backpropagating through the task training process in a manner similar to methods that learn to directly produce updates. WarpGrad is computationally efficient, easy to implement, and can scale to arbitrarily large meta-learning problems. We provide a geometrical interpretation of the approach and evaluate its effectiveness in a variety of settings, including few-shot, standard supervised, continual and reinforcement learning.
Tasks Meta-Learning
Published 2020-01-01
URL https://openreview.net/forum?id=rkeiQlBFPB
PDF https://openreview.net/pdf?id=rkeiQlBFPB
PWC https://paperswithcode.com/paper/meta-learning-with-warped-gradient-descent-1
Repo
Framework
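
Since no repo is linked, here is a minimal sketch of the mechanism the abstract describes: warp-layers interleaved with task layers, an inner loop that adapts only the task parameters, and a meta step that updates the warp parameters on the post-update loss rather than by backpropagating through the inner trajectory. The layer sizes, learning rates, and synthetic task sampler are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

# Task layers (adapted per task) interleaved with a warp layer (shared,
# meta-learned; it preconditions the task gradients that flow through it).
task_layers = nn.ModuleList([nn.Linear(784, 256), nn.Linear(256, 10)])
warp = nn.Sequential(nn.Linear(256, 256), nn.ReLU())

def forward(x):
    h = torch.relu(task_layers[0](x))
    return task_layers[1](warp(h))

def sample_task(n=32):
    # Toy stand-in for a task drawn from the task distribution.
    return torch.randn(n, 784), torch.randint(0, 10, (n,))

loss_fn = nn.CrossEntropyLoss()
inner_opt = torch.optim.SGD(task_layers.parameters(), lr=1e-2)
meta_opt = torch.optim.Adam(warp.parameters(), lr=1e-3)

for _ in range(100):
    x, y = sample_task()

    # Inner step: adapt task parameters only; the fixed warp layer shapes
    # (preconditions) the gradient on its way to task_layers[0].
    inner_opt.zero_grad()
    loss_fn(forward(x), y).backward()
    inner_opt.step()

    # Meta step: update warp parameters on the post-update loss directly,
    # with no backprop through the inner optimization trajectory.
    meta_opt.zero_grad()
    loss_fn(forward(x), y).backward()
    meta_opt.step()
```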

Generalized Domain Adaptation with Covariate and Label Shift CO-ALignment

Title Generalized Domain Adaptation with Covariate and Label Shift CO-ALignment
Authors Anonymous
Abstract Unsupervised knowledge transfer has great potential to improve the generalizability of deep models to novel domains. Yet the current literature typically assumes that the label distribution is domain-invariant and aligns only the covariate distribution, or vice versa. In this paper, we explore the task of Generalized Domain Adaptation (GDA): How to transfer knowledge across different domains in the presence of both covariate and label shift? We propose a covariate and label distribution CO-ALignment (COAL) model to tackle this problem. Our model leverages prototype-based conditional alignment and label distribution estimation to diminish the covariate and label shifts, respectively. We demonstrate experimentally that when both types of shift exist in the data, COAL leads to state-of-the-art performance on several cross-domain benchmarks.
Tasks Domain Adaptation, Transfer Learning
Published 2020-01-01
URL https://openreview.net/forum?id=BJexP6VKwH
PDF https://openreview.net/pdf?id=BJexP6VKwH
PWC https://paperswithcode.com/paper/generalized-domain-adaptation-with-covariate
Repo
Framework
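
A hedged sketch of the two ingredients the abstract names: prototype-based conditional alignment and target label-distribution estimation. The cosine-similarity assignments, temperature, and entropy-style alignment loss are illustrative assumptions; the paper's actual objective differs in detail.

```python
import torch
import torch.nn.functional as F

def coal_step(src_feat, src_y, tgt_feat, num_classes, T=0.05):
    # Class prototypes from labeled source features
    # (assumes every class appears in the source batch).
    protos = torch.stack([src_feat[src_y == c].mean(0)
                          for c in range(num_classes)])

    # Soft target assignments by cosine similarity to the prototypes.
    sim = F.normalize(tgt_feat, dim=1) @ F.normalize(protos, dim=1).t()
    tgt_probs = F.softmax(sim / T, dim=1)

    # Label-shift correction: estimate the target label distribution from
    # the soft assignments and reweight source classes accordingly.
    tgt_label_dist = tgt_probs.mean(0)
    src_label_dist = torch.bincount(src_y, minlength=num_classes).float()
    src_label_dist = src_label_dist / src_label_dist.sum()
    class_weights = tgt_label_dist / src_label_dist.clamp_min(1e-8)

    # Conditional (covariate) alignment: sharpen target assignments so
    # target features cluster around their class prototypes.
    align_loss = -(tgt_probs * F.log_softmax(sim / T, dim=1)).sum(1).mean()
    return class_weights, align_loss

cw, al = coal_step(torch.randn(64, 32), torch.randint(0, 3, (64,)),
                   torch.randn(64, 32), num_classes=3)
# usage: F.cross_entropy(src_logits, src_y, weight=cw) + lam * al
```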

Domain Aggregation Networks for Multi-Source Domain Adaptation

Title Domain Aggregation Networks for Multi-Source Domain Adaptation
Authors Anonymous
Abstract In many real-world applications, we want to exploit multiple source datasets of similar tasks to learn a model for a different but related target dataset – e.g., recognizing characters of a new font using a set of different fonts. While most recent research has considered ad-hoc combination rules to address this problem, we extend previous work on domain discrepancy minimization to develop a finite-sample generalization bound, and accordingly propose a theoretically justified optimization procedure. The algorithm we develop, Domain AggRegation Network (DARN), is able to effectively adjust the weight of each source domain during training to ensure relevant domains are given more importance for adaptation. We evaluate the proposed method on real-world sentiment analysis and digit recognition datasets and show that DARN can significantly outperform the state-of-the-art alternatives.
Tasks Domain Adaptation, Sentiment Analysis
Published 2020-01-01
URL https://openreview.net/forum?id=ByljMaNKwB
PDF https://openreview.net/pdf?id=ByljMaNKwB
PWC https://paperswithcode.com/paper/domain-aggregation-networks-for-multi-source-1
Repo
Framework
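
DARN derives its source weights from a finite-sample generalization bound; as a rough illustration of the aggregation idea (not the paper's closed form), a softmax over per-domain scores already captures the behavior: sources with low loss and low estimated discrepancy to the target receive more weight.

```python
import torch

def aggregate_sources(per_domain_loss, per_domain_disc, tau=1.0):
    """Illustrative stand-in for DARN's bound-derived weighting: sources
    whose (training loss + discrepancy to the target) is small receive
    larger weight on the simplex."""
    score = per_domain_loss + per_domain_disc          # lower is better
    return torch.softmax(-score / tau, dim=0)

losses = torch.tensor([0.30, 0.90, 0.45])   # per-source training loss
discs = torch.tensor([0.10, 0.05, 0.60])    # per-source discrepancy estimate
w = aggregate_sources(losses, discs)
total_loss = (w.detach() * losses).sum()    # weight, then aggregate
print(w)                                    # relevant sources dominate
```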

Using Explainabilty to Detect Adversarial Attacks

Title Using Explainabilty to Detect Adversarial Attacks
Authors Anonymous
Abstract Deep learning models are often sensitive to adversarial attacks, where carefully-designed input samples can cause the system to produce incorrect decisions. Here we focus on the problem of detecting attacks, rather than robust classification, since detecting that an attack occurs may be even more important than avoiding misclassification. We build on advances in explainability, where activity-map-like explanations are used to justify and validate decisions by highlighting features that are involved in a classification decision. The key observation is that it is hard to create explanations for incorrect decisions. We propose EXAID, a novel attack-detection approach that uses model explainability to identify images whose explanations are inconsistent with the predicted class. Specifically, we use SHAP, which uses Shapley values in the space of the input image, to identify which input features contribute to a class decision. Interestingly, this approach does not require modifying the attacked model, and it can be applied without modelling a specific attack. It can therefore be applied successfully to detect unfamiliar attacks that were unknown at the time the detection model was designed. We evaluate EXAID on two benchmark datasets, CIFAR-10 and SVHN, and against three leading attack techniques: FGSM, PGD and C&W. We find that EXAID improves over the SoTA detection methods by a large margin across a wide range of noise levels, improving detection from 70% to over 90% for small perturbations.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=B1xu6yStPH
PDF https://openreview.net/pdf?id=B1xu6yStPH
PWC https://paperswithcode.com/paper/using-explainabilty-to-detect-adversarial
Repo
Framework
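
A sketch of the detection recipe, assuming the `shap` package: compute SHAP attributions for the predicted class and feed simple explanation statistics to a downstream detector. The toy model, the feature statistics, and the detector choice are all assumptions, and depending on the shap version `shap_values` returns either a per-class list or a single stacked array.

```python
import numpy as np
import torch
import torch.nn as nn
import shap  # pip install shap

# Toy stand-in for the attacked classifier; EXAID leaves the model unchanged.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
background = torch.randn(50, 3, 32, 32)        # reference (natural) images
explainer = shap.GradientExplainer(model, background)

def explanation_features(x):
    """Summary statistics of the SHAP attribution for the predicted class;
    explanations inconsistent with the prediction are the attack signal."""
    pred = model(x).argmax(1).item()
    sv = explainer.shap_values(x)
    # Older shap returns a per-class list; newer versions stack classes
    # into the last axis of one array.
    phi = np.asarray(sv[pred]) if isinstance(sv, list) else np.asarray(sv)[..., pred]
    return [float(phi.sum()), float(np.abs(phi).mean()), float((phi > 0).mean())]

# A simple detector (e.g. logistic regression) is then fit on these
# features computed for natural vs. adversarially perturbed inputs.
print(explanation_features(torch.randn(1, 3, 32, 32)))
```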

Domain-invariant Learning using Adaptive Filter Decomposition

Title Domain-invariant Learning using Adaptive Filter Decomposition
Authors Anonymous
Abstract Domain shifts are frequently encountered in real-world scenarios. In this paper, we consider the problem of domain-invariant deep learning by explicitly modeling domain shifts with only a small amount of domain-specific parameters in a Convolutional Neural Network (CNN). By exploiting the observation that a convolutional filter can be well approximated as a linear combination of a small set of basis elements, we show for the first time, both empirically and theoretically, that domain shifts can be effectively handled by decomposing a regular convolutional layer into a domain-specific basis layer and a domain-shared basis coefficient layer, while both remain convolutional. An input channel will now first convolve spatially only with each respective domain-specific basis to “absorb” domain variations, and then output channels are linearly combined using common basis coefficients trained to promote shared semantics across domains. We use toy examples, rigorous analysis, and real-world examples to show the framework’s effectiveness in cross-domain performance and domain adaptation. With the proposed architecture, we need only a small set of basis elements to model each additional domain, which adds only a negligible number of additional parameters, typically a few hundred.
Tasks Domain Adaptation
Published 2020-01-01
URL https://openreview.net/forum?id=SJxHMaEtwB
PDF https://openreview.net/pdf?id=SJxHMaEtwB
PWC https://paperswithcode.com/paper/domain-invariant-learning-using-adaptive-1
Repo
Framework
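
The decomposition is concrete enough to sketch directly: a domain-specific depthwise "basis" convolution followed by a domain-shared 1x1 "coefficient" convolution, both still convolutional. Sizes below are illustrative assumptions; note that each extra domain then costs only c_in · n_basis · k² parameters (162 here), consistent with the "few hundred" figure in the abstract.

```python
import torch
import torch.nn as nn

class DecomposedConv(nn.Module):
    """Sketch of the decomposition described above: a domain-specific
    spatial basis layer plus a domain-shared basis-coefficient layer."""
    def __init__(self, c_in, c_out, k=3, n_basis=6, n_domains=2):
        super().__init__()
        # One small spatial basis per domain: each input channel is
        # convolved with each of the n_basis filters (depthwise).
        self.basis = nn.ModuleList([
            nn.Conv2d(c_in, c_in * n_basis, k, padding=k // 2,
                      groups=c_in, bias=False)
            for _ in range(n_domains)
        ])
        # Shared coefficients: a 1x1 conv linearly combines the basis
        # responses into output channels; trained across all domains.
        self.coeff = nn.Conv2d(c_in * n_basis, c_out, 1, bias=False)

    def forward(self, x, domain):
        return self.coeff(self.basis[domain](x))

layer = DecomposedConv(3, 16)
y = layer(torch.randn(2, 3, 32, 32), domain=0)   # -> (2, 16, 32, 32)
```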

VILD: Variational Imitation Learning with Diverse-quality Demonstrations

Title VILD: Variational Imitation Learning with Diverse-quality Demonstrations
Authors Anonymous
Abstract The goal of imitation learning (IL) is to learn a good policy from high-quality demonstrations. However, the quality of demonstrations in reality can be diverse, since it is easier and cheaper to collect demonstrations from a mix of experts and amateurs. IL in such situations can be challenging, especially when the level of demonstrators’ expertise is unknown. We propose a new IL paradigm called Variational Imitation Learning with Diverse-quality demonstrations (VILD), where we explicitly model the level of demonstrators’ expertise with a probabilistic graphical model and estimate it along with a reward function. We show that a naive estimation approach is not suitable for large state and action spaces, and fix this issue by using a variational approach that can be easily implemented using existing reinforcement learning methods. Experiments on continuous-control benchmarks demonstrate that VILD outperforms state-of-the-art methods. Our work enables scalable and data-efficient IL under more realistic settings than before.
Tasks Continuous Control, Imitation Learning
Published 2020-01-01
URL https://openreview.net/forum?id=SJgNkpVFPr
PDF https://openreview.net/pdf?id=SJgNkpVFPr
PWC https://paperswithcode.com/paper/vild-variational-imitation-learning-with-1
Repo
Framework
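
As a rough illustration of "explicitly model the level of demonstrators' expertise", here is a minimal Gaussian noise model in which each demonstrator's actions scatter around the current policy's action with a learned, demonstrator-specific scale. This is purely illustrative; VILD's variational objective and joint reward estimation are substantially more involved.

```python
import torch
import torch.nn as nn

n_demos, s_dim, a_dim = 5, 8, 2
policy = nn.Linear(s_dim, a_dim)                 # stand-in for the learner
log_sigma = nn.Parameter(torch.zeros(n_demos))   # per-demonstrator noise scale

def demo_nll(s, a, k):
    """Negative log-likelihood of demonstrator k's actions under a Gaussian
    centered on the policy's action: small sigma_k means high expertise, so
    expert-like demonstrations pull the policy harder than amateur ones."""
    sigma = log_sigma[k].exp()
    return (((a - policy(s)) ** 2).sum(1) / (2 * sigma ** 2)
            + a_dim * log_sigma[k]).mean()

s, a = torch.randn(16, s_dim), torch.randn(16, a_dim)
print(demo_nll(s, a, k=0))   # minimized jointly over policy and log_sigma
```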

Imitation Learning via Off-Policy Distribution Matching

Title Imitation Learning via Off-Policy Distribution Matching
Authors Anonymous
Abstract When performing imitation learning from expert demonstrations, distribution matching is a popular approach, in which one alternates between estimating distribution ratios and then using these ratios as rewards in a standard reinforcement learning (RL) algorithm. Traditionally, estimation of the distribution ratio requires on-policy data, which has caused previous work to either be exorbitantly data-inefficient or alter the original objective in a manner that can drastically change its optimum. In this work, we show how the original distribution ratio estimation objective may be transformed in a principled manner to yield a completely off-policy objective. In addition to the data-efficiency that this provides, we are able to show that this objective also renders the use of a separate RL optimization unnecessary. Rather, an imitation policy may be learned directly from this objective without the use of explicit rewards. We call the resulting algorithm ValueDICE and evaluate it on a suite of popular imitation learning benchmarks, finding that it can achieve state-of-the-art sample efficiency and performance.
Tasks Imitation Learning
Published 2020-01-01
URL https://openreview.net/forum?id=Hyg-JC4FDr
PDF https://openreview.net/pdf?id=Hyg-JC4FDr
PWC https://paperswithcode.com/paper/imitation-learning-via-off-policy
Repo
Framework
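
The transformed objective can be sketched directly as a loss: a log-mean-exp of the Bellman residual of a critic ν on expert transitions, minus an initial-state term, maximized over ν and minimized over the policy. The network shapes, the deterministic stand-in policy, and the sampling below are assumptions; see the paper for the exact derivation.

```python
import torch
import torch.nn as nn

s_dim, a_dim = 8, 2
nu_net = nn.Linear(s_dim + a_dim, 1)
policy = nn.Linear(s_dim, a_dim)          # deterministic stand-in policy
nu = lambda s, a: nu_net(torch.cat([s, a], dim=1)).squeeze(1)

def value_dice_loss(expert_s, expert_a, expert_next_s, init_s, gamma=0.99):
    # Bellman residual of nu along *expert* transitions, with next actions
    # from the current policy -- no environment rollouts are needed.
    residual = nu(expert_s, expert_a) - gamma * nu(expert_next_s,
                                                   policy(expert_next_s))
    # log E[exp(residual)] over the expert distribution (log-mean-exp).
    expert_term = torch.logsumexp(residual, 0) - torch.log(
        torch.tensor(float(residual.numel())))
    # (1 - gamma) times the initial-state value under the current policy.
    init_term = (1 - gamma) * nu(init_s, policy(init_s)).mean()
    # Maximize over nu, minimize over the policy; no explicit rewards.
    return expert_term - init_term

loss = value_dice_loss(torch.randn(64, s_dim), torch.randn(64, a_dim),
                       torch.randn(64, s_dim), torch.randn(16, s_dim))
```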

Is a Good Representation Sufficient for Sample Efficient Reinforcement Learning?

Title Is a Good Representation Sufficient for Sample Efficient Reinforcement Learning?
Authors Anonymous
Abstract Modern deep learning methods provide effective means to learn good representations. However, is a good representation itself sufficient for efficient reinforcement learning? This question is largely unexplored, and the extant body of literature mainly focuses on conditions which permit efficient reinforcement learning, with little understanding of what the necessary conditions are. This work provides strong negative results for reinforcement learning methods with function approximation for which a good representation (feature extractor) is known to the agent, focusing on natural representational conditions relevant to value-based learning and policy-based learning. For value-based learning, we show that even if the agent has a highly accurate linear representation, the agent still needs to sample an exponential number of trajectories in order to find a near-optimal policy. For policy-based learning, we show that even if the agent’s linear representation is capable of perfectly predicting the optimal action at any state, the agent still needs to sample an exponential number of trajectories in order to find a near-optimal policy. These lower bounds highlight the fact that having a good (value-based or policy-based) representation in and of itself is insufficient for efficient reinforcement learning, and that additional assumptions are needed. In particular, these results provide new insights into why the analysis of existing provably efficient reinforcement learning methods makes assumptions which are partly model-based in nature. Furthermore, our lower bounds also imply exponential separations on the sample complexity between 1) value-based learning with a perfect representation and value-based learning with a good-but-not-perfect representation, 2) value-based learning and policy-based learning, 3) policy-based learning and supervised learning, and 4) reinforcement learning and imitation learning.
Tasks Imitation Learning
Published 2020-01-01
URL https://openreview.net/forum?id=r1genAVKPB
PDF https://openreview.net/pdf?id=r1genAVKPB
PWC https://paperswithcode.com/paper/is-a-good-representation-sufficient-for-1
Repo
Framework

Adaptive Adversarial Imitation Learning

Title Adaptive Adversarial Imitation Learning
Authors Anonymous
Abstract We present the ADaptive Adversarial Imitation Learning (ADAIL) algorithm for learning adaptive policies that can be transferred between environments of varying dynamics, by imitating a small number of demonstrations collected from a single source domain. This problem is important in robotic learning because in real-world scenarios: 1) reward functions are hard to obtain; 2) learned policies from one domain are difficult to deploy in another due to varying source-to-target domain statistics; and 3) collecting expert demonstrations in multiple environments where the dynamics are known and controlled is often infeasible. We address these constraints by building upon recent advances in adversarial imitation learning; we condition our policy on a learned dynamics embedding and we employ a domain-adversarial loss to learn a dynamics-invariant discriminator. The effectiveness of our method is demonstrated on simulated control tasks with varying environment dynamics and the learned adaptive agent outperforms several recent baselines.
Tasks Imitation Learning
Published 2020-01-01
URL https://openreview.net/forum?id=HklvMJSYPB
PDF https://openreview.net/pdf?id=HklvMJSYPB
PWC https://paperswithcode.com/paper/adaptive-adversarial-imitation-learning
Repo
Framework
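
The "domain-adversarial loss to learn a dynamics-invariant discriminator" part can be sketched with the standard gradient-reversal trick: one head separates expert from policy transitions while a reversed branch strips dynamics information from the shared features. The dimensions and heads below are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity forward, negated gradient backward: the usual trick
    behind a domain-adversarial loss."""
    @staticmethod
    def forward(ctx, x):
        return x
    @staticmethod
    def backward(ctx, g):
        return -g

feat = nn.Sequential(nn.Linear(8, 64), nn.ReLU())   # (s, a) -> features
expert_head = nn.Linear(64, 1)                      # expert vs. policy
dyn_head = nn.Linear(64, 4)                         # which dynamics variant?

def discriminator_loss(sa, is_expert, dyn_label):
    h = feat(sa)
    d_loss = F.binary_cross_entropy_with_logits(
        expert_head(h).squeeze(1), is_expert)
    # Reversal pushes `feat` to *remove* dynamics information even as the
    # head tries to predict it -> a dynamics-invariant discriminator.
    inv_loss = F.cross_entropy(dyn_head(GradReverse.apply(h)), dyn_label)
    return d_loss + inv_loss

loss = discriminator_loss(torch.randn(16, 8),
                          torch.randint(0, 2, (16,)).float(),
                          torch.randint(0, 4, (16,)))
loss.backward()
```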

Improving Adversarial Robustness Requires Revisiting Misclassified Examples

Title Improving Adversarial Robustness Requires Revisiting Misclassified Examples
Authors Anonymous
Abstract Deep neural networks (DNNs) are vulnerable to adversarial examples crafted by imperceptible perturbations. A range of defense techniques have been proposed to improve DNN robustness to adversarial examples, among which adversarial training has been demonstrated to be the most effective. Adversarial training is often formulated as a min-max optimization problem, with the inner maximization for generating adversarial examples. However, there exists a simple, yet easily overlooked fact that adversarial examples are only defined on correctly classified (natural) examples, but inevitably, some (natural) examples will be misclassified during training. In this paper, we investigate the distinctive influence of misclassified and correctly classified examples on the final robustness of adversarial training. Specifically, we find that misclassified examples indeed have a significant impact on the final robustness. More surprisingly, we find that different maximization techniques on misclassified examples may have a negligible influence on the final robustness, while different minimization techniques are crucial. Motivated by the above discovery, we propose a new defense algorithm called Misclassification Aware adveRsarial Training (MART), which explicitly differentiates the misclassified and correctly classified examples during the training. We also propose a semi-supervised extension of MART, which can leverage the unlabeled data to further improve the robustness. Experimental results show that MART and its variant could significantly improve the state-of-the-art adversarial robustness.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=rklOg6EFwS
PDF https://openreview.net/pdf?id=rklOg6EFwS
PWC https://paperswithcode.com/paper/improving-adversarial-robustness-requires
Repo
Framework
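
The "different minimization matters" finding is baked into the loss itself. Below is a sketch in the spirit of MART (the coefficient and exact form are assumptions reconstructed from the abstract, not copied from the paper): a boosted cross-entropy on adversarial examples plus a KL regularizer that is up-weighted on naturally misclassified, low-confidence examples.

```python
import torch
import torch.nn.functional as F

def mart_loss(logits_adv, logits_nat, y, beta=6.0):
    p_adv = F.softmax(logits_adv, dim=1)
    p_nat = F.softmax(logits_nat, dim=1)

    # Boosted cross-entropy on the adversarial example: besides raising the
    # true-class probability, push down the largest wrong-class probability.
    py_adv = p_adv.gather(1, y[:, None]).squeeze(1)
    wrong_max = p_adv.scatter(1, y[:, None], 0.0).max(1).values
    bce = -torch.log(py_adv + 1e-12) - torch.log(1.0 - wrong_max + 1e-12)

    # KL(natural || adversarial), weighted by (1 - p_y(natural)): examples
    # the model misclassifies naturally drive the robustness term harder.
    kl = (p_nat * (torch.log(p_nat + 1e-12) -
                   torch.log(p_adv + 1e-12))).sum(1)
    weight = 1.0 - p_nat.gather(1, y[:, None]).squeeze(1)
    return (bce + beta * kl * weight).mean()

# usage: loss = mart_loss(model(x_adv), model(x_nat), y)
```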

Certifiably Robust Interpretation in Deep Learning

Title Certifiably Robust Interpretation in Deep Learning
Authors Anonymous
Abstract Deep learning interpretation is essential to explain the reasoning behind model predictions. Understanding the robustness of interpretation methods is important especially in sensitive domains such as medical applications since interpretation results are often used in downstream tasks. Although gradient-based saliency maps are popular methods for deep learning interpretation, recent works show that they can be vulnerable to adversarial attacks. In this paper, we address this problem and provide a certifiable defense method for deep learning interpretation. We show that a sparsified version of the popular SmoothGrad method, which computes the average saliency maps over random perturbations of the input, is certifiably robust against adversarial perturbations. We obtain this result by extending recent bounds for certifiably robust smooth classifiers to the interpretation setting. Experiments on ImageNet samples validate our theory.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=rkxVz1HKwB
PDF https://openreview.net/pdf?id=rkxVz1HKwB
PWC https://paperswithcode.com/paper/certifiably-robust-interpretation-in-deep-1
Repo
Framework
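
The certified interpreter is concrete enough to sketch: SmoothGrad averages input-gradient saliency over Gaussian perturbations of the input, and sparsification keeps only the top-k entries. The toy model and hyperparameters below are illustrative assumptions.

```python
import torch
import torch.nn as nn

def sparsified_smoothgrad(model, x, y, sigma=0.1, n=50, k=500):
    """SmoothGrad (average saliency over Gaussian perturbations of the
    input) followed by sparsification to the top-k entries. Assumes a
    single input (batch size 1)."""
    grads = torch.zeros_like(x)
    for _ in range(n):
        noisy = (x + sigma * torch.randn_like(x)).requires_grad_(True)
        model(noisy)[0, y].backward()
        grads += noisy.grad.abs()
    saliency = (grads / n).flatten()
    keep = saliency.topk(k).indices          # sparsify: keep top-k only
    sparse = torch.zeros_like(saliency)
    sparse[keep] = saliency[keep]
    return sparse.view_as(x)

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
smap = sparsified_smoothgrad(model, torch.randn(1, 3, 32, 32), y=3)
```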

Rényi Fair Inference

Title Rényi Fair Inference
Authors Anonymous
Abstract Machine learning algorithms have been increasingly deployed in critical automated decision-making systems that directly affect human lives. When these algorithms are solely trained to minimize the training/test error, they could suffer from systematic discrimination against individuals based on their sensitive attributes, such as gender or race. Recently, there has been a surge of interest in the machine learning community in developing algorithms for fair machine learning. In particular, several adversarial learning procedures have been proposed to impose fairness. Unfortunately, these algorithms either can only impose fairness up to linear dependence between the variables, or they lack computational convergence guarantees. In this paper, we use Rényi correlation as a measure of fairness of machine learning models and develop a general training framework to impose fairness. In particular, we propose a min-max formulation which balances the accuracy and fairness when solved to optimality. For the case of discrete sensitive attributes, we suggest an iterative algorithm with theoretical convergence guarantee for solving the proposed min-max problem. Our algorithm and analysis are then specialized to fair classification and fair clustering problems. To demonstrate the performance of the proposed Rényi fair inference framework in practice, we compare it with well-known existing methods on several benchmark datasets. Experiments indicate that the proposed method has favorable empirical performance against state-of-the-art approaches.
Tasks Decision Making
Published 2020-01-01
URL https://openreview.net/forum?id=HkgsUJrtDB
PDF https://openreview.net/pdf?id=HkgsUJrtDB
PWC https://paperswithcode.com/paper/renyi-fair-inference-1
Repo
Framework
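
For discrete variables there is a convenient handle on the penalty: the Rényi (Hirschfeld-Gebelein-Rényi) correlation equals the second-largest singular value of the normalized joint-distribution matrix. A differentiable sketch of that penalty follows; the soft-prediction estimator and the plug-in of empirical frequencies are assumptions, not the paper's exact algorithm.

```python
import torch

def renyi_correlation(probs, s, n_groups):
    """Rényi correlation between the (soft) prediction and a discrete
    sensitive attribute, via Q[i, j] = P(y=i, s=j) / sqrt(P(y=i) P(s=j)).
    Soft predictions keep the estimate differentiable."""
    onehot_s = torch.eye(n_groups)[s]                 # (n, groups)
    joint = probs.t() @ onehot_s / len(s)             # empirical P(y, s)
    py = joint.sum(1, keepdim=True).clamp_min(1e-8)
    ps = joint.sum(0, keepdim=True).clamp_min(1e-8)
    q = joint / (py.sqrt() * ps.sqrt())
    # The largest singular value of Q is always 1; the second-largest is
    # the Rényi correlation (0 iff prediction and attribute independent).
    return torch.linalg.svdvals(q)[1]

probs = torch.softmax(torch.randn(128, 3), dim=1)     # model outputs
s = torch.randint(0, 2, (128,))                       # sensitive attribute
penalty = renyi_correlation(probs, s, n_groups=2)
# usage: loss = task_loss + lam * penalty  (the min-max trades them off)
```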

Zero-Shot Policy Transfer with Disentangled Attention

Title Zero-Shot Policy Transfer with Disentangled Attention
Authors Anonymous
Abstract Domain adaptation is an open problem in deep reinforcement learning (RL). Often, agents are asked to perform in environments where data is difficult to obtain. In such settings, agents are trained in similar environments, such as simulators, and are then transferred to the original environment. The gap between visual observations of the source and target environments often causes the agent to fail in the target environment. We present a new RL agent, SADALA (Soft Attention DisentAngled representation Learning Agent). SADALA first learns a compressed state representation. It then jointly learns to ignore distracting features and solve the task presented. SADALA’s separation of important and unimportant visual features leads to robust domain transfer. SADALA outperforms both prior disentangled-representation based RL and domain randomization approaches across RL environments (Visual Cartpole and DeepMind Lab).
Tasks Domain Adaptation, Representation Learning
Published 2020-01-01
URL https://openreview.net/forum?id=HklPzxHFwB
PDF https://openreview.net/pdf?id=HklPzxHFwB
PWC https://paperswithcode.com/paper/zero-shot-policy-transfer-with-disentangled
Repo
Framework
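
The pipeline in the abstract (compressed representation, then soft attention over latent features, then the task policy) is easy to sketch. The encoder here is a plain stand-in for the paper's disentangled representation learner, and all sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 32))  # stand-in
attention = nn.Sequential(nn.Linear(32, 32), nn.Sigmoid())         # soft gate
policy = nn.Linear(32, 4)                                          # action logits

def act(obs):
    z = encoder(obs)          # compressed (ideally disentangled) state
    gate = attention(z)       # learned per-feature importance in [0, 1]
    return policy(gate * z)   # the task head sees only attended features

logits = act(torch.randn(1, 3, 64, 64))
```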

Asymptotic learning curves of kernel methods: empirical data v.s. Teacher-Student paradigm

Title Asymptotic learning curves of kernel methods: empirical data v.s. Teacher-Student paradigm
Authors Anonymous
Abstract How many training examples are needed to learn a supervised task? It is often observed that the generalization error decreases as $n^{-\beta}$ where $n$ is the number of training examples and $\beta$ an exponent that depends on both data and algorithm. In this work we measure $\beta$ when applying kernel methods to real datasets. For MNIST we find $\beta\approx 0.4$ and for CIFAR10 $\beta\approx 0.1$. Remarkably, $\beta$ is the same for regression and classification tasks, and for Gaussian or Laplace kernels. To rationalize the existence of non-trivial exponents that can be independent of the specific kernel used, we introduce the Teacher-Student framework for kernels. In this scheme, a Teacher generates data according to a Gaussian random field, and a Student learns them via kernel regression. With a simplifying assumption — namely that the data are sampled from a regular lattice — we derive analytically $\beta$ for translation invariant kernels, using previous results from the kriging literature. Provided that the Student is not too sensitive to high frequencies, $\beta$ depends only on the training data and their dimension. We confirm numerically that these predictions hold when the training points are sampled at random on a hypersphere. Overall, our results quantify how smooth Gaussian data should be to avoid the curse of dimensionality, and indicate that for kernel learning the relevant dimension of the data should be defined in terms of how the distance between nearest data points depends on $n$. With this definition one obtains reasonable effective smoothness estimates for MNIST and CIFAR10.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=r1enqkBtwr
PDF https://openreview.net/pdf?id=r1enqkBtwr
PWC https://paperswithcode.com/paper/asymptotic-learning-curves-of-kernel-methods-1
Repo
Framework
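
The measurement in the first half of the abstract is easy to reproduce in outline: fit kernel regression at several training-set sizes and read $\beta$ off the log-log slope of the test error. Synthetic Gaussian data stands in for MNIST/CIFAR10 here, so the slope itself is only illustrative.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(0)
d = 20
w = rng.standard_normal(d)
f = lambda X: np.sin(X @ w)                 # toy smooth target function

X_test = rng.standard_normal((2000, d))
y_test = f(X_test)

ns, errs = [200, 400, 800, 1600, 3200], []
for n in ns:
    X = rng.standard_normal((n, d))
    model = KernelRidge(alpha=1e-6, kernel="laplacian").fit(X, f(X))
    errs.append(np.mean((model.predict(X_test) - y_test) ** 2))

# Generalization error ~ n^(-beta): beta is minus the log-log slope.
beta = -np.polyfit(np.log(ns), np.log(errs), 1)[0]
print(f"estimated beta ~ {beta:.2f}")
```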

Bounds on Over-Parameterization for Guaranteed Existence of Descent Paths in Shallow ReLU Networks

Title Bounds on Over-Parameterization for Guaranteed Existence of Descent Paths in Shallow ReLU Networks
Authors Anonymous
Abstract We study the landscape of squared loss in neural networks with one-hidden layer and ReLU activation functions. Let $m$ and $d$ be the widths of hidden and input layers, respectively. We show that there exist poor local minima with positive curvature for some training sets of size $n\geq m+2d-2$. By positive curvature of a local minimum, we mean that within a small neighborhood the loss function is strictly increasing in all directions. Consequently, for such training sets, there are initializations of weights from which there is no descent path to global optima. It is known that for $n\le m$, there always exist descent paths to global optima from all initial weights. From this perspective, our results provide a somewhat sharp characterization of the over-parameterization required for “existence of descent paths” in the loss landscape.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=BkgXHTNtvS
PDF https://openreview.net/pdf?id=BkgXHTNtvS
PWC https://paperswithcode.com/paper/bounds-on-over-parameterization-for
Repo
Framework