Paper Group NANR 104
Adversarial AutoAugment. Episodic Reinforcement Learning with Associative Memory. Plug and Play Language Model: A simple baseline for controlled language generation. Abductive Commonsense Reasoning. Once for All: Train One Network and Specialize it for Efficient Deployment. DeFINE: Deep Factorized Input Word Embeddings for Neural Sequence Modeling. …
Adversarial AutoAugment
Title | Adversarial AutoAugment |
Authors | Anonymous |
Abstract | Data augmentation (DA) has been widely utilized to improve generalization in training deep neural networks. Recently, human-designed data augmentation has been gradually replaced by automatically learned augmentation policies. By finding the best policy in a well-designed search space of data augmentation, AutoAugment (Cubuk et al., 2018) can significantly improve validation accuracy on image classification tasks. However, this approach is not computationally practical for large problems. In this paper, we develop an adversarial method to arrive at a computationally affordable solution called Adversarial AutoAugment, which simultaneously optimizes the target-related objective and the augmentation policy search loss. The augmentation policy network attempts to increase the training loss of a target network by generating adversarial augmentation policies, while the target network can learn more robust features from harder examples to improve generalization. In contrast to prior work, we reuse the computation in target network training for policy evaluation, and dispense with the retraining of the target network. Compared to AutoAugment, this leads to about a 12x reduction in computing cost and an 11x shortening in time overhead on ImageNet. We show experimental results of our approach on CIFAR-10, CIFAR-100, and ImageNet, and demonstrate significant performance improvements over the state of the art. On CIFAR-10, we achieve a top-1 test error of 1.36%, which is currently the best-performing single model. On ImageNet, we achieve a leading top-1 accuracy of 79.40% on ResNet-50 and 80.00% on ResNet-50-D without extra data. |
Tasks | Data Augmentation, Image Classification |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=ByxdUySKvS |
https://openreview.net/pdf?id=ByxdUySKvS | |
PWC | https://paperswithcode.com/paper/adversarial-autoaugment |
Repo | |
Framework | |
Episodic Reinforcement Learning with Associative Memory
Title | Episodic Reinforcement Learning with Associative Memory |
Authors | Anonymous |
Abstract | Sample efficiency has been one of the major challenges for deep reinforcement learning. Non-parametric episodic control has been proposed to speed up parametric reinforcement learning by rapidly latching onto previously successful policies. However, previous work on episodic reinforcement learning neglects the relationship between states and stores experiences only as unrelated items. To improve the sample efficiency of reinforcement learning, we propose a novel framework, called Episodic Reinforcement Learning with Associative Memory (ERLAM), which associates related experience trajectories to enable reasoning about effective strategies. We build a graph on top of states in memory based on state transitions and develop an efficient reverse-trajectory propagation strategy to allow rapid value propagation through the graph. We use the non-parametric associative memory as early guidance for a parametric reinforcement learning model. Results on Atari games show that our framework has significantly higher sample efficiency and outperforms state-of-the-art episodic reinforcement learning models. |
Tasks | Atari Games |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HkxjqxBYDB |
https://openreview.net/pdf?id=HkxjqxBYDB | |
PWC | https://paperswithcode.com/paper/episodic-reinforcement-learning-with |
Repo | |
Framework | |
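The reverse-trajectory propagation idea in the abstract above can be illustrated with a small tabular sketch. This is not the authors' implementation: the function name `propagate_values`, the tuple layout `(state, action, reward, next_state)`, the discount `gamma`, and the fixed number of sweeps are all assumptions made for illustration. States shared between trajectories act as graph nodes, so a reward seen in one trajectory can flow into value estimates for another.

```python
from collections import defaultdict


def propagate_values(trajectories, gamma=0.99, sweeps=10):
    """Back up values along stored trajectories in reverse order.

    trajectories: list of lists of (state, action, reward, next_state).
    Returns a nested dict q[state][action] -> value estimate.
    Hypothetical helper, not taken from the paper.
    """
    q = defaultdict(dict)
    for _ in range(sweeps):
        for trajectory in trajectories:
            # Walking backwards lets a late reward reach early states
            # of the same trajectory within a single sweep.
            for state, action, reward, next_state in reversed(trajectory):
                next_values = q.get(next_state, {})
                best_next = max(next_values.values()) if next_values else 0.0
                q[state][action] = reward + gamma * best_next
    return q


# Two trajectories that meet at state "s1"; only the second one is rewarded.
unrewarded = [("s0", "a", 0, "s1"), ("s1", "a", 0, "s2")]
rewarded = [("s1", "b", 1, "goal")]
q_values = propagate_values([unrewarded, rewarded], gamma=0.5, sweeps=2)
print(q_values["s0"]["a"], q_values["s1"]["b"])
```

After the second sweep, the value of taking `a` in `s0` reflects the reward that was only ever observed in the other trajectory, which is the kind of association between experiences the abstract describes.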
Plug and Play Language Model: A simple baseline for controlled language generation
Title | Plug and Play Language Model: A simple baseline for controlled language generation |
Authors | Anonymous |
Abstract | Large transformer-based generative models (e.g. GPT-2; 1.5B parameters) trained on a huge corpus (e.g. 40GB of text) have shown unparalleled language generation ability. While these models are powerful, fine-grained control of attributes of the generated language (e.g. gradually switching topic or sentiment) is difficult without modifying the model architecture to allow extra attribute inputs, or fine-tuning with attribute-specific data. Both would entirely change the original generative function, which, if done poorly, cannot be undone; not to mention the cost of retraining. We instead propose the Plug and Play Language Model for controlled language generation that consists of plugging in simple bag-of-words or one-layer classifiers as attribute controllers, and making updates in the activation space, without changing any model parameters. Such a control scheme provides vast flexibility and allows full recovery of the original generative function. The results demonstrate fine-grained control over a range of topics and sentiment styles, as well as the ability to detoxify generated texts. Our experiments, including human evaluation studies, show that text generated via this control scheme is aligned with desired attributes, while retaining fluency. |
Tasks | Language Modelling, Text Generation |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=H1edEyBKDS |
https://openreview.net/pdf?id=H1edEyBKDS | |
PWC | https://paperswithcode.com/paper/plug-and-play-language-model-a-simple |
Repo | |
Framework | |
Abductive Commonsense Reasoning
Title | Abductive Commonsense Reasoning |
Authors | Anonymous |
Abstract | Abductive reasoning is inference to the most plausible explanation. For example, if Jenny finds her house in a mess when she returns from work, and remembers that she left a window open, she can hypothesize that a thief broke into her house and caused the mess, as the most plausible explanation. While abduction has long been considered to be at the core of how people interpret and read between the lines in natural language (Hobbs et al., 1988), there has been relatively little research in support of abductive natural language inference and generation. We present the first study that investigates the viability of language-based abductive reasoning. We introduce a challenge dataset, ART, that consists of over 20k commonsense narrative contexts and 200k explanations. Based on this dataset, we conceptualize two new tasks – (i) Abductive NLI: a multiple-choice question answering task for choosing the more likely explanation, and (ii) Abductive NLG: a conditional generation task for explaining given observations in natural language. On Abductive NLI, the best model achieves 68.9% accuracy, well below human performance of 91.4%. On Abductive NLG, the current best language generators struggle even more, as they lack reasoning capabilities that are trivial for humans. Our analysis leads to new insights into the types of reasoning that deep pre-trained language models fail to perform—despite their strong performance on the related but more narrowly defined task of entailment NLI—pointing to interesting avenues for future research. |
Tasks | Natural Language Inference, Question Answering |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Byg1v1HKDB |
https://openreview.net/pdf?id=Byg1v1HKDB | |
PWC | https://paperswithcode.com/paper/abductive-commonsense-reasoning-1 |
Repo | |
Framework | |
Once for All: Train One Network and Specialize it for Efficient Deployment
Title | Once for All: Train One Network and Specialize it for Efficient Deployment |
Authors | Anonymous |
Abstract | We address the challenging problem of efficient deep learning model deployment, where the goal is to design neural network architectures that can fit different hardware platform constraints. Most traditional approaches either manually design or use Neural Architecture Search (NAS) to find a specialized neural network and train it from scratch for each case, which is computationally expensive and unscalable. Our key idea is to decouple model training from architecture search to reduce this cost. To this end, we propose to train a once-for-all network (OFA) that supports diverse architectural settings (depth, width, kernel size, and resolution). Given a deployment scenario, we can then quickly get a specialized sub-network by selecting from the OFA network without additional training. To prevent interference between many sub-networks during training, we also propose a novel progressive shrinking algorithm, which can train a surprisingly large number of sub-networks (> 10^{19}) simultaneously, while maintaining the same accuracy as independently trained networks. Extensive experiments on various hardware platforms (CPU, GPU, mCPU, mGPU, FPGA accelerator) show that OFA consistently achieves the same level of ImageNet accuracy as SOTA NAS methods (or better) while reducing GPU hours and CO_2 emissions by orders of magnitude. In particular, OFA requires 16x fewer GPU hours than ProxylessNAS, 19x fewer GPU hours than FBNet and 1,300x fewer GPU hours than MnasNet under 40 deployment scenarios. |
Tasks | Neural Architecture Search |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HylxE1HKwS |
https://openreview.net/pdf?id=HylxE1HKwS | |
PWC | https://paperswithcode.com/paper/once-for-all-train-one-network-and-specialize-1 |
Repo | |
Framework | |
DeFINE: Deep Factorized Input Word Embeddings for Neural Sequence Modeling
Title | DeFINE: Deep Factorized Input Word Embeddings for Neural Sequence Modeling |
Authors | Anonymous |
Abstract | For sequence models with large word-level vocabularies, a majority of network parameters lie in the input and output layers. In this work, we describe a new method, DeFINE, for learning deep word-level representations efficiently. Our architecture uses a hierarchical structure with novel skip-connections which allows for the use of low dimensional input and output layers, reducing total parameters and training time while delivering similar or better performance versus existing methods. DeFINE can be incorporated easily in new or existing sequence models. Compared to state-of-the-art methods including adaptive input representations, this technique results in a 6% to 20% drop in perplexity. On WikiText-103, DeFINE reduces total parameters of Transformer-XL by half with minimal impact on performance. On the Penn Treebank, DeFINE improves AWD-LSTM by 4 points with a 17% reduction in parameters, achieving comparable performance to state-of-the-art methods with fewer parameters. For machine translation, DeFINE improves a Transformer model by 2% while simultaneously reducing total parameters by 26%. |
Tasks | Machine Translation, Word Embeddings |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rJeXS04FPH |
https://openreview.net/pdf?id=rJeXS04FPH | |
PWC | https://paperswithcode.com/paper/define-deep-factorized-input-word-embeddings |
Repo | |
Framework | |
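To see why a low-dimensional input layer saves parameters when the vocabulary is large, the arithmetic can be worked through directly. The sketch below uses a plain two-matrix factorization, which is a simpler stand-in and not DeFINE's hierarchical structure with skip-connections. The vocabulary size and dimensions are hypothetical values chosen for illustration, not figures from the paper.

```python
def embedding_params(vocab_size, model_dim):
    """Parameters in a standard embedding table of shape (vocab_size, model_dim)."""
    return vocab_size * model_dim


def factorized_embedding_params(vocab_size, low_dim, model_dim):
    """Parameters when tokens are first embedded in low_dim dimensions
    and then projected up to model_dim by one dense matrix."""
    return vocab_size * low_dim + low_dim * model_dim


# Hypothetical sizes, chosen only to make the arithmetic concrete.
vocab_size, model_dim, low_dim = 250_000, 1024, 128

full = embedding_params(vocab_size, model_dim)
small = factorized_embedding_params(vocab_size, low_dim, model_dim)
print(f"standard:   {full:,}")
print(f"factorized: {small:,}")
```

Because the vocabulary term dominates, shrinking the per-token dimension cuts the table size almost proportionally, while the projection matrix adds comparatively few parameters.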
Learning to Combat Compounding-Error in Model-Based Reinforcement Learning
Title | Learning to Combat Compounding-Error in Model-Based Reinforcement Learning |
Authors | Anonymous |
Abstract | Despite its potential to improve sample complexity versus model-free approaches, model-based reinforcement learning can fail catastrophically if the model is inaccurate. An algorithm should ideally be able to trust an imperfect model over a reasonably long planning horizon, and only rely on model-free updates when the model errors get infeasibly large. In this paper, we investigate techniques for choosing the planning horizon on a state-dependent basis, where a state’s planning horizon is determined by the maximum cumulative model error around that state. We demonstrate that these state-dependent model errors can be learned with Temporal Difference methods, based on a novel approach of temporally decomposing the cumulative model errors. Experimental results show that the proposed method can successfully adapt the planning horizon to account for state-dependent model accuracy, significantly improving the efficiency of policy learning compared to model-based and model-free baselines. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=S1g_S0VYvr |
https://openreview.net/pdf?id=S1g_S0VYvr | |
PWC | https://paperswithcode.com/paper/learning-to-combat-compounding-error-in-model |
Repo | |
Framework | |
Gauge Equivariant Spherical CNNs
Title | Gauge Equivariant Spherical CNNs |
Authors | Anonymous |
Abstract | Spherical CNNs are convolutional neural networks that can process signals on the sphere, such as global climate and weather patterns or omnidirectional images. Over the last few years, a number of spherical convolution methods have been proposed, based on generalized spherical FFTs, graph convolutions, and other ideas. However, none of these methods is simultaneously equivariant to 3D rotations, able to detect anisotropic patterns, computationally efficient, agnostic to the type of sample grid used, and able to deal with signals defined on only a part of the sphere. To address these limitations, we introduce the Gauge Equivariant Spherical CNN. Our method is based on the recently proposed theory of Gauge Equivariant CNNs, which is in principle applicable to signals on any manifold, and which can be computed on any set of local charts covering all of the manifold or only part of it. In this paper we show how this method can be implemented efficiently for the sphere, and show that the resulting method is fast, numerically accurate, and achieves good results on the widely used benchmark problems of climate pattern segmentation and omnidirectional semantic segmentation. |
Tasks | Semantic Segmentation |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HJeYSxHFDS |
https://openreview.net/pdf?id=HJeYSxHFDS | |
PWC | https://paperswithcode.com/paper/gauge-equivariant-spherical-cnns |
Repo | |
Framework | |
Dynamics-Aware Unsupervised Skill Discovery
Title | Dynamics-Aware Unsupervised Skill Discovery |
Authors | Anonymous |
Abstract | Conventionally, model-based reinforcement learning (MBRL) aims to learn a global model for the dynamics of the environment. A good model can potentially enable planning algorithms to generate a large variety of behaviors and solve diverse tasks. However, learning an accurate model for complex dynamical systems is difficult, and even then, the model might not generalize well outside the distribution of states on which it was trained. In this work, we combine model-based learning with model-free learning of primitives that make model-based planning easy. To that end, we aim to answer the question: how can we discover skills whose outcomes are easy to predict? We propose an unsupervised learning algorithm, Dynamics-Aware Discovery of Skills (DADS), which simultaneously discovers predictable behaviors and learns their dynamics. Our method can leverage continuous skill spaces, theoretically allowing us to learn infinitely many behaviors even for high-dimensional state-spaces. We demonstrate that zero-shot planning in the learned latent space significantly outperforms standard MBRL and model-free goal-conditioned RL, can handle sparse-reward tasks, and substantially improves over prior hierarchical RL methods for unsupervised skill discovery. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HJgLZR4KvH |
https://openreview.net/pdf?id=HJgLZR4KvH | |
PWC | https://paperswithcode.com/paper/dynamics-aware-unsupervised-skill-discovery |
Repo | |
Framework | |
Diverse Trajectory Forecasting with Determinantal Point Processes
Title | Diverse Trajectory Forecasting with Determinantal Point Processes |
Authors | Anonymous |
Abstract | The ability to forecast a set of likely yet diverse possible future behaviors of an agent (e.g., future trajectories of a pedestrian) is essential for safety-critical perception systems (e.g., autonomous vehicles). In particular, a set of possible future behaviors generated by the system must be diverse to account for all possible outcomes in order to take necessary safety precautions. It is not sufficient to maintain a set of the most likely future outcomes because the set may only contain perturbations of a dominating single outcome (major mode). While generative models such as variational autoencoders (VAEs) have been shown to be a powerful tool for learning a distribution over future trajectories, randomly drawn samples from the learned implicit likelihood model may not be diverse – the likelihood model is derived from the training data distribution and the samples will concentrate around the major mode of the data. In this work, we propose to learn a diversity sampling function (DSF) that generates a diverse yet likely set of future trajectories. The DSF maps forecasting context features to a set of latent codes which can be decoded by a generative model (e.g., VAE) into a set of diverse trajectory samples. Concretely, the process of identifying the diverse set of samples is posed as DSF parameter estimation. To learn the parameters of the DSF, the diversity of the trajectory samples is evaluated by a diversity loss based on a determinantal point process (DPP). Gradient descent is performed over the DSF parameters, which in turn moves the latent codes of the sample set to find an optimal set of diverse yet likely trajectories. Our method is a novel application of DPPs to optimize a set of items (forecasted trajectories) in continuous space. We demonstrate the diversity of the trajectories produced by our approach on both low-dimensional 2D trajectory data and high-dimensional human motion data. |
Tasks | Autonomous Vehicles, Point Processes |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=ryxnY3NYPS |
https://openreview.net/pdf?id=ryxnY3NYPS | |
PWC | https://paperswithcode.com/paper/diverse-trajectory-forecasting-with-1 |
Repo | |
Framework | |
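A determinantal point process assigns higher probability to sets whose similarity kernel has a larger determinant, which is why it can serve as a diversity measure. The sketch below scores a set of 2D points by the determinant of an RBF similarity kernel. It is an illustrative stand-in, not the diversity loss used in the paper: the kernel choice, the `bandwidth` parameter, and the function names are assumptions.

```python
import math


def rbf_kernel(points, bandwidth=1.0):
    """Similarity matrix with entries exp(-||x_i - x_j||^2 / (2 * bandwidth^2))."""
    def similarity(p, q):
        squared_distance = sum((a - b) ** 2 for a, b in zip(p, q))
        return math.exp(-squared_distance / (2 * bandwidth ** 2))
    return [[similarity(p, q) for q in points] for p in points]


def determinant(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return 0.0  # numerically singular: some samples are (near) duplicates
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


def dpp_diversity(points, bandwidth=1.0):
    """Unnormalized DPP score of a sample set; larger means more diverse."""
    return determinant(rbf_kernel(points, bandwidth))


clustered = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)]
spread = [(0.0, 0.0), (3.0, 0.0), (0.0, 3.0)]
print(dpp_diversity(clustered), dpp_diversity(spread))
```

Samples crowded around one mode produce a nearly singular kernel and a score close to zero, while well-separated samples give a score close to one. A loss built on such a score therefore pushes a sample set away from perturbations of a single dominant outcome.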
Improving and Stabilizing Deep Energy-Based Learning
Title | Improving and Stabilizing Deep Energy-Based Learning |
Authors | Anonymous |
Abstract | Deep energy-based models are powerful, but pose challenges for learning and inference (Belanger & McCallum, 2016). Tu & Gimpel (2018) developed an efficient framework for energy-based models by training “inference networks” to approximate structured inference instead of using gradient descent. However, their alternating optimization approach suffers from instabilities during training, requiring additional loss terms and careful hyperparameter tuning. In this paper, we contribute several strategies to stabilize and improve this joint training of energy functions and inference networks for structured prediction. We design a compound objective to jointly train both cost-augmented and test-time inference networks along with the energy function. We propose joint parameterizations for the inference networks that encourage them to capture complementary functionality during learning. We empirically validate our strategies on two sequence labeling tasks, showing easier paths to strong performance than prior work, as well as further improvements with global energy terms. |
Tasks | Structured Prediction |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HkxHXJrtPr |
https://openreview.net/pdf?id=HkxHXJrtPr | |
PWC | https://paperswithcode.com/paper/improving-and-stabilizing-deep-energy-based |
Repo | |
Framework | |
EMPIR: Ensembles of Mixed Precision Deep Networks for Increased Robustness Against Adversarial Attacks
Title | EMPIR: Ensembles of Mixed Precision Deep Networks for Increased Robustness Against Adversarial Attacks |
Authors | Anonymous |
Abstract | Ensuring robustness of Deep Neural Networks (DNNs) is crucial to their adoption in safety-critical applications such as self-driving cars, drones, and healthcare. Notably, DNNs are vulnerable to adversarial attacks in which small input perturbations can produce catastrophic misclassifications. In this work, we propose EMPIR, ensembles of quantized DNN models with different numerical precisions, as a new approach to increase robustness against adversarial attacks. EMPIR is based on the observation that quantized neural networks often demonstrate much higher robustness to adversarial attacks than full precision networks, but at the cost of a substantial loss in accuracy on the original (unperturbed) inputs. EMPIR overcomes this limitation to achieve the “best of both worlds”, i.e., the higher unperturbed accuracies of the full precision models combined with the higher robustness of the low precision models, by composing them in an ensemble. Further, as low precision DNN models have significantly lower computational and storage requirements than full precision models, EMPIR models only incur modest compute and memory overheads compared to a single full-precision model (<25% in our evaluations). We evaluate EMPIR across a suite of 3 different DNN tasks (MNIST, CIFAR-10 and ImageNet) and under 4 different adversarial attacks. Our results indicate that EMPIR boosts the average adversarial accuracies by 43.6%, 15.3% and 11.9% for the DNN models trained on the MNIST, CIFAR-10 and ImageNet datasets respectively, when compared to single full-precision models, without sacrificing accuracy on the unperturbed inputs. |
Tasks | Self-Driving Cars |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HJem3yHKwH |
https://openreview.net/pdf?id=HJem3yHKwH | |
PWC | https://paperswithcode.com/paper/empir-ensembles-of-mixed-precision-deep |
Repo | |
Framework | |
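The two ingredients named in the abstract above, reduced numerical precision and ensembling, can be sketched in a few lines. This is a toy illustration under stated assumptions and not the EMPIR implementation: the uniform quantizer, its `[-1, 1]` range, and combining members by averaging their class probabilities are all choices made here for illustration.

```python
def quantize(value, bits, lo=-1.0, hi=1.0):
    """Uniformly quantize a value to 2**bits levels on [lo, hi] (assumed range)."""
    levels = 2 ** bits - 1
    step = (hi - lo) / levels
    clipped = min(max(value, lo), hi)
    return lo + round((clipped - lo) / step) * step


def ensemble_predict(probabilities_per_model):
    """Average the class probabilities of all members and return the top class."""
    n_models = len(probabilities_per_model)
    n_classes = len(probabilities_per_model[0])
    averaged = [
        sum(probs[c] for probs in probabilities_per_model) / n_models
        for c in range(n_classes)
    ]
    return max(range(n_classes), key=averaged.__getitem__)


# A 2-bit quantizer has only four levels, so 0.3 snaps to the nearest one.
print(quantize(0.3, bits=2))

# Hypothetical outputs: one full-precision and two low-precision members.
member_outputs = [[0.6, 0.4], [0.2, 0.8], [0.3, 0.7]]
print(ensemble_predict(member_outputs))
```

In this toy case the full-precision member alone would pick class 0, but the two low-precision members outvote it once the probabilities are averaged.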
Training Recurrent Neural Networks Online by Learning Explicit State Variables
Title | Training Recurrent Neural Networks Online by Learning Explicit State Variables |
Authors | Anonymous |
Abstract | Recurrent neural networks (RNNs) provide a powerful tool for online prediction in partially observable problems. However, there are two primary issues one must overcome when training an RNN: the sensitivity of the learning algorithm’s performance to truncation length, and long training times. There are a variety of strategies to improve training in RNNs, particularly with Backprop Through Time (BPTT) and Real-Time Recurrent Learning. These strategies, however, are typically computationally expensive and focus computation on computing gradients back in time. In this work, we reformulate the RNN training objective to explicitly learn state vectors; this breaks the dependence across time and so avoids the need to estimate gradients far back in time. We show that for a fixed buffer of data, our algorithm, called Fixed Point Propagation (FPP), is sound: it converges to a stationary point of the new objective. We investigate the empirical performance of our online FPP algorithm, particularly in terms of computation compared to truncated BPTT with varying truncation levels. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SJgmR0NKPr |
https://openreview.net/pdf?id=SJgmR0NKPr | |
PWC | https://paperswithcode.com/paper/training-recurrent-neural-networks-online-by |
Repo | |
Framework | |
Kernel of CycleGAN as a principal homogeneous space
Title | Kernel of CycleGAN as a principal homogeneous space |
Authors | Anonymous |
Abstract | Unpaired image-to-image translation has attracted significant interest due to the invention of CycleGAN, a method which utilizes a combination of adversarial and cycle consistency losses to avoid the need for paired data. It is known that the CycleGAN problem might admit multiple solutions, and our goal in this paper is to analyze the space of exact solutions and to give perturbation bounds for approximate solutions. We show theoretically that the exact solution space is invariant with respect to automorphisms of the underlying probability spaces, and, furthermore, that the group of automorphisms acts freely and transitively on the space of exact solutions. We first examine the case of zero pure CycleGAN loss in full generality, and subsequently expand our analysis to approximate solutions for the extended CycleGAN loss, where an identity loss term is included. In order to demonstrate that these results are applicable, we show that under mild conditions nontrivial smooth automorphisms exist. Furthermore, we provide empirical evidence that neural networks can learn these automorphisms with unexpected and unwanted results. We conclude that finding optimal solutions to the CycleGAN loss does not necessarily lead to the envisioned result in image-to-image translation tasks and that underlying hidden symmetries can render the result useless. |
Tasks | Image-to-Image Translation |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=B1eWOJHKvB |
https://openreview.net/pdf?id=B1eWOJHKvB | |
PWC | https://paperswithcode.com/paper/kernel-of-cyclegan-as-a-principal-homogeneous |
Repo | |
Framework | |
Adversarially Robust Neural Networks via Optimal Control: Bridging Robustness with Lyapunov Stability
Title | Adversarially Robust Neural Networks via Optimal Control: Bridging Robustness with Lyapunov Stability |
Authors | Zhiyang Chen, Hang Su |
Abstract | Deep neural networks are known to be vulnerable to adversarial perturbations. In this paper, we bridge adversarial robustness of neural nets with Lyapunov stability of dynamical systems. From this viewpoint, training neural nets is equivalent to finding an optimal control of the discrete dynamical system, which allows one to train neural nets with the method of successive approximations, an optimal control algorithm based on Pontryagin’s maximum principle. This decoupled training method allows us to add constraints to the optimization, which makes the deep model more robust. The constrained optimization problem can be formulated as a semi-definite programming problem and hence can be solved efficiently. Experiments show that our method effectively improves the adversarial robustness of deep models. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BklVA2NYvH |
https://openreview.net/pdf?id=BklVA2NYvH | |
PWC | https://paperswithcode.com/paper/adversarially-robust-neural-networks-via |
Repo | |
Framework | |