Paper Group NANR 21
Policy Optimization by Local Improvement through Search
Title | Policy Optimization by Local Improvement through Search |
Authors | Anonymous |
Abstract | Imitation learning has emerged as a powerful strategy for learning initial policies that can be refined with reinforcement learning techniques. Most strategies in imitation learning, however, rely on per-step supervision, either from expert demonstrations (behavioral cloning) or from interactive expert policy queries (as in DAgger). These strategies differ in the state distribution at which the expert actions are collected: the former uses the state distribution of the expert, the latter the state distribution of the policy being trained. In both cases, however, the learning signal arises from the expert actions. At the other end of the spectrum, approaches rooted in Policy Iteration, such as Dual Policy Iteration, do not choose next-step actions based on an expert, but instead use planning or search over the policy to choose an action distribution to train towards. However, this can be computationally expensive, and can also end up training the policy on a state distribution that is far from the current policy’s induced distribution. In this paper, we propose an algorithm that finds a middle ground by using Monte Carlo Tree Search (MCTS) to perform local trajectory improvement over rollouts from the policy. We provide theoretical justification both for the proposed local trajectory search algorithm and for our use of MCTS as a local policy improvement operator. We also show empirically that our method (Policy Optimization by Local Improvement through Search, or POLISH) is much faster than methods that plan globally, speeding up training by a factor of up to 14 in wall-clock time. Furthermore, the resulting policy outperforms strong baselines in both reinforcement learning and imitation learning. |
Tasks | Imitation Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HyxgoyHtDB |
PDF | https://openreview.net/pdf?id=HyxgoyHtDB |
PWC | https://paperswithcode.com/paper/policy-optimization-by-local-improvement |
Repo | |
Framework | |
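A minimal sketch of local trajectory improvement, assuming a hypothetical 1-D chain environment: the current policy is rolled out to collect states from its own distribution, and at each visited state a search proposes an improved action to train towards. For brevity, an exhaustive depth-limited lookahead stands in for MCTS; `step`, `GOAL`, and every other name here are illustrative, not the authors' code.

```python
from typing import Callable, List, Tuple

# Hypothetical toy environment: a 1-D chain of states 0..GOAL; reaching GOAL ends the episode.
GOAL = 5
ACTIONS = (-1, +1)

def step(state: int, action: int) -> Tuple[int, float]:
    """Move along the chain (clipped to [0, GOAL]); reward 1.0 on reaching GOAL."""
    nxt = min(max(state + action, 0), GOAL)
    return nxt, (1.0 if nxt == GOAL else 0.0)

def rollout(policy: Callable[[int], int], start: int, horizon: int) -> List[int]:
    """States visited by the current policy, i.e. samples from its own state distribution."""
    states, s = [], start
    for _ in range(horizon):
        states.append(s)
        if s == GOAL:
            break
        s, _ = step(s, policy(s))
    return states

def lookahead_value(state: int, depth: int, gamma: float) -> float:
    """Exhaustive depth-limited search; a simple stand-in for MCTS."""
    if depth == 0 or state == GOAL:
        return 0.0
    return max(r + gamma * lookahead_value(nxt, depth - 1, gamma)
               for nxt, r in (step(state, a) for a in ACTIONS))

def improved_action(state: int, depth: int = 3, gamma: float = 0.9) -> int:
    """Local improvement: the first action preferred by the search (ties go to the first in ACTIONS)."""
    def q(action: int) -> float:
        nxt, r = step(state, action)
        return r + gamma * lookahead_value(nxt, depth - 1, gamma)
    return max(ACTIONS, key=q)

def collect_targets(policy: Callable[[int], int], start: int, horizon: int) -> List[Tuple[int, int]]:
    """(state, improved action) pairs on the policy's own rollout, used as training targets."""
    return [(s, improved_action(s)) for s in rollout(policy, start, horizon) if s != GOAL]
```

For example, `collect_targets(lambda s: -1, 2, 4)` labels state 2 with action `+1` even though the rolled-out policy moves left; states whose goal lies beyond the search depth fall back to the first action in `ACTIONS`.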
Anchor & Transform: Learning Sparse Representations of Discrete Objects
Title | Anchor & Transform: Learning Sparse Representations of Discrete Objects |
Authors | Anonymous |
Abstract | Learning continuous representations of discrete objects such as text, users, and items lies at the heart of many applications, including text and user modeling. Unfortunately, traditional methods that embed all objects do not scale to large vocabulary sizes and embedding dimensions. In this paper, we propose a general method, Anchor & Transform (ANT), that learns sparse representations of discrete objects by jointly learning a small set of anchor embeddings and a sparse transformation from anchor objects to all objects. ANT is scalable, flexible, end-to-end trainable, and allows the user to easily incorporate domain knowledge about object relationships (e.g., WordNet, co-occurrence, item clusters). ANT also recovers several task-specific baselines under certain structural assumptions on the anchors and transformation matrices. On text classification and language modeling benchmarks, ANT demonstrates stronger performance with fewer parameters than existing vocabulary selection and embedding compression baselines. |
Tasks | Language Modelling, Text Classification |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=H1epaJSYDS |
PDF | https://openreview.net/pdf?id=H1epaJSYDS |
PWC | https://paperswithcode.com/paper/anchor-transform-learning-sparse |
Repo | |
Framework | |
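As a sketch of the factorization described above, each object's embedding can be read as one row of E = T A, where A holds a small set of anchor embeddings and T is a sparse object-to-anchor transform. The dictionary-based sparse storage and the function names are assumptions for illustration; how ANT learns and sparsifies T is not shown.

```python
from typing import Dict, List

Vector = List[float]
SparseTransform = Dict[int, Dict[int, float]]  # {object id: {anchor id: weight}}

def embed(obj: int, anchors: List[Vector], transform: SparseTransform) -> Vector:
    """Row `obj` of E = T @ A, computed from the non-zero entries of T only."""
    dim = len(anchors[0])
    out = [0.0] * dim
    for anchor_id, weight in transform.get(obj, {}).items():
        for d in range(dim):
            out[d] += weight * anchors[anchor_id][d]
    return out  # objects with no non-zeros in T get the zero vector

def num_parameters(anchors: List[Vector], transform: SparseTransform) -> int:
    """Dense anchor entries plus the non-zero entries of the sparse transform."""
    return len(anchors) * len(anchors[0]) + sum(len(row) for row in transform.values())
```

With 1,000 objects that each use a single anchor, two 3-dimensional anchors plus 1,000 non-zeros give 1,006 parameters, versus 3,000 for a dense 1,000 x 3 embedding table.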
Low Bias Gradient Estimates for Very Deep Boolean Stochastic Networks
Title | Low Bias Gradient Estimates for Very Deep Boolean Stochastic Networks |
Authors | Anonymous |
Abstract | Stochastic neural networks with discrete random variables are an important class of models for their expressivity and interpretability. Since direct differentiation and backpropagation are not possible, Monte Carlo gradient estimation techniques have been widely employed for training such models. Efficient stochastic gradient estimators, such as Straight-Through and Gumbel-Softmax, work well for shallow models with one or two stochastic layers. Their performance, however, suffers with increasing model complexity. In this work we focus on stochastic networks with multiple layers of Boolean latent variables. To analyze such networks, we employ the framework of harmonic analysis for Boolean functions. We use it to derive an analytic formulation for the source of bias in the biased Straight-Through estimator. Based on the analysis we propose FouST, a simple gradient estimation algorithm that relies on three simple bias reduction steps. Extensive experiments show that FouST performs favorably compared to state-of-the-art biased estimators, while being much faster than unbiased ones. To the best of our knowledge, FouST is the first gradient estimator to train very deep stochastic neural networks, with up to 80 deterministic and 11 stochastic layers. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Bygadh4tDB |
PDF | https://openreview.net/pdf?id=Bygadh4tDB |
PWC | https://paperswithcode.com/paper/low-bias-gradient-estimates-for-very-deep |
Repo | |
Framework | |
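The abstract locates a bias in the Straight-Through (ST) estimator. As a hedged illustration (not FouST itself), the sketch compares the exact gradient of E[f(b)] for a single ±1 variable with P(b = +1) = p against the expected value of one common ST variant, which multiplies f'(b) by the derivative of the mean 2p - 1. The test function `f` is hypothetical.

```python
from typing import Callable

Fn = Callable[[float], float]

def exact_grad(f: Fn, p: float) -> float:
    """d/dp E[f(b)] for b in {-1, +1} with P(b = +1) = p.

    E[f(b)] = p * f(+1) + (1 - p) * f(-1), so the derivative is f(+1) - f(-1) for every p.
    """
    return f(1.0) - f(-1.0)

def expected_st_grad(df: Fn, p: float) -> float:
    """Expected Straight-Through estimate: f'(b) times d(2p - 1)/dp = 2, averaged over b."""
    return 2.0 * (p * df(1.0) + (1.0 - p) * df(-1.0))

# Hypothetical test function and its derivative.
f: Fn = lambda b: (b - 0.5) ** 2
df: Fn = lambda b: 2.0 * (b - 0.5)
```

For f(b) = (b - 0.5)^2 the exact gradient is -2 for every p; the expected ST estimate equals 8p - 6, which matches at p = 0.5 but gives 0.4 at p = 0.8, because f' is not constant over {-1, +1}.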
Learning General and Reusable Features via Racecar-Training
Title | Learning General and Reusable Features via Racecar-Training |
Authors | Anonymous |
Abstract | We propose a novel training approach for improving the learning of generalizing features in neural networks. We augment the network with a reverse pass that aims to reconstruct the full sequence of internal states of the network. Despite being a surprisingly simple change, we demonstrate that this forward-backward training approach, i.e., racecar training, leads to significantly more general features being extracted from a given data set. We demonstrate that when a network obtained in this way is continually trained for the original task, it outperforms baseline models trained in a regular fashion. This improved performance is visible for a wide range of learning tasks, from classification to regression and stylization. In addition, networks trained with our approach exhibit improved performance for task transfers. We additionally analyze the mutual information of our networks to explain the improved generalizing capabilities. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=H1gDaa4YwS |
PDF | https://openreview.net/pdf?id=H1gDaa4YwS |
PWC | https://paperswithcode.com/paper/learning-general-and-reusable-features-via |
Repo | |
Framework | |
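A minimal sketch of the forward-backward objective, assuming a linear network without biases or nonlinearities, and assuming the reverse pass reuses the transposed forward weights (the abstract does not specify how the reverse pass is parameterized). The loss adds, to the task loss, a penalty for each internal state the reverse pass fails to reconstruct; the weight `lam` is a hypothetical hyperparameter.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]

def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]

def racecar_loss(weights: List[Matrix], x: Vector, target: Vector, lam: float = 1.0) -> float:
    """Task loss plus reconstruction loss of every internal state from a reverse pass."""
    # Forward pass: keep the full sequence of states, starting with the input.
    states = [x]
    for w in weights:
        states.append(matvec(w, states[-1]))
    task = sum((o - t) ** 2 for o, t in zip(states[-1], target))

    # Reverse pass (assumption: transposed forward weights), walking back from the output.
    recon = states[-1]
    rec_loss = 0.0
    for w, state in zip(reversed(weights), reversed(states[:-1])):
        recon = matvec(transpose(w), recon)
        rec_loss += sum((r - s) ** 2 for r, s in zip(recon, state))
    return task + lam * rec_loss
```

With two layers of `2 * I`, input `[1.0, 2.0]` and target `[4.0, 8.0]`, the task loss is zero but the reconstruction terms give a total of 1305.0, whereas identity weights give 0.0; the reverse pass penalizes weights that do not preserve the information in the internal states.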
Attention Forcing for Sequence-to-sequence Model Training
Title | Attention Forcing for Sequence-to-sequence Model Training |
Authors | Anonymous |
Abstract | Auto-regressive sequence-to-sequence models with attention mechanisms have achieved state-of-the-art performance in many tasks such as machine translation and speech synthesis. These models can be difficult to train. The standard approach, teacher forcing, guides a model with reference output history during training. The problem is that the model is unlikely to recover from its mistakes during inference, where the reference output is replaced by generated output. Several approaches deal with this problem, largely by guiding the model with generated output history. To make training stable, these approaches often require a heuristic schedule or an auxiliary classifier. This paper introduces attention forcing, which guides the model with generated output history and reference attention. This approach can train the model to recover from its mistakes, in a stable fashion, without the need for a schedule or a classifier. In addition, it allows the model to generate output sequences aligned with the references, which can be important for cascaded systems like many speech synthesis systems. Experiments on speech synthesis show that attention forcing yields significant performance gains. Experiments on machine translation show that for tasks where various re-orderings of the output are valid, guiding the model with generated output history is challenging, while guiding the model with reference attention is beneficial. |
Tasks | Machine Translation, Speech Synthesis |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rJe5_CNtPB |
PDF | https://openreview.net/pdf?id=rJe5_CNtPB |
PWC | https://paperswithcode.com/paper/attention-forcing-for-sequence-to-sequence-1 |
Repo | |
Framework | |
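The contrast between teacher forcing and attention forcing can be sketched with a toy scalar decoder. Assumptions: the attention score, the output layer (`context + 1.0`), and the start value are hypothetical. The generated output is always fed back as history; when `ref_attention` is given, the reference alignment replaces the model's attention when building the context, while the model's own attention is still returned so that a training loss could compare it with the reference.

```python
import math
from typing import List, Optional, Tuple

def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def decode(enc: List[float], steps: int,
           ref_attention: Optional[List[List[float]]] = None
           ) -> Tuple[List[float], List[List[float]]]:
    """Toy autoregressive decoder; `ref_attention` switches on attention forcing."""
    outputs: List[float] = []
    model_attention: List[List[float]] = []
    prev = 0.0  # hypothetical start value
    for t in range(steps):
        # The model's attention depends on the generated history via `prev` (hypothetical score).
        att = softmax([-(e - prev) ** 2 for e in enc])
        model_attention.append(att)
        # Attention forcing: build the context from the reference alignment instead.
        used = ref_attention[t] if ref_attention is not None else att
        context = sum(a * e for a, e in zip(used, enc))
        prev = context + 1.0  # hypothetical output layer; fed back as generated history
        outputs.append(prev)
    return outputs, model_attention
```

With one-hot reference attention over `[0.0, 10.0, 20.0]`, the outputs follow the reference alignment (`[1.0, 11.0]`) even though the history is the model's own output.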
Stabilizing DARTS with Amended Gradient Estimation on Architectural Parameters
Title | Stabilizing DARTS with Amended Gradient Estimation on Architectural Parameters |
Authors | Anonymous |
Abstract | Differentiable neural architecture search has been a popular methodology for exploring architectures for deep learning. Despite its great advantage in search efficiency, it often suffers from weak stability, which prevents it from being applied to a large search space or being flexibly adjusted to different scenarios. This paper investigates DARTS, currently the most popular differentiable search algorithm, and points out an important source of instability: its approximation of the gradients of architectural parameters. With this approximation, the optimization algorithm can converge to a different point, which results in dramatic inaccuracy in the re-training process. Based on this analysis, we propose an amending term for computing architectural gradients that makes use of a direct property of the optimality of network parameter optimization. Our approach mathematically guarantees that gradient estimation follows a roughly correct direction, which leads the search stage to converge on reasonable architectures. In practice, our algorithm is easily implemented and efficiently added to DARTS-based approaches. Experiments on CIFAR and ImageNet demonstrate that our approach yields accuracy gains and, more importantly, enables DARTS-based approaches to explore much larger search spaces that have not been studied before. |
Tasks | Neural Architecture Search |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BJlgt2EYwr |
PDF | https://openreview.net/pdf?id=BJlgt2EYwr |
PWC | https://paperswithcode.com/paper/stabilizing-darts-with-amended-gradient-1 |
Repo | |
Framework | |
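For context on the approximation the abstract refers to, the sketch computes the second-order architecture gradient from the original DARTS paper (one virtual SGD step on the weights, with the Hessian-vector product replaced by a central finite difference) on a hypothetical scalar problem with L_train = (w - a)^2 and L_val = (w - 1)^2 + 0.1 a^2. It is not the amending term proposed here, and the fixed `eps` is a simplification. Because both losses are quadratic, the finite difference matches the analytic gradient of the one-step unrolled objective.

```python
# Hypothetical toy problem with closed-form gradients:
#   L_train(w, a) = (w - a)^2,   L_val(w, a) = (w - 1)^2 + 0.1 * a^2
def grad_w_train(w: float, a: float) -> float:
    return 2.0 * (w - a)

def grad_a_train(w: float, a: float) -> float:
    return -2.0 * (w - a)

def grad_w_val(w: float, a: float) -> float:
    return 2.0 * (w - 1.0)

def grad_a_val(w: float, a: float) -> float:
    return 0.2 * a

def darts_arch_grad(w: float, a: float, xi: float = 0.1, eps: float = 1e-2) -> float:
    """Second-order DARTS estimate of d/da L_val(w - xi * dL_train/dw, a)."""
    w_prime = w - xi * grad_w_train(w, a)   # one virtual SGD step on the weights
    direct = grad_a_val(w_prime, a)         # gradient w.r.t. a at fixed w'
    v = grad_w_val(w_prime, a)
    w_plus, w_minus = w + eps * v, w - eps * v
    # Central finite difference approximating the mixed second derivative times v.
    hvp = (grad_a_train(w_plus, a) - grad_a_train(w_minus, a)) / (2.0 * eps)
    return direct - xi * hvp

def exact_arch_grad(w: float, a: float, xi: float = 0.1) -> float:
    """Analytic d/da of L_val(w - xi * dL_train/dw, a) for the toy losses above."""
    w_prime = w - xi * 2.0 * (w - a)
    return 2.0 * (w_prime - 1.0) * (2.0 * xi) + 0.2 * a
```

At w = 0.3 and a = 0.7 both functions return -0.108 (up to rounding).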
Reinforcement Learning with Chromatic Networks
Title | Reinforcement Learning with Chromatic Networks |
Authors | Anonymous |
Abstract | We present a neural architecture search algorithm to construct compact reinforcement learning (RL) policies, by combining ENAS and ES in a highly scalable and intuitive way. By defining the combinatorial search space of NAS to be the set of different edge-partitionings (colorings) into same-weight classes, we represent compact architectures via efficient learned edge-partitionings. For several RL tasks, we manage to learn colorings translating to effective policies parameterized by as few as 17 weight parameters, providing >90% compression over vanilla policies and 6x compression over state-of-the-art compact policies based on Toeplitz matrices, while still maintaining good reward. We believe that our work is one of the first attempts to propose a rigorous approach to training structured neural network architectures for RL problems that are of interest especially in mobile robotics with limited storage and computational resources. |
Tasks | Neural Architecture Search |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=S1gKkpNKwH |
PDF | https://openreview.net/pdf?id=S1gKkpNKwH |
PWC | https://paperswithcode.com/paper/reinforcement-learning-with-chromatic-1 |
Repo | |
Framework | |
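The edge-partitioning idea can be illustrated with a single linear layer whose weight matrix is defined by a coloring: entry W[i][j] equals `shared[coloring[i][j]]`, so the layer has one trainable parameter per color rather than one per edge. The coloring here is fixed by hand and the function name is hypothetical; in the paper the coloring is what the ENAS- and ES-based search learns, which this sketch does not implement.

```python
from typing import List

def colored_matvec(coloring: List[List[int]], shared: List[float], x: List[float]) -> List[float]:
    """y = W x, where W[i][j] = shared[coloring[i][j]]: all edges of one color share a weight."""
    return [sum(shared[c] * xj for c, xj in zip(row, x)) for row in coloring]
```

The 2 x 3 layer with coloring `[[0, 1, 0], [1, 0, 1]]` has six edges but only two parameters, `shared = [2.0, -1.0]`.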
MANAS: Multi-Agent Neural Architecture Search
Title | MANAS: Multi-Agent Neural Architecture Search |
Authors | Anonymous |
Abstract | The Neural Architecture Search (NAS) problem is typically formulated as a graph search problem where the goal is to learn the optimal operations over edges in order to maximize a graph-level global objective. Due to the large architecture parameter space, efficiency is a key bottleneck preventing the practical use of NAS. In this paper, we address the issue by framing NAS as a multi-agent problem where agents control a subset of the network and coordinate to reach optimal architectures. We provide two distinct lightweight implementations, with reduced memory requirements (1/8 of the state of the art), and performance above that of much more computationally expensive methods. Theoretically, we demonstrate vanishing regrets of the form $\mathcal{O}(\sqrt{T})$, with $T$ being the total number of rounds. Finally, aware that random search is an (often ignored) effective baseline, we perform additional experiments on 3 alternative datasets and 2 network configurations, and achieve favorable results in comparison with this baseline and other competing methods. |
Tasks | Neural Architecture Search |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=ryedqa4FwS |
PDF | https://openreview.net/pdf?id=ryedqa4FwS |
PWC | https://paperswithcode.com/paper/manas-multi-agent-neural-architecture-search-1 |
Repo | |
Framework | |
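One way to picture agents that each control part of the network is to give every edge its own exponential-weights (Hedge) learner over candidate operations. This is an assumption for illustration, not necessarily the update rule MANAS uses; in particular, the sketch assumes every operation's reward is observed on each round (full information), which a real NAS setting typically does not provide. `EdgeAgent` is a hypothetical name.

```python
import math
import random
from typing import List

class EdgeAgent:
    """Hypothetical per-edge agent choosing among candidate operations with exponential weights."""

    def __init__(self, num_ops: int, lr: float = 0.5):
        self.log_weights = [0.0] * num_ops
        self.lr = lr

    def probs(self) -> List[float]:
        m = max(self.log_weights)  # subtract the max for numerical stability
        exps = [math.exp(v - m) for v in self.log_weights]
        total = sum(exps)
        return [e / total for e in exps]

    def sample(self, rng: random.Random) -> int:
        return rng.choices(range(len(self.log_weights)), weights=self.probs())[0]

    def update(self, rewards: List[float]) -> None:
        # Full-information Hedge update (assumption: every operation's reward is observed).
        for i, r in enumerate(rewards):
            self.log_weights[i] += self.lr * r
```

A joint architecture is sampled as `[agent.sample(rng) for agent in agents]`, one operation per edge, and each agent is then updated with the rewards for its own edge.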
Improving One-Shot NAS By Suppressing The Posterior Fading
Title | Improving One-Shot NAS By Suppressing The Posterior Fading |
Authors | Anonymous |
Abstract | There is a growing interest in automated neural architecture search (NAS). To improve the efficiency of NAS, previous approaches adopt weight sharing to force all models to share the same set of weights. However, it has been observed that a model performing better with shared weights does not necessarily perform better when trained alone. In this paper, we analyse existing weight sharing one-shot NAS approaches from a Bayesian point of view and identify the posterior fading problem, which compromises the effectiveness of shared weights. To alleviate this problem, we present a practical approach to guide the parameter posterior towards its true distribution. Moreover, a hard latency constraint is introduced during the search so that the desired latency can be achieved. The resulting method, Posterior Convergent NAS (PC-NAS), achieves state-of-the-art performance under a standard GPU latency constraint on ImageNet. In our small search space, our model PC-NAS-S attains 76.8% top-1 accuracy, 2.1% higher than MobileNetV2 (1.4x) with the same latency. When applied to our large search space, PC-NAS-L achieves 78.1% top-1 accuracy within 11 ms. The discovered architecture also transfers well to other computer vision applications such as object detection and person re-identification. |
Tasks | Neural Architecture Search, Object Detection, Person Re-Identification |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HJgJNCEKPr |
PDF | https://openreview.net/pdf?id=HJgJNCEKPr |
PWC | https://paperswithcode.com/paper/improving-one-shot-nas-by-suppressing-the-1 |
Repo | |
Framework | |
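A hard latency constraint can be sketched as filtering candidate architectures with a per-operation latency lookup table. Assumptions: latencies add up across layers, and the operation names and millisecond values are hypothetical. Exhaustive enumeration is only feasible for tiny spaces; a real search would sample candidates and apply the same check.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

def feasible_architectures(choices: Sequence[Sequence[str]],
                           latency_ms: Dict[str, float],
                           budget_ms: float) -> List[Tuple[str, ...]]:
    """All per-layer op combinations whose summed (lookup-table) latency meets the hard budget."""
    return [arch for arch in product(*choices)
            if sum(latency_ms[op] for op in arch) <= budget_ms]
```

With two layers choosing between `"mb3"` (2 ms) and `"mb6"` (4 ms) under a 6 ms budget, only the `("mb6", "mb6")` combination is rejected.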
Graph Constrained Reinforcement Learning for Natural Language Action Spaces
Title | Graph Constrained Reinforcement Learning for Natural Language Action Spaces |
Authors | Anonymous |
Abstract | Interactive Fiction (IF) games are text-based simulations in which an agent interacts with the world purely through natural language. They are ideal environments for studying how to extend reinforcement learning agents to meet the challenges of natural language understanding, partial observability, and action generation in combinatorially-large text-based action spaces. We present KG-A2C, an agent that builds a dynamic knowledge graph while exploring and generates actions using a template-based action space. We contend that the dual uses of the knowledge graph to reason about game state and to constrain natural language generation are the keys to scalable exploration of combinatorially large natural language actions. Results across a wide variety of IF games show that KG-A2C outperforms current IF agents despite the exponential increase in action space size. |
Tasks | Text Generation |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=B1x6w0EtwH |
PDF | https://openreview.net/pdf?id=B1x6w0EtwH |
PWC | https://paperswithcode.com/paper/graph-constrained-reinforcement-learning-for |
Repo | |
Framework | |
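The template-based action space can be illustrated by filling each template's object slots with entities currently present in the agent's knowledge graph, which is one way the graph constrains generation. The `OBJ` placeholder convention, the template strings, and the use of distinct objects per action are assumptions; KG-A2C's learned template and object decoders are not shown.

```python
from itertools import permutations
from typing import List, Set

def candidate_actions(templates: List[str], kg_objects: Set[str]) -> List[str]:
    """Fill each template's OBJ slots with distinct objects currently in the knowledge graph."""
    actions = []
    objs = sorted(kg_objects)  # deterministic order
    for tmpl in templates:
        n_slots = tmpl.count("OBJ")
        for combo in permutations(objs, n_slots):  # a template with no slots yields itself once
            action = tmpl
            for obj in combo:
                action = action.replace("OBJ", obj, 1)
            actions.append(action)
    return actions
```

With the templates `"look"`, `"take OBJ"` and `"put OBJ in OBJ"` and a graph containing only `lamp` and `box`, five actions are possible, instead of every vocabulary word in every slot.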
Gated Channel Transformation for Visual Recognition
Title | Gated Channel Transformation for Visual Recognition |
Authors | Anonymous |
Abstract | In this work, we propose a generally applicable transformation unit for visual recognition with deep convolutional neural networks. This transformation explicitly models channel relationships with explainable control variables. These variables determine whether neurons compete or cooperate, and they are jointly optimized with the convolutional weights towards more accurate recognition. In Squeeze-and-Excitation (SE) Networks, the channel relationships are implicitly learned by fully connected layers, and the SE block is integrated at the block level. We instead introduce a channel normalization layer to reduce the number of parameters and the computational complexity. This lightweight layer incorporates a simple L2 normalization, which makes our transformation unit applicable at the operator level without a significant increase in parameters. Extensive experiments demonstrate the effectiveness of our unit, with clear margins, on many vision tasks: image classification on ImageNet, object detection and instance segmentation on COCO, and video classification on Kinetics. |
Tasks | Image Classification, Instance Segmentation, Object Detection, Semantic Segmentation, Video Classification |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SJxbu6VKDr |
PDF | https://openreview.net/pdf?id=SJxbu6VKDr |
PWC | https://paperswithcode.com/paper/gated-channel-transformation-for-visual-1 |
Repo | |
Framework | |
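The sketch follows the three steps of the gated channel transformation as we understand them from the published version of this work: an L2-norm embedding per channel scaled by α, an L2 channel normalization across channels rescaled by √C, and a gate 1 + tanh(γ·ŝ + β). Treat the exact formulas and the ε value as assumptions. It operates on one sample of shape [C][H·W] in pure Python for clarity.

```python
import math
from typing import List

def gct(x: List[List[float]], alpha: List[float], gamma: List[float],
        beta: List[float], eps: float = 1e-5) -> List[List[float]]:
    """Gated channel transformation on one sample x of shape [C][H*W]."""
    num_channels = len(x)
    # Global context embedding: scaled L2 norm of each channel.
    s = [alpha[c] * math.sqrt(sum(v * v for v in x[c]) + eps) for c in range(num_channels)]
    # Channel normalization: L2-normalize across channels, rescaled by sqrt(C).
    norm = math.sqrt(sum(v * v for v in s) + eps)
    s_hat = [math.sqrt(num_channels) * v / norm for v in s]
    # Gating: 1 + tanh(...), so gamma = beta = 0 gives the identity mapping.
    gate = [1.0 + math.tanh(gamma[c] * s_hat[c] + beta[c]) for c in range(num_channels)]
    return [[gate[c] * v for v in x[c]] for c in range(num_channels)]
```

With γ = β = 0 every gate equals 1, so the unit starts as an identity mapping; a strongly negative β drives the gates, and hence the output, toward zero.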
Emergence of functional and structural properties of the head direction system by optimization of recurrent neural networks
Title | Emergence of functional and structural properties of the head direction system by optimization of recurrent neural networks |
Authors | Anonymous |
Abstract | Recent work suggests goal-driven training of neural networks can be used to model neural activity in the brain. While response properties of neurons in artificial neural networks bear similarities to those in the brain, the network architectures are often constrained to be different. Here we ask if a neural network can recover both neural representations and, if the architecture is unconstrained and optimized, also the anatomical properties of neural circuits. We demonstrate this in a system where the connectivity and the functional organization have been characterized, namely, the head direction circuit of the rodent and fruit fly. We trained recurrent neural networks (RNNs) to estimate head direction through integration of angular velocity. We found that the two distinct classes of neurons observed in the head direction system, the Ring neurons and the Shifter neurons, emerged naturally in artificial neural networks as a result of training. Furthermore, connectivity analysis and in-silico neurophysiology revealed structural and mechanistic similarities between artificial networks and the head direction system. Overall, our results show that optimization of RNNs in a goal-driven task can recapitulate the structure and function of biological circuits, suggesting that artificial neural networks can be used to study the brain at the level of both neural activity and anatomical organization. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HklSeREtPB |
PDF | https://openreview.net/pdf?id=HklSeREtPB |
PWC | https://paperswithcode.com/paper/emergence-of-functional-and-structural |
Repo | |
Framework | |
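The training target described in the abstract, head direction obtained by integrating angular velocity, can be generated as below. The wrap-around to [0, 2π), the (cos, sin) read-out, and the function names are assumptions for illustration; the RNN itself is not shown.

```python
import math
from typing import List, Tuple

def integrate_heading(theta0: float, angular_velocity: List[float], dt: float) -> List[float]:
    """Ground-truth head direction: integrate angular velocity, wrapped to [0, 2*pi)."""
    headings, theta = [], theta0
    for omega in angular_velocity:
        theta = (theta + omega * dt) % (2.0 * math.pi)
        headings.append(theta)
    return headings

def as_targets(headings: List[float]) -> List[Tuple[float, float]]:
    """Encode each angle as (cos, sin) so that 0 and 2*pi map to the same target."""
    return [(math.cos(h), math.sin(h)) for h in headings]
```

An RNN that receives only the angular-velocity sequence must reproduce these targets, which requires it to keep an internal estimate of the heading over time.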
Adversarially robust transfer learning
Title | Adversarially robust transfer learning |
Authors | Anonymous |
Abstract | Transfer learning, in which a network is trained on one task and re-purposed on another, is often used to produce neural network classifiers when data is scarce or full-scale training is too costly. When the goal is to produce a model that is not only accurate but also adversarially robust, data scarcity and computational limitations become even more cumbersome. We consider robust transfer learning, in which we transfer not only performance but also robustness from a source model to a target domain. We start by observing that robust networks contain robust feature extractors. By training classifiers on top of these feature extractors, we produce new models that inherit the robustness of their parent networks. We then consider the case of “fine tuning” a network by re-training end-to-end in the target domain. When using lifelong learning strategies, this process preserves the robustness of the source network while achieving high accuracy. By using such strategies, it is possible to produce accurate and robust models with little data, and without the cost of adversarial training. Additionally, we can improve the generalization of adversarially trained models, while maintaining their robustness. |
Tasks | Transfer Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=ryebG04YvB |
PDF | https://openreview.net/pdf?id=ryebG04YvB |
PWC | https://paperswithcode.com/paper/adversarially-robust-transfer-learning-1 |
Repo | |
Framework | |
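Training a classifier on top of a frozen feature extractor, the first strategy in the abstract, amounts to fitting only a new head while the extractor's parameters stay fixed. The sketch assumes a binary task and a logistic-regression head trained by plain SGD; `extract` stands in for a robust network's penultimate layer and is hypothetical. It shows the mechanics only and does not demonstrate robustness.

```python
import math
from typing import Callable, List, Sequence, Tuple

Extractor = Callable[[Sequence[float]], List[float]]

def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_linear_head(extract: Extractor, data: List[Tuple[Sequence[float], int]],
                      epochs: int = 200, lr: float = 0.5) -> Tuple[List[float], float]:
    """Fit a logistic-regression head by SGD; the feature extractor is never updated."""
    feats = [(extract(x), y) for x, y in data]  # features computed once from the frozen extractor
    w, b = [0.0] * len(feats[0][0]), 0.0
    for _ in range(epochs):
        for f, y in feats:
            g = sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b) - y  # d(log-loss)/d(logit)
            w = [wi - lr * g * fi for wi, fi in zip(w, f)]
            b -= lr * g
    return w, b

def predict(extract: Extractor, w: List[float], b: float, x: Sequence[float]) -> int:
    return 1 if sum(wi * fi for wi, fi in zip(w, extract(x))) + b > 0 else 0
```

Because gradients never reach `extract`, whatever robustness its features have is left untouched; only the head's weights `w` and bias `b` change.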
Synthesizing Programmatic Policies that Inductively Generalize
Title | Synthesizing Programmatic Policies that Inductively Generalize |
Authors | Anonymous |
Abstract | Deep reinforcement learning has successfully solved a number of challenging control tasks. However, learned policies typically have difficulty generalizing to novel environments. We propose an algorithm for learning programmatic state machine policies that can capture repeating behaviors. By doing so, they have the ability to generalize to instances requiring an arbitrary number of repetitions, a property we call inductive generalization. However, state machine policies are hard to learn since they consist of a combination of continuous and discrete structure. We propose a learning framework called adaptive teaching, which learns a state machine policy by imitating a teacher; in contrast to traditional imitation learning, our teacher adaptively updates itself based on the structure of the student. We show how our algorithm can be used to learn policies that inductively generalize to novel environments, whereas traditional neural network policies fail to do so. |
Tasks | Imitation Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=S1l8oANFDH |
PDF | https://openreview.net/pdf?id=S1l8oANFDH |
PWC | https://paperswithcode.com/paper/synthesizing-programmatic-policies-that |
Repo | |
Framework | |
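A toy example of a programmatic state machine policy with repeating behavior: a hypothetical agent on a line shuttles items from a pile at position 0 to a bin at `BIN`. The policy has two modes, each pairing an action with a switching condition. Because it loops over modes rather than memorizing a fixed action sequence, the same policy handles any number of items. The environment and names are illustrative, and the adaptive-teaching procedure used to learn such policies is not shown.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

BIN = 4  # hypothetical bin position; the pile is at position 0

@dataclass
class World:
    pos: int = 0
    carrying: bool = False
    remaining: int = 3   # items left in the pile
    delivered: int = 0

def env_step(w: World, move: int) -> None:
    """Move, then pick up automatically at the pile or drop at the bin."""
    w.pos = max(0, min(BIN, w.pos + move))
    if w.pos == 0 and not w.carrying and w.remaining > 0:
        w.carrying, w.remaining = True, w.remaining - 1
    elif w.pos == BIN and w.carrying:
        w.carrying, w.delivered = False, w.delivered + 1

# Each mode: (action, condition for leaving the mode, next mode).
POLICY: Dict[str, Tuple[int, Callable[[World], bool], str]] = {
    "fetch":   (-1, lambda w: w.carrying,     "deliver"),
    "deliver": (+1, lambda w: not w.carrying, "fetch"),
}

def run(policy: Dict[str, Tuple[int, Callable[[World], bool], str]], w: World,
        mode: str = "fetch", max_steps: int = 10_000) -> World:
    for _ in range(max_steps):
        if w.remaining == 0 and not w.carrying:
            break
        move, switch, next_mode = policy[mode]
        env_step(w, move)
        if switch(w):
            mode = next_mode
    return w
```

The same two-mode policy delivers 3 items or 50 items without modification, which is the kind of inductive generalization the abstract describes.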
Fast Task Adaptation for Few-Shot Learning
Title | Fast Task Adaptation for Few-Shot Learning |
Authors | Anonymous |
Abstract | Few-shot classification is a challenging task due to the scarcity of training examples for each class. The key lies in generalizing prior knowledge learned from large-scale base classes and in fast adaptation of the classifier to novel classes. In this paper, we introduce a two-stage framework. In the first stage, we attempt to learn task-agnostic features on base data with a novel Metric-Softmax loss. The Metric-Softmax loss is trained against the whole label set and learns more discriminative features than episodic training. Besides, the Metric-Softmax classifier can be applied to base and novel classes in a consistent manner, which is critical for the generalizability of the learned features. In the second stage, we design a task-adaptive transformation that adapts the classifier to each few-shot setting very quickly, within a few tuning epochs. Compared with existing fine-tuning schemes, it exploits the scarce examples of novel classes more effectively. Experiments show that our approach outperforms the current state of the art by a large margin on the commonly used mini-ImageNet and CUB-200-2011 benchmarks. |
Tasks | Few-Shot Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=ByxhOyHYwH |
PDF | https://openreview.net/pdf?id=ByxhOyHYwH |
PWC | https://paperswithcode.com/paper/fast-task-adaptation-for-few-shot-learning |
Repo | |
Framework | |
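The abstract does not define the Metric-Softmax loss. Purely as an assumption, the sketch uses a scaled cosine-similarity softmax, a common metric-based classifier that scores base and novel classes the same way, with novel-class weights set to the mean of L2-normalized support features. The `scale` value and the function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]

def normalize(v: Vector) -> Vector:
    n = math.sqrt(sum(x * x for x in v))  # assumes a non-zero vector
    return [x / n for x in v]

def cosine_softmax(feature: Vector, class_weights: List[Vector], scale: float = 10.0) -> List[float]:
    """Class probabilities from scaled cosine similarities between a feature and class weights."""
    f = normalize(feature)
    logits = [scale * sum(a * b for a, b in zip(f, normalize(w))) for w in class_weights]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def novel_class_weight(support_features: List[Vector]) -> Vector:
    """Weight for a novel class: mean of the L2-normalized support features."""
    normed = [normalize(f) for f in support_features]
    return [sum(col) / len(normed) for col in zip(*normed)]
```

Because every class is scored by the same cosine similarity, a novel-class weight built from a few support examples can be appended to the base-class weights without retraining the feature extractor.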