April 1, 2020

2965 words 14 mins read

Paper Group NANR 6

RefNet: Automatic Essay Scoring by Pairwise Comparison. Defensive Tensorization: Randomized Tensor Parametrization for Robust Neural Networks. Unified Probabilistic Deep Continual Learning through Generative Replay and Open Set Recognition. Meta-Dataset: A Dataset of Datasets for Learning to Learn from Few Examples. QXplore: Q-Learning Exploration …

RefNet: Automatic Essay Scoring by Pairwise Comparison

Title RefNet: Automatic Essay Scoring by Pairwise Comparison
Authors Anonymous
Abstract Automatic Essay Scoring (AES) has been an active research area, as it can greatly reduce the workload of teachers and prevent subjectivity bias. Most recent AES solutions apply deep neural network (DNN)-based models with regression, where a neural encoder learns an essay representation that helps differentiate among essays and a regressor infers the corresponding essay score. Such a DNN approach usually requires many expert-rated essays as training data in order to learn a good essay representation for accurate scoring. However, such data is usually expensive and thus sparse. Inspired by the observation that humans usually score an essay by comparing it with some references, we propose a Siamese framework called Referee Network (RefNet) which allows the model to compare the quality of two essays by capturing the relative features that differentiate the essay pair. The proposed framework can be applied as an extension to regression models, as it captures additional relative features on top of internal information. Moreover, it intrinsically augments the data by pairing and is therefore well suited to handling data sparsity. Experiments show that our framework can significantly improve existing regression models and achieve acceptable performance even when the training data is greatly reduced.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=SyePUgBtPr
PDF https://openreview.net/pdf?id=SyePUgBtPr
PWC https://paperswithcode.com/paper/refnet-automatic-essay-scoring-by-pairwise
Repo
Framework
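
The abstract does not give implementation details, but the core pairwise-comparison idea can be sketched with a shared encoder and a head that predicts which of two essays is better. This is a minimal illustrative sketch, assuming a generic GRU encoder and a binary "A is better than B" objective; none of the class or variable names below come from the paper.

```python
import torch
import torch.nn as nn

class PairwiseComparator(nn.Module):
    """Siamese-style sketch: one shared encoder, one head that scores
    whether essay A is better than essay B (assumed design, not the paper's)."""
    def __init__(self, vocab_size=10000, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1))  # logit for P(A better than B)

    def encode(self, tokens):
        _, h = self.encoder(self.embed(tokens))   # h: (1, batch, hidden)
        return h.squeeze(0)

    def forward(self, essay_a, essay_b):
        za, zb = self.encode(essay_a), self.encode(essay_b)
        return self.head(torch.cat([za, zb], dim=-1)).squeeze(-1)

# Pairing rated essays augments the training signal quadratically: every pair
# (i, j) with different human scores yields one comparison example.
model = PairwiseComparator()
a = torch.randint(0, 10000, (4, 50))     # batch of 4 essays, 50 tokens each
b = torch.randint(0, 10000, (4, 50))
labels = torch.tensor([1., 0., 1., 0.])  # 1 if essay A was rated higher
loss = nn.BCEWithLogitsLoss()(model(a, b), labels)
loss.backward()
```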

Defensive Tensorization: Randomized Tensor Parametrization for Robust Neural Networks

Title Defensive Tensorization: Randomized Tensor Parametrization for Robust Neural Networks
Authors Anonymous
Abstract As deep neural networks become widely adopted for solving most problems in computer vision and audio understanding, there are rising concerns about their potential vulnerability. In particular, they are very sensitive to adversarial attacks, which manipulate the input to alter models’ predictions. Despite large bodies of work to address this issue, the problem remains open. In this paper, we propose defensive tensorization, a novel adversarial defense technique that leverages a latent higher-order factorization of the network. Randomization is applied in the latent subspace, resulting in dense reconstructed weights without the sparsity or perturbations typically induced by randomization. Our approach can be easily integrated with any arbitrary neural architecture and combined with techniques like adversarial training. We empirically demonstrate the effectiveness of our approach on standard image classification benchmarks. We further validate the generalizability of our approach across domains and low-precision architectures by considering an audio classification task and binary networks. In all cases, we demonstrate superior performance compared to prior works in the target scenario.
Tasks Adversarial Defense, Audio Classification, Image Classification
Published 2020-01-01
URL https://openreview.net/forum?id=r1gEXgBYDH
PDF https://openreview.net/pdf?id=r1gEXgBYDH
PWC https://paperswithcode.com/paper/defensive-tensorization-randomized-tensor
Repo
Framework
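
The abstract describes randomizing a latent factorization of the weights rather than the weights themselves. The following is a rough sketch of that idea for a single linear layer, assuming a plain low-rank (two-factor) parametrization with dropout as the randomization; the paper uses higher-order tensor factorizations, so this simplification is mine.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentlyRandomizedLinear(nn.Module):
    """Weights are stored as low-rank factors U @ V; dropout acts on the latent
    factors, so the reconstructed weight stays dense (illustrative only)."""
    def __init__(self, in_features, out_features, rank=16, p_drop=0.2):
        super().__init__()
        self.U = nn.Parameter(torch.randn(out_features, rank) * 0.05)
        self.V = nn.Parameter(torch.randn(rank, in_features) * 0.05)
        self.bias = nn.Parameter(torch.zeros(out_features))
        self.p_drop = p_drop

    def forward(self, x):
        # Randomize in the latent subspace, then reconstruct a dense weight.
        U = F.dropout(self.U, self.p_drop, training=self.training)
        V = F.dropout(self.V, self.p_drop, training=self.training)
        W = U @ V
        return F.linear(x, W, self.bias)

layer = LatentlyRandomizedLinear(128, 64)
out = layer(torch.randn(8, 128))   # shape (8, 64)
```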

Unified Probabilistic Deep Continual Learning through Generative Replay and Open Set Recognition

Title Unified Probabilistic Deep Continual Learning through Generative Replay and Open Set Recognition
Authors Anonymous
Abstract We introduce a unified probabilistic approach for deep continual learning based on variational Bayesian inference with open set recognition. Our model combines a joint probabilistic encoder with a generative model and a linear classifier that are shared across tasks. The open set recognition bounds the approximate posterior by fitting regions of high density on the basis of correctly classified data points and balances open set detection with recognition errors. Catastrophic forgetting is significantly alleviated through generative replay, where the open set recognition is used to sample from high-density areas of the class-specific posterior and reject statistical outliers. Our approach naturally allows for forward and backward transfer while maintaining past knowledge without the necessity of storing old data, regularization or inferring task labels. We demonstrate compelling results in the challenging scenario of incrementally expanding the single-head classifier for both class incremental visual and audio classification tasks, as well as incremental learning of datasets across modalities.
Tasks Audio Classification, Bayesian Inference, Continual Learning, Open Set Learning
Published 2020-01-01
URL https://openreview.net/forum?id=rJlDoT4twr
PDF https://openreview.net/pdf?id=rJlDoT4twr
PWC https://paperswithcode.com/paper/unified-probabilistic-deep-continual-learning-1
Repo
Framework
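
The replay-with-rejection idea can be illustrated schematically: draw samples from a per-class latent density and keep only those that fall in a high-density region. This sketch uses a diagonal Gaussian fitted on latent codes of correctly classified examples as a stand-in for the paper's approximate posterior, and it only illustrates the rejection step, not the full generative model; all names are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for latent codes of correctly classified examples of one class.
z_correct = rng.normal(loc=2.0, scale=0.5, size=(500, 8))
mu, sigma = z_correct.mean(axis=0), z_correct.std(axis=0)

def log_density(z):
    """Diagonal-Gaussian log density, a proxy for the class-specific posterior."""
    return -0.5 * np.sum(((z - mu) / sigma) ** 2 + np.log(2 * np.pi * sigma ** 2), axis=1)

# Threshold taken from a low percentile of densities seen on real data,
# so statistical outliers are rejected during generative replay.
threshold = np.percentile(log_density(z_correct), 5)

candidates = rng.normal(loc=2.0, scale=1.5, size=(1000, 8))  # generator samples
replayed = candidates[log_density(candidates) >= threshold]
print(f"kept {len(replayed)} of {len(candidates)} generated samples for replay")
```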

Meta-Dataset: A Dataset of Datasets for Learning to Learn from Few Examples

Title Meta-Dataset: A Dataset of Datasets for Learning to Learn from Few Examples
Authors Anonymous
Abstract Few-shot classification refers to learning a classifier for new classes given only a few examples. While a plethora of models have emerged to tackle it, we find the procedure and datasets that are used to assess their progress lacking. To address this limitation, we propose Meta-Dataset: a new benchmark for training and evaluating models that is large-scale, consists of diverse datasets, and presents more realistic tasks. We experiment with popular baselines and meta-learners on Meta-Dataset, along with a competitive method that we propose. We analyze performance as a function of various characteristics of test tasks and examine the models’ ability to leverage diverse training sources for improving their generalization. We also propose a new set of baselines for quantifying the benefit of meta-learning in Meta-Dataset. Our extensive experimentation has uncovered important research challenges and we hope to inspire work in these directions.
Tasks Meta-Learning
Published 2020-01-01
URL https://openreview.net/forum?id=rkgAGAVKPr
PDF https://openreview.net/pdf?id=rkgAGAVKPr
PWC https://paperswithcode.com/paper/meta-dataset-a-dataset-of-datasets-for-1
Repo
Framework

QXplore: Q-Learning Exploration by Maximizing Temporal Difference Error

Title QXplore: Q-Learning Exploration by Maximizing Temporal Difference Error
Authors Anonymous
Abstract A major challenge in reinforcement learning is exploration, especially when reward landscapes are sparse. Several recent methods provide an intrinsic motivation to explore by directly encouraging agents to seek novel states. A potential disadvantage of pure state novelty-seeking behavior is that unknown states are treated equally regardless of their potential for future reward. In this paper, we propose an exploration objective using the temporal difference error experienced on extrinsic rewards as a secondary reward signal for exploration in deep reinforcement learning. Our objective yields novelty-seeking in the absence of extrinsic reward, while accelerating exploration of reward-relevant states in sparse (but nonzero) reward landscapes. This objective draws inspiration from dopaminergic pathways in the brain that influence animal behavior. We implement the objective with an adversarial Q-learning method in which Q and Qx are the action-value functions for extrinsic and secondary rewards, respectively. Secondary reward is given by the absolute value of the TD-error of Q. Training is off-policy, based on a replay buffer containing a mix of trajectories sampled using Q and Qx. We characterize performance on a set of continuous control benchmark tasks, and demonstrate comparable or faster convergence on all tasks when compared with other state-of-the-art exploration methods.
Tasks Continuous Control, Q-Learning
Published 2020-01-01
URL https://openreview.net/forum?id=rkxKwJrKPS
PDF https://openreview.net/pdf?id=rkxKwJrKPS
PWC https://paperswithcode.com/paper/qxplore-q-learning-exploration-by-maximizing-1
Repo
Framework
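
A tabular sketch of the two-critic idea described above: Q learns from the extrinsic reward, Qx learns from a secondary reward equal to the absolute TD error of Q, and data collection alternates between the two. The toy environment, hyperparameters, and alternation scheme below are placeholder choices, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma, lr = 10, 2, 0.99, 0.1
Q = np.zeros((n_states, n_actions))    # extrinsic action-value function
Qx = np.zeros((n_states, n_actions))   # exploration action-value function

def step(s, a):
    """Toy sparse-reward chain: reward only when reaching the last state."""
    s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    return s2, float(s2 == n_states - 1)

s = 0
for t in range(5000):
    behaviour = Q if (t // 100) % 2 == 0 else Qx   # alternate data collection
    a = rng.integers(n_actions) if rng.random() < 0.1 else int(np.argmax(behaviour[s]))
    s2, r = step(s, a)

    td_error = r + gamma * Q[s2].max() - Q[s, a]
    Q[s, a] += lr * td_error

    # Secondary reward for exploration: magnitude of the extrinsic TD error.
    rx = abs(td_error)
    Qx[s, a] += lr * (rx + gamma * Qx[s2].max() - Qx[s, a])
    s = 0 if r > 0 else s2   # reset on success
```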

Regulatory Focus: Promotion and Prevention Inclinations in Policy Search

Title Regulatory Focus: Promotion and Prevention Inclinations in Policy Search
Authors Anonymous
Abstract The estimation of advantage is crucial for a number of reinforcement learning algorithms, as it directly influences the choices of future paths. In this work, we propose a family of estimates based on the order statistics over the path ensemble, which allows one to flexibly drive the learning process with a promotion focus or a prevention focus. On top of this formulation, we systematically study the impacts of different regulatory focuses. Our findings reveal that regulatory focus, when chosen appropriately, can result in significant benefits. In particular, for environments with sparse rewards, a promotion focus leads to more efficient exploration of the policy space, while for those where individual actions can have critical impacts, a prevention focus is preferable. On various benchmarks, including MuJoCo continuous control, Terrain locomotion, Atari games, and sparse-reward environments, the proposed schemes consistently demonstrate improvement over mainstream methods, not only accelerating the learning process but also obtaining substantial performance gains.
Tasks Atari Games, Continuous Control, Efficient Exploration
Published 2020-01-01
URL https://openreview.net/forum?id=SJefPkSFPr
PDF https://openreview.net/pdf?id=SJefPkSFPr
PWC https://paperswithcode.com/paper/regulatory-focus-promotion-and-prevention
Repo
Framework
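
The abstract describes advantage estimates built from order statistics over an ensemble of sampled paths. One simple reading is to baseline each path's return against a chosen quantile of the ensemble's returns; the specific quantile choices below (low for promotion, high for prevention) are my illustrative interpretation, not the paper's exact formulation.

```python
import numpy as np

def order_statistic_advantage(returns, focus="promotion", q_promo=0.25, q_prev=0.75):
    """Baseline a path ensemble's returns on an order statistic.

    A low-quantile baseline (promotion focus) makes most paths look advantageous,
    encouraging exploration; a high-quantile baseline (prevention focus) rewards
    only paths that beat the bulk of the ensemble.
    """
    returns = np.asarray(returns, dtype=float)
    q = q_promo if focus == "promotion" else q_prev
    baseline = np.quantile(returns, q)
    return returns - baseline

ensemble_returns = [1.0, 3.5, 0.2, 2.8, 4.1, 1.7]
print(order_statistic_advantage(ensemble_returns, focus="promotion"))
print(order_statistic_advantage(ensemble_returns, focus="prevention"))
```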

SEERL : Sample Efficient Ensemble Reinforcement Learning

Title SEERL : Sample Efficient Ensemble Reinforcement Learning
Authors Anonymous
Abstract Ensemble learning is a very prevalent method employed in machine learning. The relative success of ensemble methods is attributed to their ability to tackle a wide range of instances and complex problems that require different low-level approaches. However, ensemble methods are relatively less popular in reinforcement learning owing to the high sample complexity and computational expense involved. We present a new training and evaluation framework for model-free algorithms that uses ensembles of policies obtained from a single training instance. These policies are diverse in nature and are learned through directed perturbation of the model parameters at regular intervals. We show that learning an adequately diverse set of policies is required for a good ensemble, while extreme diversity can prove detrimental to overall performance. We evaluate our approach on challenging discrete and continuous control tasks and also discuss various ensembling strategies. Our framework is substantially sample-efficient, computationally inexpensive and is seen to outperform various baseline methods, including other ensemble approaches.
Tasks Continuous Control
Published 2020-01-01
URL https://openreview.net/forum?id=HkgM81SYDr
PDF https://openreview.net/pdf?id=HkgM81SYDr
PWC https://paperswithcode.com/paper/seerl-sample-efficient-ensemble-reinforcement
Repo
Framework
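
A minimal sketch of the "ensemble from one training run" idea: snapshot the policy at regular intervals, apply a directed perturbation before continuing training, and ensemble the snapshots at evaluation time. The perturbation (scaled Gaussian noise) and the averaging rule below are placeholder choices, not the paper's.

```python
import copy
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(4, 32), nn.Tanh(), nn.Linear(32, 2))
snapshots, snapshot_every = [], 1000

def training_step(model):
    """Stand-in for one gradient step of any model-free RL algorithm."""
    pass

for step in range(5000):
    training_step(policy)
    if (step + 1) % snapshot_every == 0:
        snapshots.append(copy.deepcopy(policy))      # save the current policy
        with torch.no_grad():                        # directed perturbation:
            for p in policy.parameters():            # nudge parameters so the
                p.add_(0.05 * torch.randn_like(p))   # next segment diverges

def ensemble_action(state):
    """Average the snapshot policies' action logits (illustrative rule)."""
    with torch.no_grad():
        logits = torch.stack([m(state) for m in snapshots]).mean(dim=0)
    return int(torch.argmax(logits))

print(ensemble_action(torch.randn(4)))
```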

Continuous Control with Contexts, Provably

Title Continuous Control with Contexts, Provably
Authors Anonymous
Abstract A fundamental challenge in artificial intelligence is to build an agent that generalizes and adapts to unseen environments. A common strategy is to build a decoder that takes a context of the unseen new environment and generates a policy. The current paper studies how to build a decoder for the fundamental continuous control environment, the linear quadratic regulator (LQR), which can model a wide range of real-world physical environments. We present a simple algorithm for this problem, which uses an upper confidence bound (UCB) to refine the estimate of the decoder and balance the exploration-exploitation trade-off. Theoretically, our algorithm enjoys a $\widetilde{O}\left(\sqrt{T}\right)$ regret bound in the online setting, where $T$ is the number of environments the agent has played. This also implies that after playing $\widetilde{O}\left(1/\epsilon^2\right)$ environments, the agent is able to transfer the learned knowledge to obtain an $\epsilon$-suboptimal policy for an unseen environment. To our knowledge, this is the first provably efficient algorithm for building a decoder in the continuous control setting. While our main focus is theoretical, we also present experiments that demonstrate the effectiveness of our algorithm.
Tasks Continuous Control
Published 2020-01-01
URL https://openreview.net/forum?id=Skg5r1BFvB
PDF https://openreview.net/pdf?id=Skg5r1BFvB
PWC https://paperswithcode.com/paper/continuous-control-with-contexts-provably
Repo
Framework

Multi-step Greedy Policies in Model-Free Deep Reinforcement Learning

Title Multi-step Greedy Policies in Model-Free Deep Reinforcement Learning
Authors Anonymous
Abstract Multi-step greedy policies have been extensively used in model-based Reinforcement Learning (RL) and in the case when a model of the environment is available (e.g., in the game of Go). In this work, we explore the benefits of multi-step greedy policies in model-free RL when employed in the framework of multi-step Dynamic Programming (DP): multi-step Policy and Value Iteration. These algorithms iteratively solve short-horizon decision problems and converge to the optimal solution of the original one. By using model-free algorithms as solvers of the short-horizon problems, we derive fully model-free algorithms which are instances of the multi-step DP framework. As model-free algorithms are prone to instabilities w.r.t. the decision problem horizon, this simple approach can help mitigate these instabilities and results in improved model-free algorithms. We test this approach and show results on both discrete and continuous control problems.
Tasks Continuous Control, Game of Go
Published 2020-01-01
URL https://openreview.net/forum?id=r1l7E1HFPH
PDF https://openreview.net/pdf?id=r1l7E1HFPH
PWC https://paperswithcode.com/paper/multi-step-greedy-policies-in-model-free-deep
Repo
Framework

Towards Simplicity in Deep Reinforcement Learning: Streamlined Off-Policy Learning

Title Towards Simplicity in Deep Reinforcement Learning: Streamlined Off-Policy Learning
Authors Anonymous
Abstract The field of Deep Reinforcement Learning (DRL) has recently seen a surge in the popularity of maximum entropy reinforcement learning algorithms. Their popularity stems from the intuitive interpretation of the maximum entropy objective and their superior sample efficiency on standard benchmarks. In this paper, we seek to understand the primary contribution of the entropy term to the performance of maximum entropy algorithms. For the MuJoCo benchmark, we demonstrate that the entropy term in Soft Actor Critic (SAC) principally addresses the bounded nature of the action spaces. With this insight, we propose a simple normalization scheme which allows a streamlined algorithm without entropy maximization to match the performance of SAC. Our experimental results demonstrate a need to revisit the benefits of entropy regularization in DRL. We also propose a simple non-uniform sampling method for selecting transitions from the replay buffer during training. We further show that the streamlined algorithm with the simple non-uniform sampling scheme outperforms SAC and achieves state-of-the-art performance on challenging continuous control tasks.
Tasks Continuous Control
Published 2020-01-01
URL https://openreview.net/forum?id=SJl47yBYPS
PDF https://openreview.net/pdf?id=SJl47yBYPS
PWC https://paperswithcode.com/paper/towards-simplicity-in-deep-reinforcement
Repo
Framework
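
The abstract mentions a simple non-uniform sampling method for the replay buffer. One common way to make replay sampling non-uniform is to bias it toward recently collected transitions; the geometric weighting below is an illustrative choice under that assumption, not necessarily the paper's scheme.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_recent_biased(buffer_size, batch_size, eta=0.996):
    """Sample buffer indices with geometrically increasing weight on recent entries.

    Index buffer_size - 1 is the newest transition; eta close to 1 approaches
    uniform sampling, smaller eta concentrates on recent experience.
    """
    ages = np.arange(buffer_size)[::-1]      # newest slot gets age 0
    weights = eta ** ages
    probs = weights / weights.sum()
    return rng.choice(buffer_size, size=batch_size, p=probs)

idx = sample_recent_biased(buffer_size=100_000, batch_size=256)
print(idx[:10])
```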

Attraction-Repulsion Actor-Critic for Continuous Control Reinforcement Learning

Title Attraction-Repulsion Actor-Critic for Continuous Control Reinforcement Learning
Authors Anonymous
Abstract Continuous control tasks in reinforcement learning are important because they provide an important framework for learning in high-dimensional state spaces with deceptive rewards, where the agent can easily become trapped in suboptimal solutions. One way to avoid local optima is to use a population of agents to ensure coverage of the policy space, yet learning a population with the “best” coverage is still an open problem. In this work, we present a novel approach to population-based RL in continuous control that leverages properties of normalizing flows to perform attractive and repulsive operations between current members of the population and previously observed policies. Empirical results on the MuJoCo suite demonstrate a high performance gain for our algorithm compared to prior work, including Soft Actor-Critic (SAC).
Tasks Continuous Control
Published 2020-01-01
URL https://openreview.net/forum?id=BJlAzTEKwS
PDF https://openreview.net/pdf?id=BJlAzTEKwS
PWC https://paperswithcode.com/paper/attraction-repulsion-actor-critic-for-1
Repo
Framework
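
The attraction/repulsion idea can be illustrated with a simple auxiliary loss that pushes the current policy's action distribution away from (or toward) archived policies on a batch of states. This sketch uses Gaussian policies and a KL term; the paper's actual operators act on normalizing-flow policies, so treat this purely as a cartoon of the objective, with all names my own.

```python
import torch
import torch.distributions as D

def attraction_repulsion_term(mu, log_std, archive, repel=True):
    """KL between the current Gaussian policy and each archived policy on a
    batch of states; negated when repelling so minimizing the loss increases it."""
    current = D.Normal(mu, log_std.exp())
    kls = []
    for arch_mu, arch_log_std in archive:
        other = D.Normal(arch_mu, arch_log_std.exp())
        kls.append(D.kl_divergence(current, other).sum(dim=-1).mean())
    term = torch.stack(kls).mean()
    return -term if repel else term

# Batch of 32 states, 6-dimensional actions; archive holds two old policies' outputs.
mu = torch.zeros(32, 6, requires_grad=True)
log_std = torch.zeros(32, 6, requires_grad=True)
archive = [(torch.randn(32, 6), torch.zeros(32, 6)) for _ in range(2)]

aux_loss = attraction_repulsion_term(mu, log_std, archive, repel=True)
total_loss = aux_loss  # in practice: actor_loss + beta * aux_loss
total_loss.backward()
```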

Advantage Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning

Title Advantage Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning
Authors Anonymous
Abstract In this paper, we aim to develop a simple and scalable reinforcement learning algorithm that uses standard supervised learning methods as subroutines. Our goal is an algorithm that utilizes only simple and convergent maximum likelihood loss functions, while also being able to leverage off-policy data. Our proposed approach, which we refer to as advantage-weighted regression (AWR), consists of two standard supervised learning steps: one to regress onto target values for a value function, and another to regress onto weighted target actions for the policy. The method is simple and general, can accommodate continuous and discrete actions, and can be implemented in just a few lines of code on top of standard supervised learning methods. We provide a theoretical motivation for AWR and analyze its properties when incorporating off-policy data from experience replay. We evaluate AWR on a suite of standard OpenAI Gym benchmark tasks, and show that it achieves competitive performance compared to a number of well-established state-of-the-art RL algorithms. AWR is also able to acquire more effective policies than most off-policy algorithms when learning from purely static datasets with no additional environmental interactions. Furthermore, we demonstrate our algorithm on challenging continuous control tasks with highly complex simulated characters.
Tasks Continuous Control
Published 2020-01-01
URL https://openreview.net/forum?id=H1gdF34FvS
PDF https://openreview.net/pdf?id=H1gdF34FvS
PWC https://paperswithcode.com/paper/advantage-weighted-regression-simple-and-1
Repo
Framework
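
The abstract states the two supervised steps directly, so they can be sketched compactly: regress a value function onto return targets, then regress the policy onto replayed actions weighted by exponentiated advantages. Network sizes, the temperature beta, and the weight clipping below are illustrative defaults, not the paper's exact settings.

```python
import torch
import torch.nn as nn

obs_dim, act_dim, beta, max_weight = 8, 2, 1.0, 20.0
value_fn = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, 1))
policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, act_dim))
v_opt = torch.optim.Adam(value_fn.parameters(), lr=1e-3)
pi_opt = torch.optim.Adam(policy.parameters(), lr=1e-4)

# Replay-buffer batch: states, the actions actually taken, and return targets.
states = torch.randn(256, obs_dim)
actions = torch.randn(256, act_dim)
returns = torch.randn(256)

# Step 1: value regression onto the return targets.
v_loss = ((value_fn(states).squeeze(-1) - returns) ** 2).mean()
v_opt.zero_grad(); v_loss.backward(); v_opt.step()

# Step 2: advantage-weighted regression of the policy onto the replayed actions.
with torch.no_grad():
    advantages = returns - value_fn(states).squeeze(-1)
    weights = torch.clamp(torch.exp(advantages / beta), max=max_weight)
pi_loss = (weights * ((policy(states) - actions) ** 2).mean(dim=-1)).mean()
pi_opt.zero_grad(); pi_loss.backward(); pi_opt.step()
```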

CNAS: Channel-Level Neural Architecture Search

Title CNAS: Channel-Level Neural Architecture Search
Authors Anonymous
Abstract There is growing interest in automating the design of good neural network architectures. Recently proposed NAS methods have significantly reduced the architecture search cost by sharing parameters, but designing the search space remains a challenging problem. We observe that a search space is typically defined by its shape and a set of operations, and propose a channel-level architecture search (CNAS) method that uses only a fixed type of operation. The resulting architecture is sparse in terms of channels and has a different topology at each cell. Experimental results on CIFAR-10 and ImageNet show that a fine-granular and sparse model searched by CNAS achieves very competitive performance compared with dense models searched by existing methods.
Tasks Neural Architecture Search
Published 2020-01-01
URL https://openreview.net/forum?id=rklfIeSFwS
PDF https://openreview.net/pdf?id=rklfIeSFwS
PWC https://paperswithcode.com/paper/cnas-channel-level-neural-architecture-search
Repo
Framework

On Weight-Sharing and Bilevel Optimization in Architecture Search

Title On Weight-Sharing and Bilevel Optimization in Architecture Search
Authors Anonymous
Abstract Weight-sharing—the simultaneous optimization of multiple neural networks using the same parameters—has emerged as a key component of state-of-the-art neural architecture search. However, its success is poorly understood and often found to be surprising. We argue that, rather than just being an optimization trick, the weight-sharing approach is induced by the relaxation of a structured hypothesis space, and introduces new algorithmic and theoretical challenges as well as applications beyond neural architecture search. Algorithmically, we show how the geometry of ERM for weight-sharing requires greater care when designing gradient-based minimization methods and apply tools from non-convex non-Euclidean optimization to give general-purpose algorithms that adapt to the underlying structure. We further analyze the learning-theoretic behavior of the bilevel optimization solved by practical weight-sharing methods. Next, using kernel configuration and NLP feature selection as case studies, we demonstrate how weight-sharing applies to the architecture search generalization of NAS and effectively optimizes the resulting bilevel objective. Finally, we use our optimization analysis to develop a simple exponentiated gradient method for NAS that aligns with the underlying optimization geometry and matches state-of-the-art approaches on CIFAR-10.
Tasks bilevel optimization, Feature Selection, Neural Architecture Search
Published 2020-01-01
URL https://openreview.net/forum?id=HJgRCyHFDr
PDF https://openreview.net/pdf?id=HJgRCyHFDr
PWC https://paperswithcode.com/paper/on-weight-sharing-and-bilevel-optimization-in
Repo
Framework
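
The abstract's "simple exponentiated gradient method" can be illustrated for architecture mixture weights constrained to the probability simplex: multiply each weight by the exponentiated negative gradient and renormalize. The toy loss and dimensions here are placeholders; only the update rule is the point.

```python
import numpy as np

n_ops, lr = 5, 0.5
theta = np.full(n_ops, 1.0 / n_ops)           # mixture weights over candidate ops
target = np.array([0.6, 0.1, 0.1, 0.1, 0.1])  # toy "ideal" mixture

for step in range(200):
    grad = 2 * (theta - target)             # gradient of a toy quadratic loss
    theta = theta * np.exp(-lr * grad)      # exponentiated-gradient step ...
    theta = theta / theta.sum()             # ... keeps iterates on the simplex

print(np.round(theta, 3))
```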

Searching for Stage-wise Neural Graphs In the Limit

Title Searching for Stage-wise Neural Graphs In the Limit
Authors Anonymous
Abstract Search space is a key consideration for neural architecture search. Recently, Xie et al. (2019a) found that randomly generated networks from the same distribution perform similarly, which suggests we should search for random graph distributions instead of graphs. We propose the graphon as a new search space. A graphon is the limit of a Cauchy sequence of graphs and a scale-free probabilistic distribution from which graphs with different numbers of vertices can be drawn. This property enables us to perform NAS using fast, low-capacity models and scale the found models up when necessary. We develop an algorithm for NAS in the space of graphons and empirically demonstrate that it can find stage-wise graphs that outperform DenseNet and other baselines on ImageNet.
Tasks Neural Architecture Search
Published 2020-01-01
URL https://openreview.net/forum?id=SkxWnkStvS
PDF https://openreview.net/pdf?id=SkxWnkStvS
PWC https://paperswithcode.com/paper/searching-for-stage-wise-neural-graphs-in-the
Repo
Framework
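
A graphon is a symmetric function W: [0,1]² → [0,1] from which graphs of any size can be sampled: draw a uniform latent value per vertex and connect each pair independently with probability W(u_i, u_j). The particular W below is a toy choice of mine; the point is that the same distribution yields small graphs for cheap search and larger graphs when scaling up.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_graphon(u, v):
    """A symmetric toy graphon: denser connections between low-index vertices."""
    return np.exp(-3.0 * (u + v))

def sample_graph(n, graphon):
    """Draw an n-vertex random graph from the given graphon."""
    u = np.sort(rng.uniform(size=n))            # latent vertex positions
    probs = graphon(u[:, None], u[None, :])     # pairwise edge probabilities
    adj = (rng.uniform(size=(n, n)) < probs).astype(int)
    adj = np.triu(adj, k=1)                     # keep the upper triangle ...
    return adj + adj.T                          # ... and symmetrize

small, large = sample_graph(8, toy_graphon), sample_graph(64, toy_graphon)
print(small.sum() // 2, "edges in the 8-vertex graph")
print(large.sum() // 2, "edges in the 64-vertex graph")
```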