Paper Group NANR 6
RefNet: Automatic Essay Scoring by Pairwise Comparison. Defensive Tensorization: Randomized Tensor Parametrization for Robust Neural Networks. Unified Probabilistic Deep Continual Learning through Generative Replay and Open Set Recognition. Meta-Dataset: A Dataset of Datasets for Learning to Learn from Few Examples. QXplore: Q-Learning Exploration …
RefNet: Automatic Essay Scoring by Pairwise Comparison
Title | RefNet: Automatic Essay Scoring by Pairwise Comparison |
Authors | Anonymous |
Abstract | Automatic Essay Scoring (AES) has been an active research area as it can greatly reduce the workload of teachers and prevent subjectivity bias. Most recent AES solutions apply deep neural network (DNN)-based models with regression, where a neural encoder learns an essay representation that helps differentiate among the essays and the corresponding essay score is inferred by a regressor. Such a DNN approach usually requires many expert-rated essays as training data in order to learn a good essay representation for accurate scoring. However, such data is usually expensive to obtain and thus sparse. Inspired by the observation that humans usually score an essay by comparing it with some references, we propose a Siamese framework called Referee Network (RefNet) which allows the model to compare the quality of two essays by capturing the relative features that differentiate the essay pair. The proposed framework can be applied as an extension to regression models, as it captures additional relative features on top of internal information. Moreover, it intrinsically augments the data by pairing and is thus ideal for handling data sparsity. Experiments show that our framework can significantly improve existing regression models and achieve acceptable performance even when the training data is greatly reduced. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SyePUgBtPr |
https://openreview.net/pdf?id=SyePUgBtPr | |
PWC | https://paperswithcode.com/paper/refnet-automatic-essay-scoring-by-pairwise |
Repo | |
Framework | |
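A minimal PyTorch-style sketch of the pairwise-comparison idea described in the RefNet abstract above. The encoder, feature dimensions, and pairing scheme are assumptions for illustration, not the authors' architecture.

```python
# Hypothetical sketch of a Siamese pairwise essay comparator (not the authors' code).
import torch
import torch.nn as nn

class PairwiseReferee(nn.Module):
    """Predicts which of two essay representations is better (assumed design)."""
    def __init__(self, embed_dim=256, hidden_dim=128):
        super().__init__()
        # Shared (Siamese) encoder applied to both essays.
        self.encoder = nn.Sequential(nn.Linear(embed_dim, hidden_dim), nn.ReLU())
        # Comparator operates on relative features of the pair.
        self.comparator = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim), nn.ReLU(), nn.Linear(hidden_dim, 1)
        )

    def forward(self, essay_a, essay_b):
        ha, hb = self.encoder(essay_a), self.encoder(essay_b)
        # Relative features of the pair; output > 0 means "essay_a is better".
        return self.comparator(torch.cat([ha - hb, ha * hb], dim=-1))

# Pairwise labels (1 if essay_a received the higher human score) train the comparator
# with binary cross-entropy; pairing essays intrinsically augments scarce labeled data.
model = PairwiseReferee()
a, b = torch.randn(4, 256), torch.randn(4, 256)
logits = model(a, b)
loss = nn.functional.binary_cross_entropy_with_logits(logits.squeeze(-1), torch.ones(4))
```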
Defensive Tensorization: Randomized Tensor Parametrization for Robust Neural Networks
Title | Defensive Tensorization: Randomized Tensor Parametrization for Robust Neural Networks |
Authors | Anonymous |
Abstract | As deep neural networks become widely adopted for solving most problems in computer vision and audio understanding, there are rising concerns about their potential vulnerability. In particular, they are very sensitive to adversarial attacks, which manipulate the input to alter models’ predictions. Despite large bodies of work to address this issue, the problem remains open. In this paper, we propose defensive tensorization, a novel adversarial defense technique that leverages a latent high-order factorization of the network. Randomization is applied in the latent subspace, resulting in dense reconstructed weights, without the sparsity or perturbations typically induced by randomization. Our approach can be easily integrated with any arbitrary neural architecture and combined with techniques like adversarial training. We empirically demonstrate the effectiveness of our approach on standard image classification benchmarks. We further validate the generalizability of our approach across domains and low-precision architectures by considering an audio classification task and binary networks. In all cases, we demonstrate superior performance compared to prior works in the target scenario. |
Tasks | Adversarial Defense, Audio Classification, Image Classification |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=r1gEXgBYDH |
https://openreview.net/pdf?id=r1gEXgBYDH | |
PWC | https://paperswithcode.com/paper/defensive-tensorization-randomized-tensor |
Repo | |
Framework | |
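A rough sketch of the general idea in the abstract above: parametrize a layer's weights through a latent factorization and apply randomization (here, dropout) to the latent factors, so the reconstructed weight stays dense. The factorization form, rank, and dropout placement are assumptions, not the paper's exact construction.

```python
# Hypothetical factorized layer with randomized latent factors (illustrative only).
import torch
import torch.nn as nn

class RandomizedFactorizedLinear(nn.Module):
    def __init__(self, in_dim, out_dim, rank=32, p_drop=0.2):
        super().__init__()
        # Latent factors: the weight is reconstructed as U @ V, never stored explicitly.
        self.U = nn.Parameter(torch.randn(out_dim, rank) * 0.02)
        self.V = nn.Parameter(torch.randn(rank, in_dim) * 0.02)
        self.drop = nn.Dropout(p_drop)  # randomization in the latent subspace

    def forward(self, x):
        # Dropping latent components still yields a dense reconstructed weight.
        W = self.drop(self.U) @ self.V
        return x @ W.t()

layer = RandomizedFactorizedLinear(784, 10)
out = layer(torch.randn(8, 784))
```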
Unified Probabilistic Deep Continual Learning through Generative Replay and Open Set Recognition
Title | Unified Probabilistic Deep Continual Learning through Generative Replay and Open Set Recognition |
Authors | Anonymous |
Abstract | We introduce a unified probabilistic approach for deep continual learning based on variational Bayesian inference with open set recognition. Our model combines a joint probabilistic encoder with a generative model and a linear classifier that are shared across tasks. The open set recognition bounds the approximate posterior by fitting regions of high density on the basis of correctly classified data points and balances open set detection with recognition errors. Catastrophic forgetting is significantly alleviated through generative replay, where the open set recognition is used to sample from high-density areas of the class-specific posterior and reject statistical outliers. Our approach naturally allows for forward and backward transfer while maintaining past knowledge without the need to store old data, apply regularization, or infer task labels. We demonstrate compelling results in the challenging scenario of incrementally expanding a single-head classifier for both class-incremental visual and audio classification tasks, as well as incremental learning of datasets across modalities. |
Tasks | Audio Classification, Bayesian Inference, Continual Learning, Open Set Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rJlDoT4twr |
https://openreview.net/pdf?id=rJlDoT4twr | |
PWC | https://paperswithcode.com/paper/unified-probabilistic-deep-continual-learning-1 |
Repo | |
Framework | |
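A very rough sketch of generative replay with open-set-style rejection as described above: sample class-conditional latent codes, reject statistical outliers with respect to the approximate posterior, and decode the rest into replay data. The density criterion, the per-class Gaussian statistics, and the `decoder` callable are placeholders, not the paper's exact formulation.

```python
# Illustrative generative replay with outlier rejection (details assumed).
import torch

def generate_replay(decoder, class_means, class_stds, n_per_class=64, max_z_norm=2.0):
    """Sample class-conditional latents, keep only high-density ones, decode them.

    class_means / class_stds: per-class Gaussian statistics of the approximate
    posterior (assumed to be aggregated from correctly classified data points).
    """
    replay_x, replay_y = [], []
    for label, (mu, std) in enumerate(zip(class_means, class_stds)):
        eps = torch.randn(n_per_class, mu.shape[0])
        z = mu + std * eps
        # Reject statistical outliers: keep samples close to the class mode.
        keep = eps.norm(dim=1) < max_z_norm
        if keep.any():
            replay_x.append(decoder(z[keep]))
            replay_y.append(torch.full((int(keep.sum()),), label))
    return torch.cat(replay_x), torch.cat(replay_y)
```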
Meta-Dataset: A Dataset of Datasets for Learning to Learn from Few Examples
Title | Meta-Dataset: A Dataset of Datasets for Learning to Learn from Few Examples |
Authors | Anonymous |
Abstract | Few-shot classification refers to learning a classifier for new classes given only a few examples. While a plethora of models have emerged to tackle it, we find the procedure and datasets that are used to assess their progress lacking. To address this limitation, we propose Meta-Dataset: a new benchmark for training and evaluating models that is large-scale, consists of diverse datasets, and presents more realistic tasks. We experiment with popular baselines and meta-learners on Meta-Dataset, along with a competitive method that we propose. We analyze performance as a function of various characteristics of test tasks and examine the models’ ability to leverage diverse training sources for improving their generalization. We also propose a new set of baselines for quantifying the benefit of meta-learning in Meta-Dataset. Our extensive experimentation has uncovered important research challenges and we hope to inspire work in these directions. |
Tasks | Meta-Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rkgAGAVKPr |
https://openreview.net/pdf?id=rkgAGAVKPr | |
PWC | https://paperswithcode.com/paper/meta-dataset-a-dataset-of-datasets-for-1 |
Repo | |
Framework | |
QXplore: Q-Learning Exploration by Maximizing Temporal Difference Error
Title | QXplore: Q-Learning Exploration by Maximizing Temporal Difference Error |
Authors | Anonymous |
Abstract | A major challenge in reinforcement learning is exploration, especially when reward landscapes are sparse. Several recent methods provide an intrinsic motivation to explore by directly encouraging agents to seek novel states. A potential disadvantage of pure state novelty-seeking behavior is that unknown states are treated equally regardless of their potential for future reward. In this paper, we propose an exploration objective using the temporal difference error experienced on extrinsic rewards as a secondary reward signal for exploration in deep reinforcement learning. Our objective yields novelty-seeking in the absence of extrinsic reward, while accelerating exploration of reward-relevant states in sparse (but nonzero) reward landscapes. This objective draws inspiration from dopaminergic pathways in the brain that influence animal behavior. We implement the objective with an adversarial Q-learning method in which Q and Qx are the action-value functions for extrinsic and secondary rewards, respectively. Secondary reward is given by the absolute value of the TD-error of Q. Training is off-policy, based on a replay buffer containing a mix of trajectories sampled using Q and Qx. We characterize performance on a set of continuous control benchmark tasks, and demonstrate comparable or faster convergence on all tasks when compared with other state-of-the-art exploration methods. |
Tasks | Continuous Control, Q-Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rkxKwJrKPS |
https://openreview.net/pdf?id=rkxKwJrKPS | |
PWC | https://paperswithcode.com/paper/qxplore-q-learning-exploration-by-maximizing-1 |
Repo | |
Framework | |
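A compact sketch of the secondary-reward construction described in the QXplore abstract above: the absolute TD-error of the extrinsic Q-function is used as the reward for a second, exploration-driving Q-function (Qx). It is written for a discrete-action Q-network for brevity, although the paper evaluates continuous control; network details and the trajectory-mixing ratio are assumptions.

```python
# Illustrative QXplore-style secondary reward (not the authors' implementation).
import torch

def td_error_reward(q_net, target_q_net, batch, gamma=0.99):
    """Secondary reward r_x = |TD-error of the extrinsic Q| on a replay batch."""
    s, a, r, s_next, done = batch  # tensors sampled from the replay buffer
    with torch.no_grad():
        q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
        q_next = target_q_net(s_next).max(dim=1).values  # discrete-action case for brevity
        td_target = r + gamma * (1.0 - done) * q_next
    return (td_target - q_sa).abs()  # Qx is trained with this as its reward signal

# Per the abstract, training is off-policy: the replay buffer mixes trajectories
# collected by acting with respect to Q and with respect to Qx.
```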
Regulatory Focus: Promotion and Prevention Inclinations in Policy Search
Title | Regulatory Focus: Promotion and Prevention Inclinations in Policy Search |
Authors | Anonymous |
Abstract | The estimation of advantage is crucial for a number of reinforcement learning algorithms, as it directly influences the choices of future paths. In this work, we propose a family of estimates based on order statistics over the path ensemble, which allows one to flexibly drive the learning process with a promotion focus or a prevention focus. On top of this formulation, we systematically study the impacts of different regulatory focuses. Our findings reveal that a regulatory focus, when chosen appropriately, can result in significant benefits. In particular, for environments with sparse rewards, a promotion focus leads to more efficient exploration of the policy space, while for those where individual actions can have critical impacts, a prevention focus is preferable. On various benchmarks, including MuJoCo continuous control, Terrain locomotion, Atari games, and sparse-reward environments, the proposed schemes consistently demonstrate improvement over mainstream methods, not only accelerating the learning process but also obtaining substantial performance gains. |
Tasks | Atari Games, Continuous Control, Efficient Exploration |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SJefPkSFPr |
https://openreview.net/pdf?id=SJefPkSFPr | |
PWC | https://paperswithcode.com/paper/regulatory-focus-promotion-and-prevention |
Repo | |
Framework | |
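A hedged sketch of an order-statistic advantage estimate in the spirit of the abstract above: returns from an ensemble of sampled paths are compared against a chosen order statistic of that ensemble, with the statistic picked optimistically for a promotion focus and cautiously for a prevention focus. The specific quantiles and baseline form are assumptions for illustration, not the paper's definition.

```python
# Illustrative order-statistic advantage baseline (assumed formulation).
import numpy as np

def order_statistic_advantage(path_returns, focus="promotion", quantile=0.75):
    """Advantage of each sampled path relative to an order statistic of the ensemble.

    focus="promotion"  -> baseline is a low quantile, emphasizing upside (optimistic).
    focus="prevention" -> baseline is a high quantile, penalizing downside (cautious).
    """
    path_returns = np.asarray(path_returns, dtype=float)
    q = quantile if focus == "prevention" else 1.0 - quantile
    baseline = np.quantile(path_returns, q)
    return path_returns - baseline

advantages = order_statistic_advantage([1.0, 3.5, -0.5, 2.0], focus="promotion")
```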
SEERL : Sample Efficient Ensemble Reinforcement Learning
Title | SEERL : Sample Efficient Ensemble Reinforcement Learning |
Authors | Anonymous |
Abstract | Ensemble learning is a very prevalent method in machine learning. The relative success of ensemble methods is attributed to their ability to tackle a wide range of instances and complex problems that require different low-level approaches. However, ensemble methods are relatively less popular in reinforcement learning owing to the high sample complexity and computational expense involved. We present a new training and evaluation framework for model-free algorithms that uses ensembles of policies obtained from a single training instance. These policies are diverse in nature and are learned through directed perturbation of the model parameters at regular intervals. We show that learning an adequately diverse set of policies is required for a good ensemble, while extreme diversity can prove detrimental to overall performance. We evaluate our approach on challenging discrete and continuous control tasks and also discuss various ensembling strategies. Our framework is substantially more sample efficient, computationally inexpensive, and is seen to outperform various baseline methods, including other ensemble approaches. |
Tasks | Continuous Control |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HkgM81SYDr |
https://openreview.net/pdf?id=HkgM81SYDr | |
PWC | https://paperswithcode.com/paper/seerl-sample-efficient-ensemble-reinforcement |
Repo | |
Framework | |
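A small sketch of the single-training-run ensembling idea described above: snapshot the policy at regular intervals, apply a directed parameter perturbation (abstracted here as a user-supplied callable), and combine the snapshots' actions at evaluation time. The perturbation and combination rules are assumptions, not the paper's exact choices.

```python
# Illustrative policy-snapshot ensemble (details assumed, not from the paper).
import copy
import numpy as np

class SnapshotEnsemble:
    def __init__(self, snapshot_every=10_000):
        self.snapshot_every = snapshot_every
        self.policies = []

    def maybe_snapshot(self, policy, step, perturb):
        if step % self.snapshot_every == 0:
            self.policies.append(copy.deepcopy(policy))
            perturb(policy)  # directed perturbation to encourage policy diversity

    def act(self, obs):
        # Continuous-control combination rule (assumed): average the members' actions.
        actions = np.stack([p(obs) for p in self.policies])
        return actions.mean(axis=0)
```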
Continuous Control with Contexts, Provably
Title | Continuous Control with Contexts, Provably |
Authors | Anonymous |
Abstract | A fundamental challenge in artificial intelligence is to build an agent that generalizes and adapts to unseen environments. A common strategy is to build a decoder that takes a context of the unseen new environment and generates a policy. The current paper studies how to build a decoder for the fundamental continuous control environment, the linear quadratic regulator (LQR), which can model a wide range of real-world physical environments. We present a simple algorithm for this problem, which uses an upper confidence bound (UCB) to refine the estimate of the decoder and balance the exploration-exploitation trade-off. Theoretically, our algorithm enjoys a $\widetilde{O}\left(\sqrt{T}\right)$ regret bound in the online setting, where $T$ is the number of environments the agent has played. This also implies that after playing $\widetilde{O}\left(1/\epsilon^2\right)$ environments, the agent is able to transfer the learned knowledge to obtain an $\epsilon$-suboptimal policy for an unseen environment. To our knowledge, this is the first provably efficient algorithm for building a decoder in the continuous control setting. While our main focus is theoretical, we also present experiments that demonstrate the effectiveness of our algorithm. |
Tasks | Continuous Control |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Skg5r1BFvB |
https://openreview.net/pdf?id=Skg5r1BFvB | |
PWC | https://paperswithcode.com/paper/continuous-control-with-contexts-provably |
Repo | |
Framework | |
Multi-step Greedy Policies in Model-Free Deep Reinforcement Learning
Title | Multi-step Greedy Policies in Model-Free Deep Reinforcement Learning |
Authors | Anonymous |
Abstract | Multi-step greedy policies have been extensively used in model-based Reinforcement Learning (RL) and in cases where a model of the environment is available (e.g., in the game of Go). In this work, we explore the benefits of multi-step greedy policies in model-free RL when employed in the framework of multi-step Dynamic Programming (DP): multi-step Policy and Value Iteration. These algorithms iteratively solve short-horizon decision problems and converge to the optimal solution of the original one. By using model-free algorithms as solvers of the short-horizon problems, we derive fully model-free algorithms which are instances of the multi-step DP framework. As model-free algorithms are prone to instabilities w.r.t. the decision problem horizon, this simple approach can help mitigate these instabilities and results in improved model-free algorithms. We test this approach and show results on both discrete and continuous control problems. |
Tasks | Continuous Control, Game of Go |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=r1l7E1HFPH |
https://openreview.net/pdf?id=r1l7E1HFPH | |
PWC | https://paperswithcode.com/paper/multi-step-greedy-policies-in-model-free-deep |
Repo | |
Framework | |
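A toy tabular sketch of the multi-step greedy improvement step described above: given the current value estimate V, solve a short h-horizon problem whose terminal value is V and act greedily with respect to it. The model-free solver used in the paper is replaced here by exact backward induction on a small known MDP, purely to illustrate the operator.

```python
# Toy h-step greedy policy improvement on a small known MDP (illustration only).
import numpy as np

def h_step_greedy(P, R, V, h, gamma=0.99):
    """P: (S, A, S) transition probabilities, R: (S, A) rewards, V: (S,) current values.

    Returns the greedy policy of the h-horizon problem whose terminal value is V.
    """
    W = V.copy()                     # value-to-go of the short-horizon problem
    for _ in range(h):
        Q = R + gamma * P @ W        # Q[s, a] for the remaining horizon
        policy = Q.argmax(axis=1)    # greedy action at this lookahead depth
        W = Q.max(axis=1)
    return policy                    # after h backups this is the first-step greedy policy
```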
Towards Simplicity in Deep Reinforcement Learning: Streamlined Off-Policy Learning
Title | Towards Simplicity in Deep Reinforcement Learning: Streamlined Off-Policy Learning |
Authors | Anonymous |
Abstract | The field of Deep Reinforcement Learning (DRL) has recently seen a surge in the popularity of maximum entropy reinforcement learning algorithms. Their popularity stems from the intuitive interpretation of the maximum entropy objective and their superior sample efficiency on standard benchmarks. In this paper, we seek to understand the primary contribution of the entropy term to the performance of maximum entropy algorithms. For the MuJoCo benchmark, we demonstrate that the entropy term in Soft Actor-Critic (SAC) principally addresses the bounded nature of the action spaces. With this insight, we propose a simple normalization scheme which allows a streamlined algorithm without entropy maximization to match the performance of SAC. Our experimental results demonstrate a need to revisit the benefits of entropy regularization in DRL. We also propose a simple non-uniform sampling method for selecting transitions from the replay buffer during training. We further show that the streamlined algorithm with the simple non-uniform sampling scheme outperforms SAC and achieves state-of-the-art performance on challenging continuous control tasks. |
Tasks | Continuous Control |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SJl47yBYPS |
https://openreview.net/pdf?id=SJl47yBYPS | |
PWC | https://paperswithcode.com/paper/towards-simplicity-in-deep-reinforcement |
Repo | |
Framework | |
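A small sketch of a non-uniform replay-sampling scheme of the kind described above, biased toward recent transitions. The exact scheme, decay schedule, and constants in the paper may differ, so treat this as an assumption-laden illustration rather than the paper's method.

```python
# Illustrative recency-biased replay sampling (assumed scheme, not the paper's exact one).
import numpy as np

def sample_recent_biased(buffer_size, batch_size, k, total_updates, eta=0.996):
    """For the k-th update out of total_updates, sample uniformly from the most
    recent c_k transitions, where c_k shrinks as training proceeds."""
    c_k = max(int(buffer_size * eta ** (k * 1000.0 / total_updates)), batch_size)
    # Indices 0..buffer_size-1, with buffer_size-1 being the newest transition.
    return np.random.randint(buffer_size - c_k, buffer_size, size=batch_size)

idx = sample_recent_biased(buffer_size=100_000, batch_size=256, k=3, total_updates=1000)
```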
Attraction-Repulsion Actor-Critic for Continuous Control Reinforcement Learning
Title | Attraction-Repulsion Actor-Critic for Continuous Control Reinforcement Learning |
Authors | Anonymous |
Abstract | Continuous control tasks in reinforcement learning are important because they provide a framework for learning in high-dimensional state spaces with deceptive rewards, where the agent can easily become trapped in suboptimal solutions. One way to avoid local optima is to use a population of agents to ensure coverage of the policy space, yet learning a population with the "best" coverage is still an open problem. In this work, we present a novel approach to population-based RL in continuous control that leverages properties of normalizing flows to perform attractive and repulsive operations between current members of the population and previously observed policies. Empirical results on the MuJoCo suite demonstrate a high performance gain for our algorithm compared to prior work, including Soft Actor-Critic (SAC). |
Tasks | Continuous Control |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BJlAzTEKwS |
https://openreview.net/pdf?id=BJlAzTEKwS | |
PWC | https://paperswithcode.com/paper/attraction-repulsion-actor-critic-for-1 |
Repo | |
Framework | |
Advantage Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning
Title | Advantage Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning |
Authors | Anonymous |
Abstract | In this paper, we aim to develop a simple and scalable reinforcement learning algorithm that uses standard supervised learning methods as subroutines. Our goal is an algorithm that utilizes only simple and convergent maximum likelihood loss functions, while also being able to leverage off-policy data. Our proposed approach, which we refer to as advantage-weighted regression (AWR), consists of two standard supervised learning steps: one to regress onto target values for a value function, and another to regress onto weighted target actions for the policy. The method is simple and general, can accommodate continuous and discrete actions, and can be implemented in just a few lines of code on top of standard supervised learning methods. We provide a theoretical motivation for AWR and analyze its properties when incorporating off-policy data from experience replay. We evaluate AWR on a suite of standard OpenAI Gym benchmark tasks, and show that it achieves competitive performance compared to a number of well-established state-of-the-art RL algorithms. AWR is also able to acquire more effective policies than most off-policy algorithms when learning from purely static datasets with no additional environmental interactions. Furthermore, we demonstrate our algorithm on challenging continuous control tasks with highly complex simulated characters. |
Tasks | Continuous Control |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=H1gdF34FvS |
https://openreview.net/pdf?id=H1gdF34FvS | |
PWC | https://paperswithcode.com/paper/advantage-weighted-regression-simple-and-1 |
Repo | |
Framework | |
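A condensed sketch of the two supervised steps described in the AWR abstract above: regress the value function toward return targets, then regress the policy toward actions weighted by exponentiated advantages. Hyperparameters, the weight clipping, and the `policy.log_prob(states, actions)` API are placeholders for illustration.

```python
# Illustrative advantage-weighted regression update (simplified from the abstract).
import torch

def awr_losses(value_net, policy, states, actions, returns, beta=1.0, w_max=20.0):
    # Step 1: supervised regression of the value function onto return targets.
    value_loss = ((value_net(states).squeeze(-1) - returns) ** 2).mean()

    # Step 2: regression onto advantage-weighted target actions for the policy.
    with torch.no_grad():
        adv = returns - value_net(states).squeeze(-1)
        weights = torch.clamp(torch.exp(adv / beta), max=w_max)
    log_prob = policy.log_prob(states, actions)   # assumed policy API
    policy_loss = -(weights * log_prob).mean()
    return value_loss, policy_loss
```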
CNAS: Channel-Level Neural Architecture Search
Title | CNAS: Channel-Level Neural Architecture Search |
Authors | Anonymous |
Abstract | There is growing interest in automating the design of good neural network architectures. Recently proposed NAS methods have significantly reduced the architecture search cost by sharing parameters, but designing the search space remains a challenging problem. We observe that a search space is typically defined by its shape and a set of operations, and propose a channel-level neural architecture search (CNAS) method that uses only a fixed type of operation. The resulting architecture is sparse in terms of channels and has a different topology at each cell. Experimental results on CIFAR-10 and ImageNet show that a fine-granular, sparse model searched by CNAS achieves very competitive performance compared with dense models searched by existing methods. |
Tasks | Neural Architecture Search |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rklfIeSFwS |
https://openreview.net/pdf?id=rklfIeSFwS | |
PWC | https://paperswithcode.com/paper/cnas-channel-level-neural-architecture-search |
Repo | |
Framework | |
On Weight-Sharing and Bilevel Optimization in Architecture Search
Title | On Weight-Sharing and Bilevel Optimization in Architecture Search |
Authors | Anonymous |
Abstract | Weight-sharing—the simultaneous optimization of multiple neural networks using the same parameters—has emerged as a key component of state-of-the-art neural architecture search. However, its success is poorly understood and often found to be surprising. We argue that, rather than just being an optimization trick, the weight-sharing approach is induced by the relaxation of a structured hypothesis space, and introduces new algorithmic and theoretical challenges as well as applications beyond neural architecture search. Algorithmically, we show how the geometry of ERM for weight-sharing requires greater care when designing gradient-based minimization methods and apply tools from non-convex non-Euclidean optimization to give general-purpose algorithms that adapt to the underlying structure. We further analyze the learning-theoretic behavior of the bilevel optimization solved by practical weight-sharing methods. Next, using kernel configuration and NLP feature selection as case studies, we demonstrate how weight-sharing applies to the architecture search generalization of NAS and effectively optimizes the resulting bilevel objective. Finally, we use our optimization analysis to develop a simple exponentiated gradient method for NAS that aligns with the underlying optimization geometry and matches state-of-the-art approaches on CIFAR-10. |
Tasks | bilevel optimization, Feature Selection, Neural Architecture Search |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HJgRCyHFDr |
https://openreview.net/pdf?id=HJgRCyHFDr | |
PWC | https://paperswithcode.com/paper/on-weight-sharing-and-bilevel-optimization-in |
Repo | |
Framework | |
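A minimal sketch of an exponentiated-gradient (mirror-descent) update on simplex-constrained architecture weights, the kind of geometry-aware step the abstract above refers to. The loss, the source of the gradient, and the learning rate are placeholders.

```python
# Illustrative exponentiated-gradient step on architecture mixture weights.
import numpy as np

def exponentiated_gradient_step(theta, grad, lr=0.1):
    """theta lies on the probability simplex (one weight per candidate operation);
    the multiplicative update keeps it there, matching the simplex geometry."""
    theta = theta * np.exp(-lr * grad)
    return theta / theta.sum()

theta = np.full(4, 0.25)                 # uniform over 4 candidate operations
grad = np.array([0.3, -0.1, 0.05, 0.2])  # gradient of validation loss w.r.t. theta
theta = exponentiated_gradient_step(theta, grad)
```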
Searching for Stage-wise Neural Graphs In the Limit
Title | Searching for Stage-wise Neural Graphs In the Limit |
Authors | Anonymous |
Abstract | Search space is a key consideration for neural architecture search. Recently, Xie et al. (2019a) found that randomly generated networks from the same distribution perform similarly, which suggests we should search for random graph distributions instead of graphs. We propose the graphon as a new search space. A graphon is the limit of a Cauchy sequence of graphs and a scale-free probabilistic distribution from which graphs with different numbers of vertices can be drawn. This property enables us to perform NAS using fast, low-capacity models and scale the found models up when necessary. We develop an algorithm for NAS in the space of graphons and empirically demonstrate that it can find stage-wise graphs that outperform DenseNet and other baselines on ImageNet. |
Tasks | Neural Architecture Search |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SkxWnkStvS |
https://openreview.net/pdf?id=SkxWnkStvS | |
PWC | https://paperswithcode.com/paper/searching-for-stage-wise-neural-graphs-in-the |
Repo | |
Framework | |
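A short sketch of sampling a graph of arbitrary size from a graphon, the property the abstract above relies on to search with small, cheap graphs and scale the result up. The example graphon function is an arbitrary choice for illustration, not the paper's search-space parametrization.

```python
# Sampling an n-vertex graph from a graphon W: [0,1]^2 -> [0,1] (illustration).
import numpy as np

def sample_graph_from_graphon(W, n, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    u = rng.uniform(size=n)                  # one latent coordinate per vertex
    probs = W(u[:, None], u[None, :])        # edge probabilities W(u_i, u_j)
    upper = rng.uniform(size=(n, n)) < probs
    adj = np.triu(upper, k=1)                # sample each undirected edge once
    return adj | adj.T

# Example graphon (an assumption, not the paper's): denser edges for small u.
W = lambda x, y: 0.8 * np.exp(-3.0 * (x + y))
A_small, A_large = sample_graph_from_graphon(W, 16), sample_graph_from_graphon(W, 256)
```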