April 1, 2020

2769 words 13 mins read

Paper Group NANR 80

Learning Invariants through Soft Unification. Collapsed amortized variational inference for switching nonlinear dynamical systems. Thinking While Moving: Deep Reinforcement Learning with Concurrent Control. Target-Embedding Autoencoders for Supervised Representation Learning. COMBINED FLEXIBLE ACTIVATION FUNCTIONS FOR DEEP NEURAL NETWORKS. Neural M …

Learning Invariants through Soft Unification

Title Learning Invariants through Soft Unification
Authors Anonymous
Abstract Human reasoning involves recognising common underlying principles across many examples by utilising variables. The by-products of such reasoning are invariants that capture patterns across examples such as “if someone went somewhere then they are there” without mentioning specific people or places. Humans learn what variables are and how to use them at a young age, and the question this paper addresses is whether machines can also learn and use variables solely from examples without requiring human pre-engineering. We propose Unification Networks that incorporate soft unification into neural networks to learn variables and by doing so lift examples into invariants that can then be used to solve a given task. We evaluate our approach on four datasets to demonstrate that learning invariants captures patterns in the data and can improve performance over baselines.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=r1xwA34KDB
PDF https://openreview.net/pdf?id=r1xwA34KDB
PWC https://paperswithcode.com/paper/learning-invariants-through-soft-unification-1
Repo
Framework
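The abstract above keeps soft unification at an intuitive level; the sketch below is one plausible reading in Python, in which each symbol of a lifted invariant carries a learned "variableness" score and is softly bound to an example's symbols via attention. The function names, the dot-product attention, and the interpolation rule are illustrative assumptions rather than the paper's exact formulation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def soft_unify(invariant, example, variableness):
    """Hypothetical soft unification over symbol embeddings.

    invariant:    (n, d) float array, symbol embeddings of the lifted invariant
    example:      (m, d) float array, symbol embeddings of a concrete example
    variableness: (n,)   learned probability that each invariant symbol acts
                         as a variable (0 = constant, 1 = variable)
    """
    unified = np.empty_like(invariant)
    for i, (sym, p) in enumerate(zip(invariant, variableness)):
        attn = softmax(example @ sym)            # attention over example symbols
        bound = attn @ example                   # softly bound value for this slot
        unified[i] = p * bound + (1 - p) * sym   # variables bind, constants stay
    return unified
```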

Collapsed amortized variational inference for switching nonlinear dynamical systems

Title Collapsed amortized variational inference for switching nonlinear dynamical systems
Authors Anonymous
Abstract We propose an efficient inference method for switching nonlinear dynamical systems. The key idea is to learn an inference network which can be used as a proposal distribution for the continuous latent variables, while performing exact marginalization of the discrete latent variables. This allows us to use the reparameterization trick, and apply end-to-end training with SGD. We show that this method can successfully segment time series data (including videos) into meaningful “regimes”, due to the use of piece-wise nonlinear dynamics.
Tasks Time Series
Published 2020-01-01
URL https://openreview.net/forum?id=BkxdqA4tvB
PDF https://openreview.net/pdf?id=BkxdqA4tvB
PWC https://paperswithcode.com/paper/collapsed-amortized-variational-inference-for-1
Repo
Framework
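The two ingredients named in the abstract above can be pictured in a few lines: the continuous latent is drawn with the reparameterization trick from an amortized proposal, while the discrete regime is marginalized exactly by a forward recursion over an assumed Markov transition structure. The shapes, function names, and HMM-style factorization below are illustrative assumptions.

```python
import numpy as np
from scipy.special import logsumexp

def reparameterize(mu, log_var, rng):
    # continuous latent z_t ~ q(z_t | x), differentiable w.r.t. (mu, log_var)
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def collapsed_log_likelihood(log_emissions, log_init, log_trans):
    """Exactly marginalize the discrete regime with a forward recursion.

    log_emissions: (T, K) log p(x_t | z_t, s_t = k), evaluated at sampled z_t
    log_init:      (K,)   log p(s_1 = k)
    log_trans:     (K, K) log p(s_t = j | s_{t-1} = i)
    """
    alpha = log_init + log_emissions[0]
    for t in range(1, len(log_emissions)):
        alpha = logsumexp(alpha[:, None] + log_trans, axis=0) + log_emissions[t]
    return logsumexp(alpha)
```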

Thinking While Moving: Deep Reinforcement Learning with Concurrent Control

Title Thinking While Moving: Deep Reinforcement Learning with Concurrent Control
Authors Anonymous
Abstract We study reinforcement learning in settings where sampling an action from the policy must be done concurrently with the time evolution of the controlled system, such as when a robot must decide on the next action while still performing the previous action. Much like a person or an animal, the robot must think and move at the same time, deciding on its next action before the previous one has completed. In order to develop an algorithmic framework for such concurrent control problems, we start with a continuous-time formulation of the Bellman equations, and then discretize them in a way that is aware of system delays. We instantiate this new class of approximate dynamic programming methods via a simple architectural extension to existing value-based deep reinforcement learning algorithms. We evaluate our methods on simulated benchmark tasks and a large-scale robotic grasping task where the robot must “think while moving.”
Tasks Robotic Grasping
Published 2020-01-01
URL https://openreview.net/forum?id=SJexHkSFPS
PDF https://openreview.net/pdf?id=SJexHkSFPS
PWC https://paperswithcode.com/paper/thinking-while-moving-deep-reinforcement
Repo
Framework
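One way to picture the delay-aware discretization is a value function that also conditions on the action still being executed and on how long it has been running, so the next action can be chosen before the previous one finishes. The callable `q_next` and its signature below are hypothetical placeholders, not the paper's architecture.

```python
def concurrent_td_target(reward, gamma, q_next, next_state, prev_action,
                         t_elapsed, actions):
    """One-step TD target for a delay-aware ("concurrent") Q-function.

    q_next(state, prev_action, t_elapsed, action) -> float is an assumed
    callable: unlike a standard Q-function it sees the in-flight action
    prev_action and the time t_elapsed since it started executing.
    """
    best_next = max(q_next(next_state, prev_action, t_elapsed, a) for a in actions)
    return reward + gamma * best_next
```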

Target-Embedding Autoencoders for Supervised Representation Learning

Title Target-Embedding Autoencoders for Supervised Representation Learning
Authors Anonymous
Abstract Autoencoder-based learning has emerged as a staple for disciplining representations in unsupervised and semi-supervised settings. This paper analyzes a framework for improving generalization in a purely supervised setting, where the target space is high-dimensional. We motivate and formalize the notion of target-embedding autoencoders (TEA) for supervised prediction, designed to learn intermediate latent representations jointly optimized to be both predictable from features as well as predictive of targets—encoding the prior that variations in targets are driven by a compact set of underlying factors. As our theoretical contribution, we provide a guarantee of generalization for linear TEAs by demonstrating uniform stability, interpreting the benefit of the auxiliary reconstruction task as a form of regularization. As our empirical contribution, we extend validation of this approach beyond the commonly-studied static domain to multivariate sequence forecasting, investigating the advantage that TEAs confer on both linear and nonlinear architectures.
Tasks Representation Learning
Published 2020-01-01
URL https://openreview.net/forum?id=BygXFkSYDH
PDF https://openreview.net/pdf?id=BygXFkSYDH
PWC https://paperswithcode.com/paper/target-embedding-autoencoders-for-supervised
Repo
Framework
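A minimal PyTorch sketch of the target-embedding idea described above: the latent is trained both to reconstruct the high-dimensional target and to be predictable from the features, and test-time prediction runs features -> latent -> target. The layer sizes, squared-error losses, and weight `lam` are placeholders, not the paper's exact objective.

```python
import torch
import torch.nn as nn

class TargetEmbeddingAutoencoder(nn.Module):
    def __init__(self, x_dim=32, y_dim=128, z_dim=16):
        super().__init__()
        self.encode_y = nn.Linear(y_dim, z_dim)   # target -> latent
        self.decode_y = nn.Linear(z_dim, y_dim)   # latent -> reconstructed target
        self.predict_z = nn.Linear(x_dim, z_dim)  # features -> predicted latent

    def loss(self, x, y, lam=1.0):
        z = self.encode_y(y)                                  # embed the target
        recon = nn.functional.mse_loss(self.decode_y(z), y)   # auxiliary reconstruction
        pred = nn.functional.mse_loss(self.predict_z(x), z)   # make z predictable from x
        return pred + lam * recon

    def forward(self, x):
        # test-time prediction: features -> latent -> target
        return self.decode_y(self.predict_z(x))
```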

COMBINED FLEXIBLE ACTIVATION FUNCTIONS FOR DEEP NEURAL NETWORKS

Title COMBINED FLEXIBLE ACTIVATION FUNCTIONS FOR DEEP NEURAL NETWORKS
Authors Anonymous
Abstract Activation functions in deep neural networks are fundamental to achieving non-linear mappings. Traditional studies mainly focus on finding fixed activations for a particular set of learning tasks or model architectures. Research on flexible activations is quite limited in both design philosophy and application scenarios. In this study, we propose a general combined form of flexible activation functions as well as three principles for choosing flexible activation components. Based on this, we develop two novel flexible activation functions that can be implemented in LSTM cells and auto-encoder layers. We also propose two new regularisation terms based on assumptions that serve as prior knowledge. We find that LSTM and auto-encoder models with the proposed flexible activations provide significant improvements on time series forecasting and image compression tasks, while layer-wise regularization can improve the performance of CNN (LeNet-5) models with RPeLu activation on image classification tasks.
Tasks Image Classification, Time Series, Time Series Forecasting
Published 2020-01-01
URL https://openreview.net/forum?id=r1lh6C4FDr
PDF https://openreview.net/pdf?id=r1lh6C4FDr
PWC https://paperswithcode.com/paper/combined-flexible-activation-functions-for
Repo
Framework
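The abstract does not spell out the combined form, so the sketch below shows one natural reading: a per-unit learnable mixture of fixed activation components, with a toy regularizer standing in for the prior-knowledge terms. The component set, the softmax mixing, and the regularizer are all assumptions.

```python
import torch
import torch.nn as nn

class CombinedFlexibleActivation(nn.Module):
    """Hypothetical combined flexible activation: each unit learns its own
    mixture over a fixed set of activation components."""

    def __init__(self, num_units):
        super().__init__()
        self.components = [torch.tanh, torch.relu, torch.sigmoid]
        self.logits = nn.Parameter(torch.zeros(num_units, len(self.components)))

    def forward(self, x):                       # x: (..., num_units)
        w = torch.softmax(self.logits, dim=-1)  # (num_units, num_components)
        stacked = torch.stack([g(x) for g in self.components], dim=-1)
        return (w * stacked).sum(dim=-1)

    def regularizer(self):
        # example prior: keep the mixing weights close to uniform
        w = torch.softmax(self.logits, dim=-1)
        return ((w - 1.0 / w.shape[-1]) ** 2).sum()
```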

Neural Module Networks for Reasoning over Text

Title Neural Module Networks for Reasoning over Text
Authors Anonymous
Abstract Answering compositional questions that require multiple steps of reasoning against text is challenging, especially when they involve discrete, symbolic operations. Neural module networks (NMNs) learn to parse such questions as executable programs composed of learnable modules, performing well on synthetic visual QA domains. However, we find that it is challenging to learn these models for non-synthetic questions on open-domain text, where a model needs to deal with the diversity of natural language and perform a broader range of reasoning. We extend NMNs by: (a) introducing modules that reason over a paragraph of text, performing symbolic reasoning (such as arithmetic, sorting, counting) over numbers and dates in a probabilistic and differentiable manner; and (b) proposing an unsupervised auxiliary loss to help extract arguments associated with the events in text. Additionally, we show that a limited amount of heuristically-obtained question program and intermediate module output supervision provides sufficient inductive bias for accurate learning. Our proposed model significantly outperforms state-of-the-art models on a subset of the DROP dataset that poses a variety of reasoning challenges that are covered by our modules.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=SygWvAVFPr
PDF https://openreview.net/pdf?id=SygWvAVFPr
PWC https://paperswithcode.com/paper/neural-module-networks-for-reasoning-over
Repo
Framework
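As a toy example of "symbolic reasoning in a probabilistic and differentiable manner", here is a soft count over paragraph tokens: each token gets an independent match probability and the expected count is their sum. This is a deliberate simplification and almost certainly differs from the paper's actual count module.

```python
import torch

def soft_count(relevance_logits):
    """Differentiable counting: relevance_logits is a (num_tokens,) tensor of
    per-token scores; the expected count is the sum of match probabilities."""
    return torch.sigmoid(relevance_logits).sum()

# e.g. three tokens that clearly match and one that clearly does not
print(soft_count(torch.tensor([8.0, 8.0, 8.0, -8.0])))  # ~3.0
```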

Compressive Transformers for Long-Range Sequence Modelling

Title Compressive Transformers for Long-Range Sequence Modelling
Authors Anonymous
Abstract We present the Compressive Transformer, an attentive sequence model which compresses past memories for long-range sequence learning. We find the Compressive Transformer obtains state-of-the-art language modelling results on the WikiText-103 and Enwik8 benchmarks, achieving 17.1 ppl and 0.97 bpc respectively. We also find it can model high-frequency speech effectively and can be used as a memory mechanism for RL, demonstrated on an object matching task. To promote the domain of long-range sequence learning, we propose a new open-vocabulary language modelling benchmark derived from books, PG-19.
Tasks Language Modelling
Published 2020-01-01
URL https://openreview.net/forum?id=SylKikSYDH
PDF https://openreview.net/pdf?id=SylKikSYDH
PWC https://paperswithcode.com/paper/compressive-transformers-for-long-range
Repo
Framework
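A small sketch of the memory mechanism described above: activations that fall out of the attention window are not discarded but pooled into a coarser compressed memory that older attention can still reach. Mean-pooling at a fixed rate is only one of several possible compression functions; the shapes and the rate of 3 below are illustrative.

```python
import torch

def compress_oldest(memory, compressed, window, rate=3):
    """Move memories that overflow the attention window into a compressed store.

    memory:     (T, d) recent hidden states
    compressed: (C, d) previously compressed memories
    Returns the trimmed memory and the grown compressed store.
    """
    overflow = memory.shape[0] - window
    overflow -= overflow % rate                  # only compress whole groups
    if overflow <= 0:
        return memory, compressed
    old, memory = memory[:overflow], memory[overflow:]
    pooled = old.reshape(overflow // rate, rate, -1).mean(dim=1)
    return memory, torch.cat([compressed, pooled], dim=0)
```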

Cross Domain Imitation Learning

Title Cross Domain Imitation Learning
Authors Anonymous
Abstract We study the question of how to imitate tasks across domains with discrepancies such as embodiment and viewpoint mismatch. Many prior works require paired, aligned demonstrations and an additional RL procedure for the task. However, paired, aligned demonstrations are seldom obtainable and RL procedures are expensive. In this work, we formalize the Cross Domain Imitation Learning (CDIL) problem, which encompasses imitation learning in the presence of viewpoint and embodiment mismatch. Informally, CDIL is the process of learning how to perform a task optimally, given demonstrations of the task in a distinct domain. We propose a two step approach to CDIL: alignment followed by adaptation. In the alignment step we execute a novel unsupervised MDP alignment algorithm, Generative Adversarial MDP Alignment (GAMA), to learn state and action correspondences from unpaired, unaligned demonstrations. In the adaptation step we leverage the correspondences to zero-shot imitate tasks across domains. To describe when CDIL is feasible via alignment and adaptation, we introduce a theory of MDP alignability. We experimentally evaluate GAMA against baselines in both embodiment and viewpoint mismatch scenarios where aligned demonstrations don’t exist and show the effectiveness of our approach.
Tasks Imitation Learning
Published 2020-01-01
URL https://openreview.net/forum?id=S1gV6AVKwB
PDF https://openreview.net/pdf?id=S1gV6AVKwB
PWC https://paperswithcode.com/paper/cross-domain-imitation-learning-1
Repo
Framework

Improving Federated Learning Personalization via Model Agnostic Meta Learning

Title Improving Federated Learning Personalization via Model Agnostic Meta Learning
Authors Anonymous
Abstract Federated Learning (FL) refers to learning a high-quality global model based on decentralized data storage, without ever copying the raw data. A natural scenario arises with data created on mobile phones by the activity of their users. Given the typical data heterogeneity in such situations, it is natural to ask how the global model can be personalized for every such device, individually. In this work, we point out that the setting of Model Agnostic Meta Learning (MAML), where one optimizes for a fast, gradient-based, few-shot adaptation to a heterogeneous distribution of tasks, has a number of similarities with the objective of personalization for FL. We present FL as a natural source of practical applications for MAML algorithms, and make the following observations. 1) The popular FL algorithm, Federated Averaging, can be interpreted as a meta learning algorithm. 2) Careful fine-tuning can yield a global model with higher accuracy, which is at the same time easier to personalize. However, solely optimizing for the global model accuracy yields a weaker personalization result. 3) A model trained using a standard datacenter optimization method is much harder to personalize, compared to one trained using Federated Averaging, supporting the first claim. These results raise new questions for FL, MAML, and broader ML research.
Tasks Meta-Learning
Published 2020-01-01
URL https://openreview.net/forum?id=BkeaEyBYDB
PDF https://openreview.net/pdf?id=BkeaEyBYDB
PWC https://paperswithcode.com/paper/improving-federated-learning-personalization
Repo
Framework
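Observation (1) above is easy to make concrete: clients run an inner adaptation loop, the server averages the adapted models (an outer update in the spirit of first-order MAML / Reptile), and personalization is just a final local fine-tune of the global model. The linear least-squares model and all hyperparameters below are placeholders.

```python
import numpy as np

def local_sgd(w, data, lr=0.1, steps=5):
    """A few local gradient steps on a linear least-squares model (toy client update)."""
    X, y = data
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

def federated_averaging(w, client_data, rounds=10):
    """Inner loop: each client adapts locally; outer loop: the server averages.
    Structurally this is the same inner/outer pattern as first-order MAML."""
    for _ in range(rounds):
        w = np.mean([local_sgd(w.copy(), d) for d in client_data], axis=0)
    return w

def personalize(w_global, client_data, steps=5):
    # personalization = a short fine-tune of the global model on one client's data
    return local_sgd(w_global.copy(), client_data, steps=steps)
```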

Learning to Coordinate Manipulation Skills via Skill Behavior Diversification

Title Learning to Coordinate Manipulation Skills via Skill Behavior Diversification
Authors Anonymous
Abstract When mastering a complex manipulation task, humans often decompose the task into sub-skills of their body parts, practice the sub-skills independently, and then execute the sub-skills together. Similarly, a robot with multiple end-effectors can perform a complex task by coordinating sub-skills of each end-effector. To realize temporal and behavioral coordination of skills, we propose a hierarchical framework that first individually trains sub-skills of each end-effector with skill behavior diversification, and learns to coordinate end-effectors using diverse behaviors of the skills. We demonstrate that our proposed framework is able to efficiently learn sub-skills with diverse behaviors and coordinate them to solve challenging collaborative control tasks such as picking up a long bar, placing a block inside a container while pushing the container with two robot arms, and pushing a box with two ant agents.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=ryxB2lBtvH
PDF https://openreview.net/pdf?id=ryxB2lBtvH
PWC https://paperswithcode.com/paper/learning-to-coordinate-manipulation-skills
Repo
Framework

Learning Expensive Coordination: An Event-Based Deep RL Approach

Title Learning Expensive Coordination: An Event-Based Deep RL Approach
Authors Anonymous
Abstract Existing works in deep Multi-Agent Reinforcement Learning (MARL) mainly focus on coordinating cooperative agents to complete certain tasks jointly. However, in many real-world settings, agents are self-interested, such as employees in a company and clubs in a league. Therefore, the leader, i.e., the manager of the company or the league, needs to provide bonuses to followers for efficient coordination, which we call expensive coordination. The main difficulties of expensive coordination are that i) the leader has to consider the long-term effects and predict the followers’ behaviors when assigning bonuses and ii) the complex interactions between followers make the training process hard to converge, especially when the leader’s policy changes with time. In this work, we address this problem through an event-based deep RL approach. Our main contributions are threefold. (1) We model the leader’s decision-making process as a semi-Markov Decision Process and propose a novel multi-agent event-based policy gradient to learn the leader’s long-term policy. (2) We exploit the leader-follower consistency scheme to design a follower-aware module and a follower-specific attention module to predict the followers’ behaviors and respond accurately to them. (3) We propose an action abstraction-based policy gradient algorithm to reduce the followers’ decision space and thus accelerate the training process of followers. Experiments in resource collections, navigation, and the predator-prey game reveal that our approach outperforms the state-of-the-art methods dramatically.
Tasks Decision Making, Multi-agent Reinforcement Learning
Published 2020-01-01
URL https://openreview.net/forum?id=ryeG924twB
PDF https://openreview.net/pdf?id=ryeG924twB
PWC https://paperswithcode.com/paper/learning-expensive-coordination-an-event
Repo
Framework

Sparse Skill Coding: Learning Behavioral Hierarchies with Sparse Codes

Title Sparse Skill Coding: Learning Behavioral Hierarchies with Sparse Codes
Authors Anonymous
Abstract Many approaches to hierarchical reinforcement learning aim to identify sub-goal structure in tasks. We consider an alternative perspective based on identifying behavioral “motifs”—repeated action sequences that can be compressed to yield a compact code of action trajectories. We present a method for iteratively compressing action trajectories to learn nested behavioral hierarchies of arbitrary depth, with actions of arbitrary length. The learned temporally extended actions provide new action primitives that can participate in deeper hierarchies as the agent learns. We demonstrate the relevance of this approach for tasks with non-trivial hierarchical structure and show that the approach can be used to accelerate learning in recursively more complex tasks through transfer.
Tasks Hierarchical Reinforcement Learning
Published 2020-01-01
URL https://openreview.net/forum?id=Hygv3xrtDr
PDF https://openreview.net/pdf?id=Hygv3xrtDr
PWC https://paperswithcode.com/paper/sparse-skill-coding-learning-behavioral
Repo
Framework
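A toy stand-in for the iterative compression described above: repeatedly replace the most frequent adjacent action pair with a new composite skill symbol, so later merges can contain earlier ones and a nested hierarchy of motifs emerges. The BPE-style merge rule is an assumption, not the paper's exact compression objective.

```python
from collections import Counter

def compress_trajectory(actions, min_count=2):
    """actions: list of integer action ids. Returns the compressed trajectory
    and a dictionary mapping each new skill id to the pair it abbreviates."""
    skills = {}
    next_id = max(actions, default=0) + 1
    while True:
        pairs = Counter(zip(actions, actions[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < min_count:
            break
        skills[next_id] = pair                   # new composite skill
        out, i = [], 0
        while i < len(actions):
            if i + 1 < len(actions) and (actions[i], actions[i + 1]) == pair:
                out.append(next_id)
                i += 2
            else:
                out.append(actions[i])
                i += 1
        actions, next_id = out, next_id + 1
    return actions, skills

# e.g. compress_trajectory([0, 1, 0, 1, 2, 0, 1]) -> ([3, 3, 2, 3], {3: (0, 1)})
```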

Sub-policy Adaptation for Hierarchical Reinforcement Learning

Title Sub-policy Adaptation for Hierarchical Reinforcement Learning
Authors Anonymous
Abstract Hierarchical reinforcement learning is a promising approach to tackle long-horizon decision-making problems with sparse rewards. Unfortunately, most methods still decouple the lower-level skill acquisition process and the training of a higher level that controls the skills in a new task. Leaving the skills fixed can lead to significant sub-optimality in the transfer setting. In this work, we propose a novel algorithm to discover a set of skills, and continuously adapt them along with the higher level even when training on a new task. Our main contributions are two-fold. First, we derive a new hierarchical policy gradient with an unbiased latent-dependent baseline, and we introduce Hierarchical Proximal Policy Optimization (HiPPO), an on-policy method to efficiently train all levels of the hierarchy jointly. Second, we propose a method of training time-abstractions that improves the robustness of the obtained skills to environment changes. Code and videos are available at sites.google.com/view/hippo-rl.
Tasks Decision Making, Hierarchical Reinforcement Learning
Published 2020-01-01
URL https://openreview.net/forum?id=ByeWogStDS
PDF https://openreview.net/pdf?id=ByeWogStDS
PWC https://paperswithcode.com/paper/sub-policy-adaptation-for-hierarchical-1
Repo
Framework

PROVABLY BENEFITS OF DEEP HIERARCHICAL RL

Title PROVABLY BENEFITS OF DEEP HIERARCHICAL RL
Authors Anonymous
Abstract Modern complex sequential decision-making problems often involve both low-level policies and high-level planning. Deep hierarchical reinforcement learning (Deep HRL) admits multi-layer abstractions which naturally model the policy in a hierarchical manner, and it is believed that deep HRL can reduce the sample complexity compared to standard RL frameworks. We initiate the study of rigorously characterizing the complexity of Deep HRL. We present a model-based optimistic algorithm which demonstrates that the complexity of learning a near-optimal policy for deep HRL scales with the sum of the number of states at each abstraction layer, whereas standard RL scales with the product of the number of states at each abstraction layer. Our algorithm achieves this goal by using the fact that distinct high-level states have similar low-level structures, which allows efficient information exploitation, so that experience from different high-level state-action pairs can be generalized to unseen state-action pairs. Overall, our result shows an exponential improvement of Deep HRL over the standard RL framework.
Tasks Decision Making, Hierarchical Reinforcement Learning
Published 2020-01-01
URL https://openreview.net/forum?id=ByxloeHFPS
PDF https://openreview.net/pdf?id=ByxloeHFPS
PWC https://paperswithcode.com/paper/provably-benefits-of-deep-hierarchical-rl
Repo
Framework
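The claimed sum-versus-product scaling is easy to illustrate with arbitrary toy numbers: three abstraction layers with 10, 20 and 30 states give a flat state space of 6000 but a hierarchical sum of only 60, and the gap grows exponentially with the number of layers.

```python
from math import prod

layers = [10, 20, 30]          # states per abstraction layer (arbitrary toy numbers)
flat = prod(layers)            # standard RL: product of layer sizes
hierarchical = sum(layers)     # claimed Deep HRL scaling: sum of layer sizes
print(flat, hierarchical)      # 6000 60
```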

Compositional Transfer in Hierarchical Reinforcement Learning

Title Compositional Transfer in Hierarchical Reinforcement Learning
Authors Anonymous
Abstract The successful application of flexible, general learning algorithms to real-world robotics applications is often limited by their poor data-efficiency. To address this challenge, domains with more than one dominant task of interest encourage the sharing of information across tasks to limit required experiment time. To this end, we investigate compositional inductive biases in the form of hierarchical policies as a mechanism for knowledge transfer across tasks in reinforcement learning (RL). We demonstrate that this type of hierarchy enables positive transfer while mitigating negative interference. Furthermore, we demonstrate the benefits of additional incentives to efficiently decompose task solutions. Our experiments show that these incentives are naturally given in multitask learning and can be easily introduced for single objectives. We design an RL algorithm that enables stable and fast learning of structured policies and the effective reuse of both behavior components and transition data across tasks in an off-policy setting. Finally, we evaluate our algorithm in simulated environments as well as physical robot experiments and demonstrate substantial improvements in data-efficiency over competitive baselines.
Tasks Hierarchical Reinforcement Learning, Transfer Learning
Published 2020-01-01
URL https://openreview.net/forum?id=H1lTRJBtwB
PDF https://openreview.net/pdf?id=H1lTRJBtwB
PWC https://paperswithcode.com/paper/compositional-transfer-in-hierarchical
Repo
Framework