Paper Group NANR 36
INSTANCE CROSS ENTROPY FOR DEEP METRIC LEARNING. Abstract Diagrammatic Reasoning with Multiplex Graph Networks. MULTIPOLAR: Multi-Source Policy Aggregation for Transfer Reinforcement Learning between Diverse Environmental Dynamics. Semantically-Guided Representation Learning for Self-Supervised Monocular Depth. Generative Integration Networks. Discriminator Based Corpus Generation for General Code Synthesis. Promoting Coordination through Policy Regularization in Multi-Agent Deep Reinforcement Learning. Policy Optimization In the Face of Uncertainty. Online Meta-Critic Learning for Off-Policy Actor-Critic Methods. Generative Cleaning Networks with Quantized Nonlinear Transform for Deep Neural Network Defense. Imitation Learning of Robot Policies using Language, Vision and Motion. Model Based Reinforcement Learning for Atari. Adversarial Filters of Dataset Biases. Agent as Scientist: Learning to Verify Hypotheses. A Boolean Task Algebra for Reinforcement Learning.
INSTANCE CROSS ENTROPY FOR DEEP METRIC LEARNING
Title | INSTANCE CROSS ENTROPY FOR DEEP METRIC LEARNING |
Authors | Anonymous |
Abstract | Loss functions play a crucial role in deep metric learning, and thus a variety of them have been proposed. Some supervise the learning process by pairwise or tripletwise similarity constraints while others take advantage of structured similarity information among multiple data points. In this work, we approach deep metric learning from a novel perspective. We propose instance cross entropy (ICE), which measures the difference between an estimated instance-level matching distribution and its ground-truth one. ICE has three main appealing properties. Firstly, similar to categorical cross entropy (CCE), ICE has a clear probabilistic interpretation and exploits structured semantic similarity information for learning supervision. Secondly, ICE is scalable to infinite training data as it learns on mini-batches iteratively and is independent of the training set size. Thirdly, motivated by our relative weight analysis, seamless sample reweighting is incorporated. It rescales samples’ gradients to control the degree of differentiation over training examples instead of truncating them by sample mining. In addition to its simplicity and intuitiveness, extensive experiments on three real-world benchmarks demonstrate the superiority of ICE. |
Tasks | Metric Learning, Semantic Similarity, Semantic Textual Similarity |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BJeguTEKDB |
https://openreview.net/pdf?id=BJeguTEKDB | |
PWC | https://paperswithcode.com/paper/instance-cross-entropy-for-deep-metric |
Repo | |
Framework | |
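For intuition, here is a minimal sketch of what an instance-level matching loss of this kind could look like on a mini-batch of embeddings: a softmax over pairwise similarities gives the estimated matching distribution, and cross entropy is taken against a ground-truth distribution spread uniformly over same-class instances. The cosine similarity, the temperature, and the omission of the paper's sample reweighting are all simplifying assumptions, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def instance_cross_entropy(embeddings, labels, temperature=0.1):
    """Hedged sketch of an instance-level cross entropy within a mini-batch."""
    z = F.normalize(embeddings, dim=1)
    sim = z @ z.t() / temperature                                  # (B, B) pairwise similarities
    eye = torch.eye(len(labels), dtype=torch.bool, device=sim.device)
    sim = sim.masked_fill(eye, float('-inf'))                      # exclude self-matches
    log_p = F.log_softmax(sim, dim=1)                              # estimated matching distribution
    # Ground-truth matching distribution: uniform over same-class instances.
    match = (labels[:, None] == labels[None, :]) & ~eye
    target = match.float()
    target = target / target.sum(dim=1, keepdim=True).clamp(min=1)
    loss_terms = target * log_p.masked_fill(eye, 0.0)              # diagonal target is 0; avoid 0 * (-inf)
    return -loss_terms.sum(dim=1).mean()
```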
Abstract Diagrammatic Reasoning with Multiplex Graph Networks
Title | Abstract Diagrammatic Reasoning with Multiplex Graph Networks |
Authors | Anonymous |
Abstract | Abstract reasoning, particularly in the visual domain, is a complex human ability, but it remains a challenging problem for artificial neural learning systems. In this work we propose MXGNet, a multilayer graph neural network for multi-panel diagrammatic reasoning tasks. MXGNet combines three powerful concepts, namely, object-level representation, graph neural networks and multiplex graphs, for solving visual reasoning tasks. MXGNet first extracts object-level representations for each element in all panels of the diagrams, and then forms a multi-layer multiplex graph capturing multiple relations between objects across different diagram panels. MXGNet summarises the multiple graphs extracted from the diagrams of the task, and uses this summarisation to pick the most probable answer from the given candidates. We have tested MXGNet on two types of diagrammatic reasoning tasks, namely Diagram Syllogisms and Raven Progressive Matrices (RPM). For an Euler Diagram Syllogism task MXGNet achieves state-of-the-art accuracy of 99.8%. For PGM and RAVEN, two comprehensive datasets for RPM reasoning, MXGNet outperforms the state-of-the-art models by a considerable margin. |
Tasks | Visual Reasoning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=ByxQB1BKwH |
https://openreview.net/pdf?id=ByxQB1BKwH | |
PWC | https://paperswithcode.com/paper/abstract-diagrammatic-reasoning-with |
Repo | |
Framework | |
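The distinctive ingredient named in the abstract is the multiplex graph, where a single pair of object nodes is connected by several parallel edge types. A rough, hypothetical sketch of such an edge module is below; it only illustrates the multiplex idea, not MXGNet's full panel-level architecture.

```python
import torch
import torch.nn as nn

class MultiplexEdge(nn.Module):
    """Several parallel edge 'layers', each with its own MLP and gate, connect the
    same pair of object nodes; their messages are summed (illustrative sketch)."""
    def __init__(self, node_dim, n_layers=4, hidden=64):
        super().__init__()
        self.edges = nn.ModuleList(
            nn.Sequential(nn.Linear(2 * node_dim, hidden), nn.ReLU(), nn.Linear(hidden, node_dim))
            for _ in range(n_layers)
        )
        self.gates = nn.ModuleList(nn.Linear(2 * node_dim, 1) for _ in range(n_layers))

    def forward(self, src, dst):
        pair = torch.cat([src, dst], dim=-1)                       # features of the node pair
        msgs = [torch.sigmoid(g(pair)) * e(pair) for e, g in zip(self.edges, self.gates)]
        return torch.stack(msgs, dim=0).sum(dim=0)                 # aggregate over edge layers
```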
MULTIPOLAR: Multi-Source Policy Aggregation for Transfer Reinforcement Learning between Diverse Environmental Dynamics
Title | MULTIPOLAR: Multi-Source Policy Aggregation for Transfer Reinforcement Learning between Diverse Environmental Dynamics |
Authors | Anonymous |
Abstract | Transfer reinforcement learning (RL) aims at improving learning efficiency of an agent by exploiting knowledge from other source agents trained on relevant tasks. However, it remains challenging to transfer knowledge between different environmental dynamics without having access to the source environments. In this work, we explore a new challenge in transfer RL, where only a set of source policies collected under unknown diverse dynamics is available for learning a target task efficiently. To address this problem, the proposed approach, MULTI-source POLicy AggRegation (MULTIPOLAR), comprises two key techniques. We learn to aggregate the actions provided by the source policies adaptively to maximize the target task performance. Meanwhile, we learn an auxiliary network that predicts residuals around the aggregated actions, which ensures the target policy’s expressiveness even when some of the source policies perform poorly. We demonstrated the effectiveness of MULTIPOLAR through an extensive experimental evaluation across six simulated environments ranging from classic control problems to challenging robotics simulations, under both continuous and discrete action spaces. |
Tasks | Transfer Reinforcement Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Byx9p2EtDH |
https://openreview.net/pdf?id=Byx9p2EtDH | |
PWC | https://paperswithcode.com/paper/multipolar-multi-source-policy-aggregation-1 |
Repo | |
Framework | |
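As a rough illustration of the two techniques in the abstract, the sketch below aggregates actions from frozen source policies with learned weights and adds a residual predicted by an auxiliary network. The class, names, and exact aggregation rule are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn

class MultipolarPolicy(nn.Module):
    """Hedged sketch: learned aggregation over frozen source policies plus a residual net."""
    def __init__(self, source_policies, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.sources = source_policies                     # frozen callables: obs -> action
        k = len(source_policies)
        self.agg_weights = nn.Parameter(torch.ones(k, act_dim))   # learned, state-independent weights
        self.residual = nn.Sequential(                     # predicts residuals around the aggregate
            nn.Linear(obs_dim, hidden), nn.Tanh(), nn.Linear(hidden, act_dim)
        )

    def forward(self, obs):
        with torch.no_grad():                              # source policies stay fixed
            acts = torch.stack([p(obs) for p in self.sources], dim=0)   # (k, B, act_dim)
        aggregated = (self.agg_weights.unsqueeze(1) * acts).sum(dim=0)
        return aggregated + self.residual(obs)
```

The residual head is what keeps the target policy expressive when the source policies are poor: even if the aggregation contributes nothing useful, the residual network can still represent a full policy on its own.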
Semantically-Guided Representation Learning for Self-Supervised Monocular Depth
Title | Semantically-Guided Representation Learning for Self-Supervised Monocular Depth |
Authors | Anonymous |
Abstract | Self-supervised learning is showing great promise for monocular depth estimation, using geometry as the only source of supervision. Depth networks are indeed capable of learning representations that relate visual appearance to 3D properties by implicitly leveraging category-level patterns. In this work we investigate how to leverage this semantic structure more directly to guide geometric representation learning, while remaining in the self-supervised regime. Instead of using semantic labels and proxy losses in a multi-task approach, we propose a new architecture leveraging fixed pretrained semantic segmentation networks to guide self-supervised representation learning via pixel-adaptive convolutions. Furthermore, we propose a two-stage training process to overcome a common semantic bias on dynamic objects via resampling. Our method improves upon the state of the art for self-supervised monocular depth prediction over all pixels, on fine-grained details, and per semantic category. |
Tasks | Depth Estimation, Monocular Depth Estimation, Representation Learning, Semantic Segmentation |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=ByxT7TNFvH |
https://openreview.net/pdf?id=ByxT7TNFvH | |
PWC | https://paperswithcode.com/paper/semantically-guided-representation-learning |
Repo | |
Framework | |
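The key operator named in the abstract is the pixel-adaptive convolution, in which guidance features (here, from a fixed semantic segmentation network) modulate how much each neighbour contributes to an otherwise standard convolution. The sketch below shows one common formulation of the operator with a Gaussian affinity on guidance features; it is an illustration under those assumptions, not the paper's depth network.

```python
import torch
import torch.nn.functional as F

def pixel_adaptive_conv(x, guide, weight, kernel_size=3):
    """Hedged sketch of a pixel-adaptive convolution.
    x:      (B, C_in, H, W) input features
    guide:  (B, C_g, H, W) semantic guidance features
    weight: (C_out, C_in, k, k) learned convolution weights
    """
    b, c, h, w = x.shape
    k, pad = kernel_size, kernel_size // 2
    # Unfold input and guidance into k*k neighbourhoods per pixel.
    x_un = F.unfold(x, k, padding=pad).view(b, c, k * k, h, w)
    g_un = F.unfold(guide, k, padding=pad).view(b, guide.shape[1], k * k, h, w)
    # Gaussian affinity between the centre pixel's guidance and each neighbour's guidance.
    centre = guide.unsqueeze(2)                                    # (B, C_g, 1, H, W)
    affinity = torch.exp(-0.5 * ((g_un - centre) ** 2).sum(dim=1, keepdim=True))
    # Re-weight neighbours by affinity before applying the learned conv weights.
    out = (weight.view(1, -1, c, k * k, 1, 1) * (affinity * x_un).unsqueeze(1)).sum(dim=(2, 3))
    return out                                                     # (B, C_out, H, W)
```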
Generative Integration Networks
Title | Generative Integration Networks |
Authors | Anonymous |
Abstract | This paper presents an unbiased exploration framework for the belief state $p(s)$ in non-cooperative, multi-agent, partially-observable environments through differentiable recurrent functions. Beyond single-agent exploration via intrinsic reward and generative RNNs, several researchers have proposed differentiable multi-agent communication models such as CommNet and IC3Net for scalable exploration through multiple agents. However, none of the existing frameworks captures an unbiased belief state in non-cooperative settings, because adversarial agents report biased examples. *Generative integration networks* (GINs) are the first unbiased exploration framework inspired by honest-reporting mechanisms in economics. The key idea is *synchrony*, an inter-agent reward that discriminates honest reporting from adversarial reporting **without real examples**, which distinguishes GINs from GANs. Experimental results obtained in two non-cooperative multi-agent environments with up to 20 agents show that GINs achieve state-of-the-art performance among exploration frameworks. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rkgcsyBKDH |
https://openreview.net/pdf?id=rkgcsyBKDH | |
PWC | https://paperswithcode.com/paper/generative-integration-networks |
Repo | |
Framework | |
Discriminator Based Corpus Generation for General Code Synthesis
Title | Discriminator Based Corpus Generation for General Code Synthesis |
Authors | Alexander Wild, Barry Porter |
Abstract | Current work on neural code synthesis consists of increasingly sophisticated architectures being trained on highly simplified domain-specific languages, using uniform sampling across the program space of those languages for training. By comparison, program space for a C-like language is vast, and extremely sparsely populated in terms of ‘useful’ functionalities; this requires a far more intelligent approach to corpus generation for effective training. We use a genetic programming approach with an iteratively retrained discriminator to produce a population suitable as labelled training data for a neural code synthesis architecture. We demonstrate that use of a discriminator-based training corpus generator, trained using only unlabelled problem specifications in classic Programming-by-Example format, greatly improves network performance compared to current uniform sampling techniques. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rkxDon4Yvr |
https://openreview.net/pdf?id=rkxDon4Yvr | |
PWC | https://paperswithcode.com/paper/discriminator-based-corpus-generation-for |
Repo | |
Framework | |
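A hedged sketch of the overall idea: a genetic-programming population of candidate programs is evolved using a discriminator's score as fitness, and in the paper's setting the discriminator is itself periodically retrained against the unlabelled problem specifications. All callables below are hypothetical placeholders, not the authors' API.

```python
import random

def evolve_corpus(random_program, mutate, discriminator, pop_size=256, generations=50):
    """Discriminator-guided corpus generation (illustrative sketch).
      random_program() -> program       : samples a random program
      mutate(program)  -> program       : applies a GP mutation/crossover step
      discriminator(program) -> float   : score of how 'useful' the program looks
    """
    population = [random_program() for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the programs the discriminator judges most like useful target functionality.
        survivors = sorted(population, key=discriminator, reverse=True)[: pop_size // 2]
        children = [mutate(random.choice(survivors)) for _ in range(pop_size - len(survivors))]
        population = survivors + children
    return population                     # labelled training corpus for the synthesis network
```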
Promoting Coordination through Policy Regularization in Multi-Agent Deep Reinforcement Learning
Title | Promoting Coordination through Policy Regularization in Multi-Agent Deep Reinforcement Learning |
Authors | Anonymous |
Abstract | A central challenge in multi-agent reinforcement learning is the induction of coordination between agents of a team. In this work, we investigate how to promote inter-agent coordination using policy regularization and discuss two possible avenues, based respectively on inter-agent modelling and synchronized sub-policy selection. We test each approach in four challenging continuous control tasks with sparse rewards and compare them against three baselines including MADDPG, a state-of-the-art multi-agent reinforcement learning algorithm. To ensure a fair comparison, we rely on a thorough hyper-parameter selection and training methodology that allows a fixed hyper-parameter search budget for each algorithm and environment. We consequently assess the hyper-parameter sensitivity, sample efficiency, and asymptotic performance of each learning method. Our experiments show that the proposed methods lead to significant improvements on cooperative problems. We further analyse the effects of the proposed regularizations on the behaviors learned by the agents. |
Tasks | Continuous Control, Multi-agent Reinforcement Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BkggGREKvS |
https://openreview.net/pdf?id=BkggGREKvS | |
PWC | https://paperswithcode.com/paper/promoting-coordination-through-policy-1 |
Repo | |
Framework | |
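Of the two avenues mentioned, inter-agent modelling lends itself to a simple illustration: an auxiliary head predicts teammates' actions from the agent's own policy features, and its prediction error is added to the policy loss as a regularizer. The sketch below is one plausible instantiation under that assumption, not the paper's exact objective.

```python
import torch.nn as nn
import torch.nn.functional as F

class TeammateModellingHead(nn.Module):
    """Hedged sketch of an inter-agent-modelling regularizer."""
    def __init__(self, feature_dim, n_teammates, act_dim):
        super().__init__()
        self.head = nn.Linear(feature_dim, n_teammates * act_dim)

    def loss(self, policy_features, teammate_actions):
        # teammate_actions: (batch, n_teammates * act_dim), flattened continuous actions
        predicted = self.head(policy_features)
        return F.mse_loss(predicted, teammate_actions)   # added to the policy loss as a regularizer
```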
Policy Optimization In the Face of Uncertainty
Title | Policy Optimization In the Face of Uncertainty |
Authors | Anonymous |
Abstract | Model-based reinforcement learning has the potential to be more sample efficient than model-free approaches. However, existing model-based methods are vulnerable to model bias, which leads to poor generalization and asymptotic performance compared to model-free counterparts. In this paper, we propose a novel policy optimization framework using an uncertainty-aware objective function to handle those issues. In this framework, the agent simultaneously learns an uncertainty-aware dynamics model and optimizes the policy according to these learned models. Under this framework, the objective function can be represented end-to-end as a single computational graph, which allows seamless policy gradient computation via backpropagation through the models. In addition to being theoretically sound, our approach shows promising results on challenging continuous control benchmarks with competitive asymptotic performance and sample complexity compared to state-of-the-art baselines. |
Tasks | Continuous Control |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HJg3Rp4FwH |
https://openreview.net/pdf?id=HJg3Rp4FwH | |
PWC | https://paperswithcode.com/paper/policy-optimization-in-the-face-of |
Repo | |
Framework | |
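As a rough sketch of an end-to-end differentiable model-based objective, the snippet below averages returns over an ensemble of learned dynamics models and lets the policy gradient flow through the model rollouts via backpropagation. The ensemble average stands in for the paper's uncertainty-aware objective, which is not reproduced here.

```python
def model_based_objective(policy, models, reward_fn, s0, horizon=10):
    """Hedged sketch: differentiable return averaged over an ensemble of dynamics models.
    policy, models[i], reward_fn are differentiable callables (e.g. torch modules)."""
    total = 0.0
    for model in models:                      # ensemble members capture model uncertainty
        s, ret = s0, 0.0
        for _ in range(horizon):
            a = policy(s)
            s = model(s, a)                   # differentiable predicted next state
            ret = ret + reward_fn(s, a)
        total = total + ret
    return total / len(models)                # maximize by backpropagating through the models
```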
Online Meta-Critic Learning for Off-Policy Actor-Critic Methods
Title | Online Meta-Critic Learning for Off-Policy Actor-Critic Methods |
Authors | Anonymous |
Abstract | Off-Policy Actor-Critic (Off-PAC) methods have proven successful in a variety of continuous control tasks. Normally, the critic’s action-value function is updated using temporal-difference, and the critic in turn provides a loss for the actor that trains it to take actions with higher expected return. In this paper, we introduce a novel and flexible meta-critic that observes the learning process and meta-learns an additional loss for the actor that accelerates and improves actor-critic learning. Compared to the vanilla critic, the meta-critic network is explicitly trained to accelerate the learning process; and compared to existing meta-learning algorithms, meta-critic is rapidly learned online for a single task, rather than slowly over a family of tasks. Crucially, our meta-critic framework is designed for off-policy based learners, which currently provide state-of-the-art reinforcement learning sample efficiency. We demonstrate that online meta-critic learning leads to improvements in a variety of continuous control environments when combined with contemporary Off-PAC methods DDPG, TD3 and the state-of-the-art SAC. |
Tasks | Continuous Control, Meta-Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=H1lKd6NYPS |
https://openreview.net/pdf?id=H1lKd6NYPS | |
PWC | https://paperswithcode.com/paper/online-meta-critic-learning-for-off-policy |
Repo | |
Framework | |
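A minimal sketch of the actor update implied by the abstract: the usual critic-derived loss is augmented by an auxiliary loss emitted by a meta-critic network. How the meta-critic itself is meta-trained online (by validating the post-update actor) is omitted, and all names are assumptions.

```python
def actor_update(actor, critic, meta_critic, states, actor_opt):
    """Hedged sketch of an Off-PAC actor step with a learned auxiliary meta-critic loss."""
    actions = actor(states)
    main_loss = -critic(states, actions).mean()       # standard actor loss from the vanilla critic
    aux_loss = meta_critic(states, actions).mean()    # auxiliary loss produced by the meta-critic
    actor_opt.zero_grad()
    (main_loss + aux_loss).backward()
    actor_opt.step()
```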
Generative Cleaning Networks with Quantized Nonlinear Transform for Deep Neural Network Defense
Title | Generative Cleaning Networks with Quantized Nonlinear Transform for Deep Neural Network Defense |
Authors | Anonymous |
Abstract | Effective defense of deep neural networks against adversarial attacks remains a challenging problem, especially under white-box attacks. In this paper, we develop a new generative cleaning network with quantized nonlinear transform for effective defense of deep neural networks. The generative cleaning network, equipped with a trainable quantized nonlinear transform block, is able to destroy the sophisticated noise pattern of adversarial attacks and recover the original image content. The generative cleaning network and attack detector network are jointly trained using adversarial learning to minimize both perceptual loss and adversarial loss. Our extensive experimental results demonstrate that our approach outperforms state-of-the-art methods by large margins in both white-box and black-box attacks. For example, it improves the classification accuracy for white-box attacks upon the second best method by more than 40% on the SVHN dataset and more than 20% on the challenging CIFAR-10 dataset. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SkxOhANKDr |
https://openreview.net/pdf?id=SkxOhANKDr | |
PWC | https://paperswithcode.com/paper/generative-cleaning-networks-with-quantized |
Repo | |
Framework | |
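The trainable quantized nonlinear transform can be pictured as a bounded nonlinearity followed by activation quantization with a straight-through gradient, so fine-grained adversarial perturbations are destroyed while the block stays trainable. The sketch below illustrates that idea only; it is not the paper's exact block.

```python
import torch
import torch.nn as nn

class QuantizedTransform(nn.Module):
    """Hedged sketch of a trainable quantized nonlinear transform block."""
    def __init__(self, channels, levels=16):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.levels = levels

    def forward(self, x):
        h = torch.tanh(self.conv(x))                     # bounded nonlinear transform
        q = torch.round((h + 1) / 2 * (self.levels - 1)) / (self.levels - 1) * 2 - 1
        return h + (q - h).detach()                      # straight-through quantized activations
```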
Imitation Learning of Robot Policies using Language, Vision and Motion
Title | Imitation Learning of Robot Policies using Language, Vision and Motion |
Authors | Anonymous |
Abstract | In this work we propose a novel end-to-end imitation learning approach which combines natural language, vision, and motion information to produce an abstract representation of a task, which in turn can be used to synthesize specific motion controllers at run-time. This multimodal approach enables generalization to a wide variety of environmental conditions and allows an end-user to influence a robot policy through verbal communication. We empirically validate our approach with an extensive set of simulations and show that it achieves a high task success rate over a variety of conditions while remaining amenable to probabilistic interpretability. |
Tasks | Imitation Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Bkg5LgrYwS |
https://openreview.net/pdf?id=Bkg5LgrYwS | |
PWC | https://paperswithcode.com/paper/imitation-learning-of-robot-policies-using |
Repo | |
Framework | |
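The high-level idea of fusing language and vision into a task representation that conditions the motion controller can be sketched very schematically as below. The encoders, dimensions, and fusion are placeholder assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class MultimodalTaskEncoder(nn.Module):
    """Hedged sketch: encode an instruction and image features into one task embedding."""
    def __init__(self, vocab_size, img_feat_dim, task_dim=64):
        super().__init__()
        self.lang = nn.EmbeddingBag(vocab_size, task_dim)     # bag-of-words instruction encoder
        self.vision = nn.Linear(img_feat_dim, task_dim)       # projection of image features
        self.fuse = nn.Linear(2 * task_dim, task_dim)

    def forward(self, token_ids, img_features):
        z = torch.cat([self.lang(token_ids), self.vision(img_features)], dim=-1)
        return torch.tanh(self.fuse(z))                       # task embedding that conditions the controller
```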
Model Based Reinforcement Learning for Atari
Title | Model Based Reinforcement Learning for Atari |
Authors | Anonymous |
Abstract | Model-free reinforcement learning (RL) can be used to learn effective policies for complex tasks, such as Atari games, even from image observations. However, this typically requires very large amounts of interaction – substantially more, in fact, than a human would need to learn the same games. How can people learn so quickly? Part of the answer may be that people can learn how the game works and predict which actions will lead to desirable outcomes. In this paper, we explore how video prediction models can similarly enable agents to solve Atari games with fewer interactions than model-free methods. We describe Simulated Policy Learning (SimPLe), a complete model-based deep RL algorithm based on video prediction models and present a comparison of several model architectures, including a novel architecture that yields the best results in our setting. Our experiments evaluate SimPLe on a range of Atari games in the low-data regime of 100k interactions between the agent and the environment, which corresponds to two hours of real-time play. In most games SimPLe outperforms state-of-the-art model-free algorithms, in some games by over an order of magnitude. |
Tasks | Atari Games, Video Prediction |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=S1xCPJHtDB |
https://openreview.net/pdf?id=S1xCPJHtDB | |
PWC | https://paperswithcode.com/paper/model-based-reinforcement-learning-for-atari-1 |
Repo | |
Framework | |
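The core of SimPLe is an alternating loop: gather a small amount of real experience, fit a video-prediction world model to all real data so far, then train the policy purely inside the learned model. A schematic sketch, with all callables and the iteration count as hypothetical placeholders rather than the paper's exact schedule:

```python
def simple_style_loop(collect_real, fit_world_model, train_policy_in_model,
                      initial_policy, n_iterations=15):
    """Hedged sketch of a SimPLe-style model-based RL loop."""
    policy, real_data = initial_policy, []
    for _ in range(n_iterations):
        real_data += collect_real(policy)                    # small amount of real interaction
        world_model = fit_world_model(real_data)             # supervised video prediction
        policy = train_policy_in_model(world_model, policy)  # e.g. policy gradient on simulated rollouts
    return policy
```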
Adversarial Filters of Dataset Biases
Title | Adversarial Filters of Dataset Biases |
Authors | Anonymous |
Abstract | Large-scale benchmark datasets have been among the major driving forces in AI, supporting training of models and measuring their progress. The key assumption is that these benchmarks are realistic approximations of the target tasks in the real world. However, while machine performance on these benchmarks advances rapidly — often surpassing human performance — it still struggles on the target tasks in the wild. This raises an important question: whether the surprisingly high performance on existing benchmarks is inflated due to spurious biases in them, and if so, how we can effectively revise these benchmarks to better simulate more realistic problem distributions in the real world. In this paper, we posit that while the real world problems consist of a great deal of long-tail problems, existing benchmarks are overly populated with a great deal of similar (thus non-tail) problems, which in turn, leads to a major overestimation of true AI performance. To address this challenge, we present a novel framework of Adversarial Filters to investigate model-based reduction of dataset biases. We discuss that the optimum bias reduction via AFOptimum is intractable, thus propose AFLite, an iterative greedy algorithm that adversarially filters out data points to identify a reduced dataset with more realistic problem distributions and considerably fewer spurious biases. AFLite is lightweight and can in principle be applied to any task and dataset. We apply it to popular benchmarks that are practically solved — ImageNet and Natural Language Inference (SNLI, MNLI, QNLI) — and present filtered counterparts as new challenge datasets where the model performance drops considerably (e.g., from 84% to 24% for ImageNet and from 92% to 62% for SNLI), while human performance remains high. An extensive suite of analyses demonstrates that AFLite effectively reduces measurable dataset biases in both the synthetic and real datasets. Finally, we introduce new measures of dataset biases based on K-nearest-neighbors to help guide future research on dataset developments and bias reduction. |
Tasks | Natural Language Inference |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=H1g8p1BYvS |
https://openreview.net/pdf?id=H1g8p1BYvS | |
PWC | https://paperswithcode.com/paper/adversarial-filters-of-dataset-biases |
Repo | |
Framework | |
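The iterative greedy filtering can be sketched as: repeatedly train simple linear models on random splits of fixed feature representations, score each example by how often it is classified correctly when held out, and drop the most predictable (most biased) examples. The snippet below is a simplified sketch with placeholder hyper-parameters, not the authors' released implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def aflite(features, labels, target_size, n_models=64, train_frac=0.8, cut_per_iter=500):
    """Hedged sketch of AFLite-style adversarial filtering over fixed feature vectors."""
    idx = np.arange(len(labels))
    while len(idx) > target_size:
        correct = np.zeros(len(idx))
        counted = np.zeros(len(idx))
        for _ in range(n_models):
            perm = np.random.permutation(len(idx))
            split = int(train_frac * len(idx))
            tr, te = perm[:split], perm[split:]
            clf = LogisticRegression(max_iter=200).fit(features[idx[tr]], labels[idx[tr]])
            preds = clf.predict(features[idx[te]])
            correct[te] += (preds == labels[idx[te]])
            counted[te] += 1
        predictability = correct / np.maximum(counted, 1)       # held-out accuracy per example
        keep = np.argsort(predictability)[: max(len(idx) - cut_per_iter, target_size)]
        idx = idx[np.sort(keep)]                                # drop the most predictable examples
    return idx                                                  # indices of the filtered dataset
```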
Agent as Scientist: Learning to Verify Hypotheses
Title | Agent as Scientist: Learning to Verify Hypotheses |
Authors | Anonymous |
Abstract | In this paper, we formulate hypothesis verification as a reinforcement learning problem. Specifically, we aim to build an agent that, given a hypothesis about the dynamics of the world can take actions to generate observations which can help predict whether the hypothesis is true or false. Our first observation is that agents trained end-to-end with the reward fail to learn to solve this problem. In order to train the agents, we exploit the underlying structure in the majority of hypotheses – they can be formulated as triplets (pre-condition, action sequence, post-condition). Once the agents have been pretrained to verify hypotheses with this structure, they can be fine-tuned to verify more general hypotheses. Our work takes a step towards a “scientist agent” that develops an understanding of the world by generating and testing hypotheses about its environment. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Syxss0EYPS |
https://openreview.net/pdf?id=Syxss0EYPS | |
PWC | https://paperswithcode.com/paper/agent-as-scientist-learning-to-verify |
Repo | |
Framework | |
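The triplet structure the abstract exploits can be written down directly: a hypothesis holds if, whenever its pre-condition is met, executing the action sequence leads to a state satisfying the post-condition. A schematic sketch (in the paper an RL agent chooses the actions; here they are given, and a gym-style environment is assumed):

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Hypothesis:
    """Hedged sketch of the (pre-condition, action sequence, post-condition) triplet."""
    pre_condition: Callable[[object], bool]    # predicate on an observation
    action_sequence: List[int]                 # actions to execute in the environment
    post_condition: Callable[[object], bool]   # predicate on the resulting observation

def verify(env, h: Hypothesis):
    """Execute the action sequence and report whether the post-condition holds."""
    obs = env.reset()
    if not h.pre_condition(obs):
        return None                            # pre-condition not met: hypothesis not testable here
    for a in h.action_sequence:
        obs, _, _, _ = env.step(a)
    return h.post_condition(obs)
```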
A Boolean Task Algebra for Reinforcement Learning
Title | A Boolean Task Algebra for Reinforcement Learning |
Authors | Anonymous |
Abstract | We propose a framework for defining a Boolean algebra over the space of tasks. This allows us to formulate new tasks in terms of the negation, disjunction and conjunction of a set of base tasks. We then show that by learning goal-oriented value functions and restricting the transition dynamics of the tasks, an agent can solve these new tasks with no further learning. We prove that by composing these value functions in specific ways, we immediately recover the optimal policies for all tasks expressible under the Boolean algebra. We verify our approach in two domains, including a high-dimensional video game environment requiring function approximation, where an agent first learns a set of base skills, and then composes them to solve a super-exponential number of new tasks. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rJecbgHtDH |
https://openreview.net/pdf?id=rJecbgHtDH | |
PWC | https://paperswithcode.com/paper/a-boolean-task-algebra-for-reinforcement |
Repo | |
Framework | |
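The compositions the abstract refers to are typically realized as simple element-wise operations over goal-oriented value functions: max for disjunction, min for conjunction, and a reflection between the value functions of the maximally and minimally rewarding tasks for negation. A minimal sketch of those operators (the conditions under which they recover optimal policies are established in the paper, not here):

```python
import numpy as np

def q_or(q1, q2):
    """Disjunction of tasks: element-wise max over goal-oriented value functions."""
    return np.maximum(q1, q2)

def q_and(q1, q2):
    """Conjunction of tasks: element-wise min over goal-oriented value functions."""
    return np.minimum(q1, q2)

def q_not(q, q_max, q_min):
    """Negation: reflect a value function between the maximal and minimal tasks' values."""
    return (q_max + q_min) - q
```

With a set of base value functions learned once, any Boolean expression over the base tasks can then be evaluated by composing these operators, which is how an agent can solve a super-exponential number of new tasks without further learning.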