April 1, 2020

2952 words 14 mins read

Paper Group NANR 63

Reinforcement Learning with Probabilistically Complete Exploration. Unknown-Aware Deep Neural Network. DP-LSSGD: An Optimization Method to Lift the Utility in Privacy-Preserving ERM. BERT for Sequence-to-Sequence Multi-Label Text Classification. Latent Question Reformulation and Information Accumulation for Multi-Hop Machine Reading. Data-Driven Ap …

Reinforcement Learning with Probabilistically Complete Exploration

Title Reinforcement Learning with Probabilistically Complete Exploration
Authors Anonymous
Abstract Balancing exploration and exploitation remains a key challenge in reinforcement learning (RL). State-of-the-art RL algorithms suffer from high sample complexity, particularly in the sparse-reward case, where they can do no better than to explore in all directions until the first positive rewards are found. To mitigate this, we propose Rapidly Randomly-exploring Reinforcement Learning (R3L). We formulate exploration as a search problem and leverage widely used planning algorithms such as the Rapidly-exploring Random Tree (RRT) to find initial solutions. These solutions are used as demonstrations to initialize a policy, which is then refined by a generic RL algorithm, leading to faster and more stable convergence. We provide theoretical guarantees that R3L exploration finds successful solutions, as well as bounds on its sample complexity. We experimentally demonstrate that the method outperforms classic and intrinsic exploration techniques, requiring only a fraction of the exploration samples and achieving better asymptotic performance.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=BkgzqRVFDr
PDF https://openreview.net/pdf?id=BkgzqRVFDr
PWC https://paperswithcode.com/paper/reinforcement-learning-with-probabilistically
Repo
Framework
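
Since no code accompanies the entry above, here is a minimal sketch of what RRT-style exploration could look like in an RL setting. It assumes a gym-like environment whose simulator state can be reset to arbitrary tree nodes; `env.set_state` and `goal_test` are hypothetical helpers, and the authors' actual algorithm may differ:

```python
# Hypothetical sketch of RRT-style exploration for RL (not the authors' code).
import numpy as np

def r3l_explore(env, n_iters=10000, goal_test=None):
    """Grow a tree of visited states by random expansion toward sampled targets."""
    start = env.reset()
    tree = [(np.asarray(start), None, None)]  # (state, parent_index, action)
    for _ in range(n_iters):
        target = env.observation_space.sample()           # random target state
        dists = [np.linalg.norm(s - target) for s, _, _ in tree]
        idx = int(np.argmin(dists))                       # nearest tree node
        env.set_state(tree[idx][0])                       # assumed resettable simulator
        action = env.action_space.sample()                # random expansion step
        next_state, reward, done, _ = env.step(action)
        tree.append((np.asarray(next_state), idx, action))
        if goal_test and goal_test(next_state, reward):
            # Walk back to the root to recover a demonstration trajectory.
            demo, node = [], len(tree) - 1
            while tree[node][1] is not None:
                demo.append(tree[node][2])
                node = tree[node][1]
            return list(reversed(demo))
    return None
```

The returned action sequence can then serve as a demonstration to initialize a policy before refining it with a standard RL algorithm, as the abstract describes.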

Unknown-Aware Deep Neural Network

Title Unknown-Aware Deep Neural Network
Authors Anonymous
Abstract An important property of image classification systems in the real world is that they both accurately classify objects from target classes ("knowns") and safely reject unknown objects ("unknowns") that belong to classes not present in the training data. Unfortunately, although the strong generalization ability of existing CNNs ensures their accuracy when classifying known objects, it also causes them to often assign an unknown to a target class with high confidence. As a result, simply using low-confidence detections as a way to detect unknowns does not work well. In this work, we propose an Unknown-aware Deep Neural Network (UDN for short) to solve this challenging problem. The key idea of UDN is to enhance existing CNNs to support a product operation that models the product relationship among the features produced by convolutional layers. This way, missing a single key feature of a target class will greatly reduce the probability of assigning an object to that class. UDN uses a learned ensemble of these product operations, which allows it to balance the contradictory requirements of accurately classifying known objects and correctly rejecting unknowns. To further improve the performance of UDN at detecting unknowns, we propose an information-theoretic regularization strategy that incorporates the objective of rejecting unknowns into the learning process of UDN. We experiment on benchmark image datasets including MNIST, CIFAR-10, CIFAR-100, and SVHN, adding unknowns by injecting one dataset into another. Our results demonstrate that UDN significantly outperforms state-of-the-art methods at rejecting unknowns, improving rejection accuracy by 25 percentage points while still preserving classification accuracy.
Tasks Image Classification
Published 2020-01-01
URL https://openreview.net/forum?id=rkguLC4tPB
PDF https://openreview.net/pdf?id=rkguLC4tPB
PWC https://paperswithcode.com/paper/unknown-aware-deep-neural-network
Repo
Framework
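
The "product operation" in the abstract suggests multiplying per-feature detection probabilities, so that a single missing key feature collapses the class score. Below is an illustrative reconstruction of that idea, not the paper's architecture; the module and its dimensions are made up:

```python
# Illustrative sketch: a "product" class score that multiplies per-feature
# probabilities, so one missing key feature drives the score toward zero.
import torch
import torch.nn as nn

class ProductHead(nn.Module):
    def __init__(self, in_dim, n_features, n_classes):
        super().__init__()
        self.detect = nn.Linear(in_dim, n_features)        # feature detectors
        self.assign = nn.Parameter(torch.randn(n_classes, n_features))

    def forward(self, h):
        p = torch.sigmoid(self.detect(h))                  # P(feature present)
        w = torch.softmax(self.assign, dim=1)              # soft feature-to-class weights
        # Weighted log-product: sum_k w_ck * log p_k == log prod_k p_k^{w_ck}
        return torch.log(p + 1e-8) @ w.t()                 # low if a key feature is missing

# Usage: reject as "unknown" when the best class score falls below a threshold.
scores = ProductHead(128, 32, 10)(torch.randn(4, 128))
is_unknown = scores.max(dim=1).values < -5.0               # threshold is illustrative
```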

DP-LSSGD: An Optimization Method to Lift the Utility in Privacy-Preserving ERM

Title DP-LSSGD: An Optimization Method to Lift the Utility in Privacy-Preserving ERM
Authors Anonymous
Abstract Machine learning (ML) models trained by differentially private stochastic gradient descent (DP-SGD) have much lower utility than non-private ones. To mitigate this degradation, we propose DP Laplacian smoothing SGD (DP-LSSGD) to train ML models with differential privacy (DP) guarantees. At the core of DP-LSSGD is Laplacian smoothing, which smooths out the Gaussian noise used in the Gaussian mechanism. Under the same amount of noise, DP-LSSGD attains the same DP guarantee but better utility, especially in scenarios with strong DP guarantees. In practice, DP-LSSGD makes training both convex and nonconvex ML models more stable and enables the trained models to generalize better. The proposed algorithm is simple to implement, and its extra computational complexity and memory overhead compared with DP-SGD are negligible. DP-LSSGD is applicable to training a wide variety of ML models, including DNNs.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=BJlG5a4FvB
PDF https://openreview.net/pdf?id=BJlG5a4FvB
PWC https://paperswithcode.com/paper/dp-lssgd-an-optimization-method-to-lift-the
Repo
Framework
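
Laplacian smoothing can be applied to a noisy gradient by solving (I + σL)v = g, where L is the 1-D discrete graph Laplacian, which diagonalizes under the FFT for periodic boundaries. A minimal sketch of a DP-LSSGD-style step under these assumptions follows; the clipping norm, noise scale, and the exact smoothing operator are my choices, not necessarily the paper's:

```python
# Sketch of one DP-SGD step with Laplacian smoothing of the noisy gradient.
import numpy as np

def laplacian_smooth(v, sigma=1.0):
    """Apply (I + sigma*L)^-1 via FFT; L = 1-D periodic graph Laplacian."""
    d = v.size
    eig = 2.0 - 2.0 * np.cos(2.0 * np.pi * np.arange(d) / d)   # eigenvalues of L
    return np.real(np.fft.ifft(np.fft.fft(v) / (1.0 + sigma * eig)))

def dp_lssgd_step(w, grad, lr=0.1, clip=1.0, noise_std=1.0, sigma=1.0):
    g = grad / max(1.0, np.linalg.norm(grad) / clip)    # clip the gradient
    g = g + np.random.normal(0.0, noise_std, g.shape)   # Gaussian mechanism
    g = laplacian_smooth(g, sigma)                      # smooth the noisy gradient
    return w - lr * g
```

Because smoothing is a post-processing of the already-privatized gradient, the DP guarantee of the underlying Gaussian mechanism is preserved, which matches the abstract's claim of "same DP guarantee, better utility."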

BERT for Sequence-to-Sequence Multi-Label Text Classification

Title BERT for Sequence-to-Sequence Multi-Label Text Classification
Authors Anonymous
Abstract We study the BERT language representation model and a sequence generation model with a BERT encoder for the multi-label text classification task. We experiment with both models and explore their special qualities for this setting. We also introduce and examine experimentally a mixed model, which is an ensemble of multi-label BERT and sequence-generating BERT models. Our experiments demonstrate that BERT-based models, and the mixed model in particular, outperform current baselines on several metrics, achieving state-of-the-art results on three well-studied multi-label classification datasets with English texts and two private Yandex Taxi datasets with Russian texts.
Tasks Multi-Label Classification, Multi-Label Text Classification, Text Classification
Published 2020-01-01
URL https://openreview.net/forum?id=BJeHFlBYvB
PDF https://openreview.net/pdf?id=BJeHFlBYvB
PWC https://paperswithcode.com/paper/bert-for-sequence-to-sequence-milti-label
Repo
Framework
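
A hedged sketch of the multi-label branch (a sigmoid head over a pooled BERT encoding), plus one plausible way to mix its probabilities with those of a sequence-generating model. The averaging rule and all dimensions are assumptions; the abstract does not specify the ensembling details:

```python
# Sketch of the multi-label head and a simple probability-averaging ensemble.
import torch
import torch.nn as nn

class MultiLabelHead(nn.Module):
    def __init__(self, hidden=768, n_labels=90):
        super().__init__()
        self.classifier = nn.Linear(hidden, n_labels)

    def forward(self, pooled):                  # pooled: [batch, hidden] from a BERT encoder
        return torch.sigmoid(self.classifier(pooled))

pooled = torch.randn(2, 768)                    # stand-in for BERT pooled output
p_cls = MultiLabelHead()(pooled)                # per-label probabilities (classifier branch)
p_gen = torch.rand(2, 90)                       # stand-in for the generator's label probs
p_mixed = 0.5 * (p_cls + p_gen)                 # mixed model: average the two branches
labels = (p_mixed > 0.5).float()                # threshold to obtain the label set
```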

Latent Question Reformulation and Information Accumulation for Multi-Hop Machine Reading

Title Latent Question Reformulation and Information Accumulation for Multi-Hop Machine Reading
Authors Anonymous
Abstract Multi-hop text-based question answering is a current challenge in machine comprehension. This task requires sequentially integrating facts from multiple passages to answer complex natural language questions. In this paper, we propose a novel architecture, called the Latent Question Reformulation Network (LQR-net), a multi-hop and parallel attentive network designed for question-answering tasks that require reasoning capabilities. LQR-net is composed of alternating reading modules and reformulation modules. The purpose of the reading module is to produce a question-aware representation of the document. From this document representation, the reformulation module extracts essential elements to calculate an updated representation of the question. This updated question is then passed to the following hop. We evaluate our architecture on the HotpotQA question-answering dataset, designed to assess multi-hop reasoning capabilities. Our model achieves competitive results on the public leaderboard and outperforms the best current published models in terms of Exact Match (EM) and F1 score. Finally, we show that an analysis of the sequential reformulations can provide interpretable reasoning paths.
Tasks Question Answering, Reading Comprehension
Published 2020-01-01
URL https://openreview.net/forum?id=S1x63TEYvr
PDF https://openreview.net/pdf?id=S1x63TEYvr
PWC https://paperswithcode.com/paper/latent-question-reformulation-and-information
Repo
Framework
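
A toy sketch of the hop structure described above: alternate a reading step (condition document tokens on the current question) with a reformulation step (attention-pool the document into an updated question vector). All modules, dimensions, and the specific attention form are my assumptions:

```python
# Toy reading/reformulation hop; not the paper's exact layers.
import torch
import torch.nn as nn

class Hop(nn.Module):
    def __init__(self, d=128):
        super().__init__()
        self.read = nn.Linear(2 * d, d)       # fuse the question into each token
        self.reform = nn.Linear(d, d)         # extract an updated question

    def forward(self, doc, q):
        # Reading: produce a question-aware document representation.
        q_tiled = q.unsqueeze(1).expand(-1, doc.size(1), -1)
        doc_aware = torch.tanh(self.read(torch.cat([doc, q_tiled], dim=-1)))
        # Reformulation: attention-pool the document into a new question.
        attn = torch.softmax((doc_aware @ q.unsqueeze(-1)).squeeze(-1), dim=1)
        q_new = self.reform((attn.unsqueeze(-1) * doc_aware).sum(dim=1))
        return doc_aware, q_new

doc, q = torch.randn(2, 50, 128), torch.randn(2, 128)
for hop in nn.ModuleList([Hop(), Hop()]):     # two reasoning hops
    doc, q = hop(doc, q)
```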

Data-Driven Approach to Encoding and Decoding 3-D Crystal Structures

Title Data-Driven Approach to Encoding and Decoding 3-D Crystal Structures
Authors Anonymous
Abstract Generative models have achieved impressive results in many domains including image and text generation. In the natural sciences, generative models have led to rapid progress in automated drug discovery. Many of the current methods focus on either 1-D or 2-D representations of typically small, drug-like molecules. However, many molecules require 3-D descriptors and exceed the chemical complexity of commonly used datasets. We present a method to encode and decode the positions of atoms in 3-D molecules, along with a dataset of nearly 50,000 stable crystal unit cells containing from 1 to over 100 atoms. We construct a smooth and continuous 3-D density representation of each crystal based on the positions of different atoms. Two different neural networks were trained on a dataset of over 120,000 three-dimensional samples of single and repeating crystal structures. The first, an encoder-decoder pair, constructs a compressed latent-space representation of each molecule and then decodes this description into an accurate reconstruction of the input. The second network segments the resulting output into atoms and assigns each atom an atomic number. By generating compressed, continuous latent-space representations of molecules, we are able to decode random samples, interpolate between two molecules, and alter known molecules.
Tasks Drug Discovery, Text Generation
Published 2020-01-01
URL https://openreview.net/forum?id=S1enmaVFvS
PDF https://openreview.net/pdf?id=S1enmaVFvS
PWC https://paperswithcode.com/paper/data-driven-approach-to-encoding-and-decoding-1
Repo
Framework
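
A minimal sketch of the density representation described above: place a Gaussian at each atom position on a voxel grid to obtain a smooth, continuous 3-D input for the encoder. The grid resolution, cell size, and Gaussian width are illustrative choices, not the paper's values:

```python
# Build a smooth 3-D density grid from atom positions in a cubic cell.
import numpy as np

def density_grid(positions, cell=10.0, n=32, width=0.5):
    """positions: (n_atoms, 3) Cartesian coordinates inside a cubic cell."""
    axis = np.linspace(0.0, cell, n)
    X, Y, Z = np.meshgrid(axis, axis, axis, indexing="ij")
    grid = np.zeros((n, n, n))
    for p in positions:
        d2 = (X - p[0])**2 + (Y - p[1])**2 + (Z - p[2])**2
        grid += np.exp(-d2 / (2.0 * width**2))    # one Gaussian per atom
    return grid

atoms = np.random.uniform(0.0, 10.0, size=(8, 3))   # toy 8-atom unit cell
rho = density_grid(atoms)                            # encoder input, shape (32, 32, 32)
```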

Explaining A Black-box By Using A Deep Variational Information Bottleneck Approach

Title Explaining A Black-box By Using A Deep Variational Information Bottleneck Approach
Authors Anonymous
Abstract Interpretable machine learning has gained much attention recently. Briefness and comprehensiveness are both necessary in order to provide a large amount of information concisely when explaining a black-box decision system. However, existing interpretable machine learning methods fail to consider briefness and comprehensiveness simultaneously, leading to redundant explanations. We propose the variational information bottleneck for interpretation, VIBI, a system-agnostic interpretable method that provides a brief but comprehensive explanation. VIBI adopts an information-theoretic criterion, the information bottleneck principle, for finding such explanations. For each instance, VIBI selects key features that are maximally compressed about the input (briefness) and informative about the decision made by the black-box system on that input (comprehensiveness). We evaluate VIBI on three datasets and compare it with state-of-the-art interpretable machine learning methods in terms of both interpretability and fidelity, evaluated by human and quantitative metrics.
Tasks Interpretable Machine Learning
Published 2020-01-01
URL https://openreview.net/forum?id=BJlLdhNFPr
PDF https://openreview.net/pdf?id=BJlLdhNFPr
PWC https://paperswithcode.com/paper/explaining-a-black-box-by-using-a-deep
Repo
Framework
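
A hedged sketch of the selection idea in the entry above: an explainer scores input features, keeps the top-k as the "brief" explanation, and an approximator must reproduce the black-box output from only those features ("comprehensive"). Training with the information bottleneck objective and a differentiable relaxation of top-k selection is omitted; the modules are stand-ins:

```python
# Inference-time sketch of brief-but-comprehensive feature selection.
import torch
import torch.nn as nn

d, k = 20, 3
explainer = nn.Linear(d, d)                   # scores each input feature
approximator = nn.Linear(d, 5)                # mimics the black-box from the selection

x = torch.randn(8, d)
scores = explainer(x)
top = scores.topk(k, dim=1).indices           # indices of the k chosen features
mask = torch.zeros_like(x).scatter(1, top, 1.0)
z = x * mask                                  # brief explanation: only k features kept
y_hat = approximator(z)                       # should match the black-box decision
```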

Towards Scalable Imitation Learning for Multi-Agent Systems with Graph Neural Networks

Title Towards Scalable Imitation Learning for Multi-Agent Systems with Graph Neural Networks
Authors Anonymous
Abstract We propose a graph neural network (GNN) implementation that predicts and imitates motion behaviors from observed swarm trajectory data. The network’s ability to capture interaction dynamics in swarms is demonstrated through transfer learning. Finally, we discuss the inherent challenges in scaling GNNs and propose a method to improve scalability with layer-wise tuning and mixing of data enabled by padding.
Tasks Imitation Learning, Transfer Learning
Published 2020-01-01
URL https://openreview.net/forum?id=HJeANgBYwr
PDF https://openreview.net/pdf?id=HJeANgBYwr
PWC https://paperswithcode.com/paper/towards-scalable-imitation-learning-for-multi
Repo
Framework
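
One way to read the padding idea mentioned above: pad swarms of different sizes to a fixed maximum number of agents and mask out the padded slots inside message passing, so one network handles mixed-size data. The layer below is a generic illustration under that assumption, not the paper's architecture:

```python
# Masked message passing over a padded set of agents.
import torch
import torch.nn as nn

class MaskedMessagePassing(nn.Module):
    def __init__(self, d=16):
        super().__init__()
        self.msg = nn.Linear(d, d)
        self.upd = nn.Linear(2 * d, d)

    def forward(self, h, mask):                        # h: [B, N_max, d], mask: [B, N_max]
        m = self.msg(h) * mask.unsqueeze(-1)           # zero messages from padded agents
        agg = m.sum(dim=1, keepdim=True) - m           # aggregate over the other agents
        agg = agg / mask.sum(dim=1, keepdim=True).unsqueeze(-1).clamp(min=1.0)
        return torch.tanh(self.upd(torch.cat([h, agg], dim=-1))) * mask.unsqueeze(-1)

h = torch.randn(2, 10, 16)                             # padding budget of 10 agents
mask = torch.zeros(2, 10)
mask[0, :6], mask[1, :9] = 1, 1                        # a 6-agent and a 9-agent swarm
out = MaskedMessagePassing()(h, mask)
```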

Disentangled Cumulants Help Successor Representations Transfer to New Tasks

Title Disentangled Cumulants Help Successor Representations Transfer to New Tasks
Authors Anonymous
Abstract Biological intelligence can learn to solve many diverse tasks in a data-efficient manner by re-using basic knowledge and skills from one task to another. Furthermore, many such skills are acquired through latent learning, where no explicit supervision for skill acquisition is provided. This is in contrast to state-of-the-art reinforcement learning agents, which typically start learning each new task from scratch and struggle with knowledge transfer. In this paper we propose a principled way to learn and recombine a basis set of policies, which comes with certain guarantees on the coverage of the final task space. In particular, we construct a learning pipeline where an agent invests time to learn to perform intrinsically generated, goal-based tasks, and subsequently leverages this experience to quickly achieve a high level of performance on externally specified, often significantly more complex tasks through generalised policy improvement. We demonstrate both theoretically and empirically that such goal-based intrinsic tasks produce more transferable policies when the goals are specified in a space that exhibits a form of disentanglement.
Tasks Transfer Learning
Published 2020-01-01
URL https://openreview.net/forum?id=BylUMxSFwS
PDF https://openreview.net/pdf?id=BylUMxSFwS
PWC https://paperswithcode.com/paper/disentangled-cumulants-help-successor
Repo
Framework
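
The recombination step named in the abstract, generalised policy improvement (GPI), is easy to state concretely: given successor features for each basis policy and a weight vector defining the new task's reward, act greedily with respect to the best basis policy at each state. A small sketch with toy numbers:

```python
# Generalised policy improvement over a basis of policies at a single state.
import numpy as np

n_policies, n_actions, n_features = 4, 3, 5
psi = np.random.rand(n_policies, n_actions, n_features)  # successor features psi_i(s, a)
w = np.random.rand(n_features)                           # new task's reward weights

q = psi @ w                                # Q_i(s, a) = psi_i(s, a) . w, shape (policies, actions)
gpi_action = int(q.max(axis=0).argmax())   # best action under the best basis policy
```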

On Federated Learning of Deep Networks from Non-IID Data: Parameter Divergence and the Effects of Hyperparametric Methods

Title On Federated Learning of Deep Networks from Non-IID Data: Parameter Divergence and the Effects of Hyperparametric Methods
Authors Anonymous
Abstract Federated learning, where a global model is trained by iterative parameter averaging of locally computed updates, is a promising approach for distributed training of deep networks; it provides high communication efficiency and privacy preservability, which allows it to fit well into decentralized data environments, e.g., mobile-cloud ecosystems. However, despite these advantages, federated learning methods still struggle with non-IID training data on local devices (i.e., learners). In this regard, we study the effects of a variety of hyperparametric conditions under non-IID environments, to address important concerns in practical implementations: (i) We first investigate the parameter divergence of local updates to explain performance degradation from non-IID data; the origin of this divergence is identified both empirically and theoretically. (ii) We then revisit the effects of optimizers, network depth/width, and regularization techniques; our observations show that the well-known advantages of hyperparameter optimization strategies can yield diminishing returns with non-IID data. (iii) We finally categorize the reasons for the failure cases, based mainly on metrics of the parameter divergence.
Tasks Hyperparameter Optimization
Published 2020-01-01
URL https://openreview.net/forum?id=SJeOAJStwB
PDF https://openreview.net/pdf?id=SJeOAJStwB
PWC https://paperswithcode.com/paper/on-federated-learning-of-deep-networks-from
Repo
Framework
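
For concreteness, a minimal sketch of the iterative parameter averaging described above, plus one possible parameter-divergence metric (mean pairwise distance between local models). The metric is my illustration; the paper's exact divergence definitions may differ:

```python
# FedAvg-style aggregation and a simple divergence measure over local models.
import numpy as np

def fedavg(local_weights, n_samples):
    """Weighted average of local parameter vectors by sample counts."""
    coef = np.asarray(n_samples, dtype=float) / np.sum(n_samples)
    return sum(c * w for c, w in zip(coef, local_weights))

def parameter_divergence(local_weights):
    """Mean pairwise L2 distance between learners' parameters."""
    dists = [np.linalg.norm(a - b)
             for i, a in enumerate(local_weights)
             for b in local_weights[i + 1:]]
    return float(np.mean(dists))

locals_ = [np.random.randn(100) for _ in range(5)]   # 5 learners' parameter vectors
global_w = fedavg(locals_, n_samples=[50, 80, 30, 60, 40])
div = parameter_divergence(locals_)                  # tends to grow under non-IID data
```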

Avoiding Negative Side-Effects and Promoting Safe Exploration with Imaginative Planning

Title Avoiding Negative Side-Effects and Promoting Safe Exploration with Imaginative Planning
Authors Anonymous
Abstract With the recent proliferation of reinforcement learning (RL) agents for solving real-world tasks, safety emerges as a necessary ingredient for their successful application. In this paper, we focus on ensuring the safety of the agent while making sure that it does not cause any unnecessary disruptions to its environment. Current approaches to this problem, such as manually constraining the agent or adding a safety penalty to the reward function, can introduce bad incentives. In complex domains, these approaches are simply intractable, as they require knowing a priori all the possible unsafe scenarios an agent could encounter. We propose a model-based approach to safety that allows the agent to look into the future and be aware of the consequences of its actions. We learn the transition dynamics of the environment and generate a directed graph called the imaginative module. This graph encapsulates all possible trajectories that can be followed by the agent, allowing the agent to efficiently traverse the imagined environment without ever taking any action in reality. A baseline state, which can represent either a safe or an unsafe state (based on whichever is easier to define), is taken as a human input, and the imaginative module is used to predict whether the agent's current actions can cause it to end up in dangerous states in the future. Our imaginative module can be seen as a "plug-and-play" approach to ensuring safety, as it is compatible with any existing RL algorithm and any task with a discrete action space. Our method induces the agent to act safely while learning to solve the task. We experimentally validate our proposal on two gridworld environments and a self-driving car simulator, demonstrating that our approach visits unsafe states significantly less frequently than a baseline.
Tasks Safe Exploration
Published 2020-01-01
URL https://openreview.net/forum?id=HJe7bxBYvr
PDF https://openreview.net/pdf?id=HJe7bxBYvr
PWC https://paperswithcode.com/paper/avoiding-negative-side-effects-and-promoting
Repo
Framework
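
A toy sketch of the safety check implied by the entry above: given a learned transition graph (the "imaginative module"), test whether any unsafe state is reachable from the current state within a horizon, using plain breadth-first search. The graph, states, and horizon are made-up stand-ins:

```python
# Reachability check over a learned transition graph.
from collections import deque

def unsafe_reachable(graph, state, unsafe, horizon):
    """graph: dict state -> list of successor states (from a learned model)."""
    frontier, seen = deque([(state, 0)]), {state}
    while frontier:
        s, depth = frontier.popleft()
        if s in unsafe:
            return True
        if depth < horizon:
            for nxt in graph.get(s, []):
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, depth + 1))
    return False

graph = {"A": ["B"], "B": ["C", "D"], "C": [], "D": ["cliff"]}
print(unsafe_reachable(graph, "A", unsafe={"cliff"}, horizon=3))  # True
```

An agent can run such a check before committing to an action and fall back to a safe alternative whenever an unsafe state lies within the imagined horizon.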

Fourier networks for uncertainty estimates and out-of-distribution detection

Title Fourier networks for uncertainty estimates and out-of-distribution detection
Authors Anonymous
Abstract A simple method for obtaining uncertainty estimates for neural network classifiers (e.g. for out-of-distribution detection) is to use an ensemble of independently trained networks and average the softmax outputs. While this method works, its results are still very far from human performance on standard datasets. We investigate how this method works and observe three fundamental limitations: “unreasonable” extrapolation, “unreasonable” agreement between the networks in an ensemble, and the filtering out of features that distinguish the training distribution from some out-of-distribution inputs but do not contribute to the classification. To mitigate these problems we suggest “large” initializations in the first layers and changing the activation function to sin(x) in the last hidden layer. We show that this combines the out-of-distribution behavior of nearest-neighbor methods with the generalization capabilities of neural networks, and achieves greatly improved out-of-distribution detection on standard datasets (MNIST/fashionMNIST/notMNIST, SVHN/CIFAR10).
Tasks Out-of-Distribution Detection
Published 2020-01-01
URL https://openreview.net/forum?id=rkgAb1Btvr
PDF https://openreview.net/pdf?id=rkgAb1Btvr
PWC https://paperswithcode.com/paper/fourier-networks-for-uncertainty-estimates
Repo
Framework
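
The two suggested changes are simple to sketch: scale up the first-layer initialization and use sin as the activation of the last hidden layer. The network sizes and the scale factor below are illustrative, not the paper's values:

```python
# MLP with a "large" first-layer init and sin in the last hidden layer.
import torch
import torch.nn as nn

class FourierNet(nn.Module):
    def __init__(self, d_in=784, d_hidden=256, n_classes=10, init_scale=8.0):
        super().__init__()
        self.fc1 = nn.Linear(d_in, d_hidden)
        self.fc2 = nn.Linear(d_hidden, d_hidden)
        self.out = nn.Linear(d_hidden, n_classes)
        with torch.no_grad():
            self.fc1.weight.mul_(init_scale)   # "large" first-layer initialization

    def forward(self, x):
        h = torch.relu(self.fc1(x))
        h = torch.sin(self.fc2(h))             # sin activation in the last hidden layer
        return self.out(h)

logits = FourierNet()(torch.randn(4, 784))
```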

Sharing Knowledge in Multi-Task Deep Reinforcement Learning

Title Sharing Knowledge in Multi-Task Deep Reinforcement Learning
Authors Anonymous
Abstract We study the benefit of sharing representations among tasks to enable the effective use of deep neural networks in multi-task reinforcement learning. We leverage the assumption that learning from different tasks that share common properties helps to generalize knowledge across them, resulting in more effective feature extraction than learning a single task. Intuitively, the resulting set of features offers performance benefits when used by reinforcement learning algorithms. We prove this by providing theoretical guarantees that highlight the conditions under which it is beneficial to share representations among tasks, extending the well-known finite-time bounds of approximate value iteration to the multi-task setting. In addition, we complement our analysis by proposing multi-task extensions of three reinforcement learning algorithms, which we empirically evaluate on widely used reinforcement learning benchmarks, showing significant improvements over their single-task counterparts in terms of sample efficiency and performance.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=rkgpv2VFvr
PDF https://openreview.net/pdf?id=rkgpv2VFvr
PWC https://paperswithcode.com/paper/sharing-knowledge-in-multi-task-deep
Repo
Framework
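
A minimal sketch of the shared-representation idea: one feature extractor feeding per-task heads, as a multi-task extension of a value-based agent might use. The architecture and sizes are illustrative, not the paper's:

```python
# Shared feature extractor with per-task Q-value heads.
import torch
import torch.nn as nn

class MultiTaskQNet(nn.Module):
    def __init__(self, obs_dim=8, n_actions=4, n_tasks=3, d=64):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(obs_dim, d), nn.ReLU(),
                                    nn.Linear(d, d), nn.ReLU())   # trained on all tasks
        self.heads = nn.ModuleList([nn.Linear(d, n_actions) for _ in range(n_tasks)])

    def forward(self, obs, task_id):
        return self.heads[task_id](self.shared(obs))   # task-specific Q-values

q = MultiTaskQNet()(torch.randn(5, 8), task_id=1)
```

Gradients from every task flow through `shared`, which is the mechanism by which experience on one task can improve the features used by all the others.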

MelNet: A Generative Model for Audio in the Frequency Domain

Title MelNet: A Generative Model for Audio in the Frequency Domain
Authors Anonymous
Abstract Capturing high-level structure in audio waveforms is challenging because a single second of audio spans tens of thousands of timesteps. While long-range dependencies are difficult to model directly in the time domain, we show that they can be more tractably modelled in two-dimensional time-frequency representations such as spectrograms. By leveraging this representational advantage, in conjunction with a highly expressive probabilistic model and a multiscale generation procedure, we design a model capable of generating high-fidelity audio samples that capture structure at timescales which time-domain models have yet to achieve. We demonstrate that our model captures longer-range dependencies than time-domain models such as WaveNet across a diverse set of unconditional generation tasks, including single-speaker speech generation, multi-speaker speech generation, and music generation.
Tasks Music Generation
Published 2020-01-01
URL https://openreview.net/forum?id=r1gIa0NtDH
PDF https://openreview.net/pdf?id=r1gIa0NtDH
PWC https://paperswithcode.com/paper/melnet-a-generative-model-for-audio-in-the-1
Repo
Framework
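
To make the time-frequency factorization concrete, here is a heavily simplified sketch: an RNN predicting the Gaussian parameters of each spectrogram frame from the previous frames. MelNet's actual model is a 2-D, multiscale autoregressive network over individual time-frequency bins; this only illustrates why modelling spectrograms shortens the sequences involved:

```python
# Frame-level autoregressive model over a mel spectrogram (toy stand-in).
import torch
import torch.nn as nn

n_mels, d = 80, 128
rnn = nn.GRU(n_mels, d, batch_first=True)
readout = nn.Linear(d, 2 * n_mels)            # per-bin mean and log-variance

spec = torch.randn(1, 200, n_mels)            # [batch, frames, mel bins]; ~2s of audio
h, _ = rnn(spec[:, :-1])                      # condition each frame on its predecessors
mu, logvar = readout(h).chunk(2, dim=-1)      # Gaussian parameters per frame
nll = 0.5 * (((spec[:, 1:] - mu) ** 2) / logvar.exp() + logvar).mean()
```

Two hundred frames here stand in for tens of thousands of raw waveform samples, which is the representational advantage the abstract describes.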

Model Ensemble-Based Intrinsic Reward for Sparse Reward Reinforcement Learning

Title Model Ensemble-Based Intrinsic Reward for Sparse Reward Reinforcement Learning
Authors Anonymous
Abstract In this paper, a new intrinsic reward generation method for sparse-reward reinforcement learning is proposed, based on an ensemble of dynamics models. In the proposed method, a mixture of multiple dynamics models is used to approximate the true unknown transition probability, and the intrinsic reward is designed as the minimum, over the ensemble, of the surprise of each dynamics model relative to the mixture. To show the effectiveness of the proposed intrinsic reward generation method, a working algorithm is constructed by combining it with the proximal policy optimization (PPO) algorithm. Numerical results show that for representative locomotion tasks, the proposed model-ensemble-based intrinsic reward generation method outperforms previous methods based on a single dynamics model.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=SyxJU64twr
PDF https://openreview.net/pdf?id=SyxJU64twr
PWC https://paperswithcode.com/paper/model-ensemble-based-intrinsic-reward-for
Repo
Framework
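
A hedged sketch of one plausible reading of that reward: with an ensemble of Gaussian dynamics models, score the observed next state under each model and under the uniform mixture, and take the smallest model-to-mixture surprise. This is my interpretation of the abstract, not the paper's exact formula:

```python
# Minimum model-to-mixture surprise over a Gaussian dynamics ensemble.
import numpy as np
from scipy.stats import multivariate_normal

def intrinsic_reward(s_next, means, covs):
    """means/covs: each model's Gaussian prediction of the next state."""
    likes = np.array([multivariate_normal.pdf(s_next, m, c)
                      for m, c in zip(means, covs)])
    mix = likes.mean()                                   # uniform mixture likelihood
    surprises = np.log(likes + 1e-12) - np.log(mix + 1e-12)
    return float(surprises.min())                        # least-surprised model wins

means = [np.zeros(2), np.ones(2) * 0.1, np.ones(2) * 0.2]
covs = [np.eye(2) * 0.05] * 3
print(intrinsic_reward(np.array([0.05, 0.05]), means, covs))
```

When the models disagree (novel transitions), even the least-surprised model deviates from the mixture and the bonus is large; where they agree, the bonus vanishes, which matches the exploration behavior the abstract targets.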