April 1, 2020

3030 words 15 mins read

Paper Group NANR 114


Causal Discovery with Reinforcement Learning. SAFE-DNN: A Deep Neural Network with Spike Assisted Feature Extraction for Noise Robust Inference. An Inductive Bias for Distances: Neural Nets that Respect the Triangle Inequality. GNN-FiLM: Graph Neural Networks with Feature-wise Linear Modulation. Distillation $\approx$ Early Stopping? Harvesting Dar …

Causal Discovery with Reinforcement Learning

Title Causal Discovery with Reinforcement Learning
Authors Anonymous
Abstract Discovering causal structure among a set of variables is a fundamental problem in many empirical sciences. Traditional score-based causal discovery methods rely on various local heuristics to search for a Directed Acyclic Graph (DAG) according to a predefined score function. While these methods, e.g., greedy equivalence search, may have attractive results with infinite samples and certain model assumptions, they are less satisfactory in practice due to finite data and possible violation of assumptions. Motivated by recent advances in neural combinatorial optimization, we propose to use Reinforcement Learning (RL) to search for the best-scoring DAG. Our encoder-decoder model takes observable data as input and generates graph adjacency matrices that are used to compute rewards. The reward incorporates both the predefined score function and two penalty terms for enforcing acyclicity. In contrast with typical RL applications where the goal is to learn a policy, we use RL as a search strategy and our final output is the graph, among all graphs generated during training, that achieves the best reward. We conduct experiments on both synthetic and real datasets, and show that the proposed approach not only has an improved search ability but also allows for a flexible score function under the acyclicity constraint.
Tasks Causal Discovery, Combinatorial Optimization
Published 2020-01-01
URL https://openreview.net/forum?id=S1g2skStPB
PDF https://openreview.net/pdf?id=S1g2skStPB
PWC https://paperswithcode.com/paper/causal-discovery-with-reinforcement-learning-1
Repo
Framework
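The reward above combines a predefined score with two acyclicity penalties. The exact penalty terms are not given in this listing, so the sketch below is a minimal stand-in: it uses the NOTEARS-style acyclicity measure $h(A) = \mathrm{tr}(e^{A \circ A}) - d$ (zero exactly when $A$ is a DAG) together with a hypothetical precomputed score, just to show how a generated adjacency matrix could be turned into a reward.

```python
import numpy as np
from scipy.linalg import expm

def acyclicity_violation(adj):
    """NOTEARS-style measure: zero iff the adjacency matrix describes a DAG."""
    d = adj.shape[0]
    return np.trace(expm(adj * adj)) - d

def reward(score, adj, lambda1=1.0, lambda2=10.0):
    """Combine a predefined score (e.g. BIC, lower is better) with two
    acyclicity penalties: an indicator penalty and a soft penalty on h(A)."""
    h = acyclicity_violation(adj)
    return -(score + lambda1 * float(h > 1e-8) + lambda2 * h)

# Example: a 3-node graph proposed by the decoder (this one is a DAG).
A = np.array([[0, 1, 0],
              [0, 0, 1],
              [0, 0, 0]], dtype=float)
print(reward(score=12.3, adj=A))  # no acyclicity penalty applied
```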

SAFE-DNN: A Deep Neural Network with Spike Assisted Feature Extraction for Noise Robust Inference

Title SAFE-DNN: A Deep Neural Network with Spike Assisted Feature Extraction for Noise Robust Inference
Authors Anonymous
Abstract We present a Deep Neural Network with Spike Assisted Feature Extraction (SAFE-DNN) to improve the robustness of classification under stochastic perturbation of inputs. The proposed network augments a DNN with unsupervised learning of low-level features using a spiking neural network (SNN) with Spike-Timing-Dependent Plasticity (STDP). The complete network learns to ignore local perturbations while performing global feature detection and classification. The experimental results on CIFAR-10 and an ImageNet subset demonstrate improved noise robustness for multiple DNN architectures without sacrificing accuracy on clean images.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=BJg1fgBYwH
PDF https://openreview.net/pdf?id=BJg1fgBYwH
PWC https://paperswithcode.com/paper/safe-dnn-a-deep-neural-network-with-spike
Repo
Framework

An Inductive Bias for Distances: Neural Nets that Respect the Triangle Inequality

Title An Inductive Bias for Distances: Neural Nets that Respect the Triangle Inequality
Authors Anonymous
Abstract Distances are pervasive in machine learning. They serve as similarity measures, loss functions, and learning targets; it is said that a good distance measure solves a task. When defining distances, the triangle inequality has proven to be a useful constraint, both theoretically—to prove convergence and optimality guarantees—and empirically—as an inductive bias. Deep metric learning architectures that respect the triangle inequality rely, almost exclusively, on Euclidean distance in the latent space. Though effective, this fails to model two broad classes of subadditive distances, common in graphs and reinforcement learning: asymmetric metrics, and metrics that cannot be embedded into Euclidean space. To address these problems, we introduce novel architectures that are guaranteed to satisfy the triangle inequality. We prove our architectures universally approximate norm-induced metrics on $\mathbb{R}^n$, and present a similar result for modified Input Convex Neural Networks. We show that our architectures outperform existing metric approaches when modeling graph distances and have a better inductive bias than non-metric approaches when training data is limited in the multi-goal reinforcement learning setting.
Tasks Metric Learning, Multi-Goal Reinforcement Learning
Published 2020-01-01
URL https://openreview.net/forum?id=HJeiDpVFPr
PDF https://openreview.net/pdf?id=HJeiDpVFPr
PWC https://paperswithcode.com/paper/an-inductive-bias-for-distances-neural-nets
Repo
Framework
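The paper's dedicated architectures are not reproduced here; the sketch below only illustrates the baseline fact the abstract starts from: any distance of the form $d(x, y) = \lVert f(x) - f(y) \rVert$ satisfies the triangle inequality for an arbitrary encoder $f$. This is the symmetric Euclidean special case that the proposed architectures generalize (e.g., to asymmetric metrics and metrics that cannot be embedded into Euclidean space).

```python
import torch
import torch.nn as nn

class LatentEuclideanMetric(nn.Module):
    """d(x, y) = ||f(x) - f(y)||_2. The triangle inequality is inherited
    from the norm in the latent space, for any encoder f."""
    def __init__(self, in_dim, latent_dim=32):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                               nn.Linear(64, latent_dim))

    def forward(self, x, y):
        return torch.norm(self.f(x) - self.f(y), dim=-1)

metric = LatentEuclideanMetric(in_dim=8)
x, y, z = torch.randn(3, 8).unbind(0)
# Triangle inequality holds by construction (up to numerical tolerance).
assert metric(x, z) <= metric(x, y) + metric(y, z) + 1e-5
```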

GNN-FiLM: Graph Neural Networks with Feature-wise Linear Modulation

Title GNN-FiLM: Graph Neural Networks with Feature-wise Linear Modulation
Authors Anonymous
Abstract This paper presents a new Graph Neural Network (GNN) type using feature-wise linear modulation (FiLM). Many standard GNN variants propagate information along the edges of a graph by computing "messages" based only on the representation of the source of each edge. In GNN-FiLM, the representation of the target node of an edge is additionally used to compute a transformation that can be applied to all incoming messages, allowing feature-wise modulation of the passed information. Results of experiments comparing different GNN architectures on three tasks from the literature are presented, based on re-implementations of baseline methods. Hyperparameters for all methods were found using extensive search, yielding somewhat surprising results: differences between baseline models are smaller than reported in the literature. Nonetheless, GNN-FiLM outperforms baseline methods on a regression task on molecular graphs and performs competitively on other tasks.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=HJe4Cp4KwH
PDF https://openreview.net/pdf?id=HJe4Cp4KwH
PWC https://paperswithcode.com/paper/gnn-film-graph-neural-networks-with-feature-1
Repo
Framework
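A minimal sketch of the modulation idea for a single edge type, assuming one FiLM layer that maps the target node's representation to a per-feature scale and shift applied to incoming messages; names and dimensions are illustrative and not the paper's reference implementation.

```python
import torch
import torch.nn as nn

class GNNFiLMLayer(nn.Module):
    """One propagation step: messages W*h_src are modulated feature-wise by
    (gamma, beta) computed from the target node's representation."""
    def __init__(self, dim):
        super().__init__()
        self.msg = nn.Linear(dim, dim, bias=False)
        self.film = nn.Linear(dim, 2 * dim)   # produces [gamma ; beta]

    def forward(self, h, edge_index):
        src, tgt = edge_index                  # shape: (2, num_edges)
        gamma, beta = self.film(h[tgt]).chunk(2, dim=-1)
        messages = torch.relu(gamma * self.msg(h[src]) + beta)
        out = torch.zeros_like(h)
        out.index_add_(0, tgt, messages)       # sum incoming messages per node
        return out

h = torch.randn(4, 16)                          # 4 nodes, 16 features
edges = torch.tensor([[0, 1, 2], [1, 2, 3]])    # directed edges src -> tgt
print(GNNFiLMLayer(16)(h, edges).shape)         # torch.Size([4, 16])
```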

Distillation $\approx$ Early Stopping? Harvesting Dark Knowledge Utilizing Anisotropic Information Retrieval For Overparameterized NN

Title Distillation $\approx$ Early Stopping? Harvesting Dark Knowledge Utilizing Anisotropic Information Retrieval For Overparameterized NN
Authors Anonymous
Abstract Distillation is a method to transfer knowledge from one model to another and often achieves higher accuracy with the same capacity. In this paper, we aim to provide a theoretical understanding of what mainly helps with distillation. Our answer is “early stopping”. Assuming that the teacher network is overparameterized, we argue that the teacher network is essentially harvesting dark knowledge from the data via early stopping. This can be justified by a new concept, Anisotropic Information Retrieval (AIR), which means that the neural network tends to fit the informative information first and the non-informative information (including noise) later. Motivated by the recent development on theoretically analyzing overparameterized neural networks, we can characterize AIR by the eigenspace of the Neural Tangent Kernel (NTK). AIR facilitates a new understanding of distillation. With that, we further utilize distillation to refine noisy labels. We propose a self-distillation algorithm to sequentially distill knowledge from the network in the previous training epoch to avoid memorizing the wrong labels. We also demonstrate, both theoretically and empirically, that self-distillation can benefit from more than just early stopping. Theoretically, we prove convergence of the proposed algorithm to the ground truth labels for randomly initialized overparameterized neural networks in terms of $\ell_2$ distance, while the previous result was on convergence in 0-1 loss. The theoretical result ensures that the learned neural network enjoys a margin on the training data, which leads to better generalization. Empirically, we achieve better testing accuracy and entirely avoid early stopping, which makes the algorithm more user-friendly.
Tasks Information Retrieval
Published 2020-01-01
URL https://openreview.net/forum?id=HJlF3h4FvB
PDF https://openreview.net/pdf?id=HJlF3h4FvB
PWC https://paperswithcode.com/paper/distillation-approx-early-stopping-harvesting-1
Repo
Framework
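A rough sketch, under assumptions, of the self-distillation loop described above: each epoch fits a mixture of the (possibly noisy) hard labels and the soft targets produced by the previous epoch's snapshot of the network. The loss weighting `alpha` and temperature `T` are hypothetical; the paper's exact algorithm and schedule are not given in this listing.

```python
import copy
import torch
import torch.nn.functional as F

def self_distillation_epoch(model, prev_model, loader, optimizer, alpha=0.5, T=2.0):
    """One epoch: fit a mix of the (possibly noisy) hard labels and the
    soft targets produced by the previous epoch's snapshot."""
    prev_model.eval()
    for x, y in loader:
        with torch.no_grad():
            soft = F.softmax(prev_model(x) / T, dim=-1)
        logits = model(x)
        loss = (1 - alpha) * F.cross_entropy(logits, y) \
             + alpha * T * T * F.kl_div(F.log_softmax(logits / T, dim=-1),
                                        soft, reduction='batchmean')
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return copy.deepcopy(model)  # becomes the teacher for the next epoch
```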

Understanding Top-k Sparsification in Distributed Deep Learning

Title Understanding Top-k Sparsification in Distributed Deep Learning
Authors Anonymous
Abstract Distributed stochastic gradient descent (SGD) algorithms are widely deployed in training large-scale deep learning models, while the communication overhead among workers becomes the new system bottleneck. Recently proposed gradient sparsification techniques, especially Top-$k$ sparsification with error compensation (TopK-SGD), can significantly reduce the communication traffic without an obvious impact on the model accuracy. Some theoretical studies have been carried out to analyze the convergence property of TopK-SGD. However, existing studies do not dive into the details of the Top-$k$ operator in gradient sparsification and use relaxed bounds (e.g., the exact bound of Random-$k$) for analysis; hence the derived results cannot well describe the real convergence performance of TopK-SGD. To this end, we first study the gradient distributions of TopK-SGD during the training process through extensive experiments. We then theoretically derive a tighter bound for the Top-$k$ operator. Finally, we exploit the property of the gradient distribution to propose an approximate top-$k$ selection algorithm, which is computationally efficient on GPUs, to improve the scaling efficiency of TopK-SGD by significantly reducing the computing overhead.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=B1gi0TEFDB
PDF https://openreview.net/pdf?id=B1gi0TEFDB
PWC https://paperswithcode.com/paper/understanding-top-k-sparsification-in
Repo
Framework
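A minimal sketch of Top-$k$ sparsification with error compensation for one worker's flattened gradient, the operator the analysis above studies; the paper's approximate, GPU-efficient top-$k$ selection is not shown.

```python
import torch

def topk_compress(grad, residual, k):
    """Error-compensated Top-k: add the residual from the previous step,
    keep the k largest-magnitude entries, carry the rest forward."""
    acc = grad + residual
    _, idx = torch.topk(acc.abs(), k)
    sparse = torch.zeros_like(acc)
    sparse[idx] = acc[idx]
    new_residual = acc - sparse          # error compensation
    return sparse, new_residual

g = torch.randn(1000)
residual = torch.zeros_like(g)
sparse, residual = topk_compress(g, residual, k=10)
print(sparse.count_nonzero())            # tensor(10)
```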

Incorporating Horizontal Connections in Convolution by Spatial Shuffling

Title Incorporating Horizontal Connections in Convolution by Spatial Shuffling
Authors Ikki Kishida, Hideki Nakayama
Abstract Convolutional Neural Networks (CNNs) are composed of multiple convolution layers and show strong performance in vision tasks. The design of the regular convolution is based on the Receptive Field (RF), within which information from a specific region is processed. From the viewpoint of the regular convolution's RF, the outputs of neurons in lower layers with smaller RFs are bundled to create neurons in higher layers with larger RFs. As a result, the neurons in higher layers are able to capture the global context even though the neurons in lower layers only see local information. However, in lower layers of the biological brain, information outside of the RF changes the properties of neurons. In this work, we extend the regular convolution and propose spatially shuffled convolution (ss convolution). In ss convolution, the regular convolution is able to use information outside of its RF through spatial shuffling, which is a simple and lightweight operation. We perform experiments on the CIFAR-10 and ImageNet-1k datasets, and show that ss convolution improves classification performance across various CNNs.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=SkgODpVFDr
PDF https://openreview.net/pdf?id=SkgODpVFDr
PWC https://paperswithcode.com/paper/incorporating-horizontal-connections-in
Repo
Framework
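The precise shuffling scheme is not detailed in this listing; the sketch below assumes a naive variant in which a fixed fraction of input channels have their spatial positions randomly permuted before a regular convolution, so each kernel also receives information from outside its receptive field.

```python
import torch
import torch.nn as nn

class SSConv2d(nn.Module):
    """Regular convolution preceded by spatial shuffling of a subset of channels."""
    def __init__(self, in_ch, out_ch, kernel_size, shuffle_ratio=0.25, **kw):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, **kw)
        self.n_shuffled = int(in_ch * shuffle_ratio)

    def forward(self, x):
        b, c, h, w = x.shape
        if self.n_shuffled > 0:
            part = x[:, :self.n_shuffled].reshape(b, self.n_shuffled, h * w)
            perm = torch.randperm(h * w, device=x.device)
            x = torch.cat([part[:, :, perm].reshape(b, self.n_shuffled, h, w),
                           x[:, self.n_shuffled:]], dim=1)
        return self.conv(x)

print(SSConv2d(16, 32, 3, padding=1)(torch.randn(2, 16, 8, 8)).shape)
```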

Learning Cluster Structured Sparsity by Reweighting

Title Learning Cluster Structured Sparsity by Reweighting
Authors Anonymous
Abstract Recently, the paradigm of unfolding iterative algorithms into finite-length feed-forward neural networks has achieved great success in the area of sparse recovery. Benefiting from available training data, the learned networks have achieved state-of-the-art performance in terms of both speed and accuracy. However, the structure behind sparsity, imposing constraints on the support of sparse signals, is often essential prior knowledge but is seldom considered in existing networks. In this paper, we aim to bridge this gap. Specifically, exploiting the iterative reweighted $\ell_1$ minimization (IRL1) algorithm, we propose to learn cluster structured sparsity (CSS) by reweighting adaptively. In particular, we first unfold the Reweighted Iterative Shrinkage Algorithm (RwISTA) into an end-to-end trainable deep architecture termed RW-LISTA. Then, instead of element-wise reweighting, global and local reweighting schemes are proposed for cluster structured sparse learning. Numerical experiments further show the superiority of our algorithm over both classical algorithms and learning-based networks on different tasks.
Tasks Sparse Learning
Published 2020-01-01
URL https://openreview.net/forum?id=Byl28eBtwH
PDF https://openreview.net/pdf?id=Byl28eBtwH
PWC https://paperswithcode.com/paper/learning-cluster-structured-sparsity-by
Repo
Framework
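A compact sketch of one iterative reweighted soft-thresholding step, the kind of iteration that gets unfolded into trainable layers; here the paper's learned global/local reweighting is replaced by the classical IRL1 weights $1/(|z| + \epsilon)$ as an assumed stand-in.

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def rw_ista_step(z, A, y, lam, step, eps=1e-3):
    """One reweighted ISTA iteration: gradient step on ||Az - y||^2,
    then soft-thresholding with per-coefficient weights 1 / (|z| + eps)."""
    grad = A.T @ (A @ z - y)
    w = 1.0 / (np.abs(z) + eps)          # reweighting (learned in RW-LISTA)
    return soft_threshold(z - step * grad, step * lam * w)

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 50))
x_true = np.zeros(50); x_true[5:10] = 1.0     # clustered support
y = A @ x_true
z = np.zeros(50)
for _ in range(200):
    z = rw_ista_step(z, A, y, lam=0.1, step=1.0 / np.linalg.norm(A, 2) ** 2)
print(np.round(z[3:12], 2))
```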

Generative Adversarial Networks For Data Scarcity Industrial Positron Images With Attention

Title Generative Adversarial Networks For Data Scarcity Industrial Positron Images With Attention
Authors Mingwei Zhu, Min Zhao, Min Yao, Ruipeng Guo
Abstract In the industrial field, positron annihilation is not affected by complex environments, and gamma-ray photon penetration is strong, so non-destructive detection of industrial parts can be realized. Because of the poor image quality caused by gamma-ray photon scattering, attenuation and short sampling time in the positron process, we propose to combine deep learning with adversarial nets to generate positron images with good quality and clear details. The structure of the paper is as follows: first, we encode medical CT images into hidden vectors based on transfer learning, and use PCA to extract positron image features. Second, we construct a positron image memory based on an attention mechanism as the input to the adversarial nets, which use the medical hidden variables as a query. Finally, we train the whole model jointly and update the input parameters until convergence. Experiments demonstrate the feasibility of generating rare positron images for industrial non-destructive testing using adversarial networks, and good imaging results have been achieved.
Tasks Transfer Learning
Published 2020-01-01
URL https://openreview.net/forum?id=SkxcSpEKPS
PDF https://openreview.net/pdf?id=SkxcSpEKPS
PWC https://paperswithcode.com/paper/generative-adversarial-networks-for-data
Repo
Framework

Support-guided Adversarial Imitation Learning

Title Support-guided Adversarial Imitation Learning
Authors Anonymous
Abstract We propose Support-guided Adversarial Imitation Learning (SAIL), a generic imitation learning framework that unifies support estimation of the expert policy with the family of Adversarial Imitation Learning (AIL) algorithms. SAIL addresses two important challenges of AIL: the implicit reward bias and potential training instability. We also show that SAIL is at least as efficient as standard AIL. In an extensive evaluation, we demonstrate that the proposed method effectively handles the reward bias and achieves better performance and training stability than other baseline methods on a wide range of benchmark control tasks.
Tasks Imitation Learning
Published 2020-01-01
URL https://openreview.net/forum?id=r1x3unVKPS
PDF https://openreview.net/pdf?id=r1x3unVKPS
PWC https://paperswithcode.com/paper/support-guided-adversarial-imitation-learning
Repo
Framework

IS THE LABEL TRUSTFUL: TRAINING BETTER DEEP LEARNING MODEL VIA UNCERTAINTY MINING NET

Title IS THE LABEL TRUSTFUL: TRAINING BETTER DEEP LEARNING MODEL VIA UNCERTAINTY MINING NET
Authors Anonymous
Abstract In this work, we consider a new problem of training a deep neural network on partially labeled data with label noise. As far as we know, there have been very few efforts to tackle such problems. We present a novel end-to-end deep generative pipeline for improving classifier performance when dealing with such data problems. We call it the Uncertainty Mining Net (UMN). During the training stage, we utilize all the available data (labeled and unlabeled) to train the classifier via a semi-supervised generative framework. During training, UMN estimates the uncertainty of the labels to focus on clean data for learning. More precisely, UMN applies a sample-wise label uncertainty estimation scheme. Extensive experiments and comparisons against state-of-the-art methods on several popular benchmark datasets demonstrate that UMN can reduce the effects of label noise and significantly improve classifier performance.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=S1ly2grtvB
PDF https://openreview.net/pdf?id=S1ly2grtvB
PWC https://paperswithcode.com/paper/is-the-label-trustful-training-better-deep
Repo
Framework

Learning with Social Influence through Interior Policy Differentiation

Title Learning with Social Influence through Interior Policy Differentiation
Authors Anonymous
Abstract Animals develop novel skills not only through interaction with the environment but also under the influence of others. In this work we incorporate social influence into the reinforcement learning scheme, enabling agents to learn both from the environment and from their peers. Specifically, we first define a metric to measure the distance between policies and then quantitatively derive a definition of uniqueness. Unlike previous precarious joint optimization approaches, the social uniqueness motivation in our work is imposed as a constraint that encourages the agent to learn a policy different from the existing agents while still solving the primal task. The resulting algorithm, namely Interior Policy Differentiation (IPD), brings about performance improvements as well as a collection of policies that solve a given task with distinct behaviors.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=SJeQi1HKDH
PDF https://openreview.net/pdf?id=SJeQi1HKDH
PWC https://paperswithcode.com/paper/learning-with-social-influence-through
Repo
Framework

Finding Deep Local Optima Using Network Pruning

Title Finding Deep Local Optima Using Network Pruning
Authors Anonymous
Abstract Artificial neural networks (ANNs) are very popular nowadays and offer reliable solutions to many classification problems. However, training deep neural networks (DNNs) is time-consuming due to the large number of parameters. Recent research indicates that these DNNs might be over-parameterized, and different solutions have been proposed to reduce the complexity both in the number of parameters and in the training time of the neural networks. Furthermore, some researchers argue that after reducing the neural network complexity via connection pruning, the remaining weights are irrelevant and retraining the sub-network would obtain comparable accuracy with the original one. This may hold true in most vision problems, where we always enjoy a large number of training samples and research indicates that most local optima of convolutional neural networks may be equivalent. However, on non-vision sparse datasets, especially with many irrelevant features where a standard neural network would overfit, this might not be the case and there might be many non-equivalent local optima. This paper presents empirical evidence for these statements and an empirical study of the learnability of neural networks (NNs) on some challenging non-linear real and simulated data with irrelevant variables. Our simulation experiments indicate that the cross-entropy loss function on XOR-like data has many local optima, and the number of local optima grows exponentially with the number of irrelevant variables. We also introduce a connection pruning method to improve the ability of NNs to find a deep local minimum even when there are irrelevant variables. Furthermore, the performance of the discovered sparse sub-network degrades considerably when retrained either from scratch or from the corresponding original initialization, due to the existence of many bad optima nearby. Finally, we show that the performance of neural networks for real-world experiments on sparse datasets can be recovered or even improved by discovering a good sub-network architecture via connection pruning.
Tasks Network Pruning
Published 2020-01-01
URL https://openreview.net/forum?id=SyeHPgHFDr
PDF https://openreview.net/pdf?id=SyeHPgHFDr
PWC https://paperswithcode.com/paper/finding-deep-local-optima-using-network
Repo
Framework
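The pruning criterion is not specified in this listing; as an assumed illustration, the sketch below uses PyTorch's built-in magnitude (L1) unstructured pruning to carve out a sparse sub-network whose surviving connections are then trained further rather than re-initialized.

```python
import torch
import torch.nn.utils.prune as prune

# Hypothetical setup: prune the smallest-magnitude connections of each linear
# layer, then keep training the remaining sub-network from its current weights.
model = torch.nn.Sequential(torch.nn.Linear(100, 64), torch.nn.ReLU(),
                            torch.nn.Linear(64, 2))
for module in model:
    if isinstance(module, torch.nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.8)  # drop 80%

# Pruning registers a mask buffer per layer; masked connections stay at zero
# in the forward pass, so subsequent training only shapes the surviving ones.
n_alive = sum(int(m.weight_mask.sum()) for m in model
              if isinstance(m, torch.nn.Linear))
print(n_alive)
```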

Exploration in Reinforcement Learning with Deep Covering Options

Title Exploration in Reinforcement Learning with Deep Covering Options
Authors Anonymous
Abstract While many option discovery methods have been proposed to accelerate exploration in reinforcement learning, they are often heuristic. Recently, covering options was proposed to discover a set of options that provably reduce the upper bound of the environment’s cover time, a measure of the difficulty of exploration. Covering options are computed using the eigenvectors of the graph Laplacian, but they are constrained to tabular tasks and are not applicable to tasks with large or continuous state-spaces. We introduce deep covering options, an online method that extends covering options to large state spaces, automatically discovering task-agnostic options that encourage exploration. We evaluate our method in several challenging sparse-reward domains and we show that our approach identifies less explored regions of the state-space and successfully generates options to visit these regions, substantially improving both the exploration and the total accumulated reward.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=SkeIyaVtwB
PDF https://openreview.net/pdf?id=SkeIyaVtwB
PWC https://paperswithcode.com/paper/exploration-in-reinforcement-learning-with
Repo
Framework

“Best-of-Many-Samples” Distribution Matching

Title “Best-of-Many-Samples” Distribution Matching
Authors Anonymous
Abstract Generative Adversarial Networks (GANs) can achieve state-of-the-art sample quality in generative modelling tasks but suffer from the mode collapse problem. Variational Autoencoders (VAEs), on the other hand, explicitly maximize a reconstruction-based data log-likelihood, forcing them to cover all modes, but suffer from poorer sample quality. Recent works have proposed hybrid VAE-GAN frameworks which integrate a GAN-based synthetic likelihood into the VAE objective to address both the mode collapse and sample quality issues, with limited success. This is because the VAE objective forces a trade-off between the data log-likelihood and divergence to the latent prior. The synthetic likelihood ratio term also shows instability during training. We propose a novel objective with a “Best-of-Many-Samples” reconstruction cost and a stable direct estimate of the synthetic likelihood. This enables our hybrid VAE-GAN framework to achieve high data log-likelihood and low divergence to the latent prior at the same time, and shows significant improvement over both hybrid VAE-GANs and plain GANs in mode coverage and sample quality.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=S1lk61BtvB
PDF https://openreview.net/pdf?id=S1lk61BtvB
PWC https://paperswithcode.com/paper/best-of-many-samples-distribution-matching
Repo
Framework
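A minimal sketch of the “Best-of-Many-Samples” reconstruction cost, assuming a Gaussian recognition network: draw several latent samples per example and back-propagate only through the best reconstruction, so the objective no longer forces every single sample to reconstruct well. The encoder, the GAN-based synthetic likelihood, and the KL term are omitted; `decoder` is a stand-in module.

```python
import torch

def best_of_many_reconstruction(decoder, x, mu, logvar, n_samples=10):
    """Draw n_samples latents from q(z|x) and keep, per example, the sample
    with the smallest reconstruction error (here: squared error)."""
    b, d = mu.shape
    std = torch.exp(0.5 * logvar)
    z = mu.unsqueeze(1) + std.unsqueeze(1) * torch.randn(b, n_samples, d)
    x_hat = decoder(z.reshape(b * n_samples, d)).reshape(b, n_samples, -1)
    errs = ((x_hat - x.unsqueeze(1)) ** 2).sum(dim=-1)   # (b, n_samples)
    return errs.min(dim=1).values.mean()                  # best-of-many cost

decoder = torch.nn.Linear(8, 32)        # stand-in decoder
x = torch.randn(4, 32)
mu, logvar = torch.zeros(4, 8), torch.zeros(4, 8)
print(best_of_many_reconstruction(decoder, x, mu, logvar))
```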