Paper Group NANR 119
Multi-objective Neural Architecture Search via Predictive Network Performance Optimization. High performance RNNs with spiking neurons. A Mechanism of Implicit Regularization in Deep Learning. Towards Principled Objectives for Contrastive Disentanglement. Multigrid Neural Memory. Frequency Analysis for Graph Convolution Network. A Graph Neural Netw …
Multi-objective Neural Architecture Search via Predictive Network Performance Optimization
Title | Multi-objective Neural Architecture Search via Predictive Network Performance Optimization |
Authors | Anonymous |
Abstract | Neural Architecture Search (NAS) has shown great potential in finding better neural network designs than human-designed ones. Sample-based NAS is the most fundamental method, aiming to explore the search space and evaluate the most promising architectures. However, few works have focused on improving the sampling efficiency of multi-objective NAS. Inspired by the graph structure of neural networks, we propose BOGCN-NAS, a NAS algorithm using Bayesian Optimization with a Graph Convolutional Network (GCN) predictor. Specifically, we apply the GCN as a surrogate model that adaptively discovers and incorporates node structure to approximate the performance of an architecture. For NAS-oriented tasks, we also design a weighted loss that focuses on architectures with high performance. Our method further supports an efficient multi-objective search which can be flexibly injected into any sample-based NAS pipeline to efficiently find the best speed/accuracy trade-off. Extensive experiments verify the effectiveness of our method over many competing methods, e.g. 128.4x more efficient than Random Search and 7.8x more efficient than the previous SOTA LaNAS for finding the best architecture on the largest NAS dataset, NasBench-101. |
Tasks | Neural Architecture Search |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rJgffkSFPS |
https://openreview.net/pdf?id=rJgffkSFPS | |
PWC | https://paperswithcode.com/paper/multi-objective-neural-architecture-search |
Repo | |
Framework | |
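The GCN-surrogate idea above is easy to sketch. Below is a minimal illustration (not the authors' code) of scoring a candidate architecture, encoded as a DAG adjacency matrix plus one-hot operation features, with a small GCN readout, together with a weighted loss that emphasizes high-performing architectures; all dimensions, weights, and the weighting power are illustrative assumptions.

```python
# Minimal sketch of a GCN surrogate for architecture performance prediction.
# Weight matrices, dimensions, and the weighting scheme are illustrative.
import numpy as np

def gcn_surrogate(adj, feats, w1, w2):
    """Two-layer GCN followed by mean readout -> predicted accuracy in [0, 1]."""
    a_hat = adj + np.eye(adj.shape[0])              # add self-loops
    d_inv = np.diag(1.0 / a_hat.sum(axis=1))        # degree normalization
    a_norm = d_inv @ a_hat
    h = np.maximum(a_norm @ feats @ w1, 0.0)        # layer 1 + ReLU
    h = np.maximum(a_norm @ h @ w2, 0.0)            # layer 2 + ReLU
    return 1.0 / (1.0 + np.exp(-h.mean()))          # graph readout -> sigmoid

def weighted_loss(pred, target, power=2.0):
    """Loss that up-weights architectures with high measured accuracy,
    so the surrogate is most faithful near the top of the search space."""
    return (target ** power) * (pred - target) ** 2

rng = np.random.default_rng(0)
adj = np.triu(rng.integers(0, 2, size=(7, 7)), k=1)   # random 7-node DAG
feats = np.eye(7)[rng.integers(0, 7, size=7)]          # one-hot operation features
w1, w2 = rng.normal(size=(7, 16)), rng.normal(size=(16, 16))
pred = gcn_surrogate(adj, feats, w1, w2)
print(weighted_loss(pred, target=0.93))
```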
High performance RNNs with spiking neurons
Title | High performance RNNs with spiking neurons |
Authors | Anonymous |
Abstract | The increasing need for compact and low-power computing solutions for machine learning applications has triggered a renaissance in the study of energy-efficient neural network accelerators. In particular, in-memory computing neuromorphic architectures have started to receive substantial attention from both academia and industry. However, most of these architectures rely on spiking neural networks, which typically perform poorly compared to their non-spiking counterparts in terms of accuracy. In this paper, we propose a new adaptive spiking neuron model that can also be abstracted as a low-pass filter. This abstraction enables faster and better training of spiking networks using back-propagation, without simulating spikes. We show that this model dramatically improves the inference performance of a recurrent neural network and validate it with three complex spatio-temporal learning tasks: the temporal addition task, the temporal copying task, and a spoken-phrase recognition task. Application of these results will lead to the development of powerful spiking models for neuromorphic hardware that solve relevant edge-computing and Internet-of-Things applications with high accuracy and ultra-low power consumption. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HyxnnnVtwB |
https://openreview.net/pdf?id=HyxnnnVtwB | |
PWC | https://paperswithcode.com/paper/high-performance-rnns-with-spiking-neurons |
Repo | |
Framework | |
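A rough sketch of the abstraction described above, under my own simplifying assumptions: an integrate-and-fire neuron whose low-pass-filtered spike train is approximated by an equally low-pass-filtered firing-rate estimate, which is spike-free and therefore usable with ordinary back-propagation. Time constants and the rate approximation are illustrative, not the paper's model.

```python
# Sketch: a spiking neuron and its low-pass-filter surrogate (illustrative only).
import numpy as np

def spiking_trace(inputs, tau=10.0, threshold=1.0):
    """Integrate-and-fire neuron; its spike train is low-pass filtered."""
    v, filtered, out = 0.0, 0.0, []
    for x in inputs:
        v += x / tau                              # integrate input current
        spike = float(v >= threshold)
        v -= spike * threshold                    # subtractive reset
        filtered += (spike - filtered) / tau      # low-pass filter the spikes
        out.append(filtered)
    return np.array(out)

def rate_surrogate(inputs, tau=10.0, threshold=1.0):
    """Spike-free surrogate: the same low-pass filter applied to the
    expected spikes-per-step, which needs no spike simulation."""
    r, out = 0.0, []
    for x in inputs:
        rate = max(x, 0.0) / (threshold * tau)    # expected spikes per step
        r += (rate - r) / tau
        out.append(r)
    return np.array(out)

t = np.linspace(0, 1, 200)
drive = 0.5 + 0.5 * np.sin(2 * np.pi * 3 * t)
print(np.corrcoef(spiking_trace(drive), rate_surrogate(drive))[0, 1])
```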
A Mechanism of Implicit Regularization in Deep Learning
Title | A Mechanism of Implicit Regularization in Deep Learning |
Authors | Anonymous |
Abstract | Despite considerable theoretical effort, very little is known about the mechanisms of implicit regularization by which low complexity contributes to generalization in deep learning. In particular, the causal relationship between generalization performance, implicit regularization, and the nonlinearity of activation functions is one of the basic mysteries of deep neural networks (DNNs). In this work, we introduce a novel technique for DNNs called random walk analysis and reveal a mechanism of implicit regularization caused by the nonlinearity of the ReLU activation. Surprisingly, our theoretical results suggest that learned DNNs interpolate almost linearly between data points, which leads to low-complexity solutions in the over-parameterized regime. As a result, we prove that stochastic gradient descent can learn a class of continuously differentiable functions with generalization bounds of the order of $O(n^{-2})$ ($n$: the number of samples). Furthermore, our analysis is independent of kernel methods, including neural tangent kernels. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HJx0U64FwS |
https://openreview.net/pdf?id=HJx0U64FwS | |
PWC | https://paperswithcode.com/paper/a-mechanism-of-implicit-regularization-in |
Repo | |
Framework | |
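A small numerical sketch of the claimed phenomenon (not the paper's random walk analysis): fit an over-parameterized 1-D ReLU network to a few points with full-batch gradient descent and measure how far its midpoint predictions deviate from straight-line interpolation between neighboring training points. Width, learning rate, and step count are arbitrary choices.

```python
# Sketch: does a trained ReLU net interpolate almost linearly between data points?
import numpy as np

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(-1, 1, size=8)); y = np.sin(3 * x)
m = 200                                            # hidden width (over-parameterized)
w = rng.normal(size=m); b = rng.normal(size=m); a = rng.normal(size=m) / np.sqrt(m)

def forward(xs, w, b, a):
    pre = np.outer(xs, w) + b                      # (n_points, m) pre-activations
    return np.maximum(pre, 0.0) @ a, pre

lr = 0.005
for _ in range(5000):
    f, pre = forward(x, w, b, a)
    g_out = 2 * (f - y) / len(x)                   # dLoss/df
    act = (pre > 0).astype(float)
    grad_a = np.maximum(pre, 0.0).T @ g_out
    grad_w = (act * a).T @ (g_out * x)
    grad_b = (act * a).T @ g_out
    a -= lr * grad_a; w -= lr * grad_w; b -= lr * grad_b

mid = (x[:-1] + x[1:]) / 2                          # midpoints between training points
pred_mid, _ = forward(mid, w, b, a)
linear_mid = (y[:-1] + y[1:]) / 2                   # straight-line interpolation
print("max deviation from linear interpolation:",
      round(float(np.abs(pred_mid - linear_mid).max()), 4))
```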
Towards Principled Objectives for Contrastive Disentanglement
Title | Towards Principled Objectives for Contrastive Disentanglement |
Authors | Anonymous |
Abstract | Unsupervised learning is an important tool that has received a significant amount of attention for decades. Its goal is 'unsupervised recovery,' i.e., extracting salient factors/properties from unlabeled data. Because of the challenges in defining salient properties, 'contrastive disentanglement' has recently gained popularity as a way to discover the additional variations that are enhanced in one dataset relative to another. Existing formulations have devised a variety of losses for this task. However, all present-day methods exhibit two major shortcomings: (1) encodings for data that do not exhibit salient factors are not pushed to carry no signal; and (2) the introduced losses are often hard to estimate and require additional trainable parameters. We present a new formulation for contrastive disentanglement which avoids both shortcomings by carefully formulating a probabilistic model and by using non-parametric yet easily computable metrics. We show on four challenging datasets that the proposed approach is able to better disentangle salient factors. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=B1lPETVFPS |
https://openreview.net/pdf?id=B1lPETVFPS | |
PWC | https://paperswithcode.com/paper/towards-principled-objectives-for-contrastive |
Repo | |
Framework | |
Multigrid Neural Memory
Title | Multigrid Neural Memory |
Authors | Anonymous |
Abstract | We introduce a novel architecture that integrates a large addressable memory space into the core functionality of a deep neural network. Our design distributes both memory addressing operations and storage capacity over many network layers. Distinct from strategies that connect neural networks to external memory banks, our approach co-locates memory with computation throughout the network structure. Mirroring recent architectural innovations in convolutional networks, we organize memory into a multiresolution hierarchy, whose internal connectivity enables learning of dynamic information routing strategies and data-dependent read/write operations. This multigrid spatial layout permits parameter-efficient scaling of memory size, allowing us to experiment with memories substantially larger than those in prior work. We demonstrate this capability on synthetic exploration and mapping tasks, where the network is able to self-organize and retain long-term memory for trajectories of thousands of time steps. On tasks decoupled from any notion of spatial geometry, such as sorting or associative recall, our design functions as a truly generic memory and yields results competitive with those of the recently proposed Differentiable Neural Computer. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=ByxKo04tvr |
https://openreview.net/pdf?id=ByxKo04tvr | |
PWC | https://paperswithcode.com/paper/multigrid-neural-memory-1 |
Repo | |
Framework | |
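A sketch of the memory layout only (the learned routing and read/write networks are omitted): memory as a pyramid of grids at several resolutions, with average pooling pushing summaries down the pyramid and nearest-neighbor upsampling broadcasting them back up. Grid sizes and the read/write rules are illustrative assumptions.

```python
# Sketch of a multigrid memory layout: a pyramid of grids connected by pooling
# and upsampling, so coarse levels give every location a cheap global summary.
import numpy as np

def downsample(grid):
    """2x2 average pooling to the next coarser level."""
    h, w = grid.shape
    return grid.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(grid):
    """Nearest-neighbor upsampling to the next finer level."""
    return grid.repeat(2, axis=0).repeat(2, axis=1)

class MultigridMemory:
    def __init__(self, size=16, levels=3):
        self.grids = [np.zeros((size >> i, size >> i)) for i in range(levels)]

    def write(self, y, x, value):
        self.grids[0][y, x] += value               # write at the finest level
        for i in range(1, len(self.grids)):        # push summaries down the pyramid
            self.grids[i] = downsample(self.grids[i - 1])

    def read(self, y, x):
        """Read the local value plus coarse summaries broadcast back up."""
        out = self.grids[0][y, x]
        for i in range(1, len(self.grids)):
            coarse = self.grids[i]
            for _ in range(i):
                coarse = upsample(coarse)
            out += coarse[y, x]
        return out

mem = MultigridMemory()
mem.write(3, 12, 1.0)
print(round(mem.read(3, 12), 4), round(mem.read(15, 0), 4))
```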
Frequency Analysis for Graph Convolution Network
Title | Frequency Analysis for Graph Convolution Network |
Authors | Anonymous |
Abstract | In this work, we develop quantitative results on the learnability of a two-layer Graph Convolutional Network (GCN). Instead of analyzing GCN under some class of functions, our approach provides a quantitative gap between a two-layer GCN and a two-layer MLP model. Our analysis is based on the graph signal processing (GSP) approach, which can provide much more useful insights than the message-passing computational model. Interestingly, based on our analysis, we are able to empirically demonstrate a few cases in which GCN and other state-of-the-art models cannot learn even when the true vertex features are extremely low-dimensional. To demonstrate our theoretical findings and propose a solution to the aforementioned adversarial cases, we build a proof-of-concept graph neural network model with stacked filters named Graph Filters Neural Network (gfNN). |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HylthC4twr |
https://openreview.net/pdf?id=HylthC4twr | |
PWC | https://paperswithcode.com/paper/frequency-analysis-for-graph-convolution |
Repo | |
Framework | |
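The gfNN construction lends itself to a short sketch (an illustration, not the authors' release): pre-filter the vertex features with a stacked normalized-adjacency filter, then fit a plain MLP on the filtered features, in contrast to a GCN that interleaves filtering with nonlinearities. Sizes and the 2-hop filter are illustrative.

```python
# Sketch of the gfNN idea: stacked graph filtering first, then an ordinary MLP.
import numpy as np

def normalized_adjacency(adj):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} used in GSP analyses."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))
    return d_inv_sqrt @ a_hat @ d_inv_sqrt

def graph_filter(adj, feats, k=2):
    """Stacked low-pass filter: propagate features k hops before any learning."""
    a_norm = normalized_adjacency(adj)
    for _ in range(k):
        feats = a_norm @ feats
    return feats

def mlp(feats, w1, w2):
    """Plain two-layer MLP applied node-wise to the pre-filtered features."""
    return np.maximum(feats @ w1, 0.0) @ w2

rng = np.random.default_rng(0)
adj = (rng.random((10, 10)) < 0.3).astype(float)
adj = np.triu(adj, 1); adj = adj + adj.T               # undirected graph
x = rng.normal(size=(10, 5))
w1, w2 = rng.normal(size=(5, 8)), rng.normal(size=(8, 3))
logits = mlp(graph_filter(adj, x, k=2), w1, w2)
print(logits.shape)   # (10, 3): per-node class scores
```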
A Graph Neural Network Assisted Monte Carlo Tree Search Approach to Traveling Salesman Problem
Title | A Graph Neural Network Assisted Monte Carlo Tree Search Approach to Traveling Salesman Problem |
Authors | Anonymous |
Abstract | We present a graph neural network assisted Monte Carlo Tree Search approach for the classical traveling salesman problem (TSP). We adopt a greedy algorithm framework to construct a solution to TSP by adding nodes successively. A graph neural network (GNN) is trained to capture the local and global graph structure and give the prior probability of selecting each vertex at every step. The prior probability provides a heuristic for MCTS, and the MCTS output is an improved probability for selecting the next vertex, since it fuses the prior with feedback from the scouting procedure. Experimental results on TSP instances with up to 100 nodes demonstrate that the proposed method obtains shorter tours than other learning-based methods. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Syg6fxrKDB |
https://openreview.net/pdf?id=Syg6fxrKDB | |
PWC | https://paperswithcode.com/paper/a-graph-neural-network-assisted-monte-carlo |
Repo | |
Framework | |
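A sketch of the greedy construction loop described above. The GNN prior is replaced by a softmax over negative distances and the MCTS refinement is collapsed to a greedy argmax, purely to show the control flow; none of this is the paper's implementation.

```python
# Sketch: greedy TSP tour construction guided by a prior over remaining vertices.
import numpy as np

def prior_probs(dist_row, visited):
    """Stand-in for the GNN prior: prefer near, unvisited vertices."""
    scores = -dist_row.copy()
    scores[list(visited)] = -np.inf                 # mask visited nodes
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()

def construct_tour(coords, start=0):
    """Add vertices successively, choosing each next node from the prior."""
    n = len(coords)
    dist = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
    tour, visited = [start], {start}
    while len(tour) < n:
        probs = prior_probs(dist[tour[-1]], visited)
        nxt = int(np.argmax(probs))                 # MCTS would refine this choice
        tour.append(nxt); visited.add(nxt)
    return tour, dist[tour, np.roll(tour, -1)].sum()

coords = np.random.default_rng(0).random((20, 2))
tour, length = construct_tour(coords)
print(len(tour), round(float(length), 3))
```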
Neural Design of Contests and All-Pay Auctions using Multi-Agent Simulation
Title | Neural Design of Contests and All-Pay Auctions using Multi-Agent Simulation |
Authors | Anonymous |
Abstract | We propose a multi-agent learning approach for designing crowdsourcing contests and all-pay auctions. Prizes in contests incentivise contestants to expend effort on their entries, with different prize allocations resulting in different incentives and bidding behaviors. In contrast to auctions designed manually by economists, our method searches the possible design space using a simulation of the multi-agent learning process, and can thus handle settings where a game-theoretic equilibrium analysis is not tractable. Our method simulates agent learning in contests and evaluates the utility of the resulting outcome for the auctioneer. Given a large contest design space, we assess through simulation many possible contest designs within the space, and fit a neural network to predict outcomes for previously untested contest designs. Finally, we apply mirror descent to optimize the design so as to achieve more desirable outcomes. Our empirical analysis shows our approach closely matches the optimal outcomes in settings where the equilibrium is known, and can produce high quality designs in settings where the equilibrium strategies are not solvable analytically. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Bklg1grtDr |
https://openreview.net/pdf?id=Bklg1grtDr | |
PWC | https://paperswithcode.com/paper/neural-design-of-contests-and-all-pay |
Repo | |
Framework | |
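The outer design loop can be sketched as follows, with heavy simplifications: the fitted neural surrogate is stood in for by a fixed concave "predicted designer utility", and mirror descent takes exponentiated-gradient steps so the prize allocation stays on the probability simplex. The utility function and step sizes are invented for illustration.

```python
# Sketch: mirror descent (exponentiated gradient) over a prize allocation.
import numpy as np

def predicted_utility(prizes):
    """Stand-in for the fitted network: rewards spreading some prize mass
    beyond first place, with diminishing returns."""
    return np.sqrt(prizes).sum() - 0.5 * prizes[0]

def grad(prizes, eps=1e-6):
    """Finite-difference gradient of the surrogate (keeps the sketch simple)."""
    g = np.zeros_like(prizes)
    for i in range(len(prizes)):
        e = np.zeros_like(prizes); e[i] = eps
        g[i] = (predicted_utility(prizes + e) - predicted_utility(prizes - e)) / (2 * eps)
    return g

def mirror_descent(n_prizes=4, steps=200, lr=0.1):
    """Exponentiated-gradient updates: the allocation stays a distribution."""
    p = np.ones(n_prizes) / n_prizes
    for _ in range(steps):
        p = p * np.exp(lr * grad(p))          # ascent on predicted utility
        p = p / p.sum()                        # project back onto the simplex
    return p

print(mirror_descent().round(3))
```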
A Random Matrix Perspective on Mixtures of Nonlinearities in High Dimensions
Title | A Random Matrix Perspective on Mixtures of Nonlinearities in High Dimensions |
Authors | Anonymous |
Abstract | One of the distinguishing characteristics of modern deep learning systems is that they typically employ neural network architectures that utilize enormous numbers of parameters, often in the millions and sometimes even in the billions. While this paradigm has inspired significant research on the properties of large networks, relatively little work has been devoted to the fact that these networks are often used to model large complex datasets, which may themselves contain millions or even billions of constraints. In this work, we focus on this high-dimensional regime in which both the dataset size and the number of features tend to infinity. We analyze the performance of a simple regression model trained on the random features $F=f(WX+B)$ for a random weight matrix $W$ and random bias vector $B$, obtaining an exact formula for the asymptotic training error on a noisy autoencoding task. The role of the bias can be understood as parameterizing a distribution over activation functions, and our analysis actually extends to general such distributions, even those not expressible with a traditional additive bias. Intriguingly, we find that a mixture of nonlinearities can outperform the best single nonlinearity on the noisy autoencoding task, suggesting that mixtures of nonlinearities might be useful for approximate kernel methods or neural network architecture design. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BJx7N1SKvB |
https://openreview.net/pdf?id=BJx7N1SKvB | |
PWC | https://paperswithcode.com/paper/a-random-matrix-perspective-on-mixtures-of |
Repo | |
Framework | |
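The random-features model in the abstract is straightforward to instantiate. The sketch below builds features $F=f(WX+B)$ with random $W$ and $B$ and runs ridge regression on a noisy autoencoding target; the choice $f=\mathrm{ReLU}$, the noise level, and all sizes are illustrative assumptions, and no asymptotic formula is computed here.

```python
# Sketch: random-features ridge regression on a noisy autoencoding task.
import numpy as np

rng = np.random.default_rng(0)
n, d, m, noise = 2000, 50, 400, 0.3          # samples, input dim, features, noise std

X = rng.normal(size=(d, n))
X_noisy = X + noise * rng.normal(size=X.shape)
W = rng.normal(size=(m, d)) / np.sqrt(d)     # random weights
B = rng.normal(size=(m, 1))                  # random bias (parameterizes the nonlinearity mix)
F = np.maximum(W @ X_noisy + B, 0.0)         # random features f(WX + B), f = ReLU

lam = 1e-2
# Ridge regression of the clean X onto the random features (noisy autoencoding).
A = F @ F.T / n + lam * np.eye(m)
coef = np.linalg.solve(A, F @ X.T / n)
train_err = np.mean((coef.T @ F - X) ** 2)
print(round(float(train_err), 4))
```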
Program Guided Agent
Title | Program Guided Agent |
Authors | Anonymous |
Abstract | Developing agents that can learn to follow natural language instructions has been an emerging research direction. While accessible and flexible, natural language instructions can sometimes be ambiguous even to humans. To address this, we propose to utilize programs, structured in a formal language, as a precise and expressive way to specify tasks. We then devise a modular framework that learns to perform a task specified by a program. Since different circumstances give rise to diverse ways to accomplish the task, our framework can perceive which circumstance it is currently under and instruct a multitask policy accordingly to fulfill each subtask of the overall task. Experimental results on a 2D Minecraft environment not only demonstrate that the proposed framework learns to reliably accomplish program instructions and achieves zero-shot generalization to more complex instructions, but also verify the efficiency of the proposed modulation mechanism for learning the multitask policy. We also conduct an analysis comparing various models which learn from programs and natural language instructions in an end-to-end fashion. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BkxUvnEYDH |
https://openreview.net/pdf?id=BkxUvnEYDH | |
PWC | https://paperswithcode.com/paper/program-guided-agent |
Repo | |
Framework | |
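A toy sketch of the program-guided control flow (hypothetical program syntax and environment, not the paper's 2D Minecraft setup): an interpreter walks a structured program, branches on the perceived state, and hands each primitive subtask to a single multitask policy.

```python
# Sketch: a tiny program interpreter dispatching subtasks to one multitask policy.
from dataclasses import dataclass, field

@dataclass
class State:
    wood: int = 0
    has_axe: bool = False
    log: list = field(default_factory=list)

def policy(subtask, state):
    """Stand-in for the learned multitask policy: executes one primitive."""
    if subtask == "get_axe":
        state.has_axe = True
    elif subtask == "chop_tree":
        state.wood += 1
    state.log.append(subtask)

def run_program(program, state):
    """Interpreter: dispatch on statement type, branch on the perceived state."""
    for stmt in program:
        if stmt[0] == "do":
            policy(stmt[1], state)
        elif stmt[0] == "if":
            _, cond, then_branch, else_branch = stmt
            run_program(then_branch if cond(state) else else_branch, state)
        elif stmt[0] == "repeat":
            for _ in range(stmt[1]):
                run_program(stmt[2], state)

program = [
    ("if", lambda s: not s.has_axe, [("do", "get_axe")], []),
    ("repeat", 3, [("do", "chop_tree")]),
]
s = State()
run_program(program, s)
print(s.log, s.wood)
```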
MissDeepCausal: causal inference from incomplete data using deep latent variable models
Title | MissDeepCausal: causal inference from incomplete data using deep latent variable models |
Authors | Anonymous |
Abstract | Inferring causal effects of a treatment, intervention or policy from observational data is central to many applications. However, state-of-the-art methods for causal inference seldom consider the possibility that covariates have missing values, which is ubiquitous in many real-world analyses. Missing data greatly complicate causal inference procedures as they require an adapted unconfoundedness hypothesis which can be difficult to justify in practice. We circumvent this issue by considering latent confounders whose distribution is learned through variational autoencoders adapted to missing values. They can be used as a pre-processing step prior to causal inference, but we also suggest embedding them in a multiple imputation strategy to account for the variability due to missing values. Numerical experiments demonstrate the effectiveness of the proposed methodology compared to competitors, especially for non-linear models. |
Tasks | Causal Inference, Imputation, Latent Variable Models |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SylpBgrKPH |
https://openreview.net/pdf?id=SylpBgrKPH | |
PWC | https://paperswithcode.com/paper/missdeepcausal-causal-inference-from |
Repo | |
Framework | |
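A sketch of where the learned confounders enter the estimate, under strong simplifications: the deep latent variable model adapted to missing values is replaced here by mean-plus-noise imputation followed by PCA latents, and the effect is averaged over several imputations. Data generation, sizes, and the PCA stand-in are my own assumptions, not the paper's method.

```python
# Sketch: multiple imputation + low-dimensional latent confounders + regression adjustment.
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 1000, 10, 2
Z = rng.normal(size=(n, k))                        # latent confounders
X = Z @ rng.normal(size=(k, d)) + 0.3 * rng.normal(size=(n, d))
mask = rng.random(X.shape) < 0.2                   # 20% of values missing
X_obs = np.where(mask, np.nan, X)
treat = (Z[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(float)
y = 2.0 * treat + Z[:, 1] + 0.3 * rng.normal(size=n)   # true effect = 2

def impute(X_obs, rng):
    """One stochastic imputation: column mean plus noise (stand-in for the VAE)."""
    X_imp = X_obs.copy()
    for j in range(X_obs.shape[1]):
        col = X_obs[:, j]
        miss = np.isnan(col)
        X_imp[miss, j] = np.nanmean(col) + np.nanstd(col) * rng.normal(size=miss.sum())
    return X_imp

def ate_with_latents(X_imp, treat, y, k):
    """Regression adjustment on low-dimensional latents (PCA here)."""
    Xc = X_imp - X_imp.mean(0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    latents = Xc @ vt[:k].T
    design = np.column_stack([np.ones_like(treat), treat, latents])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[1]                                  # coefficient on treatment

# Multiple imputation: average the effect estimate over several imputed datasets.
estimates = [ate_with_latents(impute(X_obs, rng), treat, y, k) for _ in range(5)]
print(round(float(np.mean(estimates)), 3))
```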
Data Valuation using Reinforcement Learning
Title | Data Valuation using Reinforcement Learning |
Authors | Anonymous |
Abstract | Quantifying the value of data is a fundamental problem in machine learning. Data valuation has multiple important use cases: (1) building insights about the learning task, (2) domain adaptation, (3) corrupted sample discovery, and (4) robust learning. To adaptively learn data values jointly with the target task predictor model, we propose a meta learning framework which we name Data Valuation using Reinforcement Learning (DVRL). We employ a data value estimator (modeled by a deep neural network) to learn how likely each datum is to be used in training of the predictor model. We train the data value estimator using a reinforcement signal based on the reward obtained on a small validation set that reflects performance on the target task. We demonstrate that DVRL yields superior data value estimates compared to alternative methods across different types of datasets and in a diverse set of application scenarios. The corrupted sample discovery performance of DVRL is close to optimal in many regimes (i.e. as if the noisy samples were known a priori), and for domain adaptation and robust learning DVRL significantly outperforms the state of the art by 14.6% and 10.8%, respectively. |
Tasks | Domain Adaptation, Meta-Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BJx8YnEFPH |
https://openreview.net/pdf?id=BJx8YnEFPH | |
PWC | https://paperswithcode.com/paper/data-valuation-using-reinforcement-learning-1 |
Repo | |
Framework | |
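The DVRL training signal can be sketched compactly under simplifying assumptions: a per-sample logit vector stands in for the deep value estimator, a least-squares model stands in for the predictor, and REINFORCE with a moving-average baseline updates the logits using validation loss as the reward. Everything below is illustrative, not the paper's implementation.

```python
# Sketch: data valuation with a REINFORCE signal from validation performance.
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d)); w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)
y[:40] += 3.0 * rng.normal(size=40)                # first 40 samples are corrupted
Xv = rng.normal(size=(100, d))
yv = Xv @ w_true + 0.1 * rng.normal(size=100)      # clean validation set

logits = np.zeros(n)                               # the "data value estimator"
baseline = None
for step in range(300):
    p = 1.0 / (1.0 + np.exp(-logits))              # selection probabilities
    sel = (rng.random(n) < p).astype(float)        # sample which data to use
    idx = sel.astype(bool)
    if idx.sum() < d:                              # guard against degenerate batches
        continue
    w, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
    reward = -np.mean((Xv @ w - yv) ** 2)          # validation performance
    baseline = reward if baseline is None else 0.9 * baseline + 0.1 * reward
    # REINFORCE: grad of the Bernoulli log-prob times the advantage.
    logits += 0.05 * (reward - baseline) * (sel - p)

p = 1.0 / (1.0 + np.exp(-logits))
print("mean value, corrupted vs clean:",
      round(float(p[:40].mean()), 3), round(float(p[40:].mean()), 3))
```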
Deep Graph Matching Consensus
Title | Deep Graph Matching Consensus |
Authors | Anonymous |
Abstract | This work presents a two-stage neural architecture for learning and refining structural correspondences between graphs. First, we use localized node embeddings computed by a graph neural network to obtain an initial ranking of soft correspondences between nodes. Second, we employ synchronous message passing networks to iteratively re-rank the soft correspondences to reach a matching consensus in local neighborhoods between graphs. We show, theoretically and empirically, that our message passing scheme computes a well-founded measure of consensus for corresponding neighborhoods, which is then used to guide the iterative re-ranking process. Our purely local and sparsity-aware architecture scales well to large, real-world inputs while still being able to recover global correspondences consistently. We demonstrate the practical effectiveness of our method on real-world tasks from the fields of computer vision and entity alignment between knowledge graphs, on which we improve upon the current state of the art. |
Tasks | Entity Alignment, Graph Matching, Knowledge Graphs |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HyeJf1HKvS |
https://openreview.net/pdf?id=HyeJf1HKvS | |
PWC | https://paperswithcode.com/paper/deep-graph-matching-consensus |
Repo | |
Framework | |
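A simplified stand-in (not the paper's architecture) for the two-stage idea: localized embeddings give initial soft correspondences, and a consensus term that checks whether matched nodes also have matching neighborhoods is used to re-rank them. The random "color" features and the update rule are my own simplifications.

```python
# Sketch: soft graph matching with an iterative neighborhood-consensus re-ranking.
import numpy as np

def softmax_rows(s):
    e = np.exp(s - s.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def embed(adj, feats, hops=2):
    """Localized node embedding: mean-aggregate features over a few hops."""
    a_hat = adj + np.eye(len(adj))
    a_norm = a_hat / a_hat.sum(axis=1, keepdims=True)
    for _ in range(hops):
        feats = a_norm @ feats
    return feats

def similarity(h1, h2):
    """Negative squared distance between embeddings."""
    diff = h1[:, None, :] - h2[None, :, :]
    return -(diff ** 2).sum(-1)

def match(adj1, x1, adj2, x2, iters=5, alpha=0.5):
    h1, h2 = embed(adj1, x1), embed(adj2, x2)
    s = softmax_rows(similarity(h1, h2))          # stage 1: initial soft matches
    colors = np.random.default_rng(0).normal(size=(len(adj2), 8))
    for _ in range(iters):                        # stage 2: consensus re-ranking
        msg1 = adj1 @ (s @ colors)                # G1 neighborhoods in matched colors
        msg2 = adj2 @ colors                      # G2 neighborhoods in their own colors
        consensus = msg1 @ msg2.T                 # large where neighborhoods agree
        s = softmax_rows(similarity(h1, h2) + alpha * consensus)
    return s

rng = np.random.default_rng(1)
adj = (rng.random((8, 8)) < 0.4).astype(float)
adj = np.triu(adj, 1); adj = adj + adj.T
perm = rng.permutation(8)
x = rng.normal(size=(8, 4))
s = match(adj, x, adj[perm][:, perm], x[perm])
print("fraction matched:", (s.argmax(axis=1) == np.argsort(perm)).mean())
```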
Behavior-Guided Reinforcement Learning
Title | Behavior-Guided Reinforcement Learning |
Authors | Anonymous |
Abstract | We introduce a new approach for comparing reinforcement learning policies, using Wasserstein distances (WDs) in a newly defined latent behavioral space. We show that by utilizing the dual formulation of the WD, we can learn score functions over trajectories that can be in turn used to lead policy optimization towards (or away from) (un)desired behaviors. Combined with smoothed WDs, the dual formulation allows us to devise efficient algorithms that take stochastic gradient descent steps through WD regularizers. We incorporate these regularizers into two novel on-policy algorithms, Behavior-Guided Policy Gradient and Behavior-Guided Evolution Strategies, which we demonstrate can outperform existing methods in a variety of challenging environments. We also provide an open source demo. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Hklo5RNtwS |
https://openreview.net/pdf?id=Hklo5RNtwS | |
PWC | https://paperswithcode.com/paper/behavior-guided-reinforcement-learning |
Repo | |
Framework | |
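One ingredient highlighted above, the smoothed Wasserstein distance, is easy to sketch: entropy-regularized optimal transport between two sets of trajectory embeddings, computed with Sinkhorn iterations. In the paper this term regularizes policy updates; here it is only evaluated on random stand-in embeddings, and the cost normalization is a numerical convenience.

```python
# Sketch: entropy-regularized (Sinkhorn) Wasserstein distance between behavior embeddings.
import numpy as np

def sinkhorn_distance(a_pts, b_pts, eps=0.1, iters=200):
    """Entropy-regularized optimal transport cost between two point clouds."""
    cost = ((a_pts[:, None, :] - b_pts[None, :, :]) ** 2).sum(-1)
    cost = cost / cost.mean()                      # rescale cost for numerical stability
    K = np.exp(-cost / eps)
    a = np.full(len(a_pts), 1.0 / len(a_pts))      # uniform weights on each cloud
    b = np.full(len(b_pts), 1.0 / len(b_pts))
    v = np.ones(len(b_pts))
    for _ in range(iters):                         # Sinkhorn scaling iterations
        u = a / (K @ v)
        v = b / (K.T @ u)
    plan = u[:, None] * K * v[None, :]             # approximate transport plan
    return (plan * cost).sum()                     # cost in normalized units

rng = np.random.default_rng(0)
current_policy = rng.normal(size=(64, 8))          # behavior embeddings of rollouts
target_behavior = rng.normal(size=(64, 8)) + 1.0   # embeddings of desired behavior
# A behavior-guided objective would combine task return with -lambda * this distance.
print(round(float(sinkhorn_distance(current_policy, target_behavior)), 3))
```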
AMUSED: A Multi-Stream Vector Representation Method for Use In Natural Dialogue
Title | AMUSED: A Multi-Stream Vector Representation Method for Use In Natural Dialogue |
Authors | Anonymous |
Abstract | The problem of building a coherent and non-monotonous conversational agent with proper discourse and coverage is still an area of open research. Current architectures only capture semantic and contextual information for a given query and fail to fully account for the syntactic and external knowledge that are crucial for generating responses in a chit-chat system. To overcome this problem, we propose an end-to-end multi-stream deep learning architecture which learns unified embeddings for query-response pairs by leveraging contextual information from memory networks and syntactic information from Graph Convolutional Networks (GCN) over their dependency parses. A stream of this network also utilizes transfer learning by pre-training a bidirectional transformer to extract semantic representations for each input sentence, and incorporates external knowledge through the neighbourhood of entities from a Knowledge Base (KB). We benchmark these embeddings on the next sentence prediction task and significantly improve upon existing techniques. Furthermore, we use AMUSED to represent queries and responses along with their context to develop a retrieval-based conversational agent which has been validated by expert linguists to have comprehensive engagement with humans. |
Tasks | Transfer Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rJe6t1SFDB |
https://openreview.net/pdf?id=rJe6t1SFDB | |
PWC | https://paperswithcode.com/paper/amused-a-multi-stream-vector-representation |
Repo | |
Framework | |
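A sketch of the multi-stream fusion described above, with stand-in encoders (deterministic pseudo-random vectors in place of the transformer, dependency-parse GCN, and memory network): each stream yields an embedding, the streams are concatenated and projected into one unified embedding, and query/response pairs are scored by cosine similarity for retrieval.

```python
# Sketch: fusing semantic, syntactic, and memory/context streams into one embedding.
import zlib
import numpy as np

DIM = 32
rng = np.random.default_rng(0)
proj = rng.normal(size=(3 * DIM, DIM)) / np.sqrt(3 * DIM)

def _pseudo_encoder(text, salt):
    """Deterministic random vector per string: a stand-in for a trained encoder."""
    seed = zlib.crc32((salt + text).encode())
    return np.random.default_rng(seed).normal(size=DIM)

def semantic_stream(text):               # stand-in for the pre-trained transformer
    return _pseudo_encoder(text, "sem:")

def syntactic_stream(text):              # stand-in for the GCN over the dependency parse
    return _pseudo_encoder(text, "dep:")

def memory_stream(context):              # stand-in for the memory-network context encoding
    return np.mean([semantic_stream(c) for c in context], axis=0)

def embed(text, context):
    streams = np.concatenate([semantic_stream(text),
                              syntactic_stream(text),
                              memory_stream(context)])
    return streams @ proj                # fuse the streams into one unified embedding

def score(query, response, context):
    """Cosine similarity between query and candidate response embeddings."""
    q, r = embed(query, context), embed(response, context)
    return float(q @ r / (np.linalg.norm(q) * np.linalg.norm(r)))

context = ["hi there", "how are you?"]
print(score("what are you doing today?", "just reading a book", context))
```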