Paper Group NANR 119
Multi-objective Neural Architecture Search via Predictive Network Performance Optimization. High performance RNNs with spiking neurons. A Mechanism of Implicit Regularization in Deep Learning. Towards Principled Objectives for Contrastive Disentanglement. Multigrid Neural Memory. Frequency Analysis for Graph Convolution Network. A Graph Neural Netw …
Multi-objective Neural Architecture Search via Predictive Network Performance Optimization
Title | Multi-objective Neural Architecture Search via Predictive Network Performance Optimization |
Authors | Anonymous |
Abstract | Neural Architecture Search (NAS) has shown great potential in finding better neural network designs than human-designed ones. Sample-based NAS is the most fundamental method, aiming to explore the search space and evaluate the most promising architectures. However, few works have focused on improving the sampling efficiency of multi-objective NAS. Inspired by the graph structure of neural networks, we propose BOGCN-NAS, a NAS algorithm using Bayesian Optimization with a Graph Convolutional Network (GCN) predictor. Specifically, we apply the GCN as a surrogate model that adaptively discovers and incorporates node structure to approximate the performance of an architecture. For NAS-oriented tasks, we also design a weighted loss that focuses on architectures with high performance. Our method further supports an efficient multi-objective search which can be flexibly injected into any sample-based NAS pipeline to efficiently find the best speed/accuracy trade-off. Extensive experiments verify the effectiveness of our method over many competing methods, e.g. 128.4x more efficient than Random Search and 7.8x more efficient than the previous SOTA LaNAS for finding the best architecture on the largest NAS dataset, NasBench-101. |
Tasks | Neural Architecture Search |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rJgffkSFPS |
https://openreview.net/pdf?id=rJgffkSFPS | |
PWC | https://paperswithcode.com/paper/multi-objective-neural-architecture-search |
Repo | |
Framework | |
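The GCN-surrogate idea above is easy to sketch. Below is a minimal illustration (not the authors' code) of scoring a candidate architecture, encoded as a DAG adjacency matrix plus one-hot operation features, with a small GCN readout, together with a weighted loss that emphasizes high-performing architectures; all dimensions, weights, and the weighting power are illustrative assumptions.

```python
# Minimal sketch of a GCN surrogate for architecture performance prediction.
# Weight matrices, dimensions, and the weighting scheme are illustrative.
import numpy as np

def gcn_surrogate(adj, feats, w1, w2):
    """Two-layer GCN followed by mean readout -> predicted accuracy in [0, 1]."""
    a_hat = adj + np.eye(adj.shape[0])              # add self-loops
    d_inv = np.diag(1.0 / a_hat.sum(axis=1))        # degree normalization
    a_norm = d_inv @ a_hat
    h = np.maximum(a_norm @ feats @ w1, 0.0)        # layer 1 + ReLU
    h = np.maximum(a_norm @ h @ w2, 0.0)            # layer 2 + ReLU
    return 1.0 / (1.0 + np.exp(-h.mean()))          # graph readout -> sigmoid

def weighted_loss(pred, target, power=2.0):
    """Loss that up-weights architectures with high measured accuracy,
    so the surrogate is most faithful near the top of the search space."""
    return (target ** power) * (pred - target) ** 2

rng = np.random.default_rng(0)
adj = np.triu(rng.integers(0, 2, size=(7, 7)), k=1)   # random 7-node DAG
feats = np.eye(7)[rng.integers(0, 7, size=7)]          # one-hot operation features
w1, w2 = rng.normal(size=(7, 16)), rng.normal(size=(16, 16))
pred = gcn_surrogate(adj, feats, w1, w2)
print(weighted_loss(pred, target=0.93))
```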
High performance RNNs with spiking neurons
Title | High performance RNNs with spiking neurons |
Authors | Anonymous |
Abstract | The increasing need for compact and low-power computing solutions for machine learning applications has triggered a renaissance in the study of energy-efficient neural network accelerators. In particular, in-memory computing neuromorphic architectures have started to receive substantial attention from both academia and industry. However, most of these architectures rely on spiking neural networks, which typically perform poorly compared to their non-spiking counterparts in terms of accuracy. In this paper, we propose a new adaptive spiking neuron model that can also be abstracted as a low-pass filter. This abstraction enables faster and better training of spiking networks using back-propagation, without simulating spikes. We show that this model dramatically improves the inference performance of a recurrent neural network and validate it with three complex spatio-temporal learning tasks: the temporal addition task, the temporal copying task, and a spoken-phrase recognition task. Application of these results will lead to the development of powerful spiking models for neuromorphic hardware that solve relevant edge-computing and Internet-of-Things applications with high accuracy and ultra-low power consumption. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HyxnnnVtwB |
https://openreview.net/pdf?id=HyxnnnVtwB | |
PWC | https://paperswithcode.com/paper/high-performance-rnns-with-spiking-neurons |
Repo | |
Framework | |
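A rough sketch of the abstraction described above, under my own simplifying assumptions: an integrate-and-fire neuron whose low-pass-filtered spike train is approximated by an equally low-pass-filtered firing-rate estimate, which is spike-free and therefore usable with ordinary back-propagation. Time constants and the rate approximation are illustrative, not the paper's model.

```python
# Sketch: a spiking neuron and its low-pass-filter surrogate (illustrative only).
import numpy as np

def spiking_trace(inputs, tau=10.0, threshold=1.0):
    """Integrate-and-fire neuron; its spike train is low-pass filtered."""
    v, filtered, out = 0.0, 0.0, []
    for x in inputs:
        v += x / tau                              # integrate input current
        spike = float(v >= threshold)
        v -= spike * threshold                    # subtractive reset
        filtered += (spike - filtered) / tau      # low-pass filter the spikes
        out.append(filtered)
    return np.array(out)

def rate_surrogate(inputs, tau=10.0, threshold=1.0):
    """Spike-free surrogate: the same low-pass filter applied to the
    expected spikes-per-step, which needs no spike simulation."""
    r, out = 0.0, []
    for x in inputs:
        rate = max(x, 0.0) / (threshold * tau)    # expected spikes per step
        r += (rate - r) / tau
        out.append(r)
    return np.array(out)

t = np.linspace(0, 1, 200)
drive = 0.5 + 0.5 * np.sin(2 * np.pi * 3 * t)
print(np.corrcoef(spiking_trace(drive), rate_surrogate(drive))[0, 1])
```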
A Mechanism of Implicit Regularization in Deep Learning
Title | A Mechanism of Implicit Regularization in Deep Learning |
Authors | Anonymous |
Abstract | Despite considerable theoretical effort, very little is known about the mechanisms of implicit regularization by which low complexity contributes to generalization in deep learning. In particular, the causal relationship between generalization performance, implicit regularization, and the nonlinearity of activation functions is one of the basic mysteries of deep neural networks (DNNs). In this work, we introduce a novel technique for DNNs called random walk analysis and reveal a mechanism of implicit regularization caused by the nonlinearity of the ReLU activation. Surprisingly, our theoretical results suggest that learned DNNs interpolate almost linearly between data points, which leads to low-complexity solutions in the over-parameterized regime. As a result, we prove that stochastic gradient descent can learn a class of continuously differentiable functions with generalization bounds of the order of $O(n^{-2})$ ($n$: the number of samples). Furthermore, our analysis is independent of kernel methods, including neural tangent kernels. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HJx0U64FwS |
https://openreview.net/pdf?id=HJx0U64FwS | |
PWC | https://paperswithcode.com/paper/a-mechanism-of-implicit-regularization-in |
Repo | |
Framework | |
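A small numerical sketch of the claimed phenomenon (not the paper's random walk analysis): fit an over-parameterized 1-D ReLU network to a few points with full-batch gradient descent and measure how far its midpoint predictions deviate from straight-line interpolation between neighboring training points. Width, learning rate, and step count are arbitrary choices.

```python
# Sketch: does a trained ReLU net interpolate almost linearly between data points?
import numpy as np

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(-1, 1, size=8)); y = np.sin(3 * x)
m = 200                                            # hidden width (over-parameterized)
w = rng.normal(size=m); b = rng.normal(size=m); a = rng.normal(size=m) / np.sqrt(m)

def forward(xs, w, b, a):
    pre = np.outer(xs, w) + b                      # (n_points, m) pre-activations
    return np.maximum(pre, 0.0) @ a, pre

lr = 0.005
for _ in range(5000):
    f, pre = forward(x, w, b, a)
    g_out = 2 * (f - y) / len(x)                   # dLoss/df
    act = (pre > 0).astype(float)
    grad_a = np.maximum(pre, 0.0).T @ g_out
    grad_w = (act * a).T @ (g_out * x)
    grad_b = (act * a).T @ g_out
    a -= lr * grad_a; w -= lr * grad_w; b -= lr * grad_b

mid = (x[:-1] + x[1:]) / 2                          # midpoints between training points
pred_mid, _ = forward(mid, w, b, a)
linear_mid = (y[:-1] + y[1:]) / 2                   # straight-line interpolation
print("max deviation from linear interpolation:",
      round(float(np.abs(pred_mid - linear_mid).max()), 4))
```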
Towards Principled Objectives for Contrastive Disentanglement
Title | Towards Principled Objectives for Contrastive Disentanglement |
Authors | Anonymous |
Abstract | Unsupervised learning is an important tool that has received a significant amount of attention for decades. Its goal is 'unsupervised recovery,' i.e., extracting salient factors/properties from unlabeled data. Because of the challenges in defining salient properties, 'contrastive disentanglement' has recently gained popularity as a way to discover the additional variations that are enhanced in one dataset relative to another. Existing formulations have devised a variety of losses for this task. However, all present-day methods exhibit two major shortcomings: (1) encodings for data that do not exhibit salient factors are not pushed to carry no signal; and (2) the introduced losses are often hard to estimate and require additional trainable parameters. We present a new formulation for contrastive disentanglement which avoids both shortcomings by carefully formulating a probabilistic model and by using non-parametric yet easily computable metrics. We show on four challenging datasets that the proposed approach is able to better disentangle salient factors. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=B1lPETVFPS |
https://openreview.net/pdf?id=B1lPETVFPS | |
PWC | https://paperswithcode.com/paper/towards-principled-objectives-for-contrastive |
Repo | |
Framework | |
Multigrid Neural Memory
Title | Multigrid Neural Memory |
Authors | Anonymous |
Abstract | We introduce a novel architecture that integrates a large addressable memory space into the core functionality of a deep neural network. Our design distributes both memory addressing operations and storage capacity over many network layers. Distinct from strategies that connect neural networks to external memory banks, our approach co-locates memory with computation throughout the network structure. Mirroring recent architectural innovations in convolutional networks, we organize memory into a multiresolution hierarchy, whose internal connectivity enables learning of dynamic information routing strategies and data-dependent read/write operations. This multigrid spatial layout permits parameter-efficient scaling of memory size, allowing us to experiment with memories substantially larger than those in prior work. We demonstrate this capability on synthetic exploration and mapping tasks, where the network is able to self-organize and retain long-term memory for trajectories of thousands of time steps. On tasks decoupled from any notion of spatial geometry, such as sorting or associative recall, our design functions as a truly generic memory and yields results competitive with those of the recently proposed Differentiable Neural Computer. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=ByxKo04tvr |
https://openreview.net/pdf?id=ByxKo04tvr | |
PWC | https://paperswithcode.com/paper/multigrid-neural-memory-1 |
Repo | |
Framework | |
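A sketch of the memory layout only (the learned routing and read/write networks are omitted): memory as a pyramid of grids at several resolutions, with average pooling pushing summaries down the pyramid and nearest-neighbor upsampling broadcasting them back up. Grid sizes and the read/write rules are illustrative assumptions.

```python
# Sketch of a multigrid memory layout: a pyramid of grids connected by pooling
# and upsampling, so coarse levels give every location a cheap global summary.
import numpy as np

def downsample(grid):
    """2x2 average pooling to the next coarser level."""
    h, w = grid.shape
    return grid.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(grid):
    """Nearest-neighbor upsampling to the next finer level."""
    return grid.repeat(2, axis=0).repeat(2, axis=1)

class MultigridMemory:
    def __init__(self, size=16, levels=3):
        self.grids = [np.zeros((size >> i, size >> i)) for i in range(levels)]

    def write(self, y, x, value):
        self.grids[0][y, x] += value               # write at the finest level
        for i in range(1, len(self.grids)):        # push summaries down the pyramid
            self.grids[i] = downsample(self.grids[i - 1])

    def read(self, y, x):
        """Read the local value plus coarse summaries broadcast back up."""
        out = self.grids[0][y, x]
        for i in range(1, len(self.grids)):
            coarse = self.grids[i]
            for _ in range(i):
                coarse = upsample(coarse)
            out += coarse[y, x]
        return out

mem = MultigridMemory()
mem.write(3, 12, 1.0)
print(round(mem.read(3, 12), 4), round(mem.read(15, 0), 4))
```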
Frequency Analysis for Graph Convolution Network
Title | Frequency Analysis for Graph Convolution Network |
Authors | Anonymous |
Abstract | In this work, we develop quantitative results on the learnability of a two-layer Graph Convolutional Network (GCN). Instead of analyzing GCN under some class of functions, our approach provides a quantitative gap between a two-layer GCN and a two-layer MLP model. Our analysis is based on the graph signal processing (GSP) approach, which can provide much more useful insights than the message-passing computational model. Interestingly, based on our analysis, we are able to empirically demonstrate a few cases in which GCN and other state-of-the-art models cannot learn even when the true vertex features are extremely low-dimensional. To demonstrate our theoretical findings and propose a solution to the aforementioned adversarial cases, we build a proof-of-concept graph neural network model with stacked filters named Graph Filters Neural Network (gfNN). |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HylthC4twr |
https://openreview.net/pdf?id=HylthC4twr | |
PWC | https://paperswithcode.com/paper/frequency-analysis-for-graph-convolution |
Repo | |
Framework | |
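The gfNN construction lends itself to a short sketch (an illustration, not the authors' release): pre-filter the vertex features with a stacked normalized-adjacency filter, then fit a plain MLP on the filtered features, in contrast to a GCN that interleaves filtering with nonlinearities. Sizes and the 2-hop filter are illustrative.

```python
# Sketch of the gfNN idea: stacked graph filtering first, then an ordinary MLP.
import numpy as np

def normalized_adjacency(adj):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} used in GSP analyses."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))
    return d_inv_sqrt @ a_hat @ d_inv_sqrt

def graph_filter(adj, feats, k=2):
    """Stacked low-pass filter: propagate features k hops before any learning."""
    a_norm = normalized_adjacency(adj)
    for _ in range(k):
        feats = a_norm @ feats
    return feats

def mlp(feats, w1, w2):
    """Plain two-layer MLP applied node-wise to the pre-filtered features."""
    return np.maximum(feats @ w1, 0.0) @ w2

rng = np.random.default_rng(0)
adj = (rng.random((10, 10)) < 0.3).astype(float)
adj = np.triu(adj, 1); adj = adj + adj.T               # undirected graph
x = rng.normal(size=(10, 5))
w1, w2 = rng.normal(size=(5, 8)), rng.normal(size=(8, 3))
logits = mlp(graph_filter(adj, x, k=2), w1, w2)
print(logits.shape)   # (10, 3): per-node class scores
```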
A Graph Neural Network Assisted Monte Carlo Tree Search Approach to Traveling Salesman Problem
Title | A Graph Neural Network Assisted Monte Carlo Tree Search Approach to Traveling Salesman Problem |
Authors | Anonymous |
Abstract | We present a graph neural network assisted Monte Carlo Tree Search approach for the classical traveling salesman problem (TSP). We adopt a greedy algorithm framework to construct a solution to TSP by adding nodes successively. A graph neural network (GNN) is trained to capture the local and global graph structure and give the prior probability of selecting each vertex at every step. The prior probability provides a heuristic for MCTS, and the MCTS output is an improved probability for selecting the next vertex, since it fuses the prior with feedback from the scouting procedure. Experimental results on TSP instances with up to 100 nodes demonstrate that the proposed method obtains shorter tours than other learning-based methods. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Syg6fxrKDB |
https://openreview.net/pdf?id=Syg6fxrKDB | |
PWC | https://paperswithcode.com/paper/a-graph-neural-network-assisted-monte-carlo |
Repo | |
Framework | |
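A sketch of the greedy construction loop described above. The GNN prior is replaced by a softmax over negative distances and the MCTS refinement is collapsed to a greedy argmax, purely to show the control flow; none of this is the paper's implementation.

```python
# Sketch: greedy TSP tour construction guided by a prior over remaining vertices.
import numpy as np

def prior_probs(dist_row, visited):
    """Stand-in for the GNN prior: prefer near, unvisited vertices."""
    scores = -dist_row.copy()
    scores[list(visited)] = -np.inf                 # mask visited nodes
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()

def construct_tour(coords, start=0):
    """Add vertices successively, choosing each next node from the prior."""
    n = len(coords)
    dist = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
    tour, visited = [start], {start}
    while len(tour) < n:
        probs = prior_probs(dist[tour[-1]], visited)
        nxt = int(np.argmax(probs))                 # MCTS would refine this choice
        tour.append(nxt); visited.add(nxt)
    return tour, dist[tour, np.roll(tour, -1)].sum()

coords = np.random.default_rng(0).random((20, 2))
tour, length = construct_tour(coords)
print(len(tour), round(float(length), 3))
```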
Neural Design of Contests and All-Pay Auctions using Multi-Agent Simulation
Title | Neural Design of Contests and All-Pay Auctions using Multi-Agent Simulation |
Authors | Anonymous |
Abstract | We propose a multi-agent learning approach for designing crowdsourcing contests and all-pay auctions. Prizes in contests incentivise contestants to expend effort on their entries, with different prize allocations resulting in different incentives and bidding behaviors. In contrast to auctions designed manually by economists, our method searches the possible design space using a simulation of the multi-agent learning process, and can thus handle settings where a game-theoretic equilibrium analysis is not tractable. Our method simulates agent learning in contests and evaluates the utility of the resulting outcome for the auctioneer. Given a large contest design space, we assess through simulation many possible contest designs within the space, and fit a neural network to predict outcomes for previously untested contest designs. Finally, we apply mirror descent to optimize the design so as to achieve more desirable outcomes. Our empirical analysis shows our approach closely matches the optimal outcomes in settings where the equilibrium is known, and can produce high quality designs in settings where the equilibrium strategies are not solvable analytically. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Bklg1grtDr |
https://openreview.net/pdf?id=Bklg1grtDr | |
PWC | https://paperswithcode.com/paper/neural-design-of-contests-and-all-pay |
Repo | |
Framework | |
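The outer design loop can be sketched as follows, with heavy simplifications: the fitted neural surrogate is stood in for by a fixed concave "predicted designer utility", and mirror descent takes exponentiated-gradient steps so the prize allocation stays on the probability simplex. The utility function and step sizes are invented for illustration.

```python
# Sketch: mirror descent (exponentiated gradient) over a prize allocation.
import numpy as np

def predicted_utility(prizes):
    """Stand-in for the fitted network: rewards spreading some prize mass
    beyond first place, with diminishing returns."""
    return np.sqrt(prizes).sum() - 0.5 * prizes[0]

def grad(prizes, eps=1e-6):
    """Finite-difference gradient of the surrogate (keeps the sketch simple)."""
    g = np.zeros_like(prizes)
    for i in range(len(prizes)):
        e = np.zeros_like(prizes); e[i] = eps
        g[i] = (predicted_utility(prizes + e) - predicted_utility(prizes - e)) / (2 * eps)
    return g

def mirror_descent(n_prizes=4, steps=200, lr=0.1):
    """Exponentiated-gradient updates: the allocation stays a distribution."""
    p = np.ones(n_prizes) / n_prizes
    for _ in range(steps):
        p = p * np.exp(lr * grad(p))          # ascent on predicted utility
        p = p / p.sum()                        # project back onto the simplex
    return p

print(mirror_descent().round(3))
```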
A Random Matrix Perspective on Mixtures of Nonlinearities in High Dimensions
Title | A Random Matrix Perspective on Mixtures of Nonlinearities in High Dimensions |
Authors | Anonymous |
Abstract | One of the distinguishing characteristics of modern deep learning systems is that they typically employ neural network architectures that utilize enormous numbers of parameters, often in the millions and sometimes even in the billions. While this paradigm has inspired significant research on the properties of large networks, relatively little work has been devoted to the fact that these networks are often used to model large complex datasets, which may themselves contain millions or even billions of constraints. In this work, we focus on this high-dimensional regime in which both the dataset size and the number of features tend to infinity. We analyze the performance of a simple regression model trained on the random features $F=f(WX+B)$ for a random weight matrix $W$ and random bias vector $B$, obtaining an exact formula for the asymptotic training error on a noisy autoencoding task. The role of the bias can be understood as parameterizing a distribution over activation functions, and our analysis actually extends to general such distributions, even those not expressible with a traditional additive bias. Intriguingly, we find that a mixture of nonlinearities can outperform the best single nonlinearity on the noisy autoencoding task, suggesting that mixtures of nonlinearities might be useful for approximate kernel methods or neural network architecture design. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BJx7N1SKvB |
https://openreview.net/pdf?id=BJx7N1SKvB | |
PWC | https://paperswithcode.com/paper/a-random-matrix-perspective-on-mixtures-of |
Repo | |
Framework | |
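The random-features model in the abstract is straightforward to instantiate. The sketch below builds features $F=f(WX+B)$ with random $W$ and $B$ and runs ridge regression on a noisy autoencoding target; the choice $f=\mathrm{ReLU}$, the noise level, and all sizes are illustrative assumptions, and no asymptotic formula is computed here.

```python
# Sketch: random-features ridge regression on a noisy autoencoding task.
import numpy as np

rng = np.random.default_rng(0)
n, d, m, noise = 2000, 50, 400, 0.3          # samples, input dim, features, noise std

X = rng.normal(size=(d, n))
X_noisy = X + noise * rng.normal(size=X.shape)
W = rng.normal(size=(m, d)) / np.sqrt(d)     # random weights
B = rng.normal(size=(m, 1))                  # random bias (parameterizes the nonlinearity mix)
F = np.maximum(W @ X_noisy + B, 0.0)         # random features f(WX + B), f = ReLU

lam = 1e-2
# Ridge regression of the clean X onto the random features (noisy autoencoding).
A = F @ F.T / n + lam * np.eye(m)
coef = np.linalg.solve(A, F @ X.T / n)
train_err = np.mean((coef.T @ F - X) ** 2)
print(round(float(train_err), 4))
```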
Program Guided Agent
Title | Program Guided Agent |
Authors | Anonymous |
Abstract | Developing agents that can learn to follow natural language instructions has been an emerging research direction. While accessible and flexible, natural language instructions can sometimes be ambiguous even to humans. To address this, we propose to utilize programs, structured in a formal language, as a precise and expressive way to specify tasks. We then devise a modular framework that learns to perform a task specified by a program. Since different circumstances give rise to diverse ways to accomplish the task, our framework can perceive which circumstance it is currently under and instruct a multitask policy accordingly to fulfill each subtask of the overall task. Experimental results on a 2D Minecraft environment not only demonstrate that the proposed framework learns to reliably accomplish program instructions and achieves zero-shot generalization to more complex instructions, but also verify the efficiency of the proposed modulation mechanism for learning the multitask policy. We also conduct an analysis comparing various models which learn from programs and natural language instructions in an end-to-end fashion. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BkxUvnEYDH |
https://openreview.net/pdf?id=BkxUvnEYDH | |
PWC | https://paperswithcode.com/paper/program-guided-agent |
Repo | |
Framework | |
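A toy sketch of the program-guided control flow (hypothetical program syntax and environment, not the paper's 2D Minecraft setup): an interpreter walks a structured program, branches on the perceived state, and hands each primitive subtask to a single multitask policy.

```python
# Sketch: a tiny program interpreter dispatching subtasks to one multitask policy.
from dataclasses import dataclass, field

@dataclass
class State:
    wood: int = 0
    has_axe: bool = False
    log: list = field(default_factory=list)

def policy(subtask, state):
    """Stand-in for the learned multitask policy: executes one primitive."""
    if subtask == "get_axe":
        state.has_axe = True
    elif subtask == "chop_tree":
        state.wood += 1
    state.log.append(subtask)

def run_program(program, state):
    """Interpreter: dispatch on statement type, branch on the perceived state."""
    for stmt in program:
        if stmt[0] == "do":
            policy(stmt[1], state)
        elif stmt[0] == "if":
            _, cond, then_branch, else_branch = stmt
            run_program(then_branch if cond(state) else else_branch, state)
        elif stmt[0] == "repeat":
            for _ in range(stmt[1]):
                run_program(stmt[2], state)

program = [
    ("if", lambda s: not s.has_axe, [("do", "get_axe")], []),
    ("repeat", 3, [("do", "chop_tree")]),
]
s = State()
run_program(program, s)
print(s.log, s.wood)
```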
MissDeepCausal: causal inference from incomplete data using deep latent variable models
Title | MissDeepCausal: causal inference from incomplete data using deep latent variable models |
Authors | Anonymous |
Abstract | Inferring causal effects of a treatment, intervention or policy from observational data is central to many applications. However, state-of-the-art methods for causal inference seldom consider the possibility that covariates have missing values, which is ubiquitous in many real-world analyses. Missing data greatly complicate causal inference procedures as they require an adapted unconfoundedness hypothesis which can be difficult to justify in practice. We circumvent this issue by considering latent confounders whose distribution is learned through variational autoencoders adapted to missing values. They can be used as a pre-processing step prior to causal inference, but we also suggest embedding them in a multiple imputation strategy to account for the variability due to missing values. Numerical experiments demonstrate the effectiveness of the proposed methodology compared to competitors, especially for non-linear models. |
Tasks | Causal Inference, Imputation, Latent Variable Models |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SylpBgrKPH |
https://openreview.net/pdf?id=SylpBgrKPH | |
PWC | https://paperswithcode.com/paper/missdeepcausal-causal-inference-from |
Repo | |
Framework | |
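A sketch of where the learned confounders enter the estimate, under strong simplifications: the deep latent variable model adapted to missing values is replaced here by mean-plus-noise imputation followed by PCA latents, and the effect is averaged over several imputations. Data generation, sizes, and the PCA stand-in are my own assumptions, not the paper's method.

```python
# Sketch: multiple imputation + low-dimensional latent confounders + regression adjustment.
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 1000, 10, 2
Z = rng.normal(size=(n, k))                        # latent confounders
X = Z @ rng.normal(size=(k, d)) + 0.3 * rng.normal(size=(n, d))
mask = rng.random(X.shape) < 0.2                   # 20% of values missing
X_obs = np.where(mask, np.nan, X)
treat = (Z[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(float)
y = 2.0 * treat + Z[:, 1] + 0.3 * rng.normal(size=n)   # true effect = 2

def impute(X_obs, rng):
    """One stochastic imputation: column mean plus noise (stand-in for the VAE)."""
    X_imp = X_obs.copy()
    for j in range(X_obs.shape[1]):
        col = X_obs[:, j]
        miss = np.isnan(col)
        X_imp[miss, j] = np.nanmean(col) + np.nanstd(col) * rng.normal(size=miss.sum())
    return X_imp

def ate_with_latents(X_imp, treat, y, k):
    """Regression adjustment on low-dimensional latents (PCA here)."""
    Xc = X_imp - X_imp.mean(0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    latents = Xc @ vt[:k].T
    design = np.column_stack([np.ones_like(treat), treat, latents])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[1]                                  # coefficient on treatment

# Multiple imputation: average the effect estimate over several imputed datasets.
estimates = [ate_with_latents(impute(X_obs, rng), treat, y, k) for _ in range(5)]
print(round(float(np.mean(estimates)), 3))
```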
Data Valuation using Reinforcement Learning
Title | Data Valuation using Reinforcement Learning |
Authors | Anonymous |
Abstract | Quantifying the value of data is a fundamental problem in machine learning. Data valuation has multiple important use cases: (1) building insights about the learning task, (2) domain adaptation, (3) corrupted sample discovery, and (4) robust learning. To adaptively learn data values jointly with the target task predictor model, we propose a meta learning framework which we name Data Valuation using Reinforcement Learning (DVRL). We employ a data value estimator (modeled by a deep neural network) to learn how likely each datum is to be used in training of the predictor model. We train the data value estimator using a reinforcement signal based on the reward obtained on a small validation set that reflects performance on the target task. We demonstrate that DVRL yields superior data value estimates compared to alternative methods across different types of datasets and in a diverse set of application scenarios. The corrupted sample discovery performance of DVRL is close to optimal in many regimes (i.e. as if the noisy samples were known a priori), and for domain adaptation and robust learning DVRL significantly outperforms the state of the art by 14.6% and 10.8%, respectively. |
Tasks | Domain Adaptation, Meta-Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BJx8YnEFPH |
https://openreview.net/pdf?id=BJx8YnEFPH | |
PWC | https://paperswithcode.com/paper/data-valuation-using-reinforcement-learning-1 |
Repo | |
Framework | |
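The DVRL training signal can be sketched compactly under simplifying assumptions: a per-sample logit vector stands in for the deep value estimator, a least-squares model stands in for the predictor, and REINFORCE with a moving-average baseline updates the logits using validation loss as the reward. Everything below is illustrative, not the paper's implementation.

```python
# Sketch: data valuation with a REINFORCE signal from validation performance.
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d)); w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)
y[:40] += 3.0 * rng.normal(size=40)                # first 40 samples are corrupted
Xv = rng.normal(size=(100, d))
yv = Xv @ w_true + 0.1 * rng.normal(size=100)      # clean validation set

logits = np.zeros(n)                               # the "data value estimator"
baseline = None
for step in range(300):
    p = 1.0 / (1.0 + np.exp(-logits))              # selection probabilities
    sel = (rng.random(n) < p).astype(float)        # sample which data to use
    idx = sel.astype(bool)
    if idx.sum() < d:                              # guard against degenerate batches
        continue
    w, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
    reward = -np.mean((Xv @ w - yv) ** 2)          # validation performance
    baseline = reward if baseline is None else 0.9 * baseline + 0.1 * reward
    # REINFORCE: grad of the Bernoulli log-prob times the advantage.
    logits += 0.05 * (reward - baseline) * (sel - p)

p = 1.0 / (1.0 + np.exp(-logits))
print("mean value, corrupted vs clean:",
      round(float(p[:40].mean()), 3), round(float(p[40:].mean()), 3))
```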
Deep Graph Matching Consensus
Title | Deep Graph Matching Consensus |
Authors | Anonymous |
Abstract | This work presents a two-stage neural architecture for learning and refining structural correspondences between graphs. First, we use localized node embeddings computed by a graph neural network to obtain an initial ranking of soft correspondences between nodes. Second, we employ synchronous message passing networks to iteratively re-rank the soft correspondences to reach a matching consensus in local neighborhoods between graphs. We show, theoretically and empirically, that our message passing scheme computes a well-founded measure of consensus for corresponding neighborhoods, which is then used to guide the iterative re-ranking process. Our purely local and sparsity-aware architecture scales well to large, real-world inputs while still being able to recover global correspondences consistently. We demonstrate the practical effectiveness of our method on real-world tasks from the fields of computer vision and entity alignment between knowledge graphs, on which we improve upon the current state of the art. |
Tasks | Entity Alignment, Graph Matching, Knowledge Graphs |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HyeJf1HKvS |
https://openreview.net/pdf?id=HyeJf1HKvS | |
PWC | https://paperswithcode.com/paper/deep-graph-matching-consensus |
Repo | |
Framework | |
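A simplified stand-in (not the paper's architecture) for the two-stage idea: localized embeddings give initial soft correspondences, and a consensus term that checks whether matched nodes also have matching neighborhoods is used to re-rank them. The random "color" features and the update rule are my own simplifications.

```python
# Sketch: soft graph matching with an iterative neighborhood-consensus re-ranking.
import numpy as np

def softmax_rows(s):
    e = np.exp(s - s.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def embed(adj, feats, hops=2):
    """Localized node embedding: mean-aggregate features over a few hops."""
    a_hat = adj + np.eye(len(adj))
    a_norm = a_hat / a_hat.sum(axis=1, keepdims=True)
    for _ in range(hops):
        feats = a_norm @ feats
    return feats

def similarity(h1, h2):
    """Negative squared distance between embeddings."""
    diff = h1[:, None, :] - h2[None, :, :]
    return -(diff ** 2).sum(-1)

def match(adj1, x1, adj2, x2, iters=5, alpha=0.5):
    h1, h2 = embed(adj1, x1), embed(adj2, x2)
    s = softmax_rows(similarity(h1, h2))          # stage 1: initial soft matches
    colors = np.random.default_rng(0).normal(size=(len(adj2), 8))
    for _ in range(iters):                        # stage 2: consensus re-ranking
        msg1 = adj1 @ (s @ colors)                # G1 neighborhoods in matched colors
        msg2 = adj2 @ colors                      # G2 neighborhoods in their own colors
        consensus = msg1 @ msg2.T                 # large where neighborhoods agree
        s = softmax_rows(similarity(h1, h2) + alpha * consensus)
    return s

rng = np.random.default_rng(1)
adj = (rng.random((8, 8)) < 0.4).astype(float)
adj = np.triu(adj, 1); adj = adj + adj.T
perm = rng.permutation(8)
x = rng.normal(size=(8, 4))
s = match(adj, x, adj[perm][:, perm], x[perm])
print("fraction matched:", (s.argmax(axis=1) == np.argsort(perm)).mean())
```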
Behavior-Guided Reinforcement Learning
Title | Behavior-Guided Reinforcement Learning |
Authors | Anonymous |
Abstract | We introduce a new approach for comparing reinforcement learning policies, using Wasserstein distances (WDs) in a newly defined latent behavioral space. We show that by utilizing the dual formulation of the WD, we can learn score functions over trajectories that can be in turn used to lead policy optimization towards (or away from) (un)desired behaviors. Combined with smoothed WDs, the dual formulation allows us to devise efficient algorithms that take stochastic gradient descent steps through WD regularizers. We incorporate these regularizers into two novel on-policy algorithms, Behavior-Guided Policy Gradient and Behavior-Guided Evolution Strategies, which we demonstrate can outperform existing methods in a variety of challenging environments. We also provide an open source demo. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Hklo5RNtwS |
https://openreview.net/pdf?id=Hklo5RNtwS | |
PWC | https://paperswithcode.com/paper/behavior-guided-reinforcement-learning |
Repo | |
Framework | |
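One ingredient highlighted above, the smoothed Wasserstein distance, is easy to sketch: entropy-regularized optimal transport between two sets of trajectory embeddings, computed with Sinkhorn iterations. In the paper this term regularizes policy updates; here it is only evaluated on random stand-in embeddings, and the cost normalization is a numerical convenience.

```python
# Sketch: entropy-regularized (Sinkhorn) Wasserstein distance between behavior embeddings.
import numpy as np

def sinkhorn_distance(a_pts, b_pts, eps=0.1, iters=200):
    """Entropy-regularized optimal transport cost between two point clouds."""
    cost = ((a_pts[:, None, :] - b_pts[None, :, :]) ** 2).sum(-1)
    cost = cost / cost.mean()                      # rescale cost for numerical stability
    K = np.exp(-cost / eps)
    a = np.full(len(a_pts), 1.0 / len(a_pts))      # uniform weights on each cloud
    b = np.full(len(b_pts), 1.0 / len(b_pts))
    v = np.ones(len(b_pts))
    for _ in range(iters):                         # Sinkhorn scaling iterations
        u = a / (K @ v)
        v = b / (K.T @ u)
    plan = u[:, None] * K * v[None, :]             # approximate transport plan
    return (plan * cost).sum()                     # cost in normalized units

rng = np.random.default_rng(0)
current_policy = rng.normal(size=(64, 8))          # behavior embeddings of rollouts
target_behavior = rng.normal(size=(64, 8)) + 1.0   # embeddings of desired behavior
# A behavior-guided objective would combine task return with -lambda * this distance.
print(round(float(sinkhorn_distance(current_policy, target_behavior)), 3))
```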
AMUSED: A Multi-Stream Vector Representation Method for Use In Natural Dialogue
Title | AMUSED: A Multi-Stream Vector Representation Method for Use In Natural Dialogue |
Authors | Anonymous |
Abstract | The problem of building a coherent and non-monotonous conversational agent with proper discourse and coverage is still an area of open research. Current architectures only capture semantic and contextual information for a given query and fail to fully account for the syntactic and external knowledge that are crucial for generating responses in a chit-chat system. To overcome this problem, we propose an end-to-end multi-stream deep learning architecture which learns unified embeddings for query-response pairs by leveraging contextual information from memory networks and syntactic information from Graph Convolutional Networks (GCN) over their dependency parses. A stream of this network also utilizes transfer learning by pre-training a bidirectional transformer to extract semantic representations for each input sentence, and incorporates external knowledge through the neighbourhood of entities from a Knowledge Base (KB). We benchmark these embeddings on the next sentence prediction task and significantly improve upon existing techniques. Furthermore, we use AMUSED to represent queries and responses along with their context to develop a retrieval-based conversational agent which has been validated by expert linguists to have comprehensive engagement with humans. |
Tasks | Transfer Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rJe6t1SFDB |
https://openreview.net/pdf?id=rJe6t1SFDB | |
PWC | https://paperswithcode.com/paper/amused-a-multi-stream-vector-representation |
Repo | |
Framework | |
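A sketch of the multi-stream fusion described above, with stand-in encoders (deterministic pseudo-random vectors in place of the transformer, dependency-parse GCN, and memory network): each stream yields an embedding, the streams are concatenated and projected into one unified embedding, and query/response pairs are scored by cosine similarity for retrieval.

```python
# Sketch: fusing semantic, syntactic, and memory/context streams into one embedding.
import zlib
import numpy as np

DIM = 32
rng = np.random.default_rng(0)
proj = rng.normal(size=(3 * DIM, DIM)) / np.sqrt(3 * DIM)

def _pseudo_encoder(text, salt):
    """Deterministic random vector per string: a stand-in for a trained encoder."""
    seed = zlib.crc32((salt + text).encode())
    return np.random.default_rng(seed).normal(size=DIM)

def semantic_stream(text):               # stand-in for the pre-trained transformer
    return _pseudo_encoder(text, "sem:")

def syntactic_stream(text):              # stand-in for the GCN over the dependency parse
    return _pseudo_encoder(text, "dep:")

def memory_stream(context):              # stand-in for the memory-network context encoding
    return np.mean([semantic_stream(c) for c in context], axis=0)

def embed(text, context):
    streams = np.concatenate([semantic_stream(text),
                              syntactic_stream(text),
                              memory_stream(context)])
    return streams @ proj                # fuse the streams into one unified embedding

def score(query, response, context):
    """Cosine similarity between query and candidate response embeddings."""
    q, r = embed(query, context), embed(response, context)
    return float(q @ r / (np.linalg.norm(q) * np.linalg.norm(r)))

context = ["hi there", "how are you?"]
print(score("what are you doing today?", "just reading a book", context))
```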