April 1, 2020

3146 words 15 mins read

Paper Group NANR 68

Novelty Search in representational space for sample efficient exploration. Neural Oblivious Decision Ensembles for Deep Learning on Tabular Data. EXACT ANALYSIS OF CURVATURE CORRECTED LEARNING DYNAMICS IN DEEP LINEAR NETWORKS. Equilibrium Propagation with Continual Weight Updates. Mincut Pooling in Graph Neural Networks. Never Give Up: Learning Dir …

Novelty Search in representational space for sample efficient exploration

Title Novelty Search in representational space for sample efficient exploration
Authors Anonymous
Abstract We present a new approach for efficient exploration which leverages a low-dimensional encoding of the environment learned with a combination of model-based and model-free objectives. Our approach uses intrinsic rewards based on a weighted distance to the nearest neighbors in the low-dimensional representational space to gauge novelty. We then leverage these intrinsic rewards for sample-efficient exploration with planning routines in representational space. One key element of our approach is that we perform more gradient steps in between environment steps to ensure model accuracy. We test our approach on a number of maze tasks, as well as a control problem, and show that our exploration approach is more sample-efficient than strong baselines.
Tasks Efficient Exploration
Published 2020-01-01
URL https://openreview.net/forum?id=SJeoE0VKDS
PDF https://openreview.net/pdf?id=SJeoE0VKDS
PWC https://paperswithcode.com/paper/novelty-search-in-representational-space-for
Repo
Framework
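
The abstract above boils down to a k-nearest-neighbor novelty bonus computed in a learned latent space. Below is a minimal sketch of that idea, assuming a NumPy array `memory` of latent codes produced by some learned encoder; the paper's exact distance weighting is not specified in the abstract, so a plain average over the k nearest neighbors is used here.

```python
import numpy as np

def knn_novelty_reward(z, memory, k=5, eps=1e-8):
    """Novelty bonus for latent state z: mean distance to its k nearest
    neighbours among previously visited latent states (rows of `memory`).
    The paper uses a weighted distance; plain averaging is an assumption here."""
    if len(memory) == 0:
        return 1.0
    dists = np.linalg.norm(memory - z, axis=1)
    k = min(k, len(dists))
    nearest = np.sort(dists)[:k]
    return float(nearest.mean() + eps)

# toy usage: states encoded to 2-D latents by some learned encoder phi(s)
memory = np.random.randn(100, 2)          # latents of visited states
z_new = np.array([3.0, 3.0])              # far from the visited region -> high bonus
z_seen = memory[0] + 0.01                 # close to a visited state -> low bonus
print(knn_novelty_reward(z_new, memory), knn_novelty_reward(z_seen, memory))
```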

Neural Oblivious Decision Ensembles for Deep Learning on Tabular Data

Title Neural Oblivious Decision Ensembles for Deep Learning on Tabular Data
Authors Anonymous
Abstract Nowadays, deep neural networks (DNNs) have become the main instrument for machine learning tasks within a wide range of domains, including vision, NLP, and speech. Meanwhile, in the important case of heterogeneous tabular data, the advantage of DNNs over shallow counterparts remains questionable. In particular, there is no sufficient evidence that deep learning machinery allows constructing methods that outperform gradient boosting decision trees (GBDT), which are often the top choice for tabular problems. In this paper, we introduce Neural Oblivious Decision Ensembles (NODE), a new deep learning architecture designed to work with any tabular data. In a nutshell, the proposed NODE architecture generalizes ensembles of oblivious decision trees, but benefits from both end-to-end gradient-based optimization and the power of multi-layer hierarchical representation learning. With an extensive experimental comparison to the leading GBDT packages on a large number of tabular datasets, we demonstrate the advantage of the proposed NODE architecture, which outperforms the competitors on most of the tasks. We open-source the PyTorch implementation of NODE and believe that it will become a universal framework for machine learning on tabular data.
Tasks Representation Learning
Published 2020-01-01
URL https://openreview.net/forum?id=r1eiu2VtwH
PDF https://openreview.net/pdf?id=r1eiu2VtwH
PWC https://paperswithcode.com/paper/neural-oblivious-decision-ensembles-for-deep-1
Repo
Framework
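
The core building block described above is a differentiable oblivious decision tree: one feature and one threshold per depth level, shared across the whole tree, with soft splits so the structure can be trained by gradient descent. The sketch below is a simplified PyTorch illustration; NODE itself uses entmax for feature and threshold selection, whereas the softmax/sigmoid choices here are assumptions made for brevity.

```python
import torch, torch.nn as nn

class ObliviousTree(nn.Module):
    """Minimal differentiable oblivious decision tree (depth d -> 2**d leaves).
    NODE uses entmax for the choices below; softmax/sigmoid is a simplification."""
    def __init__(self, in_features, depth=3, out_features=1):
        super().__init__()
        self.depth = depth
        self.feat_logits = nn.Parameter(torch.zeros(depth, in_features))
        self.thresholds = nn.Parameter(torch.zeros(depth))
        self.log_temp = nn.Parameter(torch.zeros(depth))
        self.leaf_values = nn.Parameter(torch.randn(2 ** depth, out_features) * 0.1)

    def forward(self, x):                                          # x: (batch, in_features)
        feat_w = torch.softmax(self.feat_logits, dim=-1)           # soft feature choice per level
        f = x @ feat_w.t()                                         # (batch, depth)
        right = torch.sigmoid((f - self.thresholds) / self.log_temp.exp())  # P(go right)
        probs = torch.stack([1.0 - right, right], dim=-1)          # (batch, depth, 2)
        # leaf probability = product over depth of the chosen side's probability
        leaf_p = probs[:, 0, :]                                    # (batch, 2)
        for lvl in range(1, self.depth):
            leaf_p = (leaf_p.unsqueeze(-1) * probs[:, lvl, :].unsqueeze(1)).flatten(1)
        return leaf_p @ self.leaf_values                           # (batch, out_features)

x = torch.randn(8, 10)
ensemble = nn.ModuleList(ObliviousTree(10, depth=3) for _ in range(4))
y = torch.stack([t(x) for t in ensemble]).mean(0)                  # NODE-style ensemble output
```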

EXACT ANALYSIS OF CURVATURE CORRECTED LEARNING DYNAMICS IN DEEP LINEAR NETWORKS

Title EXACT ANALYSIS OF CURVATURE CORRECTED LEARNING DYNAMICS IN DEEP LINEAR NETWORKS
Authors Anonymous
Abstract Deep neural networks exhibit complex learning dynamics due to the highly non-convex loss landscape, which causes slow convergence and vanishing gradient problems. Second order approaches, such as natural gradient descent, mitigate such problems by neutralizing the effect of potentially ill-conditioned curvature on the gradient-based updates, yet a precise theoretical understanding of how such curvature correction affects the learning dynamics of deep networks has been lacking. Here, we analyze the dynamics of training deep neural networks under a generalized family of natural gradient methods that applies curvature corrections, and derive precise analytical solutions. Our analysis reveals that curvature corrected update rules preserve many features of gradient descent, such that the learning trajectory of each singular mode in natural gradient descent follows precisely the same path as gradient descent, while only accelerating the temporal dynamics along the path. We also show that layer-restricted approximations of natural gradient, which are widely used in most second order methods (e.g. K-FAC), can significantly distort the learning trajectory into highly diverging dynamics that differ from true natural gradient, which may lead to undesirable network properties. We also introduce fractional natural gradient that applies partial curvature correction, and show that it provides most of the benefit of full curvature correction in terms of convergence speed, with the additional benefit of superior numerical stability and neutralizing vanishing/exploding gradient problems, which holds true also in layer-restricted approximations.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=ryx4TlHKDS
PDF https://openreview.net/pdf?id=ryx4TlHKDS
PWC https://paperswithcode.com/paper/exact-analysis-of-curvature-corrected
Repo
Framework
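
The abstract gives no formulas, but the idea of "fractional" curvature correction can be illustrated on a toy ill-conditioned least-squares problem: precondition the gradient by F^-alpha, where F is the Gauss-Newton/Fisher curvature, so that alpha = 0 recovers plain gradient descent and alpha = 1 full natural gradient. The NumPy sketch below only illustrates that interpolation on a linear model, not the paper's analysis of deep linear networks.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) * np.array([3., 2., 1., 0.5, 0.1])   # ill-conditioned inputs
w_true = rng.normal(size=5)
y = X @ w_true

def fractional_natural_gradient_step(w, alpha, lr=0.1, damping=1e-3):
    """One update preconditioned by F^(-alpha), where F = X^T X / n is the
    Gauss-Newton / Fisher curvature of the squared loss. alpha=0 is plain
    gradient descent, alpha=1 full natural gradient."""
    n = X.shape[0]
    g = X.T @ (X @ w - y) / n
    F = X.T @ X / n + damping * np.eye(5)
    evals, evecs = np.linalg.eigh(F)
    precond = evecs @ np.diag(evals ** (-alpha)) @ evecs.T
    return w - lr * precond @ g

for alpha in (0.0, 0.5, 1.0):
    w = np.zeros(5)
    for _ in range(200):
        w = fractional_natural_gradient_step(w, alpha)
    print(alpha, np.linalg.norm(w - w_true))   # residual shrinks as alpha grows
```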

Equilibrium Propagation with Continual Weight Updates

Title Equilibrium Propagation with Continual Weight Updates
Authors Anonymous
Abstract Equilibrium Propagation (EP) is a learning algorithm that bridges Machine Learning and Neuroscience, by computing gradients closely matching those of Backpropagation Through Time (BPTT), but with a learning rule local in space. Given an input x and associated target y, EP proceeds in two phases: in the first phase neurons evolve freely towards a first steady state; in the second phase output neurons are nudged towards y until they reach a second steady state. However, in existing implementations of EP, the learning rule is not local in time: the weight update is performed after the dynamics of the second phase have converged and requires information of the first phase that is no longer available physically. This is a major impediment to the biological plausibility of EP and its efficient hardware implementation. In this work, we propose a version of EP named Continual Equilibrium Propagation (C-EP) where neuron and synapse dynamics occur simultaneously throughout the second phase, so that the weight update becomes local in time. We prove theoretically that, provided the learning rates are sufficiently small, at each time step of the second phase the dynamics of neurons and synapses follow the gradients of the loss given by BPTT (Theorem 1). We demonstrate training with C-EP on MNIST and generalize C-EP to neural networks where neurons are connected by asymmetric connections. We show through experiments that the more the network updates follow the gradients of BPTT, the better it performs in terms of training. These results bring EP a step closer to biology while maintaining its intimate link with backpropagation.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=H1xJhJStPS
PDF https://openreview.net/pdf?id=H1xJhJStPS
PWC https://paperswithcode.com/paper/equilibrium-propagation-with-continual-weight
Repo
Framework
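
A toy illustration of the two-phase structure and the time-local C-EP update can be written in a few lines: a free relaxation phase, followed by a nudged phase in which the weights are updated at every time step from the change in the activations before and after the step. The dynamics below are a simplified leaky recurrent system with symmetric weights, not the paper's exact energy function; the sizes, learning rate, and nudging strength are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 4, 8, 2
rho = np.tanh                                   # activation; the choice is an assumption

# symmetric lateral weights among the free (hidden + output) neurons, plus input weights
W = rng.normal(scale=0.1, size=(n_hid + n_out, n_hid + n_out)); W = (W + W.T) / 2
np.fill_diagonal(W, 0.0)
W_in = rng.normal(scale=0.1, size=(n_hid + n_out, n_in))
x, y = rng.normal(size=n_in), np.array([1.0, -1.0])

def step(s, beta, dt=0.05):
    """One step of simplified leaky neuron dynamics, nudged toward y when beta > 0."""
    drive = -(s - W @ rho(s) - W_in @ x)
    nudge = np.zeros_like(s); nudge[n_hid:] = y - s[n_hid:]
    return s + dt * (drive + beta * nudge)

# phase 1: free relaxation, no weight updates
s = np.zeros(n_hid + n_out)
for _ in range(200):
    s = step(s, beta=0.0)

# phase 2 (C-EP): neurons and synapses updated together at every time step
beta, eta = 0.1, 0.01
for _ in range(100):
    s_next = step(s, beta)
    W += (eta / beta) * (np.outer(rho(s_next), rho(s_next)) - np.outer(rho(s), rho(s)))
    W = (W + W.T) / 2; np.fill_diagonal(W, 0.0)   # keep W symmetric, no self-connections
    s = s_next
```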

Mincut Pooling in Graph Neural Networks

Title Mincut Pooling in Graph Neural Networks
Authors Anonymous
Abstract The advance of node pooling operations in Graph Neural Networks (GNNs) has lagged behind the feverish design of new message-passing techniques, and pooling remains an important and challenging endeavor for the design of deep architectures. In this paper, we propose a pooling operation for GNNs that leverages a differentiable unsupervised loss based on the minCut optimization objective. For each node, our method learns a soft cluster assignment vector that depends on the node features, the target inference task (e.g., a graph classification loss), and, thanks to the minCut objective, also on the connectivity structure of the graph. Graph pooling is obtained by applying the matrix of assignment vectors to the adjacency matrix and the node features. We validate the effectiveness of the proposed pooling method on a variety of supervised and unsupervised tasks.
Tasks Graph Classification
Published 2020-01-01
URL https://openreview.net/forum?id=BkxfshNYwB
PDF https://openreview.net/pdf?id=BkxfshNYwB
PWC https://paperswithcode.com/paper/mincut-pooling-in-graph-neural-networks-1
Repo
Framework
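
The pooling operation described above reduces to three ingredients: a soft cluster assignment matrix produced from the node features, pooled features and adjacency obtained by projecting through that matrix, and an unsupervised minCut-plus-orthogonality loss added to the task loss. A minimal dense PyTorch sketch (the single-graph, dense-adjacency setting and the tiny MLP are assumptions):

```python
import torch, torch.nn as nn

def mincut_pool(x, adj, mlp, eps=1e-9):
    """x: (N, F) node features, adj: (N, N) adjacency. Returns pooled features,
    pooled adjacency, and the unsupervised minCut + orthogonality loss."""
    s = torch.softmax(mlp(x), dim=-1)                 # (N, K) soft cluster assignments
    x_pool = s.t() @ x                                # (K, F)
    a_pool = s.t() @ adj @ s                          # (K, K)
    deg = torch.diag(adj.sum(-1))
    cut_loss = -torch.trace(s.t() @ adj @ s) / (torch.trace(s.t() @ deg @ s) + eps)
    ss = s.t() @ s
    k = s.shape[-1]
    ortho_loss = torch.norm(ss / torch.norm(ss) - torch.eye(k) / k ** 0.5)
    return x_pool, a_pool, cut_loss + ortho_loss

# toy usage: 10 nodes with 6 features pooled into 3 clusters
x, adj = torch.randn(10, 6), (torch.rand(10, 10) > 0.7).float()
adj = ((adj + adj.t()) > 0).float()
mlp = nn.Linear(6, 3)
x_p, a_p, aux_loss = mincut_pool(x, adj, mlp)   # add aux_loss to the supervised task loss
```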

Never Give Up: Learning Directed Exploration Strategies

Title Never Give Up: Learning Directed Exploration Strategies
Authors Anonymous
Abstract We propose a reinforcement learning agent to solve hard exploration games by learning a range of directed exploratory policies. We construct an episodic memory-based intrinsic reward using k-nearest neighbors over the agent’s recent experience to train the directed exploratory policies, thereby encouraging the agent to repeatedly revisit all states in its environment. A self-supervised inverse dynamics model is used to train the embeddings of the nearest neighbour lookup, biasing the novelty signal towards what the agent can control. We employ the framework of Universal Value Function Approximators to simultaneously learn many directed exploration policies with the same neural network, with different trade-offs between exploration and exploitation. By using the same neural network for different degrees of exploration/exploitation, transfer is demonstrated from predominantly exploratory policies to effective exploitative policies. The proposed method can be incorporated into modern distributed RL agents that collect large amounts of experience from many actors running in parallel on separate environment instances. Our method doubles the performance of the base agent on all hard exploration games in the Atari-57 suite while maintaining a very high score across the remaining games, obtaining a median human normalised score of 1344.0%. Notably, the proposed method is the first algorithm to achieve non-zero rewards (with a mean score of 8,400) in the game of Pitfall! without using demonstrations or hand-crafted features.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=Sye57xStvB
PDF https://openreview.net/pdf?id=Sye57xStvB
PWC https://paperswithcode.com/paper/never-give-up-learning-directed-exploration
Repo
Framework
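
The episodic intrinsic reward described above can be sketched as an inverse kernel-weighted count of the k nearest embeddings stored during the current episode, with the embeddings assumed to come from the self-supervised inverse-dynamics encoder. The kernel shape and constants below are illustrative assumptions, not the paper's exact values.

```python
import numpy as np

def episodic_intrinsic_reward(emb, memory, k=10, eps=1e-3, c=1e-3):
    """Episodic novelty bonus in the spirit of NGU: the inverse square root of the
    kernel-weighted similarity to the k nearest controllable-state embeddings
    stored this episode. Kernel and normalisation constants are assumptions."""
    if len(memory) == 0:
        return 1.0
    d2 = np.sum((np.asarray(memory) - emb) ** 2, axis=1)
    d2 = np.sort(d2)[:min(k, len(d2))]
    d2 = d2 / (d2.mean() + 1e-8)          # normalise by the mean kNN squared distance
    kernel = eps / (d2 + eps)
    return 1.0 / np.sqrt(kernel.sum() + c)

memory = [np.random.randn(8) for _ in range(50)]   # embeddings from the inverse-dynamics encoder
print(episodic_intrinsic_reward(np.random.randn(8) * 5, memory))   # novel state -> larger bonus
print(episodic_intrinsic_reward(memory[0], memory))                # familiar state -> smaller bonus
```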

Semi-supervised Autoencoding Projective Dependency Parsing

Title Semi-supervised Autoencoding Projective Dependency Parsing
Authors Anonymous
Abstract We describe two end-to-end autoencoding models for semi-supervised graph-based dependency parsing. The first model is a Local Autoencoding Parser (LAP) encoding the input using continuous latent variables in a sequential manner; the second model is a Global Autoencoding Parser (GAP) encoding the input into dependency trees as latent variables, with exact inference. Both models consist of two parts: an encoder enhanced by deep neural networks (DNN) that can utilize the contextual information to encode the input into latent variables, and a decoder which is a generative model able to reconstruct the input. Both LAP and GAP admit a unified structure with different loss functions for labeled and unlabeled data with shared parameters. We conducted experiments on WSJ and UD dependency parsing data sets, showing that our models can exploit the unlabeled data to boost the performance given a limited amount of labeled data.
Tasks Dependency Parsing
Published 2020-01-01
URL https://openreview.net/forum?id=B1lsFlrKDr
PDF https://openreview.net/pdf?id=B1lsFlrKDr
PWC https://paperswithcode.com/paper/semi-supervised-autoencoding-projective
Repo
Framework
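
Both models share the pattern of a single encoder feeding a supervised parsing loss on labeled sentences and a reconstruction loss on unlabeled ones. The sketch below only illustrates that shared-parameter, two-loss structure with a bilinear arc scorer and a word-reconstruction decoder; all module sizes are assumptions, and GAP's exact inference over latent trees is omitted.

```python
import torch, torch.nn as nn, torch.nn.functional as F

# Hypothetical sizes; the bilinear scorer and word-reconstruction target are assumptions.
d_emb, d_hid, vocab = 64, 128, 1000
embed = nn.Embedding(vocab, d_emb)
encoder = nn.LSTM(d_emb, d_hid, batch_first=True, bidirectional=True)
W_arc = torch.randn(2 * d_hid, 2 * d_hid) * 0.01          # bilinear head-dependent scorer
decoder = nn.Linear(2 * d_hid, vocab)                      # generative word reconstruction

def loss(tokens, heads=None):
    h, _ = encoder(embed(tokens))                          # (B, T, 2*d_hid), shared encoder
    recon = F.cross_entropy(decoder(h).transpose(1, 2), tokens)
    if heads is None:                                      # unlabeled data: reconstruction only
        return recon
    scores = torch.einsum('bih,hk,bjk->bij', h, W_arc, h)  # (B, T, T) arc scores
    B, T, _ = scores.shape
    parse = F.cross_entropy(scores.reshape(B * T, T), heads.reshape(-1))
    return parse + recon                                   # labeled data: both terms

tokens, heads = torch.randint(0, vocab, (4, 12)), torch.randint(0, 12, (4, 12))
print(loss(tokens, heads).item(), loss(tokens).item())
```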

CAT: Compression-Aware Training for bandwidth reduction

Title CAT: Compression-Aware Training for bandwidth reduction
Authors Anonymous
Abstract Convolutional neural networks (CNNs) have become the dominant neural network architecture for solving visual processing tasks. One of the major obstacles hindering the ubiquitous use of CNNs for inference is their relatively high memory bandwidth requirements, which can be a main energy consumer and throughput bottleneck in hardware accelerators. Accordingly, an efficient feature map compression method can result in substantial performance gains. Inspired by quantization-aware training approaches, we propose a compression-aware training (CAT) method that involves training the model in a way that allows better compression of feature maps during inference. Our method trains the model to achieve low-entropy feature maps, which enables efficient compression at inference time using classical transform coding methods. CAT significantly improves the state-of-the-art results reported for quantization. For example, on ResNet-34 we achieve 73.1% accuracy (0.2% degradation from the baseline) with an average representation of only 1.79 bits per value. Reference implementation accompanies the paper.
Tasks Quantization
Published 2020-01-01
URL https://openreview.net/forum?id=HkxCcJHtPr
PDF https://openreview.net/pdf?id=HkxCcJHtPr
PWC https://paperswithcode.com/paper/cat-compression-aware-training-for-bandwidth
Repo
Framework
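
One way to read "training the model to achieve low-entropy feature maps" is as an auxiliary, differentiable entropy estimate of the activations added to the task loss. The sketch below uses a soft-histogram entropy as that regularizer; the bin placement, temperature, and weighting are assumptions rather than the paper's exact formulation.

```python
import torch, torch.nn.functional as F

def soft_entropy(x, n_bins=16, temperature=0.1):
    """Differentiable estimate of the entropy of a feature map, via soft assignment
    of each activation value to histogram bins. Minimising it pushes activations
    toward a low-entropy, easily compressible distribution."""
    x = x.flatten()
    lo, hi = x.min().detach(), x.max().detach()
    centers = torch.linspace(float(lo), float(hi), n_bins, device=x.device)
    logits = -((x.unsqueeze(1) - centers) ** 2) / temperature   # soft bin membership
    p = torch.softmax(logits, dim=1).mean(0)                    # soft histogram, sums to 1
    return -(p * (p + 1e-12).log()).sum()

# usage inside a training step: task loss + lambda * entropy of the feature maps
feat = torch.relu(torch.randn(32, 64, 8, 8, requires_grad=True))
loss = F.cross_entropy(torch.randn(32, 10, requires_grad=True), torch.randint(0, 10, (32,)))
loss = loss + 0.01 * soft_entropy(feat)
loss.backward()
```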

Equivariant neural networks and equivarification

Title Equivariant neural networks and equivarification
Authors Anonymous
Abstract A key difference from existing works is that our equivarification method can be applied without knowledge of the detailed functions of a layer in a neural network, and hence, can be generalized to any feedforward neural networks. Although the network size scales up, the constructed equivariant neural network does not increase the complexity of the network compared with the original one, in terms of the number of parameters. As an illustration, we build an equivariant neural network for image classification by equivarifying a convolutional neural network. Results show that our proposed method significantly reduces the design and training complexity, while preserving the learning performance in terms of accuracy.
Tasks Image Classification
Published 2020-01-01
URL https://openreview.net/forum?id=BkxDthVtvS
PDF https://openreview.net/pdf?id=BkxDthVtvS
PWC https://paperswithcode.com/paper/equivariant-neural-networks-and-1
Repo
Framework
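
The construction hinted at above, applying a given layer to every group-transformed copy of the input and stacking the results, can be shown concretely for the C4 rotation group: rotating the input then simply permutes the stacked outputs. The wrapper below is a simplified sketch of that idea, not the paper's full construction.

```python
import torch, torch.nn as nn

class C4Equivarified(nn.Module):
    """Equivarify an arbitrary feature extractor f under the C4 rotation group:
    apply f to all four rotations of the input and stack the outputs. Rotating the
    input then permutes the four output slots (equivariance)."""
    def __init__(self, f):
        super().__init__()
        self.f = f
    def forward(self, x):                       # x: (batch, C, H, W)
        outs = [self.f(torch.rot90(x, k, dims=(2, 3))) for k in range(4)]
        return torch.stack(outs, dim=1)         # (batch, 4, ...)

f = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1), nn.Flatten())
net = C4Equivarified(f)
x = torch.randn(2, 1, 28, 28)
y, y_rot = net(x), net(torch.rot90(x, 1, dims=(2, 3)))
print(torch.allclose(y_rot, torch.roll(y, shifts=-1, dims=1), atol=1e-5))  # slots are permuted
```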

The Variational InfoMax AutoEncoder

Title The Variational InfoMax AutoEncoder
Authors Anonymous
Abstract We propose the Variational InfoMax AutoEncoder (VIMAE), an autoencoder based on a new learning principle for unsupervised models: the Capacity-Constrained InfoMax, which allows the learning of a disentangled representation while maintaining optimal generative performance. The variational capacity of an autoencoder is defined and we investigate its role. We associate the two main properties of a Variational AutoEncoder (VAE), generation quality and disentangled representation, to two different information concepts, respectively Mutual Information and network capacity. We deduce that a small capacity autoencoder tends to learn a more robust and disentangled representation than a high capacity one. This observation is confirmed by the computational experiments.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=r1eUukrtwH
PDF https://openreview.net/pdf?id=r1eUukrtwH
PWC https://paperswithcode.com/paper/the-variational-infomax-autoencoder-1
Repo
Framework
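
The abstract defines the method through its learning principle rather than a formula, but a capacity-constrained objective can be sketched in the usual way: a reconstruction term plus a penalty keeping the KL (capacity) term near a target value C. Whether this matches VIMAE's exact objective is an assumption; the code below only illustrates the generic capacity-constrained recipe on a tiny VAE.

```python
import torch, torch.nn as nn

class TinyVAE(nn.Module):
    def __init__(self, d_in=20, d_z=4):
        super().__init__()
        self.enc = nn.Linear(d_in, 2 * d_z)
        self.dec = nn.Linear(d_z, d_in)
    def forward(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterised sample
        return self.dec(z), mu, logvar

def capacity_constrained_loss(x, x_hat, mu, logvar, capacity=2.0, gamma=10.0):
    """Reconstruction plus a penalty keeping the KL term near a target capacity C.
    This is the generic capacity-constrained VAE recipe, used here only to
    illustrate the idea of limiting the autoencoder's variational capacity."""
    recon = ((x - x_hat) ** 2).sum(-1).mean()
    kl = 0.5 * (mu ** 2 + logvar.exp() - 1.0 - logvar).sum(-1).mean()
    return recon + gamma * (kl - capacity).abs()

vae = TinyVAE()
x = torch.randn(64, 20)
loss = capacity_constrained_loss(x, *vae(x))
loss.backward()
```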

Group-Connected Multilayer Perceptron Networks

Title Group-Connected Multilayer Perceptron Networks
Authors Anonymous
Abstract Despite the success of deep learning in domains such as image, voice, and graphs, there has been little progress in deep representation learning for domains without a known structure between features. Consider, for instance, a tabular dataset of different demographic and clinical factors where the feature interactions are not known a priori. In this paper, we propose Group-Connected Multilayer Perceptron (GMLP) networks to enable deep representation learning in these domains. GMLP is based on the idea of learning expressive feature combinations (groups) and exploiting them to reduce the network complexity by defining local group-wise operations. During the training phase, GMLP learns a sparse feature grouping matrix using a temperature-annealed softmax with an added entropy loss term to encourage sparsity. Furthermore, an architecture is suggested which resembles binary trees, where group-wise operations are followed by pooling operations to combine information, reducing the number of groups as the network grows in depth. To evaluate the proposed method, we conducted experiments on five different real-world datasets covering various application areas. Additionally, we provide visualizations on MNIST and synthesized data. According to the results, GMLP is able to successfully learn and exploit expressive feature combinations and achieve state-of-the-art classification performance on different datasets.
Tasks Representation Learning
Published 2020-01-01
URL https://openreview.net/forum?id=SJg4Y3VFPS
PDF https://openreview.net/pdf?id=SJg4Y3VFPS
PWC https://paperswithcode.com/paper/group-connected-multilayer-perceptron
Repo
Framework
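
The grouping mechanism described above, a sparse feature-to-group assignment learned with a temperature-annealed softmax and an entropy penalty, is easy to sketch in isolation. The code below shows only that mechanism; the tree-like stack of group-wise operations and pooling layers is omitted, and the annealing schedule is an assumption.

```python
import torch, torch.nn as nn

class SoftFeatureGrouping(nn.Module):
    """Learn a soft feature-to-group assignment with a temperature-annealed softmax
    plus an entropy penalty that pushes each feature toward a single group."""
    def __init__(self, n_features, n_groups):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(n_features, n_groups))
    def forward(self, x, temperature):
        g = torch.softmax(self.logits / temperature, dim=-1)   # (F, G) grouping matrix
        grouped = torch.einsum('bf,fg->bgf', x, g)             # feature f weighted into group g
        entropy = -(g * (g + 1e-12).log()).sum(-1).mean()      # add to the loss to sharpen rows
        return grouped, entropy

grouping = SoftFeatureGrouping(n_features=16, n_groups=4)
x = torch.randn(32, 16)
for step in range(3):
    temperature = max(0.1, 1.0 * 0.95 ** step)                 # annealing schedule (assumed)
    grouped, entropy_loss = grouping(x, temperature)
    group_repr = grouped.sum(-1)                               # (batch, G) toy group-wise operation
```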

A Mutual Information Maximization Perspective of Language Representation Learning

Title A Mutual Information Maximization Perspective of Language Representation Learning
Authors Anonymous
Abstract We show that state-of-the-art word representation learning methods maximize an objective function that is a lower bound on the mutual information between different parts of a word sequence (i.e., a sentence). Our formulation provides an alternative perspective that unifies classical word embedding models (e.g., Skip-gram) and modern contextual embeddings (e.g., BERT, XLNet). In addition to enhancing our theoretical understanding of these methods, our derivation leads to a principled framework that can be used to construct new self-supervised tasks. We provide an example by drawing inspiration from related methods based on mutual information maximization that have been successful in computer vision, and introduce a simple self-supervised objective that maximizes the mutual information between a global sentence representation and n-grams in the sentence. Our analysis offers a holistic view of representation learning methods to transfer knowledge and translate progress across multiple domains (e.g., natural language processing, computer vision, audio processing).
Tasks Representation Learning
Published 2020-01-01
URL https://openreview.net/forum?id=Syx79eBKwr
PDF https://openreview.net/pdf?id=Syx79eBKwr
PWC https://paperswithcode.com/paper/a-mutual-information-maximization-perspective
Repo
Framework
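
The proposed self-supervised objective is an InfoNCE-style lower bound on the mutual information between a global sentence representation and an n-gram representation, with the other sentences in the batch serving as negatives. A minimal sketch, assuming the two encoders already exist and produce fixed-size vectors:

```python
import torch, torch.nn.functional as F

def infonce(global_repr, ngram_repr, temperature=0.1):
    """InfoNCE lower bound on the mutual information between a global sentence
    representation and one n-gram representation per sentence; the other
    sentences in the batch act as negatives."""
    g = F.normalize(global_repr, dim=-1)        # (B, D) sentence views
    n = F.normalize(ngram_repr, dim=-1)         # (B, D) n-gram views, positives aligned by index
    logits = g @ n.t() / temperature            # (B, B) similarity of every pair
    labels = torch.arange(g.size(0))
    return F.cross_entropy(logits, labels)      # maximising the MI bound <-> minimising this loss

loss = infonce(torch.randn(16, 128), torch.randn(16, 128))   # stand-in encoder outputs
```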

Prune or quantize? Strategy for Pareto-optimally low-cost and accurate CNN

Title Prune or quantize? Strategy for Pareto-optimally low-cost and accurate CNN
Authors Anonymous
Abstract Pruning and quantization are typical approaches to reduce the computational cost of CNN inference. Although the idea of combining them seems natural, it is unexpectedly difficult to figure out the resultant effect of the combination without measuring the performance on the particular hardware a user is going to use. This is because the benefits of pruning and quantization strongly depend on the hardware architecture where the model is executed. For example, a CPU-like architecture without any parallelization may fully exploit the reduction of computations by unstructured pruning for speeding up, but a GPU-like massively parallel architecture would not. Besides, novel hardware architectures are emerging, such as those supporting variable bit precision quantization. From an engineering viewpoint, optimization for each hardware architecture is useful and important in practice, but this is quite a brute-force approach. Therefore, in this paper, we first propose a hardware-agnostic metric to measure the computational cost. Using this metric, we demonstrate that Pareto-optimal performance, where the best accuracy is obtained at a given computational cost, is achieved when a slim model with a smaller number of parameters is quantized moderately, rather than when a fat model with a huge number of parameters is quantized to extremely low bit precision such as binary or ternary. Furthermore, we empirically find a possible quantitative relation between the proposed metric and the signal-to-noise ratio during SGD training, by which the information obtained during SGD training provides the optimal policy for quantization and pruning. We show the Pareto frontier is improved by 4 times in a post-training quantization scenario based on these findings. These findings can be used not only to improve the Pareto frontier for accuracy vs. computational cost, but also to gain new insights into deep neural networks.
Tasks Quantization
Published 2020-01-01
URL https://openreview.net/forum?id=HkxAS6VFDB
PDF https://openreview.net/pdf?id=HkxAS6VFDB
PWC https://paperswithcode.com/paper/prune-or-quantize-strategy-for-pareto
Repo
Framework
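
The abstract does not spell out the proposed metric, so the sketch below uses a common hardware-agnostic proxy, bit-operations (MAC count scaled by operand precision and by the surviving weight fraction after pruning), purely to illustrate how such a cost can be compared across pruning/quantization configurations. It should not be read as the paper's metric.

```python
def layer_cost_bops(macs, weight_bits, act_bits, weight_density=1.0):
    """A common hardware-agnostic cost proxy (bit-operations): multiply-accumulate
    count scaled by operand precision and by the fraction of weights that survive
    pruning. The paper defines its own metric; this is only an illustration."""
    return macs * weight_bits * act_bits * weight_density

# compare a fat, aggressively quantized layer against a slim, moderately quantized one
fat = layer_cost_bops(macs=8e9, weight_bits=2, act_bits=8, weight_density=1.0)
slim = layer_cost_bops(macs=1e9, weight_bits=8, act_bits=8, weight_density=0.5)
print(f"fat/2-bit: {fat:.2e} bit-ops, slim/8-bit: {slim:.2e} bit-ops")
```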

Disentanglement through Nonlinear ICA with General Incompressible-flow Networks (GIN)

Title Disentanglement through Nonlinear ICA with General Incompressible-flow Networks (GIN)
Authors Anonymous
Abstract A central question of representation learning asks under which conditions it is possible to reconstruct the true latent variables of an arbitrarily complex generative process. Recent breakthrough work by Khemakhem et al. (2019) on nonlinear ICA has answered this question for a broad class of conditional generative processes. We extend this important result in a direction relevant for application to real-world data. First, we generalize the theory to the case of unknown intrinsic problem dimension and prove that in some special (but not very restrictive) cases, informative latent variables will be automatically separated from noise by an estimating model. Furthermore, the recovered informative latent variables will be in one-to-one correspondence with the true latent variables of the generating process, up to a trivial component-wise transformation. Second, we introduce a modification of the RealNVP invertible neural network architecture (Dinh et al. (2016)) which is particularly suitable for this type of problem: the General Incompressible-flow Network (GIN). Experiments on artificial data and EMNIST demonstrate that theoretical predictions are indeed verified in practice. In particular, we provide a detailed set of exactly 22 informative latent variables extracted from EMNIST.
Tasks Representation Learning
Published 2020-01-01
URL https://openreview.net/forum?id=rygeHgSFDH
PDF https://openreview.net/pdf?id=rygeHgSFDH
PWC https://paperswithcode.com/paper/disentanglement-through-nonlinear-ica-with
Repo
Framework
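
The architectural change relative to RealNVP is that each affine coupling layer is made volume-preserving ("incompressible"): the per-dimension log-scales are constrained to sum to zero, so the Jacobian determinant is one. The sketch below enforces this by mean-subtracting the predicted log-scales; the paper's exact parameterization may differ.

```python
import torch, torch.nn as nn

class GINCoupling(nn.Module):
    """RealNVP-style affine coupling made volume-preserving: the per-dimension
    log-scales are shifted to sum to zero, so log|det J| = 0."""
    def __init__(self, dim):
        super().__init__()
        self.d = dim // 2
        self.net = nn.Sequential(nn.Linear(self.d, 64), nn.ReLU(),
                                 nn.Linear(64, 2 * (dim - self.d)))
    def forward(self, x):
        x1, x2 = x[:, :self.d], x[:, self.d:]
        s, t = self.net(x1).chunk(2, dim=-1)
        s = s - s.mean(dim=-1, keepdim=True)        # enforce sum of log-scales = 0
        return torch.cat([x1, x2 * s.exp() + t], dim=-1)
    def inverse(self, y):
        y1, y2 = y[:, :self.d], y[:, self.d:]
        s, t = self.net(y1).chunk(2, dim=-1)
        s = s - s.mean(dim=-1, keepdim=True)
        return torch.cat([y1, (y2 - t) * (-s).exp()], dim=-1)

layer = GINCoupling(8)
x = torch.randn(4, 8)
print(torch.allclose(layer.inverse(layer(x)), x, atol=1e-5))   # invertibility check
```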

Learning Time-Aware Assistance Functions for Numerical Fluid Solvers

Title Learning Time-Aware Assistance Functions for Numerical Fluid Solvers
Authors Kiwon Um, Yun (Raymond) Fei, Philipp Holl, Nils Thuerey
Abstract Improving the accuracy of numerical methods remains a central challenge in many disciplines and is especially important for nonlinear simulation problems. A representative example of such problems is fluid flow, which has been thoroughly studied to arrive at efficient simulations of complex flow phenomena. This paper presents a data-driven approach that learns to improve the accuracy of numerical solvers. The proposed method utilizes an advanced numerical scheme with a fine simulation resolution to acquire reference data. We then employ a neural network that infers a correction to move a coarse, and thus quickly obtainable, result closer to the reference data. We provide insights into the targeted learning problem with different learning approaches: fully supervised learning methods with a naive and an optimized data acquisition, as well as an unsupervised learning method with a differentiable Navier-Stokes solver. While our approach is very general and applicable to arbitrary partial differential equation models, we specifically highlight gains in accuracy for fluid flow simulations.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=rketraEtPr
PDF https://openreview.net/pdf?id=rketraEtPr
PWC https://paperswithcode.com/paper/learning-time-aware-assistance-functions-for
Repo
Framework
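
The supervised variant described above amounts to a simple loop: advance the coarse solver one step, add a learned correction, and regress the corrected state onto the reference trajectory. The sketch below uses a toy 1-D "solver" and random reference data as stand-ins, so everything except the correction-on-top-of-the-solver pattern is an assumption.

```python
import torch, torch.nn as nn

def coarse_step(state):
    """Stand-in for a real coarse numerical solver step (toy smoothing/advection)."""
    return 0.95 * state + 0.05 * torch.roll(state, 1, dims=-1)

correction_net = nn.Sequential(nn.Conv1d(1, 16, 5, padding=2), nn.ReLU(),
                               nn.Conv1d(16, 1, 5, padding=2))
opt = torch.optim.Adam(correction_net.parameters(), lr=1e-3)

state = torch.randn(8, 1, 64)                     # batch of coarse 1-D states (toy data)
ref_state, reference = state.clone(), []          # in practice: down-sampled fine-simulation states
for _ in range(10):
    ref_state = coarse_step(ref_state) + 0.01 * torch.randn_like(ref_state)
    reference.append(ref_state)

# supervised variant: push the corrected coarse result toward the reference at each step
for target in reference:
    coarse = coarse_step(state)
    corrected = coarse + correction_net(coarse)   # learned correction on top of the solver
    loss = ((corrected - target) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    state = corrected.detach()                    # unroll without backprop through time (simplification)
```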