April 1, 2020

Paper Group NANR 81

Neural networks with motivation

Title Neural networks with motivation
Authors Anonymous
Abstract How can animals behave effectively in conditions involving different motivational contexts? Here, we propose how reinforcement learning neural networks can learn optimal behavior for dynamically changing motivational salience vectors. First, we show that Q-learning neural networks with motivation can navigate in an environment with dynamic rewards. Second, we show that such networks can learn complex behaviors simultaneously directed towards several goals distributed in an environment. Finally, we show that in a Pavlovian conditioning task, the responses of the neurons in our model resemble the firing patterns of neurons in the ventral pallidum (VP), a basal ganglia structure involved in motivated behaviors. We show that, similarly to real neurons, recurrent networks with motivation are composed of two oppositely-tuned classes of neurons, responding to positive and negative rewards. Our model generates predictions for the VP connectivity. We conclude that networks with motivation can rapidly adapt their behavior to varying conditions without changes in synaptic strength when expected reward is modulated by motivation. Such networks may also provide a mechanism for how hierarchical reinforcement learning is implemented in the brain.
Tasks Hierarchical Reinforcement Learning, Q-Learning
Published 2020-01-01
URL https://openreview.net/forum?id=BJlJVCEYDB
PDF https://openreview.net/pdf?id=BJlJVCEYDB
PWC https://paperswithcode.com/paper/neural-networks-with-motivation-1
Repo
Framework

Unsupervised Disentanglement of Pose, Appearance and Background from Images and Videos

Title Unsupervised Disentanglement of Pose, Appearance and Background from Images and Videos
Authors Anonymous
Abstract Unsupervised landmark learning is the task of learning semantic keypoint-like representations without the use of expensive keypoint-level annotations. A popular approach is to factorize an image into a pose and appearance data stream, then to reconstruct the image from the factorized components. The pose representation should capture a set of consistent and tightly localized landmarks in order to facilitate reconstruction of the input image. Ultimately, we wish for our learned landmarks to focus on the foreground object of interest. However, the reconstruction task of the entire image forces the model to allocate landmarks to model the background. This work explores the effects of factorizing the reconstruction task into separate foreground and background reconstructions, conditioning only the foreground reconstruction on the unsupervised landmarks. Our experiments demonstrate that the proposed factorization results in landmarks that are focused on the foreground object of interest. Furthermore, the rendered background quality is also improved, as the background rendering pipeline no longer requires the ill-suited landmarks to model its pose and appearance. We demonstrate this improvement in the context of video prediction.
Tasks Video Prediction
Published 2020-01-01
URL https://openreview.net/forum?id=ryen_CEFwr
PDF https://openreview.net/pdf?id=ryen_CEFwr
PWC https://paperswithcode.com/paper/unsupervised-disentanglement-of-pose
Repo
Framework

Physics-as-Inverse-Graphics: Unsupervised Physical Parameter Estimation from Video

Title Physics-as-Inverse-Graphics: Unsupervised Physical Parameter Estimation from Video
Authors Anonymous
Abstract We propose a model that is able to perform physical parameter estimation of systems from video, where the differential equations governing the scene dynamics are known, but labeled states or objects are not available. Existing physical scene understanding methods require either object state supervision, or do not integrate with differentiable physics to learn interpretable system parameters and states. We address this problem through a physics-as-inverse-graphics approach that brings together vision-as-inverse-graphics and differentiable physics engines, where objects and explicit state and velocity representations are discovered by the model. This framework allows us to perform long term extrapolative video prediction, as well as vision-based model-predictive control. Our approach significantly outperforms related unsupervised methods in long-term future frame prediction of systems with interacting objects (such as ball-spring or 3-body gravitational systems), due to its ability to build dynamics into the model as an inductive bias. We further show the value of this tight vision-physics integration by demonstrating data-efficient learning of vision-actuated model-based control for a pendulum system. We also show that the controller’s interpretability provides unique capabilities in goal-driven control and physical reasoning for zero-data adaptation.
Tasks Scene Understanding, Video Prediction
Published 2020-01-01
URL https://openreview.net/forum?id=BJeKwTNFvB
PDF https://openreview.net/pdf?id=BJeKwTNFvB
PWC https://paperswithcode.com/paper/physics-as-inverse-graphics-unsupervised
Repo
Framework

The Logical Expressiveness of Graph Neural Networks

Title The Logical Expressiveness of Graph Neural Networks
Authors Anonymous
Abstract The ability of graph neural networks (GNNs) to distinguish nodes in graphs has been recently characterized in terms of the Weisfeiler-Lehman (WL) test for checking graph isomorphism. This characterization, however, does not settle the issue of which Boolean node classifiers (i.e., functions classifying nodes in graphs as true or false) can be expressed by GNNs. We tackle this problem by focusing on Boolean classifiers expressible as formulas in the logic FOC2, a well-studied fragment of first order logic. FOC2 is tightly related to the WL test, and hence to GNNs. We start by studying a popular class of GNNs, which we call AC-GNNs, in which the features of each node in the graph are updated, in successive layers, only in terms of the features of its neighbors. We show that this class of GNNs is too weak to capture all FOC2 classifiers, and provide a syntactic characterization of the largest subclass of FOC2 classifiers that can be captured by AC-GNNs. This subclass coincides with a logic heavily used by the knowledge representation community. We then look at what needs to be added to AC-GNNs for capturing all FOC2 classifiers. We show that it suffices to add readout functions, which allow the features of a node to be updated not only in terms of its neighbors, but also in terms of a global attribute vector. We call GNNs of this kind ACR-GNNs. We experimentally validate our findings showing that, on synthetic data conforming to FOC2 formulas, AC-GNNs struggle to fit the training data while ACR-GNNs can generalize even to graphs of sizes not seen during training.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=r1lZ7AEKvB
PDF https://openreview.net/pdf?id=r1lZ7AEKvB
PWC https://paperswithcode.com/paper/the-logical-expressiveness-of-graph-neural
Repo
Framework
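
The difference between the two layer types described in the abstract above can be sketched in a few lines of plain Python. This is a toy illustration, not the paper's code: the names `ac_layer` and `acr_layer`, the integer node features, and the use of a plain sum for both aggregation and readout are assumptions chosen for readability.

```python
# Toy sketch (assumed names and sum aggregation; not the paper's implementation).
# The graph is an adjacency list; each node carries one integer feature.

def ac_layer(adj, feats):
    """AC-style update: each node combines its own feature with its neighbors'."""
    return [feats[v] + sum(feats[u] for u in adj[v]) for v in range(len(adj))]

def acr_layer(adj, feats):
    """ACR-style update: the AC update plus a global readout over all nodes."""
    readout = sum(feats)  # global attribute shared by every node
    return [h + readout for h in ac_layer(adj, feats)]

# Two components: nodes 0 and 1 are connected, node 2 is isolated.
adj = [[1], [0], []]
feats = [1, 0, 0]  # only node 0 carries the "marked" feature

ac_out = ac_layer(adj, feats)    # node 2 receives nothing from node 0
acr_out = acr_layer(adj, feats)  # node 2 sees node 0 through the readout
```

In this toy graph the isolated node keeps a feature of 0 under any number of `ac_layer` calls, so a classifier such as "some node in the graph is marked" is out of its reach unless a readout term is added.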

Iterative energy-based projection on a normal data manifold for anomaly localization

Title Iterative energy-based projection on a normal data manifold for anomaly localization
Authors Anonymous
Abstract Autoencoder reconstructions are widely used for the task of unsupervised anomaly localization. Indeed, an autoencoder trained on normal data is expected to only be able to reconstruct normal features of the data, allowing the segmentation of anomalous pixels in an image via a simple comparison between the image and its autoencoder reconstruction. In practice, however, local defects added to a normal image can deteriorate the whole reconstruction, making this segmentation challenging. To tackle the issue, we propose in this paper a new approach for projecting anomalous data on an autoencoder-learned normal data manifold, by using gradient descent on an energy derived from the autoencoder’s loss function. This energy can be augmented with regularization terms that model priors on what constitutes the user-defined optimal projection. By iteratively updating the input of the autoencoder, we bypass the loss of high-frequency information caused by the autoencoder bottleneck. This makes it possible to produce images of higher quality than classic reconstructions. Our method achieves state-of-the-art results on various anomaly localization datasets. It also shows promising results at an inpainting task on the CelebA dataset.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=HJx81ySKwr
PDF https://openreview.net/pdf?id=HJx81ySKwr
PWC https://paperswithcode.com/paper/iterative-energy-based-projection-on-a-normal
Repo
Framework
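
The core loop described in the abstract above, gradient descent on a reconstruction energy with respect to the input rather than the weights, can be sketched on a toy problem. Everything specific here is an assumption made for illustration: the "autoencoder" is a fixed linear projection onto a line instead of a trained network, the names `reconstruct`, `energy`, and `project` are hypothetical, and the regularization terms mentioned in the abstract are left out.

```python
# Toy sketch: the "autoencoder" is an assumed linear projection onto a line,
# standing in for a trained network; regularization terms are omitted.

U = (0.6, 0.8)  # unit vector spanning the assumed "normal data manifold"

def reconstruct(x):
    """Stand-in autoencoder: project x onto the line spanned by U."""
    coeff = x[0] * U[0] + x[1] * U[1]
    return (coeff * U[0], coeff * U[1])

def energy(x):
    """Reconstruction energy: squared distance between x and its reconstruction."""
    r = reconstruct(x)
    return (x[0] - r[0]) ** 2 + (x[1] - r[1]) ** 2

def project(x0, step=0.25, iters=30):
    """Gradient descent on the energy with respect to the input itself."""
    x = x0
    for _ in range(iters):
        r = reconstruct(x)
        # For an orthogonal projection, the gradient of ||x - r||^2 in x is 2 * (x - r).
        grad = (2 * (x[0] - r[0]), 2 * (x[1] - r[1]))
        x = (x[0] - step * grad[0], x[1] - step * grad[1])
    return x

x_anomalous = (3.0, 0.0)
x_projected = project(x_anomalous)
```

With a real autoencoder the gradient would come from automatic differentiation through the network, and the iterate would generally differ from a single forward-pass reconstruction; in this linear toy case the two coincide.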

Size-free generalization bounds for convolutional neural networks

Title Size-free generalization bounds for convolutional neural networks
Authors Philip M. Long, Hanie Sedghi
Abstract We prove bounds on the generalization error of convolutional networks. The bounds are in terms of the training loss, the number of parameters, the Lipschitz constant of the loss and the distance from the weights to the initial weights. They are independent of the number of pixels in the input, and the height and width of hidden feature maps. We present experiments with CIFAR-10, along with varying hyperparameters of a deep convolutional network, comparing our bounds with practical generalization gaps.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=r1e_FpNFDr
PDF https://openreview.net/pdf?id=r1e_FpNFDr
PWC https://paperswithcode.com/paper/size-free-generalization-bounds-for-1
Repo
Framework

MMA Training: Direct Input Space Margin Maximization through Adversarial Training

Title MMA Training: Direct Input Space Margin Maximization through Adversarial Training
Authors Anonymous
Abstract We study adversarial robustness of neural networks from a margin maximization perspective, where margins are defined as the distances from inputs to a classifier’s decision boundary. Our study shows that maximizing margins can be achieved by minimizing the adversarial loss on the decision boundary at the “shortest successful perturbation”, demonstrating a close connection between adversarial losses and the margins. We propose Max-Margin Adversarial (MMA) training to directly maximize the margins to achieve adversarial robustness. Instead of adversarial training with a fixed $\epsilon$, MMA offers an improvement by enabling adaptive selection of the “correct” $\epsilon$ as the margin individually for each datapoint. In addition, we rigorously analyze adversarial training from the perspective of margin maximization, and provide an alternative interpretation for adversarial training, maximizing either a lower or an upper bound of the margins. Our experiments empirically confirm our theory and demonstrate MMA training’s efficacy on the MNIST and CIFAR10 datasets w.r.t. $\ell_\infty$ and $\ell_2$ robustness.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=HkeryxBtPB
PDF https://openreview.net/pdf?id=HkeryxBtPB
PWC https://paperswithcode.com/paper/mma-training-direct-input-space-margin
Repo
Framework
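
The notion of margin used in the abstract above, the distance from an input to the decision boundary, has a closed form when the classifier is linear, which makes the per-datapoint $\epsilon$ easy to picture. The sketch below assumes a linear binary classifier with made-up weights `W` and bias `B` and the $\ell_2$ norm; for the deep networks the paper targets, the shortest successful perturbation has no closed form and would have to be searched for.

```python
import math

# Toy sketch: a linear binary classifier with assumed weights. For such a model
# the L2 distance from x to the decision boundary has the closed form below.
W = (3.0, 4.0)
B = -5.0

def score(x):
    """Signed classifier output; the decision boundary is score(x) == 0."""
    return W[0] * x[0] + W[1] * x[1] + B

def l2_margin(x):
    """Distance from x to the hyperplane W.x + B = 0."""
    return abs(score(x)) / math.hypot(W[0], W[1])

def shortest_perturbation(x):
    """Smallest L2 step that moves x onto the decision boundary."""
    scale = -score(x) / (W[0] ** 2 + W[1] ** 2)
    return (scale * W[0], scale * W[1])

points = [(3.0, 4.0), (1.0, 1.0)]
margins = [l2_margin(p) for p in points]  # one margin, i.e. one epsilon, per point
```

The two points end up with different margins, which is the situation where a single fixed $\epsilon$ is too small for one datapoint and too large for another.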

Unsupervised Distillation of Syntactic Information from Contextualized Word Representations

Title Unsupervised Distillation of Syntactic Information from Contextualized Word Representations
Authors Anonymous
Abstract Contextualized word representations, such as ELMo and BERT, were shown to perform well on a variety of semantic and structural (syntactic) tasks. In this work, we tackle the task of unsupervised disentanglement between semantics and structure in neural language representations: we aim to learn a transformation of the contextualized vectors that discards the lexical semantics but keeps the structural information. To this end, we automatically generate groups of sentences which are structurally similar but semantically different, and use a metric-learning approach to learn a transformation that emphasizes the structural component that is encoded in the vectors. We demonstrate that our transformation clusters vectors in space by structural properties, rather than by lexical semantics. Finally, we demonstrate the utility of our distilled representations by showing that they outperform the original contextualized representations in a few-shot parsing setting.
Tasks Metric Learning
Published 2020-01-01
URL https://openreview.net/forum?id=HJlRFlHFPS
PDF https://openreview.net/pdf?id=HJlRFlHFPS
PWC https://paperswithcode.com/paper/unsupervised-distillation-of-syntactic
Repo
Framework

Monotonic Multihead Attention

Title Monotonic Multihead Attention
Authors Anonymous
Abstract Simultaneous machine translation models start generating a target sequence before they have encoded or read the source sequence. Recent approaches for this task either apply a fixed policy on a transformer, or a learnable monotonic attention on a weaker recurrent neural network based structure. In this paper, we propose a new attention mechanism, Monotonic Multihead Attention (MMA), which introduces the monotonic attention mechanism to multihead attention. We also introduce two novel interpretable approaches for latency control that are specifically designed for multiple attentions. We apply MMA to the simultaneous machine translation task and demonstrate better latency-quality tradeoffs compared to MILk, the previous state-of-the-art approach. Code will be released upon publication.
Tasks Machine Translation
Published 2020-01-01
URL https://openreview.net/forum?id=Hyg96gBKPS
PDF https://openreview.net/pdf?id=Hyg96gBKPS
PWC https://paperswithcode.com/paper/monotonic-multihead-attention
Repo
Framework

Feature-Robustness, Flatness and Generalization Error for Deep Neural Networks

Title Feature-Robustness, Flatness and Generalization Error for Deep Neural Networks
Authors Anonymous
Abstract The performance of deep neural networks is often attributed to their automated, task-related feature construction. It remains an open question, though, why this leads to solutions with good generalization, even in cases where the number of parameters is larger than the number of samples. Back in the 90s, Hochreiter and Schmidhuber observed that flatness of the loss surface around a local minimum correlates with low generalization error. For several flatness measures, this correlation has been empirically validated. However, it has recently been shown that existing measures of flatness cannot theoretically be related to generalization: if a network uses ReLU activations, the network function can be reparameterized without changing its output in such a way that flatness is changed almost arbitrarily. This paper proposes a natural modification of existing flatness measures that results in invariance to reparameterization. The proposed measures imply a robustness of the network to changes in the input and the hidden layers. Connecting this feature robustness to generalization leads to a generalized definition of the representativeness of data. With this, the generalization error of a model trained on representative data can be bounded by its feature robustness which depends on our novel flatness measure.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=rJxFpp4Fvr
PDF https://openreview.net/pdf?id=rJxFpp4Fvr
PWC https://paperswithcode.com/paper/feature-robustness-flatness-and
Repo
Framework

Learning representations for binary-classification without backpropagation

Title Learning representations for binary-classification without backpropagation
Authors Anonymous
Abstract The family of feedback alignment (FA) algorithms aims to provide a more biologically motivated alternative to backpropagation (BP), by substituting the computations that are unrealistic to be implemented in physical brains. While FA algorithms have been shown to work well in practice, there is a lack of rigorous theory proving their learning capabilities. Here we introduce the first feedback alignment algorithm with provable learning guarantees. In contrast to existing work, we do not require any assumption about the size or depth of the network except that it has a single output neuron, as in binary classification tasks. We show that our FA algorithm can deliver its theoretical promises in practice, surpassing the learning performance of existing FA methods and matching backpropagation in binary classification tasks. Finally, we demonstrate the limits of our FA variant when the number of output neurons grows beyond a certain quantity.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=Bke61krFvS
PDF https://openreview.net/pdf?id=Bke61krFvS
PWC https://paperswithcode.com/paper/learning-representations-for-binary
Repo
Framework

Topic Models with Survival Supervision: Archetypal Analysis and Neural Approaches

Title Topic Models with Survival Supervision: Archetypal Analysis and Neural Approaches
Authors George H. Chen, Linhong Li, Ren Zuo, Amanda Coston, Jeremy C. Weiss
Abstract We introduce two approaches to topic modeling supervised by survival analysis. Both approaches predict time-to-event outcomes while simultaneously learning topics over features that help prediction. The high-level idea is to represent each data point as a distribution over topics using some underlying topic model. Then each data point’s distribution over topics is fed as input to a survival model. The topic and survival models are jointly learned. The two approaches we propose differ in the generality of topic models they can learn. The first approach finds topics via archetypal analysis, a nonnegative matrix factorization method that optimizes over a wide class of topic models encompassing latent Dirichlet allocation (LDA), correlated topic models, and topic models based on the “anchor word” assumption; the resulting survival-supervised variant solves an alternating minimization problem. Our second approach builds on recent work that approximates LDA in a neural net framework. We add a survival loss layer to this neural net to form an approximation to survival-supervised LDA. Both of our approaches can be combined with a variety of survival models. We demonstrate our approach on two survival datasets, showing that survival-supervised topic models can achieve competitive time-to-event prediction accuracy while outputting clinically interpretable topics.
Tasks Survival Analysis, Time-to-Event Prediction, Topic Models
Published 2020-01-01
URL https://openreview.net/forum?id=rJg9OANFwS
PDF https://openreview.net/pdf?id=rJg9OANFwS
PWC https://paperswithcode.com/paper/topic-models-with-survival-supervision
Repo
Framework

What graph neural networks cannot learn: depth vs width

Title What graph neural networks cannot learn: depth vs width
Authors Anonymous
Abstract This paper studies theoretically the capacity limits of graph neural networks (GNN) falling within the message-passing framework. Two main results are presented. First, GNN are shown to be Turing universal under sufficient conditions on their depth, width, node identification, and layer expressiveness. Second, it is discovered that GNN can lose a significant portion of their power when their depth and width are restricted. The proposed impossibility statements stem from a new technique that enables the repurposing of seminal results from theoretical computer science and leads to lower bounds for an array of decision, optimization, and estimation problems involving graphs. Strikingly, several of these problems are deemed impossible unless the product of a GNN’s depth and width exceeds (a function of) the graph size; this dependence remains significant even for tasks that appear simple or when considering approximation.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=B1l2bp4YwS
PDF https://openreview.net/pdf?id=B1l2bp4YwS
PWC https://paperswithcode.com/paper/what-graph-neural-networks-cannot-learn-depth-1
Repo
Framework

Way Off-Policy Batch Deep Reinforcement Learning of Human Preferences in Dialog

Title Way Off-Policy Batch Deep Reinforcement Learning of Human Preferences in Dialog
Authors Natasha Jaques, Asma Ghandeharioun, Judy Hanwen Shen, Craig Ferguson, Agata Lapedriza, Noah Jones, Shixiang Gu, Rosalind Picard
Abstract Most deep reinforcement learning (RL) systems are not able to learn effectively from off-policy data, especially if they cannot explore online in the environment. This is a critical shortcoming for applying RL to real-world problems where collecting data is expensive, and models must be tested offline before being deployed to interact with the environment – e.g. systems that learn from human interaction. Thus, we develop a novel class of off-policy batch RL algorithms which use KL-control to penalize divergence from a pre-trained prior model of probable actions. This KL-constraint reduces extrapolation error, enabling effective offline learning, without exploration, from a fixed batch of data. We also use dropout-based uncertainty estimates to lower bound the target Q-values as a more efficient alternative to Double Q-Learning. This Way Off-Policy (WOP) algorithm is tested on both traditional RL tasks from OpenAI Gym, and on the problem of open-domain dialog generation, a challenging reinforcement learning problem with a 20,000-dimensional action space. WOP allows for the extraction of multiple different reward functions post-hoc from collected human interaction data, and can learn effectively from all of these. We test real-world generalization by deploying dialog models live to converse with humans in an open-domain setting, and demonstrate that WOP achieves significant improvements over state-of-the-art prior methods in batch deep RL.
Tasks Q-Learning
Published 2020-01-01
URL https://openreview.net/forum?id=rJl5rRVFvH
PDF https://openreview.net/pdf?id=rJl5rRVFvH
PWC https://paperswithcode.com/paper/way-off-policy-batch-deep-reinforcement-1
Repo
Framework
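
The effect of penalizing KL divergence from a prior over actions, as described in the abstract above, can be illustrated with the standard single-step result: the distribution that maximizes expected Q minus a scaled KL term is the prior reweighted by exponentiated Q-values. The sketch below is that textbook closed form only, not the WOP algorithm, which applies the penalty inside Q-learning over whole trajectories; the function name, the Q-values, and the prior are hypothetical.

```python
import math

def kl_regularized_policy(q_values, prior, temperature):
    """Maximizer of E_pi[Q] - temperature * KL(pi || prior) over distributions pi."""
    weights = [p * math.exp(q / temperature) for q, p in zip(q_values, prior)]
    total = sum(weights)
    return [w / total for w in weights]

q_values = [1.0, 2.0, 0.0]  # hypothetical Q estimates for three actions
prior = [0.5, 0.1, 0.4]     # hypothetical pre-trained prior over actions

# Small penalty weight: the policy follows the Q-values almost greedily.
loose = kl_regularized_policy(q_values, prior, temperature=0.1)
# Large penalty weight: the policy stays close to the prior.
tight = kl_regularized_policy(q_values, prior, temperature=100.0)
```

A large penalty weight keeps the policy near actions the prior considers probable, which is the mechanism the abstract credits with reducing extrapolation error when learning from a fixed batch.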

Neural Arithmetic Units

Title Neural Arithmetic Units
Authors Anonymous
Abstract Neural networks can approximate complex functions, but they struggle to perform exact arithmetic operations over real numbers. The lack of inductive bias for arithmetic operations leaves neural networks without the underlying logic needed to extrapolate on tasks such as addition, subtraction, and multiplication. We present two new neural network components: the Neural Addition Unit (NAU), which can learn to add and subtract; and the Neural Multiplication Unit (NMU), which can multiply subsets of a vector. The NMU is to our knowledge the first arithmetic neural network component that can learn multiplication of a vector with a large hidden size. The two new components draw inspiration from a theoretical analysis of recent arithmetic components. We find that careful initialization, restricting parameter space, and regularizing for sparsity are important when optimizing the NAU and NMU. Our results, compared with previous attempts, show that the NAU and NMU converge more consistently, have fewer parameters, learn faster, do not diverge with large hidden sizes, obtain sparse and meaningful weights, and can extrapolate to negative and small numbers.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=H1gNOeHKPS
PDF https://openreview.net/pdf?id=H1gNOeHKPS
PWC https://paperswithcode.com/paper/neural-arithmetic-units
Repo
Framework
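
To make "add and subtract" and "multiply subsets of a vector" from the abstract above concrete, here is a forward-pass sketch of one output of each unit with hand-set weights. The abstract does not give the formulas, so both are assumptions: the addition unit is taken to be a linear map with weights in {-1, 0, 1}, and the multiplication unit is taken to gate each input as `w * x + 1 - w` with weights in [0, 1]. Training, initialization, and the sparsity regularizer are not shown.

```python
def nau(weights, x):
    """Addition unit sketch: a linear map whose weights are meant to sit in {-1, 0, 1}."""
    return sum(w * xi for w, xi in zip(weights, x))

def nmu(weights, x):
    """Multiplication unit sketch: weight 1 includes x_i in the product, weight 0 skips it."""
    out = 1.0
    for w, xi in zip(weights, x):
        out *= w * xi + 1.0 - w  # equals x_i when w == 1 and 1 when w == 0
    return out

x = [2.0, -3.0, 5.0, 7.0]
added = nau([1.0, -1.0, 0.0, 0.0], x)      # computes 2 - (-3)
multiplied = nmu([1.0, 1.0, 0.0, 0.0], x)  # computes 2 * (-3)
```

Because the gate turns an unselected input into a factor of 1, the product is exact for negative inputs as well, which fits the abstract's claim about extrapolating to negative numbers.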