April 1, 2020


Paper Group NANR 14


Phase Transitions for the Information Bottleneck in Representation Learning


Title	Phase Transitions for the Information Bottleneck in Representation Learning
Authors	Anonymous
Abstract	In the Information Bottleneck (IB) (Tishby et al., 2000), when tuning the relative strength between compression and prediction terms, how do the two terms behave, and what’s their relationship with the dataset and the learned representation? In this paper, we set out to answer this question by studying multiple phase transitions in the IB objective: $\mathrm{IB}_\beta[p(z|x)] = I(X;Z) - \beta I(Y;Z)$, where sudden jumps of $dI(Y;Z)/d\beta$ and prediction accuracy are observed with increasing $\beta$. We introduce a definition for IB phase transitions as a qualitative change of the IB loss landscape, and show that the transitions correspond to the onset of learning new classes. Using second-order calculus of variations, we derive a formula that provides the condition for IB phase transitions, and draw its connection with the Fisher information matrix for parameterized models. We provide two perspectives to understand the formula, revealing that each IB phase transition is finding a component of maximum (nonlinear) correlation between X and Y orthogonal to the learned representation, in close analogy with canonical-correlation analysis (CCA) in linear settings. Based on the theory, we present an algorithm for discovering phase transition points. Finally, we verify that our theory and algorithm accurately predict phase transitions in categorical datasets, predict the onset of learning new classes and class difficulty in MNIST, and predict prominent phase transitions in CIFAR10 experiments.
Tasks	Representation Learning
Published	2020-01-01
URL	https://openreview.net/forum?id=HJloElBYvB
PDF	https://openreview.net/pdf?id=HJloElBYvB
PWC	https://paperswithcode.com/paper/phase-transitions-for-the-information
Repo
Framework
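
The IB objective quoted in the abstract above can be evaluated directly when X, Y and Z are small discrete variables. The following is a minimal sketch under that assumption, not the paper's algorithm: the function names and the toy distributions are hypothetical, and the encoder is given as a table `p_z_given_x` so that Z depends on Y only through X.

```python
import math


def mutual_information(joint):
    """I(A;B) in nats for a joint table joint[a][b] whose entries sum to 1."""
    p_a = [sum(row) for row in joint]
    p_b = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for a, row in enumerate(joint):
        for b, p in enumerate(row):
            if p > 0.0:  # zero-probability cells contribute nothing
                mi += p * math.log(p / (p_a[a] * p_b[b]))
    return mi


def ib_objective(p_xy, p_z_given_x, beta):
    """Return (I(X;Z) - beta * I(Y;Z), I(X;Z), I(Y;Z)) for discrete tables.

    p_xy[x][y] is the joint of inputs and labels; p_z_given_x[x][z] is the
    encoder. The joint of Y and Z follows from the Markov chain Z - X - Y.
    """
    n_x, n_y, n_z = len(p_xy), len(p_xy[0]), len(p_z_given_x[0])
    p_x = [sum(row) for row in p_xy]
    p_xz = [[p_x[x] * p_z_given_x[x][z] for z in range(n_z)]
            for x in range(n_x)]
    p_yz = [[sum(p_xy[x][y] * p_z_given_x[x][z] for x in range(n_x))
             for z in range(n_z)]
            for y in range(n_y)]
    i_xz = mutual_information(p_xz)
    i_yz = mutual_information(p_yz)
    return i_xz - beta * i_yz, i_xz, i_yz


# Hypothetical toy data: X determines Y exactly.
p_xy = [[0.5, 0.0], [0.0, 0.5]]
identity_encoder = [[1.0, 0.0], [0.0, 1.0]]   # Z copies X
constant_encoder = [[1.0, 0.0], [1.0, 0.0]]   # Z ignores X

for beta in (0.5, 2.0):
    informative = ib_objective(p_xy, identity_encoder, beta)[0]
    trivial = ib_objective(p_xy, constant_encoder, beta)[0]
    print(beta, informative, trivial)
```

In this toy case the constant encoder has the lower objective for β = 0.5 and the copying encoder has the lower objective for β = 2, which illustrates how the preferred representation can change as β increases.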

DO-AutoEncoder: Learning and Intervening Bivariate Causal Mechanisms in Images


Title	DO-AutoEncoder: Learning and Intervening Bivariate Causal Mechanisms in Images
Authors	Anonymous
Abstract	Some fundamental limitations of deep learning have been exposed, such as a lack of generalizability and vulnerability to adversarial attacks. In contrast, researchers have realized that causation is much more stable than association in data. In this paper, we propose a new framework called do-calculus AutoEncoder (DO-AE) for deep representation learning that fully captures bivariate causal relationships in images, which allows us to intervene in the image generation process. DO-AE consists of two key ingredients: causal relationship mining in images and intervention-enabling deep causal structured representation learning. The goal here is to learn deep representations that correspond to the concepts in the physical world as well as their causal structure. To verify the proposed method, we create a dataset named PHY2D, which contains abstract graphic descriptions in accordance with the laws of physics. Our experiments demonstrate that our method is able to correctly identify the bivariate causal relationship between concepts in images, and that the learned representation enables do-calculus manipulation of images, generating artificial images that may break physical laws depending on where we intervene in the causal system.
Tasks	Adversarial Attack, Representation Learning
Published	2020-01-01
URL	https://openreview.net/forum?id=r1e7NgrYvH
PDF	https://openreview.net/pdf?id=r1e7NgrYvH
PWC	https://paperswithcode.com/paper/do-autoencoder-learning-and-intervening
Repo
Framework

Task-Mediated Representation Learning


Title	Task-Mediated Representation Learning
Authors	Anonymous
Abstract	Traditionally, unsupervised representation learning is used to discover underlying regularities from raw sensory data without relying on labeled data. A great number of algorithms in this field resort to proxy objectives to facilitate learning. Further, learning how to act upon these regularities is left to a separate algorithm. Neural encoding in biological systems, on the other hand, is optimized to represent behaviorally relevant features of the environment in order to make inferences that guide successful behavior. Evidence suggests that neural encoding in biological systems is shaped by such behavioral objectives. In our work, we propose a model of inference-driven representation learning. Rather than following some auxiliary, a priori objective (e.g. minimization of reconstruction error, maximization of the fidelity of a generative model, etc.) and indiscriminately encoding information present in an observation, our model learns to build representations that support accurate inferences. Given a set of observations, our model encodes underlying regularities that de facto are necessary to solve the inference problem at hand. Rather than labeling the observations and learning representations that portray corresponding labels, or learning representations in a self-supervised manner and learning explicit features of the input observations, we propose a model that learns representations that are implicitly shaped by the goal of correct inference.
Tasks	Representation Learning, Unsupervised Representation Learning
Published	2020-01-01
URL	https://openreview.net/forum?id=HklJ4gSFPS
PDF	https://openreview.net/pdf?id=HklJ4gSFPS
PWC	https://paperswithcode.com/paper/task-mediated-representation-learning
Repo
Framework

Deep Batch Active Learning by Diverse, Uncertain Gradient Lower Bounds


Title	Deep Batch Active Learning by Diverse, Uncertain Gradient Lower Bounds
Authors	Anonymous
Abstract	We design a new algorithm for batch active learning with deep neural network models. Our algorithm, Batch Active learning by Diverse Gradient Embeddings (BADGE), samples groups of points that are disparate and high-magnitude when represented in a hallucinated gradient space, a strategy designed to incorporate both predictive uncertainty and sample diversity into every selected batch. Crucially, BADGE trades off between diversity and uncertainty without requiring any hand-tuned hyperparameters. While other approaches sometimes succeed for particular batch sizes or architectures, BADGE consistently performs as well or better, making it a useful option for real world active learning problems.
Tasks	Active Learning
Published	2020-01-01
URL	https://openreview.net/forum?id=ryghZJBKPS
PDF	https://openreview.net/pdf?id=ryghZJBKPS
PWC	https://paperswithcode.com/paper/deep-batch-active-learning-by-diverse-1
Repo
Framework
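
The selection idea in the abstract above (points that are both high-magnitude and mutually distant in a hallucinated gradient space) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: it assumes a softmax classifier, uses the last-layer gradient with the predicted class treated as the label, and selects the batch with k-means++ style seeding. The helper names, the choice of starting point, and the toy pool are hypothetical.

```python
import random


def gradient_embedding(probs, hidden):
    """Last-layer loss gradient when the predicted class is used as the label.

    probs: softmax outputs; hidden: penultimate-layer activations.
    The embedding is small when the model is confident and large otherwise.
    """
    y_hat = max(range(len(probs)), key=lambda k: probs[k])
    emb = []
    for k, p in enumerate(probs):
        coeff = p - (1.0 if k == y_hat else 0.0)
        emb.extend(coeff * h for h in hidden)
    return emb


def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


def select_batch(embeddings, batch_size, rng):
    """k-means++ style seeding over gradient embeddings.

    Assumption of this sketch: start from the largest-norm embedding, then
    sample each further point with probability proportional to its squared
    distance from the nearest point already chosen.
    """
    zero = [0.0] * len(embeddings[0])
    first = max(range(len(embeddings)),
                key=lambda i: sq_dist(embeddings[i], zero))
    chosen = [first]
    min_d = [sq_dist(e, embeddings[first]) for e in embeddings]
    while len(chosen) < batch_size and sum(min_d) > 0.0:
        pick = rng.choices(range(len(embeddings)), weights=min_d, k=1)[0]
        chosen.append(pick)
        min_d = [min(d, sq_dist(e, embeddings[pick]))
                 for d, e in zip(min_d, embeddings)]
    return chosen


# Hypothetical pool of (softmax output, hidden activation) pairs.
pool = [
    ([0.99, 0.01], [1.0, 0.0]),   # confident
    ([0.55, 0.45], [1.0, 0.0]),   # uncertain
    ([0.50, 0.50], [0.0, 1.0]),   # uncertain, different direction
    ([0.98, 0.02], [0.0, 1.0]),   # confident
]
embeddings = [gradient_embedding(p, h) for p, h in pool]
batch = select_batch(embeddings, 2, random.Random(0))
print(batch)
```

No trade-off weight between uncertainty and diversity appears in the sketch; both effects come from the geometry of the embeddings, which matches the abstract's claim that no hand-tuned hyperparameter is needed.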


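Learning Multi-facet Embeddings of Phrases and Sentences using Sparse Coding for Unsupervised Semantic Applications


Title	Learning Multi-facet Embeddings of Phrases and Sentences using Sparse Coding for Unsupervised Semantic Applications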
Authors	Anonymous
Abstract	Most deep learning for NLP represents each word with a single point or single-mode region in semantic space, while the existing multi-mode word embeddings cannot represent longer word sequences like phrases or sentences. We introduce a phrase representation (also applicable to sentences) where each phrase has a distinct set of multi-mode codebook embeddings to capture different semantic facets of the phrase’s meaning. The codebook embeddings can be viewed as the cluster centers which summarize the distribution of possibly co-occurring words in a pre-trained word embedding space. We propose an end-to-end trainable neural model that directly predicts the set of cluster centers from the input text sequence (e.g., a phrase or a sentence) during test time. We find that the per-phrase/sentence codebook embeddings not only provide a more interpretable semantic representation but also outperform strong baselines (by a large margin in some tasks) on benchmark datasets for unsupervised phrase similarity, sentence similarity, hypernym detection, and extractive summarization.
Tasks	Word Embeddings
Published	2020-01-01
URL	https://openreview.net/forum?id=HkebMlrFPS
PDF	https://openreview.net/pdf?id=HkebMlrFPS
PWC	https://paperswithcode.com/paper/learning-multi-facet-embeddings-of-phrases
Repo
Framework

Neural Subgraph Isomorphism Counting


Title	Neural Subgraph Isomorphism Counting
Authors	Anonymous
Abstract	In this paper, we study a new graph learning problem: learning to count subgraph isomorphisms. Although the learning based approach is inexact, we are able to generalize to count large patterns and data graphs in polynomial time compared to the exponential time of the original NP-complete problem. Different from other traditional graph learning problems such as node classification and link prediction, subgraph isomorphism counting requires more global inference to oversee the whole graph. To tackle this problem, we propose a dynamic intermedium attention memory network (DIAMNet) which augments different representation learning architectures and iteratively attends pattern and target data graphs to memorize different subgraph isomorphisms for the global counting. We develop both small graphs (<= 1,024 subgraph isomorphisms in each) and large graphs (<= 4,096 subgraph isomorphisms in each) sets to evaluate different models. Experimental results show that learning based subgraph isomorphism counting can help reduce the time complexity with acceptable accuracy. Our DIAMNet can further improve existing representation learning models for this more global problem.
Tasks	Link Prediction, Node Classification, Representation Learning
Published	2020-01-01
URL	https://openreview.net/forum?id=HJx-akSKPS
PDF	https://openreview.net/pdf?id=HJx-akSKPS
PWC	https://paperswithcode.com/paper/neural-subgraph-isomorphism-counting
Repo
Framework

Composable Semi-parametric Modelling for Long-range Motion Generation


Title	Composable Semi-parametric Modelling for Long-range Motion Generation
Authors	Anonymous
Abstract	Learning diverse and natural behaviors is one of the longstanding goals for creating intelligent characters in the animated world. In this paper, we propose “COmposable Semi-parametric MOdelling” (COSMO), a method for generating long-range, diverse and distinctive behaviors to reach a specific goal location. Our proposed method learns to model human motion by combining the complementary strengths of both non-parametric techniques and parametric ones. Given the starting and ending state, a memory bank is used to retrieve motion references that are provided as source material to a deep network. The synthesis is performed by a deep network that controls the style of the provided motion material and modifies it to become natural. On skeleton datasets with diverse motion, we show that the proposed method outperforms existing parametric and non-parametric baselines. We also demonstrate that the generated sequences are useful as subgoals for actual physical execution in the animated world.
Tasks
Published	2020-01-01
URL	https://openreview.net/forum?id=rkl44TEtwH
PDF	https://openreview.net/pdf?id=rkl44TEtwH
PWC	https://paperswithcode.com/paper/composable-semi-parametric-modelling-for-long
Repo
Framework

A Bayes-Optimal View on Adversarial Examples


Title	A Bayes-Optimal View on Adversarial Examples
Authors	Anonymous
Abstract	Adversarial attacks on CNN classifiers can make an imperceptible change to an input image and alter the classification result. The source of these failures is still poorly understood, and many explanations invoke the “unreasonably linear extrapolation” used by CNNs along with the geometry of high dimensions. In this paper we show that similar attacks can be used against the Bayes-Optimal classifier for certain class distributions, while for others the optimal classifier is robust to such attacks. We present analytical results showing conditions on the data distribution under which all points can be made arbitrarily close to the optimal decision boundary and show that this can happen even when the classes are easy to separate, when the ideal classifier has a smooth decision surface and when the data lies in low dimensions. We introduce new datasets of realistic images of faces and digits where the Bayes-Optimal classifier can be calculated efficiently and show that for some of these datasets the optimal classifier is robust and for others it is vulnerable to adversarial examples. In systematic experiments with many such datasets, we find that standard CNN training consistently finds a vulnerable classifier even when the optimal classifier is robust while large-margin methods often find a robust classifier with the exact same training data. Our results suggest that adversarial vulnerability is not an unavoidable consequence of machine learning in high dimensions, and may often be a result of suboptimal training methods used in current practice.
Tasks
Published	2020-01-01
URL	https://openreview.net/forum?id=H1l3s6NtvH
PDF	https://openreview.net/pdf?id=H1l3s6NtvH
PWC	https://paperswithcode.com/paper/a-bayes-optimal-view-on-adversarial-examples
Repo
Framework

Multi-agent Reinforcement Learning for Networked System Control


Title	Multi-agent Reinforcement Learning for Networked System Control
Authors	Tianshu Chu, Sandeep Chinchali, Sachin Katti
Abstract	This paper considers multi-agent reinforcement learning (MARL) in networked system control. Specifically, each agent learns a decentralized control policy based on local observations and messages from connected neighbors. We formulate such a networked MARL (NMARL) problem as a spatiotemporal Markov decision process and introduce a spatial discount factor to stabilize the training of each local agent. Further, we propose a new differentiable communication protocol, called NeurComm, to reduce information loss and non-stationarity in NMARL. Based on experiments in realistic NMARL scenarios of adaptive traffic signal control and cooperative adaptive cruise control, an appropriate spatial discount factor effectively enhances the learning curves of non-communicative MARL algorithms, while NeurComm outperforms existing communication protocols in both learning efficiency and control performance.
Tasks	Multi-agent Reinforcement Learning
Published	2020-01-01
URL	https://openreview.net/forum?id=Syx7A3NFvH
PDF	https://openreview.net/pdf?id=Syx7A3NFvH
PWC	https://paperswithcode.com/paper/multi-agent-reinforcement-learning-for
Repo
Framework
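
The abstract above introduces a spatial discount factor without giving its form. One common reading, assumed here for illustration only, is that each agent optimizes a sum of all agents' rewards in which the reward of agent j is scaled by α raised to the hop distance between the two agents in the communication graph. The function names and the three-agent line graph are hypothetical.

```python
from collections import deque


def hop_distances(adjacency, source):
    """Breadth-first search: number of hops from source to each reachable agent."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def spatially_discounted_reward(adjacency, rewards, agent, alpha):
    """Sum of rewards weighted by alpha ** hop distance from `agent`.

    alpha = 1 recovers the fully cooperative global reward; alpha = 0 keeps
    only the agent's own reward (Python evaluates 0 ** 0 as 1).
    """
    dist = hop_distances(adjacency, agent)
    return sum((alpha ** d) * rewards[j] for j, d in dist.items())


# Hypothetical line graph 0 - 1 - 2 with one reward per agent.
adjacency = {0: [1], 1: [0, 2], 2: [1]}
rewards = [1.0, 2.0, 4.0]
print(spatially_discounted_reward(adjacency, rewards, 0, 0.5))  # 3.0
```

Intermediate values of α interpolate between the greedy and the fully cooperative objective, so distant agents contribute less to each local learning signal.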

Task-Relevant Adversarial Imitation Learning


Title	Task-Relevant Adversarial Imitation Learning
Authors	Anonymous
Abstract	We show that a critical problem in adversarial imitation from high-dimensional sensory data is the tendency of discriminator networks to distinguish agent and expert behaviour using task-irrelevant features beyond the control of the agent. We analyze this problem in detail and propose a solution as well as several baselines that outperform standard Generative Adversarial Imitation Learning (GAIL). Our proposed solution, Task-Relevant Adversarial Imitation Learning (TRAIL), uses a constrained optimization objective to overcome task-irrelevant features. Comprehensive experiments show that TRAIL can solve challenging manipulation tasks from pixels by imitating human operators, where other agents such as behaviour cloning (BC), standard GAIL, improved GAIL variants including our newly proposed baselines, and Deterministic Policy Gradients from Demonstrations (DPGfD) fail to find solutions, even when the other agents have access to task reward.
Tasks	Imitation Learning
Published	2020-01-01
URL	https://openreview.net/forum?id=S1x2PCNKDB
PDF	https://openreview.net/pdf?id=S1x2PCNKDB
PWC	https://paperswithcode.com/paper/task-relevant-adversarial-imitation-learning-1
Repo
Framework

Retrospection: Leveraging the Past for Efficient Training of Deep Neural Networks


Title	Retrospection: Leveraging the Past for Efficient Training of Deep Neural Networks
Authors	Anonymous
Abstract	Deep neural networks are powerful learning machines that have enabled breakthroughs in several domains. In this work, we introduce retrospection loss to improve the performance of neural networks by utilizing prior experiences during training. Minimizing the retrospection loss pushes the parameter state at the current training step towards the optimal parameter state while pulling it away from the parameter state at a previous training step. We conduct extensive experiments to show that the proposed retrospection loss results in improved performance across multiple tasks, input types and network architectures.
Tasks
Published	2020-01-01
URL	https://openreview.net/forum?id=H1eY00VFDB
PDF	https://openreview.net/pdf?id=H1eY00VFDB
PWC	https://paperswithcode.com/paper/retrospection-leveraging-the-past-for
Repo
Framework
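
The abstract above describes the retrospection loss only in words. A minimal sketch of one possible form follows; it is an assumption of this example, not a formula taken from the abstract. It works on model outputs rather than on parameters directly, uses an L1 distance, and introduces a hypothetical scaling constant `kappa`.

```python
def l1_distance(a, b):
    return sum(abs(u - v) for u, v in zip(a, b))


def retrospection_loss(current_out, target, past_out, kappa=2.0):
    """Assumed form: pull the current output toward the target and push it
    away from the output of an earlier checkpoint on the same input.

    kappa (hypothetical) weights the attraction term against the repulsion.
    """
    attract = l1_distance(current_out, target)
    repel = l1_distance(current_out, past_out)
    return kappa * attract - repel


# Hypothetical outputs for a single input.
current_out = [0.8, 0.2]   # model at the current training step
past_out = [0.5, 0.5]      # model at an earlier training step
target = [1.0, 0.0]        # ground-truth output
print(retrospection_loss(current_out, target, past_out))
```

In practice such a term would be added to the ordinary task loss, so that it shapes training without replacing the main objective.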

Deep Multiple Instance Learning with Gaussian Weighting


Title	Deep Multiple Instance Learning with Gaussian Weighting
Authors	Anonymous
Abstract	In this paper we present a deep Multiple Instance Learning (MIL) method that can be trained end-to-end to perform classification from weak supervision. Our MIL method is implemented as a two-stream neural network, specialized in the tasks of instance classification and weighting. Our instance weighting stream makes use of a Gaussian radial basis function to normalize the instance weights by comparing instances locally within the bag and globally across bags. The final classification score of the bag is an aggregate of all instance classification scores. The instance representation is shared by both the instance classification and weighting streams. The Gaussian instance weighting allows us to regularize the representation learning of instances such that all positive instances are closer to each other w.r.t. the instance weighting function. We evaluate our method on five standard MIL datasets and show that our method outperforms other MIL methods. We also evaluate our model on two datasets where all models are trained end-to-end. Our method obtains better bag-classification and instance-classification results on these datasets. We conduct extensive experiments to investigate the robustness of the proposed model and obtain interesting insights.
Tasks	Multiple Instance Learning, Representation Learning
Published	2020-01-01
URL	https://openreview.net/forum?id=Bklrea4KwS
PDF	https://openreview.net/pdf?id=Bklrea4KwS
PWC	https://paperswithcode.com/paper/deep-multiple-instance-learning-with-gaussian
Repo
Framework

Variational Autoencoders for Highly Multivariate Spatial Point Processes Intensities


Title	Variational Autoencoders for Highly Multivariate Spatial Point Processes Intensities
Authors	Anonymous
Abstract	Multivariate spatial point process models can describe heterotopic data over space. However, highly multivariate intensities are computationally challenging due to the curse of dimensionality. To bridge this gap, we introduce a declustering based hidden variable model that leads to an efficient inference procedure via a variational autoencoder (VAE). We also prove that this model is a generalization of the VAE-based model for collaborative filtering. This leads to an interesting application of spatial point process models to recommender systems. Experimental results show the method’s utility on both synthetic data and real-world data sets.
Tasks	Point Processes, Recommendation Systems
Published	2020-01-01
URL	https://openreview.net/forum?id=B1lj20NFDS
PDF	https://openreview.net/pdf?id=B1lj20NFDS
PWC	https://paperswithcode.com/paper/variational-autoencoders-for-highly
Repo
Framework

$\ell_1$ Adversarial Robustness Certificates: a Randomized Smoothing Approach


Title	$\ell_1$ Adversarial Robustness Certificates: a Randomized Smoothing Approach
Authors	Anonymous
Abstract	Robustness is an important property to guarantee the security of machine learning models. It has recently been demonstrated that strong robustness certificates can be obtained on ensemble classifiers generated by input randomization. However, tight robustness certificates are only known for symmetric norms including $\ell_0$ and $\ell_2$, while for asymmetric norms like $\ell_1$, the existing techniques do not apply. By converting the likelihood ratio into a one-dimensional mixed random variable, we derive the first tight $\ell_1$ robustness certificate under isotropic Laplace distributions. Empirically, the deep networks smoothed by Laplace distributions yield the state-of-the-art certified robustness in $\ell_1$ norm on CIFAR-10 and ImageNet.
Tasks
Published	2020-01-01
URL	https://openreview.net/forum?id=H1lQIgrFDS
PDF	https://openreview.net/pdf?id=H1lQIgrFDS
PWC	https://paperswithcode.com/paper/ell_1-adversarial-robustness-certificates-a
Repo
Framework
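
The smoothed classifier that the abstract above refers to can be sketched as a majority vote over copies of the input perturbed by isotropic Laplace noise. The sketch below covers only this prediction step and does not compute the paper's certificate. The base classifier, the noise scale and the sample count are hypothetical choices.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    """One Laplace(0, scale) sample, drawn as the difference of two
    independent exponential samples with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def smoothed_predict(classify, x, scale, n_samples, rng):
    """Majority vote of `classify` over inputs with i.i.d. Laplace noise
    added to every coordinate.

    Returns the winning label and its empirical vote frequency, which is
    a Monte Carlo estimate of the top-class probability under the noise.
    """
    votes = Counter()
    for _ in range(n_samples):
        noisy = [xi + laplace_noise(scale, rng) for xi in x]
        votes[classify(noisy)] += 1
    label, count = votes.most_common(1)[0]
    return label, count / n_samples


def toy_classifier(x):
    """Hypothetical base classifier: sign of the coordinate sum."""
    return 1 if sum(x) > 0 else 0


label, frequency = smoothed_predict(
    toy_classifier, [5.0, 5.0], scale=0.5, n_samples=200, rng=random.Random(0)
)
print(label, frequency)
```

A certificate would then turn the estimated top-class probability into an $\ell_1$ radius within which the prediction cannot change; that step depends on the paper's analysis and is left out here.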

Sample-Based Point Cloud Decoder Networks


Title	Sample-Based Point Cloud Decoder Networks
Authors	Anonymous
Abstract	Point clouds are a flexible and ubiquitous way to represent 3D objects with arbitrary resolution and precision. Previous work has shown that adapting encoder networks to match the semantics of their input point clouds can significantly improve their effectiveness over naive feedforward alternatives. However, the vast majority of work on point-cloud decoders is still based on fully-connected networks that map shape representations to a fixed number of output points. In this work, we investigate decoder architectures that more closely match the semantics of variable-sized point clouds. Specifically, we study sample-based point-cloud decoders that map a shape representation to a point feature distribution, allowing an arbitrary number of sampled features to be transformed into individual output points. We develop three sample-based decoder architectures, compare their performance to each other, and show their improved effectiveness over feedforward architectures. In addition, we investigate the learned distributions to gain insight into the output transformation. Our work is available as an extensible software platform to reproduce these results and serve as a baseline for future work.
Tasks
Published	2020-01-01
URL	https://openreview.net/forum?id=SklVI1HKvH
PDF	https://openreview.net/pdf?id=SklVI1HKvH
PWC	https://paperswithcode.com/paper/sample-based-point-cloud-decoder-networks
Repo
Framework