April 1, 2020

2806 words 14 mins read

Paper Group NANR 139

Translation Between Waves, wave2wave. Deep Generative Classifier for Out-of-distribution Sample Detection. Lean Images for Geo-Localization. Differentiable Programming for Physical Simulation. IEG: Robust neural net training with severe label noises. Frequency Pooling: Shift-Equivalent and Anti-Aliasing Down Sampling. Effective Use of Variational E …

Translation Between Waves, wave2wave

Title Translation Between Waves, wave2wave
Authors Tsuyoshi Okita, Hirotaka Hachiya, Sozo Inoue, Naonori Ueda
Abstract The understanding of sensor data has been greatly improved by advanced deep learning methods applied to big data. However, sensor data available in the real world remain limited, a situation known as the opportunistic sensor problem. This paper proposes a new variant of the neural machine translation model seq2seq that handles continuous signal waves by introducing a window-based (inverse) representation, which adaptively represents partial shapes of waves, and an iterative back-translation model for high-dimensional data. Experimental results are shown for two real-life datasets: earthquake and activity translation. Relative to the original seq2seq, the performance improvement was about 46% in test loss for one-dimensional data and about 1625% in perplexity for high-dimensional data.
Tasks Machine Translation
Published 2020-01-01
URL https://openreview.net/forum?id=rJxG3pVKPB
PDF https://openreview.net/pdf?id=rJxG3pVKPB
PWC https://paperswithcode.com/paper/translation-between-waves-wave2wave
Repo
Framework
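
The window-based representation is only described at a high level above; as a rough illustration (not the authors' code), a continuous wave can be sliced into overlapping fixed-width windows that a seq2seq encoder consumes as a sequence of vectors, with the inverse representation overlap-adding decoder outputs back into a wave. The window parameters below are arbitrary.

```python
import numpy as np

def windowed(wave: np.ndarray, width: int = 64, stride: int = 32) -> np.ndarray:
    """Slice a 1-D signal into overlapping windows: (num_windows, width).

    Each window is one 'token' a seq2seq encoder could consume; an
    inverse representation would overlap-add the decoder's output
    windows back into a continuous wave.
    """
    n = 1 + (len(wave) - width) // stride
    return np.stack([wave[i * stride : i * stride + width] for i in range(n)])

wave = np.sin(np.linspace(0, 20 * np.pi, 1024))   # toy input signal
tokens = windowed(wave)                            # shape: (31, 64)
```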

Deep Generative Classifier for Out-of-distribution Sample Detection

Title Deep Generative Classifier for Out-of-distribution Sample Detection
Authors Anonymous
Abstract The capability to reliably detect out-of-distribution samples is one of the key factors in deploying a good classifier, as in most real-world applications the test distribution does not match the training distribution. In this work, we propose a deep generative classifier that is effective both at detecting out-of-distribution samples and at classifying in-distribution samples, by integrating the concept of Gaussian discriminant analysis into deep neural networks. Unlike the discriminative (or softmax) classifier, which only focuses on the decision boundary partitioning its latent space into multiple regions, our generative classifier explicitly models the class-conditional distributions as separable Gaussian distributions. This lets us define the confidence score as the distance between a test sample and the center of each class distribution. Our empirical evaluation on multi-class image and tabular data demonstrates that the generative classifier achieves the best performance in distinguishing out-of-distribution samples and generalizes well across various types of deep neural networks.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=HJePXkHtvS
PDF https://openreview.net/pdf?id=HJePXkHtvS
PWC https://paperswithcode.com/paper/deep-generative-classifier-for-out-of
Repo
Framework
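
A minimal sketch of the distance-based confidence score described above, assuming per-class Gaussians with a tied covariance (classical Gaussian discriminant analysis) fitted on fixed feature vectors; the paper's actual training procedure is more involved.

```python
import numpy as np

def fit_gaussians(feats, labels, n_classes):
    """Estimate class means and a shared (tied) precision from features."""
    means = np.stack([feats[labels == c].mean(axis=0) for c in range(n_classes)])
    centered = feats - means[labels]
    cov = centered.T @ centered / len(feats)
    return means, np.linalg.inv(cov + 1e-6 * np.eye(cov.shape[0]))

def confidence(x, means, prec):
    """Negative squared Mahalanobis distance to the closest class center."""
    d = x - means                                  # (n_classes, dim)
    m2 = np.einsum('cd,de,ce->c', d, prec, d)      # per-class squared distance
    return -m2.min()                               # low => out-of-distribution

# Usage: score test features and threshold; OOD samples sit far from all centers.
```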

Lean Images for Geo-Localization

Title Lean Images for Geo-Localization
Authors Anonymous
Abstract Most computer vision tasks use textured images. In this paper we consider the geo-localization task: finding the pose of a camera in a large 3D scene from a single lean image, i.e. an image with no texture. We aim to experimentally explore whether texture and correlation between nearby images are necessary in a CNN-based solution for this task. Our results may give insight into the role of geometry (as opposed to texture) in a CNN-based geo-localization solution. Lean images are projections of a simple 3D model of a city. They contain only information that relates to the geometry of the scene viewed (edges, faces, or relative depth). We find that the network is capable of estimating the camera pose from lean images for a relatively large number of locations (on the order of hundreds of thousands of images). The main contributions of this paper are: (i) demonstrating the power of CNNs for recovering camera pose using lean images; and (ii) providing insight into the role of geometry in the CNN learning process.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=B1xcLJrYwH
PDF https://openreview.net/pdf?id=B1xcLJrYwH
PWC https://paperswithcode.com/paper/lean-images-for-geo-localization
Repo
Framework

Differentiable Programming for Physical Simulation

Title Differentiable Programming for Physical Simulation
Authors Anonymous
Abstract We study the problem of learning and optimizing through physical simulations via differentiable programming. We present DiffSim, a new differentiable programming language tailored for building high-performance differentiable physical simulations. We demonstrate the performance and productivity of our language in gradient-based learning and optimization tasks on 10 different physical simulators. For example, a differentiable elastic object simulator written in our language is 4.6x shorter than the hand-engineered CUDA version yet runs as fast, and is 188x faster than TensorFlow. Using our differentiable programs, neural network controllers are typically optimized within only tens of iterations. Finally, we share a lesson learned from our experience developing these simulators: differentiating physical simulators does not always yield useful gradients of the physical system being simulated. We systematically study the underlying reasons and propose solutions to improve gradient quality.
Tasks Physical Simulations
Published 2020-01-01
URL https://openreview.net/forum?id=B1eB5xSFvr
PDF https://openreview.net/pdf?id=B1eB5xSFvr
PWC https://paperswithcode.com/paper/differentiable-programming-for-physical
Repo
Framework
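
As a generic illustration of learning through a differentiable simulation (not DiffSim itself, which is its own language), one can unroll a toy simulator in an autodiff framework and backpropagate a task loss through every time step:

```python
import torch

# Toy differentiable simulation: pick an initial velocity v so that an
# unrolled mass-spring system ends at a target position. Gradients flow
# through every simulation step back to the optimized parameter.
v = torch.tensor(0.0, requires_grad=True)
opt = torch.optim.Adam([v], lr=0.1)
for _ in range(100):
    x, vel = torch.tensor(0.0), v
    for _ in range(20):              # unrolled simulation steps
        vel = vel - 0.05 * x         # spring force on the mass
        x = x + 0.05 * vel           # symplectic Euler position update
    loss = (x - 1.0) ** 2            # distance to target position
    opt.zero_grad()
    loss.backward()
    opt.step()
```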

IEG: Robust neural net training with severe label noises

Title IEG: Robust neural net training with severe label noises
Authors Anonymous
Abstract Collecting large-scale data with clean labels for supervised training of neural networks is practically challenging. Although noisy labels are usually cheap to acquire, existing methods suffer severely on training datasets with high noise ratios, making high-cost human labeling a necessity. Here we present a method to train neural networks in a way that is almost invulnerable to severe label noise by utilizing a tiny trusted set. Our method, named IEG, is based on three key factors: (i) Isolation of noisy labels, (ii) Escalation of useful supervision from mislabeled data, and (iii) Guidance from small trusted data. On CIFAR-100 with a 40% uniform noise ratio and 10 trusted labeled examples per class, our method achieves $80.2 \pm 0.3\%$ classification accuracy, an error only 1.4% higher than that of a neural network trained without label noise. Moreover, when the noise ratio increases to 80%, our method still achieves a high accuracy of $75.5 \pm 0.2\%$, compared to the previous best of 47.7%. Finally, our method sets a new state of the art across various challenging label-corruption levels and on the large-scale WebVision benchmark.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=SJxyOhVtvB
PDF https://openreview.net/pdf?id=SJxyOhVtvB
PWC https://paperswithcode.com/paper/ieg-robust-neural-net-training-with-severe
Repo
Framework

Frequency Pooling: Shift-Equivalent and Anti-Aliasing Down Sampling

Title Frequency Pooling: Shift-Equivalent and Anti-Aliasing Down Sampling
Authors Anonymous
Abstract The convolutional layer utilizes the shift-equivalence prior of images, which has made it a great success in image processing. However, commonly used down-sampling methods in convolutional neural networks (CNNs), such as max pooling, average pooling, and strided convolution, are not shift-equivalent. This destroys the shift-equivalence property of CNNs and degrades their performance. In this paper, we propose a novel pooling method that is strictly shift-equivalent and anti-aliasing in theory. It is built on the (inverse) Discrete Fourier Transform, and we call it frequency pooling. Experiments on image classification show that frequency pooling improves the accuracy and the robustness of CNNs with respect to shifts.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=SyxD7lrFPH
PDF https://openreview.net/pdf?id=SyxD7lrFPH
PWC https://paperswithcode.com/paper/frequency-pooling-shift-equivalent-and-anti
Repo
Framework
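
A rough 1-D sketch of DFT-based anti-aliased downsampling in the spirit of the abstract (the paper's exact formulation may differ): transform, truncate to the frequencies representable at the lower rate, and inverse-transform at the reduced length.

```python
import numpy as np

def freq_pool(x: np.ndarray, factor: int = 2) -> np.ndarray:
    """Downsample by `factor` via an ideal low-pass in the frequency domain.

    Because truncation in frequency is an ideal anti-aliasing filter, a
    circular shift of the input maps to a (scaled) shift of the output.
    """
    X = np.fft.rfft(x)
    n_out = len(x) // factor
    X_low = X[: n_out // 2 + 1]          # keep only representable frequencies
    return np.fft.irfft(X_low, n=n_out) / factor

x = np.cos(2 * np.pi * 3 * np.arange(64) / 64)   # 3 cycles over 64 samples
y = freq_pool(x, factor=2)                        # still 3 cycles over 32 samples
```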

Effective Use of Variational Embedding Capacity in Expressive End-to-End Speech Synthesis

Title Effective Use of Variational Embedding Capacity in Expressive End-to-End Speech Synthesis
Authors Anonymous
Abstract Recent work has explored sequence-to-sequence latent variable models for expressive speech synthesis (supporting control and transfer of prosody and style), but has not presented a coherent framework for understanding the trade-offs between the competing methods. In this paper, we propose embedding capacity (the amount of information the embedding contains about the data) as a unified method of analyzing the behavior of latent variable models of speech, comparing existing heuristic (non-variational) methods to variational methods that are able to explicitly constrain capacity using an upper bound on representational mutual information. In our proposed model (Capacitron), we show that by adding conditional dependencies to the variational posterior such that it matches the form of the true posterior, the same model can be used for high-precision prosody transfer, text-agnostic style transfer, and generation of natural-sounding prior samples. For multi-speaker models, Capacitron is able to preserve target speaker identity during inter-speaker prosody transfer and when drawing samples from the latent prior. Lastly, we introduce a method for decomposing embedding capacity hierarchically across two sets of latents, allowing a portion of the latent variability to be specified and the remaining variability sampled from a learned prior. Audio examples are available on the web.
Tasks Latent Variable Models, Speech Synthesis, Style Transfer
Published 2020-01-01
URL https://openreview.net/forum?id=SJgBQaVKwH
PDF https://openreview.net/pdf?id=SJgBQaVKwH
PWC https://paperswithcode.com/paper/effective-use-of-variational-embedding-1
Repo
Framework
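
One part of the abstract that can be stated precisely is the capacity bound: the average KL term of a variational posterior upper-bounds the representational mutual information between data and latents, so constraining the KL constrains capacity. A sketch in generic VAE notation (symbols assumed, not taken from the paper):

```latex
% The average KL of the variational posterior upper-bounds the
% representational mutual information between data X and latents Z:
I(X; Z) \le \mathbb{E}_{x \sim p(x)}\left[ \mathrm{KL}\left( q(z \mid x) \,\|\, p(z) \right) \right]
% so a capacity target C can be enforced with a Lagrangian variant of the ELBO:
\mathcal{L} = \mathbb{E}_{q(z \mid x)}\left[ \log p(x \mid z) \right]
  - \lambda \left( \mathrm{KL}\left( q(z \mid x) \,\|\, p(z) \right) - C \right)
```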

A Simple and Effective Framework for Pairwise Deep Metric Learning

Title A Simple and Effective Framework for Pairwise Deep Metric Learning
Authors Anonymous
Abstract Deep metric learning (DML) has received much attention in deep learning due to its wide applications in computer vision. Previous studies have focused on designing complicated losses and hard-example mining methods, which are mostly heuristic and lack theoretical understanding. In this paper, we cast DML as a simple pairwise binary classification problem that classifies a pair of examples as similar or dissimilar. This view identifies the most critical issue in the problem: imbalanced data pairs. To tackle this issue, we propose a simple and effective framework to sample pairs in a batch of data for updating the model. The key to this framework is to define a robust loss for all pairs over a mini-batch of data, formulated by distributionally robust optimization. The flexibility in constructing the uncertainty decision set of the dual variable allows us to recover state-of-the-art complicated losses and also to induce novel variants. Empirical studies on several benchmark data sets demonstrate that our simple and effective method outperforms state-of-the-art results.
Tasks Metric Learning
Published 2020-01-01
URL https://openreview.net/forum?id=SJl3CANKvB
PDF https://openreview.net/pdf?id=SJl3CANKvB
PWC https://paperswithcode.com/paper/a-simple-and-effective-framework-for-pairwise
Repo
Framework
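
A hedged sketch of the batch-level robust loss described above: every pair in the mini-batch gets a binary (similar/dissimilar) loss, and the robust objective reweights toward the hardest pairs. The top-k uncertainty set used here is one simple instance of such a reweighting, not necessarily the paper's.

```python
import torch

def drob_pair_loss(emb, labels, k=10, margin=0.5):
    """Distributionally-robust pairwise loss over a mini-batch.

    All pairs get a hinge-style binary loss; the robust objective is the
    average over the k largest pair losses (worst-case reweighting over
    a simple uncertainty set).
    """
    d = torch.cdist(emb, emb)                        # pairwise distances
    same = labels[:, None].eq(labels[None, :]).float()
    losses = same * d + (1 - same) * (margin - d).clamp(min=0)
    iu = torch.triu_indices(len(emb), len(emb), offset=1)
    pair_losses = losses[iu[0], iu[1]]               # each pair counted once
    return pair_losses.topk(min(k, len(pair_losses))).values.mean()
```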

Newton Residual Learning

Title Newton Residual Learning
Authors Anonymous
Abstract A plethora of computer vision tasks, such as optical flow and image alignment, can be formulated as non-linear optimization problems. Before the resurgence of deep learning, the dominant family of methods for solving such optimization problems was numerical optimization, e.g., Gauss-Newton (GN). More recently, several attempts were made to formulate learnable GN steps as cascade regression architectures. In this paper, we investigate recent machine learning architectures, such as deep neural networks with residual connections, from the above perspective. To this end, we first demonstrate how residual blocks (when considered as discretizations of ODEs) can be viewed as GN steps. Then, we go a step further and propose a new residual block that is reminiscent of Newton’s method in numerical optimization and exhibits faster convergence. We thoroughly evaluate the proposed Newton-ResNet by conducting experiments on image and speech classification and image generation, using 4 datasets. All the experiments demonstrate that Newton-ResNet requires fewer parameters to achieve the same performance as the original ResNet.
Tasks Image Generation, Optical Flow Estimation
Published 2020-01-01
URL https://openreview.net/forum?id=BkxaXeHYDB
PDF https://openreview.net/pdf?id=BkxaXeHYDB
PWC https://paperswithcode.com/paper/newton-residual-learning
Repo
Framework
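
The ODE connection the abstract appeals to can be made concrete; the schematic below contrasts a residual block (a forward-Euler step) with a Newton update, using a generic map f and step size h rather than the paper's exact block.

```latex
% A residual block is a forward-Euler step of the ODE x'(t) = f(x(t)):
x_{k+1} = x_k + h \, f(x_k)
% Newton's method for f(x) = 0 instead rescales the update by the inverse
% derivative, which is the source of its faster convergence:
x_{k+1} = x_k - \left[ f'(x_k) \right]^{-1} f(x_k)
```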

PairNorm: Tackling Oversmoothing in GNNs

Title PairNorm: Tackling Oversmoothing in GNNs
Authors Anonymous
Abstract The performance of graph neural nets (GNNs) is known to gradually decrease with an increasing number of layers. This decay is partly attributed to oversmoothing, where repeated graph convolutions eventually make node embeddings indistinguishable. We take a closer look at two different interpretations, aiming to quantify oversmoothing. Our main contribution is PairNorm, a novel normalization layer based on a careful analysis of the graph convolution operator, which prevents all node embeddings from becoming too similar. What is more, PairNorm is fast, easy to implement without any change to the network architecture or any additional parameters, and is broadly applicable to any GNN. Experiments on real-world graphs demonstrate that PairNorm makes deeper GCN, GAT, and SGC models more robust against oversmoothing, and significantly boosts performance for a new problem setting that benefits from deeper GNNs.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=rkecl1rtwB
PDF https://openreview.net/pdf?id=rkecl1rtwB
PWC https://paperswithcode.com/paper/pairnorm-tackling-oversmoothing-in-gnns-1
Repo
Framework
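
A short sketch of the normalization the abstract describes, following the commonly cited PairNorm formulation (the scale s is a hyperparameter): center the node embeddings, then rescale so the mean squared row norm, and hence the total pairwise squared distance, stays constant across layers.

```python
import torch

def pair_norm(x: torch.Tensor, scale: float = 1.0) -> torch.Tensor:
    """PairNorm-style layer for node embeddings x of shape (num_nodes, dim).

    Centering removes the component on which repeated graph convolution
    collapses; rescaling keeps the total pairwise squared distance
    constant, so embeddings cannot all shrink toward one point.
    """
    x = x - x.mean(dim=0, keepdim=True)                 # center across nodes
    rownorm_mean = x.pow(2).sum(dim=1).mean().clamp(min=1e-12).sqrt()
    return scale * x / rownorm_mean

# Drop it between graph-convolution layers: h = pair_norm(gcn_layer(adj, h))
```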

Entropy Minimization In Emergent Languages

Title Entropy Minimization In Emergent Languages
Authors Anonymous
Abstract There is a growing interest in studying the languages emerging when neural agents are jointly trained to solve tasks requiring communication through a discrete channel. We investigate here the information-theoretic complexity of such languages, focusing on the basic two-agent, one-exchange setup. We find that, under common training procedures, the emergent languages are subject to an entropy minimization pressure that has also been detected in human language, whereby the mutual information between the communicating agent’s inputs and the messages is minimized, within the range afforded by the need for successful communication. This pressure is amplified as we increase communication channel discreteness. Further, we observe that stronger discrete-channel-driven entropy minimization leads to representations with increased robustness to overfitting and adversarial attacks. We conclude by discussing the implications of our findings for the study of natural and artificial communication systems.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=SylVJTNKDr
PDF https://openreview.net/pdf?id=SylVJTNKDr
PWC https://paperswithcode.com/paper/entropy-minimization-in-emergent-languages
Repo
Framework
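
The central quantity in this kind of analysis, the entropy of the emitted messages, is easy to estimate empirically; a minimal sketch (the paper's precise estimators may differ):

```python
import collections
import math

def message_entropy(messages):
    """Empirical entropy (in bits) of a list of discrete messages.

    Each message can be any hashable value, e.g. a tuple of symbols.
    Lower entropy means the agents use fewer, more predictable messages.
    """
    counts = collections.Counter(messages)
    n = len(messages)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

print(message_entropy([(0, 1), (0, 1), (1, 1), (2, 0)]))  # ~1.5 bits
```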

Adversarial Lipschitz Regularization

Title Adversarial Lipschitz Regularization
Authors Anonymous
Abstract Generative adversarial networks (GANs) are one of the most popular approaches when it comes to training generative models, among which variants of Wasserstein GANs are considered superior to the standard GAN formulation in terms of learning stability and sample quality. However, Wasserstein GANs require the critic to be 1-Lipschitz, which is often enforced implicitly by penalizing the norm of its gradient, or by globally restricting its Lipschitz constant via weight normalization techniques. Training with a regularization term penalizing the violation of the Lipschitz constraint explicitly, instead of through the norm of the gradient, was found to be practically infeasible in most situations. With a novel generalization of Virtual Adversarial Training, called Adversarial Lipschitz Regularization, we show that using an explicit Lipschitz penalty is indeed viable and leads to competitive performance when applied to Wasserstein GANs, highlighting an important connection between Lipschitz regularization and adversarial training.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=Bke_DertPB
PDF https://openreview.net/pdf?id=Bke_DertPB
PWC https://paperswithcode.com/paper/adversarial-lipschitz-regularization
Repo
Framework
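
A rough sketch of an explicit Lipschitz penalty of the kind described above (a crude stand-in for the paper's virtual-adversarial search, not its exact procedure): find a perturbation that maximizes the difference ratio, then penalize how much that ratio exceeds the target constant K.

```python
import torch

def lipschitz_penalty(critic, x, k_target=1.0, eps=0.1, steps=1):
    """Penalize explicit violations of a K-Lipschitz constraint around x.

    The perturbation direction is refined by gradient ascent on the
    difference ratio |f(x+r) - f(x)| / ||r||, then the excess over
    k_target is squared and averaged.
    """
    r = torch.randn_like(x) * eps
    for _ in range(steps):
        r.requires_grad_(True)
        diff = (critic(x + r) - critic(x)).flatten()
        ratio = diff.abs() / (r.flatten(1).norm(dim=1) + 1e-12)
        grad, = torch.autograd.grad(ratio.sum(), r)
        r = (r + eps * grad).detach()          # ascend toward the worst ratio
    diff = (critic(x + r) - critic(x)).flatten()
    ratio = diff.abs() / (r.flatten(1).norm(dim=1) + 1e-12)
    return (ratio - k_target).clamp(min=0).pow(2).mean()
```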

Training Generative Adversarial Networks from Incomplete Observations using Factorised Discriminators

Title Training Generative Adversarial Networks from Incomplete Observations using Factorised Discriminators
Authors Anonymous
Abstract Generative adversarial networks (GANs) have shown great success in applications such as image generation and inpainting. However, they typically require large datasets, which are often not available, especially in the context of prediction tasks such as image segmentation that require labels. Therefore, methods such as the CycleGAN use more easily available unlabelled data, but do not offer a way to leverage additional labelled data for improved performance. To address this shortcoming, we show how to factorise the joint data distribution into a set of lower-dimensional distributions along with their dependencies. This allows splitting the discriminator in a GAN into multiple “sub-discriminators” that can be independently trained from incomplete observations. Their outputs can be combined to estimate the density ratio between the joint real and the generator distribution, which enables training generators as in the original GAN framework. We apply our method to image generation, image segmentation and audio source separation, and obtain improved performance over a standard GAN when additional incomplete training examples are available. For the Cityscapes segmentation task in particular, our method also improves accuracy by an absolute 13.6% over CycleGAN while using only 25 additional paired examples.
Tasks Image Generation, Semantic Segmentation
Published 2020-01-01
URL https://openreview.net/forum?id=Hye1RJHKwB
PDF https://openreview.net/pdf?id=Hye1RJHKwB
PWC https://paperswithcode.com/paper/training-generative-adversarial-networks-from-1
Repo
Framework
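
A tiny sketch of the recombination step described above, assuming, as is standard for GAN discriminators, that a sigmoid logit estimates a log density ratio. Under a factorisation into marginals plus dependency terms, the joint log-ratio decomposes into a sum; all names below are hypothetical.

```python
def joint_logit(marginal_logits, dep_real_logit, dep_fake_logit):
    """Combine sub-discriminator logits into a joint density-ratio logit.

    Each marginal logit estimates log p(x_i)/q(x_i); the dependency
    logits estimate how far the real (resp. generated) joint deviates
    from the product of its marginals. Summing log-ratios multiplies
    the underlying density ratios.
    """
    return sum(marginal_logits) + dep_real_logit - dep_fake_logit

# Train each sub-discriminator on whichever (possibly incomplete)
# observations cover its factor, then use joint_logit for the generator.
```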

NeuralUCB: Contextual Bandits with Neural Network-Based Exploration

Title NeuralUCB: Contextual Bandits with Neural Network-Based Exploration
Authors Anonymous
Abstract We study the stochastic contextual bandit problem, where the reward is generated from an unknown bounded function with additive noise. We propose the NeuralUCB algorithm, which leverages the representation power of deep neural networks and uses a neural network-based random feature mapping to construct an upper confidence bound (UCB) on the reward for efficient exploration. We prove that, under mild assumptions, NeuralUCB achieves a $\tilde{O}(\sqrt{T})$ regret bound, where $T$ is the number of rounds. To the best of our knowledge, our algorithm is the first neural network-based contextual bandit algorithm with a near-optimal regret guarantee. Preliminary experimental results on synthetic data corroborate our theory and shed light on potential applications of our algorithm to real-world problems.
Tasks Efficient Exploration, Multi-Armed Bandits
Published 2020-01-01
URL https://openreview.net/forum?id=r1xa9TVFvH
PDF https://openreview.net/pdf?id=r1xa9TVFvH
PWC https://paperswithcode.com/paper/neuralucb-contextual-bandits-with-neural
Repo
Framework
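
The UCB construction in the abstract can be sketched generically: score each candidate action by predicted reward plus a confidence width computed from the network's gradient features, following the linear-UCB template. Here Z_inv (the inverse design matrix) and beta are assumed to be maintained elsewhere; the paper's constants and updates differ in detail.

```python
import torch

def ucb_scores(net, contexts, Z_inv, beta=1.0):
    """Optimistic score per arm: predicted reward + exploration bonus.

    The bonus is a confidence width computed from the network's gradient
    features g(x) (a neural random-feature map) and the inverse design
    matrix Z_inv, as in linear UCB.
    """
    scores = []
    for x in contexts:                       # one candidate context per arm
        net.zero_grad()
        mu = net(x.unsqueeze(0)).squeeze()
        mu.backward()
        g = torch.cat([
            (p.grad if p.grad is not None else torch.zeros_like(p)).flatten()
            for p in net.parameters()])
        width = torch.sqrt(g @ Z_inv @ g)    # confidence-interval half-width
        scores.append(mu.item() + beta * width.item())
    return scores                            # pick the arm with the largest score
```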

BatchEnsemble: an Alternative Approach to Efficient Ensemble and Lifelong Learning

Title BatchEnsemble: an Alternative Approach to Efficient Ensemble and Lifelong Learning
Authors Anonymous
Abstract Ensembles, where multiple neural networks are trained individually and their predictions are averaged, have been shown to be widely successful for improving both the accuracy and the predictive uncertainty of single neural networks. However, an ensemble’s cost for both training and testing increases linearly with the number of networks. In this paper, we propose BatchEnsemble, an ensemble method whose computational and memory costs are significantly lower than those of typical ensembles. BatchEnsemble achieves this by defining each weight matrix to be the Hadamard product of a shared weight among all ensemble members and a rank-one matrix per member. Unlike typical ensembles, BatchEnsemble is not only parallelizable across devices, where one device trains one member, but also parallelizable within a device, where multiple ensemble members are updated simultaneously for a given mini-batch. Across CIFAR-10, CIFAR-100, WMT14 EN-DE/EN-FR translation, and contextual bandit tasks, BatchEnsemble yields accuracy and uncertainty estimates competitive with typical ensembles, with a 3x test-time speedup and a 3x memory reduction for an ensemble of size 4. We also apply BatchEnsemble to lifelong learning, where on Split-CIFAR-100 it yields performance comparable to progressive neural networks while having much lower computational and memory costs. We further show that BatchEnsemble easily scales up to lifelong learning on Split-ImageNet, which involves 100 sequential learning tasks.
Tasks Multi-Armed Bandits
Published 2020-01-01
URL https://openreview.net/forum?id=Sklf1yrYDr
PDF https://openreview.net/pdf?id=Sklf1yrYDr
PWC https://paperswithcode.com/paper/batchensemble-an-alternative-approach-to
Repo
Framework
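
The weight construction in the abstract is concrete enough to sketch: member m owns vectors r_m and s_m, and its effective weight is the shared W scaled elementwise by the rank-one matrix s_m r_m^T, which can be applied as cheap input/output scalings instead of materializing a weight per member. A minimal sketch; initialization and the batched all-members forward pass are simplified.

```python
import torch

class BatchEnsembleLinear(torch.nn.Module):
    """Linear layer whose member m uses weight W * outer(s_m, r_m)."""

    def __init__(self, d_in, d_out, n_members):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.randn(d_out, d_in) * 0.02)
        self.r = torch.nn.Parameter(torch.ones(n_members, d_in))   # input scale
        self.s = torch.nn.Parameter(torch.ones(n_members, d_out))  # output scale

    def forward(self, x, member):
        # (x * r_m) @ W.T * s_m  ==  x @ (W * outer(s_m, r_m)).T
        return (x * self.r[member]) @ self.weight.t() * self.s[member]

layer = BatchEnsembleLinear(8, 4, n_members=3)
y = layer(torch.randn(5, 8), member=1)   # shape (5, 4): member-1 prediction
```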