April 1, 2020

2717 words 13 mins read

Paper Group NANR 136


Variational Hyper RNN for Sequence Modeling

Title Variational Hyper RNN for Sequence Modeling
Authors Anonymous
Abstract In this work, we propose a novel probabilistic sequence model that excels at capturing high variability in time series data, both across sequences and within an individual sequence. Our method uses temporal latent variables to capture information about the underlying data pattern and dynamically decodes the latent information into modifications of weights of the base decoder and recurrent model. The efficacy of the proposed method is demonstrated on a range of synthetic and real-world sequential data that exhibit large scale variations, regime shifts, and complex dynamics.
Tasks Time Series
Published 2020-01-01
URL https://openreview.net/forum?id=SylUiREKvB
PDF https://openreview.net/pdf?id=SylUiREKvB
PWC https://paperswithcode.com/paper/variational-hyper-rnn-for-sequence-modeling
Repo
Framework
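The core mechanism here is decoding temporal latent variables into modifications of the recurrent model's weights. Below is a minimal PyTorch sketch of that general idea, with a per-timestep latent code scaling the rows of a vanilla RNN cell's recurrent weight matrix; the cell type, sizes, and sigmoid modulation are illustrative assumptions, not the authors' architecture, and the inference network that would produce z_t is omitted.

```python
# Sketch: a recurrent cell whose weights are modulated by a per-timestep latent
# variable, in the spirit of hypernetwork-style RNNs. Names and sizes are
# illustrative, not the paper's architecture.
import torch
import torch.nn as nn

class LatentModulatedRNNCell(nn.Module):
    def __init__(self, input_size, hidden_size, latent_size):
        super().__init__()
        self.base = nn.RNNCell(input_size, hidden_size)
        # Small "hyper" network: maps the latent z_t to per-unit scales
        # applied to the base cell's recurrent weights.
        self.scale_hh = nn.Linear(latent_size, hidden_size)

    def forward(self, x_t, h_prev, z_t):
        # Scale the hidden-to-hidden weight rows with the latent code.
        scale = torch.sigmoid(self.scale_hh(z_t)).unsqueeze(2)   # (B, H, 1)
        w_hh = self.base.weight_hh.unsqueeze(0) * scale          # (B, H, H)
        pre = (x_t @ self.base.weight_ih.T + self.base.bias_ih
               + torch.bmm(w_hh, h_prev.unsqueeze(2)).squeeze(2)
               + self.base.bias_hh)
        return torch.tanh(pre)

cell = LatentModulatedRNNCell(input_size=8, hidden_size=16, latent_size=4)
h = torch.zeros(32, 16)
for x_t, z_t in zip(torch.randn(10, 32, 8), torch.randn(10, 32, 4)):
    h = cell(x_t, h, z_t)            # latent z_t reshapes the dynamics each step
```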

Context Based Machine Translation With Recurrent Neural Network For English-Amharic Translation

Title Context Based Machine Translation With Recurrent Neural Network For English-Amharic Translation
Authors Anonymous
Abstract Current approaches to machine translation, such as neural machine translation (NMT), statistical machine translation (SMT), and example-based machine translation (EBMT), usually require a large parallel corpus in order to achieve fluency. The context awareness of phrase-based machine translation (PBMT) approaches is also questionable. This research develops a system that translates English text to Amharic text using a combination of context-based machine translation (CBMT) and a recurrent neural network machine translation (RNNMT). We built a bilingual dictionary for the CBMT system to use along with a large target corpus. The RNNMT model is then provided with the output of the CBMT and a parallel corpus for training. Our combinational approach on the English-Amharic language pair yields a performance improvement over simple neural machine translation (NMT).
Tasks Machine Translation
Published 2020-01-01
URL https://openreview.net/forum?id=r1lUdpVtwB
PDF https://openreview.net/pdf?id=r1lUdpVtwB
PWC https://paperswithcode.com/paper/context-based-machine-translation-with
Repo
Framework
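A toy sketch of the combination pipeline the abstract describes: a dictionary-based (CBMT-style) draft translation is produced first and handed to the neural model together with the source sentence. The two dictionary entries and the `nmt_translate` callable are hypothetical placeholders, not the authors' components.

```python
# Toy illustration of combining a dictionary-based draft with a neural model.
en_am_dictionary = {"hello": "ሰላም", "world": "ዓለም"}   # toy bilingual dictionary

def cbmt_draft(sentence):
    # Word-by-word lookup; unknown words are passed through unchanged.
    return " ".join(en_am_dictionary.get(w.lower(), w) for w in sentence.split())

def combined_translate(sentence, nmt_translate):
    draft = cbmt_draft(sentence)
    # The neural model receives both the source sentence and the CBMT draft.
    return nmt_translate(source=sentence, context=draft)

# Example with a stand-in for the neural model:
print(combined_translate("Hello world", lambda source, context: context))
```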

Unsupervised Out-of-Distribution Detection with Batch Normalization

Title Unsupervised Out-of-Distribution Detection with Batch Normalization
Authors Anonymous
Abstract Likelihood from a generative model is a natural statistic for detecting out-of-distribution (OoD) samples. However, generative models have been shown to assign higher likelihood to OoD samples compared to ones from the training distribution, preventing simple threshold-based detection rules. We demonstrate that OoD detection fails even when using more sophisticated statistics based on the likelihoods of individual samples. To address these issues, we propose a new method that leverages batch normalization. We argue that batch normalization for generative models challenges the traditional \emph{i.i.d.} data assumption and changes the corresponding maximum likelihood objective. Based on this insight, we propose to exploit in-batch dependencies for OoD detection. Empirical results suggest that this leads to more robust detection for high-dimensional images.
Tasks Out-of-Distribution Detection
Published 2020-01-01
URL https://openreview.net/forum?id=SJeC2TNYwB
PDF https://openreview.net/pdf?id=SJeC2TNYwB
PWC https://paperswithcode.com/paper/unsupervised-out-of-distribution-detection-1
Repo
Framework
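One simple way to exploit in-batch dependencies through batch normalization, sketched below, is to compare a generative model's likelihood on a test batch when BN uses the batch's own statistics versus the stored training statistics. This is only an illustration of the general idea; `model.log_prob` is an assumed interface, and the paper's actual detection statistic may differ.

```python
# Batch-level OoD scoring sketch based on swapping batch-norm statistics.
import torch

@torch.no_grad()
def batchnorm_ood_score(model, batch):
    model.eval()                       # BN uses running (training) statistics
    ll_running = model.log_prob(batch).mean()

    model.train()                      # BN uses the test batch's own statistics
    ll_batch = model.log_prob(batch).mean()

    model.eval()
    # In-distribution batches tend to change little when the statistics are
    # swapped; a large gap suggests the batch is out-of-distribution.
    return (ll_batch - ll_running).item()
```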

Score and Lyrics-Free Singing Voice Generation

Title Score and Lyrics-Free Singing Voice Generation
Authors Anonymous
Abstract Generative models for singing voice have been mostly concerned with the task of “singing voice synthesis,” i.e., to produce singing voice waveforms given musical scores and text lyrics. In this work, we explore a novel yet challenging alternative: singing voice generation without pre-assigned scores and lyrics, in both training and inference time. In particular, we experiment with three different schemes: 1) free singer, where the model generates singing voices without taking any conditions; 2) accompanied singer, where the model generates singing voices over a waveform of instrumental music; and 3) solo singer, where the model improvises a chord sequence first and then uses that to generate voices. We outline the associated challenges and propose a pipeline to tackle these new tasks. This involves the development of source separation and transcription models for data preparation, adversarial networks for audio generation, and customized metrics for evaluation.
Tasks Audio Generation
Published 2020-01-01
URL https://openreview.net/forum?id=HygcdeBFvr
PDF https://openreview.net/pdf?id=HygcdeBFvr
PWC https://paperswithcode.com/paper/score-and-lyrics-free-singing-voice
Repo
Framework
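A generic sketch of the "accompanied singer" setting: a generator receives frame-level features of the instrumental accompaniment plus noise and outputs a singing-voice representation, which would be trained adversarially against a discriminator (not shown). The network shapes and feature dimensions are placeholders, not the paper's architecture.

```python
# Placeholder generator for singing over an accompaniment, conditioned only on
# instrumental features and noise (no score, no lyrics).
import torch
import torch.nn as nn

class AccompaniedSinger(nn.Module):
    def __init__(self, accomp_dim=80, noise_dim=32, voice_dim=80):
        super().__init__()
        self.noise_dim = noise_dim
        self.rnn = nn.GRU(accomp_dim + noise_dim, 256, batch_first=True)
        self.out = nn.Linear(256, voice_dim)

    def forward(self, accompaniment):                 # (B, T, accomp_dim) features
        b, t, _ = accompaniment.shape
        noise = torch.randn(b, t, self.noise_dim, device=accompaniment.device)
        h, _ = self.rnn(torch.cat([accompaniment, noise], dim=-1))
        return self.out(h)                            # (B, T, voice_dim) voice features

gen = AccompaniedSinger()
voice = gen(torch.randn(2, 100, 80))                  # sing over 100 accompaniment frames
```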

Cross-Domain Few-Shot Classification via Learned Feature-Wise Transformation

Title Cross-Domain Few-Shot Classification via Learned Feature-Wise Transformation
Authors Anonymous
Abstract Few-shot classification aims to recognize novel categories with only few labeled images in each class. Existing metric-based few-shot classification algorithms predict categories by comparing the feature embeddings of query images with those from a few labeled images (support examples) using a learned metric function. While promising performance has been demonstrated, these methods often fail to generalize to unseen domains due to the large discrepancy of feature distributions across domains. In this work, we address the problem of few-shot classification under domain shifts for metric-based methods. Our core idea is to use feature-wise transformation layers for augmenting the image features using affine transforms to simulate various feature distributions under different domains in the training stage. To capture variations of the feature distributions under different domains, we further apply a learning-to-learn approach to search for the hyper-parameters of the feature-wise transformation layers. We conduct extensive experiments and ablation studies under the domain generalization setting using five few-shot classification datasets: mini-ImageNet, CUB, Cars, Places, and Plantae. Experimental results demonstrate that the proposed feature-wise transformation layer is applicable to various metric-based models, and provides consistent improvements in few-shot classification performance under domain shift.
Tasks Cross-Domain Few-Shot, Domain Generalization
Published 2020-01-01
URL https://openreview.net/forum?id=SJl5Np4tPr
PDF https://openreview.net/pdf?id=SJl5Np4tPr
PWC https://paperswithcode.com/paper/cross-domain-few-shot-classification-via
Repo
Framework
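A minimal PyTorch sketch of a feature-wise transformation layer as described: during training, per-channel scale and bias are sampled from distributions with learnable spreads to simulate feature statistics of unseen domains, and the layer reduces to the identity at test time. The initial values are illustrative, and the learning-to-learn search over these hyper-parameters is omitted.

```python
# Feature-wise transformation layer: sampled per-channel affine perturbation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureWiseTransform(nn.Module):
    def __init__(self, num_channels):
        super().__init__()
        # Learnable spreads; softplus keeps the sampling scales positive.
        self.gamma_std = nn.Parameter(torch.full((1, num_channels, 1, 1), 0.3))
        self.beta_std = nn.Parameter(torch.full((1, num_channels, 1, 1), 0.5))

    def forward(self, x):                              # x: (B, C, H, W) feature map
        if not self.training:
            return x                                   # identity at test time
        noise_shape = (x.size(0), x.size(1), 1, 1)
        gamma = 1.0 + torch.randn(noise_shape, device=x.device) * F.softplus(self.gamma_std)
        beta = torch.randn(noise_shape, device=x.device) * F.softplus(self.beta_std)
        return gamma * x + beta                        # perturbed features during training

layer = FeatureWiseTransform(num_channels=64)
augmented = layer(torch.randn(4, 64, 8, 8))
```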

GENERALIZATION GUARANTEES FOR NEURAL NETS VIA HARNESSING THE LOW-RANKNESS OF JACOBIAN

Title GENERALIZATION GUARANTEES FOR NEURAL NETS VIA HARNESSING THE LOW-RANKNESS OF JACOBIAN
Authors Anonymous
Abstract Modern neural network architectures often generalize well despite containing many more parameters than the size of the training dataset. This paper explores the generalization capabilities of neural networks trained via gradient descent. We develop a data-dependent optimization and generalization theory which leverages the low-rank structure of the Jacobian matrix associated with the network. Our results help demystify why training and generalization are easier on clean and structured datasets and harder on noisy and unstructured datasets, as well as how the network size affects the evolution of the train and test errors during training. Specifically, we use a control knob to split the Jacobian spectrum into "information" and "nuisance" spaces associated with the large and small singular values. We show that over the information space learning is fast and one can quickly train a model with zero training loss that can also generalize well. Over the nuisance space training is slower and early stopping can help with generalization at the expense of some bias. We also show that the overall generalization capability of the network is controlled by how well the labels are aligned with the information space. A key feature of our results is that even constant-width neural nets can provably generalize for sufficiently nice datasets. We conduct various numerical experiments on deep networks that corroborate our theoretical findings and demonstrate that: (i) the Jacobian of typical neural networks exhibits low-rank structure, with a few large singular values and many small ones, leading to a low-dimensional information space; (ii) over the information space learning is fast and most of the labels fall on this space; and (iii) label noise falls on the nuisance space and impedes optimization/generalization.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=ryl5CJSFPS
PDF https://openreview.net/pdf?id=ryl5CJSFPS
PWC https://paperswithcode.com/paper/generalization-guarantees-for-neural-nets-via
Repo
Framework
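A toy numerical illustration of the information/nuisance split: compute the Jacobian of a small network's outputs with respect to its parameters, take its SVD, and measure how much of the label vector lies in the span of the top singular directions. Dimensions, the rank cutoff, and the synthetic labels are arbitrary choices for illustration only.

```python
# Illustrative Jacobian spectrum analysis for a tiny network.
import numpy as np
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(5, 32), nn.ReLU(), nn.Linear(32, 1))
X = torch.randn(40, 5)
y = torch.sign(X[:, :1])                         # simple structured labels

def jacobian_wrt_params(net, X):
    rows = []
    for x in X:
        net.zero_grad()
        net(x.unsqueeze(0)).squeeze().backward()
        rows.append(torch.cat([p.grad.flatten() for p in net.parameters()]))
    return torch.stack(rows)                     # (n_samples, n_params)

J = jacobian_wrt_params(net, X).numpy()
U, S, _ = np.linalg.svd(J, full_matrices=False)
k = int((S > 0.1 * S[0]).sum())                  # "information space" dimension
y_np = y.numpy().ravel()
info_energy = np.linalg.norm(U[:, :k].T @ y_np) ** 2 / np.linalg.norm(y_np) ** 2
print(f"top-{k} singular directions capture {info_energy:.2f} of the label energy")
```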

WaveFlow: A Compact Flow-based Model for Raw Audio

Title WaveFlow: A Compact Flow-based Model for Raw Audio
Authors Anonymous
Abstract In this work, we present WaveFlow, a small-footprint generative flow for raw audio, which is trained with maximum likelihood without the complicated density distillation and auxiliary losses used in Parallel WaveNet. It provides a unified view of flow-based models for raw audio, including autoregressive flow (e.g., WaveNet) and bipartite flow (e.g., WaveGlow) as special cases. We systematically study these likelihood-based generative models for raw waveforms in terms of test likelihood and speech fidelity. We demonstrate that WaveFlow can synthesize high-fidelity speech with a likelihood comparable to WaveNet, while only requiring a few sequential steps to generate very long waveforms. In particular, our small-footprint WaveFlow has only 5.91M parameters and can generate 22.05kHz speech 15.39 times faster than real-time on a GPU without customized inference kernels.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=Skeh1krtvH
PDF https://openreview.net/pdf?id=Skeh1krtvH
PWC https://paperswithcode.com/paper/waveflow-a-compact-flow-based-model-for-raw
Repo
Framework
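The small number of sequential steps comes from how the waveform is laid out before the flow is applied. Below is a small sketch of that reshaping step, under the paper's framing that the height dimension is the only autoregressive one, so h = 1 corresponds to a WaveGlow-like bipartite flow and h = T to a WaveNet-like fully autoregressive one; the flow layers themselves are not shown.

```python
# Fold a 1-D waveform into a 2-D array so only the short height dimension needs
# to be generated sequentially.
import numpy as np

def fold_waveform(x, h):
    # x: 1-D waveform of length T (assumed divisible by h for simplicity);
    # adjacent samples end up in the same column of the (h, T // h) array.
    return x.reshape(-1, h).T

x = np.random.randn(16000)              # 1 second of 16 kHz audio
x2d = fold_waveform(x, h=16)            # roughly 16 sequential steps instead of 16000
```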

Perceptual Generative Autoencoders

Title Perceptual Generative Autoencoders
Authors Anonymous
Abstract Modern generative models are usually designed to match target distributions directly in the data space, where the intrinsic dimensionality of data can be much lower than the ambient dimensionality. We argue that this discrepancy may contribute to the difficulties in training generative models. We therefore propose to map both the generated and target distributions to the latent space using the encoder of a standard autoencoder, and train the generator (or decoder) to match the target distribution in the latent space. The resulting method, perceptual generative autoencoder (PGA), is then incorporated with a maximum likelihood or variational autoencoder (VAE) objective to train the generative model. With maximum likelihood, PGAs generalize the idea of reversible generative models to unrestricted neural network architectures and arbitrary latent dimensionalities. When combined with VAEs, PGAs can generate sharper samples than vanilla VAEs. Compared to other autoencoder-based generative models using simple priors, PGAs achieve state-of-the-art FID scores on CIFAR-10 and CelebA.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=BkxX30EFPS
PDF https://openreview.net/pdf?id=BkxX30EFPS
PWC https://paperswithcode.com/paper/perceptual-generative-autoencoders-1
Repo
Framework
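A simplified training-step sketch of matching distributions in latent space rather than data space: the encoder maps both real data and generated samples to latent codes, and the decoder is penalized when re-encoded generations drift from their source codes. This captures only the general idea; the losses, networks, and weighting are illustrative, not the paper's full objective.

```python
# Latent-space matching sketch with a plain autoencoder pair.
import torch
import torch.nn as nn
import torch.nn.functional as F

latent_dim = 16
encoder = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, latent_dim))
decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, 784))
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

def pga_style_step(x):
    z_data = encoder(x)                       # latent codes of real data
    x_rec = decoder(z_data)                   # plain autoencoder reconstruction
    z_prior = torch.randn_like(z_data)        # samples from the prior
    z_regen = encoder(decoder(z_prior))       # re-encoded generated samples
    loss = (F.mse_loss(x_rec, x)              # reconstruction in data space
            + F.mse_loss(z_regen, z_prior))   # matching in latent space
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

print(pga_style_step(torch.rand(64, 784)))
```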

Measuring and Improving the Use of Graph Information in Graph Neural Networks

Title Measuring and Improving the Use of Graph Information in Graph Neural Networks
Authors Anonymous
Abstract Graph neural networks (GNNs) have been widely used for representation learning on graph data. However, there is limited understanding on how much performance GNNs actually gain from graph data. This paper introduces a context-surrounding GNN framework and proposes two smoothness metrics to measure the quantity and quality of information obtained from graph data. A new, improved GNN model, called CS-GNN, is then devised to improve the use of graph information based on the smoothness values of a graph. CS-GNN is shown to achieve better performance than existing methods in different types of real graphs.
Tasks Representation Learning
Published 2020-01-01
URL https://openreview.net/forum?id=rkeIIkHKvS
PDF https://openreview.net/pdf?id=rkeIIkHKvS
PWC https://paperswithcode.com/paper/measuring-and-improving-the-use-of-graph
Repo
Framework
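Illustrative graph smoothness statistics in the spirit of the two metrics mentioned above (not their exact definitions): one averages feature disagreement across edges, the other counts edges joining nodes of different classes.

```python
# Simple edge-based smoothness statistics for a node-attributed graph.
import numpy as np

def feature_smoothness(edges, features):
    diffs = [np.linalg.norm(features[u] - features[v], 1) for u, v in edges]
    return float(np.mean(diffs))

def label_smoothness(edges, labels):
    return float(np.mean([labels[u] != labels[v] for u, v in edges]))

edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
features = np.random.rand(4, 8)
labels = np.array([0, 0, 1, 1])
print(feature_smoothness(edges, features), label_smoothness(edges, labels))
```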

City Metro Network Expansion with Reinforcement Learning

Title City Metro Network Expansion with Reinforcement Learning
Authors Anonymous
Abstract This paper presents a method to solve the city metro network expansion problem using reinforcement learning (RL). In this method, we formulate metro expansion as a process of sequential station selection, and design feasibility rules based on the selected station sequence to ensure reasonable connection patterns for the metro line. Following this formulation, we train an actor-critic model to design the next metro line. The actor is a seq2seq network with an attention mechanism that generates the parameterized policy, i.e., the probability distribution over feasible stations. The critic estimates the expected reward, determined by the station sequences generated by the actor during training, in order to reduce the training variance. The learning procedure only requires the reward calculation, so our general method can easily be extended to multi-factor cases. Considering origin-destination (OD) trips and social equity, we expand the current metro network in Xi’an, China, based on the real mobility information of 24,770,715 mobile phone users in the whole city. The results demonstrate the effectiveness of our method.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=SJxAlgrYDr
PDF https://openreview.net/pdf?id=SJxAlgrYDr
PWC https://paperswithcode.com/paper/city-metro-network-expansion-with
Repo
Framework
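A toy sketch of the sequential station-selection loop with feasibility masking: at each step the policy scores all stations, infeasible ones are masked out, and the next station is sampled from the resulting distribution. The policy and feasibility rule here are placeholder callables, not the paper's seq2seq actor or its real connection constraints.

```python
# Masked sequential selection of stations under a parameterized policy.
import torch

def select_line(policy_scores, is_feasible, line_length, num_stations):
    line = []
    for _ in range(line_length):
        scores = policy_scores(line)                          # (num_stations,)
        mask = torch.tensor([is_feasible(line, s) for s in range(num_stations)])
        scores = scores.masked_fill(~mask, float("-inf"))     # drop infeasible stations
        probs = torch.softmax(scores, dim=0)                  # policy over feasible stations
        line.append(torch.multinomial(probs, 1).item())
    return line

num_stations = 20
dummy_policy = lambda line: torch.randn(num_stations)          # stand-in actor
no_repeat = lambda line, s: s not in line                      # toy feasibility rule
print(select_line(dummy_policy, no_repeat, line_length=5, num_stations=num_stations))
```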

A Coordinate-Free Construction of Scalable Natural Gradient

Title A Coordinate-Free Construction of Scalable Natural Gradient
Authors Anonymous
Abstract Most neural networks are trained using first-order optimization methods, which are sensitive to the parameterization of the model. Natural gradient descent is invariant to smooth reparameterizations because it is defined in a coordinate-free way, but tractable approximations are typically defined in terms of coordinate systems, and hence may lose the invariance properties. We analyze the invariance properties of the Kronecker-Factored Approximate Curvature (K-FAC) algorithm by constructing the algorithm in a coordinate-free way. We explicitly construct a Riemannian metric under which the natural gradient matches the K-FAC update; invariance to affine transformations of the activations follows immediately. We extend our framework to analyze the invariance properties of K-FAC applied to convolutional networks and recurrent neural networks, as well as metrics other than the usual Fisher metric.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=H1lBYCEFDB
PDF https://openreview.net/pdf?id=H1lBYCEFDB
PWC https://paperswithcode.com/paper/a-coordinate-free-construction-of-scalable-1
Repo
Framework
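For reference, a minimal NumPy sketch of the K-FAC preconditioner for a single fully connected layer: the Fisher is approximated by a Kronecker product of two small matrices, one built from the layer's input activations and one from the back-propagated gradients at its output, so the approximate natural-gradient step only needs two small matrix inverses. Damping and dimensions are illustrative.

```python
# K-FAC preconditioning of the gradient for one fully connected layer.
import numpy as np

rng = np.random.default_rng(0)
n, d_in, d_out = 256, 20, 10
a = rng.normal(size=(n, d_in))        # layer inputs for a batch
g = rng.normal(size=(n, d_out))       # back-propagated gradients at the layer output
grad_W = g.T @ a / n                  # ordinary gradient w.r.t. W (d_out x d_in)

damping = 1e-3
A = a.T @ a / n + damping * np.eye(d_in)     # input covariance factor
G = g.T @ g / n + damping * np.eye(d_out)    # output-gradient covariance factor

# Approximate natural-gradient direction: G^{-1} grad_W A^{-1}.
nat_grad_W = np.linalg.solve(G, grad_W) @ np.linalg.inv(A)
# A gradient-descent step would then use nat_grad_W in place of grad_W.
```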

Information Plane Analysis of Deep Neural Networks via Matrix–Based Renyi’s Entropy and Tensor Kernels

Title Information Plane Analysis of Deep Neural Networks via Matrix–Based Renyi’s Entropy and Tensor Kernels
Authors Anonymous
Abstract Analyzing deep neural networks (DNNs) via information plane (IP) theory has gained tremendous attention recently as a tool to gain insight into, among others, their generalization ability. However, it is by no means obvious how to estimate mutual information (MI) between each hidden layer and the input/desired output, to construct the IP. For instance, hidden layers with many neurons require MI estimators with robustness towards the high dimensionality associated with such layers. MI estimators should also be able to naturally handle convolutional layers, while at the same time being computationally tractable to scale to large networks. None of the existing IP methods to date have been able to study truly deep Convolutional Neural Networks (CNNs), such as VGG-16. In this paper, we propose an IP analysis using the new matrix-based Rényi’s entropy coupled with tensor kernels over convolutional layers, leveraging the power of kernel methods to represent properties of the probability distribution independently of the dimensionality of the data. The obtained results shed new light on previous literature concerning small-scale DNNs, albeit using a completely new approach. Importantly, the new framework enables us to provide the first comprehensive IP analysis of contemporary large-scale DNNs and CNNs, investigating the different training phases and providing new insights into the training dynamics of large-scale neural networks.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=B1l0wp4tvr
PDF https://openreview.net/pdf?id=B1l0wp4tvr
PWC https://paperswithcode.com/paper/information-plane-analysis-of-deep-neural-1
Repo
Framework
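A short sketch of the matrix-based Rényi alpha-entropy estimator the analysis builds on: form a Gram matrix over a mini-batch of layer activations, normalize it to unit trace, and compute the entropy from its eigenvalue spectrum as S_alpha(A) = log2(sum_i lambda_i(A)^alpha) / (1 - alpha). The Gaussian kernel and its width are illustrative choices; the paper's tensor-kernel treatment of convolutional layers is not shown.

```python
# Matrix-based Rényi alpha-entropy from a normalized Gram matrix of activations.
import numpy as np

def matrix_renyi_entropy(X, alpha=1.01, sigma=1.0):
    # Gaussian Gram matrix over samples, normalized to unit trace.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq_dists / (2 * sigma ** 2))
    A = K / np.trace(K)
    eigvals = np.clip(np.linalg.eigvalsh(A), 0, None)
    return np.log2(np.sum(eigvals ** alpha)) / (1 - alpha)

activations = np.random.randn(100, 32)      # e.g. one layer's outputs on a mini-batch
print(matrix_renyi_entropy(activations))
```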

Mixing Up Real Samples and Adversarial Samples for Semi-Supervised Learning

Title Mixing Up Real Samples and Adversarial Samples for Semi-Supervised Learning
Authors Anonymous
Abstract Consistency regularization methods have shown great success in semi-supervised learning tasks. Most existing methods focus on either the local neighborhood or in-between neighborhood of training samples to enforce the consistency constraint. In this paper, we propose a novel generalized framework called Adversarial Mixup (AdvMixup), which unifies the local and in-between neighborhood approaches by defining a virtual data distribution along the paths between the training samples and adversarial samples. Experimental results on both synthetic data and benchmark datasets exhibit the benefits of AdvMixup on semi-supervised learning.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=rklKdJSYPS
PDF https://openreview.net/pdf?id=rklKdJSYPS
PWC https://paperswithcode.com/paper/mixing-up-real-samples-and-adversarial
Repo
Framework
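A rough PyTorch sketch of mixing a clean sample with its adversarial counterpart and enforcing prediction consistency at a point on the path between them; the one-step attack, KL consistency loss, and weighting are illustrative, not the paper's exact recipe. In a semi-supervised setup this term would be added to the supervised loss on the labeled subset.

```python
# Consistency loss on virtual samples interpolated between clean and adversarial points.
import torch
import torch.nn.functional as F

def advmixup_consistency(model, x, eps=0.03):
    # x: image batch of shape (B, C, H, W)
    with torch.no_grad():
        p_clean = F.softmax(model(x), dim=1)            # consistency targets

    # One-step adversarial perturbation against the model's own predictions.
    x_adv = x.clone().requires_grad_(True)
    loss_adv = F.kl_div(F.log_softmax(model(x_adv), dim=1), p_clean, reduction="batchmean")
    grad, = torch.autograd.grad(loss_adv, x_adv)
    x_adv = (x + eps * grad.sign()).detach()

    # Virtual sample on the path between the clean and adversarial points.
    lam = torch.rand(x.size(0), 1, 1, 1, device=x.device)
    x_mix = lam * x + (1 - lam) * x_adv
    return F.kl_div(F.log_softmax(model(x_mix), dim=1), p_clean, reduction="batchmean")
```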

Mirror-Generative Neural Machine Translation

Title Mirror-Generative Neural Machine Translation
Authors Anonymous
Abstract Training neural machine translation (NMT) models requires a large parallel corpus, which is scarce for many language pairs. However, raw non-parallel corpora are often easy to obtain. Existing approaches have not exploited the full potential of non-parallel bilingual data, either in training or decoding. In this paper, we propose mirror-generative NMT (MGNMT), a single unified architecture that simultaneously integrates the source-to-target translation model, the target-to-source translation model, and two language models. The translation models and language models share the same latent semantic space, so both translation directions can learn from non-parallel data more effectively. Moreover, the translation models and language models can collaborate during decoding. Our experiments show that the proposed MGNMT consistently outperforms existing approaches across a variety of scenarios and language pairs, including resource-rich and low-resource languages.
Tasks Machine Translation
Published 2020-01-01
URL https://openreview.net/forum?id=HkxQRTNYPH
PDF https://openreview.net/pdf?id=HkxQRTNYPH
PWC https://paperswithcode.com/paper/mirror-generative-neural-machine-translation
Repo
Framework

Domain Adaptation via Low-Rank Basis Approximation

Title Domain Adaptation via Low-Rank Basis Approximation
Authors Anonymous
Abstract Domain adaptation focuses on the reuse of supervised learning models in a new context. Prominent applications can be found in robotics, image processing or web mining. In these areas, learning scenarios change by nature, but often remain related and motivate the reuse of existing supervised models. While the majority of symmetric and asymmetric domain adaptation algorithms utilize all available source and target domain data, we show that efficient domain adaptation requires only a substantially smaller subset from both domains. This makes it more suitable for real-world scenarios where target domain data is rare. The presented approach finds a target subspace representation for source and target data to address domain differences by orthogonal basis transfer. By employing a low-rank approximation, the approach remains low in computational time. The presented idea is evaluated in typical domain adaptation tasks with standard benchmark data.
Tasks Domain Adaptation
Published 2020-01-01
URL https://openreview.net/forum?id=BJeUs3VFPH
PDF https://openreview.net/pdf?id=BJeUs3VFPH
PWC https://paperswithcode.com/paper/domain-adaptation-via-low-rank-basis-1
Repo
Framework
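A small NumPy sketch of orthogonal basis transfer with a low-rank (truncated SVD) approximation in the spirit of the approach: both domains are projected onto the target domain's top-k principal directions before a classifier is trained on the projected source data. The rank and centering choices are illustrative.

```python
# Project source and target data onto a low-rank orthogonal basis of the target domain.
import numpy as np

def low_rank_basis_transfer(X_src, X_tgt, k=10):
    mean = X_tgt.mean(axis=0)
    # Top-k right singular vectors of the centered target data form the basis.
    _, _, Vt = np.linalg.svd(X_tgt - mean, full_matrices=False)
    basis = Vt[:k].T                                   # (d, k) orthonormal columns
    return (X_src - mean) @ basis, (X_tgt - mean) @ basis

X_src = np.random.randn(200, 50) + 1.0                 # shifted source domain
X_tgt = np.random.randn(150, 50)
Z_src, Z_tgt = low_rank_basis_transfer(X_src, X_tgt)   # shared low-rank representation
```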