Paper Group AWR 69
Improving speech recognition by revising gated recurrent units. The Numerics of GANs. AP17-OLR Challenge: Data, Plan, and Baseline. Word Translation Without Parallel Data. The Implicit Bias of Gradient Descent on Separable Data. Learning Deep Latent Spaces for Multi-Label Classification. End-to-End Learning of Geometry and Context for Deep Stereo R …
Improving speech recognition by revising gated recurrent units
Title | Improving speech recognition by revising gated recurrent units |
Authors | Mirco Ravanelli, Philemon Brakel, Maurizio Omologo, Yoshua Bengio |
Abstract | Speech recognition is largely taking advantage of deep learning, showing that substantial benefits can be obtained by modern Recurrent Neural Networks (RNNs). The most popular RNNs are Long Short-Term Memory (LSTMs), which typically reach state-of-the-art performance in many tasks thanks to their ability to learn long-term dependencies and robustness to vanishing gradients. Nevertheless, LSTMs have a rather complex design with three multiplicative gates, which might impair their efficient implementation. An attempt to simplify LSTMs has recently led to Gated Recurrent Units (GRUs), which are based on just two multiplicative gates. This paper builds on these efforts by further revising GRUs and proposing a simplified architecture potentially more suitable for speech recognition. The contribution of this work is two-fold. First, we suggest removing the reset gate from the GRU design, resulting in a more efficient single-gate architecture. Second, we propose to replace tanh with ReLU activations in the state update equations. Results show that, in our implementation, the revised architecture reduces the per-epoch training time by more than 30% and consistently improves recognition performance across different tasks, input features, and noisy conditions when compared to a standard GRU. |
Tasks | Speech Recognition |
Published | 2017-09-29 |
URL | http://arxiv.org/abs/1710.00641v1 |
http://arxiv.org/pdf/1710.00641v1.pdf | |
PWC | https://paperswithcode.com/paper/improving-speech-recognition-by-revising |
Repo | https://github.com/mravanelli/theano-kaldi-rnn |
Framework | pytorch |
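
A minimal PyTorch sketch of the single-gate, ReLU-based cell described in the abstract: the reset gate is dropped and the candidate state uses ReLU instead of tanh. The layer sizes are illustrative, and the batch normalization used in the paper is omitted.

```python
import torch
import torch.nn as nn

class RevisedGRUCell(nn.Module):
    """Single-gate recurrent cell: no reset gate, ReLU candidate state."""
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.update_gate = nn.Linear(input_size + hidden_size, hidden_size)
        self.candidate = nn.Linear(input_size + hidden_size, hidden_size)

    def forward(self, x, h):
        xh = torch.cat([x, h], dim=-1)
        z = torch.sigmoid(self.update_gate(xh))       # update gate
        h_tilde = torch.relu(self.candidate(xh))      # ReLU candidate state (no reset gate)
        return z * h + (1.0 - z) * h_tilde            # interpolated new state

# usage: one time step on a batch of 8 frames with 40-dim acoustic features
cell = RevisedGRUCell(input_size=40, hidden_size=128)
h = torch.zeros(8, 128)
h = cell(torch.randn(8, 40), h)
```
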
The Numerics of GANs
Title | The Numerics of GANs |
Authors | Lars Mescheder, Sebastian Nowozin, Andreas Geiger |
Abstract | In this paper, we analyze the numerics of common algorithms for training Generative Adversarial Networks (GANs). Using the formalism of smooth two-player games, we analyze the associated gradient vector field of GAN training objectives. Our findings suggest that the convergence of current algorithms suffers due to two factors: i) the presence of eigenvalues of the Jacobian of the gradient vector field with zero real part, and ii) eigenvalues with a large imaginary part. Using these findings, we design a new algorithm that overcomes some of these limitations and has better convergence properties. Experimentally, we demonstrate its superiority on training common GAN architectures and show convergence on GAN architectures that are known to be notoriously hard to train. |
Tasks | |
Published | 2017-05-30 |
URL | http://arxiv.org/abs/1705.10461v3 |
http://arxiv.org/pdf/1705.10461v3.pdf | |
PWC | https://paperswithcode.com/paper/the-numerics-of-gans |
Repo | https://github.com/nhynes/abc |
Framework | pytorch |
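
The abstract's diagnosis can be reproduced numerically on a toy two-player game. The NumPy sketch below (my own example, not the authors' code) builds the simultaneous-gradient vector field of the bilinear game f(x, y) = x·y, whose Jacobian has purely imaginary eigenvalues (failure mode i above), and then adds a consensus-style correction -γ∇(½‖v‖²), which shifts the eigenvalues into the stable half-plane.

```python
import numpy as np

# Two-player bilinear game: player 1 minimizes f(x, y) = x*y, player 2 maximizes it.
# Simultaneous-gradient vector field: v(x, y) = (-df/dx, +df/dy) = (-y, x).
def v(w):
    x, y = w
    return np.array([-y, x])

# The Jacobian of v is constant for this game.
J = np.array([[0.0, -1.0],
              [1.0,  0.0]])
print("eigenvalues of J:", np.linalg.eigvals(J))        # purely imaginary: 0 +/- 1j

# Consensus-style correction: v_reg = v - gamma * grad(0.5 * ||v||^2) = v - gamma * J^T v.
# Its Jacobian gains a negative real part, so the fixed point becomes attracting.
gamma = 0.5
J_reg = J - gamma * (J.T @ J)
print("eigenvalues of regularized J:", np.linalg.eigvals(J_reg))   # -0.5 +/- 1j
```
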
AP17-OLR Challenge: Data, Plan, and Baseline
Title | AP17-OLR Challenge: Data, Plan, and Baseline |
Authors | Zhiyuan Tang, Dong Wang, Yixiang Chen, Qing Chen |
Abstract | We present the data profile and the evaluation plan of the second oriental language recognition (OLR) challenge AP17-OLR. Compared to the event last year (AP16-OLR), the new challenge involves more languages and focuses more on short utterances. The data is offered by SpeechOcean and the NSFC M2ASR project. Two types of baselines are constructed to assist the participants, one is based on the i-vector model and the other is based on various neural networks. We report the baseline results evaluated with various metrics defined by the AP17-OLR evaluation plan and demonstrate that the combined database is a reasonable data resource for multilingual research. All the data is free for participants, and the Kaldi recipes for the baselines have been published online. |
Tasks | |
Published | 2017-06-28 |
URL | http://arxiv.org/abs/1706.09742v1 |
http://arxiv.org/pdf/1706.09742v1.pdf | |
PWC | https://paperswithcode.com/paper/ap17-olr-challenge-data-plan-and-baseline |
Repo | https://github.com/Rithmax/Sub-band-Envelope-Features-Using-Frequency-Domain-Linear-Prediction |
Framework | none |
Word Translation Without Parallel Data
Title | Word Translation Without Parallel Data |
Authors | Alexis Conneau, Guillaume Lample, Marc’Aurelio Ranzato, Ludovic Denoyer, Hervé Jégou |
Abstract | State-of-the-art methods for learning cross-lingual word embeddings have relied on bilingual dictionaries or parallel corpora. Recent studies showed that the need for parallel data supervision can be alleviated with character-level information. While these methods showed encouraging results, they are not on par with their supervised counterparts and are limited to pairs of languages sharing a common alphabet. In this work, we show that we can build a bilingual dictionary between two languages without using any parallel corpora, by aligning monolingual word embedding spaces in an unsupervised way. Without using any character information, our model even outperforms existing supervised methods on cross-lingual tasks for some language pairs. Our experiments demonstrate that our method also works very well for distant language pairs, like English-Russian or English-Chinese. We finally describe experiments on the English-Esperanto low-resource language pair, on which there only exists a limited amount of parallel data, to show the potential impact of our method in fully unsupervised machine translation. Our code, embeddings and dictionaries are publicly available. |
Tasks | Machine Translation, Unsupervised Machine Translation, Word Alignment, Word Embeddings |
Published | 2017-10-11 |
URL | http://arxiv.org/abs/1710.04087v3 |
http://arxiv.org/pdf/1710.04087v3.pdf | |
PWC | https://paperswithcode.com/paper/word-translation-without-parallel-data |
Repo | https://github.com/facebookresearch/MUSE |
Framework | pytorch |
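
A NumPy sketch of the Procrustes refinement step used in this line of work: given a (possibly induced) dictionary of matched word pairs, the best orthogonal map between the two embedding spaces has a closed form via SVD. The adversarial initialization and CSLS retrieval from the paper are omitted, and the arrays below are illustrative placeholders rather than real embeddings.

```python
import numpy as np

def procrustes(X_src, Y_tgt):
    """Orthogonal W minimizing ||X_src W^T - Y_tgt||_F.
    Rows are word vectors of dictionary pairs (source word i <-> target word i)."""
    U, _, Vt = np.linalg.svd(Y_tgt.T @ X_src)   # W* = U V^T with U S V^T = svd(Y^T X)
    return U @ Vt

# toy example: 1000 "dictionary" pairs of 300-d embeddings related by a rotation
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 300))                       # source-language vectors
true_W = np.linalg.qr(rng.normal(size=(300, 300)))[0]  # a random orthogonal map
Y = X @ true_W.T                                       # target vectors = rotated source
W = procrustes(X, Y)
print(np.abs(W - true_W).max())                        # near machine precision
```
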
The Implicit Bias of Gradient Descent on Separable Data
Title | The Implicit Bias of Gradient Descent on Separable Data |
Authors | Daniel Soudry, Elad Hoffer, Mor Shpigel Nacson, Suriya Gunasekar, Nathan Srebro |
Abstract | We examine gradient descent on unregularized logistic regression problems, with homogeneous linear predictors on linearly separable datasets. We show the predictor converges to the direction of the max-margin (hard margin SVM) solution. The result also generalizes to other monotone decreasing loss functions with an infimum at infinity, to multi-class problems, and to training a weight layer in a deep network in a certain restricted setting. Furthermore, we show this convergence is very slow, and only logarithmic in the convergence of the loss itself. This can help explain the benefit of continuing to optimize the logistic or cross-entropy loss even after the training error is zero and the training loss is extremely small, and, as we show, even if the validation loss increases. Our methodology can also aid in understanding implicit regularization in more complex models and with other optimization methods. |
Tasks | |
Published | 2017-10-27 |
URL | http://arxiv.org/abs/1710.10345v4 |
http://arxiv.org/pdf/1710.10345v4.pdf | |
PWC | https://paperswithcode.com/paper/the-implicit-bias-of-gradient-descent-on |
Repo | https://github.com/paper-submissions/MaxMargin |
Framework | pytorch |
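
A NumPy sketch of the phenomenon described in the abstract, on my own toy data: plain gradient descent on the unregularized logistic loss over separable points keeps growing ||w|| (roughly like log t), while the direction w/||w|| stabilizes toward the max-margin direction the paper characterizes.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two well-separated Gaussian blobs, labels in {-1, +1}
X = np.vstack([rng.normal(+3, 1, size=(100, 2)), rng.normal(-3, 1, size=(100, 2))])
y = np.concatenate([np.ones(100), -np.ones(100)])

w = np.zeros(2)
lr = 0.1
for t in range(1, 100001):
    margins = y * (X @ w)
    # gradient of mean log(1 + exp(-y x.w)); 1/(1+exp(m)) = sigmoid(-m)
    grad = -(X * (y / (1.0 + np.exp(margins)))[:, None]).mean(axis=0)
    w -= lr * grad
    if t in (10, 100, 1000, 10000, 100000):
        print(f"t={t:6d}  ||w||={np.linalg.norm(w):7.3f}  direction={w / np.linalg.norm(w)}")
# ||w|| keeps growing (roughly logarithmically in t), while the printed direction
# changes less and less, approaching the hard-margin SVM direction.
```
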
Learning Deep Latent Spaces for Multi-Label Classification
Title | Learning Deep Latent Spaces for Multi-Label Classification |
Authors | Chih-Kuan Yeh, Wei-Chieh Wu, Wei-Jen Ko, Yu-Chiang Frank Wang |
Abstract | Multi-label classification is a practical yet challenging task in machine learning related fields, since it requires the prediction of more than one label category for each input instance. We propose a novel deep neural network (DNN) based model, Canonical Correlated AutoEncoder (C2AE), for solving this task. Aiming at better relating feature and label domain data for improved classification, we uniquely perform joint feature and label embedding by deriving a deep latent space, followed by the introduction of a label-correlation sensitive loss function for recovering the predicted label outputs. Our C2AE is achieved by integrating the DNN architectures of canonical correlation analysis and autoencoder, which allows end-to-end learning and prediction with the ability to exploit label dependency. Moreover, our C2AE can be easily extended to address the learning problem with missing labels. Our experiments on multiple datasets with different scales confirm the effectiveness and robustness of our proposed method, which is shown to perform favorably against state-of-the-art methods for multi-label classification. |
Tasks | Multi-Label Classification |
Published | 2017-07-03 |
URL | http://arxiv.org/abs/1707.00418v1 |
http://arxiv.org/pdf/1707.00418v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-deep-latent-spaces-for-multi-label |
Repo | https://github.com/yankeesrules/C2AE |
Framework | none |
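
A stripped-down PyTorch sketch of the idea in the abstract: a feature encoder and a label encoder share one latent space, and a decoder reconstructs the label vector from that space; at test time, labels are predicted by pushing features through encoder and decoder. The layer sizes and the plain L2 alignment loss are my simplifications; the paper uses a CCA-style objective and a label-correlation sensitive loss.

```python
import torch
import torch.nn as nn

class C2AESketch(nn.Module):
    def __init__(self, feat_dim, num_labels, latent_dim=64):
        super().__init__()
        self.fx = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, latent_dim))
        self.fe = nn.Sequential(nn.Linear(num_labels, 128), nn.ReLU(), nn.Linear(128, latent_dim))
        self.fd = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, num_labels))

    def loss(self, x, y, alpha=1.0):
        zx, zy = self.fx(x), self.fe(y)
        align = ((zx - zy) ** 2).sum(dim=1).mean()     # feature/label latent alignment
        recon = nn.functional.binary_cross_entropy_with_logits(self.fd(zy), y)
        return align + alpha * recon

    def predict(self, x):
        return torch.sigmoid(self.fd(self.fx(x)))      # label scores at test time

model = C2AESketch(feat_dim=300, num_labels=20)
x, y = torch.randn(32, 300), torch.randint(0, 2, (32, 20)).float()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
opt.zero_grad()
model.loss(x, y).backward()
opt.step()
```
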
End-to-End Learning of Geometry and Context for Deep Stereo Regression
Title | End-to-End Learning of Geometry and Context for Deep Stereo Regression |
Authors | Alex Kendall, Hayk Martirosyan, Saumitro Dasgupta, Peter Henry, Ryan Kennedy, Abraham Bachrach, Adam Bry |
Abstract | We propose a novel deep learning architecture for regressing disparity from a rectified pair of stereo images. We leverage knowledge of the problem’s geometry to form a cost volume using deep feature representations. We learn to incorporate contextual information using 3-D convolutions over this volume. Disparity values are regressed from the cost volume using a proposed differentiable soft argmin operation, which allows us to train our method end-to-end to sub-pixel accuracy without any additional post-processing or regularization. We evaluate our method on the Scene Flow and KITTI datasets and on KITTI we set a new state-of-the-art benchmark, while being significantly faster than competing approaches. |
Tasks | |
Published | 2017-03-13 |
URL | http://arxiv.org/abs/1703.04309v1 |
http://arxiv.org/pdf/1703.04309v1.pdf | |
PWC | https://paperswithcode.com/paper/end-to-end-learning-of-geometry-and-context |
Repo | https://github.com/JiaRenChang/PSMNet |
Framework | pytorch |
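
The differentiable soft argmin mentioned in the abstract has a compact definition: the predicted disparity is the expectation of the candidate disparities under a softmax over the negated matching costs. A PyTorch sketch (the shapes are illustrative):

```python
import torch

def soft_argmin(cost_volume):
    """cost_volume: (batch, max_disparity, height, width) matching costs.
    Returns sub-pixel disparity estimates of shape (batch, height, width)."""
    probs = torch.softmax(-cost_volume, dim=1)           # low cost -> high weight
    disp_values = torch.arange(cost_volume.shape[1], dtype=cost_volume.dtype,
                               device=cost_volume.device).view(1, -1, 1, 1)
    return (probs * disp_values).sum(dim=1)              # expected (sub-pixel) disparity

cost = torch.randn(2, 64, 32, 32)          # toy cost volume with 64 candidate disparities
print(soft_argmin(cost).shape)             # torch.Size([2, 32, 32])
```
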
Contrast Enhancement of Brightness-Distorted Images by Improved Adaptive Gamma Correction
Title | Contrast Enhancement of Brightness-Distorted Images by Improved Adaptive Gamma Correction |
Authors | Gang Cao, Lihui Huang, Huawei Tian, Xianglin Huang, Yongbin Wang, Ruicong Zhi |
Abstract | As an efficient image contrast enhancement (CE) tool, adaptive gamma correction (AGC) was previously proposed by relating the gamma parameter to the cumulative distribution function (CDF) of the pixel gray levels within an image. AGC deals well with most dimmed images, but fails for globally bright images and for dimmed images with local bright regions. Both categories of brightness-distorted images are common in real scenarios, for example due to improper exposure or white object regions. In order to attenuate such deficiencies, here we propose an improved AGC algorithm. The novel strategy of negative images is used to realize CE of the bright images, and gamma correction modulated by a truncated CDF is employed to enhance the dimmed ones. As such, local over-enhancement and structure distortion can be alleviated. Both qualitative and quantitative experimental results show that our proposed method yields consistently good CE results. |
Tasks | |
Published | 2017-09-13 |
URL | http://arxiv.org/abs/1709.04427v1 |
http://arxiv.org/pdf/1709.04427v1.pdf | |
PWC | https://paperswithcode.com/paper/contrast-enhancement-of-brightness-distorted |
Repo | https://github.com/leowang7/iagcwd |
Framework | none |
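
A rough NumPy sketch of the strategy described in the abstract: globally bright images are processed as negatives, and dim images get CDF-driven gamma correction with the CDF clipped to limit over-enhancement. The brightness threshold, the clipping level, and the exact gamma formula below are assumptions for illustration, not the authors' tuned formulas.

```python
import numpy as np

def agc_sketch(img, bright_threshold=0.5):
    """img: float array in [0, 1]. Returns a contrast-enhanced image."""
    negated = img.mean() > bright_threshold      # treat globally bright images as negatives
    work = 1.0 - img if negated else img

    hist, _ = np.histogram(work, bins=256, range=(0.0, 1.0))
    cdf = np.cumsum(hist) / hist.sum()
    levels = np.linspace(0.0, 1.0, 256)
    # per-level gamma from the CDF; clipping ("truncating") the CDF limits over-enhancement
    gamma = 1.0 - np.clip(cdf, 0.0, 0.75)
    lut = levels ** gamma
    out = np.interp(work, levels, lut)

    return 1.0 - out if negated else out

img = np.clip(np.random.rand(64, 64) * 0.4, 0, 1)     # a dim toy image
print(img.mean(), agc_sketch(img).mean())             # enhanced mean should be higher
```
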
Dance Dance Convolution
Title | Dance Dance Convolution |
Authors | Chris Donahue, Zachary C. Lipton, Julian McAuley |
Abstract | Dance Dance Revolution (DDR) is a popular rhythm-based video game. Players perform steps on a dance platform in synchronization with music as directed by on-screen step charts. While many step charts are available in standardized packs, players may grow tired of existing charts, or wish to dance to a song for which no chart exists. We introduce the task of learning to choreograph. Given a raw audio track, the goal is to produce a new step chart. This task decomposes naturally into two subtasks: deciding when to place steps and deciding which steps to select. For the step placement task, we combine recurrent and convolutional neural networks to ingest spectrograms of low-level audio features to predict steps, conditioned on chart difficulty. For step selection, we present a conditional LSTM generative model that substantially outperforms n-gram and fixed-window approaches. |
Tasks | |
Published | 2017-03-20 |
URL | http://arxiv.org/abs/1703.06891v3 |
http://arxiv.org/pdf/1703.06891v3.pdf | |
PWC | https://paperswithcode.com/paper/dance-dance-convolution |
Repo | https://github.com/chrisdonahue/ddc |
Framework | tf |
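
A PyTorch skeleton of the step-selection model described in the abstract: an LSTM that predicts the next step token from the previous tokens, conditioned on per-step timing features. The vocabulary size (e.g. 4 arrows with 4 states each, 4^4 = 256 combinations) and the 2-dimensional timing features are illustrative assumptions.

```python
import torch
import torch.nn as nn

class StepSelectionLSTM(nn.Module):
    """Next-step prediction conditioned on per-step timing features."""
    def __init__(self, vocab_size, cond_dim, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, 64)
        self.lstm = nn.LSTM(64 + cond_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, prev_steps, cond):
        # prev_steps: (batch, seq) step-token ids; cond: (batch, seq, cond_dim) timing features
        x = torch.cat([self.embed(prev_steps), cond], dim=-1)
        h, _ = self.lstm(x)
        return self.out(h)                           # logits over the next step token

model = StepSelectionLSTM(vocab_size=256, cond_dim=2)
steps = torch.randint(0, 256, (8, 32))
timing = torch.rand(8, 32, 2)                        # e.g. time since last step, beat phase
logits = model(steps, timing)                        # (8, 32, 256)
```
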
All-but-the-Top: Simple and Effective Postprocessing for Word Representations
Title | All-but-the-Top: Simple and Effective Postprocessing for Word Representations |
Authors | Jiaqi Mu, Suma Bhat, Pramod Viswanath |
Abstract | Real-valued word representations have transformed NLP applications; popular examples are word2vec and GloVe, recognized for their ability to capture linguistic regularities. In this paper, we demonstrate a very simple, and yet counter-intuitive, postprocessing technique (eliminating the common mean vector and a few top dominating directions from the word vectors) that renders off-the-shelf representations even stronger. The postprocessing is empirically validated on a variety of lexical-level intrinsic tasks (word similarity, concept categorization, word analogy) and sentence-level tasks (semantic textual similarity and text classification) on multiple datasets and with a variety of representation methods and hyperparameter choices in multiple languages; in each case, the processed representations are consistently better than the original ones. |
Tasks | Sentiment Analysis, Subjectivity Analysis, Text Classification |
Published | 2017-02-05 |
URL | http://arxiv.org/abs/1702.01417v2 |
http://arxiv.org/pdf/1702.01417v2.pdf | |
PWC | https://paperswithcode.com/paper/all-but-the-top-simple-and-effective |
Repo | https://github.com/lgalke/vec4ir |
Framework | tf |
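
The postprocessing itself fits in a few lines: subtract the common mean vector, then project out the top D principal directions (the paper suggests D on the order of dim/100). A NumPy sketch; the random matrix stands in for real GloVe/word2vec vectors.

```python
import numpy as np

def all_but_the_top(embeddings, n_components):
    """embeddings: (vocab_size, dim) matrix of word vectors."""
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    # top principal directions of the centered vectors
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    top_dirs = Vt[:n_components]                         # (n_components, dim)
    # remove the projections onto those dominating directions
    return centered - centered @ top_dirs.T @ top_dirs

vectors = np.random.randn(10000, 300).astype(np.float32)    # stand-in for real embeddings
processed = all_but_the_top(vectors, n_components=3)        # D ~ dim/100
```
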
Safe Model-based Reinforcement Learning with Stability Guarantees
Title | Safe Model-based Reinforcement Learning with Stability Guarantees |
Authors | Felix Berkenkamp, Matteo Turchetta, Angela P. Schoellig, Andreas Krause |
Abstract | Reinforcement learning is a powerful paradigm for learning optimal policies from experimental data. However, to find optimal policies, most reinforcement learning algorithms explore all possible actions, which may be harmful for real-world systems. As a consequence, learning algorithms are rarely applied on safety-critical systems in the real world. In this paper, we present a learning algorithm that explicitly considers safety, defined in terms of stability guarantees. Specifically, we extend control-theoretic results on Lyapunov stability verification and show how to use statistical models of the dynamics to obtain high-performance control policies with provable stability certificates. Moreover, under additional regularity assumptions in terms of a Gaussian process prior, we prove that one can effectively and safely collect data in order to learn about the dynamics and thus both improve control performance and expand the safe region of the state space. In our experiments, we show how the resulting algorithm can safely optimize a neural network policy on a simulated inverted pendulum, without the pendulum ever falling down. |
Tasks | |
Published | 2017-05-23 |
URL | http://arxiv.org/abs/1705.08551v3 |
http://arxiv.org/pdf/1705.08551v3.pdf | |
PWC | https://paperswithcode.com/paper/safe-model-based-reinforcement-learning-with |
Repo | https://github.com/befelix/safe_learning |
Framework | tf |
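
The stability side of the abstract can be illustrated with a toy check: given a (learned or nominal) discrete-time model f and a Lyapunov candidate V, the largest sublevel set of V on which V decreases under f is an estimate of the safe region of attraction. The 1-D system, the candidate V, and the grid below are my own illustrative choices, not the authors' GP-based construction.

```python
import numpy as np

# Toy 1-D closed-loop model: stable near the origin, unstable far away.
def f(x):
    return 0.8 * x + 0.1 * x ** 3

def V(x):                                     # Lyapunov candidate
    return x ** 2

xs = np.linspace(-3.0, 3.0, 2001)             # discretized state space
decrease = V(f(xs)) < V(xs)                   # where the candidate strictly decreases
violating = ~decrease & (np.abs(xs) > 1e-9)   # the equilibrium itself is excluded

# Largest level c such that V decreases everywhere on {x : V(x) < c}
c_max = V(xs[violating]).min() if violating.any() else V(xs).max()
print("estimated safe region: V(x) <", c_max)   # roughly |x| < sqrt(2) for this model
```
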
Pixel Deconvolutional Networks
Title | Pixel Deconvolutional Networks |
Authors | Hongyang Gao, Hao Yuan, Zhengyang Wang, Shuiwang Ji |
Abstract | Deconvolutional layers have been widely used in a variety of deep models for up-sampling, including encoder-decoder networks for semantic segmentation and deep generative models for unsupervised learning. One of the key limitations of deconvolutional operations is that they result in the so-called checkerboard problem. This is caused by the fact that no direct relationship exists among adjacent pixels on the output feature map. To address this problem, we propose the pixel deconvolutional layer (PixelDCL) to establish direct relationships among adjacent pixels on the up-sampled feature map. Our method is based on a fresh interpretation of the regular deconvolution operation. The resulting PixelDCL can be used to replace any deconvolutional layer in a plug-and-play manner without compromising the fully trainable capabilities of original models. The proposed PixelDCL may result in a slight decrease in efficiency, but this can be overcome by an implementation trick. Experimental results on semantic segmentation demonstrate that PixelDCL can consider spatial features such as edges and shapes and yields more accurate segmentation outputs than deconvolutional layers. When used in image generation tasks, our PixelDCL can largely overcome the checkerboard problem suffered by regular deconvolution operations. |
Tasks | Image Generation, Semantic Segmentation |
Published | 2017-05-18 |
URL | http://arxiv.org/abs/1705.06820v4 |
http://arxiv.org/pdf/1705.06820v4.pdf | |
PWC | https://paperswithcode.com/paper/pixel-deconvolutional-networks |
Repo | https://github.com/fourmi1995/IronsegExperiment-PixelDCL |
Framework | tf |
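
A simplified PyTorch sketch of the idea in the abstract for 2x up-sampling: the four sub-pixel groups of the output are generated sequentially, each conditioned on the groups produced before it, and then interleaved into the up-sampled map. Kernel sizes and the exact conditioning pattern are simplifications of the paper's design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PixelDCLSketch(nn.Module):
    """2x up-sampling where later sub-pixel groups depend on earlier ones."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.g1 = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.g2 = nn.Conv2d(in_ch + out_ch, out_ch, 3, padding=1)
        self.g3 = nn.Conv2d(in_ch + 2 * out_ch, out_ch, 3, padding=1)
        self.g4 = nn.Conv2d(in_ch + 3 * out_ch, out_ch, 3, padding=1)

    def forward(self, x):
        a = self.g1(x)                               # first sub-pixel group
        b = self.g2(torch.cat([x, a], dim=1))        # conditioned on a
        c = self.g3(torch.cat([x, a, b], dim=1))     # conditioned on a, b
        d = self.g4(torch.cat([x, a, b, c], dim=1))  # conditioned on a, b, c
        # interleave the four groups into one 2x-resolution map
        groups = torch.stack([a, b, c, d], dim=2)    # (N, out_ch, 4, H, W)
        n, ch, _, h, w = groups.shape
        return F.pixel_shuffle(groups.reshape(n, ch * 4, h, w), upscale_factor=2)

up = PixelDCLSketch(in_ch=16, out_ch=8)
print(up(torch.randn(1, 16, 20, 20)).shape)          # torch.Size([1, 8, 40, 40])
```
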
Stabilizing Training of Generative Adversarial Networks through Regularization
Title | Stabilizing Training of Generative Adversarial Networks through Regularization |
Authors | Kevin Roth, Aurelien Lucchi, Sebastian Nowozin, Thomas Hofmann |
Abstract | Deep generative models based on Generative Adversarial Networks (GANs) have demonstrated impressive sample quality but in order to work they require a careful choice of architecture, parameter initialization, and selection of hyper-parameters. This fragility is in part due to a dimensional mismatch or non-overlapping support between the model distribution and the data distribution, causing their density ratio and the associated f-divergence to be undefined. We overcome this fundamental limitation and propose a new regularization approach with low computational cost that yields a stable GAN training procedure. We demonstrate the effectiveness of this regularizer across several architectures trained on common benchmark image generation tasks. Our regularization turns GAN models into reliable building blocks for deep learning. |
Tasks | Image Generation |
Published | 2017-05-25 |
URL | http://arxiv.org/abs/1705.09367v2 |
http://arxiv.org/pdf/1705.09367v2.pdf | |
PWC | https://paperswithcode.com/paper/stabilizing-training-of-generative |
Repo | https://github.com/rothk/Stabilizing_GANs |
Framework | tf |
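
The regularizer described in the abstract penalizes the squared gradient norm of the discriminator, weighted so that the penalty is strongest where the discriminator is uncertain. The PyTorch sketch below follows my reading of the paper's JS-regularizer, with weights (1 - σ(D))² on real data and σ(D)² on generated data; treat the exact weighting as an assumption and check it against the paper before use.

```python
import torch

def js_gradient_penalty(discriminator, x_real, x_fake):
    """Gradient-norm regularizer for a discriminator that outputs logits."""
    def weighted_grad_norm(x, weight_fn):
        x = x.detach().requires_grad_(True)
        logits = discriminator(x)
        grads, = torch.autograd.grad(logits.sum(), x, create_graph=True)
        grad_norm2 = grads.flatten(1).pow(2).sum(dim=1)      # per-example ||grad||^2
        return (weight_fn(torch.sigmoid(logits.squeeze(-1))) * grad_norm2).mean()

    real_term = weighted_grad_norm(x_real, lambda d: (1.0 - d) ** 2)
    fake_term = weighted_grad_norm(x_fake, lambda d: d ** 2)
    return real_term + fake_term

# usage inside the discriminator step (gamma is the regularization weight;
# D, bce_loss, x_real, x_fake are placeholders for your own model and data):
# d_loss = bce_loss + gamma * js_gradient_penalty(D, x_real, x_fake.detach())
```
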
Federated Multi-Task Learning
Title | Federated Multi-Task Learning |
Authors | Virginia Smith, Chao-Kai Chiang, Maziar Sanjabi, Ameet Talwalkar |
Abstract | Federated learning poses new statistical and systems challenges in training machine learning models over distributed networks of devices. In this work, we show that multi-task learning is naturally suited to handle the statistical challenges of this setting, and propose a novel systems-aware optimization method, MOCHA, that is robust to practical systems issues. Our method and theory for the first time consider issues of high communication cost, stragglers, and fault tolerance for distributed multi-task learning. The resulting method achieves significant speedups compared to alternatives in the federated setting, as we demonstrate through simulations on real-world federated datasets. |
Tasks | Multi-Task Learning |
Published | 2017-05-30 |
URL | http://arxiv.org/abs/1705.10467v2 |
http://arxiv.org/pdf/1705.10467v2.pdf | |
PWC | https://paperswithcode.com/paper/federated-multi-task-learning |
Repo | https://github.com/gingsmith/fmtl |
Framework | none |
Large-Scale Optimal Transport and Mapping Estimation
Title | Large-Scale Optimal Transport and Mapping Estimation |
Authors | Vivien Seguy, Bharath Bhushan Damodaran, Rémi Flamary, Nicolas Courty, Antoine Rolet, Mathieu Blondel |
Abstract | This paper presents a novel two-step approach for the fundamental problem of learning an optimal map from one distribution to another. First, we learn an optimal transport (OT) plan, which can be thought of as a one-to-many map between the two distributions. To that end, we propose a stochastic dual approach of regularized OT, and show empirically that it scales better than a recent related approach when the number of samples is very large. Second, we estimate a Monge map as a deep neural network learned by approximating the barycentric projection of the previously-obtained OT plan. This parameterization allows generalization of the mapping outside the support of the input measure. We prove two theoretical stability results of regularized OT which show that our estimations converge to the OT plan and Monge map between the underlying continuous measures. We showcase our proposed approach on two applications: domain adaptation and generative modeling. |
Tasks | Domain Adaptation |
Published | 2017-11-07 |
URL | http://arxiv.org/abs/1711.02283v2 |
http://arxiv.org/pdf/1711.02283v2.pdf | |
PWC | https://paperswithcode.com/paper/large-scale-optimal-transport-and-mapping |
Repo | https://github.com/mikigom/large-scale-OT-mapping-TF |
Framework | tf |
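
The two-step recipe in the abstract can be illustrated on small samples. The NumPy sketch below substitutes the classic Sinkhorn algorithm for the paper's stochastic dual solver (which is what makes their approach scale), computes an entropic OT plan between two point clouds, and then forms the barycentric projection that step two would fit with a deep network. Sample sizes and the regularization strength are illustrative.

```python
import numpy as np

def sinkhorn_plan(X, Y, eps=0.05, n_iters=500):
    """Entropic-regularized OT plan between uniform measures on the rows of X and Y."""
    C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)   # squared-distance cost
    C = C / C.max()                                      # scale costs to [0, 1] for stability
    K = np.exp(-C / eps)
    a, b = np.full(len(X), 1.0 / len(X)), np.full(len(Y), 1.0 / len(Y))
    u, v = np.ones(len(X)), np.ones(len(Y))
    for _ in range(n_iters):                             # Sinkhorn iterations
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]                   # OT plan, shape (n, m)

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(200, 2))                  # source samples
Y = rng.normal(3.0, 1.0, size=(200, 2))                  # target samples
P = sinkhorn_plan(X, Y)

# Barycentric projection: the one-to-many plan is collapsed to a point-wise map X -> Y,
# which the paper's second step fits with a neural network to generalize off-sample.
T_X = (P @ Y) / P.sum(axis=1, keepdims=True)
print(X.mean(axis=0), T_X.mean(axis=0))                  # mapped points sit near the target
```
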