July 30, 2019

2749 words 13 mins read

Paper Group AWR 69


Improving speech recognition by revising gated recurrent units. The Numerics of GANs. AP17-OLR Challenge: Data, Plan, and Baseline. Word Translation Without Parallel Data. The Implicit Bias of Gradient Descent on Separable Data. Learning Deep Latent Spaces for Multi-Label Classification. End-to-End Learning of Geometry and Context for Deep Stereo R …

Improving speech recognition by revising gated recurrent units

Title Improving speech recognition by revising gated recurrent units
Authors Mirco Ravanelli, Philemon Brakel, Maurizio Omologo, Yoshua Bengio
Abstract Speech recognition is largely taking advantage of deep learning, showing that substantial benefits can be obtained by modern Recurrent Neural Networks (RNNs). The most popular RNNs are Long Short-Term Memory (LSTMs), which typically reach state-of-the-art performance in many tasks thanks to their ability to learn long-term dependencies and robustness to vanishing gradients. Nevertheless, LSTMs have a rather complex design with three multiplicative gates, which might impair their efficient implementation. An attempt to simplify LSTMs has recently led to Gated Recurrent Units (GRUs), which are based on just two multiplicative gates. This paper builds on these efforts by further revising GRUs and proposing a simplified architecture potentially more suitable for speech recognition. The contribution of this work is two-fold. First, we propose removing the reset gate from the GRU design, resulting in a more efficient single-gate architecture. Second, we propose replacing tanh with ReLU activations in the state update equations. Results show that, in our implementation, the revised architecture reduces the per-epoch training time by more than 30% and consistently improves recognition performance across different tasks, input features, and noisy conditions when compared to a standard GRU.
Tasks Speech Recognition
Published 2017-09-29
URL http://arxiv.org/abs/1710.00641v1
PDF http://arxiv.org/pdf/1710.00641v1.pdf
PWC https://paperswithcode.com/paper/improving-speech-recognition-by-revising
Repo https://github.com/mravanelli/theano-kaldi-rnn
Framework pytorch
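A minimal PyTorch sketch of the revised cell the abstract describes, with the reset gate removed and ReLU in place of tanh. Layer names are ours, and the batch normalization used in the paper's implementation is omitted for brevity:

```python
import torch
import torch.nn as nn

class RevisedGRUCell(nn.Module):
    """Single-gate GRU variant: no reset gate, ReLU candidate state."""
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.wz = nn.Linear(input_size, hidden_size)              # update gate
        self.uz = nn.Linear(hidden_size, hidden_size, bias=False)
        self.wh = nn.Linear(input_size, hidden_size)              # candidate state
        self.uh = nn.Linear(hidden_size, hidden_size, bias=False)

    def forward(self, x, h):
        z = torch.sigmoid(self.wz(x) + self.uz(h))    # the single multiplicative gate
        h_cand = torch.relu(self.wh(x) + self.uh(h))  # ReLU replaces tanh
        return z * h + (1.0 - z) * h_cand

# usage: step over a (time, batch, features) sequence
cell = RevisedGRUCell(40, 128)
h = torch.zeros(8, 128)
for x_t in torch.randn(50, 8, 40):
    h = cell(x_t, h)
```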

The Numerics of GANs

Title The Numerics of GANs
Authors Lars Mescheder, Sebastian Nowozin, Andreas Geiger
Abstract In this paper, we analyze the numerics of common algorithms for training Generative Adversarial Networks (GANs). Using the formalism of smooth two-player games, we analyze the associated gradient vector field of GAN training objectives. Our findings suggest that the convergence of current algorithms suffers due to two factors: i) the presence of eigenvalues of the Jacobian of the gradient vector field with zero real part, and ii) eigenvalues with a large imaginary part. Using these findings, we design a new algorithm that overcomes some of these limitations and has better convergence properties. Experimentally, we demonstrate its superiority on training common GAN architectures and show convergence on GAN architectures that are known to be notoriously hard to train.
Tasks
Published 2017-05-30
URL http://arxiv.org/abs/1705.10461v3
PDF http://arxiv.org/pdf/1705.10461v3.pdf
PWC https://paperswithcode.com/paper/the-numerics-of-gans
Repo https://github.com/nhynes/abc
Framework pytorch
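The algorithm the paper proposes is consensus optimization: follow the game's gradient field v, but add a correction that descends L(w) = ½‖v(w)‖², which damps the rotation caused by the imaginary eigenvalues. A NumPy toy on the bilinear game f(x, y) = xy (our example, not the paper's experiments):

```python
import numpy as np

# Zero-sum toy game f(x, y) = x*y: x descends, y ascends. The simultaneous-
# gradient field is v(x, y) = (-y, x); its Jacobian has purely imaginary
# eigenvalues, so plain simultaneous steps spiral outward.

def v(w):
    x, y = w
    return np.array([-y, x])

def grad_L(w):
    # gradient of L(w) = 0.5 * ||v(w)||^2 = 0.5 * (x^2 + y^2)
    return w.copy()

h, gamma = 0.1, 1.0
w_sim = np.array([1.0, 1.0])   # plain simultaneous gradient steps
w_con = np.array([1.0, 1.0])   # consensus update: w += h * (v - gamma * grad L)
for _ in range(200):
    w_sim = w_sim + h * v(w_sim)
    w_con = w_con + h * (v(w_con) - gamma * grad_L(w_con))

print(np.linalg.norm(w_sim))   # grows: the plain updates diverge
print(np.linalg.norm(w_con))   # shrinks toward the equilibrium (0, 0)
```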

AP17-OLR Challenge: Data, Plan, and Baseline

Title AP17-OLR Challenge: Data, Plan, and Baseline
Authors Zhiyuan Tang, Dong Wang, Yixiang Chen, Qing Chen
Abstract We present the data profile and the evaluation plan of the second oriental language recognition (OLR) challenge AP17-OLR. Compared to the event last year (AP16-OLR), the new challenge involves more languages and focuses more on short utterances. The data is offered by SpeechOcean and the NSFC M2ASR project. Two types of baselines are constructed to assist the participants, one is based on the i-vector model and the other is based on various neural networks. We report the baseline results evaluated with various metrics defined by the AP17-OLR evaluation plan and demonstrate that the combined database is a reasonable data resource for multilingual research. All the data is free for participants, and the Kaldi recipes for the baselines have been published online.
Tasks
Published 2017-06-28
URL http://arxiv.org/abs/1706.09742v1
PDF http://arxiv.org/pdf/1706.09742v1.pdf
PWC https://paperswithcode.com/paper/ap17-olr-challenge-data-plan-and-baseline
Repo https://github.com/Rithmax/Sub-band-Envelope-Features-Using-Frequency-Domain-Linear-Prediction
Framework none

Word Translation Without Parallel Data

Title Word Translation Without Parallel Data
Authors Alexis Conneau, Guillaume Lample, Marc’Aurelio Ranzato, Ludovic Denoyer, Hervé Jégou
Abstract State-of-the-art methods for learning cross-lingual word embeddings have relied on bilingual dictionaries or parallel corpora. Recent studies showed that the need for parallel data supervision can be alleviated with character-level information. While these methods showed encouraging results, they are not on par with their supervised counterparts and are limited to pairs of languages sharing a common alphabet. In this work, we show that we can build a bilingual dictionary between two languages without using any parallel corpora, by aligning monolingual word embedding spaces in an unsupervised way. Without using any character information, our model even outperforms existing supervised methods on cross-lingual tasks for some language pairs. Our experiments demonstrate that our method also works very well for distant language pairs, like English-Russian or English-Chinese. We finally describe experiments on the English-Esperanto low-resource language pair, for which only a limited amount of parallel data exists, to show the potential impact of our method in fully unsupervised machine translation. Our code, embeddings and dictionaries are publicly available.
Tasks Machine Translation, Unsupervised Machine Translation, Word Alignment, Word Embeddings
Published 2017-10-11
URL http://arxiv.org/abs/1710.04087v3
PDF http://arxiv.org/pdf/1710.04087v3.pdf
PWC https://paperswithcode.com/paper/word-translation-without-parallel-data
Repo https://github.com/facebookresearch/MUSE
Framework pytorch
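After the adversarial initialization, the paper refines the alignment by solving an orthogonal Procrustes problem on a synthetic seed dictionary (and retrieves translations with CSLS). A NumPy sketch of the closed-form Procrustes step; the paired rows below are random placeholders for actual seed-dictionary embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 300))   # source embeddings of seed pairs
Y = rng.normal(size=(1000, 300))   # corresponding target embeddings

# argmin over orthogonal W of ||W X^T - Y^T||_F has the closed form
# W = U V^T, where U S V^T is the SVD of Y^T X
U, _, Vt = np.linalg.svd(Y.T @ X)
W = U @ Vt

mapped = X @ W.T                   # source vectors mapped into the target space
```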

The Implicit Bias of Gradient Descent on Separable Data

Title The Implicit Bias of Gradient Descent on Separable Data
Authors Daniel Soudry, Elad Hoffer, Mor Shpigel Nacson, Suriya Gunasekar, Nathan Srebro
Abstract We examine gradient descent on unregularized logistic regression problems, with homogeneous linear predictors on linearly separable datasets. We show the predictor converges to the direction of the max-margin (hard margin SVM) solution. The result also generalizes to other monotone decreasing loss functions with an infimum at infinity, to multi-class problems, and to training a weight layer in a deep network in a certain restricted setting. Furthermore, we show this convergence is very slow, and only logarithmic in the convergence of the loss itself. This can help explain the benefit of continuing to optimize the logistic or cross-entropy loss even after the training error is zero and the training loss is extremely small, and, as we show, even if the validation loss increases. Our methodology can also aid in understanding implicit regularization in more complex models and with other optimization methods.
Tasks
Published 2017-10-27
URL http://arxiv.org/abs/1710.10345v4
PDF http://arxiv.org/pdf/1710.10345v4.pdf
PWC https://paperswithcode.com/paper/the-implicit-bias-of-gradient-descent-on
Repo https://github.com/paper-submissions/MaxMargin
Framework pytorch
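The headline claim is easy to see numerically: run plain gradient descent on the logistic loss over separable data and watch the direction w/‖w‖ stabilize while the loss keeps shrinking. A self-contained sketch (the data and step size are ours; for this symmetric two-blob data the max-margin direction is approximately (1, 1)/√2):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal([2, 2], 0.3, (50, 2)),     # positive class
               rng.normal([-2, -2], 0.3, (50, 2))])  # negative class
y = np.hstack([np.ones(50), -np.ones(50)])

w = np.zeros(2)
for t in range(1, 200001):
    margins = y * (X @ w)
    grad = -(y / (1.0 + np.exp(margins))) @ X / len(y)  # logistic-loss gradient
    w -= 0.1 * grad
    if t in (10, 1000, 100000, 200000):
        # the normalized direction converges (slowly) to ~(0.707, 0.707)
        print(t, w / np.linalg.norm(w))
```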

Learning Deep Latent Spaces for Multi-Label Classification

Title Learning Deep Latent Spaces for Multi-Label Classification
Authors Chih-Kuan Yeh, Wei-Chieh Wu, Wei-Jen Ko, Yu-Chiang Frank Wang
Abstract Multi-label classification is a practical yet challenging task in machine learning related fields, since it requires the prediction of more than one label category for each input instance. We propose a novel deep neural network (DNN) based model, Canonical Correlated AutoEncoder (C2AE), for solving this task. Aiming at better relating feature and label domain data for improved classification, we uniquely perform joint feature and label embedding by deriving a deep latent space, followed by the introduction of a label-correlation sensitive loss function for recovering the predicted label outputs. Our C2AE is achieved by integrating the DNN architectures of canonical correlation analysis and autoencoder, which allows end-to-end learning and prediction with the ability to exploit label dependency. Moreover, our C2AE can be easily extended to address the learning problem with missing labels. Our experiments on multiple datasets with different scales confirm the effectiveness and robustness of our proposed method, which is shown to perform favorably against state-of-the-art methods for multi-label classification.
Tasks Multi-Label Classification
Published 2017-07-03
URL http://arxiv.org/abs/1707.00418v1
PDF http://arxiv.org/pdf/1707.00418v1.pdf
PWC https://paperswithcode.com/paper/learning-deep-latent-spaces-for-multi-label
Repo https://github.com/yankeesrules/C2AE
Framework none
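A hedged PyTorch sketch of the C2AE structure the abstract outlines: a feature mapping Fx and a label encoder Fe meeting in a shared latent space, plus a decoder Fd that recovers the label vector. The paper's CCA-style coupling and label-correlation sensitive loss are replaced by plain MSE/BCE stand-ins here:

```python
import torch
import torch.nn as nn

class C2AESketch(nn.Module):
    def __init__(self, feat_dim, label_dim, latent_dim=64):
        super().__init__()
        self.fx = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU(),
                                nn.Linear(256, latent_dim))   # feature mapping
        self.fe = nn.Sequential(nn.Linear(label_dim, 256), nn.ReLU(),
                                nn.Linear(256, latent_dim))   # label encoder
        self.fd = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                nn.Linear(256, label_dim))    # label decoder

    def forward(self, x, y=None):
        zx = self.fx(x)
        if y is None:                      # test time: labels predicted from features
            return torch.sigmoid(self.fd(zx))
        ze = self.fe(y)
        align = ((zx - ze) ** 2).mean()    # latent alignment (CCA surrogate)
        recon = nn.functional.binary_cross_entropy_with_logits(self.fd(ze), y)
        return align + recon               # joint training loss
```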

End-to-End Learning of Geometry and Context for Deep Stereo Regression

Title End-to-End Learning of Geometry and Context for Deep Stereo Regression
Authors Alex Kendall, Hayk Martirosyan, Saumitro Dasgupta, Peter Henry, Ryan Kennedy, Abraham Bachrach, Adam Bry
Abstract We propose a novel deep learning architecture for regressing disparity from a rectified pair of stereo images. We leverage knowledge of the problem’s geometry to form a cost volume using deep feature representations. We learn to incorporate contextual information using 3-D convolutions over this volume. Disparity values are regressed from the cost volume using a proposed differentiable soft argmin operation, which allows us to train our method end-to-end to sub-pixel accuracy without any additional post-processing or regularization. We evaluate our method on the Scene Flow and KITTI datasets and on KITTI we set a new state-of-the-art benchmark, while being significantly faster than competing approaches.
Tasks
Published 2017-03-13
URL http://arxiv.org/abs/1703.04309v1
PDF http://arxiv.org/pdf/1703.04309v1.pdf
PWC https://paperswithcode.com/paper/end-to-end-learning-of-geometry-and-context
Repo https://github.com/JiaRenChang/PSMNet
Framework pytorch
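The most reusable piece is the differentiable soft argmin: a softmax over negated matching costs gives per-disparity probabilities, and the expected disparity is returned, giving sub-pixel estimates end-to-end. A PyTorch sketch:

```python
import torch

def soft_argmin(cost_volume, dim=1):
    """cost_volume: (batch, max_disp, height, width) matching costs."""
    probs = torch.softmax(-cost_volume, dim=dim)       # low cost -> high weight
    disp = torch.arange(cost_volume.size(dim),
                        dtype=cost_volume.dtype,
                        device=cost_volume.device)
    shape = [1] * cost_volume.dim()
    shape[dim] = -1
    return (probs * disp.view(shape)).sum(dim=dim)     # expected disparity

out = soft_argmin(torch.randn(2, 64, 32, 32))          # -> (2, 32, 32)
```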

Contrast Enhancement of Brightness-Distorted Images by Improved Adaptive Gamma Correction

Title Contrast Enhancement of Brightness-Distorted Images by Improved Adaptive Gamma Correction
Authors Gang Cao, Lihui Huang, Huawei Tian, Xianglin Huang, Yongbin Wang, Ruicong Zhi
Abstract As an efficient image contrast enhancement (CE) tool, adaptive gamma correction (AGC) was previously proposed by relating the gamma parameter with the cumulative distribution function (CDF) of the pixel gray levels within an image. AGC deals well with most dimmed images, but fails for globally bright images and for dimmed images with local bright regions. These two categories of brightness-distorted images are common in real scenarios, e.g., improper exposure and white object regions. To attenuate such deficiencies, here we propose an improved AGC algorithm. The novel strategy of negative images is used to realize CE of the bright images, and the gamma correction modulated by truncated CDF is employed to enhance the dimmed ones. As such, local over-enhancement and structure distortion can be alleviated. Both qualitative and quantitative experimental results show that our proposed method yields consistently good CE results.
Tasks
Published 2017-09-13
URL http://arxiv.org/abs/1709.04427v1
PDF http://arxiv.org/pdf/1709.04427v1.pdf
PWC https://paperswithcode.com/paper/contrast-enhancement-of-brightness-distorted
Repo https://github.com/leowang7/iagcwd
Framework none
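A NumPy sketch of the two ingredients the abstract names: baseline AGC, with the gamma for each gray level modulated by the intensity CDF, and the negative-image strategy for globally bright inputs. The brightness threshold is illustrative, and the truncated-CDF modulation for dimmed images is omitted:

```python
import numpy as np

def agc(img):
    """Adaptive gamma correction: gamma = 1 - CDF of the gray-level histogram."""
    hist, _ = np.histogram(img, bins=256, range=(0, 255))
    cdf = np.cumsum(hist) / hist.sum()
    levels = np.arange(256) / 255.0
    lut = np.clip(255.0 * levels ** (1.0 - cdf), 0, 255).astype(np.uint8)
    return lut[img]                        # img must be uint8

def improved_agc(img):
    if img.mean() / 255.0 > 0.6:           # globally bright (illustrative threshold)
        return 255 - agc(255 - img)        # enhance the negative image
    return agc(img)

out = improved_agc((np.random.rand(64, 64) * 255).astype(np.uint8))
```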

Dance Dance Convolution

Title Dance Dance Convolution
Authors Chris Donahue, Zachary C. Lipton, Julian McAuley
Abstract Dance Dance Revolution (DDR) is a popular rhythm-based video game. Players perform steps on a dance platform in synchronization with music as directed by on-screen step charts. While many step charts are available in standardized packs, players may grow tired of existing charts, or wish to dance to a song for which no chart exists. We introduce the task of learning to choreograph. Given a raw audio track, the goal is to produce a new step chart. This task decomposes naturally into two subtasks: deciding when to place steps and deciding which steps to select. For the step placement task, we combine recurrent and convolutional neural networks to ingest spectrograms of low-level audio features to predict steps, conditioned on chart difficulty. For step selection, we present a conditional LSTM generative model that substantially outperforms n-gram and fixed-window approaches.
Tasks
Published 2017-03-20
URL http://arxiv.org/abs/1703.06891v3
PDF http://arxiv.org/pdf/1703.06891v3.pdf
PWC https://paperswithcode.com/paper/dance-dance-convolution
Repo https://github.com/chrisdonahue/ddc
Framework tf
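A hedged PyTorch sketch of the step-placement model shape described above: a convolutional front end over spectrogram frames, a recurrent layer over time, and per-frame step/no-step logits conditioned on chart difficulty. Layer sizes are illustrative, not the paper's:

```python
import torch
import torch.nn as nn

class StepPlacement(nn.Module):
    def __init__(self, n_bands=80, n_difficulties=5, hidden=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_bands, 64, kernel_size=7, padding=3), nn.ReLU(),
            nn.Conv1d(64, 64, kernel_size=3, padding=1), nn.ReLU())
        self.diff_emb = nn.Embedding(n_difficulties, 16)   # difficulty conditioning
        self.rnn = nn.LSTM(64 + 16, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)

    def forward(self, spec, difficulty):
        # spec: (batch, time, n_bands); difficulty: (batch,) integer labels
        h = self.conv(spec.transpose(1, 2)).transpose(1, 2)
        d = self.diff_emb(difficulty).unsqueeze(1).expand(-1, h.size(1), -1)
        h, _ = self.rnn(torch.cat([h, d], dim=-1))
        return self.out(h).squeeze(-1)                     # per-frame step logits

logits = StepPlacement()(torch.randn(2, 100, 80), torch.tensor([0, 3]))
```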

All-but-the-Top: Simple and Effective Postprocessing for Word Representations

Title All-but-the-Top: Simple and Effective Postprocessing for Word Representations
Authors Jiaqi Mu, Suma Bhat, Pramod Viswanath
Abstract Real-valued word representations have transformed NLP applications; popular examples are word2vec and GloVe, recognized for their ability to capture linguistic regularities. In this paper, we demonstrate a very simple, and yet counter-intuitive, postprocessing technique – eliminate the common mean vector and a few top dominating directions from the word vectors – that renders off-the-shelf representations even stronger. The postprocessing is empirically validated on a variety of lexical-level intrinsic tasks (word similarity, concept categorization, word analogy) and sentence-level tasks (semantic textual similarity and text classification) on multiple datasets and with a variety of representation methods and hyperparameter choices in multiple languages; in each case, the processed representations are consistently better than the original ones.
Tasks Sentiment Analysis, Subjectivity Analysis, Text Classification
Published 2017-02-05
URL http://arxiv.org/abs/1702.01417v2
PDF http://arxiv.org/pdf/1702.01417v2.pdf
PWC https://paperswithcode.com/paper/all-but-the-top-simple-and-effective
Repo https://github.com/lgalke/vec4ir
Framework tf
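The whole method fits in a few lines: subtract the common mean vector, then project out the top D principal directions (the paper suggests D around dim/100). A NumPy sketch:

```python
import numpy as np

def all_but_the_top(vecs, n_components=3):
    centered = vecs - vecs.mean(axis=0)                # remove the common mean
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    top = Vt[:n_components]                            # top dominating directions
    return centered - centered @ top.T @ top           # project them out

processed = all_but_the_top(np.random.randn(10000, 300))  # D = 3 ~ 300/100
```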

Safe Model-based Reinforcement Learning with Stability Guarantees

Title Safe Model-based Reinforcement Learning with Stability Guarantees
Authors Felix Berkenkamp, Matteo Turchetta, Angela P. Schoellig, Andreas Krause
Abstract Reinforcement learning is a powerful paradigm for learning optimal policies from experimental data. However, to find optimal policies, most reinforcement learning algorithms explore all possible actions, which may be harmful for real-world systems. As a consequence, learning algorithms are rarely applied on safety-critical systems in the real world. In this paper, we present a learning algorithm that explicitly considers safety, defined in terms of stability guarantees. Specifically, we extend control-theoretic results on Lyapunov stability verification and show how to use statistical models of the dynamics to obtain high-performance control policies with provable stability certificates. Moreover, under additional regularity assumptions in terms of a Gaussian process prior, we prove that one can effectively and safely collect data in order to learn about the dynamics and thus both improve control performance and expand the safe region of the state space. In our experiments, we show how the resulting algorithm can safely optimize a neural network policy on a simulated inverted pendulum, without the pendulum ever falling down.
Tasks
Published 2017-05-23
URL http://arxiv.org/abs/1705.08551v3
PDF http://arxiv.org/pdf/1705.08551v3.pdf
PWC https://paperswithcode.com/paper/safe-model-based-reinforcement-learning-with
Repo https://github.com/befelix/safe_learning
Framework tf
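A toy sketch of the Lyapunov decrease check at the core of the paper's stability certificate, for a known scalar closed-loop system. The actual algorithm replaces the dynamics with Gaussian-process confidence bounds and grows the safe set as data arrives; none of that is reproduced here, and the system, Lyapunov candidate, and margin are ours:

```python
import numpy as np

f = lambda x: 0.9 * x + 0.05 * x**3   # toy closed-loop dynamics x+ = f(x)
v = lambda x: x**2                    # Lyapunov candidate

grid = np.linspace(-1.5, 1.5, 601)
margin = 0.005                        # discretization margin (Lipschitz const * step)

# decrease condition v(f(x)) - v(x) < -margin, waived on a small ball
# around the equilibrium where the absolute margin cannot be met
ok = (v(f(grid)) - v(grid) < -margin) | (np.abs(grid) < 0.2)

# certified region of attraction: the largest level set {v(x) <= c} on
# which the decrease condition holds at every grid point
levels = np.sort(v(grid))
safe_c = max((c for c in levels if ok[v(grid) <= c].all()), default=0.0)
print("certified level set: v(x) <=", safe_c)
```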

Pixel Deconvolutional Networks

Title Pixel Deconvolutional Networks
Authors Hongyang Gao, Hao Yuan, Zhengyang Wang, Shuiwang Ji
Abstract Deconvolutional layers have been widely used in a variety of deep models for up-sampling, including encoder-decoder networks for semantic segmentation and deep generative models for unsupervised learning. One of the key limitations of deconvolutional operations is that they result in the so-called checkerboard problem. This is caused by the fact that no direct relationship exists among adjacent pixels on the output feature map. To address this problem, we propose the pixel deconvolutional layer (PixelDCL) to establish direct relationships among adjacent pixels on the up-sampled feature map. Our method is based on a fresh interpretation of the regular deconvolution operation. The resulting PixelDCL can be used to replace any deconvolutional layer in a plug-and-play manner without compromising the fully trainable capabilities of original models. The proposed PixelDCL may result in a slight decrease in efficiency, but this can be overcome by an implementation trick. Experimental results on semantic segmentation demonstrate that PixelDCL can consider spatial features such as edges and shapes and yields more accurate segmentation outputs than deconvolutional layers. When used in image generation tasks, our PixelDCL can largely overcome the checkerboard problem suffered by regular deconvolution operations.
Tasks Image Generation, Semantic Segmentation
Published 2017-05-18
URL http://arxiv.org/abs/1705.06820v4
PDF http://arxiv.org/pdf/1705.06820v4.pdf
PWC https://paperswithcode.com/paper/pixel-deconvolutional-networks
Repo https://github.com/fourmi1995/IronsegExperiment-PixelDCL
Framework tf
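A hedged 2x-upsampling sketch of the idea: the four interleaved sub-maps are generated sequentially, each conditioned on the previously generated ones, so adjacent output pixels become directly related (a plain deconvolution produces them independently, hence the checkerboard). Kernel sizes and the exact dependency pattern are simplifications of the paper's design:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PixelDCL2x(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.f1 = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.f2 = nn.Conv2d(out_ch, out_ch, 3, padding=1)
        self.f3 = nn.Conv2d(2 * out_ch, out_ch, 3, padding=1)
        self.f4 = nn.Conv2d(3 * out_ch, out_ch, 3, padding=1)

    def forward(self, x):
        m1 = self.f1(x)                              # first sub-map, from input
        m2 = self.f2(m1)                             # depends on m1
        m3 = self.f3(torch.cat([m1, m2], 1))         # depends on m1, m2
        m4 = self.f4(torch.cat([m1, m2, m3], 1))     # depends on all previous
        b, c, h, w = m1.shape
        m = torch.stack([m1, m2, m3, m4], dim=2).reshape(b, c * 4, h, w)
        return F.pixel_shuffle(m, 2)                 # interleave into 2x output

y = PixelDCL2x(32, 16)(torch.randn(1, 32, 8, 8))     # -> (1, 16, 16, 16)
```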

Stabilizing Training of Generative Adversarial Networks through Regularization

Title Stabilizing Training of Generative Adversarial Networks through Regularization
Authors Kevin Roth, Aurelien Lucchi, Sebastian Nowozin, Thomas Hofmann
Abstract Deep generative models based on Generative Adversarial Networks (GANs) have demonstrated impressive sample quality but in order to work they require a careful choice of architecture, parameter initialization, and selection of hyper-parameters. This fragility is in part due to a dimensional mismatch or non-overlapping support between the model distribution and the data distribution, causing their density ratio and the associated f-divergence to be undefined. We overcome this fundamental limitation and propose a new regularization approach with low computational cost that yields a stable GAN training procedure. We demonstrate the effectiveness of this regularizer across several architectures trained on common benchmark image generation tasks. Our regularization turns GAN models into reliable building blocks for deep learning.
Tasks Image Generation
Published 2017-05-25
URL http://arxiv.org/abs/1705.09367v2
PDF http://arxiv.org/pdf/1705.09367v2.pdf
PWC https://paperswithcode.com/paper/stabilizing-training-of-generative
Repo https://github.com/rothk/Stabilizing_GANs
Framework tf
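To the best of our reading, the proposed regularizer penalizes squared input-gradient norms of the discriminator logits, weighted by sigmoid-dependent factors on the real and generated batches (it arises from analytically approximating noise convolution). A PyTorch sketch under that assumption:

```python
import torch

def gan_regularizer(d_real, x_real, d_fake, x_fake, gamma=2.0):
    """d_real/d_fake: discriminator logits; x_real/x_fake must have
    requires_grad=True when the logits are computed."""
    g_real = torch.autograd.grad(d_real.sum(), x_real, create_graph=True)[0]
    g_fake = torch.autograd.grad(d_fake.sum(), x_fake, create_graph=True)[0]
    n_real = g_real.flatten(1).pow(2).sum(1)        # ||grad_x D||^2, real batch
    n_fake = g_fake.flatten(1).pow(2).sum(1)        # ||grad_x D||^2, fake batch
    w_real = (1.0 - torch.sigmoid(d_real)).pow(2).reshape(-1)
    w_fake = torch.sigmoid(d_fake).pow(2).reshape(-1)
    return (gamma / 2.0) * (w_real * n_real + w_fake * n_fake).mean()

# usage in the discriminator step (sketch):
#   x_real.requires_grad_(True); x_fake.requires_grad_(True)
#   d_real, d_fake = D(x_real), D(x_fake)
#   d_loss = bce_terms(d_real, d_fake) \
#            + gan_regularizer(d_real, x_real, d_fake, x_fake)
```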

Federated Multi-Task Learning

Title Federated Multi-Task Learning
Authors Virginia Smith, Chao-Kai Chiang, Maziar Sanjabi, Ameet Talwalkar
Abstract Federated learning poses new statistical and systems challenges in training machine learning models over distributed networks of devices. In this work, we show that multi-task learning is naturally suited to handle the statistical challenges of this setting, and propose a novel systems-aware optimization method, MOCHA, that is robust to practical systems issues. Our method and theory for the first time consider issues of high communication cost, stragglers, and fault tolerance for distributed multi-task learning. The resulting method achieves significant speedups compared to alternatives in the federated setting, as we demonstrate through simulations on real-world federated datasets.
Tasks Multi-Task Learning
Published 2017-05-30
URL http://arxiv.org/abs/1705.10467v2
PDF http://arxiv.org/pdf/1705.10467v2.pdf
PWC https://paperswithcode.com/paper/federated-multi-task-learning
Repo https://github.com/gingsmith/fmtl
Framework none

Large-Scale Optimal Transport and Mapping Estimation

Title Large-Scale Optimal Transport and Mapping Estimation
Authors Vivien Seguy, Bharath Bhushan Damodaran, Rémi Flamary, Nicolas Courty, Antoine Rolet, Mathieu Blondel
Abstract This paper presents a novel two-step approach for the fundamental problem of learning an optimal map from one distribution to another. First, we learn an optimal transport (OT) plan, which can be thought of as a one-to-many map between the two distributions. To that end, we propose a stochastic dual approach of regularized OT, and show empirically that it scales better than a recent related approach when the amount of samples is very large. Second, we estimate a Monge map as a deep neural network learned by approximating the barycentric projection of the previously-obtained OT plan. This parameterization allows generalization of the mapping outside the support of the input measure. We prove two theoretical stability results of regularized OT which show that our estimations converge to the OT plan and Monge map between the underlying continuous measures. We showcase our proposed approach on two applications: domain adaptation and generative modeling.
Tasks Domain Adaptation
Published 2017-11-07
URL http://arxiv.org/abs/1711.02283v2
PDF http://arxiv.org/pdf/1711.02283v2.pdf
PWC https://paperswithcode.com/paper/large-scale-optimal-transport-and-mapping
Repo https://github.com/mikigom/large-scale-OT-mapping-TF
Framework tf
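A hedged PyTorch sketch of step one: stochastic maximization of the dual of entropically regularized OT, with the potentials u and v parameterized as small networks. Step two (regressing a Monge map onto the barycentric projection of the plan) is omitted, and the toy measures, network sizes, and epsilon are ours:

```python
import torch
import torch.nn as nn

eps = 1.0                                  # entropic regularization strength
net = lambda: nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1))
u, v = net(), net()                        # dual potentials
opt = torch.optim.Adam(list(u.parameters()) + list(v.parameters()), lr=1e-3)

def cost(x, y):                            # squared Euclidean ground cost
    return ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)

for _ in range(1000):
    x = torch.randn(128, 2)                # samples from the source measure
    y = torch.randn(128, 2) + 2.0          # samples from the target measure
    ux, vy = u(x), v(y)                    # (128, 1) each
    # dual objective: E[u] + E[v] - eps * E[exp((u + v - c) / eps)]
    arg = ((ux + vy.T - cost(x, y)) / eps).clamp(max=30.0)  # clamp for stability
    loss = -(ux.mean() + vy.mean() - eps * torch.exp(arg).mean())
    opt.zero_grad(); loss.backward(); opt.step()
```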