Paper Group AWR 69
Improving speech recognition by revising gated recurrent units. The Numerics of GANs. AP17-OLR Challenge: Data, Plan, and Baseline. Word Translation Without Parallel Data. The Implicit Bias of Gradient Descent on Separable Data. Learning Deep Latent Spaces for Multi-Label Classification. End-to-End Learning of Geometry and Context for Deep Stereo R …
Improving speech recognition by revising gated recurrent units
Title | Improving speech recognition by revising gated recurrent units |
Authors | Mirco Ravanelli, Philemon Brakel, Maurizio Omologo, Yoshua Bengio |
Abstract | Speech recognition is largely taking advantage of deep learning, showing that substantial benefits can be obtained by modern Recurrent Neural Networks (RNNs). The most popular RNNs are Long Short-Term Memory (LSTMs), which typically reach state-of-the-art performance in many tasks thanks to their ability to learn long-term dependencies and robustness to vanishing gradients. Nevertheless, LSTMs have a rather complex design with three multiplicative gates, which might impair their efficient implementation. An attempt to simplify LSTMs has recently led to Gated Recurrent Units (GRUs), which are based on just two multiplicative gates. This paper builds on these efforts by further revising GRUs and proposing a simplified architecture potentially more suitable for speech recognition. The contribution of this work is two-fold. First, we suggest removing the reset gate from the GRU design, resulting in a more efficient single-gate architecture. Second, we propose to replace tanh with ReLU activations in the state update equations. Results show that, in our implementation, the revised architecture reduces the per-epoch training time by more than 30% and consistently improves recognition performance across different tasks, input features, and noisy conditions when compared to a standard GRU. |
Tasks | Speech Recognition |
Published | 2017-09-29 |
URL | http://arxiv.org/abs/1710.00641v1 |
http://arxiv.org/pdf/1710.00641v1.pdf | |
PWC | https://paperswithcode.com/paper/improving-speech-recognition-by-revising |
Repo | https://github.com/mravanelli/theano-kaldi-rnn |
Framework | pytorch |
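
A minimal PyTorch sketch of the single-gate, ReLU-based cell described in the abstract: the reset gate is dropped and the candidate state uses ReLU instead of tanh. The layer sizes are illustrative, and the batch normalization used in the paper is omitted.

```python
import torch
import torch.nn as nn

class RevisedGRUCell(nn.Module):
    """Single-gate recurrent cell: no reset gate, ReLU candidate state."""
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.update_gate = nn.Linear(input_size + hidden_size, hidden_size)
        self.candidate = nn.Linear(input_size + hidden_size, hidden_size)

    def forward(self, x, h):
        xh = torch.cat([x, h], dim=-1)
        z = torch.sigmoid(self.update_gate(xh))       # update gate
        h_tilde = torch.relu(self.candidate(xh))      # ReLU candidate state (no reset gate)
        return z * h + (1.0 - z) * h_tilde            # interpolated new state

# usage: one time step on a batch of 8 frames with 40-dim acoustic features
cell = RevisedGRUCell(input_size=40, hidden_size=128)
h = torch.zeros(8, 128)
h = cell(torch.randn(8, 40), h)
```
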
The Numerics of GANs
Title | The Numerics of GANs |
Authors | Lars Mescheder, Sebastian Nowozin, Andreas Geiger |
Abstract | In this paper, we analyze the numerics of common algorithms for training Generative Adversarial Networks (GANs). Using the formalism of smooth two-player games, we analyze the associated gradient vector field of GAN training objectives. Our findings suggest that the convergence of current algorithms suffers due to two factors: i) the presence of eigenvalues of the Jacobian of the gradient vector field with zero real part, and ii) eigenvalues with a large imaginary part. Using these findings, we design a new algorithm that overcomes some of these limitations and has better convergence properties. Experimentally, we demonstrate its superiority on training common GAN architectures and show convergence on GAN architectures that are known to be notoriously hard to train. |
Tasks | |
Published | 2017-05-30 |
URL | http://arxiv.org/abs/1705.10461v3 |
http://arxiv.org/pdf/1705.10461v3.pdf | |
PWC | https://paperswithcode.com/paper/the-numerics-of-gans |
Repo | https://github.com/nhynes/abc |
Framework | pytorch |
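
The abstract's diagnosis can be reproduced numerically on a toy two-player game. The NumPy sketch below (my own example, not the authors' code) builds the simultaneous-gradient vector field of the bilinear game f(x, y) = x·y, whose Jacobian has purely imaginary eigenvalues (failure mode i above), and then adds a consensus-style correction -γ∇(½‖v‖²), which shifts the eigenvalues into the stable half-plane.

```python
import numpy as np

# Two-player bilinear game: player 1 minimizes f(x, y) = x*y, player 2 maximizes it.
# Simultaneous-gradient vector field: v(x, y) = (-df/dx, +df/dy) = (-y, x).
def v(w):
    x, y = w
    return np.array([-y, x])

# The Jacobian of v is constant for this game.
J = np.array([[0.0, -1.0],
              [1.0,  0.0]])
print("eigenvalues of J:", np.linalg.eigvals(J))        # purely imaginary: 0 +/- 1j

# Consensus-style correction: v_reg = v - gamma * grad(0.5 * ||v||^2) = v - gamma * J^T v.
# Its Jacobian gains a negative real part, so the fixed point becomes attracting.
gamma = 0.5
J_reg = J - gamma * (J.T @ J)
print("eigenvalues of regularized J:", np.linalg.eigvals(J_reg))   # -0.5 +/- 1j
```
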
AP17-OLR Challenge: Data, Plan, and Baseline
Title | AP17-OLR Challenge: Data, Plan, and Baseline |
Authors | Zhiyuan Tang, Dong Wang, Yixiang Chen, Qing Chen |
Abstract | We present the data profile and the evaluation plan of the second oriental language recognition (OLR) challenge AP17-OLR. Compared to the event last year (AP16-OLR), the new challenge involves more languages and focuses more on short utterances. The data is offered by SpeechOcean and the NSFC M2ASR project. Two types of baselines are constructed to assist the participants, one is based on the i-vector model and the other is based on various neural networks. We report the baseline results evaluated with various metrics defined by the AP17-OLR evaluation plan and demonstrate that the combined database is a reasonable data resource for multilingual research. All the data is free for participants, and the Kaldi recipes for the baselines have been published online. |
Tasks | |
Published | 2017-06-28 |
URL | http://arxiv.org/abs/1706.09742v1 |
http://arxiv.org/pdf/1706.09742v1.pdf | |
PWC | https://paperswithcode.com/paper/ap17-olr-challenge-data-plan-and-baseline |
Repo | https://github.com/Rithmax/Sub-band-Envelope-Features-Using-Frequency-Domain-Linear-Prediction |
Framework | none |
Word Translation Without Parallel Data
Title | Word Translation Without Parallel Data |
Authors | Alexis Conneau, Guillaume Lample, Marc’Aurelio Ranzato, Ludovic Denoyer, Hervé Jégou |
Abstract | State-of-the-art methods for learning cross-lingual word embeddings have relied on bilingual dictionaries or parallel corpora. Recent studies showed that the need for parallel data supervision can be alleviated with character-level information. While these methods showed encouraging results, they are not on par with their supervised counterparts and are limited to pairs of languages sharing a common alphabet. In this work, we show that we can build a bilingual dictionary between two languages without using any parallel corpora, by aligning monolingual word embedding spaces in an unsupervised way. Without using any character information, our model even outperforms existing supervised methods on cross-lingual tasks for some language pairs. Our experiments demonstrate that our method also works very well for distant language pairs, like English-Russian or English-Chinese. We finally describe experiments on the English-Esperanto low-resource language pair, on which there only exists a limited amount of parallel data, to show the potential impact of our method in fully unsupervised machine translation. Our code, embeddings and dictionaries are publicly available. |
Tasks | Machine Translation, Unsupervised Machine Translation, Word Alignment, Word Embeddings |
Published | 2017-10-11 |
URL | http://arxiv.org/abs/1710.04087v3 |
http://arxiv.org/pdf/1710.04087v3.pdf | |
PWC | https://paperswithcode.com/paper/word-translation-without-parallel-data |
Repo | https://github.com/facebookresearch/MUSE |
Framework | pytorch |
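
A NumPy sketch of the Procrustes refinement step used in this line of work: given a (possibly induced) dictionary of matched word pairs, the best orthogonal map between the two embedding spaces has a closed form via SVD. The adversarial initialization and CSLS retrieval from the paper are omitted, and the arrays below are illustrative placeholders rather than real embeddings.

```python
import numpy as np

def procrustes(X_src, Y_tgt):
    """Orthogonal W minimizing ||X_src W^T - Y_tgt||_F.
    Rows are word vectors of dictionary pairs (source word i <-> target word i)."""
    U, _, Vt = np.linalg.svd(Y_tgt.T @ X_src)   # W* = U V^T with U S V^T = svd(Y^T X)
    return U @ Vt

# toy example: 1000 "dictionary" pairs of 300-d embeddings related by a rotation
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 300))                       # source-language vectors
true_W = np.linalg.qr(rng.normal(size=(300, 300)))[0]  # a random orthogonal map
Y = X @ true_W.T                                       # target vectors = rotated source
W = procrustes(X, Y)
print(np.abs(W - true_W).max())                        # near machine precision
```
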
The Implicit Bias of Gradient Descent on Separable Data
Title | The Implicit Bias of Gradient Descent on Separable Data |
Authors | Daniel Soudry, Elad Hoffer, Mor Shpigel Nacson, Suriya Gunasekar, Nathan Srebro |
Abstract | We examine gradient descent on unregularized logistic regression problems, with homogeneous linear predictors on linearly separable datasets. We show the predictor converges to the direction of the max-margin (hard margin SVM) solution. The result also generalizes to other monotone decreasing loss functions with an infimum at infinity, to multi-class problems, and to training a weight layer in a deep network in a certain restricted setting. Furthermore, we show this convergence is very slow, and only logarithmic in the convergence of the loss itself. This can help explain the benefit of continuing to optimize the logistic or cross-entropy loss even after the training error is zero and the training loss is extremely small, and, as we show, even if the validation loss increases. Our methodology can also aid in understanding implicit regularization in more complex models and with other optimization methods. |
Tasks | |
Published | 2017-10-27 |
URL | http://arxiv.org/abs/1710.10345v4 |
http://arxiv.org/pdf/1710.10345v4.pdf | |
PWC | https://paperswithcode.com/paper/the-implicit-bias-of-gradient-descent-on |
Repo | https://github.com/paper-submissions/MaxMargin |
Framework | pytorch |
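
A NumPy sketch of the phenomenon described in the abstract, on my own toy data: plain gradient descent on the unregularized logistic loss over separable points keeps growing ||w|| (roughly like log t), while the direction w/||w|| stabilizes toward the max-margin direction the paper characterizes.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two well-separated Gaussian blobs, labels in {-1, +1}
X = np.vstack([rng.normal(+3, 1, size=(100, 2)), rng.normal(-3, 1, size=(100, 2))])
y = np.concatenate([np.ones(100), -np.ones(100)])

w = np.zeros(2)
lr = 0.1
for t in range(1, 100001):
    margins = y * (X @ w)
    # gradient of mean log(1 + exp(-y x.w)); 1/(1+exp(m)) = sigmoid(-m)
    grad = -(X * (y / (1.0 + np.exp(margins)))[:, None]).mean(axis=0)
    w -= lr * grad
    if t in (10, 100, 1000, 10000, 100000):
        print(f"t={t:6d}  ||w||={np.linalg.norm(w):7.3f}  direction={w / np.linalg.norm(w)}")
# ||w|| keeps growing (roughly logarithmically in t), while the printed direction
# changes less and less, approaching the hard-margin SVM direction.
```
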
Learning Deep Latent Spaces for Multi-Label Classification
Title | Learning Deep Latent Spaces for Multi-Label Classification |
Authors | Chih-Kuan Yeh, Wei-Chieh Wu, Wei-Jen Ko, Yu-Chiang Frank Wang |
Abstract | Multi-label classification is a practical yet challenging task in machine learning related fields, since it requires the prediction of more than one label category for each input instance. We propose a novel deep neural network (DNN) based model, Canonical Correlated AutoEncoder (C2AE), for solving this task. Aiming at better relating feature and label domain data for improved classification, we uniquely perform joint feature and label embedding by deriving a deep latent space, followed by the introduction of a label-correlation sensitive loss function for recovering the predicted label outputs. Our C2AE is achieved by integrating the DNN architectures of canonical correlation analysis and autoencoder, which allows end-to-end learning and prediction with the ability to exploit label dependency. Moreover, our C2AE can be easily extended to address the learning problem with missing labels. Our experiments on multiple datasets with different scales confirm the effectiveness and robustness of our proposed method, which is shown to perform favorably against state-of-the-art methods for multi-label classification. |
Tasks | Multi-Label Classification |
Published | 2017-07-03 |
URL | http://arxiv.org/abs/1707.00418v1 |
http://arxiv.org/pdf/1707.00418v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-deep-latent-spaces-for-multi-label |
Repo | https://github.com/yankeesrules/C2AE |
Framework | none |
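
A stripped-down PyTorch sketch of the idea in the abstract: a feature encoder and a label encoder share one latent space, and a decoder reconstructs the label vector from that space; at test time, labels are predicted by pushing features through encoder and decoder. The layer sizes and the plain L2 alignment loss are my simplifications; the paper uses a CCA-style objective and a label-correlation sensitive loss.

```python
import torch
import torch.nn as nn

class C2AESketch(nn.Module):
    def __init__(self, feat_dim, num_labels, latent_dim=64):
        super().__init__()
        self.fx = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, latent_dim))
        self.fe = nn.Sequential(nn.Linear(num_labels, 128), nn.ReLU(), nn.Linear(128, latent_dim))
        self.fd = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, num_labels))

    def loss(self, x, y, alpha=1.0):
        zx, zy = self.fx(x), self.fe(y)
        align = ((zx - zy) ** 2).sum(dim=1).mean()     # feature/label latent alignment
        recon = nn.functional.binary_cross_entropy_with_logits(self.fd(zy), y)
        return align + alpha * recon

    def predict(self, x):
        return torch.sigmoid(self.fd(self.fx(x)))      # label scores at test time

model = C2AESketch(feat_dim=300, num_labels=20)
x, y = torch.randn(32, 300), torch.randint(0, 2, (32, 20)).float()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
opt.zero_grad()
model.loss(x, y).backward()
opt.step()
```
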
End-to-End Learning of Geometry and Context for Deep Stereo Regression
Title | End-to-End Learning of Geometry and Context for Deep Stereo Regression |
Authors | Alex Kendall, Hayk Martirosyan, Saumitro Dasgupta, Peter Henry, Ryan Kennedy, Abraham Bachrach, Adam Bry |
Abstract | We propose a novel deep learning architecture for regressing disparity from a rectified pair of stereo images. We leverage knowledge of the problem’s geometry to form a cost volume using deep feature representations. We learn to incorporate contextual information using 3-D convolutions over this volume. Disparity values are regressed from the cost volume using a proposed differentiable soft argmin operation, which allows us to train our method end-to-end to sub-pixel accuracy without any additional post-processing or regularization. We evaluate our method on the Scene Flow and KITTI datasets and on KITTI we set a new state-of-the-art benchmark, while being significantly faster than competing approaches. |
Tasks | |
Published | 2017-03-13 |
URL | http://arxiv.org/abs/1703.04309v1 |
http://arxiv.org/pdf/1703.04309v1.pdf | |
PWC | https://paperswithcode.com/paper/end-to-end-learning-of-geometry-and-context |
Repo | https://github.com/JiaRenChang/PSMNet |
Framework | pytorch |
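
The differentiable soft argmin mentioned in the abstract has a compact definition: the predicted disparity is the expectation of the candidate disparities under a softmax over the negated matching costs. A PyTorch sketch (the shapes are illustrative):

```python
import torch

def soft_argmin(cost_volume):
    """cost_volume: (batch, max_disparity, height, width) matching costs.
    Returns sub-pixel disparity estimates of shape (batch, height, width)."""
    probs = torch.softmax(-cost_volume, dim=1)           # low cost -> high weight
    disp_values = torch.arange(cost_volume.shape[1], dtype=cost_volume.dtype,
                               device=cost_volume.device).view(1, -1, 1, 1)
    return (probs * disp_values).sum(dim=1)              # expected (sub-pixel) disparity

cost = torch.randn(2, 64, 32, 32)          # toy cost volume with 64 candidate disparities
print(soft_argmin(cost).shape)             # torch.Size([2, 32, 32])
```
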
Contrast Enhancement of Brightness-Distorted Images by Improved Adaptive Gamma Correction
Title | Contrast Enhancement of Brightness-Distorted Images by Improved Adaptive Gamma Correction |
Authors | Gang Cao, Lihui Huang, Huawei Tian, Xianglin Huang, Yongbin Wang, Ruicong Zhi |
Abstract | As an efficient image contrast enhancement (CE) tool, adaptive gamma correction (AGC) was previously proposed by relating the gamma parameter to the cumulative distribution function (CDF) of the pixel gray levels within an image. AGC deals well with most dimmed images, but fails for globally bright images and for dimmed images with local bright regions. Both categories of brightness-distorted images are common in real scenarios, for example due to improper exposure or white object regions. In order to attenuate such deficiencies, here we propose an improved AGC algorithm. The novel strategy of negative images is used to realize CE of the bright images, and gamma correction modulated by a truncated CDF is employed to enhance the dimmed ones. As such, local over-enhancement and structure distortion can be alleviated. Both qualitative and quantitative experimental results show that our proposed method yields consistently good CE results. |
Tasks | |
Published | 2017-09-13 |
URL | http://arxiv.org/abs/1709.04427v1 |
http://arxiv.org/pdf/1709.04427v1.pdf | |
PWC | https://paperswithcode.com/paper/contrast-enhancement-of-brightness-distorted |
Repo | https://github.com/leowang7/iagcwd |
Framework | none |
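
A rough NumPy sketch of the strategy described in the abstract: globally bright images are processed as negatives, and dim images get CDF-driven gamma correction with the CDF clipped to limit over-enhancement. The brightness threshold, the clipping level, and the exact gamma formula below are assumptions for illustration, not the authors' tuned formulas.

```python
import numpy as np

def agc_sketch(img, bright_threshold=0.5):
    """img: float array in [0, 1]. Returns a contrast-enhanced image."""
    negated = img.mean() > bright_threshold      # treat globally bright images as negatives
    work = 1.0 - img if negated else img

    hist, _ = np.histogram(work, bins=256, range=(0.0, 1.0))
    cdf = np.cumsum(hist) / hist.sum()
    levels = np.linspace(0.0, 1.0, 256)
    # per-level gamma from the CDF; clipping ("truncating") the CDF limits over-enhancement
    gamma = 1.0 - np.clip(cdf, 0.0, 0.75)
    lut = levels ** gamma
    out = np.interp(work, levels, lut)

    return 1.0 - out if negated else out

img = np.clip(np.random.rand(64, 64) * 0.4, 0, 1)     # a dim toy image
print(img.mean(), agc_sketch(img).mean())             # enhanced mean should be higher
```
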
Dance Dance Convolution
Title | Dance Dance Convolution |
Authors | Chris Donahue, Zachary C. Lipton, Julian McAuley |
Abstract | Dance Dance Revolution (DDR) is a popular rhythm-based video game. Players perform steps on a dance platform in synchronization with music as directed by on-screen step charts. While many step charts are available in standardized packs, players may grow tired of existing charts, or wish to dance to a song for which no chart exists. We introduce the task of learning to choreograph. Given a raw audio track, the goal is to produce a new step chart. This task decomposes naturally into two subtasks: deciding when to place steps and deciding which steps to select. For the step placement task, we combine recurrent and convolutional neural networks to ingest spectrograms of low-level audio features to predict steps, conditioned on chart difficulty. For step selection, we present a conditional LSTM generative model that substantially outperforms n-gram and fixed-window approaches. |
Tasks | |
Published | 2017-03-20 |
URL | http://arxiv.org/abs/1703.06891v3 |
http://arxiv.org/pdf/1703.06891v3.pdf | |
PWC | https://paperswithcode.com/paper/dance-dance-convolution |
Repo | https://github.com/chrisdonahue/ddc |
Framework | tf |
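
A PyTorch skeleton of the step-selection model described in the abstract: an LSTM that predicts the next step token from the previous tokens, conditioned on per-step timing features. The vocabulary size (e.g. 4 arrows with 4 states each, 4^4 = 256 combinations) and the 2-dimensional timing features are illustrative assumptions.

```python
import torch
import torch.nn as nn

class StepSelectionLSTM(nn.Module):
    """Next-step prediction conditioned on per-step timing features."""
    def __init__(self, vocab_size, cond_dim, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, 64)
        self.lstm = nn.LSTM(64 + cond_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, prev_steps, cond):
        # prev_steps: (batch, seq) step-token ids; cond: (batch, seq, cond_dim) timing features
        x = torch.cat([self.embed(prev_steps), cond], dim=-1)
        h, _ = self.lstm(x)
        return self.out(h)                           # logits over the next step token

model = StepSelectionLSTM(vocab_size=256, cond_dim=2)
steps = torch.randint(0, 256, (8, 32))
timing = torch.rand(8, 32, 2)                        # e.g. time since last step, beat phase
logits = model(steps, timing)                        # (8, 32, 256)
```
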
All-but-the-Top: Simple and Effective Postprocessing for Word Representations
Title | All-but-the-Top: Simple and Effective Postprocessing for Word Representations |
Authors | Jiaqi Mu, Suma Bhat, Pramod Viswanath |
Abstract | Real-valued word representations have transformed NLP applications; popular examples are word2vec and GloVe, recognized for their ability to capture linguistic regularities. In this paper, we demonstrate a very simple, and yet counter-intuitive, postprocessing technique (eliminating the common mean vector and a few top dominating directions from the word vectors) that renders off-the-shelf representations even stronger. The postprocessing is empirically validated on a variety of lexical-level intrinsic tasks (word similarity, concept categorization, word analogy) and sentence-level tasks (semantic textual similarity and text classification) on multiple datasets and with a variety of representation methods and hyperparameter choices in multiple languages; in each case, the processed representations are consistently better than the original ones. |
Tasks | Sentiment Analysis, Subjectivity Analysis, Text Classification |
Published | 2017-02-05 |
URL | http://arxiv.org/abs/1702.01417v2 |
http://arxiv.org/pdf/1702.01417v2.pdf | |
PWC | https://paperswithcode.com/paper/all-but-the-top-simple-and-effective |
Repo | https://github.com/lgalke/vec4ir |
Framework | tf |
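
The postprocessing itself fits in a few lines: subtract the common mean vector, then project out the top D principal directions (the paper suggests D on the order of dim/100). A NumPy sketch; the random matrix stands in for real GloVe/word2vec vectors.

```python
import numpy as np

def all_but_the_top(embeddings, n_components):
    """embeddings: (vocab_size, dim) matrix of word vectors."""
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    # top principal directions of the centered vectors
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    top_dirs = Vt[:n_components]                         # (n_components, dim)
    # remove the projections onto those dominating directions
    return centered - centered @ top_dirs.T @ top_dirs

vectors = np.random.randn(10000, 300).astype(np.float32)    # stand-in for real embeddings
processed = all_but_the_top(vectors, n_components=3)        # D ~ dim/100
```
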
Safe Model-based Reinforcement Learning with Stability Guarantees
Title | Safe Model-based Reinforcement Learning with Stability Guarantees |
Authors | Felix Berkenkamp, Matteo Turchetta, Angela P. Schoellig, Andreas Krause |
Abstract | Reinforcement learning is a powerful paradigm for learning optimal policies from experimental data. However, to find optimal policies, most reinforcement learning algorithms explore all possible actions, which may be harmful for real-world systems. As a consequence, learning algorithms are rarely applied on safety-critical systems in the real world. In this paper, we present a learning algorithm that explicitly considers safety, defined in terms of stability guarantees. Specifically, we extend control-theoretic results on Lyapunov stability verification and show how to use statistical models of the dynamics to obtain high-performance control policies with provable stability certificates. Moreover, under additional regularity assumptions in terms of a Gaussian process prior, we prove that one can effectively and safely collect data in order to learn about the dynamics and thus both improve control performance and expand the safe region of the state space. In our experiments, we show how the resulting algorithm can safely optimize a neural network policy on a simulated inverted pendulum, without the pendulum ever falling down. |
Tasks | |
Published | 2017-05-23 |
URL | http://arxiv.org/abs/1705.08551v3 |
http://arxiv.org/pdf/1705.08551v3.pdf | |
PWC | https://paperswithcode.com/paper/safe-model-based-reinforcement-learning-with |
Repo | https://github.com/befelix/safe_learning |
Framework | tf |
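
The stability side of the abstract can be illustrated with a toy check: given a (learned or nominal) discrete-time model f and a Lyapunov candidate V, the largest sublevel set of V on which V decreases under f is an estimate of the safe region of attraction. The 1-D system, the candidate V, and the grid below are my own illustrative choices, not the authors' GP-based construction.

```python
import numpy as np

# Toy 1-D closed-loop model: stable near the origin, unstable far away.
def f(x):
    return 0.8 * x + 0.1 * x ** 3

def V(x):                                     # Lyapunov candidate
    return x ** 2

xs = np.linspace(-3.0, 3.0, 2001)             # discretized state space
decrease = V(f(xs)) < V(xs)                   # where the candidate strictly decreases
violating = ~decrease & (np.abs(xs) > 1e-9)   # the equilibrium itself is excluded

# Largest level c such that V decreases everywhere on {x : V(x) < c}
c_max = V(xs[violating]).min() if violating.any() else V(xs).max()
print("estimated safe region: V(x) <", c_max)   # roughly |x| < sqrt(2) for this model
```
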
Pixel Deconvolutional Networks
Title | Pixel Deconvolutional Networks |
Authors | Hongyang Gao, Hao Yuan, Zhengyang Wang, Shuiwang Ji |
Abstract | Deconvolutional layers have been widely used in a variety of deep models for up-sampling, including encoder-decoder networks for semantic segmentation and deep generative models for unsupervised learning. One of the key limitations of deconvolutional operations is that they result in the so-called checkerboard problem. This is caused by the fact that no direct relationship exists among adjacent pixels on the output feature map. To address this problem, we propose the pixel deconvolutional layer (PixelDCL) to establish direct relationships among adjacent pixels on the up-sampled feature map. Our method is based on a fresh interpretation of the regular deconvolution operation. The resulting PixelDCL can be used to replace any deconvolutional layer in a plug-and-play manner without compromising the fully trainable capabilities of original models. The proposed PixelDCL may result in a slight decrease in efficiency, but this can be overcome by an implementation trick. Experimental results on semantic segmentation demonstrate that PixelDCL can consider spatial features such as edges and shapes and yields more accurate segmentation outputs than deconvolutional layers. When used in image generation tasks, our PixelDCL can largely overcome the checkerboard problem suffered by regular deconvolution operations. |
Tasks | Image Generation, Semantic Segmentation |
Published | 2017-05-18 |
URL | http://arxiv.org/abs/1705.06820v4 |
http://arxiv.org/pdf/1705.06820v4.pdf | |
PWC | https://paperswithcode.com/paper/pixel-deconvolutional-networks |
Repo | https://github.com/fourmi1995/IronsegExperiment-PixelDCL |
Framework | tf |
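
A simplified PyTorch sketch of the idea in the abstract for 2x up-sampling: the four sub-pixel groups of the output are generated sequentially, each conditioned on the groups produced before it, and then interleaved into the up-sampled map. Kernel sizes and the exact conditioning pattern are simplifications of the paper's design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PixelDCLSketch(nn.Module):
    """2x up-sampling where later sub-pixel groups depend on earlier ones."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.g1 = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.g2 = nn.Conv2d(in_ch + out_ch, out_ch, 3, padding=1)
        self.g3 = nn.Conv2d(in_ch + 2 * out_ch, out_ch, 3, padding=1)
        self.g4 = nn.Conv2d(in_ch + 3 * out_ch, out_ch, 3, padding=1)

    def forward(self, x):
        a = self.g1(x)                               # first sub-pixel group
        b = self.g2(torch.cat([x, a], dim=1))        # conditioned on a
        c = self.g3(torch.cat([x, a, b], dim=1))     # conditioned on a, b
        d = self.g4(torch.cat([x, a, b, c], dim=1))  # conditioned on a, b, c
        # interleave the four groups into one 2x-resolution map
        groups = torch.stack([a, b, c, d], dim=2)    # (N, out_ch, 4, H, W)
        n, ch, _, h, w = groups.shape
        return F.pixel_shuffle(groups.reshape(n, ch * 4, h, w), upscale_factor=2)

up = PixelDCLSketch(in_ch=16, out_ch=8)
print(up(torch.randn(1, 16, 20, 20)).shape)          # torch.Size([1, 8, 40, 40])
```
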
Stabilizing Training of Generative Adversarial Networks through Regularization
Title | Stabilizing Training of Generative Adversarial Networks through Regularization |
Authors | Kevin Roth, Aurelien Lucchi, Sebastian Nowozin, Thomas Hofmann |
Abstract | Deep generative models based on Generative Adversarial Networks (GANs) have demonstrated impressive sample quality but in order to work they require a careful choice of architecture, parameter initialization, and selection of hyper-parameters. This fragility is in part due to a dimensional mismatch or non-overlapping support between the model distribution and the data distribution, causing their density ratio and the associated f-divergence to be undefined. We overcome this fundamental limitation and propose a new regularization approach with low computational cost that yields a stable GAN training procedure. We demonstrate the effectiveness of this regularizer across several architectures trained on common benchmark image generation tasks. Our regularization turns GAN models into reliable building blocks for deep learning. |
Tasks | Image Generation |
Published | 2017-05-25 |
URL | http://arxiv.org/abs/1705.09367v2 |
http://arxiv.org/pdf/1705.09367v2.pdf | |
PWC | https://paperswithcode.com/paper/stabilizing-training-of-generative |
Repo | https://github.com/rothk/Stabilizing_GANs |
Framework | tf |
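
The regularizer described in the abstract penalizes the squared gradient norm of the discriminator, weighted so that the penalty is strongest where the discriminator is uncertain. The PyTorch sketch below follows my reading of the paper's JS-regularizer, with weights (1 - σ(D))² on real data and σ(D)² on generated data; treat the exact weighting as an assumption and check it against the paper before use.

```python
import torch

def js_gradient_penalty(discriminator, x_real, x_fake):
    """Gradient-norm regularizer for a discriminator that outputs logits."""
    def weighted_grad_norm(x, weight_fn):
        x = x.detach().requires_grad_(True)
        logits = discriminator(x)
        grads, = torch.autograd.grad(logits.sum(), x, create_graph=True)
        grad_norm2 = grads.flatten(1).pow(2).sum(dim=1)      # per-example ||grad||^2
        return (weight_fn(torch.sigmoid(logits.squeeze(-1))) * grad_norm2).mean()

    real_term = weighted_grad_norm(x_real, lambda d: (1.0 - d) ** 2)
    fake_term = weighted_grad_norm(x_fake, lambda d: d ** 2)
    return real_term + fake_term

# usage inside the discriminator step (gamma is the regularization weight;
# D, bce_loss, x_real, x_fake are placeholders for your own model and data):
# d_loss = bce_loss + gamma * js_gradient_penalty(D, x_real, x_fake.detach())
```
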
Federated Multi-Task Learning
Title | Federated Multi-Task Learning |
Authors | Virginia Smith, Chao-Kai Chiang, Maziar Sanjabi, Ameet Talwalkar |
Abstract | Federated learning poses new statistical and systems challenges in training machine learning models over distributed networks of devices. In this work, we show that multi-task learning is naturally suited to handle the statistical challenges of this setting, and propose a novel systems-aware optimization method, MOCHA, that is robust to practical systems issues. Our method and theory for the first time consider issues of high communication cost, stragglers, and fault tolerance for distributed multi-task learning. The resulting method achieves significant speedups compared to alternatives in the federated setting, as we demonstrate through simulations on real-world federated datasets. |
Tasks | Multi-Task Learning |
Published | 2017-05-30 |
URL | http://arxiv.org/abs/1705.10467v2 |
http://arxiv.org/pdf/1705.10467v2.pdf | |
PWC | https://paperswithcode.com/paper/federated-multi-task-learning |
Repo | https://github.com/gingsmith/fmtl |
Framework | none |
Large-Scale Optimal Transport and Mapping Estimation
Title | Large-Scale Optimal Transport and Mapping Estimation |
Authors | Vivien Seguy, Bharath Bhushan Damodaran, Rémi Flamary, Nicolas Courty, Antoine Rolet, Mathieu Blondel |
Abstract | This paper presents a novel two-step approach for the fundamental problem of learning an optimal map from one distribution to another. First, we learn an optimal transport (OT) plan, which can be thought of as a one-to-many map between the two distributions. To that end, we propose a stochastic dual approach of regularized OT, and show empirically that it scales better than a recent related approach when the number of samples is very large. Second, we estimate a Monge map as a deep neural network learned by approximating the barycentric projection of the previously-obtained OT plan. This parameterization allows generalization of the mapping outside the support of the input measure. We prove two theoretical stability results of regularized OT which show that our estimations converge to the OT plan and Monge map between the underlying continuous measures. We showcase our proposed approach on two applications: domain adaptation and generative modeling. |
Tasks | Domain Adaptation |
Published | 2017-11-07 |
URL | http://arxiv.org/abs/1711.02283v2 |
http://arxiv.org/pdf/1711.02283v2.pdf | |
PWC | https://paperswithcode.com/paper/large-scale-optimal-transport-and-mapping |
Repo | https://github.com/mikigom/large-scale-OT-mapping-TF |
Framework | tf |
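
The two-step recipe in the abstract can be illustrated on small samples. The NumPy sketch below substitutes the classic Sinkhorn algorithm for the paper's stochastic dual solver (which is what makes their approach scale), computes an entropic OT plan between two point clouds, and then forms the barycentric projection that step two would fit with a deep network. Sample sizes and the regularization strength are illustrative.

```python
import numpy as np

def sinkhorn_plan(X, Y, eps=0.05, n_iters=500):
    """Entropic-regularized OT plan between uniform measures on the rows of X and Y."""
    C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)   # squared-distance cost
    C = C / C.max()                                      # scale costs to [0, 1] for stability
    K = np.exp(-C / eps)
    a, b = np.full(len(X), 1.0 / len(X)), np.full(len(Y), 1.0 / len(Y))
    u, v = np.ones(len(X)), np.ones(len(Y))
    for _ in range(n_iters):                             # Sinkhorn iterations
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]                   # OT plan, shape (n, m)

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(200, 2))                  # source samples
Y = rng.normal(3.0, 1.0, size=(200, 2))                  # target samples
P = sinkhorn_plan(X, Y)

# Barycentric projection: the one-to-many plan is collapsed to a point-wise map X -> Y,
# which the paper's second step fits with a neural network to generalize off-sample.
T_X = (P @ Y) / P.sum(axis=1, keepdims=True)
print(X.mean(axis=0), T_X.mean(axis=0))                  # mapped points sit near the target
```
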