July 27, 2019

Paper Group ANR 621

What does Attention in Neural Machine Translation Pay Attention to?

Title What does Attention in Neural Machine Translation Pay Attention to?
Authors Hamidreza Ghader, Christof Monz
Abstract Attention in neural machine translation provides the possibility to encode relevant parts of the source sentence at each translation step. As a result, attention is considered to be an alignment model as well. However, there is no work that specifically studies attention and provides analysis of what is being learned by attention models. Thus, the question remains of how attention is similar to or different from traditional alignment. In this paper, we provide detailed analysis of attention and compare it to traditional alignment. We answer the question of whether attention is only capable of modelling translational equivalence or whether it captures more information. We show that attention is different from alignment in some cases and is capturing useful information other than alignments.
Tasks Machine Translation
Published 2017-10-09
URL http://arxiv.org/abs/1710.03348v1
PDF http://arxiv.org/pdf/1710.03348v1.pdf
PWC https://paperswithcode.com/paper/what-does-attention-in-neural-machine
Repo
Framework
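
The contrast the abstract draws between attention and traditional alignment can be illustrated with a minimal sketch. Attention yields a probability distribution over all source positions, while a traditional alignment picks a single position. The dot-product scoring, the toy vectors, and the function names below are assumptions made for illustration; they are not taken from the paper.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(decoder_state, encoder_states):
    """Soft attention: one weight per source position (dot-product scoring is assumed)."""
    scores = [sum(d * e for d, e in zip(decoder_state, enc)) for enc in encoder_states]
    return softmax(scores)

def hard_alignment(weights):
    """Collapse the soft distribution to a single source position, as an alignment would."""
    return max(range(len(weights)), key=weights.__getitem__)

# Hypothetical toy values: three source positions, one decoder step.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [0.2, 2.0]

weights = attention_weights(decoder_state, encoder_states)
aligned = hard_alignment(weights)
```

In this toy case the most attended position is index 2, but the remaining positions still receive non-zero weight. That leftover mass is where attention could, in principle, carry information that a hard alignment discards.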

Matching neural paths: transfer from recognition to correspondence search

Title Matching neural paths: transfer from recognition to correspondence search
Authors Nikolay Savinov, Lubor Ladicky, Marc Pollefeys
Abstract Many machine learning tasks require finding per-part correspondences between objects. In this work we focus on low-level correspondences - a highly ambiguous matching problem. We propose to use a hierarchical semantic representation of the objects, coming from a convolutional neural network, to solve this ambiguity. Training it for low-level correspondence prediction directly might not be an option in some domains where the ground-truth correspondences are hard to obtain. We show how transfer from recognition can be used to avoid such training. Our idea is to mark parts as “matching” if their features are close to each other at all the levels of convolutional feature hierarchy (neural paths). Although the overall number of such paths is exponential in the number of layers, we propose a polynomial algorithm for aggregating all of them in a single backward pass. The empirical validation is done on the task of stereo correspondence and demonstrates that we achieve competitive results among the methods which do not use labeled target domain data.
Tasks
Published 2017-05-19
URL http://arxiv.org/abs/1705.08272v3
PDF http://arxiv.org/pdf/1705.08272v3.pdf
PWC https://paperswithcode.com/paper/matching-neural-paths-transfer-from
Repo
Framework

Linearly constrained Gaussian processes

Title Linearly constrained Gaussian processes
Authors Carl Jidling, Niklas Wahlström, Adrian Wills, Thomas B. Schön
Abstract We consider a modification of the covariance function in Gaussian processes to correctly account for known linear constraints. By modelling the target function as a transformation of an underlying function, the constraints are explicitly incorporated in the model such that they are guaranteed to be fulfilled by any sample drawn or prediction made. We also propose a constructive procedure for designing the transformation operator and illustrate the result on both simulated and real-data examples.
Tasks Gaussian Processes
Published 2017-03-02
URL http://arxiv.org/abs/1703.00787v2
PDF http://arxiv.org/pdf/1703.00787v2.pdf
PWC https://paperswithcode.com/paper/linearly-constrained-gaussian-processes
Repo
Framework
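
The construction described in the abstract can be written out as a short sketch. The symbols $\mathcal{F}_x$, $\mathcal{G}_x$, $g$ and $k_g$ are notation assumed here for illustration, and the zero prior mean is a simplifying assumption; the abstract itself only states that the target function is modelled as a transformation of an underlying function.

```latex
% Known linear constraint on the target function f (F_x is a linear operator):
\mathcal{F}_x f = 0
% Model f as a linear transformation of an underlying function g:
f = \mathcal{G}_x g, \qquad g \sim \mathcal{GP}\big(0,\; k_g(x, x')\big)
% A linear map of a GP is again a GP, with a transformed covariance:
f \sim \mathcal{GP}\big(0,\; \mathcal{G}_x \, k_g(x, x') \, \mathcal{G}_{x'}^{\top}\big)
% The constraint holds for every sample and prediction if G_x satisfies
\mathcal{F}_x \mathcal{G}_x = 0
% Example: a divergence-free vector field in two dimensions
\mathcal{F}_x = \begin{bmatrix} \partial_{x_1} & \partial_{x_2} \end{bmatrix}, \quad
\mathcal{G}_x = \begin{bmatrix} -\partial_{x_2} \\ \partial_{x_1} \end{bmatrix}, \quad
\mathcal{F}_x \mathcal{G}_x = -\partial_{x_1}\partial_{x_2} + \partial_{x_2}\partial_{x_1} = 0
```

Because the constraint is built into the covariance function, it does not need to be enforced through extra observations or penalties at prediction time.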

Exploring Speech Enhancement with Generative Adversarial Networks for Robust Speech Recognition

Title Exploring Speech Enhancement with Generative Adversarial Networks for Robust Speech Recognition
Authors Chris Donahue, Bo Li, Rohit Prabhavalkar
Abstract We investigate the effectiveness of generative adversarial networks (GANs) for speech enhancement, in the context of improving noise robustness of automatic speech recognition (ASR) systems. Prior work demonstrates that GANs can effectively suppress additive noise in raw waveform speech signals, improving perceptual quality metrics; however this technique was not justified in the context of ASR. In this work, we conduct a detailed study to measure the effectiveness of GANs in enhancing speech contaminated by both additive and reverberant noise. Motivated by recent advances in image processing, we propose operating GANs on log-Mel filterbank spectra instead of waveforms, which requires less computation and is more robust to reverberant noise. While GAN enhancement improves the performance of a clean-trained ASR system on noisy speech, it falls short of the performance achieved by conventional multi-style training (MTR). By appending the GAN-enhanced features to the noisy inputs and retraining, we achieve a 7% WER improvement relative to the MTR system.
Tasks Robust Speech Recognition, Speech Enhancement, Speech Recognition
Published 2017-11-15
URL http://arxiv.org/abs/1711.05747v2
PDF http://arxiv.org/pdf/1711.05747v2.pdf
PWC https://paperswithcode.com/paper/exploring-speech-enhancement-with-generative
Repo
Framework
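
Word error rate figures such as the 7% above are easy to misread, so a small worked example may help. It assumes the figure is a relative reduction with respect to the MTR baseline; the WER values below are hypothetical and are not results from the paper.

```python
def relative_wer_improvement(baseline_wer, new_wer):
    """Fraction by which WER drops relative to the baseline (0.07 means 7%)."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer

# Hypothetical numbers chosen only to illustrate the arithmetic.
mtr_wer = 20.0        # WER (%) of the multi-style-trained baseline
retrained_wer = 18.6  # WER (%) after retraining with GAN-enhanced features appended

relative = relative_wer_improvement(mtr_wer, retrained_wer)
absolute = mtr_wer - retrained_wer
```

With these made-up values the relative improvement is about 7%, while the absolute drop is only 1.4 WER points, which is why it matters which of the two a reported figure refers to.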

Robust Speech Recognition Using Generative Adversarial Networks

Title Robust Speech Recognition Using Generative Adversarial Networks
Authors Anuroop Sriram, Heewoo Jun, Yashesh Gaur, Sanjeev Satheesh
Abstract This paper describes a general, scalable, end-to-end framework that uses the generative adversarial network (GAN) objective to enable robust speech recognition. Encoders trained with the proposed approach enjoy improved invariance by learning to map noisy audio to the same embedding space as that of clean audio. Unlike previous methods, the new framework does not rely on domain expertise or simplifying assumptions as are often needed in signal processing, and directly encourages robustness in a data-driven way. We show the new approach improves simulated far-field speech recognition of vanilla sequence-to-sequence models without specialized front-ends or preprocessing.
Tasks Robust Speech Recognition, Speech Recognition
Published 2017-11-05
URL http://arxiv.org/abs/1711.01567v1
PDF http://arxiv.org/pdf/1711.01567v1.pdf
PWC https://paperswithcode.com/paper/robust-speech-recognition-using-generative
Repo
Framework

A Hybrid Approach with Multi-channel I-Vectors and Convolutional Neural Networks for Acoustic Scene Classification

Title A Hybrid Approach with Multi-channel I-Vectors and Convolutional Neural Networks for Acoustic Scene Classification
Authors Hamid Eghbal-zadeh, Bernhard Lehner, Matthias Dorfer, Gerhard Widmer
Abstract In Acoustic Scene Classification (ASC) two major approaches have been followed. While one utilizes engineered features such as mel-frequency-cepstral-coefficients (MFCCs), the other uses learned features that are the outcome of an optimization algorithm. I-vectors are the result of a modeling technique that usually takes engineered features as input. It has been shown that standard MFCCs extracted from monaural audio signals lead to i-vectors that exhibit poor performance, especially on indoor acoustic scenes. At the same time, Convolutional Neural Networks (CNNs) are well known for their ability to learn features by optimizing their filters. They have been applied on ASC and have shown promising results. In this paper, we first propose a novel multi-channel i-vector extraction and scoring scheme for ASC, improving their performance on indoor and outdoor scenes. Second, we propose a CNN architecture that achieves promising ASC results. Further, we show that i-vectors and CNNs capture complementary information from acoustic scenes. Finally, we propose a hybrid system for ASC using multi-channel i-vectors and CNNs by utilizing a score fusion technique. Using our method, we participated in the ASC task of the DCASE-2016 challenge. Our hybrid approach achieved 1st rank among 49 submissions, substantially improving the previous state of the art.
Tasks Acoustic Scene Classification, Scene Classification
Published 2017-06-20
URL http://arxiv.org/abs/1706.06525v1
PDF http://arxiv.org/pdf/1706.06525v1.pdf
PWC https://paperswithcode.com/paper/a-hybrid-approach-with-multi-channel-i
Repo
Framework

Near-optimal Sample Complexity Bounds for Robust Learning of Gaussians Mixtures via Compression Schemes

Title Near-optimal Sample Complexity Bounds for Robust Learning of Gaussians Mixtures via Compression Schemes
Authors Hassan Ashtiani, Shai Ben-David, Nick Harvey, Christopher Liaw, Abbas Mehrabian, Yaniv Plan
Abstract We prove that $\tilde{\Theta}(k d^2 / \varepsilon^2)$ samples are necessary and sufficient for learning a mixture of $k$ Gaussians in $\mathbb{R}^d$, up to error $\varepsilon$ in total variation distance. This improves both the known upper bounds and lower bounds for this problem. For mixtures of axis-aligned Gaussians, we show that $\tilde{O}(k d / \varepsilon^2)$ samples suffice, matching a known lower bound. Moreover, these results hold in the agnostic-learning/robust-estimation setting as well, where the target distribution is only approximately a mixture of Gaussians. The upper bound is shown using a novel technique for distribution learning based on a notion of “compression.” Any class of distributions that allows such a compression scheme can also be learned with few samples. Moreover, if a class of distributions has such a compression scheme, then so do the classes of products and mixtures of those distributions. The core of our main result is showing that the class of Gaussians in $\mathbb{R}^d$ admits a small-sized compression scheme.
Tasks
Published 2017-10-14
URL https://arxiv.org/abs/1710.05209v4
PDF https://arxiv.org/pdf/1710.05209v4.pdf
PWC https://paperswithcode.com/paper/near-optimal-sample-complexity-bounds-for
Repo
Framework
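
To get a feel for how the two bounds in the abstract scale, the sketch below evaluates their leading terms. It deliberately ignores the constants and polylogarithmic factors hidden by the tilde notation, so the outputs are orders of growth rather than actual sample counts. The function names and input values are illustrative assumptions.

```python
def general_mixture_scaling(k, d, eps):
    """Leading term k * d^2 / eps^2 for mixtures of k general Gaussians in R^d."""
    return k * d**2 / eps**2

def axis_aligned_mixture_scaling(k, d, eps):
    """Leading term k * d / eps^2 for mixtures of k axis-aligned Gaussians in R^d."""
    return k * d / eps**2

# Hypothetical setting: 4 components, 10 dimensions, total variation error 0.5.
k, d, eps = 4, 10, 0.5
general = general_mixture_scaling(k, d, eps)
axis_aligned = axis_aligned_mixture_scaling(k, d, eps)
ratio = general / axis_aligned  # equals d: the price of full covariance matrices
```

The ratio between the two terms is exactly $d$, reflecting that a general Gaussian has on the order of $d^2$ covariance parameters while an axis-aligned one has only $d$.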

DNN Filter Bank Cepstral Coefficients for Spoofing Detection

Title DNN Filter Bank Cepstral Coefficients for Spoofing Detection
Authors Hong Yu, Zheng-Hua Tan, Zhanyu Ma, Jun Guo
Abstract With the development of speech synthesis techniques, automatic speaker verification systems face the serious challenge of spoofing attack. In order to improve the reliability of speaker verification systems, we develop a new filter bank based cepstral feature, deep neural network filter bank cepstral coefficients (DNN-FBCC), to distinguish between natural and spoofed speech. The deep neural network filter bank is automatically generated by training a filter bank neural network (FBNN) using natural and synthetic speech. By adding restrictions on the training rules, the learned weight matrix of FBNN is band-limited and sorted by frequency, similar to the normal filter bank. Unlike the manually designed filter bank, the learned filter bank has different filter shapes in different channels, which can capture the differences between natural and synthetic speech more effectively. The experimental results on the ASVspoof 2015 database show that the Gaussian mixture model maximum-likelihood (GMM-ML) classifier trained by the new feature performs better than the state-of-the-art linear frequency cepstral coefficients (LFCC) based classifier, especially on detecting unknown attacks.
Tasks Speaker Verification, Speech Synthesis
Published 2017-02-13
URL http://arxiv.org/abs/1702.03791v1
PDF http://arxiv.org/pdf/1702.03791v1.pdf
PWC https://paperswithcode.com/paper/dnn-filter-bank-cepstral-coefficients-for
Repo
Framework

Unsupervised Domain Adaptation for Robust Speech Recognition via Variational Autoencoder-Based Data Augmentation

Title Unsupervised Domain Adaptation for Robust Speech Recognition via Variational Autoencoder-Based Data Augmentation
Authors Wei-Ning Hsu, Yu Zhang, James Glass
Abstract Domain mismatch between training and testing can lead to significant degradation in performance in many machine learning scenarios. Unfortunately, this is not a rare situation for automatic speech recognition deployments in real-world applications. Research on robust speech recognition can be regarded as trying to overcome this domain mismatch issue. In this paper, we address the unsupervised domain adaptation problem for robust speech recognition, where both source and target domain speech are presented, but word transcripts are only available for the source domain speech. We present novel augmentation-based methods that transform speech in a way that does not change the transcripts. Specifically, we first train a variational autoencoder on both source and target domain data (without supervision) to learn a latent representation of speech. We then transform nuisance attributes of speech that are irrelevant to recognition by modifying the latent representations, in order to augment labeled training data with additional data whose distribution is more similar to the target domain. The proposed method is evaluated on the CHiME-4 dataset and reduces the absolute word error rate (WER) by as much as 35% compared to the non-adapted baseline.
Tasks Data Augmentation, Domain Adaptation, Robust Speech Recognition, Speech Recognition, Unsupervised Domain Adaptation
Published 2017-07-19
URL http://arxiv.org/abs/1707.06265v2
PDF http://arxiv.org/pdf/1707.06265v2.pdf
PWC https://paperswithcode.com/paper/unsupervised-domain-adaptation-for-robust
Repo
Framework

Nonparametric Neural Networks

Title Nonparametric Neural Networks
Authors George Philipp, Jaime G. Carbonell
Abstract Automatically determining the optimal size of a neural network for a given task without prior information currently requires an expensive global search and training many networks from scratch. In this paper, we address the problem of automatically finding a good network size during a single training cycle. We introduce nonparametric neural networks, a non-probabilistic framework for conducting optimization over all possible network sizes and prove its soundness when network growth is limited via an L_p penalty. We train networks under this framework by continuously adding new units while eliminating redundant units via an L_2 penalty. We employ a novel optimization algorithm, which we term adaptive radial-angular gradient descent or AdaRad, and obtain promising results.
Tasks
Published 2017-12-14
URL http://arxiv.org/abs/1712.05440v1
PDF http://arxiv.org/pdf/1712.05440v1.pdf
PWC https://paperswithcode.com/paper/nonparametric-neural-networks
Repo
Framework

An Empirical Study of Discriminative Sequence Labeling Models for Vietnamese Text Processing

Title An Empirical Study of Discriminative Sequence Labeling Models for Vietnamese Text Processing
Authors Phuong Le-Hong, Minh Pham Quang Nhat, Thai-Hoang Pham, Tuan-Anh Tran, Dang-Minh Nguyen
Abstract This paper presents an empirical study of two widely-used sequence prediction models, Conditional Random Fields (CRFs) and Long Short-Term Memory Networks (LSTMs), on two fundamental tasks for Vietnamese text processing, including part-of-speech tagging and named entity recognition. We show that a strong lower bound for labeling accuracy can be obtained by relying only on simple word-based features with minimal hand-crafted feature engineering, of 90.65% and 86.03% performance scores on the standard test sets for the two tasks respectively. In particular, we demonstrate empirically the surprising efficiency of word embeddings in both of the two tasks, with both of the two models. We point out that the state-of-the-art LSTMs model does not always outperform significantly the traditional CRFs model, especially on moderate-sized data sets. Finally, we give some suggestions and discussions for efficient use of sequence labeling models in practical applications.
Tasks Feature Engineering, Named Entity Recognition, Part-Of-Speech Tagging, Word Embeddings
Published 2017-08-30
URL http://arxiv.org/abs/1708.09163v1
PDF http://arxiv.org/pdf/1708.09163v1.pdf
PWC https://paperswithcode.com/paper/an-empirical-study-of-discriminative-sequence
Repo
Framework

Channel-Recurrent Autoencoding for Image Modeling

Title Channel-Recurrent Autoencoding for Image Modeling
Authors Wenling Shang, Kihyuk Sohn, Yuandong Tian
Abstract Despite recent successes in synthesizing faces and bedrooms, existing generative models struggle to capture more complex image types, potentially due to the oversimplification of their latent space constructions. To tackle this issue, building on Variational Autoencoders (VAEs), we integrate recurrent connections across channels to both inference and generation steps, allowing the high-level features to be captured in global-to-local, coarse-to-fine manners. Combined with adversarial loss, our channel-recurrent VAE-GAN (crVAE-GAN) outperforms VAE-GAN in generating a diverse spectrum of high resolution images while maintaining the same level of computational efficacy. Our model produces interpretable and expressive latent representations to benefit downstream tasks such as image completion. Moreover, we propose two novel regularizations, namely the KL objective weighting scheme over time steps and mutual information maximization between transformed latent variables and the outputs, to enhance the training.
Tasks
Published 2017-06-12
URL http://arxiv.org/abs/1706.03729v2
PDF http://arxiv.org/pdf/1706.03729v2.pdf
PWC https://paperswithcode.com/paper/channel-recurrent-autoencoding-for-image
Repo
Framework

Smarnet: Teaching Machines to Read and Comprehend Like Human

Title Smarnet: Teaching Machines to Read and Comprehend Like Human
Authors Zheqian Chen, Rongqin Yang, Bin Cao, Zhou Zhao, Deng Cai, Xiaofei He
Abstract Machine Comprehension (MC) is a challenging task in the Natural Language Processing field, which aims to guide the machine to comprehend a passage and answer the given question. Many existing approaches to the MC task suffer from bottlenecks such as insufficient lexical understanding, complex question-passage interaction, and incorrect answer extraction. In this paper, we address these problems from the viewpoint of how humans deal with reading tests in a scientific way. Specifically, we first propose a novel lexical gating mechanism to dynamically combine word and character representations. We then guide the machines to read in an interactive way with an attention mechanism and a memory network. Finally, we add a checking layer to refine the answer for insurance. Extensive experiments on two popular datasets, SQuAD and TriviaQA, show that our method performs considerably better than most state-of-the-art solutions at the time of submission.
Tasks Question Answering, Reading Comprehension
Published 2017-10-08
URL http://arxiv.org/abs/1710.02772v1
PDF http://arxiv.org/pdf/1710.02772v1.pdf
PWC https://paperswithcode.com/paper/smarnet-teaching-machines-to-read-and
Repo
Framework

Two-view 3D Reconstruction for Food Volume Estimation

Title Two-view 3D Reconstruction for Food Volume Estimation
Authors Joachim Dehais, Marios Anthimopoulos, Sergey Shevchik, Stavroula Mougiakakou
Abstract The increasing prevalence of diet-related chronic diseases coupled with the ineffectiveness of traditional diet management methods have resulted in a need for novel tools to accurately and automatically assess meals. Recently, computer vision based systems that use meal images to assess their content have been proposed. Food portion estimation is the most difficult task for individuals assessing their meals and it is also the least studied area. The present paper proposes a three-stage system to calculate portion sizes using two images of a dish acquired by mobile devices. The first stage consists in understanding the configuration of the different views, after which a dense 3D model is built from the two images; finally, this 3D model serves to extract the volume of the different items. The system was extensively tested on 77 real dishes of known volume, and achieved an average error of less than 10% in 5.5 seconds per dish. The proposed pipeline is computationally tractable and requires no user input, making it a viable option for fully automated dietary assessment.
Tasks 3D Reconstruction
Published 2017-01-12
URL http://arxiv.org/abs/1701.03330v1
PDF http://arxiv.org/pdf/1701.03330v1.pdf
PWC https://paperswithcode.com/paper/two-view-3d-reconstruction-for-food-volume
Repo
Framework

Collaborative Descriptors: Convolutional Maps for Preprocessing

Title Collaborative Descriptors: Convolutional Maps for Preprocessing
Authors Hirokatsu Kataoka, Kaori Abe, Akio Nakamura, Yutaka Satoh
Abstract The paper presents a novel concept for collaborative descriptors between deeply learned and hand-crafted features. To achieve this concept, we apply convolutional maps for pre-processing, namely the convolutional maps are used as input of hand-crafted features. We recorded an increase in the performance rate of +17.06% (multi-class object recognition) and +24.71% (car detection) from grayscale input to convolutional maps. Although the framework is straight-forward, the concept should be inherited for an improved representation.
Tasks Object Recognition
Published 2017-05-10
URL http://arxiv.org/abs/1705.03595v1
PDF http://arxiv.org/pdf/1705.03595v1.pdf
PWC https://paperswithcode.com/paper/collaborative-descriptors-convolutional-maps
Repo
Framework