July 27, 2019


Paper Group ANR 621


What does Attention in Neural Machine Translation Pay Attention to?


Title	What does Attention in Neural Machine Translation Pay Attention to?
Authors	Hamidreza Ghader, Christof Monz
Abstract	Attention in neural machine translation provides the possibility to encode relevant parts of the source sentence at each translation step. As a result, attention is considered to be an alignment model as well. However, there is no work that specifically studies attention and provides analysis of what is being learned by attention models. Thus, the question remains of how attention is similar to or different from traditional alignment. In this paper, we provide a detailed analysis of attention and compare it to traditional alignment. We answer the question of whether attention is only capable of modelling translational equivalence or whether it captures more information. We show that attention is different from alignment in some cases and captures useful information other than alignments.
Tasks	Machine Translation
Published	2017-10-09
URL	http://arxiv.org/abs/1710.03348v1
PDF	http://arxiv.org/pdf/1710.03348v1.pdf
PWC	https://paperswithcode.com/paper/what-does-attention-in-neural-machine
Repo
Framework
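
To make the comparison between soft attention and hard alignment concrete, here is a minimal sketch. It is not the authors' code: the dot-product scoring, the function names, and the toy vectors are all assumptions for illustration, since the abstract does not specify the model or the alignment tool.

```python
import math

def softmax(scores):
    """Turn raw scores into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(decoder_state, encoder_states):
    """Soft attention over source positions (dot-product scoring is an assumption)."""
    scores = [sum(d * e for d, e in zip(decoder_state, enc)) for enc in encoder_states]
    return softmax(scores)

def agrees_with_alignment(weights, aligned_source_index):
    """True when the most-attended source position equals the hard alignment link."""
    most_attended = max(range(len(weights)), key=weights.__getitem__)
    return most_attended == aligned_source_index

# Hypothetical toy values: three source positions, one decoding step.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
decoder_state = [0.0, 2.0]
weights = attention_weights(decoder_state, encoder_states)
```

Counting how often `agrees_with_alignment` holds over a corpus is one simple way to quantify where attention and alignment diverge; the paper's own analysis may use different measures.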

Matching neural paths: transfer from recognition to correspondence search


Title	Matching neural paths: transfer from recognition to correspondence search
Authors	Nikolay Savinov, Lubor Ladicky, Marc Pollefeys
Abstract	Many machine learning tasks require finding per-part correspondences between objects. In this work we focus on low-level correspondences - a highly ambiguous matching problem. We propose to use a hierarchical semantic representation of the objects, coming from a convolutional neural network, to solve this ambiguity. Training it for low-level correspondence prediction directly might not be an option in some domains where the ground-truth correspondences are hard to obtain. We show how transfer from recognition can be used to avoid such training. Our idea is to mark parts as “matching” if their features are close to each other at all the levels of convolutional feature hierarchy (neural paths). Although the overall number of such paths is exponential in the number of layers, we propose a polynomial algorithm for aggregating all of them in a single backward pass. The empirical validation is done on the task of stereo correspondence and demonstrates that we achieve competitive results among the methods which do not use labeled target domain data.
Tasks
Published	2017-05-19
URL	http://arxiv.org/abs/1705.08272v3
PDF	http://arxiv.org/pdf/1705.08272v3.pdf
PWC	https://paperswithcode.com/paper/matching-neural-paths-transfer-from
Repo
Framework
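
The matching criterion in the abstract (two parts match if their features are close at every level of the hierarchy) can be sketched as below. This is only an illustration under stated assumptions: Euclidean distance, a single shared threshold, and the function name `is_match` are hypothetical, and the sketch does not implement the paper's polynomial aggregation over all neural paths.

```python
import math

def is_match(features_a, features_b, threshold=0.5):
    """Return True if the two parts are close at every feature level.

    features_a, features_b: one feature vector per network level (hypothetical layout).
    threshold: maximum Euclidean distance allowed at each level (an assumption).
    """
    if len(features_a) != len(features_b):
        raise ValueError("both parts need features from the same number of levels")
    return all(math.dist(fa, fb) <= threshold for fa, fb in zip(features_a, features_b))

# Hypothetical two-level features for three image parts.
part_a = [[0.0, 0.0], [1.0, 1.0]]
part_b = [[0.1, 0.0], [1.0, 1.2]]  # close to part_a at both levels
part_c = [[0.1, 0.0], [3.0, 1.0]]  # close at level 0, far at level 1
```

Requiring agreement at all levels is what rules out `part_c` here, even though it looks identical to `part_b` at the lowest level.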

Linearly constrained Gaussian processes


Title	Linearly constrained Gaussian processes
Authors	Carl Jidling, Niklas Wahlström, Adrian Wills, Thomas B. Schön
Abstract	We consider a modification of the covariance function in Gaussian processes to correctly account for known linear constraints. By modelling the target function as a transformation of an underlying function, the constraints are explicitly incorporated in the model such that they are guaranteed to be fulfilled by any sample drawn or prediction made. We also propose a constructive procedure for designing the transformation operator and illustrate the result on both simulated and real-data examples.
Tasks	Gaussian Processes
Published	2017-03-02
URL	http://arxiv.org/abs/1703.00787v2
PDF	http://arxiv.org/pdf/1703.00787v2.pdf
PWC	https://paperswithcode.com/paper/linearly-constrained-gaussian-processes
Repo
Framework
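
The construction described in the abstract can be written out as follows. The notation here is an assumption chosen for illustration (a zero-mean prior, a constraint operator $\mathcal{F}_x$, and a transformation operator $\mathcal{G}_x$); the divergence-free field is a standard example of a linear constraint rather than a result quoted from this listing.

```latex
% Constraint on the target function f, and the transformation of an underlying function g:
\mathcal{F}_x f = 0, \qquad f = \mathcal{G}_x g, \qquad g \sim \mathcal{GP}\bigl(0,\, k_g(x, x')\bigr)

% Linear operators preserve Gaussianity, so f is again a GP with a transformed covariance:
f \sim \mathcal{GP}\bigl(0,\; \mathcal{G}_x \, k_g(x, x') \, \mathcal{G}_{x'}^{\top}\bigr)

% The constraint holds for every sample and prediction whenever
\mathcal{F}_x \mathcal{G}_x = 0 \quad\Longrightarrow\quad \mathcal{F}_x f = \mathcal{F}_x \mathcal{G}_x g = 0

% Example: a divergence-free field in two dimensions
\mathcal{F}_x = \begin{bmatrix} \dfrac{\partial}{\partial x_1} & \dfrac{\partial}{\partial x_2} \end{bmatrix},
\qquad
\mathcal{G}_x = \begin{bmatrix} -\dfrac{\partial}{\partial x_2} \\[6pt] \dfrac{\partial}{\partial x_1} \end{bmatrix},
\qquad
\mathcal{F}_x \mathcal{G}_x = -\frac{\partial^2}{\partial x_1 \partial x_2} + \frac{\partial^2}{\partial x_2 \partial x_1} = 0
```

The last identity relies on $g$ being smooth enough for the mixed partial derivatives to commute.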

Exploring Speech Enhancement with Generative Adversarial Networks for Robust Speech Recognition


Title	Exploring Speech Enhancement with Generative Adversarial Networks for Robust Speech Recognition
Authors	Chris Donahue, Bo Li, Rohit Prabhavalkar
Abstract	We investigate the effectiveness of generative adversarial networks (GANs) for speech enhancement, in the context of improving noise robustness of automatic speech recognition (ASR) systems. Prior work demonstrates that GANs can effectively suppress additive noise in raw waveform speech signals, improving perceptual quality metrics; however this technique was not justified in the context of ASR. In this work, we conduct a detailed study to measure the effectiveness of GANs in enhancing speech contaminated by both additive and reverberant noise. Motivated by recent advances in image processing, we propose operating GANs on log-Mel filterbank spectra instead of waveforms, which requires less computation and is more robust to reverberant noise. While GAN enhancement improves the performance of a clean-trained ASR system on noisy speech, it falls short of the performance achieved by conventional multi-style training (MTR). By appending the GAN-enhanced features to the noisy inputs and retraining, we achieve a 7% WER improvement relative to the MTR system.
Tasks	Robust Speech Recognition, Speech Enhancement, Speech Recognition
Published	2017-11-15
URL	http://arxiv.org/abs/1711.05747v2
PDF	http://arxiv.org/pdf/1711.05747v2.pdf
PWC	https://paperswithcode.com/paper/exploring-speech-enhancement-with-generative
Repo
Framework
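
Because the abstract reports a relative WER improvement, the following sketch shows how relative and absolute improvements differ. The function names and the example numbers are hypothetical and are not results from the paper.

```python
def absolute_wer_improvement(baseline_wer, new_wer):
    """Difference in WER, in the same units as the inputs (e.g. percentage points)."""
    return baseline_wer - new_wer

def relative_wer_improvement(baseline_wer, new_wer):
    """Fraction of the baseline WER that was removed."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer

# Hypothetical numbers chosen only to illustrate a 7% relative improvement.
baseline, improved = 20.0, 18.6
absolute = absolute_wer_improvement(baseline, improved)  # about 1.4 points
relative = relative_wer_improvement(baseline, improved)  # about 0.07
```

A 7% relative gain on a 20% baseline is therefore only about 1.4 percentage points, which is why it matters to state which kind of improvement is being reported.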

Robust Speech Recognition Using Generative Adversarial Networks


Title	Robust Speech Recognition Using Generative Adversarial Networks
Authors	Anuroop Sriram, Heewoo Jun, Yashesh Gaur, Sanjeev Satheesh
Abstract	This paper describes a general, scalable, end-to-end framework that uses the generative adversarial network (GAN) objective to enable robust speech recognition. Encoders trained with the proposed approach enjoy improved invariance by learning to map noisy audio to the same embedding space as that of clean audio. Unlike previous methods, the new framework does not rely on domain expertise or simplifying assumptions as are often needed in signal processing, and directly encourages robustness in a data-driven way. We show the new approach improves simulated far-field speech recognition of vanilla sequence-to-sequence models without specialized front-ends or preprocessing.
Tasks	Robust Speech Recognition, Speech Recognition
Published	2017-11-05
URL	http://arxiv.org/abs/1711.01567v1
PDF	http://arxiv.org/pdf/1711.01567v1.pdf
PWC	https://paperswithcode.com/paper/robust-speech-recognition-using-generative
Repo
Framework

A Hybrid Approach with Multi-channel I-Vectors and Convolutional Neural Networks for Acoustic Scene Classification


Title	A Hybrid Approach with Multi-channel I-Vectors and Convolutional Neural Networks for Acoustic Scene Classification
Authors	Hamid Eghbal-zadeh, Bernhard Lehner, Matthias Dorfer, Gerhard Widmer
Abstract	In Acoustic Scene Classification (ASC) two major approaches have been followed. While one utilizes engineered features such as mel-frequency-cepstral-coefficients (MFCCs), the other uses learned features that are the outcome of an optimization algorithm. I-vectors are the result of a modeling technique that usually takes engineered features as input. It has been shown that standard MFCCs extracted from monaural audio signals lead to i-vectors that exhibit poor performance, especially on indoor acoustic scenes. At the same time, Convolutional Neural Networks (CNNs) are well known for their ability to learn features by optimizing their filters. They have been applied on ASC and have shown promising results. In this paper, we first propose a novel multi-channel i-vector extraction and scoring scheme for ASC, improving their performance on indoor and outdoor scenes. Second, we propose a CNN architecture that achieves promising ASC results. Further, we show that i-vectors and CNNs capture complementary information from acoustic scenes. Finally, we propose a hybrid system for ASC using multi-channel i-vectors and CNNs by utilizing a score fusion technique. Using our method, we participated in the ASC task of the DCASE-2016 challenge. Our hybrid approach achieved 1st rank among 49 submissions, substantially improving the previous state of the art.
Tasks	Acoustic Scene Classification, Scene Classification
Published	2017-06-20
URL	http://arxiv.org/abs/1706.06525v1
PDF	http://arxiv.org/pdf/1706.06525v1.pdf
PWC	https://paperswithcode.com/paper/a-hybrid-approach-with-multi-channel-i
Repo
Framework
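
The abstract mentions score fusion without naming the exact technique, so the sketch below uses a plain weighted sum of per-class scores as a stand-in. The function names, the weight, and the scene labels and scores are all hypothetical.

```python
def fuse_scores(ivector_scores, cnn_scores, weight=0.5):
    """Linearly combine per-class scores from two systems (linear fusion is an assumption)."""
    if ivector_scores.keys() != cnn_scores.keys():
        raise ValueError("both systems must score the same set of classes")
    return {
        label: weight * ivector_scores[label] + (1.0 - weight) * cnn_scores[label]
        for label in ivector_scores
    }

def predict(fused_scores):
    """Pick the class with the highest fused score."""
    return max(fused_scores, key=fused_scores.get)

# Hypothetical per-class scores for one audio clip.
ivector_scores = {"park": 0.6, "home": 0.3, "office": 0.1}
cnn_scores = {"park": 0.2, "home": 0.7, "office": 0.1}
fused = fuse_scores(ivector_scores, cnn_scores, weight=0.5)
label = predict(fused)
```

In practice the scores from the two systems would usually be calibrated to a common scale before fusion; that step is omitted here.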

Near-optimal Sample Complexity Bounds for Robust Learning of Gaussians Mixtures via Compression Schemes


Title	Near-optimal Sample Complexity Bounds for Robust Learning of Gaussians Mixtures via Compression Schemes
Authors	Hassan Ashtiani, Shai Ben-David, Nick Harvey, Christopher Liaw, Abbas Mehrabian, Yaniv Plan
Abstract	We prove that $\tilde{\Theta}(k d^2 / \varepsilon^2)$ samples are necessary and sufficient for learning a mixture of $k$ Gaussians in $\mathbb{R}^d$, up to error $\varepsilon$ in total variation distance. This improves both the known upper bounds and lower bounds for this problem. For mixtures of axis-aligned Gaussians, we show that $\tilde{O}(k d / \varepsilon^2)$ samples suffice, matching a known lower bound. Moreover, these results hold in the agnostic-learning/robust-estimation setting as well, where the target distribution is only approximately a mixture of Gaussians. The upper bound is shown using a novel technique for distribution learning based on a notion of “compression.” Any class of distributions that allows such a compression scheme can also be learned with few samples. Moreover, if a class of distributions has such a compression scheme, then so do the classes of products and mixtures of those distributions. The core of our main result is showing that the class of Gaussians in $\mathbb{R}^d$ admits a small-sized compression scheme.
Tasks
Published	2017-10-14
URL	https://arxiv.org/abs/1710.05209v4
PDF	https://arxiv.org/pdf/1710.05209v4.pdf
PWC	https://paperswithcode.com/paper/near-optimal-sample-complexity-bounds-for
Repo
Framework
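
To get a feel for how the two bounds in the abstract scale, the sketch below evaluates $k d^2 / \varepsilon^2$ and $k d / \varepsilon^2$ directly. It ignores the constants and logarithmic factors hidden by the tilde notation, so the outputs indicate growth rates only, not actual sample counts; the function name and example values are hypothetical.

```python
def mixture_sample_scale(k, d, eps, axis_aligned=False):
    """Leading-order scaling of the sample complexity, without constants or log factors.

    General Gaussians:      k * d^2 / eps^2
    Axis-aligned Gaussians: k * d   / eps^2
    """
    if k < 1 or d < 1:
        raise ValueError("k and d must be at least 1")
    if not 0 < eps <= 1:
        raise ValueError("total variation error must lie in (0, 1]")
    per_component = d if axis_aligned else d * d
    return k * per_component / eps ** 2

# Hypothetical setting: 5 components in 10 dimensions, target error 0.1.
general = mixture_sample_scale(5, 10, 0.1)
axis_aligned = mixture_sample_scale(5, 10, 0.1, axis_aligned=True)
```

The ratio between the two values equals the dimension $d$, which reflects the $d^2$ covariance parameters of a general Gaussian versus the $d$ variances of an axis-aligned one.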

DNN Filter Bank Cepstral Coefficients for Spoofing Detection


Title	DNN Filter Bank Cepstral Coefficients for Spoofing Detection
Authors	Hong Yu, Zheng-Hua Tan, Zhanyu Ma, Jun Guo
Abstract	With the development of speech synthesis techniques, automatic speaker verification systems face the serious challenge of spoofing attack. In order to improve the reliability of speaker verification systems, we develop a new filter bank based cepstral feature, deep neural network filter bank cepstral coefficients (DNN-FBCC), to distinguish between natural and spoofed speech. The deep neural network filter bank is automatically generated by training a filter bank neural network (FBNN) using natural and synthetic speech. By adding restrictions on the training rules, the learned weight matrix of FBNN is band-limited and sorted by frequency, similar to the normal filter bank. Unlike the manually designed filter bank, the learned filter bank has different filter shapes in different channels, which can capture the differences between natural and synthetic speech more effectively. The experimental results on the ASVspoof 2015 database show that the Gaussian mixture model maximum-likelihood (GMM-ML) classifier trained by the new feature performs better than the state-of-the-art linear frequency cepstral coefficients (LFCC) based classifier, especially on detecting unknown attacks.
Tasks	Speaker Verification, Speech Synthesis
Published	2017-02-13
URL	http://arxiv.org/abs/1702.03791v1
PDF	http://arxiv.org/pdf/1702.03791v1.pdf
PWC	https://paperswithcode.com/paper/dnn-filter-bank-cepstral-coefficients-for
Repo
Framework

Unsupervised Domain Adaptation for Robust Speech Recognition via Variational Autoencoder-Based Data Augmentation


Title	Unsupervised Domain Adaptation for Robust Speech Recognition via Variational Autoencoder-Based Data Augmentation
Authors	Wei-Ning Hsu, Yu Zhang, James Glass
Abstract	Domain mismatch between training and testing can lead to significant degradation in performance in many machine learning scenarios. Unfortunately, this is not a rare situation for automatic speech recognition deployments in real-world applications. Research on robust speech recognition can be regarded as trying to overcome this domain mismatch issue. In this paper, we address the unsupervised domain adaptation problem for robust speech recognition, where both source and target domain speech are presented, but word transcripts are only available for the source domain speech. We present novel augmentation-based methods that transform speech in a way that does not change the transcripts. Specifically, we first train a variational autoencoder on both source and target domain data (without supervision) to learn a latent representation of speech. We then transform nuisance attributes of speech that are irrelevant to recognition by modifying the latent representations, in order to augment labeled training data with additional data whose distribution is more similar to the target domain. The proposed method is evaluated on the CHiME-4 dataset and reduces the absolute word error rate (WER) by as much as 35% compared to the non-adapted baseline.
Tasks	Data Augmentation, Domain Adaptation, Robust Speech Recognition, Speech Recognition, Unsupervised Domain Adaptation
Published	2017-07-19
URL	http://arxiv.org/abs/1707.06265v2
PDF	http://arxiv.org/pdf/1707.06265v2.pdf
PWC	https://paperswithcode.com/paper/unsupervised-domain-adaptation-for-robust
Repo
Framework

Nonparametric Neural Networks


Title	Nonparametric Neural Networks
Authors	George Philipp, Jaime G. Carbonell
Abstract	Automatically determining the optimal size of a neural network for a given task without prior information currently requires an expensive global search and training many networks from scratch. In this paper, we address the problem of automatically finding a good network size during a single training cycle. We introduce nonparametric neural networks, a non-probabilistic framework for conducting optimization over all possible network sizes and prove its soundness when network growth is limited via an L_p penalty. We train networks under this framework by continuously adding new units while eliminating redundant units via an L_2 penalty. We employ a novel optimization algorithm, which we term adaptive radial-angular gradient descent or AdaRad, and obtain promising results.
Tasks
Published	2017-12-14
URL	http://arxiv.org/abs/1712.05440v1
PDF	http://arxiv.org/pdf/1712.05440v1.pdf
PWC	https://paperswithcode.com/paper/nonparametric-neural-networks
Repo
Framework

An Empirical Study of Discriminative Sequence Labeling Models for Vietnamese Text Processing


Title	An Empirical Study of Discriminative Sequence Labeling Models for Vietnamese Text Processing
Authors	Phuong Le-Hong, Minh Pham Quang Nhat, Thai-Hoang Pham, Tuan-Anh Tran, Dang-Minh Nguyen
Abstract	This paper presents an empirical study of two widely-used sequence prediction models, Conditional Random Fields (CRFs) and Long Short-Term Memory Networks (LSTMs), on two fundamental tasks for Vietnamese text processing, including part-of-speech tagging and named entity recognition. We show that a strong lower bound for labeling accuracy can be obtained by relying only on simple word-based features with minimal hand-crafted feature engineering, of 90.65% and 86.03% performance scores on the standard test sets for the two tasks respectively. In particular, we demonstrate empirically the surprising efficiency of word embeddings in both of the two tasks, with both of the two models. We point out that the state-of-the-art LSTMs model does not always outperform significantly the traditional CRFs model, especially on moderate-sized data sets. Finally, we give some suggestions and discussions for efficient use of sequence labeling models in practical applications.
Tasks	Feature Engineering, Named Entity Recognition, Part-Of-Speech Tagging, Word Embeddings
Published	2017-08-30
URL	http://arxiv.org/abs/1708.09163v1
PDF	http://arxiv.org/pdf/1708.09163v1.pdf
PWC	https://paperswithcode.com/paper/an-empirical-study-of-discriminative-sequence
Repo
Framework

Channel-Recurrent Autoencoding for Image Modeling


Title	Channel-Recurrent Autoencoding for Image Modeling
Authors	Wenling Shang, Kihyuk Sohn, Yuandong Tian
Abstract	Despite recent successes in synthesizing faces and bedrooms, existing generative models struggle to capture more complex image types, potentially due to the oversimplification of their latent space constructions. To tackle this issue, building on Variational Autoencoders (VAEs), we integrate recurrent connections across channels to both inference and generation steps, allowing the high-level features to be captured in global-to-local, coarse-to-fine manners. Combined with adversarial loss, our channel-recurrent VAE-GAN (crVAE-GAN) outperforms VAE-GAN in generating a diverse spectrum of high resolution images while maintaining the same level of computational efficacy. Our model produces interpretable and expressive latent representations to benefit downstream tasks such as image completion. Moreover, we propose two novel regularizations, namely the KL objective weighting scheme over time steps and mutual information maximization between transformed latent variables and the outputs, to enhance the training.
Tasks
Published	2017-06-12
URL	http://arxiv.org/abs/1706.03729v2
PDF	http://arxiv.org/pdf/1706.03729v2.pdf
PWC	https://paperswithcode.com/paper/channel-recurrent-autoencoding-for-image
Repo
Framework

Smarnet: Teaching Machines to Read and Comprehend Like Human


Title	Smarnet: Teaching Machines to Read and Comprehend Like Human
Authors	Zheqian Chen, Rongqin Yang, Bin Cao, Zhou Zhao, Deng Cai, Xiaofei He
Abstract	Machine Comprehension (MC) is a challenging task in the field of Natural Language Processing, which aims to guide the machine to comprehend a passage and answer a given question. Many existing approaches to the MC task suffer from bottlenecks such as insufficient lexical understanding, complex question-passage interaction, and incorrect answer extraction. In this paper, we address these problems from the viewpoint of how humans deal with reading tests in a scientific way. Specifically, we first propose a novel lexical gating mechanism to dynamically combine the word and character representations. We then guide the machine to read in an interactive way with an attention mechanism and a memory network. Finally, we add a checking layer to refine the answer as a safeguard. Extensive experiments on two popular datasets, SQuAD and TriviaQA, show that our method outperforms most state-of-the-art solutions at the time of submission.
Tasks	Question Answering, Reading Comprehension
Published	2017-10-08
URL	http://arxiv.org/abs/1710.02772v1
PDF	http://arxiv.org/pdf/1710.02772v1.pdf
PWC	https://paperswithcode.com/paper/smarnet-teaching-machines-to-read-and
Repo
Framework
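
The lexical gating idea in the abstract (a learned gate that mixes word-level and character-level representations) can be illustrated with the sketch below. It is an assumption-laden simplification: the element-wise sigmoid gate, the function names, and the toy vectors are hypothetical, and in a real model the gate logits would come from a trained layer rather than being passed in by hand.

```python
import math

def sigmoid(x):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def gated_combine(word_vec, char_vec, gate_logits):
    """Element-wise mix: gate * word + (1 - gate) * char, with gate = sigmoid(logit)."""
    if not (len(word_vec) == len(char_vec) == len(gate_logits)):
        raise ValueError("all inputs must have the same dimension")
    gates = [sigmoid(z) for z in gate_logits]
    return [g * w + (1.0 - g) * c for g, w, c in zip(gates, word_vec, char_vec)]

# Hypothetical 2-dimensional representations; zero logits give an even mix.
combined = gated_combine([1.0, 0.0], [0.0, 1.0], [0.0, 0.0])
```

Large positive logits push the output toward the word representation and large negative logits toward the character representation, which is how such a gate could fall back on characters for rare or unseen words.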

Two-view 3D Reconstruction for Food Volume Estimation


Title	Two-view 3D Reconstruction for Food Volume Estimation
Authors	Joachim Dehais, Marios Anthimopoulos, Sergey Shevchik, Stavroula Mougiakakou
Abstract	The increasing prevalence of diet-related chronic diseases coupled with the ineffectiveness of traditional diet management methods have resulted in a need for novel tools to accurately and automatically assess meals. Recently, computer vision based systems that use meal images to assess their content have been proposed. Food portion estimation is the most difficult task for individuals assessing their meals and it is also the least studied area. The present paper proposes a three-stage system to calculate portion sizes using two images of a dish acquired by mobile devices. The first stage consists in understanding the configuration of the different views, after which a dense 3D model is built from the two images; finally, this 3D model serves to extract the volume of the different items. The system was extensively tested on 77 real dishes of known volume, and achieved an average error of less than 10% in 5.5 seconds per dish. The proposed pipeline is computationally tractable and requires no user input, making it a viable option for fully automated dietary assessment.
Tasks	3D Reconstruction
Published	2017-01-12
URL	http://arxiv.org/abs/1701.03330v1
PDF	http://arxiv.org/pdf/1701.03330v1.pdf
PWC	https://paperswithcode.com/paper/two-view-3d-reconstruction-for-food-volume
Repo
Framework

Collaborative Descriptors: Convolutional Maps for Preprocessing


Title	Collaborative Descriptors: Convolutional Maps for Preprocessing
Authors	Hirokatsu Kataoka, Kaori Abe, Akio Nakamura, Yutaka Satoh
Abstract	The paper presents a novel concept for collaborative descriptors between deeply learned and hand-crafted features. To achieve this concept, we apply convolutional maps for pre-processing, namely the convolutional maps are used as input of hand-crafted features. We recorded an increase in the performance rate of +17.06% (multi-class object recognition) and +24.71% (car detection) from grayscale input to convolutional maps. Although the framework is straight-forward, the concept should be inherited for an improved representation.
Tasks	Object Recognition
Published	2017-05-10
URL	http://arxiv.org/abs/1705.03595v1
PDF	http://arxiv.org/pdf/1705.03595v1.pdf
PWC	https://paperswithcode.com/paper/collaborative-descriptors-convolutional-maps
Repo
Framework