January 29, 2020

Paper Group ANR 730

Phase-aware Speech Enhancement with Deep Complex U-Net


Title	Phase-aware Speech Enhancement with Deep Complex U-Net
Authors	Hyeong-Seok Choi, Jang-Hyun Kim, Jaesung Huh, Adrian Kim, Jung-Woo Ha, Kyogu Lee
Abstract	Most deep learning-based models for speech enhancement have mainly focused on estimating the magnitude of spectrogram while reusing the phase from noisy speech for reconstruction. This is due to the difficulty of estimating the phase of clean speech. To improve speech enhancement performance, we tackle the phase estimation problem in three ways. First, we propose Deep Complex U-Net, an advanced U-Net structured model incorporating well-defined complex-valued building blocks to deal with complex-valued spectrograms. Second, we propose a polar coordinate-wise complex-valued masking method to reflect the distribution of complex ideal ratio masks. Third, we define a novel loss function, weighted source-to-distortion ratio (wSDR) loss, which is designed to directly correlate with a quantitative evaluation measure. Our model was evaluated on a mixture of the Voice Bank corpus and DEMAND database, which has been widely used by many deep learning models for speech enhancement. Ablation experiments were conducted on the mixed dataset showing that all three proposed approaches are empirically valid. Experimental results show that the proposed method achieves state-of-the-art performance in all metrics, outperforming previous approaches by a large margin.
Tasks	Speech Enhancement
Published	2019-03-07
URL	http://arxiv.org/abs/1903.03107v2
PDF	http://arxiv.org/pdf/1903.03107v2.pdf
PWC	https://paperswithcode.com/paper/phase-aware-speech-enhancement-with-deep-1
Repo
Framework
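
The abstract above names a weighted source-to-distortion ratio (wSDR) loss but does not spell it out. The following is a minimal sketch under an assumed formulation: a negative cosine similarity between target and estimate, computed for both the speech and the residual noise, and blended by the relative energy of the clean speech. The function names, the `eps` stabilizer, and the list-based signal representation are illustrative choices, not taken from the listing.

```python
import math


def _dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def _neg_cosine(target, estimate, eps=1e-8):
    """Negative cosine similarity; -1 means a perfectly aligned estimate."""
    denom = math.sqrt(_dot(target, target)) * math.sqrt(_dot(estimate, estimate))
    return -_dot(target, estimate) / (denom + eps)


def wsdr_loss(noisy, clean, estimate, eps=1e-8):
    """Assumed wSDR form: energy-weighted sum of speech and noise terms."""
    noise = [x - y for x, y in zip(noisy, clean)]          # true noise
    noise_est = [x - y for x, y in zip(noisy, estimate)]   # implied noise estimate
    e_clean = _dot(clean, clean)
    e_noise = _dot(noise, noise)
    alpha = e_clean / (e_clean + e_noise + eps)            # weight on the speech term
    return (alpha * _neg_cosine(clean, estimate, eps)
            + (1.0 - alpha) * _neg_cosine(noise, noise_est, eps))


if __name__ == "__main__":
    noisy = [1.0, 0.5, -0.5, 2.0]
    clean = [0.5, 0.5, -1.0, 1.0]
    print(wsdr_loss(noisy, clean, clean))   # perfect estimate, close to -1
    print(wsdr_loss(noisy, clean, noisy))   # no enhancement, noticeably higher
```

Under this assumed form the loss is bounded in [-1, 1], and weighting by energy keeps the noise term informative when the clean signal is nearly silent.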

Continual adaptation for efficient machine communication


Title	Continual adaptation for efficient machine communication
Authors	Robert D. Hawkins, Minae Kwon, Dorsa Sadigh, Noah D. Goodman
Abstract	To communicate with new partners in new contexts, humans rapidly form new linguistic conventions. Recent language models trained with deep neural networks are able to comprehend and produce the existing conventions present in their training data, but are not able to flexibly and interactively adapt those conventions on the fly as humans do. We introduce a repeated reference task as a benchmark for models of adaptation in communication and propose a regularized continual learning framework that allows an artificial agent initialized with a generic language model to more accurately and efficiently communicate with a partner over time. We evaluate this framework through simulations on COCO and in real-time reference game experiments with human partners.
Tasks	Continual Learning, Language Modelling
Published	2019-11-22
URL	https://arxiv.org/abs/1911.09896v1
PDF	https://arxiv.org/pdf/1911.09896v1.pdf
PWC	https://paperswithcode.com/paper/continual-adaptation-for-efficient-machine
Repo
Framework

An improved uncertainty propagation method for robust i-vector based speaker recognition


Title	An improved uncertainty propagation method for robust i-vector based speaker recognition
Authors	Dayana Ribas, Emmanuel Vincent
Abstract	The performance of automatic speaker recognition systems degrades when facing distorted speech data containing additive noise and/or reverberation. Statistical uncertainty propagation has been introduced as a promising paradigm to address this challenge. So far, different uncertainty propagation methods have been proposed to compensate noise and reverberation in i-vectors in the context of speaker recognition. They have achieved promising results on small datasets such as YOHO and Wall Street Journal, but little or no improvement on the larger, highly variable NIST Speaker Recognition Evaluation (SRE) corpus. In this paper, we propose a complete uncertainty propagation method, whereby we model the effect of uncertainty both in the computation of unbiased Baum-Welch statistics and in the derivation of the posterior expectation of the i-vector. We conduct experiments on the NIST-SRE corpus mixed with real domestic noise and reverberation from the CHiME-2 corpus and preprocessed by multichannel speech enhancement. The proposed method improves the equal error rate (EER) by 4% relative compared to a conventional i-vector based speaker verification baseline. This is to be compared with previous methods which degrade performance.
Tasks	Speaker Recognition, Speaker Verification, Speech Enhancement
Published	2019-02-15
URL	http://arxiv.org/abs/1902.05761v2
PDF	http://arxiv.org/pdf/1902.05761v2.pdf
PWC	https://paperswithcode.com/paper/an-improved-uncertainty-propagation-method
Repo
Framework
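
The result above is stated as a relative change in equal error rate (EER). As a rough illustration of what that metric measures, the sketch below sweeps a decision threshold over verification scores and reports the point where the false acceptance and false rejection rates are closest; it then computes a relative improvement. The scores and the EER values in the usage lines are made up for illustration and are not results from the paper.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER by sweeping thresholds over all observed scores."""
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, best_eer = None, None
    for t in thresholds:
        # False rejection: a genuine (target) trial scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: an impostor (non-target) trial scored at or above it.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2.0
    return best_eer


def relative_improvement(baseline, improved):
    """Fractional reduction of an error rate relative to the baseline."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    targets = [0.9, 0.8, 0.7, 0.4]      # hypothetical genuine-trial scores
    nontargets = [0.6, 0.3, 0.2, 0.1]   # hypothetical impostor-trial scores
    print(equal_error_rate(targets, nontargets))
    print(relative_improvement(10.0, 9.6))  # hypothetical EERs in percent
```

A relative figure such as 4% therefore describes the change as a fraction of the baseline error rate, not a drop of four absolute percentage points.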

Multi-level Texture Encoding and Representation (MuLTER) based on Deep Neural Networks


Title	Multi-level Texture Encoding and Representation (MuLTER) based on Deep Neural Networks
Authors	Yuting Hu, Zhiling Long, Ghassan AlRegib
Abstract	In this paper, we propose a multi-level texture encoding and representation network (MuLTER) for texture-related applications. Based on a multi-level pooling architecture, the MuLTER network simultaneously leverages low- and high-level features to maintain both texture details and spatial information. Such a pooling architecture involves few extra parameters and keeps feature dimensions fixed despite changes in image size. In comparison with state-of-the-art texture descriptors, the MuLTER network yields higher recognition accuracy on typical texture datasets such as MINC-2500 and GTOS-mobile with a discriminative and compact representation. In addition, we analyze the impact of combining features from different levels, which supports our claim that the fusion of multi-level features efficiently enhances recognition performance. Our source code will be published on GitHub (https://github.com/olivesgatech).
Tasks
Published	2019-05-23
URL	https://arxiv.org/abs/1905.09907v1
PDF	https://arxiv.org/pdf/1905.09907v1.pdf
PWC	https://paperswithcode.com/paper/multi-level-texture-encoding-and
Repo
Framework

Probabilistic Rollouts for Learning Curve Extrapolation Across Hyperparameter Settings


Title	Probabilistic Rollouts for Learning Curve Extrapolation Across Hyperparameter Settings
Authors	Matilde Gargiani, Aaron Klein, Stefan Falkner, Frank Hutter
Abstract	We propose probabilistic models that can extrapolate learning curves of iterative machine learning algorithms, such as stochastic gradient descent for training deep networks, based on training data with variable-length learning curves. We study instantiations of this framework based on random forests and Bayesian recurrent neural networks. Our experiments show that these models yield better predictions than state-of-the-art models from the hyperparameter optimization literature when extrapolating the performance of neural networks trained with different hyperparameter settings.
Tasks	Hyperparameter Optimization
Published	2019-10-10
URL	https://arxiv.org/abs/1910.04522v1
PDF	https://arxiv.org/pdf/1910.04522v1.pdf
PWC	https://paperswithcode.com/paper/probabilistic-rollouts-for-learning-curve
Repo
Framework

GlossBERT: BERT for Word Sense Disambiguation with Gloss Knowledge


Title	GlossBERT: BERT for Word Sense Disambiguation with Gloss Knowledge
Authors	Luyao Huang, Chi Sun, Xipeng Qiu, Xuanjing Huang
Abstract	Word Sense Disambiguation (WSD) aims to find the exact sense of an ambiguous word in a particular context. Traditional supervised methods rarely take into consideration the lexical resources like WordNet, which are widely utilized in knowledge-based methods. Recent studies have shown the effectiveness of incorporating gloss (sense definition) into neural networks for WSD. However, compared with traditional word expert supervised methods, they have not achieved much improvement. In this paper, we focus on how to better leverage gloss knowledge in a supervised neural WSD system. We construct context-gloss pairs and propose three BERT-based models for WSD. We fine-tune the pre-trained BERT model on SemCor3.0 training corpus and the experimental results on several English all-words WSD benchmark datasets show that our approach outperforms the state-of-the-art systems.
Tasks	Word Sense Disambiguation
Published	2019-08-20
URL	https://arxiv.org/abs/1908.07245v4
PDF	https://arxiv.org/pdf/1908.07245v4.pdf
PWC	https://paperswithcode.com/paper/glossbert-bert-for-word-sense-disambiguation
Repo
Framework

HPC AI500: A Benchmark Suite for HPC AI Systems


Title	HPC AI500: A Benchmark Suite for HPC AI Systems
Authors	Zihan Jiang, Wanling Gao, Lei Wang, Xingwang Xiong, Yuchen Zhang, Xu Wen, Chunjie Luo, Hainan Ye, Yunquan Zhang, Shengzhong Feng, Kenli Li, Weijia Xu, Jianfeng Zhan
Abstract	In recent years, with the trend of applying deep learning (DL) in high performance scientific computing, the unique characteristics of emerging DL workloads in HPC raise great challenges in designing and implementing HPC AI systems. The community needs a new yardstick for evaluating future HPC systems. In this paper, we propose HPC AI500, a benchmark suite for evaluating HPC systems that run scientific DL workloads. Covering the most representative scientific fields, each workload from HPC AI500 is based on real-world scientific DL applications. Currently, we choose 14 scientific DL benchmarks from perspectives of application scenarios, data sets, and software stack. We propose a set of metrics for comprehensively evaluating the HPC AI systems, considering accuracy and performance as well as power and cost. We provide a scalable reference implementation of HPC AI500. HPC AI500 is a part of the open-source AIBench project; the specification and source code are publicly available from http://www.benchcouncil.org/AIBench/index.html.
Tasks
Published	2019-07-27
URL	https://arxiv.org/abs/1908.02607v3
PDF	https://arxiv.org/pdf/1908.02607v3.pdf
PWC	https://paperswithcode.com/paper/hpc-ai500-a-benchmark-suite-for-hpc-ai
Repo
Framework

Exponentially convergent stochastic k-PCA without variance reduction


Title	Exponentially convergent stochastic k-PCA without variance reduction
Authors	Cheng Tang
Abstract	We present Matrix Krasulina, an algorithm for online k-PCA, by generalizing the classic Krasulina’s method (Krasulina, 1969) from the vector to the matrix case. We show, both theoretically and empirically, that the algorithm naturally adapts to data low-rankness and converges exponentially fast to the ground-truth principal subspace. Notably, our result suggests that despite various recent efforts to accelerate the convergence of stochastic-gradient based methods by adding an O(n)-time variance reduction step, for the k-PCA problem, a truly online SGD variant suffices to achieve exponential convergence on intrinsically low-rank data.
Tasks
Published	2019-04-03
URL	http://arxiv.org/abs/1904.01750v1
PDF	http://arxiv.org/pdf/1904.01750v1.pdf
PWC	https://paperswithcode.com/paper/exponentially-convergent-stochastic-k-pca
Repo
Framework
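
The abstract above builds on the classic vector form of Krasulina's method. The sketch below shows only that classic single-component update, w ← w + η (x xᵀw − (wᵀx xᵀw / ‖w‖²) w), on synthetic rank-one data; it is not the Matrix Krasulina algorithm from the paper. The step size, seed, data direction `u`, and function names are assumptions chosen for illustration.

```python
import math
import random


def krasulina_step(w, x, lr):
    """One classic (vector) Krasulina update for the top principal direction."""
    proj = sum(wi * xi for wi, xi in zip(w, x))   # x^T w
    norm_sq = sum(wi * wi for wi in w)            # ||w||^2
    # Move w toward x, minus the component that would only rescale w.
    return [wi + lr * (proj * xi - (proj * proj / norm_sq) * wi)
            for wi, xi in zip(w, x)]


def run_demo(steps=500, lr=0.1, seed=0):
    """Stream rank-one samples x = s * u and return |cos(angle(w, u))|."""
    rng = random.Random(seed)
    u = [0.6, 0.8]    # hypothetical ground-truth direction (unit norm)
    w = [1.0, 0.0]    # initial guess, not orthogonal to u
    for _ in range(steps):
        s = rng.gauss(0.0, 1.0)
        x = [s * ui for ui in u]
        w = krasulina_step(w, x, lr)
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(sum(wi * ui for wi, ui in zip(w, u))) / norm


if __name__ == "__main__":
    print(run_demo())  # alignment with u; approaches 1 on this low-rank stream
```

Each step touches a single sample and keeps no running average or stored gradients, which is the sense in which such an update is fully online.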

Sex-Prediction from Periocular Images across Multiple Sensors and Spectra


Title	Sex-Prediction from Periocular Images across Multiple Sensors and Spectra
Authors	Juan Tapia, Christian Rathgeb, Christoph Busch
Abstract	In this paper, we provide a comprehensive analysis of periocular-based sex-prediction (commonly referred to as gender classification) using state-of-the-art machine learning techniques. In order to reflect a more challenging scenario where periocular images are likely to be obtained from an unknown source, i.e. sensor, convolutional neural networks are trained on fused sets composed of several near-infrared (NIR) and visible wavelength (VW) image databases. In a cross-sensor scenario within each spectrum an average classification accuracy of approximately 85% is achieved. When sex-prediction is performed across spectra an average classification accuracy of about 82% is obtained. Finally, a multi-spectral sex-prediction yields a classification accuracy of 83% on average. Compared to proposed works, obtained results provide a more realistic estimation of the feasibility to predict a subject’s sex from the periocular region.
Tasks
Published	2019-05-01
URL	http://arxiv.org/abs/1905.00396v1
PDF	http://arxiv.org/pdf/1905.00396v1.pdf
PWC	https://paperswithcode.com/paper/sex-prediction-from-periocular-images-across
Repo
Framework

Deep Speech Enhancement for Reverberated and Noisy Signals using Wide Residual Networks


Title	Deep Speech Enhancement for Reverberated and Noisy Signals using Wide Residual Networks
Authors	Dayana Ribas, Jorge Llombart, Antonio Miguel, Luis Vicente
Abstract	This paper proposes a deep speech enhancement method which exploits the high potential of residual connections in a wide neural network architecture, a topology known as Wide Residual Network. It relies on one-dimensional convolutions computed along the time domain, which is a powerful approach to process contextually correlated representations through the temporal domain, such as speech feature sequences. We find the residual mechanism extremely useful for the enhancement task since the signal always has a linear shortcut and the non-linear path enhances it in several steps by adding or subtracting corrections. The enhancement capacity of the proposal is assessed by objective quality metrics and the performance of a speech recognition system. This was evaluated in the framework of the REVERB Challenge dataset, including simulated and real samples of reverberated and noisy speech signals. Results showed that enhanced speech from the proposed method succeeded in both the enhancement task, aimed at intelligibility, and the speech recognition system. The DNN model, trained with artificially synthesized reverberation data, was able to deal with far-field reverberated speech from real scenarios. Furthermore, the method was able to take advantage of the residual connection, managing to enhance signals with low noise levels, which is usually a strong handicap of traditional enhancement methods.
Tasks	Speech Enhancement, Speech Recognition
Published	2019-01-03
URL	http://arxiv.org/abs/1901.00660v1
PDF	http://arxiv.org/pdf/1901.00660v1.pdf
PWC	https://paperswithcode.com/paper/deep-speech-enhancement-for-reverberated-and
Repo
Framework

LocalNorm: Robust Image Classification through Dynamically Regularized Normalization


Title	LocalNorm: Robust Image Classification through Dynamically Regularized Normalization
Authors	Bojian Yin, Siebren Schaafsma, Henk Corporaal, H. Steven Scholte, Sander M. Bohte
Abstract	While modern convolutional neural networks achieve outstanding accuracy on many image classification tasks, they are, compared to humans, much more sensitive to image degradation. Here, we describe a variant of Batch Normalization, LocalNorm, that regularizes the normalization layer in the spirit of Dropout while dynamically adapting to the local image intensity and contrast at test-time. We show that the resulting deep neural networks are much more resistant to noise-induced image degradation, improving accuracy by up to three times, while achieving the same or slightly better accuracy on non-degraded classical benchmarks. In computational terms, LocalNorm adds negligible training cost and little or no cost at inference time, and can be applied to already-trained networks in a straightforward manner.
Tasks	Image Classification
Published	2019-02-18
URL	http://arxiv.org/abs/1902.06550v3
PDF	http://arxiv.org/pdf/1902.06550v3.pdf
PWC	https://paperswithcode.com/paper/localnorm-robust-image-classification-through
Repo
Framework

1D Convolutional Neural Networks and Applications: A Survey


Title	1D Convolutional Neural Networks and Applications: A Survey
Authors	Serkan Kiranyaz, Onur Avci, Osama Abdeljaber, Turker Ince, Moncef Gabbouj, Daniel J. Inman
Abstract	During the last decade, Convolutional Neural Networks (CNNs) have become the de facto standard for various Computer Vision and Machine Learning operations. CNNs are feed-forward Artificial Neural Networks (ANNs) with alternating convolutional and subsampling layers. Deep 2D CNNs with many hidden layers and millions of parameters have the ability to learn complex objects and patterns provided that they can be trained on a massive visual database with ground-truth labels. With proper training, this unique ability makes them the primary tool for various engineering applications for 2D signals such as images and video frames. Yet, this may not be a viable option in numerous applications over 1D signals, especially when the training data is scarce or application-specific. To address this issue, 1D CNNs have recently been proposed and immediately achieved the state-of-the-art performance levels in several applications such as personalized biomedical data classification and early diagnosis, structural health monitoring, anomaly detection and identification in power electronics and motor-fault detection. Another major advantage is that a real-time and low-cost hardware implementation is feasible due to the simple and compact configuration of 1D CNNs that perform only 1D convolutions (scalar multiplications and additions). This paper presents a comprehensive review of the general architecture and principles of 1D CNNs along with their major engineering applications, especially focused on the recent progress in this field. Their state-of-the-art performance is highlighted, concluding with their unique properties. The benchmark datasets and the principal 1D CNN software used in those applications are also publicly shared on a dedicated website.
Tasks	Anomaly Detection, Fault Detection
Published	2019-05-09
URL	https://arxiv.org/abs/1905.03554v1
PDF	https://arxiv.org/pdf/1905.03554v1.pdf
PWC	https://paperswithcode.com/paper/190503554
Repo
Framework
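
The abstract above notes that 1D CNNs need only scalar multiplications and additions. A minimal sketch of a single 1D convolution filter, written as the sliding dot product used in CNN layers (no kernel flip, no padding, stride 1), makes that concrete. The function name and the example kernel are illustrative assumptions.

```python
def conv1d(signal, kernel, bias=0.0):
    """Valid-mode 1D convolution as used in CNN layers (sliding dot product)."""
    n_out = len(signal) - len(kernel) + 1
    return [
        # Each output sample is a sum of scalar products plus a bias.
        sum(signal[i + j] * kernel[j] for j in range(len(kernel))) + bias
        for i in range(n_out)
    ]


if __name__ == "__main__":
    # A simple difference kernel responds to the local slope of the signal.
    print(conv1d([1, 2, 3, 4], [1, 0, -1]))  # [-2.0, -2.0]
```

For a kernel of length K, each output sample costs K multiplications and K additions (including the bias), which is why compact 1D layers are comparatively cheap to run.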

Goal-Driven Sequential Data Abstraction


Title	Goal-Driven Sequential Data Abstraction
Authors	Umar Riaz Muhammad, Yongxin Yang, Timothy M. Hospedales, Tao Xiang, Yi-Zhe Song
Abstract	Automatic data abstraction is an important capability for both benchmarking machine intelligence and supporting summarization applications. In the former, one asks whether a machine can 'understand' enough about the meaning of input data to produce a meaningful but more compact abstraction. In the latter, this capability is exploited for saving space or human time by summarizing the essence of input data. In this paper we study a general reinforcement learning based framework for learning to abstract sequential data in a goal-driven way. The ability to define different abstraction goals uniquely allows different aspects of the input data to be preserved according to the ultimate purpose of the abstraction. Our reinforcement learning objective does not require human-defined examples of ideal abstraction. Importantly, our model processes the input sequence holistically without being constrained by the original input order. Our framework is also domain agnostic: we demonstrate applications to sketch, video and text data and achieve promising results in all domains.
Tasks
Published	2019-07-29
URL	https://arxiv.org/abs/1907.12336v2
PDF	https://arxiv.org/pdf/1907.12336v2.pdf
PWC	https://paperswithcode.com/paper/goal-driven-sequential-data-abstraction
Repo
Framework

A Survey of Black-Box Adversarial Attacks on Computer Vision Models


Title	A Survey of Black-Box Adversarial Attacks on Computer Vision Models
Authors	Siddhant Bhambri, Sumanyu Muku, Avinash Tulasi, Arun Balaji Buduru
Abstract	Machine learning has seen tremendous advances in the past few years, which has led to deep learning models being deployed in varied applications of day-to-day life. Attacks on such models using perturbations, particularly in real-life scenarios, pose a severe challenge to their applicability, pushing research into the direction which aims to enhance the robustness of these models. After the introduction of these perturbations by Szegedy et al. [1], a significant amount of research has focused on the reliability of such models, primarily in two aspects: white-box, where the adversary has access to the targeted model and related parameters; and black-box, which resembles a real-life scenario with the adversary having almost no knowledge of the model to be attacked. To provide a comprehensive security cover, it is essential to identify, study, and build defenses against such attacks. Hence, in this paper, we propose to present a comprehensive comparative study of various black-box adversarial attacks and defense techniques.
Tasks
Published	2019-12-03
URL	https://arxiv.org/abs/1912.01667v3
PDF	https://arxiv.org/pdf/1912.01667v3.pdf
PWC	https://paperswithcode.com/paper/a-study-of-black-box-adversarial-attacks-in
Repo
Framework

Discriminative Topic Modeling with Logistic LDA


Title	Discriminative Topic Modeling with Logistic LDA
Authors	Iryna Korshunova, Hanchen Xiong, Mateusz Fedoryszak, Lucas Theis
Abstract	Despite many years of research into latent Dirichlet allocation (LDA), applying LDA to collections of non-categorical items is still challenging. Yet many problems with much richer data share a similar structure and could benefit from the vast literature on LDA. We propose logistic LDA, a novel discriminative variant of latent Dirichlet allocation which is easy to apply to arbitrary inputs. In particular, our model can easily be applied to groups of images, arbitrary text embeddings, and integrates well with deep neural networks. Although it is a discriminative model, we show that logistic LDA can learn from unlabeled data in an unsupervised manner by exploiting the group structure present in the data. In contrast to other recent topic models designed to handle arbitrary inputs, our model does not sacrifice the interpretability and principled motivation of LDA.
Tasks	Topic Models
Published	2019-09-03
URL	https://arxiv.org/abs/1909.01436v2
PDF	https://arxiv.org/pdf/1909.01436v2.pdf
PWC	https://paperswithcode.com/paper/discriminative-topic-modeling-with-logistic
Repo
Framework