Paper Group ANR 730
Phase-aware Speech Enhancement with Deep Complex U-Net
Title | Phase-aware Speech Enhancement with Deep Complex U-Net |
Authors | Hyeong-Seok Choi, Jang-Hyun Kim, Jaesung Huh, Adrian Kim, Jung-Woo Ha, Kyogu Lee |
Abstract | Most deep learning-based models for speech enhancement have mainly focused on estimating the magnitude of the spectrogram while reusing the phase from noisy speech for reconstruction. This is due to the difficulty of estimating the phase of clean speech. To improve speech enhancement performance, we tackle the phase estimation problem in three ways. First, we propose Deep Complex U-Net, an advanced U-Net structured model incorporating well-defined complex-valued building blocks to deal with complex-valued spectrograms. Second, we propose a polar coordinate-wise complex-valued masking method to reflect the distribution of complex ideal ratio masks. Third, we define a novel loss function, weighted source-to-distortion ratio (wSDR) loss, which is designed to directly correlate with a quantitative evaluation measure. Our model was evaluated on a mixture of the Voice Bank corpus and DEMAND database, which has been widely used by many deep learning models for speech enhancement. Ablation experiments were conducted on the mixed dataset showing that all three proposed approaches are empirically valid. Experimental results show that the proposed method achieves state-of-the-art performance in all metrics, outperforming previous approaches by a large margin. |
Tasks | Speech Enhancement |
Published | 2019-03-07 |
URL | http://arxiv.org/abs/1903.03107v2 |
http://arxiv.org/pdf/1903.03107v2.pdf | |
PWC | https://paperswithcode.com/paper/phase-aware-speech-enhancement-with-deep-1 |
Repo | |
Framework | |
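The abstract above introduces a weighted source-to-distortion ratio (wSDR) loss. Below is a minimal NumPy sketch of a loss of this kind, assuming the common formulation that combines a negative-cosine SDR term on the speech estimate with one on the implied noise estimate, weighted by their relative energies; the paper's exact weighting and sign conventions may differ, and the variable names are illustrative.

```python
import numpy as np

def neg_cosine_sdr(target, estimate, eps=1e-8):
    """Negative inner-product SDR surrogate: -<t, e> / (||t|| ||e||)."""
    num = np.dot(target, estimate)
    den = np.linalg.norm(target) * np.linalg.norm(estimate) + eps
    return -num / den

def weighted_sdr_loss(noisy, clean, estimate, eps=1e-8):
    """Hedged sketch of a weighted SDR loss over speech and residual noise.

    noisy    : mixture waveform x
    clean    : clean speech y
    estimate : enhanced speech y_hat
    The noise terms are formed as z = x - y and z_hat = x - y_hat; alpha
    balances the two terms by their relative energies.
    """
    noise = noisy - clean
    noise_est = noisy - estimate
    energy_speech = np.sum(clean ** 2)
    energy_noise = np.sum(noise ** 2)
    alpha = energy_speech / (energy_speech + energy_noise + eps)
    return (alpha * neg_cosine_sdr(clean, estimate, eps)
            + (1.0 - alpha) * neg_cosine_sdr(noise, noise_est, eps))

# Toy usage on random waveforms.
rng = np.random.default_rng(0)
y = rng.standard_normal(16000)               # "clean" speech
x = y + 0.3 * rng.standard_normal(16000)     # noisy mixture
print(weighted_sdr_loss(x, y, y))            # near -1 for a perfect estimate
```

Because both terms are bounded in [-1, 1], such a loss stays well behaved even when the clean source is nearly silent, which is one motivation given for SDR-style objectives.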
Continual adaptation for efficient machine communication
Title | Continual adaptation for efficient machine communication |
Authors | Robert D. Hawkins, Minae Kwon, Dorsa Sadigh, Noah D. Goodman |
Abstract | To communicate with new partners in new contexts, humans rapidly form new linguistic conventions. Recent language models trained with deep neural networks are able to comprehend and produce the existing conventions present in their training data, but are not able to flexibly and interactively adapt those conventions on the fly as humans do. We introduce a repeated reference task as a benchmark for models of adaptation in communication and propose a regularized continual learning framework that allows an artificial agent initialized with a generic language model to more accurately and efficiently communicate with a partner over time. We evaluate this framework through simulations on COCO and in real-time reference game experiments with human partners. |
Tasks | Continual Learning, Language Modelling |
Published | 2019-11-22 |
URL | https://arxiv.org/abs/1911.09896v1 |
https://arxiv.org/pdf/1911.09896v1.pdf | |
PWC | https://paperswithcode.com/paper/continual-adaptation-for-efficient-machine |
Repo | |
Framework | |
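The abstract above describes a regularized continual learning framework that adapts a generic language model to a specific partner. The sketch below uses a simple L2 anchor toward the pre-adaptation weights as the regularizer, a standard continual-learning device; the paper's actual objective (for example, KL terms over utterance probabilities) may differ, and the stand-in model and data are purely illustrative.

```python
import torch
import torch.nn as nn

def adaptation_loss(model, anchor_params, task_loss, reg_weight=1e-2):
    """Task loss plus an L2 anchor toward the pre-adaptation weights.
    Hedged sketch of a regularized continual-adaptation objective."""
    reg = sum(((p - anchor_params[n]) ** 2).sum()
              for n, p in model.named_parameters())
    return task_loss + reg_weight * reg

# Toy illustration with a stand-in "language model".
model = nn.Linear(8, 8)
anchor = {n: p.detach().clone() for n, p in model.named_parameters()}
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

x = torch.randn(4, 8)
target = torch.randn(4, 8)
for _ in range(5):                       # one "round" of partner feedback
    task_loss = nn.functional.mse_loss(model(x), target)
    loss = adaptation_loss(model, anchor, task_loss)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The anchor keeps repeated partner-specific updates from drifting arbitrarily far from the generic model, which is the basic trade-off the framework is designed to manage.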
An improved uncertainty propagation method for robust i-vector based speaker recognition
Title | An improved uncertainty propagation method for robust i-vector based speaker recognition |
Authors | Dayana Ribas, Emmanuel Vincent |
Abstract | The performance of automatic speaker recognition systems degrades when facing distorted speech data containing additive noise and/or reverberation. Statistical uncertainty propagation has been introduced as a promising paradigm to address this challenge. So far, different uncertainty propagation methods have been proposed to compensate for noise and reverberation in i-vectors in the context of speaker recognition. They have achieved promising results on small datasets such as YOHO and Wall Street Journal, but little or no improvement on the larger, highly variable NIST Speaker Recognition Evaluation (SRE) corpus. In this paper, we propose a complete uncertainty propagation method, whereby we model the effect of uncertainty both in the computation of unbiased Baum-Welch statistics and in the derivation of the posterior expectation of the i-vector. We conduct experiments on the NIST-SRE corpus mixed with real domestic noise and reverberation from the CHiME-2 corpus and preprocessed by multichannel speech enhancement. The proposed method improves the equal error rate (EER) by 4% relative compared to a conventional i-vector based speaker verification baseline. This is to be compared with previous methods which degrade performance. |
Tasks | Speaker Recognition, Speaker Verification, Speech Enhancement |
Published | 2019-02-15 |
URL | http://arxiv.org/abs/1902.05761v2 |
http://arxiv.org/pdf/1902.05761v2.pdf | |
PWC | https://paperswithcode.com/paper/an-improved-uncertainty-propagation-method |
Repo | |
Framework | |
Multi-level Texture Encoding and Representation (MuLTER) based on Deep Neural Networks
Title | Multi-level Texture Encoding and Representation (MuLTER) based on Deep Neural Networks |
Authors | Yuting Hu, Zhiling Long, Ghassan AlRegib |
Abstract | In this paper, we propose a multi-level texture encoding and representation network (MuLTER) for texture-related applications. Based on a multi-level pooling architecture, the MuLTER network simultaneously leverages low- and high-level features to maintain both texture details and spatial information. Such a pooling architecture involves few extra parameters and keeps feature dimensions fixed despite changes in image size. In comparison with state-of-the-art texture descriptors, the MuLTER network yields higher recognition accuracy on typical texture datasets such as MINC-2500 and GTOS-mobile with a discriminative and compact representation. In addition, we analyze the impact of combining features from different levels, which supports our claim that the fusion of multi-level features efficiently enhances recognition performance. Our source code will be published on GitHub (https://github.com/olivesgatech). |
Tasks | |
Published | 2019-05-23 |
URL | https://arxiv.org/abs/1905.09907v1 |
https://arxiv.org/pdf/1905.09907v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-level-texture-encoding-and |
Repo | |
Framework | |
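The abstract above describes pooling features from multiple levels of a CNN so that both texture detail and spatial information reach the classifier. The sketch below illustrates that idea with a ResNet-18 backbone whose stage outputs are globally pooled and concatenated; the backbone choice, stage set, and 23-class head (the MINC-2500 category count) are assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
from torchvision import models

class MultiLevelPooling(nn.Module):
    """Hedged sketch of multi-level feature pooling: global-average-pool the
    outputs of several backbone stages and concatenate them before a linear
    classifier. The actual MuLTER encoding layers may differ."""

    def __init__(self, num_classes=23):
        super().__init__()
        backbone = models.resnet18(weights=None)
        self.stem = nn.Sequential(backbone.conv1, backbone.bn1,
                                  backbone.relu, backbone.maxpool)
        self.stages = nn.ModuleList([backbone.layer1, backbone.layer2,
                                     backbone.layer3, backbone.layer4])
        self.pool = nn.AdaptiveAvgPool2d(1)
        feat_dim = 64 + 128 + 256 + 512   # channel widths of the four stages
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, x):
        x = self.stem(x)
        pooled = []
        for stage in self.stages:
            x = stage(x)
            pooled.append(self.pool(x).flatten(1))  # (B, C) per level
        return self.classifier(torch.cat(pooled, dim=1))

print(MultiLevelPooling()(torch.randn(2, 3, 224, 224)).shape)  # torch.Size([2, 23])
```

Because every level is pooled to a fixed size, the concatenated feature dimension stays constant regardless of input image size, which matches the property claimed in the abstract.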
Probabilistic Rollouts for Learning Curve Extrapolation Across Hyperparameter Settings
Title | Probabilistic Rollouts for Learning Curve Extrapolation Across Hyperparameter Settings |
Authors | Matilde Gargiani, Aaron Klein, Stefan Falkner, Frank Hutter |
Abstract | We propose probabilistic models that can extrapolate learning curves of iterative machine learning algorithms, such as stochastic gradient descent for training deep networks, based on training data with variable-length learning curves. We study instantiations of this framework based on random forests and Bayesian recurrent neural networks. Our experiments show that these models yield better predictions than state-of-the-art models from the hyperparameter optimization literature when extrapolating the performance of neural networks trained with different hyperparameter settings. |
Tasks | Hyperparameter Optimization |
Published | 2019-10-10 |
URL | https://arxiv.org/abs/1910.04522v1 |
https://arxiv.org/pdf/1910.04522v1.pdf | |
PWC | https://paperswithcode.com/paper/probabilistic-rollouts-for-learning-curve |
Repo | |
Framework | |
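The abstract above extrapolates partially observed learning curves conditioned on hyperparameters. Below is a minimal sketch of the rollout idea using a scikit-learn random forest as a one-step predictor on synthetic curves; it is deterministic and far simpler than the paper's probabilistic models (sampling trees, Bayesian recurrent networks), and all curve shapes and parameters here are made up for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
K = 5  # window of past observations fed to the model

def synth_curve(lr, length=30):
    """Toy saturating learning curve whose speed depends on a 'learning rate'."""
    t = np.arange(length)
    return 0.9 * (1 - np.exp(-lr * t)) + 0.01 * rng.standard_normal(length)

# Build one-step training pairs (hyperparameter + last K values -> next value).
X, y = [], []
for lr in rng.uniform(0.05, 0.5, size=200):
    curve = synth_curve(lr)
    for t in range(K, len(curve)):
        X.append(np.concatenate(([lr], curve[t - K:t])))
        y.append(curve[t])
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

def rollout(lr, observed, horizon):
    """Extrapolate a partially observed curve by iterating one-step predictions."""
    hist = list(observed)
    for _ in range(horizon):
        feats = np.concatenate(([lr], hist[-K:])).reshape(1, -1)
        hist.append(model.predict(feats)[0])
    return np.array(hist)

partial = synth_curve(0.2)[:10]
print(rollout(0.2, partial, horizon=20)[-1])  # predicted final performance
```

Replacing the point prediction with a distribution per step (for example, the spread across individual trees) is what turns such rollouts into the probabilistic extrapolations the paper studies.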
GlossBERT: BERT for Word Sense Disambiguation with Gloss Knowledge
Title | GlossBERT: BERT for Word Sense Disambiguation with Gloss Knowledge |
Authors | Luyao Huang, Chi Sun, Xipeng Qiu, Xuanjing Huang |
Abstract | Word Sense Disambiguation (WSD) aims to find the exact sense of an ambiguous word in a particular context. Traditional supervised methods rarely take into consideration lexical resources like WordNet, which are widely utilized in knowledge-based methods. Recent studies have shown the effectiveness of incorporating gloss (sense definition) into neural networks for WSD. However, compared with traditional word expert supervised methods, they have not achieved much improvement. In this paper, we focus on how to better leverage gloss knowledge in a supervised neural WSD system. We construct context-gloss pairs and propose three BERT-based models for WSD. We fine-tune the pre-trained BERT model on the SemCor 3.0 training corpus, and experimental results on several English all-words WSD benchmark datasets show that our approach outperforms state-of-the-art systems. |
Tasks | Word Sense Disambiguation |
Published | 2019-08-20 |
URL | https://arxiv.org/abs/1908.07245v4 |
https://arxiv.org/pdf/1908.07245v4.pdf | |
PWC | https://paperswithcode.com/paper/glossbert-bert-for-word-sense-disambiguation |
Repo | |
Framework | |
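The abstract above builds context-gloss pairs and scores them with BERT. The sketch below shows the pairing idea with an off-the-shelf, untuned `bert-base-uncased` sentence-pair classifier; the example sentence, candidate glosses, and two-label head are assumptions for illustration, whereas the paper fine-tunes on SemCor with its own pair-construction variants.

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased",
                                                      num_labels=2)
model.eval()

context = "He sat on the bank of the river and watched the water."
candidate_glosses = [
    "sloping land beside a body of water",            # riverbank sense
    "a financial institution that accepts deposits",  # financial sense
]

with torch.no_grad():
    # One (context, gloss) pair per candidate sense; pick the highest "yes" score.
    inputs = tokenizer([context] * len(candidate_glosses), candidate_glosses,
                       return_tensors="pt", padding=True, truncation=True)
    logits = model(**inputs).logits
    yes_prob = torch.softmax(logits, dim=-1)[:, 1]

best = int(yes_prob.argmax())
print(candidate_glosses[best], float(yes_prob[best]))
```

With an untuned classification head the scores are of course meaningless; the point of the sketch is only the reduction of WSD to binary sentence-pair classification over glosses.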
HPC AI500: A Benchmark Suite for HPC AI Systems
Title | HPC AI500: A Benchmark Suite for HPC AI Systems |
Authors | Zihan Jiang, Wanling Gao, Lei Wang, Xingwang Xiong, Yuchen Zhang, Xu Wen, Chunjie Luo, Hainan Ye, Yunquan Zhang, Shengzhong Feng, Kenli Li, Weijia Xu, Jianfeng Zhan |
Abstract | In recent years, with the trend of applying deep learning (DL) in high-performance scientific computing, the unique characteristics of emerging DL workloads in HPC raise great challenges in designing and implementing HPC AI systems. The community needs a new yardstick for evaluating future HPC systems. In this paper, we propose HPC AI500 — a benchmark suite for evaluating HPC systems that run scientific DL workloads. Covering the most representative scientific fields, each workload from HPC AI500 is based on real-world scientific DL applications. Currently, we choose 14 scientific DL benchmarks from the perspectives of application scenarios, data sets, and software stack. We propose a set of metrics for comprehensively evaluating HPC AI systems, considering accuracy and performance as well as power and cost. We provide a scalable reference implementation of HPC AI500. HPC AI500 is a part of the open-source AIBench project; the specification and source code are publicly available from \url{http://www.benchcouncil.org/AIBench/index.html}. |
Tasks | |
Published | 2019-07-27 |
URL | https://arxiv.org/abs/1908.02607v3 |
https://arxiv.org/pdf/1908.02607v3.pdf | |
PWC | https://paperswithcode.com/paper/hpc-ai500-a-benchmark-suite-for-hpc-ai |
Repo | |
Framework | |
Exponentially convergent stochastic k-PCA without variance reduction
Title | Exponentially convergent stochastic k-PCA without variance reduction |
Authors | Cheng Tang |
Abstract | We present Matrix Krasulina, an algorithm for online k-PCA, by generalizing the classic Krasulina’s method (Krasulina, 1969) from the vector to the matrix case. We show, both theoretically and empirically, that the algorithm naturally adapts to data low-rankness and converges exponentially fast to the ground-truth principal subspace. Notably, our result suggests that despite various recent efforts to accelerate the convergence of stochastic-gradient based methods by adding an O(n)-time variance reduction step, for the k-PCA problem, a truly online SGD variant suffices to achieve exponential convergence on intrinsically low-rank data. |
Tasks | |
Published | 2019-04-03 |
URL | http://arxiv.org/abs/1904.01750v1 |
http://arxiv.org/pdf/1904.01750v1.pdf | |
PWC | https://paperswithcode.com/paper/exponentially-convergent-stochastic-k-pca |
Repo | |
Framework | |
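The abstract above concerns an online (streaming) update of a k-dimensional principal subspace. The sketch below uses an Oja-style stochastic update with QR re-orthonormalization rather than the paper's exact Matrix Krasulina rule, whose correction term differs; the data, step-size schedule, and alignment metric are illustrative assumptions.

```python
import numpy as np

def online_subspace_update(W, x, lr):
    """One Oja-style stochastic update of a d x k subspace estimate, followed
    by QR re-orthonormalization. (Hedged sketch: not the paper's exact rule.)"""
    W = W + lr * np.outer(x, x @ W)         # gradient-ascent step on captured variance
    Q, _ = np.linalg.qr(W)                  # keep columns orthonormal
    return Q

# Toy data with an intrinsically low-rank structure.
rng = np.random.default_rng(0)
d, k, n = 50, 3, 20000
basis, _ = np.linalg.qr(rng.standard_normal((d, k)))     # ground-truth subspace
X = rng.standard_normal((n, k)) @ basis.T                # low-rank samples

W = np.linalg.qr(rng.standard_normal((d, k)))[0]
for t, x in enumerate(X, start=1):
    W = online_subspace_update(W, x, lr=1.0 / (100 + t))

# Alignment with the true subspace (1.0 means perfectly recovered).
print(np.linalg.norm(basis.T @ W) / np.sqrt(k))
```

Each update touches only one sample, which is the "truly online" regime the paper argues is sufficient on low-rank data, without any variance-reduction pass over the dataset.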
Sex-Prediction from Periocular Images across Multiple Sensors and Spectra
Title | Sex-Prediction from Periocular Images across Multiple Sensors and Spectra |
Authors | Juan Tapia, Christian Rathgeb, Christoph Busch |
Abstract | In this paper, we provide a comprehensive analysis of periocular-based sex-prediction (commonly referred to as gender classification) using state-of-the-art machine learning techniques. In order to reflect a more challenging scenario where periocular images are likely to be obtained from an unknown source, i.e. sensor, convolutional neural networks are trained on fused sets composed of several near-infrared (NIR) and visible wavelength (VW) image databases. In a cross-sensor scenario within each spectrum, an average classification accuracy of approximately 85% is achieved. When sex-prediction is performed across spectra, an average classification accuracy of about 82% is obtained. Finally, a multi-spectral sex-prediction yields a classification accuracy of 83% on average. Compared to previously proposed works, the obtained results provide a more realistic estimate of the feasibility of predicting a subject’s sex from the periocular region. |
Tasks | |
Published | 2019-05-01 |
URL | http://arxiv.org/abs/1905.00396v1 |
http://arxiv.org/pdf/1905.00396v1.pdf | |
PWC | https://paperswithcode.com/paper/sex-prediction-from-periocular-images-across |
Repo | |
Framework | |
Deep Speech Enhancement for Reverberated and Noisy Signals using Wide Residual Networks
Title | Deep Speech Enhancement for Reverberated and Noisy Signals using Wide Residual Networks |
Authors | Dayana Ribas, Jorge Llombart, Antonio Miguel, Luis Vicente |
Abstract | This paper proposes a deep speech enhancement method which exploits the high potential of residual connections in a wide neural network architecture, a topology known as a Wide Residual Network. This is supported by one-dimensional convolutions computed along the time domain, which is a powerful approach to processing contextually correlated representations through the temporal domain, such as speech feature sequences. We find the residual mechanism extremely useful for the enhancement task since the signal always has a linear shortcut and the non-linear path enhances it in several steps by adding or subtracting corrections. The enhancement capacity of the proposal is assessed by objective quality metrics and the performance of a speech recognition system. This was evaluated in the framework of the REVERB Challenge dataset, including simulated and real samples of reverberated and noisy speech signals. Results showed that speech enhanced by the proposed method succeeded both in the enhancement task, for intelligibility purposes, and in the speech recognition system. The DNN model, trained with artificially synthesized reverberation data, was able to deal with far-field reverberated speech from real scenarios. Furthermore, the method was able to take advantage of the residual connections to enhance signals with low noise levels, which is usually a strong handicap for traditional enhancement methods. |
Tasks | Speech Enhancement, Speech Recognition |
Published | 2019-01-03 |
URL | http://arxiv.org/abs/1901.00660v1 |
http://arxiv.org/pdf/1901.00660v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-speech-enhancement-for-reverberated-and |
Repo | |
Framework | |
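The abstract above relies on residual blocks built from 1-D convolutions over the time axis, where the shortcut carries the signal and the convolutional branch adds corrections. The sketch below shows such a block in PyTorch; kernel sizes, widths, and normalization choices are illustrative, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class WideResidualBlock1d(nn.Module):
    """Hedged sketch of a wide residual block over 1-D feature sequences:
    an identity shortcut plus a convolutional branch that learns additive
    corrections, as described in the abstract above."""

    def __init__(self, channels, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        self.branch = nn.Sequential(
            nn.BatchNorm1d(channels), nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size, padding=pad),
            nn.BatchNorm1d(channels), nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size, padding=pad),
        )

    def forward(self, x):                    # x: (batch, channels, time)
        return x + self.branch(x)            # linear shortcut + learned correction

feats = torch.randn(4, 64, 200)              # e.g. 64 filterbank-like channels
print(WideResidualBlock1d(64)(feats).shape)  # torch.Size([4, 64, 200])
```

Because the shortcut is the identity, a stack of such blocks can pass low-noise input through almost unchanged, which is the behaviour the abstract highlights for mildly degraded signals.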
LocalNorm: Robust Image Classification through Dynamically Regularized Normalization
Title | LocalNorm: Robust Image Classification through Dynamically Regularized Normalization |
Authors | Bojian Yin, Siebren Schaafsma, Henk Corporaal, H. Steven Scholte, Sander M. Bohte |
Abstract | While modern convolutional neural networks achieve outstanding accuracy on many image classification tasks, they are, compared to humans, much more sensitive to image degradation. Here, we describe a variant of Batch Normalization, LocalNorm, that regularizes the normalization layer in the spirit of Dropout while dynamically adapting to the local image intensity and contrast at test-time. We show that the resulting deep neural networks are much more resistant to noise-induced image degradation, improving accuracy by up to three times, while achieving the same or slightly better accuracy on non-degraded classical benchmarks. In computational terms, LocalNorm adds negligible training cost and little or no cost at inference time, and can be applied to already-trained networks in a straightforward manner. |
Tasks | Image Classification |
Published | 2019-02-18 |
URL | http://arxiv.org/abs/1902.06550v3 |
http://arxiv.org/pdf/1902.06550v3.pdf | |
PWC | https://paperswithcode.com/paper/localnorm-robust-image-classification-through |
Repo | |
Framework | |
1D Convolutional Neural Networks and Applications: A Survey
Title | 1D Convolutional Neural Networks and Applications: A Survey |
Authors | Serkan Kiranyaz, Onur Avci, Osama Abdeljaber, Turker Ince, Moncef Gabbouj, Daniel J. Inman |
Abstract | During the last decade, Convolutional Neural Networks (CNNs) have become the de facto standard for various Computer Vision and Machine Learning operations. CNNs are feed-forward Artificial Neural Networks (ANNs) with alternating convolutional and subsampling layers. Deep 2D CNNs with many hidden layers and millions of parameters have the ability to learn complex objects and patterns, provided that they can be trained on a massive visual database with ground-truth labels. With proper training, this unique ability makes them the primary tool for various engineering applications for 2D signals such as images and video frames. Yet, this may not be a viable option in numerous applications over 1D signals, especially when the training data is scarce or application-specific. To address this issue, 1D CNNs have recently been proposed and immediately achieved state-of-the-art performance levels in several applications such as personalized biomedical data classification and early diagnosis, structural health monitoring, anomaly detection and identification in power electronics, and motor-fault detection. Another major advantage is that a real-time and low-cost hardware implementation is feasible due to the simple and compact configuration of 1D CNNs that perform only 1D convolutions (scalar multiplications and additions). This paper presents a comprehensive review of the general architecture and principles of 1D CNNs along with their major engineering applications, especially focused on the recent progress in this field. Their state-of-the-art performance is highlighted, concluding with their unique properties. The benchmark datasets and the principal 1D CNN software used in those applications are also publicly shared in a dedicated website. |
Tasks | Anomaly Detection, Fault Detection |
Published | 2019-05-09 |
URL | https://arxiv.org/abs/1905.03554v1 |
https://arxiv.org/pdf/1905.03554v1.pdf | |
PWC | https://paperswithcode.com/paper/190503554 |
Repo | |
Framework | |
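The survey above describes 1D CNNs as compact stacks of alternating convolution and subsampling layers operating directly on raw 1-D signals. The sketch below is a minimal classifier of that shape; the layer sizes and the two-class output (e.g. a normal-versus-fault signal task) are placeholders, not a configuration taken from the survey.

```python
import torch
import torch.nn as nn

# Minimal sketch of a compact 1-D CNN: alternating Conv1d/pooling layers
# followed by a small classifier head.
model = nn.Sequential(
    nn.Conv1d(1, 16, kernel_size=9, padding=4), nn.ReLU(), nn.MaxPool1d(4),
    nn.Conv1d(16, 32, kernel_size=9, padding=4), nn.ReLU(), nn.MaxPool1d(4),
    nn.AdaptiveAvgPool1d(1), nn.Flatten(),
    nn.Linear(32, 2),
)

signal = torch.randn(8, 1, 2048)      # batch of raw 1-D signals
print(model(signal).shape)            # torch.Size([8, 2])
```

The entire forward pass is a handful of 1-D convolutions and pooling steps, which is why the survey emphasizes the feasibility of real-time, low-cost hardware implementations.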
Goal-Driven Sequential Data Abstraction
Title | Goal-Driven Sequential Data Abstraction |
Authors | Umar Riaz Muhammad, Yongxin Yang, Timothy M. Hospedales, Tao Xiang, Yi-Zhe Song |
Abstract | Automatic data abstraction is an important capability for both benchmarking machine intelligence and supporting summarization applications. In the former, one asks whether a machine can 'understand' enough about the meaning of input data to produce a meaningful but more compact abstraction. In the latter, this capability is exploited to save space or human time by summarizing the essence of input data. In this paper we study a general reinforcement learning based framework for learning to abstract sequential data in a goal-driven way. The ability to define different abstraction goals uniquely allows different aspects of the input data to be preserved according to the ultimate purpose of the abstraction. Our reinforcement learning objective does not require human-defined examples of ideal abstraction. Importantly, our model processes the input sequence holistically without being constrained by the original input order. Our framework is also domain agnostic – we demonstrate applications to sketch, video and text data and achieve promising results in all domains. |
Tasks | |
Published | 2019-07-29 |
URL | https://arxiv.org/abs/1907.12336v2 |
https://arxiv.org/pdf/1907.12336v2.pdf | |
PWC | https://paperswithcode.com/paper/goal-driven-sequential-data-abstraction |
Repo | |
Framework | |
A Survey of Black-Box Adversarial Attacks on Computer Vision Models
Title | A Survey of Black-Box Adversarial Attacks on Computer Vision Models |
Authors | Siddhant Bhambri, Sumanyu Muku, Avinash Tulasi, Arun Balaji Buduru |
Abstract | Machine learning has seen tremendous advances in the past few years, which has led to deep learning models being deployed in varied applications of day-to-day life. Attacks on such models using perturbations, particularly in real-life scenarios, pose a severe challenge to their applicability, pushing research in the direction of enhancing the robustness of these models. After the introduction of these perturbations by Szegedy et al. [1], a significant amount of research has focused on the reliability of such models, primarily in two settings: white-box, where the adversary has access to the targeted model and related parameters; and black-box, which resembles a real-life scenario with the adversary having almost no knowledge of the model to be attacked. To provide comprehensive security cover, it is essential to identify, study, and build defenses against such attacks. Hence, in this paper, we present a comprehensive comparative study of various black-box adversarial attacks and defense techniques. |
Tasks | |
Published | 2019-12-03 |
URL | https://arxiv.org/abs/1912.01667v3 |
https://arxiv.org/pdf/1912.01667v3.pdf | |
PWC | https://paperswithcode.com/paper/a-study-of-black-box-adversarial-attacks-in |
Repo | |
Framework | |
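The survey above covers black-box attacks, where the adversary can only query the model. The sketch below illustrates one simple score-based family: estimating an input gradient by finite differences along random directions and perturbing the input to lower the true-class score. The toy "model" and all hyperparameters are assumptions for illustration; the attacks surveyed (transfer-based, decision-based, etc.) are far more query-efficient.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal(20)                      # hidden "model" parameters

def black_box_score(x):
    """Probability of the positive class; the attacker sees only this value."""
    return 1.0 / (1.0 + np.exp(-w @ x))

def attack(x, steps=50, step_size=0.05, sigma=1e-3, queries_per_step=20):
    """Zeroth-order attack: finite-difference gradient estimate + signed step."""
    x_adv = x.copy()
    for _ in range(steps):
        grad = np.zeros_like(x_adv)
        for _ in range(queries_per_step):        # random-direction finite differences
            u = rng.standard_normal(x_adv.shape)
            delta = (black_box_score(x_adv + sigma * u)
                     - black_box_score(x_adv - sigma * u))
            grad += delta / (2 * sigma) * u
        x_adv -= step_size * np.sign(grad)       # push the positive-class score down
    return x_adv

x = rng.standard_normal(20)
print(black_box_score(x), black_box_score(attack(x)))
```

The only access assumed is the scalar score returned per query, which is exactly the constraint that separates this black-box setting from the white-box attacks also discussed in the survey.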
Discriminative Topic Modeling with Logistic LDA
Title | Discriminative Topic Modeling with Logistic LDA |
Authors | Iryna Korshunova, Hanchen Xiong, Mateusz Fedoryszak, Lucas Theis |
Abstract | Despite many years of research into latent Dirichlet allocation (LDA), applying LDA to collections of non-categorical items is still challenging. Yet many problems with much richer data share a similar structure and could benefit from the vast literature on LDA. We propose logistic LDA, a novel discriminative variant of latent Dirichlet allocation which is easy to apply to arbitrary inputs. In particular, our model can easily be applied to groups of images and arbitrary text embeddings, and it integrates well with deep neural networks. Although it is a discriminative model, we show that logistic LDA can learn from unlabeled data in an unsupervised manner by exploiting the group structure present in the data. In contrast to other recent topic models designed to handle arbitrary inputs, our model does not sacrifice the interpretability and principled motivation of LDA. |
Tasks | Topic Models |
Published | 2019-09-03 |
URL | https://arxiv.org/abs/1909.01436v2 |
https://arxiv.org/pdf/1909.01436v2.pdf | |
PWC | https://paperswithcode.com/paper/discriminative-topic-modeling-with-logistic |
Repo | |
Framework | |