October 19, 2019

Paper Group ANR 173

Lip-Reading Driven Deep Learning Approach for Speech Enhancement


Title	Lip-Reading Driven Deep Learning Approach for Speech Enhancement
Authors	Ahsan Adeel, Mandar Gogate, Amir Hussain, William M. Whitmer
Abstract	This paper proposes a novel lip-reading driven deep learning framework for speech enhancement. The proposed approach leverages the complementary strengths of both deep learning and analytical acoustic modelling (a filtering-based approach), in contrast to recently published, comparatively simpler benchmark approaches that rely only on deep learning. The proposed audio-visual (AV) speech enhancement framework operates at two levels. In the first level, a novel deep learning-based lip-reading regression model is employed. In the second level, lip-reading approximated clean-audio features are exploited, using an enhanced, visually-derived Wiener filter (EVWF), for clean audio power spectrum estimation. Specifically, a stacked long short-term memory (LSTM) based lip-reading regression model is designed to estimate clean audio features using only temporal visual features, considering different numbers of prior visual frames. For clean speech spectrum estimation, a new filterbank-domain EVWF is formulated, which exploits the estimated speech features. The proposed EVWF is compared with conventional Spectral Subtraction and Log-Minimum Mean-Square Error methods using both ideal AV mapping and LSTM-driven AV mapping. The potential of the proposed speech enhancement framework is evaluated under different dynamic, real-world, commercially motivated scenarios (e.g. cafe, public transport, pedestrian area) at different SNR levels (ranging from low to high SNRs) using the benchmark Grid and CHiME-3 corpora. For objective testing, perceptual evaluation of speech quality is used to evaluate the quality of restored speech. For subjective testing, the standard mean-opinion-score method is used with inferential statistics. Comparative simulation results demonstrate significant lip-reading and speech enhancement improvement in terms of both speech quality and speech intelligibility.
Tasks	Acoustic Modelling, Speech Enhancement
Published	2018-07-31
URL	http://arxiv.org/abs/1808.00046v1
PDF	http://arxiv.org/pdf/1808.00046v1.pdf
PWC	https://paperswithcode.com/paper/lip-reading-driven-deep-learning-approach-for
Repo
Framework
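The second level described in the abstract above rests on Wiener filtering. The following is a minimal sketch of the textbook per-band Wiener gain, G = S / (S + N), not the paper's EVWF formulation. It assumes the clean-speech power per filterbank band has already been estimated (in the paper, by the lip-reading model) and that a noise power estimate is available; the function names `wiener_gain` and `apply_gain` are hypothetical.

```python
def wiener_gain(clean_power, noise_power, floor=1e-12):
    """Return the per-band Wiener gain G = S / (S + N).

    clean_power: estimated clean-speech power per band (assumed given).
    noise_power: estimated noise power per band (assumed given).
    floor: small constant that avoids division by zero in silent bands.
    """
    if len(clean_power) != len(noise_power):
        raise ValueError("clean_power and noise_power must have equal length")
    return [s / max(s + n, floor) for s, n in zip(clean_power, noise_power)]


def apply_gain(noisy_magnitude, gain):
    """Scale each band of the noisy magnitude spectrum by its gain."""
    if len(noisy_magnitude) != len(gain):
        raise ValueError("noisy_magnitude and gain must have equal length")
    return [y * g for y, g in zip(noisy_magnitude, gain)]


if __name__ == "__main__":
    # Three bands: speech-dominated, balanced, and noise-only.
    gain = wiener_gain([4.0, 1.0, 0.0], [1.0, 1.0, 1.0])
    print(gain)
    print(apply_gain([10.0, 10.0, 10.0], gain))
```

Bands where the estimated speech power dominates pass almost unchanged, while noise-only bands are suppressed toward zero.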

Composition Loss for Counting, Density Map Estimation and Localization in Dense Crowds


Title	Composition Loss for Counting, Density Map Estimation and Localization in Dense Crowds
Authors	Haroon Idrees, Muhmmad Tayyab, Kishan Athrey, Dong Zhang, Somaya Al-Maadeed, Nasir Rajpoot, Mubarak Shah
Abstract	With multiple crowd gatherings of millions of people every year in events ranging from pilgrimages to protests, concerts to marathons, and festivals to funerals, visual crowd analysis is emerging as a new frontier in computer vision. In particular, counting in highly dense crowds is a challenging problem with far-reaching applicability in crowd safety and management, as well as in gauging the political significance of protests and demonstrations. In this paper, we propose a novel approach that simultaneously solves the problems of counting, density map estimation and localization of people in a given dense crowd image. Our formulation is based on the important observation that the three problems are inherently related to each other, making the loss function for optimizing a deep CNN decomposable. Since localization requires high-quality images and annotations, we introduce the UCF-QNRF dataset, which overcomes the shortcomings of previous datasets and contains 1.25 million humans manually marked with dot annotations. Finally, we present evaluation measures and a comparison with recent deep CNN networks, including those developed specifically for crowd counting. Our approach significantly outperforms the state of the art on the new dataset, which is the most challenging dataset with the largest number of crowd annotations in the most diverse set of scenes.
Tasks	Crowd Counting, Visual Crowd Analysis
Published	2018-08-02
URL	http://arxiv.org/abs/1808.01050v1
PDF	http://arxiv.org/pdf/1808.01050v1.pdf
PWC	https://paperswithcode.com/paper/composition-loss-for-counting-density-map
Repo
Framework

A Fully Convolutional Neural Network Approach to End-to-End Speech Enhancement


Title	A Fully Convolutional Neural Network Approach to End-to-End Speech Enhancement
Authors	Frank Longueira, Sam Keene
Abstract	This paper describes a novel approach to the cocktail party problem that relies on a fully convolutional neural network (FCN) architecture. The FCN takes noisy audio data as input and performs nonlinear filtering operations to produce clean audio data of the target speech at the output. Our method learns a model for one specific speaker, and is then able to extract that speaker's voice from babble background noise. Results from experimentation indicate the ability to generalize to new speakers and robustness to new noise environments of varying signal-to-noise ratios. A potential application of this method would be for use in hearing aids. A pre-trained model could be quickly fine-tuned for an individual's family members and close friends, and deployed onto a hearing aid to assist listeners in noisy environments.
Tasks	Speech Enhancement
Published	2018-07-20
URL	http://arxiv.org/abs/1807.07959v1
PDF	http://arxiv.org/pdf/1807.07959v1.pdf
PWC	https://paperswithcode.com/paper/a-fully-convolutional-neural-network-approach
Repo
Framework

Real-Time Prediction of the Duration of Distribution System Outages


Title	Real-Time Prediction of the Duration of Distribution System Outages
Authors	Aaron Jaech, Baosen Zhang, Mari Ostendorf, Daniel S. Kirschen
Abstract	This paper addresses the problem of predicting the duration of unplanned power outages, using historical outage records to train a series of neural network predictors. The initial duration prediction is made based on environmental factors, and it is updated based on incoming field reports, using natural language processing to automatically analyze the text. Experiments using 15 years of outage records show good initial results and improved performance when leveraging text. Case studies show that the language processing identifies phrases that point to outage causes and repair steps.
Tasks
Published	2018-04-03
URL	http://arxiv.org/abs/1804.01189v2
PDF	http://arxiv.org/pdf/1804.01189v2.pdf
PWC	https://paperswithcode.com/paper/real-time-prediction-of-the-duration-of
Repo
Framework

On the Suitability of $L_p$-norms for Creating and Preventing Adversarial Examples


Title	On the Suitability of $L_p$-norms for Creating and Preventing Adversarial Examples
Authors	Mahmood Sharif, Lujo Bauer, Michael K. Reiter
Abstract	Much research effort has been devoted to better understanding adversarial examples, which are specially crafted inputs to machine-learning models that are perceptually similar to benign inputs, but are classified differently (i.e., misclassified). Both algorithms that create adversarial examples and strategies for defending against them typically use $L_p$-norms to measure the perceptual similarity between an adversarial input and its benign original. Prior work has already shown, however, that two images need not be close to each other as measured by an $L_p$-norm to be perceptually similar. In this work, we show that nearness according to an $L_p$-norm is not just unnecessary for perceptual similarity, but is also insufficient. Specifically, focusing on datasets (CIFAR10 and MNIST), $L_p$-norms, and thresholds used in prior work, we show through online user studies that “adversarial examples” that are closer to their benign counterparts than required by commonly used $L_p$-norm thresholds can nevertheless be perceptually different to humans from the corresponding benign examples. Namely, the perceptual distance between two images that are “near” each other according to an $L_p$-norm can be high enough that participants frequently classify the two images as representing different objects or digits. Combined with prior work, we thus demonstrate that nearness of inputs as measured by $L_p$-norms is neither necessary nor sufficient for perceptual similarity, which has implications for both creating and defending against adversarial examples. We propose and discuss alternative similarity metrics to stimulate future research in the area.
Tasks
Published	2018-02-27
URL	http://arxiv.org/abs/1802.09653v3
PDF	http://arxiv.org/pdf/1802.09653v3.pdf
PWC	https://paperswithcode.com/paper/on-the-suitability-of-l_p-norms-for-creating
Repo
Framework
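To make the distance measures discussed in the abstract above concrete, here is a minimal sketch that computes the $L_0$, $L_1$, $L_2$ and $L_\infty$ distances between two images stored as flat lists of pixel values. The function name `lp_distance` and the toy pixel values are assumptions for illustration; the thresholds used in the paper are not reproduced here.

```python
import math


def lp_distance(a, b, p):
    """Return the L_p distance between two equally sized pixel lists.

    p = 0 counts the pixels that differ, p = math.inf returns the largest
    per-pixel change, and any other positive p gives the usual p-norm.
    """
    if len(a) != len(b):
        raise ValueError("inputs must have the same number of pixels")
    diffs = [abs(x - y) for x, y in zip(a, b)]
    if p == 0:
        return sum(1 for d in diffs if d != 0)
    if p == math.inf:
        return max(diffs, default=0.0)
    return sum(d ** p for d in diffs) ** (1.0 / p)


if __name__ == "__main__":
    benign = [0.0, 0.0, 0.0, 0.0]
    perturbed = [3.0, 4.0, 0.0, 0.0]
    for p in (0, 1, 2, math.inf):
        print(p, lp_distance(benign, perturbed, p))
```

The same perturbation can look small under one norm and large under another, which is why the choice of norm and threshold matters when a bound is used as a proxy for perceptual similarity.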

Deterministic Inequalities for Smooth M-estimators


Title	Deterministic Inequalities for Smooth M-estimators
Authors	Arun Kumar Kuchibhotla
Abstract	Ever since the proof of asymptotic normality of the maximum likelihood estimator by Cramér (1946), it has been understood that the basic technique of Taylor series expansion suffices for the asymptotics of $M$-estimators with a smooth/differentiable loss function. Although the Taylor series expansion is a purely deterministic tool, the realization that the asymptotic normality results can also be made deterministic (and so finite sample) has received far less attention. With the advent of big data and high-dimensional statistics, the need for finite sample results has increased. In this paper, we use the (well-known) Banach fixed point theorem to derive various deterministic inequalities that lead to the classical results when studied under randomness. In addition, we provide applications of these deterministic inequalities to cross-validation/subsampling, marginal screening and uniform-in-submodel results that are very useful for post-selection inference and in the study of post-regularization estimators. Our results apply to many classical estimators, in particular, generalized linear models, non-linear regression and the Cox proportional hazards model. Extensions to non-smooth and constrained problems are also discussed.
Tasks
Published	2018-09-13
URL	http://arxiv.org/abs/1809.05172v1
PDF	http://arxiv.org/pdf/1809.05172v1.pdf
PWC	https://paperswithcode.com/paper/deterministic-inequalities-for-smooth-m
Repo
Framework

Theory of Estimation-of-Distribution Algorithms


Title	Theory of Estimation-of-Distribution Algorithms
Authors	Martin S. Krejca, Carsten Witt
Abstract	Estimation-of-distribution algorithms (EDAs) are general metaheuristics used in optimization that represent a more recent alternative to classical approaches like evolutionary algorithms. In a nutshell, EDAs typically do not directly evolve populations of search points but build probabilistic models of promising solutions by repeatedly sampling and selecting points from the underlying search space. Recently, significant progress has been made in the theoretical understanding of EDAs. This article provides an up-to-date overview of the most commonly analyzed EDAs and the most recent theoretical results in this area. In particular, emphasis is put on the runtime analysis of simple univariate EDAs, including a description of typical benchmark functions and tools for the analysis. Along the way, open problems and directions for future research are described.
Tasks
Published	2018-06-14
URL	http://arxiv.org/abs/1806.05392v1
PDF	http://arxiv.org/pdf/1806.05392v1.pdf
PWC	https://paperswithcode.com/paper/theory-of-estimation-of-distribution
Repo
Framework
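The sample-select-update loop that the abstract above describes can be sketched with a simple univariate EDA. The example below assumes the Univariate Marginal Distribution Algorithm (UMDA) and the OneMax benchmark (count the ones in a bit string); neither is named in the abstract, and the parameter values and the frequency borders 1/n and 1 - 1/n are illustrative choices rather than recommendations from the article.

```python
import random


def onemax(bits):
    """Benchmark fitness: the number of ones in the bit string."""
    return sum(bits)


def umda(n=20, lam=50, mu=25, generations=100, seed=0):
    """Run a basic UMDA on OneMax; return (best individual, final frequencies)."""
    rng = random.Random(seed)
    low, high = 1.0 / n, 1.0 - 1.0 / n  # borders keep every bit value reachable
    freqs = [0.5] * n                    # probabilistic model: one frequency per bit
    best = None
    for _ in range(generations):
        # Sample lam individuals from the current product distribution.
        population = [[1 if rng.random() < p else 0 for p in freqs]
                      for _ in range(lam)]
        # Select the mu fittest individuals.
        population.sort(key=onemax, reverse=True)
        selected = population[:mu]
        if best is None or onemax(selected[0]) > onemax(best):
            best = selected[0]
        # Update each frequency to the share of ones among the selected,
        # clamped to the borders.
        freqs = [min(high, max(low, sum(ind[i] for ind in selected) / mu))
                 for i in range(n)]
        if onemax(best) == n:
            break
    return best, freqs


if __name__ == "__main__":
    best, freqs = umda()
    print(onemax(best), best)
```

Note that no individual survives between generations: only the frequency vector carries information forward, which is the main difference from a classical evolutionary algorithm.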

Memetic Viability Evolution for Constrained Optimization


Title	Memetic Viability Evolution for Constrained Optimization
Authors	A. Maesani, G. Iacca, D. Floreano
Abstract	The performance of evolutionary algorithms can be heavily undermined when constraints limit the feasible areas of the search space. For instance, while Covariance Matrix Adaptation Evolution Strategy is one of the most efficient algorithms for unconstrained optimization problems, it cannot be readily applied to constrained ones. Here, we used concepts from Memetic Computing, i.e. the harmonious combination of multiple units of algorithmic information, and Viability Evolution, an alternative abstraction of artificial evolution, to devise a novel approach for solving optimization problems with inequality constraints. Viability Evolution emphasizes elimination of solutions not satisfying viability criteria, defined as boundaries on objectives and constraints. These boundaries are adapted during the search to drive a population of local search units, based on Covariance Matrix Adaptation Evolution Strategy, towards feasible regions. These units can be recombined by means of Differential Evolution operators. Of crucial importance for the performance of our method, an adaptive scheduler toggles between exploitation and exploration by selecting to advance one of the local search units and/or recombine them. The proposed algorithm can outperform several state-of-the-art methods on a diverse set of benchmark and engineering problems, both for quality of solutions and computational resources needed.
Tasks
Published	2018-10-05
URL	http://arxiv.org/abs/1810.02702v1
PDF	http://arxiv.org/pdf/1810.02702v1.pdf
PWC	https://paperswithcode.com/paper/memetic-viability-evolution-for-constrained
Repo
Framework

The particle track reconstruction based on deep learning neural networks


Title	The particle track reconstruction based on deep learning neural networks
Authors	Dmitriy Baranov, Sergey Mitsyn, Pavel Goncharov, Gennady Ososkov
Abstract	One of the most important problems of data processing in high energy and nuclear physics is event reconstruction. Its main part is the track reconstruction procedure, which consists of finding all the tracks that elementary particles leave when they pass through a detector among a huge number of points, so-called hits, produced when flying particles fire detector coordinate planes. Unfortunately, tracking is seriously impeded by the well-known shortcoming of multiwired, strip in GEM detectors: the appearance of many fake hits caused by extra spurious crossings of fired strips. Since the number of those fakes is several orders of magnitude greater than the number of true hits, it is a serious difficulty to unravel possible track candidates from the true hits while ignoring the fakes. Building on our previous two-stage approach, in which hit preprocessing using a directed K-d tree search is followed by a deep neural classifier, we introduce here two new tracking algorithms. Both algorithms combine those two stages into one while using different types of deep neural nets. We show that both proposed deep networks do not require any special preprocessing stage, are more accurate and faster, and can be parallelized more easily. Preliminary results of our new approaches for simulated events are presented.
Tasks
Published	2018-12-07
URL	http://arxiv.org/abs/1812.03859v1
PDF	http://arxiv.org/pdf/1812.03859v1.pdf
PWC	https://paperswithcode.com/paper/the-particle-track-reconstruction-based-on
Repo
Framework

Transfer Incremental Learning using Data Augmentation


Title	Transfer Incremental Learning using Data Augmentation
Authors	Ghouthi Boukli Hacene, Vincent Gripon, Nicolas Farrugia, Matthieu Arzel, Michel Jezequel
Abstract	Deep learning-based methods have reached state-of-the-art performance, relying on large quantities of available data and computational power. Such methods still remain highly inappropriate when facing a major open machine learning problem, which consists of incrementally learning new classes and examples over time. Combining the outstanding performance of Deep Neural Networks (DNNs) with the flexibility of incremental learning techniques is a promising avenue of research. In this contribution, we introduce Transfer Incremental Learning using Data Augmentation (TILDA). TILDA is based on pre-trained DNNs used as feature extractors, robust selection of feature vectors in subspaces using a nearest-class-mean based technique, majority votes and data augmentation at both the training and the prediction stages. Experiments on challenging vision datasets demonstrate the ability of the proposed method to perform low-complexity incremental learning while achieving significantly better accuracy than existing incremental counterparts.
Tasks	Data Augmentation
Published	2018-10-04
URL	http://arxiv.org/abs/1810.02020v1
PDF	http://arxiv.org/pdf/1810.02020v1.pdf
PWC	https://paperswithcode.com/paper/transfer-incremental-learning-using-data
Repo
Framework
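One ingredient named in the abstract above, the nearest-class-mean technique, lends itself to incremental learning because a new class or example only changes one running mean. The sketch below is a plain nearest-class-mean classifier over feature vectors that are assumed to come from a pre-trained extractor; it is not the TILDA method itself and omits the subspace selection, majority votes and data augmentation. The class name `NearestClassMean` and the toy features are hypothetical.

```python
class NearestClassMean:
    """Incremental nearest-class-mean classifier over feature vectors."""

    def __init__(self):
        self.sums = {}    # label -> running sum of feature vectors
        self.counts = {}  # label -> number of examples seen

    def partial_fit(self, features, label):
        """Add one example; unseen labels create a new class on the fly."""
        if label not in self.sums:
            self.sums[label] = [0.0] * len(features)
            self.counts[label] = 0
        self.sums[label] = [s + f for s, f in zip(self.sums[label], features)]
        self.counts[label] += 1

    def predict(self, features):
        """Return the label whose class mean is closest in Euclidean distance."""
        if not self.sums:
            raise ValueError("no classes have been learned yet")

        def squared_distance(label):
            count = self.counts[label]
            return sum((f - s / count) ** 2
                       for f, s in zip(features, self.sums[label]))

        return min(self.sums, key=squared_distance)


if __name__ == "__main__":
    model = NearestClassMean()
    model.partial_fit([0.0, 0.0], "cat")
    model.partial_fit([0.0, 2.0], "cat")
    model.partial_fit([10.0, 10.0], "dog")
    print(model.predict([1.0, 1.0]))
    # A class added later needs no retraining of the existing ones.
    model.partial_fit([5.0, -5.0], "bird")
    print(model.predict([5.0, -4.0]))
```

Storing sums and counts instead of means keeps each update exact and constant-time per feature dimension.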

Boosting Noise Robustness of Acoustic Model via Deep Adversarial Training


Title	Boosting Noise Robustness of Acoustic Model via Deep Adversarial Training
Authors	Bin Liu, Shuai Nie, Yaping Zhang, Dengfeng Ke, Shan Liang, Wenju Liu
Abstract	In realistic environments, speech is usually corrupted by various kinds of noise and reverberation, which dramatically degrades the performance of automatic speech recognition (ASR) systems. To alleviate this issue, the most common approach is to use a well-designed speech enhancement method as the front-end of ASR. However, methods of this kind require more complex pipelines, more computation and even higher hardware costs (a microphone array). In addition, speech enhancement can introduce speech distortions and mismatches with the training data. In this paper, we propose an adversarial training method to directly boost the noise robustness of the acoustic model. Specifically, a joint compositional scheme of a generative adversarial net (GAN) and a neural network-based acoustic model (AM) is used in the training phase. The GAN is used to generate clean feature representations from noisy features under the guidance of a discriminator that tries to distinguish between the true clean signals and the generated signals. The joint optimization of the generator, discriminator and AM combines the strengths of both the GAN and the AM for speech recognition. Systematic experiments on CHiME-4 show that the proposed method significantly improves the noise robustness of the AM and achieves average relative error rate reductions of 23.38% and 11.54% on the development and test sets, respectively.
Tasks	Speech Enhancement, Speech Recognition
Published	2018-05-02
URL	http://arxiv.org/abs/1805.01357v1
PDF	http://arxiv.org/pdf/1805.01357v1.pdf
PWC	https://paperswithcode.com/paper/boosting-noise-robustness-of-acoustic-model
Repo
Framework

Convolutional-Recurrent Neural Networks for Speech Enhancement


Title	Convolutional-Recurrent Neural Networks for Speech Enhancement
Authors	Han Zhao, Shuayb Zarar, Ivan Tashev, Chin-Hui Lee
Abstract	We propose an end-to-end model based on convolutional and recurrent neural networks for speech enhancement. Our model is purely data-driven and does not make any assumptions about the type or the stationarity of the noise. In contrast to existing methods that use multilayer perceptrons (MLPs), we employ both convolutional and recurrent neural network architectures. Thus, our approach allows us to exploit local structures in both the frequency and temporal domains. By incorporating prior knowledge of speech signals into the design of model structures, we build a model that is more data-efficient and achieves better generalization on both seen and unseen noise. Based on experiments with synthetic data, we demonstrate that our model outperforms existing methods, improving PESQ by up to 0.6 on seen noise and 0.64 on unseen noise.
Tasks	Speech Enhancement
Published	2018-05-02
URL	http://arxiv.org/abs/1805.00579v1
PDF	http://arxiv.org/pdf/1805.00579v1.pdf
PWC	https://paperswithcode.com/paper/convolutional-recurrent-neural-networks-for-5
Repo
Framework

A Directionally Selective Small Target Motion Detecting Visual Neural Network in Cluttered Backgrounds


Title	A Directionally Selective Small Target Motion Detecting Visual Neural Network in Cluttered Backgrounds
Authors	Hongxin Wang, Jigen Peng, Shigang Yue
Abstract	Discriminating targets moving against a cluttered background is a huge challenge, let alone detecting a target as small as one or a few pixels and tracking it in flight. In the fly’s visual system, a class of specific neurons, called small target motion detectors (STMDs), have been identified as showing exquisite selectivity for small target motion. Some of the STMDs have also demonstrated directional selectivity, which means these STMDs respond strongly only to their preferred motion direction. Directional selectivity is an important property of these STMD neurons which could contribute to tracking small targets such as mates in flight. However, little has been done on systematically modeling these directionally selective STMD neurons. In this paper, we propose a directionally selective STMD-based neural network (DSTMD) for small target detection in a cluttered background. In the proposed neural network, a new correlation mechanism is introduced for direction selectivity via correlating signals relayed from two pixels. Then, a lateral inhibition mechanism is implemented on the spatial field for size selectivity of STMD neurons. Extensive experiments showed that the proposed neural network not only is in accord with current biological findings, i.e. showing directional preferences, but also works reliably in detecting small targets against cluttered backgrounds.
Tasks
Published	2018-01-20
URL	http://arxiv.org/abs/1801.06687v5
PDF	http://arxiv.org/pdf/1801.06687v5.pdf
PWC	https://paperswithcode.com/paper/a-directionally-selective-small-target-motion
Repo
Framework

Recent Progresses in Deep Learning based Acoustic Models (Updated)


Title	Recent Progresses in Deep Learning based Acoustic Models (Updated)
Authors	Dong Yu, Jinyu Li
Abstract	In this paper, we summarize recent progress made in deep learning based acoustic models and the motivation and insights behind the surveyed techniques. We first discuss acoustic models that can effectively exploit variable-length contextual information, such as recurrent neural networks (RNNs), convolutional neural networks (CNNs), and their various combinations with other models. We then describe acoustic models that are optimized end-to-end, with emphasis on feature representations learned jointly with the rest of the system, the connectionist temporal classification (CTC) criterion, and the attention-based sequence-to-sequence model. We further illustrate robustness issues in speech recognition systems, and discuss acoustic model adaptation, speech enhancement and separation, and robust training strategies. We also cover modeling techniques that lead to more efficient decoding and discuss possible future directions in acoustic model research.
Tasks	Speech Enhancement, Speech Recognition
Published	2018-04-25
URL	http://arxiv.org/abs/1804.09298v2
PDF	http://arxiv.org/pdf/1804.09298v2.pdf
PWC	https://paperswithcode.com/paper/recent-progresses-in-deep-learning-based
Repo
Framework

A novel channel pruning method for deep neural network compression


Title	A novel channel pruning method for deep neural network compression
Authors	Yiming Hu, Siyang Sun, Jianquan Li, Xingang Wang, Qingyi Gu
Abstract	In recent years, deep neural networks have achieved great success in the field of computer vision. However, it is still a big challenge to deploy these deep models on resource-constrained embedded devices such as mobile robots and smartphones. Network compression for such platforms is therefore a reasonable way to reduce memory consumption and computational complexity. In this paper, a novel channel pruning method based on a genetic algorithm is proposed to compress very deep Convolutional Neural Networks (CNNs). First, a pre-trained CNN model is pruned layer by layer according to the sensitivity of each layer. After that, the pruned model is fine-tuned within a knowledge distillation framework. These two improvements significantly decrease the model redundancy with a smaller accuracy drop. Channel selection is a combinatorial optimization problem with an exponential solution space. To accelerate the selection process, the proposed method formulates it as a search problem, which can be solved efficiently by a genetic algorithm. Meanwhile, a two-step approximation fitness function is designed to further improve the efficiency of the genetic process. The proposed method has been verified on three benchmark datasets with two popular CNN models: VGGNet and ResNet. On the CIFAR-100 and ImageNet datasets, our approach outperforms several state-of-the-art methods. On the CIFAR-10 and SVHN datasets, the pruned VGGNet achieves better performance than the original model with 8 times parameter compression and 3 times FLOPs reduction.
Tasks	Combinatorial Optimization, Neural Network Compression
Published	2018-05-29
URL	http://arxiv.org/abs/1805.11394v1
PDF	http://arxiv.org/pdf/1805.11394v1.pdf
PWC	https://paperswithcode.com/paper/a-novel-channel-pruning-method-for-deep
Repo
Framework