Paper Group ANR 678
Comparison of Multiple Features and Modeling Methods for Text-dependent Speaker Verification
Title | Comparison of Multiple Features and Modeling Methods for Text-dependent Speaker Verification |
Authors | Yi Liu, Liang He, Yao Tian, Zhuzi Chen, Jia Liu, Michael T. Johnson |
Abstract | Text-dependent speaker verification is becoming popular in the speaker recognition community. However, the conventional i-vector framework, which has been successful for speaker identification and other similar tasks, works relatively poorly in this task. Researchers have proposed several new methods to improve performance, but it is still unclear which model is the best choice, especially when the pass-phrases are prompted during enrollment and test. In this paper, we introduce four modeling methods and compare their performance on the newly published RedDots dataset. To further explore the influence of different frame alignments, Viterbi and forward-backward algorithms are both used in the HMM-based models. Several bottleneck features are also investigated. Our experiments show that, by explicitly modeling the lexical content, the HMM-based modeling achieves good results in the fixed-phrase condition. In the prompted-phrase condition, GMM-HMM and i-vector/HMM are not as successful. In both conditions, the forward-backward algorithm brings more benefits to the i-vector/HMM system. Additionally, we also find that even though bottleneck features perform well for text-independent speaker verification, they do not outperform MFCCs on the most challenging Imposter-Correct trials on RedDots. |
Tasks | Speaker Identification, Speaker Recognition, Speaker Verification, Text-Dependent Speaker Verification, Text-Independent Speaker Verification |
Published | 2017-07-14 |
URL | http://arxiv.org/abs/1707.04373v2 |
http://arxiv.org/pdf/1707.04373v2.pdf | |
PWC | https://paperswithcode.com/paper/comparison-of-multiple-features-and-modeling |
Repo | |
Framework | |
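A minimal numpy sketch, with invented transition and emission values rather than the paper's setup, of the two frame-alignment strategies compared above: a hard Viterbi path versus soft forward-backward state occupancies on a toy left-to-right HMM.

```python
# Toy illustration (not the paper's code): hard Viterbi alignment vs. soft
# forward-backward state occupancies for a small left-to-right HMM.
import numpy as np
from scipy.special import logsumexp

rng = np.random.default_rng(0)
T, S = 20, 3                                   # frames, HMM states
log_b = rng.normal(size=(T, S))                # fake per-frame emission log-likelihoods
log_pi = np.log([1.0, 1e-10, 1e-10])           # start in state 0
A = np.array([[0.8, 0.2, 0.0],                 # left-to-right transitions
              [0.0, 0.8, 0.2],
              [0.0, 0.0, 1.0]])
log_A = np.log(A + 1e-300)

# Viterbi: single best state sequence (hard alignment).
delta = log_pi + log_b[0]
back = np.zeros((T, S), dtype=int)
for t in range(1, T):
    scores = delta[:, None] + log_A            # scores[i, j]: best path in state i, then i->j
    back[t] = scores.argmax(axis=0)
    delta = scores.max(axis=0) + log_b[t]
path = [int(delta.argmax())]
for t in range(T - 1, 0, -1):
    path.append(int(back[t, path[-1]]))
viterbi_path = np.array(path[::-1])

# Forward-backward: per-frame state posteriors (soft alignment).
log_alpha = np.zeros((T, S)); log_beta = np.zeros((T, S))
log_alpha[0] = log_pi + log_b[0]
for t in range(1, T):
    log_alpha[t] = logsumexp(log_alpha[t - 1][:, None] + log_A, axis=0) + log_b[t]
for t in range(T - 2, -1, -1):
    log_beta[t] = logsumexp(log_A + (log_b[t + 1] + log_beta[t + 1])[None, :], axis=1)
log_gamma = log_alpha + log_beta
gamma = np.exp(log_gamma - logsumexp(log_gamma, axis=1, keepdims=True))

print("Viterbi alignment :", viterbi_path)
print("FB occupancy (t=0):", np.round(gamma[0], 3))
```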
Robust Fusion of LiDAR and Wide-Angle Camera Data for Autonomous Mobile Robots
Title | Robust Fusion of LiDAR and Wide-Angle Camera Data for Autonomous Mobile Robots |
Authors | Varuna De Silva, Jamie Roche, Ahmet Kondoz |
Abstract | Autonomous robots that assist humans in day-to-day living tasks are becoming increasingly popular. Autonomous mobile robots operate by sensing and perceiving their surrounding environment to make accurate driving decisions. A combination of several different sensors such as LiDAR, radar, ultrasound sensors and cameras is utilized to sense the surrounding environment of autonomous vehicles. These heterogeneous sensors simultaneously capture various physical attributes of the environment. Such multimodality and redundancy of sensing need to be positively utilized for reliable and consistent perception of the environment through sensor data fusion. However, these multimodal sensor data streams are different from each other in many ways, such as temporal and spatial resolution, data format, and geometric alignment. For the subsequent perception algorithms to utilize the diversity offered by multimodal sensing, the data streams need to be spatially, geometrically and temporally aligned with each other. In this paper, we address the problem of fusing the outputs of a Light Detection and Ranging (LiDAR) scanner and a wide-angle monocular image sensor for free space detection. The outputs of the LiDAR scanner and the image sensor are of different spatial resolutions and need to be aligned with each other. A geometrical model is used to spatially align the two sensor outputs, followed by a Gaussian Process (GP) regression-based resolution matching algorithm to interpolate the missing data with quantifiable uncertainty. The results indicate that the proposed sensor data fusion framework significantly aids the subsequent perception steps, as illustrated by the performance improvement of an uncertainty-aware free space detection algorithm. |
Tasks | Autonomous Vehicles |
Published | 2017-10-17 |
URL | http://arxiv.org/abs/1710.06230v3 |
http://arxiv.org/pdf/1710.06230v3.pdf | |
PWC | https://paperswithcode.com/paper/robust-fusion-of-lidar-and-wide-angle-camera |
Repo | |
Framework | |
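A minimal sketch of the resolution-matching idea described above, not the authors' implementation: Gaussian Process regression interpolates sparse LiDAR-like range samples onto a denser grid and reports a predictive standard deviation as the quantifiable uncertainty. The kernel choice and the synthetic scan-line data are assumptions for illustration.

```python
# Generic sketch (not the authors' implementation): GP regression interpolating
# sparse LiDAR-like range samples onto a denser grid, with predictive std.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(1)
# Sparse LiDAR returns along a scan line: angle (deg) -> range (m), synthetic data.
angles = np.sort(rng.uniform(-60, 60, size=40))[:, None]
ranges = 10.0 + 2.0 * np.sin(np.deg2rad(angles.ravel()) * 3.0) + rng.normal(0, 0.05, 40)

kernel = 1.0 * RBF(length_scale=10.0) + WhiteKernel(noise_level=0.01)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gp.fit(angles, ranges)

# Query at the (denser) camera pixel resolution.
query = np.linspace(-60, 60, 500)[:, None]
mean, std = gp.predict(query, return_std=True)

# Downstream free-space logic could, for example, ignore cells whose uncertainty is too high.
reliable = std < 0.2
print(f"{reliable.mean():.0%} of interpolated cells below the uncertainty threshold")
```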
On the Runtime-Efficacy Trade-off of Anomaly Detection Techniques for Real-Time Streaming Data
Title | On the Runtime-Efficacy Trade-off of Anomaly Detection Techniques for Real-Time Streaming Data |
Authors | Dhruv Choudhary, Arun Kejariwal, Francois Orsini |
Abstract | Ever growing volume and velocity of data coupled with decreasing attention span of end users underscore the critical need for real-time analytics. In this regard, anomaly detection plays a key role as an application as well as a means to verify data fidelity. Although the subject of anomaly detection has been researched for over 100 years in a multitude of disciplines such as, but not limited to, astronomy, statistics, manufacturing, econometrics, and marketing, most of the existing techniques cannot be used as is on real-time data streams. Further, the lack of characterization of performance – both with respect to real-timeliness and accuracy – on production data sets makes model selection very challenging. To this end, we present an in-depth analysis, geared towards real-time streaming data, of anomaly detection techniques. Given the requirements with respect to real-timeliness and accuracy, the analysis presented in this paper should serve as a guide for selection of the “best” anomaly detection technique. To the best of our knowledge, this is the first characterization of anomaly detection techniques proposed in a very diverse set of fields, using production data sets corresponding to a wide set of application domains. |
Tasks | Anomaly Detection, Model Selection |
Published | 2017-10-12 |
URL | http://arxiv.org/abs/1710.04735v1 |
http://arxiv.org/pdf/1710.04735v1.pdf | |
PWC | https://paperswithcode.com/paper/on-the-runtime-efficacy-trade-off-of-anomaly |
Repo | |
Framework | |
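A toy harness, not from the paper, showing the kind of runtime-efficacy measurement discussed above: it times a simple streaming baseline (a rolling z-score detector) per data point and scores its detections against injected anomalies.

```python
# Toy harness (not from the paper): per-point latency and detection quality
# for a simple streaming detector, here a rolling z-score baseline.
import time
from collections import deque
import numpy as np

def rolling_zscore_detector(stream, window=100, thresh=4.0):
    buf = deque(maxlen=window)
    flags, latencies = [], []
    for x in stream:
        t0 = time.perf_counter()
        if len(buf) >= 10:
            mu, sd = np.mean(buf), np.std(buf) + 1e-9
            flags.append(abs(x - mu) / sd > thresh)
        else:
            flags.append(False)
        buf.append(x)
        latencies.append(time.perf_counter() - t0)
    return np.array(flags), np.array(latencies)

rng = np.random.default_rng(2)
series = rng.normal(0, 1, 10_000)
anomaly_idx = rng.choice(len(series), 20, replace=False)
series[anomaly_idx] += 8.0                      # injected spikes

flags, lat = rolling_zscore_detector(series)
truth = np.zeros(len(series), dtype=bool); truth[anomaly_idx] = True
recall = (flags & truth).sum() / truth.sum()
precision = (flags & truth).sum() / max(flags.sum(), 1)
print(f"recall={recall:.2f} precision={precision:.2f} "
      f"median latency={np.median(lat) * 1e6:.1f} µs/point")
```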
Deep Speaker Feature Learning for Text-independent Speaker Verification
Title | Deep Speaker Feature Learning for Text-independent Speaker Verification |
Authors | Lantian Li, Yixiang Chen, Ying Shi, Zhiyuan Tang, Dong Wang |
Abstract | Recently deep neural networks (DNNs) have been used to learn speaker features. However, the quality of the learned features is not sufficiently good, so a complex back-end model, either neural or probabilistic, has to be used to address the residual uncertainty when applied to speaker verification, just as with raw features. This paper presents a convolutional time-delay deep neural network structure (CT-DNN) for speaker feature learning. Our experimental results on the Fisher database demonstrated that this CT-DNN can produce high-quality speaker features: even with a single feature (0.3 seconds including the context), the EER can be as low as 7.68%. This effectively confirmed that the speaker trait is largely a deterministic short-time property rather than a long-time distributional pattern, and therefore can be extracted from just dozens of frames. |
Tasks | Speaker Verification, Text-Independent Speaker Verification |
Published | 2017-05-10 |
URL | http://arxiv.org/abs/1705.03670v1 |
http://arxiv.org/pdf/1705.03670v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-speaker-feature-learning-for-text |
Repo | |
Framework | |
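For reference, the EER figure quoted above can be computed from verification trial scores as the operating point where false-accept and false-reject rates coincide; the sketch below uses simulated scores, not the paper's data.

```python
# Sketch of how an equal error rate (EER) is computed from verification trial
# scores; the target / non-target scores below are simulated, not the paper's.
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(3)
target_scores = rng.normal(1.0, 1.0, 1000)      # same-speaker trials
nontarget_scores = rng.normal(-1.0, 1.0, 1000)  # different-speaker trials

scores = np.concatenate([target_scores, nontarget_scores])
labels = np.concatenate([np.ones(1000), np.zeros(1000)])

fpr, tpr, _ = roc_curve(labels, scores)
fnr = 1.0 - tpr
eer_idx = np.nanargmin(np.abs(fnr - fpr))       # operating point where FPR ≈ FNR
eer = (fpr[eer_idx] + fnr[eer_idx]) / 2.0
print(f"EER ≈ {eer:.2%}")
```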
Deep Learning: A Bayesian Perspective
Title | Deep Learning: A Bayesian Perspective |
Authors | Nicholas Polson, Vadim Sokolov |
Abstract | Deep learning is a form of machine learning for nonlinear high dimensional pattern matching and prediction. By taking a Bayesian probabilistic perspective, we provide a number of insights into more efficient algorithms for optimisation and hyper-parameter tuning. Traditional high-dimensional data reduction techniques, such as principal component analysis (PCA), partial least squares (PLS), reduced rank regression (RRR), and projection pursuit regression (PPR), are all shown to be shallow learners. Their deep learning counterparts exploit multiple deep layers of data reduction which provide predictive performance gains. Stochastic gradient descent (SGD) training optimisation and Dropout (DO) regularization provide estimation and variable selection. Bayesian regularization is central to finding weights and connections in networks to optimize the predictive bias-variance trade-off. To illustrate our methodology, we provide an analysis of international bookings on Airbnb. Finally, we conclude with directions for future research. |
Tasks | |
Published | 2017-06-01 |
URL | http://arxiv.org/abs/1706.00473v4 |
http://arxiv.org/pdf/1706.00473v4.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-a-bayesian-perspective |
Repo | |
Framework | |
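A small numpy illustration of the paper's observation that PCA is a shallow learner: a single linear encode/decode layer built from the leading right singular vectors, with no stacking and no nonlinearity. The data here are synthetic.

```python
# Small numpy illustration of "PCA is a shallow learner": one linear
# encode/decode layer obtained from the leading principal directions.
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(500, 20)) @ rng.normal(size=(20, 20))   # correlated features
Xc = X - X.mean(axis=0)

# One "layer": project onto the top-k principal directions and back.
k = 5
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
W_enc = Vt[:k].T              # 20 -> 5 encoder weights
W_dec = Vt[:k]                # 5 -> 20 decoder weights
Z = Xc @ W_enc                # latent code (no nonlinearity, no stacking)
X_hat = Z @ W_dec

rel_err = np.linalg.norm(Xc - X_hat) / np.linalg.norm(Xc)
print(f"rank-{k} PCA reconstruction error: {rel_err:.3f}")
```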
Day-Ahead Solar Forecasting Based on Multi-level Solar Measurements
Title | Day-Ahead Solar Forecasting Based on Multi-level Solar Measurements |
Authors | Mohana Alanazi, Mohsen Mahoor, Amin Khodaei |
Abstract | The growing proliferation in solar deployment, especially at distribution level, has made the case for power system operators to develop more accurate solar forecasting models. This paper proposes a solar photovoltaic (PV) generation forecasting model based on multi-level solar measurements and utilizing a nonlinear autoregressive with exogenous input (NARX) model to improve the training and achieve better forecasts. The proposed model consists of four stages: data preparation, establishment of the fitting model, model training, and forecasting. The model is tested under different weather conditions. Numerical simulations exhibit the acceptable performance of the model when compared to forecasting results obtained from two-level and single-level studies. |
Tasks | |
Published | 2017-10-10 |
URL | http://arxiv.org/abs/1710.03803v1 |
http://arxiv.org/pdf/1710.03803v1.pdf | |
PWC | https://paperswithcode.com/paper/day-ahead-solar-forecasting-based-on-multi |
Repo | |
Framework | |
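A generic NARX-style sketch, not the paper's model: the next PV output is regressed on its own recent lags plus lagged exogenous input (irradiance here), using a small MLP. The synthetic series and lag orders are assumptions for illustration.

```python
# Generic NARX-style sketch (not the paper's model): predict the next PV output
# from lagged outputs plus lagged exogenous inputs, using a small MLP regressor.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(5)
n = 2000
irradiance = np.clip(np.sin(np.linspace(0, 80, n)) + rng.normal(0, 0.1, n), 0, None)
pv = 0.9 * irradiance + 0.05 * rng.normal(size=n)            # synthetic PV output

def make_narx_dataset(y, x, ny=3, nx=3):
    """Rows of [y_{t-1..t-ny}, x_{t-1..t-nx}] with target y_t."""
    rows, targets = [], []
    p = max(ny, nx)
    for t in range(p, len(y)):
        rows.append(np.r_[y[t - ny:t][::-1], x[t - nx:t][::-1]])
        targets.append(y[t])
    return np.array(rows), np.array(targets)

X, y = make_narx_dataset(pv, irradiance)
split = int(0.8 * len(X))
model = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)
model.fit(X[:split], y[:split])
rmse = np.sqrt(np.mean((model.predict(X[split:]) - y[split:]) ** 2))
print(f"one-step-ahead RMSE on held-out data: {rmse:.4f}")
```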
Provenance Filtering for Multimedia Phylogeny
Title | Provenance Filtering for Multimedia Phylogeny |
Authors | Allan Pinto, Daniel Moreira, Aparna Bharati, Joel Brogan, Kevin Bowyer, Patrick Flynn, Walter Scheirer, Anderson Rocha |
Abstract | Departing from traditional digital forensics modeling, which seeks to analyze single objects in isolation, multimedia phylogeny analyzes the evolutionary processes that influence digital objects and collections over time. One of its integral pieces is provenance filtering, which consists of searching a potentially large pool of objects for the most related ones with respect to a given query, in terms of possible ancestors (donors or contributors) and descendants. In this paper, we propose a two-tiered provenance filtering approach to find all the potential images that might have contributed to the creation process of a given query $q$. In our solution, the first (coarse) tier aims to find the most likely “host” images — the major donor or background — contributing to a composite/doctored image. The search is then refined in the second tier, in which we search for more specific (potentially small) parts of the query that might have been extracted from other images and spliced into the query image. Experimental results with a dataset containing more than a million images show that the two-tiered solution underpinned by the context of the query is highly useful for solving this difficult task. |
Tasks | |
Published | 2017-06-01 |
URL | http://arxiv.org/abs/1706.00447v1 |
http://arxiv.org/pdf/1706.00447v1.pdf | |
PWC | https://paperswithcode.com/paper/provenance-filtering-for-multimedia-phylogeny |
Repo | |
Framework | |
Parallelizing Over Artificial Neural Network Training Runs with Multigrid
Title | Parallelizing Over Artificial Neural Network Training Runs with Multigrid |
Authors | Jacob B. Schroder |
Abstract | Artificial neural networks are a popular and effective machine learning technique. Great progress has been made parallelizing the expensive training phase of an individual network, leading to highly specialized pieces of hardware, many based on GPU-type architectures, and more concurrent algorithms such as synthetic gradients. However, the training phase continues to be a bottleneck, where the training data must be processed serially over thousands of individual training runs. This work considers a multigrid reduction in time (MGRIT) algorithm that is able to parallelize over the thousands of training runs and converge to the exact same solution as traditional training would provide. MGRIT was originally developed to provide parallelism for time evolution problems that serially step through a finite number of time-steps. This work recasts the training of a neural network similarly, treating neural network training as an evolution equation that evolves the network weights from one step to the next. Thus, this work concerns distributed computing approaches for neural networks, but is distinct from other approaches which seek to parallelize only over individual training runs. The work concludes with supporting numerical results for two model problems. |
Tasks | |
Published | 2017-08-07 |
URL | http://arxiv.org/abs/1708.02276v2 |
http://arxiv.org/pdf/1708.02276v2.pdf | |
PWC | https://paperswithcode.com/paper/parallelizing-over-artificial-neural-network |
Repo | |
Framework | |
Sparse principal component analysis via axis-aligned random projections
Title | Sparse principal component analysis via axis-aligned random projections |
Authors | Milana Gataric, Tengyao Wang, Richard J. Samworth |
Abstract | We introduce a new method for sparse principal component analysis, based on the aggregation of eigenvector information from carefully-selected axis-aligned random projections of the sample covariance matrix. Unlike most alternative approaches, our algorithm is non-iterative, so is not vulnerable to a bad choice of initialisation. We provide theoretical guarantees under which our principal subspace estimator can attain the minimax optimal rate of convergence in polynomial time. In addition, our theory provides a more refined understanding of the statistical and computational trade-off in the problem of sparse principal component estimation, revealing a subtle interplay between the effective sample size and the number of random projections that are required to achieve the minimax optimal rate. Numerical studies provide further insight into the procedure and confirm its highly competitive finite-sample performance. |
Tasks | |
Published | 2017-12-15 |
URL | https://arxiv.org/abs/1712.05630v4 |
https://arxiv.org/pdf/1712.05630v4.pdf | |
PWC | https://paperswithcode.com/paper/sparse-principal-component-analysis-via |
Repo | |
Framework | |
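A simplified sketch in the spirit of the method, not the authors' exact algorithm: sample axis-aligned coordinate subsets, take the leading eigenvector of each covariance submatrix, aggregate per-coordinate evidence, and re-estimate the principal direction on the selected support. The scoring and aggregation rules below are simplifications chosen for illustration.

```python
# Simplified sketch in the spirit of the method (not the authors' exact
# procedure): aggregate leading-eigenvector information from axis-aligned
# random submatrices of the sample covariance to estimate a sparse direction.
import numpy as np

rng = np.random.default_rng(6)
p, n, k = 50, 200, 5
v_true = np.zeros(p); v_true[:k] = 1 / np.sqrt(k)            # sparse spike
X = rng.normal(size=(n, p)) + 3.0 * rng.normal(size=(n, 1)) * v_true
Sigma = np.cov(X, rowvar=False)

d = 10                      # size of each axis-aligned projection (assumed)
n_proj = 500
scores = np.zeros(p)
for _ in range(n_proj):
    idx = rng.choice(p, size=d, replace=False)
    sub = Sigma[np.ix_(idx, idx)]
    w, V = np.linalg.eigh(sub)
    lead = V[:, -1]
    # Credit coordinates by their weight in the submatrix's leading eigenvector,
    # scaled by how much variance that projection explains.
    scores[idx] += w[-1] * lead ** 2

support = np.argsort(scores)[-k:]                # keep the k best-scoring axes
w, V = np.linalg.eigh(Sigma[np.ix_(support, support)])
v_hat = np.zeros(p); v_hat[support] = V[:, -1]
print("recovered support:", np.sort(support))
print("|<v_hat, v_true>| =", round(abs(v_hat @ v_true), 3))
```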
Automated text summarisation and evidence-based medicine: A survey of two domains
Title | Automated text summarisation and evidence-based medicine: A survey of two domains |
Authors | Abeed Sarker, Diego Molla, Cecile Paris |
Abstract | The practice of evidence-based medicine (EBM) urges medical practitioners to utilise the latest research evidence when making clinical decisions. Because of the massive and growing volume of published research on various medical topics, practitioners often find themselves overloaded with information. As such, natural language processing research has recently commenced exploring techniques for medical domain-specific automated text summarisation (ATS), targeted towards the task of condensing large medical texts. However, the development of effective summarisation techniques for this task requires cross-domain knowledge. We present a survey of EBM, the domain-specific needs for EBM, automated summarisation techniques, and how they have been applied hitherto. We envision that this survey will serve as a first resource for the development of future operational text summarisation techniques for EBM. |
Tasks | |
Published | 2017-06-25 |
URL | http://arxiv.org/abs/1706.08162v1 |
http://arxiv.org/pdf/1706.08162v1.pdf | |
PWC | https://paperswithcode.com/paper/automated-text-summarisation-and-evidence |
Repo | |
Framework | |
Revealing Hidden Potentials of the q-Space Signal in Breast Cancer
Title | Revealing Hidden Potentials of the q-Space Signal in Breast Cancer |
Authors | Paul Jaeger, Sebastian Bickelhaupt, Frederik Bernd Laun, Wolfgang Lederer, Daniel Heidi, Tristan Anselm Kuder, Daniel Paech, David Bonekamp, Alexander Radbruch, Stefan Delorme, Heinz-Peter Schlemmer, Franziska Steudle, Klaus H. Maier-Hein |
Abstract | Mammography screening for early detection of breast lesions currently suffers from high amounts of false positive findings, which result in unnecessary invasive biopsies. Diffusion-weighted MR images (DWI) can help to reduce many of these false-positive findings prior to biopsy. Current approaches estimate tissue properties by means of quantitative parameters taken from generative, biophysical models fit to the q-space encoded signal under certain assumptions regarding noise and spatial homogeneity. This process is prone to fitting instability and partial information loss due to model simplicity. We reveal unexplored potentials of the signal by integrating all data processing components into a convolutional neural network (CNN) architecture that is designed to propagate clinical target information down to the raw input images. This approach enables simultaneous and target-specific optimization of image normalization, signal exploitation, global representation learning and classification. Using a multicentric data set of 222 patients, we demonstrate that our approach significantly improves clinical decision making with respect to the current state of the art. |
Tasks | Decision Making, Representation Learning |
Published | 2017-02-27 |
URL | http://arxiv.org/abs/1702.08379v3 |
http://arxiv.org/pdf/1702.08379v3.pdf | |
PWC | https://paperswithcode.com/paper/revealing-hidden-potentials-of-the-q-space |
Repo | |
Framework | |
Fast Learning and Prediction for Object Detection using Whitened CNN Features
Title | Fast Learning and Prediction for Object Detection using Whitened CNN Features |
Authors | Björn Barz, Erik Rodner, Christoph Käding, Joachim Denzler |
Abstract | We combine features extracted from pre-trained convolutional neural networks (CNNs) with the fast, linear Exemplar-LDA classifier to get the advantages of both: the high detection performance of CNNs, automatic feature engineering, fast model learning from few training samples and efficient sliding-window detection. The Adaptive Real-Time Object Detection System (ARTOS) has been refactored broadly to be used in combination with Caffe for the experimental studies reported in this work. |
Tasks | Feature Engineering, Object Detection, Real-Time Object Detection, Window Detection |
Published | 2017-04-10 |
URL | http://arxiv.org/abs/1704.02930v2 |
http://arxiv.org/pdf/1704.02930v2.pdf | |
PWC | https://paperswithcode.com/paper/fast-learning-and-prediction-for-object |
Repo | |
Framework | |
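In its standard form, the Exemplar-LDA component used above reduces to estimating background statistics once and scoring with w = Sigma^{-1} (mu_pos - mu_0), which is what makes model learning fast. The sketch below shows that computation with random vectors standing in for whitened CNN features; it is a reading of the general technique, not code from the ARTOS project.

```python
# Core idea behind (Exemplar-)LDA on whitened features: with background mean mu0
# and shared covariance Sigma estimated once, a detector for a positive class is
# just w = Sigma^{-1} (mu_pos - mu0). Random vectors stand in for CNN features.
import numpy as np

rng = np.random.default_rng(7)
dim = 256
# "Background" statistics, normally estimated once from a large feature pool.
bg = rng.normal(size=(10000, dim))
mu0 = bg.mean(axis=0)
Sigma = np.cov(bg, rowvar=False) + 1e-3 * np.eye(dim)        # regularized

# A handful of positive exemplars (placeholder for whitened CNN activations).
pos = rng.normal(loc=0.5, size=(20, dim))
mu_pos = pos.mean(axis=0)

w = np.linalg.solve(Sigma, mu_pos - mu0)                     # fast: one linear solve
b = -0.5 * w @ (mu_pos + mu0)

score = lambda x: x @ w + b
print("mean score, positives :", round(float(score(pos).mean()), 2))
print("mean score, background:", round(float(score(bg[:1000]).mean()), 2))
```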
Fostering User Engagement: Rhetorical Devices for Applause Generation Learnt from TED Talks
Title | Fostering User Engagement: Rhetorical Devices for Applause Generation Learnt from TED Talks |
Authors | Zhe Liu, Anbang Xu, Mengdi Zhang, Jalal Mahmud, Vibha Sinha |
Abstract | One problem that every presenter faces when delivering a public discourse is how to hold the listeners’ attention or to keep them involved. Therefore, many studies in conversation analysis work on this issue and qualitatively suggest constructions that can effectively lead to audience applause. To investigate these proposals quantitatively, in this study we analyze the transcripts of 2,135 TED Talks, with a particular focus on the rhetorical devices that are used by the presenters for applause elicitation. Through conducting regression analysis, we identify and interpret 24 rhetorical devices as triggers of audience applause. We further build models that can recognize applause-evoking sentences and conclude this work with potential implications. |
Tasks | |
Published | 2017-03-17 |
URL | http://arxiv.org/abs/1704.02362v2 |
http://arxiv.org/pdf/1704.02362v2.pdf | |
PWC | https://paperswithcode.com/paper/fostering-user-engagement-rhetorical-devices |
Repo | |
Framework | |
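A minimal sketch, with invented toy sentences and labels, of the final modelling step mentioned above: a bag-of-words classifier that flags sentences likely to be followed by applause.

```python
# Minimal sketch (invented toy data, not the paper's features) of a classifier
# that flags sentences likely to be followed by applause.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

sentences = [
    "thank you so much for having me here tonight",
    "and that is why we must act together now",
    "the data were collected over a period of three years",
    "give it up for every teacher in this room",
    "next slide please",
    "we can end this epidemic in our lifetime",
]
applause = [1, 1, 0, 1, 0, 1]          # 1 = followed by applause (toy labels)

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(sentences, applause)
print(clf.predict_proba(["let us thank everyone who made this possible"])[:, 1])
```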
Hypothesis Testing based Intrinsic Evaluation of Word Embeddings
Title | Hypothesis Testing based Intrinsic Evaluation of Word Embeddings |
Authors | Nishant Gurnani |
Abstract | We introduce the cross-match test - an exact, distribution free, high-dimensional hypothesis test as an intrinsic evaluation metric for word embeddings. We show that cross-match is an effective means of measuring distributional similarity between different vector representations and of evaluating the statistical significance of different vector embedding models. Additionally, we find that cross-match can be used to provide a quantitative measure of linguistic similarity for selecting bridge languages for machine translation. We demonstrate that the results of the hypothesis test align with our expectations and note that the framework of two sample hypothesis testing is not limited to word embeddings and can be extended to all vector representations. |
Tasks | Machine Translation, Word Embeddings |
Published | 2017-09-04 |
URL | http://arxiv.org/abs/1709.00831v1 |
http://arxiv.org/pdf/1709.00831v1.pdf | |
PWC | https://paperswithcode.com/paper/hypothesis-testing-based-intrinsic-evaluation |
Repo | |
Framework | |
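A sketch of the cross-match statistic on toy Gaussian "embeddings", assuming networkx and scipy are available: pool the two samples, build a minimum-distance perfect matching over the pooled points, and count matched pairs that straddle the two samples. This follows the standard definition of the test rather than the author's code.

```python
# Sketch of the cross-match statistic (toy data, not the paper's code): pool two
# samples, compute a minimum-distance perfect matching on the pooled set, and
# count how many matched pairs straddle the two samples.
import numpy as np
import networkx as nx
from scipy.spatial.distance import cdist

rng = np.random.default_rng(8)
A = rng.normal(0.0, 1.0, size=(20, 50))        # "embeddings" from model A
B = rng.normal(0.3, 1.0, size=(20, 50))        # "embeddings" from model B
pooled = np.vstack([A, B])
group = np.array([0] * len(A) + [1] * len(B))

D = cdist(pooled, pooled)
G = nx.Graph()
for i in range(len(pooled)):
    for j in range(i + 1, len(pooled)):
        # Negate distances so that a max-weight matching is a min-distance one.
        G.add_edge(i, j, weight=-D[i, j])

matching = nx.max_weight_matching(G, maxcardinality=True)
cross = sum(group[i] != group[j] for i, j in matching)
print(f"cross-matches: {cross} out of {len(matching)} pairs")
# Few cross-matches, relative to what chance would give, suggests the two
# sets of embeddings are distributed differently.
```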
Cooperative Training of Deep Aggregation Networks for RGB-D Action Recognition
Title | Cooperative Training of Deep Aggregation Networks for RGB-D Action Recognition |
Authors | Pichao Wang, Wanqing Li, Jun Wan, Philip Ogunbona, Xinwang Liu |
Abstract | A novel deep neural network training paradigm that exploits the conjoint information in multiple heterogeneous sources is proposed. Specifically, in an RGB-D based action recognition task, it cooperatively trains a single convolutional neural network (named c-ConvNet) on both RGB visual features and depth features, and deeply aggregates the two kinds of features for action recognition. Unlike the conventional ConvNet, which learns deep separable features for homogeneous modality-based classification with only one softmax loss function, the c-ConvNet enhances the discriminative power of the deeply learned features and weakens the undesired modality discrepancy by jointly optimizing a ranking loss and a softmax loss for both homogeneous and heterogeneous modalities. The ranking loss consists of intra-modality and cross-modality triplet losses, and it reduces both the intra-modality and cross-modality feature variations. Furthermore, the correlations between RGB and depth data are embedded in the c-ConvNet, and can be retrieved by either of the modalities and contribute to recognition even when only one of the modalities is available. The proposed method was extensively evaluated on two large RGB-D action recognition datasets, ChaLearn LAP IsoGD and NTU RGB+D, and one small dataset, SYSU 3D HOI, and achieved state-of-the-art results. |
Tasks | Temporal Action Localization |
Published | 2017-12-05 |
URL | http://arxiv.org/abs/1801.01080v1 |
http://arxiv.org/pdf/1801.01080v1.pdf | |
PWC | https://paperswithcode.com/paper/cooperative-training-of-deep-aggregation |
Repo | |
Framework | |
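A numpy sketch, with invented shapes, margin and equal weighting, of the loss combination described above: a softmax cross-entropy term plus intra-modality and cross-modality triplet terms over paired RGB and depth embeddings.

```python
# Numpy sketch (values, margin and weighting invented) of combining a softmax
# loss with intra- and cross-modality triplet losses over RGB / depth embeddings.
import numpy as np

rng = np.random.default_rng(9)

def softmax_xent(logits, labels):
    z = logits - logits.max(axis=1, keepdims=True)
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(labels)), labels].mean()

def triplet(anchor, positive, negative, margin=0.5):
    d_ap = np.linalg.norm(anchor - positive, axis=1)
    d_an = np.linalg.norm(anchor - negative, axis=1)
    return np.maximum(0.0, d_ap - d_an + margin).mean()

# Pretend embeddings / logits coming out of the shared network for one batch.
n, dim, n_cls = 8, 64, 10
rgb = rng.normal(size=(n, dim)); depth = rng.normal(size=(n, dim))
rgb_pos = rgb + 0.1 * rng.normal(size=(n, dim))      # another sample, same class
depth_pos = depth + 0.1 * rng.normal(size=(n, dim))
rgb_neg = rng.normal(size=(n, dim)); depth_neg = rng.normal(size=(n, dim))
logits = rng.normal(size=(n, n_cls)); labels = rng.integers(0, n_cls, size=n)

loss = (
    softmax_xent(logits, labels)
    + triplet(rgb, rgb_pos, rgb_neg)         # intra-modality, RGB
    + triplet(depth, depth_pos, depth_neg)   # intra-modality, depth
    + triplet(rgb, depth_pos, depth_neg)     # cross-modality: RGB anchor, depth pos/neg
    + triplet(depth, rgb_pos, rgb_neg)       # cross-modality: depth anchor, RGB pos/neg
)
print(f"combined loss for this toy batch: {loss:.3f}")
```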