Paper Group ANR 678
Comparison of Multiple Features and Modeling Methods for Text-dependent Speaker Verification
Title | Comparison of Multiple Features and Modeling Methods for Text-dependent Speaker Verification |
Authors | Yi Liu, Liang He, Yao Tian, Zhuzi Chen, Jia Liu, Michael T. Johnson |
Abstract | Text-dependent speaker verification is becoming popular in the speaker recognition community. However, the conventional i-vector framework, which has been successful for speaker identification and other similar tasks, works relatively poorly in this task. Researchers have proposed several new methods to improve performance, but it is still unclear which model is the best choice, especially when the pass-phrases are prompted during enrollment and test. In this paper, we introduce four modeling methods and compare their performance on the newly published RedDots dataset. To further explore the influence of different frame alignments, Viterbi and forward-backward algorithms are both used in the HMM-based models. Several bottleneck features are also investigated. Our experiments show that, by explicitly modeling the lexical content, the HMM-based modeling achieves good results in the fixed-phrase condition. In the prompted-phrase condition, GMM-HMM and i-vector/HMM are not as successful. In both conditions, the forward-backward algorithm brings more benefits to the i-vector/HMM system. Additionally, we also find that even though bottleneck features perform well for text-independent speaker verification, they do not outperform MFCCs on the most challenging Imposter-Correct trials on RedDots. |
Tasks | Speaker Identification, Speaker Recognition, Speaker Verification, Text-Dependent Speaker Verification, Text-Independent Speaker Verification |
Published | 2017-07-14 |
URL | http://arxiv.org/abs/1707.04373v2 |
http://arxiv.org/pdf/1707.04373v2.pdf | |
PWC | https://paperswithcode.com/paper/comparison-of-multiple-features-and-modeling |
Repo | |
Framework | |
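A minimal numpy sketch, with invented transition and emission values rather than the paper's setup, of the two frame-alignment strategies compared above: a hard Viterbi path versus soft forward-backward state occupancies on a toy left-to-right HMM.

```python
# Toy illustration (not the paper's code): hard Viterbi alignment vs. soft
# forward-backward state occupancies for a small left-to-right HMM.
import numpy as np
from scipy.special import logsumexp

rng = np.random.default_rng(0)
T, S = 20, 3                                   # frames, HMM states
log_b = rng.normal(size=(T, S))                # fake per-frame emission log-likelihoods
log_pi = np.log([1.0, 1e-10, 1e-10])           # start in state 0
A = np.array([[0.8, 0.2, 0.0],                 # left-to-right transitions
              [0.0, 0.8, 0.2],
              [0.0, 0.0, 1.0]])
log_A = np.log(A + 1e-300)

# Viterbi: single best state sequence (hard alignment).
delta = log_pi + log_b[0]
back = np.zeros((T, S), dtype=int)
for t in range(1, T):
    scores = delta[:, None] + log_A            # scores[i, j]: best path in state i, then i->j
    back[t] = scores.argmax(axis=0)
    delta = scores.max(axis=0) + log_b[t]
path = [int(delta.argmax())]
for t in range(T - 1, 0, -1):
    path.append(int(back[t, path[-1]]))
viterbi_path = np.array(path[::-1])

# Forward-backward: per-frame state posteriors (soft alignment).
log_alpha = np.zeros((T, S)); log_beta = np.zeros((T, S))
log_alpha[0] = log_pi + log_b[0]
for t in range(1, T):
    log_alpha[t] = logsumexp(log_alpha[t - 1][:, None] + log_A, axis=0) + log_b[t]
for t in range(T - 2, -1, -1):
    log_beta[t] = logsumexp(log_A + (log_b[t + 1] + log_beta[t + 1])[None, :], axis=1)
log_gamma = log_alpha + log_beta
gamma = np.exp(log_gamma - logsumexp(log_gamma, axis=1, keepdims=True))

print("Viterbi alignment :", viterbi_path)
print("FB occupancy (t=0):", np.round(gamma[0], 3))
```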
Robust Fusion of LiDAR and Wide-Angle Camera Data for Autonomous Mobile Robots
Title | Robust Fusion of LiDAR and Wide-Angle Camera Data for Autonomous Mobile Robots |
Authors | Varuna De Silva, Jamie Roche, Ahmet Kondoz |
Abstract | Autonomous robots that assist humans in day-to-day living tasks are becoming increasingly popular. Autonomous mobile robots operate by sensing and perceiving their surrounding environment to make accurate driving decisions. A combination of several different sensors such as LiDAR, radar, ultrasound sensors and cameras is utilized to sense the surrounding environment of autonomous vehicles. These heterogeneous sensors simultaneously capture various physical attributes of the environment. Such multimodality and redundancy of sensing need to be positively utilized for reliable and consistent perception of the environment through sensor data fusion. However, these multimodal sensor data streams are different from each other in many ways, such as temporal and spatial resolution, data format, and geometric alignment. For the subsequent perception algorithms to utilize the diversity offered by multimodal sensing, the data streams need to be spatially, geometrically and temporally aligned with each other. In this paper, we address the problem of fusing the outputs of a Light Detection and Ranging (LiDAR) scanner and a wide-angle monocular image sensor for free space detection. The outputs of the LiDAR scanner and the image sensor are of different spatial resolutions and need to be aligned with each other. A geometrical model is used to spatially align the two sensor outputs, followed by a Gaussian Process (GP) regression-based resolution matching algorithm to interpolate the missing data with quantifiable uncertainty. The results indicate that the proposed sensor data fusion framework significantly aids the subsequent perception steps, as illustrated by the performance improvement of an uncertainty-aware free space detection algorithm. |
Tasks | Autonomous Vehicles |
Published | 2017-10-17 |
URL | http://arxiv.org/abs/1710.06230v3 |
http://arxiv.org/pdf/1710.06230v3.pdf | |
PWC | https://paperswithcode.com/paper/robust-fusion-of-lidar-and-wide-angle-camera |
Repo | |
Framework | |
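A minimal sketch of the resolution-matching idea described above, not the authors' implementation: Gaussian Process regression interpolates sparse LiDAR-like range samples onto a denser grid and reports a predictive standard deviation as the quantifiable uncertainty. The kernel choice and the synthetic scan-line data are assumptions for illustration.

```python
# Generic sketch (not the authors' implementation): GP regression interpolating
# sparse LiDAR-like range samples onto a denser grid, with predictive std.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(1)
# Sparse LiDAR returns along a scan line: angle (deg) -> range (m), synthetic data.
angles = np.sort(rng.uniform(-60, 60, size=40))[:, None]
ranges = 10.0 + 2.0 * np.sin(np.deg2rad(angles.ravel()) * 3.0) + rng.normal(0, 0.05, 40)

kernel = 1.0 * RBF(length_scale=10.0) + WhiteKernel(noise_level=0.01)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gp.fit(angles, ranges)

# Query at the (denser) camera pixel resolution.
query = np.linspace(-60, 60, 500)[:, None]
mean, std = gp.predict(query, return_std=True)

# Downstream free-space logic could, for example, ignore cells whose uncertainty is too high.
reliable = std < 0.2
print(f"{reliable.mean():.0%} of interpolated cells below the uncertainty threshold")
```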
On the Runtime-Efficacy Trade-off of Anomaly Detection Techniques for Real-Time Streaming Data
Title | On the Runtime-Efficacy Trade-off of Anomaly Detection Techniques for Real-Time Streaming Data |
Authors | Dhruv Choudhary, Arun Kejariwal, Francois Orsini |
Abstract | Ever growing volume and velocity of data coupled with decreasing attention span of end users underscore the critical need for real-time analytics. In this regard, anomaly detection plays a key role as an application as well as a means to verify data fidelity. Although the subject of anomaly detection has been researched for over 100 years in a multitude of disciplines such as, but not limited to, astronomy, statistics, manufacturing, econometrics, and marketing, most of the existing techniques cannot be used as is on real-time data streams. Further, the lack of characterization of performance – both with respect to real-timeliness and accuracy – on production data sets makes model selection very challenging. To this end, we present an in-depth analysis, geared towards real-time streaming data, of anomaly detection techniques. Given the requirements with respect to real-timeliness and accuracy, the analysis presented in this paper should serve as a guide for selection of the “best” anomaly detection technique. To the best of our knowledge, this is the first characterization of anomaly detection techniques proposed in a very diverse set of fields, using production data sets corresponding to a wide set of application domains. |
Tasks | Anomaly Detection, Model Selection |
Published | 2017-10-12 |
URL | http://arxiv.org/abs/1710.04735v1 |
http://arxiv.org/pdf/1710.04735v1.pdf | |
PWC | https://paperswithcode.com/paper/on-the-runtime-efficacy-trade-off-of-anomaly |
Repo | |
Framework | |
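A toy harness, not from the paper, showing the kind of runtime-efficacy measurement discussed above: it times a simple streaming baseline (a rolling z-score detector) per data point and scores its detections against injected anomalies.

```python
# Toy harness (not from the paper): per-point latency and detection quality
# for a simple streaming detector, here a rolling z-score baseline.
import time
from collections import deque
import numpy as np

def rolling_zscore_detector(stream, window=100, thresh=4.0):
    buf = deque(maxlen=window)
    flags, latencies = [], []
    for x in stream:
        t0 = time.perf_counter()
        if len(buf) >= 10:
            mu, sd = np.mean(buf), np.std(buf) + 1e-9
            flags.append(abs(x - mu) / sd > thresh)
        else:
            flags.append(False)
        buf.append(x)
        latencies.append(time.perf_counter() - t0)
    return np.array(flags), np.array(latencies)

rng = np.random.default_rng(2)
series = rng.normal(0, 1, 10_000)
anomaly_idx = rng.choice(len(series), 20, replace=False)
series[anomaly_idx] += 8.0                      # injected spikes

flags, lat = rolling_zscore_detector(series)
truth = np.zeros(len(series), dtype=bool); truth[anomaly_idx] = True
recall = (flags & truth).sum() / truth.sum()
precision = (flags & truth).sum() / max(flags.sum(), 1)
print(f"recall={recall:.2f} precision={precision:.2f} "
      f"median latency={np.median(lat) * 1e6:.1f} µs/point")
```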
Deep Speaker Feature Learning for Text-independent Speaker Verification
Title | Deep Speaker Feature Learning for Text-independent Speaker Verification |
Authors | Lantian Li, Yixiang Chen, Ying Shi, Zhiyuan Tang, Dong Wang |
Abstract | Recently deep neural networks (DNNs) have been used to learn speaker features. However, the quality of the learned features is not sufficiently good, so a complex back-end model, either neural or probabilistic, has to be used to address the residual uncertainty when applied to speaker verification, just as with raw features. This paper presents a convolutional time-delay deep neural network structure (CT-DNN) for speaker feature learning. Our experimental results on the Fisher database demonstrated that this CT-DNN can produce high-quality speaker features: even with a single feature (0.3 seconds including the context), the EER can be as low as 7.68%. This effectively confirmed that the speaker trait is largely a deterministic short-time property rather than a long-time distributional pattern, and therefore can be extracted from just dozens of frames. |
Tasks | Speaker Verification, Text-Independent Speaker Verification |
Published | 2017-05-10 |
URL | http://arxiv.org/abs/1705.03670v1 |
http://arxiv.org/pdf/1705.03670v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-speaker-feature-learning-for-text |
Repo | |
Framework | |
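For reference, the EER figure quoted above can be computed from verification trial scores as the operating point where false-accept and false-reject rates coincide; the sketch below uses simulated scores, not the paper's data.

```python
# Sketch of how an equal error rate (EER) is computed from verification trial
# scores; the target / non-target scores below are simulated, not the paper's.
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(3)
target_scores = rng.normal(1.0, 1.0, 1000)      # same-speaker trials
nontarget_scores = rng.normal(-1.0, 1.0, 1000)  # different-speaker trials

scores = np.concatenate([target_scores, nontarget_scores])
labels = np.concatenate([np.ones(1000), np.zeros(1000)])

fpr, tpr, _ = roc_curve(labels, scores)
fnr = 1.0 - tpr
eer_idx = np.nanargmin(np.abs(fnr - fpr))       # operating point where FPR ≈ FNR
eer = (fpr[eer_idx] + fnr[eer_idx]) / 2.0
print(f"EER ≈ {eer:.2%}")
```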
Deep Learning: A Bayesian Perspective
Title | Deep Learning: A Bayesian Perspective |
Authors | Nicholas Polson, Vadim Sokolov |
Abstract | Deep learning is a form of machine learning for nonlinear high dimensional pattern matching and prediction. By taking a Bayesian probabilistic perspective, we provide a number of insights into more efficient algorithms for optimisation and hyper-parameter tuning. Traditional high-dimensional data reduction techniques, such as principal component analysis (PCA), partial least squares (PLS), reduced rank regression (RRR), and projection pursuit regression (PPR), are all shown to be shallow learners. Their deep learning counterparts exploit multiple deep layers of data reduction which provide predictive performance gains. Stochastic gradient descent (SGD) training optimisation and Dropout (DO) regularization provide estimation and variable selection. Bayesian regularization is central to finding weights and connections in networks to optimize the predictive bias-variance trade-off. To illustrate our methodology, we provide an analysis of international bookings on Airbnb. Finally, we conclude with directions for future research. |
Tasks | |
Published | 2017-06-01 |
URL | http://arxiv.org/abs/1706.00473v4 |
http://arxiv.org/pdf/1706.00473v4.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-a-bayesian-perspective |
Repo | |
Framework | |
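A small numpy illustration of the paper's observation that PCA is a shallow learner: a single linear encode/decode layer built from the leading right singular vectors, with no stacking and no nonlinearity. The data here are synthetic.

```python
# Small numpy illustration of "PCA is a shallow learner": one linear
# encode/decode layer obtained from the leading principal directions.
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(500, 20)) @ rng.normal(size=(20, 20))   # correlated features
Xc = X - X.mean(axis=0)

# One "layer": project onto the top-k principal directions and back.
k = 5
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
W_enc = Vt[:k].T              # 20 -> 5 encoder weights
W_dec = Vt[:k]                # 5 -> 20 decoder weights
Z = Xc @ W_enc                # latent code (no nonlinearity, no stacking)
X_hat = Z @ W_dec

rel_err = np.linalg.norm(Xc - X_hat) / np.linalg.norm(Xc)
print(f"rank-{k} PCA reconstruction error: {rel_err:.3f}")
```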
Day-Ahead Solar Forecasting Based on Multi-level Solar Measurements
Title | Day-Ahead Solar Forecasting Based on Multi-level Solar Measurements |
Authors | Mohana Alanazi, Mohsen Mahoor, Amin Khodaei |
Abstract | The growing proliferation in solar deployment, especially at distribution level, has made the case for power system operators to develop more accurate solar forecasting models. This paper proposes a solar photovoltaic (PV) generation forecasting model based on multi-level solar measurements and utilizing a nonlinear autoregressive with exogenous input (NARX) model to improve the training and achieve better forecasts. The proposed model consists of four stages: data preparation, establishment of the fitting model, model training, and forecasting. The model is tested under different weather conditions. Numerical simulations exhibit the acceptable performance of the model when compared to forecasting results obtained from two-level and single-level studies. |
Tasks | |
Published | 2017-10-10 |
URL | http://arxiv.org/abs/1710.03803v1 |
http://arxiv.org/pdf/1710.03803v1.pdf | |
PWC | https://paperswithcode.com/paper/day-ahead-solar-forecasting-based-on-multi |
Repo | |
Framework | |
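A generic NARX-style sketch, not the paper's model: the next PV output is regressed on its own recent lags plus lagged exogenous input (irradiance here), using a small MLP. The synthetic series and lag orders are assumptions for illustration.

```python
# Generic NARX-style sketch (not the paper's model): predict the next PV output
# from lagged outputs plus lagged exogenous inputs, using a small MLP regressor.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(5)
n = 2000
irradiance = np.clip(np.sin(np.linspace(0, 80, n)) + rng.normal(0, 0.1, n), 0, None)
pv = 0.9 * irradiance + 0.05 * rng.normal(size=n)            # synthetic PV output

def make_narx_dataset(y, x, ny=3, nx=3):
    """Rows of [y_{t-1..t-ny}, x_{t-1..t-nx}] with target y_t."""
    rows, targets = [], []
    p = max(ny, nx)
    for t in range(p, len(y)):
        rows.append(np.r_[y[t - ny:t][::-1], x[t - nx:t][::-1]])
        targets.append(y[t])
    return np.array(rows), np.array(targets)

X, y = make_narx_dataset(pv, irradiance)
split = int(0.8 * len(X))
model = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)
model.fit(X[:split], y[:split])
rmse = np.sqrt(np.mean((model.predict(X[split:]) - y[split:]) ** 2))
print(f"one-step-ahead RMSE on held-out data: {rmse:.4f}")
```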
Provenance Filtering for Multimedia Phylogeny
Title | Provenance Filtering for Multimedia Phylogeny |
Authors | Allan Pinto, Daniel Moreira, Aparna Bharati, Joel Brogan, Kevin Bowyer, Patrick Flynn, Walter Scheirer, Anderson Rocha |
Abstract | Departing from traditional digital forensics modeling, which seeks to analyze single objects in isolation, multimedia phylogeny analyzes the evolutionary processes that influence digital objects and collections over time. One of its integral pieces is provenance filtering, which consists of searching a potentially large pool of objects for the most related ones with respect to a given query, in terms of possible ancestors (donors or contributors) and descendants. In this paper, we propose a two-tiered provenance filtering approach to find all the potential images that might have contributed to the creation process of a given query $q$. In our solution, the first (coarse) tier aims to find the most likely “host” images — the major donor or background — contributing to a composite/doctored image. The search is then refined in the second tier, in which we search for more specific (potentially small) parts of the query that might have been extracted from other images and spliced into the query image. Experimental results with a dataset containing more than a million images show that the two-tiered solution underpinned by the context of the query is highly useful for solving this difficult task. |
Tasks | |
Published | 2017-06-01 |
URL | http://arxiv.org/abs/1706.00447v1 |
http://arxiv.org/pdf/1706.00447v1.pdf | |
PWC | https://paperswithcode.com/paper/provenance-filtering-for-multimedia-phylogeny |
Repo | |
Framework | |
Parallelizing Over Artificial Neural Network Training Runs with Multigrid
Title | Parallelizing Over Artificial Neural Network Training Runs with Multigrid |
Authors | Jacob B. Schroder |
Abstract | Artificial neural networks are a popular and effective machine learning technique. Great progress has been made parallelizing the expensive training phase of an individual network, leading to highly specialized pieces of hardware, many based on GPU-type architectures, and more concurrent algorithms such as synthetic gradients. However, the training phase continues to be a bottleneck, where the training data must be processed serially over thousands of individual training runs. This work considers a multigrid reduction in time (MGRIT) algorithm that is able to parallelize over the thousands of training runs and converge to the exact same solution as traditional training would provide. MGRIT was originally developed to provide parallelism for time evolution problems that serially step through a finite number of time-steps. This work recasts the training of a neural network similarly, treating neural network training as an evolution equation that evolves the network weights from one step to the next. Thus, this work concerns distributed computing approaches for neural networks, but is distinct from other approaches which seek to parallelize only over individual training runs. The work concludes with supporting numerical results for two model problems. |
Tasks | |
Published | 2017-08-07 |
URL | http://arxiv.org/abs/1708.02276v2 |
http://arxiv.org/pdf/1708.02276v2.pdf | |
PWC | https://paperswithcode.com/paper/parallelizing-over-artificial-neural-network |
Repo | |
Framework | |
Sparse principal component analysis via axis-aligned random projections
Title | Sparse principal component analysis via axis-aligned random projections |
Authors | Milana Gataric, Tengyao Wang, Richard J. Samworth |
Abstract | We introduce a new method for sparse principal component analysis, based on the aggregation of eigenvector information from carefully-selected axis-aligned random projections of the sample covariance matrix. Unlike most alternative approaches, our algorithm is non-iterative, so is not vulnerable to a bad choice of initialisation. We provide theoretical guarantees under which our principal subspace estimator can attain the minimax optimal rate of convergence in polynomial time. In addition, our theory provides a more refined understanding of the statistical and computational trade-off in the problem of sparse principal component estimation, revealing a subtle interplay between the effective sample size and the number of random projections that are required to achieve the minimax optimal rate. Numerical studies provide further insight into the procedure and confirm its highly competitive finite-sample performance. |
Tasks | |
Published | 2017-12-15 |
URL | https://arxiv.org/abs/1712.05630v4 |
https://arxiv.org/pdf/1712.05630v4.pdf | |
PWC | https://paperswithcode.com/paper/sparse-principal-component-analysis-via |
Repo | |
Framework | |
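A simplified sketch in the spirit of the method, not the authors' exact algorithm: sample axis-aligned coordinate subsets, take the leading eigenvector of each covariance submatrix, aggregate per-coordinate evidence, and re-estimate the principal direction on the selected support. The scoring and aggregation rules below are simplifications chosen for illustration.

```python
# Simplified sketch in the spirit of the method (not the authors' exact
# procedure): aggregate leading-eigenvector information from axis-aligned
# random submatrices of the sample covariance to estimate a sparse direction.
import numpy as np

rng = np.random.default_rng(6)
p, n, k = 50, 200, 5
v_true = np.zeros(p); v_true[:k] = 1 / np.sqrt(k)            # sparse spike
X = rng.normal(size=(n, p)) + 3.0 * rng.normal(size=(n, 1)) * v_true
Sigma = np.cov(X, rowvar=False)

d = 10                      # size of each axis-aligned projection (assumed)
n_proj = 500
scores = np.zeros(p)
for _ in range(n_proj):
    idx = rng.choice(p, size=d, replace=False)
    sub = Sigma[np.ix_(idx, idx)]
    w, V = np.linalg.eigh(sub)
    lead = V[:, -1]
    # Credit coordinates by their weight in the submatrix's leading eigenvector,
    # scaled by how much variance that projection explains.
    scores[idx] += w[-1] * lead ** 2

support = np.argsort(scores)[-k:]                # keep the k best-scoring axes
w, V = np.linalg.eigh(Sigma[np.ix_(support, support)])
v_hat = np.zeros(p); v_hat[support] = V[:, -1]
print("recovered support:", np.sort(support))
print("|<v_hat, v_true>| =", round(abs(v_hat @ v_true), 3))
```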
Automated text summarisation and evidence-based medicine: A survey of two domains
Title | Automated text summarisation and evidence-based medicine: A survey of two domains |
Authors | Abeed Sarker, Diego Molla, Cecile Paris |
Abstract | The practice of evidence-based medicine (EBM) urges medical practitioners to utilise the latest research evidence when making clinical decisions. Because of the massive and growing volume of published research on various medical topics, practitioners often find themselves overloaded with information. As such, natural language processing research has recently commenced exploring techniques for medical domain-specific automated text summarisation (ATS), targeted towards the task of condensing large medical texts. However, the development of effective summarisation techniques for this task requires cross-domain knowledge. We present a survey of EBM, the domain-specific needs for EBM, automated summarisation techniques, and how they have been applied hitherto. We envision that this survey will serve as a first resource for the development of future operational text summarisation techniques for EBM. |
Tasks | |
Published | 2017-06-25 |
URL | http://arxiv.org/abs/1706.08162v1 |
http://arxiv.org/pdf/1706.08162v1.pdf | |
PWC | https://paperswithcode.com/paper/automated-text-summarisation-and-evidence |
Repo | |
Framework | |
Revealing Hidden Potentials of the q-Space Signal in Breast Cancer
Title | Revealing Hidden Potentials of the q-Space Signal in Breast Cancer |
Authors | Paul Jaeger, Sebastian Bickelhaupt, Frederik Bernd Laun, Wolfgang Lederer, Daniel Heidi, Tristan Anselm Kuder, Daniel Paech, David Bonekamp, Alexander Radbruch, Stefan Delorme, Heinz-Peter Schlemmer, Franziska Steudle, Klaus H. Maier-Hein |
Abstract | Mammography screening for early detection of breast lesions currently suffers from high amounts of false positive findings, which result in unnecessary invasive biopsies. Diffusion-weighted MR images (DWI) can help to reduce many of these false-positive findings prior to biopsy. Current approaches estimate tissue properties by means of quantitative parameters taken from generative, biophysical models fit to the q-space encoded signal under certain assumptions regarding noise and spatial homogeneity. This process is prone to fitting instability and partial information loss due to model simplicity. We reveal unexplored potentials of the signal by integrating all data processing components into a convolutional neural network (CNN) architecture that is designed to propagate clinical target information down to the raw input images. This approach enables simultaneous and target-specific optimization of image normalization, signal exploitation, global representation learning and classification. Using a multicentric data set of 222 patients, we demonstrate that our approach significantly improves clinical decision making with respect to the current state of the art. |
Tasks | Decision Making, Representation Learning |
Published | 2017-02-27 |
URL | http://arxiv.org/abs/1702.08379v3 |
http://arxiv.org/pdf/1702.08379v3.pdf | |
PWC | https://paperswithcode.com/paper/revealing-hidden-potentials-of-the-q-space |
Repo | |
Framework | |
Fast Learning and Prediction for Object Detection using Whitened CNN Features
Title | Fast Learning and Prediction for Object Detection using Whitened CNN Features |
Authors | Björn Barz, Erik Rodner, Christoph Käding, Joachim Denzler |
Abstract | We combine features extracted from pre-trained convolutional neural networks (CNNs) with the fast, linear Exemplar-LDA classifier to get the advantages of both: the high detection performance of CNNs, automatic feature engineering, fast model learning from few training samples and efficient sliding-window detection. The Adaptive Real-Time Object Detection System (ARTOS) has been refactored broadly to be used in combination with Caffe for the experimental studies reported in this work. |
Tasks | Feature Engineering, Object Detection, Real-Time Object Detection, Window Detection |
Published | 2017-04-10 |
URL | http://arxiv.org/abs/1704.02930v2 |
http://arxiv.org/pdf/1704.02930v2.pdf | |
PWC | https://paperswithcode.com/paper/fast-learning-and-prediction-for-object |
Repo | |
Framework | |
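In its standard form, the Exemplar-LDA component used above reduces to estimating background statistics once and scoring with w = Sigma^{-1} (mu_pos - mu_0), which is what makes model learning fast. The sketch below shows that computation with random vectors standing in for whitened CNN features; it is a reading of the general technique, not code from the ARTOS project.

```python
# Core idea behind (Exemplar-)LDA on whitened features: with background mean mu0
# and shared covariance Sigma estimated once, a detector for a positive class is
# just w = Sigma^{-1} (mu_pos - mu0). Random vectors stand in for CNN features.
import numpy as np

rng = np.random.default_rng(7)
dim = 256
# "Background" statistics, normally estimated once from a large feature pool.
bg = rng.normal(size=(10000, dim))
mu0 = bg.mean(axis=0)
Sigma = np.cov(bg, rowvar=False) + 1e-3 * np.eye(dim)        # regularized

# A handful of positive exemplars (placeholder for whitened CNN activations).
pos = rng.normal(loc=0.5, size=(20, dim))
mu_pos = pos.mean(axis=0)

w = np.linalg.solve(Sigma, mu_pos - mu0)                     # fast: one linear solve
b = -0.5 * w @ (mu_pos + mu0)

score = lambda x: x @ w + b
print("mean score, positives :", round(float(score(pos).mean()), 2))
print("mean score, background:", round(float(score(bg[:1000]).mean()), 2))
```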
Fostering User Engagement: Rhetorical Devices for Applause Generation Learnt from TED Talks
Title | Fostering User Engagement: Rhetorical Devices for Applause Generation Learnt from TED Talks |
Authors | Zhe Liu, Anbang Xu, Mengdi Zhang, Jalal Mahmud, Vibha Sinha |
Abstract | One problem that every presenter faces when delivering a public discourse is how to hold the listeners’ attention or to keep them involved. Therefore, many studies in conversation analysis work on this issue and qualitatively suggest constructions that can effectively lead to audience applause. To investigate these proposals quantitatively, in this study we analyze the transcripts of 2,135 TED Talks, with a particular focus on the rhetorical devices that are used by the presenters for applause elicitation. Through conducting regression analysis, we identify and interpret 24 rhetorical devices as triggers of audience applause. We further build models that can recognize applause-evoking sentences and conclude this work with potential implications. |
Tasks | |
Published | 2017-03-17 |
URL | http://arxiv.org/abs/1704.02362v2 |
http://arxiv.org/pdf/1704.02362v2.pdf | |
PWC | https://paperswithcode.com/paper/fostering-user-engagement-rhetorical-devices |
Repo | |
Framework | |
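A minimal sketch, with invented toy sentences and labels, of the final modelling step mentioned above: a bag-of-words classifier that flags sentences likely to be followed by applause.

```python
# Minimal sketch (invented toy data, not the paper's features) of a classifier
# that flags sentences likely to be followed by applause.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

sentences = [
    "thank you so much for having me here tonight",
    "and that is why we must act together now",
    "the data were collected over a period of three years",
    "give it up for every teacher in this room",
    "next slide please",
    "we can end this epidemic in our lifetime",
]
applause = [1, 1, 0, 1, 0, 1]          # 1 = followed by applause (toy labels)

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(sentences, applause)
print(clf.predict_proba(["let us thank everyone who made this possible"])[:, 1])
```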
Hypothesis Testing based Intrinsic Evaluation of Word Embeddings
Title | Hypothesis Testing based Intrinsic Evaluation of Word Embeddings |
Authors | Nishant Gurnani |
Abstract | We introduce the cross-match test - an exact, distribution free, high-dimensional hypothesis test as an intrinsic evaluation metric for word embeddings. We show that cross-match is an effective means of measuring distributional similarity between different vector representations and of evaluating the statistical significance of different vector embedding models. Additionally, we find that cross-match can be used to provide a quantitative measure of linguistic similarity for selecting bridge languages for machine translation. We demonstrate that the results of the hypothesis test align with our expectations and note that the framework of two sample hypothesis testing is not limited to word embeddings and can be extended to all vector representations. |
Tasks | Machine Translation, Word Embeddings |
Published | 2017-09-04 |
URL | http://arxiv.org/abs/1709.00831v1 |
http://arxiv.org/pdf/1709.00831v1.pdf | |
PWC | https://paperswithcode.com/paper/hypothesis-testing-based-intrinsic-evaluation |
Repo | |
Framework | |
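A sketch of the cross-match statistic on toy Gaussian "embeddings", assuming networkx and scipy are available: pool the two samples, build a minimum-distance perfect matching over the pooled points, and count matched pairs that straddle the two samples. This follows the standard definition of the test rather than the author's code.

```python
# Sketch of the cross-match statistic (toy data, not the paper's code): pool two
# samples, compute a minimum-distance perfect matching on the pooled set, and
# count how many matched pairs straddle the two samples.
import numpy as np
import networkx as nx
from scipy.spatial.distance import cdist

rng = np.random.default_rng(8)
A = rng.normal(0.0, 1.0, size=(20, 50))        # "embeddings" from model A
B = rng.normal(0.3, 1.0, size=(20, 50))        # "embeddings" from model B
pooled = np.vstack([A, B])
group = np.array([0] * len(A) + [1] * len(B))

D = cdist(pooled, pooled)
G = nx.Graph()
for i in range(len(pooled)):
    for j in range(i + 1, len(pooled)):
        # Negate distances so that a max-weight matching is a min-distance one.
        G.add_edge(i, j, weight=-D[i, j])

matching = nx.max_weight_matching(G, maxcardinality=True)
cross = sum(group[i] != group[j] for i, j in matching)
print(f"cross-matches: {cross} out of {len(matching)} pairs")
# Few cross-matches, relative to what chance would give, suggests the two
# sets of embeddings are distributed differently.
```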
Cooperative Training of Deep Aggregation Networks for RGB-D Action Recognition
Title | Cooperative Training of Deep Aggregation Networks for RGB-D Action Recognition |
Authors | Pichao Wang, Wanqing Li, Jun Wan, Philip Ogunbona, Xinwang Liu |
Abstract | A novel deep neural network training paradigm that exploits the conjoint information in multiple heterogeneous sources is proposed. Specifically, in an RGB-D based action recognition task, it cooperatively trains a single convolutional neural network (named c-ConvNet) on both RGB visual features and depth features, and deeply aggregates the two kinds of features for action recognition. Unlike the conventional ConvNet, which learns deep separable features for homogeneous modality-based classification with only one softmax loss function, the c-ConvNet enhances the discriminative power of the deeply learned features and weakens the undesired modality discrepancy by jointly optimizing a ranking loss and a softmax loss for both homogeneous and heterogeneous modalities. The ranking loss consists of intra-modality and cross-modality triplet losses, and it reduces both the intra-modality and cross-modality feature variations. Furthermore, the correlations between RGB and depth data are embedded in the c-ConvNet, and can be retrieved by either of the modalities and contribute to recognition even when only one of the modalities is available. The proposed method was extensively evaluated on two large RGB-D action recognition datasets, ChaLearn LAP IsoGD and NTU RGB+D, and one small dataset, SYSU 3D HOI, and achieved state-of-the-art results. |
Tasks | Temporal Action Localization |
Published | 2017-12-05 |
URL | http://arxiv.org/abs/1801.01080v1 |
http://arxiv.org/pdf/1801.01080v1.pdf | |
PWC | https://paperswithcode.com/paper/cooperative-training-of-deep-aggregation |
Repo | |
Framework | |
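A numpy sketch, with invented shapes, margin and equal weighting, of the loss combination described above: a softmax cross-entropy term plus intra-modality and cross-modality triplet terms over paired RGB and depth embeddings.

```python
# Numpy sketch (values, margin and weighting invented) of combining a softmax
# loss with intra- and cross-modality triplet losses over RGB / depth embeddings.
import numpy as np

rng = np.random.default_rng(9)

def softmax_xent(logits, labels):
    z = logits - logits.max(axis=1, keepdims=True)
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(labels)), labels].mean()

def triplet(anchor, positive, negative, margin=0.5):
    d_ap = np.linalg.norm(anchor - positive, axis=1)
    d_an = np.linalg.norm(anchor - negative, axis=1)
    return np.maximum(0.0, d_ap - d_an + margin).mean()

# Pretend embeddings / logits coming out of the shared network for one batch.
n, dim, n_cls = 8, 64, 10
rgb = rng.normal(size=(n, dim)); depth = rng.normal(size=(n, dim))
rgb_pos = rgb + 0.1 * rng.normal(size=(n, dim))      # another sample, same class
depth_pos = depth + 0.1 * rng.normal(size=(n, dim))
rgb_neg = rng.normal(size=(n, dim)); depth_neg = rng.normal(size=(n, dim))
logits = rng.normal(size=(n, n_cls)); labels = rng.integers(0, n_cls, size=n)

loss = (
    softmax_xent(logits, labels)
    + triplet(rgb, rgb_pos, rgb_neg)         # intra-modality, RGB
    + triplet(depth, depth_pos, depth_neg)   # intra-modality, depth
    + triplet(rgb, depth_pos, depth_neg)     # cross-modality: RGB anchor, depth pos/neg
    + triplet(depth, rgb_pos, rgb_neg)       # cross-modality: depth anchor, RGB pos/neg
)
print(f"combined loss for this toy batch: {loss:.3f}")
```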