Paper Group ANR 105
Audio Recording Device Identification Based on Deep Learning
Title | Audio Recording Device Identification Based on Deep Learning |
Authors | Simeng Qi, Zheng Huang, Yan Li, Shaopei Shi |
Abstract | In this paper we present research on identifying audio recording devices from background noise, thus providing a method for forensics. The audio signal is the sum of a speech signal and a noise signal. Usually, people pay more attention to the speech signal, because it carries the information to be delivered. So a great deal of research has been dedicated to achieving a higher signal-to-noise ratio (SNR). There are many speech enhancement algorithms that improve the quality of speech, which can be seen as reducing the noise. However, noise can be regarded as the intrinsic fingerprint trace of an audio recording device. These digital traces can be characterized and identified by new machine learning techniques. Therefore, in our research, we use the noise as the intrinsic feature. For identification, multiple deep learning classifiers are used and compared. The identification results show that extracting a feature vector from the noise of each device and identifying devices with deep learning techniques is viable and performs well. |
Tasks | Speech Enhancement |
Published | 2016-02-18 |
URL | http://arxiv.org/abs/1602.05682v2 |
http://arxiv.org/pdf/1602.05682v2.pdf | |
PWC | https://paperswithcode.com/paper/audio-recording-device-identification-based |
Repo | |
Framework | |
Generic Feature Learning for Wireless Capsule Endoscopy Analysis
Title | Generic Feature Learning for Wireless Capsule Endoscopy Analysis |
Authors | Santi Seguí, Michal Drozdzal, Guillem Pascual, Petia Radeva, Carolina Malagelada, Fernando Azpiroz, Jordi Vitrià |
Abstract | The interpretation and analysis of wireless capsule endoscopy (WCE) recordings is a complex task which requires sophisticated computer aided decision (CAD) systems in order to help physicians with the video screening and, finally, with the diagnosis. Most of the CAD systems in capsule endoscopy share a common system design, but use very different image and video representations. As a result, each time a new clinical application of WCE appears, a new CAD system has to be designed from scratch. This makes the design of new CAD systems very time consuming. Therefore, in this paper we introduce a system for small intestine motility characterization, based on Deep Convolutional Neural Networks, which avoids the laborious step of designing specific features for individual motility events. Experimental results show the superiority of the learned features over alternative classifiers constructed by using state of the art hand-crafted features. In particular, it reaches a mean classification accuracy of 96% for six intestinal motility events, outperforming the other classifiers by a large margin (a 14% relative performance increase). |
Tasks | |
Published | 2016-07-26 |
URL | http://arxiv.org/abs/1607.07604v1 |
http://arxiv.org/pdf/1607.07604v1.pdf | |
PWC | https://paperswithcode.com/paper/generic-feature-learning-for-wireless-capsule |
Repo | |
Framework | |
Gaussian Process Regression for Out-of-Sample Extension
Title | Gaussian Process Regression for Out-of-Sample Extension |
Authors | Oren Barkan, Jonathan Weill, Amir Averbuch |
Abstract | Manifold learning methods are useful for high dimensional data analysis. Many of the existing methods produce a low dimensional representation that attempts to describe the intrinsic geometric structure of the original data. Typically, this process is computationally expensive and the produced embedding is limited to the training data. In many real life scenarios, the ability to produce embedding of unseen samples is essential. In this paper we propose a Bayesian non-parametric approach for out-of-sample extension. The method is based on Gaussian Process Regression and independent of the manifold learning algorithm. Additionally, the method naturally provides a measure for the degree of abnormality for a newly arrived data point that did not participate in the training process. We derive the mathematical connection between the proposed method and the Nystrom extension and show that the latter is a special case of the former. We present extensive experimental results that demonstrate the performance of the proposed method and compare it to other existing out-of-sample extension methods. |
Tasks | |
Published | 2016-03-07 |
URL | http://arxiv.org/abs/1603.02194v2 |
http://arxiv.org/pdf/1603.02194v2.pdf | |
PWC | https://paperswithcode.com/paper/gaussian-process-regression-for-out-of-sample |
Repo | |
Framework | |
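The out-of-sample idea in the abstract above can be illustrated with a small sketch. This is not the authors' implementation: it assumes an RBF kernel, a fixed length scale, and a tiny noise term, and it uses the standard GP predictive variance as a stand-in for the "degree of abnormality" the abstract mentions. The names `rbf`, `solve` and `gp_extend` are hypothetical helpers introduced only for this example.

```python
import math


def rbf(a, b, length_scale=1.0):
    """Squared-exponential kernel between two points (kernel choice is an assumption)."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-0.5 * sq_dist / length_scale ** 2)


def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting.

    A is n x n, B is n x m; returns X as an n x m list of lists.
    """
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0.0:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [row[n:] for row in M]


def gp_extend(X_train, Y_train, x_new, length_scale=1.0, noise=1e-6):
    """Predict the embedding of x_new from training points and their embeddings.

    Y_train may come from any manifold learner; the GP only sees (X, Y) pairs.
    Returns (mean embedding, predictive variance). Treating the variance as an
    abnormality score is an assumption of this sketch.
    """
    n = len(X_train)
    K = [[rbf(X_train[i], X_train[j], length_scale) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    k_star = [rbf(x, x_new, length_scale) for x in X_train]
    # One solve gives both (K + noise*I)^-1 Y and (K + noise*I)^-1 k_star.
    rhs = [list(Y_train[i]) + [k_star[i]] for i in range(n)]
    sol = solve(K, rhs)
    d = len(Y_train[0])
    mean = [sum(k_star[i] * sol[i][j] for i in range(n)) for j in range(d)]
    var = rbf(x_new, x_new, length_scale) - sum(k_star[i] * sol[i][d] for i in range(n))
    return mean, max(var, 0.0)
```

A point close to the training data gets a mean near its neighbours' embeddings and a small variance, while a point far from all training data gets a variance close to the prior value of 1, which is what makes the variance usable as a rough novelty signal.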
Unsupervised Dialogue Act Induction using Gaussian Mixtures
Title | Unsupervised Dialogue Act Induction using Gaussian Mixtures |
Authors | Tomáš Brychcín, Pavel Král |
Abstract | This paper introduces a new unsupervised approach for dialogue act induction. Given the sequence of dialogue utterances, the task is to assign them the labels representing their function in the dialogue. Utterances are represented as real-valued vectors encoding their meaning. We model the dialogue as Hidden Markov model with emission probabilities estimated by Gaussian mixtures. We use Gibbs sampling for posterior inference. We present the results on the standard Switchboard-DAMSL corpus. Our algorithm achieves promising results compared with strong supervised baselines and outperforms other unsupervised algorithms. |
Tasks | |
Published | 2016-12-20 |
URL | http://arxiv.org/abs/1612.06572v2 |
http://arxiv.org/pdf/1612.06572v2.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-dialogue-act-induction-using-1 |
Repo | |
Framework | |
The Role of Context Types and Dimensionality in Learning Word Embeddings
Title | The Role of Context Types and Dimensionality in Learning Word Embeddings |
Authors | Oren Melamud, David McClosky, Siddharth Patwardhan, Mohit Bansal |
Abstract | We provide the first extensive evaluation of how using different types of context to learn skip-gram word embeddings affects performance on a wide range of intrinsic and extrinsic NLP tasks. Our results suggest that while intrinsic tasks tend to exhibit a clear preference to particular types of contexts and higher dimensionality, more careful tuning is required for finding the optimal settings for most of the extrinsic tasks that we considered. Furthermore, for these extrinsic tasks, we find that once the benefit from increasing the embedding dimensionality is mostly exhausted, simple concatenation of word embeddings, learned with different context types, can yield further performance gains. As an additional contribution, we propose a new variant of the skip-gram model that learns word embeddings from weighted contexts of substitute words. |
Tasks | Learning Word Embeddings, Word Embeddings |
Published | 2016-01-05 |
URL | http://arxiv.org/abs/1601.00893v2 |
http://arxiv.org/pdf/1601.00893v2.pdf | |
PWC | https://paperswithcode.com/paper/the-role-of-context-types-and-dimensionality |
Repo | |
Framework | |
Automatic Segmentation of Dynamic Objects from an Image Pair
Title | Automatic Segmentation of Dynamic Objects from an Image Pair |
Authors | Sri Raghu Malireddi, Shanmuganathan Raman |
Abstract | Automatic segmentation of objects from a single image is a challenging problem which generally requires training on large number of images. We consider the problem of automatically segmenting only the dynamic objects from a given pair of images of a scene captured from different positions. We exploit dense correspondences along with saliency measures in order to first localize the interest points on the dynamic objects from the two images. We propose a novel approach based on techniques from computational geometry in order to automatically segment the dynamic objects from both the images using a top-down segmentation strategy. We discuss how the proposed approach is unique in novelty compared to other state-of-the-art segmentation algorithms. We show that the proposed approach for segmentation is efficient in handling large motions and is able to achieve very good segmentation of the objects for different scenes. We analyse the results with respect to the manually marked ground truth segmentation masks created using our own dataset and provide key observations in order to improve the work in future. |
Tasks | |
Published | 2016-04-16 |
URL | http://arxiv.org/abs/1604.04724v1 |
http://arxiv.org/pdf/1604.04724v1.pdf | |
PWC | https://paperswithcode.com/paper/automatic-segmentation-of-dynamic-objects |
Repo | |
Framework | |
Contrastive Entropy: A new evaluation metric for unnormalized language models
Title | Contrastive Entropy: A new evaluation metric for unnormalized language models |
Authors | Kushal Arora, Anand Rangarajan |
Abstract | Perplexity (per word) is the most widely used metric for evaluating language models. Despite this, there has been no dearth of criticism for this metric. Most of these criticisms center around lack of correlation with extrinsic metrics like word error rate (WER), dependence upon shared vocabulary for model comparison and unsuitability for unnormalized language model evaluation. In this paper, we address the last problem and propose a new discriminative entropy based intrinsic metric that works for both traditional word level models and unnormalized language models like sentence level models. We also propose a discriminatively trained sentence level interpretation of recurrent neural network based language model (RNN) as an example of unnormalized sentence level model. We demonstrate that for word level models, contrastive entropy shows a strong correlation with perplexity. We also observe that when trained at lower distortion levels, sentence level RNN considerably outperforms traditional RNNs on this new metric. |
Tasks | Language Modelling |
Published | 2016-01-03 |
URL | http://arxiv.org/abs/1601.00248v2 |
http://arxiv.org/pdf/1601.00248v2.pdf | |
PWC | https://paperswithcode.com/paper/contrastive-entropy-a-new-evaluation-metric |
Repo | |
Framework | |
Dimensionality-Dependent Generalization Bounds for $k$-Dimensional Coding Schemes
Title | Dimensionality-Dependent Generalization Bounds for $k$-Dimensional Coding Schemes |
Authors | Tongliang Liu, Dacheng Tao, Dong Xu |
Abstract | The $k$-dimensional coding schemes refer to a collection of methods that attempt to represent data using a set of representative $k$-dimensional vectors, and include non-negative matrix factorization, dictionary learning, sparse coding, $k$-means clustering and vector quantization as special cases. Previous generalization bounds for the reconstruction error of the $k$-dimensional coding schemes are mainly dimensionality independent. A major advantage of these bounds is that they can be used to analyze the generalization error when data is mapped into an infinite- or high-dimensional feature space. However, many applications use finite-dimensional data features. Can we obtain dimensionality-dependent generalization bounds for $k$-dimensional coding schemes that are tighter than dimensionality-independent bounds when data is in a finite-dimensional feature space? The answer is positive. In this paper, we address this problem and derive a dimensionality-dependent generalization bound for $k$-dimensional coding schemes by bounding the covering number of the loss function class induced by the reconstruction error. The bound is of order $\mathcal{O}\left(\left(mk\ln(mkn)/n\right)^{\lambda_n}\right)$, where $m$ is the dimension of features, $k$ is the number of the columns in the linear implementation of coding schemes, $n$ is the size of sample, $\lambda_n>0.5$ when $n$ is finite and $\lambda_n=0.5$ when $n$ is infinite. We show that our bound can be tighter than previous results, because it avoids inducing the worst-case upper bound on $k$ of the loss function and converges faster. The proposed generalization bound is also applied to some specific coding schemes to demonstrate that the dimensionality-dependent bound is an indispensable complement to these dimensionality-independent generalization bounds. |
Tasks | Dictionary Learning, Quantization |
Published | 2016-01-03 |
URL | http://arxiv.org/abs/1601.00238v2 |
http://arxiv.org/pdf/1601.00238v2.pdf | |
PWC | https://paperswithcode.com/paper/dimensionality-dependent-generalization |
Repo | |
Framework | |
Multigrid Neural Architectures
Title | Multigrid Neural Architectures |
Authors | Tsung-Wei Ke, Michael Maire, Stella X. Yu |
Abstract | We propose a multigrid extension of convolutional neural networks (CNNs). Rather than manipulating representations living on a single spatial grid, our network layers operate across scale space, on a pyramid of grids. They consume multigrid inputs and produce multigrid outputs; convolutional filters themselves have both within-scale and cross-scale extent. This aspect is distinct from simple multiscale designs, which only process the input at different scales. Viewed in terms of information flow, a multigrid network passes messages across a spatial pyramid. As a consequence, receptive field size grows exponentially with depth, facilitating rapid integration of context. Most critically, multigrid structure enables networks to learn internal attention and dynamic routing mechanisms, and use them to accomplish tasks on which modern CNNs fail. Experiments demonstrate wide-ranging performance advantages of multigrid. On CIFAR and ImageNet classification tasks, flipping from a single grid to multigrid within the standard CNN paradigm improves accuracy, while being compute and parameter efficient. Multigrid is independent of other architectural choices; we show synergy in combination with residual connections. Multigrid yields dramatic improvement on a synthetic semantic segmentation dataset. Most strikingly, relatively shallow multigrid networks can learn to directly perform spatial transformation tasks, where, in contrast, current CNNs fail. Together, our results suggest that continuous evolution of features on a multigrid pyramid is a more powerful alternative to existing CNN designs on a flat grid. |
Tasks | Image Classification, Semantic Segmentation |
Published | 2016-11-23 |
URL | http://arxiv.org/abs/1611.07661v2 |
http://arxiv.org/pdf/1611.07661v2.pdf | |
PWC | https://paperswithcode.com/paper/multigrid-neural-architectures |
Repo | |
Framework | |
An Analysis of Tournament Structure
Title | An Analysis of Tournament Structure |
Authors | Nhien Pham Hoang Bao, Hiroyuki Iida |
Abstract | This paper explores a novel way of analyzing tournament structures to find the most suitable one for the tournament under consideration. It considers three aspects: tournament conducting cost, competitiveness development and ranking precision. It then proposes a new method using a progress tree to detect potential throwaway matches. The analysis performed using the proposed method reveals the strengths and weaknesses of tournament structures. In conclusion, single elimination is best if we want to qualify only one winner; all matches conducted are exciting in terms of competitiveness. Double elimination with a proper seeding system is a better choice if we want to qualify more winners. A reasonable number of extra matches need to be conducted in exchange for being able to qualify the top four winners. Round-robin gives reliable ranking precision for all participants. However, its conducting cost is very high, and it fails to maintain competitiveness development. |
Tasks | |
Published | 2016-11-16 |
URL | http://arxiv.org/abs/1611.08499v1 |
http://arxiv.org/pdf/1611.08499v1.pdf | |
PWC | https://paperswithcode.com/paper/an-analysis-of-tournament-structure |
Repo | |
Framework | |
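The cost comparison in the abstract above can be made concrete by counting matches. The sketch below uses standard counting arguments only; it does not reproduce the paper's progress-tree method, and the function names are hypothetical.

```python
def single_elimination_matches(n: int) -> int:
    """Each match eliminates exactly one player, so n - 1 matches leave one winner."""
    if n < 2:
        raise ValueError("need at least two participants")
    return n - 1


def double_elimination_matches(n: int, bracket_reset: bool = False) -> int:
    """Every non-champion loses twice, giving 2 * (n - 1) matches.

    If the champion also lost once (a final "bracket reset"), one more match is played.
    """
    if n < 2:
        raise ValueError("need at least two participants")
    return 2 * (n - 1) + (1 if bracket_reset else 0)


def round_robin_matches(n: int) -> int:
    """Every pair of participants meets once: n choose 2 matches."""
    if n < 2:
        raise ValueError("need at least two participants")
    return n * (n - 1) // 2
```

For 16 participants this gives 15, 30 (or 31 with a reset), and 120 matches respectively, which shows why round-robin is described as the most expensive structure to conduct.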
Adaptive matching pursuit for sparse signal recovery
Title | Adaptive matching pursuit for sparse signal recovery |
Authors | Tiep H. Vu, Hojjat S. Mousavi, Vishal Monga |
Abstract | Spike and Slab priors have been of much recent interest in signal processing as a means of inducing sparsity in Bayesian inference. Application domains that benefit from the use of these priors include sparse recovery, regression and classification. It is well known that solving for the sparse coefficient vector to maximize these priors results in a hard non-convex and mixed integer programming problem. Most existing solutions to this optimization problem either involve simplifying assumptions/relaxations or are computationally expensive. We propose a new greedy and adaptive matching pursuit (AMP) algorithm to directly solve this hard problem. Essentially, in each step of the algorithm, the set of active elements is updated by either adding or removing one index, whichever results in better improvement. In addition, the intermediate steps of the algorithm are calculated via an inexpensive Cholesky decomposition which makes the algorithm much faster. Results on simulated data sets as well as real-world image recovery challenges confirm the benefits of the proposed AMP, particularly in providing a superior cost-quality trade-off over existing alternatives. |
Tasks | Bayesian Inference |
Published | 2016-09-12 |
URL | http://arxiv.org/abs/1610.08495v1 |
http://arxiv.org/pdf/1610.08495v1.pdf | |
PWC | https://paperswithcode.com/paper/adaptive-matching-pursuit-for-sparse-signal |
Repo | |
Framework | |
Predicting Shot Making in Basketball Learnt from Adversarial Multiagent Trajectories
Title | Predicting Shot Making in Basketball Learnt from Adversarial Multiagent Trajectories |
Authors | Mark Harmon, Patrick Lucey, Diego Klabjan |
Abstract | In this paper, we predict the likelihood of a player making a shot in basketball from multiagent trajectories. Previous approaches to similar problems center on hand-crafting features to capture domain specific knowledge. Although intuitive, recent work in deep learning has shown this approach is prone to missing important predictive features. To circumvent this issue, we present a convolutional neural network (CNN) approach where we initially represent the multiagent behavior as an image. To encode the adversarial nature of basketball, we use a multi-channel image which we then feed into a CNN. Additionally, to capture the temporal aspect of the trajectories we “fade” the player trajectories. We find that this approach is superior to a traditional FFN model. By using gradient ascent to create images using an already trained CNN, we discover what features the CNN filters learn. Last, we find that a combined CNN+FFN is the best performing network with an error rate of 39%. |
Tasks | |
Published | 2016-09-15 |
URL | http://arxiv.org/abs/1609.04849v4 |
http://arxiv.org/pdf/1609.04849v4.pdf | |
PWC | https://paperswithcode.com/paper/predicting-shot-making-in-basketball-learnt |
Repo | |
Framework | |
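The "faded" multi-channel trajectory image described in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: the court size, grid resolution, linear fade, and the one-channel-per-team layout are all hypothetical choices, and `rasterize_faded` is not a function from the paper.

```python
def rasterize_faded(trajectories, channel_ids, n_channels,
                    court=(94.0, 50.0), grid=(47, 25)):
    """Draw trajectories into a multi-channel image with a temporal fade.

    trajectories: list of paths, each a list of (x, y) court coordinates in time order.
    channel_ids:  channel index for each path (e.g. 0 = offense, 1 = defense; assumed).
    Returns image[channel][row][col] with values in [0, 1]; later positions are brighter.
    """
    width, height = grid
    image = [[[0.0] * width for _ in range(height)] for _ in range(n_channels)]
    for path, ch in zip(trajectories, channel_ids):
        steps = len(path)
        for t, (x, y) in enumerate(path):
            # Map court coordinates to a pixel, clamping points on or past the edges.
            col = max(0, min(int(x * width / court[0]), width - 1))
            row = max(0, min(int(y * height / court[1]), height - 1))
            # Linear fade: the oldest point is dimmest, the most recent is 1.0.
            intensity = (t + 1) / steps
            image[ch][row][col] = max(image[ch][row][col], intensity)
    return image
```

Keeping the maximum intensity per pixel means a player who revisits a cell is recorded at their most recent visit. The resulting nested list could be converted to an array and passed to a CNN, with the separate channels carrying the adversarial (team versus team) structure.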
Scale Invariant Interest Points with Shearlets
Title | Scale Invariant Interest Points with Shearlets |
Authors | Miguel A. Duval-Poo, Nicoletta Noceti, Francesca Odone, Ernesto De Vito |
Abstract | Shearlets are a relatively new directional multi-scale framework for signal analysis, which has been shown to be effective at enhancing signal discontinuities such as edges and corners at multiple scales. In this work we address the problem of detecting and describing blob-like features in the shearlets framework. We derive a measure which is very effective for blob detection and closely related to the Laplacian of Gaussian. We demonstrate that the measure satisfies the perfect scale invariance property in the continuous case. In the discrete setting, we derive algorithms for blob detection and keypoint description. Finally, we provide qualitative justifications of our findings as well as a quantitative evaluation on benchmark data. We also report experimental evidence that our method is well suited to compressed and noisy images, thanks to the sparsity property of shearlets. |
Tasks | |
Published | 2016-07-26 |
URL | http://arxiv.org/abs/1607.07639v1 |
http://arxiv.org/pdf/1607.07639v1.pdf | |
PWC | https://paperswithcode.com/paper/scale-invariant-interest-points-with |
Repo | |
Framework | |
Training Auto-encoders Effectively via Eliminating Task-irrelevant Input Variables
Title | Training Auto-encoders Effectively via Eliminating Task-irrelevant Input Variables |
Authors | Hui Shen, Dehua Li, Hong Wu, Zhaoxiang Zang |
Abstract | Auto-encoders are often used as building blocks of deep network classifiers to learn feature extractors, but task-irrelevant information in the input data may lead to bad extractors and result in poor generalization performance of the network. In this paper, we show that dropping the task-irrelevant input variables can clearly improve the performance of auto-encoders. Specifically, an importance-based variable selection method is proposed to find the task-irrelevant input variables and drop them. It first estimates the importance of each variable, and then drops the variables whose importance value is lower than a threshold. In order to obtain better performance, the method can be employed for each layer of stacked auto-encoders. Experimental results show that, when combined with our method, stacked denoising auto-encoders achieve significantly improved performance on three challenging datasets. |
Tasks | Denoising |
Published | 2016-05-31 |
URL | http://arxiv.org/abs/1605.09458v1 |
http://arxiv.org/pdf/1605.09458v1.pdf | |
PWC | https://paperswithcode.com/paper/training-auto-encoders-effectively-via |
Repo | |
Framework | |
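The estimate-then-threshold procedure in the abstract above can be sketched in a few lines. The abstract does not say how importance is computed, so this example substitutes an assumed score, the absolute Pearson correlation between each input variable and the label; `importance_scores` and `drop_unimportant` are hypothetical names, not the authors' method.

```python
import math


def importance_scores(X, y):
    """Score each input variable by |Pearson correlation| with the label (assumed metric)."""
    n = len(X)
    mean_y = sum(y) / n
    norm_y = math.sqrt(sum((v - mean_y) ** 2 for v in y))
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        mean_x = sum(col) / n
        norm_x = math.sqrt(sum((v - mean_x) ** 2 for v in col))
        if norm_x == 0.0 or norm_y == 0.0:
            scores.append(0.0)  # a constant variable carries no information
            continue
        cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(col, y))
        scores.append(abs(cov / (norm_x * norm_y)))
    return scores


def drop_unimportant(X, y, threshold):
    """Keep only the variables whose importance is at least `threshold`.

    Returns the reduced data and the indices of the kept variables.
    """
    scores = importance_scores(X, y)
    keep = [j for j, s in enumerate(scores) if s >= threshold]
    return [[row[j] for j in keep] for row in X], keep
```

The reduced data would then be fed to the auto-encoder in place of the full input. Applying the same step to the hidden representation of each layer would mirror the per-layer use the abstract describes for stacked auto-encoders.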
T-CONV: A Convolutional Neural Network For Multi-scale Taxi Trajectory Prediction
Title | T-CONV: A Convolutional Neural Network For Multi-scale Taxi Trajectory Prediction |
Authors | Jianming Lv, Qing Li, Xintong Wang |
Abstract | Precise destination prediction of taxi trajectories can benefit many intelligent location based services such as accurate ad for passengers. Traditional prediction approaches, which treat trajectories as one-dimensional sequences and process them in single scale, fail to capture the diverse two-dimensional patterns of trajectories in different spatial scales. In this paper, we propose T-CONV which models trajectories as two-dimensional images, and adopts multi-layer convolutional neural networks to combine multi-scale trajectory patterns to achieve precise prediction. Furthermore, we conduct gradient analysis to visualize the multi-scale spatial patterns captured by T-CONV and extract the areas with distinct influence on the ultimate prediction. Finally, we integrate multiple local enhancement convolutional fields to explore these important areas deeply for better prediction. Comprehensive experiments based on real trajectory data show that T-CONV can achieve higher accuracy than the state-of-the-art methods. |
Tasks | Trajectory Prediction |
Published | 2016-11-23 |
URL | http://arxiv.org/abs/1611.07635v3 |
http://arxiv.org/pdf/1611.07635v3.pdf | |
PWC | https://paperswithcode.com/paper/t-conv-a-convolutional-neural-network-for |
Repo | |
Framework | |