May 7, 2019

Paper Group ANR 105

Audio Recording Device Identification Based on Deep Learning. Generic Feature Learning for Wireless Capsule Endoscopy Analysis. Gaussian Process Regression for Out-of-Sample Extension. Unsupervised Dialogue Act Induction using Gaussian Mixtures. The Role of Context Types and Dimensionality in Learning Word Embeddings. Automatic Segmentation of Dyna …

Audio Recording Device Identification Based on Deep Learning


Title	Audio Recording Device Identification Based on Deep Learning
Authors	Simeng Qi, Zheng Huang, Yan Li, Shaopei Shi
Abstract	In this paper we present research on the identification of audio recording devices from background noise, thus providing a method for forensics. The audio signal is the sum of the speech signal and the noise signal. Usually, people pay more attention to the speech signal, because it carries the information to be delivered. As a result, a great deal of research has been dedicated to achieving a higher signal-to-noise ratio (SNR). There are many speech enhancement algorithms that improve the quality of speech, which can be seen as reducing the noise. However, noise can be regarded as an intrinsic fingerprint of an audio recording device. These digital traces can be characterized and identified by new machine learning techniques. Therefore, in our research, we use the noise as the intrinsic feature. For identification, multiple deep learning classifiers are used and compared. The identification results show that extracting a feature vector from the noise of each device and identifying the devices with deep learning techniques is viable and performs well.
Tasks	Speech Enhancement
Published	2016-02-18
URL	http://arxiv.org/abs/1602.05682v2
PDF	http://arxiv.org/pdf/1602.05682v2.pdf
PWC	https://paperswithcode.com/paper/audio-recording-device-identification-based
Repo
Framework
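To make the "noise as fingerprint" idea concrete, here is a minimal sketch of one way to turn a recording into a noise feature vector. It assumes (our assumption, not the paper's stated pipeline) that the lowest-energy frames contain mostly device noise, and it summarizes them as an average log-magnitude spectrum. The function name `noise_fingerprint` and the parameters `frame_len`, `hop`, and `quiet_fraction` are hypothetical.

```python
import numpy as np


def noise_fingerprint(signal, frame_len=512, hop=256, quiet_fraction=0.2):
    """Average log-magnitude spectrum of the quietest frames of a recording.

    Assumption: the lowest-energy frames are dominated by background and
    device noise rather than speech.
    """
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - frame_len) // hop
    if n_frames < 1:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(frame_len)
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    energy = np.sum(frames ** 2, axis=1)
    n_quiet = max(1, int(quiet_fraction * n_frames))
    quiet = frames[np.argsort(energy)[:n_quiet]]   # keep the quietest frames
    spectra = np.abs(np.fft.rfft(quiet, axis=1))
    # Small offset avoids log(0); result has frame_len // 2 + 1 bins.
    return np.mean(np.log(spectra + 1e-10), axis=0)
```

Vectors like these would then be the classifier inputs. The paper compares several deep learning classifiers at that stage, which this sketch does not attempt to reproduce.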

Generic Feature Learning for Wireless Capsule Endoscopy Analysis


Title	Generic Feature Learning for Wireless Capsule Endoscopy Analysis
Authors	Santi Seguí, Michal Drozdzal, Guillem Pascual, Petia Radeva, Carolina Malagelada, Fernando Azpiroz, Jordi Vitrià
Abstract	The interpretation and analysis of wireless capsule endoscopy (WCE) recordings is a complex task which requires sophisticated computer-aided decision (CAD) systems to help physicians with video screening and, ultimately, with diagnosis. Most CAD systems in capsule endoscopy share a common system design, but use very different image and video representations. As a result, each time a new clinical application of WCE appears, a new CAD system has to be designed from scratch. This makes the design of new CAD systems very time-consuming. Therefore, in this paper we introduce a system for small intestine motility characterization, based on Deep Convolutional Neural Networks, which avoids the laborious step of designing specific features for individual motility events. Experimental results show the superiority of the learned features over alternative classifiers constructed using state-of-the-art hand-crafted features. In particular, it reaches a mean classification accuracy of 96% for six intestinal motility events, outperforming the other classifiers by a large margin (a 14% relative performance increase).
Tasks
Published	2016-07-26
URL	http://arxiv.org/abs/1607.07604v1
PDF	http://arxiv.org/pdf/1607.07604v1.pdf
PWC	https://paperswithcode.com/paper/generic-feature-learning-for-wireless-capsule
Repo
Framework

Gaussian Process Regression for Out-of-Sample Extension


Title	Gaussian Process Regression for Out-of-Sample Extension
Authors	Oren Barkan, Jonathan Weill, Amir Averbuch
Abstract	Manifold learning methods are useful for high-dimensional data analysis. Many existing methods produce a low-dimensional representation that attempts to describe the intrinsic geometric structure of the original data. Typically, this process is computationally expensive and the produced embedding is limited to the training data. In many real-life scenarios, the ability to embed unseen samples is essential. In this paper we propose a Bayesian non-parametric approach for out-of-sample extension. The method is based on Gaussian Process Regression and is independent of the manifold learning algorithm. Additionally, the method naturally provides a measure of the degree of abnormality for a newly arrived data point that did not participate in the training process. We derive the mathematical connection between the proposed method and the Nyström extension and show that the latter is a special case of the former. We present extensive experimental results that demonstrate the performance of the proposed method and compare it to other existing out-of-sample extension methods.
Tasks
Published	2016-03-07
URL	http://arxiv.org/abs/1603.02194v2
PDF	http://arxiv.org/pdf/1603.02194v2.pdf
PWC	https://paperswithcode.com/paper/gaussian-process-regression-for-out-of-sample
Repo
Framework
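The out-of-sample idea above can be illustrated with plain Gaussian Process Regression. This minimal sketch regresses the embedding coordinates produced by any manifold learner on the original inputs: the predictive mean is the out-of-sample embedding, and the predictive variance serves as a rough abnormality score. The squared-exponential kernel, its length scale, the noise level, and the class name `GPOutOfSample` are assumptions for illustration, not the authors' exact formulation.

```python
import numpy as np


def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel matrix between the rows of A and B."""
    sq_dist = (np.sum(A ** 2, axis=1)[:, None]
               + np.sum(B ** 2, axis=1)[None, :]
               - 2.0 * A @ B.T)
    return np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale ** 2)


class GPOutOfSample:
    """Maps new inputs to embedding coordinates learned on a training set."""

    def __init__(self, length_scale=1.0, noise=1e-4):
        self.length_scale = length_scale
        self.noise = noise

    def fit(self, X, Y):
        # X: (n, d) training inputs; Y: (n, k) their manifold-learning embedding.
        self.X = np.asarray(X, dtype=float)
        K = rbf_kernel(self.X, self.X, self.length_scale)
        K[np.diag_indices_from(K)] += self.noise
        self.L = np.linalg.cholesky(K)
        # alpha = (K + noise * I)^{-1} Y, solved through the Cholesky factor.
        self.alpha = np.linalg.solve(self.L.T, np.linalg.solve(self.L, Y))
        return self

    def predict(self, X_new):
        X_new = np.asarray(X_new, dtype=float)
        K_star = rbf_kernel(X_new, self.X, self.length_scale)
        mean = K_star @ self.alpha              # out-of-sample embedding
        v = np.linalg.solve(self.L, K_star.T)
        var = 1.0 - np.sum(v ** 2, axis=0)      # k(x, x) = 1 for this kernel
        return mean, var                        # large var: far from training data
```

Points near the training data get a variance close to zero, while points far from it approach the prior variance of 1. This is the sense in which the regression "naturally" flags abnormal samples.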

Unsupervised Dialogue Act Induction using Gaussian Mixtures


Title	Unsupervised Dialogue Act Induction using Gaussian Mixtures
Authors	Tomáš Brychcín, Pavel Král
Abstract	This paper introduces a new unsupervised approach for dialogue act induction. Given a sequence of dialogue utterances, the task is to assign them labels representing their function in the dialogue. Utterances are represented as real-valued vectors encoding their meaning. We model the dialogue as a hidden Markov model with emission probabilities estimated by Gaussian mixtures. We use Gibbs sampling for posterior inference. We present results on the standard Switchboard-DAMSL corpus. Our algorithm achieves promising results compared with strong supervised baselines and outperforms other unsupervised algorithms.
Tasks
Published	2016-12-20
URL	http://arxiv.org/abs/1612.06572v2
PDF	http://arxiv.org/pdf/1612.06572v2.pdf
PWC	https://paperswithcode.com/paper/unsupervised-dialogue-act-induction-using-1
Repo
Framework

The Role of Context Types and Dimensionality in Learning Word Embeddings


Title	The Role of Context Types and Dimensionality in Learning Word Embeddings
Authors	Oren Melamud, David McClosky, Siddharth Patwardhan, Mohit Bansal
Abstract	We provide the first extensive evaluation of how using different types of context to learn skip-gram word embeddings affects performance on a wide range of intrinsic and extrinsic NLP tasks. Our results suggest that while intrinsic tasks tend to exhibit a clear preference for particular types of contexts and higher dimensionality, more careful tuning is required to find the optimal settings for most of the extrinsic tasks that we considered. Furthermore, for these extrinsic tasks, we find that once the benefit from increasing the embedding dimensionality is mostly exhausted, simple concatenation of word embeddings learned with different context types can yield further performance gains. As an additional contribution, we propose a new variant of the skip-gram model that learns word embeddings from weighted contexts of substitute words.
Tasks	Learning Word Embeddings, Word Embeddings
Published	2016-01-05
URL	http://arxiv.org/abs/1601.00893v2
PDF	http://arxiv.org/pdf/1601.00893v2.pdf
PWC	https://paperswithcode.com/paper/the-role-of-context-types-and-dimensionality
Repo
Framework

Automatic Segmentation of Dynamic Objects from an Image Pair


Title	Automatic Segmentation of Dynamic Objects from an Image Pair
Authors	Sri Raghu Malireddi, Shanmuganathan Raman
Abstract	Automatic segmentation of objects from a single image is a challenging problem which generally requires training on a large number of images. We consider the problem of automatically segmenting only the dynamic objects from a given pair of images of a scene captured from different positions. We exploit dense correspondences along with saliency measures to first localize the interest points on the dynamic objects in the two images. We propose a novel approach based on techniques from computational geometry to automatically segment the dynamic objects from both images using a top-down segmentation strategy. We discuss how the proposed approach differs from other state-of-the-art segmentation algorithms. We show that the proposed approach handles large motions efficiently and achieves very good segmentation of the objects for different scenes. We analyse the results against manually marked ground-truth segmentation masks created for our own dataset and provide key observations for improving the work in the future.
Tasks
Published	2016-04-16
URL	http://arxiv.org/abs/1604.04724v1
PDF	http://arxiv.org/pdf/1604.04724v1.pdf
PWC	https://paperswithcode.com/paper/automatic-segmentation-of-dynamic-objects
Repo
Framework

Contrastive Entropy: A new evaluation metric for unnormalized language models


Title	Contrastive Entropy: A new evaluation metric for unnormalized language models
Authors	Kushal Arora, Anand Rangarajan
Abstract	Perplexity (per word) is the most widely used metric for evaluating language models. Despite this, there has been no dearth of criticism of this metric. Most of these criticisms center on its lack of correlation with extrinsic metrics like word error rate (WER), its dependence on a shared vocabulary for model comparison, and its unsuitability for evaluating unnormalized language models. In this paper, we address the last problem and propose a new discriminative, entropy-based intrinsic metric that works for both traditional word-level models and unnormalized language models such as sentence-level models. We also propose a discriminatively trained sentence-level interpretation of the recurrent neural network based language model (RNN) as an example of an unnormalized sentence-level model. We demonstrate that for word-level models, contrastive entropy shows a strong correlation with perplexity. We also observe that when trained at lower distortion levels, the sentence-level RNN considerably outperforms traditional RNNs on this new metric.
Tasks	Language Modelling
Published	2016-01-03
URL	http://arxiv.org/abs/1601.00248v2
PDF	http://arxiv.org/pdf/1601.00248v2.pdf
PWC	https://paperswithcode.com/paper/contrastive-entropy-a-new-evaluation-metric
Repo
Framework
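For reference, per-word entropy and perplexity can be computed from per-word log-probabilities as below. The `contrastive_entropy` function is a hedged sketch that assumes the metric is the gap in per-word entropy between a distorted copy of the test set and the original test set; check the paper for the exact definition, the distortion procedure, and how unnormalized sentence scores are normalized per word.

```python
def per_word_entropy(log2_scores):
    """Per-word cross-entropy in bits, from per-word log2 probabilities or scores."""
    return -sum(log2_scores) / len(log2_scores)


def perplexity(log2_probs):
    """Standard per-word perplexity: 2 raised to the per-word entropy."""
    return 2.0 ** per_word_entropy(log2_probs)


def contrastive_entropy(log2_scores_distorted, log2_scores_original):
    """Assumed form: entropy on a distorted test set minus entropy on the original.

    A good model should assign much lower scores to distorted text, so a larger
    gap suggests better discrimination. Because only a difference of scores is
    used, the model's scores need not be normalized probabilities.
    """
    return (per_word_entropy(log2_scores_distorted)
            - per_word_entropy(log2_scores_original))
```

For example, a model giving every original word probability 1/4 has an entropy of 2 bits and a perplexity of 4. If it gives every distorted word 1/16 (4 bits), the assumed contrastive entropy is 2 bits.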

Dimensionality-Dependent Generalization Bounds for $k$-Dimensional Coding Schemes


Title	Dimensionality-Dependent Generalization Bounds for $k$-Dimensional Coding Schemes
Authors	Tongliang Liu, Dacheng Tao, Dong Xu
Abstract	The $k$-dimensional coding schemes refer to a collection of methods that attempt to represent data using a set of representative $k$-dimensional vectors, and include non-negative matrix factorization, dictionary learning, sparse coding, $k$-means clustering and vector quantization as special cases. Previous generalization bounds for the reconstruction error of the $k$-dimensional coding schemes are mainly dimensionality independent. A major advantage of these bounds is that they can be used to analyze the generalization error when data is mapped into an infinite- or high-dimensional feature space. However, many applications use finite-dimensional data features. Can we obtain dimensionality-dependent generalization bounds for $k$-dimensional coding schemes that are tighter than dimensionality-independent bounds when data is in a finite-dimensional feature space? The answer is positive. In this paper, we address this problem and derive a dimensionality-dependent generalization bound for $k$-dimensional coding schemes by bounding the covering number of the loss function class induced by the reconstruction error. The bound is of order $\mathcal{O}\left(\left(mk\ln(mkn)/n\right)^{\lambda_n}\right)$, where $m$ is the dimension of features, $k$ is the number of the columns in the linear implementation of coding schemes, $n$ is the size of sample, $\lambda_n>0.5$ when $n$ is finite and $\lambda_n=0.5$ when $n$ is infinite. We show that our bound can be tighter than previous results, because it avoids inducing the worst-case upper bound on $k$ of the loss function and converges faster. The proposed generalization bound is also applied to some specific coding schemes to demonstrate that the dimensionality-dependent bound is an indispensable complement to these dimensionality-independent generalization bounds.
Tasks	Dictionary Learning, Quantization
Published	2016-01-03
URL	http://arxiv.org/abs/1601.00238v2
PDF	http://arxiv.org/pdf/1601.00238v2.pdf
PWC	https://paperswithcode.com/paper/dimensionality-dependent-generalization
Repo
Framework

Multigrid Neural Architectures


Title	Multigrid Neural Architectures
Authors	Tsung-Wei Ke, Michael Maire, Stella X. Yu
Abstract	We propose a multigrid extension of convolutional neural networks (CNNs). Rather than manipulating representations living on a single spatial grid, our network layers operate across scale space, on a pyramid of grids. They consume multigrid inputs and produce multigrid outputs; convolutional filters themselves have both within-scale and cross-scale extent. This aspect is distinct from simple multiscale designs, which only process the input at different scales. Viewed in terms of information flow, a multigrid network passes messages across a spatial pyramid. As a consequence, receptive field size grows exponentially with depth, facilitating rapid integration of context. Most critically, multigrid structure enables networks to learn internal attention and dynamic routing mechanisms, and use them to accomplish tasks on which modern CNNs fail. Experiments demonstrate wide-ranging performance advantages of multigrid. On CIFAR and ImageNet classification tasks, flipping from a single grid to multigrid within the standard CNN paradigm improves accuracy, while being compute and parameter efficient. Multigrid is independent of other architectural choices; we show synergy in combination with residual connections. Multigrid yields dramatic improvement on a synthetic semantic segmentation dataset. Most strikingly, relatively shallow multigrid networks can learn to directly perform spatial transformation tasks, where, in contrast, current CNNs fail. Together, our results suggest that continuous evolution of features on a multigrid pyramid is a more powerful alternative to existing CNN designs on a flat grid.
Tasks	Image Classification, Semantic Segmentation
Published	2016-11-23
URL	http://arxiv.org/abs/1611.07661v2
PDF	http://arxiv.org/pdf/1611.07661v2.pdf
PWC	https://paperswithcode.com/paper/multigrid-neural-architectures
Repo
Framework

An Analysis of Tournament Structure


Title	An Analysis of Tournament Structure
Authors	Nhien Pham Hoang Bao, Hiroyuki Iida
Abstract	This paper explores a novel way of analyzing tournament structures to find the most suitable one for the tournament under consideration. It considers three aspects: tournament conducting cost, competitiveness development, and ranking precision. It then proposes a new method using a progress tree to detect potential throwaway matches. The analysis performed using the proposed method reveals the strengths and weaknesses of tournament structures. In conclusion, single elimination is best if we want to qualify only one winner; all matches conducted are exciting in terms of competitiveness. Double elimination with a proper seeding system is a better choice if we want to qualify more winners. A reasonable number of extra matches needs to be conducted in exchange for being able to qualify the top four winners. Round-robin gives reliable ranking precision for all participants. However, its conducting cost is very high, and it fails to maintain competitiveness development.
Tasks
Published	2016-11-16
URL	http://arxiv.org/abs/1611.08499v1
PDF	http://arxiv.org/pdf/1611.08499v1.pdf
PWC	https://paperswithcode.com/paper/an-analysis-of-tournament-structure
Repo
Framework
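The "conducting cost" comparison above rests on simple match counts, which can be sketched as follows. These are standard combinatorial counts for n participants, not the paper's progress-tree analysis.

```python
def match_counts(n):
    """Number of matches needed to finish a tournament with n >= 2 participants."""
    if n < 2:
        raise ValueError("need at least two participants")
    return {
        # Every match eliminates exactly one player; all but the winner lose once.
        "single_elimination": n - 1,
        # All but the champion lose twice; the champion loses at most once,
        # so one extra match occurs only if the grand final is replayed.
        "double_elimination_min": 2 * n - 2,
        "double_elimination_max": 2 * n - 1,
        # Every pair of participants meets exactly once.
        "round_robin": n * (n - 1) // 2,
    }
```

For 16 participants this gives 15 matches for single elimination, 30 or 31 for double elimination, and 120 for round-robin. The quadratic growth of the last count is why round-robin is the most expensive format to conduct.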

Adaptive matching pursuit for sparse signal recovery


Title	Adaptive matching pursuit for sparse signal recovery
Authors	Tiep H. Vu, Hojjat S. Mousavi, Vishal Monga
Abstract	Spike and Slab priors have been of much recent interest in signal processing as a means of inducing sparsity in Bayesian inference. Application domains that benefit from these priors include sparse recovery, regression and classification. It is well known that solving for the sparse coefficient vector that maximizes these priors results in a hard non-convex, mixed-integer programming problem. Most existing solutions to this optimization problem either involve simplifying assumptions/relaxations or are computationally expensive. We propose a new greedy and adaptive matching pursuit (AMP) algorithm to solve this hard problem directly. Essentially, in each step of the algorithm, the set of active elements is updated by either adding or removing one index, whichever yields the greater improvement. In addition, the intermediate steps of the algorithm are computed via an inexpensive Cholesky decomposition, which makes the algorithm much faster. Results on simulated data sets as well as real-world image recovery challenges confirm the benefits of the proposed AMP, particularly in providing a superior cost-quality trade-off over existing alternatives.
Tasks	Bayesian Inference
Published	2016-09-12
URL	http://arxiv.org/abs/1610.08495v1
PDF	http://arxiv.org/pdf/1610.08495v1.pdf
PWC	https://paperswithcode.com/paper/adaptive-matching-pursuit-for-sparse-signal
Repo
Framework
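The add-or-remove step described above can be sketched as a greedy search over supports. Two simplifying assumptions apply. First, this sketch scores each support with a least-squares residual plus an l0 penalty `lam * |S|` instead of the paper's spike-and-slab objective. Second, it re-solves each least-squares problem with `np.linalg.lstsq` instead of the inexpensive Cholesky updates the authors use. The function names are hypothetical.

```python
import numpy as np


def _cost(A, y, support, lam):
    """Residual energy on the given support plus an l0 penalty (assumed objective)."""
    if not support:
        return float(y @ y), np.zeros(0)
    cols = sorted(support)
    coef, *_ = np.linalg.lstsq(A[:, cols], y, rcond=None)
    r = y - A[:, cols] @ coef
    return float(r @ r) + lam * len(cols), coef


def greedy_add_remove(A, y, lam=0.1, max_iter=100):
    """Each step adds or removes the single index that lowers the cost most."""
    A = np.asarray(A, dtype=float)
    y = np.asarray(y, dtype=float)
    n = A.shape[1]
    support = set()
    best_cost, _ = _cost(A, y, support, lam)
    for _ in range(max_iter):
        candidates = [support | {j} for j in range(n) if j not in support]
        candidates += [support - {j} for j in support]
        costs = [_cost(A, y, c, lam)[0] for c in candidates]
        k = int(np.argmin(costs))
        if costs[k] >= best_cost - 1e-12:
            break  # neither adding nor removing an index improves the cost
        support, best_cost = candidates[k], costs[k]
    x = np.zeros(n)
    if support:
        x[sorted(support)] = _cost(A, y, support, lam)[1]
    return x
```

The removal moves are what distinguish this from plain forward matching pursuit. An index picked early can be dropped later if a better combination makes it redundant.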

Predicting Shot Making in Basketball Learnt from Adversarial Multiagent Trajectories


Title	Predicting Shot Making in Basketball Learnt from Adversarial Multiagent Trajectories
Authors	Mark Harmon, Patrick Lucey, Diego Klabjan
Abstract	In this paper, we predict the likelihood of a player making a shot in basketball from multiagent trajectories. Previous approaches to similar problems center on hand-crafting features to capture domain-specific knowledge. Although such features are intuitive, recent work in deep learning has shown this approach is prone to missing important predictive features. To circumvent this issue, we present a convolutional neural network (CNN) approach where we initially represent the multiagent behavior as an image. To encode the adversarial nature of basketball, we use a multi-channel image which we then feed into a CNN. Additionally, to capture the temporal aspect of the trajectories, we “fade” the player trajectories. We find that this approach is superior to a traditional feed-forward network (FFN) model. By using gradient ascent to create images from an already trained CNN, we discover what features the CNN filters learn. Lastly, we find that a combined CNN+FFN is the best-performing network, with an error rate of 39%.
Tasks
Published	2016-09-15
URL	http://arxiv.org/abs/1609.04849v4
PDF	http://arxiv.org/pdf/1609.04849v4.pdf
PWC	https://paperswithcode.com/paper/predicting-shot-making-in-basketball-learnt
Repo
Framework
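The multi-channel, "faded" image representation above might look roughly like this. The sketch assumes one channel per group (for example offense, defense, and ball), a 94 x 50 ft court, and a hypothetical 47 x 25 grid. Each cell stores the most recent normalized time step at which it was visited, so older positions appear dimmer. The paper's actual resolution and fading scheme may differ.

```python
import numpy as np


def trajectories_to_image(channels, court=(94.0, 50.0), grid=(47, 25)):
    """Rasterize faded trajectories into a (len(channels), rows, cols) image.

    channels: list of channels; each channel is a list of (T, 2) arrays of
    (x, y) positions in feet. Assumed layout: rows follow court length.
    """
    img = np.zeros((len(channels), grid[0], grid[1]))
    for c, trajs in enumerate(channels):
        for traj in trajs:
            traj = np.asarray(traj, dtype=float)
            T = len(traj)
            fade = np.arange(1, T + 1) / T   # oldest point ~1/T, newest 1.0
            ix = np.clip((traj[:, 0] / court[0] * grid[0]).astype(int), 0, grid[0] - 1)
            iy = np.clip((traj[:, 1] / court[1] * grid[1]).astype(int), 0, grid[1] - 1)
            # Keep the brightest (most recent) value when points share a cell.
            np.maximum.at(img[c], (ix, iy), fade)
    return img
```

Putting opposing teams in separate channels lets the convolutional filters see both sides at the same location. This is one plausible reading of how a multi-channel image "encodes the adversarial nature" of the game.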

Scale Invariant Interest Points with Shearlets


Title	Scale Invariant Interest Points with Shearlets
Authors	Miguel A. Duval-Poo, Nicoletta Noceti, Francesca Odone, Ernesto De Vito
Abstract	Shearlets are a relatively new directional multi-scale framework for signal analysis, which have been shown to be effective at enhancing signal discontinuities such as edges and corners at multiple scales. In this work we address the problem of detecting and describing blob-like features in the shearlet framework. We derive a measure which is very effective for blob detection and closely related to the Laplacian of Gaussian. We demonstrate that the measure satisfies the perfect scale invariance property in the continuous case. In the discrete setting, we derive algorithms for blob detection and keypoint description. Finally, we provide qualitative justifications of our findings as well as a quantitative evaluation on benchmark data. We also report experimental evidence that our method is well suited to compressed and noisy images, thanks to the sparsity property of shearlets.
Tasks
Published	2016-07-26
URL	http://arxiv.org/abs/1607.07639v1
PDF	http://arxiv.org/pdf/1607.07639v1.pdf
PWC	https://paperswithcode.com/paper/scale-invariant-interest-points-with
Repo
Framework

Training Auto-encoders Effectively via Eliminating Task-irrelevant Input Variables


Title	Training Auto-encoders Effectively via Eliminating Task-irrelevant Input Variables
Authors	Hui Shen, Dehua Li, Hong Wu, Zhaoxiang Zang
Abstract	Auto-encoders are often used as building blocks of deep network classifiers to learn feature extractors, but task-irrelevant information in the input data may lead to poor extractors and result in poor generalization performance of the network. In this paper, the performance of auto-encoders is clearly improved by dropping the task-irrelevant input variables. Specifically, an importance-based variable selection method is proposed that aims to find the task-irrelevant input variables and drop them. It first estimates the importance of each variable, and then drops the variables whose importance value is lower than a threshold. To obtain better performance, the method can be employed for each layer of stacked auto-encoders. Experimental results show that when combined with our method, stacked denoising auto-encoders achieve significantly improved performance on three challenging datasets.
Tasks	Denoising
Published	2016-05-31
URL	http://arxiv.org/abs/1605.09458v1
PDF	http://arxiv.org/pdf/1605.09458v1.pdf
PWC	https://paperswithcode.com/paper/training-auto-encoders-effectively-via
Repo
Framework
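The two-step procedure in the abstract above (estimate importance, then drop variables below a threshold) can be sketched as follows. The abstract does not say how importance is estimated. Purely as a hypothetical stand-in, this sketch scores each input variable by the summed absolute first-layer weights of a trained auto-encoder, normalized to a maximum of 1. The function names and default threshold are also hypothetical.

```python
import numpy as np


def select_variables(W, threshold=0.1):
    """Return indices of input variables to keep, plus their importance scores.

    W: (hidden, inputs) first-layer weight matrix of a trained auto-encoder.
    Hypothetical importance: column-wise sum of absolute weights, scaled to max 1.
    """
    importance = np.abs(W).sum(axis=0)
    importance = importance / importance.max()
    keep = np.flatnonzero(importance >= threshold)
    return keep, importance


def drop_variables(X, keep):
    """Remove the low-importance columns before retraining the auto-encoder."""
    return X[:, keep]
```

In a stacked setting, the same selection could be repeated on each layer's inputs before training the next layer, as the abstract suggests.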

T-CONV: A Convolutional Neural Network For Multi-scale Taxi Trajectory Prediction


Title	T-CONV: A Convolutional Neural Network For Multi-scale Taxi Trajectory Prediction
Authors	Jianming Lv, Qing Li, Xintong Wang
Abstract	Precise destination prediction of taxi trajectories can benefit many intelligent location-based services, such as accurate advertising for passengers. Traditional prediction approaches, which treat trajectories as one-dimensional sequences and process them at a single scale, fail to capture the diverse two-dimensional patterns of trajectories at different spatial scales. In this paper, we propose T-CONV, which models trajectories as two-dimensional images and adopts multi-layer convolutional neural networks to combine multi-scale trajectory patterns for precise prediction. Furthermore, we conduct gradient analysis to visualize the multi-scale spatial patterns captured by T-CONV and extract the areas with distinct influence on the final prediction. Finally, we integrate multiple local enhancement convolutional fields to explore these important areas in depth for better prediction. Comprehensive experiments based on real trajectory data show that T-CONV can achieve higher accuracy than state-of-the-art methods.
Tasks	Trajectory Prediction
Published	2016-11-23
URL	http://arxiv.org/abs/1611.07635v3
PDF	http://arxiv.org/pdf/1611.07635v3.pdf
PWC	https://paperswithcode.com/paper/t-conv-a-convolutional-neural-network-for
Repo
Framework
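Treating a trajectory as a two-dimensional image, rather than a one-dimensional sequence, can be sketched by rasterizing GPS points onto grids of several resolutions. The bounding box, the grid sizes, and the binary occupancy encoding are assumptions for illustration. T-CONV's actual input construction and its multi-scale convolutional layers are not reproduced here.

```python
import numpy as np


def rasterize(points, bbox, size):
    """Binary size x size image marking grid cells visited by (lat, lon) points."""
    lat_min, lat_max, lon_min, lon_max = bbox
    pts = np.asarray(points, dtype=float)
    rows = ((pts[:, 0] - lat_min) / (lat_max - lat_min) * size).astype(int)
    cols = ((pts[:, 1] - lon_min) / (lon_max - lon_min) * size).astype(int)
    rows = np.clip(rows, 0, size - 1)   # points on or beyond the box edge are clamped
    cols = np.clip(cols, 0, size - 1)
    img = np.zeros((size, size))
    img[rows, cols] = 1.0
    return img


def multi_scale_images(points, bbox, sizes=(8, 32)):
    """One image per grid resolution: coarse grids summarize, fine grids localize."""
    return [rasterize(points, bbox, s) for s in sizes]
```

A convolutional network could then consume these images directly. Stacking them or processing them with separate branches is one way to expose spatial patterns at different scales.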