May 7, 2019

Paper Group ANR 51

Multi-task Learning with Weak Class Labels: Leveraging iEEG to Detect Cortical Lesions in Cryptogenic Epilepsy

Title Multi-task Learning with Weak Class Labels: Leveraging iEEG to Detect Cortical Lesions in Cryptogenic Epilepsy
Authors Bilal Ahmed, Thomas Thesen, Karen E. Blackmon, Ruben Kuzniecky, Orrin Devinsky, Jennifer G. Dy, Carla E. Brodley
Abstract Multi-task learning (MTL) is useful for domains in which data originate from multiple sources that are individually under-sampled. MTL methods can learn classification models with higher performance than either aggregating all the data to learn a single model or learning a separate model for each data source. The performance of these methods relies on label accuracy. We address the problem of simultaneously learning multiple classifiers in the MTL framework when the training data has imprecise labels. We assume that there is an additional source of information that provides a score for each instance reflecting the certainty about its label. Modeling this score as being generated by an underlying ranking function, we augment the MTL framework with an added layer of supervision. This results in new MTL methods that are able to learn accurate classifiers while preserving the domain structure provided through the rank information. We apply these methods to the task of detecting abnormal cortical regions in the MRIs of patients suffering from focal epilepsy whose MRIs were read as normal by expert neuroradiologists. In addition to the noisy labels provided by the results of surgical resection, we employ the results of an invasive intracranial-EEG exam as an additional source of label information. Our proposed methods successfully detect abnormal regions for all patients in our dataset and achieve higher performance than baseline methods.
Tasks EEG, Multi-Task Learning
Published 2016-07-30
URL http://arxiv.org/abs/1608.00148v1
PDF http://arxiv.org/pdf/1608.00148v1.pdf
PWC https://paperswithcode.com/paper/multi-task-learning-with-weak-class-labels
Repo
Framework

Unsupervised Ranking Model for Entity Coreference Resolution

Title Unsupervised Ranking Model for Entity Coreference Resolution
Authors Xuezhe Ma, Zhengzhong Liu, Eduard Hovy
Abstract Coreference resolution is one of the first stages in deep language understanding, and its importance has been well recognized in the natural language processing community. In this paper, we propose a generative, unsupervised ranking model for entity coreference resolution by introducing resolution mode variables. Our unsupervised system achieves a CoNLL F1 score of 58.44% on the English data from the CoNLL-2012 shared task (Pradhan et al., 2012), outperforming the Stanford deterministic system (Lee et al., 2013) by 3.01%.
Tasks Coreference Resolution
Published 2016-03-15
URL http://arxiv.org/abs/1603.04553v1
PDF http://arxiv.org/pdf/1603.04553v1.pdf
PWC https://paperswithcode.com/paper/unsupervised-ranking-model-for-entity
Repo
Framework

Kernel Selection using Multiple Kernel Learning and Domain Adaptation in Reproducing Kernel Hilbert Space, for Face Recognition under Surveillance Scenario

Title Kernel Selection using Multiple Kernel Learning and Domain Adaptation in Reproducing Kernel Hilbert Space, for Face Recognition under Surveillance Scenario
Authors Samik Banerjee, Sukhendu Das
Abstract Face Recognition (FR) has been of interest to several researchers over the past few decades due to its passive nature of biometric authentication. Despite the high accuracy achieved by face recognition algorithms under controlled conditions, achieving the same performance on face images obtained in surveillance scenarios is a major hurdle. Some attempts have been made to super-resolve the low-resolution face images and improve the contrast, without a considerable degree of success. The technique proposed in this paper copes with the very low resolution and low contrast face images obtained from surveillance cameras, for FR under surveillance conditions. For Support Vector Machine classification, the selection of an appropriate kernel has been a widely discussed issue in the research community. In this paper, we propose a novel kernel selection technique termed MFKL (Multi-Feature Kernel Learning) to obtain the best feature-kernel pairing. Our proposed technique performs effective kernel selection by the Multiple Kernel Learning (MKL) method, choosing the optimal kernel to be used along with an unsupervised domain adaptation method in the Reproducing Kernel Hilbert Space (RKHS), for a solution to the problem. Rigorous experimentation has been performed on three real-world surveillance face datasets: FR_SURV, SCface and ChokePoint. Results are reported using Rank-1 Recognition Accuracy, ROC and CMC measures. Our proposed method outperforms all other recent state-of-the-art techniques by a considerable margin.
Tasks Domain Adaptation, Face Recognition, Unsupervised Domain Adaptation
Published 2016-10-03
URL http://arxiv.org/abs/1610.00660v1
PDF http://arxiv.org/pdf/1610.00660v1.pdf
PWC https://paperswithcode.com/paper/kernel-selection-using-multiple-kernel
Repo
Framework

Pursuits in Structured Non-Convex Matrix Factorizations

Title Pursuits in Structured Non-Convex Matrix Factorizations
Authors Rajiv Khanna, Michael Tschannen, Martin Jaggi
Abstract Efficiently representing real world data in a succinct and parsimonious manner is of central importance in many fields. We present a generalized greedy pursuit framework, allowing us to efficiently solve structured matrix factorization problems, where the factors are allowed to be from arbitrary sets of structured vectors. Such structure may include sparsity, non-negativity, order, or a combination thereof. The algorithm approximates a given matrix by a linear combination of a few rank-1 matrices, each factorized into an outer product of two vector atoms of the desired structure. For the non-convex subproblems of obtaining good rank-1 structured matrix atoms, we employ and analyze a general atomic power method. In addition to the above applications, we prove linear convergence for generalized pursuit variants in Hilbert spaces - for the task of approximation over the linear span of arbitrary dictionaries - which generalizes OMP and is useful beyond matrix problems. Our experiments on real datasets confirm both the efficiency and the broad applicability of our framework in practice.
Tasks
Published 2016-02-12
URL http://arxiv.org/abs/1602.04208v1
PDF http://arxiv.org/pdf/1602.04208v1.pdf
PWC https://paperswithcode.com/paper/pursuits-in-structured-non-convex-matrix
Repo
Framework
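
The greedy pursuit idea can be illustrated with a minimal NumPy sketch: repeatedly extract a rank-1 atom from the current residual via a power method, then add it to the approximation with its optimal weight. This is a simplified, unconstrained variant (no structure imposed on the atoms), and the function names are ours, not the paper's.

```python
import numpy as np

def rank1_power_atom(R, n_iter=50):
    """Find an approximate leading rank-1 atom (u, v) of residual R
    by alternating power iterations."""
    v = np.ones(R.shape[1]) / np.sqrt(R.shape[1])
    for _ in range(n_iter):
        u = R @ v
        u /= np.linalg.norm(u) + 1e-12
        v = R.T @ u
        v /= np.linalg.norm(v) + 1e-12
    return u, v

def greedy_matrix_pursuit(M, n_atoms):
    """Greedily approximate M by a sum of rank-1 atoms: at each step,
    fit one atom to the current residual and add it with its optimal weight."""
    approx = np.zeros_like(M)
    for _ in range(n_atoms):
        R = M - approx                 # current residual
        u, v = rank1_power_atom(R)
        sigma = u @ R @ v              # optimal coefficient for this atom
        approx += sigma * np.outer(u, v)
    return approx

rng = np.random.default_rng(0)
M = rng.standard_normal((20, 3)) @ rng.standard_normal((3, 15))  # exactly rank 3
approx = greedy_matrix_pursuit(M, n_atoms=3)
err = np.linalg.norm(M - approx) / np.linalg.norm(M)
print(f"relative Frobenius error after 3 atoms: {err:.2e}")
```

With a rank-3 matrix, three greedy atoms recover it almost exactly; constraining the atoms (e.g., to non-negative or sparse vectors) is where the paper's structured framework comes in.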

Document image classification, with a specific view on applications of patent images

Title Document image classification, with a specific view on applications of patent images
Authors Gabriela Csurka
Abstract The main focus of this paper is document image classification and retrieval, where we analyze and compare different parameters for the RunLength Histogram (RL) and Fisher Vector (FV) based image representations. We do an exhaustive experimental study using different document image datasets, including the MARG benchmarks, two datasets built on customer data and the images from the Patent Image Classification task of the Clef-IP 2011. The aim of the study is to give guidelines on how to best choose the parameters such that the same features perform well on different tasks. As an example of such need, we describe the Image-based Patent Retrieval task of Clef-IP 2011, where we used the same image representation to predict the image type and retrieve relevant patents.
Tasks Document Image Classification, Image Classification
Published 2016-01-13
URL http://arxiv.org/abs/1601.03295v1
PDF http://arxiv.org/pdf/1601.03295v1.pdf
PWC https://paperswithcode.com/paper/document-image-classification-with-a-specific
Repo
Framework

The VQA-Machine: Learning How to Use Existing Vision Algorithms to Answer New Questions

Title The VQA-Machine: Learning How to Use Existing Vision Algorithms to Answer New Questions
Authors Peng Wang, Qi Wu, Chunhua Shen, Anton van den Hengel
Abstract One of the most intriguing features of the Visual Question Answering (VQA) challenge is the unpredictability of the questions. Extracting the information required to answer them demands a variety of image operations, from detection and counting to segmentation and reconstruction. To train a method to perform even one of these operations accurately from {image,question,answer} tuples would be challenging, but to aim to achieve them all with a limited set of such training data seems ambitious at best. We propose here instead a more general and scalable approach which exploits the fact that very good methods to achieve these operations already exist, and thus do not need to be trained. Our method thus learns how to exploit a set of external off-the-shelf algorithms to achieve its goal, an approach that has something in common with the Neural Turing Machine. The core of our proposed method is a new co-attention model. In addition, the proposed approach generates human-readable reasons for its decision, and can still be trained end-to-end without ground-truth reasons being given. We demonstrate its effectiveness on two publicly available datasets, Visual Genome and VQA, and show that it produces state-of-the-art results in both cases.
Tasks Question Answering, Visual Question Answering
Published 2016-12-16
URL http://arxiv.org/abs/1612.05386v1
PDF http://arxiv.org/pdf/1612.05386v1.pdf
PWC https://paperswithcode.com/paper/the-vqa-machine-learning-how-to-use-existing
Repo
Framework

Modeling selectional restrictions in a relational type system

Title Modeling selectional restrictions in a relational type system
Authors Erkki Luuk
Abstract Selectional restrictions are semantic constraints on forming certain complex types in natural language. The paper gives an overview of modeling selectional restrictions in a relational type system with morphological and syntactic types. We discuss some foundations of the system and ways of formalizing selectional restrictions. Keywords: type theory, selectional restrictions, syntax, morphology
Tasks
Published 2016-07-28
URL http://arxiv.org/abs/1607.08592v1
PDF http://arxiv.org/pdf/1607.08592v1.pdf
PWC https://paperswithcode.com/paper/modeling-selectional-restrictions-in-a
Repo
Framework

3D Keypoint Detection Based on Deep Neural Network with Sparse Autoencoder

Title 3D Keypoint Detection Based on Deep Neural Network with Sparse Autoencoder
Authors Xinyu Lin, Ce Zhu, Qian Zhang, Yipeng Liu
Abstract Researchers have proposed various methods to extract 3D keypoints from the surface of 3D mesh models over the last decades, but most of them are based on geometric methods, which lack enough flexibility to meet the requirements of various applications. In this paper, we propose a new method on the basis of deep learning, formulating 3D keypoint detection as a regression problem using a deep neural network (DNN) with a sparse autoencoder (SAE) as our regression model. Both local and global information of a 3D mesh model in multi-scale space are fully utilized to detect whether a vertex is a keypoint or not. The SAE can effectively extract the internal structure of these two kinds of information and formulate high-level features for them, which is beneficial to the regression model. Three SAEs are used to form the hidden layers of the DNN, and then a logistic regression layer is trained to process the high-level features produced in the third SAE. Numerical experiments show that the proposed DNN-based 3D keypoint detection algorithm outperforms five current state-of-the-art methods for various 3D mesh models.
Tasks Keypoint Detection
Published 2016-04-30
URL http://arxiv.org/abs/1605.00129v1
PDF http://arxiv.org/pdf/1605.00129v1.pdf
PWC https://paperswithcode.com/paper/3d-keypoint-detection-based-on-deep-neural
Repo
Framework

Convergence of Contrastive Divergence Algorithm in Exponential Family

Title Convergence of Contrastive Divergence Algorithm in Exponential Family
Authors Bai Jiang, Tung-Yu Wu, Yifan Jin, Wing H. Wong
Abstract The Contrastive Divergence (CD) algorithm has achieved notable success in training energy-based models, including Restricted Boltzmann Machines, and played a key role in the emergence of deep learning. The idea of this algorithm is to approximate the intractable term in the exact gradient of the log-likelihood function by using short Markov chain Monte Carlo (MCMC) runs. The approximate gradient is computationally cheap but biased. Whether and why the CD algorithm provides an asymptotically consistent estimate are still open questions. This paper studies the asymptotic properties of the CD algorithm in canonical exponential families, which are special cases of the energy-based model. Suppose the CD algorithm runs $m$ MCMC transition steps at each iteration $t$ and iteratively generates a sequence of parameter estimates $\{\theta_t\}_{t \ge 0}$ given an i.i.d. data sample $\{X_i\}_{i=1}^n \sim p_{\theta_\star}$. Under conditions which are commonly obeyed by the CD algorithm in practice, we prove the existence of some bounded $m$ such that any limit point of the time average $\frac{1}{t}\sum_{s=0}^{t-1} \theta_s$ as $t \to \infty$ is a consistent estimate for the true parameter $\theta_\star$. Our proof is based on the fact that $\{\theta_t\}_{t \ge 0}$ is a homogeneous Markov chain conditional on the data sample $\{X_i\}_{i=1}^n$. This chain meets the Foster-Lyapunov drift criterion and converges to a random walk around the Maximum Likelihood Estimate. The range of the random walk shrinks to zero at rate $\mathcal{O}(1/\sqrt[3]{n})$ as the sample size $n \to \infty$.
Tasks
Published 2016-03-17
URL http://arxiv.org/abs/1603.05729v3
PDF http://arxiv.org/pdf/1603.05729v3.pdf
PWC https://paperswithcode.com/paper/convergence-of-contrastive-divergence
Repo
Framework
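
The CD-$m$ recipe described above (start MCMC chains at the data, run $m$ transition steps, use the difference of sufficient-statistic means as an approximate gradient) can be seen in a toy canonical exponential family: a Bernoulli model $p_\theta(x) = \exp(\theta x - A(\theta))$ with a Metropolis flip kernel. The model, learning rate, and function names below are illustrative choices, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(1)

# Bernoulli in canonical form: p_theta(x) = exp(theta*x - A(theta)), x in {0, 1}
theta_star = 0.8
p_star = 1.0 / (1.0 + np.exp(-theta_star))       # P(X=1) under theta_star
X = (rng.random(5000) < p_star).astype(float)    # i.i.d. data sample

def mcmc_step(x, theta, rng):
    """One Metropolis flip targeting p_theta (sufficient statistic T(x) = x)."""
    x_new = 1.0 - x
    accept = rng.random(x.shape) < np.minimum(1.0, np.exp(theta * (x_new - x)))
    return np.where(accept, x_new, x)

def cd_estimate(X, m=1, lr=0.05, n_iter=2000, rng=rng):
    """CD-m: restart chains at the data each iteration, run m MCMC steps,
    and step theta along the difference of sufficient-statistic means."""
    theta, avg = 0.0, 0.0
    for _ in range(n_iter):
        x_tilde = X.copy()
        for _ in range(m):
            x_tilde = mcmc_step(x_tilde, theta, rng)
        theta += lr * (X.mean() - x_tilde.mean())
        avg += theta
    return avg / n_iter        # time average of the iterates, as in the paper

theta_hat = cd_estimate(X, m=1)
print(f"true theta: {theta_star:.3f}, CD time-average estimate: {theta_hat:.3f}")
```

In this one-dimensional example the time average settles near the MLE $\mathrm{logit}(\bar X)$, which is itself close to $\theta_\star$ for a large sample, mirroring the random-walk-around-the-MLE picture in the abstract.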

Large Scale Deep Convolutional Neural Network Features Search with Lucene

Title Large Scale Deep Convolutional Neural Network Features Search with Lucene
Authors Claudio Gennaro
Abstract In this work, we propose an approach to index Deep Convolutional Neural Network Features to support efficient content-based retrieval on large image databases. To this aim, we have converted these features into a textual form, to index them into an inverted index by means of Lucene. In this way, we were able to set up a robust retrieval system that combines full-text search with content-based image retrieval capabilities. We evaluated different strategies of textual representation in order to optimize the index occupation and the query response time. In order to show that our approach is able to handle large datasets, we have developed a web-based prototype that provides an interface for combined textual and visual searching into a dataset of about 100 million images.
Tasks Content-Based Image Retrieval, Image Retrieval
Published 2016-03-31
URL http://arxiv.org/abs/1603.09687v4
PDF http://arxiv.org/pdf/1603.09687v4.pdf
PWC https://paperswithcode.com/paper/large-scale-deep-convolutional-neural-network
Repo
Framework
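
One plausible way to turn a deep feature vector into indexable text, in the spirit described above, is to quantize each component into a repeated surrogate term so that a full-text engine's term-frequency scoring approximates the dot product. The encoding below is a sketch of the general idea rather than the paper's exact scheme, and it leaves out Lucene itself.

```python
import numpy as np

def feature_to_surrogate_text(f, quantization=30):
    """Encode a non-negative deep-feature vector as space-separated terms:
    component i with quantized value q contributes the term 'fi' repeated q
    times, so term frequencies in the inverted index mirror feature magnitudes."""
    f = np.maximum(f, 0)                                  # ReLU features are non-negative
    q = np.rint(quantization * f / (f.max() + 1e-12)).astype(int)
    terms = []
    for i, reps in enumerate(q):
        terms.extend([f"f{i}"] * reps)
    return " ".join(terms)

feat = np.array([0.0, 0.9, 0.3, 0.0, 0.45])
doc = feature_to_surrogate_text(feat, quantization=10)
print(doc)   # 10 copies of "f1", 3 of "f2", 5 of "f4"; zero components emit nothing
```

The resulting string can be fed to any inverted-index engine as a document; the quantization level trades index size against how faithfully dot products are preserved, which matches the index-occupation vs. response-time trade-off the abstract mentions.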

Automated Image Captioning for Rapid Prototyping and Resource Constrained Environments

Title Automated Image Captioning for Rapid Prototyping and Resource Constrained Environments
Authors Karan Sharma, Arun CS Kumar, Suchendra Bhandarkar
Abstract Significant performance gains in deep learning, coupled with the exponential growth of image and video data on the Internet, have resulted in the recent emergence of automated image captioning systems. Ensuring the scalability of automated image captioning systems with respect to the ever-increasing volume of image and video data is a significant challenge. This paper provides a valuable insight: the detection of a few significant (top) objects in an image allows one to extract other relevant information, such as actions (verbs), in the image. We expect this insight to be useful in the design of scalable image captioning systems. We address two parameters by which the scalability of image captioning systems could be quantified: the traditional algorithmic time complexity, which is important given the resource limitations of the user device, and the system development time, since the programmers’ time is a critical resource constraint in many real-world scenarios. Additionally, we address the issue of how word embeddings could be used to infer the verb (action) from the nouns (objects) in a given image in a zero-shot manner. Our results show that it is possible to attain reasonably good performance on predicting actions and captioning images using our approaches, with the added advantage of simplicity of implementation.
Tasks Image Captioning, Word Embeddings
Published 2016-06-04
URL http://arxiv.org/abs/1606.01393v1
PDF http://arxiv.org/pdf/1606.01393v1.pdf
PWC https://paperswithcode.com/paper/automated-image-captioning-for-rapid
Repo
Framework
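
The zero-shot noun-to-verb step can be sketched with toy embeddings: pick the verb whose vector is closest (by cosine similarity) to the centroid of the detected object nouns. The vectors below are made-up illustrative values; a real system would load pretrained embeddings such as word2vec or GloVe.

```python
import numpy as np

# Hypothetical 3-d embeddings for illustration only.
emb = {
    "dog":     np.array([0.9, 0.1, 0.0]),
    "frisbee": np.array([0.8, 0.2, 0.1]),
    "running": np.array([0.7, 0.3, 0.0]),
    "pizza":   np.array([0.0, 0.9, 0.2]),
    "eating":  np.array([0.1, 0.8, 0.3]),
}
VERBS = ["running", "eating"]

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def infer_verb(nouns):
    """Zero-shot verb prediction: the verb whose embedding is most similar
    to the mean embedding of the detected object nouns."""
    centroid = np.mean([emb[n] for n in nouns], axis=0)
    return max(VERBS, key=lambda v: cosine(emb[v], centroid))

print(infer_verb(["dog", "frisbee"]))   # → running
print(infer_verb(["pizza"]))            # → eating
```

No (noun, verb) training pairs are needed; the only requirement is that nouns and verbs live in the same embedding space, which is what makes the inference zero-shot.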

Multi-view Dimensionality Reduction for Dialect Identification of Arabic Broadcast Speech

Title Multi-view Dimensionality Reduction for Dialect Identification of Arabic Broadcast Speech
Authors Sameer Khurana, Ahmed Ali, Steve Renals
Abstract In this work, we present a new Vector Space Model (VSM) of speech utterances for the task of spoken dialect identification (DID). Generally, DID systems are built using two sets of features extracted from speech utterances: acoustic and phonetic. The acoustic and phonetic features are used to form vector representations of speech utterances in an attempt to encode information about the spoken dialects. The Phonotactic and Acoustic VSMs, thus formed, are used for the task of DID. The aim of this paper is to construct a single VSM that encodes information about spoken dialects from both the Phonotactic and Acoustic VSMs. Given the two views of the data, we make use of a well-known multi-view dimensionality reduction technique, Canonical Correlation Analysis (CCA), to form a single vector representation for each speech utterance that encodes dialect-specific discriminative information from both the phonetic and acoustic representations. We refer to this approach as the feature space combination approach and show that our CCA-based feature vector representation performs better on the Arabic DID task than the phonetic and acoustic feature representations used alone. We also present the feature space combination approach as a viable alternative to the model-based combination approach, where two DID systems are built using the two VSMs (Phonotactic and Acoustic) and the final prediction score is the output score combination from the two systems.
Tasks Dimensionality Reduction
Published 2016-09-19
URL http://arxiv.org/abs/1609.05650v1
PDF http://arxiv.org/pdf/1609.05650v1.pdf
PWC https://paperswithcode.com/paper/multi-view-dimensionality-reduction-for
Repo
Framework
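
The feature-space combination step can be sketched in NumPy: fit classical CCA on the two views and concatenate the projected views into a single vector per utterance. The synthetic two-view data below stands in for the acoustic and phonotactic VSMs; the implementation is a textbook CCA via the whitened cross-covariance, not the paper's code.

```python
import numpy as np

def cca_projections(A, P, k=2, reg=1e-6):
    """Classical CCA: whiten each view, SVD the cross-covariance, and return
    per-view projection matrices plus the top-k canonical correlations."""
    A = A - A.mean(0)
    P = P - P.mean(0)
    n = A.shape[0]
    Caa = A.T @ A / n + reg * np.eye(A.shape[1])
    Cpp = P.T @ P / n + reg * np.eye(P.shape[1])
    Cap = A.T @ P / n
    Wa = np.linalg.inv(np.linalg.cholesky(Caa)).T   # whitener for view A
    Wp = np.linalg.inv(np.linalg.cholesky(Cpp)).T   # whitener for view P
    U, s, Vt = np.linalg.svd(Wa.T @ Cap @ Wp)
    return Wa @ U[:, :k], Wp @ Vt.T[:, :k], s[:k]

rng = np.random.default_rng(0)
z = rng.standard_normal((500, 2))                   # shared latent "dialect" signal
A = z @ rng.standard_normal((2, 10)) + 0.1 * rng.standard_normal((500, 10))
P = z @ rng.standard_normal((2, 8)) + 0.1 * rng.standard_normal((500, 8))
Wa, Wp, corrs = cca_projections(A, P, k=2)
# Single VSM: concatenate the two projected views per utterance.
combined = np.hstack([(A - A.mean(0)) @ Wa, (P - P.mean(0)) @ Wp])
print("top canonical correlations:", np.round(corrs, 3))
```

When the two views share a latent signal, the top canonical correlations are close to 1 and the concatenated projections retain that shared, dialect-discriminative subspace from both views.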

Emergence of Compositional Representations in Restricted Boltzmann Machines

Title Emergence of Compositional Representations in Restricted Boltzmann Machines
Authors Jérôme Tubiana, Rémi Monasson
Abstract Automatically extracting the complex set of features composing real high-dimensional data is crucial for achieving high performance in machine-learning tasks. Restricted Boltzmann Machines (RBM) are empirically known to be efficient for this purpose, and to be able to generate distributed and graded representations of the data. We characterize the structural conditions (sparsity of the weights, low effective temperature, nonlinearities in the activation functions of hidden units, and adaptation of fields maintaining the activity in the visible layer) allowing RBMs to operate in such a compositional phase. Evidence is provided by the replica analysis of an adequate statistical ensemble of random RBMs and by RBMs trained on the handwritten digits dataset MNIST.
Tasks
Published 2016-11-21
URL http://arxiv.org/abs/1611.06759v2
PDF http://arxiv.org/pdf/1611.06759v2.pdf
PWC https://paperswithcode.com/paper/emergence-of-compositional-representations-in
Repo
Framework

Autism Spectrum Disorder Classification using Graph Kernels on Multidimensional Time Series

Title Autism Spectrum Disorder Classification using Graph Kernels on Multidimensional Time Series
Authors Rushil Anirudh, Jayaraman J. Thiagarajan, Irene Kim, Wolfgang Polonik
Abstract We present an approach to model time series data from resting state fMRI for autism spectrum disorder (ASD) severity classification. We propose to adopt kernel machines and employ graph kernels that define a kernel dot product between two graphs. This enables us to take advantage of spatio-temporal information to capture the dynamics of the brain network, as opposed to aggregating them in the spatial or temporal dimension. In addition to the conventional similarity graphs, we explore the use of an L1 graph built via sparse coding, and the persistent homology of time-delay embeddings, in the proposed pipeline for ASD classification. In our experiments on two datasets from the ABIDE collection, we demonstrate a consistent and significant advantage in using graph kernels over traditional linear or non-linear kernels for a variety of time series features.
Tasks Time Series
Published 2016-11-29
URL http://arxiv.org/abs/1611.09897v1
PDF http://arxiv.org/pdf/1611.09897v1.pdf
PWC https://paperswithcode.com/paper/autism-spectrum-disorder-classification-using
Repo
Framework
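
A stripped-down version of this pipeline: build a similarity graph per multivariate time series from thresholded correlations, then compare graphs with a linear (Frobenius inner product) kernel. That inner product is a valid positive semi-definite kernel between graphs on a shared node set, though far simpler than the graph kernels used in the paper; the synthetic "ROI" data and thresholds are illustrative.

```python
import numpy as np

def similarity_graph(ts, threshold=0.3):
    """Build a weighted similarity graph from a multivariate time series:
    nodes are ROIs, edge weights are thresholded absolute correlations."""
    C = np.abs(np.corrcoef(ts))
    np.fill_diagonal(C, 0.0)
    C[C < threshold] = 0.0
    return C

def graph_dot_kernel(G1, G2):
    """Linear kernel between two graphs on the same node set: the Frobenius
    inner product of their weighted adjacency matrices (a valid PSD kernel)."""
    return float(np.sum(G1 * G2))

rng = np.random.default_rng(0)
base = rng.standard_normal((1, 200))
ts_a = base + 0.3 * rng.standard_normal((6, 200))   # strongly coupled ROIs
ts_b = rng.standard_normal((6, 200))                # independent ROIs
Ga, Gb = similarity_graph(ts_a), similarity_graph(ts_b)
print(graph_dot_kernel(Ga, Ga), graph_dot_kernel(Ga, Gb))
```

A kernel SVM can then consume the resulting Gram matrix directly; the coupled time series yields a dense graph with a large self-kernel, while its kernel against the uncorrelated series stays near zero.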

End-to-end attention-based distant speech recognition with Highway LSTM

Title End-to-end attention-based distant speech recognition with Highway LSTM
Authors Hassan Taherian
Abstract End-to-end attention-based models have been shown to be competitive alternatives to conventional DNN-HMM models in speech recognition systems. In this paper, we extend existing end-to-end attention-based models to the Distant Speech Recognition (DSR) task. Specifically, we propose an end-to-end attention-based speech recognizer with multichannel input that performs sequence prediction directly at the character level. To gain better performance, we also incorporate Highway long short-term memory (HLSTM), which outperforms previous models on the AMI distant speech recognition task.
Tasks Distant Speech Recognition, Speech Recognition
Published 2016-10-17
URL http://arxiv.org/abs/1610.05361v1
PDF http://arxiv.org/pdf/1610.05361v1.pdf
PWC https://paperswithcode.com/paper/end-to-end-attention-based-distant-speech
Repo
Framework