May 7, 2019

Paper Group ANR 45

Sequence to sequence learning for unconstrained scene text recognition


Title	Sequence to sequence learning for unconstrained scene text recognition
Authors	Ahmed Mamdouh A. Hassanien
Abstract	In this work we present a state-of-the-art approach for unconstrained natural scene text recognition. We propose a cascade approach that incorporates a convolutional neural network (CNN) architecture followed by a long short-term memory model (LSTM). The CNN learns visual features for the characters and uses them with a softmax layer to detect sequences of characters. While the CNN gives very good recognition results, it does not model the relation between characters, and hence gives rise to false positive and false negative cases (confusing characters due to visual similarities like “g” and “9”, or confusing background patches with characters; either removing existing characters or adding non-existing ones). To alleviate these problems we leverage recent developments in LSTM architectures to encode contextual information. We show that the LSTM can dramatically reduce such errors and achieve state-of-the-art accuracy in the task of unconstrained natural scene text recognition. Moreover, we manually remove all occurrences of the words that exist in the test set from our training set to test whether our approach will generalize to unseen data. We use the ICDAR 13 test set for evaluation and compare the results with the state-of-the-art approaches [11, 18]. We finally present an application of the work in the domain of traffic monitoring.
Tasks	Scene Text Recognition
Published	2016-07-20
URL	http://arxiv.org/abs/1607.06125v1
PDF	http://arxiv.org/pdf/1607.06125v1.pdf
PWC	https://paperswithcode.com/paper/sequence-to-sequence-learning-for
Repo
Framework

Am I a Baller? Basketball Performance Assessment from First-Person Videos


Title	Am I a Baller? Basketball Performance Assessment from First-Person Videos
Authors	Gedas Bertasius, Hyun Soo Park, Stella X. Yu, Jianbo Shi
Abstract	This paper presents a method to assess a basketball player’s performance from his/her first-person video. A key challenge lies in the fact that the evaluation metric is highly subjective and specific to a particular evaluator. We leverage the first-person camera to address this challenge. The spatiotemporal visual semantics provided by a first-person view allow us to reason about the camera wearer’s actions while he/she is participating in an unscripted basketball game. Our method takes a player’s first-person video and provides a player’s performance measure that is specific to an evaluator’s preference. To achieve this goal, we first use a convolutional LSTM network to detect atomic basketball events from first-person videos. Our network’s ability to zoom in to the salient regions addresses the issue of the camera wearer’s severe head movement in first-person videos. The detected atomic events are then passed through the Gaussian mixtures to construct a highly non-linear visual spatiotemporal basketball assessment feature. Finally, we use this feature to learn a basketball assessment model from pairs of labeled first-person basketball videos, for which a basketball expert indicates which of the two players is better. We demonstrate that despite not knowing the basketball evaluator’s criterion, our model learns to accurately assess the players in real-world games. Furthermore, our model can also discover basketball events that contribute positively and negatively to a player’s performance.
Tasks
Published	2016-11-16
URL	http://arxiv.org/abs/1611.05365v4
PDF	http://arxiv.org/pdf/1611.05365v4.pdf
PWC	https://paperswithcode.com/paper/am-i-a-baller-basketball-performance
Repo
Framework

Proceedings of NIPS 2016 Workshop on Interpretable Machine Learning for Complex Systems


Title	Proceedings of NIPS 2016 Workshop on Interpretable Machine Learning for Complex Systems
Authors	Andrew Gordon Wilson, Been Kim, William Herlands
Abstract	This is the Proceedings of NIPS 2016 Workshop on Interpretable Machine Learning for Complex Systems, held in Barcelona, Spain on December 9, 2016.
Tasks	Interpretable Machine Learning
Published	2016-11-28
URL	http://arxiv.org/abs/1611.09139v1
PDF	http://arxiv.org/pdf/1611.09139v1.pdf
PWC	https://paperswithcode.com/paper/proceedings-of-nips-2016-workshop-on
Repo
Framework

Learning Compact Recurrent Neural Networks


Title	Learning Compact Recurrent Neural Networks
Authors	Zhiyun Lu, Vikas Sindhwani, Tara N. Sainath
Abstract	Recurrent neural networks (RNNs), including long short-term memory (LSTM) RNNs, have produced state-of-the-art results on a variety of speech recognition tasks. However, these models are often too large in size for deployment on mobile devices with memory and latency constraints. In this work, we study mechanisms for learning compact RNNs and LSTMs via low-rank factorizations and parameter sharing schemes. Our goal is to investigate redundancies in recurrent architectures where compression can be admitted without losing performance. A hybrid strategy of using structured matrices in the bottom layers and shared low-rank factors on the top layers is found to be particularly effective, reducing the parameters of a standard LSTM by 75%, at a small cost of 0.3% increase in WER, on a 2,000-hr English Voice Search task.
Tasks	Speech Recognition
Published	2016-04-09
URL	http://arxiv.org/abs/1604.02594v1
PDF	http://arxiv.org/pdf/1604.02594v1.pdf
PWC	https://paperswithcode.com/paper/learning-compact-recurrent-neural-networks
Repo
Framework
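
The low-rank factorization mentioned in the abstract above can be illustrated with simple parameter counting. The sketch below is not the authors' hybrid scheme; it only assumes a single dense m x n weight matrix replaced by the product of an m x r and an r x n factor, and the function names are hypothetical.

```python
def dense_params(m, n):
    """Parameter count of a full m x n weight matrix."""
    return m * n


def low_rank_params(m, n, r):
    """Parameter count when W (m x n) is replaced by U (m x r) times V (r x n)."""
    return r * (m + n)


def max_rank_for_reduction(m, n, reduction):
    """Largest rank r whose factorization removes at least `reduction`
    (a fraction between 0 and 1) of the dense parameters.

    Assumption for illustration: one matrix in isolation, no biases,
    no sharing of factors across layers.
    """
    budget = (1.0 - reduction) * dense_params(m, n)
    return int(budget // (m + n))


if __name__ == "__main__":
    m = n = 1024
    r = max_rank_for_reduction(m, n, 0.75)
    print(r, low_rank_params(m, n, r), dense_params(m, n))
```

For a 1024 x 1024 matrix, a rank of 128 leaves 262,144 of the original 1,048,576 parameters, i.e. a 75% reduction for that one matrix. The 75% figure reported in the abstract refers to the whole LSTM under the authors' hybrid strategy, which this toy count does not reproduce.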

Centrog Feature technique for vehicle type recognition at day and night times


Title	Centrog Feature technique for vehicle type recognition at day and night times
Authors	Martins E. Irhebhude, Philip O. Odion, Darius T. Chinyio
Abstract	This work proposes a feature-based technique to recognize vehicle types during both day and night. A support vector machine (SVM) classifier is applied to image histogram and CENsus Transformed histogRam Oriented Gradient (CENTROG) features in order to classify vehicle types during the day and night. Thermal images were used for the night-time experiments. Although thermal images suffer from low image resolution, lack of colour and poor texture information, they offer the advantage of being unaffected by high-intensity light sources such as vehicle headlights, which tend to render normal images unsuitable for night-time image capturing and subsequent analysis. Since contour is useful in shape-based categorisation and is the most distinctive feature within thermal images, CENTROG is used to capture this feature information in the experiments. The experimental results were compared with those obtained by employing the CENsus TRansformed hISTogram (CENTRIST). Experimental results revealed that CENTROG offers better recognition accuracy for both daytime and night-time vehicle type recognition.
Tasks
Published	2016-12-02
URL	http://arxiv.org/abs/1612.00645v1
PDF	http://arxiv.org/pdf/1612.00645v1.pdf
PWC	https://paperswithcode.com/paper/centrog-feature-technique-for-vehicle-type
Repo
Framework
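
Both CENTRIST and, as its name suggests, CENTROG start from a census transform of the image. A minimal sketch of that first step follows, assuming a 3x3 neighbourhood and the convention that a bit is set when the centre pixel is greater than or equal to its neighbour; the paper's exact convention and the subsequent oriented-gradient stage are not reproduced, and the function names are hypothetical.

```python
def census_transform(img):
    """Return the 8-bit census code of every interior pixel.

    `img` is a list of rows of grey values. Border pixels are skipped,
    so the output is two rows and two columns smaller than the input.
    """
    offsets = [(-1, -1), (-1, 0), (-1, 1),
               (0, -1),           (0, 1),
               (1, -1),  (1, 0),  (1, 1)]
    height, width = len(img), len(img[0])
    codes = []
    for y in range(1, height - 1):
        row = []
        for x in range(1, width - 1):
            centre = img[y][x]
            code = 0
            for dy, dx in offsets:
                # Assumed convention: bit is 1 when centre >= neighbour.
                bit = 1 if centre >= img[y + dy][x + dx] else 0
                code = (code << 1) | bit
            row.append(code)
        codes.append(row)
    return codes


def census_histogram(codes):
    """256-bin histogram of census codes (the CENTRIST-style descriptor)."""
    hist = [0] * 256
    for row in codes:
        for code in row:
            hist[code] += 1
    return hist


if __name__ == "__main__":
    ramp = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(census_transform(ramp))
```

Because each code depends only on the ordering of neighbouring intensities, the transform is insensitive to uniform brightness changes, which is one plausible reason it suits both visible-light and thermal imagery.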

Synthesizing Training Images for Boosting Human 3D Pose Estimation


Title	Synthesizing Training Images for Boosting Human 3D Pose Estimation
Authors	Wenzheng Chen, Huan Wang, Yangyan Li, Hao Su, Zhenhua Wang, Changhe Tu, Dani Lischinski, Daniel Cohen-Or, Baoquan Chen
Abstract	Human 3D pose estimation from a single image is a challenging task with numerous applications. Convolutional Neural Networks (CNNs) have recently achieved superior performance on the task of 2D pose estimation from a single image, by training on images with 2D annotations collected by crowd sourcing. This suggests that similar success could be achieved for direct estimation of 3D poses. However, 3D poses are much harder to annotate, and the lack of suitable annotated training images hinders attempts towards end-to-end solutions. To address this issue, we opt to automatically synthesize training images with ground truth pose annotations. Our work is a systematic study along this road. We find that pose space coverage and texture diversity are the key ingredients for the effectiveness of synthetic training data. We present a fully automatic, scalable approach that samples the human pose space for guiding the synthesis procedure and extracts clothing textures from real images. Furthermore, we explore domain adaptation for bridging the gap between our synthetic training images and real testing photos. We demonstrate that CNNs trained with our synthetic images out-perform those trained with real photos on 3D pose estimation tasks.
Tasks	3D Pose Estimation, Domain Adaptation, Pose Estimation
Published	2016-04-10
URL	http://arxiv.org/abs/1604.02703v6
PDF	http://arxiv.org/pdf/1604.02703v6.pdf
PWC	https://paperswithcode.com/paper/synthesizing-training-images-for-boosting
Repo
Framework

The KB paradigm and its application to interactive configuration


Title	The KB paradigm and its application to interactive configuration
Authors	Pieter Van Hertum, Ingmar Dasseville, Gerda Janssens, Marc Denecker
Abstract	The knowledge base paradigm aims to express domain knowledge in a rich formal language, and to use this domain knowledge as a knowledge base to solve various problems and tasks that arise in the domain by applying multiple forms of inference. As such, the paradigm applies a strict separation of concerns between information and problem solving. In this paper, we analyze the principles and feasibility of the knowledge base paradigm in the context of an important class of applications: interactive configuration problems. In interactive configuration problems, a configuration of interrelated objects under constraints is searched, where the system assists the user in reaching an intended configuration. It is widely recognized in industry that good software solutions for these problems are very difficult to develop. We investigate such problems from the perspective of the KB paradigm. We show that multiple functionalities in this domain can be achieved by applying different forms of logical inferences on a formal specification of the configuration domain. We report on a proof of concept of this approach in a real-life application with a banking company. To appear in Theory and Practice of Logic Programming (TPLP).
Tasks
Published	2016-05-06
URL	http://arxiv.org/abs/1605.01846v1
PDF	http://arxiv.org/pdf/1605.01846v1.pdf
PWC	https://paperswithcode.com/paper/the-kb-paradigm-and-its-application-to
Repo
Framework

An Improved Approach for Prediction of Parkinson’s Disease using Machine Learning Techniques


Title	An Improved Approach for Prediction of Parkinson’s Disease using Machine Learning Techniques
Authors	Kamal Nayan Reddy Challa, Venkata Sasank Pagolu, Ganapati Panda, Babita Majhi
Abstract	Parkinson’s disease (PD) is one of the major public health problems in the world. It is a well-known fact that around one million people suffer from Parkinson’s disease in the United States, whereas the number of people suffering from Parkinson’s disease worldwide is around 5 million. Thus, it is important to predict Parkinson’s disease in its early stages so that an early plan for the necessary treatment can be made. People are mostly familiar with the motor symptoms of Parkinson’s disease; however, an increasing amount of research is being done to predict Parkinson’s disease from non-motor symptoms that precede the motor ones. If an early and reliable prediction is possible, then a patient can get proper treatment at the right time. The non-motor symptoms considered are Rapid Eye Movement (REM) sleep Behaviour Disorder (RBD) and olfactory loss. Developing machine learning models that can help us in predicting the disease can play a vital role in early prediction. In this paper, we extend a work which used non-motor features such as RBD and olfactory loss. Along with this, the extended work also uses important biomarkers. In this paper, we try to model this classifier using different machine learning models that have not been used before. We developed automated diagnostic models using Multilayer Perceptron, BayesNet, Random Forest and Boosted Logistic Regression. It has been observed that Boosted Logistic Regression provides the best performance, with an impressive accuracy of 97.159% and an area under the ROC curve of 98.9%. Thus, it is concluded that these models can be used for early prediction of Parkinson’s disease.
Tasks
Published	2016-10-26
URL	http://arxiv.org/abs/1610.08250v1
PDF	http://arxiv.org/pdf/1610.08250v1.pdf
PWC	https://paperswithcode.com/paper/an-improved-approach-for-prediction-of
Repo
Framework

Fantastic 4 system for NIST 2015 Language Recognition Evaluation


Title	Fantastic 4 system for NIST 2015 Language Recognition Evaluation
Authors	Kong Aik Lee, Ville Hautamäki, Anthony Larcher, Wei Rao, Hanwu Sun, Trung Hieu Nguyen, Guangsen Wang, Aleksandr Sizov, Ivan Kukanov, Amir Poorjam, Trung Ngo Trong, Xiong Xiao, Cheng-Lin Xu, Hai-Hua Xu, Bin Ma, Haizhou Li, Sylvain Meignier
Abstract	This article describes the systems jointly submitted by Institute for Infocomm (I$^2$R), the Laboratoire d’Informatique de l’Université du Maine (LIUM), Nanyang Technology University (NTU) and the University of Eastern Finland (UEF) for 2015 NIST Language Recognition Evaluation (LRE). The submitted system is a fusion of nine sub-systems based on i-vectors extracted from different types of features. Given the i-vectors, several classifiers are adopted for the language detection task including support vector machines (SVM), multi-class logistic regression (MCLR), Probabilistic Linear Discriminant Analysis (PLDA) and Deep Neural Networks (DNN).
Tasks
Published	2016-02-05
URL	http://arxiv.org/abs/1602.01929v1
PDF	http://arxiv.org/pdf/1602.01929v1.pdf
PWC	https://paperswithcode.com/paper/fantastic-4-system-for-nist-2015-language
Repo
Framework

Pandora: Description of a Painting Database for Art Movement Recognition with Baselines and Perspectives


Title	Pandora: Description of a Painting Database for Art Movement Recognition with Baselines and Perspectives
Authors	Corneliu Florea, Razvan Condorovici, Constantin Vertan, Raluca Boia, Laura Florea, Ruxandra Vranceanu
Abstract	To facilitate computer analysis of visual art, in the form of paintings, we introduce Pandora (Paintings Dataset for Recognizing the Art movement) database, a collection of digitized paintings labelled with respect to the artistic movement. Noting that the set of databases available as benchmarks for evaluation is highly reduced and most existing ones are limited in variability and number of images, we propose a novel large scale dataset of digital paintings. The database consists of more than 7700 images from 12 art movements. Each genre is illustrated by a number of images varying from 250 to nearly 1000. We investigate how local and global features and classification systems are able to recognize the art movement. Our experimental results suggest that accurate recognition is achievable by a combination of various categories.
Tasks
Published	2016-02-29
URL	http://arxiv.org/abs/1602.08855v1
PDF	http://arxiv.org/pdf/1602.08855v1.pdf
PWC	https://paperswithcode.com/paper/pandora-description-of-a-painting-database
Repo
Framework

Class-prior Estimation for Learning from Positive and Unlabeled Data


Title	Class-prior Estimation for Learning from Positive and Unlabeled Data
Authors	Marthinus C. du Plessis, Gang Niu, Masashi Sugiyama
Abstract	We consider the problem of estimating the class prior in an unlabeled dataset. Under the assumption that an additional labeled dataset is available, the class prior can be estimated by fitting a mixture of class-wise data distributions to the unlabeled data distribution. However, in practice, such an additional labeled dataset is often not available. In this paper, we show that, with additional samples coming only from the positive class, the class prior of the unlabeled dataset can be estimated correctly. Our key idea is to use properly penalized divergences for model fitting to cancel the error caused by the absence of negative samples. We further show that the use of the penalized $L_1$-distance gives a computationally efficient algorithm with an analytic solution. The consistency, stability, and estimation error are theoretically analyzed. Finally, we experimentally demonstrate the usefulness of the proposed method.
Tasks
Published	2016-11-05
URL	http://arxiv.org/abs/1611.01586v1
PDF	http://arxiv.org/pdf/1611.01586v1.pdf
PWC	https://paperswithcode.com/paper/class-prior-estimation-for-learning-from
Repo
Framework

Active Learning for Speech Recognition: the Power of Gradients


Title	Active Learning for Speech Recognition: the Power of Gradients
Authors	Jiaji Huang, Rewon Child, Vinay Rao, Hairong Liu, Sanjeev Satheesh, Adam Coates
Abstract	In training speech recognition systems, labeling audio clips can be expensive, and not all data is equally valuable. Active learning aims to label only the most informative samples to reduce cost. For speech recognition, confidence scores and other likelihood-based active learning methods have been shown to be effective. Gradient-based active learning methods, however, are still not well-understood. This work investigates the Expected Gradient Length (EGL) approach in active learning for end-to-end speech recognition. We justify EGL from a variance reduction perspective, and observe that EGL’s measure of informativeness picks novel samples uncorrelated with confidence scores. Experimentally, we show that EGL can reduce word errors by 11%, or alternatively, reduce the number of samples to label by 50%, when compared to random sampling.
Tasks	Active Learning, End-To-End Speech Recognition, Speech Recognition
Published	2016-12-10
URL	http://arxiv.org/abs/1612.03226v1
PDF	http://arxiv.org/pdf/1612.03226v1.pdf
PWC	https://paperswithcode.com/paper/active-learning-for-speech-recognition-the
Repo
Framework
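
The Expected Gradient Length criterion named in the abstract above scores an unlabeled sample by the norm of the loss gradient it would induce, averaged over the model's own predicted label distribution. The sketch below is a toy illustration only: it assumes a binary logistic regression model rather than the end-to-end speech recognizer studied in the paper, and all function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def expected_gradient_length(w, x):
    """EGL of sample x under logistic regression with weights w.

    For log-loss, the gradient with respect to w for label y is (p - y) * x,
    so its Euclidean norm is |p - y| * ||x||. EGL averages that norm over
    the two possible labels, weighted by the predicted probabilities.
    """
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    x_norm = math.sqrt(sum(xi * xi for xi in x))
    egl = 0.0
    for label, prob in ((1, p), (0, 1.0 - p)):
        egl += prob * abs(p - label) * x_norm
    return egl


def rank_by_egl(w, pool):
    """Indices of the pool sorted from most to least informative by EGL."""
    return sorted(range(len(pool)),
                  key=lambda i: expected_gradient_length(w, pool[i]),
                  reverse=True)


if __name__ == "__main__":
    weights = [0.0, 0.0]
    pool = [[1.0, 0.0], [3.0, 4.0], [0.0, 2.0]]
    print(rank_by_egl(weights, pool))
```

In this toy model the score reduces to 2p(1 - p)||x||, so it grows both with prediction uncertainty and with input magnitude, which hints at why EGL can select samples that a pure confidence score would not.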

A Tube-and-Droplet-based Approach for Representing and Analyzing Motion Trajectories


Title	A Tube-and-Droplet-based Approach for Representing and Analyzing Motion Trajectories
Authors	Weiyao Lin, Yang Zhou, Hongteng Xu, Junchi Yan, Mingliang Xu, Jianxin Wu, Zicheng Liu
Abstract	Trajectory analysis is essential in many applications. In this paper, we address the problem of representing motion trajectories in a highly informative way, and consequently utilize it for analyzing trajectories. Our approach first leverages the complete information from given trajectories to construct a thermal transfer field which provides a context-rich way to describe the global motion pattern in a scene. Then, a 3D tube is derived which depicts an input trajectory by integrating its surrounding motion patterns contained in the thermal transfer field. The 3D tube effectively: 1) maintains the movement information of a trajectory, 2) embeds the complete contextual motion pattern around a trajectory, 3) visualizes information about a trajectory in a clear and unified way. We further introduce a droplet-based process. It derives a droplet vector from a 3D tube, so as to characterize the high-dimensional 3D tube information in a simple but effective way. Finally, we apply our tube-and-droplet representation to trajectory analysis applications including trajectory clustering, trajectory classification & abnormality detection, and 3D action recognition. Experimental comparisons with state-of-the-art algorithms demonstrate the effectiveness of our approach.
Tasks	3D Human Action Recognition, Anomaly Detection, Temporal Action Localization
Published	2016-09-10
URL	http://arxiv.org/abs/1609.03058v2
PDF	http://arxiv.org/pdf/1609.03058v2.pdf
PWC	https://paperswithcode.com/paper/a-tube-and-droplet-based-approach-for
Repo
Framework

VAST : The Virtual Acoustic Space Traveler Dataset


Title	VAST : The Virtual Acoustic Space Traveler Dataset
Authors	Clément Gaultier, Saurabh Kataria, Antoine Deleforge
Abstract	This paper introduces a new paradigm for sound source localization referred to as virtual acoustic space traveling (VAST) and presents a first dataset designed for this purpose. Existing sound source localization methods are either based on an approximate physical model (physics-driven) or on a specific-purpose calibration set (data-driven). With VAST, the idea is to learn a mapping from audio features to desired audio properties using a massive dataset of simulated room impulse responses. This virtual dataset is designed to be maximally representative of the potential audio scenes that the considered system may be evolving in, while remaining reasonably compact. We show that virtually-learned mappings on this dataset generalize to real data, overcoming some intrinsic limitations of traditional binaural sound localization methods based on time differences of arrival.
Tasks	Calibration
Published	2016-12-14
URL	http://arxiv.org/abs/1612.06287v1
PDF	http://arxiv.org/pdf/1612.06287v1.pdf
PWC	https://paperswithcode.com/paper/vast-the-virtual-acoustic-space-traveler
Repo
Framework

Selecting Bases in Spectral learning of Predictive State Representations via Model Entropy


Title	Selecting Bases in Spectral learning of Predictive State Representations via Model Entropy
Authors	Yunlong Liu, Hexing Zhu
Abstract	Predictive State Representations (PSRs) are powerful techniques for modelling dynamical systems, which represent a state as a vector of predictions about future observable events (tests). In PSRs, one of the fundamental problems is the learning of the PSR model of the underlying system. Recently, spectral methods have been successfully used to address this issue by treating the learning problem as the task of computing a singular value decomposition (SVD) over a submatrix of a special type of matrix called the Hankel matrix. Under the assumptions that the rows and columns of the submatrix of the Hankel matrix are sufficient (which usually means a very large number of rows and columns, and almost fails in practice) and the entries of the matrix can be estimated accurately, it has been proven that the spectral approach for learning PSRs is statistically consistent and the learned parameters can converge to the true parameters. However, in practice, due to limited computational capacity, only a finite set of rows or columns can be chosen for the spectral learning. Since different sets of columns usually lead to varying accuracy of the learned model, in this paper we propose an approach for selecting the set of columns, namely basis selection, by adopting a concept of model entropy to measure the accuracy of the learned model. Experimental results are shown to demonstrate the effectiveness of the proposed approach.
Tasks
Published	2016-12-29
URL	http://arxiv.org/abs/1612.09076v1
PDF	http://arxiv.org/pdf/1612.09076v1.pdf
PWC	https://paperswithcode.com/paper/selecting-bases-in-spectral-learning-of
Repo
Framework
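
The Hankel submatrix referred to in the abstract above can be estimated from data by counting. The sketch below assumes an uncontrolled system (observations only, no actions), rows indexed by a chosen set of histories and columns by a chosen set of tests, with each entry the empirical probability that a sequence begins with the history followed by the test. The paper's exact matrix layout and its entropy-based column selection are not reproduced, and the function name is hypothetical.

```python
def hankel_block(sequences, histories, tests):
    """Empirical Hankel submatrix.

    Entry [i][j] is the fraction of `sequences` that start with
    histories[i] immediately followed by tests[j].
    """
    total = len(sequences)
    block = []
    for history in histories:
        row = []
        for test in tests:
            target = tuple(history) + tuple(test)
            hits = sum(1 for seq in sequences
                       if tuple(seq[:len(target)]) == target)
            row.append(hits / total)
        block.append(row)
    return block


if __name__ == "__main__":
    data = ["ab", "ab", "ba", "aa"]
    basis = [("a",), ("b",)]
    for row in hankel_block(data, histories=basis, tests=basis):
        print(row)
```

Choosing a different set of tests changes which columns of the matrix are estimated, which is the degree of freedom that basis selection addresses; a spectral learner would then apply an SVD to a block like this one.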