Paper Group ANR 253
Investigation of Synthetic Speech Detection Using Frame- and Segment-Specific Importance Weighting
Title | Investigation of Synthetic Speech Detection Using Frame- and Segment-Specific Importance Weighting |
Authors | Ali Khodabakhsh, Cenk Demiroglu |
Abstract | Speaker verification systems are vulnerable to spoofing attacks, which present a major problem for their real-life deployment. To date, most of the proposed synthetic speech detectors (SSDs) have weighted the importance of different segments of speech equally. However, different attack methods have different strengths and weaknesses, and the traces that they leave may be short- or long-term acoustic artifacts. Moreover, these may occur only for particular phonemes or sounds. Here, we propose three algorithms that weight the likelihood-ratio scores of individual frames, phonemes, and sound classes depending on their importance for the SSD. Significant improvement over the baseline system has been obtained for known attack methods that were used in training the SSDs. However, improvement with unknown attack types was not substantial. Thus, the types of distortion caused by the unknown systems were different and could not be captured better by the proposed SSD than by the baseline SSD. |
Tasks | Speaker Verification |
Published | 2016-10-10 |
URL | http://arxiv.org/abs/1610.03009v1 |
http://arxiv.org/pdf/1610.03009v1.pdf | |
PWC | https://paperswithcode.com/paper/investigation-of-synthetic-speech-detection |
Repo | |
Framework | |
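The core idea of the paper, weighting per-frame detector scores by their importance, can be sketched as a weighted fusion of log-likelihood-ratio scores. The scores and weights below are hypothetical; the paper's actual weighting algorithms (frame-, phoneme-, and sound-class-specific) are learned, not hand-set:

```python
def weighted_llr_score(frame_llrs, frame_weights):
    """Combine per-frame log-likelihood-ratio scores into one utterance-level
    score, weighting each frame by its importance for spoofing detection."""
    total_w = sum(frame_weights)
    return sum(w * s for w, s in zip(frame_weights, frame_llrs)) / total_w

# Hypothetical example: frames carrying strong synthesis artifacts get
# higher weights than uninformative frames.
frame_llrs = [0.2, 1.5, -0.1, 2.0]     # per-frame detector scores
frame_weights = [0.5, 2.0, 0.5, 2.0]   # per-frame importance weights
utterance_score = weighted_llr_score(frame_llrs, frame_weights)
```

With equal weights this reduces to the conventional unweighted average that the abstract describes as the baseline.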
Hyperspectral CNN Classification with Limited Training Samples
Title | Hyperspectral CNN Classification with Limited Training Samples |
Authors | Lloyd Windrim, Rishi Ramakrishnan, Arman Melkumyan, Richard Murphy |
Abstract | Hyperspectral imaging sensors are becoming increasingly popular in robotics applications such as agriculture and mining, and allow per-pixel thematic classification of materials in a scene based on their unique spectral signatures. Recently, convolutional neural networks have shown remarkable performance for classification tasks, but require substantial amounts of labelled training data. This data must sufficiently cover the variability expected to be encountered in the environment. For hyperspectral data, one of the main variations encountered outdoors is due to incident illumination, which can change in spectral shape and intensity depending on the scene geometry. For example, regions occluded from the sun have a lower intensity and their incident irradiance is skewed towards shorter wavelengths. In this work, a data augmentation strategy based on relighting is used during training of a hyperspectral convolutional neural network. It allows training to occur in the outdoor environment given only a small labelled region, which does not need to sufficiently represent the geometric variability of the entire scene. This is important for applications where obtaining large amounts of training data is laborious, hazardous or difficult, such as labelling pixels within shadows. Radiometric normalisation approaches for pre-processing the hyperspectral data are analysed, and it is shown that methods based on the raw pixel data are sufficient to be used as input for the classifier. This removes the need for external hardware such as calibration boards, which can restrict the application of hyperspectral sensors in robotics applications. Experiments to evaluate the classification system are carried out on two datasets captured from a field-based platform. |
Tasks | Calibration, Data Augmentation |
Published | 2016-11-28 |
URL | http://arxiv.org/abs/1611.09007v1 |
http://arxiv.org/pdf/1611.09007v1.pdf | |
PWC | https://paperswithcode.com/paper/hyperspectral-cnn-classification-with-limited |
Repo | |
Framework | |
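A relighting-style augmentation, as described in the abstract, can be sketched by rescaling a pixel's spectrum and tilting it towards shorter wavelengths, mimicking a shadowed region lit by skylight. The linear tilt model and all numbers here are illustrative assumptions, not the paper's actual relighting procedure:

```python
def relight(spectrum, intensity, blue_shift):
    """Simulate a change in incident illumination on one hyperspectral pixel:
    scale the overall intensity and apply a linear tilt across bands so that
    shorter wavelengths (lower band indices) are weighted more."""
    n = len(spectrum)
    # tilt factor is (1 + blue_shift) at band 0 and (1 - blue_shift) at band n-1
    return [v * intensity * (1.0 + blue_shift * (1.0 - 2.0 * i / (n - 1)))
            for i, v in enumerate(spectrum)]

pixel = [0.4, 0.5, 0.6, 0.7]   # hypothetical 4-band reflectance spectrum
relit = relight(pixel, intensity=0.5, blue_shift=0.2)
```

Sampling `intensity` and `blue_shift` at random for each training pixel would expose the classifier to illumination conditions absent from the small labelled region.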
Recoverability of Joint Distribution from Missing Data
Title | Recoverability of Joint Distribution from Missing Data |
Authors | Jin Tian |
Abstract | A probabilistic query may not be estimable from observed data corrupted by missing values if the data are not missing at random (MAR). It is therefore of theoretical interest and practical importance to determine in principle whether a probabilistic query is estimable from missing data or not when the data are not MAR. We present an algorithm that systematically determines whether the joint probability is estimable from observed data with missing values, assuming that the data-generation model is represented as a Bayesian network containing unobserved latent variables that not only encodes the dependencies among the variables but also explicitly portrays the mechanisms responsible for the missingness process. The result significantly advances the existing work. |
Tasks | |
Published | 2016-11-15 |
URL | http://arxiv.org/abs/1611.04709v1 |
http://arxiv.org/pdf/1611.04709v1.pdf | |
PWC | https://paperswithcode.com/paper/recoverability-of-joint-distribution-from |
Repo | |
Framework | |
Interactive Illumination Invariance
Title | Interactive Illumination Invariance |
Authors | Han Gong, Graham Finlayson |
Abstract | Illumination effects cause problems for many computer vision algorithms. We present a user-friendly interactive system for robust illumination-invariant image generation. Compared with previous automated approaches to illumination-invariant image derivation, our system enables users to specify a particular kind of illumination variation for removal. The derivation of the illumination-invariant image is guided by the user input: a stroke that defines an area covering a set of pixels whose intensities are influenced predominantly by the illumination variation. This additional flexibility enhances robustness when processing non-linearly rendered images and images of scenes whose illumination variations are difficult to estimate automatically. Finally, we present some evaluation results of our method. |
Tasks | Image Generation |
Published | 2016-07-20 |
URL | http://arxiv.org/abs/1607.05967v1 |
http://arxiv.org/pdf/1607.05967v1.pdf | |
PWC | https://paperswithcode.com/paper/interactive-illumination-invariance |
Repo | |
Framework | |
An Attentional Neural Conversation Model with Improved Specificity
Title | An Attentional Neural Conversation Model with Improved Specificity |
Authors | Kaisheng Yao, Baolin Peng, Geoffrey Zweig, Kam-Fai Wong |
Abstract | In this paper we propose a neural conversation model for conducting dialogues. We demonstrate the use of this model to generate help-desk responses, where users are asking questions about PC applications. Our model is distinguished by two characteristics. First, it models intention across turns with a recurrent network, and incorporates an attention model that is conditioned on the representation of intention. Second, it avoids generating non-specific responses by incorporating an IDF term in the objective function. The model is evaluated both as a pure generation model, in which a help-desk response is generated from scratch, and as a retrieval model, with performance measured using recall rates of the correct response. Experimental results indicate that the model outperforms previously proposed neural conversation architectures, and that using specificity in the objective function significantly improves performance for both generation and retrieval. |
Tasks | |
Published | 2016-06-03 |
URL | http://arxiv.org/abs/1606.01292v1 |
http://arxiv.org/pdf/1606.01292v1.pdf | |
PWC | https://paperswithcode.com/paper/an-attentional-neural-conversation-model-with |
Repo | |
Framework | |
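The specificity idea, an IDF term in the objective, can be sketched as weighting each token's loss by its inverse document frequency, so rare (specific) words matter more than generic ones. The corpus counts and the exact weighting form below are assumptions for illustration, not the paper's objective:

```python
import math

def idf(token, doc_freq, n_docs):
    """Inverse document frequency: rare, specific tokens score high."""
    return math.log(n_docs / (1.0 + doc_freq.get(token, 0)))

def idf_weighted_loss(token_nlls, tokens, doc_freq, n_docs):
    """Weight each token's negative log-likelihood by its IDF, so the
    objective penalises getting specific words wrong more than generic ones."""
    return sum(nll * idf(t, doc_freq, n_docs)
               for nll, t in zip(token_nlls, tokens))

doc_freq = {"the": 90, "reboot": 4}   # hypothetical document frequencies
n_docs = 100
```

Under this weighting, a bland filler like "the" contributes almost nothing to the loss, while a content word like "reboot" dominates, which is the mechanism the abstract credits for avoiding non-specific responses.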
Forecasting Interactive Dynamics of Pedestrians with Fictitious Play
Title | Forecasting Interactive Dynamics of Pedestrians with Fictitious Play |
Authors | Wei-Chiu Ma, De-An Huang, Namhoon Lee, Kris M. Kitani |
Abstract | We develop predictive models of pedestrian dynamics by encoding the coupled nature of multi-pedestrian interaction using game theory, and by using deep learning-based visual analysis to estimate person-specific behavior parameters. Building predictive models for multi-pedestrian interactions, however, is very challenging for two reasons: (1) the dynamics of interaction are complex interdependent processes, where the predicted behavior of one pedestrian can affect the actions taken by others, and (2) dynamics vary depending on an individual's physical characteristics (e.g., an older person may walk slowly while a younger person may walk faster). To address these challenges, we (1) utilize concepts from game theory to model the interdependent decision-making process of multiple pedestrians and (2) use visual classifiers to learn a mapping from pedestrian appearance to behavior parameters. We evaluate our proposed model on several public multiple-pedestrian-interaction video datasets. Results show that our strategic planning model explains human interactions 25% better than state-of-the-art methods. |
Tasks | Decision Making |
Published | 2016-04-05 |
URL | http://arxiv.org/abs/1604.01431v3 |
http://arxiv.org/pdf/1604.01431v3.pdf | |
PWC | https://paperswithcode.com/paper/forecasting-interactive-dynamics-of |
Repo | |
Framework | |
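The fictitious-play mechanism named in the title can be sketched in its textbook form: each player repeatedly best-responds to the empirical frequency of the other's past actions. The two-action "pass left / pass right" coordination game below is a toy stand-in; the paper's pedestrian model is far richer:

```python
def best_response(payoff, opp_counts):
    """Best response to the empirical distribution of the opponent's past play."""
    total = sum(opp_counts)
    n = len(payoff)
    expected = [sum(payoff[a][b] * opp_counts[b] for b in range(len(opp_counts))) / total
                for a in range(n)]
    return max(range(n), key=expected.__getitem__)

def fictitious_play(payoff_a, payoff_b, steps):
    """Run fictitious play: both players best-respond to observed frequencies."""
    counts_a, counts_b = [1, 1], [1, 1]   # uniform prior over the two actions
    for _ in range(steps):
        a = best_response(payoff_a, counts_b)
        b = best_response(payoff_b, counts_a)
        counts_a[a] += 1
        counts_b[b] += 1
    return counts_a, counts_b

# Toy coordination game: two pedestrians each choose "pass left" (0) or
# "pass right" (1); they avoid collision only by choosing the same side.
coord = [[1, 0], [0, 1]]
counts_a, counts_b = fictitious_play(coord, coord, steps=20)
```

Play quickly locks onto one coordinated convention, which is the kind of interdependent forecast the abstract describes.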
Tweet Acts: A Speech Act Classifier for Twitter
Title | Tweet Acts: A Speech Act Classifier for Twitter |
Authors | Soroush Vosoughi, Deb Roy |
Abstract | Speech acts are a way to conceptualize speech as action. This holds true for communication on any platform, including social media platforms such as Twitter. In this paper, we explored speech act recognition on Twitter by treating it as a multi-class classification problem. We created a taxonomy of six speech acts for Twitter and proposed a set of semantic and syntactic features. We trained and tested a logistic regression classifier using a data set of manually labelled tweets. Our method achieved state-of-the-art performance with an average F1 score of more than $0.70$. We also explored classifiers with three different granularities (Twitter-wide, type-specific and topic-specific) in order to find the right balance between generalization and overfitting for our task. |
Tasks | |
Published | 2016-05-17 |
URL | http://arxiv.org/abs/1605.05156v1 |
http://arxiv.org/pdf/1605.05156v1.pdf | |
PWC | https://paperswithcode.com/paper/tweet-acts-a-speech-act-classifier-for |
Repo | |
Framework | |
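The syntactic features feeding such a classifier can be sketched as simple surface cues. The specific feature set below is a plausible guess, not the paper's actual feature list:

```python
def tweet_features(tweet):
    """Extract a few syntactic cues plausibly predictive of speech acts
    (question, request, statement, ...). Illustrative, not the paper's set."""
    words = tweet.split()
    return {
        "has_question_mark": "?" in tweet,
        "has_exclamation": "!" in tweet,
        "starts_with_wh": bool(words) and words[0].lower() in
            {"what", "who", "where", "when", "why", "how"},
        "n_hashtags": sum(w.startswith("#") for w in words),
        "n_mentions": sum(w.startswith("@") for w in words),
    }

feats = tweet_features("Why is my wifi down again?! #fail @ISP")
```

Vectors like this, combined with the paper's semantic features, would be the input to the logistic regression classifier.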
On the boosting ability of top-down decision tree learning algorithm for multiclass classification
Title | On the boosting ability of top-down decision tree learning algorithm for multiclass classification |
Authors | Anna Choromanska, Krzysztof Choromanski, Mariusz Bojarski |
Abstract | We analyze the performance of LOMtree, a top-down multiclass decision tree learning algorithm recently proposed by Choromanska and Langford (2014) for efficiently solving classification problems with a very large number of classes. The algorithm optimizes, in an online fashion, an objective function that simultaneously controls the depth of the tree and its statistical accuracy. We prove important properties of this objective and explore its connection to three well-known entropy-based decision tree objectives, i.e. Shannon entropy, Gini entropy and its modified version, for which online optimization schemes had not yet been developed. We show, via boosting-type guarantees, that maximizing the considered objective also reduces all of these entropy-based objectives. The bounds we obtain depend critically on the strong-concavity properties of the entropy-based criteria, where the mildest dependence on the number of classes (only logarithmic) corresponds to the Shannon entropy. |
Tasks | |
Published | 2016-05-17 |
URL | http://arxiv.org/abs/1605.05223v1 |
http://arxiv.org/pdf/1605.05223v1.pdf | |
PWC | https://paperswithcode.com/paper/on-the-boosting-ability-of-top-down-decision |
Repo | |
Framework | |
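Two of the entropy-based criteria the abstract compares are standard and easy to state concretely. A minimal sketch (using natural log; the paper's analysis also covers a modified Gini criterion not shown here):

```python
import math

def shannon_entropy(p):
    """Shannon entropy of a class distribution (natural log)."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def gini_entropy(p):
    """Gini impurity of a class distribution."""
    return 1.0 - sum(pi * pi for pi in p)

uniform = [0.25, 0.25, 0.25, 0.25]   # maximally impure node over 4 classes
pure = [1.0, 0.0, 0.0, 0.0]          # pure node
```

Both criteria are maximized at the uniform distribution and vanish on pure nodes; the paper's boosting bounds quantify how fast a split objective drives them down, with Shannon entropy giving the mildest (logarithmic) dependence on the number of classes.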
Weekly maintenance scheduling using exact and genetic methods
Title | Weekly maintenance scheduling using exact and genetic methods |
Authors | Andrew W. Palmer, Robin Vujanic, Andrew J. Hill, Steven J. Scheding |
Abstract | The weekly maintenance schedule specifies when maintenance activities should be performed on the equipment, taking into account the availability of workers and maintenance bays, and other operational constraints. The current approach to generating this schedule is labour intensive and requires coordination between the maintenance schedulers and operations staff to minimise its impact on the operation of the mine. This paper presents methods for automatically generating this schedule from the list of maintenance tasks to be performed, the availability roster of the maintenance staff, and time windows in which each piece of equipment is available for maintenance. Both Mixed-Integer Linear Programming (MILP) and genetic algorithms are evaluated, with the genetic algorithm shown to significantly outperform the MILP. Two fitness functions for the genetic algorithm are also examined, with a linear fitness function outperforming an inverse fitness function by up to 5% for the same calculation time. The genetic algorithm approach is computationally fast, allowing the schedule to be rapidly recalculated in response to unexpected delays and breakdowns. |
Tasks | |
Published | 2016-10-17 |
URL | http://arxiv.org/abs/1610.05016v1 |
http://arxiv.org/pdf/1610.05016v1.pdf | |
PWC | https://paperswithcode.com/paper/weekly-maintenance-scheduling-using-exact-and |
Repo | |
Framework | |
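The linear-versus-inverse fitness comparison in the abstract can be sketched via roulette-wheel selection probabilities. The cost values and the exact fitness formulas (including the 50.0 offset) are illustrative assumptions, not the paper's definitions:

```python
def selection_probs(costs, fitness):
    """Roulette-wheel selection probabilities for candidate schedules."""
    fits = [fitness(c) for c in costs]
    total = sum(fits)
    return [f / total for f in fits]

# Hypothetical schedule costs (lower is better).
costs = [10.0, 20.0, 40.0]
linear = selection_probs(costs, lambda c: max(50.0 - c, 0.0))   # linear fitness
inverse = selection_probs(costs, lambda c: 1.0 / c)             # inverse fitness
```

Both schemes favour cheaper schedules, but they distribute selection pressure differently across the population, which is the kind of difference behind the reported up-to-5% gap between the two fitness functions.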
Cardea: Context-Aware Visual Privacy Protection from Pervasive Cameras
Title | Cardea: Context-Aware Visual Privacy Protection from Pervasive Cameras |
Authors | Jiayu Shu, Rui Zheng, Pan Hui |
Abstract | The growing popularity of mobile and wearable devices with built-in cameras, the bright prospect of camera-related applications such as augmented reality and life-logging systems, the increased ease of taking and sharing photos, and advances in computer vision techniques have greatly facilitated people’s lives in many aspects, but have also inevitably raised people’s concerns about visual privacy. Motivated by recent user studies showing that people’s privacy concerns are dependent on context, in this paper we propose Cardea, a context-aware and interactive visual privacy protection framework that enforces privacy protection according to people’s privacy preferences. The framework provides people with fine-grained visual privacy protection using: i) personal privacy profiles, with which people can define their context-dependent privacy preferences; ii) visual indicators (face features), with which devices can automatically locate individuals who request privacy protection; and iii) hand gestures, with which people can flexibly interact with cameras to temporarily change their privacy preferences. We design and implement the framework, consisting of a client app on Android devices and a cloud server. Our evaluation results confirm that the framework is practical and effective, with 86% overall accuracy, showing a promising future for context-aware visual privacy protection from pervasive cameras. |
Tasks | |
Published | 2016-10-04 |
URL | http://arxiv.org/abs/1610.00889v1 |
http://arxiv.org/pdf/1610.00889v1.pdf | |
PWC | https://paperswithcode.com/paper/cardea-context-aware-visual-privacy |
Repo | |
Framework | |
Automation of Pedestrian Tracking in a Crowded Situation
Title | Automation of Pedestrian Tracking in a Crowded Situation |
Authors | Saman Saadat, Kardi Teknomo |
Abstract | Studies of microscopic pedestrian behaviour require large amounts of trajectory data from real-world pedestrian crowds. Such data collection, if done manually, needs tremendous effort and is very time-consuming. Though many studies have asserted the possibility of automating this task using video cameras, we found that only a few have demonstrated good performance in very crowded situations or in top-angled view scenes. This paper deals with tracking a pedestrian crowd under heavy occlusion in an angled-view scene. Our automated tracking system consists of two modules that run sequentially. The first module detects moving objects as blobs. The second module is a tracking system: we take the probability distribution from the detection of each pedestrian and use a Bayesian update to track the next position. The result of such tracking is a database of pedestrian trajectories over time and space. With certain prior information, we show that the system can track a large number of people under occlusion in cluttered scenes. |
Tasks | |
Published | 2016-09-06 |
URL | http://arxiv.org/abs/1609.01710v1 |
http://arxiv.org/pdf/1609.01710v1.pdf | |
PWC | https://paperswithcode.com/paper/automation-of-pedestrian-tracking-in-a |
Repo | |
Framework | |
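The Bayesian update step in the tracking module can be sketched over a discrete set of candidate next positions. The prior, likelihood values, and one-dimensional position grid below are hypothetical; the paper works with blob detections in image space:

```python
def bayes_update(prior, likelihood):
    """One Bayesian update over candidate positions: multiply the prior by
    the observation likelihood and renormalise to a distribution."""
    post = [p * l for p, l in zip(prior, likelihood)]
    z = sum(post)
    return [p / z for p in post]

prior = [0.25, 0.25, 0.25, 0.25]     # belief over 4 candidate next positions
likelihood = [0.1, 0.6, 0.2, 0.1]    # hypothetical blob-detection evidence
posterior = bayes_update(prior, likelihood)
```

Repeating this update frame by frame, with the posterior propagated through a motion model to become the next prior, yields the trajectory database the abstract describes.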
Visualizing and Understanding Curriculum Learning for Long Short-Term Memory Networks
Title | Visualizing and Understanding Curriculum Learning for Long Short-Term Memory Networks |
Authors | Volkan Cirik, Eduard Hovy, Louis-Philippe Morency |
Abstract | Curriculum learning emphasizes the order of training instances in a computational learning setup. The core hypothesis is that simpler instances should be learned early, as building blocks for learning more complex ones. Despite its usefulness, it is still unknown how exactly the internal representations of models are affected by curriculum learning. In this paper, we study the effect of curriculum learning on Long Short-Term Memory (LSTM) networks, which have shown strong competency in many Natural Language Processing (NLP) problems. Our experiments on a sentiment analysis task and a synthetic task similar to sequence prediction tasks in NLP show that curriculum learning has a positive effect on the LSTM’s internal states by biasing the model towards building constructive representations, i.e., the internal representations at previous timesteps are used as building blocks for the final prediction. We also find that smaller models improve significantly when they are trained with curriculum learning. Lastly, we show that curriculum learning helps more when the amount of training data is limited. |
Tasks | Sentiment Analysis |
Published | 2016-11-18 |
URL | http://arxiv.org/abs/1611.06204v1 |
http://arxiv.org/pdf/1611.06204v1.pdf | |
PWC | https://paperswithcode.com/paper/visualizing-and-understanding-curriculum |
Repo | |
Framework | |
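The basic curriculum-learning setup, presenting easy instances before hard ones, can be sketched with a simple ordering step. Using length as the difficulty measure is an illustrative assumption (a common proxy for sequence tasks), not necessarily the paper's criterion:

```python
def curriculum_order(instances, difficulty):
    """Order training instances from easy to hard before feeding them to the
    model; the difficulty function defines the curriculum."""
    return sorted(instances, key=difficulty)

sentences = [
    "a much longer and harder example sentence",
    "short one",
    "medium length example",
]
ordered = curriculum_order(sentences, difficulty=len)
```

A trainer would then draw early batches from the front of this ordering and gradually admit harder instances.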
A 3D Face Modelling Approach for Pose-Invariant Face Recognition in a Human-Robot Environment
Title | A 3D Face Modelling Approach for Pose-Invariant Face Recognition in a Human-Robot Environment |
Authors | Michael Grupp, Philipp Kopp, Patrik Huber, Matthias Rätsch |
Abstract | Face analysis techniques have become a crucial component of human-machine interaction in the fields of assistive and humanoid robotics. However, the variations in head pose that arise naturally in these environments are still a great challenge. In this paper, we present a real-time capable 3D face modelling framework for 2D in-the-wild images that is applicable to robotics. The fitting of the 3D Morphable Model is based exclusively on automatically detected landmarks. After fitting, the face can be corrected in pose and transformed back to a frontal 2D representation that is more suitable for face recognition. We conduct face recognition experiments with non-frontal images from the MUCT database and uncontrolled, in-the-wild images from the PaSC database, the most challenging face recognition database to date, showing improved performance. Finally, we present our SCITOS G5 robot system, which incorporates our framework as a means of image pre-processing for face analysis. |
Tasks | Face Recognition, Robust Face Recognition |
Published | 2016-06-01 |
URL | http://arxiv.org/abs/1606.00474v1 |
http://arxiv.org/pdf/1606.00474v1.pdf | |
PWC | https://paperswithcode.com/paper/a-3d-face-modelling-approach-for-pose |
Repo | |
Framework | |
Interpreting the Syntactic and Social Elements of the Tweet Representations via Elementary Property Prediction Tasks
Title | Interpreting the Syntactic and Social Elements of the Tweet Representations via Elementary Property Prediction Tasks |
Authors | J Ganesh, Manish Gupta, Vasudeva Varma |
Abstract | Research in social media analysis is experiencing a recent surge, with a large number of works applying representation learning models to solve high-level syntactico-semantic tasks such as sentiment analysis, semantic textual similarity computation, hashtag prediction and so on. Although the performance of representation learning models on these tasks is better than that of traditional baselines, little is known about the core properties of a tweet encoded within the representations. Understanding these core properties would let us draw generalizable conclusions about the quality of the representations. Our work presented here constitutes the first step in opening the black box of vector embeddings for social media posts, with emphasis on tweets in particular. To understand the core properties encoded in a tweet representation, we evaluate the representations to estimate the extent to which they can model each of those properties, such as tweet length, presence of words, hashtags, mentions, capitalization, and so on. This is done with the help of multiple classifiers which take the representation as input. Essentially, each classifier evaluates one of the syntactic or social properties which are arguably salient for a tweet. This is also the first holistic study to extensively analyse the ability to encode these properties for a wide variety of tweet representation models, including traditional unsupervised methods (BOW, LDA), unsupervised representation learning methods (Siamese CBOW, Tweet2Vec) and supervised methods (CNN, BLSTM). |
Tasks | Representation Learning, Semantic Textual Similarity, Sentiment Analysis, Unsupervised Representation Learning |
Published | 2016-11-15 |
URL | http://arxiv.org/abs/1611.04887v1 |
http://arxiv.org/pdf/1611.04887v1.pdf | |
PWC | https://paperswithcode.com/paper/interpreting-the-syntactic-and-social |
Repo | |
Framework | |
Quick and energy-efficient Bayesian computing of binocular disparity using stochastic digital signals
Title | Quick and energy-efficient Bayesian computing of binocular disparity using stochastic digital signals |
Authors | Alexandre Coninx, Pierre Bessière, Jacques Droulez |
Abstract | Reconstruction of the three-dimensional geometry of a visual scene using binocular disparity information is an important issue in computer vision and mobile robotics, and can be formulated as a Bayesian inference problem. However, computation of the full disparity distribution with an advanced Bayesian model is usually intractable, and proves computationally challenging even with a simple model. In this paper, we show how probabilistic hardware using distributed memory and an alternate representation of data as stochastic bitstreams can solve that problem with high performance and energy efficiency. We put forward a way to express discrete probability distributions using stochastic data representations and to perform Bayesian fusion using those representations, and show how that approach can be applied to disparity computation. We evaluate the system using a simulated stochastic implementation and discuss possible hardware implementations of such architectures and their potential for sensorimotor processing and robotics. |
Tasks | Bayesian Inference |
Published | 2016-09-14 |
URL | http://arxiv.org/abs/1609.04337v2 |
http://arxiv.org/pdf/1609.04337v2.pdf | |
PWC | https://paperswithcode.com/paper/quick-and-energy-efficient-bayesian-computing |
Repo | |
Framework | |
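The stochastic-signal idea in the abstract can be sketched in software: a probability is encoded as a Bernoulli bitstream, and multiplying two probabilities (the core of Bayesian fusion, before renormalisation) reduces to a bitwise AND of their streams. The stream length and probabilities below are arbitrary; real stochastic hardware would use dedicated random bit generators rather than a software PRNG:

```python
import random

def bitstream(p, n, rng):
    """Encode a probability p as a stochastic bitstream of n Bernoulli bits."""
    return [1 if rng.random() < p else 0 for _ in range(n)]

def fuse(a, b):
    """Multiply the encoded probabilities of two independent streams:
    the product corresponds to the bitwise AND of the bitstreams."""
    return [x & y for x, y in zip(a, b)]

def decode(bits):
    """Estimate the encoded probability as the fraction of 1-bits."""
    return sum(bits) / len(bits)

rng = random.Random(42)
a = bitstream(0.8, 100_000, rng)
b = bitstream(0.5, 100_000, rng)
est = decode(fuse(a, b))   # estimates 0.8 * 0.5 = 0.4, up to sampling noise
```

The appeal for hardware is that this multiplication needs only a single AND gate per bit, at the cost of precision that grows only with stream length.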