Paper Group ANR 253
Investigation of Synthetic Speech Detection Using Frame- and Segment-Specific Importance Weighting
Title | Investigation of Synthetic Speech Detection Using Frame- and Segment-Specific Importance Weighting |
Authors | Ali Khodabakhsh, Cenk Demiroglu |
Abstract | Speaker verification systems are vulnerable to spoofing attacks, which present a major problem for their real-life deployment. To date, most of the proposed synthetic speech detectors (SSDs) have weighted the importance of different segments of speech equally. However, different attack methods have different strengths and weaknesses, and the traces that they leave may be short- or long-term acoustic artifacts. Moreover, these may occur only for particular phonemes or sounds. Here, we propose three algorithms that weight the likelihood-ratio scores of individual frames, phonemes, and sound classes depending on their importance for the SSD. Significant improvement over the baseline system has been obtained for known attack methods that were used in training the SSDs. However, improvement with unknown attack types was not substantial. Thus, the types of distortion caused by the unknown systems were different and could not be captured better by the proposed SSD than by the baseline SSD. |
Tasks | Speaker Verification |
Published | 2016-10-10 |
URL | http://arxiv.org/abs/1610.03009v1 |
http://arxiv.org/pdf/1610.03009v1.pdf | |
PWC | https://paperswithcode.com/paper/investigation-of-synthetic-speech-detection |
Repo | |
Framework | |
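The core idea of the paper, weighting per-frame detector scores by their importance, can be sketched as a weighted fusion of log-likelihood-ratio scores. The scores and weights below are hypothetical; the paper's actual weighting algorithms (frame-, phoneme-, and sound-class-specific) are learned, not hand-set:

```python
def weighted_llr_score(frame_llrs, frame_weights):
    """Combine per-frame log-likelihood-ratio scores into one utterance-level
    score, weighting each frame by its importance for spoofing detection."""
    total_w = sum(frame_weights)
    return sum(w * s for w, s in zip(frame_weights, frame_llrs)) / total_w

# Hypothetical example: frames carrying strong synthesis artifacts get
# higher weights than uninformative frames.
frame_llrs = [0.2, 1.5, -0.1, 2.0]     # per-frame detector scores
frame_weights = [0.5, 2.0, 0.5, 2.0]   # per-frame importance weights
utterance_score = weighted_llr_score(frame_llrs, frame_weights)
```

With equal weights this reduces to the conventional unweighted average that the abstract describes as the baseline.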
Hyperspectral CNN Classification with Limited Training Samples
Title | Hyperspectral CNN Classification with Limited Training Samples |
Authors | Lloyd Windrim, Rishi Ramakrishnan, Arman Melkumyan, Richard Murphy |
Abstract | Hyperspectral imaging sensors are becoming increasingly popular in robotics applications such as agriculture and mining, and allow per-pixel thematic classification of materials in a scene based on their unique spectral signatures. Recently, convolutional neural networks have shown remarkable performance for classification tasks, but require substantial amounts of labelled training data. This data must sufficiently cover the variability expected to be encountered in the environment. For hyperspectral data, one of the main variations encountered outdoors is due to incident illumination, which can change in spectral shape and intensity depending on the scene geometry. For example, regions occluded from the sun have a lower intensity and their incident irradiance is skewed towards shorter wavelengths. In this work, a data augmentation strategy based on relighting is used during training of a hyperspectral convolutional neural network. It allows training to occur in the outdoor environment given only a small labelled region, which does not need to sufficiently represent the geometric variability of the entire scene. This is important for applications where obtaining large amounts of training data is laborious, hazardous or difficult, such as labelling pixels within shadows. Radiometric normalisation approaches for pre-processing the hyperspectral data are analysed, and it is shown that methods based on the raw pixel data are sufficient to be used as input for the classifier. This removes the need for external hardware such as calibration boards, which can restrict the application of hyperspectral sensors in robotics applications. Experiments to evaluate the classification system are carried out on two datasets captured from a field-based platform. |
Tasks | Calibration, Data Augmentation |
Published | 2016-11-28 |
URL | http://arxiv.org/abs/1611.09007v1 |
http://arxiv.org/pdf/1611.09007v1.pdf | |
PWC | https://paperswithcode.com/paper/hyperspectral-cnn-classification-with-limited |
Repo | |
Framework | |
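A relighting-style augmentation, as described in the abstract, can be sketched by rescaling a pixel's spectrum and tilting it towards shorter wavelengths, mimicking a shadowed region lit by skylight. The linear tilt model and all numbers here are illustrative assumptions, not the paper's actual relighting procedure:

```python
def relight(spectrum, intensity, blue_shift):
    """Simulate a change in incident illumination on one hyperspectral pixel:
    scale the overall intensity and apply a linear tilt across bands so that
    shorter wavelengths (lower band indices) are weighted more."""
    n = len(spectrum)
    # tilt factor is (1 + blue_shift) at band 0 and (1 - blue_shift) at band n-1
    return [v * intensity * (1.0 + blue_shift * (1.0 - 2.0 * i / (n - 1)))
            for i, v in enumerate(spectrum)]

pixel = [0.4, 0.5, 0.6, 0.7]   # hypothetical 4-band reflectance spectrum
relit = relight(pixel, intensity=0.5, blue_shift=0.2)
```

Sampling `intensity` and `blue_shift` at random for each training pixel would expose the classifier to illumination conditions absent from the small labelled region.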
Recoverability of Joint Distribution from Missing Data
Title | Recoverability of Joint Distribution from Missing Data |
Authors | Jin Tian |
Abstract | A probabilistic query may not be estimable from observed data corrupted by missing values if the data are not missing at random (MAR). It is therefore of theoretical interest and practical importance to determine in principle whether a probabilistic query is estimable from missing data or not when the data are not MAR. We present an algorithm that systematically determines whether the joint probability is estimable from observed data with missing values, assuming that the data-generation model is represented as a Bayesian network containing unobserved latent variables that not only encodes the dependencies among the variables but also explicitly portrays the mechanisms responsible for the missingness process. The result significantly advances the existing work. |
Tasks | |
Published | 2016-11-15 |
URL | http://arxiv.org/abs/1611.04709v1 |
http://arxiv.org/pdf/1611.04709v1.pdf | |
PWC | https://paperswithcode.com/paper/recoverability-of-joint-distribution-from |
Repo | |
Framework | |
Interactive Illumination Invariance
Title | Interactive Illumination Invariance |
Authors | Han Gong, Graham Finlayson |
Abstract | Illumination effects cause problems for many computer vision algorithms. We present a user-friendly interactive system for robust illumination-invariant image generation. Compared with previous automated approaches to illumination-invariant image derivation, our system enables users to specify a particular kind of illumination variation for removal. The derivation of the illumination-invariant image is guided by the user input: a stroke that defines an area covering a set of pixels whose intensities are influenced predominantly by the illumination variation. This additional flexibility enhances robustness when processing non-linearly rendered images and images of scenes whose illumination variations are difficult to estimate automatically. Finally, we present some evaluation results of our method. |
Tasks | Image Generation |
Published | 2016-07-20 |
URL | http://arxiv.org/abs/1607.05967v1 |
http://arxiv.org/pdf/1607.05967v1.pdf | |
PWC | https://paperswithcode.com/paper/interactive-illumination-invariance |
Repo | |
Framework | |
An Attentional Neural Conversation Model with Improved Specificity
Title | An Attentional Neural Conversation Model with Improved Specificity |
Authors | Kaisheng Yao, Baolin Peng, Geoffrey Zweig, Kam-Fai Wong |
Abstract | In this paper we propose a neural conversation model for conducting dialogues. We demonstrate the use of this model to generate help-desk responses, where users are asking questions about PC applications. Our model is distinguished by two characteristics. First, it models intention across turns with a recurrent network, and incorporates an attention model that is conditioned on the representation of intention. Second, it avoids generating non-specific responses by incorporating an IDF term in the objective function. The model is evaluated both as a pure generation model, in which a help-desk response is generated from scratch, and as a retrieval model, with performance measured using recall rates of the correct response. Experimental results indicate that the model outperforms previously proposed neural conversation architectures, and that using specificity in the objective function significantly improves performance for both generation and retrieval. |
Tasks | |
Published | 2016-06-03 |
URL | http://arxiv.org/abs/1606.01292v1 |
http://arxiv.org/pdf/1606.01292v1.pdf | |
PWC | https://paperswithcode.com/paper/an-attentional-neural-conversation-model-with |
Repo | |
Framework | |
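The specificity idea, an IDF term in the objective, can be sketched as weighting each token's loss by its inverse document frequency, so rare (specific) words matter more than generic ones. The corpus counts and the exact weighting form below are assumptions for illustration, not the paper's objective:

```python
import math

def idf(token, doc_freq, n_docs):
    """Inverse document frequency: rare, specific tokens score high."""
    return math.log(n_docs / (1.0 + doc_freq.get(token, 0)))

def idf_weighted_loss(token_nlls, tokens, doc_freq, n_docs):
    """Weight each token's negative log-likelihood by its IDF, so the
    objective penalises getting specific words wrong more than generic ones."""
    return sum(nll * idf(t, doc_freq, n_docs)
               for nll, t in zip(token_nlls, tokens))

doc_freq = {"the": 90, "reboot": 4}   # hypothetical document frequencies
n_docs = 100
```

Under this weighting, a bland filler like "the" contributes almost nothing to the loss, while a content word like "reboot" dominates, which is the mechanism the abstract credits for avoiding non-specific responses.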
Forecasting Interactive Dynamics of Pedestrians with Fictitious Play
Title | Forecasting Interactive Dynamics of Pedestrians with Fictitious Play |
Authors | Wei-Chiu Ma, De-An Huang, Namhoon Lee, Kris M. Kitani |
Abstract | We develop predictive models of pedestrian dynamics by encoding the coupled nature of multi-pedestrian interaction using game theory, and by using deep learning-based visual analysis to estimate person-specific behavior parameters. Building predictive models for multi-pedestrian interactions, however, is very challenging for two reasons: (1) the dynamics of interaction are complex interdependent processes, where the predicted behavior of one pedestrian can affect the actions taken by others, and (2) dynamics vary depending on an individual's physical characteristics (e.g., an older person may walk slowly while a younger person may walk faster). To address these challenges, we (1) utilize concepts from game theory to model the interdependent decision-making process of multiple pedestrians and (2) use visual classifiers to learn a mapping from pedestrian appearance to behavior parameters. We evaluate our proposed model on several public multiple-pedestrian-interaction video datasets. Results show that our strategic planning model explains human interactions 25% better than state-of-the-art methods. |
Tasks | Decision Making |
Published | 2016-04-05 |
URL | http://arxiv.org/abs/1604.01431v3 |
http://arxiv.org/pdf/1604.01431v3.pdf | |
PWC | https://paperswithcode.com/paper/forecasting-interactive-dynamics-of |
Repo | |
Framework | |
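The fictitious-play mechanism named in the title can be sketched in its textbook form: each player repeatedly best-responds to the empirical frequency of the other's past actions. The two-action "pass left / pass right" coordination game below is a toy stand-in; the paper's pedestrian model is far richer:

```python
def best_response(payoff, opp_counts):
    """Best response to the empirical distribution of the opponent's past play."""
    total = sum(opp_counts)
    n = len(payoff)
    expected = [sum(payoff[a][b] * opp_counts[b] for b in range(len(opp_counts))) / total
                for a in range(n)]
    return max(range(n), key=expected.__getitem__)

def fictitious_play(payoff_a, payoff_b, steps):
    """Run fictitious play: both players best-respond to observed frequencies."""
    counts_a, counts_b = [1, 1], [1, 1]   # uniform prior over the two actions
    for _ in range(steps):
        a = best_response(payoff_a, counts_b)
        b = best_response(payoff_b, counts_a)
        counts_a[a] += 1
        counts_b[b] += 1
    return counts_a, counts_b

# Toy coordination game: two pedestrians each choose "pass left" (0) or
# "pass right" (1); they avoid collision only by choosing the same side.
coord = [[1, 0], [0, 1]]
counts_a, counts_b = fictitious_play(coord, coord, steps=20)
```

Play quickly locks onto one coordinated convention, which is the kind of interdependent forecast the abstract describes.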
Tweet Acts: A Speech Act Classifier for Twitter
Title | Tweet Acts: A Speech Act Classifier for Twitter |
Authors | Soroush Vosoughi, Deb Roy |
Abstract | Speech acts are a way to conceptualize speech as action. This holds true for communication on any platform, including social media platforms such as Twitter. In this paper, we explored speech act recognition on Twitter by treating it as a multi-class classification problem. We created a taxonomy of six speech acts for Twitter and proposed a set of semantic and syntactic features. We trained and tested a logistic regression classifier using a data set of manually labelled tweets. Our method achieved state-of-the-art performance with an average F1 score of more than $0.70$. We also explored classifiers with three different granularities (Twitter-wide, type-specific and topic-specific) in order to find the right balance between generalization and overfitting for our task. |
Tasks | |
Published | 2016-05-17 |
URL | http://arxiv.org/abs/1605.05156v1 |
http://arxiv.org/pdf/1605.05156v1.pdf | |
PWC | https://paperswithcode.com/paper/tweet-acts-a-speech-act-classifier-for |
Repo | |
Framework | |
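The syntactic features feeding such a classifier can be sketched as simple surface cues. The specific feature set below is a plausible guess, not the paper's actual feature list:

```python
def tweet_features(tweet):
    """Extract a few syntactic cues plausibly predictive of speech acts
    (question, request, statement, ...). Illustrative, not the paper's set."""
    words = tweet.split()
    return {
        "has_question_mark": "?" in tweet,
        "has_exclamation": "!" in tweet,
        "starts_with_wh": bool(words) and words[0].lower() in
            {"what", "who", "where", "when", "why", "how"},
        "n_hashtags": sum(w.startswith("#") for w in words),
        "n_mentions": sum(w.startswith("@") for w in words),
    }

feats = tweet_features("Why is my wifi down again?! #fail @ISP")
```

Vectors like this, combined with the paper's semantic features, would be the input to the logistic regression classifier.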
On the boosting ability of top-down decision tree learning algorithm for multiclass classification
Title | On the boosting ability of top-down decision tree learning algorithm for multiclass classification |
Authors | Anna Choromanska, Krzysztof Choromanski, Mariusz Bojarski |
Abstract | We analyze the performance of LOMtree, a top-down multiclass decision tree learning algorithm recently proposed by Choromanska and Langford (2014) for efficiently solving classification problems with a very large number of classes. The algorithm optimizes, in an online fashion, an objective function that simultaneously controls the depth of the tree and its statistical accuracy. We prove important properties of this objective and explore its connection to three well-known entropy-based decision tree objectives, i.e. Shannon entropy, Gini entropy and its modified version, for which online optimization schemes had not yet been developed. We show, via boosting-type guarantees, that maximizing the considered objective also reduces all of these entropy-based objectives. The bounds we obtain depend critically on the strong-concavity properties of the entropy-based criteria, where the mildest dependence on the number of classes (only logarithmic) corresponds to the Shannon entropy. |
Tasks | |
Published | 2016-05-17 |
URL | http://arxiv.org/abs/1605.05223v1 |
http://arxiv.org/pdf/1605.05223v1.pdf | |
PWC | https://paperswithcode.com/paper/on-the-boosting-ability-of-top-down-decision |
Repo | |
Framework | |
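Two of the entropy-based criteria the abstract compares are standard and easy to state concretely. A minimal sketch (using natural log; the paper's analysis also covers a modified Gini criterion not shown here):

```python
import math

def shannon_entropy(p):
    """Shannon entropy of a class distribution (natural log)."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def gini_entropy(p):
    """Gini impurity of a class distribution."""
    return 1.0 - sum(pi * pi for pi in p)

uniform = [0.25, 0.25, 0.25, 0.25]   # maximally impure node over 4 classes
pure = [1.0, 0.0, 0.0, 0.0]          # pure node
```

Both criteria are maximized at the uniform distribution and vanish on pure nodes; the paper's boosting bounds quantify how fast a split objective drives them down, with Shannon entropy giving the mildest (logarithmic) dependence on the number of classes.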
Weekly maintenance scheduling using exact and genetic methods
Title | Weekly maintenance scheduling using exact and genetic methods |
Authors | Andrew W. Palmer, Robin Vujanic, Andrew J. Hill, Steven J. Scheding |
Abstract | The weekly maintenance schedule specifies when maintenance activities should be performed on the equipment, taking into account the availability of workers and maintenance bays, and other operational constraints. The current approach to generating this schedule is labour intensive and requires coordination between the maintenance schedulers and operations staff to minimise its impact on the operation of the mine. This paper presents methods for automatically generating this schedule from the list of maintenance tasks to be performed, the availability roster of the maintenance staff, and time windows in which each piece of equipment is available for maintenance. Both Mixed-Integer Linear Programming (MILP) and genetic algorithms are evaluated, with the genetic algorithm shown to significantly outperform the MILP. Two fitness functions for the genetic algorithm are also examined, with a linear fitness function outperforming an inverse fitness function by up to 5% for the same calculation time. The genetic algorithm approach is computationally fast, allowing the schedule to be rapidly recalculated in response to unexpected delays and breakdowns. |
Tasks | |
Published | 2016-10-17 |
URL | http://arxiv.org/abs/1610.05016v1 |
http://arxiv.org/pdf/1610.05016v1.pdf | |
PWC | https://paperswithcode.com/paper/weekly-maintenance-scheduling-using-exact-and |
Repo | |
Framework | |
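The linear-versus-inverse fitness comparison in the abstract can be sketched via roulette-wheel selection probabilities. The cost values and the exact fitness formulas (including the 50.0 offset) are illustrative assumptions, not the paper's definitions:

```python
def selection_probs(costs, fitness):
    """Roulette-wheel selection probabilities for candidate schedules."""
    fits = [fitness(c) for c in costs]
    total = sum(fits)
    return [f / total for f in fits]

# Hypothetical schedule costs (lower is better).
costs = [10.0, 20.0, 40.0]
linear = selection_probs(costs, lambda c: max(50.0 - c, 0.0))   # linear fitness
inverse = selection_probs(costs, lambda c: 1.0 / c)             # inverse fitness
```

Both schemes favour cheaper schedules, but they distribute selection pressure differently across the population, which is the kind of difference behind the reported up-to-5% gap between the two fitness functions.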
Cardea: Context-Aware Visual Privacy Protection from Pervasive Cameras
Title | Cardea: Context-Aware Visual Privacy Protection from Pervasive Cameras |
Authors | Jiayu Shu, Rui Zheng, Pan Hui |
Abstract | The growing popularity of mobile and wearable devices with built-in cameras, the bright prospect of camera-related applications such as augmented reality and life-logging systems, the increased ease of taking and sharing photos, and advances in computer vision techniques have greatly facilitated people’s lives in many aspects, but have also inevitably raised people’s concerns about visual privacy. Motivated by recent user studies showing that people’s privacy concerns are dependent on context, in this paper we propose Cardea, a context-aware and interactive visual privacy protection framework that enforces privacy protection according to people’s privacy preferences. The framework provides people with fine-grained visual privacy protection using: i) personal privacy profiles, with which people can define their context-dependent privacy preferences; ii) visual indicators (face features), with which devices can automatically locate individuals who request privacy protection; and iii) hand gestures, with which people can flexibly interact with cameras to temporarily change their privacy preferences. We design and implement the framework, consisting of a client app on Android devices and a cloud server. Our evaluation results confirm that the framework is practical and effective, with 86% overall accuracy, showing a promising future for context-aware visual privacy protection from pervasive cameras. |
Tasks | |
Published | 2016-10-04 |
URL | http://arxiv.org/abs/1610.00889v1 |
http://arxiv.org/pdf/1610.00889v1.pdf | |
PWC | https://paperswithcode.com/paper/cardea-context-aware-visual-privacy |
Repo | |
Framework | |
Automation of Pedestrian Tracking in a Crowded Situation
Title | Automation of Pedestrian Tracking in a Crowded Situation |
Authors | Saman Saadat, Kardi Teknomo |
Abstract | Studies of microscopic pedestrian behaviour require large amounts of trajectory data from real-world pedestrian crowds. Such data collection, if done manually, needs tremendous effort and is very time-consuming. Though many studies have asserted the possibility of automating this task using video cameras, we found that only a few have demonstrated good performance in very crowded situations or in top-angled view scenes. This paper deals with tracking a pedestrian crowd under heavy occlusion in an angled-view scene. Our automated tracking system consists of two modules that run sequentially. The first module detects moving objects as blobs. The second module is a tracking system: we take the probability distribution from the detection of each pedestrian and use a Bayesian update to track the next position. The result of such tracking is a database of pedestrian trajectories over time and space. With certain prior information, we show that the system can track a large number of people under occlusion in cluttered scenes. |
Tasks | |
Published | 2016-09-06 |
URL | http://arxiv.org/abs/1609.01710v1 |
http://arxiv.org/pdf/1609.01710v1.pdf | |
PWC | https://paperswithcode.com/paper/automation-of-pedestrian-tracking-in-a |
Repo | |
Framework | |
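The Bayesian update step in the tracking module can be sketched over a discrete set of candidate next positions. The prior, likelihood values, and one-dimensional position grid below are hypothetical; the paper works with blob detections in image space:

```python
def bayes_update(prior, likelihood):
    """One Bayesian update over candidate positions: multiply the prior by
    the observation likelihood and renormalise to a distribution."""
    post = [p * l for p, l in zip(prior, likelihood)]
    z = sum(post)
    return [p / z for p in post]

prior = [0.25, 0.25, 0.25, 0.25]     # belief over 4 candidate next positions
likelihood = [0.1, 0.6, 0.2, 0.1]    # hypothetical blob-detection evidence
posterior = bayes_update(prior, likelihood)
```

Repeating this update frame by frame, with the posterior propagated through a motion model to become the next prior, yields the trajectory database the abstract describes.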
Visualizing and Understanding Curriculum Learning for Long Short-Term Memory Networks
Title | Visualizing and Understanding Curriculum Learning for Long Short-Term Memory Networks |
Authors | Volkan Cirik, Eduard Hovy, Louis-Philippe Morency |
Abstract | Curriculum learning emphasizes the order of training instances in a computational learning setup. The core hypothesis is that simpler instances should be learned early, as building blocks for learning more complex ones. Despite its usefulness, it is still unknown how exactly the internal representations of models are affected by curriculum learning. In this paper, we study the effect of curriculum learning on Long Short-Term Memory (LSTM) networks, which have shown strong competency in many Natural Language Processing (NLP) problems. Our experiments on a sentiment analysis task and a synthetic task similar to sequence prediction tasks in NLP show that curriculum learning has a positive effect on the LSTM’s internal states by biasing the model towards building constructive representations, i.e., the internal representations at previous timesteps are used as building blocks for the final prediction. We also find that smaller models improve significantly when they are trained with curriculum learning. Lastly, we show that curriculum learning helps more when the amount of training data is limited. |
Tasks | Sentiment Analysis |
Published | 2016-11-18 |
URL | http://arxiv.org/abs/1611.06204v1 |
http://arxiv.org/pdf/1611.06204v1.pdf | |
PWC | https://paperswithcode.com/paper/visualizing-and-understanding-curriculum |
Repo | |
Framework | |
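The basic curriculum-learning setup, presenting easy instances before hard ones, can be sketched with a simple ordering step. Using length as the difficulty measure is an illustrative assumption (a common proxy for sequence tasks), not necessarily the paper's criterion:

```python
def curriculum_order(instances, difficulty):
    """Order training instances from easy to hard before feeding them to the
    model; the difficulty function defines the curriculum."""
    return sorted(instances, key=difficulty)

sentences = [
    "a much longer and harder example sentence",
    "short one",
    "medium length example",
]
ordered = curriculum_order(sentences, difficulty=len)
```

A trainer would then draw early batches from the front of this ordering and gradually admit harder instances.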
A 3D Face Modelling Approach for Pose-Invariant Face Recognition in a Human-Robot Environment
Title | A 3D Face Modelling Approach for Pose-Invariant Face Recognition in a Human-Robot Environment |
Authors | Michael Grupp, Philipp Kopp, Patrik Huber, Matthias Rätsch |
Abstract | Face analysis techniques have become a crucial component of human-machine interaction in the fields of assistive and humanoid robotics. However, the variations in head pose that arise naturally in these environments are still a great challenge. In this paper, we present a real-time capable 3D face modelling framework for 2D in-the-wild images that is applicable to robotics. The fitting of the 3D Morphable Model is based exclusively on automatically detected landmarks. After fitting, the face can be corrected in pose and transformed back to a frontal 2D representation that is more suitable for face recognition. We conduct face recognition experiments with non-frontal images from the MUCT database and uncontrolled, in-the-wild images from the PaSC database, the most challenging face recognition database to date, showing improved performance. Finally, we present our SCITOS G5 robot system, which incorporates our framework as a means of image pre-processing for face analysis. |
Tasks | Face Recognition, Robust Face Recognition |
Published | 2016-06-01 |
URL | http://arxiv.org/abs/1606.00474v1 |
http://arxiv.org/pdf/1606.00474v1.pdf | |
PWC | https://paperswithcode.com/paper/a-3d-face-modelling-approach-for-pose |
Repo | |
Framework | |
Interpreting the Syntactic and Social Elements of the Tweet Representations via Elementary Property Prediction Tasks
Title | Interpreting the Syntactic and Social Elements of the Tweet Representations via Elementary Property Prediction Tasks |
Authors | J Ganesh, Manish Gupta, Vasudeva Varma |
Abstract | Research in social media analysis is experiencing a recent surge, with a large number of works applying representation learning models to solve high-level syntactico-semantic tasks such as sentiment analysis, semantic textual similarity computation, hashtag prediction and so on. Although the performance of representation learning models on these tasks is better than that of traditional baselines, little is known about the core properties of a tweet encoded within the representations. Understanding these core properties would let us draw generalizable conclusions about the quality of the representations. Our work presented here constitutes the first step in opening the black box of vector embeddings for social media posts, with emphasis on tweets in particular. To understand the core properties encoded in a tweet representation, we evaluate the representations to estimate the extent to which they can model each of those properties, such as tweet length, presence of words, hashtags, mentions, capitalization, and so on. This is done with the help of multiple classifiers which take the representation as input. Essentially, each classifier evaluates one of the syntactic or social properties which are arguably salient for a tweet. This is also the first holistic study to extensively analyse the ability to encode these properties for a wide variety of tweet representation models, including traditional unsupervised methods (BOW, LDA), unsupervised representation learning methods (Siamese CBOW, Tweet2Vec) and supervised methods (CNN, BLSTM). |
Tasks | Representation Learning, Semantic Textual Similarity, Sentiment Analysis, Unsupervised Representation Learning |
Published | 2016-11-15 |
URL | http://arxiv.org/abs/1611.04887v1 |
http://arxiv.org/pdf/1611.04887v1.pdf | |
PWC | https://paperswithcode.com/paper/interpreting-the-syntactic-and-social |
Repo | |
Framework | |
Quick and energy-efficient Bayesian computing of binocular disparity using stochastic digital signals
Title | Quick and energy-efficient Bayesian computing of binocular disparity using stochastic digital signals |
Authors | Alexandre Coninx, Pierre Bessière, Jacques Droulez |
Abstract | Reconstruction of the three-dimensional geometry of a visual scene using binocular disparity information is an important issue in computer vision and mobile robotics, and can be formulated as a Bayesian inference problem. However, computation of the full disparity distribution with an advanced Bayesian model is usually intractable, and proves computationally challenging even with a simple model. In this paper, we show how probabilistic hardware using distributed memory and an alternate representation of data as stochastic bitstreams can solve that problem with high performance and energy efficiency. We put forward a way to express discrete probability distributions using stochastic data representations and to perform Bayesian fusion using those representations, and show how that approach can be applied to disparity computation. We evaluate the system using a simulated stochastic implementation and discuss possible hardware implementations of such architectures and their potential for sensorimotor processing and robotics. |
Tasks | Bayesian Inference |
Published | 2016-09-14 |
URL | http://arxiv.org/abs/1609.04337v2 |
http://arxiv.org/pdf/1609.04337v2.pdf | |
PWC | https://paperswithcode.com/paper/quick-and-energy-efficient-bayesian-computing |
Repo | |
Framework | |
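The stochastic-signal idea in the abstract can be sketched in software: a probability is encoded as a Bernoulli bitstream, and multiplying two probabilities (the core of Bayesian fusion, before renormalisation) reduces to a bitwise AND of their streams. The stream length and probabilities below are arbitrary; real stochastic hardware would use dedicated random bit generators rather than a software PRNG:

```python
import random

def bitstream(p, n, rng):
    """Encode a probability p as a stochastic bitstream of n Bernoulli bits."""
    return [1 if rng.random() < p else 0 for _ in range(n)]

def fuse(a, b):
    """Multiply the encoded probabilities of two independent streams:
    the product corresponds to the bitwise AND of the bitstreams."""
    return [x & y for x, y in zip(a, b)]

def decode(bits):
    """Estimate the encoded probability as the fraction of 1-bits."""
    return sum(bits) / len(bits)

rng = random.Random(42)
a = bitstream(0.8, 100_000, rng)
b = bitstream(0.5, 100_000, rng)
est = decode(fuse(a, b))   # estimates 0.8 * 0.5 = 0.4, up to sampling noise
```

The appeal for hardware is that this multiplication needs only a single AND gate per bit, at the cost of precision that grows only with stream length.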