January 30, 2020

3021 words 15 mins read

Paper Group ANR 217

EyeCar: Modeling the Visual Attention Allocation of Drivers in Semi-Autonomous Vehicles

Title EyeCar: Modeling the Visual Attention Allocation of Drivers in Semi-Autonomous Vehicles
Authors Sonia Baee, Erfan Pakdamanian, Vicente Ordonez, Inki Kim, Lu Feng, Laura Barnes
Abstract A safe transition between autonomous and manual control requires sustained visual attention of the driver for the perception and assessment of hazards in dynamic driving environments. Thus, drivers must retain a certain level of situation awareness to safely take over. Understanding the visual attention allocation of drivers can pave the way for inferring their dynamic state of situational awareness. We propose a reinforcement and inverse-reinforcement learning framework for modeling passive drivers’ visual attention allocation in semi-autonomous vehicles. The proposed approach measures the eye movements of passive drivers to evaluate their responses to real-world rear-end collisions. The results show substantial individual differences in the eye fixation patterns by driving experience, even among fully attentive drivers. Experienced drivers were more attentive to the situational dynamics and were able to identify potentially hazardous objects before any collisions occurred. These models of visual attention could potentially be integrated into autonomous systems to continuously monitor and guide effective intervention. Keywords: Visual attention allocation; Situation awareness; Eye movements; Eye fixation; Eye-Tracking; Reinforcement Learning; Inverse Reinforcement Learning
Tasks Autonomous Vehicles, Eye Tracking
Published 2019-12-17
URL https://arxiv.org/abs/1912.07773v2
PDF https://arxiv.org/pdf/1912.07773v2.pdf
PWC https://paperswithcode.com/paper/eyecar-modeling-the-visual-attention
Repo
Framework
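The inverse-reinforcement-learning side of this framework rests on a classic idea: match the discounted feature expectations of expert demonstrations. A minimal sketch of that step, with invented gaze features and trajectories (none of these numbers or feature names come from the paper):

```python
import numpy as np

# Minimal sketch of feature-expectation matching, the core idea behind
# inverse reinforcement learning (IRL). All feature names and trajectories
# here are illustrative stand-ins, not data from the paper.

def feature_expectations(trajectories, gamma=0.9):
    """Discounted average of per-step features over demonstration trajectories."""
    mu = np.zeros(trajectories[0].shape[1])
    for traj in trajectories:  # traj: (T, d) array of per-step gaze features
        discounts = gamma ** np.arange(len(traj))
        mu += discounts @ traj
    return mu / len(trajectories)

# Toy "gaze" features: [looking_at_lead_vehicle, checking_mirror, off_road]
expert = [np.array([[1., 0., 0.], [1., 0., 0.], [0., 1., 0.]])]
learner = [np.array([[0., 0., 1.], [1., 0., 0.], [0., 0., 1.]])]

mu_expert = feature_expectations(expert)
mu_learner = feature_expectations(learner)

# A max-margin-style update pushes reward weights toward features the
# expert visits more often than the current learner policy does.
w = mu_expert - mu_learner
```

Under this update, gaze targets the expert attends to (the lead vehicle, the mirror) gain reward weight, while off-road glances lose it.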

Leveraging sentence similarity in natural language generation: Improving beam search using range voting

Title Leveraging sentence similarity in natural language generation: Improving beam search using range voting
Authors Sebastian Borgeaud, Guy Emerson
Abstract We propose a novel method for generating natural language sentences from probabilistic language models, selecting from a beam search using a range voting procedure. The proposed method could be applied to any language model, including both n-gram models and neural network models, and could be applied to any generation task. Instead of choosing the most likely output, our method chooses the most representative output, providing a solution to the common problem of short outputs being preferred over longer and more informative ones. We evaluate our method on an image captioning task, and find that the generated captions are longer and more diverse than those generated using standard beam search, with higher BLEU scores (particularly when the beam size is large), and better performance in a human evaluation.
Tasks Image Captioning, Language Modelling, Text Generation
Published 2019-08-17
URL https://arxiv.org/abs/1908.06288v1
PDF https://arxiv.org/pdf/1908.06288v1.pdf
PWC https://paperswithcode.com/paper/leveraging-sentence-similarity-in-natural
Repo
Framework
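The selection rule the paper proposes can be sketched in a few lines: each beam candidate scores every other candidate by similarity, and the most representative one wins. The token-overlap (Jaccard) similarity below is a stand-in for whatever sentence-similarity measure is actually used, and the example beam is invented:

```python
# Hedged sketch of range-voting beam selection: rather than returning the
# highest-probability candidate, pick the candidate that the rest of the
# beam collectively rates as most similar to itself. Jaccard overlap is a
# placeholder similarity; the beam contents are made up.

def jaccard(a, b):
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb)

def most_representative(beam):
    # Each candidate's score is the total similarity awarded by the others.
    scores = [sum(jaccard(c, o) for j, o in enumerate(beam) if j != i)
              for i, c in enumerate(beam)]
    return beam[max(range(len(beam)), key=scores.__getitem__)]

beam = [
    "a dog runs on the beach",
    "a dog runs along the beach",
    "a dog is running on the sandy beach",
    "two cats sleep indoors",
]
best = most_representative(beam)
```

The outlier caption ("two cats sleep indoors") receives no votes, so a caption near the center of the beam's meaning is returned. In the paper, candidates' model probabilities would additionally weight the votes.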

Tensor Graph Convolutional Networks for Prediction on Dynamic Graphs

Title Tensor Graph Convolutional Networks for Prediction on Dynamic Graphs
Authors Osman Asif Malik, Shashanka Ubaru, Lior Horesh, Misha E. Kilmer, Haim Avron
Abstract Many irregular domains such as social networks, financial transactions, neuron connections, and natural language structures are represented as graphs. In recent years, a variety of graph neural networks (GNNs) have been successfully applied for representation learning and prediction on such graphs. However, in many of the applications, the underlying graph changes over time and existing GNNs are inadequate for handling such dynamic graphs. In this paper we propose a novel technique for learning embeddings of dynamic graphs based on a tensor algebra framework. Our method extends the popular graph convolutional network (GCN) for learning representations of dynamic graphs using the recently proposed tensor M-product technique. Theoretical results that establish the connection between the proposed tensor approach and spectral convolution of tensors are developed. Numerical experiments on real datasets demonstrate the usefulness of the proposed method for an edge classification task on dynamic graphs.
Tasks Representation Learning
Published 2019-10-16
URL https://arxiv.org/abs/1910.07643v1
PDF https://arxiv.org/pdf/1910.07643v1.pdf
PWC https://paperswithcode.com/paper/tensor-graph-convolutional-networks-for
Repo
Framework
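The tensor M-product underlying the method has a compact definition: transform the tensors along the third mode by an invertible matrix M, multiply frontal slices facewise, and transform back. A small numpy sketch (the paper restricts M to particular structured choices; here any invertible M is used for illustration):

```python
import numpy as np

def mode3(X, M):
    # Apply M along the third mode: hat_X[:, :, k] = sum_j M[k, j] * X[:, :, j]
    return np.einsum('kj,abj->abk', M, X)

def m_product(A, B, M):
    """Tensor M-product: transform, multiply frontal slices facewise, invert."""
    A_hat, B_hat = mode3(A, M), mode3(B, M)
    C_hat = np.einsum('aik,ibk->abk', A_hat, B_hat)  # slice-by-slice matmul
    return mode3(C_hat, np.linalg.inv(M))

rng = np.random.default_rng(0)
T = 4                                     # number of time steps / slices
A = rng.standard_normal((3, 5, T))
B = rng.standard_normal((5, 2, T))
M = rng.standard_normal((T, T)) + 4 * np.eye(T)  # any invertible M
C = m_product(A, B, M)                    # shape (3, 2, T)
```

A useful sanity check on the definition: with M equal to the identity, the M-product reduces to independent matrix products on each frontal slice, i.e. no mixing across time.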

Regret Bounds for Thompson Sampling in Episodic Restless Bandit Problems

Title Regret Bounds for Thompson Sampling in Episodic Restless Bandit Problems
Authors Young Hun Jung, Ambuj Tewari
Abstract Restless bandit problems are instances of non-stationary multi-armed bandits. These problems have been studied well from the optimization perspective, where the goal is to efficiently find a near-optimal policy when system parameters are known. However, very few papers adopt a learning perspective, where the parameters are unknown. In this paper, we analyze the performance of Thompson sampling in episodic restless bandits with unknown parameters. We consider a general policy map to define our competitor and prove an $\tilde{\mathcal{O}}(\sqrt{T})$ Bayesian regret bound. Our competitor is flexible enough to represent various benchmarks including the best fixed action policy, the optimal policy, the Whittle index policy, or the myopic policy. We also present empirical results that support our theoretical findings.
Tasks Multi-Armed Bandits
Published 2019-05-29
URL https://arxiv.org/abs/1905.12673v2
PDF https://arxiv.org/pdf/1905.12673v2.pdf
PWC https://paperswithcode.com/paper/regret-bounds-for-thompson-sampling-in
Repo
Framework
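For readers unfamiliar with the algorithm being analyzed, here is Thompson sampling in its simplest form, on a *stationary* Bernoulli bandit. This is a deliberate simplification of the paper's episodic restless setting, where arm states would also evolve between pulls; the arm means are invented:

```python
import numpy as np

# Thompson sampling with Beta posteriors on a stationary Bernoulli bandit.
# A simplification of the restless-bandit setting studied in the paper.

rng = np.random.default_rng(1)
true_means = np.array([0.2, 0.5, 0.8])  # unknown to the learner
alpha = np.ones(3)                      # Beta posterior: successes + 1
beta = np.ones(3)                       # Beta posterior: failures + 1

pulls = np.zeros(3, dtype=int)
for t in range(2000):
    theta = rng.beta(alpha, beta)       # sample a mean for each arm
    arm = int(np.argmax(theta))         # act greedily w.r.t. the sample
    reward = rng.random() < true_means[arm]
    alpha[arm] += reward
    beta[arm] += 1 - reward
    pulls[arm] += 1
```

As the posteriors concentrate, the best arm is pulled almost exclusively, which is the mechanism behind sublinear regret bounds like the paper's $\tilde{\mathcal{O}}(\sqrt{T})$.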

Zero-Shot Deep Hashing and Neural Network Based Error Correction for Face Template Protection

Title Zero-Shot Deep Hashing and Neural Network Based Error Correction for Face Template Protection
Authors Veeru Talreja, Matthew C. Valenti, Nasser M. Nasrabadi
Abstract In this paper, we present a novel architecture that integrates a deep hashing framework with a neural network decoder (NND) for application to face template protection. It improves upon existing face template protection techniques to provide better matching performance with one-shot and multi-shot enrollment. A key novelty of our proposed architecture is that the framework can also be used with zero-shot enrollment. This implies that our architecture does not need to be re-trained even if a new subject is to be enrolled into the system. The proposed architecture consists of two major components: a deep hashing (DH) component, which is used for robust mapping of face images to their corresponding intermediate binary codes, and a NND component, which corrects errors in the intermediate binary codes that are caused by differences in the enrollment and probe biometrics due to factors such as variation in pose, illumination, and other factors. The final binary code generated by the NND is then cryptographically hashed and stored as a secure face template in the database. The efficacy of our approach with zero-shot, one-shot, and multi-shot enrollments is shown for CMU-PIE, Extended Yale B, WVU multimodal and Multi-PIE face databases. With zero-shot enrollment, the system achieves approximately 85% genuine accept rates (GAR) at 0.01% false accept rate (FAR), and with one-shot and multi-shot enrollments, it achieves approximately 99.95% GAR at 0.01% FAR, while providing a high level of template security.
Tasks
Published 2019-08-05
URL https://arxiv.org/abs/1908.02706v1
PDF https://arxiv.org/pdf/1908.02706v1.pdf
PWC https://paperswithcode.com/paper/zero-shot-deep-hashing-and-neural-network
Repo
Framework
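The final, non-learned step of this pipeline is standard: only a cryptographic hash of the NND-corrected binary code is stored, never the code itself. A sketch with made-up stand-in codes (the real codes would be NND outputs):

```python
import hashlib

# Sketch of secure template storage: hash the error-corrected binary code
# so the database never holds the raw biometric code. The bit strings are
# invented stand-ins for NND outputs.

def protect(code_bits: str) -> str:
    return hashlib.sha256(code_bits.encode()).hexdigest()

enrolled_template = protect("1011001110001111")  # stored at enrollment

# A probe matches only if error correction recovers the exact enrolled
# code; a single flipped bit yields a completely different digest.
match = protect("1011001110001111") == enrolled_template
near_miss = protect("0011001110001111") == enrolled_template
```

This all-or-nothing behavior is precisely why the NND error-correction stage is needed: biometric readings vary, but the hash comparison tolerates zero bit errors.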

Simplified Neural Unsupervised Domain Adaptation

Title Simplified Neural Unsupervised Domain Adaptation
Authors Timothy A Miller
Abstract Unsupervised domain adaptation (UDA) is the task of modifying a statistical model trained on labeled data from a source domain to achieve better performance on data from a target domain, with access to only unlabeled data in the target domain. Existing state-of-the-art UDA approaches use neural networks to learn representations that can predict the values of a subset of important features called “pivot features.” In this work, we show that it is possible to improve on these methods by jointly training the representation learner with the task learner, and we examine the importance of existing pivot selection methods.
Tasks Domain Adaptation, Unsupervised Domain Adaptation
Published 2019-05-22
URL https://arxiv.org/abs/1905.09153v1
PDF https://arxiv.org/pdf/1905.09153v1.pdf
PWC https://paperswithcode.com/paper/simplified-neural-unsupervised-domain
Repo
Framework

Detecting Finger-Vein Presentation Attacks Using 3D Shape & Diffuse Reflectance Decomposition

Title Detecting Finger-Vein Presentation Attacks Using 3D Shape & Diffuse Reflectance Decomposition
Authors Jag Mohan Singh, Sushma Venkatesh, Kiran B. Raja, Raghavendra Ramachandra, Christoph Busch
Abstract Despite their high biometric performance, finger-vein recognition systems are vulnerable to presentation attacks (also known as spoofing attacks). In this paper, we present a new and robust approach for detecting presentation attacks on finger-vein biometric systems that exploits the 3D shape (normal-map) and material properties (diffuse-map) of the finger. Observing that the normal-map and diffuse-map exhibit enhanced textural differences in comparison with the original finger-vein image, especially under varying illumination intensity, we propose to employ textural feature-descriptors on each of them independently. A separating hyperplane is then computed with a Support Vector Machine (SVM) classifier for each of the two feature sets. Given the scores from the normal-map and diffuse-map classifiers, we propose sum-rule based score-level fusion to make detection of such presentation attacks more robust. To this end, we construct a new database of finger-vein images acquired using a custom capture device with three inbuilt illuminations and validate the applicability of the proposed approach. The newly collected database consists of 936 images, corresponding to 468 bona fide images and 468 artefact images. We establish the superiority of the proposed approach by benchmarking it against classical textural feature-descriptors applied directly to finger-vein images: the proposed approach outperforms these traditional methods, achieving an Attack Presentation Classification Error Rate (APCER) and a Bona fide Presentation Classification Error Rate (BPCER) of 0%.
Tasks
Published 2019-12-03
URL https://arxiv.org/abs/1912.01408v1
PDF https://arxiv.org/pdf/1912.01408v1.pdf
PWC https://paperswithcode.com/paper/detecting-finger-vein-presentation-attacks
Repo
Framework
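The sum-rule fusion step is the simplest of the standard score-level fusion rules: average the two classifiers' scores and threshold once. A sketch with invented scores (in the paper these would come from the normal-map and diffuse-map SVMs):

```python
import numpy as np

# Sum-rule score-level fusion of two classifiers. Scores are invented;
# higher means "more likely bona fide", and 0.5 is an arbitrary threshold.

def sum_rule_fusion(scores_normal, scores_diffuse, threshold=0.5):
    fused = (np.asarray(scores_normal) + np.asarray(scores_diffuse)) / 2.0
    return fused, fused >= threshold  # True = accept as bona fide

s_normal  = [0.9, 0.4, 0.2, 0.8]   # classifier on normal-maps
s_diffuse = [0.8, 0.7, 0.1, 0.3]   # classifier on diffuse-maps
fused, decisions = sum_rule_fusion(s_normal, s_diffuse)
```

Note how the second sample, borderline for the normal-map classifier alone, is accepted once the diffuse-map evidence is averaged in; that robustness to a single weak modality is the motivation for fusing.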

Reducing the Computational Complexity of Pseudoinverse for the Incremental Broad Learning System on Added Inputs

Title Reducing the Computational Complexity of Pseudoinverse for the Incremental Broad Learning System on Added Inputs
Authors Hufei Zhu, Chenghao Wei
Abstract In this brief, we improve the Broad Learning System (BLS) [7] by reducing the computational complexity of the incremental learning for added inputs. We utilize the inverse of a sum of matrices in [8] to improve a step in the pseudoinverse of a row-partitioned matrix. Accordingly, we propose two fast algorithms for the cases of q > k and q < k, respectively, where q denotes the number of additional training samples and k the total number of nodes. Specifically, when q > k, the proposed algorithm computes only a k * k matrix inverse, instead of the q * q matrix inverse in the existing algorithm, and can therefore reduce the complexity dramatically. Our simulations, which follow those for Table V in [7], show that the proposed algorithm and the existing algorithm achieve the same testing accuracy, while the speedups in BLS training time of the proposed algorithm over the existing algorithm range from 1.24 to 1.30.
Tasks
Published 2019-10-17
URL https://arxiv.org/abs/1910.07755v1
PDF https://arxiv.org/pdf/1910.07755v1.pdf
PWC https://paperswithcode.com/paper/reducing-the-computational-complexity-of
Repo
Framework
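The underlying linear-algebra trick is worth seeing concretely. For a full-column-rank matrix, the pseudoinverse is (AᵀA)⁻¹Aᵀ, and when q rows are appended the cached (AᵀA)⁻¹ can be updated with either a q × q inverse (Woodbury identity, cheap when q < k) or a fresh k × k inverse (cheap when q ≥ k). A sketch under those assumptions, with illustrative dimensions; this follows the general identity, not necessarily the paper's exact algorithm:

```python
import numpy as np

# Updating the pseudoinverse of a row-partitioned matrix when q new rows
# are appended, choosing between a q x q and a k x k inverse. Assumes the
# stacked matrix has full column rank; dimensions are illustrative.

def pinv_add_rows(A, G, B):
    """G = inv(A.T @ A), cached from the previous step; B holds q new rows."""
    q, k = B.shape
    if q < k:
        # Woodbury identity: only a q x q inverse is required.
        BG = B @ G
        G_new = G - BG.T @ np.linalg.inv(np.eye(q) + BG @ B.T) @ BG
    else:
        # q >= k: a direct k x k inverse is the cheaper option.
        G_new = np.linalg.inv(A.T @ A + B.T @ B)
    S = np.vstack([A, B])
    return G_new @ S.T, G_new  # pseudoinverse of the stacked matrix, new cache

rng = np.random.default_rng(0)
A = rng.standard_normal((10, 4))
B = rng.standard_normal((2, 4))          # q = 2 < k = 4: Woodbury branch
G = np.linalg.inv(A.T @ A)
P, G_new = pinv_add_rows(A, G, B)
```

The result agrees with recomputing `np.linalg.pinv` on the stacked matrix from scratch, while never inverting anything larger than min(q, k) × min(q, k) beyond the cached state.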

Leveraging Acoustic Cues and Paralinguistic Embeddings to Detect Expression from Voice

Title Leveraging Acoustic Cues and Paralinguistic Embeddings to Detect Expression from Voice
Authors Vikramjit Mitra, Sue Booker, Erik Marchi, David Scott Farrar, Ute Dorothea Peitz, Bridget Cheng, Ermine Teves, Anuj Mehta, Devang Naik
Abstract Millions of people reach out to digital assistants such as Siri every day, asking for information, making phone calls, seeking assistance, and much more. The expectation is that such assistants should understand the intent of the user’s query. Detecting the intent of a query from a short, isolated utterance is a difficult task. Intent cannot always be obtained from speech-recognized transcriptions. A transcription-driven approach can interpret what has been said but fails to acknowledge how it has been said, and as a consequence may ignore the expression present in the voice. Our work investigates whether a system can reliably detect vocal expression in queries using acoustic and paralinguistic embeddings. Results show that the proposed method offers a relative equal error rate (EER) decrease of 60% compared to a bag-of-words based system, corroborating that expression is significantly represented by vocal attributes rather than being purely lexical. The addition of emotion embeddings helped to reduce the EER by 30% relative to the acoustic embeddings, demonstrating the relevance of emotion in expressive voice.
Tasks
Published 2019-06-28
URL https://arxiv.org/abs/1907.00112v1
PDF https://arxiv.org/pdf/1907.00112v1.pdf
PWC https://paperswithcode.com/paper/leveraging-acoustic-cues-and-paralinguistic
Repo
Framework
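Since the paper reports relative EER reductions, it may help to recall how an equal error rate is computed from raw detection scores: sweep a threshold and find the point where the false accept and false reject rates cross. A small sketch with invented scores:

```python
import numpy as np

# Computing an equal error rate (EER) from detection scores by scanning
# thresholds for the point where FAR and FRR are closest. Scores invented.

def eer(genuine_scores, impostor_scores):
    thresholds = np.sort(np.concatenate([genuine_scores, impostor_scores]))
    best = (1.0, 0.0)  # (|FAR - FRR|, candidate EER)
    for t in thresholds:
        far = np.mean(impostor_scores >= t)   # false accept rate
        frr = np.mean(genuine_scores < t)     # false reject rate
        if abs(far - frr) < best[0]:
            best = (abs(far - frr), (far + frr) / 2)
    return best[1]

genuine = np.array([0.9, 0.8, 0.75, 0.6, 0.55])
impostor = np.array([0.5, 0.45, 0.4, 0.3, 0.65])
rate = eer(genuine, impostor)
```

With one impostor score (0.65) overlapping the genuine range, the crossing point lands at 20% EER for this toy data. Production implementations typically interpolate between thresholds rather than scan.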

A User-Centered Concept Mining System for Query and Document Understanding at Tencent

Title A User-Centered Concept Mining System for Query and Document Understanding at Tencent
Authors Bang Liu, Weidong Guo, Di Niu, Chaoyue Wang, Shunnan Xu, Jinghong Lin, Kunfeng Lai, Yu Xu
Abstract Concepts embody the knowledge of the world and facilitate the cognitive processes of human beings. Mining concepts from web documents and constructing the corresponding taxonomy are core research problems in text understanding and support many downstream tasks such as query analysis, knowledge base construction, recommendation, and search. However, we argue that most prior studies extract formal and overly general concepts from Wikipedia or static web pages, which do not represent the user perspective. In this paper, we describe our experience of implementing and deploying ConcepT in Tencent QQ Browser. It discovers user-centered concepts at the right granularity, conforming to user interests, by mining a large volume of user queries and interactive search click logs. The extracted concepts have the proper granularity, are consistent with user language styles, and are dynamically updated. We further present our techniques to tag documents with user-centered concepts and to construct a topic-concept-instance taxonomy, which has helped to improve search as well as news feeds recommendation in Tencent QQ Browser. We performed extensive offline evaluation to demonstrate that our approach extracts concepts of higher quality compared to several other existing methods. Our system has been deployed in Tencent QQ Browser, and results from online A/B testing involving a large number of real users suggest that the Impression Efficiency of feeds users increased by 6.01% after incorporating the user-centered concepts into the recommendation framework.
Tasks
Published 2019-05-21
URL https://arxiv.org/abs/1905.08487v1
PDF https://arxiv.org/pdf/1905.08487v1.pdf
PWC https://paperswithcode.com/paper/a-user-centered-concept-mining-system-for
Repo
Framework

The Computational Structure of Unintentional Meaning

Title The Computational Structure of Unintentional Meaning
Authors Mark K. Ho, Joanna Korman, Thomas L. Griffiths
Abstract Speech-acts can have literal meaning as well as pragmatic meaning, but these both involve consequences typically intended by a speaker. Speech-acts can also have unintentional meaning, in which what is conveyed goes above and beyond what was intended. Here, we present a Bayesian analysis of how, to a listener, the meaning of an utterance can significantly differ from a speaker’s intended meaning. Our model emphasizes how comprehending the intentional and unintentional meaning of speech-acts requires listeners to engage in sophisticated model-based perspective-taking and reasoning about the history of the state of the world, each other’s actions, and each other’s observations. To test our model, we have human participants make judgments about vignettes where speakers make utterances that could be interpreted as intentional insults or unintentional faux pas. In elucidating the mechanics of speech-acts with unintentional meanings, our account provides insight into how communication both functions and malfunctions.
Tasks
Published 2019-06-03
URL https://arxiv.org/abs/1906.01983v1
PDF https://arxiv.org/pdf/1906.01983v1.pdf
PWC https://paperswithcode.com/paper/the-computational-structure-of-unintentional
Repo
Framework

Unsupervised Multi-stream Highlight detection for the Game “Honor of Kings”

Title Unsupervised Multi-stream Highlight detection for the Game “Honor of Kings”
Authors Li Wang, Zixun Sun, Wentao Yao, Hui Zhan, Chengwei Zhu
Abstract With the increasing popularity of e-sports live streaming, Highlight Flashback has become a critical functionality of live platforms, aggregating the overall exciting fighting scenes into a few seconds. In this paper, we introduce a novel training strategy, requiring no additional annotation, to automatically generate highlights for live game videos. Observing that existing manually edited clips contain more highlights than long live videos, we impose pair-wise ranking constraints across clips drawn from edited and long live videos. A multi-stream framework is also proposed to fuse spatial, temporal, and audio features extracted from the videos. To evaluate our method, we test on long live game videos with an average length of about 15 minutes. Extensive experimental results demonstrate satisfying performance on highlight generation and the effectiveness of fusing the three streams.
Tasks
Published 2019-10-14
URL https://arxiv.org/abs/1910.06189v2
PDF https://arxiv.org/pdf/1910.06189v2.pdf
PWC https://paperswithcode.com/paper/unsupervised-multi-stream-highlight-detection
Repo
Framework
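The pair-wise ranking constraint described above amounts to a margin ranking (hinge) loss between scores of edited-clip samples and raw-video samples. A sketch with invented scores standing in for a scoring network's outputs:

```python
import numpy as np

# Margin ranking loss over all (edited, raw) pairs: clips from manually
# edited highlight reels should outscore clips from raw long videos by at
# least `margin`. Scores are invented stand-ins for network outputs.

def ranking_loss(edited_scores, raw_scores, margin=1.0):
    e = np.asarray(edited_scores)[:, None]   # shape (n_edited, 1)
    r = np.asarray(raw_scores)[None, :]      # shape (1, n_raw)
    return np.mean(np.maximum(0.0, margin - (e - r)))  # hinge on every pair

loss_good = ranking_loss([3.0, 2.5], [0.5, 0.2])   # well separated: zero loss
loss_bad  = ranking_loss([0.5, 0.2], [3.0, 2.5])   # inverted ranking: penalized
```

During training, minimizing this loss drives the scorer to rank edited-reel content above raw footage without ever needing frame-level highlight labels, which is what makes the strategy annotation-free.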

Universal One-Dimensional Cellular Automata Derived for Turing Machines and its Dynamical Behaviour

Title Universal One-Dimensional Cellular Automata Derived for Turing Machines and its Dynamical Behaviour
Authors Sergio J. Martinez, Ivan M. Mendoza, Genaro J. Martinez, Shigeru Ninagawa
Abstract Universality in cellular automata theory is a central problem, studied and developed from the field’s origins by John von Neumann. In this paper, we present an algorithm by which any Turing machine can be converted to a one-dimensional cellular automaton in 2-linear time, and we display its spatial dynamics. Three particular Turing machines are converted into three universal one-dimensional cellular automata: binary sum, rule 110, and a universal reversible Turing machine.
Tasks
Published 2019-07-06
URL https://arxiv.org/abs/1907.04211v1
PDF https://arxiv.org/pdf/1907.04211v1.pdf
PWC https://paperswithcode.com/paper/universal-one-dimensional-cellular-automata
Repo
Framework
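One of the three targets above, elementary rule 110, is itself famously Turing complete and simple enough to simulate in a few lines. A sketch on a small ring of cells (the width and initial condition are arbitrary choices for illustration):

```python
# Elementary cellular automaton rule 110 on a ring. The rule number is the
# 8-bit truth table, indexed by the 3-cell neighbourhood (left, self, right).

RULE = 110

def step(cells):
    n = len(cells)
    return [
        (RULE >> ((cells[(i - 1) % n] << 2) | (cells[i] << 1) | cells[(i + 1) % n])) & 1
        for i in range(n)
    ]

cells = [0] * 15 + [1]   # single live cell on a ring of 16
for _ in range(8):
    cells = step(cells)  # pattern grows leftward into rule 110's triangles
```

From a single live cell, rule 110 produces its characteristic left-growing triangular patterns, the structures the paper visualizes as spatial dynamics of the converted machines.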

Are You Looking? Grounding to Multiple Modalities in Vision-and-Language Navigation

Title Are You Looking? Grounding to Multiple Modalities in Vision-and-Language Navigation
Authors Ronghang Hu, Daniel Fried, Anna Rohrbach, Dan Klein, Trevor Darrell, Kate Saenko
Abstract Vision-and-Language Navigation (VLN) requires grounding instructions, such as “turn right and stop at the door”, to routes in a visual environment. The actual grounding can connect language to the environment through multiple modalities, e.g. “stop at the door” might ground into visual objects, while “turn right” might rely only on the geometric structure of a route. We investigate where the natural language empirically grounds under two recent state-of-the-art VLN models. Surprisingly, we discover that visual features may actually hurt these models: models which only use route structure, ablating visual features, outperform their visual counterparts in unseen new environments on the benchmark Room-to-Room dataset. To better use all the available modalities, we propose to decompose the grounding procedure into a set of expert models with access to different modalities (including object detections) and ensemble them at prediction time, improving the performance of state-of-the-art models on the VLN task.
Tasks
Published 2019-06-02
URL https://arxiv.org/abs/1906.00347v3
PDF https://arxiv.org/pdf/1906.00347v3.pdf
PWC https://paperswithcode.com/paper/190600347
Repo
Framework
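The prediction-time ensembling of modality experts can be sketched compactly: each expert outputs a distribution over actions, and the ensemble averages log-probabilities before renormalizing. The expert names and numbers below are invented, and the geometric-mean combination rule is one common choice, not necessarily the paper's exact scheme:

```python
import numpy as np

# Prediction-time ensemble of modality "experts": average log-probabilities
# (a geometric mean), then renormalize. All numbers are illustrative.

def ensemble(expert_probs):
    logp = np.mean(np.log(np.asarray(expert_probs)), axis=0)
    p = np.exp(logp)
    return p / p.sum()

visual_expert = [0.7, 0.2, 0.1]   # e.g. an expert grounded in object detections
route_expert  = [0.3, 0.6, 0.1]   # an expert using route structure only
p = ensemble([visual_expert, route_expert])
action = int(np.argmax(p))
```

When the experts disagree, as here, the ensemble favors the action with reasonable support from both, rather than letting a single modality dominate.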

Explanatory Masks for Neural Network Interpretability

Title Explanatory Masks for Neural Network Interpretability
Authors Lawrence Phillips, Garrett Goh, Nathan Hodas
Abstract Neural network interpretability is a vital component for applications across a wide variety of domains. In such cases it is often useful to analyze a network which has already been trained for its specific purpose. In this work, we develop a method to produce explanation masks for pre-trained networks. The mask localizes the most important aspects of each input for prediction of the original network. Masks are created by a secondary network whose goal is to create as small an explanation as possible while still preserving the predictive accuracy of the original network. We demonstrate the applicability of our method for image classification with CNNs, sentiment analysis with RNNs, and chemical property prediction with mixed CNN/RNN architectures.
Tasks Image Classification, Sentiment Analysis
Published 2019-11-15
URL https://arxiv.org/abs/1911.06876v1
PDF https://arxiv.org/pdf/1911.06876v1.pdf
PWC https://paperswithcode.com/paper/explanatory-masks-for-neural-network
Repo
Framework
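The objective described above, the smallest mask that still preserves the original network's prediction, can be illustrated without any trained network at all. In the sketch below, a linear scorer stands in for the pre-trained model and a greedy search stands in for the secondary mask network; both substitutions are for illustration only:

```python
import numpy as np

# Toy version of the masking objective: keep as few input features as
# possible while preserving the model's decision. A linear scorer replaces
# the trained network, and greedy selection replaces the mask network.

def smallest_preserving_mask(weights, x):
    """Greedily add features by |contribution| until the decision is preserved."""
    full_pred = (weights @ x) > 0
    order = np.argsort(-np.abs(weights * x))  # most influential features first
    mask = np.zeros_like(x)
    for i in order:
        mask[i] = 1.0
        if ((weights @ (x * mask)) > 0) == full_pred:
            return mask
    return mask

w = np.array([2.0, -0.1, 0.3, -1.5])
x = np.array([1.0, 1.0, 1.0, 1.0])
m = smallest_preserving_mask(w, x)   # keeps only the dominant feature
```

Here a single feature suffices to reproduce the decision, so the explanation mask localizes the prediction to it, the same size-versus-fidelity trade-off the secondary network optimizes in the paper.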