October 19, 2019

Paper Group ANR 275

EmoRL: Continuous Acoustic Emotion Classification using Deep Reinforcement Learning

Title EmoRL: Continuous Acoustic Emotion Classification using Deep Reinforcement Learning
Authors Egor Lakomkin, Mohammad Ali Zamani, Cornelius Weber, Sven Magg, Stefan Wermter
Abstract Acoustically expressed emotions can make communication with a robot more efficient. Detecting emotions like anger could provide a clue for the robot indicating unsafe/undesired situations. Recently, several deep neural network-based models have been proposed which establish new state-of-the-art results in affective state evaluation. These models typically start processing at the end of each utterance, which not only requires a mechanism to detect the end of an utterance but also makes it difficult to use them in a real-time communication scenario, e.g. human-robot interaction. We propose the EmoRL model that triggers an emotion classification as soon as it gains enough confidence while listening to a person speaking. As a result, we minimize the need for segmenting the audio signal for classification and achieve lower latency as the audio signal is processed incrementally. The method is competitive with the accuracy of a strong baseline model, while allowing much earlier prediction.
Tasks Emotion Classification
Published 2018-04-03
URL http://arxiv.org/abs/1804.04053v1
PDF http://arxiv.org/pdf/1804.04053v1.pdf
PWC https://paperswithcode.com/paper/emorl-continuous-acoustic-emotion
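The core idea of emitting a prediction as soon as confidence is high enough, instead of waiting for the end of the utterance, can be sketched as follows. This is a minimal illustration and not the EmoRL model. A fixed confidence threshold stands in for the learned reinforcement-learning policy that decides when to trigger. `early_trigger` and `toy_classify` are hypothetical names, and the toy classifier is made up.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple


def early_trigger(
    chunks: Iterable[float],
    classify: Callable[[List[float]], Dict[str, float]],
    threshold: float = 0.75,
) -> Optional[Tuple[str, int, float]]:
    """Classify an audio stream incrementally and stop at the first confident prediction.

    Returns (label, chunks_consumed, confidence). If the threshold is never reached,
    the last prediction is returned. An empty stream returns None.
    """
    seen: List[float] = []
    last: Optional[Tuple[str, int, float]] = None
    for i, chunk in enumerate(chunks, start=1):
        seen.append(chunk)
        probs = classify(seen)               # class -> probability for the audio heard so far
        label = max(probs, key=probs.get)
        last = (label, i, probs[label])
        if probs[label] >= threshold:        # confident enough: trigger early
            return last
    return last


# Hypothetical toy classifier: confidence in "angry" grows with each chunk heard.
def toy_classify(seen: List[float]) -> Dict[str, float]:
    p = min(1.0, 0.5 + 0.1 * len(seen))
    return {"angry": p, "neutral": 1.0 - p}


result = early_trigger([0.2] * 10, toy_classify)
print(result[:2])  # ('angry', 3): the decision is made after 3 of the 10 chunks
```

Latency here is governed by the threshold. EmoRL instead learns this trade-off between waiting for more audio and answering early.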

SNAP: A semismooth Newton algorithm for pathwise optimization with optimal local convergence rate and oracle properties

Title SNAP: A semismooth Newton algorithm for pathwise optimization with optimal local convergence rate and oracle properties
Authors Jian Huang, Yuling Jiao, Xiliang Lu, Yueyong Shi, Qinglong Yang
Abstract We propose a semismooth Newton algorithm for pathwise optimization (SNAP) for the LASSO and Enet in sparse, high-dimensional linear regression. SNAP is derived from a suitable formulation of the KKT conditions based on Newton derivatives. It solves the semismooth KKT equations efficiently by actively and continuously seeking the support of the regression coefficients along the solution path with warm start. At each knot in the path, SNAP converges locally superlinearly for the Enet criterion and achieves an optimal local convergence rate for the LASSO criterion, i.e., SNAP converges in one step at the cost of two matrix-vector multiplications per iteration. Under certain regularity conditions on the design matrix and the minimum magnitude of the nonzero elements of the target regression coefficients, we show that SNAP hits a solution with the same signs as the regression coefficients and achieves a sharp estimation error bound in finitely many steps with high probability. The per-iteration computational complexity of SNAP is shown to be the same as that of LARS and coordinate descent algorithms. Simulation studies and real data analysis support our theoretical results and demonstrate that SNAP is faster and more accurate than LARS and coordinate descent algorithms.
Published 2018-10-09
URL http://arxiv.org/abs/1810.03814v1
PDF http://arxiv.org/pdf/1810.03814v1.pdf
PWC https://paperswithcode.com/paper/snap-a-semismooth-newton-algorithm-for
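A minimal sketch of the general idea follows. It assumes the LASSO objective $\frac{1}{2n}\|y - X\beta\|_2^2 + \lambda\|\beta\|_1$. A primal-dual active-set (semismooth Newton) iteration takes the dual variable $d = X^\top(y - X\beta)/n$ and estimates the support and signs from $\beta + d$. It then solves the KKT equations restricted to that support and warm-starts each knot from the previous one. The function name, stopping rule, and path grid are illustrative assumptions. This is not the paper's exact algorithm, and it omits the Enet variant.

```python
import numpy as np


def pdas_lasso_path(X, y, lambdas, max_iter=100):
    """Hypothetical sketch of a primal-dual active-set (semismooth Newton) solver for
        min_b ||y - X b||^2 / (2n) + lam * ||b||_1,
    warm-started along a decreasing sequence of lambdas."""
    n, p = X.shape
    beta = np.zeros(p)
    d = X.T @ y / n                          # dual: d = X^T (y - X beta) / n at beta = 0
    path = []
    for lam in lambdas:
        prev = None
        for _ in range(max_iter):
            z = beta + d
            active = np.abs(z) > lam         # estimated support
            signs = np.sign(z[active])       # estimated signs on the support
            key = (active.tobytes(), signs.tobytes())
            if key == prev:                  # support and signs unchanged: KKT point reached
                break
            prev = key
            # Newton step: solve the KKT equations restricted to the active set.
            beta = np.zeros(p)
            if active.any():
                XA = X[:, active]
                beta[active] = np.linalg.solve(XA.T @ XA / n, XA.T @ y / n - lam * signs)
            d = X.T @ (y - X @ beta) / n
        path.append(beta.copy())             # warm start: beta and d carry over to next lam
    return np.array(path)


rng = np.random.default_rng(1)
n, p = 200, 30
X = rng.standard_normal((n, p))
b_true = np.zeros(p)
b_true[:3] = [2.0, -1.5, 1.0]
y = X @ b_true + 0.5 * rng.standard_normal(n)
lam_max = np.max(np.abs(X.T @ y)) / n        # smallest lambda giving the all-zero solution
lambdas = lam_max * np.geomspace(1.0, 0.01, 40)
path = pdas_lasso_path(X, y, lambdas)
```

Stopping only when both the support and the signs repeat ensures that the returned point satisfies the LASSO KKT conditions, up to round-off. Without regularity conditions like those in the abstract, convergence within `max_iter` is not guaranteed.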

Connecting Gaze, Scene, and Attention: Generalized Attention Estimation via Joint Modeling of Gaze and Scene Saliency

Title Connecting Gaze, Scene, and Attention: Generalized Attention Estimation via Joint Modeling of Gaze and Scene Saliency
Authors Eunji Chong, Nataniel Ruiz, Yongxin Wang, Yun Zhang, Agata Rozga, James Rehg
Abstract This paper addresses the challenging problem of estimating the general visual attention of people in images. Our proposed method is designed to work across multiple naturalistic social scenarios and provides a full picture of the subject’s attention and gaze. In contrast, earlier works on gaze and attention estimation have focused on constrained problems in more specific contexts. In particular, our model explicitly represents the gaze direction and handles out-of-frame gaze targets. We leverage three different datasets using a multi-task learning approach. We evaluate our method on widely used benchmarks for single tasks such as gaze angle estimation and attention-within-an-image, as well as on the new challenging task of generalized visual attention prediction. In addition, we have created extended annotations for the MMDB and GazeFollow datasets, which we use in our experiments and will release publicly.
Tasks Multi-Task Learning
Published 2018-07-27
URL http://arxiv.org/abs/1807.10437v1
PDF http://arxiv.org/pdf/1807.10437v1.pdf
PWC https://paperswithcode.com/paper/connecting-gaze-scene-and-attention

Visual Semantic Re-ranker for Text Spotting

Title Visual Semantic Re-ranker for Text Spotting
Authors Ahmed Sabir, Francesc Moreno-Noguer, Lluís Padró
Abstract Many current state-of-the-art methods for text recognition are based on purely local information and ignore the semantic correlation between text and its surrounding visual context. In this paper, we propose a post-processing approach to improve the accuracy of text spotting by using the semantic relation between the text and the scene. We initially rely on an off-the-shelf deep neural network that provides a series of text hypotheses for each input image. These text hypotheses are then re-ranked using their semantic relatedness to the object in the image. As a result of this combination, the performance of the original network is boosted at a very low computational cost. The proposed framework can be used as a drop-in complement for any text-spotting algorithm that outputs a ranking of word hypotheses. We validate our approach on the ICDAR’17 shared task dataset.
Tasks Text Spotting
Published 2018-10-23
URL http://arxiv.org/abs/1810.09776v2
PDF http://arxiv.org/pdf/1810.09776v2.pdf
PWC https://paperswithcode.com/paper/visual-semantic-re-ranker-for-text-spotting
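The re-ranking step can be sketched as a weighted mix of the recognizer's probability and the semantic relatedness between each word hypothesis and a detected object label. This is a minimal sketch under stated assumptions. Relatedness is taken to be cosine similarity, and the linear mixing weight `alpha`, the function name, and the toy 2-d embeddings are hypothetical. The paper's actual relatedness measure and combination rule may differ.

```python
import numpy as np


def rerank(hypotheses, object_label, embed, alpha=0.5):
    """Re-rank (word, recognizer_prob) pairs by mixing the recognizer probability with the
    cosine similarity between the word and a detected object label (hypothetical rule)."""
    obj = embed[object_label]

    def relatedness(word):
        v = embed[word]
        return float(v @ obj / (np.linalg.norm(v) * np.linalg.norm(obj)))

    scored = [(w, (1 - alpha) * p + alpha * relatedness(w)) for w, p in hypotheses]
    return sorted(scored, key=lambda ws: ws[1], reverse=True)


# Hypothetical 2-d "embeddings", for illustration only.
embed = {
    "dinner": np.array([1.0, 0.1]),
    "diner": np.array([0.2, 1.0]),
    "plate": np.array([1.0, 0.0]),
}
ranked = rerank([("diner", 0.45), ("dinner", 0.40)], "plate", embed)
print([w for w, _ in ranked])  # ['dinner', 'diner']
```

The recognizer alone prefers "diner". The visual context ("plate") promotes "dinner". Because only the ranked word list is needed, the sketch is independent of which text spotter produced the hypotheses.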

Representation Learning of Pedestrian Trajectories Using Actor-Critic Sequence-to-Sequence Autoencoder

Title Representation Learning of Pedestrian Trajectories Using Actor-Critic Sequence-to-Sequence Autoencoder
Authors Ka-Ho Chow, Anish Hiranandani, Yifeng Zhang, S. -H. Gary Chan
Abstract Representation learning of pedestrian trajectories transforms variable-length timestamp-coordinate tuples of a trajectory into a fixed-length vector representation that summarizes spatiotemporal characteristics. It is a crucial technique to connect feature-based data mining with trajectory data. Trajectory representation is a challenging problem, because both environmental constraints (e.g., wall partitions) and temporal user dynamics should be meticulously considered and accounted for. Furthermore, traditional sequence-to-sequence autoencoders using maximum log-likelihood often require a dataset covering all the possible spatiotemporal characteristics to perform well. This is infeasible or impractical in reality. We propose TREP, a practical pedestrian trajectory representation learning algorithm which captures the environmental constraints and the pedestrian dynamics without the need for any training dataset. By formulating a sequence-to-sequence autoencoder with a spatial-aware objective function under the paradigm of actor-critic reinforcement learning, TREP intelligently encodes spatiotemporal characteristics of trajectories with the capability of handling diverse trajectory patterns. Extensive experiments on both synthetic and real datasets validate the high fidelity of TREP to represent trajectories.
Tasks Representation Learning
Published 2018-11-20
URL http://arxiv.org/abs/1811.08069v1
PDF http://arxiv.org/pdf/1811.08069v1.pdf
PWC https://paperswithcode.com/paper/representation-learning-of-pedestrian

Inductive Visual Localisation: Factorised Training for Superior Generalisation

Title Inductive Visual Localisation: Factorised Training for Superior Generalisation
Authors Ankush Gupta, Andrea Vedaldi, Andrew Zisserman
Abstract End-to-end trained Recurrent Neural Networks (RNNs) have been successfully applied to numerous problems that require processing sequences, such as image captioning, machine translation, and text recognition. However, RNNs often struggle to generalise to sequences longer than the ones encountered during training. In this work, we propose to optimise neural networks explicitly for induction. The idea is to first decompose the problem into a sequence of inductive steps and then to explicitly train the RNN to reproduce such steps. Generalisation is achieved as the RNN is not allowed to learn an arbitrary internal state; instead, it is tasked with mimicking the evolution of a valid state. In particular, the state is restricted to a spatial memory map that tracks parts of the input image which have been accounted for in previous steps. The RNN is trained for single inductive steps, where it produces updates to the memory in addition to the desired output. We evaluate our method on two different visual recognition problems involving visual sequences: (1) text spotting, i.e. joint localisation and reading of text in images containing multiple lines (or a block) of text, and (2) sequential counting of objects in aerial images. We show that inductive training of recurrent models enhances their generalisation ability on challenging image datasets.
Tasks Image Captioning, Machine Translation, Text Spotting
Published 2018-07-21
URL http://arxiv.org/abs/1807.08179v1
PDF http://arxiv.org/pdf/1807.08179v1.pdf
PWC https://paperswithcode.com/paper/inductive-visual-localisation-factorised

High-dimensional Index Volatility Models via Stein’s Identity

Title High-dimensional Index Volatility Models via Stein’s Identity
Authors Sen Na, Mladen Kolar
Abstract We study estimation of the parametric components of single and multiple index volatility models. Using the first- and second-order Stein’s identities, we develop methods for estimating the variance index in a high-dimensional setting that require only a finite moment condition, which allows for heavy-tailed data. Our approach complements the existing literature in a low-dimensional setting, while relaxing the conditions on estimation, and provides a novel approach in a high-dimensional setting. We prove that the statistical rate of convergence of our variance index estimators consists of a parametric rate and a nonparametric rate, where the latter appears from the estimation of the mean link function. However, under standard assumptions, the parametric rate dominates the rate of convergence and our results match the minimax optimal rate for the mean index estimation. Simulation results illustrate finite sample properties of our methodology and back our theoretical conclusions.
Published 2018-11-27
URL http://arxiv.org/abs/1811.10790v2
PDF http://arxiv.org/pdf/1811.10790v2.pdf
PWC https://paperswithcode.com/paper/high-dimensional-index-volatility-models-via
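The two Stein identities can be illustrated for standard Gaussian covariates. Assume $y = f(\langle\beta, x\rangle) + g(\langle\gamma, x\rangle)\,\varepsilon$ with $x \sim N(0, I_d)$ and $\varepsilon$ independent with zero mean and unit variance. Then the first-order identity gives $E[yx] = E[f'(\langle\beta,x\rangle)]\,\beta$. With residual $r = y - f(\langle\beta,x\rangle)$, the second-order identity gives $E[r^2(xx^\top - I)] = E[(g^2)''(\langle\gamma,x\rangle)]\,\gamma\gamma^\top$. The sketch below is a low-dimensional toy using plain sample moments. It treats the mean link $f$ as known, assumes $\|\beta\| = 1$ and $E[f'] > 0$, and uses a hypothetical function name. It does not reproduce the paper's high-dimensional, heavy-tailed estimators or its nonparametric estimation of the link.

```python
import numpy as np


def stein_index_estimates(X, y, mean_link):
    """Toy moment estimators for y = f(<beta, x>) + g(<gamma, x>) * eps, x ~ N(0, I_d).

    beta_hat:  normalized E[y x]                          (first-order Stein identity)
    gamma_hat: leading eigenvector of E[r^2 (x x^T - I)]  (second-order Stein identity),
               with residuals r = y - f(<beta_hat, x>) and f = mean_link assumed known.
    """
    n, d = X.shape
    beta_hat = X.T @ y / n
    beta_hat /= np.linalg.norm(beta_hat)                 # index identified only up to scale
    r2 = (y - mean_link(X @ beta_hat)) ** 2
    M = np.einsum("n,ni,nj->ij", r2, X, X) / n - r2.mean() * np.eye(d)
    eigvals, eigvecs = np.linalg.eigh(M)
    gamma_hat = eigvecs[:, np.argmax(np.abs(eigvals))]   # sign is arbitrary
    return beta_hat, gamma_hat


rng = np.random.default_rng(0)
n, d = 50_000, 10
beta = np.zeros(d)
beta[:2] = [0.6, 0.8]
gamma = np.zeros(d)
gamma[2:4] = [0.8, -0.6]
X = rng.standard_normal((n, d))
# g(t) = sqrt(1 + t^2), so (g^2)'' = 2 and E[r^2 (x x^T - I)] = 2 gamma gamma^T.
y = np.sin(X @ beta) + np.sqrt(1.0 + (X @ gamma) ** 2) * rng.standard_normal(n)
beta_hat, gamma_hat = stein_index_estimates(X, y, np.sin)
print(abs(beta_hat @ beta), abs(gamma_hat @ gamma))  # both close to 1
```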

Deep Neural Networks for Query Expansion using Word Embeddings

Title Deep Neural Networks for Query Expansion using Word Embeddings
Authors Ayyoob Imani, Amir Vakili, Ali Montazer, Azadeh Shakery
Abstract Query expansion is a method for alleviating the vocabulary mismatch problem present in information retrieval tasks. Previous works have shown that terms selected for query expansion by traditional methods such as pseudo-relevance feedback are not always helpful to the retrieval process. In this paper, we show that this is also true for more recently proposed embedding-based query expansion methods. We then introduce an artificial neural network classifier to predict the usefulness of query expansion terms. This classifier uses term word embeddings as inputs. Experiments on four TREC newswire and web collections show that using terms selected by the classifier for expansion significantly improves retrieval performance compared to competitive baselines. The results are also shown to be more robust than the baselines.
Tasks Information Retrieval, Word Embeddings
Published 2018-11-08
URL http://arxiv.org/abs/1811.03514v1
PDF http://arxiv.org/pdf/1811.03514v1.pdf
PWC https://paperswithcode.com/paper/deep-neural-networks-for-query-expansion
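The filtering step, which keeps only the expansion terms a classifier predicts to be useful, can be sketched as follows. This is a minimal sketch under stated assumptions. It uses a logistic-regression scorer over features built from the candidate term's embedding and the query centroid. The paper uses a neural network classifier whose architecture and exact inputs may differ, and the feature design, weights, function name, and toy 2-d embeddings below are hypothetical.

```python
import numpy as np


def select_expansion_terms(query_vecs, candidates, w, b, threshold=0.5):
    """Keep candidate terms whose predicted usefulness exceeds `threshold`.

    query_vecs: (k, dim) embeddings of the query terms.
    candidates: dict mapping term -> (dim,) embedding.
    w, b: weights (2*dim,) and bias of a hypothetical pre-trained logistic classifier.
    """
    centroid = np.mean(query_vecs, axis=0)
    selected = []
    for term, vec in candidates.items():
        features = np.concatenate([vec, vec * centroid])   # term embedding + query interaction
        prob = 1.0 / (1.0 + np.exp(-(features @ w + b)))
        if prob > threshold:
            selected.append((term, float(prob)))
    return sorted(selected, key=lambda tp: tp[1], reverse=True)


query_vecs = np.array([[1.0, 0.0], [0.8, 0.2]])
candidates = {"economy": np.array([0.9, 0.1]), "banana": np.array([0.0, 1.0])}
w = np.array([0.0, 0.0, 5.0, 5.0])   # made-up weights, for illustration only
selected = select_expansion_terms(query_vecs, candidates, w, b=-2.0)
print([t for t, _ in selected])  # ['economy']
```

The selected terms would then be appended to the original query. Candidates that the classifier scores as unhelpful are dropped rather than down-weighted.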

Learning to Collaborate for User-Controlled Privacy

Title Learning to Collaborate for User-Controlled Privacy
Authors Martin Bertran, Natalia Martinez, Afroditi Papadaki, Qiang Qiu, Miguel Rodrigues, Guillermo Sapiro
Abstract It is becoming increasingly clear that users should own and control their data. Utility providers are also becoming more interested in guaranteeing data privacy. As such, users and utility providers should collaborate in data privacy, a paradigm that has not yet been developed in the privacy research community. We introduce this concept and present explicit architectures where the user controls what characteristics of the data she/he wants to share and what she/he wants to keep private. This is achieved by collaboratively learning a sanitization function, either deterministic or stochastic, that retains valuable information for the utility tasks while eliminating the information needed for the privacy ones. As illustrative examples, we implement them using a plug-and-play approach, where no algorithm is changed at the system provider end, and an adversarial approach, where minor re-training of the privacy inferring engine is allowed. In both cases the learned sanitization function keeps the data in the original domain, thereby allowing the system to use the same algorithms it was using before for both original and privatized data. We show how we can maintain utility while fully protecting private information if the user chooses to do so, even when the first is harder than the second, as in the case illustrated here of identity detection while hiding gender.
Published 2018-05-18
URL http://arxiv.org/abs/1805.07410v1
PDF http://arxiv.org/pdf/1805.07410v1.pdf
PWC https://paperswithcode.com/paper/learning-to-collaborate-for-user-controlled

Arcades: A deep model for adaptive decision making in voice controlled smart-home

Title Arcades: A deep model for adaptive decision making in voice controlled smart-home
Authors Alexis Brenon, François Portet, Michel Vacher
Abstract In a voice-controlled smart home, a controller must respond not only to the user’s requests but also according to the interaction context. This paper describes Arcades, a system which uses deep reinforcement learning to extract context from a graphical representation of the home automation system and to continuously adapt its behavior to the user’s. This system is robust to changes in the environment (sensor breakdown or addition) through its graphical representation (which scales well) and the reinforcement mechanism (which adapts well). Experiments on realistic data demonstrate that this method is promising for long-term context-aware control of a smart home.
Tasks Decision Making
Published 2018-07-05
URL http://arxiv.org/abs/1807.01970v1
PDF http://arxiv.org/pdf/1807.01970v1.pdf
PWC https://paperswithcode.com/paper/arcades-a-deep-model-for-adaptive-decision

Attentioned Convolutional LSTM InpaintingNetwork for Anomaly Detection in Videos

Title Attentioned Convolutional LSTM InpaintingNetwork for Anomaly Detection in Videos
Authors Itamar Ben-Ari, Ravid Shwartz-Ziv
Abstract We propose a semi-supervised model for detecting anomalies in videos, inspired by the Video Pixel Network (VPN) [van den Oord et al., 2016]. VPN is a probabilistic generative model based on a deep neural network that estimates the discrete joint distribution of raw pixels in video frames. Our model extends the Convolutional-LSTM video encoder part of the VPN with a novel convolution-based attention mechanism. We also modify the Pixel-CNN decoder part of the VPN to perform a frame inpainting task, where a partially masked version of the frame to predict is given as input. The frame reconstruction error is used as an anomaly indicator. We test our model on a modified version of the Moving MNIST dataset [Srivastava et al., 2015]. Our model is shown to be effective in detecting anomalies in videos. This approach could be a component in applications requiring visual common sense.
Tasks Anomaly Detection, Common Sense Reasoning
Published 2018-11-26
URL http://arxiv.org/abs/1811.10228v1
PDF http://arxiv.org/pdf/1811.10228v1.pdf
PWC https://paperswithcode.com/paper/attentioned-convolutional-lstm
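Using the reconstruction error on a masked region of each frame as an anomaly indicator can be sketched as follows. The learned attention ConvLSTM encoder and inpainting decoder are replaced by a hypothetical `copy_last_predictor` that fills the hole with the previous frame. The sketch therefore shows only the masking and scoring logic, not the model itself. The function names and the toy frames are assumptions for illustration.

```python
import numpy as np


def inpainting_anomaly_scores(frames, predict, mask):
    """Score frames 1..T-1 by the mean squared reconstruction error inside a masked region.

    frames: array (T, H, W); mask: bool array (H, W) of pixels hidden from the model.
    predict(history, masked_frame) -> full predicted frame of shape (H, W).
    """
    scores = []
    for t in range(1, len(frames)):
        masked = np.where(mask, 0.0, frames[t])      # hide the region to be inpainted
        pred = predict(frames[:t], masked)
        scores.append(float(np.mean((pred[mask] - frames[t][mask]) ** 2)))
    return np.array(scores)


# Hypothetical stand-in for the learned model: fill the hole with the previous frame.
def copy_last_predictor(mask):
    return lambda history, masked: np.where(mask, history[-1], masked)


frames = np.zeros((5, 8, 8))
frames[3, 2:5, 2:5] = 1.0                            # an unexpected blob appears in frame 3
mask = np.zeros((8, 8), dtype=bool)
mask[2:6, 2:6] = True
scores = inpainting_anomaly_scores(frames, copy_last_predictor(mask), mask)
print(scores)  # scores for frames 1..4: 0, 0, 0.5625, 0.5625
```

With this naive predictor, both the appearance and the disappearance of the blob raise the score. A learned model of normal dynamics is what would make only genuinely unexpected content stand out.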

Deep Structured Generative Models

Title Deep Structured Generative Models
Authors Kun Xu, Haoyu Liang, Jun Zhu, Hang Su, Bo Zhang
Abstract Deep generative models have shown promising results in generating realistic images, but it is still non-trivial to generate images with complicated structures. The main reason is that most of the current generative models fail to explore the structures in the images, including spatial layout and semantic relations between objects. To address this issue, we propose a novel deep structured generative model which boosts generative adversarial networks (GANs) with the aid of structure information. In particular, the layout or structure of the scene is encoded by a stochastic and-or graph (sAOG), in which the terminal nodes represent single objects and edges represent relations between objects. With the sAOG appropriately harnessed, our model can successfully capture the intrinsic structure in the scenes and generate images of complicated scenes accordingly. Furthermore, a detection network is introduced to infer scene structures from an image. Experimental results demonstrate the effectiveness of our proposed method both in modeling the intrinsic structures and in generating realistic images.
Published 2018-07-10
URL http://arxiv.org/abs/1807.03877v1
PDF http://arxiv.org/pdf/1807.03877v1.pdf
PWC https://paperswithcode.com/paper/deep-structured-generative-models

Model Agnostic Saliency for Weakly Supervised Lesion Detection from Breast DCE-MRI

Title Model Agnostic Saliency for Weakly Supervised Lesion Detection from Breast DCE-MRI
Authors Gabriel Maicas, Gerard Snaauw, Andrew P. Bradley, Ian Reid, Gustavo Carneiro
Abstract There is a heated debate on how to interpret the decisions provided by deep learning models (DLM), where the main approaches rely on the visualization of salient regions to interpret the DLM classification process. However, these approaches generally fail to satisfy three conditions for the problem of lesion detection from medical images: 1) for images with lesions, all salient regions should represent lesions; 2) for images containing no lesions, no salient region should be produced; and 3) lesions are generally small with relatively smooth borders. We propose a new model-agnostic paradigm to interpret DLM classification decisions, supported by a novel definition of saliency that incorporates the conditions above. Our model-agnostic 1-class saliency detector (MASD) is tested on weakly supervised breast lesion detection from DCE-MRI, achieving state-of-the-art detection accuracy when compared to current visualization methods.
Published 2018-07-20
URL http://arxiv.org/abs/1807.07784v3
PDF http://arxiv.org/pdf/1807.07784v3.pdf
PWC https://paperswithcode.com/paper/model-agnostic-saliency-for-weakly-supervised

Regularized Finite Dimensional Kernel Sobolev Discrepancy

Title Regularized Finite Dimensional Kernel Sobolev Discrepancy
Authors Youssef Mroueh
Abstract We show in this note that the Sobolev Discrepancy introduced in Mroueh et al. in the context of generative adversarial networks is actually the weighted negative Sobolev norm $\|\cdot\|_{\dot{H}^{-1}(\nu_q)}$, which is known to linearize the Wasserstein $W_2$ distance and plays a fundamental role in the dynamic formulation of optimal transport of Benamou and Brenier. Given a kernel with a finite-dimensional feature map, we show that the Sobolev discrepancy can be approximated from finite samples. Assuming this discrepancy is finite, the error depends on the approximation error in the function space induced by the finite-dimensional feature space kernel and on a statistical error due to the finite sample approximation.
Published 2018-05-16
URL http://arxiv.org/abs/1805.06441v1
PDF http://arxiv.org/pdf/1805.06441v1.pdf
PWC https://paperswithcode.com/paper/regularized-finite-dimensional-kernel-sobolev
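A worked version of the finite-dimensional approximation, under stated assumptions, goes as follows. Take a feature map $\Phi: \mathbb{R}^d \to \mathbb{R}^m$ and test functions $f(x) = \langle u, \Phi(x)\rangle$. The constraint $\int \|\nabla f\|^2 \, d\nu_q \le 1$ then becomes $u^\top D u \le 1$ with $D = \mathbb{E}_{\nu_q}[J_\Phi(x) J_\Phi(x)^\top]$. The objective $\mathbb{E}_P f - \mathbb{E}_Q f = u^\top \delta$, with $\delta$ the difference of mean embeddings, has the closed-form supremum $\sqrt{\delta^\top D^{-1}\delta}$. The sketch below estimates this quantity from samples. It assumes $\nu_q = Q$, random Fourier features $\Phi_k(x) = \cos(w_k^\top x + b_k)$, and a ridge term added to $D$. These choices and the function name are illustrative and may differ from the estimator analyzed in the note.

```python
import numpy as np


def kernel_sobolev_discrepancy(xp, xq, W, b, reg=1e-3):
    """Hypothetical finite-sample sketch with features phi_k(x) = cos(w_k . x + b_k).

    xp: (n_p, d) samples from P; xq: (n_q, d) samples from Q (also used as nu_q);
    W: (m, d) frequencies; b: (m,) phases; reg: ridge term added to D.
    """
    # Difference of mean embeddings, delta = E_P[phi] - E_Q[phi], shape (m,).
    delta = np.cos(xp @ W.T + b).mean(axis=0) - np.cos(xq @ W.T + b).mean(axis=0)
    # Jacobian of phi at each Q sample: d phi_k / dx = -sin(w_k . x + b_k) * w_k.
    s = -np.sin(xq @ W.T + b)                        # (n_q, m)
    J = s[:, :, None] * W[None, :, :]                # (n_q, m, d)
    D = np.einsum("nkd,nld->kl", J, J) / len(xq)     # E_Q[J J^T], shape (m, m)
    u = np.linalg.solve(D + reg * np.eye(len(b)), delta)
    return float(np.sqrt(max(delta @ u, 0.0)))       # guard against tiny negative round-off


rng = np.random.default_rng(0)
d, m = 2, 50
W = rng.standard_normal((m, d))
b = rng.uniform(0.0, 2.0 * np.pi, m)
xq = rng.standard_normal((500, d))
xp = rng.standard_normal((500, d)) + 1.0             # P is Q shifted by 1 in every coordinate
same = kernel_sobolev_discrepancy(xq, xq, W, b)
shifted = kernel_sobolev_discrepancy(xp, xq, W, b)
print(same, shifted)  # 0.0 for identical samples, positive for the shifted ones
```

The ridge term keeps $D$ invertible when some feature directions have almost no gradient mass under $\nu_q$.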

UMDuluth-CS8761 at SemEval-2018 Task 9: Hypernym Discovery using Hearst Patterns, Co-occurrence frequencies and Word Embeddings

Title UMDuluth-CS8761 at SemEval-2018 Task 9: Hypernym Discovery using Hearst Patterns, Co-occurrence frequencies and Word Embeddings
Authors Arshia Z. Hassan, Manikya S. Vallabhajosyula, Ted Pedersen
Abstract Hypernym Discovery is the task of identifying potential hypernyms for a given term. A hypernym is a more generalized word that is super-ordinate to more specific words. This paper explores several approaches that rely on co-occurrence frequencies of word pairs, Hearst Patterns based on regular expressions, and word embeddings created from the UMBC corpus. Our system Babbage participated in Subtask 1A for English and placed 6th of 19 systems when identifying concept hypernyms, and 12th of 18 systems for entity hypernyms.
Tasks Hypernym Discovery, Word Embeddings
Published 2018-05-25
URL http://arxiv.org/abs/1805.10271v1
PDF http://arxiv.org/pdf/1805.10271v1.pdf
PWC https://paperswithcode.com/paper/umduluth-cs8761-at-semeval-2018-task-9
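Hearst patterns based on regular expressions can be sketched with Python's `re` module. The four patterns below are a small illustrative subset chosen here, not the system's actual pattern set. The sketch also omits the tokenization and the combination with co-occurrence frequencies and UMBC word embeddings that the paper describes. `extract_hypernym_pairs` is a hypothetical name.

```python
import re

# A few classic Hearst patterns (illustrative subset). Each entry records whether the
# hypernym is captured first ("X such as Y") or second ("Y and other X").
HEARST_PATTERNS = [
    (re.compile(r"(\w+) such as (\w+)"), "hypernym_first"),
    (re.compile(r"(\w+) including (\w+)"), "hypernym_first"),
    (re.compile(r"(\w+),? and other (\w+)"), "hyponym_first"),
    (re.compile(r"(\w+),? or other (\w+)"), "hyponym_first"),
]


def extract_hypernym_pairs(text):
    """Return (hyponym, hypernym) pairs matched by the patterns above."""
    pairs = []
    lowered = text.lower()
    for pattern, order in HEARST_PATTERNS:
        for match in pattern.finditer(lowered):
            first, second = match.group(1), match.group(2)
            pairs.append((second, first) if order == "hypernym_first" else (first, second))
    return pairs


pairs = extract_hypernym_pairs(
    "Instruments such as violins need care. Oaks and other trees grow slowly."
)
print(pairs)  # [('violins', 'instruments'), ('oaks', 'trees')]
```

Single-word captures keep the sketch short. Multiword terms such as "string instruments" would need noun-phrase chunking around each match.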