October 19, 2019

Paper Group ANR 275

EmoRL: Continuous Acoustic Emotion Classification using Deep Reinforcement Learning. SNAP: A semismooth Newton algorithm for pathwise optimization with optimal local convergence rate and oracle properties. Connecting Gaze, Scene, and Attention: Generalized Attention Estimation via Joint Modeling of Gaze and Scene Saliency. Visual Semantic Re-ranker …

EmoRL: Continuous Acoustic Emotion Classification using Deep Reinforcement Learning


Title	EmoRL: Continuous Acoustic Emotion Classification using Deep Reinforcement Learning
Authors	Egor Lakomkin, Mohammad Ali Zamani, Cornelius Weber, Sven Magg, Stefan Wermter
Abstract	Acoustically expressed emotions can make communication with a robot more efficient. Detecting emotions like anger could provide a clue for the robot indicating unsafe/undesired situations. Recently, several deep neural network-based models have been proposed which establish new state-of-the-art results in affective state evaluation. These models typically start processing at the end of each utterance, which not only requires a mechanism to detect the end of an utterance but also makes it difficult to use them in a real-time communication scenario, e.g. human-robot interaction. We propose the EmoRL model that triggers an emotion classification as soon as it gains enough confidence while listening to a person speaking. As a result, we minimize the need for segmenting the audio signal for classification and achieve lower latency as the audio signal is processed incrementally. The method is competitive with the accuracy of a strong baseline model, while allowing much earlier prediction.
Tasks	Emotion Classification
Published	2018-04-03
URL	http://arxiv.org/abs/1804.04053v1
PDF	http://arxiv.org/pdf/1804.04053v1.pdf
PWC	https://paperswithcode.com/paper/emorl-continuous-acoustic-emotion
Repo
Framework
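
The abstract's core idea, emitting a label as soon as the model is confident enough, can be illustrated with a minimal sketch. Note the assumptions: EmoRL learns when to trigger via deep reinforcement learning, whereas this sketch substitutes a fixed confidence threshold; `classify_incrementally`, `predict_proba`, and `threshold` are hypothetical names, not taken from the paper.

```python
from typing import Callable, Sequence, Tuple

Frame = Sequence[float]


def classify_incrementally(
    frames: Sequence[Frame],
    predict_proba: Callable[[Sequence[Frame]], Sequence[float]],
    threshold: float = 0.8,
) -> Tuple[int, int]:
    """Return (predicted_class, frames_consumed).

    A decision is emitted at the first prefix whose top class probability
    reaches `threshold`. If that never happens, the prediction on the full
    utterance is returned. The fixed threshold is an assumption standing in
    for the learned triggering policy described in the abstract.
    """
    if not frames:
        raise ValueError("need at least one frame")
    probs: Sequence[float] = []
    for t in range(1, len(frames) + 1):
        probs = predict_proba(frames[:t])  # re-score the audio heard so far
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] >= threshold:
            return best, t  # early decision: no end-of-utterance detection needed
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, len(frames)


def toy_model(prefix: Sequence[Frame]) -> Sequence[float]:
    """Hypothetical classifier: confidence in class 1 grows with the evidence heard."""
    p1 = min(0.5 + 0.1 * sum(f[0] for f in prefix), 1.0)
    return [1.0 - p1, p1]
```

With the toy model and five identical frames `[[1.0]] * 5`, a threshold of 0.75 is reached after three frames, so the decision arrives before the utterance ends.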

SNAP: A semismooth Newton algorithm for pathwise optimization with optimal local convergence rate and oracle properties


Title	SNAP: A semismooth Newton algorithm for pathwise optimization with optimal local convergence rate and oracle properties
Authors	Jian Huang, Yuling Jiao, Xiliang Lu, Yueyong Shi, Qinglong Yang
Abstract	We propose a semismooth Newton algorithm for pathwise optimization (SNAP) for the LASSO and Enet in sparse, high-dimensional linear regression. SNAP is derived from a suitable formulation of the KKT conditions based on Newton derivatives. It solves the semismooth KKT equations efficiently by actively and continuously seeking the support of the regression coefficients along the solution path with warm start. At each knot in the path, SNAP converges locally superlinearly for the Enet criterion and achieves an optimal local convergence rate for the LASSO criterion, i.e., SNAP converges in one step at the cost of two matrix-vector multiplications per iteration. Under certain regularity conditions on the design matrix and the minimum magnitude of the nonzero elements of the target regression coefficients, we show that SNAP hits a solution with the same signs as the regression coefficients and achieves a sharp estimation error bound in finite steps with high probability. The computational complexity of SNAP is shown to be the same as that of LARS and coordinate descent algorithms per iteration. Simulation studies and real data analysis support our theoretical results and demonstrate that SNAP is faster and more accurate than LARS and coordinate descent algorithms.
Tasks
Published	2018-10-09
URL	http://arxiv.org/abs/1810.03814v1
PDF	http://arxiv.org/pdf/1810.03814v1.pdf
PWC	https://paperswithcode.com/paper/snap-a-semismooth-newton-algorithm-for
Repo
Framework
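
For readers unfamiliar with the KKT conditions mentioned in the abstract, the sketch below checks one standard way of writing them for the LASSO objective 0.5 * ||y - X beta||^2 + lam * ||beta||_1, namely the fixed-point equation beta = S_lam(beta + X^T (y - X beta)) with S_lam the soft-thresholding operator. This is an assumption for illustration: the abstract does not state SNAP's exact formulation, and the sketch does not implement the semismooth Newton iteration. The names `soft_threshold` and `kkt_residual` are hypothetical.

```python
from typing import List


def soft_threshold(z: float, lam: float) -> float:
    """Proximal map of lam * |.|: shrink z toward zero by lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def kkt_residual(
    X: List[List[float]], y: List[float], beta: List[float], lam: float
) -> float:
    """Largest violation of beta = S_lam(beta + X^T (y - X beta)).

    A residual of zero means beta satisfies the LASSO optimality conditions
    for the objective 0.5 * ||y - X beta||^2 + lam * ||beta||_1.
    """
    n, p = len(X), len(beta)
    # Residual vector y - X beta.
    resid = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
    # Correlation of each column with the residual: X^T (y - X beta).
    corr = [sum(X[i][j] * resid[i] for i in range(n)) for j in range(p)]
    return max(
        abs(beta[j] - soft_threshold(beta[j] + corr[j], lam)) for j in range(p)
    )
```

For an identity design matrix the LASSO solution is simply the soft-thresholded response, so `kkt_residual` returns zero at `beta = [2.0, 0.0, 0.5]` when `y = [3.0, -0.5, 1.5]` and `lam = 1.0`.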

Connecting Gaze, Scene, and Attention: Generalized Attention Estimation via Joint Modeling of Gaze and Scene Saliency


Title	Connecting Gaze, Scene, and Attention: Generalized Attention Estimation via Joint Modeling of Gaze and Scene Saliency
Authors	Eunji Chong, Nataniel Ruiz, Yongxin Wang, Yun Zhang, Agata Rozga, James Rehg
Abstract	This paper addresses the challenging problem of estimating the general visual attention of people in images. Our proposed method is designed to work across multiple naturalistic social scenarios and provides a full picture of the subject’s attention and gaze. In contrast, earlier works on gaze and attention estimation have focused on constrained problems in more specific contexts. In particular, our model explicitly represents the gaze direction and handles out-of-frame gaze targets. We leverage three different datasets using a multi-task learning approach. We evaluate our method on widely used benchmarks for single tasks such as gaze angle estimation and attention-within-an-image, as well as on the new challenging task of generalized visual attention prediction. In addition, we have created extended annotations for the MMDB and GazeFollow datasets, which are used in our experiments and which we will publicly release.
Tasks	Multi-Task Learning
Published	2018-07-27
URL	http://arxiv.org/abs/1807.10437v1
PDF	http://arxiv.org/pdf/1807.10437v1.pdf
PWC	https://paperswithcode.com/paper/connecting-gaze-scene-and-attention
Repo
Framework

Visual Semantic Re-ranker for Text Spotting


Title	Visual Semantic Re-ranker for Text Spotting
Authors	Ahmed Sabir, Francesc Moreno-Noguer, Lluís Padró
Abstract	Many current state-of-the-art methods for text recognition are based on purely local information and ignore the semantic correlation between text and its surrounding visual context. In this paper, we propose a post-processing approach to improve the accuracy of text spotting by using the semantic relation between the text and the scene. We initially rely on an off-the-shelf deep neural network that provides a series of text hypotheses for each input image. These text hypotheses are then re-ranked using the semantic relatedness with the object in the image. As a result of this combination, the performance of the original network is boosted with a very low computational cost. The proposed framework can be used as a drop-in complement for any text-spotting algorithm that outputs a ranking of word hypotheses. We validate our approach on the ICDAR’17 shared task dataset.
Tasks	Text Spotting
Published	2018-10-23
URL	http://arxiv.org/abs/1810.09776v2
PDF	http://arxiv.org/pdf/1810.09776v2.pdf
PWC	https://paperswithcode.com/paper/visual-semantic-re-ranker-for-text-spotting
Repo
Framework
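
The re-ranking step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' scoring function: relatedness is taken to be cosine similarity between toy word vectors, and it is mixed with the recognizer score by a linear weight. The names `rerank`, `cosine`, and `weight` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


def rerank(
    hypotheses: List[Tuple[str, float]],
    context_vec: Sequence[float],
    embeddings: Dict[str, Sequence[float]],
    weight: float = 0.5,
) -> List[Tuple[str, float]]:
    """Re-order (word, recognizer_score) pairs using relatedness to the scene.

    `context_vec` stands for the embedding of an object detected in the image.
    Words without an embedding get a relatedness of 0.0. The linear mix of
    the two scores is an assumption made for this sketch.
    """
    scored = []
    for word, rec_score in hypotheses:
        vec = embeddings.get(word)
        related = cosine(vec, context_vec) if vec is not None else 0.0
        scored.append((word, (1.0 - weight) * rec_score + weight * related))
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

For example, if the recognizer slightly prefers "bear" over "beer" but the detected object is a bottle whose vector is close to "beer", the re-ranked list puts "beer" first.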

Representation Learning of Pedestrian Trajectories Using Actor-Critic Sequence-to-Sequence Autoencoder


Title	Representation Learning of Pedestrian Trajectories Using Actor-Critic Sequence-to-Sequence Autoencoder
Authors	Ka-Ho Chow, Anish Hiranandani, Yifeng Zhang, S. -H. Gary Chan
Abstract	Representation learning of pedestrian trajectories transforms variable-length timestamp-coordinate tuples of a trajectory into a fixed-length vector representation that summarizes spatiotemporal characteristics. It is a crucial technique to connect feature-based data mining with trajectory data. Trajectory representation is a challenging problem, because both environmental constraints (e.g., wall partitions) and temporal user dynamics should be meticulously considered and accounted for. Furthermore, traditional sequence-to-sequence autoencoders using maximum log-likelihood often require a dataset covering all the possible spatiotemporal characteristics to perform well. This is infeasible or impractical in reality. We propose TREP, a practical pedestrian trajectory representation learning algorithm which captures the environmental constraints and the pedestrian dynamics without the need for any training dataset. By formulating a sequence-to-sequence autoencoder with a spatial-aware objective function under the paradigm of actor-critic reinforcement learning, TREP intelligently encodes spatiotemporal characteristics of trajectories with the capability of handling diverse trajectory patterns. Extensive experiments on both synthetic and real datasets validate the high fidelity of TREP to represent trajectories.
Tasks	Representation Learning
Published	2018-11-20
URL	http://arxiv.org/abs/1811.08069v1
PDF	http://arxiv.org/pdf/1811.08069v1.pdf
PWC	https://paperswithcode.com/paper/representation-learning-of-pedestrian
Repo
Framework

Inductive Visual Localisation: Factorised Training for Superior Generalisation


Title	Inductive Visual Localisation: Factorised Training for Superior Generalisation
Authors	Ankush Gupta, Andrea Vedaldi, Andrew Zisserman
Abstract	End-to-end trained Recurrent Neural Networks (RNNs) have been successfully applied to numerous problems that require processing sequences, such as image captioning, machine translation, and text recognition. However, RNNs often struggle to generalise to sequences longer than the ones encountered during training. In this work, we propose to optimise neural networks explicitly for induction. The idea is to first decompose the problem into a sequence of inductive steps and then to explicitly train the RNN to reproduce such steps. Generalisation is achieved as the RNN is not allowed to learn an arbitrary internal state; instead, it is tasked with mimicking the evolution of a valid state. In particular, the state is restricted to a spatial memory map that tracks parts of the input image which have been accounted for in previous steps. The RNN is trained for single inductive steps, where it produces updates to the memory in addition to the desired output. We evaluate our method on two different visual recognition problems involving visual sequences: (1) text spotting, i.e. joint localisation and reading of text in images containing multiple lines (or a block) of text, and (2) sequential counting of objects in aerial images. We show that inductive training of recurrent models enhances their generalisation ability on challenging image datasets.
Tasks	Image Captioning, Machine Translation, Text Spotting
Published	2018-07-21
URL	http://arxiv.org/abs/1807.08179v1
PDF	http://arxiv.org/pdf/1807.08179v1.pdf
PWC	https://paperswithcode.com/paper/inductive-visual-localisation-factorised
Repo
Framework

High-dimensional Index Volatility Models via Stein’s Identity


Title	High-dimensional Index Volatility Models via Stein’s Identity
Authors	Sen Na, Mladen Kolar
Abstract	We study estimation of the parametric components of single and multiple index volatility models. Using the first- and second-order Stein’s identity, we develop methods that are applicable for estimation of the variance index in a high-dimensional setting requiring a finite moment condition, which allows for heavy-tailed data. Our approach complements the existing literature in a low-dimensional setting, while relaxing the conditions on estimation, and provides a novel approach in a high-dimensional setting. We prove that the statistical rate of convergence of our variance index estimators consists of a parametric rate and a nonparametric rate, where the latter appears from the estimation of the mean link function. However, under standard assumptions, the parametric rate dominates the rate of convergence and our results match the minimax optimal rate for the mean index estimation. Simulation results illustrate finite sample properties of our methodology and back our theoretical conclusions.
Tasks
Published	2018-11-27
URL	http://arxiv.org/abs/1811.10790v2
PDF	http://arxiv.org/pdf/1811.10790v2.pdf
PWC	https://paperswithcode.com/paper/high-dimensional-index-volatility-models-via
Repo
Framework

Deep Neural Networks for Query Expansion using Word Embeddings


Title	Deep Neural Networks for Query Expansion using Word Embeddings
Authors	Ayyoob Imani, Amir Vakili, Ali Montazer, Azadeh Shakery
Abstract	Query expansion is a method for alleviating the vocabulary mismatch problem present in information retrieval tasks. Previous works have shown that terms selected for query expansion by traditional methods such as pseudo-relevance feedback are not always helpful to the retrieval process. In this paper, we show that this is also true for more recently proposed embedding-based query expansion methods. We then introduce an artificial neural network classifier to predict the usefulness of query expansion terms. This classifier uses term word embeddings as inputs. Experiments on four TREC newswire and web collections show that using terms selected by the classifier for expansion significantly improves retrieval performance when compared to competitive baselines. The results are also shown to be more robust than the baselines.
Tasks	Information Retrieval, Word Embeddings
Published	2018-11-08
URL	http://arxiv.org/abs/1811.03514v1
PDF	http://arxiv.org/pdf/1811.03514v1.pdf
PWC	https://paperswithcode.com/paper/deep-neural-networks-for-query-expansion
Repo
Framework

Learning to Collaborate for User-Controlled Privacy


Title	Learning to Collaborate for User-Controlled Privacy
Authors	Martin Bertran, Natalia Martinez, Afroditi Papadaki, Qiang Qiu, Miguel Rodrigues, Guillermo Sapiro
Abstract	It is becoming increasingly clear that users should own and control their data. Utility providers are also becoming more interested in guaranteeing data privacy. As such, users and utility providers should collaborate in data privacy, a paradigm that has not yet been developed in the privacy research community. We introduce this concept and present explicit architectures where the user controls what characteristics of the data she/he wants to share and what she/he wants to keep private. This is achieved by collaboratively learning a sanitization function, either a deterministic or a stochastic one, that retains valuable information for the utility tasks but also eliminates the information necessary for the privacy ones. As illustrative examples, we implement them using a plug-and-play approach, where no algorithm is changed at the system provider end, and an adversarial approach, where minor re-training of the privacy inferring engine is allowed. In both cases the learned sanitization function keeps the data in the original domain, thereby allowing the system to use the same algorithms it was using before for both original and privatized data. We show how we can maintain utility while fully protecting private information if the user chooses to do so, even when the first is harder than the second, as in the case illustrated here of identity detection while hiding gender.
Tasks
Published	2018-05-18
URL	http://arxiv.org/abs/1805.07410v1
PDF	http://arxiv.org/pdf/1805.07410v1.pdf
PWC	https://paperswithcode.com/paper/learning-to-collaborate-for-user-controlled
Repo
Framework

Arcades: A deep model for adaptive decision making in voice controlled smart-home


Title	Arcades: A deep model for adaptive decision making in voice controlled smart-home
Authors	Alexis Brenon, François Portet, Michel Vacher
Abstract	In a voice-controlled smart home, a controller must respond not only to the user’s requests but also according to the interaction context. This paper describes Arcades, a system which uses deep reinforcement learning to extract context from a graphical representation of the home automation system and to continuously adapt its behavior to the user’s. This system is robust to changes in the environment (sensor breakdown or addition) through its graphical representation (which scales well) and the reinforcement mechanism (which adapts well). The experiments on realistic data demonstrate that this method promises to achieve long-lived, context-aware control of a smart home.
Tasks	Decision Making
Published	2018-07-05
URL	http://arxiv.org/abs/1807.01970v1
PDF	http://arxiv.org/pdf/1807.01970v1.pdf
PWC	https://paperswithcode.com/paper/arcades-a-deep-model-for-adaptive-decision
Repo
Framework

Attentioned Convolutional LSTM InpaintingNetwork for Anomaly Detection in Videos


Title	Attentioned Convolutional LSTM InpaintingNetwork for Anomaly Detection in Videos
Authors	Itamar Ben-Ari, Ravid Shwartz-Ziv
Abstract	We propose a semi-supervised model for detecting anomalies in videos inspired by the Video Pixel Network [van den Oord et al., 2016]. VPN is a probabilistic generative model based on a deep neural network that estimates the discrete joint distribution of raw pixels in video frames. Our model extends the Convolutional-LSTM video encoder part of the VPN with a novel convolutional based attention mechanism. We also modify the Pixel-CNN decoder part of the VPN to a frame inpainting task where a partially masked version of the frame to predict is given as input. The frame reconstruction error is used as an anomaly indicator. We test our model on a modified version of the Moving MNIST dataset [Srivastava et al., 2015]. Our model is shown to be effective in detecting anomalies in videos. This approach could be a component in applications requiring visual common sense.
Tasks	Anomaly Detection, Common Sense Reasoning
Published	2018-11-26
URL	http://arxiv.org/abs/1811.10228v1
PDF	http://arxiv.org/pdf/1811.10228v1.pdf
PWC	https://paperswithcode.com/paper/attentioned-convolutional-lstm
Repo
Framework
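
The abstract uses the frame reconstruction error as an anomaly indicator. A minimal sketch of that last step is shown below, assuming mean squared error as the error measure and a fixed threshold for flagging. Both are assumptions, since the abstract specifies neither, and the reconstruction itself would come from the inpainting network, which is not modelled here. The function names are hypothetical.

```python
from typing import List

Image = List[List[float]]  # a grayscale frame as rows of pixel intensities


def reconstruction_error(frame: Image, reconstruction: Image) -> float:
    """Mean squared pixel error between a frame and its reconstruction."""
    if len(frame) != len(reconstruction) or any(
        len(a) != len(b) for a, b in zip(frame, reconstruction)
    ):
        raise ValueError("frame and reconstruction must have the same shape")
    diffs = [
        (a - b) ** 2
        for row_f, row_r in zip(frame, reconstruction)
        for a, b in zip(row_f, row_r)
    ]
    if not diffs:
        raise ValueError("frames must contain at least one pixel")
    return sum(diffs) / len(diffs)


def flag_anomalies(errors: List[float], threshold: float) -> List[bool]:
    """Mark frames whose reconstruction error exceeds the threshold."""
    return [err > threshold for err in errors]
```

In practice the threshold would be chosen on held-out normal video, for instance from a high percentile of the errors observed there.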

Deep Structured Generative Models


Title	Deep Structured Generative Models
Authors	Kun Xu, Haoyu Liang, Jun Zhu, Hang Su, Bo Zhang
Abstract	Deep generative models have shown promising results in generating realistic images, but it is still non-trivial to generate images with complicated structures. The main reason is that most of the current generative models fail to explore the structures in the images including spatial layout and semantic relations between objects. To address this issue, we propose a novel deep structured generative model which boosts generative adversarial networks (GANs) with the aid of structure information. In particular, the layout or structure of the scene is encoded by a stochastic and-or graph (sAOG), in which the terminal nodes represent single objects and edges represent relations between objects. With the sAOG appropriately harnessed, our model can successfully capture the intrinsic structure in the scenes and generate images of complicated scenes accordingly. Furthermore, a detection network is introduced to infer scene structures from an image. Experimental results demonstrate the effectiveness of our proposed method on both modeling the intrinsic structures and generating realistic images.
Tasks
Published	2018-07-10
URL	http://arxiv.org/abs/1807.03877v1
PDF	http://arxiv.org/pdf/1807.03877v1.pdf
PWC	https://paperswithcode.com/paper/deep-structured-generative-models
Repo
Framework

Model Agnostic Saliency for Weakly Supervised Lesion Detection from Breast DCE-MRI


Title	Model Agnostic Saliency for Weakly Supervised Lesion Detection from Breast DCE-MRI
Authors	Gabriel Maicas, Gerard Snaauw, Andrew P. Bradley, Ian Reid, Gustavo Carneiro
Abstract	There is a heated debate on how to interpret the decisions provided by deep learning models (DLM), where the main approaches rely on the visualization of salient regions to interpret the DLM classification process. However, these approaches generally fail to satisfy three conditions for the problem of lesion detection from medical images: 1) for images with lesions, all salient regions should represent lesions, 2) for images containing no lesions, no salient region should be produced, and 3) lesions are generally small with relatively smooth borders. We propose a new model-agnostic paradigm to interpret DLM classification decisions supported by a novel definition of saliency that incorporates the conditions above. Our model-agnostic 1-class saliency detector (MASD) is tested on weakly supervised breast lesion detection from DCE-MRI, achieving state-of-the-art detection accuracy when compared to current visualization methods.
Tasks
Published	2018-07-20
URL	http://arxiv.org/abs/1807.07784v3
PDF	http://arxiv.org/pdf/1807.07784v3.pdf
PWC	https://paperswithcode.com/paper/model-agnostic-saliency-for-weakly-supervised
Repo
Framework

Regularized Finite Dimensional Kernel Sobolev Discrepancy


Title	Regularized Finite Dimensional Kernel Sobolev Discrepancy
Authors	Youssef Mroueh
Abstract	We show in this note that the Sobolev Discrepancy introduced in Mroueh et al. in the context of generative adversarial networks is actually the weighted negative Sobolev norm $\|\cdot\|_{\dot{H}^{-1}(\nu_q)}$, which is known to linearize the Wasserstein $W_2$ distance and plays a fundamental role in the dynamic formulation of optimal transport of Benamou and Brenier. Given a kernel with a finite dimensional feature map, we show that the Sobolev discrepancy can be approximated from finite samples. Assuming this discrepancy is finite, the error depends on the approximation error in the function space induced by the finite dimensional feature space kernel and on a statistical error due to the finite sample approximation.
Tasks
Published	2018-05-16
URL	http://arxiv.org/abs/1805.06441v1
PDF	http://arxiv.org/pdf/1805.06441v1.pdf
PWC	https://paperswithcode.com/paper/regularized-finite-dimensional-kernel-sobolev
Repo
Framework

UMDuluth-CS8761 at SemEval-2018 Task 9: Hypernym Discovery using Hearst Patterns, Co-occurrence frequencies and Word Embeddings


Title	UMDuluth-CS8761 at SemEval-2018 Task 9: Hypernym Discovery using Hearst Patterns, Co-occurrence frequencies and Word Embeddings
Authors	Arshia Z. Hassan, Manikya S. Vallabhajosyula, Ted Pedersen
Abstract	Hypernym Discovery is the task of identifying potential hypernyms for a given term. A hypernym is a more generalized word that is super-ordinate to more specific words. This paper explores several approaches that rely on co-occurrence frequencies of word pairs, Hearst Patterns based on regular expressions, and word embeddings created from the UMBC corpus. Our system Babbage participated in Subtask 1A for English and placed 6th of 19 systems when identifying concept hypernyms, and 12th of 18 systems for entity hypernyms.
Tasks	Hypernym Discovery, Word Embeddings
Published	2018-05-25
URL	http://arxiv.org/abs/1805.10271v1
PDF	http://arxiv.org/pdf/1805.10271v1.pdf
PWC	https://paperswithcode.com/paper/umduluth-cs8761-at-semeval-2018-task-9
Repo
Framework
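
To make the phrase "Hearst Patterns based on regular expressions" concrete, here is a minimal sketch of one classic pattern, "X such as A, B and C". It is illustrative only: the system described above combines several approaches, and the abstract does not give its actual expressions. The names `SUCH_AS`, `LIST_SEPARATOR`, and `extract_hypernym_pairs` are hypothetical, and the pattern assumes single-word terms and a list that ends at punctuation.

```python
import re
from typing import List, Tuple

# "<hypernym> such as <hyponym list>", where the list runs until punctuation
# other than a comma. Single-word terms only: a simplifying assumption.
SUCH_AS = re.compile(r"(\w+),? such as ([\w, ]+)")

# Separators inside the hyponym list: ", ", ", and ", ", or ", " and ", " or ".
LIST_SEPARATOR = re.compile(r",\s*(?:and\s+|or\s+)?|\s+(?:and|or)\s+")


def extract_hypernym_pairs(text: str) -> List[Tuple[str, str]]:
    """Return (hyponym, hypernym) pairs found by the 'such as' pattern."""
    pairs = []
    for match in SUCH_AS.finditer(text):
        hypernym = match.group(1)
        for hyponym in LIST_SEPARATOR.split(match.group(2)):
            hyponym = hyponym.strip()
            if hyponym:
                pairs.append((hyponym, hypernym))
    return pairs
```

On the sentence "She grows fruits such as apples, pears, and plums." the function pairs each of apples, pears, and plums with the hypernym "fruits". When the list is followed by more words rather than punctuation, the trailing words are captured too, which is one reason pattern-based candidates are usually filtered or combined with other signals such as co-occurrence frequencies.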