Paper Group ANR 275
EmoRL: Continuous Acoustic Emotion Classification using Deep Reinforcement Learning
Title | EmoRL: Continuous Acoustic Emotion Classification using Deep Reinforcement Learning |
Authors | Egor Lakomkin, Mohammad Ali Zamani, Cornelius Weber, Sven Magg, Stefan Wermter |
Abstract | Acoustically expressed emotions can make communication with a robot more efficient. Detecting emotions like anger could provide a clue for the robot indicating unsafe/undesired situations. Recently, several deep neural network-based models have been proposed which establish new state-of-the-art results in affective state evaluation. These models typically start processing at the end of each utterance, which not only requires a mechanism to detect the end of an utterance but also makes it difficult to use them in a real-time communication scenario, e.g. human-robot interaction. We propose the EmoRL model that triggers an emotion classification as soon as it gains enough confidence while listening to a person speaking. As a result, we minimize the need for segmenting the audio signal for classification and achieve lower latency as the audio signal is processed incrementally. The method is competitive with the accuracy of a strong baseline model, while allowing much earlier prediction. |
Tasks | Emotion Classification |
Published | 2018-04-03 |
URL | http://arxiv.org/abs/1804.04053v1 |
http://arxiv.org/pdf/1804.04053v1.pdf | |
PWC | https://paperswithcode.com/paper/emorl-continuous-acoustic-emotion |
Repo | |
Framework | |
SNAP: A semismooth Newton algorithm for pathwise optimization with optimal local convergence rate and oracle properties
Title | SNAP: A semismooth Newton algorithm for pathwise optimization with optimal local convergence rate and oracle properties |
Authors | Jian Huang, Yuling Jiao, Xiliang Lu, Yueyong Shi, Qinglong Yang |
Abstract | We propose a semismooth Newton algorithm for pathwise optimization (SNAP) for the LASSO and elastic net (Enet) in sparse, high-dimensional linear regression. SNAP is derived from a suitable formulation of the KKT conditions based on Newton derivatives. It solves the semismooth KKT equations efficiently by actively and continuously seeking the support of the regression coefficients along the solution path with warm start. At each knot in the path, SNAP converges locally superlinearly for the Enet criterion and achieves an optimal local convergence rate for the LASSO criterion, i.e., SNAP converges in one step at the cost of two matrix-vector multiplications per iteration. Under certain regularity conditions on the design matrix and the minimum magnitude of the nonzero elements of the target regression coefficients, we show that SNAP hits a solution with the same signs as the regression coefficients and achieves a sharp estimation error bound in finite steps with high probability. The computational complexity of SNAP is shown to be the same as that of LARS and coordinate descent algorithms per iteration. Simulation studies and real data analysis support our theoretical results and demonstrate that SNAP is faster and more accurate than LARS and coordinate descent algorithms. |
Tasks | |
Published | 2018-10-09 |
URL | http://arxiv.org/abs/1810.03814v1 |
http://arxiv.org/pdf/1810.03814v1.pdf | |
PWC | https://paperswithcode.com/paper/snap-a-semismooth-newton-algorithm-for |
Repo | |
Framework | |
Connecting Gaze, Scene, and Attention: Generalized Attention Estimation via Joint Modeling of Gaze and Scene Saliency
Title | Connecting Gaze, Scene, and Attention: Generalized Attention Estimation via Joint Modeling of Gaze and Scene Saliency |
Authors | Eunji Chong, Nataniel Ruiz, Yongxin Wang, Yun Zhang, Agata Rozga, James Rehg |
Abstract | This paper addresses the challenging problem of estimating the general visual attention of people in images. Our proposed method is designed to work across multiple naturalistic social scenarios and provides a full picture of the subject’s attention and gaze. In contrast, earlier works on gaze and attention estimation have focused on constrained problems in more specific contexts. In particular, our model explicitly represents the gaze direction and handles out-of-frame gaze targets. We leverage three different datasets using a multi-task learning approach. We evaluate our method on widely used benchmarks for single tasks such as gaze angle estimation and attention-within-an-image, as well as on the new challenging task of generalized visual attention prediction. In addition, we have created extended annotations for the MMDB and GazeFollow datasets, which are used in our experiments and which we will publicly release. |
Tasks | Multi-Task Learning |
Published | 2018-07-27 |
URL | http://arxiv.org/abs/1807.10437v1 |
http://arxiv.org/pdf/1807.10437v1.pdf | |
PWC | https://paperswithcode.com/paper/connecting-gaze-scene-and-attention |
Repo | |
Framework | |
Visual Semantic Re-ranker for Text Spotting
Title | Visual Semantic Re-ranker for Text Spotting |
Authors | Ahmed Sabir, Francesc Moreno-Noguer, Lluís Padró |
Abstract | Many current state-of-the-art methods for text recognition are based on purely local information and ignore the semantic correlation between text and its surrounding visual context. In this paper, we propose a post-processing approach to improve the accuracy of text spotting by using the semantic relation between the text and the scene. We initially rely on an off-the-shelf deep neural network that provides a series of text hypotheses for each input image. These text hypotheses are then re-ranked using the semantic relatedness with the object in the image. As a result of this combination, the performance of the original network is boosted with a very low computational cost. The proposed framework can be used as a drop-in complement for any text-spotting algorithm that outputs a ranking of word hypotheses. We validate our approach on the ICDAR’17 shared task dataset. |
Tasks | Text Spotting |
Published | 2018-10-23 |
URL | http://arxiv.org/abs/1810.09776v2 |
http://arxiv.org/pdf/1810.09776v2.pdf | |
PWC | https://paperswithcode.com/paper/visual-semantic-re-ranker-for-text-spotting |
Repo | |
Framework | |
Representation Learning of Pedestrian Trajectories Using Actor-Critic Sequence-to-Sequence Autoencoder
Title | Representation Learning of Pedestrian Trajectories Using Actor-Critic Sequence-to-Sequence Autoencoder |
Authors | Ka-Ho Chow, Anish Hiranandani, Yifeng Zhang, S. -H. Gary Chan |
Abstract | Representation learning of pedestrian trajectories transforms variable-length timestamp-coordinate tuples of a trajectory into a fixed-length vector representation that summarizes spatiotemporal characteristics. It is a crucial technique to connect feature-based data mining with trajectory data. Trajectory representation is a challenging problem, because both environmental constraints (e.g., wall partitions) and temporal user dynamics should be meticulously considered and accounted for. Furthermore, traditional sequence-to-sequence autoencoders using maximum log-likelihood often require a dataset covering all the possible spatiotemporal characteristics to perform well. This is infeasible or impractical in reality. We propose TREP, a practical pedestrian trajectory representation learning algorithm which captures the environmental constraints and the pedestrian dynamics without the need for any training dataset. By formulating a sequence-to-sequence autoencoder with a spatial-aware objective function under the paradigm of actor-critic reinforcement learning, TREP intelligently encodes spatiotemporal characteristics of trajectories with the capability of handling diverse trajectory patterns. Extensive experiments on both synthetic and real datasets validate the high fidelity of TREP to represent trajectories. |
Tasks | Representation Learning |
Published | 2018-11-20 |
URL | http://arxiv.org/abs/1811.08069v1 |
http://arxiv.org/pdf/1811.08069v1.pdf | |
PWC | https://paperswithcode.com/paper/representation-learning-of-pedestrian |
Repo | |
Framework | |
Inductive Visual Localisation: Factorised Training for Superior Generalisation
Title | Inductive Visual Localisation: Factorised Training for Superior Generalisation |
Authors | Ankush Gupta, Andrea Vedaldi, Andrew Zisserman |
Abstract | End-to-end trained Recurrent Neural Networks (RNNs) have been successfully applied to numerous problems that require processing sequences, such as image captioning, machine translation, and text recognition. However, RNNs often struggle to generalise to sequences longer than the ones encountered during training. In this work, we propose to optimise neural networks explicitly for induction. The idea is to first decompose the problem into a sequence of inductive steps and then to explicitly train the RNN to reproduce such steps. Generalisation is achieved as the RNN is not allowed to learn an arbitrary internal state; instead, it is tasked with mimicking the evolution of a valid state. In particular, the state is restricted to a spatial memory map that tracks parts of the input image which have been accounted for in previous steps. The RNN is trained for single inductive steps, where it produces updates to the memory in addition to the desired output. We evaluate our method on two different visual recognition problems involving visual sequences: (1) text spotting, i.e. joint localisation and reading of text in images containing multiple lines (or a block) of text, and (2) sequential counting of objects in aerial images. We show that inductive training of recurrent models enhances their generalisation ability on challenging image datasets. |
Tasks | Image Captioning, Machine Translation, Text Spotting |
Published | 2018-07-21 |
URL | http://arxiv.org/abs/1807.08179v1 |
http://arxiv.org/pdf/1807.08179v1.pdf | |
PWC | https://paperswithcode.com/paper/inductive-visual-localisation-factorised |
Repo | |
Framework | |
High-dimensional Index Volatility Models via Stein’s Identity
Title | High-dimensional Index Volatility Models via Stein’s Identity |
Authors | Sen Na, Mladen Kolar |
Abstract | We study estimation of the parametric components of single and multiple index volatility models. Using the first- and second-order Stein’s identity, we develop methods that are applicable for estimation of the variance index in a high-dimensional setting requiring a finite moment condition, which allows for heavy-tailed data. Our approach complements the existing literature in a low-dimensional setting, while relaxing the conditions on estimation, and provides a novel approach in a high-dimensional setting. We prove that the statistical rate of convergence of our variance index estimators consists of a parametric rate and a nonparametric rate, where the latter appears from the estimation of the mean link function. However, under standard assumptions, the parametric rate dominates the rate of convergence and our results match the minimax optimal rate for the mean index estimation. Simulation results illustrate finite sample properties of our methodology and back our theoretical conclusions. |
Tasks | |
Published | 2018-11-27 |
URL | http://arxiv.org/abs/1811.10790v2 |
http://arxiv.org/pdf/1811.10790v2.pdf | |
PWC | https://paperswithcode.com/paper/high-dimensional-index-volatility-models-via |
Repo | |
Framework | |
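For reference, the identities named in the abstract above can be sketched in their standard textbook form. The notation ($f$, $p$, $S$, $T$) is assumed here for illustration and is not taken from the paper, whose exact assumptions may differ. For a random vector $X$ with a sufficiently smooth density $p$, and a function $f$ satisfying suitable regularity and decay conditions:

$$
\mathbb{E}\big[f(X)\,S(X)\big] = \mathbb{E}\big[\nabla f(X)\big], \qquad S(x) = -\frac{\nabla p(x)}{p(x)},
$$

$$
\mathbb{E}\big[f(X)\,T(X)\big] = \mathbb{E}\big[\nabla^2 f(X)\big], \qquad T(x) = \frac{\nabla^2 p(x)}{p(x)}.
$$

In the special case of a standard Gaussian $X$, these reduce to $S(x) = x$ and $T(x) = x x^\top - I$.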
Deep Neural Networks for Query Expansion using Word Embeddings
Title | Deep Neural Networks for Query Expansion using Word Embeddings |
Authors | Ayyoob Imani, Amir Vakili, Ali Montazer, Azadeh Shakery |
Abstract | Query expansion is a method for alleviating the vocabulary mismatch problem present in information retrieval tasks. Previous works have shown that terms selected for query expansion by traditional methods such as pseudo-relevance feedback are not always helpful to the retrieval process. In this paper, we show that this is also true for more recently proposed embedding-based query expansion methods. We then introduce an artificial neural network classifier to predict the usefulness of query expansion terms. This classifier uses term word embeddings as inputs. Experiments on four TREC newswire and web collections show that using terms selected by the classifier for expansion significantly improves retrieval performance when compared to competitive baselines. The results are also shown to be more robust than the baselines. |
Tasks | Information Retrieval, Word Embeddings |
Published | 2018-11-08 |
URL | http://arxiv.org/abs/1811.03514v1 |
http://arxiv.org/pdf/1811.03514v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-neural-networks-for-query-expansion |
Repo | |
Framework | |
Learning to Collaborate for User-Controlled Privacy
Title | Learning to Collaborate for User-Controlled Privacy |
Authors | Martin Bertran, Natalia Martinez, Afroditi Papadaki, Qiang Qiu, Miguel Rodrigues, Guillermo Sapiro |
Abstract | It is becoming increasingly clear that users should own and control their data. Utility providers are also becoming more interested in guaranteeing data privacy. As such, users and utility providers should collaborate in data privacy, a paradigm that has not yet been developed in the privacy research community. We introduce this concept and present explicit architectures where the user controls what characteristics of the data she/he wants to share and what she/he wants to keep private. This is achieved by collaboratively learning a sanitization function, either a deterministic or a stochastic one, that retains valuable information for the utility tasks but also eliminates the information necessary for the privacy ones. As illustrative examples, we implement them using a plug-and-play approach, where no algorithm is changed at the system provider end, and an adversarial approach, where minor re-training of the privacy inferring engine is allowed. In both cases the learned sanitization function keeps the data in the original domain, thereby allowing the system to use the same algorithms it was using before for both original and privatized data. We show how we can maintain utility while fully protecting private information if the user chooses to do so, even when the first is harder than the second, as in the case illustrated here of identity detection while hiding gender. |
Tasks | |
Published | 2018-05-18 |
URL | http://arxiv.org/abs/1805.07410v1 |
http://arxiv.org/pdf/1805.07410v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-collaborate-for-user-controlled |
Repo | |
Framework | |
Arcades: A deep model for adaptive decision making in voice controlled smart-home
Title | Arcades: A deep model for adaptive decision making in voice controlled smart-home |
Authors | Alexis Brenon, François Portet, Michel Vacher |
Abstract | In a voice-controlled smart home, a controller must respond not only to the user’s requests but also according to the interaction context. This paper describes Arcades, a system which uses deep reinforcement learning to extract context from a graphical representation of the home automation system and to continuously adapt its behavior to the user’s. This system is robust to changes in the environment (sensor breakdown or addition) through its graphical representation (which scales well) and the reinforcement mechanism (which adapts well). The experiments on realistic data demonstrate that this method promises to achieve long-life context-aware control of smart homes. |
Tasks | Decision Making |
Published | 2018-07-05 |
URL | http://arxiv.org/abs/1807.01970v1 |
http://arxiv.org/pdf/1807.01970v1.pdf | |
PWC | https://paperswithcode.com/paper/arcades-a-deep-model-for-adaptive-decision |
Repo | |
Framework | |
Attentioned Convolutional LSTM Inpainting Network for Anomaly Detection in Videos
Title | Attentioned Convolutional LSTM Inpainting Network for Anomaly Detection in Videos |
Authors | Itamar Ben-Ari, Ravid Shwartz-Ziv |
Abstract | We propose a semi-supervised model for detecting anomalies in videos inspired by the Video Pixel Network (VPN) [van den Oord et al., 2016]. VPN is a probabilistic generative model based on a deep neural network that estimates the discrete joint distribution of raw pixels in video frames. Our model extends the Convolutional-LSTM video encoder part of the VPN with a novel convolutional based attention mechanism. We also modify the Pixel-CNN decoder part of the VPN to a frame inpainting task where a partially masked version of the frame to predict is given as input. The frame reconstruction error is used as an anomaly indicator. We test our model on a modified version of the Moving MNIST dataset [Srivastava et al., 2015]. Our model is shown to be effective in detecting anomalies in videos. This approach could be a component in applications requiring visual common sense. |
Tasks | Anomaly Detection, Common Sense Reasoning |
Published | 2018-11-26 |
URL | http://arxiv.org/abs/1811.10228v1 |
http://arxiv.org/pdf/1811.10228v1.pdf | |
PWC | https://paperswithcode.com/paper/attentioned-convolutional-lstm |
Repo | |
Framework | |
Deep Structured Generative Models
Title | Deep Structured Generative Models |
Authors | Kun Xu, Haoyu Liang, Jun Zhu, Hang Su, Bo Zhang |
Abstract | Deep generative models have shown promising results in generating realistic images, but it is still non-trivial to generate images with complicated structures. The main reason is that most of the current generative models fail to explore the structures in the images including spatial layout and semantic relations between objects. To address this issue, we propose a novel deep structured generative model which boosts generative adversarial networks (GANs) with the aid of structure information. In particular, the layout or structure of the scene is encoded by a stochastic and-or graph (sAOG), in which the terminal nodes represent single objects and edges represent relations between objects. With the sAOG appropriately harnessed, our model can successfully capture the intrinsic structure in the scenes and generate images of complicated scenes accordingly. Furthermore, a detection network is introduced to infer scene structures from an image. Experimental results demonstrate the effectiveness of our proposed method in both modeling the intrinsic structures and generating realistic images. |
Tasks | |
Published | 2018-07-10 |
URL | http://arxiv.org/abs/1807.03877v1 |
http://arxiv.org/pdf/1807.03877v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-structured-generative-models |
Repo | |
Framework | |
Model Agnostic Saliency for Weakly Supervised Lesion Detection from Breast DCE-MRI
Title | Model Agnostic Saliency for Weakly Supervised Lesion Detection from Breast DCE-MRI |
Authors | Gabriel Maicas, Gerard Snaauw, Andrew P. Bradley, Ian Reid, Gustavo Carneiro |
Abstract | There is a heated debate on how to interpret the decisions provided by deep learning models (DLM), where the main approaches rely on the visualization of salient regions to interpret the DLM classification process. However, these approaches generally fail to satisfy three conditions for the problem of lesion detection from medical images: 1) for images with lesions, all salient regions should represent lesions, 2) for images containing no lesions, no salient region should be produced, and 3) lesions are generally small with relatively smooth borders. We propose a new model-agnostic paradigm to interpret DLM classification decisions supported by a novel definition of saliency that incorporates the conditions above. Our model-agnostic 1-class saliency detector (MASD) is tested on weakly supervised breast lesion detection from DCE-MRI, achieving state-of-the-art detection accuracy when compared to current visualization methods. |
Tasks | |
Published | 2018-07-20 |
URL | http://arxiv.org/abs/1807.07784v3 |
http://arxiv.org/pdf/1807.07784v3.pdf | |
PWC | https://paperswithcode.com/paper/model-agnostic-saliency-for-weakly-supervised |
Repo | |
Framework | |
Regularized Finite Dimensional Kernel Sobolev Discrepancy
Title | Regularized Finite Dimensional Kernel Sobolev Discrepancy |
Authors | Youssef Mroueh |
Abstract | We show in this note that the Sobolev Discrepancy introduced in Mroueh et al. in the context of generative adversarial networks is actually the weighted negative Sobolev norm $\|\cdot\|_{\dot{H}^{-1}(\nu_q)}$, which is known to linearize the Wasserstein $W_2$ distance and plays a fundamental role in the dynamic formulation of optimal transport of Benamou and Brenier. Given a kernel with a finite dimensional feature map, we show that the Sobolev discrepancy can be approximated from finite samples. Assuming this discrepancy is finite, the error depends on the approximation error in the function space induced by the finite dimensional feature space kernel and on a statistical error due to the finite sample approximation. |
Tasks | |
Published | 2018-05-16 |
URL | http://arxiv.org/abs/1805.06441v1 |
http://arxiv.org/pdf/1805.06441v1.pdf | |
PWC | https://paperswithcode.com/paper/regularized-finite-dimensional-kernel-sobolev |
Repo | |
Framework | |
UMDuluth-CS8761 at SemEval-2018 Task 9: Hypernym Discovery using Hearst Patterns, Co-occurrence frequencies and Word Embeddings
Title | UMDuluth-CS8761 at SemEval-2018 Task 9: Hypernym Discovery using Hearst Patterns, Co-occurrence frequencies and Word Embeddings |
Authors | Arshia Z. Hassan, Manikya S. Vallabhajosyula, Ted Pedersen |
Abstract | Hypernym Discovery is the task of identifying potential hypernyms for a given term. A hypernym is a more generalized word that is super-ordinate to more specific words. This paper explores several approaches that rely on co-occurrence frequencies of word pairs, Hearst Patterns based on regular expressions, and word embeddings created from the UMBC corpus. Our system Babbage participated in Subtask 1A for English and placed 6th of 19 systems when identifying concept hypernyms, and 12th of 18 systems for entity hypernyms. |
Tasks | Hypernym Discovery, Word Embeddings |
Published | 2018-05-25 |
URL | http://arxiv.org/abs/1805.10271v1 |
http://arxiv.org/pdf/1805.10271v1.pdf | |
PWC | https://paperswithcode.com/paper/umduluth-cs8761-at-semeval-2018-task-9 |
Repo | |
Framework | |
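To illustrate what "Hearst Patterns based on regular expressions" in the abstract above can look like, here is a minimal sketch. The two patterns, the `PATTERNS` table and the `find_hypernym_pairs` helper are hypothetical simplifications chosen for illustration; they are not the patterns or code used by the Babbage system, which the abstract does not list.

```python
import re

# Hypothetical, simplified Hearst-style patterns (single-word terms only).
# Each entry pairs a compiled regex with a flag saying which group is the hypernym.
PATTERNS = [
    # "X such as Y": X is the hypernym, Y the hyponym
    (re.compile(r"(\w+) such as (\w+)"), "hypernym_first"),
    # "Y and other X" / "Y or other X": X is the hypernym, Y the hyponym
    (re.compile(r"(\w+) (?:and|or) other (\w+)"), "hyponym_first"),
]


def find_hypernym_pairs(text):
    """Return (hyponym, hypernym) pairs matched by the toy patterns above."""
    pairs = []
    lowered = text.lower()
    for pattern, order in PATTERNS:
        for first, second in pattern.findall(lowered):
            if order == "hypernym_first":
                pairs.append((second, first))
            else:
                pairs.append((first, second))
    return pairs


print(find_hypernym_pairs(
    "Fruits such as apples are sweet. Dogs and other animals bark."
))
# → [('apples', 'fruits'), ('dogs', 'animals')]
```

A real system would also need multi-word terms, lemmatization (so that "fruits" maps to "fruit"), and frequency-based filtering of noisy matches; those steps are left out of this sketch.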