October 18, 2019

Paper Group ANR 463

Mode Variational LSTM Robust to Unseen Modes of Variation: Application to Facial Expression Recognition


Title	Mode Variational LSTM Robust to Unseen Modes of Variation: Application to Facial Expression Recognition
Authors	Wissam J. Baddar, Yong Man Ro
Abstract	Spatio-temporal feature encoding is essential for encoding the dynamics in video sequences. Recurrent neural networks, particularly long short-term memory (LSTM) units, have been popular as an efficient tool for encoding spatio-temporal features in sequences. In this work, we investigate the effect of mode variations on the encoded spatio-temporal features using LSTMs. We show that the LSTM retains information related to the mode variation in the sequence, which is irrelevant to the task at hand (e.g. classification of facial expressions). In fact, the LSTM forget mechanism is not robust enough to mode variations and preserves information that could negatively affect the encoded spatio-temporal features. We propose the mode variational LSTM to encode spatio-temporal features robust to unseen modes of variation. The mode variational LSTM modifies the original LSTM structure by adding an additional cell state that focuses on encoding the mode variation in the input sequence. To efficiently regulate what features should be stored in the additional cell state, additional gating functionality is also introduced. The effectiveness of the proposed mode variational LSTM is verified using the facial expression recognition task. Comparative experiments on publicly available datasets verified that the proposed mode variational LSTM outperforms existing methods. Moreover, a new dynamic facial expression dataset with different modes of variation, including pose and illumination variations, was collected to comprehensively evaluate the proposed mode variational LSTM. Experimental results verified that the proposed mode variational LSTM encodes spatio-temporal features robust to unseen modes of variation.
Tasks	Facial Expression Recognition
Published	2018-11-16
URL	http://arxiv.org/abs/1811.06937v1
PDF	http://arxiv.org/pdf/1811.06937v1.pdf
PWC	https://paperswithcode.com/paper/mode-variational-lstm-robust-to-unseen-modes
Repo
Framework

Black-Box Autoregressive Density Estimation for State-Space Models


Title	Black-Box Autoregressive Density Estimation for State-Space Models
Authors	Tom Ryder, Andrew Golightly, A. Stephen McGough, Dennis Prangle
Abstract	State-space models (SSMs) provide a flexible framework for modelling time-series data. Consequently, SSMs are ubiquitously applied in areas such as engineering, econometrics and epidemiology. In this paper we provide a fast approach for approximate Bayesian inference in SSMs using the tools of deep learning and variational inference.
Tasks	Bayesian Inference, Density Estimation, Epidemiology, Time Series
Published	2018-11-20
URL	http://arxiv.org/abs/1811.08337v2
PDF	http://arxiv.org/pdf/1811.08337v2.pdf
PWC	https://paperswithcode.com/paper/black-box-autoregressive-density-estimation
Repo
Framework

Finite Mixture Model of Nonparametric Density Estimation using Sampling Importance Resampling for Persistence Landscape


Title	Finite Mixture Model of Nonparametric Density Estimation using Sampling Importance Resampling for Persistence Landscape
Authors	Farzad Eskandari, Soroush Pakniat
Abstract	Considering the creation of persistence landscape on a parametrized curve and structure of sampling, there exists a random process for which a finite mixture model of persistence landscape (FMMPL) can provide a better description for a given dataset. In this paper, a nonparametric approach for computing the integrated mean square error (IMSE) in persistence landscapes is presented. The results indicate that FMMPL is more accurate than the alternative approach. In addition, sampling importance resampling (SIR) gives a better description of the important landmarks on the parametrized curve. The result provides more accuracy and less space complexity than landmarks selected with simple sampling.
Tasks	Density Estimation
Published	2018-11-17
URL	http://arxiv.org/abs/1811.08297v1
PDF	http://arxiv.org/pdf/1811.08297v1.pdf
PWC	https://paperswithcode.com/paper/finite-mixture-model-of-nonparametric-density
Repo
Framework
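
The abstract above relies on sampling importance resampling (SIR). As a rough illustration of the general technique only, and not of the authors' landmark-selection procedure, the sketch below draws from a wide proposal distribution, weights each draw by target density over proposal density, and resamples in proportion to those weights. The Gaussian target and proposal, the sample sizes, and all function names are assumptions chosen for the example.

```python
import math
import random


def log_normal_pdf(x, mean, std):
    """Log density of a univariate normal distribution."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std * math.sqrt(2 * math.pi))


def sir_resample(proposal_draws, log_target, log_proposal, n_out, rng):
    """Sampling importance resampling: reweight proposal draws, then resample."""
    # Unnormalized log importance weights: log target minus log proposal.
    log_w = [log_target(x) - log_proposal(x) for x in proposal_draws]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(log_w)
    weights = [math.exp(lw - m) for lw in log_w]
    # Resample with replacement, with probability proportional to the weights.
    return rng.choices(proposal_draws, weights=weights, k=n_out)


rng = random.Random(0)
# Assumed toy setup: proposal N(0, 3^2), target N(1, 0.5^2).
draws = [rng.gauss(0.0, 3.0) for _ in range(20000)]
resampled = sir_resample(
    draws,
    log_target=lambda x: log_normal_pdf(x, 1.0, 0.5),
    log_proposal=lambda x: log_normal_pdf(x, 0.0, 3.0),
    n_out=5000,
    rng=rng,
)
sir_mean = sum(resampled) / len(resampled)
```

After resampling, `sir_mean` should sit close to the target mean of 1.0 even though the proposal is centred at 0, which is the property that makes SIR useful for picking representative points.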

Learning what you can do before doing anything


Title	Learning what you can do before doing anything
Authors	Oleh Rybkin, Karl Pertsch, Konstantinos G. Derpanis, Kostas Daniilidis, Andrew Jaegle
Abstract	Intelligent agents can learn to represent the action spaces of other agents simply by observing them act. Such representations help agents quickly learn to predict the effects of their own actions on the environment and to plan complex action sequences. In this work, we address the problem of learning an agent’s action space purely from visual observation. We use stochastic video prediction to learn a latent variable that captures the scene’s dynamics while being minimally sensitive to the scene’s static content. We introduce a loss term that encourages the network to capture the composability of visual sequences and show that it leads to representations that disentangle the structure of actions. We call the full model with composable action representations Composable Learned Action Space Predictor (CLASP). We show the applicability of our method to synthetic settings and its potential to capture action spaces in complex, realistic visual settings. When used in a semi-supervised setting, our learned representations perform comparably to existing fully supervised methods on tasks such as action-conditioned video prediction and planning in the learned action space, while requiring orders of magnitude fewer action labels. Project website: https://daniilidis-group.github.io/learned_action_spaces
Tasks	Video Prediction
Published	2018-06-25
URL	http://arxiv.org/abs/1806.09655v2
PDF	http://arxiv.org/pdf/1806.09655v2.pdf
PWC	https://paperswithcode.com/paper/learning-what-you-can-do-before-doing
Repo
Framework

Real-time Neural-based Input Method


Title	Real-time Neural-based Input Method
Authors	Jiali Yao, Raphael Shu, Xinjian Li, Katsutoshi Ohtsuki, Hideki Nakayama
Abstract	The input method is an essential service on every mobile and desktop device that provides text suggestions. It converts sequential keyboard inputs to the characters in its target language, which is indispensable for Japanese and Chinese users. Due to critical resource constraints and limited network bandwidth of the target devices, applying neural models to input methods is not well explored. In this work, we apply an LSTM-based language model to the input method and evaluate its performance for both prediction and conversion tasks with the Japanese BCCWJ corpus. We identify the bottleneck as the slow softmax computation during conversion. To solve the issue, we propose an incremental softmax approximation approach, which computes softmax with a selected subset vocabulary and fixes the stale probabilities when the vocabulary is updated in future steps. We refer to this method as incremental selective softmax. The results show a speedup of two orders of magnitude for the softmax computation when converting Japanese input sequences with a large vocabulary, reaching real-time speed on a commodity CPU. We also exploit the potential for model compression to achieve a 92% model size reduction without losing accuracy.
Tasks	Language Modelling
Published	2018-10-19
URL	http://arxiv.org/abs/1810.09309v1
PDF	http://arxiv.org/pdf/1810.09309v1.pdf
PWC	https://paperswithcode.com/paper/real-time-neural-based-input-method
Repo
Framework
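
The abstract describes computing softmax over a selected subset of the vocabulary and correcting stale probabilities when that subset changes. The sketch below shows only the underlying arithmetic, under stated assumptions: the toy logits, the subsets, and the function names are hypothetical, and the rescaling step is the plain softmax identity (old probability times the ratio of normalizers), which may differ from the update rule the authors actually use.

```python
import math


def log_normalizer(logits, subset):
    """Log of the softmax denominator restricted to the word ids in `subset`."""
    m = max(logits[i] for i in subset)
    return m + math.log(sum(math.exp(logits[i] - m) for i in subset))


def subset_softmax(logits, subset):
    """Softmax probabilities computed over `subset` only."""
    log_z = log_normalizer(logits, subset)
    return {i: math.exp(logits[i] - log_z) for i in subset}


# Toy logits for a 6-word vocabulary (assumed values).
logits = [0.2, 2.0, -1.0, 1.0, 0.5, -0.3]
old_subset = [1, 3, 4]
new_subset = [1, 3, 4, 0]  # one word added in a later step

p_old = subset_softmax(logits, old_subset)
p_new = subset_softmax(logits, new_subset)

# Probabilities from the old subset become stale once the subset grows;
# multiplying by Z_old / Z_new brings them in line with the new subset.
scale = math.exp(log_normalizer(logits, old_subset) - log_normalizer(logits, new_subset))
p_fixed = {i: p_old[i] * scale for i in old_subset}
```

The cost of each softmax is proportional to the subset size rather than the full vocabulary, which is where a saving would come from when the vocabulary is large.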

Learning-Based Mean-Payoff Optimization in an Unknown MDP under Omega-Regular Constraints


Title	Learning-Based Mean-Payoff Optimization in an Unknown MDP under Omega-Regular Constraints
Authors	Jan Křetínský, Guillermo A. Pérez, Jean-François Raskin
Abstract	We formalize the problem of maximizing the mean-payoff value with high probability while satisfying a parity objective in a Markov decision process (MDP) with unknown probabilistic transition function and unknown reward function. Assuming the support of the unknown transition function and a lower bound on the minimal transition probability are known in advance, we show that in MDPs consisting of a single end component, two combinations of guarantees on the parity and mean-payoff objectives can be achieved depending on how much memory one is willing to use. (i) For all $\epsilon$ and $\gamma$ we can construct an online-learning finite-memory strategy that almost-surely satisfies the parity objective and which achieves an $\epsilon$-optimal mean payoff with probability at least $1 - \gamma$. (ii) Alternatively, for all $\epsilon$ and $\gamma$ there exists an online-learning infinite-memory strategy that satisfies the parity objective surely and which achieves an $\epsilon$-optimal mean payoff with probability at least $1 - \gamma$. We extend the above results to MDPs consisting of more than one end component in a natural way. Finally, we show that the aforementioned guarantees are tight, i.e. there are MDPs for which stronger combinations of the guarantees cannot be ensured.
Tasks
Published	2018-04-24
URL	http://arxiv.org/abs/1804.08924v4
PDF	http://arxiv.org/pdf/1804.08924v4.pdf
PWC	https://paperswithcode.com/paper/learning-based-mean-payoff-optimization-in-an
Repo
Framework

The MeMAD Submission to the WMT18 Multimodal Translation Task


Title	The MeMAD Submission to the WMT18 Multimodal Translation Task
Authors	Stig-Arne Grönroos, Benoit Huet, Mikko Kurimo, Jorma Laaksonen, Bernard Merialdo, Phu Pham, Mats Sjöberg, Umut Sulubacak, Jörg Tiedemann, Raphael Troncy, Raúl Vázquez
Abstract	This paper describes the MeMAD project entry to the WMT Multimodal Machine Translation Shared Task. We propose adapting the Transformer neural machine translation (NMT) architecture to a multi-modal setting. In this paper, we also describe the preliminary experiments with text-only translation systems leading us up to this choice. We have the top scoring system for both English-to-German and English-to-French, according to the automatic metrics for flickr18. Our experiments show that the effect of the visual features in our system is small. Our largest gains come from the quality of the underlying text-only NMT system. We find that appropriate use of additional data is effective.
Tasks	Machine Translation, Multimodal Machine Translation
Published	2018-08-31
URL	http://arxiv.org/abs/1808.10802v2
PDF	http://arxiv.org/pdf/1808.10802v2.pdf
PWC	https://paperswithcode.com/paper/the-memad-submission-to-the-wmt18-multimodal
Repo
Framework

Formulating Camera-Adaptive Color Constancy as a Few-shot Meta-Learning Problem


Title	Formulating Camera-Adaptive Color Constancy as a Few-shot Meta-Learning Problem
Authors	Steven McDonagh, Sarah Parisot, Fengwei Zhou, Xing Zhang, Ales Leonardis, Zhenguo Li, Gregory Slabaugh
Abstract	Digital camera pipelines employ color constancy methods to estimate an unknown scene illuminant, in order to re-illuminate images as if they were acquired under an achromatic light source. Fully-supervised learning approaches exhibit state-of-the-art estimation accuracy with camera-specific labelled training imagery. Resulting models typically suffer from domain gaps and fail to generalise across imaging devices. In this work, we propose a new approach that affords fast adaptation to previously unseen cameras, and robustness to changes in capture device by leveraging annotated samples across different cameras and datasets. We present a general approach that utilizes the concept of color temperature to frame color constancy as a set of distinct, homogeneous few-shot regression tasks, each associated with an intuitive physical meaning. We integrate this novel formulation within a meta-learning framework, enabling fast generalisation to previously unseen cameras using only handfuls of camera specific training samples. Consequently, the time spent for data collection and annotation substantially diminishes in practice whenever a new sensor is used. To quantify this gain, we evaluate our pipeline on three publicly available datasets comprising 12 different cameras and diverse scene content. Our approach delivers competitive results both qualitatively and quantitatively while requiring a small fraction of the camera-specific samples compared to standard approaches.
Tasks	Color Constancy, Few-Shot Camera-Adaptive Color Constancy, few-shot regression, Meta-Learning
Published	2018-11-28
URL	http://arxiv.org/abs/1811.11788v2
PDF	http://arxiv.org/pdf/1811.11788v2.pdf
PWC	https://paperswithcode.com/paper/meta-learning-for-few-shot-camera-adaptive
Repo
Framework

Unsupervised Control Through Non-Parametric Discriminative Rewards


Title	Unsupervised Control Through Non-Parametric Discriminative Rewards
Authors	David Warde-Farley, Tom Van de Wiele, Tejas Kulkarni, Catalin Ionescu, Steven Hansen, Volodymyr Mnih
Abstract	Learning to control an environment without hand-crafted rewards or expert data remains challenging and is at the frontier of reinforcement learning research. We present an unsupervised learning algorithm to train agents to achieve perceptually-specified goals using only a stream of observations and actions. Our agent simultaneously learns a goal-conditioned policy and a goal achievement reward function that measures how similar a state is to the goal state. This dual optimization leads to a co-operative game, giving rise to a learned reward function that reflects similarity in controllable aspects of the environment instead of distance in the space of observations. We demonstrate the efficacy of our agent to learn, in an unsupervised manner, to reach a diverse set of goals on three domains – Atari, the DeepMind Control Suite and DeepMind Lab.
Tasks
Published	2018-11-28
URL	http://arxiv.org/abs/1811.11359v1
PDF	http://arxiv.org/pdf/1811.11359v1.pdf
PWC	https://paperswithcode.com/paper/unsupervised-control-through-non-parametric
Repo
Framework

Deep Reinforcement Learning for Chinese Zero pronoun Resolution


Title	Deep Reinforcement Learning for Chinese Zero pronoun Resolution
Authors	Qingyu Yin, Yu Zhang, Weinan Zhang, Ting Liu, William Yang Wang
Abstract	Deep neural network models for Chinese zero pronoun resolution learn semantic information for zero pronouns and candidate antecedents, but tend to be short-sighted: they often make local decisions. They typically predict coreference chains between the zero pronoun and one single candidate antecedent one link at a time, while overlooking their long-term influence on future decisions. Ideally, modeling useful information of preceding potential antecedents is critical when later predicting zero pronoun-candidate antecedent pairs. In this study, we show how to integrate local and global decision-making by exploiting deep reinforcement learning models. With the help of the reinforcement learning agent, our model learns the policy of selecting antecedents in a sequential manner, where useful information provided by earlier predicted antecedents could be utilized for making later coreference decisions. Experimental results on the OntoNotes 5.0 dataset show that our technique surpasses the state-of-the-art models.
Tasks	Chinese Zero Pronoun Resolution, Decision Making
Published	2018-06-10
URL	http://arxiv.org/abs/1806.03711v2
PDF	http://arxiv.org/pdf/1806.03711v2.pdf
PWC	https://paperswithcode.com/paper/deep-reinforcement-learning-for-chinese-zero
Repo
Framework

Deep Morphing: Detecting bone structures in fluoroscopic X-ray images with prior knowledge


Title	Deep Morphing: Detecting bone structures in fluoroscopic X-ray images with prior knowledge
Authors	Aaron Pries, Peter J. Schreier, Artur Lamm, Stefan Pede, Jürgen Schmidt
Abstract	We propose approaches based on deep learning to localize objects in images when only a small training dataset is available and the images have low quality. That applies to many problems in medical image processing, and in particular to the analysis of fluoroscopic (low-dose) X-ray images, where the images have low contrast. We solve the problem by incorporating high-level information about the objects, which could be a simple geometrical model, like a circular outline, or a more complex statistical model. A simple geometrical representation can sufficiently describe some objects and only requires minimal labeling. Statistical shape models can be used to represent more complex objects. We propose computationally efficient two-stage approaches, which we call deep morphing, for both representations by fitting the representation to the output of a deep segmentation network.
Tasks
Published	2018-08-09
URL	http://arxiv.org/abs/1808.04441v2
PDF	http://arxiv.org/pdf/1808.04441v2.pdf
PWC	https://paperswithcode.com/paper/deep-morphing-detecting-bone-structures-in
Repo
Framework

Graph-Based Deep Modeling and Real Time Forecasting of Sparse Spatio-Temporal Data


Title	Graph-Based Deep Modeling and Real Time Forecasting of Sparse Spatio-Temporal Data
Authors	Bao Wang, Xiyang Luo, Fangbo Zhang, Baichuan Yuan, Andrea L. Bertozzi, P. Jeffrey Brantingham
Abstract	We present a generic framework for spatio-temporal (ST) data modeling, analysis, and forecasting, with a special focus on data that is sparse in both space and time. Our multi-scaled framework is a seamless coupling of two major components: a self-exciting point process that models the macroscale statistical behaviors of the ST data and a graph structured recurrent neural network (GSRNN) to discover the microscale patterns of the ST data on the inferred graph. This novel deep neural network (DNN) incorporates the real time interactions of the graph nodes to enable more accurate real time forecasting. The effectiveness of our method is demonstrated on both crime and traffic forecasting.
Tasks
Published	2018-04-02
URL	http://arxiv.org/abs/1804.00684v1
PDF	http://arxiv.org/pdf/1804.00684v1.pdf
PWC	https://paperswithcode.com/paper/graph-based-deep-modeling-and-real-time
Repo
Framework
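
The macroscale component named in the abstract is a self-exciting point process. As a hedged illustration of what "self-exciting" means, the sketch below evaluates the conditional intensity of a Hawkes process with an exponential kernel. The exponential kernel, the parameter values, and the function name are assumptions for the example; the abstract does not state which kernel the authors use.

```python
import math


def hawkes_intensity(t, event_times, mu, alpha, beta):
    """Conditional intensity mu + sum of alpha * exp(-beta * (t - t_i)) over past events."""
    return mu + sum(
        alpha * math.exp(-beta * (t - ti)) for ti in event_times if ti < t
    )


# Assumed toy event history and parameters.
events = [1.0, 1.5, 4.0]
params = dict(mu=0.2, alpha=0.8, beta=1.0)

before_any_event = hawkes_intensity(0.5, events, **params)
just_after_burst = hawkes_intensity(1.6, events, **params)
after_decay = hawkes_intensity(3.9, events, **params)
```

Each past event raises the intensity by `alpha`, and that boost decays at rate `beta`, so the intensity is highest right after a cluster of events and falls back toward the background rate `mu` as time passes.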

Why Self-Attention? A Targeted Evaluation of Neural Machine Translation Architectures


Title	Why Self-Attention? A Targeted Evaluation of Neural Machine Translation Architectures
Authors	Gongbo Tang, Mathias Müller, Annette Rios, Rico Sennrich
Abstract	Recently, non-recurrent architectures (convolutional, self-attentional) have outperformed RNNs in neural machine translation. CNNs and self-attentional networks can connect distant words via shorter network paths than RNNs, and it has been speculated that this improves their ability to model long-range dependencies. However, this theoretical argument has not been tested empirically, nor have alternative explanations for their strong performance been explored in-depth. We hypothesize that the strong performance of CNNs and self-attentional networks could also be due to their ability to extract semantic features from the source text, and we evaluate RNNs, CNNs and self-attention networks on two tasks: subject-verb agreement (where capturing long-range dependencies is required) and word sense disambiguation (where semantic feature extraction is required). Our experimental results show that: 1) self-attentional networks and CNNs do not outperform RNNs in modeling subject-verb agreement over long distances; 2) self-attentional networks perform distinctly better than RNNs and CNNs on word sense disambiguation.
Tasks	Machine Translation, Word Sense Disambiguation
Published	2018-08-27
URL	http://arxiv.org/abs/1808.08946v3
PDF	http://arxiv.org/pdf/1808.08946v3.pdf
PWC	https://paperswithcode.com/paper/why-self-attention-a-targeted-evaluation-of
Repo
Framework

Recent Advances in Efficient Computation of Deep Convolutional Neural Networks


Title	Recent Advances in Efficient Computation of Deep Convolutional Neural Networks
Authors	Jian Cheng, Peisong Wang, Gang Li, Qinghao Hu, Hanqing Lu
Abstract	Deep neural networks have evolved remarkably over the past few years and they are currently the fundamental tools of many intelligent systems. At the same time, the computational complexity and resource consumption of these networks also continue to increase. This will pose a significant challenge to the deployment of such networks, especially in real-time applications or on resource-limited devices. Thus, network acceleration has become a hot topic within the deep learning community. As for hardware implementation of deep neural networks, a batch of accelerators based on FPGA/ASIC have been proposed in recent years. In this paper, we provide a comprehensive survey of recent advances in network acceleration, compression and accelerator design from both algorithm and hardware points of view. Specifically, we provide a thorough analysis of each of the following topics: network pruning, low-rank approximation, network quantization, teacher-student networks, compact network design and hardware accelerators. Finally, we will introduce and discuss a few possible future directions.
Tasks	Network Pruning, Quantization
Published	2018-02-03
URL	http://arxiv.org/abs/1802.00939v2
PDF	http://arxiv.org/pdf/1802.00939v2.pdf
PWC	https://paperswithcode.com/paper/recent-advances-in-efficient-computation-of
Repo
Framework
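
Two of the topics listed in the survey abstract, network pruning and network quantization, can be illustrated on a flat list of weights. The sketch below is a minimal example under stated assumptions: it uses magnitude-based pruning and symmetric uniform quantization, which are common baseline variants rather than any specific method from the survey, and the weights and function names are made up for the example.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    n_prune = int(len(weights) * sparsity)
    # Indices ordered by absolute value; the first n_prune are set to zero.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def quantize_uniform(weights, num_bits):
    """Symmetric uniform quantization to signed integer codes, plus dequantized floats."""
    q_max = 2 ** (num_bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / q_max if max_abs > 0 else 1.0
    codes = [round(w / scale) for w in weights]
    return codes, [c * scale for c in codes], scale


weights = [0.05, -0.8, 0.3, -0.02, 0.6, 0.1, -0.4, 0.01]
pruned = magnitude_prune(weights, sparsity=0.5)
codes, dequantized, scale = quantize_uniform(pruned, num_bits=8)
```

Pruning reduces the number of nonzero weights to store and multiply, while quantization replaces each remaining float with a small integer code and a shared scale, at the cost of a rounding error of at most half a quantization step per weight.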

Region-Based Classification of PolSAR Data Using Radial Basis Kernel Functions With Stochastic Distances


Title	Region-Based Classification of PolSAR Data Using Radial Basis Kernel Functions With Stochastic Distances
Authors	R. G. Negri, A. C. Frery, W. B. Silva, T. S. G. Mendes, L. V. Dutra
Abstract	Region-based classification of PolSAR data can be effectively performed by seeking the assignment that minimizes a distance between prototypes and segments. Silva et al. (2013) used stochastic distances between complex multivariate Wishart models which, unlike other measures, are computationally tractable. In this work we assess the robustness of this approach with respect to errors in the training stage, and propose an extension that alleviates such problems. We introduce robustness in the process by incorporating a combination of radial basis kernel functions and stochastic distances with Support Vector Machines (SVM). We consider several stochastic distances between Wishart models: Bhattacharyya, Kullback-Leibler, Chi-Square, Rényi, and Hellinger. We perform two case studies with PolSAR images, both simulated and from actual sensors, and different classification scenarios to compare the performance of Minimum Distance and SVM classification frameworks. With this, we model the situation of imperfect training samples. We show that SVM with the proposed kernel functions achieves better performance than Minimum Distance, at the expense of more computational resources and the need for parameter tuning. Code and data are provided for reproducibility.
Tasks
Published	2018-05-07
URL	http://arxiv.org/abs/1805.07438v1
PDF	http://arxiv.org/pdf/1805.07438v1.pdf
PWC	https://paperswithcode.com/paper/region-based-classification-of-polsar-data
Repo
Framework
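
The abstract combines radial basis kernel functions with stochastic distances. The sketch below shows the general idea of plugging a distance into a radial basis kernel, under loud assumptions: it uses the Hellinger distance between small discrete distributions as a stand-in for distances between Wishart models, and the form exp(-gamma * d^2) is one common choice that may not match the authors' kernel. All names and values are hypothetical.

```python
import math


def hellinger_distance(p, q):
    """Hellinger distance between two discrete distributions of equal length."""
    return math.sqrt(
        0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    )


def rbf_kernel_from_distance(p, q, gamma, distance=hellinger_distance):
    """Radial basis kernel exp(-gamma * d(p, q)^2) built on the given distance."""
    return math.exp(-gamma * distance(p, q) ** 2)


# Assumed toy "region models": three discrete distributions.
region_a = [0.7, 0.2, 0.1]
region_b = [0.6, 0.3, 0.1]
region_c = [0.1, 0.2, 0.7]

k_self = rbf_kernel_from_distance(region_a, region_a, gamma=2.0)
k_near = rbf_kernel_from_distance(region_a, region_b, gamma=2.0)
k_far = rbf_kernel_from_distance(region_a, region_c, gamma=2.0)
```

Similar regions get kernel values near 1 and dissimilar regions get values closer to 0. A matrix of such values between training regions could then be handed to an SVM that accepts a precomputed kernel, with `gamma` as one of the parameters that needs tuning.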