October 18, 2019

3014 words 15 mins read

Paper Group ANR 463



Mode Variational LSTM Robust to Unseen Modes of Variation: Application to Facial Expression Recognition

Title Mode Variational LSTM Robust to Unseen Modes of Variation: Application to Facial Expression Recognition
Authors Wissam J. Baddar, Yong Man Ro
Abstract Spatio-temporal feature encoding is essential for encoding the dynamics in video sequences. Recurrent neural networks, particularly long short-term memory (LSTM) units, have been popular as an efficient tool for encoding spatio-temporal features in sequences. In this work, we investigate the effect of mode variations on the encoded spatio-temporal features using LSTMs. We show that the LSTM retains information related to the mode variation in the sequence, which is irrelevant to the task at hand (e.g. classifying facial expressions). In fact, the LSTM forget mechanism is not robust enough to mode variations and preserves information that could negatively affect the encoded spatio-temporal features. We propose the mode variational LSTM to encode spatio-temporal features robust to unseen modes of variation. The mode variational LSTM modifies the original LSTM structure by adding an additional cell state that focuses on encoding the mode variation in the input sequence. To efficiently regulate what features should be stored in the additional cell state, additional gating functionality is also introduced. The effectiveness of the proposed mode variational LSTM is verified using the facial expression recognition task. Comparative experiments on publicly available datasets verified that the proposed mode variational LSTM outperforms existing methods. Moreover, a new dynamic facial expression dataset with different modes of variation, including pose and illumination variations, was collected to comprehensively evaluate the proposed mode variational LSTM. Experimental results verified that the proposed mode variational LSTM encodes spatio-temporal features robust to unseen modes of variation.
Tasks Facial Expression Recognition
Published 2018-11-16
URL http://arxiv.org/abs/1811.06937v1
PDF http://arxiv.org/pdf/1811.06937v1.pdf
PWC https://paperswithcode.com/paper/mode-variational-lstm-robust-to-unseen-modes
Repo
Framework
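
The paper's core idea is a second cell state, with its own gating, that absorbs mode-of-variation information so it stays out of the task-relevant features. The exact equations are in the paper; the numpy sketch below is only an illustrative reading of that idea, and the weight names, the extra gate, and the way the two cell states interact are all assumptions rather than the authors' formulation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class ModeVariationalLSTMCellSketch:
    """Illustrative only: a standard LSTM cell plus a second cell state
    (c_mode) with its own gate, meant to absorb mode information
    (identity, pose, illumination) so it does not pollute the task
    cell state c. Not the paper's exact formulation."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        d = input_dim + hidden_dim
        # gates: input, forget, output, candidate, plus an assumed mode
        # gate "m" and mode candidate "gm" for the extra cell state
        self.W = {k: rng.normal(0.0, 0.1, (hidden_dim, d))
                  for k in ["i", "f", "o", "g", "m", "gm"]}
        self.b = {k: np.zeros(hidden_dim) for k in self.W}

    def step(self, x, h, c, c_mode):
        z = np.concatenate([x, h])
        i = sigmoid(self.W["i"] @ z + self.b["i"])
        f = sigmoid(self.W["f"] @ z + self.b["f"])
        o = sigmoid(self.W["o"] @ z + self.b["o"])
        g = np.tanh(self.W["g"] @ z + self.b["g"])
        m = sigmoid(self.W["m"] @ z + self.b["m"])   # extra gating functionality
        gm = np.tanh(self.W["gm"] @ z + self.b["gm"])
        c_mode = (1.0 - m) * c_mode + m * gm         # mode variation accumulates here
        c = f * c + i * g - m * gm                   # task state with the mode part removed
        h = o * np.tanh(c)
        return h, c, c_mode

cell = ModeVariationalLSTMCellSketch(input_dim=8, hidden_dim=16)
h = c = c_mode = np.zeros(16)
for x in np.random.default_rng(1).normal(size=(5, 8)):   # toy 5-frame sequence
    h, c, c_mode = cell.step(x, h, c, c_mode)
print(h.shape, c_mode.shape)
```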

Black-Box Autoregressive Density Estimation for State-Space Models

Title Black-Box Autoregressive Density Estimation for State-Space Models
Authors Tom Ryder, Andrew Golightly, A. Stephen McGough, Dennis Prangle
Abstract State-space models (SSMs) provide a flexible framework for modelling time-series data. Consequently, SSMs are ubiquitously applied in areas such as engineering, econometrics and epidemiology. In this paper we provide a fast approach for approximate Bayesian inference in SSMs using the tools of deep learning and variational inference.
Tasks Bayesian Inference, Density Estimation, Epidemiology, Time Series
Published 2018-11-20
URL http://arxiv.org/abs/1811.08337v2
PDF http://arxiv.org/pdf/1811.08337v2.pdf
PWC https://paperswithcode.com/paper/black-box-autoregressive-density-estimation
Repo
Framework

Finite Mixture Model of Nonparametric Density Estimation using Sampling Importance Resampling for Persistence Landscape

Title Finite Mixture Model of Nonparametric Density Estimation using Sampling Importance Resampling for Persistence Landscape
Authors Farzad Eskandari, Soroush Pakniat
Abstract Considering the creation of persistence landscapes on a parametrized curve and the structure of sampling, there exists a random process for which a finite mixture model of persistence landscapes (FMMPL) provides a better description of a given dataset. In this paper, a nonparametric approach for computing the integrated mean square error (IMSE) of the persistence landscape is presented. As a result, FMMPL is more accurate than the alternative approach. In addition, sampling importance resampling (SIR) is shown to provide a better description of important landmarks on the parametrized curve. The resulting landmarks offer higher accuracy and lower space complexity than landmarks selected by simple sampling.
Tasks Density Estimation
Published 2018-11-17
URL http://arxiv.org/abs/1811.08297v1
PDF http://arxiv.org/pdf/1811.08297v1.pdf
PWC https://paperswithcode.com/paper/finite-mixture-model-of-nonparametric-density
Repo
Framework
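
Sampling importance resampling (SIR), which the abstract uses to select informative landmarks, is a standard procedure. The sketch below is a generic numpy implementation with a toy target and proposal; the densities and the landmark selection criterion from the paper are not reproduced here.

```python
import numpy as np

def sampling_importance_resampling(target_pdf, proposal_sample, proposal_pdf,
                                   n_proposal=10000, n_resample=500, seed=0):
    """Generic SIR: draw from a proposal, weight by target/proposal,
    then resample with probability proportional to the weights."""
    rng = np.random.default_rng(seed)
    x = proposal_sample(rng, n_proposal)
    w = target_pdf(x) / proposal_pdf(x)
    w = w / w.sum()
    idx = rng.choice(n_proposal, size=n_resample, replace=True, p=w)
    return x[idx]

# Toy example: approximate samples from a bimodal target using a wide normal proposal.
target = lambda x: 0.5 * np.exp(-0.5 * (x - 2) ** 2) + 0.5 * np.exp(-0.5 * (x + 2) ** 2)
proposal_sample = lambda rng, n: rng.normal(0.0, 4.0, n)
proposal_pdf = lambda x: np.exp(-0.5 * (x / 4.0) ** 2) / (4.0 * np.sqrt(2 * np.pi))

samples = sampling_importance_resampling(target, proposal_sample, proposal_pdf)
print(samples.mean(), samples.std())
```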

Learning what you can do before doing anything

Title Learning what you can do before doing anything
Authors Oleh Rybkin, Karl Pertsch, Konstantinos G. Derpanis, Kostas Daniilidis, Andrew Jaegle
Abstract Intelligent agents can learn to represent the action spaces of other agents simply by observing them act. Such representations help agents quickly learn to predict the effects of their own actions on the environment and to plan complex action sequences. In this work, we address the problem of learning an agent’s action space purely from visual observation. We use stochastic video prediction to learn a latent variable that captures the scene’s dynamics while being minimally sensitive to the scene’s static content. We introduce a loss term that encourages the network to capture the composability of visual sequences and show that it leads to representations that disentangle the structure of actions. We call the full model with composable action representations Composable Learned Action Space Predictor (CLASP). We show the applicability of our method to synthetic settings and its potential to capture action spaces in complex, realistic visual settings. When used in a semi-supervised setting, our learned representations perform comparably to existing fully supervised methods on tasks such as action-conditioned video prediction and planning in the learned action space, while requiring orders of magnitude fewer action labels. Project website: https://daniilidis-group.github.io/learned_action_spaces
Tasks Video Prediction
Published 2018-06-25
URL http://arxiv.org/abs/1806.09655v2
PDF http://arxiv.org/pdf/1806.09655v2.pdf
PWC https://paperswithcode.com/paper/learning-what-you-can-do-before-doing
Repo
Framework
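
The composability loss is only described at a high level above. One hedged reading is that the latent action inferred for a long transition should be predictable from the latents of its constituent sub-transitions. The sketch below encodes that reading with toy encoder and composer networks; the architecture, the stochastic video-prediction machinery, and the exact loss form are assumptions, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_forward(params, x):
    """Tiny two-layer MLP used both as latent-action encoder and composer."""
    h = np.tanh(x @ params["W1"] + params["b1"])
    return h @ params["W2"] + params["b2"]

def init_mlp(d_in, d_hidden, d_out):
    return {"W1": rng.normal(0, 0.1, (d_in, d_hidden)), "b1": np.zeros(d_hidden),
            "W2": rng.normal(0, 0.1, (d_hidden, d_out)), "b2": np.zeros(d_out)}

d_obs, d_z = 12, 4
encoder = init_mlp(2 * d_obs, 32, d_z)      # z(t -> t') inferred from a pair of frames
composer = init_mlp(2 * d_z, 32, d_z)       # composes two consecutive latent actions

# Three consecutive (flattened) frames of a toy sequence.
f1, f2, f3 = (rng.normal(size=d_obs) for _ in range(3))

z12 = mlp_forward(encoder, np.concatenate([f1, f2]))
z23 = mlp_forward(encoder, np.concatenate([f2, f3]))
z13 = mlp_forward(encoder, np.concatenate([f1, f3]))

# Composability loss: the latent for the long step should match the
# composition of the latents for the two short steps.
loss = np.mean((mlp_forward(composer, np.concatenate([z12, z23])) - z13) ** 2)
print("composability loss:", loss)
```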

Real-time Neural-based Input Method

Title Real-time Neural-based Input Method
Authors Jiali Yao, Raphael Shu, Xinjian Li, Katsutoshi Ohtsuki, Hideki Nakayama
Abstract The input method is an essential service on every mobile and desktop device that provides text suggestions. It converts sequential keyboard inputs to characters in its target language, which is indispensable for Japanese and Chinese users. Due to critical resource constraints and limited network bandwidth on the target devices, applying neural models to input methods is not well explored. In this work, we apply an LSTM-based language model to the input method and evaluate its performance on both prediction and conversion tasks with the Japanese BCCWJ corpus. We identify the bottleneck as the slow softmax computation during conversion. To solve the issue, we propose an incremental softmax approximation approach, which computes the softmax over a selected subset vocabulary and fixes the stale probabilities when the vocabulary is updated in future steps. We refer to this method as incremental selective softmax. The results show a two-order-of-magnitude speedup of the softmax computation when converting Japanese input sequences with a large vocabulary, reaching real-time speed on a commodity CPU. We also exploit the model's compression potential to achieve a 92% reduction in model size without losing accuracy.
Tasks Language Modelling
Published 2018-10-19
URL http://arxiv.org/abs/1810.09309v1
PDF http://arxiv.org/pdf/1810.09309v1.pdf
PWC https://paperswithcode.com/paper/real-time-neural-based-input-method
Repo
Framework
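
The abstract's incremental selective softmax computes the softmax over a selected subset of the vocabulary and fixes stale probabilities as the subset grows in later steps. The sketch below is one plausible reading of that bookkeeping in numpy; the candidate selection and the exact update rule in the paper may differ.

```python
import numpy as np

class IncrementalSelectiveSoftmaxSketch:
    """Keep a running softmax over an active vocabulary subset; when new
    words enter the subset, add their terms to the normalizer instead of
    recomputing the full softmax (one plausible reading of the abstract)."""

    def __init__(self, logits):
        self.logits = logits                 # full-vocabulary logits for one step
        self.active = np.array([], dtype=int)
        self.Z = 0.0                         # running normalizer over the active set

    def add_words(self, new_ids):
        new_ids = np.setdiff1d(np.asarray(new_ids), self.active)
        self.Z += np.exp(self.logits[new_ids]).sum()
        self.active = np.concatenate([self.active, new_ids])

    def prob(self, word_id):
        # probabilities of previously added words are refreshed implicitly
        # because they always divide by the up-to-date normalizer
        return float(np.exp(self.logits[word_id]) / self.Z)

rng = np.random.default_rng(0)
logits = rng.normal(size=50000)              # toy 50k-word vocabulary
sm = IncrementalSelectiveSoftmaxSketch(logits)
sm.add_words([10, 42, 99])                   # candidates from the first conversion step
p_before = sm.prob(42)
sm.add_words(range(1000, 1020))              # new candidates at a later step
print(p_before, sm.prob(42))                 # probability of word 42 is updated cheaply
```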

Learning-Based Mean-Payoff Optimization in an Unknown MDP under Omega-Regular Constraints

Title Learning-Based Mean-Payoff Optimization in an Unknown MDP under Omega-Regular Constraints
Authors Jan Křetínský, Guillermo A. Pérez, Jean-François Raskin
Abstract We formalize the problem of maximizing the mean-payoff value with high probability while satisfying a parity objective in a Markov decision process (MDP) with unknown probabilistic transition function and unknown reward function. Assuming the support of the unknown transition function and a lower bound on the minimal transition probability are known in advance, we show that in MDPs consisting of a single end component, two combinations of guarantees on the parity and mean-payoff objectives can be achieved depending on how much memory one is willing to use. (i) For all $\epsilon$ and $\gamma$ we can construct an online-learning finite-memory strategy that almost-surely satisfies the parity objective and which achieves an $\epsilon$-optimal mean payoff with probability at least $1 - \gamma$. (ii) Alternatively, for all $\epsilon$ and $\gamma$ there exists an online-learning infinite-memory strategy that satisfies the parity objective surely and which achieves an $\epsilon$-optimal mean payoff with probability at least $1 - \gamma$. We extend the above results to MDPs consisting of more than one end component in a natural way. Finally, we show that the aforementioned guarantees are tight, i.e. there are MDPs for which stronger combinations of the guarantees cannot be ensured.
Tasks
Published 2018-04-24
URL http://arxiv.org/abs/1804.08924v4
PDF http://arxiv.org/pdf/1804.08924v4.pdf
PWC https://paperswithcode.com/paper/learning-based-mean-payoff-optimization-in-an
Repo
Framework

The MeMAD Submission to the WMT18 Multimodal Translation Task

Title The MeMAD Submission to the WMT18 Multimodal Translation Task
Authors Stig-Arne Grönroos, Benoit Huet, Mikko Kurimo, Jorma Laaksonen, Bernard Merialdo, Phu Pham, Mats Sjöberg, Umut Sulubacak, Jörg Tiedemann, Raphael Troncy, Raúl Vázquez
Abstract This paper describes the MeMAD project entry to the WMT Multimodal Machine Translation Shared Task. We propose adapting the Transformer neural machine translation (NMT) architecture to a multi-modal setting. In this paper, we also describe the preliminary experiments with text-only translation systems leading us up to this choice. We have the top scoring system for both English-to-German and English-to-French, according to the automatic metrics for flickr18. Our experiments show that the effect of the visual features in our system is small. Our largest gains come from the quality of the underlying text-only NMT system. We find that appropriate use of additional data is effective.
Tasks Machine Translation, Multimodal Machine Translation
Published 2018-08-31
URL http://arxiv.org/abs/1808.10802v2
PDF http://arxiv.org/pdf/1808.10802v2.pdf
PWC https://paperswithcode.com/paper/the-memad-submission-to-the-wmt18-multimodal
Repo
Framework

Formulating Camera-Adaptive Color Constancy as a Few-shot Meta-Learning Problem

Title Formulating Camera-Adaptive Color Constancy as a Few-shot Meta-Learning Problem
Authors Steven McDonagh, Sarah Parisot, Fengwei Zhou, Xing Zhang, Ales Leonardis, Zhenguo Li, Gregory Slabaugh
Abstract Digital camera pipelines employ color constancy methods to estimate an unknown scene illuminant, in order to re-illuminate images as if they were acquired under an achromatic light source. Fully-supervised learning approaches exhibit state-of-the-art estimation accuracy with camera-specific labelled training imagery. Resulting models typically suffer from domain gaps and fail to generalise across imaging devices. In this work, we propose a new approach that affords fast adaptation to previously unseen cameras, and robustness to changes in capture device by leveraging annotated samples across different cameras and datasets. We present a general approach that utilizes the concept of color temperature to frame color constancy as a set of distinct, homogeneous few-shot regression tasks, each associated with an intuitive physical meaning. We integrate this novel formulation within a meta-learning framework, enabling fast generalisation to previously unseen cameras using only handfuls of camera specific training samples. Consequently, the time spent for data collection and annotation substantially diminishes in practice whenever a new sensor is used. To quantify this gain, we evaluate our pipeline on three publicly available datasets comprising 12 different cameras and diverse scene content. Our approach delivers competitive results both qualitatively and quantitatively while requiring a small fraction of the camera-specific samples compared to standard approaches.
Tasks Color Constancy, Few-Shot Camera-Adaptive Color Constancy, few-shot regression, Meta-Learning
Published 2018-11-28
URL http://arxiv.org/abs/1811.11788v2
PDF http://arxiv.org/pdf/1811.11788v2.pdf
PWC https://paperswithcode.com/paper/meta-learning-for-few-shot-camera-adaptive
Repo
Framework
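
The abstract frames illuminant estimation as a set of few-shot regression tasks trained with meta-learning, but does not spell out the algorithm here. As a generic illustration of the training pattern, the sketch below runs a first-order MAML-style inner/outer loop on synthetic few-shot linear-regression tasks; the task construction and model are placeholders, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 6                      # toy feature dimension (stand-in for image statistics)

def sample_task():
    """One synthetic few-shot regression task (stand-in for one camera /
    colour-temperature group): y = x @ w_task + noise."""
    w_task = rng.normal(size=D)
    def draw(n):
        x = rng.normal(size=(n, D))
        return x, x @ w_task + 0.01 * rng.normal(size=n)
    return draw

def loss_and_grad(w, x, y):
    err = x @ w - y
    return np.mean(err ** 2), 2 * x.T @ err / len(y)

w = np.zeros(D)                       # meta-parameters
inner_lr, outer_lr = 0.1, 0.05
for _ in range(500):                  # meta-training iterations
    draw = sample_task()
    xs, ys = draw(5)                  # small support set
    xq, yq = draw(20)                 # query set
    _, g = loss_and_grad(w, xs, ys)
    w_adapted = w - inner_lr * g      # one inner-loop adaptation step
    # first-order meta-update: move meta-parameters using the query gradient
    _, gq = loss_and_grad(w_adapted, xq, yq)
    w -= outer_lr * gq

# At test time, a handful of samples from an unseen "camera" adapts the model.
draw = sample_task()
xs, ys = draw(5)
_, g = loss_and_grad(w, xs, ys)
w_new = w - inner_lr * g
xq, yq = draw(20)
print("query MSE after adaptation:", loss_and_grad(w_new, xq, yq)[0])
```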

Unsupervised Control Through Non-Parametric Discriminative Rewards

Title Unsupervised Control Through Non-Parametric Discriminative Rewards
Authors David Warde-Farley, Tom Van de Wiele, Tejas Kulkarni, Catalin Ionescu, Steven Hansen, Volodymyr Mnih
Abstract Learning to control an environment without hand-crafted rewards or expert data remains challenging and is at the frontier of reinforcement learning research. We present an unsupervised learning algorithm to train agents to achieve perceptually-specified goals using only a stream of observations and actions. Our agent simultaneously learns a goal-conditioned policy and a goal achievement reward function that measures how similar a state is to the goal state. This dual optimization leads to a co-operative game, giving rise to a learned reward function that reflects similarity in controllable aspects of the environment instead of distance in the space of observations. We demonstrate the efficacy of our agent to learn, in an unsupervised manner, to reach a diverse set of goals on three domains – Atari, the DeepMind Control Suite and DeepMind Lab.
Tasks
Published 2018-11-28
URL http://arxiv.org/abs/1811.11359v1
PDF http://arxiv.org/pdf/1811.11359v1.pdf
PWC https://paperswithcode.com/paper/unsupervised-control-through-non-parametric
Repo
Framework
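
The central mechanism above is a learned goal-achievement reward that scores how similar a state is to the goal, trained jointly with a goal-conditioned policy. The sketch below shows only the reward side of that loop: a toy embedding plus a cosine similarity score. The embedding, the similarity measure, and the surrounding RL algorithm are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy embedding standing in for the learned achievement-reward model.
W = rng.normal(0, 0.1, (16, 8))

def embed(obs):
    return np.tanh(W @ obs)

def achievement_reward(state_obs, goal_obs):
    """Reward is a similarity between embeddings of the current state and the
    goal observation (cosine similarity here; the paper's measure may differ)."""
    s, g = embed(state_obs), embed(goal_obs)
    return float(s @ g / (np.linalg.norm(s) * np.linalg.norm(g) + 1e-8))

# A goal-conditioned policy would be trained to maximize this reward:
goal = rng.normal(size=8)
trajectory = [rng.normal(size=8) for _ in range(10)]     # observations from a rollout
rewards = [achievement_reward(obs, goal) for obs in trajectory]
print("per-step goal-achievement rewards:", np.round(rewards, 3))
```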

Deep Reinforcement Learning for Chinese Zero pronoun Resolution

Title Deep Reinforcement Learning for Chinese Zero pronoun Resolution
Authors Qingyu Yin, Yu Zhang, Weinan Zhang, Ting Liu, William Yang Wang
Abstract Deep neural network models for Chinese zero pronoun resolution learn semantic information for the zero pronoun and candidate antecedents, but tend to be short-sighted, often making local decisions. They typically predict coreference chains between the zero pronoun and a single candidate antecedent one link at a time, while overlooking their long-term influence on future decisions. Ideally, modeling useful information about preceding potential antecedents is critical when later predicting zero pronoun-candidate antecedent pairs. In this study, we show how to integrate local and global decision-making by exploiting deep reinforcement learning models. With the help of the reinforcement learning agent, our model learns a policy for selecting antecedents in a sequential manner, where useful information provided by earlier predicted antecedents can be utilized for making later coreference decisions. Experimental results on the OntoNotes 5.0 dataset show that our technique surpasses state-of-the-art models.
Tasks Chinese Zero Pronoun Resolution, Decision Making
Published 2018-06-10
URL http://arxiv.org/abs/1806.03711v2
PDF http://arxiv.org/pdf/1806.03711v2.pdf
PWC https://paperswithcode.com/paper/deep-reinforcement-learning-for-chinese-zero
Repo
Framework
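
The abstract describes a policy that selects antecedents sequentially and is trained with reinforcement learning so that earlier choices inform later ones. The sketch below is a generic REINFORCE-style loop over toy candidate antecedents, purely to illustrate the mechanics; the features, scoring network, reward, and data are placeholders, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8                                  # toy feature size for a (zero pronoun, candidate) pair
theta = np.zeros(D)                    # linear policy parameters

def policy_probs(cand_feats):
    scores = cand_feats @ theta
    e = np.exp(scores - scores.max())
    return e / e.sum()

def sample_episode(n_candidates=5):
    """One toy 'document': pick an antecedent for each of 3 zero pronouns in order."""
    feats, actions, gold = [], [], []
    for _ in range(3):
        c = rng.normal(size=(n_candidates, D))
        p = policy_probs(c)
        a = rng.choice(n_candidates, p=p)
        feats.append(c); actions.append(a); gold.append(rng.integers(n_candidates))
    reward = sum(a == g for a, g in zip(actions, gold)) / 3.0   # fraction correct
    return feats, actions, reward

lr = 0.05
for _ in range(200):                   # REINFORCE updates over toy episodes
    feats, actions, reward = sample_episode()
    for c, a in zip(feats, actions):
        p = policy_probs(c)
        grad_logp = c[a] - p @ c       # gradient of the log softmax policy
        theta += lr * reward * grad_logp
print("trained policy parameters:", np.round(theta, 3))
```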

Deep Morphing: Detecting bone structures in fluoroscopic X-ray images with prior knowledge

Title Deep Morphing: Detecting bone structures in fluoroscopic X-ray images with prior knowledge
Authors Aaron Pries, Peter J. Schreier, Artur Lamm, Stefan Pede, Jürgen Schmidt
Abstract We propose approaches based on deep learning to localize objects in images when only a small training dataset is available and the images have low quality. That applies to many problems in medical image processing, and in particular to the analysis of fluoroscopic (low-dose) X-ray images, where the images have low contrast. We solve the problem by incorporating high-level information about the objects, which could be a simple geometrical model, like a circular outline, or a more complex statistical model. A simple geometrical representation can sufficiently describe some objects and only requires minimal labeling. Statistical shape models can be used to represent more complex objects. We propose computationally efficient two-stage approaches, which we call deep morphing, for both representations by fitting the representation to the output of a deep segmentation network.
Tasks
Published 2018-08-09
URL http://arxiv.org/abs/1808.04441v2
PDF http://arxiv.org/pdf/1808.04441v2.pdf
PWC https://paperswithcode.com/paper/deep-morphing-detecting-bone-structures-in
Repo
Framework
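
The second stage described above fits a simple geometric representation, such as a circular outline, to the output of a deep segmentation network. As an illustration of that step, the sketch below fits a circle to the foreground pixels of a binary mask by algebraic least squares (Kasa fit); the actual fitting procedures and statistical shape models in the paper may differ.

```python
import numpy as np

def fit_circle(mask):
    """Algebraic least-squares (Kasa) circle fit to the foreground pixels of a
    binary mask. Uses x^2 + y^2 = 2*a*x + 2*b*y + c with c = r^2 - a^2 - b^2."""
    ys, xs = np.nonzero(mask)
    A = np.column_stack([2.0 * xs, 2.0 * ys, np.ones_like(xs, dtype=float)])
    rhs = xs.astype(float) ** 2 + ys.astype(float) ** 2
    (a, b, c), *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return a, b, np.sqrt(c + a ** 2 + b ** 2)     # centre x, centre y, radius

# Toy "segmentation output": a noisy ring around (30, 25) with radius ~12 in a 64x64 mask.
rng = np.random.default_rng(0)
yy, xx = np.mgrid[0:64, 0:64]
d2 = (xx - 30) ** 2 + (yy - 25) ** 2
mask = (d2 >= 11 ** 2) & (d2 <= 13 ** 2)
mask &= rng.random(mask.shape) > 0.2              # drop pixels to mimic imperfect segmentation

print(fit_circle(mask))                           # approximately (30.0, 25.0, 12.0)
```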

Graph-Based Deep Modeling and Real Time Forecasting of Sparse Spatio-Temporal Data

Title Graph-Based Deep Modeling and Real Time Forecasting of Sparse Spatio-Temporal Data
Authors Bao Wang, Xiyang Luo, Fangbo Zhang, Baichuan Yuan, Andrea L. Bertozzi, P. Jeffrey Brantingham
Abstract We present a generic framework for spatio-temporal (ST) data modeling, analysis, and forecasting, with a special focus on data that is sparse in both space and time. Our multi-scaled framework is a seamless coupling of two major components: a self-exciting point process that models the macroscale statistical behaviors of the ST data and a graph structured recurrent neural network (GSRNN) to discover the microscale patterns of the ST data on the inferred graph. This novel deep neural network (DNN) incorporates the real time interactions of the graph nodes to enable more accurate real time forecasting. The effectiveness of our method is demonstrated on both crime and traffic forecasting.
Tasks
Published 2018-04-02
URL http://arxiv.org/abs/1804.00684v1
PDF http://arxiv.org/pdf/1804.00684v1.pdf
PWC https://paperswithcode.com/paper/graph-based-deep-modeling-and-real-time
Repo
Framework
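
The macroscale component above is a self-exciting point process. As a generic illustration, the sketch below gives the standard exponential-kernel Hawkes conditional intensity and simulates it by Ogata thinning; the spatio-temporal extension and the parameter estimation used in the paper are not shown.

```python
import numpy as np

def hawkes_intensity(t, events, mu=0.5, alpha=0.8, beta=1.5):
    """Conditional intensity of a 1-D Hawkes (self-exciting) process:
    lambda(t) = mu + sum over past events of alpha * exp(-beta * (t - t_i))."""
    events = np.asarray(events)
    past = events[events < t]
    return mu + alpha * np.exp(-beta * (t - past)).sum()

def simulate_hawkes(T=50.0, mu=0.5, alpha=0.8, beta=1.5, seed=0):
    """Ogata thinning: propose events from an upper bound on the intensity
    and accept each with probability lambda(t) / bound."""
    rng = np.random.default_rng(seed)
    events, t = [], 0.0
    while t < T:
        bound = hawkes_intensity(t, events, mu, alpha, beta) + alpha
        t += rng.exponential(1.0 / bound)
        if t < T and rng.random() < hawkes_intensity(t, events, mu, alpha, beta) / bound:
            events.append(t)
    return np.array(events)

ev = simulate_hawkes()
print(len(ev), "events; intensity at T/2 =", round(hawkes_intensity(25.0, ev), 3))
```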

Why Self-Attention? A Targeted Evaluation of Neural Machine Translation Architectures

Title Why Self-Attention? A Targeted Evaluation of Neural Machine Translation Architectures
Authors Gongbo Tang, Mathias Müller, Annette Rios, Rico Sennrich
Abstract Recently, non-recurrent architectures (convolutional, self-attentional) have outperformed RNNs in neural machine translation. CNNs and self-attentional networks can connect distant words via shorter network paths than RNNs, and it has been speculated that this improves their ability to model long-range dependencies. However, this theoretical argument has not been tested empirically, nor have alternative explanations for their strong performance been explored in-depth. We hypothesize that the strong performance of CNNs and self-attentional networks could also be due to their ability to extract semantic features from the source text, and we evaluate RNNs, CNNs and self-attention networks on two tasks: subject-verb agreement (where capturing long-range dependencies is required) and word sense disambiguation (where semantic feature extraction is required). Our experimental results show that: 1) self-attentional networks and CNNs do not outperform RNNs in modeling subject-verb agreement over long distances; 2) self-attentional networks perform distinctly better than RNNs and CNNs on word sense disambiguation.
Tasks Machine Translation, Word Sense Disambiguation
Published 2018-08-27
URL http://arxiv.org/abs/1808.08946v3
PDF http://arxiv.org/pdf/1808.08946v3.pdf
PWC https://paperswithcode.com/paper/why-self-attention-a-targeted-evaluation-of
Repo
Framework

Recent Advances in Efficient Computation of Deep Convolutional Neural Networks

Title Recent Advances in Efficient Computation of Deep Convolutional Neural Networks
Authors Jian Cheng, Peisong Wang, Gang Li, Qinghao Hu, Hanqing Lu
Abstract Deep neural networks have evolved remarkably over the past few years and they are currently the fundamental tools of many intelligent systems. At the same time, the computational complexity and resource consumption of these networks also continue to increase. This will pose a significant challenge to the deployment of such networks, especially in real-time applications or on resource-limited devices. Thus, network acceleration has become a hot topic within the deep learning community. As for hardware implementation of deep neural networks, a number of FPGA/ASIC-based accelerators have been proposed in recent years. In this paper, we provide a comprehensive survey of recent advances in network acceleration, compression and accelerator design from both algorithm and hardware points of view. Specifically, we provide a thorough analysis of each of the following topics: network pruning, low-rank approximation, network quantization, teacher-student networks, compact network design and hardware accelerators. Finally, we introduce and discuss a few possible future directions.
Tasks Network Pruning, Quantization
Published 2018-02-03
URL http://arxiv.org/abs/1802.00939v2
PDF http://arxiv.org/pdf/1802.00939v2.pdf
PWC https://paperswithcode.com/paper/recent-advances-in-efficient-computation-of
Repo
Framework
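
Two of the techniques surveyed above, magnitude-based network pruning and uniform weight quantization, are simple enough to show in a few lines. The sketch below gives generic versions on a toy weight matrix; these are textbook illustrations, not any specific method from the survey.

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.7):
    """Zero out the smallest-magnitude fraction of weights (generic magnitude pruning)."""
    threshold = np.quantile(np.abs(weights), sparsity)
    return np.where(np.abs(weights) >= threshold, weights, 0.0)

def uniform_quantize(weights, num_bits=8):
    """Symmetric uniform quantization of weights to num_bits integers and back."""
    scale = np.abs(weights).max() / (2 ** (num_bits - 1) - 1)
    q = np.clip(np.round(weights / scale), -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1)
    return q * scale                       # de-quantized values actually used at inference

rng = np.random.default_rng(0)
w = rng.normal(0, 0.05, size=(256, 256))   # a toy fully-connected layer
w_pruned = magnitude_prune(w)
w_quant = uniform_quantize(w)
print("sparsity:", (w_pruned == 0).mean(),
      "max quantization error:", np.abs(w - w_quant).max())
```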

Region-Based Classification of PolSAR Data Using Radial Basis Kernel Functions With Stochastic Distances

Title Region-Based Classification of PolSAR Data Using Radial Basis Kernel Functions With Stochastic Distances
Authors R. G. Negri, A. C. Frery, W. B. Silva, T. S. G. Mendes, L. V. Dutra
Abstract Region-based classification of PolSAR data can be effectively performed by seeking the assignment that minimizes a distance between prototypes and segments. Silva et al (2013) used stochastic distances between complex multivariate Wishart models which, differently from other measures, are computationally tractable. In this work we assess the robustness of such an approach with respect to errors in the training stage, and propose an extension that alleviates such problems. We introduce robustness into the process by incorporating a combination of radial basis kernel functions and stochastic distances with Support Vector Machines (SVM). We consider several stochastic distances between Wishart models: Bhattacharyya, Kullback-Leibler, Chi-Square, Rényi, and Hellinger. We perform two case studies with PolSAR images, both simulated and from actual sensors, and different classification scenarios to compare the performance of the Minimum Distance and SVM classification frameworks. With this, we model the situation of imperfect training samples. We show that SVM with the proposed kernel functions achieves better performance with respect to Minimum Distance, at the expense of more computational resources and the need for parameter tuning. Code and data are provided for reproducibility.
Tasks
Published 2018-05-07
URL http://arxiv.org/abs/1805.07438v1
PDF http://arxiv.org/pdf/1805.07438v1.pdf
PWC https://paperswithcode.com/paper/region-based-classification-of-polsar-data
Repo
Framework
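
The classification scheme above plugs stochastic distances between Wishart models into a radial-basis kernel used by an SVM. The sketch below shows the general pattern with a precomputed kernel in scikit-learn, but substitutes a placeholder distance (a symmetrized Kullback-Leibler divergence between univariate Gaussians summarizing each toy segment) for the Wishart distances of the paper; the kernel form k = exp(-d / sigma) follows the abstract's idea, everything else is an assumption.

```python
import numpy as np
from sklearn.svm import SVC

def sym_kl_gauss(m1, s1, m2, s2):
    """Symmetrized Kullback-Leibler distance between two univariate Gaussians,
    a stand-in for the Wishart stochastic distances used in the paper."""
    kl12 = np.log(s2 / s1) + (s1 ** 2 + (m1 - m2) ** 2) / (2 * s2 ** 2) - 0.5
    kl21 = np.log(s1 / s2) + (s2 ** 2 + (m2 - m1) ** 2) / (2 * s1 ** 2) - 0.5
    return 0.5 * (kl12 + kl21)

def rbf_from_distance(A, B, sigma=1.0):
    """Radial-basis kernel built from a pairwise stochastic distance: k = exp(-d / sigma)."""
    D = np.array([[sym_kl_gauss(a[0], a[1], b[0], b[1]) for b in B] for a in A])
    return np.exp(-D / sigma)

# Toy "segments": each is summarized by the (mean, std) of its pixel values.
rng = np.random.default_rng(0)
def make_segments(mu, n):
    data = rng.normal(mu, 1.0, size=(n, 200))
    return np.column_stack([data.mean(axis=1), data.std(axis=1)])

X_train = np.vstack([make_segments(0.0, 30), make_segments(2.0, 30)])
y_train = np.array([0] * 30 + [1] * 30)
X_test = np.vstack([make_segments(0.0, 10), make_segments(2.0, 10)])
y_test = np.array([0] * 10 + [1] * 10)

clf = SVC(kernel="precomputed", C=10.0)
clf.fit(rbf_from_distance(X_train, X_train), y_train)
acc = (clf.predict(rbf_from_distance(X_test, X_train)) == y_test).mean()
print("test accuracy:", acc)
```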