May 5, 2019

2584 words 13 mins read

Paper Group ANR 481

State Duration and Interval Modeling in Hidden Semi-Markov Model for Sequential Data Analysis. Fairness in Reinforcement Learning. Backprop KF: Learning Discriminative Deterministic State Estimators. Optimal and Adaptive Off-policy Evaluation in Contextual Bandits. Binomial Checkpointing for Arbitrary Programs with No User Annotation. A scaled Breg …

State Duration and Interval Modeling in Hidden Semi-Markov Model for Sequential Data Analysis


Title	State Duration and Interval Modeling in Hidden Semi-Markov Model for Sequential Data Analysis
Authors	Hiromi Narimatsu, Hiroyuki Kasai
Abstract	Sequential data modeling and analysis have become indispensable tools for analyzing sequential data, such as time-series data, because larger amounts of sensed event data have become available. These methods capture the sequential structure of data of interest, such as input-output relations and correlation among datasets. However, because most studies in this area are specialized or limited to their respective applications, rigorous requirement analysis of such models has not been undertaken from a general perspective. Therefore, we particularly examine the structure of sequential data, and extract the necessity of `state duration' and` state interval’ of events for efficient and rich representation of sequential data. Specifically addressing the hidden semi-Markov model (HSMM) that represents such state duration inside a model, we attempt to add representational capability of a state interval of events onto HSMM. To this end, we propose two extended models: an interval state hidden semi-Markov model (IS-HSMM) to express the length of a state interval with a special state node designated as “interval state node”; and an interval length probability hidden semi-Markov model (ILP-HSMM) which represents the length of the state interval with a new probabilistic parameter “interval length probability.” Exhaustive simulations have revealed superior performance of the proposed models in comparison with HSMM. These proposed models are the first reported extensions of HMM to support state interval representation as well as state duration representation.
Tasks	Time Series
Published	2016-08-24
URL	http://arxiv.org/abs/1608.06954v2
PDF	http://arxiv.org/pdf/1608.06954v2.pdf
PWC	https://paperswithcode.com/paper/state-duration-and-interval-modeling-in
Repo
Framework

Fairness in Reinforcement Learning


Title	Fairness in Reinforcement Learning
Authors	Shahin Jabbari, Matthew Joseph, Michael Kearns, Jamie Morgenstern, Aaron Roth
Abstract	We initiate the study of fairness in reinforcement learning, where the actions of a learning algorithm may affect its environment and future rewards. Our fairness constraint requires that an algorithm never prefers one action over another if the long-term (discounted) reward of choosing the latter action is higher. Our first result is negative: despite the fact that fairness is consistent with the optimal policy, any learning algorithm satisfying fairness must take time exponential in the number of states to achieve non-trivial approximation to the optimal policy. We then provide a provably fair polynomial time algorithm under an approximate notion of fairness, thus establishing an exponential gap between exact and approximate fairness
Tasks
Published	2016-11-09
URL	http://arxiv.org/abs/1611.03071v4
PDF	http://arxiv.org/pdf/1611.03071v4.pdf
PWC	https://paperswithcode.com/paper/fairness-in-reinforcement-learning
Repo
Framework

Backprop KF: Learning Discriminative Deterministic State Estimators


Title	Backprop KF: Learning Discriminative Deterministic State Estimators
Authors	Tuomas Haarnoja, Anurag Ajay, Sergey Levine, Pieter Abbeel
Abstract	Generative state estimators based on probabilistic filters and smoothers are one of the most popular classes of state estimators for robots and autonomous vehicles. However, generative models have limited capacity to handle rich sensory observations, such as camera images, since they must model the entire distribution over sensor readings. Discriminative models do not suffer from this limitation, but are typically more complex to train as latent variable models for state estimation. We present an alternative approach where the parameters of the latent state distribution are directly optimized as a deterministic computation graph, resulting in a simple and effective gradient descent algorithm for training discriminative state estimators. We show that this procedure can be used to train state estimators that use complex input, such as raw camera images, which must be processed using expressive nonlinear function approximators such as convolutional neural networks. Our model can be viewed as a type of recurrent neural network, and the connection to probabilistic filtering allows us to design a network architecture that is particularly well suited for state estimation. We evaluate our approach on synthetic tracking task with raw image inputs and on the visual odometry task in the KITTI dataset. The results show significant improvement over both standard generative approaches and regular recurrent neural networks.
Tasks	Autonomous Vehicles, Latent Variable Models, Visual Odometry
Published	2016-05-23
URL	http://arxiv.org/abs/1605.07148v4
PDF	http://arxiv.org/pdf/1605.07148v4.pdf
PWC	https://paperswithcode.com/paper/backprop-kf-learning-discriminative
Repo
Framework

Optimal and Adaptive Off-policy Evaluation in Contextual Bandits


Title	Optimal and Adaptive Off-policy Evaluation in Contextual Bandits
Authors	Yu-Xiang Wang, Alekh Agarwal, Miroslav Dudik
Abstract	We study the off-policy evaluation problem—estimating the value of a target policy using data collected by another policy—under the contextual bandit model. We consider the general (agnostic) setting without access to a consistent model of rewards and establish a minimax lower bound on the mean squared error (MSE). The bound is matched up to constants by the inverse propensity scoring (IPS) and doubly robust (DR) estimators. This highlights the difficulty of the agnostic contextual setting, in contrast with multi-armed bandits and contextual bandits with access to a consistent reward model, where IPS is suboptimal. We then propose the SWITCH estimator, which can use an existing reward model (not necessarily consistent) to achieve a better bias-variance tradeoff than IPS and DR. We prove an upper bound on its MSE and demonstrate its benefits empirically on a diverse collection of data sets, often outperforming prior work by orders of magnitude.
Tasks	Multi-Armed Bandits
Published	2016-12-04
URL	http://arxiv.org/abs/1612.01205v2
PDF	http://arxiv.org/pdf/1612.01205v2.pdf
PWC	https://paperswithcode.com/paper/optimal-and-adaptive-off-policy-evaluation-in
Repo
Framework

Binomial Checkpointing for Arbitrary Programs with No User Annotation


Title	Binomial Checkpointing for Arbitrary Programs with No User Annotation
Authors	Jeffrey Mark Siskind, Barak A. Pearlmutter
Abstract	Heretofore, automatic checkpointing at procedure-call boundaries, to reduce the space complexity of reverse mode, has been provided by systems like Tapenade. However, binomial checkpointing, or treeverse, has only been provided in Automatic Differentiation (AD) systems in special cases, e.g., through user-provided pragmas on DO loops in Tapenade, or as the nested taping mechanism in adol-c for time integration processes, which requires that user code be refactored. We present a framework for applying binomial checkpointing to arbitrary code with no special annotation or refactoring required. This is accomplished by applying binomial checkpointing directly to a program trace. This trace is produced by a general-purpose checkpointing mechanism that is orthogonal to AD.
Tasks
Published	2016-11-10
URL	http://arxiv.org/abs/1611.03410v1
PDF	http://arxiv.org/pdf/1611.03410v1.pdf
PWC	https://paperswithcode.com/paper/binomial-checkpointing-for-arbitrary-programs
Repo
Framework

A scaled Bregman theorem with applications


Title	A scaled Bregman theorem with applications
Authors	Richard Nock, Aditya Krishna Menon, Cheng Soon Ong
Abstract	Bregman divergences play a central role in the design and analysis of a range of machine learning algorithms. This paper explores the use of Bregman divergences to establish reductions between such algorithms and their analyses. We present a new scaled isodistortion theorem involving Bregman divergences (scaled Bregman theorem for short) which shows that certain “Bregman distortions’” (employing a potentially non-convex generator) may be exactly re-written as a scaled Bregman divergence computed over transformed data. Admissible distortions include geodesic distances on curved manifolds and projections or gauge-normalisation, while admissible data include scalars, vectors and matrices. Our theorem allows one to leverage to the wealth and convenience of Bregman divergences when analysing algorithms relying on the aforementioned Bregman distortions. We illustrate this with three novel applications of our theorem: a reduction from multi-class density ratio to class-probability estimation, a new adaptive projection free yet norm-enforcing dual norm mirror descent algorithm, and a reduction from clustering on flat manifolds to clustering on curved manifolds. Experiments on each of these domains validate the analyses and suggest that the scaled Bregman theorem might be a worthy addition to the popular handful of Bregman divergence properties that have been pervasive in machine learning.
Tasks
Published	2016-07-01
URL	http://arxiv.org/abs/1607.00360v1
PDF	http://arxiv.org/pdf/1607.00360v1.pdf
PWC	https://paperswithcode.com/paper/a-scaled-bregman-theorem-with-applications
Repo
Framework

Markov substitute processes : a new model for linguistics and beyond


Title	Markov substitute processes : a new model for linguistics and beyond
Authors	Olivier Catoni, Thomas Mainguy
Abstract	We introduce Markov substitute processes, a new model at the crossroad of statistics and formal grammars, and prove its main property : Markov substitute processes with a given support form an exponential family.
Tasks
Published	2016-03-25
URL	http://arxiv.org/abs/1603.07850v1
PDF	http://arxiv.org/pdf/1603.07850v1.pdf
PWC	https://paperswithcode.com/paper/markov-substitute-processes-a-new-model-for
Repo
Framework

Trajectory probability hypothesis density filter


Title	Trajectory probability hypothesis density filter
Authors	Ángel F. García-Fernández, Lennart Svensson
Abstract	This paper presents the probability hypothesis density (PHD) filter for sets of trajectories: the trajectory probability density (TPHD) filter. The TPHD filter is capable of estimating trajectories in a principled way without requiring to evaluate all measurement-to-target association hypotheses. The TPHD filter is based on recursively obtaining the best Poisson approximation to the multitrajectory filtering density in the sense of minimising the Kullback-Leibler divergence. We also propose a Gaussian mixture implementation of the TPHD recursion. Finally, we include simulation results to show the performance of the proposed algorithm.
Tasks
Published	2016-05-24
URL	http://arxiv.org/abs/1605.07264v2
PDF	http://arxiv.org/pdf/1605.07264v2.pdf
PWC	https://paperswithcode.com/paper/trajectory-probability-hypothesis-density
Repo
Framework

Investigation Into The Effectiveness Of Long Short Term Memory Networks For Stock Price Prediction


Title	Investigation Into The Effectiveness Of Long Short Term Memory Networks For Stock Price Prediction
Authors	Hengjian Jia
Abstract	The effectiveness of long short term memory networks trained by backpropagation through time for stock price prediction is explored in this paper. A range of different architecture LSTM networks are constructed trained and tested.
Tasks	Stock Price Prediction
Published	2016-03-25
URL	http://arxiv.org/abs/1603.07893v3
PDF	http://arxiv.org/pdf/1603.07893v3.pdf
PWC	https://paperswithcode.com/paper/investigation-into-the-effectiveness-of-long
Repo
Framework

A Statistical Test for Joint Distributions Equivalence


Title	A Statistical Test for Joint Distributions Equivalence
Authors	Francesco Solera, Andrea Palazzi
Abstract	We provide a distribution-free test that can be used to determine whether any two joint distributions $p$ and $q$ are statistically different by inspection of a large enough set of samples. Following recent efforts from Long et al. [1], we rely on joint kernel distribution embedding to extend the kernel two-sample test of Gretton et al. [2] to the case of joint probability distributions. Our main result can be directly applied to verify if a dataset-shift has occurred between training and test distributions in a learning framework, without further assuming the shift has occurred only in the input, in the target or in the conditional distribution.
Tasks
Published	2016-07-25
URL	http://arxiv.org/abs/1607.07270v1
PDF	http://arxiv.org/pdf/1607.07270v1.pdf
PWC	https://paperswithcode.com/paper/a-statistical-test-for-joint-distributions
Repo
Framework

Question Retrieval for Community-based Question Answering via Heterogeneous Network Integration Learning


Title	Question Retrieval for Community-based Question Answering via Heterogeneous Network Integration Learning
Authors	Zheqian Chen, Chi Zhang, Zhou Zhao, Deng Cai
Abstract	Community based question answering platforms have attracted substantial users to share knowledge and learn from each other. As the rapid enlargement of CQA platforms, quantities of overlapped questions emerge, which makes users confounded to select a proper reference. It is urgent for us to take effective automated algorithms to reuse historical questions with corresponding answers. In this paper we focus on the problem with question retrieval, which aims to match historical questions that are relevant or semantically equivalent to resolve one s query directly. The challenges in this task are the lexical gaps between questions for the word ambiguity and word mismatch problem. Furthermore, limited words in queried sentences cause sparsity of word features. To alleviate these challenges, we propose a novel framework named HNIL which encodes not only the question contents but also the askers social interactions to enhance the question embedding performance. More specifically, we apply random walk based learning method with recurrent neural network to match the similarities between askers question and historical questions proposed by other users. Extensive experiments on a large scale dataset from a real world CQA site show that employing the heterogeneous social network information outperforms the other state of the art solutions in this task.
Tasks	Question Answering
Published	2016-11-24
URL	http://arxiv.org/abs/1611.08135v1
PDF	http://arxiv.org/pdf/1611.08135v1.pdf
PWC	https://paperswithcode.com/paper/question-retrieval-for-community-based
Repo
Framework

Deep Gate Recurrent Neural Network


Title	Deep Gate Recurrent Neural Network
Authors	Yuan Gao, Dorota Glowacka
Abstract	This paper introduces two recurrent neural network structures called Simple Gated Unit (SGU) and Deep Simple Gated Unit (DSGU), which are general structures for learning long term dependencies. Compared to traditional Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU), both structures require fewer parameters and less computation time in sequence classification tasks. Unlike GRU and LSTM, which require more than one gates to control information flow in the network, SGU and DSGU only use one multiplicative gate to control the flow of information. We show that this difference can accelerate the learning speed in tasks that require long dependency information. We also show that DSGU is more numerically stable than SGU. In addition, we also propose a standard way of representing inner structure of RNN called RNN Conventional Graph (RCG), which helps analyzing the relationship between input units and hidden units of RNN.
Tasks
Published	2016-04-11
URL	http://arxiv.org/abs/1604.02910v3
PDF	http://arxiv.org/pdf/1604.02910v3.pdf
PWC	https://paperswithcode.com/paper/deep-gate-recurrent-neural-network
Repo
Framework

Vanishing point detection with convolutional neural networks


Title	Vanishing point detection with convolutional neural networks
Authors	Ali Borji
Abstract	Inspired by the finding that vanishing point (road tangent) guides driver’s gaze, in our previous work we showed that vanishing point attracts gaze during free viewing of natural scenes as well as in visual search (Borji et al., Journal of Vision 2016). We have also introduced improved saliency models using vanishing point detectors (Feng et al., WACV 2016). Here, we aim to predict vanishing points in naturalistic environments by training convolutional neural networks in an end-to-end manner over a large set of road images downloaded from Youtube with vanishing points annotated. Results demonstrate effectiveness of our approach compared to classic approaches of vanishing point detection in the literature.
Tasks
Published	2016-09-04
URL	http://arxiv.org/abs/1609.00967v1
PDF	http://arxiv.org/pdf/1609.00967v1.pdf
PWC	https://paperswithcode.com/paper/vanishing-point-detection-with-convolutional
Repo
Framework

Credibilistic TOPSIS Model for Evaluation and Selection of Municipal Solid Waste Disposal Methods


Title	Credibilistic TOPSIS Model for Evaluation and Selection of Municipal Solid Waste Disposal Methods
Authors	Jagannath Roy, Krishnendu Adhikary, Samarjit Kar
Abstract	Municipal solid waste management (MSWM) is a challenging issue of urban development in developing countries. Each country having different socio-economic-environmental background, might not accept a particular disposal method as the optimal choice. Selection of suitable disposal method in MSWM, under vague and imprecise information can be considered as multi criteria decision making problem (MCDM). In the present paper, TOPSIS (Technique for Order Preference by Similarity to Ideal Solution) methodology is extended based on credibility theory for evaluating the performances of MSW disposal methods under some criteria fixed by experts. The proposed model helps decision makers to choose a preferable alternative for their municipal area. A sensitivity analysis by our proposed model confirms this fact.
Tasks	Decision Making
Published	2016-06-29
URL	http://arxiv.org/abs/1606.08965v3
PDF	http://arxiv.org/pdf/1606.08965v3.pdf
PWC	https://paperswithcode.com/paper/credibilistic-topsis-model-for-evaluation-and
Repo
Framework

Reinforcement Learning of POMDPs using Spectral Methods


Title	Reinforcement Learning of POMDPs using Spectral Methods
Authors	Kamyar Azizzadenesheli, Alessandro Lazaric, Animashree Anandkumar
Abstract	We propose a new reinforcement learning algorithm for partially observable Markov decision processes (POMDP) based on spectral decomposition methods. While spectral methods have been previously employed for consistent learning of (passive) latent variable models such as hidden Markov models, POMDPs are more challenging since the learner interacts with the environment and possibly changes the future observations in the process. We devise a learning algorithm running through episodes, in each episode we employ spectral techniques to learn the POMDP parameters from a trajectory generated by a fixed policy. At the end of the episode, an optimization oracle returns the optimal memoryless planning policy which maximizes the expected reward based on the estimated POMDP model. We prove an order-optimal regret bound with respect to the optimal memoryless policy and efficient scaling with respect to the dimensionality of observation and action spaces.
Tasks	Latent Variable Models
Published	2016-02-25
URL	http://arxiv.org/abs/1602.07764v2
PDF	http://arxiv.org/pdf/1602.07764v2.pdf
PWC	https://paperswithcode.com/paper/reinforcement-learning-of-pomdps-using
Repo
Framework