July 28, 2019

3093 words 15 mins read

Paper Group ANR 408

Determinantal Point Processes for Mini-Batch Diversification. Approximate Kernel-based Conditional Independence Tests for Fast Non-Parametric Causal Discovery. Joint Modeling of Event Sequence and Time Series with Attentional Twin Recurrent Neural Networks. Skepxels: Spatio-temporal Image Representation of Human Skeleton Joints for Action Recogniti …

Determinantal Point Processes for Mini-Batch Diversification


Title	Determinantal Point Processes for Mini-Batch Diversification
Authors	Cheng Zhang, Hedvig Kjellstrom, Stephan Mandt
Abstract	We study a mini-batch diversification scheme for stochastic gradient descent (SGD). While classical SGD relies on uniformly sampling data points to form a mini-batch, we propose a non-uniform sampling scheme based on the Determinantal Point Process (DPP). The DPP relies on a similarity measure between data points and gives low probabilities to mini-batches which contain redundant data, and higher probabilities to mini-batches with more diverse data. This simultaneously balances the data and leads to stochastic gradients with lower variance. We term this approach Diversified Mini-Batch SGD (DM-SGD). We show that regular SGD and a biased version of stratified sampling emerge as special cases. Furthermore, DM-SGD generalizes stratified sampling to cases where no discrete features exist to bin the data into groups. We show experimentally that our method results more interpretable and diverse features in unsupervised setups, and in better classification accuracies in supervised setups.
Tasks	Point Processes
Published	2017-05-01
URL	http://arxiv.org/abs/1705.00607v2
PDF	http://arxiv.org/pdf/1705.00607v2.pdf
PWC	https://paperswithcode.com/paper/determinantal-point-processes-for-mini-batch
Repo
Framework

Approximate Kernel-based Conditional Independence Tests for Fast Non-Parametric Causal Discovery


Title	Approximate Kernel-based Conditional Independence Tests for Fast Non-Parametric Causal Discovery
Authors	Eric V. Strobl, Kun Zhang, Shyam Visweswaran
Abstract	Constraint-based causal discovery (CCD) algorithms require fast and accurate conditional independence (CI) testing. The Kernel Conditional Independence Test (KCIT) is currently one of the most popular CI tests in the non-parametric setting, but many investigators cannot use KCIT with large datasets because the test scales cubicly with sample size. We therefore devise two relaxations called the Randomized Conditional Independence Test (RCIT) and the Randomized conditional Correlation Test (RCoT) which both approximate KCIT by utilizing random Fourier features. In practice, both of the proposed tests scale linearly with sample size and return accurate p-values much faster than KCIT in the large sample size context. CCD algorithms run with RCIT or RCoT also return graphs at least as accurate as the same algorithms run with KCIT but with large reductions in run time.
Tasks	Causal Discovery
Published	2017-02-13
URL	http://arxiv.org/abs/1702.03877v2
PDF	http://arxiv.org/pdf/1702.03877v2.pdf
PWC	https://paperswithcode.com/paper/approximate-kernel-based-conditional
Repo
Framework

Joint Modeling of Event Sequence and Time Series with Attentional Twin Recurrent Neural Networks


Title	Joint Modeling of Event Sequence and Time Series with Attentional Twin Recurrent Neural Networks
Authors	Shuai Xiao, Junchi Yan, Mehrdad Farajtabar, Le Song, Xiaokang Yang, Hongyuan Zha
Abstract	A variety of real-world processes (over networks) produce sequences of data whose complex temporal dynamics need to be studied. More especially, the event timestamps can carry important information about the underlying network dynamics, which otherwise are not available from the time-series evenly sampled from continuous signals. Moreover, in most complex processes, event sequences and evenly-sampled times series data can interact with each other, which renders joint modeling of those two sources of data necessary. To tackle the above problems, in this paper, we utilize the rich framework of (temporal) point processes to model event data and timely update its intensity function by the synergic twin Recurrent Neural Networks (RNNs). In the proposed architecture, the intensity function is synergistically modulated by one RNN with asynchronous events as input and another RNN with time series as input. Furthermore, to enhance the interpretability of the model, the attention mechanism for the neural point process is introduced. The whole model with event type and timestamp prediction output layers can be trained end-to-end and allows a black-box treatment for modeling the intensity. We substantiate the superiority of our model in synthetic data and three real-world benchmark datasets.
Tasks	Point Processes, Time Series
Published	2017-03-24
URL	http://arxiv.org/abs/1703.08524v1
PDF	http://arxiv.org/pdf/1703.08524v1.pdf
PWC	https://paperswithcode.com/paper/joint-modeling-of-event-sequence-and-time
Repo
Framework

Skepxels: Spatio-temporal Image Representation of Human Skeleton Joints for Action Recognition


Title	Skepxels: Spatio-temporal Image Representation of Human Skeleton Joints for Action Recognition
Authors	Jian Liu, Naveed Akhtar, Ajmal Mian
Abstract	Human skeleton joints are popular for action analysis since they can be easily extracted from videos to discard background noises. However, current skeleton representations do not fully benefit from machine learning with CNNs. We propose “Skepxels” a spatio-temporal representation for skeleton sequences to fully exploit the “local” correlations between joints using the 2D convolution kernels of CNN. We transform skeleton videos into images of flexible dimensions using Skepxels and develop a CNN-based framework for effective human action recognition using the resulting images. Skepxels encode rich spatio-temporal information about the skeleton joints in the frames by maximizing a unique distance metric, defined collaboratively over the distinct joint arrangements used in the skeletal image. Moreover, they are flexible in encoding compound semantic notions such as location and speed of the joints. The proposed action recognition exploits the representation in a hierarchical manner by first capturing the micro-temporal relations between the skeleton joints with the Skepxels and then exploiting their macro-temporal relations by computing the Fourier Temporal Pyramids over the CNN features of the skeletal images. We extend the Inception-ResNet CNN architecture with the proposed method and improve the state-of-the-art accuracy by 4.4% on the large scale NTU human activity dataset. On the medium-sized N-UCLA and UTH-MHAD datasets, our method outperforms the existing results by 5.7% and 9.3% respectively.
Tasks	Temporal Action Localization
Published	2017-11-16
URL	http://arxiv.org/abs/1711.05941v4
PDF	http://arxiv.org/pdf/1711.05941v4.pdf
PWC	https://paperswithcode.com/paper/skepxels-spatio-temporal-image-representation
Repo
Framework

Learning flexible representations of stochastic processes on graphs


Title	Learning flexible representations of stochastic processes on graphs
Authors	Addison Bohannon, Brian Sadler, Radu Balan
Abstract	Graph convolutional networks adapt the architecture of convolutional neural networks to learn rich representations of data supported on arbitrary graphs by replacing the convolution operations of convolutional neural networks with graph-dependent linear operations. However, these graph-dependent linear operations are developed for scalar functions supported on undirected graphs. We propose a class of linear operations for stochastic (time-varying) processes on directed (or undirected) graphs to be used in graph convolutional networks. We propose a parameterization of such linear operations using functional calculus to achieve arbitrarily low learning complexity. The proposed approach is shown to model richer behaviors and display greater flexibility in learning representations than product graph methods.
Tasks
Published	2017-11-03
URL	http://arxiv.org/abs/1711.01191v2
PDF	http://arxiv.org/pdf/1711.01191v2.pdf
PWC	https://paperswithcode.com/paper/learning-flexible-representations-of
Repo
Framework

Sparse Hierarchical Regression with Polynomials


Title	Sparse Hierarchical Regression with Polynomials
Authors	Dimitris Bertsimas, Bart Van Parys
Abstract	We present a novel method for exact hierarchical sparse polynomial regression. Our regressor is that degree $r$ polynomial which depends on at most $k$ inputs, counting at most $\ell$ monomial terms, which minimizes the sum of the squares of its prediction errors. The previous hierarchical sparse specification aligns well with modern big data settings where many inputs are not relevant for prediction purposes and the functional complexity of the regressor needs to be controlled as to avoid overfitting. We present a two-step approach to this hierarchical sparse regression problem. First, we discard irrelevant inputs using an extremely fast input ranking heuristic. Secondly, we take advantage of modern cutting plane methods for integer optimization to solve our resulting reduced hierarchical $(k, \ell)$-sparse problem exactly. The ability of our method to identify all $k$ relevant inputs and all $\ell$ monomial terms is shown empirically to experience a phase transition. Crucially, the same transition also presents itself in our ability to reject all irrelevant features and monomials as well. In the regime where our method is statistically powerful, its computational complexity is interestingly on par with Lasso based heuristics. The presented work fills a void in terms of a lack of powerful disciplined nonlinear sparse regression methods in high-dimensional settings. Our method is shown empirically to scale to regression problems with $n\approx 10,000$ observations for input dimension $p\approx 1,000$.
Tasks
Published	2017-09-28
URL	http://arxiv.org/abs/1709.10030v1
PDF	http://arxiv.org/pdf/1709.10030v1.pdf
PWC	https://paperswithcode.com/paper/sparse-hierarchical-regression-with
Repo
Framework

Novel Exploration Techniques (NETs) for Malaria Policy Interventions


Title	Novel Exploration Techniques (NETs) for Malaria Policy Interventions
Authors	Oliver Bent, Sekou L. Remy, Stephen Roberts, Aisha Walcott-Bryant
Abstract	The task of decision-making under uncertainty is daunting, especially for problems which have significant complexity. Healthcare policy makers across the globe are facing problems under challenging constraints, with limited tools to help them make data driven decisions. In this work we frame the process of finding an optimal malaria policy as a stochastic multi-armed bandit problem, and implement three agent based strategies to explore the policy space. We apply a Gaussian Process regression to the findings of each agent, both for comparison and to account for stochastic results from simulating the spread of malaria in a fixed population. The generated policy spaces are compared with published results to give a direct reference with human expert decisions for the same simulated population. Our novel approach provides a powerful resource for policy makers, and a platform which can be readily extended to capture future more nuanced policy spaces.
Tasks	Decision Making, Decision Making Under Uncertainty
Published	2017-12-01
URL	http://arxiv.org/abs/1712.00428v1
PDF	http://arxiv.org/pdf/1712.00428v1.pdf
PWC	https://paperswithcode.com/paper/novel-exploration-techniques-nets-for-malaria
Repo
Framework

Tight Bounds for Bandit Combinatorial Optimization


Title	Tight Bounds for Bandit Combinatorial Optimization
Authors	Alon Cohen, Tamir Hazan, Tomer Koren
Abstract	We revisit the study of optimal regret rates in bandit combinatorial optimization—a fundamental framework for sequential decision making under uncertainty that abstracts numerous combinatorial prediction problems. We prove that the attainable regret in this setting grows as $\widetilde{\Theta}(k^{3/2}\sqrt{dT})$ where $d$ is the dimension of the problem and $k$ is a bound over the maximal instantaneous loss, disproving a conjecture of Audibert, Bubeck, and Lugosi (2013) who argued that the optimal rate should be of the form $\widetilde{\Theta}(k\sqrt{dT})$. Our bounds apply to several important instances of the framework, and in particular, imply a tight bound for the well-studied bandit shortest path problem. By that, we also resolve an open problem posed by Cesa-Bianchi and Lugosi (2012).
Tasks	Combinatorial Optimization, Decision Making, Decision Making Under Uncertainty
Published	2017-02-24
URL	http://arxiv.org/abs/1702.07539v1
PDF	http://arxiv.org/pdf/1702.07539v1.pdf
PWC	https://paperswithcode.com/paper/tight-bounds-for-bandit-combinatorial
Repo
Framework

Leveraging Discourse Information Effectively for Authorship Attribution


Title	Leveraging Discourse Information Effectively for Authorship Attribution
Authors	Su Wang, Elisa Ferracane, Raymond J. Mooney
Abstract	We explore techniques to maximize the effectiveness of discourse information in the task of authorship attribution. We present a novel method to embed discourse features in a Convolutional Neural Network text classifier, which achieves a state-of-the-art result by a substantial margin. We empirically investigate several featurization methods to understand the conditions under which discourse features contribute non-trivial performance gains, and analyze discourse embeddings.
Tasks
Published	2017-09-07
URL	http://arxiv.org/abs/1709.02271v1
PDF	http://arxiv.org/pdf/1709.02271v1.pdf
PWC	https://paperswithcode.com/paper/leveraging-discourse-information-effectively
Repo
Framework

A Deep Convolutional Neural Network for Background Subtraction


Title	A Deep Convolutional Neural Network for Background Subtraction
Authors	Mohammadreza Babaee, Duc Tung Dinh, Gerhard Rigoll
Abstract	In this work, we present a novel background subtraction system that uses a deep Convolutional Neural Network (CNN) to perform the segmentation. With this approach, feature engineering and parameter tuning become unnecessary since the network parameters can be learned from data by training a single CNN that can handle various video scenes. Additionally, we propose a new approach to estimate background model from video. For the training of the CNN, we employed randomly 5 percent video frames and their ground truth segmentations taken from the Change Detection challenge 2014(CDnet 2014). We also utilized spatial-median filtering as the post-processing of the network outputs. Our method is evaluated with different data-sets, and the network outperforms the existing algorithms with respect to the average ranking over different evaluation metrics. Furthermore, due to the network architecture, our CNN is capable of real time processing.
Tasks	Feature Engineering
Published	2017-02-06
URL	http://arxiv.org/abs/1702.01731v1
PDF	http://arxiv.org/pdf/1702.01731v1.pdf
PWC	https://paperswithcode.com/paper/a-deep-convolutional-neural-network-for
Repo
Framework

Deep Reinforcement Learning for Event-Driven Multi-Agent Decision Processes


Title	Deep Reinforcement Learning for Event-Driven Multi-Agent Decision Processes
Authors	Kunal Menda, Yi-Chun Chen, Justin Grana, James W. Bono, Brendan D. Tracey, Mykel J. Kochenderfer, David Wolpert
Abstract	The incorporation of macro-actions (temporally extended actions) into multi-agent decision problems has the potential to address the curse of dimensionality associated with such decision problems. Since macro-actions last for stochastic durations, multiple agents executing decentralized policies in cooperative environments must act asynchronously. We present an algorithm that modifies generalized advantage estimation for temporally extended actions, allowing a state-of-the-art policy optimization algorithm to optimize policies in Dec-POMDPs in which agents act asynchronously. We show that our algorithm is capable of learning optimal policies in two cooperative domains, one involving real-time bus holding control and one involving wildfire fighting with unmanned aircraft. Our algorithm works by framing problems as “event-driven decision processes,” which are scenarios in which the sequence and timing of actions and events are random and governed by an underlying stochastic process. In addition to optimizing policies with continuous state and action spaces, our algorithm also facilitates the use of event-driven simulators, which do not require time to be discretized into time-steps. We demonstrate the benefit of using event-driven simulation in the context of multiple agents taking asynchronous actions. We show that fixed time-step simulation risks obfuscating the sequence in which closely separated events occur, adversely affecting the policies learned. In addition, we show that arbitrarily shrinking the time-step scales poorly with the number of agents.
Tasks
Published	2017-09-19
URL	https://arxiv.org/abs/1709.06656v2
PDF	https://arxiv.org/pdf/1709.06656v2.pdf
PWC	https://paperswithcode.com/paper/deep-reinforcement-learning-for-event-driven
Repo
Framework

Distributed algorithm for empty vehicles management in personal rapid transit (PRT) network


Title	Distributed algorithm for empty vehicles management in personal rapid transit (PRT) network
Authors	Wiktor B. Daszczuk, Jerzy Mieścicki, Waldemar Grabski
Abstract	In this paper, an original heuristic algorithm of empty vehicles management in personal rapid transit network is presented. The algorithm is used for the delivery of empty vehicles for waiting passengers, for balancing the distribution of empty vehicles within the network, and for providing an empty space for vehicles approaching a station. Each of these tasks involves a decision on the trip that has to be done by a selected empty vehicle from its actual location to some determined destination. The decisions are based on a multi-parameter function involving a set of factors and thresholds. An important feature of the algorithm is that it does not use any central database of passenger input (demand) and locations of free vehicles. Instead, it is based on the local exchange of data between stations: on their states and on the vehicles they expect. Therefore, it seems well-tailored for a distributed implementation. The algorithm is uniform, meaning that the same basic procedure is used for multiple tasks using a task-specific set of parameters.
Tasks
Published	2017-10-17
URL	http://arxiv.org/abs/1710.06331v1
PDF	http://arxiv.org/pdf/1710.06331v1.pdf
PWC	https://paperswithcode.com/paper/distributed-algorithm-for-empty-vehicles
Repo
Framework

The Prediction Advantage: A Universally Meaningful Performance Measure for Classification and Regression


Title	The Prediction Advantage: A Universally Meaningful Performance Measure for Classification and Regression
Authors	Ran El-Yaniv, Yonatan Geifman, Yair Wiener
Abstract	We introduce the Prediction Advantage (PA), a novel performance measure for prediction functions under any loss function (e.g., classification or regression). The PA is defined as the performance advantage relative to the Bayesian risk restricted to knowing only the distribution of the labels. We derive the PA for well-known loss functions, including 0/1 loss, cross-entropy loss, absolute loss, and squared loss. In the latter case, the PA is identical to the well-known R-squared measure, widely used in statistics. The use of the PA ensures meaningful quantification of prediction performance, which is not guaranteed, for example, when dealing with noisy imbalanced classification problems. We argue that among several known alternative performance measures, PA is the best (and only) quantity ensuring meaningfulness for all noise and imbalance levels.
Tasks
Published	2017-05-23
URL	http://arxiv.org/abs/1705.08499v2
PDF	http://arxiv.org/pdf/1705.08499v2.pdf
PWC	https://paperswithcode.com/paper/the-prediction-advantage-a-universally
Repo
Framework

Combining Discrete and Neural Features for Sequence Labeling


Title	Combining Discrete and Neural Features for Sequence Labeling
Authors	Jie Yang, Zhiyang Teng, Meishan Zhang, Yue Zhang
Abstract	Neural network models have recently received heated research attention in the natural language processing community. Compared with traditional models with discrete features, neural models have two main advantages. First, they take low-dimensional, real-valued embedding vectors as inputs, which can be trained over large raw data, thereby addressing the issue of feature sparsity in discrete models. Second, deep neural networks can be used to automatically combine input features, and including non-local features that capture semantic patterns that cannot be expressed using discrete indicator features. As a result, neural network models have achieved competitive accuracies compared with the best discrete models for a range of NLP tasks. On the other hand, manual feature templates have been carefully investigated for most NLP tasks over decades and typically cover the most useful indicator pattern for solving the problems. Such information can be complementary the features automatically induced from neural networks, and therefore combining discrete and neural features can potentially lead to better accuracy compared with models that leverage discrete or neural features only. In this paper, we systematically investigate the effect of discrete and neural feature combination for a range of fundamental NLP tasks based on sequence labeling, including word segmentation, POS tagging and named entity recognition for Chinese and English, respectively. Our results on standard benchmarks show that state-of-the-art neural models can give accuracies comparable to the best discrete models in the literature for most tasks and combing discrete and neural features unanimously yield better results.
Tasks	Named Entity Recognition
Published	2017-08-24
URL	http://arxiv.org/abs/1708.07279v1
PDF	http://arxiv.org/pdf/1708.07279v1.pdf
PWC	https://paperswithcode.com/paper/combining-discrete-and-neural-features-for
Repo
Framework

Efficient Contextual Bandits in Non-stationary Worlds


Title	Efficient Contextual Bandits in Non-stationary Worlds
Authors	Haipeng Luo, Chen-Yu Wei, Alekh Agarwal, John Langford
Abstract	Most contextual bandit algorithms minimize regret against the best fixed policy, a questionable benchmark for non-stationary environments that are ubiquitous in applications. In this work, we develop several efficient contextual bandit algorithms for non-stationary environments by equipping existing methods for i.i.d. problems with sophisticated statistical tests so as to dynamically adapt to a change in distribution. We analyze various standard notions of regret suited to non-stationary environments for these algorithms, including interval regret, switching regret, and dynamic regret. When competing with the best policy at each time, one of our algorithms achieves regret $\mathcal{O}(\sqrt{ST})$ if there are $T$ rounds with $S$ stationary periods, or more generally $\mathcal{O}(\Delta^{1/3}T^{2/3})$ where $\Delta$ is some non-stationarity measure. These results almost match the optimal guarantees achieved by an inefficient baseline that is a variant of the classic Exp4 algorithm. The dynamic regret result is also the first one for efficient and fully adversarial contextual bandit. Furthermore, while the results above require tuning a parameter based on the unknown quantity $S$ or $\Delta$, we also develop a parameter free algorithm achieving regret $\min{S^{1/4}T^{3/4}, \Delta^{1/5}T^{4/5}}$. This improves and generalizes the best existing result $\Delta^{0.18}T^{0.82}$ by Karnin and Anava (2016) which only holds for the two-armed bandit problem.
Tasks	Multi-Armed Bandits
Published	2017-08-05
URL	http://arxiv.org/abs/1708.01799v4
PDF	http://arxiv.org/pdf/1708.01799v4.pdf
PWC	https://paperswithcode.com/paper/efficient-contextual-bandits-in-non
Repo
Framework