Paper Group ANR 408
Determinantal Point Processes for Mini-Batch Diversification. Approximate Kernel-based Conditional Independence Tests for Fast Non-Parametric Causal Discovery. Joint Modeling of Event Sequence and Time Series with Attentional Twin Recurrent Neural Networks. Skepxels: Spatio-temporal Image Representation of Human Skeleton Joints for Action Recognition …
Determinantal Point Processes for Mini-Batch Diversification
Title | Determinantal Point Processes for Mini-Batch Diversification |
Authors | Cheng Zhang, Hedvig Kjellstrom, Stephan Mandt |
Abstract | We study a mini-batch diversification scheme for stochastic gradient descent (SGD). While classical SGD relies on uniformly sampling data points to form a mini-batch, we propose a non-uniform sampling scheme based on the Determinantal Point Process (DPP). The DPP relies on a similarity measure between data points and gives low probabilities to mini-batches which contain redundant data, and higher probabilities to mini-batches with more diverse data. This simultaneously balances the data and leads to stochastic gradients with lower variance. We term this approach Diversified Mini-Batch SGD (DM-SGD). We show that regular SGD and a biased version of stratified sampling emerge as special cases. Furthermore, DM-SGD generalizes stratified sampling to cases where no discrete features exist to bin the data into groups. We show experimentally that our method results in more interpretable and diverse features in unsupervised setups, and in better classification accuracies in supervised setups. |
Tasks | Point Processes |
Published | 2017-05-01 |
URL | http://arxiv.org/abs/1705.00607v2 |
http://arxiv.org/pdf/1705.00607v2.pdf | |
PWC | https://paperswithcode.com/paper/determinantal-point-processes-for-mini-batch |
Repo | |
Framework | |
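Below is a minimal, illustrative sketch of the k-DPP mini-batch sampling idea, not the authors' code. For a tiny dataset we can enumerate all k-subsets and sample one with probability proportional to det(L_S), which is the defining property of a k-DPP; practical implementations use eigendecomposition-based samplers instead. The RBF kernel and its bandwidth are assumptions.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 2))          # 8 toy data points in 2-D

# RBF similarity kernel L between data points (unit bandwidth is an assumption).
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
L = np.exp(-sq_dists / 2.0)

k = 3                                # mini-batch size
subsets = list(itertools.combinations(range(len(X)), k))
# k-DPP: P(S) is proportional to the determinant of the principal minor L_S.
weights = np.array([np.linalg.det(L[np.ix_(s, s)]) for s in subsets])
probs = weights / weights.sum()

batch = subsets[rng.choice(len(subsets), p=probs)]
print("sampled diverse mini-batch:", batch)
```

Subsets of mutually similar points have near-singular minors and hence near-zero determinants, which is exactly how redundant mini-batches are suppressed.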
Approximate Kernel-based Conditional Independence Tests for Fast Non-Parametric Causal Discovery
Title | Approximate Kernel-based Conditional Independence Tests for Fast Non-Parametric Causal Discovery |
Authors | Eric V. Strobl, Kun Zhang, Shyam Visweswaran |
Abstract | Constraint-based causal discovery (CCD) algorithms require fast and accurate conditional independence (CI) testing. The Kernel Conditional Independence Test (KCIT) is currently one of the most popular CI tests in the non-parametric setting, but many investigators cannot use KCIT with large datasets because the test scales cubically with sample size. We therefore devise two relaxations called the Randomized Conditional Independence Test (RCIT) and the Randomized Conditional Correlation Test (RCoT) which both approximate KCIT by utilizing random Fourier features. In practice, both of the proposed tests scale linearly with sample size and return accurate p-values much faster than KCIT in the large sample size context. CCD algorithms run with RCIT or RCoT also return graphs at least as accurate as the same algorithms run with KCIT but with large reductions in run time. |
Tasks | Causal Discovery |
Published | 2017-02-13 |
URL | http://arxiv.org/abs/1702.03877v2 |
http://arxiv.org/pdf/1702.03877v2.pdf | |
PWC | https://paperswithcode.com/paper/approximate-kernel-based-conditional |
Repo | |
Framework | |
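A sketch of the random Fourier feature (RFF) approximation that RCIT and RCoT build on (Rahimi and Recht's construction): an RBF kernel is replaced by an explicit low-dimensional feature map, so downstream computations scale linearly in sample size. The bandwidth and feature count below are illustrative assumptions, not values from the paper.

```python
import numpy as np

def rff(X, n_features=100, bandwidth=1.0, seed=0):
    """Map X of shape (n, d) to random Fourier features approximating an RBF kernel."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=1.0 / bandwidth, size=(d, n_features))
    b = rng.uniform(0, 2 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
Z = rff(X, n_features=2000)
K_approx = Z @ Z.T          # inner products of explicit features, linear in n
K_exact = np.exp(-((X[:, None] - X[None, :]) ** 2).sum(-1) / 2.0)
print("max kernel approximation error:", np.abs(K_approx - K_exact).max())
```

The CI tests then operate on these explicit features (e.g., via regression and residual correlation) instead of the full n-by-n kernel matrices that make KCIT cubic.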
Joint Modeling of Event Sequence and Time Series with Attentional Twin Recurrent Neural Networks
Title | Joint Modeling of Event Sequence and Time Series with Attentional Twin Recurrent Neural Networks |
Authors | Shuai Xiao, Junchi Yan, Mehrdad Farajtabar, Le Song, Xiaokang Yang, Hongyuan Zha |
Abstract | A variety of real-world processes (over networks) produce sequences of data whose complex temporal dynamics need to be studied. More specifically, the event timestamps can carry important information about the underlying network dynamics, which otherwise is not available from time series evenly sampled from continuous signals. Moreover, in most complex processes, event sequences and evenly sampled time series data can interact with each other, which renders joint modeling of these two sources of data necessary. To tackle the above problems, in this paper we utilize the rich framework of (temporal) point processes to model event data and update its intensity function in a timely manner via synergistic twin Recurrent Neural Networks (RNNs). In the proposed architecture, the intensity function is synergistically modulated by one RNN with asynchronous events as input and another RNN with time series as input. Furthermore, to enhance the interpretability of the model, an attention mechanism for the neural point process is introduced. The whole model, with event type and timestamp prediction output layers, can be trained end-to-end and allows a black-box treatment for modeling the intensity. We substantiate the superiority of our model on synthetic data and three real-world benchmark datasets. |
Tasks | Point Processes, Time Series |
Published | 2017-03-24 |
URL | http://arxiv.org/abs/1703.08524v1 |
http://arxiv.org/pdf/1703.08524v1.pdf | |
PWC | https://paperswithcode.com/paper/joint-modeling-of-event-sequence-and-time |
Repo | |
Framework | |
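A hedged sketch of the twin-RNN idea, not the authors' exact architecture: one RNN consumes asynchronous events, another consumes evenly sampled time series, and their hidden states jointly parameterize a point-process intensity. Layer sizes, the softplus link, and the alignment of the two streams are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TwinRNNIntensity(nn.Module):
    def __init__(self, n_event_types, ts_dim, hidden=32):
        super().__init__()
        self.event_emb = nn.Embedding(n_event_types, hidden)
        self.event_rnn = nn.GRU(hidden + 1, hidden, batch_first=True)  # +1 for inter-event gap
        self.ts_rnn = nn.GRU(ts_dim, hidden, batch_first=True)
        self.intensity_head = nn.Linear(2 * hidden, n_event_types)

    def forward(self, event_types, event_gaps, time_series):
        ev = torch.cat([self.event_emb(event_types), event_gaps.unsqueeze(-1)], dim=-1)
        h_event, _ = self.event_rnn(ev)
        h_ts, _ = self.ts_rnn(time_series)
        # Simplification: the last time-series state modulates every event step.
        h = torch.cat([h_event, h_ts[:, -1:].expand_as(h_event)], dim=-1)
        return nn.functional.softplus(self.intensity_head(h))  # positive intensities

model = TwinRNNIntensity(n_event_types=5, ts_dim=3)
lam = model(torch.randint(0, 5, (2, 7)), torch.rand(2, 7), torch.rand(2, 20, 3))
print(lam.shape)  # (2, 7, 5): per-event-step intensity for each event type
```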
Skepxels: Spatio-temporal Image Representation of Human Skeleton Joints for Action Recognition
Title | Skepxels: Spatio-temporal Image Representation of Human Skeleton Joints for Action Recognition |
Authors | Jian Liu, Naveed Akhtar, Ajmal Mian |
Abstract | Human skeleton joints are popular for action analysis since they can be easily extracted from videos to discard background noise. However, current skeleton representations do not fully benefit from machine learning with CNNs. We propose “Skepxels,” a spatio-temporal representation for skeleton sequences that fully exploits the “local” correlations between joints using the 2D convolution kernels of a CNN. We transform skeleton videos into images of flexible dimensions using Skepxels and develop a CNN-based framework for effective human action recognition using the resulting images. Skepxels encode rich spatio-temporal information about the skeleton joints in the frames by maximizing a unique distance metric, defined collaboratively over the distinct joint arrangements used in the skeletal image. Moreover, they are flexible in encoding compound semantic notions such as the location and speed of the joints. The proposed action recognition framework exploits the representation in a hierarchical manner, first capturing the micro-temporal relations between the skeleton joints with the Skepxels and then exploiting their macro-temporal relations by computing the Fourier Temporal Pyramids over the CNN features of the skeletal images. We extend the Inception-ResNet CNN architecture with the proposed method and improve the state-of-the-art accuracy by 4.4% on the large-scale NTU human activity dataset. On the medium-sized N-UCLA and UTD-MHAD datasets, our method outperforms the existing results by 5.7% and 9.3% respectively. |
Tasks | Temporal Action Localization |
Published | 2017-11-16 |
URL | http://arxiv.org/abs/1711.05941v4 |
http://arxiv.org/pdf/1711.05941v4.pdf | |
PWC | https://paperswithcode.com/paper/skepxels-spatio-temporal-image-representation |
Repo | |
Framework | |
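A simplified sketch of the Skepxel construction under one assumed reading: pack the 3D coordinates of all joints in one frame into a small "skeletal pixel" grid, then tile these grids over time to form an image a 2D CNN can consume. The paper additionally optimizes the joint arrangement via a distance metric; here a fixed joint order stands in for that step.

```python
import numpy as np

def skepxel_image(skeleton_seq, grid=(5, 5)):
    """skeleton_seq: (T, J, 3) joint coordinates; returns a (gh, gw*T, 3) image."""
    T, J, C = skeleton_seq.shape
    gh, gw = grid
    assert gh * gw >= J, "grid must hold all joints"
    frames = []
    for t in range(T):
        px = np.zeros((gh, gw, C))
        px.reshape(-1, C)[:J] = skeleton_seq[t]   # one joint per cell, fixed order
        frames.append(px)
    # Tile per-frame grids horizontally along the time axis.
    return np.concatenate(frames, axis=1)

seq = np.random.rand(10, 25, 3)   # 10 frames, 25 NTU-style joints, xyz coordinates
img = skepxel_image(seq)
print(img.shape)                  # (5, 50, 3): ready for 2D convolution kernels
```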
Learning flexible representations of stochastic processes on graphs
Title | Learning flexible representations of stochastic processes on graphs |
Authors | Addison Bohannon, Brian Sadler, Radu Balan |
Abstract | Graph convolutional networks adapt the architecture of convolutional neural networks to learn rich representations of data supported on arbitrary graphs by replacing the convolution operations of convolutional neural networks with graph-dependent linear operations. However, these graph-dependent linear operations are developed for scalar functions supported on undirected graphs. We propose a class of linear operations for stochastic (time-varying) processes on directed (or undirected) graphs to be used in graph convolutional networks. We propose a parameterization of such linear operations using functional calculus to achieve arbitrarily low learning complexity. The proposed approach is shown to model richer behaviors and display greater flexibility in learning representations than product graph methods. |
Tasks | |
Published | 2017-11-03 |
URL | http://arxiv.org/abs/1711.01191v2 |
http://arxiv.org/pdf/1711.01191v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-flexible-representations-of |
Repo | |
Framework | |
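A sketch in the spirit of the paper's functional-calculus parameterization: a linear operator on graph signals defined as a polynomial of a (possibly directed) shift matrix S, i.e. H = sum_k a_k S^k, with one coefficient learned per power. The degree and coefficients below are illustrative assumptions.

```python
import numpy as np

def graph_filter(S, coeffs):
    """Return H = coeffs[0]*I + coeffs[1]*S + coeffs[2]*S^2 + ..."""
    H = np.zeros_like(S)
    P = np.eye(S.shape[0])
    for a in coeffs:
        H += a * P
        P = P @ S
    return H

# Directed 4-cycle as the graph shift matrix (works for directed graphs,
# unlike Laplacian-based constructions that assume symmetry).
S = np.roll(np.eye(4), 1, axis=1)
H = graph_filter(S, coeffs=[0.5, 0.3, 0.2])
x = np.array([1.0, 0.0, 0.0, 0.0])   # impulse signal on node 0
print(H @ x)                          # response spreads along directed edges
```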
Sparse Hierarchical Regression with Polynomials
Title | Sparse Hierarchical Regression with Polynomials |
Authors | Dimitris Bertsimas, Bart Van Parys |
Abstract | We present a novel method for exact hierarchical sparse polynomial regression. Our regressor is the degree-$r$ polynomial which depends on at most $k$ inputs, counting at most $\ell$ monomial terms, and which minimizes the sum of the squares of its prediction errors. This hierarchical sparse specification aligns well with modern big data settings where many inputs are not relevant for prediction purposes and the functional complexity of the regressor needs to be controlled so as to avoid overfitting. We present a two-step approach to this hierarchical sparse regression problem. First, we discard irrelevant inputs using an extremely fast input ranking heuristic. Second, we take advantage of modern cutting plane methods for integer optimization to solve our resulting reduced hierarchical $(k, \ell)$-sparse problem exactly. The ability of our method to identify all $k$ relevant inputs and all $\ell$ monomial terms is shown empirically to experience a phase transition. Crucially, the same transition also presents itself in our ability to reject all irrelevant features and monomials. In the regime where our method is statistically powerful, its computational complexity is interestingly on par with Lasso-based heuristics. The presented work addresses the lack of powerful, disciplined nonlinear sparse regression methods in high-dimensional settings. Our method is shown empirically to scale to regression problems with $n\approx 10,000$ observations for input dimension $p\approx 1,000$. |
Tasks | |
Published | 2017-09-28 |
URL | http://arxiv.org/abs/1709.10030v1 |
http://arxiv.org/pdf/1709.10030v1.pdf | |
PWC | https://paperswithcode.com/paper/sparse-hierarchical-regression-with |
Repo | |
Framework | |
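A toy sketch of the two-step recipe described above, with stand-ins rather than the paper's exact components: step 1 screens inputs with a crude covariance ranking, and step 2 fits all monomials up to degree $r$ on the $k$ survivors by ordinary least squares in place of the exact mixed-integer optimization. The data-generating model below is synthetic.

```python
import numpy as np
from itertools import combinations_with_replacement

rng = np.random.default_rng(0)
n, p, k, r = 500, 50, 2, 2
X = rng.normal(size=(n, p))
y = 3 * X[:, 4] + 2 * X[:, 7] - X[:, 4] * X[:, 7] + rng.normal(scale=0.1, size=n)

# Step 1: fast input ranking (absolute covariance with y as a crude screen;
# the paper's ranking heuristic is more sophisticated).
score = np.abs((X * (y - y.mean())[:, None]).mean(0))
top_k = np.argsort(score)[-k:]

# Step 2: all monomials up to degree r on the k kept inputs, then plain OLS
# standing in for the exact cutting-plane step.
feats, names = [], []
for d in range(1, r + 1):
    for combo in combinations_with_replacement(sorted(top_k), d):
        feats.append(np.prod(X[:, list(combo)], axis=1))
        names.append(tuple(int(i) for i in combo))
F = np.column_stack(feats)
coef, *_ = np.linalg.lstsq(F, y, rcond=None)
for c, nm in sorted(zip(coef, names), key=lambda t: -abs(t[0]))[:3]:
    print(f"monomial over inputs {nm}: coefficient {c:.2f}")
```

On this toy problem the three largest coefficients recover the true monomials over inputs 4 and 7 with weights close to 3, 2, and -1.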
Novel Exploration Techniques (NETs) for Malaria Policy Interventions
Title | Novel Exploration Techniques (NETs) for Malaria Policy Interventions |
Authors | Oliver Bent, Sekou L. Remy, Stephen Roberts, Aisha Walcott-Bryant |
Abstract | The task of decision-making under uncertainty is daunting, especially for problems with significant complexity. Healthcare policy makers across the globe face such problems under challenging constraints, with limited tools to help them make data-driven decisions. In this work we frame the process of finding an optimal malaria policy as a stochastic multi-armed bandit problem, and implement three agent-based strategies to explore the policy space. We apply Gaussian Process regression to the findings of each agent, both for comparison and to account for stochastic results from simulating the spread of malaria in a fixed population. The generated policy spaces are compared with published results to give a direct reference to human expert decisions for the same simulated population. Our novel approach provides a powerful resource for policy makers, and a platform which can be readily extended to capture more nuanced future policy spaces. |
Tasks | Decision Making, Decision Making Under Uncertainty |
Published | 2017-12-01 |
URL | http://arxiv.org/abs/1712.00428v1 |
http://arxiv.org/pdf/1712.00428v1.pdf | |
PWC | https://paperswithcode.com/paper/novel-exploration-techniques-nets-for-malaria |
Repo | |
Framework | |
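A hedged toy sketch of the setup described above: candidate intervention policies are treated as bandit arms with noisy rewards, explored here with epsilon-greedy, and the observations are then smoothed with Gaussian Process regression. The reward function and all parameters are synthetic stand-ins, not a malaria simulator or the paper's agents.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
arms = np.linspace(0, 1, 11)                 # e.g., bed-net coverage levels
true_reward = lambda a: -(a - 0.7) ** 2      # unknown to the agent
noisy_pull = lambda a: true_reward(a) + rng.normal(scale=0.05)

counts, sums, history = np.zeros(len(arms)), np.zeros(len(arms)), []
for t in range(200):                         # epsilon-greedy exploration
    if rng.random() < 0.2 or counts.min() == 0:
        i = rng.integers(len(arms))
    else:
        i = int(np.argmax(sums / counts))
    r = noisy_pull(arms[i])
    counts[i] += 1; sums[i] += r; history.append((arms[i], r))

# GP regression over the noisy pulls smooths the explored policy space.
A, R = np.array(history)[:, 0:1], np.array(history)[:, 1]
gp = GaussianProcessRegressor(kernel=RBF(0.2), alpha=0.05 ** 2).fit(A, R)
best = arms[np.argmax(gp.predict(arms[:, None]))]
print("estimated best policy:", best)
```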
Tight Bounds for Bandit Combinatorial Optimization
Title | Tight Bounds for Bandit Combinatorial Optimization |
Authors | Alon Cohen, Tamir Hazan, Tomer Koren |
Abstract | We revisit the study of optimal regret rates in bandit combinatorial optimization, a fundamental framework for sequential decision making under uncertainty that abstracts numerous combinatorial prediction problems. We prove that the attainable regret in this setting grows as $\widetilde{\Theta}(k^{3/2}\sqrt{dT})$, where $d$ is the dimension of the problem and $k$ is a bound on the maximal instantaneous loss, disproving a conjecture of Audibert, Bubeck, and Lugosi (2013), who argued that the optimal rate should be of the form $\widetilde{\Theta}(k\sqrt{dT})$. Our bounds apply to several important instances of the framework and, in particular, imply a tight bound for the well-studied bandit shortest path problem. By that, we also resolve an open problem posed by Cesa-Bianchi and Lugosi (2012). |
Tasks | Combinatorial Optimization, Decision Making, Decision Making Under Uncertainty |
Published | 2017-02-24 |
URL | http://arxiv.org/abs/1702.07539v1 |
http://arxiv.org/pdf/1702.07539v1.pdf | |
PWC | https://paperswithcode.com/paper/tight-bounds-for-bandit-combinatorial |
Repo | |
Framework | |
Leveraging Discourse Information Effectively for Authorship Attribution
Title | Leveraging Discourse Information Effectively for Authorship Attribution |
Authors | Su Wang, Elisa Ferracane, Raymond J. Mooney |
Abstract | We explore techniques to maximize the effectiveness of discourse information in the task of authorship attribution. We present a novel method to embed discourse features in a Convolutional Neural Network text classifier, which achieves a state-of-the-art result by a substantial margin. We empirically investigate several featurization methods to understand the conditions under which discourse features contribute non-trivial performance gains, and analyze discourse embeddings. |
Tasks | |
Published | 2017-09-07 |
URL | http://arxiv.org/abs/1709.02271v1 |
http://arxiv.org/pdf/1709.02271v1.pdf | |
PWC | https://paperswithcode.com/paper/leveraging-discourse-information-effectively |
Repo | |
Framework | |
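A minimal sketch of one way to embed discourse features in a CNN text classifier, an assumed reading rather than the authors' exact model: a per-token discourse-feature embedding (e.g., the RST relation of the enclosing span) is concatenated with the word embedding before the convolution. Vocabulary sizes and dimensions are illustrative.

```python
import torch
import torch.nn as nn

class DiscourseCNN(nn.Module):
    def __init__(self, vocab=5000, n_discourse=20, d_word=100, d_disc=20, n_authors=10):
        super().__init__()
        self.word_emb = nn.Embedding(vocab, d_word)
        self.disc_emb = nn.Embedding(n_discourse, d_disc)   # discourse label per token
        self.conv = nn.Conv1d(d_word + d_disc, 64, kernel_size=3, padding=1)
        self.fc = nn.Linear(64, n_authors)

    def forward(self, words, disc):
        x = torch.cat([self.word_emb(words), self.disc_emb(disc)], dim=-1)
        h = torch.relu(self.conv(x.transpose(1, 2)))        # (batch, 64, seq)
        return self.fc(h.max(dim=2).values)                 # max-over-time pooling

model = DiscourseCNN()
logits = model(torch.randint(0, 5000, (4, 50)), torch.randint(0, 20, (4, 50)))
print(logits.shape)  # (4, 10): per-author scores for 4 documents
```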
A Deep Convolutional Neural Network for Background Subtraction
Title | A Deep Convolutional Neural Network for Background Subtraction |
Authors | Mohammadreza Babaee, Duc Tung Dinh, Gerhard Rigoll |
Abstract | In this work, we present a novel background subtraction system that uses a deep Convolutional Neural Network (CNN) to perform the segmentation. With this approach, feature engineering and parameter tuning become unnecessary since the network parameters can be learned from data by training a single CNN that can handle various video scenes. Additionally, we propose a new approach to estimate the background model from video. For the training of the CNN, we randomly selected 5 percent of the video frames and their ground truth segmentations taken from the Change Detection challenge 2014 (CDnet 2014). We also utilized spatial-median filtering as post-processing of the network outputs. Our method is evaluated on different datasets, and the network outperforms the existing algorithms with respect to the average ranking over different evaluation metrics. Furthermore, due to the network architecture, our CNN is capable of real-time processing. |
Tasks | Feature Engineering |
Published | 2017-02-06 |
URL | http://arxiv.org/abs/1702.01731v1 |
http://arxiv.org/pdf/1702.01731v1.pdf | |
PWC | https://paperswithcode.com/paper/a-deep-convolutional-neural-network-for |
Repo | |
Framework | |
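A hedged sketch of the pipeline described above, with stand-in components: a background image is estimated (a temporal median here; the paper proposes its own estimator), a small CNN maps (frame, background) pairs to a foreground probability map, and the output is cleaned with spatial-median filtering. The toy network below is not the paper's architecture.

```python
import torch
import torch.nn as nn
from scipy.ndimage import median_filter

class FGSegNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(                 # toy stand-in for the paper's CNN
            nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, frame, background):
        return self.net(torch.cat([frame, background], dim=1))

video = torch.rand(30, 1, 64, 64)                 # 30 grayscale frames
background = video.median(dim=0).values.unsqueeze(0)   # (1, 1, 64, 64) model
model = FGSegNet()
prob = model(video[:1], background)               # foreground probabilities
mask = median_filter((prob[0, 0] > 0.5).numpy().astype(float), size=5)
print(mask.shape)  # (64, 64) cleaned binary foreground mask
```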
Deep Reinforcement Learning for Event-Driven Multi-Agent Decision Processes
Title | Deep Reinforcement Learning for Event-Driven Multi-Agent Decision Processes |
Authors | Kunal Menda, Yi-Chun Chen, Justin Grana, James W. Bono, Brendan D. Tracey, Mykel J. Kochenderfer, David Wolpert |
Abstract | The incorporation of macro-actions (temporally extended actions) into multi-agent decision problems has the potential to address the curse of dimensionality associated with such decision problems. Since macro-actions last for stochastic durations, multiple agents executing decentralized policies in cooperative environments must act asynchronously. We present an algorithm that modifies generalized advantage estimation for temporally extended actions, allowing a state-of-the-art policy optimization algorithm to optimize policies in Dec-POMDPs in which agents act asynchronously. We show that our algorithm is capable of learning optimal policies in two cooperative domains, one involving real-time bus holding control and one involving wildfire fighting with unmanned aircraft. Our algorithm works by framing problems as “event-driven decision processes,” which are scenarios in which the sequence and timing of actions and events are random and governed by an underlying stochastic process. In addition to optimizing policies with continuous state and action spaces, our algorithm also facilitates the use of event-driven simulators, which do not require time to be discretized into time-steps. We demonstrate the benefit of using event-driven simulation in the context of multiple agents taking asynchronous actions. We show that fixed time-step simulation risks obfuscating the sequence in which closely separated events occur, adversely affecting the policies learned. In addition, we show that arbitrarily shrinking the time-step scales poorly with the number of agents. |
Tasks | |
Published | 2017-09-19 |
URL | https://arxiv.org/abs/1709.06656v2 |
https://arxiv.org/pdf/1709.06656v2.pdf | |
PWC | https://paperswithcode.com/paper/deep-reinforcement-learning-for-event-driven |
Repo | |
Framework | |
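A sketch of advantage estimation with temporally extended actions in the spirit described above, an illustrative reading rather than the authors' exact algorithm: because macro-actions last stochastic durations, each temporal-difference term is discounted by gamma raised to that action's duration instead of a fixed per-step gamma.

```python
import numpy as np

def macro_gae(rewards, values, durations, gamma=0.99, lam=0.95):
    """rewards[i]: reward accumulated over macro-action i;
    values: value estimates at decision points (length n+1, incl. bootstrap);
    durations[i]: elapsed time of macro-action i."""
    n = len(rewards)
    adv = np.zeros(n)
    running = 0.0
    for i in reversed(range(n)):
        g = gamma ** durations[i]                   # duration-aware discount
        delta = rewards[i] + g * values[i + 1] - values[i]
        running = delta + g * lam * running         # GAE recursion with variable g
        adv[i] = running
    return adv

adv = macro_gae(rewards=[1.0, 0.5, 2.0],
                values=[0.8, 0.9, 1.1, 0.0],
                durations=[3, 1, 5])
print(adv)
```

With all durations equal to 1 this reduces to standard generalized advantage estimation, which is the consistency check for the modification.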
Distributed algorithm for empty vehicles management in personal rapid transit (PRT) network
Title | Distributed algorithm for empty vehicles management in personal rapid transit (PRT) network |
Authors | Wiktor B. Daszczuk, Jerzy Mieścicki, Waldemar Grabski |
Abstract | In this paper, an original heuristic algorithm for empty vehicle management in a personal rapid transit network is presented. The algorithm is used for the delivery of empty vehicles to waiting passengers, for balancing the distribution of empty vehicles within the network, and for providing empty space for vehicles approaching a station. Each of these tasks involves a decision on the trip that has to be made by a selected empty vehicle from its current location to some determined destination. The decisions are based on a multi-parameter function involving a set of factors and thresholds. An important feature of the algorithm is that it does not use any central database of passenger input (demand) and locations of free vehicles. Instead, it is based on the local exchange of data between stations: on their states and on the vehicles they expect. Therefore, it seems well-suited to a distributed implementation. The algorithm is uniform, meaning that the same basic procedure is used for multiple tasks with a task-specific set of parameters. |
Tasks | |
Published | 2017-10-17 |
URL | http://arxiv.org/abs/1710.06331v1 |
http://arxiv.org/pdf/1710.06331v1.pdf | |
PWC | https://paperswithcode.com/paper/distributed-algorithm-for-empty-vehicles |
Repo | |
Framework | |
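A toy sketch of the kind of multi-parameter dispatch decision described above; the state fields, weights, and threshold are all invented for illustration, not taken from the paper. Each station scores candidate trips for a free vehicle using only locally exchanged neighbour state, and dispatches when the best score clears a threshold.

```python
from dataclasses import dataclass

@dataclass
class StationState:            # data a station shares with its neighbours
    name: str
    waiting_passengers: int
    empty_vehicles: int
    expected_arrivals: int

def trip_score(origin: StationState, dest: StationState, distance: float,
               w_demand=2.0, w_balance=1.0, w_dist=0.5):
    # Unserved demand at the destination, surplus at the origin, travel cost.
    demand = dest.waiting_passengers - dest.empty_vehicles - dest.expected_arrivals
    surplus = origin.empty_vehicles - origin.waiting_passengers
    return w_demand * demand + w_balance * surplus - w_dist * distance

a = StationState("A", waiting_passengers=0, empty_vehicles=3, expected_arrivals=0)
b = StationState("B", waiting_passengers=4, empty_vehicles=0, expected_arrivals=1)
c = StationState("C", waiting_passengers=1, empty_vehicles=1, expected_arrivals=0)

THRESHOLD = 1.0
best = max((trip_score(a, s, d), s.name) for s, d in [(b, 2.0), (c, 1.0)])
if best[0] > THRESHOLD:
    print(f"dispatch empty vehicle from A to {best[1]} (score {best[0]:.1f})")
```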
The Prediction Advantage: A Universally Meaningful Performance Measure for Classification and Regression
Title | The Prediction Advantage: A Universally Meaningful Performance Measure for Classification and Regression |
Authors | Ran El-Yaniv, Yonatan Geifman, Yair Wiener |
Abstract | We introduce the Prediction Advantage (PA), a novel performance measure for prediction functions under any loss function (e.g., classification or regression). The PA is defined as the performance advantage relative to the Bayesian risk restricted to knowing only the distribution of the labels. We derive the PA for well-known loss functions, including 0/1 loss, cross-entropy loss, absolute loss, and squared loss. In the latter case, the PA is identical to the well-known R-squared measure, widely used in statistics. The use of the PA ensures meaningful quantification of prediction performance, which is not guaranteed, for example, when dealing with noisy imbalanced classification problems. We argue that among several known alternative performance measures, PA is the best (and only) quantity ensuring meaningfulness for all noise and imbalance levels. |
Tasks | |
Published | 2017-05-23 |
URL | http://arxiv.org/abs/1705.08499v2 |
http://arxiv.org/pdf/1705.08499v2.pdf | |
PWC | https://paperswithcode.com/paper/the-prediction-advantage-a-universally |
Repo | |
Framework | |
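A numeric sketch of the Prediction Advantage under squared loss, where, as the abstract notes, it coincides with R-squared: the baseline is the best predictor that knows only the label distribution (the label mean), and PA is the relative risk reduction over that baseline. The data below are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(size=1000)
y_hat = y + rng.normal(scale=0.5, size=1000)   # some predictor's outputs

risk_model = np.mean((y - y_hat) ** 2)         # squared-loss risk of the model
risk_marginal = np.mean((y - y.mean()) ** 2)   # risk of predicting the label mean
pa = 1.0 - risk_model / risk_marginal          # Prediction Advantage = R^2 here
print(f"PA (= R^2 for squared loss): {pa:.3f}")
```

For other losses the same ratio is taken against the corresponding label-only Bayes predictor, e.g., the majority class under 0/1 loss.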
Combining Discrete and Neural Features for Sequence Labeling
Title | Combining Discrete and Neural Features for Sequence Labeling |
Authors | Jie Yang, Zhiyang Teng, Meishan Zhang, Yue Zhang |
Abstract | Neural network models have recently received intense research attention in the natural language processing community. Compared with traditional models with discrete features, neural models have two main advantages. First, they take low-dimensional, real-valued embedding vectors as inputs, which can be trained over large amounts of raw data, thereby addressing the issue of feature sparsity in discrete models. Second, deep neural networks can automatically combine input features, including non-local features that capture semantic patterns which cannot be expressed using discrete indicator features. As a result, neural network models have achieved competitive accuracies compared with the best discrete models for a range of NLP tasks. On the other hand, manual feature templates have been carefully investigated for most NLP tasks over decades and typically cover the most useful indicator patterns for solving the problems. Such information can be complementary to the features automatically induced by neural networks, and therefore combining discrete and neural features can potentially lead to better accuracy compared with models that leverage discrete or neural features only. In this paper, we systematically investigate the effect of discrete and neural feature combination for a range of fundamental NLP tasks based on sequence labeling, including word segmentation, POS tagging, and named entity recognition for Chinese and English. Our results on standard benchmarks show that state-of-the-art neural models can give accuracies comparable to the best discrete models in the literature for most tasks, and that combining discrete and neural features consistently yields better results. |
Tasks | Named Entity Recognition |
Published | 2017-08-24 |
URL | http://arxiv.org/abs/1708.07279v1 |
http://arxiv.org/pdf/1708.07279v1.pdf | |
PWC | https://paperswithcode.com/paper/combining-discrete-and-neural-features-for |
Repo | |
Framework | |
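A minimal sketch of one combination scheme, an assumed reading rather than the authors' exact model: a sparse indicator-feature vector per token (discrete template hits) is concatenated with a BiLSTM state before the per-token label scores. Dimensions and tag counts are illustrative.

```python
import torch
import torch.nn as nn

class HybridTagger(nn.Module):
    def __init__(self, vocab=5000, d_emb=50, d_hid=64, n_discrete=200, n_tags=9):
        super().__init__()
        self.emb = nn.Embedding(vocab, d_emb)
        self.lstm = nn.LSTM(d_emb, d_hid, bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * d_hid + n_discrete, n_tags)

    def forward(self, tokens, discrete_feats):
        h, _ = self.lstm(self.emb(tokens))                  # neural features
        combined = torch.cat([h, discrete_feats], dim=-1)   # + indicator features
        return self.out(combined)                           # per-token tag scores

model = HybridTagger()
tokens = torch.randint(0, 5000, (2, 12))
feats = (torch.rand(2, 12, 200) < 0.02).float()             # sparse template hits
print(model(tokens, feats).shape)                           # (2, 12, 9)
```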
Efficient Contextual Bandits in Non-stationary Worlds
Title | Efficient Contextual Bandits in Non-stationary Worlds |
Authors | Haipeng Luo, Chen-Yu Wei, Alekh Agarwal, John Langford |
Abstract | Most contextual bandit algorithms minimize regret against the best fixed policy, a questionable benchmark for the non-stationary environments that are ubiquitous in applications. In this work, we develop several efficient contextual bandit algorithms for non-stationary environments by equipping existing methods for i.i.d. problems with sophisticated statistical tests so as to dynamically adapt to a change in distribution. We analyze various standard notions of regret suited to non-stationary environments for these algorithms, including interval regret, switching regret, and dynamic regret. When competing with the best policy at each time, one of our algorithms achieves regret $\mathcal{O}(\sqrt{ST})$ if there are $T$ rounds with $S$ stationary periods, or more generally $\mathcal{O}(\Delta^{1/3}T^{2/3})$ where $\Delta$ is some non-stationarity measure. These results almost match the optimal guarantees achieved by an inefficient baseline that is a variant of the classic Exp4 algorithm. The dynamic regret result is also the first for efficient and fully adversarial contextual bandits. Furthermore, while the results above require tuning a parameter based on the unknown quantity $S$ or $\Delta$, we also develop a parameter-free algorithm achieving regret $\min\{S^{1/4}T^{3/4}, \Delta^{1/5}T^{4/5}\}$. This improves and generalizes the best existing result $\Delta^{0.18}T^{0.82}$ by Karnin and Anava (2016), which only holds for the two-armed bandit problem. |
Tasks | Multi-Armed Bandits |
Published | 2017-08-05 |
URL | http://arxiv.org/abs/1708.01799v4 |
http://arxiv.org/pdf/1708.01799v4.pdf | |
PWC | https://paperswithcode.com/paper/efficient-contextual-bandits-in-non |
Repo | |
Framework | |
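A toy sketch of the high-level recipe above, not the paper's algorithms: run an i.i.d.-style bandit learner, monitor its recent reward against its long-run average with a simple statistical test, and restart the learner when a change is detected. The two-armed setting, the change test, and all thresholds are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def pull(arm, t):
    means = [0.2, 0.8] if t < 1000 else [0.8, 0.2]   # distribution shift at t=1000
    return float(rng.random() < means[arm])

counts, sums = np.zeros(2), np.zeros(2)
recent = []
for t in range(2000):
    # UCB-style arm choice with Laplace-smoothed means.
    arm = int(np.argmax((sums + 1.0) / (counts + 2.0) +
                        np.sqrt(2 * np.log(t + 2) / (counts + 1))))
    r = pull(arm, t)
    counts[arm] += 1; sums[arm] += r; recent.append(r)
    # Crude change test: compare a recent reward window against the baseline.
    if len(recent) >= 200:
        window = np.mean(recent[-100:])
        baseline = np.mean(recent[:-100])
        if abs(window - baseline) > 0.25:
            counts[:], sums[:] = 0, 0; recent = []   # restart the learner
            print(f"change detected, restarting at t={t}")
```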