Paper Group ANR 404
Reinforcement Learning with Policy Mixture Model for Temporal Point Processes Clustering
Title | Reinforcement Learning with Policy Mixture Model for Temporal Point Processes Clustering |
Authors | Weichang Wu, Junchi Yan, Xiaokang Yang, Hongyuan Zha |
Abstract | Temporal point processes are an expressive tool for modeling event sequences over time. In this paper, we take a reinforcement learning view whereby the observed sequences are assumed to be generated from a mixture of latent policies. The purpose is to cluster sequences with different temporal patterns into the underlying policies while learning each policy model. The flexibility of our model lies in: i) all the components are networks, including the policy network for modeling the intensity function of the temporal point process; ii) to handle varying-length event sequences, we resort to inverse reinforcement learning by decomposing the observed sequence into states (RNN hidden embeddings of history) and actions (the time interval to the next event) in order to learn the reward function, thus achieving better performance or higher efficiency than existing methods that use rewards defined over the entire sequence, such as log-likelihood or Wasserstein distance. We adopt an expectation-maximization framework in which the E-step estimates the cluster label of each sequence and the M-step learns the corresponding policy. Extensive experiments show the efficacy of our method against state-of-the-art approaches. |
Tasks | Point Processes |
Published | 2019-05-29 |
URL | https://arxiv.org/abs/1905.12345v3 |
https://arxiv.org/pdf/1905.12345v3.pdf | |
PWC | https://paperswithcode.com/paper/reinforcement-learning-with-policy-mixture |
Repo | |
Framework | |
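To make the expectation-maximization alternation above concrete, here is a minimal sketch, not the authors' implementation: the policy networks are replaced by simple per-cluster exponential inter-event-time models, the E-step computes soft cluster responsibilities for whole sequences, and the M-step refits each cluster's rate. The toy data, number of clusters, and all names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: variable-length sequences of inter-event times drawn from two latent "policies".
sequences = [rng.exponential(scale=s, size=rng.integers(5, 30)).tolist()
             for s in rng.choice([0.5, 2.0], size=100)]

K = 2
rates = np.array([1.0, 3.0])        # per-cluster event rates (stand-ins for policy networks)
weights = np.ones(K) / K            # mixture weights

def seq_loglik(seq, rate):
    """Log-likelihood of one sequence under a homogeneous exponential inter-event model."""
    seq = np.asarray(seq)
    return np.sum(np.log(rate) - rate * seq)

for it in range(50):
    # E-step: responsibility of each cluster for each whole sequence.
    logp = np.array([[np.log(weights[k]) + seq_loglik(s, rates[k]) for k in range(K)]
                     for s in sequences])
    logp -= logp.max(axis=1, keepdims=True)
    resp = np.exp(logp)
    resp /= resp.sum(axis=1, keepdims=True)

    # M-step: refit each cluster's rate from its responsibility-weighted sequences.
    for k in range(K):
        total_time = sum(r * np.sum(s) for r, s in zip(resp[:, k], sequences))
        total_events = sum(r * len(s) for r, s in zip(resp[:, k], sequences))
        rates[k] = total_events / max(total_time, 1e-12)
    weights = resp.mean(axis=0)

print("recovered rates:", rates, "weights:", weights)
```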
Structured Monte Carlo Sampling for Nonisotropic Distributions via Determinantal Point Processes
Title | Structured Monte Carlo Sampling for Nonisotropic Distributions via Determinantal Point Processes |
Authors | Krzysztof Choromanski, Aldo Pacchiano, Jack Parker-Holder, Yunhao Tang |
Abstract | We propose a new class of structured methods for Monte Carlo (MC) sampling, called DPPMC, designed for high-dimensional nonisotropic distributions, where samples are correlated via determinantal point processes to reduce the variance of the estimator. We successfully apply DPPMCs to problems involving nonisotropic distributions arising in guided evolution strategy (GES) methods for RL, CMA-ES techniques, and trust-region algorithms for blackbox optimization, improving the state of the art in all these settings. In particular, we show that DPPMCs drastically improve the exploration profiles of existing evolution strategy algorithms. We further confirm our results by analyzing random feature map estimators for Gaussian mixture kernels. We provide theoretical justification for our empirical results, showing a connection between DPPMCs and structured orthogonal MC methods for isotropic distributions. |
Tasks | Point Processes |
Published | 2019-05-29 |
URL | https://arxiv.org/abs/1905.12667v1 |
https://arxiv.org/pdf/1905.12667v1.pdf | |
PWC | https://paperswithcode.com/paper/structured-monte-carlo-sampling-for |
Repo | |
Framework | |
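As a rough illustration of drawing well-spread (negatively correlated) samples, the sketch below greedily selects a subset that approximately maximizes a kernel determinant. This is a DPP-inspired stand-in, not an exact DPP sampler and not the paper's DPPMC estimator; in particular it ignores the reweighting needed to keep the estimator unbiased, and all parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def greedy_diverse_subset(candidates, k, lengthscale=1.0):
    """Greedily pick k candidates that approximately maximise the determinant of an
    RBF kernel matrix, i.e. a diversity-promoting (DPP-inspired) selection.
    NOT an exact DPP sampler; a simple stand-in for illustration only."""
    sq = ((candidates[:, None, :] - candidates[None, :, :]) ** 2).sum(-1)
    K = np.exp(-sq / (2 * lengthscale ** 2))
    chosen = [int(rng.integers(len(candidates)))]
    for _ in range(k - 1):
        best, best_logdet = None, -np.inf
        for i in range(len(candidates)):
            if i in chosen:
                continue
            sub = K[np.ix_(chosen + [i], chosen + [i])]
            _, logdet = np.linalg.slogdet(sub + 1e-9 * np.eye(len(sub)))
            if logdet > best_logdet:
                best, best_logdet = i, logdet
        chosen.append(best)
    return candidates[chosen]

# Estimate E[f(x)] under a nonisotropic Gaussian, comparing a well-spread subset
# of a candidate pool against a plain i.i.d. subset of the same size.
cov = np.array([[3.0, 1.2], [1.2, 0.5]])
L = np.linalg.cholesky(cov)
f = lambda x: np.sin(x[:, 0]) + x[:, 1] ** 2

pool = rng.standard_normal((500, 2)) @ L.T           # i.i.d. candidate pool
diverse = greedy_diverse_subset(pool, k=50)          # diversity-promoting subset
iid = pool[:50]                                      # i.i.d. baseline

print("diverse-subset estimate:", f(diverse).mean())
print("i.i.d. estimate:        ", f(iid).mean())
```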
Deep Learning in Alzheimer’s disease: Diagnostic Classification and Prognostic Prediction using Neuroimaging Data
Title | Deep Learning in Alzheimer’s disease: Diagnostic Classification and Prognostic Prediction using Neuroimaging Data |
Authors | Taeho Jo, Kwangsik Nho, Andrew J. Saykin |
Abstract | Deep learning has shown outstanding performance in identifying intricate structures in complex high-dimensional data, especially in the domain of computer vision. The application of deep learning to early detection and automated classification of Alzheimer’s disease (AD) has recently gained considerable attention, as rapid progress in neuroimaging techniques has generated large-scale multimodal neuroimaging data. A systematic review of publications using deep learning approaches and neuroimaging data for diagnostic classification of AD was performed. A PubMed and Google Scholar search was used to identify deep learning papers on AD published between January 2013 and July 2018. These papers were reviewed, evaluated, and classified by algorithm and neuroimaging type, and the findings were summarized. Of 16 studies meeting full inclusion criteria, 4 used a combination of deep learning and traditional machine learning approaches, and 12 used only deep learning approaches. The combination of traditional machine learning for classification and a stacked auto-encoder (SAE) for feature selection produced accuracies of up to 98.8% for AD classification and 83.7% for prediction of conversion from mild cognitive impairment (MCI), a prodromal stage of AD, to AD. Deep learning approaches, such as convolutional neural networks (CNNs) or recurrent neural networks (RNNs), that use neuroimaging data without preprocessing for feature selection have yielded accuracies of up to 96.0% for AD classification and 84.2% for MCI conversion prediction. The best classification performance was obtained when multimodal neuroimaging and fluid biomarkers were combined. AD research that uses deep learning is still evolving: performance continues to improve by incorporating additional hybrid data types, and transparency is increasing through explainable approaches that add knowledge of specific disease-related features and mechanisms. |
Tasks | Feature Selection |
Published | 2019-05-02 |
URL | https://arxiv.org/abs/1905.00931v4 |
https://arxiv.org/pdf/1905.00931v4.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-in-alzheimers-disease |
Repo | |
Framework | |
Parameter Efficient Training of Deep Convolutional Neural Networks by Dynamic Sparse Reparameterization
Title | Parameter Efficient Training of Deep Convolutional Neural Networks by Dynamic Sparse Reparameterization |
Authors | Hesham Mostafa, Xin Wang |
Abstract | Modern deep neural networks are typically highly overparameterized. Pruning techniques are able to remove a significant fraction of network parameters with little loss in accuracy. Recently, techniques based on dynamic reallocation of non-zero parameters have emerged, allowing direct training of sparse networks without having to pre-train a large dense model. Here we present a novel dynamic sparse reparameterization method that addresses the limitations of previous techniques, such as high computational cost and the need to manually configure the number of free parameters allocated to each layer. We evaluate the performance of dynamic reallocation methods in training deep convolutional networks and show that our method outperforms previous static and dynamic reparameterization methods, yielding the best accuracy for a fixed parameter budget, on par with accuracies obtained by iteratively pruning a pre-trained dense model. We further investigate the mechanisms underlying the superior generalization performance of the resulting sparse networks and find that neither the structure nor the initialization of the non-zero parameters is sufficient to explain the superior performance. Rather, effective learning crucially depends on the continuous exploration of the sparse network structure space during training. Our work suggests that exploring structural degrees of freedom during training is more effective than adding extra parameters to the network. |
Tasks | |
Published | 2019-02-15 |
URL | https://arxiv.org/abs/1902.05967v3 |
https://arxiv.org/pdf/1902.05967v3.pdf | |
PWC | https://paperswithcode.com/paper/parameter-efficient-training-of-deep |
Repo | |
Framework | |
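A simplified sketch of the dynamic-reallocation idea, not the paper's exact reparameterization rule: under a fixed global budget of non-zero weights, periodically prune the smallest-magnitude surviving weights and regrow the same number of connections at random inactive positions. For brevity this toy regrows within each layer rather than reallocating across layers, and the density, shapes, and prune fraction are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two toy "layers" kept sparse under a fixed global budget of non-zero weights.
shapes = [(64, 32), (32, 10)]
weights = [rng.standard_normal(s) * 0.1 for s in shapes]
masks = [rng.random(s) < 0.2 for s in shapes]          # ~20% density to start
budget = sum(m.sum() for m in masks)

def prune_and_regrow(weights, masks, prune_frac=0.2):
    """One reallocation step: globally prune the smallest-magnitude active weights,
    then regrow the same number of connections at random inactive positions within
    each layer (the paper additionally reallocates the budget across layers)."""
    # Global magnitude threshold over currently active weights.
    active = np.concatenate([np.abs(w[m]) for w, m in zip(weights, masks)])
    k = int(prune_frac * len(active))
    threshold = np.partition(active, k)[k]

    regrow_counts = []
    for w, m in zip(weights, masks):
        drop = m & (np.abs(w) < threshold)
        m &= ~drop
        regrow_counts.append(int(drop.sum()))

    for w, m, n_new in zip(weights, masks, regrow_counts):
        inactive = np.flatnonzero(~m)
        grow = rng.choice(inactive, size=min(n_new, len(inactive)), replace=False)
        m.flat[grow] = True
        w.flat[grow] = 0.0          # regrown connections start at zero
    return weights, masks

weights, masks = prune_and_regrow(weights, masks)
print("budget before/after:", budget, sum(m.sum() for m in masks))
```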
Locally Linear Embedding and fMRI feature selection in psychiatric classification
Title | Locally Linear Embedding and fMRI feature selection in psychiatric classification |
Authors | Gagan Sidhu |
Abstract | Background: Functional magnetic resonance imaging (fMRI) provides non-invasive measures of neuronal activity using an endogenous Blood Oxygenation-Level Dependent (BOLD) contrast. This article introduces a nonlinear dimensionality reduction method (Locally Linear Embedding) to extract informative measures of the underlying neuronal activity from BOLD time-series. The method is validated using the Leave-One-Out Cross-Validation (LOOCV) accuracy of classifying psychiatric diagnoses using resting-state and task-related fMRI. Methods: Locally Linear Embedding of BOLD time-series (into each voxel’s respective tensor) was used to optimise feature selection. This uses Gauß’s Principle of Least Constraint to conserve quantities over both space and time. This conservation was assessed using LOOCV to greedily select time points in an incremental fashion on training data that was categorised in terms of psychiatric diagnoses. Findings: The embedded fMRI gave high diagnostic performance (> 80%) on eleven publicly available datasets containing healthy controls and patients with either Schizophrenia, Attention-Deficit Hyperactivity Disorder (ADHD), or Autism Spectrum Disorder (ASD). Furthermore, unlike the original fMRI data before or after using Principal Component Analysis (PCA) for artefact reduction, the embedded fMRI furnished significantly better-than-chance classification (chance defined as the majority class proportion) on ten of eleven datasets. Interpretation: Locally Linear Embedding appears to be a useful feature extraction procedure that retains important information about patterns of brain activity distinguishing among psychiatric cohorts. |
Tasks | Dimensionality Reduction, Feature Selection, Time Series |
Published | 2019-08-17 |
URL | https://arxiv.org/abs/1908.06319v9 |
https://arxiv.org/pdf/1908.06319v9.pdf | |
PWC | https://paperswithcode.com/paper/locally-linear-embedding-and-fmri-feature |
Repo | |
Framework | |
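A minimal sketch of the general recipe, nonlinear dimensionality reduction with Locally Linear Embedding followed by leave-one-out cross-validated classification. It uses synthetic stand-in features rather than per-voxel BOLD time-series tensors, skips the paper's greedy time-point selection, and (as a simplification) fits the unsupervised embedding on the full dataset before cross-validation.

```python
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Stand-in for subject-level fMRI features: 60 subjects x 500 "voxel" features,
# two diagnostic groups separated along a few latent directions.
labels = np.repeat([0, 1], 30)
latent = rng.standard_normal((60, 3)) + labels[:, None] * np.array([1.5, 0.0, -1.0])
X = latent @ rng.standard_normal((3, 500)) + 0.5 * rng.standard_normal((60, 500))

# Nonlinear dimensionality reduction with Locally Linear Embedding.
embedding = LocallyLinearEmbedding(n_neighbors=10, n_components=3)
Z = embedding.fit_transform(X)

# Leave-one-out cross-validated accuracy on raw vs. embedded features.
clf = KNeighborsClassifier(n_neighbors=5)
acc_raw = cross_val_score(clf, X, labels, cv=LeaveOneOut()).mean()
acc_lle = cross_val_score(clf, Z, labels, cv=LeaveOneOut()).mean()
print(f"LOOCV accuracy raw: {acc_raw:.2f}  LLE-embedded: {acc_lle:.2f}")
```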
A Critical Note on the Evaluation of Clustering Algorithms
Title | A Critical Note on the Evaluation of Clustering Algorithms |
Authors | Tiantian Zhang, Li Zhong, Bo Yuan |
Abstract | Experimental evaluation is a major research methodology for investigating clustering algorithms and many other machine learning algorithms. For this purpose, a number of benchmark datasets have been widely used in the literature, and their quality plays a key role in the value of the research work. However, in most existing studies, little attention has been paid to the properties of these datasets, and they are often regarded as black-box problems. For example, it is common to use datasets intended for classification in clustering research and to assume class labels as the ground truth for judging the quality of clustering. In our work, with the help of advanced visualization and dimension reduction techniques, we show that this practice may seriously compromise research quality and produce misleading results. We suggest that the applicability of existing benchmark datasets should be carefully revisited and that significant effort needs to be devoted to improving the current practice of experimental evaluation of clustering algorithms, to ensure an essential match between algorithms and problems. |
Tasks | Dimensionality Reduction |
Published | 2019-08-10 |
URL | https://arxiv.org/abs/1908.03782v2 |
https://arxiv.org/pdf/1908.03782v2.pdf | |
PWC | https://paperswithcode.com/paper/a-critical-note-on-the-evaluation-of |
Repo | |
Framework | |
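The paper's core caution can be checked on any classification benchmark: compare the class labels against what a clustering algorithm actually recovers, and inspect the geometry with a dimension-reduction plot. A small sketch on the scikit-learn digits data (chosen only for availability; not a dataset from the paper):

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

# Do class labels coincide with recoverable cluster structure?
X, y = load_digits(return_X_y=True)
n_classes = len(np.unique(y))

clusters = KMeans(n_clusters=n_classes, n_init=10, random_state=0).fit_predict(X)
print("ARI between class labels and k-means clusters:",
      round(adjusted_rand_score(y, clusters), 3))

# Optional visual check in 2-D (uncomment to plot):
# import matplotlib.pyplot as plt
# Z = PCA(n_components=2).fit_transform(X)
# plt.scatter(Z[:, 0], Z[:, 1], c=y, s=8); plt.title("digits, coloured by class"); plt.show()

# A low ARI, or a 2-D projection where classes overlap heavily, signals that treating
# the class labels as clustering "ground truth" may be misleading for this dataset.
```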
An Information-Theoretic Approach to Minimax Regret in Partial Monitoring
Title | An Information-Theoretic Approach to Minimax Regret in Partial Monitoring |
Authors | Tor Lattimore, Csaba Szepesvari |
Abstract | We prove a new minimax theorem connecting the worst-case Bayesian regret and minimax regret under partial monitoring with no assumptions on the space of signals or decisions of the adversary. We then generalise the information-theoretic tools of Russo and Van Roy (2016) for proving Bayesian regret bounds and combine them with the minimax theorem to derive minimax regret bounds for various partial monitoring settings. The highlight is a clean analysis of non-degenerate 'easy' and 'hard' finite partial monitoring, with new regret bounds that are independent of arbitrarily large game-dependent constants. The power of the generalised machinery is further demonstrated by proving that the minimax regret for k-armed adversarial bandits is at most √(2kn), improving on existing results by a factor of 2. Finally, we provide a simple analysis of the cops and robbers game, also improving the best known constants. |
Tasks | |
Published | 2019-02-01 |
URL | https://arxiv.org/abs/1902.00470v2 |
https://arxiv.org/pdf/1902.00470v2.pdf | |
PWC | https://paperswithcode.com/paper/an-information-theoretic-approach-to-minimax |
Repo | |
Framework | |
Adaptive Temporal-Difference Learning for Policy Evaluation with Per-State Uncertainty Estimates
Title | Adaptive Temporal-Difference Learning for Policy Evaluation with Per-State Uncertainty Estimates |
Authors | Hugo Penedones, Carlos Riquelme, Damien Vincent, Hartmut Maennel, Timothy Mann, Andre Barreto, Sylvain Gelly, Gergely Neu |
Abstract | We consider the core reinforcement-learning problem of on-policy value function approximation from a batch of trajectory data, and focus on various issues of Temporal Difference (TD) learning and Monte Carlo (MC) policy evaluation. The two methods are known to achieve complementary bias-variance trade-off properties, with TD tending to achieve lower variance but potentially higher bias. In this paper, we argue that the larger bias of TD can be a result of the amplification of local approximation errors. We address this by proposing an algorithm that adaptively switches between TD and MC in each state, thus mitigating the propagation of errors. Our method is based on learned confidence intervals that detect biases of TD estimates. We demonstrate in a variety of policy evaluation tasks that this simple adaptive algorithm performs competitively with the best approach in hindsight, suggesting that learned confidence intervals are a powerful technique for adapting policy evaluation to use TD or MC returns in a data-driven way. |
Tasks | |
Published | 2019-06-19 |
URL | https://arxiv.org/abs/1906.07987v1 |
https://arxiv.org/pdf/1906.07987v1.pdf | |
PWC | https://paperswithcode.com/paper/adaptive-temporal-difference-learning-for |
Repo | |
Framework | |
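A toy sketch of the per-state TD/MC switch described above, not the authors' algorithm: Monte Carlo returns provide a rough per-state confidence interval, and TD(0) estimates falling outside it are replaced by the MC estimate. The chain, rewards, step size, and interval width are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# A small 5-state chain: from state s move to s+1 with reward drawn from N(mu[s], 1);
# state 4 is terminal. Evaluate the value function from a batch of trajectories.
n_states, gamma = 5, 0.9
mu = np.array([1.0, -1.0, 0.5, 2.0, 0.0])

def rollout():
    s, traj = 0, []
    while s < n_states - 1:
        traj.append((s, mu[s] + rng.standard_normal()))
        s += 1
    return traj

batch = [rollout() for _ in range(200)]

# Monte Carlo returns per state, with a simple per-state confidence interval.
returns = {s: [] for s in range(n_states)}
for traj in batch:
    G = 0.0
    for s, r in reversed(traj):
        G = r + gamma * G
        returns[s].append(G)
V_mc = np.array([np.mean(returns[s]) if returns[s] else 0.0 for s in range(n_states)])
ci = np.array([2 * np.std(returns[s]) / np.sqrt(len(returns[s])) if returns[s] else np.inf
               for s in range(n_states)])

# TD(0) from the same batch.
V_td = np.zeros(n_states)
for _ in range(50):
    for traj in batch:
        for (s, r), nxt in zip(traj, traj[1:] + [(n_states - 1, 0.0)]):
            V_td[s] += 0.05 * (r + gamma * V_td[nxt[0]] - V_td[s])

# Adaptive choice: keep TD where it agrees with the MC interval, else fall back to MC.
V_adaptive = np.where(np.abs(V_td - V_mc) <= ci, V_td, V_mc)
print("TD:", V_td.round(2), "\nMC:", V_mc.round(2), "\nadaptive:", V_adaptive.round(2))
```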
Latent Simplex Position Model: High Dimensional Multi-view Clustering with Uncertainty Quantification
Title | Latent Simplex Position Model: High Dimensional Multi-view Clustering with Uncertainty Quantification |
Authors | Leo L Duan |
Abstract | High dimensional data often contain multiple facets, and several clustering patterns can co-exist under different variable subspaces, also known as views. While multi-view clustering algorithms have been proposed, uncertainty quantification remains difficult — a particular challenge lies in the high complexity of estimating the cluster assignment probability under each view and of sharing information among views. In this article, we propose an approximate Bayes approach — treating the similarity matrices generated over the views as rough first-stage estimates of the co-assignment probabilities; within their Kullback-Leibler neighborhood, we obtain a refined low-rank matrix formed by the pairwise products of simplex coordinates. Interestingly, each simplex coordinate directly encodes the cluster assignment uncertainty. For multi-view clustering, we let each view draw a parameterization from a few candidates, leading to dimension reduction. With high model flexibility, the estimation can be carried out efficiently as a continuous optimization problem and hence enjoys gradient-based computation. The theory establishes the connection of this model to a random partition distribution under multiple views. Compared to single-view clustering approaches, substantially more interpretable results are obtained when clustering brains from a human traumatic brain injury study using high-dimensional gene expression data. KEY WORDS: Co-regularized Clustering, Consensus, PAC-Bayes, Random Cluster Graph, Variable Selection |
Tasks | Dimensionality Reduction |
Published | 2019-03-21 |
URL | https://arxiv.org/abs/1903.09029v2 |
https://arxiv.org/pdf/1903.09029v2.pdf | |
PWC | https://paperswithcode.com/paper/latent-simplex-position-model-high |
Repo | |
Framework | |
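A single-view toy sketch of the latent simplex idea: each item's cluster-assignment probabilities are simplex coordinates, and their pairwise products are fitted to a rough similarity matrix by continuous optimization. A Frobenius loss stands in for the paper's Kullback-Leibler neighborhood, the multi-view sharing is omitted, and all sizes are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Toy single-view setting: n items in K=3 clusters; the observed similarity matrix S
# is a noisy first-stage estimate of the co-assignment probabilities.
n, K = 30, 3
truth = np.repeat(np.arange(K), n // K)
S = (truth[:, None] == truth[None, :]).astype(float)
S = np.clip(S + 0.15 * rng.standard_normal((n, n)), 0.0, 1.0)
S = (S + S.T) / 2

def softmax_rows(logits):
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def loss(flat_logits):
    W = softmax_rows(flat_logits.reshape(n, K))   # simplex coordinates per item
    P = W @ W.T                                   # implied co-assignment probabilities
    return np.sum((P - S) ** 2)

res = minimize(loss, rng.standard_normal(n * K), method="L-BFGS-B")
W = softmax_rows(res.x.reshape(n, K))

print("hard assignments:", W.argmax(axis=1))
print("assignment uncertainty (1 - max prob), first 10 items:",
      np.round(1 - W.max(axis=1), 2)[:10])
```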
Speech, Head, and Eye-based Cues for Continuous Affect Prediction
Title | Speech, Head, and Eye-based Cues for Continuous Affect Prediction |
Authors | Jonny O’Dwyer |
Abstract | Continuous affect prediction involves the discrete time-continuous regression of affect dimensions. Dimensions to be predicted often include arousal and valence. Continuous affect prediction researchers are now embracing multimodal model input. This provides motivation for researchers to investigate previously unexplored affective cues. Speech-based cues have traditionally received the most attention for affect prediction; however, non-verbal inputs have significant potential to increase the performance of affective computing systems and, in addition, allow affect modelling in the absence of speech. Non-verbal inputs that have so far received little attention for continuous affect prediction include eye- and head-based cues. The eyes are involved in emotion displays and perception, while head-based cues have been shown to contribute to emotion conveyance and perception. Additionally, these cues can be estimated non-invasively from video using modern computer vision tools. This work addresses this gap by comprehensively investigating head- and eye-based features and their combination with speech for continuous affect prediction. Hand-crafted, automatically generated, and CNN-learned features from these modalities will be investigated for continuous affect prediction. The highest performing feature sets and feature set combinations will answer how effective these features are for the prediction of an individual’s affective state. |
Tasks | |
Published | 2019-07-23 |
URL | https://arxiv.org/abs/1907.09919v2 |
https://arxiv.org/pdf/1907.09919v2.pdf | |
PWC | https://paperswithcode.com/paper/speech-head-and-eye-based-cues-for-continuous |
Repo | |
Framework | |
SPoC: Search-based Pseudocode to Code
Title | SPoC: Search-based Pseudocode to Code |
Authors | Sumith Kulal, Panupong Pasupat, Kartik Chandra, Mina Lee, Oded Padon, Alex Aiken, Percy Liang |
Abstract | We consider the task of mapping pseudocode to long programs that are functionally correct. Given test cases as a mechanism to validate programs, we search over the space of possible translations of the pseudocode to find a program that passes the validation. However, without proper credit assignment to localize the sources of program failures, it is difficult to guide search toward more promising programs. We propose to perform credit assignment based on signals from compilation errors, which constitute 88.7% of program failures. Concretely, we treat the translation of each pseudocode line as a discrete portion of the program, and whenever a synthesized program fails to compile, an error localization method tries to identify the portion of the program responsible for the failure. We then focus search over alternative translations of the pseudocode for those portions. For evaluation, we collected the SPoC dataset (Search-based Pseudocode to Code) containing 18,356 programs with human-authored pseudocode and test cases. Under a budget of 100 program compilations, performing search improves the synthesis success rate over using the top-one translation of the pseudocode from 25.6% to 44.7%. |
Tasks | |
Published | 2019-06-12 |
URL | https://arxiv.org/abs/1906.04908v1 |
https://arxiv.org/pdf/1906.04908v1.pdf | |
PWC | https://paperswithcode.com/paper/spoc-search-based-pseudocode-to-code |
Repo | |
Framework | |
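The search-and-localize loop is easy to sketch in miniature. The toy below uses Python's own compiler in place of a C++ compiler and hand-written candidate lines in place of model translations; the credit-assignment step simply resamples the line the compiler blames. Candidates, budget, and the fallback policy are all illustrative assumptions.

```python
# Each pseudocode line has ranked candidate translations (here, Python instead of C++).
# Assemble the current top candidates, try to compile, and on failure use the reported
# line number to swap in the next candidate for just that line.
candidates = [
    ["def add(a b):", "def add(a, b):"],           # top-1 has a syntax error
    ["    return a + b", "    return a +* b"],
    ["print add(2, 3)", "print(add(2, 3))"],       # top-1 uses Python-2 syntax
]
ranks = [0] * len(candidates)                      # start from the top-1 translation per line
budget = 10

for attempt in range(budget):
    program = "\n".join(c[r] for c, r in zip(candidates, ranks))
    try:
        compile(program, "<candidate>", "exec")
        print(f"compiled after {attempt + 1} attempts:\n{program}")
        break
    except SyntaxError as err:
        line = (err.lineno or 1) - 1               # localize the failing line
        if ranks[line] + 1 < len(candidates[line]):
            ranks[line] += 1                       # focus search on that line's alternatives
        else:
            ranks = [0] * len(candidates)          # toy fallback: restart this path
else:
    print("no compiling program found within the budget")
```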
Likelihood Contribution based Multi-scale Architecture for Generative Flows
Title | Likelihood Contribution based Multi-scale Architecture for Generative Flows |
Authors | Hari Prasanna Das, Pieter Abbeel, Costas J. Spanos |
Abstract | Deep generative modeling using flows has gained popularity owing to tractable exact log-likelihood estimation with an efficient training and synthesis process. However, flow models suffer from the challenge of having a high-dimensional latent space, equal in dimension to the input space. An effective solution to this challenge, proposed by Dinh et al. (2016), is a multi-scale architecture based on iterative early factorization of a part of the total dimensions at regular intervals. Prior works on generative flows with a multi-scale architecture perform the dimension factorization based on static masking. We propose a novel multi-scale architecture that performs data-dependent factorization to decide which dimensions should pass through more flow layers. To facilitate this, we introduce a heuristic based on the contribution of each dimension to the total log-likelihood, which encodes the importance of the dimensions. Our proposed heuristic is readily obtained as part of the flow training process, enabling versatile implementation of our likelihood-contribution-based multi-scale architecture for generic flow models. We present such an implementation for the original flow introduced in Dinh et al. (2016) and demonstrate improvements in log-likelihood score and sampling quality on standard image benchmarks. We also conduct ablation studies to compare the proposed method with other options for dimension factorization. |
Tasks | Dimensionality Reduction |
Published | 2019-08-05 |
URL | https://arxiv.org/abs/1908.01686v2 |
https://arxiv.org/pdf/1908.01686v2.pdf | |
PWC | https://paperswithcode.com/paper/dimensionality-reduction-flows |
Repo | |
Framework | |
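A sketch of the factorization heuristic as described: score each dimension by its average contribution to the log-likelihood under a standard-normal prior, factor out the well-explained (high-contribution) dimensions early, and pass the rest through more flow layers. The direction of the split and the keep fraction are assumptions for illustration; the paper computes these contributions as part of flow training.

```python
import numpy as np

rng = np.random.default_rng(0)

def split_by_likelihood_contribution(z, keep_frac=0.5):
    """Data-dependent dimension factorization: score each dimension by its average
    contribution to the log-likelihood under a standard-normal prior, keep the
    low-contribution dimensions for further flow layers and factor out the rest
    early. A simplified sketch, not the paper's exact training-time estimate."""
    per_dim_ll = -0.5 * (z ** 2 + np.log(2 * np.pi))     # log N(z; 0, 1) per element
    contribution = per_dim_ll.mean(axis=0)               # average over the batch
    order = np.argsort(contribution)                     # low contribution first
    n_keep = int(keep_frac * z.shape[1])
    keep_idx, factor_idx = order[:n_keep], order[n_keep:]
    return z[:, keep_idx], z[:, factor_idx], keep_idx, factor_idx

batch = rng.standard_normal((128, 16)) * np.linspace(0.5, 2.0, 16)  # toy activations
z_deeper, z_factored, keep_idx, factor_idx = split_by_likelihood_contribution(batch)
print("dims passed to deeper flow layers:", keep_idx)
print("dims factored out early:          ", factor_idx)
```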
Gaussian Process Modulated Cox Processes under Linear Inequality Constraints
Title | Gaussian Process Modulated Cox Processes under Linear Inequality Constraints |
Authors | Andrés F. López-Lopera, ST John, Nicolas Durrande |
Abstract | Gaussian process (GP) modulated Cox processes are widely used to model point patterns. Existing approaches require a mapping (link function) between the unconstrained GP and the positive intensity function. This commonly yields solutions that do not have a closed form or that are restricted to specific covariance functions. We introduce a novel finite approximation of GP-modulated Cox processes where positivity conditions can be imposed directly on the GP, with no restrictions on the covariance function. Our approach can also ensure other types of inequality constraints (e.g. monotonicity, convexity), resulting in more versatile models that can be used for other classes of point processes (e.g. renewal processes). We demonstrate on both synthetic and real-world data that our framework accurately infers the intensity functions. Where monotonicity is a feature of the process, our ability to include this in the inference improves results. |
Tasks | Point Processes |
Published | 2019-02-28 |
URL | http://arxiv.org/abs/1902.10974v1 |
http://arxiv.org/pdf/1902.10974v1.pdf | |
PWC | https://paperswithcode.com/paper/gaussian-process-modulated-cox-processes |
Repo | |
Framework | |
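A much-simplified analogue of the construction: a finite piecewise-linear basis for the intensity with non-negativity imposed directly on the basis weights, fitted by maximizing the inhomogeneous-Poisson log-likelihood. The GP prior, the other inequality constraints, and the paper's inference scheme are all omitted; the basis size, domain, and toy intensity are assumptions.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Finite hat-function (piecewise-linear) approximation of an intensity on [0, T],
# with non-negativity imposed directly on the basis weights.
T, m = 10.0, 12
knots = np.linspace(0.0, T, m)

def intensity(t, w):
    return np.interp(t, knots, w)                 # piecewise-linear, >= 0 when w >= 0

def neg_loglik(w, events):
    lam = intensity(events, w)
    integral = np.sum((w[:-1] + w[1:]) / 2 * np.diff(knots))   # exact for this basis
    return -(np.sum(np.log(lam + 1e-12)) - integral)

# Toy data: events from a bumpy intensity, generated by thinning a rate-4 process.
true_lam = lambda t: 2.0 + 1.5 * np.sin(t)
cand = rng.uniform(0, T, size=rng.poisson(4.0 * T))
events = cand[rng.uniform(0, 4.0, size=len(cand)) < true_lam(cand)]

res = minimize(neg_loglik, x0=np.ones(m), args=(events,),
               bounds=[(0.0, None)] * m, method="L-BFGS-B")
print("fitted intensity at knots:", np.round(res.x, 2))
print("true intensity at knots:  ", np.round(true_lam(knots), 2))
```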
Sample Complexity of Learning Mixtures of Sparse Linear Regressions
Title | Sample Complexity of Learning Mixtures of Sparse Linear Regressions |
Authors | Akshay Krishnamurthy, Arya Mazumdar, Andrew McGregor, Soumyabrata Pal |
Abstract | In the problem of learning mixtures of linear regressions, the goal is to learn a collection of signal vectors from a sequence of (possibly noisy) linear measurements, where each measurement is evaluated on an unknown signal drawn uniformly from this collection. This setting is quite expressive and has been studied both in terms of practical applications and for the sake of establishing theoretical guarantees. In this paper, we consider the case where the signal vectors are sparse; this generalizes the popular compressed sensing paradigm. We improve upon the state-of-the-art results as follows: In the noisy case, we resolve an open question of Yin et al. (IEEE Transactions on Information Theory, 2019) by showing how to handle collections of more than two vectors and present the first robust reconstruction algorithm, i.e., if the signals are not perfectly sparse, we still learn a good sparse approximation of the signals. In the noiseless case, as well as in the noisy case, we show how to circumvent the need for a restrictive assumption required in the previous work. Our techniques are quite different from those in the previous work: for the noiseless case, we rely on a property of sparse polynomials and for the noisy case, we provide new connections to learning Gaussian mixtures and use ideas from the theory of error-correcting codes. |
Tasks | |
Published | 2019-10-30 |
URL | https://arxiv.org/abs/1910.14106v1 |
https://arxiv.org/pdf/1910.14106v1.pdf | |
PWC | https://paperswithcode.com/paper/sample-complexity-of-learning-mixtures-of |
Repo | |
Framework | |
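For concreteness, the measurement model the sample-complexity results concern can be written down in a few lines. This is a data-generating sketch only, with illustrative sizes; no recovery algorithm is attempted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Each (possibly noisy) linear measurement is taken against one of L unknown k-sparse
# signal vectors, chosen uniformly at random; the chosen index is never observed.
n, L, k, m, sigma = 100, 3, 5, 400, 0.01

betas = np.zeros((L, n))
for l in range(L):
    support = rng.choice(n, size=k, replace=False)
    betas[l, support] = rng.standard_normal(k)

X = rng.standard_normal((m, n))                       # measurement vectors
z = rng.integers(L, size=m)                           # hidden signal index per measurement
y = np.einsum("ij,ij->i", X, betas[z]) + sigma * rng.standard_normal(m)

# The learning task: recover (approximations of) the L sparse vectors from (X, y) alone.
print("measurements:", y.shape, "hidden assignment counts (unobserved):", np.bincount(z))
```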
On the Veracity of Cyber Intrusion Alerts Synthesized by Generative Adversarial Networks
Title | On the Veracity of Cyber Intrusion Alerts Synthesized by Generative Adversarial Networks |
Authors | Christopher Sweet, Stephen Moskal, Shanchieh Jay Yang |
Abstract | Recreating cyber-attack alert data with a high level of fidelity is challenging due to the intricate interaction between features, the non-homogeneity of alerts, and the potential for rare yet critical samples. Generative Adversarial Networks (GANs) have been shown to effectively learn complex data distributions with the intent of creating increasingly realistic data. This paper presents the application of GANs to cyber-attack alert data and shows that GANs not only successfully learn to generate realistic alerts, but also reveal feature dependencies within alerts. This is accomplished by reviewing the intersection of histograms for varying alert-feature combinations between the ground truth and generated datasets. Traditional statistical metrics, such as conditional and joint entropy, are also employed to verify the accuracy of these dependencies. Finally, it is shown that a mutual information constraint on the network can be used to increase the generation of low-probability, critical alert values. By mapping alerts to a set of attack stages, it is shown that the output of these low-probability alerts has a direct contextual meaning for cyber security analysts. Overall, this work provides the basis for generating new cyber intrusion alerts and provides evidence that synthesized alerts emulate critical dependencies from the source dataset. |
Tasks | |
Published | 2019-08-03 |
URL | https://arxiv.org/abs/1908.01219v1 |
https://arxiv.org/pdf/1908.01219v1.pdf | |
PWC | https://paperswithcode.com/paper/on-the-veracity-of-cyber-intrusion-alerts |
Repo | |
Framework | |
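The histogram-intersection check mentioned in the abstract is straightforward to sketch: build the joint histogram of a feature pair for real and generated alerts on shared bins and measure their overlap. The data below are Gaussian stand-ins for discretised alert features, and the bin count is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def histogram_intersection(real, fake, bins=10):
    """Intersection of the normalised joint histograms of a pair of alert features,
    computed on bin edges shared between the two datasets. 1.0 means identical
    histograms; lower values indicate the generated data misses dependencies."""
    edges = [np.histogram_bin_edges(np.concatenate([real[:, j], fake[:, j]]), bins=bins)
             for j in range(real.shape[1])]
    h_real, _ = np.histogramdd(real, bins=edges)
    h_fake, _ = np.histogramdd(fake, bins=edges)
    h_real = h_real / h_real.sum()
    h_fake = h_fake / h_fake.sum()
    return np.minimum(h_real, h_fake).sum()

# Toy "alerts" with two correlated features; the fake set has a weaker dependency.
real = rng.multivariate_normal([0, 0], [[1.0, 0.8], [0.8, 1.0]], size=5000)
fake = rng.multivariate_normal([0, 0], [[1.0, 0.2], [0.2, 1.0]], size=5000)

print("histogram intersection (real vs. real subsample):",
      round(histogram_intersection(real, real[rng.permutation(5000)][:2500]), 3))
print("histogram intersection (real vs. fake):          ",
      round(histogram_intersection(real, fake), 3))
```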