Paper Group ANR 513
On the Influence of Momentum Acceleration on Online Learning. Sparse Subspace Clustering via Diffusion Process. An MM Algorithm for Split Feasibility Problems. Scalable Text Mining with Sparse Generative Models. A General Family of Trimmed Estimators for Robust High-dimensional Data Analysis. Top-$K$ Ranking from Pairwise Comparisons: When Spectral …
On the Influence of Momentum Acceleration on Online Learning
Title | On the Influence of Momentum Acceleration on Online Learning |
Authors | Kun Yuan, Bicheng Ying, Ali H. Sayed |
Abstract | The article examines in some detail the convergence rate and mean-square-error performance of momentum stochastic gradient methods in the constant step-size and slow adaptation regime. The results establish that momentum methods are equivalent to the standard stochastic gradient method with a re-scaled (larger) step-size value. The size of the re-scaling is determined by the value of the momentum parameter. The equivalence result is established for all time instants and not only in steady-state. The analysis is carried out for general strongly convex and smooth risk functions, and is not limited to quadratic risks. One notable conclusion is that the well-known benefits of momentum constructions for deterministic optimization problems do not necessarily carry over to the adaptive online setting when small constant step-sizes are used to enable continuous adaptation and learning in the presence of persistent gradient noise. From simulations, the equivalence between momentum and standard stochastic gradient methods is also observed for non-differentiable and non-convex problems. |
Tasks | |
Published | 2016-03-14 |
URL | http://arxiv.org/abs/1603.04136v4 |
http://arxiv.org/pdf/1603.04136v4.pdf | |
PWC | https://paperswithcode.com/paper/on-the-influence-of-momentum-acceleration-on |
Repo | |
Framework | |
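The equivalence described in the abstract (momentum SGD behaving like plain SGD with step size mu / (1 - beta)) can be illustrated on a toy scalar least-squares stream. This is a hedged sketch under made-up data and parameters, not the paper's analysis:

```python
import random

random.seed(0)
# toy streaming least-squares data: y = 2 x + noise (made-up problem)
data = [(x, 2.0 * x + random.gauss(0, 0.1))
        for x in (random.uniform(-1.0, 1.0) for _ in range(2000))]

def grad(w, x, y):
    # stochastic gradient of (w x - y)^2 / 2
    return (w * x - y) * x

mu, beta = 0.01, 0.9   # small constant step size, momentum parameter

w_mom, v = 0.0, 0.0    # heavy-ball momentum iterate and its velocity
w_sgd = 0.0            # plain SGD run with the re-scaled step size mu / (1 - beta)
for x, y in data:
    v = beta * v + grad(w_mom, x, y)
    w_mom -= mu * v
    w_sgd -= (mu / (1.0 - beta)) * grad(w_sgd, x, y)
```

Both iterates settle near the true slope 2.0 with comparable steady-state scatter, which is the re-scaled step-size equivalence the abstract describes.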
Sparse Subspace Clustering via Diffusion Process
Title | Sparse Subspace Clustering via Diffusion Process |
Authors | Qilin Li, Ling Li, Wanquan Liu |
Abstract | Subspace clustering refers to the problem of clustering high-dimensional data that lie in a union of low-dimensional subspaces. State-of-the-art subspace clustering methods are based on the idea of expressing each data point as a linear combination of other data points while regularizing the matrix of coefficients with L1, L2 or nuclear norms for a sparse solution. L1 regularization is guaranteed to give a subspace-preserving affinity (i.e., there are no connections between points from different subspaces) under broad theoretical conditions, but the clusters may not be fully connected. L2 and nuclear norm regularization often improve connectivity, but give a subspace-preserving affinity only for independent subspaces. Mixed L1, L2 and nuclear norm regularization could offer a balance between the subspace-preserving and connectedness properties, but this comes at the cost of increased computational complexity. This paper focuses on using L1 norm and alleviating the corresponding connectivity problem by a simple yet efficient diffusion process on subspace affinity graphs. Without adding any tuning parameter, our method can achieve state-of-the-art clustering performance on Hopkins 155 and Extended Yale B data sets. |
Tasks | |
Published | 2016-08-05 |
URL | http://arxiv.org/abs/1608.01793v1 |
http://arxiv.org/pdf/1608.01793v1.pdf | |
PWC | https://paperswithcode.com/paper/sparse-subspace-clustering-via-diffusion |
Repo | |
Framework | |
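The basic diffusion idea can be sketched on a toy affinity matrix: repeatedly smoothing the affinities through the graph's transition matrix strengthens within-cluster links relative to between-cluster ones. The four-point affinity below is made up, and the paper's actual diffusion on subspace affinity graphs is more careful than this minimal version:

```python
# toy symmetric affinity over four points, clusters {0, 1} and {2, 3}
A = [[0.0, 0.9, 0.1, 0.0],
     [0.9, 0.0, 0.0, 0.1],
     [0.1, 0.0, 0.0, 0.9],
     [0.0, 0.1, 0.9, 0.0]]

def row_normalize(M):
    return [[v / (sum(row) or 1.0) for v in row] for row in M]

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(M):
    return [list(col) for col in zip(*M)]

# diffusion step A_{t+1} = P A_t P^T, with P the row-stochastic transition matrix
P = row_normalize(A)
D = [row[:] for row in A]
for _ in range(3):
    D = matmul(matmul(P, D), transpose(P))
```

After a few steps, the diffused affinity `D` keeps within-cluster entries (e.g. `D[0][1]`) larger than cross-cluster ones (e.g. `D[0][2]`), while spreading mass to improve connectivity.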
An MM Algorithm for Split Feasibility Problems
Title | An MM Algorithm for Split Feasibility Problems |
Authors | Jason Xu, Eric C. Chi, Meng Yang, Kenneth Lange |
Abstract | The classical multi-set split feasibility problem seeks a point in the intersection of finitely many closed convex domain constraints, whose image under a linear mapping also lies in the intersection of finitely many closed convex range constraints. Split feasibility generalizes important inverse problems including convex feasibility, linear complementarity, and regression with constraint sets. When a feasible point does not exist, solution methods that proceed by minimizing a proximity function can be used to obtain optimal approximate solutions to the problem. We present an extension of the proximity function approach that generalizes the linear split feasibility problem to allow for non-linear mappings. Our algorithm is based on the principle of majorization-minimization, is amenable to quasi-Newton acceleration, and comes complete with convergence guarantees under mild assumptions. Furthermore, we show that the Euclidean norm appearing in the proximity function of the non-linear split feasibility problem can be replaced by arbitrary Bregman divergences. We explore several examples illustrating the merits of non-linear formulations over the linear case, with a focus on optimization for intensity-modulated radiation therapy. |
Tasks | |
Published | 2016-12-16 |
URL | http://arxiv.org/abs/1612.05614v2 |
http://arxiv.org/pdf/1612.05614v2.pdf | |
PWC | https://paperswithcode.com/paper/an-mm-algorithm-for-split-feasibility |
Repo | |
Framework | |
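For the linear case, a minimal proximity-function scheme is the classical projection-based iteration (essentially Byrne's CQ algorithm), which the paper's MM approach generalizes to non-linear maps and Bregman divergences. The constraint sets and step size below are made-up toy choices:

```python
def clamp(v, lo, hi):
    return max(lo, min(hi, v))

def proj_C(x):                       # domain constraint C = [0, 1]^2
    return [clamp(x[0], 0.0, 1.0), clamp(x[1], 0.0, 1.0)]

def proj_Q(y):                       # range constraint Q = [2, 3] on A x
    return clamp(y, 2.0, 3.0)

A = [1.0, 1.0]                       # linear map x -> x1 + x2
gamma = 0.4                          # step size in (0, 2 / ||A||^2) = (0, 1)
x = [0.0, 0.0]
for _ in range(100):
    Ax = A[0] * x[0] + A[1] * x[1]
    r = Ax - proj_Q(Ax)              # gradient of the range-proximity term
    x = proj_C([x[0] - gamma * A[0] * r,
                x[1] - gamma * A[1] * r])
```

The iterate converges to a point of C whose image lies in Q (here x approaches (1, 1), with A x = 2).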
Scalable Text Mining with Sparse Generative Models
Title | Scalable Text Mining with Sparse Generative Models |
Authors | Antti Puurula |
Abstract | The information age has brought a deluge of data. Much of this is in text form, insurmountable in scope for humans and incomprehensible in structure for computers. Text mining is an expanding field of research that seeks to utilize the information contained in vast document collections. General data mining methods based on machine learning face challenges with the scale of text data, posing a need for scalable text mining methods. This thesis proposes a solution to scalable text mining: generative models combined with sparse computation. A unifying formalization for generative text models is defined, bringing together research traditions that have used formally equivalent models, but ignored parallel developments. This framework allows the use of methods developed in different processing tasks such as retrieval and classification, yielding effective solutions across different text mining tasks. Sparse computation using inverted indices is proposed for inference on probabilistic models. This reduces the computational complexity of the common text mining operations according to sparsity, yielding probabilistic models with the scalability of modern search engines. The proposed combination provides sparse generative models: a solution for text mining that is general, effective, and scalable. Extensive experimentation on text classification and ranked retrieval datasets is conducted, showing that the proposed solution matches or outperforms the leading task-specific methods in effectiveness, with an order of magnitude decrease in classification times for Wikipedia article categorization with a million classes. The developed methods were further applied in two 2014 Kaggle data mining prize competitions with over a hundred competing teams, earning first and second places. |
Tasks | Text Classification |
Published | 2016-02-07 |
URL | http://arxiv.org/abs/1602.02332v1 |
http://arxiv.org/pdf/1602.02332v1.pdf | |
PWC | https://paperswithcode.com/paper/scalable-text-mining-with-sparse-generative |
Repo | |
Framework | |
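The core trick of sparse inference with an inverted index can be sketched for a multinomial naive Bayes classifier: score every class with a cheap "unseen word" baseline, then apply corrections only through the postings of the words that actually occur. The tiny corpus is made up, and this is only one instance of the thesis's much broader model family:

```python
import math
from collections import defaultdict

# toy corpus: class -> documents as bags of words (made up)
train = {
    "sports": [["goal", "match", "team"], ["team", "win", "match"]],
    "tech":   [["code", "bug", "team"], ["code", "compile", "run"]],
}

vocab = {w for docs in train.values() for doc in docs for w in doc}
counts = {c: defaultdict(int) for c in train}
totals = {}
for c, docs in train.items():
    for doc in docs:
        for w in doc:
            counts[c][w] += 1
    totals[c] = sum(counts[c].values())

def log_unseen(c):
    # Laplace-smoothed log-likelihood of a word never seen in class c
    return math.log(1.0 / (totals[c] + len(vocab)))

# inverted index: word -> postings of (class, likelihood delta over "unseen")
index = defaultdict(list)
for c in train:
    for w in counts[c]:
        delta = (math.log((counts[c][w] + 1.0) / (totals[c] + len(vocab)))
                 - log_unseen(c))
        index[w].append((c, delta))

def classify(doc):
    # dense baseline, then sparse corrections via postings only
    scores = {c: len(doc) * log_unseen(c) for c in train}
    for w in doc:
        for c, delta in index.get(w, ()):
            scores[c] += delta
    return max(scores, key=scores.get)
```

Because only classes appearing in the touched postings are updated, the work per query scales with the sparsity of the index rather than the number of classes.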
A General Family of Trimmed Estimators for Robust High-dimensional Data Analysis
Title | A General Family of Trimmed Estimators for Robust High-dimensional Data Analysis |
Authors | Eunho Yang, Aurelie Lozano, Aleksandr Aravkin |
Abstract | We consider the problem of robustifying high-dimensional structured estimation. Robust techniques are key in real-world applications which often involve outliers and data corruption. We focus on trimmed versions of structurally regularized M-estimators in the high-dimensional setting, including the popular Least Trimmed Squares estimator, as well as analogous estimators for generalized linear models and graphical models, using possibly non-convex loss functions. We present a general analysis of their statistical convergence rates and consistency, and then take a closer look at the trimmed versions of the Lasso and Graphical Lasso estimators as special cases. On the optimization side, we show how to extend algorithms for M-estimators to fit trimmed variants and provide guarantees on their numerical convergence. The generality and competitive performance of high-dimensional trimmed estimators are illustrated numerically on both simulated and real-world genomics data. |
Tasks | |
Published | 2016-05-26 |
URL | http://arxiv.org/abs/1605.08299v2 |
http://arxiv.org/pdf/1605.08299v2.pdf | |
PWC | https://paperswithcode.com/paper/a-general-family-of-trimmed-estimators-for |
Repo | |
Framework | |
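The flavor of trimmed estimation can be shown with the simplest member of the family, Least Trimmed Squares, via the usual alternation between fitting and re-selecting the h points with smallest residuals. The data and the no-intercept model are made up for illustration; the paper treats far more general regularized M-estimators:

```python
import random

random.seed(1)
# clean points on y = 3 x, plus gross outliers (made-up data)
data = [(x, 3.0 * x) for x in (random.uniform(0.0, 1.0) for _ in range(40))]
data += [(random.uniform(0.0, 1.0), 50.0) for _ in range(10)]

def ols_slope(points):
    # no-intercept least-squares slope
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    return sxy / sxx

# trimmed fit: alternate between fitting and keeping the h smallest residuals
h = 40
slope = ols_slope(data)
for _ in range(10):
    kept = sorted(data, key=lambda p: (p[1] - slope * p[0]) ** 2)[:h]
    slope = ols_slope(kept)
```

The untrimmed fit is pulled far above 3 by the outliers; the trimmed iterations discard them and recover the clean slope.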
Top-$K$ Ranking from Pairwise Comparisons: When Spectral Ranking is Optimal
Title | Top-$K$ Ranking from Pairwise Comparisons: When Spectral Ranking is Optimal |
Authors | Minje Jang, Sunghyun Kim, Changho Suh, Sewoong Oh |
Abstract | We explore the top-$K$ rank aggregation problem. Suppose a collection of items is compared in pairs repeatedly, and we aim to recover a consistent ordering that focuses on the top-$K$ ranked items based on partially revealed preference information. We investigate the Bradley-Terry-Luce model in which one ranks items according to their perceived utilities modeled as noisy observations of their underlying true utilities. Our main contributions are two-fold. First, in a general comparison model where item pairs to compare are given a priori, we attain an upper and lower bound on the sample size for reliable recovery of the top-$K$ ranked items. Second, more importantly, extending the result to a random comparison model where item pairs to compare are chosen independently with some probability, we show that in slightly restricted regimes, the gap between the derived bounds reduces to a constant factor, hence revealing that a spectral method can achieve the minimax optimality on the (order-wise) sample size required for top-$K$ ranking. That is to say, we demonstrate that a spectral method alone is sufficient to achieve the optimality and is advantageous in terms of computational complexity, as it does not require an additional stage of maximum likelihood estimation that a state-of-the-art scheme employs to achieve the optimality. We corroborate our main results by numerical experiments. |
Tasks | |
Published | 2016-03-14 |
URL | http://arxiv.org/abs/1603.04153v1 |
http://arxiv.org/pdf/1603.04153v1.pdf | |
PWC | https://paperswithcode.com/paper/top-k-ranking-from-pairwise-comparisons-when |
Repo | |
Framework | |
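A minimal spectral ranking sketch in the Rank Centrality style: build a random walk that moves toward the winner of each comparison and rank items by its stationary distribution. The utilities and comparison counts are made up, and this is a generic illustration rather than the paper's exact scheme:

```python
import random

random.seed(2)
n, trials = 5, 200
scores = [5.0, 4.0, 3.0, 2.0, 1.0]     # hypothetical BTL utilities, item 0 best

# simulate repeated pairwise comparisons under the Bradley-Terry-Luce model
wins = [[0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        for _ in range(trials):
            if random.random() < scores[i] / (scores[i] + scores[j]):
                wins[i][j] += 1
            else:
                wins[j][i] += 1

# spectral ranking: random walk that steps from losers toward winners
d = 2 * n                               # degree bound keeping rows stochastic
P = [[wins[j][i] / (trials * d) for j in range(n)] for i in range(n)]
for i in range(n):
    P[i][i] = 1.0 - sum(P[i][j] for j in range(n) if j != i)

pi = [1.0 / n] * n                      # stationary distribution by power iteration
for _ in range(200):
    pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]

top2 = sorted(range(n), key=lambda k: -pi[k])[:2]
```

The stationary mass concentrates on high-utility items, so reading off the largest entries of `pi` recovers the top-$K$ set, with no maximum-likelihood refinement stage.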
Nonstationary Distance Metric Learning
Title | Nonstationary Distance Metric Learning |
Authors | Kristjan Greenewald, Stephen Kelley, Alfred Hero |
Abstract | Recent work in distance metric learning has focused on learning transformations of data that best align with provided sets of pairwise similarity and dissimilarity constraints. The learned transformations lead to improved retrieval, classification, and clustering algorithms due to the better adapted distance or similarity measures. Here, we introduce the problem of learning these transformations when the underlying constraint generation process is nonstationary. This nonstationarity can be due to changes in either the ground-truth clustering used to generate constraints or changes to the feature subspaces in which the class structure is apparent. We propose and evaluate COMID-SADL, an adaptive, online approach for learning and tracking optimal metrics as they change over time that is highly robust to a variety of nonstationary behaviors in the changing metric. We demonstrate COMID-SADL on both real and synthetic data sets and show significant performance improvements relative to previously proposed batch and online distance metric learning algorithms. |
Tasks | Metric Learning |
Published | 2016-03-11 |
URL | http://arxiv.org/abs/1603.03678v2 |
http://arxiv.org/pdf/1603.03678v2.pdf | |
PWC | https://paperswithcode.com/paper/nonstationary-distance-metric-learning |
Repo | |
Framework | |
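The online setting can be sketched with a much simpler update than COMID-SADL: stream similarity/dissimilarity constraints and adjust a diagonal Mahalanobis metric with hinge-style gradient steps (the diagonal restriction keeps the positive-semidefinite projection trivial). The two-feature data model is made up:

```python
import random

random.seed(3)
# feature 0 carries the class structure, feature 1 is pure noise (made up)
def sample_pair(similar):
    cls_a, cls_b = (0.0, 0.0) if similar else (0.0, 1.0)
    a = [cls_a + random.gauss(0, 0.1), random.gauss(0, 1)]
    b = [cls_b + random.gauss(0, 0.1), random.gauss(0, 1)]
    return a, b

# online diagonal Mahalanobis metric d(a, b) = sum_k M[k] (a_k - b_k)^2
M = [1.0, 1.0]
eta, margin = 0.05, 1.0
for t in range(2000):
    similar = (t % 2 == 0)
    a, b = sample_pair(similar)
    sq = [(a[k] - b[k]) ** 2 for k in range(2)]
    dist = sum(M[k] * sq[k] for k in range(2))
    if similar and dist > margin:          # similar pairs should be close
        step = [eta * s for s in sq]
    elif not similar and dist < margin:    # dissimilar pairs should be far
        step = [-eta * s for s in sq]
    else:
        step = [0.0, 0.0]
    M = [max(0.0, M[k] - step[k]) for k in range(2)]
```

Streaming constraints drive weight onto the informative feature and away from the noise feature; tracking a changing metric (the paper's focus) amounts to letting such updates continue as the constraint distribution drifts.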
Endgame Analysis of Dou Shou Qi
Title | Endgame Analysis of Dou Shou Qi |
Authors | Jan N. van Rijn, Jonathan K. Vis |
Abstract | Dou Shou Qi is a game in which two players control a number of pieces, each of them aiming to move one of their pieces onto a given square. We implemented an engine for analyzing the game. Moreover, we created a series of endgame tablebases containing all configurations with up to four pieces. These tablebases are the first steps towards theoretically solving the game. Finally, we constructed decision trees based on the endgame tablebases. In this note we report on some interesting patterns. |
Tasks | |
Published | 2016-04-25 |
URL | http://arxiv.org/abs/1604.07312v1 |
http://arxiv.org/pdf/1604.07312v1.pdf | |
PWC | https://paperswithcode.com/paper/endgame-analysis-of-dou-shou-qi |
Repo | |
Framework | |
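The labeling that a tablebase stores can be sketched by backward induction on an explicit game graph: a position is a win for the side to move iff some move leads to a position that is a loss for the opponent. The five-state graph below is made up and is not actual Dou Shou Qi (whose tablebases are built by retrograde analysis over real positions):

```python
# toy game graph: state -> list of successor states
succ = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d", "e"],
    "d": [],          # no moves: the side to move loses
    "e": [],
}
value = {}

def solve(state):
    # WIN for the side to move iff some successor is a LOSS for the opponent
    if state not in value:
        if not succ[state]:
            value[state] = "LOSS"
        else:
            value[state] = ("WIN" if any(solve(t) == "LOSS" for t in succ[state])
                            else "LOSS")
    return value[state]
```

Memoizing `value` over all reachable states is exactly what a tablebase materializes, and the decision trees the paper extracts summarize such tables compactly.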
GOTM: a Goal-oriented Framework for Capturing Uncertainty of Medical Treatments
Title | GOTM: a Goal-oriented Framework for Capturing Uncertainty of Medical Treatments |
Authors | Davoud Mougouei, David Powers |
Abstract | It has been widely recognized that uncertainty is an inevitable aspect of diagnosis and treatment of medical disorders. Such uncertainties hence need to be considered in computerized medical models. The existing medical modeling techniques, however, have mainly focused on capturing uncertainty associated with diagnosis of medical disorders while ignoring uncertainty of treatments. To tackle this issue, we have proposed using a fuzzy-based modeling and description technique for capturing uncertainties in treatment plans. We have further contributed a formal framework which allows for goal-oriented modeling and analysis of medical treatments. |
Tasks | |
Published | 2016-12-09 |
URL | http://arxiv.org/abs/1612.02904v1 |
http://arxiv.org/pdf/1612.02904v1.pdf | |
PWC | https://paperswithcode.com/paper/gotm-a-goal-oriented-framework-for-capturing |
Repo | |
Framework | |
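The fuzzy-modeling idea can be sketched with standard fuzzy-logic machinery: represent the degree to which a treatment satisfies each goal by a membership function, and combine goals with a fuzzy AND (min). The goals, membership shapes, and dose numbers below are hypothetical and are not taken from the GOTM framework:

```python
def triangular(x, a, b, c):
    # standard triangular membership function peaking at b
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# hypothetical treatment goals: a dosage should achieve "symptom relief"
# while keeping "tolerability"; overall satisfaction is the fuzzy AND (min)
def satisfaction(dose):
    relief = triangular(dose, 2.0, 6.0, 10.0)
    tolerability = max(0.0, min(1.0, (8.0 - dose) / 8.0))
    return min(relief, tolerability)
```

The point of such degrees is that a goal model can then be analyzed for partial satisfaction instead of the crisp satisfied/violated dichotomy.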
Sparse Coding of Neural Word Embeddings for Multilingual Sequence Labeling
Title | Sparse Coding of Neural Word Embeddings for Multilingual Sequence Labeling |
Authors | Gábor Berend |
Abstract | In this paper we propose and carefully evaluate a sequence labeling framework which solely utilizes sparse indicator features derived from dense distributed word representations. The proposed model obtains (near) state-of-the-art performance for both part-of-speech tagging and named entity recognition for a variety of languages. Our model relies only on a few thousand sparse coding-derived features, without applying any modification of the word representations employed for the different tasks. The proposed model has favorable generalization properties as it retains over 89.8% of its average POS tagging accuracy when trained at 1.2% of the total available training data, i.e., 150 sentences per language. |
Tasks | Named Entity Recognition, Part-Of-Speech Tagging, Word Embeddings |
Published | 2016-12-21 |
URL | http://arxiv.org/abs/1612.07130v1 |
http://arxiv.org/pdf/1612.07130v1.pdf | |
PWC | https://paperswithcode.com/paper/sparse-coding-of-neural-word-embeddings-for |
Repo | |
Framework | |
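How sparse indicator features fall out of sparse coding can be sketched with a single soft-thresholding pass over a fixed dictionary (one ISTA step from zero). The dictionary here is random and the vectors hypothetical; the paper learns the dictionary and uses real word embeddings:

```python
import random

random.seed(4)
dim, atoms, lam = 4, 8, 0.5

def normalize(v):
    n = sum(x * x for x in v) ** 0.5
    return [x / n for x in v]

# hypothetical fixed dictionary of unit-norm atoms (the paper learns this)
D = [normalize([random.gauss(0, 1) for _ in range(dim)]) for _ in range(atoms)]

def indicator_features(vec):
    # one soft-thresholding pass over dictionary correlations (ISTA from zero)
    active = []
    for k, atom in enumerate(D):
        c = sum(u * w for u, w in zip(atom, vec))
        if abs(c) > lam:
            active.append(k)   # sparse indicator feature: "atom k fires"
    return active
```

Each word is thus represented by the small set of atoms it activates, and those atom indices serve directly as discrete features for the sequence labeler.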
Classification of Big Data with Application to Imaging Genetics
Title | Classification of Big Data with Application to Imaging Genetics |
Authors | Magnus O. Ulfarsson, Frosti Palsson, Jakob Sigurdsson, Johannes R. Sveinsson |
Abstract | Big data applications, such as medical imaging and genetics, typically generate datasets that consist of few observations n on many more variables p, a scenario that we denote as p ≫ n. Traditional data processing methods are often insufficient for extracting information out of big data. This calls for the development of new algorithms that can deal with the size, complexity, and the special structure of such datasets. In this paper, we consider the problem of classifying p ≫ n data and propose a classification method based on linear discriminant analysis (LDA). Traditional LDA depends on the covariance estimate of the data, but when p ≫ n the sample covariance estimate is singular. The proposed method estimates the covariance by using a sparse version of noisy principal component analysis (nPCA). The use of sparsity in this setting aims at automatically selecting variables that are relevant for classification. In experiments, the new method is compared to state-of-the-art methods for big data problems using both simulated datasets and imaging genetics datasets. |
Tasks | |
Published | 2016-05-16 |
URL | http://arxiv.org/abs/1605.04932v1 |
http://arxiv.org/pdf/1605.04932v1.pdf | |
PWC | https://paperswithcode.com/paper/classification-of-big-data-with-application |
Repo | |
Framework | |
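The p ≫ n singularity problem can be sketched with the simplest possible fix: a diagonal-plus-shrinkage covariance estimate standing in for the paper's sparse-nPCA estimate, plugged into the usual LDA discriminant. The data model (one informative variable out of ten, three samples per class) is made up:

```python
import random

random.seed(5)
p, n_per = 10, 3                     # p >> n: ten variables, six samples total
# only variable 0 separates the two classes (made-up data)
def sample(cls):
    return [2.0 * cls + random.gauss(0, 0.5) if j == 0 else random.gauss(0, 1)
            for j in range(p)]

X0 = [sample(0) for _ in range(n_per)]
X1 = [sample(1) for _ in range(n_per)]

def mean(X):
    return [sum(x[j] for x in X) / len(X) for j in range(p)]

m0, m1 = mean(X0), mean(X1)
# diagonal + shrinkage covariance keeps the estimate invertible when p > n;
# it is a crude stand-in for the paper's sparse-nPCA covariance
lam = 0.1
var = [(sum((x[j] - m0[j]) ** 2 for x in X0)
        + sum((x[j] - m1[j]) ** 2 for x in X1)) / (2 * n_per)
       for j in range(p)]
w = [(m1[j] - m0[j]) / (var[j] + lam) for j in range(p)]
b = -sum(w[j] * (m0[j] + m1[j]) / 2.0 for j in range(p))

def classify(x):
    return 1 if sum(w[j] * x[j] for j in range(p)) + b > 0 else 0
```

The unregularized pooled covariance would be singular here; the shrinkage term is what makes the discriminant computable at all, and sparsity (in the paper) additionally suppresses the nine irrelevant variables.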
Measuring Adverse Drug Effects on Multimorbidity using Tractable Bayesian Networks
Title | Measuring Adverse Drug Effects on Multimorbidity using Tractable Bayesian Networks |
Authors | Jessa Bekker, Arjen Hommersom, Martijn Lappenschaar, Jesse Davis |
Abstract | Managing patients with multimorbidity often results in polypharmacy: the prescription of multiple drugs. However, the long-term effects of specific combinations of drugs and diseases are typically unknown. In particular, drugs prescribed for one condition may result in adverse effects for the other. To investigate which types of drugs may affect the further progression of multimorbidity, we query models of diseases and prescriptions that are learned from primary care data. State-of-the-art tractable Bayesian network representations, on which such complex queries can be computed efficiently, are employed for these large medical networks. Our results confirm that prescriptions may lead to unintended negative consequences in further development of multimorbidity in cardiovascular diseases. Moreover, a drug treatment for one disease group may affect diseases of another group. |
Tasks | |
Published | 2016-12-09 |
URL | http://arxiv.org/abs/1612.03055v1 |
http://arxiv.org/pdf/1612.03055v1.pdf | |
PWC | https://paperswithcode.com/paper/measuring-adverse-drug-effects-on |
Repo | |
Framework | |
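The kind of query being computed can be sketched on a tiny three-variable model P(drug, d1, d2): condition on a prescription, marginalize out one disease, and read off the probability of the other. The numbers are illustrative; tractable representations exist precisely so such queries remain efficient on large medical networks instead of this brute-force enumeration:

```python
# tiny model P(drug, d1, d2) as a product of conditional tables (made-up numbers)
def p(drug, d1, d2):
    p_drug = 0.3 if drug else 0.7
    p_d1 = (0.6 if drug else 0.2) if d1 else (0.4 if drug else 0.8)
    p_d2 = (0.5 if d1 else 0.1) if d2 else (0.5 if d1 else 0.9)
    return p_drug * p_d1 * p_d2

def prob_d2_given_drug(drug):
    # marginalize out the first disease d1, condition on the prescription
    num = sum(p(drug, d1, 1) for d1 in (0, 1))
    den = sum(p(drug, d1, d2) for d1 in (0, 1) for d2 in (0, 1))
    return num / den
```

In this toy model the prescription raises the probability of the second disease through its effect on the first, which is the shape of the unintended-consequence queries the paper runs.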
Mutual information for symmetric rank-one matrix estimation: A proof of the replica formula
Title | Mutual information for symmetric rank-one matrix estimation: A proof of the replica formula |
Authors | Jean Barbier, Mohamad Dia, Nicolas Macris, Florent Krzakala, Thibault Lesieur, Lenka Zdeborova |
Abstract | Factorizing low-rank matrices has many applications in machine learning and statistics. For probabilistic models in the Bayes optimal setting, a general expression for the mutual information has been proposed using heuristic statistical physics computations, and proven in a few specific cases. Here, we show how to rigorously prove the conjectured formula for the symmetric rank-one case. This allows us to express the minimal mean-square-error and to characterize the detectability phase transitions in a large set of estimation problems ranging from community detection to sparse PCA. We also show that for a large set of parameters, an iterative algorithm called approximate message-passing is Bayes optimal. There exists, however, a gap between what currently known polynomial algorithms can do and what is expected information theoretically. Additionally, the proof technique is of interest in its own right and exploits three essential ingredients: the interpolation method introduced in statistical physics by Guerra, the analysis of the approximate message-passing algorithm, and the theory of spatial coupling and threshold saturation in coding. Our approach is generic and applicable to other open problems in statistical estimation where heuristic statistical physics predictions are available. |
Tasks | Community Detection |
Published | 2016-06-13 |
URL | http://arxiv.org/abs/1606.04142v1 |
http://arxiv.org/pdf/1606.04142v1.pdf | |
PWC | https://paperswithcode.com/paper/mutual-information-for-symmetric-rank-one |
Repo | |
Framework | |
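The estimation problem itself is easy to write down: a rank-one spike buried in a symmetric noise matrix, recoverable above the detectability threshold. The sketch below uses a plain spectral estimate (power iteration) rather than approximate message-passing, and the size and signal-to-noise values are arbitrary:

```python
import math
import random

random.seed(6)
n, lam = 100, 4.0          # spike strength well above the threshold lambda = 1
x = [random.choice((-1.0, 1.0)) for _ in range(n)]

# spiked Wigner matrix Y = (lambda/n) x x^T + symmetric Gaussian noise
Y = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(i, n):
        noise = random.gauss(0, 1) / math.sqrt(n)
        Y[i][j] = Y[j][i] = lam / n * x[i] * x[j] + noise

# spectral estimate: power iteration for the leading eigenvector
v = [random.gauss(0, 1) for _ in range(n)]
for _ in range(100):
    v = [sum(Y[i][j] * v[j] for j in range(n)) for i in range(n)]
    norm = math.sqrt(sum(c * c for c in v))
    v = [c / norm for c in v]

overlap = abs(sum(v[i] * x[i] for i in range(n))) / math.sqrt(n)
```

Above the transition the leading eigenvector correlates strongly with the hidden spike; the paper's results pin down the exact mutual information and the minimal error that AMP attains in the Bayes optimal setting.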
Joint Causal Inference from Multiple Contexts
Title | Joint Causal Inference from Multiple Contexts |
Authors | Joris M. Mooij, Sara Magliacane, Tom Claassen |
Abstract | The gold standard for discovering causal relations is by means of experimentation. Over the last decades, alternative methods have been proposed that can infer causal relations between variables from certain statistical patterns in purely observational data. We introduce Joint Causal Inference (JCI), a novel approach to causal discovery from multiple data sets from different contexts that elegantly unifies both approaches. JCI is a causal modeling framework rather than a specific algorithm, and it can be implemented using any causal discovery algorithm that can take into account certain background knowledge. JCI can deal with different types of interventions (e.g., perfect, imperfect, stochastic, etc.) in a unified fashion, and does not require knowledge of intervention targets or types in case of interventional data. We explain how several well-known causal discovery algorithms can be seen as addressing special cases of the JCI framework, and we also propose novel implementations that extend existing causal discovery methods for purely observational data to the JCI setting. We evaluate different JCI implementations on synthetic data and on flow cytometry protein expression data and conclude that JCI implementations can considerably outperform state-of-the-art causal discovery algorithms. |
Tasks | Causal Discovery, Causal Inference |
Published | 2016-11-30 |
URL | https://arxiv.org/abs/1611.10351v5 |
https://arxiv.org/pdf/1611.10351v5.pdf | |
PWC | https://paperswithcode.com/paper/joint-causal-inference-from-multiple-contexts |
Repo | |
Framework | |
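The JCI idea of treating context as data can be sketched in its simplest form: pool an observational and an interventional dataset, add the context indicator C as one more exogenous variable, and look at (conditional) dependence patterns. The linear system and the correlation-based "test" below are made up, far weaker than what real causal discovery algorithms use:

```python
import random

random.seed(7)
# two contexts: observational (c = 0) and one where X is shifted by an
# intervention (c = 1); Y <- X + noise in both (made-up system)
def draw(c):
    x = (2.0 if c == 1 else 0.0) + random.gauss(0, 1)
    y = x + random.gauss(0, 1)
    return x, y

rows = [(float(c),) + draw(c) for c in (0, 1) for _ in range(300)]
C = [r[0] for r in rows]
X = [r[1] for r in rows]
Y = [r[2] for r in rows]

def corr(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b)) / n
    sa = (sum((u - ma) ** 2 for u in a) / n) ** 0.5
    sb = (sum((v - mb) ** 2 for v in b) / n) ** 0.5
    return cov / (sa * sb)

# C correlates with X directly but is (nearly) independent of Y given X:
# the statistical footprint of C -> X -> Y in the pooled data
r_cx, r_cy, r_xy = corr(C, X), corr(C, Y), corr(X, Y)
partial_cy_x = (r_cy - r_cx * r_xy) / ((1 - r_cx ** 2) * (1 - r_xy ** 2)) ** 0.5
```

The context variable behaves like any other node with known exogenous status, which is why standard constraint-based discovery algorithms can be reused on the pooled data.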
Comments on: “A Random Forest Guided Tour” by G. Biau and E. Scornet
Title | Comments on: “A Random Forest Guided Tour” by G. Biau and E. Scornet |
Authors | Sylvain Arlot, Robin Genuer |
Abstract | This paper is a comment on the survey paper by Biau and Scornet (2016) about random forests. We focus on the problem of quantifying the impact of each ingredient of random forests on their performance. We show that such a quantification is possible for a simple pure forest, leading to conclusions that could apply more generally. Then, we consider “hold-out” random forests, which are a good middle point between “toy” pure forests and Breiman’s original random forests. |
Tasks | |
Published | 2016-04-06 |
URL | http://arxiv.org/abs/1604.01515v1 |
http://arxiv.org/pdf/1604.01515v1.pdf | |
PWC | https://paperswithcode.com/paper/comments-on-a-random-forest-guided-tour-by-g |
Repo | |
Framework | |