Paper Group ANR 513
On the Influence of Momentum Acceleration on Online Learning. Sparse Subspace Clustering via Diffusion Process. An MM Algorithm for Split Feasibility Problems. Scalable Text Mining with Sparse Generative Models. A General Family of Trimmed Estimators for Robust High-dimensional Data Analysis. Top-$K$ Ranking from Pairwise Comparisons: When Spectral …
On the Influence of Momentum Acceleration on Online Learning
Title | On the Influence of Momentum Acceleration on Online Learning |
Authors | Kun Yuan, Bicheng Ying, Ali H. Sayed |
Abstract | The article examines in some detail the convergence rate and mean-square-error performance of momentum stochastic gradient methods in the constant step-size and slow adaptation regime. The results establish that momentum methods are equivalent to the standard stochastic gradient method with a re-scaled (larger) step-size value. The size of the re-scaling is determined by the value of the momentum parameter. The equivalence result is established for all time instants and not only in steady-state. The analysis is carried out for general strongly convex and smooth risk functions, and is not limited to quadratic risks. One notable conclusion is that the well-known benefits of momentum constructions for deterministic optimization problems do not necessarily carry over to the adaptive online setting when small constant step-sizes are used to enable continuous adaptation and learning in the presence of persistent gradient noise. From simulations, the equivalence between momentum and standard stochastic gradient methods is also observed for non-differentiable and non-convex problems. |
Tasks | |
Published | 2016-03-14 |
URL | http://arxiv.org/abs/1603.04136v4 |
http://arxiv.org/pdf/1603.04136v4.pdf | |
PWC | https://paperswithcode.com/paper/on-the-influence-of-momentum-acceleration-on |
Repo | |
Framework | |
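The equivalence described in the abstract (momentum SGD behaving like plain SGD with step size mu / (1 - beta)) can be illustrated on a toy scalar least-squares stream. This is a hedged sketch under made-up data and parameters, not the paper's analysis:

```python
import random

random.seed(0)
# toy streaming least-squares data: y = 2 x + noise (made-up problem)
data = [(x, 2.0 * x + random.gauss(0, 0.1))
        for x in (random.uniform(-1.0, 1.0) for _ in range(2000))]

def grad(w, x, y):
    # stochastic gradient of (w x - y)^2 / 2
    return (w * x - y) * x

mu, beta = 0.01, 0.9   # small constant step size, momentum parameter

w_mom, v = 0.0, 0.0    # heavy-ball momentum iterate and its velocity
w_sgd = 0.0            # plain SGD run with the re-scaled step size mu / (1 - beta)
for x, y in data:
    v = beta * v + grad(w_mom, x, y)
    w_mom -= mu * v
    w_sgd -= (mu / (1.0 - beta)) * grad(w_sgd, x, y)
```

Both iterates settle near the true slope 2.0 with comparable steady-state scatter, which is the re-scaled step-size equivalence the abstract describes.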
Sparse Subspace Clustering via Diffusion Process
Title | Sparse Subspace Clustering via Diffusion Process |
Authors | Qilin Li, Ling Li, Wanquan Liu |
Abstract | Subspace clustering refers to the problem of clustering high-dimensional data that lie in a union of low-dimensional subspaces. State-of-the-art subspace clustering methods are based on the idea of expressing each data point as a linear combination of other data points while regularizing the matrix of coefficients with L1, L2 or nuclear norms for a sparse solution. L1 regularization is guaranteed to give a subspace-preserving affinity (i.e., there are no connections between points from different subspaces) under broad theoretical conditions, but the clusters may not be fully connected. L2 and nuclear norm regularization often improve connectivity, but give a subspace-preserving affinity only for independent subspaces. Mixed L1, L2 and nuclear norm regularization could offer a balance between the subspace-preserving and connectedness properties, but this comes at the cost of increased computational complexity. This paper focuses on using L1 norm and alleviating the corresponding connectivity problem by a simple yet efficient diffusion process on subspace affinity graphs. Without adding any tuning parameter, our method can achieve state-of-the-art clustering performance on Hopkins 155 and Extended Yale B data sets. |
Tasks | |
Published | 2016-08-05 |
URL | http://arxiv.org/abs/1608.01793v1 |
http://arxiv.org/pdf/1608.01793v1.pdf | |
PWC | https://paperswithcode.com/paper/sparse-subspace-clustering-via-diffusion |
Repo | |
Framework | |
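The basic diffusion idea can be sketched on a toy affinity matrix: repeatedly smoothing the affinities through the graph's transition matrix strengthens within-cluster links relative to between-cluster ones. The four-point affinity below is made up, and the paper's actual diffusion on subspace affinity graphs is more careful than this minimal version:

```python
# toy symmetric affinity over four points, clusters {0, 1} and {2, 3}
A = [[0.0, 0.9, 0.1, 0.0],
     [0.9, 0.0, 0.0, 0.1],
     [0.1, 0.0, 0.0, 0.9],
     [0.0, 0.1, 0.9, 0.0]]

def row_normalize(M):
    return [[v / (sum(row) or 1.0) for v in row] for row in M]

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(M):
    return [list(col) for col in zip(*M)]

# diffusion step A_{t+1} = P A_t P^T, with P the row-stochastic transition matrix
P = row_normalize(A)
D = [row[:] for row in A]
for _ in range(3):
    D = matmul(matmul(P, D), transpose(P))
```

After a few steps, the diffused affinity `D` keeps within-cluster entries (e.g. `D[0][1]`) larger than cross-cluster ones (e.g. `D[0][2]`), while spreading mass to improve connectivity.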
An MM Algorithm for Split Feasibility Problems
Title | An MM Algorithm for Split Feasibility Problems |
Authors | Jason Xu, Eric C. Chi, Meng Yang, Kenneth Lange |
Abstract | The classical multi-set split feasibility problem seeks a point in the intersection of finitely many closed convex domain constraints, whose image under a linear mapping also lies in the intersection of finitely many closed convex range constraints. Split feasibility generalizes important inverse problems including convex feasibility, linear complementarity, and regression with constraint sets. When a feasible point does not exist, solution methods that proceed by minimizing a proximity function can be used to obtain optimal approximate solutions to the problem. We present an extension of the proximity function approach that generalizes the linear split feasibility problem to allow for non-linear mappings. Our algorithm is based on the principle of majorization-minimization, is amenable to quasi-Newton acceleration, and comes complete with convergence guarantees under mild assumptions. Furthermore, we show that the Euclidean norm appearing in the proximity function of the non-linear split feasibility problem can be replaced by arbitrary Bregman divergences. We explore several examples illustrating the merits of non-linear formulations over the linear case, with a focus on optimization for intensity-modulated radiation therapy. |
Tasks | |
Published | 2016-12-16 |
URL | http://arxiv.org/abs/1612.05614v2 |
http://arxiv.org/pdf/1612.05614v2.pdf | |
PWC | https://paperswithcode.com/paper/an-mm-algorithm-for-split-feasibility |
Repo | |
Framework | |
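For the linear case, a minimal proximity-function scheme is the classical projection-based iteration (essentially Byrne's CQ algorithm), which the paper's MM approach generalizes to non-linear maps and Bregman divergences. The constraint sets and step size below are made-up toy choices:

```python
def clamp(v, lo, hi):
    return max(lo, min(hi, v))

def proj_C(x):                       # domain constraint C = [0, 1]^2
    return [clamp(x[0], 0.0, 1.0), clamp(x[1], 0.0, 1.0)]

def proj_Q(y):                       # range constraint Q = [2, 3] on A x
    return clamp(y, 2.0, 3.0)

A = [1.0, 1.0]                       # linear map x -> x1 + x2
gamma = 0.4                          # step size in (0, 2 / ||A||^2) = (0, 1)
x = [0.0, 0.0]
for _ in range(100):
    Ax = A[0] * x[0] + A[1] * x[1]
    r = Ax - proj_Q(Ax)              # gradient of the range-proximity term
    x = proj_C([x[0] - gamma * A[0] * r,
                x[1] - gamma * A[1] * r])
```

The iterate converges to a point of C whose image lies in Q (here x approaches (1, 1), with A x = 2).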
Scalable Text Mining with Sparse Generative Models
Title | Scalable Text Mining with Sparse Generative Models |
Authors | Antti Puurula |
Abstract | The information age has brought a deluge of data. Much of this is in text form, insurmountable in scope for humans and incomprehensible in structure for computers. Text mining is an expanding field of research that seeks to utilize the information contained in vast document collections. General data mining methods based on machine learning face challenges with the scale of text data, posing a need for scalable text mining methods. This thesis proposes a solution to scalable text mining: generative models combined with sparse computation. A unifying formalization for generative text models is defined, bringing together research traditions that have used formally equivalent models, but ignored parallel developments. This framework allows the use of methods developed in different processing tasks such as retrieval and classification, yielding effective solutions across different text mining tasks. Sparse computation using inverted indices is proposed for inference on probabilistic models. This reduces the computational complexity of the common text mining operations according to sparsity, yielding probabilistic models with the scalability of modern search engines. The proposed combination provides sparse generative models: a solution for text mining that is general, effective, and scalable. Extensive experimentation on text classification and ranked retrieval datasets is conducted, showing that the proposed solution matches or outperforms the leading task-specific methods in effectiveness, with an order of magnitude decrease in classification times for Wikipedia article categorization with a million classes. The developed methods were further applied in two 2014 Kaggle data mining prize competitions with over a hundred competing teams, earning first and second places. |
Tasks | Text Classification |
Published | 2016-02-07 |
URL | http://arxiv.org/abs/1602.02332v1 |
http://arxiv.org/pdf/1602.02332v1.pdf | |
PWC | https://paperswithcode.com/paper/scalable-text-mining-with-sparse-generative |
Repo | |
Framework | |
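The core trick of sparse inference with an inverted index can be sketched for a multinomial naive Bayes classifier: score every class with a cheap "unseen word" baseline, then apply corrections only through the postings of the words that actually occur. The tiny corpus is made up, and this is only one instance of the thesis's much broader model family:

```python
import math
from collections import defaultdict

# toy corpus: class -> documents as bags of words (made up)
train = {
    "sports": [["goal", "match", "team"], ["team", "win", "match"]],
    "tech":   [["code", "bug", "team"], ["code", "compile", "run"]],
}

vocab = {w for docs in train.values() for doc in docs for w in doc}
counts = {c: defaultdict(int) for c in train}
totals = {}
for c, docs in train.items():
    for doc in docs:
        for w in doc:
            counts[c][w] += 1
    totals[c] = sum(counts[c].values())

def log_unseen(c):
    # Laplace-smoothed log-likelihood of a word never seen in class c
    return math.log(1.0 / (totals[c] + len(vocab)))

# inverted index: word -> postings of (class, likelihood delta over "unseen")
index = defaultdict(list)
for c in train:
    for w in counts[c]:
        delta = (math.log((counts[c][w] + 1.0) / (totals[c] + len(vocab)))
                 - log_unseen(c))
        index[w].append((c, delta))

def classify(doc):
    # dense baseline, then sparse corrections via postings only
    scores = {c: len(doc) * log_unseen(c) for c in train}
    for w in doc:
        for c, delta in index.get(w, ()):
            scores[c] += delta
    return max(scores, key=scores.get)
```

Because only classes appearing in the touched postings are updated, the work per query scales with the sparsity of the index rather than the number of classes.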
A General Family of Trimmed Estimators for Robust High-dimensional Data Analysis
Title | A General Family of Trimmed Estimators for Robust High-dimensional Data Analysis |
Authors | Eunho Yang, Aurelie Lozano, Aleksandr Aravkin |
Abstract | We consider the problem of robustifying high-dimensional structured estimation. Robust techniques are key in real-world applications which often involve outliers and data corruption. We focus on trimmed versions of structurally regularized M-estimators in the high-dimensional setting, including the popular Least Trimmed Squares estimator, as well as analogous estimators for generalized linear models and graphical models, using possibly non-convex loss functions. We present a general analysis of their statistical convergence rates and consistency, and then take a closer look at the trimmed versions of the Lasso and Graphical Lasso estimators as special cases. On the optimization side, we show how to extend algorithms for M-estimators to fit trimmed variants and provide guarantees on their numerical convergence. The generality and competitive performance of high-dimensional trimmed estimators are illustrated numerically on both simulated and real-world genomics data. |
Tasks | |
Published | 2016-05-26 |
URL | http://arxiv.org/abs/1605.08299v2 |
http://arxiv.org/pdf/1605.08299v2.pdf | |
PWC | https://paperswithcode.com/paper/a-general-family-of-trimmed-estimators-for |
Repo | |
Framework | |
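The flavor of trimmed estimation can be shown with the simplest member of the family, Least Trimmed Squares, via the usual alternation between fitting and re-selecting the h points with smallest residuals. The data and the no-intercept model are made up for illustration; the paper treats far more general regularized M-estimators:

```python
import random

random.seed(1)
# clean points on y = 3 x, plus gross outliers (made-up data)
data = [(x, 3.0 * x) for x in (random.uniform(0.0, 1.0) for _ in range(40))]
data += [(random.uniform(0.0, 1.0), 50.0) for _ in range(10)]

def ols_slope(points):
    # no-intercept least-squares slope
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    return sxy / sxx

# trimmed fit: alternate between fitting and keeping the h smallest residuals
h = 40
slope = ols_slope(data)
for _ in range(10):
    kept = sorted(data, key=lambda p: (p[1] - slope * p[0]) ** 2)[:h]
    slope = ols_slope(kept)
```

The untrimmed fit is pulled far above 3 by the outliers; the trimmed iterations discard them and recover the clean slope.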
Top-$K$ Ranking from Pairwise Comparisons: When Spectral Ranking is Optimal
Title | Top-$K$ Ranking from Pairwise Comparisons: When Spectral Ranking is Optimal |
Authors | Minje Jang, Sunghyun Kim, Changho Suh, Sewoong Oh |
Abstract | We explore the top-$K$ rank aggregation problem. Suppose a collection of items is compared in pairs repeatedly, and we aim to recover a consistent ordering that focuses on the top-$K$ ranked items based on partially revealed preference information. We investigate the Bradley-Terry-Luce model in which one ranks items according to their perceived utilities modeled as noisy observations of their underlying true utilities. Our main contributions are two-fold. First, in a general comparison model where item pairs to compare are given a priori, we attain an upper and lower bound on the sample size for reliable recovery of the top-$K$ ranked items. Second, more importantly, extending the result to a random comparison model where item pairs to compare are chosen independently with some probability, we show that in slightly restricted regimes, the gap between the derived bounds reduces to a constant factor, hence revealing that a spectral method can achieve the minimax optimality on the (order-wise) sample size required for top-$K$ ranking. That is to say, we demonstrate that a spectral method alone is sufficient to achieve the optimality and is advantageous in terms of computational complexity, as it does not require an additional stage of maximum likelihood estimation that a state-of-the-art scheme employs to achieve the optimality. We corroborate our main results by numerical experiments. |
Tasks | |
Published | 2016-03-14 |
URL | http://arxiv.org/abs/1603.04153v1 |
http://arxiv.org/pdf/1603.04153v1.pdf | |
PWC | https://paperswithcode.com/paper/top-k-ranking-from-pairwise-comparisons-when |
Repo | |
Framework | |
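A minimal spectral ranking sketch in the Rank Centrality style: build a random walk that moves toward the winner of each comparison and rank items by its stationary distribution. The utilities and comparison counts are made up, and this is a generic illustration rather than the paper's exact scheme:

```python
import random

random.seed(2)
n, trials = 5, 200
scores = [5.0, 4.0, 3.0, 2.0, 1.0]     # hypothetical BTL utilities, item 0 best

# simulate repeated pairwise comparisons under the Bradley-Terry-Luce model
wins = [[0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        for _ in range(trials):
            if random.random() < scores[i] / (scores[i] + scores[j]):
                wins[i][j] += 1
            else:
                wins[j][i] += 1

# spectral ranking: random walk that steps from losers toward winners
d = 2 * n                               # degree bound keeping rows stochastic
P = [[wins[j][i] / (trials * d) for j in range(n)] for i in range(n)]
for i in range(n):
    P[i][i] = 1.0 - sum(P[i][j] for j in range(n) if j != i)

pi = [1.0 / n] * n                      # stationary distribution by power iteration
for _ in range(200):
    pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]

top2 = sorted(range(n), key=lambda k: -pi[k])[:2]
```

The stationary mass concentrates on high-utility items, so reading off the largest entries of `pi` recovers the top-$K$ set, with no maximum-likelihood refinement stage.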
Nonstationary Distance Metric Learning
Title | Nonstationary Distance Metric Learning |
Authors | Kristjan Greenewald, Stephen Kelley, Alfred Hero |
Abstract | Recent work in distance metric learning has focused on learning transformations of data that best align with provided sets of pairwise similarity and dissimilarity constraints. The learned transformations lead to improved retrieval, classification, and clustering algorithms due to the better adapted distance or similarity measures. Here, we introduce the problem of learning these transformations when the underlying constraint generation process is nonstationary. This nonstationarity can be due to changes in either the ground-truth clustering used to generate constraints or changes to the feature subspaces in which the class structure is apparent. We propose and evaluate COMID-SADL, an adaptive, online approach for learning and tracking optimal metrics as they change over time that is highly robust to a variety of nonstationary behaviors in the changing metric. We demonstrate COMID-SADL on both real and synthetic data sets and show significant performance improvements relative to previously proposed batch and online distance metric learning algorithms. |
Tasks | Metric Learning |
Published | 2016-03-11 |
URL | http://arxiv.org/abs/1603.03678v2 |
http://arxiv.org/pdf/1603.03678v2.pdf | |
PWC | https://paperswithcode.com/paper/nonstationary-distance-metric-learning |
Repo | |
Framework | |
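The online setting can be sketched with a much simpler update than COMID-SADL: stream similarity/dissimilarity constraints and adjust a diagonal Mahalanobis metric with hinge-style gradient steps (the diagonal restriction keeps the positive-semidefinite projection trivial). The two-feature data model is made up:

```python
import random

random.seed(3)
# feature 0 carries the class structure, feature 1 is pure noise (made up)
def sample_pair(similar):
    cls_a, cls_b = (0.0, 0.0) if similar else (0.0, 1.0)
    a = [cls_a + random.gauss(0, 0.1), random.gauss(0, 1)]
    b = [cls_b + random.gauss(0, 0.1), random.gauss(0, 1)]
    return a, b

# online diagonal Mahalanobis metric d(a, b) = sum_k M[k] (a_k - b_k)^2
M = [1.0, 1.0]
eta, margin = 0.05, 1.0
for t in range(2000):
    similar = (t % 2 == 0)
    a, b = sample_pair(similar)
    sq = [(a[k] - b[k]) ** 2 for k in range(2)]
    dist = sum(M[k] * sq[k] for k in range(2))
    if similar and dist > margin:          # similar pairs should be close
        step = [eta * s for s in sq]
    elif not similar and dist < margin:    # dissimilar pairs should be far
        step = [-eta * s for s in sq]
    else:
        step = [0.0, 0.0]
    M = [max(0.0, M[k] - step[k]) for k in range(2)]
```

Streaming constraints drive weight onto the informative feature and away from the noise feature; tracking a changing metric (the paper's focus) amounts to letting such updates continue as the constraint distribution drifts.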
Endgame Analysis of Dou Shou Qi
Title | Endgame Analysis of Dou Shou Qi |
Authors | Jan N. van Rijn, Jonathan K. Vis |
Abstract | Dou Shou Qi is a game in which two players control a number of pieces, each of them aiming to move one of their pieces onto a given square. We implemented an engine for analyzing the game. Moreover, we created a series of endgame tablebases containing all configurations with up to four pieces. These tablebases are the first steps towards theoretically solving the game. Finally, we constructed decision trees based on the endgame tablebases. In this note we report on some interesting patterns. |
Tasks | |
Published | 2016-04-25 |
URL | http://arxiv.org/abs/1604.07312v1 |
http://arxiv.org/pdf/1604.07312v1.pdf | |
PWC | https://paperswithcode.com/paper/endgame-analysis-of-dou-shou-qi |
Repo | |
Framework | |
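The labeling that a tablebase stores can be sketched by backward induction on an explicit game graph: a position is a win for the side to move iff some move leads to a position that is a loss for the opponent. The five-state graph below is made up and is not actual Dou Shou Qi (whose tablebases are built by retrograde analysis over real positions):

```python
# toy game graph: state -> list of successor states
succ = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d", "e"],
    "d": [],          # no moves: the side to move loses
    "e": [],
}
value = {}

def solve(state):
    # WIN for the side to move iff some successor is a LOSS for the opponent
    if state not in value:
        if not succ[state]:
            value[state] = "LOSS"
        else:
            value[state] = ("WIN" if any(solve(t) == "LOSS" for t in succ[state])
                            else "LOSS")
    return value[state]
```

Memoizing `value` over all reachable states is exactly what a tablebase materializes, and the decision trees the paper extracts summarize such tables compactly.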
GOTM: a Goal-oriented Framework for Capturing Uncertainty of Medical Treatments
Title | GOTM: a Goal-oriented Framework for Capturing Uncertainty of Medical Treatments |
Authors | Davoud Mougouei, David Powers |
Abstract | It has been widely recognized that uncertainty is an inevitable aspect of diagnosis and treatment of medical disorders. Such uncertainties hence need to be considered in computerized medical models. The existing medical modeling techniques, however, have mainly focused on capturing uncertainty associated with diagnosis of medical disorders while ignoring uncertainty of treatments. To tackle this issue, we have proposed using a fuzzy-based modeling and description technique for capturing uncertainties in treatment plans. We have further contributed a formal framework which allows for goal-oriented modeling and analysis of medical treatments. |
Tasks | |
Published | 2016-12-09 |
URL | http://arxiv.org/abs/1612.02904v1 |
http://arxiv.org/pdf/1612.02904v1.pdf | |
PWC | https://paperswithcode.com/paper/gotm-a-goal-oriented-framework-for-capturing |
Repo | |
Framework | |
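The fuzzy-modeling idea can be sketched with standard fuzzy-logic machinery: represent the degree to which a treatment satisfies each goal by a membership function, and combine goals with a fuzzy AND (min). The goals, membership shapes, and dose numbers below are hypothetical and are not taken from the GOTM framework:

```python
def triangular(x, a, b, c):
    # standard triangular membership function peaking at b
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# hypothetical treatment goals: a dosage should achieve "symptom relief"
# while keeping "tolerability"; overall satisfaction is the fuzzy AND (min)
def satisfaction(dose):
    relief = triangular(dose, 2.0, 6.0, 10.0)
    tolerability = max(0.0, min(1.0, (8.0 - dose) / 8.0))
    return min(relief, tolerability)
```

The point of such degrees is that a goal model can then be analyzed for partial satisfaction instead of the crisp satisfied/violated dichotomy.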
Sparse Coding of Neural Word Embeddings for Multilingual Sequence Labeling
Title | Sparse Coding of Neural Word Embeddings for Multilingual Sequence Labeling |
Authors | Gábor Berend |
Abstract | In this paper we propose and carefully evaluate a sequence labeling framework which solely utilizes sparse indicator features derived from dense distributed word representations. The proposed model obtains (near) state-of-the-art performance for both part-of-speech tagging and named entity recognition for a variety of languages. Our model relies only on a few thousand sparse coding-derived features, without applying any modification of the word representations employed for the different tasks. The proposed model has favorable generalization properties as it retains over 89.8% of its average POS tagging accuracy when trained at 1.2% of the total available training data, i.e., 150 sentences per language. |
Tasks | Named Entity Recognition, Part-Of-Speech Tagging, Word Embeddings |
Published | 2016-12-21 |
URL | http://arxiv.org/abs/1612.07130v1 |
http://arxiv.org/pdf/1612.07130v1.pdf | |
PWC | https://paperswithcode.com/paper/sparse-coding-of-neural-word-embeddings-for |
Repo | |
Framework | |
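How sparse indicator features fall out of sparse coding can be sketched with a single soft-thresholding pass over a fixed dictionary (one ISTA step from zero). The dictionary here is random and the vectors hypothetical; the paper learns the dictionary and uses real word embeddings:

```python
import random

random.seed(4)
dim, atoms, lam = 4, 8, 0.5

def normalize(v):
    n = sum(x * x for x in v) ** 0.5
    return [x / n for x in v]

# hypothetical fixed dictionary of unit-norm atoms (the paper learns this)
D = [normalize([random.gauss(0, 1) for _ in range(dim)]) for _ in range(atoms)]

def indicator_features(vec):
    # one soft-thresholding pass over dictionary correlations (ISTA from zero)
    active = []
    for k, atom in enumerate(D):
        c = sum(u * w for u, w in zip(atom, vec))
        if abs(c) > lam:
            active.append(k)   # sparse indicator feature: "atom k fires"
    return active
```

Each word is thus represented by the small set of atoms it activates, and those atom indices serve directly as discrete features for the sequence labeler.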
Classification of Big Data with Application to Imaging Genetics
Title | Classification of Big Data with Application to Imaging Genetics |
Authors | Magnus O. Ulfarsson, Frosti Palsson, Jakob Sigurdsson, Johannes R. Sveinsson |
Abstract | Big data applications, such as medical imaging and genetics, typically generate datasets that consist of few observations n on many more variables p, a scenario that we denote as p ≫ n. Traditional data processing methods are often insufficient for extracting information out of big data. This calls for the development of new algorithms that can deal with the size, complexity, and the special structure of such datasets. In this paper, we consider the problem of classifying p ≫ n data and propose a classification method based on linear discriminant analysis (LDA). Traditional LDA depends on the covariance estimate of the data, but when p ≫ n the sample covariance estimate is singular. The proposed method estimates the covariance by using a sparse version of noisy principal component analysis (nPCA). The use of sparsity in this setting aims at automatically selecting variables that are relevant for classification. In experiments, the new method is compared to state-of-the-art methods for big data problems using both simulated datasets and imaging genetics datasets. |
Tasks | |
Published | 2016-05-16 |
URL | http://arxiv.org/abs/1605.04932v1 |
http://arxiv.org/pdf/1605.04932v1.pdf | |
PWC | https://paperswithcode.com/paper/classification-of-big-data-with-application |
Repo | |
Framework | |
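The p ≫ n singularity problem can be sketched with the simplest possible fix: a diagonal-plus-shrinkage covariance estimate standing in for the paper's sparse-nPCA estimate, plugged into the usual LDA discriminant. The data model (one informative variable out of ten, three samples per class) is made up:

```python
import random

random.seed(5)
p, n_per = 10, 3                     # p >> n: ten variables, six samples total
# only variable 0 separates the two classes (made-up data)
def sample(cls):
    return [2.0 * cls + random.gauss(0, 0.5) if j == 0 else random.gauss(0, 1)
            for j in range(p)]

X0 = [sample(0) for _ in range(n_per)]
X1 = [sample(1) for _ in range(n_per)]

def mean(X):
    return [sum(x[j] for x in X) / len(X) for j in range(p)]

m0, m1 = mean(X0), mean(X1)
# diagonal + shrinkage covariance keeps the estimate invertible when p > n;
# it is a crude stand-in for the paper's sparse-nPCA covariance
lam = 0.1
var = [(sum((x[j] - m0[j]) ** 2 for x in X0)
        + sum((x[j] - m1[j]) ** 2 for x in X1)) / (2 * n_per)
       for j in range(p)]
w = [(m1[j] - m0[j]) / (var[j] + lam) for j in range(p)]
b = -sum(w[j] * (m0[j] + m1[j]) / 2.0 for j in range(p))

def classify(x):
    return 1 if sum(w[j] * x[j] for j in range(p)) + b > 0 else 0
```

The unregularized pooled covariance would be singular here; the shrinkage term is what makes the discriminant computable at all, and sparsity (in the paper) additionally suppresses the nine irrelevant variables.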
Measuring Adverse Drug Effects on Multimorbidity using Tractable Bayesian Networks
Title | Measuring Adverse Drug Effects on Multimorbidity using Tractable Bayesian Networks |
Authors | Jessa Bekker, Arjen Hommersom, Martijn Lappenschaar, Jesse Davis |
Abstract | Managing patients with multimorbidity often results in polypharmacy: the prescription of multiple drugs. However, the long-term effects of specific combinations of drugs and diseases are typically unknown. In particular, drugs prescribed for one condition may result in adverse effects for the other. To investigate which types of drugs may affect the further progression of multimorbidity, we query models of diseases and prescriptions that are learned from primary care data. State-of-the-art tractable Bayesian network representations, on which such complex queries can be computed efficiently, are employed for these large medical networks. Our results confirm that prescriptions may lead to unintended negative consequences in further development of multimorbidity in cardiovascular diseases. Moreover, a drug treatment for one disease group may affect diseases of another group. |
Tasks | |
Published | 2016-12-09 |
URL | http://arxiv.org/abs/1612.03055v1 |
http://arxiv.org/pdf/1612.03055v1.pdf | |
PWC | https://paperswithcode.com/paper/measuring-adverse-drug-effects-on |
Repo | |
Framework | |
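The kind of query being computed can be sketched on a tiny three-variable model P(drug, d1, d2): condition on a prescription, marginalize out one disease, and read off the probability of the other. The numbers are illustrative; tractable representations exist precisely so such queries remain efficient on large medical networks instead of this brute-force enumeration:

```python
# tiny model P(drug, d1, d2) as a product of conditional tables (made-up numbers)
def p(drug, d1, d2):
    p_drug = 0.3 if drug else 0.7
    p_d1 = (0.6 if drug else 0.2) if d1 else (0.4 if drug else 0.8)
    p_d2 = (0.5 if d1 else 0.1) if d2 else (0.5 if d1 else 0.9)
    return p_drug * p_d1 * p_d2

def prob_d2_given_drug(drug):
    # marginalize out the first disease d1, condition on the prescription
    num = sum(p(drug, d1, 1) for d1 in (0, 1))
    den = sum(p(drug, d1, d2) for d1 in (0, 1) for d2 in (0, 1))
    return num / den
```

In this toy model the prescription raises the probability of the second disease through its effect on the first, which is the shape of the unintended-consequence queries the paper runs.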
Mutual information for symmetric rank-one matrix estimation: A proof of the replica formula
Title | Mutual information for symmetric rank-one matrix estimation: A proof of the replica formula |
Authors | Jean Barbier, Mohamad Dia, Nicolas Macris, Florent Krzakala, Thibault Lesieur, Lenka Zdeborova |
Abstract | Factorizing low-rank matrices has many applications in machine learning and statistics. For probabilistic models in the Bayes optimal setting, a general expression for the mutual information has been proposed using heuristic statistical physics computations, and proven in a few specific cases. Here, we show how to rigorously prove the conjectured formula for the symmetric rank-one case. This allows us to express the minimal mean-square-error and to characterize the detectability phase transitions in a large set of estimation problems ranging from community detection to sparse PCA. We also show that for a large set of parameters, an iterative algorithm called approximate message-passing is Bayes optimal. There exists, however, a gap between what currently known polynomial algorithms can do and what is expected information theoretically. Additionally, the proof technique is of interest in its own right and exploits three essential ingredients: the interpolation method introduced in statistical physics by Guerra, the analysis of the approximate message-passing algorithm, and the theory of spatial coupling and threshold saturation in coding. Our approach is generic and applicable to other open problems in statistical estimation where heuristic statistical physics predictions are available. |
Tasks | Community Detection |
Published | 2016-06-13 |
URL | http://arxiv.org/abs/1606.04142v1 |
http://arxiv.org/pdf/1606.04142v1.pdf | |
PWC | https://paperswithcode.com/paper/mutual-information-for-symmetric-rank-one |
Repo | |
Framework | |
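The estimation problem itself is easy to write down: a rank-one spike buried in a symmetric noise matrix, recoverable above the detectability threshold. The sketch below uses a plain spectral estimate (power iteration) rather than approximate message-passing, and the size and signal-to-noise values are arbitrary:

```python
import math
import random

random.seed(6)
n, lam = 100, 4.0          # spike strength well above the threshold lambda = 1
x = [random.choice((-1.0, 1.0)) for _ in range(n)]

# spiked Wigner matrix Y = (lambda/n) x x^T + symmetric Gaussian noise
Y = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(i, n):
        noise = random.gauss(0, 1) / math.sqrt(n)
        Y[i][j] = Y[j][i] = lam / n * x[i] * x[j] + noise

# spectral estimate: power iteration for the leading eigenvector
v = [random.gauss(0, 1) for _ in range(n)]
for _ in range(100):
    v = [sum(Y[i][j] * v[j] for j in range(n)) for i in range(n)]
    norm = math.sqrt(sum(c * c for c in v))
    v = [c / norm for c in v]

overlap = abs(sum(v[i] * x[i] for i in range(n))) / math.sqrt(n)
```

Above the transition the leading eigenvector correlates strongly with the hidden spike; the paper's results pin down the exact mutual information and the minimal error that AMP attains in the Bayes optimal setting.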
Joint Causal Inference from Multiple Contexts
Title | Joint Causal Inference from Multiple Contexts |
Authors | Joris M. Mooij, Sara Magliacane, Tom Claassen |
Abstract | The gold standard for discovering causal relations is by means of experimentation. Over the last decades, alternative methods have been proposed that can infer causal relations between variables from certain statistical patterns in purely observational data. We introduce Joint Causal Inference (JCI), a novel approach to causal discovery from multiple data sets from different contexts that elegantly unifies both approaches. JCI is a causal modeling framework rather than a specific algorithm, and it can be implemented using any causal discovery algorithm that can take into account certain background knowledge. JCI can deal with different types of interventions (e.g., perfect, imperfect, stochastic, etc.) in a unified fashion, and does not require knowledge of intervention targets or types in case of interventional data. We explain how several well-known causal discovery algorithms can be seen as addressing special cases of the JCI framework, and we also propose novel implementations that extend existing causal discovery methods for purely observational data to the JCI setting. We evaluate different JCI implementations on synthetic data and on flow cytometry protein expression data and conclude that JCI implementations can considerably outperform state-of-the-art causal discovery algorithms. |
Tasks | Causal Discovery, Causal Inference |
Published | 2016-11-30 |
URL | https://arxiv.org/abs/1611.10351v5 |
https://arxiv.org/pdf/1611.10351v5.pdf | |
PWC | https://paperswithcode.com/paper/joint-causal-inference-from-multiple-contexts |
Repo | |
Framework | |
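The JCI idea of treating context as data can be sketched in its simplest form: pool an observational and an interventional dataset, add the context indicator C as one more exogenous variable, and look at (conditional) dependence patterns. The linear system and the correlation-based "test" below are made up, far weaker than what real causal discovery algorithms use:

```python
import random

random.seed(7)
# two contexts: observational (c = 0) and one where X is shifted by an
# intervention (c = 1); Y <- X + noise in both (made-up system)
def draw(c):
    x = (2.0 if c == 1 else 0.0) + random.gauss(0, 1)
    y = x + random.gauss(0, 1)
    return x, y

rows = [(float(c),) + draw(c) for c in (0, 1) for _ in range(300)]
C = [r[0] for r in rows]
X = [r[1] for r in rows]
Y = [r[2] for r in rows]

def corr(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b)) / n
    sa = (sum((u - ma) ** 2 for u in a) / n) ** 0.5
    sb = (sum((v - mb) ** 2 for v in b) / n) ** 0.5
    return cov / (sa * sb)

# C correlates with X directly but is (nearly) independent of Y given X:
# the statistical footprint of C -> X -> Y in the pooled data
r_cx, r_cy, r_xy = corr(C, X), corr(C, Y), corr(X, Y)
partial_cy_x = (r_cy - r_cx * r_xy) / ((1 - r_cx ** 2) * (1 - r_xy ** 2)) ** 0.5
```

The context variable behaves like any other node with known exogenous status, which is why standard constraint-based discovery algorithms can be reused on the pooled data.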
Comments on: “A Random Forest Guided Tour” by G. Biau and E. Scornet
Title | Comments on: “A Random Forest Guided Tour” by G. Biau and E. Scornet |
Authors | Sylvain Arlot, Robin Genuer |
Abstract | This paper is a comment on the survey paper by Biau and Scornet (2016) about random forests. We focus on the problem of quantifying the impact of each ingredient of random forests on their performance. We show that such a quantification is possible for a simple pure forest, leading to conclusions that could apply more generally. Then, we consider “hold-out” random forests, which are a good middle point between “toy” pure forests and Breiman’s original random forests. |
Tasks | |
Published | 2016-04-06 |
URL | http://arxiv.org/abs/1604.01515v1 |
http://arxiv.org/pdf/1604.01515v1.pdf | |
PWC | https://paperswithcode.com/paper/comments-on-a-random-forest-guided-tour-by-g |
Repo | |
Framework | |