Paper Group ANR 432
Partially Observable Markov Decision Process for Recommender Systems. DS-MLR: Exploiting Double Separability for Scaling up Distributed Multinomial Logistic Regression. Convolutional Imputation of Matrix Networks. Structure from Motion on a Sphere. A system of serial computation for classified rules prediction in non-regular ontology trees. Multi-c …
Partially Observable Markov Decision Process for Recommender Systems
Title | Partially Observable Markov Decision Process for Recommender Systems |
Authors | Zhongqi Lu, Qiang Yang |
Abstract | We report the “Recurrent Deterioration” (RD) phenomenon observed in online recommender systems. The RD phenomenon is reflected by the trend of performance degradation when the recommendation model is always trained based on users’ feedbacks of the previous recommendations. There are several reasons for the recommender systems to encounter the RD phenomenon, including the lack of negative training data and the evolution of users’ interests, etc. Motivated to tackle the problems causing the RD phenomenon, we propose the POMDP-Rec framework, which is a neural-optimized Partially Observable Markov Decision Process algorithm for recommender systems. We show that the POMDP-Rec framework effectively uses the accumulated historical data from real-world recommender systems and automatically achieves comparable results with those models fine-tuned exhaustively by domain exports on public datasets. |
Tasks | Recommendation Systems |
Published | 2016-08-28 |
URL | http://arxiv.org/abs/1608.07793v2 |
http://arxiv.org/pdf/1608.07793v2.pdf | |
PWC | https://paperswithcode.com/paper/partially-observable-markov-decision-process |
Repo | |
Framework | |
DS-MLR: Exploiting Double Separability for Scaling up Distributed Multinomial Logistic Regression
Title | DS-MLR: Exploiting Double Separability for Scaling up Distributed Multinomial Logistic Regression |
Authors | Parameswaran Raman, Sriram Srinivasan, Shin Matsushima, Xinhua Zhang, Hyokun Yun, S. V. N. Vishwanathan |
Abstract | Scaling multinomial logistic regression to datasets with very large number of data points and classes is challenging. This is primarily because one needs to compute the log-partition function on every data point. This makes distributing the computation hard. In this paper, we present a distributed stochastic gradient descent based optimization method (DS-MLR) for scaling up multinomial logistic regression problems to massive scale datasets without hitting any storage constraints on the data and model parameters. Our algorithm exploits double-separability, an attractive property that allows us to achieve both data as well as model parallelism simultaneously. In addition, we introduce a non-blocking and asynchronous variant of our algorithm that avoids bulk-synchronization. We demonstrate the versatility of DS-MLR to various scenarios in data and model parallelism, through an extensive empirical study using several real-world datasets. In particular, we demonstrate the scalability of DS-MLR by solving an extreme multi-class classification problem on the Reddit dataset (159 GB data, 358 GB parameters) where, to the best of our knowledge, no other existing methods apply. |
Tasks | |
Published | 2016-04-16 |
URL | http://arxiv.org/abs/1604.04706v7 |
http://arxiv.org/pdf/1604.04706v7.pdf | |
PWC | https://paperswithcode.com/paper/ds-mlr-exploiting-double-separability-for |
Repo | |
Framework | |
Convolutional Imputation of Matrix Networks
Title | Convolutional Imputation of Matrix Networks |
Authors | Qingyun Sun, Mengyuan Yan David Donoho, Stephen Boyd |
Abstract | A matrix network is a family of matrices, with relatedness modeled by a weighted graph. We consider the task of completing a partially observed matrix network. We assume a novel sampling scheme where a fraction of matrices might be completely unobserved. How can we recover the entire matrix network from incomplete observations? This mathematical problem arises in many applications including medical imaging and social networks. To recover the matrix network, we propose a structural assumption that the matrices have a graph Fourier transform which is low-rank. We formulate a convex optimization problem and prove an exact recovery guarantee for the optimization problem. Furthermore, we numerically characterize the exact recovery regime for varying rank and sampling rate and discover a new phase transition phenomenon. Then we give an iterative imputation algorithm to efficiently solve the optimization problem and complete large scale matrix networks. We demonstrate the algorithm with a variety of applications such as MRI and Facebook user network. |
Tasks | Imputation |
Published | 2016-06-02 |
URL | http://arxiv.org/abs/1606.00925v3 |
http://arxiv.org/pdf/1606.00925v3.pdf | |
PWC | https://paperswithcode.com/paper/convolutional-imputation-of-matrix-networks |
Repo | |
Framework | |
Structure from Motion on a Sphere
Title | Structure from Motion on a Sphere |
Authors | Jonathan Ventura |
Abstract | We describe a special case of structure from motion where the camera rotates on a sphere. The camera’s optical axis lies perpendicular to the sphere’s surface. In this case, the camera’s pose is minimally represented by three rotation parameters. From analysis of the epipolar geometry we derive a novel and efficient solution for the essential matrix relating two images, requiring only three point correspondences in the minimal case. We apply this solver in a structure-from-motion pipeline that aggregates pairwise relations by rotation averaging followed by bundle adjustment with an inverse depth parameterization. Our methods enable scene modeling with an outward-facing camera and object scanning with an inward-facing camera. |
Tasks | |
Published | 2016-04-01 |
URL | http://arxiv.org/abs/1604.00409v2 |
http://arxiv.org/pdf/1604.00409v2.pdf | |
PWC | https://paperswithcode.com/paper/structure-from-motion-on-a-sphere |
Repo | |
Framework | |
A system of serial computation for classified rules prediction in non-regular ontology trees
Title | A system of serial computation for classified rules prediction in non-regular ontology trees |
Authors | Kennedy E. Ehimwenma, Paul Crowther, Martin Beer |
Abstract | Objects or structures that are regular take uniform dimensions. Based on the concepts of regular models, our previous research work has developed a system of a regular ontology that models learning structures in a multiagent system for uniform pre-assessments in a learning environment. This regular ontology has led to the modelling of a classified rules learning algorithm that predicts the actual number of rules needed for inductive learning processes and decision making in a multiagent system. But not all processes or models are regular. Thus this paper presents a system of polynomial equation that can estimate and predict the required number of rules of a non-regular ontology model given some defined parameters. |
Tasks | Decision Making |
Published | 2016-04-08 |
URL | http://arxiv.org/abs/1604.02323v1 |
http://arxiv.org/pdf/1604.02323v1.pdf | |
PWC | https://paperswithcode.com/paper/a-system-of-serial-computation-for-classified |
Repo | |
Framework | |
Multi-category Angle-based Classifier Refit
Title | Multi-category Angle-based Classifier Refit |
Authors | Guo Xian Yau, Chong Zhang |
Abstract | Classification is an important statistical learning tool. In real application, besides high prediction accuracy, it is often desirable to estimate class conditional probabilities for new observations. For traditional problems where the number of observations is large, there exist many well developed approaches. Recently, high dimensional low sample size problems are becoming increasingly popular. Margin-based classifiers, such as logistic regression, are well established methods in the literature. On the other hand, in terms of probability estimation, it is known that for binary classifiers, the commonly used methods tend to under-estimate the norm of the classification function. This can lead to biased probability estimation. Remedy approaches have been proposed in the literature. However, for the simultaneous multicategory classification framework, much less work has been done. We fill the gap in this paper. In particular, we give theoretical insights on why heavy regularization terms are often needed in high dimensional applications, and how this can lead to bias in probability estimation. To overcome this difficulty, we propose a new refit strategy for multicategory angle-based classifiers. Our new method only adds a small computation cost to the problem, and is able to attain prediction accuracy that is as good as the regular margin-based classifiers. On the other hand, the improvement of probability estimation can be very significant. Numerical results suggest that the new refit approach is highly competitive. |
Tasks | |
Published | 2016-07-19 |
URL | http://arxiv.org/abs/1607.05709v1 |
http://arxiv.org/pdf/1607.05709v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-category-angle-based-classifier-refit |
Repo | |
Framework | |
A Hidden Absorbing Semi-Markov Model for Informatively Censored Temporal Data: Learning and Inference
Title | A Hidden Absorbing Semi-Markov Model for Informatively Censored Temporal Data: Learning and Inference |
Authors | Ahmed M. Alaa, Mihaela van der Schaar |
Abstract | Modeling continuous-time physiological processes that manifest a patient’s evolving clinical states is a key step in approaching many problems in healthcare. In this paper, we develop the Hidden Absorbing Semi-Markov Model (HASMM): a versatile probabilistic model that is capable of capturing the modern electronic health record (EHR) data. Unlike exist- ing models, an HASMM accommodates irregularly sampled, temporally correlated, and informatively censored physiological data, and can describe non-stationary clinical state transitions. Learning an HASMM from the EHR data is achieved via a novel forward- filtering backward-sampling Monte-Carlo EM algorithm that exploits the knowledge of the end-point clinical outcomes (informative censoring) in the EHR data, and implements the E-step by sequentially sampling the patients’ clinical states in the reverse-time direction while conditioning on the future states. Real-time inferences are drawn via a forward- filtering algorithm that operates on a virtually constructed discrete-time embedded Markov chain that mirrors the patient’s continuous-time state trajectory. We demonstrate the di- agnostic and prognostic utility of the HASMM in a critical care prognosis setting using a real-world dataset for patients admitted to the Ronald Reagan UCLA Medical Center. |
Tasks | |
Published | 2016-12-18 |
URL | http://arxiv.org/abs/1612.06007v2 |
http://arxiv.org/pdf/1612.06007v2.pdf | |
PWC | https://paperswithcode.com/paper/a-hidden-absorbing-semi-markov-model-for |
Repo | |
Framework | |
A generalized flow for multi-class and binary classification tasks: An Azure ML approach
Title | A generalized flow for multi-class and binary classification tasks: An Azure ML approach |
Authors | Matthew Bihis, Sohini Roychowdhury |
Abstract | The constant growth in the present day real-world databases pose computational challenges for a single computer. Cloud-based platforms, on the other hand, are capable of handling large volumes of information manipulation tasks, thereby necessitating their use for large real-world data set computations. This work focuses on creating a novel Generalized Flow within the cloud-based computing platform: Microsoft Azure Machine Learning Studio (MAMLS) that accepts multi-class and binary classification data sets alike and processes them to maximize the overall classification accuracy. First, each data set is split into training and testing data sets, respectively. Then, linear and nonlinear classification model parameters are estimated using the training data set. Data dimensionality reduction is then performed to maximize classification accuracy. For multi-class data sets, data centric information is used to further improve overall classification accuracy by reducing the multi-class classification to a series of hierarchical binary classification tasks. Finally, the performance of optimized classification model thus achieved is evaluated and scored on the testing data set. The classification characteristics of the proposed flow are comparatively evaluated on 3 public data sets and a local data set with respect to existing state-of-the-art methods. On the 3 public data sets, the proposed flow achieves 78-97.5% classification accuracy. Also, the local data set, created using the information regarding presence of Diabetic Retinopathy lesions in fundus images, results in 85.3-95.7% average classification accuracy, which is higher than the existing methods. Thus, the proposed generalized flow can be useful for a wide range of application-oriented “big data sets”. |
Tasks | Dimensionality Reduction |
Published | 2016-03-26 |
URL | http://arxiv.org/abs/1603.08070v1 |
http://arxiv.org/pdf/1603.08070v1.pdf | |
PWC | https://paperswithcode.com/paper/a-generalized-flow-for-multi-class-and-binary |
Repo | |
Framework | |
Bounds on the Number of Measurements for Reliable Compressive Classification
Title | Bounds on the Number of Measurements for Reliable Compressive Classification |
Authors | Hugo Reboredo, Francesco Renna, Robert Calderbank, Miguel R. D. Rodrigues |
Abstract | This paper studies the classification of high-dimensional Gaussian signals from low-dimensional noisy, linear measurements. In particular, it provides upper bounds (sufficient conditions) on the number of measurements required to drive the probability of misclassification to zero in the low-noise regime, both for random measurements and designed ones. Such bounds reveal two important operational regimes that are a function of the characteristics of the source: i) when the number of classes is less than or equal to the dimension of the space spanned by signals in each class, reliable classification is possible in the low-noise regime by using a one-vs-all measurement design; ii) when the dimension of the spaces spanned by signals in each class is lower than the number of classes, reliable classification is guaranteed in the low-noise regime by using a simple random measurement design. Simulation results both with synthetic and real data show that our analysis is sharp, in the sense that it is able to gauge the number of measurements required to drive the misclassification probability to zero in the low-noise regime. |
Tasks | |
Published | 2016-07-11 |
URL | http://arxiv.org/abs/1607.02801v2 |
http://arxiv.org/pdf/1607.02801v2.pdf | |
PWC | https://paperswithcode.com/paper/bounds-on-the-number-of-measurements-for |
Repo | |
Framework | |
Zeroth-order Asynchronous Doubly Stochastic Algorithm with Variance Reduction
Title | Zeroth-order Asynchronous Doubly Stochastic Algorithm with Variance Reduction |
Authors | Bin Gu, Zhouyuan Huo, Heng Huang |
Abstract | Zeroth-order (derivative-free) optimization attracts a lot of attention in machine learning, because explicit gradient calculations may be computationally expensive or infeasible. To handle large scale problems both in volume and dimension, recently asynchronous doubly stochastic zeroth-order algorithms were proposed. The convergence rate of existing asynchronous doubly stochastic zeroth order algorithms is $O(\frac{1}{\sqrt{T}})$ (also for the sequential stochastic zeroth-order optimization algorithms). In this paper, we focus on the finite sums of smooth but not necessarily convex functions, and propose an asynchronous doubly stochastic zeroth-order optimization algorithm using the accelerated technology of variance reduction (AsyDSZOVR). Rigorous theoretical analysis show that the convergence rate can be improved from $O(\frac{1}{\sqrt{T}})$ the best result of existing algorithms to $O(\frac{1}{T})$. Also our theoretical results is an improvement to the ones of the sequential stochastic zeroth-order optimization algorithms. |
Tasks | |
Published | 2016-12-05 |
URL | http://arxiv.org/abs/1612.01425v1 |
http://arxiv.org/pdf/1612.01425v1.pdf | |
PWC | https://paperswithcode.com/paper/zeroth-order-asynchronous-doubly-stochastic |
Repo | |
Framework | |
Fast, Scalable Phrase-Based SMT Decoding
Title | Fast, Scalable Phrase-Based SMT Decoding |
Authors | Hieu Hoang, Nikolay Bogoychev, Lane Schwartz, Marcin Junczys-Dowmunt |
Abstract | The utilization of statistical machine translation (SMT) has grown enormously over the last decade, many using open-source software developed by the NLP community. As commercial use has increased, there is need for software that is optimized for commercial requirements, in particular, fast phrase-based decoding and more efficient utilization of modern multicore servers. In this paper we re-examine the major components of phrase-based decoding and decoder implementation with particular emphasis on speed and scalability on multicore machines. The result is a drop-in replacement for the Moses decoder which is up to fifteen times faster and scales monotonically with the number of cores. |
Tasks | Machine Translation |
Published | 2016-10-13 |
URL | http://arxiv.org/abs/1610.04265v2 |
http://arxiv.org/pdf/1610.04265v2.pdf | |
PWC | https://paperswithcode.com/paper/fast-scalable-phrase-based-smt-decoding |
Repo | |
Framework | |
Correlation-Based Method for Sentiment Classification
Title | Correlation-Based Method for Sentiment Classification |
Authors | Hussam Hamdan |
Abstract | The classic supervised classification algorithms are efficient, but time-consuming, complicated and not interpretable, which makes it difficult to analyze their results that limits the possibility to improve them based on real observations. In this paper, we propose a new and a simple classifier to predict a sentiment label of a short text. This model keeps the capacity of human interpret-ability and can be extended to integrate NLP techniques in a more interpretable way. Our model is based on a correlation metric which measures the degree of association between a sentiment label and a word. Ten correlation metrics are proposed and evaluated intrinsically. And then a classifier based on each metric is proposed, evaluated and compared to the classic classification algorithms which have proved their performance in many studies. Our model outperforms these algorithms with several correlation metrics. |
Tasks | Sentiment Analysis |
Published | 2016-10-10 |
URL | http://arxiv.org/abs/1610.03120v2 |
http://arxiv.org/pdf/1610.03120v2.pdf | |
PWC | https://paperswithcode.com/paper/correlation-based-method-for-sentiment |
Repo | |
Framework | |
Information Theoretic Structure Learning with Confidence
Title | Information Theoretic Structure Learning with Confidence |
Authors | Kevin R. Moon, Morteza Noshad, Salimeh Yasaei Sekeh, Alfred O. Hero III |
Abstract | Information theoretic measures (e.g. the Kullback Liebler divergence and Shannon mutual information) have been used for exploring possibly nonlinear multivariate dependencies in high dimension. If these dependencies are assumed to follow a Markov factor graph model, this exploration process is called structure discovery. For discrete-valued samples, estimates of the information divergence over the parametric class of multinomial models lead to structure discovery methods whose mean squared error achieves parametric convergence rates as the sample size grows. However, a naive application of this method to continuous nonparametric multivariate models converges much more slowly. In this paper we introduce a new method for nonparametric structure discovery that uses weighted ensemble divergence estimators that achieve parametric convergence rates and obey an asymptotic central limit theorem that facilitates hypothesis testing and other types of statistical validation. |
Tasks | |
Published | 2016-09-13 |
URL | http://arxiv.org/abs/1609.03912v1 |
http://arxiv.org/pdf/1609.03912v1.pdf | |
PWC | https://paperswithcode.com/paper/information-theoretic-structure-learning-with |
Repo | |
Framework | |
Confidence Decision Trees via Online and Active Learning for Streaming (BIG) Data
Title | Confidence Decision Trees via Online and Active Learning for Streaming (BIG) Data |
Authors | Rocco De Rosa |
Abstract | Decision tree classifiers are a widely used tool in data stream mining. The use of confidence intervals to estimate the gain associated with each split leads to very effective methods, like the popular Hoeffding tree algorithm. From a statistical viewpoint, the analysis of decision tree classifiers in a streaming setting requires knowing when enough new information has been collected to justify splitting a leaf. Although some of the issues in the statistical analysis of Hoeffding trees have been already clarified, a general and rigorous study of confidence intervals for splitting criteria is missing. We fill this gap by deriving accurate confidence intervals to estimate the splitting gain in decision tree learning with respect to three criteria: entropy, Gini index, and a third index proposed by Kearns and Mansour. Our confidence intervals depend in a more detailed way on the tree parameters. We also extend our confidence analysis to a selective sampling setting, in which the decision tree learner adaptively decides which labels to query in the stream. We furnish theoretical guarantee bounding the probability that the classification is non-optimal learning the decision tree via our selective sampling strategy. Experiments on real and synthetic data in a streaming setting show that our trees are indeed more accurate than trees with the same number of leaves generated by other techniques and our active learning module permits to save labeling cost. In addition, comparing our labeling strategy with recent methods, we show that our approach is more robust and consistent respect all the other techniques applied to incremental decision trees. |
Tasks | Active Learning |
Published | 2016-04-12 |
URL | http://arxiv.org/abs/1604.03278v1 |
http://arxiv.org/pdf/1604.03278v1.pdf | |
PWC | https://paperswithcode.com/paper/confidence-decision-trees-via-online-and |
Repo | |
Framework | |
Efficiently Bounding Optimal Solutions after Small Data Modification in Large-Scale Empirical Risk Minimization
Title | Efficiently Bounding Optimal Solutions after Small Data Modification in Large-Scale Empirical Risk Minimization |
Authors | Hiroyuki Hanada, Atsushi Shibagaki, Jun Sakuma, Ichiro Takeuchi |
Abstract | We study large-scale classification problems in changing environments where a small part of the dataset is modified, and the effect of the data modification must be quickly incorporated into the classifier. When the entire dataset is large, even if the amount of the data modification is fairly small, the computational cost of re-training the classifier would be prohibitively large. In this paper, we propose a novel method for efficiently incorporating such a data modification effect into the classifier without actually re-training it. The proposed method provides bounds on the unknown optimal classifier with the cost only proportional to the size of the data modification. We demonstrate through numerical experiments that the proposed method provides sufficiently tight bounds with negligible computational costs, especially when a small part of the dataset is modified in a large-scale classification problem. |
Tasks | |
Published | 2016-06-01 |
URL | http://arxiv.org/abs/1606.00136v1 |
http://arxiv.org/pdf/1606.00136v1.pdf | |
PWC | https://paperswithcode.com/paper/efficiently-bounding-optimal-solutions-after |
Repo | |
Framework | |