Paper Group ANR 785
Stochastic Cubic Regularization for Fast Nonconvex Optimization
Title | Stochastic Cubic Regularization for Fast Nonconvex Optimization |
Authors | Nilesh Tripuraneni, Mitchell Stern, Chi Jin, Jeffrey Regier, Michael I. Jordan |
Abstract | This paper proposes a stochastic variant of a classic algorithm, the cubic-regularized Newton method [Nesterov and Polyak 2006]. The proposed algorithm efficiently escapes saddle points and finds approximate local minima for general smooth, nonconvex functions in only $\tilde{\mathcal{O}}(\epsilon^{-3.5})$ stochastic gradient and stochastic Hessian-vector product evaluations. The latter can be computed as efficiently as stochastic gradients. This improves upon the $\tilde{\mathcal{O}}(\epsilon^{-4})$ rate of stochastic gradient descent. Our rate matches the best-known result for finding local minima without requiring any delicate acceleration or variance-reduction techniques. |
Tasks | |
Published | 2017-11-08 |
URL | http://arxiv.org/abs/1711.02838v2 |
PDF | http://arxiv.org/pdf/1711.02838v2.pdf |
PWC | https://paperswithcode.com/paper/stochastic-cubic-regularization-for-fast |
Repo | |
Framework | |
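To make the core step concrete, here is a deterministic sketch of cubic-regularized Newton on a toy 2-D function with a strict saddle at the origin. Each step approximately minimizes the cubic model $m(s) = g^\top s + \tfrac12 s^\top H s + \tfrac{\rho}{6}\|s\|^3$ by gradient descent, touching $H$ only through Hessian-vector products. The test function, $\rho$, step size, and iteration counts are our assumptions; the paper's stochastic version would replace `grad` and `hvp` with minibatch estimates and adds safeguards not shown here.

```python
import math

def f(p):
    # Toy objective (our assumption): saddle at (0, 0), local minima at (0, +/-1).
    x, y = p
    return 0.5 * x * x - 0.5 * y * y + 0.25 * y ** 4

def grad(p):
    x, y = p
    return [x, -y + y ** 3]

def hvp(p, v):
    # Hessian of f is diag(1, -1 + 3y^2); only its product with v is exposed.
    _, y = p
    return [v[0], (-1.0 + 3.0 * y * y) * v[1]]

def cubic_subproblem(g, hv, rho, eta=0.05, steps=200):
    """Approximately minimize g.s + 0.5 s.Hs + (rho/6)||s||^3 by gradient descent."""
    s = [0.0, 0.0]
    for _ in range(steps):
        hs = hv(s)
        norm_s = math.hypot(s[0], s[1])
        # Gradient of the cubic model: g + H s + (rho/2) ||s|| s
        grad_m = [g[i] + hs[i] + 0.5 * rho * norm_s * s[i] for i in range(2)]
        s = [s[i] - eta * grad_m[i] for i in range(2)]
    return s

def cubic_newton(p, rho=10.0, iters=50):
    for _ in range(iters):
        g = grad(p)
        s = cubic_subproblem(g, lambda v, p=p: hvp(p, v), rho)
        p = [p[0] + s[0], p[1] + s[1]]
    return p

p = cubic_newton([0.5, 1e-3])
```

Started just off the saddle at (0.5, 0.001), the iterates move away from the origin along the negative-curvature direction and settle near the local minimum (0, 1).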
Completing a joint PMF from projections: a low-rank coupled tensor factorization approach
Title | Completing a joint PMF from projections: a low-rank coupled tensor factorization approach |
Authors | Nikos Kargas, Nicholas D. Sidiropoulos |
Abstract | There has recently been considerable interest in completing a low-rank matrix or tensor given only a small fraction (or few linear combinations) of its entries. Related approaches have found considerable success in machine learning, notably in recommender systems. From a statistical estimation point of view, the gold standard is to have access to the joint probability distribution of all pertinent random variables, from which any desired optimal estimator can be readily derived. In practice, high-dimensional joint distributions are very hard to estimate, and only estimates of low-dimensional projections may be available. We show that it is possible to identify higher-order joint PMFs from lower-order marginalized PMFs using coupled low-rank tensor factorization. Our approach features guaranteed identifiability when the full joint PMF is of low-enough rank, and effective approximation otherwise. We provide an algorithmic approach to compute the sought factors, and illustrate the merits of our approach using rating prediction as an example. |
Tasks | Recommendation Systems |
Published | 2017-02-16 |
URL | http://arxiv.org/abs/1702.05184v1 |
PDF | http://arxiv.org/pdf/1702.05184v1.pdf |
PWC | https://paperswithcode.com/paper/completing-a-joint-pmf-from-projections-a-low |
Repo | |
Framework | |
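The identity that makes coupling the factorizations possible can be checked numerically. In the sketch below (toy numbers, purely illustrative), a rank-2 joint PMF of three variables is written as a naive-Bayes-style sum of rank-one terms. Because each factor column is itself a PMF, summing out a variable simply drops its factor, so every lower-order marginal shares the same `lam`, `A`, `B` factors. This only illustrates the model structure, not the authors' estimation algorithm.

```python
from itertools import product

# Hypothetical rank-2 model: lam[f] = P(H = f), A[i][f] = P(X1 = i | H = f), etc.
lam = [0.6, 0.4]
A = [[0.7, 0.1], [0.2, 0.3], [0.1, 0.6]]  # each column sums to 1
B = [[0.5, 0.2], [0.5, 0.8]]
C = [[0.9, 0.3], [0.1, 0.7]]
F = len(lam)

def joint(i, j, k):
    """P(X1=i, X2=j, X3=k) as a sum of F rank-one terms."""
    return sum(lam[f] * A[i][f] * B[j][f] * C[k][f] for f in range(F))

def marginal_12_by_summing(i, j):
    """Marginalize X3 out of the full joint PMF."""
    return sum(joint(i, j, k) for k in range(len(C)))

def marginal_12_from_factors(i, j):
    """Same marginal read directly from the shared factors (C drops out)."""
    return sum(lam[f] * A[i][f] * B[j][f] for f in range(F))

for i, j in product(range(len(A)), range(len(B))):
    print(i, j, round(marginal_12_by_summing(i, j), 6), round(marginal_12_from_factors(i, j), 6))
```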
Data-Efficient Reinforcement Learning with Probabilistic Model Predictive Control
Title | Data-Efficient Reinforcement Learning with Probabilistic Model Predictive Control |
Authors | Sanket Kamthe, Marc Peter Deisenroth |
Abstract | Trial-and-error-based reinforcement learning (RL) has seen rapid advancements in recent times, especially with the advent of deep neural networks. However, the majority of autonomous RL algorithms require a large number of interactions with the environment. A large number of interactions may be impractical in many real-world applications, such as robotics, and many practical systems have to obey limitations in the form of state space or control constraints. To reduce the number of system interactions while simultaneously handling constraints, we propose a model-based RL framework based on probabilistic Model Predictive Control (MPC). In particular, we propose to learn a probabilistic transition model using Gaussian Processes (GPs) to incorporate model uncertainty into long-term predictions, thereby reducing the impact of model errors. We then use MPC to find a control sequence that minimises the expected long-term cost. We provide theoretical guarantees for first-order optimality in the GP-based transition models with deterministic approximate inference for long-term planning. We demonstrate that our approach not only achieves state-of-the-art data efficiency but is also a principled way to perform RL in constrained environments. |
Tasks | Gaussian Processes |
Published | 2017-06-20 |
URL | http://arxiv.org/abs/1706.06491v2 |
PDF | http://arxiv.org/pdf/1706.06491v2.pdf |
PWC | https://paperswithcode.com/paper/data-efficient-reinforcement-learning-with |
Repo | |
Framework | |
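The receding-horizon idea can be illustrated with a minimal sketch. The assumptions are all ours: a 1-D system; a hand-written probabilistic model whose mean is `x + u` and whose variance grows by a fixed amount per step, standing in for the learned GP and its long-term uncertainty propagation; a quadratic cost whose expectation is $(\mu - x^\ast)^2 + \sigma^2$; a mean-plus-two-standard-deviations state constraint; and brute-force enumeration over a coarse action grid instead of the paper's optimization of the control sequence.

```python
from itertools import product

ACTIONS = [-1.0, -0.5, 0.0, 0.5, 1.0]  # control constraint |u| <= 1 built into the grid

def predict(mu, var, u, noise_var=0.01):
    """Stand-in for a learned probabilistic model: mean x + u, variance grows."""
    return mu + u, var + noise_var

def expected_cost(x0, controls, target, x_max):
    mu, var, cost = x0, 0.0, 0.0
    for u in controls:
        mu, var = predict(mu, var, u)
        if mu + 2.0 * var ** 0.5 > x_max:  # chance-style state constraint
            return float("inf")
        # E[(x - target)^2] = (mu - target)^2 + var, plus a small control penalty
        cost += (mu - target) ** 2 + var + 0.1 * u * u
    return cost

def mpc_action(x0, target, x_max, horizon=3):
    best = min(product(ACTIONS, repeat=horizon),
               key=lambda seq: expected_cost(x0, seq, target, x_max))
    return best[0]  # apply only the first action, then re-plan

x, trajectory = 0.0, [0.0]
for _ in range(6):
    u = mpc_action(x, target=3.0, x_max=3.5)
    x = x + u  # "true" system: here identical to the model mean
    trajectory.append(x)
```

Re-planning at every step from the newly observed state is what lets MPC correct for model error; in this toy the model is exact, so the trajectory simply climbs to the target 3.0 and stays there without violating the constraint.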
Model Extraction Warning in MLaaS Paradigm
Title | Model Extraction Warning in MLaaS Paradigm |
Authors | Manish Kesarwani, Bhaskar Mukhoty, Vijay Arya, Sameep Mehta |
Abstract | Cloud vendors are increasingly offering machine learning services as part of their platform and services portfolios. These services enable the deployment of machine learning models on the cloud that are offered on a pay-per-query basis to application developers and end users. However, recent work has shown that the hosted models are susceptible to extraction attacks. Adversaries may launch queries to steal the model and compromise future query payments or the privacy of the training data. In this work, we present a cloud-based extraction monitor that can quantify the extraction status of models by observing the query and response streams of both individual and colluding adversarial users. We present a novel technique that uses information gain to measure how quickly users learn the model as their number of queries increases. Additionally, we present an alternative technique that maintains intelligent query summaries to measure the learning rate relative to the coverage of the input feature space in the presence of collusion. Both approaches have low computational overhead and can easily be offered as services to model owners to warn them of possible extraction attacks from adversaries. We present performance results for these approaches for decision tree models deployed on the BigML MLaaS platform, using open source datasets and different adversarial attack strategies. |
Tasks | Adversarial Attack |
Published | 2017-11-20 |
URL | http://arxiv.org/abs/1711.07221v1 |
PDF | http://arxiv.org/pdf/1711.07221v1.pdf |
PWC | https://paperswithcode.com/paper/model-extraction-warning-in-mlaas-paradigm |
Repo | |
Framework | |
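The coverage-based idea (the second technique in the abstract) can be sketched as follows. This is a hypothetical simplification: the feature space is assumed to be $[0, 1]^d$ and is summarized by a uniform grid, whereas the paper maintains its own query summaries; the class name, bin count, and threshold are illustrative. Pooling the visited cells of several users is what lets the monitor notice collusion that no single user's query stream reveals.

```python
from collections import defaultdict

class CoverageMonitor:
    """Hypothetical extraction monitor: tracks which grid cells of a
    [0, 1]^d feature space each user's queries have touched."""

    def __init__(self, dims, bins=4, threshold=0.5):
        self.dims, self.bins, self.threshold = dims, bins, threshold
        self.visited = defaultdict(set)  # user id -> set of cell indices

    def _cell(self, query):
        # Map each feature in [0, 1] to a bin index; clamp 1.0 into the last bin.
        return tuple(min(int(v * self.bins), self.bins - 1) for v in query)

    def observe(self, user, query):
        self.visited[user].add(self._cell(query))

    def coverage(self, users):
        # Union over users models collusion: colluders pool their coverage.
        cells = set()
        for u in users:
            cells |= self.visited[u]
        return len(cells) / self.bins ** self.dims

    def warn(self, users):
        return self.coverage(users) >= self.threshold

monitor = CoverageMonitor(dims=2, bins=2, threshold=0.7)
for q in [(0.1, 0.1), (0.1, 0.9)]:
    monitor.observe("alice", q)
monitor.observe("bob", (0.9, 0.1))
```

Here neither user alone crosses the threshold, but their pooled queries cover three of the four cells and trigger a warning.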
Max-Margin Invariant Features from Transformed Unlabeled Data
Title | Max-Margin Invariant Features from Transformed Unlabeled Data |
Authors | Dipan K. Pal, Ashwin A. Kannan, Gautam Arakalgud, Marios Savvides |
Abstract | The study of representations invariant to common transformations of the data is important to learning. Most techniques have focused on local approximate invariance implemented within expensive optimization frameworks lacking explicit theoretical guarantees. In this paper, we study kernels that are invariant to a unitary group while having theoretical guarantees in addressing the important practical issue of unavailability of transformed versions of labelled data. We call this the Unlabeled Transformation Problem; it is a special form of semi-supervised learning and one-shot learning. We present a theoretically motivated alternative approach to the invariant kernel SVM, on the basis of which we propose Max-Margin Invariant Features (MMIF) to solve this problem. As an illustration, we design a framework for face recognition and demonstrate the efficacy of our approach on a large-scale semi-synthetic dataset with 153,000 images and a new challenging protocol on Labelled Faces in the Wild (LFW), outperforming strong baselines. |
Tasks | Face Recognition, One-Shot Learning |
Published | 2017-10-24 |
URL | http://arxiv.org/abs/1710.08585v1 |
PDF | http://arxiv.org/pdf/1710.08585v1.pdf |
PWC | https://paperswithcode.com/paper/max-margin-invariant-features-from |
Repo | |
Framework | |
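One standard way to obtain a kernel invariant to a finite unitary group is to average a base kernel over the group orbit. The sketch below does this for the group of cyclic shifts (permutation matrices are unitary) with an RBF base kernel. It illustrates group invariance only; it is not the MMIF construction, and the function names and `gamma` value are our own.

```python
import math

def shift(v, k):
    """Cyclic shift by k positions: one element of the (unitary) shift group."""
    n = len(v)
    k %= n
    return v[n - k:] + v[:n - k]

def rbf(a, b, gamma=0.5):
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

def invariant_kernel(a, b, gamma=0.5):
    """Average of the base kernel over the orbit of b under all cyclic shifts."""
    n = len(b)
    return sum(rbf(a, shift(b, g), gamma) for g in range(n)) / n

x = [1.0, 0.0, 2.0, 0.5]
y = [0.0, 1.0, 0.5, 2.0]
```

Because the RBF kernel depends only on $\|a - b\|$, which a unitary map preserves, re-indexing the sum shows that shifting either argument leaves the average unchanged.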
A Teacher-Student Framework for Zero-Resource Neural Machine Translation
Title | A Teacher-Student Framework for Zero-Resource Neural Machine Translation |
Authors | Yun Chen, Yang Liu, Yong Cheng, Victor O. K. Li |
Abstract | While end-to-end neural machine translation (NMT) has made remarkable progress recently, it still suffers from the data scarcity problem for low-resource language pairs and domains. In this paper, we propose a method for zero-resource NMT by assuming that parallel sentences have close probabilities of generating a sentence in a third language. Based on this assumption, our method is able to train a source-to-target NMT model (“student”) without parallel corpora available, guided by an existing pivot-to-target NMT model (“teacher”) on a source-pivot parallel corpus. Experimental results show that the proposed method significantly improves over a baseline pivot-based model by 3.0 BLEU points across various language pairs. |
Tasks | Machine Translation |
Published | 2017-05-02 |
URL | http://arxiv.org/abs/1705.00753v1 |
PDF | http://arxiv.org/pdf/1705.00753v1.pdf |
PWC | https://paperswithcode.com/paper/a-teacher-student-framework-for-zero-resource |
Repo | |
Framework | |
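The training signal implied by the assumption (a source sentence and its pivot translation should assign similar probabilities to a target sentence) can be illustrated with a toy word-level distillation step: the student's next-word distribution is fitted to the teacher's by gradient descent on $\mathrm{KL}(p_{\text{teacher}} \,\|\, p_{\text{student}})$. The three-word vocabulary, fixed teacher distribution, and learning rate are hypothetical, and the student here has only free logits rather than an NMT encoder-decoder conditioned on the source sentence.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

teacher = [0.7, 0.2, 0.1]  # P(y | pivot z) from the pivot-to-target teacher (toy)
logits = [0.0, 0.0, 0.0]   # student's P(y | source x), parameterized by logits
initial_kl = kl(teacher, softmax(logits))

for _ in range(500):
    q = softmax(logits)
    # d KL(teacher || softmax(logits)) / d logits = q - teacher
    logits = [l - 0.5 * (qi - pi) for l, qi, pi in zip(logits, q, teacher)]

student = softmax(logits)
```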
A Multi-Layer K-means Approach for Multi-Sensor Data Pattern Recognition in Multi-Target Localization
Title | A Multi-Layer K-means Approach for Multi-Sensor Data Pattern Recognition in Multi-Target Localization |
Authors | Samuel Silva, Rengan Suresh, Feng Tao, Johnathan Votion, Yongcan Cao |
Abstract | Data-target association is an important step in multi-target localization for the intelligent operation of unmanned systems in numerous applications such as search and rescue, traffic management, and surveillance. The objective of this paper is to present an innovative data association learning approach named multi-layer K-means (MLKM) that leverages the advantages of existing machine learning approaches, including K-means, K-means++, and deep neural networks. To enable accurate association of data from different sensors for efficient target localization, MLKM relies on the clustering capabilities of K-means++ structured in a multi-layer framework, with an error-correction feature motivated by the backpropagation algorithm well known in deep learning research. To show the effectiveness of the MLKM method, numerous simulations are conducted to compare its performance with K-means, K-means++, and deep neural networks. |
Tasks | |
Published | 2017-05-30 |
URL | http://arxiv.org/abs/1705.10757v1 |
PDF | http://arxiv.org/pdf/1705.10757v1.pdf |
PWC | https://paperswithcode.com/paper/a-multi-layer-k-means-approach-for-multi |
Repo | |
Framework | |
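MLKM builds on K-means++ clustering. The sketch below shows only that building block, K-means++ seeding followed by Lloyd iterations, on made-up 2-D points; the multi-layer structure and backpropagation-inspired error correction of MLKM are not reproduced here.

```python
import random

def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_pp_init(points, k, rng):
    """K-means++ seeding: pick each new center with probability proportional to D(x)^2."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min(sqdist(p, c) for c in centers) for p in points]
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers

def nearest(p, centers):
    return min(range(len(centers)), key=lambda j: sqdist(p, centers[j]))

def kmeans(points, k, iters=20, seed=0):
    rng = random.Random(seed)
    centers = kmeans_pp_init(points, k, rng)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centers[j]
            for j, cl in enumerate(clusters)
        ]
    return centers, [nearest(p, centers) for p in points]

points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (10.0, 10.1), (10.2, 9.9), (9.9, 10.0)]
centers, labels = kmeans(points, k=2)
```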
Heterogeneous Face Attribute Estimation: A Deep Multi-Task Learning Approach
Title | Heterogeneous Face Attribute Estimation: A Deep Multi-Task Learning Approach |
Authors | Hu Han, Anil K. Jain, Fang Wang, Shiguang Shan, Xilin Chen |
Abstract | Face attribute estimation has many potential applications in video surveillance, face retrieval, and social media. While a number of methods have been proposed for face attribute estimation, most of them did not explicitly consider the attribute correlation and heterogeneity (e.g., ordinal vs. nominal and holistic vs. local) during feature representation learning. In this paper, we present a Deep Multi-Task Learning (DMTL) approach to jointly estimate multiple heterogeneous attributes from a single face image. In DMTL, we tackle attribute correlation and heterogeneity with convolutional neural networks (CNNs) consisting of shared feature learning for all the attributes, and category-specific feature learning for heterogeneous attributes. We also introduce an unconstrained face database (LFW+), an extension of public-domain LFW, with heterogeneous demographic attributes (age, gender, and race) obtained via crowdsourcing. Experimental results on benchmarks with multiple face attributes (MORPH II, LFW+, CelebA, LFWA, and FotW) show that the proposed approach outperforms the state of the art. Finally, evaluations on a public-domain face database (LAP) with a single attribute show that the proposed approach has excellent generalization ability. |
Tasks | Multi-Task Learning, Representation Learning |
Published | 2017-06-03 |
URL | http://arxiv.org/abs/1706.00906v3 |
PDF | http://arxiv.org/pdf/1706.00906v3.pdf |
PWC | https://paperswithcode.com/paper/heterogeneous-face-attribute-estimation-a |
Repo | |
Framework | |
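The architectural idea of a shared trunk with attribute-specific heads (hard parameter sharing) can be sketched in plain Python. The layer sizes, weights, the choice of a softmax head for a nominal attribute and a regression head for age, and the loss weighting are all hypothetical; DMTL itself uses CNNs and its own category-specific branches.

```python
import math

def relu(v):
    return [max(0.0, a) for a in v]

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(z):
    m = max(z)
    e = [math.exp(a - m) for a in z]
    s = sum(e)
    return [a / s for a in e]

class SharedTrunkMultiTask:
    """Hard parameter sharing: one shared layer, one head per attribute type."""

    def __init__(self, W_shared, W_nominal, w_ordinal):
        self.W_shared = W_shared    # shared features used by every attribute
        self.W_nominal = W_nominal  # logits for a nominal attribute (e.g. gender)
        self.w_ordinal = w_ordinal  # linear regressor for an ordinal attribute (e.g. age)

    def forward(self, x):
        h = relu(matvec(self.W_shared, x))
        probs = softmax(matvec(self.W_nominal, h))
        age = sum(w * a for w, a in zip(self.w_ordinal, h))
        return probs, age

    def loss(self, x, nominal_label, ordinal_target, age_weight=0.01):
        # Joint objective: cross-entropy for the nominal head + weighted squared error.
        probs, age = self.forward(x)
        return -math.log(probs[nominal_label]) + age_weight * (age - ordinal_target) ** 2

model = SharedTrunkMultiTask(
    W_shared=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    W_nominal=[[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
    w_ordinal=[30.0, 5.0, 5.0],
)
probs, age = model.forward([1.0, -1.0])
```

Both heads read the same shared features, so gradients from either task would update the shared layer during training.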
Towards automated patient data cleaning using deep learning: A feasibility study on the standardization of organ labeling
Title | Towards automated patient data cleaning using deep learning: A feasibility study on the standardization of organ labeling |
Authors | Timothy Rozario, Troy Long, Mingli Chen, Weiguo Lu, Steve Jiang |
Abstract | Data cleaning consumes about 80% of the time spent on data analysis for clinical research projects. This is a much bigger problem in the era of big data and machine learning in the field of medicine, where large volumes of data are being generated. We report an initial effort towards automated patient data cleaning using deep learning: the standardization of organ labeling in radiation therapy. Organs are often labeled inconsistently at different institutions (sometimes even within the same institution) and at different time periods, which poses a problem for clinical research, especially for multi-institutional collaborative clinical research, where the acquired patient data is not being used effectively. We developed a convolutional neural network (CNN) to automatically identify each organ in the CT image and then label it with the standardized nomenclature proposed by AAPM Task Group 263. We tested this model on the CT images of 54 patients with prostate cancer and 100 patients with head and neck cancer who previously received radiation therapy. The model achieved 100% accuracy in detecting organs and assigning standardized labels for the patients tested. This work shows the feasibility of using deep learning in patient data cleaning, enabling standardized datasets to be generated for effective intra- and inter-institutional collaborative clinical research. |
Tasks | |
Published | 2017-12-30 |
URL | http://arxiv.org/abs/1801.00096v1 |
PDF | http://arxiv.org/pdf/1801.00096v1.pdf |
PWC | https://paperswithcode.com/paper/towards-automated-patient-data-cleaning-using |
Repo | |
Framework | |
How Generative Adversarial Networks and Their Variants Work: An Overview
Title | How Generative Adversarial Networks and Their Variants Work: An Overview |
Authors | Yongjun Hong, Uiwon Hwang, Jaeyoon Yoo, Sungroh Yoon |
Abstract | Generative Adversarial Networks (GANs) have received wide attention in the machine learning field for their potential to learn high-dimensional, complex real data distributions. Specifically, they do not rely on any assumptions about the distribution and can generate realistic samples from a latent space in a simple manner. This powerful property has led to GANs being applied to various applications such as image synthesis, image attribute editing, image translation, domain adaptation, and other academic fields. In this paper, we aim to discuss the details of GANs for readers who are familiar with them but do not understand them deeply, or who wish to view GANs from various perspectives. In addition, we explain how GANs operate and the fundamental meaning of various objective functions that have been suggested recently. We then focus on how GANs can be combined with an autoencoder framework. Finally, we enumerate the GAN variants that are applied to various tasks and other fields for those who are interested in exploiting GANs for their research. |
Tasks | Domain Adaptation, Image Generation |
Published | 2017-11-16 |
URL | http://arxiv.org/abs/1711.05914v9 |
PDF | http://arxiv.org/pdf/1711.05914v9.pdf |
PWC | https://paperswithcode.com/paper/how-generative-adversarial-networks-and-their |
Repo | |
Framework | |
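As a reference point for the objective functions the overview discusses, the original minimax GAN objective (Goodfellow et al., 2014), its optimal discriminator for a fixed generator, and the resulting value can be written as follows. This is standard background rather than a result specific to this paper.

```latex
$$
\begin{aligned}
\min_G \max_D V(D, G) &= \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big], \\
D^*_G(x) &= \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}, \\
V(D^*_G, G) &= -\log 4 + 2\,\mathrm{JSD}\big(p_{\text{data}} \,\|\, p_g\big).
\end{aligned}
$$
```

Since the Jensen-Shannon divergence is nonnegative and zero only when the distributions coincide, the generator's global optimum is $p_g = p_{\text{data}}$, where $V = -\log 4$.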
ApproxDBN: Approximate Computing for Discriminative Deep Belief Networks
Title | ApproxDBN: Approximate Computing for Discriminative Deep Belief Networks |
Authors | Xiaojing Xu, Srinjoy Das, Ken Kreutz-Delgado |
Abstract | Probabilistic generative neural networks are useful for many applications, such as image classification, speech recognition and occlusion removal. However, the power budget for hardware implementations of neural networks can be extremely tight. To address this challenge, we describe a design methodology for using approximate computing methods to implement Approximate Deep Belief Networks (ApproxDBNs) by systematically exploring the use of (1) limited precision of variables; (2) criticality analysis to identify the nodes in the network which can operate with such limited precision while allowing the network to maintain target accuracy levels; and (3) a greedy search methodology with incremental retraining to determine the optimal reduction in precision that maximizes power savings under user-specified accuracy constraints. Experimental results show that significant bit-length reduction can be achieved by our ApproxDBN with constrained accuracy loss. |
Tasks | Image Classification, Speech Recognition |
Published | 2017-04-13 |
URL | http://arxiv.org/abs/1704.03993v3 |
PDF | http://arxiv.org/pdf/1704.03993v3.pdf |
PWC | https://paperswithcode.com/paper/approxdbn-approximate-computing-for |
Repo | |
Framework | |
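A minimal sketch of two ingredients named above: fixed-point precision reduction and a greedy per-layer bit-width search under an accuracy constraint. The accuracy function is a hypothetical stand-in for evaluating the quantized network, the layer names are invented, and the criticality analysis and incremental retraining steps are omitted.

```python
def quantize(w, frac_bits):
    """Round w to a fixed-point grid with frac_bits fractional bits."""
    scale = 2 ** frac_bits
    return round(w * scale) / scale

def greedy_bit_reduction(layers, evaluate, target, start_bits=16, min_bits=1):
    """Repeatedly drop one bit from any layer as long as accuracy stays >= target."""
    bits = {name: start_bits for name in layers}
    improved = True
    while improved:
        improved = False
        for name in layers:
            if bits[name] <= min_bits:
                continue
            trial = dict(bits)
            trial[name] -= 1
            if evaluate(trial) >= target:
                bits = trial
                improved = True
    return bits

# Hypothetical accuracy oracle: the network keeps its accuracy only while every
# layer retains at least the (made-up) number of bits listed here.
NEEDED = {"hidden1": 6, "hidden2": 4, "output": 8}

def toy_accuracy(bits):
    return 0.95 if all(bits[n] >= NEEDED[n] for n in NEEDED) else 0.60

result = greedy_bit_reduction(list(NEEDED), toy_accuracy, target=0.9)
```

In a real flow, `evaluate` would quantize the weights and activations with the trial bit widths (e.g. via `quantize`) and measure accuracy on held-out data, retraining between reductions as the abstract describes.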
Depression and Self-Harm Risk Assessment in Online Forums
Title | Depression and Self-Harm Risk Assessment in Online Forums |
Authors | Andrew Yates, Arman Cohan, Nazli Goharian |
Abstract | Users suffering from mental health conditions often turn to online resources for support, including specialized online support communities or general communities such as Twitter and Reddit. In this work, we present a neural framework for supporting and studying users in both types of communities. We propose methods for identifying posts in support communities that may indicate a risk of self-harm, and demonstrate that our approach outperforms strong previously proposed methods for identifying such posts. Self-harm is closely related to depression, which makes identifying depressed users on general forums a crucial related task. We introduce a large-scale general forum dataset (“RSDD”) consisting of users with self-reported depression diagnoses matched with control users. We show how our method can be applied to effectively identify depressed users from their use of language alone. We demonstrate that our method outperforms strong baselines on this general forum dataset. |
Tasks | |
Published | 2017-09-06 |
URL | http://arxiv.org/abs/1709.01848v1 |
PDF | http://arxiv.org/pdf/1709.01848v1.pdf |
PWC | https://paperswithcode.com/paper/depression-and-self-harm-risk-assessment-in |
Repo | |
Framework | |
Sequential Local Learning for Latent Graphical Models
Title | Sequential Local Learning for Latent Graphical Models |
Authors | Sejun Park, Eunho Yang, Jinwoo Shin |
Abstract | Learning the parameters of latent graphical models (GMs) is inherently much harder than learning those of models without latent variables, since the latent variables make the corresponding log-likelihood non-concave. Nevertheless, expectation-maximization schemes are widely used in practice, but they typically get stuck in local optima. In recent years, the method of moments has provided a refreshing angle for resolving the non-convexity issue, but it is applicable only to a quite limited class of latent GMs. In this paper, we aim to enhance its power by enlarging this class of latent GMs. To this end, we introduce two novel concepts, coined marginalization and conditioning, which can reduce the problem of learning a larger GM to that of a smaller one. More importantly, they lead to a sequential learning framework that repeatedly increases the learned portion of a given latent GM, and thus covers a significantly broader and more complicated class of loopy latent GMs, including convolutional and random regular models. |
Tasks | |
Published | 2017-03-12 |
URL | http://arxiv.org/abs/1703.04082v2 |
PDF | http://arxiv.org/pdf/1703.04082v2.pdf |
PWC | https://paperswithcode.com/paper/sequential-local-learning-for-latent |
Repo | |
Framework | |
Discriminative Nonlinear Analysis Operator Learning: When Cosparse Model Meets Image Classification
Title | Discriminative Nonlinear Analysis Operator Learning: When Cosparse Model Meets Image Classification |
Authors | Zaidao Wen, Biao Hou, Licheng Jiao |
Abstract | Dictionary learning frameworks based on the linear synthesis model have achieved remarkable performance in image classification in the last decade. As a generative feature model, however, such a framework suffers from some intrinsic deficiencies. In this paper, we propose a novel parametric nonlinear analysis cosparse model (NACM) with which a unique feature vector can be extracted much more efficiently. Additionally, we show that NACM is capable of simultaneously learning the task-adapted feature transformation and regularization to encode our preferences, domain prior knowledge, and task-oriented supervised information into the features. The proposed NACM is devoted to the classification task as a discriminative feature model, yielding a novel discriminative nonlinear analysis operator learning framework (DNAOL). The theoretical analysis and experimental results demonstrate that DNAOL not only achieves better or at least competitive classification accuracy compared with state-of-the-art algorithms, but also dramatically reduces the time complexity of both the training and testing phases. |
Tasks | Dictionary Learning, Image Classification |
Published | 2017-04-30 |
URL | http://arxiv.org/abs/1705.00322v1 |
PDF | http://arxiv.org/pdf/1705.00322v1.pdf |
PWC | https://paperswithcode.com/paper/discriminative-nonlinear-analysis-operator |
Repo | |
Framework | |
How Well Can Generative Adversarial Networks Learn Densities: A Nonparametric View
Title | How Well Can Generative Adversarial Networks Learn Densities: A Nonparametric View |
Authors | Tengyuan Liang |
Abstract | In this paper, we study the rate of convergence for learning densities under the Generative Adversarial Networks (GAN) framework, borrowing insights from nonparametric statistics. We introduce an improved GAN estimator that achieves a faster rate by simultaneously leveraging the level of smoothness in the target density and the evaluation metric, which in theory remedies the mode collapse problem reported in the literature. A minimax lower bound is constructed to show that when the dimension is large, the exponent in the rate for the new GAN estimator is near optimal. One can view our results as answering, in a quantitative way, how well GANs learn a wide range of densities with different smoothness properties under a hierarchy of evaluation metrics. As a byproduct, we also obtain improved generalization bounds for GANs with deeper ReLU discriminator networks. |
Tasks | |
Published | 2017-12-21 |
URL | http://arxiv.org/abs/1712.08244v2 |
PDF | http://arxiv.org/pdf/1712.08244v2.pdf |
PWC | https://paperswithcode.com/paper/how-well-can-generative-adversarial-networks |
Repo | |
Framework | |