Paper Group ANR 13
K-Nearest Oracles Borderline Dynamic Classifier Ensemble Selection. Analyzing different prototype selection techniques for dynamic classifier and ensemble selection. Mixture Martingales Revisited with Applications to Sequential Tests and Confidence Intervals. Escaping the Curse of Dimensionality in Similarity Learning: Efficient Frank-Wolfe Algorit …
K-Nearest Oracles Borderline Dynamic Classifier Ensemble Selection
Title | K-Nearest Oracles Borderline Dynamic Classifier Ensemble Selection |
Authors | Dayvid V. R. Oliveira, George D. C. Cavalcanti, Thyago N. Porpino, Rafael M. O. Cruz, Robert Sabourin |
Abstract | Dynamic Ensemble Selection (DES) techniques aim to select locally competent classifiers for the classification of each new test sample. Most DES techniques estimate the competence of classifiers using a given criterion over the region of competence of the test sample (its nearest neighbors in the validation set). The K-Nearest Oracles Eliminate (KNORA-E) DES selects all classifiers that correctly classify all samples in the region of competence of the test sample, if any such classifier exists; otherwise, it removes from the region of competence the sample that is furthest from the test sample, and the process repeats. When the region of competence has samples of different classes, KNORA-E can reduce the region of competence in such a way that only samples of a single class remain, leading to the selection of locally incompetent classifiers that classify all samples in the region of competence as being from the same class. In this paper, we propose two DES techniques: K-Nearest Oracles Borderline (KNORA-B) and K-Nearest Oracles Borderline Imbalanced (KNORA-BI). KNORA-B is a DES technique based on KNORA-E that reduces the region of competence but maintains at least one sample from each class that is in the original region of competence. KNORA-BI is a variation of KNORA-B for imbalanced datasets that reduces the region of competence but maintains at least one minority class sample if there is any in the original region of competence. Experiments are conducted comparing the proposed techniques with 19 DES techniques from the literature using 40 datasets. The proposed techniques achieved promising results, with KNORA-BI outperforming state-of-the-art techniques. |
Tasks | |
Published | 2018-04-18 |
URL | http://arxiv.org/abs/1804.06943v1 |
http://arxiv.org/pdf/1804.06943v1.pdf | |
PWC | https://paperswithcode.com/paper/k-nearest-oracles-borderline-dynamic |
Repo | |
Framework | |
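The selection rule described in the abstract can be sketched as follows. This is a minimal illustration assuming scikit-learn-style classifiers with a `predict` method; the fallback to the whole pool when the region cannot shrink further is an assumption, not the paper's specification.

```python
import numpy as np

def knora_b_select(query, X_val, y_val, classifiers, k=7):
    """Hedged sketch of KNORA-B: shrink the region of competence like KNORA-E,
    but never drop the last remaining sample of any class found in the
    original region."""
    # Region of competence: the k nearest validation neighbours of the query.
    dists = np.linalg.norm(X_val - query, axis=1)
    region = list(np.argsort(dists)[:k])          # nearest first
    original_classes = set(y_val[region])

    while region:
        Xr, yr = X_val[region], y_val[region]
        selected = [c for c in classifiers
                    if np.all(c.predict(Xr) == yr)]   # oracles on the region
        if selected:
            return selected
        # Drop the farthest sample, unless it is the last of its class.
        for idx in reversed(region):
            remaining = [y_val[j] for j in region if j != idx]
            if set(remaining) >= original_classes:
                region.remove(idx)
                break
        else:
            break   # cannot shrink further without losing a class
    # Assumption: fall back to the whole pool if no oracle is found.
    return list(classifiers)
```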
Analyzing different prototype selection techniques for dynamic classifier and ensemble selection
Title | Analyzing different prototype selection techniques for dynamic classifier and ensemble selection |
Authors | Rafael M. O. Cruz, Robert Sabourin, George D. C. Cavalcanti |
Abstract | In dynamic selection (DS) techniques, only the most competent classifiers for the classification of a specific test sample are selected to predict the sample's class label. The most important step in DS techniques is estimating the competence of the base classifiers for the classification of each specific test sample. The classifiers' competence is usually estimated using the neighborhood of the test sample defined on the validation samples, called the region of competence. Thus, the performance of DS techniques is sensitive to the distribution of the validation set. In this paper, we evaluate six prototype selection techniques that work by editing the validation data in order to remove noise and redundant instances. Experiments conducted using several state-of-the-art DS techniques over 30 classification problems demonstrate that by using prototype selection techniques we can improve the classification accuracy of DS techniques and also significantly reduce the computational cost involved. |
Tasks | |
Published | 2018-11-01 |
URL | http://arxiv.org/abs/1811.00677v1 |
http://arxiv.org/pdf/1811.00677v1.pdf | |
PWC | https://paperswithcode.com/paper/analyzing-different-prototype-selection |
Repo | |
Framework | |
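As one concrete instance of the editing techniques the paper evaluates, here is a minimal sketch of Edited Nearest Neighbours applied to the dynamic-selection validation set; the k value and the brute-force neighbour search are illustrative choices, not the paper's setup.

```python
import numpy as np
from collections import Counter

def enn_edit(X, y, k=3):
    """Hedged sketch of Edited Nearest Neighbours: remove validation samples
    whose label disagrees with the majority label of their k nearest
    neighbours, leaving a cleaner region of competence for dynamic selection."""
    keep = []
    for i in range(len(X)):
        d = np.linalg.norm(X - X[i], axis=1)
        nn = np.argsort(d)[1:k + 1]                    # skip the point itself
        majority = Counter(y[nn]).most_common(1)[0][0]
        if majority == y[i]:
            keep.append(i)
    return X[keep], y[keep]
```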
Mixture Martingales Revisited with Applications to Sequential Tests and Confidence Intervals
Title | Mixture Martingales Revisited with Applications to Sequential Tests and Confidence Intervals |
Authors | Emilie Kaufmann, Wouter Koolen |
Abstract | This paper presents new deviation inequalities that are valid uniformly in time under adaptive sampling in a multi-armed bandit model. The deviations are measured using the Kullback-Leibler divergence in a given one-dimensional exponential family, and may take into account several arms at a time. They are obtained by constructing for each arm a mixture martingale based on a hierarchical prior, and by multiplying those martingales. Our deviation inequalities allow us to analyze stopping rules based on generalized likelihood ratios for a large class of sequential identification problems, and to construct tight confidence intervals for some functions of the means of the arms. |
Tasks | |
Published | 2018-11-28 |
URL | http://arxiv.org/abs/1811.11419v1 |
http://arxiv.org/pdf/1811.11419v1.pdf | |
PWC | https://paperswithcode.com/paper/mixture-martingales-revisited-with |
Repo | |
Framework | |
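For context, a hedged illustration of the method-of-mixtures construction the abstract alludes to, written for a generic one-parameter exponential family; this is the textbook form, not the paper's exact inequality.

```latex
% Illustrative only. For arm $a$ with $N_a(t)$ pulls, reward sum $S_a(t)$ and
% log-moment-generating function $\phi$ of a single reward, each
% $W_t^{\lambda}$ below is a nonnegative martingale under adaptive sampling;
% mixing over a prior $\pi$ and applying Ville's inequality gives a deviation
% bound that holds uniformly in time.
\[
  W_t^{\lambda} = \exp\!\big(\lambda\, S_a(t) - N_a(t)\,\phi(\lambda)\big),
  \qquad
  W_t = \int W_t^{\lambda}\, d\pi(\lambda),
  \qquad
  \mathbb{P}\big(\exists\, t \ge 1 : W_t \ge 1/\delta\big) \le \delta .
\]
```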
Escaping the Curse of Dimensionality in Similarity Learning: Efficient Frank-Wolfe Algorithm and Generalization Bounds
Title | Escaping the Curse of Dimensionality in Similarity Learning: Efficient Frank-Wolfe Algorithm and Generalization Bounds |
Authors | Kuan Liu, Aurélien Bellet |
Abstract | Similarity and metric learning provides a principled approach to construct a task-specific similarity from weakly supervised data. However, these methods are subject to the curse of dimensionality: as the number of features grows large, poor generalization is to be expected and training becomes intractable due to high computational and memory costs. In this paper, we propose a similarity learning method that can efficiently deal with high-dimensional sparse data. This is achieved through a parameterization of similarity functions by convex combinations of sparse rank-one matrices, together with the use of a greedy approximate Frank-Wolfe algorithm which provides an efficient way to control the number of active features. We show that the convergence rate of the algorithm, as well as its time and memory complexity, are independent of the data dimension. We further provide a theoretical justification of our modeling choices through an analysis of the generalization error, which depends logarithmically on the sparsity of the solution rather than on the number of features. Our experiments on datasets with up to one million features demonstrate the ability of our approach to generalize well despite the high dimensionality as well as its superiority compared to several competing methods. |
Tasks | Metric Learning |
Published | 2018-07-20 |
URL | https://arxiv.org/abs/1807.07789v4 |
https://arxiv.org/pdf/1807.07789v4.pdf | |
PWC | https://paperswithcode.com/paper/escaping-the-curse-of-dimensionality-in |
Repo | |
Framework | |
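A minimal sketch of a Frank-Wolfe loop over rank-one atoms, assuming a simple squared loss on pairwise bilinear similarities; the atom set, loss, and dense gradient are simplifications of the paper's formulation, and a real implementation would keep the iterate as a sparse set of active entries rather than a dense matrix.

```python
import numpy as np

def frank_wolfe_similarity(X, pairs, targets, n_iter=50, scale=1.0):
    """Hedged sketch: learn M in conv{ +/- scale * e_i e_j^T } for the bilinear
    similarity x_a^T M x_b by greedily activating one (i, j) entry per step."""
    d = X.shape[1]
    M = np.zeros((d, d))
    for t in range(n_iter):
        # Gradient of a squared loss on the bilinear similarities.
        G = np.zeros((d, d))
        for (a, b), y in zip(pairs, targets):
            residual = X[a] @ M @ X[b] - y
            G += residual * np.outer(X[a], X[b])
        # Linear minimization oracle: the single entry of largest |gradient|.
        i, j = np.unravel_index(np.argmax(np.abs(G)), G.shape)
        atom = np.zeros((d, d))
        atom[i, j] = -scale * np.sign(G[i, j])
        gamma = 2.0 / (t + 2.0)              # standard Frank-Wolfe step size
        M = (1 - gamma) * M + gamma * atom
    return M
```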
On the Information Theoretic Distance Measures and Bidirectional Helmholtz Machines
Title | On the Information Theoretic Distance Measures and Bidirectional Helmholtz Machines |
Authors | Mahdi Azarafrooz, Xuan Zhao, Sepehr Akhavan-Masouleh |
Abstract | By establishing a connection between bi-directional Helmholtz machines and information theory, we propose a generalized Helmholtz machine. Theoretical and experimental results show that given *shallow* architectures, the generalized model outperforms the previous ones substantially. |
Tasks | |
Published | 2018-07-16 |
URL | http://arxiv.org/abs/1807.06054v1 |
http://arxiv.org/pdf/1807.06054v1.pdf | |
PWC | https://paperswithcode.com/paper/on-the-information-theoretic-distance |
Repo | |
Framework | |
Group Preserving Label Embedding for Multi-Label Classification
Title | Group Preserving Label Embedding for Multi-Label Classification |
Authors | Vikas Kumar, Arun K Pujari, Vineet Padmanabhan, Venkateswara Rao Kagita |
Abstract | Multi-label learning is concerned with the classification of data with multiple class labels. This is in contrast to the traditional classification problem where every data instance has a single label. Due to the exponential size of the output space, exploiting intrinsic information in feature and label spaces has been the major thrust of research in recent years, and the use of parametrization and embedding has been the prime focus. Researchers have studied several aspects of embedding, which include label embedding, input embedding, dimensionality reduction and feature selection. These approaches differ from one another in their capability to capture other intrinsic properties such as label correlation, local invariance etc. We assume here that the input data form groups and, as a result, the label matrix exhibits a sparsity pattern and hence the labels corresponding to objects in the same group have similar sparsity. In this paper, we study the embedding of labels together with the group information with the objective of building an efficient multi-label classifier. We assume the existence of a low-dimensional space onto which the feature vectors and label vectors can be embedded. In order to achieve this, we address three sub-problems, namely: (1) identification of groups of labels; (2) embedding of label vectors into a low-rank space so that the sparsity characteristic of individual groups remains invariant; and (3) determining a linear mapping that embeds the feature vectors onto the same set of points, as in stage 2, in the low-dimensional space. We compare our method with seven well-known algorithms on twelve benchmark data sets. Our experimental analysis demonstrates the superiority of our proposed method over state-of-the-art algorithms for multi-label learning. |
Tasks | Dimensionality Reduction, Feature Selection, Multi-Label Classification, Multi-Label Learning |
Published | 2018-12-24 |
URL | http://arxiv.org/abs/1812.09910v1 |
http://arxiv.org/pdf/1812.09910v1.pdf | |
PWC | https://paperswithcode.com/paper/group-preserving-label-embedding-for-multi |
Repo | |
Framework | |
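The three-stage pipeline outlined in the abstract could be sketched as below, with k-means, a truncated SVD, and ridge regression standing in for the paper's actual solvers; all hyperparameters are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def group_label_embedding(X, Y, n_groups=4, dim=16, reg=1e-2):
    """Hedged sketch of the three stages: (1) group the labels, (2) embed the
    label vectors into a low-rank space, (3) learn a linear map from features
    onto the same space."""
    # (1) Group labels by their co-occurrence patterns (columns of Y).
    groups = KMeans(n_clusters=n_groups, n_init=10).fit_predict(Y.T)

    # (2) Low-rank embedding of the label matrix (rows = instances).
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    Z = U[:, :dim] * s[:dim]                # instance positions in label space

    # (3) Linear map W so that X @ W approximates the same embedding.
    W = np.linalg.solve(X.T @ X + reg * np.eye(X.shape[1]), X.T @ Z)
    return groups, Z, W
```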
Multi-Modal Data Augmentation for End-to-End ASR
Title | Multi-Modal Data Augmentation for End-to-End ASR |
Authors | Adithya Renduchintala, Shuoyang Ding, Matthew Wiesner, Shinji Watanabe |
Abstract | We present a new end-to-end architecture for automatic speech recognition (ASR) that can be trained using *symbolic* input in addition to the traditional acoustic input. This architecture utilizes two separate encoders: one for acoustic input and another for symbolic input, both sharing the attention and decoder parameters. We call this architecture a multi-modal data augmentation network (MMDA), as it can support multi-modal (acoustic and symbolic) input and enables seamless mixing of large text datasets with significantly smaller transcribed speech corpora during training. We study different ways of transforming large text corpora into a symbolic form suitable for training our MMDA network. Our best MMDA setup obtains small improvements on character error rate (CER), and as much as 7-10% relative word error rate (WER) improvement over a baseline both with and without an external language model. |
Tasks | Data Augmentation, End-To-End Speech Recognition, Language Modelling, Speech Recognition |
Published | 2018-03-27 |
URL | http://arxiv.org/abs/1803.10299v3 |
http://arxiv.org/pdf/1803.10299v3.pdf | |
PWC | https://paperswithcode.com/paper/multi-modal-data-augmentation-for-end-to-end |
Repo | |
Framework | |
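A minimal PyTorch sketch of the dual-encoder idea, with both modalities routed through a shared attention module and decoder; layer sizes, the decoder interface, and the modality switch are illustrative assumptions rather than the paper's exact architecture.

```python
import torch.nn as nn

class MMDA(nn.Module):
    """Hedged sketch: an acoustic encoder and a symbolic encoder feed one
    shared attention/decoder stack, so text-only batches can be mixed with
    transcribed speech during training."""
    def __init__(self, n_feats, n_symbols, n_tokens, hidden=256):
        super().__init__()
        self.acoustic_enc = nn.LSTM(n_feats, hidden, batch_first=True)
        self.embed = nn.Embedding(n_symbols, hidden)
        self.symbolic_enc = nn.LSTM(hidden, hidden, batch_first=True)
        # Attention and decoder are shared between the two input modalities.
        self.attention = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
        self.decoder = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_tokens)

    def forward(self, x, modality, queries):
        # x: (B, T, n_feats) acoustic frames, or (B, T) symbol ids.
        if modality == "acoustic":
            memory, _ = self.acoustic_enc(x)
        else:
            memory, _ = self.symbolic_enc(self.embed(x))
        # queries: (B, T_out, hidden) embedded previous output tokens.
        context, _ = self.attention(queries, memory, memory)
        dec_out, _ = self.decoder(context)
        return self.out(dec_out)            # per-step logits over output tokens
```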
Effective Feature Learning with Unsupervised Learning for Improving the Predictive Models in Massive Open Online Courses
Title | Effective Feature Learning with Unsupervised Learning for Improving the Predictive Models in Massive Open Online Courses |
Authors | Mucong Ding, Kai Yang, Dit-Yan Yeung, Ting-Chuen Pong |
Abstract | The effectiveness of learning in massive open online courses (MOOCs) can be significantly enhanced by introducing personalized intervention schemes which rely on building predictive models of student learning behaviors such as some engagement or performance indicators. A major challenge that has to be addressed when building such models is to design handcrafted features that are effective for the prediction task at hand. In this paper, we make the first attempt to solve the feature learning problem by taking an unsupervised learning approach to learn a compact representation of the raw features, which contain a large degree of redundancy. Specifically, in order to capture the underlying learning patterns in the content domain and the temporal nature of the clickstream data, we train a modified auto-encoder (AE) combined with the long short-term memory (LSTM) network to obtain a fixed-length embedding for each input sequence. When compared with the original features, the new features that correspond to the embedding obtained by the modified LSTM-AE are not only more parsimonious but also more discriminative for our prediction task. Using simple supervised learning models, the learned features can improve the prediction accuracy by up to 17% compared with the supervised neural networks and reduce overfitting to the dominant low-performing group of students, specifically in the task of predicting students' performance. Our approach is generic in the sense that it is not restricted to a specific supervised learning model nor a specific prediction task for MOOC learning analytics. |
Tasks | |
Published | 2018-12-12 |
URL | http://arxiv.org/abs/1812.05044v2 |
http://arxiv.org/pdf/1812.05044v2.pdf | |
PWC | https://paperswithcode.com/paper/effective-feature-learning-with-unsupervised |
Repo | |
Framework | |
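A minimal PyTorch sketch of the LSTM auto-encoder described in the abstract: the final encoder state serves as the fixed-length embedding and the decoder reconstructs the input sequence from it; dimensions and the reconstruction target are illustrative.

```python
import torch.nn as nn

class ClickstreamLSTMAE(nn.Module):
    """Hedged sketch of an LSTM auto-encoder for clickstream sequences."""
    def __init__(self, n_feats, emb_dim=32):
        super().__init__()
        self.encoder = nn.LSTM(n_feats, emb_dim, batch_first=True)
        self.decoder = nn.LSTM(emb_dim, emb_dim, batch_first=True)
        self.project = nn.Linear(emb_dim, n_feats)

    def forward(self, x):                      # x: (batch, time, n_feats)
        _, (h, _) = self.encoder(x)
        embedding = h[-1]                      # fixed-length code per sequence
        # Repeat the code at every step and decode back to feature space.
        repeated = embedding.unsqueeze(1).expand(-1, x.size(1), -1)
        out, _ = self.decoder(repeated)
        return self.project(out), embedding

# Training would minimise the reconstruction error against x; the embedding is
# then fed to a simple supervised model for the downstream prediction task.
```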
Making Classifier Chains Resilient to Class Imbalance
Title | Making Classifier Chains Resilient to Class Imbalance |
Authors | Bin Liu, Grigorios Tsoumakas |
Abstract | Class imbalance is an intrinsic characteristic of multi-label data. Most of the labels in multi-label data sets are associated with a small number of training examples, much smaller compared to the size of the data set. Class imbalance poses a key challenge that plagues most multi-label learning methods. Ensemble of Classifier Chains (ECC), one of the most prominent multi-label learning methods, is no exception to this rule, as each of the binary models it builds is trained from all positive and negative examples of a label. To make ECC resilient to class imbalance, we first couple it with random undersampling. We then present two extensions of this basic approach, where we build a varying number of binary models per label and construct chains of different sizes, in order to improve the exploitation of majority examples with approximately the same computational budget. Experimental results on 16 multi-label datasets demonstrate the effectiveness of the proposed approaches in a variety of evaluation metrics. |
Tasks | Multi-Label Learning |
Published | 2018-07-30 |
URL | http://arxiv.org/abs/1807.11393v4 |
http://arxiv.org/pdf/1807.11393v4.pdf | |
PWC | https://paperswithcode.com/paper/making-classifier-chains-resilient-to-class |
Repo | |
Framework | |
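A minimal sketch of one classifier chain coupled with random undersampling, as described in the abstract; the chain ordering, base learner, and the omission of the paper's varying chain-length extensions are simplifying assumptions.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression

def train_undersampled_chain(X, Y, base=None, seed=0):
    """Hedged sketch: each binary model sees all minority examples of its label
    plus an equal-size random sample of the majority, with earlier labels'
    predictions appended as extra features along the chain."""
    base = base if base is not None else LogisticRegression(max_iter=1000)
    rng = np.random.default_rng(seed)
    order = rng.permutation(Y.shape[1])        # random label order for the chain
    models, X_aug = [], X.copy()
    for label in order:
        y = Y[:, label]
        pos, neg = np.where(y == 1)[0], np.where(y == 0)[0]
        minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
        sampled = rng.choice(majority, size=len(minority), replace=False)
        idx = np.concatenate([minority, sampled])
        clf = clone(base).fit(X_aug[idx], y[idx])
        models.append((label, clf))
        # Feed this label's predictions to the remaining links in the chain.
        X_aug = np.hstack([X_aug, clf.predict(X_aug).reshape(-1, 1)])
    return models
```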
Ensemble Soft-Margin Softmax Loss for Image Classification
Title | Ensemble Soft-Margin Softmax Loss for Image Classification |
Authors | Xiaobo Wang, Shifeng Zhang, Zhen Lei, Si Liu, Xiaojie Guo, Stan Z. Li |
Abstract | Softmax loss is arguably one of the most popular losses to train CNN models for image classification. However, recent works have exposed its limitation on feature discriminability. This paper casts a new viewpoint on the weakness of softmax loss. On the one hand, the CNN features learned using the softmax loss are often inadequately discriminative. We hence introduce a soft-margin softmax function to explicitly encourage the discrimination between different classes. On the other hand, the learned classifier of softmax loss is weak. We propose to assemble multiple such weak classifiers into a strong one, inspired by the recognition that diversity among weak classifiers is critical to a good ensemble. To achieve the diversity, we adopt the Hilbert-Schmidt Independence Criterion (HSIC). Considering these two aspects in one framework, we design a novel loss, named Ensemble soft-Margin Softmax (EM-Softmax). Extensive experiments on benchmark datasets are conducted to show the superiority of our design over the baseline softmax loss and several state-of-the-art alternatives. |
Tasks | Image Classification |
Published | 2018-05-10 |
URL | http://arxiv.org/abs/1805.03922v1 |
http://arxiv.org/pdf/1805.03922v1.pdf | |
PWC | https://paperswithcode.com/paper/ensemble-soft-margin-softmax-loss-for-image |
Repo | |
Framework | |
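A hedged sketch of the soft-margin softmax component in PyTorch: a margin is subtracted from the target-class logit before the cross-entropy, so the correct class must beat the others by at least that margin. The ensemble and HSIC diversity terms of EM-Softmax are not reproduced here.

```python
import torch
import torch.nn.functional as F

def soft_margin_softmax_loss(logits, targets, margin=0.3):
    """Cross-entropy on logits whose target entry is penalised by a margin."""
    adjusted = logits.clone()
    rows = torch.arange(logits.size(0), device=logits.device)
    adjusted[rows, targets] -= margin      # the correct class must win by `margin`
    return F.cross_entropy(adjusted, targets)
```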
Data-Driven Investigative Journalism For Connectas Dataset
Title | Data-Driven Investigative Journalism For Connectas Dataset |
Authors | Aniket Jain, Bhavya Sharma, Paridhi Choudhary, Rohan Sangave, William Yang |
Abstract | This paper explores the possibility of using machine learning algorithms to detect cases of corruption and malpractice by governments. The dataset used by the authors contains information about several government contracts in Colombia from 2007 to 2012. The authors begin by exploring and cleaning the data, then perform feature engineering, and finally implement machine learning models to detect anomalies in the given dataset. |
Tasks | Feature Engineering |
Published | 2018-04-23 |
URL | http://arxiv.org/abs/1804.08675v1 |
http://arxiv.org/pdf/1804.08675v1.pdf | |
PWC | https://paperswithcode.com/paper/data-driven-investigative-journalism-for |
Repo | |
Framework | |
META-DES: A Dynamic Ensemble Selection Framework using Meta-Learning
Title | META-DES: A Dynamic Ensemble Selection Framework using Meta-Learning |
Authors | Rafael M. O. Cruz, Robert Sabourin, George D. C. Cavalcanti, Tsang Ing Ren |
Abstract | Dynamic ensemble selection systems work by estimating the level of competence of each classifier from a pool of classifiers. Only the most competent ones are selected to classify a given test sample. This is achieved by defining a criterion to measure the level of competence of a base classifier, such as its accuracy in local regions of the feature space around the query instance. However, using only one criterion about the behavior of a base classifier is not sufficient to accurately estimate its level of competence. In this paper, we present a novel dynamic ensemble selection framework using meta-learning. We propose five distinct sets of meta-features, each one corresponding to a different criterion to measure the level of competence of a classifier for the classification of input samples. The meta-features are extracted from the training data and used to train a meta-classifier to predict whether or not a base classifier is competent enough to classify an input instance. During the generalization phase, the meta-features are extracted from the query instance and passed down as input to the meta-classifier. The meta-classifier estimates whether a base classifier is competent enough to be added to the ensemble. Experiments are conducted over several small sample size classification problems, i.e., problems with a high degree of uncertainty due to the lack of training data. Experimental results show that the proposed meta-learning framework greatly improves classification accuracy when compared against current state-of-the-art dynamic ensemble selection techniques. |
Tasks | Meta-Learning |
Published | 2018-09-30 |
URL | http://arxiv.org/abs/1810.01270v1 |
http://arxiv.org/pdf/1810.01270v1.pdf | |
PWC | https://paperswithcode.com/paper/meta-des-a-dynamic-ensemble-selection |
Repo | |
Framework | |
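A simplified sketch of the meta-training step: for every (sample, base classifier) pair, local meta-features are extracted and a meta-classifier learns to predict competence. Only two of the paper's five meta-feature sets are illustrated, the competence label is a simplification, and Gaussian naive Bayes stands in for the paper's meta-classifier.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

def train_meta_classifier(pool, X_meta, y_meta, X_val, y_val, k=7):
    """Hedged sketch of META-DES meta-training on a held-out meta set."""
    feats, labels = [], []
    for x, y in zip(X_meta, y_meta):
        d = np.linalg.norm(X_val - x, axis=1)
        neighbours = np.argsort(d)[:k]                 # region of competence
        for clf in pool:
            # Meta-feature set 1: hits on each neighbour in the region.
            local_hits = clf.predict(X_val[neighbours]) == y_val[neighbours]
            # Meta-feature set 2: the classifier's confidence on the sample.
            conf = clf.predict_proba(x.reshape(1, -1)).max()
            feats.append(np.concatenate([local_hits.astype(float), [conf]]))
            # Meta-label: did this base classifier get the sample right?
            labels.append(int(clf.predict(x.reshape(1, -1))[0] == y))
    return GaussianNB().fit(np.array(feats), np.array(labels))
```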
Patient representation learning and interpretable evaluation using clinical notes
Title | Patient representation learning and interpretable evaluation using clinical notes |
Authors | Madhumita Sushil, Simon Šuster, Kim Luyckx, Walter Daelemans |
Abstract | We have three contributions in this work: 1. We explore the utility of a stacked denoising autoencoder and a paragraph vector model to learn task-independent dense patient representations directly from clinical notes. To analyze if these representations are transferable across tasks, we evaluate them in multiple supervised setups to predict patient mortality, primary diagnostic and procedural category, and gender. We compare their performance with sparse representations obtained from a bag-of-words model. We observe that the learned generalized representations significantly outperform the sparse representations when we have few positive instances to learn from, and there is an absence of strong lexical features. 2. We compare the model performance of the feature set constructed from a bag of words to that obtained from medical concepts. In the latter case, concepts represent problems, treatments, and tests. We find that concept identification does not improve the classification performance. 3. We propose novel techniques to facilitate model interpretability. To understand and interpret the representations, we explore the best encoded features within the patient representations obtained from the autoencoder model. Further, we calculate feature sensitivity across two networks to identify the most significant input features for different classification tasks when we use these pretrained representations as the supervised input. We successfully extract the most influential features for the pipeline using this technique. |
Tasks | Denoising, Representation Learning |
Published | 2018-07-03 |
URL | http://arxiv.org/abs/1807.01395v1 |
http://arxiv.org/pdf/1807.01395v1.pdf | |
PWC | https://paperswithcode.com/paper/patient-representation-learning-and |
Repo | |
Framework | |
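A minimal PyTorch sketch of one denoising auto-encoder layer of the kind used to learn dense patient representations from bag-of-words note vectors; the corruption scheme and sizes are illustrative.

```python
import torch.nn as nn

class DenoisingAutoencoder(nn.Module):
    """Hedged sketch: corrupt the input with dropout noise, reconstruct it, and
    use the hidden code as the patient representation for downstream tasks."""
    def __init__(self, n_terms, hidden=300, corruption=0.3):
        super().__init__()
        self.corrupt = nn.Dropout(p=corruption)
        self.encode = nn.Sequential(nn.Linear(n_terms, hidden), nn.ReLU())
        self.decode = nn.Linear(hidden, n_terms)

    def forward(self, x):
        code = self.encode(self.corrupt(x))    # representation used downstream
        return self.decode(code), code
```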
The Theory and Algorithm of Ergodic Inference
Title | The Theory and Algorithm of Ergodic Inference |
Authors | Yichuan Zhang |
Abstract | Approximate inference is one of the fundamental research fields in machine learning. The two dominant theoretical inference frameworks in machine learning are variational inference (VI) and Markov chain Monte Carlo (MCMC). However, because of fundamental limitations in the theory, it is very challenging to improve existing VI and MCMC methods on both computational scalability and statistical efficiency. To overcome this obstacle, we propose a new theoretical inference framework called ergodic inference, based on the fundamental property of ergodic transformations. The key contribution of this work is to establish the theoretical foundation of ergodic inference for the development of practical algorithms in future work. |
Tasks | |
Published | 2018-11-17 |
URL | http://arxiv.org/abs/1811.07192v1 |
http://arxiv.org/pdf/1811.07192v1.pdf | |
PWC | https://paperswithcode.com/paper/the-theory-and-algorithm-of-ergodic-inference |
Repo | |
Framework | |
Newton-MR: Newton’s Method Without Smoothness or Convexity
Title | Newton-MR: Newton’s Method Without Smoothness or Convexity |
Authors | Fred Roosta, Yang Liu, Peng Xu, Michael W. Mahoney |
Abstract | Establishing global convergence of Newton-CG has long been limited to making strong convexity assumptions. Hence, many Newton-type variants have been proposed which aim at extending Newton-CG beyond strongly convex problems. However, the analysis of almost all these non-convex methods commonly relies on the Lipschitz continuity assumptions of the gradient and Hessian. Furthermore, the sub-problems of many of these methods are themselves non-trivial optimization problems. Here, we show that two simple modifications of Newton-CG result in an algorithm, called Newton-MR, which offers a diverse range of algorithmic and theoretical advantages. Newton-MR can be applied, beyond the traditional convex settings, to invex problems. Sub-problems of Newton-MR are simple ordinary least squares. Furthermore, by introducing a weaker notion of joint regularity of Hessian and gradient, we establish the global convergence of Newton-MR even in the absence of the usual smoothness assumptions. We also obtain Newton-MR’s local convergence guarantee that generalizes that of Newton-CG. Specifically, unlike the local convergence analysis of Newton-CG, which relies on the notion of isolated minimum, our analysis amounts to local convergence to the set of minima. |
Tasks | |
Published | 2018-09-30 |
URL | https://arxiv.org/abs/1810.00303v2 |
https://arxiv.org/pdf/1810.00303v2.pdf | |
PWC | https://paperswithcode.com/paper/newton-mr-newtons-method-without-smoothness |
Repo | |
Framework | |
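A hedged sketch of the Newton-MR template: the direction solves an ordinary least-squares problem in the Hessian and gradient (a pseudoinverse solve stands in for MINRES here), and the step size backtracks on the gradient norm rather than on the objective value.

```python
import numpy as np

def newton_mr(grad, hess, x0, iters=50, tol=1e-8):
    """Hedged sketch of a Newton-MR style iteration."""
    x = x0.astype(float)
    for _ in range(iters):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        H = hess(x)
        # Least-squares sub-problem: p minimises ||H p + g||.
        p, *_ = np.linalg.lstsq(H, -g, rcond=None)
        alpha = 1.0
        # Backtrack until the squared gradient norm decreases sufficiently.
        while (np.linalg.norm(grad(x + alpha * p)) ** 2
               > np.linalg.norm(g) ** 2 + 1e-4 * alpha * (2 * g @ H @ p)
               and alpha > 1e-8):
            alpha *= 0.5
        x = x + alpha * p
    return x
```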