Paper Group AWR 93
Business Process Deviance Mining: Review and Evaluation
Title | Business Process Deviance Mining: Review and Evaluation |
Authors | Hoang Nguyen, Marlon Dumas, Marcello La Rosa, Fabrizio Maria Maggi, Suriadi Suriadi |
Abstract | Business process deviance refers to the phenomenon whereby a subset of the executions of a business process deviate, in a negative or positive way, with respect to its expected or desirable outcomes. Deviant executions of a business process include those that violate compliance rules, or executions that undershoot or exceed performance targets. Deviance mining is concerned with uncovering the reasons for deviant executions by analyzing business process event logs. This article provides a systematic review and comparative evaluation of deviance mining approaches based on a family of data mining techniques known as sequence classification. Using real-life logs from multiple domains, we evaluate a range of feature types and classification methods in terms of their ability to accurately discriminate between normal and deviant executions of a process. We also analyze the interestingness of the rule sets extracted using different methods. We observe that feature sets extracted using pattern mining techniques only slightly outperform simpler feature sets based on counts of individual activity occurrences in a trace. |
Tasks | |
Published | 2016-08-29 |
URL | http://arxiv.org/abs/1608.08252v1 |
http://arxiv.org/pdf/1608.08252v1.pdf | |
PWC | https://paperswithcode.com/paper/business-process-deviance-mining-review-and |
Repo | https://github.com/Abercus/devianceminingoverview |
Framework | none |
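The abstract above compares pattern-mined features against simple counts of individual activity occurrences in a trace. A minimal sketch of that simpler baseline follows, assuming each trace is a plain list of activity names; the function name `activity_count_features` and the toy log are hypothetical and not taken from the paper or its repository.

```python
from collections import Counter


def activity_count_features(traces):
    """Map each trace (a list of activity names) to a vector of activity counts."""
    # A fixed, sorted vocabulary so every trace yields a vector of the same length.
    vocabulary = sorted({activity for trace in traces for activity in trace})
    vectors = []
    for trace in traces:
        counts = Counter(trace)  # missing activities count as 0
        vectors.append([counts[activity] for activity in vocabulary])
    return vocabulary, vectors


# Hypothetical toy event log: each inner list is one execution of a process.
toy_log = [
    ["register", "check", "approve"],
    ["register", "check", "check", "reject"],
]
vocabulary, vectors = activity_count_features(toy_log)
print(vocabulary)
print(vectors)
```

The resulting vectors can be passed to any standard classifier to separate normal from deviant traces; which classifier the authors used is not assumed here.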
Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings
Title | Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings |
Authors | Tolga Bolukbasi, Kai-Wei Chang, James Zou, Venkatesh Saligrama, Adam Kalai |
Abstract | The blind application of machine learning runs the risk of amplifying biases present in data. Such a danger is facing us with word embedding, a popular framework to represent text data as vectors which has been used in many machine learning and natural language processing tasks. We show that even word embeddings trained on Google News articles exhibit female/male gender stereotypes to a disturbing extent. This raises concerns because their widespread use, as we describe, often tends to amplify these biases. Geometrically, gender bias is first shown to be captured by a direction in the word embedding. Second, gender neutral words are shown to be linearly separable from gender definition words in the word embedding. Using these properties, we provide a methodology for modifying an embedding to remove gender stereotypes, such as the association between the words receptionist and female, while maintaining desired associations such as between the words queen and female. We define metrics to quantify both direct and indirect gender biases in embeddings, and develop algorithms to “debias” the embedding. Using crowd-worker evaluation as well as standard benchmarks, we empirically demonstrate that our algorithms significantly reduce gender bias in embeddings while preserving its useful properties such as the ability to cluster related concepts and to solve analogy tasks. The resulting embeddings can be used in applications without amplifying gender bias. |
Tasks | Word Embeddings |
Published | 2016-07-21 |
URL | http://arxiv.org/abs/1607.06520v1 |
http://arxiv.org/pdf/1607.06520v1.pdf | |
PWC | https://paperswithcode.com/paper/man-is-to-computer-programmer-as-woman-is-to |
Repo | https://github.com/if1015-datascience-ufpe/2018-2-ex3-bash |
Framework | none |
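The abstract states that gender bias is captured by a direction in the embedding. One simple way to act on such a direction is to project it out of a gender-neutral word vector. The sketch below illustrates only that geometric step, under the simplifying assumption that the direction is a single difference vector; the 3-dimensional vectors are made up for illustration, and the paper's actual algorithms may differ.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale v to unit length."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def neutralize(word_vec, bias_direction):
    """Remove the component of word_vec that lies along bias_direction."""
    g = normalize(bias_direction)
    projection = dot(word_vec, g)
    return [w - projection * gi for w, gi in zip(word_vec, g)]


# Hypothetical toy vectors; real embeddings have hundreds of dimensions.
he = [1.0, 0.0, 0.2]
she = [-1.0, 0.0, 0.2]
direction = [a - b for a, b in zip(he, she)]  # assumed bias direction

receptionist = [-0.4, 0.7, 0.1]
debiased = neutralize(receptionist, direction)
print(debiased)
```

After neutralizing, the word vector is orthogonal to the assumed direction, while its other components are unchanged.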
Measuring Neural Net Robustness with Constraints
Title | Measuring Neural Net Robustness with Constraints |
Authors | Osbert Bastani, Yani Ioannou, Leonidas Lampropoulos, Dimitrios Vytiniotis, Aditya Nori, Antonio Criminisi |
Abstract | Despite having high accuracy, neural nets have been shown to be susceptible to adversarial examples, where a small perturbation to an input can cause it to become mislabeled. We propose metrics for measuring the robustness of a neural net and devise a novel algorithm for approximating these metrics based on an encoding of robustness as a linear program. We show how our metrics can be used to evaluate the robustness of deep neural nets with experiments on the MNIST and CIFAR-10 datasets. Our algorithm generates more informative estimates of robustness metrics compared to estimates based on existing algorithms. Furthermore, we show how existing approaches to improving robustness “overfit” to adversarial examples generated using a specific algorithm. Finally, we show that our techniques can be used to additionally improve neural net robustness, both according to the metrics that we propose and according to previously proposed metrics. |
Tasks | |
Published | 2016-05-24 |
URL | http://arxiv.org/abs/1605.07262v2 |
http://arxiv.org/pdf/1605.07262v2.pdf | |
PWC | https://paperswithcode.com/paper/measuring-neural-net-robustness-with |
Repo | https://github.com/Microsoft/NeuralNetworkAnalysis |
Framework | none |
CliqueCNN: Deep Unsupervised Exemplar Learning
Title | CliqueCNN: Deep Unsupervised Exemplar Learning |
Authors | Miguel A. Bautista, Artsiom Sanakoyeu, Ekaterina Sutter, Björn Ommer |
Abstract | Exemplar learning is a powerful paradigm for discovering visual similarities in an unsupervised manner. In this context, however, the recent breakthrough in deep learning could not yet unfold its full potential. With only a single positive sample, a great imbalance between one positive and many negatives, and unreliable relationships between most samples, training of convolutional neural networks is impaired. Given weak estimates of local distance, we propose a single optimization problem to extract batches of samples with mutually consistent relations. Conflicting relations are distributed over different batches and similar samples are grouped into compact cliques. Learning exemplar similarities is framed as a sequence of clique categorization tasks. The CNN then consolidates transitivity relations within and between cliques and learns a single representation for all samples without the need for labels. The proposed unsupervised approach has shown competitive performance on detailed posture analysis and object classification. |
Tasks | Object Classification |
Published | 2016-08-31 |
URL | http://arxiv.org/abs/1608.08792v1 |
http://arxiv.org/pdf/1608.08792v1.pdf | |
PWC | https://paperswithcode.com/paper/cliquecnn-deep-unsupervised-exemplar-learning |
Repo | https://github.com/asanakoy/cliquecnn |
Framework | none |
An Architecture for Deep, Hierarchical Generative Models
Title | An Architecture for Deep, Hierarchical Generative Models |
Authors | Philip Bachman |
Abstract | We present an architecture which lets us train deep, directed generative models with many layers of latent variables. We include deterministic paths between all latent variables and the generated output, and provide a richer set of connections between computations for inference and generation, which enables more effective communication of information throughout the model during training. To improve performance on natural images, we incorporate a lightweight autoregressive model in the reconstruction distribution. These techniques permit end-to-end training of models with 10+ layers of latent variables. Experiments show that our approach achieves state-of-the-art performance on standard image modelling benchmarks, can expose latent class structure in the absence of label information, and can provide convincing imputations of occluded regions in natural images. |
Tasks | |
Published | 2016-12-08 |
URL | http://arxiv.org/abs/1612.04739v1 |
http://arxiv.org/pdf/1612.04739v1.pdf | |
PWC | https://paperswithcode.com/paper/an-architecture-for-deep-hierarchical |
Repo | https://github.com/Philip-Bachman/MatNets-NIPS |
Framework | none |
Deep Transfer Learning for Person Re-identification
Title | Deep Transfer Learning for Person Re-identification |
Authors | Mengyue Geng, Yaowei Wang, Tao Xiang, Yonghong Tian |
Abstract | Person re-identification (Re-ID) poses a unique challenge to deep learning: how to learn a deep model with millions of parameters on a small training set of few or no labels. In this paper, a number of deep transfer learning models are proposed to address the data sparsity problem. First, a deep network architecture is designed which differs from existing deep Re-ID models in that (a) it is more suitable for transferring representations learned from large image classification datasets, and (b) classification loss and verification loss are combined, each of which adopts a different dropout strategy. Second, a two-stepped fine-tuning strategy is developed to transfer knowledge from auxiliary datasets. Third, given an unlabelled Re-ID dataset, a novel unsupervised deep transfer learning model is developed based on co-training. The proposed models outperform the state-of-the-art deep Re-ID models by large margins: we achieve Rank-1 accuracy of 85.4%, 83.7% and 56.3% on CUHK03, Market1501, and VIPeR respectively, whilst on VIPeR, our unsupervised model (45.1%) beats most supervised models. |
Tasks | Image Classification, Person Re-Identification, Transfer Learning |
Published | 2016-11-16 |
URL | http://arxiv.org/abs/1611.05244v2 |
http://arxiv.org/pdf/1611.05244v2.pdf | |
PWC | https://paperswithcode.com/paper/deep-transfer-learning-for-person-re |
Repo | https://github.com/KaiyangZhou/deep-person-reid |
Framework | pytorch |
Bayesian quantile additive regression trees
Title | Bayesian quantile additive regression trees |
Authors | Bereket P. Kindo, Hao Wang, Timothy Hanson, Edsel A. Peña |
Abstract | Ensembles of regression trees have become popular statistical tools for the estimation of the conditional mean given a set of predictors. However, quantile regression trees and their ensembles have not yet garnered much attention despite the increasing popularity of the linear quantile regression model. This work proposes a Bayesian quantile additive regression trees model that shows very good predictive performance, illustrated using simulation studies and real data applications. Further extension to tackle binary classification problems is also considered. |
Tasks | |
Published | 2016-07-10 |
URL | http://arxiv.org/abs/1607.02676v1 |
http://arxiv.org/pdf/1607.02676v1.pdf | |
PWC | https://paperswithcode.com/paper/bayesian-quantile-additive-regression-trees |
Repo | https://github.com/bpkindo/bayesqart |
Framework | none |
Medical Concept Representation Learning from Electronic Health Records and its Application on Heart Failure Prediction
Title | Medical Concept Representation Learning from Electronic Health Records and its Application on Heart Failure Prediction |
Authors | Edward Choi, Andy Schuetz, Walter F. Stewart, Jimeng Sun |
Abstract | Objective: To transform heterogeneous clinical data from electronic health records into clinically meaningful constructed features using data-driven methods that rely, in part, on temporal relations among data. Materials and Methods: The clinically meaningful representations of medical concepts and patients are the key for health analytic applications. Most existing approaches directly construct features mapped to raw data (e.g., ICD or CPT codes), or utilize some ontology mapping such as SNOMED codes. However, none of the existing approaches leverage EHR data directly for learning such concept representation. We propose a new way to represent heterogeneous medical concepts (e.g., diagnoses, medications and procedures) based on co-occurrence patterns in longitudinal electronic health records. The intuition behind the method is to map medical concepts that are co-occurring closely in time to similar concept vectors so that their distance will be small. We also derive a simple method to construct patient vectors from the related medical concept vectors. Results: For qualitative evaluation, we study similar medical concepts across diagnosis, medication and procedure. In quantitative evaluation, our proposed representation significantly improves the predictive modeling performance for onset of heart failure (HF), where classification methods (e.g. logistic regression, neural network, support vector machine and K-nearest neighbors) achieve up to 23% improvement in area under the ROC curve (AUC) using this proposed representation. Conclusion: We proposed an effective method for patient and medical concept representation learning. The resulting representation can map relevant concepts together and also improves predictive modeling performance. |
Tasks | Representation Learning |
Published | 2016-02-11 |
URL | http://arxiv.org/abs/1602.03686v2 |
http://arxiv.org/pdf/1602.03686v2.pdf | |
PWC | https://paperswithcode.com/paper/medical-concept-representation-learning-from |
Repo | https://github.com/mp2893/retain |
Framework | none |
Optimizing Neural Network Hyperparameters with Gaussian Processes for Dialog Act Classification
Title | Optimizing Neural Network Hyperparameters with Gaussian Processes for Dialog Act Classification |
Authors | Franck Dernoncourt, Ji Young Lee |
Abstract | Systems based on artificial neural networks (ANNs) have achieved state-of-the-art results in many natural language processing tasks. Although ANNs do not require manually engineered features, ANNs have many hyperparameters to be optimized. The choice of hyperparameters significantly impacts models’ performances. However, the ANN hyperparameters are typically chosen by manual, grid, or random search, which either requires expert experience or is computationally expensive. Recent approaches based on Bayesian optimization using Gaussian processes (GPs) offer a more systematic way to automatically pinpoint optimal or near-optimal machine learning hyperparameters. Using a previously published ANN model yielding state-of-the-art results for dialog act classification, we demonstrate that optimizing hyperparameters using GP further improves the results, and reduces the computational time by a factor of 4 compared to a random search. Therefore it is a useful technique for tuning ANN models to yield the best performances for natural language processing tasks. |
Tasks | Dialog Act Classification, Gaussian Processes |
Published | 2016-09-27 |
URL | http://arxiv.org/abs/1609.08703v1 |
http://arxiv.org/pdf/1609.08703v1.pdf | |
PWC | https://paperswithcode.com/paper/optimizing-neural-network-hyperparameters |
Repo | https://github.com/Franck-Dernoncourt/slt2016 |
Framework | none |
Discriminative Correlation Filter with Channel and Spatial Reliability
Title | Discriminative Correlation Filter with Channel and Spatial Reliability |
Authors | Alan Lukežič, Tomáš Vojíř, Luka Čehovin, Jiří Matas, Matej Kristan |
Abstract | Short-term tracking is an open and challenging problem for which discriminative correlation filters (DCF) have shown excellent performance. We introduce the channel and spatial reliability concepts to DCF tracking and provide a novel learning algorithm for their efficient and seamless integration in the filter update and the tracking process. The spatial reliability map adjusts the filter support to the part of the object suitable for tracking. This both allows the search region to be enlarged and improves tracking of non-rectangular objects. Reliability scores reflect channel-wise quality of the learned filters and are used as feature weighting coefficients in localization. Experimentally, with only two simple standard features, HoGs and Colornames, the novel CSR-DCF method – DCF with Channel and Spatial Reliability – achieves state-of-the-art results on VOT 2016, VOT 2015 and OTB100. The CSR-DCF runs in real-time on a CPU. |
Tasks | Visual Object Tracking |
Published | 2016-11-25 |
URL | http://arxiv.org/abs/1611.08461v3 |
http://arxiv.org/pdf/1611.08461v3.pdf | |
PWC | https://paperswithcode.com/paper/discriminative-correlation-filter-with |
Repo | https://github.com/alanlukezic/csr-dcf |
Framework | none |
Robust Named Entity Recognition in Idiosyncratic Domains
Title | Robust Named Entity Recognition in Idiosyncratic Domains |
Authors | Sebastian Arnold, Felix A. Gers, Torsten Kilias, Alexander Löser |
Abstract | Named entity recognition often fails in idiosyncratic domains. That causes a problem for dependent tasks, such as entity linking and relation extraction. We propose a generic and robust approach for high-recall named entity recognition. Our approach is easy to train and offers strong generalization over diverse domain-specific language, such as news documents (e.g. Reuters) or biomedical text (e.g. Medline). Our approach is based on deep contextual sequence learning and utilizes stacked bidirectional LSTM networks. Our model is trained with only a few hundred labeled sentences and does not rely on further external knowledge. We report F1 scores in the range of 84-94% on standard datasets. |
Tasks | Entity Linking, Named Entity Recognition, Relation Extraction |
Published | 2016-08-24 |
URL | http://arxiv.org/abs/1608.06757v1 |
http://arxiv.org/pdf/1608.06757v1.pdf | |
PWC | https://paperswithcode.com/paper/robust-named-entity-recognition-in |
Repo | https://github.com/sebastianarnold/TeXoo |
Framework | none |
Generalizing and Hybridizing Count-based and Neural Language Models
Title | Generalizing and Hybridizing Count-based and Neural Language Models |
Authors | Graham Neubig, Chris Dyer |
Abstract | Language models (LMs) are statistical models that calculate probabilities over sequences of words or other discrete symbols. Currently two major paradigms for language modeling exist: count-based n-gram models, which have advantages of scalability and test-time speed, and neural LMs, which often achieve superior modeling performance. We demonstrate how both varieties of models can be unified in a single modeling framework that defines a set of probability distributions over the vocabulary of words, and then dynamically calculates mixture weights over these distributions. This formulation allows us to create novel hybrid models that combine the desirable features of count-based and neural LMs, and experiments demonstrate the advantages of these approaches. |
Tasks | Language Modelling |
Published | 2016-06-01 |
URL | http://arxiv.org/abs/1606.00499v2 |
http://arxiv.org/pdf/1606.00499v2.pdf | |
PWC | https://paperswithcode.com/paper/generalizing-and-hybridizing-count-based-and |
Repo | https://github.com/neubig/modlm |
Framework | none |
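The abstract describes a framework that defines a set of probability distributions over the vocabulary and then computes mixture weights over them. A minimal sketch of that mixing step follows. The two toy distributions and the raw weight scores are hypothetical stand-ins: in a real model the scores would be computed from the context, and nothing here is taken from the `modlm` code.

```python
import math


def softmax(scores):
    """Turn arbitrary real scores into mixture weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mix(distributions, weight_scores):
    """Interpolate several next-word distributions with softmax weights."""
    weights = softmax(weight_scores)
    vocab = distributions[0].keys()
    return {
        word: sum(lam * dist[word] for lam, dist in zip(weights, distributions))
        for word in vocab
    }


# Hypothetical next-word distributions from a count-based and a neural model.
count_based = {"cat": 0.5, "dog": 0.3, "fish": 0.2}
neural = {"cat": 0.2, "dog": 0.6, "fish": 0.2}

# Equal scores give equal weights; a context-dependent model would vary them.
mixed = mix([count_based, neural], [0.0, 0.0])
print(mixed)
```

Because the weights are non-negative and sum to 1, the mixture is itself a valid probability distribution over the vocabulary.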
Towards the Science of Security and Privacy in Machine Learning
Title | Towards the Science of Security and Privacy in Machine Learning |
Authors | Nicolas Papernot, Patrick McDaniel, Arunesh Sinha, Michael Wellman |
Abstract | Advances in machine learning (ML) in recent years have enabled a dizzying array of applications such as data analytics, autonomous systems, and security diagnostics. ML is now pervasive: new systems and models are being deployed in every domain imaginable, leading to rapid and widespread deployment of software based inference and decision making. There is growing recognition that ML exposes new vulnerabilities in software systems, yet the technical community’s understanding of the nature and extent of these vulnerabilities remains limited. We systematize recent findings on ML security and privacy, focusing on attacks identified on these systems and defenses crafted to date. We articulate a comprehensive threat model for ML, and categorize attacks and defenses within an adversarial framework. Key insights resulting from works both in the ML and security communities are identified and the effectiveness of approaches is related to structural elements of ML algorithms and the data used to train them. We conclude by formally exploring the opposing relationship between model accuracy and resilience to adversarial manipulation. Through these explorations, we show that there are (possibly unavoidable) tensions between model complexity, accuracy, and resilience that must be calibrated for the environments in which they will be used. |
Tasks | Decision Making |
Published | 2016-11-11 |
URL | http://arxiv.org/abs/1611.03814v1 |
http://arxiv.org/pdf/1611.03814v1.pdf | |
PWC | https://paperswithcode.com/paper/towards-the-science-of-security-and-privacy |
Repo | https://github.com/Abukara/Udacity_Secure-and-Private-AI-Scholarship-Challenge-Nanodegree-Program |
Framework | tf |
Covariate Regularized Community Detection in Sparse Graphs
Title | Covariate Regularized Community Detection in Sparse Graphs |
Authors | Bowei Yan, Purnamrita Sarkar |
Abstract | In this paper, we investigate community detection in networks in the presence of node covariates. In many instances, covariates and networks individually only give a partial view of the cluster structure. One needs to jointly infer the full cluster structure by considering both. In statistics, an emerging body of work has been focused on combining information from both the edges in the network and the node covariates to infer community memberships. However, so far the theoretical guarantees have been established in the dense regime, where the network can lead to perfect clustering under a broad parameter regime, and hence the role of covariates is often not clear. In this paper, we examine sparse networks in conjunction with finite dimensional sub-gaussian mixtures as covariates under moderate separation conditions. In this setting each individual source can only cluster a non-vanishing fraction of nodes correctly. We propose a simple optimization framework which provably improves clustering accuracy when the two sources carry partial information about the cluster memberships, and hence perform poorly on their own. Our optimization problem can be solved using scalable convex optimization algorithms. Using a variety of simulated and real data examples, we show that the proposed method outperforms other existing methods. |
Tasks | Community Detection |
Published | 2016-07-10 |
URL | http://arxiv.org/abs/1607.02675v4 |
http://arxiv.org/pdf/1607.02675v4.pdf | |
PWC | https://paperswithcode.com/paper/covariate-regularized-community-detection-in |
Repo | https://github.com/boweiYan/SDP_SBM_unbalanced_size |
Framework | none |
Density estimation using Real NVP
Title | Density estimation using Real NVP |
Authors | Laurent Dinh, Jascha Sohl-Dickstein, Samy Bengio |
Abstract | Unsupervised learning of probabilistic models is a central yet challenging problem in machine learning. Specifically, designing models with tractable learning, sampling, inference and evaluation is crucial in solving this task. We extend the space of such models using real-valued non-volume preserving (real NVP) transformations, a set of powerful invertible and learnable transformations, resulting in an unsupervised learning algorithm with exact log-likelihood computation, exact sampling, exact inference of latent variables, and an interpretable latent space. We demonstrate its ability to model natural images on four datasets through sampling, log-likelihood evaluation and latent variable manipulations. |
Tasks | Density Estimation, Image Generation |
Published | 2016-05-27 |
URL | http://arxiv.org/abs/1605.08803v3 |
http://arxiv.org/pdf/1605.08803v3.pdf | |
PWC | https://paperswithcode.com/paper/density-estimation-using-real-nvp |
Repo | https://github.com/ANLGBOY/RealNVP-with-PyTorch |
Framework | pytorch |
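The abstract credits invertible, learnable transformations with giving exact log-likelihood computation and exact sampling. A common building block of this kind is an affine coupling layer: half of the input passes through unchanged and parameterizes a scale and shift for the other half. The sketch below assumes that construction; `scale` and `shift` are hypothetical hand-written functions standing in for neural networks, and the code is not taken from the linked repository.

```python
import math


def scale(x1):
    """Hypothetical stand-in for a learned log-scale network s(x1)."""
    return [0.5 * a for a in x1]


def shift(x1):
    """Hypothetical stand-in for a learned translation network t(x1)."""
    return [a + 1.0 for a in x1]


def coupling_forward(x1, x2):
    """y1 = x1, y2 = x2 * exp(s(x1)) + t(x1); also return log|det Jacobian|."""
    s, t = scale(x1), shift(x1)
    y2 = [b * math.exp(si) + ti for b, si, ti in zip(x2, s, t)]
    log_det = sum(s)  # the Jacobian is triangular, so log-det is the sum of s
    return x1, y2, log_det


def coupling_inverse(y1, y2):
    """Invert the layer exactly: x2 = (y2 - t(y1)) * exp(-s(y1))."""
    s, t = scale(y1), shift(y1)
    x2 = [(b - ti) * math.exp(-si) for b, si, ti in zip(y2, s, t)]
    return y1, x2


x1, x2 = [0.2, -0.4], [1.0, 2.0]
y1, y2, log_det = coupling_forward(x1, x2)
x1_back, x2_back = coupling_inverse(y1, y2)
print(y2, log_det, x2_back)
```

Because `x1` is passed through untouched, the inverse can recompute the same scale and shift without inverting `scale` or `shift` themselves, which is what keeps both sampling and likelihood evaluation exact.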