Paper Group ANR 1113
On Classifying Sepsis Heterogeneity in the ICU: Insight Using Machine Learning
Title | On Classifying Sepsis Heterogeneity in the ICU: Insight Using Machine Learning |
Authors | Zina Ibrahim, Honghan Wu, Ahmed Hamoud, Lukas Stappen, Richard Dobson, Andrea Agarossi |
Abstract | Current machine learning models aiming to predict sepsis from Electronic Health Records (EHR) do not account for the heterogeneity of the condition, despite its emerging importance in prognosis and treatment. This work demonstrates the added value of stratifying the types of organ dysfunction observed in patients who develop sepsis in the ICU in improving the ability to recognise patients at risk of sepsis from their EHR data. Using an ICU dataset of 13,728 records, we identify clinically significant sepsis subpopulations with distinct organ dysfunction patterns. Classification experiments using Random Forest, Gradient Boost Trees and Support Vector Machines, aiming to distinguish patients who develop sepsis in the ICU from those who do not, show that features selected using sepsis subpopulations as background knowledge yield a superior performance regardless of the classification model used. Our findings can steer machine learning efforts towards more personalised models for complex conditions including sepsis. |
Tasks | |
Published | 2019-12-02 |
URL | https://arxiv.org/abs/1912.00672v2 |
https://arxiv.org/pdf/1912.00672v2.pdf | |
PWC | https://paperswithcode.com/paper/on-classifying-sepsis-heterogeneity-in-the |
Repo | |
Framework | |
HCNAF: Hyper-Conditioned Neural Autoregressive Flow and its Application for Probabilistic Occupancy Map Forecasting
Title | HCNAF: Hyper-Conditioned Neural Autoregressive Flow and its Application for Probabilistic Occupancy Map Forecasting |
Authors | Geunseob Oh, Jean-Sebastien Valois |
Abstract | We introduce Hyper-Conditioned Neural Autoregressive Flow (HCNAF), a powerful universal distribution approximator designed to model arbitrarily complex conditional probability density functions. HCNAF consists of a neural-net based conditional autoregressive flow (AF) and a hyper-network that can take large conditions in a non-autoregressive fashion and output the network parameters of the AF. Like other flow models, HCNAF performs exact likelihood inference. We demonstrate the effectiveness and attributes of HCNAF, including its generalization capability over unseen conditions, and show that HCNAF outperforms recent flow models in a conditional density estimation task for MNIST. We also show that HCNAF scales up to complex high-dimensional prediction problems of the magnitude of self-driving and that HCNAF yields state-of-the-art performance on a public self-driving dataset. |
Tasks | Density Estimation |
Published | 2019-12-17 |
URL | https://arxiv.org/abs/1912.08111v2 |
https://arxiv.org/pdf/1912.08111v2.pdf | |
PWC | https://paperswithcode.com/paper/hcnaf-hyper-conditioned-neural-autoregressive |
Repo | |
Framework | |
FfDL : A Flexible Multi-tenant Deep Learning Platform
Title | FfDL : A Flexible Multi-tenant Deep Learning Platform |
Authors | K. R. Jayaram, Vinod Muthusamy, Parijat Dube, Vatche Ishakian, Chen Wang, Benjamin Herta, Scott Boag, Diana Arroyo, Asser Tantawi, Archit Verma, Falk Pollok, Rania Khalaf |
Abstract | Deep learning (DL) is becoming increasingly popular in several application domains and has made several new application features involving computer vision, speech recognition and synthesis, self-driving automobiles, drug design, etc. feasible and accurate. As a result, large scale on-premise and cloud-hosted deep learning platforms have become essential infrastructure in many organizations. These systems accept, schedule, manage and execute DL training jobs at scale. This paper describes the design, implementation and our experiences with FfDL, a DL platform used at IBM. We describe how our design balances dependability with scalability, elasticity, flexibility and efficiency. We examine FfDL qualitatively through a retrospective look at the lessons learned from building, operating, and supporting FfDL; and quantitatively through a detailed empirical evaluation of FfDL, including the overheads introduced by the platform for various deep learning models, the load and performance observed in a real case study using FfDL within our organization, the frequency of various faults observed including unanticipated faults, and experiments demonstrating the benefits of various scheduling policies. FfDL has been open-sourced. |
Tasks | Speech Recognition |
Published | 2019-09-14 |
URL | https://arxiv.org/abs/1909.06526v1 |
https://arxiv.org/pdf/1909.06526v1.pdf | |
PWC | https://paperswithcode.com/paper/ffdl-a-flexible-multi-tenant-deep-learning |
Repo | |
Framework | |
Attention Based Neural Architecture for Rumor Detection with Author Context Awareness
Title | Attention Based Neural Architecture for Rumor Detection with Author Context Awareness |
Authors | Sansiri Tarnpradab, Kien A. Hua |
Abstract | The prevalence of social media has made information sharing possible across the globe. The downside, unfortunately, is the wide spread of misinformation. Methods applied in most previous rumor classifiers give an equal weight, or attention, to words in the microblog, and do not take the context beyond microblog contents into account; as a result, their accuracy plateaus. In this research, we propose an ensemble neural architecture to detect rumors on Twitter. The architecture incorporates word attention and context from the author to enhance the classification performance. In particular, the word-level attention mechanism enables the architecture to put more emphasis on important words when constructing the text representation. To derive further context, microblog posts composed by individual authors are exploited since they can reflect style and characteristics in spreading information, which are significant cues to help classify whether the shared content is a rumor or legitimate news. The experiment on the real-world Twitter dataset collected from two well-known rumor tracking websites demonstrates promising results. |
Tasks | |
Published | 2019-09-19 |
URL | https://arxiv.org/abs/1910.01458v1 |
https://arxiv.org/pdf/1910.01458v1.pdf | |
PWC | https://paperswithcode.com/paper/attention-based-neural-architecture-for-rumor |
Repo | |
Framework | |
Learning Mixtures of Linear Regressions in Subexponential Time via Fourier Moments
Title | Learning Mixtures of Linear Regressions in Subexponential Time via Fourier Moments |
Authors | Sitan Chen, Jerry Li, Zhao Song |
Abstract | We consider the problem of learning a mixture of linear regressions (MLRs). An MLR is specified by $k$ nonnegative mixing weights $p_1, \ldots, p_k$ summing to $1$, and $k$ unknown regressors $w_1, \ldots, w_k \in \mathbb{R}^d$. A sample from the MLR is drawn by sampling $i$ with probability $p_i$, then outputting $(x, y)$ where $y = \langle x, w_i \rangle + \eta$, where $\eta\sim\mathcal{N}(0,\varsigma^2)$ for noise rate $\varsigma$. Mixtures of linear regressions are a popular generative model and have been studied extensively in machine learning and theoretical computer science. However, all previous algorithms for learning the parameters of an MLR require running time and sample complexity scaling exponentially with $k$. In this paper, we give the first algorithm for learning an MLR that runs in time which is sub-exponential in $k$. Specifically, we give an algorithm which runs in time $\widetilde{O}(d)\cdot\exp(\widetilde{O}(\sqrt{k}))$ and outputs the parameters of the MLR to high accuracy, even in the presence of nontrivial regression noise. We demonstrate a new method that we call “Fourier moment descent” which uses univariate density estimation and low-degree moments of the Fourier transform of suitable univariate projections of the MLR to iteratively refine our estimate of the parameters. To the best of our knowledge, these techniques have never been used in the context of high dimensional distribution learning, and may be of independent interest. We also show that our techniques can be used to give a sub-exponential time algorithm for learning mixtures of hyperplanes, a natural hard instance of the subspace clustering problem. |
Tasks | Density Estimation |
Published | 2019-12-16 |
URL | https://arxiv.org/abs/1912.07629v1 |
https://arxiv.org/pdf/1912.07629v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-mixtures-of-linear-regressions-in |
Repo | |
Framework | |
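The sampling process defined in the abstract above (pick component $i$ with probability $p_i$, then output $y = \langle x, w_i \rangle + \eta$) can be sketched in a few lines. This is only an illustration of the generative model, not of the paper's learning algorithm. The function name `sample_mlr` and the choice of standard Gaussian covariates are assumptions made for the example; the abstract does not state how $x$ is distributed.

```python
import random


def sample_mlr(weights, regressors, noise_std, rng):
    """Draw one (x, y) pair from a mixture of linear regressions.

    weights    -- mixing weights p_1..p_k (nonnegative, summing to 1)
    regressors -- list of k regressors w_1..w_k, each a list of d floats
    noise_std  -- standard deviation of the Gaussian regression noise
    """
    d = len(regressors[0])
    # Pick component i with probability p_i.
    i = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    # Assumption for this sketch: covariates are standard Gaussian.
    x = [rng.gauss(0.0, 1.0) for _ in range(d)]
    # y = <x, w_i> + eta, with eta ~ N(0, noise_std^2).
    y = sum(xj * wj for xj, wj in zip(x, regressors[i])) + rng.gauss(0.0, noise_std)
    return x, y, i


rng = random.Random(0)
weights = [0.5, 0.5]
regressors = [[1.0, 0.0], [0.0, -1.0]]
x, y, i = sample_mlr(weights, regressors, 0.0, rng)
```

The component index `i` is returned only so the example can be checked; a learner sees just the `(x, y)` pairs, which is what makes parameter recovery hard.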
Supervised Neural Networks for Helioseismic Ring-Diagram Inversions
Title | Supervised Neural Networks for Helioseismic Ring-Diagram Inversions |
Authors | Rasha Alshehhi, Chris S. Hanson, Laurent Gizon, Shravan Hanasoge |
Abstract | The inversion of ring fit parameters to obtain subsurface flow maps in ring-diagram analysis for 8 years of SDO observations is computationally expensive, requiring ~3200 CPU hours. In this paper we apply machine learning techniques to the inversion in order to speed up calculations. Specifically, we train a predictor for subsurface flows using the mode fit parameters and the previous inversion results, to replace future inversion requirements. We utilize Artificial Neural Networks as a supervised learning method for predicting the flows in 15 degree ring tiles. To demonstrate that the machine learning results still contain the subtle signatures key to local helioseismic studies, we use the machine learning results to study the recently discovered solar equatorial Rossby waves. The Artificial Neural Network is computationally efficient, able to make future flow predictions of an entire Carrington rotation in a matter of seconds, which is much faster than the current ~31 CPU hours. Initial training of the networks requires ~3 CPU hours. The trained Artificial Neural Network can achieve a root mean-square error equal to approximately half that reported for the velocity inversions, demonstrating the accuracy of the machine learning (and perhaps the overestimation of the original errors from the ring-diagram pipeline). We find the signature of equatorial Rossby waves in the machine learning flows covering six years of data, demonstrating that small-amplitude signals are maintained. The recovery of Rossby waves in the machine learning flow maps can be achieved with only one Carrington rotation (27.275 days) of training data. We have shown that machine learning can be applied to, and perform more efficiently than the current ring-diagram inversion. The computation burden of the machine learning includes 3 CPU hours for initial training, then around 0.0001 CPU hours for future predictions. |
Tasks | |
Published | 2019-01-06 |
URL | http://arxiv.org/abs/1901.01505v1 |
http://arxiv.org/pdf/1901.01505v1.pdf | |
PWC | https://paperswithcode.com/paper/supervised-neural-networks-for-helioseismic |
Repo | |
Framework | |
Mask2Lesion: Mask-Constrained Adversarial Skin Lesion Image Synthesis
Title | Mask2Lesion: Mask-Constrained Adversarial Skin Lesion Image Synthesis |
Authors | Kumar Abhishek, Ghassan Hamarneh |
Abstract | Skin lesion segmentation is a vital task in skin cancer diagnosis and further treatment. Although deep learning based approaches have significantly improved the segmentation accuracy, these algorithms are still reliant on having a large enough dataset in order to achieve adequate results. Inspired by the immense success of generative adversarial networks (GANs), we propose a GAN-based augmentation of the original dataset in order to improve the segmentation performance. In particular, we use the segmentation masks available in the training dataset to train the Mask2Lesion model, and use the model to generate new lesion images given any arbitrary mask, which are then used to augment the original training dataset. We test Mask2Lesion augmentation on the ISBI ISIC 2017 Skin Lesion Segmentation Challenge dataset and achieve an improvement of 5.17% in the mean Dice score as compared to a model trained with only classical data augmentation techniques. |
Tasks | Data Augmentation, Image Generation, Image-to-Image Translation, Lesion Segmentation |
Published | 2019-06-13 |
URL | https://arxiv.org/abs/1906.05845v2 |
https://arxiv.org/pdf/1906.05845v2.pdf | |
PWC | https://paperswithcode.com/paper/mask2lesion-mask-constrained-adversarial-skin |
Repo | |
Framework | |
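The Dice score used to report the segmentation improvement above measures overlap between a predicted mask and a ground-truth mask as $2|A \cap B| / (|A| + |B|)$. A minimal sketch on flattened binary masks follows; the function name `dice_score` and the convention for two empty masks are assumptions of this example, not details taken from the paper.

```python
def dice_score(pred, target):
    """Dice coefficient 2|A n B| / (|A| + |B|) for two flat binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    # Pixels marked as lesion in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Convention (an assumption): two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


pred = [1, 1, 0, 0]
target = [1, 0, 0, 0]
score = dice_score(pred, target)  # 2 * 1 / (2 + 1)
```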
Unbiased estimators for the variance of MMD estimators
Title | Unbiased estimators for the variance of MMD estimators |
Authors | Dougal J. Sutherland |
Abstract | The maximum mean discrepancy (MMD) is a kernel-based distance between probability distributions useful in many applications (Gretton et al. 2012), bearing a simple estimator with pleasing computational and statistical properties. Being able to efficiently estimate the variance of this estimator is very helpful to various problems in two-sample testing. Towards this end, Bounliphone et al. (2016) used the theory of U-statistics to derive estimators for the variance of an MMD estimator, and differences between two such estimators. Their estimator, however, drops lower-order terms, and is unnecessarily biased. We show in this note - extending and correcting work of Sutherland et al. (2017) - that we can find a truly unbiased estimator for the actual variance of both the squared MMD estimator and the difference of two correlated squared MMD estimators, at essentially no additional computational cost. |
Tasks | |
Published | 2019-06-05 |
URL | https://arxiv.org/abs/1906.02104v1 |
https://arxiv.org/pdf/1906.02104v1.pdf | |
PWC | https://paperswithcode.com/paper/unbiased-estimators-for-the-variance-of-mmd |
Repo | |
Framework | |
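For context, the quantity whose variance the note above studies is the squared MMD estimate. The sketch below implements the standard unbiased U-statistic form from Gretton et al. (2012) with a Gaussian kernel. It does not implement the variance estimators derived in the note, and the kernel choice, bandwidth, and function names are assumptions of this example.

```python
import math


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel between two points given as lists of floats."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd2_unbiased(xs, ys, kernel=gaussian_kernel):
    """Unbiased estimate of squared MMD between samples xs and ys."""
    m, n = len(xs), len(ys)
    if m < 2 or n < 2:
        raise ValueError("need at least two points in each sample")
    # Within-sample terms exclude the diagonal (i == j) to remove bias.
    kxx = sum(kernel(xs[i], xs[j]) for i in range(m) for j in range(m) if i != j)
    kyy = sum(kernel(ys[i], ys[j]) for i in range(n) for j in range(n) if i != j)
    # The cross term uses every (x, y) pair.
    kxy = sum(kernel(x, y) for x in xs for y in ys)
    return kxx / (m * (m - 1)) + kyy / (n * (n - 1)) - 2.0 * kxy / (m * n)


same = mmd2_unbiased([[0.0], [1.0]], [[0.0], [1.0]])
far = mmd2_unbiased([[0.0], [0.1]], [[10.0], [10.1]])
```

Because the estimator is unbiased rather than constrained to be nonnegative, it can return a negative value on small samples, as it does for `same` here. That behaviour is one reason a reliable variance estimate matters in two-sample testing.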
Deep Heterogeneous Hashing for Face Video Retrieval
Title | Deep Heterogeneous Hashing for Face Video Retrieval |
Authors | Shishi Qiao, Ruiping Wang, Shiguang Shan, Xilin Chen |
Abstract | Retrieving videos of a particular person with a face image as a query via hashing has many important applications. While face images are typically represented as vectors in Euclidean space, characterizing face videos with robust set modeling techniques (e.g. covariance matrices as exploited in this study, which reside on a Riemannian manifold) has recently shown appealing advantages. This results in a thorny heterogeneous spaces matching problem. Moreover, hashing with handcrafted features, as done in many existing works, is clearly inadequate to achieve desirable performance for this task. To address such problems, we present an end-to-end Deep Heterogeneous Hashing (DHH) method that integrates three stages including image feature learning, video modeling, and heterogeneous hashing in a single framework, to learn unified binary codes for both face images and videos. To tackle the key challenge of hashing on the manifold, a well-studied Riemannian kernel mapping is employed to project data (i.e. covariance matrices) into Euclidean space, which makes it possible to embed the two heterogeneous representations into a common Hamming space, where both intra-space discriminability and inter-space compatibility are considered. To perform network optimization, the gradient of the kernel mapping is innovatively derived via structured matrix backpropagation in a theoretically principled way. Experiments on three challenging datasets show that our method achieves quite competitive performance compared with existing hashing methods. |
Tasks | Video Retrieval |
Published | 2019-11-04 |
URL | https://arxiv.org/abs/1911.01048v1 |
https://arxiv.org/pdf/1911.01048v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-heterogeneous-hashing-for-face-video |
Repo | |
Framework | |
The Relationship between the Consistency of Users’ Ratings and Recommendation Calibration
Title | The Relationship between the Consistency of Users’ Ratings and Recommendation Calibration |
Authors | Masoud Mansoury, Himan Abdollahpouri, Joris Rombouts, Mykola Pechenizkiy |
Abstract | Fairness in recommender systems has recently received attention from researchers. Unfair recommendations have a negative impact on the effectiveness of recommender systems, as they may degrade users’ satisfaction and loyalty and, at worst, can lead to or perpetuate undesirable social dynamics. One of the factors that may impact fairness is calibration, the degree to which users’ preferences on various item categories are reflected in the recommendations they receive. The ability of a recommendation algorithm to generate effective recommendations may depend on the meaningfulness of the input data and the amount of information available in users’ profiles. In this paper, we aim to explore the relationship between the consistency of users’ ratings behavior and the degree of calibrated recommendations they receive. We conduct our analysis on different groups of users based on the consistency of their ratings. Our experimental results on a movie dataset and several recommendation algorithms show that there is a positive correlation between the consistency of users’ ratings behavior and the degree of calibration in their recommendations, meaning that user groups with higher inconsistency in their ratings receive less calibrated recommendations. |
Tasks | Calibration, Recommendation Systems |
Published | 2019-11-03 |
URL | https://arxiv.org/abs/1911.00852v1 |
https://arxiv.org/pdf/1911.00852v1.pdf | |
PWC | https://paperswithcode.com/paper/the-relationship-between-the-consistency-of |
Repo | |
Framework | |
Hierarchical Autoregressive Image Models with Auxiliary Decoders
Title | Hierarchical Autoregressive Image Models with Auxiliary Decoders |
Authors | Jeffrey De Fauw, Sander Dieleman, Karen Simonyan |
Abstract | Autoregressive generative models of images tend to be biased towards capturing local structure, and as a result they often produce samples which are lacking in terms of large-scale coherence. To address this, we propose two methods to learn discrete representations of images which abstract away local detail. We show that autoregressive models conditioned on these representations can produce high-fidelity reconstructions of images, and that we can train autoregressive priors on these representations that produce samples with large-scale coherence. We can recursively apply the learning procedure, yielding a hierarchy of progressively more abstract image representations. We train hierarchical class-conditional autoregressive models on the ImageNet dataset and demonstrate that they are able to generate realistic images at resolutions of 128$\times$128 and 256$\times$256 pixels. We also perform a human evaluation study comparing our models with both adversarial and likelihood-based state-of-the-art generative models. |
Tasks | |
Published | 2019-03-06 |
URL | https://arxiv.org/abs/1903.04933v2 |
https://arxiv.org/pdf/1903.04933v2.pdf | |
PWC | https://paperswithcode.com/paper/hierarchical-autoregressive-image-models-with |
Repo | |
Framework | |
Technical Report: A Stratification Approach to Partial Dependence for Codependent Variables
Title | Technical Report: A Stratification Approach to Partial Dependence for Codependent Variables |
Authors | Terence Parr, James D. Wilson |
Abstract | Model interpretability is important to machine learning practitioners, and a key component of interpretation is the characterization of partial dependence of the response variable on any subset of features used in the model. The two most common strategies for assessing partial dependence suffer from a number of critical weaknesses. In the first strategy, linear regression model coefficients describe how a unit change in an explanatory variable changes the response, while holding other variables constant. But linear regression is inapplicable for high dimensional (p>n) data sets and is often insufficient to capture the relationship between explanatory variables and the response. In the second strategy, Partial Dependence (PD) plots and Individual Conditional Expectation (ICE) plots give biased results for the common situation of codependent variables and they rely on fitted models provided by the user. When the supplied model is a poor choice due to systematic bias or overfitting, PD/ICE plots provide little (if any) useful information. To address these issues, we introduce a new strategy, called StratPD, that does not depend on a user’s fitted model, provides accurate results in the presence of codependent variables, and is applicable to high dimensional settings. The strategy works by stratifying a data set into groups of observations that are similar, except in the variable of interest, through the use of a decision tree. Any fluctuations of the response variable within a group are likely due to the variable of interest. We apply StratPD to a collection of simulations and case studies to show that StratPD is a fast, reliable, and robust method for assessing partial dependence with clear advantages over state-of-the-art methods. |
Tasks | |
Published | 2019-07-15 |
URL | https://arxiv.org/abs/1907.06698v2 |
https://arxiv.org/pdf/1907.06698v2.pdf | |
PWC | https://paperswithcode.com/paper/a-stratification-approach-to-partial |
Repo | |
Framework | |
The Design of Global Correlation Quantifiers and Continuous Notions of Statistical Sufficiency
Title | The Design of Global Correlation Quantifiers and Continuous Notions of Statistical Sufficiency |
Authors | Nicholas Carrara, Kevin Vanslette |
Abstract | Using first principles from inference, we design a set of functionals for the purposes of ranking joint probability distributions with respect to their correlations. Starting with a general functional, we impose its desired behaviour through the Principle of Constant Correlations (PCC), which constrains the correlation functional to behave in a consistent way under statistically independent inferential transformations. The PCC guides us in choosing the appropriate design criteria for constructing the desired functionals. Since the derivations depend on a choice of partitioning the variable space into $n$ disjoint subspaces, the general functional we design is the $n$-partite information (NPI), of which the total correlation and mutual information are special cases. Thus, these functionals are found to be uniquely capable of determining whether a certain class of inferential transformations, $\rho\xrightarrow{*}\rho'$, preserve, destroy or create correlations. This provides conceptual clarity by ruling out other possible global correlation quantifiers. Finally, the derivation and results allow us to quantify non-binary notions of statistical sufficiency. Our results express what percentage of the correlations are preserved under a given inferential transformation or variable mapping. |
Tasks | |
Published | 2019-07-10 |
URL | https://arxiv.org/abs/1907.06992v5 |
https://arxiv.org/pdf/1907.06992v5.pdf | |
PWC | https://paperswithcode.com/paper/the-design-of-mutual-information |
Repo | |
Framework | |
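The abstract above names total correlation and mutual information as special cases of the $n$-partite information. As a concrete reference point, the sketch below computes the total correlation of a discrete joint distribution, $\sum_x p(x) \log \frac{p(x)}{\prod_k p_k(x_k)}$, which reduces to mutual information for two variables. The dictionary representation of the joint and the function name `total_correlation` are assumptions of this example; the paper's general construction over arbitrary partitions is not reproduced here.

```python
import math
from collections import defaultdict


def total_correlation(joint):
    """Total correlation (in nats) of a discrete joint distribution.

    joint -- dict mapping outcome tuples (x_1, ..., x_n) to probabilities
    """
    n = len(next(iter(joint)))
    # Marginal distribution of each variable, summed out of the joint.
    marginals = [defaultdict(float) for _ in range(n)]
    for outcome, p in joint.items():
        for k, value in enumerate(outcome):
            marginals[k][value] += p
    tc = 0.0
    for outcome, p in joint.items():
        if p > 0:  # zero-probability outcomes contribute nothing
            product = math.prod(marginals[k][v] for k, v in enumerate(outcome))
            tc += p * math.log(p / product)
    return tc


independent = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
correlated = {(0, 0): 0.5, (1, 1): 0.5}
tc_independent = total_correlation(independent)
tc_correlated = total_correlation(correlated)
```

Two independent fair bits give zero, while two perfectly correlated bits give $\log 2$ nats. The ratio of such values before and after a transformation is the kind of "percentage of correlations preserved" the abstract refers to.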
ncRNA Classification with Graph Convolutional Networks
Title | ncRNA Classification with Graph Convolutional Networks |
Authors | Emanuele Rossi, Federico Monti, Michael Bronstein, Pietro Liò |
Abstract | Non-coding RNA (ncRNA) are RNA sequences which don’t code for a gene but instead carry important biological functions. The task of ncRNA classification consists in classifying a given ncRNA sequence into its family. While it has been shown that the graph structure of an ncRNA sequence folding is of great importance for the prediction of its family, current methods make use of machine learning classifiers on hand-crafted graph features. We improve on the state-of-the-art for this task with a graph convolutional network model which achieves an accuracy of 85.73% and an F1-score of 85.61% over 13 classes. Moreover, our model learns in an end-to-end fashion from the raw RNA graphs and removes the need for expensive feature extraction. To the best of our knowledge, this also represents the first successful application of graph convolutional networks to RNA folding data. |
Tasks | |
Published | 2019-05-16 |
URL | https://arxiv.org/abs/1905.06515v1 |
https://arxiv.org/pdf/1905.06515v1.pdf | |
PWC | https://paperswithcode.com/paper/ncrna-classification-with-graph-convolutional |
Repo | |
Framework | |
PPGAN: Privacy-preserving Generative Adversarial Network
Title | PPGAN: Privacy-preserving Generative Adversarial Network |
Authors | Yi Liu, Jialiang Peng, James J. Q Yu, Yi Wu |
Abstract | Generative Adversarial Network (GAN) and its variants serve as a perfect representation of the data generation model, providing researchers with a large amount of high-quality generated data. They illustrate a promising direction for research with limited data availability. When a GAN learns the semantic-rich data distribution from a dataset, the density of the generated distribution tends to concentrate on the training data. Because the gradient parameters of the deep neural network contain the data distribution of the training samples, they can easily remember the training samples. When a GAN is applied to private or sensitive data, for instance, patient medical records, private information may be leaked. To address this issue, we propose a Privacy-preserving Generative Adversarial Network (PPGAN) model, in which we achieve differential privacy in GANs by adding well-designed noise to the gradient during the model learning procedure. Besides, we introduce the Moments Accountant strategy in the PPGAN training process to improve the stability and compatibility of the model by controlling privacy loss. We also give a mathematical proof of the differential privacy discriminator. Through extensive case studies of the benchmark datasets, we demonstrate that PPGAN can generate high-quality synthetic data while retaining the required data available under a reasonable privacy budget. |
Tasks | |
Published | 2019-10-04 |
URL | https://arxiv.org/abs/1910.02007v1 |
https://arxiv.org/pdf/1910.02007v1.pdf | |
PWC | https://paperswithcode.com/paper/ppgan-privacy-preserving-generative |
Repo | |
Framework | |
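The abstract above says differential privacy is obtained by adding noise to the gradient during training, without giving the exact mechanism. A common way to do this, assumed here and not confirmed as PPGAN's design, is the clip-then-add-Gaussian-noise step of DP-SGD. The function name `privatize_gradient` and its parameters are hypothetical, and privacy accounting (for example with a moments accountant) is left out.

```python
import math
import random


def privatize_gradient(per_example_grads, clip_norm, noise_multiplier, rng):
    """Clip each per-example gradient, sum, add Gaussian noise, then average.

    per_example_grads -- list of gradients, each a list of floats
    clip_norm         -- maximum L2 norm allowed for any single gradient
    noise_multiplier  -- noise std is noise_multiplier * clip_norm
    """
    dim = len(per_example_grads[0])
    summed = [0.0] * dim
    for grad in per_example_grads:
        norm = math.sqrt(sum(v * v for v in grad))
        # Scale down only gradients whose norm exceeds the clipping bound.
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for k, v in enumerate(grad):
            summed[k] += v * scale
    sigma = noise_multiplier * clip_norm
    # Independent Gaussian noise on every coordinate of the summed gradient.
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(per_example_grads) for v in noisy]


rng = random.Random(0)
grads = [[3.0, 4.0], [0.0, 1.0]]
clipped_only = privatize_gradient(grads, clip_norm=1.0, noise_multiplier=0.0, rng=rng)
noisy = privatize_gradient(grads, clip_norm=1.0, noise_multiplier=1.0, rng=rng)
```

Clipping bounds how much any single training record can move the update, which is what lets the added noise translate into a privacy guarantee. With `noise_multiplier=0.0` the function returns the plain clipped average, which is useful for checking the clipping logic in isolation.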