January 29, 2020

Paper Group ANR 739

Prediction of Malignant & Benign Breast Cancer: A Data Mining Approach in Healthcare Applications. Time-Guided High-Order Attention Model of Longitudinal Heterogeneous Healthcare Data. Towards Automatic Bot Detection in Twitter for Health-related Tasks. Long-length Legal Document Classification. 3D Geometric salient patterns analysis on 3D meshes. …

Prediction of Malignant & Benign Breast Cancer: A Data Mining Approach in Healthcare Applications

Title Prediction of Malignant & Benign Breast Cancer: A Data Mining Approach in Healthcare Applications
Authors Vivek Kumar, Brojo Kishore Mishra, Manuel Mazzara, Dang N. H. Thanh, Abhishek Verma
Abstract Data science plays a pivotal role in many fields, and healthcare is a prominent application. Breast cancer is the leading type of cancer among women and alone took 627,000 lives. This high mortality rate calls for early detection so that preventive action can be taken in time. As a potential contributor to state-of-the-art technology development, data mining has many applications in predicting breast cancer. This work focuses on implementing different classification techniques for data mining to predict malignant and benign breast cancer. The Breast Cancer Wisconsin data set from the UCI repository is used as the experimental dataset, with the attribute clump thickness used as the evaluation class. The performance of these twelve algorithms is analyzed on this data set: AdaBoost M1, Decision Table, JRip, Lazy IBK, Logistic Regression, Multiclass Classifier, Multilayer Perceptron, Naive Bayes, Random Forest and Random Tree. Keywords: Data Mining, Classification Techniques, UCI repository, Breast Cancer, Classification Algorithms
Tasks
Published 2019-02-11
URL http://arxiv.org/abs/1902.03825v4
PDF http://arxiv.org/pdf/1902.03825v4.pdf
PWC https://paperswithcode.com/paper/prediction-of-malignant-benign-breast-cancer
Repo
Framework

Time-Guided High-Order Attention Model of Longitudinal Heterogeneous Healthcare Data

Title Time-Guided High-Order Attention Model of Longitudinal Heterogeneous Healthcare Data
Authors Yi Huang, Xiaoshan Yang, Changsheng Xu
Abstract Due to potential applications in chronic disease management and personalized healthcare, the analysis of EHRs data has attracted much attention from both researchers and practitioners. There are three main challenges in modeling longitudinal and heterogeneous EHRs data: heterogeneity, irregular temporality and interpretability. A series of deep learning methods have made remarkable progress in resolving these challenges. Nevertheless, most existing attention models rely on capturing the 1-order temporal dependencies or 2-order multimodal relationships among feature elements. In this paper, we propose a time-guided high-order attention (TGHOA) model. The proposed method has three major advantages. (1) It can model longitudinal heterogeneous EHRs data by capturing the 3-order correlations of different modalities and the irregular temporal impact of historical events. (2) It can be used to identify the potential concerns of medical features to explain the reasoning process of the healthcare model. (3) It can be easily extended to cases with more modalities and flexibly applied in different prediction tasks. We evaluate the proposed method on two tasks, mortality prediction and disease ranking, on two real-world EHRs datasets. Extensive experimental results show the effectiveness of the proposed model.
Tasks Mortality Prediction
Published 2019-11-28
URL https://arxiv.org/abs/1912.00773v1
PDF https://arxiv.org/pdf/1912.00773v1.pdf
PWC https://paperswithcode.com/paper/time-guided-high-order-attention-model-of
Repo
Framework

Towards Automatic Bot Detection in Twitter for Health-related Tasks

Title Towards Automatic Bot Detection in Twitter for Health-related Tasks
Authors Anahita Davoudi, Ari Z. Klein, Abeed Sarker, Graciela Gonzalez-Hernandez
Abstract With the increasing use of social media data for health-related research, the credibility of the information from this source has been questioned as the posts may originate from automated accounts or “bots”. While automatic bot detection approaches have been proposed, there are none that have been evaluated on users posting health-related information. In this paper, we extend an existing bot detection system and customize it for health-related research. Using a dataset of Twitter users, we first show that the system, which was designed for political bot detection, underperforms when applied to health-related Twitter users. We then incorporate additional features and a statistical machine learning classifier to significantly improve bot detection performance. Our approach obtains F_1 scores of 0.7 for the “bot” class, representing improvements of 0.339. Our approach is customizable and generalizable for bot detection in other health-related social media cohorts.
Tasks
Published 2019-09-29
URL https://arxiv.org/abs/1909.13184v1
PDF https://arxiv.org/pdf/1909.13184v1.pdf
PWC https://paperswithcode.com/paper/towards-automatic-bot-detection-in-twitter
Repo
Framework

Long-length Legal Document Classification

Title Long-length Legal Document Classification
Authors Lulu Wan, George Papageorgiou, Michael Seddon, Mirko Bernardoni
Abstract One of the principal tasks of machine learning with major applications is text classification. This paper focuses on the legal domain and, in particular, on the classification of lengthy legal documents. The main challenge that this study addresses is the limitation that current models impose on the length of the input text. In addition, the present paper shows that dividing the text into segments and later combining the resulting embeddings with a BiLSTM architecture to form a single document embedding can improve results. These advancements are achieved by utilising a simpler structure, rather than an increasingly complex one, which is often the case in NLP research. The dataset used in this paper is obtained from an online public database containing lengthy legal documents with highly domain-specific vocabulary and thus, the comparison of our results to the ones produced by models implemented on the commonly used datasets would be unjustified. This work provides the foundation for future work in document classification in the legal field.
Tasks Document Classification, Document Embedding, Text Classification
Published 2019-12-14
URL https://arxiv.org/abs/1912.06905v1
PDF https://arxiv.org/pdf/1912.06905v1.pdf
PWC https://paperswithcode.com/paper/long-length-legal-document-classification
Repo
Framework

3D Geometric salient patterns analysis on 3D meshes

Title 3D Geometric salient patterns analysis on 3D meshes
Authors Alice Othmani, Fakhri Torkhani, Jean-Marie Favreau
Abstract Pattern analysis is a broad domain with wide applicability in many fields. Texture analysis is one of those fields, since a texture is defined as a set of repetitive or quasi-repetitive patterns. Despite its importance in analyzing 3D meshes, geometric texture analysis is less studied by the geometry processing community. This paper presents a new efficient approach for geometric texture analysis on 3D triangular meshes. The proposed method is a scale-aware approach that takes as input a 3D mesh and a user-scale. It provides, as a result, a similarity-based clustering of texels into meaningful classes. Experimental results of the proposed algorithm are presented for both real-world and synthetic meshes with various textures. Furthermore, the efficiency of the proposed approach was experimentally demonstrated under mesh simplification and noise addition on the mesh surface. In this paper, we present a practical application for semantic annotation of 3D geometric salient texels.
Tasks Texture Classification
Published 2019-06-18
URL https://arxiv.org/abs/1906.07645v1
PDF https://arxiv.org/pdf/1906.07645v1.pdf
PWC https://paperswithcode.com/paper/3d-geometric-salient-patterns-analysis-on-3d
Repo
Framework

Sparse $\ell^q$-regularization of inverse problems with deep learning

Title Sparse $\ell^q$-regularization of inverse problems with deep learning
Authors Markus Haltmeier, Linh Nguyen, Daniel Obmann, Johannes Schwab
Abstract We propose a sparse reconstruction framework for solving inverse problems. In contrast to existing sparse reconstruction techniques that are based on linear sparsifying transforms, we train an encoder-decoder network $D \circ E$ with $E$ acting as a nonlinear sparsifying transform. We minimize a Tikhonov functional that uses a learned regularization term formed by the $\ell^q$-norm of the encoder coefficients and a penalty for the distance to the data manifold. For this augmented sparse $\ell^q$-approach, we present a full convergence analysis, derive convergence rates and describe a training strategy. As a main ingredient for the analysis we establish the coercivity of the augmented regularization term.
Tasks
Published 2019-08-08
URL https://arxiv.org/abs/1908.03006v1
PDF https://arxiv.org/pdf/1908.03006v1.pdf
PWC https://paperswithcode.com/paper/sparse-ellq-regularization-of-inverse
Repo
Framework
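
As a rough sketch, the Tikhonov functional described in this abstract could take the following form. The symbols $A$ (forward operator), $y$ (measured data), $\alpha$ (regularization parameter) and $c$ (weight of the manifold penalty) are notation assumed here for illustration; the abstract itself only names the encoder $E$, the decoder $D$ and the $\ell^q$-norm.

$$
\mathcal{T}_{\alpha,y}(x) = \frac{1}{2}\,\|A x - y\|^2 + \alpha \left( \|E(x)\|_q^q + \frac{c}{2}\,\|x - (D \circ E)(x)\|^2 \right)
$$

Under these assumptions, the first term measures data fidelity, the $\ell^q$ term promotes sparse encoder coefficients, and the last term penalizes signals that $D \circ E$ does not reproduce, which is one way to express distance to the data manifold.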

Bandit Multiclass Linear Classification for the Group Linear Separable Case

Title Bandit Multiclass Linear Classification for the Group Linear Separable Case
Authors Jittat Fakcharoenphol, Chayutpong Prompak
Abstract We consider the online multiclass linear classification under the bandit feedback setting. Beygelzimer, Pál, Szörényi, Thiruvenkatachari, Wei, and Zhang [ICML’19] considered two notions of linear separability, weak and strong linear separability. When examples are strongly linearly separable with margin $\gamma$, they presented an algorithm based on Multiclass Perceptron with mistake bound $O(K/\gamma^2)$, where $K$ is the number of classes. They employed rational kernel to deal with examples under the weakly linearly separable condition, and obtained the mistake bound of $\min(K\cdot 2^{\tilde{O}(K\log^2(1/\gamma))},K\cdot 2^{\tilde{O}(\sqrt{1/\gamma}\log K)})$. In this paper, we refine the notion of weak linear separability to support the notion of class grouping, called group weak linear separable condition. This situation may arise from the fact that class structures contain inherent grouping. We show that under this condition, we can also use the rational kernel and obtain the mistake bound of $K\cdot 2^{\tilde{O}(\sqrt{1/\gamma}\log L)}$, where $L\leq K$ represents the number of groups.
Tasks
Published 2019-12-21
URL https://arxiv.org/abs/1912.10340v1
PDF https://arxiv.org/pdf/1912.10340v1.pdf
PWC https://paperswithcode.com/paper/bandit-multiclass-linear-classification-for
Repo
Framework

Removal of Batch Effects using Generative Adversarial Networks

Title Removal of Batch Effects using Generative Adversarial Networks
Authors Uddeshya Upadhyay, Arjun Jain
Abstract Many biological data analysis processes like Cytometry or Next Generation Sequencing (NGS) produce massive amounts of data that need to be processed in batches for downstream analysis. Such datasets are prone to technical variations due to differences in handling the batches, possibly at different times, by different experimenters or under other different conditions. This adds variation to the batches coming from the same source sample. These variations are known as Batch Effects. It is possible that these variations and natural variations due to biology confound, but such situations can be avoided by performing experiments in a carefully planned manner. Batch effects can hamper downstream analysis and may also cause results to be inconclusive. Thus, it is essential to correct for these effects. A novel framework based on Generative Adversarial Networks (GANs) is proposed here for this purpose. The advantage of this framework over prior approaches is that it does not require choosing a reproducing kernel and defining its parameters. Results of the framework on a mass cytometry dataset are reported.
Tasks
Published 2019-01-20
URL https://arxiv.org/abs/1901.06654v3
PDF https://arxiv.org/pdf/1901.06654v3.pdf
PWC https://paperswithcode.com/paper/removal-of-batch-effects-using-generative
Repo
Framework

Evaluating the Utility of Document Embedding Vector Difference for Relation Learning

Title Evaluating the Utility of Document Embedding Vector Difference for Relation Learning
Authors Jingyuan Zhang, Timothy Baldwin
Abstract Recent work has demonstrated that vector offsets obtained by subtracting pretrained word embedding vectors can be used to predict lexical relations with surprising accuracy. Inspired by this finding, in this paper, we extend the idea to the document level, in generating document-level embeddings, calculating the distance between them, and using a linear classifier to classify the relation between the documents. In the context of duplicate detection and dialogue act tagging tasks, we show that document-level difference vectors have utility in assessing document-level similarity, but perform less well in multi-relational classification.
Tasks Document Embedding
Published 2019-07-18
URL https://arxiv.org/abs/1907.08184v1
PDF https://arxiv.org/pdf/1907.08184v1.pdf
PWC https://paperswithcode.com/paper/evaluating-the-utility-of-document-embedding
Repo
Framework
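
The pipeline outlined in this abstract (embed two documents, take their difference, feed it to a linear classifier) can be sketched in a few lines. This is only an illustration under stated assumptions: the embeddings, weights and bias below are made-up toy values, the function names are hypothetical, and the element-wise absolute difference is one possible choice of difference feature rather than the authors' confirmed setup.

```python
def difference_vector(doc_a, doc_b):
    """Element-wise absolute difference of two equal-length embeddings (assumed feature)."""
    if len(doc_a) != len(doc_b):
        raise ValueError("embeddings must have the same dimension")
    return [abs(a - b) for a, b in zip(doc_a, doc_b)]


def linear_score(weights, bias, features):
    """Score of a linear classifier: w . x + b."""
    return sum(w * f for w, f in zip(weights, features)) + bias


def is_duplicate(doc_a, doc_b, weights, bias):
    """Return 1 if the linear classifier labels the pair as duplicates, else 0."""
    features = difference_vector(doc_a, doc_b)
    return 1 if linear_score(weights, bias, features) >= 0 else 0


# Toy embeddings (hypothetical values, not produced by any real model).
doc_a = [0.9, 0.1, 0.3]
doc_b = [0.8, 0.2, 0.3]   # close to doc_a
doc_c = [0.1, 0.9, 0.7]   # far from doc_a

# Hand-picked weights: large differences push the score below zero.
weights = [-1.0, -1.0, -1.0]
bias = 0.5

print(is_duplicate(doc_a, doc_b, weights, bias))
print(is_duplicate(doc_a, doc_c, weights, bias))
```

In practice the weights and bias would be learned from labeled document pairs; they are fixed here only to keep the sketch self-contained.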

Learning From Less Data: A Unified Data Subset Selection and Active Learning Framework for Computer Vision

Title Learning From Less Data: A Unified Data Subset Selection and Active Learning Framework for Computer Vision
Authors Vishal Kaushal, Rishabh Iyer, Suraj Kothawade, Rohan Mahadev, Khoshrav Doctor, Ganesh Ramakrishnan
Abstract Supervised machine learning based state-of-the-art computer vision techniques are in general data hungry. Their data curation poses the challenges of expensive human labeling, inadequate computing resources and larger experiment turnaround times. Training data subset selection and active learning techniques have been proposed as possible solutions to these challenges. A special class of subset selection functions naturally model notions of diversity, coverage and representation and can be used to eliminate redundancy, thus lending themselves well to training data subset selection. They can also help improve the efficiency of active learning by further reducing human labeling efforts, selecting a subset of the examples obtained using the conventional uncertainty sampling based techniques. In this work, we empirically demonstrate the effectiveness of two diversity models, namely the Facility-Location and Dispersion models, for training-data subset selection and reducing labeling effort. We demonstrate this across the board for a variety of computer vision tasks including Gender Recognition, Face Recognition, Scene Recognition, Object Detection and Object Recognition. Our results show that diversity based subset selection done in the right way can increase the accuracy by up to 5-10% over existing baselines, particularly in settings in which less training data is available. This allows the training of complex machine learning models like Convolutional Neural Networks with much less training data and labeling costs while incurring minimal performance loss.
Tasks Active Learning, Face Recognition, Object Detection, Object Recognition, Scene Recognition
Published 2019-01-03
URL http://arxiv.org/abs/1901.01151v1
PDF http://arxiv.org/pdf/1901.01151v1.pdf
PWC https://paperswithcode.com/paper/learning-from-less-data-a-unified-data-subset
Repo
Framework
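
To make the Facility-Location idea concrete, here is a minimal sketch of greedy subset selection. It assumes the common form of the objective, f(S) = sum over all points i of the maximum similarity between i and any selected point, with cosine similarity on non-negative toy features. The function names, the similarity choice and the data are illustrative assumptions, not details taken from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def greedy_facility_location(points, budget):
    """Greedily pick up to `budget` indices that maximize the facility-location objective."""
    n = len(points)
    similarity = [[cosine_similarity(points[i], points[j]) for j in range(n)]
                  for i in range(n)]
    selected = []
    # best_cover[i] = highest similarity between point i and any selected point so far.
    best_cover = [0.0] * n
    for _ in range(min(budget, n)):
        best_gain, best_index = -1.0, None
        for j in range(n):
            if j in selected:
                continue
            # Marginal gain of adding j: total improvement in coverage over all points.
            gain = sum(max(similarity[i][j] - best_cover[i], 0.0) for i in range(n))
            if gain > best_gain:
                best_gain, best_index = gain, j
        selected.append(best_index)
        best_cover = [max(best_cover[i], similarity[i][best_index]) for i in range(n)]
    return selected


# Two tight clusters of toy feature vectors (hypothetical data).
points = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
subset = greedy_facility_location(points, budget=2)
print(subset)
```

With a budget of two, the greedy rule picks one representative from each cluster, because a second point from an already covered cluster adds almost no marginal gain. This is the redundancy-elimination behavior the abstract attributes to diversity models.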

Correspondence Analysis Using Neural Networks

Title Correspondence Analysis Using Neural Networks
Authors Hsiang Hsu, Salman Salamatian, Flavio P. Calmon
Abstract Correspondence analysis (CA) is a multivariate statistical tool used to visualize and interpret data dependencies. CA has found applications in fields ranging from epidemiology to social sciences. However, current methods used to perform CA do not scale to large, high-dimensional datasets. By re-interpreting the objective in CA using an information-theoretic tool called the principal inertia components, we demonstrate that performing CA is equivalent to solving a functional optimization problem over the space of finite variance functions of two random variables. We show that this optimization problem, in turn, can be efficiently approximated by neural networks. The resulting formulation, called the correspondence analysis neural network (CA-NN), enables CA to be performed at an unprecedented scale. We validate the CA-NN on synthetic data, and demonstrate how it can be used to perform CA on a variety of datasets, including food recipes, wine compositions, and images. Our results outperform traditional methods used in CA, indicating that CA-NN can serve as a new, scalable tool for interpretability and visualization of complex dependencies between random variables.
Tasks Epidemiology
Published 2019-02-21
URL http://arxiv.org/abs/1902.07828v1
PDF http://arxiv.org/pdf/1902.07828v1.pdf
PWC https://paperswithcode.com/paper/correspondence-analysis-using-neural-networks
Repo
Framework

Model Selection for Simulator-based Statistical Models: A Kernel Approach

Title Model Selection for Simulator-based Statistical Models: A Kernel Approach
Authors Takafumi Kajihara, Motonobu Kanagawa, Yuuki Nakaguchi, Kanishka Khandelwal, Kenji Fukumizu
Abstract We propose a novel approach to model selection for simulator-based statistical models. The proposed approach defines a mixture of candidate models, and then iteratively updates the weight coefficients for those models as well as the parameters in each model simultaneously; this is done by recursively applying Bayes’ rule, using the recently proposed kernel recursive ABC algorithm. The practical advantage of the method is that it can be used even when a modeler lacks appropriate prior knowledge about the parameters in each model. We demonstrate the effectiveness of the proposed approach with a number of experiments, including model selection for dynamical systems in ecology and epidemiology.
Tasks Epidemiology, Model Selection
Published 2019-02-07
URL http://arxiv.org/abs/1902.02517v1
PDF http://arxiv.org/pdf/1902.02517v1.pdf
PWC https://paperswithcode.com/paper/model-selection-for-simulator-based
Repo
Framework

Ground Truth Simulation for Deep Learning Classification of Mid-Resolution Venus Images Via Unmixing of High-Resolution Hyperspectral Fenix Data

Title Ground Truth Simulation for Deep Learning Classification of Mid-Resolution Venus Images Via Unmixing of High-Resolution Hyperspectral Fenix Data
Authors Ido Faran, Nathan S. Netanyahu, Eli David, Maxim Shoshany, Fadi Kizel, Jisung Geba Chang, Ronit Rud
Abstract Training a deep neural network for classification constitutes a major problem in remote sensing due to the lack of adequate field data. Acquiring high-resolution ground truth (GT) by human interpretation is both cost-ineffective and inconsistent. We propose, instead, to utilize high-resolution, hyperspectral images for solving this problem, by unmixing these images to obtain reliable GT for training a deep network. Specifically, we simulate GT from high-resolution, hyperspectral FENIX images, and use it for training a convolutional neural network (CNN) for pixel-based classification. We show how the model can be transferred successfully to classify new mid-resolution VENuS imagery.
Tasks
Published 2019-11-24
URL https://arxiv.org/abs/1911.10442v1
PDF https://arxiv.org/pdf/1911.10442v1.pdf
PWC https://paperswithcode.com/paper/ground-truth-simulation-for-deep-learning
Repo
Framework

Empirical validation of network learning with taxi GPS data from Wuhan, China

Title Empirical validation of network learning with taxi GPS data from Wuhan, China
Authors Susan Jia Xu, Qian Xie, Joseph Y. J. Chow, Xintao Liu
Abstract In prior research, a statistically cheap method was developed to monitor transportation network performance by using only a few groups of agents without having to forecast the population flows. The current study validates this “multi-agent inverse optimization” method using taxi GPS probe data from the city of Wuhan, China. Using a controlled 2062-link network environment and different GPS data processing algorithms, an online monitoring environment is simulated using the real data over a 4-hour period. Results show that, using only samples from one origin-destination (OD) pair, the multi-agent inverse optimization method can learn network parameters such that forecasted travel times have a 0.23 correlation with the observed travel times. Increasing the monitoring to just two OD pairs improves the correlation further to 0.56.
Tasks
Published 2019-11-09
URL https://arxiv.org/abs/1911.03779v1
PDF https://arxiv.org/pdf/1911.03779v1.pdf
PWC https://paperswithcode.com/paper/empirical-validation-of-network-learning-with
Repo
Framework
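
The correlations quoted in this abstract compare forecasted and observed travel times. The abstract does not say which correlation measure was used; the sketch below assumes the Pearson coefficient, and the travel-time values are invented for illustration only.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / math.sqrt(var_x * var_y)


# Hypothetical link travel times in seconds (not data from the study).
forecasted = [30.0, 45.0, 60.0, 75.0]
observed = [32.0, 47.0, 62.0, 77.0]   # forecast shifted by a constant offset

r = pearson_correlation(forecasted, observed)
print(round(r, 6))
```

A constant offset between forecast and observation leaves the coefficient at 1, which is a reminder that correlation measures how well the forecasts track the observations, not how small the absolute error is.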

Empowering individual trait prediction using interactions

Title Empowering individual trait prediction using interactions
Authors Damian Gola, Inke R. König
Abstract One component of precision medicine is to construct prediction models with their predictive ability as high as possible, e.g. to enable individual risk prediction. In genetic epidemiology, complex diseases have a polygenic basis and a common assumption is that biological and genetic features affect the outcome under consideration via interactions. In the case of omics data, the use of standard approaches such as generalized linear models may be suboptimal and machine learning methods are appealing to make individual predictions. However, most of these algorithms focus mostly on main or marginal effects of the single features in a dataset. On the other hand, the detection of interacting features is an active area of research in the realm of genetic epidemiology. One big class of algorithms to detect interacting features is based on the multifactor dimensionality reduction (MDR). Here, we extend the model-based MDR (MB-MDR), a powerful extension of the original MDR algorithm, to enable interaction empowered individual prediction. Using a comprehensive simulation study we show that our new algorithm can use information hidden in interactions more efficiently than two other state-of-the-art algorithms, namely the Random Forest and Elastic Net, and clearly outperforms these if interactions are present. The performance of these algorithms is comparable if no interactions are present. Further, we show that our new algorithm is applicable to real data by comparing the performance of the three algorithms on a dataset of rheumatoid arthritis cases and healthy controls. As our new algorithm is not only applicable to biological/genetic data but to all datasets with discrete features, it may have practical implications in other applications as well, and we made our method available as an R package.
Tasks Dimensionality Reduction, Epidemiology
Published 2019-01-25
URL http://arxiv.org/abs/1901.08814v1
PDF http://arxiv.org/pdf/1901.08814v1.pdf
PWC https://paperswithcode.com/paper/empowering-individual-trait-prediction-using
Repo
Framework