October 20, 2019

3144 words 15 mins read

Paper Group ANR 44



Reasoning about multiple aspects in DLs: Semantics and Closure Construction

Title Reasoning about multiple aspects in DLs: Semantics and Closure Construction
Authors Laura Giordano, Valentina Gliozzi
Abstract Starting from the observation that rational closure has the undesirable property of being an “all or nothing” mechanism, we here propose a multipreferential semantics, which enriches the preferential semantics underlying rational closure in order to separately deal with the inheritance of different properties in an ontology with exceptions. We provide a multipreference closure mechanism which is sound with respect to the multipreference semantics.
Tasks
Published 2018-01-18
URL http://arxiv.org/abs/1801.07161v1
PDF http://arxiv.org/pdf/1801.07161v1.pdf
PWC https://paperswithcode.com/paper/reasoning-about-multiple-aspects-in-dls
Repo
Framework

TLR: Transfer Latent Representation for Unsupervised Domain Adaptation

Title TLR: Transfer Latent Representation for Unsupervised Domain Adaptation
Authors Pan Xiao, Bo Du, Jia Wu, Lefei Zhang, Ruimin Hu, Xuelong Li
Abstract Domain adaptation refers to the process of learning prediction models in a target domain by making use of data from a source domain. Many classic methods solve the domain adaptation problem by establishing a common latent space, which may cause the loss of many important properties across both domains. In this manuscript, we develop a novel method, transfer latent representation (TLR), to learn a better latent space. Specifically, we design an objective function based on a simple linear autoencoder to derive the latent representations of both domains. The encoder in the autoencoder aims to project the data of both domains into a robust latent space. Besides, the decoder imposes an additional constraint to reconstruct the original data, which can preserve the common properties of both domains and reduce the noise that causes domain shift. Experiments on cross-domain tasks demonstrate the advantages of TLR over competing methods.
Tasks Domain Adaptation, Unsupervised Domain Adaptation
Published 2018-08-19
URL http://arxiv.org/abs/1808.06206v1
PDF http://arxiv.org/pdf/1808.06206v1.pdf
PWC https://paperswithcode.com/paper/tlr-transfer-latent-representation-for
Repo
Framework
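
The building block TLR starts from — a simple linear autoencoder whose encoder projects both domains into one latent space and whose decoder reconstructs the input — can be sketched as follows. This is a minimal illustration with made-up data, not the paper's full objective (which adds transfer terms beyond plain reconstruction):

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up stand-ins for source- and target-domain samples (5 features each).
Xs = rng.normal(size=(40, 5))
Xt = rng.normal(loc=0.5, size=(40, 5))
X = np.vstack([Xs, Xt])            # both domains are encoded jointly
Xc = X - X.mean(axis=0)            # center before fitting the linear model

k = 3                              # latent dimensionality

def recon_loss(W):
    """Tied-weight linear autoencoder: encode with W, decode with W.T."""
    Z = Xc @ W                     # encoder projects both domains into R^k
    return np.mean((Xc - Z @ W.T) ** 2)   # decoder reconstructs the input

# For a tied-weight linear autoencoder the optimal encoder is spanned by the
# top-k right singular vectors of the data, so SVD solves it in closed form.
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
W_opt = Vt[:k].T

W_rand = rng.normal(scale=0.1, size=(5, k))
print(recon_loss(W_rand), recon_loss(W_opt))
```

The reconstruction constraint is what distinguishes this from a plain projection: the decoder forces the latent space to preserve shared structure of both domains rather than discard it.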

PRESISTANT: Learning based assistant for data pre-processing

Title PRESISTANT: Learning based assistant for data pre-processing
Authors Besim Bilalli, Alberto Abelló, Tomàs Aluja-Banet, Robert Wrembel
Abstract Data pre-processing is one of the most time-consuming and consequential steps in a data analysis process (e.g., a classification task). A given pre-processing operator (e.g., a transformation) can have a positive, negative or zero impact on the final result of the analysis. Expert users have the knowledge required to find the right pre-processing operators. Non-experts, however, are overwhelmed by the number of available operators and find it challenging to identify those that would positively impact their analysis (e.g., increase the predictive accuracy of a classifier). Existing solutions either assume that users have expert knowledge, or recommend pre-processing operators that are only “syntactically” applicable to a dataset, without taking into account their impact on the final analysis. In this work, we aim to assist non-expert users by recommending data pre-processing operators ranked according to their impact on the final analysis. We developed a tool, PRESISTANT, that uses Random Forests to learn the impact of pre-processing operators on the performance (e.g., predictive accuracy) of five classification algorithms: J48, Naive Bayes, PART, Logistic Regression, and Nearest Neighbor. Extensive evaluations of the recommendations provided by our tool show that PRESISTANT can effectively help non-experts achieve improved results in their analytical tasks.
Tasks
Published 2018-03-02
URL http://arxiv.org/abs/1803.01024v1
PDF http://arxiv.org/pdf/1803.01024v1.pdf
PWC https://paperswithcode.com/paper/presistant-learning-based-assistant-for-data
Repo
Framework
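
The recommendation step described above can be sketched as follows. The operator names and predicted gains are hypothetical stand-ins; in PRESISTANT itself the gains come from Random Forest models trained on meta-data, not from a fixed table:

```python
# Candidate pre-processing operators and the impact a trained model predicts
# for each on a given dataset/classifier pair (numbers invented for
# illustration; the tool learns these with Random Forests).
predicted_gain = {
    "standardize":     +0.031,   # predicted change in predictive accuracy
    "discretize":      -0.012,
    "impute_mean":     +0.008,
    "log_transform":   +0.000,
    "remove_outliers": -0.004,
}

def recommend(gains, top_n=3):
    """Rank operators by predicted impact and keep only beneficial ones."""
    ranked = sorted(gains.items(), key=lambda kv: kv[1], reverse=True)
    return [(op, g) for op, g in ranked if g > 0][:top_n]

print(recommend(predicted_gain))
```

Ranking by predicted impact, rather than by mere syntactic applicability, is what lets a non-expert skip operators that would hurt the analysis.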

An Algorithmic Framework to Control Bias in Bandit-based Personalization

Title An Algorithmic Framework to Control Bias in Bandit-based Personalization
Authors L. Elisa Celis, Sayash Kapoor, Farnood Salehi, Nisheeth K. Vishnoi
Abstract Personalization is pervasive in the online space as it leads to higher efficiency and revenue by allowing the most relevant content to be served to each user. However, recent studies suggest that personalization methods can propagate societal or systemic biases and polarize opinions; this has led to calls for regulatory mechanisms and algorithms to combat bias and inequality. Algorithmically, bandit optimization has enjoyed great success in learning user preferences and personalizing content or feeds accordingly. We propose an algorithmic framework that allows for the possibility to control bias or discrimination in such bandit-based personalization. Our model allows for the specification of general fairness constraints on the sensitive types of the content that can be displayed to a user. The challenge, however, is to come up with a scalable and low regret algorithm for the constrained optimization problem that arises. Our main technical contribution is a provably fast and low-regret algorithm for the fairness-constrained bandit optimization problem. Our proofs crucially leverage the special structure of our problem. Experiments on synthetic and real-world data sets show that our algorithmic framework can control bias with only a minor loss to revenue.
Tasks
Published 2018-02-23
URL http://arxiv.org/abs/1802.08674v1
PDF http://arxiv.org/pdf/1802.08674v1.pdf
PWC https://paperswithcode.com/paper/an-algorithmic-framework-to-control-bias-in
Repo
Framework
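
The idea of constraining what fraction of displays each sensitive content type receives can be sketched with a toy simulation. This is not the paper's low-regret algorithm; it is a simple epsilon-greedy stand-in with probability floors, and all rates and floors below are invented:

```python
import random
random.seed(1)

# Hypothetical setup: three content types with (unknown) true click rates,
# plus a fairness constraint giving each type a minimum display probability.
true_rate = {"news": 0.6, "sports": 0.3, "opinion": 0.2}
floor     = {"news": 0.0, "sports": 0.2, "opinion": 0.2}   # must sum to < 1
EPS = 0.1                                                  # exploration rate

counts  = {t: 1 for t in true_rate}     # pulls per type (init 1 avoids /0)
rewards = {t: 0.0 for t in true_rate}
shown   = {t: 0 for t in true_rate}

def choose():
    """Spend the constrained probability mass first, then the slack greedily."""
    r, acc = random.random(), 0.0
    for t, f in floor.items():          # fairness floors
        acc += f
        if r < acc:
            return t
    if random.random() < EPS:           # light exploration in the slack mass
        return random.choice(list(true_rate))
    return max(true_rate, key=lambda t: rewards[t] / counts[t])

for _ in range(5000):
    t = choose()
    shown[t] += 1
    counts[t] += 1
    rewards[t] += 1.0 if random.random() < true_rate[t] else 0.0

fractions = {t: shown[t] / 5000 for t in shown}
print(fractions)
```

Even this crude version shows the trade-off the abstract describes: the constrained types keep their guaranteed share of displays, while most of the remaining traffic still flows to the highest-reward content.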

Social Media Analysis based on Semanticity of Streaming and Batch Data

Title Social Media Analysis based on Semanticity of Streaming and Batch Data
Authors Barathi Ganesh HB
Abstract The language people share varies across regions in accent, pronunciation and word usage. Today, language is shared mainly through social media and blogs, and micro posts appear every second, creating a need to process them in order to extract knowledge. What knowledge is extracted depends on the application, with research in cognitive science informing the requirements. This work advances that line of research by extracting semantic information from streaming and batch data in applications such as Named Entity Recognition and Author Profiling. For Named Entity Recognition, the context of a single micro post is used, while the context spread across a pool of micro posts is used to identify the sociolect aspects (gender, age group) of their author. A Conditional Random Field is used for entity recognition, and a novel approach is proposed for identifying the author's sociolect aspects.
Tasks Named Entity Recognition
Published 2018-01-03
URL http://arxiv.org/abs/1801.01102v2
PDF http://arxiv.org/pdf/1801.01102v2.pdf
PWC https://paperswithcode.com/paper/social-media-analysis-based-on-semanticity-of
Repo
Framework

Finding GEMS: Multi-Scale Dictionaries for High-Dimensional Graph Signals

Title Finding GEMS: Multi-Scale Dictionaries for High-Dimensional Graph Signals
Authors Yael Yankelevsky, Michael Elad
Abstract Modern data introduces new challenges to classic signal processing approaches, leading to a growing interest in the field of graph signal processing. A powerful and well established model for real world signals in various domains is sparse representation over a dictionary, combined with the ability to train the dictionary from signal examples. This model has been successfully applied to graph signals as well by integrating the underlying graph topology into the learned dictionary. Nonetheless, dictionary learning methods for graph signals are typically restricted to small dimensions due to the computational constraints that the dictionary learning problem entails, and due to the direct use of the graph Laplacian matrix. In this paper, we propose a dictionary learning algorithm that applies to a broader class of graph signals, and is capable of handling much higher dimensional data. We incorporate the underlying graph topology both implicitly, by forcing the learned dictionary atoms to be sparse combinations of graph-wavelet functions, and explicitly, by adding direct graph constraints to promote smoothness in both the feature and manifold domains. The resulting atoms are thus adapted to the data of interest while adhering to the underlying graph structure and possessing a desired multi-scale property. Experimental results on several datasets, representing both synthetic and real network data of different nature, demonstrate the effectiveness of the proposed algorithm for graph signal processing even in high dimensions.
Tasks Dictionary Learning
Published 2018-06-14
URL http://arxiv.org/abs/1806.05356v1
PDF http://arxiv.org/pdf/1806.05356v1.pdf
PWC https://paperswithcode.com/paper/finding-gems-multi-scale-dictionaries-for
Repo
Framework
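
The sparse-representation model underlying the paper can be illustrated with plain matching pursuit over a small random dictionary. The atoms below are generic random vectors, not the graph-wavelet-based atoms GEMS actually learns:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dictionary: 8 unit-norm atoms in R^6 (stand-ins for the
# structured atoms the paper constrains to be sparse wavelet combinations).
D = rng.normal(size=(6, 8))
D /= np.linalg.norm(D, axis=0)

# A signal that truly is a 2-sparse combination of atoms 1 and 5.
x = 2.0 * D[:, 1] - 1.5 * D[:, 5]

def matching_pursuit(x, D, n_iter=4):
    """Greedy sparse coding: repeatedly subtract the best-matching atom."""
    r, code = x.copy(), np.zeros(D.shape[1])
    for _ in range(n_iter):
        c = D.T @ r                      # correlation with every atom
        j = np.argmax(np.abs(c))         # atom best matching the residual
        code[j] += c[j]
        r -= c[j] * D[:, j]              # peel it off the residual
    return code, r

code, resid = matching_pursuit(x, D)
print(np.linalg.norm(resid))
```

Dictionary learning alternates a sparse-coding step like this with an atom-update step; the paper's contribution lies in constraining the atom updates with the graph structure so the method scales to high dimensions.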

MOBA-Slice: A Time Slice Based Evaluation Framework of Relative Advantage between Teams in MOBA Games

Title MOBA-Slice: A Time Slice Based Evaluation Framework of Relative Advantage between Teams in MOBA Games
Authors Lijun Yu, Dawei Zhang, Xiangqun Chen, Xing Xie
Abstract Multiplayer Online Battle Arena (MOBA) is currently one of the most popular genres of digital games around the world. The domain of knowledge contained in these complicated games is large, and it is hard for humans and algorithms to evaluate the real-time game situation or predict the game result. In this paper, we introduce MOBA-Slice, a time slice based evaluation framework of relative advantage between teams in MOBA games. MOBA-Slice is a quantitative, learning-based evaluation method, similar to the value network of AlphaGo, and it establishes a foundation for further MOBA-related research, including AI development. In MOBA-Slice, based on an analysis of the deciding factors of MOBA game results, we design a neural network model to fit our discounted evaluation function. We then apply MOBA-Slice to Defense of the Ancients 2 (DotA2), a typical and popular MOBA game. Experiments on a large number of match replays show that our model works well on arbitrary matches. MOBA-Slice not only has an accuracy 3.7% higher than DotA Plus Assistant at result prediction, but also supports predicting the remaining game time, thereby realizing the evaluation of relative advantage between teams.
Tasks
Published 2018-07-22
URL http://arxiv.org/abs/1807.08360v1
PDF http://arxiv.org/pdf/1807.08360v1.pdf
PWC https://paperswithcode.com/paper/moba-slice-a-time-slice-based-evaluation
Repo
Framework
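
The notion of a discounted evaluation function can be sketched as follows: each time slice is labeled with the eventual result, weakened for slices far from the game's end. The exponential form and the `tau` constant here are illustrative assumptions, not the paper's actual function:

```python
import math

def discounted_eval(winner_sign, remaining_seconds, tau=600.0):
    """Hypothetical discounted label for a time slice: the eventual result
    (+1 one team wins, -1 the other), discounted by remaining game time so
    that early slices carry a weaker advantage signal."""
    return winner_sign * math.exp(-remaining_seconds / tau)

# A slice at the very end carries the full result; one 20 minutes out, much less.
print(discounted_eval(+1, 0), discounted_eval(+1, 1200))
```

A regression model trained against such targets then doubles as both a result predictor (via the sign) and a remaining-time estimator (via the magnitude), which is the dual use the abstract describes.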

Implementing a Portable Clinical NLP System with a Common Data Model - a Lisp Perspective

Title Implementing a Portable Clinical NLP System with a Common Data Model - a Lisp Perspective
Authors Yuan Luo, Peter Szolovits
Abstract This paper presents a Lisp architecture for a portable NLP system, termed LAPNLP, for processing clinical notes. LAPNLP integrates multiple standard, customized and in-house developed NLP tools. Our system facilitates portability across different institutions and data systems by incorporating an enriched Common Data Model (CDM) to standardize necessary data elements. It utilizes UMLS to perform domain adaptation when integrating generic domain NLP tools. It also features stand-off annotations that are specified by positional reference to the original document. We built an interval tree based search engine to efficiently query and retrieve the stand-off annotations by specifying positional requirements. We also developed a utility to convert an inline annotation format to stand-off annotations to enable the reuse of clinical text datasets with inline annotations. We experimented with our system on several NLP facilitated tasks including computational phenotyping for lymphoma patients and semantic relation extraction for clinical notes. These experiments showcased the broader applicability and utility of LAPNLP.
Tasks Computational Phenotyping, Domain Adaptation, Relation Extraction
Published 2018-11-15
URL http://arxiv.org/abs/1811.06179v1
PDF http://arxiv.org/pdf/1811.06179v1.pdf
PWC https://paperswithcode.com/paper/implementing-a-portable-clinical-nlp-system
Repo
Framework
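
The stand-off annotation scheme described above can be sketched in a few lines. A linear scan stands in for the paper's interval-tree search engine, and the note text and annotation offsets are invented:

```python
# Stand-off annotations reference the original clinical note by character
# offsets instead of marking up the text inline, so the source text stays
# untouched and multiple annotation layers can coexist.
note = "Patient denies fever. CT shows diffuse large B-cell lymphoma."

annotations = [
    {"start": 15, "end": 20, "type": "symptom"},    # "fever"
    {"start": 31, "end": 60, "type": "diagnosis"},  # the lymphoma mention
]

def query(anns, lo, hi):
    """Return annotations overlapping the character span [lo, hi)."""
    return [a for a in anns if a["start"] < hi and a["end"] > lo]

hits = query(annotations, 25, 61)
print([note[a["start"]:a["end"]] for a in hits])
```

An interval tree answers the same overlap query in logarithmic rather than linear time, which matters once a note carries thousands of annotations.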

Using Clinical Narratives and Structured Data to Identify Distant Recurrences in Breast Cancer

Title Using Clinical Narratives and Structured Data to Identify Distant Recurrences in Breast Cancer
Authors Zexian Zeng, Ankita Roy, Xiaoyu Li, Sasa Espino, Susan Clare, Seema Khan, Yuan Luo
Abstract Accurately identifying distant recurrences in breast cancer from the Electronic Health Records (EHR) is important for both clinical care and secondary analysis. Although multiple applications have been developed for computational phenotyping in breast cancer, distant recurrence identification still relies heavily on manual chart review. In this study, we aim to develop a model that identifies distant recurrences in breast cancer using clinical narratives and structured data from EHR. We apply MetaMap to extract features from clinical narratives and also retrieve structured clinical data from EHR. Using these features, we train a support vector machine model to identify distant recurrences in breast cancer patients. We train the model using 1,396 double-annotated subjects and validate the model using 599 double-annotated subjects. In addition, we validate the model on a set of 4,904 single-annotated subjects as a generalization test. We obtained a high area under curve (AUC) score of 0.92 (SD=0.01) in the cross-validation using the training dataset, then obtained AUC scores of 0.95 and 0.93 in the held-out test and generalization test using 599 and 4,904 samples respectively. Our model can accurately and efficiently identify distant recurrences in breast cancer by combining features extracted from unstructured clinical narratives and structured clinical data.
Tasks Computational Phenotyping
Published 2018-06-13
URL http://arxiv.org/abs/1806.04818v2
PDF http://arxiv.org/pdf/1806.04818v2.pdf
PWC https://paperswithcode.com/paper/using-clinical-narratives-and-structured-data
Repo
Framework

3D Terrain Segmentation in the SWIR Spectrum

Title 3D Terrain Segmentation in the SWIR Spectrum
Authors Dalton Rosario, Anthony Ortiz, Olac Fuentes
Abstract We focus on the automatic 3D terrain segmentation problem using hyperspectral shortwave IR (HS-SWIR) imagery and 3D Digital Elevation Models (DEM). The datasets were independently collected, and metadata for the HS-SWIR dataset are unavailable. We exploit the overall slope of the SWIR spectrum, which correlates with the presence of moisture in soil, to propose a band ratio test that serves as a proxy for soil moisture content and distinguishes two broad classes of objects: live vegetation and impermeable manmade surfaces. We show that image-based localization techniques combined with the Optimal Randomized RANdom Sample Consensus (RANSAC) algorithm achieve precise spatial matches between HS-SWIR data of a portion of downtown Los Angeles (LA, USA) and the visible image of a geo-registered 3D DEM covering a wider area of LA. Our spectral-elevation rule-based approach yields an overall accuracy of 97.7%, segmenting the scene into buildings, houses, trees, grass, and roads/parking lots.
Tasks Image-Based Localization
Published 2018-10-27
URL http://arxiv.org/abs/1810.11690v1
PDF http://arxiv.org/pdf/1810.11690v1.pdf
PWC https://paperswithcode.com/paper/3d-terrain-segmentation-in-the-swir-spectrum
Repo
Framework
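
The band ratio test can be sketched as below. The band indices, threshold, and synthetic spectra are illustrative assumptions, not the paper's calibration; the point is only that moisture makes vegetation's SWIR spectrum slope downward while dry manmade surfaces stay flat:

```python
import numpy as np

def classify(spectrum, short_band=2, long_band=9, threshold=0.8):
    """Hypothetical proxy for soil moisture: the ratio of a long- to a
    short-wavelength SWIR band, thresholded to separate the two classes."""
    ratio = spectrum[long_band] / spectrum[short_band]
    return "vegetation" if ratio < threshold else "manmade"

# Synthetic 12-band spectra: vegetation slopes down, asphalt is nearly flat.
vegetation = np.linspace(1.0, 0.4, 12)
asphalt    = np.linspace(0.5, 0.55, 12)

print(classify(vegetation), classify(asphalt))
```

Because the test uses a ratio rather than absolute reflectance, it is insensitive to overall brightness, which helps when, as here, sensor metadata are unavailable.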

Composing photomosaic images using clustering based evolutionary programming

Title Composing photomosaic images using clustering based evolutionary programming
Authors Yaodong He, Jianfeng Zhou, Shiu Yin Yuen
Abstract A photomosaic is an image composed of many tiny images: viewed from a distance, the complete picture emerges clearly, while viewed up close, the individual tiny images replacing blocks of the original image become visible. Many algorithms have been proposed to compose photomosaic images automatically, most of them relying on greedy strategies to match blocks with tiny images. To obtain a better visual result and satisfy commercial requirements, a constraint is usually added that a tiny image should not be reused many times; with this constraint, the photomosaic problem becomes a combinatorial optimization problem. Evolutionary algorithms, which imitate the process of natural selection, are popular and powerful for combinatorial optimization problems, yet little work has applied them to the photomosaic problem. In this paper, we present an algorithm called clustering based evolutionary programming to address it. We supply prior knowledge to the optimization algorithm, which makes the optimization process converge faster. In our experiments, the proposed algorithm is compared with state-of-the-art algorithms and software, and the results indicate that our algorithm performs best.
Tasks Combinatorial Optimization
Published 2018-04-09
URL http://arxiv.org/abs/1804.02827v1
PDF http://arxiv.org/pdf/1804.02827v1.pdf
PWC https://paperswithcode.com/paper/composing-photomosaic-images-using-clustering
Repo
Framework
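
The combinatorial structure of the constrained photomosaic problem can be sketched with a toy mutate-and-select loop. The costs are random stand-ins for block/tile mismatch, and this bare hill climber omits the clustering-based prior knowledge that is the paper's actual contribution:

```python
import random
random.seed(0)

# Toy instance: 6 image blocks, 4 candidate tiles; cost[b][t] is the visual
# mismatch between block b and tile t (made-up numbers). Each tile may be
# used at most twice, which is the constraint that makes the problem hard.
cost = [[random.random() for _ in range(4)] for _ in range(6)]
MAX_USE = 2

def fitness(assign):
    return sum(cost[b][t] for b, t in enumerate(assign))

def legal(assign):
    return all(assign.count(t) <= MAX_USE for t in set(assign))

def mutate(assign):
    child = list(assign)
    child[random.randrange(len(child))] = random.randrange(4)
    return child

# Evolutionary-programming-style loop: mutate, keep the child if it is
# legal and strictly better than the parent.
start = [b % 4 for b in range(6)]         # a trivially legal assignment
best = start
for _ in range(2000):
    child = mutate(best)
    if legal(child) and fitness(child) < fitness(best):
        best = child

print(round(fitness(best), 3))
```

A greedy matcher would simply pick each block's cheapest tile and violate the usage limit; the mutate-and-select loop searches only within the feasible set.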

Online Feature Ranking for Intrusion Detection Systems

Title Online Feature Ranking for Intrusion Detection Systems
Authors Buse Gul Atli, Alexander Jung
Abstract Many current approaches to the design of intrusion detection systems apply feature selection in a static, non-adaptive fashion. These methods often neglect the dynamic nature of network data, which calls for adaptive feature selection techniques. In this paper, we present a simple technique based on incremental learning of support vector machines to rank features in real time within a streaming model for network data. Illustrative numerical experiments with two popular benchmark datasets show that our approach can adapt to changes in normal network behaviour and to novel attack patterns that have not been experienced before.
Tasks Feature Selection, Intrusion Detection
Published 2018-03-01
URL http://arxiv.org/abs/1803.00530v2
PDF http://arxiv.org/pdf/1803.00530v2.pdf
PWC https://paperswithcode.com/paper/online-feature-ranking-for-intrusion
Repo
Framework
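
The streaming feature-ranking idea can be sketched by ranking features by the magnitude of an incrementally updated linear model's weights. A plain online perceptron stands in here for the paper's incremental SVM, and the synthetic stream (where only feature 0 is informative) is invented:

```python
import random
random.seed(0)

n_features = 5
w = [0.0] * n_features                   # incrementally learned weights

def sample():
    """One synthetic network record; the label depends on feature 0 alone."""
    x = [random.uniform(-1, 1) for _ in range(n_features)]
    y = 1 if x[0] > 0 else -1
    return x, y

for _ in range(3000):                    # process the stream one record at a time
    x, y = sample()
    if y * sum(wi * xi for wi, xi in zip(w, x)) <= 0:   # misclassified
        w = [wi + y * xi for wi, xi in zip(w, x)]       # perceptron update

# Rank features by how strongly the online model relies on them.
ranking = sorted(range(n_features), key=lambda i: abs(w[i]), reverse=True)
print(ranking)
```

Because the weights are updated per record rather than refit in batch, the ranking can drift as normal traffic changes or new attack patterns appear, which is the adaptivity the abstract argues for.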

Predicting the time-evolution of multi-physics systems with sequence-to-sequence models

Title Predicting the time-evolution of multi-physics systems with sequence-to-sequence models
Authors K. D. Humbird, J. L. Peterson, R. G. McClarren
Abstract In this work, sequence-to-sequence (seq2seq) models, originally developed for language translation, are used to predict the temporal evolution of complex, multi-physics computer simulations. The predictive performance of seq2seq models is compared to state transition models for datasets generated with multi-physics codes with varying levels of complexity - from simple 1D diffusion calculations to simulations of inertial confinement fusion implosions. Seq2seq models demonstrate the ability to accurately emulate complex systems, enabling the rapid estimation of the evolution of quantities of interest in computationally expensive simulations.
Tasks
Published 2018-11-14
URL http://arxiv.org/abs/1811.05852v1
PDF http://arxiv.org/pdf/1811.05852v1.pdf
PWC https://paperswithcode.com/paper/predicting-the-time-evolution-of-multi
Repo
Framework

Prediction of Electric Multiple Unit Fleet Size Based on Convolutional Neural Network

Title Prediction of Electric Multiple Unit Fleet Size Based on Convolutional Neural Network
Authors Boliang Lin
Abstract With the expansion of the high-speed railway network and the growth of passenger transportation demand, the fleet size of electric multiple units (EMUs) in China needs to be adjusted accordingly. An EMU train generally costs tens of millions of dollars, which constitutes a significant portion of capital investment, so the prediction of EMU fleet size has attracted increasing attention from the associated railway departments. First, this paper introduces a typical convolutional neural network (CNN) architecture and its basic theory. Then, data on nine indices, such as passenger traffic volume and the length of high-speed railways in operation, are collected and preprocessed. Next, a CNN and a backpropagation neural network (BPNN) are constructed and trained to predict EMU fleet size in the following years. The differences and performance of the two networks in computational experiments are analyzed in depth. The results indicate that the CNN is superior to the BPNN in both generalization ability and fitting accuracy, and that it can serve as an aid in EMU fleet size prediction.
Tasks
Published 2018-09-03
URL http://arxiv.org/abs/1809.00491v1
PDF http://arxiv.org/pdf/1809.00491v1.pdf
PWC https://paperswithcode.com/paper/prediction-of-electric-multiple-unit-fleet
Repo
Framework

Natural Language Processing for EHR-Based Computational Phenotyping

Title Natural Language Processing for EHR-Based Computational Phenotyping
Authors Zexian Zeng, Yu Deng, Xiaoyu Li, Tristan Naumann, Yuan Luo
Abstract This article reviews recent advances in applying natural language processing (NLP) to Electronic Health Records (EHRs) for computational phenotyping. NLP-based computational phenotyping has numerous applications including diagnosis categorization, novel phenotype discovery, clinical trial screening, pharmacogenomics, drug-drug interaction (DDI) and adverse drug event (ADE) detection, as well as genome-wide and phenome-wide association studies. Significant progress has been made in algorithm development and resource construction for computational phenotyping. Among the surveyed methods, well-designed keyword search and rule-based systems often achieve good performance. However, the construction of keyword and rule lists requires significant manual effort, which is difficult to scale. Supervised machine learning models have been favored because they are capable of acquiring both classification patterns and structures from data. Recently, deep learning and unsupervised learning have received growing attention, with the former favored for its performance and the latter for its ability to find novel phenotypes. Integrating heterogeneous data sources has become increasingly important and has shown promise in improving model performance; better performance is often achieved by combining multiple modalities of information. Despite these many advances, challenges and opportunities remain for NLP-based computational phenotyping, including better model interpretability and generalizability, and proper characterization of feature relations in clinical narratives.
Tasks Computational Phenotyping
Published 2018-06-13
URL http://arxiv.org/abs/1806.04820v2
PDF http://arxiv.org/pdf/1806.04820v2.pdf
PWC https://paperswithcode.com/paper/natural-language-processing-for-ehr-based
Repo
Framework