Paper Group ANR 44
Reasoning about multiple aspects in DLs: Semantics and Closure Construction. TLR: Transfer Latent Representation for Unsupervised Domain Adaptation. PRESISTANT: Learning based assistant for data pre-processing. An Algorithmic Framework to Control Bias in Bandit-based Personalization. Social Media Analysis based on Semanticity of Streaming and Batch …
Reasoning about multiple aspects in DLs: Semantics and Closure Construction
Title | Reasoning about multiple aspects in DLs: Semantics and Closure Construction |
Authors | Laura Giordano, Valentina Gliozzi |
Abstract | Starting from the observation that rational closure has the undesirable property of being an “all or nothing” mechanism, we here propose a multipreferential semantics, which enriches the preferential semantics underlying rational closure in order to separately deal with the inheritance of different properties in an ontology with exceptions. We provide a multipreference closure mechanism which is sound with respect to the multipreference semantics. |
Tasks | |
Published | 2018-01-18 |
URL | http://arxiv.org/abs/1801.07161v1 |
PDF | http://arxiv.org/pdf/1801.07161v1.pdf |
PWC | https://paperswithcode.com/paper/reasoning-about-multiple-aspects-in-dls |
Repo | |
Framework | |
TLR: Transfer Latent Representation for Unsupervised Domain Adaptation
Title | TLR: Transfer Latent Representation for Unsupervised Domain Adaptation |
Authors | Pan Xiao, Bo Du, Jia Wu, Lefei Zhang, Ruimin Hu, Xuelong Li |
Abstract | Domain adaptation refers to the process of learning prediction models in a target domain by making use of data from a source domain. Many classic methods solve the domain adaptation problem by establishing a common latent space, which may cause the loss of many important properties across both domains. In this manuscript, we develop a novel method, transfer latent representation (TLR), to learn a better latent space. Specifically, we design an objective function based on a simple linear autoencoder to derive the latent representations of both domains. The encoder in the autoencoder aims to project the data of both domains into a robust latent space. In addition, the decoder imposes a constraint to reconstruct the original data, which can preserve the common properties of both domains and reduce the noise that causes domain shift. Experiments on cross-domain tasks demonstrate the advantages of TLR over competing methods. |
Tasks | Domain Adaptation, Unsupervised Domain Adaptation |
Published | 2018-08-19 |
URL | http://arxiv.org/abs/1808.06206v1 |
PDF | http://arxiv.org/pdf/1808.06206v1.pdf |
PWC | https://paperswithcode.com/paper/tlr-transfer-latent-representation-for |
Repo | |
Framework | |
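The shared-autoencoder idea in the abstract above can be illustrated with a minimal sketch: a linear encoder/decoder trained by gradient descent on data from both domains, with a plain reconstruction loss. The toy data, latent dimension, and loss are illustrative assumptions, not the authors' full TLR objective (which adds transfer terms).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy source/target samples (rows); real TLR works on cross-domain features.
Xs = rng.normal(size=(100, 20))
Xt = Xs * 0.9 + rng.normal(scale=0.1, size=(100, 20))  # shifted target domain
X = np.vstack([Xs, Xt])           # both domains feed one shared autoencoder

k = 5                             # latent dimension (assumed)
We = rng.normal(scale=0.1, size=(20, k))   # encoder: projects into latent space
Wd = rng.normal(scale=0.1, size=(k, 20))   # decoder: reconstruction constraint

def loss(We, Wd):
    return np.mean((X @ We @ Wd - X) ** 2)

lr, history = 0.01, [loss(We, Wd)]
for _ in range(300):
    E = X @ We @ Wd - X                    # reconstruction error
    gWd = 2 * (X @ We).T @ E / E.size      # gradient w.r.t. decoder
    gWe = 2 * X.T @ (E @ Wd.T) / E.size    # gradient w.r.t. encoder
    Wd -= lr * gWd
    We -= lr * gWe
    history.append(loss(We, Wd))
```

The reconstruction loss decreases over the iterations, and `X @ We` gives the latent representations of both domains.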
PRESISTANT: Learning based assistant for data pre-processing
Title | PRESISTANT: Learning based assistant for data pre-processing |
Authors | Besim Bilalli, Alberto Abelló, Tomàs Aluja-Banet, Robert Wrembel |
Abstract | Data pre-processing is one of the most time-consuming and relevant steps in a data analysis process (e.g., a classification task). A given data pre-processing operator (e.g., a transformation) can have a positive, negative or zero impact on the final result of the analysis. Expert users have the knowledge required to find the right pre-processing operators. Non-experts, however, are overwhelmed by the number of pre-processing operators, and it is challenging for them to find operators that would positively impact their analysis (e.g., increase the predictive accuracy of a classifier). Existing solutions either assume that users have expert knowledge, or they recommend pre-processing operators that are only “syntactically” applicable to a dataset, without taking into account their impact on the final analysis. In this work, we aim to assist non-expert users by recommending data pre-processing operators ranked according to their impact on the final analysis. We developed a tool, PRESISTANT, that uses Random Forests to learn the impact of pre-processing operators on the performance (e.g., predictive accuracy) of five classification algorithms: J48, Naive Bayes, PART, Logistic Regression, and Nearest Neighbor. Extensive evaluations of the recommendations provided by our tool show that PRESISTANT can effectively help non-experts achieve improved results in their analytical tasks. |
Tasks | |
Published | 2018-03-02 |
URL | http://arxiv.org/abs/1803.01024v1 |
PDF | http://arxiv.org/pdf/1803.01024v1.pdf |
PWC | https://paperswithcode.com/paper/presistant-learning-based-assistant-for-data |
Repo | |
Framework | |
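The meta-learning idea above (learn how operators affect downstream accuracy, then rank them for a new dataset) can be sketched as follows. The meta-features, the synthetic "impact" target, and the assumption that operator 1 is beneficial on average are all made up for illustration; they are not PRESISTANT's actual training setup.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Rows: (dataset meta-features, operator id); target: the change in accuracy
# the operator produced. Here operator 1 is assumed beneficial on average.
meta = rng.normal(size=(500, 4))
op = rng.integers(0, 3, size=(500, 1)).astype(float)
X = np.hstack([meta, op])
impact = 0.05 * (op[:, 0] == 1) + rng.normal(scale=0.01, size=500)

model = RandomForestRegressor(random_state=0).fit(X, impact)

def rank_operators(meta_features, n_ops=3):
    """Score each candidate operator for a new dataset, best first."""
    cand = np.array([[*meta_features, o] for o in range(n_ops)], dtype=float)
    return np.argsort(-model.predict(cand))
```

For a new dataset, `rank_operators` places the operator with the highest predicted impact first.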
An Algorithmic Framework to Control Bias in Bandit-based Personalization
Title | An Algorithmic Framework to Control Bias in Bandit-based Personalization |
Authors | L. Elisa Celis, Sayash Kapoor, Farnood Salehi, Nisheeth K. Vishnoi |
Abstract | Personalization is pervasive in the online space, as it leads to higher efficiency and revenue by allowing the most relevant content to be served to each user. However, recent studies suggest that personalization methods can propagate societal or systemic biases and polarize opinions; this has led to calls for regulatory mechanisms and algorithms to combat bias and inequality. Algorithmically, bandit optimization has enjoyed great success in learning user preferences and personalizing content or feeds accordingly. We propose an algorithmic framework that makes it possible to control bias or discrimination in such bandit-based personalization. Our model allows for the specification of general fairness constraints on the sensitive types of content that can be displayed to a user. The challenge, however, is to devise a scalable, low-regret algorithm for the constrained optimization problem that arises. Our main technical contribution is a provably fast and low-regret algorithm for the fairness-constrained bandit optimization problem. Our proofs crucially leverage the special structure of our problem. Experiments on synthetic and real-world data sets show that our algorithmic framework can control bias with only a minor loss in revenue. |
Tasks | |
Published | 2018-02-23 |
URL | http://arxiv.org/abs/1802.08674v1 |
PDF | http://arxiv.org/pdf/1802.08674v1.pdf |
PWC | https://paperswithcode.com/paper/an-algorithmic-framework-to-control-bias-in |
Repo | |
Framework | |
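A feel for the constrained setting can be given with a simple ε-greedy stand-in: each arm carries a sensitive type, and a per-type cap bounds the fraction of displays that type may receive. The arms, caps, and click probabilities are made up, and this is only an illustration of the constraint, not the paper's provably fast, low-regret algorithm.

```python
import random

random.seed(1)

# Made-up arms: (index, sensitive type, true click probability).
ARMS = [(0, "A", 0.9), (1, "A", 0.8), (2, "B", 0.5), (3, "B", 0.4)]
CAP = {"A": 0.6, "B": 0.6}   # caps sum to > 1, so a feasible arm always exists

counts = {i: 0 for i, _, _ in ARMS}
values = {i: 0.0 for i, _, _ in ARMS}
shown = {"A": 0, "B": 0}
T = 2000

for t in range(1, T + 1):
    # Only arms whose type is still under its display cap are feasible.
    feasible = [a for a in ARMS if shown[a[1]] < CAP[a[1]] * t]
    if random.random() < 0.1:                     # explore
        arm = random.choice(feasible)
    else:                                         # exploit within the constraint
        arm = max(feasible, key=lambda a: values[a[0]])
    i, typ, p = arm
    reward = 1 if random.random() < p else 0
    shown[typ] += 1
    counts[i] += 1
    values[i] += (reward - values[i]) / counts[i]  # running mean estimate
```

Even though type-A arms pay more, the cap keeps type A at no more than roughly 60% of displays.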
Social Media Analysis based on Semanticity of Streaming and Batch Data
Title | Social Media Analysis based on Semanticity of Streaming and Batch Data |
Authors | Barathi Ganesh HB |
Abstract | Languages shared by people differ across regions in accent, pronunciation and word usage. In this era, language is shared mainly through social media and blogs. Micro posts appear every second, creating a need to process them in order to extract knowledge. Knowledge extraction differs with the application, and research in cognitive science informs its requirements. This work advances such research by extracting semantic information from streaming and batch data for applications such as Named Entity Recognition and Author Profiling. For Named Entity Recognition, the context of a single micro post is utilized, while the context that lies in a pool of micro posts is utilized to identify the sociolect aspects of their author. In this work, a Conditional Random Field is used for entity recognition, and a novel approach is proposed to identify the sociolect aspects of the author (gender, age group). |
Tasks | Named Entity Recognition |
Published | 2018-01-03 |
URL | http://arxiv.org/abs/1801.01102v2 |
PDF | http://arxiv.org/pdf/1801.01102v2.pdf |
PWC | https://paperswithcode.com/paper/social-media-analysis-based-on-semanticity-of |
Repo | |
Framework | |
Finding GEMS: Multi-Scale Dictionaries for High-Dimensional Graph Signals
Title | Finding GEMS: Multi-Scale Dictionaries for High-Dimensional Graph Signals |
Authors | Yael Yankelevsky, Michael Elad |
Abstract | Modern data introduces new challenges to classic signal processing approaches, leading to a growing interest in the field of graph signal processing. A powerful and well established model for real world signals in various domains is sparse representation over a dictionary, combined with the ability to train the dictionary from signal examples. This model has been successfully applied to graph signals as well by integrating the underlying graph topology into the learned dictionary. Nonetheless, dictionary learning methods for graph signals are typically restricted to small dimensions due to the computational constraints that the dictionary learning problem entails, and due to the direct use of the graph Laplacian matrix. In this paper, we propose a dictionary learning algorithm that applies to a broader class of graph signals, and is capable of handling much higher dimensional data. We incorporate the underlying graph topology both implicitly, by forcing the learned dictionary atoms to be sparse combinations of graph-wavelet functions, and explicitly, by adding direct graph constraints to promote smoothness in both the feature and manifold domains. The resulting atoms are thus adapted to the data of interest while adhering to the underlying graph structure and possessing a desired multi-scale property. Experimental results on several datasets, representing both synthetic and real network data of different nature, demonstrate the effectiveness of the proposed algorithm for graph signal processing even in high dimensions. |
Tasks | Dictionary Learning |
Published | 2018-06-14 |
URL | http://arxiv.org/abs/1806.05356v1 |
PDF | http://arxiv.org/pdf/1806.05356v1.pdf |
PWC | https://paperswithcode.com/paper/finding-gems-multi-scale-dictionaries-for |
Repo | |
Framework | |
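The sparse-representation model underlying the dictionary learning described above can be sketched with the sparse coding step that sits inside most dictionary-learning loops: orthogonal matching pursuit over a fixed dictionary. The dense random dictionary here is an assumption for illustration; the paper's atoms are sparse combinations of graph-wavelet functions with graph constraints, which this toy example does not model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dictionary with unit-norm atoms, and a 2-sparse synthetic signal.
D = rng.normal(size=(16, 40))
D /= np.linalg.norm(D, axis=0)
x_true = np.zeros(40)
x_true[[3, 17]] = [1.5, -2.0]
y = D @ x_true

def omp(D, y, k):
    """Orthogonal matching pursuit: greedily pick k atoms, refitting each time."""
    resid, support = y.copy(), []
    coef = np.zeros(0)
    for _ in range(k):
        support.append(int(np.argmax(np.abs(D.T @ resid))))  # best-correlated atom
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        resid = y - D[:, support] @ coef                     # project out support
    x = np.zeros(D.shape[1])
    x[support] = coef
    return x

x_hat = omp(D, y, 2)
```

A dictionary-learning algorithm alternates this coding step with an atom-update step; the paper's contribution is in structuring the atoms so this scales to high-dimensional graph signals.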
MOBA-Slice: A Time Slice Based Evaluation Framework of Relative Advantage between Teams in MOBA Games
Title | MOBA-Slice: A Time Slice Based Evaluation Framework of Relative Advantage between Teams in MOBA Games |
Authors | Lijun Yu, Dawei Zhang, Xiangqun Chen, Xing Xie |
Abstract | Multiplayer Online Battle Arena (MOBA) is currently one of the most popular genres of digital games around the world. The domain of knowledge contained in these complicated games is large. It is hard for humans and algorithms to evaluate the real-time game situation or predict the game result. In this paper, we introduce MOBA-Slice, a time slice based evaluation framework of relative advantage between teams in MOBA games. MOBA-Slice is a quantitative evaluation method based on learning, similar to the value network of AlphaGo. It establishes a foundation for further MOBA related research including AI development. In MOBA-Slice, with an analysis of the deciding factors of MOBA game results, we design a neural network model to fit our discounted evaluation function. Then we apply MOBA-Slice to Defense of the Ancients 2 (DotA2), a typical and popular MOBA game. Experiments on a large number of match replays show that our model works well on arbitrary matches. MOBA-Slice not only has an accuracy 3.7% higher than DotA Plus Assistant at result prediction, but also supports the prediction of the remaining time of the game, and then realizes the evaluation of relative advantage between teams. |
Tasks | |
Published | 2018-07-22 |
URL | http://arxiv.org/abs/1807.08360v1 |
PDF | http://arxiv.org/pdf/1807.08360v1.pdf |
PWC | https://paperswithcode.com/paper/moba-slice-a-time-slice-based-evaluation |
Repo | |
Framework | |
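The "discounted evaluation function" mentioned above can be illustrated with a toy version: the final result (+1 for a win, -1 for a loss) discounted by the time remaining, so the signal strengthens as the game approaches its end. The exact functional form and discount factor are assumptions here; only the idea is shown.

```python
import numpy as np

GAMMA = 0.99   # assumed discount factor

def discounted_eval(result, t, t_end):
    """Toy advantage signal at minute t of a game ending at minute t_end."""
    return result * GAMMA ** (t_end - t)

minutes = np.arange(0, 45, 5)
signal = [discounted_eval(+1, t, 45) for t in minutes]
# The paper's neural network regresses a target of this kind from the game
# state at time t, which also yields an estimate of the remaining game time.
```

For a winning team the signal rises monotonically toward 1 as the end of the match nears.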
Implementing a Portable Clinical NLP System with a Common Data Model - a Lisp Perspective
Title | Implementing a Portable Clinical NLP System with a Common Data Model - a Lisp Perspective |
Authors | Yuan Luo, Peter Szolovits |
Abstract | This paper presents a Lisp architecture for a portable NLP system, termed LAPNLP, for processing clinical notes. LAPNLP integrates multiple standard, customized and in-house developed NLP tools. Our system facilitates portability across different institutions and data systems by incorporating an enriched Common Data Model (CDM) to standardize necessary data elements. It utilizes UMLS to perform domain adaptation when integrating generic domain NLP tools. It also features stand-off annotations that are specified by positional reference to the original document. We built an interval tree based search engine to efficiently query and retrieve the stand-off annotations by specifying positional requirements. We also developed a utility to convert an inline annotation format to stand-off annotations to enable the reuse of clinical text datasets with inline annotations. We experimented with our system on several NLP facilitated tasks including computational phenotyping for lymphoma patients and semantic relation extraction for clinical notes. These experiments showcased the broader applicability and utility of LAPNLP. |
Tasks | Computational Phenotyping, Domain Adaptation, Relation Extraction |
Published | 2018-11-15 |
URL | http://arxiv.org/abs/1811.06179v1 |
PDF | http://arxiv.org/pdf/1811.06179v1.pdf |
PWC | https://paperswithcode.com/paper/implementing-a-portable-clinical-nlp-system |
Repo | |
Framework | |
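The stand-off annotation model described above (annotations stored separately from the text, located by positional reference) can be sketched as follows. The example text and labels are invented, and a linear scan stands in for the interval-tree search engine the paper builds for efficient positional queries.

```python
from dataclasses import dataclass

@dataclass
class Annotation:
    start: int    # stand-off: offsets point into the original document text
    end: int
    label: str

text = "Patient shows diffuse large B-cell lymphoma."
anns = [Annotation(0, 7, "PERSON"), Annotation(14, 43, "DIAGNOSIS")]

def query(anns, lo, hi):
    """Return annotations overlapping [lo, hi). A linear scan stands in here
    for the interval-tree index used in the paper."""
    return [a for a in anns if a.start < hi and a.end > lo]

hits = query(anns, 10, 20)
```

Because offsets refer to the unmodified source text, the annotated span can always be recovered by slicing, e.g. `text[14:43]`.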
Using Clinical Narratives and Structured Data to Identify Distant Recurrences in Breast Cancer
Title | Using Clinical Narratives and Structured Data to Identify Distant Recurrences in Breast Cancer |
Authors | Zexian Zeng, Ankita Roy, Xiaoyu Li, Sasa Espino, Susan Clare, Seema Khan, Yuan Luo |
Abstract | Accurately identifying distant recurrences in breast cancer from the Electronic Health Records (EHR) is important for both clinical care and secondary analysis. Although multiple applications have been developed for computational phenotyping in breast cancer, distant recurrence identification still relies heavily on manual chart review. In this study, we aim to develop a model that identifies distant recurrences in breast cancer using clinical narratives and structured data from EHR. We apply MetaMap to extract features from clinical narratives and also retrieve structured clinical data from EHR. Using these features, we train a support vector machine model to identify distant recurrences in breast cancer patients. We train the model using 1,396 double-annotated subjects and validate the model using 599 double-annotated subjects. In addition, we validate the model on a set of 4,904 single-annotated subjects as a generalization test. We obtained a high area under curve (AUC) score of 0.92 (SD=0.01) in the cross-validation using the training dataset, then obtained AUC scores of 0.95 and 0.93 in the held-out test and generalization test using 599 and 4,904 samples respectively. Our model can accurately and efficiently identify distant recurrences in breast cancer by combining features extracted from unstructured clinical narratives and structured clinical data. |
Tasks | Computational Phenotyping |
Published | 2018-06-13 |
URL | http://arxiv.org/abs/1806.04818v2 |
PDF | http://arxiv.org/pdf/1806.04818v2.pdf |
PWC | https://paperswithcode.com/paper/using-clinical-narratives-and-structured-data |
Repo | |
Framework | |
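The modeling step above (an SVM over narrative-derived plus structured features) can be sketched with synthetic data. The binary "narrative" flags stand in for concept features such as MetaMap extractions, the continuous columns for structured EHR fields, and the labels come from a made-up linear rule; none of this is the authors' cohort or feature set.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 400

# Synthetic stand-ins: binary narrative concept flags plus continuous
# structured fields; labels follow an invented linear rule.
narrative = rng.integers(0, 2, size=(n, 10)).astype(float)
structured = rng.normal(size=(n, 5))
X = np.hstack([narrative, structured])
w = rng.normal(size=15)
y = ((X - X.mean(axis=0)) @ w > 0).astype(int)

clf = SVC(kernel="linear").fit(X[:300], y[:300])   # training split
acc = clf.score(X[300:], y[300:])                  # held-out evaluation
```

Concatenating the two feature families into one design matrix is the simplest way to combine modalities, mirroring the paper's use of both narrative and structured inputs.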
3D Terrain Segmentation in the SWIR Spectrum
Title | 3D Terrain Segmentation in the SWIR Spectrum |
Authors | Dalton Rosario, Anthony Ortiz, Olac Fuentes |
Abstract | We focus on the automatic 3D terrain segmentation problem using hyperspectral shortwave IR (HS-SWIR) imagery and 3D Digital Elevation Models (DEM). The datasets were independently collected, and metadata for the HS-SWIR dataset are unavailable. We exploit an overall slope of the SWIR spectrum that correlates with the presence of moisture in soil, proposing a band ratio test as a proxy for soil moisture content that distinguishes two broad classes of objects: live vegetation and impermeable manmade surfaces. We show that image-based localization techniques combined with the Optimal Randomized RANdom Sample Consensus (RANSAC) algorithm achieve precise spatial matches between HS-SWIR data of a portion of downtown Los Angeles (LA, USA) and the visible image of a geo-registered 3D DEM covering a wider area of LA. Our spectral-elevation rule-based approach yields an overall accuracy of 97.7%, segmenting the object classes into buildings, houses, trees, grass, and roads/parking lots. |
Tasks | Image-Based Localization |
Published | 2018-10-27 |
URL | http://arxiv.org/abs/1810.11690v1 |
PDF | http://arxiv.org/pdf/1810.11690v1.pdf |
PWC | https://paperswithcode.com/paper/3d-terrain-segmentation-in-the-swir-spectrum |
Repo | |
Framework | |
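The band ratio test mentioned above reduces to a per-pixel ratio of two spectral bands followed by a threshold. The cube below is random, and the band indices and threshold are placeholders, not the paper's calibrated values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical HS-SWIR cube: rows x cols x bands.
cube = rng.uniform(0.1, 0.9, size=(4, 4, 100))

def band_ratio(cube, b_moist=80, b_ref=10):
    """Per-pixel ratio of a moisture-sensitive band to a reference band."""
    return cube[..., b_moist] / cube[..., b_ref]

ratio = band_ratio(cube)
vegetation_mask = ratio > 1.0     # assumed decision threshold
```

The resulting boolean mask gives a cheap first split of each pixel into the two broad classes, which the elevation rules then refine.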
Composing photomosaic images using clustering based evolutionary programming
Title | Composing photomosaic images using clustering based evolutionary programming |
Authors | Yaodong He, Jianfeng Zhou, Shiu Yin Yuen |
Abstract | Photomosaic images are images composed of many tiny images. The complete picture can be seen clearly when viewed from a distance, while the individual tiny images that replace blocks of the original image become visible up close. Many algorithms have been proposed to compose photomosaic images automatically. Most of them use greedy algorithms to match the blocks with the tiny images. To obtain a better visual result and satisfy some commercial requirements, a constraint that a tiny image should not be reused many times is usually added. With this constraint, the photomosaic problem becomes a combinatorial optimization problem. Evolutionary algorithms, which imitate the process of natural selection, are popular and powerful for combinatorial optimization problems. However, little work has been done on applying evolutionary algorithms to the photomosaic problem. In this paper, we present an algorithm called clustering based evolutionary programming to deal with the problem. We give prior knowledge to the optimization algorithm, which makes the optimization process converge faster. In our experiments, the proposed algorithm is compared with state-of-the-art algorithms and software. The results indicate that our algorithm performs the best. |
Tasks | Combinatorial Optimization |
Published | 2018-04-09 |
URL | http://arxiv.org/abs/1804.02827v1 |
PDF | http://arxiv.org/pdf/1804.02827v1.pdf |
PWC | https://paperswithcode.com/paper/composing-photomosaic-images-using-clustering |
Repo | |
Framework | |
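The constrained matching problem above can be made concrete with a 1-D toy: assign blocks to tiles by intensity distance, with a cap on how often each tile may be reused. The greedy matcher below is the kind of baseline the clustering based evolutionary programming is designed to improve on; intensities and the cap are invented.

```python
import random

random.seed(0)

# Toy stand-in: match image blocks to tiles by mean intensity, with a cap
# on tile reuse (the paper's constraint).
blocks = [random.random() for _ in range(20)]   # block mean intensities
tiles = [i / 9 for i in range(10)]              # tile mean intensities
MAX_USE = 3

use = [0] * len(tiles)
assignment = []
for b in blocks:
    # Greedy: closest tile that has not hit its reuse cap.
    best = min((i for i in range(len(tiles)) if use[i] < MAX_USE),
               key=lambda i: abs(tiles[i] - b))
    use[best] += 1
    assignment.append(best)
```

Because greedy choices can lock good tiles out of later blocks, searching over whole assignments (as an evolutionary algorithm does) can reach lower total distance under the same cap.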
Online Feature Ranking for Intrusion Detection Systems
Title | Online Feature Ranking for Intrusion Detection Systems |
Authors | Buse Gul Atli, Alexander Jung |
Abstract | Many current approaches to the design of intrusion detection systems apply feature selection in a static, non-adaptive fashion. These methods often neglect the dynamic nature of network data, which requires the use of adaptive feature selection techniques. In this paper, we present a simple technique based on incremental learning of support vector machines in order to rank the features in real time within a streaming model for network data. Illustrative numerical experiments with two popular benchmark datasets show that our approach allows adaptation to changes in normal network behaviour and to novel attack patterns that have not been experienced before. |
Tasks | Feature Selection, Intrusion Detection |
Published | 2018-03-01 |
URL | http://arxiv.org/abs/1803.00530v2 |
PDF | http://arxiv.org/pdf/1803.00530v2.pdf |
PWC | https://paperswithcode.com/paper/online-feature-ranking-for-intrusion |
Repo | |
Framework | |
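The technique above (incrementally trained linear SVM, features ranked by weight magnitude) can be sketched with `SGDClassifier` and `partial_fit`. The stream is synthetic, with only features 0 and 1 carrying the "attack" signal; this is an assumed setup, not the paper's benchmark data.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)

# Incremental linear SVM (hinge loss) over a synthetic stream.
clf = SGDClassifier(loss="hinge", random_state=0)
for _ in range(100):                              # mini-batches arriving over time
    X = rng.normal(size=(32, 10))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)       # signal lives in features 0, 1
    clf.partial_fit(X, y, classes=[0, 1])
    ranking = np.argsort(-np.abs(clf.coef_[0]))   # live feature ranking
```

Because the model is refit on every batch, the ranking tracks changes in which features matter as the traffic distribution drifts.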
Predicting the time-evolution of multi-physics systems with sequence-to-sequence models
Title | Predicting the time-evolution of multi-physics systems with sequence-to-sequence models |
Authors | K. D. Humbird, J. L. Peterson, R. G. McClarren |
Abstract | In this work, sequence-to-sequence (seq2seq) models, originally developed for language translation, are used to predict the temporal evolution of complex, multi-physics computer simulations. The predictive performance of seq2seq models is compared to state transition models for datasets generated with multi-physics codes with varying levels of complexity - from simple 1D diffusion calculations to simulations of inertial confinement fusion implosions. Seq2seq models demonstrate the ability to accurately emulate complex systems, enabling the rapid estimation of the evolution of quantities of interest in computationally expensive simulations. |
Tasks | |
Published | 2018-11-14 |
URL | http://arxiv.org/abs/1811.05852v1 |
PDF | http://arxiv.org/pdf/1811.05852v1.pdf |
PWC | https://paperswithcode.com/paper/predicting-the-time-evolution-of-multi |
Repo | |
Framework | |
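A sense of the benchmark can be given with the simplest case mentioned above, 1D diffusion, rolled out as a sequence, together with a learned linear state-transition model of the kind the paper compares seq2seq models against. The grid, step count, and diffusion coefficient are illustrative.

```python
import numpy as np

# 1D periodic diffusion rollout: each state is a 32-point temperature profile.
def diffusion_step(u, alpha=0.1):
    return u + alpha * (np.roll(u, 1) - 2 * u + np.roll(u, -1))

x = np.linspace(-3, 3, 32)
traj = [np.exp(-x ** 2)]          # initial Gaussian profile
for _ in range(50):
    traj.append(diffusion_step(traj[-1]))
traj = np.array(traj)             # (51, 32) sequence of simulation states

# State-transition baseline: fit u_{t+1} ~ u_t @ A by least squares,
# then roll the learned model forward from the midpoint.
A, *_ = np.linalg.lstsq(traj[:-1], traj[1:], rcond=None)
pred = traj[25]
for _ in range(25):
    pred = pred @ A
```

A seq2seq emulator instead consumes a window of past states and emits a window of future states, which is what lets it handle the more complex, non-Markovian multi-physics cases.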
Prediction of Electric Multiple Unit Fleet Size Based on Convolutional Neural Network
Title | Prediction of Electric Multiple Unit Fleet Size Based on Convolutional Neural Network |
Authors | Boliang Lin |
Abstract | With the expansion of the high-speed railway network and the growth of passenger transportation demand, the fleet size of electric multiple units (EMU) in China needs to be adjusted accordingly. Generally, an EMU train costs tens of millions of dollars, which constitutes a significant portion of capital investment. Thus, the prediction of EMU fleet size has attracted increasing attention from the associated railway departments. First, this paper introduces a typical architecture of convolutional neural networks (CNN) and its basic theory. Then, data on nine indices, such as passenger traffic volume and the length of high-speed railways in operation, are collected and preprocessed. Next, a CNN and a backpropagation neural network (BPNN) are constructed and trained to predict the EMU fleet size in the following years. The differences and performances of these two networks in computational experiments are analyzed in depth. The results indicate that the CNN is superior to the BPNN in both generalization ability and fitting accuracy, and that the CNN can serve as an aid in EMU fleet size prediction. |
Tasks | |
Published | 2018-09-03 |
URL | http://arxiv.org/abs/1809.00491v1 |
PDF | http://arxiv.org/pdf/1809.00491v1.pdf |
PWC | https://paperswithcode.com/paper/prediction-of-electric-multiple-unit-fleet |
Repo | |
Framework | |
Natural Language Processing for EHR-Based Computational Phenotyping
Title | Natural Language Processing for EHR-Based Computational Phenotyping |
Authors | Zexian Zeng, Yu Deng, Xiaoyu Li, Tristan Naumann, Yuan Luo |
Abstract | This article reviews recent advances in applying natural language processing (NLP) to Electronic Health Records (EHRs) for computational phenotyping. NLP-based computational phenotyping has numerous applications including diagnosis categorization, novel phenotype discovery, clinical trial screening, pharmacogenomics, drug-drug interaction (DDI) and adverse drug event (ADE) detection, as well as genome-wide and phenome-wide association studies. Significant progress has been made in algorithm development and resource construction for computational phenotyping. Among the surveyed methods, well-designed keyword search and rule-based systems often achieve good performance. However, the construction of keyword and rule lists requires significant manual effort, which is difficult to scale. Supervised machine learning models have been favored because they are capable of acquiring both classification patterns and structures from data. Recently, deep learning and unsupervised learning have received growing attention, with the former favored for its performance and the latter for its ability to find novel phenotypes. Integrating heterogeneous data sources has become increasingly important and has shown promise in improving model performance. Often, better performance is achieved by combining multiple modalities of information. Despite these many advances, challenges and opportunities remain for NLP-based computational phenotyping, including better model interpretability and generalizability, and proper characterization of feature relations in clinical narratives. |
Tasks | Computational Phenotyping |
Published | 2018-06-13 |
URL | http://arxiv.org/abs/1806.04820v2 |
PDF | http://arxiv.org/pdf/1806.04820v2.pdf |
PWC | https://paperswithcode.com/paper/natural-language-processing-for-ehr-based |
Repo | |
Framework | |