Paper Group ANR 44
Reasoning about multiple aspects in DLs: Semantics and Closure Construction. TLR: Transfer Latent Representation for Unsupervised Domain Adaptation. PRESISTANT: Learning based assistant for data pre-processing. An Algorithmic Framework to Control Bias in Bandit-based Personalization. Social Media Analysis based on Semanticity of Streaming and Batch …
Reasoning about multiple aspects in DLs: Semantics and Closure Construction
Title | Reasoning about multiple aspects in DLs: Semantics and Closure Construction |
Authors | Laura Giordano, Valentina Gliozzi |
Abstract | Starting from the observation that rational closure has the undesirable property of being an “all or nothing” mechanism, we here propose a multipreferential semantics, which enriches the preferential semantics underlying rational closure in order to separately deal with the inheritance of different properties in an ontology with exceptions. We provide a multipreference closure mechanism which is sound with respect to the multipreference semantics. |
Tasks | |
Published | 2018-01-18 |
URL | http://arxiv.org/abs/1801.07161v1 |
PDF | http://arxiv.org/pdf/1801.07161v1.pdf |
PWC | https://paperswithcode.com/paper/reasoning-about-multiple-aspects-in-dls |
Repo | |
Framework | |
TLR: Transfer Latent Representation for Unsupervised Domain Adaptation
Title | TLR: Transfer Latent Representation for Unsupervised Domain Adaptation |
Authors | Pan Xiao, Bo Du, Jia Wu, Lefei Zhang, Ruimin Hu, Xuelong Li |
Abstract | Domain adaptation refers to the process of learning prediction models in a target domain by making use of data from a source domain. Many classic methods solve the domain adaptation problem by establishing a common latent space, which may cause the loss of many important properties across both domains. In this manuscript, we develop a novel method, transfer latent representation (TLR), to learn a better latent space. Specifically, we design an objective function based on a simple linear autoencoder to derive the latent representations of both domains. The encoder in the autoencoder aims to project the data of both domains into a robust latent space. In addition, the decoder imposes a constraint to reconstruct the original data, which can preserve the common properties of both domains and reduce the noise that causes domain shift. Experiments on cross-domain tasks demonstrate the advantages of TLR over competing methods. |
Tasks | Domain Adaptation, Unsupervised Domain Adaptation |
Published | 2018-08-19 |
URL | http://arxiv.org/abs/1808.06206v1 |
PDF | http://arxiv.org/pdf/1808.06206v1.pdf |
PWC | https://paperswithcode.com/paper/tlr-transfer-latent-representation-for |
Repo | |
Framework | |
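The shared-autoencoder idea in the abstract above can be illustrated with a minimal sketch: a linear encoder/decoder trained by gradient descent on data from both domains, with a plain reconstruction loss. The toy data, latent dimension, and loss are illustrative assumptions, not the authors' full TLR objective (which adds transfer terms).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy source/target samples (rows); real TLR works on cross-domain features.
Xs = rng.normal(size=(100, 20))
Xt = Xs * 0.9 + rng.normal(scale=0.1, size=(100, 20))  # shifted target domain
X = np.vstack([Xs, Xt])           # both domains feed one shared autoencoder

k = 5                             # latent dimension (assumed)
We = rng.normal(scale=0.1, size=(20, k))   # encoder: projects into latent space
Wd = rng.normal(scale=0.1, size=(k, 20))   # decoder: reconstruction constraint

def loss(We, Wd):
    return np.mean((X @ We @ Wd - X) ** 2)

lr, history = 0.01, [loss(We, Wd)]
for _ in range(300):
    E = X @ We @ Wd - X                    # reconstruction error
    gWd = 2 * (X @ We).T @ E / E.size      # gradient w.r.t. decoder
    gWe = 2 * X.T @ (E @ Wd.T) / E.size    # gradient w.r.t. encoder
    Wd -= lr * gWd
    We -= lr * gWe
    history.append(loss(We, Wd))
```

The reconstruction loss decreases over the iterations, and `X @ We` gives the latent representations of both domains.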
PRESISTANT: Learning based assistant for data pre-processing
Title | PRESISTANT: Learning based assistant for data pre-processing |
Authors | Besim Bilalli, Alberto Abelló, Tomàs Aluja-Banet, Robert Wrembel |
Abstract | Data pre-processing is one of the most time-consuming and relevant steps in a data analysis process (e.g., a classification task). A given data pre-processing operator (e.g., a transformation) can have a positive, negative or zero impact on the final result of the analysis. Expert users have the knowledge required to find the right pre-processing operators. Non-experts, however, are overwhelmed by the number of pre-processing operators, and it is challenging for them to find operators that would positively impact their analysis (e.g., increase the predictive accuracy of a classifier). Existing solutions either assume that users have expert knowledge, or they recommend pre-processing operators that are only “syntactically” applicable to a dataset, without taking into account their impact on the final analysis. In this work, we aim to assist non-expert users by recommending data pre-processing operators ranked according to their impact on the final analysis. We developed a tool, PRESISTANT, that uses Random Forests to learn the impact of pre-processing operators on the performance (e.g., predictive accuracy) of five classification algorithms: J48, Naive Bayes, PART, Logistic Regression, and Nearest Neighbor. Extensive evaluations of the recommendations provided by our tool show that PRESISTANT can effectively help non-experts achieve improved results in their analytical tasks. |
Tasks | |
Published | 2018-03-02 |
URL | http://arxiv.org/abs/1803.01024v1 |
PDF | http://arxiv.org/pdf/1803.01024v1.pdf |
PWC | https://paperswithcode.com/paper/presistant-learning-based-assistant-for-data |
Repo | |
Framework | |
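The meta-learning idea above (learn how operators affect downstream accuracy, then rank them for a new dataset) can be sketched as follows. The meta-features, the synthetic "impact" target, and the assumption that operator 1 is beneficial on average are all made up for illustration; they are not PRESISTANT's actual training setup.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Rows: (dataset meta-features, operator id); target: the change in accuracy
# the operator produced. Here operator 1 is assumed beneficial on average.
meta = rng.normal(size=(500, 4))
op = rng.integers(0, 3, size=(500, 1)).astype(float)
X = np.hstack([meta, op])
impact = 0.05 * (op[:, 0] == 1) + rng.normal(scale=0.01, size=500)

model = RandomForestRegressor(random_state=0).fit(X, impact)

def rank_operators(meta_features, n_ops=3):
    """Score each candidate operator for a new dataset, best first."""
    cand = np.array([[*meta_features, o] for o in range(n_ops)], dtype=float)
    return np.argsort(-model.predict(cand))
```

For a new dataset, `rank_operators` places the operator with the highest predicted impact first.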
An Algorithmic Framework to Control Bias in Bandit-based Personalization
Title | An Algorithmic Framework to Control Bias in Bandit-based Personalization |
Authors | L. Elisa Celis, Sayash Kapoor, Farnood Salehi, Nisheeth K. Vishnoi |
Abstract | Personalization is pervasive in the online space, as it leads to higher efficiency and revenue by allowing the most relevant content to be served to each user. However, recent studies suggest that personalization methods can propagate societal or systemic biases and polarize opinions; this has led to calls for regulatory mechanisms and algorithms to combat bias and inequality. Algorithmically, bandit optimization has enjoyed great success in learning user preferences and personalizing content or feeds accordingly. We propose an algorithmic framework that makes it possible to control bias or discrimination in such bandit-based personalization. Our model allows for the specification of general fairness constraints on the sensitive types of content that can be displayed to a user. The challenge, however, is to devise a scalable, low-regret algorithm for the constrained optimization problem that arises. Our main technical contribution is a provably fast and low-regret algorithm for the fairness-constrained bandit optimization problem. Our proofs crucially leverage the special structure of our problem. Experiments on synthetic and real-world data sets show that our algorithmic framework can control bias with only a minor loss in revenue. |
Tasks | |
Published | 2018-02-23 |
URL | http://arxiv.org/abs/1802.08674v1 |
PDF | http://arxiv.org/pdf/1802.08674v1.pdf |
PWC | https://paperswithcode.com/paper/an-algorithmic-framework-to-control-bias-in |
Repo | |
Framework | |
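A feel for the constrained setting can be given with a simple ε-greedy stand-in: each arm carries a sensitive type, and a per-type cap bounds the fraction of displays that type may receive. The arms, caps, and click probabilities are made up, and this is only an illustration of the constraint, not the paper's provably fast, low-regret algorithm.

```python
import random

random.seed(1)

# Made-up arms: (index, sensitive type, true click probability).
ARMS = [(0, "A", 0.9), (1, "A", 0.8), (2, "B", 0.5), (3, "B", 0.4)]
CAP = {"A": 0.6, "B": 0.6}   # caps sum to > 1, so a feasible arm always exists

counts = {i: 0 for i, _, _ in ARMS}
values = {i: 0.0 for i, _, _ in ARMS}
shown = {"A": 0, "B": 0}
T = 2000

for t in range(1, T + 1):
    # Only arms whose type is still under its display cap are feasible.
    feasible = [a for a in ARMS if shown[a[1]] < CAP[a[1]] * t]
    if random.random() < 0.1:                     # explore
        arm = random.choice(feasible)
    else:                                         # exploit within the constraint
        arm = max(feasible, key=lambda a: values[a[0]])
    i, typ, p = arm
    reward = 1 if random.random() < p else 0
    shown[typ] += 1
    counts[i] += 1
    values[i] += (reward - values[i]) / counts[i]  # running mean estimate
```

Even though type-A arms pay more, the cap keeps type A at no more than roughly 60% of displays.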
Social Media Analysis based on Semanticity of Streaming and Batch Data
Title | Social Media Analysis based on Semanticity of Streaming and Batch Data |
Authors | Barathi Ganesh HB |
Abstract | Languages shared by people differ across regions in accent, pronunciation and word usage. In this era, language is shared mainly through social media and blogs. Micro posts appear every second, creating a need to process them in order to extract knowledge. Knowledge extraction differs with the application, and research in cognitive science informs its requirements. This work advances such research by extracting semantic information from streaming and batch data for applications such as Named Entity Recognition and Author Profiling. For Named Entity Recognition, the context of a single micro post is utilized, while the context that lies in a pool of micro posts is utilized to identify the sociolect aspects of their author. In this work, a Conditional Random Field is used for entity recognition, and a novel approach is proposed to identify the sociolect aspects of the author (gender, age group). |
Tasks | Named Entity Recognition |
Published | 2018-01-03 |
URL | http://arxiv.org/abs/1801.01102v2 |
PDF | http://arxiv.org/pdf/1801.01102v2.pdf |
PWC | https://paperswithcode.com/paper/social-media-analysis-based-on-semanticity-of |
Repo | |
Framework | |
Finding GEMS: Multi-Scale Dictionaries for High-Dimensional Graph Signals
Title | Finding GEMS: Multi-Scale Dictionaries for High-Dimensional Graph Signals |
Authors | Yael Yankelevsky, Michael Elad |
Abstract | Modern data introduces new challenges to classic signal processing approaches, leading to a growing interest in the field of graph signal processing. A powerful and well established model for real world signals in various domains is sparse representation over a dictionary, combined with the ability to train the dictionary from signal examples. This model has been successfully applied to graph signals as well by integrating the underlying graph topology into the learned dictionary. Nonetheless, dictionary learning methods for graph signals are typically restricted to small dimensions due to the computational constraints that the dictionary learning problem entails, and due to the direct use of the graph Laplacian matrix. In this paper, we propose a dictionary learning algorithm that applies to a broader class of graph signals, and is capable of handling much higher dimensional data. We incorporate the underlying graph topology both implicitly, by forcing the learned dictionary atoms to be sparse combinations of graph-wavelet functions, and explicitly, by adding direct graph constraints to promote smoothness in both the feature and manifold domains. The resulting atoms are thus adapted to the data of interest while adhering to the underlying graph structure and possessing a desired multi-scale property. Experimental results on several datasets, representing both synthetic and real network data of different nature, demonstrate the effectiveness of the proposed algorithm for graph signal processing even in high dimensions. |
Tasks | Dictionary Learning |
Published | 2018-06-14 |
URL | http://arxiv.org/abs/1806.05356v1 |
PDF | http://arxiv.org/pdf/1806.05356v1.pdf |
PWC | https://paperswithcode.com/paper/finding-gems-multi-scale-dictionaries-for |
Repo | |
Framework | |
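The sparse-representation model underlying the dictionary learning described above can be sketched with the sparse coding step that sits inside most dictionary-learning loops: orthogonal matching pursuit over a fixed dictionary. The dense random dictionary here is an assumption for illustration; the paper's atoms are sparse combinations of graph-wavelet functions with graph constraints, which this toy example does not model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dictionary with unit-norm atoms, and a 2-sparse synthetic signal.
D = rng.normal(size=(16, 40))
D /= np.linalg.norm(D, axis=0)
x_true = np.zeros(40)
x_true[[3, 17]] = [1.5, -2.0]
y = D @ x_true

def omp(D, y, k):
    """Orthogonal matching pursuit: greedily pick k atoms, refitting each time."""
    resid, support = y.copy(), []
    coef = np.zeros(0)
    for _ in range(k):
        support.append(int(np.argmax(np.abs(D.T @ resid))))  # best-correlated atom
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        resid = y - D[:, support] @ coef                     # project out support
    x = np.zeros(D.shape[1])
    x[support] = coef
    return x

x_hat = omp(D, y, 2)
```

A dictionary-learning algorithm alternates this coding step with an atom-update step; the paper's contribution is in structuring the atoms so this scales to high-dimensional graph signals.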
MOBA-Slice: A Time Slice Based Evaluation Framework of Relative Advantage between Teams in MOBA Games
Title | MOBA-Slice: A Time Slice Based Evaluation Framework of Relative Advantage between Teams in MOBA Games |
Authors | Lijun Yu, Dawei Zhang, Xiangqun Chen, Xing Xie |
Abstract | Multiplayer Online Battle Arena (MOBA) is currently one of the most popular genres of digital games around the world. The domain of knowledge contained in these complicated games is large. It is hard for humans and algorithms to evaluate the real-time game situation or predict the game result. In this paper, we introduce MOBA-Slice, a time slice based evaluation framework of relative advantage between teams in MOBA games. MOBA-Slice is a quantitative evaluation method based on learning, similar to the value network of AlphaGo. It establishes a foundation for further MOBA related research including AI development. In MOBA-Slice, with an analysis of the deciding factors of MOBA game results, we design a neural network model to fit our discounted evaluation function. Then we apply MOBA-Slice to Defense of the Ancients 2 (DotA2), a typical and popular MOBA game. Experiments on a large number of match replays show that our model works well on arbitrary matches. MOBA-Slice not only has an accuracy 3.7% higher than DotA Plus Assistant at result prediction, but also supports the prediction of the remaining time of the game, and then realizes the evaluation of relative advantage between teams. |
Tasks | |
Published | 2018-07-22 |
URL | http://arxiv.org/abs/1807.08360v1 |
PDF | http://arxiv.org/pdf/1807.08360v1.pdf |
PWC | https://paperswithcode.com/paper/moba-slice-a-time-slice-based-evaluation |
Repo | |
Framework | |
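The "discounted evaluation function" mentioned above can be illustrated with a toy version: the final result (+1 for a win, -1 for a loss) discounted by the time remaining, so the signal strengthens as the game approaches its end. The exact functional form and discount factor are assumptions here; only the idea is shown.

```python
import numpy as np

GAMMA = 0.99   # assumed discount factor

def discounted_eval(result, t, t_end):
    """Toy advantage signal at minute t of a game ending at minute t_end."""
    return result * GAMMA ** (t_end - t)

minutes = np.arange(0, 45, 5)
signal = [discounted_eval(+1, t, 45) for t in minutes]
# The paper's neural network regresses a target of this kind from the game
# state at time t, which also yields an estimate of the remaining game time.
```

For a winning team the signal rises monotonically toward 1 as the end of the match nears.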
Implementing a Portable Clinical NLP System with a Common Data Model - a Lisp Perspective
Title | Implementing a Portable Clinical NLP System with a Common Data Model - a Lisp Perspective |
Authors | Yuan Luo, Peter Szolovits |
Abstract | This paper presents a Lisp architecture for a portable NLP system, termed LAPNLP, for processing clinical notes. LAPNLP integrates multiple standard, customized and in-house developed NLP tools. Our system facilitates portability across different institutions and data systems by incorporating an enriched Common Data Model (CDM) to standardize necessary data elements. It utilizes UMLS to perform domain adaptation when integrating generic domain NLP tools. It also features stand-off annotations that are specified by positional reference to the original document. We built an interval tree based search engine to efficiently query and retrieve the stand-off annotations by specifying positional requirements. We also developed a utility to convert an inline annotation format to stand-off annotations to enable the reuse of clinical text datasets with inline annotations. We experimented with our system on several NLP facilitated tasks including computational phenotyping for lymphoma patients and semantic relation extraction for clinical notes. These experiments showcased the broader applicability and utility of LAPNLP. |
Tasks | Computational Phenotyping, Domain Adaptation, Relation Extraction |
Published | 2018-11-15 |
URL | http://arxiv.org/abs/1811.06179v1 |
PDF | http://arxiv.org/pdf/1811.06179v1.pdf |
PWC | https://paperswithcode.com/paper/implementing-a-portable-clinical-nlp-system |
Repo | |
Framework | |
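The stand-off annotation model described above (annotations stored separately from the text, located by positional reference) can be sketched as follows. The example text and labels are invented, and a linear scan stands in for the interval-tree search engine the paper builds for efficient positional queries.

```python
from dataclasses import dataclass

@dataclass
class Annotation:
    start: int    # stand-off: offsets point into the original document text
    end: int
    label: str

text = "Patient shows diffuse large B-cell lymphoma."
anns = [Annotation(0, 7, "PERSON"), Annotation(14, 43, "DIAGNOSIS")]

def query(anns, lo, hi):
    """Return annotations overlapping [lo, hi). A linear scan stands in here
    for the interval-tree index used in the paper."""
    return [a for a in anns if a.start < hi and a.end > lo]

hits = query(anns, 10, 20)
```

Because offsets refer to the unmodified source text, the annotated span can always be recovered by slicing, e.g. `text[14:43]`.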
Using Clinical Narratives and Structured Data to Identify Distant Recurrences in Breast Cancer
Title | Using Clinical Narratives and Structured Data to Identify Distant Recurrences in Breast Cancer |
Authors | Zexian Zeng, Ankita Roy, Xiaoyu Li, Sasa Espino, Susan Clare, Seema Khan, Yuan Luo |
Abstract | Accurately identifying distant recurrences in breast cancer from the Electronic Health Records (EHR) is important for both clinical care and secondary analysis. Although multiple applications have been developed for computational phenotyping in breast cancer, distant recurrence identification still relies heavily on manual chart review. In this study, we aim to develop a model that identifies distant recurrences in breast cancer using clinical narratives and structured data from EHR. We apply MetaMap to extract features from clinical narratives and also retrieve structured clinical data from EHR. Using these features, we train a support vector machine model to identify distant recurrences in breast cancer patients. We train the model using 1,396 double-annotated subjects and validate the model using 599 double-annotated subjects. In addition, we validate the model on a set of 4,904 single-annotated subjects as a generalization test. We obtained a high area under curve (AUC) score of 0.92 (SD=0.01) in the cross-validation using the training dataset, then obtained AUC scores of 0.95 and 0.93 in the held-out test and generalization test using 599 and 4,904 samples respectively. Our model can accurately and efficiently identify distant recurrences in breast cancer by combining features extracted from unstructured clinical narratives and structured clinical data. |
Tasks | Computational Phenotyping |
Published | 2018-06-13 |
URL | http://arxiv.org/abs/1806.04818v2 |
PDF | http://arxiv.org/pdf/1806.04818v2.pdf |
PWC | https://paperswithcode.com/paper/using-clinical-narratives-and-structured-data |
Repo | |
Framework | |
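The modeling step above (an SVM over narrative-derived plus structured features) can be sketched with synthetic data. The binary "narrative" flags stand in for concept features such as MetaMap extractions, the continuous columns for structured EHR fields, and the labels come from a made-up linear rule; none of this is the authors' cohort or feature set.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 400

# Synthetic stand-ins: binary narrative concept flags plus continuous
# structured fields; labels follow an invented linear rule.
narrative = rng.integers(0, 2, size=(n, 10)).astype(float)
structured = rng.normal(size=(n, 5))
X = np.hstack([narrative, structured])
w = rng.normal(size=15)
y = ((X - X.mean(axis=0)) @ w > 0).astype(int)

clf = SVC(kernel="linear").fit(X[:300], y[:300])   # training split
acc = clf.score(X[300:], y[300:])                  # held-out evaluation
```

Concatenating the two feature families into one design matrix is the simplest way to combine modalities, mirroring the paper's use of both narrative and structured inputs.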
3D Terrain Segmentation in the SWIR Spectrum
Title | 3D Terrain Segmentation in the SWIR Spectrum |
Authors | Dalton Rosario, Anthony Ortiz, Olac Fuentes |
Abstract | We focus on the automatic 3D terrain segmentation problem using hyperspectral shortwave IR (HS-SWIR) imagery and 3D Digital Elevation Models (DEM). The datasets were independently collected, and metadata for the HS-SWIR dataset are unavailable. We exploit an overall slope of the SWIR spectrum that correlates with the presence of moisture in soil, proposing a band ratio test as a proxy for soil moisture content that distinguishes two broad classes of objects: live vegetation and impermeable manmade surfaces. We show that image-based localization techniques combined with the Optimal Randomized RANdom Sample Consensus (RANSAC) algorithm achieve precise spatial matches between HS-SWIR data of a portion of downtown Los Angeles (LA, USA) and the visible image of a geo-registered 3D DEM covering a wider area of LA. Our spectral-elevation rule-based approach yields an overall accuracy of 97.7%, segmenting the object classes into buildings, houses, trees, grass, and roads/parking lots. |
Tasks | Image-Based Localization |
Published | 2018-10-27 |
URL | http://arxiv.org/abs/1810.11690v1 |
PDF | http://arxiv.org/pdf/1810.11690v1.pdf |
PWC | https://paperswithcode.com/paper/3d-terrain-segmentation-in-the-swir-spectrum |
Repo | |
Framework | |
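The band ratio test mentioned above reduces to a per-pixel ratio of two spectral bands followed by a threshold. The cube below is random, and the band indices and threshold are placeholders, not the paper's calibrated values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical HS-SWIR cube: rows x cols x bands.
cube = rng.uniform(0.1, 0.9, size=(4, 4, 100))

def band_ratio(cube, b_moist=80, b_ref=10):
    """Per-pixel ratio of a moisture-sensitive band to a reference band."""
    return cube[..., b_moist] / cube[..., b_ref]

ratio = band_ratio(cube)
vegetation_mask = ratio > 1.0     # assumed decision threshold
```

The resulting boolean mask gives a cheap first split of each pixel into the two broad classes, which the elevation rules then refine.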
Composing photomosaic images using clustering based evolutionary programming
Title | Composing photomosaic images using clustering based evolutionary programming |
Authors | Yaodong He, Jianfeng Zhou, Shiu Yin Yuen |
Abstract | Photomosaic images are images composed of many tiny images. The complete picture can be seen clearly when viewed from a distance, while the individual tiny images that replace blocks of the original image become visible up close. Many algorithms have been proposed to compose photomosaic images automatically. Most of them use greedy algorithms to match the blocks with the tiny images. To obtain a better visual result and satisfy some commercial requirements, a constraint that a tiny image should not be reused many times is usually added. With this constraint, the photomosaic problem becomes a combinatorial optimization problem. Evolutionary algorithms, which imitate the process of natural selection, are popular and powerful for combinatorial optimization problems. However, little work has been done on applying evolutionary algorithms to the photomosaic problem. In this paper, we present an algorithm called clustering based evolutionary programming to deal with the problem. We give prior knowledge to the optimization algorithm, which makes the optimization process converge faster. In our experiments, the proposed algorithm is compared with state-of-the-art algorithms and software. The results indicate that our algorithm performs the best. |
Tasks | Combinatorial Optimization |
Published | 2018-04-09 |
URL | http://arxiv.org/abs/1804.02827v1 |
PDF | http://arxiv.org/pdf/1804.02827v1.pdf |
PWC | https://paperswithcode.com/paper/composing-photomosaic-images-using-clustering |
Repo | |
Framework | |
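The constrained matching problem above can be made concrete with a 1-D toy: assign blocks to tiles by intensity distance, with a cap on how often each tile may be reused. The greedy matcher below is the kind of baseline the clustering based evolutionary programming is designed to improve on; intensities and the cap are invented.

```python
import random

random.seed(0)

# Toy stand-in: match image blocks to tiles by mean intensity, with a cap
# on tile reuse (the paper's constraint).
blocks = [random.random() for _ in range(20)]   # block mean intensities
tiles = [i / 9 for i in range(10)]              # tile mean intensities
MAX_USE = 3

use = [0] * len(tiles)
assignment = []
for b in blocks:
    # Greedy: closest tile that has not hit its reuse cap.
    best = min((i for i in range(len(tiles)) if use[i] < MAX_USE),
               key=lambda i: abs(tiles[i] - b))
    use[best] += 1
    assignment.append(best)
```

Because greedy choices can lock good tiles out of later blocks, searching over whole assignments (as an evolutionary algorithm does) can reach lower total distance under the same cap.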
Online Feature Ranking for Intrusion Detection Systems
Title | Online Feature Ranking for Intrusion Detection Systems |
Authors | Buse Gul Atli, Alexander Jung |
Abstract | Many current approaches to the design of intrusion detection systems apply feature selection in a static, non-adaptive fashion. These methods often neglect the dynamic nature of network data, which requires the use of adaptive feature selection techniques. In this paper, we present a simple technique based on incremental learning of support vector machines in order to rank the features in real time within a streaming model for network data. Illustrative numerical experiments with two popular benchmark datasets show that our approach allows adaptation to changes in normal network behaviour and to novel attack patterns that have not been experienced before. |
Tasks | Feature Selection, Intrusion Detection |
Published | 2018-03-01 |
URL | http://arxiv.org/abs/1803.00530v2 |
PDF | http://arxiv.org/pdf/1803.00530v2.pdf |
PWC | https://paperswithcode.com/paper/online-feature-ranking-for-intrusion |
Repo | |
Framework | |
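The technique above (incrementally trained linear SVM, features ranked by weight magnitude) can be sketched with `SGDClassifier` and `partial_fit`. The stream is synthetic, with only features 0 and 1 carrying the "attack" signal; this is an assumed setup, not the paper's benchmark data.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)

# Incremental linear SVM (hinge loss) over a synthetic stream.
clf = SGDClassifier(loss="hinge", random_state=0)
for _ in range(100):                              # mini-batches arriving over time
    X = rng.normal(size=(32, 10))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)       # signal lives in features 0, 1
    clf.partial_fit(X, y, classes=[0, 1])
    ranking = np.argsort(-np.abs(clf.coef_[0]))   # live feature ranking
```

Because the model is refit on every batch, the ranking tracks changes in which features matter as the traffic distribution drifts.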
Predicting the time-evolution of multi-physics systems with sequence-to-sequence models
Title | Predicting the time-evolution of multi-physics systems with sequence-to-sequence models |
Authors | K. D. Humbird, J. L. Peterson, R. G. McClarren |
Abstract | In this work, sequence-to-sequence (seq2seq) models, originally developed for language translation, are used to predict the temporal evolution of complex, multi-physics computer simulations. The predictive performance of seq2seq models is compared to state transition models for datasets generated with multi-physics codes with varying levels of complexity - from simple 1D diffusion calculations to simulations of inertial confinement fusion implosions. Seq2seq models demonstrate the ability to accurately emulate complex systems, enabling the rapid estimation of the evolution of quantities of interest in computationally expensive simulations. |
Tasks | |
Published | 2018-11-14 |
URL | http://arxiv.org/abs/1811.05852v1 |
PDF | http://arxiv.org/pdf/1811.05852v1.pdf |
PWC | https://paperswithcode.com/paper/predicting-the-time-evolution-of-multi |
Repo | |
Framework | |
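A sense of the benchmark can be given with the simplest case mentioned above, 1D diffusion, rolled out as a sequence, together with a learned linear state-transition model of the kind the paper compares seq2seq models against. The grid, step count, and diffusion coefficient are illustrative.

```python
import numpy as np

# 1D periodic diffusion rollout: each state is a 32-point temperature profile.
def diffusion_step(u, alpha=0.1):
    return u + alpha * (np.roll(u, 1) - 2 * u + np.roll(u, -1))

x = np.linspace(-3, 3, 32)
traj = [np.exp(-x ** 2)]          # initial Gaussian profile
for _ in range(50):
    traj.append(diffusion_step(traj[-1]))
traj = np.array(traj)             # (51, 32) sequence of simulation states

# State-transition baseline: fit u_{t+1} ~ u_t @ A by least squares,
# then roll the learned model forward from the midpoint.
A, *_ = np.linalg.lstsq(traj[:-1], traj[1:], rcond=None)
pred = traj[25]
for _ in range(25):
    pred = pred @ A
```

A seq2seq emulator instead consumes a window of past states and emits a window of future states, which is what lets it handle the more complex, non-Markovian multi-physics cases.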
Prediction of Electric Multiple Unit Fleet Size Based on Convolutional Neural Network
Title | Prediction of Electric Multiple Unit Fleet Size Based on Convolutional Neural Network |
Authors | Boliang Lin |
Abstract | With the expansion of the high-speed railway network and the growth of passenger transportation demand, the fleet size of electric multiple units (EMU) in China needs to be adjusted accordingly. Generally, an EMU train costs tens of millions of dollars, which constitutes a significant portion of capital investment. Thus, the prediction of EMU fleet size has attracted increasing attention from the associated railway departments. First, this paper introduces a typical architecture of convolutional neural networks (CNN) and its basic theory. Then, data on nine indices, such as passenger traffic volume and the length of high-speed railways in operation, are collected and preprocessed. Next, a CNN and a backpropagation neural network (BPNN) are constructed and trained to predict the EMU fleet size in the following years. The differences and performances of these two networks in computational experiments are analyzed in depth. The results indicate that the CNN is superior to the BPNN in both generalization ability and fitting accuracy, and that the CNN can serve as an aid in EMU fleet size prediction. |
Tasks | |
Published | 2018-09-03 |
URL | http://arxiv.org/abs/1809.00491v1 |
PDF | http://arxiv.org/pdf/1809.00491v1.pdf |
PWC | https://paperswithcode.com/paper/prediction-of-electric-multiple-unit-fleet |
Repo | |
Framework | |
Natural Language Processing for EHR-Based Computational Phenotyping
Title | Natural Language Processing for EHR-Based Computational Phenotyping |
Authors | Zexian Zeng, Yu Deng, Xiaoyu Li, Tristan Naumann, Yuan Luo |
Abstract | This article reviews recent advances in applying natural language processing (NLP) to Electronic Health Records (EHRs) for computational phenotyping. NLP-based computational phenotyping has numerous applications including diagnosis categorization, novel phenotype discovery, clinical trial screening, pharmacogenomics, drug-drug interaction (DDI) and adverse drug event (ADE) detection, as well as genome-wide and phenome-wide association studies. Significant progress has been made in algorithm development and resource construction for computational phenotyping. Among the surveyed methods, well-designed keyword search and rule-based systems often achieve good performance. However, the construction of keyword and rule lists requires significant manual effort, which is difficult to scale. Supervised machine learning models have been favored because they are capable of acquiring both classification patterns and structures from data. Recently, deep learning and unsupervised learning have received growing attention, with the former favored for its performance and the latter for its ability to find novel phenotypes. Integrating heterogeneous data sources has become increasingly important and has shown promise in improving model performance. Often, better performance is achieved by combining multiple modalities of information. Despite these many advances, challenges and opportunities remain for NLP-based computational phenotyping, including better model interpretability and generalizability, and proper characterization of feature relations in clinical narratives. |
Tasks | Computational Phenotyping |
Published | 2018-06-13 |
URL | http://arxiv.org/abs/1806.04820v2 |
PDF | http://arxiv.org/pdf/1806.04820v2.pdf |
PWC | https://paperswithcode.com/paper/natural-language-processing-for-ehr-based |
Repo | |
Framework | |