May 4, 2019

2138 words 11 mins read

Paper Group NANR 210

Tractable Operations for Arithmetic Circuits of Probabilistic Models. The impact of simple feature engineering in multilingual medical NER. Climbing Mont BLEU: The Strange World of Reachable High-BLEU Translations. Sparse Support Recovery with Non-smooth Loss Functions. Agreement on Target-bidirectional Neural Machine Translation. Towards a Distrib …

Tractable Operations for Arithmetic Circuits of Probabilistic Models


Title	Tractable Operations for Arithmetic Circuits of Probabilistic Models
Authors	Yujia Shen, Arthur Choi, Adnan Darwiche
Abstract	We consider tractable representations of probability distributions and the polytime operations they support. In particular, we consider a recently proposed arithmetic circuit representation, the Probabilistic Sentential Decision Diagram (PSDD). We show that PSDDs support a polytime multiplication operator, while they do not support a polytime operator for summing-out variables. A polytime multiplication operator makes PSDDs suitable for a broader class of applications compared to arithmetic circuits, which do not in general support multiplication. As one example, we show that PSDD multiplication leads to a very simple but effective compilation algorithm for probabilistic graphical models: represent each model factor as a PSDD, and then multiply them.
Tasks
Published	2016-12-01
URL	http://papers.nips.cc/paper/6363-tractable-operations-for-arithmetic-circuits-of-probabilistic-models
PDF	http://papers.nips.cc/paper/6363-tractable-operations-for-arithmetic-circuits-of-probabilistic-models.pdf
PWC	https://paperswithcode.com/paper/tractable-operations-for-arithmetic-circuits
Repo
Framework
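
The compilation recipe in the PSDD abstract (represent each factor, then multiply them) can be illustrated with a toy sketch. This is not a PSDD implementation: it assumes binary variables and uses dense tables as a hypothetical stand-in for PSDDs, so it grows exponentially in the number of variables, which is the cost compact circuits are meant to avoid. The names `multiply`, `normalize`, and `compile_model` are our own.

```python
from functools import reduce
from itertools import product

# Hypothetical stand-in for a PSDD: a factor is (variables, table), where the
# table maps each 0/1 assignment of the variables to a non-negative number.

def multiply(f, g):
    """Pointwise product of two factors over the union of their variables."""
    f_vars, f_tab = f
    g_vars, g_tab = g
    out_vars = tuple(dict.fromkeys(f_vars + g_vars))  # order-preserving union
    out_tab = {}
    for assignment in product((0, 1), repeat=len(out_vars)):
        env = dict(zip(out_vars, assignment))
        f_key = tuple(env[v] for v in f_vars)
        g_key = tuple(env[v] for v in g_vars)
        out_tab[assignment] = f_tab[f_key] * g_tab[g_key]
    return out_vars, out_tab

def normalize(f):
    """Rescale a factor so its entries sum to 1."""
    vars_, tab = f
    z = sum(tab.values())
    return vars_, {k: v / z for k, v in tab.items()}

def compile_model(factors):
    """Multiply all factors together, then normalize the result."""
    return normalize(reduce(multiply, factors))

# Two-node network A -> B: P(A) * P(B | A) gives the joint P(A, B).
p_a = (("A",), {(0,): 0.4, (1,): 0.6})
p_b_given_a = (("A", "B"), {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.3, (1, 1): 0.7})
joint_vars, joint = compile_model([p_a, p_b_given_a])
print(joint_vars, joint)  # entries 0.36, 0.04, 0.18, 0.42 up to float rounding
```

For a Bayesian network the product is already normalized; the normalization step matters for undirected models whose factors are unnormalized potentials.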

The impact of simple feature engineering in multilingual medical NER


Title	The impact of simple feature engineering in multilingual medical NER
Authors	Rebecka Weegar, Arantza Casillas, Arantza Diaz de Ilarraza, Maite Oronoz, Alicia Pérez, Koldo Gojenola
Abstract	The goal of this paper is to examine the impact of simple feature engineering mechanisms before applying more sophisticated techniques to the task of medical NER. Sometimes papers using scientifically sound techniques present raw baselines that could be improved by adding simple and cheap features. This work focuses on entity recognition for the clinical domain for three languages: English, Swedish and Spanish. The task is tackled using simple features, starting from the window size, capitalization, and prefixes, and moving to POS and semantic tags. This work demonstrates that a simple initial step of feature engineering can improve the baseline results significantly. Hence, the contributions of this paper are: first, a short list of guidelines well supported by experimental results on three languages and, second, a detailed description of the relevance of these features for medical NER.
Tasks	Feature Engineering, Lemmatization, Named Entity Recognition, Relation Extraction
Published	2016-12-01
URL	https://www.aclweb.org/anthology/W16-4201/
PDF	https://www.aclweb.org/anthology/W16-4201
PWC	https://paperswithcode.com/paper/the-impact-of-simple-feature-engineering-in
Repo
Framework
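
The "simple features" named in the medical NER abstract (a context window, capitalization, prefixes) can be sketched as a per-token feature dictionary, the kind of input typically given to a CRF-style tagger. This is a minimal sketch under our own assumptions: the feature names, the `<pad>` token, and the default window of 2 are hypothetical, and the POS and semantic-tag features are omitted because they require an external tagger.

```python
def token_features(tokens, i, window=2):
    """Features for tokens[i]: lowercased form, casing, prefixes, and context words."""
    tok = tokens[i]
    feats = {
        "word.lower": tok.lower(),
        "is_capitalized": tok[:1].isupper(),
        "is_all_caps": tok.isupper(),
        "prefix2": tok[:2].lower(),
        "prefix3": tok[:3].lower(),
    }
    # Surrounding words within +/- window; positions outside the sentence are padded.
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        feats[f"w[{offset}]"] = tokens[j].lower() if 0 <= j < len(tokens) else "<pad>"
    return feats

print(token_features(["Patient", "given", "Aspirin", "daily"], 2, window=1))
```

Varying `window` is one way to reproduce the kind of window-size comparison the abstract describes.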

Climbing Mont BLEU: The Strange World of Reachable High-BLEU Translations


Title	Climbing Mont BLEU: The Strange World of Reachable High-BLEU Translations
Authors	Aaron Smith, Christian Hardmeier, Joerg Tiedemann
Abstract
Tasks	Machine Translation
Published	2016-01-01
URL	https://www.aclweb.org/anthology/W16-3414/
PDF	https://www.aclweb.org/anthology/W16-3414
PWC	https://paperswithcode.com/paper/climbing-mont-bleu-the-strange-world-of
Repo
Framework

Sparse Support Recovery with Non-smooth Loss Functions


Title	Sparse Support Recovery with Non-smooth Loss Functions
Authors	Kévin Degraux, Gabriel Peyré, Jalal Fadili, Laurent Jacques
Abstract	In this paper, we study the support recovery guarantees of underdetermined sparse regression using the $\ell_1$-norm as a regularizer and a non-smooth loss function for data fidelity. More precisely, we focus in detail on the cases of $\ell_1$ and $\ell_\infty$ losses, and contrast them with the usual $\ell_2$ loss. While these losses are routinely used to account for either sparse ($\ell_1$ loss) or uniform ($\ell_\infty$ loss) noise models, a theoretical analysis of their performance is still lacking. In this article, we extend the existing theory from the smooth $\ell_2$ case to these non-smooth cases. We derive a sharp condition which ensures that the support of the vector to recover is stable to small additive noise in the observations, as long as the loss constraint size is tuned proportionally to the noise level. A distinctive feature of our theory is that it also explains what happens when the support is unstable. When the support is no longer stable, we identify an “extended support” and show that this extended support is stable to small additive noise. To exemplify the usefulness of our theory, we give a detailed numerical analysis of the support stability/instability of compressed sensing recovery with these different losses. This highlights different parameter regimes, ranging from total support stability to progressively increasing support instability.
Tasks
Published	2016-12-01
URL	http://papers.nips.cc/paper/6559-sparse-support-recovery-with-non-smooth-loss-functions
PDF	http://papers.nips.cc/paper/6559-sparse-support-recovery-with-non-smooth-loss-functions.pdf
PWC	https://paperswithcode.com/paper/sparse-support-recovery-with-non-smooth-loss
Repo
Framework
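
For orientation, the setting in the sparse support recovery abstract (an $\ell_1$ regularizer, a non-smooth data-fidelity loss, and a loss constraint whose size tracks the noise level) can be written as the constrained program below. The notation ($\Phi$, $x_0$, $w$, $\tau$, $\alpha$) is our own assumption and may differ from the paper's.

```latex
y = \Phi x_0 + w, \qquad \Phi \in \mathbb{R}^{m \times n},\; m < n,
\qquad
x_\tau \in \operatorname*{argmin}_{x \in \mathbb{R}^n} \|x\|_1
\quad \text{s.t.} \quad \|\Phi x - y\|_\alpha \le \tau,
\qquad \alpha \in \{1, 2, \infty\}.
```

In this notation, support stability asks whether $\operatorname{supp}(x_\tau) = \operatorname{supp}(x_0)$ when $\|w\|_\alpha$ is small and $\tau$ is chosen proportional to $\|w\|_\alpha$; $\alpha = 2$ is the smooth case the abstract contrasts against.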

Agreement on Target-bidirectional Neural Machine Translation


Title	Agreement on Target-bidirectional Neural Machine Translation
Authors	Lemao Liu, Masao Utiyama, Andrew Finch, Eiichiro Sumita
Abstract
Tasks	Machine Translation, Structured Prediction, Transliteration
Published	2016-06-01
URL	https://www.aclweb.org/anthology/N16-1046/
PDF	https://www.aclweb.org/anthology/N16-1046
PWC	https://paperswithcode.com/paper/agreement-on-target-bidirectional-neural
Repo
Framework

Towards a Distributional Model of Semantic Complexity


Title	Towards a Distributional Model of Semantic Complexity
Authors	Emmanuele Chersoni, Philippe Blache, Alessandro Lenci
Abstract	In this paper, we introduce for the first time a Distributional Model for computing semantic complexity, inspired by the general principles of the Memory, Unification and Control framework (Hagoort, 2013; Hagoort, 2016). We argue that sentence comprehension is an incremental process driven by the goal of constructing a coherent representation of the event described by the sentence. The composition cost of a sentence depends on the semantic coherence of the event being constructed and on the activation degree of the linguistic constructions. We also report the results of a first evaluation of the model on the Bicknell dataset (Bicknell et al., 2010).
Tasks
Published	2016-12-01
URL	https://www.aclweb.org/anthology/W16-4102/
PDF	https://www.aclweb.org/anthology/W16-4102
PWC	https://paperswithcode.com/paper/towards-a-distributional-model-of-semantic
Repo
Framework

EmpiriST: AIPHES - Robust Tokenization and POS-Tagging for Different Genres


Title	EmpiriST: AIPHES - Robust Tokenization and POS-Tagging for Different Genres
Authors	Steffen Remus, Gerold Hintz, Chris Biemann, Christian M. Meyer, Darina Benikova, Judith Eckle-Kohler, Margot Mieskes, Thomas Arnold
Abstract
Tasks	Machine Translation, Part-Of-Speech Tagging, Tokenization
Published	2016-08-01
URL	https://www.aclweb.org/anthology/W16-2613/
PDF	https://www.aclweb.org/anthology/W16-2613
PWC	https://paperswithcode.com/paper/empirist-aiphes-robust-tokenization-and-pos
Repo
Framework

SoMaJo: State-of-the-art tokenization for German web and social media texts


Title	SoMaJo: State-of-the-art tokenization for German web and social media texts
Authors	Thomas Proisl, Peter Uhrig
Abstract
Tasks	Lemmatization, Tokenization
Published	2016-08-01
URL	https://www.aclweb.org/anthology/W16-2607/
PDF	https://www.aclweb.org/anthology/W16-2607
PWC	https://paperswithcode.com/paper/somajo-state-of-the-art-tokenization-for
Repo
Framework

LTL-UDE @ EmpiriST 2015: Tokenization and PoS Tagging of Social Media Text


Title	LTL-UDE @ EmpiriST 2015: Tokenization and PoS Tagging of Social Media Text
Authors	Tobias Horsmann, Torsten Zesch
Abstract
Tasks	Boundary Detection, Part-Of-Speech Tagging, Tokenization
Published	2016-08-01
URL	https://www.aclweb.org/anthology/W16-2615/
PDF	https://www.aclweb.org/anthology/W16-2615
PWC	https://paperswithcode.com/paper/ltl-ude-empirist-2015-tokenization-and-pos
Repo
Framework

Deep Learning Architecture for Patient Data De-identification in Clinical Records


Title	Deep Learning Architecture for Patient Data De-identification in Clinical Records
Authors	Shweta Yadav, Asif Ekbal, Sriparna Saha, Pushpak Bhattacharyya
Abstract	Rapid growth in Electronic Medical Records (EMR) has led to an expansion of data in the clinical domain. The majority of the available health care information is locked in narrative documents, which form a rich source of clinical information. Text mining of such clinical records has gained considerable attention in various medical applications such as treatment and decision making. However, medical records contain patients' Private Health Information (PHI), which can reveal their identities. To preserve patient privacy, all PHI must be removed before the records are made publicly available. The aim is to de-identify or encrypt the PHI in patient medical records. In this paper, we propose an algorithm based on a deep learning architecture to solve this problem. We perform de-identification of seven PHI terms from the clinical records. Experiments on benchmark datasets show that our proposed approach achieves encouraging performance, outperforming a baseline model developed with Conditional Random Fields.
Tasks	Decision Making, Named Entity Recognition
Published	2016-12-01
URL	https://www.aclweb.org/anthology/W16-4206/
PDF	https://www.aclweb.org/anthology/W16-4206
PWC	https://paperswithcode.com/paper/deep-learning-architecture-for-patient-data
Repo
Framework
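
Once a tagger (deep or CRF-based) has labelled PHI tokens, the de-identification step described in the abstract amounts to replacing each tagged span. The sketch below is not the paper's model: it assumes BIO-style labels from some hypothetical tagger, and the category names `NAME` and `DATE` are illustrative only, since the abstract does not list its seven PHI types.

```python
def mask_phi(tokens, labels):
    """Replace each BIO-tagged PHI span with a single [CATEGORY] placeholder."""
    masked, prev_cat = [], None
    for tok, lab in zip(tokens, labels):
        if lab == "O":
            masked.append(tok)
            prev_cat = None
            continue
        prefix, cat = lab.split("-", 1)
        # A new span starts on a B- tag, or on an I- tag whose category changed.
        if prefix == "B" or cat != prev_cat:
            masked.append(f"[{cat}]")
        prev_cat = cat  # I- tokens continuing the span are absorbed
    return masked

print(mask_phi(["Mr", "John", "Smith", "seen", "on", "12/03/2015"],
               ["O", "B-NAME", "I-NAME", "O", "O", "B-DATE"]))
```

Collapsing a multi-token span into one placeholder hides the length of the original name, a small extra privacy safeguard.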

Detecting Japanese Patients with Alzheimer’s Disease based on Word Category Frequencies


Title	Detecting Japanese Patients with Alzheimer’s Disease based on Word Category Frequencies
Authors	Daisaku Shibata, Shoko Wakamiya, Ayae Kinoshita, Eiji Aramaki
Abstract	In recent years, detecting Alzheimer’s disease (AD) in early stages based on natural language processing (NLP) has drawn much attention. To date, vocabulary size, grammatical complexity, and fluency have been studied using NLP metrics. However, content analysis of AD narratives remains out of reach for NLP. This study investigates features of the words that AD patients use in their spoken language. We recruited 18 examinees aged 53–90 years (mean: 76.89) and divided them into two groups based on MMSE scores. The AD group comprised 9 examinees with scores of 21 or lower. The healthy control group comprised 9 examinees with a score of 22 or higher. Linguistic Inquiry and Word Count (LIWC) categories were used to classify the words that the examinees used, and word frequencies were obtained from observation. Significant differences were confirmed for the usage of impersonal pronouns in the AD group. This result demonstrates the basic feasibility of the proposed NLP-based detection approach.
Tasks
Published	2016-12-01
URL	https://www.aclweb.org/anthology/W16-4211/
PDF	https://www.aclweb.org/anthology/W16-4211
PWC	https://paperswithcode.com/paper/detecting-japanese-patients-with-alzheimeras
Repo
Framework
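
The two measurements in the Alzheimer’s study (splitting examinees by MMSE score, then comparing how often words from a LIWC category appear) can be sketched as follows. The MMSE cut-off comes from the abstract; everything else is an assumption: the word set is a hypothetical English stand-in for a LIWC category (the real dictionaries are licensed), and Japanese transcripts would need a morphological analyser rather than the regex tokenizer used here.

```python
import re

# Hypothetical stand-in for the LIWC "impersonal pronouns" category.
IMPERSONAL_PRONOUNS = {"it", "this", "that", "something", "anything", "everything"}

def assign_group(mmse_score):
    """Split from the abstract: MMSE <= 21 -> AD group, >= 22 -> healthy control."""
    return "AD" if mmse_score <= 21 else "control"

def category_rate(transcript, category):
    """Share of word tokens in a transcript that belong to the category."""
    words = re.findall(r"[a-z']+", transcript.lower())
    if not words:
        return 0.0
    return sum(word in category for word in words) / len(words)

print(assign_group(21), assign_group(24))
print(category_rate("It was something like that", IMPERSONAL_PRONOUNS))
```

Per-examinee rates computed this way could then be compared between the two groups with a significance test of the analyst's choosing.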

SPA: Web-based Platform for easy Access to Speech Processing Modules


Title	SPA: Web-based Platform for easy Access to Speech Processing Modules
Authors	Fernando Batista, Pedro Curto, Isabel Trancoso, Alberto Abad, Jaime Ferreira, Eugénio Ribeiro, Helena Moniz, David Martins de Matos, Ricardo Ribeiro
Abstract	This paper presents SPA, a web-based Speech Analytics platform that integrates several speech processing modules and makes it possible to use them through the web. It was developed with the aim of facilitating the usage of the modules, without the need to know about software dependencies and specific configurations. Apart from being accessed through a web browser, the platform also provides a REST API for easy integration with other applications. The platform is flexible, scalable, provides authentication for access restrictions, and was developed with the time and effort of providing new services in mind. The platform is still being improved, but it already integrates a considerable number of audio and text processing modules, including: automatic transcription, speech disfluency classification, emotion detection, dialog act recognition, age and gender classification, non-nativeness detection, hyper-articulation detection, and two external modules for feature extraction and DTMF detection. This paper describes the SPA architecture, presents the already integrated modules, and provides a detailed description of the ones most recently integrated.
Tasks	Age And Gender Classification
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1615/
PDF	https://www.aclweb.org/anthology/L16-1615
PWC	https://paperswithcode.com/paper/spa-web-based-platform-for-easy-access-to
Repo
Framework

Automated Anonymization as Spelling Variant Detection


Title	Automated Anonymization as Spelling Variant Detection
Authors	Steven Kester Yuwono, Hwee Tou Ng, Kee Yuan Ngiam
Abstract	The issue of privacy has always been a concern when clinical texts are used for research purposes. Personal health information (PHI) (such as name and identification number) needs to be removed so that patients cannot be identified. Manual anonymization is not feasible due to the large number of clinical texts to be anonymized. In this paper, we tackle the task of anonymizing clinical texts written in sentence fragments and which frequently contain symbols, abbreviations, and misspelled words. Our clinical texts therefore differ from those in the i2b2 shared tasks, which are in prose form with complete sentences. Our clinical texts are also part of a structured database which contains patient name and identification number in structured fields. As such, we formulate our anonymization task as spelling variant detection, exploiting patients' personal information in the structured fields to detect their spelling variants in clinical texts. We successfully anonymized clinical texts consisting of more than 200 million words, using minimum edit distance and regular expression patterns.
Tasks
Published	2016-12-01
URL	https://www.aclweb.org/anthology/W16-4214/
PDF	https://www.aclweb.org/anthology/W16-4214
PWC	https://paperswithcode.com/paper/automated-anonymization-as-spelling-variant
Repo
Framework
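
The spelling-variant formulation in the anonymization abstract (look up the patient's name in the structured fields, then flag tokens within a small edit distance of it, plus regular expressions for identifiers) can be sketched as below. The thresholds `max_dist=1` and `min_len=4`, the placeholder strings, and the example ID format are hypothetical choices for illustration, not the paper's settings.

```python
import re

def edit_distance(a, b):
    """Levenshtein distance (insertions, deletions, substitutions), two-row DP."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]

def anonymize(text, name_parts, id_number=None, max_dist=1, min_len=4):
    """Mask the structured-field ID and spelling variants of the patient's name."""
    # Mask the identification number first so name matching cannot alter it.
    if id_number:
        text = re.sub(re.escape(id_number), "[ID]", text, flags=re.IGNORECASE)
    targets = [p.lower() for p in name_parts]

    def mask_word(match):
        word = match.group(0).lower()
        for t in targets:
            # Short name parts must match exactly; longer ones may differ by max_dist edits.
            allowed = max_dist if len(t) >= min_len else 0
            if edit_distance(word, t) <= allowed:
                return "[NAME]"
        return match.group(0)

    return re.sub(r"[A-Za-z]+", mask_word, text)

print(anonymize("Pt Jonson, Mary (ID S1234567A) c/o chest pain",
                ["Johnson", "Mary"], id_number="S1234567A"))
```

Restricting fuzzy matching to longer name parts is one simple guard against masking ordinary short words; a real system would need further safeguards such as a list of common clinical vocabulary.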

The Effect of Gender and Age Differences on the Recognition of Emotions from Facial Expressions


Title	The Effect of Gender and Age Differences on the Recognition of Emotions from Facial Expressions
Authors	Daniela Schneevogt, Patrizia Paggio
Abstract	Recent studies have demonstrated gender and cultural differences in the recognition of emotions in facial expressions. However, most studies were conducted on American subjects. In this paper, we explore the generalizability of several findings to a non-American culture in the form of Danish subjects. We conduct an emotion recognition task followed by two stereotype questionnaires with participants of different genders and age groups. While recent findings (Krems et al., 2015) suggest that women are biased to see anger in neutral facial expressions posed by females, in our sample both genders assign higher ratings of anger to all emotions expressed by females. Furthermore, we demonstrate an effect of gender on the fear-surprise confusion observed by Tomkins and McCarter (1964): females overpredict fear, while males overpredict surprise.
Tasks	Emotion Recognition, Sentiment Analysis
Published	2016-12-01
URL	https://www.aclweb.org/anthology/W16-4302/
PDF	https://www.aclweb.org/anthology/W16-4302
PWC	https://paperswithcode.com/paper/the-effect-of-gender-and-age-differences-on
Repo
Framework

A Recurrent and Compositional Model for Personality Trait Recognition from Short Texts


Title	A Recurrent and Compositional Model for Personality Trait Recognition from Short Texts
Authors	Fei Liu, Julien Perez, Scott Nowson
Abstract	Many methods have been used to recognise author personality traits from text, typically combining linguistic feature engineering with shallow learning models, e.g. linear regression or Support Vector Machines. This work uses deep-learning-based models and atomic features of text, the characters, to build hierarchical, vectorial word and sentence representations for trait inference. This method, applied to a corpus of tweets, shows state-of-the-art performance across five traits compared with prior work. The results, supported by preliminary visualisation work, are encouraging for the ability to detect complex human traits.
Tasks	Feature Engineering, Part-Of-Speech Tagging, Personality Trait Recognition, Sentiment Analysis
Published	2016-12-01
URL	https://www.aclweb.org/anthology/W16-4303/
PDF	https://www.aclweb.org/anthology/W16-4303
PWC	https://paperswithcode.com/paper/a-recurrent-and-compositional-model-for
Repo
Framework