Paper Group NANR 83
LiMoSINe Pipeline: Multilingual UIMA-based NLP Platform. A Taxonomy of Spanish Nouns, a Statistical Algorithm to Generate it and its Implementation in Open Source Code. LanguageCrawl: A Generic Tool for Building Language Models Upon Common-Crawl. Evaluating a Topic Modelling Approach to Measuring Corpus Similarity. Detecting Expressions of Blame or Praise in Text …
LiMoSINe Pipeline: Multilingual UIMA-based NLP Platform
Title | LiMoSINe Pipeline: Multilingual UIMA-based NLP Platform |
Authors | Olga Uryupina, Barbara Plank, Gianni Barlacchi, Francisco J. Valverde Albacete, Manos Tsagkias, Antonio Uva, Alessandro Moschitti |
Abstract | |
Tasks | |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/P16-4027/ |
PWC | https://paperswithcode.com/paper/limosine-pipeline-multilingual-uima-based-nlp |
Repo | |
Framework | |
A Taxonomy of Spanish Nouns, a Statistical Algorithm to Generate it and its Implementation in Open Source Code
Title | A Taxonomy of Spanish Nouns, a Statistical Algorithm to Generate it and its Implementation in Open Source Code |
Authors | Rogelio Nazar, Irene Renau |
Abstract | In this paper we describe our work in progress on the automatic development of a taxonomy of Spanish nouns; we offer the Perl implementation we have so far, and we discuss the different problems that still need to be addressed. We designed a statistically-based taxonomy induction algorithm consisting of a combination of different strategies not involving explicit linguistic knowledge. While all quantitative, the strategies we present are of different natures. Some of them are based on the computation of distributional similarity coefficients, which identify pairs of sibling words or co-hyponyms, while others are based on asymmetric co-occurrence and identify pairs of parent-child words or hypernym-hyponym relations. A decision-making process is then applied to combine the results of the previous steps and finally connect lexical units to a basic structure containing the most general categories of the language. We evaluate the quality of the taxonomy both manually and using Spanish WordNet as a gold standard. We estimate an average of 89.07% precision and 25.49% recall considering only the results which the algorithm presents with a high degree of certainty, or 77.86% precision and 33.72% recall considering all results. |
Tasks | Decision Making |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1236/ |
PWC | https://paperswithcode.com/paper/a-taxonomy-of-spanish-nouns-a-statistical |
Repo | |
Framework | |
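The abstract above combines two quantitative signals: distributional similarity for co-hyponym (sibling) pairs and asymmetric co-occurrence for hypernym-hyponym (parent-child) pairs. Below is a minimal Python sketch of those two signals on toy data; the counts and the `asymmetry` heuristic are illustrative assumptions, not the authors' Perl implementation.

```python
from collections import Counter
from math import sqrt

# Toy co-occurrence counts per noun (in practice, harvested from a large corpus).
# All data here are illustrative, not the authors' settings.
cooc = {
    "perro":  Counter({"animal": 12, "ladrar": 8, "casa": 3}),
    "gato":   Counter({"animal": 10, "maullar": 7, "casa": 4}),
    "animal": Counter({"perro": 12, "gato": 10, "vivir": 20}),
}

def cosine(a, b):
    """Distributional similarity: candidate co-hyponyms score high."""
    shared = set(a) & set(b)
    num = sum(a[w] * b[w] for w in shared)
    den = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def asymmetry(child, parent):
    """Asymmetric co-occurrence: a hypernym tends to cover a larger share of its
    hyponym's contexts than the reverse (illustrative heuristic only)."""
    return cooc[parent][child] / (sum(cooc[child].values()) or 1)

print("co-hyponym score perro/gato:   ", round(cosine(cooc["perro"], cooc["gato"]), 3))
print("hypernym score animal -> perro:", round(asymmetry("perro", "animal"), 3))
```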
LanguageCrawl: A Generic Tool for Building Language Models Upon Common-Crawl
Title | LanguageCrawl: A Generic Tool for Building Language Models Upon Common-Crawl |
Authors | Szymon Roziewski, Wojciech Stokowiec |
Abstract | The web contains an immense amount of data: hundreds of billions of words are waiting to be extracted and used for language research. In this work we introduce our tool, LanguageCrawl, which allows NLP researchers to easily construct a web-scale corpus from the Common Crawl Archive: a petabyte-scale, open repository of web crawl data. Three use-cases are presented: filtering Polish websites, building an N-gram corpus, and training a continuous skip-gram language model with hierarchical softmax. Each of them has been implemented within the LanguageCrawl toolkit, with the possibility to adjust the specified language and N-gram ranks. Special effort has been put into high computing efficiency by applying highly concurrent multitasking. We make our tool publicly available to enrich NLP resources. We strongly believe that our work will help to facilitate NLP research, especially in under-resourced languages, where the lack of appropriately sized corpora is a serious hindrance to applying data-intensive methods, such as deep neural networks. |
Tasks | Language Modelling |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1443/ |
PWC | https://paperswithcode.com/paper/languagecrawl-a-generic-tool-for-building |
Repo | |
Framework | |
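As a rough illustration of the two downstream use-cases named in the abstract (an N-gram corpus and a continuous skip-gram model with hierarchical softmax), here is a hedged Python sketch assuming the crawl has already been filtered to one Polish sentence per line; the file name is hypothetical and gensim is an assumption, not necessarily LanguageCrawl's actual backend.

```python
from collections import Counter
from itertools import islice

from gensim.models import Word2Vec  # assumption: gensim >= 4.0 is available

def ngrams(tokens, n):
    """Yield n-grams as tuples from a token list."""
    return zip(*(islice(tokens, i, None) for i in range(n)))

# Hypothetical input: one whitespace-tokenised sentence per line, already filtered to Polish.
with open("polish_sentences.txt", encoding="utf-8") as fh:
    sentences = [line.split() for line in fh]

# Use-case 1: N-gram corpus (here, trigrams).
trigram_counts = Counter(g for s in sentences for g in ngrams(s, 3))
print(trigram_counts.most_common(10))

# Use-case 2: continuous skip-gram with hierarchical softmax
# (sg=1 selects skip-gram, hs=1 enables hierarchical softmax, negative=0 disables negative sampling).
model = Word2Vec(sentences, vector_size=100, window=5, sg=1, hs=1, negative=0, min_count=5)
model.save("skipgram_pl.model")
```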
Evaluating a Topic Modelling Approach to Measuring Corpus Similarity
Title | Evaluating a Topic Modelling Approach to Measuring Corpus Similarity |
Authors | Richard Fothergill, Paul Cook, Timothy Baldwin |
Abstract | Web corpora are often constructed automatically, and their contents are therefore often not well understood. One technique for assessing the composition of such a web corpus is to empirically measure its similarity to a reference corpus whose composition is known. In this paper we evaluate a number of measures of corpus similarity, including a method based on topic modelling which has not been previously evaluated for this task. To evaluate these methods we use known-similarity corpora that have been previously used for this purpose, as well as a number of newly-constructed known-similarity corpora targeting differences in genre, topic, time, and region. Our findings indicate that, overall, the topic modelling approach did not improve on a chi-square method that had previously been found to work well for measuring corpus similarity. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1042/ |
PWC | https://paperswithcode.com/paper/evaluating-a-topic-modelling-approach-to |
Repo | |
Framework | |
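The chi-square baseline mentioned in the abstract (in the spirit of Kilgarriff's frequency-profile comparison) can be sketched as below; the toy corpora and the top-n cut-off are illustrative assumptions, not the paper's exact experimental setup.

```python
from collections import Counter

def chi_square_similarity(tokens_a, tokens_b, top_n=500):
    """Kilgarriff-style chi-square comparison over the most frequent words
    of the combined corpora (illustrative; not the paper's exact setup)."""
    fa, fb = Counter(tokens_a), Counter(tokens_b)
    na, nb = sum(fa.values()), sum(fb.values())
    vocab = [w for w, _ in (fa + fb).most_common(top_n)]
    chi2 = 0.0
    for w in vocab:
        o_a, o_b = fa[w], fb[w]
        total = o_a + o_b
        e_a = total * na / (na + nb)   # expected count in corpus A
        e_b = total * nb / (na + nb)   # expected count in corpus B
        chi2 += (o_a - e_a) ** 2 / e_a + (o_b - e_b) ** 2 / e_b
    return chi2  # lower values indicate more similar corpora

corpus_a = "the cat sat on the mat the cat likes milk".split()
corpus_b = "the dog sat on the rug the dog likes bones".split()
print(chi_square_similarity(corpus_a, corpus_b, top_n=10))
```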
Detecting Expressions of Blame or Praise in Text
Title | Detecting Expressions of Blame or Praise in Text |
Authors | Udochukwu Orizu, Yulan He |
Abstract | The growth of social networking platforms has drawn a lot of attention to the need for social computing. Social computing utilises human insights for computational tasks as well as the design of systems that support social behaviours and interactions. One of the key aspects of social computing is the ability to attribute responsibility such as blame or praise to social events. This ability helps an intelligent entity account for and understand other intelligent entities' social behaviours, and enriches both the social functionalities and cognitive aspects of intelligent agents. In this paper, we present an approach with a model for blame and praise detection in text. We build our model on various theories of blame and include in it features used by humans when determining judgment, such as moral agent causality, foreknowledge, intentionality and coercion. An annotated corpus has been created for the task of blame and praise detection from text. The experimental results show that while our model gives results similar to supervised classifiers when classifying text as blame, praise or other, it outperforms supervised classifiers on the finer-grained classification of determining the direction of blame and praise, i.e., self-blame, blame-others, self-praise or praise-others, despite not using labelled training data. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1651/ |
PWC | https://paperswithcode.com/paper/detecting-expressions-of-blame-or-praise-in |
Repo | |
Framework | |
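Purely to illustrate the finer-grained direction labels mentioned above (self-blame, blame-others, self-praise, praise-others), here is a toy rule-based sketch; the cue-word and pronoun lists are hypothetical and far simpler than the paper's theory-driven features (moral agent causality, foreknowledge, intentionality, coercion).

```python
# Toy illustration of the direction step only; not the authors' model.
BLAME_CUES = {"blame", "fault", "responsible", "accuse"}
PRAISE_CUES = {"praise", "credit", "thank", "congratulate"}
FIRST_PERSON = {"i", "me", "my", "myself", "we", "our"}

def classify_direction(sentence):
    tokens = set(sentence.lower().split())
    polarity = ("blame" if BLAME_CUES & tokens
                else "praise" if PRAISE_CUES & tokens
                else "other")
    if polarity == "other":
        return "other"
    # Direction: first-person markers point the judgment at the speaker.
    if FIRST_PERSON & tokens:
        return f"self-{polarity}"
    return f"{polarity}-others"

print(classify_direction("I blame myself for the accident"))      # self-blame
print(classify_direction("They deserve praise for their work"))   # praise-others
```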
4Couv: A New Treebank for French
Title | 4Couv: A New Treebank for French |
Authors | Philippe Blache, Grégoire de Montcheuil, Laurent Prévot, Stéphane Rauzy |
Abstract | The question of the type of text used as primary data in treebanks is of certain importance. First, it has an influence at the discourse level: an article is not organized in the same way as a novel or a technical document. Moreover, it also has consequences in terms of semantic interpretation: some types of texts can be easier to interpret than others. We present in this paper a new type of treebank whose particularity is to answer the specific needs of experimental linguistics. It is made of short texts (book back covers) that present a strong coherence in their organization and can be rapidly interpreted. This type of text is adapted to short reading sessions, making it easy to acquire physiological data (e.g. eye movement, electroencephalography). Such a resource offers reliable data when looking for correlations between computational models and human language processing. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1245/ |
PWC | https://paperswithcode.com/paper/4couv-a-new-treebank-for-french |
Repo | |
Framework | |
A sense-based lexicon of count and mass expressions: The Bochum English Countability Lexicon
Title | A sense-based lexicon of count and mass expressions: The Bochum English Countability Lexicon |
Authors | Tibor Kiss, Francis Jeffry Pelletier, Halima Husic, Roman Nino Simunic, Johanna Marie Poppek |
Abstract | The present paper describes the current release of the Bochum English Countability Lexicon (BECL 2.1), a large empirical database consisting of lemmata from Open ANC (http://www.anc.org) with added senses from WordNet (Fellbaum 1998). BECL 2.1 contains ≈ 11,800 annotated noun-sense pairs, divided into four major countability classes and 18 fine-grained subclasses. In the current version, BECL also provides information on nouns whose senses occur in more than one class, allowing a closer look at polysemy and homonymy with regard to countability. Further included are sets of similar senses using the Leacock and Chodorow (LCH) score for semantic similarity (Leacock & Chodorow 1998), information on orthographic variation, on the completeness of all WordNet senses in the database, and an annotated representation of different types of proper names. The further development of BECL will investigate the different countability classes of proper names and the general relation between semantic similarity and countability, as well as recurring syntactic patterns for noun-sense pairs. The BECL 2.1 database is publicly available via http://count-and-mass.org. |
Tasks | Semantic Similarity, Semantic Textual Similarity |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1447/ |
PWC | https://paperswithcode.com/paper/a-sense-based-lexicon-of-count-and-mass |
Repo | |
Framework | |
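A small sketch of computing the Leacock-Chodorow (LCH) similarity used above for grouping similar senses, via NLTK's WordNet interface; the sense pairs chosen here are illustrative and unrelated to BECL's actual sense groupings.

```python
import nltk
from nltk.corpus import wordnet as wn

nltk.download("wordnet", quiet=True)  # one-time download of the WordNet data

# Illustrative noun senses (LCH similarity is defined for senses of the same POS).
water = wn.synset("water.n.01")    # typically a mass sense
bottle = wn.synset("bottle.n.01")  # typically a count sense
milk = wn.synset("milk.n.01")

print("water ~ milk:  ", round(water.lch_similarity(milk), 3))
print("water ~ bottle:", round(water.lch_similarity(bottle), 3))
```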
Posterior regularization for Joint Modeling of Multiple Structured Prediction Tasks with Soft Constraints
Title | Posterior regularization for Joint Modeling of Multiple Structured Prediction Tasks with Soft Constraints |
Authors | Kartik Goyal, Chris Dyer |
Abstract | |
Tasks | Multi-Task Learning, Named Entity Recognition, Part-Of-Speech Tagging, Structured Prediction |
Published | 2016-11-01 |
URL | https://www.aclweb.org/anthology/W16-5904/ |
PWC | https://paperswithcode.com/paper/posterior-regularization-for-joint-modeling |
Repo | |
Framework | |
Unanimous Prediction for 100% Precision with Application to Learning Semantic Mappings
Title | Unanimous Prediction for 100% Precision with Application to Learning Semantic Mappings |
Authors | Fereshte Khani, Martin Rinard, Percy Liang |
Abstract | |
Tasks | Question Answering, Semantic Parsing |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/P16-1090/ |
PWC | https://paperswithcode.com/paper/unanimous-prediction-for-100-precision-with-1 |
Repo | |
Framework | |
Falling silent, lost for words … Tracing personal involvement in interviews with Dutch war veterans
Title | Falling silent, lost for words … Tracing personal involvement in interviews with Dutch war veterans |
Authors | Henk van den Heuvel, Nelleke Oostdijk |
Abstract | In sources used in oral history research (such as interviews with eye witnesses), passages where the degree of personal emotional involvement is found to be high can be of particular interest, as these may give insight into how historical events were experienced, and what moral dilemmas and psychological or religious struggles were encountered. In a pilot study involving a large corpus of interview recordings with Dutch war veterans, we have investigated if it is possible to develop a method for automatically identifying those passages where the degree of personal emotional involvement is high. The method is based on the automatic detection of exceptionally large silences and filled pause segments (using Automatic Speech Recognition), and cues taken from specific n-grams. The first results appear to be encouraging enough for further elaboration of the method. |
Tasks | Speech Recognition |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1158/ |
PWC | https://paperswithcode.com/paper/falling-silent-lost-for-words-tracing |
Repo | |
Framework | |
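The detection of exceptionally long silences described in the abstract can be approximated with off-the-shelf audio tooling; the sketch below uses pydub as an assumption (the paper itself relies on ASR output and n-gram cues), and the file name and thresholds are hypothetical.

```python
from pydub import AudioSegment          # assumption: pydub is installed
from pydub.silence import detect_silence

# Hypothetical interview recording; thresholds are illustrative, not the paper's.
audio = AudioSegment.from_wav("veteran_interview_001.wav")

# Flag exceptionally long silences (here: >= 3 seconds, 16 dB below average loudness).
long_silences = detect_silence(
    audio,
    min_silence_len=3000,
    silence_thresh=audio.dBFS - 16,
)

for start_ms, end_ms in long_silences:
    print(f"candidate high-involvement passage near {start_ms/1000:.1f}s-{end_ms/1000:.1f}s")
```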
Bilbo-Val: Automatic Identification of Bibliographical Zone in Papers
Title | Bilbo-Val: Automatic Identification of Bibliographical Zone in Papers |
Authors | Amal Htait, Sebastien Fournier, Patrice Bellot |
Abstract | In this paper, we present the automatic annotation of the bibliographical references' zone in papers and articles in XML/TEI format. Our work proceeds in two phases: first, we use machine learning to classify bibliographical and non-bibliographical paragraphs in papers, by means of a model that was initially created to differentiate between footnotes containing or not containing bibliographical references. This is one of the features of BILBO, an open source software package for the automatic annotation of bibliographic references. We also suggest some methods to minimize the margin of error. Second, we propose an algorithm to find the largest list of bibliographical references in the article. The improvement applied to our model results in an increase in its efficiency, with an accuracy of 85.89. In testing our work, we achieve an average success rate of 72.23% in detecting the bibliographical references' zone. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1576/ |
PWC | https://paperswithcode.com/paper/bilbo-val-automatic-identification-of |
Repo | |
Framework | |
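To give a concrete flavour of the first phase described above (separating bibliographical from non-bibliographical paragraphs), here is a hedged scikit-learn sketch on toy data; BILBO's actual model is trained on annotated footnotes and is not reproduced here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data standing in for the "bibliographical vs. other paragraph" step.
paragraphs = [
    "Smith, J. (2010). On parsing. Journal of NLP, 12(3), 45-67.",
    "Doe, A. and Roe, B. Proceedings of LREC, 2014, pp. 101-110.",
    "In this section we discuss the motivation behind our approach.",
    "The experiments were carried out on a held-out test set.",
]
labels = ["biblio", "biblio", "other", "other"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(paragraphs, labels)

print(clf.predict(["Brown, P. et al. (1993). The mathematics of MT. CL, 19(2)."]))
```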
LORELEI Language Packs: Data, Tools, and Resources for Technology Development in Low Resource Languages
Title | LORELEI Language Packs: Data, Tools, and Resources for Technology Development in Low Resource Languages |
Authors | Stephanie Strassel, Jennifer Tracey |
Abstract | In this paper, we describe the textual linguistic resources in nearly 3 dozen languages being produced by Linguistic Data Consortium for DARPA's LORELEI (Low Resource Languages for Emergent Incidents) Program. The goal of LORELEI is to improve the performance of human language technologies for low-resource languages and enable rapid re-training of such technologies for new languages, with a focus on the use case of deployment of resources in sudden emergencies such as natural disasters. Representative languages have been selected to provide broad typological coverage for training, and surprise incident languages for testing will be selected over the course of the program. Our approach treats the full set of language packs as a coherent whole, maintaining LORELEI-wide specifications, tagsets, and guidelines, while allowing for adaptation to the specific needs created by each language. Each representative language corpus, therefore, both stands on its own as a resource for the specific language and forms part of a large multilingual resource for broader cross-language technology development. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1521/ |
PWC | https://paperswithcode.com/paper/lorelei-language-packs-data-tools-and |
Repo | |
Framework | |
Analysis of Anxious Word Usage on Online Health Forums
Title | Analysis of Anxious Word Usage on Online Health Forums |
Authors | Nicolas Rey-Villamizar, Prasha Shrestha, Farig Sadeque, Steven Bethard, Ted Pedersen, Arjun Mukherjee, Thamar Solorio |
Abstract | |
Tasks | |
Published | 2016-11-01 |
URL | https://www.aclweb.org/anthology/W16-6105/ |
PWC | https://paperswithcode.com/paper/analysis-of-anxious-word-usage-on-online |
Repo | |
Framework | |
Proceedings of The Fourth International Workshop on Natural Language Processing for Social Media
Title | Proceedings of The Fourth International Workshop on Natural Language Processing for Social Media |
Authors | |
Abstract | |
Tasks | |
Published | 2016-11-01 |
URL | https://www.aclweb.org/anthology/W16-6200/ |
PWC | https://paperswithcode.com/paper/proceedings-of-the-fourth-international-1 |
Repo | |
Framework | |
Improving Twitter Community Detection through Contextual Sentiment Analysis
Title | Improving Twitter Community Detection through Contextual Sentiment Analysis |
Authors | Alron Jan Lam |
Abstract | |
Tasks | Community Detection, Sentiment Analysis |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/P16-3005/ |
PWC | https://paperswithcode.com/paper/improving-twitter-community-detection-through |
Repo | |
Framework | |