Paper Group NANR 83
LiMoSINe Pipeline: Multilingual UIMA-based NLP Platform. A Taxonomy of Spanish Nouns, a Statistical Algorithm to Generate it and its Implementation in Open Source Code. LanguageCrawl: A Generic Tool for Building Language Models Upon Common-Crawl. Evaluating a Topic Modelling Approach to Measuring Corpus Similarity. Detecting Expressions of Blame or Praise in Text …
LiMoSINe Pipeline: Multilingual UIMA-based NLP Platform
Title | LiMoSINe Pipeline: Multilingual UIMA-based NLP Platform |
Authors | Olga Uryupina, Barbara Plank, Gianni Barlacchi, Francisco J. Valverde Albacete, Manos Tsagkias, Antonio Uva, Alessandro Moschitti |
Abstract | |
Tasks | |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/P16-4027/ |
PWC | https://paperswithcode.com/paper/limosine-pipeline-multilingual-uima-based-nlp |
Repo | |
Framework | |
A Taxonomy of Spanish Nouns, a Statistical Algorithm to Generate it and its Implementation in Open Source Code
Title | A Taxonomy of Spanish Nouns, a Statistical Algorithm to Generate it and its Implementation in Open Source Code |
Authors | Rogelio Nazar, Irene Renau |
Abstract | In this paper we describe our work in progress on the automatic development of a taxonomy of Spanish nouns; we offer the Perl implementation we have so far, and we discuss the different problems that still need to be addressed. We designed a statistically-based taxonomy induction algorithm consisting of a combination of different strategies not involving explicit linguistic knowledge. While all quantitative, the strategies we present are of different natures. Some of them are based on the computation of distributional similarity coefficients, which identify pairs of sibling words or co-hyponyms, while others are based on asymmetric co-occurrence and identify pairs of parent-child words or hypernym-hyponym relations. A decision-making process is then applied to combine the results of the previous steps and finally connect lexical units to a basic structure containing the most general categories of the language. We evaluate the quality of the taxonomy both manually and using Spanish WordNet as a gold standard. We estimate an average of 89.07% precision and 25.49% recall considering only the results which the algorithm presents with a high degree of certainty, or 77.86% precision and 33.72% recall considering all results. |
Tasks | Decision Making |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1236/ |
PWC | https://paperswithcode.com/paper/a-taxonomy-of-spanish-nouns-a-statistical |
Repo | |
Framework | |
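The abstract above combines two quantitative signals: distributional similarity for co-hyponym (sibling) pairs and asymmetric co-occurrence for hypernym-hyponym (parent-child) pairs. Below is a minimal Python sketch of those two signals on toy data; the counts and the `asymmetry` heuristic are illustrative assumptions, not the authors' Perl implementation.

```python
from collections import Counter
from math import sqrt

# Toy co-occurrence counts per noun (in practice, harvested from a large corpus).
# All data here are illustrative, not the authors' settings.
cooc = {
    "perro":  Counter({"animal": 12, "ladrar": 8, "casa": 3}),
    "gato":   Counter({"animal": 10, "maullar": 7, "casa": 4}),
    "animal": Counter({"perro": 12, "gato": 10, "vivir": 20}),
}

def cosine(a, b):
    """Distributional similarity: candidate co-hyponyms score high."""
    shared = set(a) & set(b)
    num = sum(a[w] * b[w] for w in shared)
    den = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def asymmetry(child, parent):
    """Asymmetric co-occurrence: a hypernym tends to cover a larger share of its
    hyponym's contexts than the reverse (illustrative heuristic only)."""
    return cooc[parent][child] / (sum(cooc[child].values()) or 1)

print("co-hyponym score perro/gato:   ", round(cosine(cooc["perro"], cooc["gato"]), 3))
print("hypernym score animal -> perro:", round(asymmetry("perro", "animal"), 3))
```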
LanguageCrawl: A Generic Tool for Building Language Models Upon Common-Crawl
Title | LanguageCrawl: A Generic Tool for Building Language Models Upon Common-Crawl |
Authors | Szymon Roziewski, Wojciech Stokowiec |
Abstract | The web contains an immense amount of data: hundreds of billions of words are waiting to be extracted and used for language research. In this work we introduce our tool, LanguageCrawl, which allows NLP researchers to easily construct a web-scale corpus from the Common Crawl Archive: a petabyte-scale, open repository of web crawl data. Three use-cases are presented: filtering Polish websites, building an N-gram corpus, and training a continuous skip-gram language model with hierarchical softmax. Each of them has been implemented within the LanguageCrawl toolkit, with the possibility to adjust the specified language and N-gram ranks. Special effort has been put into high computing efficiency by applying highly concurrent multitasking. We make our tool publicly available to enrich NLP resources. We strongly believe that our work will help to facilitate NLP research, especially in under-resourced languages, where the lack of appropriately sized corpora is a serious hindrance to applying data-intensive methods, such as deep neural networks. |
Tasks | Language Modelling |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1443/ |
PWC | https://paperswithcode.com/paper/languagecrawl-a-generic-tool-for-building |
Repo | |
Framework | |
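As a rough illustration of the two downstream use-cases named in the abstract (an N-gram corpus and a continuous skip-gram model with hierarchical softmax), here is a hedged Python sketch assuming the crawl has already been filtered to one Polish sentence per line; the file name is hypothetical and gensim is an assumption, not necessarily LanguageCrawl's actual backend.

```python
from collections import Counter
from itertools import islice

from gensim.models import Word2Vec  # assumption: gensim >= 4.0 is available

def ngrams(tokens, n):
    """Yield n-grams as tuples from a token list."""
    return zip(*(islice(tokens, i, None) for i in range(n)))

# Hypothetical input: one whitespace-tokenised sentence per line, already filtered to Polish.
with open("polish_sentences.txt", encoding="utf-8") as fh:
    sentences = [line.split() for line in fh]

# Use-case 1: N-gram corpus (here, trigrams).
trigram_counts = Counter(g for s in sentences for g in ngrams(s, 3))
print(trigram_counts.most_common(10))

# Use-case 2: continuous skip-gram with hierarchical softmax
# (sg=1 selects skip-gram, hs=1 enables hierarchical softmax, negative=0 disables negative sampling).
model = Word2Vec(sentences, vector_size=100, window=5, sg=1, hs=1, negative=0, min_count=5)
model.save("skipgram_pl.model")
```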
Evaluating a Topic Modelling Approach to Measuring Corpus Similarity
Title | Evaluating a Topic Modelling Approach to Measuring Corpus Similarity |
Authors | Richard Fothergill, Paul Cook, Timothy Baldwin |
Abstract | Web corpora are often constructed automatically, and their contents are therefore often not well understood. One technique for assessing the composition of such a web corpus is to empirically measure its similarity to a reference corpus whose composition is known. In this paper we evaluate a number of measures of corpus similarity, including a method based on topic modelling which has not been previously evaluated for this task. To evaluate these methods we use known-similarity corpora that have been previously used for this purpose, as well as a number of newly-constructed known-similarity corpora targeting differences in genre, topic, time, and region. Our findings indicate that, overall, the topic modelling approach did not improve on a chi-square method that had previously been found to work well for measuring corpus similarity. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1042/ |
PWC | https://paperswithcode.com/paper/evaluating-a-topic-modelling-approach-to |
Repo | |
Framework | |
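The chi-square baseline mentioned in the abstract (in the spirit of Kilgarriff's frequency-profile comparison) can be sketched as below; the toy corpora and the top-n cut-off are illustrative assumptions, not the paper's exact experimental setup.

```python
from collections import Counter

def chi_square_similarity(tokens_a, tokens_b, top_n=500):
    """Kilgarriff-style chi-square comparison over the most frequent words
    of the combined corpora (illustrative; not the paper's exact setup)."""
    fa, fb = Counter(tokens_a), Counter(tokens_b)
    na, nb = sum(fa.values()), sum(fb.values())
    vocab = [w for w, _ in (fa + fb).most_common(top_n)]
    chi2 = 0.0
    for w in vocab:
        o_a, o_b = fa[w], fb[w]
        total = o_a + o_b
        e_a = total * na / (na + nb)   # expected count in corpus A
        e_b = total * nb / (na + nb)   # expected count in corpus B
        chi2 += (o_a - e_a) ** 2 / e_a + (o_b - e_b) ** 2 / e_b
    return chi2  # lower values indicate more similar corpora

corpus_a = "the cat sat on the mat the cat likes milk".split()
corpus_b = "the dog sat on the rug the dog likes bones".split()
print(chi_square_similarity(corpus_a, corpus_b, top_n=10))
```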
Detecting Expressions of Blame or Praise in Text
Title | Detecting Expressions of Blame or Praise in Text |
Authors | Udochukwu Orizu, Yulan He |
Abstract | The growth of social networking platforms has drawn a lot of attention to the need for social computing. Social computing utilises human insights for computational tasks as well as the design of systems that support social behaviours and interactions. One of the key aspects of social computing is the ability to attribute responsibility such as blame or praise to social events. This ability helps an intelligent entity account for and understand other intelligent entities' social behaviours, and enriches both the social functionalities and cognitive aspects of intelligent agents. In this paper, we present an approach with a model for blame and praise detection in text. We build our model on various theories of blame and include in it features used by humans when determining judgment, such as moral agent causality, foreknowledge, intentionality and coercion. An annotated corpus has been created for the task of blame and praise detection from text. The experimental results show that while our model gives results similar to supervised classifiers when classifying text as blame, praise or other, it outperforms supervised classifiers on the finer-grained classification of determining the direction of blame and praise, i.e., self-blame, blame-others, self-praise or praise-others, despite not using labelled training data. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1651/ |
PWC | https://paperswithcode.com/paper/detecting-expressions-of-blame-or-praise-in |
Repo | |
Framework | |
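Purely to illustrate the finer-grained direction labels mentioned above (self-blame, blame-others, self-praise, praise-others), here is a toy rule-based sketch; the cue-word and pronoun lists are hypothetical and far simpler than the paper's theory-driven features (moral agent causality, foreknowledge, intentionality, coercion).

```python
# Toy illustration of the direction step only; not the authors' model.
BLAME_CUES = {"blame", "fault", "responsible", "accuse"}
PRAISE_CUES = {"praise", "credit", "thank", "congratulate"}
FIRST_PERSON = {"i", "me", "my", "myself", "we", "our"}

def classify_direction(sentence):
    tokens = set(sentence.lower().split())
    polarity = ("blame" if BLAME_CUES & tokens
                else "praise" if PRAISE_CUES & tokens
                else "other")
    if polarity == "other":
        return "other"
    # Direction: first-person markers point the judgment at the speaker.
    if FIRST_PERSON & tokens:
        return f"self-{polarity}"
    return f"{polarity}-others"

print(classify_direction("I blame myself for the accident"))      # self-blame
print(classify_direction("They deserve praise for their work"))   # praise-others
```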
4Couv: A New Treebank for French
Title | 4Couv: A New Treebank for French |
Authors | Philippe Blache, Grégoire de Montcheuil, Laurent Prévot, Stéphane Rauzy |
Abstract | The question of the type of text used as primary data in treebanks is of certain importance. First, it has an influence at the discourse level: an article is not organized in the same way as a novel or a technical document. Moreover, it also has consequences in terms of semantic interpretation: some types of texts can be easier to interpret than others. We present in this paper a new type of treebank whose particularity is to answer the specific needs of experimental linguistics. It is made of short texts (book back covers) that present a strong coherence in their organization and can be rapidly interpreted. This type of text is adapted to short reading sessions, making it easy to acquire physiological data (e.g. eye movement, electroencephalography). Such a resource offers reliable data when looking for correlations between computational models and human language processing. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1245/ |
PWC | https://paperswithcode.com/paper/4couv-a-new-treebank-for-french |
Repo | |
Framework | |
A sense-based lexicon of count and mass expressions: The Bochum English Countability Lexicon
Title | A sense-based lexicon of count and mass expressions: The Bochum English Countability Lexicon |
Authors | Tibor Kiss, Francis Jeffry Pelletier, Halima Husic, Roman Nino Simunic, Johanna Marie Poppek |
Abstract | The present paper describes the current release of the Bochum English Countability Lexicon (BECL 2.1), a large empirical database consisting of lemmata from Open ANC (http://www.anc.org) with added senses from WordNet (Fellbaum 1998). BECL 2.1 contains ≈ 11,800 annotated noun-sense pairs, divided into four major countability classes and 18 fine-grained subclasses. In the current version, BECL also provides information on nouns whose senses occur in more than one class, allowing a closer look at polysemy and homonymy with regard to countability. Further included are sets of similar senses using the Leacock and Chodorow (LCH) score for semantic similarity (Leacock & Chodorow 1998), information on orthographic variation, on the completeness of all WordNet senses in the database, and an annotated representation of different types of proper names. The further development of BECL will investigate the different countability classes of proper names and the general relation between semantic similarity and countability, as well as recurring syntactic patterns for noun-sense pairs. The BECL 2.1 database is publicly available via http://count-and-mass.org. |
Tasks | Semantic Similarity, Semantic Textual Similarity |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1447/ |
PWC | https://paperswithcode.com/paper/a-sense-based-lexicon-of-count-and-mass |
Repo | |
Framework | |
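A small sketch of computing the Leacock-Chodorow (LCH) similarity used above for grouping similar senses, via NLTK's WordNet interface; the sense pairs chosen here are illustrative and unrelated to BECL's actual sense groupings.

```python
import nltk
from nltk.corpus import wordnet as wn

nltk.download("wordnet", quiet=True)  # one-time download of the WordNet data

# Illustrative noun senses (LCH similarity is defined for senses of the same POS).
water = wn.synset("water.n.01")    # typically a mass sense
bottle = wn.synset("bottle.n.01")  # typically a count sense
milk = wn.synset("milk.n.01")

print("water ~ milk:  ", round(water.lch_similarity(milk), 3))
print("water ~ bottle:", round(water.lch_similarity(bottle), 3))
```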
Posterior regularization for Joint Modeling of Multiple Structured Prediction Tasks with Soft Constraints
Title | Posterior regularization for Joint Modeling of Multiple Structured Prediction Tasks with Soft Constraints |
Authors | Kartik Goyal, Chris Dyer |
Abstract | |
Tasks | Multi-Task Learning, Named Entity Recognition, Part-Of-Speech Tagging, Structured Prediction |
Published | 2016-11-01 |
URL | https://www.aclweb.org/anthology/W16-5904/ |
PWC | https://paperswithcode.com/paper/posterior-regularization-for-joint-modeling |
Repo | |
Framework | |
Unanimous Prediction for 100% Precision with Application to Learning Semantic Mappings
Title | Unanimous Prediction for 100% Precision with Application to Learning Semantic Mappings |
Authors | Fereshte Khani, Martin Rinard, Percy Liang |
Abstract | |
Tasks | Question Answering, Semantic Parsing |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/P16-1090/ |
PWC | https://paperswithcode.com/paper/unanimous-prediction-for-100-precision-with-1 |
Repo | |
Framework | |
Falling silent, lost for words … Tracing personal involvement in interviews with Dutch war veterans
Title | Falling silent, lost for words … Tracing personal involvement in interviews with Dutch war veterans |
Authors | Henk van den Heuvel, Nelleke Oostdijk |
Abstract | In sources used in oral history research (such as interviews with eye witnesses), passages where the degree of personal emotional involvement is found to be high can be of particular interest, as these may give insight into how historical events were experienced, and what moral dilemmas and psychological or religious struggles were encountered. In a pilot study involving a large corpus of interview recordings with Dutch war veterans, we have investigated if it is possible to develop a method for automatically identifying those passages where the degree of personal emotional involvement is high. The method is based on the automatic detection of exceptionally large silences and filled pause segments (using Automatic Speech Recognition), and cues taken from specific n-grams. The first results appear to be encouraging enough for further elaboration of the method. |
Tasks | Speech Recognition |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1158/ |
PWC | https://paperswithcode.com/paper/falling-silent-lost-for-words-tracing |
Repo | |
Framework | |
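The detection of exceptionally long silences described in the abstract can be approximated with off-the-shelf audio tooling; the sketch below uses pydub as an assumption (the paper itself relies on ASR output and n-gram cues), and the file name and thresholds are hypothetical.

```python
from pydub import AudioSegment          # assumption: pydub is installed
from pydub.silence import detect_silence

# Hypothetical interview recording; thresholds are illustrative, not the paper's.
audio = AudioSegment.from_wav("veteran_interview_001.wav")

# Flag exceptionally long silences (here: >= 3 seconds, 16 dB below average loudness).
long_silences = detect_silence(
    audio,
    min_silence_len=3000,
    silence_thresh=audio.dBFS - 16,
)

for start_ms, end_ms in long_silences:
    print(f"candidate high-involvement passage near {start_ms/1000:.1f}s-{end_ms/1000:.1f}s")
```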
Bilbo-Val: Automatic Identification of Bibliographical Zone in Papers
Title | Bilbo-Val: Automatic Identification of Bibliographical Zone in Papers |
Authors | Amal Htait, Sebastien Fournier, Patrice Bellot |
Abstract | In this paper, we present the automatic annotation of the bibliographical references' zone in papers and articles in XML/TEI format. Our work proceeds in two phases: first, we use machine learning to classify bibliographical and non-bibliographical paragraphs in papers, by means of a model that was initially created to differentiate between footnotes containing or not containing bibliographical references. This is one of the features of BILBO, an open source software package for the automatic annotation of bibliographic references. We also suggest some methods to minimize the margin of error. Second, we propose an algorithm to find the largest list of bibliographical references in the article. The improvement applied to our model results in an increase in its efficiency, with an accuracy of 85.89. In testing our work, we achieve an average success rate of 72.23% in detecting the bibliographical references' zone. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1576/ |
PWC | https://paperswithcode.com/paper/bilbo-val-automatic-identification-of |
Repo | |
Framework | |
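To give a concrete flavour of the first phase described above (separating bibliographical from non-bibliographical paragraphs), here is a hedged scikit-learn sketch on toy data; BILBO's actual model is trained on annotated footnotes and is not reproduced here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data standing in for the "bibliographical vs. other paragraph" step.
paragraphs = [
    "Smith, J. (2010). On parsing. Journal of NLP, 12(3), 45-67.",
    "Doe, A. and Roe, B. Proceedings of LREC, 2014, pp. 101-110.",
    "In this section we discuss the motivation behind our approach.",
    "The experiments were carried out on a held-out test set.",
]
labels = ["biblio", "biblio", "other", "other"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(paragraphs, labels)

print(clf.predict(["Brown, P. et al. (1993). The mathematics of MT. CL, 19(2)."]))
```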
LORELEI Language Packs: Data, Tools, and Resources for Technology Development in Low Resource Languages
Title | LORELEI Language Packs: Data, Tools, and Resources for Technology Development in Low Resource Languages |
Authors | Stephanie Strassel, Jennifer Tracey |
Abstract | In this paper, we describe the textual linguistic resources in nearly 3 dozen languages being produced by Linguistic Data Consortium for DARPA's LORELEI (Low Resource Languages for Emergent Incidents) Program. The goal of LORELEI is to improve the performance of human language technologies for low-resource languages and enable rapid re-training of such technologies for new languages, with a focus on the use case of deployment of resources in sudden emergencies such as natural disasters. Representative languages have been selected to provide broad typological coverage for training, and surprise incident languages for testing will be selected over the course of the program. Our approach treats the full set of language packs as a coherent whole, maintaining LORELEI-wide specifications, tagsets, and guidelines, while allowing for adaptation to the specific needs created by each language. Each representative language corpus, therefore, both stands on its own as a resource for the specific language and forms part of a large multilingual resource for broader cross-language technology development. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1521/ |
PWC | https://paperswithcode.com/paper/lorelei-language-packs-data-tools-and |
Repo | |
Framework | |
Analysis of Anxious Word Usage on Online Health Forums
Title | Analysis of Anxious Word Usage on Online Health Forums |
Authors | Nicolas Rey-Villamizar, Prasha Shrestha, Farig Sadeque, Steven Bethard, Ted Pedersen, Arjun Mukherjee, Thamar Solorio |
Abstract | |
Tasks | |
Published | 2016-11-01 |
URL | https://www.aclweb.org/anthology/W16-6105/ |
PWC | https://paperswithcode.com/paper/analysis-of-anxious-word-usage-on-online |
Repo | |
Framework | |
Proceedings of The Fourth International Workshop on Natural Language Processing for Social Media
Title | Proceedings of The Fourth International Workshop on Natural Language Processing for Social Media |
Authors | |
Abstract | |
Tasks | |
Published | 2016-11-01 |
URL | https://www.aclweb.org/anthology/W16-6200/ |
PWC | https://paperswithcode.com/paper/proceedings-of-the-fourth-international-1 |
Repo | |
Framework | |
Improving Twitter Community Detection through Contextual Sentiment Analysis
Title | Improving Twitter Community Detection through Contextual Sentiment Analysis |
Authors | Alron Jan Lam |
Abstract | |
Tasks | Community Detection, Sentiment Analysis |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/P16-3005/ |
PWC | https://paperswithcode.com/paper/improving-twitter-community-detection-through |
Repo | |
Framework | |