May 5, 2019

2135 words 11 mins read

Paper Group NANR 83

LiMoSINe Pipeline: Multilingual UIMA-based NLP Platform. A Taxonomy of Spanish Nouns, a Statistical Algorithm to Generate it and its Implementation in Open Source Code. LanguageCrawl: A Generic Tool for Building Language Models Upon Common-Crawl. Evaluating a Topic Modelling Approach to Measuring Corpus Similarity. Detecting Expressions of Blame or …

LiMoSINe Pipeline: Multilingual UIMA-based NLP Platform

Title LiMoSINe Pipeline: Multilingual UIMA-based NLP Platform
Authors Olga Uryupina, Barbara Plank, Gianni Barlacchi, Francisco J. Valverde Albacete, Manos Tsagkias, Antonio Uva, Alessandro Moschitti
Abstract
Tasks
Published 2016-08-01
URL https://www.aclweb.org/anthology/P16-4027/
PDF https://www.aclweb.org/anthology/P16-4027
PWC https://paperswithcode.com/paper/limosine-pipeline-multilingual-uima-based-nlp
Repo
Framework

A Taxonomy of Spanish Nouns, a Statistical Algorithm to Generate it and its Implementation in Open Source Code

Title A Taxonomy of Spanish Nouns, a Statistical Algorithm to Generate it and its Implementation in Open Source Code
Authors Rogelio Nazar, Irene Renau
Abstract In this paper we describe our work in progress on the automatic development of a taxonomy of Spanish nouns, we offer the Perl implementation we have so far, and we discuss the different problems that still need to be addressed. We designed a statistically-based taxonomy induction algorithm consisting of a combination of different strategies not involving explicit linguistic knowledge. Although all quantitative, the strategies we present are of different natures. Some of them are based on the computation of distributional similarity coefficients which identify pairs of sibling words or co-hyponyms, while others are based on asymmetric co-occurrence and identify pairs of parent-child words or hypernym-hyponym relations. A decision-making process is then applied to combine the results of the previous steps and finally connect lexical units to a basic structure containing the most general categories of the language. We evaluate the quality of the taxonomy both manually and against Spanish WordNet as a gold standard. We estimate an average of 89.07% precision and 25.49% recall considering only the results which the algorithm presents with a high degree of certainty, or 77.86% precision and 33.72% recall considering all results.
Tasks Decision Making
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1236/
PDF https://www.aclweb.org/anthology/L16-1236
PWC https://paperswithcode.com/paper/a-taxonomy-of-spanish-nouns-a-statistical
Repo
Framework
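
The abstract describes two kinds of quantitative signals (distributional similarity for co-hyponyms, asymmetric co-occurrence for hypernym-hyponym pairs) but gives no code; the paper's own implementation is in Perl. Below is a minimal, illustrative Python sketch of those two signals over made-up co-occurrence counts. It is not the authors' algorithm, the nouns and counts are invented, and the decision-making step that combines such scores is omitted.

```python
# Minimal sketch (not the authors' Perl code): two of the quantitative signals
# described in the abstract, computed over toy co-occurrence counts.
from collections import Counter
from math import sqrt

# Hypothetical context counts for a few Spanish nouns (noun -> context word -> count).
contexts = {
    "perro":  Counter({"ladrar": 8, "animal": 5, "correr": 3}),
    "gato":   Counter({"maullar": 7, "animal": 6, "correr": 2}),
    "animal": Counter({"perro": 5, "gato": 6, "vivir": 9}),
}

def cosine(a, b):
    """Distributional similarity: high values suggest co-hyponyms (siblings)."""
    shared = set(a) & set(b)
    num = sum(a[w] * b[w] for w in shared)
    den = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def asymmetry(child, parent):
    """Asymmetric co-occurrence: a hypernym tends to occur among the child's
    contexts more than the child occurs among the hypernym's contexts."""
    p_given_c = contexts[child][parent] / sum(contexts[child].values())
    c_given_p = contexts[parent][child] / sum(contexts[parent].values())
    return p_given_c - c_given_p

print(cosine(contexts["perro"], contexts["gato"]))  # sibling signal
print(asymmetry("perro", "animal"))                 # parent-child signal
```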

LanguageCrawl: A Generic Tool for Building Language Models Upon Common-Crawl

Title LanguageCrawl: A Generic Tool for Building Language Models Upon Common-Crawl
Authors Szymon Roziewski, Wojciech Stokowiec
Abstract The web contains an immense amount of data; hundreds of billions of words are waiting to be extracted and used for language research. In this work we introduce our tool LanguageCrawl, which allows NLP researchers to easily construct a web-scale corpus from the Common Crawl Archive: a petabyte-scale, open repository of web crawl information. Three use cases are presented: filtering Polish websites, building N-gram corpora and training a continuous skip-gram language model with hierarchical softmax. Each of them has been implemented within the LanguageCrawl toolkit, with the possibility to adjust the specified language and N-gram ranks. Special effort has been put into high computing efficiency by applying highly concurrent multitasking. We make our tool publicly available to enrich NLP resources. We strongly believe that our work will help to facilitate NLP research, especially in under-resourced languages, where the lack of appropriately sized corpora is a serious hindrance to applying data-intensive methods such as deep neural networks.
Tasks Language Modelling
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1443/
PDF https://www.aclweb.org/anthology/L16-1443
PWC https://paperswithcode.com/paper/languagecrawl-a-generic-tool-for-building
Repo
Framework
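
Two of the three use cases named in the abstract (N-gram counting and skip-gram training with hierarchical softmax) can be illustrated with off-the-shelf tools. The sketch below uses gensim's Word2Vec as a stand-in and a toy token list in place of LanguageCrawl's Common Crawl output; it is an assumption-laden illustration, not the toolkit itself, and it assumes gensim >= 4.0.

```python
# Minimal sketch of the two modelling steps named in the abstract, applied to
# an already-filtered text sample (the Common Crawl filtering step is omitted).
from collections import Counter
from gensim.models import Word2Vec

# Stand-in corpus: a few tokenised Polish sentences, not LanguageCrawl output.
sentences = [
    ["ala", "ma", "kota"],
    ["kot", "ma", "ale"],
]

# N-gram counts (bigrams here) over the tokenised corpus.
n = 2
ngrams = Counter(
    tuple(sent[i:i + n]) for sent in sentences for i in range(len(sent) - n + 1)
)

# Continuous skip-gram (sg=1) with hierarchical softmax (hs=1, negative=0),
# as in the abstract's third use case.
model = Word2Vec(sentences, vector_size=50, sg=1, hs=1, negative=0, min_count=1)

print(ngrams.most_common(3))
print(model.wv["kota"][:5])
```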

Evaluating a Topic Modelling Approach to Measuring Corpus Similarity

Title Evaluating a Topic Modelling Approach to Measuring Corpus Similarity
Authors Richard Fothergill, Paul Cook, Timothy Baldwin
Abstract Web corpora are often constructed automatically, and their contents are therefore often not well understood. One technique for assessing the composition of such a web corpus is to empirically measure its similarity to a reference corpus whose composition is known. In this paper we evaluate a number of measures of corpus similarity, including a method based on topic modelling which has not been previously evaluated for this task. To evaluate these methods we use known-similarity corpora that have been previously used for this purpose, as well as a number of newly-constructed known-similarity corpora targeting differences in genre, topic, time, and region. Our findings indicate that, overall, the topic modelling approach did not improve on a chi-square method that had previously been found to work well for measuring corpus similarity.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1042/
PDF https://www.aclweb.org/anthology/L16-1042
PWC https://paperswithcode.com/paper/evaluating-a-topic-modelling-approach-to
Repo
Framework
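
For readers unfamiliar with the chi-square baseline the abstract refers to, the sketch below shows one common (Kilgarriff-style) formulation over word-frequency counts. It is an illustrative approximation, not the exact measure or the topic-modelling method evaluated in the paper.

```python
# Hedged sketch of a chi-square corpus-similarity measure over word frequencies.
from collections import Counter

def chi_square_similarity(corpus_a, corpus_b, top_k=500):
    """Lower values indicate more similar corpora."""
    freq_a, freq_b = Counter(corpus_a), Counter(corpus_b)
    n_a, n_b = sum(freq_a.values()), sum(freq_b.values())
    # Compare the most frequent words of the combined corpora.
    common = [w for w, _ in (freq_a + freq_b).most_common(top_k)]
    chi2 = 0.0
    for w in common:
        o_a, o_b = freq_a[w], freq_b[w]
        e_a = (o_a + o_b) * n_a / (n_a + n_b)  # expected count in corpus A
        e_b = (o_a + o_b) * n_b / (n_a + n_b)  # expected count in corpus B
        chi2 += (o_a - e_a) ** 2 / e_a + (o_b - e_b) ** 2 / e_b
    return chi2

# Toy usage with token lists standing in for the known-similarity corpora.
print(chi_square_similarity("a b b c c c".split(), "a a b c".split()))
```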

Detecting Expressions of Blame or Praise in Text

Title Detecting Expressions of Blame or Praise in Text
Authors Udochukwu Orizu, Yulan He
Abstract The growth of social networking platforms has drawn a lot of attention to the need for social computing. Social computing utilises human insights for computational tasks as well as the design of systems that support social behaviours and interactions. One of the key aspects of social computing is the ability to attribute responsibility such as blame or praise to social events. This ability helps an intelligent entity account for and understand other intelligent entities' social behaviours, and enriches both the social functionalities and cognitive aspects of intelligent agents. In this paper, we present an approach with a model for blame and praise detection in text. We build our model on various theories of blame and include in it features used by humans when determining judgement, such as moral agent causality, foreknowledge, intentionality and coercion. An annotated corpus has been created for the task of blame and praise detection from text. The experimental results show that while our model gives results similar to supervised classifiers when classifying text as blame, praise or other, it outperforms supervised classifiers on the finer-grained classification of determining the direction of blame and praise, i.e., self-blame, blame-others, self-praise or praise-others, despite not using labelled training data.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1651/
PDF https://www.aclweb.org/anthology/L16-1651
PWC https://paperswithcode.com/paper/detecting-expressions-of-blame-or-praise-in
Repo
Framework
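
The abstract lists the judgement features the model draws on (moral agent causality, foreknowledge, intentionality, coercion) but not how they are combined. The sketch below is a purely hypothetical rule-based combination of such features, included only to make the feature set concrete; it does not reproduce the authors' model or its outputs.

```python
# Illustrative only: one possible rule-based combination of blame-theory
# features into a coarse blame/praise decision. Not the authors' model.
from dataclasses import dataclass

@dataclass
class EventFeatures:
    agent_caused_outcome: bool   # moral agent causality
    outcome_negative: bool       # valence of the outcome
    had_foreknowledge: bool
    intentional: bool
    coerced: bool
    agent_is_speaker: bool       # distinguishes self- vs. other-directed judgement

def judge(f: EventFeatures) -> str:
    if not f.agent_caused_outcome or f.coerced:
        return "other"
    direction = "self" if f.agent_is_speaker else "others"
    if f.outcome_negative and (f.intentional or f.had_foreknowledge):
        return f"blame-{direction}"
    if not f.outcome_negative and f.intentional:
        return f"praise-{direction}"
    return "other"

print(judge(EventFeatures(True, True, True, True, False, False)))  # blame-others
```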

4Couv: A New Treebank for French

Title 4Couv: A New Treebank for French
Authors Philippe Blache, Grégoire de Montcheuil, Laurent Prévot, Stéphane Rauzy
Abstract The question of the type of text used as primary data in treebanks is an important one. First, it has an influence at the discourse level: an article is not organized in the same way as a novel or a technical document. Moreover, it also has consequences in terms of semantic interpretation: some types of texts can be easier to interpret than others. We present in this paper a new type of treebank whose particularity is to answer the specific needs of experimental linguistics. It is made of short texts (book back covers) that present a strong coherence in their organization and can be rapidly interpreted. This type of text is adapted to short reading sessions, making it easy to acquire physiological data (e.g. eye movement, electroencephalography). Such a resource offers reliable data when looking for correlations between computational models and human language processing.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1245/
PDF https://www.aclweb.org/anthology/L16-1245
PWC https://paperswithcode.com/paper/4couv-a-new-treebank-for-french
Repo
Framework

A sense-based lexicon of count and mass expressions: The Bochum English Countability Lexicon

Title A sense-based lexicon of count and mass expressions: The Bochum English Countability Lexicon
Authors Tibor Kiss, Francis Jeffry Pelletier, Halima Husic, Roman Nino Simunic, Johanna Marie Poppek
Abstract The present paper describes the current release of the Bochum English Countability Lexicon (BECL 2.1), a large empirical database consisting of lemmata from Open ANC (http://www.anc.org) with added senses from WordNet (Fellbaum 1998). BECL 2.1 contains approximately 11,800 annotated noun-sense pairs, divided into four major countability classes and 18 fine-grained subclasses. In the current version, BECL also provides information on nouns whose senses occur in more than one class, allowing a closer look at polysemy and homonymy with regard to countability. Further included are sets of similar senses using the Leacock and Chodorow (LCH) score for semantic similarity (Leacock & Chodorow 1998), information on orthographic variation and on the completeness of all WordNet senses in the database, and an annotated representation of different types of proper names. The further development of BECL will investigate the different countability classes of proper names and the general relation between semantic similarity and countability, as well as recurring syntactic patterns for noun-sense pairs. The BECL 2.1 database is also publicly available via http://count-and-mass.org.
Tasks Semantic Similarity, Semantic Textual Similarity
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1447/
PDF https://www.aclweb.org/anthology/L16-1447
PWC https://paperswithcode.com/paper/a-sense-based-lexicon-of-count-and-mass
Repo
Framework
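
The LCH score mentioned in the abstract is available off the shelf in NLTK's WordNet interface. The snippet below shows how such a similarity can be computed between two noun senses; the sense pair is chosen arbitrarily here, and BECL's own grouping of similar senses is not reproduced.

```python
# Sketch of the Leacock-Chodorow (LCH) similarity via NLTK's WordNet interface.
# Requires the WordNet data: nltk.download("wordnet") (and possibly "omw-1.4").
from nltk.corpus import wordnet as wn

water = wn.synset("water.n.01")    # a typically mass sense
beer = wn.synset("beer.n.01")      # a sense that is often countable ("two beers")
print(water.lch_similarity(beer))  # higher score = closer in the WordNet hierarchy
```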

Posterior regularization for Joint Modeling of Multiple Structured Prediction Tasks with Soft Constraints

Title Posterior regularization for Joint Modeling of Multiple Structured Prediction Tasks with Soft Constraints
Authors Kartik Goyal, Chris Dyer
Abstract
Tasks Multi-Task Learning, Named Entity Recognition, Part-Of-Speech Tagging, Structured Prediction
Published 2016-11-01
URL https://www.aclweb.org/anthology/W16-5904/
PDF https://www.aclweb.org/anthology/W16-5904
PWC https://paperswithcode.com/paper/posterior-regularization-for-joint-modeling
Repo
Framework

Unanimous Prediction for 100% Precision with Application to Learning Semantic Mappings

Title Unanimous Prediction for 100% Precision with Application to Learning Semantic Mappings
Authors Fereshte Khani, Martin Rinard, Percy Liang
Abstract
Tasks Question Answering, Semantic Parsing
Published 2016-08-01
URL https://www.aclweb.org/anthology/P16-1090/
PDF https://www.aclweb.org/anthology/P16-1090
PWC https://paperswithcode.com/paper/unanimous-prediction-for-100-precision-with-1
Repo
Framework

Falling silent, lost for words … Tracing personal involvement in interviews with Dutch war veterans

Title Falling silent, lost for words … Tracing personal involvement in interviews with Dutch war veterans
Authors Henk van den Heuvel, Nelleke Oostdijk
Abstract In sources used in oral history research (such as interviews with eye witnesses), passages where the degree of personal emotional involvement is found to be high can be of particular interest, as these may give insight into how historical events were experienced, and what moral dilemmas and psychological or religious struggles were encountered. In a pilot study involving a large corpus of interview recordings with Dutch war veterans, we have investigated if it is possible to develop a method for automatically identifying those passages where the degree of personal emotional involvement is high. The method is based on the automatic detection of exceptionally large silences and filled pause segments (using Automatic Speech Recognition), and cues taken from specific n-grams. The first results appear to be encouraging enough for further elaboration of the method.
Tasks Speech Recognition
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1158/
PDF https://www.aclweb.org/anthology/L16-1158
PWC https://paperswithcode.com/paper/falling-silent-lost-for-words-tracing
Repo
Framework
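
The method rests on detecting exceptionally large silences and filled-pause segments in ASR output. The sketch below illustrates that idea over a hypothetical list of word timings; the token format, threshold and filled-pause inventory are assumptions, not the authors' actual pipeline.

```python
# Hedged sketch: flag cues of personal involvement (long silences, filled
# pauses) in hypothetical ASR output with word timings.
# Each token is (word, start_sec, end_sec); filled pauses appear as "uh"/"uhm".
tokens = [
    ("en", 0.0, 0.3), ("toen", 0.4, 0.8), ("uhm", 0.9, 2.1),
    ("toen", 6.5, 6.9), ("zagen", 7.0, 7.4), ("we", 7.5, 7.6),
]

SILENCE_THRESHOLD = 2.0            # seconds treated as an "exceptionally large" silence
FILLED_PAUSES = {"uh", "uhm", "eh"}

def involvement_cues(tokens):
    cues = []
    # Silences: gaps between the end of one word and the start of the next.
    for (_, _, end), (_, start, _) in zip(tokens, tokens[1:]):
        if start - end >= SILENCE_THRESHOLD:
            cues.append(("silence", end, start))
    # Filled pauses: tokens drawn from a small assumed inventory.
    cues += [("filled_pause", s, e) for w, s, e in tokens if w.lower() in FILLED_PAUSES]
    return cues

print(involvement_cues(tokens))
```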

Bilbo-Val: Automatic Identification of Bibliographical Zone in Papers

Title Bilbo-Val: Automatic Identification of Bibliographical Zone in Papers
Authors Amal Htait, Sebastien Fournier, Patrice Bellot
Abstract In this paper, we present the automatic annotation of the bibliographical references' zone in papers and articles in XML/TEI format. Our work proceeds in two phases. First, we use machine learning to classify bibliographical and non-bibliographical paragraphs in papers, by means of a model that was initially created to differentiate between footnotes that do and do not contain bibliographical references. This capability is one of the features of BILBO, an open-source software package for the automatic annotation of bibliographic references. We also suggest some methods to minimize the margin of error. Second, we propose an algorithm to find the largest list of bibliographical references in the article. The improvement applied to our model results in an increase in its efficiency, with an accuracy of 85.89. In testing our work, we achieve an average success rate of 72.23% in detecting the bibliographical references' zone.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1576/
PDF https://www.aclweb.org/anthology/L16-1576
PWC https://paperswithcode.com/paper/bilbo-val-automatic-identification-of
Repo
Framework
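
The second phase described in the abstract, finding the largest list of bibliographical references once paragraphs have been classified, can be pictured as selecting the longest contiguous run of positively labelled paragraphs. The sketch below shows that simplified reading; the labels are invented, and the authors' actual algorithm may differ.

```python
# Hedged sketch: pick the longest contiguous run of paragraphs labelled
# bibliographical (1) by a classifier as the candidate reference zone.
def longest_bib_zone(labels):
    best, start = (0, 0), None
    for i, lab in enumerate(labels + [0]):        # sentinel closes a final run
        if lab and start is None:
            start = i
        elif not lab and start is not None:
            if i - start > best[1] - best[0]:
                best = (start, i)
            start = None
    return best                                   # half-open [start, end) indices

labels = [0, 0, 1, 0, 1, 1, 1, 1, 0]              # made-up paragraph labels
print(longest_bib_zone(labels))                   # (4, 8)
```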

LORELEI Language Packs: Data, Tools, and Resources for Technology Development in Low Resource Languages

Title LORELEI Language Packs: Data, Tools, and Resources for Technology Development in Low Resource Languages
Authors Stephanie Strassel, Jennifer Tracey
Abstract In this paper, we describe the textual linguistic resources in nearly 3 dozen languages being produced by the Linguistic Data Consortium for DARPA's LORELEI (Low Resource Languages for Emergent Incidents) Program. The goal of LORELEI is to improve the performance of human language technologies for low-resource languages and enable rapid re-training of such technologies for new languages, with a focus on the use case of deployment of resources in sudden emergencies such as natural disasters. Representative languages have been selected to provide broad typological coverage for training, and surprise incident languages for testing will be selected over the course of the program. Our approach treats the full set of language packs as a coherent whole, maintaining LORELEI-wide specifications, tagsets, and guidelines, while allowing for adaptation to the specific needs created by each language. Each representative language corpus, therefore, both stands on its own as a resource for the specific language and forms part of a large multilingual resource for broader cross-language technology development.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1521/
PDF https://www.aclweb.org/anthology/L16-1521
PWC https://paperswithcode.com/paper/lorelei-language-packs-data-tools-and
Repo
Framework

Analysis of Anxious Word Usage on Online Health Forums

Title Analysis of Anxious Word Usage on Online Health Forums
Authors Nicolas Rey-Villamizar, Prasha Shrestha, Farig Sadeque, Steven Bethard, Ted Pedersen, Arjun Mukherjee, Thamar Solorio
Abstract
Tasks
Published 2016-11-01
URL https://www.aclweb.org/anthology/W16-6105/
PDF https://www.aclweb.org/anthology/W16-6105
PWC https://paperswithcode.com/paper/analysis-of-anxious-word-usage-on-online
Repo
Framework

Proceedings of The Fourth International Workshop on Natural Language Processing for Social Media

Title Proceedings of The Fourth International Workshop on Natural Language Processing for Social Media
Authors
Abstract
Tasks
Published 2016-11-01
URL https://www.aclweb.org/anthology/W16-6200/
PDF https://www.aclweb.org/anthology/W16-6200
PWC https://paperswithcode.com/paper/proceedings-of-the-fourth-international-1
Repo
Framework

Improving Twitter Community Detection through Contextual Sentiment Analysis

Title Improving Twitter Community Detection through Contextual Sentiment Analysis
Authors Alron Jan Lam
Abstract
Tasks Community Detection, Sentiment Analysis
Published 2016-08-01
URL https://www.aclweb.org/anthology/P16-3005/
PDF https://www.aclweb.org/anthology/P16-3005
PWC https://paperswithcode.com/paper/improving-twitter-community-detection-through
Repo
Framework