May 5, 2019

Paper Group NANR 56

Language Muse: Automated Linguistic Activity Generation for English Language Learners

Title Language Muse: Automated Linguistic Activity Generation for English Language Learners
Authors Nitin Madnani, Jill Burstein, John Sabatini, Kietha Biggers, Slava Andreyev
Abstract
Tasks Question Generation
Published 2016-08-01
URL https://www.aclweb.org/anthology/P16-4014/
PDF https://www.aclweb.org/anthology/P16-4014
PWC https://paperswithcode.com/paper/language-muse-automated-linguistic-activity
Repo
Framework

Pronoun Prediction with Linguistic Features and Example Weighing

Title Pronoun Prediction with Linguistic Features and Example Weighing
Authors Michal Novák
Abstract
Tasks Language Modelling, Machine Translation, Word Alignment
Published 2016-08-01
URL https://www.aclweb.org/anthology/W16-2354/
PDF https://www.aclweb.org/anthology/W16-2354
PWC https://paperswithcode.com/paper/pronoun-prediction-with-linguistic-features
Repo
Framework

USAAR: An Operation Sequential Model for Automatic Statistical Post-Editing

Title USAAR: An Operation Sequential Model for Automatic Statistical Post-Editing
Authors Santanu Pal, Marcos Zampieri, Josef van Genabith
Abstract
Tasks Automatic Post-Editing, Machine Translation, Word Alignment
Published 2016-08-01
URL https://www.aclweb.org/anthology/W16-2379/
PDF https://www.aclweb.org/anthology/W16-2379
PWC https://paperswithcode.com/paper/usaar-an-operation-sequential-model-for
Repo
Framework

Bilingual Embeddings and Word Alignments for Translation Quality Estimation

Title Bilingual Embeddings and Word Alignments for Translation Quality Estimation
Authors Amal Abdelsalam, Ondřej Bojar, Samhaa El-Beltagy
Abstract
Tasks Machine Translation, Word Alignment, Word Embeddings
Published 2016-08-01
URL https://www.aclweb.org/anthology/W16-2380/
PDF https://www.aclweb.org/anthology/W16-2380
PWC https://paperswithcode.com/paper/bilingual-embeddings-and-word-alignments-for
Repo
Framework

DBpedia Abstracts: A Large-Scale, Open, Multilingual NLP Training Corpus

Title DBpedia Abstracts: A Large-Scale, Open, Multilingual NLP Training Corpus
Authors Martin Brümmer, Milan Dojchinovski, Sebastian Hellmann
Abstract The ever increasing importance of machine learning in Natural Language Processing is accompanied by an equally increasing need for large-scale training and evaluation corpora. Due to its size, its openness and relative quality, Wikipedia has already been a source of such data, but on a limited scale. This paper introduces the DBpedia Abstract Corpus, a large-scale, open corpus of annotated Wikipedia texts in six languages, featuring over 11 million texts and over 97 million entity links. The properties of the Wikipedia texts are described, as well as the corpus creation process, its format and interesting use-cases, like Named Entity Linking training and evaluation.
Tasks Entity Linking
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1532/
PDF https://www.aclweb.org/anthology/L16-1532
PWC https://paperswithcode.com/paper/dbpedia-abstracts-a-large-scale-open
Repo
Framework

A Dataset for Detecting Stance in Tweets

Title A Dataset for Detecting Stance in Tweets
Authors Saif Mohammad, Svetlana Kiritchenko, Parinaz Sobhani, Xiaodan Zhu, Colin Cherry
Abstract We can often detect from a person's utterances whether he/she is in favor of or against a given target entity (a product, topic, another person, etc.). Here for the first time we present a dataset of tweets annotated for whether the tweeter is in favor of or against pre-chosen targets of interest, that is, their stance. The targets of interest may or may not be referred to in the tweets, and they may or may not be the target of opinion in the tweets. The data pertains to six targets of interest commonly known and debated in the United States. Apart from stance, the tweets are also annotated for whether the target of interest is the target of opinion in the tweet. The annotations were performed by crowdsourcing. Several techniques were employed to encourage high-quality annotations (for example, providing clear and simple instructions) and to identify and discard poor annotations (for example, using a small set of check questions annotated by the authors). This Stance Dataset, which was subsequently also annotated for sentiment, can be used to better understand the relationship between stance, sentiment, entity relationships, and textual inference.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1623/
PDF https://www.aclweb.org/anthology/L16-1623
PWC https://paperswithcode.com/paper/a-dataset-for-detecting-stance-in-tweets
Repo
Framework

Named Entity Resources - Overview and Outlook

Title Named Entity Resources - Overview and Outlook
Authors Maud Ehrmann, Damien Nouvel, Sophie Rosset
Abstract Recognition of real-world entities is crucial for most NLP applications. Since its introduction some twenty years ago, named entity processing has undergone a significant evolution with, among others, the definition of new tasks (e.g. entity linking) and the emergence of new types of data (e.g. speech transcriptions, micro-blogging). These certainly pose new challenges, which affect not only methods and algorithms but especially linguistic resources. Where do we stand with respect to named entity resources? This paper aims to provide a systematic overview of named entity resources, accounting for qualities such as multilingualism, dynamicity and interoperability, and to identify shortfalls in order to guide future developments.
Tasks Entity Linking
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1534/
PDF https://www.aclweb.org/anthology/L16-1534
PWC https://paperswithcode.com/paper/named-entity-resources-overview-and-outlook
Repo
Framework

Incorporating Lexico-semantic Heuristics into Coreference Resolution Sieves for Named Entity Recognition at Document-level

Title Incorporating Lexico-semantic Heuristics into Coreference Resolution Sieves for Named Entity Recognition at Document-level
Authors Marcos Garcia
Abstract This paper explores the incorporation of lexico-semantic heuristics into a deterministic Coreference Resolution (CR) system for classifying named entities at document-level. The most precise sieves of a CR tool are enriched with both a set of heuristics for merging named entities labeled with different classes and also with some constraints that avoid the incorrect merging of similar mentions. Several tests show that this strategy improves both NER labeling and CR. The CR tool can be applied in combination with any system for named entity recognition using the CoNLL format, and brings benefits to text analytics tasks such as Information Extraction. Experiments were carried out in Spanish, using three different NER tools.
Tasks Coreference Resolution, Named Entity Recognition
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1535/
PDF https://www.aclweb.org/anthology/L16-1535
PWC https://paperswithcode.com/paper/incorporating-lexico-semantic-heuristics-into
Repo
Framework

Inter-document Contextual Language model

Title Inter-document Contextual Language model
Authors Quan Hung Tran, Ingrid Zukerman, Gholamreza Haffari
Abstract
Tasks Language Modelling
Published 2016-06-01
URL https://www.aclweb.org/anthology/N16-1090/
PDF https://www.aclweb.org/anthology/N16-1090
PWC https://paperswithcode.com/paper/inter-document-contextual-language-model
Repo
Framework

Temporal Information Annotation: Crowd vs. Experts

Title Temporal Information Annotation: Crowd vs. Experts
Authors Tommaso Caselli, Rachele Sprugnoli, Oana Inel
Abstract This paper describes two sets of crowdsourcing experiments on temporal information annotation conducted on two languages, i.e., English and Italian. The first experiment, launched on the CrowdFlower platform, was aimed at classifying temporal relations given target entities. The second one, relying on the CrowdTruth metric, consisted in two subtasks: one devoted to the recognition of events and temporal expressions and one to the detection and classification of temporal relations. The outcomes of the experiments suggest a valuable use of crowdsourcing annotations also for a complex task like Temporal Processing.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1557/
PDF https://www.aclweb.org/anthology/L16-1557
PWC https://paperswithcode.com/paper/temporal-information-annotation-crowd-vs
Repo
Framework

Using Word Embeddings to Translate Named Entities

Title Using Word Embeddings to Translate Named Entities
Authors Octavia-Maria Şulea, Sergiu Nisioi, Liviu P. Dinu
Abstract In this paper we investigate the usefulness of neural word embeddings in the process of translating Named Entities (NEs) from a resource-rich language to a language low on resources relevant to the task at hand, introducing a novel, yet simple way of obtaining bilingual word vectors. Inspired by observations in (Mikolov et al., 2013b), which show that training their word vector model on comparable corpora yields comparable vector space representations of those corpora, reducing the problem of translating words to finding a rotation matrix, and results in (Zou et al., 2013), which showed that bilingual word embeddings can improve Chinese Named Entity Recognition (NER) and English to Chinese phrase translation, we use the sentence-aligned English-French EuroParl corpora and show that word embeddings extracted from a merged corpus (a corpus resulting from the merger of the two aligned corpora) can be used for NE translation. We extrapolate that word embeddings trained on merged parallel corpora are useful in Named Entity Recognition and Translation tasks for resource-poor languages.
Tasks Chinese Named Entity Recognition, Named Entity Recognition, Word Embeddings
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1536/
PDF https://www.aclweb.org/anthology/L16-1536
PWC https://paperswithcode.com/paper/using-word-embeddings-to-translate-named
Repo
Framework
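
DISAANA and D-SUMM: Large-scale Real Time NLP Systems for Analyzing Disaster Related Reports in Tweets

Title DISAANA and D-SUMM: Large-scale Real Time NLP Systems for Analyzing Disaster Related Reports in Tweets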
Authors Kentaro Torisawa
Abstract This talk presents two NLP systems that were developed for helping disaster victims and rescue workers in the aftermath of large-scale disasters. DISAANA provides answers to questions such as "What is in short supply in Tokyo?" and displays locations related to each answer on a map. D-SUMM automatically summarizes a large number of disaster related reports concerning a specified area and helps rescue workers to understand disaster situations from a macro perspective. Both systems are publicly available as Web services. In the aftermath of the 2016 Kumamoto Earthquake (M7.0), the Japanese government actually used DISAANA to analyze the situation.
Tasks
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-3903/
PDF https://www.aclweb.org/anthology/W16-3903
PWC https://paperswithcode.com/paper/disaana-and-d-summ-large-scale-real-time-nlp
Repo
Framework

The GW/LT3 VarDial 2016 Shared Task System for Dialects and Similar Languages Detection

Title The GW/LT3 VarDial 2016 Shared Task System for Dialects and Similar Languages Detection
Authors Ayah Zirikly, Bart Desmet, Mona Diab
Abstract This paper describes the GW/LT3 contribution to the 2016 VarDial shared task on the identification of similar languages (task 1) and Arabic dialects (task 2). For both tasks, we experimented with Logistic Regression and Neural Network classifiers in isolation. Additionally, we implemented a cascaded classifier that consists of coarse and fine-grained classifiers (task 1) and a classifier ensemble with majority voting for task 2. The submitted systems obtained state-of-the-art performance and ranked first for the evaluation on social media data (test sets B1 and B2 for task 1), with a maximum weighted F1 score of 91.94%.
Tasks Feature Engineering
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-4804/
PDF https://www.aclweb.org/anthology/W16-4804
PWC https://paperswithcode.com/paper/the-gwlt3-vardial-2016-shared-task-system-for
Repo
Framework

Learning to Search for Recognizing Named Entities in Twitter

Title Learning to Search for Recognizing Named Entities in Twitter
Authors Ioannis Partalas, Cédric Lopez, Nadia Derbas, Ruslan Kalitvianski
Abstract In this work we present our participation in the 2nd Named Entity Recognition for Twitter shared task. The task was cast as sequence labeling, and we employed a learning-to-search approach to tackle it. We also leveraged LOD to extract rich contextual features for the named entities. Our submission achieved F-scores of 46.16 and 60.24 for the classification and segmentation tasks and ranked 2nd and 3rd, respectively. The post-analysis showed that LOD features substantially improved the performance of our system, as they counterbalance the lack of context in tweets. The shared task gave us the opportunity to test the performance of NER systems on short and noisy textual data. The results of the participating systems show that the task is far from solved and that methods with stellar performance on normal texts need to be revised.
Tasks Named Entity Recognition
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-3923/
PDF https://www.aclweb.org/anthology/W16-3923
PWC https://paperswithcode.com/paper/learning-to-search-for-recognizing-named
Repo
Framework

Comparing Speech and Text Classification on ICNALE

Title Comparing Speech and Text Classification on ICNALE
Authors Sergiu Nisioi
Abstract In this paper we explore and compare a speech and text classification approach on a corpus of native and non-native English speakers. We experiment on a subset of the International Corpus Network of Asian Learners of English containing the recorded speeches and the equivalent text transcriptions. Our results suggest a high correlation between the spoken and written classification results, showing that native accent is highly correlated with grammatical structures found in text.
Tasks Text Classification
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1542/
PDF https://www.aclweb.org/anthology/L16-1542
PWC https://paperswithcode.com/paper/comparing-speech-and-text-classification-on
Repo
Framework