May 5, 2019


Paper Group NANR 56

Language Muse: Automated Linguistic Activity Generation for English Language Learners. Pronoun Prediction with Linguistic Features and Example Weighing. USAAR: An Operation Sequential Model for Automatic Statistical Post-Editing. Bilingual Embeddings and Word Alignments for Translation Quality Estimation. DBpedia Abstracts: A Large-Scale, Open, Mul …

Language Muse: Automated Linguistic Activity Generation for English Language Learners


Title	Language Muse: Automated Linguistic Activity Generation for English Language Learners
Authors	Nitin Madnani, Jill Burstein, John Sabatini, Kietha Biggers, Slava Andreyev
Abstract
Tasks	Question Generation
Published	2016-08-01
URL	https://www.aclweb.org/anthology/P16-4014/
PDF	https://www.aclweb.org/anthology/P16-4014
PWC	https://paperswithcode.com/paper/language-muse-automated-linguistic-activity
Repo
Framework

Pronoun Prediction with Linguistic Features and Example Weighing


Title	Pronoun Prediction with Linguistic Features and Example Weighing
Authors	Michal Novák
Abstract
Tasks	Language Modelling, Machine Translation, Word Alignment
Published	2016-08-01
URL	https://www.aclweb.org/anthology/W16-2354/
PDF	https://www.aclweb.org/anthology/W16-2354
PWC	https://paperswithcode.com/paper/pronoun-prediction-with-linguistic-features
Repo
Framework

USAAR: An Operation Sequential Model for Automatic Statistical Post-Editing


Title	USAAR: An Operation Sequential Model for Automatic Statistical Post-Editing
Authors	Santanu Pal, Marcos Zampieri, Josef van Genabith
Abstract
Tasks	Automatic Post-Editing, Machine Translation, Word Alignment
Published	2016-08-01
URL	https://www.aclweb.org/anthology/W16-2379/
PDF	https://www.aclweb.org/anthology/W16-2379
PWC	https://paperswithcode.com/paper/usaar-an-operation-sequential-model-for
Repo
Framework

Bilingual Embeddings and Word Alignments for Translation Quality Estimation


Title	Bilingual Embeddings and Word Alignments for Translation Quality Estimation
Authors	Amal Abdelsalam, Ondřej Bojar, Samhaa El-Beltagy
Abstract
Tasks	Machine Translation, Word Alignment, Word Embeddings
Published	2016-08-01
URL	https://www.aclweb.org/anthology/W16-2380/
PDF	https://www.aclweb.org/anthology/W16-2380
PWC	https://paperswithcode.com/paper/bilingual-embeddings-and-word-alignments-for
Repo
Framework

DBpedia Abstracts: A Large-Scale, Open, Multilingual NLP Training Corpus


Title	DBpedia Abstracts: A Large-Scale, Open, Multilingual NLP Training Corpus
Authors	Martin Brümmer, Milan Dojchinovski, Sebastian Hellmann
Abstract	The ever increasing importance of machine learning in Natural Language Processing is accompanied by an equally increasing need in large-scale training and evaluation corpora. Due to its size, its openness and relative quality, the Wikipedia has already been a source of such data, but on a limited scale. This paper introduces the DBpedia Abstract Corpus, a large-scale, open corpus of annotated Wikipedia texts in six languages, featuring over 11 million texts and over 97 million entity links. The properties of the Wikipedia texts are being described, as well as the corpus creation process, its format and interesting use-cases, like Named Entity Linking training and evaluation.
Tasks	Entity Linking
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1532/
PDF	https://www.aclweb.org/anthology/L16-1532
PWC	https://paperswithcode.com/paper/dbpedia-abstracts-a-large-scale-open
Repo
Framework

A Dataset for Detecting Stance in Tweets


Title	A Dataset for Detecting Stance in Tweets
Authors	Saif Mohammad, Svetlana Kiritchenko, Parinaz Sobhani, Xiaodan Zhu, Colin Cherry
Abstract	We can often detect from a person's utterances whether he/she is in favor of or against a given target entity (a product, topic, another person, etc.). Here for the first time we present a dataset of tweets annotated for whether the tweeter is in favor of or against pre-chosen targets of interest: their stance. The targets of interest may or may not be referred to in the tweets, and they may or may not be the target of opinion in the tweets. The data pertains to six targets of interest commonly known and debated in the United States. Apart from stance, the tweets are also annotated for whether the target of interest is the target of opinion in the tweet. The annotations were performed by crowdsourcing. Several techniques were employed to encourage high-quality annotations (for example, providing clear and simple instructions) and to identify and discard poor annotations (for example, using a small set of check questions annotated by the authors). This Stance Dataset, which was subsequently also annotated for sentiment, can be used to better understand the relationship between stance, sentiment, entity relationships, and textual inference.
Tasks
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1623/
PDF	https://www.aclweb.org/anthology/L16-1623
PWC	https://paperswithcode.com/paper/a-dataset-for-detecting-stance-in-tweets
Repo
Framework

Named Entity Resources - Overview and Outlook


Title	Named Entity Resources - Overview and Outlook
Authors	Maud Ehrmann, Damien Nouvel, Sophie Rosset
Abstract	Recognition of real-world entities is crucial for most NLP applications. Since its introduction some twenty years ago, named entity processing has undergone a significant evolution with, among others, the definition of new tasks (e.g. entity linking) and the emergence of new types of data (e.g. speech transcriptions, micro-blogging). These pose certainly new challenges which affect not only methods and algorithms but especially linguistic resources. Where do we stand with respect to named entity resources? This paper aims at providing a systematic overview of named entity resources, accounting for qualities such as multilingualism, dynamicity and interoperability, and to identify shortfalls in order to guide future developments.
Tasks	Entity Linking
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1534/
PDF	https://www.aclweb.org/anthology/L16-1534
PWC	https://paperswithcode.com/paper/named-entity-resources-overview-and-outlook
Repo
Framework

Incorporating Lexico-semantic Heuristics into Coreference Resolution Sieves for Named Entity Recognition at Document-level


Title	Incorporating Lexico-semantic Heuristics into Coreference Resolution Sieves for Named Entity Recognition at Document-level
Authors	Marcos Garcia
Abstract	This paper explores the incorporation of lexico-semantic heuristics into a deterministic Coreference Resolution (CR) system for classifying named entities at document-level. The highest precise sieves of a CR tool are enriched with both a set of heuristics for merging named entities labeled with different classes and also with some constraints that avoid the incorrect merging of similar mentions. Several tests show that this strategy improves both NER labeling and CR. The CR tool can be applied in combination with any system for named entity recognition using the CoNLL format, and brings benefits to text analytics tasks such as Information Extraction. Experiments were carried out in Spanish, using three different NER tools.
Tasks	Coreference Resolution, Named Entity Recognition
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1535/
PDF	https://www.aclweb.org/anthology/L16-1535
PWC	https://paperswithcode.com/paper/incorporating-lexico-semantic-heuristics-into
Repo
Framework

Inter-document Contextual Language model


Title	Inter-document Contextual Language model
Authors	Quan Hung Tran, Ingrid Zukerman, Gholamreza Haffari
Abstract
Tasks	Language Modelling
Published	2016-06-01
URL	https://www.aclweb.org/anthology/N16-1090/
PDF	https://www.aclweb.org/anthology/N16-1090
PWC	https://paperswithcode.com/paper/inter-document-contextual-language-model
Repo
Framework

Temporal Information Annotation: Crowd vs. Experts


Title	Temporal Information Annotation: Crowd vs. Experts
Authors	Tommaso Caselli, Rachele Sprugnoli, Oana Inel
Abstract	This paper describes two sets of crowdsourcing experiments on temporal information annotation conducted on two languages, i.e., English and Italian. The first experiment, launched on the CrowdFlower platform, was aimed at classifying temporal relations given target entities. The second one, relying on the CrowdTruth metric, consisted in two subtasks: one devoted to the recognition of events and temporal expressions and one to the detection and classification of temporal relations. The outcomes of the experiments suggest a valuable use of crowdsourcing annotations also for a complex task like Temporal Processing.
Tasks
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1557/
PDF	https://www.aclweb.org/anthology/L16-1557
PWC	https://paperswithcode.com/paper/temporal-information-annotation-crowd-vs
Repo
Framework

Using Word Embeddings to Translate Named Entities


Title	Using Word Embeddings to Translate Named Entities
Authors	Octavia-Maria Şulea, Sergiu Nisioi, Liviu P. Dinu
Abstract	In this paper we investigate the usefulness of neural word embeddings in the process of translating Named Entities (NEs) from a resource-rich language to a language low on resources relevant to the task at hand, introducing a novel, yet simple way of obtaining bilingual word vectors. Inspired by observations in (Mikolov et al., 2013b), which show that training their word vector model on comparable corpora yields comparable vector space representations of those corpora, reducing the problem of translating words to finding a rotation matrix, and results in (Zou et al., 2013), which showed that bilingual word embeddings can improve Chinese Named Entity Recognition (NER) and English to Chinese phrase translation, we use the sentence-aligned English-French EuroParl corpora and show that word embeddings extracted from a merged corpus (a corpus resulting from the merger of the two aligned corpora) can be used for NE translation. We extrapolate that word embeddings trained on merged parallel corpora are useful in Named Entity Recognition and Translation tasks for resource-poor languages.
Tasks	Chinese Named Entity Recognition, Named Entity Recognition, Word Embeddings
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1536/
PDF	https://www.aclweb.org/anthology/L16-1536
PWC	https://paperswithcode.com/paper/using-word-embeddings-to-translate-named
Repo
Framework
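
The abstract above mentions the observation, attributed to Mikolov et al. (2013b), that word translation can be reduced to finding a rotation between two embedding spaces. The following is a toy 2-D sketch of that idea only, not the paper's merged-corpus method: all vectors, words, and function names are made up for illustration, and real embeddings would have hundreds of dimensions and need a general orthogonal Procrustes solver.

```python
import math

def fit_rotation(src, tgt):
    """Least-squares rotation angle mapping 2-D src points onto tgt points.

    Maximises sum(tgt_i . R(theta) src_i), which has the closed form
    theta = atan2(sum of cross products, sum of dot products).
    """
    dot = sum(x1 * y1 + x2 * y2 for (x1, x2), (y1, y2) in zip(src, tgt))
    cross = sum(x1 * y2 - x2 * y1 for (x1, x2), (y1, y2) in zip(src, tgt))
    return math.atan2(cross, dot)

def rotate(v, theta):
    """Apply a counter-clockwise rotation by theta to a 2-D vector."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])

def nearest(v, vocab):
    """Return the word in vocab whose vector is closest to v."""
    return min(vocab, key=lambda w: math.dist(v, vocab[w]))

# Hypothetical source-language vectors (invented numbers).
en = {"London": (1.0, 0.0), "Paris": (0.0, 1.0), "Berlin": (1.0, 1.0)}
# Hypothetical target space: the same geometry rotated by 0.5 rad.
fr = {
    "Londres": rotate(en["London"], 0.5),
    "Paris": rotate(en["Paris"], 0.5),
    "Berlin": rotate(en["Berlin"], 0.5),
}

# Fit on two seed pairs, then translate a held-out named entity.
seed = [("London", "Londres"), ("Paris", "Paris")]
theta = fit_rotation([en[s] for s, _ in seed], [fr[t] for _, t in seed])
translation = nearest(rotate(en["Berlin"], theta), fr)
```

Here the held-out word is recovered because the toy target space is an exact rotation of the source space; with real embeddings the mapping is only approximate.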


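DISAANA and D-SUMM: Large-scale Real Time NLP Systems for Analyzing Disaster Related Reports in Tweets


Title	DISAANA and D-SUMM: Large-scale Real Time NLP Systems for Analyzing Disaster Related Reports in Tweets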
Authors	Kentaro Torisawa
Abstract	This talk presents two NLP systems that were developed for helping disaster victims and rescue workers in the aftermath of large-scale disasters. DISAANA provides answers to questions such as "What is in short supply in Tokyo?" and displays locations related to each answer on a map. D-SUMM automatically summarizes a large number of disaster related reports concerning a specified area and helps rescue workers to understand disaster situations from a macro perspective. Both systems are publicly available as Web services. In the aftermath of the 2016 Kumamoto Earthquake (M7.0), the Japanese government actually used DISAANA to analyze the situation.
Tasks
Published	2016-12-01
URL	https://www.aclweb.org/anthology/W16-3903/
PDF	https://www.aclweb.org/anthology/W16-3903
PWC	https://paperswithcode.com/paper/disaana-and-d-summ-large-scale-real-time-nlp
Repo
Framework

The GW/LT3 VarDial 2016 Shared Task System for Dialects and Similar Languages Detection


Title	The GW/LT3 VarDial 2016 Shared Task System for Dialects and Similar Languages Detection
Authors	Ayah Zirikly, Bart Desmet, Mona Diab
Abstract	This paper describes the GW/LT3 contribution to the 2016 VarDial shared task on the identification of similar languages (task 1) and Arabic dialects (task 2). For both tasks, we experimented with Logistic Regression and Neural Network classifiers in isolation. Additionally, we implemented a cascaded classifier that consists of coarse and fine-grained classifiers (task 1) and a classifier ensemble with majority voting for task 2. The submitted systems obtained state-of-the-art performance and ranked first for the evaluation on social media data (test sets B1 and B2 for task 1), with a maximum weighted F1 score of 91.94%.
Tasks	Feature Engineering
Published	2016-12-01
URL	https://www.aclweb.org/anthology/W16-4804/
PDF	https://www.aclweb.org/anthology/W16-4804
PWC	https://paperswithcode.com/paper/the-gwlt3-vardial-2016-shared-task-system-for
Repo
Framework
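
The abstract above mentions a classifier ensemble with majority voting. As a minimal sketch of that general technique (not the GW/LT3 implementation, whose details are not given here), the following combines per-classifier label predictions by taking the most frequent label per example. The labels, the function name, and the tie-breaking rule (favouring the earliest classifier in the list) are assumptions made for illustration.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine predictions from several classifiers by majority vote.

    predictions: list of label lists, one per classifier, all of equal length.
    Ties are broken in favour of the earliest classifier in the list
    (an assumption of this sketch, not taken from the paper).
    """
    voted = []
    for labels in zip(*predictions):  # labels for one example across classifiers
        counts = Counter(labels)
        top = max(counts.values())
        voted.append(next(label for label in labels if counts[label] == top))
    return voted

# Hypothetical outputs of three classifiers on three utterances.
preds = [
    ["EGY", "GLF", "MSA"],
    ["EGY", "LAV", "MSA"],
    ["GLF", "LAV", "NOR"],
]
ensemble = majority_vote(preds)
```

With an odd number of classifiers and two candidate labels, ties cannot occur; with more labels, an explicit tie-breaking rule like the one above is needed.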

Learning to Search for Recognizing Named Entities in Twitter


Title	Learning to Search for Recognizing Named Entities in Twitter
Authors	Ioannis Partalas, Cédric Lopez, Nadia Derbas, Ruslan Kalitvianski
Abstract	We presented in this work our participation in the 2nd Named Entity Recognition for Twitter shared task. The task has been cast as a sequence labeling one and we employed a learning to search approach in order to tackle it. We also leveraged LOD for extracting rich contextual features for the named-entities. Our submission achieved F-scores of 46.16 and 60.24 for the classification and the segmentation tasks and ranked 2nd and 3rd respectively. The post-analysis showed that LOD features substantially improved the performance of our system as they counter-balance the lack of context in tweets. The shared task gave us the opportunity to test the performance of NER systems on short and noisy textual data. The results of the participating systems show that the task is far from being considered solved, and methods with stellar performance on normal texts need to be revised.
Tasks	Named Entity Recognition
Published	2016-12-01
URL	https://www.aclweb.org/anthology/W16-3923/
PDF	https://www.aclweb.org/anthology/W16-3923
PWC	https://paperswithcode.com/paper/learning-to-search-for-recognizing-named
Repo
Framework

Comparing Speech and Text Classification on ICNALE


Title	Comparing Speech and Text Classification on ICNALE
Authors	Sergiu Nisioi
Abstract	In this paper we explore and compare a speech and text classification approach on a corpus of native and non-native English speakers. We experiment on a subset of the International Corpus Network of Asian Learners of English containing the recorded speeches and the equivalent text transcriptions. Our results suggest a high correlation between the spoken and written classification results, showing that native accent is highly correlated with grammatical structures found in text.
Tasks	Text Classification
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1542/
PDF	https://www.aclweb.org/anthology/L16-1542
PWC	https://paperswithcode.com/paper/comparing-speech-and-text-classification-on
Repo
Framework