May 5, 2019


Paper Group NANR 79


Two Years of Aranea: Increasing Counts and Tuning the Pipeline


Title	Two Years of Aranea: Increasing Counts and Tuning the Pipeline
Authors	Vladimír Benko
Abstract	The Aranea Project is aimed at creating a family of Gigaword web corpora for a dozen languages that could be used for teaching language- and linguistics-related subjects at Slovak universities, as well as for research purposes in various areas of linguistics. All corpora are being built according to a standard methodology and using the same set of tools for processing and annotation, which, together with their standard size, also makes them a valuable resource for translators and contrastive studies. All our corpora are freely available either via a web interface or in source form in an annotated vertical format.
Tasks
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1672/
PDF	https://www.aclweb.org/anthology/L16-1672
PWC	https://paperswithcode.com/paper/two-years-of-aranea-increasing-counts-and
Repo
Framework

Improving Attention Modeling with Implicit Distortion and Fertility for Machine Translation


Title	Improving Attention Modeling with Implicit Distortion and Fertility for Machine Translation
Authors	Shi Feng, Shujie Liu, Nan Yang, Mu Li, Ming Zhou, Kenny Q. Zhu
Abstract	In neural machine translation, the attention mechanism facilitates the translation process by producing a soft alignment between the source sentence and the target sentence. However, without dedicated distortion and fertility models seen in traditional SMT systems, the learned alignment may not be accurate, which can lead to low translation quality. In this paper, we propose two novel models to improve attention-based neural machine translation. We propose a recurrent attention mechanism as an implicit distortion model, and a fertility conditioned decoder as an implicit fertility model. We conduct experiments on large-scale Chinese–English translation tasks. The results show that our models significantly improve both the alignment and translation quality compared to the original attention mechanism and several other variations.
Tasks	Machine Translation
Published	2016-12-01
URL	https://www.aclweb.org/anthology/C16-1290/
PDF	https://www.aclweb.org/anthology/C16-1290
PWC	https://paperswithcode.com/paper/improving-attention-modeling-with-implicit
Repo
Framework

Emotion Corpus Construction Based on Selection from Hashtags


Title	Emotion Corpus Construction Based on Selection from Hashtags
Authors	Minglei Li, Yunfei Long, Lu Qin, Wenjie Li
Abstract	The availability of labelled corpora is of great importance for supervised learning in emotion classification tasks. Because it is time-consuming to manually label text, hashtags have been used as naturally annotated labels to obtain a large amount of labelled training data from microblogs. However, natural hashtags contain too much noise to be used directly in learning algorithms. In this paper, we design a three-stage semi-automatic method to construct an emotion corpus from microblogs. First, a lexicon-based voting approach is used to verify the hashtags automatically. Second, an SVM-based classifier is used to select the data whose natural labels are consistent with the predicted labels. Finally, the remaining data are manually examined to filter out the noisy data. Out of about 48K filtered Chinese microblogs, 39K microblogs are selected to form the final corpus, with the Kappa value reaching over 0.92 for the automatic parts and over 0.81 for the manual part. The proportion of automatic selection reaches 54.1%. Thus, the method can reduce about 44.5% of manual workload for acquiring quality data. An experiment with a classifier trained on this corpus shows that it achieves results comparable to the manually annotated NLP&CC2013 corpus.
Tasks	Emotion Classification
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1291/
PDF	https://www.aclweb.org/anthology/L16-1291
PWC	https://paperswithcode.com/paper/emotion-corpus-construction-based-on
Repo
Framework
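
The first stage described in the abstract above (lexicon-based voting to verify a hashtag) might look roughly like the following minimal sketch. The abstract does not specify the lexicon, the tokenization, or the tie-breaking rule, so `TOY_LEXICON`, whitespace tokenization, and the "reject on tie" rule are all illustrative assumptions, not the authors' implementation.

```python
from collections import Counter

# Hypothetical toy lexicon mapping words to emotion labels (not from the paper).
TOY_LEXICON = {
    "happy": "joy",
    "great": "joy",
    "smile": "joy",
    "cry": "sadness",
    "lonely": "sadness",
    "furious": "anger",
}


def vote_emotion(tokens, lexicon):
    """Return the emotion with the most lexicon hits, or None on no hits or a tie."""
    votes = Counter(lexicon[t] for t in tokens if t in lexicon)
    if not votes:
        return None
    ranked = votes.most_common(2)
    # Assumption: a tie between the top two emotions counts as "no decision".
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


def hashtag_is_verified(text, hashtag_label, lexicon=TOY_LEXICON):
    """Keep a post automatically only if the lexicon vote agrees with its hashtag label."""
    tokens = text.lower().split()  # assumption: whitespace tokenization
    return vote_emotion(tokens, lexicon) == hashtag_label
```

Posts that fail this check would not be discarded in such a scheme; they would move on to the later stages (classifier agreement, then manual examination) that the abstract describes.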

A Method of Augmenting Bilingual Terminology by Taking Advantage of the Conceptual Systematicity of Terminologies


Title	A Method of Augmenting Bilingual Terminology by Taking Advantage of the Conceptual Systematicity of Terminologies
Authors	Miki Iwai, Koichi Takeuchi, Kyo Kageura, Kazuya Ishibashi
Abstract	In this paper, we propose a method of augmenting existing bilingual terminologies. Our method belongs to a "generate and validate" framework rather than extraction from corpora. Although many studies have proposed methods to find term translations or to augment terminology within a "generate and validate" framework, few have taken full advantage of the systematic nature of terminologies. A terminology of a domain represents the conceptual system of the domain fairly systematically, and we contend that making full use of this systematicity will greatly contribute to the effective augmentation of terminologies. This paper proposes and evaluates a novel method to generate bilingual term candidates by using existing terminologies and delving into their systematicity. Experiments have shown that our method can generate much better term candidate pairs than the existing method and give improved performance for terminology augmentation.
Tasks	Transfer Learning
Published	2016-12-01
URL	https://www.aclweb.org/anthology/W16-4705/
PDF	https://www.aclweb.org/anthology/W16-4705
PWC	https://paperswithcode.com/paper/a-method-of-augmenting-bilingual-terminology
Repo
Framework

Resources for building applications with Dependency Minimal Recursion Semantics


Title	Resources for building applications with Dependency Minimal Recursion Semantics
Authors	Ann Copestake, Guy Emerson, Michael Wayne Goodman, Matic Horvat, Alexander Kuhnle, Ewa Muszyńska
Abstract	We describe resources aimed at increasing the usability of the semantic representations utilized within the DELPH-IN (Deep Linguistic Processing with HPSG) consortium. We concentrate in particular on the Dependency Minimal Recursion Semantics (DMRS) formalism, a graph-based representation designed for compositional semantic representation with deep grammars. Our main focus is on English, and specifically English Resource Semantics (ERS) as used in the English Resource Grammar. We first give an introduction to ERS and DMRS and a brief overview of some existing resources and then describe in detail a new repository which has been developed to simplify the use of ERS/DMRS. We explain a number of operations on DMRS graphs which our repository supports, with sketches of the algorithms, and illustrate how these operations can be exploited in application building. We believe that this work will help researchers exploit the rich and effective but complex DELPH-IN resources.
Tasks
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1197/
PDF	https://www.aclweb.org/anthology/L16-1197
PWC	https://paperswithcode.com/paper/resources-for-building-applications-with
Repo
Framework

Exploiting Linguistic Features for Sentence Completion


Title	Exploiting Linguistic Features for Sentence Completion
Authors	Aubrie Woods
Abstract
Tasks
Published	2016-08-01
URL	https://www.aclweb.org/anthology/P16-2071/
PDF	https://www.aclweb.org/anthology/P16-2071
PWC	https://paperswithcode.com/paper/exploiting-linguistic-features-for-sentence
Repo
Framework

Optimizing Computer-Assisted Transcription Quality with Iterative User Interfaces


Title	Optimizing Computer-Assisted Transcription Quality with Iterative User Interfaces
Authors	Matthias Sperber, Graham Neubig, Satoshi Nakamura, Alex Waibel
Abstract	Computer-assisted transcription promises high-quality speech transcription at reduced costs. This is achieved by limiting human effort to transcribing parts for which automatic transcription quality is insufficient. Our goal is to improve the human transcription quality via appropriate user interface design. We focus on iterative interfaces that allow humans to solve tasks based on an initially given suggestion, in this case an automatic transcription. We conduct a user study that reveals considerable quality gains for three variations of iterative interfaces over a non-iterative from-scratch transcription interface. Our iterative interfaces included post-editing, confidence-enhanced post-editing, and a novel retyping interface. All three yielded similar quality on average, but we found that the proposed retyping interface was less sensitive to the difficulty of the segment, and superior when the automatic transcription of the segment contained relatively many errors. An analysis using mixed-effects models allows us to quantify these and other factors and draw conclusions about which interface design should be chosen in which circumstance.
Tasks
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1314/
PDF	https://www.aclweb.org/anthology/L16-1314
PWC	https://paperswithcode.com/paper/optimizing-computer-assisted-transcription
Repo
Framework

Predictive Modeling: Guessing the NLP Terms of Tomorrow


Title	Predictive Modeling: Guessing the NLP Terms of Tomorrow
Authors	Gil Francopoulo, Joseph Mariani, Patrick Paroubek
Abstract	Predictive modeling, often called "predictive analytics" in a commercial context, encompasses a variety of statistical techniques that analyze historical and present facts to make predictions about unknown events. Often the unknown events are in the future, but prediction can be applied to any type of unknown whether it be in the past or future. In our case, we present some experiments applying predictive modeling to the usage of technical terms within the NLP domain.
Tasks
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1052/
PDF	https://www.aclweb.org/anthology/L16-1052
PWC	https://paperswithcode.com/paper/predictive-modeling-guessing-the-nlp-terms-of
Repo
Framework

Syntactic methods for negation detection in radiology reports in Spanish


Title	Syntactic methods for negation detection in radiology reports in Spanish
Authors	Viviana Cotik, Vanesa Stricker, Jorge Vivaldi, Horacio Rodriguez
Abstract
Tasks	Negation Detection
Published	2016-08-01
URL	https://www.aclweb.org/anthology/W16-2921/
PDF	https://www.aclweb.org/anthology/W16-2921
PWC	https://paperswithcode.com/paper/syntactic-methods-for-negation-detection-in
Repo
Framework

Proceedings of the Workshop on Uphill Battles in Language Processing: Scaling Early Achievements to Robust Methods


Title	Proceedings of the Workshop on Uphill Battles in Language Processing: Scaling Early Achievements to Robust Methods
Authors
Abstract
Tasks
Published	2016-11-01
URL	https://www.aclweb.org/anthology/W16-6000/
PDF	https://www.aclweb.org/anthology/W16-6000
PWC	https://paperswithcode.com/paper/proceedings-of-the-workshop-on-uphill-battles
Repo
Framework

Designing A Long Lasting Linguistic Project: The Case Study of ASIt


Title	Designing A Long Lasting Linguistic Project: The Case Study of ASIt
Authors	Maristella Agosti, Emanuele Di Buccio, Giorgio Maria Di Nunzio, Cecilia Poletto, Esther Rinke
Abstract	In this paper, we discuss the requirements that a long-lasting linguistic database should have in order to meet the needs of linguists together with the aim of durability and sharing of data. In particular, we discuss the generalizability of the Syntactic Atlas of Italy, a linguistic project that builds on a long-standing tradition of collecting and analyzing linguistic corpora, to a more recent project that focuses on the synchronic and diachronic analysis of the syntax of Italian and Portuguese relative clauses. The results that are presented are in line with the FLaReNet Strategic Agenda that highlighted the most pressing needs for research areas, such as Natural Language Processing, and presented a set of recommendations for the development and progress of Language resources in Europe.
Tasks
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1709/
PDF	https://www.aclweb.org/anthology/L16-1709
PWC	https://paperswithcode.com/paper/designing-a-long-lasting-linguistic-project
Repo
Framework

Unsupervised Stemmer for Arabic Tweets


Title	Unsupervised Stemmer for Arabic Tweets
Authors	Fahad Albogamy, Allan Ramsay
Abstract	Stemming is an essential processing step in a wide range of high level text processing applications such as information extraction, machine translation and sentiment analysis. It is used to reduce words to their stems. Many stemming algorithms have been developed for Modern Standard Arabic (MSA). Although Arabic tweets and MSA are closely related and share many characteristics, there are substantial differences between them in lexicon and syntax. In this paper, we introduce a light Arabic stemmer for Arabic tweets. Our results show improvements over the performance of a number of well-known stemmers for Arabic.
Tasks	Machine Translation, Sentiment Analysis
Published	2016-12-01
URL	https://www.aclweb.org/anthology/W16-3912/
PDF	https://www.aclweb.org/anthology/W16-3912
PWC	https://paperswithcode.com/paper/unsupervised-stemmer-for-arabic-tweets
Repo
Framework

ECNU at SemEval-2016 Task 4: An Empirical Investigation of Traditional NLP Features and Word Embedding Features for Sentence-level and Topic-level Sentiment Analysis in Twitter


Title	ECNU at SemEval-2016 Task 4: An Empirical Investigation of Traditional NLP Features and Word Embedding Features for Sentence-level and Topic-level Sentiment Analysis in Twitter
Authors	Yunxiao Zhou, Zhihua Zhang, Man Lan
Abstract
Tasks	Feature Engineering, Language Modelling, Sentiment Analysis
Published	2016-06-01
URL	https://www.aclweb.org/anthology/S16-1040/
PDF	https://www.aclweb.org/anthology/S16-1040
PWC	https://paperswithcode.com/paper/ecnu-at-semeval-2016-task-4-an-empirical
Repo
Framework

A Bilingual Discourse Corpus and Its Applications


Title	A Bilingual Discourse Corpus and Its Applications
Authors	Yang Liu, Jiajun Zhang, Chengqing Zong, Yating Yang, Xi Zhou
Abstract	Existing discourse research focuses only on monolingual settings, and the inconsistency between languages limits the power of discourse theory in multilingual applications such as machine translation. To address this issue, we design and build a bilingual discourse corpus in which we are currently defining and annotating bilingual elementary discourse units (BEDUs). The BEDUs are then organized into hierarchical structures. Using this discourse style, we have annotated nearly 20K LDC sentences. Finally, we design a bilingual discourse-based method for machine translation evaluation and show the effectiveness of our bilingual discourse annotations.
Tasks	Machine Translation
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1159/
PDF	https://www.aclweb.org/anthology/L16-1159
PWC	https://paperswithcode.com/paper/a-bilingual-discourse-corpus-and-its
Repo
Framework

Statistical Script Learning with Recurrent Neural Networks


Title	Statistical Script Learning with Recurrent Neural Networks
Authors	Karl Pichotta, Raymond Mooney
Abstract
Tasks	Coreference Resolution, Question Answering, Semantic Role Labeling
Published	2016-11-01
URL	https://www.aclweb.org/anthology/W16-6003/
PDF	https://www.aclweb.org/anthology/W16-6003
PWC	https://paperswithcode.com/paper/statistical-script-learning-with-recurrent
Repo
Framework