Paper Group NANR 79
Two Years of Aranea: Increasing Counts and Tuning the Pipeline. Improving Attention Modeling with Implicit Distortion and Fertility for Machine Translation. Emotion Corpus Construction Based on Selection from Hashtags. A Method of Augmenting Bilingual Terminology by Taking Advantage of the Conceptual Systematicity of Terminologies. Resources for bu …
Two Years of Aranea: Increasing Counts and Tuning the Pipeline
Title | Two Years of Aranea: Increasing Counts and Tuning the Pipeline |
Authors | Vladimír Benko |
Abstract | The Aranea Project aims to create a family of Gigaword web corpora for a dozen languages that could be used for teaching language- and linguistics-related subjects at Slovak universities, as well as for research purposes in various areas of linguistics. All corpora are being built according to a standard methodology and using the same set of tools for processing and annotation, which, together with their standard size, also makes them a valuable resource for translators and contrastive studies. All our corpora are freely available either via a web interface or in a source form in an annotated vertical format. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1672/ |
PWC | https://paperswithcode.com/paper/two-years-of-aranea-increasing-counts-and |
Repo | |
Framework | |
Improving Attention Modeling with Implicit Distortion and Fertility for Machine Translation
Title | Improving Attention Modeling with Implicit Distortion and Fertility for Machine Translation |
Authors | Shi Feng, Shujie Liu, Nan Yang, Mu Li, Ming Zhou, Kenny Q. Zhu |
Abstract | In neural machine translation, the attention mechanism facilitates the translation process by producing a soft alignment between the source sentence and the target sentence. However, without dedicated distortion and fertility models seen in traditional SMT systems, the learned alignment may not be accurate, which can lead to low translation quality. In this paper, we propose two novel models to improve attention-based neural machine translation. We propose a recurrent attention mechanism as an implicit distortion model, and a fertility conditioned decoder as an implicit fertility model. We conduct experiments on large-scale Chinese–English translation tasks. The results show that our models significantly improve both the alignment and translation quality compared to the original attention mechanism and several other variations. |
Tasks | Machine Translation |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/C16-1290/ |
PWC | https://paperswithcode.com/paper/improving-attention-modeling-with-implicit |
Repo | |
Framework | |
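The "soft alignment" mentioned in the abstract above can be illustrated with a minimal sketch. It assumes plain dot-product scoring between one decoder state and each encoder state; the scoring function, the recurrent attention mechanism, and the fertility-conditioned decoder proposed in the paper are not reproduced here, and the function names are hypothetical.

```python
import math

def soft_alignment(decoder_state, encoder_states):
    """Return attention weights over source positions for one target step.

    Assumption: dot-product scoring, chosen only for illustration.
    """
    scores = [sum(d * e for d, e in zip(decoder_state, enc))
              for enc in encoder_states]
    # Subtract the maximum score before exponentiating for numerical stability.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]

def context_vector(weights, encoder_states):
    """Weighted sum of encoder states under the soft alignment."""
    dim = len(encoder_states[0])
    return [sum(w * enc[i] for w, enc in zip(weights, encoder_states))
            for i in range(dim)]
```

Each weight says how strongly the current target word is aligned to a source position; the weights sum to one, which is what makes the alignment "soft" rather than a hard one-to-one link.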
Emotion Corpus Construction Based on Selection from Hashtags
Title | Emotion Corpus Construction Based on Selection from Hashtags |
Authors | Minglei Li, Yunfei Long, Lu Qin, Wenjie Li |
Abstract | The availability of labelled corpora is of great importance for supervised learning in emotion classification tasks. Because it is time-consuming to manually label text, hashtags have been used as naturally annotated labels to obtain a large amount of labelled training data from microblogs. However, natural hashtags contain too much noise to be used directly in learning algorithms. In this paper, we design a three-stage semi-automatic method to construct an emotion corpus from microblogs. Firstly, a lexicon-based voting approach is used to verify the hashtag automatically. Secondly, an SVM-based classifier is used to select the data whose natural labels are consistent with the predicted labels. Finally, the remaining data will be manually examined to filter out the noisy data. Out of about 48K filtered Chinese microblogs, 39K microblogs are selected to form the final corpus with the Kappa value reaching over 0.92 for the automatic parts and over 0.81 for the manual part. The proportion of automatic selection reaches 54.1%. Thus, the method can reduce about 44.5% of manual workload for acquiring quality data. Experiment on a classifier trained on this corpus shows that it achieves results comparable to the manually annotated NLP&CC2013 corpus. |
Tasks | Emotion Classification |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1291/ |
PWC | https://paperswithcode.com/paper/emotion-corpus-construction-based-on |
Repo | |
Framework | |
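The first stage described in the abstract above, lexicon-based voting to verify a hashtag label, might look roughly like the following sketch. The lexicon format (word mapped to a single emotion label), the tie handling, and the function names are assumptions made for illustration; the abstract does not specify them.

```python
from collections import Counter

def lexicon_vote(tokens, lexicon):
    """Return the emotion label with the most lexicon hits, or None.

    `lexicon` is a hypothetical dict mapping a word to one emotion label.
    Ties and posts with no lexicon hits yield None (no confident vote).
    """
    votes = Counter(lexicon[t] for t in tokens if t in lexicon)
    if not votes:
        return None
    (top_label, top_count), *rest = votes.most_common(2)
    if rest and rest[0][1] == top_count:
        return None  # tie between the two strongest labels
    return top_label

def hashtag_verified(tokens, hashtag_label, lexicon):
    """A hashtag label counts as verified when the lexicon vote agrees with it."""
    return lexicon_vote(tokens, lexicon) == hashtag_label
```

Posts whose hashtag is not verified here would fall through to the later stages (classifier agreement, then manual checking) in the pipeline the abstract outlines.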
A Method of Augmenting Bilingual Terminology by Taking Advantage of the Conceptual Systematicity of Terminologies
Title | A Method of Augmenting Bilingual Terminology by Taking Advantage of the Conceptual Systematicity of Terminologies |
Authors | Miki Iwai, Koichi Takeuchi, Kyo Kageura, Kazuya Ishibashi |
Abstract | In this paper, we propose a method of augmenting existing bilingual terminologies. Our method belongs to a “generate and validate” framework rather than extraction from corpora. Although many studies have proposed methods to find term translations or to augment terminology within a “generate and validate” framework, few have taken full advantage of the systematic nature of terminologies. A terminology of a domain represents the conceptual system of the domain fairly systematically, and we contend that making use of the systematicity fully will greatly contribute to the effective augmentation of terminologies. This paper proposes and evaluates a novel method to generate bilingual term candidates by using existing terminologies and delving into their systematicity. Experiments have shown that our method can generate much better term candidate pairs than the existing method and give improved performance for terminology augmentation. |
Tasks | Transfer Learning |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-4705/ |
PWC | https://paperswithcode.com/paper/a-method-of-augmenting-bilingual-terminology |
Repo | |
Framework | |
Resources for building applications with Dependency Minimal Recursion Semantics
Title | Resources for building applications with Dependency Minimal Recursion Semantics |
Authors | Ann Copestake, Guy Emerson, Michael Wayne Goodman, Matic Horvat, Alexander Kuhnle, Ewa Muszyńska |
Abstract | We describe resources aimed at increasing the usability of the semantic representations utilized within the DELPH-IN (Deep Linguistic Processing with HPSG) consortium. We concentrate in particular on the Dependency Minimal Recursion Semantics (DMRS) formalism, a graph-based representation designed for compositional semantic representation with deep grammars. Our main focus is on English, and specifically English Resource Semantics (ERS) as used in the English Resource Grammar. We first give an introduction to ERS and DMRS and a brief overview of some existing resources and then describe in detail a new repository which has been developed to simplify the use of ERS/DMRS. We explain a number of operations on DMRS graphs which our repository supports, with sketches of the algorithms, and illustrate how these operations can be exploited in application building. We believe that this work will aid researchers to exploit the rich and effective but complex DELPH-IN resources. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1197/ |
PWC | https://paperswithcode.com/paper/resources-for-building-applications-with |
Repo | |
Framework | |
Exploiting Linguistic Features for Sentence Completion
Title | Exploiting Linguistic Features for Sentence Completion |
Authors | Aubrie Woods |
Abstract | |
Tasks | |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/P16-2071/ |
PWC | https://paperswithcode.com/paper/exploiting-linguistic-features-for-sentence |
Repo | |
Framework | |
Optimizing Computer-Assisted Transcription Quality with Iterative User Interfaces
Title | Optimizing Computer-Assisted Transcription Quality with Iterative User Interfaces |
Authors | Matthias Sperber, Graham Neubig, Satoshi Nakamura, Alex Waibel |
Abstract | Computer-assisted transcription promises high-quality speech transcription at reduced costs. This is achieved by limiting human effort to transcribing parts for which automatic transcription quality is insufficient. Our goal is to improve the human transcription quality via appropriate user interface design. We focus on iterative interfaces that allow humans to solve tasks based on an initially given suggestion, in this case an automatic transcription. We conduct a user study that reveals considerable quality gains for three variations of iterative interfaces over a non-iterative from-scratch transcription interface. Our iterative interfaces included post-editing, confidence-enhanced post-editing, and a novel retyping interface. All three yielded similar quality on average, but we found that the proposed retyping interface was less sensitive to the difficulty of the segment, and superior when the automatic transcription of the segment contained relatively many errors. An analysis using mixed-effects models allows us to quantify these and other factors and draw conclusions over which interface design should be chosen in which circumstance. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1314/ |
PWC | https://paperswithcode.com/paper/optimizing-computer-assisted-transcription |
Repo | |
Framework | |
Predictive Modeling: Guessing the NLP Terms of Tomorrow
Title | Predictive Modeling: Guessing the NLP Terms of Tomorrow |
Authors | Gil Francopoulo, Joseph Mariani, Patrick Paroubek |
Abstract | Predictive modeling, often called “predictive analytics” in a commercial context, encompasses a variety of statistical techniques that analyze historical and present facts to make predictions about unknown events. Often the unknown events are in the future, but prediction can be applied to any type of unknown whether it be in the past or future. In our case, we present some experiments applying predictive modeling to the usage of technical terms within the NLP domain. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1052/ |
PWC | https://paperswithcode.com/paper/predictive-modeling-guessing-the-nlp-terms-of |
Repo | |
Framework | |
Syntactic methods for negation detection in radiology reports in Spanish
Title | Syntactic methods for negation detection in radiology reports in Spanish |
Authors | Viviana Cotik, Vanesa Stricker, Jorge Vivaldi, Horacio Rodriguez |
Abstract | |
Tasks | Negation Detection |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/W16-2921/ |
PWC | https://paperswithcode.com/paper/syntactic-methods-for-negation-detection-in |
Repo | |
Framework | |
Proceedings of the Workshop on Uphill Battles in Language Processing: Scaling Early Achievements to Robust Methods
Title | Proceedings of the Workshop on Uphill Battles in Language Processing: Scaling Early Achievements to Robust Methods |
Authors | |
Abstract | |
Tasks | |
Published | 2016-11-01 |
URL | https://www.aclweb.org/anthology/W16-6000/ |
PWC | https://paperswithcode.com/paper/proceedings-of-the-workshop-on-uphill-battles |
Repo | |
Framework | |
Designing A Long Lasting Linguistic Project: The Case Study of ASIt
Title | Designing A Long Lasting Linguistic Project: The Case Study of ASIt |
Authors | Maristella Agosti, Emanuele Di Buccio, Giorgio Maria Di Nunzio, Cecilia Poletto, Esther Rinke |
Abstract | In this paper, we discuss the requirements that a long lasting linguistic database should have in order to meet the needs of the linguists together with the aim of durability and sharing of data. In particular, we discuss the generalizability of the Syntactic Atlas of Italy, a linguistic project that builds on a long standing tradition of collecting and analyzing linguistic corpora, on a more recent project that focuses on the synchronic and diachronic analysis of the syntax of Italian and Portuguese relative clauses. The results that are presented are in line with the FLaReNet Strategic Agenda that highlighted the most pressing needs for research areas, such as Natural Language Processing, and presented a set of recommendations for the development and progress of Language resources in Europe. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1709/ |
PWC | https://paperswithcode.com/paper/designing-a-long-lasting-linguistic-project |
Repo | |
Framework | |
Unsupervised Stemmer for Arabic Tweets
Title | Unsupervised Stemmer for Arabic Tweets |
Authors | Fahad Albogamy, Allan Ramsay |
Abstract | Stemming is an essential processing step in a wide range of high level text processing applications such as information extraction, machine translation and sentiment analysis. It is used to reduce words to their stems. Many stemming algorithms have been developed for Modern Standard Arabic (MSA). Although Arabic tweets and MSA are closely related and share many characteristics, there are substantial differences between them in lexicon and syntax. In this paper, we introduce a light Arabic stemmer for Arabic tweets. Our results show improvements over the performance of a number of well-known stemmers for Arabic. |
Tasks | Machine Translation, Sentiment Analysis |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-3912/ |
PWC | https://paperswithcode.com/paper/unsupervised-stemmer-for-arabic-tweets |
Repo | |
Framework | |
ECNU at SemEval-2016 Task 4: An Empirical Investigation of Traditional NLP Features and Word Embedding Features for Sentence-level and Topic-level Sentiment Analysis in Twitter
Title | ECNU at SemEval-2016 Task 4: An Empirical Investigation of Traditional NLP Features and Word Embedding Features for Sentence-level and Topic-level Sentiment Analysis in Twitter |
Authors | Yunxiao Zhou, Zhihua Zhang, Man Lan |
Abstract | |
Tasks | Feature Engineering, Language Modelling, Sentiment Analysis |
Published | 2016-06-01 |
URL | https://www.aclweb.org/anthology/S16-1040/ |
PWC | https://paperswithcode.com/paper/ecnu-at-semeval-2016-task-4-an-empirical |
Repo | |
Framework | |
A Bilingual Discourse Corpus and Its Applications
Title | A Bilingual Discourse Corpus and Its Applications |
Authors | Yang Liu, Jiajun Zhang, Chengqing Zong, Yating Yang, Xi Zhou |
Abstract | Existing discourse research focuses only on monolingual settings, and the inconsistency between languages limits the power of discourse theory in multilingual applications such as machine translation. To address this issue, we design and build a bilingual discourse corpus in which we are currently defining and annotating the bilingual elementary discourse units (BEDUs). The BEDUs are then organized into hierarchical structures. Using this discourse style, we have annotated nearly 20K LDC sentences. Finally, we design a bilingual discourse based method for machine translation evaluation and show the effectiveness of our bilingual discourse annotations. |
Tasks | Machine Translation |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1159/ |
PWC | https://paperswithcode.com/paper/a-bilingual-discourse-corpus-and-its |
Repo | |
Framework | |
Statistical Script Learning with Recurrent Neural Networks
Title | Statistical Script Learning with Recurrent Neural Networks |
Authors | Karl Pichotta, Raymond Mooney |
Abstract | |
Tasks | Coreference Resolution, Question Answering, Semantic Role Labeling |
Published | 2016-11-01 |
URL | https://www.aclweb.org/anthology/W16-6003/ |
PWC | https://paperswithcode.com/paper/statistical-script-learning-with-recurrent |
Repo | |
Framework | |