May 5, 2019

Paper Group NANR 79

Two Years of Aranea: Increasing Counts and Tuning the Pipeline

Title Two Years of Aranea: Increasing Counts and Tuning the Pipeline
Authors Vladimír Benko
Abstract The Aranea Project aims to create a family of Gigaword web corpora for a dozen languages that can be used for teaching language- and linguistics-related subjects at Slovak universities, as well as for research in various areas of linguistics. All corpora are built according to a standard methodology, using the same set of tools for processing and annotation, which, together with their standard size, also makes them a valuable resource for translators and for contrastive studies. All our corpora are freely available, either via a web interface or in source form in an annotated vertical format.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1672/
PDF https://www.aclweb.org/anthology/L16-1672
PWC https://paperswithcode.com/paper/two-years-of-aranea-increasing-counts-and
Repo
Framework
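The abstract mentions distribution in an annotated vertical format. A minimal sketch of reading such data, assuming the usual vertical layout of one token per line with tab-separated attributes (taken here to be word, lemma and tag) and structural tags such as `<doc>` and `<s>` on lines of their own. The column set and tag names are assumptions for illustration, not a description of actual Aranea releases; `read_sentences` is a hypothetical helper.

```python
import re
from typing import Iterable, Iterator, List, Tuple

Token = Tuple[str, ...]

# A structural line is an XML-like tag such as <doc id="d1">, <s>, </s> or <g/>.
STRUCTURAL = re.compile(r"^</?[A-Za-z][^>]*/?>$")


def read_sentences(lines: Iterable[str]) -> Iterator[List[Token]]:
    """Yield sentences as lists of attribute tuples from vertical-format lines."""
    sentence: List[Token] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        if STRUCTURAL.match(line):
            # Only the sentence-closing tag matters for grouping tokens here.
            if line == "</s>" and sentence:
                yield sentence
                sentence = []
            continue
        # Token line: tab-separated positional attributes (assumed word, lemma, tag).
        sentence.append(tuple(line.split("\t")))
    if sentence:
        # Flush a final sentence that lacks an explicit </s>.
        yield sentence


sample = [
    '<doc id="d1">',
    "<s>",
    "Corpora\tcorpus\tNNS",
    "are\tbe\tVBP",
    "free\tfree\tJJ",
    "</s>",
    "</doc>",
]
for sent in read_sentences(sample):
    print([token[1] for token in sent])  # prints ['corpus', 'be', 'free']
```

Processing the stream line by line keeps memory use flat, which matters for corpora of this size.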

Improving Attention Modeling with Implicit Distortion and Fertility for Machine Translation

Title Improving Attention Modeling with Implicit Distortion and Fertility for Machine Translation
Authors Shi Feng, Shujie Liu, Nan Yang, Mu Li, Ming Zhou, Kenny Q. Zhu
Abstract In neural machine translation, the attention mechanism facilitates the translation process by producing a soft alignment between the source sentence and the target sentence. However, without the dedicated distortion and fertility models found in traditional SMT systems, the learned alignment may be inaccurate, which can lead to low translation quality. In this paper, we propose two novel models to improve attention-based neural machine translation: a recurrent attention mechanism as an implicit distortion model, and a fertility-conditioned decoder as an implicit fertility model. We conduct experiments on large-scale Chinese–English translation tasks. The results show that our models significantly improve both alignment and translation quality compared to the original attention mechanism and several other variations.
Tasks Machine Translation
Published 2016-12-01
URL https://www.aclweb.org/anthology/C16-1290/
PDF https://www.aclweb.org/anthology/C16-1290
PWC https://paperswithcode.com/paper/improving-attention-modeling-with-implicit
Repo
Framework
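The abstract builds on the soft alignment produced by attention. As a rough illustration of that idea, the sketch below computes plain dot-product attention weights for each target step and sums the weight each source word receives; that running total can be read as a soft fertility count. This is generic attention with an accumulated-attention counter, assumed for illustration; it is not the paper's recurrent attention or its fertility-conditioned decoder, and `soft_alignment` is a hypothetical name.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def softmax(scores: Sequence[float]) -> List[float]:
    # Subtract the max for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_alignment(
    decoder_states: Sequence[Vector], encoder_states: Sequence[Vector]
) -> Tuple[List[List[float]], List[float]]:
    """Return one attention distribution over source positions per target step,
    and the total attention each source position received (a soft fertility)."""
    alignment: List[List[float]] = []
    fertility = [0.0] * len(encoder_states)
    for d in decoder_states:
        # Dot-product score between the decoder state and each encoder state.
        scores = [sum(a * b for a, b in zip(d, h)) for h in encoder_states]
        weights = softmax(scores)
        alignment.append(weights)
        fertility = [f + w for f, w in zip(fertility, weights)]
    return alignment, fertility


# Toy example: two source words, three target steps; the first two steps attend to source word 0.
enc = [[1.0, 0.0], [0.0, 1.0]]
dec = [[4.0, 0.0], [4.0, 0.0], [0.0, 4.0]]
align, fert = soft_alignment(dec, enc)
print([round(f, 2) for f in fert])  # prints [1.98, 1.02]
```

A source word whose total attention is well above 1 is being rendered as several target words (or is over-attended), which is the kind of behaviour an explicit fertility model would constrain.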

Emotion Corpus Construction Based on Selection from Hashtags

Title Emotion Corpus Construction Based on Selection from Hashtags
Authors Minglei Li, Yunfei Long, Lu Qin, Wenjie Li
Abstract The availability of labelled corpora is of great importance for supervised learning in emotion classification tasks. Because manually labelling text is time-consuming, hashtags have been used as naturally annotated labels to obtain large amounts of labelled training data from microblogs. However, natural hashtags contain too much noise to be used directly in learning algorithms. In this paper, we design a three-stage semi-automatic method to construct an emotion corpus from microblogs. First, a lexicon-based voting approach is used to verify the hashtag automatically. Second, an SVM-based classifier is used to select the data whose natural labels are consistent with the predicted labels. Finally, the remaining data are manually examined to filter out noisy data. Out of about 48K filtered Chinese microblogs, 39K are selected to form the final corpus, with the Kappa value reaching over 0.92 for the automatic parts and over 0.81 for the manual part. The proportion of automatic selection reaches 54.1%. Thus, the method can reduce the manual workload for acquiring quality data by about 44.5%. Experiments show that a classifier trained on this corpus achieves results comparable to those obtained with the manually annotated NLP&CC2013 corpus.
Tasks Emotion Classification
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1291/
PDF https://www.aclweb.org/anthology/L16-1291
PWC https://paperswithcode.com/paper/emotion-corpus-construction-based-on
Repo
Framework
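The first stage of this pipeline verifies a hashtag by lexicon-based voting. A minimal sketch of that idea, under stated assumptions: each emotion word in the post votes for its emotion, and the hashtag label is kept only when it wins outright. The tiny English lexicon, the tie rule and the function names are hypothetical; the paper works on Chinese microblogs with its own lexicon and voting details.

```python
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical toy lexicon mapping words to emotion categories.
LEXICON: Dict[str, str] = {
    "happy": "joy", "great": "joy", "love": "joy",
    "sad": "sadness", "cry": "sadness",
    "angry": "anger", "hate": "anger",
}


def lexicon_vote(tokens: List[str]) -> Optional[str]:
    """Return the emotion with the most lexicon hits, or None if absent or tied."""
    votes = Counter(LEXICON[t] for t in tokens if t in LEXICON)
    if not votes:
        return None
    (top, top_count), *rest = votes.most_common()
    if rest and rest[0][1] == top_count:
        return None  # tie: no clear winner, so the hashtag is not verified
    return top


def hashtag_verified(tokens: List[str], hashtag_emotion: str) -> bool:
    """Keep a post only if the lexicon vote agrees with its hashtag label."""
    return lexicon_vote(tokens) == hashtag_emotion


print(hashtag_verified(["so", "happy", "love", "this"], "joy"))  # prints True
```

Posts that fail this check would then go on to the later stages (classifier agreement, then manual review) rather than being discarded outright.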

A Method of Augmenting Bilingual Terminology by Taking Advantage of the Conceptual Systematicity of Terminologies

Title A Method of Augmenting Bilingual Terminology by Taking Advantage of the Conceptual Systematicity of Terminologies
Authors Miki Iwai, Koichi Takeuchi, Kyo Kageura, Kazuya Ishibashi
Abstract In this paper, we propose a method of augmenting existing bilingual terminologies. Our method belongs to a “generate and validate” framework rather than extraction from corpora. Although many studies have proposed methods to find term translations or to augment terminology within a “generate and validate” framework, few have taken full advantage of the systematic nature of terminologies. A terminology of a domain represents the conceptual system of the domain fairly systematically, and we contend that making full use of this systematicity will greatly contribute to the effective augmentation of terminologies. This paper proposes and evaluates a novel method to generate bilingual term candidates by using existing terminologies and delving into their systematicity. Experiments show that our method generates much better term candidate pairs than the existing method and improves performance for terminology augmentation.
Tasks Transfer Learning
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-4705/
PDF https://www.aclweb.org/anthology/W16-4705
PWC https://paperswithcode.com/paper/a-method-of-augmenting-bilingual-terminology
Repo
Framework
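The abstract does not spell out the generation step, so the sketch below only illustrates the general idea of exploiting systematicity in a "generate and validate" setting: constituent translations are read off existing term pairs, recombined into new candidate pairs, and then kept only if the target side is attested. The English–French adjective–noun pattern, the toy data and the helper names are assumptions for illustration, not the authors' method or language pair.

```python
from itertools import product
from typing import Dict, Iterable, Set, Tuple

Pair = Tuple[str, str]  # (English term, French term)


def learn_constituents(pairs: Iterable[Pair]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Map English modifiers and heads to French ones.

    Toy assumption: English terms are 'modifier head', French terms are 'head modifier'.
    """
    modifiers: Dict[str, str] = {}
    heads: Dict[str, str] = {}
    for en, fr in pairs:
        en_mod, en_head = en.split()
        fr_head, fr_mod = fr.split()
        modifiers[en_mod] = fr_mod
        heads[en_head] = fr_head
    return modifiers, heads


def generate_candidates(pairs: Iterable[Pair]) -> Set[Pair]:
    """Generate step: recombine known constituents into unseen term pairs."""
    pair_list = list(pairs)
    known = {en for en, _ in pair_list}
    modifiers, heads = learn_constituents(pair_list)
    candidates: Set[Pair] = set()
    for (en_mod, fr_mod), (en_head, fr_head) in product(modifiers.items(), heads.items()):
        en = f"{en_mod} {en_head}"
        if en not in known:
            candidates.add((en, f"{fr_head} {fr_mod}"))
    return candidates


def validate(candidates: Set[Pair], attested_fr: Set[str]) -> Set[Pair]:
    """Validate step: keep candidates whose French side occurs in a reference list."""
    return {c for c in candidates if c[1] in attested_fr}


seed = [("acute infection", "infection aiguë"), ("chronic pain", "douleur chronique")]
cands = generate_candidates(seed)
print(sorted(cands))
print(validate(cands, {"douleur aiguë"}))
```

A realistic generator would also have to handle agreement (both toy head nouns happen to be feminine, so the adjective forms line up) and term patterns longer than two words.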

Resources for building applications with Dependency Minimal Recursion Semantics

Title Resources for building applications with Dependency Minimal Recursion Semantics
Authors Ann Copestake, Guy Emerson, Michael Wayne Goodman, Matic Horvat, Alexander Kuhnle, Ewa Muszyńska
Abstract We describe resources aimed at increasing the usability of the semantic representations used within the DELPH-IN (Deep Linguistic Processing with HPSG) consortium. We concentrate in particular on the Dependency Minimal Recursion Semantics (DMRS) formalism, a graph-based representation designed for compositional semantic representation with deep grammars. Our main focus is on English, and specifically on English Resource Semantics (ERS) as used in the English Resource Grammar. We first give an introduction to ERS and DMRS and a brief overview of some existing resources, and then describe in detail a new repository which has been developed to simplify the use of ERS/DMRS. We explain a number of operations on DMRS graphs which our repository supports, with sketches of the algorithms, and illustrate how these operations can be exploited in application building. We believe that this work will help researchers exploit the rich and effective, but complex, DELPH-IN resources.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1197/
PDF https://www.aclweb.org/anthology/L16-1197
PWC https://paperswithcode.com/paper/resources-for-building-applications-with
Repo
Framework
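DMRS represents meaning as a graph whose nodes carry predicates and whose links carry an argument role plus a post-slash label describing the scopal relation (for example ARG1/NEQ or RSTR/H). The sketch below models such a graph with plain dataclasses and implements one simple query, collecting a predicate's arguments. The class and method names are hypothetical and do not reflect the API of the repository described in the paper; the graph for "the dog barks" is a simplified illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Link:
    start: int
    end: int
    rargname: str  # argument role, e.g. "ARG1" or "RSTR"
    post: str      # scopal relation, e.g. "NEQ", "EQ" or "H"


@dataclass
class DmrsGraph:
    nodes: Dict[int, str] = field(default_factory=dict)  # node id -> predicate
    links: List[Link] = field(default_factory=list)

    def add_node(self, nodeid: int, pred: str) -> None:
        self.nodes[nodeid] = pred

    def add_link(self, start: int, end: int, rargname: str, post: str) -> None:
        # Reject links to nodes that do not exist, keeping the graph consistent.
        if start not in self.nodes or end not in self.nodes:
            raise KeyError("both endpoints must be existing nodes")
        self.links.append(Link(start, end, rargname, post))

    def arguments(self, nodeid: int) -> Dict[str, str]:
        """Map each ARGn role of a node to the predicate it points to."""
        return {
            link.rargname: self.nodes[link.end]
            for link in self.links
            if link.start == nodeid and link.rargname.startswith("ARG")
        }


g = DmrsGraph()
g.add_node(1, "_the_q")
g.add_node(2, "_dog_n_1")
g.add_node(3, "_bark_v_1")
g.add_link(1, 2, "RSTR", "H")    # the quantifier restricts the noun
g.add_link(3, 2, "ARG1", "NEQ")  # the dog is the one barking
print(g.arguments(3))  # prints {'ARG1': '_dog_n_1'}
```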

Exploiting Linguistic Features for Sentence Completion

Title Exploiting Linguistic Features for Sentence Completion
Authors Aubrie Woods
Abstract
Tasks
Published 2016-08-01
URL https://www.aclweb.org/anthology/P16-2071/
PDF https://www.aclweb.org/anthology/P16-2071
PWC https://paperswithcode.com/paper/exploiting-linguistic-features-for-sentence
Repo
Framework

Optimizing Computer-Assisted Transcription Quality with Iterative User Interfaces

Title Optimizing Computer-Assisted Transcription Quality with Iterative User Interfaces
Authors Matthias Sperber, Graham Neubig, Satoshi Nakamura, Alex Waibel
Abstract Computer-assisted transcription promises high-quality speech transcription at reduced costs. This is achieved by limiting human effort to transcribing parts for which automatic transcription quality is insufficient. Our goal is to improve the human transcription quality via appropriate user interface design. We focus on iterative interfaces that allow humans to solve tasks based on an initially given suggestion, in this case an automatic transcription. We conduct a user study that reveals considerable quality gains for three variations of iterative interfaces over a non-iterative from-scratch transcription interface. Our iterative interfaces included post-editing, confidence-enhanced post-editing, and a novel retyping interface. All three yielded similar quality on average, but we found that the proposed retyping interface was less sensitive to the difficulty of the segment, and superior when the automatic transcription of the segment contained relatively many errors. An analysis using mixed-effects models allows us to quantify these and other factors and to draw conclusions about which interface design should be chosen under which circumstances.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1314/
PDF https://www.aclweb.org/anthology/L16-1314
PWC https://paperswithcode.com/paper/optimizing-computer-assisted-transcription
Repo
Framework

Predictive Modeling: Guessing the NLP Terms of Tomorrow

Title Predictive Modeling: Guessing the NLP Terms of Tomorrow
Authors Gil Francopoulo, Joseph Mariani, Patrick Paroubek
Abstract Predictive modeling, often called “predictive analytics” in a commercial context, encompasses a variety of statistical techniques that analyze historical and present facts to make predictions about unknown events. Often the unknown events are in the future, but prediction can be applied to any type of unknown, whether in the past or the future. In our case, we present some experiments applying predictive modeling to the usage of technical terms within the NLP domain.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1052/
PDF https://www.aclweb.org/anthology/L16-1052
PWC https://paperswithcode.com/paper/predictive-modeling-guessing-the-nlp-terms-of
Repo
Framework
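The abstract does not say which statistical technique the experiments use. As the simplest possible instance of predicting term usage from historical counts, the sketch below fits an ordinary least-squares line to yearly occurrence counts of a term and extrapolates it. Both the technique and the counts are assumptions for illustration; `fit_line` and `forecast` are hypothetical helpers.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def forecast(years: Sequence[int], counts: Sequence[float], target_year: int) -> float:
    """Extrapolate a term's yearly count to a future (or past) year."""
    slope, intercept = fit_line(years, counts)
    return slope * target_year + intercept


# Hypothetical yearly counts of one term in a paper collection.
print(forecast([2010, 2011, 2012, 2013], [10, 12, 14, 16], 2015))  # prints 20.0
```

A straight line is a deliberately crude baseline: for a declining term it will eventually forecast negative counts, so any serious forecast would need a more robust model.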

Syntactic methods for negation detection in radiology reports in Spanish

Title Syntactic methods for negation detection in radiology reports in Spanish
Authors Viviana Cotik, Vanesa Stricker, Jorge Vivaldi, Horacio Rodriguez
Abstract
Tasks Negation Detection
Published 2016-08-01
URL https://www.aclweb.org/anthology/W16-2921/
PDF https://www.aclweb.org/anthology/W16-2921
PWC https://paperswithcode.com/paper/syntactic-methods-for-negation-detection-in
Repo
Framework

Proceedings of the Workshop on Uphill Battles in Language Processing: Scaling Early Achievements to Robust Methods

Title Proceedings of the Workshop on Uphill Battles in Language Processing: Scaling Early Achievements to Robust Methods
Authors
Abstract
Tasks
Published 2016-11-01
URL https://www.aclweb.org/anthology/W16-6000/
PDF https://www.aclweb.org/anthology/W16-6000
PWC https://paperswithcode.com/paper/proceedings-of-the-workshop-on-uphill-battles
Repo
Framework

Designing A Long Lasting Linguistic Project: The Case Study of ASIt

Title Designing A Long Lasting Linguistic Project: The Case Study of ASIt
Authors Maristella Agosti, Emanuele Di Buccio, Giorgio Maria Di Nunzio, Cecilia Poletto, Esther Rinke
Abstract In this paper, we discuss the requirements that a long-lasting linguistic database should meet in order to serve the needs of linguists while ensuring the durability and sharing of data. In particular, we discuss how the Syntactic Atlas of Italy (ASIt), a linguistic project that builds on a long-standing tradition of collecting and analyzing linguistic corpora, generalizes to a more recent project that focuses on the synchronic and diachronic analysis of the syntax of Italian and Portuguese relative clauses. The results presented are in line with the FLaReNet Strategic Agenda, which highlighted the most pressing needs of research areas such as Natural Language Processing and presented a set of recommendations for the development and progress of language resources in Europe.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1709/
PDF https://www.aclweb.org/anthology/L16-1709
PWC https://paperswithcode.com/paper/designing-a-long-lasting-linguistic-project
Repo
Framework

Unsupervised Stemmer for Arabic Tweets

Title Unsupervised Stemmer for Arabic Tweets
Authors Fahad Albogamy, Allan Ramsay
Abstract Stemming is an essential processing step in a wide range of high-level text processing applications such as information extraction, machine translation and sentiment analysis. It is used to reduce words to their stems. Many stemming algorithms have been developed for Modern Standard Arabic (MSA). Although Arabic tweets and MSA are closely related and share many characteristics, there are substantial differences between them in lexicon and syntax. In this paper, we introduce a light stemmer for Arabic tweets. Our results show improvements over the performance of a number of well-known stemmers for Arabic.
Tasks Machine Translation, Sentiment Analysis
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-3912/
PDF https://www.aclweb.org/anthology/W16-3912
PWC https://paperswithcode.com/paper/unsupervised-stemmer-for-arabic-tweets
Repo
Framework
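The paper's stemmer is unsupervised, and the abstract does not describe how it works. For context only, the sketch below shows conventional rule-based light stemming for MSA: strip at most one common prefix (such as the definite article ال), then common suffixes, as long as enough of the word remains. The affix lists are modeled on commonly used MSA light-stemming lists but are an assumption here, as is the hypothetical minimum stem length of three letters; this is not the authors' algorithm.

```python
# Affix lists modeled on common MSA light-stemming lists (an assumption, not from the paper).
PREFIXES = ["وال", "بال", "كال", "فال", "ال", "لل", "و"]
SUFFIXES = ["ها", "ان", "ات", "ون", "ين", "يه", "ية", "ه", "ة", "ي"]
MIN_STEM = 3  # hypothetical minimum number of letters to keep


def light_stem(word: str) -> str:
    # Strip at most one prefix, trying longer prefixes first.
    for prefix in sorted(PREFIXES, key=len, reverse=True):
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM:
            word = word[len(prefix):]
            break
    # Then try each suffix once, in list order, while enough letters remain.
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            word = word[: -len(suffix)]
    return word


for w in ["والكتاب", "كتابها", "المدرسة", "ولد"]:
    print(w, "->", light_stem(w))
```

Fixed affix lists like these are tuned to MSA; since tweets differ from MSA in lexicon and syntax, as the abstract notes, such rules can miss or mangle tweet vocabulary.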

ECNU at SemEval-2016 Task 4: An Empirical Investigation of Traditional NLP Features and Word Embedding Features for Sentence-level and Topic-level Sentiment Analysis in Twitter

Title ECNU at SemEval-2016 Task 4: An Empirical Investigation of Traditional NLP Features and Word Embedding Features for Sentence-level and Topic-level Sentiment Analysis in Twitter
Authors Yunxiao Zhou, Zhihua Zhang, Man Lan
Abstract
Tasks Feature Engineering, Language Modelling, Sentiment Analysis
Published 2016-06-01
URL https://www.aclweb.org/anthology/S16-1040/
PDF https://www.aclweb.org/anthology/S16-1040
PWC https://paperswithcode.com/paper/ecnu-at-semeval-2016-task-4-an-empirical
Repo
Framework

A Bilingual Discourse Corpus and Its Applications

Title A Bilingual Discourse Corpus and Its Applications
Authors Yang Liu, Jiajun Zhang, Chengqing Zong, Yating Yang, Xi Zhou
Abstract Existing discourse research focuses only on monolingual settings, and the inconsistency between languages limits the power of discourse theory in multilingual applications such as machine translation. To address this issue, we design and build a bilingual discourse corpus in which we are currently defining and annotating bilingual elementary discourse units (BEDUs). The BEDUs are then organized into hierarchical structures. Following this annotation style, we have annotated nearly 20K LDC sentences. Finally, we design a bilingual-discourse-based method for machine translation evaluation and show the effectiveness of our bilingual discourse annotations.
Tasks Machine Translation
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1159/
PDF https://www.aclweb.org/anthology/L16-1159
PWC https://paperswithcode.com/paper/a-bilingual-discourse-corpus-and-its
Repo
Framework

Statistical Script Learning with Recurrent Neural Networks

Title Statistical Script Learning with Recurrent Neural Networks
Authors Karl Pichotta, Raymond Mooney
Abstract
Tasks Coreference Resolution, Question Answering, Semantic Role Labeling
Published 2016-11-01
URL https://www.aclweb.org/anthology/W16-6003/
PDF https://www.aclweb.org/anthology/W16-6003
PWC https://paperswithcode.com/paper/statistical-script-learning-with-recurrent
Repo
Framework