Paper Group NANR 226
‘BonTen’ -- Corpus Concordance System for ‘NINJAL Web Japanese Corpus’
Title | ‘BonTen’ -- Corpus Concordance System for ‘NINJAL Web Japanese Corpus’ |
Authors | Masayuki Asahara, Kazuya Kawahara, Yuya Takei, Hideto Masuoka, Yasuko Ohba, Yuki Torii, Toru Morii, Yuki Tanaka, Kikuo Maekawa, Sachi Kato, Hikari Konishi |
Abstract | The National Institute for Japanese Language and Linguistics, Japan (NINJAL) has undertaken a corpus compilation project to construct a web corpus for linguistic research comprising ten billion words. The project is divided into four parts: page collection, linguistic analysis, development of the corpus concordance system, and preservation. This article presents the corpus concordance system named ‘BonTen’, which enables the ten-billion-word corpus to be queried by a character string, a sequence of morphological information, or a subtree of the syntactic dependency structure. |
Tasks | Morphological Analysis |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/C16-2006/ |
PWC | https://paperswithcode.com/paper/bonten-a-corpus-concordance-system-for-ninjal |
Repo | |
Framework | |
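The string-query mode described in the BonTen abstract is essentially keyword-in-context (KWIC) concordance search. The following is an illustrative sketch, not the BonTen implementation; the corpus string and window size are made up:

```python
def kwic(corpus, query, window=10):
    """Return keyword-in-context hits as (left context, match, right context)."""
    hits = []
    start = corpus.find(query)
    while start != -1:
        end = start + len(query)
        hits.append((corpus[max(0, start - window):start],
                     query,
                     corpus[end:end + window]))
        start = corpus.find(query, end)
    return hits

hits = kwic("the cat sat on the mat near the cat door", "cat")
```

A production system over ten billion words would of course use an index (e.g. a suffix array) rather than a linear scan; the sketch only shows the query/result shape.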
Lexical Resources to Enrich English Malayalam Machine Translation
Title | Lexical Resources to Enrich English Malayalam Machine Translation |
Authors | Sreelekha S, Pushpak Bhattacharyya |
Abstract | In this paper we present our work on the usage of lexical resources for English-Malayalam machine translation. We describe the comparative performance of different Statistical Machine Translation (SMT) systems built on top of a phrase-based SMT baseline. We explore different ways of utilizing lexical resources to improve the quality of English-Malayalam statistical machine translation. In order to enrich the training corpus, we augmented the lexical resources in two ways: (a) additional vocabulary and (b) inflected verbal forms. Lexical resources include the IndoWordNet semantic relation set, lexical words, and verb phrases. We describe case studies and evaluations and give a detailed error analysis for both Malayalam-to-English and English-to-Malayalam machine translation systems. We observed significant improvement in evaluations of translation quality. Lexical resources do help uplift performance when parallel corpora are scanty. |
Tasks | Machine Translation |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1098/ |
PWC | https://paperswithcode.com/paper/lexical-resources-to-enrich-english-malayalam |
Repo | |
Framework | |
Quantifying sentence complexity based on eye-tracking measures
Title | Quantifying sentence complexity based on eye-tracking measures |
Authors | Abhinav Deep Singh, Poojan Mehta, Samar Husain, Rajkumar Rajakrishnan |
Abstract | Eye-tracking reading times have been attested to reflect cognitive processes underlying sentence comprehension. However, the use of reading times in NLP applications is an underexplored area of research. In this initial work we build an automatic system to assess sentence complexity using automatically predicted eye-tracking reading time measures and demonstrate the efficacy of these reading times for a well known NLP task, namely, readability assessment. We use a machine learning model and a set of features known to be significant predictors of reading times in order to learn per-word reading times from a corpus of English text having reading times of human readers. Subsequently, we use the model to predict reading times for novel text in the context of the aforementioned task. A model based only on reading times gave competitive results compared to the systems that use extensive syntactic features to compute linguistic complexity. Our work, to the best of our knowledge, is the first study to show that automatically predicted reading times can successfully model the difficulty of a text and can be deployed in practical text processing applications. |
Tasks | Eye Tracking, Part-Of-Speech Tagging, Sarcasm Detection, Text Simplification |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-4123/ |
PWC | https://paperswithcode.com/paper/quantifying-sentence-complexity-based-on-eye |
Repo | |
Framework | |
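The pipeline in the abstract above, learning per-word reading times from features known to predict them and then scoring new sentences by their mean predicted time, can be sketched in miniature. This is an illustrative sketch, not the authors' system: the feature set is reduced to word length and a (hypothetical) corpus frequency, and the learner is plain least squares rather than the model used in the paper:

```python
import math

def fit_ols(X, y):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y,
    solved by Gaussian elimination; adequate for a handful of features."""
    n, d = len(X), len(X[0])
    A = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(d)]
         for a in range(d)]
    b = [sum(X[i][a] * y[i] for i in range(n)) for a in range(d)]
    for col in range(d):
        # Partial pivoting, then eliminate below the pivot
        piv = max(range(col, d), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, d):
            f = A[r][col] / A[col][col]
            for c in range(col, d):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    w = [0.0] * d
    for r in range(d - 1, -1, -1):  # back substitution
        w[r] = (b[r] - sum(A[r][c] * w[c] for c in range(r + 1, d))) / A[r][r]
    return w

def word_features(word, freq):
    # Intercept, word length, and negative log frequency -- two classic
    # predictors of per-word reading time (the frequency input is assumed).
    return [1.0, float(len(word)), -math.log(freq)]

def sentence_complexity(words, freqs, w):
    """Score a sentence by its mean predicted per-word reading time."""
    preds = [sum(wi * fi for wi, fi in zip(w, word_features(wd, fq)))
             for wd, fq in zip(words, freqs)]
    return sum(preds) / len(preds)
```

Given gold per-word reading times, `fit_ols` recovers the feature weights, and `sentence_complexity` averages the predicted times over a novel sentence as a readability score.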
A Crowdsourced Database of Event Sequence Descriptions for the Acquisition of High-quality Script Knowledge
Title | A Crowdsourced Database of Event Sequence Descriptions for the Acquisition of High-quality Script Knowledge |
Authors | Lilian D. A. Wanzare, Alessandra Zarcone, Stefan Thater, Manfred Pinkal |
Abstract | Scripts are standardized event sequences describing typical everyday activities, which play an important role in the computational modeling of cognitive abilities (in particular for natural language processing). We present a large-scale crowdsourced collection of explicit linguistic descriptions of script-specific event sequences (40 scenarios with 100 sequences each). The corpus is enriched with crowdsourced alignment annotation on a subset of the event descriptions, to be used in future work as seed data for automatic alignment of event descriptions (for example via clustering). The event descriptions to be aligned were chosen among those expected to have the strongest corrective effect on the clustering algorithm. The alignment annotation was evaluated against a gold standard of expert annotators. The resulting database of partially-aligned script-event descriptions provides a sound empirical basis for inducing high-quality script knowledge, as well as for any task involving alignment and paraphrase detection of events. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1556/ |
PWC | https://paperswithcode.com/paper/a-crowdsourced-database-of-event-sequence |
Repo | |
Framework | |
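The alignment of script-specific event descriptions mentioned in the abstract can be illustrated with a toy baseline: pair descriptions from two sequences by lexical overlap. This is a hedged sketch of the general task, not the paper's clustering method; the scenario texts and the threshold are invented:

```python
def jaccard(a, b):
    """Token-level Jaccard similarity between two event descriptions."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def align_events(seq1, seq2, threshold=0.2):
    """Greedily pair each description in seq1 with its most similar
    unused description in seq2, keeping pairs above the threshold."""
    used, pairs = set(), []
    for i, d1 in enumerate(seq1):
        best, best_sim = None, threshold
        for j, d2 in enumerate(seq2):
            if j in used:
                continue
            sim = jaccard(d1, d2)
            if sim > best_sim:
                best, best_sim = j, sim
        if best is not None:
            used.add(best)
            pairs.append((i, best))
    return pairs

seq1 = ["enter the kitchen", "boil the water", "pour water into cup"]
seq2 = ["boil water", "pour the water", "drink tea"]
pairs = align_events(seq1, seq2)
```

Crowdsourced alignments such as those in the corpus would serve as seed data to correct exactly the cases where surface overlap like this fails (paraphrases with no shared tokens).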
Bitextor’s participation in WMT’16: shared task on document alignment
Title | Bitextor’s participation in WMT’16: shared task on document alignment |
Authors | Miquel Esplà-Gomis, Mikel Forcada, Sergio Ortiz-Rojas, Jorge Ferrández-Tordera |
Abstract | |
Tasks | Machine Translation |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/W16-2367/ |
PWC | https://paperswithcode.com/paper/bitextors-participation-in-wmt16-shared-task |
Repo | |
Framework | |
Answer Presentation in Question Answering over Linked Data using Typed Dependency Subtree Patterns
Title | Answer Presentation in Question Answering over Linked Data using Typed Dependency Subtree Patterns |
Authors | Rivindu Perera, Parma Nand |
Abstract | |
Tasks | Dependency Parsing, Information Retrieval, Question Answering |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-4406/ |
PWC | https://paperswithcode.com/paper/answer-presentation-in-question-answering |
Repo | |
Framework | |
Enhancing STEM Motivation through Personal and Communal Values: NLP for Assessment of Utility Value in Student Writing
Title | Enhancing STEM Motivation through Personal and Communal Values: NLP for Assessment of Utility Value in Student Writing |
Authors | Beata Beigman Klebanov, Jill Burstein, Judith Harackiewicz, Stacy Priniski, Matthew Mulholland |
Abstract | |
Tasks | |
Published | 2016-06-01 |
URL | https://www.aclweb.org/anthology/W16-0522/ |
PWC | https://paperswithcode.com/paper/enhancing-stem-motivation-through-personal |
Repo | |
Framework | |
MAZA at SemEval-2016 Task 11: Detecting Lexical Complexity Using a Decision Stump Meta-Classifier
Title | MAZA at SemEval-2016 Task 11: Detecting Lexical Complexity Using a Decision Stump Meta-Classifier |
Authors | Shervin Malmasi, Marcos Zampieri |
Abstract | |
Tasks | Complex Word Identification, Lexical Simplification, Text Classification, Text Simplification |
Published | 2016-06-01 |
URL | https://www.aclweb.org/anthology/S16-1153/ |
PWC | https://paperswithcode.com/paper/maza-at-semeval-2016-task-11-detecting |
Repo | |
Framework | |
Garuda & Bhasha at SemEval-2016 Task 11: Complex Word Identification Using Aggregated Learning Models
Title | Garuda & Bhasha at SemEval-2016 Task 11: Complex Word Identification Using Aggregated Learning Models |
Authors | Prafulla Choubey, Shubham Pateria |
Abstract | |
Tasks | Complex Word Identification, Lexical Simplification |
Published | 2016-06-01 |
URL | https://www.aclweb.org/anthology/S16-1156/ |
PWC | https://paperswithcode.com/paper/garuda-bhasha-at-semeval-2016-task-11-complex |
Repo | |
Framework | |
Using a Cross-Language Information Retrieval System based on OHSUMED to Evaluate the Moses and KantanMT Statistical Machine Translation Systems
Title | Using a Cross-Language Information Retrieval System based on OHSUMED to Evaluate the Moses and KantanMT Statistical Machine Translation Systems |
Authors | Nikolaos Katris, Richard Sutcliffe, Theodore Kalamboukis |
Abstract | The objective of this paper was to evaluate the performance of two statistical machine translation (SMT) systems within a cross-language information retrieval (CLIR) architecture and examine if there is a correlation between translation quality and CLIR performance. The SMT systems were KantanMT, a cloud-based machine translation (MT) platform, and Moses, an open-source MT application. First we trained both systems using the same language resources: the EMEA corpus for the translation model and language model and the QTLP corpus for tuning. Then we translated the 63 queries of the OHSUMED test collection from Greek into English using both MT systems. Next, we ran the queries on the document collection using Apache Solr to get a list of the top ten matches. The results were compared to the OHSUMED gold standard. KantanMT achieved higher average precision and F-measure than Moses, while both systems produced the same recall score. We also calculated the BLEU score for each system using the ECDC corpus. Moses achieved a higher BLEU score than KantanMT. Finally, we also tested the IR performance of the original English queries. This work overall showed that CLIR performance can be better even when BLEU score is worse. |
Tasks | Information Retrieval, Language Modelling, Machine Translation |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1057/ |
PWC | https://paperswithcode.com/paper/using-a-cross-language-information-retrieval |
Repo | |
Framework | |
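The evaluation loop described in the OHSUMED abstract, retrieve the top ten documents per translated query and compare them to a gold standard, reduces to per-query precision, recall, and F-measure at a cutoff. A minimal sketch (document IDs are illustrative, not from OHSUMED):

```python
def prf_at_k(retrieved, relevant, k=10):
    """Precision, recall and F1 for the top-k retrieved documents,
    judged against a gold-standard set of relevant documents."""
    top = retrieved[:k]
    hits = sum(1 for doc in top if doc in relevant)
    p = hits / len(top) if top else 0.0
    r = hits / len(relevant) if relevant else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

p, r, f = prf_at_k(["d3", "d7", "d1"], {"d1", "d2"}, k=10)
```

Averaging these per-query scores over the 63 queries gives the system-level figures the paper compares against BLEU.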
Improvement of VerbNet-like resources by frame typing
Title | Improvement of VerbNet-like resources by frame typing |
Authors | Laurence Danlos, Matthieu Constant, Lucie Barque |
Abstract | Verbenet is a French lexicon developed by “translation” of its English counterpart, VerbNet (Kipper-Schuler, 2005), and treatment of the specificities of French syntax (Pradet et al., 2014; Danlos et al., 2016). One difficulty encountered in its development springs from the fact that the list of (potentially numerous) frames has no internal organization. This paper proposes a type system for frames that shows whether two frames are variants of a given alternation. Frame typing facilitates coherence checking of the resource in a “virtuous circle”. We present the principles underlying a program we developed and used to automatically type frames in Verbenet. We also show that our system is portable to other languages. |
Tasks | Machine Translation, Question Answering, Stock Market Prediction |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-3809/ |
PWC | https://paperswithcode.com/paper/improvement-of-verbnet-like-resources-by |
Repo | |
Framework | |
Enriching TimeBank: Towards a more precise annotation of temporal relations in a text
Title | Enriching TimeBank: Towards a more precise annotation of temporal relations in a text |
Authors | Volker Gast, Lennart Bierkandt, Stephan Druskat, Christoph Rzymski |
Abstract | We propose a way of enriching the TimeML annotations of TimeBank by adding information about the Topic Time in terms of Klein (1994). The annotations are partly automatic, partly inferential and partly manual. The corpus was converted into the native format of the annotation software GraphAnno and POS-tagged using the Stanford bidirectional dependency network tagger. On top of each finite verb, a FIN-node with tense information was created, and on top of any FIN-node, a TOPICTIME-node, in accordance with Klein's (1994) treatment of finiteness as the linguistic correlate of the Topic Time. Each TOPICTIME-node is linked to a MAKEINSTANCE-node representing an (instantiated) event in TimeML (Pustejovsky et al. 2005), the markup language used for the annotation of TimeBank. For such links we introduce a new category, ELINK. ELINKs capture the relationship between the Topic Time (TT) and the Time of Situation (TSit) and have an aspectual interpretation in Klein's (1994) theory. In addition to these automatic and inferential annotations, some TLINKs were added manually. Using an example from the corpus, we show that the inclusion of the Topic Time in the annotations allows for a richer representation of the temporal structure than does TimeML. A way of representing this structure in a diagrammatic form similar to the T-Box format (Verhagen, 2007) is proposed. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1608/ |
PWC | https://paperswithcode.com/paper/enriching-timebank-towards-a-more-precise |
Repo | |
Framework | |
Integrating Word Embedding Offsets into the Espresso System for Part-Whole Relation Extraction
Title | Integrating Word Embedding Offsets into the Espresso System for Part-Whole Relation Extraction |
Authors | Van-Thuy Phi, Yuji Matsumoto |
Abstract | |
Tasks | Relation Extraction |
Published | 2016-10-01 |
URL | https://www.aclweb.org/anthology/Y16-2015/ |
PWC | https://paperswithcode.com/paper/integrating-word-embedding-offsets-into-the |
Repo | |
Framework | |
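The embedding-offset idea named in the title above can be illustrated independently of the Espresso system: average the vector offsets of known (whole, part) seed pairs, then score a candidate pair by the cosine between its own offset and that average. This is a hedged sketch of the offset technique in general, not the paper's method; the toy embeddings and seed pairs are invented:

```python
import math

def sub(a, b):
    """Element-wise vector difference."""
    return [x - y for x, y in zip(a, b)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def offset_score(emb, seeds, candidate):
    """Score a candidate (whole, part) pair by the cosine between its
    embedding offset and the mean offset of the seed pairs."""
    offsets = [sub(emb[w], emb[p]) for w, p in seeds]
    dim = len(offsets[0])
    mean = [sum(o[i] for o in offsets) / len(offsets) for i in range(dim)]
    whole, part = candidate
    return cosine(sub(emb[whole], emb[part]), mean)

emb = {"car": [1.0, 2.0], "wheel": [0.0, 1.0], "tree": [2.0, 3.0],
       "branch": [1.0, 2.0], "house": [3.0, 1.0], "door": [2.0, 0.0],
       "sky": [0.0, 0.0]}
seeds = [("car", "wheel"), ("tree", "branch")]
```

Pairs whose offset points the same way as the seed pairs score near 1; unrelated pairs score low or negative, which is what makes the offset usable as an extra feature in a pattern-based extractor.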
Scalable Statistical Relational Learning for NLP
Title | Scalable Statistical Relational Learning for NLP |
Authors | William Yang Wang, William Cohen |
Abstract | |
Tasks | Coreference Resolution, Relational Reasoning, Semantic Parsing, Sentiment Analysis, Text Classification |
Published | 2016-06-01 |
URL | https://www.aclweb.org/anthology/N16-4005/ |
PWC | https://paperswithcode.com/paper/scalable-statistical-relational-learning-for |
Repo | |
Framework | |
Bootstrapping a Hybrid MT System to a New Language Pair
Title | Bootstrapping a Hybrid MT System to a New Language Pair |
Authors | João António Rodrigues, Nuno Rendeiro, Andreia Querido, Sanja Štajner, António Branco |
Abstract | The usual concern when opting for a rule-based or a hybrid machine translation (MT) system is how much effort is required to adapt the system to a different language pair or a new domain. In this paper, we describe a way of adapting an existing hybrid MT system to a new language pair, and show that such a system can outperform a standard phrase-based statistical machine translation system with an average of 10 person-months of work. This is specifically important in the case of domain-specific MT for which there is not enough parallel data for training a statistical machine translation system. |
Tasks | Machine Translation |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1438/ |
PWC | https://paperswithcode.com/paper/bootstrapping-a-hybrid-mt-system-to-a-new |
Repo | |
Framework | |