May 4, 2019

1943 words 10 mins read

Paper Group NANR 226


‘BonTen’ -- Corpus Concordance System for ‘NINJAL Web Japanese Corpus’. Lexical Resources to Enrich English Malayalam Machine Translation. Quantifying sentence complexity based on eye-tracking measures. A Crowdsourced Database of Event Sequence Descriptions for the Acquisition of High-quality Script Knowledge. Bitextor’s participation in WMT’16: sh …

‘BonTen’ -- Corpus Concordance System for ‘NINJAL Web Japanese Corpus’

Title ‘BonTen’ -- Corpus Concordance System for ‘NINJAL Web Japanese Corpus’
Authors Masayuki Asahara, Kazuya Kawahara, Yuya Takei, Hideto Masuoka, Yasuko Ohba, Yuki Torii, Toru Morii, Yuki Tanaka, Kikuo Maekawa, Sachi Kato, Hikari Konishi
Abstract The National Institute for Japanese Language and Linguistics, Japan (NINJAL) has undertaken a corpus compilation project to construct a web corpus for linguistic research comprising ten billion words. The project is divided into four parts: page collection, linguistic analysis, development of the corpus concordance system, and preservation. This article presents the corpus concordance system named ‘BonTen’, which enables the ten-billion-word corpus to be queried by string, by a sequence of morphological information, or by a subtree of the syntactic dependency structure.
Tasks Morphological Analysis
Published 2016-12-01
URL https://www.aclweb.org/anthology/C16-2006/
PDF https://www.aclweb.org/anthology/C16-2006
PWC https://paperswithcode.com/paper/bonten-a-corpus-concordance-system-for-ninjal
Repo
Framework
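The string-query mode described in the abstract can be illustrated with a minimal keyword-in-context (KWIC) concordance. This is a toy sketch, not BonTen's implementation: the corpus, window size, and function name are all hypothetical, and BonTen additionally supports morphological-sequence and dependency-subtree queries over an indexed ten-billion-word corpus.

```python
# Minimal keyword-in-context (KWIC) concordance over a tokenized corpus.
# A toy stand-in for string queries against a corpus concordance system.
from typing import List, Tuple


def kwic(tokens: List[str], query: str, window: int = 3) -> List[Tuple[str, str, str]]:
    """Return (left context, keyword, right context) for each hit."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok == query:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits


corpus = "the cat sat on the mat and the dog sat too".split()
for left, kw, right in kwic(corpus, "sat"):
    print(f"{left:>15} | {kw} | {right}")
```

A production concordancer would of course query a precomputed index rather than scan tokens linearly; the linear scan here only shows the query semantics.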

Lexical Resources to Enrich English Malayalam Machine Translation

Title Lexical Resources to Enrich English Malayalam Machine Translation
Authors Sreelekha S, Pushpak Bhattacharyya
Abstract In this paper we present our work on the usage of lexical resources for English–Malayalam machine translation. We describe the comparative performance of different statistical machine translation (SMT) systems built on top of a phrase-based SMT system as baseline. We explore different ways of utilizing lexical resources to improve the quality of English–Malayalam statistical machine translation. In order to enrich the training corpus we have augmented the lexical resources in two ways: (a) additional vocabulary and (b) inflected verbal forms. Lexical resources include the IndoWordnet semantic relation set, lexical words, verb phrases, etc. We describe case studies and evaluations, and give a detailed error analysis for both Malayalam-to-English and English-to-Malayalam machine translation systems. We observed significant improvement in evaluations of translation quality. Lexical resources do help uplift performance when parallel corpora are scanty.
Tasks Machine Translation
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1098/
PDF https://www.aclweb.org/anthology/L16-1098
PWC https://paperswithcode.com/paper/lexical-resources-to-enrich-english-malayalam
Repo
Framework
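The corpus-enrichment idea in the abstract, appending dictionary entries to the parallel training data, can be sketched in a few lines. This is a hedged illustration: the function name and the toy English/Malayalam pairs are hypothetical, and the paper's actual pipeline feeds the enriched corpus to an SMT trainer such as Moses.

```python
# Sketch of training-corpus augmentation with lexical resources: bilingual
# dictionary entries are appended to the parallel data as extra one-line
# "sentence pairs", giving the word aligner direct evidence for vocabulary
# that is rare or absent in the original parallel corpus.
from typing import List, Tuple

Pair = Tuple[str, str]


def augment_parallel_corpus(parallel: List[Pair], dictionary: List[Pair]) -> List[Pair]:
    """Return the parallel corpus enriched with dictionary entries."""
    return parallel + [(src, tgt) for src, tgt in dictionary]


parallel = [("a red book", "oru chuvanna pusthakam")]
dictionary = [("book", "pusthakam"), ("red", "chuvanna")]
train = augment_parallel_corpus(parallel, dictionary)
print(len(train))  # 3
```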

Quantifying sentence complexity based on eye-tracking measures

Title Quantifying sentence complexity based on eye-tracking measures
Authors Abhinav Deep Singh, Poojan Mehta, Samar Husain, Rajkumar Rajakrishnan
Abstract Eye-tracking reading times have been attested to reflect cognitive processes underlying sentence comprehension. However, the use of reading times in NLP applications is an underexplored area of research. In this initial work we build an automatic system to assess sentence complexity using automatically predicted eye-tracking reading time measures and demonstrate the efficacy of these reading times for a well known NLP task, namely, readability assessment. We use a machine learning model and a set of features known to be significant predictors of reading times in order to learn per-word reading times from a corpus of English text having reading times of human readers. Subsequently, we use the model to predict reading times for novel text in the context of the aforementioned task. A model based only on reading times gave competitive results compared to the systems that use extensive syntactic features to compute linguistic complexity. Our work, to the best of our knowledge, is the first study to show that automatically predicted reading times can successfully model the difficulty of a text and can be deployed in practical text processing applications.
Tasks Eye Tracking, Part-Of-Speech Tagging, Sarcasm Detection, Text Simplification
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-4123/
PDF https://www.aclweb.org/anthology/W16-4123
PWC https://paperswithcode.com/paper/quantifying-sentence-complexity-based-on-eye
Repo
Framework
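The two-stage pipeline in the abstract (predict per-word reading times, then aggregate them into a complexity score) can be sketched as follows. The features and coefficients here are made up for illustration; the paper learns per-word reading times with a machine-learning model fitted on eye-tracking data from human readers.

```python
# Hedged sketch of the two-stage pipeline: (1) predict a per-word reading
# time from surface features known to correlate with reading times (here
# only word length and corpus frequency, with invented coefficients),
# (2) aggregate the predicted times into a sentence-complexity score.
import math
from collections import Counter


def predict_reading_time(word, freq, total, a=20.0, b=-15.0, base=80.0):
    """Toy linear model: longer and rarer words get longer times (ms)."""
    rel_freq = (freq + 1) / (total + 1)  # add-one smoothing
    return base + a * len(word) + b * math.log10(rel_freq * 1000 + 1)


def sentence_complexity(sentence, counts, total):
    """Mean predicted per-word reading time as a complexity score."""
    words = sentence.lower().split()
    times = [predict_reading_time(w, counts[w], total) for w in words]
    return sum(times) / len(times)


corpus = "the cat sat on the mat the cat ran".split()
counts, total = Counter(corpus), len(corpus)
easy = sentence_complexity("the cat sat", counts, total)
hard = sentence_complexity("heterogeneous morphological analyzers", counts, total)
print(easy < hard)  # True: longer, rarer words score as more complex
```

In the paper the aggregated reading-time features then feed a readability classifier; here the mean time itself stands in for that final score.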

A Crowdsourced Database of Event Sequence Descriptions for the Acquisition of High-quality Script Knowledge

Title A Crowdsourced Database of Event Sequence Descriptions for the Acquisition of High-quality Script Knowledge
Authors Lilian D. A. Wanzare, Alessandra Zarcone, Stefan Thater, Manfred Pinkal
Abstract Scripts are standardized event sequences describing typical everyday activities, which play an important role in the computational modeling of cognitive abilities (in particular for natural language processing). We present a large-scale crowdsourced collection of explicit linguistic descriptions of script-specific event sequences (40 scenarios with 100 sequences each). The corpus is enriched with crowdsourced alignment annotation on a subset of the event descriptions, to be used in future work as seed data for automatic alignment of event descriptions (for example via clustering). The event descriptions to be aligned were chosen among those expected to have the strongest corrective effect on the clustering algorithm. The alignment annotation was evaluated against a gold standard of expert annotators. The resulting database of partially-aligned script-event descriptions provides a sound empirical basis for inducing high-quality script knowledge, as well as for any task involving alignment and paraphrase detection of events.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1556/
PDF https://www.aclweb.org/anthology/L16-1556
PWC https://paperswithcode.com/paper/a-crowdsourced-database-of-event-sequence
Repo
Framework

Bitextor’s participation in WMT’16: shared task on document alignment

Title Bitextor’s participation in WMT’16: shared task on document alignment
Authors Miquel Esplà-Gomis, Mikel Forcada, Sergio Ortiz-Rojas, Jorge Ferrández-Tordera
Abstract
Tasks Machine Translation
Published 2016-08-01
URL https://www.aclweb.org/anthology/W16-2367/
PDF https://www.aclweb.org/anthology/W16-2367
PWC https://paperswithcode.com/paper/bitextors-participation-in-wmt16-shared-task
Repo
Framework

Answer Presentation in Question Answering over Linked Data using Typed Dependency Subtree Patterns

Title Answer Presentation in Question Answering over Linked Data using Typed Dependency Subtree Patterns
Authors Rivindu Perera, Parma Nand
Abstract
Tasks Dependency Parsing, Information Retrieval, Question Answering
Published 2016-12-01
URL https://www.aclweb.org/anthology/papers/W16-4406/w16-4406
PDF https://www.aclweb.org/anthology/W16-4406
PWC https://paperswithcode.com/paper/answer-presentation-in-question-answering
Repo
Framework

Enhancing STEM Motivation through Personal and Communal Values: NLP for Assessment of Utility Value in Student Writing

Title Enhancing STEM Motivation through Personal and Communal Values: NLP for Assessment of Utility Value in Student Writing
Authors Beata Beigman Klebanov, Jill Burstein, Judith Harackiewicz, Stacy Priniski, Matthew Mulholland
Abstract
Tasks
Published 2016-06-01
URL https://www.aclweb.org/anthology/papers/W16-0522/w16-0522
PDF https://www.aclweb.org/anthology/W16-0522
PWC https://paperswithcode.com/paper/enhancing-stem-motivation-through-personal
Repo
Framework

MAZA at SemEval-2016 Task 11: Detecting Lexical Complexity Using a Decision Stump Meta-Classifier

Title MAZA at SemEval-2016 Task 11: Detecting Lexical Complexity Using a Decision Stump Meta-Classifier
Authors Shervin Malmasi, Marcos Zampieri
Abstract
Tasks Complex Word Identification, Lexical Simplification, Text Classification, Text Simplification
Published 2016-06-01
URL https://www.aclweb.org/anthology/S16-1153/
PDF https://www.aclweb.org/anthology/S16-1153
PWC https://paperswithcode.com/paper/maza-at-semeval-2016-task-11-detecting
Repo
Framework

Garuda & Bhasha at SemEval-2016 Task 11: Complex Word Identification Using Aggregated Learning Models

Title Garuda & Bhasha at SemEval-2016 Task 11: Complex Word Identification Using Aggregated Learning Models
Authors Prafulla Choubey, Shubham Pateria
Abstract
Tasks Complex Word Identification, Lexical Simplification
Published 2016-06-01
URL https://www.aclweb.org/anthology/S16-1156/
PDF https://www.aclweb.org/anthology/S16-1156
PWC https://paperswithcode.com/paper/garuda-bhasha-at-semeval-2016-task-11-complex
Repo
Framework

Using a Cross-Language Information Retrieval System based on OHSUMED to Evaluate the Moses and KantanMT Statistical Machine Translation Systems

Title Using a Cross-Language Information Retrieval System based on OHSUMED to Evaluate the Moses and KantanMT Statistical Machine Translation Systems
Authors Nikolaos Katris, Richard Sutcliffe, Theodore Kalamboukis
Abstract The objective of this paper was to evaluate the performance of two statistical machine translation (SMT) systems within a cross-language information retrieval (CLIR) architecture and to examine whether there is a correlation between translation quality and CLIR performance. The SMT systems were KantanMT, a cloud-based machine translation (MT) platform, and Moses, an open-source MT application. First we trained both systems using the same language resources: the EMEA corpus for the translation model and language model, and the QTLP corpus for tuning. Then we translated the 63 queries of the OHSUMED test collection from Greek into English using both MT systems. Next, we ran the queries on the document collection using Apache Solr to get a list of the top ten matches. The results were compared to the OHSUMED gold standard. KantanMT achieved higher average precision and F-measure than Moses, while both systems produced the same recall score. We also calculated the BLEU score for each system using the ECDC corpus. Moses achieved a higher BLEU score than KantanMT. Finally, we also tested the IR performance of the original English queries. Overall, this work showed that CLIR performance can be better even when the BLEU score is worse.
Tasks Information Retrieval, Language Modelling, Machine Translation
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1057/
PDF https://www.aclweb.org/anthology/L16-1057
PWC https://paperswithcode.com/paper/using-a-cross-language-information-retrieval
Repo
Framework
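The evaluation step described above, scoring a retrieved top-10 list against relevance judgments, reduces to set-overlap precision, recall, and F-measure. A minimal sketch, with hypothetical document IDs standing in for the OHSUMED judgments:

```python
# Per-query retrieval scoring: the top-10 documents retrieved for a
# translated query are compared against the gold relevance judgments.
def precision_recall_f1(retrieved, relevant):
    """Compute P, R, and F1 from retrieved and relevant document ID lists."""
    retrieved, relevant = set(retrieved), set(relevant)
    tp = len(retrieved & relevant)
    p = tp / len(retrieved) if retrieved else 0.0
    r = tp / len(relevant) if relevant else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


# One hypothetical query: 10 documents retrieved, 2 of the 5 relevant
# documents found among them.
retrieved = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8", "d9", "d10"]
relevant = ["d2", "d4", "d11", "d12", "d13"]
p, r, f1 = precision_recall_f1(retrieved, relevant)
print(round(p, 2), round(r, 2), round(f1, 3))  # 0.2 0.4 0.267
```

Averaging these per-query scores over all 63 queries yields the system-level figures the paper compares across KantanMT, Moses, and the original English queries.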

Improvement of VerbNet-like resources by frame typing

Title Improvement of VerbNet-like resources by frame typing
Authors Laurence Danlos, Matthieu Constant, Lucie Barque
Abstract Verbenet is a French lexicon developed by ‘translation’ of its English counterpart, VerbNet (Kipper-Schuler, 2005), and treatment of the specificities of French syntax (Pradet et al., 2014; Danlos et al., 2016). One difficulty encountered in its development springs from the fact that the list of (potentially numerous) frames has no internal organization. This paper proposes a type system for frames that shows whether two frames are variants of a given alternation. Frame typing facilitates coherence checking of the resource in a ‘virtuous circle’. We present the principles underlying a program we developed and used to automatically type frames in Verbenet. We also show that our system is portable to other languages.
Tasks Machine Translation, Question Answering, Stock Market Prediction
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-3809/
PDF https://www.aclweb.org/anthology/W16-3809
PWC https://paperswithcode.com/paper/improvement-of-verbnet-like-resources-by
Repo
Framework

Enriching TimeBank: Towards a more precise annotation of temporal relations in a text

Title Enriching TimeBank: Towards a more precise annotation of temporal relations in a text
Authors Volker Gast, Lennart Bierkandt, Stephan Druskat, Christoph Rzymski
Abstract We propose a way of enriching the TimeML annotations of TimeBank by adding information about the Topic Time in terms of Klein (1994). The annotations are partly automatic, partly inferential and partly manual. The corpus was converted into the native format of the annotation software GraphAnno and POS-tagged using the Stanford bidirectional dependency network tagger. On top of each finite verb, a FIN node with tense information was created, and on top of each FIN node, a TOPICTIME node, in accordance with Klein’s (1994) treatment of finiteness as the linguistic correlate of the Topic Time. Each TOPICTIME node is linked to a MAKEINSTANCE node representing an (instantiated) event in TimeML (Pustejovsky et al. 2005), the markup language used for the annotation of TimeBank. For such links we introduce a new category, ELINK. ELINKs capture the relationship between the Topic Time (TT) and the Time of Situation (TSit) and have an aspectual interpretation in Klein’s (1994) theory. In addition to these automatic and inferential annotations, some TLINKs were added manually. Using an example from the corpus, we show that the inclusion of the Topic Time in the annotations allows for a richer representation of the temporal structure than does TimeML. A way of representing this structure in a diagrammatic form similar to the T-Box format (Verhagen, 2007) is proposed.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1608/
PDF https://www.aclweb.org/anthology/L16-1608
PWC https://paperswithcode.com/paper/enriching-timebank-towards-a-more-precise
Repo
Framework

Integrating Word Embedding Offsets into the Espresso System for Part-Whole Relation Extraction

Title Integrating Word Embedding Offsets into the Espresso System for Part-Whole Relation Extraction
Authors Van-Thuy Phi, Yuji Matsumoto
Abstract
Tasks Relation Extraction
Published 2016-10-01
URL https://www.aclweb.org/anthology/Y16-2015/
PDF https://www.aclweb.org/anthology/Y16-2015
PWC https://paperswithcode.com/paper/integrating-word-embedding-offsets-into-the
Repo
Framework

Scalable Statistical Relational Learning for NLP

Title Scalable Statistical Relational Learning for NLP
Authors William Yang Wang, William Cohen
Abstract
Tasks Coreference Resolution, Relational Reasoning, Semantic Parsing, Sentiment Analysis, Text Classification
Published 2016-06-01
URL https://www.aclweb.org/anthology/N16-4005/
PDF https://www.aclweb.org/anthology/N16-4005
PWC https://paperswithcode.com/paper/scalable-statistical-relational-learning-for
Repo
Framework

Bootstrapping a Hybrid MT System to a New Language Pair

Title Bootstrapping a Hybrid MT System to a New Language Pair
Authors João António Rodrigues, Nuno Rendeiro, Andreia Querido, Sanja Štajner, António Branco
Abstract The usual concern when opting for a rule-based or a hybrid machine translation (MT) system is how much effort is required to adapt the system to a different language pair or a new domain. In this paper, we describe a way of adapting an existing hybrid MT system to a new language pair, and show that such a system can outperform a standard phrase-based statistical machine translation system with an average of 10 persons/month of work. This is specifically important in the case of domain-specific MT for which there is not enough parallel data for training a statistical machine translation system.
Tasks Machine Translation
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1438/
PDF https://www.aclweb.org/anthology/L16-1438
PWC https://paperswithcode.com/paper/bootstrapping-a-hybrid-mt-system-to-a-new
Repo
Framework