Paper Group NANR 226
‘BonTen’ -- Corpus Concordance System for ‘NINJAL Web Japanese Corpus’
Title | ‘BonTen’ -- Corpus Concordance System for ‘NINJAL Web Japanese Corpus’ |
Authors | Masayuki Asahara, Kazuya Kawahara, Yuya Takei, Hideto Masuoka, Yasuko Ohba, Yuki Torii, Toru Morii, Yuki Tanaka, Kikuo Maekawa, Sachi Kato, Hikari Konishi |
Abstract | The National Institute for Japanese Language and Linguistics, Japan (NINJAL) has undertaken a corpus compilation project to construct a web corpus for linguistic research comprising ten billion words. The project is divided into four parts: page collection, linguistic analysis, development of the corpus concordance system, and preservation. This article presents the corpus concordance system named ‘BonTen’, which enables the ten-billion-word corpus to be queried by a character string, a sequence of morphological information, or a subtree of the syntactic dependency structure. |
Tasks | Morphological Analysis |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/C16-2006/ |
PWC | https://paperswithcode.com/paper/bonten-a-corpus-concordance-system-for-ninjal |
Repo | |
Framework | |
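The string-query mode described in the BonTen abstract is essentially keyword-in-context (KWIC) concordance search. The following is an illustrative sketch, not the BonTen implementation; the corpus string and window size are made up:

```python
def kwic(corpus, query, window=10):
    """Return keyword-in-context hits as (left context, match, right context)."""
    hits = []
    start = corpus.find(query)
    while start != -1:
        end = start + len(query)
        hits.append((corpus[max(0, start - window):start],
                     query,
                     corpus[end:end + window]))
        start = corpus.find(query, end)
    return hits

hits = kwic("the cat sat on the mat near the cat door", "cat")
```

A production system over ten billion words would of course use an index (e.g. a suffix array) rather than a linear scan; the sketch only shows the query/result shape.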
Lexical Resources to Enrich English Malayalam Machine Translation
Title | Lexical Resources to Enrich English Malayalam Machine Translation |
Authors | Sreelekha S, Pushpak Bhattacharyya |
Abstract | In this paper we present our work on the usage of lexical resources for English-Malayalam machine translation. We describe the comparative performance of different Statistical Machine Translation (SMT) systems built on top of a phrase-based SMT baseline. We explore different ways of utilizing lexical resources to improve the quality of English-Malayalam statistical machine translation. In order to enrich the training corpus, we augmented the lexical resources in two ways: (a) additional vocabulary and (b) inflected verbal forms. Lexical resources include the IndoWordNet semantic relation set, lexical words, and verb phrases. We describe case studies and evaluations and give a detailed error analysis for both Malayalam-to-English and English-to-Malayalam machine translation systems. We observed significant improvement in evaluations of translation quality. Lexical resources do help uplift performance when parallel corpora are scanty. |
Tasks | Machine Translation |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1098/ |
PWC | https://paperswithcode.com/paper/lexical-resources-to-enrich-english-malayalam |
Repo | |
Framework | |
Quantifying sentence complexity based on eye-tracking measures
Title | Quantifying sentence complexity based on eye-tracking measures |
Authors | Abhinav Deep Singh, Poojan Mehta, Samar Husain, Rajkumar Rajakrishnan |
Abstract | Eye-tracking reading times have been attested to reflect cognitive processes underlying sentence comprehension. However, the use of reading times in NLP applications is an underexplored area of research. In this initial work we build an automatic system to assess sentence complexity using automatically predicted eye-tracking reading time measures and demonstrate the efficacy of these reading times for a well known NLP task, namely, readability assessment. We use a machine learning model and a set of features known to be significant predictors of reading times in order to learn per-word reading times from a corpus of English text having reading times of human readers. Subsequently, we use the model to predict reading times for novel text in the context of the aforementioned task. A model based only on reading times gave competitive results compared to the systems that use extensive syntactic features to compute linguistic complexity. Our work, to the best of our knowledge, is the first study to show that automatically predicted reading times can successfully model the difficulty of a text and can be deployed in practical text processing applications. |
Tasks | Eye Tracking, Part-Of-Speech Tagging, Sarcasm Detection, Text Simplification |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-4123/ |
PWC | https://paperswithcode.com/paper/quantifying-sentence-complexity-based-on-eye |
Repo | |
Framework | |
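The pipeline in the abstract above, learning per-word reading times from features known to predict them and then scoring new sentences by their mean predicted time, can be sketched in miniature. This is an illustrative sketch, not the authors' system: the feature set is reduced to word length and a (hypothetical) corpus frequency, and the learner is plain least squares rather than the model used in the paper:

```python
import math

def fit_ols(X, y):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y,
    solved by Gaussian elimination; adequate for a handful of features."""
    n, d = len(X), len(X[0])
    A = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(d)]
         for a in range(d)]
    b = [sum(X[i][a] * y[i] for i in range(n)) for a in range(d)]
    for col in range(d):
        # Partial pivoting, then eliminate below the pivot
        piv = max(range(col, d), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, d):
            f = A[r][col] / A[col][col]
            for c in range(col, d):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    w = [0.0] * d
    for r in range(d - 1, -1, -1):  # back substitution
        w[r] = (b[r] - sum(A[r][c] * w[c] for c in range(r + 1, d))) / A[r][r]
    return w

def word_features(word, freq):
    # Intercept, word length, and negative log frequency -- two classic
    # predictors of per-word reading time (the frequency input is assumed).
    return [1.0, float(len(word)), -math.log(freq)]

def sentence_complexity(words, freqs, w):
    """Score a sentence by its mean predicted per-word reading time."""
    preds = [sum(wi * fi for wi, fi in zip(w, word_features(wd, fq)))
             for wd, fq in zip(words, freqs)]
    return sum(preds) / len(preds)
```

Given gold per-word reading times, `fit_ols` recovers the feature weights, and `sentence_complexity` averages the predicted times over a novel sentence as a readability score.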
A Crowdsourced Database of Event Sequence Descriptions for the Acquisition of High-quality Script Knowledge
Title | A Crowdsourced Database of Event Sequence Descriptions for the Acquisition of High-quality Script Knowledge |
Authors | Lilian D. A. Wanzare, Alessandra Zarcone, Stefan Thater, Manfred Pinkal |
Abstract | Scripts are standardized event sequences describing typical everyday activities, which play an important role in the computational modeling of cognitive abilities (in particular for natural language processing). We present a large-scale crowdsourced collection of explicit linguistic descriptions of script-specific event sequences (40 scenarios with 100 sequences each). The corpus is enriched with crowdsourced alignment annotation on a subset of the event descriptions, to be used in future work as seed data for automatic alignment of event descriptions (for example via clustering). The event descriptions to be aligned were chosen among those expected to have the strongest corrective effect on the clustering algorithm. The alignment annotation was evaluated against a gold standard of expert annotators. The resulting database of partially-aligned script-event descriptions provides a sound empirical basis for inducing high-quality script knowledge, as well as for any task involving alignment and paraphrase detection of events. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1556/ |
PWC | https://paperswithcode.com/paper/a-crowdsourced-database-of-event-sequence |
Repo | |
Framework | |
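The alignment of script-specific event descriptions mentioned in the abstract can be illustrated with a toy baseline: pair descriptions from two sequences by lexical overlap. This is a hedged sketch of the general task, not the paper's clustering method; the scenario texts and the threshold are invented:

```python
def jaccard(a, b):
    """Token-level Jaccard similarity between two event descriptions."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def align_events(seq1, seq2, threshold=0.2):
    """Greedily pair each description in seq1 with its most similar
    unused description in seq2, keeping pairs above the threshold."""
    used, pairs = set(), []
    for i, d1 in enumerate(seq1):
        best, best_sim = None, threshold
        for j, d2 in enumerate(seq2):
            if j in used:
                continue
            sim = jaccard(d1, d2)
            if sim > best_sim:
                best, best_sim = j, sim
        if best is not None:
            used.add(best)
            pairs.append((i, best))
    return pairs

seq1 = ["enter the kitchen", "boil the water", "pour water into cup"]
seq2 = ["boil water", "pour the water", "drink tea"]
pairs = align_events(seq1, seq2)
```

Crowdsourced alignments such as those in the corpus would serve as seed data to correct exactly the cases where surface overlap like this fails (paraphrases with no shared tokens).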
Bitextor’s participation in WMT’16: shared task on document alignment
Title | Bitextor’s participation in WMT’16: shared task on document alignment |
Authors | Miquel Esplà-Gomis, Mikel Forcada, Sergio Ortiz-Rojas, Jorge Ferrández-Tordera |
Abstract | |
Tasks | Machine Translation |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/W16-2367/ |
PWC | https://paperswithcode.com/paper/bitextors-participation-in-wmt16-shared-task |
Repo | |
Framework | |
Answer Presentation in Question Answering over Linked Data using Typed Dependency Subtree Patterns
Title | Answer Presentation in Question Answering over Linked Data using Typed Dependency Subtree Patterns |
Authors | Rivindu Perera, Parma Nand |
Abstract | |
Tasks | Dependency Parsing, Information Retrieval, Question Answering |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-4406/ |
PWC | https://paperswithcode.com/paper/answer-presentation-in-question-answering |
Repo | |
Framework | |
Enhancing STEM Motivation through Personal and Communal Values: NLP for Assessment of Utility Value in Student Writing
Title | Enhancing STEM Motivation through Personal and Communal Values: NLP for Assessment of Utility Value in Student Writing |
Authors | Beata Beigman Klebanov, Jill Burstein, Judith Harackiewicz, Stacy Priniski, Matthew Mulholland |
Abstract | |
Tasks | |
Published | 2016-06-01 |
URL | https://www.aclweb.org/anthology/W16-0522/ |
PWC | https://paperswithcode.com/paper/enhancing-stem-motivation-through-personal |
Repo | |
Framework | |
MAZA at SemEval-2016 Task 11: Detecting Lexical Complexity Using a Decision Stump Meta-Classifier
Title | MAZA at SemEval-2016 Task 11: Detecting Lexical Complexity Using a Decision Stump Meta-Classifier |
Authors | Shervin Malmasi, Marcos Zampieri |
Abstract | |
Tasks | Complex Word Identification, Lexical Simplification, Text Classification, Text Simplification |
Published | 2016-06-01 |
URL | https://www.aclweb.org/anthology/S16-1153/ |
PWC | https://paperswithcode.com/paper/maza-at-semeval-2016-task-11-detecting |
Repo | |
Framework | |
Garuda & Bhasha at SemEval-2016 Task 11: Complex Word Identification Using Aggregated Learning Models
Title | Garuda & Bhasha at SemEval-2016 Task 11: Complex Word Identification Using Aggregated Learning Models |
Authors | Prafulla Choubey, Shubham Pateria |
Abstract | |
Tasks | Complex Word Identification, Lexical Simplification |
Published | 2016-06-01 |
URL | https://www.aclweb.org/anthology/S16-1156/ |
PWC | https://paperswithcode.com/paper/garuda-bhasha-at-semeval-2016-task-11-complex |
Repo | |
Framework | |
Using a Cross-Language Information Retrieval System based on OHSUMED to Evaluate the Moses and KantanMT Statistical Machine Translation Systems
Title | Using a Cross-Language Information Retrieval System based on OHSUMED to Evaluate the Moses and KantanMT Statistical Machine Translation Systems |
Authors | Nikolaos Katris, Richard Sutcliffe, Theodore Kalamboukis |
Abstract | The objective of this paper was to evaluate the performance of two statistical machine translation (SMT) systems within a cross-language information retrieval (CLIR) architecture and examine if there is a correlation between translation quality and CLIR performance. The SMT systems were KantanMT, a cloud-based machine translation (MT) platform, and Moses, an open-source MT application. First we trained both systems using the same language resources: the EMEA corpus for the translation model and language model and the QTLP corpus for tuning. Then we translated the 63 queries of the OHSUMED test collection from Greek into English using both MT systems. Next, we ran the queries on the document collection using Apache Solr to get a list of the top ten matches. The results were compared to the OHSUMED gold standard. KantanMT achieved higher average precision and F-measure than Moses, while both systems produced the same recall score. We also calculated the BLEU score for each system using the ECDC corpus. Moses achieved a higher BLEU score than KantanMT. Finally, we also tested the IR performance of the original English queries. This work overall showed that CLIR performance can be better even when BLEU score is worse. |
Tasks | Information Retrieval, Language Modelling, Machine Translation |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1057/ |
PWC | https://paperswithcode.com/paper/using-a-cross-language-information-retrieval |
Repo | |
Framework | |
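The evaluation loop described in the OHSUMED abstract, retrieve the top ten documents per translated query and compare them to a gold standard, reduces to per-query precision, recall, and F-measure at a cutoff. A minimal sketch (document IDs are illustrative, not from OHSUMED):

```python
def prf_at_k(retrieved, relevant, k=10):
    """Precision, recall and F1 for the top-k retrieved documents,
    judged against a gold-standard set of relevant documents."""
    top = retrieved[:k]
    hits = sum(1 for doc in top if doc in relevant)
    p = hits / len(top) if top else 0.0
    r = hits / len(relevant) if relevant else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

p, r, f = prf_at_k(["d3", "d7", "d1"], {"d1", "d2"}, k=10)
```

Averaging these per-query scores over the 63 queries gives the system-level figures the paper compares against BLEU.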
Improvement of VerbNet-like resources by frame typing
Title | Improvement of VerbNet-like resources by frame typing |
Authors | Laurence Danlos, Matthieu Constant, Lucie Barque |
Abstract | Verbenet is a French lexicon developed by “translation” of its English counterpart, VerbNet (Kipper-Schuler, 2005), and treatment of the specificities of French syntax (Pradet et al., 2014; Danlos et al., 2016). One difficulty encountered in its development springs from the fact that the list of (potentially numerous) frames has no internal organization. This paper proposes a type system for frames that shows whether two frames are variants of a given alternation. Frame typing facilitates coherence checking of the resource in a “virtuous circle”. We present the principles underlying a program we developed and used to automatically type frames in Verbenet. We also show that our system is portable to other languages. |
Tasks | Machine Translation, Question Answering, Stock Market Prediction |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-3809/ |
PWC | https://paperswithcode.com/paper/improvement-of-verbnet-like-resources-by |
Repo | |
Framework | |
Enriching TimeBank: Towards a more precise annotation of temporal relations in a text
Title | Enriching TimeBank: Towards a more precise annotation of temporal relations in a text |
Authors | Volker Gast, Lennart Bierkandt, Stephan Druskat, Christoph Rzymski |
Abstract | We propose a way of enriching the TimeML annotations of TimeBank by adding information about the Topic Time in terms of Klein (1994). The annotations are partly automatic, partly inferential and partly manual. The corpus was converted into the native format of the annotation software GraphAnno and POS-tagged using the Stanford bidirectional dependency network tagger. On top of each finite verb, a FIN-node with tense information was created, and on top of any FIN-node, a TOPICTIME-node, in accordance with Klein's (1994) treatment of finiteness as the linguistic correlate of the Topic Time. Each TOPICTIME-node is linked to a MAKEINSTANCE-node representing an (instantiated) event in TimeML (Pustejovsky et al. 2005), the markup language used for the annotation of TimeBank. For such links we introduce a new category, ELINK. ELINKs capture the relationship between the Topic Time (TT) and the Time of Situation (TSit) and have an aspectual interpretation in Klein's (1994) theory. In addition to these automatic and inferential annotations, some TLINKs were added manually. Using an example from the corpus, we show that the inclusion of the Topic Time in the annotations allows for a richer representation of the temporal structure than does TimeML. A way of representing this structure in a diagrammatic form similar to the T-Box format (Verhagen, 2007) is proposed. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1608/ |
PWC | https://paperswithcode.com/paper/enriching-timebank-towards-a-more-precise |
Repo | |
Framework | |
Integrating Word Embedding Offsets into the Espresso System for Part-Whole Relation Extraction
Title | Integrating Word Embedding Offsets into the Espresso System for Part-Whole Relation Extraction |
Authors | Van-Thuy Phi, Yuji Matsumoto |
Abstract | |
Tasks | Relation Extraction |
Published | 2016-10-01 |
URL | https://www.aclweb.org/anthology/Y16-2015/ |
PWC | https://paperswithcode.com/paper/integrating-word-embedding-offsets-into-the |
Repo | |
Framework | |
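The embedding-offset idea named in the title above can be illustrated independently of the Espresso system: average the vector offsets of known (whole, part) seed pairs, then score a candidate pair by the cosine between its own offset and that average. This is a hedged sketch of the offset technique in general, not the paper's method; the toy embeddings and seed pairs are invented:

```python
import math

def sub(a, b):
    """Element-wise vector difference."""
    return [x - y for x, y in zip(a, b)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def offset_score(emb, seeds, candidate):
    """Score a candidate (whole, part) pair by the cosine between its
    embedding offset and the mean offset of the seed pairs."""
    offsets = [sub(emb[w], emb[p]) for w, p in seeds]
    dim = len(offsets[0])
    mean = [sum(o[i] for o in offsets) / len(offsets) for i in range(dim)]
    whole, part = candidate
    return cosine(sub(emb[whole], emb[part]), mean)

emb = {"car": [1.0, 2.0], "wheel": [0.0, 1.0], "tree": [2.0, 3.0],
       "branch": [1.0, 2.0], "house": [3.0, 1.0], "door": [2.0, 0.0],
       "sky": [0.0, 0.0]}
seeds = [("car", "wheel"), ("tree", "branch")]
```

Pairs whose offset points the same way as the seed pairs score near 1; unrelated pairs score low or negative, which is what makes the offset usable as an extra feature in a pattern-based extractor.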
Scalable Statistical Relational Learning for NLP
Title | Scalable Statistical Relational Learning for NLP |
Authors | William Yang Wang, William Cohen |
Abstract | |
Tasks | Coreference Resolution, Relational Reasoning, Semantic Parsing, Sentiment Analysis, Text Classification |
Published | 2016-06-01 |
URL | https://www.aclweb.org/anthology/N16-4005/ |
PWC | https://paperswithcode.com/paper/scalable-statistical-relational-learning-for |
Repo | |
Framework | |
Bootstrapping a Hybrid MT System to a New Language Pair
Title | Bootstrapping a Hybrid MT System to a New Language Pair |
Authors | João António Rodrigues, Nuno Rendeiro, Andreia Querido, Sanja Štajner, António Branco |
Abstract | The usual concern when opting for a rule-based or a hybrid machine translation (MT) system is how much effort is required to adapt the system to a different language pair or a new domain. In this paper, we describe a way of adapting an existing hybrid MT system to a new language pair, and show that such a system can outperform a standard phrase-based statistical machine translation system with an average of 10 person-months of work. This is specifically important in the case of domain-specific MT for which there is not enough parallel data for training a statistical machine translation system. |
Tasks | Machine Translation |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1438/ |
PWC | https://paperswithcode.com/paper/bootstrapping-a-hybrid-mt-system-to-a-new |
Repo | |
Framework | |