May 5, 2019

Paper Group NANR 143

Multi-Engine and Multi-Alignment Based Automatic Post-Editing and its Impact on Translation Productivity. Chinese-to-Japanese Patent Machine Translation based on Syntactic Pre-ordering for WAT 2016. Understanding the Lexical Simplification Needs of Non-Native Speakers of English. Proceedings of the 12th International Workshop on Tree Adjoining Gram …

Multi-Engine and Multi-Alignment Based Automatic Post-Editing and its Impact on Translation Productivity

Title Multi-Engine and Multi-Alignment Based Automatic Post-Editing and its Impact on Translation Productivity
Authors Santanu Pal, Sudip Kumar Naskar, Josef van Genabith
Abstract In this paper we combine two strands of machine translation (MT) research: automatic post-editing (APE) and multi-engine (system combination) MT. APE systems learn a target-language-side second stage MT system from the data produced by human corrected output of a first stage MT system, to improve the output of the first stage MT in what is essentially a sequential MT system combination architecture. At the same time, there is a rich research literature on parallel MT system combination where the same input is fed to multiple engines and the best output is selected or smaller sections of the outputs are combined to obtain improved translation output. In the paper we show that parallel system combination in the APE stage of a sequential MT-APE combination yields substantial translation improvements both measured in terms of automatic evaluation metrics as well as in terms of productivity improvements measured in a post-editing experiment. We also show that system combination on the level of APE alignments yields further improvements. Overall our APE system yields statistically significant improvement of 5.9% relative BLEU over a strong baseline (English–Italian Google MT) and 21.76% productivity increase in a human post-editing experiment with professional translators.
Tasks Automatic Post-Editing, Machine Translation
Published 2016-12-01
URL https://www.aclweb.org/anthology/C16-1241/
PDF https://www.aclweb.org/anthology/C16-1241
PWC https://paperswithcode.com/paper/multi-engine-and-multi-alignment-based
Repo
Framework

Chinese-to-Japanese Patent Machine Translation based on Syntactic Pre-ordering for WAT 2016

Title Chinese-to-Japanese Patent Machine Translation based on Syntactic Pre-ordering for WAT 2016
Authors Katsuhito Sudoh, Masaaki Nagata
Abstract This paper presents our Chinese-to-Japanese patent machine translation system for WAT 2016 (Group ID: ntt) that uses syntactic pre-ordering over Chinese dependency structures. Chinese words are reordered by a learning-to-rank model based on pairwise classification to obtain word order close to Japanese. In this year's system, two different machine translation methods are compared: traditional phrase-based statistical machine translation and recent sequence-to-sequence neural machine translation with an attention mechanism. Our pre-ordering showed a significant improvement over the phrase-based baseline, but, in contrast, it degraded the neural machine translation baseline.
Tasks Chinese Word Segmentation, Dependency Parsing, Learning-To-Rank, Machine Translation, Part-Of-Speech Tagging
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-4621/
PDF https://www.aclweb.org/anthology/W16-4621
PWC https://paperswithcode.com/paper/chinese-to-japanese-patent-machine-1
Repo
Framework
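
The pre-ordering idea in the abstract above (reorder source words with a model built on pairwise classification) can be illustrated with a toy sketch. This is not the authors' system: the `precedes` rule is a hand-written stand-in for a learned pairwise classifier, the token dictionaries and POS labels are hypothetical, and the sketch works on a flat token list rather than on dependency structures.

```python
from functools import cmp_to_key

def precedes(a, b):
    """Hypothetical pairwise classifier: True if `a` should come before `b`.

    A real system would learn this decision from word-aligned data. Here a
    toy head-final rule is assumed: verbs go after non-verbs, and otherwise
    the original order is kept.
    """
    a_verb = a["pos"] == "VERB"
    b_verb = b["pos"] == "VERB"
    if a_verb != b_verb:
        return b_verb
    return a["index"] < b["index"]

def preorder(tokens):
    """Sort tokens using the pairwise decisions as a comparator."""
    def compare(a, b):
        return -1 if precedes(a, b) else 1
    return sorted(tokens, key=cmp_to_key(compare))

# English glosses of a subject-verb-object sentence, for illustration only.
tokens = [
    {"index": 0, "word": "I", "pos": "PRON"},
    {"index": 1, "word": "eat", "pos": "VERB"},
    {"index": 2, "word": "apples", "pos": "NOUN"},
]
reordered = [t["word"] for t in preorder(tokens)]
```

Using the pairwise decision as a sort comparator only gives a well-defined order when the decisions are consistent (transitive); a learned classifier would not guarantee that, which is one reason ranking formulations are used in practice.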

Understanding the Lexical Simplification Needs of Non-Native Speakers of English

Title Understanding the Lexical Simplification Needs of Non-Native Speakers of English
Authors Gustavo Paetzold, Lucia Specia
Abstract We report three user studies in which the Lexical Simplification needs of non-native English speakers are investigated. Our analyses feature valuable new insight on the relationship between the non-natives' notion of complexity and various morphological, semantic and lexical word properties. Some of our findings contradict long-standing misconceptions about word simplicity. The data produced in our studies consists of 211,564 annotations made by 1,100 volunteers, which we hope will guide forthcoming research on Text Simplification for non-native speakers of English.
Tasks Complex Word Identification, Lexical Simplification, Text Simplification
Published 2016-12-01
URL https://www.aclweb.org/anthology/C16-1069/
PDF https://www.aclweb.org/anthology/C16-1069
PWC https://paperswithcode.com/paper/understanding-the-lexical-simplification
Repo
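Framework

Proceedings of the 12th International Workshop on Tree Adjoining Grammars and Related Formalisms (TAG+12)

Title Proceedings of the 12th International Workshop on Tree Adjoining Grammars and Related Formalisms (TAG+12)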
Authors
Abstract
Tasks
Published 2016-06-01
URL https://www.aclweb.org/anthology/W16-3300/
PDF https://www.aclweb.org/anthology/W16-3300
PWC https://paperswithcode.com/paper/proceedings-of-the-12th-international-2
Repo
Framework

Assigning Fine-grained PoS Tags based on High-precision Coarse-grained Tagging

Title Assigning Fine-grained PoS Tags based on High-precision Coarse-grained Tagging
Authors Tobias Horsmann, Torsten Zesch
Abstract We propose a new approach to PoS tagging where in a first step, we assign a coarse-grained tag corresponding to the main syntactic category. Based on this high-precision decision, in the second step we utilize specially trained fine-grained models with heavily reduced decision complexity. By analyzing the system under oracle conditions, we show that there is a quite large potential for significantly outperforming a competitive baseline. When we take error-propagation from the coarse-grained tagging into account, our approach is on par with the state of the art. Our approach also allows tailoring the tagger towards recognizing single word classes which are of interest e.g. for researchers searching for specific phenomena in large corpora. In a case study, we significantly outperform a standard model that also makes use of the same optimizations.
Tasks
Published 2016-12-01
URL https://www.aclweb.org/anthology/C16-1032/
PDF https://www.aclweb.org/anthology/C16-1032
PWC https://paperswithcode.com/paper/assigning-fine-grained-pos-tags-based-on-high
Repo
Framework
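
The two-step scheme described in the abstract above (a coarse tag first, then a specialised fine-grained model per coarse class) might be sketched as follows. Everything here is a hypothetical stand-in: the `COARSE` lexicon, the suffix rules and the tag names are toy assumptions, not the models used in the paper.

```python
# Stage 1: hypothetical coarse tagger mapping a word to its main category.
COARSE = {"dog": "NOUN", "dogs": "NOUN", "runs": "VERB", "ran": "VERB"}

# Stage 2: one hypothetical fine-grained model per coarse class. Each model
# only chooses among the fine tags of its own class, so its decision space
# is much smaller than that of a single tagger over the full tag set.
def fine_noun(word):
    return "NNS" if word.endswith("s") else "NN"

def fine_verb(word):
    return "VBZ" if word.endswith("s") else "VBD"

FINE_MODELS = {"NOUN": fine_noun, "VERB": fine_verb}

def tag(word):
    """Return (coarse_tag, fine_tag); unknown words default to NOUN here."""
    coarse = COARSE.get(word, "NOUN")
    return coarse, FINE_MODELS[coarse](word)

tags = [tag(w) for w in ["dogs", "ran"]]
```

The sketch also shows where error propagation comes from: if stage 1 picks the wrong coarse class, stage 2 cannot recover, because it never considers fine tags outside that class.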

Accurate Deep Syntactic Parsing of Graphs: The Case of French

Title Accurate Deep Syntactic Parsing of Graphs: The Case of French
Authors Corentin Ribeyre, Eric Villemonte de la Clergerie, Djamé Seddah
Abstract Parsing predicate-argument structures in a deep syntax framework requires graphs to be predicted. Argument structures represent a higher level of abstraction than the syntactic ones and are thus more difficult to predict even for parsing models that are highly accurate on surface syntax. In this paper we investigate deep syntax parsing, using a French data set (Ribeyre et al., 2014a). We demonstrate that the use of topologically different types of syntactic features, such as dependencies, tree fragments, spines or syntactic paths, brings a much needed context to the parser. Our higher-order parsing model, gaining thus up to 4 points, establishes the state of the art for parsing French deep syntactic structures.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1566/
PDF https://www.aclweb.org/anthology/L16-1566
PWC https://paperswithcode.com/paper/accurate-deep-syntactic-parsing-of-graphs-the
Repo
Framework

FABIOLE, a Speech Database for Forensic Speaker Comparison

Title FABIOLE, a Speech Database for Forensic Speaker Comparison
Authors Moez Ajili, Jean-François Bonastre, Juliette Kahn, Solange Rossato, Guillaume Bernard
Abstract A speech database has been collected to highlight the importance of the "speaker factor" in forensic voice comparison. FABIOLE was created during the FABIOLE project, funded by the French Research Agency (ANR) from 2013 to 2016. The corpus consists of more than 3,000 excerpts spoken by 130 native French male speakers. The speakers are divided into two categories: 30 target speakers, each with 100 excerpts, and 100 "impostors", each with only one excerpt. The data were collected from 10 different French radio and television shows; each utterance turn has a minimum duration of 30 s and good speech quality. The data set is mainly used for investigating the speaker factor in forensic voice comparison and for interpreting unsolved issues such as the relationship between speaker characteristics and system behavior. In this paper, we present the FABIOLE database. Then, preliminary experiments are performed to evaluate the effect of the "speaker factor" and of the show on the behavior of a voice comparison system.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1115/
PDF https://www.aclweb.org/anthology/L16-1115
PWC https://paperswithcode.com/paper/fabiole-a-speech-database-for-forensic
Repo
Framework

Quick and Reliable Document Alignment via TF/IDF-weighted Cosine Distance

Title Quick and Reliable Document Alignment via TF/IDF-weighted Cosine Distance
Authors Christian Buck, Philipp Koehn
Abstract
Tasks Graph Matching, Machine Translation
Published 2016-08-01
URL https://www.aclweb.org/anthology/W16-2365/
PDF https://www.aclweb.org/anthology/W16-2365
PWC https://paperswithcode.com/paper/quick-and-reliable-document-alignment-via
Repo
Framework

Vocabulary Development To Support Information Extraction of Substance Abuse from Psychiatry Notes

Title Vocabulary Development To Support Information Extraction of Substance Abuse from Psychiatry Notes
Authors Sumithra Velupillai, Danielle L. Mowery, Mike Conway, John Hurdle, Brent Kious
Abstract
Tasks
Published 2016-08-01
URL https://www.aclweb.org/anthology/W16-2912/
PDF https://www.aclweb.org/anthology/W16-2912
PWC https://paperswithcode.com/paper/vocabulary-development-to-support-information
Repo
Framework

Measuring the behavioral impact of machine translation quality improvements with A/B testing

Title Measuring the behavioral impact of machine translation quality improvements with A/B testing
Authors Ben Russell, Duncan Gillespie
Abstract
Tasks Machine Translation
Published 2016-11-01
URL https://www.aclweb.org/anthology/D16-1251/
PDF https://www.aclweb.org/anthology/D16-1251
PWC https://paperswithcode.com/paper/measuring-the-behavioral-impact-of-machine
Repo
Framework

Training Data Enrichment for Infrequent Discourse Relations

Title Training Data Enrichment for Infrequent Discourse Relations
Authors Kailang Jiang, Giuseppe Carenini, Raymond Ng
Abstract Discourse parsing is a popular technique widely used in text understanding, sentiment analysis and other NLP tasks. However, for most discourse parsers, the performance varies significantly across different discourse relations. In this paper, we first validate the underfitting hypothesis, i.e., the less frequent a relation is in the training data, the poorer the performance on that relation. We then explore how to increase the number of positive training instances, without resorting to manually creating additional labeled data. We propose a training data enrichment framework that relies on co-training of two different discourse parsers on unlabeled documents. Importantly, we show that co-training alone is not sufficient. The framework requires a filtering step to ensure that only "good quality" unlabeled documents can be used for enrichment and re-training. We propose and evaluate two ways to perform the filtering. The first is to use an agreement score between the two parsers. The second is to use only the confidence score of the faster parser. Our empirical results show that agreement score can help to boost the performance on infrequent relations, and that the confidence score is a viable approximation of the agreement score for infrequent relations.
Tasks Sentiment Analysis
Published 2016-12-01
URL https://www.aclweb.org/anthology/C16-1245/
PDF https://www.aclweb.org/anthology/C16-1245
PWC https://paperswithcode.com/paper/training-data-enrichment-for-infrequent
Repo
Framework
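
The agreement-based filtering step mentioned in the abstract above could look roughly like the sketch below. The abstract does not define the agreement score, so this assumes the simplest possible one: the fraction of positions at which two parsers assign the same relation label, with both parsers assumed to produce labels over the same segmentation. The function names, the threshold value and the example labels are all hypothetical.

```python
def agreement(labels_a, labels_b):
    """Fraction of positions where both parsers predict the same relation.

    Assumes both label sequences refer to the same segmentation; returns 0.0
    when the sequences are empty or differ in length.
    """
    if not labels_a or len(labels_a) != len(labels_b):
        return 0.0
    same = sum(1 for a, b in zip(labels_a, labels_b) if a == b)
    return same / len(labels_a)

def select_for_enrichment(docs, threshold=0.8):
    """Keep the ids of documents whose parser agreement meets the threshold.

    `docs` is a list of (doc_id, labels_from_parser_a, labels_from_parser_b).
    """
    return [doc_id for doc_id, la, lb in docs if agreement(la, lb) >= threshold]

docs = [
    ("d1", ["Elaboration", "Contrast"], ["Elaboration", "Contrast"]),
    ("d2", ["Elaboration", "Cause"], ["Contrast", "Cause"]),
]
kept = select_for_enrichment(docs)
```

Only documents that pass the filter would be added to the training data for re-training; the confidence-based variant in the abstract would replace `agreement` with a score from a single parser.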

Expectation-Regulated Neural Model for Event Mention Extraction

Title Expectation-Regulated Neural Model for Event Mention Extraction
Authors Ching-Yun Chang, Zhiyang Teng, Yue Zhang
Abstract
Tasks
Published 2016-06-01
URL https://www.aclweb.org/anthology/N16-1045/
PDF https://www.aclweb.org/anthology/N16-1045
PWC https://paperswithcode.com/paper/expectation-regulated-neural-model-for-event
Repo
Framework

Language Independent Dependency to Constituent Tree Conversion

Title Language Independent Dependency to Constituent Tree Conversion
Authors Young-Suk Lee, Zhiguo Wang
Abstract We present a dependency to constituent tree conversion technique that aims to improve constituent parsing accuracies by leveraging dependency treebanks available for a wide variety of languages. The technique works in two steps. First, a partial constituent tree is derived from a dependency tree with a very simple deterministic algorithm that is both language and dependency type independent. Second, a complete high accuracy constituent tree is derived with a constraint-based parser, which uses the partial constituent tree as external constraints. Evaluated on Section 22 of the WSJ Treebank, the technique achieves the state-of-the-art conversion F-score 95.6. When applied to the English Universal Dependency treebank and the German CoNLL2006 treebank, the converted treebanks added to the human-annotated constituent parser training corpus improve parsing F-scores significantly for both languages.
Tasks
Published 2016-12-01
URL https://www.aclweb.org/anthology/C16-1041/
PDF https://www.aclweb.org/anthology/C16-1041
PWC https://paperswithcode.com/paper/language-independent-dependency-to
Repo
Framework

Exact Recovery of Hard Thresholding Pursuit

Title Exact Recovery of Hard Thresholding Pursuit
Authors Xiaotong Yuan, Ping Li, Tong Zhang
Abstract The Hard Thresholding Pursuit (HTP) is a class of truncated gradient descent methods for finding sparse solutions of $\ell_0$-constrained loss minimization problems. The HTP-style methods have been shown to have strong approximation guarantees and impressive numerical performance in high dimensional statistical learning applications. However, the current theoretical treatment of these methods has traditionally been restricted to the analysis of parameter estimation consistency. It remains an open problem to analyze the support recovery performance (a.k.a., sparsistency) of this type of method for recovering the global minimizer of the original NP-hard problem. In this paper, we bridge this gap by showing, for the first time, that exact recovery of the global sparse minimizer is possible for HTP-style methods under restricted strong condition number bounding conditions. We further show that HTP-style methods are able to recover the support of certain relaxed sparse solutions without assuming bounded restricted strong condition number. Numerical results on simulated data confirm our theoretical predictions.
Tasks
Published 2016-12-01
URL http://papers.nips.cc/paper/6432-exact-recovery-of-hard-thresholding-pursuit
PDF http://papers.nips.cc/paper/6432-exact-recovery-of-hard-thresholding-pursuit.pdf
PWC https://paperswithcode.com/paper/exact-recovery-of-hard-thresholding-pursuit
Repo
Framework
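
For readers unfamiliar with HTP, the following is a minimal sketch of one common form of the iteration, assuming a least-squares loss 0.5 * ||Ax - y||^2 and a fixed step size. Each iteration takes a gradient step, keeps the k largest-magnitude coordinates as the support, and then re-minimizes the loss restricted to that support. The helper `solve`, the step size and the example data are illustrative choices, not taken from the paper, and the sketch does not guard against a singular restricted system.

```python
def solve(M, b):
    """Solve M z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for j in range(c, n + 1):
                aug[r][j] -= f * aug[c][j]
    z = [0.0] * n
    for r in reversed(range(n)):
        s = sum(aug[r][j] * z[j] for j in range(r + 1, n))
        z[r] = (aug[r][n] - s) / aug[r][r]
    return z

def htp(A, y, k, step=1.0, iters=20):
    """Hard Thresholding Pursuit sketch for 0.5 * ||Ax - y||^2, ||x||_0 <= k."""
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(iters):
        # Gradient step: the gradient of the loss is A^T (A x - y).
        resid = [sum(A[i][j] * x[j] for j in range(n)) - y[i] for i in range(m)]
        grad = [sum(A[i][j] * resid[i] for i in range(m)) for j in range(n)]
        x_tilde = [x[j] - step * grad[j] for j in range(n)]
        # Hard thresholding: keep the k largest-magnitude coordinates.
        support = sorted(sorted(range(n), key=lambda j: -abs(x_tilde[j]))[:k])
        # Debiasing: least squares restricted to the selected support,
        # solved through the normal equations of the selected columns.
        G = [[sum(A[i][a] * A[i][b] for i in range(m)) for b in support]
             for a in support]
        rhs = [sum(A[i][a] * y[i] for i in range(m)) for a in support]
        z = solve(G, rhs)
        x = [0.0] * n
        for a, v in zip(support, z):
            x[a] = v
    return x

# Toy problem: y is generated from the 2-sparse vector [0, 3, 0, -2].
A = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [1, 1, 0, 0]]
y = [0, 3, 0, -2, 3]
x_hat = htp(A, y, k=2)
```

In this toy problem the first iteration selects the wrong support (coordinates 0 and 1) and the second iteration corrects it to coordinates 1 and 3, which illustrates the support recovery question the abstract is concerned with.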

Promoting multiword expressions in A* TAG parsing

Title Promoting multiword expressions in A* TAG parsing
Authors Jakub Waszczuk, Agata Savary, Yannick Parmentier
Abstract Multiword expressions (MWEs) are pervasive in natural languages and often have both idiomatic and compositional readings, which leads to high syntactic ambiguity. We show that for some MWE types idiomatic readings are usually the correct ones. We propose a heuristic for an A* parser for Tree Adjoining Grammars which benefits from this knowledge by promoting MWE-oriented analyses. This strategy leads to a substantial reduction in the parsing search space in case of true positive MWE occurrences, while avoiding parsing failures in case of false positives.
Tasks
Published 2016-12-01
URL https://www.aclweb.org/anthology/C16-1042/
PDF https://www.aclweb.org/anthology/C16-1042
PWC https://paperswithcode.com/paper/promoting-multiword-expressions-in-a-tag
Repo
Framework