Paper Group NANR 143
Multi-Engine and Multi-Alignment Based Automatic Post-Editing and its Impact on Translation Productivity. Chinese-to-Japanese Patent Machine Translation based on Syntactic Pre-ordering for WAT 2016. Understanding the Lexical Simplification Needs of Non-Native Speakers of English. Proceedings of the 12th International Workshop on Tree Adjoining Gram …
Multi-Engine and Multi-Alignment Based Automatic Post-Editing and its Impact on Translation Productivity
Title | Multi-Engine and Multi-Alignment Based Automatic Post-Editing and its Impact on Translation Productivity |
Authors | Santanu Pal, Sudip Kumar Naskar, Josef van Genabith |
Abstract | In this paper we combine two strands of machine translation (MT) research: automatic post-editing (APE) and multi-engine (system combination) MT. APE systems learn a target-language-side second stage MT system from the data produced by human corrected output of a first stage MT system, to improve the output of the first stage MT in what is essentially a sequential MT system combination architecture. At the same time, there is a rich research literature on parallel MT system combination where the same input is fed to multiple engines and the best output is selected or smaller sections of the outputs are combined to obtain improved translation output. In the paper we show that parallel system combination in the APE stage of a sequential MT-APE combination yields substantial translation improvements both measured in terms of automatic evaluation metrics as well as in terms of productivity improvements measured in a post-editing experiment. We also show that system combination on the level of APE alignments yields further improvements. Overall our APE system yields statistically significant improvement of 5.9% relative BLEU over a strong baseline (English–Italian Google MT) and 21.76% productivity increase in a human post-editing experiment with professional translators. |
Tasks | Automatic Post-Editing, Machine Translation |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/C16-1241/ |
https://www.aclweb.org/anthology/C16-1241 | |
PWC | https://paperswithcode.com/paper/multi-engine-and-multi-alignment-based |
Repo | |
Framework | |
Chinese-to-Japanese Patent Machine Translation based on Syntactic Pre-ordering for WAT 2016
Title | Chinese-to-Japanese Patent Machine Translation based on Syntactic Pre-ordering for WAT 2016 |
Authors | Katsuhito Sudoh, Masaaki Nagata |
Abstract | This paper presents our Chinese-to-Japanese patent machine translation system for WAT 2016 (Group ID: ntt) that uses syntactic pre-ordering over Chinese dependency structures. Chinese words are reordered by a learning-to-rank model based on pairwise classification to obtain word order close to Japanese. In this year's system, two different machine translation methods are compared: traditional phrase-based statistical machine translation and recent sequence-to-sequence neural machine translation with an attention mechanism. Our pre-ordering showed a significant improvement over the phrase-based baseline, but, in contrast, it degraded the neural machine translation baseline. |
Tasks | Chinese Word Segmentation, Dependency Parsing, Learning-To-Rank, Machine Translation, Part-Of-Speech Tagging |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-4621/ |
https://www.aclweb.org/anthology/W16-4621 | |
PWC | https://paperswithcode.com/paper/chinese-to-japanese-patent-machine-1 |
Repo | |
Framework | |
Understanding the Lexical Simplification Needs of Non-Native Speakers of English
Title | Understanding the Lexical Simplification Needs of Non-Native Speakers of English |
Authors | Gustavo Paetzold, Lucia Specia |
Abstract | We report three user studies in which the Lexical Simplification needs of non-native English speakers are investigated. Our analyses feature valuable new insight on the relationship between the non-natives' notion of complexity and various morphological, semantic and lexical word properties. Some of our findings contradict long-standing misconceptions about word simplicity. The data produced in our studies consists of 211,564 annotations made by 1,100 volunteers, which we hope will guide forthcoming research on Text Simplification for non-native speakers of English. |
Tasks | Complex Word Identification, Lexical Simplification, Text Simplification |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/C16-1069/ |
https://www.aclweb.org/anthology/C16-1069 | |
PWC | https://paperswithcode.com/paper/understanding-the-lexical-simplification |
Repo | |
Framework | |
Proceedings of the 12th International Workshop on Tree Adjoining Grammars and Related Formalisms (TAG+12)
Title | Proceedings of the 12th International Workshop on Tree Adjoining Grammars and Related Formalisms (TAG+12) |
Authors | |
Abstract | |
Tasks | |
Published | 2016-06-01 |
URL | https://www.aclweb.org/anthology/W16-3300/ |
https://www.aclweb.org/anthology/W16-3300 | |
PWC | https://paperswithcode.com/paper/proceedings-of-the-12th-international-2 |
Repo | |
Framework | |
Assigning Fine-grained PoS Tags based on High-precision Coarse-grained Tagging
Title | Assigning Fine-grained PoS Tags based on High-precision Coarse-grained Tagging |
Authors | Tobias Horsmann, Torsten Zesch |
Abstract | We propose a new approach to PoS tagging where in a first step, we assign a coarse-grained tag corresponding to the main syntactic category. Based on this high-precision decision, in the second step we utilize specially trained fine-grained models with heavily reduced decision complexity. By analyzing the system under oracle conditions, we show that there is a quite large potential for significantly outperforming a competitive baseline. When we take error-propagation from the coarse-grained tagging into account, our approach is on par with the state of the art. Our approach also allows tailoring the tagger towards recognizing single word classes which are of interest e.g. for researchers searching for specific phenomena in large corpora. In a case study, we significantly outperform a standard model that also makes use of the same optimizations. |
Tasks | |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/C16-1032/ |
https://www.aclweb.org/anthology/C16-1032 | |
PWC | https://paperswithcode.com/paper/assigning-fine-grained-pos-tags-based-on-high |
Repo | |
Framework | |
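The two-step scheme described in the abstract above (a coarse tag first, then a specialised fine-grained model chosen by that tag) can be illustrated with a toy sketch. Everything here is hypothetical: the lookup table `COARSE_TAGGER`, the rule-based `FINE_MODELS`, and the tag inventory are stand-ins for trained models and are not taken from the paper.

```python
# Toy illustration of coarse-to-fine PoS tagging.
# ASSUMPTION: the lookup table and suffix rules below are placeholders for
# trained classifiers; they are not the authors' models.

# Step 1 stand-in: map a word to its coarse syntactic category.
COARSE_TAGGER = {
    "the": "DET",
    "dog": "NOUN",
    "dogs": "NOUN",
    "runs": "VERB",
    "ran": "VERB",
}

# Step 2 stand-ins: one small "model" per coarse category. Each only has to
# choose among the fine-grained tags of its own category.
FINE_MODELS = {
    "DET": lambda word: "DT",
    "NOUN": lambda word: "NNS" if word.endswith("s") else "NN",
    "VERB": lambda word: "VBZ" if word.endswith("s") else "VBD",
}


def tag(tokens):
    """Return (token, coarse_tag, fine_tag) triples for a token list."""
    tagged = []
    for token in tokens:
        word = token.lower()
        # Unknown words default to NOUN in this toy example.
        coarse = COARSE_TAGGER.get(word, "NOUN")
        # Dispatch to the fine-grained model for that category only.
        fine = FINE_MODELS[coarse](word)
        tagged.append((token, coarse, fine))
    return tagged
```

Because the fine-grained step only runs inside one category, an error in the coarse step cannot be repaired later, which is the error-propagation effect the abstract mentions.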
Accurate Deep Syntactic Parsing of Graphs: The Case of French
Title | Accurate Deep Syntactic Parsing of Graphs: The Case of French |
Authors | Corentin Ribeyre, Eric Villemonte de la Clergerie, Djamé Seddah |
Abstract | Parsing predicate-argument structures in a deep syntax framework requires graphs to be predicted. Argument structures represent a higher level of abstraction than the syntactic ones and are thus more difficult to predict even for highly accurate parsing models on surface syntax. In this paper we investigate deep syntax parsing, using a French data set (Ribeyre et al., 2014a). We demonstrate that the use of topologically different types of syntactic features, such as dependencies, tree fragments, spines or syntactic paths, brings much-needed context to the parser. Our higher-order parsing model, thus gaining up to 4 points, establishes the state of the art for parsing French deep syntactic structures. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1566/ |
https://www.aclweb.org/anthology/L16-1566 | |
PWC | https://paperswithcode.com/paper/accurate-deep-syntactic-parsing-of-graphs-the |
Repo | |
Framework | |
FABIOLE, a Speech Database for Forensic Speaker Comparison
Title | FABIOLE, a Speech Database for Forensic Speaker Comparison |
Authors | Moez Ajili, Jean-François Bonastre, Juliette Kahn, Solange Rossato, Guillaume Bernard |
Abstract | A speech database has been collected to highlight the importance of the "speaker factor" in forensic voice comparison. FABIOLE was created during the FABIOLE project, funded by the French Research Agency (ANR) from 2013 to 2016. The corpus consists of more than 3,000 excerpts spoken by 130 native French male speakers. The speakers are divided into two categories: 30 target speakers, each with 100 excerpts, and 100 "impostors", each with only one excerpt. The data were collected from 10 different French radio and television shows, where each speech turn has a minimum duration of 30 s and good speech quality. The data set is mainly used for investigating the speaker factor in forensic voice comparison and interpreting some unsolved issues, such as the relationship between speaker characteristics and system behavior. In this paper, we present the FABIOLE database. Then, preliminary experiments are performed to evaluate the effect of the "speaker factor" and the show on the behavior of a voice comparison system. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1115/ |
https://www.aclweb.org/anthology/L16-1115 | |
PWC | https://paperswithcode.com/paper/fabiole-a-speech-database-for-forensic |
Repo | |
Framework | |
Quick and Reliable Document Alignment via TF/IDF-weighted Cosine Distance
Title | Quick and Reliable Document Alignment via TF/IDF-weighted Cosine Distance |
Authors | Christian Buck, Philipp Koehn |
Abstract | |
Tasks | Graph Matching, Machine Translation |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/W16-2365/ |
https://www.aclweb.org/anthology/W16-2365 | |
PWC | https://paperswithcode.com/paper/quick-and-reliable-document-alignment-via |
Repo | |
Framework | |
Vocabulary Development To Support Information Extraction of Substance Abuse from Psychiatry Notes
Title | Vocabulary Development To Support Information Extraction of Substance Abuse from Psychiatry Notes |
Authors | Sumithra Velupillai, Danielle L. Mowery, Mike Conway, John Hurdle, Brent Kious |
Abstract | |
Tasks | |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/W16-2912/ |
https://www.aclweb.org/anthology/W16-2912 | |
PWC | https://paperswithcode.com/paper/vocabulary-development-to-support-information |
Repo | |
Framework | |
Measuring the behavioral impact of machine translation quality improvements with A/B testing
Title | Measuring the behavioral impact of machine translation quality improvements with A/B testing |
Authors | Ben Russell, Duncan Gillespie |
Abstract | |
Tasks | Machine Translation |
Published | 2016-11-01 |
URL | https://www.aclweb.org/anthology/D16-1251/ |
https://www.aclweb.org/anthology/D16-1251 | |
PWC | https://paperswithcode.com/paper/measuring-the-behavioral-impact-of-machine |
Repo | |
Framework | |
Training Data Enrichment for Infrequent Discourse Relations
Title | Training Data Enrichment for Infrequent Discourse Relations |
Authors | Kailang Jiang, Giuseppe Carenini, Raymond Ng |
Abstract | Discourse parsing is a popular technique widely used in text understanding, sentiment analysis and other NLP tasks. However, for most discourse parsers, the performance varies significantly across different discourse relations. In this paper, we first validate the underfitting hypothesis, i.e., the less frequent a relation is in the training data, the poorer the performance on that relation. We then explore how to increase the number of positive training instances, without resorting to manually creating additional labeled data. We propose a training data enrichment framework that relies on co-training of two different discourse parsers on unlabeled documents. Importantly, we show that co-training alone is not sufficient. The framework requires a filtering step to ensure that only "good quality" unlabeled documents can be used for enrichment and re-training. We propose and evaluate two ways to perform the filtering. The first is to use an agreement score between the two parsers. The second is to use only the confidence score of the faster parser. Our empirical results show that agreement score can help to boost the performance on infrequent relations, and that the confidence score is a viable approximation of the agreement score for infrequent relations. |
Tasks | Sentiment Analysis |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/C16-1245/ |
https://www.aclweb.org/anthology/C16-1245 | |
PWC | https://paperswithcode.com/paper/training-data-enrichment-for-infrequent |
Repo | |
Framework | |
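The agreement-based filtering step mentioned in the abstract above can be sketched as follows. The abstract does not say how the agreement score is computed, so the definition used here (fraction of matching relation labels over an identical sequence of discourse units) is an assumption, as are the names `agreement_score`, `filter_documents`, and the `threshold` value.

```python
# Sketch of filtering unlabeled documents by parser agreement.
# ASSUMPTION: both parsers label the same sequence of discourse units, and
# agreement is the share of units given the same relation label. The paper's
# actual score may be defined differently.


def agreement_score(labels_a, labels_b):
    """Fraction of positions where the two parsers assign the same label."""
    if not labels_a or len(labels_a) != len(labels_b):
        # Empty or misaligned outputs are treated as no agreement.
        return 0.0
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def filter_documents(parsed_docs, threshold=0.8):
    """Return ids of documents whose two parses agree at least `threshold`.

    `parsed_docs` maps a document id to a pair of label sequences, one per
    parser. Only the returned documents would be added to the training data.
    """
    return [
        doc_id
        for doc_id, (labels_a, labels_b) in parsed_docs.items()
        if agreement_score(labels_a, labels_b) >= threshold
    ]
```

With a threshold of 0.8, a document on which the parsers agree for every unit is kept, while one on which they agree for only half of the units is discarded.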
Expectation-Regulated Neural Model for Event Mention Extraction
Title | Expectation-Regulated Neural Model for Event Mention Extraction |
Authors | Ching-Yun Chang, Zhiyang Teng, Yue Zhang |
Abstract | |
Tasks | |
Published | 2016-06-01 |
URL | https://www.aclweb.org/anthology/N16-1045/ |
https://www.aclweb.org/anthology/N16-1045 | |
PWC | https://paperswithcode.com/paper/expectation-regulated-neural-model-for-event |
Repo | |
Framework | |
Language Independent Dependency to Constituent Tree Conversion
Title | Language Independent Dependency to Constituent Tree Conversion |
Authors | Young-Suk Lee, Zhiguo Wang |
Abstract | We present a dependency to constituent tree conversion technique that aims to improve constituent parsing accuracies by leveraging dependency treebanks available for a wide variety of languages. The technique works in two steps. First, a partial constituent tree is derived from a dependency tree with a very simple deterministic algorithm that is both language and dependency type independent. Second, a complete high accuracy constituent tree is derived with a constraint-based parser, which uses the partial constituent tree as external constraints. Evaluated on Section 22 of the WSJ Treebank, the technique achieves a state-of-the-art conversion F-score of 95.6. When applied to the English Universal Dependency treebank and the German CoNLL2006 treebank, the converted treebanks added to the human-annotated constituent parser training corpus improve parsing F-scores significantly for both languages. |
Tasks | |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/C16-1041/ |
https://www.aclweb.org/anthology/C16-1041 | |
PWC | https://paperswithcode.com/paper/language-independent-dependency-to |
Repo | |
Framework | |
Exact Recovery of Hard Thresholding Pursuit
Title | Exact Recovery of Hard Thresholding Pursuit |
Authors | Xiaotong Yuan, Ping Li, Tong Zhang |
Abstract | The Hard Thresholding Pursuit (HTP) is a class of truncated gradient descent methods for finding sparse solutions of $\ell_0$-constrained loss minimization problems. The HTP-style methods have been shown to have strong approximation guarantees and impressive numerical performance in high dimensional statistical learning applications. However, the current theoretical treatment of these methods has traditionally been restricted to the analysis of parameter estimation consistency. It remains an open problem to analyze the support recovery performance (a.k.a., sparsistency) of this type of method for recovering the global minimizer of the original NP-hard problem. In this paper, we bridge this gap by showing, for the first time, that exact recovery of the global sparse minimizer is possible for HTP-style methods under restricted strong condition number bounding conditions. We further show that HTP-style methods are able to recover the support of certain relaxed sparse solutions without assuming bounded restricted strong condition number. Numerical results on simulated data confirm our theoretical predictions. |
Tasks | |
Published | 2016-12-01 |
URL | http://papers.nips.cc/paper/6432-exact-recovery-of-hard-thresholding-pursuit |
http://papers.nips.cc/paper/6432-exact-recovery-of-hard-thresholding-pursuit.pdf | |
PWC | https://paperswithcode.com/paper/exact-recovery-of-hard-thresholding-pursuit |
Repo | |
Framework | |
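As a rough illustration of the HTP-style iteration described in the abstract above, the sketch below specialises it to a least-squares loss $\frac{1}{2}\|Ax - y\|^2$: take a gradient step, keep the $k$ largest-magnitude entries, then re-fit exactly on that support. The paper treats general $\ell_0$-constrained losses, so the choice of loss, the fixed step size `step`, the stopping rule, and the function names are all assumptions made for this example.

```python
# Minimal pure-Python sketch of Hard Thresholding Pursuit for least squares.
# ASSUMPTIONS: squared-error loss, a fixed step size, and linearly
# independent columns on every selected support (so the re-fit is solvable).


def solve_linear(matrix, rhs):
    """Solve a small square system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[r][j] -= factor * aug[col][j]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][j] * sol[j] for j in range(r + 1, n))
        sol[r] = (aug[r][n] - tail) / aug[r][r]
    return sol


def htp_least_squares(A, y, k, step, max_iter=50):
    """Return a k-sparse approximate minimiser of 0.5 * ||A x - y||^2."""
    m, n = len(A), len(A[0])
    x = [0.0] * n
    support = None
    for _ in range(max_iter):
        # Gradient of the loss: A^T (A x - y).
        residual = [sum(a * b for a, b in zip(row, x)) - yi
                    for row, yi in zip(A, y)]
        grad = [sum(A[i][j] * residual[i] for i in range(m)) for j in range(n)]
        # Gradient step followed by hard thresholding to the top-k entries.
        z = [xi - step * gi for xi, gi in zip(x, grad)]
        new_support = sorted(sorted(range(n), key=lambda i: -abs(z[i]))[:k])
        if new_support == support:
            break  # Support is stable, so the re-fit would not change x.
        support = new_support
        # Debiasing: exact least-squares fit restricted to the support,
        # solved through the normal equations.
        cols = [[A[i][j] for i in range(m)] for j in support]
        gram = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols]
                for ci in cols]
        rhs = [sum(u * v for u, v in zip(c, y)) for c in cols]
        coef = solve_linear(gram, rhs)
        x = [0.0] * n
        for j, c in zip(support, coef):
            x[j] = c
    return x
```

On a small noiseless problem with orthogonal columns, for example `A = [[1, 1, 0, 0], [1, -1, 0, 0], [0, 0, 1, 1], [0, 0, 1, -1]]`, `y = [3, -3, -2, 2]`, `k = 2` and `step = 0.5`, the first iteration already selects the true support and the re-fit returns the sparse vector `[0, 3, 0, -2]`. This toy case says nothing about the recovery conditions analysed in the paper.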
Promoting multiword expressions in A* TAG parsing
Title | Promoting multiword expressions in A* TAG parsing |
Authors | Jakub Waszczuk, Agata Savary, Yannick Parmentier |
Abstract | Multiword expressions (MWEs) are pervasive in natural languages and often have both idiomatic and compositional readings, which leads to high syntactic ambiguity. We show that for some MWE types idiomatic readings are usually the correct ones. We propose a heuristic for an A* parser for Tree Adjoining Grammars which benefits from this knowledge by promoting MWE-oriented analyses. This strategy leads to a substantial reduction in the parsing search space in case of true positive MWE occurrences, while avoiding parsing failures in case of false positives. |
Tasks | |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/C16-1042/ |
https://www.aclweb.org/anthology/C16-1042 | |
PWC | https://paperswithcode.com/paper/promoting-multiword-expressions-in-a-tag |
Repo | |
Framework | |