May 5, 2019


Paper Group NANR 143

Multi-Engine and Multi-Alignment Based Automatic Post-Editing and its Impact on Translation Productivity. Chinese-to-Japanese Patent Machine Translation based on Syntactic Pre-ordering for WAT 2016. Understanding the Lexical Simplification Needs of Non-Native Speakers of English. Proceedings of the 12th International Workshop on Tree Adjoining Gram …

Multi-Engine and Multi-Alignment Based Automatic Post-Editing and its Impact on Translation Productivity


Title	Multi-Engine and Multi-Alignment Based Automatic Post-Editing and its Impact on Translation Productivity
Authors	Santanu Pal, Sudip Kumar Naskar, Josef van Genabith
Abstract	In this paper we combine two strands of machine translation (MT) research: automatic post-editing (APE) and multi-engine (system combination) MT. APE systems learn a target-language-side second stage MT system from the data produced by human corrected output of a first stage MT system, to improve the output of the first stage MT in what is essentially a sequential MT system combination architecture. At the same time, there is a rich research literature on parallel MT system combination where the same input is fed to multiple engines and the best output is selected or smaller sections of the outputs are combined to obtain improved translation output. In the paper we show that parallel system combination in the APE stage of a sequential MT-APE combination yields substantial translation improvements both measured in terms of automatic evaluation metrics as well as in terms of productivity improvements measured in a post-editing experiment. We also show that system combination on the level of APE alignments yields further improvements. Overall our APE system yields statistically significant improvement of 5.9% relative BLEU over a strong baseline (English–Italian Google MT) and 21.76% productivity increase in a human post-editing experiment with professional translators.
Tasks	Automatic Post-Editing, Machine Translation
Published	2016-12-01
URL	https://www.aclweb.org/anthology/C16-1241/
PDF	https://www.aclweb.org/anthology/C16-1241
PWC	https://paperswithcode.com/paper/multi-engine-and-multi-alignment-based
Repo
Framework

Chinese-to-Japanese Patent Machine Translation based on Syntactic Pre-ordering for WAT 2016


Title	Chinese-to-Japanese Patent Machine Translation based on Syntactic Pre-ordering for WAT 2016
Authors	Katsuhito Sudoh, Masaaki Nagata
Abstract	This paper presents our Chinese-to-Japanese patent machine translation system for WAT 2016 (Group ID: ntt) that uses syntactic pre-ordering over Chinese dependency structures. Chinese words are reordered by a learning-to-rank model based on pairwise classification to obtain word order close to Japanese. In this year's system, two different machine translation methods are compared: traditional phrase-based statistical machine translation and recent sequence-to-sequence neural machine translation with an attention mechanism. Our pre-ordering showed a significant improvement over the phrase-based baseline, but, in contrast, it degraded the neural machine translation baseline.
Tasks	Chinese Word Segmentation, Dependency Parsing, Learning-To-Rank, Machine Translation, Part-Of-Speech Tagging
Published	2016-12-01
URL	https://www.aclweb.org/anthology/W16-4621/
PDF	https://www.aclweb.org/anthology/W16-4621
PWC	https://paperswithcode.com/paper/chinese-to-japanese-patent-machine-1
Repo
Framework

Understanding the Lexical Simplification Needs of Non-Native Speakers of English


Title	Understanding the Lexical Simplification Needs of Non-Native Speakers of English
Authors	Gustavo Paetzold, Lucia Specia
Abstract	We report three user studies in which the Lexical Simplification needs of non-native English speakers are investigated. Our analyses feature valuable new insight on the relationship between the non-natives' notion of complexity and various morphological, semantic and lexical word properties. Some of our findings contradict long-standing misconceptions about word simplicity. The data produced in our studies consists of 211,564 annotations made by 1,100 volunteers, which we hope will guide forthcoming research on Text Simplification for non-native speakers of English.
Tasks	Complex Word Identification, Lexical Simplification, Text Simplification
Published	2016-12-01
URL	https://www.aclweb.org/anthology/C16-1069/
PDF	https://www.aclweb.org/anthology/C16-1069
PWC	https://paperswithcode.com/paper/understanding-the-lexical-simplification
Repo
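Framework

Proceedings of the 12th International Workshop on Tree Adjoining Grammars and Related Formalisms (TAG+12)


Title	Proceedings of the 12th International Workshop on Tree Adjoining Grammars and Related Formalisms (TAG+12)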
Authors
Abstract
Tasks
Published	2016-06-01
URL	https://www.aclweb.org/anthology/W16-3300/
PDF	https://www.aclweb.org/anthology/W16-3300
PWC	https://paperswithcode.com/paper/proceedings-of-the-12th-international-2
Repo
Framework

Assigning Fine-grained PoS Tags based on High-precision Coarse-grained Tagging


Title	Assigning Fine-grained PoS Tags based on High-precision Coarse-grained Tagging
Authors	Tobias Horsmann, Torsten Zesch
Abstract	We propose a new approach to PoS tagging where in a first step, we assign a coarse-grained tag corresponding to the main syntactic category. Based on this high-precision decision, in the second step we utilize specially trained fine-grained models with heavily reduced decision complexity. By analyzing the system under oracle conditions, we show that there is a quite large potential for significantly outperforming a competitive baseline. When we take error-propagation from the coarse-grained tagging into account, our approach is on par with the state of the art. Our approach also allows tailoring the tagger towards recognizing single word classes which are of interest e.g. for researchers searching for specific phenomena in large corpora. In a case study, we significantly outperform a standard model that also makes use of the same optimizations.
Tasks
Published	2016-12-01
URL	https://www.aclweb.org/anthology/C16-1032/
PDF	https://www.aclweb.org/anthology/C16-1032
PWC	https://paperswithcode.com/paper/assigning-fine-grained-pos-tags-based-on-high
Repo
Framework
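
The two-step idea in the abstract above (a coarse-grained decision first, then a specialised fine-grained model for that category) can be sketched roughly as follows. This is only an illustration under stated assumptions: `tag_two_step`, the toy lexicon, and the toy fine-grained rules are hypothetical stand-ins and are not the authors' models or feature sets.

```python
from typing import Callable, Dict, List

# A tagger maps (tokens, position) to a tag string.
Tagger = Callable[[List[str], int], str]


def tag_two_step(
    tokens: List[str],
    coarse_tagger: Tagger,
    fine_taggers: Dict[str, Tagger],
) -> List[str]:
    """Assign a coarse tag first, then refine it with a per-category model.

    If no fine-grained model exists for a coarse category, the coarse tag
    is kept unchanged.
    """
    tags = []
    for i in range(len(tokens)):
        coarse = coarse_tagger(tokens, i)
        fine = fine_taggers.get(coarse)
        tags.append(fine(tokens, i) if fine is not None else coarse)
    return tags


# Hypothetical toy models, for illustration only.
COARSE_LEXICON = {"dogs": "NOUN", "dog": "NOUN", "bark": "VERB"}


def toy_coarse(tokens: List[str], i: int) -> str:
    # Unknown words fall back to the catch-all category "X".
    return COARSE_LEXICON.get(tokens[i].lower(), "X")


def toy_noun(tokens: List[str], i: int) -> str:
    # The noun model only has to choose between noun tags.
    return "NNS" if tokens[i].lower().endswith("s") else "NN"


def toy_verb(tokens: List[str], i: int) -> str:
    # The verb model is trivially constant in this toy setup.
    return "VBP"


TOY_FINE = {"NOUN": toy_noun, "VERB": toy_verb}
```

Each fine-grained model only chooses among the tags of one syntactic category, which is the reduced decision complexity the abstract refers to. An error in the coarse step is never corrected later, which is the error propagation the abstract mentions.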

Accurate Deep Syntactic Parsing of Graphs: The Case of French


Title	Accurate Deep Syntactic Parsing of Graphs: The Case of French
Authors	Corentin Ribeyre, Eric Villemonte de la Clergerie, Djamé Seddah
Abstract	Parsing predicate-argument structures in a deep syntax framework requires graphs to be predicted. Argument structures represent a higher level of abstraction than the syntactic ones and are thus more difficult to predict even for highly accurate parsing models on surfacic syntax. In this paper we investigate deep syntax parsing, using a French data set (Ribeyre et al., 2014a). We demonstrate that the use of topologically different types of syntactic features, such as dependencies, tree fragments, spines or syntactic paths, brings a much needed context to the parser. Our higher-order parsing model, gaining thus up to 4 points, establishes the state of the art for parsing French deep syntactic structures.
Tasks
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1566/
PDF	https://www.aclweb.org/anthology/L16-1566
PWC	https://paperswithcode.com/paper/accurate-deep-syntactic-parsing-of-graphs-the
Repo
Framework

FABIOLE, a Speech Database for Forensic Speaker Comparison


Title	FABIOLE, a Speech Database for Forensic Speaker Comparison
Authors	Moez Ajili, Jean-François Bonastre, Juliette Kahn, Solange Rossato, Guillaume Bernard
Abstract	A speech database has been collected to highlight the importance of the "speaker factor" in forensic voice comparison. FABIOLE was created during the FABIOLE project, funded by the French Research Agency (ANR) from 2013 to 2016. The corpus consists of more than 3,000 excerpts spoken by 130 native French male speakers. The speakers are divided into two categories: 30 target speakers, each with 100 excerpts, and 100 "impostors", each with only one excerpt. The data were collected from 10 different French radio and television shows, where each speech turn has a minimum duration of 30 s and good speech quality. The data set is mainly used for investigating the speaker factor in forensic voice comparison and interpreting some unsolved issues, such as the relationship between speaker characteristics and system behavior. In this paper, we present the FABIOLE database. Then, preliminary experiments are performed to evaluate the effect of the "speaker factor" and the show on the behavior of a voice comparison system.
Tasks
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1115/
PDF	https://www.aclweb.org/anthology/L16-1115
PWC	https://paperswithcode.com/paper/fabiole-a-speech-database-for-forensic
Repo
Framework

Quick and Reliable Document Alignment via TF/IDF-weighted Cosine Distance


Title	Quick and Reliable Document Alignment via TF/IDF-weighted Cosine Distance
Authors	Christian Buck, Philipp Koehn
Abstract
Tasks	Graph Matching, Machine Translation
Published	2016-08-01
URL	https://www.aclweb.org/anthology/W16-2365/
PDF	https://www.aclweb.org/anthology/W16-2365
PWC	https://paperswithcode.com/paper/quick-and-reliable-document-alignment-via
Repo
Framework

Vocabulary Development To Support Information Extraction of Substance Abuse from Psychiatry Notes


Title	Vocabulary Development To Support Information Extraction of Substance Abuse from Psychiatry Notes
Authors	Sumithra Velupillai, Danielle L. Mowery, Mike Conway, John Hurdle, Brent Kious
Abstract
Tasks
Published	2016-08-01
URL	https://www.aclweb.org/anthology/W16-2912/
PDF	https://www.aclweb.org/anthology/W16-2912
PWC	https://paperswithcode.com/paper/vocabulary-development-to-support-information
Repo
Framework

Measuring the behavioral impact of machine translation quality improvements with A/B testing


Title	Measuring the behavioral impact of machine translation quality improvements with A/B testing
Authors	Ben Russell, Duncan Gillespie
Abstract
Tasks	Machine Translation
Published	2016-11-01
URL	https://www.aclweb.org/anthology/D16-1251/
PDF	https://www.aclweb.org/anthology/D16-1251
PWC	https://paperswithcode.com/paper/measuring-the-behavioral-impact-of-machine
Repo
Framework

Training Data Enrichment for Infrequent Discourse Relations


Title	Training Data Enrichment for Infrequent Discourse Relations
Authors	Kailang Jiang, Giuseppe Carenini, Raymond Ng
Abstract	Discourse parsing is a popular technique widely used in text understanding, sentiment analysis and other NLP tasks. However, for most discourse parsers, the performance varies significantly across different discourse relations. In this paper, we first validate the underfitting hypothesis, i.e., the less frequent a relation is in the training data, the poorer the performance on that relation. We then explore how to increase the number of positive training instances, without resorting to manually creating additional labeled data. We propose a training data enrichment framework that relies on co-training of two different discourse parsers on unlabeled documents. Importantly, we show that co-training alone is not sufficient. The framework requires a filtering step to ensure that only "good quality" unlabeled documents can be used for enrichment and re-training. We propose and evaluate two ways to perform the filtering. The first is to use an agreement score between the two parsers. The second is to use only the confidence score of the faster parser. Our empirical results show that agreement score can help to boost the performance on infrequent relations, and that the confidence score is a viable approximation of the agreement score for infrequent relations.
Tasks	Sentiment Analysis
Published	2016-12-01
URL	https://www.aclweb.org/anthology/C16-1245/
PDF	https://www.aclweb.org/anthology/C16-1245
PWC	https://paperswithcode.com/paper/training-data-enrichment-for-infrequent
Repo
Framework
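
The agreement-based filtering step described in the abstract above might look like the sketch below. The abstract does not define the agreement score, so the definition used here (the fraction of positions where both parsers predict the same relation label) is an assumption, as are the function names, the parser interface, and the `threshold` value.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical parser interface: a document string in, one relation label
# per discourse unit pair out.
Parser = Callable[[str], List[str]]


def agreement_score(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Fraction of aligned positions where both parsers agree (assumed definition)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both parsers must label the same number of units")
    if not labels_a:
        return 0.0
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def select_for_enrichment(
    documents: Sequence[str],
    parser_a: Parser,
    parser_b: Parser,
    threshold: float = 0.8,
) -> List[Tuple[str, List[str]]]:
    """Keep only unlabeled documents on which the two parsers largely agree.

    Returns (document, labels from parser_a) pairs that could be added to
    the training data before re-training.
    """
    selected = []
    for doc in documents:
        labels_a = parser_a(doc)
        labels_b = parser_b(doc)
        if agreement_score(labels_a, labels_b) >= threshold:
            selected.append((doc, labels_a))
    return selected
```

The second filter mentioned in the abstract would replace `agreement_score` with the confidence of a single parser, which avoids running both parsers on every document.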

Expectation-Regulated Neural Model for Event Mention Extraction


Title	Expectation-Regulated Neural Model for Event Mention Extraction
Authors	Ching-Yun Chang, Zhiyang Teng, Yue Zhang
Abstract
Tasks
Published	2016-06-01
URL	https://www.aclweb.org/anthology/N16-1045/
PDF	https://www.aclweb.org/anthology/N16-1045
PWC	https://paperswithcode.com/paper/expectation-regulated-neural-model-for-event
Repo
Framework

Language Independent Dependency to Constituent Tree Conversion


Title	Language Independent Dependency to Constituent Tree Conversion
Authors	Young-Suk Lee, Zhiguo Wang
Abstract	We present a dependency to constituent tree conversion technique that aims to improve constituent parsing accuracies by leveraging dependency treebanks available in a wide variety in many languages. The technique works in two steps. First, a partial constituent tree is derived from a dependency tree with a very simple deterministic algorithm that is both language and dependency type independent. Second, a complete high accuracy constituent tree is derived with a constraint-based parser, which uses the partial constituent tree as external constraints. Evaluated on Section 22 of the WSJ Treebank, the technique achieves the state-of-the-art conversion F-score 95.6. When applied to English Universal Dependency treebank and German CoNLL2006 treebank, the converted treebanks added to the human-annotated constituent parser training corpus improve parsing F-scores significantly for both languages.
Tasks
Published	2016-12-01
URL	https://www.aclweb.org/anthology/C16-1041/
PDF	https://www.aclweb.org/anthology/C16-1041
PWC	https://paperswithcode.com/paper/language-independent-dependency-to
Repo
Framework

Exact Recovery of Hard Thresholding Pursuit


Title	Exact Recovery of Hard Thresholding Pursuit
Authors	Xiaotong Yuan, Ping Li, Tong Zhang
Abstract	The Hard Thresholding Pursuit (HTP) is a class of truncated gradient descent methods for finding sparse solutions of $\ell_0$-constrained loss minimization problems. The HTP-style methods have been shown to have strong approximation guarantees and impressive numerical performance in high dimensional statistical learning applications. However, the current theoretical treatment of these methods has traditionally been restricted to the analysis of parameter estimation consistency. It remains an open problem to analyze the support recovery performance (a.k.a. sparsistency) of this type of method for recovering the global minimizer of the original NP-hard problem. In this paper, we bridge this gap by showing, for the first time, that exact recovery of the global sparse minimizer is possible for HTP-style methods under restricted strong condition number bounding conditions. We further show that HTP-style methods are able to recover the support of certain relaxed sparse solutions without assuming bounded restricted strong condition number. Numerical results on simulated data confirm our theoretical predictions.
Tasks
Published	2016-12-01
URL	http://papers.nips.cc/paper/6432-exact-recovery-of-hard-thresholding-pursuit
PDF	http://papers.nips.cc/paper/6432-exact-recovery-of-hard-thresholding-pursuit.pdf
PWC	https://paperswithcode.com/paper/exact-recovery-of-hard-thresholding-pursuit
Repo
Framework
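
For readers unfamiliar with HTP, a minimal sketch of one common instance follows: the least-squares loss 0.5 * ||y - Ax||^2 with at most k nonzero coefficients. Each iteration takes a gradient step, keeps the k largest entries in magnitude, and then re-fits the coefficients on that support. The function names, the step size, and the stopping rule are assumptions made for illustration; the paper analyses a more general class of losses.

```python
from typing import List, Tuple


def solve_linear(M: List[List[float]], b: List[float]) -> List[float]:
    """Solve M z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - tail) / aug[i][i]
    return z


def htp_least_squares(
    A: List[List[float]],
    y: List[float],
    k: int,
    step: float = 1.0,
    iters: int = 50,
) -> Tuple[List[float], List[int]]:
    """Hard Thresholding Pursuit for k-sparse least squares (illustrative)."""
    n, d = len(A), len(A[0])
    x = [0.0] * d
    support: List[int] = []
    for _ in range(iters):
        # Gradient step on 0.5 * ||y - A x||^2, i.e. x + step * A^T (y - A x).
        resid = [y[i] - sum(A[i][j] * x[j] for j in range(d)) for i in range(n)]
        moved = [
            x[j] + step * sum(A[i][j] * resid[i] for i in range(n))
            for j in range(d)
        ]
        # Hard thresholding: keep the indices of the k largest magnitudes.
        ranked = sorted(range(d), key=lambda j: -abs(moved[j]))
        new_support = sorted(ranked[:k])
        if new_support == support:
            break  # support is stable, x is already the re-fitted solution
        support = new_support
        # Debiasing: least squares restricted to the support (normal equations).
        gram = [
            [sum(A[i][p] * A[i][q] for i in range(n)) for q in support]
            for p in support
        ]
        rhs = [sum(A[i][p] * y[i] for i in range(n)) for p in support]
        coef = solve_linear(gram, rhs)
        x = [0.0] * d
        for p, c in zip(support, coef):
            x[p] = c
    return x, support
```

The returned `support` is the quantity the abstract's support recovery (sparsistency) results concern: whether it matches the support of the true sparse minimizer. The step size of 1.0 assumes the columns of `A` are scaled to roughly unit norm.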

Promoting multiword expressions in A* TAG parsing


Title	Promoting multiword expressions in A* TAG parsing
Authors	Jakub Waszczuk, Agata Savary, Yannick Parmentier
Abstract	Multiword expressions (MWEs) are pervasive in natural languages and often have both idiomatic and compositional readings, which leads to high syntactic ambiguity. We show that for some MWE types idiomatic readings are usually the correct ones. We propose a heuristic for an A* parser for Tree Adjoining Grammars which benefits from this knowledge by promoting MWE-oriented analyses. This strategy leads to a substantial reduction in the parsing search space in case of true positive MWE occurrences, while avoiding parsing failures in case of false positives.
Tasks
Published	2016-12-01
URL	https://www.aclweb.org/anthology/C16-1042/
PDF	https://www.aclweb.org/anthology/C16-1042
PWC	https://paperswithcode.com/paper/promoting-multiword-expressions-in-a-tag
Repo
Framework