Paper Group NANR 72
Users and Data: The Two Neglected Children of Bilingual Natural Language Processing Research
Title | Users and Data: The Two Neglected Children of Bilingual Natural Language Processing Research |
Authors | Phillippe Langlais |
Abstract | Despite numerous studies devoted to mining parallel material from bilingual data, we have yet to see the resulting technologies wholeheartedly adopted by professional translators and terminologists alike. I argue that this state of affairs is mainly due to two factors: the emphasis published authors put on models (even though data is as important), and the conspicuous lack of concern for actual end-users. |
Tasks | Machine Translation |
Published | 2017-08-01 |
URL | https://www.aclweb.org/anthology/W17-2501/ |
PWC | https://paperswithcode.com/paper/users-and-data-the-two-neglected-children-of |
Repo | |
Framework | |
MONPA: Multi-objective Named-entity and Part-of-speech Annotator for Chinese using Recurrent Neural Network
Title | MONPA: Multi-objective Named-entity and Part-of-speech Annotator for Chinese using Recurrent Neural Network |
Authors | Yu-Lun Hsieh, Yung-Chun Chang, Yi-Jie Huang, Shu-Hao Yeh, Chun-Hung Chen, Wen-Lian Hsu |
Abstract | Part-of-speech (POS) tagging and named entity recognition (NER) are crucial steps in natural language processing. In addition, the difficulty of word segmentation places an additional burden on those who intend to deal with languages such as Chinese, and pipelined systems often suffer from error propagation. This work proposes an end-to-end model using a character-based recurrent neural network (RNN) to jointly accomplish segmentation, POS tagging and NER of a Chinese sentence. Experiments on previous word segmentation and NER datasets show that a single model with the proposed architecture is comparable to those trained specifically for each task, and outperforms freely available software. Moreover, we provide a web-based interface for the public to easily access this resource. |
Tasks | Named Entity Recognition, Part-Of-Speech Tagging |
Published | 2017-11-01 |
URL | https://www.aclweb.org/anthology/I17-2014/ |
PWC | https://paperswithcode.com/paper/monpa-multi-objective-named-entity-and-part |
Repo | |
Framework | |
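The abstract above describes a single character-based RNN that handles segmentation, POS tagging, and NER jointly. The sketch below is not the authors' model; it is a minimal PyTorch illustration of the general idea: a BiLSTM over characters with a composite label set (segmentation position fused with a POS or NER tag). The vocabulary size, label count, and toy tensors are assumptions.

```python
import torch
import torch.nn as nn

class JointCharTagger(nn.Module):
    """Character-level BiLSTM emitting one composite label per character."""
    def __init__(self, n_chars, n_labels, emb_dim=64, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(n_chars, emb_dim, padding_idx=0)
        self.rnn = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_labels)

    def forward(self, char_ids):                  # char_ids: (batch, seq_len)
        h, _ = self.rnn(self.emb(char_ids))       # (batch, seq_len, 2 * hidden)
        return self.out(h)                        # per-character label scores

# Toy batch: 3 sentences of 10 characters; 20 composite labels such as
# "B-Nb", "I-Nb", "B-PER", ... (segmentation position fused with POS/NER).
model = JointCharTagger(n_chars=5000, n_labels=20)
chars = torch.randint(1, 5000, (3, 10))
gold = torch.randint(0, 20, (3, 10))
loss = nn.CrossEntropyLoss()(model(chars).reshape(-1, 20), gold.reshape(-1))
loss.backward()    # an optimizer step would follow in a real training loop
```

A CRF layer over the per-character scores would be a natural extension for enforcing consistent label transitions.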
Last Words: Sharing Is Caring: The Future of Shared Tasks
Title | Last Words: Sharing Is Caring: The Future of Shared Tasks |
Authors | Malvina Nissim, Lasha Abzianidze, Kilian Evang, Rob van der Goot, Hessel Haagsma, Barbara Plank, Martijn Wieling |
Abstract | |
Tasks | |
Published | 2017-12-01 |
URL | https://www.aclweb.org/anthology/J17-4007/ |
PWC | https://paperswithcode.com/paper/last-words-sharing-is-caring-the-future-of |
Repo | |
Framework | |
UHH Submission to the WMT17 Quality Estimation Shared Task
Title | UHH Submission to the WMT17 Quality Estimation Shared Task |
Authors | Melania Duma, Wolfgang Menzel |
Abstract | |
Tasks | Language Modelling, Machine Translation |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/W17-4762/ |
PWC | https://paperswithcode.com/paper/uhh-submission-to-the-wmt17-quality |
Repo | |
Framework | |
Multiple Nominative Constructions in Japanese: An Incremental Grammar Perspective
Title | Multiple Nominative Constructions in Japanese: An Incremental Grammar Perspective |
Authors | Tohru Seraku |
Abstract | |
Tasks | |
Published | 2017-11-01 |
URL | https://www.aclweb.org/anthology/Y17-1017/ |
PWC | https://paperswithcode.com/paper/multiple-nominative-constructions-in-japanese |
Repo | |
Framework | |
Phrase Table Induction Using In-Domain Monolingual Data for Domain Adaptation in Statistical Machine Translation
Title | Phrase Table Induction Using In-Domain Monolingual Data for Domain Adaptation in Statistical Machine Translation |
Authors | Benjamin Marie, Atsushi Fujita |
Abstract | We present a new framework to induce an in-domain phrase table from in-domain monolingual data that can be used to adapt a general-domain statistical machine translation system to the targeted domain. Our method first compiles sets of phrases in source and target languages separately and generates candidate phrase pairs by taking the Cartesian product of the two phrase sets. It then computes inexpensive features for each candidate phrase pair and filters them using a supervised classifier in order to induce an in-domain phrase table. We experimented on the language pair English–French, both translation directions, in two domains and obtained consistently better results than a strong baseline system that uses an in-domain bilingual lexicon. We also conducted an error analysis that showed the induced phrase tables proposed useful translations, especially for words and phrases unseen in the parallel data used to train the general-domain baseline system. |
Tasks | Domain Adaptation, Machine Translation |
Published | 2017-01-01 |
URL | https://www.aclweb.org/anthology/Q17-1034/ |
PWC | https://paperswithcode.com/paper/phrase-table-induction-using-in-domain |
Repo | |
Framework | |
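As a rough illustration of the induction pipeline in the abstract above (not the paper's actual feature set or classifier), the sketch below enumerates the Cartesian product of two monolingual phrase sets, computes a few cheap features, and filters candidates with a supervised classifier; the toy phrases, seed lexicon, and features are assumptions.

```python
from itertools import product
from sklearn.linear_model import LogisticRegression

def features(src, tgt, seed_lexicon):
    """Cheap candidate-pair features: length ratio, lexicon coverage, length gap."""
    src_toks, tgt_toks = src.split(), tgt.split()
    translated = sum(1 for s in src_toks
                     if any((s, t) in seed_lexicon for t in tgt_toks))
    return [len(src_toks) / max(len(tgt_toks), 1),
            translated / max(len(src_toks), 1),
            abs(len(src) - len(tgt))]

# Tiny toy data standing in for the in-domain monolingual phrase sets.
src_phrases = ["renal failure", "blood pressure"]
tgt_phrases = ["insuffisance rénale", "pression artérielle", "chat noir"]
seed_lexicon = {("renal", "rénale"), ("failure", "insuffisance"),
                ("blood", "artérielle"), ("pressure", "pression")}

candidates = list(product(src_phrases, tgt_phrases))          # Cartesian product
X = [features(s, t, seed_lexicon) for s, t in candidates]

# In practice the classifier is trained on labelled phrase pairs; here it is
# fit on two hand-labelled toy examples just to keep the sketch runnable.
clf = LogisticRegression().fit(
    [features("renal failure", "insuffisance rénale", seed_lexicon),
     features("renal failure", "chat noir", seed_lexicon)], [1, 0])
phrase_table = [pair for pair, keep in zip(candidates, clf.predict(X)) if keep]
print(phrase_table)
```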
A Domain and Language Independent Named Entity Classification Approach Based on Profiles and Local Information
Title | A Domain and Language Independent Named Entity Classification Approach Based on Profiles and Local Information |
Authors | Isabel Moreno, María Teresa Romá-Ferri, Paloma Moreda Pozo |
Abstract | This paper presents a Named Entity Classification system, which employs machine learning. Our methodology employs local entity information and profiles as its feature set. All features are generated in an unsupervised manner. It is tested on two different data sets: (i) the DrugSemantics Spanish corpus (overall F1 = 74.92), where our results are in line with the state of the art without employing external domain-specific resources; and (ii) the English CoNLL2003 dataset (overall F1 = 81.40), where our results are lower than previous work but are reached without external knowledge or complex linguistic analysis. Last, using the same configuration for both corpora, the difference in overall F1 is only 6.48 points (DrugSemantics = 74.92 versus CoNLL2003 = 81.40). This result supports our hypothesis that our approach is language and domain independent and does not require any external knowledge or complex linguistic analysis. |
Tasks | Named Entity Recognition, Question Answering, Text Generation, Text Summarization |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/R17-1067/ |
DOI | https://doi.org/10.26615/978-954-452-049-6_067 |
PWC | https://paperswithcode.com/paper/a-domain-and-language-independent-named |
Repo | |
Framework | |
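To make the two feature families in the abstract concrete, here is a small, hypothetical sketch combining local information about a mention (word shape and immediate context) with an unsupervised "profile" of words co-occurring with the mention; the specific features and toy corpus are illustrative assumptions, not the paper's.

```python
from collections import Counter
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

def local_features(tokens, i):
    """Local information: word shape plus the immediate left/right context."""
    w = tokens[i]
    return {"shape=" + "".join("X" if ch.isupper() else "x" for ch in w): 1,
            "prev=" + (tokens[i - 1].lower() if i > 0 else "<s>"): 1,
            "next=" + (tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>"): 1}

def profile_features(word, corpus, top_k=3):
    """Unsupervised profile: words that co-occur with the mention in the corpus."""
    ctx = Counter(t.lower() for sent in corpus if word in sent
                  for t in sent if t != word)
    return {"prof=" + w: 1 for w, _ in ctx.most_common(top_k)}

corpus = [["Aspirin", "reduces", "fever"], ["Take", "Aspirin", "daily"],
          ["Madrid", "hosted", "the", "summit"]]
examples = [(corpus[0], 0, "DRUG"), (corpus[2], 0, "LOC")]   # (sentence, index, class)

X = [dict(local_features(s, i), **profile_features(s[i], corpus)) for s, i, _ in examples]
y = [label for _, _, label in examples]
vec = DictVectorizer()
clf = LogisticRegression().fit(vec.fit_transform(X), y)

test = ["Take", "Madrid", "for", "example"]
x = dict(local_features(test, 1), **profile_features("Madrid", corpus))
print(clf.predict(vec.transform([x])))   # classify the new mention
```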
Generative Local Metric Learning for Kernel Regression
Title | Generative Local Metric Learning for Kernel Regression |
Authors | Yung-Kyun Noh, Masashi Sugiyama, Kee-Eung Kim, Frank Park, Daniel D. Lee |
Abstract | This paper shows how metric learning can be used with Nadaraya-Watson (NW) kernel regression. Compared with standard approaches, such as bandwidth selection, we show how metric learning can significantly reduce the mean square error (MSE) in kernel regression, particularly for high-dimensional data. We propose a method for efficiently learning a good metric function based upon analyzing the performance of the NW estimator for Gaussian-distributed data. A key feature of our approach is that the NW estimator with a learned metric uses information from both the global and local structure of the training data. Theoretical and empirical results confirm that the learned metric can considerably reduce the bias and MSE for kernel regression even when the data are not confined to Gaussian. |
Tasks | Metric Learning |
Published | 2017-12-01 |
URL | http://papers.nips.cc/paper/6839-generative-local-metric-learning-for-kernel-regression |
PDF | http://papers.nips.cc/paper/6839-generative-local-metric-learning-for-kernel-regression.pdf |
PWC | https://paperswithcode.com/paper/generative-local-metric-learning-for-kernel |
Repo | |
Framework | |
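The core object in the abstract is the Nadaraya-Watson estimator with a learned metric. The NumPy sketch below shows NW regression with a Mahalanobis-style Gaussian kernel; here the metric matrix is set by hand for illustration, whereas the paper derives it from a generative Gaussian analysis of the data.

```python
import numpy as np

def nw_predict(X_train, y_train, X_test, A):
    """Nadaraya-Watson estimate with kernel exp(-0.5 * (x - x')^T A (x - x'))."""
    diffs = X_test[:, None, :] - X_train[None, :, :]          # (m, n, d)
    d2 = np.einsum("mnd,de,mne->mn", diffs, A, diffs)         # squared metric distance
    w = np.exp(-0.5 * d2)                                     # kernel weights
    return (w @ y_train) / w.sum(axis=1)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)   # only dimension 0 is relevant
X_test = rng.normal(size=(5, 2))

iso = nw_predict(X, y, X_test, np.eye(2))                  # single isotropic bandwidth
learned = nw_predict(X, y, X_test, np.diag([4.0, 0.25]))   # metric stretching dim 0
print(iso, learned, np.sin(X_test[:, 0]), sep="\n")
```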
Similarity Based Genre Identification for POS Tagging Experts & Dependency Parsing
Title | Similarity Based Genre Identification for POS Tagging Experts & Dependency Parsing |
Authors | Atreyee Mukherjee, Sandra Kübler |
Abstract | POS tagging and dependency parsing achieve good results for homogeneous datasets. However, these tasks are much more difficult on heterogeneous datasets. In (Mukherjee et al. 2016, 2017), we address this issue by creating genre experts for both POS tagging and parsing. We use topic modeling to automatically separate training and test data into genres and to create annotation experts per genre by training separate models for each topic. However, this approach assumes that topic modeling is performed jointly on training and test sentences each time a new test sentence is encountered. We extend this work by assigning new test sentences to their genre expert by using similarity metrics. We investigate three different types of methods: 1) based on words highly associated with a genre by the topic modeler, 2) using a k-nearest neighbor classification approach, and 3) using perplexity to determine the closest topic. The results show that the choice of similarity metric has an effect on results and that we can reach comparable accuracies to the joint topic modeling in POS tagging and dependency parsing, thus providing a viable and efficient approach to POS tagging and parsing a sentence by its genre expert. |
Tasks | Dependency Parsing, Domain Adaptation |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/R17-1068/ |
DOI | https://doi.org/10.26615/978-954-452-049-6_068 |
PWC | https://paperswithcode.com/paper/similarity-based-genre-identification-for-pos |
Repo | |
Framework | |
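As an illustration of the routing step described above (assigning a new test sentence to a genre expert without re-running topic modeling jointly), the sketch below uses scikit-learn LDA topics and a k-nearest-neighbour assignment over topic proportions; the toy corpus, topic count, and the experts themselves are assumptions.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

train_sents = ["the patient received the drug twice daily",
               "the striker scored a late goal in the final",
               "dosage was reduced after adverse reactions",
               "the team won the league by five points"]

vec = CountVectorizer()
X = vec.fit_transform(train_sents)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
train_topics = lda.transform(X)               # per-sentence genre proportions
genre_of_train = train_topics.argmax(axis=1)  # training genre for each sentence

def route(sentence, k=1):
    """Pick a genre expert for a new sentence via k-NN over topic proportions."""
    z = lda.transform(vec.transform([sentence]))[0]
    nearest = np.argsort(np.linalg.norm(train_topics - z, axis=1))[:k]
    return int(np.bincount(genre_of_train[nearest]).argmax())

print(route("the midfielder was injured during the match"))
# pos_taggers[route(s)] / parsers[route(s)] would be the genre expert to apply.
```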
Lookahead Bayesian Optimization with Inequality Constraints
Title | Lookahead Bayesian Optimization with Inequality Constraints |
Authors | Remi Lam, Karen Willcox |
Abstract | We consider the task of optimizing an objective function subject to inequality constraints when both the objective and the constraints are expensive to evaluate. Bayesian optimization (BO) is a popular way to tackle optimization problems with expensive objective function evaluations, but has mostly been applied to unconstrained problems. Several BO approaches have been proposed to address expensive constraints but are limited to greedy strategies maximizing immediate reward. To address this limitation, we propose a lookahead approach that selects the next evaluation in order to maximize the long-term feasible reduction of the objective function. We present numerical experiments demonstrating the performance improvements of such a lookahead approach compared to several greedy BO algorithms, including constrained expected improvement (EIC) and predictive entropy search with constraint (PESC). |
Tasks | |
Published | 2017-12-01 |
URL | http://papers.nips.cc/paper/6785-lookahead-bayesian-optimization-with-inequality-constraints |
PDF | http://papers.nips.cc/paper/6785-lookahead-bayesian-optimization-with-inequality-constraints.pdf |
PWC | https://paperswithcode.com/paper/lookahead-bayesian-optimization-with |
Repo | |
Framework | |
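The paper's contribution is a lookahead acquisition policy; that multi-step computation is not reproduced here. As a reference point, the sketch below implements the greedy constrained expected improvement (EIC) baseline mentioned in the abstract, with independent Gaussian processes for the objective and the constraint; the toy problem and GP settings are assumptions.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def f(x):                      # expensive objective (toy stand-in)
    return np.sin(3 * x) + x ** 2

def c(x):                      # constraint, feasible when c(x) <= 0 (i.e. x >= 0.5)
    return 0.5 - x

rng = np.random.default_rng(1)
X = rng.uniform(0.0, 2.0, size=(6, 1))                       # initial evaluations
gp_f = GaussianProcessRegressor(kernel=RBF(0.5), alpha=1e-6).fit(X, f(X).ravel())
gp_c = GaussianProcessRegressor(kernel=RBF(0.5), alpha=1e-6).fit(X, c(X).ravel())

feasible = c(X).ravel() <= 0
best = f(X).ravel()[feasible].min() if feasible.any() else f(X).ravel().min()

def eic(x_grid):
    """Greedy acquisition: expected improvement times probability of feasibility."""
    mu, sd = gp_f.predict(x_grid, return_std=True)
    mu_c, sd_c = gp_c.predict(x_grid, return_std=True)
    z = (best - mu) / np.maximum(sd, 1e-9)
    ei = (best - mu) * norm.cdf(z) + sd * norm.pdf(z)
    return ei * norm.cdf(-mu_c / np.maximum(sd_c, 1e-9))

grid = np.linspace(0.0, 2.0, 200).reshape(-1, 1)
print("next evaluation at x =", grid[np.argmax(eic(grid))][0])
```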
Unsupervised Dialogue Act Induction using Gaussian Mixtures
Title | Unsupervised Dialogue Act Induction using Gaussian Mixtures |
Authors | Tomáš Brychcín, Pavel Král |
Abstract | This paper introduces a new unsupervised approach for dialogue act induction. Given the sequence of dialogue utterances, the task is to assign them the labels representing their function in the dialogue. Utterances are represented as real-valued vectors encoding their meaning. We model the dialogue as a hidden Markov model with emission probabilities estimated by Gaussian mixtures. We use Gibbs sampling for posterior inference. We present the results on the standard Switchboard-DAMSL corpus. Our algorithm achieves promising results compared with strong supervised baselines and outperforms other unsupervised algorithms. |
Tasks | Topic Models |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/E17-2078/ |
PWC | https://paperswithcode.com/paper/unsupervised-dialogue-act-induction-using |
Repo | |
Framework | |
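The generative story above (dialogue acts as HMM states, utterance vectors as emissions) can be approximated with off-the-shelf tools. The sketch below uses hmmlearn's GaussianHMM, i.e. single-Gaussian emissions fit by EM, whereas the paper uses Gaussian mixture emissions and Gibbs sampling; the random "utterance embeddings" are placeholders for real sentence vectors.

```python
import numpy as np
from hmmlearn import hmm

rng = np.random.default_rng(0)
# Two toy "dialogues" of 60 and 40 utterances; each utterance is an 8-dim
# vector standing in for a real utterance embedding.
dialogues = [rng.normal(size=(60, 8)), rng.normal(size=(40, 8))]
X = np.vstack(dialogues)
lengths = [len(d) for d in dialogues]

# 5 hidden states ~ 5 induced dialogue-act labels.
model = hmm.GaussianHMM(n_components=5, covariance_type="diag",
                        n_iter=50, random_state=0)
model.fit(X, lengths)
induced_acts = model.predict(X, lengths)   # one unsupervised label per utterance
print(induced_acts[:10])
```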
A deep-learning based native-language classification by using a latent semantic analysis for the NLI Shared Task 2017
Title | A deep-learning based native-language classification by using a latent semantic analysis for the NLI Shared Task 2017 |
Authors | Yoo Rhee Oh, Hyung-Bae Jeon, Hwa Jeon Song, Yun-Kyung Lee, Jeon-Gue Park, Yun-Keun Lee |
Abstract | This paper proposes a deep-learning based native-language identification (NLI) system using latent semantic analysis (LSA), submitted as a participant (ETRI-SLP) to the NLI Shared Task 2017, which aims to detect the native language of an essay or spoken response in a standardized assessment of English proficiency for academic purposes. To this end, we use six unit forms of the text data: character 4/5/6-grams and word 1/2/3-grams. For each unit form, we convert the text into a count-based vector, extract a 2000-rank LSA feature, and perform linear discriminant analysis (LDA) based dimension reduction. From the count-based vector or the LSA-LDA feature, we also obtain the output prediction values of a support vector machine (SVM) based classifier, the output prediction values of a deep neural network (DNN) based classifier, and the bottleneck values of a DNN based classifier. To incorporate the various text-based features and a speech-based i-vector feature, we design two DNN based ensemble classifiers for late fusion and early fusion, respectively. In the NLI experiments, F1 (macro) scores of 0.8601, 0.8664, and 0.9220 are obtained for the essay track, the speech track, and the fusion track, respectively. The proposed method is comparable to the top-ranked teams on the speech and fusion tracks, although it performs slightly lower on the essay track. |
Tasks | Dimensionality Reduction, Language Identification, Native Language Identification, Speech Recognition |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/W17-5047/ |
PWC | https://paperswithcode.com/paper/a-deep-learning-based-native-language |
Repo | |
Framework | |
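The sketch below reproduces just one branch of the described system as a scikit-learn pipeline: character n-gram counts, a truncated SVD standing in for the 2000-rank LSA feature, LDA-based dimension reduction, and a linear SVM. The DNN classifiers, the six-unit-form ensemble, and the i-vector fusion are omitted; the toy essays and tiny dimensionalities are assumptions.

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.svm import LinearSVC

essays = ["I am agree with the statement because travelling helps people",
          "In my opinion the university students should work part time",
          "This essay discuss about the advantages of living abroad",
          "Firstly I want to say that the topic is very interesting"]
native_lang = ["KOR", "DEU", "KOR", "DEU"]      # toy L1 labels

branch = make_pipeline(
    CountVectorizer(analyzer="char_wb", ngram_range=(4, 4)),  # character 4-grams
    TruncatedSVD(n_components=2),      # stand-in for the 2000-rank LSA feature
    LinearDiscriminantAnalysis(),      # LDA-based dimension reduction
    LinearSVC(),                       # the SVM branch of the ensemble
)
branch.fit(essays, native_lang)
print(branch.predict(["I am agree that the students should travel more often"]))
```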
Bulgarian-English and English-Bulgarian Machine Translation: System Design and Evaluation
Title | Bulgarian-English and English-Bulgarian Machine Translation: System Design and Evaluation |
Authors | Petya Osenova, Kiril Simov |
Abstract | The paper presents a deep factored machine translation (MT) system between English and Bulgarian languages in both directions. The MT system is hybrid. It consists of three main steps: (1) the source-language text is linguistically annotated, (2) it is translated to the target language with the Moses system, and (3) translation is post-processed with the help of the transferred linguistic annotation from the source text. Besides automatic evaluation we performed manual evaluation over a domain test suite of sentences demonstrating certain phenomena like imperatives, questions, etc. |
Tasks | Machine Translation |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/R17-1073/ |
DOI | https://doi.org/10.26615/978-954-452-049-6_073 |
PWC | https://paperswithcode.com/paper/bulgarian-english-and-english-bulgarian |
Repo | |
Framework | |
Analyzing Semantic Change in Japanese Loanwords
Title | Analyzing Semantic Change in Japanese Loanwords |
Authors | Hiroya Takamura, Ryo Nagata, Yoshifumi Kawasaki |
Abstract | We analyze semantic changes in loanwords from English that are used in Japanese (Japanese loanwords). Specifically, we create word embeddings of English and Japanese and map the Japanese embeddings into the English space so that we can calculate the similarity of each Japanese word and each English word. We then attempt to find loanwords that are semantically different from their original, see if known meaning changes are correctly captured, and show the possibility of using our methodology in language education. |
Tasks | Word Embeddings |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/E17-1112/ |
PWC | https://paperswithcode.com/paper/analyzing-semantic-change-in-japanese |
Repo | |
Framework | |
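A common way to realize the mapping step described above is orthogonal Procrustes: learn a rotation from the Japanese embedding space into the English one from a seed dictionary, then compare each loanword with its English source word by cosine similarity. The sketch below does exactly that, with random placeholder vectors standing in for real pretrained embeddings; the seed pairs and dimensions are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 50
ja = {w: rng.normal(size=dim) for w in ["テーブル", "マンション", "コップ"]}
en = {w: rng.normal(size=dim) for w in ["table", "mansion", "cup"]}
seed_pairs = [("テーブル", "table"), ("コップ", "cup")]   # assumed seed dictionary

X = np.vstack([ja[j] for j, _ in seed_pairs])     # source-space vectors
Y = np.vstack([en[e] for _, e in seed_pairs])     # target-space vectors
U, _, Vt = np.linalg.svd(X.T @ Y)
W = U @ Vt                                        # orthogonal Procrustes map

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Low similarity after mapping suggests the loanword has drifted semantically
# (e.g. Japanese "マンション" ~ condominium vs. English "mansion").
print(cosine(ja["マンション"] @ W, en["mansion"]))
```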
Entity Linking via Joint Encoding of Types, Descriptions, and Context
Title | Entity Linking via Joint Encoding of Types, Descriptions, and Context |
Authors | Nitish Gupta, Sameer Singh, Dan Roth |
Abstract | For accurate entity linking, we need to capture various information aspects of an entity, such as its description in a KB, contexts in which it is mentioned, and structured knowledge. Additionally, a linking system should work on texts from different domains without requiring domain-specific training data or hand-engineered features. In this work we present a neural, modular entity linking system that learns a unified dense representation for each entity using multiple sources of information, such as its description, contexts around its mentions, and its fine-grained types. We show that the resulting entity linking system is effective at combining these sources, and performs competitively, sometimes out-performing current state-of-the-art systems across datasets, without requiring any domain-specific training data or hand-engineered features. We also show that our model can effectively "embed" entities that are new to the KB, and is able to link its mentions accurately. |
Tasks | Entity Linking |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/D17-1284/ |
PWC | https://paperswithcode.com/paper/entity-linking-via-joint-encoding-of-types |
Repo | |
Framework | |
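To make the scoring idea in the abstract concrete, the sketch below builds a dense entity representation by combining (here, simply summing) encodings of an entity's description and types, and ranks candidates against an encoded mention context by dot product. Averaged random word vectors stand in for the paper's learned neural encoders; the candidate entities and texts are toy assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 32
vocab = {}

def vec(text):
    """Placeholder encoder: average of randomly initialised word vectors."""
    words = text.lower().split()
    for w in words:
        vocab.setdefault(w, rng.normal(size=dim))
    return np.mean([vocab[w] for w in words], axis=0)

# Candidate entities: description encoding + type encoding summed into one vector.
entities = {
    "Jaguar (animal)": vec("large cat native to the americas") + vec("animal felidae"),
    "Jaguar (car)": vec("british manufacturer of luxury cars") + vec("company automobile"),
}
mention_context = vec("he parked his jaguar outside the stadium")

scores = {name: float(mention_context @ emb) for name, emb in entities.items()}
print(max(scores, key=scores.get))   # highest-scoring candidate for the mention
```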