July 26, 2019


Paper Group NANR 72



Users and Data: The Two Neglected Children of Bilingual Natural Language Processing Research

Title Users and Data: The Two Neglected Children of Bilingual Natural Language Processing Research
Authors Philippe Langlais
Abstract Despite numerous studies devoted to mining parallel material from bilingual data, we have yet to see the resulting technologies wholeheartedly adopted by professional translators and terminologists alike. I argue that this state of affairs is mainly due to two factors: the emphasis published authors put on models (even though data is as important), and the conspicuous lack of concern for actual end-users.
Tasks Machine Translation
Published 2017-08-01
URL https://www.aclweb.org/anthology/W17-2501/
PDF https://www.aclweb.org/anthology/W17-2501
PWC https://paperswithcode.com/paper/users-and-data-the-two-neglected-children-of
Repo
Framework

MONPA: Multi-objective Named-entity and Part-of-speech Annotator for Chinese using Recurrent Neural Network

Title MONPA: Multi-objective Named-entity and Part-of-speech Annotator for Chinese using Recurrent Neural Network
Authors Yu-Lun Hsieh, Yung-Chun Chang, Yi-Jie Huang, Shu-Hao Yeh, Chun-Hung Chen, Wen-Lian Hsu
Abstract Part-of-speech (POS) tagging and named entity recognition (NER) are crucial steps in natural language processing. In addition, the difficulty of word segmentation places an additional burden on those who deal with languages such as Chinese, and pipelined systems often suffer from error propagation. This work proposes an end-to-end model using a character-based recurrent neural network (RNN) to jointly accomplish segmentation, POS tagging and NER of a Chinese sentence. Experiments on previous word segmentation and NER datasets show that a single model with the proposed architecture is comparable to those trained specifically for each task, and outperforms freely available software. Moreover, we provide a web-based interface for the public to easily access this resource.
Tasks Named Entity Recognition, Part-Of-Speech Tagging
Published 2017-11-01
URL https://www.aclweb.org/anthology/I17-2014/
PDF https://www.aclweb.org/anthology/I17-2014
PWC https://paperswithcode.com/paper/monpa-multi-objective-named-entity-and-part
Repo
Framework
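
The joint model described above can be pictured as a single character-level sequence labeller whose tag set encodes segmentation boundary, POS and entity type at once. Below is a minimal sketch of such a character-based BiLSTM tagger in PyTorch; it is not the MONPA code, and the vocabulary size, tag inventory and dimensions are illustrative assumptions.

```python
# Minimal sketch (not the authors' code): a character-based BiLSTM that predicts
# one composite tag per character, e.g. "B-N-PER" encoding boundary, POS and
# entity type jointly. Sizes below are illustrative assumptions.
import torch
import torch.nn as nn

class CharJointTagger(nn.Module):
    def __init__(self, n_chars=5000, n_tags=200, emb=64, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(n_chars, emb, padding_idx=0)
        self.rnn = nn.LSTM(emb, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_tags)   # one composite tag per character

    def forward(self, char_ids):                   # char_ids: (batch, seq_len)
        h, _ = self.rnn(self.emb(char_ids))
        return self.out(h)                         # (batch, seq_len, n_tags)

# Toy usage: score a batch of two 6-character "sentences" and decode greedily.
model = CharJointTagger()
logits = model(torch.randint(1, 5000, (2, 6)))
pred_tags = logits.argmax(dim=-1)
```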

Last Words: Sharing Is Caring: The Future of Shared Tasks

Title Last Words: Sharing Is Caring: The Future of Shared Tasks
Authors Malvina Nissim, Lasha Abzianidze, Kilian Evang, Rob van der Goot, Hessel Haagsma, Barbara Plank, Martijn Wieling
Abstract
Tasks
Published 2017-12-01
URL https://www.aclweb.org/anthology/J17-4007/
PDF https://www.aclweb.org/anthology/J17-4007
PWC https://paperswithcode.com/paper/last-words-sharing-is-caring-the-future-of
Repo
Framework

UHH Submission to the WMT17 Quality Estimation Shared Task

Title UHH Submission to the WMT17 Quality Estimation Shared Task
Authors Melania Duma, Wolfgang Menzel
Abstract
Tasks Language Modelling, Machine Translation
Published 2017-09-01
URL https://www.aclweb.org/anthology/W17-4762/
PDF https://www.aclweb.org/anthology/W17-4762
PWC https://paperswithcode.com/paper/uhh-submission-to-the-wmt17-quality
Repo
Framework

Multiple Nominative Constructions in Japanese: An Incremental Grammar Perspective

Title Multiple Nominative Constructions in Japanese: An Incremental Grammar Perspective
Authors Tohru Seraku
Abstract
Tasks
Published 2017-11-01
URL https://www.aclweb.org/anthology/Y17-1017/
PDF https://www.aclweb.org/anthology/Y17-1017
PWC https://paperswithcode.com/paper/multiple-nominative-constructions-in-japanese
Repo
Framework

Phrase Table Induction Using In-Domain Monolingual Data for Domain Adaptation in Statistical Machine Translation

Title Phrase Table Induction Using In-Domain Monolingual Data for Domain Adaptation in Statistical Machine Translation
Authors Benjamin Marie, Atsushi Fujita
Abstract We present a new framework to induce an in-domain phrase table from in-domain monolingual data that can be used to adapt a general-domain statistical machine translation system to the targeted domain. Our method first compiles sets of phrases in source and target languages separately and generates candidate phrase pairs by taking the Cartesian product of the two phrase sets. It then computes inexpensive features for each candidate phrase pair and filters them using a supervised classifier in order to induce an in-domain phrase table. We experimented on the language pair English{–}French, both translation directions, in two domains and obtained consistently better results than a strong baseline system that uses an in-domain bilingual lexicon. We also conducted an error analysis that showed the induced phrase tables proposed useful translations, especially for words and phrases unseen in the parallel data used to train the general-domain baseline system.
Tasks Domain Adaptation, Machine Translation
Published 2017-01-01
URL https://www.aclweb.org/anthology/Q17-1034/
PDF https://www.aclweb.org/anthology/Q17-1034
PWC https://paperswithcode.com/paper/phrase-table-induction-using-in-domain
Repo
Framework
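
The induction loop in the abstract reduces to three steps: enumerate the Cartesian product of the two monolingual phrase sets, score each candidate pair with inexpensive features, and keep the pairs a supervised classifier accepts. A minimal sketch of that loop follows; the seed pairs, the two features and the scikit-learn classifier are illustrative assumptions, not the authors' feature set.

```python
# Minimal sketch of the induction loop (not the authors' code): Cartesian product
# of monolingual phrase sets, cheap features, and a supervised filter.
from itertools import product
from difflib import SequenceMatcher
from sklearn.linear_model import LogisticRegression

def features(src, tgt):
    len_ratio = len(src) / max(len(tgt), 1)
    surface_sim = SequenceMatcher(None, src, tgt).ratio()   # cheap string feature
    return [len_ratio, surface_sim]

# Tiny labelled seed (1 = valid translation pair) used to train the filter.
seed = [("maison", "house", 1), ("maison", "dog", 0),
        ("chien", "dog", 1), ("chien", "house", 0)]
clf = LogisticRegression().fit([features(s, t) for s, t, _ in seed],
                               [y for _, _, y in seed])

src_phrases = ["maison bleue", "chien"]
tgt_phrases = ["blue house", "dog", "cat"]
candidates = list(product(src_phrases, tgt_phrases))          # Cartesian product
phrase_table = [(s, t) for s, t in candidates
                if clf.predict([features(s, t)])[0] == 1]      # induced table
```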

A Domain and Language Independent Named Entity Classification Approach Based on Profiles and Local Information

Title A Domain and Language Independent Named Entity Classification Approach Based on Profiles and Local Information
Authors Isabel Moreno, María Teresa Romá-Ferri, Paloma Moreda Pozo
Abstract This paper presents a Named Entity Classification system which employs machine learning. Our methodology uses local entity information and profiles as the feature set. All features are generated in an unsupervised manner. It is tested on two different data sets: (i) the DrugSemantics Spanish corpus (Overall F1 = 74.92), where results are in line with the state of the art without employing external domain-specific resources; and (ii) the English CoNLL2003 dataset (Overall F1 = 81.40), where our results are lower than previous work but are reached without external knowledge or complex linguistic analysis. Finally, using the same configuration for the two corpora, the difference in overall F1 is only 6.48 points (DrugSemantics = 74.92 versus CoNLL2003 = 81.40). This result supports our hypothesis that our approach is language and domain independent and does not require any external knowledge or complex linguistic analysis.
Tasks Named Entity Recognition, Question Answering, Text Generation, Text Summarization
Published 2017-09-01
URL https://www.aclweb.org/anthology/R17-1067/
PDF https://doi.org/10.26615/978-954-452-049-6_067
PWC https://paperswithcode.com/paper/a-domain-and-language-independent-named
Repo
Framework

Generative Local Metric Learning for Kernel Regression

Title Generative Local Metric Learning for Kernel Regression
Authors Yung-Kyun Noh, Masashi Sugiyama, Kee-Eung Kim, Frank Park, Daniel D. Lee
Abstract This paper shows how metric learning can be used with Nadaraya-Watson (NW) kernel regression. Compared with standard approaches, such as bandwidth selection, we show how metric learning can significantly reduce the mean square error (MSE) in kernel regression, particularly for high-dimensional data. We propose a method for efficiently learning a good metric function based upon analyzing the performance of the NW estimator for Gaussian-distributed data. A key feature of our approach is that the NW estimator with a learned metric uses information from both the global and local structure of the training data. Theoretical and empirical results confirm that the learned metric can considerably reduce the bias and MSE for kernel regression even when the data are not confined to Gaussian.
Tasks Metric Learning
Published 2017-12-01
URL http://papers.nips.cc/paper/6839-generative-local-metric-learning-for-kernel-regression
PDF http://papers.nips.cc/paper/6839-generative-local-metric-learning-for-kernel-regression.pdf
PWC https://paperswithcode.com/paper/generative-local-metric-learning-for-kernel
Repo
Framework
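
The key object here is the Nadaraya-Watson estimator evaluated under a Mahalanobis metric rather than a plain Euclidean bandwidth. The sketch below only shows where a learned metric M enters the estimator; how M is obtained from the Gaussian analysis in the paper is not reproduced, and the data and metric are illustrative assumptions.

```python
# Minimal sketch: Nadaraya-Watson regression whose Gaussian kernel uses a
# Mahalanobis metric M. The paper learns M; here M is simply supplied.
import numpy as np

def nw_predict(x_query, X_train, y_train, M, bandwidth=1.0):
    diff = X_train - x_query                        # (n, d)
    d2 = np.einsum("nd,de,ne->n", diff, M, diff)    # squared Mahalanobis distances
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))        # Gaussian kernel weights
    return np.dot(w, y_train) / w.sum()             # weighted average of targets

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X[:, 0] + 0.1 * rng.normal(size=100)            # target depends mostly on dim 0
M = np.diag([10.0, 0.1, 0.1])                       # metric emphasising dimension 0
print(nw_predict(np.zeros(3), X, y, M))
```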

Similarity Based Genre Identification for POS Tagging Experts & Dependency Parsing

Title Similarity Based Genre Identification for POS Tagging Experts & Dependency Parsing
Authors Atreyee Mukherjee, Sandra Kübler
Abstract POS tagging and dependency parsing achieve good results for homogeneous datasets. However, these tasks are much more difficult on heterogeneous datasets. In (Mukherjee et al. 2016, 2017), we address this issue by creating genre experts for both POS tagging and parsing. We use topic modeling to automatically separate training and test data into genres and to create annotation experts per genre by training separate models for each topic. However, this approach assumes that topic modeling is performed jointly on training and test sentences each time a new test sentence is encountered. We extend this work by assigning new test sentences to their genre expert by using similarity metrics. We investigate three different types of methods: 1) based on words highly associated with a genre by the topic modeler, 2) using a k-nearest neighbor classification approach, and 3) using perplexity to determine the closest topic. The results show that the choice of similarity metric has an effect on results and that we can reach comparable accuracies to the joint topic modeling in POS tagging and dependency parsing, thus providing a viable and efficient approach to POS tagging and parsing a sentence by its genre expert.
Tasks Dependency Parsing, Domain Adaptation
Published 2017-09-01
URL https://www.aclweb.org/anthology/R17-1068/
PDF https://doi.org/10.26615/978-954-452-049-6_068
PWC https://paperswithcode.com/paper/similarity-based-genre-identification-for-pos
Repo
Framework
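
Option (2) above, routing a new test sentence to the genre expert of its nearest training sentences, can be illustrated with a few lines of scikit-learn. The toy sentences, the topic labels and k are assumptions for illustration; in the paper the genre ids come from the topic modeler run on the training data.

```python
# Minimal sketch: assign a new test sentence to a genre expert via k-NN over
# TF-IDF vectors, so no joint topic modelling is re-run at test time.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier

train_sents = ["the share price fell sharply",
               "the patient was given antibiotics",
               "stocks rallied after the earnings call",
               "the doctor examined the wound"]
train_topic = [0, 1, 0, 1]                         # genre ids from the topic modeler

vec = TfidfVectorizer()
knn = KNeighborsClassifier(n_neighbors=1).fit(vec.fit_transform(train_sents),
                                              train_topic)
expert_id = knn.predict(vec.transform(["the nurse changed the dressing"]))[0]
# expert_id selects which genre-specific tagger/parser to apply to the sentence.
```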

Lookahead Bayesian Optimization with Inequality Constraints

Title Lookahead Bayesian Optimization with Inequality Constraints
Authors Remi Lam, Karen Willcox
Abstract We consider the task of optimizing an objective function subject to inequality constraints when both the objective and the constraints are expensive to evaluate. Bayesian optimization (BO) is a popular way to tackle optimization problems with expensive objective function evaluations, but has mostly been applied to unconstrained problems. Several BO approaches have been proposed to address expensive constraints but are limited to greedy strategies maximizing immediate reward. To address this limitation, we propose a lookahead approach that selects the next evaluation in order to maximize the long-term feasible reduction of the objective function. We present numerical experiments demonstrating the performance improvements of such a lookahead approach compared to several greedy BO algorithms, including constrained expected improvement (EIC) and predictive entropy search with constraint (PESC).
Tasks
Published 2017-12-01
URL http://papers.nips.cc/paper/6785-lookahead-bayesian-optimization-with-inequality-constraints
PDF http://papers.nips.cc/paper/6785-lookahead-bayesian-optimization-with-inequality-constraints.pdf
PWC https://paperswithcode.com/paper/lookahead-bayesian-optimization-with
Repo
Framework
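
For context, the greedy baseline the paper improves on, constrained expected improvement (EIC), multiplies the expected improvement under the objective GP by the probability of feasibility under the constraint GP. A minimal sketch of that acquisition function follows; the lookahead rollout that is the paper's actual contribution is not shown, and the toy objective and constraint are assumptions.

```python
# Minimal sketch of constrained expected improvement (EIC), the greedy baseline:
# EI of the objective GP times the probability that the constraint GP is <= 0.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def eic(x_cand, gp_obj, gp_con, best_feasible):
    mu, sd = gp_obj.predict(x_cand, return_std=True)
    z = (best_feasible - mu) / np.maximum(sd, 1e-9)
    ei = sd * (z * norm.cdf(z) + norm.pdf(z))               # expected improvement
    mu_c, sd_c = gp_con.predict(x_cand, return_std=True)
    p_feas = norm.cdf(-mu_c / np.maximum(sd_c, 1e-9))       # P[c(x) <= 0]
    return ei * p_feas

X = np.random.rand(8, 1)
gp_obj = GaussianProcessRegressor().fit(X, np.sin(3 * X).ravel())   # toy objective
gp_con = GaussianProcessRegressor().fit(X, (X - 0.5).ravel())       # toy constraint c(x) <= 0
cands = np.linspace(0, 1, 101).reshape(-1, 1)
x_next = cands[np.argmax(eic(cands, gp_obj, gp_con, best_feasible=0.1))]
```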

Unsupervised Dialogue Act Induction using Gaussian Mixtures

Title Unsupervised Dialogue Act Induction using Gaussian Mixtures
Authors Tomáš Brychcín, Pavel Král
Abstract This paper introduces a new unsupervised approach for dialogue act induction. Given the sequence of dialogue utterances, the task is to assign them the labels representing their function in the dialogue. Utterances are represented as real-valued vectors encoding their meaning. We model the dialogue as a Hidden Markov model with emission probabilities estimated by Gaussian mixtures. We use Gibbs sampling for posterior inference. We present the results on the standard Switchboard-DAMSL corpus. Our algorithm achieves promising results compared with strong supervised baselines and outperforms other unsupervised algorithms.
Tasks Topic Models
Published 2017-04-01
URL https://www.aclweb.org/anthology/E17-2078/
PDF https://www.aclweb.org/anthology/E17-2078
PWC https://paperswithcode.com/paper/unsupervised-dialogue-act-induction-using
Repo
Framework
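
A heavily simplified stand-in for the model above is to cluster utterance vectors with a Gaussian mixture and read the component ids as induced dialogue acts. The sketch below does only that; the paper additionally ties the Gaussian emissions together with an HMM over the utterance sequence and infers the labels by Gibbs sampling, neither of which is reproduced here.

```python
# Simplified sketch: Gaussian-mixture clustering of utterance vectors as a
# stand-in for dialogue act induction (no HMM transitions, no Gibbs sampling).
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
utt_vectors = rng.normal(size=(200, 50))           # stand-in for utterance embeddings
gmm = GaussianMixture(n_components=10, random_state=0).fit(utt_vectors)
dialogue_acts = gmm.predict(utt_vectors)           # one induced act id per utterance
```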

A deep-learning based native-language classification by using a latent semantic analysis for the NLI Shared Task 2017

Title A deep-learning based native-language classification by using a latent semantic analysis for the NLI Shared Task 2017
Authors Yoo Rhee Oh, Hyung-Bae Jeon, Hwa Jeon Song, Yun-Kyung Lee, Jeon-Gue Park, Yun-Keun Lee
Abstract This paper proposes a deep-learning based native-language identification (NLI) system using latent semantic analysis (LSA), submitted as the ETRI-SLP entry to the NLI Shared Task 2017, which aims to detect the native language of an essay or speech response from a standardized assessment of English proficiency for academic purposes. To this end, we use six unit forms of text data: character 4/5/6-grams and word 1/2/3-grams. For each unit form, we convert the text into a count-based vector, extract a 2000-rank LSA feature, and perform a linear discriminant analysis (LDA) based dimension reduction. From the count-based vector or the LSA-LDA feature, we also obtain the output prediction values of a support vector machine (SVM) based classifier, the output prediction values of a deep neural network (DNN) based classifier, and the bottleneck values of a DNN based classifier. To incorporate the various kinds of text-based features and a speech-based i-vector feature, we design two DNN based ensemble classifiers for late fusion and early fusion, respectively. In the NLI experiments, F1 (macro) scores of 0.8601, 0.8664, and 0.9220 are obtained for the essay track, the speech track, and the fusion track, respectively. The proposed method has comparable performance to the top-ranked teams for the speech and fusion tracks, although it has slightly lower performance for the essay track.
Tasks Dimensionality Reduction, Language Identification, Native Language Identification, Speech Recognition
Published 2017-09-01
URL https://www.aclweb.org/anthology/W17-5047/
PDF https://www.aclweb.org/anthology/W17-5047
PWC https://paperswithcode.com/paper/a-deep-learning-based-native-language
Repo
Framework
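
One branch of the pipeline, character n-gram counts projected to an LSA feature and fed to an SVM, can be sketched directly with scikit-learn. The texts, labels and the tiny SVD rank below are illustrative; the paper uses 2000-rank LSA, an additional LDA reduction, DNN classifiers and ensemble fusion on top.

```python
# Minimal sketch of one branch: char 4/5/6-gram counts -> LSA (truncated SVD)
# -> linear SVM. Texts, labels and the SVD rank are toy assumptions.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

texts = ["I am agree with this opinion", "He explained me the problem",
         "I agree with this opinion", "He explained the problem to me"]
labels = ["L1_A", "L1_A", "L1_B", "L1_B"]           # toy native-language labels

nli = make_pipeline(
    CountVectorizer(analyzer="char", ngram_range=(4, 6)),   # char 4/5/6-grams
    TruncatedSVD(n_components=3),                            # LSA feature
    LinearSVC(),
)
nli.fit(texts, labels)
print(nli.predict(["He explained me the answer"]))
```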

Bulgarian-English and English-Bulgarian Machine Translation: System Design and Evaluation

Title Bulgarian-English and English-Bulgarian Machine Translation: System Design and Evaluation
Authors Petya Osenova, Kiril Simov
Abstract The paper presents a deep factored machine translation (MT) system between English and Bulgarian in both directions. The MT system is hybrid. It consists of three main steps: (1) the source-language text is linguistically annotated, (2) it is translated to the target language with the Moses system, and (3) the translation is post-processed with the help of the linguistic annotation transferred from the source text. Besides automatic evaluation, we performed manual evaluation over a domain test suite of sentences demonstrating certain phenomena such as imperatives, questions, etc.
Tasks Machine Translation
Published 2017-09-01
URL https://www.aclweb.org/anthology/R17-1073/
PDF https://doi.org/10.26615/978-954-452-049-6_073
PWC https://paperswithcode.com/paper/bulgarian-english-and-english-bulgarian
Repo
Framework

Analyzing Semantic Change in Japanese Loanwords

Title Analyzing Semantic Change in Japanese Loanwords
Authors Hiroya Takamura, Ryo Nagata, Yoshifumi Kawasaki
Abstract We analyze semantic changes in loanwords from English that are used in Japanese (Japanese loanwords). Specifically, we create word embeddings of English and Japanese and map the Japanese embeddings into the English space so that we can calculate the similarity of each Japanese word and each English word. We then attempt to find loanwords that are semantically different from their original, see if known meaning changes are correctly captured, and show the possibility of using our methodology in language education.
Tasks Word Embeddings
Published 2017-04-01
URL https://www.aclweb.org/anthology/E17-1112/
PDF https://www.aclweb.org/anthology/E17-1112
PWC https://paperswithcode.com/paper/analyzing-semantic-change-in-japanese
Repo
Framework
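
The mapping step described in the abstract is commonly done with an orthogonal Procrustes alignment learned from seed translation pairs; the sketch below illustrates that step and the similarity comparison on random stand-in vectors. The seed dictionary and embeddings are assumptions for illustration, and the paper's exact mapping method may differ.

```python
# Minimal sketch: map Japanese embeddings into the English space with an
# orthogonal Procrustes alignment from seed pairs, then compare a loanword to
# its English source by cosine similarity. Vectors and seeds are toy data.
import numpy as np
from scipy.linalg import orthogonal_procrustes

rng = np.random.default_rng(0)
ja = {w: rng.normal(size=50) for w in ["テーブル", "カップ", "スマート"]}
en = {w: rng.normal(size=50) for w in ["table", "cup", "smart"]}

seed = [("テーブル", "table"), ("カップ", "cup")]             # seed dictionary
W, _ = orthogonal_procrustes(np.array([ja[j] for j, _ in seed]),
                             np.array([en[e] for _, e in seed]))

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Low similarity after mapping suggests the loanword has drifted semantically.
print(cosine(ja["スマート"] @ W, en["smart"]))
```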

Entity Linking via Joint Encoding of Types, Descriptions, and Context

Title Entity Linking via Joint Encoding of Types, Descriptions, and Context
Authors Nitish Gupta, Sameer Singh, Dan Roth
Abstract For accurate entity linking, we need to capture various information aspects of an entity, such as its description in a KB, contexts in which it is mentioned, and structured knowledge. Additionally, a linking system should work on texts from different domains without requiring domain-specific training data or hand-engineered features. In this work we present a neural, modular entity linking system that learns a unified dense representation for each entity using multiple sources of information, such as its description, contexts around its mentions, and its fine-grained types. We show that the resulting entity linking system is effective at combining these sources, and performs competitively, sometimes out-performing current state-of-the-art systems across datasets, without requiring any domain-specific training data or hand-engineered features. We also show that our model can effectively "embed" entities that are new to the KB, and is able to link its mentions accurately.
Tasks Entity Linking
Published 2017-09-01
URL https://www.aclweb.org/anthology/D17-1284/
PDF https://www.aclweb.org/anthology/D17-1284
PWC https://paperswithcode.com/paper/entity-linking-via-joint-encoding-of-types
Repo
Framework
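
At inference time the system described above reduces to scoring a mention's context against each candidate entity's unified representation and linking to the highest-scoring candidate. The sketch below shows only that ranking interface, with averaged random word vectors standing in for the learned neural encoders of descriptions, contexts and types; the candidate names and texts are hypothetical.

```python
# Minimal sketch of the ranking interface: dot-product scores between a mention
# context encoding and candidate entity encodings (description + type names).
# Random averaged word vectors stand in for the learned neural encoders.
import numpy as np

rng = np.random.default_rng(0)
word_vec = {}

def encode(text, dim=50):
    vecs = [word_vec.setdefault(w, rng.normal(size=dim)) for w in text.lower().split()]
    return np.mean(vecs, axis=0)

candidates = {
    "Boston_(city)": "city in Massachusetts United States /location/city",
    "Boston_(band)": "American rock band formed in 1976 /organization/music",
}
mention_ctx = "the band played their first show in"
scores = {e: float(encode(mention_ctx) @ encode(desc)) for e, desc in candidates.items()}
linked = max(scores, key=scores.get)                # entity with the highest score
```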