Paper Group NANR 72
Users and Data: The Two Neglected Children of Bilingual Natural Language Processing Research
Title | Users and Data: The Two Neglected Children of Bilingual Natural Language Processing Research |
Authors | Phillippe Langlais |
Abstract | Despite numerous studies devoted to mining parallel material from bilingual data, we have yet to see the resulting technologies wholeheartedly adopted by professional translators and terminologists alike. I argue that this state of affairs is mainly due to two factors: the emphasis published authors put on models (even though data is as important), and the conspicuous lack of concern for actual end-users. |
Tasks | Machine Translation |
Published | 2017-08-01 |
URL | https://www.aclweb.org/anthology/W17-2501/ |
PWC | https://paperswithcode.com/paper/users-and-data-the-two-neglected-children-of |
Repo | |
Framework | |
MONPA: Multi-objective Named-entity and Part-of-speech Annotator for Chinese using Recurrent Neural Network
Title | MONPA: Multi-objective Named-entity and Part-of-speech Annotator for Chinese using Recurrent Neural Network |
Authors | Yu-Lun Hsieh, Yung-Chun Chang, Yi-Jie Huang, Shu-Hao Yeh, Chun-Hung Chen, Wen-Lian Hsu |
Abstract | Part-of-speech (POS) tagging and named entity recognition (NER) are crucial steps in natural language processing. In addition, the difficulty of word segmentation places an additional burden on those who intend to deal with languages such as Chinese, and pipelined systems often suffer from error propagation. This work proposes an end-to-end model using a character-based recurrent neural network (RNN) to jointly accomplish segmentation, POS tagging and NER of a Chinese sentence. Experiments on previous word segmentation and NER datasets show that a single model with the proposed architecture is comparable to those trained specifically for each task, and outperforms freely available software. Moreover, we provide a web-based interface for the public to easily access this resource. |
Tasks | Named Entity Recognition, Part-Of-Speech Tagging |
Published | 2017-11-01 |
URL | https://www.aclweb.org/anthology/I17-2014/ |
PWC | https://paperswithcode.com/paper/monpa-multi-objective-named-entity-and-part |
Repo | |
Framework | |
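The abstract above describes a single character-based RNN that handles segmentation, POS tagging, and NER jointly. The sketch below is not the authors' model; it is a minimal PyTorch illustration of the general idea: a BiLSTM over characters with a composite label set (segmentation position fused with a POS or NER tag). The vocabulary size, label count, and toy tensors are assumptions.

```python
import torch
import torch.nn as nn

class JointCharTagger(nn.Module):
    """Character-level BiLSTM emitting one composite label per character."""
    def __init__(self, n_chars, n_labels, emb_dim=64, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(n_chars, emb_dim, padding_idx=0)
        self.rnn = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_labels)

    def forward(self, char_ids):                  # char_ids: (batch, seq_len)
        h, _ = self.rnn(self.emb(char_ids))       # (batch, seq_len, 2 * hidden)
        return self.out(h)                        # per-character label scores

# Toy batch: 3 sentences of 10 characters; 20 composite labels such as
# "B-Nb", "I-Nb", "B-PER", ... (segmentation position fused with POS/NER).
model = JointCharTagger(n_chars=5000, n_labels=20)
chars = torch.randint(1, 5000, (3, 10))
gold = torch.randint(0, 20, (3, 10))
loss = nn.CrossEntropyLoss()(model(chars).reshape(-1, 20), gold.reshape(-1))
loss.backward()    # an optimizer step would follow in a real training loop
```

A CRF layer over the per-character scores would be a natural extension for enforcing consistent label transitions.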
Last Words: Sharing Is Caring: The Future of Shared Tasks
Title | Last Words: Sharing Is Caring: The Future of Shared Tasks |
Authors | Malvina Nissim, Lasha Abzianidze, Kilian Evang, Rob van der Goot, Hessel Haagsma, Barbara Plank, Martijn Wieling |
Abstract | |
Tasks | |
Published | 2017-12-01 |
URL | https://www.aclweb.org/anthology/J17-4007/ |
PWC | https://paperswithcode.com/paper/last-words-sharing-is-caring-the-future-of |
Repo | |
Framework | |
UHH Submission to the WMT17 Quality Estimation Shared Task
Title | UHH Submission to the WMT17 Quality Estimation Shared Task |
Authors | Melania Duma, Wolfgang Menzel |
Abstract | |
Tasks | Language Modelling, Machine Translation |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/W17-4762/ |
PWC | https://paperswithcode.com/paper/uhh-submission-to-the-wmt17-quality |
Repo | |
Framework | |
Multiple Nominative Constructions in Japanese: An Incremental Grammar Perspective
Title | Multiple Nominative Constructions in Japanese: An Incremental Grammar Perspective |
Authors | Tohru Seraku |
Abstract | |
Tasks | |
Published | 2017-11-01 |
URL | https://www.aclweb.org/anthology/Y17-1017/ |
PWC | https://paperswithcode.com/paper/multiple-nominative-constructions-in-japanese |
Repo | |
Framework | |
Phrase Table Induction Using In-Domain Monolingual Data for Domain Adaptation in Statistical Machine Translation
Title | Phrase Table Induction Using In-Domain Monolingual Data for Domain Adaptation in Statistical Machine Translation |
Authors | Benjamin Marie, Atsushi Fujita |
Abstract | We present a new framework to induce an in-domain phrase table from in-domain monolingual data that can be used to adapt a general-domain statistical machine translation system to the targeted domain. Our method first compiles sets of phrases in source and target languages separately and generates candidate phrase pairs by taking the Cartesian product of the two phrase sets. It then computes inexpensive features for each candidate phrase pair and filters them using a supervised classifier in order to induce an in-domain phrase table. We experimented on the language pair English–French, both translation directions, in two domains and obtained consistently better results than a strong baseline system that uses an in-domain bilingual lexicon. We also conducted an error analysis that showed the induced phrase tables proposed useful translations, especially for words and phrases unseen in the parallel data used to train the general-domain baseline system. |
Tasks | Domain Adaptation, Machine Translation |
Published | 2017-01-01 |
URL | https://www.aclweb.org/anthology/Q17-1034/ |
PWC | https://paperswithcode.com/paper/phrase-table-induction-using-in-domain |
Repo | |
Framework | |
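As a rough illustration of the induction pipeline in the abstract above (not the paper's actual feature set or classifier), the sketch below enumerates the Cartesian product of two monolingual phrase sets, computes a few cheap features, and filters candidates with a supervised classifier; the toy phrases, seed lexicon, and features are assumptions.

```python
from itertools import product
from sklearn.linear_model import LogisticRegression

def features(src, tgt, seed_lexicon):
    """Cheap candidate-pair features: length ratio, lexicon coverage, length gap."""
    src_toks, tgt_toks = src.split(), tgt.split()
    translated = sum(1 for s in src_toks
                     if any((s, t) in seed_lexicon for t in tgt_toks))
    return [len(src_toks) / max(len(tgt_toks), 1),
            translated / max(len(src_toks), 1),
            abs(len(src) - len(tgt))]

# Tiny toy data standing in for the in-domain monolingual phrase sets.
src_phrases = ["renal failure", "blood pressure"]
tgt_phrases = ["insuffisance rénale", "pression artérielle", "chat noir"]
seed_lexicon = {("renal", "rénale"), ("failure", "insuffisance"),
                ("blood", "artérielle"), ("pressure", "pression")}

candidates = list(product(src_phrases, tgt_phrases))          # Cartesian product
X = [features(s, t, seed_lexicon) for s, t in candidates]

# In practice the classifier is trained on labelled phrase pairs; here it is
# fit on two hand-labelled toy examples just to keep the sketch runnable.
clf = LogisticRegression().fit(
    [features("renal failure", "insuffisance rénale", seed_lexicon),
     features("renal failure", "chat noir", seed_lexicon)], [1, 0])
phrase_table = [pair for pair, keep in zip(candidates, clf.predict(X)) if keep]
print(phrase_table)
```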
A Domain and Language Independent Named Entity Classification Approach Based on Profiles and Local Information
Title | A Domain and Language Independent Named Entity Classification Approach Based on Profiles and Local Information |
Authors | Isabel Moreno, María Teresa Romá-Ferri, Paloma Moreda Pozo |
Abstract | This paper presents a Named Entity Classification system, which employs machine learning. Our methodology employs local entity information and profiles as its feature set. All features are generated in an unsupervised manner. It is tested on two different data sets: (i) the DrugSemantics Spanish corpus (overall F1 = 74.92), where our results are in line with the state of the art without employing external domain-specific resources; and (ii) the English CoNLL2003 dataset (overall F1 = 81.40), where our results are lower than previous work but are reached without external knowledge or complex linguistic analysis. Last, using the same configuration for both corpora, the difference in overall F1 is only 6.48 points (DrugSemantics = 74.92 versus CoNLL2003 = 81.40). This result supports our hypothesis that our approach is language and domain independent and does not require any external knowledge or complex linguistic analysis. |
Tasks | Named Entity Recognition, Question Answering, Text Generation, Text Summarization |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/R17-1067/ |
DOI | https://doi.org/10.26615/978-954-452-049-6_067 |
PWC | https://paperswithcode.com/paper/a-domain-and-language-independent-named |
Repo | |
Framework | |
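To make the two feature families in the abstract concrete, here is a small, hypothetical sketch combining local information about a mention (word shape and immediate context) with an unsupervised "profile" of words co-occurring with the mention; the specific features and toy corpus are illustrative assumptions, not the paper's.

```python
from collections import Counter
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

def local_features(tokens, i):
    """Local information: word shape plus the immediate left/right context."""
    w = tokens[i]
    return {"shape=" + "".join("X" if ch.isupper() else "x" for ch in w): 1,
            "prev=" + (tokens[i - 1].lower() if i > 0 else "<s>"): 1,
            "next=" + (tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>"): 1}

def profile_features(word, corpus, top_k=3):
    """Unsupervised profile: words that co-occur with the mention in the corpus."""
    ctx = Counter(t.lower() for sent in corpus if word in sent
                  for t in sent if t != word)
    return {"prof=" + w: 1 for w, _ in ctx.most_common(top_k)}

corpus = [["Aspirin", "reduces", "fever"], ["Take", "Aspirin", "daily"],
          ["Madrid", "hosted", "the", "summit"]]
examples = [(corpus[0], 0, "DRUG"), (corpus[2], 0, "LOC")]   # (sentence, index, class)

X = [dict(local_features(s, i), **profile_features(s[i], corpus)) for s, i, _ in examples]
y = [label for _, _, label in examples]
vec = DictVectorizer()
clf = LogisticRegression().fit(vec.fit_transform(X), y)

test = ["Take", "Madrid", "for", "example"]
x = dict(local_features(test, 1), **profile_features("Madrid", corpus))
print(clf.predict(vec.transform([x])))   # classify the new mention
```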
Generative Local Metric Learning for Kernel Regression
Title | Generative Local Metric Learning for Kernel Regression |
Authors | Yung-Kyun Noh, Masashi Sugiyama, Kee-Eung Kim, Frank Park, Daniel D. Lee |
Abstract | This paper shows how metric learning can be used with Nadaraya-Watson (NW) kernel regression. Compared with standard approaches, such as bandwidth selection, we show how metric learning can significantly reduce the mean square error (MSE) in kernel regression, particularly for high-dimensional data. We propose a method for efficiently learning a good metric function based upon analyzing the performance of the NW estimator for Gaussian-distributed data. A key feature of our approach is that the NW estimator with a learned metric uses information from both the global and local structure of the training data. Theoretical and empirical results confirm that the learned metric can considerably reduce the bias and MSE for kernel regression even when the data are not confined to Gaussian. |
Tasks | Metric Learning |
Published | 2017-12-01 |
URL | http://papers.nips.cc/paper/6839-generative-local-metric-learning-for-kernel-regression |
PDF | http://papers.nips.cc/paper/6839-generative-local-metric-learning-for-kernel-regression.pdf |
PWC | https://paperswithcode.com/paper/generative-local-metric-learning-for-kernel |
Repo | |
Framework | |
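The core object in the abstract is the Nadaraya-Watson estimator with a learned metric. The NumPy sketch below shows NW regression with a Mahalanobis-style Gaussian kernel; here the metric matrix is set by hand for illustration, whereas the paper derives it from a generative Gaussian analysis of the data.

```python
import numpy as np

def nw_predict(X_train, y_train, X_test, A):
    """Nadaraya-Watson estimate with kernel exp(-0.5 * (x - x')^T A (x - x'))."""
    diffs = X_test[:, None, :] - X_train[None, :, :]          # (m, n, d)
    d2 = np.einsum("mnd,de,mne->mn", diffs, A, diffs)         # squared metric distance
    w = np.exp(-0.5 * d2)                                     # kernel weights
    return (w @ y_train) / w.sum(axis=1)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)   # only dimension 0 is relevant
X_test = rng.normal(size=(5, 2))

iso = nw_predict(X, y, X_test, np.eye(2))                  # single isotropic bandwidth
learned = nw_predict(X, y, X_test, np.diag([4.0, 0.25]))   # metric stretching dim 0
print(iso, learned, np.sin(X_test[:, 0]), sep="\n")
```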
Similarity Based Genre Identification for POS Tagging Experts & Dependency Parsing
Title | Similarity Based Genre Identification for POS Tagging Experts & Dependency Parsing |
Authors | Atreyee Mukherjee, Sandra Kübler |
Abstract | POS tagging and dependency parsing achieve good results for homogeneous datasets. However, these tasks are much more difficult on heterogeneous datasets. In (Mukherjee et al. 2016, 2017), we address this issue by creating genre experts for both POS tagging and parsing. We use topic modeling to automatically separate training and test data into genres and to create annotation experts per genre by training separate models for each topic. However, this approach assumes that topic modeling is performed jointly on training and test sentences each time a new test sentence is encountered. We extend this work by assigning new test sentences to their genre expert by using similarity metrics. We investigate three different types of methods: 1) based on words highly associated with a genre by the topic modeler, 2) using a k-nearest neighbor classification approach, and 3) using perplexity to determine the closest topic. The results show that the choice of similarity metric has an effect on results and that we can reach comparable accuracies to the joint topic modeling in POS tagging and dependency parsing, thus providing a viable and efficient approach to POS tagging and parsing a sentence by its genre expert. |
Tasks | Dependency Parsing, Domain Adaptation |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/R17-1068/ |
DOI | https://doi.org/10.26615/978-954-452-049-6_068 |
PWC | https://paperswithcode.com/paper/similarity-based-genre-identification-for-pos |
Repo | |
Framework | |
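As an illustration of the routing step described above (assigning a new test sentence to a genre expert without re-running topic modeling jointly), the sketch below uses scikit-learn LDA topics and a k-nearest-neighbour assignment over topic proportions; the toy corpus, topic count, and the experts themselves are assumptions.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

train_sents = ["the patient received the drug twice daily",
               "the striker scored a late goal in the final",
               "dosage was reduced after adverse reactions",
               "the team won the league by five points"]

vec = CountVectorizer()
X = vec.fit_transform(train_sents)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
train_topics = lda.transform(X)               # per-sentence genre proportions
genre_of_train = train_topics.argmax(axis=1)  # training genre for each sentence

def route(sentence, k=1):
    """Pick a genre expert for a new sentence via k-NN over topic proportions."""
    z = lda.transform(vec.transform([sentence]))[0]
    nearest = np.argsort(np.linalg.norm(train_topics - z, axis=1))[:k]
    return int(np.bincount(genre_of_train[nearest]).argmax())

print(route("the midfielder was injured during the match"))
# pos_taggers[route(s)] / parsers[route(s)] would be the genre expert to apply.
```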
Lookahead Bayesian Optimization with Inequality Constraints
Title | Lookahead Bayesian Optimization with Inequality Constraints |
Authors | Remi Lam, Karen Willcox |
Abstract | We consider the task of optimizing an objective function subject to inequality constraints when both the objective and the constraints are expensive to evaluate. Bayesian optimization (BO) is a popular way to tackle optimization problems with expensive objective function evaluations, but has mostly been applied to unconstrained problems. Several BO approaches have been proposed to address expensive constraints but are limited to greedy strategies maximizing immediate reward. To address this limitation, we propose a lookahead approach that selects the next evaluation in order to maximize the long-term feasible reduction of the objective function. We present numerical experiments demonstrating the performance improvements of such a lookahead approach compared to several greedy BO algorithms, including constrained expected improvement (EIC) and predictive entropy search with constraint (PESC). |
Tasks | |
Published | 2017-12-01 |
URL | http://papers.nips.cc/paper/6785-lookahead-bayesian-optimization-with-inequality-constraints |
PDF | http://papers.nips.cc/paper/6785-lookahead-bayesian-optimization-with-inequality-constraints.pdf |
PWC | https://paperswithcode.com/paper/lookahead-bayesian-optimization-with |
Repo | |
Framework | |
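The paper's contribution is a lookahead acquisition policy; that multi-step computation is not reproduced here. As a reference point, the sketch below implements the greedy constrained expected improvement (EIC) baseline mentioned in the abstract, with independent Gaussian processes for the objective and the constraint; the toy problem and GP settings are assumptions.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def f(x):                      # expensive objective (toy stand-in)
    return np.sin(3 * x) + x ** 2

def c(x):                      # constraint, feasible when c(x) <= 0 (i.e. x >= 0.5)
    return 0.5 - x

rng = np.random.default_rng(1)
X = rng.uniform(0.0, 2.0, size=(6, 1))                       # initial evaluations
gp_f = GaussianProcessRegressor(kernel=RBF(0.5), alpha=1e-6).fit(X, f(X).ravel())
gp_c = GaussianProcessRegressor(kernel=RBF(0.5), alpha=1e-6).fit(X, c(X).ravel())

feasible = c(X).ravel() <= 0
best = f(X).ravel()[feasible].min() if feasible.any() else f(X).ravel().min()

def eic(x_grid):
    """Greedy acquisition: expected improvement times probability of feasibility."""
    mu, sd = gp_f.predict(x_grid, return_std=True)
    mu_c, sd_c = gp_c.predict(x_grid, return_std=True)
    z = (best - mu) / np.maximum(sd, 1e-9)
    ei = (best - mu) * norm.cdf(z) + sd * norm.pdf(z)
    return ei * norm.cdf(-mu_c / np.maximum(sd_c, 1e-9))

grid = np.linspace(0.0, 2.0, 200).reshape(-1, 1)
print("next evaluation at x =", grid[np.argmax(eic(grid))][0])
```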
Unsupervised Dialogue Act Induction using Gaussian Mixtures
Title | Unsupervised Dialogue Act Induction using Gaussian Mixtures |
Authors | Tomáš Brychcín, Pavel Král |
Abstract | This paper introduces a new unsupervised approach for dialogue act induction. Given the sequence of dialogue utterances, the task is to assign them the labels representing their function in the dialogue. Utterances are represented as real-valued vectors encoding their meaning. We model the dialogue as a hidden Markov model with emission probabilities estimated by Gaussian mixtures. We use Gibbs sampling for posterior inference. We present the results on the standard Switchboard-DAMSL corpus. Our algorithm achieves promising results compared with strong supervised baselines and outperforms other unsupervised algorithms. |
Tasks | Topic Models |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/E17-2078/ |
PWC | https://paperswithcode.com/paper/unsupervised-dialogue-act-induction-using |
Repo | |
Framework | |
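The generative story above (dialogue acts as HMM states, utterance vectors as emissions) can be approximated with off-the-shelf tools. The sketch below uses hmmlearn's GaussianHMM, i.e. single-Gaussian emissions fit by EM, whereas the paper uses Gaussian mixture emissions and Gibbs sampling; the random "utterance embeddings" are placeholders for real sentence vectors.

```python
import numpy as np
from hmmlearn import hmm

rng = np.random.default_rng(0)
# Two toy "dialogues" of 60 and 40 utterances; each utterance is an 8-dim
# vector standing in for a real utterance embedding.
dialogues = [rng.normal(size=(60, 8)), rng.normal(size=(40, 8))]
X = np.vstack(dialogues)
lengths = [len(d) for d in dialogues]

# 5 hidden states ~ 5 induced dialogue-act labels.
model = hmm.GaussianHMM(n_components=5, covariance_type="diag",
                        n_iter=50, random_state=0)
model.fit(X, lengths)
induced_acts = model.predict(X, lengths)   # one unsupervised label per utterance
print(induced_acts[:10])
```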
A deep-learning based native-language classification by using a latent semantic analysis for the NLI Shared Task 2017
Title | A deep-learning based native-language classification by using a latent semantic analysis for the NLI Shared Task 2017 |
Authors | Yoo Rhee Oh, Hyung-Bae Jeon, Hwa Jeon Song, Yun-Kyung Lee, Jeon-Gue Park, Yun-Keun Lee |
Abstract | This paper proposes a deep-learning based native-language identification (NLI) system using latent semantic analysis (LSA), submitted as a participant (ETRI-SLP) to the NLI Shared Task 2017, which aims to detect the native language of an essay or spoken response in a standardized assessment of English proficiency for academic purposes. To this end, we use six unit forms of the text data: character 4/5/6-grams and word 1/2/3-grams. For each unit form, we convert the text into a count-based vector, extract a 2000-rank LSA feature, and perform linear discriminant analysis (LDA) based dimension reduction. From the count-based vector or the LSA-LDA feature, we also obtain the output prediction values of a support vector machine (SVM) based classifier, the output prediction values of a deep neural network (DNN) based classifier, and the bottleneck values of a DNN based classifier. To incorporate the various text-based features and a speech-based i-vector feature, we design two DNN based ensemble classifiers for late fusion and early fusion, respectively. In the NLI experiments, F1 (macro) scores of 0.8601, 0.8664, and 0.9220 are obtained for the essay track, the speech track, and the fusion track, respectively. The proposed method is comparable to the top-ranked teams on the speech and fusion tracks, although it performs slightly lower on the essay track. |
Tasks | Dimensionality Reduction, Language Identification, Native Language Identification, Speech Recognition |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/W17-5047/ |
PWC | https://paperswithcode.com/paper/a-deep-learning-based-native-language |
Repo | |
Framework | |
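The sketch below reproduces just one branch of the described system as a scikit-learn pipeline: character n-gram counts, a truncated SVD standing in for the 2000-rank LSA feature, LDA-based dimension reduction, and a linear SVM. The DNN classifiers, the six-unit-form ensemble, and the i-vector fusion are omitted; the toy essays and tiny dimensionalities are assumptions.

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.svm import LinearSVC

essays = ["I am agree with the statement because travelling helps people",
          "In my opinion the university students should work part time",
          "This essay discuss about the advantages of living abroad",
          "Firstly I want to say that the topic is very interesting"]
native_lang = ["KOR", "DEU", "KOR", "DEU"]      # toy L1 labels

branch = make_pipeline(
    CountVectorizer(analyzer="char_wb", ngram_range=(4, 4)),  # character 4-grams
    TruncatedSVD(n_components=2),      # stand-in for the 2000-rank LSA feature
    LinearDiscriminantAnalysis(),      # LDA-based dimension reduction
    LinearSVC(),                       # the SVM branch of the ensemble
)
branch.fit(essays, native_lang)
print(branch.predict(["I am agree that the students should travel more often"]))
```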
Bulgarian-English and English-Bulgarian Machine Translation: System Design and Evaluation
Title | Bulgarian-English and English-Bulgarian Machine Translation: System Design and Evaluation |
Authors | Petya Osenova, Kiril Simov |
Abstract | The paper presents a deep factored machine translation (MT) system between English and Bulgarian languages in both directions. The MT system is hybrid. It consists of three main steps: (1) the source-language text is linguistically annotated, (2) it is translated to the target language with the Moses system, and (3) translation is post-processed with the help of the transferred linguistic annotation from the source text. Besides automatic evaluation we performed manual evaluation over a domain test suite of sentences demonstrating certain phenomena like imperatives, questions, etc. |
Tasks | Machine Translation |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/R17-1073/ |
DOI | https://doi.org/10.26615/978-954-452-049-6_073 |
PWC | https://paperswithcode.com/paper/bulgarian-english-and-english-bulgarian |
Repo | |
Framework | |
Analyzing Semantic Change in Japanese Loanwords
Title | Analyzing Semantic Change in Japanese Loanwords |
Authors | Hiroya Takamura, Ryo Nagata, Yoshifumi Kawasaki |
Abstract | We analyze semantic changes in loanwords from English that are used in Japanese (Japanese loanwords). Specifically, we create word embeddings of English and Japanese and map the Japanese embeddings into the English space so that we can calculate the similarity of each Japanese word and each English word. We then attempt to find loanwords that are semantically different from their original, see if known meaning changes are correctly captured, and show the possibility of using our methodology in language education. |
Tasks | Word Embeddings |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/E17-1112/ |
PWC | https://paperswithcode.com/paper/analyzing-semantic-change-in-japanese |
Repo | |
Framework | |
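A common way to realize the mapping step described above is orthogonal Procrustes: learn a rotation from the Japanese embedding space into the English one from a seed dictionary, then compare each loanword with its English source word by cosine similarity. The sketch below does exactly that, with random placeholder vectors standing in for real pretrained embeddings; the seed pairs and dimensions are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 50
ja = {w: rng.normal(size=dim) for w in ["テーブル", "マンション", "コップ"]}
en = {w: rng.normal(size=dim) for w in ["table", "mansion", "cup"]}
seed_pairs = [("テーブル", "table"), ("コップ", "cup")]   # assumed seed dictionary

X = np.vstack([ja[j] for j, _ in seed_pairs])     # source-space vectors
Y = np.vstack([en[e] for _, e in seed_pairs])     # target-space vectors
U, _, Vt = np.linalg.svd(X.T @ Y)
W = U @ Vt                                        # orthogonal Procrustes map

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Low similarity after mapping suggests the loanword has drifted semantically
# (e.g. Japanese "マンション" ~ condominium vs. English "mansion").
print(cosine(ja["マンション"] @ W, en["mansion"]))
```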
Entity Linking via Joint Encoding of Types, Descriptions, and Context
Title | Entity Linking via Joint Encoding of Types, Descriptions, and Context |
Authors | Nitish Gupta, Sameer Singh, Dan Roth |
Abstract | For accurate entity linking, we need to capture various information aspects of an entity, such as its description in a KB, contexts in which it is mentioned, and structured knowledge. Additionally, a linking system should work on texts from different domains without requiring domain-specific training data or hand-engineered features. In this work we present a neural, modular entity linking system that learns a unified dense representation for each entity using multiple sources of information, such as its description, contexts around its mentions, and its fine-grained types. We show that the resulting entity linking system is effective at combining these sources, and performs competitively, sometimes out-performing current state-of-the-art systems across datasets, without requiring any domain-specific training data or hand-engineered features. We also show that our model can effectively "embed" entities that are new to the KB, and is able to link its mentions accurately. |
Tasks | Entity Linking |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/D17-1284/ |
PWC | https://paperswithcode.com/paper/entity-linking-via-joint-encoding-of-types |
Repo | |
Framework | |
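To make the scoring idea in the abstract concrete, the sketch below builds a dense entity representation by combining (here, simply summing) encodings of an entity's description and types, and ranks candidates against an encoded mention context by dot product. Averaged random word vectors stand in for the paper's learned neural encoders; the candidate entities and texts are toy assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 32
vocab = {}

def vec(text):
    """Placeholder encoder: average of randomly initialised word vectors."""
    words = text.lower().split()
    for w in words:
        vocab.setdefault(w, rng.normal(size=dim))
    return np.mean([vocab[w] for w in words], axis=0)

# Candidate entities: description encoding + type encoding summed into one vector.
entities = {
    "Jaguar (animal)": vec("large cat native to the americas") + vec("animal felidae"),
    "Jaguar (car)": vec("british manufacturer of luxury cars") + vec("company automobile"),
}
mention_context = vec("he parked his jaguar outside the stadium")

scores = {name: float(mention_context @ emb) for name, emb in entities.items()}
print(max(scores, key=scores.get))   # highest-scoring candidate for the mention
```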