Paper Group NANR 114
Notion of Semantics in Computer Science - A Systematic Literature Review
Title | Notion of Semantics in Computer Science - A Systematic Literature Review |
Authors | Sai Prasad Vrj Gollapudi, Venkatesh Choppella |
Abstract | |
Tasks | |
Published | 2017-12-01 |
URL | https://www.aclweb.org/anthology/W17-7562/ |
https://www.aclweb.org/anthology/W17-7562 | |
PWC | https://paperswithcode.com/paper/notion-of-semantics-in-computer-science-a |
Repo | |
Framework | |
Universal Dependencies-based syntactic features in detecting human translation varieties
Title | Universal Dependencies-based syntactic features in detecting human translation varieties |
Authors | Maria Kunilovskaya, Andrey Kutuzov |
Abstract | |
Tasks | Machine Translation, Text Classification |
Published | 2017-01-01 |
URL | https://www.aclweb.org/anthology/W17-7606/ |
https://www.aclweb.org/anthology/W17-7606 | |
PWC | https://paperswithcode.com/paper/universal-dependencies-based-syntactic |
Repo | |
Framework | |
Extensions to the GrETEL Treebank Query Application
Title | Extensions to the GrETEL Treebank Query Application |
Authors | Jan Odijk, Martijn van der Klis, Sheean Spoel |
Abstract | |
Tasks | Language Acquisition |
Published | 2017-01-01 |
URL | https://www.aclweb.org/anthology/W17-7608/ |
https://www.aclweb.org/anthology/W17-7608 | |
PWC | https://paperswithcode.com/paper/extensions-to-the-gretel-treebank-query |
Repo | |
Framework | |
Adapting a State-of-the-Art Tagger for South Slavic Languages to Non-Standard Text
Title | Adapting a State-of-the-Art Tagger for South Slavic Languages to Non-Standard Text |
Authors | Nikola Ljubešić, Tomaž Erjavec, Darja Fišer |
Abstract | In this paper we present the adaptations of a state-of-the-art tagger for South Slavic languages to non-standard texts on the example of the Slovene language. We investigate the impact of introducing in-domain training data as well as additional supervision through external resources or tools like word clusters and word normalization. We remove more than half of the error of the standard tagger when applied to non-standard texts by training it on a combination of standard and non-standard training data, while enriching the data representation with external resources removes an additional 11 percent of the error. The final configuration achieves tagging accuracy of 87.41% on the full morphosyntactic description, which is, nevertheless, still quite far from the accuracy of 94.27% achieved on standard text. |
Tasks | Domain Adaptation, Lemmatization, Machine Translation, Part-Of-Speech Tagging |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/W17-1410/ |
https://www.aclweb.org/anthology/W17-1410 | |
PWC | https://paperswithcode.com/paper/adapting-a-state-of-the-art-tagger-for-south |
Repo | |
Framework | |
A semiautomatic lemmatisation procedure for treebanks. Old English strong and weak verbs
Title | A semiautomatic lemmatisation procedure for treebanks. Old English strong and weak verbs |
Authors | Marta Tío Sáenz, Darío Metola Rodríguez |
Abstract | |
Tasks | Morphological Tagging |
Published | 2017-01-01 |
URL | https://www.aclweb.org/anthology/W17-7613/ |
https://www.aclweb.org/anthology/W17-7613 | |
PWC | https://paperswithcode.com/paper/a-semiautomatic-lemmatisation-procedure-for |
Repo | |
Framework | |
Author Index
Title | Author Index |
Authors | |
Abstract | |
Tasks | |
Published | 2017-01-01 |
URL | https://www.aclweb.org/anthology/W17-7625/ |
https://www.aclweb.org/anthology/W17-7625 | |
PWC | https://paperswithcode.com/paper/author-index |
Repo | |
Framework | |
Multi-word Entity Classification in a Highly Multilingual Environment
Title | Multi-word Entity Classification in a Highly Multilingual Environment |
Authors | Sophie Chesney, Guillaume Jacquet, Ralf Steinberger, Jakub Piskorski |
Abstract | This paper describes an approach for the classification of millions of existing multi-word entities (MWEntities), such as organisation or event names, into thirteen category types, based only on the tokens they contain. In order to classify our very large in-house collection of multilingual MWEntities into an application-oriented set of entity categories, we trained and tested distantly-supervised classifiers in 43 languages based on MWEntities extracted from BabelNet. The best-performing classifier was the multi-class SVM using a TF.IDF-weighted data representation. Interestingly, one unique classifier trained on a mix of all languages consistently performed better than classifiers trained for individual languages, reaching an averaged F1-value of 88.8%. In this paper, we present the training and test data, including a human evaluation of its accuracy, describe the methods used to train the classifiers, and discuss the results. |
Tasks | Named Entity Recognition |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/W17-1702/ |
https://www.aclweb.org/anthology/W17-1702 | |
PWC | https://paperswithcode.com/paper/multi-word-entity-classification-in-a-highly |
Repo | |
Framework | |
The PARSEME Shared Task on Automatic Identification of Verbal Multiword Expressions
Title | The PARSEME Shared Task on Automatic Identification of Verbal Multiword Expressions |
Authors | Agata Savary, Carlos Ramisch, Silvio Cordeiro, Federico Sangati, Veronika Vincze, Behrang QasemiZadeh, Marie Candito, Fabienne Cap, Voula Giouli, Ivelina Stoyanova, Antoine Doucet |
Abstract | Multiword expressions (MWEs) are known as a "pain in the neck" for NLP due to their idiosyncratic behaviour. While some categories of MWEs have been addressed by many studies, verbal MWEs (VMWEs), such as to take a decision, to break one's heart or to turn off, have been rarely modelled. This is notably due to their syntactic variability, which hinders treating them as "words with spaces". We describe an initiative meant to bring about substantial progress in understanding, modelling and processing VMWEs. It is a joint effort, carried out within a European research network, to elaborate universal terminologies and annotation guidelines for 18 languages. Its main outcome is a multilingual 5-million-word annotated corpus which underlies a shared task on automatic identification of VMWEs. This paper presents the corpus annotation methodology and outcome, the shared task organisation and the results of the participating systems. |
Tasks | |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/W17-1704/ |
https://www.aclweb.org/anthology/W17-1704 | |
PWC | https://paperswithcode.com/paper/the-parseme-shared-task-on-automatic |
Repo | |
Framework | |
Neural Networks for Multi-Word Expression Detection
Title | Neural Networks for Multi-Word Expression Detection |
Authors | Natalia Klyueva, Antoine Doucet, Milan Straka |
Abstract | In this paper we describe the MUMULS system that participated in the 2017 shared task on automatic identification of verbal multiword expressions (VMWEs). The MUMULS system was implemented using a supervised approach based on recurrent neural networks using the open source library TensorFlow. The model was trained on a data set containing annotated VMWEs as well as morphological and syntactic information. The MUMULS system performed the identification of VMWEs in 15 languages; it was one of the few systems that could categorize VMWE types in nearly all languages. |
Tasks | Machine Translation |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/W17-1707/ |
https://www.aclweb.org/anthology/W17-1707 | |
PWC | https://paperswithcode.com/paper/neural-networks-for-multi-word-expression |
Repo | |
Framework | |
Crowd-Sourced Iterative Annotation for Narrative Summarization Corpora
Title | Crowd-Sourced Iterative Annotation for Narrative Summarization Corpora |
Authors | Jessica Ouyang, Serina Chang, Kathy McKeown |
Abstract | We present an iterative annotation process for producing aligned, parallel corpora of abstractive and extractive summaries for narrative. Our approach uses a combination of trained annotators and crowd-sourcing, allowing us to elicit human-generated summaries and alignments quickly and at low cost. We use crowd-sourcing to annotate aligned phrases with the text-to-text generation techniques needed to transform each phrase into the other. We apply this process to a corpus of 476 personal narratives, which we make available on the Web. |
Tasks | Abstractive Text Summarization, Sentence Compression, Text Generation, Text Summarization |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/E17-2008/ |
https://www.aclweb.org/anthology/E17-2008 | |
PWC | https://paperswithcode.com/paper/crowd-sourced-iterative-annotation-for |
Repo | |
Framework | |
Collapsed variational Bayes for Markov jump processes
Title | Collapsed variational Bayes for Markov jump processes |
Authors | Boqian Zhang, Jiangwei Pan, Vinayak A. Rao |
Abstract | Markov jump processes are continuous-time stochastic processes widely used in statistical applications in the natural sciences, and more recently in machine learning. Inference for these models typically proceeds via Markov chain Monte Carlo, and can suffer from various computational challenges. In this work, we propose a novel collapsed variational inference algorithm to address this issue. Our work leverages ideas from discrete-time Markov chains, and exploits a connection between these two through an idea called uniformization. Our algorithm proceeds by marginalizing out the parameters of the Markov jump process, and then approximating the distribution over the trajectory with a factored distribution over segments of a piecewise-constant function. Unlike MCMC schemes that marginalize out transition times of a piecewise-constant process, our scheme optimizes the discretization of time, resulting in significant computational savings. We apply our ideas to synthetic data as well as a dataset of check-in recordings, where we demonstrate superior performance over state-of-the-art MCMC methods. |
Tasks | |
Published | 2017-12-01 |
URL | http://papers.nips.cc/paper/6965-collapsed-variational-bayes-for-markov-jump-processes |
http://papers.nips.cc/paper/6965-collapsed-variational-bayes-for-markov-jump-processes.pdf | |
PWC | https://paperswithcode.com/paper/collapsed-variational-bayes-for-markov-jump |
Repo | |
Framework | |
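
The uniformization idea named in the abstract above can be illustrated with a minimal sketch. It only shows the textbook construction that turns a continuous-time rate matrix into a discrete-time transition matrix; it is not the paper's collapsed variational algorithm. The function name `uniformize`, the example rate matrix `Q`, and the default choice of the rate `omega` are assumptions made for illustration.

```python
def uniformize(rate_matrix, omega=None):
    """Return (B, omega), where B = I + Q / omega is a stochastic matrix.

    rate_matrix: square list of lists Q with non-negative off-diagonal
    entries and rows summing to zero (hypothetical input format).
    omega: uniformization rate; must be at least the largest exit rate.
    """
    n = len(rate_matrix)
    # The exit rate of state i is -Q[i][i].
    max_exit = max(-rate_matrix[i][i] for i in range(n))
    if omega is None:
        omega = max_exit  # smallest valid choice (an assumption, not a requirement)
    if omega <= 0 or omega < max_exit:
        raise ValueError("omega must be positive and >= the largest exit rate")
    transition = [
        [(1.0 if i == j else 0.0) + rate_matrix[i][j] / omega for j in range(n)]
        for i in range(n)
    ]
    return transition, omega


# Hypothetical two-state process: leaves state 0 at rate 2, state 1 at rate 1.
Q = [[-2.0, 2.0], [1.0, -1.0]]
B, omega = uniformize(Q)
```

Under this construction, the jump process behaves like a discrete-time chain with transition matrix `B` whose steps occur at the events of a Poisson process with rate `omega`; self-transitions in `B` correspond to candidate times at which the state does not change.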
MASSAlign: Alignment and Annotation of Comparable Documents
Title | MASSAlign: Alignment and Annotation of Comparable Documents |
Authors | Gustavo Paetzold, Fernando Alva-Manchego, Lucia Specia |
Abstract | We introduce MASSAlign: a Python library for the alignment and annotation of monolingual comparable documents. MASSAlign offers easy-to-use access to state of the art algorithms for paragraph and sentence-level alignment, as well as novel algorithms for word-level annotation of transformation operations between aligned sentences. In addition, MASSAlign provides a visualization module to display and analyze the alignments and annotations performed. |
Tasks | |
Published | 2017-11-01 |
URL | https://www.aclweb.org/anthology/I17-3001/ |
https://www.aclweb.org/anthology/I17-3001 | |
PWC | https://paperswithcode.com/paper/massalign-alignment-and-annotation-of |
Repo | |
Framework | |
Discovering Light Verb Constructions and their Translations from Parallel Corpora without Word Alignment
Title | Discovering Light Verb Constructions and their Translations from Parallel Corpora without Word Alignment |
Authors | Natalie Vargas, Carlos Ramisch, Helena Caseli |
Abstract | We propose a method for joint unsupervised discovery of multiword expressions (MWEs) and their translations from parallel corpora. First, we apply independent monolingual MWE extraction in source and target languages simultaneously. Then, we calculate translation probability, association score and distributional similarity of co-occurring pairs. Finally, we rank all translations of a given MWE using a linear combination of these features. Preliminary experiments on light verb constructions show promising results. |
Tasks | Machine Translation, Word Alignment |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/W17-1711/ |
https://www.aclweb.org/anthology/W17-1711 | |
PWC | https://paperswithcode.com/paper/discovering-light-verb-constructions-and |
Repo | |
Framework | |
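
The final ranking step described in the abstract above (a linear combination of translation probability, association score and distributional similarity) might look roughly like the following sketch. The feature names, weights, candidate strings and feature values are all hypothetical; the abstract does not state how the weights are chosen or how each feature is computed.

```python
def rank_translations(candidates, weights):
    """Sort candidate translations by a weighted sum of their features.

    candidates: list of (translation, features) pairs, where features is a
    dict holding one value per feature name (hypothetical format).
    weights: dict mapping each feature name to its weight.
    """
    def score(features):
        return sum(weights[name] * features[name] for name in weights)

    # Highest combined score first.
    return sorted(candidates, key=lambda pair: score(pair[1]), reverse=True)


# Illustrative weights and feature values only; none come from the paper.
weights = {"translation_prob": 0.5, "association": 0.3, "similarity": 0.2}
candidates = [
    ("candidate B", {"translation_prob": 0.3, "association": 0.4, "similarity": 0.2}),
    ("candidate A", {"translation_prob": 0.6, "association": 0.7, "similarity": 0.8}),
]
ranked = rank_translations(candidates, weights)
```

Here `ranked` lists the candidates for one multiword expression from best to worst under the assumed weights.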
Phonological Soundscapes in Medieval Poetry
Title | Phonological Soundscapes in Medieval Poetry |
Authors | Christopher Hench |
Abstract | The oral component of medieval poetry was integral to its performance and reception. Yet many believe that the medieval voice has been forever lost, and any attempts at rediscovering it are doomed to failure due to scribal practices, manuscript mouvance, and linguistic normalization in editing practices. This paper offers a method to abstract from this noise and better understand relative differences in phonological soundscapes by considering syllable qualities. The presented syllabification method and soundscape analysis offer themselves as cross-disciplinary tools for low-resource languages. As a case study, we examine medieval German lyric and argue that the heavily debated lyrical 'I' follows a unique trajectory through soundscapes, shedding light on the performance and practice of these poets. |
Tasks | |
Published | 2017-08-01 |
URL | https://www.aclweb.org/anthology/W17-2207/ |
https://www.aclweb.org/anthology/W17-2207 | |
PWC | https://paperswithcode.com/paper/phonological-soundscapes-in-medieval-poetry |
Repo | |
Framework | |
Identification of Ambiguous Multiword Expressions Using Sequence Models and Lexical Resources
Title | Identification of Ambiguous Multiword Expressions Using Sequence Models and Lexical Resources |
Authors | Manon Scholivet, Carlos Ramisch |
Abstract | We present a simple and efficient tagger capable of identifying highly ambiguous multiword expressions (MWEs) in French texts. It is based on conditional random fields (CRF), using local context information as features. We show that this approach can obtain results that, in some cases, approach more sophisticated parser-based MWE identification methods without requiring syntactic trees from a treebank. Moreover, we study how well the CRF can take into account external information coming from a lexicon. |
Tasks | |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/W17-1723/ |
https://www.aclweb.org/anthology/W17-1723 | |
PWC | https://paperswithcode.com/paper/identification-of-ambiguous-multiword |
Repo | |
Framework | |