July 26, 2019


Paper Group NANR 114


Notion of Semantics in Computer Science - A Systematic Literature Review


Title	Notion of Semantics in Computer Science - A Systematic Literature Review
Authors	Sai Prasad Vrj Gollapudi, Venkatesh Choppella
Abstract
Tasks
Published	2017-12-01
URL	https://www.aclweb.org/anthology/W17-7562/
PDF	https://www.aclweb.org/anthology/W17-7562
PWC	https://paperswithcode.com/paper/notion-of-semantics-in-computer-science-a
Repo
Framework

Universal Dependencies-based syntactic features in detecting human translation varieties


Title	Universal Dependencies-based syntactic features in detecting human translation varieties
Authors	Maria Kunilovskaya, Andrey Kutuzov
Abstract
Tasks	Machine Translation, Text Classification
Published	2017-01-01
URL	https://www.aclweb.org/anthology/W17-7606/
PDF	https://www.aclweb.org/anthology/W17-7606
PWC	https://paperswithcode.com/paper/universal-dependencies-based-syntactic
Repo
Framework

Extensions to the GrETEL Treebank Query Application


Title	Extensions to the GrETEL Treebank Query Application
Authors	Jan Odijk, Martijn van der Klis, Sheean Spoel
Abstract
Tasks	Language Acquisition
Published	2017-01-01
URL	https://www.aclweb.org/anthology/W17-7608/
PDF	https://www.aclweb.org/anthology/W17-7608
PWC	https://paperswithcode.com/paper/extensions-to-the-gretel-treebank-query
Repo
Framework

Adapting a State-of-the-Art Tagger for South Slavic Languages to Non-Standard Text


Title	Adapting a State-of-the-Art Tagger for South Slavic Languages to Non-Standard Text
Authors	Nikola Ljubešić, Tomaž Erjavec, Darja Fišer
Abstract	In this paper we present the adaptations of a state-of-the-art tagger for South Slavic languages to non-standard texts on the example of the Slovene language. We investigate the impact of introducing in-domain training data as well as additional supervision through external resources or tools like word clusters and word normalization. We remove more than half of the error of the standard tagger when applied to non-standard texts by training it on a combination of standard and non-standard training data, while enriching the data representation with external resources removes additional 11 percent of the error. The final configuration achieves tagging accuracy of 87.41% on the full morphosyntactic description, which is, nevertheless, still quite far from the accuracy of 94.27% achieved on standard text.
Tasks	Domain Adaptation, Lemmatization, Machine Translation, Part-Of-Speech Tagging
Published	2017-04-01
URL	https://www.aclweb.org/anthology/W17-1410/
PDF	https://www.aclweb.org/anthology/W17-1410
PWC	https://paperswithcode.com/paper/adapting-a-state-of-the-art-tagger-for-south
Repo
Framework

A semiautomatic lemmatisation procedure for treebanks. Old English strong and weak verbs


Title	A semiautomatic lemmatisation procedure for treebanks. Old English strong and weak verbs
Authors	Marta Tío Sáenz, Darío Metola Rodríguez
Abstract
Tasks	Morphological Tagging
Published	2017-01-01
URL	https://www.aclweb.org/anthology/W17-7613/
PDF	https://www.aclweb.org/anthology/W17-7613
PWC	https://paperswithcode.com/paper/a-semiautomatic-lemmatisation-procedure-for
Repo
Framework

Author Index


Title	Author Index
Authors
Abstract
Tasks
Published	2017-01-01
URL	https://www.aclweb.org/anthology/W17-7625/
PDF	https://www.aclweb.org/anthology/W17-7625
PWC	https://paperswithcode.com/paper/author-index
Repo
Framework

Multi-word Entity Classification in a Highly Multilingual Environment


Title	Multi-word Entity Classification in a Highly Multilingual Environment
Authors	Sophie Chesney, Guillaume Jacquet, Ralf Steinberger, Jakub Piskorski
Abstract	This paper describes an approach for the classification of millions of existing multi-word entities (MWEntities), such as organisation or event names, into thirteen category types, based only on the tokens they contain. In order to classify our very large in-house collection of multilingual MWEntities into an application-oriented set of entity categories, we trained and tested distantly-supervised classifiers in 43 languages based on MWEntities extracted from BabelNet. The best-performing classifier was the multi-class SVM using a TF.IDF-weighted data representation. Interestingly, one unique classifier trained on a mix of all languages consistently performed better than classifiers trained for individual languages, reaching an averaged F1-value of 88.8%. In this paper, we present the training and test data, including a human evaluation of its accuracy, describe the methods used to train the classifiers, and discuss the results.
Tasks	Named Entity Recognition
Published	2017-04-01
URL	https://www.aclweb.org/anthology/W17-1702/
PDF	https://www.aclweb.org/anthology/W17-1702
PWC	https://paperswithcode.com/paper/multi-word-entity-classification-in-a-highly
Repo
Framework

The PARSEME Shared Task on Automatic Identification of Verbal Multiword Expressions


Title	The PARSEME Shared Task on Automatic Identification of Verbal Multiword Expressions
Authors	Agata Savary, Carlos Ramisch, Silvio Cordeiro, Federico Sangati, Veronika Vincze, Behrang QasemiZadeh, Marie Candito, Fabienne Cap, Voula Giouli, Ivelina Stoyanova, Antoine Doucet
Abstract	Multiword expressions (MWEs) are known as a "pain in the neck" for NLP due to their idiosyncratic behaviour. While some categories of MWEs have been addressed by many studies, verbal MWEs (VMWEs), such as to take a decision, to break one's heart or to turn off, have been rarely modelled. This is notably due to their syntactic variability, which hinders treating them as "words with spaces". We describe an initiative meant to bring about substantial progress in understanding, modelling and processing VMWEs. It is a joint effort, carried out within a European research network, to elaborate universal terminologies and annotation guidelines for 18 languages. Its main outcome is a multilingual 5-million-word annotated corpus which underlies a shared task on automatic identification of VMWEs. This paper presents the corpus annotation methodology and outcome, the shared task organisation and the results of the participating systems.
Tasks
Published	2017-04-01
URL	https://www.aclweb.org/anthology/W17-1704/
PDF	https://www.aclweb.org/anthology/W17-1704
PWC	https://paperswithcode.com/paper/the-parseme-shared-task-on-automatic
Repo
Framework

Neural Networks for Multi-Word Expression Detection


Title	Neural Networks for Multi-Word Expression Detection
Authors	Natalia Klyueva, Antoine Doucet, Milan Straka
Abstract	In this paper we describe the MUMULS system that participated in the 2017 shared task on automatic identification of verbal multiword expressions (VMWEs). The MUMULS system was implemented using a supervised approach based on recurrent neural networks using the open source library TensorFlow. The model was trained on a data set containing annotated VMWEs as well as morphological and syntactic information. The MUMULS system performed the identification of VMWEs in 15 languages; it was one of the few systems that could categorize VMWE types in nearly all languages.
Tasks	Machine Translation
Published	2017-04-01
URL	https://www.aclweb.org/anthology/W17-1707/
PDF	https://www.aclweb.org/anthology/W17-1707
PWC	https://paperswithcode.com/paper/neural-networks-for-multi-word-expression
Repo
Framework

Crowd-Sourced Iterative Annotation for Narrative Summarization Corpora


Title	Crowd-Sourced Iterative Annotation for Narrative Summarization Corpora
Authors	Jessica Ouyang, Serina Chang, Kathy McKeown
Abstract	We present an iterative annotation process for producing aligned, parallel corpora of abstractive and extractive summaries for narrative. Our approach uses a combination of trained annotators and crowd-sourcing, allowing us to elicit human-generated summaries and alignments quickly and at low cost. We use crowd-sourcing to annotate aligned phrases with the text-to-text generation techniques needed to transform each phrase into the other. We apply this process to a corpus of 476 personal narratives, which we make available on the Web.
Tasks	Abstractive Text Summarization, Sentence Compression, Text Generation, Text Summarization
Published	2017-04-01
URL	https://www.aclweb.org/anthology/E17-2008/
PDF	https://www.aclweb.org/anthology/E17-2008
PWC	https://paperswithcode.com/paper/crowd-sourced-iterative-annotation-for
Repo
Framework

Collapsed variational Bayes for Markov jump processes


Title	Collapsed variational Bayes for Markov jump processes
Authors	Boqian Zhang, Jiangwei Pan, Vinayak A. Rao
Abstract	Markov jump processes are continuous-time stochastic processes widely used in statistical applications in the natural sciences, and more recently in machine learning. Inference for these models typically proceeds via Markov chain Monte Carlo, and can suffer from various computational challenges. In this work, we propose a novel collapsed variational inference algorithm to address this issue. Our work leverages ideas from discrete-time Markov chains, and exploits a connection between these two through an idea called uniformization. Our algorithm proceeds by marginalizing out the parameters of the Markov jump process, and then approximating the distribution over the trajectory with a factored distribution over segments of a piecewise-constant function. Unlike MCMC schemes that marginalize out transition times of a piecewise-constant process, our scheme optimizes the discretization of time, resulting in significant computational savings. We apply our ideas to synthetic data as well as a dataset of check-in recordings, where we demonstrate superior performance over state-of-the-art MCMC methods.
Tasks
Published	2017-12-01
URL	http://papers.nips.cc/paper/6965-collapsed-variational-bayes-for-markov-jump-processes
PDF	http://papers.nips.cc/paper/6965-collapsed-variational-bayes-for-markov-jump-processes.pdf
PWC	https://paperswithcode.com/paper/collapsed-variational-bayes-for-markov-jump
Repo
Framework
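
The abstract above mentions uniformization as the link between Markov jump processes and discrete-time Markov chains. As a rough illustration of that textbook construction only (not the paper's collapsed variational algorithm), the sketch below turns a rate matrix A into a stochastic matrix B = I + A/Ω and recovers the transition probabilities exp(At) as a Poisson-weighted sum of powers of B. The function names, the choice Ω = max exit rate, and the truncation length `max_terms` are all assumptions made for this example.

```python
import math


def uniformize(rate_matrix):
    """Return (omega, B) with B = I + A / omega, a row-stochastic matrix."""
    n = len(rate_matrix)
    # omega must dominate every exit rate -A[i][i]; we assume the tightest choice.
    omega = max(-rate_matrix[i][i] for i in range(n))
    if omega <= 0:
        raise ValueError("rate matrix has no transitions")
    B = [[(1.0 if i == j else 0.0) + rate_matrix[i][j] / omega
          for j in range(n)] for i in range(n)]
    return omega, B


def matmul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transition_matrix(rate_matrix, t, max_terms=200):
    """Approximate P(t) = exp(A t) as a Poisson mixture of powers of B.

    The fixed truncation is only adequate for moderate values of omega * t.
    """
    omega, B = uniformize(rate_matrix)
    n = len(rate_matrix)
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # B^0
    weight = math.exp(-omega * t)  # Poisson(omega * t) pmf at k = 0
    result = [[weight * power[i][j] for j in range(n)] for i in range(n)]
    for k in range(1, max_terms):
        power = matmul(power, B)
        weight *= omega * t / k  # pmf at k obtained from pmf at k - 1
        for i in range(n):
            for j in range(n):
                result[i][j] += weight * power[i][j]
    return result
```

For a two-state chain with rates 2 (state 0 to 1) and 1 (state 1 to 0), calling `transition_matrix([[-2.0, 2.0], [1.0, -1.0]], 0.5)` can be checked against the closed form P00(t) = 1/3 + (2/3)·exp(-3t).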

MASSAlign: Alignment and Annotation of Comparable Documents


Title	MASSAlign: Alignment and Annotation of Comparable Documents
Authors	Gustavo Paetzold, Fernando Alva-Manchego, Lucia Specia
Abstract	We introduce MASSAlign: a Python library for the alignment and annotation of monolingual comparable documents. MASSAlign offers easy-to-use access to state of the art algorithms for paragraph and sentence-level alignment, as well as novel algorithms for word-level annotation of transformation operations between aligned sentences. In addition, MASSAlign provides a visualization module to display and analyze the alignments and annotations performed.
Tasks
Published	2017-11-01
URL	https://www.aclweb.org/anthology/I17-3001/
PDF	https://www.aclweb.org/anthology/I17-3001
PWC	https://paperswithcode.com/paper/massalign-alignment-and-annotation-of
Repo
Framework

Discovering Light Verb Constructions and their Translations from Parallel Corpora without Word Alignment


Title	Discovering Light Verb Constructions and their Translations from Parallel Corpora without Word Alignment
Authors	Natalie Vargas, Carlos Ramisch, Helena Caseli
Abstract	We propose a method for joint unsupervised discovery of multiword expressions (MWEs) and their translations from parallel corpora. First, we apply independent monolingual MWE extraction in source and target languages simultaneously. Then, we calculate translation probability, association score and distributional similarity of co-occurring pairs. Finally, we rank all translations of a given MWE using a linear combination of these features. Preliminary experiments on light verb constructions show promising results.
Tasks	Machine Translation, Word Alignment
Published	2017-04-01
URL	https://www.aclweb.org/anthology/W17-1711/
PDF	https://www.aclweb.org/anthology/W17-1711
PWC	https://paperswithcode.com/paper/discovering-light-verb-constructions-and
Repo
Framework
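
The final step described in the abstract above, ranking the candidate translations of an MWE by a linear combination of translation probability, association score and distributional similarity, could look roughly like the following. This is a minimal sketch: the dictionary keys, the default weights and the toy feature values are hypothetical and are not taken from the paper.

```python
def rank_translations(candidates, weights=(1.0, 1.0, 1.0)):
    """Rank candidate translations by a weighted sum of three feature scores.

    Each candidate is a dict with the (hypothetical) keys 'target',
    'translation_prob', 'association' and 'similarity'.
    """
    w_prob, w_assoc, w_sim = weights
    scored = [
        (
            w_prob * c["translation_prob"]
            + w_assoc * c["association"]
            + w_sim * c["similarity"],
            c["target"],
        )
        for c in candidates
    ]
    # Highest combined score first; sorted() is stable, so ties keep input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [target for _, target in scored]


# Toy feature values, invented purely for illustration.
toy_candidates = [
    {"target": "candidate_a", "translation_prob": 0.6, "association": 0.2, "similarity": 0.5},
    {"target": "candidate_b", "translation_prob": 0.3, "association": 0.9, "similarity": 0.4},
    {"target": "candidate_c", "translation_prob": 0.1, "association": 0.1, "similarity": 0.1},
]
```

Changing `weights` shifts the ranking toward one feature: with equal weights `candidate_b` comes first, while `weights=(1.0, 0.0, 0.0)` ranks by translation probability alone.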

Phonological Soundscapes in Medieval Poetry


Title	Phonological Soundscapes in Medieval Poetry
Authors	Christopher Hench
Abstract	The oral component of medieval poetry was integral to its performance and reception. Yet many believe that the medieval voice has been forever lost, and any attempts at rediscovering it are doomed to failure due to scribal practices, manuscript mouvance, and linguistic normalization in editing practices. This paper offers a method to abstract from this noise and better understand relative differences in phonological soundscapes by considering syllable qualities. The presented syllabification method and soundscape analysis offer themselves as cross-disciplinary tools for low-resource languages. As a case study, we examine medieval German lyric and argue that the heavily debated lyrical 'I' follows a unique trajectory through soundscapes, shedding light on the performance and practice of these poets.
Tasks
Published	2017-08-01
URL	https://www.aclweb.org/anthology/W17-2207/
PDF	https://www.aclweb.org/anthology/W17-2207
PWC	https://paperswithcode.com/paper/phonological-soundscapes-in-medieval-poetry
Repo
Framework

Identification of Ambiguous Multiword Expressions Using Sequence Models and Lexical Resources


Title	Identification of Ambiguous Multiword Expressions Using Sequence Models and Lexical Resources
Authors	Manon Scholivet, Carlos Ramisch
Abstract	We present a simple and efficient tagger capable of identifying highly ambiguous multiword expressions (MWEs) in French texts. It is based on conditional random fields (CRF), using local context information as features. We show that this approach can obtain results that, in some cases, approach more sophisticated parser-based MWE identification methods without requiring syntactic trees from a treebank. Moreover, we study how well the CRF can take into account external information coming from a lexicon.
Tasks
Published	2017-04-01
URL	https://www.aclweb.org/anthology/W17-1723/
PDF	https://www.aclweb.org/anthology/W17-1723
PWC	https://paperswithcode.com/paper/identification-of-ambiguous-multiword
Repo
Framework
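
The abstract above describes a CRF tagger that uses local context as features and can also take lexicon information into account. One way such per-token features might be extracted before being passed to a CRF toolkit is sketched below. The feature names, the window size, the padding symbols and the bigram lexicon lookup are assumptions made for illustration, not the feature set used in the paper.

```python
def token_features(tokens, i, window=2, lexicon=frozenset()):
    """Build a feature dict for tokens[i] from its local context.

    `lexicon` is a hypothetical set of lowercased two-word MWE entries.
    """
    feats = {"w0": tokens[i].lower()}
    for offset in range(1, window + 1):
        left, right = i - offset, i + offset
        # Pad with boundary symbols when the window leaves the sentence.
        feats[f"w-{offset}"] = tokens[left].lower() if left >= 0 else "<s>"
        feats[f"w+{offset}"] = tokens[right].lower() if right < len(tokens) else "</s>"
    # External information: does this token start a bigram listed in the lexicon?
    if i + 1 < len(tokens):
        bigram = tokens[i].lower() + " " + tokens[i + 1].lower()
        feats["bigram_in_lexicon"] = bigram in lexicon
    else:
        feats["bigram_in_lexicon"] = False
    return feats


# Illustrative French sentence and a one-entry toy lexicon.
sentence = ["Il", "mange", "bien", "que", "malade"]
toy_lexicon = frozenset({"bien que"})
features = token_features(sentence, 2, window=1, lexicon=toy_lexicon)
```

A list of such dicts, one per token, is the kind of input that common CRF toolkits accept; the sequence model then decides from context whether an ambiguous pair like "bien que" is used as an MWE.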