July 26, 2019

Paper Group NANR 114

Notion of Semantics in Computer Science - A Systematic Literature Review. Universal Dependencies-based syntactic features in detecting human translation varieties. Extensions to the GrETEL Treebank Query Application. Adapting a State-of-the-Art Tagger for South Slavic Languages to Non-Standard Text. A semiautomatic lemmatisation procedure for treeb …

Notion of Semantics in Computer Science - A Systematic Literature Review

Title Notion of Semantics in Computer Science - A Systematic Literature Review
Authors Sai Prasad Vrj Gollapudi, Venkatesh Choppella
Abstract
Tasks
Published 2017-12-01
URL https://www.aclweb.org/anthology/W17-7562/
PDF https://www.aclweb.org/anthology/W17-7562
PWC https://paperswithcode.com/paper/notion-of-semantics-in-computer-science-a
Repo
Framework

Universal Dependencies-based syntactic features in detecting human translation varieties

Title Universal Dependencies-based syntactic features in detecting human translation varieties
Authors Maria Kunilovskaya, Andrey Kutuzov
Abstract
Tasks Machine Translation, Text Classification
Published 2017-01-01
URL https://www.aclweb.org/anthology/W17-7606/
PDF https://www.aclweb.org/anthology/W17-7606
PWC https://paperswithcode.com/paper/universal-dependencies-based-syntactic
Repo
Framework

Extensions to the GrETEL Treebank Query Application

Title Extensions to the GrETEL Treebank Query Application
Authors Jan Odijk, Martijn van der Klis, Sheean Spoel
Abstract
Tasks Language Acquisition
Published 2017-01-01
URL https://www.aclweb.org/anthology/W17-7608/
PDF https://www.aclweb.org/anthology/W17-7608
PWC https://paperswithcode.com/paper/extensions-to-the-gretel-treebank-query
Repo
Framework

Adapting a State-of-the-Art Tagger for South Slavic Languages to Non-Standard Text

Title Adapting a State-of-the-Art Tagger for South Slavic Languages to Non-Standard Text
Authors Nikola Ljubešić, Tomaž Erjavec, Darja Fišer
Abstract In this paper we present the adaptations of a state-of-the-art tagger for South Slavic languages to non-standard texts on the example of the Slovene language. We investigate the impact of introducing in-domain training data as well as additional supervision through external resources or tools like word clusters and word normalization. We remove more than half of the error of the standard tagger when applied to non-standard texts by training it on a combination of standard and non-standard training data, while enriching the data representation with external resources removes an additional 11 percent of the error. The final configuration achieves tagging accuracy of 87.41% on the full morphosyntactic description, which is, nevertheless, still quite far from the accuracy of 94.27% achieved on standard text.
Tasks Domain Adaptation, Lemmatization, Machine Translation, Part-Of-Speech Tagging
Published 2017-04-01
URL https://www.aclweb.org/anthology/W17-1410/
PDF https://www.aclweb.org/anthology/W17-1410
PWC https://paperswithcode.com/paper/adapting-a-state-of-the-art-tagger-for-south
Repo
Framework

A semiautomatic lemmatisation procedure for treebanks. Old English strong and weak verbs

Title A semiautomatic lemmatisation procedure for treebanks. Old English strong and weak verbs
Authors Marta Tío Sáenz, Darío Metola Rodríguez
Abstract
Tasks Morphological Tagging
Published 2017-01-01
URL https://www.aclweb.org/anthology/W17-7613/
PDF https://www.aclweb.org/anthology/W17-7613
PWC https://paperswithcode.com/paper/a-semiautomatic-lemmatisation-procedure-for
Repo
Framework

Author Index

Title Author Index
Authors
Abstract
Tasks
Published 2017-01-01
URL https://www.aclweb.org/anthology/W17-7625/
PDF https://www.aclweb.org/anthology/W17-7625
PWC https://paperswithcode.com/paper/author-index
Repo
Framework

Multi-word Entity Classification in a Highly Multilingual Environment

Title Multi-word Entity Classification in a Highly Multilingual Environment
Authors Sophie Chesney, Guillaume Jacquet, Ralf Steinberger, Jakub Piskorski
Abstract This paper describes an approach for the classification of millions of existing multi-word entities (MWEntities), such as organisation or event names, into thirteen category types, based only on the tokens they contain. In order to classify our very large in-house collection of multilingual MWEntities into an application-oriented set of entity categories, we trained and tested distantly-supervised classifiers in 43 languages based on MWEntities extracted from BabelNet. The best-performing classifier was the multi-class SVM using a TF.IDF-weighted data representation. Interestingly, one unique classifier trained on a mix of all languages consistently performed better than classifiers trained for individual languages, reaching an averaged F1-value of 88.8%. In this paper, we present the training and test data, including a human evaluation of its accuracy, describe the methods used to train the classifiers, and discuss the results.
Tasks Named Entity Recognition
Published 2017-04-01
URL https://www.aclweb.org/anthology/W17-1702/
PDF https://www.aclweb.org/anthology/W17-1702
PWC https://paperswithcode.com/paper/multi-word-entity-classification-in-a-highly
Repo
Framework
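
The abstract above classifies entity names from their tokens alone, using a TF.IDF-weighted representation. The sketch below illustrates only that representation. It is not the authors' system: the multi-class SVM is swapped for a much simpler nearest-centroid rule so the example stays dependency-free, and the training names, the smoothed IDF formula, and all function names are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

def tokenize(name):
    """Lowercase whitespace tokenization (an assumption; the paper's tokenizer is not described here)."""
    return name.lower().split()

def fit_idf(docs):
    """Smoothed inverse document frequency for every token seen in training."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log((1 + n) / (1 + count)) + 1.0 for tok, count in df.items()}

def tfidf(doc, idf):
    """L2-normalised TF-IDF vector as a sparse dict; unseen tokens are dropped."""
    tf = Counter(doc)
    vec = {tok: count * idf[tok] for tok, count in tf.items() if tok in idf}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {tok: v / norm for tok, v in vec.items()} if norm else {}

def train(names, labels):
    """Return the IDF table and one mean TF-IDF vector (centroid) per label."""
    docs = [tokenize(n) for n in names]
    idf = fit_idf(docs)
    sums = defaultdict(lambda: defaultdict(float))
    counts = Counter(labels)
    for doc, label in zip(docs, labels):
        for tok, value in tfidf(doc, idf).items():
            sums[label][tok] += value
    centroids = {
        label: {tok: v / counts[label] for tok, v in vec.items()}
        for label, vec in sums.items()
    }
    return idf, centroids

def classify(name, idf, centroids):
    """Pick the label whose centroid has the largest dot product with the name's vector."""
    vec = tfidf(tokenize(name), idf)
    def similarity(centroid):
        return sum(v * centroid.get(tok, 0.0) for tok, v in vec.items())
    return max(centroids, key=lambda label: similarity(centroids[label]))

# Toy, made-up training data using two of the category types named in the abstract.
names = ["University of Oxford", "University of Lisbon", "World Health Organization",
         "World Cup Final", "Olympic Games Final"]
labels = ["organisation", "organisation", "organisation", "event", "event"]
idf, centroids = train(names, labels)
prediction = classify("University of Warsaw", idf, centroids)
```

Tokens never seen in training (here "Warsaw") contribute nothing, so the decision rests on the shared tokens, which is the sense in which such a classifier is "based only on the tokens".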

The PARSEME Shared Task on Automatic Identification of Verbal Multiword Expressions

Title The PARSEME Shared Task on Automatic Identification of Verbal Multiword Expressions
Authors Agata Savary, Carlos Ramisch, Silvio Cordeiro, Federico Sangati, Veronika Vincze, Behrang QasemiZadeh, Marie Candito, Fabienne Cap, Voula Giouli, Ivelina Stoyanova, Antoine Doucet
Abstract Multiword expressions (MWEs) are known as a "pain in the neck" for NLP due to their idiosyncratic behaviour. While some categories of MWEs have been addressed by many studies, verbal MWEs (VMWEs), such as to take a decision, to break one's heart or to turn off, have been rarely modelled. This is notably due to their syntactic variability, which hinders treating them as "words with spaces". We describe an initiative meant to bring about substantial progress in understanding, modelling and processing VMWEs. It is a joint effort, carried out within a European research network, to elaborate universal terminologies and annotation guidelines for 18 languages. Its main outcome is a multilingual 5-million-word annotated corpus which underlies a shared task on automatic identification of VMWEs. This paper presents the corpus annotation methodology and outcome, the shared task organisation and the results of the participating systems.
Tasks
Published 2017-04-01
URL https://www.aclweb.org/anthology/W17-1704/
PDF https://www.aclweb.org/anthology/W17-1704
PWC https://paperswithcode.com/paper/the-parseme-shared-task-on-automatic
Repo
Framework

Neural Networks for Multi-Word Expression Detection

Title Neural Networks for Multi-Word Expression Detection
Authors Natalia Klyueva, Antoine Doucet, Milan Straka
Abstract In this paper we describe the MUMULS system that participated in the 2017 shared task on automatic identification of verbal multiword expressions (VMWEs). The MUMULS system was implemented using a supervised approach based on recurrent neural networks using the open source library TensorFlow. The model was trained on a data set containing annotated VMWEs as well as morphological and syntactic information. The MUMULS system performed the identification of VMWEs in 15 languages; it was one of few systems that could categorize VMWE types in nearly all languages.
Tasks Machine Translation
Published 2017-04-01
URL https://www.aclweb.org/anthology/W17-1707/
PDF https://www.aclweb.org/anthology/W17-1707
PWC https://paperswithcode.com/paper/neural-networks-for-multi-word-expression
Repo
Framework

Crowd-Sourced Iterative Annotation for Narrative Summarization Corpora

Title Crowd-Sourced Iterative Annotation for Narrative Summarization Corpora
Authors Jessica Ouyang, Serina Chang, Kathy McKeown
Abstract We present an iterative annotation process for producing aligned, parallel corpora of abstractive and extractive summaries for narrative. Our approach uses a combination of trained annotators and crowd-sourcing, allowing us to elicit human-generated summaries and alignments quickly and at low cost. We use crowd-sourcing to annotate aligned phrases with the text-to-text generation techniques needed to transform each phrase into the other. We apply this process to a corpus of 476 personal narratives, which we make available on the Web.
Tasks Abstractive Text Summarization, Sentence Compression, Text Generation, Text Summarization
Published 2017-04-01
URL https://www.aclweb.org/anthology/E17-2008/
PDF https://www.aclweb.org/anthology/E17-2008
PWC https://paperswithcode.com/paper/crowd-sourced-iterative-annotation-for
Repo
Framework

Collapsed variational Bayes for Markov jump processes

Title Collapsed variational Bayes for Markov jump processes
Authors Boqian Zhang, Jiangwei Pan, Vinayak A. Rao
Abstract Markov jump processes are continuous-time stochastic processes widely used in statistical applications in the natural sciences, and more recently in machine learning. Inference for these models typically proceeds via Markov chain Monte Carlo, and can suffer from various computational challenges. In this work, we propose a novel collapsed variational inference algorithm to address this issue. Our work leverages ideas from discrete-time Markov chains, and exploits a connection between these two through an idea called uniformization. Our algorithm proceeds by marginalizing out the parameters of the Markov jump process, and then approximating the distribution over the trajectory with a factored distribution over segments of a piecewise-constant function. Unlike MCMC schemes that marginalize out transition times of a piecewise-constant process, our scheme optimizes the discretization of time, resulting in significant computational savings. We apply our ideas to synthetic data as well as a dataset of check-in recordings, where we demonstrate superior performance over state-of-the-art MCMC methods.
Tasks
Published 2017-12-01
URL http://papers.nips.cc/paper/6965-collapsed-variational-bayes-for-markov-jump-processes
PDF http://papers.nips.cc/paper/6965-collapsed-variational-bayes-for-markov-jump-processes.pdf
PWC https://paperswithcode.com/paper/collapsed-variational-bayes-for-markov-jump
Repo
Framework
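
The abstract above relies on uniformization to connect Markov jump processes with discrete-time Markov chains. The sketch below shows only that standard identity, not the paper's collapsed variational algorithm: for a rate matrix Q and a rate Ω at least as large as every exit rate, the transition matrix over time t equals a Poisson(Ωt)-weighted sum of powers of B = I + Q/Ω. The function names, the truncation at 100 terms, and the two-state toy chain are assumptions made for illustration.

```python
import math
from typing import List, Optional

Matrix = List[List[float]]

def mat_mul(a: Matrix, b: Matrix) -> Matrix:
    """Plain square-matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def transition_matrix(q: Matrix, t: float, omega: Optional[float] = None,
                      terms: int = 100) -> Matrix:
    """Approximate exp(Q t) by uniformization, truncating the Poisson sum after `terms` terms."""
    n = len(q)
    if omega is None:
        # Smallest valid choice: the largest exit rate (assumes at least one state can be left).
        omega = max(-q[i][i] for i in range(n))
    # Transition matrix of the embedded discrete-time chain.
    b = [[(1.0 if i == j else 0.0) + q[i][j] / omega for j in range(n)] for i in range(n)]
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # B^0
    weight = math.exp(-omega * t)  # Poisson(omega * t) probability of 0 events
    result = [[0.0] * n for _ in range(n)]
    for k in range(terms):
        for i in range(n):
            for j in range(n):
                result[i][j] += weight * power[i][j]
        power = mat_mul(power, b)
        weight *= omega * t / (k + 1)  # Poisson probability of k + 1 events
    return result

# Toy two-state chain: rate 1.0 from state 0 to 1, rate 2.0 from state 1 to 0.
q = [[-1.0, 1.0], [2.0, -2.0]]
p = transition_matrix(q, t=0.5)
```

For this two-state chain the result can be checked against the closed form P00(t) = 2/3 + (1/3)·exp(-1.5) at t = 0.5, and each row of `p` should sum to 1.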

MASSAlign: Alignment and Annotation of Comparable Documents

Title MASSAlign: Alignment and Annotation of Comparable Documents
Authors Gustavo Paetzold, Fernando Alva-Manchego, Lucia Specia
Abstract We introduce MASSAlign: a Python library for the alignment and annotation of monolingual comparable documents. MASSAlign offers easy-to-use access to state-of-the-art algorithms for paragraph and sentence-level alignment, as well as novel algorithms for word-level annotation of transformation operations between aligned sentences. In addition, MASSAlign provides a visualization module to display and analyze the alignments and annotations performed.
Tasks
Published 2017-11-01
URL https://www.aclweb.org/anthology/I17-3001/
PDF https://www.aclweb.org/anthology/I17-3001
PWC https://paperswithcode.com/paper/massalign-alignment-and-annotation-of
Repo
Framework

Discovering Light Verb Constructions and their Translations from Parallel Corpora without Word Alignment

Title Discovering Light Verb Constructions and their Translations from Parallel Corpora without Word Alignment
Authors Natalie Vargas, Carlos Ramisch, Helena Caseli
Abstract We propose a method for joint unsupervised discovery of multiword expressions (MWEs) and their translations from parallel corpora. First, we apply independent monolingual MWE extraction in source and target languages simultaneously. Then, we calculate translation probability, association score and distributional similarity of co-occurring pairs. Finally, we rank all translations of a given MWE using a linear combination of these features. Preliminary experiments on light verb constructions show promising results.
Tasks Machine Translation, Word Alignment
Published 2017-04-01
URL https://www.aclweb.org/anthology/W17-1711/
PDF https://www.aclweb.org/anthology/W17-1711
PWC https://paperswithcode.com/paper/discovering-light-verb-constructions-and
Repo
Framework
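
The last step in the abstract above ranks the candidate translations of an MWE by a linear combination of three features: translation probability, association score, and distributional similarity. A minimal sketch of that ranking step follows. The weights, the feature values, and the candidate labels are all made up for illustration; the abstract does not say how the features are scaled or weighted.

```python
from typing import Dict, List, Tuple

# Hypothetical weights: the abstract names the three features but not their weighting.
WEIGHTS = {"translation_prob": 0.5, "association": 0.3, "dist_similarity": 0.2}

def score(features: Dict[str, float], weights: Dict[str, float] = WEIGHTS) -> float:
    """Linear combination of the feature values; a missing feature counts as 0."""
    return sum(weight * features.get(name, 0.0) for name, weight in weights.items())

def rank_translations(candidates: Dict[str, Dict[str, float]]) -> List[Tuple[str, float]]:
    """Return (candidate, score) pairs sorted from best to worst."""
    scored = [(candidate, score(features)) for candidate, features in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy feature values for three hypothetical candidate translations of one source MWE.
candidates = {
    "candidate_1": {"translation_prob": 0.8, "association": 0.7, "dist_similarity": 0.6},
    "candidate_2": {"translation_prob": 0.4, "association": 0.2, "dist_similarity": 0.7},
    "candidate_3": {"translation_prob": 0.3, "association": 0.1, "dist_similarity": 0.2},
}
ranking = rank_translations(candidates)
```

Because the combination is linear, changing a weight only rescales one feature's contribution, which makes it easy to see which feature drives a given ranking.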

Phonological Soundscapes in Medieval Poetry

Title Phonological Soundscapes in Medieval Poetry
Authors Christopher Hench
Abstract The oral component of medieval poetry was integral to its performance and reception. Yet many believe that the medieval voice has been forever lost, and any attempts at rediscovering it are doomed to failure due to scribal practices, manuscript mouvance, and linguistic normalization in editing practices. This paper offers a method to abstract from this noise and better understand relative differences in phonological soundscapes by considering syllable qualities. The presented syllabification method and soundscape analysis offer themselves as cross-disciplinary tools for low-resource languages. As a case study, we examine medieval German lyric and argue that the heavily debated lyrical 'I' follows a unique trajectory through soundscapes, shedding light on the performance and practice of these poets.
Tasks
Published 2017-08-01
URL https://www.aclweb.org/anthology/W17-2207/
PDF https://www.aclweb.org/anthology/W17-2207
PWC https://paperswithcode.com/paper/phonological-soundscapes-in-medieval-poetry
Repo
Framework

Identification of Ambiguous Multiword Expressions Using Sequence Models and Lexical Resources

Title Identification of Ambiguous Multiword Expressions Using Sequence Models and Lexical Resources
Authors Manon Scholivet, Carlos Ramisch
Abstract We present a simple and efficient tagger capable of identifying highly ambiguous multiword expressions (MWEs) in French texts. It is based on conditional random fields (CRF), using local context information as features. We show that this approach can obtain results that, in some cases, approach more sophisticated parser-based MWE identification methods without requiring syntactic trees from a treebank. Moreover, we study how well the CRF can take into account external information coming from a lexicon.
Tasks
Published 2017-04-01
URL https://www.aclweb.org/anthology/W17-1723/
PDF https://www.aclweb.org/anthology/W17-1723
PWC https://paperswithcode.com/paper/identification-of-ambiguous-multiword
Repo
Framework
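
The abstract above describes a CRF tagger that uses local context information as features. The sketch below shows what such per-token context features might look like before they are handed to a CRF toolkit. The feature names, the window size, the boundary markers, and the toy French sentence are assumptions made for illustration; the paper's actual feature templates are not given in the abstract, and no CRF is trained here.

```python
from typing import Dict, List

def token_features(tokens: List[str], i: int, window: int = 2) -> Dict[str, object]:
    """Features for token i: its own form plus the forms of neighbours within `window`."""
    def form(position: int) -> str:
        # Positions outside the sentence map to boundary markers.
        if position < 0:
            return "<BOS>"
        if position >= len(tokens):
            return "<EOS>"
        return tokens[position].lower()

    features: Dict[str, object] = {"bias": 1.0, "w[0]": form(i)}
    for offset in range(1, window + 1):
        features[f"w[-{offset}]"] = form(i - offset)
        features[f"w[+{offset}]"] = form(i + offset)
    # One bigram feature joining the previous and current forms.
    features["w[-1]|w[0]"] = form(i - 1) + "|" + form(i)
    return features

def sentence_features(tokens: List[str], window: int = 2) -> List[Dict[str, object]]:
    """One feature dict per token, in sentence order."""
    return [token_features(tokens, i, window) for i in range(len(tokens))]

# Toy sentence chosen only to exercise the function.
sentence = ["Il", "mange", "de", "la", "soupe"]
features = sentence_features(sentence)
```

Every feature here depends only on a fixed window of surface forms around the token, which is why this kind of tagger needs no syntactic trees.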