May 5, 2019

Paper Group NANR 68

Improving word alignment for low resource languages using English monolingual SRL

Title Improving word alignment for low resource languages using English monolingual SRL
Authors Meriem Beloucif, Markus Saers, Dekai Wu
Abstract We introduce a new statistical machine translation approach, specifically geared to learning translation from low resource languages, that exploits monolingual English semantic parsing to bias inversion transduction grammar (ITG) induction. We show that in contrast to conventional statistical machine translation (SMT) training methods, which rely heavily on phrase memorization, our approach focuses on learning bilingual correlations that help translate low resource languages, by using the output language semantic structure to further narrow down ITG constraints. This approach is motivated by previous research which has shown that injecting a semantic frame based objective function while training SMT models improves translation quality. We show that including a monolingual semantic objective function during the learning of the translation model leads to a semantically driven alignment which is more efficient than simply tuning log-linear mixture weights against a semantic frame based evaluation metric in the final stage of statistical machine translation training. We test our approach with three different language pairs and demonstrate that our model biases the learning towards more semantically correct alignments. Both GIZA++ and ITG based techniques fail to capture meaningful bilingual constituents, which are required when trying to learn translation models for low resource languages. In contrast, our proposed model not only improves translation by injecting a monolingual objective function to learn bilingual correlations during early training of the translation model, but also helps to learn more meaningful correlations with a relatively small data set, leading to better alignment than either conventional ITG or traditional GIZA++ based approaches.
Tasks Machine Translation, Semantic Parsing, Word Alignment
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-4507/
PDF https://www.aclweb.org/anthology/W16-4507
PWC https://paperswithcode.com/paper/improving-word-alignment-for-low-resource
Repo
Framework

Finding Rising and Falling Words

Title Finding Rising and Falling Words
Authors Erik Tjong Kim Sang
Abstract We examine two different methods for finding rising words (including neologisms) and falling words (including archaisms) in decades of magazine texts (millions of words) and in years of tweets (billions of words): one based on correlation coefficients of relative frequencies and time, and one based on comparing initial and final word frequencies of time intervals. We find that smoothing frequency scores improves the precision scores of both methods and that the correlation coefficients perform better on magazine text but worse on tweets. Since the two ranking methods find different words, they can be used side by side to study the behavior of words over time.
Tasks
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-4002/
PDF https://www.aclweb.org/anthology/W16-4002
PWC https://paperswithcode.com/paper/finding-rising-and-falling-words
Repo
Framework

Coursebook Texts as a Helping Hand for Classifying Linguistic Complexity in Language Learners’ Writings

Title Coursebook Texts as a Helping Hand for Classifying Linguistic Complexity in Language Learners’ Writings
Authors Ildikó Pilán, David Alfter, Elena Volodina
Abstract We bring together knowledge from two different types of language learning data, texts learners read and texts they write, to improve linguistic complexity classification in the latter. Linguistic complexity in the foreign and second language learning context can be expressed in terms of proficiency levels. We show that incorporating features capturing lexical complexity information from reading passages can significantly boost the machine learning based classification of learner-written texts into proficiency levels. With an F1 score of .8, our system rivals state-of-the-art results reported for other languages for this task. Finally, we present a freely available web-based tool for proficiency level classification and lexical complexity visualization for both learner writings and reading texts.
Tasks Domain Adaptation
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-4114/
PDF https://www.aclweb.org/anthology/W16-4114
PWC https://paperswithcode.com/paper/coursebook-texts-as-a-helping-hand-for
Repo
Framework

Bilingual Lexicon Extraction at the Morpheme Level Using Distributional Analysis

Title Bilingual Lexicon Extraction at the Morpheme Level Using Distributional Analysis
Authors Amir Hazem, Béatrice Daille
Abstract Bilingual lexicon extraction from comparable corpora is usually based on distributional methods when dealing with single word terms (SWT). These methods often treat SWT as single tokens without considering their compositional property. However, many SWT are compositional (composed of roots and affixes), and this information, if taken into account, can be very useful to match translational pairs, especially for infrequent terms where distributional methods often fail. For instance, the English compound xenograft, which is composed of the root xeno and the lexeme graft, can be translated into French compositionally by aligning each of its elements (xeno with xéno and graft with greffe), resulting in the translation xénogreffe. In this paper, we experiment with several distributional models at the morpheme level that we apply to perform compositional translation on a subset of French and English compounds. We show promising results using distributional analysis at the root and affix levels. We also show that the adapted approach significantly improves bilingual lexicon extraction from comparable corpora compared to the approach at the word level.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1496/
PDF https://www.aclweb.org/anthology/L16-1496
PWC https://paperswithcode.com/paper/bilingual-lexicon-extraction-at-the-morpheme
Repo
Framework
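The xenograft example in the abstract above can be illustrated with a toy compositional translator. This is a hedged sketch, not the authors' method: the paper learns morpheme correspondences through distributional analysis on comparable corpora, whereas here the morpheme lexicon is a hypothetical hand-written dictionary, the segmentation is an assumed longest-match-first search with backtracking, and the optional filter against a target-language vocabulary is an assumption added for illustration.

```python
from itertools import product

# Hypothetical morpheme lexicon; in the paper such pairs would come from
# distributional analysis rather than being written by hand.
MORPHEME_LEXICON = {
    "xeno": ["xéno"],
    "graft": ["greffe"],
}


def segment(word, morphemes):
    """Split a word into known morphemes, trying the longest prefix first and
    backtracking if the remainder cannot be segmented. Returns None on failure."""
    if not word:
        return []
    for end in range(len(word), 0, -1):
        prefix = word[:end]
        if prefix in morphemes:
            rest = segment(word[end:], morphemes)
            if rest is not None:
                return [prefix] + rest
    return None


def compositional_translations(word, lexicon, target_vocab=None):
    """Translate each morpheme, concatenate every combination of candidates,
    and optionally keep only candidates attested in a target vocabulary."""
    parts = segment(word, lexicon.keys())
    if parts is None:
        return []
    candidates = ["".join(combo) for combo in product(*(lexicon[p] for p in parts))]
    if target_vocab is not None:
        candidates = [c for c in candidates if c in target_vocab]
    return candidates


print(compositional_translations("xenograft", MORPHEME_LEXICON))
# ['xénogreffe']
```

Generating all combinations and filtering afterwards keeps the sketch simple; a real system would rank candidates (for example, by distributional similarity) rather than accept every attested one.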

Measuring Cognitive Translation Effort with Activity Units

Title Measuring Cognitive Translation Effort with Activity Units
Authors Moritz Jonas Schaeffer, Michael Carl, Isabel Lacruz, Akiko Aizawa
Abstract
Tasks Machine Translation
Published 2016-01-01
URL https://www.aclweb.org/anthology/W16-3419/
PDF https://www.aclweb.org/anthology/W16-3419
PWC https://paperswithcode.com/paper/measuring-cognitive-translation-effort-with
Repo
Framework

Kathaa: A Visual Programming Framework for NLP Applications

Title Kathaa: A Visual Programming Framework for NLP Applications
Authors Sharada Prasanna Mohanty, Nehal J Wani, Manish Srivastava, Dipti Misra Sharma
Abstract
Tasks
Published 2016-06-01
URL https://www.aclweb.org/anthology/N16-3019/
PDF https://www.aclweb.org/anthology/N16-3019
PWC https://paperswithcode.com/paper/kathaa-a-visual-programming-framework-for-nlp
Repo
Framework

Homing in on Twitter Users: Evaluating an Enhanced Geoparser for User Profile Locations

Title Homing in on Twitter Users: Evaluating an Enhanced Geoparser for User Profile Locations
Authors Beatrice Alex, Clare Llewellyn, Claire Grover, Jon Oberlander, Richard Tobin
Abstract Twitter-related studies often need to geo-locate Tweets or Twitter users, identifying their real-world geographic locations. As tweet-level geotagging remains rare, most prior work exploited tweet content, timezone and network information to inform geolocation, or else relied on off-the-shelf tools to geolocate users from location information in their user profiles. However, such user location metadata is not consistently structured, causing such tools to fail regularly, especially if a string contains multiple locations, or if locations are very fine-grained. We argue that user profile location (UPL) and tweet location need to be treated as distinct types of information from which differing inferences can be drawn. Here, we apply geoparsing to UPLs, and demonstrate how task performance can be improved by adapting our Edinburgh Geoparser, which was originally developed for processing English text. We present a detailed evaluation method and results, including inter-coder agreement. We demonstrate that the optimised geoparser can effectively extract and geo-reference multiple locations at different levels of granularity with an F1-score of around 0.90. We also illustrate how geoparsed UPLs can be exploited for international information trade studies and country-level sentiment analysis.
Tasks Sentiment Analysis
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1622/
PDF https://www.aclweb.org/anthology/L16-1622
PWC https://paperswithcode.com/paper/homing-in-on-twitter-users-evaluating-an
Repo
Framework

The Challenges of Multi-dimensional Sentiment Analysis Across Languages

Title The Challenges of Multi-dimensional Sentiment Analysis Across Languages
Authors Emily Öhman, Timo Honkela, Jörg Tiedemann
Abstract This paper outlines a pilot study on multi-dimensional and multilingual sentiment analysis of social media content. We use parallel corpora of movie subtitles as a proxy for colloquial language in social media channels and a multilingual emotion lexicon for fine-grained sentiment analyses. Parallel data sets make it possible to study the preservation of sentiments and emotions in translation and our assessment reveals that the lexical approach shows great inter-language agreement. However, our manual evaluation also suggests that the use of purely lexical methods is limited and further studies are necessary to pinpoint the cross-lingual differences and to develop better sentiment classifiers.
Tasks Sentiment Analysis
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-4315/
PDF https://www.aclweb.org/anthology/W16-4315
PWC https://paperswithcode.com/paper/the-challenges-of-multi-dimensional-sentiment
Repo
Framework
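The lexical approach and the inter-language agreement check described in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions: the tiny English and German emotion lexicons, the example subtitle pairs, and the agreement measure (the fraction of aligned sentence pairs whose sets of detected emotion categories are identical) are all hypothetical stand-ins, not the lexicon, data, or metric used in the study.

```python
import re
from collections import Counter

# Hypothetical toy emotion lexicons (word -> emotion category).
EN_LEXICON = {"happy": "joy", "afraid": "fear", "angry": "anger"}
DE_LEXICON = {"glücklich": "joy", "angst": "fear", "wütend": "anger"}


def emotion_profile(sentence, lexicon):
    """Count emotion categories triggered by lexicon words in a sentence."""
    tokens = re.findall(r"\w+", sentence.lower())  # \w is Unicode-aware, so "ü" is kept
    return Counter(lexicon[t] for t in tokens if t in lexicon)


def agreement(pairs, src_lexicon, tgt_lexicon):
    """Assumed measure: share of aligned pairs whose emotion category sets match."""
    matches = sum(
        set(emotion_profile(src, src_lexicon)) == set(emotion_profile(tgt, tgt_lexicon))
        for src, tgt in pairs
    )
    return matches / len(pairs)


# Invented parallel subtitle lines (English, German).
pairs = [
    ("I am so happy!", "Ich bin so glücklich!"),
    ("Are you angry?", "Bist du wütend?"),
    ("I am afraid.", "Ich habe Angst."),
    ("Don't be afraid.", "Keine Sorge."),  # the emotion word is lost in translation
]
print(agreement(pairs, EN_LEXICON, DE_LEXICON))
# 0.75
```

The last pair shows the limitation the abstract points to: a purely lexical method misses emotion that a translation expresses without a lexicon word, which is why manual evaluation is still needed.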

Comprehensive Part-Of-Speech Tag Set and SVM based POS Tagger for Sinhala

Title Comprehensive Part-Of-Speech Tag Set and SVM based POS Tagger for Sinhala
Authors Sandareka Fernando, Surangika Ranathunga, Sanath Jayasena, Gihan Dias
Abstract This paper presents a new comprehensive multi-level Part-Of-Speech tag set and a Support Vector Machine based Part-Of-Speech tagger for the Sinhala language. The currently available tag set for Sinhala has two limitations: the unavailability of tags to represent some word classes and the lack of tags to capture inflection based grammatical variations of words. The new tag set presented in this paper overcomes both of these limitations. The accuracy of available Sinhala Part-Of-Speech taggers, which are based on Hidden Markov Models, still falls far behind the state of the art. Our Support Vector Machine based tagger achieved an overall accuracy of 84.68%, with 59.86% accuracy for unknown words and 87.12% for known words, when the test set contains 10% unknown words.
Tasks
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-3718/
PDF https://www.aclweb.org/anthology/W16-3718
PWC https://paperswithcode.com/paper/comprehensive-part-of-speech-tag-set-and-svm
Repo
Framework

Task Completion Platform: A self-serve multi-domain goal oriented dialogue platform

Title Task Completion Platform: A self-serve multi-domain goal oriented dialogue platform
Authors Paul Crook, Alex Marin, Vipul Agarwal, Khushboo Aggarwal, Tasos Anastasakos, Ravi Bikkula, Daniel Boies, Asli Celikyilmaz, Senthilkumar Chandramohan, Zhaleh Feizollahi, Roman Holenstein, Minwoo Jeong, Omar Khan, Young-Bum Kim, Elizabeth Krawczyk, Xiaohu Liu, Danko Panic, Vasiliy Radostev, Nikhil Ramesh, Jean-Phillipe Robichaud, Alexandre Rochette, Logan Stromberg, Ruhi Sarikaya
Abstract
Tasks Dialogue Management
Published 2016-06-01
URL https://www.aclweb.org/anthology/N16-3010/
PDF https://www.aclweb.org/anthology/N16-3010
PWC https://paperswithcode.com/paper/task-completion-platform-a-self-serve-multi
Repo
Framework

Modelling Multi-issue Bargaining Dialogues: Data Collection, Annotation Design and Corpus

Title Modelling Multi-issue Bargaining Dialogues: Data Collection, Annotation Design and Corpus
Authors Volha Petukhova, Christopher Stevens, Harmen de Weerd, Niels Taatgen, Fokie Cnossen, Andrei Malchanau
Abstract The paper describes experimental dialogue data collection activities, as well as semantically annotated corpus creation undertaken within the EU-funded METALOGUE project (www.metalogue.eu). The project aims to develop a dialogue system with flexible dialogue management to enable the system's adaptive, reactive, interactive and proactive dialogue behavior in setting goals, choosing appropriate strategies and monitoring numerous parallel interpretation and management processes. To achieve these goals, a negotiation (or, more precisely, multi-issue bargaining) scenario has been considered as the specific setting and application domain. The dialogue corpus forms the basis for the design of task and interaction models of participants' negotiation behavior, and subsequently for the development of a dialogue system capable of replacing one of the negotiators. The METALOGUE corpus will be released to the community for research purposes.
Tasks Dialogue Management
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1500/
PDF https://www.aclweb.org/anthology/L16-1500
PWC https://paperswithcode.com/paper/modelling-multi-issue-bargaining-dialogues
Repo
Framework

Deeper Machine Translation and Evaluation for German

Title Deeper Machine Translation and Evaluation for German
Authors Eleftherios Avramidis, Vivien Macketanz, Aljoscha Burchardt, Jindrich Helcl, Hans Uszkoreit
Abstract
Tasks Machine Translation
Published 2016-10-01
URL https://www.aclweb.org/anthology/W16-6404/
PDF https://www.aclweb.org/anthology/W16-6404
PWC https://paperswithcode.com/paper/deeper-machine-translation-and-evaluation-for
Repo
Framework

PreMOn: a Lemon Extension for Exposing Predicate Models as Linked Data

Title PreMOn: a Lemon Extension for Exposing Predicate Models as Linked Data
Authors Francesco Corcoglioniti, Marco Rospocher, Alessio Palmero Aprosio, Sara Tonelli
Abstract We introduce PreMOn (predicate model for ontologies), a linguistic resource for exposing predicate models (PropBank, NomBank, VerbNet, and FrameNet) and mappings between them (e.g., SemLink) as Linked Open Data. It consists of two components: (i) the PreMOn Ontology, an extension of the lemon model by the W3C Ontology-Lexica Community Group, which makes it possible to represent data from the various predicate models homogeneously; and (ii) the PreMOn Dataset, a collection of RDF datasets integrating various versions of the aforementioned predicate models and mapping resources. PreMOn is freely available and accessible online in different ways, including through a dedicated SPARQL endpoint.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1141/
PDF https://www.aclweb.org/anthology/L16-1141
PWC https://paperswithcode.com/paper/premon-a-lemon-extension-for-exposing
Repo
Framework

Named Entity Disambiguation for little known referents: a topic-based approach

Title Named Entity Disambiguation for little known referents: a topic-based approach
Authors Andrea Glaser, Jonas Kuhn
Abstract We propose an approach to Named Entity Disambiguation that avoids a problem of standard work on the task (likewise affecting fully supervised, weakly supervised, or distantly supervised machine learning techniques): the treatment of name mentions referring to people with no (or very little) coverage in the textual training data is systematically incorrect. We propose to indirectly take into account the property information for the "non-prominent" name bearers, such as nationality and profession (e.g., for a Canadian law professor named Michael Jackson, with no Wikipedia article, it is very hard to obtain reliable textual training data). The target property information for the entities is directly available from name authority files, or inferrable, e.g., from listings of sportspeople etc. Our proposed approach employs topic modeling to exploit textual training data based on entities sharing the relevant properties. In experiments with a pilot implementation of the general approach, we show that the approach does indeed work well for name/referent pairs with limited textual coverage in the training data.
Tasks Entity Disambiguation, Entity Linking, Information Retrieval
Published 2016-12-01
URL https://www.aclweb.org/anthology/C16-1140/
PDF https://www.aclweb.org/anthology/C16-1140
PWC https://paperswithcode.com/paper/named-entity-disambiguation-for-little-known
Repo
Framework

Data Selection for IT Texts using Paragraph Vector

Title Data Selection for IT Texts using Paragraph Vector
Authors Mirela-Stefania Duma, Wolfgang Menzel
Abstract
Tasks Domain Adaptation, Machine Translation
Published 2016-08-01
URL https://www.aclweb.org/anthology/W16-2331/
PDF https://www.aclweb.org/anthology/W16-2331
PWC https://paperswithcode.com/paper/data-selection-for-it-texts-using-paragraph
Repo
Framework