Paper Group NANR 68
Improving word alignment for low resource languages using English monolingual SRL. Finding Rising and Falling Words. Coursebook Texts as a Helping Hand for Classifying Linguistic Complexity in Language Learners’ Writings. Bilingual Lexicon Extraction at the Morpheme Level Using Distributional Analysis. Measuring Cognitive Translation Effort with Ac …
Improving word alignment for low resource languages using English monolingual SRL
Title | Improving word alignment for low resource languages using English monolingual SRL |
Authors | Meriem Beloucif, Markus Saers, Dekai Wu |
Abstract | We introduce a new statistical machine translation approach specifically geared to learning translation from low resource languages, one that exploits monolingual English semantic parsing to bias inversion transduction grammar (ITG) induction. We show that in contrast to conventional statistical machine translation (SMT) training methods, which rely heavily on phrase memorization, our approach focuses on learning bilingual correlations that help translate low resource languages, by using the output language's semantic structure to further narrow down ITG constraints. This approach is motivated by previous research showing that injecting a semantic frame based objective function while training SMT models improves translation quality. We show that including a monolingual semantic objective function during the learning of the translation model leads to a semantically driven alignment which is more efficient than simply tuning log-linear mixture weights against a semantic frame based evaluation metric in the final stage of SMT training. We test our approach with three different language pairs and demonstrate that our model biases the learning towards more semantically correct alignments. Both GIZA++ and ITG based techniques fail to capture meaningful bilingual constituents, which is required when trying to learn translation models for low resource languages. In contrast, our proposed model not only improves translation by injecting a monolingual objective function to learn bilingual correlations during early training of the translation model, but also helps to learn more meaningful correlations from a relatively small data set, leading to better alignment than either conventional ITG or traditional GIZA++ based approaches. |
Tasks | Machine Translation, Semantic Parsing, Word Alignment |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-4507/ |
PWC | https://paperswithcode.com/paper/improving-word-alignment-for-low-resource |
Repo | |
Framework | |
Finding Rising and Falling Words
Title | Finding Rising and Falling Words |
Authors | Erik Tjong Kim Sang |
Abstract | We examine two different methods for finding rising words (among which neologisms) and falling words (among which archaisms) in decades of magazine texts (millions of words) and in years of tweets (billions of words): one based on correlation coefficients of relative frequencies and time, and one based on comparing initial and final word frequencies of time intervals. We find that smoothing frequency scores improves the precision scores of both methods and that the correlation coefficients perform better on magazine text but worse on tweets. Since the two ranking methods find different words, they can be used side by side to study the behavior of words over time. |
Tasks | |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-4002/ |
PWC | https://paperswithcode.com/paper/finding-rising-and-falling-words |
Repo | |
Framework | |
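The first of the paper's two ranking methods can be illustrated with a minimal sketch: compute the Pearson correlation between each word's relative frequency and time, then rank; strongly positive correlations indicate rising words, strongly negative ones falling words. The word series below are invented toy data, not the paper's magazine or tweet corpora.

```python
from math import sqrt

def pearson(xs, ys):
    # Plain Pearson correlation coefficient over two equal-length series.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

# Toy per-decade relative frequencies (hypothetical data).
years = [1990, 2000, 2010, 2020]
series = {
    "selfie":   [0.0, 0.1, 0.8, 2.4],  # rising (neologism-like)
    "telegram": [2.0, 1.4, 0.6, 0.2],  # falling (archaism-like)
    "house":    [5.0, 5.1, 4.9, 5.0],  # stable
}

# Rank by correlation of relative frequency with time:
# strongly positive -> rising, strongly negative -> falling.
ranked = sorted(series, key=lambda w: pearson(years, series[w]), reverse=True)
print(ranked)  # ['selfie', 'house', 'telegram']
```

The paper's second method would instead rank by the difference between final and initial interval frequencies; as noted in the abstract, the two rankings surface different words.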
Coursebook Texts as a Helping Hand for Classifying Linguistic Complexity in Language Learners’ Writings
Title | Coursebook Texts as a Helping Hand for Classifying Linguistic Complexity in Language Learners’ Writings |
Authors | Ildikó Pilán, David Alfter, Elena Volodina |
Abstract | We bring together knowledge from two different types of language learning data, texts learners read and texts they write, to improve linguistic complexity classification in the latter. Linguistic complexity in the foreign and second language learning context can be expressed in terms of proficiency levels. We show that incorporating features capturing lexical complexity information from reading passages can significantly boost the machine learning based classification of learner-written texts into proficiency levels. With an F1 score of 0.8, our system rivals state-of-the-art results reported for other languages for this task. Finally, we present a freely available web-based tool for proficiency level classification and lexical complexity visualization for both learner writings and reading texts. |
Tasks | Domain Adaptation |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-4114/ |
PWC | https://paperswithcode.com/paper/coursebook-texts-as-a-helping-hand-for |
Repo | |
Framework | |
Bilingual Lexicon Extraction at the Morpheme Level Using Distributional Analysis
Title | Bilingual Lexicon Extraction at the Morpheme Level Using Distributional Analysis |
Authors | Amir Hazem, Béatrice Daille |
Abstract | Bilingual lexicon extraction from comparable corpora is usually based on distributional methods when dealing with single word terms (SWT). These methods often treat SWT as single tokens without considering their compositional property. However, many SWT are compositional (composed of roots and affixes), and this information, if taken into account, can be very useful for matching translational pairs, especially for infrequent terms where distributional methods often fail. For instance, the English compound xenograft, which is composed of the root xeno and the lexeme graft, can be translated into French compositionally by aligning each of its elements (xeno with xéno and graft with greffe), resulting in the translation xénogreffe. In this paper, we experiment with several distributional models at the morpheme level, which we apply to perform compositional translation on a subset of French and English compounds. We show promising results using distributional analysis at the root and affix levels. We also show that the adapted approach significantly improves bilingual lexicon extraction from comparable corpora compared to the approach at the word level. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1496/ |
PWC | https://paperswithcode.com/paper/bilingual-lexicon-extraction-at-the-morpheme |
Repo | |
Framework | |
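The xenograft → xénogreffe example from the abstract can be sketched as a simple split-and-align procedure. The seed lexica below are tiny hypothetical dictionaries standing in for the paper's distributional models at the root and affix levels, which are not reproduced here.

```python
# Hypothetical seed lexica (illustrative entries only).
root_lexicon = {"xeno": "xéno", "neo": "néo"}         # root translations
word_lexicon = {"graft": "greffe", "natal": "natal"}  # lexeme translations

def translate_compound(compound):
    # Try every split point: the prefix must be a known root and the
    # remainder a known lexeme; concatenate their translations.
    for i in range(1, len(compound)):
        root, lexeme = compound[:i], compound[i:]
        if root in root_lexicon and lexeme in word_lexicon:
            return root_lexicon[root] + word_lexicon[lexeme]
    return None  # non-compositional or out-of-lexicon compound

print(translate_compound("xenograft"))  # xénogreffe
```

In the paper the element-level translations are found distributionally rather than looked up in a fixed dictionary, which is what lets the approach cover infrequent compounds.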
Measuring Cognitive Translation Effort with Activity Units
Title | Measuring Cognitive Translation Effort with Activity Units |
Authors | Moritz Jonas Schaeffer, Michael Carl, Isabel Lacruz, Akiko Aizawa |
Abstract | |
Tasks | Machine Translation |
Published | 2016-01-01 |
URL | https://www.aclweb.org/anthology/W16-3419/ |
PWC | https://paperswithcode.com/paper/measuring-cognitive-translation-effort-with |
Repo | |
Framework | |
Kathaa: A Visual Programming Framework for NLP Applications
Title | Kathaa: A Visual Programming Framework for NLP Applications |
Authors | Sharada Prasanna Mohanty, Nehal J Wani, Manish Srivastava, Dipti Misra Sharma |
Abstract | |
Tasks | |
Published | 2016-06-01 |
URL | https://www.aclweb.org/anthology/N16-3019/ |
PWC | https://paperswithcode.com/paper/kathaa-a-visual-programming-framework-for-nlp |
Repo | |
Framework | |
Homing in on Twitter Users: Evaluating an Enhanced Geoparser for User Profile Locations
Title | Homing in on Twitter Users: Evaluating an Enhanced Geoparser for User Profile Locations |
Authors | Beatrice Alex, Clare Llewellyn, Claire Grover, Jon Oberlander, Richard Tobin |
Abstract | Twitter-related studies often need to geo-locate Tweets or Twitter users, identifying their real-world geographic locations. As tweet-level geotagging remains rare, most prior work exploited tweet content, timezone and network information to inform geolocation, or else relied on off-the-shelf tools to geolocate users from location information in their user profiles. However, such user location metadata is not consistently structured, causing such tools to fail regularly, especially if a string contains multiple locations, or if locations are very fine-grained. We argue that user profile location (UPL) and tweet location need to be treated as distinct types of information from which differing inferences can be drawn. Here, we apply geoparsing to UPLs, and demonstrate how task performance can be improved by adapting our Edinburgh Geoparser, which was originally developed for processing English text. We present a detailed evaluation method and results, including inter-coder agreement. We demonstrate that the optimised geoparser can effectively extract and geo-reference multiple locations at different levels of granularity with an F1-score of around 0.90. We also illustrate how geoparsed UPLs can be exploited for international information trade studies and country-level sentiment analysis. |
Tasks | Sentiment Analysis |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1622/ |
PWC | https://paperswithcode.com/paper/homing-in-on-twitter-users-evaluating-an |
Repo | |
Framework | |
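The reported F1-score of around 0.90 is the harmonic mean of precision and recall over the extracted and geo-referenced locations. A minimal sketch of that computation, with illustrative counts only (not the paper's actual evaluation figures):

```python
def f1(tp, fp, fn):
    # Precision: fraction of extracted locations that are correct.
    precision = tp / (tp + fp)
    # Recall: fraction of gold locations that were extracted.
    recall = tp / (tp + fn)
    # F1: harmonic mean of the two.
    return 2 * precision * recall / (precision + recall)

# Illustrative counts: 90 correct extractions, 10 spurious, 10 missed.
print(round(f1(tp=90, fp=10, fn=10), 2))  # 0.9
```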
The Challenges of Multi-dimensional Sentiment Analysis Across Languages
Title | The Challenges of Multi-dimensional Sentiment Analysis Across Languages |
Authors | Emily Öhman, Timo Honkela, Jörg Tiedemann |
Abstract | This paper outlines a pilot study on multi-dimensional and multilingual sentiment analysis of social media content. We use parallel corpora of movie subtitles as a proxy for colloquial language in social media channels and a multilingual emotion lexicon for fine-grained sentiment analyses. Parallel data sets make it possible to study the preservation of sentiments and emotions in translation and our assessment reveals that the lexical approach shows great inter-language agreement. However, our manual evaluation also suggests that the use of purely lexical methods is limited and further studies are necessary to pinpoint the cross-lingual differences and to develop better sentiment classifiers. |
Tasks | Sentiment Analysis |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-4315/ |
PWC | https://paperswithcode.com/paper/the-challenges-of-multi-dimensional-sentiment |
Repo | |
Framework | |
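The lexical approach the abstract describes amounts to counting emotion-lexicon hits per subtitle line and comparing the resulting profiles across the parallel translations. A minimal sketch, using a tiny invented emotion lexicon in place of the multilingual resource the paper actually uses:

```python
from collections import Counter

# Toy emotion lexicon (illustrative entries, not the paper's resource).
emotion_lexicon = {
    "love": "joy", "happy": "joy",
    "afraid": "fear", "terrible": "fear",
    "angry": "anger",
}

def emotion_profile(subtitle_line):
    # Count emotion-category hits over the tokens of one subtitle line.
    tokens = subtitle_line.lower().split()
    return Counter(emotion_lexicon[t] for t in tokens if t in emotion_lexicon)

print(emotion_profile("I am so happy and not afraid"))
```

Comparing `emotion_profile` outputs for a subtitle line and its translation is the kind of cross-lingual agreement check the pilot study performs; the manual evaluation in the paper shows where this purely lexical signal breaks down.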
Comprehensive Part-Of-Speech Tag Set and SVM based POS Tagger for Sinhala
Title | Comprehensive Part-Of-Speech Tag Set and SVM based POS Tagger for Sinhala |
Authors | Sandareka Fernando, Surangika Ranathunga, Sanath Jayasena, Gihan Dias |
Abstract | This paper presents a new comprehensive multi-level Part-Of-Speech tag set and a Support Vector Machine based Part-Of-Speech tagger for the Sinhala language. The currently available tag set for Sinhala has two limitations: the unavailability of tags to represent some word classes and the lack of tags to capture inflection based grammatical variations of words. The new tag set presented in this paper overcomes both of these limitations. The accuracy of available Sinhala Part-Of-Speech taggers, which are based on Hidden Markov Models, still falls far behind the state of the art. Our Support Vector Machine based tagger achieved an overall accuracy of 84.68%, with 59.86% accuracy for unknown words and 87.12% for known words, when the test set contains 10% unknown words. |
Tasks | |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-3718/ |
PWC | https://paperswithcode.com/paper/comprehensive-part-of-speech-tag-set-and-svm |
Repo | |
Framework | |
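The core ingredient of an SVM tagger like this is the per-token feature representation fed to the classifier. A minimal sketch of typical word-window and suffix features, useful for the inflection-based variation the new tag set captures; the feature names are hypothetical and not the paper's actual feature set:

```python
def features(words, i):
    # Feature dictionary for the token at position i of a sentence,
    # in the shape an SVM tagger would one-hot encode and classify.
    w = words[i]
    return {
        "word": w.lower(),
        "suffix3": w[-3:],  # inflectional suffix cue
        "prev": words[i - 1].lower() if i > 0 else "<s>",
        "next": words[i + 1].lower() if i < len(words) - 1 else "</s>",
        "is_digit": w.isdigit(),
    }

print(features(["The", "dogs", "ran"], 1))
```

At training time each token's feature dictionary is paired with its gold tag; unknown-word accuracy (59.86% here) depends heavily on the suffix-style features, since the `word` feature is unseen.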
Task Completion Platform: A self-serve multi-domain goal oriented dialogue platform
Title | Task Completion Platform: A self-serve multi-domain goal oriented dialogue platform |
Authors | Paul Crook, Alex Marin, Vipul Agarwal, Khushboo Aggarwal, Tasos Anastasakos, Ravi Bikkula, Daniel Boies, Asli Celikyilmaz, Senthilkumar Chandramohan, Zhaleh Feizollahi, Roman Holenstein, Minwoo Jeong, Omar Khan, Young-Bum Kim, Elizabeth Krawczyk, Xiaohu Liu, Danko Panic, Vasiliy Radostev, Nikhil Ramesh, Jean-Philippe Robichaud, Alexandre Rochette, Logan Stromberg, Ruhi Sarikaya |
Abstract | |
Tasks | Dialogue Management |
Published | 2016-06-01 |
URL | https://www.aclweb.org/anthology/N16-3010/ |
PWC | https://paperswithcode.com/paper/task-completion-platform-a-self-serve-multi |
Repo | |
Framework | |
Modelling Multi-issue Bargaining Dialogues: Data Collection, Annotation Design and Corpus
Title | Modelling Multi-issue Bargaining Dialogues: Data Collection, Annotation Design and Corpus |
Authors | Volha Petukhova, Christopher Stevens, Harmen de Weerd, Niels Taatgen, Fokie Cnossen, Andrei Malchanau |
Abstract | The paper describes the experimental dialogue data collection activities, as well as the semantically annotated corpus creation, undertaken within the EU-funded METALOGUE project (www.metalogue.eu). The project aims to develop a dialogue system with flexible dialogue management to enable the system's adaptive, reactive, interactive and proactive dialogue behavior in setting goals, choosing appropriate strategies, and monitoring numerous parallel interpretation and management processes. To achieve these goals, a negotiation (or, more precisely, multi-issue bargaining) scenario has been considered as the specific setting and application domain. The dialogue corpus forms the basis for the design of task and interaction models of participants' negotiation behavior, and subsequently for the development of a dialogue system capable of replacing one of the negotiators. The METALOGUE corpus will be released to the community for research purposes. |
Tasks | Dialogue Management |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1500/ |
PWC | https://paperswithcode.com/paper/modelling-multi-issue-bargaining-dialogues |
Repo | |
Framework | |
Deeper Machine Translation and Evaluation for German
Title | Deeper Machine Translation and Evaluation for German |
Authors | Eleftherios Avramidis, Vivien Macketanz, Aljoscha Burchardt, Jindrich Helcl, Hans Uszkoreit |
Abstract | |
Tasks | Machine Translation |
Published | 2016-10-01 |
URL | https://www.aclweb.org/anthology/W16-6404/ |
PWC | https://paperswithcode.com/paper/deeper-machine-translation-and-evaluation-for |
Repo | |
Framework | |
PreMOn: a Lemon Extension for Exposing Predicate Models as Linked Data
Title | PreMOn: a Lemon Extension for Exposing Predicate Models as Linked Data |
Authors | Francesco Corcoglioniti, Marco Rospocher, Alessio Palmero Aprosio, Sara Tonelli |
Abstract | We introduce PreMOn (predicate model for ontologies), a linguistic resource for exposing predicate models (PropBank, NomBank, VerbNet, and FrameNet) and mappings between them (e.g., SemLink) as Linked Open Data. It consists of two components: (i) the PreMOn Ontology, an extension of the lemon model by the W3C Ontology-Lexica Community Group that makes it possible to homogeneously represent data from the various predicate models; and (ii) the PreMOn Dataset, a collection of RDF datasets integrating various versions of the aforementioned predicate models and mapping resources. PreMOn is freely available and accessible online in different ways, including through a dedicated SPARQL endpoint. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1141/ |
PWC | https://paperswithcode.com/paper/premon-a-lemon-extension-for-exposing |
Repo | |
Framework | |
Named Entity Disambiguation for little known referents: a topic-based approach
Title | Named Entity Disambiguation for little known referents: a topic-based approach |
Authors | Andrea Glaser, Jonas Kuhn |
Abstract | We propose an approach to Named Entity Disambiguation that avoids a problem of standard work on the task (likewise affecting fully supervised, weakly supervised, or distantly supervised machine learning techniques): the treatment of name mentions referring to people with no (or very little) coverage in the textual training data is systematically incorrect. We propose to indirectly take into account the property information for the "non-prominent" name bearers, such as nationality and profession (e.g., for a Canadian law professor named Michael Jackson, with no Wikipedia article, it is very hard to obtain reliable textual training data). The target property information for the entities is directly available from name authority files, or inferable, e.g., from listings of sportspeople etc. Our proposed approach employs topic modeling to exploit textual training data based on entities sharing the relevant properties. In experiments with a pilot implementation of the general approach, we show that the approach does indeed work well for name/referent pairs with limited textual coverage in the training data. |
Tasks | Entity Disambiguation, Entity Linking, Information Retrieval |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/C16-1140/ |
PWC | https://paperswithcode.com/paper/named-entity-disambiguation-for-little-known |
Repo | |
Framework | |
Data Selection for IT Texts using Paragraph Vector
Title | Data Selection for IT Texts using Paragraph Vector |
Authors | Mirela-Stefania Duma, Wolfgang Menzel |
Abstract | |
Tasks | Domain Adaptation, Machine Translation |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/W16-2331/ |
PWC | https://paperswithcode.com/paper/data-selection-for-it-texts-using-paragraph |
Repo | |
Framework | |