January 25, 2020

2167 words 11 mins read

Paper Group NANR 6


SWAP at SemEval-2019 Task 3: Emotion detection in conversations through Tweets, CNN and LSTM deep neural networks. Supervised and Nonlinear Alignment of Two Embedding Spaces for Dictionary Induction in Low Resourced Languages. Harmonizing Different Lemmatization Strategies for Building a Knowledge Base of Linguistic Resources for Latin. Multi-Team: …

SWAP at SemEval-2019 Task 3: Emotion detection in conversations through Tweets, CNN and LSTM deep neural networks

Title SWAP at SemEval-2019 Task 3: Emotion detection in conversations through Tweets, CNN and LSTM deep neural networks
Authors Marco Polignano, Marco de Gemmis, Giovanni Semeraro
Abstract Emotion detection from user-generated content is growing in importance in the area of natural language processing. The approach we proposed for the EmoContext task is based on the combination of a CNN and an LSTM using a concatenation of word embeddings. A stack of convolutional neural networks (CNN) is used for capturing the hierarchical hidden relations among embedding features. Meanwhile, a long short-term memory network (LSTM) is used for capturing information shared among words of the sentence. Each conversation has been formalized as a list of word embeddings; in particular, pre-trained GloVe and Google word embeddings were evaluated during the experimental runs. Surface lexical features were also considered, but they proved not to be useful for classification in this specific task. The final system configuration achieved a micro F1 score of 0.7089. The Python code of the system is fully available at https://github.com/marcopoli/EmoContext2019
Tasks Word Embeddings
Published 2019-06-01
URL https://www.aclweb.org/anthology/S19-2056/
PDF https://www.aclweb.org/anthology/S19-2056
PWC https://paperswithcode.com/paper/swap-at-semeval-2019-task-3-emotion-detection
Repo
Framework
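
A minimal sketch of the architecture the abstract above describes: a stack of 1-D convolutions over pre-trained word embeddings followed by an LSTM and a softmax classifier. All hyperparameters (embedding size, filter counts, hidden size, four emotion classes) are illustrative assumptions, not values from the paper.

```python
# Sketch of a CNN + LSTM emotion classifier over pre-trained word embeddings.
import torch
import torch.nn as nn

class CnnLstmEmotionClassifier(nn.Module):
    def __init__(self, emb_dim=300, n_filters=128, lstm_hidden=128, n_classes=4):
        super().__init__()
        # Stack of 1-D convolutions over the embedding sequence.
        self.convs = nn.Sequential(
            nn.Conv1d(emb_dim, n_filters, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(n_filters, n_filters, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        # LSTM over the convolved sequence captures word-to-word context.
        self.lstm = nn.LSTM(n_filters, lstm_hidden, batch_first=True)
        self.out = nn.Linear(lstm_hidden, n_classes)

    def forward(self, embeddings):                    # (batch, seq_len, emb_dim)
        x = self.convs(embeddings.transpose(1, 2))    # Conv1d expects (batch, emb_dim, seq_len)
        _, (h_n, _) = self.lstm(x.transpose(1, 2))    # back to (batch, seq_len, n_filters)
        return self.out(h_n[-1])                      # logits, (batch, n_classes)

# Example: a batch of 2 conversations, each 20 tokens of pre-trained embeddings.
logits = CnnLstmEmotionClassifier()(torch.randn(2, 20, 300))
```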

Supervised and Nonlinear Alignment of Two Embedding Spaces for Dictionary Induction in Low Resourced Languages

Title Supervised and Nonlinear Alignment of Two Embedding Spaces for Dictionary Induction in Low Resourced Languages
Authors Masud Moshtaghi
Abstract Enabling cross-lingual NLP tasks by leveraging multilingual word embeddings has recently attracted much attention. An important motivation is to support lower-resourced languages; however, most efforts focus on demonstrating the effectiveness of the techniques using embeddings derived from languages similar to English with large parallel content. In this study, we first describe the general requirements for the success of these techniques and then present a noise-tolerant piecewise linear technique to learn a non-linear mapping between two monolingual word embedding vector spaces. We evaluate our approach on inferring bilingual dictionaries. We show that our technique outperforms the state of the art in lower-resourced settings, with an average improvement of 3.7% in precision@10 across 14 mostly low-resourced languages.
Tasks
Published 2019-11-01
URL https://www.aclweb.org/anthology/D19-1076/
PDF https://www.aclweb.org/anthology/D19-1076
PWC https://paperswithcode.com/paper/supervised-and-nonlinear-alignment-of-two
Repo
Framework
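
One possible reading of the "piecewise linear" mapping idea in the abstract above: cluster the source embedding space and fit a separate least-squares linear map per cluster, then translate a word with the map of its cluster. This is an illustrative sketch under that assumption, not the paper's exact algorithm; the clustering method and number of pieces are invented here.

```python
# Piecewise linear mapping between two embedding spaces (illustrative sketch).
import numpy as np
from sklearn.cluster import KMeans

def fit_piecewise_maps(X_src, Y_tgt, n_pieces=8):
    """X_src, Y_tgt: aligned seed-dictionary embeddings, shapes (n, d_src) and (n, d_tgt)."""
    km = KMeans(n_clusters=n_pieces, n_init=10).fit(X_src)
    maps = []
    for c in range(n_pieces):
        idx = km.labels_ == c
        # Least-squares linear map W_c minimizing ||X_c W_c - Y_c||^2 for this region.
        W_c, *_ = np.linalg.lstsq(X_src[idx], Y_tgt[idx], rcond=None)
        maps.append(W_c)
    return km, maps

def translate(x, km, maps):
    """Project a source-space vector into the target space using its region's map."""
    c = km.predict(x[None, :])[0]
    return x @ maps[c]
```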

Harmonizing Different Lemmatization Strategies for Building a Knowledge Base of Linguistic Resources for Latin

Title Harmonizing Different Lemmatization Strategies for Building a Knowledge Base of Linguistic Resources for Latin
Authors Francesco Mambrini, Marco Passarotti
Abstract The interoperability between lemmatized corpora of Latin and other resources that use the lemma as indexing key is hampered by the multiple lemmatization strategies that different projects adopt. In this paper we discuss how we tackle the challenges raised by harmonizing different lemmatization criteria in the context of a project that aims to connect linguistic resources for Latin using the Linked Data paradigm. The paper introduces the architecture supporting an open-ended, lemma-based Knowledge Base, built to make textual and lexical resources for Latin interoperable. Particularly, the paper describes the inclusion into the Knowledge Base of its lexical basis, of a word formation lexicon and of a lemmatized and syntactically annotated corpus.
Tasks Lemmatization
Published 2019-08-01
URL https://www.aclweb.org/anthology/W19-4009/
PDF https://www.aclweb.org/anthology/W19-4009
PWC https://paperswithcode.com/paper/harmonizing-different-lemmatization
Repo
Framework

Multi-Team: A Multi-attention, Multi-decoder Approach to Morphological Analysis.

Title Multi-Team: A Multi-attention, Multi-decoder Approach to Morphological Analysis.
Authors Ahmet Üstün, Rob van der Goot, Gosse Bouma, Gertjan van Noord
Abstract This paper describes our submission to SIGMORPHON 2019 Task 2: Morphological analysis and lemmatization in context. Our model is a multi-task sequence to sequence neural network, which jointly learns morphological tagging and lemmatization. On the encoding side, we exploit character-level as well as contextual information. We introduce a multi-attention decoder to selectively focus on different parts of character and word sequences. To further improve the model, we train on multiple datasets simultaneously and use external embeddings for initialization. Our final model reaches an average morphological tagging F1 score of 94.54 and a lemma accuracy of 93.91 on the test data, ranking respectively 3rd and 6th out of 13 teams in the SIGMORPHON 2019 shared task.
Tasks Lemmatization, Morphological Analysis, Morphological Tagging
Published 2019-08-01
URL https://www.aclweb.org/anthology/W19-4206/
PDF https://www.aclweb.org/anthology/W19-4206
PWC https://paperswithcode.com/paper/multi-team-a-multi-attention-multi-decoder
Repo
Framework
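
A minimal sketch of the "multi-attention" decoder step described above: one attention over character-level encoder states and one over word-level (contextual) states, with the two context vectors combined. The dimensions and the use of single-head attention are assumptions for illustration; the actual SIGMORPHON system is more elaborate.

```python
# One decoder step attending separately to character-level and word-level memories.
import torch
import torch.nn as nn

class MultiAttentionDecoderStep(nn.Module):
    def __init__(self, hidden=256):
        super().__init__()
        self.char_attn = nn.MultiheadAttention(hidden, num_heads=1, batch_first=True)
        self.word_attn = nn.MultiheadAttention(hidden, num_heads=1, batch_first=True)
        self.combine = nn.Linear(2 * hidden, hidden)

    def forward(self, dec_state, char_mem, word_mem):
        # dec_state: (batch, 1, hidden); char_mem / word_mem: (batch, len, hidden)
        char_ctx, _ = self.char_attn(dec_state, char_mem, char_mem)
        word_ctx, _ = self.word_attn(dec_state, word_mem, word_mem)
        return torch.tanh(self.combine(torch.cat([char_ctx, word_ctx], dim=-1)))

step = MultiAttentionDecoderStep()
out = step(torch.randn(2, 1, 256), torch.randn(2, 12, 256), torch.randn(2, 30, 256))
```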

Transfer Learning in Natural Language Processing

Title Transfer Learning in Natural Language Processing
Authors Sebastian Ruder, Matthew E. Peters, Swabha Swayamdipta, Thomas Wolf
Abstract The classic supervised machine learning paradigm is based on learning in isolation, a single predictive model for a task using a single dataset. This approach requires a large number of training examples and performs best for well-defined and narrow tasks. Transfer learning refers to a set of methods that extend this approach by leveraging data from additional domains or tasks to train a model with better generalization properties. Over the last two years, the field of Natural Language Processing (NLP) has witnessed the emergence of several transfer learning methods and architectures which significantly improved upon the state-of-the-art on a wide range of NLP tasks. These improvements together with the wide availability and ease of integration of these methods are reminiscent of the factors that led to the success of pretrained word embeddings and ImageNet pretraining in computer vision, and indicate that these methods will likely become a common tool in the NLP landscape as well as an important research direction. We will present an overview of modern transfer learning methods in NLP, how models are pre-trained, what information the representations they learn capture, and review examples and case studies on how these models can be integrated and adapted in downstream NLP tasks.
Tasks Transfer Learning, Word Embeddings
Published 2019-06-01
URL https://www.aclweb.org/anthology/N19-5004/
PDF https://www.aclweb.org/anthology/N19-5004
PWC https://paperswithcode.com/paper/transfer-learning-in-natural-language
Repo
Framework
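
As a concrete illustration of the adaptation pattern this tutorial covers, the snippet below fine-tunes a pre-trained encoder on a downstream classification task. The model name, label count, and toy batch are placeholders chosen for the example, not material from the tutorial itself.

```python
# Minimal transfer-learning example: fine-tune a pre-trained encoder on a downstream task.
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

batch = tokenizer(["great movie", "terrible plot"], padding=True, return_tensors="pt")
labels = torch.tensor([1, 0])
loss = model(**batch, labels=labels).loss   # fine-tuning loss; back-propagate as usual
loss.backward()
```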

Towards Deep Universal Dependencies

Title Towards Deep Universal Dependencies
Authors Kira Droganova, Daniel Zeman
Abstract
Tasks
Published 2019-08-01
URL https://www.aclweb.org/anthology/W19-7717/
PDF https://www.aclweb.org/anthology/W19-7717
PWC https://paperswithcode.com/paper/towards-deep-universal-dependencies
Repo
Framework

NLP@UIOWA at SemEval-2019 Task 6: Classifying the Crass using Multi-windowed CNNs

Title NLP@UIOWA at SemEval-2019 Task 6: Classifying the Crass using Multi-windowed CNNs
Authors Jonathan Rusert, Padmini Srinivasan
Abstract This paper proposes a system for OffensEval (SemEval 2019 Task 6), which calls for a system to classify offensive language into several categories. Our system is a text-based CNN, which learns only from the provided training data. Our system achieves 80-90% accuracy for the binary classification problems (offensive vs. not offensive and targeted vs. untargeted) and 63% accuracy for trinary classification (group vs. individual vs. other).
Tasks
Published 2019-06-01
URL https://www.aclweb.org/anthology/S19-2125/
PDF https://www.aclweb.org/anthology/S19-2125
PWC https://paperswithcode.com/paper/nlpuiowa-at-semeval-2019-task-6-classifying
Repo
Framework
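
A sketch of a multi-windowed text CNN in the spirit of the system above: parallel 1-D convolutions with different window sizes over the embedded text, max-pooled and concatenated before the classifier. Window sizes, dimensions, and the vocabulary size are assumptions, not the paper's settings.

```python
# Multi-windowed text CNN: parallel convolutions with different kernel sizes.
import torch
import torch.nn as nn

class MultiWindowCNN(nn.Module):
    def __init__(self, vocab=30000, emb_dim=300, n_filters=100, windows=(2, 3, 4), n_classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb_dim)
        self.convs = nn.ModuleList(
            nn.Conv1d(emb_dim, n_filters, kernel_size=w) for w in windows)
        self.out = nn.Linear(n_filters * len(windows), n_classes)

    def forward(self, token_ids):                     # (batch, seq_len)
        x = self.emb(token_ids).transpose(1, 2)       # (batch, emb_dim, seq_len)
        pooled = [conv(x).relu().max(dim=2).values for conv in self.convs]
        return self.out(torch.cat(pooled, dim=1))     # logits

logits = MultiWindowCNN()(torch.randint(0, 30000, (4, 40)))
```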

Transfer Learning Based Free-Form Speech Command Classification for Low-Resource Languages

Title Transfer Learning Based Free-Form Speech Command Classification for Low-Resource Languages
Authors Yohan Karunanayake, Uthayasanker Thayasivam, Surangika Ranathunga
Abstract Current state-of-the-art speech-based user interfaces use data-intensive methodologies to recognize free-form speech commands. However, this is not viable for low-resource languages, which lack speech data. This restricts the usability of such interfaces to a limited number of languages. In this paper, we propose a methodology to develop a robust domain-specific speech command classification system for low-resource languages using speech data of a high-resource language. In this transfer learning-based approach, we used a Convolutional Neural Network (CNN) to identify a fixed set of intents using an ASR-based character probability map. We were able to achieve significant results for Sinhala and Tamil datasets using an English-based ASR, which attests to the robustness of the proposed approach.
Tasks Transfer Learning
Published 2019-07-01
URL https://www.aclweb.org/anthology/P19-2040/
PDF https://www.aclweb.org/anthology/P19-2040
PWC https://paperswithcode.com/paper/transfer-learning-based-free-form-speech
Repo
Framework
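
A minimal sketch of the idea described above: an English ASR emits a character-probability map (time steps by characters), and a small CNN classifies the intent directly from that map. The character-set size, intent count, and network shape are illustrative assumptions.

```python
# Intent classification over an ASR character-probability map (illustrative).
import torch
import torch.nn as nn

N_CHARS, N_INTENTS = 29, 6   # e.g. a-z, space, apostrophe, blank; 6 example intents

intent_cnn = nn.Sequential(
    nn.Conv1d(N_CHARS, 64, kernel_size=5, padding=2), nn.ReLU(),
    nn.Conv1d(64, 64, kernel_size=5, padding=2), nn.ReLU(),
    nn.AdaptiveMaxPool1d(1), nn.Flatten(),
    nn.Linear(64, N_INTENTS),
)

# Stand-in for the ASR output: per-frame character probabilities over 200 frames.
char_prob_map = torch.softmax(torch.randn(8, N_CHARS, 200), dim=1)
logits = intent_cnn(char_prob_map)   # (batch, N_INTENTS)
```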

CUNI–Malta system at SIGMORPHON 2019 Shared Task on Morphological Analysis and Lemmatization in context: Operation-based word formation

Title CUNI–Malta system at SIGMORPHON 2019 Shared Task on Morphological Analysis and Lemmatization in context: Operation-based word formation
Authors Ronald Cardenas, Claudia Borg, Daniel Zeman
Abstract This paper presents the submission by the Charles University-University of Malta team to the SIGMORPHON 2019 Shared Task on Morphological Analysis and Lemmatization in context. We present a lemmatization model based on previous work on neural transducers (Makarov and Clematide, 2018b; Aharoni and Goldberg, 2016). The key difference is that our model transforms the whole word form in every step, instead of consuming it character by character. We propose a merging strategy inspired by Byte-Pair Encoding that reduces the space of valid operations by merging frequent adjacent operations. The resulting operations encode not only the actions to be performed but also the relative position in the word token and how characters need to be transformed. Our morphological tagger is a vanilla biLSTM tagger that operates over operation representations, encoding operations and words in a hierarchical manner. Even though relative performance according to metrics is below the baseline, experiments show that our models capture important associations between interpretable operation labels and fine-grained morpho-syntactic labels.
Tasks Lemmatization, Morphological Analysis
Published 2019-08-01
URL https://www.aclweb.org/anthology/W19-4213/
PDF https://www.aclweb.org/anthology/W19-4213
PWC https://paperswithcode.com/paper/cuni-malta-system-at-sigmorphon-2019-shared
Repo
Framework
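
An illustrative sketch of the BPE-inspired merging described above: repeatedly merge the most frequent pair of adjacent edit operations into a single composite operation, shrinking the operation inventory. The operation names here are invented for the example and do not come from the paper.

```python
# BPE-style merging of frequent adjacent edit operations (illustrative).
from collections import Counter

def merge_operations(op_sequences, n_merges=2):
    seqs = [list(s) for s in op_sequences]
    for _ in range(n_merges):
        pairs = Counter((a, b) for seq in seqs for a, b in zip(seq, seq[1:]))
        if not pairs:
            break
        (a, b), _ = pairs.most_common(1)[0]        # most frequent adjacent pair
        merged = f"{a}+{b}"
        for seq in seqs:
            i, out = 0, []
            while i < len(seq):
                if i + 1 < len(seq) and (seq[i], seq[i + 1]) == (a, b):
                    out.append(merged); i += 2      # replace the pair with the composite op
                else:
                    out.append(seq[i]); i += 1
            seq[:] = out
    return seqs

print(merge_operations([["COPY", "COPY", "DEL", "INS_a"],
                        ["COPY", "COPY", "COPY", "DEL"]]))
```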

Can Greenbergian universals be induced from language networks?

Title Can Greenbergian universals be induced from language networks?
Authors Kartik Sharma, Kaivalya Swami, Aditya Shete, Samar Husain
Abstract
Tasks
Published 2019-08-01
URL https://www.aclweb.org/anthology/W19-7804/
PDF https://www.aclweb.org/anthology/W19-7804
PWC https://paperswithcode.com/paper/can-greenbergian-universals-be-induced-from
Repo
Framework

Learning to request guidance in emergent language

Title Learning to request guidance in emergent language
Authors Benjamin Kolb, Leon Lang, Henning Bartsch, Arwin Gansekoele, Raymond Koopmanschap, Leonardo Romor, David Speck, Mathijs Mul, Elia Bruni
Abstract Previous research into agent communication has shown that a pre-trained guide can speed up the learning process of an imitation learning agent. The guide achieves this by providing the agent with discrete messages in an emerged language about how to solve the task. We extend this one-directional communication with a one-bit communication channel from the learner back to the guide: the learner is able to ask the guide for help, and we limit the guidance by penalizing the learner for these requests. During training, the agent learns to control this gate based on its current observation. We find that the amount of requested guidance decreases over time and that guidance is requested in situations of high uncertainty. We investigate the agent's performance in cases of open and closed gates and discuss potential motives for the observed gating behavior.
Tasks Imitation Learning
Published 2019-11-01
URL https://www.aclweb.org/anthology/D19-6407/
PDF https://www.aclweb.org/anthology/D19-6407
PWC https://paperswithcode.com/paper/learning-to-request-guidance-in-emergent
Repo
Framework
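
A minimal sketch of the one-bit guidance channel described above: the learner emits a binary "ask" decision from its observation, and every request incurs a reward penalty so that guidance is only requested under high uncertainty. The module name, observation size, and penalty value are illustrative, not from the paper; in practice the discrete sample would be handled with a policy-gradient or straight-through estimator.

```python
# One-bit guidance gate with a per-request penalty (illustrative sketch).
import torch
import torch.nn as nn

class GuidanceGate(nn.Module):
    def __init__(self, obs_dim=64, penalty=0.1):
        super().__init__()
        self.gate = nn.Linear(obs_dim, 1)
        self.penalty = penalty

    def forward(self, obs):
        p_ask = torch.sigmoid(self.gate(obs))   # probability of requesting help
        ask = torch.bernoulli(p_ask)            # the single bit sent to the guide
        return ask, -self.penalty * ask         # bit and the reward penalty incurred

gate = GuidanceGate()
ask_bit, reward_penalty = gate(torch.randn(1, 64))
```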

GrapAL: Connecting the Dots in Scientific Literature

Title GrapAL: Connecting the Dots in Scientific Literature
Authors Christine Betts, Joanna Power, Waleed Ammar
Abstract We introduce GrapAL (Graph database of Academic Literature), a versatile tool for exploring and investigating a knowledge base of scientific literature that was semi-automatically constructed using NLP methods. GrapAL fills many informational needs expressed by researchers. At the core of GrapAL is a Neo4j graph database with an intuitive schema and a simple query language. In this paper, we describe the basic elements of GrapAL, how to use it, and several use cases such as finding experts on a given topic for peer reviewing, discovering indirect connections between biomedical entities, and computing citation-based metrics. We open source the demo code to help other researchers develop applications that build on GrapAL.
Tasks
Published 2019-07-01
URL https://www.aclweb.org/anthology/P19-3025/
PDF https://www.aclweb.org/anthology/P19-3025
PWC https://paperswithcode.com/paper/grapal-connecting-the-dots-in-scientific
Repo
Framework
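
Since GrapAL is exposed as a Neo4j graph database with a Cypher interface, a query might look like the sketch below, which ranks authors by paper count on a topic. The connection URI, credentials, and the node/relationship labels (Author, Paper, AUTHORS) are assumptions made for illustration; consult the GrapAL schema before relying on them.

```python
# Illustrative Cypher query against a Neo4j-backed literature graph like GrapAL.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
query = """
MATCH (a:Author)-[:AUTHORS]->(p:Paper)
WHERE toLower(p.title) CONTAINS 'lemmatization'
RETURN a.name AS author, count(p) AS papers
ORDER BY papers DESC LIMIT 10
"""
with driver.session() as session:
    for record in session.run(query):
        print(record["author"], record["papers"])
```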

Machine Translation with parfda, Moses, kenlm, nplm, and PRO

Title Machine Translation with parfda, Moses, kenlm, nplm, and PRO
Authors Ergun Biçici
Abstract We build parfda Moses statistical machine translation (SMT) models for most language pairs in the news translation task. We experiment with a hybrid approach using neural language models integrated into Moses. We obtain the constrained data statistics on the machine translation task, the coverage of the test sets, and the upper bounds on the translation results. We also contribute a new testsuite for the German-English language pair and a new automated key phrase extraction technique for the evaluation of the testsuite translations.
Tasks Machine Translation
Published 2019-08-01
URL https://www.aclweb.org/anthology/W19-5306/
PDF https://www.aclweb.org/anthology/W19-5306
PWC https://paperswithcode.com/paper/machine-translation-with-parfda-moses-kenlm
Repo
Framework
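
Since the system above integrates KenLM language models into Moses, a small example of scoring translation candidates with a KenLM model from Python is shown below. The ARPA model path and the candidate sentences are placeholders.

```python
# Scoring candidate translations with a KenLM language model (placeholder model path).
import kenlm

lm = kenlm.Model("newscrawl.en.arpa")
candidates = ["the parliament adopted the law", "the parliament the law adopted"]
for sent in candidates:
    print(f"{lm.score(sent, bos=True, eos=True):8.2f}  {sent}")
```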

Cross-Lingual Word Embeddings for Morphologically Rich Languages

Title Cross-Lingual Word Embeddings for Morphologically Rich Languages
Authors Ahmet Üstün, Gosse Bouma, Gertjan van Noord
Abstract Cross-lingual word embedding models learn a shared vector space for two or more languages so that words with similar meaning are represented by similar vectors regardless of their language. Although the existing models achieve high performance on pairs of morphologically simple languages, they perform very poorly on morphologically rich languages such as Turkish and Finnish. In this paper, we propose a morpheme-based model in order to increase the performance of cross-lingual word embeddings on morphologically rich languages. Our model includes a simple extension which enables us to exploit morphemes for cross-lingual mapping. We applied our model for the Turkish-Finnish language pair on the bilingual word translation task. Results show that our model outperforms the baseline models by 2% in the nearest neighbour ranking.
Tasks Word Embeddings
Published 2019-09-01
URL https://www.aclweb.org/anthology/R19-1140/
PDF https://www.aclweb.org/anthology/R19-1140
PWC https://paperswithcode.com/paper/cross-lingual-word-embeddings-for
Repo
Framework
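
The standard building block behind cross-lingual embedding models like the one above is an orthogonal map between the two spaces learned from a seed dictionary (the Procrustes solution). The sketch below shows only that common baseline; the paper's morpheme-based extension is not reproduced here.

```python
# Orthogonal Procrustes mapping between two embedding spaces from a seed dictionary.
import numpy as np

def procrustes_map(X_src, Y_tgt):
    """X_src, Y_tgt: (n, d) embeddings of seed translation pairs; returns W with X @ W ~ Y."""
    U, _, Vt = np.linalg.svd(X_src.T @ Y_tgt)
    return U @ Vt

rng = np.random.default_rng(0)
X, Y = rng.normal(size=(1000, 50)), rng.normal(size=(1000, 50))
W = procrustes_map(X, Y)
mapped = X @ W   # source vectors projected into the target space
```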

Toward automatic improvement of language produced by non-native language learners

Title Toward automatic improvement of language produced by non-native language learners
Authors Mathias Creutz, Eetu Sjöblom
Abstract
Tasks
Published 2019-09-01
URL https://www.aclweb.org/anthology/W19-6303/
PDF https://www.aclweb.org/anthology/W19-6303
PWC https://paperswithcode.com/paper/toward-automatic-improvement-of-language
Repo
Framework