Paper Group NANR 6
SWAP at SemEval-2019 Task 3: Emotion detection in conversations through Tweets, CNN and LSTM deep neural networks. Supervised and Nonlinear Alignment of Two Embedding Spaces for Dictionary Induction in Low Resourced Languages. Harmonizing Different Lemmatization Strategies for Building a Knowledge Base of Linguistic Resources for Latin. Multi-Team: A Multi-attention, Multi-decoder Approach to Morphological Analysis. Transfer Learning in Natural Language Processing. Towards Deep Universal Dependencies. NLP@UIOWA at SemEval-2019 Task 6: Classifying the Crass using Multi-windowed CNNs. Transfer Learning Based Free-Form Speech Command Classification for Low-Resource Languages. CUNI–Malta system at SIGMORPHON 2019 Shared Task on Morphological Analysis and Lemmatization in context: Operation-based word formation. Can Greenbergian universals be induced from language networks? Learning to request guidance in emergent language. GrapAL: Connecting the Dots in Scientific Literature. Machine Translation with parfda, Moses, kenlm, nplm, and PRO. Cross-Lingual Word Embeddings for Morphologically Rich Languages. Toward automatic improvement of language produced by non-native language learners.
SWAP at SemEval-2019 Task 3: Emotion detection in conversations through Tweets, CNN and LSTM deep neural networks
Title | SWAP at SemEval-2019 Task 3: Emotion detection in conversations through Tweets, CNN and LSTM deep neural networks |
Authors | Marco Polignano, Marco de Gemmis, Giovanni Semeraro |
Abstract | Emotion detection from user-generated content is growing in importance in the area of natural language processing. The approach we proposed for the EmoContext task is based on the combination of a CNN and an LSTM using a concatenation of word embeddings. A stack of convolutional neural networks (CNN) is used for capturing the hierarchical hidden relations among embedding features, while a long short-term memory network (LSTM) is used for capturing information shared among words of the sentence. Each conversation has been formalized as a list of word embeddings; in particular, pre-trained GloVe and Google word embeddings have been evaluated during the experimental runs. Surface lexical features have also been considered, but they proved not to be useful for classification in this specific task. The final system configuration achieved a micro F1 score of 0.7089. The Python code of the system is fully available at https://github.com/marcopoli/EmoContext2019 |
Tasks | Word Embeddings |
Published | 2019-06-01 |
URL | https://www.aclweb.org/anthology/S19-2056/ |
PWC | https://paperswithcode.com/paper/swap-at-semeval-2019-task-3-emotion-detection |
Repo | |
Framework | |
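The CNN-plus-LSTM combination described in the abstract can be illustrated with a toy NumPy sketch (random weights and made-up dimensions; the actual system lives in the linked repository, and the plain recurrent cell below stands in for the LSTM):

```python
import numpy as np

def conv1d_maxpool(seq, kernels):
    """Slide each kernel (width w, dim d) over the embedding sequence
    and keep the max activation per kernel (max-over-time pooling)."""
    feats = []
    for K in kernels:
        w = K.shape[0]
        acts = [np.sum(seq[i:i + w] * K) for i in range(len(seq) - w + 1)]
        feats.append(max(acts))
    return np.array(feats)

def simple_rnn(seq, W_h, W_x):
    """A bare recurrent pass standing in for the LSTM branch."""
    h = np.zeros(W_h.shape[0])
    for x in seq:
        h = np.tanh(W_h @ h + W_x @ x)
    return h

rng = np.random.default_rng(0)
sentence = rng.normal(size=(6, 4))            # 6 words, 4-dim embeddings
kernels = [rng.normal(size=(2, 4)) for _ in range(3)]
W_h, W_x = rng.normal(size=(5, 5)), rng.normal(size=(5, 4))

features = np.concatenate([conv1d_maxpool(sentence, kernels),
                           simple_rnn(sentence, W_h, W_x)])
print(features.shape)   # (8,) = 3 conv features + 5 recurrent state dims
```

The convolutional branch extracts local n-gram features while the recurrent branch summarizes the whole sequence; the concatenated vector would feed a classifier.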
Supervised and Nonlinear Alignment of Two Embedding Spaces for Dictionary Induction in Low Resourced Languages
Title | Supervised and Nonlinear Alignment of Two Embedding Spaces for Dictionary Induction in Low Resourced Languages |
Authors | Masud Moshtaghi |
Abstract | Enabling cross-lingual NLP tasks by leveraging multilingual word embeddings has recently attracted much attention. An important motivation is to support lower-resourced languages; however, most efforts focus on demonstrating the effectiveness of the techniques using embeddings derived from languages similar to English with large parallel content. In this study, we first describe the general requirements for the success of these techniques and then present a noise-tolerant piecewise linear technique to learn a non-linear mapping between two monolingual word embedding vector spaces. We evaluate our approach on inferring bilingual dictionaries. We show that our technique outperforms the state of the art in lower-resourced settings, with an average improvement of 3.7% in precision@10 across 14 mostly low-resourced languages. |
Tasks | |
Published | 2019-11-01 |
URL | https://www.aclweb.org/anthology/D19-1076/ |
PWC | https://paperswithcode.com/paper/supervised-and-nonlinear-alignment-of-two |
Repo | |
Framework | |
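A minimal sketch of the piecewise-linear idea: partition the source embedding space (here with plain k-means) and fit one ordinary least-squares linear map per piece. The paper's noise-tolerant training is more involved; this only shows why a piecewise map can fit what a single linear map cannot:

```python
import numpy as np

def fit_piecewise_maps(X, Y, n_pieces=2, iters=5):
    """K-means partition of the source space, then one least-squares
    linear map per piece (a crude stand-in for the paper's method)."""
    idx = np.linspace(0, len(X) - 1, n_pieces).astype(int)
    centroids = X[idx].astype(float)
    for _ in range(iters):
        assign = np.argmin(((X[:, None] - centroids) ** 2).sum(-1), axis=1)
        for k in range(n_pieces):
            if (assign == k).any():
                centroids[k] = X[assign == k].mean(0)
    maps = [np.linalg.lstsq(X[assign == k], Y[assign == k], rcond=None)[0]
            for k in range(n_pieces)]
    return centroids, maps

def translate(x, centroids, maps):
    k = int(np.argmin(((x - centroids) ** 2).sum(-1)))
    return x @ maps[k]

# toy data: two regions of the source space with different linear relations
rng = np.random.default_rng(1)
A, B = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))
X1 = rng.normal(size=(50, 3)) + 5
X2 = rng.normal(size=(50, 3)) - 5
X, Y = np.vstack([X1, X2]), np.vstack([X1 @ A, X2 @ B])
centroids, maps = fit_piecewise_maps(X, Y)
err = np.abs(translate(X[0], centroids, maps) - Y[0]).max()
print(err < 1e-6)   # True — each piece recovers its own linear map
```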
Harmonizing Different Lemmatization Strategies for Building a Knowledge Base of Linguistic Resources for Latin
Title | Harmonizing Different Lemmatization Strategies for Building a Knowledge Base of Linguistic Resources for Latin |
Authors | Francesco Mambrini, Marco Passarotti |
Abstract | The interoperability between lemmatized corpora of Latin and other resources that use the lemma as indexing key is hampered by the multiple lemmatization strategies that different projects adopt. In this paper we discuss how we tackle the challenges raised by harmonizing different lemmatization criteria in the context of a project that aims to connect linguistic resources for Latin using the Linked Data paradigm. The paper introduces the architecture supporting an open-ended, lemma-based Knowledge Base, built to make textual and lexical resources for Latin interoperable. In particular, the paper describes the inclusion into the Knowledge Base of its lexical basis, of a word-formation lexicon, and of a lemmatized and syntactically annotated corpus. |
Tasks | Lemmatization |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-4009/ |
PWC | https://paperswithcode.com/paper/harmonizing-different-lemmatization |
Repo | |
Framework | |
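The harmonization problem can be pictured with a tiny sketch: map each resource's lemma to a shared canonical key so that entries from different projects land under the same Knowledge Base node. The normalization rules and the data below are illustrative only, not the project's actual criteria:

```python
def normalize(lemma):
    """Reduce a lemma to a canonical citation form. Rules here are
    illustrative (lowercase; spell consonantal v/j as u/i), not the
    project's actual harmonization criteria."""
    return lemma.lower().replace("v", "u").replace("j", "i")

# two hypothetical resources that lemmatize the same words differently
resources = {
    "corpus_a": {"Vir": "man", "iuvenis": "young man"},
    "corpus_b": {"uir": "man", "juvenis": "young man"},
}

kb = {}
for name, lemmas in resources.items():
    for lemma, gloss in lemmas.items():
        kb.setdefault(normalize(lemma), []).append((name, lemma))

print(sorted(kb))   # ['iuuenis', 'uir'] — variant spellings share one key
```

In the actual project the shared key would be a Linked Data URI rather than a string, so any resource citing the lemma resolves to the same node.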
Multi-Team: A Multi-attention, Multi-decoder Approach to Morphological Analysis
Title | Multi-Team: A Multi-attention, Multi-decoder Approach to Morphological Analysis |
Authors | Ahmet Üstün, Rob van der Goot, Gosse Bouma, Gertjan van Noord |
Abstract | This paper describes our submission to SIGMORPHON 2019 Task 2: Morphological analysis and lemmatization in context. Our model is a multi-task sequence to sequence neural network, which jointly learns morphological tagging and lemmatization. On the encoding side, we exploit character-level as well as contextual information. We introduce a multi-attention decoder to selectively focus on different parts of character and word sequences. To further improve the model, we train on multiple datasets simultaneously and use external embeddings for initialization. Our final model reaches an average morphological tagging F1 score of 94.54 and a lemma accuracy of 93.91 on the test data, ranking respectively 3rd and 6th out of 13 teams in the SIGMORPHON 2019 shared task. |
Tasks | Lemmatization, Morphological Analysis, Morphological Tagging |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-4206/ |
PWC | https://paperswithcode.com/paper/multi-team-a-multi-attention-multi-decoder |
Repo | |
Framework | |
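The multi-attention decoder's core idea (one attention head per encoder, contexts concatenated for the decoder) can be sketched with plain dot-product attention in NumPy (toy dimensions, random states; the real model uses learned projections):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def attend(query, memory):
    """Dot-product attention: weight memory rows by similarity to query."""
    w = softmax(memory @ query)
    return w @ memory

rng = np.random.default_rng(0)
char_enc = rng.normal(size=(9, 4))   # character-level encoder states
word_enc = rng.normal(size=(5, 4))   # word-level (contextual) states
query = rng.normal(size=4)           # current decoder state

# multi-attention: one context per encoder, concatenated for the decoder
context = np.concatenate([attend(query, char_enc), attend(query, word_enc)])
print(context.shape)   # (8,)
```

This lets the decoder selectively focus on character detail (useful for lemmatization) or word context (useful for tagging) at each step.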
Transfer Learning in Natural Language Processing
Title | Transfer Learning in Natural Language Processing |
Authors | Sebastian Ruder, Matthew E. Peters, Swabha Swayamdipta, Thomas Wolf |
Abstract | The classic supervised machine learning paradigm is based on learning in isolation, a single predictive model for a task using a single dataset. This approach requires a large number of training examples and performs best for well-defined and narrow tasks. Transfer learning refers to a set of methods that extend this approach by leveraging data from additional domains or tasks to train a model with better generalization properties. Over the last two years, the field of Natural Language Processing (NLP) has witnessed the emergence of several transfer learning methods and architectures which significantly improved upon the state-of-the-art on a wide range of NLP tasks. These improvements together with the wide availability and ease of integration of these methods are reminiscent of the factors that led to the success of pretrained word embeddings and ImageNet pretraining in computer vision, and indicate that these methods will likely become a common tool in the NLP landscape as well as an important research direction. We will present an overview of modern transfer learning methods in NLP, how models are pre-trained, what information the representations they learn capture, and review examples and case studies on how these models can be integrated and adapted in downstream NLP tasks. |
Tasks | Transfer Learning, Word Embeddings |
Published | 2019-06-01 |
URL | https://www.aclweb.org/anthology/N19-5004/ |
PWC | https://paperswithcode.com/paper/transfer-learning-in-natural-language |
Repo | |
Framework | |
Towards Deep Universal Dependencies
Title | Towards Deep Universal Dependencies |
Authors | Kira Droganova, Daniel Zeman |
Abstract | |
Tasks | |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-7717/ |
PWC | https://paperswithcode.com/paper/towards-deep-universal-dependencies |
Repo | |
Framework | |
NLP@UIOWA at SemEval-2019 Task 6: Classifying the Crass using Multi-windowed CNNs
Title | NLP@UIOWA at SemEval-2019 Task 6: Classifying the Crass using Multi-windowed CNNs |
Authors | Jonathan Rusert, Padmini Srinivasan |
Abstract | This paper proposes a system for OffensEval (SemEval 2019 Task 6), which calls for a system to classify offensive language into several categories. Our system is a text-based CNN, which learns only from the provided training data. Our system achieves 80-90% accuracy for the binary classification problems (offensive vs. not offensive and targeted vs. untargeted) and 63% accuracy for the three-way classification (group vs. individual vs. other). |
Tasks | |
Published | 2019-06-01 |
URL | https://www.aclweb.org/anthology/S19-2125/ |
PWC | https://paperswithcode.com/paper/nlpuiowa-at-semeval-2019-task-6-classifying |
Repo | |
Framework | |
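"Multi-windowed" means convolutions with several kernel widths run in parallel, each max-pooled and concatenated, so the classifier sees bigram, trigram, and 4-gram evidence at once. A NumPy sketch of the feature extraction (random filters, toy dimensions):

```python
import numpy as np

def multi_window_features(seq, widths, n_filters, rng):
    """One max-pooled feature per filter per window width, concatenated."""
    feats = []
    for w in widths:
        filters = rng.normal(size=(n_filters, w, seq.shape[1]))
        for F in filters:
            acts = [np.sum(seq[i:i + w] * F) for i in range(len(seq) - w + 1)]
            feats.append(max(acts))
    return np.array(feats)

rng = np.random.default_rng(0)
tweet = rng.normal(size=(12, 8))   # 12 tokens, 8-dim embeddings
feats = multi_window_features(tweet, widths=(2, 3, 4), n_filters=2, rng=rng)
print(feats.shape)   # (6,) — 3 widths × 2 filters
```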
Transfer Learning Based Free-Form Speech Command Classification for Low-Resource Languages
Title | Transfer Learning Based Free-Form Speech Command Classification for Low-Resource Languages |
Authors | Yohan Karunanayake, Uthayasanker Thayasivam, Surangika Ranathunga |
Abstract | Current state-of-the-art speech-based user interfaces use data-intensive methodologies to recognize free-form speech commands. However, this is not viable for low-resource languages, which lack speech data. This restricts the usability of such interfaces to a limited number of languages. In this paper, we propose a methodology to develop a robust domain-specific speech command classification system for low-resource languages using speech data of a high-resource language. In this transfer learning-based approach, we used a Convolutional Neural Network (CNN) to identify a fixed set of intents using an ASR-based character probability map. We were able to achieve significant results for Sinhala and Tamil datasets using an English-based ASR, which attests to the robustness of the proposed approach. |
Tasks | Transfer Learning |
Published | 2019-07-01 |
URL | https://www.aclweb.org/anthology/P19-2040/ |
PWC | https://paperswithcode.com/paper/transfer-learning-based-free-form-speech |
Repo | |
Framework | |
CUNI–Malta system at SIGMORPHON 2019 Shared Task on Morphological Analysis and Lemmatization in context: Operation-based word formation
Title | CUNI–Malta system at SIGMORPHON 2019 Shared Task on Morphological Analysis and Lemmatization in context: Operation-based word formation |
Authors | Ronald Cardenas, Claudia Borg, Daniel Zeman |
Abstract | This paper presents the submission by the Charles University-University of Malta team to the SIGMORPHON 2019 Shared Task on Morphological Analysis and Lemmatization in context. We present a lemmatization model based on previous work on neural transducers (Makarov and Clematide, 2018b; Aharoni and Goldberg, 2016). The key difference is that our model transforms the whole word form in every step, instead of consuming it character by character. We propose a merging strategy inspired by Byte-Pair Encoding that reduces the space of valid operations by merging frequent adjacent operations. The resulting operations not only encode the actions to be performed but also the relative position in the word token and how characters need to be transformed. Our morphological tagger is a vanilla biLSTM tagger that operates over operation representations, encoding operations and words in a hierarchical manner. Even though relative performance according to metrics is below the baseline, experiments show that our models capture important associations between interpretable operation labels and fine-grained morpho-syntax labels. |
Tasks | Lemmatization, Morphological Analysis |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-4213/ |
PWC | https://paperswithcode.com/paper/cuni-malta-system-at-sigmorphon-2019-shared |
Repo | |
Framework | |
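The BPE-inspired merging strategy can be sketched directly: count adjacent operation pairs across all edit scripts and repeatedly fuse the most frequent pair into one composite operation. The operation names below are illustrative, not the paper's actual label inventory:

```python
from collections import Counter

def merge_frequent_pairs(sequences, n_merges):
    """Repeatedly fuse the most frequent adjacent operation pair into one
    composite operation, in the spirit of Byte-Pair Encoding."""
    for _ in range(n_merges):
        pairs = Counter(p for seq in sequences for p in zip(seq, seq[1:]))
        if not pairs:
            break
        (a, b), _ = pairs.most_common(1)[0]
        merged = a + "+" + b
        rewritten = []
        for seq in sequences:
            out, i = [], 0
            while i < len(seq):
                if i + 1 < len(seq) and (seq[i], seq[i + 1]) == (a, b):
                    out.append(merged)
                    i += 2
                else:
                    out.append(seq[i])
                    i += 1
            rewritten.append(out)
        sequences = rewritten
    return sequences

# toy per-word edit scripts (COPY stem, DELete old ending, INSert new one)
ops = [["COPY", "DEL", "INS"], ["COPY", "DEL", "INS"], ["COPY", "DEL", "SUB"]]
merged = merge_frequent_pairs(ops, 1)
print(merged[0])   # ['COPY+DEL', 'INS']
```

Each merge shrinks the decoder's output space while keeping frequent multi-step transformations expressible as a single prediction.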
Can Greenbergian universals be induced from language networks?
Title | Can Greenbergian universals be induced from language networks? |
Authors | Kartik Sharma, Kaivalya Swami, Aditya Shete, Samar Husain |
Abstract | |
Tasks | |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-7804/ |
PWC | https://paperswithcode.com/paper/can-greenbergian-universals-be-induced-from |
Repo | |
Framework | |
Learning to request guidance in emergent language
Title | Learning to request guidance in emergent language |
Authors | Benjamin Kolb, Leon Lang, Henning Bartsch, Arwin Gansekoele, Raymond Koopmanschap, Leonardo Romor, David Speck, Mathijs Mul, Elia Bruni |
Abstract | Previous research into agent communication has shown that a pre-trained guide can speed up the learning process of an imitation learning agent. The guide achieves this by providing the agent with discrete messages in an emerged language about how to solve the task. We extend this one-directional communication with a one-bit communication channel from the learner back to the guide: the learner is able to ask the guide for help, and we limit the guidance by penalizing the learner for these requests. During training, the agent learns to control this gate based on its current observation. We find that the amount of requested guidance decreases over time and that guidance is requested in situations of high uncertainty. We investigate the agent's performance in cases of open and closed gates and discuss potential motives for the observed gating behavior. |
Tasks | Imitation Learning |
Published | 2019-11-01 |
URL | https://www.aclweb.org/anthology/D19-6407/ |
PWC | https://paperswithcode.com/paper/learning-to-request-guidance-in-emergent |
Repo | |
Framework | |
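The finding that guidance is requested "in situations of high uncertainty" can be made concrete with a hand-written gate: request help when the policy's action distribution has high entropy and the expected benefit exceeds the request penalty. This is an illustrative rule only; in the paper the agent learns the gate from its observations:

```python
import math

def entropy(probs):
    """Shannon entropy of a discrete action distribution (nats)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def request_guidance(policy_probs, threshold, penalty, reward_estimate):
    """Illustrative gate: ask the guide only when the policy is uncertain
    and the expected benefit outweighs the fixed request penalty."""
    uncertain = entropy(policy_probs) > threshold
    return uncertain and reward_estimate > penalty

confident = [0.9, 0.05, 0.05]   # low entropy: act on its own
unsure = [0.4, 0.3, 0.3]        # high entropy: worth paying for help
print(request_guidance(confident, 0.8, 0.1, 0.5))  # False
print(request_guidance(unsure, 0.8, 0.1, 0.5))     # True
```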
GrapAL: Connecting the Dots in Scientific Literature
Title | GrapAL: Connecting the Dots in Scientific Literature |
Authors | Christine Betts, Joanna Power, Waleed Ammar |
Abstract | We introduce GrapAL (Graph database of Academic Literature), a versatile tool for exploring and investigating a knowledge base of scientific literature that was semi-automatically constructed using NLP methods. GrapAL fills many informational needs expressed by researchers. At the core of GrapAL is a Neo4j graph database with an intuitive schema and a simple query language. In this paper, we describe the basic elements of GrapAL, how to use it, and several use cases such as finding experts on a given topic for peer reviewing, discovering indirect connections between biomedical entities, and computing citation-based metrics. We open source the demo code to help other researchers develop applications that build on GrapAL. |
Tasks | |
Published | 2019-07-01 |
URL | https://www.aclweb.org/anthology/P19-3025/ |
PWC | https://paperswithcode.com/paper/grapal-connecting-the-dots-in-scientific |
Repo | |
Framework | |
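The "simple query language" of the Neo4j backend is Cypher. A query of the kind the paper describes for the find-experts use case might look like the following; the node labels and relationship types here are hypothetical, so consult GrapAL's published schema for the real names:

```python
# Hypothetical Cypher for the "find experts on a topic" use case.
# Labels (:Author, :Paper, :Entity) and relations (:AUTHORED, :MENTIONS)
# are illustrative only, not GrapAL's actual schema.
find_experts = """
MATCH (a:Author)-[:AUTHORED]->(p:Paper)-[:MENTIONS]->(e:Entity {name: $topic})
RETURN a.name AS author, count(p) AS papers
ORDER BY papers DESC
LIMIT 10
"""
print(find_experts.strip().startswith("MATCH"))   # True
```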
Machine Translation with parfda, Moses, kenlm, nplm, and PRO
Title | Machine Translation with parfda, Moses, kenlm, nplm, and PRO |
Authors | Ergun Biçici |
Abstract | We build parfda Moses statistical machine translation (SMT) models for most language pairs in the news translation task. We experiment with a hybrid approach using neural language models integrated into Moses. We obtain the constrained data statistics on the machine translation task, the coverage of the test sets, and the upper bounds on the translation results. We also contribute a new testsuite for the German-English language pair and a new automated key phrase extraction technique for the evaluation of the testsuite translations. |
Tasks | Machine Translation |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-5306/ |
PWC | https://paperswithcode.com/paper/machine-translation-with-parfda-moses-kenlm |
Repo | |
Framework | |
Cross-Lingual Word Embeddings for Morphologically Rich Languages
Title | Cross-Lingual Word Embeddings for Morphologically Rich Languages |
Authors | Ahmet Üstün, Gosse Bouma, Gertjan van Noord |
Abstract | Cross-lingual word embedding models learn a shared vector space for two or more languages so that words with similar meaning are represented by similar vectors regardless of their language. Although the existing models achieve high performance on pairs of morphologically simple languages, they perform very poorly on morphologically rich languages such as Turkish and Finnish. In this paper, we propose a morpheme-based model in order to increase the performance of cross-lingual word embeddings on morphologically rich languages. Our model includes a simple extension which enables us to exploit morphemes for cross-lingual mapping. We applied our model for the Turkish-Finnish language pair on the bilingual word translation task. Results show that our model outperforms the baseline models by 2% in the nearest neighbour ranking. |
Tasks | Word Embeddings |
Published | 2019-09-01 |
URL | https://www.aclweb.org/anthology/R19-1140/ |
PWC | https://paperswithcode.com/paper/cross-lingual-word-embeddings-for |
Repo | |
Framework | |
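A sketch of the two ingredients the abstract combines: composing a word vector from morpheme vectors (so that rare inflected forms reuse shared morpheme information) and learning an orthogonal cross-lingual map from a seed dictionary via the Procrustes solution. Morpheme inventory, vectors, and segmenter are all toy assumptions; the paper's extension is more elaborate:

```python
import numpy as np

def word_vector(word, morph_vecs, segment):
    """Compose a word embedding as the mean of its morpheme embeddings."""
    return np.mean([morph_vecs[m] for m in segment(word)], axis=0)

def procrustes(X, Y):
    """Orthogonal map W with X @ W ≈ Y, fit on a seed dictionary."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

rng = np.random.default_rng(0)
# hypothetical Turkish morpheme inventory and a one-word toy segmenter
morphs = {m: rng.normal(size=4) for m in ["ev", "ler", "de"]}
segment = lambda w: ["ev", "ler", "de"] if w == "evlerde" else [w]
v = word_vector("evlerde", morphs, segment)   # "in the houses"

# seed dictionary: source vectors and their rotated target counterparts
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)))  # the "true" cross-lingual map
X = rng.normal(size=(20, 4))
W = procrustes(X, X @ Q)
err = np.abs(X @ W - X @ Q).max()
print(v.shape, err < 1e-8)   # (4,) True — the orthogonal map is recovered
```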
Toward automatic improvement of language produced by non-native language learners
Title | Toward automatic improvement of language produced by non-native language learners |
Authors | Mathias Creutz, Eetu Sjöblom |
Abstract | |
Tasks | |
Published | 2019-09-01 |
URL | https://www.aclweb.org/anthology/W19-6303/ |
PWC | https://paperswithcode.com/paper/toward-automatic-improvement-of-language |
Repo | |
Framework | |