January 24, 2020


Paper Group NANR 133


Contextual Text Denoising with Masked Language Model


Title	Contextual Text Denoising with Masked Language Model
Authors	Yifu Sun, Haoming Jiang
Abstract	Recently, with the help of deep learning models, significant advances have been made in different Natural Language Processing (NLP) tasks. Unfortunately, state-of-the-art models are vulnerable to noisy texts. We propose a new contextual text denoising algorithm based on the ready-to-use masked language model. The proposed algorithm does not require retraining of the model and can be integrated into any NLP system without additional training on paired cleaning training data. We evaluate our method under synthetic noise and natural noise and show that the proposed algorithm can use context information to correct noisy text and improve performance on noisy inputs in several downstream tasks.
Tasks	Denoising, Language Modelling
Published	2019-11-01
URL	https://www.aclweb.org/anthology/D19-5537/
PDF	https://www.aclweb.org/anthology/D19-5537
PWC	https://paperswithcode.com/paper/contextual-text-denoising-with-masked-1
Repo
Framework

Hope at SemEval-2019 Task 6: Mining social media language to discover offensive language


Title	Hope at SemEval-2019 Task 6: Mining social media language to discover offensive language
Authors	Gabriel Florentin Patras, Diana Florina Lungu, Daniela Gifu, Diana Trandabat
Abstract	Users' content sharing through social media has reached huge proportions nowadays. However, along with the free expression of thoughts on social media, people risk getting exposed to various aggressive statements. In this paper, we present a system able to identify and classify offensive user-generated content.
Tasks
Published	2019-06-01
URL	https://www.aclweb.org/anthology/S19-2113/
PDF	https://www.aclweb.org/anthology/S19-2113
PWC	https://paperswithcode.com/paper/hope-at-semeval-2019-task-6-mining-social
Repo
Framework

GWU NLP at SemEval-2019 Task 7: Hybrid Pipeline for Rumour Veracity and Stance Classification on Social Media


Title	GWU NLP at SemEval-2019 Task 7: Hybrid Pipeline for Rumour Veracity and Stance Classification on Social Media
Authors	Sardar Hamidian, Mona Diab
Abstract	Social media plays a crucial role as the main news resource for information seekers online. However, the unmoderated nature of social media platforms leads to the emergence and spread of untrustworthy content that harms individuals or even societies. Most current approaches for automatically determining the veracity of a rumor do not generalize to novel emerging topics. This paper describes our hybrid system comprising rules and a machine learning model, which makes use of reply tweets to identify the veracity of the source tweet. The proposed system achieved 0.435 F-macro in stance classification, and 0.262 F-macro and 0.801 RMSE in rumor verification, in Task 7 of SemEval 2019.
Tasks
Published	2019-06-01
URL	https://www.aclweb.org/anthology/S19-2195/
PDF	https://www.aclweb.org/anthology/S19-2195
PWC	https://paperswithcode.com/paper/gwu-nlp-at-semeval-2019-task-7-hybrid
Repo
Framework

Towards Automated Semantic Role Labelling of Hindi-English Code-Mixed Tweets


Title	Towards Automated Semantic Role Labelling of Hindi-English Code-Mixed Tweets
Authors	Riya Pal, Dipti Sharma
Abstract	We present a system for automating Semantic Role Labelling of Hindi-English code-mixed tweets. We explore the issues posed by noisy, user-generated code-mixed social media data. We also compare the individual effect of various linguistic features used in our system. Our proposed model is a 2-step system for automated labelling which gives an overall accuracy of 84% for Argument Classification, marking a 10% increase over the existing rule-based baseline model. This is the first attempt at building a statistical Semantic Role Labeller for Hindi-English code-mixed data, to the best of our knowledge.
Tasks
Published	2019-11-01
URL	https://www.aclweb.org/anthology/D19-5538/
PDF	https://www.aclweb.org/anthology/D19-5538
PWC	https://paperswithcode.com/paper/towards-automated-semantic-role-labelling-of
Repo
Framework

Findings of the 2019 Conference on Machine Translation (WMT19)


Title	Findings of the 2019 Conference on Machine Translation (WMT19)
Authors	Loïc Barrault, Ondřej Bojar, Marta R. Costa-jussà, Christian Federmann, Mark Fishel, Yvette Graham, Barry Haddow, Matthias Huck, Philipp Koehn, Shervin Malmasi, Christof Monz, Mathias Müller, Santanu Pal, Matt Post, Marcos Zampieri
Abstract	This paper presents the results of the premier shared task organized alongside the Conference on Machine Translation (WMT) 2019. Participants were asked to build machine translation systems for any of 18 language pairs, to be evaluated on a test set of news stories. The main metric for this task is human judgment of translation quality. The task was also opened up to additional test suites to probe specific aspects of translation.
Tasks	Machine Translation
Published	2019-08-01
URL	https://www.aclweb.org/anthology/W19-5301/
PDF	https://www.aclweb.org/anthology/W19-5301
PWC	https://paperswithcode.com/paper/findings-of-the-2019-conference-on-machine
Repo
Framework

A2N: Attending to Neighbors for Knowledge Graph Inference


Title	A2N: Attending to Neighbors for Knowledge Graph Inference
Authors	Trapit Bansal, Da-Cheng Juan, Sujith Ravi, Andrew McCallum
Abstract	State-of-the-art models for knowledge graph completion aim at learning a fixed embedding representation of entities in a multi-relational graph which can generalize to infer unseen entity relationships at test time. This can be sub-optimal as it requires memorizing and generalizing to all possible entity relationships using these fixed representations. We thus propose a novel attention-based method to learn query-dependent representations of entities, which adaptively combines the relevant graph neighborhood of an entity, leading to more accurate KG completion. The proposed method is evaluated on two benchmark datasets for knowledge graph completion, and experimental results show that the proposed model performs competitively with or better than the existing state of the art, including recent methods for explicit multi-hop reasoning. Qualitative probing offers insight into how the model can reason about facts involving multiple hops in the knowledge graph, through the use of neighborhood attention.
Tasks	Knowledge Graph Completion
Published	2019-07-01
URL	https://www.aclweb.org/anthology/P19-1431/
PDF	https://www.aclweb.org/anthology/P19-1431
PWC	https://paperswithcode.com/paper/a2n-attending-to-neighbors-for-knowledge
Repo
Framework

INFORMATION MAXIMIZATION AUTO-ENCODING


Title	INFORMATION MAXIMIZATION AUTO-ENCODING
Authors	Dejiao Zhang, Tianchen Zhao, Laura Balzano
Abstract	We propose the Information Maximization Autoencoder (IMAE), an information theoretic approach to simultaneously learn continuous and discrete representations in an unsupervised setting. Unlike the Variational Autoencoder framework, IMAE starts from a stochastic encoder that seeks to map each input to a hybrid discrete and continuous representation with the objective of maximizing the mutual information between the data and their representations. A decoder is included to approximate the posterior distribution of the data given their representations, where a high fidelity approximation can be achieved by leveraging the informative representations. We show that the proposed objective is theoretically valid and provides a principled framework for understanding the tradeoffs regarding informativeness of each representation factor, disentanglement of representations, and decoding quality.
Tasks
Published	2019-05-01
URL	https://openreview.net/forum?id=SyVpB2RqFX
PDF	https://openreview.net/pdf?id=SyVpB2RqFX
PWC	https://paperswithcode.com/paper/information-maximization-auto-encoding
Repo
Framework

Tom Jumbo-Grumbo at SemEval-2019 Task 4: Hyperpartisan News Detection with GloVe vectors and SVM


Title	Tom Jumbo-Grumbo at SemEval-2019 Task 4: Hyperpartisan News Detection with GloVe vectors and SVM
Authors	Chia-Lun Yeh, Babak Loni, Anne Schuth
Abstract	In this paper, we describe our attempt to learn bias from news articles. From our experiments, it seems that although there is a correlation between publisher bias and article bias, it is challenging to learn bias directly from the publisher labels. On the other hand, using a few manually labeled samples can increase the accuracy metric from around 60% to near 80%. Our system is computationally inexpensive and uses several standard document representations in NLP to train an SVM or LR classifier. The system ranked 4th in the SemEval-2019 task. The code is released for reproducibility.
Tasks
Published	2019-06-01
URL	https://www.aclweb.org/anthology/S19-2187/
PDF	https://www.aclweb.org/anthology/S19-2187
PWC	https://paperswithcode.com/paper/tom-jumbo-grumbo-at-semeval-2019-task-4
Repo
Framework

CoSSAT: Code-Switched Speech Annotation Tool


Title	CoSSAT: Code-Switched Speech Annotation Tool
Authors	Sanket Shah, Pratik Joshi, Sebastin Santy, Sunayana Sitaram
Abstract	Code-switching refers to the alternation of two or more languages in a conversation or utterance and is common in multilingual communities across the world. Building code-switched speech and natural language processing systems is challenging due to the lack of annotated speech and text data. We present a speech annotation interface, CoSSAT, which helps annotators transcribe code-switched speech faster, more easily and more accurately than a traditional interface, by displaying candidate words from monolingual speech recognizers. We conduct a user study on the transcription of Hindi-English code-switched speech with 10 annotators and describe quantitative and qualitative results.
Tasks
Published	2019-11-01
URL	https://www.aclweb.org/anthology/D19-5907/
PDF	https://www.aclweb.org/anthology/D19-5907
PWC	https://paperswithcode.com/paper/cossat-code-switched-speech-annotation-tool
Repo
Framework

Realizing Universal Dependencies Structures


Title	Realizing Universal Dependencies Structures
Authors	Guy Lapalme
Abstract	We first describe a surface realizer for Universal Dependencies (UD) structures. The system uses a symbolic approach to transform the dependency tree into a tree of constituents that is transformed into an English sentence by an existing realizer. This approach was then adapted for the two shared tasks of SR'19. The system is quite fast and showed competitive results for English sentences using automatic and manual evaluation measures.
Tasks
Published	2019-11-01
URL	https://www.aclweb.org/anthology/D19-6305/
PDF	https://www.aclweb.org/anthology/D19-6305
PWC	https://paperswithcode.com/paper/realizing-universal-dependencies-structures
Repo
Framework

CUNI Systems for the Unsupervised News Translation Task in WMT 2019


Title	CUNI Systems for the Unsupervised News Translation Task in WMT 2019
Authors	Ivana Kvapilíková, Dominik Macháček, Ondřej Bojar
Abstract	In this paper we describe the CUNI translation system used for the unsupervised news shared task of the ACL 2019 Fourth Conference on Machine Translation (WMT19). We follow the strategy of Artetxe et al. (2018b), creating a seed phrase-based system where the phrase table is initialized from cross-lingual embedding mappings trained on monolingual data, followed by a neural machine translation system trained on synthetic parallel data. The synthetic corpus was produced from a monolingual corpus by a tuned PBMT model refined through iterative back-translation. We further focus on the handling of named entities, i.e. the part of the vocabulary where the cross-lingual embedding mapping suffers most. Our system reaches a BLEU score of 15.3 on the German-Czech WMT19 shared task.
Tasks	Machine Translation
Published	2019-08-01
URL	https://www.aclweb.org/anthology/W19-5323/
PDF	https://www.aclweb.org/anthology/W19-5323
PWC	https://paperswithcode.com/paper/cuni-systems-for-the-unsupervised-news-1
Repo
Framework

Unsupervised Neologism Normalization Using Embedding Space Mapping


Title	Unsupervised Neologism Normalization Using Embedding Space Mapping
Authors	Nasser Zalmout, Kapil Thadani, Aasish Pappu
Abstract	This paper presents an approach for detecting and normalizing neologisms in social media content. Neologisms refer to recent expressions that are specific to certain entities or events and are being increasingly used by the public, but have not yet been accepted in mainstream language. Automated methods for handling neologisms are important for natural language understanding and normalization, especially for informal genres with user-generated content. We present an unsupervised approach for detecting neologisms and then normalizing them to canonical words without relying on parallel training data. Our approach builds on the text normalization literature and introduces adaptations to fit the specificities of this task, including phonetic and etymological considerations. We evaluate the proposed techniques on a dataset of Reddit comments, with detected neologisms and corresponding normalizations.
Tasks
Published	2019-11-01
URL	https://www.aclweb.org/anthology/D19-5555/
PDF	https://www.aclweb.org/anthology/D19-5555
PWC	https://paperswithcode.com/paper/unsupervised-neologism-normalization-using
Repo
Framework

Proceedings of the Workshop on Deep Learning and Formal Languages: Building Bridges


Title	Proceedings of the Workshop on Deep Learning and Formal Languages: Building Bridges
Authors
Abstract
Tasks
Published	2019-08-01
URL	https://www.aclweb.org/anthology/W19-3900/
PDF	https://www.aclweb.org/anthology/W19-3900
PWC	https://paperswithcode.com/paper/proceedings-of-the-workshop-on-deep-learning-1
Repo
Framework

UU_TAILS at MEDIQA 2019: Learning Textual Entailment in the Medical Domain


Title	UU_TAILS at MEDIQA 2019: Learning Textual Entailment in the Medical Domain
Authors	Noha Tawfik, Marco Spruit
Abstract	This article describes the participation of the UU_TAILS team in the 2019 MEDIQA challenge intended to improve domain-specific models in medical and clinical NLP. The challenge consists of 3 tasks: medical language inference (NLI), recognizing textual entailment (RQE) and question answering (QA). Our team participated in tasks 1 and 2 and our best runs achieved a performance accuracy of 0.852 and 0.584 respectively for the test sets. The models proposed for task 1 relied on BERT embeddings and different ensemble techniques. For the RQE task, we trained a traditional multilayer perceptron network based on embeddings generated by the universal sentence encoder.
Tasks	Natural Language Inference, Question Answering
Published	2019-08-01
URL	https://www.aclweb.org/anthology/W19-5053/
PDF	https://www.aclweb.org/anthology/W19-5053
PWC	https://paperswithcode.com/paper/uu_tails-at-mediqa-2019-learning-textual
Repo
Framework

A Crowdsourcing-based Approach for Speech Corpus Transcription Case of Arabic Algerian Dialects


Title	A Crowdsourcing-based Approach for Speech Corpus Transcription Case of Arabic Algerian Dialects
Authors	Ilyes Zine, Mohamed Cherif Zeghad, Soumia Bougrine, Hadda Cherroun
Abstract
Tasks
Published	2019-09-01
URL	https://www.aclweb.org/anthology/W19-7411/
PDF	https://www.aclweb.org/anthology/W19-7411
PWC	https://paperswithcode.com/paper/a-crowdsourcing-based-approach-for-speech
Repo
Framework