January 24, 2020

Paper Group NANR 133

Contextual Text Denoising with Masked Language Model

Title Contextual Text Denoising with Masked Language Model
Authors Yifu Sun, Haoming Jiang
Abstract Recently, with the help of deep learning models, significant advances have been made in different Natural Language Processing (NLP) tasks. Unfortunately, state-of-the-art models are vulnerable to noisy texts. We propose a new contextual text denoising algorithm based on the ready-to-use masked language model. The proposed algorithm does not require retraining of the model and can be integrated into any NLP system without additional training on paired cleaning training data. We evaluate our method under synthetic noise and natural noise and show that the proposed algorithm can use context information to correct noise text and improve the performance of noisy inputs in several downstream tasks.
Tasks Denoising, Language Modelling
Published 2019-11-01
URL https://www.aclweb.org/anthology/D19-5537/
PDF https://www.aclweb.org/anthology/D19-5537
PWC https://paperswithcode.com/paper/contextual-text-denoising-with-masked-1
Repo
Framework

Hope at SemEval-2019 Task 6: Mining social media language to discover offensive language

Title Hope at SemEval-2019 Task 6: Mining social media language to discover offensive language
Authors Gabriel Florentin Patras, Diana Florina Lungu, Daniela Gifu, Diana Trandabat
Abstract User's content share through social media has reached huge proportions nowadays. However, along with the free expression of thoughts on social media, people risk getting exposed to various aggressive statements. In this paper, we present a system able to identify and classify offensive user-generated content.
Tasks
Published 2019-06-01
URL https://www.aclweb.org/anthology/S19-2113/
PDF https://www.aclweb.org/anthology/S19-2113
PWC https://paperswithcode.com/paper/hope-at-semeval-2019-task-6-mining-social
Repo
Framework

GWU NLP at SemEval-2019 Task 7: Hybrid Pipeline for Rumour Veracity and Stance Classification on Social Media

Title GWU NLP at SemEval-2019 Task 7: Hybrid Pipeline for Rumour Veracity and Stance Classification on Social Media
Authors Sardar Hamidian, Mona Diab
Abstract Social media plays a crucial role as the main news resource for information seekers online. However, the unmoderated feature of social media platforms leads to the emergence and spread of untrustworthy contents which harm individuals or even societies. Most of the current automated approaches for automatically determining the veracity of a rumor are not generalizable for novel emerging topics. This paper describes our hybrid system comprising rules and a machine learning model which makes use of replied tweets to identify the veracity of the source tweet. The proposed system in this paper achieved 0.435 F-Macro in stance classification, and 0.262 F-macro and 0.801 RMSE in rumor verification tasks in Task7 of SemEval 2019.
Tasks
Published 2019-06-01
URL https://www.aclweb.org/anthology/S19-2195/
PDF https://www.aclweb.org/anthology/S19-2195
PWC https://paperswithcode.com/paper/gwu-nlp-at-semeval-2019-task-7-hybrid
Repo
Framework

Towards Automated Semantic Role Labelling of Hindi-English Code-Mixed Tweets

Title Towards Automated Semantic Role Labelling of Hindi-English Code-Mixed Tweets
Authors Riya Pal, Dipti Sharma
Abstract We present a system for automating Semantic Role Labelling of Hindi-English code-mixed tweets. We explore the issues posed by noisy, user generated code-mixed social media data. We also compare the individual effect of various linguistic features used in our system. Our proposed model is a 2-step system for automated labelling which gives an overall accuracy of 84% for Argument Classification, marking a 10% increase over the existing rule-based baseline model. This is the first attempt at building a statistical Semantic Role Labeller for Hindi-English code-mixed data, to the best of our knowledge.
Tasks
Published 2019-11-01
URL https://www.aclweb.org/anthology/D19-5538/
PDF https://www.aclweb.org/anthology/D19-5538
PWC https://paperswithcode.com/paper/towards-automated-semantic-role-labelling-of
Repo
Framework

Findings of the 2019 Conference on Machine Translation (WMT19)

Title Findings of the 2019 Conference on Machine Translation (WMT19)
Authors Loïc Barrault, Ondřej Bojar, Marta R. Costa-jussà, Christian Federmann, Mark Fishel, Yvette Graham, Barry Haddow, Matthias Huck, Philipp Koehn, Shervin Malmasi, Christof Monz, Mathias Müller, Santanu Pal, Matt Post, Marcos Zampieri
Abstract This paper presents the results of the premier shared task organized alongside the Conference on Machine Translation (WMT) 2019. Participants were asked to build machine translation systems for any of 18 language pairs, to be evaluated on a test set of news stories. The main metric for this task is human judgment of translation quality. The task was also opened up to additional test suites to probe specific aspects of translation.
Tasks Machine Translation
Published 2019-08-01
URL https://www.aclweb.org/anthology/W19-5301/
PDF https://www.aclweb.org/anthology/W19-5301
PWC https://paperswithcode.com/paper/findings-of-the-2019-conference-on-machine
Repo
Framework

A2N: Attending to Neighbors for Knowledge Graph Inference

Title A2N: Attending to Neighbors for Knowledge Graph Inference
Authors Trapit Bansal, Da-Cheng Juan, Sujith Ravi, Andrew McCallum
Abstract State-of-the-art models for knowledge graph completion aim at learning a fixed embedding representation of entities in a multi-relational graph which can generalize to infer unseen entity relationships at test time. This can be sub-optimal as it requires memorizing and generalizing to all possible entity relationships using these fixed representations. We thus propose a novel attention-based method to learn query-dependent representation of entities which adaptively combines the relevant graph neighborhood of an entity leading to more accurate KG completion. The proposed method is evaluated on two benchmark datasets for knowledge graph completion, and experimental results show that the proposed model performs competitively or better than existing state-of-the-art, including recent methods for explicit multi-hop reasoning. Qualitative probing offers insight into how the model can reason about facts involving multiple hops in the knowledge graph, through the use of neighborhood attention.
Tasks Knowledge Graph Completion
Published 2019-07-01
URL https://www.aclweb.org/anthology/P19-1431/
PDF https://www.aclweb.org/anthology/P19-1431
PWC https://paperswithcode.com/paper/a2n-attending-to-neighbors-for-knowledge
Repo
Framework

INFORMATION MAXIMIZATION AUTO-ENCODING

Title INFORMATION MAXIMIZATION AUTO-ENCODING
Authors Dejiao Zhang, Tianchen Zhao, Laura Balzano
Abstract We propose the Information Maximization Autoencoder (IMAE), an information theoretic approach to simultaneously learn continuous and discrete representations in an unsupervised setting. Unlike the Variational Autoencoder framework, IMAE starts from a stochastic encoder that seeks to map each input data to a hybrid discrete and continuous representation with the objective of maximizing the mutual information between the data and their representations. A decoder is included to approximate the posterior distribution of the data given their representations, where a high fidelity approximation can be achieved by leveraging the informative representations. We show that the proposed objective is theoretically valid and provides a principled framework for understanding the tradeoffs regarding informativeness of each representation factor, disentanglement of representations, and decoding quality.
Tasks
Published 2019-05-01
URL https://openreview.net/forum?id=SyVpB2RqFX
PDF https://openreview.net/pdf?id=SyVpB2RqFX
PWC https://paperswithcode.com/paper/information-maximization-auto-encoding
Repo
Framework

Tom Jumbo-Grumbo at SemEval-2019 Task 4: Hyperpartisan News Detection with GloVe vectors and SVM

Title Tom Jumbo-Grumbo at SemEval-2019 Task 4: Hyperpartisan News Detection with GloVe vectors and SVM
Authors Chia-Lun Yeh, Babak Loni, Anne Schuth
Abstract In this paper, we describe our attempt to learn bias from news articles. From our experiments, it seems that although there is a correlation between publisher bias and article bias, it is challenging to learn bias directly from the publisher labels. On the other hand, using few manually-labeled samples can increase the accuracy metric from around 60% to near 80%. Our system is computationally inexpensive and uses several standard document representations in NLP to train an SVM or LR classifier. The system ranked 4th in the SemEval-2019 task. The code is released for reproducibility.
Tasks
Published 2019-06-01
URL https://www.aclweb.org/anthology/S19-2187/
PDF https://www.aclweb.org/anthology/S19-2187
PWC https://paperswithcode.com/paper/tom-jumbo-grumbo-at-semeval-2019-task-4
Repo
Framework

CoSSAT: Code-Switched Speech Annotation Tool

Title CoSSAT: Code-Switched Speech Annotation Tool
Authors Sanket Shah, Pratik Joshi, Sebastin Santy, Sunayana Sitaram
Abstract Code-switching refers to the alternation of two or more languages in a conversation or utterance and is common in multilingual communities across the world. Building code-switched speech and natural language processing systems are challenging due to the lack of annotated speech and text data. We present a speech annotation interface CoSSAT, which helps annotators transcribe code-switched speech faster, more easily and more accurately than a traditional interface, by displaying candidate words from monolingual speech recognizers. We conduct a user study on the transcription of Hindi-English code-switched speech with 10 annotators and describe quantitative and qualitative results.
Tasks
Published 2019-11-01
URL https://www.aclweb.org/anthology/D19-5907/
PDF https://www.aclweb.org/anthology/D19-5907
PWC https://paperswithcode.com/paper/cossat-code-switched-speech-annotation-tool
Repo
Framework

Realizing Universal Dependencies Structures

Title Realizing Universal Dependencies Structures
Authors Guy Lapalme
Abstract We first describe a surface realizer for Universal Dependencies (UD) structures. The system uses a symbolic approach to transform the dependency tree into a tree of constituents that is transformed into an English sentence by an existing realizer. This approach was then adapted for the two shared tasks of SR'19. The system is quite fast and showed competitive results for English sentences using automatic and manual evaluation measures.
Tasks
Published 2019-11-01
URL https://www.aclweb.org/anthology/D19-6305/
PDF https://www.aclweb.org/anthology/D19-6305
PWC https://paperswithcode.com/paper/realizing-universal-dependencies-structures
Repo
Framework

CUNI Systems for the Unsupervised News Translation Task in WMT 2019

Title CUNI Systems for the Unsupervised News Translation Task in WMT 2019
Authors Ivana Kvapilíková, Dominik Macháček, Ondřej Bojar
Abstract In this paper we describe the CUNI translation system used for the unsupervised news shared task of the ACL 2019 Fourth Conference on Machine Translation (WMT19). We follow the strategy of Artetxe et al. (2018b), creating a seed phrase-based system where the phrase table is initialized from cross-lingual embedding mappings trained on monolingual data, followed by a neural machine translation system trained on synthetic parallel data. The synthetic corpus was produced from a monolingual corpus by a tuned PBMT model refined through iterative back-translation. We further focus on the handling of named entities, i.e. the part of vocabulary where the cross-lingual embedding mapping suffers most. Our system reaches a BLEU score of 15.3 on the German-Czech WMT19 shared task.
Tasks Machine Translation
Published 2019-08-01
URL https://www.aclweb.org/anthology/W19-5323/
PDF https://www.aclweb.org/anthology/W19-5323
PWC https://paperswithcode.com/paper/cuni-systems-for-the-unsupervised-news-1
Repo
Framework

Unsupervised Neologism Normalization Using Embedding Space Mapping

Title Unsupervised Neologism Normalization Using Embedding Space Mapping
Authors Nasser Zalmout, Kapil Thadani, Aasish Pappu
Abstract This paper presents an approach for detecting and normalizing neologisms in social media content. Neologisms refer to recent expressions that are specific to certain entities or events and are being increasingly used by the public, but have not yet been accepted in mainstream language. Automated methods for handling neologisms are important for natural language understanding and normalization, especially for informal genres with user generated content. We present an unsupervised approach for detecting neologisms and then normalizing them to canonical words without relying on parallel training data. Our approach builds on the text normalization literature and introduces adaptations to fit the specificities of this task, including phonetic and etymological considerations. We evaluate the proposed techniques on a dataset of Reddit comments, with detected neologisms and corresponding normalizations.
Tasks
Published 2019-11-01
URL https://www.aclweb.org/anthology/D19-5555/
PDF https://www.aclweb.org/anthology/D19-5555
PWC https://paperswithcode.com/paper/unsupervised-neologism-normalization-using
Repo
Framework

Proceedings of the Workshop on Deep Learning and Formal Languages: Building Bridges

Title Proceedings of the Workshop on Deep Learning and Formal Languages: Building Bridges
Authors
Abstract
Tasks
Published 2019-08-01
URL https://www.aclweb.org/anthology/W19-3900/
PDF https://www.aclweb.org/anthology/W19-3900
PWC https://paperswithcode.com/paper/proceedings-of-the-workshop-on-deep-learning-1
Repo
Framework

UU_TAILS at MEDIQA 2019: Learning Textual Entailment in the Medical Domain

Title UU_TAILS at MEDIQA 2019: Learning Textual Entailment in the Medical Domain
Authors Noha Tawfik, Marco Spruit
Abstract This article describes the participation of the UU_TAILS team in the 2019 MEDIQA challenge intended to improve domain-specific models in medical and clinical NLP. The challenge consists of 3 tasks: medical language inference (NLI), recognizing textual entailment (RQE) and question answering (QA). Our team participated in tasks 1 and 2 and our best runs achieved a performance accuracy of 0.852 and 0.584 respectively for the test sets. The models proposed for task 1 relied on BERT embeddings and different ensemble techniques. For the RQE task, we trained a traditional multilayer perceptron network based on embeddings generated by the universal sentence encoder.
Tasks Natural Language Inference, Question Answering
Published 2019-08-01
URL https://www.aclweb.org/anthology/W19-5053/
PDF https://www.aclweb.org/anthology/W19-5053
PWC https://paperswithcode.com/paper/uu_tails-at-mediqa-2019-learning-textual
Repo
Framework

A Crowdsourcing-based Approach for Speech Corpus Transcription Case of Arabic Algerian Dialects

Title A Crowdsourcing-based Approach for Speech Corpus Transcription Case of Arabic Algerian Dialects
Authors Ilyes Zine, Mohamed Cherif Zeghad, Soumia Bougrine, Hadda Cherroun
Abstract
Tasks
Published 2019-09-01
URL https://www.aclweb.org/anthology/W19-7411/
PDF https://www.aclweb.org/anthology/W19-7411
PWC https://paperswithcode.com/paper/a-crowdsourcing-based-approach-for-speech
Repo
Framework