January 24, 2020

2210 words 11 mins read

Paper Group NANR 134

Team Peter-Parker at SemEval-2019 Task 4: BERT-Based Method in Hyperpartisan News Detection

Title Team Peter-Parker at SemEval-2019 Task 4: BERT-Based Method in Hyperpartisan News Detection
Authors Zhiyuan Ning, Yuanzhen Lin, Ruichao Zhong
Abstract This paper describes team peter-parker's participation in the Hyperpartisan News Detection task (SemEval-2019 Task 4), which requires classifying whether a given news article is biased or not. We used Java to build the article parsing tool and a BERT-based model for bias prediction. We also present experimental results with analysis.
Tasks
Published 2019-06-01
URL https://www.aclweb.org/anthology/S19-2181/
PDF https://www.aclweb.org/anthology/S19-2181
PWC https://paperswithcode.com/paper/team-peter-parker-at-semeval-2019-task-4-bert
Repo
Framework

The University of Maryland’s Kazakh-English Neural Machine Translation System at WMT19

Title The University of Maryland’s Kazakh-English Neural Machine Translation System at WMT19
Authors Eleftheria Briakou, Marine Carpuat
Abstract This paper describes the University of Maryland's submission to the WMT 2019 Kazakh-English news translation task. We study the impact of transfer learning from another low-resource but related language. We experiment with different ways of encoding lexical units to maximize lexical overlap between the two language pairs, as well as back-translation and ensembling. The submitted system improves over a Kazakh-only baseline by +5.45 BLEU on newstest2019.
Tasks Machine Translation, Transfer Learning
Published 2019-08-01
URL https://www.aclweb.org/anthology/W19-5308/
PDF https://www.aclweb.org/anthology/W19-5308
PWC https://paperswithcode.com/paper/the-university-of-marylands-kazakh-english
Repo
Framework
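
The abstract's notion of maximizing lexical overlap between the two language pairs can be made concrete with a toy measurement. A minimal sketch, assuming a character-bigram "subword" scheme and invented example corpora (the paper experiments with real subword segmentations, not this heuristic):

```python
# Toy sketch: measuring lexical overlap between two corpora's subword
# vocabularies. Character bigrams stand in for learned subword units.

def subword_units(word, n=2):
    """Split a word into overlapping character n-grams (a toy subword scheme)."""
    if len(word) <= n:
        return {word}
    return {word[i:i + n] for i in range(len(word) - n + 1)}

def vocabulary(corpus):
    """Collect the subword vocabulary of a whitespace-tokenized corpus."""
    units = set()
    for sentence in corpus:
        for word in sentence.lower().split():
            units |= subword_units(word)
    return units

def lexical_overlap(corpus_a, corpus_b):
    """Jaccard overlap between the subword vocabularies of two corpora."""
    va, vb = vocabulary(corpus_a), vocabulary(corpus_b)
    return len(va & vb) / len(va | vb)
```

Encodings that push this overlap higher let the related language's data reinforce more of the low-resource pair's vocabulary during transfer.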

Crowdsourcing Discourse Relation Annotations by a Two-Step Connective Insertion Task

Title Crowdsourcing Discourse Relation Annotations by a Two-Step Connective Insertion Task
Authors Frances Yung, Vera Demberg, Merel Scholman
Abstract The perspective of being able to crowd-source coherence relations bears the promise of acquiring annotations for new texts quickly, which could then increase the size and variety of discourse-annotated corpora. It would also open the avenue to answering new research questions: Collecting annotations from a larger number of individuals per instance would allow us to investigate the distribution of inferred relations and to study individual differences in coherence relation interpretation. However, annotating coherence relations with untrained workers is not trivial. We here propose a novel two-step annotation procedure, which extends an earlier method by Scholman and Demberg (2017a). In our approach, coherence relation labels are inferred from connectives that workers insert into the text. We show that the proposed method leads to replicable coherence annotations, and analyse the agreement between the obtained relation labels and annotations from PDTB and RST-DT on the same texts.
Tasks
Published 2019-08-01
URL https://www.aclweb.org/anthology/W19-4003/
PDF https://www.aclweb.org/anthology/W19-4003
PWC https://paperswithcode.com/paper/crowdsourcing-discourse-relation-annotations
Repo
Framework
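
The label-inference step can be sketched in a few lines: workers insert a connective between two discourse segments, and a relation label is read off from it; multiple workers' insertions yield a label distribution. The connective-to-relation mapping below is a small illustrative subset, not the paper's actual mapping:

```python
# Sketch of inferring coherence relation labels from worker-inserted
# connectives, then aggregating several workers' choices per instance.
from collections import Counter

CONNECTIVE_TO_RELATION = {  # illustrative subset only
    "because": "Cause",
    "so": "Result",
    "however": "Contrast",
    "for example": "Instantiation",
    "afterwards": "Temporal",
}

def infer_relation(inserted_connective):
    """Map one worker-inserted connective to a coherence relation label."""
    return CONNECTIVE_TO_RELATION.get(inserted_connective.lower(), "Unknown")

def majority_relation(connectives):
    """Return the majority label and the full label distribution,
    which is what many annotators per instance makes observable."""
    labels = Counter(infer_relation(c) for c in connectives)
    return labels.most_common(1)[0][0], dict(labels)
```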

Times Are Changing: Investigating the Pace of Language Change in Diachronic Word Embeddings

Title Times Are Changing: Investigating the Pace of Language Change in Diachronic Word Embeddings
Authors Stephanie Brandl, David Lassner
Abstract We propose Word Embedding Networks, a novel method that is able to learn word embeddings of individual data slices while simultaneously aligning and ordering them without feeding temporal information a priori to the model. This gives us the opportunity to analyse the dynamics in word embeddings on a large scale in a purely data-driven manner. In experiments on two different newspaper corpora, the New York Times (English) and die Zeit (German), we were able to show that time actually determines the dynamics of semantic change. However, there is by no means a uniform evolution, but instead times of faster and times of slower change.
Tasks Word Embeddings
Published 2019-08-01
URL https://www.aclweb.org/anthology/W19-4718/
PDF https://www.aclweb.org/anthology/W19-4718
PWC https://paperswithcode.com/paper/times-are-changing-investigating-the-pace-of
Repo
Framework

The making of the Litkey Corpus, a richly annotated longitudinal corpus of German texts written by primary school children

Title The making of the Litkey Corpus, a richly annotated longitudinal corpus of German texts written by primary school children
Authors Ronja Laarmann-Quante, Stefanie Dipper, Eva Belke
Abstract To date, corpus and computational linguistic work on written language acquisition has mostly dealt with second language learners who have usually already mastered orthography acquisition in their first language. In this paper, we present the Litkey Corpus, a richly-annotated longitudinal corpus of written texts produced by primary school children in Germany from grades 2 to 4. The paper focuses on the (semi-)automatic annotation procedure at various linguistic levels, which include POS tags, features of the word-internal structure (phonemes, syllables, morphemes) and key orthographic features of the target words as well as a categorization of spelling errors. Comprehensive evaluations show that high accuracy was achieved on all levels, making the Litkey Corpus a useful resource for corpus-based research on literacy acquisition of German primary school children and for developing NLP tools for educational purposes. The corpus is freely available under https://www.linguistics.rub.de/litkeycorpus/.
Tasks Language Acquisition
Published 2019-08-01
URL https://www.aclweb.org/anthology/W19-4006/
PDF https://www.aclweb.org/anthology/W19-4006
PWC https://paperswithcode.com/paper/the-making-of-the-litkey-corpus-a-richly
Repo
Framework
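
One of the annotation layers, the categorization of spelling errors, can be illustrated with a toy classifier. The coarse length-based categories below are invented for illustration; the corpus uses a far more fine-grained, linguistically informed scheme:

```python
# Toy sketch of categorizing a child's spelling against the target word.
# The category set and heuristic are illustrative assumptions only.

def categorize_spelling(produced, target):
    """Assign a coarse error category by comparing the produced spelling
    with the orthographic target word."""
    if produced == target:
        return "correct"
    if len(produced) < len(target):
        return "omission"      # e.g. target "Hase" written as "Hse"
    if len(produced) > len(target):
        return "insertion"     # e.g. target "Hase" written as "Haase"
    return "substitution"      # same length, different letters
```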

Proceedings of the Fourth Workshop on Discourse in Machine Translation (DiscoMT 2019)

Title Proceedings of the Fourth Workshop on Discourse in Machine Translation (DiscoMT 2019)
Authors
Abstract
Tasks Machine Translation
Published 2019-11-01
URL https://www.aclweb.org/anthology/D19-6500/
PDF https://www.aclweb.org/anthology/D19-6500
PWC https://paperswithcode.com/paper/proceedings-of-the-fourth-workshop-on-5
Repo
Framework

Zero-Resource Neural Machine Translation with Monolingual Pivot Data

Title Zero-Resource Neural Machine Translation with Monolingual Pivot Data
Authors Anna Currey, Kenneth Heafield
Abstract Zero-shot neural machine translation (NMT) is a framework that uses source-pivot and target-pivot parallel data to train a source-target NMT system. An extension to zero-shot NMT is zero-resource NMT, which generates pseudo-parallel corpora using a zero-shot system and further trains the zero-shot system on that data. In this paper, we expand on zero-resource NMT by incorporating monolingual data in the pivot language into training; since the pivot language is usually the highest-resource language of the three, we expect monolingual pivot-language data to be most abundant. We propose methods for generating pseudo-parallel corpora using pivot-language monolingual data and for leveraging the pseudo-parallel corpora to improve the zero-shot NMT system. We evaluate these methods for a high-resource language pair (German-Russian) using English as the pivot. We show that our proposed methods yield consistent improvements over strong zero-shot and zero-resource baselines and even catch up to pivot-based models in BLEU (while not requiring the two-pass inference that pivot models require).
Tasks Machine Translation
Published 2019-11-01
URL https://www.aclweb.org/anthology/D19-5610/
PDF https://www.aclweb.org/anthology/D19-5610
PWC https://paperswithcode.com/paper/zero-resource-neural-machine-translation-with-1
Repo
Framework
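
The data flow for generating pseudo-parallel corpora from pivot-language monolingual text can be sketched as follows. The two translation functions are hypothetical stand-ins for trained pivot-to-source and pivot-to-target NMT systems; here they are stub dictionaries purely to show the pipeline shape:

```python
# Sketch: building a pseudo-parallel source-target corpus from monolingual
# pivot (English) data, with stub "translators" in place of real NMT models.

PIVOT_TO_SOURCE = {"the cat sleeps": "die katze schlaeft"}  # English -> German stub
PIVOT_TO_TARGET = {"the cat sleeps": "kot spit"}            # English -> Russian stub

def translate_to_source(pivot_sentence):
    return PIVOT_TO_SOURCE[pivot_sentence]

def translate_to_target(pivot_sentence):
    return PIVOT_TO_TARGET[pivot_sentence]

def pseudo_parallel_corpus(pivot_monolingual):
    """Translate each pivot sentence into both source and target, pairing
    the outputs into source-target training examples without any parallel
    source-target data."""
    return [(translate_to_source(p), translate_to_target(p))
            for p in pivot_monolingual]
```

Because the pivot language is typically the highest-resource of the three, this lets the most abundant monolingual data drive training of the zero-resource direction.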

Room to Glo: A Systematic Comparison of Semantic Change Detection Approaches with Word Embeddings

Title Room to Glo: A Systematic Comparison of Semantic Change Detection Approaches with Word Embeddings
Authors Philippa Shoemark, Farhana Ferdousi Liza, Dong Nguyen, Scott Hale, Barbara McGillivray
Abstract Word embeddings are increasingly used for the automatic detection of semantic change; yet, a robust evaluation and systematic comparison of the choices involved has been lacking. We propose a new evaluation framework for semantic change detection and find that (i) using the whole time series is preferable over only comparing between the first and last time points; (ii) independently trained and aligned embeddings perform better than continuously trained embeddings for long time periods; and (iii) that the reference point for comparison matters. We also present an analysis of the changes detected on a large Twitter dataset spanning 5.5 years.
Tasks Time Series, Word Embeddings
Published 2019-11-01
URL https://www.aclweb.org/anthology/D19-1007/
PDF https://www.aclweb.org/anthology/D19-1007
PWC https://paperswithcode.com/paper/room-to-glo-a-systematic-comparison-of
Repo
Framework
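
Finding (i), that the whole time series beats a first-vs-last comparison, and finding (iii), that the reference point matters, can both be illustrated on toy vectors. A minimal sketch, assuming invented per-time-slice embeddings for one word (real ones would come from aligned diachronic models):

```python
# Sketch: first-vs-last change score versus a full time series of distances
# to a chosen reference point, using cosine similarity on toy embeddings.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def change_first_vs_last(series):
    """Single score: cosine distance between first and last time points."""
    return 1.0 - cosine(series[0], series[-1])

def change_time_series(series, reference=0):
    """Distance of every time point to a reference point; using the whole
    series reveals when change happens, and the reference choice matters."""
    ref = series[reference]
    return [1.0 - cosine(v, ref) for v in series]
```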

Attention and Lexicon Regularized LSTM for Aspect-based Sentiment Analysis

Title Attention and Lexicon Regularized LSTM for Aspect-based Sentiment Analysis
Authors Lingxian Bao, Patrik Lambert, Toni Badia
Abstract Attention-based deep learning systems have been demonstrated to be the state-of-the-art approach for aspect-level sentiment analysis. However, end-to-end deep neural networks lack flexibility, as one cannot easily adjust the network to fix an obvious problem, especially when more training data is not available: e.g., when it always predicts "positive" on seeing the word "disappointed". Meanwhile, it is less often noted that the attention mechanism is likely to "over-focus" on particular parts of a sentence, while ignoring positions which provide key information for judging the polarity. In this paper, we describe a simple yet effective approach to leverage lexicon information so that the model becomes more flexible and robust. We also explore the effect of regularizing attention vectors to allow the network to have a broader "focus" on different parts of the sentence. The experimental results demonstrate the effectiveness of our approach.
Tasks Aspect-Based Sentiment Analysis, Sentiment Analysis
Published 2019-07-01
URL https://www.aclweb.org/anthology/P19-2035/
PDF https://www.aclweb.org/anthology/P19-2035
PWC https://paperswithcode.com/paper/attention-and-lexicon-regularized-lstm-for
Repo
Framework
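
The two ideas in the abstract can be sketched on toy numbers: blending the network's polarity score with a lexicon score, and penalizing low-entropy (over-focused) attention distributions. The lexicon contents, weights, and scores below are illustrative assumptions, not the paper's actual formulation:

```python
# Sketch: (i) lexicon-blended polarity scoring and (ii) an entropy-based
# penalty that discourages attention from concentrating on one position.
import math

LEXICON = {"disappointed": -1.0, "great": 1.0}  # toy polarity lexicon

def blended_score(model_score, tokens, lex_weight=0.5):
    """Mix the model's score with the mean lexicon polarity of the tokens."""
    hits = [LEXICON[t] for t in tokens if t in LEXICON]
    lex_score = sum(hits) / len(hits) if hits else 0.0
    return (1 - lex_weight) * model_score + lex_weight * lex_score

def attention_entropy(weights):
    """Shannon entropy of an attention distribution; low entropy means the
    attention is concentrated ("over-focused") on few positions."""
    return -sum(w * math.log(w) for w in weights if w > 0)

def entropy_penalty(weights, target_entropy):
    """Regularization term that grows as attention gets more peaked."""
    return max(0.0, target_entropy - attention_entropy(weights))
```

Note how the lexicon term can flip a wrongly positive model score for "disappointed" to negative, which is exactly the kind of fix that is hard to make inside an end-to-end network.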

Entity-level Classification of Adverse Drug Reactions: a Comparison of Neural Network Models

Title Entity-level Classification of Adverse Drug Reactions: a Comparison of Neural Network Models
Authors Ilseyar Alimova, Elena Tutubalina
Abstract This paper presents our experimental work on exploring the potential of neural network models developed for aspect-based sentiment analysis for entity-level adverse drug reaction (ADR) classification. Our goal is to explore how to represent local context around ADR mentions and learn an entity representation, interacting with its context. We conducted extensive experiments on various sources of text-based information, including social media, electronic health records, and abstracts of scientific articles from PubMed. The results show that Interactive Attention Neural Network (IAN) outperformed other models on four corpora in terms of macro F-measure. This work is an abridged version of our recent paper accepted to Programming and Computer Software journal in 2019.
Tasks Aspect-Based Sentiment Analysis, Sentiment Analysis
Published 2019-08-01
URL https://www.aclweb.org/anthology/papers/W/W19/W19-3641/
PDF https://www.aclweb.org/anthology/W19-3641
PWC https://paperswithcode.com/paper/entity-level-classification-of-adverse-drug
Repo
Framework
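
The macro F-measure used to compare the models averages per-class F1 with equal class weights, so performance on the rarer ADR class counts as much as on the majority class. A minimal sketch on a toy binary ADR/non-ADR example:

```python
# Sketch of macro F-measure: per-class F1 scores averaged with equal weight.

def f1_for_class(gold, pred, cls):
    """F1 for one class from true/false positives and false negatives."""
    tp = sum(1 for g, p in zip(gold, pred) if g == cls and p == cls)
    fp = sum(1 for g, p in zip(gold, pred) if g != cls and p == cls)
    fn = sum(1 for g, p in zip(gold, pred) if g == cls and p != cls)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over all observed classes."""
    classes = sorted(set(gold) | set(pred))
    return sum(f1_for_class(gold, pred, c) for c in classes) / len(classes)
```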

A Working Memory Model for Task-oriented Dialog Response Generation

Title A Working Memory Model for Task-oriented Dialog Response Generation
Authors Xiuyi Chen, Jiaming Xu, Bo Xu
Abstract Recently, to incorporate external Knowledge Base (KB) information, one form of world knowledge, several end-to-end task-oriented dialog systems have been proposed. These models, however, tend to confound the dialog history with KB tuples and simply store them into one memory. Inspired by the psychological studies on working memory, we propose a working memory model (WMM2Seq) for dialog response generation. Our WMM2Seq adopts a working memory to interact with two separated long-term memories, which are the episodic memory for memorizing dialog history and the semantic memory for storing KB tuples. The working memory consists of a central executive to attend to the aforementioned memories, and a short-term storage system to store the "activated" contents from the long-term memories. Furthermore, we introduce a context-sensitive perceptual process for the token representations of dialog history, and then feed them into the episodic memory. Extensive experiments on two task-oriented dialog datasets demonstrate that our WMM2Seq significantly outperforms the state-of-the-art results in several evaluation metrics.
Tasks
Published 2019-07-01
URL https://www.aclweb.org/anthology/P19-1258/
PDF https://www.aclweb.org/anthology/P19-1258
PWC https://paperswithcode.com/paper/a-working-memory-model-for-task-oriented
Repo
Framework
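
The memory layout described in the abstract can be sketched structurally: a working memory whose central executive attends over two separate long-term stores and copies the most "activated" items into short-term storage. Scoring by word overlap below is an illustrative stand-in for the paper's learned attention:

```python
# Structural sketch of WMM2Seq's memory organization: episodic memory
# (dialog history) and semantic memory (KB tuples) queried by a central
# executive, with top activations kept in a short-term store.

def activation(query, item):
    """Toy attention score: word overlap between the query and a memory item."""
    q, i = set(query.lower().split()), set(item.lower().split())
    return len(q & i)

class WorkingMemory:
    def __init__(self, episodic, semantic, capacity=2):
        self.episodic = list(episodic)   # dialog-history utterances
        self.semantic = list(semantic)   # flattened KB tuples
        self.capacity = capacity
        self.short_term = []             # "activated" contents

    def attend(self, query):
        """Central executive: score both long-term stores against the query
        and keep the most activated items in short-term storage."""
        scored = [(activation(query, m), m)
                  for m in self.episodic + self.semantic]
        scored.sort(key=lambda s: s[0], reverse=True)
        self.short_term = [m for score, m in scored[:self.capacity] if score > 0]
        return self.short_term
```

Keeping the two stores separate is the point: the model can attend to history and KB facts with different dynamics instead of confounding them in one memory.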

Building a Comprehensive Romanian Knowledge Base for Drug Administration

Title Building a Comprehensive Romanian Knowledge Base for Drug Administration
Authors Bogdan Nicula, Mihai Dascalu, Maria-Dorinela Sîrbu, Ștefan Trăușan-Matu, Alexandru Nuță
Abstract Information on drug administration is obtained traditionally from doctors and pharmacists, as well as leaflets which provide in most cases cumbersome and hard-to-follow details. Thus, the need for medical knowledge bases emerges to provide access to concrete and well-structured information which can play an important role in informing patients. This paper introduces a Romanian medical knowledge base focused on drug-drug interactions, on representing relevant drug information, and on symptom-disease relations. The knowledge base was created by extracting and transforming information using Natural Language Processing techniques from both structured and unstructured sources, together with manual annotations. The resulting Romanian ontologies are aligned with larger English medical ontologies. Our knowledge base supports queries regarding drugs (e.g., active ingredients, concentration, expiration date), drug-drug interaction, symptom-disease relations, as well as drug-symptom relations.
Tasks
Published 2019-09-01
URL https://www.aclweb.org/anthology/R19-1096/
PDF https://www.aclweb.org/anthology/R19-1096
PWC https://paperswithcode.com/paper/building-a-comprehensive-romanian-knowledge
Repo
Framework
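
The query types listed in the abstract can be illustrated over a hand-made toy KB. The drug names, interactions, and symptom-disease links below are invented examples, not data from the actual Romanian resource:

```python
# Toy knowledge base supporting the abstract's query types: drug attributes,
# drug-drug interactions, and symptom-disease relations.

DRUGS = {
    "aspirin": {"active_ingredient": "acetylsalicylic acid", "concentration": "500 mg"},
    "warfarin": {"active_ingredient": "warfarin sodium", "concentration": "5 mg"},
}
# frozenset keys make interaction lookup order-independent.
INTERACTIONS = {frozenset({"aspirin", "warfarin"}): "increased bleeding risk"}
SYMPTOM_DISEASE = {"fever": ["influenza", "infection"]}

def drug_info(name, field):
    """Attribute query, e.g. active ingredient or concentration."""
    return DRUGS[name][field]

def interaction(drug_a, drug_b):
    """Drug-drug interaction query, symmetric in its arguments."""
    return INTERACTIONS.get(frozenset({drug_a, drug_b}), "none known")

def diseases_for_symptom(symptom):
    """Symptom-disease relation query."""
    return SYMPTOM_DISEASE.get(symptom, [])
```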

Presupposition Projection and Repair Strategies in Trivalent Semantics

Title Presupposition Projection and Repair Strategies in Trivalent Semantics
Authors Yoad Winter
Abstract
Tasks
Published 2019-07-01
URL https://www.aclweb.org/anthology/W19-5703/
PDF https://www.aclweb.org/anthology/W19-5703
PWC https://paperswithcode.com/paper/presupposition-projection-and-repair
Repo
Framework

Nonsense!: Quality Control via Two-Step Reason Selection for Annotating Local Acceptability and Related Attributes in News Editorials

Title Nonsense!: Quality Control via Two-Step Reason Selection for Annotating Local Acceptability and Related Attributes in News Editorials
Authors Wonsuk Yang, Seungwon Yoon, Ada Carpenter, Jong Park
Abstract Annotation quality control is a critical aspect of building reliable corpora through linguistic annotation. In this study, we present a simple but powerful quality control method using two-step reason selection. We gathered sentential annotations of local acceptability and three related attributes through a crowdsourcing platform. For each attribute, the reason for the choice of the attribute value is selected in a two-step manner. The options given for reason selection were designed to facilitate the detection of a nonsensical reason selection. We assume that a sentential annotation that contains a nonsensical reason is less reliable than one without such a reason. Our method, based solely on this assumption, is found to retain the annotations of satisfactory quality from a full set that also contains low-quality ones.
Tasks
Published 2019-11-01
URL https://www.aclweb.org/anthology/D19-1293/
PDF https://www.aclweb.org/anthology/D19-1293
PWC https://paperswithcode.com/paper/nonsense-quality-control-via-two-step-reason
Repo
Framework
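
The core assumption, that an annotation containing a nonsensical reason selection is unreliable and should be discarded, reduces to a simple filter. The reason options and the nonsense marker below are illustrative; the paper's actual option design is richer:

```python
# Sketch of the quality-control filter: drop any annotation whose two-step
# reason selections include a planted nonsensical option.

NONSENSE_REASONS = {"nonsense"}  # illustrative distractor option(s)

def is_reliable(annotation):
    """An annotation survives only if none of its selected reasons is one
    of the planted nonsensical options."""
    return all(r not in NONSENSE_REASONS for r in annotation["reasons"])

def filter_annotations(annotations):
    """Retain only the annotations judged reliable by the reason check."""
    return [a for a in annotations if is_reliable(a)]
```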

The IIIT-H Gujarati-English Machine Translation System for WMT19

Title The IIIT-H Gujarati-English Machine Translation System for WMT19
Authors Vikrant Goyal, Dipti Misra Sharma
Abstract This paper describes the Neural Machine Translation system of IIIT-Hyderabad for the Gujarati→English news translation shared task of WMT19. Our system is based on an encoder-decoder framework with an attention mechanism. We experimented with multilingual neural MT models. Our experiments show that multilingual neural machine translation leveraging parallel data from related language pairs yields significant BLEU improvements of up to 11.5 for low-resource language pairs like Gujarati-English.
Tasks Machine Translation
Published 2019-08-01
URL https://www.aclweb.org/anthology/W19-5316/
PDF https://www.aclweb.org/anthology/W19-5316
PWC https://paperswithcode.com/paper/the-iiit-h-gujarati-english-machine
Repo
Framework
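
The multilingual setup of mixing the low-resource pair with a related language pair can be sketched as simple corpus construction. The language-tag scheme and example sentence pairs are illustrative assumptions, not the paper's exact configuration:

```python
# Sketch: building one multilingual training set by tagging and concatenating
# a low-resource pair (Gujarati-English) with a related pair (Hindi-English),
# so a single model learns from both.

def tag_corpus(pairs, src_tag):
    """Prefix each source sentence with a language tag token."""
    return [(f"{src_tag} {src}", tgt) for src, tgt in pairs]

def build_multilingual_training_set(gu_en, hi_en):
    """Concatenate tagged low-resource and related-language parallel data."""
    return tag_corpus(gu_en, "<gu>") + tag_corpus(hi_en, "<hi>")
```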