January 25, 2020


Paper Group NANR 82


Incorporating Source Syntax into Transformer-Based Neural Machine Translation


Title	Incorporating Source Syntax into Transformer-Based Neural Machine Translation
Authors	Anna Currey, Kenneth Heafield
Abstract	Transformer-based neural machine translation (NMT) has recently achieved state-of-the-art performance on many machine translation tasks. However, recent work (Raganato and Tiedemann, 2018; Tang et al., 2018; Tran et al., 2018) has indicated that Transformer models may not learn syntactic structures as well as their recurrent neural network-based counterparts, particularly in low-resource cases. In this paper, we incorporate constituency parse information into a Transformer NMT model. We leverage linearized parses of the source training sentences in order to inject syntax into the Transformer architecture without modifying it. We introduce two methods: a multi-task machine translation and parsing model with a single encoder and decoder, and a mixed encoder model that learns to translate directly from parsed and unparsed source sentences. We evaluate our methods on low-resource translation from English into twenty target languages, showing consistent improvements of 1.3 BLEU on average across diverse target languages for the multi-task technique. We further evaluate the models on full-scale WMT tasks, finding that the multi-task model aids low- and medium-resource NMT but degenerates high-resource English-German translation.
Tasks	Machine Translation
Published	2019-08-01
URL	https://www.aclweb.org/anthology/W19-5203/
PDF	https://www.aclweb.org/anthology/W19-5203
PWC	https://paperswithcode.com/paper/incorporating-source-syntax-into-transformer
Repo
Framework
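
The abstract above relies on linearized parses of the source sentences. A minimal sketch of what linearizing a constituency tree can look like follows; the tree encoding, the bracket token format, and the example sentence are illustrative assumptions, not the format used by the authors.

```python
from typing import List, Union

# Assumed encoding: a tree is either a token (str) or a (label, children) pair.
Tree = Union[str, tuple]


def linearize(tree: Tree) -> List[str]:
    """Flatten a constituency tree into a bracketed token sequence."""
    if isinstance(tree, str):
        # Leaf: an ordinary source word.
        return [tree]
    label, children = tree
    tokens = ["(" + label]  # opening bracket carries the constituent label
    for child in children:
        tokens.extend(linearize(child))
    tokens.append(")")  # closing bracket
    return tokens


# Hypothetical example parse, not taken from the paper.
parse = ("S", [("NP", ["She"]), ("VP", ["reads", ("NP", ["books"])])])
print(" ".join(linearize(parse)))
# prints: (S (NP She ) (VP reads (NP books ) ) )
```

A sequence like this can be fed to an unmodified sequence-to-sequence model as ordinary tokens, which is why this kind of approach needs no architectural change.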

Smoothing the Geometry of Probabilistic Box Embeddings


Title	Smoothing the Geometry of Probabilistic Box Embeddings
Authors	Xiang Li, Luke Vilnis, Dongxu Zhang, Michael Boratko, Andrew McCallum
Abstract	There is growing interest in geometrically-inspired embeddings for learning hierarchies, partial orders, and lattice structures, with natural applications to transitive relational data such as entailment graphs. Recent work has extended these ideas beyond deterministic hierarchies to probabilistically calibrated models, which enable learning from uncertain supervision and inferring soft-inclusions among concepts, while maintaining the geometric inductive bias of hierarchical embedding models. We build on the Box Lattice model of Vilnis et al. (2018), which showed promising results in modeling soft-inclusions through an overlapping hierarchy of sets, parameterized as high-dimensional hyperrectangles (boxes). However, the hard edges of the boxes present difficulties for standard gradient based optimization; that work employed a special surrogate function for the disjoint case, but we find this method to be fragile. In this work, we present a novel hierarchical embedding model, inspired by a relaxation of box embeddings into parameterized density functions using Gaussian convolutions over the boxes. Our approach provides an alternative surrogate to the original lattice measure that improves the robustness of optimization in the disjoint case, while also preserving the desirable properties with respect to the original lattice. We demonstrate increased or matching performance on WordNet hypernymy prediction, Flickr caption entailment, and a MovieLens-based market basket dataset. We show especially marked improvements in the case of sparse data, where many conditional probabilities should be low, and thus boxes should be nearly disjoint.
Tasks
Published	2019-05-01
URL	https://openreview.net/forum?id=H1xSNiRcF7
PDF	https://openreview.net/pdf?id=H1xSNiRcF7
PWC	https://paperswithcode.com/paper/smoothing-the-geometry-of-probabilistic-box
Repo
Framework
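
The optimization problem described in the abstract above can be seen in a few lines: the overlap volume of two disjoint boxes is exactly zero, so it gives no gradient signal. The sketch below contrasts a hard overlap with a smoothed one. The softplus smoothing and the `temperature` parameter are a simple stand-in chosen for illustration; this is not the authors' implementation, and the box coordinates are made up.

```python
import math


def softplus(x: float, temperature: float = 1.0) -> float:
    """Numerically stable softplus: temperature * log(1 + exp(x / temperature))."""
    z = x / temperature
    return temperature * (max(z, 0.0) + math.log1p(math.exp(-abs(z))))


def hard_overlap(box_a, box_b) -> float:
    """Volume of the intersection of two boxes given as [(low, high), ...]."""
    volume = 1.0
    for (a_lo, a_hi), (b_lo, b_hi) in zip(box_a, box_b):
        side = min(a_hi, b_hi) - max(a_lo, b_lo)
        volume *= max(side, 0.0)  # clamps to zero when the boxes are disjoint
    return volume


def smooth_overlap(box_a, box_b, temperature: float = 1.0) -> float:
    """Same product of side lengths, with the hard clamp replaced by softplus."""
    volume = 1.0
    for (a_lo, a_hi), (b_lo, b_hi) in zip(box_a, box_b):
        side = min(a_hi, b_hi) - max(a_lo, b_lo)
        volume *= softplus(side, temperature)
    return volume


# Two hypothetical 2-D boxes that are disjoint along the first dimension.
box_a = [(0.0, 1.0), (0.0, 1.0)]
box_b = [(2.0, 3.0), (0.0, 1.0)]
print(hard_overlap(box_a, box_b))    # zero: no signal for how far apart the boxes are
print(smooth_overlap(box_a, box_b))  # small but positive, and it shrinks as the gap grows
```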

Improving machine classification using human uncertainty measurements


Title	Improving machine classification using human uncertainty measurements
Authors	Ruairidh M. Battleday, Joshua C. Peterson, Thomas L. Griffiths
Abstract	As deep CNN classifier performance using ground-truth labels has begun to asymptote at near-perfect levels, a key aim for the field is to extend training paradigms to capture further useful structure in natural image data and improve model robustness and generalization. In this paper, we present a novel natural image benchmark for making this extension, which we call CIFAR10H. This new dataset comprises a human-derived, full distribution over labels for each image of the CIFAR10 test set, offering the ability to assess the generalization of state-of-the-art CIFAR10 models, as well as investigate the effects of including this information in model training. We show that classification models trained on CIFAR10 do not generalize as well to our dataset as they do to traditional extensions, and that models fine-tuned using our label information are able to generalize better to related datasets, complement popular data augmentation schemes, and provide robustness to adversarial attacks. We explain these improvements in terms of better empirical approximations to the expected loss function over natural images and their categories in the visual world.
Tasks	Data Augmentation
Published	2019-05-01
URL	https://openreview.net/forum?id=rJl8BhRqF7
PDF	https://openreview.net/pdf?id=rJl8BhRqF7
PWC	https://paperswithcode.com/paper/improving-machine-classification-using-human
Repo
Framework
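
Training against a full human label distribution, as the abstract above describes, amounts to replacing the one-hot target in the cross-entropy loss with a soft target. The following is a minimal sketch of that loss only; the three-class distributions are invented numbers, not CIFAR10H data, and the `eps` guard is an implementation assumption.

```python
import math


def cross_entropy(target, predicted, eps: float = 1e-12) -> float:
    """Cross-entropy H(target, predicted) for two discrete distributions."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))


# Hypothetical three-class example.
one_hot_label = [1.0, 0.0, 0.0]   # conventional ground-truth label
human_labels = [0.7, 0.3, 0.0]    # share of annotators choosing each class
model_output = [0.6, 0.35, 0.05]  # model's predicted probabilities

print(cross_entropy(one_hot_label, model_output))
print(cross_entropy(human_labels, model_output))
```

With the soft target, probability mass the model puts on the second class is no longer treated as pure error, because some annotators chose that class too.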

Data Programming for Learning Discourse Structure


Title	Data Programming for Learning Discourse Structure
Authors	Sonia Badene, Kate Thompson, Jean-Pierre Lorré, Nicholas Asher
Abstract	This paper investigates the advantages and limits of data programming for the task of learning discourse structure. The data programming paradigm implemented in the Snorkel framework allows a user to label training data using expert-composed heuristics, which are then transformed via the "generative step" into probability distributions of the class labels given the training candidates. These results are later generalized using a discriminative model. Snorkel's attractive promise to create a large amount of annotated data from a smaller set of training data by unifying the output of a set of heuristics has yet to be used for computationally difficult tasks, such as that of discourse attachment, in which one must decide where a given discourse unit attaches to other units in a text in order to form a coherent discourse structure. Although approaching this problem using Snorkel requires significant modifications to the structure of the heuristics, we show that weak supervision methods can be more than competitive with classical supervised learning approaches to the attachment problem.
Tasks
Published	2019-07-01
URL	https://www.aclweb.org/anthology/P19-1061/
PDF	https://www.aclweb.org/anthology/P19-1061
PWC	https://paperswithcode.com/paper/data-programming-for-learning-discourse
Repo
Framework
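
The data programming idea in the abstract above, heuristics whose votes are combined into a probability over class labels, can be sketched without any framework. In the example below, the three labeling functions, the candidate fields (`distance`, `same_speaker`), and the unweighted vote are all hypothetical. Snorkel's generative step learns how much to trust each heuristic; the plain vote here is a simplified stand-in and does not use the Snorkel API.

```python
ABSTAIN, NOT_ATTACHED, ATTACHED = -1, 0, 1


# Hypothetical heuristics over a candidate pair of discourse units.
def lf_adjacent(candidate):
    return ATTACHED if candidate["distance"] == 1 else ABSTAIN


def lf_far_apart(candidate):
    return NOT_ATTACHED if candidate["distance"] > 5 else ABSTAIN


def lf_same_speaker(candidate):
    return ATTACHED if candidate["same_speaker"] else ABSTAIN


LABELING_FUNCTIONS = [lf_adjacent, lf_far_apart, lf_same_speaker]


def attachment_probability(candidate, labeling_functions=LABELING_FUNCTIONS):
    """Fraction of non-abstaining heuristics that vote ATTACHED."""
    votes = [lf(candidate) for lf in labeling_functions]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return 0.5  # no heuristic fired: stay uninformative
    return sum(votes) / len(votes)


print(attachment_probability({"distance": 1, "same_speaker": True}))   # 1.0
print(attachment_probability({"distance": 8, "same_speaker": True}))   # 0.5
print(attachment_probability({"distance": 8, "same_speaker": False}))  # 0.0
```

The resulting probabilities would then serve as noisy training targets for a discriminative model.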

Teaching FORGe to Verbalize DBpedia Properties in Spanish


Title	Teaching FORGe to Verbalize DBpedia Properties in Spanish
Authors	Simon Mille, Stamatia Dasiopoulou, Beatriz Fisas, Leo Wanner
Abstract	Statistical generators increasingly dominate the research in NLG. However, grammar-based generators that are grounded in a solid linguistic framework remain very competitive, especially for generation from deep knowledge structures. Furthermore, if built modularly, they can be ported to other genres and languages with a limited amount of work, without the need of the annotation of a considerable amount of training data. One of these generators is FORGe, which is based on the Meaning-Text Model. In the recent WebNLG challenge (the first comprehensive task addressing the mapping of RDF triples to text) FORGe ranked first with respect to the overall quality in human evaluation. We extend the coverage of FORGe's open source grammatical and lexical resources for English, so as to further improve the English texts, and port them to Spanish, to achieve a comparable quality. This confirms that, as already observed in the case of SimpleNLG, a robust universal grammar-driven framework and a systematic organization of the linguistic resources can be an adequate choice for NLG applications.
Tasks
Published	2019-10-01
URL	https://www.aclweb.org/anthology/W19-8659/
PDF	https://www.aclweb.org/anthology/W19-8659
PWC	https://paperswithcode.com/paper/teaching-forge-to-verbalize-dbpedia
Repo
Framework

Sentence-Level Adaptation for Low-Resource Neural Machine Translation


Title	Sentence-Level Adaptation for Low-Resource Neural Machine Translation
Authors	Aaron Mueller, Yash Kumar Lal
Abstract
Tasks	Low-Resource Neural Machine Translation, Machine Translation
Published	2019-08-01
URL	https://www.aclweb.org/anthology/W19-6807/
PDF	https://www.aclweb.org/anthology/W19-6807
PWC	https://paperswithcode.com/paper/sentence-level-adaptation-for-low-resource
Repo
Framework

End-to-end Deep Reinforcement Learning Based Coreference Resolution


Title	End-to-end Deep Reinforcement Learning Based Coreference Resolution
Authors	Hongliang Fei, Xu Li, Dingcheng Li, Ping Li
Abstract	Recent neural network models have significantly advanced the task of coreference resolution. However, current neural coreference models are usually trained with heuristic loss functions that are computed over a sequence of local decisions. In this paper, we introduce an end-to-end reinforcement learning based coreference resolution model to directly optimize coreference evaluation metrics. Specifically, we modify the state-of-the-art higher-order mention ranking approach in Lee et al. (2018) to a reinforced policy gradient model by incorporating the reward associated with a sequence of coreference linking actions. Furthermore, we introduce maximum entropy regularization for adequate exploration to prevent the model from prematurely converging to a bad local optimum. Our proposed model achieves new state-of-the-art performance on the English OntoNotes v5.0 benchmark.
Tasks	Coreference Resolution
Published	2019-07-01
URL	https://www.aclweb.org/anthology/P19-1064/
PDF	https://www.aclweb.org/anthology/P19-1064
PWC	https://paperswithcode.com/paper/end-to-end-deep-reinforcement-learning-based
Repo
Framework
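
The abstract above combines a policy gradient objective with maximum entropy regularization. A minimal sketch of such a loss for a single action follows; the function names, the weight `beta`, and the example distributions are assumptions for illustration and do not reproduce the paper's model or reward.

```python
import math


def entropy(probs) -> float:
    """Shannon entropy of a discrete distribution, in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def regularized_loss(probs, action: int, reward: float, beta: float = 0.01) -> float:
    """REINFORCE-style loss for one action, minus an entropy bonus.

    Minimizing this raises the log-probability of rewarded actions while
    discouraging the policy from collapsing onto a single choice too early.
    """
    policy_term = -reward * math.log(probs[action])
    return policy_term - beta * entropy(probs)


# Two hypothetical policies that give the chosen antecedent the same probability.
peaked = [0.5, 0.5, 0.0]
spread = [0.5, 0.25, 0.25]
print(regularized_loss(peaked, action=0, reward=1.0, beta=0.1))
print(regularized_loss(spread, action=0, reward=1.0, beta=0.1))
```

Both policies have the same policy-gradient term, so the more spread-out one gets the lower loss purely from the entropy bonus.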

Semantic Change in the Language of UK Parliamentary Debates


Title	Semantic Change in the Language of UK Parliamentary Debates
Authors	Gavin Abercrombie, Riza Batista-Navarro
Abstract	We investigate changes in the meanings of words used in the UK Parliament across two different epochs. We use word embeddings to explore changes in the distribution of words of interest and uncover words that appear to have undergone semantic transformation in the intervening period, and explore different ways of obtaining target words for this purpose. We find that semantic changes are generally in line with those found in other corpora, and little evidence that parliamentary language is more static than general English. It also seems that words with senses that have been recorded in the dictionary as having fallen into disuse do not undergo semantic changes in this domain.
Tasks	Word Embeddings
Published	2019-08-01
URL	https://www.aclweb.org/anthology/W19-4726/
PDF	https://www.aclweb.org/anthology/W19-4726
PWC	https://paperswithcode.com/paper/semantic-change-in-the-language-of-uk
Repo
Framework

Approaching SMM4H with Merged Models and Multi-task Learning


Title	Approaching SMM4H with Merged Models and Multi-task Learning
Authors	Tilia Ellendorff, Lenz Furrer, Nicola Colic, Noëmi Aepli, Fabio Rinaldi
Abstract	We describe our submissions to the 4th edition of the Social Media Mining for Health Applications (SMM4H) shared task. Our team (UZH) participated in two sub-tasks: Automatic classifications of adverse effects mentions in tweets (Task 1) and Generalizable identification of personal health experience mentions (Task 4). For our submissions, we exploited ensembles based on a pre-trained language representation with a neural transformer architecture (BERT) (Tasks 1 and 4) and a CNN-BiLSTM(-CRF) network within a multi-task learning scenario (Task 1). These systems are placed on top of a carefully crafted pipeline of domain-specific preprocessing steps.
Tasks	Multi-Task Learning
Published	2019-08-01
URL	https://www.aclweb.org/anthology/W19-3208/
PDF	https://www.aclweb.org/anthology/W19-3208
PWC	https://paperswithcode.com/paper/approaching-smm4h-with-merged-models-and
Repo
Framework

Clark Kent at SemEval-2019 Task 4: Stylometric Insights into Hyperpartisan News Detection


Title	Clark Kent at SemEval-2019 Task 4: Stylometric Insights into Hyperpartisan News Detection
Authors	Viresh Gupta, Baani Leen Kaur Jolly, Ramneek Kaur, Tanmoy Chakraborty
Abstract	In this paper, we present a news bias prediction system, which we developed as part of a SemEval 2019 task. We developed an XGBoost based system which uses character and word level n-gram features represented using TF-IDF, count vector based correlation matrix, and predicts if an input news article is a hyperpartisan news article. Our model was able to achieve a precision of 68.3% on the test set provided by the contest organizers. We also run our model on the BuzzFeed corpus and find XGBoost with simple character level N-Gram embeddings to be performing well with an accuracy of around 96%.
Tasks
Published	2019-06-01
URL	https://www.aclweb.org/anthology/S19-2159/
PDF	https://www.aclweb.org/anthology/S19-2159
PWC	https://paperswithcode.com/paper/clark-kent-at-semeval-2019-task-4-stylometric
Repo
Framework

JHU System Description for the MADAR Arabic Dialect Identification Shared Task


Title	JHU System Description for the MADAR Arabic Dialect Identification Shared Task
Authors	Tom Lippincott, Pamela Shapiro, Kevin Duh, Paul McNamee
Abstract	Our submission to the MADAR shared task on Arabic dialect identification employed a language modeling technique called Prediction by Partial Matching, an ensemble of neural architectures, and sources of additional data for training word embeddings and auxiliary language models. We found several of these techniques provided small boosts in performance, though a simple character-level language model was a strong baseline, and a lower-order LM achieved best performance on Subtask 2. Interestingly, word embeddings provided no consistent benefit, and ensembling struggled to outperform the best component submodel. This suggests the variety of architectures are learning redundant information, and future work may focus on encouraging decorrelated learning.
Tasks	Language Modelling, Word Embeddings
Published	2019-08-01
URL	https://www.aclweb.org/anthology/W19-4634/
PDF	https://www.aclweb.org/anthology/W19-4634
PWC	https://paperswithcode.com/paper/jhu-system-description-for-the-madar-arabic
Repo
Framework

Detecting and Extracting of Adverse Drug Reaction Mentioning Tweets with Multi-Head Self Attention


Title	Detecting and Extracting of Adverse Drug Reaction Mentioning Tweets with Multi-Head Self Attention
Authors	Suyu Ge, Tao Qi, Chuhan Wu, Yongfeng Huang
Abstract	This paper describes our system for the first and second shared tasks of the fourth Social Media Mining for Health Applications (SMM4H) workshop. We enhance tweet representation with a language model and distinguish the importance of different words with Multi-Head Self-Attention. In addition, transfer learning is exploited to make up for the data shortage. Our system achieved competitive results on both tasks with an F1-score of 0.5718 for task 1 and 0.653 (overlap) / 0.357 (strict) for task 2.
Tasks	Language Modelling, Transfer Learning
Published	2019-08-01
URL	https://www.aclweb.org/anthology/W19-3214/
PDF	https://www.aclweb.org/anthology/W19-3214
PWC	https://paperswithcode.com/paper/detecting-and-extracting-of-adverse-drug
Repo
Framework

Gaussian Process Models of Sound Change in Indo-Aryan Dialectology


Title	Gaussian Process Models of Sound Change in Indo-Aryan Dialectology
Authors	Chundra Cathcart
Abstract	This paper proposes a Gaussian Process model of sound change targeted toward questions in Indo-Aryan dialectology. Gaussian Processes (GPs) provide a flexible means of expressing covariance between outcomes, and can be extended to a wide variety of probability distributions. We find that GP models fare better in terms of some key posterior predictive checks than models that do not express covariance between sound changes, and outline directions for future work.
Tasks	Gaussian Processes
Published	2019-08-01
URL	https://www.aclweb.org/anthology/W19-4732/
PDF	https://www.aclweb.org/anthology/W19-4732
PWC	https://paperswithcode.com/paper/gaussian-process-models-of-sound-change-in
Repo
Framework

Modeling a Historical Variety of a Low-Resource Language: Language Contact Effects in the Verbal Cluster of Early-Modern Frisian


Title	Modeling a Historical Variety of a Low-Resource Language: Language Contact Effects in the Verbal Cluster of Early-Modern Frisian
Authors	Jelke Bloem, Arjen Versloot, Fred Weerman
Abstract	Certain phenomena of interest to linguists mainly occur in low-resource languages, such as contact-induced language change. We show that it is possible to study contact-induced language change computationally in a historical variety of a low-resource language, Early-Modern Frisian, by creating a model using features that were established to be relevant in a closely related language, modern Dutch. This allows us to test two hypotheses on two types of language contact that may have taken place between Frisian and Dutch during this time. Our model shows that Frisian verb cluster word orders are associated with different context features than Dutch verb orders, supporting the 'learned borrowing' hypothesis.
Tasks
Published	2019-08-01
URL	https://www.aclweb.org/anthology/W19-4733/
PDF	https://www.aclweb.org/anthology/W19-4733
PWC	https://paperswithcode.com/paper/modeling-a-historical-variety-of-a-low
Repo
Framework

Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP


Title	Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP
Authors
Abstract
Tasks
Published	2019-08-01
URL	https://www.aclweb.org/anthology/W19-4800/
PDF	https://www.aclweb.org/anthology/W19-4800
PWC	https://paperswithcode.com/paper/proceedings-of-the-2019-acl-workshop
Repo
Framework