January 25, 2020

Paper Group NANR 82

Incorporating Source Syntax into Transformer-Based Neural Machine Translation

Title Incorporating Source Syntax into Transformer-Based Neural Machine Translation
Authors Anna Currey, Kenneth Heafield
Abstract Transformer-based neural machine translation (NMT) has recently achieved state-of-the-art performance on many machine translation tasks. However, recent work (Raganato and Tiedemann, 2018; Tang et al., 2018; Tran et al., 2018) has indicated that Transformer models may not learn syntactic structures as well as their recurrent neural network-based counterparts, particularly in low-resource cases. In this paper, we incorporate constituency parse information into a Transformer NMT model. We leverage linearized parses of the source training sentences in order to inject syntax into the Transformer architecture without modifying it. We introduce two methods: a multi-task machine translation and parsing model with a single encoder and decoder, and a mixed encoder model that learns to translate directly from parsed and unparsed source sentences. We evaluate our methods on low-resource translation from English into twenty target languages, showing consistent improvements of 1.3 BLEU on average across diverse target languages for the multi-task technique. We further evaluate the models on full-scale WMT tasks, finding that the multi-task model aids low- and medium-resource NMT but degrades high-resource English-German translation.
Tasks Machine Translation
Published 2019-08-01
URL https://www.aclweb.org/anthology/W19-5203/
PDF https://www.aclweb.org/anthology/W19-5203
PWC https://paperswithcode.com/paper/incorporating-source-syntax-into-transformer
Repo
Framework
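
The abstract above relies on linearized constituency parses of the source sentences. The following is a minimal sketch of one way to flatten a parse tree into a token sequence that a standard sequence model could consume. The tree representation, the bracket format, and the function name `linearize` are assumptions made for illustration; the abstract does not specify the exact linearization the authors use.

```python
from typing import List, Union

# Assumed representation: a tree is either a word (str)
# or a (label, children) pair.
Tree = Union[str, tuple]


def linearize(tree: Tree) -> List[str]:
    """Flatten a constituency tree into a bracketed token sequence."""
    if isinstance(tree, str):
        return [tree]
    label, children = tree
    tokens = ["(" + label]          # opening bracket carries the label
    for child in children:
        tokens.extend(linearize(child))
    tokens.append(")")              # closing bracket is unlabeled
    return tokens


# Toy parse of "She reads books" (hand-written, not parser output).
example = ("S", [("NP", ["She"]), ("VP", ["reads", ("NP", ["books"])])])
print(" ".join(linearize(example)))
# prints: (S (NP She ) (VP reads (NP books ) ) )
```

Because the output is an ordinary token list, it can be fed to an unmodified encoder or used as a decoder target, which is the property the abstract emphasizes.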

Smoothing the Geometry of Probabilistic Box Embeddings

Title Smoothing the Geometry of Probabilistic Box Embeddings
Authors Xiang Li, Luke Vilnis, Dongxu Zhang, Michael Boratko, Andrew McCallum
Abstract There is growing interest in geometrically-inspired embeddings for learning hierarchies, partial orders, and lattice structures, with natural applications to transitive relational data such as entailment graphs. Recent work has extended these ideas beyond deterministic hierarchies to probabilistically calibrated models, which enable learning from uncertain supervision and inferring soft-inclusions among concepts, while maintaining the geometric inductive bias of hierarchical embedding models. We build on the Box Lattice model of Vilnis et al. (2018), which showed promising results in modeling soft-inclusions through an overlapping hierarchy of sets, parameterized as high-dimensional hyperrectangles (boxes). However, the hard edges of the boxes present difficulties for standard gradient based optimization; that work employed a special surrogate function for the disjoint case, but we find this method to be fragile. In this work, we present a novel hierarchical embedding model, inspired by a relaxation of box embeddings into parameterized density functions using Gaussian convolutions over the boxes. Our approach provides an alternative surrogate to the original lattice measure that improves the robustness of optimization in the disjoint case, while also preserving the desirable properties with respect to the original lattice. We demonstrate increased or matching performance on WordNet hypernymy prediction, Flickr caption entailment, and a MovieLens-based market basket dataset. We show especially marked improvements in the case of sparse data, where many conditional probabilities should be low, and thus boxes should be nearly disjoint.
Tasks
Published 2019-05-01
URL https://openreview.net/forum?id=H1xSNiRcF7
PDF https://openreview.net/pdf?id=H1xSNiRcF7
PWC https://paperswithcode.com/paper/smoothing-the-geometry-of-probabilistic-box
Repo
Framework
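
To illustrate why hard box edges are a problem in the disjoint case described in the abstract above, the sketch below compares a hard overlap volume with a smoothed one. The softplus function is used here as an assumed smooth stand-in for `max(0, x)`; the abstract describes the method only as a relaxation via Gaussian convolutions, so this is not claimed to be the authors' exact formulation. All function names and the `temperature` parameter are hypothetical.

```python
import math
from typing import Sequence


def softplus(x: float, temperature: float = 1.0) -> float:
    """Numerically stable temperature * log(1 + exp(x / temperature))."""
    z = x / temperature
    return temperature * (max(z, 0.0) + math.log1p(math.exp(-abs(z))))


def hard_overlap(min_a: Sequence[float], max_a: Sequence[float],
                 min_b: Sequence[float], max_b: Sequence[float]) -> float:
    """Volume of the intersection of two boxes; exactly 0 when disjoint."""
    vol = 1.0
    for lo_a, hi_a, lo_b, hi_b in zip(min_a, max_a, min_b, max_b):
        vol *= max(0.0, min(hi_a, hi_b) - max(lo_a, lo_b))
    return vol


def smooth_overlap(min_a: Sequence[float], max_a: Sequence[float],
                   min_b: Sequence[float], max_b: Sequence[float],
                   temperature: float = 1.0) -> float:
    """Same product, but each side length passes through softplus,
    so the value stays positive (and differentiable) for disjoint boxes."""
    vol = 1.0
    for lo_a, hi_a, lo_b, hi_b in zip(min_a, max_a, min_b, max_b):
        vol *= softplus(min(hi_a, hi_b) - max(lo_a, lo_b), temperature)
    return vol


# Two disjoint 1-D boxes: [0, 1] and [2, 3].
print(hard_overlap([0.0], [1.0], [2.0], [3.0]))    # 0.0, no gradient signal
print(smooth_overlap([0.0], [1.0], [2.0], [3.0]))  # small positive value
```

With the hard measure, disjoint boxes yield a volume of zero and therefore no gradient with respect to the box coordinates; the smoothed version keeps a positive value that shrinks as the boxes move apart.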

Improving machine classification using human uncertainty measurements

Title Improving machine classification using human uncertainty measurements
Authors Ruairidh M. Battleday, Joshua C. Peterson, Thomas L. Griffiths
Abstract As deep CNN classifier performance using ground-truth labels has begun to asymptote at near-perfect levels, a key aim for the field is to extend training paradigms to capture further useful structure in natural image data and improve model robustness and generalization. In this paper, we present a novel natural image benchmark for making this extension, which we call CIFAR10H. This new dataset comprises a human-derived, full distribution over labels for each image of the CIFAR10 test set, offering the ability to assess the generalization of state-of-the-art CIFAR10 models, as well as investigate the effects of including this information in model training. We show that classification models trained on CIFAR10 do not generalize as well to our dataset as they do to traditional extensions, and that models fine-tuned using our label information are able to generalize better to related datasets, complement popular data augmentation schemes, and provide robustness to adversarial attacks. We explain these improvements in terms of better empirical approximations to the expected loss function over natural images and their categories in the visual world.
Tasks Data Augmentation
Published 2019-05-01
URL https://openreview.net/forum?id=rJl8BhRqF7
PDF https://openreview.net/pdf?id=rJl8BhRqF7
PWC https://paperswithcode.com/paper/improving-machine-classification-using-human
Repo
Framework
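
The abstract above describes training against a full human-derived distribution over labels rather than a single ground-truth class. A minimal sketch of that idea is cross-entropy computed against a soft target. The annotator counts and predicted probabilities below are invented toy values, and the function names are hypothetical; nothing here reproduces the authors' training setup.

```python
import math
from typing import List, Sequence


def normalize_counts(counts: Sequence[int]) -> List[float]:
    """Turn per-class annotator counts into a label distribution."""
    total = sum(counts)
    return [c / total for c in counts]


def cross_entropy(target: Sequence[float], predicted: Sequence[float],
                  eps: float = 1e-12) -> float:
    """Cross-entropy between a target distribution and model probabilities."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))


predicted = [0.7, 0.2, 0.1]            # toy model output for one image
hard = [1.0, 0.0, 0.0]                 # conventional one-hot label
soft = normalize_counts([6, 3, 1])     # toy votes from 10 annotators

print(cross_entropy(hard, predicted))  # only the top class is penalized
print(cross_entropy(soft, predicted))  # secondary classes contribute too
```

With the soft target, the loss also depends on how much probability the model assigns to the classes that human annotators found plausible, which is the extra signal the abstract refers to.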

Data Programming for Learning Discourse Structure

Title Data Programming for Learning Discourse Structure
Authors Sonia Badene, Kate Thompson, Jean-Pierre Lorré, Nicholas Asher
Abstract This paper investigates the advantages and limits of data programming for the task of learning discourse structure. The data programming paradigm implemented in the Snorkel framework allows a user to label training data using expert-composed heuristics, which are then transformed via the "generative step" into probability distributions of the class labels given the training candidates. These results are later generalized using a discriminative model. Snorkel's attractive promise to create a large amount of annotated data from a smaller set of training data by unifying the output of a set of heuristics has yet to be used for computationally difficult tasks, such as that of discourse attachment, in which one must decide where a given discourse unit attaches to other units in a text in order to form a coherent discourse structure. Although approaching this problem using Snorkel requires significant modifications to the structure of the heuristics, we show that weak supervision methods can be more than competitive with classical supervised learning approaches to the attachment problem.
Tasks
Published 2019-07-01
URL https://www.aclweb.org/anthology/P19-1061/
PDF https://www.aclweb.org/anthology/P19-1061
PWC https://paperswithcode.com/paper/data-programming-for-learning-discourse
Repo
Framework
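
The data programming workflow in the abstract above can be sketched as a set of heuristic labeling functions whose votes are combined into a probabilistic label. The sketch below deliberately does not use the Snorkel API: the heuristics, the candidate fields (`distance`, `first_is_question`), and the unweighted vote are all assumptions for illustration. In particular, an unweighted vote is swapped in for the generative step, which the abstract describes as producing probability distributions over labels.

```python
from typing import Callable, Dict, List, Optional

ABSTAIN, NOT_ATTACHED, ATTACHED = -1, 0, 1
Candidate = Dict[str, object]  # a pair of discourse units plus features


# Hypothetical heuristics for discourse attachment.
def lf_adjacent(c: Candidate) -> int:
    return ATTACHED if c["distance"] == 1 else ABSTAIN


def lf_far_apart(c: Candidate) -> int:
    return NOT_ATTACHED if c["distance"] > 5 else ABSTAIN


def lf_question_answer(c: Candidate) -> int:
    if c["first_is_question"] and c["distance"] <= 2:
        return ATTACHED
    return ABSTAIN


def vote_probability(c: Candidate,
                     lfs: List[Callable[[Candidate], int]]) -> Optional[float]:
    """Fraction of non-abstaining heuristics voting ATTACHED,
    or None when every heuristic abstains."""
    votes = [v for v in (lf(c) for lf in lfs) if v != ABSTAIN]
    if not votes:
        return None
    return sum(votes) / len(votes)


LFS = [lf_adjacent, lf_far_apart, lf_question_answer]
print(vote_probability({"distance": 1, "first_is_question": True}, LFS))   # 1.0
print(vote_probability({"distance": 8, "first_is_question": False}, LFS))  # 0.0
```

The resulting soft labels would then serve as training targets for a discriminative model, which corresponds to the generalization step mentioned in the abstract.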

Teaching FORGe to Verbalize DBpedia Properties in Spanish

Title Teaching FORGe to Verbalize DBpedia Properties in Spanish
Authors Simon Mille, Stamatia Dasiopoulou, Beatriz Fisas, Leo Wanner
Abstract Statistical generators increasingly dominate the research in NLG. However, grammar-based generators that are grounded in a solid linguistic framework remain very competitive, especially for generation from deep knowledge structures. Furthermore, if built modularly, they can be ported to other genres and languages with a limited amount of work, without needing to annotate a considerable amount of training data. One of these generators is FORGe, which is based on the Meaning-Text Model. In the recent WebNLG challenge (the first comprehensive task addressing the mapping of RDF triples to text) FORGe ranked first with respect to the overall quality in human evaluation. We extend the coverage of FORGe's open source grammatical and lexical resources for English, so as to further improve the English texts, and port them to Spanish, to achieve a comparable quality. This confirms that, as already observed in the case of SimpleNLG, a robust universal grammar-driven framework and a systematic organization of the linguistic resources can be an adequate choice for NLG applications.
Tasks
Published 2019-10-01
URL https://www.aclweb.org/anthology/W19-8659/
PDF https://www.aclweb.org/anthology/W19-8659
PWC https://paperswithcode.com/paper/teaching-forge-to-verbalize-dbpedia
Repo
Framework

Sentence-Level Adaptation for Low-Resource Neural Machine Translation

Title Sentence-Level Adaptation for Low-Resource Neural Machine Translation
Authors Aaron Mueller, Yash Kumar Lal
Abstract
Tasks Low-Resource Neural Machine Translation, Machine Translation
Published 2019-08-01
URL https://www.aclweb.org/anthology/W19-6807/
PDF https://www.aclweb.org/anthology/W19-6807
PWC https://paperswithcode.com/paper/sentence-level-adaptation-for-low-resource
Repo
Framework

End-to-end Deep Reinforcement Learning Based Coreference Resolution

Title End-to-end Deep Reinforcement Learning Based Coreference Resolution
Authors Hongliang Fei, Xu Li, Dingcheng Li, Ping Li
Abstract Recent neural network models have significantly advanced the task of coreference resolution. However, current neural coreference models are usually trained with heuristic loss functions that are computed over a sequence of local decisions. In this paper, we introduce an end-to-end reinforcement learning based coreference resolution model to directly optimize coreference evaluation metrics. Specifically, we modify the state-of-the-art higher-order mention ranking approach in Lee et al. (2018) to a reinforced policy gradient model by incorporating the reward associated with a sequence of coreference linking actions. Furthermore, we introduce maximum entropy regularization for adequate exploration to prevent the model from prematurely converging to a bad local optimum. Our proposed model achieves new state-of-the-art performance on the English OntoNotes v5.0 benchmark.
Tasks Coreference Resolution
Published 2019-07-01
URL https://www.aclweb.org/anthology/P19-1064/
PDF https://www.aclweb.org/anthology/P19-1064
PWC https://paperswithcode.com/paper/end-to-end-deep-reinforcement-learning-based
Repo
Framework
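
The abstract above combines a policy gradient objective with maximum entropy regularization. The sketch below shows a simplified single-decision version of such a loss: a softmax policy over candidate antecedent scores, a reward-weighted log-probability term, and an entropy bonus. The single-step form, the function names, and the coefficient `beta` are assumptions; the paper's reward is tied to a sequence of linking actions, which is not modeled here.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Convert raw scores into a probability distribution."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def regularized_loss(scores: Sequence[float], action: int,
                     reward: float, beta: float = 0.01) -> float:
    """REINFORCE-style loss for one sampled action, minus an entropy bonus.

    Minimizing this raises the probability of rewarded actions while the
    entropy term discourages the policy from collapsing too early."""
    probs = softmax(scores)
    policy_term = -reward * math.log(probs[action])
    return policy_term - beta * entropy(probs)


# Three candidate antecedents with equal scores; action 0 was sampled.
print(regularized_loss([0.0, 0.0, 0.0], action=0, reward=1.0, beta=0.1))
```

A larger `beta` rewards more uniform policies, which is one way to encourage the exploration the abstract mentions.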

Semantic Change in the Language of UK Parliamentary Debates

Title Semantic Change in the Language of UK Parliamentary Debates
Authors Gavin Abercrombie, Riza Batista-Navarro
Abstract We investigate changes in the meanings of words used in the UK Parliament across two different epochs. We use word embeddings to explore changes in the distribution of words of interest and uncover words that appear to have undergone semantic transformation in the intervening period, and explore different ways of obtaining target words for this purpose. We find that semantic changes are generally in line with those found in other corpora, and little evidence that parliamentary language is more static than general English. It also seems that words with senses that have been recorded in the dictionary as having fallen into disuse do not undergo semantic changes in this domain.
Tasks Word Embeddings
Published 2019-08-01
URL https://www.aclweb.org/anthology/W19-4726/
PDF https://www.aclweb.org/anthology/W19-4726
PWC https://paperswithcode.com/paper/semantic-change-in-the-language-of-uk
Repo
Framework
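
One common way to use word embeddings for the kind of comparison described in the abstract above is to measure how far each word's vector moves between two epochs. The sketch below ranks words by cosine distance. It assumes the two embedding spaces have already been aligned so that vectors are directly comparable, and the vectors are invented toy values; the abstract does not say which comparison method the authors use.

```python
import math
from typing import Dict, List, Sequence


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """1 - cosine similarity; 0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def rank_by_change(early: Dict[str, Sequence[float]],
                   late: Dict[str, Sequence[float]]) -> List[str]:
    """Words present in both epochs, most-changed first."""
    shared = early.keys() & late.keys()
    return sorted(shared,
                  key=lambda w: cosine_distance(early[w], late[w]),
                  reverse=True)


# Toy, pre-aligned 2-D embeddings for two epochs (hypothetical values).
early = {"stable": [1.0, 0.0], "shifted": [1.0, 0.0]}
late = {"stable": [1.0, 0.0], "shifted": [0.0, 1.0]}
print(rank_by_change(early, late))  # ['shifted', 'stable']
```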

Approaching SMM4H with Merged Models and Multi-task Learning

Title Approaching SMM4H with Merged Models and Multi-task Learning
Authors Tilia Ellendorff, Lenz Furrer, Nicola Colic, Noëmi Aepli, Fabio Rinaldi
Abstract We describe our submissions to the 4th edition of the Social Media Mining for Health Applications (SMM4H) shared task. Our team (UZH) participated in two sub-tasks: Automatic classifications of adverse effects mentions in tweets (Task 1) and Generalizable identification of personal health experience mentions (Task 4). For our submissions, we exploited ensembles based on a pre-trained language representation with a neural transformer architecture (BERT) (Tasks 1 and 4) and a CNN-BiLSTM(-CRF) network within a multi-task learning scenario (Task 1). These systems are placed on top of a carefully crafted pipeline of domain-specific preprocessing steps.
Tasks Multi-Task Learning
Published 2019-08-01
URL https://www.aclweb.org/anthology/W19-3208/
PDF https://www.aclweb.org/anthology/W19-3208
PWC https://paperswithcode.com/paper/approaching-smm4h-with-merged-models-and
Repo
Framework

Clark Kent at SemEval-2019 Task 4: Stylometric Insights into Hyperpartisan News Detection

Title Clark Kent at SemEval-2019 Task 4: Stylometric Insights into Hyperpartisan News Detection
Authors Viresh Gupta, Baani Leen Kaur Jolly, Ramneek Kaur, Tanmoy Chakraborty
Abstract In this paper, we present a news bias prediction system, which we developed as part of a SemEval 2019 task. We developed an XGBoost based system which uses character and word level n-gram features represented using TF-IDF, count vector based correlation matrix, and predicts if an input news article is a hyperpartisan news article. Our model was able to achieve a precision of 68.3% on the test set provided by the contest organizers. We also run our model on the BuzzFeed corpus and find XGBoost with simple character level N-Gram embeddings to be performing well with an accuracy of around 96%.
Tasks
Published 2019-06-01
URL https://www.aclweb.org/anthology/S19-2159/
PDF https://www.aclweb.org/anthology/S19-2159
PWC https://paperswithcode.com/paper/clark-kent-at-semeval-2019-task-4-stylometric
Repo
Framework
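
The character-level n-gram TF-IDF features mentioned in the abstract above can be sketched in plain Python as follows. This covers feature extraction only: the classifier is omitted, and the specific weighting (relative term frequency times `log(N / df)`), the choice of trigrams, and the function names are assumptions rather than the authors' configuration.

```python
import math
from collections import Counter
from typing import Dict, List


def char_ngrams(text: str, n: int = 3) -> List[str]:
    """All overlapping character n-grams of a string."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def tfidf(docs: List[str], n: int = 3) -> List[Dict[str, float]]:
    """Sparse TF-IDF vectors, one dict per document."""
    counts = [Counter(char_ngrams(d, n)) for d in docs]
    # Document frequency: in how many documents each n-gram occurs.
    df = Counter(gram for c in counts for gram in c)
    num_docs = len(docs)
    vectors = []
    for c in counts:
        total = sum(c.values())
        vectors.append({gram: (k / total) * math.log(num_docs / df[gram])
                        for gram, k in c.items()})
    return vectors


vectors = tfidf(["abcd", "abce"])
print(vectors[0])  # 'abc' occurs in both documents, so its weight is 0.0
```

N-grams shared by every document receive zero weight, while n-grams distinctive to a document are emphasized, which is what makes such features usable for stylometric classification.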

JHU System Description for the MADAR Arabic Dialect Identification Shared Task

Title JHU System Description for the MADAR Arabic Dialect Identification Shared Task
Authors Tom Lippincott, Pamela Shapiro, Kevin Duh, Paul McNamee
Abstract Our submission to the MADAR shared task on Arabic dialect identification employed a language modeling technique called Prediction by Partial Matching, an ensemble of neural architectures, and sources of additional data for training word embeddings and auxiliary language models. We found several of these techniques provided small boosts in performance, though a simple character-level language model was a strong baseline, and a lower-order LM achieved best performance on Subtask 2. Interestingly, word embeddings provided no consistent benefit, and ensembling struggled to outperform the best component submodel. This suggests the variety of architectures are learning redundant information, and future work may focus on encouraging decorrelated learning.
Tasks Language Modelling, Word Embeddings
Published 2019-08-01
URL https://www.aclweb.org/anthology/W19-4634/
PDF https://www.aclweb.org/anthology/W19-4634
PWC https://paperswithcode.com/paper/jhu-system-description-for-the-madar-arabic
Repo
Framework

Detecting and Extracting of Adverse Drug Reaction Mentioning Tweets with Multi-Head Self Attention

Title Detecting and Extracting of Adverse Drug Reaction Mentioning Tweets with Multi-Head Self Attention
Authors Suyu Ge, Tao Qi, Chuhan Wu, Yongfeng Huang
Abstract This paper describes our system for the first and second shared tasks of the fourth Social Media Mining for Health Applications (SMM4H) workshop. We enhance tweet representation with a language model and distinguish the importance of different words with Multi-Head Self-Attention. In addition, transfer learning is exploited to make up for the data shortage. Our system achieved competitive results on both tasks with an F1-score of 0.5718 for task 1 and 0.653 (overlap) / 0.357 (strict) for task 2.
Tasks Language Modelling, Transfer Learning
Published 2019-08-01
URL https://www.aclweb.org/anthology/W19-3214/
PDF https://www.aclweb.org/anthology/W19-3214
PWC https://paperswithcode.com/paper/detecting-and-extracting-of-adverse-drug
Repo
Framework

Gaussian Process Models of Sound Change in Indo-Aryan Dialectology

Title Gaussian Process Models of Sound Change in Indo-Aryan Dialectology
Authors Chundra Cathcart
Abstract This paper proposes a Gaussian Process model of sound change targeted toward questions in Indo-Aryan dialectology. Gaussian Processes (GPs) provide a flexible means of expressing covariance between outcomes, and can be extended to a wide variety of probability distributions. We find that GP models fare better in terms of some key posterior predictive checks than models that do not express covariance between sound changes, and outline directions for future work.
Tasks Gaussian Processes
Published 2019-08-01
URL https://www.aclweb.org/anthology/W19-4732/
PDF https://www.aclweb.org/anthology/W19-4732
PWC https://paperswithcode.com/paper/gaussian-process-models-of-sound-change-in
Repo
Framework
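
The abstract above notes that Gaussian Processes express covariance between outcomes. A minimal sketch of that idea is a covariance matrix built from a kernel over input locations. The squared-exponential (RBF) kernel, its hyperparameters, and the one-dimensional toy locations are assumptions for illustration; the abstract does not state which kernel or inputs the model uses.

```python
import math
from typing import List, Sequence


def rbf_kernel(x: Sequence[float], y: Sequence[float],
               variance: float = 1.0, length_scale: float = 1.0) -> float:
    """Squared-exponential covariance between two input points."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return variance * math.exp(-sq_dist / (2.0 * length_scale ** 2))


def covariance_matrix(points: Sequence[Sequence[float]],
                      variance: float = 1.0,
                      length_scale: float = 1.0) -> List[List[float]]:
    """Pairwise kernel values for all input points."""
    return [[rbf_kernel(p, q, variance, length_scale) for q in points]
            for p in points]


# Toy 1-D locations: the first two are close, the third is far away.
K = covariance_matrix([[0.0], [1.0], [3.0]])
for row in K:
    print([round(v, 4) for v in row])
```

Nearby inputs receive a high covariance and distant ones a low covariance, so outcomes at similar locations are modeled as correlated rather than independent.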

Modeling a Historical Variety of a Low-Resource Language: Language Contact Effects in the Verbal Cluster of Early-Modern Frisian

Title Modeling a Historical Variety of a Low-Resource Language: Language Contact Effects in the Verbal Cluster of Early-Modern Frisian
Authors Jelke Bloem, Arjen Versloot, Fred Weerman
Abstract Certain phenomena of interest to linguists mainly occur in low-resource languages, such as contact-induced language change. We show that it is possible to study contact-induced language change computationally in a historical variety of a low-resource language, Early-Modern Frisian, by creating a model using features that were established to be relevant in a closely related language, modern Dutch. This allows us to test two hypotheses on two types of language contact that may have taken place between Frisian and Dutch during this time. Our model shows that Frisian verb cluster word orders are associated with different context features than Dutch verb orders, supporting the 'learned borrowing' hypothesis.
Tasks
Published 2019-08-01
URL https://www.aclweb.org/anthology/W19-4733/
PDF https://www.aclweb.org/anthology/W19-4733
PWC https://paperswithcode.com/paper/modeling-a-historical-variety-of-a-low
Repo
Framework

Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP

Title Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP
Authors
Abstract
Tasks
Published 2019-08-01
URL https://www.aclweb.org/anthology/W19-4800/
PDF https://www.aclweb.org/anthology/W19-4800
PWC https://paperswithcode.com/paper/proceedings-of-the-2019-acl-workshop
Repo
Framework