Paper Group NANR 82
Incorporating Source Syntax into Transformer-Based Neural Machine Translation
Title | Incorporating Source Syntax into Transformer-Based Neural Machine Translation |
Authors | Anna Currey, Kenneth Heafield |
Abstract | Transformer-based neural machine translation (NMT) has recently achieved state-of-the-art performance on many machine translation tasks. However, recent work (Raganato and Tiedemann, 2018; Tang et al., 2018; Tran et al., 2018) has indicated that Transformer models may not learn syntactic structures as well as their recurrent neural network-based counterparts, particularly in low-resource cases. In this paper, we incorporate constituency parse information into a Transformer NMT model. We leverage linearized parses of the source training sentences in order to inject syntax into the Transformer architecture without modifying it. We introduce two methods: a multi-task machine translation and parsing model with a single encoder and decoder, and a mixed encoder model that learns to translate directly from parsed and unparsed source sentences. We evaluate our methods on low-resource translation from English into twenty target languages, showing consistent improvements of 1.3 BLEU on average across diverse target languages for the multi-task technique. We further evaluate the models on full-scale WMT tasks, finding that the multi-task model aids low- and medium-resource NMT but degenerates high-resource English-German translation. |
Tasks | Machine Translation |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-5203/ |
https://www.aclweb.org/anthology/W19-5203 | |
PWC | https://paperswithcode.com/paper/incorporating-source-syntax-into-transformer |
Repo | |
Framework | |
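The abstract above mentions feeding linearized constituency parses to an unmodified Transformer. A minimal sketch of what linearizing a parse can look like follows. The nested-tuple tree representation, the bracket token format, and the example sentence are assumptions made for illustration; the abstract does not specify the exact format the authors use.

```python
def linearize(tree):
    """Flatten a constituency tree into a flat list of bracket and word tokens.

    A tree is either a word (str) or a tuple (label, child, child, ...).
    This representation is an assumption for the sketch.
    """
    if isinstance(tree, str):
        return [tree]
    label, *children = tree
    tokens = ["(" + label]          # opening bracket carries the phrase label
    for child in children:
        tokens.extend(linearize(child))
    tokens.append(")")              # closing bracket ends the constituent
    return tokens


# Hypothetical parse of "the cat sleeps".
parse = ("S", ("NP", "the", "cat"), ("VP", "sleeps"))
linearized = linearize(parse)
print(" ".join(linearized))
```

Because the result is an ordinary token sequence, it can be passed to a sequence-to-sequence model as if it were a sentence, which is what allows syntax to be injected without changing the architecture.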
Smoothing the Geometry of Probabilistic Box Embeddings
Title | Smoothing the Geometry of Probabilistic Box Embeddings |
Authors | Xiang Li, Luke Vilnis, Dongxu Zhang, Michael Boratko, Andrew McCallum |
Abstract | There is growing interest in geometrically-inspired embeddings for learning hierarchies, partial orders, and lattice structures, with natural applications to transitive relational data such as entailment graphs. Recent work has extended these ideas beyond deterministic hierarchies to probabilistically calibrated models, which enable learning from uncertain supervision and inferring soft-inclusions among concepts, while maintaining the geometric inductive bias of hierarchical embedding models. We build on the Box Lattice model of Vilnis et al. (2018), which showed promising results in modeling soft-inclusions through an overlapping hierarchy of sets, parameterized as high-dimensional hyperrectangles (boxes). However, the hard edges of the boxes present difficulties for standard gradient based optimization; that work employed a special surrogate function for the disjoint case, but we find this method to be fragile. In this work, we present a novel hierarchical embedding model, inspired by a relaxation of box embeddings into parameterized density functions using Gaussian convolutions over the boxes. Our approach provides an alternative surrogate to the original lattice measure that improves the robustness of optimization in the disjoint case, while also preserving the desirable properties with respect to the original lattice. We demonstrate increased or matching performance on WordNet hypernymy prediction, Flickr caption entailment, and a MovieLens-based market basket dataset. We show especially marked improvements in the case of sparse data, where many conditional probabilities should be low, and thus boxes should be nearly disjoint. |
Tasks | |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=H1xSNiRcF7 |
https://openreview.net/pdf?id=H1xSNiRcF7 | |
PWC | https://paperswithcode.com/paper/smoothing-the-geometry-of-probabilistic-box |
Repo | |
Framework | |
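The optimization problem described in the abstract above can be seen in a few lines: the intersection volume of two disjoint boxes is exactly zero, so it gives no gradient signal about how far apart they are. The sketch below contrasts that hard overlap with a smoothed one. The softplus surrogate is swapped in here as a simple smooth stand-in and is an assumption of this sketch; the abstract itself describes a relaxation based on Gaussian convolutions, which this code does not implement. The box representation and function names are likewise hypothetical.

```python
import math


def hard_overlap(box_a, box_b):
    """Volume of the intersection of two axis-aligned boxes.

    Each box is a (mins, maxs) pair of equal-length coordinate lists.
    """
    volume = 1.0
    for lo_a, hi_a, lo_b, hi_b in zip(box_a[0], box_a[1], box_b[0], box_b[1]):
        side = min(hi_a, hi_b) - max(lo_a, lo_b)
        volume *= max(side, 0.0)    # clamps to zero as soon as the boxes separate
    return volume


def softplus(x, temperature=1.0):
    """Smooth, strictly positive stand-in for max(x, 0)."""
    z = x / temperature
    # Numerically stable form of temperature * log(1 + exp(z)).
    return temperature * (max(z, 0.0) + math.log1p(math.exp(-abs(z))))


def soft_overlap(box_a, box_b, temperature=1.0):
    """Same product of side lengths, with each clamp replaced by softplus."""
    volume = 1.0
    for lo_a, hi_a, lo_b, hi_b in zip(box_a[0], box_a[1], box_b[0], box_b[1]):
        side = min(hi_a, hi_b) - max(lo_a, lo_b)
        volume *= softplus(side, temperature)
    return volume


box_a = ([0.0, 0.0], [1.0, 1.0])
far_box = ([2.0, 2.0], [3.0, 3.0])      # disjoint from box_a
near_box = ([1.5, 1.5], [2.5, 2.5])     # still disjoint, but closer
```

With these boxes, `hard_overlap` is zero for both `far_box` and `near_box`, while `soft_overlap` is positive and larger for the nearer box, so a gradient-based optimizer can still tell which direction reduces the gap.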
Improving machine classification using human uncertainty measurements
Title | Improving machine classification using human uncertainty measurements |
Authors | Ruairidh M. Battleday, Joshua C. Peterson, Thomas L. Griffiths |
Abstract | As deep CNN classifier performance using ground-truth labels has begun to asymptote at near-perfect levels, a key aim for the field is to extend training paradigms to capture further useful structure in natural image data and improve model robustness and generalization. In this paper, we present a novel natural image benchmark for making this extension, which we call CIFAR10H. This new dataset comprises a human-derived, full distribution over labels for each image of the CIFAR10 test set, offering the ability to assess the generalization of state-of-the-art CIFAR10 models, as well as investigate the effects of including this information in model training. We show that classification models trained on CIFAR10 do not generalize as well to our dataset as they do to traditional extensions, and that models fine-tuned using our label information are able to generalize better to related datasets, complement popular data augmentation schemes, and provide robustness to adversarial attacks. We explain these improvements in terms of better empirical approximations to the expected loss function over natural images and their categories in the visual world. |
Tasks | Data Augmentation |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=rJl8BhRqF7 |
https://openreview.net/pdf?id=rJl8BhRqF7 | |
PWC | https://paperswithcode.com/paper/improving-machine-classification-using-human |
Repo | |
Framework | |
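The idea of training against a full human label distribution rather than a single ground-truth label can be illustrated with the loss computation alone. The sketch below is a generic cross-entropy against a soft target; the three-class example values are invented for illustration and are not taken from CIFAR10H, and the abstract does not state the exact loss the authors use.

```python
import math


def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy between a target distribution and model probabilities."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))


# Hypothetical three-class example.
one_hot = [1.0, 0.0, 0.0]          # conventional ground-truth label
human = [0.7, 0.3, 0.0]            # made-up distribution of human judgments
predicted = [0.6, 0.35, 0.05]      # made-up model output

hard_loss = cross_entropy(one_hot, predicted)
soft_loss = cross_entropy(human, predicted)
```

With the one-hot target, only the probability assigned to the first class matters. With the soft target, the model is also rewarded for placing mass on the second class, in proportion to how often people chose it.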
Data Programming for Learning Discourse Structure
Title | Data Programming for Learning Discourse Structure |
Authors | Sonia Badene, Kate Thompson, Jean-Pierre Lorré, Nicholas Asher |
Abstract | This paper investigates the advantages and limits of data programming for the task of learning discourse structure. The data programming paradigm implemented in the Snorkel framework allows a user to label training data using expert-composed heuristics, which are then transformed via the "generative step" into probability distributions of the class labels given the training candidates. These results are later generalized using a discriminative model. Snorkel's attractive promise to create a large amount of annotated data from a smaller set of training data by unifying the output of a set of heuristics has yet to be used for computationally difficult tasks, such as that of discourse attachment, in which one must decide where a given discourse unit attaches to other units in a text in order to form a coherent discourse structure. Although approaching this problem using Snorkel requires significant modifications to the structure of the heuristics, we show that weak supervision methods can be more than competitive with classical supervised learning approaches to the attachment problem. |
Tasks | |
Published | 2019-07-01 |
URL | https://www.aclweb.org/anthology/P19-1061/ |
https://www.aclweb.org/anthology/P19-1061 | |
PWC | https://paperswithcode.com/paper/data-programming-for-learning-discourse |
Repo | |
Framework | |
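The data programming workflow described in the abstract above (heuristics that vote on candidates, combined into a distribution over class labels) can be sketched without any framework. Everything below is hypothetical: the three heuristics, the candidate fields `distance` and `same_speaker`, and the unweighted vote, which is a deliberately simple stand-in for the generative step and does not use or reproduce the Snorkel API.

```python
ABSTAIN, NO_ATTACH, ATTACH = -1, 0, 1


def lf_adjacent(pair):
    """Hypothetical heuristic: adjacent discourse units tend to attach."""
    return ATTACH if pair["distance"] == 1 else ABSTAIN


def lf_far_apart(pair):
    """Hypothetical heuristic: distant units rarely attach."""
    return NO_ATTACH if pair["distance"] > 5 else ABSTAIN


def lf_same_speaker(pair):
    """Hypothetical heuristic: units from the same speaker tend to attach."""
    return ATTACH if pair["same_speaker"] else ABSTAIN


def vote_distribution(pair, labeling_functions):
    """Turn non-abstaining votes into a probability for each class.

    An unweighted vote; a real generative step would also estimate how
    accurate and how correlated the heuristics are.
    """
    votes = [lf(pair) for lf in labeling_functions]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return {NO_ATTACH: 0.5, ATTACH: 0.5}    # no evidence either way
    attach = votes.count(ATTACH) / len(votes)
    return {NO_ATTACH: 1.0 - attach, ATTACH: attach}


LFS = [lf_adjacent, lf_far_apart, lf_same_speaker]
candidate = {"distance": 1, "same_speaker": True}
print(vote_distribution(candidate, LFS))
```

The resulting distributions would then serve as noisy training targets for a discriminative model, which is the second stage the abstract refers to.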
Teaching FORGe to Verbalize DBpedia Properties in Spanish
Title | Teaching FORGe to Verbalize DBpedia Properties in Spanish |
Authors | Simon Mille, Stamatia Dasiopoulou, Beatriz Fisas, Leo Wanner |
Abstract | Statistical generators increasingly dominate the research in NLG. However, grammar-based generators that are grounded in a solid linguistic framework remain very competitive, especially for generation from deep knowledge structures. Furthermore, if built modularly, they can be ported to other genres and languages with a limited amount of work, without the need of the annotation of a considerable amount of training data. One of these generators is FORGe, which is based on the Meaning-Text Model. In the recent WebNLG challenge (the first comprehensive task addressing the mapping of RDF triples to text) FORGe ranked first with respect to the overall quality in human evaluation. We extend the coverage of FORGe's open source grammatical and lexical resources for English, so as to further improve the English texts, and port them to Spanish, to achieve a comparable quality. This confirms that, as already observed in the case of SimpleNLG, a robust universal grammar-driven framework and a systematic organization of the linguistic resources can be an adequate choice for NLG applications. |
Tasks | |
Published | 2019-10-01 |
URL | https://www.aclweb.org/anthology/W19-8659/ |
https://www.aclweb.org/anthology/W19-8659 | |
PWC | https://paperswithcode.com/paper/teaching-forge-to-verbalize-dbpedia |
Repo | |
Framework | |
Sentence-Level Adaptation for Low-Resource Neural Machine Translation
Title | Sentence-Level Adaptation for Low-Resource Neural Machine Translation |
Authors | Aaron Mueller, Yash Kumar Lal |
Abstract | |
Tasks | Low-Resource Neural Machine Translation, Machine Translation |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-6807/ |
https://www.aclweb.org/anthology/W19-6807 | |
PWC | https://paperswithcode.com/paper/sentence-level-adaptation-for-low-resource |
Repo | |
Framework | |
End-to-end Deep Reinforcement Learning Based Coreference Resolution
Title | End-to-end Deep Reinforcement Learning Based Coreference Resolution |
Authors | Hongliang Fei, Xu Li, Dingcheng Li, Ping Li |
Abstract | Recent neural network models have significantly advanced the task of coreference resolution. However, current neural coreference models are usually trained with heuristic loss functions that are computed over a sequence of local decisions. In this paper, we introduce an end-to-end reinforcement learning based coreference resolution model to directly optimize coreference evaluation metrics. Specifically, we modify the state-of-the-art higher-order mention ranking approach in Lee et al. (2018) to a reinforced policy gradient model by incorporating the reward associated with a sequence of coreference linking actions. Furthermore, we introduce maximum entropy regularization for adequate exploration to prevent the model from prematurely converging to a bad local optimum. Our proposed model achieves new state-of-the-art performance on the English OntoNotes v5.0 benchmark. |
Tasks | Coreference Resolution |
Published | 2019-07-01 |
URL | https://www.aclweb.org/anthology/P19-1064/ |
https://www.aclweb.org/anthology/P19-1064 | |
PWC | https://paperswithcode.com/paper/end-to-end-deep-reinforcement-learning-based |
Repo | |
Framework | |
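The abstract above combines a policy gradient objective with maximum entropy regularization to keep the model exploring. A minimal single-decision sketch of such a loss follows. The function names, the scalar reward, and the weight `beta` are assumptions for illustration; the abstract does not give the authors' exact objective, reward definition, or hyperparameters.

```python
import math


def entropy(probs):
    """Shannon entropy of a discrete distribution (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def regularized_loss(probs, action, reward, beta=0.01):
    """REINFORCE-style surrogate loss with an entropy bonus.

    probs  : policy probabilities over candidate linking actions
    action : index of the sampled action
    reward : scalar reward for the resulting sequence of actions
    beta   : weight of the entropy term (hypothetical value)
    """
    policy_term = -reward * math.log(probs[action])
    # Subtracting the entropy rewards policies that stay spread out,
    # which discourages premature collapse onto a single action.
    return policy_term - beta * entropy(probs)


probs = [0.5, 0.3, 0.2]
loss = regularized_loss(probs, action=0, reward=1.0, beta=0.1)
```

Setting `beta` to zero recovers the plain policy gradient surrogate; larger values trade reward for exploration.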
Semantic Change in the Language of UK Parliamentary Debates
Title | Semantic Change in the Language of UK Parliamentary Debates |
Authors | Gavin Abercrombie, Riza Batista-Navarro |
Abstract | We investigate changes in the meanings of words used in the UK Parliament across two different epochs. We use word embeddings to explore changes in the distribution of words of interest and uncover words that appear to have undergone semantic transformation in the intervening period, and explore different ways of obtaining target words for this purpose. We find that semantic changes are generally in line with those found in other corpora, and little evidence that parliamentary language is more static than general English. It also seems that words with senses that have been recorded in the dictionary as having fallen into disuse do not undergo semantic changes in this domain. |
Tasks | Word Embeddings |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-4726/ |
https://www.aclweb.org/anthology/W19-4726 | |
PWC | https://paperswithcode.com/paper/semantic-change-in-the-language-of-uk |
Repo | |
Framework | |
Approaching SMM4H with Merged Models and Multi-task Learning
Title | Approaching SMM4H with Merged Models and Multi-task Learning |
Authors | Tilia Ellendorff, Lenz Furrer, Nicola Colic, Noëmi Aepli, Fabio Rinaldi |
Abstract | We describe our submissions to the 4th edition of the Social Media Mining for Health Applications (SMM4H) shared task. Our team (UZH) participated in two sub-tasks: Automatic classifications of adverse effects mentions in tweets (Task 1) and Generalizable identification of personal health experience mentions (Task 4). For our submissions, we exploited ensembles based on a pre-trained language representation with a neural transformer architecture (BERT) (Tasks 1 and 4) and a CNN-BiLSTM(-CRF) network within a multi-task learning scenario (Task 1). These systems are placed on top of a carefully crafted pipeline of domain-specific preprocessing steps. |
Tasks | Multi-Task Learning |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-3208/ |
https://www.aclweb.org/anthology/W19-3208 | |
PWC | https://paperswithcode.com/paper/approaching-smm4h-with-merged-models-and |
Repo | |
Framework | |
Clark Kent at SemEval-2019 Task 4: Stylometric Insights into Hyperpartisan News Detection
Title | Clark Kent at SemEval-2019 Task 4: Stylometric Insights into Hyperpartisan News Detection |
Authors | Viresh Gupta, Baani Leen Kaur Jolly, Ramneek Kaur, Tanmoy Chakraborty |
Abstract | In this paper, we present a news bias prediction system, which we developed as part of a SemEval 2019 task. We developed an XGBoost based system which uses character and word level n-gram features represented using TF-IDF, count vector based correlation matrix, and predicts if an input news article is a hyperpartisan news article. Our model was able to achieve a precision of 68.3% on the test set provided by the contest organizers. We also run our model on the BuzzFeed corpus and find XGBoost with simple character level N-Gram embeddings to be performing well with an accuracy of around 96%. |
Tasks | |
Published | 2019-06-01 |
URL | https://www.aclweb.org/anthology/S19-2159/ |
https://www.aclweb.org/anthology/S19-2159 | |
PWC | https://paperswithcode.com/paper/clark-kent-at-semeval-2019-task-4-stylometric |
Repo | |
Framework | |
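The character-level n-gram TF-IDF features mentioned in the abstract above can be computed with the standard library alone. The sketch below covers only the feature step, using one common TF-IDF variant (relative term frequency times log inverse document frequency). The n-gram size, the weighting variant, and the toy documents are assumptions; the abstract does not specify them, and the XGBoost classifier is not shown.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """All overlapping character n-grams of a string."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def tfidf(documents, n=3):
    """Sparse TF-IDF vectors (dicts) over character n-grams."""
    counts = [Counter(char_ngrams(doc, n)) for doc in documents]
    # Document frequency: in how many documents each n-gram occurs.
    doc_freq = Counter(gram for c in counts for gram in c)
    total = len(documents)
    vectors = []
    for c in counts:
        length = sum(c.values())
        vectors.append({
            gram: (k / length) * math.log(total / doc_freq[gram])
            for gram, k in c.items()
        })
    return vectors


# Toy documents, invented for illustration.
vectors = tfidf(["abcd", "abce"])
```

An n-gram shared by every document (here `abc`) receives weight zero, while n-grams that distinguish a document (`bcd`, `bce`) receive positive weight; vectors like these would then be passed to a classifier.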
JHU System Description for the MADAR Arabic Dialect Identification Shared Task
Title | JHU System Description for the MADAR Arabic Dialect Identification Shared Task |
Authors | Tom Lippincott, Pamela Shapiro, Kevin Duh, Paul McNamee |
Abstract | Our submission to the MADAR shared task on Arabic dialect identification employed a language modeling technique called Prediction by Partial Matching, an ensemble of neural architectures, and sources of additional data for training word embeddings and auxiliary language models. We found several of these techniques provided small boosts in performance, though a simple character-level language model was a strong baseline, and a lower-order LM achieved best performance on Subtask 2. Interestingly, word embeddings provided no consistent benefit, and ensembling struggled to outperform the best component submodel. This suggests the variety of architectures are learning redundant information, and future work may focus on encouraging decorrelated learning. |
Tasks | Language Modelling, Word Embeddings |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-4634/ |
https://www.aclweb.org/anthology/W19-4634 | |
PWC | https://paperswithcode.com/paper/jhu-system-description-for-the-madar-arabic |
Repo | |
Framework | |
Detecting and Extracting of Adverse Drug Reaction Mentioning Tweets with Multi-Head Self Attention
Title | Detecting and Extracting of Adverse Drug Reaction Mentioning Tweets with Multi-Head Self Attention |
Authors | Suyu Ge, Tao Qi, Chuhan Wu, Yongfeng Huang |
Abstract | This paper describes our system for the first and second shared tasks of the fourth Social Media Mining for Health Applications (SMM4H) workshop. We enhance tweet representation with a language model and distinguish the importance of different words with Multi-Head Self-Attention. In addition, transfer learning is exploited to make up for the data shortage. Our system achieved competitive results on both tasks with an F1-score of 0.5718 for task 1 and 0.653 (overlap) / 0.357 (strict) for task 2. |
Tasks | Language Modelling, Transfer Learning |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-3214/ |
https://www.aclweb.org/anthology/W19-3214 | |
PWC | https://paperswithcode.com/paper/detecting-and-extracting-of-adverse-drug |
Repo | |
Framework | |
Gaussian Process Models of Sound Change in Indo-Aryan Dialectology
Title | Gaussian Process Models of Sound Change in Indo-Aryan Dialectology |
Authors | Chundra Cathcart |
Abstract | This paper proposes a Gaussian Process model of sound change targeted toward questions in Indo-Aryan dialectology. Gaussian Processes (GPs) provide a flexible means of expressing covariance between outcomes, and can be extended to a wide variety of probability distributions. We find that GP models fare better in terms of some key posterior predictive checks than models that do not express covariance between sound changes, and outline directions for future work. |
Tasks | Gaussian Processes |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-4732/ |
https://www.aclweb.org/anthology/W19-4732 | |
PWC | https://paperswithcode.com/paper/gaussian-process-models-of-sound-change-in |
Repo | |
Framework | |
Modeling a Historical Variety of a Low-Resource Language: Language Contact Effects in the Verbal Cluster of Early-Modern Frisian
Title | Modeling a Historical Variety of a Low-Resource Language: Language Contact Effects in the Verbal Cluster of Early-Modern Frisian |
Authors | Jelke Bloem, Arjen Versloot, Fred Weerman |
Abstract | Certain phenomena of interest to linguists mainly occur in low-resource languages, such as contact-induced language change. We show that it is possible to study contact-induced language change computationally in a historical variety of a low-resource language, Early-Modern Frisian, by creating a model using features that were established to be relevant in a closely related language, modern Dutch. This allows us to test two hypotheses on two types of language contact that may have taken place between Frisian and Dutch during this time. Our model shows that Frisian verb cluster word orders are associated with different context features than Dutch verb orders, supporting the 'learned borrowing' hypothesis. |
Tasks | |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-4733/ |
https://www.aclweb.org/anthology/W19-4733 | |
PWC | https://paperswithcode.com/paper/modeling-a-historical-variety-of-a-low |
Repo | |
Framework | |
Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP
Title | Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP |
Authors | |
Abstract | |
Tasks | |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-4800/ |
https://www.aclweb.org/anthology/W19-4800 | |
PWC | https://paperswithcode.com/paper/proceedings-of-the-2019-acl-workshop |
Repo | |
Framework | |