Paper Group NANR 85
A Compact and Language-Sensitive Multilingual Translation Method
Title | A Compact and Language-Sensitive Multilingual Translation Method |
Authors | Yining Wang, Long Zhou, Jiajun Zhang, Feifei Zhai, Jingfang Xu, Chengqing Zong |
Abstract | Multilingual neural machine translation (Multi-NMT) with one encoder-decoder model has made remarkable progress due to its simple deployment. However, this multilingual translation paradigm does not make full use of language commonality and parameter sharing between encoder and decoder. Furthermore, this kind of paradigm cannot outperform the individual models trained on bilingual corpus in most cases. In this paper, we propose a compact and language-sensitive method for multilingual translation. To maximize parameter sharing, we first present a universal representor to replace both encoder and decoder models. To make the representor sensitive for specific languages, we further introduce language-sensitive embedding, attention, and discriminator with the ability to enhance model performance. We verify our methods on various translation scenarios, including one-to-many, many-to-many and zero-shot. Extensive experiments demonstrate that our proposed methods remarkably outperform strong standard multilingual translation systems on WMT and IWSLT datasets. Moreover, we find that our model is especially helpful in low-resource and zero-shot translation scenarios. |
Tasks | Machine Translation |
Published | 2019-07-01 |
URL | https://www.aclweb.org/anthology/P19-1117/ |
https://www.aclweb.org/anthology/P19-1117 | |
PWC | https://paperswithcode.com/paper/a-compact-and-language-sensitive-multilingual |
Repo | |
Framework | |
Using OntoLex-Lemon for Representing and Interlinking German Multiword Expressions in OdeNet and MMORPH
Title | Using OntoLex-Lemon for Representing and Interlinking German Multiword Expressions in OdeNet and MMORPH |
Authors | Thierry Declerck, Melanie Siegel, Stefania Racioppa |
Abstract | We describe work consisting in porting two large German lexical resources into the OntoLex-Lemon model in order to establish complementary interlinkings between them. One resource is OdeNet (Open German WordNet) and the other is a further development of the German version of the MMORPH morphological analyzer. We show how the Multiword Expressions (MWEs) contained in OdeNet can be morphologically specified by the use of the lexical representation and linking features of OntoLex-Lemon, which also support the formulation of restrictions in the usage of such expressions. |
Tasks | |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-5104/ |
https://www.aclweb.org/anthology/W19-5104 | |
PWC | https://paperswithcode.com/paper/using-ontolex-lemon-for-representing-and |
Repo | |
Framework | |
Treat the Word As a Whole or Look Inside? Subword Embeddings Model Language Change and Typology
Title | Treat the Word As a Whole or Look Inside? Subword Embeddings Model Language Change and Typology |
Authors | Yang Xu, Jiasheng Zhang, David Reitter |
Abstract | We use a variant of the word embedding model that incorporates subword information to characterize the degree of compositionality in lexical semantics. Our models reveal some interesting yet contrasting patterns of long-term change in multiple languages: Indo-European languages put more weight on subword units in newer words, while Chinese conversely puts less weight on the subwords and more weight on the word as a whole. Our method provides novel evidence and methodology that enriches existing theories in evolutionary linguistics. The resulting word vectors also have decent performance in NLP-related tasks. |
Tasks | |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-4717/ |
https://www.aclweb.org/anthology/W19-4717 | |
PWC | https://paperswithcode.com/paper/treat-the-word-as-a-whole-or-look-inside |
Repo | |
Framework | |
A distantly supervised dataset for automated data extraction from diagnostic studies
Title | A distantly supervised dataset for automated data extraction from diagnostic studies |
Authors | Christopher Norman, Mariska Leeflang, René Spijker, Evangelos Kanoulas, Aurélie Névéol |
Abstract | Systematic reviews are important in evidence-based medicine, but are expensive to produce. Automating or semi-automating the data extraction of index test, target condition, and reference standard from articles has the potential to decrease the cost of conducting systematic reviews of diagnostic test accuracy, but relevant training data is not available. We create a distantly supervised dataset of approximately 90,000 sentences, and let two experts manually annotate a small subset of around 1,000 sentences for evaluation. We evaluate the performance of BioBERT and logistic regression for ranking the sentences, and compare the performance for distant and direct supervision. Our results suggest that distant supervision can work as well as, or better than, direct supervision on this problem, and that distantly trained models can perform as well as, or better than, human annotators. |
Tasks | |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-5012/ |
https://www.aclweb.org/anthology/W19-5012 | |
PWC | https://paperswithcode.com/paper/a-distantly-supervised-dataset-for-automated |
Repo | |
Framework | |
Normalizing Non-canonical Turkish Texts Using Machine Translation Approaches
Title | Normalizing Non-canonical Turkish Texts Using Machine Translation Approaches |
Authors | Talha Çolakoğlu, Umut Sulubacak, Ahmet Cüneyd Tantuğ |
Abstract | With the growth of the social web, user-generated text data has reached unprecedented sizes. Non-canonical text normalization provides a way to exploit this as a practical source of training data for language processing systems. The state of the art in Turkish text normalization is composed of a token level pipeline of modules, heavily dependent on external linguistic resources and manually defined rules. Instead, we propose a fully automated, context-aware machine translation approach with fewer stages of processing. Experiments with various implementations of our approach show that we are able to surpass the current best-performing system by a large margin. |
Tasks | Machine Translation |
Published | 2019-07-01 |
URL | https://www.aclweb.org/anthology/P19-2037/ |
https://www.aclweb.org/anthology/P19-2037 | |
PWC | https://paperswithcode.com/paper/normalizing-non-canonical-turkish-texts-using |
Repo | |
Framework | |
Enhancing biomedical word embeddings by retrofitting to verb clusters
Title | Enhancing biomedical word embeddings by retrofitting to verb clusters |
Authors | Billy Chiu, Simon Baker, Martha Palmer, Anna Korhonen |
Abstract | Verbs play a fundamental role in many biomedical tasks and applications such as relation and event extraction. We hypothesize that performance on many downstream tasks can be improved by aligning the input pretrained embeddings according to semantic verb classes. In this work, we show that by using semantic clusters for verbs, a large lexicon of verb classes derived from biomedical literature, we are able to improve the performance of common pretrained embeddings in downstream tasks by retrofitting them to verb classes. We present a simple and computationally efficient approach using a widely-available "off-the-shelf" retrofitting algorithm to align pretrained embeddings according to semantic verb clusters. We achieve state-of-the-art results on text classification and relation extraction tasks. |
Tasks | Relation Extraction, Text Classification, Word Embeddings |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-5014/ |
https://www.aclweb.org/anthology/W19-5014 | |
PWC | https://paperswithcode.com/paper/enhancing-biomedical-word-embeddings-by |
Repo | |
Framework | |
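The abstract above does not name the retrofitting algorithm it uses. As a rough illustration only, the sketch below assumes an update in the style of Faruqui et al. (2015), where each word vector is repeatedly moved to the average of its original vector and the mean of its cluster neighbours. The function name `retrofit`, the toy vectors, and the toy verb cluster are hypothetical and not taken from the paper.

```python
from typing import Dict, List, Set

Vector = List[float]


def retrofit(embeddings: Dict[str, Vector],
             clusters: List[List[str]],
             iterations: int = 10) -> Dict[str, Vector]:
    """Pull vectors of words that share a cluster toward each other.

    Assumed weights: 1 for the original vector and 1/len(neighbours) for
    each neighbour, so every update is the average of the original vector
    and the current mean of the neighbours.
    """
    # Neighbours of a word: all other in-vocabulary members of its clusters.
    neighbours: Dict[str, Set[str]] = {w: set() for w in embeddings}
    for cluster in clusters:
        members = [w for w in cluster if w in embeddings]
        for w in members:
            neighbours[w].update(m for m in members if m != w)

    # Start from a copy so the input embeddings are left untouched.
    new = {w: list(v) for w, v in embeddings.items()}
    for _ in range(iterations):
        for w, nbrs in neighbours.items():
            if not nbrs:
                continue  # words outside every cluster keep their vector
            dim = len(embeddings[w])
            mean = [sum(new[n][d] for n in nbrs) / len(nbrs) for d in range(dim)]
            new[w] = [(embeddings[w][d] + mean[d]) / 2.0 for d in range(dim)]
    return new


def distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


# Toy data (hypothetical): two verbs in one cluster, one unrelated verb.
toy = {"activate": [1.0, 0.0], "induce": [0.0, 1.0], "bind": [5.0, 5.0]}
fitted = retrofit(toy, clusters=[["activate", "induce"]])
```

In this toy run, `activate` and `induce` end up closer together than they started, while `bind`, which belongs to no cluster, keeps its original vector. Keeping the original vector in every update is what stops clustered words from collapsing onto a single point.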
Commonsense inference in human-robot communication
Title | Commonsense inference in human-robot communication |
Authors | Aliaksandr Huminski, Yan Bin Ng, Kenneth Kwok, Francis Bond |
Abstract | Natural language communication between machines and humans is still constrained. The article addresses a gap in natural language understanding about actions, specifically that of understanding commands. We propose a new method for commonsense inference (grounding) of high-level natural language commands into specific action commands for further execution by a robotic system. The method makes it possible to build a knowledge base that consists of a large set of commonsense inferences. Preliminary results are presented. |
Tasks | |
Published | 2019-11-01 |
URL | https://www.aclweb.org/anthology/D19-6013/ |
https://www.aclweb.org/anthology/D19-6013 | |
PWC | https://paperswithcode.com/paper/commonsense-inference-in-human-robot |
Repo | |
Framework | |
Markerless Outdoor Human Motion Capture Using Multiple Autonomous Micro Aerial Vehicles
Title | Markerless Outdoor Human Motion Capture Using Multiple Autonomous Micro Aerial Vehicles |
Authors | Nitin Saini, Eric Price, Rahul Tallamraju, Raffi Enficiaud, Roman Ludwig, Igor Martinovic, Aamir Ahmad, Michael J. Black |
Abstract | Capturing human motion in natural scenarios means moving motion capture out of the lab and into the wild. Typical approaches rely on fixed, calibrated cameras and reflective markers on the body, significantly limiting the motions that can be captured. To make motion capture truly unconstrained, we describe the first fully autonomous outdoor capture system based on flying vehicles. We use multiple micro aerial vehicles (MAVs), each equipped with a monocular RGB camera, an IMU, and a GPS receiver module. These detect the person, optimize their position, and localize themselves approximately. We then develop a markerless motion capture method that is suitable for this challenging scenario with a distant subject, viewed from above, with approximately calibrated and moving cameras. We combine multiple state-of-the-art 2D joint detectors with a 3D human body model and a powerful prior on human pose. We jointly optimize for 3D body pose and camera pose to robustly fit the 2D measurements. To our knowledge, this is the first successful demonstration of outdoor, full-body, markerless motion capture from autonomous flying vehicles. |
Tasks | Markerless Motion Capture, Motion Capture |
Published | 2019-10-01 |
URL | http://openaccess.thecvf.com/content_ICCV_2019/html/Saini_Markerless_Outdoor_Human_Motion_Capture_Using_Multiple_Autonomous_Micro_Aerial_ICCV_2019_paper.html |
http://openaccess.thecvf.com/content_ICCV_2019/papers/Saini_Markerless_Outdoor_Human_Motion_Capture_Using_Multiple_Autonomous_Micro_Aerial_ICCV_2019_paper.pdf | |
PWC | https://paperswithcode.com/paper/markerless-outdoor-human-motion-capture-using |
Repo | |
Framework | |
Extractive NarrativeQA with Heuristic Pre-Training
Title | Extractive NarrativeQA with Heuristic Pre-Training |
Authors | Lea Frermann |
Abstract | Although advances in neural architectures for NLP problems as well as unsupervised pre-training have led to substantial improvements on question answering and natural language inference, understanding of and reasoning over long texts still poses a substantial challenge. Here, we consider the task of question answering from full narratives (e.g., books or movie scripts), or their summaries, tackling the NarrativeQA challenge (NQA; Kocisky et al. (2018)). We introduce a heuristic extractive version of the data set, which allows us to approach the more feasible problem of answer extraction (rather than generation). We train systems for passage retrieval as well as answer span prediction using this data set. We use pre-trained BERT embeddings for injecting prior knowledge into our system. We show that our setup leads to state of the art performance on summary-level QA. On QA from full narratives, our model outperforms previous models on the METEOR metric. We analyze the relative contributions of pre-trained embeddings and the extractive training paradigm, and provide a detailed error analysis. |
Tasks | Natural Language Inference, Question Answering |
Published | 2019-11-01 |
URL | https://www.aclweb.org/anthology/D19-5823/ |
https://www.aclweb.org/anthology/D19-5823 | |
PWC | https://paperswithcode.com/paper/extractive-narrativeqa-with-heuristic-pre |
Repo | |
Framework | |
Tintin at SemEval-2019 Task 4: Detecting Hyperpartisan News Article with only Simple Tokens
Title | Tintin at SemEval-2019 Task 4: Detecting Hyperpartisan News Article with only Simple Tokens |
Authors | Yves Bestgen |
Abstract | Tintin, the system proposed by the CECL for the Hyperpartisan News Detection task of SemEval 2019, is exclusively based on the tokens that make up the documents and a standard supervised learning procedure. It obtained very contrasting results: poor on the main task, but much more effective at distinguishing documents published by hyperpartisan media outlets from unbiased ones, as it ranked first. An analysis of the most important features highlighted the positive aspects, but also some potential limitations of the approach. |
Tasks | |
Published | 2019-06-01 |
URL | https://www.aclweb.org/anthology/S19-2186/ |
https://www.aclweb.org/anthology/S19-2186 | |
PWC | https://paperswithcode.com/paper/tintin-at-semeval-2019-task-4-detecting |
Repo | |
Framework | |
Information-theoretic locality properties of natural language
Title | Information-theoretic locality properties of natural language |
Authors | Richard Futrell |
Abstract | |
Tasks | |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-7902/ |
https://www.aclweb.org/anthology/W19-7902 | |
PWC | https://paperswithcode.com/paper/information-theoretic-locality-properties-of |
Repo | |
Framework | |
Locally Linear Unsupervised Feature Selection
Title | Locally Linear Unsupervised Feature Selection |
Authors | Guillaume DOQUET, Michèle SEBAG |
Abstract | The paper, interested in unsupervised feature selection, aims to retain the features best accounting for the local patterns in the data. The proposed approach, called Locally Linear Unsupervised Feature Selection, relies on a dimensionality reduction method to characterize such patterns; each feature is thereafter assessed according to its compliance w.r.t. the local patterns, taking inspiration from Locally Linear Embedding (Roweis and Saul, 2000). The experimental validation of the approach on the scikit-feature benchmark suite demonstrates its effectiveness compared to the state of the art. |
Tasks | Dimensionality Reduction, Feature Selection |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=ByxF-nAqYX |
https://openreview.net/pdf?id=ByxF-nAqYX | |
PWC | https://paperswithcode.com/paper/locally-linear-unsupervised-feature-selection |
Repo | |
Framework | |
Automatic error classification with multiple error labels
Title | Automatic error classification with multiple error labels |
Authors | Maja Popovic, David Vilar |
Abstract | |
Tasks | |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-6609/ |
https://www.aclweb.org/anthology/W19-6609 | |
PWC | https://paperswithcode.com/paper/automatic-error-classification-with-multiple |
Repo | |
Framework | |
Towards augmenting crisis counselor training by improving message retrieval
Title | Towards augmenting crisis counselor training by improving message retrieval |
Authors | Orianna Demasi, Marti A. Hearst, Benjamin Recht |
Abstract | A fundamental challenge when training counselors is presenting novices with the opportunity to practice counseling distressed individuals without exacerbating a situation. Rather than replacing human empathy with an automated counselor, we propose simulating an individual in crisis so that human counselors in training can practice crisis counseling in a low-risk environment. Towards this end, we collect a dataset of suicide prevention counselor role-play transcripts and make initial steps towards constructing a CRISISbot for humans to counsel while in training. In this data-constrained setting, we evaluate the potential for message retrieval to construct a coherent chat agent in light of recent advances with text embedding methods. Our results show that embeddings can considerably improve retrieval approaches to make them competitive with generative models. By coherently retrieving messages, we can help counselors practice chatting in a low-risk environment. |
Tasks | |
Published | 2019-06-01 |
URL | https://www.aclweb.org/anthology/W19-3001/ |
https://www.aclweb.org/anthology/W19-3001 | |
PWC | https://paperswithcode.com/paper/towards-augmenting-crisis-counselor-training |
Repo | |
Framework | |
Classifying Information Sources in Arabic Twitter to Support Online Monitoring of Infectious Diseases
Title | Classifying Information Sources in Arabic Twitter to Support Online Monitoring of Infectious Diseases |
Authors | Lama Alsudias, Paul Rayson |
Abstract | |
Tasks | |
Published | 2019-07-01 |
URL | https://www.aclweb.org/anthology/W19-5604/ |
https://www.aclweb.org/anthology/W19-5604 | |
PWC | https://paperswithcode.com/paper/classifying-information-sources-in-arabic |
Repo | |
Framework | |