January 25, 2020


Paper Group NANR 85

A Compact and Language-Sensitive Multilingual Translation Method. Using OntoLex-Lemon for Representing and Interlinking German Multiword Expressions in OdeNet and MMORPH. Treat the Word As a Whole or Look Inside? Subword Embeddings Model Language Change and Typology. A distantly supervised dataset for automated data extraction from diagnostic studi …

A Compact and Language-Sensitive Multilingual Translation Method


Title	A Compact and Language-Sensitive Multilingual Translation Method
Authors	Yining Wang, Long Zhou, Jiajun Zhang, Feifei Zhai, Jingfang Xu, Chengqing Zong
Abstract	Multilingual neural machine translation (Multi-NMT) with one encoder-decoder model has made remarkable progress due to its simple deployment. However, this multilingual translation paradigm does not make full use of language commonality and parameter sharing between encoder and decoder. Furthermore, this kind of paradigm cannot outperform the individual models trained on bilingual corpus in most cases. In this paper, we propose a compact and language-sensitive method for multilingual translation. To maximize parameter sharing, we first present a universal representor to replace both encoder and decoder models. To make the representor sensitive to specific languages, we further introduce language-sensitive embedding, attention, and discriminator with the ability to enhance model performance. We verify our methods on various translation scenarios, including one-to-many, many-to-many and zero-shot. Extensive experiments demonstrate that our proposed methods remarkably outperform strong standard multilingual translation systems on WMT and IWSLT datasets. Moreover, we find that our model is especially helpful in low-resource and zero-shot translation scenarios.
Tasks	Machine Translation
Published	2019-07-01
URL	https://www.aclweb.org/anthology/P19-1117/
PDF	https://www.aclweb.org/anthology/P19-1117
PWC	https://paperswithcode.com/paper/a-compact-and-language-sensitive-multilingual
Repo
Framework

Using OntoLex-Lemon for Representing and Interlinking German Multiword Expressions in OdeNet and MMORPH


Title	Using OntoLex-Lemon for Representing and Interlinking German Multiword Expressions in OdeNet and MMORPH
Authors	Thierry Declerck, Melanie Siegel, Stefania Racioppa
Abstract	We describe work on porting two large German lexical resources into the OntoLex-Lemon model in order to establish complementary interlinkings between them. One resource is OdeNet (Open German WordNet) and the other is a further development of the German version of the MMORPH morphological analyzer. We show how the Multiword Expressions (MWEs) contained in OdeNet can be morphologically specified by the use of the lexical representation and linking features of OntoLex-Lemon, which also support the formulation of restrictions in the usage of such expressions.
Tasks
Published	2019-08-01
URL	https://www.aclweb.org/anthology/W19-5104/
PDF	https://www.aclweb.org/anthology/W19-5104
PWC	https://paperswithcode.com/paper/using-ontolex-lemon-for-representing-and
Repo
Framework

Treat the Word As a Whole or Look Inside? Subword Embeddings Model Language Change and Typology


Title	Treat the Word As a Whole or Look Inside? Subword Embeddings Model Language Change and Typology
Authors	Yang Xu, Jiasheng Zhang, David Reitter
Abstract	We use a variant of word embedding model that incorporates subword information to characterize the degree of compositionality in lexical semantics. Our models reveal some interesting yet contrastive patterns of long-term change in multiple languages: Indo-European languages put more weight on subword units in newer words, while conversely Chinese puts less weight on the subwords, but more weight on the word as a whole. Our method provides novel evidence and methodology that enriches existing theories in evolutionary linguistics. The resulting word vectors also have decent performance in NLP-related tasks.
Tasks
Published	2019-08-01
URL	https://www.aclweb.org/anthology/W19-4717/
PDF	https://www.aclweb.org/anthology/W19-4717
PWC	https://paperswithcode.com/paper/treat-the-word-as-a-whole-or-look-inside
Repo
Framework
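The abstract does not spell out the embedding variant. As background only, here is a minimal sketch of fastText-style subword composition (Bojanowski et al., 2017), assuming a word's vector is its whole-word vector plus the vectors of its boundary-marked character n-grams. The paper's variant, and how it weighs subwords against the whole word, may differ; the lookup tables `word_vecs` and `ngram_vecs` are hypothetical.

```python
import numpy as np

def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams of a word wrapped in '<' and '>' boundary markers."""
    padded = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            grams.append(padded[i:i + n])
    return grams

def compose(word, word_vecs, ngram_vecs, dim=4, n_min=3, n_max=6):
    """Word vector = whole-word vector + sum of its subword n-gram vectors.
    Units missing from the (hypothetical) lookup tables contribute zero."""
    vec = word_vecs.get(word, np.zeros(dim)).copy()
    for g in char_ngrams(word, n_min, n_max):
        vec += ngram_vecs.get(g, np.zeros(dim))
    return vec
```

For example, `char_ngrams("where", 3, 3)` yields `['<wh', 'whe', 'her', 'ere', 're>']`. Comparing how much of a trained vector comes from the whole-word term versus the n-gram terms is one way to think about the "weight on subword units" the abstract refers to.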

A distantly supervised dataset for automated data extraction from diagnostic studies


Title	A distantly supervised dataset for automated data extraction from diagnostic studies
Authors	Christopher Norman, Mariska Leeflang, René Spijker, Evangelos Kanoulas, Aurélie Névéol
Abstract	Systematic reviews are important in evidence-based medicine, but are expensive to produce. Automating or semi-automating the data extraction of index test, target condition, and reference standard from articles has the potential to decrease the cost of conducting systematic reviews of diagnostic test accuracy, but relevant training data is not available. We create a distantly supervised dataset of approximately 90,000 sentences, and let two experts manually annotate a small subset of around 1,000 sentences for evaluation. We evaluate the performance of BioBERT and logistic regression for ranking the sentences, and compare the performance for distant and direct supervision. Our results suggest that distant supervision can work as well as, or better than, direct supervision on this problem, and that distantly trained models can perform as well as, or better than, human annotators.
Tasks
Published	2019-08-01
URL	https://www.aclweb.org/anthology/W19-5012/
PDF	https://www.aclweb.org/anthology/W19-5012
PWC	https://paperswithcode.com/paper/a-distantly-supervised-dataset-for-automated
Repo
Framework

Normalizing Non-canonical Turkish Texts Using Machine Translation Approaches


Title	Normalizing Non-canonical Turkish Texts Using Machine Translation Approaches
Authors	Talha Çolakoğlu, Umut Sulubacak, Ahmet Cüneyd Tantuğ
Abstract	With the growth of the social web, user-generated text data has reached unprecedented sizes. Non-canonical text normalization provides a way to exploit this as a practical source of training data for language processing systems. The state of the art in Turkish text normalization is composed of a token level pipeline of modules, heavily dependent on external linguistic resources and manually defined rules. Instead, we propose a fully automated, context-aware machine translation approach with fewer stages of processing. Experiments with various implementations of our approach show that we are able to surpass the current best-performing system by a large margin.
Tasks	Machine Translation
Published	2019-07-01
URL	https://www.aclweb.org/anthology/P19-2037/
PDF	https://www.aclweb.org/anthology/P19-2037
PWC	https://paperswithcode.com/paper/normalizing-non-canonical-turkish-texts-using
Repo
Framework

Enhancing biomedical word embeddings by retrofitting to verb clusters


Title	Enhancing biomedical word embeddings by retrofitting to verb clusters
Authors	Billy Chiu, Simon Baker, Martha Palmer, Anna Korhonen
Abstract	Verbs play a fundamental role in many biomedical tasks and applications such as relation and event extraction. We hypothesize that performance on many downstream tasks can be improved by aligning the input pretrained embeddings according to semantic verb classes. In this work, we show that by using semantic clusters for verbs, a large lexicon of verb classes derived from biomedical literature, we are able to improve the performance of common pretrained embeddings in downstream tasks by retrofitting them to verb classes. We present a simple and computationally efficient approach using a widely available "off-the-shelf" retrofitting algorithm to align pretrained embeddings according to semantic verb clusters. We achieve state-of-the-art results on text classification and relation extraction tasks.
Tasks	Relation Extraction, Text Classification, Word Embeddings
Published	2019-08-01
URL	https://www.aclweb.org/anthology/W19-5014/
PDF	https://www.aclweb.org/anthology/W19-5014
PWC	https://paperswithcode.com/paper/enhancing-biomedical-word-embeddings-by
Repo
Framework
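The abstract refers to an "off-the-shelf" retrofitting algorithm without naming it. A minimal sketch of the widely used retrofitting update of Faruqui et al. (2015) is given here, assuming that is the algorithm meant, that each verb's neighbours are the other members of its cluster, and that the usual settings alpha = 1 and beta = 1/degree apply. This is an illustration, not the authors' code; the function names and any cluster data are hypothetical.

```python
import numpy as np

def clusters_to_lexicon(clusters):
    """Turn verb clusters into a neighbour lexicon: each verb links to every
    other verb in the same cluster."""
    lexicon = {}
    for members in clusters:
        for verb in members:
            lexicon.setdefault(verb, set()).update(m for m in members if m != verb)
    return {verb: sorted(nbrs) for verb, nbrs in lexicon.items()}

def retrofit(embeddings, lexicon, iterations=10, alpha=1.0):
    """Retrofitting in the style of Faruqui et al. (2015).

    Each word is pulled towards its lexicon neighbours while staying anchored
    to its original vector:
        q_i = (alpha * q_hat_i + sum_j beta * q_j) / (alpha + beta * deg_i)
    with beta = 1 / deg_i, so the neighbour weights sum to one.
    """
    new = {w: np.asarray(v, dtype=float).copy() for w, v in embeddings.items()}
    for _ in range(iterations):
        for word, neighbours in lexicon.items():
            nbrs = [n for n in neighbours if n in new and n != word]
            if word not in new or not nbrs:
                continue  # out-of-vocabulary and isolated words stay unchanged
            beta = 1.0 / len(nbrs)
            total = alpha * np.asarray(embeddings[word], dtype=float)
            for n in nbrs:
                total = total + beta * new[n]  # uses the latest neighbour vectors
            new[word] = total / (alpha + beta * len(nbrs))
    return new
```

With two toy verbs placed in one cluster, each retrofitted vector settles between its original position and its neighbour, so the pair moves closer together. Words outside every cluster keep their original vectors.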

Commonsense inference in human-robot communication


Title	Commonsense inference in human-robot communication
Authors	Aliaksandr Huminski, Yan Bin Ng, Kenneth Kwok, Francis Bond
Abstract	Natural language communication between machines and humans is still constrained. The article addresses a gap in natural language understanding about actions, specifically that of understanding commands. We propose a new method for commonsense inference (grounding) of high-level natural language commands into specific action commands for further execution by a robotic system. The method makes it possible to build a knowledge base that consists of a large set of commonsense inferences. Preliminary results are presented.
Tasks
Published	2019-11-01
URL	https://www.aclweb.org/anthology/D19-6013/
PDF	https://www.aclweb.org/anthology/D19-6013
PWC	https://paperswithcode.com/paper/commonsense-inference-in-human-robot
Repo
Framework

Markerless Outdoor Human Motion Capture Using Multiple Autonomous Micro Aerial Vehicles


Title	Markerless Outdoor Human Motion Capture Using Multiple Autonomous Micro Aerial Vehicles
Authors	Nitin Saini, Eric Price, Rahul Tallamraju, Raffi Enficiaud, Roman Ludwig, Igor Martinovic, Aamir Ahmad, Michael J. Black
Abstract	Capturing human motion in natural scenarios means moving motion capture out of the lab and into the wild. Typical approaches rely on fixed, calibrated cameras and reflective markers on the body, significantly limiting the motions that can be captured. To make motion capture truly unconstrained, we describe the first fully autonomous outdoor capture system based on flying vehicles. We use multiple micro aerial vehicles (MAVs), each equipped with a monocular RGB camera, an IMU, and a GPS receiver module. These detect the person, optimize their position, and localize themselves approximately. We then develop a markerless motion capture method that is suitable for this challenging scenario with a distant subject, viewed from above, with approximately calibrated and moving cameras. We combine multiple state-of-the-art 2D joint detectors with a 3D human body model and a powerful prior on human pose. We jointly optimize for 3D body pose and camera pose to robustly fit the 2D measurements. To our knowledge, this is the first successful demonstration of outdoor, full-body, markerless motion capture from autonomous flying vehicles.
Tasks	Markerless Motion Capture, Motion Capture
Published	2019-10-01
URL	http://openaccess.thecvf.com/content_ICCV_2019/html/Saini_Markerless_Outdoor_Human_Motion_Capture_Using_Multiple_Autonomous_Micro_Aerial_ICCV_2019_paper.html
PDF	http://openaccess.thecvf.com/content_ICCV_2019/papers/Saini_Markerless_Outdoor_Human_Motion_Capture_Using_Multiple_Autonomous_Micro_Aerial_ICCV_2019_paper.pdf
PWC	https://paperswithcode.com/paper/markerless-outdoor-human-motion-capture-using
Repo
Framework

Extractive NarrativeQA with Heuristic Pre-Training


Title	Extractive NarrativeQA with Heuristic Pre-Training
Authors	Lea Frermann
Abstract	Although advances in neural architectures for NLP problems as well as unsupervised pre-training have led to substantial improvements on question answering and natural language inference, understanding of and reasoning over long texts still poses a substantial challenge. Here, we consider the task of question answering from full narratives (e.g., books or movie scripts), or their summaries, tackling the NarrativeQA challenge (NQA; Kocisky et al., 2018). We introduce a heuristic extractive version of the data set, which allows us to approach the more feasible problem of answer extraction (rather than generation). We train systems for passage retrieval as well as answer span prediction using this data set. We use pre-trained BERT embeddings for injecting prior knowledge into our system. We show that our setup leads to state-of-the-art performance on summary-level QA. On QA from full narratives, our model outperforms previous models on the METEOR metric. We analyze the relative contributions of pre-trained embeddings and the extractive training paradigm, and provide a detailed error analysis.
Tasks	Natural Language Inference, Question Answering
Published	2019-11-01
URL	https://www.aclweb.org/anthology/D19-5823/
PDF	https://www.aclweb.org/anthology/D19-5823
PWC	https://paperswithcode.com/paper/extractive-narrativeqa-with-heuristic-pre
Repo
Framework
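The abstract does not say which heuristic turns NarrativeQA's free-form answers into extractive spans. One common choice, assumed here purely for illustration, is to label the passage span with the highest token-level F1 against the reference answer. `best_span` and its `max_len` cap are hypothetical names, not the paper's.

```python
from collections import Counter

def token_f1(pred, gold):
    """SQuAD-style token overlap F1 between two token lists."""
    overlap = sum((Counter(pred) & Counter(gold)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)

def best_span(passage_tokens, answer_tokens, max_len=None):
    """Return (f1, (start, end)) for the passage span [start:end] that best
    matches a free-form answer; ties keep the earliest span found."""
    max_len = max_len or 2 * len(answer_tokens)  # assumed cap on span length
    best = (0.0, None)
    for i in range(len(passage_tokens)):
        for j in range(i + 1, min(len(passage_tokens), i + max_len) + 1):
            f1 = token_f1(passage_tokens[i:j], answer_tokens)
            if f1 > best[0]:
                best = (f1, (i, j))
    return best
```

For the passage "the old man sailed to the island alone" and the answer "he sailed to the island", the selected span is "sailed to the island" (tokens 3 to 7), with F1 = 8/9. The brute-force search is quadratic in passage length; on full narratives it would typically be run only on retrieved passages.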

Tintin at SemEval-2019 Task 4: Detecting Hyperpartisan News Article with only Simple Tokens


Title	Tintin at SemEval-2019 Task 4: Detecting Hyperpartisan News Article with only Simple Tokens
Authors	Yves Bestgen
Abstract	Tintin, the system proposed by the CECL for the Hyperpartisan News Detection task of SemEval 2019, is exclusively based on the tokens that make up the documents and a standard supervised learning procedure. It obtained very contrasting results: poor on the main task, but much more effective at distinguishing documents published by hyperpartisan media outlets from unbiased ones, as it ranked first. An analysis of the most important features highlighted the positive aspects, but also some potential limitations of the approach.
Tasks
Published	2019-06-01
URL	https://www.aclweb.org/anthology/S19-2186/
PDF	https://www.aclweb.org/anthology/S19-2186
PWC	https://paperswithcode.com/paper/tintin-at-semeval-2019-task-4-detecting
Repo
Framework

Information-theoretic locality properties of natural language


Title	Information-theoretic locality properties of natural language
Authors	Richard Futrell
Abstract
Tasks
Published	2019-08-01
URL	https://www.aclweb.org/anthology/W19-7902/
PDF	https://www.aclweb.org/anthology/W19-7902
PWC	https://paperswithcode.com/paper/information-theoretic-locality-properties-of
Repo
Framework

Locally Linear Unsupervised Feature Selection


Title	Locally Linear Unsupervised Feature Selection
Authors	Guillaume Doquet, Michèle Sebag
Abstract	The paper, interested in unsupervised feature selection, aims to retain the features best accounting for the local patterns in the data. The proposed approach, called Locally Linear Unsupervised Feature Selection, relies on a dimensionality reduction method to characterize such patterns; each feature is thereafter assessed according to its compliance w.r.t. the local patterns, taking inspiration from Locally Linear Embedding (Roweis and Saul, 2000). The experimental validation of the approach on the scikit-feature benchmark suite demonstrates its effectiveness compared to the state of the art.
Tasks	Dimensionality Reduction, Feature Selection
Published	2019-05-01
URL	https://openreview.net/forum?id=ByxF-nAqYX
PDF	https://openreview.net/pdf?id=ByxF-nAqYX
PWC	https://paperswithcode.com/paper/locally-linear-unsupervised-feature-selection
Repo
Framework
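The abstract describes the method only at a high level. The following is an illustrative reading, not the authors' algorithm. It takes a low-dimensional representation `Z` as given, since the paper computes it with its own dimensionality reduction step that is not reproduced here. It then computes Locally Linear Embedding reconstruction weights in `Z` (Roweis and Saul, 2000) and scores each original feature by how poorly those weights reconstruct it. The function names and the `k` and `reg` values are assumptions.

```python
import numpy as np

def lle_weights(Z, k=5, reg=1e-3):
    """LLE reconstruction weights: each row of Z is written as an affine
    combination (weights summing to one) of its k nearest neighbours."""
    n = Z.shape[0]
    dists = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbour
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(dists[i])[:k]
        D = Z[nbrs] - Z[i]              # neighbours centred on point i
        C = D @ D.T                     # local k x k Gram matrix
        tr = np.trace(C)
        C += (reg * tr if tr > 0 else reg) * np.eye(k)  # C is singular when k > dim(Z)
        w = np.linalg.solve(C, np.ones(k))
        W[i, nbrs] = w / w.sum()
    return W

def feature_scores(X, Z, k=5, reg=1e-3):
    """Mean squared reconstruction error of each column of X under the local
    weights learned in Z; lower = feature more consistent with local patterns."""
    W = lle_weights(Z, k, reg)
    residual = X - W @ X
    return (residual ** 2).mean(axis=0)
```

Ranking features by ascending score and keeping the top ones gives a selection. A feature that varies smoothly along the structure captured by `Z` gets a score near zero, while an unrelated noise feature does not.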

Automatic error classification with multiple error labels


Title	Automatic error classification with multiple error labels
Authors	Maja Popovic, David Vilar
Abstract
Tasks
Published	2019-08-01
URL	https://www.aclweb.org/anthology/W19-6609/
PDF	https://www.aclweb.org/anthology/W19-6609
PWC	https://paperswithcode.com/paper/automatic-error-classification-with-multiple
Repo
Framework

Towards augmenting crisis counselor training by improving message retrieval


Title	Towards augmenting crisis counselor training by improving message retrieval
Authors	Orianna Demasi, Marti A. Hearst, Benjamin Recht
Abstract	A fundamental challenge when training counselors is presenting novices with the opportunity to practice counseling distressed individuals without exacerbating a situation. Rather than replacing human empathy with an automated counselor, we propose simulating an individual in crisis so that human counselors in training can practice crisis counseling in a low-risk environment. Towards this end, we collect a dataset of suicide prevention counselor role-play transcripts and take initial steps towards constructing a CRISISbot for humans to counsel while in training. In this data-constrained setting, we evaluate the potential for message retrieval to construct a coherent chat agent in light of recent advances with text embedding methods. Our results show that embeddings can considerably improve retrieval approaches to make them competitive with generative models. By coherently retrieving messages, we can help counselors practice chatting in a low-risk environment.
Tasks
Published	2019-06-01
URL	https://www.aclweb.org/anthology/W19-3001/
PDF	https://www.aclweb.org/anthology/W19-3001
PWC	https://paperswithcode.com/paper/towards-augmenting-crisis-counselor-training
Repo
Framework
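To make the retrieval idea concrete, here is a minimal sketch of a retrieval chat agent under stated assumptions. It stores (counselor message, simulated reply) pairs from transcripts, embeds the stored counselor messages, and answers a trainee's message with the reply paired to the most cosine-similar stored message. The paper evaluates learned text embeddings. The L2-normalised bag-of-words `embed` used here is a hypothetical stand-in, and the class and function names are illustrative.

```python
import numpy as np
from collections import Counter

def build_vocab(texts):
    vocab = sorted({tok for t in texts for tok in t.lower().split()})
    return {tok: i for i, tok in enumerate(vocab)}

def embed(text, vocab):
    """Stand-in embedding: L2-normalised bag-of-words counts (hypothetical;
    a learned sentence embedding would replace this)."""
    v = np.zeros(len(vocab))
    for tok, count in Counter(text.lower().split()).items():
        if tok in vocab:
            v[vocab[tok]] = count
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

class RetrievalBot:
    def __init__(self, pairs):
        # pairs: (counselor message, simulated reply) taken from role-play transcripts
        self.prompts = [p for p, _ in pairs]
        self.replies = [r for _, r in pairs]
        self.vocab = build_vocab(self.prompts)
        self.matrix = np.stack([embed(p, self.vocab) for p in self.prompts])

    def respond(self, message):
        scores = self.matrix @ embed(message, self.vocab)  # cosine similarity of unit vectors
        return self.replies[int(np.argmax(scores))]
```

Swapping `embed` for a stronger sentence encoder changes only which stored message is judged closest. The agent can only ever return replies that already exist in the transcripts, which is what keeps its responses coherent.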

Classifying Information Sources in Arabic Twitter to Support Online Monitoring of Infectious Diseases


Title	Classifying Information Sources in Arabic Twitter to Support Online Monitoring of Infectious Diseases
Authors	Lama Alsudias, Paul Rayson
Abstract
Tasks
Published	2019-07-01
URL	https://www.aclweb.org/anthology/W19-5604/
PDF	https://www.aclweb.org/anthology/W19-5604
PWC	https://paperswithcode.com/paper/classifying-information-sources-in-arabic
Repo
Framework