January 25, 2020


Paper Group NANR 85



A Compact and Language-Sensitive Multilingual Translation Method

Title A Compact and Language-Sensitive Multilingual Translation Method
Authors Yining Wang, Long Zhou, Jiajun Zhang, Feifei Zhai, Jingfang Xu, Chengqing Zong
Abstract Multilingual neural machine translation (Multi-NMT) with one encoder-decoder model has made remarkable progress due to its simple deployment. However, this multilingual translation paradigm does not make full use of language commonality and parameter sharing between encoder and decoder. Furthermore, this kind of paradigm cannot outperform the individual models trained on bilingual corpus in most cases. In this paper, we propose a compact and language-sensitive method for multilingual translation. To maximize parameter sharing, we first present a universal representor to replace both encoder and decoder models. To make the representor sensitive for specific languages, we further introduce language-sensitive embedding, attention, and discriminator with the ability to enhance model performance. We verify our methods on various translation scenarios, including one-to-many, many-to-many and zero-shot. Extensive experiments demonstrate that our proposed methods remarkably outperform strong standard multilingual translation systems on WMT and IWSLT datasets. Moreover, we find that our model is especially helpful in low-resource and zero-shot translation scenarios.
Tasks Machine Translation
Published 2019-07-01
URL https://www.aclweb.org/anthology/P19-1117/
PDF https://www.aclweb.org/anthology/P19-1117
PWC https://paperswithcode.com/paper/a-compact-and-language-sensitive-multilingual
Repo
Framework

Using OntoLex-Lemon for Representing and Interlinking German Multiword Expressions in OdeNet and MMORPH

Title Using OntoLex-Lemon for Representing and Interlinking German Multiword Expressions in OdeNet and MMORPH
Authors Thierry Declerck, Melanie Siegel, Stefania Racioppa
Abstract We describe work consisting in porting two large German lexical resources into the OntoLex-Lemon model in order to establish complementary interlinkings between them. One resource is OdeNet (Open German WordNet) and the other is a further development of the German version of the MMORPH morphological analyzer. We show how the Multiword Expressions (MWEs) contained in OdeNet can be morphologically specified by the use of the lexical representation and linking features of OntoLex-Lemon, which also support the formulation of restrictions in the usage of such expressions.
Tasks
Published 2019-08-01
URL https://www.aclweb.org/anthology/W19-5104/
PDF https://www.aclweb.org/anthology/W19-5104
PWC https://paperswithcode.com/paper/using-ontolex-lemon-for-representing-and
Repo
Framework

Treat the Word As a Whole or Look Inside? Subword Embeddings Model Language Change and Typology

Title Treat the Word As a Whole or Look Inside? Subword Embeddings Model Language Change and Typology
Authors Yang Xu, Jiasheng Zhang, David Reitter
Abstract We use a variant of the word embedding model that incorporates subword information to characterize the degree of compositionality in lexical semantics. Our models reveal some interesting yet contrastive patterns of long-term change in multiple languages: Indo-European languages put more weight on subword units in newer words, while conversely Chinese puts less weight on the subwords, but more weight on the word as a whole. Our method provides novel evidence and methodology that enriches existing theories in evolutionary linguistics. The resulting word vectors also have decent performance in NLP-related tasks.
Tasks
Published 2019-08-01
URL https://www.aclweb.org/anthology/W19-4717/
PDF https://www.aclweb.org/anthology/W19-4717
PWC https://paperswithcode.com/paper/treat-the-word-as-a-whole-or-look-inside
Repo
Framework

A distantly supervised dataset for automated data extraction from diagnostic studies

Title A distantly supervised dataset for automated data extraction from diagnostic studies
Authors Christopher Norman, Mariska Leeflang, René Spijker, Evangelos Kanoulas, Aurélie Névéol
Abstract Systematic reviews are important in evidence based medicine, but are expensive to produce. Automating or semi-automating the data extraction of index test, target condition, and reference standard from articles has the potential to decrease the cost of conducting systematic reviews of diagnostic test accuracy, but relevant training data is not available. We create a distantly supervised dataset of approximately 90,000 sentences, and let two experts manually annotate a small subset of around 1,000 sentences for evaluation. We evaluate the performance of BioBERT and logistic regression for ranking the sentences, and compare the performance for distant and direct supervision. Our results suggest that distant supervision can work as well as, or better than direct supervision on this problem, and that distantly trained models can perform as well as, or better than human annotators.
Tasks
Published 2019-08-01
URL https://www.aclweb.org/anthology/W19-5012/
PDF https://www.aclweb.org/anthology/W19-5012
PWC https://paperswithcode.com/paper/a-distantly-supervised-dataset-for-automated
Repo
Framework

Normalizing Non-canonical Turkish Texts Using Machine Translation Approaches

Title Normalizing Non-canonical Turkish Texts Using Machine Translation Approaches
Authors Talha Çolakoğlu, Umut Sulubacak, Ahmet Cüneyd Tantuğ
Abstract With the growth of the social web, user-generated text data has reached unprecedented sizes. Non-canonical text normalization provides a way to exploit this as a practical source of training data for language processing systems. The state of the art in Turkish text normalization is composed of a token level pipeline of modules, heavily dependent on external linguistic resources and manually defined rules. Instead, we propose a fully automated, context-aware machine translation approach with fewer stages of processing. Experiments with various implementations of our approach show that we are able to surpass the current best-performing system by a large margin.
Tasks Machine Translation
Published 2019-07-01
URL https://www.aclweb.org/anthology/P19-2037/
PDF https://www.aclweb.org/anthology/P19-2037
PWC https://paperswithcode.com/paper/normalizing-non-canonical-turkish-texts-using
Repo
Framework

Enhancing biomedical word embeddings by retrofitting to verb clusters

Title Enhancing biomedical word embeddings by retrofitting to verb clusters
Authors Billy Chiu, Simon Baker, Martha Palmer, Anna Korhonen
Abstract Verbs play a fundamental role in many biomedical tasks and applications such as relation and event extraction. We hypothesize that performance on many downstream tasks can be improved by aligning the input pretrained embeddings according to semantic verb classes. In this work, we show that by using semantic clusters for verbs, a large lexicon of verb classes derived from biomedical literature, we are able to improve the performance of common pretrained embeddings in downstream tasks by retrofitting them to verb classes. We present a simple and computationally efficient approach using a widely-available "off-the-shelf" retrofitting algorithm to align pretrained embeddings according to semantic verb clusters. We achieve state-of-the-art results on text classification and relation extraction tasks.
Tasks Relation Extraction, Text Classification, Word Embeddings
Published 2019-08-01
URL https://www.aclweb.org/anthology/W19-5014/
PDF https://www.aclweb.org/anthology/W19-5014
PWC https://paperswithcode.com/paper/enhancing-biomedical-word-embeddings-by
Repo
Framework

Commonsense inference in human-robot communication

Title Commonsense inference in human-robot communication
Authors Aliaksandr Huminski, Yan Bin Ng, Kenneth Kwok, Francis Bond
Abstract Natural language communication between machines and humans is still constrained. The article addresses a gap in natural language understanding about actions, specifically that of understanding commands. We propose a new method for commonsense inference (grounding) of high-level natural language commands into specific action commands for further execution by a robotic system. The method makes it possible to build a knowledge base that consists of a large set of commonsense inferences. The preliminary results have been presented.
Tasks
Published 2019-11-01
URL https://www.aclweb.org/anthology/D19-6013/
PDF https://www.aclweb.org/anthology/D19-6013
PWC https://paperswithcode.com/paper/commonsense-inference-in-human-robot
Repo
Framework

Markerless Outdoor Human Motion Capture Using Multiple Autonomous Micro Aerial Vehicles

Title Markerless Outdoor Human Motion Capture Using Multiple Autonomous Micro Aerial Vehicles
Authors Nitin Saini, Eric Price, Rahul Tallamraju, Raffi Enficiaud, Roman Ludwig, Igor Martinovic, Aamir Ahmad, Michael J. Black
Abstract Capturing human motion in natural scenarios means moving motion capture out of the lab and into the wild. Typical approaches rely on fixed, calibrated, cameras and reflective markers on the body, significantly limiting the motions that can be captured. To make motion capture truly unconstrained, we describe the first fully autonomous outdoor capture system based on flying vehicles. We use multiple micro-aerial-vehicles (MAVs), each equipped with a monocular RGB camera, an IMU, and a GPS receiver module. These detect the person, optimize their position, and localize themselves approximately. We then develop a markerless motion capture method that is suitable for this challenging scenario with a distant subject, viewed from above, with approximately calibrated and moving cameras. We combine multiple state-of-the-art 2D joint detectors with a 3D human body model and a powerful prior on human pose. We jointly optimize for 3D body pose and camera pose to robustly fit the 2D measurements. To our knowledge, this is the first successful demonstration of outdoor, full-body, markerless motion capture from autonomous flying vehicles.
Tasks Markerless Motion Capture, Motion Capture
Published 2019-10-01
URL http://openaccess.thecvf.com/content_ICCV_2019/html/Saini_Markerless_Outdoor_Human_Motion_Capture_Using_Multiple_Autonomous_Micro_Aerial_ICCV_2019_paper.html
PDF http://openaccess.thecvf.com/content_ICCV_2019/papers/Saini_Markerless_Outdoor_Human_Motion_Capture_Using_Multiple_Autonomous_Micro_Aerial_ICCV_2019_paper.pdf
PWC https://paperswithcode.com/paper/markerless-outdoor-human-motion-capture-using
Repo
Framework

Extractive NarrativeQA with Heuristic Pre-Training

Title Extractive NarrativeQA with Heuristic Pre-Training
Authors Lea Frermann
Abstract Although advances in neural architectures for NLP problems as well as unsupervised pre-training have led to substantial improvements on question answering and natural language inference, understanding of and reasoning over long texts still poses a substantial challenge. Here, we consider the task of question answering from full narratives (e.g., books or movie scripts), or their summaries, tackling the NarrativeQA challenge (NQA; Kocisky et al. (2018)). We introduce a heuristic extractive version of the data set, which allows us to approach the more feasible problem of answer extraction (rather than generation). We train systems for passage retrieval as well as answer span prediction using this data set. We use pre-trained BERT embeddings for injecting prior knowledge into our system. We show that our setup leads to state of the art performance on summary-level QA. On QA from full narratives, our model outperforms previous models on the METEOR metric. We analyze the relative contributions of pre-trained embeddings and the extractive training paradigm, and provide a detailed error analysis.
Tasks Natural Language Inference, Question Answering
Published 2019-11-01
URL https://www.aclweb.org/anthology/D19-5823/
PDF https://www.aclweb.org/anthology/D19-5823
PWC https://paperswithcode.com/paper/extractive-narrativeqa-with-heuristic-pre
Repo
Framework

Tintin at SemEval-2019 Task 4: Detecting Hyperpartisan News Article with only Simple Tokens

Title Tintin at SemEval-2019 Task 4: Detecting Hyperpartisan News Article with only Simple Tokens
Authors Yves Bestgen
Abstract Tintin, the system proposed by the CECL for the Hyperpartisan News Detection task of SemEval 2019, is exclusively based on the tokens that make up the documents and a standard supervised learning procedure. It obtained very contrasting results: poor on the main task, but much more effective at distinguishing documents published by hyperpartisan media outlets from unbiased ones, as it ranked first. An analysis of the most important features highlighted the positive aspects, but also some potential limitations of the approach.
Tasks
Published 2019-06-01
URL https://www.aclweb.org/anthology/S19-2186/
PDF https://www.aclweb.org/anthology/S19-2186
PWC https://paperswithcode.com/paper/tintin-at-semeval-2019-task-4-detecting
Repo
Framework

Information-theoretic locality properties of natural language

Title Information-theoretic locality properties of natural language
Authors Richard Futrell
Abstract
Tasks
Published 2019-08-01
URL https://www.aclweb.org/anthology/W19-7902/
PDF https://www.aclweb.org/anthology/W19-7902
PWC https://paperswithcode.com/paper/information-theoretic-locality-properties-of
Repo
Framework

Locally Linear Unsupervised Feature Selection

Title Locally Linear Unsupervised Feature Selection
Authors Guillaume DOQUET, Michèle SEBAG
Abstract The paper, interested in unsupervised feature selection, aims to retain the features best accounting for the local patterns in the data. The proposed approach, called Locally Linear Unsupervised Feature Selection, relies on a dimensionality reduction method to characterize such patterns; each feature is thereafter assessed according to its compliance w.r.t. the local patterns, taking inspiration from Locally Linear Embedding (Roweis and Saul, 2000). The experimental validation of the approach on the scikit-feature benchmark suite demonstrates its effectiveness compared to the state of the art.
Tasks Dimensionality Reduction, Feature Selection
Published 2019-05-01
URL https://openreview.net/forum?id=ByxF-nAqYX
PDF https://openreview.net/pdf?id=ByxF-nAqYX
PWC https://paperswithcode.com/paper/locally-linear-unsupervised-feature-selection
Repo
Framework

Automatic error classification with multiple error labels

Title Automatic error classification with multiple error labels
Authors Maja Popovic, David Vilar
Abstract
Tasks
Published 2019-08-01
URL https://www.aclweb.org/anthology/W19-6609/
PDF https://www.aclweb.org/anthology/W19-6609
PWC https://paperswithcode.com/paper/automatic-error-classification-with-multiple
Repo
Framework

Towards augmenting crisis counselor training by improving message retrieval

Title Towards augmenting crisis counselor training by improving message retrieval
Authors Orianna Demasi, Marti A. Hearst, Benjamin Recht
Abstract A fundamental challenge when training counselors is presenting novices with the opportunity to practice counseling distressed individuals without exacerbating a situation. Rather than replacing human empathy with an automated counselor, we propose simulating an individual in crisis so that human counselors in training can practice crisis counseling in a low-risk environment. Towards this end, we collect a dataset of suicide prevention counselor role-play transcripts and make initial steps towards constructing a CRISISbot for humans to counsel while in training. In this data-constrained setting, we evaluate the potential for message retrieval to construct a coherent chat agent in light of recent advances with text embedding methods. Our results show that embeddings can considerably improve retrieval approaches to make them competitive with generative models. By coherently retrieving messages, we can help counselors practice chatting in a low-risk environment.
Tasks
Published 2019-06-01
URL https://www.aclweb.org/anthology/W19-3001/
PDF https://www.aclweb.org/anthology/W19-3001
PWC https://paperswithcode.com/paper/towards-augmenting-crisis-counselor-training
Repo
Framework

Classifying Information Sources in Arabic Twitter to Support Online Monitoring of Infectious Diseases

Title Classifying Information Sources in Arabic Twitter to Support Online Monitoring of Infectious Diseases
Authors Lama Alsudias, Paul Rayson
Abstract
Tasks
Published 2019-07-01
URL https://www.aclweb.org/anthology/W19-5604/
PDF https://www.aclweb.org/anthology/W19-5604
PWC https://paperswithcode.com/paper/classifying-information-sources-in-arabic
Repo
Framework