April 3, 2020


Paper Group AWR 68


Interactive Natural Language-based Person Search

Title Interactive Natural Language-based Person Search
Authors Vikram Shree, Wei-Lun Chao, Mark Campbell
Abstract In this work, we consider the problem of searching for people in an unconstrained environment using natural language descriptions. Specifically, we study how to systematically design an algorithm to effectively acquire descriptions from humans. We propose an algorithm that adapts existing models for visual and language understanding to search for a person of interest (POI) in a principled way, achieving promising results without the need to design another complicated model. We then investigate an iterative question-answering (QA) strategy that enables robots to request additional information about the POI’s appearance from the user. To this end, we introduce a greedy algorithm to rank questions by their significance, and equip the algorithm with the capability to dynamically adjust the length of human-robot interaction according to the model’s uncertainty. Our approach is validated not only on benchmark datasets but also on a mobile robot moving in a dynamic and crowded environment.
Tasks Person Search, Question Answering
Published 2020-02-19
URL https://arxiv.org/abs/2002.08434v1
PDF https://arxiv.org/pdf/2002.08434v1.pdf
PWC https://paperswithcode.com/paper/interactive-natural-language-based-person
Repo https://github.com/vikshree/QA_PersonSearchLanguageData
Framework none
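
The abstract's greedy question ranking and uncertainty-based stopping can be pictured as an information-gain loop. The sketch below is a generic reconstruction, not the authors' code: `questions` maps each hypothetical question to an answer-likelihood table, the loop asks whichever question most reduces the expected entropy of the belief over candidate people, and it stops once the belief is confident enough.

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def expected_entropy(belief, likelihood):
    """Expected posterior entropy after asking a question.
    likelihood[a, i] = P(answer a | person i is the POI)."""
    h = 0.0
    for lik_a in likelihood:
        p_a = float(lik_a @ belief)          # marginal probability of answer a
        if p_a > 0:
            h += p_a * entropy(lik_a * belief / p_a)
    return h

def interactive_search(belief, questions, ask, h_stop=0.3, max_turns=5):
    """Greedily ask the most informative question; stop when confident."""
    pool = dict(questions)
    for _ in range(max_turns):
        if entropy(belief) < h_stop or not pool:
            break                            # model is certain enough: stop asking
        q = min(pool, key=lambda q: expected_entropy(belief, pool[q]))
        lik = pool.pop(q)[ask(q)]            # user's answer selects a likelihood row
        belief = lik * belief / (lik @ belief)
    return belief
```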

Data Mining in Clinical Trial Text: Transformers for Classification and Question Answering Tasks

Title Data Mining in Clinical Trial Text: Transformers for Classification and Question Answering Tasks
Authors Lena Schmidt, Julie Weeds, Julian P. T. Higgins
Abstract This research on data extraction methods applies recent advances in natural language processing to evidence synthesis based on medical texts. Texts of interest include abstracts of clinical trials in English and in multilingual contexts. The main focus is on information characterized via the Population, Intervention, Comparator, and Outcome (PICO) framework, but data extraction is not limited to these fields. Recent neural network architectures based on transformers show capacities for transfer learning and increased performance on downstream natural language processing tasks such as universal reading comprehension, brought forward by this architecture’s use of contextualized word embeddings and self-attention mechanisms. This paper contributes to solving problems related to ambiguity in PICO sentence prediction tasks, as well as highlighting how annotations for training named entity recognition systems are used to train a high-performing, but nevertheless flexible architecture for question answering in systematic review automation. Additionally, it demonstrates how the problem of insufficient amounts of training annotations for PICO entity extraction is tackled by augmentation. All models in this paper were created with the aim to support systematic review (semi)automation. They achieve high F1 scores, and demonstrate the feasibility of applying transformer-based classification methods to support data mining in the biomedical literature.
Tasks Entity Extraction, Named Entity Recognition, Question Answering, Reading Comprehension, Transfer Learning, Word Embeddings
Published 2020-01-30
URL https://arxiv.org/abs/2001.11268v1
PDF https://arxiv.org/pdf/2001.11268v1.pdf
PWC https://paperswithcode.com/paper/data-mining-in-clinical-trial-text
Repo https://github.com/L-ENA/HealthINF2020
Framework none
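
As a hedged illustration of the QA-based extraction idea, the snippet below runs a generic SQuAD-style question-answering pipeline from the Hugging Face transformers library over a made-up trial sentence. The paper trains its own models on clinical annotations, so the checkpoint, the example text, and the questions here are placeholders.

```python
from transformers import pipeline

# Generic SQuAD-trained checkpoint as a stand-in for the paper's clinical models.
qa = pipeline("question-answering",
              model="distilbert-base-cased-distilled-squad")

abstract = ("We randomly assigned 300 adults with type 2 diabetes to receive "
            "metformin or placebo for 24 weeks; the primary outcome was HbA1c.")

for field, question in [("Population", "Who were the participants?"),
                        ("Intervention", "What was the intervention?"),
                        ("Outcome", "What was the primary outcome?")]:
    ans = qa(question=question, context=abstract)
    print(f"{field}: {ans['answer']} (score {ans['score']:.2f})")
```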

A Comparison of Metric Learning Loss Functions for End-To-End Speaker Verification

Title A Comparison of Metric Learning Loss Functions for End-To-End Speaker Verification
Authors Juan M. Coria, Hervé Bredin, Sahar Ghannay, Sophie Rosset
Abstract Despite the growing popularity of metric learning approaches, very little work has attempted a fair comparison of these techniques for speaker verification. We try to fill this gap and compare several metric learning loss functions in a systematic manner on the VoxCeleb dataset. The first family of loss functions is derived from the cross-entropy loss (usually used for supervised classification) and includes the congenerous cosine loss, the additive angular margin loss, and the center loss. The second family of loss functions focuses on the similarity between training samples and includes the contrastive loss and the triplet loss. We show that the additive angular margin loss function outperforms all other loss functions in the study, while learning more robust representations. Based on a combination of SincNet trainable features and the x-vector architecture, the network used in this paper, combined with the additive angular margin loss, brings us a step closer to a truly end-to-end speaker verification system, while still being competitive with the x-vector baseline. In the spirit of reproducible research, we also release open-source Python code for reproducing our results, and share pretrained PyTorch models on torch.hub that can be used either directly or after fine-tuning.
Tasks Metric Learning, Speaker Verification
Published 2020-03-31
URL https://arxiv.org/abs/2003.14021v1
PDF https://arxiv.org/pdf/2003.14021v1.pdf
PWC https://paperswithcode.com/paper/a-comparison-of-metric-learning-loss
Repo https://github.com/juanmc2005/SpeakerEmbeddingLossComparison
Framework none
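
For the loss the study found strongest, the additive angular margin loss, a compact PyTorch implementation looks roughly like the following; the margin and scale values are illustrative defaults, not the paper's exact settings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdditiveAngularMarginLoss(nn.Module):
    """Additive angular margin (ArcFace-style) softmax loss."""
    def __init__(self, embed_dim, n_speakers, margin=0.2, scale=30.0):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(n_speakers, embed_dim))
        nn.init.xavier_uniform_(self.weight)
        self.m, self.s = margin, scale

    def forward(self, embeddings, labels):
        # Cosine similarity between L2-normalized embeddings and class centers.
        cos = F.linear(F.normalize(embeddings), F.normalize(self.weight))
        theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
        # Add the angular margin only to the target-class logit.
        target = F.one_hot(labels, cos.size(1)).bool()
        logits = torch.where(target, torch.cos(theta + self.m), cos)
        return F.cross_entropy(self.s * logits, labels)
```

During training, the module's weight matrix plays the role of per-speaker class centers; at test time only the L2-normalized embeddings are kept and compared by cosine similarity.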

Bayesian Neural Networks With Maximum Mean Discrepancy Regularization

Title Bayesian Neural Networks With Maximum Mean Discrepancy Regularization
Authors Jary Pomponi, Simone Scardapane, Aurelio Uncini
Abstract Bayesian Neural Networks (BNNs) are trained to optimize an entire distribution over their weights instead of a single set, having significant advantages in terms of, e.g., interpretability, multi-task learning, and calibration. Because of the intractability of the resulting optimization problem, most BNNs are either sampled through Monte Carlo methods, or trained by minimizing a suitable Evidence Lower BOund (ELBO) on a variational approximation. In this paper, we propose a variant of the latter, wherein we replace the Kullback-Leibler divergence in the ELBO term with a Maximum Mean Discrepancy (MMD) estimator, inspired by recent work in variational inference. After motivating our proposal based on the properties of the MMD term, we proceed to show a number of empirical advantages of the proposed formulation over the state-of-the-art. In particular, our BNNs achieve higher accuracy on multiple benchmarks, including several image classification tasks. In addition, they are more robust to the selection of a prior over the weights, and they are better calibrated. As a second contribution, we provide a new formulation for estimating the uncertainty on a given prediction, showing it performs in a more robust fashion against adversarial attacks and the injection of noise over their inputs, compared to more classical criteria such as the differential entropy.
Tasks Calibration, Image Classification, Multi-Task Learning
Published 2020-03-02
URL https://arxiv.org/abs/2003.00952v1
PDF https://arxiv.org/pdf/2003.00952v1.pdf
PWC https://paperswithcode.com/paper/bayesian-neural-networks-with-maximum-mean
Repo https://github.com/ispamm/MMD-Bayesian-Neural-Network
Framework pytorch
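
The core substitution, an MMD term in place of the KL divergence inside the ELBO, can be sketched with a kernel two-sample estimate between samples from the variational posterior and from the prior. This is a generic RBF-kernel V-statistic, not necessarily the authors' exact estimator.

```python
import torch

def rbf(x, y, gamma=1.0):
    return torch.exp(-gamma * torch.cdist(x, y).pow(2))

def mmd2(x, y, gamma=1.0):
    """Biased (V-statistic) estimate of squared MMD between sample sets x, y."""
    return (rbf(x, x, gamma).mean() + rbf(y, y, gamma).mean()
            - 2.0 * rbf(x, y, gamma).mean())

# Sketch of the regularized objective: a data-fit term plus an MMD penalty
# between weight samples drawn from the variational posterior and the prior.
w_post = torch.randn(64, 128, requires_grad=True)   # stand-in posterior samples
w_prior = torch.randn(64, 128)                      # stand-in prior samples
reg = mmd2(w_post, w_prior)
# loss = nll + lam * reg
```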

On Identifying Hashtags in Disaster Twitter Data

Title On Identifying Hashtags in Disaster Twitter Data
Authors Jishnu Ray Chowdhury, Cornelia Caragea, Doina Caragea
Abstract Tweet hashtags have the potential to improve the search for information during disaster events. However, a large number of disaster-related tweets do not have any user-provided hashtags. Moreover, only a small number of tweets that contain actionable hashtags are useful for disaster response. To facilitate progress on automatic identification (or extraction) of disaster hashtags for Twitter data, we construct a unique dataset of disaster-related tweets annotated with hashtags useful for filtering actionable information. Using this dataset, we further investigate Long Short-Term Memory-based models within a Multi-Task Learning framework. The best-performing model achieves an F1 score as high as 92.22%. The dataset, code, and other resources are available on GitHub.
Tasks Multi-Task Learning
Published 2020-01-05
URL https://arxiv.org/abs/2001.01323v1
PDF https://arxiv.org/pdf/2001.01323v1.pdf
PWC https://paperswithcode.com/paper/on-identifying-hashtags-in-disaster-twitter
Repo https://github.com/JRC1995/Tweet-Disaster-Keyphrase
Framework tf
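
A minimal single-task PyTorch sketch of the LSTM-based tagging approach follows; the paper's models are multi-task and its released code is TensorFlow, so this is only illustrative.

```python
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    """BiLSTM tagger that labels each token as hashtag-worthy or not."""
    def __init__(self, vocab_size, embed_dim=100, hidden=128, n_tags=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden, bidirectional=True,
                            batch_first=True)
        self.out = nn.Linear(2 * hidden, n_tags)

    def forward(self, token_ids):
        h, _ = self.lstm(self.embed(token_ids))
        return self.out(h)   # (batch, seq_len, n_tags) logits per token
```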

Tree-Structured Policy based Progressive Reinforcement Learning for Temporally Language Grounding in Video

Title Tree-Structured Policy based Progressive Reinforcement Learning for Temporally Language Grounding in Video
Authors Jie Wu, Guanbin Li, Si Liu, Liang Lin
Abstract Temporal language grounding in untrimmed videos is a newly raised task in video understanding. Most existing methods suffer from inferior efficiency, lack interpretability, and deviate from the human perception mechanism. Inspired by humans’ coarse-to-fine decision-making paradigm, we formulate a novel Tree-Structured Policy based Progressive Reinforcement Learning (TSP-PRL) framework to sequentially regulate the temporal boundary through an iterative refinement process. Semantic concepts are explicitly represented as branches in the policy, which helps decompose complex policies into interpretable primitive actions. Progressive reinforcement learning provides correct credit assignment via two task-oriented rewards that encourage mutual promotion within the tree-structured policy. We extensively evaluate TSP-PRL on the Charades-STA and ActivityNet datasets, and experimental results show that TSP-PRL achieves competitive performance over existing state-of-the-art methods.
Tasks Decision Making, Video Understanding
Published 2020-01-18
URL https://arxiv.org/abs/2001.06680v1
PDF https://arxiv.org/pdf/2001.06680v1.pdf
PWC https://paperswithcode.com/paper/tree-structured-policy-based-progressive
Repo https://github.com/WuJie1010/TSP-PRL
Framework pytorch
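
The iterative boundary refinement can be pictured as a loop over primitive actions on a candidate segment. The action set, step sizes, and `policy` interface below are hypothetical placeholders for the paper's tree-structured policy.

```python
def apply_action(start, end, action, step=0.1, length=1.0):
    """Apply one primitive boundary action; clip to the video extent."""
    if action == "shift_left":
        start, end = start - step, end - step
    elif action == "shift_right":
        start, end = start + step, end + step
    elif action == "expand":
        start, end = start - step / 2, end + step / 2
    elif action == "shrink":
        start, end = start + step / 2, end - step / 2
    start, end = max(0.0, start), min(length, end)
    return (start, end) if start < end else (start, start + 1e-3)

def refine(policy, state, start=0.25, end=0.75, max_steps=10):
    """Iteratively refine the temporal boundary until the policy stops."""
    for _ in range(max_steps):
        action = policy(state, start, end)   # tree policy: pick branch, then primitive
        if action == "stop":
            break
        start, end = apply_action(start, end, action)
    return start, end
```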

Neural Cross-Lingual Transfer and Limited Annotated Data for Named Entity Recognition in Danish

Title Neural Cross-Lingual Transfer and Limited Annotated Data for Named Entity Recognition in Danish
Authors Barbara Plank
Abstract Named Entity Recognition (NER) has greatly advanced by the introduction of deep neural architectures. However, the success of these methods depends on large amounts of training data. The scarcity of publicly-available human-labeled datasets has resulted in limited evaluation of existing NER systems, as is the case for Danish. This paper studies the effectiveness of cross-lingual transfer for Danish, evaluates its complementarity to limited gold data, and sheds light on performance of Danish NER.
Tasks Cross-Lingual Transfer, Named Entity Recognition
Published 2020-03-05
URL https://arxiv.org/abs/2003.02931v1
PDF https://arxiv.org/pdf/2003.02931v1.pdf
PWC https://paperswithcode.com/paper/neural-cross-lingual-transfer-and-limited
Repo https://github.com/bplank/danish_ner_transfer
Framework none

PhoBERT: Pre-trained language models for Vietnamese

Title PhoBERT: Pre-trained language models for Vietnamese
Authors Dat Quoc Nguyen, Anh Tuan Nguyen
Abstract We present PhoBERT in two versions, “base” and “large”: the first public large-scale monolingual language models pre-trained for Vietnamese. We show that PhoBERT improves the state of the art on multiple Vietnamese-specific NLP tasks, including part-of-speech tagging, named-entity recognition, and natural language inference. We release PhoBERT to facilitate future research and downstream applications for Vietnamese NLP. PhoBERT is released at: https://github.com/VinAIResearch/PhoBERT
Tasks Named Entity Recognition, Natural Language Inference, Part-Of-Speech Tagging
Published 2020-03-02
URL https://arxiv.org/abs/2003.00744v1
PDF https://arxiv.org/pdf/2003.00744v1.pdf
PWC https://paperswithcode.com/paper/phobert-pre-trained-language-models-for
Repo https://github.com/VinAIResearch/PhoBERT
Framework pytorch
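
Following the usage documented in the PhoBERT repository, the model loads directly through the transformers library; note that the input text is expected to be word-segmented beforehand (e.g. with RDRSegmenter/VnCoreNLP).

```python
import torch
from transformers import AutoModel, AutoTokenizer

phobert = AutoModel.from_pretrained("vinai/phobert-base")
tokenizer = AutoTokenizer.from_pretrained("vinai/phobert-base")

# PhoBERT expects word-segmented Vietnamese input.
line = "Tôi là sinh_viên trường đại_học Công_nghệ ."
input_ids = torch.tensor([tokenizer.encode(line)])
with torch.no_grad():
    features = phobert(input_ids).last_hidden_state  # contextualized embeddings
```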

Parameter Space Factorization for Zero-Shot Learning across Tasks and Languages

Title Parameter Space Factorization for Zero-Shot Learning across Tasks and Languages
Authors Edoardo M. Ponti, Ivan Vulić, Ryan Cotterell, Marinela Parovic, Roi Reichart, Anna Korhonen
Abstract Most combinations of NLP tasks and language varieties lack in-domain examples for supervised training because of the paucity of annotated data. How can neural models make sample-efficient generalizations from task-language combinations with available data to low-resource ones? In this work, we propose a Bayesian generative model for the space of neural parameters. We assume that this space can be factorized into latent variables for each language and each task. We infer the posteriors over such latent variables based on data from seen task-language combinations through variational inference. This enables zero-shot classification on unseen combinations at prediction time. For instance, given training data for named entity recognition (NER) in Vietnamese and for part-of-speech (POS) tagging in Wolof, our model can perform accurate predictions for NER in Wolof. In particular, we experiment with a typologically diverse sample of 33 languages from 4 continents and 11 families, and show that our model yields comparable or better results than state-of-the-art, zero-shot cross-lingual transfer methods; it increases performance by 4.49 points for POS tagging and 7.73 points for NER on average compared to the strongest baseline.
Tasks Cross-Lingual Transfer, Named Entity Recognition, Part-Of-Speech Tagging, Zero-Shot Learning
Published 2020-01-30
URL https://arxiv.org/abs/2001.11453v1
PDF https://arxiv.org/pdf/2001.11453v1.pdf
PWC https://paperswithcode.com/paper/parameter-space-factorization-for-zero-shot
Repo https://github.com/cambridgeltl/parameter-factorization
Framework pytorch
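
In rough terms, the factorization says that a task head's parameters are generated from a task latent and a language latent, so unseen task-language pairs can be composed at prediction time. The point-estimate sketch below captures only that structural idea; all sizes are made up, and the paper instead places priors on the latents and infers posteriors by variational inference.

```python
import torch
import torch.nn as nn

class FactorizedHead(nn.Module):
    """Generate a classifier head from task and language latents (sketch)."""
    def __init__(self, n_tasks, n_langs, latent=32, feat_dim=768, n_classes=10):
        super().__init__()
        self.z_task = nn.Embedding(n_tasks, latent)
        self.z_lang = nn.Embedding(n_langs, latent)
        self.gen = nn.Linear(2 * latent, feat_dim * n_classes + n_classes)
        self.feat_dim, self.n_classes = feat_dim, n_classes

    def forward(self, feats, task_id, lang_id):
        z = torch.cat([self.z_task(task_id), self.z_lang(lang_id)], dim=-1)
        params = self.gen(z)
        W = params[: self.feat_dim * self.n_classes].view(self.n_classes,
                                                          self.feat_dim)
        b = params[self.feat_dim * self.n_classes:]
        return feats @ W.t() + b

# Zero-shot composition: e.g. NER (task 0) in a language seen only for POS.
head = FactorizedHead(n_tasks=2, n_langs=33)
logits = head(torch.randn(4, 768), torch.tensor(0), torch.tensor(5))
```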

Learning Delicate Local Representations for Multi-Person Pose Estimation

Title Learning Delicate Local Representations for Multi-Person Pose Estimation
Authors Yuanhao Cai, Zhicheng Wang, Zhengxiong Luo, Binyi Yin, Angang Du, Haoqian Wang, Xinyu Zhou, Erjin Zhou, Xiangyu Zhang, Jian Sun
Abstract In this paper, we propose a novel method called Residual Steps Network (RSN). RSN efficiently aggregates features with the same spatial size (intra-level features) to obtain delicate local representations, which retain rich low-level spatial information and result in precise keypoint localization. In addition, we propose an efficient attention mechanism, the Pose Refine Machine (PRM), to further refine the keypoint locations. Our approach won 1st place in the COCO Keypoint Challenge 2019 and achieves state-of-the-art results on both the COCO and MPII benchmarks, without extra training data or pretrained models. Our single model achieves 78.6 on COCO test-dev and 93.0 on the MPII test set. Ensembled models achieve 79.2 on COCO test-dev and 77.1 on the COCO test-challenge set. The source code is publicly available for further research at https://github.com/caiyuanhao1998/RSN
Tasks Multi-Person Pose Estimation, Pose Estimation
Published 2020-03-09
URL https://arxiv.org/abs/2003.04030v2
PDF https://arxiv.org/pdf/2003.04030v2.pdf
PWC https://paperswithcode.com/paper/learning-delicate-local-representations-for
Repo https://github.com/caiyuanhao1998/RSN
Framework pytorch
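
The "residual steps" idea, aggregating same-resolution (intra-level) features so that each branch reuses the previous branch's output, can be sketched with a Res2Net-like block. This is a loose approximation, not the exact RSN unit; consult the official repo for the real architecture.

```python
import torch
import torch.nn as nn

class ResidualStepsBlock(nn.Module):
    """Intra-level aggregation sketch: split channels into branches and
    feed each branch's output into the next one (Res2Net-like)."""
    def __init__(self, channels, branches=4):
        super().__init__()
        assert channels % branches == 0
        w = channels // branches
        self.branches = branches
        self.convs = nn.ModuleList(
            nn.Conv2d(w, w, 3, padding=1) for _ in range(branches))

    def forward(self, x):
        chunks = x.chunk(self.branches, dim=1)
        outs, prev = [], 0
        for conv, c in zip(self.convs, chunks):
            prev = conv(c + prev)            # residual step: reuse previous branch
            outs.append(prev)
        return torch.cat(outs, dim=1) + x    # identity shortcut
```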

MoVi: A Large Multipurpose Motion and Video Dataset

Title MoVi: A Large Multipurpose Motion and Video Dataset
Authors Saeed Ghorbani, Kimia Mahdaviani, Anne Thaler, Konrad Kording, Douglas James Cook, Gunnar Blohm, Nikolaus F. Troje
Abstract Human movements are both an area of intense study and the basis of many applications such as character animation. For many applications, it is crucial to identify movements from videos or analyze datasets of movements. Here we introduce MoVi, a new human Motion and Video dataset, which we make publicly available. It contains 60 female and 30 male actors performing a collection of 20 predefined everyday actions and sports movements, and one self-chosen movement. In five capture rounds, the same actors and movements were recorded using different hardware systems, including an optical motion capture system, video cameras, and inertial measurement units (IMUs). For some of the capture rounds the actors wore natural clothing; for the others they wore minimal clothing. In total, our dataset contains 9 hours of motion capture data, 17 hours of video data from 4 different points of view (including one hand-held camera), and 6.6 hours of IMU data. In this paper, we describe how the dataset was collected and post-processed; we present state-of-the-art estimates of skeletal motions and full-body shape deformations associated with skeletal motion; and we discuss examples of potential studies this dataset could enable.
Tasks Motion Capture
Published 2020-03-04
URL https://arxiv.org/abs/2003.01888v1
PDF https://arxiv.org/pdf/2003.01888v1.pdf
PWC https://paperswithcode.com/paper/movi-a-large-multipurpose-motion-and-video
Repo https://github.com/saeed1262/MoVi-Toolbox
Framework none

VegasFlow: accelerating Monte Carlo simulation across multiple hardware platforms

Title VegasFlow: accelerating Monte Carlo simulation across multiple hardware platforms
Authors Stefano Carrazza, Juan M. Cruz-Martinez
Abstract We present VegasFlow, a new software package for fast evaluation of high-dimensional integrals based on Monte Carlo integration techniques, designed for platforms with hardware accelerators. The growing complexity of calculations and simulations in many areas of science has been accompanied by advances in the computational tools that have helped their development. VegasFlow enables developers to delegate all complicated aspects of hardware or platform implementation to the library, so they can focus on the problem at hand. The software is inspired by the Vegas algorithm, ubiquitous in the particle physics community as the driver of cross-section integration, and is based on Google’s powerful TensorFlow library. We benchmark the performance of this library on many different consumer- and professional-grade GPUs and CPUs.
Tasks
Published 2020-02-28
URL https://arxiv.org/abs/2002.12921v1
PDF https://arxiv.org/pdf/2002.12921v1.pdf
PWC https://paperswithcode.com/paper/vegasflow-accelerating-monte-carlo-simulation
Repo https://github.com/N3PDF/vegasflow
Framework tf
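
To make the underlying idea concrete, here is a plain (non-adaptive) Monte Carlo integration sketch in NumPy. Vegas improves on this by iteratively adapting the sampling grid to the integrand; this is not the VegasFlow API itself, which is documented in the repo and runs such integrals on accelerators via TensorFlow.

```python
import numpy as np

def mc_integrate(f, dims, n_calls=100_000, seed=0):
    """Plain Monte Carlo estimate of the integral of f over the unit hypercube.
    Vegas refines this by importance-sampling from an adaptive grid."""
    rng = np.random.default_rng(seed)
    x = rng.random((n_calls, dims))
    y = f(x)
    return y.mean(), y.std() / np.sqrt(n_calls)

# Example: the integral of sum_i x_i^2 over [0, 1]^4 is 4/3.
est, err = mc_integrate(lambda x: (x ** 2).sum(axis=1), dims=4)
print(f"{est:.4f} +/- {err:.4f}")
```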

Gaining a Sense of Touch. Physical Parameters Estimation using a Soft Gripper and Neural Networks

Title Gaining a Sense of Touch. Physical Parameters Estimation using a Soft Gripper and Neural Networks
Authors Michał Bednarek, Piotr Kicki, Jakub Bednarek, Krzysztof Walas
Abstract Soft grippers are gaining significant attention for the manipulation of elastic objects, where it is necessary to handle soft, unstructured objects that are vulnerable to deformation. A crucial problem is to estimate the physical parameters of a squeezed object in order to adjust the manipulation procedure, which is considered a significant challenge. To the best of the authors’ knowledge, there is not enough research on physical parameter estimation using deep learning algorithms on measurements from direct interaction with objects using robotic grippers. In our work, we propose a trainable system for the regression of a stiffness coefficient, and we provide extensive experiments using a physics simulator environment. Moreover, we prepared an application that works in a real-world scenario. Our system can reliably estimate the stiffness of an object using the Yale OpenHand soft gripper, based on readings from Inertial Measurement Units (IMUs) attached to its fingers. Additionally, during the experiments we prepared three datasets of signals gathered while squeezing objects: two created in the simulation environment and one composed of real data.
Tasks
Published 2020-03-02
URL https://arxiv.org/abs/2003.00784v2
PDF https://arxiv.org/pdf/2003.00784v2.pdf
PWC https://paperswithcode.com/paper/gaining-a-sense-of-touch-physical-parameters
Repo https://github.com/mbed92/soft-grip
Framework tf
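
A minimal sketch of the regression setup, a 1D CNN mapping multi-channel IMU time series to a stiffness coefficient, is shown below in PyTorch; the channel count and architecture are assumptions, and the authors' released code uses TensorFlow.

```python
import torch
import torch.nn as nn

class StiffnessRegressor(nn.Module):
    """1D CNN mapping IMU time series to a scalar stiffness (illustrative)."""
    def __init__(self, n_channels=12, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_channels, hidden, kernel_size=7, padding=3), nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=7, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(hidden, 1),
        )

    def forward(self, imu):                  # imu: (batch, channels, time)
        return self.net(imu).squeeze(-1)     # predicted stiffness coefficient
```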

Towards Detection of Subjective Bias using Contextualized Word Embeddings

Title Towards Detection of Subjective Bias using Contextualized Word Embeddings
Authors Tanvi Dadu, Kartikey Pant, Radhika Mamidi
Abstract Subjective bias detection is critical for applications like propaganda detection, content recommendation, sentiment analysis, and bias neutralization. This bias is introduced in natural language via inflammatory words and phrases, casting doubt over facts, and presupposing the truth. In this work, we perform comprehensive experiments for detecting subjective bias using BERT-based models on the Wiki Neutrality Corpus (WNC). The dataset consists of $360k$ labeled instances drawn from Wikipedia edits that remove various instances of bias. We further propose BERT-based ensembles that outperform state-of-the-art methods like $BERT_{large}$ by a margin of $5.6$ F1 points.
Tasks Sentiment Analysis, Word Embeddings
Published 2020-02-16
URL https://arxiv.org/abs/2002.06644v1
PDF https://arxiv.org/pdf/2002.06644v1.pdf
PWC https://paperswithcode.com/paper/towards-detection-of-subjective-bias-using
Repo https://github.com/tanvidadu/Subjective-Bias-Detection
Framework none
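
A single fine-tuning step for this binary task looks like the following with the transformers library; the checkpoint name and example sentences are illustrative, and the paper's best results come from BERT-large-based ensembles rather than this bare BERT-base setup.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

name = "bert-base-uncased"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)

batch = tok(["The senator's disastrous policy failed.",
             "The senator's policy was not adopted."],
            padding=True, return_tensors="pt")
labels = torch.tensor([1, 0])    # 1 = subjectively biased, 0 = neutral

loss = model(**batch, labels=labels).loss   # fine-tune by minimizing this loss
loss.backward()
```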

Lesion Harvester: Iteratively Mining Unlabeled Lesions and Hard-Negative Examples at Scale

Title Lesion Harvester: Iteratively Mining Unlabeled Lesions and Hard-Negative Examples at Scale
Authors Jinzheng Cai, Adam P. Harrison, Youjing Zheng, Ke Yan, Yuankai Huo, Jing Xiao, Lin Yang, Le Lu
Abstract Acquiring the large-scale medical image data necessary for training machine learning algorithms is frequently intractable due to prohibitive expert-driven annotation costs. Recent datasets extracted from hospital archives, e.g., DeepLesion, have begun to address this problem. However, these are often incompletely or noisily labeled, e.g., DeepLesion leaves over 50% of its lesions unlabeled. Thus, effective methods to harvest missing annotations are critical for continued progress in medical image analysis. This is the goal of our work, where we develop a powerful system to harvest missing lesions from the DeepLesion dataset at high precision. Accepting the need for some degree of expert labor to achieve high fidelity, we exploit a small fully-labeled subset of medical image volumes and use it to intelligently mine annotations from the remainder. To do this, we chain together a highly sensitive lesion proposal generator and a very selective lesion proposal classifier. While our framework is generic, we optimize our performance by proposing a 3D contextual lesion proposal generator and by using a multi-view multi-scale lesion proposal classifier. These produce harvested and hard-negative proposals, which we then re-use to finetune our proposal generator with a novel hard-negative suppression loss, continuing this process until no extra lesions are found. Extensive experimental analysis demonstrates that our method can harvest an additional 9,805 lesions while keeping precision above 90%. To demonstrate the benefits of our approach, we show that lesion detectors trained on our harvested lesions can significantly outperform the same variants trained only on the original annotations, with a boost in average precision of 7% to 10%. We open-source our annotations at https://github.com/JimmyCai91/DeepLesionAnnotation.
Tasks
Published 2020-01-21
URL https://arxiv.org/abs/2001.07776v2
PDF https://arxiv.org/pdf/2001.07776v2.pdf
PWC https://paperswithcode.com/paper/lesion-harvester-iteratively-mining-unlabeled
Repo https://github.com/JimmyCai91/DeepLesionAnnotation
Framework none
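
The iterative mining procedure reads naturally as a harvest-then-finetune cycle. The skeleton below uses stub objects for the proposal generator and classifier, and the thresholds are placeholders; the paper's actual components are a 3D contextual proposal generator, a multi-view multi-scale classifier, and a hard-negative suppression loss.

```python
def harvest(generator, classifier, unlabeled_volumes,
            t_pos=0.9, t_neg=0.1, max_rounds=5):
    """Iteratively mine lesions and hard negatives (skeleton, stub objects)."""
    harvested, hard_negatives = [], []
    for _ in range(max_rounds):
        new_lesions = []
        for vol in unlabeled_volumes:
            for prop in generator.propose(vol):    # high-sensitivity proposals
                p = classifier.score(vol, prop)    # high-precision filtering
                if p >= t_pos:
                    new_lesions.append(prop)
                elif p <= t_neg:
                    hard_negatives.append(prop)
        if not new_lesions:                        # stop when nothing new is found
            break
        harvested += new_lesions
        generator.finetune(harvested, hard_negatives)  # hard-negative suppression
    return harvested, hard_negatives
```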