Paper Group AWR 366
Robust sound event detection in bioacoustic sensor networks
Title | Robust sound event detection in bioacoustic sensor networks |
Authors | Vincent Lostanlen, Justin Salamon, Andrew Farnsworth, Steve Kelling, Juan Pablo Bello |
Abstract | Bioacoustic sensors, sometimes known as autonomous recording units (ARUs), can record sounds of wildlife over long periods of time in scalable and minimally invasive ways. Deriving per-species abundance estimates from these sensors requires detection, classification, and quantification of animal vocalizations as individual acoustic events. Yet, variability in ambient noise, both over time and across sensors, hinders the reliability of current automated systems for sound event detection (SED), such as convolutional neural networks (CNN) in the time-frequency domain. In this article, we develop, benchmark, and combine several machine listening techniques to improve the generalizability of SED models across heterogeneous acoustic environments. As a case study, we consider the problem of detecting avian flight calls from a ten-hour recording of nocturnal bird migration, recorded by a network of six ARUs in the presence of heterogeneous background noise. Starting from a CNN yielding state-of-the-art accuracy on this task, we introduce two noise adaptation techniques, respectively integrating short-term (60 milliseconds) and long-term (30 minutes) context. First, we apply per-channel energy normalization (PCEN) in the time-frequency domain, which applies short-term automatic gain control to every subband in the mel-frequency spectrogram. Secondly, we replace the last dense layer in the network by a context-adaptive neural network (CA-NN) layer. Combining them yields state-of-the-art results that are unmatched by artificial data augmentation alone. We release a pre-trained version of our best performing system under the name of BirdVoxDetect, a ready-to-use detector of avian flight calls in field recordings. |
Tasks | Data Augmentation, Sound Event Detection |
Published | 2019-05-20 |
URL | https://arxiv.org/abs/1905.08352v2 |
PDF | https://arxiv.org/pdf/1905.08352v2.pdf |
PWC | https://paperswithcode.com/paper/robust-sound-event-detection-in-bioacoustic |
Repo | https://github.com/BirdVox/birdvoxdetect |
Framework | tf |
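
The short-term adaptation described above, PCEN, has a closed-form definition (Wang et al., 2017): smooth the mel-spectrogram energy with a first-order IIR filter, divide each subband by the smoothed estimate raised to a gain exponent, then apply root compression. Below is a minimal NumPy sketch with illustrative parameter defaults; a maintained implementation is available as `librosa.pcen`.

```python
import numpy as np

def pcen(E, s=0.025, alpha=0.98, delta=2.0, r=0.5, eps=1e-6):
    """Per-channel energy normalization of a mel spectrogram E with
    shape (n_mels, n_frames). Parameter defaults are illustrative."""
    E = np.asarray(E, dtype=np.float64)
    M = np.empty_like(E)
    M[:, 0] = E[:, 0]
    for t in range(1, E.shape[1]):
        # First-order IIR low-pass filter: a smoothed estimate of each
        # subband's energy over the recent past, to be divided out.
        M[:, t] = (1.0 - s) * M[:, t - 1] + s * E[:, t]
    # Adaptive gain control followed by root compression.
    return (E / (eps + M) ** alpha + delta) ** r - delta ** r
```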
Learning Metrics from Teachers: Compact Networks for Image Embedding
Title | Learning Metrics from Teachers: Compact Networks for Image Embedding |
Authors | Lu Yu, Vacit Oguz Yazici, Xialei Liu, Joost van de Weijer, Yongmei Cheng, Arnau Ramisa |
Abstract | Metric learning networks are used to compute image embeddings, which are widely used in many applications such as image retrieval and face recognition. In this paper, we propose to use network distillation to efficiently compute image embeddings with small networks. Network distillation has been successfully applied to improve image classification, but has hardly been explored for metric learning. To this end, we propose two new loss functions that model the communication from a deep teacher network to a small student network. We evaluate our system on several datasets, including CUB-200-2011, Cars-196, and Stanford Online Products, and show that embeddings computed using small student networks perform significantly better than those computed using standard networks of similar size. Results on a very compact network (MobileNet-0.25), which can be used on mobile devices, show that the proposed method can greatly improve Recall@1 results, from 27.5% to 44.6%. Furthermore, we investigate various aspects of distillation for embeddings, including hint and attention layers, semi-supervised learning, and cross-quality distillation. (Code is available at https://github.com/yulu0724/EmbeddingDistillation.) |
Tasks | Face Recognition, Image Classification, Image Retrieval, Metric Learning |
Published | 2019-04-07 |
URL | http://arxiv.org/abs/1904.03624v1 |
PDF | http://arxiv.org/pdf/1904.03624v1.pdf |
PWC | https://paperswithcode.com/paper/learning-metrics-from-teachers-compact |
Repo | https://github.com/yulu0724/EmbeddingDistillation |
Framework | pytorch |
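
The two teacher-to-student losses are the core of the method. As a hedged sketch (the paper's exact formulations may differ), two natural instantiations in PyTorch are an absolute loss, which pulls each student embedding toward the teacher's embedding of the same image, and a relative loss, which only asks the student to reproduce the teacher's pairwise distance structure:

```python
import torch
import torch.nn.functional as F

def absolute_distillation_loss(student_emb, teacher_emb):
    # Match the teacher's embedding coordinates directly, per image.
    return F.mse_loss(student_emb, teacher_emb)

def relative_distillation_loss(student_emb, teacher_emb):
    # Match pairwise distances within the batch instead of raw
    # coordinates, which tolerates isometries of the embedding space.
    d_student = torch.cdist(student_emb, student_emb)
    d_teacher = torch.cdist(teacher_emb, teacher_emb)
    return F.mse_loss(d_student, d_teacher)
```

During training the teacher is frozen and only the student receives gradients, e.g. `loss = relative_distillation_loss(student(x), teacher(x).detach())`.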
SciBERT: A Pretrained Language Model for Scientific Text
Title | SciBERT: A Pretrained Language Model for Scientific Text |
Authors | Iz Beltagy, Kyle Lo, Arman Cohan |
Abstract | Obtaining large-scale annotated data for NLP tasks in the scientific domain is challenging and expensive. We release SciBERT, a pretrained language model based on BERT (Devlin et al., 2018) to address the lack of high-quality, large-scale labeled scientific data. SciBERT leverages unsupervised pretraining on a large multi-domain corpus of scientific publications to improve performance on downstream scientific NLP tasks. We evaluate on a suite of tasks including sequence tagging, sentence classification and dependency parsing, with datasets from a variety of scientific domains. We demonstrate statistically significant improvements over BERT and achieve new state-of-the-art results on several of these tasks. The code and pretrained models are available at https://github.com/allenai/scibert/. |
Tasks | Citation Intent Classification, Dependency Parsing, Language Modelling, Medical Named Entity Recognition, Named Entity Recognition, Participant Intervention Comparison Outcome Extraction, Relation Extraction, Sentence Classification |
Published | 2019-03-26 |
URL | https://arxiv.org/abs/1903.10676v3 |
PDF | https://arxiv.org/pdf/1903.10676v3.pdf |
PWC | https://paperswithcode.com/paper/scibert-pretrained-contextualized-embeddings |
Repo | https://github.com/allenai/scibert |
Framework | pytorch |
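
The released checkpoints are straightforward to load through the Hugging Face `transformers` library; `allenai/scibert_scivocab_uncased` is one of the model IDs the authors published:

```python
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("allenai/scibert_scivocab_uncased")
model = AutoModel.from_pretrained("allenai/scibert_scivocab_uncased")

# Encode a scientific sentence and inspect the contextual embeddings.
inputs = tokenizer("The tyrosine kinase inhibitor reduced tumor growth.",
                   return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, num_tokens, 768)
```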
Noisy Supervision for Correcting Misaligned Cadaster Maps Without Perfect Ground Truth Data
Title | Noisy Supervision for Correcting Misaligned Cadaster Maps Without Perfect Ground Truth Data |
Authors | Nicolas Girard, Guillaume Charpiat, Yuliya Tarabalka |
Abstract | In machine learning, the best performance on a given task is achieved by fully supervised methods when perfect ground-truth labels are available. However, labels are often noisy, especially in remote sensing, where manually curated public datasets are rare. We study the multi-modal cadaster map alignment problem, for which the available annotations are misaligned polygons, resulting in noisy supervision. We set up a multiple-rounds training scheme that corrects the ground-truth annotations at each round, in order to better train the model at the next round. We show that it is possible to reduce the noise of the dataset by iteratively training a better alignment model to correct the annotation alignment. |
Tasks | |
Published | 2019-03-12 |
URL | http://arxiv.org/abs/1903.06529v1 |
PDF | http://arxiv.org/pdf/1903.06529v1.pdf |
PWC | https://paperswithcode.com/paper/noisy-supervision-for-correcting-misaligned |
Repo | https://github.com/Lydorn/mapalignment |
Framework | tf |
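
The multiple-rounds scheme can be summarized in a few lines. This is a sketch only: `AlignmentModel`, `train`, and `align` are hypothetical placeholders standing in for the repo's actual training and inference code.

```python
def multi_round_alignment(images, noisy_polygons, n_rounds=3):
    annotations = noisy_polygons
    model = None
    for _ in range(n_rounds):
        # Train on the current (progressively less noisy) annotations.
        model = train(AlignmentModel(), images, annotations)
        # Snap each polygon onto the imagery with the trained model,
        # reducing annotation noise before the next round.
        annotations = [align(model, img, poly)
                       for img, poly in zip(images, annotations)]
    return model, annotations
```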
Learning to synthesise the ageing brain without longitudinal data
Title | Learning to synthesise the ageing brain without longitudinal data |
Authors | Tian Xia, Agisilaos Chartsias, Chengjia Wang, Sotirios A. Tsaftaris |
Abstract | Brain ageing is a continuous process that is affected by many factors including neurodegenerative diseases. Understanding this process is of great value for both neuroscience research and clinical applications. However, revealing underlying mechanisms is challenging due to the lack of longitudinal data. In this paper, we propose a deep learning-based method that learns to simulate subject-specific brain ageing trajectories without relying on longitudinal data. Our method synthesises aged images using a network conditioned on two clinical variables: age as a continuous variable, and health state, i.e. status of Alzheimer’s Disease (AD) for this work, as an ordinal variable. We adopt an adversarial loss to learn the joint distribution of brain appearance and clinical variables and define reconstruction losses that help preserve subject identity. To demonstrate our model, we compare with several approaches using two widely used datasets: Cam-CAN and ADNI. We use ground-truth longitudinal data from ADNI to evaluate the quality of synthesised images. A pre-trained age predictor, which estimates the apparent age of a brain image, is used to assess age accuracy. In addition, we show that we can train the model on Cam-CAN data and evaluate on the longitudinal data from ADNI, indicating the generalisation power of our approach. Both qualitative and quantitative results show that our method can progressively simulate the ageing process by synthesising realistic brain images. The code will be made publicly available at: https://github.com/xiat0616/BrainAgeing. |
Tasks | |
Published | 2019-12-04 |
URL | https://arxiv.org/abs/1912.02620v2 |
PDF | https://arxiv.org/pdf/1912.02620v2.pdf |
PWC | https://paperswithcode.com/paper/learning-to-synthesise-the-ageing-brain |
Repo | https://github.com/xiat0616/BrainAgeing |
Framework | none |
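
One simple way to condition a convolutional generator on the two clinical variables is to tile them into extra input channels. A minimal PyTorch sketch under that assumption; the paper's actual encoding (for instance an ordinal vector for age) may differ:

```python
import torch

def condition_on_clinical_variables(brain, target_age, has_ad, max_age=100.0):
    """brain: (B, 1, H, W) image batch; target_age: (B,) in years;
    has_ad: (B,) Alzheimer's status. Returns the generator input."""
    b, _, h, w = brain.shape
    age_map = (target_age.float() / max_age).view(b, 1, 1, 1).expand(b, 1, h, w)
    ad_map = has_ad.float().view(b, 1, 1, 1).expand(b, 1, h, w)
    # Concatenate the tiled clinical variables as additional channels.
    return torch.cat([brain, age_map, ad_map], dim=1)
```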
AdaCoF: Adaptive Collaboration of Flows for Video Frame Interpolation
Title | AdaCoF: Adaptive Collaboration of Flows for Video Frame Interpolation |
Authors | Hyeongmin Lee, Taeoh Kim, Tae-young Chung, Daehyun Pak, Yuseok Ban, Sangyoun Lee |
Abstract | Video frame interpolation is one of the most challenging tasks in video processing research. Recently, many deep learning-based methods have been proposed. Most of these methods focus on finding locations with useful information to estimate each output pixel using their own frame warping operations. However, many of them have Degrees of Freedom (DoF) limitations and fail to deal with the complex motions found in real-world videos. To solve this problem, we propose a new warping module named Adaptive Collaboration of Flows (AdaCoF). Our method estimates both kernel weights and offset vectors for each target pixel to synthesize the output frame. AdaCoF is among the most general warping modules, covering most other approaches as special cases, so it can handle a significantly wider domain of complex motions. To further improve our framework and synthesize more realistic outputs, we introduce a dual-frame adversarial loss that is applicable only to video frame interpolation tasks. The experimental results show that our method outperforms the state-of-the-art methods for both fixed training set environments and the Middlebury benchmark. |
Tasks | Video Frame Interpolation |
Published | 2019-07-24 |
URL | https://arxiv.org/abs/1907.10244v3 |
PDF | https://arxiv.org/pdf/1907.10244v3.pdf |
PWC | https://paperswithcode.com/paper/learning-spatial-transform-for-video-frame |
Repo | https://github.com/HyeongminLEE/AdaCoF-pytorch |
Framework | pytorch |
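
The AdaCoF operation synthesizes each output pixel as a weighted sum of F input samples, where both the kernel weights and the offset vectors are predicted per pixel. A naive NumPy reference of the warping (with nearest-neighbor rounding in place of the paper's differentiable bilinear sampling):

```python
import numpy as np

def adacof_warp(frame, weights, dy, dx):
    """frame: (H, W); weights, dy, dx: (F, H, W). Each output pixel is
    sum_k weights[k, i, j] * frame[i + dy[k, i, j], j + dx[k, i, j]]."""
    H, W = frame.shape
    out = np.zeros((H, W), dtype=np.float64)
    ys, xs = np.mgrid[0:H, 0:W]
    for k in range(weights.shape[0]):
        yy = np.clip(ys + np.rint(dy[k]).astype(int), 0, H - 1)
        xx = np.clip(xs + np.rint(dx[k]).astype(int), 0, W - 1)
        out += weights[k] * frame[yy, xx]
    return out
```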
NAIL: A General Interactive Fiction Agent
Title | NAIL: A General Interactive Fiction Agent |
Authors | Matthew Hausknecht, Ricky Loynd, Greg Yang, Adith Swaminathan, Jason D. Williams |
Abstract | Interactive Fiction (IF) games are complex textual decision making problems. This paper introduces NAIL, an autonomous agent for general parser-based IF games. NAIL won the 2018 Text Adventure AI Competition, where it was evaluated on twenty unseen games. This paper describes the architecture, development, and insights underpinning NAIL’s performance. |
Tasks | Decision Making |
Published | 2019-02-12 |
URL | http://arxiv.org/abs/1902.04259v2 |
PDF | http://arxiv.org/pdf/1902.04259v2.pdf |
PWC | https://paperswithcode.com/paper/nail-a-general-interactive-fiction-agent |
Repo | https://github.com/Microsoft/nail_agent |
Framework | none |
BERT with History Answer Embedding for Conversational Question Answering
Title | BERT with History Answer Embedding for Conversational Question Answering |
Authors | Chen Qu, Liu Yang, Minghui Qiu, W. Bruce Croft, Yongfeng Zhang, Mohit Iyyer |
Abstract | Conversational search is an emerging topic in the information retrieval community. One of the major challenges to multi-turn conversational search is to model the conversation history to answer the current question. Existing methods either prepend history turns to the current question or use complicated attention mechanisms to model the history. We propose a conceptually simple yet highly effective approach referred to as history answer embedding. It enables seamless integration of conversation history into a conversational question answering (ConvQA) model built on BERT (Bidirectional Encoder Representations from Transformers). We first explain our view that ConvQA is a simplified but concrete setting of conversational search, and then we provide a general framework to solve ConvQA. We further demonstrate the effectiveness of our approach under this framework. Finally, we analyze the impact of different numbers of history turns under different settings to provide new insights into conversation history modeling in ConvQA. |
Tasks | Information Retrieval, Question Answering |
Published | 2019-05-14 |
URL | https://arxiv.org/abs/1905.05412v2 |
PDF | https://arxiv.org/pdf/1905.05412v2.pdf |
PWC | https://paperswithcode.com/paper/bert-with-history-answer-embedding-for |
Repo | https://github.com/prdwb/bert_hae |
Framework | tf |
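
The history answer embedding itself is conceptually tiny: a two-entry lookup table marking whether a token occurred in a history answer, added to BERT's input embeddings. A minimal PyTorch sketch of that idea (not the authors' exact module):

```python
import torch
import torch.nn as nn

class HistoryAnswerEmbedding(nn.Module):
    def __init__(self, hidden_size=768):
        super().__init__()
        # Index 0: token not in any history answer; index 1: it is.
        self.hae = nn.Embedding(2, hidden_size)

    def forward(self, token_embeddings, hae_ids):
        # token_embeddings: (batch, seq_len, hidden)
        # hae_ids:          (batch, seq_len) with values in {0, 1}
        return token_embeddings + self.hae(hae_ids)
```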
Understanding and Utilizing Deep Neural Networks Trained with Noisy Labels
Title | Understanding and Utilizing Deep Neural Networks Trained with Noisy Labels |
Authors | Pengfei Chen, Benben Liao, Guangyong Chen, Shengyu Zhang |
Abstract | Noisy labels are ubiquitous in real-world datasets, which poses a challenge for robustly training deep neural networks (DNNs), as DNNs usually have enough capacity to memorize noisy labels. In this paper, we find that test accuracy can be quantitatively characterized in terms of a dataset's noise ratio. In particular, test accuracy is a quadratic function of the noise ratio in the case of symmetric noise, which explains previously published experimental findings. Based on our analysis, we apply cross-validation to randomly split noisy datasets, which identifies most samples that have correct labels. Then we adopt the Co-teaching strategy, which takes full advantage of the identified samples to train DNNs robustly against noisy labels. Compared with an extensive set of state-of-the-art methods, our strategy consistently improves the generalization performance of DNNs under both synthetic and real-world training noise. |
Tasks | |
Published | 2019-05-13 |
URL | https://arxiv.org/abs/1905.05040v1 |
PDF | https://arxiv.org/pdf/1905.05040v1.pdf |
PWC | https://paperswithcode.com/paper/understanding-and-utilizing-deep-neural |
Repo | https://github.com/chenpf1025/noisy_label_understanding_utilizing |
Framework | tf |
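
The cross-validation step for identifying correctly labeled samples is easy to sketch. Here `fit_predict(X_tr, y_tr, X_te)` is a hypothetical stand-in for training a DNN on one split and predicting labels on the other:

```python
import numpy as np

def identify_clean_samples(X, y_noisy, fit_predict, n_splits=2, seed=0):
    """Keep samples whose noisy label agrees with the prediction of a
    model trained on the other random split(s)."""
    rng = np.random.default_rng(seed)
    n = len(y_noisy)
    keep = np.zeros(n, dtype=bool)
    splits = np.array_split(rng.permutation(n), n_splits)
    for i, test_idx in enumerate(splits):
        train_idx = np.concatenate([s for j, s in enumerate(splits) if j != i])
        preds = fit_predict(X[train_idx], y_noisy[train_idx], X[test_idx])
        keep[test_idx] = preds == y_noisy[test_idx]
    return keep  # the identified samples then feed the Co-teaching stage
```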
Assistive Gym: A Physics Simulation Framework for Assistive Robotics
Title | Assistive Gym: A Physics Simulation Framework for Assistive Robotics |
Authors | Zackory Erickson, Vamsee Gangaram, Ariel Kapusta, C. Karen Liu, Charles C. Kemp |
Abstract | Autonomous robots have the potential to serve as versatile caregivers that improve quality of life for millions of people worldwide. Yet, conducting research in this area presents numerous challenges, including the risks of physical interaction between people and robots. Physics simulations have been used to optimize and train robots for physical assistance, but have typically focused on a single task. In this paper, we present Assistive Gym, an open source physics simulation framework for assistive robots that models multiple tasks. It includes six simulated environments in which a robotic manipulator can attempt to assist a person with activities of daily living (ADLs): itch scratching, drinking, feeding, body manipulation, dressing, and bathing. Assistive Gym models a person’s physical capabilities and preferences for assistance, which are used to provide a reward function. We present baseline policies trained using reinforcement learning for four different commercial robots in the six environments. We demonstrate that modeling human motion results in better assistance and we compare the performance of different robots. Overall, we show that Assistive Gym is a promising tool for assistive robotics research. |
Tasks | |
Published | 2019-10-10 |
URL | https://arxiv.org/abs/1910.04700v1 |
PDF | https://arxiv.org/pdf/1910.04700v1.pdf |
PWC | https://paperswithcode.com/paper/assistive-gym-a-physics-simulation-framework |
Repo | https://github.com/Healthcare-Robotics/assistive-gym |
Framework | none |
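
Interaction follows the standard Gym pattern. The environment ID below is an assumption based on the task/robot naming in the abstract; check the repo's README for the exact registered IDs:

```python
import gym
import assistive_gym  # registers the assistive environments on import

env = gym.make('FeedingJaco-v0')  # hypothetical task+robot ID
observation = env.reset()
done = False
while not done:
    action = env.action_space.sample()  # random policy as a placeholder
    observation, reward, done, info = env.step(action)
env.close()
```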
Adaptive Correlated Monte Carlo for Contextual Categorical Sequence Generation
Title | Adaptive Correlated Monte Carlo for Contextual Categorical Sequence Generation |
Authors | Xinjie Fan, Yizhe Zhang, Zhendong Wang, Mingyuan Zhou |
Abstract | Sequence generation models are commonly refined with reinforcement learning over user-defined metrics. However, high gradient variance hinders the practical use of this method. To stabilize this method, we adapt to contextual generation of categorical sequences a policy gradient estimator, which evaluates a set of correlated Monte Carlo (MC) rollouts for variance control. Due to the correlation, the number of unique rollouts is random and adaptive to model uncertainty; those rollouts naturally become baselines for each other, and hence are combined to effectively reduce gradient variance. We also demonstrate the use of correlated MC rollouts for binary-tree softmax models, which reduce the high generation cost in large vocabulary scenarios by decomposing each categorical action into a sequence of binary actions. We evaluate our methods on both neural program synthesis and image captioning. The proposed methods yield lower gradient variance and consistent improvement over related baselines. |
Tasks | Image Captioning, Program Synthesis |
Published | 2019-12-31 |
URL | https://arxiv.org/abs/1912.13151v1 |
PDF | https://arxiv.org/pdf/1912.13151v1.pdf |
PWC | https://paperswithcode.com/paper/adaptive-correlated-monte-carlo-for-1 |
Repo | https://github.com/xinjiefan/ACMC_ICLR |
Framework | pytorch |
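
The idea that rollouts "become baselines for each other" can be illustrated with a plain leave-one-out estimator. This sketch omits the paper's key ingredient, the correlated sampling that makes the number of unique rollouts adaptive, and just shows the variance-reduction mechanics for K independent rollouts per context:

```python
import torch

def leave_one_out_pg_loss(log_probs, rewards):
    """log_probs: (batch, K) sum of token log-probs per rollout;
    rewards: (batch, K) metric score per rollout."""
    K = rewards.shape[1]
    # Each rollout's baseline is the mean reward of the other K-1 rollouts.
    baseline = (rewards.sum(dim=1, keepdim=True) - rewards) / (K - 1)
    advantage = (rewards - baseline).detach()
    return -(log_probs * advantage).mean()
```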
ANDA: A Novel Data Augmentation Technique Applied to Salient Object Detection
Title | ANDA: A Novel Data Augmentation Technique Applied to Salient Object Detection |
Authors | Daniel V. Ruiz, Bruno A. Krinski, Eduardo Todt |
Abstract | In this paper, we propose a novel data augmentation technique (ANDA) applied to the Salient Object Detection (SOD) context. Standard data augmentation techniques proposed in the literature, such as image cropping, rotation, flipping, and resizing, only generate variations of the existing examples, providing limited generalization. Our method creates new images by combining an object with a new background while retaining part of its salience in the new context. To do so, the ANDA technique relies on a linear combination of labeled salient objects and new backgrounds, generated by removing the original salient object in a process known as image inpainting. Our proposed technique allows for more precise control of the object's position and size while preserving background information. To evaluate our proposed method, we trained multiple deep neural networks and compared the effect that our technique has on each one. We also compared our method with other data augmentation techniques. Our findings show that, depending on the network, the improvement can reach 14.1% in F-measure, with a reduction of up to 2.6% in Mean Absolute Error. |
Tasks | Data Augmentation, Image Augmentation, Image Cropping, Image Inpainting, Object Detection, Salient Object Detection |
Published | 2019-10-03 |
URL | https://arxiv.org/abs/1910.01256v1 |
PDF | https://arxiv.org/pdf/1910.01256v1.pdf |
PWC | https://paperswithcode.com/paper/anda-a-novel-data-augmentation-technique |
Repo | https://github.com/ruizvitor/ANDA |
Framework | none |
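
The composition step of ANDA amounts to alpha-blending a labeled salient object onto a background from which the original object was removed by inpainting. A minimal NumPy sketch:

```python
import numpy as np

def anda_compose(object_img, object_mask, inpainted_bg):
    """object_img, inpainted_bg: (H, W, 3) float images in [0, 1];
    object_mask: (H, W, 1) saliency mask of the pasted object."""
    new_image = object_mask * object_img + (1.0 - object_mask) * inpainted_bg
    new_label = object_mask  # the pasted object defines the new saliency map
    return new_image, new_label
```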
Text-based Depression Detection: What Triggers An Alert
Title | Text-based Depression Detection: What Triggers An Alert |
Authors | Heinrich Dinkel, Mengyue Wu, Kai Yu |
Abstract | Recent advances in automatic depression detection mostly derive from modality fusion and deep learning methods. However, multi-modal approaches introduce significant difficulty in the data collection phase, while the opaqueness of deep learning methods lowers their credibility. This work proposes a text-based multi-task BLSTM model with pretrained word embeddings. Our method outputs depression presence as well as a predicted severity score, culminating in a state-of-the-art F1 score of 0.87 and outperforming previous multi-modal studies. We also achieve the lowest RMSE among currently available text-based approaches. Further, by utilizing a per-time-step attention mechanism, we analyse the sentences/words that contribute most to predicting the depressed state. Surprisingly, "unmeaningful" words and paralinguistic information such as "um" and "uh" are the indicators to our model when making a depression prediction. This reveals for the first time that fillers in a conversation trigger a depression alert for a deep learning model. |
Tasks | Word Embeddings |
Published | 2019-04-08 |
URL | https://arxiv.org/abs/1904.05154v2 |
PDF | https://arxiv.org/pdf/1904.05154v2.pdf |
PWC | https://paperswithcode.com/paper/text-based-depression-detection-what-triggers |
Repo | https://github.com/richermans/text_based_depression |
Framework | pytorch |
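
The per-time-step attention that surfaces the trigger words can be sketched as a small PyTorch module; the layer sizes below are illustrative, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

class AttentiveBLSTM(nn.Module):
    def __init__(self, emb_dim=300, hidden=128):
        super().__init__()
        self.blstm = nn.LSTM(emb_dim, hidden, batch_first=True,
                             bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)
        self.presence = nn.Linear(2 * hidden, 2)  # depressed / not
        self.severity = nn.Linear(2 * hidden, 1)  # multi-task score head

    def forward(self, embeddings):              # (batch, T, emb_dim)
        h, _ = self.blstm(embeddings)           # (batch, T, 2*hidden)
        w = torch.softmax(self.attn(h), dim=1)  # weights reveal trigger words
        pooled = (w * h).sum(dim=1)
        return self.presence(pooled), self.severity(pooled), w
```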
rlpyt: A Research Code Base for Deep Reinforcement Learning in PyTorch
Title | rlpyt: A Research Code Base for Deep Reinforcement Learning in PyTorch |
Authors | Adam Stooke, Pieter Abbeel |
Abstract | Since the recent advent of deep reinforcement learning for game play and simulated robotic control, a multitude of new algorithms have flourished. Most are model-free algorithms which can be categorized into three families: deep Q-learning, policy gradients, and Q-value policy gradients. These have developed along separate lines of research, such that few, if any, code bases incorporate all three kinds. Yet these algorithms share a great depth of common deep reinforcement learning machinery. We are pleased to share rlpyt, which implements all three algorithm families on top of a shared, optimized infrastructure, in a single repository. It contains modular implementations of many common deep RL algorithms in Python using PyTorch, a leading deep learning library. rlpyt is designed as a high-throughput code base for small- to medium-scale research in deep RL. This white paper summarizes its features, algorithms implemented, and relation to prior work, and concludes with detailed implementation and usage notes. rlpyt is available at https://github.com/astooke/rlpyt. |
Tasks | Q-Learning |
Published | 2019-09-03 |
URL | https://arxiv.org/abs/1909.01500v2 |
PDF | https://arxiv.org/pdf/1909.01500v2.pdf |
PWC | https://paperswithcode.com/paper/rlpyt-a-research-code-base-for-deep |
Repo | https://github.com/astooke/rlpyt
Framework | pytorch |
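
A small end-to-end run, adapted from the repo's example scripts (example_1.py); import paths and argument names follow the repo at the time of the paper and may have moved since:

```python
from rlpyt.samplers.serial.sampler import SerialSampler
from rlpyt.envs.atari.atari_env import AtariEnv
from rlpyt.algos.dqn.dqn import DQN
from rlpyt.agents.dqn.atari.atari_dqn_agent import AtariDqnAgent
from rlpyt.runners.minibatch_rl import MinibatchRl
from rlpyt.utils.logging.context import logger_context

# Serial sampling (one env, one process) keeps the example lightweight.
sampler = SerialSampler(EnvCls=AtariEnv, env_kwargs=dict(game="pong"),
                        batch_T=4, batch_B=1)
runner = MinibatchRl(algo=DQN(min_steps_learn=1e3), agent=AtariDqnAgent(),
                     sampler=sampler, n_steps=50e3, log_interval_steps=1e3)
with logger_context("./data", run_ID=0, name="dqn_pong", log_params={}):
    runner.train()
```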
Periphery-Fovea Multi-Resolution Driving Model guided by Human Attention
Title | Periphery-Fovea Multi-Resolution Driving Model guided by Human Attention |
Authors | Ye Xia, Jinkyu Kim, John Canny, Karl Zipser, David Whitney |
Abstract | Inspired by human vision, we propose a new periphery-fovea multi-resolution driving model that predicts vehicle speed from dash camera videos. The peripheral vision module of the model processes the full video frames in low resolution. Its foveal vision module selects sub-regions and uses high-resolution input from those regions to improve its driving performance. We train the fovea selection module with supervision from driver gaze. We show that adding high-resolution input from predicted human driver gaze locations significantly improves the driving accuracy of the model. Our periphery-fovea multi-resolution model outperforms a uni-resolution periphery-only model that has the same amount of floating-point operations. More importantly, we demonstrate that our driving model achieves a significantly higher performance gain in pedestrian-involved critical situations than in other non-critical situations. |
Tasks | |
Published | 2019-03-24 |
URL | http://arxiv.org/abs/1903.09950v1 |
PDF | http://arxiv.org/pdf/1903.09950v1.pdf |
PWC | https://paperswithcode.com/paper/periphery-fovea-multi-resolution-driving |
Repo | https://github.com/pascalxia/periphery_fovea_driving |
Framework | tf |
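
The two-stream design can be illustrated with a toy module: a periphery stream over a downsampled full frame and a fovea stream over high-resolution crops at predicted gaze locations, fused for speed prediction. Backbones and sizes are placeholders, not the paper's architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PeripheryFoveaSketch(nn.Module):
    def __init__(self, feat_dim=64):
        super().__init__()
        self.periphery = nn.Conv2d(3, feat_dim, 3, stride=2, padding=1)
        self.fovea = nn.Conv2d(3, feat_dim, 3, stride=2, padding=1)
        self.speed = nn.Linear(2 * feat_dim, 1)

    def forward(self, frame, fovea_crop):
        low = F.interpolate(frame, scale_factor=0.25)  # cheap full view
        p = self.periphery(low).mean(dim=(2, 3))       # global pooling
        f = self.fovea(fovea_crop).mean(dim=(2, 3))    # high-res detail
        return self.speed(torch.cat([p, f], dim=1))    # predicted speed
```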