February 1, 2020

3068 words 15 mins read

Paper Group AWR 366

Robust sound event detection in bioacoustic sensor networks. Learning Metrics from Teachers: Compact Networks for Image Embedding. SciBERT: A Pretrained Language Model for Scientific Text. Noisy Supervision for Correcting Misaligned Cadaster Maps Without Perfect Ground Truth Data. Learning to synthesise the ageing brain without longitudinal data. A …

Robust sound event detection in bioacoustic sensor networks

Title Robust sound event detection in bioacoustic sensor networks
Authors Vincent Lostanlen, Justin Salamon, Andrew Farnsworth, Steve Kelling, Juan Pablo Bello
Abstract Bioacoustic sensors, sometimes known as autonomous recording units (ARUs), can record sounds of wildlife over long periods of time in scalable and minimally invasive ways. Deriving per-species abundance estimates from these sensors requires detection, classification, and quantification of animal vocalizations as individual acoustic events. Yet, variability in ambient noise, both over time and across sensors, hinders the reliability of current automated systems for sound event detection (SED), such as convolutional neural networks (CNN) in the time-frequency domain. In this article, we develop, benchmark, and combine several machine listening techniques to improve the generalizability of SED models across heterogeneous acoustic environments. As a case study, we consider the problem of detecting avian flight calls from a ten-hour recording of nocturnal bird migration, recorded by a network of six ARUs in the presence of heterogeneous background noise. Starting from a CNN yielding state-of-the-art accuracy on this task, we introduce two noise adaptation techniques, respectively integrating short-term (60 milliseconds) and long-term (30 minutes) context. First, we apply per-channel energy normalization (PCEN) in the time-frequency domain, which applies short-term automatic gain control to every subband in the mel-frequency spectrogram. Secondly, we replace the last dense layer in the network by a context-adaptive neural network (CA-NN) layer. Combining them yields state-of-the-art results that are unmatched by artificial data augmentation alone. We release a pre-trained version of our best performing system under the name of BirdVoxDetect, a ready-to-use detector of avian flight calls in field recordings.
Tasks Data Augmentation, Sound Event Detection
Published 2019-05-20
URL https://arxiv.org/abs/1905.08352v2
PDF https://arxiv.org/pdf/1905.08352v2.pdf
PWC https://paperswithcode.com/paper/robust-sound-event-detection-in-bioacoustic
Repo https://github.com/BirdVox/birdvoxdetect
Framework tf
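
The short-term adaptation step named in the abstract, per-channel energy normalization, is available in librosa as librosa.pcen. Below is a minimal sketch of applying it to a mel spectrogram; the file name and parameter values are illustrative, not the exact BirdVoxDetect settings.

```python
import librosa

# Load a field recording (file name is illustrative).
y, sr = librosa.load("field_recording.wav", sr=22050)

# Mel spectrogram WITHOUT log scaling: PCEN expects raw energy.
S = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=1024,
                                   hop_length=512, n_mels=128)

# Per-channel energy normalization: an IIR filter smooths each mel band
# over time, and the smoothed energy acts as a per-band automatic gain
# control before root compression. time_constant sets the short-term
# context window; the values here are illustrative.
S_pcen = librosa.pcen(S * (2 ** 31), sr=sr, hop_length=512,
                      time_constant=0.06, gain=0.8, bias=10, power=0.25)
```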

Learning Metrics from Teachers: Compact Networks for Image Embedding

Title Learning Metrics from Teachers: Compact Networks for Image Embedding
Authors Lu Yu, Vacit Oguz Yazici, Xialei Liu, Joost van de Weijer, Yongmei Cheng, Arnau Ramisa
Abstract Metric learning networks are used to compute image embeddings, which are widely used in many applications such as image retrieval and face recognition. In this paper, we propose to use network distillation to efficiently compute image embeddings with small networks. Network distillation has been successfully applied to improve image classification, but has hardly been explored for metric learning. To do so, we propose two new loss functions that model the communication of a deep teacher network to a small student network. We evaluate our system on several datasets, including CUB-200-2011, Cars-196, and Stanford Online Products, and show that embeddings computed using small student networks perform significantly better than those computed using standard networks of similar size. Results on a very compact network (MobileNet-0.25), which can be used on mobile devices, show that the proposed method can greatly improve Recall@1 results from 27.5% to 44.6%. Furthermore, we investigate various aspects of distillation for embeddings, including hint and attention layers, semi-supervised learning and cross quality distillation. (Code is available at https://github.com/yulu0724/EmbeddingDistillation.)
Tasks Face Recognition, Image Classification, Image Retrieval, Metric Learning
Published 2019-04-07
URL http://arxiv.org/abs/1904.03624v1
PDF http://arxiv.org/pdf/1904.03624v1.pdf
PWC https://paperswithcode.com/paper/learning-metrics-from-teachers-compact
Repo https://github.com/yulu0724/EmbeddingDistillation
Framework pytorch
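
The abstract mentions two new loss functions for teacher-to-student communication. As a hedged sketch of how such embedding-distillation losses are commonly formulated (the paper's exact definitions, weightings, and normalizations may differ), in PyTorch:

```python
import torch
import torch.nn.functional as F

def absolute_teacher_loss(student_emb, teacher_emb):
    # Push each student embedding toward the teacher's embedding of the
    # same image ("absolute" distillation).
    return F.mse_loss(student_emb, teacher_emb)

def relative_teacher_loss(student_emb, teacher_emb):
    # Match pairwise distances within the batch instead of the embeddings
    # themselves ("relative" distillation), which lets the student settle
    # into a different embedding space with the same metric structure.
    d_student = torch.cdist(student_emb, student_emb)
    d_teacher = torch.cdist(teacher_emb, teacher_emb)
    return F.mse_loss(d_student, d_teacher)
```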

SciBERT: A Pretrained Language Model for Scientific Text

Title SciBERT: A Pretrained Language Model for Scientific Text
Authors Iz Beltagy, Kyle Lo, Arman Cohan
Abstract Obtaining large-scale annotated data for NLP tasks in the scientific domain is challenging and expensive. We release SciBERT, a pretrained language model based on BERT (Devlin et al., 2018) to address the lack of high-quality, large-scale labeled scientific data. SciBERT leverages unsupervised pretraining on a large multi-domain corpus of scientific publications to improve performance on downstream scientific NLP tasks. We evaluate on a suite of tasks including sequence tagging, sentence classification and dependency parsing, with datasets from a variety of scientific domains. We demonstrate statistically significant improvements over BERT and achieve new state-of-the-art results on several of these tasks. The code and pretrained models are available at https://github.com/allenai/scibert/.
Tasks Citation Intent Classification, Dependency Parsing, Language Modelling, Medical Named Entity Recognition, Named Entity Recognition, Participant Intervention Comparison Outcome Extraction, Relation Extraction, Sentence Classification
Published 2019-03-26
URL https://arxiv.org/abs/1903.10676v3
PDF https://arxiv.org/pdf/1903.10676v3.pdf
PWC https://paperswithcode.com/paper/scibert-pretrained-contextualized-embeddings
Repo https://github.com/allenai/scibert
Framework pytorch
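
The released weights can be loaded through the Hugging Face transformers library; the identifier below is the SciVocab uncased variant published under the allenai organization.

```python
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("allenai/scibert_scivocab_uncased")
model = AutoModel.from_pretrained("allenai/scibert_scivocab_uncased")

# Encode a scientific sentence and inspect the contextual embeddings.
sentence = "The BRCA1 gene is implicated in breast cancer susceptibility."
inputs = tokenizer(sentence, return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, num_tokens, 768)
```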

Noisy Supervision for Correcting Misaligned Cadaster Maps Without Perfect Ground Truth Data

Title Noisy Supervision for Correcting Misaligned Cadaster Maps Without Perfect Ground Truth Data
Authors Nicolas Girard, Guillaume Charpiat, Yuliya Tarabalka
Abstract In machine learning, the best performance on a certain task is achieved by fully supervised methods when perfect ground truth labels are available. However, labels are often noisy, especially in remote sensing where manually curated public datasets are rare. We study the multi-modal cadaster map alignment problem, for which available annotations are misaligned polygons, resulting in noisy supervision. We subsequently set up a multiple-rounds training scheme which corrects the ground truth annotations at each round to better train the model at the next round. We show that it is possible to reduce the noise of the dataset by iteratively training a better alignment model to correct the annotation alignment.
Tasks
Published 2019-03-12
URL http://arxiv.org/abs/1903.06529v1
PDF http://arxiv.org/pdf/1903.06529v1.pdf
PWC https://paperswithcode.com/paper/noisy-supervision-for-correcting-misaligned
Repo https://github.com/Lydorn/mapalignment
Framework tf
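
The multiple-rounds scheme described in the abstract reduces to a simple loop: train on the current annotations, then use the trained model to re-align them for the next round. A minimal sketch follows; train_fn and align_fn stand in for the paper's alignment network and are not the API of the linked repository.

```python
def iterative_annotation_correction(images, noisy_polygons,
                                    train_fn, align_fn, num_rounds=3):
    """Multiple-rounds scheme: train on the labels we currently trust,
    then use the trained model to correct them for the next round.

    train_fn(images, annotations) -> model
    align_fn(model, image, polygon) -> corrected polygon
    """
    annotations = noisy_polygons
    model = None
    for _ in range(num_rounds):
        model = train_fn(images, annotations)      # fit to current labels
        annotations = [align_fn(model, im, p)      # reduce label noise
                       for im, p in zip(images, annotations)]
    return model, annotations
```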

Learning to synthesise the ageing brain without longitudinal data

Title Learning to synthesise the ageing brain without longitudinal data
Authors Tian Xia, Agisilaos Chartsias, Chengjia Wang, Sotirios A. Tsaftaris
Abstract Brain ageing is a continuous process that is affected by many factors including neurodegenerative diseases. Understanding this process is of great value for both neuroscience research and clinical applications. However, revealing underlying mechanisms is challenging due to the lack of longitudinal data. In this paper, we propose a deep learning-based method that learns to simulate subject-specific brain ageing trajectories without relying on longitudinal data. Our method synthesises aged images using a network conditioned on two clinical variables: age as a continuous variable, and health state, i.e. status of Alzheimer’s Disease (AD) for this work, as an ordinal variable. We adopt an adversarial loss to learn the joint distribution of brain appearance and clinical variables and define reconstruction losses that help preserve subject identity. To demonstrate our model, we compare with several approaches using two widely used datasets: Cam-CAN and ADNI. We use ground-truth longitudinal data from ADNI to evaluate the quality of synthesised images. A pre-trained age predictor, which estimates the apparent age of a brain image, is used to assess age accuracy. In addition, we show that we can train the model on Cam-CAN data and evaluate on the longitudinal data from ADNI, indicating the generalisation power of our approach. Both qualitative and quantitative results show that our method can progressively simulate the ageing process by synthesising realistic brain images. The code will be made publicly available at: https://github.com/xiat0616/BrainAgeing.
Tasks
Published 2019-12-04
URL https://arxiv.org/abs/1912.02620v2
PDF https://arxiv.org/pdf/1912.02620v2.pdf
PWC https://paperswithcode.com/paper/learning-to-synthesise-the-ageing-brain
Repo https://github.com/xiat0616/BrainAgeing
Framework none
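
As a toy illustration of conditioning a generator on the two clinical variables (continuous age, ordinal health state), here is a FiLM-style sketch in PyTorch. This is our simplification, not the paper's architecture, which is a full image-to-image network trained with adversarial and identity-preserving losses.

```python
import torch
import torch.nn as nn

class ConditionedAger(nn.Module):
    """Toy conditioning sketch; layer sizes are illustrative only."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Conv2d(1, 16, 3, padding=1)
        # 2 conditioning inputs: target age (continuous, normalised)
        # and health state (ordinal, e.g. 0=healthy, 2=AD).
        self.film = nn.Linear(2, 32)  # per-channel scale + shift
        self.decoder = nn.Conv2d(16, 1, 3, padding=1)

    def forward(self, brain_slice, age, health):
        h = torch.relu(self.encoder(brain_slice))
        cond = self.film(torch.stack([age, health], dim=-1))
        scale, shift = cond.chunk(2, dim=-1)
        h = h * scale[:, :, None, None] + shift[:, :, None, None]
        return self.decoder(h)

g = ConditionedAger()
x = torch.randn(4, 1, 64, 64)                       # batch of brain slices
aged = g(x, age=torch.rand(4), health=torch.zeros(4))
print(aged.shape)  # torch.Size([4, 1, 64, 64])
```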

AdaCoF: Adaptive Collaboration of Flows for Video Frame Interpolation

Title AdaCoF: Adaptive Collaboration of Flows for Video Frame Interpolation
Authors Hyeongmin Lee, Taeoh Kim, Tae-young Chung, Daehyun Pak, Yuseok Ban, Sangyoun Lee
Abstract Video frame interpolation is one of the most challenging tasks in video processing research. Recently, many studies based on deep learning have been suggested. Most of these methods focus on finding locations with useful information to estimate each output pixel using their own frame warping operations. However, many of them have Degrees of Freedom (DoF) limitations and fail to deal with the complex motions found in real world videos. To solve this problem, we propose a new warping module named Adaptive Collaboration of Flows (AdaCoF). Our method estimates both kernel weights and offset vectors for each target pixel to synthesize the output frame. AdaCoF is one of the most generalized warping modules compared to other approaches, and covers most of them as special cases. Therefore, it can deal with a significantly wide domain of complex motions. To further improve our framework and synthesize more realistic outputs, we introduce a dual-frame adversarial loss that is applicable only to video frame interpolation tasks. The experimental results show that our method outperforms the state-of-the-art methods for both fixed training set environments and the Middlebury benchmark.
Tasks Video Frame Interpolation
Published 2019-07-24
URL https://arxiv.org/abs/1907.10244v3
PDF https://arxiv.org/pdf/1907.10244v3.pdf
PWC https://paperswithcode.com/paper/learning-spatial-transform-for-video-frame
Repo https://github.com/HyeongminLEE/AdaCoF-pytorch
Framework pytorch
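
The AdaCoF operation itself is compact: each output pixel is a weighted sum of kernel x kernel input samples taken at regular grid positions displaced by learned offsets. A naive PyTorch sketch follows; the official repository implements this as a CUDA kernel, and the tensor layout here is our assumption.

```python
import torch
import torch.nn.functional as F

def adacof_warp(frame, weights, alpha, beta, kernel=5, dilation=1):
    """Naive AdaCoF-style warp (loop version, for clarity).

    frame:   (B, C, H, W) input frame
    weights: (B, K*K, H, W) softmaxed kernel weights
    alpha:   (B, K*K, H, W) learned vertical offsets
    beta:    (B, K*K, H, W) learned horizontal offsets
    """
    B, C, H, W = frame.shape
    ys = torch.arange(H).view(1, 1, H, 1).float()
    xs = torch.arange(W).view(1, 1, 1, W).float()
    out = torch.zeros_like(frame)
    for k in range(kernel * kernel):
        dy, dx = divmod(k, kernel)
        # Sample location = grid position + fixed kernel offset + learned offset.
        sy = ys + dilation * dy + alpha[:, k:k + 1]
        sx = xs + dilation * dx + beta[:, k:k + 1]
        # Normalise to [-1, 1] for grid_sample.
        grid = torch.stack([sx / (W - 1) * 2 - 1,
                            sy / (H - 1) * 2 - 1], dim=-1).squeeze(1)
        sampled = F.grid_sample(frame, grid, align_corners=True,
                                padding_mode="border")
        out = out + weights[:, k:k + 1] * sampled
    return out
```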

NAIL: A General Interactive Fiction Agent

Title NAIL: A General Interactive Fiction Agent
Authors Matthew Hausknecht, Ricky Loynd, Greg Yang, Adith Swaminathan, Jason D. Williams
Abstract Interactive Fiction (IF) games are complex textual decision making problems. This paper introduces NAIL, an autonomous agent for general parser-based IF games. NAIL won the 2018 Text Adventure AI Competition, where it was evaluated on twenty unseen games. This paper describes the architecture, development, and insights underpinning NAIL’s performance.
Tasks Decision Making
Published 2019-02-12
URL http://arxiv.org/abs/1902.04259v2
PDF http://arxiv.org/pdf/1902.04259v2.pdf
PWC https://paperswithcode.com/paper/nail-a-general-interactive-fiction-agent
Repo https://github.com/Microsoft/nail_agent
Framework none

BERT with History Answer Embedding for Conversational Question Answering

Title BERT with History Answer Embedding for Conversational Question Answering
Authors Chen Qu, Liu Yang, Minghui Qiu, W. Bruce Croft, Yongfeng Zhang, Mohit Iyyer
Abstract Conversational search is an emerging topic in the information retrieval community. One of the major challenges to multi-turn conversational search is to model the conversation history to answer the current question. Existing methods either prepend history turns to the current question or use complicated attention mechanisms to model the history. We propose a conceptually simple yet highly effective approach referred to as history answer embedding. It enables seamless integration of conversation history into a conversational question answering (ConvQA) model built on BERT (Bidirectional Encoder Representations from Transformers). We first explain our view that ConvQA is a simplified but concrete setting of conversational search, and then we provide a general framework to solve ConvQA. We further demonstrate the effectiveness of our approach under this framework. Finally, we analyze the impact of different numbers of history turns under different settings to provide new insights into conversation history modeling in ConvQA.
Tasks Information Retrieval, Question Answering
Published 2019-05-14
URL https://arxiv.org/abs/1905.05412v2
PDF https://arxiv.org/pdf/1905.05412v2.pdf
PWC https://paperswithcode.com/paper/bert-with-history-answer-embedding-for
Repo https://github.com/prdwb/bert_hae
Framework tf
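
The history answer embedding itself is a two-entry lookup table summed into BERT's input embeddings, marking whether each token appeared in an answer to a previous turn. A minimal sketch (variable names are ours, not the repository's):

```python
import torch
import torch.nn as nn

hidden = 768  # BERT-base hidden size
hae = nn.Embedding(2, hidden)  # 0 = not in a history answer, 1 = in one

token_embeddings = torch.randn(1, 12, hidden)  # from BERT's embedding layer
in_history_answer = torch.tensor([[0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]])

# Sum the HAE vector into the usual input embeddings; the result can be
# passed to BERT through its inputs_embeds argument.
inputs_embeds = token_embeddings + hae(in_history_answer)
print(inputs_embeds.shape)  # torch.Size([1, 12, 768])
```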

Understanding and Utilizing Deep Neural Networks Trained with Noisy Labels

Title Understanding and Utilizing Deep Neural Networks Trained with Noisy Labels
Authors Pengfei Chen, Benben Liao, Guangyong Chen, Shengyu Zhang
Abstract Noisy labels are ubiquitous in real-world datasets, which poses a challenge for robustly training deep neural networks (DNNs), as DNNs usually have enough capacity to memorize the noisy labels. In this paper, we find that the test accuracy can be quantitatively characterized in terms of the noise ratio in datasets. In particular, the test accuracy is a quadratic function of the noise ratio in the case of symmetric noise, which explains the experimental findings previously published. Based on our analysis, we apply cross-validation to randomly split noisy datasets, which identifies most samples that have correct labels. Then we adopt the Co-teaching strategy which takes full advantage of the identified samples to train DNNs robustly against noisy labels. Compared with extensive state-of-the-art methods, our strategy consistently improves the generalization performance of DNNs under both synthetic and real-world training noise.
Tasks
Published 2019-05-13
URL https://arxiv.org/abs/1905.05040v1
PDF https://arxiv.org/pdf/1905.05040v1.pdf
PWC https://paperswithcode.com/paper/understanding-and-utilizing-deep-neural
Repo https://github.com/chenpf1025/noisy_label_understanding_utilizing
Framework tf
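
The quadratic relation can be reproduced under a simple assumption (ours, for illustration): the network memorizes the noisy training distribution exactly, and the test labels carry the same symmetric noise with ratio eps over c classes. The chance that the prediction and the noisy test label agree is then (1 - eps)^2 + eps^2 / (c - 1), a quadratic function of eps.

```python
def noisy_test_accuracy(eps, num_classes):
    # Both the prediction and the noisy test label hit the true class
    # with probability (1 - eps), and coincide on any one of the
    # (c - 1) wrong classes with probability (eps / (c - 1))^2 each.
    return (1 - eps) ** 2 + eps ** 2 / (num_classes - 1)

for eps in (0.0, 0.2, 0.4):
    print(eps, noisy_test_accuracy(eps, num_classes=10))
# 0.0 -> 1.0, 0.2 -> 0.6444..., 0.4 -> 0.3777...
```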

Assistive Gym: A Physics Simulation Framework for Assistive Robotics

Title Assistive Gym: A Physics Simulation Framework for Assistive Robotics
Authors Zackory Erickson, Vamsee Gangaram, Ariel Kapusta, C. Karen Liu, Charles C. Kemp
Abstract Autonomous robots have the potential to serve as versatile caregivers that improve quality of life for millions of people worldwide. Yet, conducting research in this area presents numerous challenges, including the risks of physical interaction between people and robots. Physics simulations have been used to optimize and train robots for physical assistance, but have typically focused on a single task. In this paper, we present Assistive Gym, an open source physics simulation framework for assistive robots that models multiple tasks. It includes six simulated environments in which a robotic manipulator can attempt to assist a person with activities of daily living (ADLs): itch scratching, drinking, feeding, body manipulation, dressing, and bathing. Assistive Gym models a person’s physical capabilities and preferences for assistance, which are used to provide a reward function. We present baseline policies trained using reinforcement learning for four different commercial robots in the six environments. We demonstrate that modeling human motion results in better assistance and we compare the performance of different robots. Overall, we show that Assistive Gym is a promising tool for assistive robotics research.
Tasks
Published 2019-10-10
URL https://arxiv.org/abs/1910.04700v1
PDF https://arxiv.org/pdf/1910.04700v1.pdf
PWC https://paperswithcode.com/paper/assistive-gym-a-physics-simulation-framework
Repo https://github.com/Healthcare-Robotics/assistive-gym
Framework none
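
Assistive Gym follows the standard OpenAI Gym interface, so interacting with an environment looks like any other Gym workflow. The environment ID below pairs a task with a robot, following the project README; treat the exact ID as an assumption if the API has since changed.

```python
import gym
import assistive_gym  # registers the assistive environments with gym

env = gym.make('FeedingJaco-v0')  # feeding task with a Jaco manipulator
observation = env.reset()

for _ in range(200):
    action = env.action_space.sample()  # random policy, for demonstration
    observation, reward, done, info = env.step(action)
    if done:
        break
env.close()
```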

Adaptive Correlated Monte Carlo for Contextual Categorical Sequence Generation

Title Adaptive Correlated Monte Carlo for Contextual Categorical Sequence Generation
Authors Xinjie Fan, Yizhe Zhang, Zhendong Wang, Mingyuan Zhou
Abstract Sequence generation models are commonly refined with reinforcement learning over user-defined metrics. However, high gradient variance hinders the practical use of this method. To stabilize this method, we adapt to contextual generation of categorical sequences a policy gradient estimator, which evaluates a set of correlated Monte Carlo (MC) rollouts for variance control. Due to the correlation, the number of unique rollouts is random and adaptive to model uncertainty; those rollouts naturally become baselines for each other, and hence are combined to effectively reduce gradient variance. We also demonstrate the use of correlated MC rollouts for binary-tree softmax models, which reduce the high generation cost in large vocabulary scenarios by decomposing each categorical action into a sequence of binary actions. We evaluate our methods on both neural program synthesis and image captioning. The proposed methods yield lower gradient variance and consistent improvement over related baselines.
Tasks Image Captioning, Program Synthesis
Published 2019-12-31
URL https://arxiv.org/abs/1912.13151v1
PDF https://arxiv.org/pdf/1912.13151v1.pdf
PWC https://paperswithcode.com/paper/adaptive-correlated-monte-carlo-for-1
Repo https://github.com/xinjiefan/ACMC_ICLR
Framework pytorch
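
To illustrate the core idea of rollouts serving as baselines for each other, here is a simplified leave-one-out REINFORCE baseline over K rollouts. The paper's estimator additionally correlates the rollouts so that their number adapts to model uncertainty; that part is omitted here.

```python
import torch

def multi_rollout_policy_gradient(log_probs, rewards):
    """Leave-one-out baseline over K rollouts (simplified illustration).

    log_probs: (K,) summed log-probabilities of K sampled sequences
    rewards:   (K,) their rewards under the task metric
    """
    K = rewards.shape[0]
    # Baseline for rollout k = mean reward of the OTHER rollouts,
    # so each rollout's baseline is independent of its own reward.
    baseline = (rewards.sum() - rewards) / (K - 1)
    advantages = rewards - baseline
    # REINFORCE loss with the variance-reduced advantages.
    return -(advantages.detach() * log_probs).mean()

loss = multi_rollout_policy_gradient(
    log_probs=torch.randn(8, requires_grad=True),
    rewards=torch.rand(8))
loss.backward()
```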

ANDA: A Novel Data Augmentation Technique Applied to Salient Object Detection

Title ANDA: A Novel Data Augmentation Technique Applied to Salient Object Detection
Authors Daniel V. Ruiz, Bruno A. Krinski, Eduardo Todt
Abstract In this paper, we propose a novel data augmentation technique (ANDA) applied to the Salient Object Detection (SOD) context. Standard data augmentation techniques proposed in the literature, such as image cropping, rotation, flipping, and resizing, only generate variations of the existing examples, providing a limited generalization. Our method has the novelty of creating new images, by combining an object with a new background while retaining part of its salience in this new context. To do so, the ANDA technique relies on the linear combination between labeled salient objects and new backgrounds, generated by removing the original salient object in a process known as image inpainting. Our proposed technique allows for more precise control of the object’s position and size while preserving background information. Aiming to evaluate our proposed method, we trained multiple deep neural networks and compared the effect that our technique has on each one. We also compared our method with other data augmentation techniques. Our findings show that, depending on the network, the improvement can be up to 14.1% in F-measure, with a decrease of up to 2.6% in Mean Absolute Error.
Tasks Data Augmentation, Image Augmentation, Image Cropping, Image Inpainting, Object Detection, Salient Object Detection
Published 2019-10-03
URL https://arxiv.org/abs/1910.01256v1
PDF https://arxiv.org/pdf/1910.01256v1.pdf
PWC https://paperswithcode.com/paper/anda-a-novel-data-augmentation-technique
Repo https://github.com/ruizvitor/ANDA
Framework none
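
The two ANDA steps can be sketched with OpenCV: inpaint the salient object away to synthesize a new background, then alpha-blend a different labeled object onto it using its saliency mask. File names are illustrative, and the sketch assumes all images share the same dimensions.

```python
import cv2
import numpy as np

img = cv2.imread("image_with_object.png")
mask = cv2.imread("saliency_mask.png", cv2.IMREAD_GRAYSCALE)  # 0/255

# Step 1: inpaint the object region to synthesize a clean background.
background = cv2.inpaint(img, mask, inpaintRadius=3,
                         flags=cv2.INPAINT_TELEA)

# Step 2: paste another salient object onto the new background as a
# per-pixel linear combination weighted by its saliency mask.
obj_img = cv2.imread("other_object_image.png")
obj_mask = cv2.imread("other_object_mask.png",
                      cv2.IMREAD_GRAYSCALE).astype(np.float32) / 255.0
alpha = obj_mask[..., None]  # (H, W, 1)
composite = (alpha * obj_img + (1 - alpha) * background).astype(np.uint8)

cv2.imwrite("anda_sample.png", composite)
```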

Text-based Depression Detection: What Triggers An Alert

Title Text-based Depression Detection: What Triggers An Alert
Authors Heinrich Dinkel, Mengyue Wu, Kai Yu
Abstract Recent advances in automatic depression detection mostly derive from modality fusion and deep learning methods. However, multi-modal approaches introduce significant difficulty in the data collection phase, while deep learning methods’ opaqueness lowers their credibility. This current work proposes a text-based multi-task BLSTM model with pretrained word embeddings. Our method outputs depression presence results as well as a predicted severity score, culminating in a state-of-the-art F1 score of 0.87, outperforming previous multi-modal studies. We also achieve the lowest RMSE compared with currently available text-based approaches. Further, by utilizing a per-time-step attention mechanism, we analyse the sentences/words that contribute most in predicting the depressed state. Surprisingly, ‘unmeaningful’ words/paralinguistic information such as ‘um’ and ‘uh’ are the indicators to our model when making a depression prediction. It is revealed for the first time that fillers in a conversation trigger a depression alert for a deep learning model.
Tasks Word Embeddings
Published 2019-04-08
URL https://arxiv.org/abs/1904.05154v2
PDF https://arxiv.org/pdf/1904.05154v2.pdf
PWC https://paperswithcode.com/paper/text-based-depression-detection-what-triggers
Repo https://github.com/richermans/text_based_depression
Framework pytorch
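
A hedged sketch of the model family the abstract describes: a BLSTM with per-time-step attention pooling feeding two heads, one for depression presence and one for severity. Layer sizes are illustrative, not the paper's.

```python
import torch
import torch.nn as nn

class AttentiveBLSTM(nn.Module):
    def __init__(self, emb_dim=300, hidden=128):
        super().__init__()
        self.blstm = nn.LSTM(emb_dim, hidden, batch_first=True,
                             bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)
        self.presence = nn.Linear(2 * hidden, 2)  # depressed / not
        self.severity = nn.Linear(2 * hidden, 1)  # regression head

    def forward(self, word_embeddings):           # (B, T, emb_dim)
        h, _ = self.blstm(word_embeddings)        # (B, T, 2*hidden)
        # The attention weights reveal which time steps (words) drive
        # the prediction -- the mechanism behind the 'um'/'uh' finding.
        w = torch.softmax(self.attn(h), dim=1)    # (B, T, 1)
        pooled = (w * h).sum(dim=1)               # (B, 2*hidden)
        return self.presence(pooled), self.severity(pooled), w

model = AttentiveBLSTM()
logits, score, attn = model(torch.randn(2, 50, 300))
```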

rlpyt: A Research Code Base for Deep Reinforcement Learning in PyTorch

Title rlpyt: A Research Code Base for Deep Reinforcement Learning in PyTorch
Authors Adam Stooke, Pieter Abbeel
Abstract Since the recent advent of deep reinforcement learning for game play and simulated robotic control, a multitude of new algorithms have flourished. Most are model-free algorithms which can be categorized into three families: deep Q-learning, policy gradients, and Q-value policy gradients. These have developed along separate lines of research, such that few, if any, code bases incorporate all three kinds. Yet these algorithms share a great depth of common deep reinforcement learning machinery. We are pleased to share rlpyt, which implements all three algorithm families on top of a shared, optimized infrastructure, in a single repository. It contains modular implementations of many common deep RL algorithms in Python using PyTorch, a leading deep learning library. rlpyt is designed as a high-throughput code base for small- to medium-scale research in deep RL. This white paper summarizes its features, algorithms implemented, and relation to prior work, and concludes with detailed implementation and usage notes. rlpyt is available at https://github.com/astooke/rlpyt.
Tasks Q-Learning
Published 2019-09-03
URL https://arxiv.org/abs/1909.01500v2
PDF https://arxiv.org/pdf/1909.01500v2.pdf
PWC https://paperswithcode.com/paper/rlpyt-a-research-code-base-for-deep
Repo https://github.com/sarahisyoung/rlpyt
Framework pytorch
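
A typical entry point, adapted from the repository's examples (example_1.py): a serial sampler, a DQN-family algorithm, and a runner tying them together. Module paths and keyword arguments are assumptions if the code base has evolved since.

```python
from rlpyt.samplers.serial.sampler import SerialSampler
from rlpyt.envs.atari.atari_env import AtariEnv
from rlpyt.algos.dqn.dqn import DQN
from rlpyt.agents.dqn.atari.atari_dqn_agent import AtariDqnAgent
from rlpyt.runners.minibatch_rl import MinibatchRlEval

sampler = SerialSampler(
    EnvCls=AtariEnv,
    env_kwargs=dict(game="pong"),
    eval_env_kwargs=dict(game="pong"),
    batch_T=4,                 # time steps per sampler iteration
    batch_B=1,                 # parallel environment instances
    eval_n_envs=2,
    eval_max_steps=int(10e3),
)
algo = DQN(min_steps_learn=1e3)  # deep Q-learning family
agent = AtariDqnAgent()
runner = MinibatchRlEval(
    algo=algo, agent=agent, sampler=sampler,
    n_steps=50e3, log_interval_steps=1e3,
)
runner.train()
```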

Periphery-Fovea Multi-Resolution Driving Model guided by Human Attention

Title Periphery-Fovea Multi-Resolution Driving Model guided by Human Attention
Authors Ye Xia, Jinkyu Kim, John Canny, Karl Zipser, David Whitney
Abstract Inspired by human vision, we propose a new periphery-fovea multi-resolution driving model that predicts vehicle speed from dash camera videos. The peripheral vision module of the model processes the full video frames in low resolution. Its foveal vision module selects sub-regions and uses high-resolution input from those regions to improve its driving performance. We train the fovea selection module with supervision from driver gaze. We show that adding high-resolution input from predicted human driver gaze locations significantly improves the driving accuracy of the model. Our periphery-fovea multi-resolution model outperforms a uni-resolution periphery-only model that has the same amount of floating-point operations. More importantly, we demonstrate that our driving model achieves a significantly higher performance gain in pedestrian-involved critical situations than in other non-critical situations.
Tasks
Published 2019-03-24
URL http://arxiv.org/abs/1903.09950v1
PDF http://arxiv.org/pdf/1903.09950v1.pdf
PWC https://paperswithcode.com/paper/periphery-fovea-multi-resolution-driving
Repo https://github.com/pascalxia/periphery_fovea_driving
Framework tf
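
The two input streams the abstract describes can be sketched directly: the full frame downsampled for the peripheral stream, plus a high-resolution crop around a predicted gaze location for the foveal stream. Sizes and the single-fixation simplification are ours.

```python
import torch
import torch.nn.functional as F

def periphery_and_fovea(frame_hi, gaze_xy, periphery_size=64, fovea=96):
    """frame_hi: (B, 3, H, W) full-resolution dash-cam frame
       gaze_xy:  (B, 2) predicted gaze, in pixel coordinates
    """
    B, _, H, W = frame_hi.shape
    # Peripheral stream: whole frame at low resolution.
    periphery = F.interpolate(frame_hi,
                              size=(periphery_size, periphery_size),
                              mode="bilinear", align_corners=False)
    # Foveal stream: full-resolution crops centred on the gaze point.
    half = fovea // 2
    crops = []
    for b in range(B):
        x = int(gaze_xy[b, 0].clamp(half, W - half))
        y = int(gaze_xy[b, 1].clamp(half, H - half))
        crops.append(frame_hi[b, :, y - half:y + half, x - half:x + half])
    return periphery, torch.stack(crops)

p, f = periphery_and_fovea(
    torch.randn(2, 3, 720, 1280),
    gaze_xy=torch.tensor([[640., 360.], [100., 500.]]))
print(p.shape, f.shape)  # (2, 3, 64, 64) (2, 3, 96, 96)
```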