Paper Group AWR 366
Robust sound event detection in bioacoustic sensor networks
Title | Robust sound event detection in bioacoustic sensor networks |
Authors | Vincent Lostanlen, Justin Salamon, Andrew Farnsworth, Steve Kelling, Juan Pablo Bello |
Abstract | Bioacoustic sensors, sometimes known as autonomous recording units (ARUs), can record sounds of wildlife over long periods of time in scalable and minimally invasive ways. Deriving per-species abundance estimates from these sensors requires detection, classification, and quantification of animal vocalizations as individual acoustic events. Yet, variability in ambient noise, both over time and across sensors, hinders the reliability of current automated systems for sound event detection (SED), such as convolutional neural networks (CNN) in the time-frequency domain. In this article, we develop, benchmark, and combine several machine listening techniques to improve the generalizability of SED models across heterogeneous acoustic environments. As a case study, we consider the problem of detecting avian flight calls from a ten-hour recording of nocturnal bird migration, recorded by a network of six ARUs in the presence of heterogeneous background noise. Starting from a CNN yielding state-of-the-art accuracy on this task, we introduce two noise adaptation techniques, respectively integrating short-term (60 milliseconds) and long-term (30 minutes) context. First, we apply per-channel energy normalization (PCEN) in the time-frequency domain, which applies short-term automatic gain control to every subband in the mel-frequency spectrogram. Secondly, we replace the last dense layer in the network by a context-adaptive neural network (CA-NN) layer. Combining them yields state-of-the-art results that are unmatched by artificial data augmentation alone. We release a pre-trained version of our best performing system under the name of BirdVoxDetect, a ready-to-use detector of avian flight calls in field recordings. |
Tasks | Data Augmentation, Sound Event Detection |
Published | 2019-05-20 |
URL | https://arxiv.org/abs/1905.08352v2 |
PDF | https://arxiv.org/pdf/1905.08352v2.pdf |
PWC | https://paperswithcode.com/paper/robust-sound-event-detection-in-bioacoustic |
Repo | https://github.com/BirdVox/birdvoxdetect |
Framework | tf |
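
The short-term adaptation described above, PCEN, has a closed-form definition (Wang et al., 2017): smooth the mel-spectrogram energy with a first-order IIR filter, divide each subband by the smoothed estimate raised to a gain exponent, then apply root compression. Below is a minimal NumPy sketch with illustrative parameter defaults; a maintained implementation is available as `librosa.pcen`.

```python
import numpy as np

def pcen(E, s=0.025, alpha=0.98, delta=2.0, r=0.5, eps=1e-6):
    """Per-channel energy normalization of a mel spectrogram E with
    shape (n_mels, n_frames). Parameter defaults are illustrative."""
    E = np.asarray(E, dtype=np.float64)
    M = np.empty_like(E)
    M[:, 0] = E[:, 0]
    for t in range(1, E.shape[1]):
        # First-order IIR low-pass filter: a smoothed estimate of each
        # subband's energy over the recent past, to be divided out.
        M[:, t] = (1.0 - s) * M[:, t - 1] + s * E[:, t]
    # Adaptive gain control followed by root compression.
    return (E / (eps + M) ** alpha + delta) ** r - delta ** r
```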
Learning Metrics from Teachers: Compact Networks for Image Embedding
Title | Learning Metrics from Teachers: Compact Networks for Image Embedding |
Authors | Lu Yu, Vacit Oguz Yazici, Xialei Liu, Joost van de Weijer, Yongmei Cheng, Arnau Ramisa |
Abstract | Metric learning networks are used to compute image embeddings, which are widely used in many applications such as image retrieval and face recognition. In this paper, we propose to use network distillation to efficiently compute image embeddings with small networks. Network distillation has been successfully applied to improve image classification, but has hardly been explored for metric learning. To this end, we propose two new loss functions that model the communication from a deep teacher network to a small student network. We evaluate our system on several datasets, including CUB-200-2011, Cars-196, and Stanford Online Products, and show that embeddings computed using small student networks perform significantly better than those computed using standard networks of similar size. Results on a very compact network (MobileNet-0.25), which can be used on mobile devices, show that the proposed method can greatly improve Recall@1 results, from 27.5% to 44.6%. Furthermore, we investigate various aspects of distillation for embeddings, including hint and attention layers, semi-supervised learning, and cross-quality distillation. (Code is available at https://github.com/yulu0724/EmbeddingDistillation.) |
Tasks | Face Recognition, Image Classification, Image Retrieval, Metric Learning |
Published | 2019-04-07 |
URL | http://arxiv.org/abs/1904.03624v1 |
PDF | http://arxiv.org/pdf/1904.03624v1.pdf |
PWC | https://paperswithcode.com/paper/learning-metrics-from-teachers-compact |
Repo | https://github.com/yulu0724/EmbeddingDistillation |
Framework | pytorch |
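
The two teacher-to-student losses are the core of the method. As a hedged sketch (the paper's exact formulations may differ), two natural instantiations in PyTorch are an absolute loss, which pulls each student embedding toward the teacher's embedding of the same image, and a relative loss, which only asks the student to reproduce the teacher's pairwise distance structure:

```python
import torch
import torch.nn.functional as F

def absolute_distillation_loss(student_emb, teacher_emb):
    # Match the teacher's embedding coordinates directly, per image.
    return F.mse_loss(student_emb, teacher_emb)

def relative_distillation_loss(student_emb, teacher_emb):
    # Match pairwise distances within the batch instead of raw
    # coordinates, which tolerates isometries of the embedding space.
    d_student = torch.cdist(student_emb, student_emb)
    d_teacher = torch.cdist(teacher_emb, teacher_emb)
    return F.mse_loss(d_student, d_teacher)
```

During training the teacher is frozen and only the student receives gradients, e.g. `loss = relative_distillation_loss(student(x), teacher(x).detach())`.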
SciBERT: A Pretrained Language Model for Scientific Text
Title | SciBERT: A Pretrained Language Model for Scientific Text |
Authors | Iz Beltagy, Kyle Lo, Arman Cohan |
Abstract | Obtaining large-scale annotated data for NLP tasks in the scientific domain is challenging and expensive. We release SciBERT, a pretrained language model based on BERT (Devlin et al., 2018) to address the lack of high-quality, large-scale labeled scientific data. SciBERT leverages unsupervised pretraining on a large multi-domain corpus of scientific publications to improve performance on downstream scientific NLP tasks. We evaluate on a suite of tasks including sequence tagging, sentence classification and dependency parsing, with datasets from a variety of scientific domains. We demonstrate statistically significant improvements over BERT and achieve new state-of-the-art results on several of these tasks. The code and pretrained models are available at https://github.com/allenai/scibert/. |
Tasks | Citation Intent Classification, Dependency Parsing, Language Modelling, Medical Named Entity Recognition, Named Entity Recognition, Participant Intervention Comparison Outcome Extraction, Relation Extraction, Sentence Classification |
Published | 2019-03-26 |
URL | https://arxiv.org/abs/1903.10676v3 |
PDF | https://arxiv.org/pdf/1903.10676v3.pdf |
PWC | https://paperswithcode.com/paper/scibert-pretrained-contextualized-embeddings |
Repo | https://github.com/allenai/scibert |
Framework | pytorch |
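
The released checkpoints are straightforward to load through the Hugging Face `transformers` library; `allenai/scibert_scivocab_uncased` is one of the model IDs the authors published:

```python
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("allenai/scibert_scivocab_uncased")
model = AutoModel.from_pretrained("allenai/scibert_scivocab_uncased")

# Encode a scientific sentence and inspect the contextual embeddings.
inputs = tokenizer("The tyrosine kinase inhibitor reduced tumor growth.",
                   return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, num_tokens, 768)
```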
Noisy Supervision for Correcting Misaligned Cadaster Maps Without Perfect Ground Truth Data
Title | Noisy Supervision for Correcting Misaligned Cadaster Maps Without Perfect Ground Truth Data |
Authors | Nicolas Girard, Guillaume Charpiat, Yuliya Tarabalka |
Abstract | In machine learning, the best performance on a given task is achieved by fully supervised methods when perfect ground-truth labels are available. However, labels are often noisy, especially in remote sensing, where manually curated public datasets are rare. We study the multi-modal cadaster map alignment problem, for which the available annotations are misaligned polygons, resulting in noisy supervision. We set up a multiple-rounds training scheme that corrects the ground-truth annotations at each round, in order to better train the model at the next round. We show that it is possible to reduce the noise of the dataset by iteratively training a better alignment model to correct the annotation alignment. |
Tasks | |
Published | 2019-03-12 |
URL | http://arxiv.org/abs/1903.06529v1 |
PDF | http://arxiv.org/pdf/1903.06529v1.pdf |
PWC | https://paperswithcode.com/paper/noisy-supervision-for-correcting-misaligned |
Repo | https://github.com/Lydorn/mapalignment |
Framework | tf |
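
The multiple-rounds scheme can be summarized in a few lines. This is a sketch only: `AlignmentModel`, `train`, and `align` are hypothetical placeholders standing in for the repo's actual training and inference code.

```python
def multi_round_alignment(images, noisy_polygons, n_rounds=3):
    annotations = noisy_polygons
    model = None
    for _ in range(n_rounds):
        # Train on the current (progressively less noisy) annotations.
        model = train(AlignmentModel(), images, annotations)
        # Snap each polygon onto the imagery with the trained model,
        # reducing annotation noise before the next round.
        annotations = [align(model, img, poly)
                       for img, poly in zip(images, annotations)]
    return model, annotations
```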
Learning to synthesise the ageing brain without longitudinal data
Title | Learning to synthesise the ageing brain without longitudinal data |
Authors | Tian Xia, Agisilaos Chartsias, Chengjia Wang, Sotirios A. Tsaftaris |
Abstract | Brain ageing is a continuous process that is affected by many factors including neurodegenerative diseases. Understanding this process is of great value for both neuroscience research and clinical applications. However, revealing underlying mechanisms is challenging due to the lack of longitudinal data. In this paper, we propose a deep learning-based method that learns to simulate subject-specific brain ageing trajectories without relying on longitudinal data. Our method synthesises aged images using a network conditioned on two clinical variables: age as a continuous variable, and health state, i.e. status of Alzheimer’s Disease (AD) for this work, as an ordinal variable. We adopt an adversarial loss to learn the joint distribution of brain appearance and clinical variables and define reconstruction losses that help preserve subject identity. To demonstrate our model, we compare with several approaches using two widely used datasets: Cam-CAN and ADNI. We use ground-truth longitudinal data from ADNI to evaluate the quality of synthesised images. A pre-trained age predictor, which estimates the apparent age of a brain image, is used to assess age accuracy. In addition, we show that we can train the model on Cam-CAN data and evaluate on the longitudinal data from ADNI, indicating the generalisation power of our approach. Both qualitative and quantitative results show that our method can progressively simulate the ageing process by synthesising realistic brain images. The code will be made publicly available at: https://github.com/xiat0616/BrainAgeing. |
Tasks | |
Published | 2019-12-04 |
URL | https://arxiv.org/abs/1912.02620v2 |
PDF | https://arxiv.org/pdf/1912.02620v2.pdf |
PWC | https://paperswithcode.com/paper/learning-to-synthesise-the-ageing-brain |
Repo | https://github.com/xiat0616/BrainAgeing |
Framework | none |
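
One simple way to condition a convolutional generator on the two clinical variables is to tile them into extra input channels. A minimal PyTorch sketch under that assumption; the paper's actual encoding (for instance an ordinal vector for age) may differ:

```python
import torch

def condition_on_clinical_variables(brain, target_age, has_ad, max_age=100.0):
    """brain: (B, 1, H, W) image batch; target_age: (B,) in years;
    has_ad: (B,) Alzheimer's status. Returns the generator input."""
    b, _, h, w = brain.shape
    age_map = (target_age.float() / max_age).view(b, 1, 1, 1).expand(b, 1, h, w)
    ad_map = has_ad.float().view(b, 1, 1, 1).expand(b, 1, h, w)
    # Concatenate the tiled clinical variables as additional channels.
    return torch.cat([brain, age_map, ad_map], dim=1)
```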
AdaCoF: Adaptive Collaboration of Flows for Video Frame Interpolation
Title | AdaCoF: Adaptive Collaboration of Flows for Video Frame Interpolation |
Authors | Hyeongmin Lee, Taeoh Kim, Tae-young Chung, Daehyun Pak, Yuseok Ban, Sangyoun Lee |
Abstract | Video frame interpolation is one of the most challenging tasks in video processing research. Recently, many deep learning-based methods have been proposed. Most of these methods focus on finding locations with useful information to estimate each output pixel using their own frame warping operations. However, many of them have Degrees of Freedom (DoF) limitations and fail to deal with the complex motions found in real-world videos. To solve this problem, we propose a new warping module named Adaptive Collaboration of Flows (AdaCoF). Our method estimates both kernel weights and offset vectors for each target pixel to synthesize the output frame. AdaCoF is among the most general warping modules, covering most other approaches as special cases, so it can handle a significantly wider domain of complex motions. To further improve our framework and synthesize more realistic outputs, we introduce a dual-frame adversarial loss that is applicable only to video frame interpolation tasks. The experimental results show that our method outperforms the state-of-the-art methods for both fixed training set environments and the Middlebury benchmark. |
Tasks | Video Frame Interpolation |
Published | 2019-07-24 |
URL | https://arxiv.org/abs/1907.10244v3 |
PDF | https://arxiv.org/pdf/1907.10244v3.pdf |
PWC | https://paperswithcode.com/paper/learning-spatial-transform-for-video-frame |
Repo | https://github.com/HyeongminLEE/AdaCoF-pytorch |
Framework | pytorch |
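
The AdaCoF operation synthesizes each output pixel as a weighted sum of F input samples, where both the kernel weights and the offset vectors are predicted per pixel. A naive NumPy reference of the warping (with nearest-neighbor rounding in place of the paper's differentiable bilinear sampling):

```python
import numpy as np

def adacof_warp(frame, weights, dy, dx):
    """frame: (H, W); weights, dy, dx: (F, H, W). Each output pixel is
    sum_k weights[k, i, j] * frame[i + dy[k, i, j], j + dx[k, i, j]]."""
    H, W = frame.shape
    out = np.zeros((H, W), dtype=np.float64)
    ys, xs = np.mgrid[0:H, 0:W]
    for k in range(weights.shape[0]):
        yy = np.clip(ys + np.rint(dy[k]).astype(int), 0, H - 1)
        xx = np.clip(xs + np.rint(dx[k]).astype(int), 0, W - 1)
        out += weights[k] * frame[yy, xx]
    return out
```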
NAIL: A General Interactive Fiction Agent
Title | NAIL: A General Interactive Fiction Agent |
Authors | Matthew Hausknecht, Ricky Loynd, Greg Yang, Adith Swaminathan, Jason D. Williams |
Abstract | Interactive Fiction (IF) games are complex textual decision making problems. This paper introduces NAIL, an autonomous agent for general parser-based IF games. NAIL won the 2018 Text Adventure AI Competition, where it was evaluated on twenty unseen games. This paper describes the architecture, development, and insights underpinning NAIL’s performance. |
Tasks | Decision Making |
Published | 2019-02-12 |
URL | http://arxiv.org/abs/1902.04259v2 |
PDF | http://arxiv.org/pdf/1902.04259v2.pdf |
PWC | https://paperswithcode.com/paper/nail-a-general-interactive-fiction-agent |
Repo | https://github.com/Microsoft/nail_agent |
Framework | none |
BERT with History Answer Embedding for Conversational Question Answering
Title | BERT with History Answer Embedding for Conversational Question Answering |
Authors | Chen Qu, Liu Yang, Minghui Qiu, W. Bruce Croft, Yongfeng Zhang, Mohit Iyyer |
Abstract | Conversational search is an emerging topic in the information retrieval community. One of the major challenges to multi-turn conversational search is to model the conversation history to answer the current question. Existing methods either prepend history turns to the current question or use complicated attention mechanisms to model the history. We propose a conceptually simple yet highly effective approach referred to as history answer embedding. It enables seamless integration of conversation history into a conversational question answering (ConvQA) model built on BERT (Bidirectional Encoder Representations from Transformers). We first explain our view that ConvQA is a simplified but concrete setting of conversational search, and then we provide a general framework to solve ConvQA. We further demonstrate the effectiveness of our approach under this framework. Finally, we analyze the impact of different numbers of history turns under different settings to provide new insights into conversation history modeling in ConvQA. |
Tasks | Information Retrieval, Question Answering |
Published | 2019-05-14 |
URL | https://arxiv.org/abs/1905.05412v2 |
PDF | https://arxiv.org/pdf/1905.05412v2.pdf |
PWC | https://paperswithcode.com/paper/bert-with-history-answer-embedding-for |
Repo | https://github.com/prdwb/bert_hae |
Framework | tf |
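
The history answer embedding itself is conceptually tiny: a two-entry lookup table marking whether a token occurred in a history answer, added to BERT's input embeddings. A minimal PyTorch sketch of that idea (not the authors' exact module):

```python
import torch
import torch.nn as nn

class HistoryAnswerEmbedding(nn.Module):
    def __init__(self, hidden_size=768):
        super().__init__()
        # Index 0: token not in any history answer; index 1: it is.
        self.hae = nn.Embedding(2, hidden_size)

    def forward(self, token_embeddings, hae_ids):
        # token_embeddings: (batch, seq_len, hidden)
        # hae_ids:          (batch, seq_len) with values in {0, 1}
        return token_embeddings + self.hae(hae_ids)
```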
Understanding and Utilizing Deep Neural Networks Trained with Noisy Labels
Title | Understanding and Utilizing Deep Neural Networks Trained with Noisy Labels |
Authors | Pengfei Chen, Benben Liao, Guangyong Chen, Shengyu Zhang |
Abstract | Noisy labels are ubiquitous in real-world datasets, which poses a challenge for robustly training deep neural networks (DNNs), as DNNs usually have enough capacity to memorize noisy labels. In this paper, we find that test accuracy can be quantitatively characterized in terms of a dataset's noise ratio. In particular, test accuracy is a quadratic function of the noise ratio in the case of symmetric noise, which explains previously published experimental findings. Based on our analysis, we apply cross-validation to randomly split noisy datasets, which identifies most samples that have correct labels. Then we adopt the Co-teaching strategy, which takes full advantage of the identified samples to train DNNs robustly against noisy labels. Compared with an extensive set of state-of-the-art methods, our strategy consistently improves the generalization performance of DNNs under both synthetic and real-world training noise. |
Tasks | |
Published | 2019-05-13 |
URL | https://arxiv.org/abs/1905.05040v1 |
PDF | https://arxiv.org/pdf/1905.05040v1.pdf |
PWC | https://paperswithcode.com/paper/understanding-and-utilizing-deep-neural |
Repo | https://github.com/chenpf1025/noisy_label_understanding_utilizing |
Framework | tf |
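
The cross-validation step for identifying correctly labeled samples is easy to sketch. Here `fit_predict(X_tr, y_tr, X_te)` is a hypothetical stand-in for training a DNN on one split and predicting labels on the other:

```python
import numpy as np

def identify_clean_samples(X, y_noisy, fit_predict, n_splits=2, seed=0):
    """Keep samples whose noisy label agrees with the prediction of a
    model trained on the other random split(s)."""
    rng = np.random.default_rng(seed)
    n = len(y_noisy)
    keep = np.zeros(n, dtype=bool)
    splits = np.array_split(rng.permutation(n), n_splits)
    for i, test_idx in enumerate(splits):
        train_idx = np.concatenate([s for j, s in enumerate(splits) if j != i])
        preds = fit_predict(X[train_idx], y_noisy[train_idx], X[test_idx])
        keep[test_idx] = preds == y_noisy[test_idx]
    return keep  # the identified samples then feed the Co-teaching stage
```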
Assistive Gym: A Physics Simulation Framework for Assistive Robotics
Title | Assistive Gym: A Physics Simulation Framework for Assistive Robotics |
Authors | Zackory Erickson, Vamsee Gangaram, Ariel Kapusta, C. Karen Liu, Charles C. Kemp |
Abstract | Autonomous robots have the potential to serve as versatile caregivers that improve quality of life for millions of people worldwide. Yet, conducting research in this area presents numerous challenges, including the risks of physical interaction between people and robots. Physics simulations have been used to optimize and train robots for physical assistance, but have typically focused on a single task. In this paper, we present Assistive Gym, an open source physics simulation framework for assistive robots that models multiple tasks. It includes six simulated environments in which a robotic manipulator can attempt to assist a person with activities of daily living (ADLs): itch scratching, drinking, feeding, body manipulation, dressing, and bathing. Assistive Gym models a person’s physical capabilities and preferences for assistance, which are used to provide a reward function. We present baseline policies trained using reinforcement learning for four different commercial robots in the six environments. We demonstrate that modeling human motion results in better assistance and we compare the performance of different robots. Overall, we show that Assistive Gym is a promising tool for assistive robotics research. |
Tasks | |
Published | 2019-10-10 |
URL | https://arxiv.org/abs/1910.04700v1 |
PDF | https://arxiv.org/pdf/1910.04700v1.pdf |
PWC | https://paperswithcode.com/paper/assistive-gym-a-physics-simulation-framework |
Repo | https://github.com/Healthcare-Robotics/assistive-gym |
Framework | none |
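
Interaction follows the standard Gym pattern. The environment ID below is an assumption based on the task/robot naming in the abstract; check the repo's README for the exact registered IDs:

```python
import gym
import assistive_gym  # registers the assistive environments on import

env = gym.make('FeedingJaco-v0')  # hypothetical task+robot ID
observation = env.reset()
done = False
while not done:
    action = env.action_space.sample()  # random policy as a placeholder
    observation, reward, done, info = env.step(action)
env.close()
```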
Adaptive Correlated Monte Carlo for Contextual Categorical Sequence Generation
Title | Adaptive Correlated Monte Carlo for Contextual Categorical Sequence Generation |
Authors | Xinjie Fan, Yizhe Zhang, Zhendong Wang, Mingyuan Zhou |
Abstract | Sequence generation models are commonly refined with reinforcement learning over user-defined metrics. However, high gradient variance hinders the practical use of this method. To stabilize this method, we adapt to contextual generation of categorical sequences a policy gradient estimator, which evaluates a set of correlated Monte Carlo (MC) rollouts for variance control. Due to the correlation, the number of unique rollouts is random and adaptive to model uncertainty; those rollouts naturally become baselines for each other, and hence are combined to effectively reduce gradient variance. We also demonstrate the use of correlated MC rollouts for binary-tree softmax models, which reduce the high generation cost in large vocabulary scenarios by decomposing each categorical action into a sequence of binary actions. We evaluate our methods on both neural program synthesis and image captioning. The proposed methods yield lower gradient variance and consistent improvement over related baselines. |
Tasks | Image Captioning, Program Synthesis |
Published | 2019-12-31 |
URL | https://arxiv.org/abs/1912.13151v1 |
PDF | https://arxiv.org/pdf/1912.13151v1.pdf |
PWC | https://paperswithcode.com/paper/adaptive-correlated-monte-carlo-for-1 |
Repo | https://github.com/xinjiefan/ACMC_ICLR |
Framework | pytorch |
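
The idea that rollouts "become baselines for each other" can be illustrated with a plain leave-one-out estimator. This sketch omits the paper's key ingredient, the correlated sampling that makes the number of unique rollouts adaptive, and just shows the variance-reduction mechanics for K independent rollouts per context:

```python
import torch

def leave_one_out_pg_loss(log_probs, rewards):
    """log_probs: (batch, K) sum of token log-probs per rollout;
    rewards: (batch, K) metric score per rollout."""
    K = rewards.shape[1]
    # Each rollout's baseline is the mean reward of the other K-1 rollouts.
    baseline = (rewards.sum(dim=1, keepdim=True) - rewards) / (K - 1)
    advantage = (rewards - baseline).detach()
    return -(log_probs * advantage).mean()
```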
ANDA: A Novel Data Augmentation Technique Applied to Salient Object Detection
Title | ANDA: A Novel Data Augmentation Technique Applied to Salient Object Detection |
Authors | Daniel V. Ruiz, Bruno A. Krinski, Eduardo Todt |
Abstract | In this paper, we propose a novel data augmentation technique (ANDA) applied to the Salient Object Detection (SOD) context. Standard data augmentation techniques proposed in the literature, such as image cropping, rotation, flipping, and resizing, only generate variations of the existing examples, providing limited generalization. Our method creates new images by combining an object with a new background while retaining part of its salience in the new context. To do so, the ANDA technique relies on a linear combination of labeled salient objects and new backgrounds, generated by removing the original salient object in a process known as image inpainting. Our proposed technique allows for more precise control of the object's position and size while preserving background information. To evaluate our proposed method, we trained multiple deep neural networks and compared the effect that our technique has on each one. We also compared our method with other data augmentation techniques. Our findings show that, depending on the network, the improvement can reach 14.1% in F-measure, with a reduction of up to 2.6% in Mean Absolute Error. |
Tasks | Data Augmentation, Image Augmentation, Image Cropping, Image Inpainting, Object Detection, Salient Object Detection |
Published | 2019-10-03 |
URL | https://arxiv.org/abs/1910.01256v1 |
PDF | https://arxiv.org/pdf/1910.01256v1.pdf |
PWC | https://paperswithcode.com/paper/anda-a-novel-data-augmentation-technique |
Repo | https://github.com/ruizvitor/ANDA |
Framework | none |
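
The composition step of ANDA amounts to alpha-blending a labeled salient object onto a background from which the original object was removed by inpainting. A minimal NumPy sketch:

```python
import numpy as np

def anda_compose(object_img, object_mask, inpainted_bg):
    """object_img, inpainted_bg: (H, W, 3) float images in [0, 1];
    object_mask: (H, W, 1) saliency mask of the pasted object."""
    new_image = object_mask * object_img + (1.0 - object_mask) * inpainted_bg
    new_label = object_mask  # the pasted object defines the new saliency map
    return new_image, new_label
```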
Text-based Depression Detection: What Triggers An Alert
Title | Text-based Depression Detection: What Triggers An Alert |
Authors | Heinrich Dinkel, Mengyue Wu, Kai Yu |
Abstract | Recent advances in automatic depression detection mostly derive from modality fusion and deep learning methods. However, multi-modal approaches introduce significant difficulty in the data collection phase, while the opaqueness of deep learning methods lowers their credibility. This work proposes a text-based multi-task BLSTM model with pretrained word embeddings. Our method outputs depression presence as well as a predicted severity score, culminating in a state-of-the-art F1 score of 0.87 and outperforming previous multi-modal studies. We also achieve the lowest RMSE among currently available text-based approaches. Further, by utilizing a per-time-step attention mechanism, we analyse the sentences/words that contribute most to predicting the depressed state. Surprisingly, "unmeaningful" words and paralinguistic information such as "um" and "uh" are the indicators to our model when making a depression prediction. This reveals for the first time that fillers in a conversation trigger a depression alert for a deep learning model. |
Tasks | Word Embeddings |
Published | 2019-04-08 |
URL | https://arxiv.org/abs/1904.05154v2 |
PDF | https://arxiv.org/pdf/1904.05154v2.pdf |
PWC | https://paperswithcode.com/paper/text-based-depression-detection-what-triggers |
Repo | https://github.com/richermans/text_based_depression |
Framework | pytorch |
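
The per-time-step attention that surfaces the trigger words can be sketched as a small PyTorch module; the layer sizes below are illustrative, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

class AttentiveBLSTM(nn.Module):
    def __init__(self, emb_dim=300, hidden=128):
        super().__init__()
        self.blstm = nn.LSTM(emb_dim, hidden, batch_first=True,
                             bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)
        self.presence = nn.Linear(2 * hidden, 2)  # depressed / not
        self.severity = nn.Linear(2 * hidden, 1)  # multi-task score head

    def forward(self, embeddings):              # (batch, T, emb_dim)
        h, _ = self.blstm(embeddings)           # (batch, T, 2*hidden)
        w = torch.softmax(self.attn(h), dim=1)  # weights reveal trigger words
        pooled = (w * h).sum(dim=1)
        return self.presence(pooled), self.severity(pooled), w
```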
rlpyt: A Research Code Base for Deep Reinforcement Learning in PyTorch
Title | rlpyt: A Research Code Base for Deep Reinforcement Learning in PyTorch |
Authors | Adam Stooke, Pieter Abbeel |
Abstract | Since the recent advent of deep reinforcement learning for game play and simulated robotic control, a multitude of new algorithms have flourished. Most are model-free algorithms which can be categorized into three families: deep Q-learning, policy gradients, and Q-value policy gradients. These have developed along separate lines of research, such that few, if any, code bases incorporate all three kinds. Yet these algorithms share a great depth of common deep reinforcement learning machinery. We are pleased to share rlpyt, which implements all three algorithm families on top of a shared, optimized infrastructure, in a single repository. It contains modular implementations of many common deep RL algorithms in Python using PyTorch, a leading deep learning library. rlpyt is designed as a high-throughput code base for small- to medium-scale research in deep RL. This white paper summarizes its features, algorithms implemented, and relation to prior work, and concludes with detailed implementation and usage notes. rlpyt is available at https://github.com/astooke/rlpyt. |
Tasks | Q-Learning |
Published | 2019-09-03 |
URL | https://arxiv.org/abs/1909.01500v2 |
PDF | https://arxiv.org/pdf/1909.01500v2.pdf |
PWC | https://paperswithcode.com/paper/rlpyt-a-research-code-base-for-deep |
Repo | https://github.com/astooke/rlpyt
Framework | pytorch |
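
A small end-to-end run, adapted from the repo's example scripts (example_1.py); import paths and argument names follow the repo at the time of the paper and may have moved since:

```python
from rlpyt.samplers.serial.sampler import SerialSampler
from rlpyt.envs.atari.atari_env import AtariEnv
from rlpyt.algos.dqn.dqn import DQN
from rlpyt.agents.dqn.atari.atari_dqn_agent import AtariDqnAgent
from rlpyt.runners.minibatch_rl import MinibatchRl
from rlpyt.utils.logging.context import logger_context

# Serial sampling (one env, one process) keeps the example lightweight.
sampler = SerialSampler(EnvCls=AtariEnv, env_kwargs=dict(game="pong"),
                        batch_T=4, batch_B=1)
runner = MinibatchRl(algo=DQN(min_steps_learn=1e3), agent=AtariDqnAgent(),
                     sampler=sampler, n_steps=50e3, log_interval_steps=1e3)
with logger_context("./data", run_ID=0, name="dqn_pong", log_params={}):
    runner.train()
```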
Periphery-Fovea Multi-Resolution Driving Model guided by Human Attention
Title | Periphery-Fovea Multi-Resolution Driving Model guided by Human Attention |
Authors | Ye Xia, Jinkyu Kim, John Canny, Karl Zipser, David Whitney |
Abstract | Inspired by human vision, we propose a new periphery-fovea multi-resolution driving model that predicts vehicle speed from dash camera videos. The peripheral vision module of the model processes the full video frames in low resolution. Its foveal vision module selects sub-regions and uses high-resolution input from those regions to improve its driving performance. We train the fovea selection module with supervision from driver gaze. We show that adding high-resolution input from predicted human driver gaze locations significantly improves the driving accuracy of the model. Our periphery-fovea multi-resolution model outperforms a uni-resolution periphery-only model that has the same amount of floating-point operations. More importantly, we demonstrate that our driving model achieves a significantly higher performance gain in pedestrian-involved critical situations than in other non-critical situations. |
Tasks | |
Published | 2019-03-24 |
URL | http://arxiv.org/abs/1903.09950v1 |
PDF | http://arxiv.org/pdf/1903.09950v1.pdf |
PWC | https://paperswithcode.com/paper/periphery-fovea-multi-resolution-driving |
Repo | https://github.com/pascalxia/periphery_fovea_driving |
Framework | tf |
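
The two-stream design can be illustrated with a toy module: a periphery stream over a downsampled full frame and a fovea stream over high-resolution crops at predicted gaze locations, fused for speed prediction. Backbones and sizes are placeholders, not the paper's architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PeripheryFoveaSketch(nn.Module):
    def __init__(self, feat_dim=64):
        super().__init__()
        self.periphery = nn.Conv2d(3, feat_dim, 3, stride=2, padding=1)
        self.fovea = nn.Conv2d(3, feat_dim, 3, stride=2, padding=1)
        self.speed = nn.Linear(2 * feat_dim, 1)

    def forward(self, frame, fovea_crop):
        low = F.interpolate(frame, scale_factor=0.25)  # cheap full view
        p = self.periphery(low).mean(dim=(2, 3))       # global pooling
        f = self.fovea(fovea_crop).mean(dim=(2, 3))    # high-res detail
        return self.speed(torch.cat([p, f], dim=1))    # predicted speed
```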