April 2, 2020


Paper Group ANR 363

Calibrating Deep Neural Networks using Focal Loss. Understanding Image Captioning Models beyond Visualizing Attention. Minimax optimal approaches to the label shift problem. Enhanced Adversarial Strategically-Timed Attacks against Deep Reinforcement Learning. Incremental Learning Algorithm for Sound Event Detection. Compact recurrent neural network …

Calibrating Deep Neural Networks using Focal Loss


Title	Calibrating Deep Neural Networks using Focal Loss
Authors	Jishnu Mukhoti, Viveka Kulharia, Amartya Sanyal, Stuart Golodetz, Philip H. S. Torr, Puneet K. Dokania
Abstract	Miscalibration – a mismatch between a model’s confidence and its correctness – of Deep Neural Networks (DNNs) makes their predictions hard to rely on. Ideally, we want networks to be accurate, calibrated and confident. We show that, as opposed to the standard cross-entropy loss, focal loss (Lin et al., 2017) allows us to learn models that are already very well calibrated. When combined with temperature scaling, whilst preserving accuracy, it yields state-of-the-art calibrated models. We provide a thorough analysis of the factors causing miscalibration, and use the insights we glean from this to justify the empirically excellent performance of focal loss. To facilitate the use of focal loss in practice, we also provide a principled approach to automatically select the hyperparameter involved in the loss function. We perform extensive experiments on a variety of computer vision and NLP datasets, and with a wide variety of network architectures, and show that our approach achieves state-of-the-art accuracy and calibration in almost all cases.
Tasks	Calibration
Published	2020-02-21
URL	https://arxiv.org/abs/2002.09437v1
PDF	https://arxiv.org/pdf/2002.09437v1.pdf
PWC	https://paperswithcode.com/paper/calibrating-deep-neural-networks-using-focal
Repo
Framework
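
For reference, the focal loss of Lin et al. (2017) cited in the abstract above multiplies the cross-entropy term by (1 - p)^gamma, where p is the probability assigned to the correct class. The sketch below is illustrative only and is not the authors' code: the function names and the value gamma = 2.0 are assumptions, and the paper's automatic hyperparameter selection is not reproduced.

```python
import math


def cross_entropy(p_true):
    """Cross-entropy loss for one example, given the probability of the true class."""
    return -math.log(p_true)


def focal_loss(p_true, gamma=2.0):
    """Focal loss for one example: -(1 - p)^gamma * log(p).

    gamma = 0 recovers cross-entropy; larger gamma down-weights
    examples the model already classifies confidently.
    """
    return -((1.0 - p_true) ** gamma) * math.log(p_true)


# Compare the two losses on a confident and an uncertain prediction.
for p in (0.9, 0.6):
    print(p, round(cross_entropy(p), 4), round(focal_loss(p), 4))
```

Because the modulating factor shrinks toward zero as p approaches 1, confidently correct examples contribute little gradient, which is the property the abstract links to better calibration.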

Understanding Image Captioning Models beyond Visualizing Attention


Title	Understanding Image Captioning Models beyond Visualizing Attention
Authors	Jiamei Sun, Sebastian Lapuschkin, Wojciech Samek, Alexander Binder
Abstract	This paper explains predictions of image captioning models with attention mechanisms beyond visualizing the attention itself. In this paper, we develop variants of layer-wise relevance backpropagation (LRP) and gradient backpropagation, tailored to image captioning with attention. The result provides simultaneously pixel-wise image explanation and linguistic explanation for each word in the captions. We show that given a word in the caption to be explained, explanation methods such as LRP reveal supporting and opposing pixels as well as words. We compare the properties of attention heatmaps systematically against those computed with explanation methods such as LRP, Grad-CAM and Guided Grad-CAM. We show that explanation methods, firstly, correlate to object locations with higher precision than attention, secondly, are able to identify object words that are unsupported by image content, and thirdly, provide guidance to debias and improve the model. Results are reported for image captioning using two different attention models trained with Flickr30K and MSCOCO2017 datasets. Experimental analyses show the strength of explanation methods for understanding image captioning attention models.
Tasks	Image Captioning
Published	2020-01-04
URL	https://arxiv.org/abs/2001.01037v2
PDF	https://arxiv.org/pdf/2001.01037v2.pdf
PWC	https://paperswithcode.com/paper/understanding-image-captioning-models-beyond
Repo
Framework

Minimax optimal approaches to the label shift problem


Title	Minimax optimal approaches to the label shift problem
Authors	Subha Maity, Yuekai Sun, Moulinath Banerjee
Abstract	We study minimax rates of convergence in the label shift problem. In addition to the usual setting in which the learner only has access to unlabeled examples from the target domain, we also consider the setting in which a small number of labeled examples from the target domain are available to the learner. Our study reveals a difference in the difficulty of the label shift problem in the two settings. We attribute this difference to the availability of data from the target domain to estimate the class conditional distributions in the latter setting. We also show that a distributional matching approach is minimax rate-optimal in the former setting.
Tasks
Published	2020-03-23
URL	https://arxiv.org/abs/2003.10443v1
PDF	https://arxiv.org/pdf/2003.10443v1.pdf
PWC	https://paperswithcode.com/paper/minimax-optimal-approaches-to-the-label-shift
Repo
Framework

Enhanced Adversarial Strategically-Timed Attacks against Deep Reinforcement Learning


Title	Enhanced Adversarial Strategically-Timed Attacks against Deep Reinforcement Learning
Authors	Chao-Han Huck Yang, Jun Qi, Pin-Yu Chen, Yi Ouyang, I-Te Danny Hung, Chin-Hui Lee, Xiaoli Ma
Abstract	Recent deep neural network-based techniques, especially those equipped with the ability of self-adaptation at the system level such as deep reinforcement learning (DRL), are shown to possess many advantages in optimizing robot learning systems (e.g., autonomous navigation and continuous robot arm control). However, the learning-based systems and the associated models may be threatened by the risks of intentionally adaptive (e.g., noisy sensor confusion) and adversarial perturbations from real-world scenarios. In this paper, we introduce timing-based adversarial strategies against a DRL-based navigation system by jamming in physical noise patterns on the selected time frames. To study the vulnerability of learning-based navigation systems, we propose two adversarial agent models: one based on online learning and another based on evolutionary learning. In addition, three open-source robot learning and navigation control environments are employed to study the vulnerability under adversarial timing attacks. Our experimental results show that the adversarial timing attacks can lead to a significant performance drop, and also suggest the necessity of enhancing the robustness of robot learning systems.
Tasks	Autonomous Navigation
Published	2020-02-20
URL	https://arxiv.org/abs/2002.09027v1
PDF	https://arxiv.org/pdf/2002.09027v1.pdf
PWC	https://paperswithcode.com/paper/enhanced-adversarial-strategically-timed
Repo
Framework

Incremental Learning Algorithm for Sound Event Detection


Title	Incremental Learning Algorithm for Sound Event Detection
Authors	Eunjeong Koh, Fatemeh Saki, Yinyi Guo, Cheng-Yu Hung, Erik Visser
Abstract	This paper presents a new learning strategy for the Sound Event Detection (SED) system to tackle the issues of i) knowledge migration from a pre-trained model to a new target model and ii) learning new sound events without forgetting the previously learned ones and without re-training from scratch. In order to migrate the previously learned knowledge from the source model to the target one, a neural adapter is employed on top of the source model. The source model and the target model are merged via this neural adapter layer. The neural adapter layer helps the target model learn new sound events with minimal training data while maintaining performance on the previously learned sound events similar to that of the source model. Our extensive analysis on the DCASE16 and US-SED datasets reveals the effectiveness of the proposed method in transferring knowledge between source and target models without introducing any performance degradation on the previously learned sound events while obtaining a competitive detection performance on the newly learned sound events.
Tasks	Sound Event Detection
Published	2020-03-26
URL	https://arxiv.org/abs/2003.12175v1
PDF	https://arxiv.org/pdf/2003.12175v1.pdf
PWC	https://paperswithcode.com/paper/incremental-learning-algorithm-for-sound
Repo
Framework

Compact recurrent neural networks for acoustic event detection on low-energy low-complexity platforms


Title	Compact recurrent neural networks for acoustic event detection on low-energy low-complexity platforms
Authors	Gianmarco Cerutti, Rahul Prasad, Alessio Brutti, Elisabetta Farella
Abstract	Outdoor acoustic event detection is an exciting research field, but it is challenged by the need for complex algorithms and deep learning techniques, typically requiring many computational, memory, and energy resources. This challenge discourages IoT implementation, where an efficient use of resources is required. However, current embedded technologies and microcontrollers have increased their capabilities without penalizing energy efficiency. This paper addresses the application of sound event detection at the edge, by optimizing deep learning techniques on resource-constrained embedded platforms for the IoT. The contribution is two-fold: firstly, a two-stage student-teacher approach is presented to make state-of-the-art neural networks for sound event detection fit on current microcontrollers; secondly, we test our approach on an ARM Cortex M4, particularly focusing on issues related to 8-bit quantization. Our embedded implementation can achieve 68% accuracy in recognition on Urbansound8k, not far from state-of-the-art performance, with an inference time of 125 ms for each second of the audio stream, and power consumption of 5.5 mW in just 34.3 kB of RAM.
Tasks	Quantization, Sound Event Detection
Published	2020-01-29
URL	https://arxiv.org/abs/2001.10876v1
PDF	https://arxiv.org/pdf/2001.10876v1.pdf
PWC	https://paperswithcode.com/paper/compact-recurrent-neural-networks-for
Repo
Framework
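
To make the 8-bit quantization mentioned in the abstract above concrete, here is a minimal sketch of generic symmetric 8-bit quantization of a weight list. It is not the scheme used in the paper, which the abstract does not specify; the function names and the single per-tensor scale are assumptions chosen for illustration.

```python
def quantize_int8(values):
    """Map floats to integer codes in [-127, 127] with one shared scale."""
    max_abs = max((abs(v) for v in values), default=0.0)
    # Fall back to scale 1.0 for an empty or all-zero input to avoid dividing by zero.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float values from the integer codes."""
    return [c * scale for c in codes]


weights = [-1.0, 0.0, 0.25, 1.0]  # hypothetical weights
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
print(codes, scale, restored)
```

Each restored value differs from the original by at most half a quantization step (scale / 2), which is the rounding error that low-precision deployment has to tolerate.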


Capsule Network Performance with Autonomous Navigation


Title	Capsule Network Performance with Autonomous Navigation
Authors	Thomas Molnar, Eugenio Culurciello
Abstract	Capsule Networks (CapsNets) have been proposed as an alternative to Convolutional Neural Networks (CNNs). This paper showcases how CapsNets are more capable than CNNs for autonomous agent exploration of realistic scenarios. In real world navigation, rewards external to agents may be rare. In turn, reinforcement learning algorithms can struggle to form meaningful policy functions. This paper’s approach, the Capsules Exploration Module (Caps-EM), pairs a CapsNets architecture with an Advantage Actor Critic algorithm. Other approaches for navigating sparse environments require intrinsic reward generators, such as the Intrinsic Curiosity Module (ICM) and Augmented Curiosity Modules (ACM). Caps-EM uses a more compact architecture without the need for intrinsic rewards. Tested using ViZDoom, the Caps-EM uses 44% and 83% fewer trainable network parameters than the ICM and Depth-Augmented Curiosity Module (D-ACM), respectively, for 1141% and 437% average time improvement over the ICM and D-ACM, respectively, for converging to a policy function across “My Way Home” scenarios.
Tasks	Autonomous Navigation
Published	2020-02-08
URL	https://arxiv.org/abs/2002.03181v1
PDF	https://arxiv.org/pdf/2002.03181v1.pdf
PWC	https://paperswithcode.com/paper/capsule-network-performance-with-autonomous
Repo
Framework

Uncovering Coresets for Classification With Multi-Objective Evolutionary Algorithms


Title	Uncovering Coresets for Classification With Multi-Objective Evolutionary Algorithms
Authors	Pietro Barbiero, Giovanni Squillero, Alberto Tonda
Abstract	A coreset is a subset of the training set with which a machine learning algorithm obtains performance similar to what it would deliver if trained over the whole original data. Coreset discovery is an active and open line of research, as it allows improving training speed for the algorithms and may help humans understand the results. Building on previous works, a novel approach is presented: candidate coresets are iteratively optimized, adding and removing samples. As there is an obvious trade-off between limiting training size and quality of the results, a multi-objective evolutionary algorithm is used to minimize simultaneously the number of points in the set and the classification error. Experimental results on non-trivial benchmarks show that the proposed approach is able to deliver results that allow a classifier to obtain lower error and better ability of generalizing on unseen data than state-of-the-art coreset discovery techniques.
Tasks
Published	2020-02-20
URL	https://arxiv.org/abs/2002.08645v1
PDF	https://arxiv.org/pdf/2002.08645v1.pdf
PWC	https://paperswithcode.com/paper/uncovering-coresets-for-classification-with
Repo
Framework
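
The trade-off described in the abstract above, minimizing coreset size and classification error at the same time, can be pictured with a Pareto-dominance filter. The sketch below shows only that selection step, not the authors' evolutionary algorithm; the function names and the candidate (size, error) pairs are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better on one.

    Both objectives are minimized: (number of points, classification error).
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(candidates):
    """Keep the candidates that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


# Hypothetical candidate coresets as (size, error) pairs.
candidates = [(10, 0.30), (20, 0.20), (20, 0.25), (50, 0.20), (5, 0.40)]
print(pareto_front(candidates))
```

A multi-objective optimizer returns a set of such non-dominated solutions rather than a single best one, leaving the choice between a smaller set and a lower error to the user.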

Mass Estimation of Galaxy Clusters with Deep Learning I: Sunyaev-Zel’dovich Effect


Title	Mass Estimation of Galaxy Clusters with Deep Learning I: Sunyaev-Zel’dovich Effect
Authors	Nikhel Gupta, Christian L. Reichardt
Abstract	We present a new application of deep learning to infer the masses of galaxy clusters directly from images of the microwave sky. Effectively, this is a novel approach to determining the scaling relation between a cluster’s Sunyaev-Zel’dovich (SZ) effect signal and mass. The deep learning algorithm used is mResUNet, which is a modified feed-forward deep learning algorithm that broadly combines residual learning, convolution layers with different dilation rates, image regression activation and a U-Net framework. We train and test the deep learning model using simulated images of the microwave sky that include signals from the cosmic microwave background (CMB), dusty and radio galaxies, instrumental noise as well as the cluster’s own SZ signal. The simulated cluster sample covers the mass range $1\times 10^{14}~\mathrm{M}_{\odot} < M_{200\mathrm{c}} < 8\times 10^{14}~\mathrm{M}_{\odot}$ at $z=0.7$. The trained model estimates the cluster masses with a $1\sigma$ uncertainty $\Delta M/M \leq 0.2$, consistent with the input scatter on the SZ signal of 20%. We verify that the model works for realistic SZ profiles even when trained on azimuthally symmetric SZ profiles by using the Magneticum hydrodynamical simulations. We find the model returns unbiased mass estimates for the hydrodynamical simulations with a scatter consistent with the SZ-mass scatter in the light cones.
Tasks
Published	2020-03-13
URL	https://arxiv.org/abs/2003.06135v1
PDF	https://arxiv.org/pdf/2003.06135v1.pdf
PWC	https://paperswithcode.com/paper/mass-estimation-of-galaxy-clusters-with-deep
Repo
Framework


A water-obstacle separation and refinement network for unmanned surface vehicles


Title	A water-obstacle separation and refinement network for unmanned surface vehicles
Authors	Borja Bovcon, Matej Kristan
Abstract	Obstacle detection by semantic segmentation shows great promise for autonomous navigation in unmanned surface vehicles (USV). However, existing methods suffer from poor estimation of the water edge in the presence of visual ambiguities, poor detection of small obstacles and a high false-positive rate on water reflections and wakes. We propose a new deep encoder-decoder architecture, a water-obstacle separation and refinement network (WaSR), to address these issues. Detection and water edge accuracy are improved by a novel decoder that gradually fuses inertial information from IMU with the visual features from the encoder. In addition, a novel loss function is designed to increase the separation between water and obstacle features early on in the network. Subsequently, the capacity of the remaining layers in the decoder is better utilised, leading to a significant reduction in false positives and increased true positives. Experimental results show that WaSR outperforms the current state-of-the-art by a large margin, yielding a 14% increase in F-measure over the second-best method.
Tasks	Autonomous Navigation, Semantic Segmentation
Published	2020-01-07
URL	https://arxiv.org/abs/2001.01921v1
PDF	https://arxiv.org/pdf/2001.01921v1.pdf
PWC	https://paperswithcode.com/paper/a-water-obstacle-separation-and-refinement
Repo
Framework

Resource-Efficient Neural Networks for Embedded Systems


Title	Resource-Efficient Neural Networks for Embedded Systems
Authors	Wolfgang Roth, Günther Schindler, Matthias Zöhrer, Lukas Pfeifenberger, Robert Peharz, Sebastian Tschiatschek, Holger Fröning, Franz Pernkopf, Zoubin Ghahramani
Abstract	While machine learning is traditionally a resource-intensive task, embedded systems, autonomous navigation, and the vision of the Internet of Things fuel the interest in resource-efficient approaches. These approaches aim for a carefully chosen trade-off between performance and resource consumption in terms of computation and energy. The development of such approaches is among the major challenges in current machine learning research and key to ensuring a smooth transition of machine learning technology from a scientific environment with virtually unlimited computing resources into everyday applications. In this article, we provide an overview of the current state of the art of machine learning techniques facilitating these real-world requirements. In particular, we focus on deep neural networks (DNNs), the predominant machine learning models of the past decade. We give a comprehensive overview of the vast literature that can be mainly split into three non-mutually exclusive categories: (i) quantized neural networks, (ii) network pruning, and (iii) structural efficiency. These techniques can be applied during training or as post-processing, and they are widely used to reduce the computational demands in terms of memory footprint, inference speed, and energy efficiency. We substantiate our discussion with experiments on well-known benchmark data sets to showcase the difficulty of finding good trade-offs between resource-efficiency and predictive performance.
Tasks	Autonomous Navigation, Network Pruning
Published	2020-01-07
URL	https://arxiv.org/abs/2001.03048v1
PDF	https://arxiv.org/pdf/2001.03048v1.pdf
PWC	https://paperswithcode.com/paper/resource-efficient-neural-networks-for
Repo
Framework

Coronary Artery Segmentation in Angiographic Videos Using A 3D-2D CE-Net


Title	Coronary Artery Segmentation in Angiographic Videos Using A 3D-2D CE-Net
Authors	Lu Wang, Dong-xue Liang, Xiao-lei Yin, Jing Qiu, Zhi-yun Yang, Jun-hui Xing, Jian-zeng Dong, Zhao-yuan Ma
Abstract	Coronary angiography is an indispensable assistive technique for cardiac interventional surgery. Segmentation and extraction of blood vessels from coronary angiography videos are essential prerequisites for physicians to locate, assess and diagnose the plaques and stenosis in blood vessels. This article proposes a new video segmentation framework that can extract the clearest and most comprehensive coronary angiography images from a video sequence, thereby helping physicians to better observe the condition of blood vessels. This framework combines a 3D convolutional layer to extract spatial–temporal information from a video sequence and a 2D CE–Net to accomplish the segmentation task of an image sequence. The input is a few continuous frames of angiographic video, and the output is a segmentation mask. The results show that the framework achieves good segmentation despite the poor quality of coronary angiography video sequences.
Tasks	Video Semantic Segmentation
Published	2020-03-26
URL	https://arxiv.org/abs/2003.11851v1
PDF	https://arxiv.org/pdf/2003.11851v1.pdf
PWC	https://paperswithcode.com/paper/coronary-artery-segmentation-in-angiographic
Repo
Framework

Improving predictions by nonlinear regression models from outlying input data


Title	Improving predictions by nonlinear regression models from outlying input data
Authors	William W. Hsieh
Abstract	When applying machine learning/statistical methods to the environmental sciences, nonlinear regression (NLR) models often perform only slightly better and occasionally worse than linear regression (LR). The proposed reason for this conundrum is that NLR models can give predictions much worse than LR when given input data which lie outside the domain used in model training. Continuous unbounded variables are widely used in environmental sciences, so it is not uncommon for new input data to lie far outside the training domain. For six environmental datasets, inputs in the test data were classified as “outliers” and “non-outliers” based on the Mahalanobis distance from the training input data. The prediction scores (mean absolute error, Spearman correlation) showed NLR to outperform LR for the non-outliers, but often underperform LR for the outliers. An approach based on Occam’s Razor (OR) was proposed, where linear extrapolation was used instead of nonlinear extrapolation for the outliers. The linear extrapolation to the outlier domain was based on the NLR model within the non-outlier domain. This NLR$_{\mathrm{OR}}$ approach reduced occurrences of very poor extrapolation by NLR, and it tended to outperform NLR and LR for the outliers. In conclusion, input test data should be screened for outliers. For outliers, the unreliable NLR predictions can be replaced by NLR$_{\mathrm{OR}}$ or LR predictions, or by issuing a “no reliable prediction” warning.
Tasks
Published	2020-03-17
URL	https://arxiv.org/abs/2003.07926v1
PDF	https://arxiv.org/pdf/2003.07926v1.pdf
PWC	https://paperswithcode.com/paper/improving-predictions-by-nonlinear-regression
Repo
Framework
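
The outlier screening described in the abstract above relies on the Mahalanobis distance of a test input from the training inputs. The following is a minimal two-feature sketch of that screening idea, not the author's implementation: the function names, the toy training points, and the threshold of 3.0 are all assumptions, since the abstract does not state the cutoff used.

```python
import math


def fit_mean_cov_2d(points):
    """Sample mean and covariance (sxx, sxy, syy) of 2-D training inputs."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis_2d(point, mean, cov):
    """Mahalanobis distance using the closed-form inverse of a 2x2 covariance."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    dx = point[0] - mean[0]
    dy = point[1] - mean[1]
    d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(max(d2, 0.0))


def is_outlier(point, mean, cov, threshold=3.0):
    """Flag inputs far from the training data (threshold is an assumed value)."""
    return mahalanobis_2d(point, mean, cov) > threshold


train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]  # toy training inputs
mean, cov = fit_mean_cov_2d(train)
print(is_outlier((0.6, 0.4), mean, cov), is_outlier((10.0, 10.0), mean, cov))
```

Inputs flagged this way would be routed to the linear extrapolation or LR fallback that the abstract recommends, while the rest keep the nonlinear prediction.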

Unsupervised Temporal Video Segmentation as an Auxiliary Task for Predicting the Remaining Surgery Duration


Title	Unsupervised Temporal Video Segmentation as an Auxiliary Task for Predicting the Remaining Surgery Duration
Authors	Dominik Rivoir, Sebastian Bodenstedt, Felix von Bechtolsheim, Marius Distler, Jürgen Weitz, Stefanie Speidel
Abstract	Estimating the remaining surgery duration (RSD) during surgical procedures can be useful for OR planning and anesthesia dose estimation. With the recent success of deep learning-based methods in computer vision, several neural network approaches have been proposed for fully automatic RSD prediction based solely on visual data from the endoscopic camera. We investigate whether RSD prediction can be improved using unsupervised temporal video segmentation as an auxiliary learning task. As opposed to previous work, which presented supervised surgical phase recognition as an auxiliary task, we avoid the need for manual annotations by proposing a similar but unsupervised learning objective which clusters video sequences into temporally coherent segments. In multiple experimental setups, results obtained by learning the auxiliary task are incorporated into a deep RSD model through feature extraction, pretraining or regularization. Further, we propose a novel loss function for RSD training which attempts to counteract unfavorable characteristics of the RSD ground truth. Using our unsupervised method as an auxiliary task for RSD training, we outperform other self-supervised methods and are comparable to the supervised state-of-the-art. Combined with the novel RSD loss, we slightly outperform the supervised approach.
Tasks	Auxiliary Learning, Video Semantic Segmentation
Published	2020-02-26
URL	https://arxiv.org/abs/2002.11367v1
PDF	https://arxiv.org/pdf/2002.11367v1.pdf
PWC	https://paperswithcode.com/paper/unsupervised-temporal-video-segmentation-as
Repo
Framework

Wide Neural Networks with Bottlenecks are Deep Gaussian Processes


Title	Wide Neural Networks with Bottlenecks are Deep Gaussian Processes
Authors	Devanshu Agrawal, Theodore Papamarkou, Jacob Hinkle
Abstract	There has recently been much work on the “wide limit” of neural networks, where Bayesian neural networks (BNNs) are shown to converge to a Gaussian process (GP) as all hidden layers are sent to infinite width. However, these results do not apply to architectures that require one or more of the hidden layers to remain narrow. In this paper, we consider the wide limit of BNNs where some hidden layers, called “bottlenecks”, are held at finite width. The result is a composition of GPs that we term a “bottleneck neural network Gaussian process” (bottleneck NNGP). Although intuitive, the subtlety of the proof is in showing that the wide limit of a composition of networks is in fact the composition of the limiting GPs. We also analyze theoretically a single-bottleneck NNGP, finding that the bottleneck induces dependence between the outputs of a multi-output network that persists through extreme post-bottleneck depths, and prevents the kernel of the network from losing discriminative power at extreme post-bottleneck depths.
Tasks	Gaussian Processes
Published	2020-01-03
URL	https://arxiv.org/abs/2001.00921v2
PDF	https://arxiv.org/pdf/2001.00921v2.pdf
PWC	https://paperswithcode.com/paper/wide-neural-networks-with-bottlenecks-are
Repo
Framework