April 3, 2020

Paper Group ANR 44

Deep Learning Approach to Diabetic Retinopathy Detection. Conditional Channel Gated Networks for Task-Aware Continual Learning. An Empirical Study of Person Re-Identification with Attributes. Safety-Aware Hardening of 3D Object Detection Neural Network Systems. Convergence of a Stochastic Gradient Method with Momentum for Nonsmooth Nonconvex Optimi …

Deep Learning Approach to Diabetic Retinopathy Detection


Title	Deep Learning Approach to Diabetic Retinopathy Detection
Authors	Borys Tymchenko, Philip Marchenko, Dmitry Spodarets
Abstract	Diabetic retinopathy is one of the most threatening complications of diabetes that leads to permanent blindness if left untreated. One of the essential challenges is early detection, which is very important for treatment success. Unfortunately, the exact identification of the diabetic retinopathy stage is notoriously tricky and requires expert human interpretation of fundus images. Simplification of the detection step is crucial and can help millions of people. Convolutional neural networks (CNN) have been successfully applied in many adjacent subjects, and for diagnosis of diabetic retinopathy itself. However, the high cost of big labeled datasets, as well as inconsistency between different doctors, impede the performance of these methods. In this paper, we propose an automatic deep-learning-based method for stage detection of diabetic retinopathy from a single photograph of the human fundus. Additionally, we propose a multistage approach to transfer learning, which makes use of similar datasets with different labeling. The presented method can be used as a screening method for early detection of diabetic retinopathy with sensitivity and specificity of 0.99 and is ranked 54 of 2943 competing methods (quadratic weighted kappa score of 0.925466) on APTOS 2019 Blindness Detection Dataset (13000 images).
Tasks	Diabetic Retinopathy Detection, Transfer Learning
Published	2020-03-03
URL	https://arxiv.org/abs/2003.02261v1
PDF	https://arxiv.org/pdf/2003.02261v1.pdf
PWC	https://paperswithcode.com/paper/deep-learning-approach-to-diabetic
Repo
Framework
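
The abstract above reports its ranking in terms of a quadratic weighted kappa score. As a rough illustration of how that metric is commonly computed for ordinal labels, here is a minimal pure-Python sketch. The function name, the `num_classes` parameter, and the toy labels are assumptions made for this example and are not taken from the paper or the competition code.

```python
def quadratic_weighted_kappa(y_true, y_pred, num_classes):
    """Quadratic weighted kappa for integer labels in range(num_classes)."""
    n = len(y_true)

    # Observed confusion matrix: rows are true labels, columns are predictions.
    observed = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1

    # Marginal histograms of true and predicted labels.
    hist_true = [sum(row) for row in observed]
    hist_pred = [sum(observed[i][j] for i in range(num_classes))
                 for j in range(num_classes)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(num_classes):
        for j in range(num_classes):
            # Disagreement penalty grows with the squared label distance.
            weight = (i - j) ** 2 / (num_classes - 1) ** 2
            expected = hist_true[i] * hist_pred[j] / n
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected

    if weighted_expected == 0.0:
        # Degenerate case: every label falls in a single class.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected


# Toy usage with made-up labels.
print(quadratic_weighted_kappa([0, 1, 2, 3, 4], [0, 1, 2, 3, 3], num_classes=5))
```

A score of 1 means perfect agreement and a score of 0 means agreement no better than chance, so a prediction that is off by several grades costs more than one that is off by a single grade.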

Conditional Channel Gated Networks for Task-Aware Continual Learning


Title	Conditional Channel Gated Networks for Task-Aware Continual Learning
Authors	Davide Abati, Jakub Tomczak, Tijmen Blankevoort, Simone Calderara, Rita Cucchiara, Babak Ehteshami Bejnordi
Abstract	Convolutional Neural Networks experience catastrophic forgetting when optimized on a sequence of learning problems: as they meet the objective of the current training examples, their performance on previous tasks drops drastically. In this work, we introduce a novel framework to tackle this problem with conditional computation. We equip each convolutional layer with task-specific gating modules, selecting which filters to apply on the given input. This way, we achieve two appealing properties. Firstly, the execution patterns of the gates make it possible to identify and protect important filters, ensuring no loss in the performance of the model for previously learned tasks. Secondly, by using a sparsity objective, we can promote the selection of a limited set of kernels, which retains sufficient model capacity to digest new tasks. Existing solutions require, at test time, awareness of the task to which each example belongs. This knowledge, however, may not be available in many practical scenarios. Therefore, we additionally introduce a task classifier that predicts the task label of each example, to deal with settings in which a task oracle is not available. We validate our proposal on four continual learning datasets. Results show that our model consistently outperforms existing methods both in the presence and the absence of a task oracle. Notably, on Split SVHN and Imagenet-50 datasets, our model yields up to 23.98% and 17.42% improvement in accuracy w.r.t. competing methods.
Tasks	Continual Learning
Published	2020-03-31
URL	https://arxiv.org/abs/2004.00070v1
PDF	https://arxiv.org/pdf/2004.00070v1.pdf
PWC	https://paperswithcode.com/paper/conditional-channel-gated-networks-for-task
Repo
Framework

An Empirical Study of Person Re-Identification with Attributes


Title	An Empirical Study of Person Re-Identification with Attributes
Authors	Vikram Shree, Wei-Lun Chao, Mark Campbell
Abstract	Person re-identification aims to identify a person from an image collection, given one image of that person as the query. There is, however, a plethora of real-life scenarios where we may not have a priori library of query images and therefore must rely on information from other modalities. In this paper, an attribute-based approach is proposed where the person of interest (POI) is described by a set of visual attributes, which are used to perform the search. We compare multiple algorithms and analyze how the quality of attributes impacts the performance. While prior work mostly relies on high precision attributes annotated by experts, we conduct a human-subject study and reveal that certain visual attributes could not be consistently described by human observers, making them less reliable in real applications. A key conclusion is that the performance achieved by non-expert attributes, instead of expert-annotated ones, is a more faithful indicator of the status quo of attribute-based approaches for person re-identification.
Tasks	Person Re-Identification
Published	2020-01-25
URL	https://arxiv.org/abs/2002.03752v1
PDF	https://arxiv.org/pdf/2002.03752v1.pdf
PWC	https://paperswithcode.com/paper/an-empirical-study-of-person-re
Repo
Framework

Safety-Aware Hardening of 3D Object Detection Neural Network Systems


Title	Safety-Aware Hardening of 3D Object Detection Neural Network Systems
Authors	Chih-Hong Cheng
Abstract	We study how state-of-the-art neural networks for 3D object detection using a single-stage pipeline can be made safety aware. We start with the safety specification (reflecting the capability of other components) that partitions the 3D input space by criticality, where the critical area employs a separate criterion on robustness under perturbation, quality of bounding boxes, and the tolerance over false negatives demonstrated on the training set. In the architecture design, we consider symbolic error propagation to allow feature-level perturbation. Subsequently, we introduce a specialized loss function reflecting (1) the safety specification, (2) the use of single-stage detection architecture, and finally, (3) the characterization of robustness under perturbation. We also replace the commonly seen non-max-suppression post-processing algorithm by a safety-aware non-max-inclusion algorithm, in order to maintain the safety claim created by the neural network. The concept is detailed by extending the state-of-the-art PIXOR detector which creates object bounding boxes in bird’s eye view with inputs from point clouds.
Tasks	3D Object Detection, Object Detection
Published	2020-03-25
URL	https://arxiv.org/abs/2003.11242v3
PDF	https://arxiv.org/pdf/2003.11242v3.pdf
PWC	https://paperswithcode.com/paper/safety-aware-hardening-of-3d-object-detection
Repo
Framework

Convergence of a Stochastic Gradient Method with Momentum for Nonsmooth Nonconvex Optimization


Title	Convergence of a Stochastic Gradient Method with Momentum for Nonsmooth Nonconvex Optimization
Authors	Vien V. Mai, Mikael Johansson
Abstract	Stochastic gradient methods with momentum are widely used in applications and at the core of optimization subroutines in many popular machine learning libraries. However, their sample complexities have never been obtained for problems that are non-convex and non-smooth. This paper establishes the convergence rate of a stochastic subgradient method with a momentum term of Polyak type for a broad class of non-smooth, non-convex, and constrained optimization problems. Our key innovation is the construction of a special Lyapunov function for which the proven complexity can be achieved without any tuning of the momentum parameter. For smooth problems, we extend the known complexity bound to the constrained case and demonstrate how the unconstrained case can be analyzed under weaker assumptions than the state-of-the-art. Numerical results confirm our theoretical developments.
Tasks
Published	2020-02-13
URL	https://arxiv.org/abs/2002.05466v1
PDF	https://arxiv.org/pdf/2002.05466v1.pdf
PWC	https://paperswithcode.com/paper/convergence-of-a-stochastic-gradient-method
Repo
Framework
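
To make the idea of a projected stochastic subgradient step with Polyak (heavy-ball) momentum concrete, the sketch below uses the generic textbook update x_next = proj(x - step * g + beta * (x - x_prev)). This is not necessarily the exact iteration analyzed in the paper. The test objective |x^2 - 1|, the interval [-2, 2], the step-size schedule, and all function names are assumptions chosen only for illustration.

```python
import random


def objective(x):
    # A simple non-smooth, non-convex function with minimizers at x = -1 and x = 1.
    return abs(x * x - 1.0)


def noisy_subgradient(x, rng, noise_std=0.1):
    # One valid subgradient of |x^2 - 1| plus zero-mean Gaussian noise.
    sign = 1.0 if x * x >= 1.0 else -1.0
    return sign * 2.0 * x + rng.gauss(0.0, noise_std)


def project(x, lo=-2.0, hi=2.0):
    # Euclidean projection onto the constraint interval [lo, hi].
    return min(max(x, lo), hi)


def heavy_ball(x0, steps=500, base_step=0.1, beta=0.5, seed=0):
    rng = random.Random(seed)
    x_prev = x0
    x = x0
    for k in range(steps):
        step = base_step / (k + 1) ** 0.5  # diminishing step size
        g = noisy_subgradient(x, rng)
        # Subgradient step plus momentum term, then projection.
        x_next = project(x - step * g + beta * (x - x_prev))
        x_prev, x = x, x_next
    return x


x_final = heavy_ball(1.8)
print(x_final, objective(x_final))
```

The momentum parameter `beta` is held fixed here. Any setting of it in this sketch is a free choice for the demo and says nothing about the parameter ranges covered by the paper's analysis.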

Artificial Intelligence for Digital Agriculture at Scale: Techniques, Policies, and Challenges


Title	Artificial Intelligence for Digital Agriculture at Scale: Techniques, Policies, and Challenges
Authors	Somali Chaterji, Nathan DeLay, John Evans, Nathan Mosier, Bernard Engel, Dennis Buckmaster, Ranveer Chandra
Abstract	Digital agriculture has the promise to transform agricultural throughput. It can do this by applying data science and engineering for mapping input factors to crop throughput, while bounding the available resources. In addition, as the data volumes and varieties increase with the increase in sensor deployment in agricultural fields, data engineering techniques will also be instrumental in collection of distributed data as well as distributed processing of the data. These have to be done such that the latency requirements of the end users and applications are satisfied. Understanding how farm technology and big data can improve farm productivity can significantly increase the world’s food production by 2050 in the face of constrained arable land and with the water levels receding. While much has been written about digital agriculture’s potential, little is known about the economic costs and benefits of these emergent systems. In particular, the on-farm decision making processes, both in terms of adoption and optimal implementation, have not been adequately addressed. For example, if some algorithm needs data from multiple data owners to be pooled together, that raises the question of data ownership. This paper is the first one to bring together the important questions that will guide the end-to-end pipeline for the evolution of a new generation of digital agricultural solutions, driving the next revolution in agriculture and sustainability under one umbrella.
Tasks	Decision Making
Published	2020-01-21
URL	https://arxiv.org/abs/2001.09786v1
PDF	https://arxiv.org/pdf/2001.09786v1.pdf
PWC	https://paperswithcode.com/paper/artificial-intelligence-for-digital
Repo
Framework

Low-rank Gradient Approximation For Memory-Efficient On-device Training of Deep Neural Network


Title	Low-rank Gradient Approximation For Memory-Efficient On-device Training of Deep Neural Network
Authors	Mary Gooneratne, Khe Chai Sim, Petr Zadrazil, Andreas Kabel, Françoise Beaufays, Giovanni Motta
Abstract	Training machine learning models on mobile devices has the potential of improving both privacy and accuracy of the models. However, one of the major obstacles to achieving this goal is the memory limitation of mobile devices. Reducing training memory enables models with high-dimensional weight matrices, like automatic speech recognition (ASR) models, to be trained on-device. In this paper, we propose approximating the gradient matrices of deep neural networks using a low-rank parameterization as an avenue to save training memory. The low-rank gradient approximation enables more advanced, memory-intensive optimization techniques to be run on device. Our experimental results show that we can reduce the training memory by about 33.0% for Adam optimization. It uses comparable memory to momentum optimization and achieves a 4.5% relative lower word error rate on an ASR personalization task.
Tasks	Speech Recognition
Published	2020-01-24
URL	https://arxiv.org/abs/2001.08885v1
PDF	https://arxiv.org/pdf/2001.08885v1.pdf
PWC	https://paperswithcode.com/paper/low-rank-gradient-approximation-for-memory
Repo
Framework

Data Techniques For Online End-to-end Speech Recognition


Title	Data Techniques For Online End-to-end Speech Recognition
Authors	Yang Chen, Weiran Wang, I-Fan Chen, Chao Wang
Abstract	Practitioners often need to build ASR systems for new use cases in a short amount of time, given limited in-domain data. While recently developed end-to-end methods largely simplify the modeling pipelines, they still suffer from the data sparsity issue. In this work, we explore a few simple-to-implement techniques for building online ASR systems in an end-to-end fashion, with a small amount of transcribed data in the target domain. These techniques include data augmentation in the target domain, domain adaptation using models previously trained on a large source domain, and knowledge distillation on non-transcribed target domain data; they are applicable in real scenarios with different types of resources. Our experiments demonstrate that each technique is independently useful in the low-resource setting, and combining them yields significant improvement of the online ASR performance in the target domain.
Tasks	Data Augmentation, Domain Adaptation, End-To-End Speech Recognition, Speech Recognition
Published	2020-01-24
URL	https://arxiv.org/abs/2001.09221v1
PDF	https://arxiv.org/pdf/2001.09221v1.pdf
PWC	https://paperswithcode.com/paper/data-techniques-for-online-end-to-end-speech
Repo
Framework

TLT-school: a Corpus of Non Native Children Speech


Title	TLT-school: a Corpus of Non Native Children Speech
Authors	Roberto Gretter, Marco Matassoni, Stefano Bannò, Daniele Falavigna
Abstract	This paper describes “TLT-school”, a corpus of speech utterances collected in schools of northern Italy for assessing the performance of students learning both English and German. The corpus was recorded in the years 2017 and 2018 from students aged between nine and sixteen years, attending primary, middle and high school. All utterances have been scored, in terms of some predefined proficiency indicators, by human experts. In addition, most of the utterances recorded in 2017 have been carefully transcribed by hand. Guidelines and procedures used for manual transcriptions of utterances will be described in detail, as well as results achieved by means of an automatic speech recognition system developed by us. Part of the corpus is going to be freely distributed to the scientific community, particularly to those interested in both non-native speech recognition and automatic assessment of second language proficiency.
Tasks	Speech Recognition
Published	2020-01-22
URL	https://arxiv.org/abs/2001.08051v1
PDF	https://arxiv.org/pdf/2001.08051v1.pdf
PWC	https://paperswithcode.com/paper/tlt-school-a-corpus-of-non-native-children
Repo
Framework

A Novel and Efficient Tumor Detection Framework for Pancreatic Cancer via CT Images


Title	A Novel and Efficient Tumor Detection Framework for Pancreatic Cancer via CT Images
Authors	Zhengdong Zhang, Shuai Li, Ziyang Wang, Yun Lu
Abstract	As Deep Convolutional Neural Networks (DCNNs) have shown robust performance and results in medical image analysis, a number of deep-learning-based tumor detection methods were developed in recent years. Nowadays, the automatic detection of pancreatic tumors using contrast-enhanced Computed Tomography (CT) is widely applied for the diagnosis and staging of pancreatic cancer. Traditional hand-crafted methods only extract low-level features. Normal convolutional neural networks, however, fail to make full use of effective context information, which causes inferior detection results. In this paper, a novel and efficient pancreatic tumor detection framework aiming at fully exploiting the context information at multiple scales is designed. More specifically, the contribution of the proposed method mainly consists of three components: Augmented Feature Pyramid networks, Self-adaptive Feature Fusion and a Dependencies Computation (DC) Module. A bottom-up path augmentation to fully extract and propagate low-level accurate localization information is established first. Then, the Self-adaptive Feature Fusion can encode much richer context information at multiple scales based on the proposed regions. Finally, the DC Module is specifically designed to capture the interaction information between proposals and surrounding tissues. Experimental results achieve competitive performance in detection with the AUC of 0.9455, which outperforms other state-of-the-art methods to the best of our knowledge, demonstrating that the proposed framework can detect the tumor of pancreatic cancer efficiently and accurately.
Tasks	Computed Tomography (CT)
Published	2020-02-11
URL	https://arxiv.org/abs/2002.04493v1
PDF	https://arxiv.org/pdf/2002.04493v1.pdf
PWC	https://paperswithcode.com/paper/a-novel-and-efficient-tumor-detection
Repo
Framework

Rumor Detection on Social Media with Bi-Directional Graph Convolutional Networks


Title	Rumor Detection on Social Media with Bi-Directional Graph Convolutional Networks
Authors	Tian Bian, Xi Xiao, Tingyang Xu, Peilin Zhao, Wenbing Huang, Yu Rong, Junzhou Huang
Abstract	Social media has been developing rapidly in public due to its nature of spreading new information, which leads to rumors being circulated. Meanwhile, detecting rumors from such massive information in social media is becoming an arduous challenge. Therefore, some deep learning methods are applied to discover rumors through the way they spread, such as Recursive Neural Network (RvNN) and so on. However, these deep learning methods only take into account the patterns of deep propagation but ignore the structures of wide dispersion in rumor detection. Actually, propagation and dispersion are two crucial characteristics of rumors. In this paper, we propose a novel bi-directional graph model, named Bi-Directional Graph Convolutional Networks (Bi-GCN), to explore both characteristics by operating on both top-down and bottom-up propagation of rumors. It leverages a GCN with a top-down directed graph of rumor spreading to learn the patterns of rumor propagation, and a GCN with an opposite directed graph of rumor diffusion to capture the structures of rumor dispersion. Moreover, the information from the source post is involved in each layer of GCN to enhance the influences from the roots of rumors. Encouraging empirical results on several benchmarks confirm the superiority of the proposed method over the state-of-the-art approaches.
Tasks
Published	2020-01-17
URL	https://arxiv.org/abs/2001.06362v1
PDF	https://arxiv.org/pdf/2001.06362v1.pdf
PWC	https://paperswithcode.com/paper/rumor-detection-on-social-media-with-bi
Repo
Framework

Information-Theoretic Lower Bounds for Zero-Order Stochastic Gradient Estimation


Title	Information-Theoretic Lower Bounds for Zero-Order Stochastic Gradient Estimation
Authors	Abdulrahman Alabdulkareem, Jean Honorio
Abstract	In this paper we analyze the necessary number of samples to estimate the gradient of any multidimensional smooth (possibly non-convex) function in a zero-order stochastic oracle model. In this model, an estimator has access to noisy values of the function, in order to produce the estimate of the gradient. We also provide an analysis on the sufficient number of samples for the finite difference method, a classical technique in numerical linear algebra. For $T$ samples and $d$ dimensions, our information-theoretic lower bound is $\Omega(\sqrt{d/T})$. We show that the finite difference method has rate $O(d^{4/3}/\sqrt{T})$ for functions with zero third and higher order derivatives. Thus, the finite difference method is not minimax optimal, and therefore there is space for the development of better gradient estimation methods.
Tasks
Published	2020-03-31
URL	https://arxiv.org/abs/2003.13881v1
PDF	https://arxiv.org/pdf/2003.13881v1.pdf
PWC	https://paperswithcode.com/paper/information-theoretic-lower-bounds-for-zero
Repo
Framework
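
The zero-order setting in the abstract above can be illustrated with a small sketch: the estimator only sees noisy function values and builds a gradient estimate by finite differences. The abstract does not say which finite-difference variant is analyzed, so the central-difference form, the averaging over repeated queries, the test function, and every name below are assumptions made for illustration.

```python
import random


def central_difference_gradient(noisy_f, x, h=0.1, samples_per_point=1):
    """Estimate the gradient of a function from noisy evaluations only.

    Uses 2 * len(x) * samples_per_point oracle queries in total.
    """
    grad = []
    for i in range(len(x)):
        plus = list(x)
        minus = list(x)
        plus[i] += h
        minus[i] -= h
        # Average repeated noisy queries at each of the two probe points.
        f_plus = sum(noisy_f(plus) for _ in range(samples_per_point)) / samples_per_point
        f_minus = sum(noisy_f(minus) for _ in range(samples_per_point)) / samples_per_point
        grad.append((f_plus - f_minus) / (2.0 * h))
    return grad


def make_noisy_quadratic(noise_std, seed=0):
    """Noisy oracle for f(x) = sum(x_i^2) + x_0 * x_1 (hypothetical test function)."""
    rng = random.Random(seed)

    def noisy_f(x):
        value = sum(v * v for v in x) + x[0] * x[1]
        return value + rng.gauss(0.0, noise_std)

    return noisy_f


oracle = make_noisy_quadratic(noise_std=0.05)
print(central_difference_gradient(oracle, [1.0, 2.0, 3.0], samples_per_point=50))
```

The test function is a quadratic, so its third and higher derivatives vanish and the central difference has no truncation error. Any remaining error in the estimate comes from the oracle noise, which shrinks as `samples_per_point` grows.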

DeepSIC: Deep Soft Interference Cancellation for Multiuser MIMO Detection


Title	DeepSIC: Deep Soft Interference Cancellation for Multiuser MIMO Detection
Authors	Nir Shlezinger, Rong Fu, Yonina C. Eldar
Abstract	Digital receivers are required to recover the transmitted symbols from their observed channel output. In multiuser multiple-input multiple-output (MIMO) setups, where multiple symbols are simultaneously transmitted, accurate symbol detection is challenging. A family of algorithms capable of reliably recovering multiple symbols is based on interference cancellation. However, these methods assume that the channel is linear, a model which does not reflect many relevant channels, as well as require accurate channel state information (CSI), which may not be available. In this work we propose a multiuser MIMO receiver which learns to jointly detect in a data-driven fashion, without assuming a specific channel model or requiring CSI. In particular, we propose a data-driven implementation of the iterative soft interference cancellation (SIC) algorithm which we refer to as DeepSIC. The resulting symbol detector is based on integrating dedicated machine-learning (ML) methods into the iterative SIC algorithm. DeepSIC learns to carry out joint detection from a limited set of training samples without requiring the channel to be linear and its parameters to be known. Our numerical evaluations demonstrate that for linear channels with full CSI, DeepSIC approaches the performance of iterative SIC, which is comparable to the optimal performance, and outperforms previously proposed ML-based MIMO receivers. Furthermore, in the presence of CSI uncertainty, DeepSIC significantly outperforms model-based approaches. Finally, we show that DeepSIC accurately detects symbols in non-linear channels, where conventional iterative SIC fails even when accurate CSI is available.
Tasks
Published	2020-02-08
URL	https://arxiv.org/abs/2002.03214v1
PDF	https://arxiv.org/pdf/2002.03214v1.pdf
PWC	https://paperswithcode.com/paper/deepsic-deep-soft-interference-cancellation
Repo
Framework

A theory of independent mechanisms for extrapolation in generative models


Title	A theory of independent mechanisms for extrapolation in generative models
Authors	Michel Besserve, Rémy Sun, Dominik Janzing, Bernhard Schölkopf
Abstract	Deep generative models reproduce complex empirical data but cannot extrapolate to novel environments. An intuitive idea to promote extrapolation capabilities is to enforce the architecture to have the modular structure of a causal graphical model, where one can intervene on each module independently of the others in the graph. We develop a framework to formalize this intuition, using the principle of Independent Causal Mechanisms, and show how over-parameterization of generative neural networks can hinder extrapolation capabilities. Our experiments on the generation of human faces show that successive layers of a generator architecture implement independent mechanisms to some extent, allowing meaningful extrapolations. Finally, we illustrate that independence of mechanisms may be enforced during training to improve extrapolation.
Tasks
Published	2020-04-01
URL	https://arxiv.org/abs/2004.00184v1
PDF	https://arxiv.org/pdf/2004.00184v1.pdf
PWC	https://paperswithcode.com/paper/a-theory-of-independent-mechanisms-for
Repo
Framework

A Reference Architecture for Plausible Threat Image Projection (TIP) Within 3D X-ray Computed Tomography Volumes


Title	A Reference Architecture for Plausible Threat Image Projection (TIP) Within 3D X-ray Computed Tomography Volumes
Authors	Qian Wang, Najla Megherbi, Toby P. Breckon
Abstract	Threat Image Projection (TIP) is a technique used in X-ray security baggage screening systems that superimposes a threat object signature onto a benign X-ray baggage image in a plausible and realistic manner. It has been shown to be highly effective in evaluating the ongoing performance of human operators, improving their vigilance and performance on threat detection. However, with the increasing use of 3D Computed Tomography (CT) in aviation security for both hold and cabin baggage screening a significant challenge arises in extending TIP to 3D CT volumes due to the difficulty in 3D CT volume segmentation and the proper insertion location determination. In this paper, we present an approach for 3D TIP in CT volumes targeting realistic and plausible threat object insertion within 3D CT baggage images. The proposed approach consists of dual threat (source) and baggage (target) volume segmentation, particle swarm optimisation based insertion determination and metal artefact generation. In addition, we propose a TIP quality score metric to evaluate the quality of generated TIP volumes. Qualitative evaluations on real 3D CT baggage imagery show that our approach is able to generate realistic and plausible TIP which are indiscernible from real CT volumes and the TIP quality scores are consistent with human evaluations.
Tasks	Computed Tomography (CT)
Published	2020-01-15
URL	https://arxiv.org/abs/2001.05459v1
PDF	https://arxiv.org/pdf/2001.05459v1.pdf
PWC	https://paperswithcode.com/paper/a-reference-architecture-for-plausible-threat
Repo
Framework