Paper Group ANR 245
Communication-Efficient Distributed Deep Learning: A Comprehensive Survey
Title | Communication-Efficient Distributed Deep Learning: A Comprehensive Survey |
Authors | Zhenheng Tang, Shaohuai Shi, Xiaowen Chu, Wei Wang, Bo Li |
Abstract | As deep models and data sets grow, distributed deep learning has become common for reducing overall training time by exploiting multiple computing devices (e.g., GPUs/TPUs). However, data communication between computing devices can become a bottleneck that limits system scalability, and addressing this communication problem in distributed deep learning has recently become a hot research topic. In this paper, we provide a comprehensive survey of communication-efficient distributed training algorithms, covering both system-level and algorithmic-level optimizations. At the system level, we demystify the system design and implementation choices that reduce communication cost. At the algorithmic level, we compare different algorithms in terms of theoretical convergence bounds and communication complexity. Specifically, we first propose a taxonomy of data-parallel distributed training algorithms with four main dimensions: communication synchronization, system architectures, compression techniques, and the parallelism of communication and computing. We then discuss studies addressing each of these four dimensions and compare their communication costs. We further compare the convergence rates of different algorithms, which indicate how fast each algorithm approaches the solution in terms of iterations. Combining the system-level communication cost analysis with the theoretical convergence speed comparison, we help readers understand which algorithms are more efficient under specific distributed environments and extrapolate potential directions for further optimization. |
Tasks | |
Published | 2020-03-10 |
URL | https://arxiv.org/abs/2003.06307v1 |
https://arxiv.org/pdf/2003.06307v1.pdf | |
PWC | https://paperswithcode.com/paper/communication-efficient-distributed-deep |
Repo | |
Framework | |
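Among the compression techniques the survey catalogues, gradient sparsification is one of the most widely used ways to cut communication cost. Below is a minimal numpy sketch of top-k sparsification with error feedback, a representative instance of this family rather than any specific algorithm from the survey; the compression ratio and all names are illustrative assumptions.

```python
import numpy as np

def topk_sparsify(grad, ratio=0.01):
    """Keep only the largest-magnitude fraction of gradient entries."""
    flat = grad.ravel()
    k = max(1, int(flat.size * ratio))
    idx = np.argpartition(np.abs(flat), -k)[-k:]  # k largest magnitudes
    sparse = np.zeros_like(flat)
    sparse[idx] = flat[idx]
    return sparse.reshape(grad.shape)

class ErrorFeedback:
    """Accumulate what compression discarded and re-inject it next step,
    which is what makes aggressive sparsification converge in practice."""
    def __init__(self):
        self.residual = 0.0

    def compress(self, grad, ratio=0.01):
        corrected = grad + self.residual
        sparse = topk_sparsify(corrected, ratio)
        self.residual = corrected - sparse
        return sparse  # only the non-zeros need to be communicated
```

In a data-parallel setting, each worker would compress its local gradient this way before the all-reduce or parameter-server exchange.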
A Spike in Performance: Training Hybrid-Spiking Neural Networks with Quantized Activation Functions
Title | A Spike in Performance: Training Hybrid-Spiking Neural Networks with Quantized Activation Functions |
Authors | Aaron R. Voelker, Daniel Rasmussen, Chris Eliasmith |
Abstract | The machine learning community has become increasingly interested in the energy efficiency of neural networks. The Spiking Neural Network (SNN) is a promising approach to energy-efficient computing, since its activation levels are quantized into temporally sparse, one-bit values (i.e., “spike” events), which additionally converts the sum over weight-activity products into a simple addition of weights (one weight for each spike). However, the goal of maintaining state-of-the-art (SotA) accuracy when converting a non-spiking network into an SNN has remained an elusive challenge, primarily due to spikes having only a single bit of precision. Adopting tools from signal processing, we cast neural activation functions as quantizers with temporally-diffused error, and then train networks while smoothly interpolating between the non-spiking and spiking regimes. We apply this technique to the Legendre Memory Unit (LMU) to obtain the first known example of a hybrid SNN outperforming SotA recurrent architectures—including the LSTM, GRU, and NRU—in accuracy, while reducing activities to at most 3.74 bits on average with 1.26 significant bits multiplying each weight. We discuss how these methods can significantly improve the energy efficiency of neural networks. |
Tasks | |
Published | 2020-02-10 |
URL | https://arxiv.org/abs/2002.03553v1 |
https://arxiv.org/pdf/2002.03553v1.pdf | |
PWC | https://paperswithcode.com/paper/a-spike-in-performance-training-hybrid |
Repo | |
Framework | |
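To make the interpolation idea concrete, here is a small numpy sketch that blends a smooth activation with a rounded, low-precision version of itself. It is a sketch under stated assumptions (ReLU as the rate activation, a plain uniform quantizer) and omits the temporally-diffused error that the paper's quantizer framing adds.

```python
import numpy as np

def hybrid_activation(x, alpha, levels=16):
    """Interpolate between a non-spiking activation (alpha=0) and its
    quantized, spiking-like counterpart (alpha=1)."""
    smooth = np.maximum(x, 0.0)                     # assumed rate activation
    quantized = np.round(smooth * levels) / levels  # uniform quantizer
    return (1.0 - alpha) * smooth + alpha * quantized

# Anneal alpha from 0 to 1 over training to move the network smoothly
# from the non-spiking regime into the spiking regime.
for alpha in np.linspace(0.0, 1.0, 5):
    y = hybrid_activation(np.array([-0.3, 0.2, 0.8]), alpha)
```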
Decision-Making with Auto-Encoding Variational Bayes
Title | Decision-Making with Auto-Encoding Variational Bayes |
Authors | Romain Lopez, Pierre Boyeau, Nir Yosef, Michael I. Jordan, Jeffrey Regier |
Abstract | To make decisions based on a model fit by Auto-Encoding Variational Bayes (AEVB), practitioners typically use importance sampling to estimate a functional of the posterior distribution. The variational distribution found by AEVB serves as the proposal distribution for importance sampling. However, this proposal distribution may give unreliable (high variance) importance sampling estimates, thus leading to poor decisions. We explore how changing the objective function for learning the variational distribution, while continuing to learn the generative model based on the ELBO, affects the quality of downstream decisions. For a particular model, we characterize the error of importance sampling as a function of posterior variance and show that proposal distributions learned with evidence upper bounds are better. Motivated by these theoretical results, we propose a novel variant of the VAE. In addition to experimenting with MNIST, we present a full-fledged application of the proposed method to single-cell RNA sequencing. In this challenging instance of multiple hypothesis testing, the proposed method surpasses the current state of the art. |
Tasks | Decision Making |
Published | 2020-02-17 |
URL | https://arxiv.org/abs/2002.07217v1 |
https://arxiv.org/pdf/2002.07217v1.pdf | |
PWC | https://paperswithcode.com/paper/decision-making-with-auto-encoding |
Repo | |
Framework | |
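The decision-making step the abstract describes, estimating a posterior functional by importance sampling with the variational distribution as proposal, can be illustrated on a toy conjugate-Gaussian model where the exact posterior is known. Everything below (the model, the mismatched proposal, the chosen functional) is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model: z ~ N(0, 1), x | z ~ N(z, 1). One observation x.
x = 1.5

def log_joint(z):
    return -0.5 * z**2 - 0.5 * (x - z) ** 2  # up to additive constants

# Proposal q(z): a (deliberately mismatched) variational distribution.
mu_q, s_q = 0.6, 0.8
z = rng.normal(mu_q, s_q, size=100_000)
log_q = -0.5 * ((z - mu_q) / s_q) ** 2 - np.log(s_q)

# Self-normalized importance weights.
log_w = log_joint(z) - log_q
w = np.exp(log_w - log_w.max())
w /= w.sum()

# Decision-relevant functional, e.g. P(z > 1 | x).
estimate = float(np.sum(w * (z > 1.0)))
# The exact posterior here is N(x/2, 1/2), so the estimate can be checked.
```

A high-variance proposal makes exactly this kind of estimate unreliable, which is the failure mode the paper targets by changing the objective used to learn q.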
Stroke Constrained Attention Network for Online Handwritten Mathematical Expression Recognition
Title | Stroke Constrained Attention Network for Online Handwritten Mathematical Expression Recognition |
Authors | Jiaming Wang, Jun Du, Jianshu Zhang |
Abstract | In this paper, we propose a novel stroke constrained attention network (SCAN) which treats the stroke as the basic unit for encoder-decoder based online handwritten mathematical expression recognition (HMER). Unlike previous methods which use trace points or image pixels as basic units, SCAN makes full use of stroke-level information for better alignment and representation. The proposed SCAN can be adopted in both single-modal (online or offline) and multi-modal HMER. For single-modal HMER, SCAN first employs a CNN-GRU encoder to extract point-level features from input traces in online mode and a CNN encoder to extract pixel-level features from input images in offline mode, and then uses stroke-constrained information to convert them into online and offline stroke-level features. Using stroke-level features explicitly groups the points or pixels belonging to the same stroke, which reduces the difficulty of symbol segmentation and recognition for the attention-based decoder. For multi-modal HMER, in addition to fusing multi-modal information in the decoder, SCAN can also fuse multi-modal information in the encoder by utilizing the stroke-based alignments between the online and offline modalities. Encoder fusion is a better way to combine multi-modal information, as it lets the modalities interact one step before decoder fusion, so their complementary advantages can be exploited earlier and more adequately when training the encoder-decoder model. Evaluated on a benchmark published by the CROHME competition, the proposed SCAN achieves state-of-the-art performance. |
Tasks | |
Published | 2020-02-20 |
URL | https://arxiv.org/abs/2002.08670v1 |
https://arxiv.org/pdf/2002.08670v1.pdf | |
PWC | https://paperswithcode.com/paper/stroke-constrained-attention-network-for |
Repo | |
Framework | |
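The core preprocessing step, converting point-level features into stroke-level features using the known stroke segmentation, can be sketched in a few lines of numpy. Mean pooling is an assumption here; the paper's actual aggregation may differ.

```python
import numpy as np

def points_to_strokes(point_feats, stroke_ids):
    """Group point-level features by stroke and pool each group.

    point_feats: (N, D) features for N trace points.
    stroke_ids:  (N,) integer stroke index of each point.
    Returns (S, D) stroke-level features for S strokes.
    """
    strokes = np.unique(stroke_ids)
    return np.stack([point_feats[stroke_ids == s].mean(axis=0)
                     for s in strokes])

# Example: six points forming two strokes.
feats = np.random.randn(6, 8)
ids = np.array([0, 0, 0, 1, 1, 1])
stroke_feats = points_to_strokes(feats, ids)  # shape (2, 8)
```

The decoder then attends over the S stroke vectors instead of N points, which is what eases symbol segmentation.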
Joint Learning of Instance and Semantic Segmentation for Robotic Pick-and-Place with Heavy Occlusions in Clutter
Title | Joint Learning of Instance and Semantic Segmentation for Robotic Pick-and-Place with Heavy Occlusions in Clutter |
Authors | Kentaro Wada, Kei Okada, Masayuki Inaba |
Abstract | We present joint learning of instance and semantic segmentation for visible and occluded region masks. By sharing the feature extractor with instance occlusion segmentation, we introduce semantic occlusion segmentation into the instance segmentation model. This joint learning fuses instance- and image-level reasoning for mask prediction across the different segmentation tasks, which was missing in previous work that learned instance segmentation alone (instance-only). In our experiments, we evaluated the proposed joint learning against instance-only learning on the test dataset. We also applied the joint learning model to two different types of robotic pick-and-place tasks (random and target picking) and evaluated its effectiveness in real-world robotic tasks. |
Tasks | Instance Segmentation, Semantic Segmentation |
Published | 2020-01-21 |
URL | https://arxiv.org/abs/2001.07481v1 |
https://arxiv.org/pdf/2001.07481v1.pdf | |
PWC | https://paperswithcode.com/paper/joint-learning-of-instance-and-semantic |
Repo | |
Framework | |
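The joint objective amounts to summing mask losses from the instance and semantic heads over a shared feature extractor. A minimal sketch of that loss combination follows; the equal weighting, the binary-cross-entropy choice, and all names are assumptions, and the network itself is omitted.

```python
import numpy as np

def bce(pred, target, eps=1e-7):
    """Binary cross-entropy over predicted mask probabilities."""
    pred = np.clip(pred, eps, 1.0 - eps)
    return -np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred))

def joint_loss(preds, gts):
    """Sum visible/occluded mask losses from both heads so that every
    task's gradient reaches the shared feature extractor."""
    keys = ["inst_visible", "inst_occluded", "sem_visible", "sem_occluded"]
    return sum(bce(preds[k], gts[k]) for k in keys)
```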
FLAME: A Self-Adaptive Auto-labeling System for Heterogeneous Mobile Processors
Title | FLAME: A Self-Adaptive Auto-labeling System for Heterogeneous Mobile Processors |
Authors | Jie Liu, Jiawen Liu, Zhen Xie, Dong Li |
Abstract | How to accurately and efficiently label data on a mobile device is critical to the success of training machine learning models on mobile devices. Auto-labeling data on mobile devices is challenging because data is usually generated incrementally and may contain unknown labels. Furthermore, the rich hardware heterogeneity of mobile devices makes it challenging to execute auto-labeling workloads efficiently. In this paper, we introduce Flame, an auto-labeling system that can label non-stationary data with unknown labels. Flame includes a runtime system that efficiently schedules and executes auto-labeling workloads on heterogeneous mobile processors. Evaluating Flame with eight datasets on a smartphone, we demonstrate that it enables auto-labeling with high labeling accuracy and high performance. |
Tasks | |
Published | 2020-03-03 |
URL | https://arxiv.org/abs/2003.01762v1 |
https://arxiv.org/pdf/2003.01762v1.pdf | |
PWC | https://paperswithcode.com/paper/flame-a-self-adaptive-auto-labeling-system |
Repo | |
Framework | |
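The abstract does not detail how Flame decides that a sample carries an unknown label, so the following is only a generic confidence-threshold stand-in for that step, not Flame's actual mechanism.

```python
import numpy as np

def auto_label(probs, threshold=0.9):
    """Assign a label when the classifier is confident over the currently
    known classes; otherwise mark the sample as unknown (-1) so it can be
    deferred or used to grow the label set.

    probs: (N, C) softmax outputs over C known classes.
    """
    labels = probs.argmax(axis=1)
    labels[probs.max(axis=1) < threshold] = -1
    return labels
```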
Reveal of Domain Effect: How Visual Restoration Contributes to Object Detection in Aquatic Scenes
Title | Reveal of Domain Effect: How Visual Restoration Contributes to Object Detection in Aquatic Scenes |
Authors | Xingyu Chen, Yue Lu, Zhengxing Wu, Junzhi Yu, Li Wen |
Abstract | Underwater robotic perception usually requires both visual restoration and object detection, each of which has been studied for many years. Meanwhile, the data domain has a huge impact on modern data-driven learning processes. However, the exact domain effect, i.e., the relation between restoration and detection, remains unclear. In this paper, we investigate how quality-diverse data domains relate to detection performance, and we unveil how visual restoration contributes to object detection in real-world underwater scenes. Our analysis yields five key findings: 1) domain quality has a negligible effect on within-domain convolutional representations and detection accuracy; 2) a low-quality domain leads to higher generalization ability in cross-domain detection; 3) a low-quality domain can hardly be learned well in a domain-mixed learning process; 4) restoration degrades recall efficiency and cannot improve within-domain detection accuracy; 5) visual restoration benefits detection in the wild by reducing the domain shift between training data and real-world scenes. Finally, as an illustrative example, we successfully perform underwater object detection with an aquatic robot. |
Tasks | Object Detection |
Published | 2020-03-04 |
URL | https://arxiv.org/abs/2003.01913v1 |
https://arxiv.org/pdf/2003.01913v1.pdf | |
PWC | https://paperswithcode.com/paper/reveal-of-domain-effect-how-visual |
Repo | |
Framework | |
Real-time Linear Operator Construction and State Estimation with The Kalman Filter
Title | Real-time Linear Operator Construction and State Estimation with The Kalman Filter |
Authors | Tsuyoshi Ishizone, Kazuyuki Nakamura |
Abstract | The Kalman filter is the most powerful tool for estimating the states of a linear Gaussian system, and an expectation-maximization algorithm can additionally be used to estimate the parameters of the model. However, that algorithm cannot operate in real time. We therefore propose a new method that estimates the transition matrices and the states of the system in real time. The proposed method combines three ideas: estimation in an observation space, a time-invariant interval, and an online learning framework. Applied to a damped oscillation model, the method estimates the matrices with excellent performance. In addition, by introducing localization and spatial uniformity, we demonstrate that it can reduce noise in high-dimensional spatio-temporal data. The proposed method also has potential applications in areas such as weather forecasting and vector field analysis. |
Tasks | Weather Forecasting |
Published | 2020-01-30 |
URL | https://arxiv.org/abs/2001.11256v2 |
https://arxiv.org/pdf/2001.11256v2.pdf | |
PWC | https://paperswithcode.com/paper/real-time-linear-operator-construction-and |
Repo | |
Framework | |
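For reference, a single predict/update cycle of the standard Kalman filter, plus a crude least-squares re-fit of the transition matrix from a window of state estimates, is sketched below in numpy. The re-fit is a simplified stand-in for the paper's real-time operator construction, which works in an observation space.

```python
import numpy as np

def kalman_step(x, P, z, A, H, Q, R):
    """One predict/update cycle of the standard Kalman filter."""
    x_pred = A @ x                       # predicted state
    P_pred = A @ P @ A.T + Q             # predicted covariance
    S = H @ P_pred @ H.T + R             # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)  # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

def refit_transition(states):
    """Least-squares fit of A from consecutive state estimates, solving
    X_next ~ A X_prev over a sliding window. states: (T, d) array."""
    X_prev, X_next = states[:-1].T, states[1:].T
    return X_next @ np.linalg.pinv(X_prev)
```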
Adversarial Loss for Semantic Segmentation of Aerial Imagery
Title | Adversarial Loss for Semantic Segmentation of Aerial Imagery |
Authors | Clint Sebastian, Raffaele Imbriaco, Egor Bondarev, Peter H. N. de With |
Abstract | Automatic building extraction from aerial imagery has several applications in urban planning, disaster management, and change detection. In recent years, several works have adopted deep convolutional neural networks (CNNs) for building extraction, since they produce rich features that are invariant to lighting conditions, shadows, etc. Although several advances have been made, building extraction from aerial imagery still presents multiple challenges. Most deep learning segmentation methods optimize the per-pixel loss with respect to the ground truth without knowledge of the context, which often leads to imperfect outputs with missing or unrefined regions. In this work, we propose a novel loss function, combining adversarial and cross-entropy losses, that learns to understand both local and global contexts for semantic segmentation. Deployed on the DeepLab v3+ network, the new loss function obtains state-of-the-art results on the Massachusetts buildings dataset. It improves the structure and refines the edges of buildings without requiring any of the commonly used post-processing methods, such as Conditional Random Fields. We also perform ablation studies to understand the impact of the adversarial loss. Finally, the proposed method achieves a relaxed F1 score of 95.59% on the Massachusetts buildings dataset, compared to the previous best F1 of 94.88%. |
Tasks | Semantic Segmentation |
Published | 2020-01-13 |
URL | https://arxiv.org/abs/2001.04269v2 |
https://arxiv.org/pdf/2001.04269v2.pdf | |
PWC | https://paperswithcode.com/paper/adversarial-loss-for-semantic-segmentation-of |
Repo | |
Framework | |
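The proposed objective combines a per-pixel cross-entropy term with an adversarial term from a discriminator that judges whole segmentation maps. A minimal sketch of that combination follows; the discriminator is a placeholder callable and the weighting lam is an assumed hyperparameter, not a value from the paper.

```python
import numpy as np

def cross_entropy(probs, onehot, eps=1e-7):
    """Per-pixel CE; probs and onehot have shape (H, W, C)."""
    return -np.mean(np.sum(onehot * np.log(np.clip(probs, eps, 1.0)),
                           axis=-1))

def combined_loss(probs, onehot, discriminator, lam=0.01, eps=1e-7):
    """CE plus an adversarial term that rewards segmentations the
    discriminator scores as realistic, injecting the global context a
    purely per-pixel loss lacks."""
    adv = -np.log(np.clip(discriminator(probs), eps, 1.0))
    return cross_entropy(probs, onehot) + lam * adv
```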
Defect segmentation: Mapping tunnel lining internal defects with ground penetrating radar data using a convolutional neural network
Title | Defect segmentation: Mapping tunnel lining internal defects with ground penetrating radar data using a convolutional neural network |
Authors | Senlin Yang, Zhengfang Wang, Jing Wang, Anthony G. Cohn, Jiaqi Zhang, Peng Jiang, Qingmei Sui |
Abstract | This research proposes a Ground Penetrating Radar (GPR) data processing method, called defect segmentation, for the non-destructive detection of internal defects in tunnel linings. To perform this critical step of automatic tunnel lining inspection, the method uses a CNN, SegNet, combined with the Lovász-softmax loss function to map the internal defect structure from synthetic GPR data, improving the accuracy, automation, and efficiency of defect detection. The proposed method overcomes several difficulties of traditional GPR data interpretation, as demonstrated by an evaluation on both synthetic and real data: to verify the method on real data, a test model containing a known defect was designed and built, and GPR data was obtained and analyzed. |
Tasks | |
Published | 2020-03-29 |
URL | https://arxiv.org/abs/2003.13120v1 |
https://arxiv.org/pdf/2003.13120v1.pdf | |
PWC | https://paperswithcode.com/paper/defect-segmentation-mapping-tunnel-lining |
Repo | |
Framework | |
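The Lovász-softmax loss the method uses (Berman et al., 2018) is a tractable surrogate for the Jaccard index: per class, sort the pixel errors in decreasing order and weight them by the gradient of the Lovász extension. A numpy sketch of the flat (per-image) version follows; the SegNet network around it is omitted.

```python
import numpy as np

def lovasz_grad(gt_sorted):
    """Gradient of the Lovász extension of the Jaccard loss with respect
    to sorted errors (Berman et al., 2018)."""
    gts = gt_sorted.sum()
    intersection = gts - np.cumsum(gt_sorted)
    union = gts + np.cumsum(1.0 - gt_sorted)
    jaccard = 1.0 - intersection / union
    jaccard[1:] = jaccard[1:] - jaccard[:-1]
    return jaccard

def lovasz_softmax(probs, labels, classes):
    """probs: (N, C) softmax outputs over N pixels; labels: (N,) ids."""
    losses = []
    for c in classes:
        fg = (labels == c).astype(np.float64)
        errors = np.abs(fg - probs[:, c])
        order = np.argsort(-errors)  # decreasing errors
        losses.append(np.dot(errors[order], lovasz_grad(fg[order])))
    return float(np.mean(losses))
```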
Learning What to Learn for Video Object Segmentation
Title | Learning What to Learn for Video Object Segmentation |
Authors | Goutam Bhat, Felix Järemo Lawin, Martin Danelljan, Andreas Robinson, Michael Felsberg, Luc Van Gool, Radu Timofte |
Abstract | Video object segmentation (VOS) is a highly challenging problem, since the target object is only defined during inference with a given first-frame reference mask. The problem of how to capture and utilize this limited target information remains a fundamental research question. We address this by introducing an end-to-end trainable VOS architecture that integrates a differentiable few-shot learning module. This internal learner is designed to predict a powerful parametric model of the target by minimizing a segmentation error in the first frame. We further go beyond standard few-shot learning techniques by learning what the few-shot learner should learn. This allows us to achieve a rich internal representation of the target in the current frame, significantly increasing the segmentation accuracy of our approach. We perform extensive experiments on multiple benchmarks. Our approach sets a new state-of-the-art on the large-scale YouTube-VOS 2018 dataset by achieving an overall score of 81.5, corresponding to a 2.6% relative improvement over the previous best result. |
Tasks | Few-Shot Learning, Semantic Segmentation, Video Object Segmentation, Video Semantic Segmentation |
Published | 2020-03-25 |
URL | https://arxiv.org/abs/2003.11540v1 |
https://arxiv.org/pdf/2003.11540v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-what-to-learn-for-video-object |
Repo | |
Framework | |
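The internal few-shot learner fits a target model on first-frame features under the reference mask. As a simplified, closed-form stand-in for the paper's optimization-based learner (which additionally learns what the few-shot learner should learn), here is a ridge-regression sketch over per-pixel features.

```python
import numpy as np

def fit_target_model(feats, mask, lam=1e-2):
    """Closed-form ridge regression from first-frame pixel features to
    the reference mask. feats: (N, D); mask: (N,) in {0, 1}."""
    D = feats.shape[1]
    return np.linalg.solve(feats.T @ feats + lam * np.eye(D),
                           feats.T @ mask)

def predict_mask(feats, w):
    """Score pixels of a new frame with the learned target model."""
    return 1.0 / (1.0 + np.exp(-(feats @ w)))  # sigmoid per pixel
```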
Assessment Modeling: Fundamental Pre-training Tasks for Interactive Educational Systems
Title | Assessment Modeling: Fundamental Pre-training Tasks for Interactive Educational Systems |
Authors | Youngduck Choi, Youngnam Lee, Junghyun Cho, Jineon Baek, Dongmin Shin, Seewoo Lee, Youngmin Cha, Byungsoo Kim, Jaewe Heo |
Abstract | Interactive Educational Systems (IESs) have developed rapidly in recent years to address the quality and affordability of education. As in other domains of AI, there are specific AIEd tasks for which labels are scarce. For instance, labels such as exam score and grade are considered important in educational and social contexts, yet obtaining them is costly because they require student actions taken outside the system. Likewise, while student events like course dropout and review correctness are automatically recorded by IESs, they are few in number because such events occur only sporadically in practice. A common way of circumventing the label-scarcity problem is the pre-train/fine-tune method. Accordingly, existing works pre-train a model to learn representations of the content in learning items. However, such methods fail to utilize the available student interaction data and to model student learning behavior. To this end, we propose assessment modeling, a family of fundamental pre-training tasks for IESs. An assessment is a feature of student-system interactions that can act as a pedagogical evaluation, such as the correctness or timeliness of a student response. Assessment modeling is the prediction of assessments conditioned on the surrounding context of interactions. Although it is natural to pre-train on the interactive features available in large amounts, narrowing the prediction targets down to assessments keeps the tasks relevant to label-scarce educational problems while reducing irrelevant noise. To the best of our knowledge, this is the first work investigating an appropriate pre-training method for predicting educational features from student-system interactions. While the effectiveness of different combinations of assessments remains open for exploration, we suggest assessment modeling as a guiding principle for selecting proper pre-training tasks for label-scarce educational problems. |
Tasks | |
Published | 2020-01-01 |
URL | https://arxiv.org/abs/2002.05505v1 |
https://arxiv.org/pdf/2002.05505v1.pdf | |
PWC | https://paperswithcode.com/paper/assessment-modeling-fundamental-pre-training |
Repo | |
Framework | |
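Operationally, assessment modeling resembles masked prediction over interaction sequences: hide some assessment values (e.g., response correctness) and train the model to recover them from the surrounding context. The masking scheme below is an assumption in the spirit of BERT-style pre-training, not the paper's exact recipe.

```python
import numpy as np

def make_assessment_targets(correctness, mask_rate=0.15, seed=0):
    """Build inputs/targets for a masked assessment-prediction task.

    correctness: (T,) array in {0, 1}, one entry per interaction.
    Returns (inputs, targets, mask); masked inputs are set to -1.
    """
    rng = np.random.default_rng(seed)
    mask = rng.random(correctness.shape) < mask_rate
    inputs = np.where(mask, -1, correctness)
    return inputs, correctness, mask
```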
Fast Video Object Segmentation using the Global Context Module
Title | Fast Video Object Segmentation using the Global Context Module |
Authors | Yu Li, Zhuoran Shen, Ying Shan |
Abstract | We developed a real-time, high-quality algorithm for semi-supervised video object segmentation. Its accuracy is on par with the most accurate (but time-consuming) online-learning models, while its speed matches the fastest template-matching methods, which have sub-optimal accuracy. The core of this result is a novel global context module that reliably summarizes and propagates information through the entire video. Compared to previous approaches that use only the first, the last, or a select few frames to guide segmentation of the current frame, the global context module allows us to use all past frames. Unlike the state-of-the-art space-time memory network, which caches a memory entry at each spatio-temporal position, our global context module is a fixed-size representation whose memory does not grow as more frames are processed. It is straightforward to implement and has lower memory and computational costs than the space-time memory module. Equipped with the global context module, our method achieves top accuracy on DAVIS 2016 and near-state-of-the-art results on DAVIS 2017 at real-time speed. |
Tasks | Semantic Segmentation, Video Object Segmentation, Video Semantic Segmentation |
Published | 2020-01-30 |
URL | https://arxiv.org/abs/2001.11243v1 |
https://arxiv.org/pdf/2001.11243v1.pdf | |
PWC | https://paperswithcode.com/paper/fast-video-object-segmentation-using-the |
Repo | |
Framework | |
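The key property of the global context module is that its memory does not grow with the number of frames. A linear-attention-style simplification with the same property is sketched below: an outer-product accumulator over per-pixel keys and values, read out by queries. It illustrates the fixed-size idea only and is not the authors' exact module.

```python
import numpy as np

class GlobalContext:
    """Fixed-size summary of all frames seen so far."""
    def __init__(self, key_dim, val_dim):
        self.C = np.zeros((key_dim, val_dim))

    def update(self, keys, values):
        """Fold one frame in. keys: (P, key_dim); values: (P, val_dim)
        for P pixels; cost and storage stay constant per frame."""
        self.C += keys.T @ values

    def read(self, queries):
        """Distribute the accumulated context to query pixels of the
        current frame. queries: (P, key_dim) -> (P, val_dim)."""
        return queries @ self.C
```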
Self-Tuning Deep Reinforcement Learning
Title | Self-Tuning Deep Reinforcement Learning |
Authors | Tom Zahavy, Zhongwen Xu, Vivek Veeriah, Matteo Hessel, Junhyuk Oh, Hado van Hasselt, David Silver, Satinder Singh |
Abstract | Reinforcement learning (RL) algorithms often require expensive manual or automated hyperparameter searches to perform well on a new domain. This need is particularly acute in modern deep RL architectures, which often incorporate many modules and multiple loss functions. In this paper, we take a step towards addressing this issue by using metagradients (Xu et al., 2018) to tune these hyperparameters via differentiable cross-validation while the agent interacts with and learns from the environment. We present the Self-Tuning Actor-Critic (STAC), which uses this process to tune the hyperparameters of the usual loss function of the IMPALA actor-critic agent (Espeholt et al., 2018), to learn the hyperparameters that define auxiliary loss functions, and to balance trade-offs in off-policy learning by introducing and adapting the hyperparameters of a novel leaky V-trace operator. The method is simple to use, sample efficient, and does not require a significant increase in compute. Ablation studies show that the overall performance of STAC improves as we adapt more hyperparameters. Applied to 57 games in the Atari 2600 environment over 200 million frames, our algorithm improves the median human-normalized score of the baseline from 243% to 364%. |
Tasks | |
Published | 2020-02-28 |
URL | https://arxiv.org/abs/2002.12928v2 |
https://arxiv.org/pdf/2002.12928v2.pdf | |
PWC | https://paperswithcode.com/paper/self-tuning-deep-reinforcement-learning |
Repo | |
Framework | |
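The leaky V-trace operator can be reconstructed from the abstract's description: instead of hard-clipping the importance ratios as in standard V-trace, a fraction of the unclipped ratio leaks through. The numpy sketch below follows that reading, with a single leak coefficient alpha as an assumption (the paper may use separate coefficients for the rho and c ratios); it is not the authors' code.

```python
import numpy as np

def leaky_vtrace(rewards, values, is_ratios, gamma=0.99,
                 rho_bar=1.0, c_bar=1.0, alpha=1.0):
    """Leaky V-trace targets; alpha=1 recovers standard clipped V-trace.

    rewards: (T,); values: (T+1,) including the bootstrap value;
    is_ratios: (T,) importance sampling ratios pi/mu.
    """
    rho = alpha * np.minimum(is_ratios, rho_bar) + (1 - alpha) * is_ratios
    c = alpha * np.minimum(is_ratios, c_bar) + (1 - alpha) * is_ratios
    T = len(rewards)
    vs = np.zeros(T + 1)
    vs[T] = values[T]
    for t in reversed(range(T)):  # standard backward V-trace recursion
        delta = rho[t] * (rewards[t] + gamma * values[t + 1] - values[t])
        vs[t] = values[t] + delta + gamma * c[t] * (vs[t + 1] - values[t + 1])
    return vs[:T]
```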
Progressive Domain-Independent Feature Decomposition Network for Zero-Shot Sketch-Based Image Retrieval
Title | Progressive Domain-Independent Feature Decomposition Network for Zero-Shot Sketch-Based Image Retrieval |
Authors | Xinxun Xu, Cheng Deng, Muli Yang, Hao Wang |
Abstract | Zero-shot sketch-based image retrieval (ZS-SBIR) is a cross-modal retrieval task that searches natural images given free-hand sketches under the zero-shot scenario. Most existing methods solve this problem by simultaneously projecting visual features and semantic supervision into a low-dimensional common space for efficient retrieval. However, such low-dimensional projection destroys the completeness of the semantic knowledge in the original semantic space, so useful knowledge cannot be transferred well when learning semantics from different modalities. Moreover, domain information and semantic information are entangled in the visual features, which hinders cross-modal matching by obstructing the reduction of the domain gap between sketches and images. In this paper, we propose a Progressive Domain-independent Feature Decomposition (PDFD) network for ZS-SBIR. Specifically, under the supervision of the original semantic knowledge, PDFD decomposes visual features into domain features and semantic features, and the semantic features are then projected into the common space as retrieval features for ZS-SBIR. This progressive projection strategy maintains strong semantic supervision. Besides, to guarantee that the retrieval features capture clean and complete semantic information, a cross-reconstruction loss is introduced to encourage any combination of retrieval features and domain features to reconstruct the visual features. Extensive experiments demonstrate the superiority of PDFD over state-of-the-art competitors. |
Tasks | Cross-Modal Retrieval, Image Retrieval, Sketch-Based Image Retrieval |
Published | 2020-03-22 |
URL | https://arxiv.org/abs/2003.09869v1 |
https://arxiv.org/pdf/2003.09869v1.pdf | |
PWC | https://paperswithcode.com/paper/progressive-domain-independent-feature |
Repo | |
Framework | |