October 18, 2019

3365 words 16 mins read

Paper Group ANR 507

A Hitchhiker’s Guide On Distributed Training of Deep Neural Networks. Pixel-level Reconstruction and Classification for Noisy Handwritten Bangla Characters. Transfer Learning for Clinical Time Series Analysis using Recurrent Neural Networks. Deep Learning Based Instance Segmentation in 3D Biomedical Images Using Weak Annotation. Towards Automatic S …

A Hitchhiker’s Guide On Distributed Training of Deep Neural Networks

Title A Hitchhiker’s Guide On Distributed Training of Deep Neural Networks
Authors Karanbir Chahal, Manraj Singh Grover, Kuntal Dey
Abstract Deep learning has led to tremendous advancements in the field of Artificial Intelligence. One caveat, however, is the substantial amount of compute needed to train these deep learning models. Training on a benchmark dataset like ImageNet on a single machine with a modern GPU can take up to a week; distributing training over multiple machines has been observed to bring this time down drastically. Recent work has brought ImageNet training time down to as low as 4 minutes by using a cluster of 2048 GPUs. This paper surveys the various algorithms and techniques used to distribute training and presents the current state of the art for a modern distributed training framework. More specifically, we explore the synchronous and asynchronous variants of distributed Stochastic Gradient Descent, various All-Reduce gradient aggregation strategies, and best practices for obtaining higher throughput and lower latency over a cluster, such as mixed precision training, large batch training and gradient compression.
Tasks
Published 2018-10-28
URL http://arxiv.org/abs/1810.11787v1
PDF http://arxiv.org/pdf/1810.11787v1.pdf
PWC https://paperswithcode.com/paper/a-hitchhikers-guide-on-distributed-training
Repo
Framework
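
The survey above covers All-Reduce gradient aggregation for synchronous distributed SGD. Below is a minimal, self-contained sketch of a ring all-reduce that averages per-worker gradients; the simulated in-process "workers", gradient sizes and communication loop are toy assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def ring_allreduce(grads):
    """Average a list of per-worker gradient vectors with a ring all-reduce.

    grads: list of N equal-length 1-D arrays (one per simulated worker).
    Returns a list of N arrays, all equal to the element-wise mean.
    """
    n = len(grads)
    chunks = [np.array_split(g.astype(float).copy(), n) for g in grads]

    # Reduce-scatter: after n-1 steps, worker i holds the fully summed chunk (i + 1) % n.
    for step in range(n - 1):
        # Snapshot the outgoing chunks so every "send" uses the state at the start of the step.
        outgoing = [chunks[i][(i - step) % n].copy() for i in range(n)]
        for i in range(n):
            dst = (i + 1) % n
            c = (i - step) % n
            chunks[dst][c] = chunks[dst][c] + outgoing[i]

    # All-gather: circulate the reduced chunks so every worker ends with all of them.
    for step in range(n - 1):
        outgoing = [chunks[i][(i + 1 - step) % n].copy() for i in range(n)]
        for i in range(n):
            dst = (i + 1) % n
            c = (i + 1 - step) % n
            chunks[dst][c] = outgoing[i]

    return [np.concatenate(ch) / n for ch in chunks]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    worker_grads = [rng.normal(size=8) for _ in range(4)]
    averaged = ring_allreduce(worker_grads)
    assert np.allclose(averaged[0], np.mean(worker_grads, axis=0))
    print("ring all-reduce matches the plain mean:", averaged[0])
```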

Pixel-level Reconstruction and Classification for Noisy Handwritten Bangla Characters

Title Pixel-level Reconstruction and Classification for Noisy Handwritten Bangla Characters
Authors Manohar Karki, Qun Liu, Robert DiBiano, Saikat Basu, Supratik Mukhopadhyay
Abstract Classification techniques for images of handwritten characters are susceptible to noise. Quadtrees can be an efficient representation for learning from sparse features. In this paper, we improve the effectiveness of probabilistic quadtrees by using a pixel-level classifier to extract the character pixels and remove noise from handwritten character images. The pixel-level denoiser (a deep belief network) uses the map responses obtained from a pretrained CNN as features for reconstructing the characters while eliminating noise. We experimentally demonstrate the effectiveness of our approach by reconstructing and classifying noisy versions of the handwritten Bangla Numeral and Basic Character datasets.
Tasks
Published 2018-06-21
URL http://arxiv.org/abs/1806.08037v1
PDF http://arxiv.org/pdf/1806.08037v1.pdf
PWC https://paperswithcode.com/paper/pixel-level-reconstruction-and-classification
Repo
Framework
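
The abstract above builds on probabilistic quadtrees as a sparse representation of character images. Below is a plain (non-probabilistic) quadtree decomposition sketch in NumPy, applied to a synthetic stroke image standing in for a real Bangla character; it only illustrates why quadtrees compactly summarize sparse pixel data, not the paper's denoising pipeline.

```python
import numpy as np

def quadtree_leaves(img, x0=0, y0=0, size=None, min_size=2):
    """Recursively split a power-of-two square binary image into uniform blocks.

    Returns a list of (x, y, size, mean_intensity) leaves -- a compact,
    sparse summary of where the character pixels are.
    """
    if size is None:
        size = img.shape[0]
    block = img[y0:y0 + size, x0:x0 + size]
    mean = float(block.mean())
    # Stop when the block is uniform or too small to split further.
    if size <= min_size or mean in (0.0, 1.0):
        return [(x0, y0, size, mean)]
    half = size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            leaves += quadtree_leaves(img, x0 + dx, y0 + dy, half, min_size)
    return leaves

if __name__ == "__main__":
    img = np.zeros((32, 32), dtype=float)
    img[8:24, 14:18] = 1.0          # a crude vertical stroke
    leaves = quadtree_leaves(img)
    print(len(leaves), "leaves summarize", img.size, "pixels")
```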

Transfer Learning for Clinical Time Series Analysis using Recurrent Neural Networks

Title Transfer Learning for Clinical Time Series Analysis using Recurrent Neural Networks
Authors Priyanka Gupta, Pankaj Malhotra, Lovekesh Vig, Gautam Shroff
Abstract Deep neural networks have shown promising results for various clinical prediction tasks such as diagnosis, mortality prediction, predicting duration of stay in hospital, etc. However, training deep networks – such as those based on Recurrent Neural Networks (RNNs) – requires large labeled data, high computational resources, and significant hyperparameter tuning effort. In this work, we investigate to what extent transfer learning can address these issues when using deep RNNs to model multivariate clinical time series. We consider transferring the knowledge captured in an RNN trained on several source tasks simultaneously using a large labeled dataset to build the model for a target task with limited labeled data. An RNN pre-trained on several tasks provides generic features, which are then used to build simpler linear models for new target tasks without training task-specific RNNs. For evaluation, we train a deep RNN to identify several patient phenotypes on time series from the MIMIC-III database, and then use the features extracted using that RNN to build classifiers for identifying previously unseen phenotypes, as well as for the seemingly unrelated task of in-hospital mortality prediction. We demonstrate that (i) models trained on features extracted using the pre-trained RNN outperform or, in the worst case, perform as well as task-specific RNNs; (ii) the models using features from pre-trained models are more robust to the size of labeled data than task-specific RNNs; and (iii) features extracted using the pre-trained RNN are generic enough to perform better than typical statistical hand-crafted features.
Tasks Mortality Prediction, Time Series, Time Series Analysis, Transfer Learning
Published 2018-07-04
URL http://arxiv.org/abs/1807.01705v1
PDF http://arxiv.org/pdf/1807.01705v1.pdf
PWC https://paperswithcode.com/paper/transfer-learning-for-clinical-time-series
Repo
Framework
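
The workflow described above (freeze a pre-trained RNN, reuse its final hidden state as generic features, and fit a simple linear model for the target task) can be sketched as follows. This is a hedged illustration in PyTorch with random data and a randomly initialised GRU standing in for the RNN pre-trained on MIMIC-III source tasks; shapes and hyperparameters are arbitrary assumptions.

```python
import torch
import torch.nn as nn

# Stand-in for an RNN pre-trained on several source phenotyping tasks.
# Here it is randomly initialised purely to illustrate the workflow.
feature_rnn = nn.GRU(input_size=16, hidden_size=64, num_layers=1, batch_first=True)
for p in feature_rnn.parameters():
    p.requires_grad_(False)        # freeze: features only, no fine-tuning

target_head = nn.Linear(64, 1)     # simple linear model for the target task
opt = torch.optim.Adam(target_head.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

# Toy multivariate clinical time series: (batch, time, variables) and binary labels.
x = torch.randn(32, 48, 16)
y = (torch.rand(32, 1) > 0.5).float()

for _ in range(100):
    with torch.no_grad():
        _, h_n = feature_rnn(x)    # h_n: (num_layers, batch, hidden)
    features = h_n[-1]             # last layer's final hidden state as generic features
    loss = loss_fn(target_head(features), y)
    opt.zero_grad()
    loss.backward()
    opt.step()

print("final training loss:", float(loss))
```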

Deep Learning Based Instance Segmentation in 3D Biomedical Images Using Weak Annotation

Title Deep Learning Based Instance Segmentation in 3D Biomedical Images Using Weak Annotation
Authors Zhuo Zhao, Lin Yang, Hao Zheng, Ian H. Guldner, Siyuan Zhang, Danny Z. Chen
Abstract Instance segmentation in 3D images is a fundamental task in biomedical image analysis. While deep learning models often work well for 2D instance segmentation, 3D instance segmentation still faces critical challenges, such as insufficient training data due to various annotation difficulties in 3D biomedical images. Common 3D annotation methods (e.g., full voxel annotation) incur high workloads and costs for labeling enough instances for training deep learning 3D instance segmentation models. In this paper, we propose a new weak annotation approach for training a fast deep learning 3D instance segmentation model without using full voxel mask annotation. Our approach needs only 3D bounding boxes for all instances and full voxel annotation for a small fraction of the instances, and uses a novel two-stage 3D instance segmentation model utilizing these two kinds of annotation, respectively. We evaluate our approach on several biomedical image datasets, and the experimental results show that (1) with fully annotated boxes and a small number of masks, our approach can achieve performance similar to the best known methods using full annotation, and (2) with similar annotation time, our approach outperforms the best known methods that use full annotation.
Tasks 3D Instance Segmentation, Instance Segmentation, Semantic Segmentation
Published 2018-06-28
URL http://arxiv.org/abs/1806.11137v1
PDF http://arxiv.org/pdf/1806.11137v1.pdf
PWC https://paperswithcode.com/paper/deep-learning-based-instance-segmentation-in
Repo
Framework

Towards Automatic Speech Identification from Vocal Tract Shape Dynamics in Real-time MRI

Title Towards Automatic Speech Identification from Vocal Tract Shape Dynamics in Real-time MRI
Authors Pramit Saha, Praneeth Srungarapu, Sidney Fels
Abstract Vocal tract configurations play a vital role in generating distinguishable speech sounds, by modulating the airflow and creating different resonant cavities in speech production. They contain abundant information that can be utilized to better understand the underlying speech production mechanism. As a step towards automatic mapping of vocal tract shape geometry to acoustics, this paper employs effective video action recognition techniques, such as Long-term Recurrent Convolutional Network (LRCN) models, to identify different vowel-consonant-vowel (VCV) sequences from the dynamic shaping of the vocal tract. Such a model typically combines a CNN-based deep hierarchical visual feature extractor with recurrent networks, which ideally makes the network spatio-temporally deep enough to learn the sequential dynamics of a short video clip for video classification tasks. We use a database consisting of 2D real-time MRI of vocal tract shaping during VCV utterances by 17 speakers. The comparative performances of this class of algorithms under various parameter settings and for various classification tasks are discussed. Interestingly, the results show a marked difference in model performance in the context of speech classification with respect to generic sequence or video classification tasks.
Tasks Temporal Action Localization, Video Classification
Published 2018-07-29
URL http://arxiv.org/abs/1807.11089v1
PDF http://arxiv.org/pdf/1807.11089v1.pdf
PWC https://paperswithcode.com/paper/towards-automatic-speech-identification-from
Repo
Framework
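
An LRCN-style model of the kind used above combines a per-frame CNN feature extractor with a recurrent layer over time. The sketch below is a tiny PyTorch stand-in whose layer sizes, frame resolution and clip length are invented for illustration; it is not the authors' architecture or training setup.

```python
import torch
import torch.nn as nn

class TinyLRCN(nn.Module):
    """Minimal LRCN-style classifier: a per-frame CNN feeding an LSTM."""
    def __init__(self, n_classes=10):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(8, 16, kernel_size=3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.rnn = nn.LSTM(input_size=16, hidden_size=32, batch_first=True)
        self.cls = nn.Linear(32, n_classes)

    def forward(self, clips):                # clips: (batch, time, 1, H, W)
        b, t = clips.shape[:2]
        frames = clips.flatten(0, 1)         # fold time into the batch for the CNN
        feats = self.cnn(frames).flatten(1)  # (b*t, 16)
        feats = feats.view(b, t, -1)
        _, (h_n, _) = self.rnn(feats)
        return self.cls(h_n[-1])             # classify from the last hidden state

model = TinyLRCN(n_classes=5)
dummy = torch.randn(2, 12, 1, 64, 64)        # 2 clips, 12 MRI frames each
print(model(dummy).shape)                    # torch.Size([2, 5])
```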

CNN-based Preprocessing to Optimize Watershed-based Cell Segmentation in 3D Confocal Microscopy Images

Title CNN-based Preprocessing to Optimize Watershed-based Cell Segmentation in 3D Confocal Microscopy Images
Authors Dennis Eschweiler, Thiago V. Spina, Rohan C. Choudhury, Elliot Meyerowitz, Alexandre Cunha, Johannes Stegmaier
Abstract The quantitative analysis of cellular membranes helps in understanding developmental processes at the cellular level. 3D microscopic image data in particular offers valuable insights into cell dynamics, but error-free automatic segmentation remains challenging due to the huge amount of data generated and strong variations in image intensities. In this paper, we propose a new 3D segmentation approach which combines the discriminative power of convolutional neural networks (CNNs) for preprocessing and investigates the performance of three watershed-based postprocessing strategies (WS), which are well suited to segment object shapes, even when supplied with vague seed and boundary constraints. To leverage the full potential of the watershed algorithm, the multi-instance segmentation problem is initially interpreted as a three-class semantic segmentation problem, which in turn is well suited for the application of CNNs. Using manually annotated 3D confocal microscopy images of Arabidopsis thaliana, we show the superior performance of the proposed method compared to the state of the art.
Tasks Cell Segmentation, Instance Segmentation, Semantic Segmentation
Published 2018-10-16
URL http://arxiv.org/abs/1810.06933v1
PDF http://arxiv.org/pdf/1810.06933v1.pdf
PWC https://paperswithcode.com/paper/cnn-based-preprocessing-to-optimize-watershed
Repo
Framework
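
The postprocessing idea above (turn a three-class semantic prediction into instances with a seeded watershed) can be illustrated with scikit-image in 2D. The "prediction" below is synthetic, and the seeding and masking choices are one plausible variant rather than the paper's exact WS strategies.

```python
import numpy as np
from scipy import ndimage as ndi
from skimage.segmentation import watershed

# Synthetic "three-class prediction": 0 = background, 1 = membrane, 2 = cell interior.
pred = np.zeros((64, 64), dtype=int)
pred[5:30, 5:59] = 2           # interior of cell 1
pred[34:59, 5:59] = 2          # interior of cell 2
pred[30:34, 5:59] = 1          # membrane separating the two cells

interior = pred == 2
foreground = pred > 0                      # membrane + interior; background excluded
seeds, n_cells = ndi.label(interior)       # one seed per connected interior region
distance = ndi.distance_transform_edt(foreground)
labels = watershed(-distance, markers=seeds, mask=foreground)

print(n_cells, "seeds ->", labels.max(), "segmented cells")
```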

Spatio-Temporal Road Scene Reconstruction using Superpixel Markov Random Field

Title Spatio-Temporal Road Scene Reconstruction using Superpixel Markov Random Field
Authors Yaochen Li, Yuehu Liu, Jihua Zhu, Shiqi Ma, Zhenning Niu, Rui Guo
Abstract Scene model construction based on image rendering is an indispensable but challenging technique in computer vision and intelligent transportation systems. In this paper, we propose a framework for constructing 3D corridor-based road scene models. It consists of two successive stages: road detection and scene construction. The road detection is realized by a new superpixel Markov random field (MRF) algorithm. The data fidelity term in the MRF’s energy function is jointly computed according to the superpixel features of color, texture and location. The smoothness term is established on the basis of the interaction of spatio-temporally adjacent superpixels. In the subsequent scene construction, the foreground and background regions are modeled independently. Experiments on road detection demonstrate that the proposed method outperforms the state of the art in both accuracy and speed. The scene construction experiments confirm that the proposed scene models show better correctness ratios and have the potential to support a range of applications.
Tasks
Published 2018-11-24
URL https://arxiv.org/abs/1811.09790v3
PDF https://arxiv.org/pdf/1811.09790v3.pdf
PWC https://paperswithcode.com/paper/spatio-temporal-road-scene-reconstruction
Repo
Framework
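
The road-detection stage above labels superpixels by minimizing an MRF energy with a color-based data term and a spatial smoothness term. The toy sketch below replaces real superpixels with fixed image blocks and solves the labeling with iterated conditional modes on a synthetic scene; the color prototypes, smoothness weight and block grid are assumptions for illustration only, and the paper additionally uses texture, location and temporal interactions.

```python
import numpy as np

# Toy scene: gray "road" in the lower half, blue "sky" above, plus noise.
rng = np.random.default_rng(0)
img = np.zeros((64, 64, 3))
img[:32] = [0.2, 0.4, 0.9]                 # sky-ish
img[32:] = [0.5, 0.5, 0.5]                 # road-ish
img += rng.normal(scale=0.08, size=img.shape)

# Stand-in "superpixels": an 8x8 grid of blocks (a real system would use e.g. SLIC).
B = 8
means = img.reshape(8, B, 8, B, 3).mean(axis=(1, 3))   # (8, 8, 3) mean color per block

road_proto = np.array([0.5, 0.5, 0.5])
other_proto = np.array([0.2, 0.4, 0.9])
# Data term: squared color distance to each class prototype, shape (8, 8, 2).
data = np.stack([((means - other_proto) ** 2).sum(-1),
                 ((means - road_proto) ** 2).sum(-1)], axis=-1)

labels = data.argmin(-1)                   # initial labeling from the data term alone
lam = 0.05                                 # weight of the Potts smoothness term

# Iterated conditional modes: greedily minimize data + smoothness energy.
for _ in range(5):
    for i in range(8):
        for j in range(8):
            best, best_e = labels[i, j], np.inf
            for lab in (0, 1):
                e = data[i, j, lab]
                for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ni, nj = i + di, j + dj
                    if 0 <= ni < 8 and 0 <= nj < 8:
                        e += lam * (labels[ni, nj] != lab)
                if e < best_e:
                    best, best_e = lab, e
            labels[i, j] = best

print("road blocks per grid row:", labels.sum(axis=1))
```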

Learning Curriculum Policies for Reinforcement Learning

Title Learning Curriculum Policies for Reinforcement Learning
Authors Sanmit Narvekar, Peter Stone
Abstract Curriculum learning in reinforcement learning is a training methodology that seeks to speed up learning of a difficult target task by first training on a series of simpler tasks and transferring the knowledge acquired to the target task. Automatically choosing a sequence of such tasks (i.e. a curriculum) is an open problem that has been the subject of much recent work in this area. In this paper, we build upon a recent method for curriculum design, which formulates the curriculum sequencing problem as a Markov Decision Process. We extend this model to handle multiple transfer learning algorithms, and show for the first time that a curriculum policy over this MDP can be learned from experience. We explore various representations that make this possible, and evaluate our approach by learning curriculum policies for multiple agents in two different domains. The results show that our method produces curricula that can train agents to perform on a target task as fast as or faster than existing methods.
Tasks Transfer Learning
Published 2018-12-01
URL http://arxiv.org/abs/1812.00285v1
PDF http://arxiv.org/pdf/1812.00285v1.pdf
PWC https://paperswithcode.com/paper/learning-curriculum-policies-for
Repo
Framework
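
The core idea above, treating curriculum sequencing as an MDP whose actions are source tasks, can be illustrated with a toy tabular learner. The simulated "learning progress" function and the three named tasks below are invented assumptions; the paper's method additionally handles multiple transfer-learning algorithms and learns from real agent experience rather than a hand-coded simulator.

```python
import numpy as np

rng = np.random.default_rng(0)
N_SKILL, GOAL = 6, 5
TASKS = ["easy", "medium", "hard"]

def train_on(task, skill):
    """Simulated learning progress: each source task helps most at a certain skill level."""
    sweet_spot = {"easy": 0, "medium": 2, "hard": 4}[task]
    p_improve = max(0.05, 0.9 - 0.3 * abs(skill - sweet_spot))
    return min(GOAL, skill + 1) if rng.random() < p_improve else skill

# Tabular Q-learning over the curriculum MDP: state = current skill, action = next source task.
Q = np.zeros((N_SKILL, len(TASKS)))
alpha, gamma, eps = 0.1, 0.99, 0.2
for _ in range(2000):
    skill = 0
    while skill < GOAL:
        a = rng.integers(len(TASKS)) if rng.random() < eps else int(Q[skill].argmax())
        nxt = train_on(TASKS[a], skill)
        reward = -1.0                      # every training step costs time
        target = reward + (0.0 if nxt == GOAL else gamma * Q[nxt].max())
        Q[skill, a] += alpha * (target - Q[skill, a])
        skill = nxt

print("learned curriculum:", [TASKS[int(Q[s].argmax())] for s in range(GOAL)])
```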

Automatic Estimation of Modulation Transfer Functions

Title Automatic Estimation of Modulation Transfer Functions
Authors Matthias Bauer, Valentin Volchkov, Michael Hirsch, Bernhard Schölkopf
Abstract The modulation transfer function (MTF) is widely used to characterise the performance of optical systems. Measuring it is costly and it is thus rarely available for a given lens specimen. Instead, MTFs based on simulations or, at best, MTFs measured on other specimens of the same lens are used. Fortunately, images recorded through an optical system contain ample information about its MTF, only that it is confounded with the statistics of the images. This work presents a method to estimate the MTF of camera lens systems directly from photographs, without the need for expensive equipment. We use a custom grid display to accurately measure the point response of lenses to acquire ground truth training data. We then use the same lenses to record natural images and employ a data-driven supervised learning approach using a convolutional neural network to estimate the MTF on small image patches, aggregating the information into MTF charts over the entire field of view. The method generalises to unseen lenses and can be applied to single photographs, with performance improving if multiple photographs are available.
Tasks
Published 2018-05-04
URL http://arxiv.org/abs/1805.01872v1
PDF http://arxiv.org/pdf/1805.01872v1.pdf
PWC https://paperswithcode.com/paper/automatic-estimation-of-modulation-transfer
Repo
Framework
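
For reference, the MTF itself is simply the normalized magnitude of the Fourier transform of the system's point (or line) spread function. The snippet below computes it for a synthetic Gaussian line spread function; it illustrates the quantity being estimated, not the paper's CNN-based estimation from natural photographs, and the sensor geometry and blur width are arbitrary assumptions.

```python
import numpy as np

# Synthetic line spread function (LSF): a Gaussian blur profile stands in for the
# measured point response of a lens along one direction.
x = np.linspace(-0.5, 0.5, 256)                 # position across the sensor, in mm
sigma_mm = 0.01
lsf = np.exp(-x**2 / (2 * sigma_mm**2))

# The MTF is the magnitude of the Fourier transform of the LSF, normalized to 1 at DC.
mtf = np.abs(np.fft.rfft(lsf))
mtf /= mtf[0]
freqs = np.fft.rfftfreq(x.size, d=x[1] - x[0])  # spatial frequency in cycles/mm

# MTF50: the frequency at which contrast drops to 50%.
mtf50 = freqs[np.argmax(mtf < 0.5)]
print("MTF50 ~=", round(float(mtf50), 1), "cycles/mm")
```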

Gotta Adapt ‘Em All: Joint Pixel and Feature-Level Domain Adaptation for Recognition in the Wild

Title Gotta Adapt ‘Em All: Joint Pixel and Feature-Level Domain Adaptation for Recognition in the Wild
Authors Luan Tran, Kihyuk Sohn, Xiang Yu, Xiaoming Liu, Manmohan Chandraker
Abstract Recent developments in deep domain adaptation have allowed knowledge transfer from a labeled source domain to an unlabeled target domain at the level of intermediate features or input pixels. We propose that advantages may be derived by combining them, in the form of different insights that lead to a novel design and complementary properties that result in better performance. At the feature level, inspired by insights from semi-supervised learning, we propose a classification-aware domain adversarial neural network that brings target examples into more classifiable regions of the source domain. Next, we posit that computer vision insights are more amenable to injection at the pixel level. In particular, we use 3D geometry and image synthesis based on a generalized appearance flow to preserve identity across pose transformations, while using an attribute-conditioned CycleGAN to translate a single source into multiple target images that differ in lower-level properties such as lighting. Besides a standard UDA benchmark, we validate on a novel and apt problem of car recognition in unlabeled surveillance images using labeled images from the web, handling explicitly specified, nameable factors of variation through pixel-level adaptation and implicit, unspecified factors through feature-level adaptation.
Tasks Domain Adaptation, Image Generation, Transfer Learning
Published 2018-02-28
URL https://arxiv.org/abs/1803.00068v2
PDF https://arxiv.org/pdf/1803.00068v2.pdf
PWC https://paperswithcode.com/paper/joint-pixel-and-feature-level-domain
Repo
Framework
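
The feature-level part of the approach above is a domain-adversarial network. Below is a minimal sketch of the standard gradient-reversal formulation with random stand-in data and arbitrary layer sizes; the paper's classification-aware variant and its pixel-level CycleGAN/appearance-flow components are not reproduced here.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass, negated (scaled) gradient in the backward pass --
    the standard gradient-reversal trick used by domain-adversarial feature learning."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad_out):
        return -ctx.lam * grad_out, None

feature_net = nn.Sequential(nn.Linear(128, 64), nn.ReLU())
label_head = nn.Linear(64, 10)     # source-label classifier
domain_head = nn.Linear(64, 2)     # source-vs-target discriminator
opt = torch.optim.Adam(list(feature_net.parameters()) + list(label_head.parameters())
                       + list(domain_head.parameters()), lr=1e-3)
ce = nn.CrossEntropyLoss()

src_x, src_y = torch.randn(32, 128), torch.randint(0, 10, (32,))
tgt_x = torch.randn(32, 128)

for step in range(100):
    f_src, f_tgt = feature_net(src_x), feature_net(tgt_x)
    cls_loss = ce(label_head(f_src), src_y)
    feats = torch.cat([f_src, f_tgt])
    domains = torch.cat([torch.zeros(32, dtype=torch.long), torch.ones(32, dtype=torch.long)])
    dom_loss = ce(domain_head(GradReverse.apply(feats, 1.0)), domains)
    loss = cls_loss + dom_loss      # the reversal pushes features to confuse the domains
    opt.zero_grad()
    loss.backward()
    opt.step()

print("final losses:", float(cls_loss), float(dom_loss))
```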

Online Newton Step Algorithm with Estimated Gradient

Title Online Newton Step Algorithm with Estimated Gradient
Authors Binbin Liu, Jundong Li, Yunquan Song, Xijun Liang, Ling Jian, Huan Liu
Abstract Online learning with limited information feedback (bandit) tries to solve the problem where an online learner receives partial feedback information from the environment in the course of learning. Under this setting, Flaxman et al. [8] extended Zinkevich’s classical Online Gradient Descent (OGD) algorithm [29] by proposing the Online Gradient Descent with Expected Gradient (OGDEG) algorithm. Specifically, it uses a simple trick to approximate the gradient of the loss function $f_t$ by evaluating it at a single point and bounds the expected regret as $\mathcal{O}(T^{5/6})$ [8], where the number of rounds is $T$. Meanwhile, past research efforts have shown that, compared with first-order algorithms, second-order online learning algorithms such as the Online Newton Step (ONS) [11] can significantly accelerate the convergence rate of traditional online learning algorithms. Motivated by this, this paper aims to exploit second-order information to speed up the convergence of the OGDEG algorithm. In particular, we extend the ONS algorithm with the trick of expected gradient and develop a novel second-order online learning algorithm, i.e., Online Newton Step with Expected Gradient (ONSEG). Theoretically, we show that the proposed ONSEG algorithm significantly reduces the expected regret of the OGDEG algorithm from $\mathcal{O}(T^{5/6})$ to $\mathcal{O}(T^{2/3})$ in the bandit feedback scenario. Empirically, we further demonstrate the advantages of the proposed algorithm on multiple real-world datasets.
Tasks
Published 2018-11-25
URL http://arxiv.org/abs/1811.09955v3
PDF http://arxiv.org/pdf/1811.09955v3.pdf
PWC https://paperswithcode.com/paper/online-newton-step-algorithm-with-estimated
Repo
Framework
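
The two ingredients above, a one-point (bandit) gradient estimate and an Online Newton Step update, can be sketched in a few lines of NumPy. The fixed quadratic loss, the untuned hyperparameters and the plain Euclidean projection below are toy assumptions (the full algorithm projects in the norm induced by the matrix A), so this illustrates the update rules rather than the ONSEG algorithm itself.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T = 5, 3000
delta, gamma = 0.1, 0.5           # exploration radius and ONS step parameter (untuned)
radius = 2.0                      # feasible set: Euclidean ball of this radius
x = np.zeros(d)
A = np.eye(d)                     # running matrix of gradient outer products (plus identity)
x_star = rng.normal(size=d)
x_star /= np.linalg.norm(x_star)  # hidden optimum of the (fixed) loss

def loss(point):
    """Bandit feedback: only this scalar value is revealed to the learner."""
    return float(np.linalg.norm(point - x_star) ** 2)

total = 0.0
for _ in range(T):
    u = rng.normal(size=d)
    u /= np.linalg.norm(u)                      # random direction on the unit sphere
    f_val = loss(x + delta * u)                 # single-point evaluation
    total += f_val
    g_hat = (d / delta) * f_val * u             # one-point gradient estimate
    A += np.outer(g_hat, g_hat)
    x = x - np.linalg.solve(A, g_hat) / gamma   # Newton-style step
    # Keep the iterate feasible (Euclidean projection onto the ball for simplicity).
    n = np.linalg.norm(x)
    if n > radius:
        x *= radius / n

print("average loss over", T, "rounds:", total / T)
print("final distance to optimum:", np.linalg.norm(x - x_star))
```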

Meta Reinforcement Learning with Distribution of Exploration Parameters Learned by Evolution Strategies

Title Meta Reinforcement Learning with Distribution of Exploration Parameters Learned by Evolution Strategies
Authors Yiming Shen, Kehan Yang, Yufeng Yuan, Simon Cheng Liu
Abstract In this paper, we propose a novel meta-learning method in a reinforcement learning setting, based on evolution strategies (ES), exploration in parameter space and deterministic policy gradients. ES methods are easy to parallelize, which is desirable for modern training architectures; however, such methods typically require a huge number of samples for effective training. We use deterministic policy gradients during adaptation and other techniques to compensate for the sample-efficiency problem while maintaining the inherent scalability of ES methods. We demonstrate that our method achieves good results compared to gradient-based meta-learning in high-dimensional control tasks in the MuJoCo simulator. In addition, because the meta-training phase is gradient-free and does not need information about the gradients and policies used during adaptation training, we predict and confirm that our algorithm performs better in tasks that need multi-step adaptation.
Tasks Meta-Learning
Published 2018-12-29
URL https://arxiv.org/abs/1812.11314v2
PDF https://arxiv.org/pdf/1812.11314v2.pdf
PWC https://paperswithcode.com/paper/meta-reinforcement-learning-with-distribution
Repo
Framework
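
As background for the gradient-free meta-training phase above, a basic evolution-strategies update estimates a parameter gradient from the returns of randomly perturbed parameter vectors. The sketch below optimizes a toy objective with antithetic sampling; the objective, population size and learning rate are invented, and the paper's deterministic-policy-gradient adaptation and exploration-parameter distribution are not reproduced.

```python
import numpy as np

def reward(theta):
    """Toy objective standing in for an RL return: higher is better."""
    return -np.sum((theta - 3.0) ** 2)

rng = np.random.default_rng(0)
dim, pop, sigma, lr = 10, 50, 0.1, 0.02
theta = np.zeros(dim)

for gen in range(300):
    # Antithetic sampling: evaluate +eps and -eps perturbations of the parameters.
    eps = rng.normal(size=(pop, dim))
    returns = np.array([reward(theta + sigma * e) - reward(theta - sigma * e) for e in eps])
    # ES gradient estimate: returns-weighted average of the perturbations.
    grad = (eps * returns[:, None]).mean(axis=0) / (2 * sigma)
    theta += lr * grad

print("distance to optimum after ES:", np.linalg.norm(theta - 3.0))
```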

One-Bit OFDM Receivers via Deep Learning

Title One-Bit OFDM Receivers via Deep Learning
Authors Eren Balevi, Jeffrey G. Andrews
Abstract This paper develops novel deep learning-based architectures and design methodologies for an orthogonal frequency division multiplexing (OFDM) receiver under the constraint of one-bit complex quantization. Single bit quantization greatly reduces complexity and power consumption, but makes accurate channel estimation and data detection difficult. This is particularly true for multicarrier waveforms, which have high peak-to-average ratio in the time domain and fragile subcarrier orthogonality in the frequency domain. The severe distortion for one-bit quantization typically results in an error floor even at moderately low signal-to-noise-ratio (SNR) such as 5 dB. For channel estimation (using pilots), we design a novel generative supervised deep neural network (DNN) that can be trained with a reasonable number of pilots. After channel estimation, a neural network-based receiver – specifically, an autoencoder – jointly learns a precoder and decoder for data symbol detection. Since quantization prevents end-to-end training, we propose a two-step sequential training policy for this model. With synthetic data, our deep learning-based channel estimation can outperform least squares (LS) channel estimation for unquantized (full-resolution) OFDM at average SNRs up to 14 dB. For data detection, our proposed design achieves lower bit error rate (BER) in fading than unquantized OFDM at average SNRs up to 10 dB.
Tasks Quantization
Published 2018-11-02
URL https://arxiv.org/abs/1811.00971v2
PDF https://arxiv.org/pdf/1811.00971v2.pdf
PWC https://paperswithcode.com/paper/one-bit-ofdm-receivers-via-deep-learning
Repo
Framework
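
The constraint the paper addresses, one-bit complex quantization of an OFDM waveform, is easy to illustrate: keep only the signs of the I and Q components of the received time-domain signal. The NumPy sketch below (QPSK subcarriers, noise level and normalizations are arbitrary assumptions) shows how much a naive FFT receiver is distorted by this quantization; the paper's DNN-based channel estimator and autoencoder detector are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sc = 64                                     # subcarriers per OFDM symbol
# Random QPSK symbols on each subcarrier.
bits = rng.integers(0, 2, size=(2, n_sc))
qpsk = ((2 * bits[0] - 1) + 1j * (2 * bits[1] - 1)) / np.sqrt(2)

tx = np.fft.ifft(qpsk) * np.sqrt(n_sc)        # time-domain OFDM symbol (unit power)
noise = (rng.normal(size=n_sc) + 1j * rng.normal(size=n_sc)) / np.sqrt(2) * 0.1
rx = tx + noise

# One-bit complex quantization: keep only the sign of the I and Q components.
rx_1bit = (np.sign(rx.real) + 1j * np.sign(rx.imag)) / np.sqrt(2)

# Naive demodulation (FFT back to the frequency domain) to show the distortion.
est_full = np.fft.fft(rx) / np.sqrt(n_sc)
est_1bit = np.fft.fft(rx_1bit) / np.sqrt(n_sc)
evm = lambda est: np.sqrt(np.mean(np.abs(est - qpsk) ** 2))
print(f"EVM full-resolution: {evm(est_full):.3f} | EVM one-bit: {evm(est_1bit):.3f}")
```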

Contributions to the development of the CRO-SL algorithm: Engineering applications problems

Title Contributions to the development of the CRO-SL algorithm: Engineering applications problems
Authors Carlos Camacho-Gómez
Abstract This Ph.D. thesis discusses advanced design issues of the evolutionary-based “Coral Reef Optimization” algorithm, in its Substrate-Layer (CRO-SL) version, for optimization problems in engineering applications. The range of problems that can be tackled with meta-heuristic approaches is very wide and varied, and is not exclusive to engineering; however, we focus the thesis on this area, one of the most prominent of our time. One of the proposed applications is the battery scheduling problem in Micro-Grids (MGs). Specifically, we consider an MG that includes renewable distributed generation and different loads, defined by their power profiles, and is equipped with an energy storage device (battery) whose schedule (duration and timing of charging/discharging) must be determined in a real scenario with variable electricity prices. We also discuss a problem of vibration cancellation in structures of two and four floors using Tuned Mass Dampers (TMDs). The optimization algorithm searches for the best solution by obtaining three physical parameters and the TMD location. As another related application, CRO-SL is used to design Multi-Input-Multi-Output Active Vibration Control (MIMO-AVC) via inertial-mass actuators, for structures subjected to human-induced vibration. In this problem, we optimize the location of each actuator and tune the control gains. Finally, we tackle the optimization of a textile modified meander-line Inverted-F Antenna (IFA) with variable meander width and spacing, for RFID systems. Specifically, CRO-SL is used to obtain an optimal antenna design, with a good bandwidth and radiation pattern, ideal for RFID readers. Radio Frequency Identification (RFID) tags have become some of the most numerous manufactured devices worldwide because they provide a reliable and inexpensive means of locating people and items; they are used in access and payment cards, product labels and many other applications.
Tasks
Published 2018-07-26
URL http://arxiv.org/abs/1807.10562v1
PDF http://arxiv.org/pdf/1807.10562v1.pdf
PWC https://paperswithcode.com/paper/contributions-to-the-development-of-the-cro
Repo
Framework

Dynamic Planning Networks

Title Dynamic Planning Networks
Authors Norman Tasfi, Miriam Capretz
Abstract We introduce Dynamic Planning Networks (DPN), a novel architecture for deep reinforcement learning, that combines model-based and model-free aspects for online planning. Our architecture learns to dynamically construct plans using a learned state-transition model by selecting and traversing between simulated states and actions to maximize information before acting. In contrast to model-free methods, model-based planning lets the agent efficiently test action hypotheses without performing costly trial-and-error in the environment. DPN learns to efficiently form plans by expanding a single action-conditional state transition at a time instead of exhaustively evaluating each action, reducing the required number of state-transitions during planning by up to 96%. We observe various emergent planning patterns used to solve environments, including classical search methods such as breadth-first and depth-first search. DPN shows improved data efficiency, performance, and generalization to new and unseen domains in comparison to several baselines.
Tasks
Published 2018-12-28
URL http://arxiv.org/abs/1812.11240v2
PDF http://arxiv.org/pdf/1812.11240v2.pdf
PWC https://paperswithcode.com/paper/dynamic-planning-networks
Repo
Framework