October 16, 2019


Paper Group ANR 1119


Effective deep learning training for single-image super-resolution in endomicroscopy exploiting video-registration-based reconstruction

Title Effective deep learning training for single-image super-resolution in endomicroscopy exploiting video-registration-based reconstruction
Authors Daniele Ravì, Agnieszka Barbara Szczotka, Dzhoshkun Ismail Shakir, Stephen P Pereira, Tom Vercauteren
Abstract Purpose: Probe-based Confocal Laser Endomicroscopy (pCLE) is a recent imaging modality that allows in vivo optical biopsies to be performed. The design of pCLE hardware, and its reliance on an optical fibre bundle, fundamentally limits image quality: a few tens of thousands of fibres, each acting as the equivalent of a single-pixel detector, are assembled into a single fibre bundle. Video-registration techniques can be used to estimate high-resolution (HR) images by exploiting the temporal information contained in a sequence of low-resolution (LR) images. However, the alignment of LR frames, required for the fusion, is computationally demanding and prone to artefacts. Methods: In this work, we propose a novel synthetic data generation approach to train exemplar-based Deep Neural Networks (DNNs). HR pCLE images with enhanced quality are recovered by models trained on pairs of estimated HR images (generated by the video-registration algorithm) and realistic synthetic LR images. The performance of three different state-of-the-art DNN techniques was analysed on a Smart Atlas database of 8806 images from 238 pCLE video sequences. The results were validated through an extensive Image Quality Assessment (IQA) that takes into account different quality scores, including a Mean Opinion Score (MOS). Results: The results indicate that the proposed solution produces an effective improvement in the quality of the reconstructed images. Conclusion: The proposed training strategy and associated DNNs allow us to perform convincing super-resolution of pCLE images.
Tasks Image Quality Assessment, Image Super-Resolution, Super-Resolution, Synthetic Data Generation
Published 2018-03-23
URL http://arxiv.org/abs/1803.08840v1
PDF http://arxiv.org/pdf/1803.08840v1.pdf
PWC https://paperswithcode.com/paper/effective-deep-learning-training-for-single
Repo
Framework
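
The core training idea in the entry above is to pair estimated HR reconstructions with synthetically degraded LR inputs and fit an exemplar-based SR network. Below is a minimal, hedged sketch of that pipeline; the degradation used here (Gaussian blur plus decimation) is a simple stand-in for the paper's fibre-bundle-aware simulation, and the network is an SRCNN-style placeholder rather than the authors' models.

```python
# Sketch: build (synthetic LR, estimated HR) training pairs and fit a small SR CNN.
import torch
import torch.nn as nn
import torch.nn.functional as F

def synthetic_lr(hr, factor=2, blur_sigma=1.0):
    """Degrade an estimated HR batch (N, 1, H, W) into synthetic LR inputs."""
    k = 5
    coords = torch.arange(k).float() - k // 2
    g = torch.exp(-coords ** 2 / (2 * blur_sigma ** 2))
    kernel = (g[:, None] * g[None, :] / g.sum() ** 2).view(1, 1, k, k)
    blurred = F.conv2d(hr, kernel, padding=k // 2)
    lr = F.avg_pool2d(blurred, factor)                      # decimate
    return F.interpolate(lr, scale_factor=factor,
                         mode="bilinear", align_corners=False)  # back to HR grid

class TinySRNet(nn.Module):
    """SRCNN-style exemplar-based model (placeholder for the paper's DNNs)."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(1, 64, 9, padding=4), nn.ReLU(),
            nn.Conv2d(64, 32, 5, padding=2), nn.ReLU(),
            nn.Conv2d(32, 1, 5, padding=2))
    def forward(self, x):
        return x + self.body(x)   # learn the residual detail

model = TinySRNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
hr_batch = torch.rand(4, 1, 64, 64)   # stand-in for video-registration HR estimates
lr_batch = synthetic_lr(hr_batch)
loss = F.l1_loss(model(lr_batch), hr_batch)
opt.zero_grad(); loss.backward(); opt.step()
```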

KonIQ-10k: Towards an ecologically valid and large-scale IQA database

Title KonIQ-10k: Towards an ecologically valid and large-scale IQA database
Authors Hanhe Lin, Vlad Hosu, Dietmar Saupe
Abstract The main challenge in applying state-of-the-art deep learning methods to predict image quality in-the-wild is the relatively small size of existing quality scored datasets. The reason for the lack of larger datasets is the massive resources required in generating diverse and publishable content. We present a new systematic and scalable approach to create large-scale, authentic and diverse image datasets for Image Quality Assessment (IQA). We show how we built an IQA database, KonIQ-10k, consisting of 10,073 images, on which we performed very large scale crowdsourcing experiments in order to obtain reliable quality ratings from 1,467 crowd workers (1.2 million ratings). We argue for its ecological validity by analyzing the diversity of the dataset, by comparing it to state-of-the-art IQA databases, and by checking the reliability of our user studies.
Tasks Image Quality Assessment
Published 2018-03-22
URL http://arxiv.org/abs/1803.08489v1
PDF http://arxiv.org/pdf/1803.08489v1.pdf
PWC https://paperswithcode.com/paper/koniq-10k-towards-an-ecologically-valid-and
Repo
Framework
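
A database like KonIQ-10k is typically used to evaluate IQA models by correlating predicted scores with the crowd-sourced MOS. The short sketch below shows the standard SROCC/PLCC computation; the example numbers are placeholders, not values from the dataset.

```python
# Sketch: rank and linear correlation between predicted quality scores and MOS.
import numpy as np
from scipy.stats import spearmanr, pearsonr

mos = np.array([3.2, 4.1, 2.5, 3.8, 1.9])    # crowd-sourced mean opinion scores
pred = np.array([3.0, 4.3, 2.2, 3.5, 2.4])   # scores from some IQA model

srocc, _ = spearmanr(pred, mos)   # monotonic agreement
plcc, _ = pearsonr(pred, mos)     # linear agreement
print(f"SROCC={srocc:.3f}  PLCC={plcc:.3f}")
```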

Deep Energy Estimator Networks

Title Deep Energy Estimator Networks
Authors Saeed Saremi, Arash Mehrjou, Bernhard Schölkopf, Aapo Hyvärinen
Abstract Density estimation is a fundamental problem in statistical learning. This problem is especially challenging for complex high-dimensional data due to the curse of dimensionality. A promising solution to this problem is given here in an inference-free hierarchical framework that is built on score matching. We revisit the Bayesian interpretation of the score function and the Parzen score matching, and construct a multilayer perceptron with a scalable objective for learning the energy (i.e. the unnormalized log-density), which is then optimized with stochastic gradient descent. In addition, the resulting deep energy estimator network (DEEN) is designed as products of experts. We present the utility of DEEN in learning the energy, the score function, and in single-step denoising experiments for synthetic and high-dimensional data. We also diagnose stability problems in the direct estimation of the score function that had been observed for denoising autoencoders.
Tasks Denoising, Density Estimation
Published 2018-05-21
URL http://arxiv.org/abs/1805.08306v1
PDF http://arxiv.org/pdf/1805.08306v1.pdf
PWC https://paperswithcode.com/paper/deep-energy-estimator-networks
Repo
Framework
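
To make the DEEN idea concrete: an MLP outputs a scalar energy, its input gradient gives the (negative) score, and training matches that score to the score of a noise-smoothed data distribution. The sketch below uses the standard Vincent-style denoising score-matching form; the exact scaling and architecture in the paper may differ, so treat this as an illustrative assumption.

```python
# Sketch of a DEEN-style energy network trained with denoising score matching.
import torch
import torch.nn as nn

class EnergyNet(nn.Module):
    def __init__(self, dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.Softplus(),
            nn.Linear(hidden, hidden), nn.Softplus(),
            nn.Linear(hidden, 1))
    def forward(self, x):
        return self.net(x).squeeze(-1)   # unnormalized negative log-density

def dsm_loss(energy, x, sigma=0.1):
    noise = sigma * torch.randn_like(x)
    x_noisy = (x + noise).requires_grad_(True)
    e = energy(x_noisy).sum()
    grad_e = torch.autograd.grad(e, x_noisy, create_graph=True)[0]
    score = -grad_e                   # model score at the noisy point
    target = -noise / sigma ** 2      # score of the Gaussian smoother
    return ((score - target) ** 2).sum(dim=1).mean()

energy = EnergyNet(dim=2)
opt = torch.optim.Adam(energy.parameters(), lr=1e-3)
x = torch.randn(256, 2)               # toy data batch
loss = dsm_loss(energy, x)
opt.zero_grad(); loss.backward(); opt.step()
```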

New Feature Detection Mechanism for Extended Kalman Filter Based Monocular SLAM with 1-Point RANSAC

Title New Feature Detection Mechanism for Extended Kalman Filter Based Monocular SLAM with 1-Point RANSAC
Authors Agniva Sengupta, Shafeeq Elanattil
Abstract We present a different approach to feature point detection for improving the accuracy of SLAM using a single monocular camera. Traditionally, the Harris corner detector, SURF, or FAST corner detectors are used to find feature points of interest in the image. We replace this with another approach, which involves building a non-linear scale-space representation of images using the Perona-Malik diffusion equation and computing the scale-normalized Hessian at multiple scale levels (KAZE features). The feature points so detected are used to estimate the state and pose of a monocular camera using an extended Kalman filter. By using accelerated KAZE features and a more rigorous feature rejection routine combined with 1-point RANSAC for outlier rejection, short-baseline matching of features is significantly improved, even with fewer feature points, especially in the presence of motion blur. We present a comparative study of our proposal against FAST and show improved localization accuracy in terms of absolute trajectory error.
Tasks
Published 2018-05-31
URL http://arxiv.org/abs/1805.12443v1
PDF http://arxiv.org/pdf/1805.12443v1.pdf
PWC https://paperswithcode.com/paper/new-feature-detection-mechanism-for-extended
Repo
Framework
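
The front-end change described above swaps corner detectors for accelerated-KAZE keypoints from a nonlinear scale space. Below is a small OpenCV sketch of detecting and matching AKAZE features between two frames; the EKF state update and 1-point RANSAC are not shown, and the file names are hypothetical.

```python
# Sketch: AKAZE keypoints and descriptor matching with OpenCV.
import cv2

img1 = cv2.imread("frame_t.png", cv2.IMREAD_GRAYSCALE)    # hypothetical file names
img2 = cv2.imread("frame_t1.png", cv2.IMREAD_GRAYSCALE)

akaze = cv2.AKAZE_create()
kp1, des1 = akaze.detectAndCompute(img1, None)
kp2, des2 = akaze.detectAndCompute(img2, None)

# Binary descriptors -> Hamming distance; ratio test as a rough rejection pass.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
matches = matcher.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.8 * n.distance]
print(f"{len(kp1)} / {len(kp2)} keypoints, {len(good)} tentative matches")
```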

Left Ventricle Segmentation via Optical-Flow-Net from Short-axis Cine MRI: Preserving the Temporal Coherence of Cardiac Motion

Title Left Ventricle Segmentation via Optical-Flow-Net from Short-axis Cine MRI: Preserving the Temporal Coherence of Cardiac Motion
Authors Wenjun Yan, Yuanyuan Wang, Zeju Li, Rob J. van der Geest, Qian Tao
Abstract Quantitative assessment of left ventricle (LV) function from cine MRI has significant diagnostic and prognostic value for cardiovascular disease patients. The temporal movement of the LV provides essential information on the contracting/relaxing pattern of the heart, which is closely evaluated by experts in clinical practice. Inspired by the way experts view cine MRI, we propose a new CNN module that is able to incorporate temporal information into LV segmentation from cine MRI. In the proposed CNN, the optical flow (OF) between neighboring frames is integrated and aggregated at feature level, such that temporal coherence in cardiac motion can be taken into account during segmentation. The proposed module is integrated into the U-net architecture without the need for additional training. Furthermore, dilated convolution is introduced to improve the spatial accuracy of segmentation. Trained and tested on the Cardiac Atlas database, the proposed network achieved a Dice index of 95% and an average perpendicular distance of 0.9 pixels for the middle LV contour, significantly outperforming the original U-net that processes each frame individually. Notably, the proposed method improved the temporal coherence of LV segmentation results, especially at the LV apex and base, where cardiac motion is difficult to follow.
Tasks Optical Flow Estimation
Published 2018-10-20
URL http://arxiv.org/abs/1810.08753v1
PDF http://arxiv.org/pdf/1810.08753v1.pdf
PWC https://paperswithcode.com/paper/left-ventricle-segmentation-via-optical-flow
Repo
Framework
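
As a rough illustration of the temporal cue used above: dense optical flow between neighbouring cine frames can be computed and combined with the current frame before segmentation. The paper aggregates flow at feature level inside a U-net; the input-level concatenation below is a simplified stand-in, and the file paths are hypothetical.

```python
# Sketch: Farneback optical flow between neighbouring frames, stacked as extra channels.
import cv2
import numpy as np

prev_frame = cv2.imread("frame_10.png", cv2.IMREAD_GRAYSCALE)   # hypothetical paths
curr_frame = cv2.imread("frame_11.png", cv2.IMREAD_GRAYSCALE)

flow = cv2.calcOpticalFlowFarneback(prev_frame, curr_frame, None,
                                    pyr_scale=0.5, levels=3, winsize=15,
                                    iterations=3, poly_n=5, poly_sigma=1.2, flags=0)

# (H, W, 3) tensor: intensity + 2 flow channels, ready to feed a segmentation CNN.
stacked = np.dstack([curr_frame.astype(np.float32) / 255.0, flow])
print(stacked.shape)
```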

Full Reference Objective Quality Assessment for Reconstructed Background Images

Title Full Reference Objective Quality Assessment for Reconstructed Background Images
Authors Aditee Shrotre, Lina Karam
Abstract With an increased interest in applications that require a clean background image, such as video surveillance, object tracking, street view imaging and location-based services on web-based maps, multiple algorithms have been developed to reconstruct a background image from cluttered scenes. Traditionally, statistical measures and existing image quality techniques have been applied for evaluating the quality of the reconstructed background images. Though these quality assessment methods have been widely used in the past, their performance in evaluating the perceived quality of the reconstructed background image has not been verified. In this work, we discuss the shortcomings in existing metrics and propose a full reference Reconstructed Background image Quality Index (RBQI) that combines color and structural information at multiple scales using a probability summation model to predict the perceived quality in the reconstructed background image given a reference image. To compare the performance of the proposed quality index with existing image quality assessment measures, we construct two different datasets consisting of reconstructed background images and corresponding subjective scores. The quality assessment measures are evaluated by correlating their objective scores with human subjective ratings. The correlation results show that the proposed RBQI outperforms all the existing approaches. Additionally, the constructed datasets and the corresponding subjective scores provide a benchmark to evaluate the performance of future metrics that are developed to evaluate the perceived quality of reconstructed background images.
Tasks Image Quality Assessment, Object Tracking
Published 2018-03-12
URL http://arxiv.org/abs/1803.04103v2
PDF http://arxiv.org/pdf/1803.04103v2.pdf
PWC https://paperswithcode.com/paper/full-reference-objective-quality-assessment
Repo
Framework
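
The pooling mechanism named in the abstract, probability summation, combines evidence of visible distortion across scales. The sketch below only illustrates that pooling step: the per-scale distortion measure and the psychometric-function parameters are assumptions for illustration, not the features or constants used by RBQI.

```python
# Sketch of probability-summation pooling across scales.
import numpy as np

def detection_probability(distortion, alpha=1.0, beta=3.0):
    """Psychometric function mapping a distortion magnitude to P(visible)."""
    return 1.0 - np.exp(-(distortion / alpha) ** beta)

def probability_summation(per_scale_distortions):
    """Probability that the distortion is detected at *any* scale."""
    p_scales = [detection_probability(d) for d in per_scale_distortions]
    return 1.0 - np.prod([1.0 - p for p in p_scales])

# e.g. mean structural/color differences measured at three scales
print(probability_summation([0.2, 0.5, 0.1]))
```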

Depth Pooling Based Large-scale 3D Action Recognition with Convolutional Neural Networks

Title Depth Pooling Based Large-scale 3D Action Recognition with Convolutional Neural Networks
Authors Pichao Wang, Wanqing Li, Zhimin Gao, Chang Tang, Philip Ogunbona
Abstract This paper proposes three simple, compact yet effective representations of depth sequences, referred to respectively as Dynamic Depth Images (DDI), Dynamic Depth Normal Images (DDNI) and Dynamic Depth Motion Normal Images (DDMNI), for both isolated and continuous action recognition. These dynamic images are constructed from a segmented sequence of depth maps using hierarchical bidirectional rank pooling to effectively capture the spatial-temporal information. Specifically, DDI exploits the dynamics of postures over time, while DDNI and DDMNI exploit the 3D structural information captured by depth maps. Upon the proposed representations, a ConvNet-based method is developed for action recognition. The image-based representations enable us to fine-tune existing Convolutional Neural Network (ConvNet) models trained on image data without training a large number of parameters from scratch. The proposed method achieved state-of-the-art results on three large datasets, namely the Large-scale Continuous Gesture Recognition Dataset (mean Jaccard index 0.4109), the Large-scale Isolated Gesture Recognition Dataset (59.21%), and the NTU RGB+D Dataset (87.08% cross-subject and 84.22% cross-view), even though only the depth modality was used.
Tasks 3D Human Action Recognition, Gesture Recognition, Temporal Action Localization
Published 2018-03-17
URL http://arxiv.org/abs/1804.01194v2
PDF http://arxiv.org/pdf/1804.01194v2.pdf
PWC https://paperswithcode.com/paper/depth-pooling-based-large-scale-3d-action
Repo
Framework
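
To show what a "dynamic image" of a depth segment looks like computationally: rank pooling collapses a clip into a single weighted image that a 2D ConvNet can be fine-tuned on. The sketch below uses the common linear approximate-rank-pooling weights (alpha_t = 2t - T - 1) as an assumption; the paper additionally uses hierarchical bidirectional pooling and normal-based variants, which are not reproduced here.

```python
# Sketch: approximate rank pooling of a depth-map segment into one dynamic image.
import numpy as np

def approximate_rank_pooling(frames):
    """frames: (T, H, W) array of depth maps -> single (H, W) dynamic image."""
    T = frames.shape[0]
    t = np.arange(1, T + 1, dtype=np.float32)
    alpha = 2.0 * t - T - 1.0                        # later frames weigh more
    dyn = np.tensordot(alpha, frames.astype(np.float32), axes=(0, 0))
    dyn -= dyn.min()                                 # rescale to [0, 1] for a ConvNet
    if dyn.max() > 0:
        dyn /= dyn.max()
    return dyn

clip = np.random.rand(16, 240, 320)                  # stand-in for a depth-map segment
ddi = approximate_rank_pooling(clip)                 # forward dynamic depth image
ddi_rev = approximate_rank_pooling(clip[::-1])       # reversed clip (bidirectional idea)
```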

Speaker-Invariant Training via Adversarial Learning

Title Speaker-Invariant Training via Adversarial Learning
Authors Zhong Meng, Jinyu Li, Zhuo Chen, Yong Zhao, Vadim Mazalov, Yifan Gong, Biing-Hwang Juang
Abstract We propose a novel adversarial multi-task learning scheme, aiming at actively curtailing the inter-talker feature variability while maximizing its senone discriminability so as to enhance the performance of a deep neural network (DNN) based ASR system. We call the scheme speaker-invariant training (SIT). In SIT, a DNN acoustic model and a speaker classifier network are jointly optimized to minimize the senone (tied triphone state) classification loss, and simultaneously mini-maximize the speaker classification loss. A speaker-invariant and senone-discriminative deep feature is learned through this adversarial multi-task learning. With SIT, a canonical DNN acoustic model with significantly reduced variance in its output probabilities is learned with no explicit speaker-independent (SI) transformations or speaker-specific representations used in training or testing. Evaluated on the CHiME-3 dataset, the SIT achieves 4.99% relative word error rate (WER) improvement over the conventional SI acoustic model. With additional unsupervised speaker adaptation, the speaker-adapted (SA) SIT model achieves 4.86% relative WER gain over the SA SI acoustic model.
Tasks Multi-Task Learning
Published 2018-04-02
URL http://arxiv.org/abs/1804.00732v3
PDF http://arxiv.org/pdf/1804.00732v3.pdf
PWC https://paperswithcode.com/paper/speaker-invariant-training-via-adversarial
Repo
Framework
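
The adversarial multi-task objective above minimises senone loss while mini-maximising speaker loss on shared features. The paper states a mini-max objective; a gradient-reversal layer is one standard way to realise such training and is used below purely as an illustrative assumption, with toy layer sizes, rather than the authors' exact recipe.

```python
# Sketch: speaker-invariant features via a gradient-reversal layer.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None   # reverse the speaker gradient

feat_dim, n_senones, n_speakers = 40, 3000, 100   # illustrative sizes
encoder = nn.Sequential(nn.Linear(feat_dim, 512), nn.ReLU(),
                        nn.Linear(512, 512), nn.ReLU())
senone_head = nn.Linear(512, n_senones)
speaker_head = nn.Linear(512, n_speakers)
ce = nn.CrossEntropyLoss()

x = torch.randn(32, feat_dim)
senone_lbl = torch.randint(0, n_senones, (32,))
speaker_lbl = torch.randint(0, n_speakers, (32,))

h = encoder(x)
loss_senone = ce(senone_head(h), senone_lbl)                              # minimise
loss_speaker = ce(speaker_head(GradReverse.apply(h, 0.5)), speaker_lbl)   # mini-max
(loss_senone + loss_speaker).backward()
```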

Automatic Semantic Content Removal by Learning to Neglect

Title Automatic Semantic Content Removal by Learning to Neglect
Authors Siyang Qin, Jiahui Wei, Roberto Manduchi
Abstract We introduce a new system for automatic image content removal and inpainting. Unlike traditional inpainting algorithms, which require advance knowledge of the region to be filled in, our system automatically detects the area to be removed and infilled. Region segmentation and inpainting are performed jointly in a single pass. In this way, potential segmentation errors are more naturally alleviated by the inpainting module. The system is implemented as an encoder-decoder architecture, with two decoder branches, one tasked with segmentation of the foreground region, the other with inpainting. The encoder and the two decoder branches are linked via neglect nodes, which guide the inpainting process in selecting which areas need reconstruction. The whole model is trained using a conditional GAN strategy. Comparative experiments show that our algorithm outperforms state-of-the-art inpainting techniques (which, unlike our system, do not segment the input image and thus must be aided by an external segmentation module).
Tasks
Published 2018-07-20
URL http://arxiv.org/abs/1807.07696v1
PDF http://arxiv.org/pdf/1807.07696v1.pdf
PWC https://paperswithcode.com/paper/automatic-semantic-content-removal-by
Repo
Framework
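
A simplified structural sketch of the two-branch layout described above: one shared encoder, a mask decoder and an inpainting decoder, with the predicted mask deciding which pixels are replaced. The paper's neglect nodes link encoder and decoders in a more specific way, and the conditional-GAN training is omitted; this is only an illustration under those assumptions.

```python
# Sketch: shared encoder with segmentation and inpainting decoder branches.
import torch
import torch.nn as nn

class TwoBranchNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        self.mask_decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Sigmoid())
        self.paint_decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid())

    def forward(self, img):
        z = self.encoder(img)
        mask = self.mask_decoder(z)       # where the unwanted content is
        filled = self.paint_decoder(z)    # proposed reconstruction
        # keep original pixels outside the mask, use inpainted ones inside it
        return mask * filled + (1 - mask) * img, mask

net = TwoBranchNet()
out, mask = net(torch.rand(1, 3, 128, 128))
print(out.shape, mask.shape)
```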

Characterizing the public perception of WhatsApp through the lens of media

Title Characterizing the public perception of WhatsApp through the lens of media
Authors Josemar Alves Caetano, Gabriel Magno, Evandro Cunha, Wagner Meira Jr., Humberto T. Marques-Neto, Virgilio Almeida
Abstract WhatsApp is, as of 2018, a significant component of the global information and communication infrastructure, especially in developing countries. However, probably due to its strong end-to-end encryption, WhatsApp became an attractive place for the dissemination of misinformation, extremism and other forms of undesirable behavior. In this paper, we investigate the public perception of WhatsApp through the lens of media. We analyze two large datasets of news and show the kind of content that is being associated with WhatsApp in different regions of the world and over time. Our analyses include the examination of named entities, general vocabulary, and topics addressed in news articles that mention WhatsApp, as well as the polarity of these texts. Among other results, we demonstrate that the vocabulary and topics around the term “whatsapp” in the media have been changing over the years and in 2018 concentrate on matters related to misinformation, politics and criminal scams. More generally, our findings are useful for understanding the impact that tools like WhatsApp have on contemporary society and how they are seen by the communities themselves.
Tasks
Published 2018-08-17
URL http://arxiv.org/abs/1808.05927v1
PDF http://arxiv.org/pdf/1808.05927v1.pdf
PWC https://paperswithcode.com/paper/characterizing-the-public-perception-of
Repo
Framework

An argument in favor of strong scaling for deep neural networks with small datasets

Title An argument in favor of strong scaling for deep neural networks with small datasets
Authors Renato L. de F. Cunha, Eduardo R. Rodrigues, Matheus Palhares Viana, Dario Augusto Borges Oliveira
Abstract In recent years, with the popularization of deep learning frameworks and large datasets, researchers have started parallelizing their models in order to train faster. This is crucially important, because they typically explore many hyperparameters in order to find the best ones for their applications. This process is time consuming and, consequently, speeding up training improves productivity. One approach to parallelizing deep learning models, followed by many researchers, is based on weak scaling: the minibatches increase in size as new GPUs are added to the system. In addition, new learning rate schedules have been proposed to fix optimization issues that occur with large minibatch sizes. In this paper, however, we show that the recommendations provided by recent work do not apply to models that lack large datasets. In fact, we argue in favor of using strong scaling for achieving reliable performance in such cases. We evaluated our approach with up to 32 GPUs and show that weak scaling not only does not reach the same accuracy as the sequential model, it also fails to converge most of the time. Meanwhile, strong scaling has good scalability while achieving exactly the same accuracy as a sequential implementation.
Tasks
Published 2018-07-24
URL http://arxiv.org/abs/1807.09161v2
PDF http://arxiv.org/pdf/1807.09161v2.pdf
PWC https://paperswithcode.com/paper/an-argument-in-favor-of-strong-scaling-for
Repo
Framework
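
The distinction argued above is easy to state in code: under weak scaling the per-GPU minibatch is fixed, so the effective global batch grows with the number of GPUs; under strong scaling the global batch stays fixed and is split across GPUs. A minimal sketch, with illustrative batch sizes:

```python
# Sketch: effective batch sizes under weak vs. strong scaling.
def weak_scaling(per_gpu_batch, n_gpus):
    return per_gpu_batch * n_gpus     # global batch grows with n_gpus

def strong_scaling(global_batch, n_gpus):
    return global_batch // n_gpus     # per-GPU share shrinks, global batch fixed

for n in (1, 4, 32):
    print(n, "GPUs -> weak: global batch", weak_scaling(64, n),
          "| strong: per-GPU batch", strong_scaling(64, n))
```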

Deep Reinforcement Learning for Time Scheduling in RF-Powered Backscatter Cognitive Radio Networks

Title Deep Reinforcement Learning for Time Scheduling in RF-Powered Backscatter Cognitive Radio Networks
Authors Tran The Anh, Nguyen Cong Luong, Dusit Niyato, Ying-Chang Liang, Dong In Kim
Abstract In an RF-powered backscatter cognitive radio network, multiple secondary users communicate with a secondary gateway by backscattering or harvesting energy and actively transmitting their data depending on the primary channel state. To coordinate the transmission of multiple secondary transmitters, the secondary gateway needs to schedule the backscattering time, energy harvesting time, and transmission time among them. However, under the dynamics of the primary channel and the uncertainty of the energy state of the secondary transmitters, it is challenging for the gateway to find a time scheduling mechanism which maximizes the total throughput. In this paper, we propose to use the deep reinforcement learning algorithm to derive an optimal time scheduling policy for the gateway. Specifically, to deal with the problem with large state and action spaces, we adopt a Double Deep-Q Network (DDQN) that enables the gateway to learn the optimal policy. The simulation results clearly show that the proposed deep reinforcement learning algorithm outperforms non-learning schemes in terms of network throughput.
Tasks
Published 2018-10-03
URL http://arxiv.org/abs/1810.04520v1
PDF http://arxiv.org/pdf/1810.04520v1.pdf
PWC https://paperswithcode.com/paper/deep-reinforcement-learning-for-time
Repo
Framework
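
The learning core named above is a Double Deep-Q Network: the online network selects the next action and a separate target network evaluates it when forming the bootstrap target. The sketch below shows that target computation with toy dimensions; the actual state encoding (channel state, energy levels) and action set for the backscatter gateway are problem-specific and not modelled here.

```python
# Sketch: Double DQN target y = r + gamma * Q_target(s', argmax_a Q_online(s', a)).
import torch
import torch.nn as nn

n_state, n_actions, gamma = 8, 5, 0.99
online = nn.Sequential(nn.Linear(n_state, 64), nn.ReLU(), nn.Linear(64, n_actions))
target = nn.Sequential(nn.Linear(n_state, 64), nn.ReLU(), nn.Linear(64, n_actions))
target.load_state_dict(online.state_dict())

s = torch.randn(32, n_state)                       # batch of transitions (s, a, r, s', done)
a = torch.randint(0, n_actions, (32, 1))
r = torch.rand(32, 1)
s_next = torch.randn(32, n_state)
done = torch.zeros(32, 1)

with torch.no_grad():
    best_next = online(s_next).argmax(dim=1, keepdim=True)   # action selection
    q_next = target(s_next).gather(1, best_next)              # action evaluation
    td_target = r + gamma * (1 - done) * q_next

q_sa = online(s).gather(1, a)
loss = nn.functional.smooth_l1_loss(q_sa, td_target)
loss.backward()
```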

Traffic Density Estimation using a Convolutional Neural Network

Title Traffic Density Estimation using a Convolutional Neural Network
Authors Julian Nubert, Nicholas Giai Truong, Abel Lim, Herbert Ilhan Tanujaya, Leah Lim, Mai Anh Vu
Abstract The goal of this project is to introduce and present a machine learning application that aims to improve the quality of life of people in Singapore. In particular, we investigate the use of machine learning solutions to tackle the problem of traffic congestion in Singapore. In layman’s terms, we seek to make Singapore (or any other city) a smoother place. To accomplish this aim, we present an end-to-end system comprising (1) a traffic density estimation algorithm at traffic lights/junctions and (2) suitable traffic signal control algorithms that make use of the density information for better traffic control. Traffic density estimates can be obtained from traffic junction images using various machine learning techniques (combined with CV tools). After research into various advanced machine learning methods, we decided on convolutional neural networks (CNNs). We conducted experiments on our algorithms, using the publicly available traffic camera dataset published by the Land Transport Authority (LTA), to demonstrate the feasibility of this approach. With these traffic density estimates, different traffic algorithms can be applied to minimize congestion at traffic junctions in general.
Tasks Density Estimation
Published 2018-09-05
URL http://arxiv.org/abs/1809.01564v1
PDF http://arxiv.org/pdf/1809.01564v1.pdf
PWC https://paperswithcode.com/paper/traffic-density-estimation-using-a
Repo
Framework
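
The density-estimation component described above can be sketched as a small CNN regressing a scalar congestion score from a camera image. The architecture and image size below are illustrative assumptions; the LTA camera feed and the signal-control logic are not modelled.

```python
# Sketch: a small CNN regressing traffic density from a camera frame.
import torch
import torch.nn as nn

class DensityCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1))
        self.head = nn.Linear(64, 1)   # scalar congestion/density estimate

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

model = DensityCNN()
imgs = torch.rand(8, 3, 120, 160)      # stand-in for camera frames
density = model(imgs)                  # (8, 1) predicted density scores
print(density.shape)
```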

VIREL: A Variational Inference Framework for Reinforcement Learning

Title VIREL: A Variational Inference Framework for Reinforcement Learning
Authors Matthew Fellows, Anuj Mahajan, Tim G. J. Rudner, Shimon Whiteson
Abstract Applying probabilistic models to reinforcement learning (RL) enables the application of powerful optimisation tools such as variational inference to RL. However, existing inference frameworks and their algorithms pose significant challenges for learning optimal policies, e.g., the absence of mode-capturing behaviour in pseudo-likelihood methods and difficulties learning deterministic policies in maximum entropy RL based approaches. We propose VIREL, a novel, theoretically grounded probabilistic inference framework for RL that utilises a parametrised action-value function to summarise future dynamics of the underlying MDP. This gives VIREL a mode-seeking form of KL divergence, the ability to learn deterministic optimal policies naturally from inference, and the ability to optimise value functions and policies in separate, iterative steps. In applying variational expectation-maximisation to VIREL, we thus show that the actor-critic algorithm can be reduced to expectation-maximisation, with policy improvement equivalent to an E-step and policy evaluation to an M-step. We then derive a family of actor-critic methods from VIREL, including a scheme for adaptive exploration. Finally, we demonstrate that actor-critic algorithms from this family outperform state-of-the-art methods based on soft value functions in several domains.
Tasks
Published 2018-11-03
URL https://arxiv.org/abs/1811.01132v8
PDF https://arxiv.org/pdf/1811.01132v8.pdf
PWC https://paperswithcode.com/paper/virel-a-variational-inference-framework-for
Repo
Framework

TBD: Benchmarking and Analyzing Deep Neural Network Training

Title TBD: Benchmarking and Analyzing Deep Neural Network Training
Authors Hongyu Zhu, Mohamed Akrout, Bojian Zheng, Andrew Pelegris, Amar Phanishayee, Bianca Schroeder, Gennady Pekhimenko
Abstract The recent popularity of deep neural networks (DNNs) has generated a lot of research interest in performing DNN-related computation efficiently. However, the primary focus is usually very narrow and limited to (i) inference, i.e. how to efficiently execute already trained models, and (ii) image classification networks as the primary benchmark for evaluation. Our primary goal in this work is to break this myopic view by (i) proposing a new benchmark for DNN training, called TBD (short for Training Benchmark for DNNs), that uses a representative set of DNN models covering a wide range of machine learning applications (image classification, machine translation, speech recognition, object detection, adversarial networks, and reinforcement learning), and (ii) performing an extensive performance analysis of training these different applications on three major deep learning frameworks (TensorFlow, MXNet, CNTK) across different hardware configurations (single-GPU, multi-GPU, and multi-machine). TBD currently covers six major application domains and eight different state-of-the-art models. We present a new toolchain for performance analysis of these models that combines the targeted usage of existing performance analysis tools, careful selection of new and existing metrics and methodologies to analyze the results, and utilization of domain-specific characteristics of DNN training. We also build a new set of tools for memory profiling in all three major frameworks: much-needed tools that can finally shed some light on precisely how much memory is consumed by different data structures (weights, activations, gradients, workspace) in DNN training. By using our tools and methodologies, we make several important observations and recommendations on where future research and optimization of DNN training should be focused.
Tasks Image Classification, Machine Translation, Object Detection, Speech Recognition
Published 2018-03-16
URL http://arxiv.org/abs/1803.06905v2
PDF http://arxiv.org/pdf/1803.06905v2.pdf
PWC https://paperswithcode.com/paper/tbd-benchmarking-and-analyzing-deep-neural
Repo
Framework
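
As a rough illustration of the kind of per-iteration memory accounting the benchmark's tooling targets, the sketch below uses PyTorch's built-in CUDA counters to measure peak allocation over one training step. This is only a stand-in: the paper's profilers work inside TensorFlow, MXNet and CNTK and break usage down by data structure, which is not reproduced here.

```python
# Sketch: peak GPU memory over one training iteration, via PyTorch's CUDA counters.
import torch
import torch.nn as nn

if torch.cuda.is_available():
    model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 10)).cuda()
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    x = torch.randn(256, 1024, device="cuda")
    y = torch.randint(0, 10, (256,), device="cuda")

    torch.cuda.reset_peak_memory_stats()
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()
    opt.step()
    torch.cuda.synchronize()
    print("peak allocated:", torch.cuda.max_memory_allocated() / 2**20, "MiB")
```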