Paper Group ANR 1119
Effective deep learning training for single-image super-resolution in endomicroscopy exploiting video-registration-based reconstruction
Title | Effective deep learning training for single-image super-resolution in endomicroscopy exploiting video-registration-based reconstruction |
Authors | Daniele Ravì, Agnieszka Barbara Szczotka, Dzhoshkun Ismail Shakir, Stephen P Pereira, Tom Vercauteren |
Abstract | Purpose: Probe-based Confocal Laser Endomicroscopy (pCLE) is a recent imaging modality that allows in vivo optical biopsies to be performed. The design of pCLE hardware, and its reliance on an optical fibre bundle, fundamentally limits the image quality: a few tens of thousands of fibres, each acting as the equivalent of a single-pixel detector, are assembled into a single fibre bundle. Video-registration techniques can be used to estimate high-resolution (HR) images by exploiting the temporal information contained in a sequence of low-resolution (LR) images. However, the alignment of LR frames, required for the fusion, is computationally demanding and prone to artefacts. Methods: In this work, we propose a novel synthetic data generation approach to train exemplar-based Deep Neural Networks (DNNs). HR pCLE images with enhanced quality are recovered by models trained on pairs of estimated HR images (generated by the video-registration algorithm) and realistic synthetic LR images. The performance of three different state-of-the-art DNN techniques was analysed on a Smart Atlas database of 8806 images from 238 pCLE video sequences. The results were validated through an extensive Image Quality Assessment (IQA) that takes into account different quality scores, including a Mean Opinion Score (MOS). Results: Results indicate that the proposed solution produces an effective improvement in the quality of the reconstructed images. Conclusion: The proposed training strategy and associated DNNs allow us to perform convincing super-resolution of pCLE images. |
Tasks | Image Quality Assessment, Image Super-Resolution, Super-Resolution, Synthetic Data Generation |
Published | 2018-03-23 |
URL | http://arxiv.org/abs/1803.08840v1 |
PDF | http://arxiv.org/pdf/1803.08840v1.pdf |
PWC | https://paperswithcode.com/paper/effective-deep-learning-training-for-single |
Repo | |
Framework | |
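The paper's training strategy pairs video-registration HR estimates with simulated LR counterparts. A minimal sketch of building such a pair, assuming a simple block-averaging degradation with additive Gaussian noise; `factor` and `noise_sigma` are illustrative placeholders, not the authors' fibre-bundle model:

```python
import numpy as np

def synthetic_lr(hr, factor=2, noise_sigma=0.01, rng=None):
    """Simulate a low-resolution frame from an estimated HR image by
    block-averaging (a crude stand-in for fibre-bundle sampling) plus
    additive Gaussian noise."""
    rng = rng or np.random.default_rng(0)
    h, w = hr.shape
    h, w = h - h % factor, w - w % factor
    blocks = hr[:h, :w].reshape(h // factor, factor, w // factor, factor)
    lr = blocks.mean(axis=(1, 3))
    return np.clip(lr + rng.normal(0.0, noise_sigma, lr.shape), 0.0, 1.0)

# (HR estimate from video registration, synthetic LR) becomes a training pair.
hr_estimate = np.random.default_rng(1).random((64, 64))
lr = synthetic_lr(hr_estimate, factor=2)
print(lr.shape)  # (32, 32)
```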
KonIQ-10k: Towards an ecologically valid and large-scale IQA database
Title | KonIQ-10k: Towards an ecologically valid and large-scale IQA database |
Authors | Hanhe Lin, Vlad Hosu, Dietmar Saupe |
Abstract | The main challenge in applying state-of-the-art deep learning methods to predict image quality in the wild is the relatively small size of existing quality-scored datasets. The reason for the lack of larger datasets is the massive resources required in generating diverse and publishable content. We present a new systematic and scalable approach to create large-scale, authentic and diverse image datasets for Image Quality Assessment (IQA). We show how we built an IQA database, KonIQ-10k, consisting of 10,073 images, on which we performed very large-scale crowdsourcing experiments in order to obtain reliable quality ratings from 1,467 crowd workers (1.2 million ratings). We argue for its ecological validity by analyzing the diversity of the dataset, by comparing it to state-of-the-art IQA databases, and by checking the reliability of our user studies. |
Tasks | Image Quality Assessment |
Published | 2018-03-22 |
URL | http://arxiv.org/abs/1803.08489v1 |
PDF | http://arxiv.org/pdf/1803.08489v1.pdf |
PWC | https://paperswithcode.com/paper/koniq-10k-towards-an-ecologically-valid-and |
Repo | |
Framework | |
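A crowdsourced MOS pipeline of the kind described reduces many noisy worker ratings to one score per image. A hedged sketch, assuming a simple reliability screen that drops workers whose ratings correlate poorly with the provisional mean — the `min_corr` threshold and the screen itself are assumptions, not KonIQ-10k's actual protocol:

```python
import numpy as np

def mean_opinion_scores(ratings, min_corr=0.5):
    """MOS per image from a (workers, images) rating matrix (NaN = no vote).
    Workers whose ratings correlate poorly with the provisional mean are
    dropped -- a simple, illustrative reliability screen."""
    provisional = np.nanmean(ratings, axis=0)
    keep = []
    for w in range(ratings.shape[0]):
        voted = ~np.isnan(ratings[w])
        if voted.sum() < 2:
            continue
        r = np.corrcoef(ratings[w, voted], provisional[voted])[0, 1]
        if r >= min_corr:
            keep.append(w)
    return np.nanmean(ratings[keep], axis=0)

ratings = np.array([
    [1.0, 2.0, 3.0, 4.0],   # consistent worker
    [2.0, 3.0, 4.0, 5.0],   # consistent worker
    [5.0, 4.0, 3.0, 2.0],   # adversarial worker (anti-correlated, dropped)
])
mos = mean_opinion_scores(ratings)
```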
Deep Energy Estimator Networks
Title | Deep Energy Estimator Networks |
Authors | Saeed Saremi, Arash Mehrjou, Bernhard Schölkopf, Aapo Hyvärinen |
Abstract | Density estimation is a fundamental problem in statistical learning. This problem is especially challenging for complex high-dimensional data due to the curse of dimensionality. A promising solution to this problem is given here in an inference-free hierarchical framework that is built on score matching. We revisit the Bayesian interpretation of the score function and the Parzen score matching, and construct a multilayer perceptron with a scalable objective for learning the energy (i.e. the unnormalized log-density), which is then optimized with stochastic gradient descent. In addition, the resulting deep energy estimator network (DEEN) is designed as products of experts. We present the utility of DEEN in learning the energy, the score function, and in single-step denoising experiments for synthetic and high-dimensional data. We also diagnose stability problems in the direct estimation of the score function that had been observed for denoising autoencoders. |
Tasks | Denoising, Density Estimation |
Published | 2018-05-21 |
URL | http://arxiv.org/abs/1805.08306v1 |
PDF | http://arxiv.org/pdf/1805.08306v1.pdf |
PWC | https://paperswithcode.com/paper/deep-energy-estimator-networks |
Repo | |
Framework | |
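The single-step denoising the abstract mentions follows the empirical-Bayes identity x̂ = y + σ²∇y log p(y), where p(y) is the noisy marginal. A closed-form sketch with a Gaussian prior standing in for the learned energy (DEEN would supply the score via an MLP; all values below are illustrative):

```python
import numpy as np

mu, sigma = 2.0, 0.5   # prior mean and noise level (illustrative)

def marginal_score(y):
    # With prior x ~ N(mu, 1), the noisy marginal is p(y) = N(mu, 1 + sigma^2),
    # so d/dy log p(y) = (mu - y) / (1 + sigma^2).
    return (mu - y) / (1.0 + sigma**2)

rng = np.random.default_rng(0)
clean = mu + rng.normal(0.0, 1.0, 1000)        # x ~ N(mu, 1)
noisy = clean + rng.normal(0.0, sigma, 1000)   # y = x + noise
# Single-step denoising: x_hat = y + sigma^2 * score(y)
denoised = noisy + sigma**2 * marginal_score(noisy)
mse_noisy = float(np.mean((noisy - clean) ** 2))
mse_denoised = float(np.mean((denoised - clean) ** 2))
```

Because the update is the exact posterior mean here, the denoised estimate has lower mean squared error than the raw observation.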
New Feature Detection Mechanism for Extended Kalman Filter Based Monocular SLAM with 1-Point RANSAC
Title | New Feature Detection Mechanism for Extended Kalman Filter Based Monocular SLAM with 1-Point RANSAC |
Authors | Agniva Sengupta, Shafeeq Elanattil |
Abstract | We present a different approach to feature point detection for improving the accuracy of SLAM using a single monocular camera. Traditionally, the Harris corner detector, SURF or FAST corner detectors are used for finding feature points of interest in the image. We replace this with another approach, which involves building a non-linear scale-space representation of images using the Perona-Malik diffusion equation and computing the scale-normalized Hessian at multiple scale levels (KAZE features). The feature points so detected are used to estimate the state and pose of a monocular camera using an extended Kalman filter. By using accelerated KAZE features and a more rigorous feature rejection routine combined with 1-point RANSAC for outlier rejection, short-baseline feature matching is significantly improved, even with fewer feature points, especially in the presence of motion blur. We present a comparative study of our proposal with FAST and show improved localization accuracy in terms of absolute trajectory error. |
Tasks | |
Published | 2018-05-31 |
URL | http://arxiv.org/abs/1805.12443v1 |
PDF | http://arxiv.org/pdf/1805.12443v1.pdf |
PWC | https://paperswithcode.com/paper/new-feature-detection-mechanism-for-extended |
Repo | |
Framework | |
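The 1-point RANSAC idea can be sketched outside the EKF: with a strong motion prior, a single match is enough to hypothesise the model before counting inliers. A minimal pure-translation version — an illustrative stand-in, not the paper's EKF-coupled routine:

```python
import numpy as np

def one_point_ransac(src, dst, iters=50, thresh=0.1, rng=None):
    """Minimal 1-point RANSAC for a pure-translation model: each hypothesis
    comes from one match, then support (inliers) is counted over all matches."""
    rng = rng or np.random.default_rng(0)
    best_t, best_inliers = None, -1
    for _ in range(iters):
        i = rng.integers(len(src))
        t = dst[i] - src[i]                                  # 1-point model
        residuals = np.linalg.norm(dst - (src + t), axis=1)
        n_in = int((residuals < thresh).sum())
        if n_in > best_inliers:
            best_t, best_inliers = t, n_in
    return best_t, best_inliers

rng = np.random.default_rng(42)
src = rng.random((20, 2))
dst = src + np.array([1.0, 2.0])        # true motion
dst[:5] += 5.0                          # 5 gross outlier matches
t, n = one_point_ransac(src, dst)
```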
Left Ventricle Segmentation via Optical-Flow-Net from Short-axis Cine MRI: Preserving the Temporal Coherence of Cardiac Motion
Title | Left Ventricle Segmentation via Optical-Flow-Net from Short-axis Cine MRI: Preserving the Temporal Coherence of Cardiac Motion |
Authors | Wenjun Yan, Yuanyuan Wang, Zeju Li, Rob J. van der Geest, Qian Tao |
Abstract | Quantitative assessment of left ventricle (LV) function from cine MRI has significant diagnostic and prognostic value for cardiovascular disease patients. The temporal movement of the LV provides essential information on the contracting/relaxing pattern of the heart, which is keenly evaluated by clinical experts in clinical practice. Inspired by the way experts view cine MRI, we propose a new CNN module that is able to incorporate temporal information into LV segmentation from cine MRI. In the proposed CNN, the optical flow (OF) between neighboring frames is integrated and aggregated at the feature level, such that temporal coherence in cardiac motion can be taken into account during segmentation. The proposed module is integrated into the U-net architecture without the need for additional training. Furthermore, dilated convolution is introduced to improve the spatial accuracy of segmentation. Trained and tested on the Cardiac Atlas database, the proposed network resulted in a Dice index of 95% and an average perpendicular distance of 0.9 pixels for the middle LV contour, significantly outperforming the original U-net that processes each frame individually. Notably, the proposed method improved the temporal coherence of LV segmentation results, especially at the LV apex and base, where the cardiac motion is difficult to follow. |
Tasks | Optical Flow Estimation |
Published | 2018-10-20 |
URL | http://arxiv.org/abs/1810.08753v1 |
PDF | http://arxiv.org/pdf/1810.08753v1.pdf |
PWC | https://paperswithcode.com/paper/left-ventricle-segmentation-via-optical-flow |
Repo | |
Framework | |
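Feature-level aggregation of optical flow amounts to warping the previous frame's feature map by the flow and fusing it with the current one. A nearest-neighbour sketch; the averaging fusion is an assumption, since the paper aggregates inside a learned CNN module:

```python
import numpy as np

def warp_features(feat, flow):
    """Warp an (H, W) feature map by a dense (H, W, 2) flow (x, y order)
    with nearest-neighbour lookup, pulling each location's feature from
    where the flow says it came from in the previous frame."""
    h, w = feat.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_y = np.clip(np.round(ys - flow[..., 1]).astype(int), 0, h - 1)
    src_x = np.clip(np.round(xs - flow[..., 0]).astype(int), 0, w - 1)
    return feat[src_y, src_x]

feat = np.arange(16.0).reshape(4, 4)
flow = np.zeros((4, 4, 2)); flow[..., 0] = 1.0  # everything moved right by 1
warped = warp_features(feat, flow)
fused = 0.5 * (feat + warped)                   # naive temporal aggregation
```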
Full Reference Objective Quality Assessment for Reconstructed Background Images
Title | Full Reference Objective Quality Assessment for Reconstructed Background Images |
Authors | Aditee Shrotre, Lina Karam |
Abstract | With an increased interest in applications that require a clean background image, such as video surveillance, object tracking, street view imaging and location-based services on web-based maps, multiple algorithms have been developed to reconstruct a background image from cluttered scenes. Traditionally, statistical measures and existing image quality techniques have been applied for evaluating the quality of the reconstructed background images. Though these quality assessment methods have been widely used in the past, their performance in evaluating the perceived quality of the reconstructed background image has not been verified. In this work, we discuss the shortcomings in existing metrics and propose a full reference Reconstructed Background image Quality Index (RBQI) that combines color and structural information at multiple scales using a probability summation model to predict the perceived quality in the reconstructed background image given a reference image. To compare the performance of the proposed quality index with existing image quality assessment measures, we construct two different datasets consisting of reconstructed background images and corresponding subjective scores. The quality assessment measures are evaluated by correlating their objective scores with human subjective ratings. The correlation results show that the proposed RBQI outperforms all the existing approaches. Additionally, the constructed datasets and the corresponding subjective scores provide a benchmark to evaluate the performance of future metrics that are developed to evaluate the perceived quality of reconstructed background images. |
Tasks | Image Quality Assessment, Object Tracking |
Published | 2018-03-12 |
URL | http://arxiv.org/abs/1803.04103v2 |
PDF | http://arxiv.org/pdf/1803.04103v2.pdf |
PWC | https://paperswithcode.com/paper/full-reference-objective-quality-assessment |
Repo | |
Framework | |
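The probability summation model referenced for RBQI combines per-scale distortion-detection probabilities into a single detection probability. A minimal sketch of the standard independent-channels form (illustrative; not the exact RBQI weighting across color and structure):

```python
import numpy as np

def probability_summation(detect_probs):
    """Probability summation over independent channels/scales: the distortion
    is detected if any channel detects it, P = 1 - prod_i (1 - p_i)."""
    p = np.asarray(detect_probs, dtype=float)
    return float(1.0 - np.prod(1.0 - p))

print(probability_summation([0.5, 0.5]))  # 0.75
```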
Depth Pooling Based Large-scale 3D Action Recognition with Convolutional Neural Networks
Title | Depth Pooling Based Large-scale 3D Action Recognition with Convolutional Neural Networks |
Authors | Pichao Wang, Wanqing Li, Zhimin Gao, Chang Tang, Philip Ogunbona |
Abstract | This paper proposes three simple, compact yet effective representations of depth sequences, referred to respectively as Dynamic Depth Images (DDI), Dynamic Depth Normal Images (DDNI) and Dynamic Depth Motion Normal Images (DDMNI), for both isolated and continuous action recognition. These dynamic images are constructed from a segmented sequence of depth maps using hierarchical bidirectional rank pooling to effectively capture the spatial-temporal information. Specifically, DDI exploits the dynamics of postures over time, while DDNI and DDMNI exploit the 3D structural information captured by depth maps. Upon the proposed representations, a ConvNet-based method is developed for action recognition. The image-based representations enable us to fine-tune existing Convolutional Neural Network (ConvNet) models trained on image data without training a large number of parameters from scratch. The proposed method achieved state-of-the-art results on three large datasets, namely the Large-scale Continuous Gesture Recognition Dataset (mean Jaccard index 0.4109), the Large-scale Isolated Gesture Recognition Dataset (59.21%), and the NTU RGB+D Dataset (87.08% cross-subject and 84.22% cross-view), even though only the depth modality was used. |
Tasks | 3D Human Action Recognition, Gesture Recognition, Temporal Action Localization |
Published | 2018-03-17 |
URL | http://arxiv.org/abs/1804.01194v2 |
PDF | http://arxiv.org/pdf/1804.01194v2.pdf |
PWC | https://paperswithcode.com/paper/depth-pooling-based-large-scale-3d-action |
Repo | |
Framework | |
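Dynamic images of this kind are commonly built with approximate rank pooling, weighting frames so later ones contribute positively and earlier ones negatively. A sketch using the familiar closed-form coefficients αt = 2t − T − 1 (illustrative; the paper applies hierarchical bidirectional rank pooling on segmented sequences):

```python
import numpy as np

def dynamic_image(frames):
    """Collapse a (T, H, W) depth sequence into one image with approximate
    rank pooling weights alpha_t = 2t - T - 1 (t = 1..T): later frames count
    positively, earlier frames negatively, encoding the sequence's dynamics."""
    T = frames.shape[0]
    alphas = 2.0 * np.arange(1, T + 1) - T - 1
    return np.tensordot(alphas, frames, axes=1)

frames = np.stack([np.full((2, 2), v) for v in (1.0, 2.0, 3.0)])
di = dynamic_image(frames)   # weights (-2, 0, 2) -> constant image of 4.0
```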
Speaker-Invariant Training via Adversarial Learning
Title | Speaker-Invariant Training via Adversarial Learning |
Authors | Zhong Meng, Jinyu Li, Zhuo Chen, Yong Zhao, Vadim Mazalov, Yifan Gong, Biing-Hwang Juang |
Abstract | We propose a novel adversarial multi-task learning scheme, aiming at actively curtailing the inter-talker feature variability while maximizing its senone discriminability so as to enhance the performance of a deep neural network (DNN) based ASR system. We call the scheme speaker-invariant training (SIT). In SIT, a DNN acoustic model and a speaker classifier network are jointly optimized to minimize the senone (tied triphone state) classification loss, and simultaneously mini-maximize the speaker classification loss. A speaker-invariant and senone-discriminative deep feature is learned through this adversarial multi-task learning. With SIT, a canonical DNN acoustic model with significantly reduced variance in its output probabilities is learned with no explicit speaker-independent (SI) transformations or speaker-specific representations used in training or testing. Evaluated on the CHiME-3 dataset, the SIT achieves 4.99% relative word error rate (WER) improvement over the conventional SI acoustic model. With additional unsupervised speaker adaptation, the speaker-adapted (SA) SIT model achieves 4.86% relative WER gain over the SA SI acoustic model. |
Tasks | Multi-Task Learning |
Published | 2018-04-02 |
URL | http://arxiv.org/abs/1804.00732v3 |
PDF | http://arxiv.org/pdf/1804.00732v3.pdf |
PWC | https://paperswithcode.com/paper/speaker-invariant-training-via-adversarial |
Repo | |
Framework | |
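SIT's mini-max structure means the encoder descends the senone loss while ascending the speaker loss (gradient reversal), and the speaker classifier descends its own loss. A scalar sketch of one update step on pre-computed gradients; the λ weight and learning rate are placeholders, not the paper's values:

```python
def sit_updates(g_senone, g_speaker_enc, g_speaker_cls, lam=0.5, lr=0.1):
    """One illustrative SIT step on scalar gradients.  The encoder descends
    the senone loss but *ascends* the speaker loss (gradient reversal, scaled
    by lam); the speaker classifier descends its loss normally.  Returns the
    two update amounts to subtract from the respective parameters."""
    enc_update = lr * (g_senone - lam * g_speaker_enc)  # reversed speaker grad
    cls_update = lr * g_speaker_cls                     # ordinary descent
    return enc_update, cls_update

enc, cls = sit_updates(1.0, 2.0, 3.0)
```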
Automatic Semantic Content Removal by Learning to Neglect
Title | Automatic Semantic Content Removal by Learning to Neglect |
Authors | Siyang Qin, Jiahui Wei, Roberto Manduchi |
Abstract | We introduce a new system for automatic image content removal and inpainting. Unlike traditional inpainting algorithms, which require advance knowledge of the region to be filled in, our system automatically detects the area to be removed and infilled. Region segmentation and inpainting are performed jointly in a single pass. In this way, potential segmentation errors are more naturally alleviated by the inpainting module. The system is implemented as an encoder-decoder architecture, with two decoder branches, one tasked with segmentation of the foreground region, the other with inpainting. The encoder and the two decoder branches are linked via neglect nodes, which guide the inpainting process in selecting which areas need reconstruction. The whole model is trained using a conditional GAN strategy. Comparative experiments show that our algorithm outperforms state-of-the-art inpainting techniques (which, unlike our system, do not segment the input image and thus must be aided by an external segmentation module). |
Tasks | |
Published | 2018-07-20 |
URL | http://arxiv.org/abs/1807.07696v1 |
PDF | http://arxiv.org/pdf/1807.07696v1.pdf |
PWC | https://paperswithcode.com/paper/automatic-semantic-content-removal-by |
Repo | |
Framework | |
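The neglect gating between the two decoder branches can be pictured as mask-guided compositing: the segmentation branch decides where the inpainting branch's output replaces the input. A one-line sketch (illustrative; in the actual network the gating happens at feature level, not on final pixels):

```python
import numpy as np

def composite(image, inpainted, mask):
    """Blend the inpainting branch into the input under the segmentation
    branch's soft mask (1 = region to remove and refill)."""
    return mask * inpainted + (1.0 - mask) * image

img = np.zeros((4, 4))
fill = np.ones((4, 4))
mask = np.zeros((4, 4)); mask[1:3, 1:3] = 1.0   # detected foreground region
out = composite(img, fill, mask)
```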
Characterizing the public perception of WhatsApp through the lens of media
Title | Characterizing the public perception of WhatsApp through the lens of media |
Authors | Josemar Alves Caetano, Gabriel Magno, Evandro Cunha, Wagner Meira Jr., Humberto T. Marques-Neto, Virgilio Almeida |
Abstract | WhatsApp is, as of 2018, a significant component of the global information and communication infrastructure, especially in developing countries. However, probably due to its strong end-to-end encryption, WhatsApp became an attractive place for the dissemination of misinformation, extremism and other forms of undesirable behavior. In this paper, we investigate the public perception of WhatsApp through the lens of media. We analyze two large datasets of news and show the kind of content that is being associated with WhatsApp in different regions of the world and over time. Our analyses include the examination of named entities, general vocabulary, and topics addressed in news articles that mention WhatsApp, as well as the polarity of these texts. Among other results, we demonstrate that the vocabulary and topics around the term “whatsapp” in the media have been changing over the years and in 2018 concentrate on matters related to misinformation, politics and criminal scams. More generally, our findings are useful for understanding the role that tools like WhatsApp play in contemporary society and how they are seen by the communities themselves. |
Tasks | |
Published | 2018-08-17 |
URL | http://arxiv.org/abs/1808.05927v1 |
PDF | http://arxiv.org/pdf/1808.05927v1.pdf |
PWC | https://paperswithcode.com/paper/characterizing-the-public-perception-of |
Repo | |
Framework | |
An argument in favor of strong scaling for deep neural networks with small datasets
Title | An argument in favor of strong scaling for deep neural networks with small datasets |
Authors | Renato L. de F. Cunha, Eduardo R. Rodrigues, Matheus Palhares Viana, Dario Augusto Borges Oliveira |
Abstract | In recent years, with the popularization of deep learning frameworks and large datasets, researchers have started parallelizing their models in order to train faster. This is crucially important, because they typically explore many hyperparameters in order to find the best ones for their applications. This process is time consuming and, consequently, speeding up training improves productivity. One approach to parallelizing deep learning models, followed by many researchers, is based on weak scaling: the minibatch grows in size as new GPUs are added to the system. In addition, new learning rate schedules have been proposed to fix optimization issues that occur with large minibatch sizes. In this paper, however, we show that the recommendations provided by recent work do not apply to models that lack large datasets. In fact, we argue in favor of using strong scaling for achieving reliable performance in such cases. We evaluated our approach with up to 32 GPUs and show that weak scaling not only fails to match the accuracy of the sequential model, it also fails to converge most of the time. Meanwhile, strong scaling has good scalability while having exactly the same accuracy as a sequential implementation. |
Tasks | |
Published | 2018-07-24 |
URL | http://arxiv.org/abs/1807.09161v2 |
PDF | http://arxiv.org/pdf/1807.09161v2.pdf |
PWC | https://paperswithcode.com/paper/an-argument-in-favor-of-strong-scaling-for |
Repo | |
Framework | |
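The weak/strong distinction comes down to how the minibatch is sized as GPUs are added. A sketch of the two policies (illustrative; `base_batch` is whatever the sequential run used):

```python
def per_gpu_batch(base_batch, n_gpus, scaling="strong"):
    """Per-GPU minibatch under the two scaling regimes.  Weak scaling keeps
    the per-GPU batch fixed, so the global batch grows with n_gpus; strong
    scaling keeps the global batch fixed (identical optimization behavior to
    the sequential run) and splits it across GPUs."""
    if scaling == "weak":
        return base_batch                   # global batch = base_batch * n_gpus
    return max(1, base_batch // n_gpus)     # global batch stays base_batch
```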
Deep Reinforcement Learning for Time Scheduling in RF-Powered Backscatter Cognitive Radio Networks
Title | Deep Reinforcement Learning for Time Scheduling in RF-Powered Backscatter Cognitive Radio Networks |
Authors | Tran The Anh, Nguyen Cong Luong, Dusit Niyato, Ying-Chang Liang, Dong In Kim |
Abstract | In an RF-powered backscatter cognitive radio network, multiple secondary users communicate with a secondary gateway by backscattering or harvesting energy and actively transmitting their data depending on the primary channel state. To coordinate the transmission of multiple secondary transmitters, the secondary gateway needs to schedule the backscattering time, energy harvesting time, and transmission time among them. However, under the dynamics of the primary channel and the uncertainty of the energy state of the secondary transmitters, it is challenging for the gateway to find a time scheduling mechanism which maximizes the total throughput. In this paper, we propose to use the deep reinforcement learning algorithm to derive an optimal time scheduling policy for the gateway. Specifically, to deal with the problem with large state and action spaces, we adopt a Double Deep-Q Network (DDQN) that enables the gateway to learn the optimal policy. The simulation results clearly show that the proposed deep reinforcement learning algorithm outperforms non-learning schemes in terms of network throughput. |
Tasks | |
Published | 2018-10-03 |
URL | http://arxiv.org/abs/1810.04520v1 |
PDF | http://arxiv.org/pdf/1810.04520v1.pdf |
PWC | https://paperswithcode.com/paper/deep-reinforcement-learning-for-time |
Repo | |
Framework | |
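The Double DQN target the gateway trains against decouples action selection (online network) from action evaluation (target network). A minimal sketch of the target computation (illustrative; state/action encodings for the scheduling problem are omitted):

```python
import numpy as np

def ddqn_target(reward, q_online_next, q_target_next, gamma=0.99, done=False):
    """Double DQN bootstrap target: the online network picks the next action,
    the target network evaluates it, which curbs Q-value overestimation."""
    if done:
        return float(reward)
    a_star = int(np.argmax(q_online_next))                 # selection: online
    return float(reward + gamma * q_target_next[a_star])   # evaluation: target
```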
Traffic Density Estimation using a Convolutional Neural Network
Title | Traffic Density Estimation using a Convolutional Neural Network |
Authors | Julian Nubert, Nicholas Giai Truong, Abel Lim, Herbert Ilhan Tanujaya, Leah Lim, Mai Anh Vu |
Abstract | The goal of this project is to introduce and present a machine learning application that aims to improve the quality of life of people in Singapore. In particular, we investigate the use of machine learning solutions to tackle the problem of traffic congestion in Singapore. In layman’s terms, we seek to make Singapore (or any other city) a smoother place. To accomplish this aim, we present an end-to-end system comprising (1) a traffic density estimation algorithm at traffic lights/junctions and (2) suitable traffic signal control algorithms that make use of the density information for better traffic control. Traffic density estimation can be obtained from traffic junction images using various machine learning techniques (combined with CV tools). After research into various advanced machine learning methods, we decided on convolutional neural networks (CNNs). We conducted experiments on our algorithms, using the publicly available traffic camera dataset published by the Land Transport Authority (LTA), to demonstrate the feasibility of this approach. With these traffic density estimates, different traffic algorithms can be applied to minimize congestion at traffic junctions in general. |
Tasks | Density Estimation |
Published | 2018-09-05 |
URL | http://arxiv.org/abs/1809.01564v1 |
PDF | http://arxiv.org/pdf/1809.01564v1.pdf |
PWC | https://paperswithcode.com/paper/traffic-density-estimation-using-a |
Repo | |
Framework | |
VIREL: A Variational Inference Framework for Reinforcement Learning
Title | VIREL: A Variational Inference Framework for Reinforcement Learning |
Authors | Matthew Fellows, Anuj Mahajan, Tim G. J. Rudner, Shimon Whiteson |
Abstract | Applying probabilistic models to reinforcement learning (RL) enables the application of powerful optimisation tools such as variational inference to RL. However, existing inference frameworks and their algorithms pose significant challenges for learning optimal policies, e.g., the absence of mode-capturing behaviour in pseudo-likelihood methods and difficulties learning deterministic policies in maximum entropy RL based approaches. We propose VIREL, a novel, theoretically grounded probabilistic inference framework for RL that utilises a parametrised action-value function to summarise future dynamics of the underlying MDP. This gives VIREL a mode-seeking form of KL divergence, the ability to learn deterministic optimal policies naturally from inference and the ability to optimise value functions and policies in separate, iterative steps. In applying variational expectation-maximisation to VIREL, we thus show that the actor-critic algorithm can be reduced to expectation-maximisation, with policy improvement equivalent to an E-step and policy evaluation to an M-step. We then derive a family of actor-critic methods from VIREL, including a scheme for adaptive exploration. Finally, we demonstrate that actor-critic algorithms from this family outperform state-of-the-art methods based on soft value functions in several domains. |
Tasks | |
Published | 2018-11-03 |
URL | https://arxiv.org/abs/1811.01132v8 |
PDF | https://arxiv.org/pdf/1811.01132v8.pdf |
PWC | https://paperswithcode.com/paper/virel-a-variational-inference-framework-for |
Repo | |
Framework | |
TBD: Benchmarking and Analyzing Deep Neural Network Training
Title | TBD: Benchmarking and Analyzing Deep Neural Network Training |
Authors | Hongyu Zhu, Mohamed Akrout, Bojian Zheng, Andrew Pelegris, Amar Phanishayee, Bianca Schroeder, Gennady Pekhimenko |
Abstract | The recent popularity of deep neural networks (DNNs) has generated a lot of research interest in performing DNN-related computation efficiently. However, the primary focus is usually very narrow and limited to (i) inference – i.e. how to efficiently execute already trained models and (ii) image classification networks as the primary benchmark for evaluation. Our primary goal in this work is to break this myopic view by (i) proposing a new benchmark for DNN training, called TBD (TBD is short for Training Benchmark for DNNs), that uses a representative set of DNN models that cover a wide range of machine learning applications: image classification, machine translation, speech recognition, object detection, adversarial networks, reinforcement learning, and (ii) by performing an extensive performance analysis of training these different applications on three major deep learning frameworks (TensorFlow, MXNet, CNTK) across different hardware configurations (single-GPU, multi-GPU, and multi-machine). TBD currently covers six major application domains and eight different state-of-the-art models. We present a new toolchain for performance analysis for these models that combines the targeted usage of existing performance analysis tools, careful selection of new and existing metrics and methodologies to analyze the results, and utilization of domain specific characteristics of DNN training. We also build a new set of tools for memory profiling in all three major frameworks; much needed tools that can finally shed some light on precisely how much memory is consumed by different data structures (weights, activations, gradients, workspace) in DNN training. By using our tools and methodologies, we make several important observations and recommendations on where the future research and optimization of DNN training should be focused. |
Tasks | Image Classification, Machine Translation, Object Detection, Speech Recognition |
Published | 2018-03-16 |
URL | http://arxiv.org/abs/1803.06905v2 |
PDF | http://arxiv.org/pdf/1803.06905v2.pdf |
PWC | https://paperswithcode.com/paper/tbd-benchmarking-and-analyzing-deep-neural |
Repo | |
Framework | |