October 19, 2019

Paper Group ANR 373

Application of Deep Learning in Fundus Image Processing for Ophthalmic Diagnosis – A Review

Title Application of Deep Learning in Fundus Image Processing for Ophthalmic Diagnosis – A Review
Authors Sourya Sengupta, Amitojdeep Singh, Henry A. Leopold, Tanmay Gulati, Vasudevan Lakshminarayanan
Abstract An overview of the applications of deep learning in ophthalmic diagnosis using retinal fundus images is presented. We also review various retinal image datasets that can be used for deep learning purposes. Applications of deep learning for segmentation of optic disk, blood vessels and retinal layer as well as detection of lesions are reviewed. Recent deep learning models for classification of diseases such as age-related macular degeneration, glaucoma, diabetic macular edema and diabetic retinopathy are also reported.
Tasks
Published 2018-12-09
URL https://arxiv.org/abs/1812.07101v3
PDF https://arxiv.org/pdf/1812.07101v3.pdf
PWC https://paperswithcode.com/paper/ophthalmic-diagnosis-and-deep-learning-a
Repo
Framework

Face Hallucination Revisited: An Exploratory Study on Dataset Bias

Title Face Hallucination Revisited: An Exploratory Study on Dataset Bias
Authors Klemen Grm, Martin Pernuš, Leo Cluzel, Walter Scheirer, Simon Dobrišek, Vitomir Štruc
Abstract Contemporary face hallucination (FH) models exhibit considerable ability to reconstruct high-resolution (HR) details from low-resolution (LR) face images. This ability is commonly learned from examples of corresponding HR-LR image pairs, created by artificially down-sampling the HR ground truth data. This down-sampling (or degradation) procedure not only defines the characteristics of the LR training data, but also determines the type of image degradations the learned FH models are eventually able to handle. If the image characteristics encountered with real-world LR images differ from the ones seen during training, FH models are still expected to perform well, but in practice may not produce the desired results. In this paper we study this problem and explore the bias introduced into FH models by the characteristics of the training data. We systematically analyze the generalization capabilities of several FH models in various scenarios where the image degradation function does not match the training setup, and conduct experiments with synthetically downgraded as well as real-life low-quality images. We make several interesting findings that provide insight into existing problems with FH models and point to future research directions.
Tasks Face Hallucination
Published 2018-12-21
URL http://arxiv.org/abs/1812.09010v1
PDF http://arxiv.org/pdf/1812.09010v1.pdf
PWC https://paperswithcode.com/paper/face-hallucination-revisited-an-exploratory
Repo
Framework

Joint Face Hallucination and Deblurring via Structure Generation and Detail Enhancement

Title Joint Face Hallucination and Deblurring via Structure Generation and Detail Enhancement
Authors Yibing Song, Jiawei Zhang, Lijun Gong, Shengfeng He, Linchao Bao, Jinshan Pan, Qingxiong Yang, Ming-Hsuan Yang
Abstract We address the problem of restoring a high-resolution face image from a blurry low-resolution input. This problem is difficult as super-resolution and deblurring need to be tackled simultaneously. Moreover, existing algorithms cannot handle face images well as low-resolution face images do not have much texture which is especially critical for deblurring. In this paper, we propose an effective algorithm by utilizing the domain-specific knowledge of human faces to recover high-quality faces. We first propose a facial component guided deep Convolutional Neural Network (CNN) to restore a coarse face image, which is denoted as the base image where the facial component is automatically generated from the input face image. However, the CNN based method cannot handle image details well. We further develop a novel exemplar-based detail enhancement algorithm via facial component matching. Extensive experiments show that the proposed method outperforms the state-of-the-art algorithms both quantitatively and qualitatively.
Tasks Deblurring, Face Hallucination, Super-Resolution
Published 2018-11-22
URL http://arxiv.org/abs/1811.09019v1
PDF http://arxiv.org/pdf/1811.09019v1.pdf
PWC https://paperswithcode.com/paper/joint-face-hallucination-and-deblurring-via
Repo
Framework

Learning random points from geometric graphs or orderings

Title Learning random points from geometric graphs or orderings
Authors Josep Diaz, Colin McDiarmid, Dieter Mitsche
Abstract Suppose that there is a family of $n$ random points $X_v$ for $v \in V$, independently and uniformly distributed in the square $\left[-\sqrt{n}/2,\sqrt{n}/2\right]^2$ of area $n$. We do not see these points, but learn about them in one of the following two ways. Suppose first that we are given the corresponding random geometric graph $G$, where distinct vertices $u$ and $v$ are adjacent when the Euclidean distance $d_E(X_u,X_v)$ is at most $r$. If the threshold distance $r$ satisfies $n^{3/14} \ll r \ll n^{1/2}$, then the following holds with high probability. Given the graph $G$ (without any geometric information), in polynomial time we can approximately reconstruct the hidden embedding, in the sense that, 'up to symmetries', for each vertex $v$ we find a point within distance about $r$ of $X_v$; that is, we find an embedding with 'displacement' at most about $r$. Now suppose that, instead of being given the graph $G$, we are given, for each vertex $v$, the ordering of the other vertices by increasing Euclidean distance from $v$. Then, with high probability, in polynomial time we can find an embedding with the much smaller displacement error $O(\sqrt{\log n})$.
Tasks
Published 2018-09-26
URL https://arxiv.org/abs/1809.09879v2
PDF https://arxiv.org/pdf/1809.09879v2.pdf
PWC https://paperswithcode.com/paper/learning-random-points-from-geometric-graphs
Repo
Framework
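
The two kinds of input described in this abstract can be generated with a few lines of Python. This is only a sketch of the problem setup, not the reconstruction algorithm from the paper; the function names `random_geometric_graph` and `distance_orderings`, the `seed` parameter, and the brute-force pairwise loop are illustrative assumptions.

```python
import math
import random


def random_geometric_graph(n, r, seed=0):
    """Sample n uniform points in the square of area n; join pairs at distance <= r."""
    rng = random.Random(seed)
    half = math.sqrt(n) / 2  # the square is [-sqrt(n)/2, sqrt(n)/2]^2
    points = [(rng.uniform(-half, half), rng.uniform(-half, half)) for _ in range(n)]
    edges = set()
    for u in range(n):
        for v in range(u + 1, n):
            if math.dist(points[u], points[v]) <= r:
                edges.add((u, v))
    return points, edges


def distance_orderings(points):
    """For each vertex v, list the other vertices by increasing distance from v."""
    n = len(points)
    return [
        sorted((u for u in range(n) if u != v),
               key=lambda u: math.dist(points[v], points[u]))
        for v in range(n)
    ]


if __name__ == "__main__":
    pts, graph = random_geometric_graph(n=200, r=3.0)
    orders = distance_orderings(pts)
    print(len(graph), "edges; nearest neighbour of vertex 0 is", orders[0][0])
```

A learner in the first setting would see only `graph`, and in the second only `orders`; the hidden `pts` are what the embedding is meant to approximate.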

Unifying Bilateral Filtering and Adversarial Training for Robust Neural Networks

Title Unifying Bilateral Filtering and Adversarial Training for Robust Neural Networks
Authors Neale Ratzlaff, Li Fuxin
Abstract Recent analysis of deep neural networks has revealed their vulnerability to carefully structured adversarial examples. Many effective algorithms exist to craft these adversarial examples, but performant defenses seem to be far away. In this work, we explore the use of edge-aware bilateral filtering as a projection back to the space of natural images. We show that bilateral filtering is an effective defense in multiple attack settings, where the strength of the adversary gradually increases. In the case of an adversary who has no knowledge of the defense, bilateral filtering can remove more than 90% of adversarial examples from a variety of different attacks. To evaluate against an adversary with complete knowledge of our defense, we adapt the bilateral filter as a trainable layer in a neural network and show that adding this layer makes ImageNet images significantly more robust to attacks. When trained under a framework of adversarial training, we show that the resulting model is hard to fool with even the best attack methods.
Tasks
Published 2018-04-05
URL http://arxiv.org/abs/1804.01635v3
PDF http://arxiv.org/pdf/1804.01635v3.pdf
PWC https://paperswithcode.com/paper/unifying-bilateral-filtering-and-adversarial
Repo
Framework
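
To make "edge-aware" concrete, here is a minimal bilateral filter on a 1D signal. It is a sketch under stated assumptions: the paper applies bilateral filtering to images, whereas this version is one-dimensional for brevity, and the function name and default parameters (`radius`, `sigma_space`, `sigma_range`) are chosen here for illustration rather than taken from the paper.

```python
import math


def bilateral_filter_1d(signal, radius=4, sigma_space=2.0, sigma_range=0.1):
    """Edge-aware smoothing: each output sample is a weighted mean of its neighbours.

    The weight of neighbour j for sample i is the product of a spatial Gaussian
    in (i - j) and a range Gaussian in (signal[i] - signal[j]), so neighbours
    with very different values contribute almost nothing.
    """
    n = len(signal)
    out = []
    for i in range(n):
        total = 0.0
        norm = 0.0
        for j in range(max(0, i - radius), min(n, i + radius + 1)):
            w_space = math.exp(-((i - j) ** 2) / (2 * sigma_space ** 2))
            w_range = math.exp(-((signal[i] - signal[j]) ** 2) / (2 * sigma_range ** 2))
            total += w_space * w_range * signal[j]
            norm += w_space * w_range
        out.append(total / norm)
    return out


if __name__ == "__main__":
    step = [0.0] * 10 + [1.0] * 10
    print(bilateral_filter_1d(step)[8:12])
```

Small perturbations are averaged away because nearby samples have similar values, while a large jump such as the step in the example is left essentially intact because the range weight across the jump is close to zero.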

Emergence of Linguistic Communication from Referential Games with Symbolic and Pixel Input

Title Emergence of Linguistic Communication from Referential Games with Symbolic and Pixel Input
Authors Angeliki Lazaridou, Karl Moritz Hermann, Karl Tuyls, Stephen Clark
Abstract The ability of algorithms to evolve or learn (compositional) communication protocols has traditionally been studied in the language evolution literature through the use of emergent communication tasks. Here we scale up this research by using contemporary deep learning methods and by training reinforcement-learning neural network agents on referential communication games. We extend previous work, in which agents were trained in symbolic environments, by developing agents which are able to learn from raw pixel data, a more challenging and realistic input representation. We find that the degree of structure found in the input data affects the nature of the emerged protocols, and thereby corroborate the hypothesis that structured compositional language is most likely to emerge when agents perceive the world as being structured.
Tasks
Published 2018-04-11
URL http://arxiv.org/abs/1804.03984v1
PDF http://arxiv.org/pdf/1804.03984v1.pdf
PWC https://paperswithcode.com/paper/emergence-of-linguistic-communication-from
Repo
Framework

Learning Contracting Vector Fields For Stable Imitation Learning

Title Learning Contracting Vector Fields For Stable Imitation Learning
Authors Vikas Sindhwani, Stephen Tu, Mohi Khansari
Abstract We propose a new non-parametric framework for learning incrementally stable dynamical systems x' = f(x) from a set of sampled trajectories. We construct a rich family of smooth vector fields induced by certain classes of matrix-valued kernels, whose equilibria are placed exactly at a desired set of locations and whose local contraction and curvature properties at various points can be explicitly controlled using convex optimization. With curl-free kernels, our framework may also be viewed as a mechanism to learn potential fields and gradient flows. We develop large-scale techniques using randomized kernel approximations in this context. We demonstrate our approach, called contracting vector fields (CVF), on imitation learning tasks involving complex point-to-point human handwriting motions.
Tasks Imitation Learning
Published 2018-04-13
URL http://arxiv.org/abs/1804.04878v1
PDF http://arxiv.org/pdf/1804.04878v1.pdf
PWC https://paperswithcode.com/paper/learning-contracting-vector-fields-for-stable
Repo
Framework

Towards Understanding Language through Perception in Situated Human-Robot Interaction: From Word Grounding to Grammar Induction

Title Towards Understanding Language through Perception in Situated Human-Robot Interaction: From Word Grounding to Grammar Induction
Authors Amir Aly, Tadahiro Taniguchi
Abstract Robots are widely collaborating with human users in different tasks that require high-level cognitive functions to make them able to discover the surrounding environment. A difficult challenge that we briefly highlight in this short paper is inferring the latent grammatical structure of language, which includes grounding parts of speech (e.g., verbs, nouns, adjectives, and prepositions) through visual perception, and induction of Combinatory Categorial Grammar (CCG) for phrases. This paves the way towards grounding phrases so as to make a robot able to understand human instructions appropriately during interaction.
Tasks
Published 2018-12-12
URL https://arxiv.org/abs/1812.04840v3
PDF https://arxiv.org/pdf/1812.04840v3.pdf
PWC https://paperswithcode.com/paper/towards-understanding-language-through
Repo
Framework

A Comprehensive Approach for Learning-based Fully-Automated Inter-slice Motion Correction for Short-Axis Cine Cardiac MR Image Stacks

Title A Comprehensive Approach for Learning-based Fully-Automated Inter-slice Motion Correction for Short-Axis Cine Cardiac MR Image Stacks
Authors Giacomo Tarroni, Ozan Oktay, Matthew Sinclair, Wenjia Bai, Andreas Schuh, Hideaki Suzuki, Antonio de Marvao, Declan O’Regan, Stuart Cook, Daniel Rueckert
Abstract In the clinical routine, short axis (SA) cine cardiac MR (CMR) image stacks are acquired during multiple subsequent breath-holds. If the patient cannot consistently hold the breath at the same position, the acquired image stack will be affected by inter-slice respiratory motion and will not correctly represent the cardiac volume, introducing potential errors in the following analyses and visualisations. We propose an approach to automatically correct inter-slice respiratory motion in SA CMR image stacks. Our approach makes use of probabilistic segmentation maps (PSMs) of the left ventricular (LV) cavity generated with decision forests. PSMs are generated for each slice of the SA stack and rigidly registered in-plane to a target PSM. If long axis (LA) images are available, PSMs are generated for them and combined to create the target PSM; if not, the target PSM is produced from the same stack using a 3D model trained from motion-free stacks. The proposed approach was tested on a dataset of SA stacks acquired from 24 healthy subjects (for which anatomical 3D cardiac images were also available as reference) and compared to two techniques which use LA intensity images and LA segmentations as targets, respectively. The results show the accuracy and robustness of the proposed approach in motion compensation.
Tasks Motion Compensation
Published 2018-10-03
URL http://arxiv.org/abs/1810.02201v1
PDF http://arxiv.org/pdf/1810.02201v1.pdf
PWC https://paperswithcode.com/paper/a-comprehensive-approach-for-learning-based
Repo
Framework

A Reflectance Based Method For Shadow Detection and Removal

Title A Reflectance Based Method For Shadow Detection and Removal
Authors Sri Kalyan Yarlagadda, Fengqing Zhu
Abstract Shadows are a common aspect of images and, when left undetected, can hinder scene understanding and visual processing. We propose a simple yet effective approach based on reflectance to detect shadows from a single image. An image is first segmented and, based on the reflectance, illumination and texture characteristics, segment pairs are identified as shadow and non-shadow pairs. The proposed method is tested on two publicly available and widely used datasets. Our method achieves higher accuracy in detecting shadows compared to previously reported methods despite requiring fewer parameters. We also show results of shadow-free images by relighting the pixels in the detected shadow regions.
Tasks Detecting Shadows, Scene Understanding, Shadow Detection, Shadow Detection And Removal
Published 2018-07-11
URL http://arxiv.org/abs/1807.04352v1
PDF http://arxiv.org/pdf/1807.04352v1.pdf
PWC https://paperswithcode.com/paper/a-reflectance-based-method-for-shadow
Repo
Framework

Deep Decoder: Concise Image Representations from Untrained Non-convolutional Networks

Title Deep Decoder: Concise Image Representations from Untrained Non-convolutional Networks
Authors Reinhard Heckel, Paul Hand
Abstract Deep neural networks, in particular convolutional neural networks, have become highly effective tools for compressing images and solving inverse problems including denoising, inpainting, and reconstruction from few and noisy measurements. This success can be attributed in part to their ability to represent and generate natural images well. Contrary to classical tools such as wavelets, image-generating deep neural networks have a large number of parameters—typically a multiple of their output dimension—and need to be trained on large datasets. In this paper, we propose an untrained simple image model, called the deep decoder, which is a deep neural network that can generate natural images from very few weight parameters. The deep decoder has a simple architecture with no convolutions and fewer weight parameters than the output dimensionality. This underparameterization enables the deep decoder to compress images into a concise set of network weights, which we show is on par with wavelet-based thresholding. Further, underparameterization provides a barrier to overfitting, allowing the deep decoder to have state-of-the-art performance for denoising. The deep decoder is simple in the sense that each layer has an identical structure that consists of only one upsampling unit, pixel-wise linear combination of channels, ReLU activation, and channelwise normalization. This simplicity makes the network amenable to theoretical analysis, and it sheds light on the aspects of neural networks that enable them to form effective signal representations.
Tasks Denoising
Published 2018-10-02
URL http://arxiv.org/abs/1810.03982v1
PDF http://arxiv.org/pdf/1810.03982v1.pdf
PWC https://paperswithcode.com/paper/deep-decoder-concise-image-representations
Repo
Framework

Generalized chart constraints for efficient PCFG and TAG parsing

Title Generalized chart constraints for efficient PCFG and TAG parsing
Authors Stefan Grünewald, Sophie Henning, Alexander Koller
Abstract Chart constraints, which specify at which string positions a constituent may begin or end, have been shown to speed up chart parsers for PCFGs. We generalize chart constraints to more expressive grammar formalisms and describe a neural tagger which predicts chart constraints at very high precision. Our constraints accelerate both PCFG and TAG parsing, and combine effectively with other pruning techniques (coarse-to-fine and supertagging) for an overall speedup of two orders of magnitude, while improving accuracy.
Tasks
Published 2018-06-27
URL http://arxiv.org/abs/1806.10654v1
PDF http://arxiv.org/pdf/1806.10654v1.pdf
PWC https://paperswithcode.com/paper/generalized-chart-constraints-for-efficient
Repo
Framework
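
The pruning effect of chart constraints can be illustrated with a toy CKY recognizer. This is a simplified sketch, not the authors' parser: it handles only an unweighted grammar in Chomsky normal form, the constraint sets `can_begin` and `can_end` are supplied by hand instead of being predicted by a neural tagger, and all names and the tiny grammar are hypothetical.

```python
def cky_recognize(words, lexicon, binary_rules, start="S",
                  can_begin=None, can_end=None):
    """CKY recognition; multi-word spans violating the constraints are skipped.

    Positions are fenceposts: span (i, j) covers words[i:j]. A multi-word span
    is considered only if i is in can_begin and j is in can_end (None = allow all).
    Returns (recognized, number_of_multi_word_cells_visited).
    """
    n = len(words)
    chart = {(i, i + 1): set(lexicon.get(w, ())) for i, w in enumerate(words)}
    visited = 0
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            if can_begin is not None and i not in can_begin:
                continue
            if can_end is not None and j not in can_end:
                continue
            visited += 1
            cell = set()
            for k in range(i + 1, j):
                left = chart.get((i, k), ())
                right = chart.get((k, j), ())
                for (b, c), parents in binary_rules.items():
                    if b in left and c in right:
                        cell |= parents
            chart[(i, j)] = cell
    return start in chart.get((0, n), ()), visited


# Toy grammar: S -> NP VP, NP -> Det N, VP -> V NP
LEXICON = {"the": {"Det"}, "dog": {"N"}, "cat": {"N"}, "sees": {"V"}}
RULES = {("NP", "VP"): {"S"}, ("Det", "N"): {"NP"}, ("V", "NP"): {"VP"}}

if __name__ == "__main__":
    sentence = "the dog sees the cat".split()
    print(cky_recognize(sentence, LEXICON, RULES))
    print(cky_recognize(sentence, LEXICON, RULES,
                        can_begin={0, 2, 3}, can_end={2, 5}))
```

In the second call the recognizer still accepts the sentence but touches fewer chart cells, which is the source of the speedup; an overly strict constraint set would instead prune a needed constituent, which is why the abstract stresses high-precision constraint prediction.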

Bi-GANs-ST for Perceptual Image Super-resolution

Title Bi-GANs-ST for Perceptual Image Super-resolution
Authors Xiaotong Luo, Rong Chen, Yuan Xie, Yanyun Qu, Cuihua Li
Abstract Image quality measurement is a critical problem for image super-resolution (SR) algorithms. Usually, they are evaluated by well-known objective metrics, e.g., PSNR and SSIM, but these indices do not give results that agree with human perception. Recently, a more reasonable perception measurement has been proposed in [1], which is also adopted by the PIRM-SR 2018 challenge. In this paper, motivated by [1], we aim to generate a high-quality SR result which balances the two indices, i.e., the perception index and root-mean-square error (RMSE). To do so, we design a new deep SR framework, dubbed Bi-GANs-ST, by integrating two complementary generative adversarial network (GAN) branches. One is memory residual SRGAN (MR-SRGAN), which emphasizes improving the objective performance, such as reducing the RMSE. The other is weight perception SRGAN (WP-SRGAN), which obtains a result that favors better subjective perception via a two-stage adversarial training mechanism. Then, to produce a final result with excellent perception scores and RMSE, we use a soft-thresholding method to merge the results generated by the two GANs. Our method performs well on the perceptual image super-resolution task of the PIRM 2018 challenge. Experimental results on five benchmarks show that our proposal achieves highly competitive performance compared with other state-of-the-art methods.
Tasks Image Super-Resolution, Super-Resolution
Published 2018-11-01
URL http://arxiv.org/abs/1811.00367v1
PDF http://arxiv.org/pdf/1811.00367v1.pdf
PWC https://paperswithcode.com/paper/bi-gans-st-for-perceptual-image-super
Repo
Framework

Variance reduction properties of the reparameterization trick

Title Variance reduction properties of the reparameterization trick
Authors Ming Xu, Matias Quiroz, Robert Kohn, Scott A. Sisson
Abstract The reparameterization trick is widely used in variational inference as it yields more accurate estimates of the gradient of the variational objective than alternative approaches such as the score function method. Although there is overwhelming empirical evidence in the literature showing its success, there is relatively little research exploring why the reparameterization trick is so effective. We explore this under the idealized assumptions that the variational approximation is a mean-field Gaussian density and that the log of the joint density of the model parameters and the data is a quadratic function that depends on the variational mean. From this, we show that the marginal variances of the reparameterization gradient estimator are smaller than those of the score function gradient estimator. We apply the result of our idealized analysis to real-world examples.
Tasks
Published 2018-09-27
URL http://arxiv.org/abs/1809.10330v3
PDF http://arxiv.org/pdf/1809.10330v3.pdf
PWC https://paperswithcode.com/paper/variance-reduction-properties-of-the
Repo
Framework
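
A small Monte Carlo experiment shows the variance gap the abstract describes. It is a toy sketch under assumptions made here, not the paper's analysis: a one-dimensional Gaussian approximation $q = N(\mu, \sigma^2)$, a quadratic log joint $-\tfrac{1}{2}(z-a)^2$ with $a = 0$, single-sample estimators, and the gradient taken with respect to the variational mean only. All function names are illustrative.

```python
import random
import statistics


def log_joint(z, a=0.0):
    """Toy quadratic log joint density (up to an additive constant)."""
    return -0.5 * (z - a) ** 2


def grad_log_joint(z, a=0.0):
    """Derivative of log_joint with respect to z."""
    return -(z - a)


def gradient_samples(mu, sigma, n_samples, seed=0):
    """Draw single-sample estimates of d/dmu E_q[log_joint(z)] with both estimators."""
    rng = random.Random(seed)
    reparam, score = [], []
    for _ in range(n_samples):
        eps = rng.gauss(0.0, 1.0)
        z = mu + sigma * eps  # reparameterized draw from q
        # Reparameterization estimator: differentiate through z = mu + sigma * eps.
        reparam.append(grad_log_joint(z))
        # Score function estimator: f(z) * d/dmu log q(z) = f(z) * (z - mu) / sigma^2.
        score.append(log_joint(z) * (z - mu) / sigma ** 2)
    return reparam, score


if __name__ == "__main__":
    rep, sc = gradient_samples(mu=1.0, sigma=1.0, n_samples=20000)
    print("reparam mean/var:", statistics.fmean(rep), statistics.variance(rep))
    print("score   mean/var:", statistics.fmean(sc), statistics.variance(sc))
```

Both estimators target the same gradient, $-(\mu - a)$, but in this setup the score function estimates spread much more widely around it than the reparameterization estimates do.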

Efficient Large-Scale Multi-Modal Classification

Title Efficient Large-Scale Multi-Modal Classification
Authors D. Kiela, E. Grave, A. Joulin, T. Mikolov
Abstract While the incipient internet was largely text-based, the modern digital world is becoming increasingly multi-modal. Here, we examine multi-modal classification where one modality is discrete, e.g. text, and the other is continuous, e.g. visual representations transferred from a convolutional neural network. In particular, we focus on scenarios where we have to be able to classify large quantities of data quickly. We investigate various methods for performing multi-modal fusion and analyze their trade-offs in terms of classification accuracy and computational efficiency. Our findings indicate that the inclusion of continuous information improves performance over text-only on a range of multi-modal classification tasks, even with simple fusion methods. In addition, we experiment with discretizing the continuous features in order to speed up and simplify the fusion process even further. Our results show that fusion with discretized features outperforms text-only classification, at a fraction of the computational cost of full multi-modal fusion, with the additional benefit of improved interpretability.
Tasks
Published 2018-02-06
URL http://arxiv.org/abs/1802.02892v1
PDF http://arxiv.org/pdf/1802.02892v1.pdf
PWC https://paperswithcode.com/paper/efficient-large-scale-multi-modal
Repo
Framework