Paper Group ANR 373
Application of Deep Learning in Fundus Image Processing for Ophthalmic Diagnosis – A Review
Title | Application of Deep Learning in Fundus Image Processing for Ophthalmic Diagnosis – A Review |
Authors | Sourya Sengupta, Amitojdeep Singh, Henry A. Leopold, Tanmay Gulati, Vasudevan Lakshminarayanan |
Abstract | An overview of the applications of deep learning in ophthalmic diagnosis using retinal fundus images is presented. We also review various retinal image datasets that can be used for deep learning purposes. Applications of deep learning for segmentation of optic disk, blood vessels and retinal layer as well as detection of lesions are reviewed. Recent deep learning models for classification of diseases such as age-related macular degeneration, glaucoma, diabetic macular edema and diabetic retinopathy are also reported. |
Tasks | |
Published | 2018-12-09 |
URL | https://arxiv.org/abs/1812.07101v3 |
https://arxiv.org/pdf/1812.07101v3.pdf | |
PWC | https://paperswithcode.com/paper/ophthalmic-diagnosis-and-deep-learning-a |
Repo | |
Framework | |
Face Hallucination Revisited: An Exploratory Study on Dataset Bias
Title | Face Hallucination Revisited: An Exploratory Study on Dataset Bias |
Authors | Klemen Grm, Martin Pernuš, Leo Cluzel, Walter Scheirer, Simon Dobrišek, Vitomir Štruc |
Abstract | Contemporary face hallucination (FH) models exhibit considerable ability to reconstruct high-resolution (HR) details from low-resolution (LR) face images. This ability is commonly learned from examples of corresponding HR-LR image pairs, created by artificially down-sampling the HR ground truth data. This down-sampling (or degradation) procedure not only defines the characteristics of the LR training data, but also determines the type of image degradations the learned FH models are eventually able to handle. If the image characteristics encountered with real-world LR images differ from the ones seen during training, FH models are still expected to perform well, but in practice may not produce the desired results. In this paper we study this problem and explore the bias introduced into FH models by the characteristics of the training data. We systematically analyze the generalization capabilities of several FH models in various scenarios where the image degradation function does not match the training setup, and conduct experiments with synthetically downgraded as well as real-life low-quality images. We make several interesting findings that provide insight into existing problems with FH models and point to future research directions. |
Tasks | Face Hallucination |
Published | 2018-12-21 |
URL | http://arxiv.org/abs/1812.09010v1 |
http://arxiv.org/pdf/1812.09010v1.pdf | |
PWC | https://paperswithcode.com/paper/face-hallucination-revisited-an-exploratory |
Repo | |
Framework | |
Joint Face Hallucination and Deblurring via Structure Generation and Detail Enhancement
Title | Joint Face Hallucination and Deblurring via Structure Generation and Detail Enhancement |
Authors | Yibing Song, Jiawei Zhang, Lijun Gong, Shengfeng He, Linchao Bao, Jinshan Pan, Qingxiong Yang, Ming-Hsuan Yang |
Abstract | We address the problem of restoring a high-resolution face image from a blurry low-resolution input. This problem is difficult as super-resolution and deblurring need to be tackled simultaneously. Moreover, existing algorithms cannot handle face images well as low-resolution face images do not have much texture which is especially critical for deblurring. In this paper, we propose an effective algorithm by utilizing the domain-specific knowledge of human faces to recover high-quality faces. We first propose a facial component guided deep Convolutional Neural Network (CNN) to restore a coarse face image, which is denoted as the base image where the facial component is automatically generated from the input face image. However, the CNN based method cannot handle image details well. We further develop a novel exemplar-based detail enhancement algorithm via facial component matching. Extensive experiments show that the proposed method outperforms the state-of-the-art algorithms both quantitatively and qualitatively. |
Tasks | Deblurring, Face Hallucination, Super-Resolution |
Published | 2018-11-22 |
URL | http://arxiv.org/abs/1811.09019v1 |
http://arxiv.org/pdf/1811.09019v1.pdf | |
PWC | https://paperswithcode.com/paper/joint-face-hallucination-and-deblurring-via |
Repo | |
Framework | |
Learning random points from geometric graphs or orderings
Title | Learning random points from geometric graphs or orderings |
Authors | Josep Diaz, Colin McDiarmid, Dieter Mitsche |
Abstract | Suppose that there is a family of $n$ random points $X_v$ for $v \in V$, independently and uniformly distributed in the square $\left[-\sqrt{n}/2,\sqrt{n}/2\right]^2$ of area $n$. We do not see these points, but learn about them in one of the following two ways. Suppose first that we are given the corresponding random geometric graph $G$, where distinct vertices $u$ and $v$ are adjacent when the Euclidean distance $d_E(X_u,X_v)$ is at most $r$. If the threshold distance $r$ satisfies $n^{3/14} \ll r \ll n^{1/2}$, then the following holds with high probability. Given the graph $G$ (without any geometric information), in polynomial time we can approximately reconstruct the hidden embedding, in the sense that, 'up to symmetries', for each vertex $v$ we find a point within distance about $r$ of $X_v$; that is, we find an embedding with 'displacement' at most about $r$. Now suppose that, instead of being given the graph $G$, we are given, for each vertex $v$, the ordering of the other vertices by increasing Euclidean distance from $v$. Then, with high probability, in polynomial time we can find an embedding with the much smaller displacement error $O(\sqrt{\log n})$. |
Tasks | |
Published | 2018-09-26 |
URL | https://arxiv.org/abs/1809.09879v2 |
https://arxiv.org/pdf/1809.09879v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-random-points-from-geometric-graphs |
Repo | |
Framework | |
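The two kinds of input described in the abstract above can be generated with a short sketch. This is only an illustration of the problem setup, not the authors' reconstruction algorithm; the function names `random_geometric_graph` and `distance_orderings` and the `seed` parameter are hypothetical, and the quadratic-time pair loop is chosen for clarity rather than speed.

```python
import math
import random


def random_geometric_graph(n, r, seed=0):
    """Sample n uniform points in a square of area n and connect close pairs."""
    rng = random.Random(seed)
    half = math.sqrt(n) / 2.0  # the square is [-sqrt(n)/2, sqrt(n)/2]^2
    points = [(rng.uniform(-half, half), rng.uniform(-half, half)) for _ in range(n)]
    adjacency = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            # Distinct vertices are adjacent when their distance is at most r.
            if math.dist(points[u], points[v]) <= r:
                adjacency[u].add(v)
                adjacency[v].add(u)
    return points, adjacency


def distance_orderings(points):
    """For each vertex, list the other vertices by increasing Euclidean distance."""
    n = len(points)
    return [
        sorted((u for u in range(n) if u != v),
               key=lambda u: math.dist(points[v], points[u]))
        for v in range(n)
    ]


points, graph = random_geometric_graph(n=50, r=2.0, seed=1)
orderings = distance_orderings(points)
```

A learner in the paper's setting would see only `graph` (first scenario) or only `orderings` (second scenario), never `points`.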
Unifying Bilateral Filtering and Adversarial Training for Robust Neural Networks
Title | Unifying Bilateral Filtering and Adversarial Training for Robust Neural Networks |
Authors | Neale Ratzlaff, Li Fuxin |
Abstract | Recent analysis of deep neural networks has revealed their vulnerability to carefully structured adversarial examples. Many effective algorithms exist to craft these adversarial examples, but performant defenses seem to be far away. In this work, we explore the use of edge-aware bilateral filtering as a projection back to the space of natural images. We show that bilateral filtering is an effective defense in multiple attack settings, where the strength of the adversary gradually increases. In the case of an adversary who has no knowledge of the defense, bilateral filtering can remove more than 90% of adversarial examples from a variety of different attacks. To evaluate against an adversary with complete knowledge of our defense, we adapt the bilateral filter as a trainable layer in a neural network and show that adding this layer makes ImageNet images significantly more robust to attacks. When trained under a framework of adversarial training, we show that the resulting model is hard to fool with even the best attack methods. |
Tasks | |
Published | 2018-04-05 |
URL | http://arxiv.org/abs/1804.01635v3 |
http://arxiv.org/pdf/1804.01635v3.pdf | |
PWC | https://paperswithcode.com/paper/unifying-bilateral-filtering-and-adversarial |
Repo | |
Framework | |
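For readers unfamiliar with the filter named in the abstract above, a plain bilateral filter can be sketched as follows. This is the textbook fixed-parameter filter on a grayscale image stored as a list of rows, not the trainable layer the authors describe; the parameter names `radius`, `sigma_space` and `sigma_range` are assumptions made for this illustration.

```python
import math


def bilateral_filter(image, radius=2, sigma_space=1.5, sigma_range=0.1):
    """Edge-aware smoothing of a 2D grayscale image given as a list of rows."""
    height, width = len(image), len(image[0])
    output = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            centre = image[y][x]
            weighted_sum = 0.0
            weight_total = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ny, nx = y + dy, x + dx
                    if not (0 <= ny < height and 0 <= nx < width):
                        continue  # ignore neighbours outside the image
                    neighbour = image[ny][nx]
                    # Spatial weight: nearby pixels count more.
                    spatial = math.exp(-(dx * dx + dy * dy) / (2 * sigma_space ** 2))
                    # Range weight: pixels with similar intensity count more,
                    # which is what keeps edges from being blurred.
                    tonal = math.exp(-((neighbour - centre) ** 2) / (2 * sigma_range ** 2))
                    weight = spatial * tonal
                    weighted_sum += weight * neighbour
                    weight_total += weight
            # weight_total >= 1 because the centre pixel always has weight 1.
            output[y][x] = weighted_sum / weight_total
    return output
```

Small perturbations within a smooth region are averaged away, while a sharp intensity step receives almost no weight across the edge and is therefore left nearly untouched.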
Emergence of Linguistic Communication from Referential Games with Symbolic and Pixel Input
Title | Emergence of Linguistic Communication from Referential Games with Symbolic and Pixel Input |
Authors | Angeliki Lazaridou, Karl Moritz Hermann, Karl Tuyls, Stephen Clark |
Abstract | The ability of algorithms to evolve or learn (compositional) communication protocols has traditionally been studied in the language evolution literature through the use of emergent communication tasks. Here we scale up this research by using contemporary deep learning methods and by training reinforcement-learning neural network agents on referential communication games. We extend previous work, in which agents were trained in symbolic environments, by developing agents which are able to learn from raw pixel data, a more challenging and realistic input representation. We find that the degree of structure found in the input data affects the nature of the emerged protocols, and thereby corroborate the hypothesis that structured compositional language is most likely to emerge when agents perceive the world as being structured. |
Tasks | |
Published | 2018-04-11 |
URL | http://arxiv.org/abs/1804.03984v1 |
http://arxiv.org/pdf/1804.03984v1.pdf | |
PWC | https://paperswithcode.com/paper/emergence-of-linguistic-communication-from |
Repo | |
Framework | |
Learning Contracting Vector Fields For Stable Imitation Learning
Title | Learning Contracting Vector Fields For Stable Imitation Learning |
Authors | Vikas Sindhwani, Stephen Tu, Mohi Khansari |
Abstract | We propose a new non-parametric framework for learning incrementally stable dynamical systems $\dot{x} = f(x)$ from a set of sampled trajectories. We construct a rich family of smooth vector fields induced by certain classes of matrix-valued kernels, whose equilibria are placed exactly at a desired set of locations and whose local contraction and curvature properties at various points can be explicitly controlled using convex optimization. With curl-free kernels, our framework may also be viewed as a mechanism to learn potential fields and gradient flows. We develop large-scale techniques using randomized kernel approximations in this context. We demonstrate our approach, called contracting vector fields (CVF), on imitation learning tasks involving complex point-to-point human handwriting motions. |
Tasks | Imitation Learning |
Published | 2018-04-13 |
URL | http://arxiv.org/abs/1804.04878v1 |
http://arxiv.org/pdf/1804.04878v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-contracting-vector-fields-for-stable |
Repo | |
Framework | |
Towards Understanding Language through Perception in Situated Human-Robot Interaction: From Word Grounding to Grammar Induction
Title | Towards Understanding Language through Perception in Situated Human-Robot Interaction: From Word Grounding to Grammar Induction |
Authors | Amir Aly, Tadahiro Taniguchi |
Abstract | Robots are widely collaborating with human users in different tasks that require high-level cognitive functions to make them able to discover the surrounding environment. A difficult challenge that we briefly highlight in this short paper is inferring the latent grammatical structure of language, which includes grounding parts of speech (e.g., verbs, nouns, adjectives, and prepositions) through visual perception, and induction of Combinatory Categorial Grammar (CCG) for phrases. This paves the way towards grounding phrases so as to make a robot able to understand human instructions appropriately during interaction. |
Tasks | |
Published | 2018-12-12 |
URL | https://arxiv.org/abs/1812.04840v3 |
https://arxiv.org/pdf/1812.04840v3.pdf | |
PWC | https://paperswithcode.com/paper/towards-understanding-language-through |
Repo | |
Framework | |
A Comprehensive Approach for Learning-based Fully-Automated Inter-slice Motion Correction for Short-Axis Cine Cardiac MR Image Stacks
Title | A Comprehensive Approach for Learning-based Fully-Automated Inter-slice Motion Correction for Short-Axis Cine Cardiac MR Image Stacks |
Authors | Giacomo Tarroni, Ozan Oktay, Matthew Sinclair, Wenjia Bai, Andreas Schuh, Hideaki Suzuki, Antonio de Marvao, Declan O’Regan, Stuart Cook, Daniel Rueckert |
Abstract | In the clinical routine, short axis (SA) cine cardiac MR (CMR) image stacks are acquired during multiple subsequent breath-holds. If the patient cannot consistently hold the breath at the same position, the acquired image stack will be affected by inter-slice respiratory motion and will not correctly represent the cardiac volume, introducing potential errors in the following analyses and visualisations. We propose an approach to automatically correct inter-slice respiratory motion in SA CMR image stacks. Our approach makes use of probabilistic segmentation maps (PSMs) of the left ventricular (LV) cavity generated with decision forests. PSMs are generated for each slice of the SA stack and rigidly registered in-plane to a target PSM. If long axis (LA) images are available, PSMs are generated for them and combined to create the target PSM; if not, the target PSM is produced from the same stack using a 3D model trained from motion-free stacks. The proposed approach was tested on a dataset of SA stacks acquired from 24 healthy subjects (for which anatomical 3D cardiac images were also available as reference) and compared to two techniques which use LA intensity images and LA segmentations as targets, respectively. The results show the accuracy and robustness of the proposed approach in motion compensation. |
Tasks | Motion Compensation |
Published | 2018-10-03 |
URL | http://arxiv.org/abs/1810.02201v1 |
http://arxiv.org/pdf/1810.02201v1.pdf | |
PWC | https://paperswithcode.com/paper/a-comprehensive-approach-for-learning-based |
Repo | |
Framework | |
A Reflectance Based Method For Shadow Detection and Removal
Title | A Reflectance Based Method For Shadow Detection and Removal |
Authors | Sri Kalyan Yarlagadda, Fengqing Zhu |
Abstract | Shadows are a common aspect of images and, when left undetected, can hinder scene understanding and visual processing. We propose a simple yet effective approach based on reflectance to detect shadows from a single image. An image is first segmented and, based on the reflectance, illumination and texture characteristics, segment pairs are identified as shadow and non-shadow pairs. The proposed method is tested on two publicly available and widely used datasets. Our method achieves higher accuracy in detecting shadows compared to previously reported methods despite requiring fewer parameters. We also show results of shadow-free images by relighting the pixels in the detected shadow regions. |
Tasks | Detecting Shadows, Scene Understanding, Shadow Detection, Shadow Detection And Removal |
Published | 2018-07-11 |
URL | http://arxiv.org/abs/1807.04352v1 |
http://arxiv.org/pdf/1807.04352v1.pdf | |
PWC | https://paperswithcode.com/paper/a-reflectance-based-method-for-shadow |
Repo | |
Framework | |
Deep Decoder: Concise Image Representations from Untrained Non-convolutional Networks
Title | Deep Decoder: Concise Image Representations from Untrained Non-convolutional Networks |
Authors | Reinhard Heckel, Paul Hand |
Abstract | Deep neural networks, in particular convolutional neural networks, have become highly effective tools for compressing images and solving inverse problems including denoising, inpainting, and reconstruction from few and noisy measurements. This success can be attributed in part to their ability to represent and generate natural images well. Contrary to classical tools such as wavelets, image-generating deep neural networks have a large number of parameters—typically a multiple of their output dimension—and need to be trained on large datasets. In this paper, we propose an untrained simple image model, called the deep decoder, which is a deep neural network that can generate natural images from very few weight parameters. The deep decoder has a simple architecture with no convolutions and fewer weight parameters than the output dimensionality. This underparameterization enables the deep decoder to compress images into a concise set of network weights, which we show is on par with wavelet-based thresholding. Further, underparameterization provides a barrier to overfitting, allowing the deep decoder to have state-of-the-art performance for denoising. The deep decoder is simple in the sense that each layer has an identical structure that consists of only one upsampling unit, pixel-wise linear combination of channels, ReLU activation, and channelwise normalization. This simplicity makes the network amenable to theoretical analysis, and it sheds light on the aspects of neural networks that enable them to form effective signal representations. |
Tasks | Denoising |
Published | 2018-10-02 |
URL | http://arxiv.org/abs/1810.03982v1 |
http://arxiv.org/pdf/1810.03982v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-decoder-concise-image-representations |
Repo | |
Framework | |
Generalized chart constraints for efficient PCFG and TAG parsing
Title | Generalized chart constraints for efficient PCFG and TAG parsing |
Authors | Stefan Grünewald, Sophie Henning, Alexander Koller |
Abstract | Chart constraints, which specify at which string positions a constituent may begin or end, have been shown to speed up chart parsers for PCFGs. We generalize chart constraints to more expressive grammar formalisms and describe a neural tagger which predicts chart constraints at very high precision. Our constraints accelerate both PCFG and TAG parsing, and combine effectively with other pruning techniques (coarse-to-fine and supertagging) for an overall speedup of two orders of magnitude, while improving accuracy. |
Tasks | |
Published | 2018-06-27 |
URL | http://arxiv.org/abs/1806.10654v1 |
http://arxiv.org/pdf/1806.10654v1.pdf | |
PWC | https://paperswithcode.com/paper/generalized-chart-constraints-for-efficient |
Repo | |
Framework | |
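How begin/end chart constraints prune a chart parser can be illustrated with a small CKY recognizer. This is a minimal sketch for a toy grammar in Chomsky normal form, with hand-written constraints standing in for the neural tagger's predictions; the function name `cky_recognize`, the `may_begin`/`may_end` encoding, and the choice to constrain only multi-word spans are all assumptions of this example rather than details taken from the paper.

```python
def cky_recognize(words, lexical, binary, start="S", may_begin=None, may_end=None):
    """CKY recognizer that skips spans ruled out by begin/end chart constraints.

    may_begin[i] says whether a multi-word constituent may start at position i;
    may_end[j] says whether one may end just before position j.
    Returns (accepted, number_of_multi_word_spans_filled).
    """
    n = len(words)
    may_begin = may_begin if may_begin is not None else [True] * n
    may_end = may_end if may_end is not None else [True] * (n + 1)
    chart = {}
    for i, word in enumerate(words):
        chart[i, i + 1] = set(lexical.get(word, ()))
    spans_filled = 0
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            if not (may_begin[i] and may_end[j]):
                continue  # pruned: no constituent may cover words[i:j]
            spans_filled += 1
            cell = set()
            for k in range(i + 1, j):
                for left in chart.get((i, k), ()):
                    for right in chart.get((k, j), ()):
                        cell |= binary.get((left, right), set())
            chart[i, j] = cell
    return start in chart.get((0, n), set()), spans_filled


# Toy grammar (hypothetical): S -> NP VP, NP -> Det N, VP -> V NP.
lexical = {"the": {"Det"}, "dog": {"N"}, "cat": {"N"}, "chased": {"V"}}
binary = {("Det", "N"): {"NP"}, ("V", "NP"): {"VP"}, ("NP", "VP"): {"S"}}
sentence = "the dog chased the cat".split()

unconstrained = cky_recognize(sentence, lexical, binary)
constrained = cky_recognize(
    sentence, lexical, binary,
    may_begin=[True, False, True, True, False],
    may_end=[False, False, True, False, False, True],
)
```

With these settings both calls accept the sentence, but the constrained run fills 4 multi-word spans instead of 10, which is the source of the speedup when the constraints are predicted accurately.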
Bi-GANs-ST for Perceptual Image Super-resolution
Title | Bi-GANs-ST for Perceptual Image Super-resolution |
Authors | Xiaotong Luo, Rong Chen, Yuan Xie, Yanyun Qu, Cuihua Li |
Abstract | Image quality measurement is a critical problem for image super-resolution (SR) algorithms. Usually, they are evaluated by some well-known objective metrics, e.g., PSNR and SSIM, but these indices cannot provide suitable results in accordance with human perception. Recently, a more reasonable perception measurement has been proposed in [1], which is also adopted by the PIRM-SR 2018 challenge. In this paper, motivated by [1], we aim to generate a high-quality SR result which balances the two indices, i.e., the perception index and root-mean-square error (RMSE). To do so, we design a new deep SR framework, dubbed Bi-GANs-ST, by integrating two complementary generative adversarial network (GAN) branches. One is memory residual SRGAN (MR-SRGAN), which emphasizes improving the objective performance, such as reducing the RMSE. The other is weight perception SRGAN (WP-SRGAN), which obtains a result that favors better subjective perception via a two-stage adversarial training mechanism. Then, to produce a final result with excellent perception scores and RMSE, we use a soft-thresholding method to merge the results generated by the two GANs. Our method performs well on the perceptual image super-resolution task of the PIRM 2018 challenge. Experimental results on five benchmarks show that our proposal achieves highly competitive performance compared with other state-of-the-art methods. |
Tasks | Image Super-Resolution, Super-Resolution |
Published | 2018-11-01 |
URL | http://arxiv.org/abs/1811.00367v1 |
http://arxiv.org/pdf/1811.00367v1.pdf | |
PWC | https://paperswithcode.com/paper/bi-gans-st-for-perceptual-image-super |
Repo | |
Framework | |
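The objective metrics mentioned in the abstract above, RMSE and PSNR, follow standard definitions and can be computed as in the sketch below. It assumes grayscale images stored as equally sized lists of rows and an 8-bit peak value of 255; the perception index from [1] is not covered here.

```python
import math


def rmse(reference, estimate):
    """Root-mean-square error between two equally sized grayscale images."""
    diffs = [
        (r - e) ** 2
        for ref_row, est_row in zip(reference, estimate)
        for r, e in zip(ref_row, est_row)
    ]
    return math.sqrt(sum(diffs) / len(diffs))


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    error = rmse(reference, estimate)
    if error == 0:
        return math.inf
    return 20.0 * math.log10(max_value / error)
```

Lower RMSE always means higher PSNR, since the latter is a monotone function of the former; neither metric measures perceptual quality, which is the gap the abstract points to.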
Variance reduction properties of the reparameterization trick
Title | Variance reduction properties of the reparameterization trick |
Authors | Ming Xu, Matias Quiroz, Robert Kohn, Scott A. Sisson |
Abstract | The reparameterization trick is widely used in variational inference as it yields more accurate estimates of the gradient of the variational objective than alternative approaches such as the score function method. Although there is overwhelming empirical evidence in the literature showing its success, there is relatively little research exploring why the reparameterization trick is so effective. We explore this under the idealized assumptions that the variational approximation is a mean-field Gaussian density and that the log of the joint density of the model parameters and the data is a quadratic function that depends on the variational mean. From this, we show that the marginal variances of the reparameterization gradient estimator are smaller than those of the score function gradient estimator. We apply the result of our idealized analysis to real-world examples. |
Tasks | |
Published | 2018-09-27 |
URL | http://arxiv.org/abs/1809.10330v3 |
http://arxiv.org/pdf/1809.10330v3.pdf | |
PWC | https://paperswithcode.com/paper/variance-reduction-properties-of-the |
Repo | |
Framework | |
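The variance gap described in the abstract above can be observed numerically on a one-dimensional toy problem. The sketch assumes a Gaussian $q(z) = N(\mu, \sigma^2)$ and a quadratic $f(z) = -\tfrac{1}{2}(z - a)^2$ standing in for the log joint density, and compares per-sample estimates of $\partial_\mu \mathbb{E}_q[f(z)]$; the function name and all parameter values are chosen for illustration and are not taken from the paper.

```python
import random
import statistics


def gradient_estimates(mu, sigma, target, num_samples, seed=0):
    """Per-sample estimates of d/dmu E_{z ~ N(mu, sigma^2)}[f(z)],
    with f(z) = -0.5 * (z - target)**2 (a toy quadratic log-density)."""
    rng = random.Random(seed)
    reparam, score = [], []
    for _ in range(num_samples):
        eps = rng.gauss(0.0, 1.0)
        z = mu + sigma * eps                      # reparameterized sample
        f = -0.5 * (z - target) ** 2
        reparam.append(-(z - target))             # f'(z) * dz/dmu, with dz/dmu = 1
        score.append(f * (z - mu) / sigma ** 2)   # f(z) * d/dmu log q(z)
    return reparam, score


reparam, score = gradient_estimates(mu=1.0, sigma=1.0, target=0.0, num_samples=20000)
print("exact gradient:", -(1.0 - 0.0))
print("reparameterization: mean %.3f, variance %.3f"
      % (statistics.fmean(reparam), statistics.variance(reparam)))
print("score function:     mean %.3f, variance %.3f"
      % (statistics.fmean(score), statistics.variance(score)))
```

Both estimators are unbiased for the exact gradient of -1. For these particular settings the per-sample variances work out analytically to 1 for the reparameterization estimator and 7.5 for the score function estimator, and the sample variances printed by the script should land close to those values.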
Efficient Large-Scale Multi-Modal Classification
Title | Efficient Large-Scale Multi-Modal Classification |
Authors | D. Kiela, E. Grave, A. Joulin, T. Mikolov |
Abstract | While the incipient internet was largely text-based, the modern digital world is becoming increasingly multi-modal. Here, we examine multi-modal classification where one modality is discrete, e.g. text, and the other is continuous, e.g. visual representations transferred from a convolutional neural network. In particular, we focus on scenarios where we have to be able to classify large quantities of data quickly. We investigate various methods for performing multi-modal fusion and analyze their trade-offs in terms of classification accuracy and computational efficiency. Our findings indicate that the inclusion of continuous information improves performance over text-only on a range of multi-modal classification tasks, even with simple fusion methods. In addition, we experiment with discretizing the continuous features in order to speed up and simplify the fusion process even further. Our results show that fusion with discretized features outperforms text-only classification, at a fraction of the computational cost of full multi-modal fusion, with the additional benefit of improved interpretability. |
Tasks | |
Published | 2018-02-06 |
URL | http://arxiv.org/abs/1802.02892v1 |
http://arxiv.org/pdf/1802.02892v1.pdf | |
PWC | https://paperswithcode.com/paper/efficient-large-scale-multi-modal |
Repo | |
Framework | |