July 30, 2019

3518 words 17 mins read

Paper Group AWR 4

Empirically Analyzing the Effect of Dataset Biases on Deep Face Recognition Systems. Low-Shot Learning with Imprinted Weights. InfiniTAM v3: A Framework for Large-Scale 3D Reconstruction with Loop Closure. Adversarial Variational Bayes: Unifying Variational Autoencoders and Generative Adversarial Networks. DeepNAT: Deep Convolutional Neural Network …

Empirically Analyzing the Effect of Dataset Biases on Deep Face Recognition Systems

Title Empirically Analyzing the Effect of Dataset Biases on Deep Face Recognition Systems
Authors Adam Kortylewski, Bernhard Egger, Andreas Schneider, Thomas Gerig, Andreas Morel-Forster, Thomas Vetter
Abstract It is unknown what kinds of biases modern in-the-wild face datasets have because of their lack of annotation. A direct consequence of this is that total recognition rates alone provide only limited insight into the generalization ability of Deep Convolutional Neural Networks (DCNNs). We propose to empirically study the effect of different types of dataset biases on the generalization ability of DCNNs. Using synthetically generated face images, we study the face recognition rate as a function of interpretable parameters such as face pose and light. The proposed method allows valuable details about the generalization performance of different DCNN architectures to be observed and compared. In our experiments, we find that: 1) Indeed, dataset bias has a significant influence on the generalization performance of DCNNs. 2) DCNNs can generalize surprisingly well to unseen illumination conditions and large sampling gaps in the pose variation. 3) Using the presented methodology, we reveal that the VGG-16 architecture outperforms the AlexNet architecture at face recognition tasks because it generalizes much better to unseen face poses, although it has significantly more parameters. 4) We uncover a main limitation of current DCNN architectures: the difficulty of generalizing when different identities do not share the same pose variation. 5) We demonstrate that our findings on synthetic data also apply when learning from real-world data. Our face image generator is publicly available to enable the community to benchmark other DCNN architectures.
Tasks Face Recognition
Published 2017-12-05
URL http://arxiv.org/abs/1712.01619v4
PDF http://arxiv.org/pdf/1712.01619v4.pdf
PWC https://paperswithcode.com/paper/empirically-analyzing-the-effect-of-dataset
Repo https://github.com/unibas-gravis/parametric-face-image-generator
Framework none
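
As a worked illustration of the paper's protocol, the sketch below bins recognition accuracy by one interpretable generation parameter (yaw angle). The function and variable names are ours, purely illustrative; the synthetic images themselves would come from the authors' generator linked above.

```python
import numpy as np

def accuracy_by_pose(yaw_deg, y_true, y_pred, bin_width=15):
    """Bin face-recognition accuracy by the yaw angle (in degrees)
    used to render each synthetic face."""
    bins = np.arange(-90, 91, bin_width)
    idx = np.digitize(yaw_deg, bins)
    correct = np.asarray(y_true) == np.asarray(y_pred)
    return {
        (bins[i - 1], bins[i]): correct[idx == i].mean()
        for i in range(1, len(bins))
        if np.any(idx == i)                 # skip empty pose bins
    }
```

Plotting these per-bin accuracies against the training pose distribution is what reveals the sampling-gap effects described in findings 2) and 4).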

Low-Shot Learning with Imprinted Weights

Title Low-Shot Learning with Imprinted Weights
Authors Hang Qi, Matthew Brown, David G. Lowe
Abstract Human vision is able to immediately recognize novel visual categories after seeing just one or a few training examples. We describe how to add a similar capability to ConvNet classifiers by directly setting the final layer weights from novel training examples during low-shot learning. We call this process weight imprinting as it directly sets weights for a new category based on an appropriately scaled copy of the embedding layer activations for that training example. The imprinting process provides a valuable complement to training with stochastic gradient descent, as it immediately provides good classification performance and an initialization for any further fine-tuning. We show how this imprinting process is related to proxy-based embeddings. However, it differs in that only a single imprinted weight vector is learned for each novel category, rather than relying on a nearest-neighbor distance to training instances as typically used with embedding methods. Our experiments show that averaging imprinted weights provides better generalization than using nearest-neighbor instance embeddings.
Tasks
Published 2017-12-19
URL http://arxiv.org/abs/1712.07136v2
PDF http://arxiv.org/pdf/1712.07136v2.pdf
PWC https://paperswithcode.com/paper/low-shot-learning-with-imprinted-weights
Repo https://github.com/Kao1126/edgetpu-TransferLearning
Framework none
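
The imprinting step itself is compact enough to sketch. Below is a minimal PyTorch version under the paper's setup: a bias-free final nn.Linear acting as a cosine classifier over L2-normalized embeddings. Helper names are ours.

```python
import torch
import torch.nn.functional as F

def imprint_weight(classifier, embeddings):
    """Imprint a new class: L2-normalize the embeddings of the few-shot
    examples, average them, and append the re-normalized result as a new
    row of the final cosine-similarity layer's weight matrix."""
    with torch.no_grad():
        w_new = F.normalize(embeddings, dim=1).mean(dim=0, keepdim=True)
        w_new = F.normalize(w_new, dim=1)       # renormalize the average
        classifier.weight = torch.nn.Parameter(
            torch.cat([classifier.weight.data, w_new], dim=0))
    return classifier
```

Because the imprinted row is just a normalized average of embeddings, it serves both as an immediate classifier for the novel class and as an initialization for subsequent fine-tuning.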

InfiniTAM v3: A Framework for Large-Scale 3D Reconstruction with Loop Closure

Title InfiniTAM v3: A Framework for Large-Scale 3D Reconstruction with Loop Closure
Authors Victor Adrian Prisacariu, Olaf Kähler, Stuart Golodetz, Michael Sapienza, Tommaso Cavallari, Philip H S Torr, David W Murray
Abstract Volumetric models have become a popular representation for 3D scenes in recent years. One breakthrough leading to their popularity was KinectFusion, which focuses on 3D reconstruction using RGB-D sensors. However, monocular SLAM has since also been tackled with very similar approaches. Representing the reconstruction volumetrically as a TSDF leads to most of the simplicity and efficiency that can be achieved with GPU implementations of these systems. However, this representation is memory-intensive and limits applicability to small-scale reconstructions. Several avenues have been explored to overcome this. With the aim of summarizing them and providing for a fast, flexible 3D reconstruction pipeline, we propose a new, unifying framework called InfiniTAM. The idea is that steps like camera tracking, scene representation and integration of new data can easily be replaced and adapted to the user’s needs. This report describes the technical implementation details of InfiniTAM v3, the third version of our InfiniTAM system. We have added various new features, as well as making numerous enhancements to the low-level code that significantly improve our camera tracking performance. The new features that we expect to be of most interest are (i) a robust camera tracking module; (ii) an implementation of Glocker et al.’s keyframe-based random ferns camera relocaliser; (iii) a novel approach to globally-consistent TSDF-based reconstruction, based on dividing the scene into rigid submaps and optimising the relative poses between them; and (iv) an implementation of Keller et al.’s surfel-based reconstruction approach.
Tasks 3D Reconstruction, Simultaneous Localization and Mapping
Published 2017-08-02
URL http://arxiv.org/abs/1708.00783v1
PDF http://arxiv.org/pdf/1708.00783v1.pdf
PWC https://paperswithcode.com/paper/infinitam-v3-a-framework-for-large-scale-3d
Repo https://github.com/victorprad/InfiniTAM
Framework none
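
For readers new to the representation, the following is a schematic numpy sketch of the KinectFusion-style TSDF integration step that volumetric systems like InfiniTAM build on. The real pipeline runs this per voxel on the GPU over hashed blocks and submaps; all names here are illustrative.

```python
import numpy as np

def integrate_tsdf(tsdf, weights, depth, K, world_to_cam, voxel_centers, mu=0.02):
    """Fuse one depth frame into a TSDF volume by weighted running average.

    tsdf, weights : flat per-voxel arrays; depth : H x W image (metres)
    K             : 3 x 3 intrinsics; world_to_cam : 4 x 4 pose
    voxel_centers : N x 3 world coordinates; mu : truncation band (metres)
    """
    pts = world_to_cam[:3, :3] @ voxel_centers.T + world_to_cam[:3, 3:4]
    z = pts[2]
    uv = (K @ pts)[:2] / np.where(z > 0, z, 1.0)       # guard divide-by-zero
    u, v = uv.astype(int)
    H, W = depth.shape
    valid = (z > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    sdf = np.zeros_like(z)
    sdf[valid] = depth[v[valid], u[valid]] - z[valid]  # distance along the ray
    upd = valid & (sdf > -mu)                          # skip occluded voxels
    d = np.clip(sdf[upd] / mu, -1.0, 1.0)              # truncate to [-1, 1]
    tsdf[upd] = (tsdf[upd] * weights[upd] + d) / (weights[upd] + 1.0)
    weights[upd] += 1.0
```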

Adversarial Variational Bayes: Unifying Variational Autoencoders and Generative Adversarial Networks

Title Adversarial Variational Bayes: Unifying Variational Autoencoders and Generative Adversarial Networks
Authors Lars Mescheder, Sebastian Nowozin, Andreas Geiger
Abstract Variational Autoencoders (VAEs) are expressive latent variable models that can be used to learn complex probability distributions from training data. However, the quality of the resulting model crucially relies on the expressiveness of the inference model. We introduce Adversarial Variational Bayes (AVB), a technique for training Variational Autoencoders with arbitrarily expressive inference models. We achieve this by introducing an auxiliary discriminative network that allows us to rephrase the maximum-likelihood problem as a two-player game, hence establishing a principled connection between VAEs and Generative Adversarial Networks (GANs). We show that in the nonparametric limit our method yields an exact maximum-likelihood assignment for the parameters of the generative model, as well as the exact posterior distribution over the latent variables given an observation. Contrary to competing approaches which combine VAEs with GANs, our approach has a clear theoretical justification, retains most advantages of standard Variational Autoencoders and is easy to implement.
Tasks Latent Variable Models
Published 2017-01-17
URL http://arxiv.org/abs/1701.04722v4
PDF http://arxiv.org/pdf/1701.04722v4.pdf
PWC https://paperswithcode.com/paper/adversarial-variational-bayes-unifying
Repo https://github.com/LMescheder/AdversarialVariationalBayes
Framework tf
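
The two-player game is easy to state in code. A minimal PyTorch sketch, assuming encoder(x, eps) is an implicit-posterior network, decoder(z) outputs logits, and discriminator(x, z) returns one logit per pair (all assumed interfaces):

```python
import torch
import torch.nn.functional as F

def avb_losses(encoder, decoder, discriminator, x, z_dim):
    """One AVB step (sketch). T(x, z) is trained to tell apart pairs
    (x, z ~ q(z|x)) from (x, z ~ p(z)); at optimality
    T*(x, z) = log q(z|x) - log p(z), so it stands in for the
    intractable KL term of the ELBO."""
    eps = torch.randn(x.size(0), z_dim)
    z_q = encoder(x, eps)                    # sample from the implicit posterior
    z_p = torch.randn(x.size(0), z_dim)      # sample from the prior

    t_q, t_p = discriminator(x, z_q), discriminator(x, z_p)
    d_loss = F.binary_cross_entropy_with_logits(t_q, torch.ones_like(t_q)) \
           + F.binary_cross_entropy_with_logits(t_p, torch.zeros_like(t_p))

    recon = F.binary_cross_entropy_with_logits(decoder(z_q), x)
    vae_loss = recon + t_q.mean()            # T(x, z_q) approximates the KL term
    return vae_loss, d_loss
```

In practice vae_loss and d_loss are minimized with separate optimizers over the encoder/decoder and the discriminator, so the t_q.mean() term trains the inference model through z_q but does not update the discriminator.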

DeepNAT: Deep Convolutional Neural Network for Segmenting Neuroanatomy

Title DeepNAT: Deep Convolutional Neural Network for Segmenting Neuroanatomy
Authors Christian Wachinger, Martin Reuter, Tassilo Klein
Abstract We introduce DeepNAT, a 3D Deep convolutional neural network for the automatic segmentation of NeuroAnaTomy in T1-weighted magnetic resonance images. DeepNAT is an end-to-end learning-based approach to brain segmentation that jointly learns an abstract feature representation and a multi-class classification. We propose a 3D patch-based approach, where we predict not only the center voxel of the patch but also its neighbors, which is formulated as multi-task learning. To address the class imbalance problem, we arrange two networks hierarchically, where the first one separates foreground from background, and the second one identifies 25 brain structures on the foreground. Since patches lack spatial context, we augment them with coordinates. To this end, we introduce a novel intrinsic parameterization of the brain volume, formed by eigenfunctions of the Laplace-Beltrami operator. As network architecture, we use three convolutional layers with pooling, batch normalization, and non-linearities, followed by fully connected layers with dropout. The final segmentation is inferred from the probabilistic output of the network with a 3D fully connected conditional random field, which ensures label agreement between close voxels. The roughly 2.7 million parameters in the network are learned with stochastic gradient descent. Our results show that DeepNAT compares favorably to state-of-the-art methods. Finally, the purely learning-based method may have a high potential for the adaptation to young, old, or diseased brains by fine-tuning the pre-trained network with a small training sample on the target application, where the availability of larger datasets with manual annotations may boost the overall segmentation accuracy in the future.
Tasks Brain Segmentation, Multi-Task Learning
Published 2017-02-27
URL http://arxiv.org/abs/1702.08192v1
PDF http://arxiv.org/pdf/1702.08192v1.pdf
PWC https://paperswithcode.com/paper/deepnat-deep-convolutional-neural-network-for
Repo https://github.com/ai-med/DeepNAT
Framework none
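
The hierarchical two-network inference can be sketched as follows. In the actual architecture the patch and its coordinates feed separate branches of a 3D CNN; here they are simply concatenated, and fg_net / struct_net are assumed callables returning per-patch probabilities. This is a sketch of the decision logic, not the paper's implementation.

```python
import numpy as np

def hierarchical_segment(patches, coords, fg_net, struct_net, fg_thresh=0.5):
    """Two-stage inference in the spirit of DeepNAT: the first network
    separates foreground from background, the second labels the 25
    structures only where the first one fired."""
    x = np.stack([np.concatenate([p.ravel(), c])       # patch + spectral coords
                  for p, c in zip(patches, coords)])
    fg = fg_net(x) > fg_thresh                          # stage 1: foreground?
    labels = np.zeros(len(x), dtype=int)                # 0 = background
    if fg.any():
        labels[fg] = struct_net(x[fg]).argmax(axis=1) + 1   # stage 2: 1..25
    return labels
```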

Convolutional Recurrent Neural Networks for Dynamic MR Image Reconstruction

Title Convolutional Recurrent Neural Networks for Dynamic MR Image Reconstruction
Authors Chen Qin, Jo Schlemper, Jose Caballero, Anthony Price, Joseph V. Hajnal, Daniel Rueckert
Abstract Accelerating the data acquisition of dynamic magnetic resonance imaging (MRI) leads to a challenging ill-posed inverse problem, which has received great interest from both the signal processing and machine learning communities over the last decades. The key ingredient to the problem is how to exploit the temporal correlation of the MR sequence to resolve the aliasing artefact. Traditionally, this observation led to the formulation of a non-convex optimisation problem, which was solved using iterative algorithms. Recently, however, deep learning-based approaches have gained significant popularity due to their ability to solve general inverse problems. In this work, we propose a novel convolutional recurrent neural network (CRNN) architecture which reconstructs high quality cardiac MR images from highly undersampled k-space data by jointly exploiting the dependencies of the temporal sequences as well as the iterative nature of the traditional optimisation algorithms. In particular, the proposed architecture embeds the structure of the traditional iterative algorithms, efficiently modelling the recurrence of the iterative reconstruction stages by using recurrent hidden connections over such iterations. In addition, spatiotemporal dependencies are simultaneously learnt by exploiting bidirectional recurrent hidden connections across time sequences. The proposed algorithm is able to learn both the temporal dependency and the iterative reconstruction process effectively with only a very small number of parameters, while outperforming current MR reconstruction methods in terms of computational complexity, reconstruction accuracy and speed.
Tasks Image Reconstruction
Published 2017-12-05
URL http://arxiv.org/abs/1712.01751v3
PDF http://arxiv.org/pdf/1712.01751v3.pdf
PWC https://paperswithcode.com/paper/convolutional-recurrent-neural-networks-for-7
Repo https://github.com/js3611/Deep-MRI-Reconstruction
Framework pytorch
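
The interplay between recurrence over iterations and data consistency is the architectural core, and it can be sketched compactly in PyTorch. Here cnn_rnn is an assumed recurrent refinement cell and mask is the boolean k-space sampling pattern:

```python
import torch

def data_consistency(x, k0, mask):
    """Replace predicted k-space with the measured samples where available."""
    k = torch.fft.fft2(x)
    k = torch.where(mask, k0, k)
    return torch.fft.ifft2(k)

def crnn_reconstruct(cnn_rnn, x_und, k0, mask, n_iters=5):
    """Unrolled CRNN-style reconstruction (sketch): one recurrent cell is
    shared across the unrolled iterations, with hidden state h carrying
    information between them; each iteration ends in data consistency."""
    x, h = x_und, None
    for _ in range(n_iters):
        dx, h = cnn_rnn(x, h)          # recurrent refinement step
        x = data_consistency(x + dx, k0, mask)
    return x
```

The weight sharing across iterations is what keeps the parameter count so small relative to a plain cascade of independent CNNs.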

A Deep Cascade of Convolutional Neural Networks for Dynamic MR Image Reconstruction

Title A Deep Cascade of Convolutional Neural Networks for Dynamic MR Image Reconstruction
Authors Jo Schlemper, Jose Caballero, Joseph V. Hajnal, Anthony Price, Daniel Rueckert
Abstract Inspired by recent advances in deep learning, we propose a framework for reconstructing dynamic sequences of 2D cardiac magnetic resonance (MR) images from undersampled data using a deep cascade of convolutional neural networks (CNNs) to accelerate the data acquisition process. In particular, we address the case where data is acquired using aggressive Cartesian undersampling. Firstly, we show that when each 2D image frame is reconstructed independently, the proposed method outperforms state-of-the-art 2D compressed sensing approaches such as dictionary learning-based MR image reconstruction, in terms of reconstruction error and reconstruction speed. Secondly, when reconstructing the frames of the sequences jointly, we demonstrate that CNNs can learn spatio-temporal correlations efficiently by combining convolution and data sharing approaches. We show that the proposed method consistently outperforms state-of-the-art methods and is capable of preserving anatomical structure more faithfully up to 11-fold undersampling. Moreover, reconstruction is very fast: each complete dynamic sequence can be reconstructed in less than 10s and, for the 2D case, each image frame can be reconstructed in 23ms, enabling real-time applications.
Tasks Dictionary Learning, Image Reconstruction
Published 2017-04-08
URL http://arxiv.org/abs/1704.02422v2
PDF http://arxiv.org/pdf/1704.02422v2.pdf
PWC https://paperswithcode.com/paper/a-deep-cascade-of-convolutional-neural
Repo https://github.com/js3611/Deep-MRI-Reconstruction
Framework pytorch
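
The data-consistency layer interleaved between the CNN blocks is the part worth spelling out. A PyTorch sketch under the paper's Cartesian-undersampling setup, where lam plays the role of the noise-level weighting (names are ours):

```python
import torch

def dc_layer(x_cnn, k0, mask, lam=None):
    """Data-consistency layer (sketch). With noiseless data the measured
    k-space samples are copied through unchanged; with a noise level,
    sampled locations take the convex combination
    (k_cnn + lam * k0) / (1 + lam)."""
    k = torch.fft.fft2(x_cnn)
    if lam is None:
        k = torch.where(mask, k0, k)
    else:
        k = torch.where(mask, (k + lam * k0) / (1 + lam), k)
    return torch.fft.ifft2(k)
```

A cascade then alternates de-aliasing CNN blocks with dc_layer, so each subnetwork only needs to learn a residual correction that is immediately projected back onto the measurements.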

A Deep Cascade of Convolutional Neural Networks for MR Image Reconstruction

Title A Deep Cascade of Convolutional Neural Networks for MR Image Reconstruction
Authors Jo Schlemper, Jose Caballero, Joseph V. Hajnal, Anthony Price, Daniel Rueckert
Abstract The acquisition of Magnetic Resonance Imaging (MRI) is inherently slow. Inspired by recent advances in deep learning, we propose a framework for reconstructing MR images from undersampled data using a deep cascade of convolutional neural networks to accelerate the data acquisition process. We show that for Cartesian undersampling of 2D cardiac MR images, the proposed method outperforms the state-of-the-art compressed sensing approaches, such as dictionary learning-based MRI (DLMRI) reconstruction, in terms of reconstruction error, perceptual quality and reconstruction speed for both 3-fold and 6-fold undersampling. Compared to DLMRI, the proposed method roughly halves the reconstruction error, preserving anatomical structures more faithfully. Using our method, each image can be reconstructed in 23 ms, which is fast enough to enable real-time applications.
Tasks Dictionary Learning, Image Reconstruction
Published 2017-03-01
URL http://arxiv.org/abs/1703.00555v1
PDF http://arxiv.org/pdf/1703.00555v1.pdf
PWC https://paperswithcode.com/paper/a-deep-cascade-of-convolutional-neural-1
Repo https://github.com/js3611/Deep-MRI-Reconstruction
Framework pytorch

Bayesian GAN

Title Bayesian GAN
Authors Yunus Saatchi, Andrew Gordon Wilson
Abstract Generative adversarial networks (GANs) can implicitly learn rich distributions over images, audio, and data which are hard to model with an explicit likelihood. We present a practical Bayesian formulation for unsupervised and semi-supervised learning with GANs. Within this framework, we use stochastic gradient Hamiltonian Monte Carlo to marginalize the weights of the generator and discriminator networks. The resulting approach is straightforward and obtains good performance without any standard interventions such as feature matching or mini-batch discrimination. By exploring an expressive posterior over the parameters of the generator, the Bayesian GAN avoids mode-collapse, produces interpretable and diverse candidate samples, and provides state-of-the-art quantitative results for semi-supervised learning on benchmarks including SVHN, CelebA, and CIFAR-10, outperforming DCGAN, Wasserstein GANs, and DCGAN ensembles.
Tasks
Published 2017-05-26
URL http://arxiv.org/abs/1705.09558v3
PDF http://arxiv.org/pdf/1705.09558v3.pdf
PWC https://paperswithcode.com/paper/bayesian-gan
Repo https://github.com/rafa2000/Top-TensorFlow
Framework tf
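
The sampler that replaces the usual point-estimate optimizer can be sketched in a few lines. Here grads are gradients of the negative log posterior (the energy), and the hyperparameters are illustrative rather than the paper's settings:

```python
import torch

def sghmc_step(params, grads, momenta, lr=1e-4, alpha=0.01):
    """One stochastic-gradient HMC update (sketch). The injected
    Gaussian noise is what turns a momentum-based optimizer into a
    sampler from the posterior over network weights."""
    for p, g, v in zip(params, grads, momenta):
        noise = torch.randn_like(p) * (2.0 * alpha * lr) ** 0.5
        v.mul_(1.0 - alpha).add_(noise - lr * g)   # friction + gradient + noise
        p.data.add_(v)
```

Applying this update to both generator and discriminator weights, and keeping several samples per network, is what marginalizes the weights instead of committing to a single adversarial equilibrium.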

Reconstruction of three-dimensional porous media using generative adversarial neural networks

Title Reconstruction of three-dimensional porous media using generative adversarial neural networks
Authors Lukas Mosser, Olivier Dubrule, Martin J. Blunt
Abstract To evaluate the variability of multi-phase flow properties of porous media at the pore scale, it is necessary to acquire a number of representative samples of the void-solid structure. While modern X-ray computed tomography has made it possible to extract three-dimensional images of the pore space, assessment of the variability in the inherent material properties is often not experimentally feasible. We present a novel method to reconstruct the solid-void structure of porous media by applying a generative neural network that allows an implicit description of the probability distribution represented by three-dimensional image datasets. We show, by using an adversarial learning approach for neural networks, that this method of unsupervised learning is able to generate representative samples of porous media that honor their statistics. We successfully compare measures of pore morphology, such as the Euler characteristic, two-point statistics and directional single-phase permeability of synthetic realizations with the calculated properties of a bead pack, Berea sandstone, and Ketton limestone. Results show that GANs can be used to reconstruct high-resolution three-dimensional images of porous media at different scales that are representative of the morphology of the images used to train the neural network. The fully convolutional nature of the trained neural network allows the generation of large samples while maintaining computational efficiency. Compared to classical stochastic methods of image reconstruction, the implicit representation of the learned data distribution can be stored and reused to generate multiple realizations of the pore structure very rapidly.
Tasks Image Reconstruction
Published 2017-04-11
URL http://arxiv.org/abs/1704.03225v1
PDF http://arxiv.org/pdf/1704.03225v1.pdf
PWC https://paperswithcode.com/paper/reconstruction-of-three-dimensional-porous
Repo https://github.com/LukasMosser/PorousMediaGan
Framework pytorch
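
The fully convolutional property the abstract highlights is what enables arbitrarily large synthetic samples. A PyTorch sketch of such a 3D generator (channel counts and depth are illustrative, not the repository's exact configuration):

```python
import torch.nn as nn

def make_generator3d(z_ch=512, ch=64):
    """Fully convolutional 3D generator (sketch): transposed-conv
    upsampling blocks with no dense layers anywhere."""
    def block(cin, cout):
        return nn.Sequential(
            nn.ConvTranspose3d(cin, cout, 4, stride=2, padding=1, bias=False),
            nn.BatchNorm3d(cout), nn.ReLU(inplace=True))
    return nn.Sequential(
        block(z_ch, ch * 4), block(ch * 4, ch * 2), block(ch * 2, ch),
        nn.ConvTranspose3d(ch, 1, 4, stride=2, padding=1),
        nn.Tanh())                       # near-binary void/solid output
```

Since no layer fixes the spatial extent, feeding a larger latent grid at test time yields a correspondingly larger volume at no retraining cost.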

Retrospective Higher-Order Markov Processes for User Trails

Title Retrospective Higher-Order Markov Processes for User Trails
Authors Tao Wu, David Gleich
Abstract Users form information trails as they browse the web, check in with a geolocation, rate items, or consume media. A common problem is to predict what a user might do next for the purposes of guidance, recommendation, or prefetching. First-order and higher-order Markov chains are widely used methods for studying such sequences of data. First-order Markov chains are easy to estimate, but lack accuracy when history matters. Higher-order Markov chains, in contrast, have too many parameters and suffer from overfitting the training data. Fitting these parameters with regularization and smoothing only offers mild improvements. In this paper we propose the retrospective higher-order Markov process (RHOMP) as a low-parameter model for such sequences. This model is a special case of a higher-order Markov chain where the transitions depend retrospectively on a single history state instead of an arbitrary combination of history states. There are two immediate computational advantages: the number of parameters is linear in the order of the Markov chain and the model can be fit to large state spaces. Furthermore, by providing a specific structure to the higher-order chain, RHOMPs improve the model accuracy by efficiently utilizing history states without risks of overfitting the data. We demonstrate how to estimate a RHOMP from data and we demonstrate the effectiveness of our method on various real application datasets spanning geolocation data, review sequences, and business locations. The RHOMP model uniformly outperforms higher-order Markov chains, Kneser-Ney regularization, and tensor factorizations in terms of prediction accuracy.
Tasks
Published 2017-04-20
URL http://arxiv.org/abs/1704.05982v1
PDF http://arxiv.org/pdf/1704.05982v1.pdf
PWC https://paperswithcode.com/paper/retrospective-higher-order-markov-processes
Repo https://github.com/wutao27/RHOMP
Framework none
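
The retrospective structure is easiest to see as a convex combination of first-order transitions, one per lag. The snippet below sketches the model class (prediction only, not the authors' fitting code):

```python
import numpy as np

def rhomp_predict(history, P, alpha):
    """RHOMP next-state distribution (sketch): a convex combination of
    first-order transitions, each conditioned on a single history state,
    so the parameter count grows linearly with the order.

    history : most-recent-first list of state indices
    P       : list of (n x n) row-stochastic matrices, one per lag
    alpha   : mixture weights over lags, summing to 1
    """
    dist = np.zeros(P[0].shape[0])
    for a, Pi, s in zip(alpha, P, history):
        dist += a * Pi[s]                # lag i contributes via state s only
    return dist
```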

Composing Distributed Representations of Relational Patterns

Title Composing Distributed Representations of Relational Patterns
Authors Sho Takase, Naoaki Okazaki, Kentaro Inui
Abstract Learning distributed representations for relation instances is a central technique in downstream NLP applications. In order to address semantic modeling of relational patterns, this paper constructs a new dataset that provides multiple similarity ratings for every pair of relational patterns in an existing dataset. In addition, we conduct a comparative study of different encoders including additive composition, RNN, LSTM, and GRU for composing distributed representations of relational patterns. We also present Gated Additive Composition, which is an enhancement of additive composition with the gating mechanism. Experiments show that the new dataset not only enables detailed analyses of the different encoders but also provides a gauge for predicting the success of distributed representations of relational patterns in the relation classification task.
Tasks Relation Classification
Published 2017-07-23
URL http://arxiv.org/abs/1707.07265v1
PDF http://arxiv.org/pdf/1707.07265v1.pdf
PWC https://paperswithcode.com/paper/composing-distributed-representations-of
Repo https://github.com/takase/relPatSim
Framework none
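
As a rough numpy illustration of the idea (one plausible form of gating, not the paper's exact equations): additive composition, but with a learned element-wise gate deciding how much each word vector contributes.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_additive_compose(word_vecs, Wg, bg):
    """Gated additive composition (sketch of one plausible form).
    Wg, bg are assumed gate parameters; uninformative tokens are
    damped rather than summed at full weight."""
    h = np.zeros(word_vecs.shape[1])
    for x in word_vecs:
        g = sigmoid(Wg @ x + bg)        # element-wise gate in (0, 1)
        h += g * x
    return h
```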

Deep Neural Network Architectures for Modulation Classification

Title Deep Neural Network Architectures for Modulation Classification
Authors Xiaoyu Liu, Diyu Yang, Aly El Gamal
Abstract In this work, we investigate the value of employing deep learning for the task of wireless signal modulation recognition. Recently in [1], a framework has been introduced by generating a dataset using GNU radio that mimics the imperfections in a real wireless channel, and uses 10 different modulation types. Further, a convolutional neural network (CNN) architecture was developed and shown to deliver performance that exceeds that of expert-based approaches. Here, we follow the framework of [1] and find deep neural network architectures that deliver higher accuracy than the state of the art. We tested the architecture of [1] and found it to achieve an accuracy of approximately 75% in correctly recognizing the modulation type. We first tune the CNN architecture of [1] and find a design with four convolutional layers and two dense layers that gives an accuracy of approximately 83.8% at high SNR. We then develop architectures based on the recently introduced ideas of Residual Networks (ResNet [2]) and Densely Connected Networks (DenseNet [3]) to achieve high SNR accuracies of approximately 83.5% and 86.6%, respectively. Finally, we introduce a Convolutional Long Short-term Deep Neural Network (CLDNN [4]) to achieve an accuracy of approximately 88.5% at high SNR.
Tasks
Published 2017-12-01
URL http://arxiv.org/abs/1712.00443v3
PDF http://arxiv.org/pdf/1712.00443v3.pdf
PWC https://paperswithcode.com/paper/deep-neural-network-architectures-for
Repo https://github.com/dl4amc/source
Framework tf
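
For concreteness, the tuned four-conv/two-dense design mentioned in the abstract might look roughly like the PyTorch sketch below. Filter counts and kernel sizes are illustrative guesses, not the paper's exact configuration; the input is assumed to be 2 x 128 I/Q samples as in the dataset of [1].

```python
import torch.nn as nn

def make_modulation_cnn(n_classes=10):
    """Four 1-D conv layers followed by two dense layers (sketch)."""
    def conv(cin, cout):
        return nn.Sequential(nn.Conv1d(cin, cout, 3, padding=1),
                             nn.ReLU(inplace=True))
    return nn.Sequential(
        conv(2, 64), conv(64, 64), conv(64, 128), conv(128, 128),
        nn.Flatten(),
        nn.Linear(128 * 128, 128), nn.ReLU(inplace=True),
        nn.Dropout(0.5),
        nn.Linear(128, n_classes))
```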

Dealing with Integer-valued Variables in Bayesian Optimization with Gaussian Processes

Title Dealing with Integer-valued Variables in Bayesian Optimization with Gaussian Processes
Authors Eduardo C. Garrido-Merchán, Daniel Hernández-Lobato
Abstract Bayesian optimization (BO) methods are useful for optimizing functions that are expensive to evaluate, lack an analytical expression and whose evaluations can be contaminated by noise. These methods rely on a probabilistic model of the objective function, typically a Gaussian process (GP), upon which an acquisition function is built. This function guides the optimization process and measures the expected utility of performing an evaluation of the objective at a new point. GPs assume continuous input variables. When this is not the case, such as when some of the input variables take integer values, one has to introduce extra approximations. A common approach is to round the suggested variable value to the closest integer before doing the evaluation of the objective. We show that this can lead to problems in the optimization process and describe a more principled approach to account for input variables that are integer-valued. We illustrate the utility of our approach in both synthetic and real experiments; it significantly improves the results of standard BO methods on problems involving integer-valued variables.
Tasks Gaussian Processes
Published 2017-06-12
URL http://arxiv.org/abs/1706.03673v2
PDF http://arxiv.org/pdf/1706.03673v2.pdf
PWC https://paperswithcode.com/paper/dealing-with-integer-valued-variables-in
Repo https://github.com/darthdeus/master-thesis-code
Framework none
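
The paper's fix is to apply the rounding inside the covariance function rather than after the acquisition step. A numpy sketch of an RBF kernel with that transformation (names are ours):

```python
import numpy as np

def rbf_integer_kernel(X1, X2, int_dims, lengthscale=1.0):
    """RBF kernel with integer dimensions rounded *inside* the kernel,
    so the GP is constant between consecutive integers and the
    acquisition function cannot suggest meaningless fractional values."""
    X1, X2 = X1.copy(), X2.copy()
    X1[:, int_dims] = np.round(X1[:, int_dims])
    X2[:, int_dims] = np.round(X2[:, int_dims])
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)
```

Because the kernel is constant between consecutive integers, the GP posterior, and hence the acquisition function, is too, so its optimizer can no longer be misled by fictitious fractional improvements.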

Aesthetic-Driven Image Enhancement by Adversarial Learning

Title Aesthetic-Driven Image Enhancement by Adversarial Learning
Authors Yubin Deng, Chen Change Loy, Xiaoou Tang
Abstract We introduce EnhanceGAN, an adversarial learning based model that performs automatic image enhancement. Traditional image enhancement frameworks typically involve training models in a fully-supervised manner, which requires expensive annotations in the form of aligned image pairs. In contrast to these approaches, our proposed EnhanceGAN only requires weak supervision (binary labels on image aesthetic quality) and is able to learn enhancement operators for the task of aesthetic-based image enhancement. In particular, we show the effectiveness of a piecewise color enhancement module trained with weak supervision, and extend the proposed EnhanceGAN framework to learning a deep filtering-based aesthetic enhancer. The full differentiability of our image enhancement operators enables the training of EnhanceGAN in an end-to-end manner. We further demonstrate the capability of EnhanceGAN in learning aesthetic-based image cropping without any ground-truth cropping pairs. Our weakly-supervised EnhanceGAN reports competitive quantitative results on aesthetic-based color enhancement as well as automatic image cropping, and a user study confirms that our image enhancement results are on par with or even preferred over professional enhancement.
Tasks Image Cropping, Image Enhancement
Published 2017-07-17
URL http://arxiv.org/abs/1707.05251v2
PDF http://arxiv.org/pdf/1707.05251v2.pdf
PWC https://paperswithcode.com/paper/aesthetic-driven-image-enhancement-by
Repo https://github.com/dannysdeng/EnhanceGAN
Framework torch
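
As an illustration of why differentiability matters here, the sketch below implements one plausible enhancement operator in this spirit: a per-channel piecewise-linear tone curve whose segment slopes a generator could predict. This is our own illustrative parameterization, not the paper's color module.

```python
import torch

def tone_curve(img, deltas):
    """Differentiable per-channel piecewise-linear tone curve (sketch).

    img    : B x C x H x W image in [0, 1]
    deltas : B x C x K positive segment slopes predicted by a generator
    """
    B, C, K = deltas.shape
    slopes = deltas / deltas.sum(dim=2, keepdim=True)    # curve ends at 1
    x = img.unsqueeze(-1) * K                            # position in knot units
    seg = torch.clamp(x - torch.arange(K, device=img.device), 0, 1)
    return (slopes.view(B, C, 1, 1, K) * seg).sum(-1)
```

Because the output is differentiable in deltas, the adversarial aesthetic signal can be backpropagated straight into the parameter-predicting network, which is exactly what the weak supervision setup requires.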