October 21, 2019

3255 words 16 mins read

Paper Group AWR 4

Learning Free-Form Deformations for 3D Object Reconstruction. Dense 3D Object Reconstruction from a Single Depth View. Adversarial Vision Challenge. Towards a Robust Parameterization for Conditioning Facies Models Using Deep Variational Autoencoders and Ensemble Smoother. Heterogeneous Multi-output Gaussian Process Prediction. DeepDRR – A Catalyst …

Learning Free-Form Deformations for 3D Object Reconstruction

Title Learning Free-Form Deformations for 3D Object Reconstruction
Authors Dominic Jack, Jhony K. Pontes, Sridha Sridharan, Clinton Fookes, Sareh Shirazi, Frederic Maire, Anders Eriksson
Abstract Representing 3D shape in deep learning frameworks in an accurate, efficient and compact manner remains an open challenge. Most existing work addresses this issue by employing voxel-based representations. While these approaches benefit greatly from advances in computer vision by generalizing 2D convolutions to the 3D setting, they also have several considerable drawbacks. The computational complexity of voxel encodings grows cubically with the resolution, thus limiting such representations to low-resolution 3D reconstruction. In an attempt to solve this problem, point cloud representations have been proposed. Although point clouds are more efficient than voxel representations as they only cover surfaces rather than volumes, they do not encode detailed geometric information about relationships between points. In this paper, we propose a method to learn free-form deformations (FFD) for the task of 3D reconstruction from a single image. By learning to deform points sampled from a high-quality mesh, our trained model can be used to produce arbitrarily dense point clouds or meshes with fine-grained geometry. We evaluate our proposed framework on both synthetic and real-world data and achieve state-of-the-art results on point-cloud and volumetric metrics. Additionally, we qualitatively demonstrate its applicability to label transfer for 3D semantic segmentation.
Tasks 3D Object Reconstruction, 3D Reconstruction, 3D Semantic Segmentation, Object Reconstruction, Semantic Segmentation
Published 2018-03-29
URL http://arxiv.org/abs/1803.10932v1
PDF http://arxiv.org/pdf/1803.10932v1.pdf
PWC https://paperswithcode.com/paper/learning-free-form-deformations-for-3d-object
Repo https://github.com/jackd/template_ffd
Framework tf
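
To make the FFD idea concrete, here is a minimal NumPy sketch of the deformation layer the paper builds on: points sampled from a template are deformed by a lattice of control points through the trilinear Bernstein basis. The network in the paper predicts the control-point offsets; the random `dp` below is a stand-in for that prediction, and the lattice degree is an illustrative choice.

```python
import numpy as np
from math import comb

def bernstein(n, i, t):
    """Bernstein polynomial B_{i,n}(t)."""
    return comb(n, i) * (t ** i) * ((1 - t) ** (n - i))

def ffd(points, control, degree=3):
    """Deform points (N, 3) in [0, 1]^3 with a (d+1, d+1, d+1, 3) control lattice."""
    deformed = np.zeros_like(points)
    for i in range(degree + 1):
        bi = bernstein(degree, i, points[:, 0])
        for j in range(degree + 1):
            bj = bernstein(degree, j, points[:, 1])
            for k in range(degree + 1):
                bk = bernstein(degree, k, points[:, 2])
                w = (bi * bj * bk)[:, None]       # (N, 1) blending weights
                deformed += w * control[i, j, k]  # this control point's contribution
    return deformed

d = 3
axis = np.linspace(0.0, 1.0, d + 1)
grid = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1)
dp = 0.05 * np.random.randn(*grid.shape)   # stand-in for predicted offsets
pts = np.random.rand(1024, 3)              # points sampled from a template mesh
out = ffd(pts, grid + dp, degree=d)        # deformed point cloud, same (N, 3)
```

Since only the (d+1)^3 control points are predicted, the same deformation applies to any number of sampled points, which is what allows arbitrarily dense point-cloud or mesh outputs.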

Dense 3D Object Reconstruction from a Single Depth View

Title Dense 3D Object Reconstruction from a Single Depth View
Authors Bo Yang, Stefano Rosa, Andrew Markham, Niki Trigoni, Hongkai Wen
Abstract In this paper, we propose a novel approach, 3D-RecGAN++, which reconstructs the complete 3D structure of a given object from a single arbitrary depth view using generative adversarial networks. Unlike existing work, which typically requires multiple views of the same object or class labels to recover the full 3D geometry, the proposed 3D-RecGAN++ takes only the voxel grid representation of a depth view of the object as input, and is able to generate the complete 3D occupancy grid at a high resolution of 256^3 by recovering the occluded/missing regions. The key idea is to combine the generative capabilities of autoencoders with the conditional Generative Adversarial Network (GAN) framework to infer accurate and fine-grained 3D structures of objects in high-dimensional voxel space. Extensive experiments on large synthetic datasets and real-world Kinect datasets show that the proposed 3D-RecGAN++ significantly outperforms the state of the art in single-view 3D object reconstruction, and is able to reconstruct unseen types of objects.
Tasks 3D Object Reconstruction, Object Reconstruction
Published 2018-02-01
URL http://arxiv.org/abs/1802.00411v2
PDF http://arxiv.org/pdf/1802.00411v2.pdf
PWC https://paperswithcode.com/paper/dense-3d-object-reconstruction-from-a-single
Repo https://github.com/Yang7879/3D-RecGAN-extended
Framework tf
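
A toy PyTorch sketch of the architectural pattern described: an encoder-decoder with skip connections maps a partial voxel grid to a completed occupancy grid, while a discriminator conditioned on the input judges (input, output) pairs. The resolutions, channel widths, and layer counts below are illustrative stand-ins, far below the paper's 256^3 output.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv3d(1, 16, 4, 2, 1), nn.LeakyReLU(0.2))   # 32 -> 16
        self.enc2 = nn.Sequential(nn.Conv3d(16, 32, 4, 2, 1), nn.LeakyReLU(0.2))  # 16 -> 8
        self.dec1 = nn.Sequential(nn.ConvTranspose3d(32, 16, 4, 2, 1), nn.ReLU()) # 8 -> 16
        self.dec2 = nn.ConvTranspose3d(32, 1, 4, 2, 1)                            # 16 -> 32

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(e1)
        d1 = torch.cat([self.dec1(e2), e1], dim=1)  # U-Net style skip connection
        return torch.sigmoid(self.dec2(d1))         # per-voxel occupancy in [0, 1]

# The critic sees input and output stacked channel-wise, i.e. it is conditional.
disc = nn.Sequential(
    nn.Conv3d(2, 16, 4, 2, 1), nn.LeakyReLU(0.2),
    nn.Conv3d(16, 1, 4, 2, 1), nn.Flatten(), nn.LazyLinear(1))

gen = Generator()
partial = torch.rand(2, 1, 32, 32, 32)              # voxelized single depth view
full = gen(partial)                                 # completed occupancy grid
score = disc(torch.cat([partial, full], dim=1))     # conditional GAN critic score
```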

Adversarial Vision Challenge

Title Adversarial Vision Challenge
Authors Wieland Brendel, Jonas Rauber, Alexey Kurakin, Nicolas Papernot, Behar Veliqi, Marcel Salathé, Sharada P. Mohanty, Matthias Bethge
Abstract The NIPS 2018 Adversarial Vision Challenge is a competition to facilitate measurable progress towards robust machine vision models and more generally applicable adversarial attacks. This document is an updated version of our competition proposal, which was accepted in the competition track of the 32nd Conference on Neural Information Processing Systems (NIPS 2018).
Tasks
Published 2018-08-06
URL http://arxiv.org/abs/1808.01976v2
PDF http://arxiv.org/pdf/1808.01976v2.pdf
PWC https://paperswithcode.com/paper/adversarial-vision-challenge
Repo https://github.com/paperblack/AVC-18
Framework none

Towards a Robust Parameterization for Conditioning Facies Models Using Deep Variational Autoencoders and Ensemble Smoother

Title Towards a Robust Parameterization for Conditioning Facies Models Using Deep Variational Autoencoders and Ensemble Smoother
Authors Smith W. A. Canchumuni, Alexandre A. Emerick, Marco Aurélio C. Pacheco
Abstract The literature on history matching is vast, and despite the impressive number of methods proposed and the significant progress reported in the last decade, conditioning reservoir models to dynamic data is still a challenging task. Ensemble-based methods are among the most successful and efficient techniques currently available for history matching. These methods are usually able to achieve reasonable data matches, especially if an iterative formulation is employed. However, they sometimes fail to preserve the geological realism of the model, which is particularly evident in reservoirs with complex facies distributions. This occurs mainly because of the Gaussian assumptions inherent in these methods. This fact has encouraged intense research activity to develop parameterizations for facies history matching. Despite the large number of publications, the development of robust parameterizations for facies remains an open problem. Deep learning techniques have been delivering impressive results in a number of different areas, and the first applications to data assimilation in geoscience have started to appear in the literature. The present paper reports the current results of our investigations on the use of deep neural networks towards the construction of a continuous parameterization of facies which can be used for data assimilation with ensemble methods. Specifically, we use a convolutional variational autoencoder and the ensemble smoother with multiple data assimilation. We tested the parameterization in three synthetic history-matching problems with channelized facies. We focus on this type of facies because they are among the most challenging to preserve after the assimilation of data. The parameterization showed promising results, outperforming previous methods and generating well-defined channelized facies.
Tasks
Published 2018-12-17
URL http://arxiv.org/abs/1812.06900v1
PDF http://arxiv.org/pdf/1812.06900v1.pdf
PWC https://paperswithcode.com/paper/towards-a-robust-parameterization-for
Repo https://github.com/smith31t/GeoFacies_DL
Framework none
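
The workflow can be sketched as follows: a pre-trained convolutional VAE supplies a continuous latent parameterization of facies, and ES-MDA (ensemble smoother with multiple data assimilation) updates an ensemble of latent vectors against observed data. In this NumPy sketch the forward model `g`, the problem sizes, and the inflation schedule are placeholder assumptions; the decoder that maps updated latents back to facies is omitted.

```python
import numpy as np

def esmda_update(Z, d_obs, g, C_d, alpha):
    """One ES-MDA step on a latent ensemble Z of shape (n_ens, n_latent)."""
    D = np.array([g(z) for z in Z])                 # predicted data (n_ens, n_obs)
    Zc, Dc = Z - Z.mean(0), D - D.mean(0)
    C_zd = Zc.T @ Dc / (len(Z) - 1)                 # latent/data cross-covariance
    C_dd = Dc.T @ Dc / (len(Z) - 1)                 # data auto-covariance
    K = C_zd @ np.linalg.inv(C_dd + alpha * C_d)    # Kalman-like gain
    noise = np.random.multivariate_normal(np.zeros(len(d_obs)), alpha * C_d, len(Z))
    return Z + (d_obs + noise - D) @ K.T            # updated latent ensemble

n_ens, n_latent, n_obs = 100, 20, 5                 # hypothetical problem sizes
g = lambda z: z[:n_obs] ** 2                        # stand-in for simulator(decode(z))
Z = np.random.randn(n_ens, n_latent)                # encoded prior realizations
d_obs = np.ones(n_obs)
for alpha in [4.0, 4.0, 4.0, 4.0]:                  # inflation factors, sum(1/a) = 1
    Z = esmda_update(Z, d_obs, g, 0.01 * np.eye(n_obs), alpha)
# decode(Z) would map the updated latents back to facies realizations
```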

Heterogeneous Multi-output Gaussian Process Prediction

Title Heterogeneous Multi-output Gaussian Process Prediction
Authors Pablo Moreno-Muñoz, Antonio Artés-Rodríguez, Mauricio A. Álvarez
Abstract We present a novel extension of multi-output Gaussian processes for handling heterogeneous outputs. We assume that each output has its own likelihood function and use a vector-valued Gaussian process prior to jointly model the parameters in all likelihoods as latent functions. Our multi-output Gaussian process uses a covariance function with a linear model of coregionalisation form. Assuming conditional independence across the underlying latent functions together with an inducing variable framework, we are able to obtain tractable variational bounds amenable to stochastic variational inference. We illustrate the performance of the model on synthetic data and two real datasets: a human behavioral study and a demographic high-dimensional dataset.
Tasks Gaussian Processes
Published 2018-05-19
URL http://arxiv.org/abs/1805.07633v2
PDF http://arxiv.org/pdf/1805.07633v2.pdf
PWC https://paperswithcode.com/paper/heterogeneous-multi-output-gaussian-process
Repo https://github.com/pmorenoz/HetMOGP
Framework none
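
The covariance structure named in the abstract, a linear model of coregionalisation (LMC), is easy to write down: each latent parameter function is a weighted sum of Q shared latent GPs, so the multi-output covariance is a sum of Kronecker products between rank-one coregionalisation matrices and base kernels. A NumPy sketch follows; the heterogeneous likelihoods and the variational inducing-point machinery are not reproduced here.

```python
import numpy as np

def rbf(X1, X2, ell):
    """Squared-exponential base kernel."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell ** 2)

def lmc_cov(X, W, ells):
    """Covariance over D outputs at inputs X from Q latent GPs.
    W: (Q, D) mixing weights; returns a (D*N, D*N) matrix."""
    K = 0.0
    for q in range(W.shape[0]):
        Bq = np.outer(W[q], W[q])                   # rank-one coregionalisation
        K = K + np.kron(Bq, rbf(X, X, ells[q]))     # one block per output pair
    return K

X = np.linspace(0, 1, 25)[:, None]
W = np.random.randn(2, 3)                           # Q = 2 latent GPs, D = 3 outputs
K = lmc_cov(X, W, ells=[0.1, 0.5])
jitter = 1e-6 * np.eye(len(K))
draw = np.random.multivariate_normal(np.zeros(len(K)), K + jitter)
f1, f2, f3 = draw.reshape(3, 25)                    # correlated output functions
```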

DeepDRR – A Catalyst for Machine Learning in Fluoroscopy-guided Procedures

Title DeepDRR – A Catalyst for Machine Learning in Fluoroscopy-guided Procedures
Authors Mathias Unberath, Jan-Nico Zaech, Sing Chun Lee, Bastian Bier, Javad Fotouhi, Mehran Armand, Nassir Navab
Abstract Machine learning-based approaches outperform competing methods in most disciplines relevant to diagnostic radiology. Interventional radiology, however, has not yet benefited substantially from the advent of deep learning, in particular for two reasons: 1) most images acquired during the procedure are never archived and are thus not available for learning, and 2) even if they were available, annotation would be a severe challenge due to the vast amounts of data. When considering fluoroscopy-guided procedures, an interesting alternative to true interventional fluoroscopy is in silico simulation of the procedure from 3D diagnostic CT. In this case, labeling is comparably easy and potentially readily available, yet the appropriateness of the resulting synthetic data depends on the forward model. In this work, we propose DeepDRR, a framework for fast and realistic simulation of fluoroscopy and digital radiography from CT scans, tightly integrated with the software platforms native to deep learning. We use machine learning for material decomposition and scatter estimation in 3D and 2D, respectively, combined with analytic forward projection and noise injection to achieve the required performance. Using anatomical landmark detection in X-ray images of the pelvis as an example, we demonstrate that machine learning models trained on DeepDRRs generalize to unseen clinically acquired data without the need for re-training or domain adaptation. Our results are promising and promote the establishment of machine learning in fluoroscopy-guided procedures.
Tasks Domain Adaptation
Published 2018-03-22
URL http://arxiv.org/abs/1803.08606v1
PDF http://arxiv.org/pdf/1803.08606v1.pdf
PWC https://paperswithcode.com/paper/deepdrr-a-catalyst-for-machine-learning-in
Repo https://github.com/mathiasunberath/DeepDRR
Framework pytorch
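
A NumPy toy version of the analytic forward-projection and noise-injection stages: integrate attenuation along parallel rays through a CT volume (Beer-Lambert law), then inject Poisson noise at an assumed photon count. DeepDRR's learned material decomposition, scatter estimation, and cone-beam geometry are deliberately omitted; this only illustrates the physics core.

```python
import numpy as np

def drr_parallel(mu, spacing=1.0, photons=10_000):
    """mu: (X, Y, Z) attenuation volume in 1/mm; rays travel along axis 0."""
    line_integral = mu.sum(axis=0) * spacing        # Beer-Lambert exponent
    expected = photons * np.exp(-line_integral)     # mean detected photon count
    noisy = np.random.poisson(expected)             # quantum noise injection
    return -np.log(np.maximum(noisy, 1) / photons)  # log-converted DRR image

mu = np.zeros((64, 64, 64))
mu[16:48, 24:40, 24:40] = 0.02                      # a simple attenuating block
image = drr_parallel(mu)                            # (64, 64) synthetic radiograph
```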

PPO-CMA: Proximal Policy Optimization with Covariance Matrix Adaptation

Title PPO-CMA: Proximal Policy Optimization with Covariance Matrix Adaptation
Authors Perttu Hämäläinen, Amin Babadi, Xiaoxiao Ma, Jaakko Lehtinen
Abstract Proximal Policy Optimization (PPO) is a highly popular model-free reinforcement learning (RL) approach. However, we observe that in a continuous action space, PPO can prematurely shrink the exploration variance, which leads to slow progress and may make the algorithm prone to getting stuck in local optima. Drawing inspiration from CMA-ES, a black-box evolutionary optimization method designed for robustness in similar situations, we propose PPO-CMA, a proximal policy optimization approach that adaptively expands the exploration variance to speed up progress. This can be considered as a form of action-space momentum. With only minor changes to PPO, our algorithm considerably improves performance in Roboschool continuous control benchmarks.
Tasks Continuous Control
Published 2018-10-05
URL https://arxiv.org/abs/1810.02541v7
PDF https://arxiv.org/pdf/1810.02541v7.pdf
PWC https://paperswithcode.com/paper/ppo-cma-proximal-policy-optimization-with
Repo https://github.com/ppocma/ppocma
Framework tf
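
The core variance-adaptation idea can be isolated in a few lines: estimate the new exploration variance from deviations of positive-advantage actions around the old policy mean, in the spirit of the CMA-ES rank-mu update. This NumPy sketch uses a diagonal covariance and a toy advantage signal; the paper's neural policy, evolution path, and history buffer are omitted.

```python
import numpy as np

def cma_style_var_update(mean, var, actions, advantages):
    """Diagonal-covariance update driven only by positive-advantage actions."""
    w = np.maximum(advantages, 0.0)                 # discard negative advantages
    if w.sum() == 0.0:
        return var
    w = w / w.sum()                                 # normalized weights
    dev2 = (actions - mean) ** 2                    # deviations from the *old* mean
    return (w[:, None] * dev2).sum(axis=0)          # weighted second moment

mean, var = np.zeros(2), 0.01 * np.ones(2)
acts = mean + np.sqrt(var) * np.random.randn(256, 2)
adv = acts[:, 0] - mean[0]                          # toy advantage: "move right"
var = cma_style_var_update(mean, var, acts, adv)    # variance grows along dim 0
```

Because deviations are measured around the old mean, actions that all pulled in one improving direction inflate the variance along that direction rather than shrinking it, which is the behavior PPO-CMA uses to avoid premature convergence.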

Ancient-Modern Chinese Translation with a Large Training Dataset

Title Ancient-Modern Chinese Translation with a Large Training Dataset
Authors Dayiheng Liu, Jiancheng Lv, Kexin Yang, Qian Qu
Abstract Ancient Chinese carries the wisdom and spiritual culture of the Chinese nation, and automatic translation from ancient to modern Chinese helps to preserve and carry forward this heritage. However, the lack of a large-scale parallel corpus limits the study of machine translation for Ancient-Modern Chinese. In this paper, we propose an Ancient-Modern Chinese clause alignment approach based on the characteristics of these two languages. This method combines both lexical and statistical information and achieves a 94.2 F1-score on our manually annotated test set. We use this method to create a new large-scale Ancient-Modern Chinese parallel corpus which contains 1.24M bilingual pairs. To the best of our knowledge, this is the first large-scale, high-quality Ancient-Modern Chinese dataset. Furthermore, we analyzed and compared the performance of SMT and various NMT models on this dataset and provided a strong baseline for this task.
Tasks Machine Translation
Published 2018-08-11
URL https://arxiv.org/abs/1808.03738v2
PDF https://arxiv.org/pdf/1808.03738v2.pdf
PWC https://paperswithcode.com/paper/ancient-modern-chinese-translation-with-a
Repo https://github.com/dayihengliu/a2m_chineseNMT
Framework none
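
A toy illustration of the clause-alignment scoring the abstract describes: combine a lexical signal (ancient Chinese characters often reappear in the modern translation) with a statistical signal (a clause-length model). The specific weighting, the Gaussian length model, and the constants below are my assumptions, not the paper's exact formulation.

```python
import math

def align_score(ancient, modern, ratio=1.5, sigma=0.4, w_lex=0.7):
    """Score a candidate ancient/modern clause pair."""
    shared = len(set(ancient) & set(modern))        # characters that survive translation
    lexical = shared / max(len(set(ancient)), 1)
    r = len(modern) / max(len(ancient), 1)          # modern clauses tend to be longer
    statistical = math.exp(-((r - ratio) ** 2) / (2 * sigma ** 2))
    return w_lex * lexical + (1 - w_lex) * statistical

print(align_score("学而时习之", "学习并且时常温习它"))  # high: shared chars + plausible length
```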

On Learning Associations of Faces and Voices

Title On Learning Associations of Faces and Voices
Authors Changil Kim, Hijung Valentina Shin, Tae-Hyun Oh, Alexandre Kaspar, Mohamed Elgharib, Wojciech Matusik
Abstract In this paper, we study the associations between human faces and voices. Audiovisual integration, specifically the integration of facial and vocal information is a well-researched area in neuroscience. It is shown that the overlapping information between the two modalities plays a significant role in perceptual tasks such as speaker identification. Through an online study on a new dataset we created, we confirm previous findings that people can associate unseen faces with corresponding voices and vice versa with greater than chance accuracy. We computationally model the overlapping information between faces and voices and show that the learned cross-modal representation contains enough information to identify matching faces and voices with performance similar to that of humans. Our representation exhibits correlations to certain demographic attributes and features obtained from either visual or aural modality alone. We release our dataset of audiovisual recordings and demographic annotations of people reading out short text used in our studies.
Tasks Speaker Identification
Published 2018-05-15
URL http://arxiv.org/abs/1805.05553v3
PDF http://arxiv.org/pdf/1805.05553v3.pdf
PWC https://paperswithcode.com/paper/on-learning-associations-of-faces-and-voices
Repo https://github.com/changil/facevoice
Framework tf
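
One standard way to model the shared information computationally, sketched here in PyTorch, is a cross-modal embedding trained with an InfoNCE-style matching loss: two encoders map faces and voices into a common space where matching pairs score higher than in-batch negatives. The encoders, dimensions, temperature, and loss are placeholder choices, not necessarily the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

face_enc = nn.Linear(512, 128)                      # stand-in for a face CNN head
voice_enc = nn.Linear(256, 128)                     # stand-in for a voice network

faces, voices = torch.randn(32, 512), torch.randn(32, 256)
f = F.normalize(face_enc(faces), dim=1)             # unit-norm face embeddings
v = F.normalize(voice_enc(voices), dim=1)           # unit-norm voice embeddings

# The i-th face matches the i-th voice; other in-batch pairs act as negatives.
logits = f @ v.t() / 0.07                           # scaled cosine similarities
loss = F.cross_entropy(logits, torch.arange(32))    # cross-modal matching loss
```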

O-HAZE: a dehazing benchmark with real hazy and haze-free outdoor images

Title O-HAZE: a dehazing benchmark with real hazy and haze-free outdoor images
Authors Codruta O. Ancuti, Cosmin Ancuti, Radu Timofte, Christophe De Vleeschouwer
Abstract Haze removal, or dehazing, is a challenging ill-posed problem that has drawn significant attention in the last few years. Despite this growing interest, the scientific community still lacks a reference dataset to evaluate objectively and quantitatively the performance of proposed dehazing methods. The few datasets currently considered, both for assessment and for training learning-based dehazing techniques, rely exclusively on synthetic hazy images. To address this limitation, we introduce the first outdoor-scenes database (named O-HAZE) composed of pairs of real hazy and corresponding haze-free images. In practice, hazy images have been captured in the presence of real haze generated by professional haze machines, and O-HAZE contains 45 different outdoor scenes depicting the same visual content recorded in haze-free and hazy conditions, under the same illumination parameters. To illustrate its usefulness, O-HAZE is used to compare a representative set of state-of-the-art dehazing techniques, using traditional image quality metrics such as PSNR, SSIM and CIEDE2000. This reveals the limitations of current techniques and questions some of their underlying assumptions.
Tasks
Published 2018-04-13
URL http://arxiv.org/abs/1804.05101v1
PDF http://arxiv.org/pdf/1804.05101v1.pdf
PWC https://paperswithcode.com/paper/o-haze-a-dehazing-benchmark-with-real-hazy
Repo https://github.com/inyong37/Vision
Framework tf
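
Because O-HAZE provides aligned hazy/haze-free pairs, dehazing outputs can be scored with full-reference metrics. A NumPy sketch of PSNR as an example; SSIM and CIEDE2000 follow the same pairwise pattern, with implementations available in scikit-image.

```python
import numpy as np

def psnr(reference, restored, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    mse = np.mean((reference.astype(np.float64) - restored.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

gt = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)  # haze-free image
out = np.clip(gt + np.random.normal(0, 5, gt.shape), 0, 255)   # mock dehazed result
print(f"PSNR: {psnr(gt, out):.2f} dB")
```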

Stochastic Gradient Descent with Hyperbolic-Tangent Decay on Classification

Title Stochastic Gradient Descent with Hyperbolic-Tangent Decay on Classification
Authors Bo Yang Hsueh, Wei Li, I-Chen Wu
Abstract The learning rate schedule has been a critical issue in deep neural network training. Several schedulers and methods have been proposed, including step decay, adaptive methods, cosine schedulers and cyclical schedulers. This paper proposes a new scheduling method, named hyperbolic-tangent decay (HTD). We run experiments on several benchmarks, such as ResNet, Wide ResNet and DenseNet for the CIFAR-10 and CIFAR-100 datasets, LSTM for the PAMAP2 dataset, and ResNet on the ImageNet and Fashion-MNIST datasets. In our experiments, HTD outperforms the step decay and cosine schedulers in nearly all cases, while requiring fewer hyperparameters than step decay and being more flexible than the cosine scheduler. Code is available at https://github.com/BIGBALLON/HTD.
Tasks
Published 2018-06-05
URL http://arxiv.org/abs/1806.01593v2
PDF http://arxiv.org/pdf/1806.01593v2.pdf
PWC https://paperswithcode.com/paper/stochastic-gradient-descent-with-hyperbolic
Repo https://github.com/BIGBALLON/HTD
Framework pytorch
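
The HTD schedule itself is a one-liner: the learning rate follows (1 - tanh(.))/2 from its maximum down to its minimum over T steps, with lower and upper bounds L and U shaping where the drop happens. The default values below (L = -6, U = 3) are an assumption from my reading of the paper; check the repo for the settings used per benchmark.

```python
import math

def htd_lr(t, T, lr_max=0.1, lr_min=0.0, L=-6.0, U=3.0):
    """Hyperbolic-tangent decay at step t of T total steps."""
    return lr_min + (lr_max - lr_min) / 2 * (1 - math.tanh(L + (U - L) * t / T))

schedule = [htd_lr(t, 200) for t in range(200)]
# Stays near lr_max early (tanh(-6) is close to -1), then drops smoothly to lr_min.
```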

Training Deep Face Recognition Systems with Synthetic Data

Title Training Deep Face Recognition Systems with Synthetic Data
Authors Adam Kortylewski, Andreas Schneider, Thomas Gerig, Bernhard Egger, Andreas Morel-Forster, Thomas Vetter
Abstract Recent advances in deep learning have significantly increased the performance of face recognition systems. The performance and reliability of these models depend heavily on the amount and quality of the training data. However, the collection of large annotated datasets does not scale well, and control over the quality of the data decreases with the size of the dataset. In this work, we explore how synthetically generated data can be used to decrease the number of real-world images needed for training deep face recognition systems. In particular, we make use of a 3D morphable face model for the generation of images with arbitrary numbers of facial identities and with full control over image variations such as pose, illumination, and background. In our experiments with off-the-shelf face recognition software we observe the following phenomena: 1) The amount of real training data needed to train competitive deep face recognition systems can be reduced significantly. 2) Combining large-scale real-world data with synthetic data leads to increased performance. 3) Models trained only on synthetic data with strong variations in pose, illumination, and background perform very well across different datasets even without dataset adaptation. 4) The real-to-virtual performance gap can be closed when using synthetic data for pre-training, followed by fine-tuning with real-world images. 5) There are no observable negative effects of pre-training with synthetic data. Thus, any face recognition system in our experiments benefits from using synthetic face images. The synthetic data generator, as well as all experiments, are publicly available.
Tasks Face Recognition
Published 2018-02-16
URL http://arxiv.org/abs/1802.05891v1
PDF http://arxiv.org/pdf/1802.05891v1.pdf
PWC https://paperswithcode.com/paper/training-deep-face-recognition-systems-with
Repo https://github.com/unibas-gravis/parametric-face-image-generator
Framework none
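
The generator's core mechanism, sampling new identities from a 3D morphable model, reduces to drawing PCA coefficients from the model's Gaussian prior. In this NumPy sketch the basis and standard deviations are random stand-ins for an actual morphable model, and the rendering step (pose, illumination, background) is omitted.

```python
import numpy as np

n_vertices, n_components = 5000, 80
mean_shape = np.zeros(3 * n_vertices)                  # flattened (x, y, z) mean
basis = np.random.randn(3 * n_vertices, n_components)  # stand-in PCA directions
stddev = np.linspace(1.0, 0.01, n_components)          # per-component scales

def sample_identity(rng):
    alpha = rng.standard_normal(n_components)          # coefficients ~ N(0, I)
    return (mean_shape + basis @ (stddev * alpha)).reshape(n_vertices, 3)

rng = np.random.default_rng(0)
identities = [sample_identity(rng) for _ in range(10)] # 10 synthetic face shapes
```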

PoseFix: Model-agnostic General Human Pose Refinement Network

Title PoseFix: Model-agnostic General Human Pose Refinement Network
Authors Gyeongsik Moon, Ju Yong Chang, Kyoung Mu Lee
Abstract Multi-person pose estimation from a 2D image is an essential technique for human behavior understanding. In this paper, we propose a human pose refinement network that estimates a refined pose from a tuple of an input image and an input pose. In previous methods, pose refinement was performed mainly through an end-to-end trainable multi-stage architecture; however, such architectures are highly dependent on pose estimation models and require careful model design. By contrast, we propose a model-agnostic pose refinement method. According to a recent study, state-of-the-art 2D human pose estimation methods have similar error distributions. We use these error statistics as prior information to generate synthetic poses, and use the synthesized poses to train our model. In the testing stage, pose estimation results from any other method can be input to the proposed method. Moreover, the proposed model does not require code or knowledge about other methods, which allows it to be easily used in the post-processing step. We show that the proposed approach achieves better performance than conventional multi-stage refinement models and consistently improves the performance of various state-of-the-art pose estimation methods on the commonly used benchmark. The code is available at https://github.com/mks0601/PoseFix_RELEASE.
Tasks Keypoint Detection, Multi-Person Pose Estimation, Pose Estimation
Published 2018-12-10
URL http://arxiv.org/abs/1812.03595v3
PDF http://arxiv.org/pdf/1812.03595v3.pdf
PWC https://paperswithcode.com/paper/posefix-model-agnostic-general-human-pose
Repo https://github.com/mks0601/PoseFix_RELEASE
Framework tf
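
A NumPy sketch of PoseFix's data synthesis: instead of running any particular pose estimator, erroneous input poses are generated from ground truth according to error statistics (jitter, miss, and similar error types reported for state-of-the-art methods). The error types and probabilities below are illustrative stand-ins for those statistics.

```python
import numpy as np

def synthesize_input_pose(gt, rng, p_miss=0.05, p_jitter=0.2, jitter_std=5.0):
    """Corrupt a ground-truth pose (J, 2) with jitter/miss style errors."""
    pose = gt.copy()
    for j in range(len(pose)):
        u = rng.random()
        if u < p_miss:
            pose[j] += rng.normal(0.0, 10.0 * jitter_std, 2)  # large "miss" error
        elif u < p_miss + p_jitter:
            pose[j] += rng.normal(0.0, jitter_std, 2)         # small jitter error
    return pose

rng = np.random.default_rng(0)
gt_pose = rng.uniform(0, 256, (17, 2))              # 17 COCO-style keypoints
noisy_pose = synthesize_input_pose(gt_pose, rng)    # input to the refinement net
```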

Semi-supervised Adversarial Learning to Generate Photorealistic Face Images of New Identities from 3D Morphable Model

Title Semi-supervised Adversarial Learning to Generate Photorealistic Face Images of New Identities from 3D Morphable Model
Authors Baris Gecer, Binod Bhattarai, Josef Kittler, Tae-Kyun Kim
Abstract We propose a novel end-to-end semi-supervised adversarial framework to generate photorealistic face images of new identities with wide ranges of expressions, poses, and illuminations, conditioned by a 3D morphable model. Previous adversarial style-transfer methods either supervise their networks with a large volume of paired data or use unpaired data with a highly under-constrained two-way generative framework in an unsupervised fashion. We introduce pairwise adversarial supervision to constrain two-way domain adaptation with a small number of paired real and synthetic images for training, along with a large volume of unpaired data. Extensive qualitative and quantitative experiments are performed to validate our idea. Generated face images of new identities contain diversity in pose, lighting, and expression, and qualitative results show that they are highly constrained by the synthetic input image while adding photorealism and retaining identity information. We combine face images generated by the proposed method with a real dataset to train face recognition algorithms, and evaluate the model on two challenging datasets: LFW and IJB-A. We observe that the generated images from our framework consistently improve the performance of a deep face recognition network trained on the Oxford VGG Face dataset, and achieve results comparable to the state-of-the-art.
Tasks Domain Adaptation, Face Generation, Face Recognition, Style Transfer
Published 2018-04-10
URL http://arxiv.org/abs/1804.03675v1
PDF http://arxiv.org/pdf/1804.03675v1.pdf
PWC https://paperswithcode.com/paper/semi-supervised-adversarial-learning-to
Repo https://github.com/barisgecer/facegan
Framework tf
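
The training signal described, adversarial supervision from unpaired data plus pairwise supervision from a small paired set, can be sketched in PyTorch as below. `G` and `D` are placeholder networks, the L1 pairing term and its weight are my assumptions, and the paper's two-way setup and identity-preservation losses are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

G = nn.Sequential(nn.Conv2d(3, 3, 3, padding=1))    # stand-in generator
D = nn.Sequential(nn.Conv2d(3, 1, 4, 2, 1))         # stand-in patch critic

synth_unpaired = torch.rand(8, 3, 64, 64)           # 3DMM renderings, no photos
synth_paired = torch.rand(4, 3, 64, 64)             # renderings with matching...
real_paired = torch.rand(4, 3, 64, 64)              # ...real photographs

fake = G(synth_unpaired)
d_fake = D(fake)
adv_loss = F.binary_cross_entropy_with_logits(      # fool the critic (unpaired data)
    d_fake, torch.ones_like(d_fake))
pair_loss = F.l1_loss(G(synth_paired), real_paired) # pairwise supervision
gen_loss = adv_loss + 10.0 * pair_loss              # weighting is an assumption
```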

Support Vector Guided Softmax Loss for Face Recognition

Title Support Vector Guided Softmax Loss for Face Recognition
Authors Xiaobo Wang, Shuo Wang, Shifeng Zhang, Tianyu Fu, Hailin Shi, Tao Mei
Abstract Face recognition has witnessed significant progress due to advances in deep convolutional neural networks (CNNs), the central challenge of which is feature discrimination. To address it, one group of methods exploits mining-based strategies (e.g., hard example mining and focal loss) to focus on informative examples, while the other designs margin-based loss functions (e.g., angular, additive, and additive angular margins) to increase the feature margin from the perspective of the ground-truth class. Both have been well verified to learn discriminative features. However, they suffer from either the ambiguity of hard examples or the lack of discriminative power of other classes. In this paper, we design a novel loss function, the support vector guided softmax loss (SV-Softmax), which adaptively emphasizes mis-classified points (support vectors) to guide discriminative feature learning. The developed SV-Softmax loss is thus able to eliminate the ambiguity of hard examples and absorb the discriminative power of other classes, resulting in more discriminative features. To the best of our knowledge, this is the first attempt to inherit the advantages of mining-based and margin-based losses in one framework. Experimental results on several benchmarks demonstrate the effectiveness of our approach over the state-of-the-art.
Tasks Face Recognition
Published 2018-12-29
URL http://arxiv.org/abs/1812.11317v1
PDF http://arxiv.org/pdf/1812.11317v1.pdf
PWC https://paperswithcode.com/paper/support-vector-guided-softmax-loss-for-face
Repo https://github.com/comratvlad/sv_softmax
Framework tf
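
A PyTorch sketch of the SV-Softmax mechanism as I read it: negative classes whose similarity to the sample exceeds the ground-truth similarity are marked as support vectors, and their logits are boosted before the softmax cross-entropy. Treat the exact weighting h = exp(s(t-1)(cos θ + 1)) and the hyperparameters s, t as assumptions to verify against the paper.

```python
import torch
import torch.nn.functional as F

def sv_softmax_loss(cosine, labels, s=30.0, t=1.2):
    """cosine: (B, C) cosine similarities between features and class weights."""
    target_cos = cosine.gather(1, labels[:, None])  # cos(theta_y) per sample
    is_sv = (cosine > target_cos).float()           # classes scoring above the truth
    is_sv.scatter_(1, labels[:, None], 0.0)         # the true class is never a SV
    logits = s * cosine + s * (t - 1) * (cosine + 1) * is_sv  # emphasize SVs
    return F.cross_entropy(logits, labels)          # softmax over boosted logits

cosine = torch.empty(8, 100).uniform_(-1, 1)        # stand-in similarities
loss = sv_softmax_loss(cosine, torch.randint(0, 100, (8,)))
# With t = 1 the boost vanishes and the loss reduces to plain scaled softmax.
```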