Paper Group ANR 955
3D Face Hallucination from a Single Depth Frame. SiGAN: Siamese Generative Adversarial Network for Identity-Preserving Face Hallucination. A Bandit Approach to Multiple Testing with False Discovery Control. Face hallucination using cascaded super-resolution and identity priors. A Systematic Analysis for State-of-the-Art 3D Lung Nodule Proposals Gen …
3D Face Hallucination from a Single Depth Frame
Title | 3D Face Hallucination from a Single Depth Frame |
Authors | Shu Liang, Ira Kemelmacher-Shlizerman, Linda G. Shapiro |
Abstract | We present an algorithm that takes a single frame of a person’s face from a depth camera, e.g., Kinect, and produces a high-resolution 3D mesh of the input face. We leverage a dataset of 3D face meshes of 1204 distinct individuals ranging from age 3 to 40, captured in a neutral expression. We divide the input depth frame into semantically significant regions (eyes, nose, mouth, cheeks) and search the database for the best matching shape per region. We further combine the input depth frame with the matched database shapes into a single mesh that results in a high-resolution shape of the input person. Our system is fully automatic and uses only depth data for matching, making it invariant to imaging conditions. We evaluate our results using ground truth shapes, as well as compare to state-of-the-art shape estimation methods. We demonstrate the robustness of our local matching approach with high-quality reconstruction of faces that fall outside of the dataset span, e.g., faces older than 40 years old, facial expressions, and different ethnicities. |
Tasks | Face Hallucination |
Published | 2018-09-13 |
URL | http://arxiv.org/abs/1809.04764v1 |
http://arxiv.org/pdf/1809.04764v1.pdf | |
PWC | https://paperswithcode.com/paper/3d-face-hallucination-from-a-single-depth |
Repo | |
Framework | |
SiGAN: Siamese Generative Adversarial Network for Identity-Preserving Face Hallucination
Title | SiGAN: Siamese Generative Adversarial Network for Identity-Preserving Face Hallucination |
Authors | Chih-Chung Hsu, Chia-Wen Lin, Weng-Tai Su, Gene Cheung |
Abstract | Although generative adversarial networks (GANs) can hallucinate photo-realistic high-resolution (HR) faces from low-resolution (LR) faces, they cannot guarantee preserving the identities of hallucinated HR faces, making the HR faces poorly recognizable. To address this problem, we propose a Siamese GAN (SiGAN) to reconstruct HR faces that visually resemble their corresponding identities. On top of a Siamese network, the proposed SiGAN consists of a pair of identical generators and one discriminator. We incorporate reconstruction error and identity label information in the loss function of SiGAN in a pairwise manner. By iteratively optimizing the loss functions of the generator pair and discriminator of SiGAN, we can not only achieve photo-realistic face reconstruction, but also ensure the reconstructed information is useful for identity recognition. Experimental results demonstrate that SiGAN significantly outperforms existing face hallucination GANs in objective face verification performance, while achieving photo-realistic reconstruction. Moreover, for input LR faces from unknown identities who are not included in training, SiGAN can still do a good job. |
Tasks | Face Hallucination, Face Reconstruction, Face Verification |
Published | 2018-07-22 |
URL | http://arxiv.org/abs/1807.08370v1 |
http://arxiv.org/pdf/1807.08370v1.pdf | |
PWC | https://paperswithcode.com/paper/sigan-siamese-generative-adversarial-network |
Repo | |
Framework | |
A Bandit Approach to Multiple Testing with False Discovery Control
Title | A Bandit Approach to Multiple Testing with False Discovery Control |
Authors | Kevin Jamieson, Lalit Jain |
Abstract | We propose an adaptive sampling approach for multiple testing which aims to maximize statistical power while ensuring anytime false discovery control. We consider $n$ distributions whose means are partitioned by whether they are below or equal to a baseline (nulls), versus above the baseline (actual positives). In addition, each distribution can be sequentially and repeatedly sampled. Inspired by the multi-armed bandit literature, we provide an algorithm that takes as few samples as possible to exceed a target true positive proportion (i.e. proportion of actual positives discovered) while giving anytime control of the false discovery proportion (nulls predicted as actual positives). Our sample complexity results match known information theoretic lower bounds and through simulations we show a substantial performance improvement over uniform sampling and an adaptive elimination style algorithm. Given the simplicity of the approach, and its sample efficiency, the method has promise for wide adoption in the biological sciences, clinical testing for drug discovery, and online A/B/n testing problems. |
Tasks | Drug Discovery |
Published | 2018-09-06 |
URL | https://arxiv.org/abs/1809.02235v3 |
https://arxiv.org/pdf/1809.02235v3.pdf | |
PWC | https://paperswithcode.com/paper/a-bandit-approach-to-multiple-testing-with |
Repo | |
Framework | |
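The two quantities this abstract trades off can be made concrete with a small sketch. It is only an illustration of the definitions given in the abstract (false discovery proportion and true positive proportion), not the paper's sampling algorithm; the function names and the example arm indices are hypothetical.

```python
def false_discovery_proportion(discovered, nulls):
    """Fraction of discovered arms that are actually nulls (0.0 if nothing was discovered)."""
    discovered = set(discovered)
    if not discovered:
        return 0.0
    return len(discovered & set(nulls)) / len(discovered)


def true_positive_proportion(discovered, positives):
    """Fraction of the actual positives that have been discovered."""
    positives = set(positives)
    if not positives:
        return 0.0
    return len(set(discovered) & positives) / len(positives)


# Hypothetical example: 10 arms, arms 0-2 are actual positives, the rest are nulls.
positives = {0, 1, 2}
nulls = set(range(10)) - positives
discovered = {0, 1, 7}  # arm 7 is a false discovery

fdp = false_discovery_proportion(discovered, nulls)
tpp = true_positive_proportion(discovered, positives)
```

Here one of the three discoveries is a null and two of the three actual positives are found, so both proportions can be checked by hand.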
Face hallucination using cascaded super-resolution and identity priors
Title | Face hallucination using cascaded super-resolution and identity priors |
Authors | Klemen Grm, Simon Dobrišek, Walter J. Scheirer, Vitomir Štruc |
Abstract | In this paper we address the problem of hallucinating high-resolution facial images from unaligned low-resolution inputs at high magnification factors. We approach the problem with convolutional neural networks (CNNs) and propose a novel (deep) face hallucination model that incorporates identity priors into the learning procedure. The model consists of two main parts: i) a cascaded super-resolution network that upscales the low-resolution images, and ii) an ensemble of face recognition models that act as identity priors for the super-resolution network during training. Different from competing super-resolution approaches that typically rely on a single model for upscaling (even with large magnification factors), our network uses a cascade of multiple SR models that progressively upscale the low-resolution images using steps of $2\times$. This characteristic allows us to apply supervision signals (target appearances) at different resolutions and incorporate identity constraints at multiple scales. Our model is able to upscale (very) low-resolution images captured in unconstrained conditions and produce visually convincing results. We rigorously evaluate the proposed model on large datasets of facial images and report superior performance compared to the state-of-the-art. |
Tasks | Face Hallucination, Face Recognition, Super-Resolution |
Published | 2018-05-28 |
URL | http://arxiv.org/abs/1805.10938v2 |
http://arxiv.org/pdf/1805.10938v2.pdf | |
PWC | https://paperswithcode.com/paper/face-hallucination-using-cascaded-super |
Repo | |
Framework | |
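The idea of upscaling in steps of $2\times$ can be illustrated by listing the intermediate resolutions at which supervision could be applied. This is a minimal sketch of the arithmetic only, not the paper's network; the function name, the 24x24 input size, and the 8x magnification factor are assumptions chosen for the example.

```python
import math


def cascade_schedule(lr_size, magnification):
    """Return the (height, width) after each 2x stage needed to reach `magnification`.

    Assumes the total magnification is a power of two.
    """
    stages = int(round(math.log2(magnification)))
    if 2 ** stages != magnification:
        raise ValueError("magnification must be a power of two")
    height, width = lr_size
    return [(height * 2 ** k, width * 2 ** k) for k in range(1, stages + 1)]


# Hypothetical input: a 24x24 face upscaled 8x needs three 2x stages.
schedule = cascade_schedule((24, 24), 8)
```

Each entry of `schedule` is a resolution at which a target appearance could be compared against the intermediate output.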
A Systematic Analysis for State-of-the-Art 3D Lung Nodule Proposals Generation
Title | A Systematic Analysis for State-of-the-Art 3D Lung Nodule Proposals Generation |
Authors | Hui Wu, Matrix Yao, Albert Hu, Gaofeng Sun, Xiaokun Yu, Jian Tang |
Abstract | Lung nodule proposals generation is the primary step of lung nodule detection and has received much attention in recent years. In this paper, we first construct a 3-dimensional Convolutional Neural Network (3D CNN) model to generate lung nodule proposals, which achieves state-of-the-art performance. Then, we analyze a series of key problems concerning the training performance and efficiency. Firstly, we train the 3D CNN model with data at different resolutions and find that models trained on high-resolution input data achieve better lung nodule proposals generation performance, especially for very small nodules, while consuming much more memory. Then, we analyze the memory consumption on different platforms, and the experimental results indicate that the CPU architecture can provide us with larger memory and enables us to explore more possibilities of 3D applications. We implement the 3D CNN model on a CPU platform and propose an Intel Extended-Caffe framework which supports many highly efficient 3D computations and is open source at https://github.com/extendedcaffe/extended-caffe. |
Tasks | Lung Nodule Detection |
Published | 2018-01-09 |
URL | http://arxiv.org/abs/1802.02179v1 |
http://arxiv.org/pdf/1802.02179v1.pdf | |
PWC | https://paperswithcode.com/paper/a-systematic-analysis-for-state-of-the-art-3d |
Repo | |
Framework | |
Reusing Weights in Subword-aware Neural Language Models
Title | Reusing Weights in Subword-aware Neural Language Models |
Authors | Zhenisbek Assylbekov, Rustem Takhanov |
Abstract | We propose several ways of reusing subword embeddings and other weights in subword-aware neural language models. The proposed techniques do not benefit a competitive character-aware model, but some of them improve the performance of syllable- and morpheme-aware models while showing significant reductions in model sizes. We discover a simple hands-on principle: in a multi-layer input embedding model, layers should be tied consecutively bottom-up if reused at output. Our best morpheme-aware model with properly reused weights beats the competitive word-level model by a large margin across multiple languages and has 20%-87% fewer parameters. |
Tasks | |
Published | 2018-02-23 |
URL | http://arxiv.org/abs/1802.08375v2 |
http://arxiv.org/pdf/1802.08375v2.pdf | |
PWC | https://paperswithcode.com/paper/reusing-weights-in-subword-aware-neural |
Repo | |
Framework | |
A Distributed Flexible Delay-tolerant Proximal Gradient Algorithm
Title | A Distributed Flexible Delay-tolerant Proximal Gradient Algorithm |
Authors | Konstantin Mishchenko, Franck Iutzeler, Jérôme Malick |
Abstract | We develop and analyze an asynchronous algorithm for distributed convex optimization when the objective is written as a sum of smooth functions, local to each worker, and a non-smooth function. Unlike many existing methods, our distributed algorithm is adjustable to various levels of communication cost, delays, machines' computational power, and functions' smoothness. A unique feature is that the stepsizes depend neither on communication delays nor on the number of machines, which is highly desirable for scalability. We prove that the algorithm converges linearly in the strongly convex case, and provide guarantees of convergence for the non-strongly convex case. The obtained rates are the same as the vanilla proximal gradient algorithm over some introduced epoch sequence that subsumes the delays of the system. We provide numerical results on large-scale machine learning problems to demonstrate the merits of the proposed method. |
Tasks | |
Published | 2018-06-25 |
URL | https://arxiv.org/abs/1806.09429v3 |
https://arxiv.org/pdf/1806.09429v3.pdf | |
PWC | https://paperswithcode.com/paper/a-distributed-flexible-delay-tolerant |
Repo | |
Framework | |
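For orientation, the vanilla proximal gradient algorithm that the abstract uses as its reference point can be sketched on a toy problem. This is a synchronous, single-process illustration, not the asynchronous distributed method of the paper. The problem is an assumption made for the example: each "worker" holds a smooth function 0.5 * ||x - c_i||^2, the non-smooth term is lam * ||x||_1, and the step size and data are hypothetical.

```python
def soft_threshold(value, level):
    """Proximal operator of level * |.| applied to a scalar."""
    if value > level:
        return value - level
    if value < -level:
        return value + level
    return 0.0


def proximal_gradient(centers, lam, step=0.5, iterations=100):
    """Minimise (1/M) * sum_i 0.5 * ||x - c_i||^2 + lam * ||x||_1 by proximal gradient."""
    dim = len(centers[0])
    mean = [sum(c[j] for c in centers) / len(centers) for j in range(dim)]
    x = [0.0] * dim
    for _ in range(iterations):
        # Gradient of the averaged smooth part is x - mean(c_i).
        grad = [x[j] - mean[j] for j in range(dim)]
        # Gradient step followed by the proximal (soft-thresholding) step.
        x = [soft_threshold(x[j] - step * grad[j], step * lam) for j in range(dim)]
    return x


# Hypothetical data held by two workers.
centers = [[3.0, -0.2], [1.0, 0.4]]
solution = proximal_gradient(centers, lam=0.5)
```

For this toy objective the minimiser is the soft-thresholded mean of the centers, so the iterate should approach [1.5, 0.0].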
CerfGAN: A Compact, Effective, Robust, and Fast Model for Unsupervised Multi-Domain Image-to-Image Translation
Title | CerfGAN: A Compact, Effective, Robust, and Fast Model for Unsupervised Multi-Domain Image-to-Image Translation |
Authors | Xiao Liu, Shengchuan Zhang, Hong Liu, Xin Liu, Cheng Deng, Rongrong Ji |
Abstract | In this paper, we aim at solving the multi-domain image-to-image translation problem with a unified model in an unsupervised manner. The most successful work in this area is StarGAN, which works well in tasks like face attribute modulation. However, StarGAN is unable to match multiple translation mappings when encountering general translations with very diverse domain shifts. In addition, StarGAN adopts an Encoder-Decoder-Discriminator (EDD) architecture, where the model is time-consuming and unstable to train. To this end, we propose a Compact, effective, robust, and fast GAN model, termed CerfGAN, to solve the above problem. In principle, CerfGAN contains a novel component, i.e., a multi-class discriminator (MCD), which gives the model an extremely powerful ability to match multiple translation mappings. To stabilize the training process, MCD also plays the role of the encoder in CerfGAN, which saves a lot of computation and memory costs. We perform extensive experiments to verify the effectiveness of the proposed method. Quantitatively, CerfGAN is demonstrated to handle a series of image-to-image translation tasks including style transfer, season transfer, face hallucination, etc., where the input images are sampled from diverse domains. The comparisons to several recently proposed approaches demonstrate the superiority and novelty of the proposed method. |
Tasks | Face Hallucination, Image-to-Image Translation, Style Transfer |
Published | 2018-05-28 |
URL | http://arxiv.org/abs/1805.10871v2 |
http://arxiv.org/pdf/1805.10871v2.pdf | |
PWC | https://paperswithcode.com/paper/cerfgan-a-compact-effective-robust-and-fast |
Repo | |
Framework | |
Development and application of a machine learning supported methodology for measurement and verification (M&V) 2.0
Title | Development and application of a machine learning supported methodology for measurement and verification (M&V) 2.0 |
Authors | Colm V. Gallagher, Kevin Leahy, Peter O’Donovan, Ken Bruton, Dominic T. J. O’Sullivan |
Abstract | The foundations of all methodologies for the measurement and verification (M&V) of energy savings are based on the same five key principles: accuracy, completeness, conservatism, consistency and transparency. The most widely accepted methodologies tend to generalise M&V so as to ensure applicability across the spectrum of energy conservation measures (ECMs). These do not provide a rigid calculation procedure to follow. This paper aims to bridge the gap between high-level methodologies and the practical application of modelling algorithms, with a focus on the industrial buildings sector. This is achieved with the development of a novel, machine learning supported methodology for M&V 2.0 which enables accurate quantification of savings. A novel and computationally efficient feature selection algorithm and powerful machine learning regression algorithms are employed to maximise the effectiveness of available data. The baseline period energy consumption is modelled using artificial neural networks, support vector machines, k-nearest neighbours and multiple ordinary least squares regression. Improved knowledge discovery and an expanded boundary of analysis allow more complex energy systems to be analysed, thus increasing the applicability of M&V. A case study in a large biomedical manufacturing facility is used to demonstrate the methodology’s ability to accurately quantify the savings under real-world conditions. The ECM was found to result in 604,527 kWh of energy savings with 57% uncertainty at a confidence interval of 68%. 20 baseline energy models are developed using an exhaustive approach with the optimal model being used to quantify savings. The range of savings estimated with each model is presented and the acceptability of uncertainty is reviewed. The case study demonstrates the ability of the methodology to perform M&V to an acceptable standard in challenging circumstances. |
Tasks | Feature Selection |
Published | 2018-01-24 |
URL | http://arxiv.org/abs/1801.08175v1 |
http://arxiv.org/pdf/1801.08175v1.pdf | |
PWC | https://paperswithcode.com/paper/development-and-application-of-a-machine |
Repo | |
Framework | |
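The basic M&V calculation behind this abstract, fitting a baseline model and comparing its prediction with measured post-retrofit consumption, can be sketched as follows. This is a simplified illustration that uses single-variable ordinary least squares only. The abstract lists several other regression algorithms, and nothing here reproduces the paper's feature selection or uncertainty analysis. All names and numbers are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimated_savings(model, drivers, measured):
    """Sum of (baseline-model prediction - measured consumption) over the reporting period."""
    slope, intercept = model
    return sum(slope * d + intercept - m for d, m in zip(drivers, measured))


# Hypothetical baseline period: energy use (kWh) against a single driver such as production volume.
baseline_driver = [10.0, 20.0, 30.0, 40.0]
baseline_energy = [120.0, 220.0, 320.0, 420.0]
model = fit_line(baseline_driver, baseline_energy)

# Hypothetical reporting period after the energy conservation measure.
savings = estimated_savings(model, [15.0, 25.0], [150.0, 240.0])
```

With this data the baseline model predicts 170 and 270 kWh for the two reporting intervals, so the estimated savings are 50 kWh.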
Diagnose like a Radiologist: Attention Guided Convolutional Neural Network for Thorax Disease Classification
Title | Diagnose like a Radiologist: Attention Guided Convolutional Neural Network for Thorax Disease Classification |
Authors | Qingji Guan, Yaping Huang, Zhun Zhong, Zhedong Zheng, Liang Zheng, Yi Yang |
Abstract | This paper considers the task of thorax disease classification on chest X-ray images. Existing methods generally use the global image as input for network learning. Such a strategy is limited in two aspects. 1) A thorax disease usually happens in (small) localized areas which are disease specific. Training CNNs using the global image may be affected by the (excessive) irrelevant noisy areas. 2) Due to the poor alignment of some CXR images, the existence of irregular borders hinders the network performance. In this paper, we address the above problems by proposing a three-branch attention guided convolution neural network (AG-CNN). AG-CNN 1) learns from disease-specific regions to avoid noise and improve alignment, 2) also integrates a global branch to compensate for the discriminative cues lost by the local branch. Specifically, we first learn a global CNN branch using global images. Then, guided by the attention heat map generated from the global branch, we infer a mask to crop a discriminative region from the global image. The local region is used for training a local CNN branch. Lastly, we concatenate the last pooling layers of both the global and local branches for fine-tuning the fusion branch. Comprehensive experiments are conducted on the ChestX-ray14 dataset. We first report a strong global baseline producing an average AUC of 0.841 with ResNet-50 as backbone. After combining the local cues with the global information, AG-CNN improves the average AUC to 0.868. When DenseNet-121 is used, the average AUC reaches 0.871, which is a new state of the art in the community. |
Tasks | |
Published | 2018-01-30 |
URL | http://arxiv.org/abs/1801.09927v1 |
http://arxiv.org/pdf/1801.09927v1.pdf | |
PWC | https://paperswithcode.com/paper/diagnose-like-a-radiologist-attention-guided |
Repo | |
Framework | |
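The step of turning an attention heat map into a mask and cropping a region from the global image can be sketched in a few lines. This is a minimal illustration under stated assumptions: the heat map has the same size as the image, the mask is a plain threshold, and the crop is the bounding box of all masked cells. The abstract does not specify these details, and the function name, threshold, and data are hypothetical.

```python
def crop_from_heatmap(image, heatmap, threshold):
    """Crop the bounding box of all heat-map cells whose value is >= threshold."""
    rows = [r for r, row in enumerate(heatmap) if any(v >= threshold for v in row)]
    cols = [c for c in range(len(heatmap[0])) if any(row[c] >= threshold for row in heatmap)]
    if not rows or not cols:
        return image  # nothing exceeds the threshold: fall back to the global image
    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    return [row[left:right + 1] for row in image[top:bottom + 1]]


# Hypothetical 4x4 "image" whose pixel values are just their flat indices.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
heatmap = [
    [0.1, 0.2, 0.1, 0.0],
    [0.1, 0.8, 0.9, 0.1],
    [0.0, 0.7, 0.6, 0.1],
    [0.0, 0.1, 0.1, 0.0],
]
local_region = crop_from_heatmap(image, heatmap, threshold=0.5)
```

In this example the high-attention cells form the central 2x2 block, so the cropped local region contains the pixels at rows 1-2 and columns 1-2.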
Additional Representations for Improving Synthetic Aperture Sonar Classification Using Convolutional Neural Networks
Title | Additional Representations for Improving Synthetic Aperture Sonar Classification Using Convolutional Neural Networks |
Authors | Isaac Gerg, David Williams |
Abstract | Object classification in synthetic aperture sonar (SAS) imagery is usually a data starved and class imbalanced problem. There are few objects of interest present among much benign seafloor. Despite these problems, current classification techniques discard a large portion of the collected SAS information. In particular, a beamformed SAS image, which we call a single-look complex (SLC) image, contains complex pixels composed of real and imaginary parts. For human consumption, the SLC is converted to a magnitude-phase representation and the phase information is discarded. Even more problematic, the magnitude information usually exhibits a large dynamic range (>80dB) and must be dynamic range compressed for human display. Often it is this dynamic range compressed representation, originally designed for human consumption, which is fed into a classifier. Consequently, the classification process is completely void of the phase information. In this work, we show improvements in classification performance using the phase information from the SLC as well as information from an alternate source: photographs. We perform statistical testing to demonstrate the validity of our results. |
Tasks | Object Classification |
Published | 2018-08-08 |
URL | http://arxiv.org/abs/1808.02868v4 |
http://arxiv.org/pdf/1808.02868v4.pdf | |
PWC | https://paperswithcode.com/paper/additional-representations-for-improving |
Repo | |
Framework | |
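The conversion described in this abstract, from a complex SLC pixel to a magnitude-phase representation followed by dynamic range compression of the magnitude, can be sketched as below. The logarithmic (dB) mapping and the -80 dB floor are generic assumptions made for illustration; the abstract does not state which compression the authors use, and the function names are hypothetical.

```python
import cmath
import math


def magnitude_phase(pixel):
    """Split a complex pixel into (magnitude, phase in radians)."""
    return abs(pixel), cmath.phase(pixel)


def compress_db(magnitude, floor_db=-80.0):
    """Log-compress a magnitude to decibels relative to 1.0, clipped at floor_db."""
    if magnitude <= 0.0:
        return floor_db
    return max(20.0 * math.log10(magnitude), floor_db)


# Hypothetical complex pixel with a purely imaginary value.
pixel = complex(0.0, 0.001)
magnitude, phase = magnitude_phase(pixel)
level_db = compress_db(magnitude)
```

A display pipeline that keeps only `level_db` drops `phase` entirely, which is the information loss the abstract points out.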
Structure Aware SLAM using Quadrics and Planes
Title | Structure Aware SLAM using Quadrics and Planes |
Authors | Mehdi Hosseinzadeh, Yasir Latif, Trung Pham, Niko Suenderhauf, Ian Reid |
Abstract | Simultaneous Localization And Mapping (SLAM) is a fundamental problem in mobile robotics. While point-based SLAM methods provide accurate camera localization, the generated maps lack semantic information. On the other hand, state of the art object detection methods provide rich information about entities present in the scene from a single image. This work marries the two and proposes a method for representing generic objects as quadrics which allows object detections to be seamlessly integrated in a SLAM framework. For scene coverage, additional dominant planar structures are modeled as infinite planes. Experiments show that the proposed points-planes-quadrics representation can easily incorporate Manhattan and object affordance constraints, greatly improving camera localization and leading to semantically meaningful maps. The performance of our SLAM system is demonstrated in https://youtu.be/dR-rB9keF8M . |
Tasks | Camera Localization, Object Detection, Simultaneous Localization and Mapping |
Published | 2018-04-24 |
URL | http://arxiv.org/abs/1804.09111v3 |
http://arxiv.org/pdf/1804.09111v3.pdf | |
PWC | https://paperswithcode.com/paper/structure-aware-slam-using-quadrics-and |
Repo | |
Framework | |
HDFD — A High Deformation Facial Dynamics Benchmark for Evaluation of Non-Rigid Surface Registration and Classification
Title | HDFD — A High Deformation Facial Dynamics Benchmark for Evaluation of Non-Rigid Surface Registration and Classification |
Authors | Gareth Andrews, Sam Endean, Roberto Dyke, Yukun Lai, Gwenno Ffrancon, Gary KL Tam |
Abstract | Objects that undergo non-rigid deformation are common in the real world. A typical and challenging example is the human face. While various techniques have been developed for deformable shape registration and classification, benchmarks with detailed labels and landmarks suitable for evaluating such techniques are still limited. In this paper, we present a novel facial dynamic dataset HDFD which addresses the gap of existing datasets, including 4D funny faces with substantial non-isometric deformation, and 4D visual-audio faces of spoken phrases in a minority language (Welsh). Both datasets are captured from 21 participants. The sequences are manually landmarked, with the spoken phrases further rated by a Welsh expert for level of fluency. These are useful for quantitative evaluation of both registration and classification tasks. We further develop a methodology to evaluate several recent non-rigid surface registration techniques, using our dynamic sequences as test cases. The study demonstrates the significance and usefulness of our new dataset — a challenging benchmark dataset for future techniques. |
Tasks | |
Published | 2018-07-09 |
URL | http://arxiv.org/abs/1807.03354v1 |
http://arxiv.org/pdf/1807.03354v1.pdf | |
PWC | https://paperswithcode.com/paper/hdfd-a-high-deformation-facial-dynamics |
Repo | |
Framework | |
Modulated Policy Hierarchies
Title | Modulated Policy Hierarchies |
Authors | Alexander Pashevich, Danijar Hafner, James Davidson, Rahul Sukthankar, Cordelia Schmid |
Abstract | Solving tasks with sparse rewards is a main challenge in reinforcement learning. While hierarchical controllers are an intuitive approach to this problem, current methods often require manual reward shaping, alternating training phases, or manually defined sub tasks. We introduce modulated policy hierarchies (MPH), that can learn end-to-end to solve tasks from sparse rewards. To achieve this, we study different modulation signals and exploration for hierarchical controllers. Specifically, we find that communicating via bit-vectors is more efficient than selecting one out of multiple skills, as it enables mixing between them. To facilitate exploration, MPH uses its different time scales for temporally extended intrinsic motivation at each level of the hierarchy. We evaluate MPH on the robotics tasks of pushing and sparse block stacking, where it outperforms recent baselines. |
Tasks | |
Published | 2018-11-30 |
URL | http://arxiv.org/abs/1812.00025v1 |
http://arxiv.org/pdf/1812.00025v1.pdf | |
PWC | https://paperswithcode.com/paper/modulated-policy-hierarchies |
Repo | |
Framework | |
Disease Classification within Dermascopic Images Using features extracted by ResNet50 and classification through Deep Forest
Title | Disease Classification within Dermascopic Images Using features extracted by ResNet50 and classification through Deep Forest |
Authors | Suhita Ray |
Abstract | In this report we propose a classification technique for skin lesion images as a part of our submission for ISIC 2018 Challenge in Skin Lesion Analysis Towards Melanoma Detection. Our data was extracted from the ISIC 2018: Skin Lesion Analysis Towards Melanoma Detection grand challenge datasets. The features are extracted through a Convolutional Neural Network, in our case ResNet50 and then using these features we train a DeepForest, having cascading layers, to classify our skin lesion images. We know that Convolutional Neural Networks are a state-of-the-art technique in representation learning for images, with the convolutional filters learning to detect features from images through backpropagation. These features are then usually fed to a classifier like a softmax layer or other such classifiers for classification tasks. In our case we do not use the traditional backpropagation method and train a softmax layer for classification. Instead, we use Deep Forest, a novel decision tree ensemble approach with performance highly competitive to deep neural networks in a broad range of tasks. Thus we use a ResNet50 to extract the features from skin lesion images and then use the Deep Forest to classify these images. This method has been used because Deep Forest has been found to be hugely efficient in areas where there are only small-scale training data available. Also as the Deep Forest network decides its complexity by itself, it also caters to the problem of dataset imbalance we faced in this problem. |
Tasks | Representation Learning |
Published | 2018-07-16 |
URL | http://arxiv.org/abs/1807.05711v3 |
http://arxiv.org/pdf/1807.05711v3.pdf | |
PWC | https://paperswithcode.com/paper/disease-classification-within-dermascopic |
Repo | |
Framework | |