January 31, 2020

3069 words 15 mins read

Paper Group AWR 457

SinGAN: Learning a Generative Model from a Single Natural Image. Boundary Effect-Aware Visual Tracking for UAV with Online Enhanced Background Learning and Multi-Frame Consensus Verification. The Limited Multi-Label Projection Layer. Kaolin: A PyTorch Library for Accelerating 3D Deep Learning Research. Learning to Regress 3D Face Shape and Expressi …

SinGAN: Learning a Generative Model from a Single Natural Image

Title SinGAN: Learning a Generative Model from a Single Natural Image
Authors Tamar Rott Shaham, Tali Dekel, Tomer Michaeli
Abstract We introduce SinGAN, an unconditional generative model that can be learned from a single natural image. Our model is trained to capture the internal distribution of patches within the image, and is then able to generate high quality, diverse samples that carry the same visual content as the image. SinGAN contains a pyramid of fully convolutional GANs, each responsible for learning the patch distribution at a different scale of the image. This allows generating new samples of arbitrary size and aspect ratio, that have significant variability, yet maintain both the global structure and the fine textures of the training image. In contrast to previous single image GAN schemes, our approach is not limited to texture images, and is not conditional (i.e. it generates samples from noise). User studies confirm that the generated samples are commonly confused to be real images. We illustrate the utility of SinGAN in a wide range of image manipulation tasks.
Tasks Image Generation
Published 2019-05-02
URL https://arxiv.org/abs/1905.01164v2
PDF https://arxiv.org/pdf/1905.01164v2.pdf
PWC https://paperswithcode.com/paper/singan-learning-a-generative-model-from-a
Repo https://github.com/tamarott/SinGAN
Framework pytorch
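
As a rough illustration of the multi-scale idea in the SinGAN entry above, the sketch below builds a pyramid of small fully convolutional generators and samples coarse-to-fine, upsampling the previous scale's output and refining it with fresh noise at each scale. It is a minimal PyTorch sketch under assumed layer sizes and scale factors, not the official architecture or training code (see the linked repo for that).

```python
# Minimal sketch (not the official SinGAN code): a pyramid of small fully
# convolutional generators, each refining an upsampled version of the
# previous scale's output plus per-scale noise.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaleGenerator(nn.Module):
    """One fully convolutional generator in the pyramid (illustrative)."""
    def __init__(self, channels=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(channels, channels, 3, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(channels, 3, 3, padding=1),
        )

    def forward(self, prev, noise):
        # Residual refinement of the (upsampled) coarser-scale image.
        return prev + self.body(prev + noise)

def sample_pyramid(generators, base_size=(25, 25), scale_factor=4 / 3):
    """Coarse-to-fine sampling: pure noise at the coarsest scale, then
    repeatedly upsample the current sample and let the next generator refine it."""
    x = torch.zeros(1, 3, *base_size)
    for i, g in enumerate(generators):
        if i > 0:  # grow the spatial size before the next refinement stage
            new_size = [int(round(s * scale_factor)) for s in x.shape[-2:]]
            x = F.interpolate(x, size=new_size, mode='bilinear', align_corners=False)
        x = g(x, torch.randn_like(x))
    return x

# Example: an 8-scale pyramid; changing base_size changes the output aspect ratio.
gens = [ScaleGenerator() for _ in range(8)]
print(sample_pyramid(gens).shape)
```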

Boundary Effect-Aware Visual Tracking for UAV with Online Enhanced Background Learning and Multi-Frame Consensus Verification

Title Boundary Effect-Aware Visual Tracking for UAV with Online Enhanced Background Learning and Multi-Frame Consensus Verification
Authors Changhong Fu, Ziyuan Huang, Yiming Li, Ran Duan, Peng Lu
Abstract Due to the implicitly introduced periodic shifting of a limited search area, visual object tracking with correlation filters often has to confront an undesired boundary effect. Because the boundary effect severely degrades the quality of the object model, robust and accurate object following becomes a challenging task for unmanned aerial vehicles (UAVs). Traditional hand-crafted features are also not precise or robust enough to describe the object from the UAV's viewpoint. In this work, a novel tracker with online enhanced background learning is proposed specifically to tackle boundary effects. Real background samples are densely extracted to learn as well as update the correlation filters. Spatial penalization is introduced to offset the noise introduced by the much larger amount of background information, so that a more accurate appearance model can be established. Meanwhile, convolutional features are extracted to provide a more comprehensive representation of the object. To mitigate changes in the object's appearance, a multi-frame technique is applied to learn an ideal response map and to verify the generated one in each frame. Exhaustive experiments were conducted on 100 challenging UAV image sequences, and the proposed tracker achieved state-of-the-art performance.
Tasks Object Tracking, Visual Object Tracking, Visual Tracking
Published 2019-08-10
URL https://arxiv.org/abs/1908.03701v1
PDF https://arxiv.org/pdf/1908.03701v1.pdf
PWC https://paperswithcode.com/paper/boundary-effect-aware-visual-tracking-for-uav
Repo https://github.com/vision4robotics/BEVT-tracker
Framework none
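
The tracker above builds on discriminative correlation filters, whose boundary effect comes from the implicit periodic extension of the search window. The NumPy sketch below shows only the classical single-channel correlation-filter core (closed-form ridge regression in the Fourier domain and peak detection on a new frame); it is a toy under arbitrary parameters, and omits the paper's enhanced background learning, spatial penalization, convolutional features, and multi-frame verification.

```python
import numpy as np

def gaussian_label(shape, sigma=2.0):
    """Desired correlation response: a Gaussian peak at the patch centre."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((ys - h // 2) ** 2 + (xs - w // 2) ** 2) / (2 * sigma ** 2))

def train_filter(feature, label, lam=1e-2):
    """Single-channel correlation filter via ridge regression in the Fourier domain."""
    X, Y = np.fft.fft2(feature), np.fft.fft2(label)
    return (np.conj(X) * Y) / (X * np.conj(X) + lam)

def detect(filter_hat, search_feature):
    """Locate the peak of the correlation response over a new search region."""
    response = np.fft.ifft2(filter_hat * np.fft.fft2(search_feature)).real
    return np.unravel_index(response.argmax(), response.shape)

patch = np.random.rand(64, 64)                              # toy "feature" patch
f_hat = train_filter(patch, gaussian_label(patch.shape))
print(detect(f_hat, np.roll(patch, (3, 5), axis=(0, 1))))   # peak shifts with the target
```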

The Limited Multi-Label Projection Layer

Title The Limited Multi-Label Projection Layer
Authors Brandon Amos, Vladlen Koltun, J. Zico Kolter
Abstract We propose the Limited Multi-Label (LML) projection layer as a new primitive operation for end-to-end learning systems. The LML layer provides a probabilistic way of modeling multi-label predictions limited to having exactly k labels. We derive efficient forward and backward passes for this layer and show how the layer can be used to optimize the top-k recall for multi-label tasks with incomplete label information. We evaluate LML layers on top-k CIFAR-100 classification and scene graph generation. We show that LML layers add a negligible amount of computational overhead, strictly improve the model’s representational capacity, and improve accuracy. We also revisit the truncated top-k entropy method as a competitive baseline for top-k classification.
Tasks Graph Generation, Scene Graph Generation
Published 2019-06-20
URL https://arxiv.org/abs/1906.08707v3
PDF https://arxiv.org/pdf/1906.08707v3.pdf
PWC https://paperswithcode.com/paper/the-limited-multi-label-projection-layer
Repo https://github.com/locuslab/lml
Framework pytorch
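
To make the LML idea concrete, the forward-only sketch below maps a score vector to probabilities in [0, 1] that sum to exactly k, using a sigmoid(x + nu) form where the scalar nu is found by bisection. This is a simplified reading of the projection; the official repo implements it in a batched, differentiable way with the custom backward pass the paper derives, which is omitted here.

```python
import torch

def lml_project(x, k, iters=50):
    """Project scores x onto probabilities p in [0, 1]^n with sum(p) = k (forward only)."""
    lo = -x.max() - 20.0          # sigmoid(x + lo) sums to ~0
    hi = -x.min() + 20.0          # sigmoid(x + hi) sums to ~n >= k
    for _ in range(iters):
        nu = (lo + hi) / 2
        if torch.sigmoid(x + nu).sum() < k:
            lo = nu               # need a larger offset to reach sum k
        else:
            hi = nu
    return torch.sigmoid(x + (lo + hi) / 2)

scores = torch.tensor([2.0, 0.5, -1.0, 1.5, 0.0])
p = lml_project(scores, k=2)
print(p, p.sum())                 # probabilities summing to (approximately) 2
```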

Kaolin: A PyTorch Library for Accelerating 3D Deep Learning Research

Title Kaolin: A PyTorch Library for Accelerating 3D Deep Learning Research
Authors Krishna Murthy Jatavallabhula, Edward Smith, Jean-Francois Lafleche, Clement Fuji Tsang, Artem Rozantsev, Wenzheng Chen, Tommy Xiang, Rev Lebaredian, Sanja Fidler
Abstract We present Kaolin, a PyTorch library aiming to accelerate 3D deep learning research. Kaolin provides efficient implementations of differentiable 3D modules for use in deep learning systems. With functionality to load and preprocess several popular 3D datasets, and native functions to manipulate meshes, pointclouds, signed distance functions, and voxel grids, Kaolin mitigates the need to write wasteful boilerplate code. Kaolin packages together several differentiable graphics modules including rendering, lighting, shading, and view warping. Kaolin also supports an array of loss functions and evaluation metrics for seamless evaluation and provides visualization functionality to render the 3D results. Importantly, we curate a comprehensive model zoo comprising many state-of-the-art 3D deep learning architectures, to serve as a starting point for future research endeavours. Kaolin is available as open-source software at https://github.com/NVIDIAGameWorks/kaolin/.
Tasks
Published 2019-11-12
URL https://arxiv.org/abs/1911.05063v2
PDF https://arxiv.org/pdf/1911.05063v2.pdf
PWC https://paperswithcode.com/paper/kaolin-a-pytorch-library-for-accelerating-3d
Repo https://github.com/NVIDIAGameWorks/kaolin
Framework pytorch
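
Kaolin bundles many differentiable 3D primitives; as a flavour of the kind of operation it provides, here is a plain-PyTorch Chamfer distance between two point clouds. This is written against vanilla PyTorch only and deliberately does not use (or claim to reproduce) Kaolin's own function names.

```python
import torch

def chamfer_distance(p1, p2):
    """Symmetric Chamfer distance between point clouds p1: (N, 3) and p2: (M, 3)."""
    d = torch.cdist(p1, p2)                 # (N, M) pairwise Euclidean distances
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

a = torch.rand(1024, 3)
b = a + 0.01 * torch.randn(1024, 3)
print(chamfer_distance(a, b))               # small value for nearly identical clouds
```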

Learning to Regress 3D Face Shape and Expression from an Image without 3D Supervision

Title Learning to Regress 3D Face Shape and Expression from an Image without 3D Supervision
Authors Soubhik Sanyal, Timo Bolkart, Haiwen Feng, Michael J. Black
Abstract The estimation of 3D face shape from a single image must be robust to variations in lighting, head pose, expression, facial hair, makeup, and occlusions. Robustness requires a large training set of in-the-wild images, which by construction, lack ground truth 3D shape. To train a network without any 2D-to-3D supervision, we present RingNet, which learns to compute 3D face shape from a single image. Our key observation is that an individual's face shape is constant across images, regardless of expression, pose, lighting, etc. RingNet leverages multiple images of a person and automatically detected 2D face features. It uses a novel loss that encourages the face shape to be similar when the identity is the same and different for different people. We achieve invariance to expression by representing the face using the FLAME model. Once trained, our method takes a single image and outputs the parameters of FLAME, which can be readily animated. Additionally, we create a new database of faces "not quite in-the-wild" (NoW) with 3D head scans and high-resolution images of the subjects in a wide variety of conditions. We evaluate publicly available methods and find that RingNet is more accurate than methods that use 3D supervision. The dataset, model, and results are available for research purposes at http://ringnet.is.tuebingen.mpg.de.
Tasks 3D Face Reconstruction
Published 2019-05-16
URL https://arxiv.org/abs/1905.06817v1
PDF https://arxiv.org/pdf/1905.06817v1.pdf
PWC https://paperswithcode.com/paper/learning-to-regress-3d-face-shape-and
Repo https://github.com/soubhiksanyal/RingNet
Framework tf
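
The "ring" constraint above can be illustrated with a toy loss that pulls shape codes predicted from images of the same identity together while pushing them at least a margin away from another identity's code. The sketch assumes generic shape-code tensors (e.g. FLAME shape parameters) and is not the official RingNet loss.

```python
import torch
import torch.nn.functional as F

def ring_shape_loss(same_shapes, other_shape, margin=1.0):
    """same_shapes: (R, D) shape codes of one identity; other_shape: (D,) of another."""
    losses = []
    for i in range(len(same_shapes)):
        for j in range(len(same_shapes)):
            if i == j:
                continue
            pos = F.mse_loss(same_shapes[i], same_shapes[j])   # same identity: be similar
            neg = F.mse_loss(same_shapes[i], other_shape)      # different identity: be far
            losses.append(F.relu(margin + pos - neg))
    return torch.stack(losses).mean()

codes = torch.randn(4, 100, requires_grad=True)   # placeholder shape parameters
negative = torch.randn(100)
print(ring_shape_loss(codes, negative))
```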

SparseMask: Differentiable Connectivity Learning for Dense Image Prediction

Title SparseMask: Differentiable Connectivity Learning for Dense Image Prediction
Authors Huikai Wu, Junge Zhang, Kaiqi Huang
Abstract In this paper, we aim at automatically searching for an efficient network architecture for dense image prediction. In particular, we follow the encoder-decoder style and focus on designing a connectivity structure for the decoder. To achieve that, we design a densely connected network with learnable connections, named Fully Dense Network, which contains a large set of possible final connectivity structures. We then employ gradient descent to search for the optimal connectivity among the dense connections. The search process is guided by a novel loss function, which pushes the weight of each connection to be binary and the connections to be sparse. The discovered connectivity achieves competitive results on two segmentation datasets, while running more than three times faster and requiring less than half the parameters of state-of-the-art methods. Extensive experiments show that the discovered connectivity is compatible with various backbones and generalizes well to other dense image prediction tasks.
Tasks
Published 2019-04-16
URL https://arxiv.org/abs/1904.07642v2
PDF https://arxiv.org/pdf/1904.07642v2.pdf
PWC https://paperswithcode.com/paper/sparsemask-differentiable-connectivity
Repo https://github.com/wuhuikai/SparseMask
Framework pytorch
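
A minimal way to picture the connectivity search described above: give every candidate connection a learnable gate in (0, 1) and regularize the gates to be both near-binary and sparse, alongside the task loss. The sketch below shows just that regularizer; the layer wiring, backbone, and exact loss form in the paper are omitted.

```python
import torch
import torch.nn as nn

class GatedConnections(nn.Module):
    def __init__(self, num_connections):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(num_connections))

    def gates(self):
        return torch.sigmoid(self.logits)          # one gate per candidate connection

    def regularizer(self, sparsity_weight=0.1):
        g = self.gates()
        push_binary = (g * (1 - g)).mean()          # zero only when every gate is 0 or 1
        push_sparse = g.mean()                      # prefer fewer active connections
        return push_binary + sparsity_weight * push_sparse

conn = GatedConnections(num_connections=64)
loss = conn.regularizer()      # added to the segmentation loss during the search
loss.backward()
print(conn.gates()[:5])
```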

Hallucinating Optical Flow Features for Video Classification

Title Hallucinating Optical Flow Features for Video Classification
Authors Yongyi Tang, Lin Ma, Lianqiang Zhou
Abstract Appearance and motion are two key components used to depict and characterize video content. Currently, two-stream models achieve state-of-the-art performance on video classification. However, extracting motion information, specifically in the form of optical flow features, is extremely computationally expensive, especially for large-scale video classification. In this paper, we propose a motion hallucination network, namely MoNet, to imagine the optical flow features from the appearance features, with no reliance on optical flow computation. Specifically, MoNet models the temporal relationships of the appearance features and exploits the contextual relationships of the optical flow features with concurrent connections. Extensive experimental results demonstrate that the proposed MoNet can effectively and efficiently hallucinate the optical flow features, which together with the appearance features consistently improve video classification performance. Moreover, MoNet can cut the computational and data-storage burdens of two-stream video classification almost in half. Our code is available at: https://github.com/YongyiTang92/MoNet-Features.
Tasks Optical Flow Estimation, Video Classification
Published 2019-05-28
URL https://arxiv.org/abs/1905.11799v2
PDF https://arxiv.org/pdf/1905.11799v2.pdf
PWC https://paperswithcode.com/paper/hallucinating-optical-flow-features-for-video
Repo https://github.com/YongyiTang92/MoNet-Features
Framework tf
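
The training setup can be sketched as a small temporal network that maps per-frame appearance features to pseudo flow features, supervised by precomputed optical-flow features only during training. The GRU-based model and plain MSE loss below are placeholder choices, not the MoNet architecture or its losses.

```python
import torch
import torch.nn as nn

class FlowHallucinator(nn.Module):
    def __init__(self, dim=1024):
        super().__init__()
        self.temporal = nn.GRU(dim, dim, batch_first=True)   # models temporal relations
        self.head = nn.Linear(dim, dim)

    def forward(self, appearance_feats):          # (B, T, dim) per-frame CNN features
        h, _ = self.temporal(appearance_feats)
        return self.head(h)                        # hallucinated flow features

model = FlowHallucinator()
appearance = torch.randn(2, 16, 1024)              # appearance-stream features
flow_target = torch.randn(2, 16, 1024)             # real flow features (training only)
loss = nn.functional.mse_loss(model(appearance), flow_target)
loss.backward()
print(loss.item())
```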

Unsupervised Medical Image Segmentation with Adversarial Networks: From Edge Diagrams to Segmentation Maps

Title Unsupervised Medical Image Segmentation with Adversarial Networks: From Edge Diagrams to Segmentation Maps
Authors Umaseh Sivanesan, Luis H. Braga, Ranil R. Sonnadara, Kiret Dhindsa
Abstract We develop an approach to unsupervised semantic medical image segmentation that extends previous work with generative adversarial networks. We use existing edge detection methods to construct simple edge diagrams, train a generative model to convert them into synthetic medical images, and construct a dataset of synthetic images with known segmentations using variations on the extracted edge diagrams. This synthetic dataset is then used to train a supervised image segmentation model. We test our approach on a clinical dataset of kidney ultrasound images and the benchmark ISIC 2018 skin lesion dataset. We show that our unsupervised approach is more accurate than previous unsupervised methods, and performs reasonably compared to supervised image segmentation models. All code and trained models are available at https://github.com/kiretd/Unsupervised-MIseg.
Tasks Edge Detection, Medical Image Segmentation, Semantic Segmentation
Published 2019-11-12
URL https://arxiv.org/abs/1911.05140v1
PDF https://arxiv.org/pdf/1911.05140v1.pdf
PWC https://paperswithcode.com/paper/unsupervised-medical-image-segmentation-with
Repo https://github.com/kiretd/Unsupervised-MIseg
Framework tf
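
The first stage of the pipeline above, extracting simple edge diagrams from unlabeled images, might look like the OpenCV sketch below. Canny is used here as a stand-in for the "existing edge detection methods"; the file name and thresholds are hypothetical, and the later GAN and segmentation stages are not shown.

```python
import cv2

def edge_diagram(image_path, low=50, high=150):
    """Turn a grayscale image into a simple edge map (one input to the GAN stage)."""
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    if img is None:
        raise FileNotFoundError(image_path)
    img = cv2.GaussianBlur(img, (5, 5), 0)    # suppress speckle before edge detection
    return cv2.Canny(img, low, high)

# The resulting edge maps (and perturbed variants of them) are what the
# generative model later converts into synthetic images with known masks.
edges = edge_diagram("kidney_ultrasound.png")  # hypothetical file name
cv2.imwrite("edge_diagram.png", edges)
```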

Spatio-Temporal Alignments: Optimal transport through space and time

Title Spatio-Temporal Alignments: Optimal transport through space and time
Authors Hicham Janati, Marco Cuturi, Alexandre Gramfort
Abstract Comparing data defined over space and time is notoriously hard, because it involves quantifying both spatial and temporal variability, while at the same time taking into account the chronological structure of the data. Dynamic Time Warping (DTW) computes an optimal alignment between time series in agreement with the chronological order, but is inherently blind to spatial shifts. In this paper, we propose Spatio-Temporal Alignments (STA), a new differentiable formulation of DTW, in which spatial differences between time samples are accounted for using regularized optimal transport (OT). Our temporal alignments are handled through a smooth variant of DTW called soft-DTW, for which we prove a new property: soft-DTW increases quadratically with time shifts. The cost matrices within soft-DTW are computed using unbalanced OT, to handle the case in which observations are not normalized probabilities. Experiments on handwritten letters and brain imaging data confirm our theoretical findings and illustrate the effectiveness of STA as a dissimilarity measure for spatio-temporal data.
Tasks Time Series
Published 2019-10-09
URL https://arxiv.org/abs/1910.03860v3
PDF https://arxiv.org/pdf/1910.03860v3.pdf
PWC https://paperswithcode.com/paper/spatio-temporal-alignments-optimal-transport
Repo https://github.com/hichamjanati/spatio-temporal-alignements
Framework pytorch
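
The temporal half of STA is soft-DTW, i.e. the classic DTW recursion with the hard minimum replaced by a smoothed soft-minimum. The sketch below implements that recursion over a precomputed cost matrix; in the paper the costs come from unbalanced OT between spatial distributions, whereas this toy example simply uses squared differences between scalar samples.

```python
import numpy as np

def softmin(a, b, c, gamma):
    """Smoothed minimum of three values (log-sum-exp with temperature gamma)."""
    vals = np.array([a, b, c]) / -gamma
    m = vals.max()
    return -gamma * (m + np.log(np.exp(vals - m).sum()))

def soft_dtw(C, gamma=1.0):
    """Soft-DTW value for a cost matrix C of shape (n, m)."""
    n, m = C.shape
    R = np.full((n + 1, m + 1), np.inf)
    R[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            R[i, j] = C[i - 1, j - 1] + softmin(R[i - 1, j], R[i, j - 1], R[i - 1, j - 1], gamma)
    return R[n, m]

# Toy cost matrix between two scalar time series of lengths 5 and 7.
x, y = np.random.rand(5, 1), np.random.rand(7, 1)
C = (x - y.T) ** 2
print(soft_dtw(C, gamma=0.1))
```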

Unsupervised Community Detection with Modularity-Based Attention Model

Title Unsupervised Community Detection with Modularity-Based Attention Model
Authors Ivan Lobov, Sergey Ivanov
Abstract In this paper we take the problem of unsupervised node clustering on graphs and show how recent advances in attention models can be applied successfully in a "hard" regime of the problem. We propose an unsupervised algorithm that encodes Bethe Hessian embeddings by optimizing a soft modularity loss, and argue that our model is competitive with both classical and Graph Neural Network (GNN) models while it can be trained on a single graph.
Tasks Community Detection
Published 2019-05-20
URL https://arxiv.org/abs/1905.10350v1
PDF https://arxiv.org/pdf/1905.10350v1.pdf
PWC https://paperswithcode.com/paper/190510350
Repo https://github.com/Ivanopolo/modnet
Framework tf
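
The soft modularity loss mentioned above can be written directly from the standard modularity definition: with soft assignments C and modularity matrix B = A - d d^T / (2m), maximize Q = Tr(C^T B C) / (2m). The sketch below computes -Q for a toy graph; the paper's attention model over Bethe Hessian embeddings, which would produce the assignment logits, is not shown.

```python
import torch

def soft_modularity_loss(logits, adj):
    C = torch.softmax(logits, dim=1)              # (N, K) soft community assignments
    d = adj.sum(dim=1, keepdim=True)              # node degrees
    two_m = adj.sum()                             # 2m = total edge weight (both directions)
    B = adj - d @ d.t() / two_m                   # modularity matrix
    Q = torch.trace(C.t() @ B @ C) / two_m
    return -Q                                     # minimize the negative modularity

# Toy graph: two cliques of three nodes joined by a single edge.
A = torch.zeros(6, 6)
A[:3, :3] = 1
A[3:, 3:] = 1
A.fill_diagonal_(0)
A[2, 3] = A[3, 2] = 1
logits = torch.randn(6, 2, requires_grad=True)
loss = soft_modularity_loss(logits, A)
loss.backward()
print(loss.item())
```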

Integrated Multi-omics Analysis Using Variational Autoencoders: Application to Pan-cancer Classification

Title Integrated Multi-omics Analysis Using Variational Autoencoders: Application to Pan-cancer Classification
Authors Xiaoyu Zhang, Jingqing Zhang, Kai Sun, Xian Yang, Chengliang Dai, Yike Guo
Abstract Different aspects of a clinical sample can be revealed by multiple types of omics data. Integrated analysis of multi-omics data provides a comprehensive view of patients, which has the potential to facilitate more accurate clinical decision making. However, omics data are normally high dimensional, with a large number of molecular features and a relatively small number of available samples with clinical labels. This "curse of dimensionality" makes it challenging to train a machine learning model using high dimensional omics data such as DNA methylation and gene expression profiles. Here we propose an end-to-end deep learning model called OmiVAE to extract low dimensional features and classify samples from multi-omics data. OmiVAE combines the basic structure of variational autoencoders with a classification network to achieve task-oriented feature extraction and multi-class classification. The training procedure of OmiVAE comprises an unsupervised phase without the classifier and a supervised phase with the classifier. During the unsupervised phase, a hierarchical cluster structure of samples can be automatically formed without the need for labels. In the supervised phase, OmiVAE achieved an average classification accuracy of 97.49% after 10-fold cross-validation among 33 tumour types and normal samples, which is better performance than other existing methods. The OmiVAE model learned from multi-omics data outperformed the one using only a single type of omics data, which indicates that the complementary information from different omics datatypes provides useful insights for biomedical tasks such as cancer classification.
Tasks Decision Making
Published 2019-08-17
URL https://arxiv.org/abs/1908.06278v1
PDF https://arxiv.org/pdf/1908.06278v1.pdf
PWC https://paperswithcode.com/paper/integrated-multi-omics-analysis-using
Repo https://github.com/zhangxiaoyu11/OmiVAE
Framework pytorch
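
A compact sketch of the VAE-plus-classifier structure described above: an encoder/decoder over a high-dimensional omics vector with a classification head on the latent mean, trained first without and then with the classification term. Layer sizes, dimensions, and the 34-class head are placeholders, not the paper's configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OmicsVAE(nn.Module):
    def __init__(self, in_dim=10000, latent=128, n_classes=34):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(in_dim, 512), nn.ReLU())
        self.mu = nn.Linear(512, latent)
        self.logvar = nn.Linear(512, latent)
        self.dec = nn.Sequential(nn.Linear(latent, 512), nn.ReLU(), nn.Linear(512, in_dim))
        self.classifier = nn.Linear(latent, n_classes)

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization
        return self.dec(z), mu, logvar, self.classifier(mu)

def vae_loss(x, recon, mu, logvar):
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return F.mse_loss(recon, x) + kl

# Phase 1 (unsupervised): minimize vae_loss only.
# Phase 2 (supervised): add F.cross_entropy(logits, labels) to the objective.
model = OmicsVAE()
x = torch.randn(8, 10000)
recon, mu, logvar, logits = model(x)
print(vae_loss(x, recon, mu, logvar).item(), logits.shape)
```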

ThirdEye: Triplet Based Iris Recognition without Normalization

Title ThirdEye: Triplet Based Iris Recognition without Normalization
Authors Sohaib Ahmad, Benjamin Fuller
Abstract Most iris recognition pipelines involve three stages: segmenting the image into iris/non-iris pixels, normalizing the iris region to a fixed area, and extracting relevant features for comparison. Given recent advances in deep learning, it is prudent to ask which stages are required for accurate iris recognition. Lojez et al. (IWBF 2019) recently concluded that the segmentation stage is still crucial for good accuracy. We ask whether normalization is beneficial. Towards answering this question, we develop a new iris recognition system called ThirdEye based on triplet convolutional neural networks (Schroff et al., ICCV 2015). ThirdEye directly uses segmented images without normalization. We observe equal error rates of 1.32%, 9.20%, and 0.59% on the ND-0405, UbirisV2, and IITD datasets respectively. For IITD, the most constrained dataset, this improves on the best prior work. However, for ND-0405 and UbirisV2, our equal error rate is slightly worse than prior systems. Our concluding hypothesis is that normalization is more important for less constrained environments.
Tasks Iris Recognition
Published 2019-07-13
URL https://arxiv.org/abs/1907.06147v1
PDF https://arxiv.org/pdf/1907.06147v1.pdf
PWC https://paperswithcode.com/paper/thirdeye-triplet-based-iris-recognition
Repo https://github.com/sohaib50k/ThirdEye---Iris-recognition-using-triplets
Framework tf
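
The core training objective of a triplet-based recognizer like ThirdEye is a standard triplet margin loss over embeddings of segmented (but unnormalized) iris images. The sketch below uses PyTorch's built-in TripletMarginLoss with a placeholder embedding CNN; the actual ThirdEye backbone, input resolution, and margin differ.

```python
import torch
import torch.nn as nn

embedder = nn.Sequential(                          # placeholder embedding network
    nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 128),
)
triplet = nn.TripletMarginLoss(margin=0.2)

# anchor/positive: two segmented images of the same iris; negative: a different iris.
anchor, positive, negative = (torch.randn(4, 1, 224, 224) for _ in range(3))
loss = triplet(embedder(anchor), embedder(positive), embedder(negative))
loss.backward()
print(loss.item())
```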

Importance Estimation for Neural Network Pruning

Title Importance Estimation for Neural Network Pruning
Authors Pavlo Molchanov, Arun Mallya, Stephen Tyree, Iuri Frosio, Jan Kautz
Abstract Structural pruning of neural network parameters reduces computation, energy, and memory transfer costs during inference. We propose a novel method that estimates the contribution of a neuron (filter) to the final loss and iteratively removes those with smaller scores. We describe two variations of our method using the first and second-order Taylor expansions to approximate a filter’s contribution. Both methods scale consistently across any network layer without requiring per-layer sensitivity analysis and can be applied to any kind of layer, including skip connections. For modern networks trained on ImageNet, we measured experimentally a high (>93%) correlation between the contribution computed by our methods and a reliable estimate of the true importance. Pruning with the proposed methods leads to an improvement over state-of-the-art in terms of accuracy, FLOPs, and parameter reduction. On ResNet-101, we achieve a 40% FLOPS reduction by removing 30% of the parameters, with a loss of 0.02% in the top-1 accuracy on ImageNet. Code is available at https://github.com/NVlabs/Taylor_pruning.
Tasks Network Pruning
Published 2019-06-25
URL https://arxiv.org/abs/1906.10771v1
PDF https://arxiv.org/pdf/1906.10771v1.pdf
PWC https://paperswithcode.com/paper/importance-estimation-for-neural-network-1
Repo https://github.com/NVlabs/Taylor_pruning
Framework pytorch
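
The first-order criterion described above approximates the loss change from removing a filter by the squared sum of gradient times weight over that filter's parameters. The sketch below computes such per-filter scores for a single convolution after a backward pass; it is illustrative, not the exact NVlabs implementation (which also covers the second-order variant, gating layers, and iterative removal).

```python
import torch
import torch.nn as nn

def filter_importance(conv: nn.Conv2d):
    """Per-output-filter importance, assuming loss.backward() has populated grads."""
    gw = (conv.weight.grad * conv.weight).flatten(1)   # (out_channels, params_per_filter)
    return gw.sum(dim=1).pow(2)                        # squared first-order Taylor term

conv = nn.Conv2d(3, 8, 3, padding=1)
x = torch.randn(2, 3, 32, 32)
loss = conv(x).pow(2).mean()                           # stand-in for a task loss
loss.backward()
scores = filter_importance(conv)
print(scores)                                          # prune filters with the smallest scores
```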

GRATIS: GeneRAting TIme Series with diverse and controllable characteristics

Title GRATIS: GeneRAting TIme Series with diverse and controllable characteristics
Authors Yanfei Kang, Rob J Hyndman, Feng Li
Abstract The explosion of time series data in recent years has brought a flourish of new time series analysis methods, for forecasting, clustering, classification and other tasks. The evaluation of these new methods requires either collecting or simulating a diverse set of time series benchmarking data to enable reliable comparisons against alternative approaches. We propose GeneRAting TIme Series with diverse and controllable characteristics, named GRATIS, with the use of mixture autoregressive (MAR) models. We simulate sets of time series using MAR models and investigate the diversity and coverage of the generated time series in a time series feature space. By tuning the parameters of the MAR models, GRATIS is also able to efficiently generate new time series with controllable features. In general, as a costless surrogate to the traditional data collection approach, GRATIS can be used as an evaluation tool for tasks such as time series forecasting and classification. We illustrate the usefulness of our time series generation process through a time series forecasting application.
Tasks Time Series, Time Series Analysis, Time Series Forecasting
Published 2019-03-07
URL https://arxiv.org/abs/1903.02787v2
PDF https://arxiv.org/pdf/1903.02787v2.pdf
PWC https://paperswithcode.com/paper/gratis-generating-time-series-with-diverse
Repo https://github.com/xqnwang/fuma
Framework none
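
A mixture autoregressive model generates each new value from one of several AR components chosen at random, which is what lets GRATIS span diverse dynamics by varying the mixture parameters. The Python sketch below simulates a toy two-component MAR(1) with arbitrary parameters; the reference implementation provides a much richer parameterization and tuning of features.

```python
import numpy as np

def simulate_mar(n, weights, phis, sigmas, seed=0):
    """Simulate n steps from a mixture of AR(1) components."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n)
    for t in range(1, n):
        k = rng.choice(len(weights), p=weights)          # pick a mixture component
        x[t] = phis[k] * x[t - 1] + rng.normal(0, sigmas[k])
    return x

series = simulate_mar(200, weights=[0.7, 0.3], phis=[0.9, -0.5], sigmas=[1.0, 2.0])
print(series[:10])
```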

Pluralistic Image Completion

Title Pluralistic Image Completion
Authors Chuanxia Zheng, Tat-Jen Cham, Jianfei Cai
Abstract Most image completion methods produce only one result for each masked input, although there may be many reasonable possibilities. In this paper, we present an approach for pluralistic image completion – the task of generating multiple and diverse plausible solutions for image completion. A major challenge faced by learning-based approaches is that there is usually only one ground truth training instance per label. As such, sampling from conditional VAEs still leads to minimal diversity. To overcome this, we propose a novel and probabilistically principled framework with two parallel paths. One is a reconstructive path that utilizes the single given ground truth to obtain a prior distribution over the missing parts and to rebuild the original image from this distribution. The other is a generative path, for which the conditional prior is coupled to the distribution obtained in the reconstructive path. Both are supported by GANs. We also introduce a new short+long term attention layer that exploits distant relations among decoder and encoder features, improving appearance consistency. When tested on datasets with buildings (Paris), faces (CelebA-HQ), and natural images (ImageNet), our method not only generates higher-quality completion results, but also produces multiple diverse plausible outputs.
Tasks Image Inpainting
Published 2019-03-11
URL http://arxiv.org/abs/1903.04227v2
PDF http://arxiv.org/pdf/1903.04227v2.pdf
PWC https://paperswithcode.com/paper/pluralistic-image-completion
Repo https://github.com/lyndonzheng/Pluralistic-Inpainting
Framework pytorch
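
The short+long term attention layer mentioned above relates decoder features to encoder features across spatial positions. As a rough, generic stand-in (not the paper's exact layer), the sketch below implements cross-attention between two equally sized feature maps with a residual connection.

```python
import torch
import torch.nn as nn

class CrossFeatureAttention(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, 1)
        self.key = nn.Conv2d(channels, channels // 8, 1)
        self.value = nn.Conv2d(channels, channels, 1)

    def forward(self, decoder_feat, encoder_feat):
        b, c, h, w = decoder_feat.shape
        q = self.query(decoder_feat).flatten(2).transpose(1, 2)   # (B, HW, C')
        k = self.key(encoder_feat).flatten(2)                     # (B, C', HW)
        v = self.value(encoder_feat).flatten(2).transpose(1, 2)   # (B, HW, C)
        attn = torch.softmax(q @ k, dim=-1)                       # attention over positions
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return decoder_feat + out                                 # residual connection

layer = CrossFeatureAttention(64)
dec, enc = torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32)
print(layer(dec, enc).shape)
```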