Paper Group AWR 457
SinGAN: Learning a Generative Model from a Single Natural Image. Boundary Effect-Aware Visual Tracking for UAV with Online Enhanced Background Learning and Multi-Frame Consensus Verification. The Limited Multi-Label Projection Layer. Kaolin: A PyTorch Library for Accelerating 3D Deep Learning Research. Learning to Regress 3D Face Shape and Expressi …
SinGAN: Learning a Generative Model from a Single Natural Image
Title | SinGAN: Learning a Generative Model from a Single Natural Image |
Authors | Tamar Rott Shaham, Tali Dekel, Tomer Michaeli |
Abstract | We introduce SinGAN, an unconditional generative model that can be learned from a single natural image. Our model is trained to capture the internal distribution of patches within the image, and is then able to generate high-quality, diverse samples that carry the same visual content as the image. SinGAN contains a pyramid of fully convolutional GANs, each responsible for learning the patch distribution at a different scale of the image. This allows generating new samples of arbitrary size and aspect ratio that have significant variability, yet maintain both the global structure and the fine textures of the training image. In contrast to previous single-image GAN schemes, our approach is not limited to texture images and is not conditional (i.e., it generates samples from noise). User studies confirm that the generated samples are commonly confused with real images. We illustrate the utility of SinGAN in a wide range of image manipulation tasks. |
Tasks | Image Generation |
Published | 2019-05-02 |
URL | https://arxiv.org/abs/1905.01164v2 |
https://arxiv.org/pdf/1905.01164v2.pdf | |
PWC | https://paperswithcode.com/paper/singan-learning-a-generative-model-from-a |
Repo | https://github.com/tamarott/SinGAN |
Framework | pytorch |
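A minimal PyTorch sketch of the kind of building block SinGAN stacks into its pyramid: a fully convolutional generator for one scale that takes noise plus the upsampled coarser output and predicts a residual, paired with a patch discriminator. Layer widths and the omitted training loop are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # Conv-BatchNorm-LeakyReLU block, the basic unit used at every scale.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.LeakyReLU(0.2),
    )

class SingleScaleGenerator(nn.Module):
    """Fully convolutional generator for one pyramid scale.

    It receives noise plus the upsampled output of the coarser scale and
    predicts a residual image, so it works at any spatial resolution.
    """
    def __init__(self, ch=32):
        super().__init__()
        self.body = nn.Sequential(
            conv_block(3, ch), conv_block(ch, ch), conv_block(ch, ch),
            nn.Conv2d(ch, 3, kernel_size=3, padding=1), nn.Tanh(),
        )

    def forward(self, noise, prev_upsampled):
        return prev_upsampled + self.body(noise + prev_upsampled)

class PatchDiscriminator(nn.Module):
    """Markovian (patch) discriminator: one real/fake score per patch."""
    def __init__(self, ch=32):
        super().__init__()
        self.body = nn.Sequential(
            conv_block(3, ch), conv_block(ch, ch), conv_block(ch, ch),
            nn.Conv2d(ch, 1, kernel_size=3, padding=1),
        )

    def forward(self, x):
        return self.body(x)

# Example: generate a sample at an arbitrary resolution.
g = SingleScaleGenerator()
noise = torch.randn(1, 3, 100, 160)
coarse = torch.zeros(1, 3, 100, 160)  # the coarsest scale starts from zeros
sample = g(noise, coarse)
```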
Boundary Effect-Aware Visual Tracking for UAV with Online Enhanced Background Learning and Multi-Frame Consensus Verification
Title | Boundary Effect-Aware Visual Tracking for UAV with Online Enhanced Background Learning and Multi-Frame Consensus Verification |
Authors | Changhong Fu, Ziyuan Huang, Yiming Li, Ran Duan, Peng Lu |
Abstract | Due to the implicitly introduced periodic shifting of a limited search area, visual object tracking with correlation filters often has to confront an undesired boundary effect. Because the boundary effect severely degrades the quality of the object model, robust and accurate object following becomes a challenging task for unmanned aerial vehicles (UAVs). Traditional hand-crafted features are also not precise and robust enough to describe the object from the viewpoint of a UAV. In this work, a novel tracker with online enhanced background learning is specifically proposed to tackle boundary effects. Real background samples are densely extracted to learn as well as update the correlation filters. Spatial penalization is introduced to offset the noise introduced by the considerably larger amount of background information so that a more accurate appearance model can be established. Meanwhile, convolutional features are extracted to provide a more comprehensive representation of the object. In order to mitigate changes in the object's appearance, a multi-frame technique is applied to learn an ideal response map and verify the generated one in each frame. Exhaustive experiments were conducted on 100 challenging UAV image sequences, and the proposed tracker achieved state-of-the-art performance. |
Tasks | Object Tracking, Visual Object Tracking, Visual Tracking |
Published | 2019-08-10 |
URL | https://arxiv.org/abs/1908.03701v1 |
https://arxiv.org/pdf/1908.03701v1.pdf | |
PWC | https://paperswithcode.com/paper/boundary-effect-aware-visual-tracking-for-uav |
Repo | https://github.com/vision4robotics/BEVT-tracker |
Framework | none |
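The tracker above builds on correlation filters. As a hedged illustration of that underlying mechanism only (not the paper's enhanced background learning or multi-frame verification), the NumPy sketch below trains a single-channel MOSSE-style filter in the Fourier domain and localizes the peak of its response; the regularization constant and Gaussian label width are illustrative assumptions.

```python
import numpy as np

def train_correlation_filter(patch, target_response, lam=1e-3):
    """Closed-form single-channel correlation filter in the Fourier domain.

    patch:           grayscale training patch centred on the object
    target_response: desired (e.g. Gaussian-shaped) response map
    lam:             regularization constant (illustrative value)
    """
    X = np.fft.fft2(patch)
    Y = np.fft.fft2(target_response)
    return (np.conj(X) * Y) / (np.conj(X) * X + lam)

def detect(filter_fft, search_patch):
    """Correlate the learned filter with a new search patch, return peak location."""
    Z = np.fft.fft2(search_patch)
    response = np.real(np.fft.ifft2(filter_fft * Z))
    return np.unravel_index(np.argmax(response), response.shape)

# Toy usage: a Gaussian label peaked at the patch centre.
h, w = 64, 64
yy, xx = np.mgrid[0:h, 0:w]
label = np.exp(-((yy - h // 2) ** 2 + (xx - w // 2) ** 2) / (2 * 3.0 ** 2))
patch = np.random.rand(h, w)
f = train_correlation_filter(patch, label)
print(detect(f, patch))  # should peak near the centre (32, 32)
```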
The Limited Multi-Label Projection Layer
Title | The Limited Multi-Label Projection Layer |
Authors | Brandon Amos, Vladlen Koltun, J. Zico Kolter |
Abstract | We propose the Limited Multi-Label (LML) projection layer as a new primitive operation for end-to-end learning systems. The LML layer provides a probabilistic way of modeling multi-label predictions limited to having exactly k labels. We derive efficient forward and backward passes for this layer and show how the layer can be used to optimize the top-k recall for multi-label tasks with incomplete label information. We evaluate LML layers on top-k CIFAR-100 classification and scene graph generation. We show that LML layers add a negligible amount of computational overhead, strictly improve the model’s representational capacity, and improve accuracy. We also revisit the truncated top-k entropy method as a competitive baseline for top-k classification. |
Tasks | Graph Generation, Scene Graph Generation |
Published | 2019-06-20 |
URL | https://arxiv.org/abs/1906.08707v3 |
https://arxiv.org/pdf/1906.08707v3.pdf | |
PWC | https://paperswithcode.com/paper/the-limited-multi-label-projection-layer |
Repo | https://github.com/locuslab/lml |
Framework | pytorch |
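A hedged sketch of the core computation the LML layer performs: projecting scores onto probabilities that sum to exactly k. As derived in the paper, the solution takes the form of a sigmoid with a shared offset ν chosen so the entries sum to k; the bisection below is a forward pass only (the official repo also provides the custom backward pass), and the iteration count is an illustrative choice.

```python
import torch

def lml_project(x, k, iters=100):
    """Project scores x onto {y : 0 < y < 1, sum(y) = k} (forward pass only).

    The optimum has the form y_i = sigmoid(x_i + nu); we find the scalar
    offset nu by bisection on the monotone function sum(sigmoid(x + nu)) - k.
    """
    lo = -x.max() - 10.0  # bracket chosen so the sum is < k at lo and > k at hi
    hi = -x.min() + 10.0
    for _ in range(iters):
        nu = (lo + hi) / 2
        s = torch.sigmoid(x + nu).sum()
        lo, hi = (nu, hi) if s < k else (lo, nu)
    return torch.sigmoid(x + (lo + hi) / 2)

scores = torch.randn(10)
p = lml_project(scores, k=3)
print(p.sum())  # ~3.0, with every entry strictly between 0 and 1
```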
Kaolin: A PyTorch Library for Accelerating 3D Deep Learning Research
Title | Kaolin: A PyTorch Library for Accelerating 3D Deep Learning Research |
Authors | Krishna Murthy Jatavallabhula, Edward Smith, Jean-Francois Lafleche, Clement Fuji Tsang, Artem Rozantsev, Wenzheng Chen, Tommy Xiang, Rev Lebaredian, Sanja Fidler |
Abstract | We present Kaolin, a PyTorch library aiming to accelerate 3D deep learning research. Kaolin provides efficient implementations of differentiable 3D modules for use in deep learning systems. With functionality to load and preprocess several popular 3D datasets, and native functions to manipulate meshes, pointclouds, signed distance functions, and voxel grids, Kaolin mitigates the need to write wasteful boilerplate code. Kaolin packages together several differentiable graphics modules including rendering, lighting, shading, and view warping. Kaolin also supports an array of loss functions and evaluation metrics for seamless evaluation and provides visualization functionality to render the 3D results. Importantly, we curate a comprehensive model zoo comprising many state-of-the-art 3D deep learning architectures, to serve as a starting point for future research endeavours. Kaolin is available as open-source software at https://github.com/NVIDIAGameWorks/kaolin/. |
Tasks | |
Published | 2019-11-12 |
URL | https://arxiv.org/abs/1911.05063v2 |
https://arxiv.org/pdf/1911.05063v2.pdf | |
PWC | https://paperswithcode.com/paper/kaolin-a-pytorch-library-for-accelerating-3d |
Repo | https://github.com/NVIDIAGameWorks/kaolin |
Framework | pytorch |
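A hedged usage sketch, assuming the API of a recent Kaolin release (`kaolin.io.obj.import_mesh` and `kaolin.ops.conversions.trianglemeshes_to_voxelgrids`); the 2019 release described in the paper exposed different module paths, and the file path below is a placeholder.

```python
import kaolin

# Load a triangle mesh from an .obj file (the path is a placeholder).
mesh = kaolin.io.obj.import_mesh("model.obj")
vertices = mesh.vertices.unsqueeze(0)   # (1, V, 3) float tensor
faces = mesh.faces                      # (F, 3) long tensor

# Convert the mesh into a 32^3 occupancy voxel grid.
voxels = kaolin.ops.conversions.trianglemeshes_to_voxelgrids(
    vertices, faces, resolution=32
)
print(vertices.shape, faces.shape, voxels.shape)
```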
Learning to Regress 3D Face Shape and Expression from an Image without 3D Supervision
Title | Learning to Regress 3D Face Shape and Expression from an Image without 3D Supervision |
Authors | Soubhik Sanyal, Timo Bolkart, Haiwen Feng, Michael J. Black |
Abstract | The estimation of 3D face shape from a single image must be robust to variations in lighting, head pose, expression, facial hair, makeup, and occlusions. Robustness requires a large training set of in-the-wild images, which, by construction, lack ground truth 3D shape. To train a network without any 2D-to-3D supervision, we present RingNet, which learns to compute 3D face shape from a single image. Our key observation is that an individual's face shape is constant across images, regardless of expression, pose, lighting, etc. RingNet leverages multiple images of a person and automatically detected 2D face features. It uses a novel loss that encourages the face shape to be similar when the identity is the same and different for different people. We achieve invariance to expression by representing the face using the FLAME model. Once trained, our method takes a single image and outputs the parameters of FLAME, which can be readily animated. Additionally, we create a new database of faces 'not quite in-the-wild' (NoW) with 3D head scans and high-resolution images of the subjects in a wide variety of conditions. We evaluate publicly available methods and find that RingNet is more accurate than methods that use 3D supervision. The dataset, model, and results are available for research purposes at http://ringnet.is.tuebingen.mpg.de. |
Tasks | 3D Face Reconstruction |
Published | 2019-05-16 |
URL | https://arxiv.org/abs/1905.06817v1 |
https://arxiv.org/pdf/1905.06817v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-regress-3d-face-shape-and |
Repo | https://github.com/soubhiksanyal/RingNet |
Framework | tf |
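A hedged PyTorch sketch of the kind of shape-consistency objective the abstract describes: pull together shape codes predicted from different images of the same subject and push codes from a different subject apart by a margin. The margin value and code dimensionality are illustrative assumptions, not RingNet's exact loss.

```python
import torch
import torch.nn.functional as F

def ring_shape_loss(same_a, same_b, other, margin=0.5):
    """Encourage equal shape codes for the same identity, different otherwise.

    same_a, same_b: shape codes predicted from two images of one subject
    other:          shape code predicted from an image of another subject
    """
    pos = F.mse_loss(same_a, same_b)                  # same identity: close
    neg = F.relu(margin - F.mse_loss(same_a, other))  # other identity: far
    return pos + neg

a, b = torch.randn(4, 100), torch.randn(4, 100)  # e.g. 100 FLAME shape parameters
c = torch.randn(4, 100)
print(ring_shape_loss(a, b, c))
```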
SparseMask: Differentiable Connectivity Learning for Dense Image Prediction
Title | SparseMask: Differentiable Connectivity Learning for Dense Image Prediction |
Authors | Huikai Wu, Junge Zhang, Kaiqi Huang |
Abstract | In this paper, we aim at automatically searching for an efficient network architecture for dense image prediction. In particular, we follow the encoder-decoder style and focus on designing a connectivity structure for the decoder. To achieve that, we design a densely connected network with learnable connections, named Fully Dense Network, which contains a large set of possible final connectivity structures. We then employ gradient descent to search for the optimal connectivity among the dense connections. The search process is guided by a novel loss function, which pushes the weight of each connection to be binary and the connections to be sparse. The discovered connectivity achieves competitive results on two segmentation datasets, while running more than three times faster and requiring less than half the parameters compared to state-of-the-art methods. Extensive experiments show that the discovered connectivity is compatible with various backbones and generalizes well to other dense image prediction tasks. |
Tasks | |
Published | 2019-04-16 |
URL | https://arxiv.org/abs/1904.07642v2 |
https://arxiv.org/pdf/1904.07642v2.pdf | |
PWC | https://paperswithcode.com/paper/sparsemask-differentiable-connectivity |
Repo | https://github.com/wuhuikai/SparseMask |
Framework | pytorch |
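A hedged sketch of a regularizer in the spirit described above: gate each candidate decoder connection with a sigmoid, penalize gates that are neither close to 0 nor 1 (pushing them binary), and penalize their overall mass (pushing them sparse). The exact loss used in SparseMask may differ; the weighting below is an illustrative assumption.

```python
import torch

def connectivity_regularizer(logits, sparsity_weight=0.1):
    """Push connection gates toward binary values and toward sparsity.

    logits: learnable connection scores, one per candidate decoder connection
    """
    gates = torch.sigmoid(logits)
    binary_term = (gates * (1.0 - gates)).mean()  # 0 when gates are exactly 0 or 1
    sparsity_term = gates.mean()                  # small when few connections stay on
    return binary_term + sparsity_weight * sparsity_term

logits = torch.zeros(64, requires_grad=True)
loss = connectivity_regularizer(logits)
loss.backward()
print(loss.item(), logits.grad.abs().max().item())
```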
Hallucinating Optical Flow Features for Video Classification
Title | Hallucinating Optical Flow Features for Video Classification |
Authors | Yongyi Tang, Lin Ma, Lianqiang Zhou |
Abstract | Appearance and motion are two key components for depicting and characterizing video content. Currently, two-stream models have achieved state-of-the-art performance on video classification. However, extracting motion information, specifically in the form of optical flow features, is extremely computationally expensive, especially for large-scale video classification. In this paper, we propose a motion hallucination network, namely MoNet, to imagine the optical flow features from the appearance features, with no reliance on optical flow computation. Specifically, MoNet models the temporal relationships of the appearance features and exploits the contextual relationships of the optical flow features with concurrent connections. Extensive experimental results demonstrate that the proposed MoNet can effectively and efficiently hallucinate the optical flow features, which together with the appearance features consistently improve video classification performance. Moreover, MoNet can cut down almost half of the computational and data-storage burden of two-stream video classification. Our code is available at: https://github.com/YongyiTang92/MoNet-Features. |
Tasks | Optical Flow Estimation, Video Classification |
Published | 2019-05-28 |
URL | https://arxiv.org/abs/1905.11799v2 |
https://arxiv.org/pdf/1905.11799v2.pdf | |
PWC | https://paperswithcode.com/paper/hallucinating-optical-flow-features-for-video |
Repo | https://github.com/YongyiTang92/MoNet-Features |
Framework | tf |
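A hedged sketch of the hallucination idea above: a small network maps per-frame appearance features to pseudo optical-flow features and is trained to match pre-extracted flow features with an MSE loss. MoNet itself additionally models temporal and contextual relationships, which this minimal version omits; feature dimensions are illustrative.

```python
import torch
import torch.nn as nn

class FlowHallucinator(nn.Module):
    """Map per-frame appearance features to pseudo flow features."""
    def __init__(self, dim=1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim)
        )

    def forward(self, appearance_feats):   # (batch, time, dim)
        return self.net(appearance_feats)

model = FlowHallucinator()
appearance = torch.randn(8, 16, 1024)      # 8 clips, 16 frames each
flow_targets = torch.randn(8, 16, 1024)    # pre-extracted flow features as targets
loss = nn.functional.mse_loss(model(appearance), flow_targets)
loss.backward()
```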
Unsupervised Medical Image Segmentation with Adversarial Networks: From Edge Diagrams to Segmentation Maps
Title | Unsupervised Medical Image Segmentation with Adversarial Networks: From Edge Diagrams to Segmentation Maps |
Authors | Umaseh Sivanesan, Luis H. Braga, Ranil R. Sonnadara, Kiret Dhindsa |
Abstract | We develop an approach to unsupervised semantic medical image segmentation that extends previous work with generative adversarial networks. We use existing edge detection methods to construct simple edge diagrams, train a generative model to convert them into synthetic medical images, and construct a dataset of synthetic images with known segmentations using variations on the extracted edge diagrams. This synthetic dataset is then used to train a supervised image segmentation model. We test our approach on a clinical dataset of kidney ultrasound images and the benchmark ISIC 2018 skin lesion dataset. We show that our unsupervised approach is more accurate than previous unsupervised methods, and performs reasonably well compared to supervised image segmentation models. All code and trained models are available at https://github.com/kiretd/Unsupervised-MIseg. |
Tasks | Edge Detection, Medical Image Segmentation, Semantic Segmentation |
Published | 2019-11-12 |
URL | https://arxiv.org/abs/1911.05140v1 |
https://arxiv.org/pdf/1911.05140v1.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-medical-image-segmentation-with |
Repo | https://github.com/kiretd/Unsupervised-MIseg |
Framework | tf |
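A hedged sketch of the first step the abstract mentions: turning an image into a simple binary edge diagram with an off-the-shelf detector (here OpenCV's Canny). The thresholds and blur kernel are illustrative; the paper's pipeline then trains a GAN to map such diagrams back to synthetic images with known segmentations.

```python
import cv2
import numpy as np

def edge_diagram(image_path, low=50, high=150):
    """Build a binary edge diagram from an image (thresholds are illustrative)."""
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    img = cv2.GaussianBlur(img, (5, 5), 0)   # suppress speckle before edge detection
    edges = cv2.Canny(img, low, high)
    return (edges > 0).astype(np.uint8)

# diagram = edge_diagram("ultrasound.png")   # path is a placeholder
```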
Spatio-Temporal Alignments: Optimal transport through space and time
Title | Spatio-Temporal Alignments: Optimal transport through space and time |
Authors | Hicham Janati, Marco Cuturi, Alexandre Gramfort |
Abstract | Comparing data defined over space and time is notoriously hard, because it involves quantifying both spatial and temporal variability while at the same time taking into account the chronological structure of the data. Dynamic Time Warping (DTW) computes an optimal alignment between time series in agreement with the chronological order, but is inherently blind to spatial shifts. In this paper, we propose Spatio-Temporal Alignments (STA), a new differentiable formulation of DTW in which spatial differences between time samples are accounted for using regularized optimal transport (OT). Our temporal alignments are handled through a smooth variant of DTW called soft-DTW, for which we prove a new property: soft-DTW increases quadratically with time shifts. The cost matrix within soft-DTW that we use is computed using unbalanced OT, to handle the case in which observations are not normalized probabilities. Experiments on handwritten letters and brain imaging data confirm our theoretical findings and illustrate the effectiveness of STA as a dissimilarity measure for spatio-temporal data. |
Tasks | Time Series |
Published | 2019-10-09 |
URL | https://arxiv.org/abs/1910.03860v3 |
https://arxiv.org/pdf/1910.03860v3.pdf | |
PWC | https://paperswithcode.com/paper/spatio-temporal-alignments-optimal-transport |
Repo | https://github.com/hichamjanati/spatio-temporal-alignements |
Framework | pytorch |
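A hedged NumPy sketch of the soft-DTW recursion that STA builds on: replace DTW's hard minimum with a smooth soft-minimum of temperature gamma. In STA the pairwise cost matrix would come from (unbalanced) optimal transport rather than the squared distance used in this toy example.

```python
import numpy as np

def softmin(a, b, c, gamma):
    """Smooth minimum: -gamma * log(sum(exp(-x / gamma))), computed stably."""
    vals = np.array([a, b, c]) / -gamma
    m = vals.max()
    return -gamma * (m + np.log(np.exp(vals - m).sum()))

def soft_dtw(cost, gamma=1.0):
    """Soft-DTW value for a pairwise cost matrix of shape (n, m)."""
    n, m = cost.shape
    R = np.full((n + 1, m + 1), np.inf)
    R[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            R[i, j] = cost[i - 1, j - 1] + softmin(
                R[i - 1, j], R[i, j - 1], R[i - 1, j - 1], gamma
            )
    return R[n, m]

x, y = np.random.randn(20, 1), np.random.randn(25, 1)
cost = (x - y.T) ** 2        # squared-distance cost here; STA uses OT-based costs
print(soft_dtw(cost, gamma=0.1))
```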
Unsupervised Community Detection with Modularity-Based Attention Model
Title | Unsupervised Community Detection with Modularity-Based Attention Model |
Authors | Ivan Lobov, Sergey Ivanov |
Abstract | In this paper we take the problem of unsupervised node clustering on graphs and show how recent advances in attention models can be applied successfully in a “hard” regime of the problem. We propose an unsupervised algorithm that encodes Bethe Hessian embeddings by optimizing a soft modularity loss, and argue that our model is competitive with both classical and Graph Neural Network (GNN) models while it can be trained on a single graph. |
Tasks | Community Detection |
Published | 2019-05-20 |
URL | https://arxiv.org/abs/1905.10350v1 |
https://arxiv.org/pdf/1905.10350v1.pdf | |
PWC | https://paperswithcode.com/paper/190510350 |
Repo | https://github.com/Ivanopolo/modnet |
Framework | tf |
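A hedged PyTorch sketch of the soft modularity objective mentioned above: given soft cluster assignments C and the modularity matrix B = A - d d^T / 2m, maximize Tr(C^T B C) / 2m (equivalently, minimize its negative). The attention-based encoder that produces the assignments is omitted; the toy graph below is random.

```python
import torch

def soft_modularity_loss(adj, assignments):
    """Negative soft modularity for a soft cluster assignment matrix.

    adj:         dense adjacency matrix (n x n)
    assignments: soft cluster memberships (n x k), rows on the simplex
    """
    degrees = adj.sum(dim=1, keepdim=True)       # (n, 1)
    two_m = adj.sum()
    B = adj - degrees @ degrees.t() / two_m      # modularity matrix
    Q = torch.trace(assignments.t() @ B @ assignments) / two_m
    return -Q                                    # minimize the negative of modularity

adj = (torch.rand(50, 50) < 0.1).float()
adj = torch.triu(adj, 1)
adj = adj + adj.t()                              # symmetric, no self-loops
logits = torch.randn(50, 4, requires_grad=True)
loss = soft_modularity_loss(adj, torch.softmax(logits, dim=1))
loss.backward()
```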
Integrated Multi-omics Analysis Using Variational Autoencoders: Application to Pan-cancer Classification
Title | Integrated Multi-omics Analysis Using Variational Autoencoders: Application to Pan-cancer Classification |
Authors | Xiaoyu Zhang, Jingqing Zhang, Kai Sun, Xian Yang, Chengliang Dai, Yike Guo |
Abstract | Different aspects of a clinical sample can be revealed by multiple types of omics data. Integrated analysis of multi-omics data provides a comprehensive view of patients, which has the potential to facilitate more accurate clinical decision making. However, omics data are normally high dimensional, with a large number of molecular features and a relatively small number of available samples with clinical labels. The “dimensionality curse” makes it challenging to train a machine learning model using high dimensional omics data like DNA methylation and gene expression profiles. Here we propose an end-to-end deep learning model called OmiVAE to extract low dimensional features and classify samples from multi-omics data. OmiVAE combines the basic structure of variational autoencoders with a classification network to achieve task-oriented feature extraction and multi-class classification. The training procedure of OmiVAE comprises an unsupervised phase without the classifier and a supervised phase with the classifier. During the unsupervised phase, a hierarchical cluster structure of samples can be formed automatically without the need for labels. In the supervised phase, OmiVAE achieved an average classification accuracy of 97.49% after 10-fold cross-validation among 33 tumour types and normal samples, which is better performance than other existing methods. The OmiVAE model learned from multi-omics data outperformed the one using only one type of omics data, which indicates that the complementary information from different omics datatypes provides useful insights for biomedical tasks like cancer classification. |
Tasks | Decision Making |
Published | 2019-08-17 |
URL | https://arxiv.org/abs/1908.06278v1 |
https://arxiv.org/pdf/1908.06278v1.pdf | |
PWC | https://paperswithcode.com/paper/integrated-multi-omics-analysis-using |
Repo | https://github.com/zhangxiaoyu11/OmiVAE |
Framework | pytorch |
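A hedged sketch of the overall structure described above: a VAE encoder/decoder plus a classifier on the latent code, trained first with reconstruction + KL only (unsupervised phase) and then with the classification term added (supervised phase). Layer sizes, loss weighting, and the class count are illustrative assumptions, not OmiVAE's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OmiVAESketch(nn.Module):
    def __init__(self, in_dim=2000, latent=128, n_classes=34):
        super().__init__()
        self.enc = nn.Linear(in_dim, 256)
        self.mu, self.logvar = nn.Linear(256, latent), nn.Linear(256, latent)
        self.dec = nn.Sequential(nn.Linear(latent, 256), nn.ReLU(), nn.Linear(256, in_dim))
        self.cls = nn.Linear(latent, n_classes)

    def forward(self, x):
        h = F.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        return self.dec(z), mu, logvar, self.cls(z)

def loss_fn(x, y, out, supervised):
    recon, mu, logvar, logits = out
    rec = F.mse_loss(recon, x)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    loss = rec + kl
    if supervised:                                   # phase 2 adds classification
        loss = loss + F.cross_entropy(logits, y)
    return loss

model = OmiVAESketch()
x, y = torch.randn(16, 2000), torch.randint(0, 34, (16,))
loss_fn(x, y, model(x), supervised=False).backward()  # unsupervised phase
loss_fn(x, y, model(x), supervised=True).backward()   # supervised phase
```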
ThirdEye: Triplet Based Iris Recognition without Normalization
Title | ThirdEye: Triplet Based Iris Recognition without Normalization |
Authors | Sohaib Ahmad, Benjamin Fuller |
Abstract | Most iris recognition pipelines involve three stages: segmenting the image into iris/non-iris pixels, normalizing the iris region to a fixed area, and extracting relevant features for comparison. Given recent advances in deep learning, it is prudent to ask which stages are required for accurate iris recognition. Lojez et al. (IWBF 2019) recently concluded that the segmentation stage is still crucial for good accuracy. We ask whether normalization is beneficial. Towards answering this question, we develop a new iris recognition system called ThirdEye based on triplet convolutional neural networks (Schroff et al., CVPR 2015). ThirdEye directly uses segmented images without normalization. We observe equal error rates of 1.32%, 9.20%, and 0.59% on the ND-0405, UbirisV2, and IITD datasets respectively. For IITD, the most constrained dataset, this improves on the best prior work. However, for ND-0405 and UbirisV2, our equal error rate is slightly worse than prior systems. Our concluding hypothesis is that normalization is more important for less constrained environments. |
Tasks | Iris Recognition |
Published | 2019-07-13 |
URL | https://arxiv.org/abs/1907.06147v1 |
https://arxiv.org/pdf/1907.06147v1.pdf | |
PWC | https://paperswithcode.com/paper/thirdeye-triplet-based-iris-recognition |
Repo | https://github.com/sohaib50k/ThirdEye---Iris-recognition-using-triplets |
Framework | tf |
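A hedged sketch of the triplet training signal described above, using PyTorch's built-in triplet margin loss on embeddings of anchor/positive/negative iris crops. The embedding network, crop size, and margin are illustrative placeholders, not ThirdEye's exact architecture.

```python
import torch
import torch.nn as nn

embed = nn.Sequential(                      # placeholder embedding network
    nn.Conv2d(1, 16, 3, stride=2), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 128),
)
triplet = nn.TripletMarginLoss(margin=0.2)

anchor = torch.randn(8, 1, 64, 64)          # segmented (un-normalized) iris crops
positive = torch.randn(8, 1, 64, 64)        # same identity as the anchor
negative = torch.randn(8, 1, 64, 64)        # different identity
loss = triplet(embed(anchor), embed(positive), embed(negative))
loss.backward()
```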
Importance Estimation for Neural Network Pruning
Title | Importance Estimation for Neural Network Pruning |
Authors | Pavlo Molchanov, Arun Mallya, Stephen Tyree, Iuri Frosio, Jan Kautz |
Abstract | Structural pruning of neural network parameters reduces computation, energy, and memory transfer costs during inference. We propose a novel method that estimates the contribution of a neuron (filter) to the final loss and iteratively removes those with smaller scores. We describe two variations of our method using the first and second-order Taylor expansions to approximate a filter’s contribution. Both methods scale consistently across any network layer without requiring per-layer sensitivity analysis and can be applied to any kind of layer, including skip connections. For modern networks trained on ImageNet, we measured experimentally a high (>93%) correlation between the contribution computed by our methods and a reliable estimate of the true importance. Pruning with the proposed methods leads to an improvement over state-of-the-art in terms of accuracy, FLOPs, and parameter reduction. On ResNet-101, we achieve a 40% FLOPS reduction by removing 30% of the parameters, with a loss of 0.02% in the top-1 accuracy on ImageNet. Code is available at https://github.com/NVlabs/Taylor_pruning. |
Tasks | Network Pruning |
Published | 2019-06-25 |
URL | https://arxiv.org/abs/1906.10771v1 |
https://arxiv.org/pdf/1906.10771v1.pdf | |
PWC | https://paperswithcode.com/paper/importance-estimation-for-neural-network-1 |
Repo | https://github.com/NVlabs/Taylor_pruning |
Framework | pytorch |
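A hedged sketch of the first-order criterion the paper describes: score each filter by the squared products of gradient times weight summed over the filter's weights, then prune the lowest-scoring filters. Aggregation over mini-batches and layer traversal are simplified relative to the official code.

```python
import torch
import torch.nn as nn

def filter_importance(conv):
    """First-order Taylor importance per output filter of a Conv2d layer.

    Assumes .backward() has already populated conv.weight.grad.
    Score of a filter = sum over its weights of (gradient * weight)^2.
    """
    contrib = conv.weight.grad * conv.weight           # (out, in, kH, kW)
    return (contrib ** 2).sum(dim=(1, 2, 3))           # one score per output filter

# Toy usage with a single conv layer and a dummy loss.
conv = nn.Conv2d(3, 8, 3)
out = conv(torch.randn(4, 3, 32, 32))
out.pow(2).mean().backward()
scores = filter_importance(conv)
print(scores)                                          # low scores = pruning candidates
```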
GRATIS: GeneRAting TIme Series with diverse and controllable characteristics
Title | GRATIS: GeneRAting TIme Series with diverse and controllable characteristics |
Authors | Yanfei Kang, Rob J Hyndman, Feng Li |
Abstract | The explosion of time series data in recent years has brought a flourish of new time series analysis methods, for forecasting, clustering, classification and other tasks. The evaluation of these new methods requires either collecting or simulating a diverse set of time series benchmarking data to enable reliable comparisons against alternative approaches. We propose GeneRAting TIme Series with diverse and controllable characteristics, named GRATIS, with the use of mixture autoregressive (MAR) models. We simulate sets of time series using MAR models and investigate the diversity and coverage of the generated time series in a time series feature space. By tuning the parameters of the MAR models, GRATIS is also able to efficiently generate new time series with controllable features. In general, as a costless surrogate to the traditional data collection approach, GRATIS can be used as an evaluation tool for tasks such as time series forecasting and classification. We illustrate the usefulness of our time series generation process through a time series forecasting application. |
Tasks | Time Series, Time Series Analysis, Time Series Forecasting |
Published | 2019-03-07 |
URL | https://arxiv.org/abs/1903.02787v2 |
https://arxiv.org/pdf/1903.02787v2.pdf | |
PWC | https://paperswithcode.com/paper/gratis-generating-time-series-with-diverse |
Repo | https://github.com/xqnwang/fuma |
Framework | none |
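The generator above is built on mixture autoregressive (MAR) models. As a hedged NumPy sketch, the simulator below picks one of two AR(1) components at each step according to mixture weights, which already yields non-Gaussian, heterogeneous series; GRATIS itself tunes such parameters to hit target time series features, and all parameter values here are illustrative.

```python
import numpy as np

def simulate_mar(n=200, weights=(0.6, 0.4), phis=(0.9, -0.5), sigmas=(1.0, 2.0), seed=0):
    """Simulate a two-component mixture autoregressive (MAR) series.

    At each time step one AR(1) component is picked with the given mixture
    weights; all parameter values are illustrative.
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(n)
    for t in range(1, n):
        k = rng.choice(len(weights), p=weights)          # pick a mixture component
        x[t] = phis[k] * x[t - 1] + rng.normal(0.0, sigmas[k])
    return x

series = simulate_mar()
print(series[:5])
```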
Pluralistic Image Completion
Title | Pluralistic Image Completion |
Authors | Chuanxia Zheng, Tat-Jen Cham, Jianfei Cai |
Abstract | Most image completion methods produce only one result for each masked input, although there may be many reasonable possibilities. In this paper, we present an approach for pluralistic image completion – the task of generating multiple and diverse plausible solutions for image completion. A major challenge faced by learning-based approaches is that there is usually only one ground truth training instance per label. As such, sampling from conditional VAEs still leads to minimal diversity. To overcome this, we propose a novel and probabilistically principled framework with two parallel paths. One is a reconstructive path that utilizes the single given ground truth to obtain a prior distribution over the missing parts and rebuild the original image from this distribution. The other is a generative path, for which the conditional prior is coupled to the distribution obtained in the reconstructive path. Both are supported by GANs. We also introduce a new short+long term attention layer that exploits distant relations among decoder and encoder features, improving appearance consistency. When tested on datasets with buildings (Paris), faces (CelebA-HQ), and natural images (ImageNet), our method not only generates higher-quality completion results, but also produces multiple and diverse plausible outputs. |
Tasks | Image Inpainting |
Published | 2019-03-11 |
URL | http://arxiv.org/abs/1903.04227v2 |
http://arxiv.org/pdf/1903.04227v2.pdf | |
PWC | https://paperswithcode.com/paper/pluralistic-image-completion |
Repo | https://github.com/lyndonzheng/Pluralistic-Inpainting |
Framework | pytorch |
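A hedged sketch of the attention idea mentioned in the abstract: scaled dot-product attention that lets each decoder location attend to encoder feature locations. The paper's short+long term layer combines self- and cross-attention with more structure than this minimal version, and the feature shapes below are illustrative.

```python
import torch

def cross_feature_attention(decoder_feat, encoder_feat):
    """Attend from decoder locations to encoder locations (dot-product attention).

    decoder_feat, encoder_feat: (batch, channels, H, W) feature maps
    """
    b, c, h, w = decoder_feat.shape
    q = decoder_feat.flatten(2).transpose(1, 2)          # (b, HW, c) queries
    k = encoder_feat.flatten(2).transpose(1, 2)          # (b, HW, c) keys/values
    attn = torch.softmax(q @ k.transpose(1, 2) / c ** 0.5, dim=-1)
    out = attn @ k                                       # aggregate encoder features
    return out.transpose(1, 2).reshape(b, c, h, w)

dec = torch.randn(2, 64, 16, 16)
enc = torch.randn(2, 64, 16, 16)
print(cross_feature_attention(dec, enc).shape)           # (2, 64, 16, 16)
```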