Paper Group AWR 457
SinGAN: Learning a Generative Model from a Single Natural Image. Boundary Effect-Aware Visual Tracking for UAV with Online Enhanced Background Learning and Multi-Frame Consensus Verification. The Limited Multi-Label Projection Layer. Kaolin: A PyTorch Library for Accelerating 3D Deep Learning Research. Learning to Regress 3D Face Shape and Expressi …
SinGAN: Learning a Generative Model from a Single Natural Image
Title | SinGAN: Learning a Generative Model from a Single Natural Image |
Authors | Tamar Rott Shaham, Tali Dekel, Tomer Michaeli |
Abstract | We introduce SinGAN, an unconditional generative model that can be learned from a single natural image. Our model is trained to capture the internal distribution of patches within the image, and is then able to generate high-quality, diverse samples that carry the same visual content as the image. SinGAN contains a pyramid of fully convolutional GANs, each responsible for learning the patch distribution at a different scale of the image. This allows generating new samples of arbitrary size and aspect ratio that have significant variability, yet maintain both the global structure and the fine textures of the training image. In contrast to previous single-image GAN schemes, our approach is not limited to texture images and is not conditional (i.e., it generates samples from noise). User studies confirm that the generated samples are commonly confused with real images. We illustrate the utility of SinGAN in a wide range of image manipulation tasks. |
Tasks | Image Generation |
Published | 2019-05-02 |
URL | https://arxiv.org/abs/1905.01164v2 |
https://arxiv.org/pdf/1905.01164v2.pdf | |
PWC | https://paperswithcode.com/paper/singan-learning-a-generative-model-from-a |
Repo | https://github.com/tamarott/SinGAN |
Framework | pytorch |
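A minimal PyTorch sketch of the kind of building block SinGAN stacks into its pyramid: a fully convolutional generator for one scale that takes noise plus the upsampled coarser output and predicts a residual, paired with a patch discriminator. Layer widths and the omitted training loop are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # Conv-BatchNorm-LeakyReLU block, the basic unit used at every scale.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.LeakyReLU(0.2),
    )

class SingleScaleGenerator(nn.Module):
    """Fully convolutional generator for one pyramid scale.

    It receives noise plus the upsampled output of the coarser scale and
    predicts a residual image, so it works at any spatial resolution.
    """
    def __init__(self, ch=32):
        super().__init__()
        self.body = nn.Sequential(
            conv_block(3, ch), conv_block(ch, ch), conv_block(ch, ch),
            nn.Conv2d(ch, 3, kernel_size=3, padding=1), nn.Tanh(),
        )

    def forward(self, noise, prev_upsampled):
        return prev_upsampled + self.body(noise + prev_upsampled)

class PatchDiscriminator(nn.Module):
    """Markovian (patch) discriminator: one real/fake score per patch."""
    def __init__(self, ch=32):
        super().__init__()
        self.body = nn.Sequential(
            conv_block(3, ch), conv_block(ch, ch), conv_block(ch, ch),
            nn.Conv2d(ch, 1, kernel_size=3, padding=1),
        )

    def forward(self, x):
        return self.body(x)

# Example: generate a sample at an arbitrary resolution.
g = SingleScaleGenerator()
noise = torch.randn(1, 3, 100, 160)
coarse = torch.zeros(1, 3, 100, 160)  # the coarsest scale starts from zeros
sample = g(noise, coarse)
```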
Boundary Effect-Aware Visual Tracking for UAV with Online Enhanced Background Learning and Multi-Frame Consensus Verification
Title | Boundary Effect-Aware Visual Tracking for UAV with Online Enhanced Background Learning and Multi-Frame Consensus Verification |
Authors | Changhong Fu, Ziyuan Huang, Yiming Li, Ran Duan, Peng Lu |
Abstract | Due to the implicitly introduced periodic shifting of a limited search area, visual object tracking with correlation filters often has to confront an undesired boundary effect. Because the boundary effect severely degrades the quality of the object model, robust and accurate object following becomes a challenging task for unmanned aerial vehicles (UAVs). Traditional hand-crafted features are also not precise and robust enough to describe the object from the viewpoint of a UAV. In this work, a novel tracker with online enhanced background learning is specifically proposed to tackle boundary effects. Real background samples are densely extracted to learn as well as update the correlation filters. Spatial penalization is introduced to offset the noise introduced by the considerably larger amount of background information so that a more accurate appearance model can be established. Meanwhile, convolutional features are extracted to provide a more comprehensive representation of the object. In order to mitigate changes in the object's appearance, a multi-frame technique is applied to learn an ideal response map and verify the generated one in each frame. Exhaustive experiments were conducted on 100 challenging UAV image sequences, and the proposed tracker achieved state-of-the-art performance. |
Tasks | Object Tracking, Visual Object Tracking, Visual Tracking |
Published | 2019-08-10 |
URL | https://arxiv.org/abs/1908.03701v1 |
https://arxiv.org/pdf/1908.03701v1.pdf | |
PWC | https://paperswithcode.com/paper/boundary-effect-aware-visual-tracking-for-uav |
Repo | https://github.com/vision4robotics/BEVT-tracker |
Framework | none |
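The tracker above builds on correlation filters. As a hedged illustration of that underlying mechanism only (not the paper's enhanced background learning or multi-frame verification), the NumPy sketch below trains a single-channel MOSSE-style filter in the Fourier domain and localizes the peak of its response; the regularization constant and Gaussian label width are illustrative assumptions.

```python
import numpy as np

def train_correlation_filter(patch, target_response, lam=1e-3):
    """Closed-form single-channel correlation filter in the Fourier domain.

    patch:           grayscale training patch centred on the object
    target_response: desired (e.g. Gaussian-shaped) response map
    lam:             regularization constant (illustrative value)
    """
    X = np.fft.fft2(patch)
    Y = np.fft.fft2(target_response)
    return (np.conj(X) * Y) / (np.conj(X) * X + lam)

def detect(filter_fft, search_patch):
    """Correlate the learned filter with a new search patch, return peak location."""
    Z = np.fft.fft2(search_patch)
    response = np.real(np.fft.ifft2(filter_fft * Z))
    return np.unravel_index(np.argmax(response), response.shape)

# Toy usage: a Gaussian label peaked at the patch centre.
h, w = 64, 64
yy, xx = np.mgrid[0:h, 0:w]
label = np.exp(-((yy - h // 2) ** 2 + (xx - w // 2) ** 2) / (2 * 3.0 ** 2))
patch = np.random.rand(h, w)
f = train_correlation_filter(patch, label)
print(detect(f, patch))  # should peak near the centre (32, 32)
```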
The Limited Multi-Label Projection Layer
Title | The Limited Multi-Label Projection Layer |
Authors | Brandon Amos, Vladlen Koltun, J. Zico Kolter |
Abstract | We propose the Limited Multi-Label (LML) projection layer as a new primitive operation for end-to-end learning systems. The LML layer provides a probabilistic way of modeling multi-label predictions limited to having exactly k labels. We derive efficient forward and backward passes for this layer and show how the layer can be used to optimize the top-k recall for multi-label tasks with incomplete label information. We evaluate LML layers on top-k CIFAR-100 classification and scene graph generation. We show that LML layers add a negligible amount of computational overhead, strictly improve the model’s representational capacity, and improve accuracy. We also revisit the truncated top-k entropy method as a competitive baseline for top-k classification. |
Tasks | Graph Generation, Scene Graph Generation |
Published | 2019-06-20 |
URL | https://arxiv.org/abs/1906.08707v3 |
https://arxiv.org/pdf/1906.08707v3.pdf | |
PWC | https://paperswithcode.com/paper/the-limited-multi-label-projection-layer |
Repo | https://github.com/locuslab/lml |
Framework | pytorch |
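A hedged sketch of the core computation the LML layer performs: projecting scores onto probabilities that sum to exactly k. As derived in the paper, the solution takes the form of a sigmoid with a shared offset ν chosen so the entries sum to k; the bisection below is a forward pass only (the official repo also provides the custom backward pass), and the iteration count is an illustrative choice.

```python
import torch

def lml_project(x, k, iters=100):
    """Project scores x onto {y : 0 < y < 1, sum(y) = k} (forward pass only).

    The optimum has the form y_i = sigmoid(x_i + nu); we find the scalar
    offset nu by bisection on the monotone function sum(sigmoid(x + nu)) - k.
    """
    lo = -x.max() - 10.0  # bracket chosen so the sum is < k at lo and > k at hi
    hi = -x.min() + 10.0
    for _ in range(iters):
        nu = (lo + hi) / 2
        s = torch.sigmoid(x + nu).sum()
        lo, hi = (nu, hi) if s < k else (lo, nu)
    return torch.sigmoid(x + (lo + hi) / 2)

scores = torch.randn(10)
p = lml_project(scores, k=3)
print(p.sum())  # ~3.0, with every entry strictly between 0 and 1
```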
Kaolin: A PyTorch Library for Accelerating 3D Deep Learning Research
Title | Kaolin: A PyTorch Library for Accelerating 3D Deep Learning Research |
Authors | Krishna Murthy Jatavallabhula, Edward Smith, Jean-Francois Lafleche, Clement Fuji Tsang, Artem Rozantsev, Wenzheng Chen, Tommy Xiang, Rev Lebaredian, Sanja Fidler |
Abstract | We present Kaolin, a PyTorch library aiming to accelerate 3D deep learning research. Kaolin provides efficient implementations of differentiable 3D modules for use in deep learning systems. With functionality to load and preprocess several popular 3D datasets, and native functions to manipulate meshes, pointclouds, signed distance functions, and voxel grids, Kaolin mitigates the need to write wasteful boilerplate code. Kaolin packages together several differentiable graphics modules including rendering, lighting, shading, and view warping. Kaolin also supports an array of loss functions and evaluation metrics for seamless evaluation and provides visualization functionality to render the 3D results. Importantly, we curate a comprehensive model zoo comprising many state-of-the-art 3D deep learning architectures, to serve as a starting point for future research endeavours. Kaolin is available as open-source software at https://github.com/NVIDIAGameWorks/kaolin/. |
Tasks | |
Published | 2019-11-12 |
URL | https://arxiv.org/abs/1911.05063v2 |
https://arxiv.org/pdf/1911.05063v2.pdf | |
PWC | https://paperswithcode.com/paper/kaolin-a-pytorch-library-for-accelerating-3d |
Repo | https://github.com/NVIDIAGameWorks/kaolin |
Framework | pytorch |
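A hedged usage sketch, assuming the API of a recent Kaolin release (`kaolin.io.obj.import_mesh` and `kaolin.ops.conversions.trianglemeshes_to_voxelgrids`); the 2019 release described in the paper exposed different module paths, and the file path below is a placeholder.

```python
import kaolin

# Load a triangle mesh from an .obj file (the path is a placeholder).
mesh = kaolin.io.obj.import_mesh("model.obj")
vertices = mesh.vertices.unsqueeze(0)   # (1, V, 3) float tensor
faces = mesh.faces                      # (F, 3) long tensor

# Convert the mesh into a 32^3 occupancy voxel grid.
voxels = kaolin.ops.conversions.trianglemeshes_to_voxelgrids(
    vertices, faces, resolution=32
)
print(vertices.shape, faces.shape, voxels.shape)
```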
Learning to Regress 3D Face Shape and Expression from an Image without 3D Supervision
Title | Learning to Regress 3D Face Shape and Expression from an Image without 3D Supervision |
Authors | Soubhik Sanyal, Timo Bolkart, Haiwen Feng, Michael J. Black |
Abstract | The estimation of 3D face shape from a single image must be robust to variations in lighting, head pose, expression, facial hair, makeup, and occlusions. Robustness requires a large training set of in-the-wild images, which, by construction, lack ground truth 3D shape. To train a network without any 2D-to-3D supervision, we present RingNet, which learns to compute 3D face shape from a single image. Our key observation is that an individual's face shape is constant across images, regardless of expression, pose, lighting, etc. RingNet leverages multiple images of a person and automatically detected 2D face features. It uses a novel loss that encourages the face shape to be similar when the identity is the same and different for different people. We achieve invariance to expression by representing the face using the FLAME model. Once trained, our method takes a single image and outputs the parameters of FLAME, which can be readily animated. Additionally, we create a new database of faces 'not quite in-the-wild' (NoW) with 3D head scans and high-resolution images of the subjects in a wide variety of conditions. We evaluate publicly available methods and find that RingNet is more accurate than methods that use 3D supervision. The dataset, model, and results are available for research purposes at http://ringnet.is.tuebingen.mpg.de. |
Tasks | 3D Face Reconstruction |
Published | 2019-05-16 |
URL | https://arxiv.org/abs/1905.06817v1 |
https://arxiv.org/pdf/1905.06817v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-regress-3d-face-shape-and |
Repo | https://github.com/soubhiksanyal/RingNet |
Framework | tf |
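A hedged PyTorch sketch of the kind of shape-consistency objective the abstract describes: pull together shape codes predicted from different images of the same subject and push codes from a different subject apart by a margin. The margin value and code dimensionality are illustrative assumptions, not RingNet's exact loss.

```python
import torch
import torch.nn.functional as F

def ring_shape_loss(same_a, same_b, other, margin=0.5):
    """Encourage equal shape codes for the same identity, different otherwise.

    same_a, same_b: shape codes predicted from two images of one subject
    other:          shape code predicted from an image of another subject
    """
    pos = F.mse_loss(same_a, same_b)                  # same identity: close
    neg = F.relu(margin - F.mse_loss(same_a, other))  # other identity: far
    return pos + neg

a, b = torch.randn(4, 100), torch.randn(4, 100)  # e.g. 100 FLAME shape parameters
c = torch.randn(4, 100)
print(ring_shape_loss(a, b, c))
```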
SparseMask: Differentiable Connectivity Learning for Dense Image Prediction
Title | SparseMask: Differentiable Connectivity Learning for Dense Image Prediction |
Authors | Huikai Wu, Junge Zhang, Kaiqi Huang |
Abstract | In this paper, we aim at automatically searching for an efficient network architecture for dense image prediction. In particular, we follow the encoder-decoder style and focus on designing a connectivity structure for the decoder. To achieve that, we design a densely connected network with learnable connections, named Fully Dense Network, which contains a large set of possible final connectivity structures. We then employ gradient descent to search for the optimal connectivity among the dense connections. The search process is guided by a novel loss function, which pushes the weight of each connection to be binary and the connections to be sparse. The discovered connectivity achieves competitive results on two segmentation datasets, while running more than three times faster and requiring less than half the parameters compared to state-of-the-art methods. Extensive experiments show that the discovered connectivity is compatible with various backbones and generalizes well to other dense image prediction tasks. |
Tasks | |
Published | 2019-04-16 |
URL | https://arxiv.org/abs/1904.07642v2 |
https://arxiv.org/pdf/1904.07642v2.pdf | |
PWC | https://paperswithcode.com/paper/sparsemask-differentiable-connectivity |
Repo | https://github.com/wuhuikai/SparseMask |
Framework | pytorch |
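A hedged sketch of a regularizer in the spirit described above: gate each candidate decoder connection with a sigmoid, penalize gates that are neither close to 0 nor 1 (pushing them binary), and penalize their overall mass (pushing them sparse). The exact loss used in SparseMask may differ; the weighting below is an illustrative assumption.

```python
import torch

def connectivity_regularizer(logits, sparsity_weight=0.1):
    """Push connection gates toward binary values and toward sparsity.

    logits: learnable connection scores, one per candidate decoder connection
    """
    gates = torch.sigmoid(logits)
    binary_term = (gates * (1.0 - gates)).mean()  # 0 when gates are exactly 0 or 1
    sparsity_term = gates.mean()                  # small when few connections stay on
    return binary_term + sparsity_weight * sparsity_term

logits = torch.zeros(64, requires_grad=True)
loss = connectivity_regularizer(logits)
loss.backward()
print(loss.item(), logits.grad.abs().max().item())
```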
Hallucinating Optical Flow Features for Video Classification
Title | Hallucinating Optical Flow Features for Video Classification |
Authors | Yongyi Tang, Lin Ma, Lianqiang Zhou |
Abstract | Appearance and motion are two key components for depicting and characterizing video content. Currently, two-stream models have achieved state-of-the-art performance on video classification. However, extracting motion information, specifically in the form of optical flow features, is extremely computationally expensive, especially for large-scale video classification. In this paper, we propose a motion hallucination network, namely MoNet, to imagine the optical flow features from the appearance features, with no reliance on optical flow computation. Specifically, MoNet models the temporal relationships of the appearance features and exploits the contextual relationships of the optical flow features with concurrent connections. Extensive experimental results demonstrate that the proposed MoNet can effectively and efficiently hallucinate the optical flow features, which together with the appearance features consistently improve video classification performance. Moreover, MoNet can cut down almost half of the computational and data-storage burden of two-stream video classification. Our code is available at: https://github.com/YongyiTang92/MoNet-Features. |
Tasks | Optical Flow Estimation, Video Classification |
Published | 2019-05-28 |
URL | https://arxiv.org/abs/1905.11799v2 |
https://arxiv.org/pdf/1905.11799v2.pdf | |
PWC | https://paperswithcode.com/paper/hallucinating-optical-flow-features-for-video |
Repo | https://github.com/YongyiTang92/MoNet-Features |
Framework | tf |
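A hedged sketch of the hallucination idea above: a small network maps per-frame appearance features to pseudo optical-flow features and is trained to match pre-extracted flow features with an MSE loss. MoNet itself additionally models temporal and contextual relationships, which this minimal version omits; feature dimensions are illustrative.

```python
import torch
import torch.nn as nn

class FlowHallucinator(nn.Module):
    """Map per-frame appearance features to pseudo flow features."""
    def __init__(self, dim=1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim)
        )

    def forward(self, appearance_feats):   # (batch, time, dim)
        return self.net(appearance_feats)

model = FlowHallucinator()
appearance = torch.randn(8, 16, 1024)      # 8 clips, 16 frames each
flow_targets = torch.randn(8, 16, 1024)    # pre-extracted flow features as targets
loss = nn.functional.mse_loss(model(appearance), flow_targets)
loss.backward()
```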
Unsupervised Medical Image Segmentation with Adversarial Networks: From Edge Diagrams to Segmentation Maps
Title | Unsupervised Medical Image Segmentation with Adversarial Networks: From Edge Diagrams to Segmentation Maps |
Authors | Umaseh Sivanesan, Luis H. Braga, Ranil R. Sonnadara, Kiret Dhindsa |
Abstract | We develop an approach to unsupervised semantic medical image segmentation that extends previous work with generative adversarial networks. We use existing edge detection methods to construct simple edge diagrams, train a generative model to convert them into synthetic medical images, and construct a dataset of synthetic images with known segmentations using variations on the extracted edge diagrams. This synthetic dataset is then used to train a supervised image segmentation model. We test our approach on a clinical dataset of kidney ultrasound images and the benchmark ISIC 2018 skin lesion dataset. We show that our unsupervised approach is more accurate than previous unsupervised methods, and performs reasonably well compared to supervised image segmentation models. All code and trained models are available at https://github.com/kiretd/Unsupervised-MIseg. |
Tasks | Edge Detection, Medical Image Segmentation, Semantic Segmentation |
Published | 2019-11-12 |
URL | https://arxiv.org/abs/1911.05140v1 |
https://arxiv.org/pdf/1911.05140v1.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-medical-image-segmentation-with |
Repo | https://github.com/kiretd/Unsupervised-MIseg |
Framework | tf |
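A hedged sketch of the first step the abstract mentions: turning an image into a simple binary edge diagram with an off-the-shelf detector (here OpenCV's Canny). The thresholds and blur kernel are illustrative; the paper's pipeline then trains a GAN to map such diagrams back to synthetic images with known segmentations.

```python
import cv2
import numpy as np

def edge_diagram(image_path, low=50, high=150):
    """Build a binary edge diagram from an image (thresholds are illustrative)."""
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    img = cv2.GaussianBlur(img, (5, 5), 0)   # suppress speckle before edge detection
    edges = cv2.Canny(img, low, high)
    return (edges > 0).astype(np.uint8)

# diagram = edge_diagram("ultrasound.png")   # path is a placeholder
```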
Spatio-Temporal Alignments: Optimal transport through space and time
Title | Spatio-Temporal Alignments: Optimal transport through space and time |
Authors | Hicham Janati, Marco Cuturi, Alexandre Gramfort |
Abstract | Comparing data defined over space and time is notoriously hard, because it involves quantifying both spatial and temporal variability while at the same time taking into account the chronological structure of the data. Dynamic Time Warping (DTW) computes an optimal alignment between time series in agreement with the chronological order, but is inherently blind to spatial shifts. In this paper, we propose Spatio-Temporal Alignments (STA), a new differentiable formulation of DTW in which spatial differences between time samples are accounted for using regularized optimal transport (OT). Our temporal alignments are handled through a smooth variant of DTW called soft-DTW, for which we prove a new property: soft-DTW increases quadratically with time shifts. The cost matrix within soft-DTW that we use is computed using unbalanced OT, to handle the case in which observations are not normalized probabilities. Experiments on handwritten letters and brain imaging data confirm our theoretical findings and illustrate the effectiveness of STA as a dissimilarity measure for spatio-temporal data. |
Tasks | Time Series |
Published | 2019-10-09 |
URL | https://arxiv.org/abs/1910.03860v3 |
https://arxiv.org/pdf/1910.03860v3.pdf | |
PWC | https://paperswithcode.com/paper/spatio-temporal-alignments-optimal-transport |
Repo | https://github.com/hichamjanati/spatio-temporal-alignements |
Framework | pytorch |
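A hedged NumPy sketch of the soft-DTW recursion that STA builds on: replace DTW's hard minimum with a smooth soft-minimum of temperature gamma. In STA the pairwise cost matrix would come from (unbalanced) optimal transport rather than the squared distance used in this toy example.

```python
import numpy as np

def softmin(a, b, c, gamma):
    """Smooth minimum: -gamma * log(sum(exp(-x / gamma))), computed stably."""
    vals = np.array([a, b, c]) / -gamma
    m = vals.max()
    return -gamma * (m + np.log(np.exp(vals - m).sum()))

def soft_dtw(cost, gamma=1.0):
    """Soft-DTW value for a pairwise cost matrix of shape (n, m)."""
    n, m = cost.shape
    R = np.full((n + 1, m + 1), np.inf)
    R[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            R[i, j] = cost[i - 1, j - 1] + softmin(
                R[i - 1, j], R[i, j - 1], R[i - 1, j - 1], gamma
            )
    return R[n, m]

x, y = np.random.randn(20, 1), np.random.randn(25, 1)
cost = (x - y.T) ** 2        # squared-distance cost here; STA uses OT-based costs
print(soft_dtw(cost, gamma=0.1))
```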
Unsupervised Community Detection with Modularity-Based Attention Model
Title | Unsupervised Community Detection with Modularity-Based Attention Model |
Authors | Ivan Lobov, Sergey Ivanov |
Abstract | In this paper we take the problem of unsupervised node clustering on graphs and show how recent advances in attention models can be applied successfully in a “hard” regime of the problem. We propose an unsupervised algorithm that encodes Bethe Hessian embeddings by optimizing a soft modularity loss, and argue that our model is competitive with both classical and Graph Neural Network (GNN) models while it can be trained on a single graph. |
Tasks | Community Detection |
Published | 2019-05-20 |
URL | https://arxiv.org/abs/1905.10350v1 |
https://arxiv.org/pdf/1905.10350v1.pdf | |
PWC | https://paperswithcode.com/paper/190510350 |
Repo | https://github.com/Ivanopolo/modnet |
Framework | tf |
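A hedged PyTorch sketch of the soft modularity objective mentioned above: given soft cluster assignments C and the modularity matrix B = A - d d^T / 2m, maximize Tr(C^T B C) / 2m (equivalently, minimize its negative). The attention-based encoder that produces the assignments is omitted; the toy graph below is random.

```python
import torch

def soft_modularity_loss(adj, assignments):
    """Negative soft modularity for a soft cluster assignment matrix.

    adj:         dense adjacency matrix (n x n)
    assignments: soft cluster memberships (n x k), rows on the simplex
    """
    degrees = adj.sum(dim=1, keepdim=True)       # (n, 1)
    two_m = adj.sum()
    B = adj - degrees @ degrees.t() / two_m      # modularity matrix
    Q = torch.trace(assignments.t() @ B @ assignments) / two_m
    return -Q                                    # minimize the negative of modularity

adj = (torch.rand(50, 50) < 0.1).float()
adj = torch.triu(adj, 1)
adj = adj + adj.t()                              # symmetric, no self-loops
logits = torch.randn(50, 4, requires_grad=True)
loss = soft_modularity_loss(adj, torch.softmax(logits, dim=1))
loss.backward()
```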
Integrated Multi-omics Analysis Using Variational Autoencoders: Application to Pan-cancer Classification
Title | Integrated Multi-omics Analysis Using Variational Autoencoders: Application to Pan-cancer Classification |
Authors | Xiaoyu Zhang, Jingqing Zhang, Kai Sun, Xian Yang, Chengliang Dai, Yike Guo |
Abstract | Different aspects of a clinical sample can be revealed by multiple types of omics data. Integrated analysis of multi-omics data provides a comprehensive view of patients, which has the potential to facilitate more accurate clinical decision making. However, omics data are normally high dimensional, with a large number of molecular features and a relatively small number of available samples with clinical labels. The “dimensionality curse” makes it challenging to train a machine learning model using high dimensional omics data like DNA methylation and gene expression profiles. Here we propose an end-to-end deep learning model called OmiVAE to extract low dimensional features and classify samples from multi-omics data. OmiVAE combines the basic structure of variational autoencoders with a classification network to achieve task-oriented feature extraction and multi-class classification. The training procedure of OmiVAE comprises an unsupervised phase without the classifier and a supervised phase with the classifier. During the unsupervised phase, a hierarchical cluster structure of samples can be formed automatically without the need for labels. In the supervised phase, OmiVAE achieved an average classification accuracy of 97.49% after 10-fold cross-validation among 33 tumour types and normal samples, which is better performance than other existing methods. The OmiVAE model learned from multi-omics data outperformed the one using only one type of omics data, which indicates that the complementary information from different omics datatypes provides useful insights for biomedical tasks like cancer classification. |
Tasks | Decision Making |
Published | 2019-08-17 |
URL | https://arxiv.org/abs/1908.06278v1 |
https://arxiv.org/pdf/1908.06278v1.pdf | |
PWC | https://paperswithcode.com/paper/integrated-multi-omics-analysis-using |
Repo | https://github.com/zhangxiaoyu11/OmiVAE |
Framework | pytorch |
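A hedged sketch of the overall structure described above: a VAE encoder/decoder plus a classifier on the latent code, trained first with reconstruction + KL only (unsupervised phase) and then with the classification term added (supervised phase). Layer sizes, loss weighting, and the class count are illustrative assumptions, not OmiVAE's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OmiVAESketch(nn.Module):
    def __init__(self, in_dim=2000, latent=128, n_classes=34):
        super().__init__()
        self.enc = nn.Linear(in_dim, 256)
        self.mu, self.logvar = nn.Linear(256, latent), nn.Linear(256, latent)
        self.dec = nn.Sequential(nn.Linear(latent, 256), nn.ReLU(), nn.Linear(256, in_dim))
        self.cls = nn.Linear(latent, n_classes)

    def forward(self, x):
        h = F.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        return self.dec(z), mu, logvar, self.cls(z)

def loss_fn(x, y, out, supervised):
    recon, mu, logvar, logits = out
    rec = F.mse_loss(recon, x)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    loss = rec + kl
    if supervised:                                   # phase 2 adds classification
        loss = loss + F.cross_entropy(logits, y)
    return loss

model = OmiVAESketch()
x, y = torch.randn(16, 2000), torch.randint(0, 34, (16,))
loss_fn(x, y, model(x), supervised=False).backward()  # unsupervised phase
loss_fn(x, y, model(x), supervised=True).backward()   # supervised phase
```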
ThirdEye: Triplet Based Iris Recognition without Normalization
Title | ThirdEye: Triplet Based Iris Recognition without Normalization |
Authors | Sohaib Ahmad, Benjamin Fuller |
Abstract | Most iris recognition pipelines involve three stages: segmenting the image into iris/non-iris pixels, normalizing the iris region to a fixed area, and extracting relevant features for comparison. Given recent advances in deep learning, it is prudent to ask which stages are required for accurate iris recognition. Lojez et al. (IWBF 2019) recently concluded that the segmentation stage is still crucial for good accuracy. We ask whether normalization is beneficial. Towards answering this question, we develop a new iris recognition system called ThirdEye based on triplet convolutional neural networks (Schroff et al., CVPR 2015). ThirdEye directly uses segmented images without normalization. We observe equal error rates of 1.32%, 9.20%, and 0.59% on the ND-0405, UbirisV2, and IITD datasets respectively. For IITD, the most constrained dataset, this improves on the best prior work. However, for ND-0405 and UbirisV2, our equal error rate is slightly worse than prior systems. Our concluding hypothesis is that normalization is more important for less constrained environments. |
Tasks | Iris Recognition |
Published | 2019-07-13 |
URL | https://arxiv.org/abs/1907.06147v1 |
https://arxiv.org/pdf/1907.06147v1.pdf | |
PWC | https://paperswithcode.com/paper/thirdeye-triplet-based-iris-recognition |
Repo | https://github.com/sohaib50k/ThirdEye---Iris-recognition-using-triplets |
Framework | tf |
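A hedged sketch of the triplet training signal described above, using PyTorch's built-in triplet margin loss on embeddings of anchor/positive/negative iris crops. The embedding network, crop size, and margin are illustrative placeholders, not ThirdEye's exact architecture.

```python
import torch
import torch.nn as nn

embed = nn.Sequential(                      # placeholder embedding network
    nn.Conv2d(1, 16, 3, stride=2), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 128),
)
triplet = nn.TripletMarginLoss(margin=0.2)

anchor = torch.randn(8, 1, 64, 64)          # segmented (un-normalized) iris crops
positive = torch.randn(8, 1, 64, 64)        # same identity as the anchor
negative = torch.randn(8, 1, 64, 64)        # different identity
loss = triplet(embed(anchor), embed(positive), embed(negative))
loss.backward()
```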
Importance Estimation for Neural Network Pruning
Title | Importance Estimation for Neural Network Pruning |
Authors | Pavlo Molchanov, Arun Mallya, Stephen Tyree, Iuri Frosio, Jan Kautz |
Abstract | Structural pruning of neural network parameters reduces computation, energy, and memory transfer costs during inference. We propose a novel method that estimates the contribution of a neuron (filter) to the final loss and iteratively removes those with smaller scores. We describe two variations of our method using the first and second-order Taylor expansions to approximate a filter’s contribution. Both methods scale consistently across any network layer without requiring per-layer sensitivity analysis and can be applied to any kind of layer, including skip connections. For modern networks trained on ImageNet, we measured experimentally a high (>93%) correlation between the contribution computed by our methods and a reliable estimate of the true importance. Pruning with the proposed methods leads to an improvement over state-of-the-art in terms of accuracy, FLOPs, and parameter reduction. On ResNet-101, we achieve a 40% FLOPS reduction by removing 30% of the parameters, with a loss of 0.02% in the top-1 accuracy on ImageNet. Code is available at https://github.com/NVlabs/Taylor_pruning. |
Tasks | Network Pruning |
Published | 2019-06-25 |
URL | https://arxiv.org/abs/1906.10771v1 |
https://arxiv.org/pdf/1906.10771v1.pdf | |
PWC | https://paperswithcode.com/paper/importance-estimation-for-neural-network-1 |
Repo | https://github.com/NVlabs/Taylor_pruning |
Framework | pytorch |
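A hedged sketch of the first-order criterion the paper describes: score each filter by the squared products of gradient times weight summed over the filter's weights, then prune the lowest-scoring filters. Aggregation over mini-batches and layer traversal are simplified relative to the official code.

```python
import torch
import torch.nn as nn

def filter_importance(conv):
    """First-order Taylor importance per output filter of a Conv2d layer.

    Assumes .backward() has already populated conv.weight.grad.
    Score of a filter = sum over its weights of (gradient * weight)^2.
    """
    contrib = conv.weight.grad * conv.weight           # (out, in, kH, kW)
    return (contrib ** 2).sum(dim=(1, 2, 3))           # one score per output filter

# Toy usage with a single conv layer and a dummy loss.
conv = nn.Conv2d(3, 8, 3)
out = conv(torch.randn(4, 3, 32, 32))
out.pow(2).mean().backward()
scores = filter_importance(conv)
print(scores)                                          # low scores = pruning candidates
```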
GRATIS: GeneRAting TIme Series with diverse and controllable characteristics
Title | GRATIS: GeneRAting TIme Series with diverse and controllable characteristics |
Authors | Yanfei Kang, Rob J Hyndman, Feng Li |
Abstract | The explosion of time series data in recent years has brought a flourish of new time series analysis methods, for forecasting, clustering, classification and other tasks. The evaluation of these new methods requires either collecting or simulating a diverse set of time series benchmarking data to enable reliable comparisons against alternative approaches. We propose GeneRAting TIme Series with diverse and controllable characteristics, named GRATIS, with the use of mixture autoregressive (MAR) models. We simulate sets of time series using MAR models and investigate the diversity and coverage of the generated time series in a time series feature space. By tuning the parameters of the MAR models, GRATIS is also able to efficiently generate new time series with controllable features. In general, as a costless surrogate to the traditional data collection approach, GRATIS can be used as an evaluation tool for tasks such as time series forecasting and classification. We illustrate the usefulness of our time series generation process through a time series forecasting application. |
Tasks | Time Series, Time Series Analysis, Time Series Forecasting |
Published | 2019-03-07 |
URL | https://arxiv.org/abs/1903.02787v2 |
https://arxiv.org/pdf/1903.02787v2.pdf | |
PWC | https://paperswithcode.com/paper/gratis-generating-time-series-with-diverse |
Repo | https://github.com/xqnwang/fuma |
Framework | none |
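The generator above is built on mixture autoregressive (MAR) models. As a hedged NumPy sketch, the simulator below picks one of two AR(1) components at each step according to mixture weights, which already yields non-Gaussian, heterogeneous series; GRATIS itself tunes such parameters to hit target time series features, and all parameter values here are illustrative.

```python
import numpy as np

def simulate_mar(n=200, weights=(0.6, 0.4), phis=(0.9, -0.5), sigmas=(1.0, 2.0), seed=0):
    """Simulate a two-component mixture autoregressive (MAR) series.

    At each time step one AR(1) component is picked with the given mixture
    weights; all parameter values are illustrative.
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(n)
    for t in range(1, n):
        k = rng.choice(len(weights), p=weights)          # pick a mixture component
        x[t] = phis[k] * x[t - 1] + rng.normal(0.0, sigmas[k])
    return x

series = simulate_mar()
print(series[:5])
```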
Pluralistic Image Completion
Title | Pluralistic Image Completion |
Authors | Chuanxia Zheng, Tat-Jen Cham, Jianfei Cai |
Abstract | Most image completion methods produce only one result for each masked input, although there may be many reasonable possibilities. In this paper, we present an approach for pluralistic image completion – the task of generating multiple and diverse plausible solutions for image completion. A major challenge faced by learning-based approaches is that there is usually only one ground truth training instance per label. As such, sampling from conditional VAEs still leads to minimal diversity. To overcome this, we propose a novel and probabilistically principled framework with two parallel paths. One is a reconstructive path that utilizes the single given ground truth to obtain a prior distribution over the missing parts and rebuild the original image from this distribution. The other is a generative path, for which the conditional prior is coupled to the distribution obtained in the reconstructive path. Both are supported by GANs. We also introduce a new short+long term attention layer that exploits distant relations among decoder and encoder features, improving appearance consistency. When tested on datasets with buildings (Paris), faces (CelebA-HQ), and natural images (ImageNet), our method not only generates higher-quality completion results, but also produces multiple and diverse plausible outputs. |
Tasks | Image Inpainting |
Published | 2019-03-11 |
URL | http://arxiv.org/abs/1903.04227v2 |
http://arxiv.org/pdf/1903.04227v2.pdf | |
PWC | https://paperswithcode.com/paper/pluralistic-image-completion |
Repo | https://github.com/lyndonzheng/Pluralistic-Inpainting |
Framework | pytorch |
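A hedged sketch of the attention idea mentioned in the abstract: scaled dot-product attention that lets each decoder location attend to encoder feature locations. The paper's short+long term layer combines self- and cross-attention with more structure than this minimal version, and the feature shapes below are illustrative.

```python
import torch

def cross_feature_attention(decoder_feat, encoder_feat):
    """Attend from decoder locations to encoder locations (dot-product attention).

    decoder_feat, encoder_feat: (batch, channels, H, W) feature maps
    """
    b, c, h, w = decoder_feat.shape
    q = decoder_feat.flatten(2).transpose(1, 2)          # (b, HW, c) queries
    k = encoder_feat.flatten(2).transpose(1, 2)          # (b, HW, c) keys/values
    attn = torch.softmax(q @ k.transpose(1, 2) / c ** 0.5, dim=-1)
    out = attn @ k                                       # aggregate encoder features
    return out.transpose(1, 2).reshape(b, c, h, w)

dec = torch.randn(2, 64, 16, 16)
enc = torch.randn(2, 64, 16, 16)
print(cross_feature_attention(dec, enc).shape)           # (2, 64, 16, 16)
```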