January 31, 2020

Paper Group AWR 421

Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks

Title Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks
Authors Wei-Lin Chiang, Xuanqing Liu, Si Si, Yang Li, Samy Bengio, Cho-Jui Hsieh
Abstract Graph convolutional networks (GCNs) have been successfully applied to many graph-based applications; however, training a large-scale GCN remains challenging. Current SGD-based algorithms suffer from either a high computational cost that grows exponentially with the number of GCN layers, or a large space requirement for keeping the entire graph and the embedding of each node in memory. In this paper, we propose Cluster-GCN, a novel GCN algorithm that is suitable for SGD-based training by exploiting the graph clustering structure. Cluster-GCN works as follows: at each step, it samples a block of nodes associated with a dense subgraph identified by a graph clustering algorithm, and restricts the neighborhood search to this subgraph. This simple but effective strategy significantly improves memory and computational efficiency while achieving test accuracy comparable to previous algorithms. To test the scalability of our algorithm, we create a new Amazon2M dataset with 2 million nodes and 61 million edges, which is more than 5 times larger than the previous largest publicly available dataset (Reddit). For training a 3-layer GCN on this data, Cluster-GCN is faster than the previous state-of-the-art VR-GCN (1523 seconds vs 1961 seconds) and uses much less memory (2.2GB vs 11.2GB). Furthermore, for training a 4-layer GCN on this data, our algorithm can finish in around 36 minutes, while all existing GCN training algorithms fail to train due to out-of-memory issues. Cluster-GCN also allows us to train much deeper GCNs without much time and memory overhead, which leads to improved prediction accuracy: using a 5-layer Cluster-GCN, we achieve a state-of-the-art test F1 score of 99.36 on the PPI dataset, while the previous best result was 98.71 by [16]. Our code is publicly available at https://github.com/google-research/google-research/tree/master/cluster_gcn.
Tasks Graph Clustering, Link Prediction, Node Classification
Published 2019-05-20
URL https://arxiv.org/abs/1905.07953v2
PDF https://arxiv.org/pdf/1905.07953v2.pdf
PWC https://paperswithcode.com/paper/cluster-gcn-an-efficient-algorithm-for
Repo https://github.com/benedekrozemberczki/ClusterGCN
Framework pytorch
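The sampling step described in the abstract (pick a block of nodes from a precomputed clustering, then restrict neighborhood lookups to that block) can be sketched in plain Python. This is only an illustration under stated assumptions: the function name `cluster_minibatches`, the `clusters_per_batch` parameter, and the toy graph are hypothetical, and the clustering itself is taken as given rather than computed.

```python
import random


def cluster_minibatches(adjacency, clusters, clusters_per_batch=1, seed=0):
    """Yield (nodes, sub_adjacency) pairs, one per training step.

    adjacency: dict mapping node -> list of neighbour nodes.
    clusters:  list of node lists, assumed to come from a graph clustering step.
    """
    rng = random.Random(seed)
    order = list(range(len(clusters)))
    rng.shuffle(order)  # visit clusters in random order each epoch
    for start in range(0, len(order), clusters_per_batch):
        chosen = order[start:start + clusters_per_batch]
        nodes = sorted(n for c in chosen for n in clusters[c])
        node_set = set(nodes)
        # Keep only edges whose both endpoints lie inside the sampled block,
        # so neighbourhood expansion never leaves the subgraph.
        sub_adjacency = {
            u: [v for v in adjacency[u] if v in node_set] for u in nodes
        }
        yield nodes, sub_adjacency


# Toy graph (hypothetical): two triangles joined by the single edge 2-3.
adjacency = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}
clusters = [[0, 1, 2], [3, 4, 5]]
batches = list(cluster_minibatches(adjacency, clusters))
```

In this toy case each batch holds one triangle, and the edge between nodes 2 and 3 is dropped because its endpoints fall in different blocks. Memory per step then scales with the block size rather than with the full graph.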

Emerging Convolutions for Generative Normalizing Flows

Title Emerging Convolutions for Generative Normalizing Flows
Authors Emiel Hoogeboom, Rianne van den Berg, Max Welling
Abstract Generative flows are attractive because they admit exact likelihood optimization and efficient image synthesis. Recently, Kingma & Dhariwal (2018) demonstrated with Glow that generative flows are capable of generating high quality images. We generalize the 1 x 1 convolutions proposed in Glow to invertible d x d convolutions, which are more flexible since they operate on both channel and spatial axes. We propose two methods to produce invertible convolutions that have receptive fields identical to standard convolutions: Emerging convolutions are obtained by chaining specific autoregressive convolutions, and periodic convolutions are decoupled in the frequency domain. Our experiments show that the flexibility of d x d convolutions significantly improves the performance of generative flow models on galaxy images, CIFAR10 and ImageNet.
Tasks Image Generation
Published 2019-01-30
URL https://arxiv.org/abs/1901.11137v3
PDF https://arxiv.org/pdf/1901.11137v3.pdf
PWC https://paperswithcode.com/paper/emerging-convolutions-for-generative
Repo https://github.com/ehoogeboom/emerging
Framework tf
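To see why an autoregressive convolution is invertible and has a cheap likelihood term, consider a one-dimensional, single-channel simplification. This is not the paper's 2D construction, only a sketch: the function names and the example kernel are hypothetical. Each output depends on the current input and earlier inputs only, so the Jacobian is triangular with the first kernel weight on the diagonal.

```python
import math


def causal_conv(x, w):
    """y[t] = sum_k w[k] * x[t - k], using only current and past inputs."""
    return [
        sum(w[k] * x[t - k] for k in range(len(w)) if t - k >= 0)
        for t in range(len(x))
    ]


def causal_conv_inverse(y, w):
    """Recover x from y by back-substitution; requires w[0] != 0."""
    x = []
    for t in range(len(y)):
        past = sum(w[k] * x[t - k] for k in range(1, len(w)) if t - k >= 0)
        x.append((y[t] - past) / w[0])
    return x


def log_abs_det_jacobian(w, length):
    """The Jacobian is lower triangular with w[0] on every diagonal entry."""
    return length * math.log(abs(w[0]))


w = [2.0, -0.5, 0.25]       # hypothetical kernel, w[0] must be non-zero
x = [1.0, -2.0, 0.5, 3.0]
y = causal_conv(x, w)
x_recovered = causal_conv_inverse(y, w)
```

The inverse runs sequentially, one position at a time, while the forward pass and the log-determinant are cheap. This asymmetry is typical of autoregressive transforms.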

Automatic Temporally Coherent Video Colorization

Title Automatic Temporally Coherent Video Colorization
Authors Harrish Thasarathan, Kamyar Nazeri, Mehran Ebrahimi
Abstract Greyscale image colorization for applications in image restoration has seen significant improvements in recent years. Many of these techniques that use learning-based methods struggle to effectively colorize sparse inputs. With the consistent growth of the anime industry, the ability to colorize sparse input such as line art can reduce significant cost and redundant work for production studios by eliminating the in-between frame colorization process. Simply using existing methods yields inconsistent colors between related frames resulting in a flicker effect in the final video. In order to successfully automate key areas of large-scale anime production, the colorization of line arts must be temporally consistent between frames. This paper proposes a method to colorize line art frames in an adversarial setting, to create temporally coherent video of large anime by improving existing image to image translation methods. We show that by adding an extra condition to the generator and discriminator, we can effectively create temporally consistent video sequences from anime line arts. Code and models available at: https://github.com/Harry-Thasarathan/TCVC
Tasks Colorization, Image Restoration, Image-to-Image Translation
Published 2019-04-21
URL http://arxiv.org/abs/1904.09527v1
PDF http://arxiv.org/pdf/1904.09527v1.pdf
PWC https://paperswithcode.com/paper/automatic-temporally-coherent-video
Repo https://github.com/iver56/automatic-video-colorization
Framework pytorch

SCL: Towards Accurate Domain Adaptive Object Detection via Gradient Detach Based Stacked Complementary Losses

Title SCL: Towards Accurate Domain Adaptive Object Detection via Gradient Detach Based Stacked Complementary Losses
Authors Zhiqiang Shen, Harsh Maheshwari, Weichen Yao, Marios Savvides
Abstract Unsupervised domain adaptive object detection aims to learn a robust detector in the domain shift circumstance, where the training (source) domain is label-rich with bounding box annotations, while the testing (target) domain is label-agnostic and the feature distributions between training and testing domains are dissimilar or even totally different. In this paper, we propose a gradient detach based stacked complementary losses (SCL) method that uses detection losses as the primary objective, and cuts in several auxiliary losses at different network stages, together with gradient detach training, to learn more discriminative representations. We argue that prior methods mainly leverage more loss functions for training but ignore the interaction between different losses and the need for a compatible training strategy (gradient detach updating in our work). Thus, our proposed method is a more syncretic adaptation learning process. We conduct comprehensive experiments on seven datasets; the results demonstrate that our method outperforms state-of-the-art methods by a significant margin. For instance, from Cityscapes to FoggyCityscapes, we achieve 37.9% mAP, outperforming the previous art Strong-Weak by 3.6%.
Tasks Object Detection
Published 2019-11-06
URL https://arxiv.org/abs/1911.02559v3
PDF https://arxiv.org/pdf/1911.02559v3.pdf
PWC https://paperswithcode.com/paper/scl-towards-accurate-domain-adaptive-object
Repo https://github.com/harsh-99/SCL
Framework pytorch

A Personalized Subreddit Recommendation Engine

Title A Personalized Subreddit Recommendation Engine
Authors Abhishek K Das, Nikhil Bhat, Sukanto Guha, Janvi Palan
Abstract This paper aims to improve upon the generic recommendations that Reddit provides for its users. We propose a novel personalized recommender system that learns from both the presence and the content of user-subreddit interactions, using implicit and explicit signals to provide robust recommendations.
Tasks Recommendation Systems
Published 2019-05-03
URL https://arxiv.org/abs/1905.01263v1
PDF https://arxiv.org/pdf/1905.01263v1.pdf
PWC https://paperswithcode.com/paper/a-personalized-subreddit-recommendation
Repo https://github.com/abkds/r-ecommender
Framework none

Multiple Light Source Dataset for Colour Research

Title Multiple Light Source Dataset for Colour Research
Authors Anna Smagina, Egor Ershov, Anton Grigoryev
Abstract We present a collection of 24 multiple-object scenes, each recorded under 18 multiple-light-source illumination scenarios. The illuminants vary in dominant spectral colours, intensity and distance from the scene. We mainly address realistic scenarios for the evaluation of computational colour constancy algorithms, but have also aimed to make the data as general as possible for computational colour science and computer vision. Along with the images of the scenes, we provide spectral characteristics of the camera, light sources and objects, and include pixel-by-pixel ground truth annotation of uniformly coloured object surfaces, thus making the dataset useful for benchmarking colour-based image segmentation algorithms. The dataset is freely available at https://github.com/visillect/mls-dataset.
Tasks Semantic Segmentation
Published 2019-08-16
URL https://arxiv.org/abs/1908.06126v4
PDF https://arxiv.org/pdf/1908.06126v4.pdf
PWC https://paperswithcode.com/paper/multiple-light-source-dataset-for-colour
Repo https://github.com/Visillect/mls-dataset
Framework none

RDSNet: A New Deep Architecture for Reciprocal Object Detection and Instance Segmentation

Title RDSNet: A New Deep Architecture for Reciprocal Object Detection and Instance Segmentation
Authors Shaoru Wang, Yongchao Gong, Junliang Xing, Lichao Huang, Chang Huang, Weiming Hu
Abstract Object detection and instance segmentation are two fundamental computer vision tasks. They are closely correlated but their relations have not yet been fully explored in most previous work. This paper presents RDSNet, a novel deep architecture for reciprocal object detection and instance segmentation. To reciprocate these two tasks, we design a two-stream structure to learn features on both the object level (i.e., bounding boxes) and the pixel level (i.e., instance masks) jointly. Within this structure, information from the two streams is fused alternately, namely information on the object level introduces the awareness of instance and translation variance to the pixel level, and information on the pixel level refines the localization accuracy of objects on the object level in return. Specifically, a correlation module and a cropping module are proposed to yield instance masks, as well as a mask based boundary refinement module for more accurate bounding boxes. Extensive experimental analyses and comparisons on the COCO dataset demonstrate the effectiveness and efficiency of RDSNet. The source code is available at https://github.com/wangsr126/RDSNet.
Tasks Instance Segmentation, Object Detection, Semantic Segmentation
Published 2019-12-11
URL https://arxiv.org/abs/1912.05070v1
PDF https://arxiv.org/pdf/1912.05070v1.pdf
PWC https://paperswithcode.com/paper/rdsnet-a-new-deep-architecture-for-reciprocal
Repo https://github.com/wangsr126/RDSNet
Framework pytorch

Variational AutoEncoder For Regression: Application to Brain Aging Analysis

Title Variational AutoEncoder For Regression: Application to Brain Aging Analysis
Authors Qingyu Zhao, Ehsan Adeli, Nicolas Honnorat, Tuo Leng, Kilian M. Pohl
Abstract While unsupervised variational autoencoders (VAE) have become a powerful tool in neuroimage analysis, their application to supervised learning is under-explored. We aim to close this gap by proposing a unified probabilistic model for learning the latent space of imaging data and performing supervised regression. Based on recent advances in learning disentangled representations, the novel generative process explicitly models the conditional distribution of latent representations with respect to the regression target variable. Performing a variational inference procedure on this model leads to joint regularization between the VAE and a neural-network regressor. In predicting the age of 245 subjects from their structural Magnetic Resonance (MR) images, our model is more accurate than state-of-the-art methods when applied to either region-of-interest (ROI) measurements or raw 3D volume images. More importantly, unlike simple feed-forward neural-networks, disentanglement of age in latent representations allows for intuitive interpretation of the structural developmental patterns of the human brain.
Tasks
Published 2019-04-11
URL https://arxiv.org/abs/1904.05948v2
PDF https://arxiv.org/pdf/1904.05948v2.pdf
PWC https://paperswithcode.com/paper/variational-autoencoder-for-regression
Repo https://github.com/QingyuZhao/VAE-for-Regression
Framework none

Generative Smoke Removal

Title Generative Smoke Removal
Authors Oleksii Sidorov, Congcong Wang, Faouzi Alaya Cheikh
Abstract In minimally invasive surgery, the use of tissue dissection tools causes smoke, which inevitably degrades the image quality. This can reduce the visibility of the operation field for surgeons and introduce errors into the computer vision algorithms used in surgical navigation systems. In this paper, we propose a novel approach for computational smoke removal using supervised image-to-image translation. We demonstrate that a straightforward application of existing generative algorithms removes smoke but decreases image quality and introduces synthetic noise (a grid structure). We therefore propose to solve this issue by modifying the GAN's architecture and adding a perceptual image quality metric to the loss function. The results demonstrate that the proposed method removes smoke efficiently while preserving perceptually sufficient image quality.
Tasks Image-to-Image Translation
Published 2019-02-01
URL https://arxiv.org/abs/1902.00311v2
PDF https://arxiv.org/pdf/1902.00311v2.pdf
PWC https://paperswithcode.com/paper/generative-smoke-removal
Repo https://github.com/acecreamu/ssim-pan
Framework pytorch

Objects as Points

Title Objects as Points
Authors Xingyi Zhou, Dequan Wang, Philipp Krähenbühl
Abstract Detection identifies objects as axis-aligned boxes in an image. Most successful object detectors enumerate a nearly exhaustive list of potential object locations and classify each. This is wasteful, inefficient, and requires additional post-processing. In this paper, we take a different approach. We model an object as a single point — the center point of its bounding box. Our detector uses keypoint estimation to find center points and regresses to all other object properties, such as size, 3D location, orientation, and even pose. Our center point based approach, CenterNet, is end-to-end differentiable, simpler, faster, and more accurate than corresponding bounding box based detectors. CenterNet achieves the best speed-accuracy trade-off on the MS COCO dataset, with 28.1% AP at 142 FPS, 37.4% AP at 52 FPS, and 45.1% AP with multi-scale testing at 1.4 FPS. We use the same approach to estimate 3D bounding box in the KITTI benchmark and human pose on the COCO keypoint dataset. Our method performs competitively with sophisticated multi-stage methods and runs in real-time.
Tasks Keypoint Detection, Object Detection, Real-Time Object Detection
Published 2019-04-16
URL http://arxiv.org/abs/1904.07850v2
PDF http://arxiv.org/pdf/1904.07850v2.pdf
PWC https://paperswithcode.com/paper/objects-as-points
Repo https://github.com/xingyizhou/CenterNet
Framework pytorch
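The idea of treating each object as the peak of a center-point heatmap, with the box size regressed at that peak, can be sketched as follows. The 3x3 strict local-maximum rule, the score threshold, the function names, and the toy heatmap and size values are all assumptions made for illustration; they are not taken from the abstract.

```python
def find_centers(heatmap, threshold=0.5):
    """Return (x, y, score) for cells that beat all 8 neighbours."""
    height, width = len(heatmap), len(heatmap[0])
    centers = []
    for y in range(height):
        for x in range(width):
            score = heatmap[y][x]
            if score < threshold:
                continue
            neighbours = [
                heatmap[j][i]
                for j in range(max(0, y - 1), min(height, y + 2))
                for i in range(max(0, x - 1), min(width, x + 2))
                if (j, i) != (y, x)
            ]
            # Strict comparison: cells tied with a neighbour are not peaks.
            if all(score > n for n in neighbours):
                centers.append((x, y, score))
    return centers


def centers_to_boxes(centers, sizes):
    """Turn each center into (x1, y1, x2, y2, score) using a regressed size."""
    boxes = []
    for x, y, score in centers:
        box_w, box_h = sizes[(x, y)]
        boxes.append((x - box_w / 2, y - box_h / 2,
                      x + box_w / 2, y + box_h / 2, score))
    return boxes


# Hypothetical network outputs on a 4x4 grid.
heatmap = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.3, 0.1],
    [0.1, 0.3, 0.2, 0.6],
    [0.0, 0.1, 0.1, 0.2],
]
sizes = {(1, 1): (2.0, 2.0), (3, 2): (1.0, 2.0)}  # (width, height) per center
centers = find_centers(heatmap)
boxes = centers_to_boxes(centers, sizes)
```

Because every detection comes from a distinct peak, this style of decoding does not need a separate pass to enumerate and classify candidate boxes.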

Speeding up VP9 Intra Encoder with Hierarchical Deep Learning Based Partition Prediction

Title Speeding up VP9 Intra Encoder with Hierarchical Deep Learning Based Partition Prediction
Authors Somdyuti Paul, Andrey Norkin, Alan C. Bovik
Abstract In the VP9 video codec, the sizes of blocks are decided during encoding by recursively partitioning 64×64 superblocks using rate-distortion optimization (RDO). This process is computationally intensive because of the combinatorial search space of possible partitions of a superblock. Here, we propose a deep learning based alternative framework to predict the intra-mode superblock partitions in the form of a four-level partition tree, using a hierarchical fully convolutional network (H-FCN). We created a large database of VP9 superblocks and the corresponding partitions to train an H-FCN model, which was subsequently integrated with the VP9 encoder to reduce the intra-mode encoding time. The experimental results establish that our approach speeds up intra-mode encoding by 69.7% on average, at the expense of a 1.71% increase in the Bjontegaard-Delta bitrate (BD-rate). While VP9 provides several built-in speed levels which are designed to provide faster encoding at the expense of decreased rate-distortion performance, we find that our model is able to outperform the fastest recommended speed level of the reference VP9 encoder for the good quality intra encoding configuration, in terms of both speedup and BD-rate.
Tasks
Published 2019-06-15
URL https://arxiv.org/abs/1906.06476v1
PDF https://arxiv.org/pdf/1906.06476v1.pdf
PWC https://paperswithcode.com/paper/speeding-up-vp9-intra-encoder-with
Repo https://github.com/Somdyuti2/H-FCN
Framework tf
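The recursive partitioning of a 64×64 superblock can be pictured as a quadtree walk in which a decision function chooses whether to split each block. The sketch below is a simplification under stated assumptions: it models only square four-way splits (VP9 also permits horizontal and vertical splits, which are omitted here), the `min_size=4` default is chosen so that split decisions occur at four block sizes, and `should_split` is a hypothetical stand-in for either the RDO search or a learned predictor.

```python
def partition(x, y, size, should_split, min_size=4):
    """Return the leaf blocks (x, y, size) of a quadtree partition."""
    if size > min_size and should_split(x, y, size):
        half = size // 2
        blocks = []
        for dy in (0, half):
            for dx in (0, half):
                blocks.extend(
                    partition(x + dx, y + dy, half, should_split, min_size)
                )
        return blocks
    return [(x, y, size)]


def toy_predictor(x, y, size):
    """Hypothetical decision rule: split the superblock, then its top-left child."""
    return size > 32 or (x == 0 and y == 0 and size > 16)


blocks = partition(0, 0, 64, toy_predictor)
```

Replacing an exhaustive search with one cheap call per node is where the encoding-time saving would come from; the cost is that a wrong split decision can no longer be corrected by comparing rate-distortion costs.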

Towards Better Forecasting by Fusing Near and Distant Future Visions

Title Towards Better Forecasting by Fusing Near and Distant Future Visions
Authors Jiezhu Cheng, Kaizhu Huang, Zibin Zheng
Abstract Multivariate time series forecasting is an important yet challenging problem in machine learning. Most existing approaches only forecast the series value of one future moment, ignoring the interactions between predictions of future moments at different temporal distances. Such a deficiency probably prevents the model from getting enough information about the future, thus limiting the forecasting accuracy. To address this problem, we propose the Multi-Level Construal Neural Network (MLCNN), a novel multi-task deep learning framework. Inspired by the Construal Level Theory of psychology, this model aims to improve predictive performance by fusing forecasting information (i.e., future visions) for different future times. We first use a convolutional neural network to extract multi-level abstract representations of the raw data for near and distant future predictions. We then model the interplay between multiple predictive tasks and fuse their future visions through a modified Encoder-Decoder architecture. Finally, we combine a traditional autoregressive model with the neural network to solve the scale-insensitivity problem. Experiments on three real-world datasets show that our method achieves statistically significant improvements over state-of-the-art baseline methods, with an average 4.59% reduction in RMSE and an average 6.87% reduction in MAE.
Tasks Multivariate Time Series Forecasting, Time Series, Time Series Forecasting
Published 2019-12-11
URL https://arxiv.org/abs/1912.05122v1
PDF https://arxiv.org/pdf/1912.05122v1.pdf
PWC https://paperswithcode.com/paper/towards-better-forecasting-by-fusing-near-and
Repo https://github.com/smallGum/MLCNN-Multivariate-Time-Series
Framework pytorch

Long and Diverse Text Generation with Planning-based Hierarchical Variational Model

Title Long and Diverse Text Generation with Planning-based Hierarchical Variational Model
Authors Zhihong Shao, Minlie Huang, Jiangtao Wen, Wenfei Xu, Xiaoyan Zhu
Abstract Existing neural methods for data-to-text generation are still struggling to produce long and diverse texts: they are insufficient to model input data dynamically during generation, to capture inter-sentence coherence, or to generate diversified expressions. To address these issues, we propose a Planning-based Hierarchical Variational Model (PHVM). Our model first plans a sequence of groups (each group is a subset of input items to be covered by a sentence) and then realizes each sentence conditioned on the planning result and the previously generated context, thereby decomposing long text generation into dependent sentence generation sub-tasks. To capture expression diversity, we devise a hierarchical latent structure where a global planning latent variable models the diversity of reasonable planning and a sequence of local latent variables controls sentence realization. Experiments show that our model outperforms state-of-the-art baselines in long and diverse text generation.
Tasks Data-to-Text Generation, Latent Variable Models, Text Generation
Published 2019-08-19
URL https://arxiv.org/abs/1908.06605v2
PDF https://arxiv.org/pdf/1908.06605v2.pdf
PWC https://paperswithcode.com/paper/long-and-diverse-text-generation-with
Repo https://github.com/ZhihongShao/Planning-based-Hierarchical-Variational-Model
Framework tf

Lightweight Image Super-Resolution with Adaptive Weighted Learning Network

Title Lightweight Image Super-Resolution with Adaptive Weighted Learning Network
Authors Chaofeng Wang, Zheng Li, Jun Shi
Abstract PyTorch code for our paper “Lightweight Image Super-Resolution with Adaptive Weighted Learning Network”
Tasks Image Super-Resolution, Super-Resolution
Published 2019-04-04
URL http://arxiv.org/abs/1904.02358v1
PDF http://arxiv.org/pdf/1904.02358v1.pdf
PWC https://paperswithcode.com/paper/lightweight-image-super-resolution-with
Repo https://github.com/ChaofWang/AWSRN
Framework pytorch

Measuring the Reliability of Reinforcement Learning Algorithms

Title Measuring the Reliability of Reinforcement Learning Algorithms
Authors Stephanie C. Y. Chan, Samuel Fishman, John Canny, Anoop Korattikara, Sergio Guadarrama
Abstract Lack of reliability is a well-known issue for reinforcement learning (RL) algorithms. This problem has gained increasing attention in recent years, and efforts to improve it have grown substantially. To aid RL researchers and production users with the evaluation and improvement of reliability, we propose a set of metrics that quantitatively measure different aspects of reliability. In this work, we focus on variability and risk, both during training and after learning (on a fixed policy). We designed these metrics to be general-purpose, and we also designed complementary statistical tests to enable rigorous comparisons on these metrics. In this paper, we first describe the desired properties of the metrics and their design, the aspects of reliability that they measure, and their applicability to different scenarios. We then describe the statistical tests and make additional practical recommendations for reporting results. The metrics and accompanying statistical tools have been made available as an open-source library at https://github.com/google-research/rl-reliability-metrics. We apply our metrics to a set of common RL algorithms and environments, compare them, and analyze the results.
Tasks
Published 2019-12-10
URL https://arxiv.org/abs/1912.05663v2
PDF https://arxiv.org/pdf/1912.05663v2.pdf
PWC https://paperswithcode.com/paper/measuring-the-reliability-of-reinforcement-1
Repo https://github.com/google-research/rl-reliability-metrics
Framework tf
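As a rough illustration of what measuring "variability" and "risk" over a set of training runs could look like, the sketch below computes an interquartile range and a lower-tail average over final returns. The choice of these two statistics, the function names, and the sample data are assumptions made for this example; the abstract does not specify the metrics, and the linked library should be consulted for the actual definitions.

```python
import statistics


def interquartile_range(values):
    """Spread of the middle half of the data; a robust dispersion measure."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


def lower_tail_mean(values, alpha=0.25):
    """Mean of the worst alpha fraction of values (a CVaR-style risk measure)."""
    ordered = sorted(values)
    k = max(1, int(len(ordered) * alpha))
    return sum(ordered[:k]) / k


# Hypothetical final returns from eight independent training runs.
final_returns = [10.0, 12.0, 11.0, 3.0, 13.0, 12.0, 11.0, 9.0]
dispersion = interquartile_range(final_returns)
risk = lower_tail_mean(final_returns)
```

A single collapsed run (the 3.0 above) barely moves the interquartile range but dominates the lower-tail mean, which is why reporting both kinds of statistic gives a fuller picture than a mean alone.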