Paper Group AWR 421
Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks. Emerging Convolutions for Generative Normalizing Flows. Automatic Temporally Coherent Video Colorization. SCL: Towards Accurate Domain Adaptive Object Detection via Gradient Detach Based Stacked Complementary Losses. A Personalized Subreddit Recommendation …
Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks
Title | Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks |
Authors | Wei-Lin Chiang, Xuanqing Liu, Si Si, Yang Li, Samy Bengio, Cho-Jui Hsieh |
Abstract | Graph convolutional networks (GCNs) have been successfully applied to many graph-based applications; however, training a large-scale GCN remains challenging. Current SGD-based algorithms suffer from either a high computational cost that grows exponentially with the number of GCN layers, or a large space requirement for keeping the entire graph and the embedding of each node in memory. In this paper, we propose Cluster-GCN, a novel GCN algorithm that is suitable for SGD-based training by exploiting the graph clustering structure. Cluster-GCN works as follows: at each step, it samples a block of nodes associated with a dense subgraph identified by a graph clustering algorithm, and restricts the neighborhood search to this subgraph. This simple but effective strategy significantly improves memory and computational efficiency while achieving test accuracy comparable to previous algorithms. To test the scalability of our algorithm, we create a new Amazon2M dataset with 2 million nodes and 61 million edges, more than 5 times larger than the previous largest publicly available dataset (Reddit). For training a 3-layer GCN on this data, Cluster-GCN is faster than the previous state-of-the-art VR-GCN (1523 seconds vs. 1961 seconds) and uses much less memory (2.2GB vs. 11.2GB). For training a 4-layer GCN on this data, our algorithm finishes in around 36 minutes, while all existing GCN training algorithms fail due to out-of-memory issues. Furthermore, Cluster-GCN allows us to train much deeper GCNs without much time and memory overhead, which leads to improved prediction accuracy: using a 5-layer Cluster-GCN, we achieve a state-of-the-art test F1 score of 99.36 on the PPI dataset, while the previous best result was 98.71 by [16]. Our code is publicly available at https://github.com/google-research/google-research/tree/master/cluster_gcn. |
Tasks | Graph Clustering, Link Prediction, Node Classification |
Published | 2019-05-20 |
URL | https://arxiv.org/abs/1905.07953v2 |
PDF | https://arxiv.org/pdf/1905.07953v2.pdf |
PWC | https://paperswithcode.com/paper/cluster-gcn-an-efficient-algorithm-for |
Repo | https://github.com/benedekrozemberczki/ClusterGCN |
Framework | pytorch |
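
The clustered training step at the heart of the abstract lends itself to a compact illustration. Below is a minimal sketch (not the released code) assuming dense tensors, a toy 2-layer GCN, and precomputed cluster assignments; the paper itself uses METIS partitions and sparse operations.

```python
import torch
import torch.nn.functional as F

def normalize_adj(A):
    # Symmetrically normalize with self-loops: D^{-1/2} (A + I) D^{-1/2}
    A_hat = A + torch.eye(A.size(0))
    d = A_hat.sum(dim=1).pow(-0.5)
    return d.unsqueeze(1) * A_hat * d.unsqueeze(0)

class GCN(torch.nn.Module):
    def __init__(self, in_dim, hid_dim, out_dim):
        super().__init__()
        self.w1 = torch.nn.Linear(in_dim, hid_dim)
        self.w2 = torch.nn.Linear(hid_dim, out_dim)

    def forward(self, A_norm, X):
        h = F.relu(A_norm @ self.w1(X))
        return A_norm @ self.w2(h)

def train_cluster_gcn(A, X, y, clusters, epochs=10):
    # clusters: list of LongTensors of node ids, one per graph partition.
    model = GCN(X.size(1), 64, int(y.max()) + 1)
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    for _ in range(epochs):
        for idx in clusters:
            # Restrict the nodes and the neighborhood search to one
            # cluster's subgraph: the key memory-saving step.
            A_sub = normalize_adj(A[idx][:, idx])
            loss = F.cross_entropy(model(A_sub, X[idx]), y[idx])
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```

Because each step touches only one cluster's subgraph, the memory footprint scales with the cluster size rather than with the exponential neighborhood expansion of layer-wise sampling.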
Emerging Convolutions for Generative Normalizing Flows
Title | Emerging Convolutions for Generative Normalizing Flows |
Authors | Emiel Hoogeboom, Rianne van den Berg, Max Welling |
Abstract | Generative flows are attractive because they admit exact likelihood optimization and efficient image synthesis. Recently, Kingma & Dhariwal (2018) demonstrated with Glow that generative flows are capable of generating high quality images. We generalize the 1×1 convolutions proposed in Glow to invertible d×d convolutions, which are more flexible since they operate on both channel and spatial axes. We propose two methods to produce invertible convolutions that have receptive fields identical to standard convolutions: Emerging convolutions are obtained by chaining specific autoregressive convolutions, and periodic convolutions are decoupled in the frequency domain. Our experiments show that the flexibility of d×d convolutions significantly improves the performance of generative flow models on galaxy images, CIFAR10 and ImageNet. |
Tasks | Image Generation |
Published | 2019-01-30 |
URL | https://arxiv.org/abs/1901.11137v3 |
PDF | https://arxiv.org/pdf/1901.11137v3.pdf |
PWC | https://paperswithcode.com/paper/emerging-convolutions-for-generative |
Repo | https://github.com/ehoogeboom/emerging |
Framework | tf |
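
The key construction, chaining two autoregressive convolutions so that their composition has the receptive field of a standard d×d convolution, can be checked numerically. The kernel shapes below are our illustrative choice, not the paper's exact parameterization.

```python
import numpy as np
from scipy.signal import convolve2d

# Two masked kernels, each autoregressive (strictly "causal" toward one
# corner, plus the center tap), chosen so their chained support is 3x3.
k1 = np.array([[1., 1., 0.],
               [1., 1., 0.],
               [0., 0., 0.]])  # upper-left causal, center tap at (1, 1)
k2 = np.array([[0., 0., 0.],
               [0., 1., 1.],
               [0., 1., 1.]])  # lower-right causal, center tap at (1, 1)

combined = convolve2d(k1, k2)          # support of the chained convolution
print((combined[1:4, 1:4] > 0).all())  # True: a full 3x3 receptive field
print(combined.sum() == combined[1:4, 1:4].sum())  # nothing outside 3x3
```

Each factor has a triangular Jacobian under the corresponding pixel ordering, so its log-determinant reduces to the (nonzero) center weights and inversion can proceed pixel by pixel.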
Automatic Temporally Coherent Video Colorization
Title | Automatic Temporally Coherent Video Colorization |
Authors | Harrish Thasarathan, Kamyar Nazeri, Mehran Ebrahimi |
Abstract | Greyscale image colorization for applications in image restoration has seen significant improvements in recent years. Many learning-based techniques struggle to effectively colorize sparse inputs. With the consistent growth of the anime industry, the ability to colorize sparse input such as line art can significantly reduce cost and redundant work for production studios by eliminating the in-between frame colorization process. Simply using existing methods yields inconsistent colors between related frames, resulting in a flicker effect in the final video. In order to successfully automate key areas of large-scale anime production, the colorization of line art must be temporally consistent between frames. This paper proposes a method to colorize line-art frames in an adversarial setting, creating temporally coherent video by improving on existing image-to-image translation methods. We show that by adding an extra condition to the generator and discriminator, we can effectively create temporally consistent video sequences from anime line art. Code and models are available at: https://github.com/Harry-Thasarathan/TCVC |
Tasks | Colorization, Image Restoration, Image-to-Image Translation |
Published | 2019-04-21 |
URL | http://arxiv.org/abs/1904.09527v1 |
PDF | http://arxiv.org/pdf/1904.09527v1.pdf |
PWC | https://paperswithcode.com/paper/automatic-temporally-coherent-video |
Repo | https://github.com/iver56/automatic-video-colorization |
Framework | pytorch |
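
The "extra condition" mentioned in the abstract can be read as feeding the previously colorized frame back into the generator. A hedged sketch of that loop, assuming a generator that accepts a 4-channel input (the interface is our assumption, not the repo's API):

```python
import torch

def colorize_sequence(generator, line_frames):
    # line_frames: list of (1, 1, H, W) greyscale line-art tensors.
    h, w = line_frames[0].shape[-2:]
    prev = torch.zeros(1, 3, h, w)   # blank condition for the first frame
    outputs = []
    for line in line_frames:
        inp = torch.cat([line, prev], dim=1)  # 1 line-art + 3 condition channels
        prev = generator(inp)                 # (1, 3, H, W) colorized frame
        outputs.append(prev)
    return outputs
```

Conditioning each frame on the previous output is what suppresses the flicker effect the abstract describes.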
SCL: Towards Accurate Domain Adaptive Object Detection via Gradient Detach Based Stacked Complementary Losses
Title | SCL: Towards Accurate Domain Adaptive Object Detection via Gradient Detach Based Stacked Complementary Losses |
Authors | Zhiqiang Shen, Harsh Maheshwari, Weichen Yao, Marios Savvides |
Abstract | Unsupervised domain adaptive object detection aims to learn a robust detector under domain shift, where the training (source) domain is label-rich with bounding box annotations, while the testing (target) domain is unlabeled and the feature distributions between the two domains are dissimilar or even totally different. In this paper, we propose a gradient-detach-based stacked complementary losses (SCL) method that uses detection losses as the primary objective and inserts several auxiliary losses at different network stages, accompanied by gradient-detach training, to learn more discriminative representations. We argue that prior methods mainly add more loss functions for training but ignore the interaction between different losses and a compatible training strategy (gradient-detach updating in our work). Our proposed method is thus a more syncretic adaptation learning process. We conduct comprehensive experiments on seven datasets; the results demonstrate that our method outperforms state-of-the-art methods by a significant margin. For instance, from Cityscapes to FoggyCityscapes, we achieve 37.9% mAP, outperforming the previous art Strong-Weak by 3.6%. |
Tasks | Object Detection |
Published | 2019-11-06 |
URL | https://arxiv.org/abs/1911.02559v3 |
PDF | https://arxiv.org/pdf/1911.02559v3.pdf |
PWC | https://paperswithcode.com/paper/scl-towards-accurate-domain-adaptive-object |
Repo | https://github.com/harsh-99/SCL |
Framework | pytorch |
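
A generic illustration of the gradient-detach idea follows; this is our reading of the mechanism, not the authors' exact wiring, and `ctx_branch`/`aux_head` are hypothetical stand-ins for the paper's sub-networks.

```python
import torch

def scl_forward(backbone, det_head, ctx_branch, aux_head, x):
    feat = backbone(x)
    aux_logits = aux_head(feat)   # complementary (domain) loss is computed here
    context = ctx_branch(feat)    # context vector from the auxiliary branch
    # Gradient detach: the detector consumes the context vector, but
    # detection gradients stop at the cut instead of flowing back into
    # the auxiliary branch, so the two objectives shape their own branches.
    det_out = det_head(feat, context.detach())
    return det_out, aux_logits
```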
A Personalized Subreddit Recommendation Engine
Title | A Personalized Subreddit Recommendation Engine |
Authors | Abhishek K Das, Nikhil Bhat, Sukanto Guha, Janvi Palan |
Abstract | This paper aims to improve upon the generic recommendations that Reddit provides for its users. We propose a novel personalized recommender system that learns from both the presence and the content of user-subreddit interactions, using implicit and explicit signals to provide robust recommendations. |
Tasks | Recommendation Systems |
Published | 2019-05-03 |
URL | https://arxiv.org/abs/1905.01263v1 |
PDF | https://arxiv.org/pdf/1905.01263v1.pdf |
PWC | https://paperswithcode.com/paper/a-personalized-subreddit-recommendation |
Repo | https://github.com/abkds/r-ecommender |
Framework | none |
Multiple Light Source Dataset for Colour Research
Title | Multiple Light Source Dataset for Colour Research |
Authors | Anna Smagina, Egor Ershov, Anton Grigoryev |
Abstract | We present a collection of 24 multiple-object scenes, each recorded under 18 multiple-light-source illumination scenarios. The illuminants vary in dominant spectral colour, intensity and distance from the scene. We mainly address realistic scenarios for the evaluation of computational colour constancy algorithms, but have also aimed to make the data as general as possible for computational colour science and computer vision. Along with the images of the scenes, we provide spectral characteristics of the camera, light sources and objects, and include pixel-by-pixel ground-truth annotation of uniformly coloured object surfaces, making the dataset useful for benchmarking colour-based image segmentation algorithms. The dataset is freely available at https://github.com/visillect/mls-dataset. |
Tasks | Semantic Segmentation |
Published | 2019-08-16 |
URL | https://arxiv.org/abs/1908.06126v4 |
PDF | https://arxiv.org/pdf/1908.06126v4.pdf |
PWC | https://paperswithcode.com/paper/multiple-light-source-dataset-for-colour |
Repo | https://github.com/Visillect/mls-dataset |
Framework | none |
RDSNet: A New Deep Architecture for Reciprocal Object Detection and Instance Segmentation
Title | RDSNet: A New Deep Architecture for Reciprocal Object Detection and Instance Segmentation |
Authors | Shaoru Wang, Yongchao Gong, Junliang Xing, Lichao Huang, Chang Huang, Weiming Hu |
Abstract | Object detection and instance segmentation are two fundamental computer vision tasks. They are closely correlated but their relations have not yet been fully explored in most previous work. This paper presents RDSNet, a novel deep architecture for reciprocal object detection and instance segmentation. To reciprocate these two tasks, we design a two-stream structure to learn features on both the object level (i.e., bounding boxes) and the pixel level (i.e., instance masks) jointly. Within this structure, information from the two streams is fused alternately, namely information on the object level introduces the awareness of instance and translation variance to the pixel level, and information on the pixel level refines the localization accuracy of objects on the object level in return. Specifically, a correlation module and a cropping module are proposed to yield instance masks, as well as a mask based boundary refinement module for more accurate bounding boxes. Extensive experimental analyses and comparisons on the COCO dataset demonstrate the effectiveness and efficiency of RDSNet. The source code is available at https://github.com/wangsr126/RDSNet. |
Tasks | Instance Segmentation, Object Detection, Semantic Segmentation |
Published | 2019-12-11 |
URL | https://arxiv.org/abs/1912.05070v1 |
PDF | https://arxiv.org/pdf/1912.05070v1.pdf |
PWC | https://paperswithcode.com/paper/rdsnet-a-new-deep-architecture-for-reciprocal |
Repo | https://github.com/wangsr126/RDSNet |
Framework | pytorch |
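
The correlation module described in the abstract can be sketched as a dot product between per-object embeddings and the pixel-level feature map; the shapes and names below are assumptions for illustration, not the released code.

```python
import torch

def correlate_masks(obj_embed, pixel_feat):
    # obj_embed: (N, C), one embedding per detected box (object stream).
    # pixel_feat: (C, H, W), the pixel-level feature stream.
    C, H, W = pixel_feat.shape
    logits = obj_embed @ pixel_feat.reshape(C, H * W)  # (N, H*W) correlations
    return torch.sigmoid(logits).reshape(-1, H, W)     # one mask per object
```

This is the direction in which object-level information introduces instance awareness at the pixel level; the paper's cropping and boundary-refinement modules close the loop in the other direction.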
Variational AutoEncoder For Regression: Application to Brain Aging Analysis
Title | Variational AutoEncoder For Regression: Application to Brain Aging Analysis |
Authors | Qingyu Zhao, Ehsan Adeli, Nicolas Honnorat, Tuo Leng, Kilian M. Pohl |
Abstract | While unsupervised variational autoencoders (VAE) have become a powerful tool in neuroimage analysis, their application to supervised learning is under-explored. We aim to close this gap by proposing a unified probabilistic model for learning the latent space of imaging data and performing supervised regression. Based on recent advances in learning disentangled representations, the novel generative process explicitly models the conditional distribution of latent representations with respect to the regression target variable. Performing a variational inference procedure on this model leads to joint regularization between the VAE and a neural-network regressor. In predicting the age of 245 subjects from their structural Magnetic Resonance (MR) images, our model is more accurate than state-of-the-art methods when applied to either region-of-interest (ROI) measurements or raw 3D volume images. More importantly, unlike simple feed-forward neural-networks, disentanglement of age in latent representations allows for intuitive interpretation of the structural developmental patterns of the human brain. |
Tasks | |
Published | 2019-04-11 |
URL | https://arxiv.org/abs/1904.05948v2 |
PDF | https://arxiv.org/pdf/1904.05948v2.pdf |
PWC | https://paperswithcode.com/paper/variational-autoencoder-for-regression |
Repo | https://github.com/QingyuZhao/VAE-for-Regression |
Framework | none |
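
A minimal sketch of the joint objective as we read it: an ELBO whose latent code also feeds a regressor, so the two losses regularize each other. The encoder/decoder/regressor interfaces are assumed, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def joint_loss(x, age, encoder, decoder, regressor):
    mu, logvar = encoder(x)
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
    recon = F.mse_loss(decoder(z), x)                     # ELBO: reconstruction
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    reg = F.mse_loss(regressor(z), age)                   # supervised regression
    return recon + kl + reg
```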
Generative Smoke Removal
Title | Generative Smoke Removal |
Authors | Oleksii Sidorov, Congcong Wang, Faouzi Alaya Cheikh |
Abstract | In minimally invasive surgery, the use of tissue dissection tools causes smoke, which inevitably degrades image quality. This can reduce the visibility of the operating field for surgeons and introduce errors in the computer vision algorithms used in surgical navigation systems. In this paper, we propose a novel approach to computational smoke removal using supervised image-to-image translation. We demonstrate that a straightforward application of existing generative algorithms removes smoke but decreases image quality and introduces synthetic noise (a grid structure). We propose to solve this issue by modifying the GAN architecture and adding a perceptual image-quality metric to the loss function. The obtained results demonstrate that the proposed method efficiently removes smoke while preserving perceptually sufficient image quality. |
Tasks | Image-to-Image Translation |
Published | 2019-02-01 |
URL | https://arxiv.org/abs/1902.00311v2 |
PDF | https://arxiv.org/pdf/1902.00311v2.pdf |
PWC | https://paperswithcode.com/paper/generative-smoke-removal |
Repo | https://github.com/acecreamu/ssim-pan |
Framework | pytorch |
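
The modification described above, adding a perceptual image-quality term to the generator loss, might look like the following; `ssim` is a hypothetical stand-in for any differentiable SSIM implementation, and the loss weights are illustrative.

```python
import torch
import torch.nn.functional as F

def generator_loss(disc_fake, fake, real, ssim, lam_l1=100.0, lam_ssim=1.0):
    # disc_fake: discriminator logits on generated images.
    adv = F.binary_cross_entropy_with_logits(
        disc_fake, torch.ones_like(disc_fake))
    l1 = F.l1_loss(fake, real)
    perceptual = 1.0 - ssim(fake, real)  # reward structural similarity
    return adv + lam_l1 * l1 + lam_ssim * perceptual
```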
Objects as Points
Title | Objects as Points |
Authors | Xingyi Zhou, Dequan Wang, Philipp Krähenbühl |
Abstract | Detection identifies objects as axis-aligned boxes in an image. Most successful object detectors enumerate a nearly exhaustive list of potential object locations and classify each. This is wasteful, inefficient, and requires additional post-processing. In this paper, we take a different approach. We model an object as a single point: the center point of its bounding box. Our detector uses keypoint estimation to find center points and regresses to all other object properties, such as size, 3D location, orientation, and even pose. Our center-point-based approach, CenterNet, is end-to-end differentiable, simpler, faster, and more accurate than corresponding bounding-box-based detectors. CenterNet achieves the best speed-accuracy trade-off on the MS COCO dataset, with 28.1% AP at 142 FPS, 37.4% AP at 52 FPS, and 45.1% AP with multi-scale testing at 1.4 FPS. We use the same approach to estimate 3D bounding boxes in the KITTI benchmark and human pose on the COCO keypoint dataset. Our method performs competitively with sophisticated multi-stage methods and runs in real-time. |
Tasks | Keypoint Detection, Object Detection, Real-Time Object Detection |
Published | 2019-04-16 |
URL | http://arxiv.org/abs/1904.07850v2 |
PDF | http://arxiv.org/pdf/1904.07850v2.pdf |
PWC | https://paperswithcode.com/paper/objects-as-points |
Repo | https://github.com/xingyizhou/CenterNet |
Framework | pytorch |
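
The decoding step of the center-point approach is simple enough to sketch: local maxima of a class heatmap become detections, with sizes read off a regression map at the same locations. Shapes are assumptions, and the paper's sub-pixel offset regression is omitted for brevity.

```python
import torch
import torch.nn.functional as F

def decode_centers(heatmap, wh, top_k=100):
    # heatmap: (C, H, W) per-class center scores; wh: (2, H, W) box sizes.
    C, H, W = heatmap.shape
    pooled = F.max_pool2d(heatmap[None], 3, stride=1, padding=1)[0]
    peaks = heatmap * (pooled == heatmap).float()  # keep local maxima only
    scores, idx = peaks.flatten().topk(top_k)
    cls, rem = idx // (H * W), idx % (H * W)
    ys, xs = rem // W, rem % W
    w, h = wh[0, ys, xs], wh[1, ys, xs]
    boxes = torch.stack(
        [xs - w / 2, ys - h / 2, xs + w / 2, ys + h / 2], dim=1)
    return boxes, scores, cls
```

The max-pool trick replaces IoU-based non-maximum suppression, which is why the abstract can claim there is no additional post-processing.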
Speeding up VP9 Intra Encoder with Hierarchical Deep Learning Based Partition Prediction
Title | Speeding up VP9 Intra Encoder with Hierarchical Deep Learning Based Partition Prediction |
Authors | Somdyuti Paul, Andrey Norkin, Alan C. Bovik |
Abstract | In the VP9 video codec, the sizes of blocks are decided during encoding by recursively partitioning 64×64 superblocks using rate-distortion optimization (RDO). This process is computationally intensive because of the combinatorial search space of possible partitions of a superblock. Here, we propose a deep learning based alternative framework to predict the intra-mode superblock partitions in the form of a four-level partition tree, using a hierarchical fully convolutional network (H-FCN). We created a large database of VP9 superblocks and the corresponding partitions to train an H-FCN model, which was subsequently integrated with the VP9 encoder to reduce the intra-mode encoding time. The experimental results establish that our approach speeds up intra-mode encoding by 69.7% on average, at the expense of a 1.71% increase in the Bjontegaard-Delta bitrate (BD-rate). While VP9 provides several built-in speed levels which are designed to provide faster encoding at the expense of decreased rate-distortion performance, we find that our model is able to outperform the fastest recommended speed level of the reference VP9 encoder for the good quality intra encoding configuration, in terms of both speedup and BD-rate. |
Tasks | |
Published | 2019-06-15 |
URL | https://arxiv.org/abs/1906.06476v1 |
PDF | https://arxiv.org/pdf/1906.06476v1.pdf |
PWC | https://paperswithcode.com/paper/speeding-up-vp9-intra-encoder-with |
Repo | https://github.com/Somdyuti2/H-FCN |
Framework | tf |
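
The four-level partition tree predicted by the H-FCN can be represented compactly; the layout below is our assumption of a convenient encoding, while the partition types themselves (NONE/HORZ/VERT/SPLIT) come from VP9.

```python
import numpy as np

# Level l holds one 4-way decision per block of size 64 / 2**l; a child
# level is only consulted where its parent chose SPLIT.
NONE, HORZ, VERT, SPLIT = range(4)   # VP9 partition types
tree = {
    0: np.zeros((1, 1), dtype=np.int64),  # 64x64: one decision
    1: np.zeros((2, 2), dtype=np.int64),  # 32x32 blocks
    2: np.zeros((4, 4), dtype=np.int64),  # 16x16 blocks
    3: np.zeros((8, 8), dtype=np.int64),  # 8x8 blocks
}
```

Predicting all four levels in one forward pass is what lets the network replace the recursive RDO search.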
Towards Better Forecasting by Fusing Near and Distant Future Visions
Title | Towards Better Forecasting by Fusing Near and Distant Future Visions |
Authors | Jiezhu Cheng, Kaizhu Huang, Zibin Zheng |
Abstract | Multivariate time series forecasting is an important yet challenging problem in machine learning. Most existing approaches only forecast the series value at one future moment, ignoring the interactions between predictions at future moments with different temporal distances. Such a deficiency probably prevents the model from getting enough information about the future, thus limiting forecasting accuracy. To address this problem, we propose the Multi-Level Construal Neural Network (MLCNN), a novel multi-task deep learning framework. Inspired by the Construal Level Theory of psychology, this model aims to improve predictive performance by fusing forecasting information (i.e., future visions) for different future times. We first use a convolutional neural network to extract multi-level abstract representations of the raw data for near and distant future predictions. We then model the interplay between multiple predictive tasks and fuse their future visions through a modified encoder-decoder architecture. Finally, we combine a traditional autoregressive model with the neural network to address the scale-insensitivity problem. Experiments on three real-world datasets show that our method achieves statistically significant improvements over state-of-the-art baseline methods, with an average 4.59% reduction in RMSE and an average 6.87% reduction in MAE. |
Tasks | Multivariate Time Series Forecasting, Time Series, Time Series Forecasting |
Published | 2019-12-11 |
URL | https://arxiv.org/abs/1912.05122v1 |
PDF | https://arxiv.org/pdf/1912.05122v1.pdf |
PWC | https://paperswithcode.com/paper/towards-better-forecasting-by-fusing-near-and |
Repo | https://github.com/smallGum/MLCNN-Multivariate-Time-Series |
Framework | pytorch |
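
The final step in the abstract, combining an autoregressive model with the network to fix scale insensitivity, follows a well-known pattern (as in LSTNet-style models); the sketch below assumes that pattern rather than the paper's exact head.

```python
import torch

class ARSkip(torch.nn.Module):
    def __init__(self, p):
        super().__init__()
        self.ar = torch.nn.Linear(p, 1)  # one shared linear AR head
        self.p = p

    def forward(self, x, nn_forecast):
        # x: (batch, time, series); nn_forecast: (batch, series)
        tail = x[:, -self.p:, :].transpose(1, 2)  # (batch, series, p)
        return nn_forecast + self.ar(tail).squeeze(-1)
```

The linear path keeps the output responsive to the raw input scale, which purely nonlinear encoders tend to wash out.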
Long and Diverse Text Generation with Planning-based Hierarchical Variational Model
Title | Long and Diverse Text Generation with Planning-based Hierarchical Variational Model |
Authors | Zhihong Shao, Minlie Huang, Jiangtao Wen, Wenfei Xu, Xiaoyan Zhu |
Abstract | Existing neural methods for data-to-text generation still struggle to produce long and diverse texts: they are unable to model input data dynamically during generation, to capture inter-sentence coherence, or to generate diversified expressions. To address these issues, we propose a Planning-based Hierarchical Variational Model (PHVM). Our model first plans a sequence of groups (each group is a subset of input items to be covered by a sentence) and then realizes each sentence conditioned on the planning result and the previously generated context, thereby decomposing long text generation into dependent sentence-generation sub-tasks. To capture expression diversity, we devise a hierarchical latent structure where a global planning latent variable models the diversity of reasonable plans and a sequence of local latent variables controls sentence realization. Experiments show that our model outperforms state-of-the-art baselines in long and diverse text generation. |
Tasks | Data-to-Text Generation, Latent Variable Models, Text Generation |
Published | 2019-08-19 |
URL | https://arxiv.org/abs/1908.06605v2 |
PDF | https://arxiv.org/pdf/1908.06605v2.pdf |
PWC | https://paperswithcode.com/paper/long-and-diverse-text-generation-with |
Repo | https://github.com/ZhihongShao/Planning-based-Hierarchical-Variational-Model |
Framework | tf |
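
The plan-then-realize decomposition reads naturally as a two-stage loop. The sketch below uses a hypothetical interface (`sample_global_latent`, `plan_groups`, `sample_local_latent`, `realize`) purely to make the control flow concrete; it is not the released model's API.

```python
def generate(plan_model, sentence_decoder, input_items):
    z_plan = plan_model.sample_global_latent(input_items)   # plan diversity
    groups = plan_model.plan_groups(input_items, z_plan)    # item subsets
    context, text = [], []
    for g in groups:
        z_local = sentence_decoder.sample_local_latent(g, context)
        sent = sentence_decoder.realize(g, z_local, context)
        context.append(sent)
        text.append(sent)
    return " ".join(text)
```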
Lightweight Image Super-Resolution with Adaptive Weighted Learning Network
Title | Lightweight Image Super-Resolution with Adaptive Weighted Learning Network |
Authors | Chaofeng Wang, Zheng Li, Jun Shi |
Abstract | PyTorch code for our paper “Lightweight Image Super-Resolution with Adaptive Weighted Learning Network” |
Tasks | Image Super-Resolution, Super-Resolution |
Published | 2019-04-04 |
URL | http://arxiv.org/abs/1904.02358v1 |
PDF | http://arxiv.org/pdf/1904.02358v1.pdf |
PWC | https://paperswithcode.com/paper/lightweight-image-super-resolution-with |
Repo | https://github.com/ChaofWang/AWSRN |
Framework | pytorch |
Measuring the Reliability of Reinforcement Learning Algorithms
Title | Measuring the Reliability of Reinforcement Learning Algorithms |
Authors | Stephanie C. Y. Chan, Samuel Fishman, John Canny, Anoop Korattikara, Sergio Guadarrama |
Abstract | Lack of reliability is a well-known issue for reinforcement learning (RL) algorithms. This problem has gained increasing attention in recent years, and efforts to improve it have grown substantially. To aid RL researchers and production users with the evaluation and improvement of reliability, we propose a set of metrics that quantitatively measure different aspects of reliability. In this work, we focus on variability and risk, both during training and after learning (on a fixed policy). We designed these metrics to be general-purpose, and we also designed complementary statistical tests to enable rigorous comparisons on these metrics. In this paper, we first describe the desired properties of the metrics and their design, the aspects of reliability that they measure, and their applicability to different scenarios. We then describe the statistical tests and make additional practical recommendations for reporting results. The metrics and accompanying statistical tools have been made available as an open-source library at https://github.com/google-research/rl-reliability-metrics. We apply our metrics to a set of common RL algorithms and environments, compare them, and analyze the results. |
Tasks | |
Published | 2019-12-10 |
URL | https://arxiv.org/abs/1912.05663v2 |
PDF | https://arxiv.org/pdf/1912.05663v2.pdf |
PWC | https://paperswithcode.com/paper/measuring-the-reliability-of-reinforcement-1 |
Repo | https://github.com/google-research/rl-reliability-metrics |
Framework | tf |
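
As a flavor of the metrics, one member of the dispersion-across-runs family can be computed in a few lines; this is a toy sketch in the spirit of the paper, while the linked library implements the full, rigorously defined set with the accompanying statistical tests.

```python
import numpy as np

def dispersion_across_runs(curves):
    # curves: (n_runs, n_evals) array of evaluation returns, one row per
    # training run of the same algorithm on the same environment.
    q75, q25 = np.percentile(curves, [75, 25], axis=0)
    return q75 - q25  # per-evaluation-point inter-quartile range
```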