Paper Group AWR 421
Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks. Emerging Convolutions for Generative Normalizing Flows. Automatic Temporally Coherent Video Colorization. SCL: Towards Accurate Domain Adaptive Object Detection via Gradient Detach Based Stacked Complementary Losses. A Personalized Subreddit Recommendation …
Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks
Title | Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks |
Authors | Wei-Lin Chiang, Xuanqing Liu, Si Si, Yang Li, Samy Bengio, Cho-Jui Hsieh |
Abstract | Graph convolutional networks (GCNs) have been successfully applied to many graph-based applications; however, training a large-scale GCN remains challenging. Current SGD-based algorithms suffer from either a high computational cost that grows exponentially with the number of GCN layers, or a large space requirement for keeping the entire graph and the embedding of each node in memory. In this paper, we propose Cluster-GCN, a novel GCN algorithm that is suitable for SGD-based training by exploiting the graph clustering structure. Cluster-GCN works as follows: at each step, it samples a block of nodes associated with a dense subgraph identified by a graph clustering algorithm, and restricts the neighborhood search to this subgraph. This simple but effective strategy significantly improves memory and computational efficiency while achieving test accuracy comparable to previous algorithms. To test the scalability of our algorithm, we create a new Amazon2M dataset with 2 million nodes and 61 million edges, more than 5 times larger than the previous largest publicly available dataset (Reddit). For training a 3-layer GCN on this data, Cluster-GCN is faster than the previous state-of-the-art VR-GCN (1523 seconds vs. 1961 seconds) and uses much less memory (2.2GB vs. 11.2GB). For training a 4-layer GCN on this data, our algorithm finishes in around 36 minutes, while all existing GCN training algorithms fail due to out-of-memory issues. Furthermore, Cluster-GCN allows us to train much deeper GCNs without much time and memory overhead, which leads to improved prediction accuracy: using a 5-layer Cluster-GCN, we achieve a state-of-the-art test F1 score of 99.36 on the PPI dataset, while the previous best result was 98.71 by [16]. Our code is publicly available at https://github.com/google-research/google-research/tree/master/cluster_gcn. |
Tasks | Graph Clustering, Link Prediction, Node Classification |
Published | 2019-05-20 |
URL | https://arxiv.org/abs/1905.07953v2 |
PDF | https://arxiv.org/pdf/1905.07953v2.pdf |
PWC | https://paperswithcode.com/paper/cluster-gcn-an-efficient-algorithm-for |
Repo | https://github.com/benedekrozemberczki/ClusterGCN |
Framework | pytorch |
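
The clustered training step at the heart of the abstract lends itself to a compact illustration. Below is a minimal sketch (not the released code) assuming dense tensors, a toy 2-layer GCN, and precomputed cluster assignments; the paper itself uses METIS partitions and sparse operations.

```python
import torch
import torch.nn.functional as F

def normalize_adj(A):
    # Symmetrically normalize with self-loops: D^{-1/2} (A + I) D^{-1/2}
    A_hat = A + torch.eye(A.size(0))
    d = A_hat.sum(dim=1).pow(-0.5)
    return d.unsqueeze(1) * A_hat * d.unsqueeze(0)

class GCN(torch.nn.Module):
    def __init__(self, in_dim, hid_dim, out_dim):
        super().__init__()
        self.w1 = torch.nn.Linear(in_dim, hid_dim)
        self.w2 = torch.nn.Linear(hid_dim, out_dim)

    def forward(self, A_norm, X):
        h = F.relu(A_norm @ self.w1(X))
        return A_norm @ self.w2(h)

def train_cluster_gcn(A, X, y, clusters, epochs=10):
    # clusters: list of LongTensors of node ids, one per graph partition.
    model = GCN(X.size(1), 64, int(y.max()) + 1)
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    for _ in range(epochs):
        for idx in clusters:
            # Restrict the nodes and the neighborhood search to one
            # cluster's subgraph: the key memory-saving step.
            A_sub = normalize_adj(A[idx][:, idx])
            loss = F.cross_entropy(model(A_sub, X[idx]), y[idx])
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```

Because each step touches only one cluster's subgraph, the memory footprint scales with the cluster size rather than with the exponential neighborhood expansion of layer-wise sampling.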
Emerging Convolutions for Generative Normalizing Flows
Title | Emerging Convolutions for Generative Normalizing Flows |
Authors | Emiel Hoogeboom, Rianne van den Berg, Max Welling |
Abstract | Generative flows are attractive because they admit exact likelihood optimization and efficient image synthesis. Recently, Kingma & Dhariwal (2018) demonstrated with Glow that generative flows are capable of generating high quality images. We generalize the 1×1 convolutions proposed in Glow to invertible d×d convolutions, which are more flexible since they operate on both channel and spatial axes. We propose two methods to produce invertible convolutions that have receptive fields identical to standard convolutions: Emerging convolutions are obtained by chaining specific autoregressive convolutions, and periodic convolutions are decoupled in the frequency domain. Our experiments show that the flexibility of d×d convolutions significantly improves the performance of generative flow models on galaxy images, CIFAR10 and ImageNet. |
Tasks | Image Generation |
Published | 2019-01-30 |
URL | https://arxiv.org/abs/1901.11137v3 |
PDF | https://arxiv.org/pdf/1901.11137v3.pdf |
PWC | https://paperswithcode.com/paper/emerging-convolutions-for-generative |
Repo | https://github.com/ehoogeboom/emerging |
Framework | tf |
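
The key construction, chaining two autoregressive convolutions so that their composition has the receptive field of a standard d×d convolution, can be checked numerically. The kernel shapes below are our illustrative choice, not the paper's exact parameterization.

```python
import numpy as np
from scipy.signal import convolve2d

# Two masked kernels, each autoregressive (strictly "causal" toward one
# corner, plus the center tap), chosen so their chained support is 3x3.
k1 = np.array([[1., 1., 0.],
               [1., 1., 0.],
               [0., 0., 0.]])  # upper-left causal, center tap at (1, 1)
k2 = np.array([[0., 0., 0.],
               [0., 1., 1.],
               [0., 1., 1.]])  # lower-right causal, center tap at (1, 1)

combined = convolve2d(k1, k2)          # support of the chained convolution
print((combined[1:4, 1:4] > 0).all())  # True: a full 3x3 receptive field
print(combined.sum() == combined[1:4, 1:4].sum())  # nothing outside 3x3
```

Each factor has a triangular Jacobian under the corresponding pixel ordering, so its log-determinant reduces to the (nonzero) center weights and inversion can proceed pixel by pixel.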
Automatic Temporally Coherent Video Colorization
Title | Automatic Temporally Coherent Video Colorization |
Authors | Harrish Thasarathan, Kamyar Nazeri, Mehran Ebrahimi |
Abstract | Greyscale image colorization for applications in image restoration has seen significant improvements in recent years. Many learning-based techniques struggle to effectively colorize sparse inputs. With the consistent growth of the anime industry, the ability to colorize sparse input such as line art can significantly reduce cost and redundant work for production studios by eliminating the in-between frame colorization process. Simply using existing methods yields inconsistent colors between related frames, resulting in a flicker effect in the final video. In order to successfully automate key areas of large-scale anime production, the colorization of line art must be temporally consistent between frames. This paper proposes a method to colorize line-art frames in an adversarial setting, creating temporally coherent video by improving on existing image-to-image translation methods. We show that by adding an extra condition to the generator and discriminator, we can effectively create temporally consistent video sequences from anime line art. Code and models are available at: https://github.com/Harry-Thasarathan/TCVC |
Tasks | Colorization, Image Restoration, Image-to-Image Translation |
Published | 2019-04-21 |
URL | http://arxiv.org/abs/1904.09527v1 |
PDF | http://arxiv.org/pdf/1904.09527v1.pdf |
PWC | https://paperswithcode.com/paper/automatic-temporally-coherent-video |
Repo | https://github.com/iver56/automatic-video-colorization |
Framework | pytorch |
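
The "extra condition" mentioned in the abstract can be read as feeding the previously colorized frame back into the generator. A hedged sketch of that loop, assuming a generator that accepts a 4-channel input (the interface is our assumption, not the repo's API):

```python
import torch

def colorize_sequence(generator, line_frames):
    # line_frames: list of (1, 1, H, W) greyscale line-art tensors.
    h, w = line_frames[0].shape[-2:]
    prev = torch.zeros(1, 3, h, w)   # blank condition for the first frame
    outputs = []
    for line in line_frames:
        inp = torch.cat([line, prev], dim=1)  # 1 line-art + 3 condition channels
        prev = generator(inp)                 # (1, 3, H, W) colorized frame
        outputs.append(prev)
    return outputs
```

Conditioning each frame on the previous output is what suppresses the flicker effect the abstract describes.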
SCL: Towards Accurate Domain Adaptive Object Detection via Gradient Detach Based Stacked Complementary Losses
Title | SCL: Towards Accurate Domain Adaptive Object Detection via Gradient Detach Based Stacked Complementary Losses |
Authors | Zhiqiang Shen, Harsh Maheshwari, Weichen Yao, Marios Savvides |
Abstract | Unsupervised domain adaptive object detection aims to learn a robust detector under domain shift, where the training (source) domain is label-rich with bounding box annotations, while the testing (target) domain is unlabeled and the feature distributions between the two domains are dissimilar or even totally different. In this paper, we propose a gradient-detach-based stacked complementary losses (SCL) method that uses detection losses as the primary objective and inserts several auxiliary losses at different network stages, accompanied by gradient-detach training, to learn more discriminative representations. We argue that prior methods mainly add more loss functions for training but ignore the interaction between different losses and a compatible training strategy (gradient-detach updating in our work). Our proposed method is thus a more syncretic adaptation learning process. We conduct comprehensive experiments on seven datasets; the results demonstrate that our method outperforms state-of-the-art methods by a significant margin. For instance, from Cityscapes to FoggyCityscapes, we achieve 37.9% mAP, outperforming the previous art Strong-Weak by 3.6%. |
Tasks | Object Detection |
Published | 2019-11-06 |
URL | https://arxiv.org/abs/1911.02559v3 |
PDF | https://arxiv.org/pdf/1911.02559v3.pdf |
PWC | https://paperswithcode.com/paper/scl-towards-accurate-domain-adaptive-object |
Repo | https://github.com/harsh-99/SCL |
Framework | pytorch |
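
A generic illustration of the gradient-detach idea follows; this is our reading of the mechanism, not the authors' exact wiring, and `ctx_branch`/`aux_head` are hypothetical stand-ins for the paper's sub-networks.

```python
import torch

def scl_forward(backbone, det_head, ctx_branch, aux_head, x):
    feat = backbone(x)
    aux_logits = aux_head(feat)   # complementary (domain) loss is computed here
    context = ctx_branch(feat)    # context vector from the auxiliary branch
    # Gradient detach: the detector consumes the context vector, but
    # detection gradients stop at the cut instead of flowing back into
    # the auxiliary branch, so the two objectives shape their own branches.
    det_out = det_head(feat, context.detach())
    return det_out, aux_logits
```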
A Personalized Subreddit Recommendation Engine
Title | A Personalized Subreddit Recommendation Engine |
Authors | Abhishek K Das, Nikhil Bhat, Sukanto Guha, Janvi Palan |
Abstract | This paper aims to improve upon the generic recommendations that Reddit provides for its users. We propose a novel personalized recommender system that learns from both the presence and the content of user-subreddit interactions, using implicit and explicit signals to provide robust recommendations. |
Tasks | Recommendation Systems |
Published | 2019-05-03 |
URL | https://arxiv.org/abs/1905.01263v1 |
PDF | https://arxiv.org/pdf/1905.01263v1.pdf |
PWC | https://paperswithcode.com/paper/a-personalized-subreddit-recommendation |
Repo | https://github.com/abkds/r-ecommender |
Framework | none |
Multiple Light Source Dataset for Colour Research
Title | Multiple Light Source Dataset for Colour Research |
Authors | Anna Smagina, Egor Ershov, Anton Grigoryev |
Abstract | We present a collection of 24 multiple-object scenes, each recorded under 18 multiple-light-source illumination scenarios. The illuminants vary in dominant spectral colour, intensity and distance from the scene. We mainly address realistic scenarios for the evaluation of computational colour constancy algorithms, but have also aimed to make the data as general as possible for computational colour science and computer vision. Along with the images of the scenes, we provide spectral characteristics of the camera, light sources and objects, and include pixel-by-pixel ground-truth annotation of uniformly coloured object surfaces, making the dataset useful for benchmarking colour-based image segmentation algorithms. The dataset is freely available at https://github.com/visillect/mls-dataset. |
Tasks | Semantic Segmentation |
Published | 2019-08-16 |
URL | https://arxiv.org/abs/1908.06126v4 |
PDF | https://arxiv.org/pdf/1908.06126v4.pdf |
PWC | https://paperswithcode.com/paper/multiple-light-source-dataset-for-colour |
Repo | https://github.com/Visillect/mls-dataset |
Framework | none |
RDSNet: A New Deep Architecture for Reciprocal Object Detection and Instance Segmentation
Title | RDSNet: A New Deep Architecture for Reciprocal Object Detection and Instance Segmentation |
Authors | Shaoru Wang, Yongchao Gong, Junliang Xing, Lichao Huang, Chang Huang, Weiming Hu |
Abstract | Object detection and instance segmentation are two fundamental computer vision tasks. They are closely correlated but their relations have not yet been fully explored in most previous work. This paper presents RDSNet, a novel deep architecture for reciprocal object detection and instance segmentation. To reciprocate these two tasks, we design a two-stream structure to learn features on both the object level (i.e., bounding boxes) and the pixel level (i.e., instance masks) jointly. Within this structure, information from the two streams is fused alternately, namely information on the object level introduces the awareness of instance and translation variance to the pixel level, and information on the pixel level refines the localization accuracy of objects on the object level in return. Specifically, a correlation module and a cropping module are proposed to yield instance masks, as well as a mask based boundary refinement module for more accurate bounding boxes. Extensive experimental analyses and comparisons on the COCO dataset demonstrate the effectiveness and efficiency of RDSNet. The source code is available at https://github.com/wangsr126/RDSNet. |
Tasks | Instance Segmentation, Object Detection, Semantic Segmentation |
Published | 2019-12-11 |
URL | https://arxiv.org/abs/1912.05070v1 |
PDF | https://arxiv.org/pdf/1912.05070v1.pdf |
PWC | https://paperswithcode.com/paper/rdsnet-a-new-deep-architecture-for-reciprocal |
Repo | https://github.com/wangsr126/RDSNet |
Framework | pytorch |
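
The correlation module described in the abstract can be sketched as a dot product between per-object embeddings and the pixel-level feature map; the shapes and names below are assumptions for illustration, not the released code.

```python
import torch

def correlate_masks(obj_embed, pixel_feat):
    # obj_embed: (N, C), one embedding per detected box (object stream).
    # pixel_feat: (C, H, W), the pixel-level feature stream.
    C, H, W = pixel_feat.shape
    logits = obj_embed @ pixel_feat.reshape(C, H * W)  # (N, H*W) correlations
    return torch.sigmoid(logits).reshape(-1, H, W)     # one mask per object
```

This is the direction in which object-level information introduces instance awareness at the pixel level; the paper's cropping and boundary-refinement modules close the loop in the other direction.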
Variational AutoEncoder For Regression: Application to Brain Aging Analysis
Title | Variational AutoEncoder For Regression: Application to Brain Aging Analysis |
Authors | Qingyu Zhao, Ehsan Adeli, Nicolas Honnorat, Tuo Leng, Kilian M. Pohl |
Abstract | While unsupervised variational autoencoders (VAE) have become a powerful tool in neuroimage analysis, their application to supervised learning is under-explored. We aim to close this gap by proposing a unified probabilistic model for learning the latent space of imaging data and performing supervised regression. Based on recent advances in learning disentangled representations, the novel generative process explicitly models the conditional distribution of latent representations with respect to the regression target variable. Performing a variational inference procedure on this model leads to joint regularization between the VAE and a neural-network regressor. In predicting the age of 245 subjects from their structural Magnetic Resonance (MR) images, our model is more accurate than state-of-the-art methods when applied to either region-of-interest (ROI) measurements or raw 3D volume images. More importantly, unlike simple feed-forward neural-networks, disentanglement of age in latent representations allows for intuitive interpretation of the structural developmental patterns of the human brain. |
Tasks | |
Published | 2019-04-11 |
URL | https://arxiv.org/abs/1904.05948v2 |
PDF | https://arxiv.org/pdf/1904.05948v2.pdf |
PWC | https://paperswithcode.com/paper/variational-autoencoder-for-regression |
Repo | https://github.com/QingyuZhao/VAE-for-Regression |
Framework | none |
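
A minimal sketch of the joint objective as we read it: an ELBO whose latent code also feeds a regressor, so the two losses regularize each other. The encoder/decoder/regressor interfaces are assumed, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def joint_loss(x, age, encoder, decoder, regressor):
    mu, logvar = encoder(x)
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
    recon = F.mse_loss(decoder(z), x)                     # ELBO: reconstruction
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    reg = F.mse_loss(regressor(z), age)                   # supervised regression
    return recon + kl + reg
```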
Generative Smoke Removal
Title | Generative Smoke Removal |
Authors | Oleksii Sidorov, Congcong Wang, Faouzi Alaya Cheikh |
Abstract | In minimally invasive surgery, the use of tissue dissection tools causes smoke, which inevitably degrades image quality. This can reduce the visibility of the operating field for surgeons and introduce errors in the computer vision algorithms used in surgical navigation systems. In this paper, we propose a novel approach to computational smoke removal using supervised image-to-image translation. We demonstrate that a straightforward application of existing generative algorithms removes smoke but decreases image quality and introduces synthetic noise (a grid structure). We propose to solve this issue by modifying the GAN architecture and adding a perceptual image-quality metric to the loss function. The obtained results demonstrate that the proposed method efficiently removes smoke while preserving perceptually sufficient image quality. |
Tasks | Image-to-Image Translation |
Published | 2019-02-01 |
URL | https://arxiv.org/abs/1902.00311v2 |
PDF | https://arxiv.org/pdf/1902.00311v2.pdf |
PWC | https://paperswithcode.com/paper/generative-smoke-removal |
Repo | https://github.com/acecreamu/ssim-pan |
Framework | pytorch |
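
The modification described above, adding a perceptual image-quality term to the generator loss, might look like the following; `ssim` is a hypothetical stand-in for any differentiable SSIM implementation, and the loss weights are illustrative.

```python
import torch
import torch.nn.functional as F

def generator_loss(disc_fake, fake, real, ssim, lam_l1=100.0, lam_ssim=1.0):
    # disc_fake: discriminator logits on generated images.
    adv = F.binary_cross_entropy_with_logits(
        disc_fake, torch.ones_like(disc_fake))
    l1 = F.l1_loss(fake, real)
    perceptual = 1.0 - ssim(fake, real)  # reward structural similarity
    return adv + lam_l1 * l1 + lam_ssim * perceptual
```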
Objects as Points
Title | Objects as Points |
Authors | Xingyi Zhou, Dequan Wang, Philipp Krähenbühl |
Abstract | Detection identifies objects as axis-aligned boxes in an image. Most successful object detectors enumerate a nearly exhaustive list of potential object locations and classify each. This is wasteful, inefficient, and requires additional post-processing. In this paper, we take a different approach. We model an object as a single point: the center point of its bounding box. Our detector uses keypoint estimation to find center points and regresses to all other object properties, such as size, 3D location, orientation, and even pose. Our center-point-based approach, CenterNet, is end-to-end differentiable, simpler, faster, and more accurate than corresponding bounding-box-based detectors. CenterNet achieves the best speed-accuracy trade-off on the MS COCO dataset, with 28.1% AP at 142 FPS, 37.4% AP at 52 FPS, and 45.1% AP with multi-scale testing at 1.4 FPS. We use the same approach to estimate 3D bounding boxes in the KITTI benchmark and human pose on the COCO keypoint dataset. Our method performs competitively with sophisticated multi-stage methods and runs in real-time. |
Tasks | Keypoint Detection, Object Detection, Real-Time Object Detection |
Published | 2019-04-16 |
URL | http://arxiv.org/abs/1904.07850v2 |
PDF | http://arxiv.org/pdf/1904.07850v2.pdf |
PWC | https://paperswithcode.com/paper/objects-as-points |
Repo | https://github.com/xingyizhou/CenterNet |
Framework | pytorch |
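
The decoding step of the center-point approach is simple enough to sketch: local maxima of a class heatmap become detections, with sizes read off a regression map at the same locations. Shapes are assumptions, and the paper's sub-pixel offset regression is omitted for brevity.

```python
import torch
import torch.nn.functional as F

def decode_centers(heatmap, wh, top_k=100):
    # heatmap: (C, H, W) per-class center scores; wh: (2, H, W) box sizes.
    C, H, W = heatmap.shape
    pooled = F.max_pool2d(heatmap[None], 3, stride=1, padding=1)[0]
    peaks = heatmap * (pooled == heatmap).float()  # keep local maxima only
    scores, idx = peaks.flatten().topk(top_k)
    cls, rem = idx // (H * W), idx % (H * W)
    ys, xs = rem // W, rem % W
    w, h = wh[0, ys, xs], wh[1, ys, xs]
    boxes = torch.stack(
        [xs - w / 2, ys - h / 2, xs + w / 2, ys + h / 2], dim=1)
    return boxes, scores, cls
```

The max-pool trick replaces IoU-based non-maximum suppression, which is why the abstract can claim there is no additional post-processing.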
Speeding up VP9 Intra Encoder with Hierarchical Deep Learning Based Partition Prediction
Title | Speeding up VP9 Intra Encoder with Hierarchical Deep Learning Based Partition Prediction |
Authors | Somdyuti Paul, Andrey Norkin, Alan C. Bovik |
Abstract | In the VP9 video codec, the sizes of blocks are decided during encoding by recursively partitioning 64×64 superblocks using rate-distortion optimization (RDO). This process is computationally intensive because of the combinatorial search space of possible partitions of a superblock. Here, we propose a deep learning based alternative framework to predict the intra-mode superblock partitions in the form of a four-level partition tree, using a hierarchical fully convolutional network (H-FCN). We created a large database of VP9 superblocks and the corresponding partitions to train an H-FCN model, which was subsequently integrated with the VP9 encoder to reduce the intra-mode encoding time. The experimental results establish that our approach speeds up intra-mode encoding by 69.7% on average, at the expense of a 1.71% increase in the Bjontegaard-Delta bitrate (BD-rate). While VP9 provides several built-in speed levels which are designed to provide faster encoding at the expense of decreased rate-distortion performance, we find that our model is able to outperform the fastest recommended speed level of the reference VP9 encoder for the good quality intra encoding configuration, in terms of both speedup and BD-rate. |
Tasks | |
Published | 2019-06-15 |
URL | https://arxiv.org/abs/1906.06476v1 |
PDF | https://arxiv.org/pdf/1906.06476v1.pdf |
PWC | https://paperswithcode.com/paper/speeding-up-vp9-intra-encoder-with |
Repo | https://github.com/Somdyuti2/H-FCN |
Framework | tf |
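
The four-level partition tree predicted by the H-FCN can be represented compactly; the layout below is our assumption of a convenient encoding, while the partition types themselves (NONE/HORZ/VERT/SPLIT) come from VP9.

```python
import numpy as np

# Level l holds one 4-way decision per block of size 64 / 2**l; a child
# level is only consulted where its parent chose SPLIT.
NONE, HORZ, VERT, SPLIT = range(4)   # VP9 partition types
tree = {
    0: np.zeros((1, 1), dtype=np.int64),  # 64x64: one decision
    1: np.zeros((2, 2), dtype=np.int64),  # 32x32 blocks
    2: np.zeros((4, 4), dtype=np.int64),  # 16x16 blocks
    3: np.zeros((8, 8), dtype=np.int64),  # 8x8 blocks
}
```

Predicting all four levels in one forward pass is what lets the network replace the recursive RDO search.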
Towards Better Forecasting by Fusing Near and Distant Future Visions
Title | Towards Better Forecasting by Fusing Near and Distant Future Visions |
Authors | Jiezhu Cheng, Kaizhu Huang, Zibin Zheng |
Abstract | Multivariate time series forecasting is an important yet challenging problem in machine learning. Most existing approaches only forecast the series value at one future moment, ignoring the interactions between predictions at future moments with different temporal distances. Such a deficiency probably prevents the model from getting enough information about the future, thus limiting forecasting accuracy. To address this problem, we propose the Multi-Level Construal Neural Network (MLCNN), a novel multi-task deep learning framework. Inspired by the Construal Level Theory of psychology, this model aims to improve predictive performance by fusing forecasting information (i.e., future visions) for different future times. We first use a convolutional neural network to extract multi-level abstract representations of the raw data for near and distant future predictions. We then model the interplay between multiple predictive tasks and fuse their future visions through a modified encoder-decoder architecture. Finally, we combine a traditional autoregressive model with the neural network to address the scale-insensitivity problem. Experiments on three real-world datasets show that our method achieves statistically significant improvements over state-of-the-art baseline methods, with an average 4.59% reduction in RMSE and an average 6.87% reduction in MAE. |
Tasks | Multivariate Time Series Forecasting, Time Series, Time Series Forecasting |
Published | 2019-12-11 |
URL | https://arxiv.org/abs/1912.05122v1 |
PDF | https://arxiv.org/pdf/1912.05122v1.pdf |
PWC | https://paperswithcode.com/paper/towards-better-forecasting-by-fusing-near-and |
Repo | https://github.com/smallGum/MLCNN-Multivariate-Time-Series |
Framework | pytorch |
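
The final step in the abstract, combining an autoregressive model with the network to fix scale insensitivity, follows a well-known pattern (as in LSTNet-style models); the sketch below assumes that pattern rather than the paper's exact head.

```python
import torch

class ARSkip(torch.nn.Module):
    def __init__(self, p):
        super().__init__()
        self.ar = torch.nn.Linear(p, 1)  # one shared linear AR head
        self.p = p

    def forward(self, x, nn_forecast):
        # x: (batch, time, series); nn_forecast: (batch, series)
        tail = x[:, -self.p:, :].transpose(1, 2)  # (batch, series, p)
        return nn_forecast + self.ar(tail).squeeze(-1)
```

The linear path keeps the output responsive to the raw input scale, which purely nonlinear encoders tend to wash out.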
Long and Diverse Text Generation with Planning-based Hierarchical Variational Model
Title | Long and Diverse Text Generation with Planning-based Hierarchical Variational Model |
Authors | Zhihong Shao, Minlie Huang, Jiangtao Wen, Wenfei Xu, Xiaoyan Zhu |
Abstract | Existing neural methods for data-to-text generation still struggle to produce long and diverse texts: they are unable to model input data dynamically during generation, to capture inter-sentence coherence, or to generate diversified expressions. To address these issues, we propose a Planning-based Hierarchical Variational Model (PHVM). Our model first plans a sequence of groups (each group is a subset of input items to be covered by a sentence) and then realizes each sentence conditioned on the planning result and the previously generated context, thereby decomposing long text generation into dependent sentence-generation sub-tasks. To capture expression diversity, we devise a hierarchical latent structure where a global planning latent variable models the diversity of reasonable plans and a sequence of local latent variables controls sentence realization. Experiments show that our model outperforms state-of-the-art baselines in long and diverse text generation. |
Tasks | Data-to-Text Generation, Latent Variable Models, Text Generation |
Published | 2019-08-19 |
URL | https://arxiv.org/abs/1908.06605v2 |
PDF | https://arxiv.org/pdf/1908.06605v2.pdf |
PWC | https://paperswithcode.com/paper/long-and-diverse-text-generation-with |
Repo | https://github.com/ZhihongShao/Planning-based-Hierarchical-Variational-Model |
Framework | tf |
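
The plan-then-realize decomposition reads naturally as a two-stage loop. The sketch below uses a hypothetical interface (`sample_global_latent`, `plan_groups`, `sample_local_latent`, `realize`) purely to make the control flow concrete; it is not the released model's API.

```python
def generate(plan_model, sentence_decoder, input_items):
    z_plan = plan_model.sample_global_latent(input_items)   # plan diversity
    groups = plan_model.plan_groups(input_items, z_plan)    # item subsets
    context, text = [], []
    for g in groups:
        z_local = sentence_decoder.sample_local_latent(g, context)
        sent = sentence_decoder.realize(g, z_local, context)
        context.append(sent)
        text.append(sent)
    return " ".join(text)
```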
Lightweight Image Super-Resolution with Adaptive Weighted Learning Network
Title | Lightweight Image Super-Resolution with Adaptive Weighted Learning Network |
Authors | Chaofeng Wang, Zheng Li, Jun Shi |
Abstract | PyTorch code for our paper “Lightweight Image Super-Resolution with Adaptive Weighted Learning Network” |
Tasks | Image Super-Resolution, Super-Resolution |
Published | 2019-04-04 |
URL | http://arxiv.org/abs/1904.02358v1 |
PDF | http://arxiv.org/pdf/1904.02358v1.pdf |
PWC | https://paperswithcode.com/paper/lightweight-image-super-resolution-with |
Repo | https://github.com/ChaofWang/AWSRN |
Framework | pytorch |
Measuring the Reliability of Reinforcement Learning Algorithms
Title | Measuring the Reliability of Reinforcement Learning Algorithms |
Authors | Stephanie C. Y. Chan, Samuel Fishman, John Canny, Anoop Korattikara, Sergio Guadarrama |
Abstract | Lack of reliability is a well-known issue for reinforcement learning (RL) algorithms. This problem has gained increasing attention in recent years, and efforts to improve it have grown substantially. To aid RL researchers and production users with the evaluation and improvement of reliability, we propose a set of metrics that quantitatively measure different aspects of reliability. In this work, we focus on variability and risk, both during training and after learning (on a fixed policy). We designed these metrics to be general-purpose, and we also designed complementary statistical tests to enable rigorous comparisons on these metrics. In this paper, we first describe the desired properties of the metrics and their design, the aspects of reliability that they measure, and their applicability to different scenarios. We then describe the statistical tests and make additional practical recommendations for reporting results. The metrics and accompanying statistical tools have been made available as an open-source library at https://github.com/google-research/rl-reliability-metrics. We apply our metrics to a set of common RL algorithms and environments, compare them, and analyze the results. |
Tasks | |
Published | 2019-12-10 |
URL | https://arxiv.org/abs/1912.05663v2 |
PDF | https://arxiv.org/pdf/1912.05663v2.pdf |
PWC | https://paperswithcode.com/paper/measuring-the-reliability-of-reinforcement-1 |
Repo | https://github.com/google-research/rl-reliability-metrics |
Framework | tf |
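
As a flavor of the metrics, one member of the dispersion-across-runs family can be computed in a few lines; this is a toy sketch in the spirit of the paper, while the linked library implements the full, rigorously defined set with the accompanying statistical tests.

```python
import numpy as np

def dispersion_across_runs(curves):
    # curves: (n_runs, n_evals) array of evaluation returns, one row per
    # training run of the same algorithm on the same environment.
    q75, q25 = np.percentile(curves, [75, 25], axis=0)
    return q75 - q25  # per-evaluation-point inter-quartile range
```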