October 20, 2019

2927 words 14 mins read

Paper Group AWR 242

Fast and Scalable Bayesian Deep Learning by Weight-Perturbation in Adam. DeepV2D: Video to Depth with Differentiable Structure from Motion. Neural Network Renormalization Group. Adversarial Personalized Ranking for Recommendation. Towards Binary-Valued Gates for Robust LSTM Training. FEVER: a large-scale dataset for Fact Extraction and VERification …

Fast and Scalable Bayesian Deep Learning by Weight-Perturbation in Adam

Title Fast and Scalable Bayesian Deep Learning by Weight-Perturbation in Adam
Authors Mohammad Emtiyaz Khan, Didrik Nielsen, Voot Tangkaratt, Wu Lin, Yarin Gal, Akash Srivastava
Abstract Uncertainty computation in deep learning is essential to design robust and reliable systems. Variational inference (VI) is a promising approach for such computation, but requires more effort to implement and execute compared to maximum-likelihood methods. In this paper, we propose new natural-gradient algorithms to reduce such efforts for Gaussian mean-field VI. Our algorithms can be implemented within the Adam optimizer by perturbing the network weights during gradient evaluations, and uncertainty estimates can be cheaply obtained by using the vector that adapts the learning rate. This requires lower memory, computation, and implementation effort than existing VI methods, while obtaining uncertainty estimates of comparable quality. Our empirical results confirm this and further suggest that the weight-perturbation in our algorithm could be useful for exploration in reinforcement learning and stochastic optimization.
Tasks Stochastic Optimization
Published 2018-06-13
URL http://arxiv.org/abs/1806.04854v3
PDF http://arxiv.org/pdf/1806.04854v3.pdf
PWC https://paperswithcode.com/paper/fast-and-scalable-bayesian-deep-learning-by
Repo https://github.com/emtiyaz/vadam
Framework none
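
To make the recipe concrete, here is a minimal NumPy sketch of an Adam-style step with weight perturbation, in the spirit of the paper's Vadam algorithm; the exact variance scaling and the `grad_fn` helper are illustrative assumptions, not the repository's implementation.

```python
import numpy as np

def vadam_style_step(mu, m, s, grad_fn, t, N,
                     lr=1e-3, beta1=0.9, beta2=0.999, prior_prec=1.0):
    """One Adam-like step with weight perturbation (sketch, not the exact
    Vadam update). mu is the posterior mean over the weights; s doubles as
    the learning-rate-adapting vector and the posterior precision estimate."""
    # Perturb the weights: sample from the current Gaussian posterior, whose
    # variance is read off the same vector Adam uses to adapt the step size.
    sigma = 1.0 / np.sqrt(N * s + prior_prec)
    w = mu + sigma * np.random.randn(*mu.shape)

    g = grad_fn(w)                        # minibatch gradient at the perturbed weights
    m = beta1 * m + (1 - beta1) * g       # first moment, as in Adam
    s = beta2 * s + (1 - beta2) * g ** 2  # second moment, as in Adam

    m_hat = m / (1 - beta1 ** t)
    s_hat = s / (1 - beta2 ** t)
    mu = mu - lr * (m_hat + prior_prec / N * mu) / (np.sqrt(s_hat) + prior_prec / N)
    return mu, m, s
```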

DeepV2D: Video to Depth with Differentiable Structure from Motion

Title DeepV2D: Video to Depth with Differentiable Structure from Motion
Authors Zachary Teed, Jia Deng
Abstract We propose DeepV2D, an end-to-end deep learning architecture for predicting depth from video. DeepV2D combines the representation ability of neural networks with the geometric principles governing image formation. We compose a collection of classical geometric algorithms, which are converted into trainable modules and combined into an end-to-end differentiable architecture. DeepV2D interleaves two stages: motion estimation and depth estimation. During inference, motion and depth estimation are alternated and converge to an accurate depth. Code is available at https://github.com/princeton-vl/DeepV2D.
Tasks Depth Estimation, Motion Estimation, Optical Flow Estimation, Stereo Matching
Published 2018-12-11
URL https://arxiv.org/abs/1812.04605v3
PDF https://arxiv.org/pdf/1812.04605v3.pdf
PWC https://paperswithcode.com/paper/deepv2d-video-to-depth-with-differentiable
Repo https://github.com/princeton-vl/DeepV2D
Framework tf
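
The alternation described in the abstract fits in a few lines. A schematic of the inference loop, with the two learned modules stood in by callables (a hypothetical interface, not the repository's API):

```python
import numpy as np

def deepv2d_style_inference(frames, estimate_motion, estimate_depth, num_iters=5):
    """Schematic of DeepV2D's alternating inference: camera motion given the
    current depth, then depth given the current motion, repeated until the
    estimates stabilize. The callables stand in for the learned modules."""
    depth = np.ones(frames[0].shape[:2])        # trivial initial depth map
    poses = None
    for _ in range(num_iters):
        poses = estimate_motion(frames, depth)  # motion from current depth
        depth = estimate_depth(frames, poses)   # refined depth from current motion
    return depth, poses
```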

Neural Network Renormalization Group

Title Neural Network Renormalization Group
Authors Shuo-Hui Li, Lei Wang
Abstract We present a variational renormalization group (RG) approach using a deep generative model based on normalizing flows. The model performs hierarchical change-of-variables transformations from the physical space to a latent space with reduced mutual information. Conversely, the neural net directly maps independent Gaussian noises to physical configurations following the inverse RG flow. The model has an exact and tractable likelihood, which allows unbiased training and direct access to the renormalized energy function of the latent variables. To train the model, we employ probability density distillation for the bare energy function of the physical problem, in which the training loss provides a variational upper bound of the physical free energy. We demonstrate practical usage of the approach by identifying mutually independent collective variables of the Ising model and performing accelerated hybrid Monte Carlo sampling in the latent space. Lastly, we comment on the connection of the present approach to the wavelet formulation of RG and the modern pursuit of information preserving RG.
Tasks
Published 2018-02-08
URL http://arxiv.org/abs/1802.02840v4
PDF http://arxiv.org/pdf/1802.02840v4.pdf
PWC https://paperswithcode.com/paper/neural-network-renormalization-group
Repo https://github.com/li012589/NeuralRG
Framework pytorch
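
The training objective can be stated compactly. A PyTorch-flavored sketch of the variational bound, assuming a flow object exposing a `sample()` method that returns configurations together with their log-probabilities (a hypothetical interface):

```python
import torch

def variational_free_energy(flow, bare_energy, beta, num_samples=1024):
    """Sketch of the training loss: samples from the flow are scored by the
    physical energy, giving a variational upper bound on the free energy
    (a reverse-KL objective, up to an additive constant)."""
    x, log_q = flow.sample(num_samples)            # physical configurations + log q(x)
    loss = (log_q + beta * bare_energy(x)).mean()  # minimized during training
    return loss
```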

Adversarial Personalized Ranking for Recommendation

Title Adversarial Personalized Ranking for Recommendation
Authors Xiangnan He, Zhankui He, Xiaoyu Du, Tat-Seng Chua
Abstract Item recommendation is a personalized ranking task. To this end, many recommender systems optimize models with pairwise ranking objectives, such as the Bayesian Personalized Ranking (BPR). Using Matrix Factorization (MF) — the most widely used model in recommendation — as a demonstration, we show that optimizing it with BPR leads to a recommender model that is not robust. In particular, we find that the resulting model is highly vulnerable to adversarial perturbations of its model parameters, which implies a possibly large generalization error. To enhance the robustness of a recommender model and thus improve its generalization performance, we propose a new optimization framework, namely Adversarial Personalized Ranking (APR). In short, APR enhances the pairwise ranking method BPR by performing adversarial training. It can be interpreted as playing a minimax game, where minimizing the BPR objective function simultaneously defends against an adversary that adds perturbations to the model parameters to maximize the BPR objective function. To illustrate how it works, we implement APR on MF by adding adversarial perturbations to the embedding vectors of users and items. Extensive experiments on three public real-world datasets demonstrate the effectiveness of APR — by optimizing MF with APR, it outperforms BPR with a relative improvement of 11.2% on average and achieves state-of-the-art performance for item recommendation. Our implementation is available at: https://github.com/hexiangnan/adversarial_personalized_ranking.
Tasks Recommendation Systems
Published 2018-08-12
URL http://arxiv.org/abs/1808.03908v1
PDF http://arxiv.org/pdf/1808.03908v1.pdf
PWC https://paperswithcode.com/paper/adversarial-personalized-ranking-for
Repo https://github.com/hexiangnan/adversarial_personalized_ranking
Framework tf
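
To see the minimax structure, here is a NumPy sketch of the APR objective for one (user, positive item, negative item) triple; the fast-gradient construction of the perturbation follows the paper's description, while the function names are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def apr_style_loss(p_u, q_i, q_j, eps=0.5, lam=1.0):
    """Sketch of the APR objective: BPR plus BPR under a fast-gradient
    perturbation of the embeddings. The paper builds this into training;
    here it is a single forward computation for illustration."""
    x = p_u @ q_i - p_u @ q_j
    bpr = np.log1p(np.exp(-x))                 # -log sigmoid(x)

    # Fast-gradient perturbations that locally maximize the BPR loss,
    # scaled onto an epsilon-ball (dL/dx = -(1 - sigmoid(x))).
    coef = -(1.0 - sigmoid(x))
    g_u, g_i, g_j = coef * (q_i - q_j), coef * p_u, -coef * p_u
    norm = np.sqrt((g_u**2).sum() + (g_i**2).sum() + (g_j**2).sum())
    d_u, d_i, d_j = (eps * g / max(norm, 1e-12) for g in (g_u, g_i, g_j))

    x_adv = (p_u + d_u) @ (q_i + d_i) - (p_u + d_u) @ (q_j + d_j)
    return bpr + lam * np.log1p(np.exp(-x_adv))
```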

Towards Binary-Valued Gates for Robust LSTM Training

Title Towards Binary-Valued Gates for Robust LSTM Training
Authors Zhuohan Li, Di He, Fei Tian, Wei Chen, Tao Qin, Liwei Wang, Tie-Yan Liu
Abstract Long Short-Term Memory (LSTM) is one of the most widely used recurrent structures in sequence modeling. It aims to use gates to control information flow (e.g., whether to skip some information or not) in the recurrent computations, although its practical implementation based on soft gates only partially achieves this goal. In this paper, we propose a new way for LSTM training, which pushes the output values of the gates towards 0 or 1. By doing so, we can better control the information flow: the gates are mostly open or closed, instead of in a middle state, which makes the results more interpretable. Empirical studies show that (1) Although it seems that we restrict the model capacity, there is no performance drop: we achieve better or comparable performances due to its better generalization ability; (2) The outputs of gates are not sensitive to their inputs: we can easily compress the LSTM unit in multiple ways, e.g., low-rank approximation and low-precision approximation. The compressed models are even better than the baseline models without compression.
Tasks
Published 2018-06-08
URL http://arxiv.org/abs/1806.02988v1
PDF http://arxiv.org/pdf/1806.02988v1.pdf
PWC https://paperswithcode.com/paper/towards-binary-valued-gates-for-robust-lstm
Repo https://github.com/zhuohan123/g2-lstm
Framework pytorch
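
One standard way to push sigmoid gates towards {0, 1}, in the spirit of the paper's Gumbel-based training (the exact estimator in the paper may differ from this sketch), is to inject logistic noise before a low-temperature sigmoid:

```python
import numpy as np

def sharpened_gate(logits, temperature=0.5, train=True):
    """Sketch of a gate pushed towards {0, 1}: logistic noise (the difference
    of two Gumbel samples) followed by a low-temperature sigmoid concentrates
    gate values near the endpoints while remaining differentiable."""
    if train:
        u = np.random.uniform(1e-8, 1 - 1e-8, size=np.shape(logits))
        logits = logits + np.log(u) - np.log(1 - u)   # logistic noise
    return 1.0 / (1.0 + np.exp(-logits / temperature))
```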

FEVER: a large-scale dataset for Fact Extraction and VERification

Title FEVER: a large-scale dataset for Fact Extraction and VERification
Authors James Thorne, Andreas Vlachos, Christos Christodoulopoulos, Arpit Mittal
Abstract In this paper we introduce a new publicly available dataset for verification against textual sources, FEVER: Fact Extraction and VERification. It consists of 185,445 claims generated by altering sentences extracted from Wikipedia and subsequently verified without knowledge of the sentence they were derived from. The claims are classified as Supported, Refuted or NotEnoughInfo by annotators achieving 0.6841 in Fleiss $\kappa$. For the first two classes, the annotators also recorded the sentence(s) forming the necessary evidence for their judgment. To characterize the challenge of the dataset presented, we develop a pipeline approach and compare it to suitably designed oracles. The best accuracy we achieve on labeling a claim accompanied by the correct evidence is 31.87%, while if we ignore the evidence we achieve 50.91%. Thus we believe that FEVER is a challenging testbed that will help stimulate progress on claim verification against textual sources.
Tasks
Published 2018-03-14
URL http://arxiv.org/abs/1803.05355v3
PDF http://arxiv.org/pdf/1803.05355v3.pdf
PWC https://paperswithcode.com/paper/fever-a-large-scale-dataset-for-fact
Repo https://github.com/awslabs/fever
Framework none
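
The released splits are JSON Lines, one claim per line. A minimal loader (field names follow the public release, with labels SUPPORTS / REFUTES / NOT ENOUGH INFO; verify against the copy you download):

```python
import json
from collections import Counter

def load_fever(path):
    """Read a FEVER split and report its label distribution."""
    with open(path, encoding="utf-8") as f:
        claims = [json.loads(line) for line in f]
    print(Counter(c["label"] for c in claims))
    return claims
```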

The Unreasonable Effectiveness of Texture Transfer for Single Image Super-resolution

Title The Unreasonable Effectiveness of Texture Transfer for Single Image Super-resolution
Authors Muhammad Waleed Gondal, Bernhard Schölkopf, Michael Hirsch
Abstract While implicit generative models such as GANs have shown impressive results in high-quality image reconstruction and manipulation using a combination of various losses, we consider a simpler approach leading to surprisingly strong results. We show that texture loss alone allows the generation of perceptually high-quality images. We provide a better understanding of the texture-constraining mechanism and develop a novel semantically guided texture-constraining method for further improvement. Using a recently developed perceptual metric employing “deep features”, termed LPIPS, the method obtains state-of-the-art results. Moreover, we show that a texture representation of those deep features better captures the perceptual quality of an image than the original deep features. Using texture information, off-the-shelf deep classification networks (without training) perform as well as the best-performing (tuned and calibrated) LPIPS metrics. The code is publicly available.
Tasks Image Reconstruction, Image Super-Resolution, Super-Resolution
Published 2018-07-31
URL http://arxiv.org/abs/1808.00043v1
PDF http://arxiv.org/pdf/1808.00043v1.pdf
PWC https://paperswithcode.com/paper/the-unreasonable-effectiveness-of-texture
Repo https://github.com/waleedgondal/Texture-based-Super-Resolution-Network
Framework pytorch
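
The texture loss at the heart of the paper is the classic Gram-matrix (style) loss over deep features. A generic PyTorch version (the paper adds semantic guidance on top of this basic form):

```python
import torch

def gram_matrix(features):
    """Channel-by-channel correlations of a feature map of shape (b, c, h, w)."""
    b, c, h, w = features.shape
    f = features.reshape(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def texture_loss(feat_sr, feat_hr):
    """Texture loss between deep features of the super-resolved image and
    the reference image: mean squared distance between their Gram matrices."""
    return torch.mean((gram_matrix(feat_sr) - gram_matrix(feat_hr)) ** 2)
```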

Hyperparameter Learning for Conditional Kernel Mean Embeddings with Rademacher Complexity Bounds

Title Hyperparameter Learning for Conditional Kernel Mean Embeddings with Rademacher Complexity Bounds
Authors Kelvin Hsu, Richard Nock, Fabio Ramos
Abstract Conditional kernel mean embeddings are nonparametric models that encode conditional expectations in a reproducing kernel Hilbert space. While they provide a flexible and powerful framework for probabilistic inference, their performance is highly dependent on the choice of kernel and regularization hyperparameters. Nevertheless, current hyperparameter tuning methods predominantly rely on expensive cross-validation or on heuristics that are not optimized for the inference task. For conditional kernel mean embeddings with categorical targets and arbitrary inputs, we propose a hyperparameter learning framework based on Rademacher complexity bounds to prevent overfitting by balancing data fit against model complexity. Our approach only requires batch updates, allowing scalable kernel hyperparameter tuning without invoking kernel approximations. Experiments demonstrate that our learning framework outperforms competing methods, and can be further extended to incorporate and learn deep neural network weights to improve generalization.
Tasks
Published 2018-09-01
URL http://arxiv.org/abs/1809.00175v3
PDF http://arxiv.org/pdf/1809.00175v3.pdf
PWC https://paperswithcode.com/paper/hyperparameter-learning-for-conditional
Repo https://github.com/Kelvin-Hsu/cake
Framework tf
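
For reference, the empirical conditional mean embedding behind the paper reduces to a regularized linear solve. A NumPy sketch; the kernel bandwidth and regularizer, fixed by hand here, are exactly the hyperparameters the paper learns from its Rademacher bound:

```python
import numpy as np

def rbf(a, b, gamma=1.0):
    """RBF kernel matrix between rows of a (n, d) and b (m, d)."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def conditional_embedding_weights(X, x_query, gamma=1.0, lam=1e-3):
    """Weights w(x) = (K + n*lam*I)^{-1} k(x) of the empirical conditional
    mean embedding mu_{Y|x} = sum_i w_i(x) phi(y_i). For categorical targets,
    stacking one-hot labels as rows of Y turns these weights into class
    probability estimates via Y.T @ w."""
    n = len(X)
    K = rbf(X, X, gamma)
    k = rbf(X, x_query, gamma)
    return np.linalg.solve(K + n * lam * np.eye(n), k)
```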

Segmentation-driven 6D Object Pose Estimation

Title Segmentation-driven 6D Object Pose Estimation
Authors Yinlin Hu, Joachim Hugonot, Pascal Fua, Mathieu Salzmann
Abstract The most recent trend in estimating the 6D pose of rigid objects has been to train deep networks to either directly regress the pose from the image or to predict the 2D locations of 3D keypoints, from which the pose can be obtained using a PnP algorithm. In both cases, the object is treated as a global entity, and a single pose estimate is computed. As a consequence, the resulting techniques can be vulnerable to large occlusions. In this paper, we introduce a segmentation-driven 6D pose estimation framework where each visible part of the objects contributes a local pose prediction in the form of 2D keypoint locations. We then use a predicted measure of confidence to combine these pose candidates into a robust set of 3D-to-2D correspondences, from which a reliable pose estimate can be obtained. We outperform the state-of-the-art on the challenging Occluded-LINEMOD and YCB-Video datasets, which is evidence that our approach deals well with multiple poorly-textured objects occluding each other. Furthermore, it relies on a simple enough architecture to achieve real-time performance.
Tasks 6D Pose Estimation, 6D Pose Estimation using RGB, Pose Estimation, Pose Prediction
Published 2018-12-06
URL http://arxiv.org/abs/1812.02541v3
PDF http://arxiv.org/pdf/1812.02541v3.pdf
PWC https://paperswithcode.com/paper/segmentation-driven-6d-object-pose-estimation
Repo https://github.com/sjtuytc/segmentation-driven-pose
Framework pytorch
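
The final fusion step maps onto standard tooling: keep the most confident 2D keypoint predictions and solve a RANSAC-PnP problem. A sketch with OpenCV (the network that produces the candidates is elided; the array layout is an assumption):

```python
import numpy as np
import cv2

def pose_from_keypoints(pts_2d, conf, pts_3d, K, keep=50):
    """Fuse per-part keypoint predictions into a robust pose (sketch of the
    paper's final step, not its network): keep the most confident 2D
    candidates for the 3D keypoints, then solve PnP with RANSAC."""
    order = np.argsort(-conf)[:keep]          # most confident candidates first
    ok, rvec, tvec, _ = cv2.solvePnPRansac(
        pts_3d[order].astype(np.float64),
        pts_2d[order].astype(np.float64),
        K, distCoeffs=None)
    return (rvec, tvec) if ok else None
```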

Learning with Abandonment

Title Learning with Abandonment
Authors Ramesh Johari, Sven Schmit
Abstract Consider a platform that wants to learn a personalized policy for each user, but the platform faces the risk of a user abandoning the platform if she is dissatisfied with the actions of the platform. For example, a platform is interested in personalizing the number of newsletters it sends, but faces the risk that the user unsubscribes forever. We propose a general thresholded learning model for scenarios like this, and discuss the structure of optimal policies. We describe salient features of optimal personalization algorithms and how feedback the platform receives impacts the results. Furthermore, we investigate how the platform can efficiently learn the heterogeneity across users by interacting with a population and provide performance guarantees.
Tasks
Published 2018-02-23
URL http://arxiv.org/abs/1802.08718v1
PDF http://arxiv.org/pdf/1802.08718v1.pdf
PWC https://paperswithcode.com/paper/learning-with-abandonment
Repo https://github.com/schmit/learning-abandonment
Framework none
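
A toy simulation makes the tension explicit: larger actions are more valuable, but the first action past the user's unknown threshold ends the episode forever (a stylized reward, not the paper's general model):

```python
def thresholded_interaction(actions, threshold):
    """Toy version of the thresholded model: the platform plays a sequence
    of actions and the user abandons the first time an action exceeds her
    tolerance threshold, forfeiting all future reward."""
    rewards = []
    for a in actions:
        if a > threshold:      # user abandons; the episode ends
            break
        rewards.append(a)      # stylized reward: larger action, more value
    return rewards

# Probing higher actions earns more, but one step too far ends everything.
print(thresholded_interaction([0.2, 0.4, 0.6, 0.8], threshold=0.5))  # [0.2, 0.4]
```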

Nested LSTMs

Title Nested LSTMs
Authors Joel Ruben Antony Moniz, David Krueger
Abstract We propose Nested LSTMs (NLSTM), a novel RNN architecture with multiple levels of memory. Nested LSTMs add depth to LSTMs via nesting as opposed to stacking. The value of a memory cell in an NLSTM is computed by an LSTM cell, which has its own inner memory cell. Specifically, instead of computing the value of the (outer) memory cell as $c^{outer}_t = f_t \odot c_{t-1} + i_t \odot g_t$, NLSTM memory cells use the concatenation $(f_t \odot c_{t-1}, i_t \odot g_t)$ as input to an inner LSTM (or NLSTM) memory cell, and set $c^{outer}_t = h^{inner}_t$. Nested LSTMs outperform both stacked and single-layer LSTMs with similar numbers of parameters in our experiments on various character-level language modeling tasks, and the inner memories of an LSTM learn longer-term dependencies compared with the higher-level units of a stacked LSTM.
Tasks Language Modelling
Published 2018-01-31
URL http://arxiv.org/abs/1801.10308v1
PDF http://arxiv.org/pdf/1801.10308v1.pdf
PWC https://paperswithcode.com/paper/nested-lstms
Repo https://github.com/hannw/nlstm
Framework tf
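
The equations translate directly into a cell update. A NumPy sketch of one NLSTM step following the paper's construction, with biases omitted and parameters passed as dicts of matrices (an illustrative layout, not the repository's API):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nlstm_cell(x, h, c_outer, c_inner, W, W_in):
    """One Nested LSTM step. W and W_in map gate names ('i', 'f', 'o', 'g')
    to the outer and inner weight matrices, respectively."""
    z = np.concatenate([x, h])
    i, f, o = (sigmoid(W[k] @ z) for k in ("i", "f", "o"))
    g = np.tanh(W["g"] @ z)

    # The inner LSTM replaces the usual additive cell update: it consumes
    # the concatenation (f * c_outer, i * g) and its hidden output becomes
    # the new outer cell state, c_outer = h_inner.
    z_in = np.concatenate([f * c_outer, i * g])
    i2, f2, o2 = (sigmoid(W_in[k] @ z_in) for k in ("i", "f", "o"))
    g2 = np.tanh(W_in["g"] @ z_in)
    c_inner = f2 * c_inner + i2 * g2
    c_outer = o2 * np.tanh(c_inner)

    return o * np.tanh(c_outer), c_outer, c_inner
```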

Rethinking ImageNet Pre-training

Title Rethinking ImageNet Pre-training
Authors Kaiming He, Ross Girshick, Piotr Dollár
Abstract We report competitive results on object detection and instance segmentation on the COCO dataset using standard models trained from random initialization. The results are no worse than their ImageNet pre-training counterparts even when using the hyper-parameters of the baseline system (Mask R-CNN) that were optimized for fine-tuning pre-trained models, with the sole exception of increasing the number of training iterations so the randomly initialized models may converge. Training from random initialization is surprisingly robust; our results hold even when: (i) using only 10% of the training data, (ii) for deeper and wider models, and (iii) for multiple tasks and metrics. Experiments show that ImageNet pre-training speeds up convergence early in training, but does not necessarily provide regularization or improve final target task accuracy. To push the envelope we demonstrate 50.9 AP on COCO object detection without using any external data—a result on par with the top COCO 2017 competition results that used ImageNet pre-training. These observations challenge the conventional wisdom of ImageNet pre-training for dependent tasks and we expect these discoveries will encourage people to rethink the current de facto paradigm of ‘pre-training and fine-tuning’ in computer vision.
Tasks Instance Segmentation, Object Detection, Semantic Segmentation
Published 2018-11-21
URL http://arxiv.org/abs/1811.08883v1
PDF http://arxiv.org/pdf/1811.08883v1.pdf
PWC https://paperswithcode.com/paper/rethinking-imagenet-pre-training
Repo https://github.com/tensorpack/tensorpack/tree/master/examples/FasterRCNN
Framework tf

A Benchmark of Selected Algorithmic Differentiation Tools on Some Problems in Computer Vision and Machine Learning

Title A Benchmark of Selected Algorithmic Differentiation Tools on Some Problems in Computer Vision and Machine Learning
Authors Filip Šrajer, Zuzana Kukelova, Andrew Fitzgibbon
Abstract Algorithmic differentiation (AD) allows exact computation of derivatives given only an implementation of an objective function. Although many AD tools are available, a proper and efficient implementation of AD methods is not straightforward. The existing tools are often too different to allow for a general test suite. In this paper, we compare fifteen ways of computing derivatives including eleven automatic differentiation tools implementing various methods and written in various languages (C++, F#, MATLAB, Julia and Python), two symbolic differentiation tools, finite differences, and hand-derived computation. We look at three objective functions from computer vision and machine learning. These objectives are for the most part simple, in the sense that no iterative loops are involved, and conditional statements are encapsulated in functions such as {\tt abs} or {\tt logsumexp}. However, it is important for the success of algorithmic differentiation that such ‘simple’ objective functions are handled efficiently, as so many problems in computer vision and machine learning are of this form. Of course, our results depend on programmer skill, and familiarity with the tools. However, we contend that this paper presents an important datapoint: a skilled programmer devoting roughly a week to each tool produced the timings we present. We have made our implementations available as open source to allow the community to replicate and update these benchmarks.
Tasks
Published 2018-07-26
URL http://arxiv.org/abs/1807.10129v1
PDF http://arxiv.org/pdf/1807.10129v1.pdf
PWC https://paperswithcode.com/paper/a-benchmark-of-selected-algorithmic
Repo https://github.com/awf/ADBench
Framework none
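
As a taste of the benchmark's setting, here is one of the 'simple' objectives it targets, logsumexp, with a hand-derived gradient checked against central finite differences — two of the fifteen approaches compared (this toy check is ours, not part of the benchmark suite):

```python
import numpy as np

def logsumexp(x):
    m = x.max()
    return m + np.log(np.exp(x - m).sum())

def grad_logsumexp(x):
    """Hand-derived gradient of logsumexp: the softmax of x."""
    e = np.exp(x - x.max())
    return e / e.sum()

def grad_fd(f, x, h=1e-6):
    """Central finite differences, one coordinate at a time."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        d = np.zeros_like(x)
        d[i] = h
        g[i] = (f(x + d) - f(x - d)) / (2 * h)
    return g

x = np.random.randn(5)
print(np.abs(grad_fd(logsumexp, x) - grad_logsumexp(x)).max())  # ~1e-10
```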

Recurrent Squeeze-and-Excitation Context Aggregation Net for Single Image Deraining

Title Recurrent Squeeze-and-Excitation Context Aggregation Net for Single Image Deraining
Authors Xia Li, Jianlong Wu, Zhouchen Lin, Hong Liu, Hongbin Zha
Abstract Rain streaks can severely degrade visibility, causing many current computer vision algorithms to fail, so it is necessary to remove rain from images. We propose a novel deep network architecture based on deep convolutional and recurrent neural networks for single-image deraining. As contextual information is very important for rain removal, we first adopt a dilated convolutional neural network to acquire a large receptive field, and we modify the network to better fit the rain removal task. In heavy rain, rain streaks have various directions and shapes, and can be regarded as the accumulation of multiple rain-streak layers. We assign different alpha-values to the various rain-streak layers according to their intensity and transparency by incorporating the squeeze-and-excitation block. Since rain-streak layers overlap with each other, it is not easy to remove the rain in one stage, so we further decompose rain removal into multiple stages. A recurrent neural network is incorporated to preserve the useful information from previous stages and benefit rain removal in later stages. We conduct extensive experiments on both synthetic and real-world datasets. Our proposed method outperforms the state-of-the-art approaches under all evaluation metrics. Codes and supplementary material are available at our project webpage: https://xialipku.github.io/RESCAN.
Tasks Rain Removal, Single Image Deraining
Published 2018-07-16
URL http://arxiv.org/abs/1807.05698v2
PDF http://arxiv.org/pdf/1807.05698v2.pdf
PWC https://paperswithcode.com/paper/recurrent-squeeze-and-excitation-context
Repo https://github.com/XueweiMeng/derain_filter
Framework tf
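
The stage-by-stage decomposition reads naturally as a loop. A schematic of the recurrent multi-stage scheme, with `stage_net` standing in for the dilated-conv, squeeze-and-excitation block (a placeholder, not the repository's API):

```python
def multi_stage_derain(rainy, stage_net, num_stages=4):
    """Schematic of the paper's multi-stage deraining: each stage predicts a
    residual rain layer from the current estimate, while a recurrent state
    carries information across stages."""
    x, state = rainy, None
    for _ in range(num_stages):
        rain, state = stage_net(x, state)  # predict remaining rain streaks
        x = x - rain                       # peel off one rain layer
    return x
```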

Deep convolutional Gaussian processes

Title Deep convolutional Gaussian processes
Authors Kenneth Blomqvist, Samuel Kaski, Markus Heinonen
Abstract We propose deep convolutional Gaussian processes, a deep Gaussian process architecture with convolutional structure. The model is a principled Bayesian framework for detecting hierarchical combinations of local features for image classification. We demonstrate greatly improved image classification performance compared to current Gaussian process approaches on the MNIST and CIFAR-10 datasets. In particular, we improve CIFAR-10 accuracy by over 10 percentage points.
Tasks Gaussian Processes, Image Classification
Published 2018-10-06
URL http://arxiv.org/abs/1810.03052v1
PDF http://arxiv.org/pdf/1810.03052v1.pdf
PWC https://paperswithcode.com/paper/deep-convolutional-gaussian-processes
Repo https://github.com/kekeblom/DeepCGP
Framework tf