October 20, 2019

2927 words 14 mins read

Paper Group AWR 242

Fast and Scalable Bayesian Deep Learning by Weight-Perturbation in Adam. DeepV2D: Video to Depth with Differentiable Structure from Motion. Neural Network Renormalization Group. Adversarial Personalized Ranking for Recommendation. Towards Binary-Valued Gates for Robust LSTM Training. FEVER: a large-scale dataset for Fact Extraction and VERification …

Fast and Scalable Bayesian Deep Learning by Weight-Perturbation in Adam

Title Fast and Scalable Bayesian Deep Learning by Weight-Perturbation in Adam
Authors Mohammad Emtiyaz Khan, Didrik Nielsen, Voot Tangkaratt, Wu Lin, Yarin Gal, Akash Srivastava
Abstract Uncertainty computation in deep learning is essential to design robust and reliable systems. Variational inference (VI) is a promising approach for such computation, but requires more effort to implement and execute compared to maximum-likelihood methods. In this paper, we propose new natural-gradient algorithms to reduce such efforts for Gaussian mean-field VI. Our algorithms can be implemented within the Adam optimizer by perturbing the network weights during gradient evaluations, and uncertainty estimates can be cheaply obtained by using the vector that adapts the learning rate. This requires lower memory, computation, and implementation effort than existing VI methods, while obtaining uncertainty estimates of comparable quality. Our empirical results confirm this and further suggest that the weight-perturbation in our algorithm could be useful for exploration in reinforcement learning and stochastic optimization.
Tasks Stochastic Optimization
Published 2018-06-13
URL http://arxiv.org/abs/1806.04854v3
PDF http://arxiv.org/pdf/1806.04854v3.pdf
PWC https://paperswithcode.com/paper/fast-and-scalable-bayesian-deep-learning-by
Repo https://github.com/emtiyaz/vadam
Framework none
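
To make the recipe concrete, here is a minimal NumPy sketch of an Adam-style step with weight perturbation, in the spirit of the paper's Vadam algorithm; the exact variance scaling and the `grad_fn` helper are illustrative assumptions, not the repository's implementation.

```python
import numpy as np

def vadam_style_step(mu, m, s, grad_fn, t, N,
                     lr=1e-3, beta1=0.9, beta2=0.999, prior_prec=1.0):
    """One Adam-like step with weight perturbation (sketch, not the exact
    Vadam update). mu is the posterior mean over the weights; s doubles as
    the learning-rate-adapting vector and the posterior precision estimate."""
    # Perturb the weights: sample from the current Gaussian posterior, whose
    # variance is read off the same vector Adam uses to adapt the step size.
    sigma = 1.0 / np.sqrt(N * s + prior_prec)
    w = mu + sigma * np.random.randn(*mu.shape)

    g = grad_fn(w)                        # minibatch gradient at the perturbed weights
    m = beta1 * m + (1 - beta1) * g       # first moment, as in Adam
    s = beta2 * s + (1 - beta2) * g ** 2  # second moment, as in Adam

    m_hat = m / (1 - beta1 ** t)
    s_hat = s / (1 - beta2 ** t)
    mu = mu - lr * (m_hat + prior_prec / N * mu) / (np.sqrt(s_hat) + prior_prec / N)
    return mu, m, s
```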

DeepV2D: Video to Depth with Differentiable Structure from Motion

Title DeepV2D: Video to Depth with Differentiable Structure from Motion
Authors Zachary Teed, Jia Deng
Abstract We propose DeepV2D, an end-to-end deep learning architecture for predicting depth from video. DeepV2D combines the representation ability of neural networks with the geometric principles governing image formation. We compose a collection of classical geometric algorithms, which are converted into trainable modules and combined into an end-to-end differentiable architecture. DeepV2D interleaves two stages: motion estimation and depth estimation. During inference, motion and depth estimation are alternated and converge to an accurate depth. Code is available at https://github.com/princeton-vl/DeepV2D.
Tasks Depth Estimation, Motion Estimation, Optical Flow Estimation, Stereo Matching
Published 2018-12-11
URL https://arxiv.org/abs/1812.04605v3
PDF https://arxiv.org/pdf/1812.04605v3.pdf
PWC https://paperswithcode.com/paper/deepv2d-video-to-depth-with-differentiable
Repo https://github.com/princeton-vl/DeepV2D
Framework tf
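
The alternation described in the abstract fits in a few lines. A schematic of the inference loop, with the two learned modules stood in by callables (a hypothetical interface, not the repository's API):

```python
import numpy as np

def deepv2d_style_inference(frames, estimate_motion, estimate_depth, num_iters=5):
    """Schematic of DeepV2D's alternating inference: camera motion given the
    current depth, then depth given the current motion, repeated until the
    estimates stabilize. The callables stand in for the learned modules."""
    depth = np.ones(frames[0].shape[:2])        # trivial initial depth map
    poses = None
    for _ in range(num_iters):
        poses = estimate_motion(frames, depth)  # motion from current depth
        depth = estimate_depth(frames, poses)   # refined depth from current motion
    return depth, poses
```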

Neural Network Renormalization Group

Title Neural Network Renormalization Group
Authors Shuo-Hui Li, Lei Wang
Abstract We present a variational renormalization group (RG) approach using a deep generative model based on normalizing flows. The model performs hierarchical change-of-variables transformations from the physical space to a latent space with reduced mutual information. Conversely, the neural net directly maps independent Gaussian noises to physical configurations following the inverse RG flow. The model has an exact and tractable likelihood, which allows unbiased training and direct access to the renormalized energy function of the latent variables. To train the model, we employ probability density distillation for the bare energy function of the physical problem, in which the training loss provides a variational upper bound of the physical free energy. We demonstrate practical usage of the approach by identifying mutually independent collective variables of the Ising model and performing accelerated hybrid Monte Carlo sampling in the latent space. Lastly, we comment on the connection of the present approach to the wavelet formulation of RG and the modern pursuit of information preserving RG.
Tasks
Published 2018-02-08
URL http://arxiv.org/abs/1802.02840v4
PDF http://arxiv.org/pdf/1802.02840v4.pdf
PWC https://paperswithcode.com/paper/neural-network-renormalization-group
Repo https://github.com/li012589/NeuralRG
Framework pytorch
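
The training objective can be stated compactly. A PyTorch-flavored sketch of the variational bound, assuming a flow object exposing a `sample()` method that returns configurations together with their log-probabilities (a hypothetical interface):

```python
import torch

def variational_free_energy(flow, bare_energy, beta, num_samples=1024):
    """Sketch of the training loss: samples from the flow are scored by the
    physical energy, giving a variational upper bound on the free energy
    (a reverse-KL objective, up to an additive constant)."""
    x, log_q = flow.sample(num_samples)            # physical configurations + log q(x)
    loss = (log_q + beta * bare_energy(x)).mean()  # minimized during training
    return loss
```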

Adversarial Personalized Ranking for Recommendation

Title Adversarial Personalized Ranking for Recommendation
Authors Xiangnan He, Zhankui He, Xiaoyu Du, Tat-Seng Chua
Abstract Item recommendation is a personalized ranking task. To this end, many recommender systems optimize models with pairwise ranking objectives, such as the Bayesian Personalized Ranking (BPR). Using Matrix Factorization (MF) — the most widely used model in recommendation — as a demonstration, we show that optimizing it with BPR leads to a recommender model that is not robust. In particular, we find that the resulting model is highly vulnerable to adversarial perturbations of its model parameters, which implies a possibly large generalization error. To enhance the robustness of a recommender model and thus improve its generalization performance, we propose a new optimization framework, namely Adversarial Personalized Ranking (APR). In short, APR enhances the pairwise ranking method BPR by performing adversarial training. It can be interpreted as playing a minimax game, where minimizing the BPR objective function simultaneously defends against an adversary that adds perturbations to the model parameters to maximize the BPR objective function. To illustrate how it works, we implement APR on MF by adding adversarial perturbations to the embedding vectors of users and items. Extensive experiments on three public real-world datasets demonstrate the effectiveness of APR — by optimizing MF with APR, it outperforms BPR with a relative improvement of 11.2% on average and achieves state-of-the-art performance for item recommendation. Our implementation is available at: https://github.com/hexiangnan/adversarial_personalized_ranking.
Tasks Recommendation Systems
Published 2018-08-12
URL http://arxiv.org/abs/1808.03908v1
PDF http://arxiv.org/pdf/1808.03908v1.pdf
PWC https://paperswithcode.com/paper/adversarial-personalized-ranking-for
Repo https://github.com/hexiangnan/adversarial_personalized_ranking
Framework tf
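
To see the minimax structure, here is a NumPy sketch of the APR objective for one (user, positive item, negative item) triple; the fast-gradient construction of the perturbation follows the paper's description, while the function names are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def apr_style_loss(p_u, q_i, q_j, eps=0.5, lam=1.0):
    """Sketch of the APR objective: BPR plus BPR under a fast-gradient
    perturbation of the embeddings. The paper builds this into training;
    here it is a single forward computation for illustration."""
    x = p_u @ q_i - p_u @ q_j
    bpr = np.log1p(np.exp(-x))                 # -log sigmoid(x)

    # Fast-gradient perturbations that locally maximize the BPR loss,
    # scaled onto an epsilon-ball (dL/dx = -(1 - sigmoid(x))).
    coef = -(1.0 - sigmoid(x))
    g_u, g_i, g_j = coef * (q_i - q_j), coef * p_u, -coef * p_u
    norm = np.sqrt((g_u**2).sum() + (g_i**2).sum() + (g_j**2).sum())
    d_u, d_i, d_j = (eps * g / max(norm, 1e-12) for g in (g_u, g_i, g_j))

    x_adv = (p_u + d_u) @ (q_i + d_i) - (p_u + d_u) @ (q_j + d_j)
    return bpr + lam * np.log1p(np.exp(-x_adv))
```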

Towards Binary-Valued Gates for Robust LSTM Training

Title Towards Binary-Valued Gates for Robust LSTM Training
Authors Zhuohan Li, Di He, Fei Tian, Wei Chen, Tao Qin, Liwei Wang, Tie-Yan Liu
Abstract Long Short-Term Memory (LSTM) is one of the most widely used recurrent structures in sequence modeling. It aims to use gates to control information flow (e.g., whether to skip some information or not) in the recurrent computations, although its practical implementation based on soft gates only partially achieves this goal. In this paper, we propose a new way for LSTM training, which pushes the output values of the gates towards 0 or 1. By doing so, we can better control the information flow: the gates are mostly open or closed, instead of in a middle state, which makes the results more interpretable. Empirical studies show that (1) Although it seems that we restrict the model capacity, there is no performance drop: we achieve better or comparable performances due to its better generalization ability; (2) The outputs of gates are not sensitive to their inputs: we can easily compress the LSTM unit in multiple ways, e.g., low-rank approximation and low-precision approximation. The compressed models are even better than the baseline models without compression.
Tasks
Published 2018-06-08
URL http://arxiv.org/abs/1806.02988v1
PDF http://arxiv.org/pdf/1806.02988v1.pdf
PWC https://paperswithcode.com/paper/towards-binary-valued-gates-for-robust-lstm
Repo https://github.com/zhuohan123/g2-lstm
Framework pytorch
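
One standard way to push sigmoid gates towards {0, 1}, in the spirit of the paper's Gumbel-based training (the exact estimator in the paper may differ from this sketch), is to inject logistic noise before a low-temperature sigmoid:

```python
import numpy as np

def sharpened_gate(logits, temperature=0.5, train=True):
    """Sketch of a gate pushed towards {0, 1}: logistic noise (the difference
    of two Gumbel samples) followed by a low-temperature sigmoid concentrates
    gate values near the endpoints while remaining differentiable."""
    if train:
        u = np.random.uniform(1e-8, 1 - 1e-8, size=np.shape(logits))
        logits = logits + np.log(u) - np.log(1 - u)   # logistic noise
    return 1.0 / (1.0 + np.exp(-logits / temperature))
```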

FEVER: a large-scale dataset for Fact Extraction and VERification

Title FEVER: a large-scale dataset for Fact Extraction and VERification
Authors James Thorne, Andreas Vlachos, Christos Christodoulopoulos, Arpit Mittal
Abstract In this paper we introduce a new publicly available dataset for verification against textual sources, FEVER: Fact Extraction and VERification. It consists of 185,445 claims generated by altering sentences extracted from Wikipedia and subsequently verified without knowledge of the sentence they were derived from. The claims are classified as Supported, Refuted or NotEnoughInfo by annotators achieving 0.6841 in Fleiss $\kappa$. For the first two classes, the annotators also recorded the sentence(s) forming the necessary evidence for their judgment. To characterize the challenge of the dataset presented, we develop a pipeline approach and compare it to suitably designed oracles. The best accuracy we achieve on labeling a claim accompanied by the correct evidence is 31.87%, while if we ignore the evidence we achieve 50.91%. Thus we believe that FEVER is a challenging testbed that will help stimulate progress on claim verification against textual sources.
Tasks
Published 2018-03-14
URL http://arxiv.org/abs/1803.05355v3
PDF http://arxiv.org/pdf/1803.05355v3.pdf
PWC https://paperswithcode.com/paper/fever-a-large-scale-dataset-for-fact
Repo https://github.com/awslabs/fever
Framework none
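
The released splits are JSON Lines, one claim per line. A minimal loader (field names follow the public release, with labels SUPPORTS / REFUTES / NOT ENOUGH INFO; verify against the copy you download):

```python
import json
from collections import Counter

def load_fever(path):
    """Read a FEVER split and report its label distribution."""
    with open(path, encoding="utf-8") as f:
        claims = [json.loads(line) for line in f]
    print(Counter(c["label"] for c in claims))
    return claims
```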

The Unreasonable Effectiveness of Texture Transfer for Single Image Super-resolution

Title The Unreasonable Effectiveness of Texture Transfer for Single Image Super-resolution
Authors Muhammad Waleed Gondal, Bernhard Schölkopf, Michael Hirsch
Abstract While implicit generative models such as GANs have shown impressive results in high-quality image reconstruction and manipulation using a combination of various losses, we consider a simpler approach leading to surprisingly strong results. We show that texture loss alone allows the generation of perceptually high-quality images. We provide a better understanding of the texture-constraining mechanism and develop a novel semantically guided texture-constraining method for further improvement. Using a recently developed perceptual metric employing “deep features”, termed LPIPS, the method obtains state-of-the-art results. Moreover, we show that a texture representation of those deep features better captures the perceptual quality of an image than the original deep features. Using texture information, off-the-shelf deep classification networks (without training) perform as well as the best-performing (tuned and calibrated) LPIPS metrics. The code is publicly available.
Tasks Image Reconstruction, Image Super-Resolution, Super-Resolution
Published 2018-07-31
URL http://arxiv.org/abs/1808.00043v1
PDF http://arxiv.org/pdf/1808.00043v1.pdf
PWC https://paperswithcode.com/paper/the-unreasonable-effectiveness-of-texture
Repo https://github.com/waleedgondal/Texture-based-Super-Resolution-Network
Framework pytorch
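
The texture loss at the heart of the paper is the classic Gram-matrix (style) loss over deep features. A generic PyTorch version (the paper adds semantic guidance on top of this basic form):

```python
import torch

def gram_matrix(features):
    """Channel-by-channel correlations of a feature map of shape (b, c, h, w)."""
    b, c, h, w = features.shape
    f = features.reshape(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def texture_loss(feat_sr, feat_hr):
    """Texture loss between deep features of the super-resolved image and
    the reference image: mean squared distance between their Gram matrices."""
    return torch.mean((gram_matrix(feat_sr) - gram_matrix(feat_hr)) ** 2)
```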

Hyperparameter Learning for Conditional Kernel Mean Embeddings with Rademacher Complexity Bounds

Title Hyperparameter Learning for Conditional Kernel Mean Embeddings with Rademacher Complexity Bounds
Authors Kelvin Hsu, Richard Nock, Fabio Ramos
Abstract Conditional kernel mean embeddings are nonparametric models that encode conditional expectations in a reproducing kernel Hilbert space. While they provide a flexible and powerful framework for probabilistic inference, their performance is highly dependent on the choice of kernel and regularization hyperparameters. Nevertheless, current hyperparameter tuning methods predominantly rely on expensive cross-validation or on heuristics that are not optimized for the inference task. For conditional kernel mean embeddings with categorical targets and arbitrary inputs, we propose a hyperparameter learning framework based on Rademacher complexity bounds to prevent overfitting by balancing data fit against model complexity. Our approach only requires batch updates, allowing scalable kernel hyperparameter tuning without invoking kernel approximations. Experiments demonstrate that our learning framework outperforms competing methods, and can be further extended to incorporate and learn deep neural network weights to improve generalization.
Tasks
Published 2018-09-01
URL http://arxiv.org/abs/1809.00175v3
PDF http://arxiv.org/pdf/1809.00175v3.pdf
PWC https://paperswithcode.com/paper/hyperparameter-learning-for-conditional
Repo https://github.com/Kelvin-Hsu/cake
Framework tf
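
For reference, the empirical conditional mean embedding behind the paper reduces to a regularized linear solve. A NumPy sketch; the kernel bandwidth and regularizer, fixed by hand here, are exactly the hyperparameters the paper learns from its Rademacher bound:

```python
import numpy as np

def rbf(a, b, gamma=1.0):
    """RBF kernel matrix between rows of a (n, d) and b (m, d)."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def conditional_embedding_weights(X, x_query, gamma=1.0, lam=1e-3):
    """Weights w(x) = (K + n*lam*I)^{-1} k(x) of the empirical conditional
    mean embedding mu_{Y|x} = sum_i w_i(x) phi(y_i). For categorical targets,
    stacking one-hot labels as rows of Y turns these weights into class
    probability estimates via Y.T @ w."""
    n = len(X)
    K = rbf(X, X, gamma)
    k = rbf(X, x_query, gamma)
    return np.linalg.solve(K + n * lam * np.eye(n), k)
```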

Segmentation-driven 6D Object Pose Estimation

Title Segmentation-driven 6D Object Pose Estimation
Authors Yinlin Hu, Joachim Hugonot, Pascal Fua, Mathieu Salzmann
Abstract The most recent trend in estimating the 6D pose of rigid objects has been to train deep networks to either directly regress the pose from the image or to predict the 2D locations of 3D keypoints, from which the pose can be obtained using a PnP algorithm. In both cases, the object is treated as a global entity, and a single pose estimate is computed. As a consequence, the resulting techniques can be vulnerable to large occlusions. In this paper, we introduce a segmentation-driven 6D pose estimation framework where each visible part of the objects contributes a local pose prediction in the form of 2D keypoint locations. We then use a predicted measure of confidence to combine these pose candidates into a robust set of 3D-to-2D correspondences, from which a reliable pose estimate can be obtained. We outperform the state-of-the-art on the challenging Occluded-LINEMOD and YCB-Video datasets, which is evidence that our approach deals well with multiple poorly-textured objects occluding each other. Furthermore, it relies on a simple enough architecture to achieve real-time performance.
Tasks 6D Pose Estimation, 6D Pose Estimation using RGB, Pose Estimation, Pose Prediction
Published 2018-12-06
URL http://arxiv.org/abs/1812.02541v3
PDF http://arxiv.org/pdf/1812.02541v3.pdf
PWC https://paperswithcode.com/paper/segmentation-driven-6d-object-pose-estimation
Repo https://github.com/sjtuytc/segmentation-driven-pose
Framework pytorch
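
The final fusion step maps onto standard tooling: keep the most confident 2D keypoint predictions and solve a RANSAC-PnP problem. A sketch with OpenCV (the network that produces the candidates is elided; the array layout is an assumption):

```python
import numpy as np
import cv2

def pose_from_keypoints(pts_2d, conf, pts_3d, K, keep=50):
    """Fuse per-part keypoint predictions into a robust pose (sketch of the
    paper's final step, not its network): keep the most confident 2D
    candidates for the 3D keypoints, then solve PnP with RANSAC."""
    order = np.argsort(-conf)[:keep]          # most confident candidates first
    ok, rvec, tvec, _ = cv2.solvePnPRansac(
        pts_3d[order].astype(np.float64),
        pts_2d[order].astype(np.float64),
        K, distCoeffs=None)
    return (rvec, tvec) if ok else None
```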

Learning with Abandonment

Title Learning with Abandonment
Authors Ramesh Johari, Sven Schmit
Abstract Consider a platform that wants to learn a personalized policy for each user, but the platform faces the risk of a user abandoning the platform if she is dissatisfied with the actions of the platform. For example, a platform is interested in personalizing the number of newsletters it sends, but faces the risk that the user unsubscribes forever. We propose a general thresholded learning model for scenarios like this, and discuss the structure of optimal policies. We describe salient features of optimal personalization algorithms and how feedback the platform receives impacts the results. Furthermore, we investigate how the platform can efficiently learn the heterogeneity across users by interacting with a population and provide performance guarantees.
Tasks
Published 2018-02-23
URL http://arxiv.org/abs/1802.08718v1
PDF http://arxiv.org/pdf/1802.08718v1.pdf
PWC https://paperswithcode.com/paper/learning-with-abandonment
Repo https://github.com/schmit/learning-abandonment
Framework none
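
A toy simulation makes the tension explicit: larger actions are more valuable, but the first action past the user's unknown threshold ends the episode forever (a stylized reward, not the paper's general model):

```python
def thresholded_interaction(actions, threshold):
    """Toy version of the thresholded model: the platform plays a sequence
    of actions and the user abandons the first time an action exceeds her
    tolerance threshold, forfeiting all future reward."""
    rewards = []
    for a in actions:
        if a > threshold:      # user abandons; the episode ends
            break
        rewards.append(a)      # stylized reward: larger action, more value
    return rewards

# Probing higher actions earns more, but one step too far ends everything.
print(thresholded_interaction([0.2, 0.4, 0.6, 0.8], threshold=0.5))  # [0.2, 0.4]
```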

Nested LSTMs

Title Nested LSTMs
Authors Joel Ruben Antony Moniz, David Krueger
Abstract We propose Nested LSTMs (NLSTM), a novel RNN architecture with multiple levels of memory. Nested LSTMs add depth to LSTMs via nesting as opposed to stacking. The value of a memory cell in an NLSTM is computed by an LSTM cell, which has its own inner memory cell. Specifically, instead of computing the value of the (outer) memory cell as $c^{outer}_t = f_t \odot c_{t-1} + i_t \odot g_t$, NLSTM memory cells use the concatenation $(f_t \odot c_{t-1}, i_t \odot g_t)$ as input to an inner LSTM (or NLSTM) memory cell, and set $c^{outer}_t = h^{inner}_t$. Nested LSTMs outperform both stacked and single-layer LSTMs with similar numbers of parameters in our experiments on various character-level language modeling tasks, and the inner memories of an LSTM learn longer-term dependencies compared with the higher-level units of a stacked LSTM.
Tasks Language Modelling
Published 2018-01-31
URL http://arxiv.org/abs/1801.10308v1
PDF http://arxiv.org/pdf/1801.10308v1.pdf
PWC https://paperswithcode.com/paper/nested-lstms
Repo https://github.com/hannw/nlstm
Framework tf
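
The equations translate directly into a cell update. A NumPy sketch of one NLSTM step following the paper's construction, with biases omitted and parameters passed as dicts of matrices (an illustrative layout, not the repository's API):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nlstm_cell(x, h, c_outer, c_inner, W, W_in):
    """One Nested LSTM step. W and W_in map gate names ('i', 'f', 'o', 'g')
    to the outer and inner weight matrices, respectively."""
    z = np.concatenate([x, h])
    i, f, o = (sigmoid(W[k] @ z) for k in ("i", "f", "o"))
    g = np.tanh(W["g"] @ z)

    # The inner LSTM replaces the usual additive cell update: it consumes
    # the concatenation (f * c_outer, i * g) and its hidden output becomes
    # the new outer cell state, c_outer = h_inner.
    z_in = np.concatenate([f * c_outer, i * g])
    i2, f2, o2 = (sigmoid(W_in[k] @ z_in) for k in ("i", "f", "o"))
    g2 = np.tanh(W_in["g"] @ z_in)
    c_inner = f2 * c_inner + i2 * g2
    c_outer = o2 * np.tanh(c_inner)

    return o * np.tanh(c_outer), c_outer, c_inner
```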

Rethinking ImageNet Pre-training

Title Rethinking ImageNet Pre-training
Authors Kaiming He, Ross Girshick, Piotr Dollár
Abstract We report competitive results on object detection and instance segmentation on the COCO dataset using standard models trained from random initialization. The results are no worse than their ImageNet pre-training counterparts even when using the hyper-parameters of the baseline system (Mask R-CNN) that were optimized for fine-tuning pre-trained models, with the sole exception of increasing the number of training iterations so the randomly initialized models may converge. Training from random initialization is surprisingly robust; our results hold even when: (i) using only 10% of the training data, (ii) for deeper and wider models, and (iii) for multiple tasks and metrics. Experiments show that ImageNet pre-training speeds up convergence early in training, but does not necessarily provide regularization or improve final target task accuracy. To push the envelope we demonstrate 50.9 AP on COCO object detection without using any external data—a result on par with the top COCO 2017 competition results that used ImageNet pre-training. These observations challenge the conventional wisdom of ImageNet pre-training for dependent tasks and we expect these discoveries will encourage people to rethink the current de facto paradigm of ‘pre-training and fine-tuning’ in computer vision.
Tasks Instance Segmentation, Object Detection, Semantic Segmentation
Published 2018-11-21
URL http://arxiv.org/abs/1811.08883v1
PDF http://arxiv.org/pdf/1811.08883v1.pdf
PWC https://paperswithcode.com/paper/rethinking-imagenet-pre-training
Repo https://github.com/tensorpack/tensorpack/tree/master/examples/FasterRCNN
Framework tf

A Benchmark of Selected Algorithmic Differentiation Tools on Some Problems in Computer Vision and Machine Learning

Title A Benchmark of Selected Algorithmic Differentiation Tools on Some Problems in Computer Vision and Machine Learning
Authors Filip Šrajer, Zuzana Kukelova, Andrew Fitzgibbon
Abstract Algorithmic differentiation (AD) allows exact computation of derivatives given only an implementation of an objective function. Although many AD tools are available, a proper and efficient implementation of AD methods is not straightforward. The existing tools are often too different to allow for a general test suite. In this paper, we compare fifteen ways of computing derivatives including eleven automatic differentiation tools implementing various methods and written in various languages (C++, F#, MATLAB, Julia and Python), two symbolic differentiation tools, finite differences, and hand-derived computation. We look at three objective functions from computer vision and machine learning. These objectives are for the most part simple, in the sense that no iterative loops are involved, and conditional statements are encapsulated in functions such as {\tt abs} or {\tt logsumexp}. However, it is important for the success of algorithmic differentiation that such ‘simple’ objective functions are handled efficiently, as so many problems in computer vision and machine learning are of this form. Of course, our results depend on programmer skill, and familiarity with the tools. However, we contend that this paper presents an important datapoint: a skilled programmer devoting roughly a week to each tool produced the timings we present. We have made our implementations available as open source to allow the community to replicate and update these benchmarks.
Tasks
Published 2018-07-26
URL http://arxiv.org/abs/1807.10129v1
PDF http://arxiv.org/pdf/1807.10129v1.pdf
PWC https://paperswithcode.com/paper/a-benchmark-of-selected-algorithmic
Repo https://github.com/awf/ADBench
Framework none
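
As a taste of the benchmark's setting, here is one of the 'simple' objectives it targets, logsumexp, with a hand-derived gradient checked against central finite differences — two of the fifteen approaches compared (this toy check is ours, not part of the benchmark suite):

```python
import numpy as np

def logsumexp(x):
    m = x.max()
    return m + np.log(np.exp(x - m).sum())

def grad_logsumexp(x):
    """Hand-derived gradient of logsumexp: the softmax of x."""
    e = np.exp(x - x.max())
    return e / e.sum()

def grad_fd(f, x, h=1e-6):
    """Central finite differences, one coordinate at a time."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        d = np.zeros_like(x)
        d[i] = h
        g[i] = (f(x + d) - f(x - d)) / (2 * h)
    return g

x = np.random.randn(5)
print(np.abs(grad_fd(logsumexp, x) - grad_logsumexp(x)).max())  # ~1e-10
```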

Recurrent Squeeze-and-Excitation Context Aggregation Net for Single Image Deraining

Title Recurrent Squeeze-and-Excitation Context Aggregation Net for Single Image Deraining
Authors Xia Li, Jianlong Wu, Zhouchen Lin, Hong Liu, Hongbin Zha
Abstract Rain streaks can severely degrade visibility, causing many current computer vision algorithms to fail, so it is necessary to remove rain from images. We propose a novel deep network architecture based on deep convolutional and recurrent neural networks for single-image deraining. As contextual information is very important for rain removal, we first adopt a dilated convolutional neural network to acquire a large receptive field, and we modify the network to better fit the rain removal task. In heavy rain, rain streaks have various directions and shapes, and can be regarded as the accumulation of multiple rain-streak layers. We assign different alpha-values to the various rain-streak layers according to their intensity and transparency by incorporating the squeeze-and-excitation block. Since rain-streak layers overlap with each other, it is not easy to remove the rain in one stage, so we further decompose rain removal into multiple stages. A recurrent neural network is incorporated to preserve the useful information from previous stages and benefit rain removal in later stages. We conduct extensive experiments on both synthetic and real-world datasets. Our proposed method outperforms the state-of-the-art approaches under all evaluation metrics. Codes and supplementary material are available at our project webpage: https://xialipku.github.io/RESCAN.
Tasks Rain Removal, Single Image Deraining
Published 2018-07-16
URL http://arxiv.org/abs/1807.05698v2
PDF http://arxiv.org/pdf/1807.05698v2.pdf
PWC https://paperswithcode.com/paper/recurrent-squeeze-and-excitation-context
Repo https://github.com/XueweiMeng/derain_filter
Framework tf
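
The stage-by-stage decomposition reads naturally as a loop. A schematic of the recurrent multi-stage scheme, with `stage_net` standing in for the dilated-conv, squeeze-and-excitation block (a placeholder, not the repository's API):

```python
def multi_stage_derain(rainy, stage_net, num_stages=4):
    """Schematic of the paper's multi-stage deraining: each stage predicts a
    residual rain layer from the current estimate, while a recurrent state
    carries information across stages."""
    x, state = rainy, None
    for _ in range(num_stages):
        rain, state = stage_net(x, state)  # predict remaining rain streaks
        x = x - rain                       # peel off one rain layer
    return x
```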

Deep convolutional Gaussian processes

Title Deep convolutional Gaussian processes
Authors Kenneth Blomqvist, Samuel Kaski, Markus Heinonen
Abstract We propose deep convolutional Gaussian processes, a deep Gaussian process architecture with convolutional structure. The model is a principled Bayesian framework for detecting hierarchical combinations of local features for image classification. We demonstrate greatly improved image classification performance compared to current Gaussian process approaches on the MNIST and CIFAR-10 datasets. In particular, we improve CIFAR-10 accuracy by over 10 percentage points.
Tasks Gaussian Processes, Image Classification
Published 2018-10-06
URL http://arxiv.org/abs/1810.03052v1
PDF http://arxiv.org/pdf/1810.03052v1.pdf
PWC https://paperswithcode.com/paper/deep-convolutional-gaussian-processes
Repo https://github.com/kekeblom/DeepCGP
Framework tf