October 20, 2019

3234 words 16 mins read

Paper Group AWR 267

GILBO: One Metric to Measure Them All. PDDLStream: Integrating Symbolic Planners and Blackbox Samplers via Optimistic Adaptive Planning. Learn To Pay Attention. Egocentric Vision-based Future Vehicle Localization for Intelligent Driving Assistance Systems. Information-Directed Exploration for Deep Reinforcement Learning. Unsupervised Learning with …

GILBO: One Metric to Measure Them All

Title GILBO: One Metric to Measure Them All
Authors Alexander A. Alemi, Ian Fischer
Abstract We propose a simple, tractable lower bound on the mutual information contained in the joint generative density of any latent variable generative model: the GILBO (Generative Information Lower BOund). It offers a data-independent measure of the complexity of the learned latent variable description, giving the log of the effective description length. It is well-defined for both VAEs and GANs. We compute the GILBO for 800 GANs and VAEs each trained on four datasets (MNIST, FashionMNIST, CIFAR-10 and CelebA) and discuss the results.
Tasks
Published 2018-02-13
URL http://arxiv.org/abs/1802.04874v3
PDF http://arxiv.org/pdf/1802.04874v3.pdf
PWC https://paperswithcode.com/paper/gilbo-one-metric-to-measure-them-all
Repo https://github.com/google/compare_gan
Framework tf
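
To make the bound concrete, here is a hedged sketch (not the authors' code; names and sizes are ours) of estimating the GILBO for a fixed generator G with a standard-normal prior: an auxiliary diagonal-Gaussian encoder e(z|x) is trained to maximize E[log e(z|G(z)) - log p(z)], which lower-bounds the mutual information of the generative joint.

```python
# Hypothetical sketch of estimating the GILBO for a trained generator G(z),
# assuming a standard-normal prior over z. The auxiliary encoder e(z|x) is a
# diagonal Gaussian; maximizing E[log e(z|G(z)) - log p(z)] over its parameters
# lower-bounds the mutual information I(X; Z) of the generative joint.
import torch
import torch.nn as nn

def estimate_gilbo(G, z_dim, x_dim, steps=2000, batch=128, lr=1e-3):
    encoder = nn.Sequential(nn.Linear(x_dim, 256), nn.ReLU(),
                            nn.Linear(256, 2 * z_dim))       # predicts mean and log-variance
    opt = torch.optim.Adam(encoder.parameters(), lr=lr)
    prior = torch.distributions.Normal(0.0, 1.0)
    for _ in range(steps):
        z = torch.randn(batch, z_dim)                         # z ~ p(z)
        with torch.no_grad():
            x = G(z)                                          # x = G(z), generator kept fixed
        mu, log_var = encoder(x.flatten(1)).chunk(2, dim=1)
        e = torch.distributions.Normal(mu, log_var.mul(0.5).exp())
        gilbo = (e.log_prob(z) - prior.log_prob(z)).sum(1).mean()
        opt.zero_grad()
        (-gilbo).backward()                                   # maximize the bound
        opt.step()
    return gilbo.item()                                       # in nats; divide by ln 2 for bits
```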

PDDLStream: Integrating Symbolic Planners and Blackbox Samplers via Optimistic Adaptive Planning

Title PDDLStream: Integrating Symbolic Planners and Blackbox Samplers via Optimistic Adaptive Planning
Authors Caelan Reed Garrett, Tomás Lozano-Pérez, Leslie Pack Kaelbling
Abstract Many planning applications involve complex relationships defined on high-dimensional, continuous variables. For example, robotic manipulation requires planning with kinematic, collision, visibility, and motion constraints involving robot configurations, object poses, and robot trajectories. These constraints typically require specialized procedures to sample satisfying values. We extend PDDL to support a generic, declarative specification for these procedures that treats their implementation as black boxes. We provide domain-independent algorithms that reduce PDDLStream problems to a sequence of finite PDDL problems. We also introduce an algorithm that dynamically balances exploring new candidate plans and exploiting existing ones. This enables the algorithm to greedily search the space of parameter bindings to more quickly solve tightly-constrained problems as well as locally optimize to produce low-cost solutions. We evaluate our algorithms on three simulated robotic planning domains as well as several real-world robotic tasks.
Tasks Motion Planning
Published 2018-02-23
URL https://arxiv.org/abs/1802.08705v5
PDF https://arxiv.org/pdf/1802.08705v5.pdf
PWC https://paperswithcode.com/paper/stripstream-integrating-symbolic-planners-and
Repo https://github.com/jingxixu/pddlstream
Framework none
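
For intuition only, a pseudocode-level sketch of the optimistic planning loop described in the abstract; every method name here (optimistic_stream_outputs, bind_streams, grow_optimistic_level) is hypothetical shorthand for PDDLStream machinery, not the library's actual API.

```python
# Pseudocode-level sketch (hypothetical helper names, not the PDDLStream API):
# plan against optimistic placeholder stream outputs, then try to bind them by
# calling the blackbox samplers, retrying with a deeper optimistic level on failure.
def solve(problem, planner, max_iterations=100):
    for _ in range(max_iterations):
        optimistic_facts = problem.facts + problem.optimistic_stream_outputs()
        plan = planner(problem.goal, optimistic_facts)        # finite PDDL problem
        if plan is None:
            problem.grow_optimistic_level()                   # allow deeper stream composition
            continue
        bindings = problem.bind_streams(plan)                 # call the blackbox samplers
        if bindings is not None:
            return plan.instantiate(bindings)
    return None
```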

Learn To Pay Attention

Title Learn To Pay Attention
Authors Saumya Jetley, Nicholas A. Lord, Namhoon Lee, Philip H. S. Torr
Abstract We propose an end-to-end-trainable attention module for convolutional neural network (CNN) architectures built for image classification. The module takes as input the 2D feature vector maps which form the intermediate representations of the input image at different stages in the CNN pipeline, and outputs a 2D matrix of scores for each map. Standard CNN architectures are modified through the incorporation of this module, and trained under the constraint that a convex combination of the intermediate 2D feature vectors, as parameterised by the score matrices, must \textit{alone} be used for classification. Incentivised to amplify the relevant and suppress the irrelevant or misleading, the scores thus assume the role of attention values. Our experimental observations provide clear evidence to this effect: the learned attention maps neatly highlight the regions of interest while suppressing background clutter. Consequently, the proposed function is able to bootstrap standard CNN architectures for the task of image classification, demonstrating superior generalisation over 6 unseen benchmark datasets. When binarised, our attention maps outperform other CNN-based attention maps, traditional saliency maps, and top object proposals for weakly supervised segmentation as demonstrated on the Object Discovery dataset. We also demonstrate improved robustness against the fast gradient sign method of adversarial attack.
Tasks Adversarial Attack, Image Classification
Published 2018-04-06
URL http://arxiv.org/abs/1804.02391v2
PDF http://arxiv.org/pdf/1804.02391v2.pdf
PWC https://paperswithcode.com/paper/learn-to-pay-attention
Repo https://github.com/caoquanjie/LearnToPayAttention
Framework tf
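
A minimal sketch of the attention module described above (our own simplification, in PyTorch): dot-product compatibility between local feature vectors and a projected global feature, a softmax over spatial positions, and a convex combination of the local vectors that would be used for classification.

```python
# Minimal sketch (not the authors' code) of the attention module: compatibility
# scores between local features and a global feature, softmax over positions,
# and a convex combination of the local vectors.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionHead(nn.Module):
    def __init__(self, local_channels, global_dim):
        super().__init__()
        self.project = nn.Linear(global_dim, local_channels)   # align dimensions

    def forward(self, local_feat, global_feat):
        # local_feat: (B, C, H, W) intermediate feature maps; global_feat: (B, D)
        b, c, h, w = local_feat.shape
        l = local_feat.flatten(2)                               # (B, C, H*W)
        g = self.project(global_feat).unsqueeze(2)              # (B, C, 1)
        scores = (l * g).sum(1)                                 # (B, H*W) compatibility
        attn = F.softmax(scores, dim=1)                         # attention map
        pooled = torch.bmm(l, attn.unsqueeze(2)).squeeze(2)     # (B, C) convex combination
        return pooled, attn.view(b, h, w)

# the pooled vectors from several stages would be concatenated and fed to a linear classifier
```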

Egocentric Vision-based Future Vehicle Localization for Intelligent Driving Assistance Systems

Title Egocentric Vision-based Future Vehicle Localization for Intelligent Driving Assistance Systems
Authors Yu Yao, Mingze Xu, Chiho Choi, David J. Crandall, Ella M. Atkins, Behzad Dariush
Abstract Predicting the future location of vehicles is essential for safety-critical applications such as advanced driver assistance systems (ADAS) and autonomous driving. This paper introduces a novel approach to simultaneously predict both the location and scale of target vehicles in the first-person (egocentric) view of an ego-vehicle. We present a multi-stream recurrent neural network (RNN) encoder-decoder model that separately captures object location and scale as well as pixel-level observations for future vehicle localization. We show that incorporating dense optical flow improves prediction results significantly since it captures information about motion as well as appearance change. We also find that explicitly modeling future motion of the ego-vehicle improves the prediction accuracy, which could be especially beneficial in intelligent and automated vehicles that have motion planning capability. To evaluate the performance of our approach, we present a new dataset of first-person videos collected from a variety of scenarios at road intersections, which are particularly challenging moments for prediction because vehicle trajectories are diverse and dynamic.
Tasks Autonomous Driving, Motion Planning, Optical Flow Estimation
Published 2018-09-19
URL http://arxiv.org/abs/1809.07408v2
PDF http://arxiv.org/pdf/1809.07408v2.pdf
PWC https://paperswithcode.com/paper/egocentric-vision-based-future-vehicle
Repo https://github.com/MoonBlvd/tad-IROS2019
Framework pytorch
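
A hedged sketch of a two-stream RNN encoder-decoder in the spirit of the paper: one GRU encodes past bounding boxes, another encodes pooled optical-flow features, and the decoder rolls out future box offsets. Layer sizes and names are illustrative, not the authors' architecture.

```python
# Hedged two-stream encoder-decoder sketch; sizes and names are illustrative.
import torch
import torch.nn as nn

class FutureBoxPredictor(nn.Module):
    def __init__(self, flow_dim=50, hidden=128, horizon=10):
        super().__init__()
        self.horizon = horizon
        self.box_enc = nn.GRU(4, hidden, batch_first=True)        # past (cx, cy, w, h)
        self.flow_enc = nn.GRU(flow_dim, hidden, batch_first=True)
        self.decoder = nn.GRUCell(4, 2 * hidden)
        self.head = nn.Linear(2 * hidden, 4)                       # per-step box offset

    def forward(self, past_boxes, past_flow):
        _, h_box = self.box_enc(past_boxes)                        # (1, B, hidden)
        _, h_flow = self.flow_enc(past_flow)
        h = torch.cat([h_box[0], h_flow[0]], dim=1)                # fused hidden state
        box, outputs = past_boxes[:, -1], []
        for _ in range(self.horizon):
            h = self.decoder(box, h)
            box = box + self.head(h)                                # predict relative motion
            outputs.append(box)
        return torch.stack(outputs, dim=1)                          # (B, horizon, 4)
```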

Information-Directed Exploration for Deep Reinforcement Learning

Title Information-Directed Exploration for Deep Reinforcement Learning
Authors Nikolay Nikolov, Johannes Kirschner, Felix Berkenkamp, Andreas Krause
Abstract Efficient exploration remains a major challenge for reinforcement learning. One reason is that the variability of the returns often depends on the current state and action, and is therefore heteroscedastic. Classical exploration strategies such as upper confidence bound algorithms and Thompson sampling fail to appropriately account for heteroscedasticity, even in the bandit setting. Motivated by recent findings that address this issue in bandits, we propose to use Information-Directed Sampling (IDS) for exploration in reinforcement learning. As our main contribution, we build on recent advances in distributional reinforcement learning and propose a novel, tractable approximation of IDS for deep Q-learning. The resulting exploration strategy explicitly accounts for both parametric uncertainty and heteroscedastic observation noise. We evaluate our method on Atari games and demonstrate a significant improvement over alternative approaches.
Tasks Atari Games, Distributional Reinforcement Learning, Efficient Exploration, Q-Learning
Published 2018-12-18
URL http://arxiv.org/abs/1812.07544v2
PDF http://arxiv.org/pdf/1812.07544v2.pdf
PWC https://paperswithcode.com/paper/information-directed-exploration-for-deep
Repo https://github.com/nikonikolov/rltf
Framework tf
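
A simplified sketch of the information-directed action rule (not the paper's exact estimator): pick the action minimizing squared estimated regret over an information-gain term that grows with parametric (epistemic) uncertainty and shrinks with return (aleatoric) variance.

```python
# Simplified IDS action rule: regret estimate from confidence bounds, information
# gain log(1 + epistemic variance / aleatoric variance); a sketch, not the paper's code.
import numpy as np

def ids_action(q_mean, q_epistemic_std, return_var, eps=1e-8):
    """q_mean, q_epistemic_std, return_var: arrays of shape (num_actions,)."""
    upper = q_mean + q_epistemic_std
    regret = upper.max() - (q_mean - q_epistemic_std)            # conservative regret estimate
    info_gain = np.log1p(q_epistemic_std**2 / (return_var + eps))
    ratio = regret**2 / (info_gain + eps)
    return int(np.argmin(ratio))

# example: the uncertain action with low observation noise gets explored
print(ids_action(np.array([1.0, 0.9, 0.5]),
                 np.array([0.05, 0.4, 0.05]),
                 np.array([0.5, 0.1, 0.5])))
```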

Unsupervised Learning with Stein’s Unbiased Risk Estimator

Title Unsupervised Learning with Stein’s Unbiased Risk Estimator
Authors Christopher A. Metzler, Ali Mousavi, Reinhard Heckel, Richard G. Baraniuk
Abstract Learning from unlabeled and noisy data is one of the grand challenges of machine learning. As such, it has seen a flurry of research with new ideas proposed continuously. In this work, we revisit a classical idea: Stein’s Unbiased Risk Estimator (SURE). We show that, in the context of image recovery, SURE and its generalizations can be used to train convolutional neural networks (CNNs) for a range of image denoising and recovery problems without any ground truth data. Specifically, our goal is to reconstruct an image $x$ from a noisy linear transformation (measurement) of the image. We consider two scenarios: one where no additional data is available and one where we have measurements of other images that are drawn from the same noisy distribution as $x$, but have no access to the clean images. Such is the case, for instance, in the context of medical imaging, microscopy, and astronomy, where noise-less ground truth data is rarely available. We show that in this situation, SURE can be used to estimate the mean-squared-error loss associated with an estimate of $x$. Using this estimate of the loss, we train networks to perform denoising and compressed sensing recovery. In addition, we use the SURE framework to partially explain and improve upon an intriguing result presented by Ulyanov et al. in “Deep Image Prior”: that a network initialized with random weights and fit to a single noisy image can effectively denoise that image. Public implementations of the networks and methods described in this paper can be found at https://github.com/ricedsp/D-AMP_Toolbox.
Tasks Denoising, Image Denoising
Published 2018-05-26
URL https://arxiv.org/abs/1805.10531v2
PDF https://arxiv.org/pdf/1805.10531v2.pdf
PWC https://paperswithcode.com/paper/unsupervised-learning-with-steins-unbiased
Repo https://github.com/ricedsp/D-AMP_Toolbox
Framework tf
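
A minimal sketch of a Monte-Carlo SURE loss for Gaussian denoising in the spirit of the paper (function names are ours): the divergence term is estimated with a single random probe, so no clean image is needed to train the denoiser.

```python
# Monte-Carlo SURE loss sketch for y = x + N(0, sigma^2); names are ours, not the paper's code.
import torch

def sure_loss(f, y, sigma, eps=1e-3):
    """y: noisy images (B, C, H, W); sigma: known noise std; f: denoising network."""
    n = y[0].numel()
    out = f(y)
    # Monte-Carlo estimate of the divergence of f at y using a random probe
    b = torch.randn_like(y)
    div = (b * (f(y + eps * b) - out)).flatten(1).sum(1) / eps
    mse_to_noisy = (out - y).flatten(1).pow(2).sum(1)
    # per-image SURE estimate of E||f(y) - x||^2 / n, with no access to x
    sure = mse_to_noisy / n - sigma**2 + (2 * sigma**2 / n) * div
    return sure.mean()
```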

Noise2Void - Learning Denoising from Single Noisy Images

Title Noise2Void - Learning Denoising from Single Noisy Images
Authors Alexander Krull, Tim-Oliver Buchholz, Florian Jug
Abstract The field of image denoising is currently dominated by discriminative deep learning methods that are trained on pairs of noisy input and clean target images. Recently, it has been shown that such methods can also be trained without clean targets. Instead, independent pairs of noisy images can be used, in an approach known as Noise2Noise (N2N). Here, we introduce Noise2Void (N2V), a training scheme that takes this idea one step further. It requires neither noisy image pairs nor clean target images. Consequently, N2V allows us to train directly on the body of data to be denoised and can therefore be applied when other methods cannot. Especially interesting is the application to biomedical image data, where the acquisition of training targets, clean or noisy, is frequently not possible. We compare the performance of N2V to approaches that have either clean target images and/or noisy image pairs available. Intuitively, N2V cannot be expected to outperform methods that have more information available during training. Still, we observe that the denoising performance of Noise2Void drops only moderately and compares favorably to training-free denoising methods.
Tasks Denoising, Image Denoising
Published 2018-11-27
URL http://arxiv.org/abs/1811.10980v2
PDF http://arxiv.org/pdf/1811.10980v2.pdf
PWC https://paperswithcode.com/paper/noise2void-learning-denoising-from-single
Repo https://github.com/juglab/n2v
Framework tf
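
A hedged sketch of one blind-spot training step: a few pixels per patch are replaced by values from random neighbours, the network predicts the full patch, and the loss is taken only at the manipulated positions against the original noisy values. Mask density and neighbourhood size are illustrative, not the paper's settings.

```python
# Blind-spot training step sketch; mask density and neighbourhood radius are illustrative.
import torch

def n2v_step(net, patch, num_masked=64, radius=2):
    """patch: (B, 1, H, W) noisy images; net: any image-to-image network."""
    b, _, h, w = patch.shape
    target = patch.clone()
    inp = patch.clone()
    mask = torch.zeros_like(patch, dtype=torch.bool)
    for i in range(b):
        ys = torch.randint(0, h, (num_masked,))
        xs = torch.randint(0, w, (num_masked,))
        dy = torch.randint(-radius, radius + 1, (num_masked,))
        dx = torch.randint(-radius, radius + 1, (num_masked,))
        ny = (ys + dy).clamp(0, h - 1)
        nx = (xs + dx).clamp(0, w - 1)
        inp[i, 0, ys, xs] = patch[i, 0, ny, nx]      # replace blind-spot pixels with neighbours
        mask[i, 0, ys, xs] = True
    pred = net(inp)
    return ((pred - target)[mask] ** 2).mean()        # loss only on the masked pixels
```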

OCNet: Object Context Network for Scene Parsing

Title OCNet: Object Context Network for Scene Parsing
Authors Yuhui Yuan, Jingdong Wang
Abstract In this paper, we address the problem of scene parsing with deep learning and focus on the context aggregation strategy for robust segmentation. Motivated by the fact that the label of a pixel is the category of the object that the pixel belongs to, we introduce an \emph{object context pooling (OCP)} scheme, which represents each pixel by exploiting the set of pixels that belong to the same object category as that pixel; we call this set of pixels the object context. Our implementation, inspired by the self-attention approach, consists of two steps: (i) compute the similarities between each pixel and all the pixels, forming a so-called object context map for each pixel that serves as a surrogate for the true object context, and (ii) represent the pixel by aggregating the features of all the pixels weighted by the similarities. The resulting representation is more robust than existing context aggregation schemes, e.g., the pyramid pooling module (PPM) in PSPNet and atrous spatial pyramid pooling (ASPP), which do not differentiate whether context pixels belong to the same object category, limiting the reliability of the contextually aggregated representations. We empirically demonstrate our approach and two pyramid extensions with state-of-the-art performance on three semantic segmentation benchmarks: Cityscapes, ADE20K and LIP. Code has been made available at: https://github.com/PkuRainBow/OCNet.
Tasks Scene Parsing, Semantic Segmentation
Published 2018-09-04
URL http://arxiv.org/abs/1809.00916v3
PDF http://arxiv.org/pdf/1809.00916v3.pdf
PWC https://paperswithcode.com/paper/ocnet-object-context-network-for-scene
Repo https://github.com/PkuRainBow/OCNet
Framework pytorch
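
A minimal self-attention style sketch of object context pooling as paraphrased from the abstract (our own code, not the repository's): each pixel is compared with all pixels, the similarities form its object context map, and features are re-aggregated with those weights.

```python
# Self-attention style object-context pooling sketch; our paraphrase, not the OCNet code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ObjectContextPooling(nn.Module):
    def __init__(self, channels, key_channels=64):
        super().__init__()
        self.query = nn.Conv2d(channels, key_channels, 1)
        self.key = nn.Conv2d(channels, key_channels, 1)
        self.value = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)                    # (B, HW, K)
        k = self.key(x).flatten(2)                                      # (B, K, HW)
        v = self.value(x).flatten(2).transpose(1, 2)                    # (B, HW, C)
        context_map = F.softmax(q @ k / (k.shape[1] ** 0.5), dim=-1)    # (B, HW, HW)
        out = (context_map @ v).transpose(1, 2).view(b, c, h, w)
        return torch.cat([x, out], dim=1)            # concatenate pixel and context features
```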

MolGAN: An implicit generative model for small molecular graphs

Title MolGAN: An implicit generative model for small molecular graphs
Authors Nicola De Cao, Thomas Kipf
Abstract Deep generative models for graph-structured data offer a new angle on the problem of chemical synthesis: by optimizing differentiable models that directly generate molecular graphs, it is possible to side-step expensive search procedures in the discrete and vast space of chemical structures. We introduce MolGAN, an implicit, likelihood-free generative model for small molecular graphs that circumvents the need for expensive graph matching procedures or node ordering heuristics of previous likelihood-based methods. Our method adapts generative adversarial networks (GANs) to operate directly on graph-structured data. We combine our approach with a reinforcement learning objective to encourage the generation of molecules with specific desired chemical properties. In experiments on the QM9 chemical database, we demonstrate that our model is capable of generating close to 100% valid compounds. MolGAN compares favorably both to recent proposals that use string-based (SMILES) representations of molecules and to a likelihood-based method that directly generates graphs, albeit being susceptible to mode collapse.
Tasks Graph Matching
Published 2018-05-30
URL http://arxiv.org/abs/1805.11973v1
PDF http://arxiv.org/pdf/1805.11973v1.pdf
PWC https://paperswithcode.com/paper/molgan-an-implicit-generative-model-for-small
Repo https://github.com/nicola-decao/MolGAN
Framework tf
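
A rough sketch of the combined generator objective described above (our simplification): a WGAN term from the discriminator plus a term from a learned reward network that predicts the desired chemical properties, mixed by a trade-off weight.

```python
# Simplified combined generator objective; a sketch of the idea, not MolGAN's exact training code.
import torch

def generator_loss(discriminator, reward_net, fake_graph, lam=0.5):
    """fake_graph: (node_features, adjacency) tensors produced by the generator."""
    wgan_term = -discriminator(*fake_graph).mean()     # fool the critic
    rl_term = -reward_net(*fake_graph).mean()          # maximize predicted chemical reward
    return lam * wgan_term + (1 - lam) * rl_term
```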

Forward Modeling for Partial Observation Strategy Games - A StarCraft Defogger

Title Forward Modeling for Partial Observation Strategy Games - A StarCraft Defogger
Authors Gabriel Synnaeve, Zeming Lin, Jonas Gehring, Dan Gant, Vegard Mella, Vasil Khalidov, Nicolas Carion, Nicolas Usunier
Abstract We formulate the problem of defogging as state estimation and future state prediction from previous, partial observations in the context of real-time strategy games. We propose to employ encoder-decoder neural networks for this task, and introduce proxy tasks and baselines for evaluation to assess their ability to capture basic game rules and high-level dynamics. By combining convolutional neural networks and recurrent networks, we exploit spatial and sequential correlations and train well-performing models on a large dataset of human games of StarCraft: Brood War. Finally, we demonstrate the relevance of our models to downstream tasks by applying them to enemy unit prediction in a state-of-the-art, rule-based StarCraft bot. We observe improvements in win rates against several strong community bots.
Tasks Real-Time Strategy Games, Starcraft
Published 2018-11-30
URL http://arxiv.org/abs/1812.00054v1
PDF http://arxiv.org/pdf/1812.00054v1.pdf
PWC https://paperswithcode.com/paper/forward-modeling-for-partial-observation
Repo https://github.com/facebookresearch/starcraft_defogger
Framework pytorch
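
An illustrative encoder-decoder forward model in the spirit of the defogger (layers and sizes are made up): spatially encode each partially observed frame, aggregate over time with a recurrent layer, and decode a prediction of the hidden state of the map.

```python
# Illustrative conv + recurrent forward model; sizes and layers are made up, not the paper's.
import torch
import torch.nn as nn

class Defogger(nn.Module):
    def __init__(self, in_channels, hidden=64, out_channels=16, grid=32):
        super().__init__()
        self.encode = nn.Sequential(nn.Conv2d(in_channels, hidden, 3, padding=1), nn.ReLU(),
                                    nn.AdaptiveAvgPool2d(8), nn.Flatten())
        self.temporal = nn.GRU(hidden * 64, hidden, batch_first=True)
        self.decode = nn.Linear(hidden, out_channels * grid * grid)
        self.out_shape = (out_channels, grid, grid)

    def forward(self, frames):                          # frames: (B, T, C, H, W)
        b, t = frames.shape[:2]
        feats = self.encode(frames.flatten(0, 1)).view(b, t, -1)
        _, h = self.temporal(feats)
        return self.decode(h[0]).view(b, *self.out_shape)   # predicted full (defogged) state
```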

LPD-Net: 3D Point Cloud Learning for Large-Scale Place Recognition and Environment Analysis

Title LPD-Net: 3D Point Cloud Learning for Large-Scale Place Recognition and Environment Analysis
Authors Zhe Liu, Shunbo Zhou, Chuanzhe Suo, Yingtian Liu, Peng Yin, Hesheng Wang, Yun-Hui Liu
Abstract Point cloud based place recognition is still an open issue due to the difficulty of extracting local features from the raw 3D point cloud and generating a global descriptor, and it is even harder in large-scale dynamic environments. In this paper, we develop a novel deep neural network, named LPD-Net (Large-scale Place Description Network), which can extract discriminative and generalizable global descriptors from the raw 3D point cloud. Two modules, an adaptive local feature extraction module and a graph-based neighborhood aggregation module, are proposed; they extract local structures and reveal the spatial distribution of local features in the large-scale point cloud in an end-to-end manner. We apply the proposed global descriptor to point cloud based retrieval tasks to achieve large-scale place recognition. Comparison results show that our LPD-Net substantially outperforms PointNetVLAD and achieves state-of-the-art performance. We also compare our LPD-Net with vision-based solutions to show the robustness of our approach to different weather and light conditions.
Tasks
Published 2018-12-11
URL https://arxiv.org/abs/1812.07050v2
PDF https://arxiv.org/pdf/1812.07050v2.pdf
PWC https://paperswithcode.com/paper/181207050
Repo https://github.com/Suoivy/LPD-net
Framework tf
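
A hedged sketch of graph-based neighbourhood aggregation over a point cloud, which is the flavour of aggregation the abstract describes (the paper's exact design differs): build a k-NN graph and max-pool each point's neighbour features.

```python
# k-NN graph aggregation sketch; the flavour of the idea, not LPD-Net's exact module.
import torch

def knn_aggregate(xyz, feats, k=16):
    """xyz: (N, 3) point coordinates; feats: (N, C) per-point features."""
    dist = torch.cdist(xyz, xyz)                       # (N, N) pairwise distances
    idx = dist.topk(k, largest=False).indices          # k nearest neighbours (includes the point itself)
    neighbour_feats = feats[idx]                       # (N, k, C)
    return neighbour_feats.max(dim=1).values           # (N, C) aggregated local descriptor
```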

A Memory-Network Based Solution for Multivariate Time-Series Forecasting

Title A Memory-Network Based Solution for Multivariate Time-Series Forecasting
Authors Yen-Yu Chang, Fan-Yun Sun, Yueh-Hua Wu, Shou-De Lin
Abstract Multivariate time series forecasting has been studied extensively over the years, with ubiquitous applications in areas such as finance, traffic, and the environment. Still, concerns have been raised that traditional methods are incapable of modeling the complex patterns or dependencies in real-world data. To address such concerns, various deep learning models, mainly Recurrent Neural Network (RNN) based methods, have been proposed. Nevertheless, capturing extremely long-term patterns while effectively incorporating information from other variables remains a challenge for time-series forecasting. Furthermore, lack of explainability remains a serious drawback of deep neural network models. Inspired by the Memory Network proposed for solving the question-answering task, we propose a deep learning based model named Memory Time-series network (MTNet) for time series forecasting. MTNet consists of a large memory component, three separate encoders, and an autoregressive component that are trained jointly. Additionally, the attention mechanism we design enables MTNet to be highly interpretable: we can easily tell which part of the historic data is referenced the most.
Tasks Multivariate Time Series Forecasting, Question Answering, Time Series, Time Series Forecasting
Published 2018-09-06
URL http://arxiv.org/abs/1809.02105v1
PDF http://arxiv.org/pdf/1809.02105v1.pdf
PWC https://paperswithcode.com/paper/a-memory-network-based-solution-for
Repo https://github.com/Maple728/MTNet
Framework tf
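
A simplified sketch of the memory-attention idea (our own naming and sizes): encode the recent window as a query, attend over encodings of long-term memory blocks, and add a linear autoregressive term on the raw series.

```python
# Memory-attention forecasting sketch; naming and sizes are ours, not MTNet's exact design.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MemoryForecaster(nn.Module):
    def __init__(self, num_series, hidden=64, ar_window=8):
        super().__init__()
        self.ar_window = ar_window
        self.encode_q = nn.GRU(num_series, hidden, batch_first=True)
        self.encode_m = nn.GRU(num_series, hidden, batch_first=True)
        self.out = nn.Linear(2 * hidden, num_series)
        self.ar = nn.Linear(ar_window, 1)

    def forward(self, memory_blocks, recent):
        # memory_blocks: (B, n_blocks, window, num_series); recent: (B, window, num_series)
        b, n, w, d = memory_blocks.shape
        _, q = self.encode_q(recent)
        q = q[0]                                                        # (B, hidden)
        _, m = self.encode_m(memory_blocks.flatten(0, 1))
        m = m[0].view(b, n, -1)                                         # (B, n_blocks, hidden)
        attn = F.softmax((m @ q.unsqueeze(2)).squeeze(2), dim=1)        # weights over memory blocks
        context = (attn.unsqueeze(2) * m).sum(1)                        # (B, hidden)
        nonlinear = self.out(torch.cat([q, context], dim=1))            # (B, num_series)
        ar = self.ar(recent[:, -self.ar_window:].transpose(1, 2)).squeeze(2)  # linear AR term
        return nonlinear + ar
```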

CCNet: Criss-Cross Attention for Semantic Segmentation

Title CCNet: Criss-Cross Attention for Semantic Segmentation
Authors Zilong Huang, Xinggang Wang, Lichao Huang, Chang Huang, Yunchao Wei, Wenyu Liu
Abstract Long-range dependencies can capture useful contextual information to benefit visual understanding problems. In this work, we propose a Criss-Cross Network (CCNet) for obtaining such important information in a more effective and efficient way. Concretely, for each pixel, our CCNet can harvest the contextual information of its surrounding pixels on the criss-cross path through a novel criss-cross attention module. By taking a further recurrent operation, each pixel can finally capture long-range dependencies from all pixels. Overall, our CCNet has the following merits: 1) GPU memory friendly. Compared with the non-local block, the recurrent criss-cross attention module requires $11\times$ less GPU memory usage. 2) High computational efficiency. The recurrent criss-cross attention reduces the FLOPs of the non-local block by about 85% in computing long-range dependencies. 3) State-of-the-art performance. We conduct extensive experiments on popular semantic segmentation benchmarks including Cityscapes and ADE20K, and the instance segmentation benchmark COCO. In particular, our CCNet achieves mIoU scores of 81.4 and 45.22 on the Cityscapes test set and the ADE20K validation set, respectively, which are new state-of-the-art results. We make the code publicly available at \url{https://github.com/speedinghzl/CCNet}.
Tasks Instance Segmentation, Semantic Segmentation
Published 2018-11-28
URL http://arxiv.org/abs/1811.11721v1
PDF http://arxiv.org/pdf/1811.11721v1.pdf
PWC https://paperswithcode.com/paper/ccnet-criss-cross-attention-for-semantic
Repo https://github.com/speedinghzl/CCNet
Framework pytorch
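
A naive, unoptimized sketch of criss-cross attention for illustration (the official CUDA implementation is far more memory-efficient): each pixel attends only to positions in its own row and column, and applying the module recurrently lets information reach every position.

```python
# Naive criss-cross attention sketch; for illustration only, not the repository's optimized kernel.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrissCrossAttention(nn.Module):
    def __init__(self, channels, key_channels=32):
        super().__init__()
        self.query = nn.Conv2d(channels, key_channels, 1)
        self.key = nn.Conv2d(channels, key_channels, 1)
        self.value = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        q, k, v = self.query(x), self.key(x), self.value(x)
        # row attention: compare each pixel with all pixels in its row
        row_scores = torch.einsum('bkhw,bkhj->bhwj', q, k)              # (B, H, W, W)
        # column attention: compare each pixel with all pixels in its column
        col_scores = torch.einsum('bkhw,bkiw->bhwi', q, k)              # (B, H, W, H)
        attn = F.softmax(torch.cat([row_scores, col_scores], dim=-1), dim=-1)
        row_attn, col_attn = attn[..., :w], attn[..., w:]
        out = (torch.einsum('bhwj,bchj->bchw', row_attn, v) +
               torch.einsum('bhwi,bciw->bchw', col_attn, v))
        return x + out                                                  # residual connection
```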

Accurate Uncertainties for Deep Learning Using Calibrated Regression

Title Accurate Uncertainties for Deep Learning Using Calibrated Regression
Authors Volodymyr Kuleshov, Nathan Fenner, Stefano Ermon
Abstract Methods for reasoning under uncertainty are a key building block of accurate and reliable machine learning systems. Bayesian methods provide a general framework to quantify uncertainty. However, because of model misspecification and the use of approximate inference, Bayesian uncertainty estimates are often inaccurate – for example, a 90% credible interval may not contain the true outcome 90% of the time. Here, we propose a simple procedure for calibrating any regression algorithm; when applied to Bayesian and probabilistic models, it is guaranteed to produce calibrated uncertainty estimates given enough data. Our procedure is inspired by Platt scaling and extends previous work on classification. We evaluate this approach on Bayesian linear regression, feedforward, and recurrent neural networks, and find that it consistently outputs well-calibrated credible intervals while improving performance on time series forecasting and model-based reinforcement learning tasks.
Tasks Time Series, Time Series Forecasting
Published 2018-07-01
URL http://arxiv.org/abs/1807.00263v1
PDF http://arxiv.org/pdf/1807.00263v1.pdf
PWC https://paperswithcode.com/paper/accurate-uncertainties-for-deep-learning
Repo https://github.com/ulissigroup/uncertainty_benchmarking
Framework pytorch
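
A compact sketch of the recalibration procedure described above, assuming the base model exposes its predictive CDF: compute the predicted CDF value at each held-out outcome, fit an isotonic regression from predicted probability levels to their empirical frequencies, and compose it with the model's CDF at test time.

```python
# Recalibration sketch in the spirit of the paper; variable names and the usage
# example below are ours, and model_cdf is a hypothetical helper.
import numpy as np
from sklearn.isotonic import IsotonicRegression

def fit_recalibrator(predicted_cdf_at_y):
    """predicted_cdf_at_y: array of F_t(y_t) values on a held-out calibration set."""
    p = np.sort(predicted_cdf_at_y)
    empirical = np.arange(1, len(p) + 1) / len(p)        # observed frequency of each level
    r = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds='clip')
    r.fit(p, empirical)
    return r

# usage (hypothetical): a calibrated credible interval uses R(F(y)) instead of F(y)
# recalibrator = fit_recalibrator(model_cdf(x_cal, y_cal))
# calibrated_level = recalibrator.predict(model_cdf(x_test, y_grid))
```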

Self-Supervised Model Adaptation for Multimodal Semantic Segmentation

Title Self-Supervised Model Adaptation for Multimodal Semantic Segmentation
Authors Abhinav Valada, Rohit Mohan, Wolfram Burgard
Abstract Learning to reliably perceive and understand the scene is an integral enabler for robots to operate in the real world. This problem is inherently challenging due to the multitude of object types as well as appearance changes caused by varying illumination and weather conditions. Leveraging complementary modalities can enable learning of semantically richer representations that are resilient to such perturbations. Despite the tremendous progress in recent years, most multimodal convolutional neural network approaches directly concatenate feature maps from individual modality streams, rendering the model incapable of focusing only on relevant complementary information for fusion. To address this limitation, we propose a multimodal semantic segmentation framework that dynamically adapts the fusion of modality-specific features while being sensitive to the object category, spatial location and scene context in a self-supervised manner. Specifically, we propose an architecture consisting of two modality-specific encoder streams that fuse intermediate encoder representations into a single decoder using our proposed self-supervised model adaptation fusion mechanism which optimally combines complementary features. As intermediate representations are not aligned across modalities, we introduce an attention scheme for better correlation. In addition, we propose a computationally efficient unimodal segmentation architecture termed AdapNet++ that incorporates a new encoder with multiscale residual units and an efficient atrous spatial pyramid pooling that has a larger effective receptive field with more than 10x fewer parameters, complemented with a strong decoder with a multi-resolution supervision scheme that recovers high-resolution details. Comprehensive empirical evaluations on several benchmarks demonstrate that both our unimodal and multimodal architectures achieve state-of-the-art performance.
Tasks Scene Recognition, Semantic Segmentation
Published 2018-08-11
URL https://arxiv.org/abs/1808.03833v3
PDF https://arxiv.org/pdf/1808.03833v3.pdf
PWC https://paperswithcode.com/paper/self-supervised-model-adaptation-for
Repo https://github.com/DeepSceneSeg/SSMA
Framework tf
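
An illustrative simplification of the adaptive fusion idea: concatenate the two modality-specific feature maps, squeeze them through a bottleneck that predicts sigmoid gates, and reweight the concatenation before fusing it with a 1x1 convolution. The real SSMA block differs in its details.

```python
# Adaptive fusion sketch; a simplification of the idea, not the SSMA block's exact layers.
import torch
import torch.nn as nn

class AdaptiveFusion(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, 2 * channels // reduction, 3, padding=1), nn.ReLU(),
            nn.Conv2d(2 * channels // reduction, 2 * channels, 3, padding=1), nn.Sigmoid())
        self.fuse = nn.Sequential(nn.Conv2d(2 * channels, channels, 1), nn.BatchNorm2d(channels))

    def forward(self, feat_a, feat_b):
        # feat_a, feat_b: (B, C, H, W) encoder features from the two modalities
        both = torch.cat([feat_a, feat_b], dim=1)
        return self.fuse(both * self.gate(both))     # dynamically reweighted fusion
```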