Paper Group AWR 267
GILBO: One Metric to Measure Them All. PDDLStream: Integrating Symbolic Planners and Blackbox Samplers via Optimistic Adaptive Planning. Learn To Pay Attention. Egocentric Vision-based Future Vehicle Localization for Intelligent Driving Assistance Systems. Information-Directed Exploration for Deep Reinforcement Learning. Unsupervised Learning with …
GILBO: One Metric to Measure Them All
Title | GILBO: One Metric to Measure Them All |
Authors | Alexander A. Alemi, Ian Fischer |
Abstract | We propose a simple, tractable lower bound on the mutual information contained in the joint generative density of any latent variable generative model: the GILBO (Generative Information Lower BOund). It offers a data-independent measure of the complexity of the learned latent variable description, giving the log of the effective description length. It is well-defined for both VAEs and GANs. We compute the GILBO for 800 GANs and VAEs each trained on four datasets (MNIST, FashionMNIST, CIFAR-10 and CelebA) and discuss the results. |
Tasks | |
Published | 2018-02-13 |
URL | http://arxiv.org/abs/1802.04874v3 |
http://arxiv.org/pdf/1802.04874v3.pdf | |
PWC | https://paperswithcode.com/paper/gilbo-one-metric-to-measure-them-all |
Repo | https://github.com/google/compare_gan |
Framework | tf |
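To make the metric concrete, below is a minimal sketch of how the GILBO could be estimated for an already-trained generative model. It is not the authors' code: the `generator` is assumed to be a frozen decoder/generator producing 28×28 images from a 64-dimensional latent, and the auxiliary encoder `q(z|x)` is a small Gaussian network trained on generated samples to tighten the bound E[log q(z|x) − log p(z)].

```python
# Hedged sketch of GILBO estimation (illustrative shapes, not the paper's code).
import math
import torch
import torch.nn as nn

latent_dim = 64

class AuxEncoder(nn.Module):
    """Amortized q(z|x): mean and log-variance of a diagonal Gaussian over latents."""
    def __init__(self, latent_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 512), nn.ReLU(),
                                 nn.Linear(512, 2 * latent_dim))
    def forward(self, x):
        mu, logvar = self.net(x).chunk(2, dim=-1)
        return mu, logvar

def gilbo_step(generator, encoder, optimizer, batch=128):
    z = torch.randn(batch, latent_dim)                       # z ~ p(z)
    with torch.no_grad():
        x = generator(z)                                     # x ~ p(x|z); generator stays frozen
    mu, logvar = encoder(x)
    log_q = (-0.5 * (math.log(2 * math.pi) + logvar
                     + (z - mu) ** 2 / logvar.exp())).sum(-1)          # log q(z|x)
    log_p = torch.distributions.Normal(0.0, 1.0).log_prob(z).sum(-1)   # log p(z)
    gilbo = (log_q - log_p).mean()                           # lower bound on I(X; Z), in nats
    (-gilbo).backward()                                      # train only the encoder to tighten it
    optimizer.step()
    optimizer.zero_grad()
    return gilbo.item()

# encoder = AuxEncoder(latent_dim); opt = torch.optim.Adam(encoder.parameters(), 1e-3)
# Repeat gilbo_step(trained_generator, encoder, opt) until the estimate stabilizes.
```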
PDDLStream: Integrating Symbolic Planners and Blackbox Samplers via Optimistic Adaptive Planning
Title | PDDLStream: Integrating Symbolic Planners and Blackbox Samplers via Optimistic Adaptive Planning |
Authors | Caelan Reed Garrett, Tomás Lozano-Pérez, Leslie Pack Kaelbling |
Abstract | Many planning applications involve complex relationships defined on high-dimensional, continuous variables. For example, robotic manipulation requires planning with kinematic, collision, visibility, and motion constraints involving robot configurations, object poses, and robot trajectories. These constraints typically require specialized procedures to sample satisfying values. We extend PDDL to support a generic, declarative specification for these procedures that treats their implementation as black boxes. We provide domain-independent algorithms that reduce PDDLStream problems to a sequence of finite PDDL problems. We also introduce an algorithm that dynamically balances exploring new candidate plans and exploiting existing ones. This enables the algorithm to greedily search the space of parameter bindings to more quickly solve tightly-constrained problems as well as locally optimize to produce low-cost solutions. We evaluate our algorithms on three simulated robotic planning domains as well as several real-world robotic tasks. |
Tasks | Motion Planning |
Published | 2018-02-23 |
URL | https://arxiv.org/abs/1802.08705v5 |
https://arxiv.org/pdf/1802.08705v5.pdf | |
PWC | https://paperswithcode.com/paper/stripstream-integrating-symbolic-planners-and |
Repo | https://github.com/jingxixu/pddlstream |
Framework | none |
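The toy below illustrates the core idea only, with hypothetical names (it is not the pddlstream library API): samplers are declared as generator functions whose outputs certify facts, and planning first proceeds "optimistically" with placeholder outputs that are only sampled once a candidate plan needs them.

```python
# Toy illustration of declarative blackbox streams and optimistic binding.
import itertools
import random

class Optimistic:
    """Placeholder output of a stream, used during optimistic planning."""
    _ids = itertools.count()
    def __init__(self, stream):
        self.stream, self.id = stream, next(self._ids)
    def __repr__(self):
        return f'#{self.stream}{self.id}'

def grasp_stream(obj):
    """Blackbox sampler: yields grasp angles g certifying the fact (Grasp obj g)."""
    while True:
        yield random.uniform(0, 3.14159)

# Optimistic phase: the symbolic planner reasons with placeholders...
g_opt = Optimistic('grasp')
candidate_plan = [('pick', 'block_a', g_opt), ('place', 'block_a')]
# ...and only the bindings required by the candidate plan are actually sampled.
bindings = {g_opt: next(grasp_stream('block_a'))}
grounded_plan = [tuple(bindings.get(a, a) for a in step) for step in candidate_plan]
print(grounded_plan)
```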
Learn To Pay Attention
Title | Learn To Pay Attention |
Authors | Saumya Jetley, Nicholas A. Lord, Namhoon Lee, Philip H. S. Torr |
Abstract | We propose an end-to-end-trainable attention module for convolutional neural network (CNN) architectures built for image classification. The module takes as input the 2D feature vector maps which form the intermediate representations of the input image at different stages in the CNN pipeline, and outputs a 2D matrix of scores for each map. Standard CNN architectures are modified through the incorporation of this module, and trained under the constraint that a convex combination of the intermediate 2D feature vectors, as parameterised by the score matrices, must *alone* be used for classification. Incentivised to amplify the relevant and suppress the irrelevant or misleading, the scores thus assume the role of attention values. Our experimental observations provide clear evidence to this effect: the learned attention maps neatly highlight the regions of interest while suppressing background clutter. Consequently, the proposed function is able to bootstrap standard CNN architectures for the task of image classification, demonstrating superior generalisation over 6 unseen benchmark datasets. When binarised, our attention maps outperform other CNN-based attention maps, traditional saliency maps, and top object proposals for weakly supervised segmentation as demonstrated on the Object Discovery dataset. We also demonstrate improved robustness against the fast gradient sign method of adversarial attack. |
Tasks | Adversarial Attack, Image Classification |
Published | 2018-04-06 |
URL | http://arxiv.org/abs/1804.02391v2 |
http://arxiv.org/pdf/1804.02391v2.pdf | |
PWC | https://paperswithcode.com/paper/learn-to-pay-attention |
Repo | https://github.com/caoquanjie/LearnToPayAttention |
Framework | tf |
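The following sketch illustrates the attention constraint described above using dot-product compatibility scores; the layer sizes and the exact compatibility function are assumptions (the paper also considers an additive form), and the output `g_att` would replace the usual global feature as the sole input to the classifier.

```python
# Minimal sketch of the attention idea (assumed shapes, not the authors' exact model).
import torch
import torch.nn as nn
import torch.nn.functional as F

class DotAttention(nn.Module):
    def __init__(self, local_channels, global_dim):
        super().__init__()
        self.project = nn.Linear(global_dim, local_channels)  # align dimensions

    def forward(self, local_feat, global_feat):
        # local_feat: (B, C, H, W) intermediate feature maps; global_feat: (B, G)
        B, C, H, W = local_feat.shape
        l = local_feat.view(B, C, H * W)                      # (B, C, N)
        g = self.project(global_feat).unsqueeze(2)            # (B, C, 1)
        compat = (l * g).sum(dim=1)                           # (B, N) dot-product scores
        scores = F.softmax(compat, dim=1)                     # attention map over locations
        g_att = torch.bmm(l, scores.unsqueeze(2)).squeeze(2)  # (B, C) convex combination
        return g_att, scores.view(B, H, W)

attn = DotAttention(local_channels=256, global_dim=512)
g_att, attention_map = attn(torch.randn(2, 256, 14, 14), torch.randn(2, 512))
# g_att alone would then be fed to the final classifier, per the constraint above.
```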
Egocentric Vision-based Future Vehicle Localization for Intelligent Driving Assistance Systems
Title | Egocentric Vision-based Future Vehicle Localization for Intelligent Driving Assistance Systems |
Authors | Yu Yao, Mingze Xu, Chiho Choi, David J. Crandall, Ella M. Atkins, Behzad Dariush |
Abstract | Predicting the future location of vehicles is essential for safety-critical applications such as advanced driver assistance systems (ADAS) and autonomous driving. This paper introduces a novel approach to simultaneously predict both the location and scale of target vehicles in the first-person (egocentric) view of an ego-vehicle. We present a multi-stream recurrent neural network (RNN) encoder-decoder model that separately captures both object location and scale and pixel-level observations for future vehicle localization. We show that incorporating dense optical flow improves prediction results significantly since it captures information about motion as well as appearance change. We also find that explicitly modeling future motion of the ego-vehicle improves the prediction accuracy, which could be especially beneficial in intelligent and automated vehicles that have motion planning capability. To evaluate the performance of our approach, we present a new dataset of first-person videos collected from a variety of scenarios at road intersections, which are particularly challenging moments for prediction because vehicle trajectories are diverse and dynamic. |
Tasks | Autonomous Driving, Motion Planning, Optical Flow Estimation |
Published | 2018-09-19 |
URL | http://arxiv.org/abs/1809.07408v2 |
http://arxiv.org/pdf/1809.07408v2.pdf | |
PWC | https://paperswithcode.com/paper/egocentric-vision-based-future-vehicle |
Repo | https://github.com/MoonBlvd/tad-IROS2019 |
Framework | pytorch |
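A hedged sketch of a two-stream RNN encoder-decoder of the kind described above: one GRU encodes past bounding boxes, another encodes pooled optical-flow features, and a GRU decoder rolls out future box offsets. The stream contents, dimensions, and fusion by concatenation are illustrative assumptions, not the paper's exact architecture.

```python
# Illustrative multi-stream encoder-decoder for future bounding-box prediction.
import torch
import torch.nn as nn

class FutureBoxPredictor(nn.Module):
    def __init__(self, hidden=128, flow_dim=50, horizon=10):
        super().__init__()
        self.box_enc = nn.GRU(4, hidden, batch_first=True)          # past boxes (x, y, w, h)
        self.flow_enc = nn.GRU(flow_dim, hidden, batch_first=True)  # pooled flow features
        self.decoder = nn.GRUCell(4, 2 * hidden)
        self.head = nn.Linear(2 * hidden, 4)                        # per-step box offsets
        self.horizon = horizon

    def forward(self, past_boxes, past_flow):
        _, h_box = self.box_enc(past_boxes)                  # (1, B, hidden)
        _, h_flow = self.flow_enc(past_flow)
        h = torch.cat([h_box[0], h_flow[0]], dim=-1)         # fuse the two streams
        box, outputs = past_boxes[:, -1], []
        for _ in range(self.horizon):                        # autoregressive decoding
            h = self.decoder(box, h)
            box = box + self.head(h)                         # predict offset from previous box
            outputs.append(box)
        return torch.stack(outputs, dim=1)                   # (B, horizon, 4)

pred = FutureBoxPredictor()(torch.randn(2, 8, 4), torch.randn(2, 8, 50))
```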
Information-Directed Exploration for Deep Reinforcement Learning
Title | Information-Directed Exploration for Deep Reinforcement Learning |
Authors | Nikolay Nikolov, Johannes Kirschner, Felix Berkenkamp, Andreas Krause |
Abstract | Efficient exploration remains a major challenge for reinforcement learning. One reason is that the variability of the returns often depends on the current state and action, and is therefore heteroscedastic. Classical exploration strategies such as upper confidence bound algorithms and Thompson sampling fail to appropriately account for heteroscedasticity, even in the bandit setting. Motivated by recent findings that address this issue in bandits, we propose to use Information-Directed Sampling (IDS) for exploration in reinforcement learning. As our main contribution, we build on recent advances in distributional reinforcement learning and propose a novel, tractable approximation of IDS for deep Q-learning. The resulting exploration strategy explicitly accounts for both parametric uncertainty and heteroscedastic observation noise. We evaluate our method on Atari games and demonstrate a significant improvement over alternative approaches. |
Tasks | Atari Games, Distributional Reinforcement Learning, Efficient Exploration, Q-Learning |
Published | 2018-12-18 |
URL | http://arxiv.org/abs/1812.07544v2 |
http://arxiv.org/pdf/1812.07544v2.pdf | |
PWC | https://paperswithcode.com/paper/information-directed-exploration-for-deep |
Repo | https://github.com/nikonikolov/rltf |
Framework | tf |
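The sketch below shows an information-directed action choice for a single state, assuming per-action parametric uncertainty (e.g., from bootstrapped Q-heads) and a heteroscedastic return-noise estimate (e.g., from a distributional head). The specific regret and information-gain surrogates used here are plausible stand-ins and may differ from the paper's exact approximation.

```python
# Sketch of an information-directed action choice for one state.
import numpy as np

def ids_action(q_mean, q_std, return_std, lam=1.0, eps=1e-8):
    """q_mean, q_std: parametric mean and uncertainty per action.
    return_std: heteroscedastic return noise per action."""
    upper = q_mean + lam * q_std
    lower = q_mean - lam * q_std
    regret = np.max(upper) - lower                        # pessimistic regret estimate
    info_gain = np.log1p(q_std ** 2 / (return_std ** 2 + eps)) + eps
    ratio = regret ** 2 / info_gain                       # information ratio
    return int(np.argmin(ratio))                          # pick the action minimizing it

a = ids_action(q_mean=np.array([1.0, 1.2, 0.8]),
               q_std=np.array([0.05, 0.30, 0.10]),
               return_std=np.array([0.5, 0.5, 2.0]))
```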
Unsupervised Learning with Stein’s Unbiased Risk Estimator
Title | Unsupervised Learning with Stein’s Unbiased Risk Estimator |
Authors | Christopher A. Metzler, Ali Mousavi, Reinhard Heckel, Richard G. Baraniuk |
Abstract | Learning from unlabeled and noisy data is one of the grand challenges of machine learning. As such, it has seen a flurry of research with new ideas proposed continuously. In this work, we revisit a classical idea: Stein’s Unbiased Risk Estimator (SURE). We show that, in the context of image recovery, SURE and its generalizations can be used to train convolutional neural networks (CNNs) for a range of image denoising and recovery problems without any ground truth data. Specifically, our goal is to reconstruct an image $x$ from a noisy linear transformation (measurement) of the image. We consider two scenarios: one where no additional data is available and one where we have measurements of other images that are drawn from the same noisy distribution as $x$, but have no access to the clean images. Such is the case, for instance, in the context of medical imaging, microscopy, and astronomy, where noise-less ground truth data is rarely available. We show that in this situation, SURE can be used to estimate the mean-squared-error loss associated with an estimate of $x$. Using this estimate of the loss, we train networks to perform denoising and compressed sensing recovery. In addition, we use the SURE framework to partially explain and improve upon an intriguing result presented by Ulyanov et al. in “Deep Image Prior”: that a network initialized with random weights and fit to a single noisy image can effectively denoise that image. Public implementations of the networks and methods described in this paper can be found at https://github.com/ricedsp/D-AMP_Toolbox. |
Tasks | Denoising, Image Denoising |
Published | 2018-05-26 |
URL | https://arxiv.org/abs/1805.10531v2 |
https://arxiv.org/pdf/1805.10531v2.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-learning-with-steins-unbiased |
Repo | https://github.com/ricedsp/D-AMP_Toolbox |
Framework | tf |
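The central quantity is the SURE loss itself, which replaces the ground-truth MSE. Below is a sketch of the Monte Carlo SURE estimate for Gaussian denoising with known noise level; the generalized variants for other measurement operators (e.g., compressed sensing) are not shown.

```python
# Monte Carlo SURE loss for Gaussian noise of known std `sigma` (illustrative sketch).
import torch

def mc_sure_loss(denoiser, y, sigma, eps=1e-3):
    """y: noisy images (B, C, H, W). Returns an unbiased estimate of the MSE against the
    unknown clean image, up to a constant, without any ground truth."""
    n = y[0].numel()
    fy = denoiser(y)
    # Divergence term estimated with a single random probe b ~ N(0, I)
    b = torch.randn_like(y)
    div = (b * (denoiser(y + eps * b) - fy)).flatten(1).sum(1) / eps
    data_term = ((fy - y) ** 2).flatten(1).sum(1)
    sure = data_term / n - sigma ** 2 + (2 * sigma ** 2 / n) * div
    return sure.mean()

# Training would minimize mc_sure_loss(net, noisy_batch, sigma) in place of the usual MSE.
```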
Noise2Void - Learning Denoising from Single Noisy Images
Title | Noise2Void - Learning Denoising from Single Noisy Images |
Authors | Alexander Krull, Tim-Oliver Buchholz, Florian Jug |
Abstract | The field of image denoising is currently dominated by discriminative deep learning methods that are trained on pairs of noisy input and clean target images. Recently it has been shown that such methods can also be trained without clean targets. Instead, independent pairs of noisy images can be used, in an approach known as Noise2Noise (N2N). Here, we introduce Noise2Void (N2V), a training scheme that takes this idea one step further. It does not require noisy image pairs, nor clean target images. Consequently, N2V allows us to train directly on the body of data to be denoised and can therefore be applied when other methods cannot. Especially interesting is the application to biomedical image data, where the acquisition of training targets, clean or noisy, is frequently not possible. We compare the performance of N2V to approaches that have either clean target images and/or noisy image pairs available. Intuitively, N2V cannot be expected to outperform methods that have more information available during training. Still, we observe that the denoising performance of Noise2Void drops in moderation and compares favorably to training-free denoising methods. |
Tasks | Denoising, Image Denoising |
Published | 2018-11-27 |
URL | http://arxiv.org/abs/1811.10980v2 |
http://arxiv.org/pdf/1811.10980v2.pdf | |
PWC | https://paperswithcode.com/paper/noise2void-learning-denoising-from-single |
Repo | https://github.com/juglab/n2v |
Framework | tf |
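The heart of N2V is a blind-spot masking step: a handful of pixels per patch are replaced (here by a random neighbor's value) and the network is trained to predict their original noisy values, with the loss evaluated only at those positions. The mask density and replacement rule below are illustrative choices.

```python
# Sketch of Noise2Void-style blind-spot masking for a single 2D grayscale patch.
import numpy as np

def n2v_mask(patch, num_pix=64, radius=2, rng=np.random.default_rng()):
    """Returns (masked_patch, coords, targets): the network sees masked_patch and is
    trained so its output at `coords` matches `targets` (the original noisy values)."""
    h, w = patch.shape
    ys = rng.integers(radius, h - radius, num_pix)
    xs = rng.integers(radius, w - radius, num_pix)
    masked = patch.copy()
    targets = patch[ys, xs].copy()
    for y, x in zip(ys, xs):
        dy, dx = rng.integers(-radius, radius + 1, 2)   # pick a random neighbor
        masked[y, x] = patch[y + dy, x + dx]            # replace the blind-spot pixel
        # (The original scheme excludes the centre pixel itself as a replacement source.)
    return masked, (ys, xs), targets

patch = np.random.rand(64, 64).astype(np.float32)
masked, (ys, xs), targets = n2v_mask(patch)
# Loss would be mean((net(masked)[ys, xs] - targets) ** 2), i.e. only at masked pixels.
```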
OCNet: Object Context Network for Scene Parsing
Title | OCNet: Object Context Network for Scene Parsing |
Authors | Yuhui Yuan, Jingdong Wang |
Abstract | In this paper, we address the problem of scene parsing with deep learning and focus on the context aggregation strategy for robust segmentation. Motivated by the fact that the label of a pixel is the category of the object that the pixel belongs to, we introduce an *object context pooling (OCP)* scheme, which represents each pixel by exploiting the set of pixels that belong to the same object category as that pixel; we call this set of pixels the object context. Our implementation, inspired by the self-attention approach, consists of two steps: (i) compute the similarities between each pixel and all the pixels, forming a so-called object context map for each pixel, which serves as a surrogate for the true object context, and (ii) represent the pixel by aggregating the features of all the pixels weighted by the similarities. The resulting representation is more robust compared to existing context aggregation schemes, e.g., pyramid pooling modules (PPM) in PSPNet and atrous spatial pyramid pooling (ASPP), which do not differentiate whether the context pixels belong to the same object category or not, limiting the reliability of the contextually aggregated representations. We empirically demonstrate our approach and two pyramid extensions with state-of-the-art performance on three semantic segmentation benchmarks: Cityscapes, ADE20K and LIP. Code has been made available at: https://github.com/PkuRainBow/OCNet. |
Tasks | Scene Parsing, Semantic Segmentation |
Published | 2018-09-04 |
URL | http://arxiv.org/abs/1809.00916v3 |
http://arxiv.org/pdf/1809.00916v3.pdf | |
PWC | https://paperswithcode.com/paper/ocnet-object-context-network-for-scene |
Repo | https://github.com/PkuRainBow/OCNet |
Framework | pytorch |
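Steps (i) and (ii) of the abstract amount to a self-attention block over all spatial positions. The sketch below uses 1×1 convolutions for query/key/value with assumed channel reductions and concatenates the aggregated context with the input features; it is a simplified stand-in for the paper's OCP module.

```python
# Self-attention-style object context pooling sketch (assumed channel reductions).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ObjectContextPooling(nn.Module):
    def __init__(self, channels, key_channels=None):
        super().__init__()
        key_channels = key_channels or channels // 2
        self.query = nn.Conv2d(channels, key_channels, 1)
        self.key = nn.Conv2d(channels, key_channels, 1)
        self.value = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        B, C, H, W = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)     # (B, N, Ck)
        k = self.key(x).flatten(2)                       # (B, Ck, N)
        v = self.value(x).flatten(2).transpose(1, 2)     # (B, N, C)
        # Step (i): similarity of each pixel to all pixels -> object context map
        context_map = F.softmax(q @ k / (k.shape[1] ** 0.5), dim=-1)   # (B, N, N)
        # Step (ii): represent each pixel by the similarity-weighted aggregation
        out = (context_map @ v).transpose(1, 2).view(B, C, H, W)
        return torch.cat([x, out], dim=1)                # concatenate with input features

ocp = ObjectContextPooling(256)
y = ocp(torch.randn(1, 256, 32, 32))                     # (1, 512, 32, 32)
```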
MolGAN: An implicit generative model for small molecular graphs
Title | MolGAN: An implicit generative model for small molecular graphs |
Authors | Nicola De Cao, Thomas Kipf |
Abstract | Deep generative models for graph-structured data offer a new angle on the problem of chemical synthesis: by optimizing differentiable models that directly generate molecular graphs, it is possible to side-step expensive search procedures in the discrete and vast space of chemical structures. We introduce MolGAN, an implicit, likelihood-free generative model for small molecular graphs that circumvents the need for expensive graph matching procedures or node ordering heuristics of previous likelihood-based methods. Our method adapts generative adversarial networks (GANs) to operate directly on graph-structured data. We combine our approach with a reinforcement learning objective to encourage the generation of molecules with specific desired chemical properties. In experiments on the QM9 chemical database, we demonstrate that our model is capable of generating close to 100% valid compounds. MolGAN compares favorably both to recent proposals that use string-based (SMILES) representations of molecules and to a likelihood-based method that directly generates graphs, albeit being susceptible to mode collapse. |
Tasks | Graph Matching |
Published | 2018-05-30 |
URL | http://arxiv.org/abs/1805.11973v1 |
http://arxiv.org/pdf/1805.11973v1.pdf | |
PWC | https://paperswithcode.com/paper/molgan-an-implicit-generative-model-for-small |
Repo | https://github.com/nicola-decao/MolGAN |
Framework | tf |
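A sketch of a likelihood-free graph generator in the spirit of MolGAN: noise is mapped directly to a dense node-type matrix and a symmetric bond-type tensor, discretized with a straight-through Gumbel-softmax. Sizes are illustrative (QM9-like), and the R-GCN discriminator and RL reward term are omitted.

```python
# Illustrative graph generator producing atom types X and bond types A from noise.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphGenerator(nn.Module):
    def __init__(self, z_dim=32, n_nodes=9, n_atom_types=5, n_bond_types=4):
        super().__init__()
        self.n, self.ta, self.tb = n_nodes, n_atom_types, n_bond_types
        self.mlp = nn.Sequential(nn.Linear(z_dim, 128), nn.Tanh(),
                                 nn.Linear(128, 256), nn.Tanh())
        self.node_head = nn.Linear(256, n_nodes * n_atom_types)
        self.edge_head = nn.Linear(256, n_nodes * n_nodes * n_bond_types)

    def forward(self, z, tau=1.0):
        h = self.mlp(z)
        node_logits = self.node_head(h).view(-1, self.n, self.ta)
        edge_logits = self.edge_head(h).view(-1, self.n, self.n, self.tb)
        edge_logits = (edge_logits + edge_logits.transpose(1, 2)) / 2   # symmetric bonds
        # Differentiable discretization via straight-through Gumbel-softmax
        X = F.gumbel_softmax(node_logits, tau=tau, hard=True)           # atom types
        A = F.gumbel_softmax(edge_logits, tau=tau, hard=True)           # bond types
        return X, A

X, A = GraphGenerator()(torch.randn(4, 32))   # X: (4, 9, 5), A: (4, 9, 9, 4)
```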
Forward Modeling for Partial Observation Strategy Games - A StarCraft Defogger
Title | Forward Modeling for Partial Observation Strategy Games - A StarCraft Defogger |
Authors | Gabriel Synnaeve, Zeming Lin, Jonas Gehring, Dan Gant, Vegard Mella, Vasil Khalidov, Nicolas Carion, Nicolas Usunier |
Abstract | We formulate the problem of defogging as state estimation and future state prediction from previous, partial observations in the context of real-time strategy games. We propose to employ encoder-decoder neural networks for this task, and introduce proxy tasks and baselines for evaluation to assess their ability to capture basic game rules and high-level dynamics. By combining convolutional neural networks and recurrent networks, we exploit spatial and sequential correlations and train well-performing models on a large dataset of human games of StarCraft: Brood War. Finally, we demonstrate the relevance of our models to downstream tasks by applying them for enemy unit prediction in a state-of-the-art, rule-based StarCraft bot. We observe improvements in win rates against several strong community bots. |
Tasks | Real-Time Strategy Games, Starcraft |
Published | 2018-11-30 |
URL | http://arxiv.org/abs/1812.00054v1 |
http://arxiv.org/pdf/1812.00054v1.pdf | |
PWC | https://paperswithcode.com/paper/forward-modeling-for-partial-observation |
Repo | https://github.com/facebookresearch/starcraft_defogger |
Framework | pytorch |
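As a rough illustration of the convolutional-plus-recurrent forward model described above, the sketch below encodes a sequence of partially observed unit-count grids with a small CNN, aggregates them with an LSTM, and decodes a prediction of the full grid. All sizes and the decoding head are assumptions.

```python
# Generic conv + recurrent forward model over partially observed unit-count grids.
import torch
import torch.nn as nn

class Defogger(nn.Module):
    def __init__(self, in_ch=32, hidden=256, grid=32):
        super().__init__()
        self.grid = grid
        self.encoder = nn.Sequential(nn.Conv2d(in_ch, 64, 3, stride=2, padding=1), nn.ReLU(),
                                     nn.Conv2d(64, 64, 3, stride=2, padding=1), nn.ReLU())
        self.rnn = nn.LSTM(64 * (grid // 4) ** 2, hidden, batch_first=True)
        self.decoder = nn.Linear(hidden, in_ch * grid * grid)   # predicted full state

    def forward(self, frames):                 # frames: (B, T, in_ch, grid, grid)
        B, T = frames.shape[:2]
        feats = self.encoder(frames.flatten(0, 1)).view(B, T, -1)
        out, _ = self.rnn(feats)
        return self.decoder(out[:, -1]).view(B, -1, self.grid, self.grid)

pred = Defogger()(torch.randn(2, 5, 32, 32, 32))   # per-cell unit-count prediction
```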
LPD-Net: 3D Point Cloud Learning for Large-Scale Place Recognition and Environment Analysis
Title | LPD-Net: 3D Point Cloud Learning for Large-Scale Place Recognition and Environment Analysis |
Authors | Zhe Liu, Shunbo Zhou, Chuanzhe Suo, Yingtian Liu, Peng Yin, Hesheng Wang, Yun-Hui Liu |
Abstract | Point cloud based place recognition is still an open issue due to the difficulty in extracting local features from the raw 3D point cloud and generating the global descriptor, and it is even harder in large-scale dynamic environments. In this paper, we develop a novel deep neural network, named LPD-Net (Large-scale Place Description Network), which can extract discriminative and generalizable global descriptors from the raw 3D point cloud. Two modules, the adaptive local feature extraction module and the graph-based neighborhood aggregation module, are proposed, which help extract local structures and reveal the spatial distribution of local features in the large-scale point cloud, in an end-to-end manner. We apply the proposed global descriptor to point cloud based retrieval tasks to achieve large-scale place recognition. Comparison results show that our LPD-Net substantially outperforms PointNetVLAD and reaches the state of the art. We also compare our LPD-Net with vision-based solutions to show the robustness of our approach to different weather and light conditions. |
Tasks | |
Published | 2018-12-11 |
URL | https://arxiv.org/abs/1812.07050v2 |
https://arxiv.org/pdf/1812.07050v2.pdf | |
PWC | https://paperswithcode.com/paper/181207050 |
Repo | https://github.com/Suoivy/LPD-net |
Framework | tf |
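A small sketch of the graph-based neighborhood aggregation idea: build a k-NN graph over per-point local features and max-pool each point's neighborhood. The value of k, the distance space, and the pooling rule are illustrative choices; the paper's full pipeline additionally produces a global descriptor (e.g., NetVLAD-style pooling) for retrieval.

```python
# Illustrative k-NN neighborhood aggregation over per-point features.
import numpy as np

def knn_aggregate(features, k=16):
    """features: (N, D) per-point local features. Returns (N, D) aggregated features."""
    sq = (features ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * features @ features.T   # pairwise distances
    np.fill_diagonal(d2, np.inf)                      # exclude the point itself
    nbrs = np.argsort(d2, axis=1)[:, :k]              # k nearest neighbors per point
    return features[nbrs].max(axis=1)                 # max-pool each neighborhood

local = np.random.rand(2048, 64).astype(np.float32)
aggregated = knn_aggregate(local)
# A global descriptor for retrieval could then be produced by pooling the aggregated
# per-point features, e.g. with a NetVLAD-style layer as in the paper's pipeline.
```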
A Memory-Network Based Solution for Multivariate Time-Series Forecasting
Title | A Memory-Network Based Solution for Multivariate Time-Series Forecasting |
Authors | Yen-Yu Chang, Fan-Yun Sun, Yueh-Hua Wu, Shou-De Lin |
Abstract | Multivariate time series forecasting has been extensively studied throughout the years, with ubiquitous applications in areas such as finance, traffic, environment, etc. Still, concerns have been raised that traditional methods are incapable of modeling complex patterns or dependencies present in real-world data. To address such concerns, various deep learning models, mainly Recurrent Neural Network (RNN) based methods, have been proposed. Nevertheless, capturing extremely long-term patterns while effectively incorporating information from other variables remains a challenge for time-series forecasting. Furthermore, lack of explainability remains a serious drawback of deep neural network models. Inspired by the memory network proposed for solving the question-answering task, we propose a deep learning based model named Memory Time-series network (MTNet) for time series forecasting. MTNet consists of a large memory component, three separate encoders, and an autoregressive component that are trained jointly. Additionally, the designed attention mechanism makes MTNet highly interpretable: we can easily tell which part of the historic data is referenced the most. |
Tasks | Multivariate Time Series Forecasting, Question Answering, Time Series, Time Series Forecasting |
Published | 2018-09-06 |
URL | http://arxiv.org/abs/1809.02105v1 |
http://arxiv.org/pdf/1809.02105v1.pdf | |
PWC | https://paperswithcode.com/paper/a-memory-network-based-solution-for |
Repo | https://github.com/Maple728/MTNet |
Framework | tf |
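The sketch below captures the memory-attention idea at a high level: past chunks are encoded into memory slots, the most recent window forms a query that attends over them, and a linear autoregressive term is added to the nonlinear prediction. Encoders are collapsed to single GRUs and all sizes are illustrative, so this is a simplification of MTNet's three-encoder design.

```python
# Simplified memory-attention forecaster (illustrative sizes, not MTNet itself).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MemoryForecaster(nn.Module):
    def __init__(self, n_series=8, hidden=64, ar_window=16):
        super().__init__()
        self.mem_enc = nn.GRU(n_series, hidden, batch_first=True)
        self.query_enc = nn.GRU(n_series, hidden, batch_first=True)
        self.out = nn.Linear(2 * hidden, n_series)
        self.ar = nn.Linear(ar_window, 1)              # per-series autoregressive part
        self.ar_window = ar_window

    def forward(self, memory_chunks, recent):
        # memory_chunks: (B, M, L, n_series) past chunks; recent: (B, L, n_series)
        B, M, L, D = memory_chunks.shape
        _, mem = self.mem_enc(memory_chunks.flatten(0, 1))               # (1, B*M, H)
        mem = mem[0].view(B, M, -1)                                      # memory slots
        _, q = self.query_enc(recent)                                    # (1, B, H)
        attn = F.softmax((mem @ q[0].unsqueeze(2)).squeeze(2), dim=1)    # (B, M) weights
        read = (attn.unsqueeze(2) * mem).sum(1)                          # attended memory
        nonlinear = self.out(torch.cat([read, q[0]], dim=-1))            # (B, n_series)
        linear = self.ar(recent[:, -self.ar_window:].transpose(1, 2)).squeeze(2)
        return nonlinear + linear

yhat = MemoryForecaster()(torch.randn(2, 7, 24, 8), torch.randn(2, 24, 8))
```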
CCNet: Criss-Cross Attention for Semantic Segmentation
Title | CCNet: Criss-Cross Attention for Semantic Segmentation |
Authors | Zilong Huang, Xinggang Wang, Lichao Huang, Chang Huang, Yunchao Wei, Wenyu Liu |
Abstract | Long-range dependencies can capture useful contextual information to benefit visual understanding problems. In this work, we propose a Criss-Cross Network (CCNet) for obtaining such important information in a more effective and efficient way. Concretely, for each pixel, our CCNet can harvest the contextual information of its surrounding pixels on the criss-cross path through a novel criss-cross attention module. By taking a further recurrent operation, each pixel can finally capture the long-range dependencies from all pixels. Overall, our CCNet has the following merits: 1) GPU memory friendly. Compared with the non-local block, the recurrent criss-cross attention module requires $11\times$ less GPU memory usage. 2) High computational efficiency. The recurrent criss-cross attention reduces the FLOPs of the non-local block by about 85% in computing long-range dependencies. 3) State-of-the-art performance. We conduct extensive experiments on popular semantic segmentation benchmarks including Cityscapes and ADE20K, and the instance segmentation benchmark COCO. In particular, our CCNet achieves mIoU scores of 81.4 and 45.22 on the Cityscapes test set and the ADE20K validation set, respectively, which are new state-of-the-art results. We make the code publicly available at https://github.com/speedinghzl/CCNet. |
Tasks | Instance Segmentation, Semantic Segmentation |
Published | 2018-11-28 |
URL | http://arxiv.org/abs/1811.11721v1 |
http://arxiv.org/pdf/1811.11721v1.pdf | |
PWC | https://paperswithcode.com/paper/ccnet-criss-cross-attention-for-semantic |
Repo | https://github.com/speedinghzl/CCNet |
Framework | pytorch |
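A compact, unoptimized sketch of criss-cross attention: each pixel attends only to the positions in its own row and column, and applying the module twice (the recurrent operation) lets information propagate from every position. Channel reductions are illustrative, and unlike the official implementation this version does not mask the doubly counted centre pixel.

```python
# Naive criss-cross attention sketch (unoptimized, illustrative reductions).
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrissCrossAttention(nn.Module):
    def __init__(self, channels, reduced=None):
        super().__init__()
        reduced = reduced or channels // 8
        self.q = nn.Conv2d(channels, reduced, 1)
        self.k = nn.Conv2d(channels, reduced, 1)
        self.v = nn.Conv2d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        B, C, H, W = x.shape
        q, k, v = self.q(x), self.k(x), self.v(x)
        # Energies between each pixel (h, w) and all pixels in its row / its column
        e_row = torch.einsum('bchw,bchj->bhwj', q, k)        # (B, H, W, W)
        e_col = torch.einsum('bchw,bciw->bhwi', q, k)        # (B, H, W, H)
        a = F.softmax(torch.cat([e_row, e_col], dim=-1), dim=-1)
        a_row, a_col = a[..., :W], a[..., W:]
        out = (torch.einsum('bhwj,bchj->bchw', a_row, v)
               + torch.einsum('bhwi,bciw->bchw', a_col, v))
        return self.gamma * out + x

cca = CrissCrossAttention(64)
y = cca(cca(torch.randn(1, 64, 17, 23)))   # two passes give full-image context
```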
Accurate Uncertainties for Deep Learning Using Calibrated Regression
Title | Accurate Uncertainties for Deep Learning Using Calibrated Regression |
Authors | Volodymyr Kuleshov, Nathan Fenner, Stefano Ermon |
Abstract | Methods for reasoning under uncertainty are a key building block of accurate and reliable machine learning systems. Bayesian methods provide a general framework to quantify uncertainty. However, because of model misspecification and the use of approximate inference, Bayesian uncertainty estimates are often inaccurate – for example, a 90% credible interval may not contain the true outcome 90% of the time. Here, we propose a simple procedure for calibrating any regression algorithm; when applied to Bayesian and probabilistic models, it is guaranteed to produce calibrated uncertainty estimates given enough data. Our procedure is inspired by Platt scaling and extends previous work on classification. We evaluate this approach on Bayesian linear regression, feedforward, and recurrent neural networks, and find that it consistently outputs well-calibrated credible intervals while improving performance on time series forecasting and model-based reinforcement learning tasks. |
Tasks | Time Series, Time Series Forecasting |
Published | 2018-07-01 |
URL | http://arxiv.org/abs/1807.00263v1 |
http://arxiv.org/pdf/1807.00263v1.pdf | |
PWC | https://paperswithcode.com/paper/accurate-uncertainties-for-deep-learning |
Repo | https://github.com/ulissigroup/uncertainty_benchmarking |
Framework | pytorch |
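The recalibration procedure itself is simple: on a held-out calibration set, map each predicted CDF value F_t(y_t) to the empirical frequency with which such values occur, and fit an isotonic regression R so that R∘F is calibrated. The sketch below assumes a forecaster that outputs a Gaussian mean and standard deviation per point.

```python
# Sketch of post-hoc recalibration of a probabilistic regressor via isotonic regression.
import numpy as np
from scipy.stats import norm
from sklearn.isotonic import IsotonicRegression

def fit_recalibrator(mu, sigma, y):
    """mu, sigma: predicted Gaussian parameters on a calibration set; y: true outcomes."""
    p = norm.cdf(y, loc=mu, scale=sigma)                 # p_t = F_t(y_t)
    emp = np.array([(p <= pt).mean() for pt in p])       # empirical frequency of each level
    return IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds='clip').fit(p, emp)

mu, sigma = np.zeros(500), np.ones(500)
y = np.random.randn(500) * 2.0                           # forecaster is over-confident here
recal = fit_recalibrator(mu, sigma, y)
# Calibrated probabilities are R(F(y)); an interval at level q uses the p with R(p) = q.
print(recal.predict([0.05, 0.5, 0.95]))
```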
Self-Supervised Model Adaptation for Multimodal Semantic Segmentation
Title | Self-Supervised Model Adaptation for Multimodal Semantic Segmentation |
Authors | Abhinav Valada, Rohit Mohan, Wolfram Burgard |
Abstract | Learning to reliably perceive and understand the scene is an integral enabler for robots to operate in the real world. This problem is inherently challenging due to the multitude of object types as well as appearance changes caused by varying illumination and weather conditions. Leveraging complementary modalities can enable learning of semantically richer representations that are resilient to such perturbations. Despite the tremendous progress in recent years, most multimodal convolutional neural network approaches directly concatenate feature maps from individual modality streams, rendering the model incapable of focusing only on relevant complementary information for fusion. To address this limitation, we propose a multimodal semantic segmentation framework that dynamically adapts the fusion of modality-specific features while being sensitive to the object category, spatial location and scene context in a self-supervised manner. Specifically, we propose an architecture consisting of two modality-specific encoder streams that fuse intermediate encoder representations into a single decoder using our proposed self-supervised model adaptation fusion mechanism which optimally combines complementary features. As intermediate representations are not aligned across modalities, we introduce an attention scheme for better correlation. In addition, we propose a computationally efficient unimodal segmentation architecture termed AdapNet++ that incorporates a new encoder with multiscale residual units and an efficient atrous spatial pyramid pooling that has a larger effective receptive field with more than 10x fewer parameters, complemented with a strong decoder with a multi-resolution supervision scheme that recovers high-resolution details. Comprehensive empirical evaluations on several benchmarks demonstrate that both our unimodal and multimodal architectures achieve state-of-the-art performance. |
Tasks | Scene Recognition, Semantic Segmentation |
Published | 2018-08-11 |
URL | https://arxiv.org/abs/1808.03833v3 |
https://arxiv.org/pdf/1808.03833v3.pdf | |
PWC | https://paperswithcode.com/paper/self-supervised-model-adaptation-for |
Repo | https://github.com/DeepSceneSeg/SSMA |
Framework | tf |
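A hedged sketch of a dynamically adaptive fusion gate in the spirit of the proposed mechanism: the two modality-specific feature maps are concatenated, squeezed through a bottleneck, and re-weighted by sigmoid gates before a 1×1 fusing convolution. Channel counts and the reduction ratio are illustrative assumptions.

```python
# Illustrative data-dependent fusion gate for two modality streams.
import torch
import torch.nn as nn

class FusionGate(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        c = 2 * channels
        self.gate = nn.Sequential(nn.Conv2d(c, c // reduction, 3, padding=1), nn.ReLU(),
                                  nn.Conv2d(c // reduction, c, 3, padding=1), nn.Sigmoid())
        self.fuse = nn.Sequential(nn.Conv2d(c, channels, 1), nn.BatchNorm2d(channels))

    def forward(self, feat_rgb, feat_depth):
        cat = torch.cat([feat_rgb, feat_depth], dim=1)
        return self.fuse(cat * self.gate(cat))        # dynamically re-weighted fusion

fused = FusionGate(256)(torch.randn(1, 256, 48, 96), torch.randn(1, 256, 48, 96))
```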