Paper Group AWR 267
GILBO: One Metric to Measure Them All. PDDLStream: Integrating Symbolic Planners and Blackbox Samplers via Optimistic Adaptive Planning. Learn To Pay Attention. Egocentric Vision-based Future Vehicle Localization for Intelligent Driving Assistance Systems. Information-Directed Exploration for Deep Reinforcement Learning. Unsupervised Learning with …
GILBO: One Metric to Measure Them All
Title | GILBO: One Metric to Measure Them All |
Authors | Alexander A. Alemi, Ian Fischer |
Abstract | We propose a simple, tractable lower bound on the mutual information contained in the joint generative density of any latent variable generative model: the GILBO (Generative Information Lower BOund). It offers a data-independent measure of the complexity of the learned latent variable description, giving the log of the effective description length. It is well-defined for both VAEs and GANs. We compute the GILBO for 800 GANs and VAEs each trained on four datasets (MNIST, FashionMNIST, CIFAR-10 and CelebA) and discuss the results. |
Tasks | |
Published | 2018-02-13 |
URL | http://arxiv.org/abs/1802.04874v3 |
http://arxiv.org/pdf/1802.04874v3.pdf | |
PWC | https://paperswithcode.com/paper/gilbo-one-metric-to-measure-them-all |
Repo | https://github.com/google/compare_gan |
Framework | tf |
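To make the metric concrete, below is a minimal sketch of how the GILBO could be estimated for an already-trained generative model. It is not the authors' code: the `generator` is assumed to be a frozen decoder/generator producing 28×28 images from a 64-dimensional latent, and the auxiliary encoder `q(z|x)` is a small Gaussian network trained on generated samples to tighten the bound E[log q(z|x) − log p(z)].

```python
# Hedged sketch of GILBO estimation (illustrative shapes, not the paper's code).
import math
import torch
import torch.nn as nn

latent_dim = 64

class AuxEncoder(nn.Module):
    """Amortized q(z|x): mean and log-variance of a diagonal Gaussian over latents."""
    def __init__(self, latent_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 512), nn.ReLU(),
                                 nn.Linear(512, 2 * latent_dim))
    def forward(self, x):
        mu, logvar = self.net(x).chunk(2, dim=-1)
        return mu, logvar

def gilbo_step(generator, encoder, optimizer, batch=128):
    z = torch.randn(batch, latent_dim)                       # z ~ p(z)
    with torch.no_grad():
        x = generator(z)                                     # x ~ p(x|z); generator stays frozen
    mu, logvar = encoder(x)
    log_q = (-0.5 * (math.log(2 * math.pi) + logvar
                     + (z - mu) ** 2 / logvar.exp())).sum(-1)          # log q(z|x)
    log_p = torch.distributions.Normal(0.0, 1.0).log_prob(z).sum(-1)   # log p(z)
    gilbo = (log_q - log_p).mean()                           # lower bound on I(X; Z), in nats
    (-gilbo).backward()                                      # train only the encoder to tighten it
    optimizer.step()
    optimizer.zero_grad()
    return gilbo.item()

# encoder = AuxEncoder(latent_dim); opt = torch.optim.Adam(encoder.parameters(), 1e-3)
# Repeat gilbo_step(trained_generator, encoder, opt) until the estimate stabilizes.
```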
PDDLStream: Integrating Symbolic Planners and Blackbox Samplers via Optimistic Adaptive Planning
Title | PDDLStream: Integrating Symbolic Planners and Blackbox Samplers via Optimistic Adaptive Planning |
Authors | Caelan Reed Garrett, Tomás Lozano-Pérez, Leslie Pack Kaelbling |
Abstract | Many planning applications involve complex relationships defined on high-dimensional, continuous variables. For example, robotic manipulation requires planning with kinematic, collision, visibility, and motion constraints involving robot configurations, object poses, and robot trajectories. These constraints typically require specialized procedures to sample satisfying values. We extend PDDL to support a generic, declarative specification for these procedures that treats their implementation as black boxes. We provide domain-independent algorithms that reduce PDDLStream problems to a sequence of finite PDDL problems. We also introduce an algorithm that dynamically balances exploring new candidate plans and exploiting existing ones. This enables the algorithm to greedily search the space of parameter bindings to more quickly solve tightly-constrained problems as well as locally optimize to produce low-cost solutions. We evaluate our algorithms on three simulated robotic planning domains as well as several real-world robotic tasks. |
Tasks | Motion Planning |
Published | 2018-02-23 |
URL | https://arxiv.org/abs/1802.08705v5 |
https://arxiv.org/pdf/1802.08705v5.pdf | |
PWC | https://paperswithcode.com/paper/stripstream-integrating-symbolic-planners-and |
Repo | https://github.com/jingxixu/pddlstream |
Framework | none |
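The toy below illustrates the core idea only, with hypothetical names (it is not the pddlstream library API): samplers are declared as generator functions whose outputs certify facts, and planning first proceeds "optimistically" with placeholder outputs that are only sampled once a candidate plan needs them.

```python
# Toy illustration of declarative blackbox streams and optimistic binding.
import itertools
import random

class Optimistic:
    """Placeholder output of a stream, used during optimistic planning."""
    _ids = itertools.count()
    def __init__(self, stream):
        self.stream, self.id = stream, next(self._ids)
    def __repr__(self):
        return f'#{self.stream}{self.id}'

def grasp_stream(obj):
    """Blackbox sampler: yields grasp angles g certifying the fact (Grasp obj g)."""
    while True:
        yield random.uniform(0, 3.14159)

# Optimistic phase: the symbolic planner reasons with placeholders...
g_opt = Optimistic('grasp')
candidate_plan = [('pick', 'block_a', g_opt), ('place', 'block_a')]
# ...and only the bindings required by the candidate plan are actually sampled.
bindings = {g_opt: next(grasp_stream('block_a'))}
grounded_plan = [tuple(bindings.get(a, a) for a in step) for step in candidate_plan]
print(grounded_plan)
```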
Learn To Pay Attention
Title | Learn To Pay Attention |
Authors | Saumya Jetley, Nicholas A. Lord, Namhoon Lee, Philip H. S. Torr |
Abstract | We propose an end-to-end-trainable attention module for convolutional neural network (CNN) architectures built for image classification. The module takes as input the 2D feature vector maps which form the intermediate representations of the input image at different stages in the CNN pipeline, and outputs a 2D matrix of scores for each map. Standard CNN architectures are modified through the incorporation of this module, and trained under the constraint that a convex combination of the intermediate 2D feature vectors, as parameterised by the score matrices, must *alone* be used for classification. Incentivised to amplify the relevant and suppress the irrelevant or misleading, the scores thus assume the role of attention values. Our experimental observations provide clear evidence to this effect: the learned attention maps neatly highlight the regions of interest while suppressing background clutter. Consequently, the proposed function is able to bootstrap standard CNN architectures for the task of image classification, demonstrating superior generalisation over 6 unseen benchmark datasets. When binarised, our attention maps outperform other CNN-based attention maps, traditional saliency maps, and top object proposals for weakly supervised segmentation as demonstrated on the Object Discovery dataset. We also demonstrate improved robustness against the fast gradient sign method of adversarial attack. |
Tasks | Adversarial Attack, Image Classification |
Published | 2018-04-06 |
URL | http://arxiv.org/abs/1804.02391v2 |
http://arxiv.org/pdf/1804.02391v2.pdf | |
PWC | https://paperswithcode.com/paper/learn-to-pay-attention |
Repo | https://github.com/caoquanjie/LearnToPayAttention |
Framework | tf |
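The following sketch illustrates the attention constraint described above using dot-product compatibility scores; the layer sizes and the exact compatibility function are assumptions (the paper also considers an additive form), and the output `g_att` would replace the usual global feature as the sole input to the classifier.

```python
# Minimal sketch of the attention idea (assumed shapes, not the authors' exact model).
import torch
import torch.nn as nn
import torch.nn.functional as F

class DotAttention(nn.Module):
    def __init__(self, local_channels, global_dim):
        super().__init__()
        self.project = nn.Linear(global_dim, local_channels)  # align dimensions

    def forward(self, local_feat, global_feat):
        # local_feat: (B, C, H, W) intermediate feature maps; global_feat: (B, G)
        B, C, H, W = local_feat.shape
        l = local_feat.view(B, C, H * W)                      # (B, C, N)
        g = self.project(global_feat).unsqueeze(2)            # (B, C, 1)
        compat = (l * g).sum(dim=1)                           # (B, N) dot-product scores
        scores = F.softmax(compat, dim=1)                     # attention map over locations
        g_att = torch.bmm(l, scores.unsqueeze(2)).squeeze(2)  # (B, C) convex combination
        return g_att, scores.view(B, H, W)

attn = DotAttention(local_channels=256, global_dim=512)
g_att, attention_map = attn(torch.randn(2, 256, 14, 14), torch.randn(2, 512))
# g_att alone would then be fed to the final classifier, per the constraint above.
```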
Egocentric Vision-based Future Vehicle Localization for Intelligent Driving Assistance Systems
Title | Egocentric Vision-based Future Vehicle Localization for Intelligent Driving Assistance Systems |
Authors | Yu Yao, Mingze Xu, Chiho Choi, David J. Crandall, Ella M. Atkins, Behzad Dariush |
Abstract | Predicting the future location of vehicles is essential for safety-critical applications such as advanced driver assistance systems (ADAS) and autonomous driving. This paper introduces a novel approach to simultaneously predict both the location and scale of target vehicles in the first-person (egocentric) view of an ego-vehicle. We present a multi-stream recurrent neural network (RNN) encoder-decoder model that separately captures both object location and scale and pixel-level observations for future vehicle localization. We show that incorporating dense optical flow improves prediction results significantly since it captures information about motion as well as appearance change. We also find that explicitly modeling future motion of the ego-vehicle improves the prediction accuracy, which could be especially beneficial in intelligent and automated vehicles that have motion planning capability. To evaluate the performance of our approach, we present a new dataset of first-person videos collected from a variety of scenarios at road intersections, which are particularly challenging moments for prediction because vehicle trajectories are diverse and dynamic. |
Tasks | Autonomous Driving, Motion Planning, Optical Flow Estimation |
Published | 2018-09-19 |
URL | http://arxiv.org/abs/1809.07408v2 |
http://arxiv.org/pdf/1809.07408v2.pdf | |
PWC | https://paperswithcode.com/paper/egocentric-vision-based-future-vehicle |
Repo | https://github.com/MoonBlvd/tad-IROS2019 |
Framework | pytorch |
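A hedged sketch of a two-stream RNN encoder-decoder of the kind described above: one GRU encodes past bounding boxes, another encodes pooled optical-flow features, and a GRU decoder rolls out future box offsets. The stream contents, dimensions, and fusion by concatenation are illustrative assumptions, not the paper's exact architecture.

```python
# Illustrative multi-stream encoder-decoder for future bounding-box prediction.
import torch
import torch.nn as nn

class FutureBoxPredictor(nn.Module):
    def __init__(self, hidden=128, flow_dim=50, horizon=10):
        super().__init__()
        self.box_enc = nn.GRU(4, hidden, batch_first=True)          # past boxes (x, y, w, h)
        self.flow_enc = nn.GRU(flow_dim, hidden, batch_first=True)  # pooled flow features
        self.decoder = nn.GRUCell(4, 2 * hidden)
        self.head = nn.Linear(2 * hidden, 4)                        # per-step box offsets
        self.horizon = horizon

    def forward(self, past_boxes, past_flow):
        _, h_box = self.box_enc(past_boxes)                  # (1, B, hidden)
        _, h_flow = self.flow_enc(past_flow)
        h = torch.cat([h_box[0], h_flow[0]], dim=-1)         # fuse the two streams
        box, outputs = past_boxes[:, -1], []
        for _ in range(self.horizon):                        # autoregressive decoding
            h = self.decoder(box, h)
            box = box + self.head(h)                         # predict offset from previous box
            outputs.append(box)
        return torch.stack(outputs, dim=1)                   # (B, horizon, 4)

pred = FutureBoxPredictor()(torch.randn(2, 8, 4), torch.randn(2, 8, 50))
```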
Information-Directed Exploration for Deep Reinforcement Learning
Title | Information-Directed Exploration for Deep Reinforcement Learning |
Authors | Nikolay Nikolov, Johannes Kirschner, Felix Berkenkamp, Andreas Krause |
Abstract | Efficient exploration remains a major challenge for reinforcement learning. One reason is that the variability of the returns often depends on the current state and action, and is therefore heteroscedastic. Classical exploration strategies such as upper confidence bound algorithms and Thompson sampling fail to appropriately account for heteroscedasticity, even in the bandit setting. Motivated by recent findings that address this issue in bandits, we propose to use Information-Directed Sampling (IDS) for exploration in reinforcement learning. As our main contribution, we build on recent advances in distributional reinforcement learning and propose a novel, tractable approximation of IDS for deep Q-learning. The resulting exploration strategy explicitly accounts for both parametric uncertainty and heteroscedastic observation noise. We evaluate our method on Atari games and demonstrate a significant improvement over alternative approaches. |
Tasks | Atari Games, Distributional Reinforcement Learning, Efficient Exploration, Q-Learning |
Published | 2018-12-18 |
URL | http://arxiv.org/abs/1812.07544v2 |
http://arxiv.org/pdf/1812.07544v2.pdf | |
PWC | https://paperswithcode.com/paper/information-directed-exploration-for-deep |
Repo | https://github.com/nikonikolov/rltf |
Framework | tf |
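The sketch below shows an information-directed action choice for a single state, assuming per-action parametric uncertainty (e.g., from bootstrapped Q-heads) and a heteroscedastic return-noise estimate (e.g., from a distributional head). The specific regret and information-gain surrogates used here are plausible stand-ins and may differ from the paper's exact approximation.

```python
# Sketch of an information-directed action choice for one state.
import numpy as np

def ids_action(q_mean, q_std, return_std, lam=1.0, eps=1e-8):
    """q_mean, q_std: parametric mean and uncertainty per action.
    return_std: heteroscedastic return noise per action."""
    upper = q_mean + lam * q_std
    lower = q_mean - lam * q_std
    regret = np.max(upper) - lower                        # pessimistic regret estimate
    info_gain = np.log1p(q_std ** 2 / (return_std ** 2 + eps)) + eps
    ratio = regret ** 2 / info_gain                       # information ratio
    return int(np.argmin(ratio))                          # pick the action minimizing it

a = ids_action(q_mean=np.array([1.0, 1.2, 0.8]),
               q_std=np.array([0.05, 0.30, 0.10]),
               return_std=np.array([0.5, 0.5, 2.0]))
```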
Unsupervised Learning with Stein’s Unbiased Risk Estimator
Title | Unsupervised Learning with Stein’s Unbiased Risk Estimator |
Authors | Christopher A. Metzler, Ali Mousavi, Reinhard Heckel, Richard G. Baraniuk |
Abstract | Learning from unlabeled and noisy data is one of the grand challenges of machine learning. As such, it has seen a flurry of research with new ideas proposed continuously. In this work, we revisit a classical idea: Stein’s Unbiased Risk Estimator (SURE). We show that, in the context of image recovery, SURE and its generalizations can be used to train convolutional neural networks (CNNs) for a range of image denoising and recovery problems without any ground truth data. Specifically, our goal is to reconstruct an image $x$ from a noisy linear transformation (measurement) of the image. We consider two scenarios: one where no additional data is available and one where we have measurements of other images that are drawn from the same noisy distribution as $x$, but have no access to the clean images. Such is the case, for instance, in the context of medical imaging, microscopy, and astronomy, where noise-less ground truth data is rarely available. We show that in this situation, SURE can be used to estimate the mean-squared-error loss associated with an estimate of $x$. Using this estimate of the loss, we train networks to perform denoising and compressed sensing recovery. In addition, we use the SURE framework to partially explain and improve upon an intriguing result presented by Ulyanov et al. in “Deep Image Prior”: that a network initialized with random weights and fit to a single noisy image can effectively denoise that image. Public implementations of the networks and methods described in this paper can be found at https://github.com/ricedsp/D-AMP_Toolbox. |
Tasks | Denoising, Image Denoising |
Published | 2018-05-26 |
URL | https://arxiv.org/abs/1805.10531v2 |
https://arxiv.org/pdf/1805.10531v2.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-learning-with-steins-unbiased |
Repo | https://github.com/ricedsp/D-AMP_Toolbox |
Framework | tf |
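The central quantity is the SURE loss itself, which replaces the ground-truth MSE. Below is a sketch of the Monte Carlo SURE estimate for Gaussian denoising with known noise level; the generalized variants for other measurement operators (e.g., compressed sensing) are not shown.

```python
# Monte Carlo SURE loss for Gaussian noise of known std `sigma` (illustrative sketch).
import torch

def mc_sure_loss(denoiser, y, sigma, eps=1e-3):
    """y: noisy images (B, C, H, W). Returns an unbiased estimate of the MSE against the
    unknown clean image, up to a constant, without any ground truth."""
    n = y[0].numel()
    fy = denoiser(y)
    # Divergence term estimated with a single random probe b ~ N(0, I)
    b = torch.randn_like(y)
    div = (b * (denoiser(y + eps * b) - fy)).flatten(1).sum(1) / eps
    data_term = ((fy - y) ** 2).flatten(1).sum(1)
    sure = data_term / n - sigma ** 2 + (2 * sigma ** 2 / n) * div
    return sure.mean()

# Training would minimize mc_sure_loss(net, noisy_batch, sigma) in place of the usual MSE.
```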
Noise2Void - Learning Denoising from Single Noisy Images
Title | Noise2Void - Learning Denoising from Single Noisy Images |
Authors | Alexander Krull, Tim-Oliver Buchholz, Florian Jug |
Abstract | The field of image denoising is currently dominated by discriminative deep learning methods that are trained on pairs of noisy input and clean target images. Recently it has been shown that such methods can also be trained without clean targets. Instead, independent pairs of noisy images can be used, in an approach known as Noise2Noise (N2N). Here, we introduce Noise2Void (N2V), a training scheme that takes this idea one step further. It does not require noisy image pairs, nor clean target images. Consequently, N2V allows us to train directly on the body of data to be denoised and can therefore be applied when other methods cannot. Especially interesting is the application to biomedical image data, where the acquisition of training targets, clean or noisy, is frequently not possible. We compare the performance of N2V to approaches that have either clean target images and/or noisy image pairs available. Intuitively, N2V cannot be expected to outperform methods that have more information available during training. Still, we observe that the denoising performance of Noise2Void drops in moderation and compares favorably to training-free denoising methods. |
Tasks | Denoising, Image Denoising |
Published | 2018-11-27 |
URL | http://arxiv.org/abs/1811.10980v2 |
http://arxiv.org/pdf/1811.10980v2.pdf | |
PWC | https://paperswithcode.com/paper/noise2void-learning-denoising-from-single |
Repo | https://github.com/juglab/n2v |
Framework | tf |
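The heart of N2V is a blind-spot masking step: a handful of pixels per patch are replaced (here by a random neighbor's value) and the network is trained to predict their original noisy values, with the loss evaluated only at those positions. The mask density and replacement rule below are illustrative choices.

```python
# Sketch of Noise2Void-style blind-spot masking for a single 2D grayscale patch.
import numpy as np

def n2v_mask(patch, num_pix=64, radius=2, rng=np.random.default_rng()):
    """Returns (masked_patch, coords, targets): the network sees masked_patch and is
    trained so its output at `coords` matches `targets` (the original noisy values)."""
    h, w = patch.shape
    ys = rng.integers(radius, h - radius, num_pix)
    xs = rng.integers(radius, w - radius, num_pix)
    masked = patch.copy()
    targets = patch[ys, xs].copy()
    for y, x in zip(ys, xs):
        dy, dx = rng.integers(-radius, radius + 1, 2)   # pick a random neighbor
        masked[y, x] = patch[y + dy, x + dx]            # replace the blind-spot pixel
        # (The original scheme excludes the centre pixel itself as a replacement source.)
    return masked, (ys, xs), targets

patch = np.random.rand(64, 64).astype(np.float32)
masked, (ys, xs), targets = n2v_mask(patch)
# Loss would be mean((net(masked)[ys, xs] - targets) ** 2), i.e. only at masked pixels.
```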
OCNet: Object Context Network for Scene Parsing
Title | OCNet: Object Context Network for Scene Parsing |
Authors | Yuhui Yuan, Jingdong Wang |
Abstract | In this paper, we address the problem of scene parsing with deep learning and focus on the context aggregation strategy for robust segmentation. Motivated by the fact that the label of a pixel is the category of the object that the pixel belongs to, we introduce an *object context pooling (OCP)* scheme, which represents each pixel by exploiting the set of pixels that belong to the same object category as that pixel; we call this set of pixels the object context. Our implementation, inspired by the self-attention approach, consists of two steps: (i) compute the similarities between each pixel and all the pixels, forming a so-called object context map for each pixel, which serves as a surrogate for the true object context, and (ii) represent the pixel by aggregating the features of all the pixels weighted by the similarities. The resulting representation is more robust compared to existing context aggregation schemes, e.g., pyramid pooling modules (PPM) in PSPNet and atrous spatial pyramid pooling (ASPP), which do not differentiate whether the context pixels belong to the same object category or not, limiting the reliability of the contextually aggregated representations. We empirically demonstrate our approach and two pyramid extensions with state-of-the-art performance on three semantic segmentation benchmarks: Cityscapes, ADE20K and LIP. Code has been made available at: https://github.com/PkuRainBow/OCNet. |
Tasks | Scene Parsing, Semantic Segmentation |
Published | 2018-09-04 |
URL | http://arxiv.org/abs/1809.00916v3 |
http://arxiv.org/pdf/1809.00916v3.pdf | |
PWC | https://paperswithcode.com/paper/ocnet-object-context-network-for-scene |
Repo | https://github.com/PkuRainBow/OCNet |
Framework | pytorch |
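Steps (i) and (ii) of the abstract amount to a self-attention block over all spatial positions. The sketch below uses 1×1 convolutions for query/key/value with assumed channel reductions and concatenates the aggregated context with the input features; it is a simplified stand-in for the paper's OCP module.

```python
# Self-attention-style object context pooling sketch (assumed channel reductions).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ObjectContextPooling(nn.Module):
    def __init__(self, channels, key_channels=None):
        super().__init__()
        key_channels = key_channels or channels // 2
        self.query = nn.Conv2d(channels, key_channels, 1)
        self.key = nn.Conv2d(channels, key_channels, 1)
        self.value = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        B, C, H, W = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)     # (B, N, Ck)
        k = self.key(x).flatten(2)                       # (B, Ck, N)
        v = self.value(x).flatten(2).transpose(1, 2)     # (B, N, C)
        # Step (i): similarity of each pixel to all pixels -> object context map
        context_map = F.softmax(q @ k / (k.shape[1] ** 0.5), dim=-1)   # (B, N, N)
        # Step (ii): represent each pixel by the similarity-weighted aggregation
        out = (context_map @ v).transpose(1, 2).view(B, C, H, W)
        return torch.cat([x, out], dim=1)                # concatenate with input features

ocp = ObjectContextPooling(256)
y = ocp(torch.randn(1, 256, 32, 32))                     # (1, 512, 32, 32)
```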
MolGAN: An implicit generative model for small molecular graphs
Title | MolGAN: An implicit generative model for small molecular graphs |
Authors | Nicola De Cao, Thomas Kipf |
Abstract | Deep generative models for graph-structured data offer a new angle on the problem of chemical synthesis: by optimizing differentiable models that directly generate molecular graphs, it is possible to side-step expensive search procedures in the discrete and vast space of chemical structures. We introduce MolGAN, an implicit, likelihood-free generative model for small molecular graphs that circumvents the need for expensive graph matching procedures or node ordering heuristics of previous likelihood-based methods. Our method adapts generative adversarial networks (GANs) to operate directly on graph-structured data. We combine our approach with a reinforcement learning objective to encourage the generation of molecules with specific desired chemical properties. In experiments on the QM9 chemical database, we demonstrate that our model is capable of generating close to 100% valid compounds. MolGAN compares favorably both to recent proposals that use string-based (SMILES) representations of molecules and to a likelihood-based method that directly generates graphs, albeit being susceptible to mode collapse. |
Tasks | Graph Matching |
Published | 2018-05-30 |
URL | http://arxiv.org/abs/1805.11973v1 |
http://arxiv.org/pdf/1805.11973v1.pdf | |
PWC | https://paperswithcode.com/paper/molgan-an-implicit-generative-model-for-small |
Repo | https://github.com/nicola-decao/MolGAN |
Framework | tf |
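A sketch of a likelihood-free graph generator in the spirit of MolGAN: noise is mapped directly to a dense node-type matrix and a symmetric bond-type tensor, discretized with a straight-through Gumbel-softmax. Sizes are illustrative (QM9-like), and the R-GCN discriminator and RL reward term are omitted.

```python
# Illustrative graph generator producing atom types X and bond types A from noise.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphGenerator(nn.Module):
    def __init__(self, z_dim=32, n_nodes=9, n_atom_types=5, n_bond_types=4):
        super().__init__()
        self.n, self.ta, self.tb = n_nodes, n_atom_types, n_bond_types
        self.mlp = nn.Sequential(nn.Linear(z_dim, 128), nn.Tanh(),
                                 nn.Linear(128, 256), nn.Tanh())
        self.node_head = nn.Linear(256, n_nodes * n_atom_types)
        self.edge_head = nn.Linear(256, n_nodes * n_nodes * n_bond_types)

    def forward(self, z, tau=1.0):
        h = self.mlp(z)
        node_logits = self.node_head(h).view(-1, self.n, self.ta)
        edge_logits = self.edge_head(h).view(-1, self.n, self.n, self.tb)
        edge_logits = (edge_logits + edge_logits.transpose(1, 2)) / 2   # symmetric bonds
        # Differentiable discretization via straight-through Gumbel-softmax
        X = F.gumbel_softmax(node_logits, tau=tau, hard=True)           # atom types
        A = F.gumbel_softmax(edge_logits, tau=tau, hard=True)           # bond types
        return X, A

X, A = GraphGenerator()(torch.randn(4, 32))   # X: (4, 9, 5), A: (4, 9, 9, 4)
```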
Forward Modeling for Partial Observation Strategy Games - A StarCraft Defogger
Title | Forward Modeling for Partial Observation Strategy Games - A StarCraft Defogger |
Authors | Gabriel Synnaeve, Zeming Lin, Jonas Gehring, Dan Gant, Vegard Mella, Vasil Khalidov, Nicolas Carion, Nicolas Usunier |
Abstract | We formulate the problem of defogging as state estimation and future state prediction from previous, partial observations in the context of real-time strategy games. We propose to employ encoder-decoder neural networks for this task, and introduce proxy tasks and baselines for evaluation to assess their ability to capture basic game rules and high-level dynamics. By combining convolutional neural networks and recurrent networks, we exploit spatial and sequential correlations and train well-performing models on a large dataset of human games of StarCraft: Brood War. Finally, we demonstrate the relevance of our models to downstream tasks by applying them for enemy unit prediction in a state-of-the-art, rule-based StarCraft bot. We observe improvements in win rates against several strong community bots. |
Tasks | Real-Time Strategy Games, Starcraft |
Published | 2018-11-30 |
URL | http://arxiv.org/abs/1812.00054v1 |
http://arxiv.org/pdf/1812.00054v1.pdf | |
PWC | https://paperswithcode.com/paper/forward-modeling-for-partial-observation |
Repo | https://github.com/facebookresearch/starcraft_defogger |
Framework | pytorch |
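As a rough illustration of the convolutional-plus-recurrent forward model described above, the sketch below encodes a sequence of partially observed unit-count grids with a small CNN, aggregates them with an LSTM, and decodes a prediction of the full grid. All sizes and the decoding head are assumptions.

```python
# Generic conv + recurrent forward model over partially observed unit-count grids.
import torch
import torch.nn as nn

class Defogger(nn.Module):
    def __init__(self, in_ch=32, hidden=256, grid=32):
        super().__init__()
        self.grid = grid
        self.encoder = nn.Sequential(nn.Conv2d(in_ch, 64, 3, stride=2, padding=1), nn.ReLU(),
                                     nn.Conv2d(64, 64, 3, stride=2, padding=1), nn.ReLU())
        self.rnn = nn.LSTM(64 * (grid // 4) ** 2, hidden, batch_first=True)
        self.decoder = nn.Linear(hidden, in_ch * grid * grid)   # predicted full state

    def forward(self, frames):                 # frames: (B, T, in_ch, grid, grid)
        B, T = frames.shape[:2]
        feats = self.encoder(frames.flatten(0, 1)).view(B, T, -1)
        out, _ = self.rnn(feats)
        return self.decoder(out[:, -1]).view(B, -1, self.grid, self.grid)

pred = Defogger()(torch.randn(2, 5, 32, 32, 32))   # per-cell unit-count prediction
```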
LPD-Net: 3D Point Cloud Learning for Large-Scale Place Recognition and Environment Analysis
Title | LPD-Net: 3D Point Cloud Learning for Large-Scale Place Recognition and Environment Analysis |
Authors | Zhe Liu, Shunbo Zhou, Chuanzhe Suo, Yingtian Liu, Peng Yin, Hesheng Wang, Yun-Hui Liu |
Abstract | Point cloud based place recognition is still an open issue due to the difficulty in extracting local features from the raw 3D point cloud and generating the global descriptor, and it is even harder in large-scale dynamic environments. In this paper, we develop a novel deep neural network, named LPD-Net (Large-scale Place Description Network), which can extract discriminative and generalizable global descriptors from the raw 3D point cloud. Two modules, the adaptive local feature extraction module and the graph-based neighborhood aggregation module, are proposed, which help extract local structures and reveal the spatial distribution of local features in the large-scale point cloud, in an end-to-end manner. We apply the proposed global descriptor to point cloud based retrieval tasks to achieve large-scale place recognition. Comparison results show that our LPD-Net substantially outperforms PointNetVLAD and reaches the state of the art. We also compare our LPD-Net with vision-based solutions to show the robustness of our approach to different weather and light conditions. |
Tasks | |
Published | 2018-12-11 |
URL | https://arxiv.org/abs/1812.07050v2 |
https://arxiv.org/pdf/1812.07050v2.pdf | |
PWC | https://paperswithcode.com/paper/181207050 |
Repo | https://github.com/Suoivy/LPD-net |
Framework | tf |
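A small sketch of the graph-based neighborhood aggregation idea: build a k-NN graph over per-point local features and max-pool each point's neighborhood. The value of k, the distance space, and the pooling rule are illustrative choices; the paper's full pipeline additionally produces a global descriptor (e.g., NetVLAD-style pooling) for retrieval.

```python
# Illustrative k-NN neighborhood aggregation over per-point features.
import numpy as np

def knn_aggregate(features, k=16):
    """features: (N, D) per-point local features. Returns (N, D) aggregated features."""
    sq = (features ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * features @ features.T   # pairwise distances
    np.fill_diagonal(d2, np.inf)                      # exclude the point itself
    nbrs = np.argsort(d2, axis=1)[:, :k]              # k nearest neighbors per point
    return features[nbrs].max(axis=1)                 # max-pool each neighborhood

local = np.random.rand(2048, 64).astype(np.float32)
aggregated = knn_aggregate(local)
# A global descriptor for retrieval could then be produced by pooling the aggregated
# per-point features, e.g. with a NetVLAD-style layer as in the paper's pipeline.
```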
A Memory-Network Based Solution for Multivariate Time-Series Forecasting
Title | A Memory-Network Based Solution for Multivariate Time-Series Forecasting |
Authors | Yen-Yu Chang, Fan-Yun Sun, Yueh-Hua Wu, Shou-De Lin |
Abstract | Multivariate time series forecasting has been extensively studied throughout the years, with ubiquitous applications in areas such as finance, traffic, environment, etc. Still, concerns have been raised that traditional methods are incapable of modeling complex patterns or dependencies present in real-world data. To address such concerns, various deep learning models, mainly Recurrent Neural Network (RNN) based methods, have been proposed. Nevertheless, capturing extremely long-term patterns while effectively incorporating information from other variables remains a challenge for time-series forecasting. Furthermore, lack of explainability remains a serious drawback of deep neural network models. Inspired by the memory network proposed for solving the question-answering task, we propose a deep learning based model named Memory Time-series network (MTNet) for time series forecasting. MTNet consists of a large memory component, three separate encoders, and an autoregressive component that are trained jointly. Additionally, the designed attention mechanism makes MTNet highly interpretable: we can easily tell which part of the historic data is referenced the most. |
Tasks | Multivariate Time Series Forecasting, Question Answering, Time Series, Time Series Forecasting |
Published | 2018-09-06 |
URL | http://arxiv.org/abs/1809.02105v1 |
http://arxiv.org/pdf/1809.02105v1.pdf | |
PWC | https://paperswithcode.com/paper/a-memory-network-based-solution-for |
Repo | https://github.com/Maple728/MTNet |
Framework | tf |
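The sketch below captures the memory-attention idea at a high level: past chunks are encoded into memory slots, the most recent window forms a query that attends over them, and a linear autoregressive term is added to the nonlinear prediction. Encoders are collapsed to single GRUs and all sizes are illustrative, so this is a simplification of MTNet's three-encoder design.

```python
# Simplified memory-attention forecaster (illustrative sizes, not MTNet itself).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MemoryForecaster(nn.Module):
    def __init__(self, n_series=8, hidden=64, ar_window=16):
        super().__init__()
        self.mem_enc = nn.GRU(n_series, hidden, batch_first=True)
        self.query_enc = nn.GRU(n_series, hidden, batch_first=True)
        self.out = nn.Linear(2 * hidden, n_series)
        self.ar = nn.Linear(ar_window, 1)              # per-series autoregressive part
        self.ar_window = ar_window

    def forward(self, memory_chunks, recent):
        # memory_chunks: (B, M, L, n_series) past chunks; recent: (B, L, n_series)
        B, M, L, D = memory_chunks.shape
        _, mem = self.mem_enc(memory_chunks.flatten(0, 1))               # (1, B*M, H)
        mem = mem[0].view(B, M, -1)                                      # memory slots
        _, q = self.query_enc(recent)                                    # (1, B, H)
        attn = F.softmax((mem @ q[0].unsqueeze(2)).squeeze(2), dim=1)    # (B, M) weights
        read = (attn.unsqueeze(2) * mem).sum(1)                          # attended memory
        nonlinear = self.out(torch.cat([read, q[0]], dim=-1))            # (B, n_series)
        linear = self.ar(recent[:, -self.ar_window:].transpose(1, 2)).squeeze(2)
        return nonlinear + linear

yhat = MemoryForecaster()(torch.randn(2, 7, 24, 8), torch.randn(2, 24, 8))
```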
CCNet: Criss-Cross Attention for Semantic Segmentation
Title | CCNet: Criss-Cross Attention for Semantic Segmentation |
Authors | Zilong Huang, Xinggang Wang, Lichao Huang, Chang Huang, Yunchao Wei, Wenyu Liu |
Abstract | Long-range dependencies can capture useful contextual information to benefit visual understanding problems. In this work, we propose a Criss-Cross Network (CCNet) for obtaining such important information in a more effective and efficient way. Concretely, for each pixel, our CCNet can harvest the contextual information of its surrounding pixels on the criss-cross path through a novel criss-cross attention module. By taking a further recurrent operation, each pixel can finally capture the long-range dependencies from all pixels. Overall, our CCNet has the following merits: 1) GPU memory friendly. Compared with the non-local block, the recurrent criss-cross attention module requires $11\times$ less GPU memory usage. 2) High computational efficiency. The recurrent criss-cross attention reduces the FLOPs of the non-local block by about 85% in computing long-range dependencies. 3) State-of-the-art performance. We conduct extensive experiments on popular semantic segmentation benchmarks including Cityscapes and ADE20K, and the instance segmentation benchmark COCO. In particular, our CCNet achieves mIoU scores of 81.4 and 45.22 on the Cityscapes test set and the ADE20K validation set, respectively, which are new state-of-the-art results. We make the code publicly available at https://github.com/speedinghzl/CCNet. |
Tasks | Instance Segmentation, Semantic Segmentation |
Published | 2018-11-28 |
URL | http://arxiv.org/abs/1811.11721v1 |
http://arxiv.org/pdf/1811.11721v1.pdf | |
PWC | https://paperswithcode.com/paper/ccnet-criss-cross-attention-for-semantic |
Repo | https://github.com/speedinghzl/CCNet |
Framework | pytorch |
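A compact, unoptimized sketch of criss-cross attention: each pixel attends only to the positions in its own row and column, and applying the module twice (the recurrent operation) lets information propagate from every position. Channel reductions are illustrative, and unlike the official implementation this version does not mask the doubly counted centre pixel.

```python
# Naive criss-cross attention sketch (unoptimized, illustrative reductions).
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrissCrossAttention(nn.Module):
    def __init__(self, channels, reduced=None):
        super().__init__()
        reduced = reduced or channels // 8
        self.q = nn.Conv2d(channels, reduced, 1)
        self.k = nn.Conv2d(channels, reduced, 1)
        self.v = nn.Conv2d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        B, C, H, W = x.shape
        q, k, v = self.q(x), self.k(x), self.v(x)
        # Energies between each pixel (h, w) and all pixels in its row / its column
        e_row = torch.einsum('bchw,bchj->bhwj', q, k)        # (B, H, W, W)
        e_col = torch.einsum('bchw,bciw->bhwi', q, k)        # (B, H, W, H)
        a = F.softmax(torch.cat([e_row, e_col], dim=-1), dim=-1)
        a_row, a_col = a[..., :W], a[..., W:]
        out = (torch.einsum('bhwj,bchj->bchw', a_row, v)
               + torch.einsum('bhwi,bciw->bchw', a_col, v))
        return self.gamma * out + x

cca = CrissCrossAttention(64)
y = cca(cca(torch.randn(1, 64, 17, 23)))   # two passes give full-image context
```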
Accurate Uncertainties for Deep Learning Using Calibrated Regression
Title | Accurate Uncertainties for Deep Learning Using Calibrated Regression |
Authors | Volodymyr Kuleshov, Nathan Fenner, Stefano Ermon |
Abstract | Methods for reasoning under uncertainty are a key building block of accurate and reliable machine learning systems. Bayesian methods provide a general framework to quantify uncertainty. However, because of model misspecification and the use of approximate inference, Bayesian uncertainty estimates are often inaccurate – for example, a 90% credible interval may not contain the true outcome 90% of the time. Here, we propose a simple procedure for calibrating any regression algorithm; when applied to Bayesian and probabilistic models, it is guaranteed to produce calibrated uncertainty estimates given enough data. Our procedure is inspired by Platt scaling and extends previous work on classification. We evaluate this approach on Bayesian linear regression, feedforward, and recurrent neural networks, and find that it consistently outputs well-calibrated credible intervals while improving performance on time series forecasting and model-based reinforcement learning tasks. |
Tasks | Time Series, Time Series Forecasting |
Published | 2018-07-01 |
URL | http://arxiv.org/abs/1807.00263v1 |
http://arxiv.org/pdf/1807.00263v1.pdf | |
PWC | https://paperswithcode.com/paper/accurate-uncertainties-for-deep-learning |
Repo | https://github.com/ulissigroup/uncertainty_benchmarking |
Framework | pytorch |
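The recalibration procedure itself is simple: on a held-out calibration set, map each predicted CDF value F_t(y_t) to the empirical frequency with which such values occur, and fit an isotonic regression R so that R∘F is calibrated. The sketch below assumes a forecaster that outputs a Gaussian mean and standard deviation per point.

```python
# Sketch of post-hoc recalibration of a probabilistic regressor via isotonic regression.
import numpy as np
from scipy.stats import norm
from sklearn.isotonic import IsotonicRegression

def fit_recalibrator(mu, sigma, y):
    """mu, sigma: predicted Gaussian parameters on a calibration set; y: true outcomes."""
    p = norm.cdf(y, loc=mu, scale=sigma)                 # p_t = F_t(y_t)
    emp = np.array([(p <= pt).mean() for pt in p])       # empirical frequency of each level
    return IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds='clip').fit(p, emp)

mu, sigma = np.zeros(500), np.ones(500)
y = np.random.randn(500) * 2.0                           # forecaster is over-confident here
recal = fit_recalibrator(mu, sigma, y)
# Calibrated probabilities are R(F(y)); an interval at level q uses the p with R(p) = q.
print(recal.predict([0.05, 0.5, 0.95]))
```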
Self-Supervised Model Adaptation for Multimodal Semantic Segmentation
Title | Self-Supervised Model Adaptation for Multimodal Semantic Segmentation |
Authors | Abhinav Valada, Rohit Mohan, Wolfram Burgard |
Abstract | Learning to reliably perceive and understand the scene is an integral enabler for robots to operate in the real world. This problem is inherently challenging due to the multitude of object types as well as appearance changes caused by varying illumination and weather conditions. Leveraging complementary modalities can enable learning of semantically richer representations that are resilient to such perturbations. Despite the tremendous progress in recent years, most multimodal convolutional neural network approaches directly concatenate feature maps from individual modality streams, rendering the model incapable of focusing only on relevant complementary information for fusion. To address this limitation, we propose a multimodal semantic segmentation framework that dynamically adapts the fusion of modality-specific features while being sensitive to the object category, spatial location and scene context in a self-supervised manner. Specifically, we propose an architecture consisting of two modality-specific encoder streams that fuse intermediate encoder representations into a single decoder using our proposed self-supervised model adaptation fusion mechanism which optimally combines complementary features. As intermediate representations are not aligned across modalities, we introduce an attention scheme for better correlation. In addition, we propose a computationally efficient unimodal segmentation architecture termed AdapNet++ that incorporates a new encoder with multiscale residual units and an efficient atrous spatial pyramid pooling that has a larger effective receptive field with more than 10x fewer parameters, complemented with a strong decoder with a multi-resolution supervision scheme that recovers high-resolution details. Comprehensive empirical evaluations on several benchmarks demonstrate that both our unimodal and multimodal architectures achieve state-of-the-art performance. |
Tasks | Scene Recognition, Semantic Segmentation |
Published | 2018-08-11 |
URL | https://arxiv.org/abs/1808.03833v3 |
https://arxiv.org/pdf/1808.03833v3.pdf | |
PWC | https://paperswithcode.com/paper/self-supervised-model-adaptation-for |
Repo | https://github.com/DeepSceneSeg/SSMA |
Framework | tf |
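A hedged sketch of a dynamically adaptive fusion gate in the spirit of the proposed mechanism: the two modality-specific feature maps are concatenated, squeezed through a bottleneck, and re-weighted by sigmoid gates before a 1×1 fusing convolution. Channel counts and the reduction ratio are illustrative assumptions.

```python
# Illustrative data-dependent fusion gate for two modality streams.
import torch
import torch.nn as nn

class FusionGate(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        c = 2 * channels
        self.gate = nn.Sequential(nn.Conv2d(c, c // reduction, 3, padding=1), nn.ReLU(),
                                  nn.Conv2d(c // reduction, c, 3, padding=1), nn.Sigmoid())
        self.fuse = nn.Sequential(nn.Conv2d(c, channels, 1), nn.BatchNorm2d(channels))

    def forward(self, feat_rgb, feat_depth):
        cat = torch.cat([feat_rgb, feat_depth], dim=1)
        return self.fuse(cat * self.gate(cat))        # dynamically re-weighted fusion

fused = FusionGate(256)(torch.randn(1, 256, 48, 96), torch.randn(1, 256, 48, 96))
```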