February 1, 2020

2729 words 13 mins read

Paper Group AWR 84



Scalable Extreme Deconvolution

Title Scalable Extreme Deconvolution
Authors James A. Ritchie, Iain Murray
Abstract The Extreme Deconvolution method, originally developed for astronomical datasets, fits a probability density to a dataset in which each observation has Gaussian noise added with a known, sample-specific covariance. The existing fitting method is batch EM, which would not normally be applied to large datasets such as the Gaia catalog, containing noisy observations of a billion stars. We propose two minibatch variants of extreme deconvolution, based on an online variation of the EM algorithm and on direct gradient-based optimisation of the log-likelihood, both of which can run on GPUs. We demonstrate that these methods provide faster fitting, whilst being able to scale to much larger models for use with larger datasets.
Tasks
Published 2019-11-26
URL https://arxiv.org/abs/1911.11663v1
PDF https://arxiv.org/pdf/1911.11663v1.pdf
PWC https://paperswithcode.com/paper/scalable-extreme-deconvolution
Repo https://github.com/bayesiains/scalable_xd
Framework pytorch
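As a concrete illustration of the gradient-based variant, the minibatch sketch below fits a Gaussian mixture to observations with known per-sample noise covariances by directly optimising the extreme-deconvolution log-likelihood with Adam. All names, shapes, and the synthetic data are placeholders of ours, not the authors' API (their code lives in the repo above).

```python
import torch

def xd_log_likelihood(x, S, log_pi, mu, L):
    # x: (B, D) noisy observations; S: (B, D, D) known per-sample noise covariances
    # log_pi: (K,) mixture logits; mu: (K, D) component means
    # L: (K, D, D) factors so that Sigma_k = L_k @ L_k^T stays positive semi-definite
    Sigma = L @ L.transpose(-1, -2)                       # (K, D, D)
    cov = Sigma.unsqueeze(0) + S.unsqueeze(1)             # (B, K, D, D)
    loc = mu.unsqueeze(0).expand(x.shape[0], -1, -1)      # (B, K, D)
    comps = torch.distributions.MultivariateNormal(loc, covariance_matrix=cov)
    log_comp = comps.log_prob(x.unsqueeze(1))             # (B, K)
    log_w = torch.log_softmax(log_pi, dim=0)              # (K,)
    return torch.logsumexp(log_comp + log_w, dim=1).sum()

torch.manual_seed(0)
N, D, K = 2048, 2, 3
x_all = torch.cat([torch.randn(N // 2, D) - 2, torch.randn(N // 2, D) + 2])
S_all = 0.1 * torch.eye(D).expand(N, D, D)                # known noise covariances

log_pi = torch.zeros(K, requires_grad=True)
mu = torch.randn(K, D, requires_grad=True)
L = torch.eye(D).repeat(K, 1, 1).requires_grad_(True)
opt = torch.optim.Adam([log_pi, mu, L], lr=1e-2)

for epoch in range(20):
    perm = torch.randperm(N)
    for i in range(0, N, 256):                            # minibatches; move to GPU if desired
        idx = perm[i:i + 256]
        opt.zero_grad()
        loss = -xd_log_likelihood(x_all[idx], S_all[idx], log_pi, mu, L)
        loss.backward()
        opt.step()
```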

F3Net: Fusion, Feedback and Focus for Salient Object Detection

Title F3Net: Fusion, Feedback and Focus for Salient Object Detection
Authors Jun Wei, Shuhui Wang, Qingming Huang
Abstract Most existing salient object detection models have achieved great progress by aggregating multi-level features extracted from convolutional neural networks. However, because different convolutional layers have different receptive fields, there are large differences between the features they generate. Common feature fusion strategies (addition or concatenation) ignore these differences and may lead to suboptimal solutions. In this paper, we propose F3Net to solve the above problem. It mainly consists of a cross feature module (CFM) and a cascaded feedback decoder (CFD), trained by minimizing a new pixel position aware loss (PPA). Specifically, CFM aims to selectively aggregate multi-level features. Unlike addition and concatenation, CFM adaptively selects complementary components from the input features before fusion, which effectively avoids introducing redundant information that may corrupt the original features. Besides, CFD adopts a multi-stage feedback mechanism, where features close to the supervision are fed back to the outputs of previous layers to supplement them and reduce the differences between features. These refined features go through multiple similar iterations before the final saliency maps are generated. Furthermore, unlike binary cross-entropy, the proposed PPA loss does not treat all pixels equally; it synthesizes the local structure information around a pixel to guide the network to focus more on local details. Hard pixels from boundaries or error-prone regions receive more attention to emphasize their importance. F3Net is able to segment salient object regions accurately and provide clear local details. Comprehensive experiments on five benchmark datasets demonstrate that F3Net outperforms state-of-the-art approaches on six evaluation metrics.
Tasks Object Detection, Salient Object Detection
Published 2019-11-26
URL https://arxiv.org/abs/1911.11445v1
PDF https://arxiv.org/pdf/1911.11445v1.pdf
PWC https://paperswithcode.com/paper/f3net-fusion-feedback-and-focus-for-salient
Repo https://github.com/weijun88/F3Net
Framework pytorch
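The pixel position aware loss is commonly implemented as a weighted BCE plus a weighted IoU term, with per-pixel weights derived from the local contrast of the ground-truth mask. The sketch below follows that reading; consult the repo above for the authoritative version.

```python
import torch
import torch.nn.functional as F

def pixel_position_aware_loss(pred, mask):
    # weight each pixel by how much its label differs from its neighbourhood:
    # boundary pixels get weights up to 6, flat regions stay near 1
    weit = 1 + 5 * torch.abs(
        F.avg_pool2d(mask, kernel_size=31, stride=1, padding=15) - mask)
    wbce = F.binary_cross_entropy_with_logits(pred, mask, reduction='none')
    wbce = (weit * wbce).sum(dim=(2, 3)) / weit.sum(dim=(2, 3))

    pred = torch.sigmoid(pred)
    inter = ((pred * mask) * weit).sum(dim=(2, 3))
    union = ((pred + mask) * weit).sum(dim=(2, 3))
    wiou = 1 - (inter + 1) / (union - inter + 1)          # weighted IoU loss
    return (wbce + wiou).mean()

pred = torch.randn(2, 1, 64, 64)                  # logits from the network
mask = (torch.rand(2, 1, 64, 64) > 0.5).float()   # ground-truth saliency
loss = pixel_position_aware_loss(pred, mask)
```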

BERTgrid: Contextualized Embedding for 2D Document Representation and Understanding

Title BERTgrid: Contextualized Embedding for 2D Document Representation and Understanding
Authors Timo I. Denk, Christian Reisswig
Abstract For understanding generic documents, information like font sizes, column layout, and generally the positioning of words may carry semantic information that is crucial for solving a downstream document intelligence task. Our novel BERTgrid, which is based on Chargrid by Katti et al. (2018), represents a document as a grid of contextualized word piece embedding vectors, thereby making its spatial structure and semantics accessible to the processing neural network. The contextualized embedding vectors are retrieved from a BERT language model. We use BERTgrid in combination with a fully convolutional network on a semantic instance segmentation task for extracting fields from invoices. We demonstrate its performance on tabulated line item and document header field extraction.
Tasks Instance Segmentation, Language Modelling, Semantic Segmentation
Published 2019-09-11
URL https://arxiv.org/abs/1909.04948v2
PDF https://arxiv.org/pdf/1909.04948v2.pdf
PWC https://paperswithcode.com/paper/bertgrid-contextualized-embedding-for-2d
Repo https://github.com/sam-ai/BertGrid
Framework none
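The core construction is easy to picture: embed each OCR'd word piece with BERT, then paint its vector into the pixels of the word's bounding box. The sketch below, using the Hugging Face `transformers` library, shows one hedged reading; the words, boxes, and grid size are invented, and the paper's pipeline differs in detail.

```python
import torch
from transformers import BertTokenizerFast, BertModel

tok = BertTokenizerFast.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased").eval()

words = ["Invoice", "Total", "42.00"]                            # made-up OCR output
boxes = [(10, 10, 60, 25), (10, 40, 50, 55), (70, 40, 110, 55)]  # x0, y0, x1, y1

enc = tok(words, is_split_into_words=True, return_tensors="pt")
with torch.no_grad():
    piece_vecs = bert(**enc).last_hidden_state[0]                # (num_pieces, 768)

H, W, D = 128, 128, piece_vecs.shape[-1]
grid = torch.zeros(D, H, W)                                      # the BERTgrid tensor
word_ids = enc.word_ids()                                        # piece -> word index
for w_idx, (x0, y0, x1, y1) in enumerate(boxes):
    # simplification: paint one vector (the word's first piece) per box;
    # BERTgrid places every word piece at its own position
    first_piece = word_ids.index(w_idx)
    grid[:, y0:y1, x0:x1] = piece_vecs[first_piece].view(D, 1, 1)
```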

A Bayesian Decision Tree Algorithm

Title A Bayesian Decision Tree Algorithm
Authors Giuseppe Nuti, Lluís Antoni Jiménez Rugama, Andreea-Ingrid Cross
Abstract Bayesian Decision Trees are known for their probabilistic interpretability. However, their construction can sometimes be costly. In this article, we present a general Bayesian Decision Tree algorithm applicable to both regression and classification problems. The algorithm does not apply Markov Chain Monte Carlo and does not require a pruning step. While it is possible to construct a weighted probability tree space, we find that one particular tree, the greedy-modal tree (GMT), explains most of the information contained in the numerical examples. This approach seems to perform similarly to Random Forests.
Tasks
Published 2019-01-10
URL http://arxiv.org/abs/1901.03214v2
PDF http://arxiv.org/pdf/1901.03214v2.pdf
PWC https://paperswithcode.com/paper/a-bayesian-decision-tree-algorithm
Repo https://github.com/UBS-IB/bayesian_tree
Framework none
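To make the greedy-modal flavour concrete, the sketch below grows a binary-classification tree by comparing Beta-Bernoulli marginal likelihoods of "stay a leaf" against the best single split, recursing on the modal choice. The prior and stopping rule are our simplifications, not the authors' exact algorithm.

```python
import numpy as np
from scipy.special import betaln

def leaf_log_marginal(y):
    # Beta-Bernoulli marginal likelihood of binary labels y under a Beta(1,1) prior
    n1 = y.sum(); n0 = len(y) - n1
    return betaln(1 + n1, 1 + n0) - betaln(1, 1)

def best_split(X, y):
    best = (None, None, -np.inf)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:                 # candidate thresholds
            left = X[:, j] <= t
            score = leaf_log_marginal(y[left]) + leaf_log_marginal(y[~left])
            if score > best[2]:
                best = (j, t, score)
    return best

def grow(X, y, depth=0, max_depth=4):
    stay = leaf_log_marginal(y)
    j, t, split = best_split(X, y)
    if depth >= max_depth or j is None or split <= stay:  # leaf is the modal choice
        return {"leaf": (1 + y.sum()) / (2 + len(y))}     # posterior mean of p(y=1)
    left = X[:, j] <= t
    return {"feat": j, "thr": t,
            "l": grow(X[left], y[left], depth + 1, max_depth),
            "r": grow(X[~left], y[~left], depth + 1, max_depth)}

X = np.random.rand(200, 2)
y = (X[:, 0] > 0.5).astype(int)
tree = grow(X, y)
```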

Modular Block-diagonal Curvature Approximations for Feedforward Architectures

Title Modular Block-diagonal Curvature Approximations for Feedforward Architectures
Authors Felix Dangel, Stefan Harmeling, Philipp Hennig
Abstract We propose a modular extension of backpropagation for the computation of block-diagonal approximations to various curvature matrices of the training objective (in particular, the Hessian, the generalized Gauss-Newton matrix, and the positive-curvature Hessian). The approach splits the otherwise tedious manual derivation of these matrices into local modules, and is easy to integrate into existing machine learning libraries. Moreover, we develop a compact notation derived from matrix differential calculus. We outline different strategies applicable to our method. They subsume recently proposed block-diagonal approximations as special cases, and are extended to convolutional neural networks in this work.
Tasks
Published 2019-02-05
URL https://arxiv.org/abs/1902.01813v3
PDF https://arxiv.org/pdf/1902.01813v3.pdf
PWC https://paperswithcode.com/paper/a-modular-approach-to-block-diagonal-hessian
Repo https://github.com/f-dangel/hbp
Framework pytorch
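For intuition about what "block-diagonal curvature" refers to, the brute-force sketch below extracts one Hessian block per parameter tensor of a toy MLP via double backprop. The paper's modular scheme propagates curvature layer by layer and is far cheaper; this only computes the reference quantity.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Sequential(torch.nn.Linear(4, 3), torch.nn.Sigmoid(),
                            torch.nn.Linear(3, 2))
x, y = torch.randn(8, 4), torch.randint(0, 2, (8,))
loss = torch.nn.functional.cross_entropy(model(x), y)

blocks = {}
for name, p in model.named_parameters():
    # differentiable gradient of the loss w.r.t. this parameter tensor
    g, = torch.autograd.grad(loss, p, create_graph=True)
    g = g.reshape(-1)
    # one more backward pass per gradient entry gives one Hessian row
    rows = [torch.autograd.grad(g[i], p, retain_graph=True)[0].reshape(-1)
            for i in range(g.numel())]
    blocks[name] = torch.stack(rows)       # (numel, numel) diagonal block
    print(name, blocks[name].shape)
```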

Depth-Aware Video Frame Interpolation

Title Depth-Aware Video Frame Interpolation
Authors Wenbo Bao, Wei-Sheng Lai, Chao Ma, Xiaoyun Zhang, Zhiyong Gao, Ming-Hsuan Yang
Abstract Video frame interpolation aims to synthesize nonexistent frames between the original frames. While significant advances have been made by recent deep convolutional neural networks, the quality of interpolation is often degraded by large object motion or occlusion. In this work, we propose a video frame interpolation method that explicitly detects occlusion by exploiting depth information. Specifically, we develop a depth-aware flow projection layer that synthesizes intermediate flows which preferentially sample closer objects over farther ones. In addition, we learn hierarchical features to gather contextual information from neighboring pixels. The proposed model then warps the input frames, depth maps, and contextual features based on the optical flow and local interpolation kernels to synthesize the output frame. Our model is compact, efficient, and fully differentiable. Quantitative and qualitative results demonstrate that the proposed model performs favorably against state-of-the-art frame interpolation methods on a wide variety of datasets.
Tasks Optical Flow Estimation, Video Frame Interpolation
Published 2019-04-01
URL http://arxiv.org/abs/1904.00830v1
PDF http://arxiv.org/pdf/1904.00830v1.pdf
PWC https://paperswithcode.com/paper/depth-aware-video-frame-interpolation
Repo https://github.com/baowenbo/DAIN
Framework pytorch
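The depth-aware intuition, that nearer pixels should win when flows compete, can be caricatured by blending two warped frames with inverse-depth weights, as in the hedged toy below. The real model does this inside a flow projection layer; all inputs here are random placeholders.

```python
import torch
import torch.nn.functional as F

def backwarp(img, flow):
    # img: (B, C, H, W); flow: (B, 2, H, W) in pixel units, (x, y) order
    B, _, H, W = img.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    base = torch.stack((xs, ys)).float().unsqueeze(0)      # (1, 2, H, W)
    coords = base + flow
    coords[:, 0] = 2 * coords[:, 0] / (W - 1) - 1          # normalise to [-1, 1]
    coords[:, 1] = 2 * coords[:, 1] / (H - 1) - 1
    return F.grid_sample(img, coords.permute(0, 2, 3, 1), align_corners=True)

B, C, H, W = 1, 3, 64, 64
frame0, frame1 = torch.rand(B, C, H, W), torch.rand(B, C, H, W)
flow_t0, flow_t1 = torch.zeros(B, 2, H, W), torch.zeros(B, 2, H, W)
depth0, depth1 = torch.rand(B, 1, H, W) + 0.1, torch.rand(B, 1, H, W) + 0.1

w0, w1 = 1.0 / depth0, 1.0 / depth1        # smaller depth (closer) => bigger weight
warped0, warped1 = backwarp(frame0, flow_t0), backwarp(frame1, flow_t1)
frame_t = (w0 * warped0 + w1 * warped1) / (w0 + w1)
```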

Neutron: An Implementation of the Transformer Translation Model and its Variants

Title Neutron: An Implementation of the Transformer Translation Model and its Variants
Authors Hongfei Xu, Qiuhui Liu
Abstract The Transformer translation model is easier to parallelize and delivers better performance than recurrent seq2seq models, which makes it popular in both industry and the research community. In this work we present Neutron, an implementation of the Transformer model and several of its variants from recent research. It is highly optimized, easy to modify, and provides comparable performance with interesting features while remaining readable.
Tasks
Published 2019-03-18
URL https://arxiv.org/abs/1903.07402v2
PDF https://arxiv.org/pdf/1903.07402v2.pdf
PWC https://paperswithcode.com/paper/neutron-an-implementation-of-the-transformer
Repo https://github.com/anoidgit/transformer
Framework pytorch
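A minimal Transformer translation model of the kind Neutron implements can be assembled from torch.nn primitives, as sketched below. Vocabulary sizes and dimensions are placeholders, and positional encodings are omitted for brevity; Neutron's actual classes live in the repo above.

```python
import torch
import torch.nn as nn

class TinyTransformerMT(nn.Module):
    def __init__(self, src_vocab=1000, tgt_vocab=1000, d=128):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, d)   # positional encodings omitted
        self.tgt_emb = nn.Embedding(tgt_vocab, d)
        self.core = nn.Transformer(d_model=d, nhead=4,
                                   num_encoder_layers=2, num_decoder_layers=2,
                                   dim_feedforward=256, batch_first=True)
        self.out = nn.Linear(d, tgt_vocab)

    def forward(self, src, tgt):
        # causal mask so each target position only attends to earlier ones
        mask = self.core.generate_square_subsequent_mask(tgt.size(1))
        h = self.core(self.src_emb(src), self.tgt_emb(tgt), tgt_mask=mask)
        return self.out(h)

model = TinyTransformerMT()
src = torch.randint(0, 1000, (2, 7))
tgt = torch.randint(0, 1000, (2, 5))
logits = model(src, tgt)          # (2, 5, 1000) next-token scores
```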

Introducing an Explicit Symplectic Integration Scheme for Riemannian Manifold Hamiltonian Monte Carlo

Title Introducing an Explicit Symplectic Integration Scheme for Riemannian Manifold Hamiltonian Monte Carlo
Authors Adam D. Cobb, Atılım Güneş Baydin, Andrew Markham, Stephen J. Roberts
Abstract We introduce a recent symplectic integration scheme derived for solving physically motivated systems with non-separable Hamiltonians. We show its relevance to Riemannian manifold Hamiltonian Monte Carlo (RMHMC) and provide an alternative to the currently used generalised leapfrog symplectic integrator, which relies on solving multiple fixed point iterations to convergence. Via this approach, we are able to reduce the number of higher-order derivative calculations per leapfrog step. We explore the implications of this integrator and demonstrate its efficacy in reducing the computational burden of RMHMC. Our code is provided in a new open-source Python package, hamiltorch.
Tasks Bayesian Inference
Published 2019-10-14
URL https://arxiv.org/abs/1910.06243v1
PDF https://arxiv.org/pdf/1910.06243v1.pdf
PWC https://paperswithcode.com/paper/introducing-an-explicit-symplectic
Repo https://github.com/AdamCobb/hamiltorch
Framework pytorch
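For context, the sketch below implements the plain leapfrog step used when the Hamiltonian is separable. In RMHMC the mass matrix depends on position, the generalised leapfrog becomes implicit and needs fixed-point iterations per step; the paper's explicit integrator removes those. See hamiltorch for the authors' implementation.

```python
import torch

def leapfrog(q, p, log_prob, step_size=0.1, n_steps=10):
    # standard leapfrog for H(q, p) = -log_prob(q) + 0.5 * |p|^2
    q = q.clone().requires_grad_(True)
    def grad_U(q):
        return -torch.autograd.grad(log_prob(q), q)[0]
    p = p - 0.5 * step_size * grad_U(q)            # half momentum step
    for _ in range(n_steps - 1):
        q = (q + step_size * p).detach().requires_grad_(True)
        p = p - step_size * grad_U(q)              # full steps in between
    q = (q + step_size * p).detach().requires_grad_(True)
    p = p - 0.5 * step_size * grad_U(q)            # final half momentum step
    return q.detach(), p

log_prob = lambda q: -0.5 * (q ** 2).sum()         # standard normal target
q0, p0 = torch.randn(2), torch.randn(2)
q1, p1 = leapfrog(q0, p0, log_prob)
```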

Recurrent Neural Processes

Title Recurrent Neural Processes
Authors Timon Willi, Jonathan Masci, Jürgen Schmidhuber, Christian Osendorfer
Abstract We extend Neural Processes (NPs) to sequential data through Recurrent NPs or RNPs, a family of conditional state space models. RNPs model the state space with Neural Processes. Given time series observed on fast real-world time scales but containing slow long-term variabilities, RNPs may derive appropriate slow latent time scales. They do so in an efficient manner by establishing conditional independence among subsequences of the time series. Our theoretically grounded framework for stochastic processes expands the applicability of NPs while retaining their benefits of flexibility, uncertainty estimation, and favorable runtime with respect to Gaussian Processes (GPs). We demonstrate that state spaces learned by RNPs benefit predictive performance on real-world time-series data and nonlinear system identification, even in the case of limited data availability.
Tasks Gaussian Processes, Time Series
Published 2019-06-13
URL https://arxiv.org/abs/1906.05915v2
PDF https://arxiv.org/pdf/1906.05915v2.pdf
PWC https://paperswithcode.com/paper/recurrent-neural-processes
Repo https://github.com/KurochkinAlexey/Recurrent-neural-process
Framework pytorch
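The Neural Process machinery that RNPs extend with recurrence can be sketched as a conditional NP: encode context pairs, aggregate by mean, and decode a predictive Gaussian at the target inputs. Sizes below are placeholders; RNPs additionally run the latent state through an RNN over subsequences.

```python
import torch
import torch.nn as nn

enc = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 64))
dec = nn.Sequential(nn.Linear(64 + 1, 64), nn.ReLU(), nn.Linear(64, 2))

def cnp_predict(xc, yc, xt):
    # encode each context (x, y) pair and aggregate into one representation
    r = enc(torch.cat([xc, yc], dim=-1)).mean(dim=0, keepdim=True)   # (1, 64)
    # decode a Gaussian at every target input
    h = dec(torch.cat([r.expand(xt.shape[0], -1), xt], dim=-1))
    mu, pre_sigma = h.chunk(2, dim=-1)
    return mu, torch.nn.functional.softplus(pre_sigma)               # mean, std

xc = torch.rand(10, 1)
yc = torch.sin(6 * xc)                        # toy 1-D regression context
xt = torch.linspace(0, 1, 50).unsqueeze(-1)
mu, sigma = cnp_predict(xc, yc, xt)
```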

Kernelized Wasserstein Natural Gradient

Title Kernelized Wasserstein Natural Gradient
Authors Michael Arbel, Arthur Gretton, Wuchen Li, Guido Montufar
Abstract Many machine learning problems can be expressed as the optimization of some cost functional over a parametric family of probability distributions. It is often beneficial to solve such optimization problems using natural gradient methods. These methods are invariant to the parametrization of the family, and thus can yield more effective optimization. Unfortunately, computing the natural gradient is challenging, as it requires inverting a high-dimensional matrix at each iteration. We propose a general framework to approximate the natural gradient for the Wasserstein metric, by leveraging a dual formulation of the metric restricted to a Reproducing Kernel Hilbert Space. Our approach leads to an estimator of the gradient direction that can trade off accuracy and computational cost, with theoretical guarantees. We verify its accuracy on simple examples, and empirically show the advantage of using such an estimator in classification tasks on CIFAR-10 and CIFAR-100.
Tasks
Published 2019-10-21
URL https://arxiv.org/abs/1910.09652v4
PDF https://arxiv.org/pdf/1910.09652v4.pdf
PWC https://paperswithcode.com/paper/kernelized-wasserstein-natural-gradient-1
Repo https://github.com/MichaelArbel/KWNG
Framework pytorch
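Generically, a natural-gradient step preconditions the Euclidean gradient by the inverse of a metric matrix, as in the toy sketch below with an explicit SPD metric. The paper's kernelized estimator avoids ever forming or inverting that matrix for the Wasserstein metric; our `metric_fn` here is purely illustrative.

```python
import torch

def natural_gradient_step(params, loss_fn, metric_fn, lr=0.1):
    loss = loss_fn(params)
    g, = torch.autograd.grad(loss, params)         # Euclidean gradient
    G = metric_fn(params.detach())                 # (d, d) SPD metric matrix
    v = torch.linalg.solve(G, g)                   # natural direction G^{-1} g
    return (params - lr * v).detach().requires_grad_(True)

d = 3
params = torch.randn(d, requires_grad=True)
loss_fn = lambda p: ((p - 1.0) ** 2).sum()                # toy objective
metric_fn = lambda p: torch.eye(d) + torch.outer(p, p)    # toy SPD metric
for _ in range(50):
    params = natural_gradient_step(params, loss_fn, metric_fn)
```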

Adding Interpretable Attention to Neural Translation Models Improves Word Alignment

Title Adding Interpretable Attention to Neural Translation Models Improves Word Alignment
Authors Thomas Zenkel, Joern Wuebker, John DeNero
Abstract Multi-layer models with multiple attention heads per layer provide superior translation quality compared to simpler and shallower models, but determining what source context is most relevant to each target word is more challenging as a result. Therefore, deriving high-accuracy word alignments from the activations of a state-of-the-art neural machine translation model is an open challenge. We propose a simple extension to the Transformer architecture that makes use of its hidden representations and is restricted to attending solely to encoder information when predicting the next word. It can be trained on bilingual data without word-alignment information. We further introduce a novel alignment inference procedure which applies stochastic gradient descent to directly optimize the attention activations towards a given target word. The resulting alignments dramatically outperform the naive approach to interpreting Transformer attention activations, and are comparable to GIZA++ on two publicly available datasets.
Tasks Machine Translation, Word Alignment
Published 2019-01-31
URL http://arxiv.org/abs/1901.11359v1
PDF http://arxiv.org/pdf/1901.11359v1.pdf
PWC https://paperswithcode.com/paper/adding-interpretable-attention-to-neural
Repo https://github.com/shuoyangd/meerkat
Framework pytorch
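The added alignment layer can be pictured as one attention head that queries decoder states against encoder states only and predicts the next target word; its attention weights are then read off as soft alignments. The sketch below is our hedged rendering, with random stand-in states in place of a trained translation model.

```python
import torch
import torch.nn as nn

d, tgt_vocab = 64, 1000
attn = nn.MultiheadAttention(embed_dim=d, num_heads=1, batch_first=True)
out_proj = nn.Linear(d, tgt_vocab)

enc_states = torch.randn(2, 9, d)     # would come from the frozen NMT encoder
dec_states = torch.randn(2, 5, d)     # one hidden state per target position

# attend solely to encoder states; weights double as soft word alignments
ctx, align = attn(dec_states, enc_states, enc_states, need_weights=True)
next_word_logits = out_proj(ctx)      # train with cross-entropy on the targets
print(align.shape)                    # (2, 5, 9): target-to-source attention
```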

Generalization through Simulation: Integrating Simulated and Real Data into Deep Reinforcement Learning for Vision-Based Autonomous Flight

Title Generalization through Simulation: Integrating Simulated and Real Data into Deep Reinforcement Learning for Vision-Based Autonomous Flight
Authors Katie Kang, Suneel Belkhale, Gregory Kahn, Pieter Abbeel, Sergey Levine
Abstract Deep reinforcement learning provides a promising approach for vision-based control of real-world robots. However, the generalization of such models depends critically on the quantity and variety of data available for training. This data can be difficult to obtain for some types of robotic systems, such as fragile, small-scale quadrotors. Simulated rendering and physics can provide for much larger datasets, but such data is inherently of lower quality: many of the phenomena that make the real-world autonomous flight problem challenging, such as complex physics and air currents, are modeled poorly or not at all, and the systematic differences between simulation and the real world are typically impossible to eliminate. In this work, we investigate how data from both simulation and the real world can be combined in a hybrid deep reinforcement learning algorithm. Our method uses real-world data to learn about the dynamics of the system, and simulated data to learn a generalizable perception system that can enable the robot to avoid collisions using only a monocular camera. We demonstrate our approach on a real-world nano aerial vehicle collision avoidance task, showing that with only an hour of real-world data, the quadrotor can avoid collisions in new environments with various lighting conditions and geometry. Code, instructions for building the aerial vehicles, and videos of the experiments can be found at github.com/gkahn13/GtS
Tasks
Published 2019-02-11
URL http://arxiv.org/abs/1902.03701v1
PDF http://arxiv.org/pdf/1902.03701v1.pdf
PWC https://paperswithcode.com/paper/generalization-through-simulation-integrating
Repo https://github.com/gkahn13/GtS
Framework none
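The division of labour the abstract describes, perception from simulation and control from scarce real data, might look roughly like the sketch below: a convolutional encoder pre-trained on simulated images is frozen, and only a small action head is fit on real flights. All modules and shapes are invented placeholders, not the authors' architecture.

```python
import torch
import torch.nn as nn

# perception encoder: in phase 1, train this (plus a temporary head)
# on abundant simulated images with collision labels
perception = nn.Sequential(nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
                           nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
                           nn.AdaptiveAvgPool2d(1), nn.Flatten())
# action head: in phase 2, fit only this on the hour of real-world data
policy_head = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1))

for p in perception.parameters():          # freeze the sim-trained perception
    p.requires_grad_(False)

real_images = torch.rand(4, 3, 64, 64)
steering = policy_head(perception(real_images))   # (4, 1) action scores
```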

Learning Shared Semantic Space with Correlation Alignment for Cross-modal Event Retrieval

Title Learning Shared Semantic Space with Correlation Alignment for Cross-modal Event Retrieval
Authors Zhenguo Yang, Zehang Lin, Peipei Kang, Jianming Lv, Qing Li, Wenyin Liu
Abstract In this paper, we propose to learn a shared semantic space with correlation alignment (${S}^{3}CA$) for multimodal data representations, which aligns the nonlinear correlations of multimodal data distributions in deep neural networks designed for heterogeneous data. In the context of cross-modal (event) retrieval, we design a neural network with convolutional and fully-connected layers to extract features for images, including images from Flickr-like social media. Simultaneously, we exploit a fully-connected neural network to extract semantic features for texts, including news articles from news media. In particular, the nonlinear correlations of layer activations in the two neural networks are aligned with correlation alignment during the joint training of the networks. Furthermore, we project the multimodal data into a shared semantic space for cross-modal (event) retrieval, where the distances between heterogeneous data samples can be measured directly. In addition, we contribute a Wiki-Flickr Event dataset in which, unlike existing paired datasets, the multimodal samples do not describe each other in pairs; instead, they all describe semantic events. Extensive experiments on both paired and unpaired datasets demonstrate the effectiveness of ${S}^{3}CA$, which outperforms the state-of-the-art methods.
Tasks
Published 2019-01-14
URL https://arxiv.org/abs/1901.04268v3
PDF https://arxiv.org/pdf/1901.04268v3.pdf
PWC https://paperswithcode.com/paper/learning-shared-semantic-space-with
Repo https://github.com/zhengyang5/Wiki-Flickr-Event-Dataset
Framework none
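The correlation-alignment ingredient matches second-order statistics of the two networks' activations, in the spirit of the CORAL loss. Below is a hedged sketch of such a term, to be added to the retrieval training objective; the activation tensors are random stand-ins.

```python
import torch

def coral_loss(h_img, h_txt):
    # h_*: (batch, d) layer activations from the image and text networks
    def cov(h):
        hc = h - h.mean(dim=0, keepdim=True)
        return hc.t() @ hc / (h.shape[0] - 1)
    d = h_img.shape[1]
    # squared Frobenius distance between the two covariance matrices
    return ((cov(h_img) - cov(h_txt)) ** 2).sum() / (4 * d * d)

h_img, h_txt = torch.randn(32, 128), torch.randn(32, 128)
loss = coral_loss(h_img, h_txt)   # add to the joint training objective
```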

Patchy Image Structure Classification Using Multi-Orientation Region Transform

Title Patchy Image Structure Classification Using Multi-Orientation Region Transform
Authors Xiaohan Yu, Yang Zhao, Yongsheng Gao, Shengwu Xiong, Xiaohui Yuan
Abstract Exterior contour and interior structure are both vital features for classifying objects. However, most existing methods consider exterior contour and internal structure features separately, and thus fail when classifying patchy image structures that have similar contours and flexible structures. To address these limitations, this paper proposes a novel Multi-Orientation Region Transform (MORT), which can effectively characterize both contour and structure features simultaneously, for patchy image structure classification. MORT is performed over multiple orientation regions at multiple scales to effectively integrate patchy features, and thus enables a better description of the shape in a coarse-to-fine manner. Moreover, the proposed MORT can be combined with deep convolutional neural network techniques to further enhance classification accuracy. Very encouraging experimental results on the challenging ultra-fine-grained cultivar recognition, insect wing recognition, and large-variation butterfly recognition tasks demonstrate the effectiveness and superiority of the proposed MORT over state-of-the-art methods in classifying patchy image structures. Our code and three patchy image structure datasets are available at: https://github.com/XiaohanYu-GU/MReT2019.
Tasks
Published 2019-12-02
URL https://arxiv.org/abs/1912.00622v1
PDF https://arxiv.org/pdf/1912.00622v1.pdf
PWC https://paperswithcode.com/paper/patchy-image-structure-classification-using
Repo https://github.com/XiaohanYu-GU/MReT2019
Framework none
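Very loosely, a multi-orientation region descriptor can be caricatured by rotating the image to several orientations and pooling intensity over strip regions at each one, as below. This only gestures at the idea; the paper defines the transform precisely, and the angles and strip counts here are arbitrary.

```python
import torch
import torchvision.transforms.functional as TF

def orientation_region_features(img, angles=(0, 45, 90, 135), strips=8):
    # img: (1, H, W) grayscale tensor
    feats = []
    for a in angles:
        r = TF.rotate(img, a)                 # view the shape at orientation a
        bands = r.chunk(strips, dim=1)        # strip regions along the height
        feats += [b.mean() for b in bands]    # pool intensity per region
    return torch.stack(feats)                 # (len(angles) * strips,)

img = torch.rand(1, 64, 64)
desc = orientation_region_features(img)       # crude orientation-region descriptor
```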

Variationally Inferred Sampling Through a Refined Bound for Probabilistic Programs

Title Variationally Inferred Sampling Through a Refined Bound for Probabilistic Programs
Authors Victor Gallego, David Rios Insua
Abstract We introduce a framework that boosts the efficiency of Bayesian inference in probabilistic programs by embedding a sampler inside a variational posterior approximation, which we call the refined variational approximation. Its strength lies both in its ease of implementation and in the automatic tuning of the sampler parameters to speed up mixing time, using automatic differentiation. Several strategies to approximate the \emph{evidence lower bound} (ELBO) computation are introduced. We demonstrate its efficient performance by solving an influence diagram in a high-dimensional space using a conditional variational autoencoder (cVAE) as a deep Bayes classifier, an unconditional VAE on density estimation tasks, and state-space models for time-series data.
Tasks Bayesian Inference, Density Estimation, Time Series
Published 2019-08-26
URL https://arxiv.org/abs/1908.09744v4
PDF https://arxiv.org/pdf/1908.09744v4.pdf
PWC https://paperswithcode.com/paper/variationally-inferred-sampling-through-a
Repo https://github.com/vicgalle/vis
Framework pytorch
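The embedding idea can be sketched as: sample from a simple Gaussian q0, refine the samples with a few differentiable SGLD steps towards the unnormalised posterior, and tune both q0 and the step size by gradient descent through the chain. The surrogate objective below ignores the entropy term, unlike the paper's refined ELBO, and the target is a toy of ours.

```python
import torch

log_post = lambda z: -0.5 * ((z - 3.0) ** 2).sum(-1)    # toy unnormalised posterior

mu = torch.zeros(1, requires_grad=True)                 # q0 parameters
log_std = torch.zeros(1, requires_grad=True)
log_eps = torch.tensor(-3.0, requires_grad=True)        # learnable SGLD step size
opt = torch.optim.Adam([mu, log_std, log_eps], lr=1e-2)

for _ in range(200):
    opt.zero_grad()
    z = mu + log_std.exp() * torch.randn(64, 1)         # reparameterised q0 draw
    eps = log_eps.exp()
    for _ in range(5):                                  # differentiable SGLD refinement
        grad = torch.autograd.grad(log_post(z).sum(), z, create_graph=True)[0]
        z = z + eps * grad + (2 * eps).sqrt() * torch.randn_like(z)
    loss = -log_post(z).mean()        # crude surrogate; the paper's ELBO is more careful
    loss.backward()
    opt.step()
```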