February 1, 2020

2729 words 13 mins read

Paper Group AWR 84



Scalable Extreme Deconvolution

Title Scalable Extreme Deconvolution
Authors James A. Ritchie, Iain Murray
Abstract The Extreme Deconvolution method, originally developed for astronomical datasets, fits a probability density to a dataset in which each observation has Gaussian noise added with a known, sample-specific covariance. The existing fitting method is batch EM, which would not normally be applied to large datasets such as the Gaia catalog, containing noisy observations of a billion stars. We propose two minibatch variants of extreme deconvolution, based on an online variation of the EM algorithm and on direct gradient-based optimisation of the log-likelihood, both of which can run on GPUs. We demonstrate that these methods provide faster fitting, whilst being able to scale to much larger models for use with larger datasets.
Tasks
Published 2019-11-26
URL https://arxiv.org/abs/1911.11663v1
PDF https://arxiv.org/pdf/1911.11663v1.pdf
PWC https://paperswithcode.com/paper/scalable-extreme-deconvolution
Repo https://github.com/bayesiains/scalable_xd
Framework pytorch
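As a concrete illustration of the gradient-based variant, the minibatch sketch below fits a Gaussian mixture to observations with known per-sample noise covariances by directly optimising the extreme-deconvolution log-likelihood with Adam. All names, shapes, and the synthetic data are placeholders of ours, not the authors' API (their code lives in the repo above).

```python
import torch

def xd_log_likelihood(x, S, log_pi, mu, L):
    # x: (B, D) noisy observations; S: (B, D, D) known per-sample noise covariances
    # log_pi: (K,) mixture logits; mu: (K, D) component means
    # L: (K, D, D) factors so that Sigma_k = L_k @ L_k^T stays positive semi-definite
    Sigma = L @ L.transpose(-1, -2)                       # (K, D, D)
    cov = Sigma.unsqueeze(0) + S.unsqueeze(1)             # (B, K, D, D)
    loc = mu.unsqueeze(0).expand(x.shape[0], -1, -1)      # (B, K, D)
    comps = torch.distributions.MultivariateNormal(loc, covariance_matrix=cov)
    log_comp = comps.log_prob(x.unsqueeze(1))             # (B, K)
    log_w = torch.log_softmax(log_pi, dim=0)              # (K,)
    return torch.logsumexp(log_comp + log_w, dim=1).sum()

torch.manual_seed(0)
N, D, K = 2048, 2, 3
x_all = torch.cat([torch.randn(N // 2, D) - 2, torch.randn(N // 2, D) + 2])
S_all = 0.1 * torch.eye(D).expand(N, D, D)                # known noise covariances

log_pi = torch.zeros(K, requires_grad=True)
mu = torch.randn(K, D, requires_grad=True)
L = torch.eye(D).repeat(K, 1, 1).requires_grad_(True)
opt = torch.optim.Adam([log_pi, mu, L], lr=1e-2)

for epoch in range(20):
    perm = torch.randperm(N)
    for i in range(0, N, 256):                            # minibatches; move to GPU if desired
        idx = perm[i:i + 256]
        opt.zero_grad()
        loss = -xd_log_likelihood(x_all[idx], S_all[idx], log_pi, mu, L)
        loss.backward()
        opt.step()
```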

F3Net: Fusion, Feedback and Focus for Salient Object Detection

Title F3Net: Fusion, Feedback and Focus for Salient Object Detection
Authors Jun Wei, Shuhui Wang, Qingming Huang
Abstract Most existing salient object detection models have achieved great progress by aggregating multi-level features extracted from convolutional neural networks. However, because different convolutional layers have different receptive fields, there are large differences between the features they generate. Common feature fusion strategies (addition or concatenation) ignore these differences and may lead to suboptimal solutions. In this paper, we propose F3Net to solve the above problem. It mainly consists of a cross feature module (CFM) and a cascaded feedback decoder (CFD), trained by minimizing a new pixel position aware loss (PPA). Specifically, CFM aims to selectively aggregate multi-level features. Unlike addition and concatenation, CFM adaptively selects complementary components from the input features before fusion, which effectively avoids introducing redundant information that may corrupt the original features. Besides, CFD adopts a multi-stage feedback mechanism, where features close to the supervision are fed back to the outputs of previous layers to supplement them and reduce the differences between features. These refined features go through multiple similar iterations before the final saliency maps are generated. Furthermore, unlike binary cross-entropy, the proposed PPA loss does not treat all pixels equally; it synthesizes the local structure information around a pixel to guide the network to focus more on local details. Hard pixels from boundaries or error-prone regions receive more attention to emphasize their importance. F3Net is able to segment salient object regions accurately and provide clear local details. Comprehensive experiments on five benchmark datasets demonstrate that F3Net outperforms state-of-the-art approaches on six evaluation metrics.
Tasks Object Detection, Salient Object Detection
Published 2019-11-26
URL https://arxiv.org/abs/1911.11445v1
PDF https://arxiv.org/pdf/1911.11445v1.pdf
PWC https://paperswithcode.com/paper/f3net-fusion-feedback-and-focus-for-salient
Repo https://github.com/weijun88/F3Net
Framework pytorch
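The pixel position aware loss is commonly implemented as a weighted BCE plus a weighted IoU term, with per-pixel weights derived from the local contrast of the ground-truth mask. The sketch below follows that reading; consult the repo above for the authoritative version.

```python
import torch
import torch.nn.functional as F

def pixel_position_aware_loss(pred, mask):
    # weight each pixel by how much its label differs from its neighbourhood:
    # boundary pixels get weights up to 6, flat regions stay near 1
    weit = 1 + 5 * torch.abs(
        F.avg_pool2d(mask, kernel_size=31, stride=1, padding=15) - mask)
    wbce = F.binary_cross_entropy_with_logits(pred, mask, reduction='none')
    wbce = (weit * wbce).sum(dim=(2, 3)) / weit.sum(dim=(2, 3))

    pred = torch.sigmoid(pred)
    inter = ((pred * mask) * weit).sum(dim=(2, 3))
    union = ((pred + mask) * weit).sum(dim=(2, 3))
    wiou = 1 - (inter + 1) / (union - inter + 1)          # weighted IoU loss
    return (wbce + wiou).mean()

pred = torch.randn(2, 1, 64, 64)                  # logits from the network
mask = (torch.rand(2, 1, 64, 64) > 0.5).float()   # ground-truth saliency
loss = pixel_position_aware_loss(pred, mask)
```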

BERTgrid: Contextualized Embedding for 2D Document Representation and Understanding

Title BERTgrid: Contextualized Embedding for 2D Document Representation and Understanding
Authors Timo I. Denk, Christian Reisswig
Abstract For understanding generic documents, information like font sizes, column layout, and generally the positioning of words may carry semantic information that is crucial for solving a downstream document intelligence task. Our novel BERTgrid, which is based on Chargrid by Katti et al. (2018), represents a document as a grid of contextualized word piece embedding vectors, thereby making its spatial structure and semantics accessible to the processing neural network. The contextualized embedding vectors are retrieved from a BERT language model. We use BERTgrid in combination with a fully convolutional network on a semantic instance segmentation task for extracting fields from invoices. We demonstrate its performance on tabulated line item and document header field extraction.
Tasks Instance Segmentation, Language Modelling, Semantic Segmentation
Published 2019-09-11
URL https://arxiv.org/abs/1909.04948v2
PDF https://arxiv.org/pdf/1909.04948v2.pdf
PWC https://paperswithcode.com/paper/bertgrid-contextualized-embedding-for-2d
Repo https://github.com/sam-ai/BertGrid
Framework none
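The core construction is easy to picture: embed each OCR'd word piece with BERT, then paint its vector into the pixels of the word's bounding box. The sketch below, using the Hugging Face `transformers` library, shows one hedged reading; the words, boxes, and grid size are invented, and the paper's pipeline differs in detail.

```python
import torch
from transformers import BertTokenizerFast, BertModel

tok = BertTokenizerFast.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased").eval()

words = ["Invoice", "Total", "42.00"]                            # made-up OCR output
boxes = [(10, 10, 60, 25), (10, 40, 50, 55), (70, 40, 110, 55)]  # x0, y0, x1, y1

enc = tok(words, is_split_into_words=True, return_tensors="pt")
with torch.no_grad():
    piece_vecs = bert(**enc).last_hidden_state[0]                # (num_pieces, 768)

H, W, D = 128, 128, piece_vecs.shape[-1]
grid = torch.zeros(D, H, W)                                      # the BERTgrid tensor
word_ids = enc.word_ids()                                        # piece -> word index
for w_idx, (x0, y0, x1, y1) in enumerate(boxes):
    # simplification: paint one vector (the word's first piece) per box;
    # BERTgrid places every word piece at its own position
    first_piece = word_ids.index(w_idx)
    grid[:, y0:y1, x0:x1] = piece_vecs[first_piece].view(D, 1, 1)
```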

A Bayesian Decision Tree Algorithm

Title A Bayesian Decision Tree Algorithm
Authors Giuseppe Nuti, Lluís Antoni Jiménez Rugama, Andreea-Ingrid Cross
Abstract Bayesian Decision Trees are known for their probabilistic interpretability. However, their construction can sometimes be costly. In this article, we present a general Bayesian Decision Tree algorithm applicable to both regression and classification problems. The algorithm does not apply Markov Chain Monte Carlo and does not require a pruning step. While it is possible to construct a weighted probability tree space, we find that one particular tree, the greedy-modal tree (GMT), explains most of the information contained in the numerical examples. This approach seems to perform similarly to Random Forests.
Tasks
Published 2019-01-10
URL http://arxiv.org/abs/1901.03214v2
PDF http://arxiv.org/pdf/1901.03214v2.pdf
PWC https://paperswithcode.com/paper/a-bayesian-decision-tree-algorithm
Repo https://github.com/UBS-IB/bayesian_tree
Framework none
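To make the greedy-modal flavour concrete, the sketch below grows a binary-classification tree by comparing Beta-Bernoulli marginal likelihoods of "stay a leaf" against the best single split, recursing on the modal choice. The prior and stopping rule are our simplifications, not the authors' exact algorithm.

```python
import numpy as np
from scipy.special import betaln

def leaf_log_marginal(y):
    # Beta-Bernoulli marginal likelihood of binary labels y under a Beta(1,1) prior
    n1 = y.sum(); n0 = len(y) - n1
    return betaln(1 + n1, 1 + n0) - betaln(1, 1)

def best_split(X, y):
    best = (None, None, -np.inf)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:                 # candidate thresholds
            left = X[:, j] <= t
            score = leaf_log_marginal(y[left]) + leaf_log_marginal(y[~left])
            if score > best[2]:
                best = (j, t, score)
    return best

def grow(X, y, depth=0, max_depth=4):
    stay = leaf_log_marginal(y)
    j, t, split = best_split(X, y)
    if depth >= max_depth or j is None or split <= stay:  # leaf is the modal choice
        return {"leaf": (1 + y.sum()) / (2 + len(y))}     # posterior mean of p(y=1)
    left = X[:, j] <= t
    return {"feat": j, "thr": t,
            "l": grow(X[left], y[left], depth + 1, max_depth),
            "r": grow(X[~left], y[~left], depth + 1, max_depth)}

X = np.random.rand(200, 2)
y = (X[:, 0] > 0.5).astype(int)
tree = grow(X, y)
```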

Modular Block-diagonal Curvature Approximations for Feedforward Architectures

Title Modular Block-diagonal Curvature Approximations for Feedforward Architectures
Authors Felix Dangel, Stefan Harmeling, Philipp Hennig
Abstract We propose a modular extension of backpropagation for the computation of block-diagonal approximations to various curvature matrices of the training objective (in particular, the Hessian, the generalized Gauss-Newton matrix, and the positive-curvature Hessian). The approach splits the otherwise tedious manual derivation of these matrices into local modules, and is easy to integrate into existing machine learning libraries. Moreover, we develop a compact notation derived from matrix differential calculus. We outline different strategies applicable to our method. They subsume recently proposed block-diagonal approximations as special cases, and are extended to convolutional neural networks in this work.
Tasks
Published 2019-02-05
URL https://arxiv.org/abs/1902.01813v3
PDF https://arxiv.org/pdf/1902.01813v3.pdf
PWC https://paperswithcode.com/paper/a-modular-approach-to-block-diagonal-hessian
Repo https://github.com/f-dangel/hbp
Framework pytorch
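For intuition about what "block-diagonal curvature" refers to, the brute-force sketch below extracts one Hessian block per parameter tensor of a toy MLP via double backprop. The paper's modular scheme propagates curvature layer by layer and is far cheaper; this only computes the reference quantity.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Sequential(torch.nn.Linear(4, 3), torch.nn.Sigmoid(),
                            torch.nn.Linear(3, 2))
x, y = torch.randn(8, 4), torch.randint(0, 2, (8,))
loss = torch.nn.functional.cross_entropy(model(x), y)

blocks = {}
for name, p in model.named_parameters():
    # differentiable gradient of the loss w.r.t. this parameter tensor
    g, = torch.autograd.grad(loss, p, create_graph=True)
    g = g.reshape(-1)
    # one more backward pass per gradient entry gives one Hessian row
    rows = [torch.autograd.grad(g[i], p, retain_graph=True)[0].reshape(-1)
            for i in range(g.numel())]
    blocks[name] = torch.stack(rows)       # (numel, numel) diagonal block
    print(name, blocks[name].shape)
```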

Depth-Aware Video Frame Interpolation

Title Depth-Aware Video Frame Interpolation
Authors Wenbo Bao, Wei-Sheng Lai, Chao Ma, Xiaoyun Zhang, Zhiyong Gao, Ming-Hsuan Yang
Abstract Video frame interpolation aims to synthesize nonexistent frames between the original frames. While significant advances have been made by recent deep convolutional neural networks, the quality of interpolation is often degraded by large object motion or occlusion. In this work, we propose a video frame interpolation method that explicitly detects occlusion by exploiting depth information. Specifically, we develop a depth-aware flow projection layer that synthesizes intermediate flows which preferentially sample closer objects over farther ones. In addition, we learn hierarchical features to gather contextual information from neighboring pixels. The proposed model then warps the input frames, depth maps, and contextual features based on the optical flow and local interpolation kernels to synthesize the output frame. Our model is compact, efficient, and fully differentiable. Quantitative and qualitative results demonstrate that the proposed model performs favorably against state-of-the-art frame interpolation methods on a wide variety of datasets.
Tasks Optical Flow Estimation, Video Frame Interpolation
Published 2019-04-01
URL http://arxiv.org/abs/1904.00830v1
PDF http://arxiv.org/pdf/1904.00830v1.pdf
PWC https://paperswithcode.com/paper/depth-aware-video-frame-interpolation
Repo https://github.com/baowenbo/DAIN
Framework pytorch
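The depth-aware intuition, that nearer pixels should win when flows compete, can be caricatured by blending two warped frames with inverse-depth weights, as in the hedged toy below. The real model does this inside a flow projection layer; all inputs here are random placeholders.

```python
import torch
import torch.nn.functional as F

def backwarp(img, flow):
    # img: (B, C, H, W); flow: (B, 2, H, W) in pixel units, (x, y) order
    B, _, H, W = img.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    base = torch.stack((xs, ys)).float().unsqueeze(0)      # (1, 2, H, W)
    coords = base + flow
    coords[:, 0] = 2 * coords[:, 0] / (W - 1) - 1          # normalise to [-1, 1]
    coords[:, 1] = 2 * coords[:, 1] / (H - 1) - 1
    return F.grid_sample(img, coords.permute(0, 2, 3, 1), align_corners=True)

B, C, H, W = 1, 3, 64, 64
frame0, frame1 = torch.rand(B, C, H, W), torch.rand(B, C, H, W)
flow_t0, flow_t1 = torch.zeros(B, 2, H, W), torch.zeros(B, 2, H, W)
depth0, depth1 = torch.rand(B, 1, H, W) + 0.1, torch.rand(B, 1, H, W) + 0.1

w0, w1 = 1.0 / depth0, 1.0 / depth1        # smaller depth (closer) => bigger weight
warped0, warped1 = backwarp(frame0, flow_t0), backwarp(frame1, flow_t1)
frame_t = (w0 * warped0 + w1 * warped1) / (w0 + w1)
```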

Neutron: An Implementation of the Transformer Translation Model and its Variants

Title Neutron: An Implementation of the Transformer Translation Model and its Variants
Authors Hongfei Xu, Qiuhui Liu
Abstract The Transformer translation model is easier to parallelize and delivers better performance than recurrent seq2seq models, which makes it popular in both industry and the research community. In this work we present Neutron, an implementation of the Transformer model and several of its variants from recent research. It is highly optimized, easy to modify, and provides comparable performance with interesting features while remaining readable.
Tasks
Published 2019-03-18
URL https://arxiv.org/abs/1903.07402v2
PDF https://arxiv.org/pdf/1903.07402v2.pdf
PWC https://paperswithcode.com/paper/neutron-an-implementation-of-the-transformer
Repo https://github.com/anoidgit/transformer
Framework pytorch
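A minimal Transformer translation model of the kind Neutron implements can be assembled from torch.nn primitives, as sketched below. Vocabulary sizes and dimensions are placeholders, and positional encodings are omitted for brevity; Neutron's actual classes live in the repo above.

```python
import torch
import torch.nn as nn

class TinyTransformerMT(nn.Module):
    def __init__(self, src_vocab=1000, tgt_vocab=1000, d=128):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, d)   # positional encodings omitted
        self.tgt_emb = nn.Embedding(tgt_vocab, d)
        self.core = nn.Transformer(d_model=d, nhead=4,
                                   num_encoder_layers=2, num_decoder_layers=2,
                                   dim_feedforward=256, batch_first=True)
        self.out = nn.Linear(d, tgt_vocab)

    def forward(self, src, tgt):
        # causal mask so each target position only attends to earlier ones
        mask = self.core.generate_square_subsequent_mask(tgt.size(1))
        h = self.core(self.src_emb(src), self.tgt_emb(tgt), tgt_mask=mask)
        return self.out(h)

model = TinyTransformerMT()
src = torch.randint(0, 1000, (2, 7))
tgt = torch.randint(0, 1000, (2, 5))
logits = model(src, tgt)          # (2, 5, 1000) next-token scores
```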

Introducing an Explicit Symplectic Integration Scheme for Riemannian Manifold Hamiltonian Monte Carlo

Title Introducing an Explicit Symplectic Integration Scheme for Riemannian Manifold Hamiltonian Monte Carlo
Authors Adam D. Cobb, Atılım Güneş Baydin, Andrew Markham, Stephen J. Roberts
Abstract We introduce a recent symplectic integration scheme derived for solving physically motivated systems with non-separable Hamiltonians. We show its relevance to Riemannian manifold Hamiltonian Monte Carlo (RMHMC) and provide an alternative to the currently used generalised leapfrog symplectic integrator, which relies on solving multiple fixed point iterations to convergence. Via this approach, we are able to reduce the number of higher-order derivative calculations per leapfrog step. We explore the implications of this integrator and demonstrate its efficacy in reducing the computational burden of RMHMC. Our code is provided in a new open-source Python package, hamiltorch.
Tasks Bayesian Inference
Published 2019-10-14
URL https://arxiv.org/abs/1910.06243v1
PDF https://arxiv.org/pdf/1910.06243v1.pdf
PWC https://paperswithcode.com/paper/introducing-an-explicit-symplectic
Repo https://github.com/AdamCobb/hamiltorch
Framework pytorch
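For context, the sketch below implements the plain leapfrog step used when the Hamiltonian is separable. In RMHMC the mass matrix depends on position, the generalised leapfrog becomes implicit and needs fixed-point iterations per step; the paper's explicit integrator removes those. See hamiltorch for the authors' implementation.

```python
import torch

def leapfrog(q, p, log_prob, step_size=0.1, n_steps=10):
    # standard leapfrog for H(q, p) = -log_prob(q) + 0.5 * |p|^2
    q = q.clone().requires_grad_(True)
    def grad_U(q):
        return -torch.autograd.grad(log_prob(q), q)[0]
    p = p - 0.5 * step_size * grad_U(q)            # half momentum step
    for _ in range(n_steps - 1):
        q = (q + step_size * p).detach().requires_grad_(True)
        p = p - step_size * grad_U(q)              # full steps in between
    q = (q + step_size * p).detach().requires_grad_(True)
    p = p - 0.5 * step_size * grad_U(q)            # final half momentum step
    return q.detach(), p

log_prob = lambda q: -0.5 * (q ** 2).sum()         # standard normal target
q0, p0 = torch.randn(2), torch.randn(2)
q1, p1 = leapfrog(q0, p0, log_prob)
```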

Recurrent Neural Processes

Title Recurrent Neural Processes
Authors Timon Willi, Jonathan Masci, Jürgen Schmidhuber, Christian Osendorfer
Abstract We extend Neural Processes (NPs) to sequential data through Recurrent NPs or RNPs, a family of conditional state space models. RNPs model the state space with Neural Processes. Given time series observed on fast real-world time scales but containing slow long-term variabilities, RNPs may derive appropriate slow latent time scales. They do so in an efficient manner by establishing conditional independence among subsequences of the time series. Our theoretically grounded framework for stochastic processes expands the applicability of NPs while retaining their benefits of flexibility, uncertainty estimation, and favorable runtime with respect to Gaussian Processes (GPs). We demonstrate that state spaces learned by RNPs benefit predictive performance on real-world time-series data and nonlinear system identification, even in the case of limited data availability.
Tasks Gaussian Processes, Time Series
Published 2019-06-13
URL https://arxiv.org/abs/1906.05915v2
PDF https://arxiv.org/pdf/1906.05915v2.pdf
PWC https://paperswithcode.com/paper/recurrent-neural-processes
Repo https://github.com/KurochkinAlexey/Recurrent-neural-process
Framework pytorch
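The Neural Process machinery that RNPs extend with recurrence can be sketched as a conditional NP: encode context pairs, aggregate by mean, and decode a predictive Gaussian at the target inputs. Sizes below are placeholders; RNPs additionally run the latent state through an RNN over subsequences.

```python
import torch
import torch.nn as nn

enc = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 64))
dec = nn.Sequential(nn.Linear(64 + 1, 64), nn.ReLU(), nn.Linear(64, 2))

def cnp_predict(xc, yc, xt):
    # encode each context (x, y) pair and aggregate into one representation
    r = enc(torch.cat([xc, yc], dim=-1)).mean(dim=0, keepdim=True)   # (1, 64)
    # decode a Gaussian at every target input
    h = dec(torch.cat([r.expand(xt.shape[0], -1), xt], dim=-1))
    mu, pre_sigma = h.chunk(2, dim=-1)
    return mu, torch.nn.functional.softplus(pre_sigma)               # mean, std

xc = torch.rand(10, 1)
yc = torch.sin(6 * xc)                        # toy 1-D regression context
xt = torch.linspace(0, 1, 50).unsqueeze(-1)
mu, sigma = cnp_predict(xc, yc, xt)
```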

Kernelized Wasserstein Natural Gradient

Title Kernelized Wasserstein Natural Gradient
Authors Michael Arbel, Arthur Gretton, Wuchen Li, Guido Montufar
Abstract Many machine learning problems can be expressed as the optimization of some cost functional over a parametric family of probability distributions. It is often beneficial to solve such optimization problems using natural gradient methods. These methods are invariant to the parametrization of the family, and thus can yield more effective optimization. Unfortunately, computing the natural gradient is challenging, as it requires inverting a high-dimensional matrix at each iteration. We propose a general framework to approximate the natural gradient for the Wasserstein metric, by leveraging a dual formulation of the metric restricted to a Reproducing Kernel Hilbert Space. Our approach leads to an estimator of the gradient direction that can trade off accuracy and computational cost, with theoretical guarantees. We verify its accuracy on simple examples, and empirically show the advantage of using such an estimator in classification tasks on CIFAR-10 and CIFAR-100.
Tasks
Published 2019-10-21
URL https://arxiv.org/abs/1910.09652v4
PDF https://arxiv.org/pdf/1910.09652v4.pdf
PWC https://paperswithcode.com/paper/kernelized-wasserstein-natural-gradient-1
Repo https://github.com/MichaelArbel/KWNG
Framework pytorch
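Generically, a natural-gradient step preconditions the Euclidean gradient by the inverse of a metric matrix, as in the toy sketch below with an explicit SPD metric. The paper's kernelized estimator avoids ever forming or inverting that matrix for the Wasserstein metric; our `metric_fn` here is purely illustrative.

```python
import torch

def natural_gradient_step(params, loss_fn, metric_fn, lr=0.1):
    loss = loss_fn(params)
    g, = torch.autograd.grad(loss, params)         # Euclidean gradient
    G = metric_fn(params.detach())                 # (d, d) SPD metric matrix
    v = torch.linalg.solve(G, g)                   # natural direction G^{-1} g
    return (params - lr * v).detach().requires_grad_(True)

d = 3
params = torch.randn(d, requires_grad=True)
loss_fn = lambda p: ((p - 1.0) ** 2).sum()                # toy objective
metric_fn = lambda p: torch.eye(d) + torch.outer(p, p)    # toy SPD metric
for _ in range(50):
    params = natural_gradient_step(params, loss_fn, metric_fn)
```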

Adding Interpretable Attention to Neural Translation Models Improves Word Alignment

Title Adding Interpretable Attention to Neural Translation Models Improves Word Alignment
Authors Thomas Zenkel, Joern Wuebker, John DeNero
Abstract Multi-layer models with multiple attention heads per layer provide superior translation quality compared to simpler and shallower models, but determining what source context is most relevant to each target word is more challenging as a result. Therefore, deriving high-accuracy word alignments from the activations of a state-of-the-art neural machine translation model is an open challenge. We propose a simple extension to the Transformer architecture that makes use of its hidden representations and is restricted to attending solely to encoder information when predicting the next word. It can be trained on bilingual data without word-alignment information. We further introduce a novel alignment inference procedure which applies stochastic gradient descent to directly optimize the attention activations towards a given target word. The resulting alignments dramatically outperform the naive approach to interpreting Transformer attention activations, and are comparable to GIZA++ on two publicly available datasets.
Tasks Machine Translation, Word Alignment
Published 2019-01-31
URL http://arxiv.org/abs/1901.11359v1
PDF http://arxiv.org/pdf/1901.11359v1.pdf
PWC https://paperswithcode.com/paper/adding-interpretable-attention-to-neural
Repo https://github.com/shuoyangd/meerkat
Framework pytorch
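The added alignment layer can be pictured as one attention head that queries decoder states against encoder states only and predicts the next target word; its attention weights are then read off as soft alignments. The sketch below is our hedged rendering, with random stand-in states in place of a trained translation model.

```python
import torch
import torch.nn as nn

d, tgt_vocab = 64, 1000
attn = nn.MultiheadAttention(embed_dim=d, num_heads=1, batch_first=True)
out_proj = nn.Linear(d, tgt_vocab)

enc_states = torch.randn(2, 9, d)     # would come from the frozen NMT encoder
dec_states = torch.randn(2, 5, d)     # one hidden state per target position

# attend solely to encoder states; weights double as soft word alignments
ctx, align = attn(dec_states, enc_states, enc_states, need_weights=True)
next_word_logits = out_proj(ctx)      # train with cross-entropy on the targets
print(align.shape)                    # (2, 5, 9): target-to-source attention
```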

Generalization through Simulation: Integrating Simulated and Real Data into Deep Reinforcement Learning for Vision-Based Autonomous Flight

Title Generalization through Simulation: Integrating Simulated and Real Data into Deep Reinforcement Learning for Vision-Based Autonomous Flight
Authors Katie Kang, Suneel Belkhale, Gregory Kahn, Pieter Abbeel, Sergey Levine
Abstract Deep reinforcement learning provides a promising approach for vision-based control of real-world robots. However, the generalization of such models depends critically on the quantity and variety of data available for training. This data can be difficult to obtain for some types of robotic systems, such as fragile, small-scale quadrotors. Simulated rendering and physics can provide for much larger datasets, but such data is inherently of lower quality: many of the phenomena that make the real-world autonomous flight problem challenging, such as complex physics and air currents, are modeled poorly or not at all, and the systematic differences between simulation and the real world are typically impossible to eliminate. In this work, we investigate how data from both simulation and the real world can be combined in a hybrid deep reinforcement learning algorithm. Our method uses real-world data to learn about the dynamics of the system, and simulated data to learn a generalizable perception system that can enable the robot to avoid collisions using only a monocular camera. We demonstrate our approach on a real-world nano aerial vehicle collision avoidance task, showing that with only an hour of real-world data, the quadrotor can avoid collisions in new environments with various lighting conditions and geometry. Code, instructions for building the aerial vehicles, and videos of the experiments can be found at github.com/gkahn13/GtS
Tasks
Published 2019-02-11
URL http://arxiv.org/abs/1902.03701v1
PDF http://arxiv.org/pdf/1902.03701v1.pdf
PWC https://paperswithcode.com/paper/generalization-through-simulation-integrating
Repo https://github.com/gkahn13/GtS
Framework none
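The division of labour the abstract describes, perception from simulation and control from scarce real data, might look roughly like the sketch below: a convolutional encoder pre-trained on simulated images is frozen, and only a small action head is fit on real flights. All modules and shapes are invented placeholders, not the authors' architecture.

```python
import torch
import torch.nn as nn

# perception encoder: in phase 1, train this (plus a temporary head)
# on abundant simulated images with collision labels
perception = nn.Sequential(nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
                           nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
                           nn.AdaptiveAvgPool2d(1), nn.Flatten())
# action head: in phase 2, fit only this on the hour of real-world data
policy_head = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1))

for p in perception.parameters():          # freeze the sim-trained perception
    p.requires_grad_(False)

real_images = torch.rand(4, 3, 64, 64)
steering = policy_head(perception(real_images))   # (4, 1) action scores
```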

Learning Shared Semantic Space with Correlation Alignment for Cross-modal Event Retrieval

Title Learning Shared Semantic Space with Correlation Alignment for Cross-modal Event Retrieval
Authors Zhenguo Yang, Zehang Lin, Peipei Kang, Jianming Lv, Qing Li, Wenyin Liu
Abstract In this paper, we propose to learn a shared semantic space with correlation alignment (${S}^{3}CA$) for multimodal data representations, which aligns the nonlinear correlations of multimodal data distributions in deep neural networks designed for heterogeneous data. In the context of cross-modal (event) retrieval, we design a neural network with convolutional and fully-connected layers to extract features for images, including images from Flickr-like social media. Simultaneously, we exploit a fully-connected neural network to extract semantic features for texts, including news articles from news media. In particular, the nonlinear correlations of layer activations in the two neural networks are aligned with correlation alignment during the joint training of the networks. Furthermore, we project the multimodal data into a shared semantic space for cross-modal (event) retrieval, where the distances between heterogeneous data samples can be measured directly. In addition, we contribute a Wiki-Flickr Event dataset in which, unlike existing paired datasets, the multimodal samples do not describe each other in pairs; instead, they all describe semantic events. Extensive experiments on both paired and unpaired datasets demonstrate the effectiveness of ${S}^{3}CA$, which outperforms the state-of-the-art methods.
Tasks
Published 2019-01-14
URL https://arxiv.org/abs/1901.04268v3
PDF https://arxiv.org/pdf/1901.04268v3.pdf
PWC https://paperswithcode.com/paper/learning-shared-semantic-space-with
Repo https://github.com/zhengyang5/Wiki-Flickr-Event-Dataset
Framework none
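The correlation-alignment ingredient matches second-order statistics of the two networks' activations, in the spirit of the CORAL loss. Below is a hedged sketch of such a term, to be added to the retrieval training objective; the activation tensors are random stand-ins.

```python
import torch

def coral_loss(h_img, h_txt):
    # h_*: (batch, d) layer activations from the image and text networks
    def cov(h):
        hc = h - h.mean(dim=0, keepdim=True)
        return hc.t() @ hc / (h.shape[0] - 1)
    d = h_img.shape[1]
    # squared Frobenius distance between the two covariance matrices
    return ((cov(h_img) - cov(h_txt)) ** 2).sum() / (4 * d * d)

h_img, h_txt = torch.randn(32, 128), torch.randn(32, 128)
loss = coral_loss(h_img, h_txt)   # add to the joint training objective
```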

Patchy Image Structure Classification Using Multi-Orientation Region Transform

Title Patchy Image Structure Classification Using Multi-Orientation Region Transform
Authors Xiaohan Yu, Yang Zhao, Yongsheng Gao, Shengwu Xiong, Xiaohui Yuan
Abstract Exterior contour and interior structure are both vital features for classifying objects. However, most existing methods consider exterior contour and internal structure features separately, and thus fail when classifying patchy image structures that have similar contours and flexible structures. To address these limitations, this paper proposes a novel Multi-Orientation Region Transform (MORT), which can effectively characterize both contour and structure features simultaneously, for patchy image structure classification. MORT is performed over multiple orientation regions at multiple scales to effectively integrate patchy features, and thus enables a better description of the shape in a coarse-to-fine manner. Moreover, the proposed MORT can be combined with deep convolutional neural network techniques to further enhance classification accuracy. Very encouraging experimental results on the challenging ultra-fine-grained cultivar recognition, insect wing recognition, and large-variation butterfly recognition tasks demonstrate the effectiveness and superiority of the proposed MORT over state-of-the-art methods in classifying patchy image structures. Our code and three patchy image structure datasets are available at: https://github.com/XiaohanYu-GU/MReT2019.
Tasks
Published 2019-12-02
URL https://arxiv.org/abs/1912.00622v1
PDF https://arxiv.org/pdf/1912.00622v1.pdf
PWC https://paperswithcode.com/paper/patchy-image-structure-classification-using
Repo https://github.com/XiaohanYu-GU/MReT2019
Framework none
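Very loosely, a multi-orientation region descriptor can be caricatured by rotating the image to several orientations and pooling intensity over strip regions at each one, as below. This only gestures at the idea; the paper defines the transform precisely, and the angles and strip counts here are arbitrary.

```python
import torch
import torchvision.transforms.functional as TF

def orientation_region_features(img, angles=(0, 45, 90, 135), strips=8):
    # img: (1, H, W) grayscale tensor
    feats = []
    for a in angles:
        r = TF.rotate(img, a)                 # view the shape at orientation a
        bands = r.chunk(strips, dim=1)        # strip regions along the height
        feats += [b.mean() for b in bands]    # pool intensity per region
    return torch.stack(feats)                 # (len(angles) * strips,)

img = torch.rand(1, 64, 64)
desc = orientation_region_features(img)       # crude orientation-region descriptor
```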

Variationally Inferred Sampling Through a Refined Bound for Probabilistic Programs

Title Variationally Inferred Sampling Through a Refined Bound for Probabilistic Programs
Authors Victor Gallego, David Rios Insua
Abstract We introduce a framework that boosts the efficiency of Bayesian inference in probabilistic programs by embedding a sampler inside a variational posterior approximation, which we call the refined variational approximation. Its strength lies both in its ease of implementation and in the automatic tuning of the sampler parameters to speed up mixing time, using automatic differentiation. Several strategies to approximate the \emph{evidence lower bound} (ELBO) computation are introduced. We demonstrate its efficient performance by solving an influence diagram in a high-dimensional space using a conditional variational autoencoder (cVAE) as a deep Bayes classifier, an unconditional VAE on density estimation tasks, and state-space models for time-series data.
Tasks Bayesian Inference, Density Estimation, Time Series
Published 2019-08-26
URL https://arxiv.org/abs/1908.09744v4
PDF https://arxiv.org/pdf/1908.09744v4.pdf
PWC https://paperswithcode.com/paper/variationally-inferred-sampling-through-a
Repo https://github.com/vicgalle/vis
Framework pytorch
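The embedding idea can be sketched as: sample from a simple Gaussian q0, refine the samples with a few differentiable SGLD steps towards the unnormalised posterior, and tune both q0 and the step size by gradient descent through the chain. The surrogate objective below ignores the entropy term, unlike the paper's refined ELBO, and the target is a toy of ours.

```python
import torch

log_post = lambda z: -0.5 * ((z - 3.0) ** 2).sum(-1)    # toy unnormalised posterior

mu = torch.zeros(1, requires_grad=True)                 # q0 parameters
log_std = torch.zeros(1, requires_grad=True)
log_eps = torch.tensor(-3.0, requires_grad=True)        # learnable SGLD step size
opt = torch.optim.Adam([mu, log_std, log_eps], lr=1e-2)

for _ in range(200):
    opt.zero_grad()
    z = mu + log_std.exp() * torch.randn(64, 1)         # reparameterised q0 draw
    eps = log_eps.exp()
    for _ in range(5):                                  # differentiable SGLD refinement
        grad = torch.autograd.grad(log_post(z).sum(), z, create_graph=True)[0]
        z = z + eps * grad + (2 * eps).sqrt() * torch.randn_like(z)
    loss = -log_post(z).mean()        # crude surrogate; the paper's ELBO is more careful
    loss.backward()
    opt.step()
```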