Paper Group AWR 84
Scalable Extreme Deconvolution. F3Net: Fusion, Feedback and Focus for Salient Object Detection. BERTgrid: Contextualized Embedding for 2D Document Representation and Understanding. A Bayesian Decision Tree Algorithm. Modular Block-diagonal Curvature Approximations for Feedforward Architectures. Depth-Aware Video Frame Interpolation. Neutron: An Imp …
Scalable Extreme Deconvolution
Title | Scalable Extreme Deconvolution |
Authors | James A. Ritchie, Iain Murray |
Abstract | The Extreme Deconvolution method fits a probability density to a dataset where each observation has Gaussian noise added with a known sample-specific covariance, originally intended for use with astronomical datasets. The existing fitting method is batch EM, which would not normally be applied to large datasets such as the Gaia catalog containing noisy observations of a billion stars. We propose two minibatch variants of extreme deconvolution, based on an online variation of the EM algorithm, and direct gradient-based optimisation of the log-likelihood, both of which can run on GPUs. We demonstrate that these methods provide faster fitting, whilst being able to scale to much larger models for use with larger datasets. |
Tasks | |
Published | 2019-11-26 |
URL | https://arxiv.org/abs/1911.11663v1 |
https://arxiv.org/pdf/1911.11663v1.pdf | |
PWC | https://paperswithcode.com/paper/scalable-extreme-deconvolution |
Repo | https://github.com/bayesiains/scalable_xd |
Framework | pytorch |
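The objective described in the abstract above can be illustrated with a small sketch. Adding independent Gaussian noise with known variance to a Gaussian mixture component just adds the variances, so each observation is scored under a mixture whose component variances are inflated by that observation's own noise. The code below is a 1-D simplification written for this listing, not the authors' implementation; the function name `xd_log_likelihood` and its parameters are hypothetical.

```python
import math

def xd_log_likelihood(xs, noise_vars, weights, means, variances):
    """Log-likelihood of 1-D noisy observations under a Gaussian mixture.

    Hypothetical helper for illustration: observation x_i has known noise
    variance s_i, so component k contributes N(x_i | mu_k, v_k + s_i).
    """
    total = 0.0
    for x, s in zip(xs, noise_vars):
        log_terms = []
        for w, mu, v in zip(weights, means, variances):
            var = v + s  # convolving two Gaussians adds their variances
            log_pdf = -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)
            log_terms.append(math.log(w) + log_pdf)
        m = max(log_terms)  # log-sum-exp for numerical stability
        total += m + math.log(sum(math.exp(t - m) for t in log_terms))
    return total
```

A minibatch method of the kind the abstract mentions would evaluate this sum over a subset of the observations at each step; how the batch is used for the update (online EM or a gradient step) is outside the scope of this sketch.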
F3Net: Fusion, Feedback and Focus for Salient Object Detection
Title | F3Net: Fusion, Feedback and Focus for Salient Object Detection |
Authors | Jun Wei, Shuhui Wang, Qingming Huang |
Abstract | Most existing salient object detection models have achieved great progress by aggregating multi-level features extracted from convolutional neural networks. However, because of the different receptive fields of different convolutional layers, there exist big differences between features generated by these layers. Common feature fusion strategies (addition or concatenation) ignore these differences and may cause suboptimal solutions. In this paper, we propose the F3Net to solve the above problem, which mainly consists of a cross feature module (CFM) and a cascaded feedback decoder (CFD) trained by minimizing a new pixel position aware loss (PPA). Specifically, CFM aims to selectively aggregate multi-level features. Different from addition and concatenation, CFM adaptively selects complementary components from input features before fusion, which can effectively avoid introducing too much redundant information that may destroy the original features. Besides, CFD adopts a multi-stage feedback mechanism, where features close to supervision will be introduced to the output of previous layers to supplement them and eliminate the differences between features. These refined features will go through multiple similar iterations before generating the final saliency maps. Furthermore, different from binary cross entropy, the proposed PPA loss doesn’t treat pixels equally, which can synthesize the local structure information of a pixel to guide the network to focus more on local details. Hard pixels from boundaries or error-prone parts will be given more attention to emphasize their importance. F3Net is able to segment salient object regions accurately and provide clear local details. Comprehensive experiments on five benchmark datasets demonstrate that F3Net outperforms state-of-the-art approaches on six evaluation metrics. |
Tasks | Object Detection, Salient Object Detection |
Published | 2019-11-26 |
URL | https://arxiv.org/abs/1911.11445v1 |
https://arxiv.org/pdf/1911.11445v1.pdf | |
PWC | https://paperswithcode.com/paper/f3net-fusion-feedback-and-focus-for-salient |
Repo | https://github.com/weijun88/F3Net |
Framework | pytorch |
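The idea of a loss that does not treat pixels equally can be sketched as a weighted binary cross entropy in which a pixel's weight grows with how much it differs from its neighbourhood in the ground-truth mask, so boundary pixels count more. This is only an illustration of the weighting idea under assumed choices: the window radius `r`, the scale `gamma`, and the function names are hypothetical and are not taken from the abstract.

```python
import math

def local_mean(mask, r):
    """Mean of the mask over a (2r+1) x (2r+1) window, clipped at the borders."""
    h, w = len(mask), len(mask[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            vals = [mask[a][b]
                    for a in range(max(0, i - r), min(h, i + r + 1))
                    for b in range(max(0, j - r), min(w, j + r + 1))]
            out[i][j] = sum(vals) / len(vals)
    return out

def weighted_bce(pred, mask, r=1, gamma=5.0, eps=1e-7):
    """Weighted BCE; r and gamma are assumed values chosen for illustration."""
    avg = local_mean(mask, r)
    num = den = 0.0
    for i, row in enumerate(mask):
        for j, y in enumerate(row):
            # Pixels that differ from their surroundings get a larger weight.
            wgt = 1.0 + gamma * abs(avg[i][j] - y)
            p = min(max(pred[i][j], eps), 1.0 - eps)  # avoid log(0)
            bce = -(y * math.log(p) + (1 - y) * math.log(1 - p))
            num += wgt * bce
            den += wgt
    return num / den
```

With a uniform mask every weight is 1 and the function reduces to the ordinary mean binary cross entropy; on a mask with an edge, the same mistake costs more at the edge than in the interior.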
BERTgrid: Contextualized Embedding for 2D Document Representation and Understanding
Title | BERTgrid: Contextualized Embedding for 2D Document Representation and Understanding |
Authors | Timo I. Denk, Christian Reisswig |
Abstract | For understanding generic documents, information like font sizes, column layout, and generally the positioning of words may carry semantic information that is crucial for solving a downstream document intelligence task. Our novel BERTgrid, which is based on Chargrid by Katti et al. (2018), represents a document as a grid of contextualized word piece embedding vectors, thereby making its spatial structure and semantics accessible to the processing neural network. The contextualized embedding vectors are retrieved from a BERT language model. We use BERTgrid in combination with a fully convolutional network on a semantic instance segmentation task for extracting fields from invoices. We demonstrate its performance on tabulated line item and document header field extraction. |
Tasks | Instance Segmentation, Language Modelling, Semantic Segmentation |
Published | 2019-09-11 |
URL | https://arxiv.org/abs/1909.04948v2 |
https://arxiv.org/pdf/1909.04948v2.pdf | |
PWC | https://paperswithcode.com/paper/bertgrid-contextualized-embedding-for-2d |
Repo | https://github.com/sam-ai/BertGrid |
Framework | none |
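The grid representation described in the abstract above can be pictured with a toy sketch: every cell covered by a word piece's bounding box holds that word piece's embedding vector, and uncovered cells hold zeros. The code is an assumption-laden illustration, not the authors' pipeline; `build_grid`, the box format `(x0, y0, x1, y1)`, and the toy vectors are hypothetical, and in the paper the vectors would come from a BERT language model.

```python
def build_grid(height, width, tokens, dim):
    """Return a height x width grid of dim-dimensional vectors.

    tokens: list of ((x0, y0, x1, y1), embedding) pairs, where the box is
    given in grid cells with exclusive upper bounds (assumed format).
    """
    grid = [[[0.0] * dim for _ in range(width)] for _ in range(height)]
    for (x0, y0, x1, y1), emb in tokens:
        for y in range(y0, y1):
            for x in range(x0, x1):
                grid[y][x] = list(emb)  # copy so cells do not share state
    return grid

# Toy usage: one token occupying the first two cells of the top row.
grid = build_grid(2, 3, [((0, 0, 2, 1), [1.0, 2.0])], dim=2)
```

The resulting array has the shape of an image with `dim` channels, which is what lets a fully convolutional network consume it.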
A Bayesian Decision Tree Algorithm
Title | A Bayesian Decision Tree Algorithm |
Authors | Giuseppe Nuti, Lluís Antoni Jiménez Rugama, Andreea-Ingrid Cross |
Abstract | Bayesian Decision Trees are known for their probabilistic interpretability. However, their construction can sometimes be costly. In this article we present a general Bayesian Decision Tree algorithm applicable to both regression and classification problems. The algorithm does not apply Markov Chain Monte Carlo and does not require a pruning step. While it is possible to construct a weighted probability tree space we find that one particular tree, the greedy-modal tree (GMT), explains most of the information contained in the numerical examples. This approach seems to perform similarly to Random Forests. |
Tasks | |
Published | 2019-01-10 |
URL | http://arxiv.org/abs/1901.03214v2 |
http://arxiv.org/pdf/1901.03214v2.pdf | |
PWC | https://paperswithcode.com/paper/a-bayesian-decision-tree-algorithm |
Repo | https://github.com/UBS-IB/bayesian_tree |
Framework | none |
Modular Block-diagonal Curvature Approximations for Feedforward Architectures
Title | Modular Block-diagonal Curvature Approximations for Feedforward Architectures |
Authors | Felix Dangel, Stefan Harmeling, Philipp Hennig |
Abstract | We propose a modular extension of backpropagation for the computation of block-diagonal approximations to various curvature matrices of the training objective (in particular, the Hessian, generalized Gauss-Newton, and positive-curvature Hessian). The approach reduces the otherwise tedious manual derivation of these matrices into local modules, and is easy to integrate into existing machine learning libraries. Moreover, we develop a compact notation derived from matrix differential calculus. We outline different strategies applicable to our method. They subsume recently-proposed block-diagonal approximations as special cases, and are extended to convolutional neural networks in this work. |
Tasks | |
Published | 2019-02-05 |
URL | https://arxiv.org/abs/1902.01813v3 |
https://arxiv.org/pdf/1902.01813v3.pdf | |
PWC | https://paperswithcode.com/paper/a-modular-approach-to-block-diagonal-hessian |
Repo | https://github.com/f-dangel/hbp |
Framework | pytorch |
Depth-Aware Video Frame Interpolation
Title | Depth-Aware Video Frame Interpolation |
Authors | Wenbo Bao, Wei-Sheng Lai, Chao Ma, Xiaoyun Zhang, Zhiyong Gao, Ming-Hsuan Yang |
Abstract | Video frame interpolation aims to synthesize nonexistent frames in-between the original frames. While significant advances have been made from the recent deep convolutional neural networks, the quality of interpolation is often reduced due to large object motion or occlusion. In this work, we propose a video frame interpolation method which explicitly detects the occlusion by exploring the depth information. Specifically, we develop a depth-aware flow projection layer to synthesize intermediate flows that preferably sample closer objects than farther ones. In addition, we learn hierarchical features to gather contextual information from neighboring pixels. The proposed model then warps the input frames, depth maps, and contextual features based on the optical flow and local interpolation kernels for synthesizing the output frame. Our model is compact, efficient, and fully differentiable. Quantitative and qualitative results demonstrate that the proposed model performs favorably against state-of-the-art frame interpolation methods on a wide variety of datasets. |
Tasks | Optical Flow Estimation, Video Frame Interpolation |
Published | 2019-04-01 |
URL | http://arxiv.org/abs/1904.00830v1 |
http://arxiv.org/pdf/1904.00830v1.pdf | |
PWC | https://paperswithcode.com/paper/depth-aware-video-frame-interpolation |
Repo | https://github.com/baowenbo/DAIN |
Framework | pytorch |
Neutron: An Implementation of the Transformer Translation Model and its Variants
Title | Neutron: An Implementation of the Transformer Translation Model and its Variants |
Authors | Hongfei Xu, Qiuhui Liu |
Abstract | The Transformer translation model is easier to parallelize and provides better performance compared to recurrent seq2seq models, which makes it popular in industry and the research community. We implement Neutron in this work, including the Transformer model and several of its variants from recent research. It is highly optimized, easy to modify and provides comparable performance with interesting features while keeping readability. |
Tasks | |
Published | 2019-03-18 |
URL | https://arxiv.org/abs/1903.07402v2 |
https://arxiv.org/pdf/1903.07402v2.pdf | |
PWC | https://paperswithcode.com/paper/neutron-an-implementation-of-the-transformer |
Repo | https://github.com/anoidgit/transformer |
Framework | pytorch |
Introducing an Explicit Symplectic Integration Scheme for Riemannian Manifold Hamiltonian Monte Carlo
Title | Introducing an Explicit Symplectic Integration Scheme for Riemannian Manifold Hamiltonian Monte Carlo |
Authors | Adam D. Cobb, Atılım Güneş Baydin, Andrew Markham, Stephen J. Roberts |
Abstract | We introduce a recent symplectic integration scheme derived for solving physically motivated systems with non-separable Hamiltonians. We show its relevance to Riemannian manifold Hamiltonian Monte Carlo (RMHMC) and provide an alternative to the currently used generalised leapfrog symplectic integrator, which relies on solving multiple fixed point iterations to convergence. Via this approach, we are able to reduce the number of higher-order derivative calculations per leapfrog step. We explore the implications of this integrator and demonstrate its efficacy in reducing the computational burden of RMHMC. Our code is provided in a new open-source Python package, hamiltorch. |
Tasks | Bayesian Inference |
Published | 2019-10-14 |
URL | https://arxiv.org/abs/1910.06243v1 |
https://arxiv.org/pdf/1910.06243v1.pdf | |
PWC | https://paperswithcode.com/paper/introducing-an-explicit-symplectic |
Repo | https://github.com/AdamCobb/hamiltorch |
Framework | pytorch |
Recurrent Neural Processes
Title | Recurrent Neural Processes |
Authors | Timon Willi, Jonathan Masci, Jürgen Schmidhuber, Christian Osendorfer |
Abstract | We extend Neural Processes (NPs) to sequential data through Recurrent NPs or RNPs, a family of conditional state space models. RNPs model the state space with Neural Processes. Given time series observed on fast real-world time scales but containing slow long-term variabilities, RNPs may derive appropriate slow latent time scales. They do so in an efficient manner by establishing conditional independence among subsequences of the time series. Our theoretically grounded framework for stochastic processes expands the applicability of NPs while retaining their benefits of flexibility, uncertainty estimation, and favorable runtime with respect to Gaussian Processes (GPs). We demonstrate that state spaces learned by RNPs benefit predictive performance on real-world time-series data and nonlinear system identification, even in the case of limited data availability. |
Tasks | Gaussian Processes, Time Series |
Published | 2019-06-13 |
URL | https://arxiv.org/abs/1906.05915v2 |
https://arxiv.org/pdf/1906.05915v2.pdf | |
PWC | https://paperswithcode.com/paper/recurrent-neural-processes |
Repo | https://github.com/KurochkinAlexey/Recurrent-neural-process |
Framework | pytorch |
Kernelized Wasserstein Natural Gradient
Title | Kernelized Wasserstein Natural Gradient |
Authors | Michael Arbel, Arthur Gretton, Wuchen Li, Guido Montufar |
Abstract | Many machine learning problems can be expressed as the optimization of some cost functional over a parametric family of probability distributions. It is often beneficial to solve such optimization problems using natural gradient methods. These methods are invariant to the parametrization of the family, and thus can yield more effective optimization. Unfortunately, computing the natural gradient is challenging as it requires inverting a high dimensional matrix at each iteration. We propose a general framework to approximate the natural gradient for the Wasserstein metric, by leveraging a dual formulation of the metric restricted to a Reproducing Kernel Hilbert Space. Our approach leads to an estimator for gradient direction that can trade-off accuracy and computational cost, with theoretical guarantees. We verify its accuracy on simple examples, and show the advantage of using such an estimator in classification tasks on Cifar10 and Cifar100 empirically. |
Tasks | |
Published | 2019-10-21 |
URL | https://arxiv.org/abs/1910.09652v4 |
https://arxiv.org/pdf/1910.09652v4.pdf | |
PWC | https://paperswithcode.com/paper/kernelized-wasserstein-natural-gradient-1 |
Repo | https://github.com/MichaelArbel/KWNG |
Framework | pytorch |
Adding Interpretable Attention to Neural Translation Models Improves Word Alignment
Title | Adding Interpretable Attention to Neural Translation Models Improves Word Alignment |
Authors | Thomas Zenkel, Joern Wuebker, John DeNero |
Abstract | Multi-layer models with multiple attention heads per layer provide superior translation quality compared to simpler and shallower models, but determining what source context is most relevant to each target word is more challenging as a result. Therefore, deriving high-accuracy word alignments from the activations of a state-of-the-art neural machine translation model is an open challenge. We propose a simple model extension to the Transformer architecture that makes use of its hidden representations and is restricted to attend solely on encoder information to predict the next word. It can be trained on bilingual data without word-alignment information. We further introduce a novel alignment inference procedure which applies stochastic gradient descent to directly optimize the attention activations towards a given target word. The resulting alignments dramatically outperform the naive approach to interpreting Transformer attention activations, and are comparable to Giza++ on two publicly available data sets. |
Tasks | Machine Translation, Word Alignment |
Published | 2019-01-31 |
URL | http://arxiv.org/abs/1901.11359v1 |
http://arxiv.org/pdf/1901.11359v1.pdf | |
PWC | https://paperswithcode.com/paper/adding-interpretable-attention-to-neural |
Repo | https://github.com/shuoyangd/meerkat |
Framework | pytorch |
Generalization through Simulation: Integrating Simulated and Real Data into Deep Reinforcement Learning for Vision-Based Autonomous Flight
Title | Generalization through Simulation: Integrating Simulated and Real Data into Deep Reinforcement Learning for Vision-Based Autonomous Flight |
Authors | Katie Kang, Suneel Belkhale, Gregory Kahn, Pieter Abbeel, Sergey Levine |
Abstract | Deep reinforcement learning provides a promising approach for vision-based control of real-world robots. However, the generalization of such models depends critically on the quantity and variety of data available for training. This data can be difficult to obtain for some types of robotic systems, such as fragile, small-scale quadrotors. Simulated rendering and physics can provide for much larger datasets, but such data is inherently of lower quality: many of the phenomena that make the real-world autonomous flight problem challenging, such as complex physics and air currents, are modeled poorly or not at all, and the systematic differences between simulation and the real world are typically impossible to eliminate. In this work, we investigate how data from both simulation and the real world can be combined in a hybrid deep reinforcement learning algorithm. Our method uses real-world data to learn about the dynamics of the system, and simulated data to learn a generalizable perception system that can enable the robot to avoid collisions using only a monocular camera. We demonstrate our approach on a real-world nano aerial vehicle collision avoidance task, showing that with only an hour of real-world data, the quadrotor can avoid collisions in new environments with various lighting conditions and geometry. Code, instructions for building the aerial vehicles, and videos of the experiments can be found at github.com/gkahn13/GtS |
Tasks | |
Published | 2019-02-11 |
URL | http://arxiv.org/abs/1902.03701v1 |
http://arxiv.org/pdf/1902.03701v1.pdf | |
PWC | https://paperswithcode.com/paper/generalization-through-simulation-integrating |
Repo | https://github.com/gkahn13/GtS |
Framework | none |
Learning Shared Semantic Space with Correlation Alignment for Cross-modal Event Retrieval
Title | Learning Shared Semantic Space with Correlation Alignment for Cross-modal Event Retrieval |
Authors | Zhenguo Yang, Zehang Lin, Peipei Kang, Jianming Lv, Qing Li, Wenyin Liu |
Abstract | In this paper, we propose to learn shared semantic space with correlation alignment (${S}^{3}CA$) for multimodal data representations, which aligns nonlinear correlations of multimodal data distributions in deep neural networks designed for heterogeneous data. In the context of cross-modal (event) retrieval, we design a neural network with convolutional layers and fully-connected layers to extract features for images, including images on Flickr-like social media. Simultaneously, we exploit a fully-connected neural network to extract semantic features for texts, including news articles from news media. In particular, nonlinear correlations of layer activations in the two neural networks are aligned with correlation alignment during the joint training of the networks. Furthermore, we project the multimodal data into a shared semantic space for cross-modal (event) retrieval, where the distances between heterogeneous data samples can be measured directly. In addition, we contribute a Wiki-Flickr Event dataset, where the multimodal data samples are not describing each other in pairs like the existing paired datasets, but all of them are describing semantic events. Extensive experiments conducted on both paired and unpaired datasets manifest the effectiveness of ${S}^{3}CA$, outperforming the state-of-the-art methods. |
Tasks | |
Published | 2019-01-14 |
URL | https://arxiv.org/abs/1901.04268v3 |
https://arxiv.org/pdf/1901.04268v3.pdf | |
PWC | https://paperswithcode.com/paper/learning-shared-semantic-space-with |
Repo | https://github.com/zhengyang5/Wiki-Flickr-Event-Dataset |
Framework | none |
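As a rough illustration of what aligning correlations between two feature sets can mean, the sketch below computes the standard linear correlation alignment (CORAL) distance: the squared Frobenius norm of the difference between the two feature covariance matrices, scaled by 1/(4d²). This is a simplified stand-in, not the method of the paper, which aligns nonlinear correlations of layer activations during joint training; the function names are hypothetical.

```python
def covariance(rows):
    """Unbiased sample covariance matrix of a list of equal-length feature rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[k] for r in rows) / n for k in range(d)]
    return [[sum((r[a] - means[a]) * (r[b] - means[b]) for r in rows) / (n - 1)
             for b in range(d)]
            for a in range(d)]

def coral_distance(feats_a, feats_b):
    """Squared Frobenius distance between covariances, scaled by 1 / (4 d^2)."""
    ca, cb = covariance(feats_a), covariance(feats_b)
    d = len(ca)
    sq = sum((ca[i][j] - cb[i][j]) ** 2 for i in range(d) for j in range(d))
    return sq / (4 * d * d)
```

Because covariance ignores the mean, shifting one feature set by a constant leaves the distance at zero; only differences in spread and co-variation are penalised.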
Patchy Image Structure Classification Using Multi-Orientation Region Transform
Title | Patchy Image Structure Classification Using Multi-Orientation Region Transform |
Authors | Xiaohan Yu, Yang Zhao, Yongsheng Gao, Shengwu Xiong, Xiaohui Yuan |
Abstract | Exterior contour and interior structure are both vital features for classifying objects. However, most of the existing methods consider exterior contour features and internal structure features separately, and thus fail to function when classifying patchy image structures that have similar contours and flexible structures. To address the above limitations, this paper proposes a novel Multi-Orientation Region Transform (MORT), which can effectively characterize both contour and structure features simultaneously, for patchy image structure classification. MORT is performed over multiple orientation regions at multiple scales to effectively integrate patchy features, and thus enables a better description of the shape in a coarse-to-fine manner. Moreover, the proposed MORT can be extended to combine with the deep convolutional neural network techniques, for further enhancement of classification accuracy. Very encouraging experimental results on the challenging ultra-fine-grained cultivar recognition task, insect wing recognition task, and large variation butterfly recognition task are obtained, which demonstrate the effectiveness and superiority of the proposed MORT over the state-of-the-art methods in classifying patchy image structures. Our code and three patchy image structure datasets are available at: https://github.com/XiaohanYu-GU/MReT2019. |
Tasks | |
Published | 2019-12-02 |
URL | https://arxiv.org/abs/1912.00622v1 |
https://arxiv.org/pdf/1912.00622v1.pdf | |
PWC | https://paperswithcode.com/paper/patchy-image-structure-classification-using |
Repo | https://github.com/XiaohanYu-GU/MReT2019 |
Framework | none |
Variationally Inferred Sampling Through a Refined Bound for Probabilistic Programs
Title | Variationally Inferred Sampling Through a Refined Bound for Probabilistic Programs |
Authors | Victor Gallego, David Rios Insua |
Abstract | A framework to boost the efficiency of Bayesian inference in probabilistic programs is introduced by embedding a sampler inside a variational posterior approximation. We call it the refined variational approximation. Its strength lies both in ease of implementation and in automatic tuning of the sampler parameters to speed up mixing time using automatic differentiation. Several strategies to approximate evidence lower bound (ELBO) computation are introduced. Experimental evidence of its efficient performance is shown solving an influence diagram in a high-dimensional space using a conditional variational autoencoder (cVAE) as a deep Bayes classifier; an unconditional VAE on density estimation tasks; and state-space models for time-series data. |
Tasks | Bayesian Inference, Density Estimation, Time Series |
Published | 2019-08-26 |
URL | https://arxiv.org/abs/1908.09744v4 |
https://arxiv.org/pdf/1908.09744v4.pdf | |
PWC | https://paperswithcode.com/paper/variationally-inferred-sampling-through-a |
Repo | https://github.com/vicgalle/vis |
Framework | pytorch |