July 29, 2019

2959 words 14 mins read

Paper Group AWR 117

Deep Learning Based Large-Scale Automatic Satellite Crosswalk Classification

Title Deep Learning Based Large-Scale Automatic Satellite Crosswalk Classification
Authors Rodrigo F. Berriel, Andre Teixeira Lopes, Alberto F. de Souza, Thiago Oliveira-Santos
Abstract High-resolution satellite imagery has been increasingly used in remote sensing classification problems, largely because of the growing availability of this kind of data. Even so, very little effort has been devoted to the zebra-crossing classification problem. In this letter, crowdsourcing systems are exploited to enable the automatic acquisition and annotation of a large-scale satellite imagery database for crosswalk-related tasks. This dataset is then used to train deep-learning-based models to accurately classify satellite images that do or do not contain zebra crossings. A novel dataset with more than 240,000 images from 3 continents, 9 countries and more than 20 cities was used in the experiments. Experimental results showed that freely available crowdsourcing data can be used to train robust models that perform crosswalk classification on a global scale with high accuracy (97.11%).
Tasks
Published 2017-06-28
URL http://arxiv.org/abs/1706.09302v2
PDF http://arxiv.org/pdf/1706.09302v2.pdf
PWC https://paperswithcode.com/paper/deep-learning-based-large-scale-automatic
Repo https://github.com/rodrigoberriel/satellite-crosswalk-classification
Framework none

A Multimodal Anomaly Detector for Robot-Assisted Feeding Using an LSTM-based Variational Autoencoder

Title A Multimodal Anomaly Detector for Robot-Assisted Feeding Using an LSTM-based Variational Autoencoder
Authors Daehyung Park, Yuuna Hoshi, Charles C. Kemp
Abstract The detection of anomalous executions is valuable for reducing potential hazards in assistive manipulation. Multimodal sensory signals can be helpful for detecting a wide range of anomalies. However, the fusion of high-dimensional and heterogeneous modalities is a challenging problem. We introduce a long short-term memory based variational autoencoder (LSTM-VAE) that fuses signals and reconstructs their expected distribution. We also introduce an LSTM-VAE-based detector using a reconstruction-based anomaly score and a state-based threshold. In evaluations with 1,555 robot-assisted feeding executions, including 12 representative types of anomalies, our detector achieved a higher area under the receiver operating characteristic curve (AUC), 0.8710, than 5 other baseline detectors from the literature. We also show that multimodal fusion through the LSTM-VAE is effective by comparing our detector using 17 raw sensory signals versus 4 hand-engineered features.
Tasks
Published 2017-11-02
URL http://arxiv.org/abs/1711.00614v1
PDF http://arxiv.org/pdf/1711.00614v1.pdf
PWC https://paperswithcode.com/paper/a-multimodal-anomaly-detector-for-robot
Repo https://github.com/freedombenLiu/RNN-Time-series-Anomaly-Detection
Framework pytorch
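
The two detector ingredients named in the abstract — a reconstruction-based anomaly score and a state-based threshold — can be sketched in plain NumPy. This is a minimal illustration of the scoring idea under a Gaussian reconstruction assumption, not the authors' LSTM-VAE code; the function names are illustrative.

```python
import numpy as np

def anomaly_score(x, recon_mu, recon_sigma):
    # Negative Gaussian log-likelihood of the observed multimodal signal
    # under the decoder's reconstructed distribution; higher = more anomalous.
    return np.sum(
        0.5 * ((x - recon_mu) / recon_sigma) ** 2
        + np.log(recon_sigma)
        + 0.5 * np.log(2 * np.pi),
        axis=-1,
    )

def detect(scores, state_thresholds):
    # State-based threshold: instead of one global cutoff, each score is
    # compared against a threshold that varies with the task's state/progress.
    return scores > state_thresholds
```

An execution whose signals sit far from the reconstructed mean gets a high score, and the per-state threshold lets early, noisy phases of feeding tolerate larger deviations than quiet phases.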

Real-Time 6DOF Pose Relocalization for Event Cameras with Stacked Spatial LSTM Networks

Title Real-Time 6DOF Pose Relocalization for Event Cameras with Stacked Spatial LSTM Networks
Authors Anh Nguyen, Thanh-Toan Do, Darwin G. Caldwell, Nikos G. Tsagarakis
Abstract We present a new method to relocalize the 6DOF pose of an event camera solely based on the event stream. Our method first creates the event image from a list of events that occur in a very short time interval, then a Stacked Spatial LSTM Network (SP-LSTM) is used to learn the camera pose. Our SP-LSTM is composed of a CNN to learn deep features from the event images and a stack of LSTMs to learn spatial dependencies in the image feature space. We show that the spatial dependency plays an important role in the relocalization task and that the SP-LSTM can effectively learn this information. The experimental results on a publicly available dataset show that our approach generalizes well and outperforms recent methods by a substantial margin. Overall, our proposed method reduces the position error by approximately 6 times and the orientation error by 3 times compared to the current state of the art. The source code and trained models will be released.
Tasks
Published 2017-08-22
URL http://arxiv.org/abs/1708.09011v3
PDF http://arxiv.org/pdf/1708.09011v3.pdf
PWC https://paperswithcode.com/paper/real-time-6dof-pose-relocalization-for-event
Repo https://github.com/nqanh/pose_relocalization
Framework tf

Frame-Based Continuous Lexical Semantics through Exponential Family Tensor Factorization and Semantic Proto-Roles

Title Frame-Based Continuous Lexical Semantics through Exponential Family Tensor Factorization and Semantic Proto-Roles
Authors Francis Ferraro, Adam Poliak, Ryan Cotterell, Benjamin Van Durme
Abstract We study how different frame annotations complement one another when learning continuous lexical semantics. We learn the representations from a tensorized skip-gram model that consistently encodes syntactic-semantic content better, with multiple 10% gains over baselines.
Tasks
Published 2017-06-29
URL http://arxiv.org/abs/1706.09562v1
PDF http://arxiv.org/pdf/1706.09562v1.pdf
PWC https://paperswithcode.com/paper/frame-based-continuous-lexical-semantics
Repo https://github.com/fmof/tensor-factorization
Framework none

End-to-end Recovery of Human Shape and Pose

Title End-to-end Recovery of Human Shape and Pose
Authors Angjoo Kanazawa, Michael J. Black, David W. Jacobs, Jitendra Malik
Abstract We describe Human Mesh Recovery (HMR), an end-to-end framework for reconstructing a full 3D mesh of a human body from a single RGB image. In contrast to most current methods that compute 2D or 3D joint locations, we produce a richer and more useful mesh representation that is parameterized by shape and 3D joint angles. The main objective is to minimize the reprojection loss of keypoints, which allows our model to be trained using in-the-wild images that only have ground truth 2D annotations. However, the reprojection loss alone leaves the model highly underconstrained. In this work we address this problem by introducing an adversary trained to tell whether a human body parameter is real or not using a large database of 3D human meshes. We show that HMR can be trained with and without any paired 2D-to-3D supervision. We do not rely on intermediate 2D keypoint detections and infer 3D pose and shape parameters directly from image pixels. Our model runs in real-time given a bounding box containing the person. We demonstrate our approach on various in-the-wild images and outperform previous optimization-based methods that output 3D meshes, and we show competitive results on tasks such as 3D joint location estimation and part segmentation.
Tasks 3D Human Pose Estimation
Published 2017-12-18
URL http://arxiv.org/abs/1712.06584v2
PDF http://arxiv.org/pdf/1712.06584v2.pdf
PWC https://paperswithcode.com/paper/end-to-end-recovery-of-human-shape-and-pose
Repo https://github.com/MandyMo/pytorch_HMR
Framework pytorch
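
The central training signal here — projecting predicted 3D joints into the image and penalizing their distance to annotated 2D keypoints — is simple enough to sketch. The following NumPy snippet illustrates a weak-perspective reprojection loss of the kind HMR-style pipelines use; it is a simplified stand-in, not the paper's implementation (which uses a full camera prediction and an adversarial prior on top).

```python
import numpy as np

def weak_perspective_project(joints3d, scale, trans):
    # Weak-perspective camera: drop the depth coordinate, then scale and
    # translate the remaining x-y coordinates into the image plane.
    return scale * joints3d[:, :2] + trans

def reprojection_loss(joints3d, keypoints2d, visibility, scale, trans):
    # L1 distance between projected 3D joints and annotated 2D keypoints,
    # counted only where the 2D annotation is marked visible.
    proj = weak_perspective_project(joints3d, scale, trans)
    return np.sum(visibility[:, None] * np.abs(proj - keypoints2d))
```

Because the loss only touches visible 2D annotations, it can be computed on in-the-wild images with no 3D ground truth, which is exactly what makes the adversarial prior necessary to resolve the remaining depth ambiguity.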

Radially-Distorted Conjugate Translations

Title Radially-Distorted Conjugate Translations
Authors James Pritts, Zuzana Kukelova, Viktor Larsson, Ondrej Chum
Abstract This paper introduces the first minimal solvers that jointly solve for affine-rectification and radial lens distortion from coplanar repeated patterns. Even with imagery from moderately distorted lenses, plane rectification using the pinhole camera model is inaccurate or invalid. The proposed solvers incorporate lens distortion into the camera model and extend accurate rectification to wide-angle imagery, which is now common from consumer cameras. The solvers are derived from constraints induced by the conjugate translations of an imaged scene plane, which are integrated with the division model for radial lens distortion. The hidden-variable trick with ideal saturation is used to reformulate the constraints so that the solvers generated by the Gröbner-basis method are stable, small and fast. Rectification and lens distortion are recovered from either one conjugately translated affine-covariant feature or two independently translated similarity-covariant features. The proposed solvers are used in a RANSAC-based estimator, which gives accurate rectifications after a few iterations. The proposed solvers are evaluated against the state-of-the-art and demonstrate significantly better rectifications on noisy measurements. Qualitative results on diverse imagery demonstrate high-accuracy undistortions and rectifications. The source code is publicly available at https://github.com/prittjam/repeats.
Tasks
Published 2017-11-30
URL http://arxiv.org/abs/1711.11339v3
PDF http://arxiv.org/pdf/1711.11339v3.pdf
PWC https://paperswithcode.com/paper/radially-distorted-conjugate-translations
Repo https://github.com/prittjam/repeats
Framework none
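
The division model the solvers build on has a one-line form: a distorted point x_d (centered on the distortion center) maps to the undistorted point x_u = x_d / (1 + λ‖x_d‖²). A minimal NumPy sketch of that mapping, purely to illustrate the camera model the constraints are integrated with:

```python
import numpy as np

def undistort_division(points, lam):
    # One-parameter division model for radial lens distortion:
    #   x_u = x_d / (1 + lam * ||x_d||^2)
    # lam < 0 corrects barrel distortion (points move outward), lam > 0
    # corrects pincushion distortion; lam = 0 is the pinhole case.
    r2 = np.sum(points ** 2, axis=1, keepdims=True)
    return points / (1.0 + lam * r2)
```

Each point only moves along its ray from the distortion center, which is why a single parameter λ can be estimated jointly with the rectification from very few feature correspondences.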

Training Simplification and Model Simplification for Deep Learning: A Minimal Effort Back Propagation Method

Title Training Simplification and Model Simplification for Deep Learning: A Minimal Effort Back Propagation Method
Authors Xu Sun, Xuancheng Ren, Shuming Ma, Bingzhen Wei, Wei Li, Jingjing Xu, Houfeng Wang, Yi Zhang
Abstract We propose a simple yet effective technique to simplify the training and the resulting model of neural networks. In back propagation, only a small subset of the full gradient is computed to update the model parameters. The gradient vectors are sparsified in such a way that only the top-k elements (in terms of magnitude) are kept. As a result, only k rows or columns (depending on the layout) of the weight matrix are modified, leading to a linear reduction in the computational cost. Based on the sparsified gradients, we further simplify the model by eliminating the rows or columns that are seldom updated, which reduces the computational cost of both training and decoding, and potentially accelerates decoding in real-world applications. Surprisingly, experimental results demonstrate that most of the time we only need to update fewer than 5% of the weights at each back propagation pass. More interestingly, the accuracy of the resulting models is actually improved rather than degraded, and a detailed analysis is given. The model simplification results show that the model can often be reduced by around 9x, without any loss of accuracy or even with improved accuracy. The code, including the extension, is available at https://github.com/lancopku/meSimp
Tasks
Published 2017-11-17
URL http://arxiv.org/abs/1711.06528v2
PDF http://arxiv.org/pdf/1711.06528v2.pdf
PWC https://paperswithcode.com/paper/training-simplification-and-model
Repo https://github.com/jklj077/meProp
Framework pytorch
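
The core sparsification step — keep only the top-k gradient entries by magnitude, zero the rest — is a few lines of NumPy. This is a sketch of the idea on a single gradient vector, not the paper's framework integration:

```python
import numpy as np

def topk_sparsify(grad, k):
    # Minimal-effort back propagation: keep only the k largest-magnitude
    # entries of the gradient and zero the rest, so only k rows/columns of
    # the corresponding weight matrix receive updates this pass.
    out = np.zeros_like(grad)
    idx = np.argpartition(np.abs(grad), -k)[-k:]  # indices of the top-k
    out[idx] = grad[idx]
    return out
```

Tracking how often each row/column survives this selection is then what drives the second step, pruning the rarely updated ones from the model.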

Low-shot learning with large-scale diffusion

Title Low-shot learning with large-scale diffusion
Authors Matthijs Douze, Arthur Szlam, Bharath Hariharan, Hervé Jégou
Abstract This paper considers the problem of inferring image labels from images when only a few annotated examples are available at training time. This setup is often referred to as low-shot learning, where a standard approach is to re-train the last few layers of a convolutional neural network learned on separate classes for which training examples are abundant. We consider a semi-supervised setting based on a large collection of images to support label propagation. This is possible by leveraging the recent advances on large-scale similarity graph construction. We show that despite its conceptual simplicity, scaling label propagation up to hundreds of millions of images leads to state-of-the-art accuracy in the low-shot learning regime.
Tasks graph construction
Published 2017-06-07
URL http://arxiv.org/abs/1706.02332v3
PDF http://arxiv.org/pdf/1706.02332v3.pdf
PWC https://paperswithcode.com/paper/low-shot-learning-with-large-scale-diffusion
Repo https://github.com/facebookresearch/low-shot-with-diffusion
Framework pytorch
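
The label-propagation step itself is conceptually simple: diffuse label mass from the few annotated nodes along the edges of a similarity graph. A small NumPy sketch of a standard normalized diffusion of this kind (the paper's contribution is running it at the scale of hundreds of millions of nodes, which this toy version does not address):

```python
import numpy as np

def propagate_labels(W, Y, alpha=0.5, n_iter=50):
    # Graph label diffusion: W is a symmetric similarity/adjacency matrix,
    # Y holds one-hot rows for the few labeled nodes and zeros elsewhere.
    # Each iteration spreads label mass along normalized edges while
    # re-injecting the known labels.
    d = W.sum(axis=1)
    S = W / np.sqrt(np.outer(d, d))  # symmetric normalization D^-1/2 W D^-1/2
    F = Y.astype(float)
    for _ in range(n_iter):
        F = alpha * (S @ F) + (1 - alpha) * Y
    return F.argmax(axis=1)
```

On a 4-node chain with the two endpoints labeled with different classes, each unlabeled node inherits the class of its nearer endpoint, which is the behavior the low-shot setting relies on.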

Sliced Wasserstein Generative Models

Title Sliced Wasserstein Generative Models
Authors Jiqing Wu, Zhiwu Huang, Dinesh Acharya, Wen Li, Janine Thoma, Danda Pani Paudel, Luc Van Gool
Abstract In generative modeling, the Wasserstein distance (WD) has emerged as a useful metric to measure the discrepancy between generated and real data distributions. Unfortunately, it is challenging to approximate the WD of high-dimensional distributions. In contrast, the sliced Wasserstein distance (SWD) factorizes high-dimensional distributions into their multiple one-dimensional marginal distributions and is thus easier to approximate. In this paper, we introduce novel approximations of the primal and dual SWD. Instead of using a large number of random projections, as it is done by conventional SWD approximation methods, we propose to approximate SWDs with a small number of parameterized orthogonal projections in an end-to-end deep learning fashion. As concrete applications of our SWD approximations, we design two types of differentiable SWD blocks to equip modern generative frameworks—Auto-Encoders (AE) and Generative Adversarial Networks (GAN). In the experiments, we not only show the superiority of the proposed generative models on standard image synthesis benchmarks, but also demonstrate the state-of-the-art performance on challenging high resolution image and video generation in an unsupervised manner.
Tasks Image Generation, Video Generation
Published 2017-06-08
URL http://arxiv.org/abs/1706.02631v4
PDF http://arxiv.org/pdf/1706.02631v4.pdf
PWC https://paperswithcode.com/paper/sliced-wasserstein-generative-models
Repo https://github.com/musikisomorphie/swd
Framework tf
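
The conventional baseline the paper improves on — approximating the sliced Wasserstein distance with many random projections — is easy to write down, since the 1-D Wasserstein distance has a closed form via sorting. A NumPy sketch of that baseline (the paper's method replaces these random projections with a small number of learned orthogonal ones inside the network):

```python
import numpy as np

def sliced_wasserstein(X, Y, n_proj=200, seed=0):
    # Monte Carlo approximation of the sliced Wasserstein-1 distance
    # between two equally sized d-dimensional sample sets: average the
    # 1-D Wasserstein distance over random projection directions.
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_proj):
        theta = rng.normal(size=X.shape[1])
        theta /= np.linalg.norm(theta)  # random unit direction
        # In 1-D, Wasserstein-1 is the mean gap between sorted samples.
        total += np.mean(np.abs(np.sort(X @ theta) - np.sort(Y @ theta)))
    return total / n_proj
```

The cost of a good approximation grows with the number of projections, which is exactly the inefficiency the learned orthogonal projections are meant to remove.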

Do GANs actually learn the distribution? An empirical study

Title Do GANs actually learn the distribution? An empirical study
Authors Sanjeev Arora, Yi Zhang
Abstract Do GANs (Generative Adversarial Nets) actually learn the target distribution? The foundational paper of (Goodfellow et al 2014) suggested they do, if they were given sufficiently large deep nets, sample size, and computation time. A recent theoretical analysis in Arora et al (to appear at ICML 2017) raised doubts about whether the same holds when the discriminator has finite size. It showed that the training objective can approach its optimum value even if the generated distribution has very low support — in other words, the training objective is unable to prevent mode collapse. The current note reports experiments suggesting that such problems are not merely theoretical. It presents empirical evidence that well-known GAN approaches do learn distributions of fairly low support, and thus presumably are not learning the target distribution. The main technical contribution is a new proposed test, based upon the famous birthday paradox, for estimating the support size of the generated distribution.
Tasks
Published 2017-06-26
URL http://arxiv.org/abs/1706.08224v2
PDF http://arxiv.org/pdf/1706.08224v2.pdf
PWC https://paperswithcode.com/paper/do-gans-actually-learn-the-distribution-an
Repo https://github.com/Adi-iitd/Image-Compositing
Framework tf
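
The birthday-paradox logic behind the test can be checked numerically on a known discrete distribution: if batches of s samples contain a duplicate about half the time, the support size is roughly s²/(2 ln 2). A NumPy sketch of this collision-counting step (in the paper, "duplicate" means visually near-identical generated images found by a human-assisted similarity search, which this toy version replaces with exact matches):

```python
import numpy as np

def collision_probability(support_size, batch_size, n_trials=2000, seed=0):
    # Empirical probability that a batch drawn uniformly from a discrete
    # support of the given size contains at least one duplicate. The
    # analytic approximation is 1 - exp(-s*(s-1) / (2*N)).
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_trials):
        batch = rng.integers(0, support_size, size=batch_size)
        hits += len(np.unique(batch)) < batch_size  # any duplicate?
    return hits / n_trials
```

Inverting the observed collision rate then gives the support-size estimate: frequent collisions in small batches are the signature of a low-support, mode-collapsed generator.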

The Merging Path Plot: adaptive fusing of k-groups with likelihood-based model selection

Title The Merging Path Plot: adaptive fusing of k-groups with likelihood-based model selection
Authors Agnieszka Sitko, Przemyslaw Biecek
Abstract There are many statistical tests that verify the null hypothesis: the variable of interest has the same distribution among k-groups. But once the null hypothesis is rejected, how to present the structure of dissimilarity between groups? In this article, we introduce The Merging Path Plot - a methodology, and factorMerger - an R package, for exploration and visualization of k-group dissimilarities. Comparison of k-groups is one of the most important issues in exploratory analyses and it has zillions of applications. The classical solution is to test a null hypothesis that observations from all groups come from the same distribution. If the global null hypothesis is rejected, a more detailed analysis of differences among pairs of groups is performed. The traditional approach is to use pairwise post hoc tests in order to verify which groups differ significantly. However, this approach fails with a large number of groups in both the interpretation and visualization layers. The Merging Path Plot methodology solves this problem by using an easy-to-understand description of dissimilarity among groups based on the Likelihood Ratio Test (LRT) statistic.
Tasks Model Selection
Published 2017-09-13
URL http://arxiv.org/abs/1709.04412v2
PDF http://arxiv.org/pdf/1709.04412v2.pdf
PWC https://paperswithcode.com/paper/the-merging-path-plot-adaptive-fusing-of-k
Repo https://github.com/ModelOriented/DrWhy
Framework none
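
The LRT statistic that drives the merging can be sketched for the Gaussian case: twice the log-likelihood gap between fitting each pair of groups its own normal distribution and fitting them one shared normal. A minimal NumPy illustration of that pairwise merging criterion (the factorMerger package handles more model families and builds the full merging path; the function names here are illustrative):

```python
import numpy as np

def gaussian_loglik(x):
    # Maximized Gaussian log-likelihood of a sample (MLE mean and variance).
    n, var = len(x), x.var()
    return -0.5 * n * (np.log(2 * np.pi * var) + 1)

def merging_lrt(a, b):
    # LRT statistic for "groups a and b share one Gaussian" versus
    # "each group has its own Gaussian". Small values suggest the two
    # groups can be fused; the merging path fuses the smallest first.
    separate = gaussian_loglik(a) + gaussian_loglik(b)
    merged = gaussian_loglik(np.concatenate([a, b]))
    return 2 * (separate - merged)
```

Greedily fusing the pair with the smallest statistic, then repeating, produces the dendrogram-like merging path that the plot visualizes.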

Bayesian Optimization for Parameter Tuning of the XOR Neural Network

Title Bayesian Optimization for Parameter Tuning of the XOR Neural Network
Authors Lawrence Stewart, Mark Stalzer
Abstract When applying Machine Learning techniques to problems, one must select model parameters that ensure the system converges without becoming stuck at a local minimum of the objective function. Tuning these parameters becomes a non-trivial task for large models, and it is not always apparent whether the user has found the optimal parameters. We aim to automate the process of tuning a Neural Network (where only a limited number of parameter search attempts are available) by implementing Bayesian Optimization. In particular, by assigning Gaussian Process priors to the parameter space, we utilize Bayesian Optimization to tune an Artificial Neural Network used to learn the XOR function, with the result of achieving higher prediction accuracy.
Tasks
Published 2017-09-22
URL http://arxiv.org/abs/1709.07842v2
PDF http://arxiv.org/pdf/1709.07842v2.pdf
PWC https://paperswithcode.com/paper/bayesian-optimization-for-parameter-tuning-of
Repo https://github.com/LawrenceMMStewart/Bayesian_Optimization
Framework none
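
The loop described here — place a GP prior over the parameter space, then repeatedly evaluate the point with the highest expected improvement — can be sketched in NumPy for a single parameter on [0, 1]. This is a generic Bayesian-optimization illustration with a fixed RBF length scale, not the authors' setup; all names are illustrative.

```python
import numpy as np
from math import erf, sqrt, pi

def rbf_kernel(a, b, length_scale=0.3):
    # Squared-exponential covariance between two 1-D input arrays.
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / length_scale) ** 2)

def gp_posterior(x_obs, y_obs, x_grid, noise=1e-6):
    # GP posterior mean and std dev on a grid, zero prior mean
    # (targets are centered before calling this).
    K = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    Ks = rbf_kernel(x_obs, x_grid)
    mu = Ks.T @ np.linalg.solve(K, y_obs)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks), axis=0)
    return mu, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mu, sigma, best):
    # Expected improvement below the current best value (minimization).
    z = (best - mu) / sigma
    Phi = np.array([0.5 * (1 + erf(v / sqrt(2))) for v in z])
    phi = np.exp(-0.5 * z ** 2) / sqrt(2 * pi)
    return (best - mu) * Phi + sigma * phi

def bayes_opt(f, n_iter=10):
    # Tune one parameter on [0, 1] with a small evaluation budget.
    x_obs = np.array([0.1, 0.5, 0.9])
    y_obs = np.array([f(x) for x in x_obs])
    grid = np.linspace(0.0, 1.0, 201)
    for _ in range(n_iter):
        y_c = y_obs - y_obs.mean()          # center targets for the GP
        mu, sigma = gp_posterior(x_obs, y_c, grid)
        x_new = grid[np.argmax(expected_improvement(mu, sigma, y_c.min()))]
        x_obs = np.append(x_obs, x_new)
        y_obs = np.append(y_obs, f(x_new))
    return x_obs[np.argmin(y_obs)]
```

Expected improvement is near zero at already-evaluated points (the posterior is nearly certain there), so each round spends its limited budget where the surrogate is both promising and uncertain — the property that matters when only a handful of search attempts are available.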

Converting High-Dimensional Regression to High-Dimensional Conditional Density Estimation

Title Converting High-Dimensional Regression to High-Dimensional Conditional Density Estimation
Authors Rafael Izbicki, Ann B. Lee
Abstract There is a growing demand for nonparametric conditional density estimators (CDEs) in fields such as astronomy and economics. In astronomy, for example, one can dramatically improve estimates of the parameters that dictate the evolution of the Universe by working with full conditional densities instead of regression (i.e., conditional mean) estimates. More generally, standard regression falls short in any prediction problem where the distribution of the response is more complex with multi-modality, asymmetry or heteroscedastic noise. Nevertheless, much of the work on high-dimensional inference concerns regression and classification only, whereas research on density estimation has lagged behind. Here we propose FlexCode, a fully nonparametric approach to conditional density estimation that reformulates CDE as a non-parametric orthogonal series problem where the expansion coefficients are estimated by regression. By taking such an approach, one can efficiently estimate conditional densities and not just expectations in high dimensions by drawing upon the success in high-dimensional regression. Depending on the choice of regression procedure, our method can adapt to a variety of challenging high-dimensional settings with different structures in the data (e.g., a large number of irrelevant components and nonlinear manifold structure) as well as different data types (e.g., functional data, mixed data types and sample sets). We study the theoretical and empirical performance of our proposed method, and we compare our approach with traditional conditional density estimators on simulated as well as real-world data, such as photometric galaxy data, Twitter data, and line-of-sight velocities in a galaxy cluster.
Tasks Density Estimation
Published 2017-04-26
URL http://arxiv.org/abs/1704.08095v1
PDF http://arxiv.org/pdf/1704.08095v1.pdf
PWC https://paperswithcode.com/paper/converting-high-dimensional-regression-to
Repo https://github.com/rizbicki/FlexCoDE
Framework none
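
The reformulation at the heart of FlexCode — expand f(z|x) in an orthonormal basis and estimate each expansion coefficient by regression — fits in a few lines. A minimal NumPy sketch using a cosine basis on [0, 1] and plain least squares as the plug-in regression (the actual FlexCoDE package supports many regression methods and normalizes the estimate; function names here are illustrative):

```python
import numpy as np

def cosine_basis(z, j):
    # Orthonormal cosine basis on [0, 1].
    return np.ones_like(z) if j == 0 else np.sqrt(2) * np.cos(np.pi * j * z)

def flexcode_fit(x, z, n_basis=6):
    # f(z|x) = sum_j beta_j(x) phi_j(z), where each coefficient function
    # beta_j(x) = E[phi_j(Z) | X = x] is estimated by regressing the
    # basis-transformed responses phi_j(z_i) on the covariates x_i.
    X = np.column_stack([np.ones_like(x), x])  # linear-regression design
    return np.array([np.linalg.lstsq(X, cosine_basis(z, j), rcond=None)[0]
                     for j in range(n_basis)])

def flexcode_density(coefs, x0, z_grid):
    # Evaluate the estimated conditional density f(z|x0) on a grid.
    betas = coefs @ np.array([1.0, x0])
    basis = np.stack([cosine_basis(z_grid, j) for j in range(len(betas))])
    return betas @ basis
```

Because each coefficient is "just" a regression, any high-dimensional regression method can be swapped in for the least-squares step, which is how the approach inherits the adaptivity described in the abstract.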

Boosted Generative Models

Title Boosted Generative Models
Authors Aditya Grover, Stefano Ermon
Abstract We propose a novel approach for using unsupervised boosting to create an ensemble of generative models, where models are trained in sequence to correct earlier mistakes. Our meta-algorithmic framework can leverage any existing base learner that permits likelihood evaluation, including recent deep expressive models. Further, our approach allows the ensemble to include discriminative models trained to distinguish real data from model-generated data. We show theoretical conditions under which incorporating a new model in the ensemble will improve the fit and empirically demonstrate the effectiveness of our black-box boosting algorithms on density estimation, classification, and sample generation on benchmark datasets for a wide range of generative models.
Tasks Density Estimation
Published 2017-02-27
URL http://arxiv.org/abs/1702.08484v2
PDF http://arxiv.org/pdf/1702.08484v2.pdf
PWC https://paperswithcode.com/paper/boosted-generative-models
Repo https://github.com/ermongroup/bgm
Framework tf

Solving high-dimensional partial differential equations using deep learning

Title Solving high-dimensional partial differential equations using deep learning
Authors Jiequn Han, Arnulf Jentzen, Weinan E
Abstract Developing algorithms for solving high-dimensional partial differential equations (PDEs) has been an exceedingly difficult task for a long time, due to the notoriously difficult problem known as the “curse of dimensionality”. This paper introduces a deep learning-based approach that can handle general high-dimensional parabolic PDEs. To this end, the PDEs are reformulated using backward stochastic differential equations and the gradient of the unknown solution is approximated by neural networks, very much in the spirit of deep reinforcement learning with the gradient acting as the policy function. Numerical results on examples including the nonlinear Black-Scholes equation, the Hamilton-Jacobi-Bellman equation, and the Allen-Cahn equation suggest that the proposed algorithm is quite effective in high dimensions, in terms of both accuracy and cost. This opens up new possibilities in economics, finance, operational research, and physics, by considering all participating agents, assets, resources, or particles together at the same time, instead of making ad hoc assumptions on their inter-relationships.
Tasks
Published 2017-07-09
URL http://arxiv.org/abs/1707.02568v3
PDF http://arxiv.org/pdf/1707.02568v3.pdf
PWC https://paperswithcode.com/paper/solving-high-dimensional-partial-differential
Repo https://github.com/kousun12/TFDE
Framework tf