January 28, 2020

3621 words 17 mins read

Paper Group ANR 796

GP2C: Geometric Projection Parameter Consensus for Joint 3D Pose and Focal Length Estimation in the Wild

Title GP2C: Geometric Projection Parameter Consensus for Joint 3D Pose and Focal Length Estimation in the Wild
Authors Alexander Grabner, Peter M. Roth, Vincent Lepetit
Abstract We present a joint 3D pose and focal length estimation approach for object categories in the wild. In contrast to previous methods that predict 3D poses independently of the focal length or assume a constant focal length, we explicitly estimate and integrate the focal length into the 3D pose estimation. For this purpose, we combine deep learning techniques and geometric algorithms in a two-stage approach: First, we estimate an initial focal length and establish 2D-3D correspondences from a single RGB image using a deep network. Second, we recover 3D poses and refine the focal length by minimizing the reprojection error of the predicted correspondences. In this way, we exploit the geometric prior given by the focal length for 3D pose estimation. This results in two advantages: First, we achieve significantly improved 3D translation and 3D pose accuracy compared to existing methods. Second, our approach finds a geometric consensus between the individual projection parameters, which is required for precise 2D-3D alignment. We evaluate our proposed approach on three challenging real-world datasets (Pix3D, Comp, and Stanford) with different object categories and significantly outperform the state-of-the-art by up to 20% absolute in multiple different metrics.
Tasks 3D Pose Estimation, Pose Estimation
Published 2019-08-07
URL https://arxiv.org/abs/1908.02809v1
PDF https://arxiv.org/pdf/1908.02809v1.pdf
PWC https://paperswithcode.com/paper/gp2c-geometric-projection-parameter-consensus
Repo
Framework
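The paper's second stage, recovering pose and refining focal length by minimizing reprojection error, can be sketched as a small nonlinear least-squares problem. This is a toy illustration, not the authors' implementation: rotation is held fixed, the principal point sits at the origin, and all names and values are invented.

```python
import numpy as np
from scipy.optimize import least_squares

def project(X, t, f):
    """Pinhole projection of 3D points X (N, 3) with translation t and focal length f."""
    Xc = X + t                              # points in the camera frame
    return f * Xc[:, :2] / Xc[:, 2:3]

def residuals(params, X, uv):
    # Reprojection error of the predicted 2D-3D correspondences.
    t, f = params[:3], params[3]
    return (project(X, t, f) - uv).ravel()

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (20, 3))                     # 3D model points
t_true, f_true = np.array([0.1, -0.2, 5.0]), 800.0
uv = project(X, t_true, f_true)                     # noise-free 2D observations

# Refine from a rough initial focal length, as the network stage would supply.
sol = least_squares(residuals, x0=[0.0, 0.0, 4.0, 600.0], args=(X, uv))
t_est, f_est = sol.x[:3], sol.x[3]
```

Optimizing translation and focal length jointly is what lets the projection parameters reach a geometric consensus; with a fixed (wrong) focal length, the recovered depth would have to absorb the calibration error.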

Sim2real transfer learning for 3D human pose estimation: motion to the rescue

Title Sim2real transfer learning for 3D human pose estimation: motion to the rescue
Authors Carl Doersch, Andrew Zisserman
Abstract Synthetic visual data can provide practically infinite diversity and rich labels, while avoiding ethical issues with privacy and bias. However, for many tasks, current models trained on synthetic data generalize poorly to real data. The task of 3D human pose estimation is a particularly interesting example of this sim2real problem, because learning-based approaches perform reasonably well given real training data, yet labeled 3D poses are extremely difficult to obtain in the wild, limiting scalability. In this paper, we show that standard neural-network approaches, which perform poorly when trained on synthetic RGB images, can perform well when the data is pre-processed to extract cues about the person’s motion, notably as optical flow and the motion of 2D keypoints. Therefore, our results suggest that motion can be a simple way to bridge a sim2real gap when video is available. We evaluate on the 3D Poses in the Wild dataset, the most challenging modern benchmark for 3D pose estimation, where we show full 3D mesh recovery that is on par with state-of-the-art methods trained on real 3D sequences, despite training only on synthetic humans from the SURREAL dataset.
Tasks 3D Human Pose Estimation, 3D Pose Estimation, Optical Flow Estimation, Pose Estimation, Transfer Learning
Published 2019-07-04
URL https://arxiv.org/abs/1907.02499v2
PDF https://arxiv.org/pdf/1907.02499v2.pdf
PWC https://paperswithcode.com/paper/sim2real-transfer-learning-for-3d-pose
Repo
Framework
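The preprocessing idea, reduced to its simplest form: feed the model frame-to-frame motion rather than raw pixels. A minimal sketch using 2D keypoint displacements (the paper also uses optical flow); the shapes below, e.g. 17 COCO-style keypoints, are illustrative assumptions.

```python
import numpy as np

def keypoint_motion_features(keypoints):
    """keypoints: (T, K, 2) trajectories of K 2D keypoints over T frames.
    Returns per-frame displacement vectors, shape (T - 1, K, 2)."""
    return np.diff(keypoints, axis=0)

rng = np.random.default_rng(1)
traj = np.cumsum(rng.normal(size=(8, 17, 2)), axis=0)   # a random-walk trajectory
motion = keypoint_motion_features(traj)                 # sim2real-friendly input
```

Because displacements look statistically similar whether the keypoints come from a rendered SURREAL human or a real video, a network consuming `motion` transfers across the sim2real gap far better than one consuming synthetic RGB.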

Tiny but Accurate: A Pruned, Quantized and Optimized Memristor Crossbar Framework for Ultra Efficient DNN Implementation

Title Tiny but Accurate: A Pruned, Quantized and Optimized Memristor Crossbar Framework for Ultra Efficient DNN Implementation
Authors Xiaolong Ma, Geng Yuan, Sheng Lin, Caiwen Ding, Fuxun Yu, Tao Liu, Wujie Wen, Xiang Chen, Yanzhi Wang
Abstract State-of-the-art DNN structures involve intensive computation and high memory storage. To mitigate these challenges, the memristor crossbar array has emerged as an intrinsically suitable matrix-computation and low-power acceleration framework for DNN applications. However, a high-accuracy solution for extreme model compression on the memristor crossbar array architecture remains an open problem. In this paper, we propose a memristor-based DNN framework which combines structured weight pruning and quantization by incorporating the alternating direction method of multipliers (ADMM) algorithm for better pruning and quantization performance. We also discover the non-optimality of the ADMM solution in weight pruning and the unused data paths in a structured pruned model. Motivated by these discoveries, we design a software-hardware co-optimization framework which contains the first proposed Network Purification and Unused Path Removal algorithms for post-processing a structured pruned model after the ADMM steps. By taking memristor hardware constraints into our whole framework, we achieve extremely high compression ratios on state-of-the-art neural network structures with minimal accuracy loss. When quantizing the structured pruned model, our framework achieves nearly no accuracy loss after quantizing weights to the 8-bit memristor weight representation. We share our models at the anonymous link https://bit.ly/2VnMUy0.
Tasks Model Compression, Quantization
Published 2019-08-27
URL https://arxiv.org/abs/1908.10017v1
PDF https://arxiv.org/pdf/1908.10017v1.pdf
PWC https://paperswithcode.com/paper/tiny-but-accurate-a-pruned-quantized-and
Repo
Framework
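The quantization half of the ADMM formulation alternates a gradient step on the loss with a Euclidean projection of the weights onto the set of representable values. A sketch of that projection for symmetric 8-bit weights; the level scheme here is an assumption for illustration (the paper targets memristor crossbar weight representations):

```python
import numpy as np

def project_to_levels(w, n_bits=8):
    """Project weights onto the nearest symmetric n-bit quantization level."""
    n_levels = 2 ** (n_bits - 1) - 1          # 127 positive levels for 8 bits
    scale = np.max(np.abs(w)) / n_levels
    return np.round(w / scale) * scale        # nearest representable value

w = np.array([0.013, -0.52, 0.9, -1.27])
wq = project_to_levels(w)
```

Inside ADMM this projection defines the auxiliary variable update, while the network weights are trained against a penalty pulling them toward `wq`; the paper's Network Purification and Unused Path Removal then post-process the structured-pruned result.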

Bayesian Inference of Spacecraft Pose using Particle Filtering

Title Bayesian Inference of Spacecraft Pose using Particle Filtering
Authors Maxim Bazik, Brien Flewelling, Manoranjan Majji, Joseph Mundy
Abstract Automated 3D pose estimation of satellites and other known space objects is a critical component of space situational awareness. Ground-based imagery offers a convenient data source for satellite characterization; however, analysis algorithms must contend with atmospheric distortion, variable lighting, and unknown reflectance properties. Traditional feature-based pose estimation approaches are unable to discover an accurate correlation between a known 3D model and imagery given this challenging image environment. This paper presents an innovative method for automated 3D pose estimation of known space objects in the absence of satisfactory texture. The proposed approach fits the silhouette of a known satellite model to ground-based imagery via particle filtering. Each particle contains enough information (orientation, position, scale, model articulation) to generate an accurate object silhouette. The silhouette of each particle is compared to an observed image. Comparison is done probabilistically by calculating the joint probability that pixels inside the silhouette belong to the foreground distribution and that pixels outside the silhouette belong to the background distribution. Both foreground and background distributions are computed by observing empty space. The population of particles is resampled at each new image observation, with the probability of a particle being resampled proportional to how well the particle’s silhouette matches the observation image. The resampling process maintains multiple pose estimates, which is beneficial in preventing and escaping local minima. Experiments were conducted on both commercial imagery and on LEO satellite imagery. Imagery from the commercial experiments is shown in this paper.
Tasks 3D Pose Estimation, Bayesian Inference, Pose Estimation
Published 2019-06-26
URL https://arxiv.org/abs/1906.11182v1
PDF https://arxiv.org/pdf/1906.11182v1.pdf
PWC https://paperswithcode.com/paper/bayesian-inference-of-spacecraft-pose-using
Repo
Framework
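The filter's core loop can be sketched on a one-dimensional toy state. In the paper each particle carries orientation, position, scale, and articulation, and the weight comes from the silhouette foreground/background match; the Gaussian likelihood below is a stand-in for that comparison.

```python
import numpy as np

rng = np.random.default_rng(2)

def resample(particles, weights):
    """Draw a new population with probability proportional to the weights."""
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx]

true_state = 3.0
particles = rng.uniform(0.0, 10.0, 500)          # initial pose hypotheses
# Stand-in observation likelihood: in the paper this is the joint probability
# that pixels inside the silhouette are foreground and outside are background.
w = np.exp(-0.5 * (particles - true_state) ** 2)
w /= w.sum()
particles = resample(particles, w)               # population after one observation
```

Because resampling keeps many hypotheses alive in proportion to their likelihood, the filter can hold several competing pose estimates at once, which is what lets it escape local minima.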

Explainable High-order Visual Question Reasoning: A New Benchmark and Knowledge-routed Network

Title Explainable High-order Visual Question Reasoning: A New Benchmark and Knowledge-routed Network
Authors Qingxing Cao, Bailin Li, Xiaodan Liang, Liang Lin
Abstract Explanation and high-order reasoning capabilities are crucial for real-world visual question answering with diverse levels of inference complexity (e.g., what is the dog that is near the girl playing with?) and important for users to understand and diagnose the trustworthiness of the system. Current VQA benchmarks on natural images with only an accuracy metric end up pushing the models to exploit the dataset biases and cannot provide any interpretable justification, which severely hinders advances in high-level question answering. In this work, we propose a new HVQR benchmark for evaluating explainable and high-order visual question reasoning ability with three distinguishing merits: 1) the questions often contain one or two relationship triplets, which require the model to perform multistep reasoning to predict plausible answers; 2) we provide an explicit evaluation of a multistep reasoning process that is constructed with image scene graphs and commonsense knowledge bases; and 3) each relationship triplet in a large-scale knowledge base appears only once among all questions, which poses challenges for existing networks that often attempt to overfit the knowledge base that already appears in the training set, and forces the models to handle unseen questions and knowledge-fact usage. We also propose a new knowledge-routed modular network (KM-net) that incorporates the multistep reasoning process over a large knowledge base into visual question reasoning. An extensive dataset analysis and comparisons with existing models on the HVQR benchmark show that our benchmark provides explainable evaluations, comprehensive reasoning requirements and realistic challenges for VQA systems, as well as our KM-net’s superiority in terms of accuracy and explanation ability.
Tasks Question Answering, Visual Question Answering
Published 2019-09-23
URL https://arxiv.org/abs/1909.10128v1
PDF https://arxiv.org/pdf/1909.10128v1.pdf
PWC https://paperswithcode.com/paper/190910128
Repo
Framework
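The multistep reasoning the benchmark targets amounts to chaining relationship triplets. A toy sketch with an invented two-fact knowledge base (the benchmark grounds these triplets in image scene graphs and a commonsense KB):

```python
# Answer "what is the dog that is near the girl playing with?" by
# following two relationship triplets in sequence.
kb = {("girl", "near"): "dog", ("dog", "playing_with"): "ball"}

def multistep(entity, relations, kb):
    """Follow a chain of relations from a starting entity through the KB."""
    for rel in relations:
        entity = kb[(entity, rel)]
    return entity

answer = multistep("girl", ["near", "playing_with"], kb)
```

Each hop is explicit, which is what makes the reasoning process evaluable step by step rather than only by final-answer accuracy.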

Learning Feature Relevance Through Step Size Adaptation in Temporal-Difference Learning

Title Learning Feature Relevance Through Step Size Adaptation in Temporal-Difference Learning
Authors Alex Kearney, Vivek Veeriah, Jaden Travnik, Patrick M. Pilarski, Richard S. Sutton
Abstract There is a long history of using meta learning as representation learning, specifically for determining the relevance of inputs. In this paper, we examine an instance of meta-learning in which feature relevance is learned by adapting step size parameters of stochastic gradient descent—building on a variety of prior work in stochastic approximation, machine learning, and artificial neural networks. In particular, we focus on stochastic meta-descent introduced in the Incremental Delta-Bar-Delta (IDBD) algorithm for setting individual step sizes for each feature of a linear function approximator. Using IDBD, a feature with large or small step sizes will have a large or small impact on generalization from training examples. As a main contribution of this work, we extend IDBD to temporal-difference (TD) learning—a form of learning which is effective in sequential, non i.i.d. problems. We derive a variety of IDBD generalizations for TD learning, demonstrating that they are able to distinguish which features are relevant and which are not. We demonstrate that TD IDBD is effective at learning feature relevance in both an idealized gridworld and a real-world robotic prediction task.
Tasks Meta-Learning, Representation Learning
Published 2019-03-08
URL http://arxiv.org/abs/1903.03252v1
PDF http://arxiv.org/pdf/1903.03252v1.pdf
PWC https://paperswithcode.com/paper/learning-feature-relevance-through-step-size
Repo
Framework
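A minimal sketch of the idea: linear TD(0) whose per-feature step sizes are adapted by an IDBD-style meta-update. The update rules below follow Sutton's original IDBD with the TD error plugged in; the exact generalizations derived in the paper may differ in detail, and the toy problem (one relevant feature, one irrelevant noise feature) is invented.

```python
import numpy as np

rng = np.random.default_rng(3)

theta = 0.01                         # meta step size
w = np.zeros(2)                      # linear value-function weights
beta = np.full(2, np.log(0.05))      # log step sizes; alpha_i = exp(beta_i)
h = np.zeros(2)                      # trace of recent weight updates

for _ in range(3000):
    # Feature 0 is relevant (always 1); feature 1 is irrelevant noise.
    x = np.array([1.0, rng.choice([-1.0, 1.0])])
    x_next = np.zeros(2)             # one-step episodes: no bootstrapping
    r = 2.0                          # the target depends only on feature 0
    delta = r + 0.9 * (w @ x_next) - w @ x        # TD error
    beta += theta * delta * x * h                 # IDBD meta-update
    alpha = np.exp(beta)                          # per-feature step sizes
    w += alpha * delta * x
    h = h * np.clip(1.0 - alpha * x * x, 0.0, None) + alpha * delta * x
```

After training, the weight on the relevant feature approaches its true value while the irrelevant feature's weight stays near zero; in the full algorithm the learned step sizes themselves indicate feature relevance.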

From Variational to Deterministic Autoencoders

Title From Variational to Deterministic Autoencoders
Authors Partha Ghosh, Mehdi S. M. Sajjadi, Antonio Vergari, Michael Black, Bernhard Schölkopf
Abstract Variational Autoencoders (VAEs) provide a theoretically-backed and popular framework for deep generative models. However, learning a VAE from data still poses unanswered theoretical questions and considerable practical challenges. In this work, we propose an alternative framework for generative modeling that is simpler, easier to train, and deterministic, yet has many of the advantages of the VAE. We observe that sampling a stochastic encoder in a Gaussian VAE can be interpreted as simply injecting noise into the input of a deterministic decoder. We investigate how substituting this kind of stochasticity with other explicit and implicit regularization schemes can lead to an equally smooth and meaningful latent space without having to force it to conform to an arbitrarily chosen prior. To retrieve a generative mechanism to sample new data points, we introduce an ex-post density estimation step that can be readily applied to the proposed framework as well as existing VAEs, improving their sample quality. We show, in a rigorous empirical study, that the proposed regularized deterministic autoencoders are able to generate samples that are comparable to, or better than, those of VAEs and more powerful alternatives when applied to images as well as to structured data such as molecules.
Tasks Density Estimation
Published 2019-03-29
URL https://arxiv.org/abs/1903.12436v3
PDF https://arxiv.org/pdf/1903.12436v3.pdf
PWC https://paperswithcode.com/paper/from-variational-to-deterministic
Repo
Framework
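The ex-post density estimation step can be sketched in a few lines: after training the deterministic autoencoder, fit a density model to the training set's latent codes and sample from it. The latents below are stand-ins, and a single Gaussian replaces the mixture model the paper actually fits, purely for brevity.

```python
import numpy as np

rng = np.random.default_rng(4)
# Stand-in latent codes; in practice these are encoder outputs on the
# training set, and the fitted density is typically a Gaussian mixture.
latents = rng.normal(0.0, 1.0, (1000, 8)) * np.linspace(0.5, 2.0, 8)

mu = latents.mean(axis=0)
cov = np.cov(latents, rowvar=False)
z_new = rng.multivariate_normal(mu, cov, size=16)   # sample the latent space
# decoder(z_new) would then produce new data points.
```

Because the density is fitted after training, nothing forces the latent space toward an arbitrary prior during learning; the same trick can be bolted onto an already-trained VAE to improve its samples.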

Knowledge Hypergraphs: Prediction Beyond Binary Relations

Title Knowledge Hypergraphs: Prediction Beyond Binary Relations
Authors Bahare Fatemi, Perouz Taslakian, David Vazquez, David Poole
Abstract Knowledge graphs store facts using relations between pairs of entities. In this work, we address the question of link prediction in knowledge hypergraphs where each relation is defined on any number of entities. While there exist techniques (such as reification) that convert the non-binary relations of a knowledge hypergraph into binary ones, current embedding-based methods for knowledge graph completion do not work well out of the box for knowledge graphs obtained through these techniques. Thus we introduce HSimplE and HypE, two embedding-based methods that work directly with knowledge hypergraphs in which the representation of an entity is a function of its position in the relation. We also develop public benchmarks and baselines for this task and show experimentally that the proposed models are more effective than the baselines. Our experiments show that HypE outperforms HSimplE when trained with fewer parameters and when tested on samples that contain at least one entity in a position never encountered during training.
Tasks Knowledge Graph Completion, Knowledge Graphs, Link Prediction
Published 2019-06-01
URL https://arxiv.org/abs/1906.00137v2
PDF https://arxiv.org/pdf/1906.00137v2.pdf
PWC https://paperswithcode.com/paper/190600137
Repo
Framework
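A position-dependent entity representation can be sketched as follows. This is a simplified reading of the paper's idea (inspired by HSimplE's position-based shifting), not the models' actual scoring functions: each entity embedding is circularly shifted by an amount determined by its position in the relation, then combined multiplicatively with the relation embedding.

```python
import numpy as np

def score(rel, entities):
    """Score a non-binary fact rel(e1, ..., ek); higher means more plausible."""
    d = rel.shape[0]
    # Shift each entity embedding according to its position in the relation.
    shifted = [np.roll(e, (i * d) // len(entities)) for i, e in enumerate(entities)]
    return float(np.sum(rel * np.prod(shifted, axis=0)))

rng = np.random.default_rng(5)
rel = rng.normal(size=12)                        # relation embedding
ents = [rng.normal(size=12) for _ in range(3)]   # a ternary fact
s = score(rel, ents)
```

The shift makes the score sensitive to argument order, so `rel(a, b, c)` and `rel(c, b, a)` receive different scores, which is exactly what reified binary encodings struggle to capture.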

AI for Explaining Decisions in Multi-Agent Environments

Title AI for Explaining Decisions in Multi-Agent Environments
Authors Sarit Kraus, Amos Azaria, Jelena Fiosina, Maike Greve, Noam Hazon, Lutz Kolbe, Tim-Benjamin Lembcke, Jörg P. Müller, Sören Schleibaum, Mark Vollrath
Abstract Explanation is necessary for humans to understand and accept decisions made by an AI system when the system’s goal is known. It is even more important when the AI system makes decisions in multi-agent environments where the human does not know the systems’ goals, since they may depend on other agents’ preferences. In such situations, explanations should aim to increase user satisfaction, taking into account the system’s decision, the user’s and the other agents’ preferences, the environment settings, and properties such as fairness, envy and privacy. Generating explanations that will increase user satisfaction is very challenging; to this end, we propose a new research direction: xMASE. We then review the state of the art and discuss research directions towards efficient methodologies and algorithms for generating explanations that will increase users’ satisfaction with AI systems’ decisions in multi-agent environments.
Tasks
Published 2019-10-10
URL https://arxiv.org/abs/1910.04404v2
PDF https://arxiv.org/pdf/1910.04404v2.pdf
PWC https://paperswithcode.com/paper/ai-for-explaining-decisions-in-multi-agent
Repo
Framework

Support Feature Machines

Title Support Feature Machines
Authors Tomasz Maszczyk, Włodzisław Duch
Abstract Support Vector Machines (SVMs) with various kernels have played a dominant role in machine learning for many years, finding numerous applications. Although they have many attractive features, they also have drawbacks: interpretation of their solutions is quite difficult; the use of a single kernel type may not be appropriate in all areas of the input space; convergence problems for some kernels are not uncommon; and the standard quadratic programming solution has $O(m^3)$ time and $O(m^2)$ space complexity for $m$ training patterns. Kernel methods work because they implicitly provide new, useful features. Such features, derived from various kernels and other vector transformations, may be used directly in any machine learning algorithm, facilitating multiresolution, heterogeneous models of data. Therefore Support Feature Machines (SFM), based on linear models in the extended feature spaces and enabling control over the selection of support features, give at least as good results as any kernel-based SVMs, removing all problems related to interpretation, scaling and convergence. This is demonstrated for a number of benchmark datasets analyzed with linear discrimination, SVM, decision trees and nearest neighbor methods.
Tasks
Published 2019-01-28
URL http://arxiv.org/abs/1901.09643v1
PDF http://arxiv.org/pdf/1901.09643v1.pdf
PWC https://paperswithcode.com/paper/support-feature-machines
Repo
Framework
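The core move can be sketched directly: treat kernel evaluations against the training points as explicit features, then fit any linear model on them instead of solving the SVM dual. A least-squares classifier stands in for the linear model here, and the two-ring dataset is an invented example of a problem that is not linearly separable in the input space.

```python
import numpy as np

def rbf_features(X, centers, gamma=1.0):
    """Explicit RBF 'support features': kernel values of X against the centers."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(6)
# Two concentric rings, radius ~1 vs ~3: not linearly separable as given.
r = np.concatenate([rng.normal(1.0, 0.1, 100), rng.normal(3.0, 0.1, 100)])
t = rng.uniform(0.0, 2 * np.pi, 200)
X = np.c_[r * np.cos(t), r * np.sin(t)]
y = np.r_[np.full(100, -1.0), np.full(100, 1.0)]

Phi = rbf_features(X, X)                      # extended feature space
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)   # any linear model works here
acc = np.mean(np.sign(Phi @ w) == y)          # training accuracy
```

Because the features are explicit, the fitted weights say directly which support features matter, and features from several kernel types or resolutions can simply be concatenated.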

An Adaptive Training-less System for Anomaly Detection in Crowd Scenes

Title An Adaptive Training-less System for Anomaly Detection in Crowd Scenes
Authors Arindam Sikdar, Ananda S. Chowdhury
Abstract Anomaly detection in crowd videos has become a popular area of research for the computer vision community. Several existing methods generally perform prior training on the scene, with or without the use of labeled data. However, it is difficult to always guarantee the availability of prior data, especially for scenarios like remote-area surveillance. To address this challenge, we propose an adaptive training-less system capable of detecting anomalies on the fly while dynamically estimating and adjusting its response based on certain parameters. This makes our system both training-less and adaptive in nature. Our pipeline consists of three main components, namely, an adaptive 3D-DCT model for multi-object detection-based association, local motion structure description through saliency-modulated optic flow, and anomaly detection based on the earth mover’s distance (EMD). The proposed model, despite being training-free, is found to achieve comparable performance with several state-of-the-art methods on the publicly available UCSD, UMN, CUHK-Avenue and ShanghaiTech datasets.
Tasks Anomaly Detection, Object Detection
Published 2019-06-03
URL https://arxiv.org/abs/1906.00705v1
PDF https://arxiv.org/pdf/1906.00705v1.pdf
PWC https://paperswithcode.com/paper/190600705
Repo
Framework
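The detection criterion can be sketched with SciPy's earth mover's distance: compare the current frame's motion-magnitude distribution against a reference "normal" distribution and flag frames whose distance is large. The exponential motion model and the threshold rule below are illustrative stand-ins, not the paper's saliency-modulated features or its adaptive parameters.

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(7)
normal_motion = rng.exponential(1.0, 1000)    # reference crowd-motion magnitudes
calm_frame = rng.exponential(1.0, 200)        # a frame of typical motion
panic_frame = rng.exponential(4.0, 200)       # sudden fast motion (anomaly)

d_calm = wasserstein_distance(normal_motion, calm_frame)
d_panic = wasserstein_distance(normal_motion, panic_frame)
anomaly = d_panic > 3.0 * d_calm              # stand-in adaptive threshold
```

Because EMD compares whole distributions rather than single statistics, it stays sensitive to changes in the shape of the motion profile, not just its mean.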

Multidimensional ground reaction forces and moments from wearable sensor accelerations via deep learning

Title Multidimensional ground reaction forces and moments from wearable sensor accelerations via deep learning
Authors William R. Johnson, Ajmal Mian, Mark A. Robinson, Jasper Verheul, David G. Lloyd, Jacqueline A. Alderson
Abstract Objective: Monitoring athlete internal workload exposure, including prevention of catastrophic non-contact knee injuries, relies on the existence of a custom early-warning detection system. This system must be able to estimate accurate, reliable, and valid musculoskeletal joint loads, for sporting maneuvers in near real-time and during match play. However, current methods are constrained to laboratory instrumentation, are labor and cost intensive, and require highly trained specialist knowledge, thereby limiting their ecological validity and volume deployment. Methods: Here we show that kinematic data obtained from wearable sensor accelerometers, in lieu of embedded force platforms, can leverage recent supervised learning techniques to predict in-game near real-time multidimensional ground reaction forces and moments (GRF/M). Competing convolutional neural network (CNN) deep learning models were trained using laboratory-derived stance phase GRF/M data and simulated sensor accelerations for running and sidestepping maneuvers derived from nearly half a million legacy motion trials. Then, predictions were made from each model driven by five sensor accelerations recorded during independent inter-laboratory data capture sessions. Results: Despite adversarial conditions, the proposed deep learning workbench achieved correlations to ground truth, by GRF component, of vertical 0.9663, anterior 0.9579 (both running), and lateral 0.8737 (sidestepping). Conclusion: The lessons learned from this study will facilitate the use of wearable sensors in conjunction with deep learning to accurately estimate near real-time on-field GRF/M. Significance: Coaching, medical, and allied health staff can use this technology to monitor a range of joint loading indicators during game play, with the ultimate aim to minimize the occurrence of non-contact injuries in elite and community-level sports.
Tasks
Published 2019-03-18
URL http://arxiv.org/abs/1903.07221v2
PDF http://arxiv.org/pdf/1903.07221v2.pdf
PWC https://paperswithcode.com/paper/multidimensional-ground-reaction-forces-and
Repo
Framework
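The regression task's shape can be sketched with a single 1-D convolution forward pass: a window of five tri-axial accelerometer traces in, a bank of temporal feature maps out. The weights are random and the layer count is nowhere near the paper's CNNs; this only illustrates the data layout.

```python
import numpy as np

def conv1d(x, kernels, bias):
    """Valid-mode 1-D convolution.
    x: (channels, time); kernels: (out, channels, width); bias: (out,)."""
    out, _, width = kernels.shape
    T = x.shape[1] - width + 1
    y = np.zeros((out, T))
    for o in range(out):
        for s in range(T):
            y[o, s] = np.sum(kernels[o] * x[:, s:s + width]) + bias[o]
    return y

rng = np.random.default_rng(8)
accel = rng.normal(size=(15, 100))    # 5 wearable sensors x 3 axes, 100 samples
k = rng.normal(size=(8, 15, 7))       # 8 temporal filters of width 7
y = conv1d(accel, k, np.zeros(8))     # feature maps; later layers -> GRF/M
```

A trained stack of such layers maps the sensor window to the stance-phase GRF/M components, which is what lets the system replace an embedded force platform in the field.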

Image Fusion via Sparse Regularization with Non-Convex Penalties

Title Image Fusion via Sparse Regularization with Non-Convex Penalties
Authors Nantheera Anantrasirichai, Rencheng Zheng, Ivan Selesnick, Alin Achim
Abstract The L1 norm regularized least squares method is often used for finding sparse approximate solutions and is widely used in 1-D signal restoration. Basis pursuit denoising (BPD) performs noise reduction in this way. However, the shortcoming of using L1 norm regularization is the underestimation of the true solution. Recently, a class of non-convex penalties has been proposed to improve this situation. This kind of penalty function is non-convex itself, but preserves the convexity property of the whole cost function. This approach has been confirmed to offer good performance in 1-D signal denoising. This paper extends the aforementioned method to 2-D signals (images) and applies it to multisensor image fusion. The problem is posed as an inverse one and a corresponding cost function is judiciously designed to include two data attachment terms. The whole cost function is proved to be convex upon suitably choosing the non-convex penalty, so that the cost function minimization can be tackled by convex optimization approaches, which comprise simple computations. The performance of the proposed method is benchmarked against a number of state-of-the-art image fusion techniques and superior performance is demonstrated both visually and in terms of various assessment measures.
Tasks Denoising
Published 2019-05-23
URL https://arxiv.org/abs/1905.09645v3
PDF https://arxiv.org/pdf/1905.09645v3.pdf
PWC https://paperswithcode.com/paper/image-fusion-via-sparse-regularization-with
Repo
Framework
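The convex baseline behind BPD-style restoration can be sketched with ISTA: solve min_x 0.5||y - Ax||^2 + lam*||x||_1 by alternating a gradient step with soft thresholding. The paper swaps the L1 penalty for a non-convex one that keeps the total cost convex (and so reduces the shrinkage bias shown here); this sketch uses the plain L1 version on an invented sparse-recovery problem.

```python
import numpy as np

def soft(x, t):
    """Soft-thresholding operator: the proximal map of the L1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista(A, y, lam, n_iter=800):
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = soft(x - A.T @ (A @ x - y) / L, lam / L)
    return x

rng = np.random.default_rng(9)
A = rng.normal(size=(50, 100))             # underdetermined measurement matrix
x_true = np.zeros(100)
x_true[[5, 40, 77]] = [3.0, -2.0, 4.0]     # sparse ground truth
y = A @ x_true + 0.01 * rng.normal(size=50)
x_hat = ista(A, y, lam=0.5)
```

The recovered support matches the ground truth while off-support coefficients are driven to zero; the soft threshold's systematic shrinkage of the nonzero values is exactly the underestimation the non-convex penalties are designed to reduce.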

DeepBiRD: An Automatic Bibliographic Reference Detection Approach

Title DeepBiRD: An Automatic Bibliographic Reference Detection Approach
Authors Syed Tahseen Raza Rizvi, Andreas Dengel, Sheraz Ahmed
Abstract The contribution of this paper is twofold. First, it presents a novel approach called DeepBiRD, which is inspired by human visual perception and exploits layout features to identify individual references in a scientific publication. Second, we present a new dataset for image-based reference detection with 2401 scans containing 12244 references, all manually annotated at the individual-reference level. Our proposed approach consists of two stages: first, it identifies whether a given document image is single-column or multi-column, and using this information the document image is split into individual columns; second, it performs layout-driven reference detection using Mask R-CNN in a given scientific publication. DeepBiRD was evaluated on two different datasets to demonstrate the generalization of this approach. The proposed system achieved an F-measure of 0.96 on our dataset and detected 2.5 times more references than the current state-of-the-art approach on their own dataset. These results suggest that DeepBiRD is significantly superior in performance, generalizable, and independent of any domain or referencing style.
Tasks
Published 2019-12-16
URL https://arxiv.org/abs/1912.07266v1
PDF https://arxiv.org/pdf/1912.07266v1.pdf
PWC https://paperswithcode.com/paper/deepbird-an-automatic-bibliographic-reference
Repo
Framework
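The first stage, deciding single- versus multi-column, can be approximated with a classic layout heuristic: look for a whitespace valley in the page's vertical ink projection. This is a hand-crafted stand-in for illustration; DeepBiRD itself learns the layout analysis, and uses Mask R-CNN for the reference detection proper.

```python
import numpy as np

def looks_two_column(binary_page, min_gap=5):
    """binary_page: (H, W) array, 1 = ink. True if a long empty vertical
    run (a gutter) exists near the horizontal center of the page."""
    ink = binary_page.sum(axis=0)                       # ink per pixel column
    mid = slice(binary_page.shape[1] // 3, 2 * binary_page.shape[1] // 3)
    runs = np.convolve(ink[mid] == 0, np.ones(min_gap), "valid")
    return float(np.max(runs)) >= min_gap               # a full empty run found

page = np.ones((100, 60), dtype=int)     # binarized page, fully inked
page[:, 28:33] = 0                       # carve out a central gutter
single = np.ones((100, 60), dtype=int)   # single-column page, no gutter
```

Once the column decision is made, each column can be cropped and passed to the detector independently, so references never straddle a column boundary.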

GAN-based Multiple Adjacent Brain MRI Slice Reconstruction for Unsupervised Alzheimer’s Disease Diagnosis

Title GAN-based Multiple Adjacent Brain MRI Slice Reconstruction for Unsupervised Alzheimer’s Disease Diagnosis
Authors Changhee Han, Leonardo Rundo, Kohei Murao, Zoltán Ádám Milacski, Kazuki Umemoto, Evis Sala, Hideki Nakayama, Shin’ichi Satoh
Abstract Unsupervised learning can discover various unseen diseases, relying on large-scale unannotated medical images of healthy subjects. Towards this, unsupervised methods reconstruct a single medical image to detect outliers either in the learned feature space or from high reconstruction loss. However, without considering continuity between multiple adjacent slices, they cannot directly discriminate diseases composed of the accumulation of subtle anatomical anomalies, such as Alzheimer’s Disease (AD). Moreover, no study has shown how unsupervised anomaly detection is associated with disease stages. Therefore, we propose a two-step method using Generative Adversarial Network-based multiple adjacent brain MRI slice reconstruction to detect AD at various stages: (Reconstruction) Wasserstein loss with Gradient Penalty + L1 loss—trained on 3 healthy slices to reconstruct the next 3 ones—reconstructs unseen healthy/AD cases; (Diagnosis) Average/Maximum loss (e.g., L2 loss) per scan discriminates them, comparing the reconstructed and ground-truth images. The results show that we can reliably detect AD at a very early stage with Area Under the Curve (AUC) 0.780 while detecting AD at a late stage much more accurately with AUC 0.917; since our method is fully unsupervised, it should also discover and flag any anomalies, including rare diseases.
Tasks Anomaly Detection, Unsupervised Anomaly Detection
Published 2019-06-14
URL https://arxiv.org/abs/1906.06114v5
PDF https://arxiv.org/pdf/1906.06114v5.pdf
PWC https://paperswithcode.com/paper/gan-based-multiple-adjacent-brain-mri-slice
Repo
Framework
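The diagnosis step can be sketched independently of the GAN: given a model that reconstructs the next slices from the previous ones, score each scan by the average or maximum per-slice L2 reconstruction error, with high scores suggesting anomaly. The reconstructions below are synthetic stand-ins (the real ones come from the trained WGAN-GP + L1 model).

```python
import numpy as np

def scan_score(recon, truth, reduce="max"):
    """Per-scan anomaly score from per-slice mean squared reconstruction error."""
    errs = np.mean((recon - truth) ** 2, axis=(1, 2))   # one error per slice
    return errs.max() if reduce == "max" else errs.mean()

rng = np.random.default_rng(10)
truth = rng.normal(size=(3, 64, 64))                    # ground-truth slices
healthy_recon = truth + 0.05 * rng.normal(size=truth.shape)  # model fits well
ad_recon = truth + 0.50 * rng.normal(size=truth.shape)       # model fails on AD

flag = scan_score(ad_recon, truth) > scan_score(healthy_recon, truth)
```

Because the generator is trained only on healthy slices, its reconstruction error grows with anatomical deviation, which is what links the score to disease stage.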