January 27, 2020

2895 words 14 mins read

Paper Group ANR 1237

Spooky effect in optimal OSPA estimation and how GOSPA solves it. View-invariant Deep Architecture for Human Action Recognition using late fusion. Vision-model-based Real-time Localization of Unmanned Aerial Vehicle for Autonomous Structure Inspection under GPS-denied Environment. RSA: Randomized Simulation as Augmentation for Robust Human Action R …

Spooky effect in optimal OSPA estimation and how GOSPA solves it

Title Spooky effect in optimal OSPA estimation and how GOSPA solves it
Authors Ángel F. García-Fernández, Lennart Svensson
Abstract In this paper, we show the spooky effect at a distance that arises in optimal estimation of multiple targets with the optimal sub-pattern assignment (OSPA) metric. This effect refers to the fact that if we have several independent potential targets at distant locations, a change in the probability of existence of one of them can completely change the optimal estimation of the rest of the potential targets. As opposed to OSPA, the generalised OSPA (GOSPA) metric ($\alpha=2$) penalises localisation errors for properly detected targets, false targets and missed targets. As a consequence, optimal GOSPA estimation aims to lower the number of false and missed targets, as well as the localisation error for properly detected targets, and avoids the spooky effect.
Tasks
Published 2019-08-23
URL https://arxiv.org/abs/1908.08815v1
PDF https://arxiv.org/pdf/1908.08815v1.pdf
PWC https://paperswithcode.com/paper/spooky-effect-in-optimal-ospa-estimation-and
Repo
Framework
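To make the comparison concrete, here is a brute-force sketch of OSPA and GOSPA ($\alpha=2$) for small sets of 2-D points. It follows the standard definitions. For $|X| \le |Y|$, GOSPA is $\big(\min_\pi \sum_i d^{(c)}(x_i, y_{\pi(i)})^p + \tfrac{c^p}{2}(|Y|-|X|)\big)^{1/p}$. OSPA uses $c^p$ per unassigned point and divides the whole sum by $|Y|$. Several choices are assumptions made for brevity: the Euclidean base distance, the defaults $c=2$ and $p=2$, and the exhaustive permutation search. Practical implementations use an optimal assignment solver instead.

```python
import itertools
import math


def _cutoff_dist(a, b, c):
    # Euclidean base distance saturated at the cut-off c: d^(c)(a, b) = min(d(a, b), c)
    return min(math.dist(a, b), c)


def _best_assignment_cost(small, large, c, p):
    # Brute force over injective maps small -> large; only practical for tiny sets
    if not small:
        return 0.0
    best = math.inf
    for perm in itertools.permutations(range(len(large)), len(small)):
        cost = sum(_cutoff_dist(small[i], large[j], c) ** p for i, j in enumerate(perm))
        best = min(best, cost)
    return best


def gospa(X, Y, c=2.0, p=2.0, alpha=2.0):
    small, large = (X, Y) if len(X) <= len(Y) else (Y, X)
    cost = _best_assignment_cost(small, large, c, p)
    cost += (c ** p / alpha) * (len(large) - len(small))  # missed or false targets
    return cost ** (1.0 / p)


def ospa(X, Y, c=2.0, p=2.0):
    small, large = (X, Y) if len(X) <= len(Y) else (Y, X)
    n = len(large)
    if n == 0:
        return 0.0
    cost = _best_assignment_cost(small, large, c, p) + c ** p * (n - len(small))
    return (cost / n) ** (1.0 / p)  # normalised by the size of the larger set


# One true target at the origin, empty estimate (a missed target)
print(gospa([(0.0, 0.0)], []))  # sqrt(c^p / 2)
print(ospa([(0.0, 0.0)], []))   # c
```

OSPA divides by the size of the larger set. The contribution of any one target therefore depends on how many other targets are present, whereas GOSPA's per-target costs simply add up.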

View-invariant Deep Architecture for Human Action Recognition using late fusion

Title View-invariant Deep Architecture for Human Action Recognition using late fusion
Authors Chhavi Dhiman, Dinesh Kumar Vishwakarma
Abstract Human action recognition for unknown views is a challenging task. We propose a view-invariant deep human action recognition framework, which is a novel integration of two important action cues: motion and shape temporal dynamics (STD). The motion stream encapsulates the motion content of an action as RGB Dynamic Images (RGB-DIs), which are processed by a fine-tuned InceptionV3 model. The STD stream learns long-term view-invariant shape dynamics of an action using human pose model (HPM) based view-invariant features mined from key depth human pose frames selected with a structural similarity index matrix (SSIM). To predict the score of a test sample, three types of late fusion (maximum, average and product) are applied to the individual stream scores. To validate the performance of the proposed framework, experiments are performed using both cross-subject and cross-view validation schemes on three publicly available benchmarks: the NUCLA multi-view dataset, the UWA3D-II Activity dataset and the NTU RGB-D Activity dataset. Our algorithm significantly outperforms existing state-of-the-art methods in terms of accuracy, receiver operating characteristic (ROC) curve and area under the curve (AUC).
Tasks Temporal Action Localization
Published 2019-12-08
URL https://arxiv.org/abs/1912.03632v1
PDF https://arxiv.org/pdf/1912.03632v1.pdf
PWC https://paperswithcode.com/paper/view-invariant-deep-architecture-for-human
Repo
Framework
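The three late-fusion rules can be sketched as element-wise operations on the per-class scores of the two streams, followed by an argmax. The abstract does not say how the stream scores are normalised, so this sketch assumes softmax-style probabilities. The example scores are hypothetical.

```python
def late_fusion(motion_scores, std_scores, mode="average"):
    """Fuse per-class scores from the motion and STD streams; return (fused, predicted_class)."""
    if len(motion_scores) != len(std_scores):
        raise ValueError("both streams must score the same classes")
    rules = {
        "maximum": max,
        "average": lambda a, b: (a + b) / 2.0,
        "product": lambda a, b: a * b,
    }
    if mode not in rules:
        raise ValueError(f"unknown fusion mode: {mode}")
    fused = [rules[mode](a, b) for a, b in zip(motion_scores, std_scores)]
    # Predicted action = class with the highest fused score
    predicted = max(range(len(fused)), key=fused.__getitem__)
    return fused, predicted


motion = [0.7, 0.2, 0.1]  # hypothetical softmax output of the motion stream
std = [0.1, 0.5, 0.4]     # hypothetical softmax output of the STD stream
for mode in ("maximum", "average", "product"):
    print(mode, late_fusion(motion, std, mode)[1])
```

Here maximum and average fusion pick class 0, while product fusion picks class 1. The product rule favours classes that both streams find plausible, so it can overrule a single confident stream.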

Vision-model-based Real-time Localization of Unmanned Aerial Vehicle for Autonomous Structure Inspection under GPS-denied Environment

Title Vision-model-based Real-time Localization of Unmanned Aerial Vehicle for Autonomous Structure Inspection under GPS-denied Environment
Authors Zhexiong Shang, Zhigang Shen
Abstract UAVs have been widely used in visual inspections of buildings, bridges and other structures. In both outdoor autonomous and semi-autonomous flight missions, a strong GPS signal is vital for a UAV to locate its own position. However, a strong GPS signal is not always available: it can degrade or be lost entirely underneath large structures or close to power lines, which can cause serious control issues or even UAV crashes. Such limitations severely restrict the application of UAVs as a routine inspection tool in various domains. In this paper, a vision-model-based real-time self-positioning method is proposed to support autonomous aerial inspection without the need for GPS. Compared to other localization methods that require additional onboard sensors, the proposed method uses a single camera to continuously estimate the in-flight poses of the UAV. Each step of the proposed method is discussed in detail, and its performance is tested in an indoor test case.
Tasks
Published 2019-04-10
URL http://arxiv.org/abs/1904.04987v1
PDF http://arxiv.org/pdf/1904.04987v1.pdf
PWC https://paperswithcode.com/paper/vision-model-based-real-time-localization-of
Repo
Framework

RSA: Randomized Simulation as Augmentation for Robust Human Action Recognition

Title RSA: Randomized Simulation as Augmentation for Robust Human Action Recognition
Authors Yi Zhang, Xinyue Wei, Weichao Qiu, Zihao Xiao, Gregory D. Hager, Alan Yuille
Abstract Despite the rapid growth in datasets for video activity, stable and robust activity recognition with neural networks remains challenging. This is in large part due to the explosion of possible variation in video, including lighting changes, object variation, movement variation, and changes in surrounding context. An alternative is to make use of simulation data, where all of these factors can be artificially controlled. In this paper, we propose the Randomized Simulation as Augmentation (RSA) framework, which augments real-world training data with synthetic data to improve the robustness of action recognition networks. We generate large-scale synthetic datasets with randomized nuisance factors. We show that training with such extra data, when appropriately constrained, can significantly improve the performance of state-of-the-art I3D networks or, conversely, reduce the number of labeled real videos needed to achieve good performance. Experiments on two real-world datasets, NTU RGB+D and VIRAT, demonstrate the effectiveness of our method.
Tasks Activity Recognition, Temporal Action Localization
Published 2019-12-03
URL https://arxiv.org/abs/1912.01180v1
PDF https://arxiv.org/pdf/1912.01180v1.pdf
PWC https://paperswithcode.com/paper/rsa-randomized-simulation-as-augmentation-for
Repo
Framework
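The abstract names two ingredients: randomised nuisance factors for each synthetic clip, and mixing synthetic clips into real training data. It lists neither the factors nor the constraint, so everything below is an illustration under assumptions. Every parameter name and range is hypothetical, and so is the fixed synthetic fraction per batch.

```python
import random
from dataclasses import dataclass


@dataclass
class NuisanceParams:
    # Hypothetical nuisance factors; the paper's actual set and ranges are not given in the abstract
    light_intensity: float
    camera_azimuth_deg: float
    camera_elevation_deg: float
    background_id: int
    texture_id: int


def sample_nuisance(rng, n_backgrounds=50, n_textures=200):
    # Draw an independent random value for every factor before rendering one clip
    return NuisanceParams(
        light_intensity=rng.uniform(0.2, 1.5),
        camera_azimuth_deg=rng.uniform(0.0, 360.0),
        camera_elevation_deg=rng.uniform(-10.0, 45.0),
        background_id=rng.randrange(n_backgrounds),
        texture_id=rng.randrange(n_textures),
    )


def mixed_batch(real, synthetic, batch_size, synth_fraction, rng):
    # Hypothetical constraint: cap the share of synthetic clips in every batch
    n_synth = round(batch_size * synth_fraction)
    batch = rng.sample(real, batch_size - n_synth) + rng.sample(synthetic, n_synth)
    rng.shuffle(batch)
    return batch


rng = random.Random(0)
print(sample_nuisance(rng))
real = [("real", i) for i in range(20)]
synthetic = [("syn", i) for i in range(20)]
print(mixed_batch(real, synthetic, batch_size=8, synth_fraction=0.25, rng=rng))
```

Capping the synthetic fraction is one simple way to keep real data dominant. The paper's own constraint may differ.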

An Accelerated Correlation Filter Tracker

Title An Accelerated Correlation Filter Tracker
Authors Tianyang Xu, Zhen-Hua Feng, Xiao-Jun Wu, Josef Kittler
Abstract Recent visual object tracking methods have witnessed a continuous improvement in the state-of-the-art with the development of efficient discriminative correlation filters (DCF) and robust deep neural network features. Despite the outstanding performance achieved by this combination, existing advanced trackers suffer from the high computational complexity of deep feature extraction and online model learning. We propose an accelerated ADMM optimisation method obtained by adding momentum to the optimisation sequence iterates, and by relaxing the impact of the error between DCF parameters and their norm. The proposed optimisation method is applied to an innovative formulation of the DCF design, which seeks the most discriminative spatially regularised feature channels. A further speed-up is achieved by an adaptive initialisation of the filter optimisation process. The significantly faster convergence of the DCF filter is demonstrated by establishing the equivalence between the optimisation process and a continuous dynamical system whose convergence properties can readily be derived. The experimental results obtained on several well-known benchmarking datasets demonstrate the efficiency and robustness of the proposed ACFT method, with a tracking accuracy comparable to state-of-the-art trackers.
Tasks Object Tracking, Visual Object Tracking
Published 2019-12-05
URL https://arxiv.org/abs/1912.02854v1
PDF https://arxiv.org/pdf/1912.02854v1.pdf
PWC https://paperswithcode.com/paper/an-accelerated-correlation-filter-tracker
Repo
Framework
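For readers new to DCFs, the core idea is ridge regression solved element-wise in the Fourier domain. For a single-channel training signal $x$ with desired Gaussian response $y$, the filter is $\hat{h} = \hat{y}\odot\hat{x}^{*} / (\hat{x}\odot\hat{x}^{*} + \lambda)$. The 1-D sketch below uses a naive DFT so it needs no libraries, and it shows that the response peak follows a circular shift of the target. It is the classic single-channel filter, not ACFT, which learns multi-channel, spatially regularised filters with an accelerated ADMM solver. The signal length, $\lambda$ and the label width are assumptions.

```python
import cmath
import math
import random


def dft(v, inverse=False):
    # Naive O(n^2) discrete Fourier transform, to keep the sketch dependency-free
    n = len(v)
    sign = 1 if inverse else -1
    out = [sum(v[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [o / n for o in out] if inverse else out


def train_filter(x, y, lam=1e-2):
    # Element-wise ridge regression: H = Y * conj(X) / (|X|^2 + lam)
    X, Y = dft(x), dft(y)
    return [Yk * Xk.conjugate() / (abs(Xk) ** 2 + lam) for Xk, Yk in zip(X, Y)]


def response(H, z):
    # Correlation response of the filter on a new signal z (real part of the inverse DFT)
    Z = dft(z)
    return [r.real for r in dft([Hk * Zk for Hk, Zk in zip(H, Z)], inverse=True)]


n = 64
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(n)]                          # training patch (1-D)
y = [math.exp(-((t - 32) ** 2) / (2 * 2.0 ** 2)) for t in range(n)]  # Gaussian label peaked at 32
H = train_filter(x, y)
z = x[-3:] + x[:-3]                                                  # target moved 3 samples right
r = response(H, z)
print(max(range(n), key=r.__getitem__))  # peak index moves with the target
```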

Efficient and Adaptive Kernelization for Nonlinear Max-margin Multi-view Learning

Title Efficient and Adaptive Kernelization for Nonlinear Max-margin Multi-view Learning
Authors Changying Du, Jia He, Changde Du, Fuzhen Zhuang, Qing He, Guoping Long
Abstract Existing multi-view learning methods based on kernel functions either require the user to select and tune a single predefined kernel or have to compute and store many Gram matrices to perform multiple kernel learning. Apart from the huge consumption of manpower, computation and memory resources, most of these models seek point estimates of their parameters and are prone to overfitting small training sets. This paper presents an adaptive kernel nonlinear max-margin multi-view learning model under the Bayesian framework. Specifically, we regularize the posterior of an efficient multi-view latent variable model by explicitly mapping the latent representations extracted from multiple data views to a random Fourier feature space where max-margin classification constraints are imposed. Assuming these random features are drawn from Dirichlet process Gaussian mixtures, we can adaptively learn shift-invariant kernels from data according to Bochner's theorem. For inference, we employ the data augmentation idea for hinge loss and design an efficient gradient-based MCMC sampler in the augmented space. Since it never computes the Gram matrix, our algorithm scales linearly with the size of the training set. Extensive experiments on real-world datasets demonstrate that our method has superior performance.
Tasks Data Augmentation, MULTI-VIEW LEARNING
Published 2019-10-11
URL https://arxiv.org/abs/1910.05250v1
PDF https://arxiv.org/pdf/1910.05250v1.pdf
PWC https://paperswithcode.com/paper/efficient-and-adaptive-kernelization-for
Repo
Framework
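Bochner's theorem is what lets the model avoid Gram matrices. A suitably scaled shift-invariant positive-definite kernel is the Fourier transform of a probability density, so sampling frequencies from that density yields explicit features whose inner products approximate the kernel. The sketch below uses a single zero-mean Gaussian, which recovers a fixed RBF kernel. The paper instead places a Dirichlet process Gaussian mixture on the frequencies to learn the kernel from data, which this sketch does not attempt. The dimensions, bandwidth and number of features are assumptions.

```python
import math
import random


def make_rff(dim, n_features, sigma, seed=0):
    """Random Fourier features z(x) with E[z(x) . z(y)] = exp(-||x - y||^2 / (2 sigma^2))."""
    rng = random.Random(seed)
    # Bochner: the RBF kernel's spectral density is a zero-mean Gaussian with std 1/sigma
    omegas = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)] for _ in range(n_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)

    def features(x):
        return [scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
                for w, b in zip(omegas, phases)]

    return features


def rbf(x, y, sigma):
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / (2.0 * sigma ** 2))


phi = make_rff(dim=3, n_features=4000, sigma=1.0)
x, y = [0.1, 0.2, 0.3], [0.4, -0.1, 0.5]
approx = sum(a * b for a, b in zip(phi(x), phi(y)))
print(approx, rbf(x, y, 1.0))  # close but not equal: the estimate is random
```

Each point is mapped independently at a cost proportional to the number of features. No pairwise kernel evaluations are needed, which is how training cost can grow linearly with the number of samples.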

Texture and Structure Two-view Classification of Images

Title Texture and Structure Two-view Classification of Images
Authors Samah Khawaled, Michael Zibulevsky, Yehoshua Y. Zeevi
Abstract Textural and structural features can be regarded as “two-view” feature sets. Inspired by the recent progress in multi-view learning, we propose a novel two-view classification method that models each feature set and optimizes the process of merging these views efficiently. Examples of implementation of this approach in classification of real-world data are presented, with special emphasis on medical images. We first decompose fully textured images into two layers of representation, corresponding to natural stochastic textures (NST) and structure, respectively. The structural, edge-and-curve-type information is mostly represented by the local spatial phase, whereas the pure NST has random phase and is characterized by Gaussianity and self-similarity. Therefore, the NST is modeled by the 2D self-similar process fractional Brownian motion (fBm). The Hurst parameter, characteristic of fBm, specifies the roughness or irregularity of the texture. We therefore estimate it and use it alongside other features extracted from the structure layer to build the “two-view” feature sets used in our classification scheme. A shallow neural net (NN) is used to merge these feature sets in a straightforward and efficient manner.
Tasks MULTI-VIEW LEARNING
Published 2019-08-25
URL https://arxiv.org/abs/1908.09264v1
PDF https://arxiv.org/pdf/1908.09264v1.pdf
PWC https://paperswithcode.com/paper/texture-and-structure-two-view-classification
Repo
Framework
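The Hurst parameter enters through the scaling law of fBm increments, $\mathrm{Var}[B_H(t+\tau) - B_H(t)] \propto \tau^{2H}$. The 1-D sketch below estimates $H$ from the slope of a log-log variogram and only illustrates that law. The paper works with 2-D fBm textures and its estimator is not specified in the abstract. The lags, the least-squares fit and the Brownian test signal are therefore assumptions.

```python
import math
import random


def estimate_hurst(x, lags=(1, 2, 4, 8, 16, 32)):
    """Variogram estimate of H from Var[x(t + tau) - x(t)] ~ tau^(2H)."""
    log_tau, log_var = [], []
    for tau in lags:
        inc = [x[t + tau] - x[t] for t in range(len(x) - tau)]
        mean = sum(inc) / len(inc)
        var = sum((d - mean) ** 2 for d in inc) / len(inc)
        log_tau.append(math.log(tau))
        log_var.append(math.log(var))
    # Least-squares slope of log-variance against log-lag equals 2H
    mt = sum(log_tau) / len(log_tau)
    mv = sum(log_var) / len(log_var)
    slope = (sum((a - mt) * (b - mv) for a, b in zip(log_tau, log_var))
             / sum((a - mt) ** 2 for a in log_tau))
    return slope / 2.0


def simulate_brownian(n, seed=0):
    # Ordinary Brownian motion is fBm with H = 0.5
    rng = random.Random(seed)
    path = [0.0]
    for _ in range(n):
        path.append(path[-1] + rng.gauss(0.0, 1.0))
    return path


print(estimate_hurst(simulate_brownian(20000, seed=1)))  # should be near 0.5
```

Values of $H$ closer to 0 correspond to rougher textures, and values closer to 1 to smoother ones. This is why a single estimated $H$ can serve as a compact texture feature.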

Weight of Evidence as a Basis for Human-Oriented Explanations

Title Weight of Evidence as a Basis for Human-Oriented Explanations
Authors David Alvarez-Melis, Hal Daumé III, Jennifer Wortman Vaughan, Hanna Wallach
Abstract Interpretability is an elusive but highly sought-after characteristic of modern machine learning methods. Recent work has focused on interpretability via $\textit{explanations}$, which justify individual model predictions. In this work, we take a step towards reconciling machine explanations with those that humans produce and prefer by taking inspiration from the study of explanation in philosophy, cognitive science, and the social sciences. We identify key aspects in which these human explanations differ from current machine explanations, distill them into a list of desiderata, and formalize them into a framework via the notion of $\textit{weight of evidence}$ from information theory. Finally, we instantiate this framework in two simple applications and show it produces intuitive and comprehensible explanations.
Tasks
Published 2019-10-29
URL https://arxiv.org/abs/1910.13503v1
PDF https://arxiv.org/pdf/1910.13503v1.pdf
PWC https://paperswithcode.com/paper/weight-of-evidence-as-a-basis-for-human
Repo
Framework
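Weight of evidence is the log-likelihood ratio $\mathrm{woe}(h : e) = \log \frac{P(e \mid h)}{P(e \mid \neg h)}$. Its appeal for explanations is additivity. Adding the weights of pieces of evidence to the prior log-odds gives the posterior log-odds, provided the pieces are conditionally independent given $h$ and given $\neg h$. Each piece's contribution can then be reported separately. The sketch shows only this textbook identity, not the paper's framework or desiderata, and the probabilities are made up.

```python
import math


def weight_of_evidence(p_e_given_h, p_e_given_not_h):
    """woe(h : e) = log P(e | h) / P(e | not h), in nats."""
    return math.log(p_e_given_h / p_e_given_not_h)


def posterior_log_odds(prior_h, woes):
    # Assumes the pieces of evidence are conditionally independent given h and given not-h
    return math.log(prior_h / (1.0 - prior_h)) + sum(woes)


def to_probability(log_odds):
    return 1.0 / (1.0 + math.exp(-log_odds))


# Two hypothetical pieces of evidence about h, starting from a 50/50 prior
woes = [weight_of_evidence(0.8, 0.2), weight_of_evidence(0.3, 0.6)]
print(woes)
print(to_probability(posterior_log_odds(0.5, woes)))
```

The first piece of evidence supports $h$ (positive weight) and the second counts against it (negative weight). An explanation can list both with their sizes.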

Extracting Frequent Gradual Patterns Using Constraints Modeling

Title Extracting Frequent Gradual Patterns Using Constraints Modeling
Authors Jerry Lonlac, Saïd Jabbour, Engelbert Mephu Nguifo, Lakhdar Saïs, Badran Raddaoui
Abstract In this paper, we propose a constraint-based modeling approach for the problem of discovering frequent gradual patterns in a numerical dataset. This SAT-based declarative approach offers an additional possibility to benefit from recent progress in satisfiability testing and to exploit the efficiency of modern SAT solvers for enumerating all frequent gradual patterns in a numerical dataset. Our approach can easily be extended with extra constraints, such as temporal constraints, in order to extract more specific patterns in a broad range of gradual pattern mining applications. We show the practical feasibility of our SAT model by running experiments on two real-world datasets.
Tasks
Published 2019-03-20
URL http://arxiv.org/abs/1903.08452v1
PDF http://arxiv.org/pdf/1903.08452v1.pdf
PWC https://paperswithcode.com/paper/extracting-frequent-gradual-patterns-using
Repo
Framework
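A brute-force miner illustrates the patterns that the SAT encoding enumerates. A gradual pattern such as {A+, B-} reads "the more A, the less B". This sketch assumes one common support definition: the fraction of object pairs whose values all move in the directions the pattern prescribes. The paper's exact definition and its SAT encoding are not reproduced, and the tiny dataset is invented.

```python
from itertools import combinations, product


def gradual_support(rows, pattern):
    """rows: list of dicts attribute -> number; pattern: tuple of (attribute, '+' or '-')."""
    n = len(rows)
    if n < 2:
        return 0.0

    def respects(i, j):
        # Going from row i to row j, every attribute must move strictly in its direction
        for attr, d in pattern:
            delta = rows[j][attr] - rows[i][attr]
            if (d == '+' and delta <= 0) or (d == '-' and delta >= 0):
                return False
        return True

    hits = sum(1 for i in range(n) for j in range(n) if i != j and respects(i, j))
    return hits / (n * (n - 1) / 2)  # each unordered pair can support a pattern at most once


def frequent_gradual_patterns(rows, min_support):
    attrs = sorted(rows[0])
    found = []
    for k in range(1, len(attrs) + 1):
        for subset in combinations(attrs, k):
            # Fix the first item to '+': a pattern and its full reversal have equal support
            for tail in product('+-', repeat=k - 1):
                pattern = tuple(zip(subset, ('+',) + tail))
                s = gradual_support(rows, pattern)
                if s >= min_support:
                    found.append((pattern, s))
    return found


rows = [  # invented numerical dataset with attributes A, B, C
    {'A': 1, 'B': 4, 'C': 1},
    {'A': 2, 'B': 3, 'C': 3},
    {'A': 3, 'B': 2, 'C': 2},
    {'A': 4, 'B': 1, 'C': 4},
]
for pattern, s in frequent_gradual_patterns(rows, min_support=0.8):
    print(pattern, round(s, 3))
```

The exhaustive loop grows exponentially with the number of attributes. Delegating that search to a SAT solver, and adding extra constraints as further clauses, is the point of the declarative approach.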

Generating and Sampling Orbits for Lifted Probabilistic Inference

Title Generating and Sampling Orbits for Lifted Probabilistic Inference
Authors Steven Holtzen, Todd Millstein, Guy Van den Broeck
Abstract A key goal in the design of probabilistic inference algorithms is identifying and exploiting properties of the distribution that make inference tractable. Lifted inference algorithms identify symmetry as a property that enables efficient inference and seek to scale with the degree of symmetry of a probability model. A limitation of existing exact lifted inference techniques is that they do not apply to non-relational representations like factor graphs. In this work we provide the first example of an exact lifted inference algorithm for arbitrary discrete factor graphs. In addition we describe a lifted Markov-Chain Monte-Carlo algorithm that provably mixes rapidly in the degree of symmetry of the distribution.
Tasks
Published 2019-03-12
URL https://arxiv.org/abs/1903.04672v3
PDF https://arxiv.org/pdf/1903.04672v3.pdf
PWC https://paperswithcode.com/paper/generating-and-sampling-orbits-for-lifted
Repo
Framework
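Orbits are the equivalence classes of variable assignments under a symmetry group. If a model's weight is invariant under a permutation of its variables, all assignments in one orbit share the same weight. The partition function is then a sum, over orbits, of orbit size times the weight of one representative. The sketch illustrates this on a toy model with a cyclic symmetry. Unlike the algorithms in the paper, it enumerates every assignment to build the orbits, so it shows the structure being exploited rather than any speed-up. The generator format and weight function are assumptions.

```python
import math
from collections import deque
from itertools import product


def orbit(assignment, generators):
    """All assignments reachable from `assignment` under the permutation generators.

    A generator g maps assignment a to (a[g[0]], a[g[1]], ...).
    """
    start = tuple(assignment)
    seen = {start}
    queue = deque([start])
    while queue:
        a = queue.popleft()
        for g in generators:
            b = tuple(a[g[i]] for i in range(len(a)))
            if b not in seen:
                seen.add(b)
                queue.append(b)
    return seen


def orbit_partition(n_vars, generators, domain=(0, 1)):
    remaining = set(product(domain, repeat=n_vars))  # toy: enumerates every assignment
    orbits = []
    while remaining:
        o = orbit(next(iter(remaining)), generators)
        orbits.append(o)
        remaining -= o
    return orbits


def example_weight(a, theta=0.7):
    # Depends only on how many variables are 1, so it is invariant under any permutation
    return math.exp(theta * sum(a))


def partition_function_by_orbits(n_vars, generators, weight, domain=(0, 1)):
    # Valid only when `weight` is invariant under every generator
    return sum(len(o) * weight(next(iter(o)))
               for o in orbit_partition(n_vars, generators, domain))


cyclic_shift = (1, 2, 0)  # hypothetical symmetry: rotate three binary variables
print(len(orbit_partition(3, [cyclic_shift])))  # 4 orbits: 000, 111, one-hot, two-hot
print(partition_function_by_orbits(3, [cyclic_shift], example_weight))
```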

FollowMeUp Sports: New Benchmark for 2D Human Keypoint Recognition

Title FollowMeUp Sports: New Benchmark for 2D Human Keypoint Recognition
Authors Ying Huang, Bin Sun, Haipeng Kan, Jiankai Zhuang, Zengchang Qin
Abstract Human pose estimation has made significant advances in recent years. However, existing datasets are limited in their coverage of pose variety. In this paper, we introduce a novel benchmark, FollowMeUp Sports, that makes an important advance in terms of specific postures, self-occlusion and class balance, a contribution that we feel is required for future development in human body models. This comprehensive dataset was collected using an established taxonomy of over 200 standard workout activities with three different shot angles. The collected videos cover a wider variety of specific workout activities than previous datasets, including push-ups, squats and body movements near the ground, with severe self-occlusion or occlusion by sports equipment and outfits. Given these rich images, we perform a detailed analysis of leading human pose estimation approaches, gaining insights into the successes and failures of these methods.
Tasks Pose Estimation
Published 2019-11-19
URL https://arxiv.org/abs/1911.08344v1
PDF https://arxiv.org/pdf/1911.08344v1.pdf
PWC https://paperswithcode.com/paper/followmeup-sports-new-benchmark-for-2d-human
Repo
Framework

More Is Less: Learning Efficient Video Representations by Big-Little Network and Depthwise Temporal Aggregation

Title More Is Less: Learning Efficient Video Representations by Big-Little Network and Depthwise Temporal Aggregation
Authors Quanfu Fan, Chun-Fu Chen, Hilde Kuehne, Marco Pistoia, David Cox
Abstract Current state-of-the-art models for video action recognition are mostly based on expensive 3D ConvNets. This results in a need for large GPU clusters to train and evaluate such architectures. To address this problem, we present a lightweight and memory-friendly architecture for action recognition that performs on par with or better than current architectures by using only a fraction of resources. The proposed architecture is based on a combination of a deep subnet operating on low-resolution frames with a compact subnet operating on high-resolution frames, allowing for high efficiency and accuracy at the same time. We demonstrate that our approach achieves a 3 to 4 times reduction in FLOPs and about a 2 times reduction in memory usage compared to the baseline. This enables training deeper models with more input frames under the same computational budget. To further obviate the need for large-scale 3D convolutions, a temporal aggregation module is proposed to model temporal dependencies in a video at very small additional computational costs. Our models achieve strong performance on several action recognition benchmarks including Kinetics, Something-Something and Moments-in-time. The code and models are available at https://github.com/IBM/bLVNet-TAM.
Tasks Temporal Action Localization
Published 2019-12-02
URL https://arxiv.org/abs/1912.00869v1
PDF https://arxiv.org/pdf/1912.00869v1.pdf
PWC https://paperswithcode.com/paper/more-is-less-learning-efficient-video-1
Repo
Framework
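The abstract does not define the temporal aggregation module. "Depthwise temporal aggregation" suggests a per-channel 1-D convolution over neighbouring frames, and the sketch implements that reading. The weight layout (K taps per channel, zero padding at clip boundaries) is hypothetical, and none of this is taken from the bLVNet-TAM code. Each output element costs K multiply-adds, instead of K·C for a temporal convolution that mixes all C channels. This is one reason such a module adds little computation.

```python
def depthwise_temporal_aggregation(frames, weights):
    """Per-channel temporal convolution.

    frames:  T x C nested list (one feature vector per frame)
    weights: K x C nested list, K odd (one K-tap temporal kernel per channel)
    """
    T, C, K = len(frames), len(frames[0]), len(weights)
    if K % 2 == 0:
        raise ValueError("use an odd number of taps so the kernel is centred")
    half = K // 2
    out = [[0.0] * C for _ in range(T)]
    for t in range(T):
        for c in range(C):
            acc = 0.0
            for k in range(K):
                src = t + k - half
                if 0 <= src < T:  # zero padding at the clip boundaries
                    acc += weights[k][c] * frames[src][c]  # channel c only sees channel c
            out[t][c] = acc
    return out


frames = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]  # T = 3 frames, C = 2 channels
weights = [[0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]    # channel 0: identity; channel 1: sum of neighbours
print(depthwise_temporal_aggregation(frames, weights))
```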

A Multi-View Discriminant Learning Approach for Indoor Localization Using Bimodal Features of CSI

Title A Multi-View Discriminant Learning Approach for Indoor Localization Using Bimodal Features of CSI
Authors Tahsina Farah Sanam, Hana Godrich
Abstract With the growth of location-based services, indoor localization is attracting great interest, as it further enables ubiquitous environments. In particular, device-free localization using wireless signals is receiving increasing attention: a person's location is estimated from its impact on the surrounding wireless signals, without any active device attached to the subject. In this paper, we propose MuDLoc, the first multi-view discriminant learning approach for device-free indoor localization using both amplitude and phase features of Channel State Information (CSI) from multiple APs. Multi-view learning is an emerging technique in machine learning that improves performance by exploiting the diversity of data from different views. In MuDLoc, localization is modeled as a pattern matching problem, where the target location is predicted from a similarity measure between the CSI features of an unknown location and those of the training locations. MuDLoc implements Generalized Inter-view and Intra-view Discriminant Correlation Analysis (GI$^{2}$DCA), a discriminative feature extraction approach using multi-view CSIs. It incorporates inter-view and intra-view class associations while maximizing pairwise correlations across multi-view data sets. A similarity measure is then used to find the best match and localize the subject. Experimental results from two cluttered environments show that MuDLoc estimates location with high accuracy and outperforms other benchmark approaches.
Tasks MULTI-VIEW LEARNING
Published 2019-08-13
URL https://arxiv.org/abs/1908.07370v1
PDF https://arxiv.org/pdf/1908.07370v1.pdf
PWC https://paperswithcode.com/paper/a-multi-view-discriminant-learning-approach
Repo
Framework
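The final matching step assigns the subject to the training location whose fingerprint is most similar to the query, which can be sketched as a nearest-neighbour search. The sketch assumes the features have already been extracted; in MuDLoc that is done by GI$^{2}$DCA, which is not reproduced here. It also uses cosine similarity, which the abstract does not specify. The location names and feature values are invented.

```python
import math


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def localize(query, fingerprints):
    """fingerprints: dict mapping location -> list of training feature vectors."""
    best_loc, best_sim = None, -math.inf
    for loc, samples in fingerprints.items():
        # Score each location by its most similar training sample
        sim = max(cosine_similarity(query, s) for s in samples)
        if sim > best_sim:
            best_loc, best_sim = loc, sim
    return best_loc, best_sim


fingerprints = {  # invented 3-D feature vectors for two training locations
    "kitchen": [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],
    "hallway": [[0.0, 1.0, 0.0], [0.1, 0.9, 0.1]],
}
print(localize([0.95, 0.05, 0.0], fingerprints)[0])
```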

Deep execution monitor for robot assistive tasks

Title Deep execution monitor for robot assistive tasks
Authors Lorenzo Mauro, Edoardo Alati, Marta Sanzari, Valsamis Ntouskos, Gianluca Massimiani, Fiora Pirri
Abstract We consider a novel approach to high-level robot task execution for a robot assistive task. In this work, we explore the problem of learning to predict the next subtask by introducing a deep model both for sequencing goals and for visually evaluating the state of a task. We show that deep learning for monitoring robot task execution effectively supports the interconnection between task-level planning and robot operations. These solutions can also cope with the natural non-determinism of the execution monitor. We show that a deep execution monitor improves robot performance. We measure the improvement on a set of robot helping tasks performed in a warehouse.
Tasks
Published 2019-02-07
URL http://arxiv.org/abs/1902.02877v1
PDF http://arxiv.org/pdf/1902.02877v1.pdf
PWC https://paperswithcode.com/paper/deep-execution-monitor-for-robot-assistive
Repo
Framework
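
Fiber Nonlinearity Mitigation via the Parzen Window Classifier for Dispersion Managed and Unmanaged Links

Title Fiber Nonlinearity Mitigation via the Parzen Window Classifier for Dispersion Managed and Unmanaged Links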
Authors Abdelkerim Amari, Xiang Lin, Octavia A. Dobre, Ramachandran Venkatesan, Alex Alvarado
Abstract Machine learning techniques have recently received significant attention as promising approaches to deal with optical channel impairments, in particular nonlinear effects. In this work, a machine learning-based classification technique known as the Parzen window (PW) classifier is applied to mitigate nonlinear effects in the optical channel. The PW classifier is used as a detector whose nonlinear decision boundaries are better adapted to the nonlinear fiber channel. A performance improvement is observed when applying the PW classifier to both dispersion managed and dispersion unmanaged systems.
Tasks
Published 2019-09-17
URL https://arxiv.org/abs/1909.08188v1
PDF https://arxiv.org/pdf/1909.08188v1.pdf
PWC https://paperswithcode.com/paper/fiber-nonlinearity-mitigation-via-the-parzen
Repo
Framework
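A Parzen window classifier estimates each class-conditional density by placing a kernel on every training sample, then picks the class with the highest estimate. Its decision boundary therefore follows the shape of the received clusters rather than being a straight line. The sketch is a generic Gaussian-kernel version for 2-D (I/Q) points that assumes equal class priors. The bandwidth `h`, the toy training points and the labels are assumptions, not the authors' settings.

```python
import math


def parzen_classify(x, training, h=0.3):
    """Return the label whose Gaussian-kernel density estimate at x is largest.

    training: dict mapping label -> list of 2-D points (e.g. received I/Q samples)
    h: kernel bandwidth (window width)
    """
    best_label, best_score = None, -math.inf
    for label, points in training.items():
        # Average of Gaussian kernels = density estimate of p(x | label), up to a shared constant
        score = sum(
            math.exp(-((x[0] - px) ** 2 + (x[1] - py) ** 2) / (2.0 * h * h))
            for px, py in points
        ) / len(points)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical received samples for a 2-point constellation after a distorting channel
training = {
    "+1": [(1.0, 0.0), (1.1, 0.1), (0.9, -0.1)],
    "-1": [(-1.0, 0.0), (-1.1, 0.0), (-0.9, 0.1)],
}
print(parzen_classify((0.8, 0.2), training))
```

In a real receiver the training points would be received symbols with known labels. Nonlinear phase rotation bends their clusters, and the density-based boundary bends with them.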