February 1, 2020

3081 words 15 mins read

Paper Group AWR 337

Neural Scene Decomposition for Multi-Person Motion Capture. A Functional Representation for Graph Matching. Relational Graph Attention Networks. Unsupervised Traffic Accident Detection in First-Person Videos. Detecting Lesion Bounding Ellipses With Gaussian Proposal Networks. SIXray: A Large-scale Security Inspection X-ray Benchmark for Prohibited …

Neural Scene Decomposition for Multi-Person Motion Capture

Title Neural Scene Decomposition for Multi-Person Motion Capture
Authors Helge Rhodin, Victor Constantin, Isinsu Katircioglu, Mathieu Salzmann, Pascal Fua
Abstract Learning general image representations has proven key to the success of many computer vision tasks. For example, many approaches to image understanding problems rely on deep networks that were initially trained on ImageNet, mostly because the learned features are a valuable starting point to learn from limited labeled data. However, when it comes to 3D motion capture of multiple people, these features are only of limited use. In this paper, we therefore propose an approach to learning features that are useful for this purpose. To this end, we introduce a self-supervised approach to learning what we call a neural scene decomposition (NSD) that can be exploited for 3D pose estimation. NSD comprises three layers of abstraction to represent human subjects: spatial layout in terms of bounding-boxes and relative depth; a 2D shape representation in terms of an instance segmentation mask; and subject-specific appearance and 3D pose information. By exploiting self-supervision coming from multiview data, our NSD model can be trained end-to-end without any 2D or 3D supervision. In contrast to previous approaches, it works for multiple persons and full-frame images. Because it encodes 3D geometry, NSD can then be effectively leveraged to train a 3D pose estimation network from small amounts of annotated data.
Tasks 3D Pose Estimation, Instance Segmentation, Motion Capture, Pose Estimation, Semantic Segmentation
Published 2019-03-13
URL http://arxiv.org/abs/1903.05684v1
PDF http://arxiv.org/pdf/1903.05684v1.pdf
PWC https://paperswithcode.com/paper/neural-scene-decomposition-for-multi-person
Repo https://github.com/hrhodin/NeuralSceneDecomposition
Framework pytorch
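
The abstract's three abstraction layers map naturally onto a small container. Below is a purely hypothetical numpy sketch (the names and the compositing step are ours, not the authors'), illustrating how instance masks plus relative depth suffice to recombine per-subject renderings into a full frame:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class SceneDecomposition:
    boxes: np.ndarray    # (N, 4) per-subject bounding boxes
    depths: np.ndarray   # (N,) relative depths of the N subjects
    masks: np.ndarray    # (N, H, W) instance segmentation masks
    latents: np.ndarray  # (N, D) appearance and 3D-pose codes

def composite(d: SceneDecomposition, rendered: np.ndarray) -> np.ndarray:
    """Blend per-subject renderings (N, H, W, 3) back to front,
    using masks for shape and relative depth for occlusion order."""
    canvas = np.zeros_like(rendered[0])
    for i in np.argsort(-d.depths):           # farthest subject first
        m = d.masks[i][..., None]
        canvas = m * rendered[i] + (1.0 - m) * canvas
    return canvas
```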

A Functional Representation for Graph Matching

Title A Functional Representation for Graph Matching
Authors Fu-Dong Wang, Gui-Song Xia, Nan Xue, Yipeng Zhang, Marcello Pelillo
Abstract Graph matching is an important and persistent problem in computer vision and pattern recognition for finding node-to-node correspondence between graph-structured data. In its widely used form with pairwise constraints, graph matching can be formulated as a quadratic assignment problem (QAP), which is NP-complete and therefore computationally intractable in general. In this paper, we present a functional representation for graph matching (FRGM) that aims to provide more geometric insight into the problem and to reduce the space and time complexity of the corresponding algorithms. To achieve these goals, we represent a graph endowed with edge attributes by a linear function space equipped with a functional, such as an inner product or a metric, that has an explicit geometric meaning. Consequently, the correspondence between graphs can be represented as a linear representation map of that functional. Specifically, we reformulate the linear functional representation map as a new parameterization for Euclidean graph matching, which is associated with geometric parameters for graphs under rigid or nonrigid deformations. This allows us to estimate the correspondence and geometric deformations simultaneously. Using a representation of edge attributes rather than the affinity matrix reduces the space complexity by two orders of magnitude. Furthermore, we propose an efficient optimization strategy with low time complexity to optimize the objective function. Experimental results on both synthetic and real-world datasets demonstrate that the proposed FRGM achieves state-of-the-art performance.
Tasks Graph Matching
Published 2019-01-16
URL http://arxiv.org/abs/1901.05179v1
PDF http://arxiv.org/pdf/1901.05179v1.pdf
PWC https://paperswithcode.com/paper/a-functional-representation-for-graph
Repo https://github.com/wangfudong/FRGM
Framework none
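
For intuition, here is a hedged numpy sketch of the pairwise (QAP-style) edge-agreement objective that such graph matching formulations optimize. FRGM's contribution is precisely to avoid materializing the O(n² × n²) affinity matrix this view implies, so treat this as the baseline formulation, not the paper's algorithm:

```python
import numpy as np

def qap_score(A1: np.ndarray, A2: np.ndarray, P: np.ndarray) -> float:
    """Edge-agreement objective for a correspondence P (n1 x n2, rows of
    graph 1 matched to columns of graph 2) between graphs whose edge
    attributes are stored in A1 (n1 x n1) and A2 (n2 x n2)."""
    # P @ A2 @ P.T pulls graph 2's edge attributes back into graph 1's
    # index space; a good matching makes the two edge sets agree.
    return -float(np.linalg.norm(A1 - P @ A2 @ P.T, "fro") ** 2)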

Relational Graph Attention Networks

Title Relational Graph Attention Networks
Authors Dan Busbridge, Dane Sherburn, Pietro Cavallo, Nils Y. Hammerla
Abstract We investigate Relational Graph Attention Networks, a class of models that extends non-relational graph attention mechanisms to incorporate relational information, opening up these methods to a wider variety of problems. We perform a thorough evaluation of these models and compare them against established benchmarks. To provide a meaningful comparison, we retrain Relational Graph Convolutional Networks, the spectral counterpart of Relational Graph Attention Networks, and evaluate them under the same conditions. We find that Relational Graph Attention Networks perform worse than anticipated, although some configurations are marginally beneficial for modelling molecular properties. We provide insights into why this may be, and suggest both modifications to evaluation strategies and directions to investigate in future work.
Tasks
Published 2019-04-11
URL http://arxiv.org/abs/1904.05811v1
PDF http://arxiv.org/pdf/1904.05811v1.pdf
PWC https://paperswithcode.com/paper/relational-graph-attention-networks-1
Repo https://github.com/markWJJ/rgat
Framework tf
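
A hedged numpy sketch of the core mechanism as we read it (roughly the variant that normalizes attention across relations): each relation gets its own projection and attention vector, and the softmax runs over all (relation, neighbor) pairs of a node. All shapes and names here are illustrative, not the paper's API:

```python
import numpy as np

def relational_attention(H, adj, W, a):
    """H: (n, d) node features; adj: (R, n, n) per-relation adjacency;
    W: (R, d, e) per-relation projections; a: (R, 2e) attention vectors.
    Returns (n, e) attended node features."""
    e_dim = W.shape[2]
    HW = np.einsum("nd,rde->rne", H, W)       # project per relation
    # logits[r, i, j] = LeakyReLU(a_r . [HW[r, i] || HW[r, j]])
    src = np.einsum("rne,re->rn", HW, a[:, :e_dim])
    dst = np.einsum("rne,re->rn", HW, a[:, e_dim:])
    logits = src[:, :, None] + dst[:, None, :]
    logits = np.where(logits > 0, logits, 0.2 * logits)   # LeakyReLU
    # softmax over every (relation, neighbor) pair of each node i
    w = np.exp(logits - logits.max()) * adj
    alpha = w / (w.sum(axis=(0, 2), keepdims=True) + 1e-12)
    return np.einsum("rij,rje->ie", alpha, HW)
```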

Unsupervised Traffic Accident Detection in First-Person Videos

Title Unsupervised Traffic Accident Detection in First-Person Videos
Authors Yu Yao, Mingze Xu, Yuchen Wang, David J. Crandall, Ella M. Atkins
Abstract Recognizing abnormal events such as traffic violations and accidents in natural driving scenes is essential for autonomous driving and advanced driver assistance systems. However, most work on video anomaly detection suffers from two crucial drawbacks. First, it assumes cameras are fixed and videos have static backgrounds, which is reasonable for surveillance applications but not for vehicle-mounted cameras. Second, it poses the problem as one-class classification, relying on arduously hand-labeled training datasets that limit recognition to anomaly categories that have been explicitly trained. This paper proposes an unsupervised approach to traffic accident detection in first-person (dashboard-mounted camera) videos. Our main novelty is to detect anomalies by predicting the future locations of traffic participants and then monitoring the accuracy and consistency of those predictions with three different strategies. We evaluate our approach on a new dataset of diverse traffic accidents, AnAn Accident Detection (A3D), as well as on another publicly available dataset. Experimental results show that our approach outperforms the state of the art.
Tasks Anomaly Detection, Autonomous Driving, Object Localization
Published 2019-03-02
URL https://arxiv.org/abs/1903.00618v4
PDF https://arxiv.org/pdf/1903.00618v4.pdf
PWC https://paperswithcode.com/paper/unsupervised-traffic-accident-detection-in
Repo https://github.com/MoonBlvd/tad-IROS2019
Framework pytorch
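
As a concrete illustration of the prediction-monitoring idea, here is a hedged sketch of one plausible scoring strategy (the paper combines three; this shows only an IoU-based one, and the names are ours): score a frame by how far the observed bounding boxes drift from the boxes predicted for them.

```python
import numpy as np

def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda t: (t[2] - t[0]) * (t[3] - t[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def anomaly_score(pred_boxes, obs_boxes):
    """Mean prediction error over tracked objects in one frame: high
    when observed motion departs from the predicted future locations."""
    return 1.0 - np.mean([iou(p, o) for p, o in zip(pred_boxes, obs_boxes)])
```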

Detecting Lesion Bounding Ellipses With Gaussian Proposal Networks

Title Detecting Lesion Bounding Ellipses With Gaussian Proposal Networks
Authors Yi Li
Abstract Lesions characterized by computed tomography (CT) scans are often approximately elliptical objects. However, current lesion detection systems are predominantly adapted from the popular Region Proposal Networks (RPNs), which propose only bounding boxes and do not fully leverage the elliptical geometry of lesions. In this paper, we present Gaussian Proposal Networks (GPNs), a novel extension of RPNs that detects lesion bounding ellipses. Instead of directly regressing the rotation angle of the ellipse, as is common practice, GPN represents bounding ellipses as 2D Gaussian distributions on the image plane and minimizes the Kullback-Leibler (KL) divergence between the proposed Gaussian and the ground-truth Gaussian for object localization. We show that the KL divergence loss approximately reduces to the regression loss of the RPN framework when the rotation angle is 0. Experiments on the DeepLesion dataset show that GPN significantly outperforms RPN for lesion bounding-ellipse detection thanks to lower localization error. GPN is open sourced at https://github.com/baidu-research/GPN
Tasks Computed Tomography (CT), Object Localization
Published 2019-02-25
URL http://arxiv.org/abs/1902.09658v1
PDF http://arxiv.org/pdf/1902.09658v1.pdf
PWC https://paperswithcode.com/paper/detecting-lesion-bounding-ellipses-with
Repo https://github.com/baidu-research/GPN
Framework pytorch
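
The KL loss at the heart of GPN has a closed form for 2D Gaussians, which the numpy sketch below implements; the exact convention for turning ellipse semi-axes into a covariance (here, a level set at one standard deviation) is our assumption rather than a detail stated in the abstract.

```python
import numpy as np

def ellipse_to_gaussian(cx, cy, a, b, theta):
    """Mean and covariance of a Gaussian whose 1-sigma level set is the
    ellipse with center (cx, cy), semi-axes (a, b), rotation theta."""
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    cov = R @ np.diag([a ** 2, b ** 2]) @ R.T
    return np.array([cx, cy]), cov

def kl_gaussian_2d(mu0, cov0, mu1, cov1):
    """KL( N(mu0, cov0) || N(mu1, cov1) ), closed form for dimension 2."""
    inv1 = np.linalg.inv(cov1)
    d = mu1 - mu0
    return 0.5 * (np.trace(inv1 @ cov0) + d @ inv1 @ d - 2
                  + np.log(np.linalg.det(cov1) / np.linalg.det(cov0)))
```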

SIXray: A Large-scale Security Inspection X-ray Benchmark for Prohibited Item Discovery in Overlapping Images

Title SIXray: A Large-scale Security Inspection X-ray Benchmark for Prohibited Item Discovery in Overlapping Images
Authors Caijing Miao, Lingxi Xie, Fang Wan, Chi Su, Hongye Liu, Jianbin Jiao, Qixiang Ye
Abstract In this paper, we present a large-scale dataset and establish a baseline for prohibited item discovery in security inspection X-ray images. Our dataset, named SIXray, consists of 1,059,231 X-ray images, in which 8,929 prohibited items from 6 classes are manually annotated. It poses the new challenge of overlapping image data while sharing properties with existing datasets, including complex yet meaningless contexts and class imbalance. We propose an approach named class-balanced hierarchical refinement (CHR) to deal with these difficulties. CHR assumes that each input image is sampled from a mixture distribution and that deep networks require an iterative process to infer image contents accurately. To accelerate this process, we insert reversed connections into different network backbones, delivering high-level visual cues to assist mid-level features. In addition, a class-balanced loss function is designed to alleviate the noise introduced by easy negative samples. We evaluate CHR on SIXray with different ratios of positive/negative samples. Compared to the baselines, CHR is better at discriminating objects, especially when using mid-level features, which opens the possibility of a weakly supervised approach to accurate object localization. In particular, the advantage of CHR is more pronounced in scenarios with fewer positive training samples, which demonstrates its potential for real-world security inspection.
Tasks Object Localization
Published 2019-01-02
URL http://arxiv.org/abs/1901.00303v1
PDF http://arxiv.org/pdf/1901.00303v1.pdf
PWC https://paperswithcode.com/paper/sixray-a-large-scale-security-inspection-x
Repo https://github.com/MeioJane/SIXray
Framework none
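
The class-balanced loss is only described at a high level, so the sketch below shows the generic frequency-based re-weighting idea it alludes to, not the paper's exact formulation: abundant easy negatives are down-weighted so they cannot drown out the rare positives.

```python
import numpy as np

def balanced_bce(probs, labels, eps=1e-7):
    """probs, labels: (N,) arrays for one class. Each example is weighted
    by the frequency of the *opposite* label, so both classes contribute
    equally to the loss regardless of the positive/negative ratio."""
    pos_frac = labels.mean()
    w = np.where(labels == 1, 1.0 - pos_frac, pos_frac)
    ce = -(labels * np.log(probs + eps)
           + (1 - labels) * np.log(1 - probs + eps))
    return float((w * ce).sum() / (w.sum() + eps))
```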

$\beta^3$-IRT: A New Item Response Model and its Applications

Title $\beta^3$-IRT: A New Item Response Model and its Applications
Authors Yu Chen, Telmo Silva Filho, Ricardo B. C. Prudêncio, Tom Diethe, Peter Flach
Abstract Item Response Theory (IRT) aims to assess the latent abilities of respondents based on the correctness of their answers to aptitude test items of varying difficulty. In this paper, we propose the $\beta^3$-IRT model, which models continuous responses and generates a much richer family of Item Characteristic Curves (ICCs). In experiments, we apply the proposed model to data from an online exam platform and show that it outperforms the more standard 2PL-ND model on all datasets. Furthermore, we show how to apply $\beta^3$-IRT to assess the ability of machine learning classifiers. This novel application yields a new metric for evaluating the quality of a classifier’s probability estimates, based on the inferred difficulty and discrimination of data instances.
Tasks
Published 2019-03-10
URL https://arxiv.org/abs/1903.04016v3
PDF https://arxiv.org/pdf/1903.04016v3.pdf
PWC https://paperswithcode.com/paper/3-irt-a-new-item-response-model-and-its
Repo https://github.com/yc14600/beta3_IRT
Framework tf
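
A hedged sketch of the $\beta^3$-IRT item characteristic curve under our reading of the paper's Beta parameterization (treat the exact form as an assumption; see the paper for the definitive equations): a continuous response $p_{ij} \sim \mathrm{Beta}(\alpha_{ij}, \beta_{ij})$ with ability $\theta_i$, difficulty $\delta_j$, and discrimination $a_j$.

```python
import numpy as np

def icc(theta, delta, a):
    """Expected continuous response E[p_ij] = alpha / (alpha + beta),
    assuming alpha = (theta/delta)^a and beta = ((1-theta)/(1-delta))^a,
    with theta (ability) and delta (difficulty) both in (0, 1)."""
    alpha = (theta / delta) ** a
    beta = ((1 - theta) / (1 - delta)) ** a
    return alpha / (alpha + beta)

# Example: ability 0.8 on an item of difficulty 0.5, discrimination 2
print(icc(0.8, 0.5, 2.0))   # -> ~0.94 expected correctness
```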

Optimal Feature Transport for Cross-View Image Geo-Localization

Title Optimal Feature Transport for Cross-View Image Geo-Localization
Authors Yujiao Shi, Xin Yu, Liu Liu, Tong Zhang, Hongdong Li
Abstract This paper addresses the problem of cross-view image geo-localization, where the geographic location of a ground-level street-view query image is estimated by matching it against a large-scale aerial map (e.g., a high-resolution satellite image). State-of-the-art deep-learning-based methods tackle this problem as deep metric learning, which aims to learn global feature representations of the scene as seen from the two different views. Although such deep metric learning methods obtain promising results, they fail to exploit a crucial cue for localization, namely the spatial layout of local features. Moreover, little attention is paid to the obvious domain gap (between the aerial view and the ground view) in the context of cross-view localization. This paper proposes a novel Cross-View Feature Transport (CVFT) technique to explicitly establish cross-view domain transfer and facilitate feature alignment between ground and aerial images. Specifically, we implement CVFT as network layers that transport features from one domain to the other, leading to more meaningful feature similarity comparisons. Our model is differentiable and can be learned end-to-end. Experiments on large-scale datasets demonstrate that our method remarkably boosts state-of-the-art cross-view localization performance; for example, on the CVUSA dataset, top-1 recall improves from 40.79% to 61.43%, and top-10 recall from 76.36% to 90.49%. We expect the key insight of the paper (i.e., explicitly handling the domain difference via domain transport) to prove useful for other similar problems in computer vision as well.
Tasks Image-Based Localization, Metric Learning
Published 2019-07-11
URL https://arxiv.org/abs/1907.05021v3
PDF https://arxiv.org/pdf/1907.05021v3.pdf
PWC https://paperswithcode.com/paper/optimal-feature-transport-for-cross-view
Repo https://github.com/shiyujiao/cross_view_localization_CVFT
Framework tf
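
CVFT casts cross-view feature alignment as a transport problem. Below is a minimal numpy sketch of the Sinkhorn iteration that produces a soft transport plan from a cost matrix; how the cost is built from ground/aerial features, and how the plan is used inside the network, follow the paper and are omitted here.

```python
import numpy as np

def sinkhorn(cost, reg=0.1, n_iter=50):
    """Entropy-regularized optimal transport plan for an (m, n) cost
    matrix with uniform marginals over rows and columns."""
    m, n = cost.shape
    r, c = np.full(m, 1.0 / m), np.full(n, 1.0 / n)   # uniform marginals
    K = np.exp(-cost / reg)
    u = np.ones(m)
    for _ in range(n_iter):                           # alternate scaling
        v = c / (K.T @ u)
        u = r / (K @ v)
    return u[:, None] * K * v[None, :]                # transport plan
```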

Equivariant neural networks and equivarification

Title Equivariant neural networks and equivarification
Authors Erkao Bao, Linqi Song
Abstract We provide a process, which we call equivarification, that modifies a neural network into an equivariant one. As an illustration, we build an equivariant neural network for image classification by equivarifying a convolutional neural network.
Tasks Image Classification
Published 2019-06-16
URL https://arxiv.org/abs/1906.07172v4
PDF https://arxiv.org/pdf/1906.07172v4.pdf
PWC https://paperswithcode.com/paper/equivariant-neural-networks-and
Repo https://github.com/symplecticgeometry/equivariant-neural-networks-and-equivarification
Framework tf
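
A hedged sketch of the equivarification idea for the rotation group C4 acting on images: run the same network on all four rotations of the input and keep the per-rotation outputs. Rotating the input then merely permutes the outputs, which is exactly equivariance under C4. This is our illustration of the construction, not the authors' code.

```python
import numpy as np

def equivarify(net, image):
    """net: any function mapping an (H, W) array to a feature vector.
    Returns a (4, d) array that is equivariant to 90-degree rotations."""
    return np.stack([net(np.rot90(image, k)) for k in range(4)])

# Sanity check with a toy "network": rotating the input cyclically
# shifts the rows of the equivarified output.
net = lambda x: np.array([x.sum(), x[0, 0]])
img = np.arange(16.0).reshape(4, 4)
out, out_rot = equivarify(net, img), equivarify(net, np.rot90(img))
assert np.allclose(out_rot, np.roll(out, -1, axis=0))
```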

Non-Parametric Calibration for Classification

Title Non-Parametric Calibration for Classification
Authors Jonathan Wenger, Hedvig Kjellström, Rudolph Triebel
Abstract Many applications of classification methods require not only high accuracy but also reliable estimation of predictive uncertainty. However, while many current classification frameworks, in particular deep neural networks, achieve high accuracy, they tend to estimate uncertainty incorrectly. In this paper, we propose a method that adjusts the confidence estimates of a general classifier so that they approach the probability of classifying correctly. In contrast to existing approaches, our calibration method employs a non-parametric representation using a latent Gaussian process and is specifically designed for multi-class classification. It can be applied to any classifier that outputs confidence estimates and is not limited to neural networks. We also provide a theoretical analysis of the over- and underconfidence of a classifier and its relationship to calibration, as well as an empirical outlook for calibrated active learning. In experiments, we show the consistently strong performance of our method across different classifiers and benchmark datasets, in particular for state-of-the-art neural network architectures.
Tasks Active Learning, Calibration
Published 2019-06-12
URL https://arxiv.org/abs/1906.04933v3
PDF https://arxiv.org/pdf/1906.04933v3.pdf
PWC https://paperswithcode.com/paper/non-parametric-calibration-for-classification
Repo https://github.com/JonathanWenger/pycalib
Framework none
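
The sketch below is not the paper's latent-GP method, but a minimal implementation of the quantity such calibration methods target: expected calibration error (ECE), the binned gap between a classifier's confidence and its empirical accuracy.

```python
import numpy as np

def ece(confidences, correct, n_bins=10):
    """confidences: (N,) max predicted probabilities; correct: (N,) 0/1
    indicators of whether each prediction was right."""
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    total, err = len(confidences), 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():   # weight each bin's gap by its share of samples
            err += mask.sum() / total * abs(correct[mask].mean()
                                            - confidences[mask].mean())
    return err
```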

Monocular Neural Image Based Rendering with Continuous View Control

Title Monocular Neural Image Based Rendering with Continuous View Control
Authors Xu Chen, Jie Song, Otmar Hilliges
Abstract We present an approach that learns to synthesize high-quality, novel views of 3D objects or scenes, while providing fine-grained and precise control over the 6-DOF viewpoint. The approach is self-supervised and only requires 2D images and associated view transforms for training. Our main contribution is a network architecture that leverages a transforming auto-encoder in combination with a depth-guided warping procedure to predict geometrically accurate unseen views. Leveraging geometric constraints renders direct supervision via depth or flow maps unnecessary. If large parts of the object are occluded in the source view, a purely learning based prior is used to predict the values for dis-occluded pixels. Our network furthermore predicts a per-pixel mask, used to fuse depth-guided and pixel-based predictions. The resulting images reflect the desired 6-DOF transformation and details are preserved. We thoroughly evaluate our architecture on synthetic and real scenes and under fine-grained and fixed-view settings. Finally, we demonstrate that the approach generalizes to entirely unseen images such as product images downloaded from the internet.
Tasks Novel View Synthesis
Published 2019-01-07
URL https://arxiv.org/abs/1901.01880v2
PDF https://arxiv.org/pdf/1901.01880v2.pdf
PWC https://paperswithcode.com/paper/nvs-machines-learning-novel-view-synthesis
Repo https://github.com/xuchen-ethz/continuous_view_synthesis
Framework pytorch
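
A hedged sketch of the fusion step the abstract describes: a predicted per-pixel mask blends the depth-guided warped source view with the purely learned prediction that fills dis-occluded regions. Function and argument names are ours.

```python
import numpy as np

def fuse(warped, generated, mask):
    """warped, generated: (H, W, 3) candidate images; mask: (H, W) in
    [0, 1], high where the depth-guided warped pixels are trusted."""
    m = mask[..., None]
    return m * warped + (1.0 - m) * generated
```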

A Little Is Enough: Circumventing Defenses For Distributed Learning

Title A Little Is Enough: Circumventing Defenses For Distributed Learning
Authors Moran Baruch, Gilad Baruch, Yoav Goldberg
Abstract Distributed learning is central to the large-scale training of deep learning models. However, it is exposed to a security threat in which Byzantine participants can interrupt or control the learning process. Previous attack models and their corresponding defenses assume that the rogue participants are (a) omniscient (they know the data of all other participants) and (b) introduce large changes to the parameters. We show that small but well-crafted changes are sufficient, leading to a novel non-omniscient attack on distributed learning that goes undetected by all existing defenses. We demonstrate that our attack method works not only for preventing convergence but also for repurposing the model’s behavior (backdooring). We show that 20% corrupt workers are sufficient to degrade a CIFAR10 model’s accuracy by 50%, as well as to introduce backdoors into MNIST and CIFAR10 models without hurting their accuracy.
Tasks
Published 2019-02-16
URL http://arxiv.org/abs/1902.06156v1
PDF http://arxiv.org/pdf/1902.06156v1.pdf
PWC https://paperswithcode.com/paper/a-little-is-enough-circumventing-defenses-for
Repo https://github.com/hwang595/DETOX
Framework pytorch
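
A hedged sketch of the attack's core move as we read the abstract: instead of sending wild outliers, the Byzantine workers agree on a shared update that sits only z standard deviations from the (estimated) benign mean in each coordinate, keeping it inside the acceptance region of robust aggregators. The choice of z and direction is the paper's contribution and is parameterized here.

```python
import numpy as np

def crafted_update(benign_updates, z=1.0, direction=-1.0):
    """benign_updates: (n_workers, d) gradient estimates available to
    the attacker (it need not be omniscient). Returns a (d,) malicious
    update that stays within z per-coordinate standard deviations."""
    mu = benign_updates.mean(axis=0)
    sigma = benign_updates.std(axis=0)
    return mu + direction * z * sigma
```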

Human Motion Prediction via Learning Local Structure Representations and Temporal Dependencies

Title Human Motion Prediction via Learning Local Structure Representations and Temporal Dependencies
Authors Xiao Guo, Jongmoo Choi
Abstract Human motion prediction from motion capture data is a classical problem in computer vision, and conventional methods take the whole human body as input. These methods ignore the fact that, across human activities, different body components (the limbs and the torso) have distinctive movement patterns. In this paper, we argue that local representations of different body components should be learned separately and, based on this idea, propose a network, Skeleton Network (SkelNet), for long-term human motion prediction. Specifically, at each time step, local structure representations of the input (human body) are obtained via SkelNet’s branches of component-specific layers, and a shared layer then uses these local spatial representations to predict the future human pose. SkelNet is the first network to use local structure representations for predicting human motion. For short-term human motion prediction, we propose a second network, the Skeleton Temporal Network (Skel-TNet). Skel-TNet consists of three components: SkelNet and a recurrent neural network, which have advantages in learning spatial and temporal dependencies for predicting human motion, respectively, and a feed-forward network that outputs the final estimate. Our methods achieve promising results on the Human3.6M dataset and the CMU motion capture dataset.
Tasks Motion Capture, motion prediction
Published 2019-02-20
URL https://arxiv.org/abs/1902.07367v2
PDF https://arxiv.org/pdf/1902.07367v2.pdf
PWC https://paperswithcode.com/paper/human-motion-prediction-via-learning-local
Repo https://github.com/CHELSEA234/SkelNet_motion_prediction
Framework tf
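
A hedged sketch of SkelNet's branching structure: one small branch per body component (torso and four limbs) whose outputs a shared layer combines into the next pose. The joint-index ranges and layer sizes below are made up for illustration.

```python
import numpy as np

COMPONENTS = {  # hypothetical joint-index ranges per body component
    "torso": slice(0, 18), "l_arm": slice(18, 30), "r_arm": slice(30, 42),
    "l_leg": slice(42, 54), "r_leg": slice(54, 66),
}

def skelnet_step(pose, branch_w, shared_w):
    """pose: (66,) flattened joint coordinates; branch_w: dict mapping
    component name -> weight matrix for that component's branch;
    shared_w: weight matrix of the shared output layer."""
    local = [np.tanh(branch_w[c] @ pose[s]) for c, s in COMPONENTS.items()]
    return shared_w @ np.concatenate(local)   # predicted next pose
```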

Noise Regularization for Conditional Density Estimation

Title Noise Regularization for Conditional Density Estimation
Authors Jonas Rothfuss, Fabio Ferreira, Simon Boehm, Simon Walther, Maxim Ulrich, Tamim Asfour, Andreas Krause
Abstract Modelling statistical relationships beyond the conditional mean is crucial in many settings. Conditional density estimation (CDE) aims to learn the full conditional probability density from data. Though highly expressive, neural network based CDE models can suffer from severe over-fitting when trained with the maximum likelihood objective. Due to the inherent structure of such models, classical regularization approaches in the parameter space are rendered ineffective. To address this issue, we develop a model-agnostic noise regularization method for CDE that adds random perturbations to the data during training. We demonstrate that the proposed approach corresponds to a smoothness regularization and prove its asymptotic consistency. In our experiments, noise regularization significantly and consistently outperforms other regularization methods across seven data sets and three CDE models. The effectiveness of noise regularization makes neural network based CDE the preferable method over previous non- and semi-parametric approaches, even when training data is scarce.
Tasks Density Estimation
Published 2019-07-21
URL https://arxiv.org/abs/1907.08982v2
PDF https://arxiv.org/pdf/1907.08982v2.pdf
PWC https://paperswithcode.com/paper/noise-regularization-for-conditional-density
Repo https://github.com/freelunchtheorem/Conditional_Density_Estimation
Framework tf
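
Since the method itself is model-agnostic, a minimal sketch of its data side is easy to give: perturb both inputs and targets with small Gaussian noise at every training step before the usual maximum-likelihood update. The noise scales here are illustrative defaults, not the paper's recommended values.

```python
import numpy as np

def noisy_batch(x, y, std_x=0.1, std_y=0.1, rng=None):
    """x: (B, dx) inputs, y: (B, dy) targets. Returns freshly perturbed
    copies, resampled every step so the CDE model never fits the exact
    same sample twice (the smoothing that regularizes the density)."""
    rng = rng or np.random.default_rng()
    return (x + rng.normal(0.0, std_x, x.shape),
            y + rng.normal(0.0, std_y, y.shape))
```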

Generative Model with Dynamic Linear Flow

Title Generative Model with Dynamic Linear Flow
Authors Huadong Liao, Jiawei He, Kunxian Shu
Abstract Flow-based generative models are a family of exact log-likelihood models with tractable sampling and latent-variable inference, and hence conceptually attractive for modeling complex distributions. However, flow-based models lag behind state-of-the-art autoregressive models in density estimation performance. Autoregressive models, which also belong to the family of likelihood-based methods, in turn suffer from limited parallelizability. In this paper, we propose Dynamic Linear Flow (DLF), a new family of invertible transformations with a partially autoregressive structure. Our method combines the efficient computation of flow-based methods with the high density estimation performance of autoregressive methods. We demonstrate that the proposed DLF yields state-of-the-art performance among flow-based methods on ImageNet 32x32 and 64x64, and is competitive with the best autoregressive model. Additionally, our model converges 10 times faster than Glow (Kingma and Dhariwal, 2018). The code is available at https://github.com/naturomics/DLF.
Tasks Density Estimation
Published 2019-05-08
URL https://arxiv.org/abs/1905.03239v1
PDF https://arxiv.org/pdf/1905.03239v1.pdf
PWC https://paperswithcode.com/paper/generative-model-with-dynamic-linear-flow
Repo https://github.com/naturomics/DLF
Framework tf
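
Not the paper's DLF layer, but a sketch of the building block all such flows share: an invertible transformation (here a plain element-wise affine map) with a tractable log-determinant, so both exact log-likelihood training and inversion for sampling are cheap. DLF's contribution is making the parameters of such maps partially autoregressive, which this sketch does not show.

```python
import numpy as np

def forward(x, log_scale, shift):
    """Invertible element-wise affine map y = exp(s) * x + t.
    Returns the transformed value and log|det dy/dx| = sum(s)."""
    return np.exp(log_scale) * x + shift, log_scale.sum()

def inverse(y, log_scale, shift):
    return (y - shift) * np.exp(-log_scale)

s, t = np.array([0.1, -0.2, 0.3]), np.array([1.0, 0.0, -1.0])
x = np.array([0.5, -1.0, 2.0])
y, log_det = forward(x, s, t)
assert np.allclose(inverse(y, s, t), x)   # exact invertibility
```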