Paper Group AWR 337
Neural Scene Decomposition for Multi-Person Motion Capture. A Functional Representation for Graph Matching. Relational Graph Attention Networks. Unsupervised Traffic Accident Detection in First-Person Videos. Detecting Lesion Bounding Ellipses With Gaussian Proposal Networks. SIXray: A Large-scale Security Inspection X-ray Benchmark for Prohibited Item Discovery in Overlapping Images. …
Neural Scene Decomposition for Multi-Person Motion Capture
Title | Neural Scene Decomposition for Multi-Person Motion Capture |
Authors | Helge Rhodin, Victor Constantin, Isinsu Katircioglu, Mathieu Salzmann, Pascal Fua |
Abstract | Learning general image representations has proven key to the success of many computer vision tasks. For example, many approaches to image understanding problems rely on deep networks that were initially trained on ImageNet, mostly because the learned features are a valuable starting point to learn from limited labeled data. However, when it comes to 3D motion capture of multiple people, these features are only of limited use. In this paper, we therefore propose an approach to learning features that are useful for this purpose. To this end, we introduce a self-supervised approach to learning what we call a neural scene decomposition (NSD) that can be exploited for 3D pose estimation. NSD comprises three layers of abstraction to represent human subjects: spatial layout in terms of bounding-boxes and relative depth; a 2D shape representation in terms of an instance segmentation mask; and subject-specific appearance and 3D pose information. By exploiting self-supervision coming from multiview data, our NSD model can be trained end-to-end without any 2D or 3D supervision. In contrast to previous approaches, it works for multiple persons and full-frame images. Because it encodes 3D geometry, NSD can then be effectively leveraged to train a 3D pose estimation network from small amounts of annotated data. |
Tasks | 3D Pose Estimation, Instance Segmentation, Motion Capture, Pose Estimation, Semantic Segmentation |
Published | 2019-03-13 |
URL | http://arxiv.org/abs/1903.05684v1 |
http://arxiv.org/pdf/1903.05684v1.pdf | |
PWC | https://paperswithcode.com/paper/neural-scene-decomposition-for-multi-person |
Repo | https://github.com/hrhodin/NeuralSceneDecomposition |
Framework | pytorch |
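The core self-supervision signal NSD exploits is cross-view reconstruction: encode one camera view into a latent 3D representation, transform it by the known relative camera rotation, and decode it into a second view. Below is a minimal, illustrative sketch of that idea; the architecture, dimensions, and names are assumptions for illustration only, not the authors' NSD model (which additionally predicts bounding boxes, relative depth, and instance masks).

```python
# Sketch of multi-view self-supervision: reconstruct view B from view A's
# latent 3D points rotated into camera B's frame. All details illustrative.
import torch
import torch.nn as nn

class MultiViewAutoencoder(nn.Module):
    def __init__(self, latent_points=16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, latent_points * 3),  # latent 3D point cloud
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_points * 3, 64 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (64, 8, 8)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1),
        )
        self.n = latent_points

    def forward(self, img_a, rot_a_to_b):
        # Encode view A into latent 3D points, rotate them into camera B's
        # frame, then decode; supervision is reconstruction of view B.
        pts = self.encoder(img_a).view(-1, self.n, 3)
        pts_b = pts @ rot_a_to_b.transpose(1, 2)
        return self.decoder(pts_b.flatten(1))

model = MultiViewAutoencoder()
img_a, img_b = torch.rand(2, 3, 32, 32), torch.rand(2, 3, 32, 32)
rot = torch.eye(3).expand(2, 3, 3)  # known relative camera rotation
loss = nn.functional.mse_loss(model(img_a, rot), img_b)
```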
A Functional Representation for Graph Matching
Title | A Functional Representation for Graph Matching |
Authors | Fu-Dong Wang, Gui-Song Xia, Nan Xue, Yipeng Zhang, Marcello Pelillo |
Abstract | Graph matching is an important and persistent problem in computer vision and pattern recognition for finding node-to-node correspondence between graph-structured data. However, in its widely used form, graph matching that incorporates pairwise constraints can be formulated as a quadratic assignment problem (QAP), which is NP-complete and results in intrinsic computational difficulties. In this paper, we present a functional representation for graph matching (FRGM) that aims to provide more geometric insights on the problem and reduce the space and time complexities of corresponding algorithms. To achieve these goals, we represent a graph endowed with edge attributes by a linear function space equipped with a functional, such as an inner product or a metric, that has an explicit geometric meaning. Consequently, the correspondence between graphs can be represented as a linear representation map of that functional. Specifically, we reformulate the linear functional representation map as a new parameterization for Euclidean graph matching, which is associated with geometric parameters for graphs under rigid or nonrigid deformations. This allows us to estimate the correspondence and geometric deformations simultaneously. The use of the representation of edge attributes rather than the affinity matrix enables us to reduce the space complexity by two orders of magnitude. Furthermore, we propose an efficient optimization strategy with low time complexity to optimize the objective function. The experimental results on both synthetic and real-world datasets demonstrate that the proposed FRGM can achieve state-of-the-art performance. |
Tasks | Graph Matching |
Published | 2019-01-16 |
URL | http://arxiv.org/abs/1901.05179v1 |
http://arxiv.org/pdf/1901.05179v1.pdf | |
PWC | https://paperswithcode.com/paper/a-functional-representation-for-graph |
Repo | https://github.com/wangfudong/FRGM |
Framework | none |
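The space saving comes from working with n×n edge-attribute (Gram) matrices rather than the n²×n² affinity matrix of the QAP. The toy objective below illustrates that idea; it is a simplified stand-in, not the exact FRGM functional, and all names are ours.

```python
# Illustrative objective for functional graph matching (not the exact FRGM
# formulation): compare edge-attribute Gram matrices through a soft
# correspondence P instead of building an n^2 x n^2 affinity matrix.
import numpy as np

def matching_objective(P, K1, K2):
    """P: n1 x n2 soft correspondence. K1, K2: edge-attribute Gram matrices."""
    return np.linalg.norm(K1 @ P - P @ K2, ord="fro") ** 2

rng = np.random.default_rng(0)
n = 5
K1 = rng.random((n, n)); K1 = (K1 + K1.T) / 2
perm = rng.permutation(n)
K2 = K1[np.ix_(perm, perm)]           # graph 2 is a permuted copy of graph 1
P = np.eye(n)[:, perm]                # the true correspondence
print(matching_objective(P, K1, K2))  # ~0 at the correct matching
```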
Relational Graph Attention Networks
Title | Relational Graph Attention Networks |
Authors | Dan Busbridge, Dane Sherburn, Pietro Cavallo, Nils Y. Hammerla |
Abstract | We investigate Relational Graph Attention Networks, a class of models that extends non-relational graph attention mechanisms to incorporate relational information, opening up these methods to a wider variety of problems. A thorough evaluation of these models is performed, and comparisons are made against established benchmarks. To provide a meaningful comparison, we retrain Relational Graph Convolutional Networks, the spectral counterpart of Relational Graph Attention Networks, and evaluate them under the same conditions. We find that Relational Graph Attention Networks perform worse than anticipated, although some configurations are marginally beneficial for modelling molecular properties. We provide insights as to why this may be, and suggest both modifications to evaluation strategies and directions to investigate for future work. |
Tasks | |
Published | 2019-04-11 |
URL | http://arxiv.org/abs/1904.05811v1 |
http://arxiv.org/pdf/1904.05811v1.pdf | |
PWC | https://paperswithcode.com/paper/relational-graph-attention-networks-1 |
Repo | https://github.com/markWJJ/rgat |
Framework | tf |
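The basic mechanism being evaluated is per-relation attention: each relation type gets its own projection and attention parameters, and the per-relation messages are aggregated. The sketch below is a minimal illustration under that assumption, not the paper's exact RGAT variants (which differ in how logits are normalized across relations).

```python
# A minimal relation-aware graph attention layer (illustrative, dense adjacency).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyRGATLayer(nn.Module):
    def __init__(self, in_dim, out_dim, num_relations):
        super().__init__()
        self.W = nn.Parameter(torch.randn(num_relations, in_dim, out_dim) * 0.1)
        self.a = nn.Parameter(torch.randn(num_relations, 2 * out_dim) * 0.1)

    def forward(self, h, adj):
        # h: [N, in_dim]; adj: [R, N, N] binary adjacency per relation.
        out = 0
        for r in range(adj.shape[0]):
            hr = h @ self.W[r]                                   # [N, out]
            pair = torch.cat([hr.unsqueeze(1).expand(-1, hr.size(0), -1),
                              hr.unsqueeze(0).expand(hr.size(0), -1, -1)], -1)
            logits = F.leaky_relu(pair @ self.a[r])              # [N, N]
            logits = logits.masked_fill(adj[r] == 0, float("-inf"))
            att = torch.softmax(logits, dim=-1).nan_to_num()     # edgeless rows -> 0
            out = out + att @ hr
        return F.relu(out)

layer = TinyRGATLayer(8, 16, num_relations=3)
h = torch.randn(5, 8)
adj = (torch.rand(3, 5, 5) > 0.5).float()
print(layer(h, adj).shape)  # torch.Size([5, 16])
```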
Unsupervised Traffic Accident Detection in First-Person Videos
Title | Unsupervised Traffic Accident Detection in First-Person Videos |
Authors | Yu Yao, Mingze Xu, Yuchen Wang, David J. Crandall, Ella M. Atkins |
Abstract | Recognizing abnormal events such as traffic violations and accidents in natural driving scenes is essential for successful autonomous driving and advanced driver assistance systems. However, most work on video anomaly detection suffers from two crucial drawbacks. First, it assumes that cameras are fixed and videos have static backgrounds, which is reasonable for surveillance applications but not for vehicle-mounted cameras. Second, it poses the problem as one-class classification, relying on arduously hand-labeled training datasets that limit recognition to anomaly categories that have been explicitly trained. This paper proposes an unsupervised approach for traffic accident detection in first-person (dashboard-mounted camera) videos. Our major novelty is to detect anomalies by predicting the future locations of traffic participants and then monitoring the prediction accuracy and consistency metrics with three different strategies. We evaluate our approach using a new dataset of diverse traffic accidents, AnAn Accident Detection (A3D), as well as another publicly available dataset. Experimental results show that our approach outperforms the state of the art. |
Tasks | Anomaly Detection, Autonomous Driving, Object Localization |
Published | 2019-03-02 |
URL | https://arxiv.org/abs/1903.00618v4 |
https://arxiv.org/pdf/1903.00618v4.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-traffic-accident-detection-in |
Repo | https://github.com/MoonBlvd/tad-IROS2019 |
Framework | pytorch |
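The detection principle — future locations become unpredictable when an accident unfolds — can be illustrated with a simple per-frame score. The sketch below is one plausible scoring strategy under that assumption; the paper combines three such monitoring strategies, and the helper names here are ours.

```python
# Sketch of anomaly scoring from future-location prediction error:
# flag frames where observed boxes deviate strongly from earlier predictions.
import numpy as np

def anomaly_score(predicted_boxes, observed_boxes):
    """predicted_boxes, observed_boxes: [T, N, 4] arrays of (x, y, w, h).
    Returns a per-frame score; high values suggest an anomaly."""
    err = np.linalg.norm(predicted_boxes - observed_boxes, axis=-1)  # [T, N]
    return err.mean(axis=-1)  # average prediction error per frame

T, N = 30, 4
rng = np.random.default_rng(1)
pred = rng.random((T, N, 4))
obs = pred + rng.normal(scale=0.01, size=(T, N, 4))
obs[20:] += 0.5                      # sudden deviation, e.g. a collision
scores = anomaly_score(pred, obs)
print(scores[:20].mean(), scores[20:].mean())  # low vs. high
```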
Detecting Lesion Bounding Ellipses With Gaussian Proposal Networks
Title | Detecting Lesion Bounding Ellipses With Gaussian Proposal Networks |
Authors | Yi Li |
Abstract | Lesions characterized by computed tomography (CT) scans are arguably often elliptical objects. However, current lesion detection systems are predominantly adopted from the popular Region Proposal Networks (RPNs) that only propose bounding boxes, without fully leveraging the elliptical geometry of lesions. In this paper, we present Gaussian Proposal Networks (GPNs), a novel extension to RPNs, to detect lesion bounding ellipses. Instead of directly regressing the rotation angle of the ellipse, as is common practice, GPN represents bounding ellipses as 2D Gaussian distributions on the image plane and minimizes the Kullback-Leibler (KL) divergence between the proposed Gaussian and the ground-truth Gaussian for object localization. We show that the KL divergence loss approximately reduces to the regression loss in the RPN framework when the rotation angle is 0. Experiments on the DeepLesion dataset show that GPN significantly outperforms RPN for lesion bounding ellipse detection thanks to lower localization error. GPN is open sourced at https://github.com/baidu-research/GPN |
Tasks | Computed Tomography (CT), Object Localization |
Published | 2019-02-25 |
URL | http://arxiv.org/abs/1902.09658v1 |
http://arxiv.org/pdf/1902.09658v1.pdf | |
PWC | https://paperswithcode.com/paper/detecting-lesion-bounding-ellipses-with |
Repo | https://github.com/baidu-research/GPN |
Framework | pytorch |
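The quantity GPN minimizes — the KL divergence between two 2D Gaussians — has a closed form, shown below. The ellipse-to-Gaussian encoding here follows the standard correspondence (rotated semi-axes become the covariance's principal directions); the paper's exact scaling convention may differ.

```python
# Closed-form KL divergence between 2D Gaussians, with an ellipse encoded as
# the mean and covariance of a Gaussian on the image plane.
import numpy as np

def gaussian_kl(mu0, cov0, mu1, cov1):
    """KL( N(mu0, cov0) || N(mu1, cov1) ) for k-dimensional Gaussians."""
    k = mu0.shape[0]
    cov1_inv = np.linalg.inv(cov1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(cov1_inv @ cov0)
                  + diff @ cov1_inv @ diff
                  - k
                  + np.log(np.linalg.det(cov1) / np.linalg.det(cov0)))

def ellipse_to_gaussian(cx, cy, a, b, theta):
    """Semi-axes a, b and rotation theta -> mean and covariance."""
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    cov = R @ np.diag([a**2, b**2]) @ R.T
    return np.array([cx, cy]), cov

mu_p, cov_p = ellipse_to_gaussian(50, 40, 10, 5, 0.2)   # proposal
mu_g, cov_g = ellipse_to_gaussian(52, 41, 11, 5, 0.25)  # ground truth
print(gaussian_kl(mu_p, cov_p, mu_g, cov_g))  # small for close ellipses
```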
SIXray: A Large-scale Security Inspection X-ray Benchmark for Prohibited Item Discovery in Overlapping Images
Title | SIXray: A Large-scale Security Inspection X-ray Benchmark for Prohibited Item Discovery in Overlapping Images |
Authors | Caijing Miao, Lingxi Xie, Fang Wan, Chi Su, Hongye Liu, Jianbin Jiao, Qixiang Ye |
Abstract | In this paper, we present a large-scale dataset and establish a baseline for prohibited item discovery in security inspection X-ray images. Our dataset, named SIXray, consists of 1,059,231 X-ray images, in which 6 classes of 8,929 prohibited items are manually annotated. It raises a brand-new challenge of overlapping image data while sharing the same properties as existing datasets, including complex yet meaningless contexts and class imbalance. We propose an approach named class-balanced hierarchical refinement (CHR) to deal with these difficulties. CHR assumes that each input image is sampled from a mixture distribution, and that deep networks require an iterative process to infer image contents accurately. To accelerate this process, we insert reversed connections into different network backbones, delivering high-level visual cues to assist mid-level features. In addition, a class-balanced loss function is designed to maximally alleviate the noise introduced by easy negative samples. We evaluate CHR on SIXray with different ratios of positive/negative samples. Compared to the baselines, CHR enjoys a better ability to discriminate objects, especially using mid-level features, which offers the possibility of using a weakly supervised approach towards accurate object localization. In particular, the advantage of CHR is more significant in scenarios with fewer positive training samples, which demonstrates its potential application in real-world security inspection. |
Tasks | Object Localization |
Published | 2019-01-02 |
URL | http://arxiv.org/abs/1901.00303v1 |
http://arxiv.org/pdf/1901.00303v1.pdf | |
PWC | https://paperswithcode.com/paper/sixray-a-large-scale-security-inspection-x |
Repo | https://github.com/MeioJane/SIXray |
Framework | none |
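With roughly one million images but under nine thousand positives, easy negatives dominate an unweighted loss. The sketch below shows one simple class-balanced weighting in the spirit of CHR's loss design; the exact CHR formulation differs, and this weighting scheme is our illustrative assumption.

```python
# Sketch of a class-balanced binary loss: down-weight the overwhelming easy
# negatives so rare positives dominate the gradient.
import torch
import torch.nn.functional as F

def class_balanced_bce(logits, targets):
    pos = targets.sum().clamp(min=1)
    neg = (targets.numel() - pos).clamp(min=1)
    # weight each class inversely to its frequency in the batch
    weights = torch.where(targets == 1, neg / targets.numel(), pos / targets.numel())
    return F.binary_cross_entropy_with_logits(logits, targets, weight=weights)

logits = torch.randn(1000)
targets = (torch.rand(1000) < 0.01).float()  # ~1% positives, like SIXray
print(class_balanced_bce(logits, targets))
```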
$β^3$-IRT: A New Item Response Model and its Applications
Title | $β^3$-IRT: A New Item Response Model and its Applications |
Authors | Yu Chen, Telmo Silva Filho, Ricardo B. C. Prudêncio, Tom Diethe, Peter Flach |
Abstract | Item Response Theory (IRT) aims to assess the latent abilities of respondents based on the correctness of their answers to aptitude test items with different difficulty levels. In this paper, we propose the $\beta^3$-IRT model, which models continuous responses and can generate a much richer family of Item Characteristic Curves (ICCs). In experiments, we apply the proposed model to data from an online exam platform and show that it outperforms a more standard 2PL-ND model on all datasets. Furthermore, we show how to apply $\beta^3$-IRT to assess the ability of machine learning classifiers. This novel application yields a new metric for evaluating the quality of a classifier's probability estimates, based on the inferred difficulty and discrimination of data instances. |
Tasks | |
Published | 2019-03-10 |
URL | https://arxiv.org/abs/1903.04016v3 |
https://arxiv.org/pdf/1903.04016v3.pdf | |
PWC | https://paperswithcode.com/paper/3-irt-a-new-item-response-model-and-its |
Repo | https://github.com/yc14600/beta3_IRT |
Framework | tf |
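The key ingredient is a Beta-distributed response whose mean follows an item characteristic curve in ability, difficulty, and discrimination. The sketch below uses one plausible parameterization consistent with that description; treat the exact functional form as an assumption and consult the paper for the true $\beta^3$-IRT model.

```python
# Hedged sketch of a Beta-response ICC: expected response rises with ability
# and falls with difficulty; the parameterization here is illustrative.
import numpy as np

def beta_icc(theta, delta, a):
    """Expected response for ability theta and difficulty delta (both in
    (0,1)) with discrimination a, as the mean of Beta(alpha, beta)."""
    alpha = (theta / delta) ** a
    beta = ((1 - theta) / (1 - delta)) ** a
    return alpha / (alpha + beta)

theta = np.linspace(0.05, 0.95, 5)
for delta in (0.3, 0.5, 0.7):
    print(delta, np.round(beta_icc(theta, delta, a=2.0), 3))
# At theta == delta the expected response is 0.5, as in classical IRT.
```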
Optimal Feature Transport for Cross-View Image Geo-Localization
Title | Optimal Feature Transport for Cross-View Image Geo-Localization |
Authors | Yujiao Shi, Xin Yu, Liu Liu, Tong Zhang, Hongdong Li |
Abstract | This paper addresses the problem of cross-view image geo-localization, where the geographic location of a ground-level street-view query image is estimated by matching it against a large-scale aerial map (e.g., a high-resolution satellite image). State-of-the-art deep-learning based methods tackle this problem as deep metric learning, which aims to learn global feature representations of the scene seen by the two different views. Although such deep metric learning methods obtain promising results, they fail to exploit a crucial cue relevant for localization, namely, the spatial layout of local features. Moreover, little attention is paid to the obvious domain gap (between aerial view and ground view) in the context of cross-view localization. This paper proposes a novel Cross-View Feature Transport (CVFT) technique to explicitly establish cross-view domain transfer that facilitates feature alignment between ground and aerial images. Specifically, we implement CVFT as network layers that transport features from one domain to the other, leading to more meaningful feature similarity comparison. Our model is differentiable and can be learned end-to-end. Experiments on large-scale datasets have demonstrated that our method remarkably boosts state-of-the-art cross-view localization performance, e.g., on the CVUSA dataset, with significant improvements in top-1 recall from 40.79% to 61.43%, and in top-10 recall from 76.36% to 90.49%. We expect the key insight of the paper (i.e., explicitly handling domain difference via domain transport) to prove useful for other similar problems in computer vision as well. |
Tasks | Image-Based Localization, Metric Learning |
Published | 2019-07-11 |
URL | https://arxiv.org/abs/1907.05021v3 |
https://arxiv.org/pdf/1907.05021v3.pdf | |
PWC | https://paperswithcode.com/paper/optimal-feature-transport-for-cross-view |
Repo | https://github.com/shiyujiao/cross_view_localization_CVFT |
Framework | tf |
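Feature transport of this kind is commonly realized with entropic optimal transport solved by Sinkhorn iterations, which are differentiable and so can sit inside a network. The sketch below shows that standard solver; whether it matches CVFT's exact layer is an assumption on our part.

```python
# Sketch of feature transport via Sinkhorn iterations (standard entropic
# optimal transport with uniform marginals).
import numpy as np

def sinkhorn(cost, n_iters=50, eps=0.1):
    K = np.exp(-cost / eps)
    a = np.full(cost.shape[0], 1.0 / cost.shape[0])  # uniform marginals
    b = np.full(cost.shape[1], 1.0 / cost.shape[1])
    v = np.ones(cost.shape[1]) / cost.shape[1]
    for _ in range(n_iters):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return np.diag(u) @ K @ np.diag(v)  # transport plan

rng = np.random.default_rng(0)
ground_feat = rng.random((8, 16))   # ground-view feature cells
aerial_feat = rng.random((8, 16))   # aerial-view feature cells
cost = np.linalg.norm(ground_feat[:, None] - aerial_feat[None], axis=-1)
P = sinkhorn(cost)
transported = (P * P.shape[0]) @ aerial_feat  # aerial features re-aligned
print(P.sum(), transported.shape)  # plan mass ~1, (8, 16)
```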
Equivariant neural networks and equivarification
Title | Equivariant neural networks and equivarification |
Authors | Erkao Bao, Linqi Song |
Abstract | We provide a process for modifying a neural network into an equivariant one, which we call equivarification. As an illustration, we build an equivariant neural network for image classification by equivarifying a convolutional neural network. |
Tasks | Image Classification |
Published | 2019-06-16 |
URL | https://arxiv.org/abs/1906.07172v4 |
https://arxiv.org/pdf/1906.07172v4.pdf | |
PWC | https://paperswithcode.com/paper/equivariant-neural-networks-and |
Repo | https://github.com/symplecticgeometry/equivariant-neural-networks-and-equivarification |
Framework | tf |
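One way to see the idea is the following illustrative construction for the rotation group C4 (our simplification, not necessarily the paper's exact procedure): evaluate a base network on every group-transformed copy of the input and stack the outputs, so that rotating the input merely permutes the stack.

```python
# Equivarification sketch for C4: rotating the input by 90 degrees cyclically
# permutes the rows of the stacked output (equivariance by construction).
import torch
import torch.nn as nn

base = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))

def equivarified(x):
    # Base-network outputs for all four rotations of the input.
    return torch.stack([base(torch.rot90(x, k, dims=(2, 3))) for k in range(4)])

x = torch.randn(1, 1, 28, 28)
out = equivarified(x)
out_rot = equivarified(torch.rot90(x, 1, dims=(2, 3)))
print(torch.allclose(out.roll(-1, dims=0), out_rot))  # True
```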
Non-Parametric Calibration for Classification
Title | Non-Parametric Calibration for Classification |
Authors | Jonathan Wenger, Hedvig Kjellström, Rudolph Triebel |
Abstract | Many applications of classification methods not only require high accuracy but also reliable estimation of predictive uncertainty. However, while many current classification frameworks, in particular deep neural networks, achieve high accuracy, they tend to incorrectly estimate uncertainty. In this paper, we propose a method that adjusts the confidence estimates of a general classifier such that they approach the probability of classifying correctly. In contrast to existing approaches, our calibration method employs a non-parametric representation using a latent Gaussian process and is specifically designed for multi-class classification. It can be applied to any classifier that outputs confidence estimates and is not limited to neural networks. We also provide a theoretical analysis regarding the over- and underconfidence of a classifier and its relationship to calibration, as well as an empirical outlook for calibrated active learning. In experiments we show the universally strong performance of our method across different classifiers and benchmark data sets, in particular for state-of-the-art neural network architectures. |
Tasks | Active Learning, Calibration |
Published | 2019-06-12 |
URL | https://arxiv.org/abs/1906.04933v3 |
https://arxiv.org/pdf/1906.04933v3.pdf | |
PWC | https://paperswithcode.com/paper/non-parametric-calibration-for-classification |
Repo | https://github.com/JonathanWenger/pycalib |
Framework | none |
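The quantity calibration methods target is the gap between confidence and accuracy, commonly summarized as the expected calibration error (ECE). The sketch below computes this standard diagnostic; it illustrates the problem setting, not the paper's latent-GP calibrator itself.

```python
# Expected calibration error: bin predictions by confidence and compare
# average confidence with empirical accuracy in each bin.
import numpy as np

def expected_calibration_error(conf, correct, n_bins=10):
    bins = np.linspace(0, 1, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            ece += mask.mean() * abs(conf[mask].mean() - correct[mask].mean())
    return ece

rng = np.random.default_rng(0)
conf = rng.uniform(0.5, 1.0, 10000)
correct = (rng.random(10000) < conf ** 2).astype(float)  # overconfident model
print(expected_calibration_error(conf, correct))  # clearly above 0
```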
Monocular Neural Image Based Rendering with Continuous View Control
Title | Monocular Neural Image Based Rendering with Continuous View Control |
Authors | Xu Chen, Jie Song, Otmar Hilliges |
Abstract | We present an approach that learns to synthesize high-quality, novel views of 3D objects or scenes, while providing fine-grained and precise control over the 6-DOF viewpoint. The approach is self-supervised and only requires 2D images and associated view transforms for training. Our main contribution is a network architecture that leverages a transforming auto-encoder in combination with a depth-guided warping procedure to predict geometrically accurate unseen views. Leveraging geometric constraints renders direct supervision via depth or flow maps unnecessary. If large parts of the object are occluded in the source view, a purely learning-based prior is used to predict the values of dis-occluded pixels. Our network furthermore predicts a per-pixel mask, used to fuse depth-guided and pixel-based predictions. The resulting images reflect the desired 6-DOF transformation, and details are preserved. We thoroughly evaluate our architecture on synthetic and real scenes and under fine-grained and fixed-view settings. Finally, we demonstrate that the approach generalizes to entirely unseen images, such as product images downloaded from the internet. |
Tasks | Novel View Synthesis |
Published | 2019-01-07 |
URL | https://arxiv.org/abs/1901.01880v2 |
https://arxiv.org/pdf/1901.01880v2.pdf | |
PWC | https://paperswithcode.com/paper/nvs-machines-learning-novel-view-synthesis |
Repo | https://github.com/xuchen-ethz/continuous_view_synthesis |
Framework | pytorch |
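The per-pixel fusion step described in the abstract reduces to a simple convex blend; the sketch below shows it with stand-in tensors (the warp, the learned prediction, and the mask would all come from the network in practice).

```python
# Blend the depth-guided warped prediction with a learned (hallucinated)
# prediction using a predicted per-pixel mask.
import torch

def fuse(warped, generated, mask):
    """warped, generated: [B, 3, H, W]; mask: [B, 1, H, W] in [0, 1]."""
    return mask * warped + (1 - mask) * generated

warped = torch.rand(2, 3, 64, 64)      # geometry-based view prediction
generated = torch.rand(2, 3, 64, 64)   # learned prior for dis-occlusions
mask = torch.rand(2, 1, 64, 64)        # per-pixel confidence in the warp
print(fuse(warped, generated, mask).shape)  # torch.Size([2, 3, 64, 64])
```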
A Little Is Enough: Circumventing Defenses For Distributed Learning
Title | A Little Is Enough: Circumventing Defenses For Distributed Learning |
Authors | Moran Baruch, Gilad Baruch, Yoav Goldberg |
Abstract | Distributed learning is central to the large-scale training of deep-learning models. However, it is exposed to a security threat in which Byzantine participants can interrupt or control the learning process. Previous attack models and their corresponding defenses assume that the rogue participants are (a) omniscient (know the data of all other participants), and (b) introduce large changes to the parameters. We show that small but well-crafted changes are sufficient, leading to a novel non-omniscient attack on distributed learning that goes undetected by all existing defenses. We demonstrate that our attack method works not only to prevent convergence but also to repurpose the model's behavior (backdooring). We show that 20% corrupt workers are sufficient to degrade a CIFAR10 model's accuracy by 50%, as well as to introduce backdoors into MNIST and CIFAR10 models without hurting their accuracy. |
Tasks | |
Published | 2019-02-16 |
URL | http://arxiv.org/abs/1902.06156v1 |
http://arxiv.org/pdf/1902.06156v1.pdf | |
PWC | https://paperswithcode.com/paper/a-little-is-enough-circumventing-defenses-for |
Repo | https://github.com/hwang595/DETOX |
Framework | pytorch |
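The intuition behind "small but well-crafted changes" can be illustrated numerically: corrupt workers submit an update that stays within a modest number of standard deviations of the benign mean, so distance-based defenses do not flag it, yet the aggregate shifts. The z-value and setup below are illustrative, not the paper's exact attack parameters.

```python
# Sketch of a small-perturbation Byzantine update that stays statistically
# close to the benign updates while biasing the aggregate.
import numpy as np

rng = np.random.default_rng(0)
benign_grads = rng.normal(loc=1.0, scale=0.2, size=(18, 100))  # 18 workers
mu, sigma = benign_grads.mean(axis=0), benign_grads.std(axis=0)
z = 1.0                                   # small, hard-to-detect shift
byzantine_grad = mu - z * sigma           # submitted by each corrupt worker
all_grads = np.vstack([benign_grads, np.tile(byzantine_grad, (4, 1))])
print(np.abs(all_grads.mean(axis=0) - mu).mean())  # aggregate is shifted
```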
Human Motion Prediction via Learning Local Structure Representations and Temporal Dependencies
Title | Human Motion Prediction via Learning Local Structure Representations and Temporal Dependencies |
Authors | Xiao Guo, Jongmoo Choi |
Abstract | Human motion prediction from motion capture data is a classical problem in computer vision, and conventional methods take the holistic human body as input. These methods ignore the fact that, in various human activities, different body components (limbs and the torso) have distinctive characteristics in terms of moving pattern. In this paper, we argue that local representations of different body components should be learned separately and, based on this idea, propose a network, Skeleton Network (SkelNet), for long-term human motion prediction. Specifically, at each time step, local structure representations of the input (human body) are obtained via SkelNet's component-specific branches; a shared layer then uses these local spatial representations to predict the future human pose. Our SkelNet is the first to use local structure representations for predicting human motion. For short-term human motion prediction, we propose a second network, named Skeleton Temporal Network (Skel-TNet). Skel-TNet consists of three components: SkelNet and a recurrent neural network, which have advantages in learning spatial and temporal dependencies for predicting human motion, respectively, and a feed-forward network that outputs the final estimation. Our methods achieve promising results on the Human3.6M dataset and the CMU motion capture dataset. |
Tasks | Motion Capture, motion prediction |
Published | 2019-02-20 |
URL | https://arxiv.org/abs/1902.07367v2 |
https://arxiv.org/pdf/1902.07367v2.pdf | |
PWC | https://paperswithcode.com/paper/human-motion-prediction-via-learning-local |
Repo | https://github.com/CHELSEA234/SkelNet_motion_prediction |
Framework | tf |
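The component-specific branching is easy to picture as code: one small branch per body part, with a shared layer fusing their outputs. The sketch below is our illustration of that structure; the part partitioning, dimensions, and layer sizes are assumptions, not the paper's architecture.

```python
# Sketch of component-specific branches plus a shared output layer.
import torch
import torch.nn as nn

class TinySkelNet(nn.Module):
    PARTS = {"torso": 12, "left_arm": 9, "right_arm": 9,
             "left_leg": 9, "right_leg": 9}  # joint dims per component

    def __init__(self, hidden=32):
        super().__init__()
        self.branches = nn.ModuleDict(
            {name: nn.Sequential(nn.Linear(d, hidden), nn.ReLU())
             for name, d in self.PARTS.items()})
        self.shared = nn.Linear(hidden * len(self.PARTS),
                                sum(self.PARTS.values()))

    def forward(self, parts):
        feats = [self.branches[k](parts[k]) for k in self.PARTS]
        return self.shared(torch.cat(feats, dim=-1))  # next-frame pose

net = TinySkelNet()
pose = {k: torch.randn(1, d) for k, d in TinySkelNet.PARTS.items()}
print(net(pose).shape)  # torch.Size([1, 48])
```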
Noise Regularization for Conditional Density Estimation
Title | Noise Regularization for Conditional Density Estimation |
Authors | Jonas Rothfuss, Fabio Ferreira, Simon Boehm, Simon Walther, Maxim Ulrich, Tamim Asfour, Andreas Krause |
Abstract | Modelling statistical relationships beyond the conditional mean is crucial in many settings. Conditional density estimation (CDE) aims to learn the full conditional probability density from data. Though highly expressive, neural-network-based CDE models can suffer from severe over-fitting when trained with the maximum likelihood objective. Due to the inherent structure of such models, classical regularization approaches in the parameter space are rendered ineffective. To address this issue, we develop a model-agnostic noise regularization method for CDE that adds random perturbations to the data during training. We demonstrate that the proposed approach corresponds to a smoothness regularization and prove its asymptotic consistency. In our experiments, noise regularization significantly and consistently outperforms other regularization methods across seven data sets and three CDE models. The effectiveness of noise regularization makes neural-network-based CDE preferable to previous non- and semi-parametric approaches, even when training data is scarce. |
Tasks | Density Estimation |
Published | 2019-07-21 |
URL | https://arxiv.org/abs/1907.08982v2 |
https://arxiv.org/pdf/1907.08982v2.pdf | |
PWC | https://paperswithcode.com/paper/noise-regularization-for-conditional-density |
Repo | https://github.com/freelunchtheorem/Conditional_Density_Estimation |
Framework | tf |
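The method itself is simple enough to state directly: add fresh random perturbations to the (x, y) pairs at every training step. The sketch below shows that core step; the Gaussian noise and its scale are our illustrative choices of perturbation hyperparameters.

```python
# Noise regularization for CDE: perturb both inputs and targets with fresh
# noise at each maximum-likelihood training step.
import numpy as np

def noisy_batch(x, y, noise_std=0.1, rng=np.random.default_rng()):
    """Return a perturbed copy of the batch; noise is resampled each call."""
    return (x + rng.normal(scale=noise_std, size=x.shape),
            y + rng.normal(scale=noise_std, size=y.shape))

x = np.random.randn(128, 3)   # conditioning variables
y = np.random.randn(128, 1)   # targets whose density we model
x_t, y_t = noisy_batch(x, y)  # use these for one training step
print(x_t.shape, y_t.shape)
```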
Generative Model with Dynamic Linear Flow
Title | Generative Model with Dynamic Linear Flow |
Authors | Huadong Liao, Jiawei He, Kunxian Shu |
Abstract | Flow-based generative models are a family of exact log-likelihood models with tractable sampling and latent-variable inference, and hence conceptually attractive for modeling complex distributions. However, flow-based models lag behind state-of-the-art autoregressive models in density estimation performance. Autoregressive models, which also belong to the family of likelihood-based methods, in turn suffer from limited parallelizability. In this paper, we propose Dynamic Linear Flow (DLF), a new family of invertible transformations with a partially autoregressive structure. Our method benefits from the efficient computation of flow-based methods and the high density estimation performance of autoregressive methods. We demonstrate that the proposed DLF yields state-of-the-art performance on ImageNet 32x32 and 64x64 among all flow-based methods, and is competitive with the best autoregressive model. Additionally, our model converges 10 times faster than Glow (Kingma and Dhariwal, 2018). The code is available at https://github.com/naturomics/DLF. |
Tasks | Density Estimation |
Published | 2019-05-08 |
URL | https://arxiv.org/abs/1905.03239v1 |
https://arxiv.org/pdf/1905.03239v1.pdf | |
PWC | https://paperswithcode.com/paper/generative-model-with-dynamic-linear-flow |
Repo | https://github.com/naturomics/DLF |
Framework | tf |
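For readers new to flows, the family DLF belongs to is built from invertible transformations with cheap log-determinants. The sketch below shows the standard affine coupling layer — the generic building block, not DLF's partially autoregressive variant — including the exact log-likelihood term and the inverse.

```python
# A generic affine coupling layer: invertible by construction, with an
# exact, cheap log-determinant for maximum-likelihood training.
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim // 2, 64), nn.ReLU(),
                                 nn.Linear(64, dim))  # outputs scale & shift

    def forward(self, x):
        x1, x2 = x.chunk(2, dim=-1)
        log_s, t = self.net(x1).chunk(2, dim=-1)
        y2 = x2 * torch.exp(log_s) + t
        logdet = log_s.sum(dim=-1)          # exact log-likelihood term
        return torch.cat([x1, y2], dim=-1), logdet

    def inverse(self, y):
        y1, y2 = y.chunk(2, dim=-1)
        log_s, t = self.net(y1).chunk(2, dim=-1)
        return torch.cat([y1, (y2 - t) * torch.exp(-log_s)], dim=-1)

layer = AffineCoupling(4)
x = torch.randn(2, 4)
y, logdet = layer(x)
print(torch.allclose(layer.inverse(y), x, atol=1e-6))  # True
```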