Paper Group AWR 337
Neural Scene Decomposition for Multi-Person Motion Capture. A Functional Representation for Graph Matching. Relational Graph Attention Networks. Unsupervised Traffic Accident Detection in First-Person Videos. Detecting Lesion Bounding Ellipses With Gaussian Proposal Networks. SIXray: A Large-scale Security Inspection X-ray Benchmark for Prohibited Item Discovery in Overlapping Images. …
Neural Scene Decomposition for Multi-Person Motion Capture
Title | Neural Scene Decomposition for Multi-Person Motion Capture |
Authors | Helge Rhodin, Victor Constantin, Isinsu Katircioglu, Mathieu Salzmann, Pascal Fua |
Abstract | Learning general image representations has proven key to the success of many computer vision tasks. For example, many approaches to image understanding problems rely on deep networks that were initially trained on ImageNet, mostly because the learned features are a valuable starting point to learn from limited labeled data. However, when it comes to 3D motion capture of multiple people, these features are only of limited use. In this paper, we therefore propose an approach to learning features that are useful for this purpose. To this end, we introduce a self-supervised approach to learning what we call a neural scene decomposition (NSD) that can be exploited for 3D pose estimation. NSD comprises three layers of abstraction to represent human subjects: spatial layout in terms of bounding-boxes and relative depth; a 2D shape representation in terms of an instance segmentation mask; and subject-specific appearance and 3D pose information. By exploiting self-supervision coming from multiview data, our NSD model can be trained end-to-end without any 2D or 3D supervision. In contrast to previous approaches, it works for multiple persons and full-frame images. Because it encodes 3D geometry, NSD can then be effectively leveraged to train a 3D pose estimation network from small amounts of annotated data. |
Tasks | 3D Pose Estimation, Instance Segmentation, Motion Capture, Pose Estimation, Semantic Segmentation |
Published | 2019-03-13 |
URL | http://arxiv.org/abs/1903.05684v1 |
http://arxiv.org/pdf/1903.05684v1.pdf | |
PWC | https://paperswithcode.com/paper/neural-scene-decomposition-for-multi-person |
Repo | https://github.com/hrhodin/NeuralSceneDecomposition |
Framework | pytorch |
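The core self-supervision signal NSD exploits is cross-view reconstruction: encode one camera view into a latent 3D representation, transform it by the known relative camera rotation, and decode it into a second view. Below is a minimal, illustrative sketch of that idea; the architecture, dimensions, and names are assumptions for illustration only, not the authors' NSD model (which additionally predicts bounding boxes, relative depth, and instance masks).

```python
# Sketch of multi-view self-supervision: reconstruct view B from view A's
# latent 3D points rotated into camera B's frame. All details illustrative.
import torch
import torch.nn as nn

class MultiViewAutoencoder(nn.Module):
    def __init__(self, latent_points=16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, latent_points * 3),  # latent 3D point cloud
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_points * 3, 64 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (64, 8, 8)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1),
        )
        self.n = latent_points

    def forward(self, img_a, rot_a_to_b):
        # Encode view A into latent 3D points, rotate them into camera B's
        # frame, then decode; supervision is reconstruction of view B.
        pts = self.encoder(img_a).view(-1, self.n, 3)
        pts_b = pts @ rot_a_to_b.transpose(1, 2)
        return self.decoder(pts_b.flatten(1))

model = MultiViewAutoencoder()
img_a, img_b = torch.rand(2, 3, 32, 32), torch.rand(2, 3, 32, 32)
rot = torch.eye(3).expand(2, 3, 3)  # known relative camera rotation
loss = nn.functional.mse_loss(model(img_a, rot), img_b)
```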
A Functional Representation for Graph Matching
Title | A Functional Representation for Graph Matching |
Authors | Fu-Dong Wang, Gui-Song Xia, Nan Xue, Yipeng Zhang, Marcello Pelillo |
Abstract | Graph matching is an important and persistent problem in computer vision and pattern recognition for finding node-to-node correspondence between graph-structured data. However, in its widely used form, graph matching that incorporates pairwise constraints can be formulated as a quadratic assignment problem (QAP), which is NP-complete and results in intrinsic computational difficulties. In this paper, we present a functional representation for graph matching (FRGM) that aims to provide more geometric insights on the problem and reduce the space and time complexities of corresponding algorithms. To achieve these goals, we represent a graph endowed with edge attributes by a linear function space equipped with a functional, such as an inner product or a metric, that has an explicit geometric meaning. Consequently, the correspondence between graphs can be represented as a linear representation map of that functional. Specifically, we reformulate the linear functional representation map as a new parameterization for Euclidean graph matching, which is associated with geometric parameters for graphs under rigid or nonrigid deformations. This allows us to estimate the correspondence and geometric deformations simultaneously. The use of the representation of edge attributes rather than the affinity matrix enables us to reduce the space complexity by two orders of magnitude. Furthermore, we propose an efficient optimization strategy with low time complexity to optimize the objective function. The experimental results on both synthetic and real-world datasets demonstrate that the proposed FRGM can achieve state-of-the-art performance. |
Tasks | Graph Matching |
Published | 2019-01-16 |
URL | http://arxiv.org/abs/1901.05179v1 |
http://arxiv.org/pdf/1901.05179v1.pdf | |
PWC | https://paperswithcode.com/paper/a-functional-representation-for-graph |
Repo | https://github.com/wangfudong/FRGM |
Framework | none |
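The space saving comes from working with n×n edge-attribute (Gram) matrices rather than the n²×n² affinity matrix of the QAP. The toy objective below illustrates that idea; it is a simplified stand-in, not the exact FRGM functional, and all names are ours.

```python
# Illustrative objective for functional graph matching (not the exact FRGM
# formulation): compare edge-attribute Gram matrices through a soft
# correspondence P instead of building an n^2 x n^2 affinity matrix.
import numpy as np

def matching_objective(P, K1, K2):
    """P: n1 x n2 soft correspondence. K1, K2: edge-attribute Gram matrices."""
    return np.linalg.norm(K1 @ P - P @ K2, ord="fro") ** 2

rng = np.random.default_rng(0)
n = 5
K1 = rng.random((n, n)); K1 = (K1 + K1.T) / 2
perm = rng.permutation(n)
K2 = K1[np.ix_(perm, perm)]           # graph 2 is a permuted copy of graph 1
P = np.eye(n)[:, perm]                # the true correspondence
print(matching_objective(P, K1, K2))  # ~0 at the correct matching
```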
Relational Graph Attention Networks
Title | Relational Graph Attention Networks |
Authors | Dan Busbridge, Dane Sherburn, Pietro Cavallo, Nils Y. Hammerla |
Abstract | We investigate Relational Graph Attention Networks, a class of models that extends non-relational graph attention mechanisms to incorporate relational information, opening up these methods to a wider variety of problems. A thorough evaluation of these models is performed, and comparisons are made against established benchmarks. To provide a meaningful comparison, we retrain Relational Graph Convolutional Networks, the spectral counterpart of Relational Graph Attention Networks, and evaluate them under the same conditions. We find that Relational Graph Attention Networks perform worse than anticipated, although some configurations are marginally beneficial for modelling molecular properties. We provide insights as to why this may be, and suggest both modifications to evaluation strategies and directions to investigate for future work. |
Tasks | |
Published | 2019-04-11 |
URL | http://arxiv.org/abs/1904.05811v1 |
http://arxiv.org/pdf/1904.05811v1.pdf | |
PWC | https://paperswithcode.com/paper/relational-graph-attention-networks-1 |
Repo | https://github.com/markWJJ/rgat |
Framework | tf |
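The basic mechanism being evaluated is per-relation attention: each relation type gets its own projection and attention parameters, and the per-relation messages are aggregated. The sketch below is a minimal illustration under that assumption, not the paper's exact RGAT variants (which differ in how logits are normalized across relations).

```python
# A minimal relation-aware graph attention layer (illustrative, dense adjacency).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyRGATLayer(nn.Module):
    def __init__(self, in_dim, out_dim, num_relations):
        super().__init__()
        self.W = nn.Parameter(torch.randn(num_relations, in_dim, out_dim) * 0.1)
        self.a = nn.Parameter(torch.randn(num_relations, 2 * out_dim) * 0.1)

    def forward(self, h, adj):
        # h: [N, in_dim]; adj: [R, N, N] binary adjacency per relation.
        out = 0
        for r in range(adj.shape[0]):
            hr = h @ self.W[r]                                   # [N, out]
            pair = torch.cat([hr.unsqueeze(1).expand(-1, hr.size(0), -1),
                              hr.unsqueeze(0).expand(hr.size(0), -1, -1)], -1)
            logits = F.leaky_relu(pair @ self.a[r])              # [N, N]
            logits = logits.masked_fill(adj[r] == 0, float("-inf"))
            att = torch.softmax(logits, dim=-1).nan_to_num()     # edgeless rows -> 0
            out = out + att @ hr
        return F.relu(out)

layer = TinyRGATLayer(8, 16, num_relations=3)
h = torch.randn(5, 8)
adj = (torch.rand(3, 5, 5) > 0.5).float()
print(layer(h, adj).shape)  # torch.Size([5, 16])
```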
Unsupervised Traffic Accident Detection in First-Person Videos
Title | Unsupervised Traffic Accident Detection in First-Person Videos |
Authors | Yu Yao, Mingze Xu, Yuchen Wang, David J. Crandall, Ella M. Atkins |
Abstract | Recognizing abnormal events such as traffic violations and accidents in natural driving scenes is essential for successful autonomous driving and advanced driver assistance systems. However, most work on video anomaly detection suffers from two crucial drawbacks. First, it assumes that cameras are fixed and videos have static backgrounds, which is reasonable for surveillance applications but not for vehicle-mounted cameras. Second, it poses the problem as one-class classification, relying on arduously hand-labeled training datasets that limit recognition to anomaly categories that have been explicitly trained. This paper proposes an unsupervised approach for traffic accident detection in first-person (dashboard-mounted camera) videos. Our major novelty is to detect anomalies by predicting the future locations of traffic participants and then monitoring the prediction accuracy and consistency metrics with three different strategies. We evaluate our approach using a new dataset of diverse traffic accidents, AnAn Accident Detection (A3D), as well as another publicly available dataset. Experimental results show that our approach outperforms the state of the art. |
Tasks | Anomaly Detection, Autonomous Driving, Object Localization |
Published | 2019-03-02 |
URL | https://arxiv.org/abs/1903.00618v4 |
https://arxiv.org/pdf/1903.00618v4.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-traffic-accident-detection-in |
Repo | https://github.com/MoonBlvd/tad-IROS2019 |
Framework | pytorch |
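The detection principle — future locations become unpredictable when an accident unfolds — can be illustrated with a simple per-frame score. The sketch below is one plausible scoring strategy under that assumption; the paper combines three such monitoring strategies, and the helper names here are ours.

```python
# Sketch of anomaly scoring from future-location prediction error:
# flag frames where observed boxes deviate strongly from earlier predictions.
import numpy as np

def anomaly_score(predicted_boxes, observed_boxes):
    """predicted_boxes, observed_boxes: [T, N, 4] arrays of (x, y, w, h).
    Returns a per-frame score; high values suggest an anomaly."""
    err = np.linalg.norm(predicted_boxes - observed_boxes, axis=-1)  # [T, N]
    return err.mean(axis=-1)  # average prediction error per frame

T, N = 30, 4
rng = np.random.default_rng(1)
pred = rng.random((T, N, 4))
obs = pred + rng.normal(scale=0.01, size=(T, N, 4))
obs[20:] += 0.5                      # sudden deviation, e.g. a collision
scores = anomaly_score(pred, obs)
print(scores[:20].mean(), scores[20:].mean())  # low vs. high
```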
Detecting Lesion Bounding Ellipses With Gaussian Proposal Networks
Title | Detecting Lesion Bounding Ellipses With Gaussian Proposal Networks |
Authors | Yi Li |
Abstract | Lesions characterized by computed tomography (CT) scans are arguably often elliptical objects. However, current lesion detection systems are predominantly adopted from the popular Region Proposal Networks (RPNs) that only propose bounding boxes, without fully leveraging the elliptical geometry of lesions. In this paper, we present Gaussian Proposal Networks (GPNs), a novel extension to RPNs, to detect lesion bounding ellipses. Instead of directly regressing the rotation angle of the ellipse, as is common practice, GPN represents bounding ellipses as 2D Gaussian distributions on the image plane and minimizes the Kullback-Leibler (KL) divergence between the proposed Gaussian and the ground-truth Gaussian for object localization. We show that the KL divergence loss approximately reduces to the regression loss in the RPN framework when the rotation angle is 0. Experiments on the DeepLesion dataset show that GPN significantly outperforms RPN for lesion bounding ellipse detection thanks to lower localization error. GPN is open sourced at https://github.com/baidu-research/GPN |
Tasks | Computed Tomography (CT), Object Localization |
Published | 2019-02-25 |
URL | http://arxiv.org/abs/1902.09658v1 |
http://arxiv.org/pdf/1902.09658v1.pdf | |
PWC | https://paperswithcode.com/paper/detecting-lesion-bounding-ellipses-with |
Repo | https://github.com/baidu-research/GPN |
Framework | pytorch |
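The quantity GPN minimizes — the KL divergence between two 2D Gaussians — has a closed form, shown below. The ellipse-to-Gaussian encoding here follows the standard correspondence (rotated semi-axes become the covariance's principal directions); the paper's exact scaling convention may differ.

```python
# Closed-form KL divergence between 2D Gaussians, with an ellipse encoded as
# the mean and covariance of a Gaussian on the image plane.
import numpy as np

def gaussian_kl(mu0, cov0, mu1, cov1):
    """KL( N(mu0, cov0) || N(mu1, cov1) ) for k-dimensional Gaussians."""
    k = mu0.shape[0]
    cov1_inv = np.linalg.inv(cov1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(cov1_inv @ cov0)
                  + diff @ cov1_inv @ diff
                  - k
                  + np.log(np.linalg.det(cov1) / np.linalg.det(cov0)))

def ellipse_to_gaussian(cx, cy, a, b, theta):
    """Semi-axes a, b and rotation theta -> mean and covariance."""
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    cov = R @ np.diag([a**2, b**2]) @ R.T
    return np.array([cx, cy]), cov

mu_p, cov_p = ellipse_to_gaussian(50, 40, 10, 5, 0.2)   # proposal
mu_g, cov_g = ellipse_to_gaussian(52, 41, 11, 5, 0.25)  # ground truth
print(gaussian_kl(mu_p, cov_p, mu_g, cov_g))  # small for close ellipses
```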
SIXray: A Large-scale Security Inspection X-ray Benchmark for Prohibited Item Discovery in Overlapping Images
Title | SIXray: A Large-scale Security Inspection X-ray Benchmark for Prohibited Item Discovery in Overlapping Images |
Authors | Caijing Miao, Lingxi Xie, Fang Wan, Chi Su, Hongye Liu, Jianbin Jiao, Qixiang Ye |
Abstract | In this paper, we present a large-scale dataset and establish a baseline for prohibited item discovery in security inspection X-ray images. Our dataset, named SIXray, consists of 1,059,231 X-ray images, in which 6 classes of 8,929 prohibited items are manually annotated. It raises a brand-new challenge of overlapping image data while sharing the same properties as existing datasets, including complex yet meaningless contexts and class imbalance. We propose an approach named class-balanced hierarchical refinement (CHR) to deal with these difficulties. CHR assumes that each input image is sampled from a mixture distribution, and that deep networks require an iterative process to infer image contents accurately. To accelerate this process, we insert reversed connections into different network backbones, delivering high-level visual cues to assist mid-level features. In addition, a class-balanced loss function is designed to maximally alleviate the noise introduced by easy negative samples. We evaluate CHR on SIXray with different ratios of positive/negative samples. Compared to the baselines, CHR enjoys a better ability to discriminate objects, especially using mid-level features, which offers the possibility of using a weakly supervised approach towards accurate object localization. In particular, the advantage of CHR is more significant in scenarios with fewer positive training samples, which demonstrates its potential application in real-world security inspection. |
Tasks | Object Localization |
Published | 2019-01-02 |
URL | http://arxiv.org/abs/1901.00303v1 |
http://arxiv.org/pdf/1901.00303v1.pdf | |
PWC | https://paperswithcode.com/paper/sixray-a-large-scale-security-inspection-x |
Repo | https://github.com/MeioJane/SIXray |
Framework | none |
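With roughly one million images but under nine thousand positives, easy negatives dominate an unweighted loss. The sketch below shows one simple class-balanced weighting in the spirit of CHR's loss design; the exact CHR formulation differs, and this weighting scheme is our illustrative assumption.

```python
# Sketch of a class-balanced binary loss: down-weight the overwhelming easy
# negatives so rare positives dominate the gradient.
import torch
import torch.nn.functional as F

def class_balanced_bce(logits, targets):
    pos = targets.sum().clamp(min=1)
    neg = (targets.numel() - pos).clamp(min=1)
    # weight each class inversely to its frequency in the batch
    weights = torch.where(targets == 1, neg / targets.numel(), pos / targets.numel())
    return F.binary_cross_entropy_with_logits(logits, targets, weight=weights)

logits = torch.randn(1000)
targets = (torch.rand(1000) < 0.01).float()  # ~1% positives, like SIXray
print(class_balanced_bce(logits, targets))
```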
$β^3$-IRT: A New Item Response Model and its Applications
Title | $β^3$-IRT: A New Item Response Model and its Applications |
Authors | Yu Chen, Telmo Silva Filho, Ricardo B. C. Prudêncio, Tom Diethe, Peter Flach |
Abstract | Item Response Theory (IRT) aims to assess the latent abilities of respondents based on the correctness of their answers to aptitude test items with different difficulty levels. In this paper, we propose the $\beta^3$-IRT model, which models continuous responses and can generate a much richer family of Item Characteristic Curves (ICCs). In experiments, we apply the proposed model to data from an online exam platform and show that it outperforms a more standard 2PL-ND model on all datasets. Furthermore, we show how to apply $\beta^3$-IRT to assess the ability of machine learning classifiers. This novel application yields a new metric for evaluating the quality of a classifier's probability estimates, based on the inferred difficulty and discrimination of data instances. |
Tasks | |
Published | 2019-03-10 |
URL | https://arxiv.org/abs/1903.04016v3 |
https://arxiv.org/pdf/1903.04016v3.pdf | |
PWC | https://paperswithcode.com/paper/3-irt-a-new-item-response-model-and-its |
Repo | https://github.com/yc14600/beta3_IRT |
Framework | tf |
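The key ingredient is a Beta-distributed response whose mean follows an item characteristic curve in ability, difficulty, and discrimination. The sketch below uses one plausible parameterization consistent with that description; treat the exact functional form as an assumption and consult the paper for the true $\beta^3$-IRT model.

```python
# Hedged sketch of a Beta-response ICC: expected response rises with ability
# and falls with difficulty; the parameterization here is illustrative.
import numpy as np

def beta_icc(theta, delta, a):
    """Expected response for ability theta and difficulty delta (both in
    (0,1)) with discrimination a, as the mean of Beta(alpha, beta)."""
    alpha = (theta / delta) ** a
    beta = ((1 - theta) / (1 - delta)) ** a
    return alpha / (alpha + beta)

theta = np.linspace(0.05, 0.95, 5)
for delta in (0.3, 0.5, 0.7):
    print(delta, np.round(beta_icc(theta, delta, a=2.0), 3))
# At theta == delta the expected response is 0.5, as in classical IRT.
```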
Optimal Feature Transport for Cross-View Image Geo-Localization
Title | Optimal Feature Transport for Cross-View Image Geo-Localization |
Authors | Yujiao Shi, Xin Yu, Liu Liu, Tong Zhang, Hongdong Li |
Abstract | This paper addresses the problem of cross-view image geo-localization, where the geographic location of a ground-level street-view query image is estimated by matching it against a large-scale aerial map (e.g., a high-resolution satellite image). State-of-the-art deep-learning based methods tackle this problem as deep metric learning, which aims to learn global feature representations of the scene seen by the two different views. Although such deep metric learning methods obtain promising results, they fail to exploit a crucial cue relevant for localization, namely, the spatial layout of local features. Moreover, little attention is paid to the obvious domain gap (between aerial view and ground view) in the context of cross-view localization. This paper proposes a novel Cross-View Feature Transport (CVFT) technique to explicitly establish cross-view domain transfer that facilitates feature alignment between ground and aerial images. Specifically, we implement CVFT as network layers that transport features from one domain to the other, leading to more meaningful feature similarity comparison. Our model is differentiable and can be learned end-to-end. Experiments on large-scale datasets have demonstrated that our method remarkably boosts state-of-the-art cross-view localization performance, e.g., on the CVUSA dataset, with significant improvements in top-1 recall from 40.79% to 61.43%, and in top-10 recall from 76.36% to 90.49%. We expect the key insight of the paper (i.e., explicitly handling domain difference via domain transport) to prove useful for other similar problems in computer vision as well. |
Tasks | Image-Based Localization, Metric Learning |
Published | 2019-07-11 |
URL | https://arxiv.org/abs/1907.05021v3 |
https://arxiv.org/pdf/1907.05021v3.pdf | |
PWC | https://paperswithcode.com/paper/optimal-feature-transport-for-cross-view |
Repo | https://github.com/shiyujiao/cross_view_localization_CVFT |
Framework | tf |
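Feature transport of this kind is commonly realized with entropic optimal transport solved by Sinkhorn iterations, which are differentiable and so can sit inside a network. The sketch below shows that standard solver; whether it matches CVFT's exact layer is an assumption on our part.

```python
# Sketch of feature transport via Sinkhorn iterations (standard entropic
# optimal transport with uniform marginals).
import numpy as np

def sinkhorn(cost, n_iters=50, eps=0.1):
    K = np.exp(-cost / eps)
    a = np.full(cost.shape[0], 1.0 / cost.shape[0])  # uniform marginals
    b = np.full(cost.shape[1], 1.0 / cost.shape[1])
    v = np.ones(cost.shape[1]) / cost.shape[1]
    for _ in range(n_iters):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return np.diag(u) @ K @ np.diag(v)  # transport plan

rng = np.random.default_rng(0)
ground_feat = rng.random((8, 16))   # ground-view feature cells
aerial_feat = rng.random((8, 16))   # aerial-view feature cells
cost = np.linalg.norm(ground_feat[:, None] - aerial_feat[None], axis=-1)
P = sinkhorn(cost)
transported = (P * P.shape[0]) @ aerial_feat  # aerial features re-aligned
print(P.sum(), transported.shape)  # plan mass ~1, (8, 16)
```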
Equivariant neural networks and equivarification
Title | Equivariant neural networks and equivarification |
Authors | Erkao Bao, Linqi Song |
Abstract | We provide a process for modifying a neural network into an equivariant one, which we call equivarification. As an illustration, we build an equivariant neural network for image classification by equivarifying a convolutional neural network. |
Tasks | Image Classification |
Published | 2019-06-16 |
URL | https://arxiv.org/abs/1906.07172v4 |
https://arxiv.org/pdf/1906.07172v4.pdf | |
PWC | https://paperswithcode.com/paper/equivariant-neural-networks-and |
Repo | https://github.com/symplecticgeometry/equivariant-neural-networks-and-equivarification |
Framework | tf |
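One way to see the idea is the following illustrative construction for the rotation group C4 (our simplification, not necessarily the paper's exact procedure): evaluate a base network on every group-transformed copy of the input and stack the outputs, so that rotating the input merely permutes the stack.

```python
# Equivarification sketch for C4: rotating the input by 90 degrees cyclically
# permutes the rows of the stacked output (equivariance by construction).
import torch
import torch.nn as nn

base = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))

def equivarified(x):
    # Base-network outputs for all four rotations of the input.
    return torch.stack([base(torch.rot90(x, k, dims=(2, 3))) for k in range(4)])

x = torch.randn(1, 1, 28, 28)
out = equivarified(x)
out_rot = equivarified(torch.rot90(x, 1, dims=(2, 3)))
print(torch.allclose(out.roll(-1, dims=0), out_rot))  # True
```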
Non-Parametric Calibration for Classification
Title | Non-Parametric Calibration for Classification |
Authors | Jonathan Wenger, Hedvig Kjellström, Rudolph Triebel |
Abstract | Many applications of classification methods not only require high accuracy but also reliable estimation of predictive uncertainty. However, while many current classification frameworks, in particular deep neural networks, achieve high accuracy, they tend to incorrectly estimate uncertainty. In this paper, we propose a method that adjusts the confidence estimates of a general classifier such that they approach the probability of classifying correctly. In contrast to existing approaches, our calibration method employs a non-parametric representation using a latent Gaussian process and is specifically designed for multi-class classification. It can be applied to any classifier that outputs confidence estimates and is not limited to neural networks. We also provide a theoretical analysis regarding the over- and underconfidence of a classifier and its relationship to calibration, as well as an empirical outlook for calibrated active learning. In experiments we show the universally strong performance of our method across different classifiers and benchmark data sets, in particular for state-of-the-art neural network architectures. |
Tasks | Active Learning, Calibration |
Published | 2019-06-12 |
URL | https://arxiv.org/abs/1906.04933v3 |
https://arxiv.org/pdf/1906.04933v3.pdf | |
PWC | https://paperswithcode.com/paper/non-parametric-calibration-for-classification |
Repo | https://github.com/JonathanWenger/pycalib |
Framework | none |
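The quantity calibration methods target is the gap between confidence and accuracy, commonly summarized as the expected calibration error (ECE). The sketch below computes this standard diagnostic; it illustrates the problem setting, not the paper's latent-GP calibrator itself.

```python
# Expected calibration error: bin predictions by confidence and compare
# average confidence with empirical accuracy in each bin.
import numpy as np

def expected_calibration_error(conf, correct, n_bins=10):
    bins = np.linspace(0, 1, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            ece += mask.mean() * abs(conf[mask].mean() - correct[mask].mean())
    return ece

rng = np.random.default_rng(0)
conf = rng.uniform(0.5, 1.0, 10000)
correct = (rng.random(10000) < conf ** 2).astype(float)  # overconfident model
print(expected_calibration_error(conf, correct))  # clearly above 0
```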
Monocular Neural Image Based Rendering with Continuous View Control
Title | Monocular Neural Image Based Rendering with Continuous View Control |
Authors | Xu Chen, Jie Song, Otmar Hilliges |
Abstract | We present an approach that learns to synthesize high-quality, novel views of 3D objects or scenes, while providing fine-grained and precise control over the 6-DOF viewpoint. The approach is self-supervised and only requires 2D images and associated view transforms for training. Our main contribution is a network architecture that leverages a transforming auto-encoder in combination with a depth-guided warping procedure to predict geometrically accurate unseen views. Leveraging geometric constraints renders direct supervision via depth or flow maps unnecessary. If large parts of the object are occluded in the source view, a purely learning-based prior is used to predict the values of dis-occluded pixels. Our network furthermore predicts a per-pixel mask, used to fuse depth-guided and pixel-based predictions. The resulting images reflect the desired 6-DOF transformation, and details are preserved. We thoroughly evaluate our architecture on synthetic and real scenes and under fine-grained and fixed-view settings. Finally, we demonstrate that the approach generalizes to entirely unseen images, such as product images downloaded from the internet. |
Tasks | Novel View Synthesis |
Published | 2019-01-07 |
URL | https://arxiv.org/abs/1901.01880v2 |
https://arxiv.org/pdf/1901.01880v2.pdf | |
PWC | https://paperswithcode.com/paper/nvs-machines-learning-novel-view-synthesis |
Repo | https://github.com/xuchen-ethz/continuous_view_synthesis |
Framework | pytorch |
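The per-pixel fusion step described in the abstract reduces to a simple convex blend; the sketch below shows it with stand-in tensors (the warp, the learned prediction, and the mask would all come from the network in practice).

```python
# Blend the depth-guided warped prediction with a learned (hallucinated)
# prediction using a predicted per-pixel mask.
import torch

def fuse(warped, generated, mask):
    """warped, generated: [B, 3, H, W]; mask: [B, 1, H, W] in [0, 1]."""
    return mask * warped + (1 - mask) * generated

warped = torch.rand(2, 3, 64, 64)      # geometry-based view prediction
generated = torch.rand(2, 3, 64, 64)   # learned prior for dis-occlusions
mask = torch.rand(2, 1, 64, 64)        # per-pixel confidence in the warp
print(fuse(warped, generated, mask).shape)  # torch.Size([2, 3, 64, 64])
```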
A Little Is Enough: Circumventing Defenses For Distributed Learning
Title | A Little Is Enough: Circumventing Defenses For Distributed Learning |
Authors | Moran Baruch, Gilad Baruch, Yoav Goldberg |
Abstract | Distributed learning is central to the large-scale training of deep-learning models. However, it is exposed to a security threat in which Byzantine participants can interrupt or control the learning process. Previous attack models and their corresponding defenses assume that the rogue participants are (a) omniscient (know the data of all other participants), and (b) introduce large changes to the parameters. We show that small but well-crafted changes are sufficient, leading to a novel non-omniscient attack on distributed learning that goes undetected by all existing defenses. We demonstrate that our attack method works not only to prevent convergence but also to repurpose the model's behavior (backdooring). We show that 20% corrupt workers are sufficient to degrade a CIFAR10 model's accuracy by 50%, as well as to introduce backdoors into MNIST and CIFAR10 models without hurting their accuracy. |
Tasks | |
Published | 2019-02-16 |
URL | http://arxiv.org/abs/1902.06156v1 |
http://arxiv.org/pdf/1902.06156v1.pdf | |
PWC | https://paperswithcode.com/paper/a-little-is-enough-circumventing-defenses-for |
Repo | https://github.com/hwang595/DETOX |
Framework | pytorch |
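The intuition behind "small but well-crafted changes" can be illustrated numerically: corrupt workers submit an update that stays within a modest number of standard deviations of the benign mean, so distance-based defenses do not flag it, yet the aggregate shifts. The z-value and setup below are illustrative, not the paper's exact attack parameters.

```python
# Sketch of a small-perturbation Byzantine update that stays statistically
# close to the benign updates while biasing the aggregate.
import numpy as np

rng = np.random.default_rng(0)
benign_grads = rng.normal(loc=1.0, scale=0.2, size=(18, 100))  # 18 workers
mu, sigma = benign_grads.mean(axis=0), benign_grads.std(axis=0)
z = 1.0                                   # small, hard-to-detect shift
byzantine_grad = mu - z * sigma           # submitted by each corrupt worker
all_grads = np.vstack([benign_grads, np.tile(byzantine_grad, (4, 1))])
print(np.abs(all_grads.mean(axis=0) - mu).mean())  # aggregate is shifted
```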
Human Motion Prediction via Learning Local Structure Representations and Temporal Dependencies
Title | Human Motion Prediction via Learning Local Structure Representations and Temporal Dependencies |
Authors | Xiao Guo, Jongmoo Choi |
Abstract | Human motion prediction from motion capture data is a classical problem in computer vision, and conventional methods take the holistic human body as input. These methods ignore the fact that, in various human activities, different body components (limbs and the torso) have distinctive characteristics in terms of moving pattern. In this paper, we argue that local representations of different body components should be learned separately and, based on this idea, propose a network, Skeleton Network (SkelNet), for long-term human motion prediction. Specifically, at each time step, local structure representations of the input (human body) are obtained via SkelNet's component-specific branches; a shared layer then uses these local spatial representations to predict the future human pose. Our SkelNet is the first to use local structure representations for predicting human motion. For short-term human motion prediction, we propose a second network, named Skeleton Temporal Network (Skel-TNet). Skel-TNet consists of three components: SkelNet and a recurrent neural network, which have advantages in learning spatial and temporal dependencies for predicting human motion, respectively, and a feed-forward network that outputs the final estimation. Our methods achieve promising results on the Human3.6M dataset and the CMU motion capture dataset. |
Tasks | Motion Capture, motion prediction |
Published | 2019-02-20 |
URL | https://arxiv.org/abs/1902.07367v2 |
https://arxiv.org/pdf/1902.07367v2.pdf | |
PWC | https://paperswithcode.com/paper/human-motion-prediction-via-learning-local |
Repo | https://github.com/CHELSEA234/SkelNet_motion_prediction |
Framework | tf |
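The component-specific branching is easy to picture as code: one small branch per body part, with a shared layer fusing their outputs. The sketch below is our illustration of that structure; the part partitioning, dimensions, and layer sizes are assumptions, not the paper's architecture.

```python
# Sketch of component-specific branches plus a shared output layer.
import torch
import torch.nn as nn

class TinySkelNet(nn.Module):
    PARTS = {"torso": 12, "left_arm": 9, "right_arm": 9,
             "left_leg": 9, "right_leg": 9}  # joint dims per component

    def __init__(self, hidden=32):
        super().__init__()
        self.branches = nn.ModuleDict(
            {name: nn.Sequential(nn.Linear(d, hidden), nn.ReLU())
             for name, d in self.PARTS.items()})
        self.shared = nn.Linear(hidden * len(self.PARTS),
                                sum(self.PARTS.values()))

    def forward(self, parts):
        feats = [self.branches[k](parts[k]) for k in self.PARTS]
        return self.shared(torch.cat(feats, dim=-1))  # next-frame pose

net = TinySkelNet()
pose = {k: torch.randn(1, d) for k, d in TinySkelNet.PARTS.items()}
print(net(pose).shape)  # torch.Size([1, 48])
```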
Noise Regularization for Conditional Density Estimation
Title | Noise Regularization for Conditional Density Estimation |
Authors | Jonas Rothfuss, Fabio Ferreira, Simon Boehm, Simon Walther, Maxim Ulrich, Tamim Asfour, Andreas Krause |
Abstract | Modelling statistical relationships beyond the conditional mean is crucial in many settings. Conditional density estimation (CDE) aims to learn the full conditional probability density from data. Though highly expressive, neural-network-based CDE models can suffer from severe over-fitting when trained with the maximum likelihood objective. Due to the inherent structure of such models, classical regularization approaches in the parameter space are rendered ineffective. To address this issue, we develop a model-agnostic noise regularization method for CDE that adds random perturbations to the data during training. We demonstrate that the proposed approach corresponds to a smoothness regularization and prove its asymptotic consistency. In our experiments, noise regularization significantly and consistently outperforms other regularization methods across seven data sets and three CDE models. The effectiveness of noise regularization makes neural-network-based CDE preferable to previous non- and semi-parametric approaches, even when training data is scarce. |
Tasks | Density Estimation |
Published | 2019-07-21 |
URL | https://arxiv.org/abs/1907.08982v2 |
https://arxiv.org/pdf/1907.08982v2.pdf | |
PWC | https://paperswithcode.com/paper/noise-regularization-for-conditional-density |
Repo | https://github.com/freelunchtheorem/Conditional_Density_Estimation |
Framework | tf |
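The method itself is simple enough to state directly: add fresh random perturbations to the (x, y) pairs at every training step. The sketch below shows that core step; the Gaussian noise and its scale are our illustrative choices of perturbation hyperparameters.

```python
# Noise regularization for CDE: perturb both inputs and targets with fresh
# noise at each maximum-likelihood training step.
import numpy as np

def noisy_batch(x, y, noise_std=0.1, rng=np.random.default_rng()):
    """Return a perturbed copy of the batch; noise is resampled each call."""
    return (x + rng.normal(scale=noise_std, size=x.shape),
            y + rng.normal(scale=noise_std, size=y.shape))

x = np.random.randn(128, 3)   # conditioning variables
y = np.random.randn(128, 1)   # targets whose density we model
x_t, y_t = noisy_batch(x, y)  # use these for one training step
print(x_t.shape, y_t.shape)
```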
Generative Model with Dynamic Linear Flow
Title | Generative Model with Dynamic Linear Flow |
Authors | Huadong Liao, Jiawei He, Kunxian Shu |
Abstract | Flow-based generative models are a family of exact log-likelihood models with tractable sampling and latent-variable inference, and hence conceptually attractive for modeling complex distributions. However, flow-based models lag behind state-of-the-art autoregressive models in density estimation performance. Autoregressive models, which also belong to the family of likelihood-based methods, in turn suffer from limited parallelizability. In this paper, we propose Dynamic Linear Flow (DLF), a new family of invertible transformations with a partially autoregressive structure. Our method benefits from the efficient computation of flow-based methods and the high density estimation performance of autoregressive methods. We demonstrate that the proposed DLF yields state-of-the-art performance on ImageNet 32x32 and 64x64 among all flow-based methods, and is competitive with the best autoregressive model. Additionally, our model converges 10 times faster than Glow (Kingma and Dhariwal, 2018). The code is available at https://github.com/naturomics/DLF. |
Tasks | Density Estimation |
Published | 2019-05-08 |
URL | https://arxiv.org/abs/1905.03239v1 |
https://arxiv.org/pdf/1905.03239v1.pdf | |
PWC | https://paperswithcode.com/paper/generative-model-with-dynamic-linear-flow |
Repo | https://github.com/naturomics/DLF |
Framework | tf |
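For readers new to flows, the family DLF belongs to is built from invertible transformations with cheap log-determinants. The sketch below shows the standard affine coupling layer — the generic building block, not DLF's partially autoregressive variant — including the exact log-likelihood term and the inverse.

```python
# A generic affine coupling layer: invertible by construction, with an
# exact, cheap log-determinant for maximum-likelihood training.
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim // 2, 64), nn.ReLU(),
                                 nn.Linear(64, dim))  # outputs scale & shift

    def forward(self, x):
        x1, x2 = x.chunk(2, dim=-1)
        log_s, t = self.net(x1).chunk(2, dim=-1)
        y2 = x2 * torch.exp(log_s) + t
        logdet = log_s.sum(dim=-1)          # exact log-likelihood term
        return torch.cat([x1, y2], dim=-1), logdet

    def inverse(self, y):
        y1, y2 = y.chunk(2, dim=-1)
        log_s, t = self.net(y1).chunk(2, dim=-1)
        return torch.cat([y1, (y2 - t) * torch.exp(-log_s)], dim=-1)

layer = AffineCoupling(4)
x = torch.randn(2, 4)
y, logdet = layer(x)
print(torch.allclose(layer.inverse(y), x, atol=1e-6))  # True
```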