April 3, 2020

Paper Group AWR 34

Fast is better than free: Revisiting adversarial training. Blurry Video Frame Interpolation. CAE-LO: LiDAR Odometry Leveraging Fully Unsupervised Convolutional Auto-Encoder for Interest Point Detection and Feature Description. Boosting Adversarial Training with Hypersphere Embedding. Collaborative Motion Prediction via Neural Motion Message Passing …

Fast is better than free: Revisiting adversarial training

Title Fast is better than free: Revisiting adversarial training
Authors Eric Wong, Leslie Rice, J. Zico Kolter
Abstract Adversarial training, a method for learning robust deep networks, is typically assumed to be more expensive than traditional training due to the necessity of constructing adversarial examples via a first-order method like projected gradient descent (PGD). In this paper, we make the surprising discovery that it is possible to train empirically robust models using a much weaker and cheaper adversary, an approach that was previously believed to be ineffective, rendering the method no more costly than standard training in practice. Specifically, we show that adversarial training with the fast gradient sign method (FGSM), when combined with random initialization, is as effective as PGD-based training but has significantly lower cost. Furthermore, we show that FGSM adversarial training can be further accelerated by using standard techniques for efficient training of deep networks, allowing us to learn a robust CIFAR10 classifier with 45% robust accuracy to PGD attacks with $\epsilon=8/255$ in 6 minutes, and a robust ImageNet classifier with 43% robust accuracy at $\epsilon=2/255$ in 12 hours, in comparison to past work based on “free” adversarial training which took 10 and 50 hours to reach the same respective thresholds. Finally, we identify a failure mode referred to as “catastrophic overfitting” which may have caused previous attempts to use FGSM adversarial training to fail. All code for reproducing the experiments in this paper as well as pretrained model weights are at https://github.com/locuslab/fast_adversarial.
Tasks
Published 2020-01-12
URL https://arxiv.org/abs/2001.03994v1
PDF https://arxiv.org/pdf/2001.03994v1.pdf
PWC https://paperswithcode.com/paper/fast-is-better-than-free-revisiting-1
Repo https://github.com/locuslab/fast_adversarial
Framework pytorch
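
The FGSM step with random initialization described in the abstract above can be illustrated with a small sketch. This is not the authors' code (see the linked repo for that): it assumes a toy logistic-regression model so the input gradient has a closed form, and the step size `alpha = 1.25 * eps` as well as the names `fgsm_random_init`, `input_gradient`, and `loss` are assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss(x, y, w, b):
    """Binary cross-entropy of a toy logistic model on one example."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def input_gradient(x, y, w, b):
    """Gradient of the loss with respect to the input x (closed form)."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return [(p - y) * wi for wi in w]


def sign(v):
    return (v > 0) - (v < 0)


def fgsm_random_init(x, y, w, b, eps, alpha, rng):
    """One FGSM step from a random start inside the L-infinity ball."""
    # Random initialization: uniform in [-eps, eps] per coordinate.
    delta = [rng.uniform(-eps, eps) for _ in x]
    x_start = [xi + di for xi, di in zip(x, delta)]
    grad = input_gradient(x_start, y, w, b)
    # Single signed-gradient step, then project back onto the eps-ball.
    delta = [max(-eps, min(eps, di + alpha * sign(gi)))
             for di, gi in zip(delta, grad)]
    return [xi + di for xi, di in zip(x, delta)]


rng = random.Random(0)
w, b = [1.5, -2.0, 0.5], 0.1   # hypothetical model parameters
x, y = [0.2, 0.4, 0.6], 1      # hypothetical input and label
eps = 8 / 255
x_adv = fgsm_random_init(x, y, w, b, eps, alpha=1.25 * eps, rng=rng)
```

In adversarial training, the model would then be updated on `x_adv` instead of `x`. A real image pipeline would typically also clamp `x_adv` to the valid pixel range, which this sketch omits.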

Blurry Video Frame Interpolation

Title Blurry Video Frame Interpolation
Authors Wang Shen, Wenbo Bao, Guangtao Zhai, Li Chen, Xiongkuo Min, Zhiyong Gao
Abstract Existing works reduce motion blur and up-convert frame rate in two separate ways: frame deblurring and frame interpolation. However, few studies have approached the joint video enhancement problem, namely synthesizing high-frame-rate clear results from low-frame-rate blurry inputs. In this paper, we propose a blurry video frame interpolation method to reduce motion blur and up-convert frame rate simultaneously. Specifically, we develop a pyramid module to cyclically synthesize clear intermediate frames. The pyramid module features an adjustable spatial receptive field and temporal scope, thus contributing to controllable computational complexity and restoration ability. Besides, we propose an inter-pyramid recurrent module to connect sequential models to exploit the temporal relationship. The pyramid module integrates a recurrent module, and can thus iteratively synthesize temporally smooth results without significantly increasing the model size. Extensive experimental results demonstrate that our method performs favorably against state-of-the-art methods.
Tasks Deblurring, Video Frame Interpolation
Published 2020-02-27
URL https://arxiv.org/abs/2002.12259v1
PDF https://arxiv.org/pdf/2002.12259v1.pdf
PWC https://paperswithcode.com/paper/blurry-video-frame-interpolation
Repo https://github.com/laomao0/BIN
Framework pytorch

CAE-LO: LiDAR Odometry Leveraging Fully Unsupervised Convolutional Auto-Encoder for Interest Point Detection and Feature Description

Title CAE-LO: LiDAR Odometry Leveraging Fully Unsupervised Convolutional Auto-Encoder for Interest Point Detection and Feature Description
Authors Deyu Yin, Qian Zhang, Jingbin Liu, Xinlian Liang, Yunsheng Wang, Jyri Maanpää, Hao Ma, Juha Hyyppä, Ruizhi Chen
Abstract As an important technology in 3D mapping, autonomous driving, and robot navigation, LiDAR odometry is still a challenging task. An appropriate data structure and unsupervised deep learning are the keys to achieving an easily adjusted LiDAR odometry solution with high performance. Utilizing a compact 2D structured spherical ring projection model and a voxel model that preserves the original shape of input data, we propose a fully unsupervised Convolutional Auto-Encoder based LiDAR Odometry (CAE-LO) that detects interest points from spherical ring data using a 2D CAE and extracts features from a multi-resolution voxel model using a 3D CAE. We make several key contributions: 1) experiments on the KITTI dataset show that our interest points can capture more local details to improve the matching success rate in unstructured scenarios and our features outperform the state of the art by more than 50% in matching inlier ratio; 2) we also propose a keyframe selection method based on matching pairs transferring, an odometry refinement method for keyframes based on extended interest points from spherical rings, and a backward pose update method. The odometry refinement experiments verify the proposed ideas’ feasibility and effectiveness.
Tasks Autonomous Driving, Interest Point Detection, Robot Navigation
Published 2020-01-06
URL https://arxiv.org/abs/2001.01354v2
PDF https://arxiv.org/pdf/2001.01354v2.pdf
PWC https://paperswithcode.com/paper/cae-lo-lidar-odometry-leveraging-fully
Repo https://github.com/SRainGit/CAE-LO
Framework none

Boosting Adversarial Training with Hypersphere Embedding

Title Boosting Adversarial Training with Hypersphere Embedding
Authors Tianyu Pang, Xiao Yang, Yinpeng Dong, Kun Xu, Hang Su, Jun Zhu
Abstract Adversarial training (AT) is one of the most effective defenses to improve the adversarial robustness of deep learning models. In order to promote the reliability of the adversarially trained models, we propose to boost AT via incorporating hypersphere embedding (HE), which can regularize the adversarial features onto compact hypersphere manifolds. We formally demonstrate that AT and HE are well coupled, which tunes up the learning dynamics of AT from several aspects. We comprehensively validate the effectiveness and universality of HE by embedding it into the popular AT frameworks including PGD-AT, ALP, and TRADES, as well as the FreeAT and FastAT strategies. In experiments, we evaluate our methods on the CIFAR-10 and ImageNet datasets, and verify that integrating HE can consistently enhance the performance of the models trained by each AT framework with little extra computation.
Tasks
Published 2020-02-20
URL https://arxiv.org/abs/2002.08619v1
PDF https://arxiv.org/pdf/2002.08619v1.pdf
PWC https://paperswithcode.com/paper/boosting-adversarial-training-with
Repo https://github.com/ShawnXYang/AT_HE
Framework pytorch
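
As a rough illustration of what placing features on a hypersphere can look like, the sketch below computes scaled cosine logits from unit-normalized features and class weights. The abstract does not spell out the exact formulation, so the normalization of both vectors, the `scale` value, the omission of any angular margin, and the function names are all assumptions here rather than the authors' implementation.

```python
import math


def l2_normalize(v):
    """Project a non-zero vector onto the unit hypersphere."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def he_logits(feature, class_weights, scale=15.0):
    """Scaled cosine logits: feature and class weights are unit-normalized."""
    f = l2_normalize(feature)
    logits = []
    for w in class_weights:
        w_hat = l2_normalize(w)
        cos = sum(a * b for a, b in zip(f, w_hat))
        logits.append(scale * cos)
    return logits


# Hypothetical 2-D features and three class weight vectors.
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
logits = he_logits([3.0, 4.0], weights)
rescaled = he_logits([30.0, 40.0], weights)  # same direction, larger norm
```

Because only the direction of the feature matters after normalization, `logits` and `rescaled` coincide, and every logit is bounded by `scale` in magnitude.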

Collaborative Motion Prediction via Neural Motion Message Passing

Title Collaborative Motion Prediction via Neural Motion Message Passing
Authors Yue Hu, Siheng Chen, Ya Zhang, Xiao Gu
Abstract Motion prediction is essential and challenging for autonomous vehicles and social robots. One challenge of motion prediction is to model the interaction among traffic actors, which could cooperate with each other to avoid collisions or form groups. To address this challenge, we propose neural motion message passing (NMMP) to explicitly model the interaction and learn representations for directed interactions between actors. Based on the proposed NMMP, we design motion prediction systems for two settings: the pedestrian setting and the joint pedestrian and vehicle setting. Both systems share a common pattern: we use an individual branch to model the behavior of a single actor and an interactive branch to model the interaction between actors, with different wrappers to handle the varied input formats and characteristics. The experimental results show that both systems outperform the previous state-of-the-art methods on several existing benchmarks. Besides, we provide interpretability for interaction learning.
Tasks Autonomous Vehicles, motion prediction
Published 2020-03-14
URL https://arxiv.org/abs/2003.06594v1
PDF https://arxiv.org/pdf/2003.06594v1.pdf
PWC https://paperswithcode.com/paper/collaborative-motion-prediction-via-neural
Repo https://github.com/PhyllisH/NMMP
Framework pytorch

Vision Meets Drones: Past, Present and Future

Title Vision Meets Drones: Past, Present and Future
Authors Pengfei Zhu, Longyin Wen, Dawei Du, Xiao Bian, Qinghua Hu, Haibin Ling
Abstract Drones, or general UAVs, equipped with cameras have been rapidly deployed in a wide range of applications, including agriculture, aerial photography, fast delivery, and surveillance. Consequently, automatic understanding of visual data collected from drones is in high demand, bringing computer vision and drones closer together. To promote and track the developments of object detection and tracking algorithms, we have organized two challenge workshops in conjunction with the European Conference on Computer Vision (ECCV) 2018 and the IEEE International Conference on Computer Vision (ICCV) 2019, attracting more than 100 teams around the world. We provide a large-scale drone-captured dataset, VisDrone, which includes four tracks, i.e., (1) image object detection, (2) video object detection, (3) single object tracking, and (4) multi-object tracking. This paper first presents a thorough review of object detection and tracking datasets and benchmarks, and discusses the challenges of collecting large-scale drone-based object detection and tracking datasets with fully manual annotations. After that, we describe our VisDrone dataset, which is captured over various urban/suburban areas of 14 different cities across China from North to South. Being the largest such dataset ever published, VisDrone enables extensive evaluation and investigation of visual analysis algorithms on the drone platform. We provide a detailed analysis of the current state of the field of large-scale object detection and tracking on drones, conclude the challenge, and propose future directions and improvements. We expect the benchmark to largely boost the research and development in video analysis on drone platforms. All the datasets and experimental results can be downloaded from the website: https://github.com/VisDrone/VisDrone-Dataset.
Tasks Multi-Object Tracking, Object Detection, Object Tracking, Video Object Detection
Published 2020-01-16
URL https://arxiv.org/abs/2001.06303v1
PDF https://arxiv.org/pdf/2001.06303v1.pdf
PWC https://paperswithcode.com/paper/vision-meets-drones-past-present-and-future
Repo https://github.com/VisDrone/VisDrone-Dataset
Framework none

Variation across Scales: Measurement Fidelity under Twitter Data Sampling

Title Variation across Scales: Measurement Fidelity under Twitter Data Sampling
Authors Siqi Wu, Marian-Andrei Rizoiu, Lexing Xie
Abstract A comprehensive understanding of data quality is the cornerstone of measurement studies in social media research. This paper presents in-depth measurements on the effects of Twitter data sampling across different timescales and different subjects (entities, networks, and cascades). By constructing complete tweet streams, we show that the Twitter rate limit message is an accurate indicator for the volume of missing tweets. Sampling also differs significantly across timescales. While the hourly sampling rate is influenced by the diurnal rhythm in different time zones, the millisecond-level sampling is heavily affected by implementation choices. For Twitter entities such as users, we find the Bernoulli process with a uniform rate approximates the empirical distributions well. It also allows us to estimate the true ranking with the observed sample data. For networks on Twitter, their structures are altered significantly and some components are more likely to be preserved. For retweet cascades, we observe changes in distributions of tweet inter-arrival time and user influence, which will affect models that rely on these features. This work calls attention to noise and potential biases in social data, and provides a few tools to measure Twitter sampling effects.
Tasks
Published 2020-03-21
URL https://arxiv.org/abs/2003.09557v2
PDF https://arxiv.org/pdf/2003.09557v2.pdf
PWC https://paperswithcode.com/paper/variation-across-scales-measurement-fidelity
Repo https://github.com/avalanchesiqi/twitter-sampling
Framework none
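
The Bernoulli process with a uniform rate mentioned in the abstract above can be simulated in a few lines. The sketch below is an illustration only: the sampling rate of 0.1, the entity volume of 10,000 tweets, and the simple inverse-rate estimator are assumptions chosen for the example, not figures or tools from the paper.

```python
import random


def bernoulli_sample(n_true, rate, rng):
    """Keep each of n_true tweets independently with probability `rate`."""
    return sum(1 for _ in range(n_true) if rng.random() < rate)


def estimate_true_count(n_observed, rate):
    """Invert the expected thinning: E[n_observed] = rate * n_true."""
    return n_observed / rate


rng = random.Random(42)
n_true, rate = 10_000, 0.1          # hypothetical volume and sampling rate
n_obs = bernoulli_sample(n_true, rate, rng)
n_est = estimate_true_count(n_obs, rate)
```

Under this model the observed count is binomially distributed, so the relative error of `n_est` shrinks as the true volume grows; entities with few tweets are the ones whose estimated ranks are least reliable.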

Identifying Mislabeled Data using the Area Under the Margin Ranking

Title Identifying Mislabeled Data using the Area Under the Margin Ranking
Authors Geoff Pleiss, Tianyi Zhang, Ethan R. Elenberg, Kilian Q. Weinberger
Abstract Not all data in a typical training set help with generalization; some samples can be overly ambiguous or outrightly mislabeled. This paper introduces a new method to identify such samples and mitigate their impact when training neural networks. At the heart of our algorithm is the Area Under the Margin (AUM) statistic, which exploits differences in the training dynamics of clean and mislabeled samples. A simple procedure - adding an extra class populated with purposefully mislabeled indicator samples - learns a threshold that isolates mislabeled data based on this metric. This approach consistently improves upon prior work on synthetic and real-world datasets. On the WebVision50 classification task our method removes 17% of training data, yielding a 2.6% (absolute) improvement in test error. On CIFAR100 removing 13% of the data leads to a 1.2% drop in error.
Tasks
Published 2020-01-28
URL https://arxiv.org/abs/2001.10528v2
PDF https://arxiv.org/pdf/2001.10528v2.pdf
PWC https://paperswithcode.com/paper/identifying-mislabeled-data-using-the-area
Repo https://github.com/Manuscrit/Area-Under-the-Margin-Ranking
Framework pytorch
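
To make the Area Under the Margin idea concrete, the sketch below assumes the margin of a sample at one epoch is its assigned-class logit minus the largest other logit, and that AUM is the average of this margin over training epochs. The abstract does not give the formula, so treat this definition, the function names, and the logit values as illustrative assumptions.

```python
def margin(logits, assigned_label):
    """Assigned-class logit minus the largest logit among the other classes."""
    others = [z for k, z in enumerate(logits) if k != assigned_label]
    return logits[assigned_label] - max(others)


def area_under_margin(logits_per_epoch, assigned_label):
    """Average the margin of one sample over all recorded epochs."""
    margins = [margin(z, assigned_label) for z in logits_per_epoch]
    return sum(margins) / len(margins)


# Hypothetical logits for two samples over three epochs, both labeled class 0.
clean = [[2.0, 0.5, 0.1], [3.0, 0.4, 0.2], [4.0, 0.3, 0.1]]
noisy = [[0.5, 2.0, 0.1], [0.8, 2.5, 0.2], [1.5, 2.0, 0.1]]
aum_clean = area_under_margin(clean, assigned_label=0)
aum_noisy = area_under_margin(noisy, assigned_label=0)
```

In this toy data the sample whose logits consistently favor a different class than its assigned label ends up with a negative AUM, which is the kind of signal a threshold could then separate.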

Object Instance Mining for Weakly Supervised Object Detection

Title Object Instance Mining for Weakly Supervised Object Detection
Authors Chenhao Lin, Siwen Wang, Dongqi Xu, Yu Lu, Wayne Zhang
Abstract Weakly supervised object detection (WSOD) using only image-level annotations has attracted growing attention over the past few years. Existing approaches using multiple instance learning easily fall into local optima, because such a mechanism tends to learn from the most discriminative object in an image for each category. Therefore, these methods suffer from missing object instances which degrade the performance of WSOD. To address this problem, this paper introduces an end-to-end object instance mining (OIM) framework for weakly supervised object detection. OIM attempts to detect all possible object instances existing in each image by introducing information propagation on the spatial and appearance graphs, without any additional annotations. During the iterative learning process, the less discriminative object instances from the same class can be gradually detected and utilized for training. In addition, we design an object instance reweighted loss to learn a larger portion of each object instance to further improve the performance. The experimental results on two publicly available databases, VOC 2007 and 2012, demonstrate the efficacy of the proposed approach.
Tasks Multiple Instance Learning, Object Detection, Weakly Supervised Object Detection
Published 2020-02-04
URL https://arxiv.org/abs/2002.01087v1
PDF https://arxiv.org/pdf/2002.01087v1.pdf
PWC https://paperswithcode.com/paper/object-instance-mining-for-weakly-supervised
Repo https://github.com/bigvideoresearch/OIM
Framework none

Nonparametric Structure Regularization Machine for 2D Hand Pose Estimation

Title Nonparametric Structure Regularization Machine for 2D Hand Pose Estimation
Authors Yifei Chen, Haoyu Ma, Deying Kong, Xiangyi Yan, Jianbao Wu, Wei Fan, Xiaohui Xie
Abstract Hand pose estimation is more challenging than body pose estimation due to severe articulation, self-occlusion and high dexterity of the hand. Current approaches often rely on a popular body pose algorithm, such as the Convolutional Pose Machine (CPM), to learn 2D keypoint features. These algorithms cannot adequately address the unique challenges of hand pose estimation, because they are trained solely based on keypoint positions without seeking to explicitly model structural relationship between them. We propose a novel Nonparametric Structure Regularization Machine (NSRM) for 2D hand pose estimation, adopting a cascade multi-task architecture to learn hand structure and keypoint representations jointly. The structure learning is guided by synthetic hand mask representations, which are directly computed from keypoint positions, and is further strengthened by a novel probabilistic representation of hand limbs and an anatomically inspired composition strategy of mask synthesis. We conduct extensive studies on two public datasets - OneHand 10k and CMU Panoptic Hand. Experimental results demonstrate that explicitly enforcing structure learning consistently improves pose estimation accuracy of CPM baseline models, by 1.17% on the first dataset and 4.01% on the second one. The implementation and experiment code is freely available online. Our proposal of incorporating structural learning to hand pose estimation requires no additional training information, and can be a generic add-on module to other pose estimation models.
Tasks Hand Pose Estimation, Pose Estimation
Published 2020-01-24
URL https://arxiv.org/abs/2001.08869v1
PDF https://arxiv.org/pdf/2001.08869v1.pdf
PWC https://paperswithcode.com/paper/nonparametric-structure-regularization
Repo https://github.com/HowieMa/NSRMhand
Framework pytorch

On Contrastive Learning for Likelihood-free Inference

Title On Contrastive Learning for Likelihood-free Inference
Authors Conor Durkan, Iain Murray, George Papamakarios
Abstract Likelihood-free methods perform parameter inference in stochastic simulator models where evaluating the likelihood is intractable but sampling synthetic data is possible. One class of methods for this likelihood-free problem uses a classifier to distinguish between pairs of parameter-observation samples generated using the simulator and pairs sampled from some reference distribution, which implicitly learns a density ratio proportional to the likelihood. Another popular class of methods fits a conditional distribution to the parameter posterior directly, and a particular recent variant allows for the use of flexible neural density estimators for this task. In this work, we show that both of these approaches can be unified under a general contrastive learning scheme, and clarify how they should be run and compared.
Tasks
Published 2020-02-10
URL https://arxiv.org/abs/2002.03712v1
PDF https://arxiv.org/pdf/2002.03712v1.pdf
PWC https://paperswithcode.com/paper/on-contrastive-learning-for-likelihood-free
Repo https://github.com/mackelab/nflows
Framework pytorch

Adaptive Covariate Acquisition for Minimizing Total Cost of Classification

Title Adaptive Covariate Acquisition for Minimizing Total Cost of Classification
Authors Daniel Andrade, Yuzuru Okajima
Abstract In some applications, acquiring covariates comes at a cost which is not negligible. For example in the medical domain, in order to classify whether a patient has diabetes or not, measuring glucose tolerance can be expensive. Assuming that the cost of each covariate, and the cost of misclassification can be specified by the user, our goal is to minimize the (expected) total cost of classification, i.e. the cost of misclassification plus the cost of the acquired covariates. We formalize this optimization goal using the (conditional) Bayes risk and describe the optimal solution using a recursive procedure. Since the procedure is computationally infeasible, we consequently introduce two assumptions: (1) the optimal classifier can be represented by a generalized additive model, (2) the optimal sets of covariates are limited to a sequence of sets of increasing size. We show that under these two assumptions, a computationally efficient solution exists. Furthermore, on several medical datasets, we show that the proposed method achieves in most situations the lowest total costs when compared to various previous methods. Finally, we weaken the requirement on the user to specify all misclassification costs by allowing the user to specify the minimally acceptable recall (target recall). Our experiments confirm that the proposed method achieves the target recall while minimizing the false discovery rate and the covariate acquisition costs better than previous methods.
Tasks
Published 2020-02-21
URL https://arxiv.org/abs/2002.09162v1
PDF https://arxiv.org/pdf/2002.09162v1.pdf
PWC https://paperswithcode.com/paper/adaptive-covariate-acquisition-for-minimizing
Repo https://github.com/andrade-stats/AdaCOS_public
Framework none
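
The total cost defined in the abstract above (expected misclassification cost plus the cost of the acquired covariates) can be worked through for a single binary decision. The sketch below is only an arithmetic illustration: the posterior probability, the two misclassification costs, and the covariate costs are made-up numbers, and the helper names are hypothetical.

```python
def expected_misclassification_cost(p_positive, cost_fn, cost_fp):
    """Return (best_label, expected cost) for a binary decision."""
    cost_if_predict_negative = p_positive * cost_fn          # risk of a miss
    cost_if_predict_positive = (1.0 - p_positive) * cost_fp  # risk of a false alarm
    if cost_if_predict_positive <= cost_if_predict_negative:
        return 1, cost_if_predict_positive
    return 0, cost_if_predict_negative


def total_cost(p_positive, cost_fn, cost_fp, acquired_costs):
    """Expected misclassification cost plus the cost of acquired covariates."""
    label, miss = expected_misclassification_cost(p_positive, cost_fn, cost_fp)
    return label, miss + sum(acquired_costs)


# Hypothetical case: P(disease) = 0.3 after paying for two covariates.
label, cost = total_cost(0.3, cost_fn=10.0, cost_fp=1.0,
                         acquired_costs=[0.2, 0.3])
```

Here predicting the positive class costs 0.7 in expectation versus 3.0 for the negative class, so the total is 0.7 + 0.5 = 1.2. Acquiring a further covariate only pays off if it is expected to lower the misclassification term by more than its own price.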

Reinforced Negative Sampling over Knowledge Graph for Recommendation

Title Reinforced Negative Sampling over Knowledge Graph for Recommendation
Authors Xiang Wang, Yaokun Xu, Xiangnan He, Yixin Cao, Meng Wang, Tat-Seng Chua
Abstract Properly handling missing data is a fundamental challenge in recommendation. Most current works perform negative sampling from unobserved data to supply the training of recommender models with negative signals. Nevertheless, existing negative sampling strategies, whether static or adaptive, are insufficient to yield high-quality negative samples, that is, samples both informative to model training and reflective of users' real needs. In this work, we hypothesize that an item knowledge graph (KG), which provides rich relations among items and KG entities, could be useful to infer informative and factual negative samples. Towards this end, we develop a new negative sampling model, Knowledge Graph Policy Network (KGPolicy), which works as a reinforcement learning agent to explore high-quality negatives. Specifically, by conducting our designed exploration operations, it navigates from the target positive interaction, adaptively receives knowledge-aware negative signals, and ultimately yields a potential negative item to train the recommender. We tested on a matrix factorization (MF) model equipped with KGPolicy, and it achieves significant improvements over both state-of-the-art sampling methods like DNS and IRGAN, and KG-enhanced recommender models like KGAT. Further analyses from different angles provide insights into knowledge-aware sampling. We release the code and datasets at https://github.com/xiangwang1223/kgpolicy.
Tasks
Published 2020-03-12
URL https://arxiv.org/abs/2003.05753v1
PDF https://arxiv.org/pdf/2003.05753v1.pdf
PWC https://paperswithcode.com/paper/reinforced-negative-sampling-over-knowledge
Repo https://github.com/xiangwang1223/kgpolicy
Framework pytorch

Grassmannian Optimization for Online Tensor Completion and Tracking in the t-SVD Algebra

Title Grassmannian Optimization for Online Tensor Completion and Tracking in the t-SVD Algebra
Authors Kyle Gilman, Laura Balzano
Abstract We propose a new streaming algorithm, called TOUCAN, for the tensor completion problem of imputing missing entries of a low-tubal-rank tensor using the recently proposed tensor-tensor product (t-product) and tensor singular value decomposition (t-SVD) algebraic framework. We also demonstrate TOUCAN’s ability to track changing free submodules from highly incomplete streaming 2-D data. TOUCAN uses principles from incremental gradient descent on the Grassmann manifold of subspaces to solve the tensor completion problem with linear complexity and constant memory in the number of time samples. We compare our results to state-of-the-art tensor completion algorithms in real applications to recover temporal chemo-sensing data and MRI data under limited sampling.
Tasks
Published 2020-01-30
URL https://arxiv.org/abs/2001.11419v1
PDF https://arxiv.org/pdf/2001.11419v1.pdf
PWC https://paperswithcode.com/paper/grassmannian-optimization-for-online-tensor
Repo https://github.com/kgilman/TOUCAN
Framework none

Triple Wins: Boosting Accuracy, Robustness and Efficiency Together by Enabling Input-Adaptive Inference

Title Triple Wins: Boosting Accuracy, Robustness and Efficiency Together by Enabling Input-Adaptive Inference
Authors Ting-Kuei Hu, Tianlong Chen, Haotao Wang, Zhangyang Wang
Abstract Deep networks were recently suggested to face a trade-off between accuracy (on clean natural images) and robustness (on adversarially perturbed images) (Tsipras et al., 2019). Such a dilemma is shown to be rooted in the inherently higher sample complexity (Schmidt et al., 2018) and/or model capacity (Nakkiran, 2019), for learning a high-accuracy and robust classifier. In view of that, given a classification task, growing the model capacity appears to help draw a win-win between accuracy and robustness, yet at the expense of model size and latency, therefore posing challenges for resource-constrained applications. Is it possible to co-design model accuracy, robustness and efficiency to achieve their triple wins? This paper studies multi-exit networks associated with input-adaptive efficient inference, showing their strong promise in achieving a “sweet point” in co-optimizing model accuracy, robustness and efficiency. Our proposed solution, dubbed Robust Dynamic Inference Networks (RDI-Nets), allows for each input (either clean or adversarial) to adaptively choose one of the multiple output layers (early branches or the final one) to output its prediction. That multi-loss adaptivity adds new variations and flexibility to adversarial attacks and defenses, on which we present a systematic investigation. We show experimentally that by equipping existing backbones with such robust adaptive inference, the resulting RDI-Nets can achieve better accuracy and robustness, yet with over 30% computational savings, compared to the defended original models.
Tasks
Published 2020-02-24
URL https://arxiv.org/abs/2002.10025v2
PDF https://arxiv.org/pdf/2002.10025v2.pdf
PWC https://paperswithcode.com/paper/triple-wins-boosting-accuracy-robustness-and-1
Repo https://github.com/TAMU-VITA/triple-wins
Framework pytorch
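
Input-adaptive inference over multiple exits, as described in the abstract above, might be sketched as follows. The abstract does not state the exit rule, so this example assumes a simple one: take the first exit whose softmax confidence reaches a threshold, otherwise fall back to the final exit. The threshold of 0.9, the function names, and the logits are all hypothetical.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_exit(exit_logits, threshold=0.9):
    """Return (exit_index, predicted_class) for the first confident exit.

    Falls back to the final exit when no earlier one is confident.
    """
    for i, logits in enumerate(exit_logits):
        probs = softmax(logits)
        confidence = max(probs)
        if confidence >= threshold or i == len(exit_logits) - 1:
            return i, probs.index(confidence)


# Hypothetical logits from a two-exit network for an easy and a hard input.
easy = [[6.0, 0.0, 0.0], [8.0, 0.0, 0.0]]
hard = [[1.0, 0.9, 0.8], [0.2, 3.0, 0.1]]
easy_exit = adaptive_exit(easy)
hard_exit = adaptive_exit(hard)
```

The easy input leaves at the first branch and skips the remaining layers, which is where the computational savings come from; the hard input runs through to the final output layer.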