Paper Group AWR 34
Fast is better than free: Revisiting adversarial training. Blurry Video Frame Interpolation. CAE-LO: LiDAR Odometry Leveraging Fully Unsupervised Convolutional Auto-Encoder for Interest Point Detection and Feature Description. Boosting Adversarial Training with Hypersphere Embedding. Collaborative Motion Prediction via Neural Motion Message Passing …
Fast is better than free: Revisiting adversarial training
Title | Fast is better than free: Revisiting adversarial training |
Authors | Eric Wong, Leslie Rice, J. Zico Kolter |
Abstract | Adversarial training, a method for learning robust deep networks, is typically assumed to be more expensive than traditional training due to the necessity of constructing adversarial examples via a first-order method like projected gradient descent (PGD). In this paper, we make the surprising discovery that it is possible to train empirically robust models using a much weaker and cheaper adversary, an approach that was previously believed to be ineffective, rendering the method no more costly than standard training in practice. Specifically, we show that adversarial training with the fast gradient sign method (FGSM), when combined with random initialization, is as effective as PGD-based training but has significantly lower cost. Furthermore, we show that FGSM adversarial training can be further accelerated by using standard techniques for efficient training of deep networks, allowing us to learn a robust CIFAR10 classifier with 45% robust accuracy to PGD attacks with $\epsilon=8/255$ in 6 minutes, and a robust ImageNet classifier with 43% robust accuracy at $\epsilon=2/255$ in 12 hours, in comparison to past work based on “free” adversarial training which took 10 and 50 hours to reach the same respective thresholds. Finally, we identify a failure mode referred to as “catastrophic overfitting” which may have caused previous attempts to use FGSM adversarial training to fail. All code for reproducing the experiments in this paper as well as pretrained model weights are at https://github.com/locuslab/fast_adversarial. |
Tasks | |
Published | 2020-01-12 |
URL | https://arxiv.org/abs/2001.03994v1 |
PDF | https://arxiv.org/pdf/2001.03994v1.pdf |
PWC | https://paperswithcode.com/paper/fast-is-better-than-free-revisiting-1 |
Repo | https://github.com/locuslab/fast_adversarial |
Framework | pytorch |
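The recipe in the abstract above is compact enough to sketch. Below is a minimal PyTorch version of one FGSM-with-random-start training step: initialize the perturbation uniformly inside the $\epsilon$-ball, take a single signed-gradient step, project back, and train on the perturbed batch. Step sizes are illustrative; the linked repo has the authors' exact settings.

```python
import torch
import torch.nn.functional as F

def fgsm_rs_step(model, optimizer, x, y, eps=8/255, alpha=10/255):
    # Random start: uniform noise inside the L-inf ball of radius eps.
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    loss = F.cross_entropy(model(x + delta), y)
    loss.backward()
    # Single FGSM step, projected back onto the eps-ball and valid pixel range.
    delta = (delta + alpha * delta.grad.sign()).clamp(-eps, eps).detach()
    delta = (x + delta).clamp(0, 1) - x
    # Train on the perturbed batch.
    optimizer.zero_grad()
    adv_loss = F.cross_entropy(model(x + delta), y)
    adv_loss.backward()
    optimizer.step()
    return adv_loss.item()
```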
Blurry Video Frame Interpolation
Title | Blurry Video Frame Interpolation |
Authors | Wang Shen, Wenbo Bao, Guangtao Zhai, Li Chen, Xiongkuo Min, Zhiyong Gao |
Abstract | Existing works reduce motion blur and up-convert frame rate in two separate ways: frame deblurring and frame interpolation. However, few studies have approached the joint video enhancement problem, namely synthesizing high-frame-rate clear results from low-frame-rate blurry inputs. In this paper, we propose a blurry video frame interpolation method to reduce motion blur and up-convert frame rate simultaneously. Specifically, we develop a pyramid module to cyclically synthesize clear intermediate frames. The pyramid module features an adjustable spatial receptive field and temporal scope, thus contributing to controllable computational complexity and restoration ability. In addition, we propose an inter-pyramid recurrent module to connect sequential models and exploit the temporal relationship. The pyramid module integrates a recurrent module and can thus iteratively synthesize temporally smooth results without significantly increasing the model size. Extensive experimental results demonstrate that our method performs favorably against state-of-the-art methods. |
Tasks | Deblurring, Video Frame Interpolation |
Published | 2020-02-27 |
URL | https://arxiv.org/abs/2002.12259v1 |
PDF | https://arxiv.org/pdf/2002.12259v1.pdf |
PWC | https://paperswithcode.com/paper/blurry-video-frame-interpolation |
Repo | https://github.com/laomao0/BIN |
Framework | pytorch |
CAE-LO: LiDAR Odometry Leveraging Fully Unsupervised Convolutional Auto-Encoder for Interest Point Detection and Feature Description
Title | CAE-LO: LiDAR Odometry Leveraging Fully Unsupervised Convolutional Auto-Encoder for Interest Point Detection and Feature Description |
Authors | Deyu Yin, Qian Zhang, Jingbin Liu, Xinlian Liang, Yunsheng Wang, Jyri Maanpää, Hao Ma, Juha Hyyppä, Ruizhi Chen |
Abstract | As an important technology in 3D mapping, autonomous driving, and robot navigation, LiDAR odometry is still a challenging task. Appropriate data structures and unsupervised deep learning are the keys to achieving an easily adjustable, high-performance LiDAR odometry solution. Utilizing a compact 2D structured spherical ring projection model and a voxel model that preserves the original shape of the input data, we propose a fully unsupervised Convolutional Auto-Encoder based LiDAR Odometry (CAE-LO) that detects interest points from spherical ring data using a 2D CAE and extracts features from a multi-resolution voxel model using a 3D CAE. We make several key contributions: 1) experiments on the KITTI dataset show that our interest points capture more local details, improving the matching success rate in unstructured scenarios, and that our features outperform the state of the art by more than 50% in matching inlier ratio; 2) we also propose a keyframe selection method based on matching-pair transfer, an odometry refinement method for keyframes based on extended interest points from spherical rings, and a backward pose update method. The odometry refinement experiments verify the feasibility and effectiveness of the proposed ideas. |
Tasks | Autonomous Driving, Interest Point Detection, Robot Navigation |
Published | 2020-01-06 |
URL | https://arxiv.org/abs/2001.01354v2 |
PDF | https://arxiv.org/pdf/2001.01354v2.pdf |
PWC | https://paperswithcode.com/paper/cae-lo-lidar-odometry-leveraging-fully |
Repo | https://github.com/SRainGit/CAE-LO |
Framework | none |
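The spherical ring projection the abstract builds on is essentially a LiDAR range image. A hedged sketch of that projection follows; the resolution and field-of-view constants are illustrative, not the paper's settings.

```python
import numpy as np

def spherical_projection(points, h=64, w=1024, fov_up=3.0, fov_down=-25.0):
    """points: (N, 3) array of x, y, z coordinates (extra columns ignored)."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points[:, :3], axis=1)
    yaw = np.arctan2(y, x)                        # azimuth in [-pi, pi]
    pitch = np.arcsin(z / np.maximum(r, 1e-8))    # elevation
    fov_up_r, fov_down_r = np.radians(fov_up), np.radians(fov_down)
    # Bin each point into a 2D image indexed by azimuth (u) and elevation (v).
    u = ((1.0 - (yaw + np.pi) / (2 * np.pi)) * w).astype(int) % w
    v = ((fov_up_r - pitch) / (fov_up_r - fov_down_r) * h).clip(0, h - 1).astype(int)
    image = np.zeros((h, w), dtype=np.float32)
    image[v, u] = r                               # keep the last range per cell
    return image
```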
Boosting Adversarial Training with Hypersphere Embedding
Title | Boosting Adversarial Training with Hypersphere Embedding |
Authors | Tianyu Pang, Xiao Yang, Yinpeng Dong, Kun Xu, Hang Su, Jun Zhu |
Abstract | Adversarial training (AT) is one of the most effective defenses for improving the adversarial robustness of deep learning models. To promote the reliability of adversarially trained models, we propose to boost AT by incorporating hypersphere embedding (HE), which regularizes the adversarial features onto compact hypersphere manifolds. We formally demonstrate that AT and HE are well coupled, with HE tuning up the learning dynamics of AT in several respects. We comprehensively validate the effectiveness and universality of HE by embedding it into the popular AT frameworks PGD-AT, ALP, and TRADES, as well as the FreeAT and FastAT strategies. In experiments, we evaluate our methods on the CIFAR-10 and ImageNet datasets, and verify that integrating HE consistently enhances the performance of models trained by each AT framework with little extra computation. |
Tasks | |
Published | 2020-02-20 |
URL | https://arxiv.org/abs/2002.08619v1 |
PDF | https://arxiv.org/pdf/2002.08619v1.pdf |
PWC | https://paperswithcode.com/paper/boosting-adversarial-training-with |
Repo | https://github.com/ShawnXYang/AT_HE |
Framework | pytorch |
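Hypersphere embedding as described above regularizes features onto the unit sphere; a common instantiation normalizes both features and class weights so logits become scaled cosine similarities, optionally with an additive margin on the true class. A hedged PyTorch sketch under that assumption — `s` and `m` are illustrative, and the repo has the authors' exact formulation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HypersphereHead(nn.Module):
    def __init__(self, feat_dim, num_classes, s=15.0, m=0.2):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.s, self.m = s, m

    def forward(self, features, labels=None):
        # Cosine logits: features and weights both live on the unit hypersphere.
        cos = F.normalize(features) @ F.normalize(self.weight).t()
        if labels is not None:
            # Additive margin on the true class, applied at training time.
            onehot = F.one_hot(labels, cos.size(1)).float()
            cos = cos - self.m * onehot
        return self.s * cos
```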
Collaborative Motion Prediction via Neural Motion Message Passing
Title | Collaborative Motion Prediction via Neural Motion Message Passing |
Authors | Yue Hu, Siheng Chen, Ya Zhang, Xiao Gu |
Abstract | Motion prediction is essential and challenging for autonomous vehicles and social robots. One challenge of motion prediction is to model the interaction among traffic actors, which can cooperate with each other to avoid collisions or form groups. To address this challenge, we propose neural motion message passing (NMMP) to explicitly model the interaction and learn representations for directed interactions between actors. Based on the proposed NMMP, we design motion prediction systems for two settings: the pedestrian setting and the joint pedestrian and vehicle setting. Both systems share a common pattern: we use an individual branch to model the behavior of a single actor and an interactive branch to model the interaction between actors, albeit with different wrappers to handle the varied input formats and characteristics. The experimental results show that both systems outperform the previous state-of-the-art methods on several existing benchmarks. In addition, we provide interpretability for interaction learning. |
Tasks | Autonomous Vehicles, motion prediction |
Published | 2020-03-14 |
URL | https://arxiv.org/abs/2003.06594v1 |
PDF | https://arxiv.org/pdf/2003.06594v1.pdf |
PWC | https://paperswithcode.com/paper/collaborative-motion-prediction-via-neural |
Repo | https://github.com/PhyllisH/NMMP |
Framework | pytorch |
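One round of message passing over a directed interaction graph, in the spirit of the abstract, might look like the sketch below. The module names and sizes are hypothetical; the repo contains the authors' full architecture.

```python
import torch
import torch.nn as nn

class MessagePassingRound(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.edge_fn = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU())
        self.node_fn = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU())

    def forward(self, h, edges):
        """h: (N, dim) actor embeddings; edges: (E, 2) long tensor of (src, dst)."""
        src, dst = edges[:, 0], edges[:, 1]
        # Each directed edge carries a message from sender to receiver.
        msg = self.edge_fn(torch.cat([h[src], h[dst]], dim=-1))
        agg = torch.zeros_like(h).index_add_(0, dst, msg)  # sum incoming messages
        return self.node_fn(torch.cat([h, agg], dim=-1))   # update actor states
```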
Vision Meets Drones: Past, Present and Future
Title | Vision Meets Drones: Past, Present and Future |
Authors | Pengfei Zhu, Longyin Wen, Dawei Du, Xiao Bian, Qinghua Hu, Haibin Ling |
Abstract | Drones, or general UAVs, equipped with cameras have been deployed quickly across a wide range of applications, including agriculture, aerial photography, fast delivery, and surveillance. Consequently, automatic understanding of visual data collected from drones has become highly demanding, bringing computer vision and drones ever closer together. To promote and track the development of object detection and tracking algorithms, we have organized two challenge workshops in conjunction with the European Conference on Computer Vision (ECCV) 2018 and the IEEE International Conference on Computer Vision (ICCV) 2019, attracting more than 100 teams around the world. We provide a large-scale drone-captured dataset, VisDrone, which includes four tracks: (1) image object detection, (2) video object detection, (3) single object tracking, and (4) multi-object tracking. This paper first presents a thorough review of object detection and tracking datasets and benchmarks, and discusses the challenges of collecting large-scale drone-based object detection and tracking datasets with fully manual annotations. After that, we describe our VisDrone dataset, which is captured over various urban/suburban areas of 14 different cities across China from north to south. Being the largest such dataset ever published, VisDrone enables extensive evaluation and investigation of visual analysis algorithms on the drone platform. We provide a detailed analysis of the current state of the field of large-scale object detection and tracking on drones, conclude the challenge, and propose future directions and improvements. We expect the benchmark to largely boost research and development in video analysis on drone platforms. All the datasets and experimental results can be downloaded from the website: https://github.com/VisDrone/VisDrone-Dataset. |
Tasks | Multi-Object Tracking, Object Detection, Object Tracking, Video Object Detection |
Published | 2020-01-16 |
URL | https://arxiv.org/abs/2001.06303v1 |
PDF | https://arxiv.org/pdf/2001.06303v1.pdf |
PWC | https://paperswithcode.com/paper/vision-meets-drones-past-present-and-future |
Repo | https://github.com/VisDrone/VisDrone-Dataset |
Framework | none |
Variation across Scales: Measurement Fidelity under Twitter Data Sampling
Title | Variation across Scales: Measurement Fidelity under Twitter Data Sampling |
Authors | Siqi Wu, Marian-Andrei Rizoiu, Lexing Xie |
Abstract | A comprehensive understanding of data quality is the cornerstone of measurement studies in social media research. This paper presents in-depth measurements of the effects of Twitter data sampling across different timescales and different subjects (entities, networks, and cascades). By constructing complete tweet streams, we show that the Twitter rate limit message is an accurate indicator of the volume of missing tweets. Sampling also differs significantly across timescales. While the hourly sampling rate is influenced by the diurnal rhythm in different time zones, millisecond-level sampling is heavily affected by implementation choices. For Twitter entities such as users, we find that a Bernoulli process with a uniform rate approximates the empirical distributions well. It also allows us to estimate the true ranking from the observed sample data. For networks on Twitter, their structures are altered significantly, and some components are more likely to be preserved than others. For retweet cascades, we observe changes in the distributions of tweet inter-arrival time and user influence, which will affect models that rely on these features. This work calls attention to noise and potential biases in social data, and provides a few tools to measure Twitter sampling effects. |
Tasks | |
Published | 2020-03-21 |
URL | https://arxiv.org/abs/2003.09557v2 |
PDF | https://arxiv.org/pdf/2003.09557v2.pdf |
PWC | https://paperswithcode.com/paper/variation-across-scales-measurement-fidelity |
Repo | https://github.com/avalanchesiqi/twitter-sampling |
Framework | none |
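The Bernoulli model mentioned in the abstract gives a simple estimator: if each tweet survives sampling independently with rate p, an observed count k estimates a true count of k/p, with a binomial confidence interval. A small sketch with made-up numbers:

```python
import math

def estimate_true_count(k_observed, p, z=1.96):
    """Estimate the true count from an observed count under Bernoulli(p) sampling."""
    n_hat = k_observed / p
    # Std. error of k under Binomial(n_hat, p), scaled back through the 1/p factor.
    se = math.sqrt(n_hat * p * (1 - p)) / p
    return n_hat, (n_hat - z * se, n_hat + z * se)

print(estimate_true_count(4200, 0.42))  # roughly 10000 true tweets, with a CI
```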
Identifying Mislabeled Data using the Area Under the Margin Ranking
Title | Identifying Mislabeled Data using the Area Under the Margin Ranking |
Authors | Geoff Pleiss, Tianyi Zhang, Ethan R. Elenberg, Kilian Q. Weinberger |
Abstract | Not all data in a typical training set help with generalization; some samples can be overly ambiguous or outright mislabeled. This paper introduces a new method to identify such samples and mitigate their impact when training neural networks. At the heart of our algorithm is the Area Under the Margin (AUM) statistic, which exploits differences in the training dynamics of clean and mislabeled samples. A simple procedure - adding an extra class populated with purposefully mislabeled indicator samples - learns a threshold that isolates mislabeled data based on this metric. This approach consistently improves upon prior work on synthetic and real-world datasets. On the WebVision50 classification task our method removes 17% of training data, yielding a 2.6% (absolute) improvement in test error. On CIFAR100 removing 13% of the data leads to a 1.2% drop in error. |
Tasks | |
Published | 2020-01-28 |
URL | https://arxiv.org/abs/2001.10528v2 |
PDF | https://arxiv.org/pdf/2001.10528v2.pdf |
PWC | https://paperswithcode.com/paper/identifying-mislabeled-data-using-the-area |
Repo | https://github.com/Manuscrit/Area-Under-the-Margin-Ranking |
Framework | pytorch |
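The AUM statistic itself is simple to compute: at each epoch, a sample's margin is its assigned-label logit minus the largest other logit, and AUM averages these margins over training. A minimal PyTorch sketch, with illustrative variable names:

```python
import torch

def batch_margins(logits, labels):
    """logits: (B, C); labels: (B,). Returns per-sample margins."""
    assigned = logits.gather(1, labels.unsqueeze(1)).squeeze(1)
    # Mask out the assigned class before taking the max over the rest.
    masked = logits.scatter(1, labels.unsqueeze(1), float('-inf'))
    largest_other = masked.max(dim=1).values
    return assigned - largest_other

# Accumulate margins per sample across epochs; AUM = margin_sum[i] / num_epochs.
# Samples with low (very negative) AUM are flagged as likely mislabeled.
```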
Object Instance Mining for Weakly Supervised Object Detection
Title | Object Instance Mining for Weakly Supervised Object Detection |
Authors | Chenhao Lin, Siwen Wang, Dongqi Xu, Yu Lu, Wayne Zhang |
Abstract | Weakly supervised object detection (WSOD) using only image-level annotations has attracted growing attention over the past few years. Existing approaches using multiple instance learning easily fall into local optima, because such a mechanism tends to learn from the most discriminative object in an image for each category. These methods therefore suffer from missed object instances, which degrades the performance of WSOD. To address this problem, this paper introduces an end-to-end object instance mining (OIM) framework for weakly supervised object detection. OIM attempts to detect all possible object instances in each image by introducing information propagation on spatial and appearance graphs, without any additional annotations. During the iterative learning process, the less discriminative object instances from the same class can be gradually detected and utilized for training. In addition, we design an object instance reweighted loss that learns a larger portion of each object instance to further improve the performance. The experimental results on two publicly available databases, VOC 2007 and 2012, demonstrate the efficacy of the proposed approach. |
Tasks | Multiple Instance Learning, Object Detection, Weakly Supervised Object Detection |
Published | 2020-02-04 |
URL | https://arxiv.org/abs/2002.01087v1 |
PDF | https://arxiv.org/pdf/2002.01087v1.pdf |
PWC | https://paperswithcode.com/paper/object-instance-mining-for-weakly-supervised |
Repo | https://github.com/bigvideoresearch/OIM |
Framework | none |
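A rough sketch of spatial-graph propagation in the spirit of the abstract: proposals that overlap the top-scoring detection for a class are treated as additional positive instances. The IoU threshold is an assumption, and this omits the appearance graph and the reweighted loss entirely.

```python
import numpy as np

def iou(a, b):
    """a: (4,) box; b: (N, 4) boxes as x1, y1, x2, y2."""
    x1 = np.maximum(a[0], b[:, 0]); y1 = np.maximum(a[1], b[:, 1])
    x2 = np.minimum(a[2], b[:, 2]); y2 = np.minimum(a[3], b[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a + area_b - inter)

def propagate_positives(boxes, scores, iou_thresh=0.5):
    """Mark proposals overlapping the top-scoring one as extra positives."""
    top = scores.argmax()
    return np.where(iou(boxes[top], boxes) > iou_thresh)[0]
```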
Nonparametric Structure Regularization Machine for 2D Hand Pose Estimation
Title | Nonparametric Structure Regularization Machine for 2D Hand Pose Estimation |
Authors | Yifei Chen, Haoyu Ma, Deying Kong, Xiangyi Yan, Jianbao Wu, Wei Fan, Xiaohui Xie |
Abstract | Hand pose estimation is more challenging than body pose estimation due to severe articulation, self-occlusion and the high dexterity of the hand. Current approaches often rely on a popular body pose algorithm, such as the Convolutional Pose Machine (CPM), to learn 2D keypoint features. These algorithms cannot adequately address the unique challenges of hand pose estimation, because they are trained solely on keypoint positions without seeking to explicitly model the structural relationships between them. We propose a novel Nonparametric Structure Regularization Machine (NSRM) for 2D hand pose estimation, adopting a cascade multi-task architecture to learn hand structure and keypoint representations jointly. The structure learning is guided by synthetic hand mask representations, which are directly computed from keypoint positions, and is further strengthened by a novel probabilistic representation of hand limbs and an anatomically inspired composition strategy for mask synthesis. We conduct extensive studies on two public datasets - OneHand 10k and CMU Panoptic Hand. Experimental results demonstrate that explicitly enforcing structure learning consistently improves pose estimation accuracy over CPM baseline models, by 1.17% on the first dataset and 4.01% on the second one. The implementation and experiment code is freely available online. Our proposal of incorporating structural learning into hand pose estimation requires no additional training information, and can be a generic add-on module to other pose estimation models. |
Tasks | Hand Pose Estimation, Pose Estimation |
Published | 2020-01-24 |
URL | https://arxiv.org/abs/2001.08869v1 |
PDF | https://arxiv.org/pdf/2001.08869v1.pdf |
PWC | https://paperswithcode.com/paper/nonparametric-structure-regularization |
Repo | https://github.com/HowieMa/NSRMhand |
Framework | pytorch |
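A hedged illustration of a synthetic limb mask computed directly from keypoint positions, as the abstract describes: each pixel's value decays with its distance to the segment joining two keypoints. The Gaussian falloff and sigma are assumptions, not necessarily the paper's exact parameterization.

```python
import numpy as np

def limb_mask(p1, p2, h, w, sigma=2.0):
    """p1, p2: (x, y) keypoints; returns an (h, w) soft mask for the limb."""
    p1, p2 = np.asarray(p1, float), np.asarray(p2, float)
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.stack([xs, ys], axis=-1).astype(float)
    v = p2 - p1
    # Project each pixel onto the segment and clamp to its endpoints.
    t = np.clip(((pts - p1) @ v) / max(v @ v, 1e-8), 0.0, 1.0)
    closest = p1 + t[..., None] * v
    d2 = np.sum((pts - closest) ** 2, axis=-1)
    return np.exp(-d2 / (2 * sigma ** 2))  # Gaussian falloff from the limb
```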
On Contrastive Learning for Likelihood-free Inference
Title | On Contrastive Learning for Likelihood-free Inference |
Authors | Conor Durkan, Iain Murray, George Papamakarios |
Abstract | Likelihood-free methods perform parameter inference in stochastic simulator models where evaluating the likelihood is intractable but sampling synthetic data is possible. One class of methods for this likelihood-free problem uses a classifier to distinguish between pairs of parameter-observation samples generated using the simulator and pairs sampled from some reference distribution, which implicitly learns a density ratio proportional to the likelihood. Another popular class of methods fits a conditional distribution to the parameter posterior directly, and a particular recent variant allows for the use of flexible neural density estimators for this task. In this work, we show that both of these approaches can be unified under a general contrastive learning scheme, and clarify how they should be run and compared. |
Tasks | |
Published | 2020-02-10 |
URL | https://arxiv.org/abs/2002.03712v1 |
PDF | https://arxiv.org/pdf/2002.03712v1.pdf |
PWC | https://paperswithcode.com/paper/on-contrastive-learning-for-likelihood-free |
Repo | https://github.com/mackelab/nflows |
Framework | pytorch |
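The classifier-based density-ratio idea the abstract unifies can be sketched in a few lines: train a classifier to distinguish dependent (theta, x) pairs from independent ones, so its logit approximates the log density ratio. Network shapes and dimensions below are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

theta_dim, x_dim = 3, 10  # illustrative dimensions
classifier = nn.Sequential(nn.Linear(theta_dim + x_dim, 64), nn.ReLU(), nn.Linear(64, 1))

def contrastive_loss(theta, x):
    """theta: (B, theta_dim) parameters with matching simulator outputs x: (B, x_dim)."""
    joint = torch.cat([theta, x], dim=-1)                     # dependent pairs
    shuffled = theta[torch.randperm(len(theta))]
    marginal = torch.cat([shuffled, x], dim=-1)               # independent pairs
    logits = classifier(torch.cat([joint, marginal])).squeeze(-1)
    labels = torch.cat([torch.ones(len(theta)), torch.zeros(len(theta))])
    # At the optimum, the classifier logit recovers the log density ratio.
    return F.binary_cross_entropy_with_logits(logits, labels)
```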
Adaptive Covariate Acquisition for Minimizing Total Cost of Classification
Title | Adaptive Covariate Acquisition for Minimizing Total Cost of Classification |
Authors | Daniel Andrade, Yuzuru Okajima |
Abstract | In some applications, acquiring covariates comes at a cost which is not negligible. For example in the medical domain, in order to classify whether a patient has diabetes or not, measuring glucose tolerance can be expensive. Assuming that the cost of each covariate, and the cost of misclassification can be specified by the user, our goal is to minimize the (expected) total cost of classification, i.e. the cost of misclassification plus the cost of the acquired covariates. We formalize this optimization goal using the (conditional) Bayes risk and describe the optimal solution using a recursive procedure. Since the procedure is computationally infeasible, we consequently introduce two assumptions: (1) the optimal classifier can be represented by a generalized additive model, (2) the optimal sets of covariates are limited to a sequence of sets of increasing size. We show that under these two assumptions, a computationally efficient solution exists. Furthermore, on several medical datasets, we show that the proposed method achieves in most situations the lowest total costs when compared to various previous methods. Finally, we weaken the requirement on the user to specify all misclassification costs by allowing the user to specify the minimally acceptable recall (target recall). Our experiments confirm that the proposed method achieves the target recall while minimizing the false discovery rate and the covariate acquisition costs better than previous methods. |
Tasks | |
Published | 2020-02-21 |
URL | https://arxiv.org/abs/2002.09162v1 |
PDF | https://arxiv.org/pdf/2002.09162v1.pdf |
PWC | https://paperswithcode.com/paper/adaptive-covariate-acquisition-for-minimizing |
Repo | https://github.com/andrade-stats/AdaCOS_public |
Framework | none |
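The decision rule implied by the abstract can be illustrated with a toy example: acquire the next covariate only if its cost is outweighed by the expected drop in Bayes misclassification risk. All probabilities and costs below are hypothetical.

```python
def should_acquire(p_pos_now, p_pos_if_acquired, cost_fp, cost_fn, covariate_cost):
    """p_pos_if_acquired: list of (probability, resulting posterior) outcomes."""
    def bayes_risk(p):
        # Risk of the cost-optimal decision at posterior p = P(y=1 | observed).
        return min(p * cost_fn, (1 - p) * cost_fp)
    expected_risk_after = sum(w * bayes_risk(p) for w, p in p_pos_if_acquired)
    return bayes_risk(p_pos_now) - expected_risk_after > covariate_cost

# Example: posterior 0.3 now; a test resolves it to 0.05 or 0.8 with equal
# probability, at a cost of 1 unit. Expected risk drops from 3.5 to 1.0, so
# the 2.5-unit gain exceeds the cost and the covariate is acquired.
print(should_acquire(0.3, [(0.5, 0.05), (0.5, 0.8)],
                     cost_fp=5, cost_fn=20, covariate_cost=1.0))  # True
```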
Reinforced Negative Sampling over Knowledge Graph for Recommendation
Title | Reinforced Negative Sampling over Knowledge Graph for Recommendation |
Authors | Xiang Wang, Yaokun Xu, Xiangnan He, Yixin Cao, Meng Wang, Tat-Seng Chua |
Abstract | Properly handling missing data is a fundamental challenge in recommendation. Most existing works perform negative sampling from unobserved data to supply the training of recommender models with negative signals. Nevertheless, existing negative sampling strategies, either static or adaptive, are insufficient to yield high-quality negative samples — both informative to model training and reflective of users' real needs. In this work, we hypothesize that an item knowledge graph (KG), which provides rich relations among items and KG entities, can be used to infer informative and factual negative samples. Towards this end, we develop a new negative sampling model, Knowledge Graph Policy Network (KGPolicy), which works as a reinforcement learning agent to explore high-quality negatives. Specifically, by conducting our designed exploration operations, it navigates from the target positive interaction, adaptively receives knowledge-aware negative signals, and ultimately yields a potential negative item to train the recommender. We test a matrix factorization (MF) model equipped with KGPolicy, and it achieves significant improvements over both state-of-the-art sampling methods such as DNS and IRGAN, and KG-enhanced recommender models such as KGAT. Further analyses from different angles provide insights into knowledge-aware sampling. We release the code and datasets at https://github.com/xiangwang1223/kgpolicy. |
Tasks | |
Published | 2020-03-12 |
URL | https://arxiv.org/abs/2003.05753v1 |
PDF | https://arxiv.org/pdf/2003.05753v1.pdf |
PWC | https://paperswithcode.com/paper/reinforced-negative-sampling-over-knowledge |
Repo | https://github.com/xiangwang1223/kgpolicy |
Framework | pytorch |
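A highly simplified flavour of knowledge-aware negative selection: from a positive item, gather KG-neighbour candidates and keep the one the current recommender scores highest. This omits the learned RL policy entirely — the repo has the full KGPolicy agent — and all names below are hypothetical.

```python
import torch

def pick_negative(user_emb, item_embs, kg_neighbors, pos_item):
    """kg_neighbors: dict mapping an item id to candidate item ids near it in the KG."""
    candidates = torch.tensor(kg_neighbors[pos_item])
    scores = item_embs[candidates] @ user_emb        # recommender scores
    return candidates[scores.argmax()].item()        # hardest knowledge-aware negative
```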
Grassmannian Optimization for Online Tensor Completion and Tracking in the t-SVD Algebra
Title | Grassmannian Optimization for Online Tensor Completion and Tracking in the t-SVD Algebra |
Authors | Kyle Gilman, Laura Balzano |
Abstract | We propose a new streaming algorithm, called TOUCAN, for the tensor completion problem of imputing missing entries of a low-tubal-rank tensor using the recently proposed tensor-tensor product (t-product) and tensor singular value decomposition (t-SVD) algebraic framework. We also demonstrate TOUCAN’s ability to track changing free submodules from highly incomplete streaming 2-D data. TOUCAN uses principles from incremental gradient descent on the Grassmann manifold of subspaces to solve the tensor completion problem with linear complexity and constant memory in the number of time samples. We compare our results to state-of-the-art tensor completion algorithms in real applications to recover temporal chemo-sensing data and MRI data under limited sampling. |
Tasks | |
Published | 2020-01-30 |
URL | https://arxiv.org/abs/2001.11419v1 |
PDF | https://arxiv.org/pdf/2001.11419v1.pdf |
PWC | https://paperswithcode.com/paper/grassmannian-optimization-for-online-tensor |
Repo | https://github.com/kgilman/TOUCAN |
Framework | none |
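The t-product underlying the t-SVD framework in the abstract reduces to FFTs along the tube dimension plus slice-wise matrix products, which is compact enough to sketch:

```python
import numpy as np

def t_product(A, B):
    """t-product of A: (n1, n2, n3) and B: (n2, n4, n3) -> (n1, n4, n3)."""
    A_hat = np.fft.fft(A, axis=2)
    B_hat = np.fft.fft(B, axis=2)
    C_hat = np.einsum('ijk,jlk->ilk', A_hat, B_hat)  # per-frontal-slice matmul
    return np.real(np.fft.ifft(C_hat, axis=2))

A = np.random.randn(4, 3, 5)
B = np.random.randn(3, 2, 5)
print(t_product(A, B).shape)  # (4, 2, 5)
```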
Triple Wins: Boosting Accuracy, Robustness and Efficiency Together by Enabling Input-Adaptive Inference
Title | Triple Wins: Boosting Accuracy, Robustness and Efficiency Together by Enabling Input-Adaptive Inference |
Authors | Ting-Kuei Hu, Tianlong Chen, Haotao Wang, Zhangyang Wang |
Abstract | Deep networks were recently suggested to face the odds between accuracy (on clean natural images) and robustness (on adversarially perturbed images) (Tsipras et al., 2019). Such a dilemma is shown to be rooted in the inherently higher sample complexity (Schmidt et al., 2018) and/or model capacity (Nakkiran, 2019) required for learning a high-accuracy and robust classifier. In view of that, given a classification task, growing the model capacity appears to help draw a win-win between accuracy and robustness, yet at the expense of model size and latency, therefore posing challenges for resource-constrained applications. Is it possible to co-design model accuracy, robustness and efficiency to achieve their triple wins? This paper studies multi-exit networks associated with input-adaptive efficient inference, showing their strong promise in achieving a “sweet point” in co-optimizing model accuracy, robustness and efficiency. Our proposed solution, dubbed Robust Dynamic Inference Networks (RDI-Nets), allows each input (either clean or adversarial) to adaptively choose one of the multiple output layers (early branches or the final one) to output its prediction. That multi-loss adaptivity adds new variations and flexibility to adversarial attacks and defenses, of which we present a systematic investigation. We show experimentally that by equipping existing backbones with such robust adaptive inference, the resulting RDI-Nets can achieve better accuracy and robustness, yet with over 30% computational savings, compared to the defended original models. |
Tasks | |
Published | 2020-02-24 |
URL | https://arxiv.org/abs/2002.10025v2 |
PDF | https://arxiv.org/pdf/2002.10025v2.pdf |
PWC | https://paperswithcode.com/paper/triple-wins-boosting-accuracy-robustness-and-1 |
Repo | https://github.com/TAMU-VITA/triple-wins |
Framework | pytorch |
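The input-adaptive inference pattern RDI-Nets build on can be sketched as confidence-thresholded early exiting. The model interface and threshold below are illustrative, and this omits the adversarial training of the exits described in the abstract.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def adaptive_predict(backbone_blocks, exit_heads, x, threshold=0.9):
    """backbone_blocks and exit_heads: equal-length lists of modules.
    Assumes a batch of one input for the scalar confidence check."""
    h = x
    for block, head in zip(backbone_blocks, exit_heads):
        h = block(h)
        probs = F.softmax(head(h), dim=-1)
        conf, pred = probs.max(dim=-1)
        if conf.item() >= threshold:  # early exit: confident enough
            return pred
    return pred                        # fall through to the final exit
```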