April 2, 2020


Paper Group ANR 368


3D Crowd Counting via Multi-View Fusion with 3D Gaussian Kernels


Title	3D Crowd Counting via Multi-View Fusion with 3D Gaussian Kernels
Authors	Qi Zhang, Antoni B. Chan
Abstract	Crowd counting has been studied for decades and a lot of works have achieved good performance, especially the DNNs-based density map estimation methods. Most existing crowd counting works focus on single-view counting, while few works have studied multi-view counting for large and wide scenes, where multiple cameras are used. Recently, an end-to-end multi-view crowd counting method called multi-view multi-scale (MVMS) has been proposed, which fuses multiple camera views using a CNN to predict a 2D scene-level density map on the ground-plane. Unlike MVMS, we propose to solve the multi-view crowd counting task through 3D feature fusion with 3D scene-level density maps, instead of the 2D ground-plane ones. Compared to 2D fusion, the 3D fusion extracts more information of the people along z-dimension (height), which helps to solve the scale variations across multiple views. The 3D density maps still preserve the 2D density maps property that the sum is the count, while also providing 3D information about the crowd density. We also explore the projection consistency among the 3D prediction and the ground-truth in the 2D views to further enhance the counting performance. The proposed method is tested on 3 multi-view counting datasets and achieves better or comparable counting performance to the state-of-the-art.
Tasks	Crowd Counting
Published	2020-03-18
URL	https://arxiv.org/abs/2003.08162v1
PDF	https://arxiv.org/pdf/2003.08162v1.pdf
PWC	https://paperswithcode.com/paper/3d-crowd-counting-via-multi-view-fusion-with
Repo
Framework
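
The abstract above notes that a 3D density map keeps the property that its sum equals the count. A minimal sketch of that property follows, assuming a small voxel grid and an isotropic Gaussian per person. The function names, the grid shape, and `sigma` are illustrative assumptions and are not taken from the paper.

```python
import math

def gaussian_density_3d(points, shape, sigma=1.0):
    """Place one normalized 3D Gaussian per annotated point on a voxel grid.

    Each kernel is normalized over the grid, so every person contributes
    exactly 1 to the sum of the map (an assumption of this sketch).
    """
    X, Y, Z = shape
    density = [[[0.0] * Z for _ in range(Y)] for _ in range(X)]
    for (px, py, pz) in points:
        kernel = {}
        total = 0.0
        for x in range(X):
            for y in range(Y):
                for z in range(Z):
                    d2 = (x - px) ** 2 + (y - py) ** 2 + (z - pz) ** 2
                    w = math.exp(-d2 / (2.0 * sigma ** 2))
                    kernel[(x, y, z)] = w
                    total += w
        # Normalize so this person's kernel sums to 1 on the grid.
        for (x, y, z), w in kernel.items():
            density[x][y][z] += w / total
    return density

def total_count(density):
    """Sum all voxels; for a normalized map this is the crowd count."""
    return sum(v for plane in density for row in plane for v in row)

# Three hypothetical people at (x, y, height) positions.
points = [(2.0, 3.0, 1.0), (5.0, 5.0, 2.0), (7.0, 1.0, 1.5)]
density = gaussian_density_3d(points, shape=(10, 8, 4), sigma=1.2)
count = total_count(density)
```

Summing `density` recovers the number of annotated points up to floating-point error, while the z axis carries the height information that a 2D ground-plane map would discard.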

Good Subnetworks Provably Exist: Pruning via Greedy Forward Selection


Title	Good Subnetworks Provably Exist: Pruning via Greedy Forward Selection
Authors	Mao Ye, Chengyue Gong, Lizhen Nie, Denny Zhou, Adam Klivans, Qiang Liu
Abstract	Recent empirical works show that large deep neural networks are often highly redundant and one can find much smaller subnetworks without a significant drop of accuracy. However, most existing methods of network pruning are empirical and heuristic, leaving it open whether good subnetworks provably exist, how to find them efficiently, and if network pruning can be provably better than direct training using gradient descent. We answer these problems positively by proposing a simple greedy selection approach for finding good subnetworks, which starts from an empty network and greedily adds important neurons from the large network. This differs from the existing methods based on backward elimination, which remove redundant neurons from the large network. Theoretically, applying our greedy selection strategy on sufficiently large pre-trained networks is guaranteed to find small subnetworks with lower loss than networks directly trained with gradient descent. Practically, we improve on prior network pruning methods for learning compact neural architectures on ImageNet, including ResNet, MobileNetV2/V3, and ProxylessNet. Our theory and empirical results on MobileNet suggest that we should fine-tune the pruned subnetworks to leverage the information from the large model, instead of re-training from new random initialization as suggested in Liu et al. (2018).
Tasks	Network Pruning
Published	2020-03-03
URL	https://arxiv.org/abs/2003.01794v1
PDF	https://arxiv.org/pdf/2003.01794v1.pdf
PWC	https://paperswithcode.com/paper/good-subnetworks-provably-exist-pruning-via
Repo
Framework
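
The greedy forward selection described in the abstract above (start from an empty network, repeatedly add the neuron that helps most) can be sketched on a toy one-dimensional regression problem. This is an illustration under stated assumptions, not the authors' implementation: the subnetwork output is taken to be the average of the selected neurons, a neuron may be selected more than once, and all names and the toy data are hypothetical.

```python
import random

def relu(v):
    return v if v > 0.0 else 0.0

def neuron_output(neuron, x):
    """A neuron is a hypothetical (weight, bias, output_weight) triple."""
    w, b, a = neuron
    return a * relu(w * x + b)

def subnetwork_predict(neurons, selected, x):
    """Average the outputs of the selected neurons (0 for the empty network)."""
    if not selected:
        return 0.0
    return sum(neuron_output(neurons[i], x) for i in selected) / len(selected)

def mse(neurons, selected, data):
    return sum((subnetwork_predict(neurons, selected, x) - y) ** 2
               for x, y in data) / len(data)

def greedy_forward_selection(neurons, data, n_select):
    """Start empty; at each step add the neuron giving the lowest loss."""
    selected, history = [], []
    for _ in range(n_select):
        best_i = min(range(len(neurons)),
                     key=lambda i: mse(neurons, selected + [i], data))
        selected.append(best_i)
        history.append(mse(neurons, selected, data))
    return selected, history

# A randomly initialized "large network" stands in for a pre-trained one.
rng = random.Random(0)
neurons = [(rng.uniform(-1, 1), rng.uniform(-1, 1), rng.uniform(-2, 2))
           for _ in range(50)]
data = [(x / 10.0, abs(x / 10.0)) for x in range(-10, 11)]
selected, history = greedy_forward_selection(neurons, data, n_select=5)
```

Backward elimination would instead start from all 50 neurons and remove them one at a time; the forward variant only ever evaluates small candidate subnetworks.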

Privileged Pooling: Supervised attention-based pooling for compensating dataset bias


Title	Privileged Pooling: Supervised attention-based pooling for compensating dataset bias
Authors	Andres C. Rodriguez, Stefano D’Aronco, Konrad Schindler, Jan Dirk Wegner
Abstract	In this paper we propose a novel supervised image classification method that overcomes dataset bias and scarcity of training data using privileged information in the form of keypoint annotations. Our main motivation is recognition of animal species for ecological applications like biodiversity modelling, which can be challenging because of long-tailed species distributions caused by rare species, and strong dataset biases in repetitive scenes such as in camera traps. To counteract these challenges, we propose a weakly-supervised visual attention mechanism that has access to keypoints highlighting the most important object parts. This privileged information, implemented via a novel privileged pooling operation, is only accessible during training and helps the model to focus on the regions that are most discriminative. We show that the proposed approach uses small training datasets more efficiently, generalizes better and outperforms competing methods in challenging training conditions.
Tasks	Image Classification
Published	2020-03-20
URL	https://arxiv.org/abs/2003.09168v2
PDF	https://arxiv.org/pdf/2003.09168v2.pdf
PWC	https://paperswithcode.com/paper/privileged-pooling-supervised-attention-based
Repo
Framework

Symbiotic Attention with Privileged Information for Egocentric Action Recognition


Title	Symbiotic Attention with Privileged Information for Egocentric Action Recognition
Authors	Xiaohan Wang, Yu Wu, Linchao Zhu, Yi Yang
Abstract	Egocentric video recognition is a natural testbed for diverse interaction reasoning. Due to the large action vocabulary in egocentric video datasets, recent studies usually utilize a two-branch structure for action recognition, i.e., one branch for verb classification and the other branch for noun classification. However, correlation studies between the verb and the noun branches have been largely ignored. Besides, the two branches fail to exploit local features due to the absence of a position-aware attention mechanism. In this paper, we propose a novel Symbiotic Attention framework leveraging Privileged information (SAP) for egocentric video recognition. Finer position-aware object detection features can facilitate the understanding of the actor’s interaction with the object. We introduce these features in action recognition and regard them as privileged information. Our framework enables mutual communication among the verb branch, the noun branch, and the privileged information. This communication process not only injects local details into global features but also exploits implicit guidance about the spatio-temporal position of an on-going action. We introduce novel symbiotic attention (SA) to enable effective communication. It first normalizes the detection guided features on one branch to underline the action-relevant information from the other branch. SA adaptively enhances the interactions among the three sources. To further catalyze this communication, spatial relations are uncovered for the selection of the most action-relevant information. It identifies the most valuable and discriminative feature for classification. We validate the effectiveness of our SAP quantitatively and qualitatively. Notably, it achieves the state-of-the-art on two large-scale egocentric video datasets.
Tasks	Object Detection, Video Recognition
Published	2020-02-08
URL	https://arxiv.org/abs/2002.03137v1
PDF	https://arxiv.org/pdf/2002.03137v1.pdf
PWC	https://paperswithcode.com/paper/symbiotic-attention-with-privileged
Repo
Framework

Audiovisual SlowFast Networks for Video Recognition


Title	Audiovisual SlowFast Networks for Video Recognition
Authors	Fanyi Xiao, Yong Jae Lee, Kristen Grauman, Jitendra Malik, Christoph Feichtenhofer
Abstract	We present Audiovisual SlowFast Networks, an architecture for integrated audiovisual perception. AVSlowFast has Slow and Fast visual pathways that are deeply integrated with a Faster Audio pathway to model vision and sound in a unified representation. We fuse audio and visual features at multiple layers, enabling audio to contribute to the formation of hierarchical audiovisual concepts. To overcome training difficulties that arise from different learning dynamics for audio and visual modalities, we introduce DropPathway, which randomly drops the Audio pathway during training as an effective regularization technique. Inspired by prior studies in neuroscience, we perform hierarchical audiovisual synchronization to learn joint audiovisual features. We report state-of-the-art results on six video action classification and detection datasets, perform detailed ablation studies, and show the generalization of AVSlowFast to learn self-supervised audiovisual features. Code will be made available at: https://github.com/facebookresearch/SlowFast.
Tasks	Action Classification, Video Recognition
Published	2020-01-23
URL	https://arxiv.org/abs/2001.08740v2
PDF	https://arxiv.org/pdf/2001.08740v2.pdf
PWC	https://paperswithcode.com/paper/audiovisual-slowfast-networks-for-video
Repo
Framework
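
The DropPathway idea from the abstract above (randomly dropping the Audio pathway during training) can be illustrated with a toy fusion step. This is a sketch only: fusing by element-wise addition, the function name, and `drop_prob` are assumptions made here for illustration, and the real architecture fuses features at multiple layers.

```python
import random

def fuse_features(visual, audio, drop_prob, training, rng):
    """Add audio features to visual ones, skipping audio at random in training.

    With probability drop_prob (training only) the audio pathway is dropped
    and the visual features pass through unchanged.
    """
    if training and rng.random() < drop_prob:
        return list(visual)
    return [v + a for v, a in zip(visual, audio)]

rng = random.Random(0)
visual = [1.0, 2.0, 3.0]
audio = [0.5, 0.5, 0.5]

always_dropped = fuse_features(visual, audio, drop_prob=1.0, training=True, rng=rng)
never_dropped = fuse_features(visual, audio, drop_prob=0.0, training=True, rng=rng)
at_inference = fuse_features(visual, audio, drop_prob=1.0, training=False, rng=rng)
```

At inference the audio pathway is always used in this sketch, so the drop acts purely as a training-time regularizer.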

Choosing the Sample with Lowest Loss makes SGD Robust


Title	Choosing the Sample with Lowest Loss makes SGD Robust
Authors	Vatsal Shah, Xiaoxia Wu, Sujay Sanghavi
Abstract	The presence of outliers can significantly skew the parameters of machine learning models trained via stochastic gradient descent (SGD). In this paper we propose a simple variant of the SGD method: in each step, first choose a set of k samples, then from these choose the one with the smallest current loss, and do an SGD-like update with this chosen sample. Vanilla SGD corresponds to k = 1, i.e. no choice; k >= 2 represents a new algorithm that is, however, effectively minimizing a non-convex surrogate loss. Our main contribution is a theoretical analysis of the robustness properties of this idea for ML problems which are sums of convex losses; these are backed up with linear regression and small-scale neural network experiments.
Tasks
Published	2020-01-10
URL	https://arxiv.org/abs/2001.03316v1
PDF	https://arxiv.org/pdf/2001.03316v1.pdf
PWC	https://paperswithcode.com/paper/choosing-the-sample-with-lowest-loss-makes
Repo
Framework
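
The update rule in the abstract above (draw k samples, keep the one with the smallest current loss, take an SGD step on it) is short enough to sketch for one-dimensional linear regression without an intercept. The function name, the squared loss, the learning rate, and the toy data are assumptions of this sketch rather than details from the paper.

```python
import random

def min_loss_sgd(data, k, steps, lr, seed=0):
    """Fit y ~ w * x by SGD on the lowest-loss sample among k random draws.

    k = 1 reduces to vanilla SGD with uniform sampling.
    """
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        candidates = [rng.choice(data) for _ in range(k)]
        # Keep the candidate with the smallest current squared loss.
        x, y = min(candidates, key=lambda s: (w * s[0] - s[1]) ** 2)
        grad = 2.0 * (w * x - y) * x
        w -= lr * grad
    return w

# Clean data generated from w = 2, plus one hypothetical outlier.
clean = [(x / 10.0, 2.0 * x / 10.0) for x in range(5, 11)]
noisy = clean + [(1.0, -5.0)]

w_clean = min_loss_sgd(clean, k=3, steps=500, lr=0.1)
w_noisy = min_loss_sgd(noisy, k=3, steps=500, lr=0.1)
```

On the clean data every step contracts `w` toward 2. On the noisy data the outlier is only used when all k draws are outliers or when it happens to have the lowest loss, which is the intuition behind the robustness claim; comparing `w_noisy` against a `k=1` run is a quick way to see the effect on a given dataset.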

Privacy-preserving Learning via Deep Net Pruning


Title	Privacy-preserving Learning via Deep Net Pruning
Authors	Yangsibo Huang, Yushan Su, Sachin Ravi, Zhao Song, Sanjeev Arora, Kai Li
Abstract	This paper attempts to answer the question whether neural network pruning can be used as a tool to achieve differential privacy without losing much data utility. As a first step towards understanding the relationship between neural network pruning and differential privacy, this paper proves that pruning a given layer of the neural network is equivalent to adding a certain amount of differentially private noise to its hidden-layer activations. The paper also presents experimental results to show the practical implications of the theoretical finding and the key parameter values in a simple practical setting. These results show that neural network pruning can be a more effective alternative to adding differentially private noise for neural networks.
Tasks	Network Pruning
Published	2020-03-04
URL	https://arxiv.org/abs/2003.01876v1
PDF	https://arxiv.org/pdf/2003.01876v1.pdf
PWC	https://paperswithcode.com/paper/privacy-preserving-learning-via-deep-net
Repo
Framework

Estimation of conditional mixture Weibull distribution with right-censored data using neural network for time-to-event analysis


Title	Estimation of conditional mixture Weibull distribution with right-censored data using neural network for time-to-event analysis
Authors	Achraf Bennis, Sandrine Mouysset, Mathieu Serrurier
Abstract	In this paper, we consider survival analysis with right-censored data, which is a common situation in predictive maintenance and the health field. We propose a model based on the estimation of a two-parameter Weibull distribution conditional on the features. To achieve this result, we describe a neural network architecture and the associated loss functions that take into account the right-censored data. We extend the approach to a finite mixture of two-parameter Weibull distributions. We first validate that our model is able to precisely estimate the right parameters of the conditional Weibull distribution on synthetic datasets. In numerical experiments on two real-world datasets (METABRIC and SEER), our model outperforms the state-of-the-art methods. We also demonstrate that our approach can consider any survival time horizon.
Tasks	Survival Analysis
Published	2020-02-21
URL	https://arxiv.org/abs/2002.09358v1
PDF	https://arxiv.org/pdf/2002.09358v1.pdf
PWC	https://paperswithcode.com/paper/estimation-of-conditional-mixture-weibull
Repo
Framework
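
A loss that accounts for right-censoring, as mentioned in the abstract above, is commonly built from the standard censored likelihood: observed events contribute the log density and censored observations contribute the log survival function. The sketch below applies that textbook form to a single two-parameter Weibull per sample. It is an assumption that this matches the paper's loss, and the function names are hypothetical; in a neural model the `shapes` and `scales` would be network outputs.

```python
import math

def weibull_log_pdf(t, shape, scale):
    """log f(t) for a Weibull with the given shape (k) and scale (lambda)."""
    return (math.log(shape) - math.log(scale)
            + (shape - 1.0) * (math.log(t) - math.log(scale))
            - (t / scale) ** shape)

def weibull_log_survival(t, shape, scale):
    """log S(t) = -(t / scale) ** shape."""
    return -((t / scale) ** shape)

def censored_nll(times, events, shapes, scales):
    """Mean negative log-likelihood with right-censoring.

    events[i] is 1 if the event was observed at times[i],
    0 if the sample was censored at times[i].
    """
    total = 0.0
    for t, e, k, lam in zip(times, events, shapes, scales):
        if e:
            total -= weibull_log_pdf(t, k, lam)
        else:
            total -= weibull_log_survival(t, k, lam)
    return total / len(times)

# With shape = scale = 1 the Weibull is a unit-rate exponential.
loss = censored_nll(times=[2.0, 3.0], events=[1, 0],
                    shapes=[1.0, 1.0], scales=[1.0, 1.0])
```

For the unit-rate exponential both the log density and the log survival equal `-t`, so the example loss is the mean of the two times.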

Crowd Counting via Hierarchical Scale Recalibration Network


Title	Crowd Counting via Hierarchical Scale Recalibration Network
Authors	Zhikang Zou, Yifan Liu, Shuangjie Xu, Wei Wei, Shiping Wen, Pan Zhou
Abstract	The task of crowd counting is extremely challenging due to complicated difficulties, especially the huge variation in vision scale. Previous works tend to adopt a naive concatenation of multi-scale information to tackle it, while the scale shifts between the feature maps are ignored. In this paper, we propose a novel Hierarchical Scale Recalibration Network (HSRNet), which addresses the above issues by modeling rich contextual dependencies and recalibrating multiple scale-associated information. Specifically, a Scale Focus Module (SFM) first integrates global context into local features by modeling the semantic inter-dependencies along channel and spatial dimensions sequentially. In order to reallocate channel-wise feature responses, a Scale Recalibration Module (SRM) adopts a step-by-step fusion to generate final density maps. Furthermore, we propose a novel Scale Consistency loss to ensure that the scale-associated outputs are coherent with the ground truth of different scales. With the proposed modules, our approach can ignore various noises selectively and focus on appropriate crowd scales automatically. Extensive experiments on crowd counting datasets (ShanghaiTech, MALL, WorldEXPO’10, and UCSD) show that our HSRNet can deliver superior results over all state-of-the-art approaches. More remarkably, we extend experiments on an extra vehicle dataset, whose results indicate that the proposed model generalizes to other applications.
Tasks	Crowd Counting
Published	2020-03-07
URL	https://arxiv.org/abs/2003.03545v1
PDF	https://arxiv.org/pdf/2003.03545v1.pdf
PWC	https://paperswithcode.com/paper/crowd-counting-via-hierarchical-scale
Repo
Framework

OCmst: One-class Novelty Detection using Convolutional Neural Network and Minimum Spanning Trees


Title	OCmst: One-class Novelty Detection using Convolutional Neural Network and Minimum Spanning Trees
Authors	Riccardo La Grassa, Ignazio Gallo, Nicola Landro
Abstract	We present a novel model called One Class Minimum Spanning Tree (OCmst) for the novelty detection problem that uses a Convolutional Neural Network (CNN) as a deep feature extractor and a graph-based model based on the Minimum Spanning Tree (MST). In a novelty detection scenario, the training data is not polluted by outliers (abnormal class) and the goal is to recognize whether a test instance belongs to the normal class or to the abnormal class. Our approach uses the deep features from the CNN to feed a pair of MSTs built starting from each test instance. To cut down the computational time we use a parameter $\gamma$ to specify the size of the MSTs grown from the neighbours of the test instance. To prove the effectiveness of the proposed approach we conducted experiments on two publicly available datasets, well-known in the literature, and we achieved state-of-the-art results on the CIFAR10 dataset.
Tasks
Published	2020-03-30
URL	https://arxiv.org/abs/2003.13524v1
PDF	https://arxiv.org/pdf/2003.13524v1.pdf
PWC	https://paperswithcode.com/paper/ocmst-one-class-novelty-detection-using
Repo
Framework

Towards Using Count-level Weak Supervision for Crowd Counting


Title	Towards Using Count-level Weak Supervision for Crowd Counting
Authors	Yinjie Lei, Yan Liu, Pingping Zhang, Lingqiao Liu
Abstract	Most existing crowd counting methods require object location-level annotation, i.e., placing a dot at the center of an object. While being simpler than the bounding-box or pixel-level annotation, obtaining this annotation is still labor-intensive and time-consuming, especially for images with highly crowded scenes. On the other hand, weaker annotations that only know the total count of objects can be almost effortless in many practical scenarios. Thus, it is desirable to develop a learning method that can effectively train models from count-level annotations. To this end, this paper studies the problem of weakly-supervised crowd counting which learns a model from only a small amount of location-level annotations (fully-supervised) but a large amount of count-level annotations (weakly-supervised). To perform effective training in this scenario, we observe that the direct solution of regressing the integral of the density map to the object count is not sufficient and it is beneficial to introduce stronger regularizations on the predicted density map of weakly-annotated images. We devise a simple-yet-effective training strategy, namely Multiple Auxiliary Tasks Training (MATT), to construct regularizers for restricting the freedom of the generated density maps. Through extensive experiments on existing datasets and a newly proposed dataset, we validate the effectiveness of the proposed weakly-supervised method and demonstrate its superior performance over existing solutions.
Tasks	Crowd Counting
Published	2020-02-29
URL	https://arxiv.org/abs/2003.00164v1
PDF	https://arxiv.org/pdf/2003.00164v1.pdf
PWC	https://paperswithcode.com/paper/towards-using-count-level-weak-supervision
Repo
Framework
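
The "direct solution" that the abstract above calls insufficient, regressing the integral of the predicted density map to the count, can be written down in a few lines. The sketch below shows that baseline only, not the MATT strategy; the function names, the absolute-error count loss, and the `weight` parameter are assumptions made for illustration.

```python
def density_sum(density):
    """Integral of a 2D density map, i.e. the predicted count."""
    return sum(sum(row) for row in density)

def location_level_loss(pred, target):
    """Pixel-wise mean squared error against a ground-truth density map."""
    n = sum(len(row) for row in pred)
    return sum((p - t) ** 2
               for pred_row, target_row in zip(pred, target)
               for p, t in zip(pred_row, target_row)) / n

def count_level_loss(pred, count):
    """Absolute error between the map's integral and the annotated count."""
    return abs(density_sum(pred) - count)

def mixed_loss(pred_full, target_full, pred_weak, count_weak, weight=1.0):
    """Combine one fully annotated and one count-only example."""
    return (location_level_loss(pred_full, target_full)
            + weight * count_level_loss(pred_weak, count_weak))

pred = [[0.5, 0.5], [1.0, 0.0]]
target = [[0.5, 0.5], [0.0, 1.0]]
loss = mixed_loss(pred, target, pred_weak=pred, count_weak=3)
```

The count term constrains only the sum of the map, so many different density maps reach zero count loss; that freedom is what the abstract says stronger regularization is needed to restrict.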

An evaluation of machine learning techniques to predict the outcome of children treated for Hodgkin-Lymphoma on the AHOD0031 trial: A report from the Children’s Oncology Group


Title	An evaluation of machine learning techniques to predict the outcome of children treated for Hodgkin-Lymphoma on the AHOD0031 trial: A report from the Children’s Oncology Group
Authors	Cédric Beaulac, Jeffrey S. Rosenthal, Qinglin Pei, Debra Friedman, Suzanne Wolden, David Hodgson
Abstract	In this manuscript we analyze a data set containing information on children with Hodgkin Lymphoma (HL) enrolled on a clinical trial. Treatments received and survival status were collected together with other covariates such as demographics and clinical measurements. Our main task is to explore the potential of machine learning (ML) algorithms in a survival analysis context in order to improve over the Cox Proportional Hazard (CoxPH) model. We discuss the weaknesses of the CoxPH model we would like to improve upon and then we introduce multiple algorithms, from well-established ones to state-of-the-art models, that solve these issues. We then compare every model according to the concordance index and the Brier score. Finally, we produce a series of recommendations, based on our experience, for practitioners who would like to benefit from the recent advances in artificial intelligence.
Tasks	Survival Analysis
Published	2020-01-15
URL	https://arxiv.org/abs/2001.05534v1
PDF	https://arxiv.org/pdf/2001.05534v1.pdf
PWC	https://paperswithcode.com/paper/an-evaluation-of-machine-learning-techniques
Repo
Framework
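
The concordance index used for model comparison in the abstract above measures how often a model ranks pairs of patients correctly. A minimal sketch of Harrell's version for right-censored data follows; it assumes higher `risks` mean earlier expected events and counts tied risks as half-concordant. Whether the paper uses exactly this variant is not stated in the abstract.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored data.

    A pair (i, j) is comparable when subject i had an observed event
    and times[i] < times[j]. It is concordant when risks[i] > risks[j].
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")

times = [1.0, 2.0, 3.0, 4.0]
events = [1, 1, 0, 1]            # third subject is censored
perfect = concordance_index(times, events, risks=[4.0, 3.0, 2.0, 1.0])
reversed_ranking = concordance_index(times, events, risks=[1.0, 2.0, 3.0, 4.0])
constant = concordance_index(times, events, risks=[1.0, 1.0, 1.0, 1.0])
```

A value of 1.0 means every comparable pair is ordered correctly, 0.5 corresponds to an uninformative constant prediction, and 0.0 to a fully reversed ranking.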

Harmonizing Transferability and Discriminability for Adapting Object Detectors


Title	Harmonizing Transferability and Discriminability for Adapting Object Detectors
Authors	Chaoqi Chen, Zebiao Zheng, Xinghao Ding, Yue Huang, Qi Dou
Abstract	Recent advances in adaptive object detection have achieved compelling results by virtue of adversarial feature adaptation to mitigate the distributional shifts along the detection pipeline. Whilst adversarial adaptation significantly enhances the transferability of feature representations, the feature discriminability of object detectors remains less investigated. Moreover, transferability and discriminability may be at odds in adversarial adaptation given the complex combinations of objects and the differentiated scene layouts between domains. In this paper, we propose a Hierarchical Transferability Calibration Network (HTCN) that hierarchically (local-region/image/instance) calibrates the transferability of feature representations for harmonizing transferability and discriminability. The proposed model consists of three components: (1) Importance Weighted Adversarial Training with input Interpolation (IWAT-I), which strengthens the global discriminability by re-weighting the interpolated image-level features; (2) Context-aware Instance-Level Alignment (CILA) module, which enhances the local discriminability by capturing the underlying complementary effect between the instance-level feature and the global context information for the instance-level feature alignment; (3) local feature masks that calibrate the local transferability to provide semantic guidance for the following discriminative pattern alignment. Experimental results show that HTCN significantly outperforms the state-of-the-art methods on benchmark datasets.
Tasks	Calibration, Object Detection
Published	2020-03-13
URL	https://arxiv.org/abs/2003.06297v1
PDF	https://arxiv.org/pdf/2003.06297v1.pdf
PWC	https://paperswithcode.com/paper/harmonizing-transferability-and
Repo
Framework

DLGA-PDE: Discovery of PDEs with incomplete candidate library via combination of deep learning and genetic algorithm


Title	DLGA-PDE: Discovery of PDEs with incomplete candidate library via combination of deep learning and genetic algorithm
Authors	Hao Xu, Haibin Chang, Dongxiao Zhang
Abstract	Data-driven methods have recently been developed to discover underlying partial differential equations (PDEs) of physical problems. However, for these methods, a complete candidate library of potential terms in a PDE is usually required. To overcome this limitation, we propose a novel framework combining deep learning and genetic algorithm, called DLGA-PDE, for discovering PDEs. In the proposed framework, a deep neural network that is trained with available data of a physical problem is utilized to generate meta-data and calculate derivatives, and the genetic algorithm is then employed to discover the underlying PDE. Owing to the merits of the genetic algorithm, such as mutation and crossover, DLGA-PDE can work with an incomplete candidate library. The proposed DLGA-PDE is tested for discovery of the Korteweg-de Vries (KdV) equation, the Burgers equation, the wave equation, and the Chaffee-Infante equation, respectively, for proof-of-concept. Satisfactory results are obtained without the need for a complete candidate library, even in the presence of noisy and limited data.
Tasks
Published	2020-01-21
URL	https://arxiv.org/abs/2001.07305v1
PDF	https://arxiv.org/pdf/2001.07305v1.pdf
PWC	https://paperswithcode.com/paper/dlga-pde-discovery-of-pdes-with-incomplete
Repo
Framework

Augmented Parallel-Pyramid Net for Attention Guided Pose-Estimation


Title	Augmented Parallel-Pyramid Net for Attention Guided Pose-Estimation
Authors	Luanxuan Hou, Jie Cao, Yuan Zhao, Haifeng Shen, Yiping Meng, Ran He, Jieping Ye
Abstract	The target of human pose estimation is to determine body part or joint locations of each person from an image. This is a challenging problem with wide applications. To address this issue, this paper proposes an augmented parallel-pyramid net with attention partial module and differentiable auto-data augmentation. Technically, a parallel pyramid structure is proposed to compensate for the loss of information. We take the design of parallel structure for reverse compensation. Meanwhile, the overall computational complexity does not increase. We further define an Attention Partial Module (APM) operator to extract weighted features from different scale feature maps generated by the parallel pyramid structure. Compared with refining through an upsampling operator, APM can better capture the relationship between channels. Finally, we propose a differentiable auto data augmentation method to further improve estimation accuracy. We define a new pose search space where the sequences of data augmentations are formulated as a trainable and operational CNN component. Experiments corroborate the effectiveness of our proposed method. Notably, our method achieves the top-1 accuracy on the challenging COCO keypoint benchmark and state-of-the-art results on the MPII dataset.
Tasks	Data Augmentation, Pose Estimation
Published	2020-03-17
URL	https://arxiv.org/abs/2003.07516v1
PDF	https://arxiv.org/pdf/2003.07516v1.pdf
PWC	https://paperswithcode.com/paper/augmented-parallel-pyramid-net-for-attention
Repo
Framework