January 31, 2020

1653 words 8 mins read

Paper Group AWR 458


Data Augmentation for Object Detection via Progressive and Selective Instance-Switching. Graph Convolutional Networks for Temporal Action Localization. Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression. Towards Efficient Model Compression via Learned Global Ranking. Once-for-All: Train One Network and Specialize it for Effic …

Data Augmentation for Object Detection via Progressive and Selective Instance-Switching

Title Data Augmentation for Object Detection via Progressive and Selective Instance-Switching
Authors Hao Wang, Qilong Wang, Fan Yang, Weiqi Zhang, Wangmeng Zuo
Abstract Collection of massive well-annotated samples is effective in improving object detection performance but is extremely laborious and costly. Instead of data collection and annotation, the recently proposed Cut-Paste methods [12, 15] show the potential to augment the training dataset by cutting foreground objects and pasting them on proper new backgrounds. However, existing Cut-Paste methods cannot guarantee that synthetic images always precisely model visual context, and all of them require external datasets. To handle the above issues, this paper proposes a simple yet effective instance-switching (IS) strategy, which generates new training data by switching instances of the same class from different images. Our IS naturally preserves contextual coherence in the original images while requiring no external dataset. To guide our IS toward better detection performance, we explore issues of instance imbalance and class importance in datasets, which frequently occur and adversely affect detection performance. To this end, we propose a novel Progressive and Selective Instance-Switching (PSIS) method to augment training data for object detection. The proposed PSIS enhances instance balance by combining selective re-sampling with a class-balanced loss, and considers class importance by progressively augmenting the training dataset guided by detection performance. The experiments are conducted on the challenging MS COCO benchmark, and results demonstrate our PSIS brings clear improvement over various state-of-the-art detectors (e.g., Faster R-CNN, FPN, Mask R-CNN and SNIPER), showing the superiority and generality of our PSIS. Code and models are available at: https://github.com/Hwang64/PSIS.
Tasks Data Augmentation, Instance Segmentation, Object Detection
Published 2019-06-02
URL https://arxiv.org/abs/1906.00358v2
PDF https://arxiv.org/pdf/1906.00358v2.pdf
PWC https://paperswithcode.com/paper/190600358
Repo https://github.com/Hwang64/PSIS
Framework none
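
The instance-switching idea above can be illustrated with a few lines of Python. This is a minimal sketch under strong assumptions (box-level crops instead of instance masks, no shape or scale matching, no progressive or selective scheduling), so it is not the authors' PSIS pipeline; the `switch_instances` helper, the box format, and the PIL-based resize are inventions of this sketch.

```python
import numpy as np
from PIL import Image

def switch_instances(img_a, box_a, img_b, box_b):
    """Swap two same-class instance crops between images (illustrative only).

    Heavily simplified sketch of instance switching: the crop inside box_a
    is replaced by the (resized) crop from box_b and vice versa, so each
    image keeps its own background and context. Boxes are (x1, y1, x2, y2)
    in pixels; images are uint8 HxWx3 arrays.
    """
    def crop(img, box):
        x1, y1, x2, y2 = box
        return img[y1:y2, x1:x2]

    def paste(dst, box, src):
        x1, y1, x2, y2 = box
        # Resize the incoming instance crop to fit the target box.
        resized = np.array(Image.fromarray(src).resize((x2 - x1, y2 - y1)))
        out = dst.copy()
        out[y1:y2, x1:x2] = resized
        return out

    crop_a, crop_b = crop(img_a, box_a), crop(img_b, box_b)
    return paste(img_a, box_a, crop_b), paste(img_b, box_b, crop_a)
```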

Graph Convolutional Networks for Temporal Action Localization

Title Graph Convolutional Networks for Temporal Action Localization
Authors Runhao Zeng, Wenbing Huang, Mingkui Tan, Yu Rong, Peilin Zhao, Junzhou Huang, Chuang Gan
Abstract Most state-of-the-art action localization systems process each action proposal individually, without explicitly exploiting their relations during learning. However, the relations between proposals actually play an important role in action localization, since a meaningful action always consists of multiple proposals in a video. In this paper, we propose to exploit the proposal-proposal relations using Graph Convolutional Networks (GCNs). First, we construct an action proposal graph, where each proposal is represented as a node and the relation between two proposals as an edge. Here, we use two types of relations, one for capturing the context information for each proposal and the other for characterizing the correlations between distinct actions. Then we apply the GCNs over the graph to model the relations among different proposals and learn powerful representations for action classification and localization. Experimental results show that our approach significantly outperforms the state-of-the-art on THUMOS14 (49.1% versus 42.8%). Moreover, augmentation experiments on ActivityNet also verify the efficacy of modeling action proposal relationships. Codes are available at https://github.com/Alvin-Zeng/PGCN.
Tasks Action Classification, Action Localization, Temporal Action Localization
Published 2019-09-07
URL https://arxiv.org/abs/1909.03252v1
PDF https://arxiv.org/pdf/1909.03252v1.pdf
PWC https://paperswithcode.com/paper/graph-convolutional-networks-for-temporal
Repo https://github.com/Alvin-Zeng/PGCN
Framework pytorch
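
A toy version of the graph convolution over proposals can be sketched in PyTorch (the paper's listed framework). The layer, the adjacency construction, and all shapes below are assumptions of this sketch, not the released P-GCN code.

```python
import torch
import torch.nn as nn

class ProposalGCNLayer(nn.Module):
    """One graph-convolution layer over action-proposal features.

    Proposal features are node features; `adj` is a row-normalized adjacency
    matrix encoding proposal-proposal relations (e.g. temporal overlap for
    contextual edges, feature similarity for cross-action edges).
    """

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # x: (num_proposals, in_dim), adj: (num_proposals, num_proposals)
        # Aggregate neighboring proposal features, then transform.
        return torch.relu(self.linear(adj @ x))

# Usage: two stacked layers producing relation-aware proposal features;
# classification and boundary-regression heads would follow (omitted here).
num_props, feat_dim = 100, 1024
x = torch.randn(num_props, feat_dim)
adj = torch.eye(num_props)  # placeholder graph; real edges come from IoU/similarity
layer1, layer2 = ProposalGCNLayer(feat_dim, 512), ProposalGCNLayer(512, 256)
h = layer2(layer1(x, adj), adj)  # (num_proposals, 256)
```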

Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression

Title Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression
Authors Zhaohui Zheng, Ping Wang, Wei Liu, Jinze Li, Rongguang Ye, Dongwei Ren
Abstract Bounding box regression is the crucial step in object detection. In existing methods, while $\ell_n$-norm loss is widely adopted for bounding box regression, it is not tailored to the evaluation metric, i.e., Intersection over Union (IoU). Recently, IoU loss and generalized IoU (GIoU) loss have been proposed to benefit the IoU metric, but still suffer from the problems of slow convergence and inaccurate regression. In this paper, we propose a Distance-IoU (DIoU) loss by incorporating the normalized distance between the predicted box and the target box, which converges much faster in training than IoU and GIoU losses. Furthermore, this paper summarizes three geometric factors in bounding box regression, i.e., overlap area, central point distance and aspect ratio, based on which a Complete IoU (CIoU) loss is proposed, thereby leading to faster convergence and better performance. By incorporating DIoU and CIoU losses into state-of-the-art object detection algorithms, e.g., YOLO v3, SSD and Faster RCNN, we achieve notable performance gains in terms of not only IoU metric but also GIoU metric. Moreover, DIoU can be easily adopted into non-maximum suppression (NMS) to act as the criterion, further boosting performance improvement. The source code and trained models are available at https://github.com/Zzh-tju/DIoU.
Tasks Object Detection
Published 2019-11-19
URL https://arxiv.org/abs/1911.08287v1
PDF https://arxiv.org/pdf/1911.08287v1.pdf
PWC https://paperswithcode.com/paper/distance-iou-loss-faster-and-better-learning
Repo https://github.com/Zzh-tju/DIoU-darknet
Framework none
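
The DIoU loss has a simple closed form: it adds to the IoU term a penalty on the normalized squared distance between box centers, roughly $L_{DIoU} = 1 - IoU + \rho^2(b, b^{gt}) / c^2$, where $c$ is the diagonal length of the smallest box enclosing both. The plain-Python sketch below assumes (x1, y1, x2, y2) boxes; for training, the official implementation in the linked repo should be preferred.

```python
def diou_loss(pred, target, eps=1e-7):
    """DIoU loss for two axis-aligned boxes in (x1, y1, x2, y2) format."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target

    # Intersection area
    ix1, iy1 = max(px1, tx1), max(py1, ty1)
    ix2, iy2 = min(px2, tx2), min(py2, ty2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    # Union area and IoU
    area_p = (px2 - px1) * (py2 - py1)
    area_t = (tx2 - tx1) * (ty2 - ty1)
    iou = inter / (area_p + area_t - inter + eps)

    # Squared distance between box centers
    rho2 = ((px1 + px2 - tx1 - tx2) ** 2 + (py1 + py2 - ty1 - ty2) ** 2) / 4.0

    # Squared diagonal of the smallest enclosing box
    cx1, cy1 = min(px1, tx1), min(py1, ty1)
    cx2, cy2 = max(px2, tx2), max(py2, ty2)
    c2 = (cx2 - cx1) ** 2 + (cy2 - cy1) ** 2 + eps

    return 1.0 - iou + rho2 / c2
```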

Towards Efficient Model Compression via Learned Global Ranking

Title Towards Efficient Model Compression via Learned Global Ranking
Authors Ting-Wu Chin, Ruizhou Ding, Cha Zhang, Diana Marculescu
Abstract Pruning convolutional filters has demonstrated its effectiveness in compressing ConvNets. Prior art in filter pruning requires users to specify a target model complexity (e.g., model size or FLOP count) for the resulting architecture. However, determining a target model complexity can be difficult for optimizing various embodied AI applications such as autonomous robots, drones, and user-facing applications. First, both the accuracy and the speed of ConvNets can affect the performance of the application. Second, the performance of the application can be hard to assess without evaluating ConvNets during inference. As a consequence, finding a sweet-spot between the accuracy and speed via filter pruning, which needs to be done in a trial-and-error fashion, can be time-consuming. This work takes a first step toward making this process more efficient by altering the goal of model compression to producing a set of ConvNets with various accuracy and latency trade-offs instead of producing one ConvNet targeting some pre-defined latency constraint. To this end, we propose to learn a global ranking of the filters across different layers of the ConvNet, which is used to obtain a set of ConvNet architectures that have different accuracy/latency trade-offs by pruning the bottom-ranked filters. Our proposed algorithm, LeGR, is shown to be 2x to 3x faster than prior work while having comparable or better performance when targeting seven pruned ResNet-56 with different accuracy/FLOPs profiles on the CIFAR-100 dataset. Additionally, we have evaluated LeGR on ImageNet and Bird-200 with ResNet-50 and MobileNetV2 to demonstrate its effectiveness. Code available at https://github.com/cmu-enyac/LeGR.
Tasks Model Compression
Published 2019-04-28
URL https://arxiv.org/abs/1904.12368v2
PDF https://arxiv.org/pdf/1904.12368v2.pdf
PWC https://paperswithcode.com/paper/legr-filter-pruning-via-learned-global
Repo https://github.com/cmu-enyac/LeGR
Framework pytorch
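
The core mechanism, ranking every filter in the network on a single global scale and pruning from the bottom of that ranking, can be sketched as follows. In LeGR the per-layer affine coefficients are learned; here they are taken as given, and the L2-norm importance measure, the dictionaries `alphas`/`kappas`, and the function name are assumptions of this sketch.

```python
import torch
import torch.nn as nn

def global_filter_ranking(model, alphas, kappas):
    """Rank all conv filters of a model on one global scale (sketch only).

    `alphas`/`kappas` map layer name -> (scale, shift) so that filter norms
    from different layers become comparable. Returns (score, layer, index)
    tuples sorted lowest first; pruning the head of the list yields a family
    of sub-architectures with different accuracy/latency trade-offs.
    """
    scores = []
    for name, module in model.named_modules():
        if isinstance(module, nn.Conv2d):
            # L2 norm of each filter's weights as the raw importance measure.
            norms = module.weight.detach().flatten(1).norm(dim=1)
            a, k = alphas.get(name, 1.0), kappas.get(name, 0.0)
            for idx, n in enumerate(norms):
                scores.append((a * n.item() + k, name, idx))
    return sorted(scores)
```

Given such a ranking, one would remove bottom-ranked filters until a chosen FLOP or latency budget is met, producing the set of pruned ConvNets the abstract describes.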

Once-for-All: Train One Network and Specialize it for Efficient Deployment on Diverse Hardware Platforms

Title Once-for-All: Train One Network and Specialize it for Efficient Deployment on Diverse Hardware Platforms
Authors Han Cai, Chuang Gan, Tianzhe Wang, Zhekai Zhang, Song Han
Abstract We address the challenging problem of efficient deep learning model deployment across many devices and diverse constraints, from general-purpose hardware to specialized accelerators. Conventional approaches either manually design or use neural architecture search (NAS) to find a specialized neural network and train it from scratch for each case, which is computationally prohibitive (causing $CO_2$ emission as much as 5 cars’ lifetime) thus unscalable. To reduce the cost, our key idea is to decouple model training from architecture search. To this end, we propose to train a once-for-all network (OFA) that supports diverse architectural settings (depth, width, kernel size, and resolution). Given a deployment scenario, we can then quickly get a specialized sub-network by selecting from the OFA network without additional training. To prevent interference between many sub-networks during training, we also propose a novel progressive shrinking algorithm, which can train a surprisingly large number of sub-networks ($> 10^{19}$) simultaneously. Extensive experiments on various hardware platforms (CPU, GPU, mCPU, mGPU, FPGA accelerator) show that OFA consistently outperforms SOTA NAS methods (up to 4.0% ImageNet top1 accuracy improvement over MobileNetV3) while reducing orders of magnitude GPU hours and $CO_2$ emission. In particular, OFA achieves a new SOTA 80.0% ImageNet top1 accuracy under the mobile setting ($<$600M FLOPs). Code and pre-trained models are released at https://github.com/mit-han-lab/once-for-all.
Tasks AutoML, Neural Architecture Search
Published 2019-08-26
URL https://arxiv.org/abs/1908.09791v3
PDF https://arxiv.org/pdf/1908.09791v3.pdf
PWC https://paperswithcode.com/paper/once-for-all-train-one-network-and-specialize
Repo https://github.com/mit-han-lab/once-for-all
Framework pytorch
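
The elastic search space described above (depth, width, kernel size, resolution) can be made concrete with a small sampling sketch. The specific value sets and the `sample_subnet` helper are illustrative assumptions; in OFA the supernet is trained once with progressive shrinking, and sub-networks are later selected (e.g. by accuracy/latency predictors) rather than retrained.

```python
import random

# Architectural choices supported by an OFA-style supernet, per the abstract:
# elastic depth, width (expand ratio), kernel size, and input resolution.
# The value sets below are assumptions of this sketch, not the paper's.
DEPTHS = [2, 3, 4]
WIDTHS = [3, 4, 6]            # expand ratios
KERNELS = [3, 5, 7]
RESOLUTIONS = [160, 176, 192, 208, 224]

def sample_subnet(num_stages=5):
    """Randomly sample one sub-network configuration from the search space."""
    config = {"resolution": random.choice(RESOLUTIONS), "stages": []}
    for _ in range(num_stages):
        depth = random.choice(DEPTHS)
        blocks = [{"kernel": random.choice(KERNELS),
                   "width": random.choice(WIDTHS)} for _ in range(depth)]
        config["stages"].append(blocks)
    return config

print(sample_subnet())  # one of the >10^19 sub-networks sharing OFA weights
```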

Transfer Learning for Non-Intrusive Load Monitoring

Title Transfer Learning for Non-Intrusive Load Monitoring
Authors Michele D'Incecco, Stefano Squartini, Mingjun Zhong
Abstract Non-intrusive load monitoring (NILM) is a technique to recover source appliances from only the recorded mains in a household. NILM is unidentifiable and thus a challenging problem because the inferred power value of an appliance given only the mains may not be unique. To mitigate the unidentifiability problem, various methods incorporating domain knowledge into NILM have been proposed and shown effective experimentally. Recently, among these methods, deep neural networks have been shown to perform best. Arguably, the recently proposed sequence-to-point (seq2point) learning is promising for NILM. However, the experiments were only carried out on the same data domain. It is not clear if the method could be generalised or transferred to different domains, e.g., when the test data are drawn from a different country than the training data. We address this issue in the paper, and two transfer learning schemes are proposed, i.e., appliance transfer learning (ATL) and cross-domain transfer learning (CTL). For ATL, our results show that the latent features learnt by a 'complex' appliance, e.g., a washing machine, can be transferred to a 'simple' appliance, e.g., a kettle. For CTL, our conclusion is that the seq2point learning is transferable. Precisely, when the training and test data are in a similar domain, seq2point learning can be directly applied to the test data without fine tuning; when the training and test data are in different domains, seq2point learning needs fine tuning before being applied to the test data. Interestingly, we show that only the fully connected layers need fine tuning for transfer learning. Source code can be found at https://github.com/MingjunZhong/transferNILM.
Tasks Non-Intrusive Load Monitoring, Transfer Learning
Published 2019-02-23
URL https://arxiv.org/abs/1902.08835v3
PDF https://arxiv.org/pdf/1902.08835v3.pdf
PWC https://paperswithcode.com/paper/transfer-learning-for-non-intrusive-load
Repo https://github.com/MingjunZhong/transferNILM
Framework tf
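
The paper's key transfer recipe, that only the fully connected layers need fine-tuning when moving across domains, maps naturally onto a freeze-then-fine-tune loop in Keras (the listed framework is TensorFlow). The layer sizes, file names, and variable names below are assumptions of this sketch rather than the released transferNILM code.

```python
import tensorflow as tf

def make_seq2point(window=599):
    """A small seq2point-style CNN: a window of mains readings in,
    the midpoint power of one target appliance out. Layer sizes here
    are illustrative, not the paper's exact architecture."""
    return tf.keras.Sequential([
        tf.keras.layers.Conv1D(30, 10, activation="relu", input_shape=(window, 1)),
        tf.keras.layers.Conv1D(30, 8, activation="relu"),
        tf.keras.layers.Conv1D(40, 6, activation="relu"),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(1024, activation="relu"),
        tf.keras.layers.Dense(1),
    ])

# Cross-domain transfer as described in the abstract: keep the convolutional
# feature extractor from the source domain frozen and fine-tune only the
# fully connected layers on target-domain data.
model = make_seq2point()
# model.load_weights("source_domain.h5")  # hypothetical pretrained weights
for layer in model.layers:
    layer.trainable = isinstance(layer, tf.keras.layers.Dense)
model.compile(optimizer="adam", loss="mse")
# model.fit(target_mains_windows, target_appliance_power, epochs=5)
```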