Paper Group AWR 458
Data Augmentation for Object Detection via Progressive and Selective Instance-Switching. Graph Convolutional Networks for Temporal Action Localization. Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression. Towards Efficient Model Compression via Learned Global Ranking. Once-for-All: Train One Network and Specialize it for Efficient Deployment on Diverse Hardware Platforms. Transfer Learning for Non-Intrusive Load Monitoring.
Data Augmentation for Object Detection via Progressive and Selective Instance-Switching
Title | Data Augmentation for Object Detection via Progressive and Selective Instance-Switching |
Authors | Hao Wang, Qilong Wang, Fan Yang, Weiqi Zhang, Wangmeng Zuo |
Abstract | Collecting massive well-annotated samples is effective in improving object detection performance but is extremely laborious and costly. Instead of data collection and annotation, the recently proposed Cut-Paste methods [12, 15] show the potential to augment the training dataset by cutting foreground objects and pasting them on proper new backgrounds. However, existing Cut-Paste methods cannot guarantee that synthetic images always model visual context precisely, and all of them require external datasets. To handle the above issues, this paper proposes a simple yet effective instance-switching (IS) strategy, which generates new training data by switching instances of the same class between different images. Our IS naturally preserves contextual coherence in the original images while requiring no external dataset. To guide our IS toward better detection performance, we explore the issues of instance imbalance and class importance in datasets, which occur frequently and adversely affect detection performance. To this end, we propose a novel Progressive and Selective Instance-Switching (PSIS) method to augment training data for object detection. The proposed PSIS enhances instance balance by combining selective re-sampling with a class-balanced loss, and considers class importance by progressively augmenting the training dataset guided by detection performance. Experiments are conducted on the challenging MS COCO benchmark, and the results demonstrate that our PSIS brings clear improvements over various state-of-the-art detectors (e.g., Faster R-CNN, FPN, Mask R-CNN and SNIPER), showing the superiority and generality of our PSIS. Code and models are available at: https://github.com/Hwang64/PSIS. |
Tasks | Data Augmentation, Instance Segmentation, Object Detection |
Published | 2019-06-02 |
URL | https://arxiv.org/abs/1906.00358v2 |
PDF | https://arxiv.org/pdf/1906.00358v2.pdf |
PWC | https://paperswithcode.com/paper/190600358 |
Repo | https://github.com/Hwang64/PSIS |
Framework | none |
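To make the instance-switching idea concrete, below is a minimal sketch that swaps two same-class instances between a pair of images using their segmentation masks. It is an illustration under simplifying assumptions, not the authors' implementation: the full PSIS additionally matches instance shape and scale, filters low-quality switches, and applies the progressive and selective scheduling described in the abstract.

```python
import numpy as np
import cv2

def switch_instances(img_a, mask_a, img_b, mask_b):
    """Swap one same-class instance between two images (simplified IS).

    img_*: HxWx3 uint8 arrays; mask_*: HxW boolean arrays marking one instance.
    Returns the two augmented images.
    """
    def bbox(mask):
        ys, xs = np.where(mask)
        return ys.min(), ys.max() + 1, xs.min(), xs.max() + 1

    def paste(dst_img, dst_mask, src_img, src_mask):
        out = dst_img.copy()
        y0, y1, x0, x1 = bbox(dst_mask)        # destination instance region
        sy0, sy1, sx0, sx1 = bbox(src_mask)    # source instance region
        h, w = y1 - y0, x1 - x0
        # Resize the source instance to fit the destination bounding box.
        patch = cv2.resize(src_img[sy0:sy1, sx0:sx1], (w, h))
        m = cv2.resize(src_mask[sy0:sy1, sx0:sx1].astype(np.uint8),
                       (w, h)).astype(bool)
        out[y0:y1, x0:x1][m] = patch[m]        # copy only masked pixels
        return out

    return (paste(img_a, mask_a, img_b, mask_b),
            paste(img_b, mask_b, img_a, mask_a))
```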
Graph Convolutional Networks for Temporal Action Localization
Title | Graph Convolutional Networks for Temporal Action Localization |
Authors | Runhao Zeng, Wenbing Huang, Mingkui Tan, Yu Rong, Peilin Zhao, Junzhou Huang, Chuang Gan |
Abstract | Most state-of-the-art action localization systems process each action proposal individually, without explicitly exploiting their relations during learning. However, the relations between proposals actually play an important role in action localization, since a meaningful action always consists of multiple proposals in a video. In this paper, we propose to exploit the proposal-proposal relations using Graph Convolutional Networks (GCNs). First, we construct an action proposal graph, where each proposal is represented as a node and the relation between two proposals as an edge. Here, we use two types of relations, one capturing the context information for each proposal and the other characterizing the correlations between distinct actions. Then we apply GCNs over the graph to model the relations among different proposals and learn powerful representations for action classification and localization. Experimental results show that our approach significantly outperforms the state-of-the-art on THUMOS14 (49.1% versus 42.8%). Moreover, augmentation experiments on ActivityNet also verify the efficacy of modeling action proposal relationships. Code is available at https://github.com/Alvin-Zeng/PGCN. |
Tasks | Action Classification, Action Localization, Temporal Action Localization |
Published | 2019-09-07 |
URL | https://arxiv.org/abs/1909.03252v1 |
PDF | https://arxiv.org/pdf/1909.03252v1.pdf |
PWC | https://paperswithcode.com/paper/graph-convolutional-networks-for-temporal |
Repo | https://github.com/Alvin-Zeng/PGCN |
Framework | pytorch |
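As a rough illustration of the proposal graph, the sketch below connects proposals whose temporal IoU exceeds a threshold and runs a two-layer GCN over the resulting adjacency. This is a simplification of the paper's P-GCN: the two edge types (contextual and surrounding), the edge weighting, and the separate classification and boundary-regression branches are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def temporal_iou(segs):
    """Pairwise temporal IoU for N proposals given as (start, end) rows."""
    s, e = segs[:, 0], segs[:, 1]
    inter = (torch.min(e[:, None], e[None, :])
             - torch.max(s[:, None], s[None, :])).clamp(min=0)
    union = (e - s)[:, None] + (e - s)[None, :] - inter
    return inter / union.clamp(min=1e-6)

class ProposalGCN(nn.Module):
    """Two-layer GCN whose edges link proposals with tIoU above a threshold."""
    def __init__(self, in_dim, hid_dim, num_classes, thresh=0.7):
        super().__init__()
        self.thresh = thresh
        self.w1 = nn.Linear(in_dim, hid_dim)
        self.w2 = nn.Linear(hid_dim, num_classes)

    def forward(self, feats, segs):                # feats: (N, D), segs: (N, 2)
        adj = (temporal_iou(segs) > self.thresh).float()
        adj = adj / adj.sum(dim=1, keepdim=True)   # row-normalize (diagonal is 1)
        h = F.relu(self.w1(adj @ feats))           # aggregate neighbour features
        return self.w2(adj @ h)                    # per-proposal class logits
```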
Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression
Title | Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression |
Authors | Zhaohui Zheng, Ping Wang, Wei Liu, Jinze Li, Rongguang Ye, Dongwei Ren |
Abstract | Bounding box regression is the crucial step in object detection. In existing methods, while the $\ell_n$-norm loss is widely adopted for bounding box regression, it is not tailored to the evaluation metric, i.e., Intersection over Union (IoU). Recently, IoU loss and generalized IoU (GIoU) loss have been proposed to benefit the IoU metric, but they still suffer from slow convergence and inaccurate regression. In this paper, we propose a Distance-IoU (DIoU) loss that incorporates the normalized distance between the predicted box and the target box, and converges much faster in training than the IoU and GIoU losses. Furthermore, this paper summarizes three geometric factors in bounding box regression, i.e., overlap area, central point distance and aspect ratio, based on which a Complete IoU (CIoU) loss is proposed, leading to faster convergence and better performance. By incorporating the DIoU and CIoU losses into state-of-the-art object detection algorithms, e.g., YOLOv3, SSD and Faster R-CNN, we achieve notable performance gains in terms of not only the IoU metric but also the GIoU metric. Moreover, DIoU can easily be adopted as the criterion in non-maximum suppression (NMS), further boosting performance. The source code and trained models are available at https://github.com/Zzh-tju/DIoU. |
Tasks | Object Detection |
Published | 2019-11-19 |
URL | https://arxiv.org/abs/1911.08287v1 |
PDF | https://arxiv.org/pdf/1911.08287v1.pdf |
PWC | https://paperswithcode.com/paper/distance-iou-loss-faster-and-better-learning |
Repo | https://github.com/Zzh-tju/DIoU-darknet |
Framework | none |
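The DIoU loss itself is compact enough to state in code. Below is a straightforward PyTorch rendering of $\mathcal{L}_{DIoU} = 1 - IoU + \rho^2(\mathbf{b}, \mathbf{b}^{gt})/c^2$ for boxes in $(x_1, y_1, x_2, y_2)$ format, where $\rho$ is the distance between the box centers and $c$ is the diagonal length of the smallest enclosing box. It follows the paper's definition but is not the reference implementation from the linked repo.

```python
import torch

def diou_loss(pred, target, eps=1e-7):
    """DIoU loss for (N, 4) boxes in (x1, y1, x2, y2) format."""
    # Intersection and union for the IoU term.
    lt = torch.max(pred[:, :2], target[:, :2])
    rb = torch.min(pred[:, 2:], target[:, 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[:, 0] * wh[:, 1]
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    # rho^2: squared distance between the box centers.
    ctr_p = (pred[:, :2] + pred[:, 2:]) / 2
    ctr_t = (target[:, :2] + target[:, 2:]) / 2
    rho2 = ((ctr_p - ctr_t) ** 2).sum(dim=1)

    # c^2: squared diagonal of the smallest enclosing box.
    c_wh = torch.max(pred[:, 2:], target[:, 2:]) - torch.min(pred[:, :2], target[:, :2])
    c2 = (c_wh ** 2).sum(dim=1) + eps

    return (1 - iou + rho2 / c2).mean()
```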
Towards Efficient Model Compression via Learned Global Ranking
Title | Towards Efficient Model Compression via Learned Global Ranking |
Authors | Ting-Wu Chin, Ruizhou Ding, Cha Zhang, Diana Marculescu |
Abstract | Pruning convolutional filters has demonstrated its effectiveness in compressing ConvNets. Prior art in filter pruning requires users to specify a target model complexity (e.g., model size or FLOP count) for the resulting architecture. However, determining a target model complexity can be difficult for optimizing various embodied AI applications such as autonomous robots, drones, and user-facing applications. First, both the accuracy and the speed of ConvNets can affect the performance of the application. Second, the performance of the application can be hard to assess without evaluating ConvNets during inference. As a consequence, finding a sweet spot between accuracy and speed via filter pruning, which must be done by trial and error, can be time-consuming. This work takes a first step toward making this process more efficient by altering the goal of model compression to producing a set of ConvNets with various accuracy and latency trade-offs instead of producing one ConvNet targeting some pre-defined latency constraint. To this end, we propose to learn a global ranking of the filters across different layers of the ConvNet, which is used to obtain a set of ConvNet architectures with different accuracy/latency trade-offs by pruning the bottom-ranked filters. Our proposed algorithm, LeGR, is shown to be 2x to 3x faster than prior work while having comparable or better performance when targeting seven pruned ResNet-56 models with different accuracy/FLOPs profiles on the CIFAR-100 dataset. Additionally, we have evaluated LeGR on ImageNet and Bird-200 with ResNet-50 and MobileNetV2 to demonstrate its effectiveness. Code available at https://github.com/cmu-enyac/LeGR. |
Tasks | Model Compression |
Published | 2019-04-28 |
URL | https://arxiv.org/abs/1904.12368v2 |
PDF | https://arxiv.org/pdf/1904.12368v2.pdf |
PWC | https://paperswithcode.com/paper/legr-filter-pruning-via-learned-global |
Repo | https://github.com/cmu-enyac/LeGR |
Framework | pytorch |
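The core object in LeGR is a single ranked list of filters spanning all layers. The sketch below scores every Conv2d filter with a layer-wise affine transform of its squared $\ell_2$-norm and sorts globally; learning the per-layer (alpha, kappa) pairs, which the paper does with an evolutionary search, is omitted, and the dict-based interface is an assumption of this sketch.

```python
import torch.nn as nn

def global_filter_ranking(model, alphas, kappas):
    """Rank all conv filters globally by score = alpha_l * ||w||_2^2 + kappa_l.

    alphas/kappas: dicts mapping layer name -> per-layer affine coefficients
    (learned by evolutionary search in the paper; assumed given here).
    Returns (layer_name, filter_index, score) triples sorted ascending.
    """
    scores = []
    for name, mod in model.named_modules():
        if isinstance(mod, nn.Conv2d):
            norms = mod.weight.detach().flatten(1).pow(2).sum(dim=1)
            scores += [(name, i, alphas[name] * float(n) + kappas[name])
                       for i, n in enumerate(norms)]
    return sorted(scores, key=lambda t: t[2])
```

Pruning the first k entries of the returned list, for increasing k, then yields the set of architectures with different accuracy/FLOPs trade-offs that the abstract describes.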
Once-for-All: Train One Network and Specialize it for Efficient Deployment on Diverse Hardware Platforms
Title | Once-for-All: Train One Network and Specialize it for Efficient Deployment on Diverse Hardware Platforms |
Authors | Han Cai, Chuang Gan, Tianzhe Wang, Zhekai Zhang, Song Han |
Abstract | We address the challenging problem of efficient deep learning model deployment across many devices and diverse constraints, from general-purpose hardware to specialized accelerators. Conventional approaches either manually design or use neural architecture search (NAS) to find a specialized neural network and train it from scratch for each case, which is computationally prohibitive (emitting as much $CO_2$ as five cars over their lifetimes) and thus unscalable. To reduce the cost, our key idea is to decouple model training from architecture search. To this end, we propose to train a once-for-all (OFA) network that supports diverse architectural settings (depth, width, kernel size, and resolution). Given a deployment scenario, we can then quickly get a specialized sub-network by selecting from the OFA network without additional training. To prevent interference between the many sub-networks during training, we also propose a novel progressive shrinking algorithm, which can train a surprisingly large number of sub-networks ($> 10^{19}$) simultaneously. Extensive experiments on various hardware platforms (CPU, GPU, mCPU, mGPU, FPGA accelerator) show that OFA consistently outperforms SOTA NAS methods (up to 4.0% ImageNet top-1 accuracy improvement over MobileNetV3) while reducing GPU hours and $CO_2$ emission by orders of magnitude. In particular, OFA achieves a new SOTA 80.0% ImageNet top-1 accuracy under the mobile setting ($<$600M FLOPs). Code and pre-trained models are released at https://github.com/mit-han-lab/once-for-all. |
Tasks | AutoML, Neural Architecture Search |
Published | 2019-08-26 |
URL | https://arxiv.org/abs/1908.09791v3 |
PDF | https://arxiv.org/pdf/1908.09791v3.pdf |
PWC | https://paperswithcode.com/paper/once-for-all-train-one-network-and-specialize |
Repo | https://github.com/mit-han-lab/once-for-all |
Framework | pytorch |
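One ingredient of the OFA network is elastic kernel size, where a single shared weight serves several kernel sizes. The sketch below approximates it by center-cropping a shared 7x7 kernel; the actual OFA applies a learned linear transformation between kernel sizes, and elastic depth, width and resolution are not shown.

```python
import random
import torch
import torch.nn as nn
import torch.nn.functional as F

class ElasticConv2d(nn.Module):
    """Conv layer runnable at kernel size 7, 5 or 3 from one shared 7x7 weight."""
    def __init__(self, cin, cout):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(cout, cin, 7, 7) * 0.01)

    def forward(self, x, kernel_size=7):
        c = (7 - kernel_size) // 2               # center-crop margin
        w = self.weight if c == 0 else self.weight[:, :, c:-c, c:-c]
        return F.conv2d(x, w, padding=kernel_size // 2)

# Progressive-shrinking-style training step: sample a sub-network per batch.
layer = ElasticConv2d(3, 16)
x = torch.randn(2, 3, 32, 32)
y = layer(x, kernel_size=random.choice([3, 5, 7]))  # always (2, 16, 32, 32)
```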
Transfer Learning for Non-Intrusive Load Monitoring
Title | Transfer Learning for Non-Intrusive Load Monitoring |
Authors | Michele D'Incecco, Stefano Squartini, Mingjun Zhong |
Abstract | Non-intrusive load monitoring (NILM) is a technique to recover the source appliances from only the recorded mains of a household. NILM is unidentifiable and thus a challenging problem, because the power value of an appliance inferred from only the mains may not be unique. To mitigate the unidentifiability problem, various methods incorporating domain knowledge into NILM have been proposed and shown to be effective experimentally. Among these methods, deep neural networks have recently been shown to perform best. Arguably, the recently proposed sequence-to-point (seq2point) learning is promising for NILM. However, previous results were obtained only within the same data domain; it is not clear whether the method could be generalised or transferred to different domains, e.g., when the test data are drawn from a different country than the training data. We address this issue in this paper and propose two transfer learning schemes, i.e., appliance transfer learning (ATL) and cross-domain transfer learning (CTL). For ATL, our results show that the latent features learnt for a 'complex' appliance, e.g., washing machine, can be transferred to a 'simple' appliance, e.g., kettle. For CTL, our conclusion is that seq2point learning is transferable. Precisely, when the training and test data are in a similar domain, seq2point learning can be directly applied to the test data without fine-tuning; when the training and test data are in different domains, seq2point learning needs fine-tuning before being applied to the test data. Interestingly, we show that only the fully connected layers need fine-tuning for transfer learning. Source code can be found at https://github.com/MingjunZhong/transferNILM. |
Tasks | Non-Intrusive Load Monitoring, Transfer Learning |
Published | 2019-02-23 |
URL | https://arxiv.org/abs/1902.08835v3 |
PDF | https://arxiv.org/pdf/1902.08835v3.pdf |
PWC | https://paperswithcode.com/paper/transfer-learning-for-non-intrusive-load |
Repo | https://github.com/MingjunZhong/transferNILM |
Framework | tf |
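The paper's most actionable finding, that only the fully connected layers need fine-tuning for cross-domain transfer, is easy to express in code. Below is a PyTorch sketch; the linked repo is Keras/TensorFlow, and the layer sizes here are illustrative rather than the paper's exact seq2point architecture.

```python
import torch
import torch.nn as nn

class Seq2Point(nn.Module):
    """Mains window in, midpoint appliance power out (illustrative sizes)."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(1, 30, 10, padding=5), nn.ReLU(),
            nn.Conv1d(30, 30, 8, padding=4), nn.ReLU(),
            nn.Conv1d(30, 40, 6, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool1d(50),
        )
        self.fc = nn.Sequential(nn.Flatten(), nn.Linear(40 * 50, 1024),
                                nn.ReLU(), nn.Linear(1024, 1))

    def forward(self, x):                      # x: (batch, 1, window)
        return self.fc(self.conv(x))

def ctl_parameters(model):
    """Cross-domain transfer (CTL): freeze the convolutional feature extractor
    and return only the fully connected parameters for fine-tuning."""
    for p in model.conv.parameters():
        p.requires_grad = False
    return list(model.fc.parameters())

# e.g. optimizer = torch.optim.Adam(ctl_parameters(model), lr=1e-4)
```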