October 20, 2019

1556 words 8 mins read

Paper Group AWR 355

Real-time 2D Multi-Person Pose Estimation on CPU: Lightweight OpenPose. Rethinking the Value of Network Pruning. Hybrid Knowledge Routed Modules for Large-scale Object Detection. ReSIFT: Reliability-Weighted SIFT-based Image Quality Assessment. Syllable-Based Sequence-to-Sequence Speech Recognition with the Transformer in Mandarin Chinese. PointGro …

Real-time 2D Multi-Person Pose Estimation on CPU: Lightweight OpenPose

Title Real-time 2D Multi-Person Pose Estimation on CPU: Lightweight OpenPose
Authors Daniil Osokin
Abstract In this work we adapt the multi-person pose estimation architecture for use on edge devices. We follow the bottom-up approach from OpenPose, the winner of the COCO 2016 Keypoints Challenge, because of its decent quality and robustness to the number of people inside the frame. With the proposed network design and optimized post-processing code, the full solution runs at 28 frames per second (fps) on an Intel® NUC 6i7KYB mini PC and 26 fps on a Core™ i7-6850K CPU. The network model has 4.1M parameters and a complexity of 9 billion floating-point operations (GFLOPs), which is just ~15% of the baseline 2-stage OpenPose with almost the same quality. The code and model are available as a part of the Intel® OpenVINO™ Toolkit.
Tasks Multi-Person Pose Estimation, Pose Estimation
Published 2018-11-29
URL http://arxiv.org/abs/1811.12004v1
PDF http://arxiv.org/pdf/1811.12004v1.pdf
PWC https://paperswithcode.com/paper/real-time-2d-multi-person-pose-estimation-on
Repo https://github.com/murdockhou/lightweight_openpose
Framework tf
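
To make the design idea concrete, here is a minimal PyTorch sketch of a bottom-up, CPU-friendly pose network in the spirit of the abstract: a small depthwise-separable backbone feeding heads for keypoint heatmaps and part affinity fields. All layer sizes and names are illustrative assumptions, not the released OpenVINO model or the linked TensorFlow repo.

```python
import torch
import torch.nn as nn

def dw_conv(in_ch, out_ch, stride=1):
    """Depthwise-separable conv block, the usual trick for cutting FLOPs."""
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch, 3, stride, 1, groups=in_ch, bias=False),
        nn.BatchNorm2d(in_ch), nn.ReLU(inplace=True),
        nn.Conv2d(in_ch, out_ch, 1, bias=False),
        nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
    )

class LightweightPoseNet(nn.Module):
    """Illustrative bottom-up pose net: shared backbone, then two heads
    producing keypoint heatmaps and part affinity fields (PAFs)."""
    def __init__(self, num_keypoints=18, num_pafs=38):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, 2, 1), nn.ReLU(inplace=True),
            dw_conv(32, 64), dw_conv(64, 128, stride=2), dw_conv(128, 128),
        )
        self.heatmap_head = nn.Conv2d(128, num_keypoints, 1)
        self.paf_head = nn.Conv2d(128, num_pafs, 1)

    def forward(self, x):
        feats = self.backbone(x)
        return self.heatmap_head(feats), self.paf_head(feats)

heatmaps, pafs = LightweightPoseNet()(torch.randn(1, 3, 256, 456))
print(heatmaps.shape, pafs.shape)
```

The grouping of keypoints into person instances happens in post-processing over the heatmaps and PAFs; the paper's speedups come from both the slimmer network and an optimized version of that grouping code.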

Rethinking the Value of Network Pruning

Title Rethinking the Value of Network Pruning
Authors Zhuang Liu, Mingjie Sun, Tinghui Zhou, Gao Huang, Trevor Darrell
Abstract Network pruning is widely used for reducing the heavy inference cost of deep models in low-resource settings. A typical pruning algorithm is a three-stage pipeline, i.e., training (a large model), pruning and fine-tuning. During pruning, according to a certain criterion, redundant weights are pruned and important weights are kept to best preserve the accuracy. In this work, we make several surprising observations which contradict common beliefs. For all state-of-the-art structured pruning algorithms we examined, fine-tuning a pruned model only gives comparable or worse performance than training that model with randomly initialized weights. For pruning algorithms which assume a predefined target network architecture, one can get rid of the full pipeline and directly train the target network from scratch. Our observations are consistent for multiple network architectures, datasets, and tasks, which imply that: 1) training a large, over-parameterized model is often not necessary to obtain an efficient final model, 2) learned “important” weights of the large model are typically not useful for the small pruned model, 3) the pruned architecture itself, rather than a set of inherited “important” weights, is more crucial to the efficiency in the final model, which suggests that in some cases pruning can be useful as an architecture search paradigm. Our results suggest the need for more careful baseline evaluations in future research on structured pruning methods. We also compare with the “Lottery Ticket Hypothesis” (Frankle & Carbin 2019), and find that with optimal learning rate, the “winning ticket” initialization as used in Frankle & Carbin (2019) does not bring improvement over random initialization.
Tasks Network Pruning, Neural Architecture Search
Published 2018-10-11
URL http://arxiv.org/abs/1810.05270v2
PDF http://arxiv.org/pdf/1810.05270v2.pdf
PWC https://paperswithcode.com/paper/rethinking-the-value-of-network-pruning
Repo https://github.com/liuzhuang13/slimming
Framework pytorch
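
The paper's central comparison can be sketched in a few lines: derive a slimmer architecture from a trained model (here via per-filter L1 norms, one common structured-pruning criterion), then train that architecture from scratch with random weights instead of fine-tuning the inherited ones. This is a hedged illustration of the experimental setup, not the authors' code; the toy network, the 50% keep ratio, and the omitted training loops are all assumptions.

```python
import torch
import torch.nn as nn

def build_net(widths):
    """Tiny conv net whose per-layer widths we can shrink."""
    layers, in_ch = [], 3
    for w in widths:
        layers += [nn.Conv2d(in_ch, w, 3, padding=1), nn.ReLU(inplace=True)]
        in_ch = w
    return nn.Sequential(*layers, nn.AdaptiveAvgPool2d(1),
                         nn.Flatten(), nn.Linear(in_ch, 10))

# 1) Train a large model (training loop omitted in this sketch).
large = build_net([64, 128, 256])

# 2) Structured pruning: keep e.g. 50% of filters per layer by L1 norm.
#    Per the paper, what mostly matters is the resulting widths, not which
#    specific "important" weights survive.
pruned_widths = []
for m in large.modules():
    if isinstance(m, nn.Conv2d):
        l1 = m.weight.detach().abs().sum(dim=(1, 2, 3))  # per-filter L1 norm
        pruned_widths.append(int(0.5 * len(l1)))

# 3a) Conventional pipeline: carry over the surviving weights and fine-tune.
# 3b) The paper's baseline: train the slim architecture from scratch with
#     random initialization, reported to match or beat (3a).
scratch = build_net(pruned_widths)
print(pruned_widths, sum(p.numel() for p in scratch.parameters()))
```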

Hybrid Knowledge Routed Modules for Large-scale Object Detection

Title Hybrid Knowledge Routed Modules for Large-scale Object Detection
Authors Chenhan Jiang, Hang Xu, Xiaodan Liang, Liang Lin
Abstract The dominant object detection approaches treat the recognition of each region separately and overlook crucial semantic correlations between objects in one scene. This paradigm leads to a substantial performance drop when facing heavy long-tail problems, where very few samples are available for rare classes and plenty of confusing categories exist. We exploit diverse human commonsense knowledge for reasoning over large-scale object categories and reaching semantic coherency within one image. Particularly, we present Hybrid Knowledge Routed Modules (HKRM) that incorporate the reasoning routed by two kinds of knowledge forms: an explicit knowledge module for structured constraints that are summarized with linguistic knowledge (e.g. shared attributes, relationships) about concepts; and an implicit knowledge module that depicts some implicit constraints (e.g. common spatial layouts). By functioning over a region-to-region graph, both modules can be individualized and adapted to coordinate with visual patterns in each image, guided by specific knowledge forms. HKRM is lightweight, general-purpose and extensible, easily incorporating multiple forms of knowledge to endow any detection network with the ability of global semantic reasoning. Experiments on large-scale object detection benchmarks show HKRM obtains around 34.5% improvement on VisualGenome (1000 categories) and 30.4% on ADE in terms of mAP. Code and trained models can be found at https://github.com/chanyn/HKRM.
Tasks Object Detection
Published 2018-10-30
URL http://arxiv.org/abs/1810.12681v1
PDF http://arxiv.org/pdf/1810.12681v1.pdf
PWC https://paperswithcode.com/paper/hybrid-knowledge-routed-modules-for-large
Repo https://github.com/chanyn/HKRM
Framework pytorch
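
The core mechanism, reasoning over a region-to-region graph whose edges come from external knowledge, can be illustrated with a small PyTorch module. This is a hedged sketch of the general idea only: the `knowledge` adjacency, feature dimensions, and the residual update rule are assumptions, not the authors' exact HKRM formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class KnowledgeRoutedModule(nn.Module):
    """Illustrative graph-reasoning step over per-image region features.

    `knowledge` is an [N, N] matrix of pairwise edge weights between the N
    region proposals, e.g. derived from attribute/relationship similarity
    (explicit branch) or predicted from spatial layout (implicit branch).
    """
    def __init__(self, feat_dim=1024, hidden_dim=256):
        super().__init__()
        self.project = nn.Linear(feat_dim, hidden_dim)
        self.update = nn.Linear(hidden_dim, feat_dim)

    def forward(self, region_feats, knowledge):
        adj = F.softmax(knowledge, dim=-1)         # normalize the region graph
        msgs = adj @ self.project(region_feats)    # aggregate neighbor evidence
        # Residual update keeps the original visual features intact.
        return region_feats + self.update(F.relu(msgs))

regions = torch.randn(100, 1024)        # 100 proposals from a detector head
edge_weights = torch.randn(100, 100)    # knowledge-derived pairwise weights
enhanced = KnowledgeRoutedModule()(regions, edge_weights)
print(enhanced.shape)                   # torch.Size([100, 1024])
```

The enhanced region features would then feed the usual classification and box-regression heads, which is what makes the module easy to bolt onto existing detectors.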

ReSIFT: Reliability-Weighted SIFT-based Image Quality Assessment

Title ReSIFT: Reliability-Weighted SIFT-based Image Quality Assessment
Authors Dogancan Temel, Ghassan AlRegib
Abstract This paper presents a full-reference image quality estimator based on SIFT descriptor matching over reliability-weighted feature maps. Reliability assignment includes a smoothing operation, a transformation to a perceptual color domain, a local normalization stage, and a spectral residual computation with global normalization. The proposed method, ReSIFT, is tested on the LIVE and the LIVE Multiply Distorted databases and compared with 11 state-of-the-art full-reference quality estimators. In terms of the Pearson and the Spearman correlation, ReSIFT is the best performing quality estimator over the full databases. Moreover, ReSIFT is the best performing quality estimator in at least one distortion group in each of the compression, noise, and blur categories.
Tasks Image Quality Assessment
Published 2018-11-14
URL http://arxiv.org/abs/1811.06090v1
PDF http://arxiv.org/pdf/1811.06090v1.pdf
PWC https://paperswithcode.com/paper/resift-reliability-weighted-sift-based-image
Repo https://github.com/olivesgatech/ReSIFT
Framework none
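
As a rough illustration of the recipe described in the abstract, the sketch below matches SIFT descriptors between a reference and a distorted image and weights the surviving matches by a reliability map. OpenCV's spectral-residual saliency stands in for the paper's multi-stage reliability assignment, and the final normalization is a guess; this is not the authors' MATLAB pipeline. It assumes `opencv-contrib-python` (for `cv2.saliency`) and OpenCV >= 4.4 (for `cv2.SIFT_create`).

```python
import cv2

def resift_like_score(ref_path, dist_path, ratio=0.75):
    """Illustrative reliability-weighted SIFT matching score."""
    ref = cv2.imread(ref_path, cv2.IMREAD_GRAYSCALE)
    dist = cv2.imread(dist_path, cv2.IMREAD_GRAYSCALE)

    # Spectral-residual saliency as a stand-in reliability map (opencv-contrib).
    sal = cv2.saliency.StaticSaliencySpectralResidual_create()
    _, reliability = sal.computeSaliency(ref)

    sift = cv2.SIFT_create()
    kp_r, des_r = sift.detectAndCompute(ref, None)
    kp_d, des_d = sift.detectAndCompute(dist, None)
    if des_r is None or des_d is None:
        return 0.0

    # Lowe ratio test keeps only confident descriptor matches.
    pairs = cv2.BFMatcher().knnMatch(des_r, des_d, k=2)
    good = [p[0] for p in pairs
            if len(p) == 2 and p[0].distance < ratio * p[1].distance]

    # Weight each surviving match by reliability at its reference keypoint.
    score = 0.0
    for m in good:
        x, y = kp_r[m.queryIdx].pt
        score += float(reliability[int(y), int(x)])
    return score / max(len(kp_r), 1)
```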

Syllable-Based Sequence-to-Sequence Speech Recognition with the Transformer in Mandarin Chinese

Title Syllable-Based Sequence-to-Sequence Speech Recognition with the Transformer in Mandarin Chinese
Authors Shiyu Zhou, Linhao Dong, Shuang Xu, Bo Xu
Abstract Sequence-to-sequence attention-based models have recently shown very promising results on automatic speech recognition (ASR) tasks, integrating the acoustic, pronunciation and language models into a single neural network. Among these models, the Transformer, a sequence-to-sequence attention-based model relying entirely on self-attention without using RNNs or convolutions, achieves a new single-model state-of-the-art BLEU on neural machine translation (NMT) tasks. Given the outstanding performance of the Transformer, we extend it to speech and adopt it as the basic architecture of our sequence-to-sequence attention-based model for Mandarin Chinese ASR tasks. Furthermore, we compare a syllable-based model against a context-independent phoneme (CI-phoneme) based model with the Transformer in Mandarin Chinese. Additionally, a greedy cascading decoder with the Transformer is proposed for mapping CI-phoneme sequences and syllable sequences into word sequences. Experiments on the HKUST dataset demonstrate that the syllable-based model with the Transformer performs better than its CI-phoneme based counterpart and achieves a character error rate (CER) of 28.77%, which is competitive with the state-of-the-art CER of 28.0% obtained by a joint CTC-attention based encoder-decoder network.
Tasks Language Modelling, Machine Translation, Sequence-To-Sequence Speech Recognition, Speech Recognition
Published 2018-04-28
URL http://arxiv.org/abs/1804.10752v2
PDF http://arxiv.org/pdf/1804.10752v2.pdf
PWC https://paperswithcode.com/paper/syllable-based-sequence-to-sequence-speech
Repo https://github.com/gentaiscool/end2end-asr-pytorch
Framework pytorch
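
The basic architecture, acoustic frames into a Transformer encoder and syllable tokens out of an autoregressive Transformer decoder, can be sketched with `torch.nn.Transformer`. All hyperparameters, the vocabulary size, and the omission of positional encodings and the CTC/decoding details are assumptions for illustration (and `batch_first` needs a reasonably recent PyTorch); this is not the linked repo's model.

```python
import torch
import torch.nn as nn

class SyllableTransformerASR(nn.Module):
    """Illustrative seq2seq ASR: filterbank frames in, syllable tokens out.
    Positional encodings are omitted for brevity."""
    def __init__(self, n_mels=80, vocab_size=1500, d_model=256):
        super().__init__()
        self.frame_proj = nn.Linear(n_mels, d_model)        # acoustic frames -> model dim
        self.syl_embed = nn.Embedding(vocab_size, d_model)   # syllable token embeddings
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=4,
            num_encoder_layers=6, num_decoder_layers=6, batch_first=True)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, feats, syllables):
        # Causal mask so the decoder only attends to already-emitted syllables.
        tgt_mask = self.transformer.generate_square_subsequent_mask(syllables.size(1))
        hidden = self.transformer(self.frame_proj(feats),
                                  self.syl_embed(syllables),
                                  tgt_mask=tgt_mask)
        return self.out(hidden)

model = SyllableTransformerASR()
logits = model(torch.randn(2, 300, 80),            # 2 utterances, 300 frames of 80-dim fbank
               torch.randint(0, 1500, (2, 20)))    # 20 syllable tokens each
print(logits.shape)                                 # torch.Size([2, 20, 1500])
```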

PointGrow: Autoregressively Learned Point Cloud Generation with Self-Attention

Title PointGrow: Autoregressively Learned Point Cloud Generation with Self-Attention
Authors Yongbin Sun, Yue Wang, Ziwei Liu, Joshua E. Siegel, Sanjay E. Sarma
Abstract Generating 3D point clouds is challenging yet highly desired. This work presents a novel autoregressive model, PointGrow, which can generate diverse and realistic point cloud samples from scratch or conditioned on semantic contexts. This model operates recurrently, with each point sampled according to a conditional distribution given its previously-generated points, allowing inter-point correlations to be well-exploited and 3D shape generative processes to be better interpreted. Since point cloud object shapes are typically encoded by long-range dependencies, we augment our model with dedicated self-attention modules to capture such relations. Extensive evaluations show that PointGrow achieves satisfying performance on both unconditional and conditional point cloud generation tasks, with respect to realism and diversity. Several important applications, such as unsupervised feature learning and shape arithmetic operations, are also demonstrated.
Tasks Generating 3D Point Clouds, Point Cloud Generation
Published 2018-10-12
URL https://arxiv.org/abs/1810.05591v3
PDF https://arxiv.org/pdf/1810.05591v3.pdf
PWC https://paperswithcode.com/paper/pointgrow-autoregressively-learned-point
Repo https://github.com/syb7573330/PointGrow
Framework tf
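
The autoregressive idea, each new point predicted from self-attention over the points generated so far, can be illustrated with a short PyTorch sketch. Note the paper discretizes coordinates and samples each one from a learned conditional distribution; the deterministic regression head below is a simplification, and the class and parameter names are hypothetical.

```python
import torch
import torch.nn as nn

class PointGrowSketch(nn.Module):
    """Illustrative autoregressive point-cloud generator: attend over the
    points emitted so far, then predict the next point's (x, y, z)."""
    def __init__(self, d_model=128, n_heads=4):
        super().__init__()
        self.embed = nn.Linear(3, d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # The paper instead models a conditional distribution per coordinate.
        self.next_point = nn.Linear(d_model, 3)

    @torch.no_grad()
    def generate(self, n_points=1024):
        pts = torch.zeros(1, 1, 3)                # seed point at the origin
        for _ in range(n_points - 1):
            h = self.embed(pts)
            ctx, _ = self.attn(h, h, h)           # self-attention over previous points
            nxt = self.next_point(ctx[:, -1:])    # context summary -> next point
            pts = torch.cat([pts, nxt], dim=1)
        return pts

cloud = PointGrowSketch().generate(256)
print(cloud.shape)   # torch.Size([1, 256, 3])
```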

Anomaly Generation using Generative Adversarial Networks in Host Based Intrusion Detection

Title Anomaly Generation using Generative Adversarial Networks in Host Based Intrusion Detection
Authors Milad Salem, Shayan Taheri, Jiann Shiun Yuan
Abstract Generative adversarial networks have been able to generate striking results in various domains. This generation capability can be general-purpose, as the networks gain a deep understanding of the data distribution. In many domains the data distribution consists of anomalies and normal data, with anomalies occurring comparatively rarely, which creates imbalanced datasets. The capabilities that generative adversarial networks offer can be leveraged to examine these anomalies and help alleviate the challenge that imbalanced datasets pose by creating synthetic anomalies. This anomaly generation is especially beneficial in domains that have costly data creation processes as well as inherently imbalanced datasets. One domain that fits this description is host-based intrusion detection. In this work, the ADFA-LD dataset, which contains system calls of small-footprint next-generation attacks, is chosen as the dataset of interest. The data is first converted into images, and a Cycle-GAN is then used to create images of anomalous data from images of normal data. The generated data is combined with the original dataset and used to train a model to detect anomalies. By doing so, it is shown that the classification results are improved, with the AUC rising from 0.55 to 0.71 and the anomaly detection rate rising from 17.07% to 80.49%. The results are also compared to SMOTE, showing the potential presented by generative adversarial networks for anomaly generation.
Tasks Anomaly Detection, Intrusion Detection
Published 2018-12-11
URL http://arxiv.org/abs/1812.04697v1
PDF http://arxiv.org/pdf/1812.04697v1.pdf
PWC https://paperswithcode.com/paper/anomaly-generation-using-generative
Repo https://github.com/shayan-taheri/My_Publications
Framework none
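
The augmentation pipeline described above can be sketched in two small helpers: encode a system-call trace as an image, then use a trained normal-to-anomaly generator to rebalance the training set. The trace encoding, the `generator` (assumed to be the already-trained Cycle-GAN normal-to-anomaly mapping), and all names here are hypothetical placeholders; ADFA-LD preprocessing and GAN training are omitted.

```python
import numpy as np
import torch

def trace_to_image(syscall_ids, side=32):
    """Pad/trim a system-call ID sequence into a square grayscale 'image'."""
    flat = np.zeros(side * side, dtype=np.float32)
    ids = np.asarray(syscall_ids[: side * side], dtype=np.float32)
    if len(ids):
        flat[: len(ids)] = ids / max(float(ids.max()), 1.0)
    return flat.reshape(1, side, side)   # channels-first for the generator

def augment_with_synthetic_anomalies(normal_images, anomaly_images, generator):
    """Rebalance the dataset with synthetic anomalies from `generator`,
    assumed to be the trained normal->anomaly Cycle-GAN generator."""
    with torch.no_grad():
        synthetic = generator(torch.as_tensor(np.stack(normal_images)))
    X = np.concatenate([np.stack(normal_images), np.stack(anomaly_images),
                        synthetic.numpy()])
    y = np.concatenate([np.zeros(len(normal_images)),
                        np.ones(len(anomaly_images) + len(synthetic))])
    return X, y   # train any anomaly detector on the rebalanced (X, y)
```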