Paper Group AWR 355
Real-time 2D Multi-Person Pose Estimation on CPU: Lightweight OpenPose
Title | Real-time 2D Multi-Person Pose Estimation on CPU: Lightweight OpenPose |
Authors | Daniil Osokin |
Abstract | In this work we adapt multi-person pose estimation architecture to use it on edge devices. We follow the bottom-up approach from OpenPose, the winner of COCO 2016 Keypoints Challenge, because of its decent quality and robustness to number of people inside the frame. With proposed network design and optimized post-processing code the full solution runs at 28 frames per second (fps) on Intel® NUC 6i7KYB mini PC and 26 fps on Core™ i7-6850K CPU. The network model has 4.1M parameters and a complexity of 9 billion floating-point operations (9 GFLOPs), which is just ~15% of the baseline 2-stage OpenPose with almost the same quality. The code and model are available as a part of Intel® OpenVINO™ Toolkit. |
Tasks | Multi-Person Pose Estimation, Pose Estimation |
Published | 2018-11-29 |
URL | http://arxiv.org/abs/1811.12004v1, http://arxiv.org/pdf/1811.12004v1.pdf |
PWC | https://paperswithcode.com/paper/real-time-2d-multi-person-pose-estimation-on |
Repo | https://github.com/murdockhou/lightweight_openpose |
Framework | tf |
Rethinking the Value of Network Pruning
Title | Rethinking the Value of Network Pruning |
Authors | Zhuang Liu, Mingjie Sun, Tinghui Zhou, Gao Huang, Trevor Darrell |
Abstract | Network pruning is widely used for reducing the heavy inference cost of deep models in low-resource settings. A typical pruning algorithm is a three-stage pipeline, i.e., training (a large model), pruning and fine-tuning. During pruning, according to a certain criterion, redundant weights are pruned and important weights are kept to best preserve the accuracy. In this work, we make several surprising observations which contradict common beliefs. For all state-of-the-art structured pruning algorithms we examined, fine-tuning a pruned model only gives comparable or worse performance than training that model with randomly initialized weights. For pruning algorithms which assume a predefined target network architecture, one can get rid of the full pipeline and directly train the target network from scratch. Our observations are consistent for multiple network architectures, datasets, and tasks, which imply that: 1) training a large, over-parameterized model is often not necessary to obtain an efficient final model, 2) learned “important” weights of the large model are typically not useful for the small pruned model, 3) the pruned architecture itself, rather than a set of inherited “important” weights, is more crucial to the efficiency in the final model, which suggests that in some cases pruning can be useful as an architecture search paradigm. Our results suggest the need for more careful baseline evaluations in future research on structured pruning methods. We also compare with the “Lottery Ticket Hypothesis” (Frankle & Carbin 2019), and find that with optimal learning rate, the “winning ticket” initialization as used in Frankle & Carbin (2019) does not bring improvement over random initialization. |
Tasks | Network Pruning, Neural Architecture Search |
Published | 2018-10-11 |
URL | http://arxiv.org/abs/1810.05270v2, http://arxiv.org/pdf/1810.05270v2.pdf |
PWC | https://paperswithcode.com/paper/rethinking-the-value-of-network-pruning |
Repo | https://github.com/liuzhuang13/slimming |
Framework | pytorch |
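
The abstract above contrasts two ways of obtaining a small model: inheriting the "important" weights selected by a pruning criterion and fine-tuning them, or keeping only the pruned architecture and training it from random initialization. The following is a minimal sketch of that distinction. It assumes L1-norm filter ranking as the criterion, which is one common structured-pruning heuristic and not necessarily one the paper examined; the function names and toy weights are hypothetical.

```python
# Illustrative sketch only. L1-norm ranking is an assumed criterion, and the
# toy weights below are made up; none of this is taken from the paper's code.

def l1_norm(filter_weights):
    """Sum of absolute values of one (flattened) filter's weights."""
    return sum(abs(w) for w in filter_weights)


def select_channels(layer_filters, keep_ratio):
    """Return the sorted indices of the filters with the largest L1 norms."""
    n_keep = max(1, round(len(layer_filters) * keep_ratio))
    ranked = sorted(
        range(len(layer_filters)),
        key=lambda i: l1_norm(layer_filters[i]),
        reverse=True,
    )
    return sorted(ranked[:n_keep])


def pruned_architecture(layers, keep_ratio):
    """Channel count per layer after pruning: the architecture, without weights."""
    return [len(select_channels(filters, keep_ratio)) for filters in layers]


# Toy "network": two layers, each a list of flattened filters.
layers = [
    [[0.9, -0.8], [0.01, 0.02], [0.5, 0.4], [-0.03, 0.0]],
    [[0.1, 0.1], [1.0, -1.0]],
]

# Fine-tuning pipeline: these filters (and their trained values) are inherited.
kept = [select_channels(filters, keep_ratio=0.5) for filters in layers]

# From-scratch baseline: only these layer widths are reused; weights are
# re-initialized randomly before training.
widths = pruned_architecture(layers, keep_ratio=0.5)

print(kept)
print(widths)
```

In this framing, the paper's claim is about which of the two outputs matters: the per-layer widths, rather than the inherited filter values.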
Hybrid Knowledge Routed Modules for Large-scale Object Detection
Title | Hybrid Knowledge Routed Modules for Large-scale Object Detection |
Authors | Chenhan Jiang, Hang Xu, Xiaodan Liang, Liang Lin |
Abstract | The dominant object detection approaches treat the recognition of each region separately and overlook crucial semantic correlations between objects in one scene. This paradigm leads to a substantial performance drop when facing heavy long-tail problems, where very few samples are available for rare classes and plenty of confusing categories exist. We exploit diverse human commonsense knowledge for reasoning over large-scale object categories and reaching semantic coherency within one image. Particularly, we present Hybrid Knowledge Routed Modules (HKRM) that incorporate the reasoning routed by two kinds of knowledge forms: an explicit knowledge module for structured constraints that are summarized with linguistic knowledge (e.g. shared attributes, relationships) about concepts; and an implicit knowledge module that depicts some implicit constraints (e.g. common spatial layouts). By functioning over a region-to-region graph, both modules can be individualized and adapted to coordinate with visual patterns in each image, guided by specific knowledge forms. HKRM are light-weight, general-purpose and extensible by easily incorporating multiple knowledge to endow any detection network with the ability of global semantic reasoning. Experiments on large-scale object detection benchmarks show HKRM obtains around 34.5% improvement on VisualGenome (1000 categories) and 30.4% on ADE in terms of mAP. Code and trained models can be found at https://github.com/chanyn/HKRM. |
Tasks | Object Detection |
Published | 2018-10-30 |
URL | http://arxiv.org/abs/1810.12681v1, http://arxiv.org/pdf/1810.12681v1.pdf |
PWC | https://paperswithcode.com/paper/hybrid-knowledge-routed-modules-for-large |
Repo | https://github.com/chanyn/HKRM |
Framework | pytorch |
ReSIFT: Reliability-Weighted SIFT-based Image Quality Assessment
Title | ReSIFT: Reliability-Weighted SIFT-based Image Quality Assessment |
Authors | Dogancan Temel, Ghassan AlRegib |
Abstract | This paper presents a full-reference image quality estimator based on SIFT descriptor matching over reliability-weighted feature maps. Reliability assignment includes a smoothing operation, a transformation to perceptual color domain, a local normalization stage, and a spectral residual computation with global normalization. The proposed method ReSIFT is tested on the LIVE and the LIVE Multiply Distorted databases and compared with 11 state-of-the-art full-reference quality estimators. In terms of the Pearson and the Spearman correlation, ReSIFT is the best performing quality estimator in the overall databases. Moreover, ReSIFT is the best performing quality estimator in at least one distortion group in compression, noise, and blur category. |
Tasks | Image Quality Assessment |
Published | 2018-11-14 |
URL | http://arxiv.org/abs/1811.06090v1, http://arxiv.org/pdf/1811.06090v1.pdf |
PWC | https://paperswithcode.com/paper/resift-reliability-weighted-sift-based-image |
Repo | https://github.com/olivesgatech/ReSIFT |
Framework | none |
Syllable-Based Sequence-to-Sequence Speech Recognition with the Transformer in Mandarin Chinese
Title | Syllable-Based Sequence-to-Sequence Speech Recognition with the Transformer in Mandarin Chinese |
Authors | Shiyu Zhou, Linhao Dong, Shuang Xu, Bo Xu |
Abstract | Sequence-to-sequence attention-based models have recently shown very promising results on automatic speech recognition (ASR) tasks, which integrate an acoustic, pronunciation and language model into a single neural network. In these models, the Transformer, a new sequence-to-sequence attention-based model relying entirely on self-attention without using RNNs or convolutions, achieves a new single-model state-of-the-art BLEU on neural machine translation (NMT) tasks. Given the outstanding performance of the Transformer, we extend it to speech and concentrate on it as the basic architecture of sequence-to-sequence attention-based model on Mandarin Chinese ASR tasks. Furthermore, we investigate a comparison between syllable based model and context-independent phoneme (CI-phoneme) based model with the Transformer in Mandarin Chinese. Additionally, a greedy cascading decoder with the Transformer is proposed for mapping CI-phoneme sequences and syllable sequences into word sequences. Experiments on HKUST datasets demonstrate that syllable based model with the Transformer performs better than CI-phoneme based counterpart, and achieves a character error rate (CER) of 28.77%, which is competitive with the state-of-the-art CER of 28.0% by the joint CTC-attention based encoder-decoder network. |
Tasks | Language Modelling, Machine Translation, Sequence-To-Sequence Speech Recognition, Speech Recognition |
Published | 2018-04-28 |
URL | http://arxiv.org/abs/1804.10752v2, http://arxiv.org/pdf/1804.10752v2.pdf |
PWC | https://paperswithcode.com/paper/syllable-based-sequence-to-sequence-speech |
Repo | https://github.com/gentaiscool/end2end-asr-pytorch |
Framework | pytorch |
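
The results above are reported as character error rate (CER). As a point of reference, CER is conventionally computed as the character-level edit distance between the reference transcript and the hypothesis, divided by the reference length. The sketch below assumes that standard definition; the paper's exact scoring setup (text normalization, handling of non-Chinese tokens) is not stated in the abstract, and the example sentences are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences, using a rolling DP row."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference character
                curr[j - 1] + 1,     # insertion of a hypothesis character
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical example: one substituted character and one deleted character.
reference = "今天天气很好"
hypothesis = "今天天汽好"
errors = edit_distance(reference, hypothesis)
rate = cer(reference, hypothesis)
print(errors, f"{rate:.2%}")
```

Because the denominator is the reference length, CER can exceed 100% when the hypothesis contains many insertions.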
PointGrow: Autoregressively Learned Point Cloud Generation with Self-Attention
Title | PointGrow: Autoregressively Learned Point Cloud Generation with Self-Attention |
Authors | Yongbin Sun, Yue Wang, Ziwei Liu, Joshua E. Siegel, Sanjay E. Sarma |
Abstract | Generating 3D point clouds is challenging yet highly desired. This work presents a novel autoregressive model, PointGrow, which can generate diverse and realistic point cloud samples from scratch or conditioned on semantic contexts. This model operates recurrently, with each point sampled according to a conditional distribution given its previously-generated points, allowing inter-point correlations to be well-exploited and 3D shape generative processes to be better interpreted. Since point cloud object shapes are typically encoded by long-range dependencies, we augment our model with dedicated self-attention modules to capture such relations. Extensive evaluations show that PointGrow achieves satisfying performance on both unconditional and conditional point cloud generation tasks, with respect to realism and diversity. Several important applications, such as unsupervised feature learning and shape arithmetic operations, are also demonstrated. |
Tasks | Generating 3D Point Clouds, Point Cloud Generation |
Published | 2018-10-12 |
URL | https://arxiv.org/abs/1810.05591v3, https://arxiv.org/pdf/1810.05591v3.pdf |
PWC | https://paperswithcode.com/paper/pointgrow-autoregressively-learned-point |
Repo | https://github.com/syb7573330/PointGrow |
Framework | tf |
Anomaly Generation using Generative Adversarial Networks in Host Based Intrusion Detection
Title | Anomaly Generation using Generative Adversarial Networks in Host Based Intrusion Detection |
Authors | Milad Salem, Shayan Taheri, Jiann Shiun Yuan |
Abstract | Generative adversarial networks have been able to generate striking results in various domains. This generation capability can be general while the networks gain deep understanding regarding the data distribution. In many domains, this data distribution consists of anomalies and normal data, with the anomalies commonly occurring relatively less, creating datasets that are imbalanced. The capabilities that generative adversarial networks offer can be leveraged to examine these anomalies and help alleviate the challenge that imbalanced datasets pose by creating synthetic anomalies. This anomaly generation can be specifically beneficial in domains that have costly data creation processes as well as inherently imbalanced datasets. One of the domains that fits this description is the host-based intrusion detection domain. In this work, the ADFA-LD dataset is chosen as the dataset of interest, containing system calls of small foot-print next generation attacks. The data is first converted into images, and then a Cycle-GAN is used to create images of anomalous data from images of normal data. The generated data is combined with the original dataset and is used to train a model to detect anomalies. By doing so, it is shown that the classification results are improved, with the AUC rising from 0.55 to 0.71, and the anomaly detection rate rising from 17.07% to 80.49%. The results are also compared to SMOTE, showing the potential presented by generative adversarial networks in anomaly generation. |
Tasks | Anomaly Detection, Intrusion Detection |
Published | 2018-12-11 |
URL | http://arxiv.org/abs/1812.04697v1, http://arxiv.org/pdf/1812.04697v1.pdf |
PWC | https://paperswithcode.com/paper/anomaly-generation-using-generative |
Repo | https://github.com/shayan-taheri/My_Publications |
Framework | none |
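
The abstract above reports two metrics, AUC and anomaly detection rate. The sketch below shows how such metrics are commonly computed from classifier scores. It assumes AUC means area under the ROC curve (computed here through its pairwise-ranking interpretation) and that "anomaly detection rate" means the fraction of true anomalies flagged at a fixed threshold; the abstract does not spell out either definition, and the labels, scores, and threshold are hypothetical.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a random anomaly (label 1) scores
    higher than a random normal sample (label 0); ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one anomaly and one normal sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def detection_rate(labels, scores, threshold):
    """Fraction of true anomalies whose score reaches the threshold."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    if not pos:
        raise ValueError("need at least one anomaly")
    return sum(1 for s in pos if s >= threshold) / len(pos)


# Hypothetical anomaly scores for six traces (1 = attack, 0 = normal).
labels = [1, 1, 0, 0, 0, 1]
scores = [0.9, 0.4, 0.5, 0.2, 0.4, 0.7]

auc = roc_auc(labels, scores)
rate = detection_rate(labels, scores, threshold=0.5)
print(f"AUC={auc:.3f}, detection rate={rate:.2%}")
```

The two numbers answer different questions: AUC is threshold-free, while the detection rate depends on the chosen operating point, which is why a model can improve sharply on one and only moderately on the other.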