Paper Group ANR 265
Combined convolutional and recurrent neural networks for hierarchical classification of images. Glottal Closure Instants Detection From Pathological Acoustic Speech Signal Using Deep Learning. Learning Cross-Modal Deep Embeddings for Multi-Object Image Retrieval using Text and Sketch. Learning automata based SVM for intrusion detection. Multi-Agent …
Combined convolutional and recurrent neural networks for hierarchical classification of images
Title | Combined convolutional and recurrent neural networks for hierarchical classification of images |
Authors | Jaehoon Koo, Diego Klabjan, Jean Utke |
Abstract | Deep learning models based on CNNs are predominantly used in image classification tasks. Such approaches, assuming independence of object categories, normally use a CNN as a feature learner and apply a flat classifier on top of it. Object classes in many settings have hierarchical relations, and classifiers exploiting these relations should perform better. We propose hierarchical classification models combining a CNN to extract hierarchical representations of images, and an RNN or sequence-to-sequence model to capture a hierarchical tree of classes. In addition, we apply residual learning to the RNN part in order to facilitate training our compound model and improve generalization of the model. Experimental results on a real world proprietary dataset of images show that our hierarchical networks perform better than state-of-the-art CNNs. |
Tasks | Image Classification |
Published | 2018-09-25 |
URL | https://arxiv.org/abs/1809.09574v3 |
PDF | https://arxiv.org/pdf/1809.09574v3.pdf |
PWC | https://paperswithcode.com/paper/combined-convolutional-and-recurrent-neural |
Repo | |
Framework | |
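The abstract treats the class tree as a sequence for the RNN to predict. Here is a minimal sketch of turning a leaf label into the coarse-to-fine target sequence such a decoder would emit. It assumes the hierarchy is stored as a hypothetical child-to-parent map, which is not the authors' data format, and the class names are illustrative only.

```python
from typing import Dict, List, Optional


def label_path(leaf: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Return the root-to-leaf class sequence for `leaf`.

    `parent` maps each class to its parent class (None marks a root).
    """
    path: List[str] = []
    node: Optional[str] = leaf
    seen = set()
    while node is not None:
        if node in seen:  # guard against malformed (cyclic) hierarchies
            raise ValueError(f"cycle detected at {node!r}")
        seen.add(node)
        path.append(node)
        node = parent[node]
    path.reverse()  # coarse (root) first, fine (leaf) last
    return path


# Hypothetical three-level hierarchy, for illustration only.
PARENT = {
    "furniture": None,
    "seating": "furniture",
    "chair": "seating",
    "sofa": "seating",
}

print(label_path("chair", PARENT))  # ['furniture', 'seating', 'chair']
```

Each path can then serve as the target sequence for a sequence decoder that runs on top of the CNN features.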
Glottal Closure Instants Detection From Pathological Acoustic Speech Signal Using Deep Learning
Title | Glottal Closure Instants Detection From Pathological Acoustic Speech Signal Using Deep Learning |
Authors | Gurunath Reddy M, Tanumay Mandal, Krothapalli Sreenivasa Rao |
Abstract | In this paper, we propose a classification-based method for detecting glottal closure instants (GCIs) from pathological acoustic speech signals, which has many applications in vocal disorder analysis. To date, GCIs for pathological disorders have been extracted from the laryngeal (glottal source) signal recorded by an electroglottograph, a dedicated device designed to measure vocal fold vibration around the larynx. We have created a pathological dataset consisting of simultaneous recordings of the glottal source and acoustic speech signals for six different disorders from patients with vocal disorders. The GCI locations are manually annotated for disorder analysis and supervised learning. We propose a convolutional neural network based GCI detection method that fuses deep acoustic speech features and linear prediction residual features for robust GCI detection. The experimental results show that the proposed method is significantly better than state-of-the-art GCI detection methods. |
Tasks | |
Published | 2018-11-25 |
URL | http://arxiv.org/abs/1811.09956v1 |
PDF | http://arxiv.org/pdf/1811.09956v1.pdf |
PWC | https://paperswithcode.com/paper/glottal-closure-instants-detection-from |
Repo | |
Framework | |
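The linear prediction (LP) residual used as an input feature above is what remains after inverse-filtering a signal with its own linear predictor. In voiced speech it tends to peak near glottal closures. Below is a minimal NumPy sketch under several assumptions. It uses the autocorrelation method with Levinson-Durbin recursion over a whole signal, whereas real systems work frame by frame, usually with windowing and pre-emphasis. It also uses a synthetic AR(2) signal in place of speech. The CNN and the feature fusion are not shown.

```python
import numpy as np


def lpc(x: np.ndarray, order: int) -> np.ndarray:
    """LPC polynomial a[0..order] with a[0] = 1 (autocorrelation method, Levinson-Durbin)."""
    n = len(x)
    r = np.array([np.dot(x[: n - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i.
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a


def lp_residual(x: np.ndarray, order: int):
    """Inverse-filter x with A(z): e[n] = sum_j a[j] * x[n - j]."""
    a = lpc(x, order)
    return np.convolve(x, a)[: len(x)], a


# Synthetic AR(2) signal standing in for speech (illustrative only).
rng = np.random.default_rng(0)
w = rng.standard_normal(20_000)
x = np.zeros_like(w)
for n in range(len(w)):
    x[n] = w[n]
    if n >= 1:
        x[n] += 1.3 * x[n - 1]
    if n >= 2:
        x[n] -= 0.6 * x[n - 2]

residual, a = lp_residual(x, order=2)
```

Because the synthetic signal is generated by a known predictor, the estimated `a` should come out close to `[1, -1.3, 0.6]`, and the residual should be close to the white driving noise `w`.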
Learning Cross-Modal Deep Embeddings for Multi-Object Image Retrieval using Text and Sketch
Title | Learning Cross-Modal Deep Embeddings for Multi-Object Image Retrieval using Text and Sketch |
Authors | Sounak Dey, Anjan Dutta, Suman K. Ghosh, Ernest Valveny, Josep Lladós, Umapada Pal |
Abstract | In this work we introduce a cross-modal image retrieval system that allows both text and sketch as input modalities for the query. A cross-modal deep network architecture is formulated to jointly model the sketch and text input modalities as well as the image output modality, learning a common embedding between text and images and between sketches and images. In addition, an attention model is used to selectively focus on the different objects in the image, allowing retrieval with multiple objects in the query. Experiments show that the proposed method performs best in both single- and multiple-object image retrieval on standard datasets. |
Tasks | Image Retrieval |
Published | 2018-04-28 |
URL | http://arxiv.org/abs/1804.10819v1 |
PDF | http://arxiv.org/pdf/1804.10819v1.pdf |
PWC | https://paperswithcode.com/paper/learning-cross-modal-deep-embeddings-for |
Repo | |
Framework | |
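Once the text, sketch, and image encoders map into a common embedding space, retrieval reduces to nearest-neighbour search in that space. The sketch below assumes cosine similarity as the matching score and uses hand-made toy vectors in place of learned embeddings. It does not reproduce the paper's exact scoring or its attention model.

```python
import numpy as np


def l2_normalize(v: np.ndarray, axis: int = -1, eps: float = 1e-12) -> np.ndarray:
    """Scale vectors to unit length so dot products become cosine similarities."""
    return v / np.maximum(np.linalg.norm(v, axis=axis, keepdims=True), eps)


def retrieve(query_emb: np.ndarray, image_embs: np.ndarray, top_k: int = 5):
    """Rank database images by cosine similarity to a (text or sketch) query embedding."""
    q = l2_normalize(query_emb)
    db = l2_normalize(image_embs)
    scores = db @ q
    order = np.argsort(-scores)[:top_k]
    return order, scores[order]


# Toy 3-d "embeddings" (hypothetical values, not produced by any trained model).
images = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.7, 0.7, 0.0]])
query = np.array([1.0, 0.1, 0.0])
order, scores = retrieve(query, images, top_k=3)
print(order.tolist())  # [0, 2, 1]
```

Normalizing both sides lets one matrix-vector product score the whole database. In practice an approximate nearest-neighbour index would replace the exhaustive product for large collections.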
Learning automata based SVM for intrusion detection
Title | Learning automata based SVM for intrusion detection |
Authors | Chong Di |
Abstract | As an indispensable defensive measure for network security, intrusion detection is the process of monitoring the events occurring in a computer system or network and analyzing them for signs of possible incidents. It acts as a classifier that judges whether an event is normal or malicious. The information used for intrusion detection contains some redundant features, which increase the difficulty of training the classifier and the time needed to make predictions. To simplify the training process and improve the efficiency of the classifier, it is necessary to remove these dispensable features. In this paper, we propose a novel LA-SVM scheme that automatically removes redundant features for intrusion detection. This is the first application of learning automata to dimension reduction problems. The simulation results indicate that the LA-SVM scheme achieves higher accuracy and is more efficient in making predictions than a traditional SVM. |
Tasks | Dimensionality Reduction, Intrusion Detection |
Published | 2018-01-04 |
URL | http://arxiv.org/abs/1801.01314v1 |
PDF | http://arxiv.org/pdf/1801.01314v1.pdf |
PWC | https://paperswithcode.com/paper/learning-automata-based-svm-for-intrusion |
Repo | |
Framework | |
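One way to use learning automata for feature selection is to give each feature a two-action automaton that chooses between keeping and dropping it, and to reinforce the choices that score well. The sketch below uses the classic linear reward-inaction (L_R-I) update. It is not necessarily the LA-SVM paper's exact scheme. The reward rule (score at least a running average) is an assumption, and `toy_score` is a hypothetical stand-in for the validation accuracy of an SVM trained on the kept features.

```python
from typing import Callable

import numpy as np


def la_feature_selection(evaluate: Callable[[np.ndarray], float], n_features: int,
                         iters: int = 3000, lr: float = 0.02, seed: int = 0) -> np.ndarray:
    """Per-feature two-action learning automata trained with the L_R-I rule.

    evaluate(mask) returns a score for the boolean keep-mask (higher is better).
    Returns each automaton's probability of choosing "keep".
    """
    rng = np.random.default_rng(seed)
    p_keep = np.full(n_features, 0.5)  # P(action = keep) for each automaton
    baseline = None                    # running average of past scores (assumed reward rule)
    for _ in range(iters):
        mask = rng.random(n_features) < p_keep  # every automaton picks keep/drop
        score = evaluate(mask)
        if baseline is None or score >= baseline:
            # Reward: move each automaton toward the action it just chose.
            p_keep += lr * (mask.astype(float) - p_keep)
        # Penalty: inaction, so probabilities stay unchanged.
        baseline = score if baseline is None else 0.9 * baseline + 0.1 * score
    return p_keep


# Hypothetical evaluator: features 0 and 1 are useful; the rest only add cost.
INFORMATIVE = {0, 1}


def toy_score(mask: np.ndarray) -> float:
    kept = np.flatnonzero(mask)
    useful = sum(1 for i in kept if i in INFORMATIVE)
    return useful - 0.1 * (len(kept) - useful)


p = la_feature_selection(toy_score, n_features=6)
selected = np.flatnonzero(p > 0.5)
```

The reward-inaction rule only updates on success, so the probabilities drift toward actions that tend to be rewarded. The small learning rate keeps the risk of locking in a bad choice low, at the cost of more evaluations.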
Multi-Agent Reinforcement Learning via Double Averaging Primal-Dual Optimization
Title | Multi-Agent Reinforcement Learning via Double Averaging Primal-Dual Optimization |
Authors | Hoi-To Wai, Zhuoran Yang, Zhaoran Wang, Mingyi Hong |
Abstract | Despite the success of single-agent reinforcement learning, multi-agent reinforcement learning (MARL) remains challenging due to complex interactions between agents. Motivated by decentralized applications such as sensor networks, swarm robotics, and power grids, we study policy evaluation in MARL, where agents with jointly observed state-action pairs and private local rewards collaborate to learn the value of a given policy. In this paper, we propose a double averaging scheme, where each agent iteratively performs averaging over both space and time to incorporate neighboring gradient information and local reward information, respectively. We prove that the proposed algorithm converges to the optimal solution at a global geometric rate. In particular, such an algorithm is built upon a primal-dual reformulation of the mean squared projected Bellman error minimization problem, which gives rise to a decentralized convex-concave saddle-point problem. To the best of our knowledge, the proposed double averaging primal-dual optimization algorithm is the first to achieve fast finite-time convergence on decentralized convex-concave saddle-point problems. |
Tasks | Multi-agent Reinforcement Learning |
Published | 2018-06-03 |
URL | http://arxiv.org/abs/1806.00877v4 |
PDF | http://arxiv.org/pdf/1806.00877v4.pdf |
PWC | https://paperswithcode.com/paper/multi-agent-reinforcement-learning-via-double |
Repo | |
Framework | |
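"Averaging over space and time" can be illustrated with a dynamic-average-consensus loop, in the spirit of the scheme described above but not a reproduction of the paper's primal-dual algorithm. Each agent keeps a running time average of its own private samples. It also mixes a tracking variable with its neighbours through a doubly stochastic matrix `W`, which here is an assumed ring topology. The samples are synthetic.

```python
from typing import Callable, Tuple

import numpy as np


def ring_mixing_matrix(n: int) -> np.ndarray:
    """Doubly stochastic weights for a ring: each agent averages itself and two neighbours."""
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i, j % n] += 1.0 / 3.0
    return W


def double_averaging(local_samples: Callable[[int], np.ndarray], W: np.ndarray,
                     steps: int) -> Tuple[np.ndarray, np.ndarray]:
    """local_samples(t) returns one new private sample per agent (shape (n,))."""
    time_avg = np.asarray(local_samples(0), dtype=float).copy()  # per-agent average over time
    x = time_avg.copy()  # per-agent estimate of the network-wide average
    for t in range(1, steps):
        new_avg = time_avg + (np.asarray(local_samples(t), dtype=float) - time_avg) / (t + 1)
        # Space: mix with neighbours via W. Time: inject the change in the local running average.
        x = W @ x + (new_avg - time_avg)
        time_avg = new_avg
    return x, time_avg


c = np.array([1.0, 2.0, 3.0, 4.0, 5.0])          # hypothetical per-agent means
signs = np.array([1.0, -1.0, 1.0, -1.0, 1.0])
samples = lambda t: c + (-1.0) ** t * signs      # noisy, agent-specific local samples
x, time_avg = double_averaging(samples, ring_mixing_matrix(5), steps=500)
```

Because `W` is doubly stochastic, the sum of the trackers always equals the sum of the local time averages. Mixing then drives every agent's `x` toward the network-wide mean (here, 3), even though no agent ever sees another agent's raw samples.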
Leveraging Filter Correlations for Deep Model Compression
Title | Leveraging Filter Correlations for Deep Model Compression |
Authors | Pravendra Singh, Vinay Kumar Verma, Piyush Rai, Vinay P. Namboodiri |
Abstract | We present a filter correlation based model compression approach for deep convolutional neural networks. Our approach iteratively identifies pairs of filters with the largest pairwise correlations and drops one of the filters from each such pair. However, instead of naïvely discarding one of the filters from each such pair, the model is re-optimized to make the filters in these pairs maximally correlated, so that discarding one of the filters from the pair results in minimal information loss. Moreover, after discarding the filters in each round, we further fine-tune the model to recover from the potential small loss incurred by the compression. We evaluate our proposed approach using a comprehensive set of experiments and ablation studies. Our compression method yields state-of-the-art FLOPs compression rates on various benchmarks, such as LeNet-5, VGG-16, ResNet-50, and ResNet-56, while still achieving excellent predictive performance for tasks such as object detection on benchmark datasets. |
Tasks | Model Compression, Object Detection |
Published | 2018-11-26 |
URL | https://arxiv.org/abs/1811.10559v2 |
PDF | https://arxiv.org/pdf/1811.10559v2.pdf |
PWC | https://paperswithcode.com/paper/leveraging-filter-correlations-for-deep-model |
Repo | |
Framework | |
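The first step described above, finding the most correlated filter pair in a layer, can be sketched as follows. The sketch assumes correlation is measured as the absolute Pearson correlation between flattened filter weights. It uses random weights with one deliberately planted near-duplicate. The re-optimization that makes paired filters more correlated, and the fine-tuning, are not shown.

```python
import numpy as np


def most_correlated_pair(weights: np.ndarray):
    """weights: conv kernel of shape (out_channels, in_channels, kh, kw).

    Returns (i, j, |corr|) with i < j for the filter pair with the largest |Pearson correlation|.
    """
    flat = weights.reshape(weights.shape[0], -1)  # one row per output filter
    corr = np.abs(np.corrcoef(flat))
    np.fill_diagonal(corr, -np.inf)               # ignore self-correlation
    i, j = np.unravel_index(np.argmax(corr), corr.shape)
    return int(min(i, j)), int(max(i, j)), float(corr[i, j])


# Random layer with a planted near-duplicate (illustrative data only).
rng = np.random.default_rng(0)
weights = rng.standard_normal((8, 3, 3, 3))
weights[5] = -1.5 * weights[1] + 0.01 * rng.standard_normal((3, 3, 3))
i, j, r = most_correlated_pair(weights)
print(i, j)  # 1 5
```

Taking the absolute value treats strongly anti-correlated filters as redundant too. Whether the paper does the same is an assumption here.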
Genie: An Open Box Counterfactual Policy Estimator for Optimizing Sponsored Search Marketplace
Title | Genie: An Open Box Counterfactual Policy Estimator for Optimizing Sponsored Search Marketplace |
Authors | Murat Ali Bayir, Mingsen Xu, Yaojia Zhu, Yifan Shi |
Abstract | In this paper, we propose an offline counterfactual policy estimation framework called Genie to optimize the Sponsored Search Marketplace. Genie employs an open box simulation engine with a click calibration model to compute the KPI impact of any modification to the system. Experimental results on Bing traffic show that Genie performs better than existing observational approaches that employ randomized experiments for traffic slices that have frequent policy updates. We also show that Genie can be used to tune completely new policies efficiently without creating risky randomized experiments due to the cold start problem. As of today, Genie hosts more than 10,000 optimization jobs yearly, which run more than 30 million processing node hours of big data jobs for Bing Ads. For the last 3 years, Genie has proven to be one of the major platforms for optimizing the Bing Ads Marketplace due to its reliability under frequent policy changes and its efficiency in minimizing risks in real experiments. |
Tasks | Calibration |
Published | 2018-08-22 |
URL | http://arxiv.org/abs/1808.07251v1 |
PDF | http://arxiv.org/pdf/1808.07251v1.pdf |
PWC | https://paperswithcode.com/paper/genie-an-open-box-counterfactual-policy |
Repo | |
Framework | |
Joint Neural Architecture Search and Quantization
Title | Joint Neural Architecture Search and Quantization |
Authors | Yukang Chen, Gaofeng Meng, Qian Zhang, Xinbang Zhang, Liangchen Song, Shiming Xiang, Chunhong Pan |
Abstract | Designing neural architectures is a fundamental step in deep learning applications. As a partner technique, model compression on neural networks has been widely investigated to meet the need to run deep learning algorithms with the limited computational resources of mobile devices. Currently, both architecture design and model compression require expert tricks and tedious trials. In this paper, we integrate these two tasks into one unified framework, which enables joint search of architectures and quantization (compression) policies for neural networks. This method is named JASQ. Our goal is to automatically find a compact, high-performing neural network model that is suitable for mobile devices. Technically, a multi-objective evolutionary search algorithm is introduced to search for models that balance model size and accuracy. In experiments, we find that our approach outperforms methods that search only for architectures or only for quantization policies. 1) Specifically, given existing networks, our approach can provide them with learning-based quantization policies that outperform their 2-bit, 4-bit, 8-bit, and 16-bit counterparts. It can even yield higher accuracies than the float models, for example, over 1.02% higher accuracy on MobileNet-v1. 2) Moreover, under the balance between model size and accuracy, two models are obtained with joint search of architectures and quantization policies: a high-accuracy model and a small model, JASQNet and JASQNet-Small, that achieves a 2.97% error rate with 0.9 MB on CIFAR-10. |
Tasks | Model Compression, Neural Architecture Search, Quantization |
Published | 2018-11-23 |
URL | http://arxiv.org/abs/1811.09426v1 |
PDF | http://arxiv.org/pdf/1811.09426v1.pdf |
PWC | https://paperswithcode.com/paper/joint-neural-architecture-search-and |
Repo | |
Framework | |
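Balancing model size against accuracy in a multi-objective search is commonly expressed through Pareto dominance. The filter below shows that idea. JASQ's actual fitness function may instead combine the two objectives into a single score, and the candidates and numbers here are made up for illustration.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    name: str
    size_mb: float   # lower is better
    accuracy: float  # higher is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if `a` is no worse than `b` on both objectives and strictly better on one."""
    no_worse = a.size_mb <= b.size_mb and a.accuracy >= b.accuracy
    strictly_better = a.size_mb < b.size_mb or a.accuracy > b.accuracy
    return no_worse and strictly_better


def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Keep the candidates that no other candidate dominates."""
    return [c for c in cands if not any(dominates(o, c) for o in cands)]


# Hypothetical search population (numbers invented for the example).
population = [
    Candidate("A", 4.0, 0.95),
    Candidate("B", 1.0, 0.93),
    Candidate("C", 2.0, 0.92),  # dominated by B: larger and less accurate
    Candidate("D", 0.5, 0.90),
]
print([c.name for c in pareto_front(population)])  # ['A', 'B', 'D']
```

An evolutionary search would keep the front (plus some diversity) as parents for the next generation. A deployment would then pick the point on the front that fits its size budget.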
Decoupling Respiratory and Angular Variation in Rotational X-ray Scans Using a Prior Bilinear Model
Title | Decoupling Respiratory and Angular Variation in Rotational X-ray Scans Using a Prior Bilinear Model |
Authors | Tobias Geimer, Paul Keall, Katharina Breininger, Vincent Caillet, Michelle Dunbar, Christoph Bert, Andreas Maier |
Abstract | Data-driven respiratory signal extraction from rotational X-ray scans is a challenge as angular effects overlap with respiration-induced change in the scene. In this paper, we use the linearity of the X-ray transform to propose a bilinear model based on a prior 4D scan to separate angular and respiratory variation. The bilinear estimation process is supported by a B-spline interpolation using prior knowledge about the trajectory angle. Consequently, extraction of respiratory features simplifies to a linear problem. Though the need for a prior 4D CT seems steep, our proposed use-case of driving a respiratory motion model in radiation therapy usually meets this requirement. We evaluate on DRRs of 5 patient 4D CTs in a leave-one-phase-out manner and achieve a mean estimation error of 3.01 % in the gray values for unseen viewing angles. We further demonstrate suitability of the extracted weights to drive a motion model for treatments with a continuously rotating gantry. |
Tasks | |
Published | 2018-04-30 |
URL | http://arxiv.org/abs/1804.11227v3 |
PDF | http://arxiv.org/pdf/1804.11227v3.pdf |
PWC | https://paperswithcode.com/paper/decoupling-respiratory-and-angular-variation |
Repo | |
Framework | |
Stability Based Filter Pruning for Accelerating Deep CNNs
Title | Stability Based Filter Pruning for Accelerating Deep CNNs |
Authors | Pravendra Singh, Vinay Sameer Raja Kadi, Nikhil Verma, Vinay P. Namboodiri |
Abstract | Convolutional neural networks (CNNs) have achieved impressive performance on a wide variety of tasks (classification, detection, etc.) across multiple domains at the cost of high computational and memory requirements. Thus, leveraging CNNs for real-time applications necessitates model compression approaches that reduce not only the total number of parameters but also the overall computation. In this work, we present a stability-based approach for filter-level pruning of CNNs. We evaluate our proposed approach on different architectures (LeNet, VGG-16, ResNet, and Faster RCNN) and datasets and demonstrate its generalizability through extensive experiments. Moreover, our compressed models can be used at run-time without requiring any special libraries or hardware. Our model compression method reduces the number of FLOPs by a factor of 6.03X and the GPU memory footprint by more than 17X, significantly outperforming other state-of-the-art filter pruning methods. |
Tasks | Model Compression |
Published | 2018-11-20 |
URL | http://arxiv.org/abs/1811.08321v1 |
PDF | http://arxiv.org/pdf/1811.08321v1.pdf |
PWC | https://paperswithcode.com/paper/stability-based-filter-pruning-for |
Repo | |
Framework | |
ADMM-NN: An Algorithm-Hardware Co-Design Framework of DNNs Using Alternating Direction Method of Multipliers
Title | ADMM-NN: An Algorithm-Hardware Co-Design Framework of DNNs Using Alternating Direction Method of Multipliers |
Authors | Ao Ren, Tianyun Zhang, Shaokai Ye, Jiayu Li, Wenyao Xu, Xuehai Qian, Xue Lin, Yanzhi Wang |
Abstract | To facilitate efficient embedded and hardware implementations of deep neural networks (DNNs), two important categories of DNN model compression techniques, weight pruning and weight quantization, are investigated. The former leverages the redundancy in the number of weights, whereas the latter leverages the redundancy in the bit representation of weights. However, there is no systematic framework for joint weight pruning and quantization of DNNs, which limits the achievable model compression ratio. Moreover, computation reduction, energy efficiency improvement, and hardware performance overhead need to be accounted for in addition to model size reduction. To address these limitations, we present ADMM-NN, the first algorithm-hardware co-optimization framework of DNNs using the Alternating Direction Method of Multipliers (ADMM), a powerful technique for non-convex optimization problems with possibly combinatorial constraints. The first part of ADMM-NN is a systematic, joint framework of DNN weight pruning and quantization using ADMM. It can be understood as a smart regularization technique whose regularization target is dynamically updated in each ADMM iteration, resulting in higher model compression than prior work. The second part is a set of hardware-aware DNN optimizations that facilitate hardware-level implementations. Without accuracy loss, we achieve 85$\times$ and 24$\times$ pruning on the LeNet-5 and AlexNet models, respectively, significantly higher than prior work. The improvement becomes more significant when focusing on computation reductions. Combining weight pruning and quantization, we achieve 1,910$\times$ and 231$\times$ reductions in overall model size on these two benchmarks, when focusing on data storage. Highly promising results are also observed on other representative DNNs such as VGGNet and ResNet-50. |
Tasks | Model Compression, Quantization |
Published | 2018-12-31 |
URL | http://arxiv.org/abs/1812.11677v1 |
PDF | http://arxiv.org/pdf/1812.11677v1.pdf |
PWC | https://paperswithcode.com/paper/admm-nn-an-algorithm-hardware-co-design |
Repo | |
Framework | |
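In ADMM-based pruning and quantization, the combinatorial constraint is handled by an auxiliary variable `Z`. `Z` is updated as a Euclidean projection of `W + U` onto the constraint set, followed by a scaled dual update of `U`. The sketch below shows those two projections and one `Z`/`U` step in the standard scaled-ADMM form. The `W` subproblem (training with the quadratic penalty) is omitted, and the quantization levels are hypothetical. ADMM-NN's joint and hardware-aware variants are not reproduced.

```python
from typing import Callable, Sequence, Tuple

import numpy as np


def project_sparse(v: np.ndarray, k: int) -> np.ndarray:
    """Projection onto {z : at most k nonzeros}: keep the k largest-magnitude entries."""
    flat = v.ravel()
    z = np.zeros_like(flat)
    k = min(k, flat.size)
    if k > 0:
        idx = np.argpartition(np.abs(flat), -k)[-k:]
        z[idx] = flat[idx]
    return z.reshape(v.shape)


def project_quantized(v: np.ndarray, levels: Sequence[float]) -> np.ndarray:
    """Projection onto a finite set of levels: snap each entry to its nearest level."""
    levels = np.asarray(levels, dtype=float)
    idx = np.abs(v[..., None] - levels).argmin(axis=-1)
    return levels[idx]


def admm_z_u_step(W: np.ndarray, U: np.ndarray,
                  project: Callable[[np.ndarray], np.ndarray]) -> Tuple[np.ndarray, np.ndarray]:
    """Scaled ADMM: Z <- Proj(W + U), then U <- U + W - Z."""
    Z = project(W + U)
    U = U + W - Z
    return Z, U


W = np.array([0.1, -2.0, 0.5, 3.0])
Z, U = admm_z_u_step(W, np.zeros_like(W), lambda x: project_sparse(x, 2))
q = project_quantized(np.array([0.1, -0.4, 0.8]), levels=[-0.5, 0.0, 0.5])
```

The dual variable `U` accumulates the gap between the trained weights and their constrained copy. That gap becomes the "dynamically updated regularization target" the abstract describes when the next `W` step penalizes `||W - Z + U||^2`.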
Private Model Compression via Knowledge Distillation
Title | Private Model Compression via Knowledge Distillation |
Authors | Ji Wang, Weidong Bao, Lichao Sun, Xiaomin Zhu, Bokai Cao, Philip S. Yu |
Abstract | The soaring demand for intelligent mobile applications calls for deploying powerful deep neural networks (DNNs) on mobile devices. However, the outstanding performance of DNNs notoriously relies on increasingly complex models, which in turn is associated with an increase in computational expense far surpassing mobile devices’ capacity. What is worse, app service providers need to collect and utilize a large volume of users’ data, which contain sensitive information, to build the sophisticated DNN models. Directly deploying these models on public mobile devices presents prohibitive privacy risk. To benefit from the on-device deep learning without the capacity and privacy concerns, we design a private model compression framework RONA. Following the knowledge distillation paradigm, we jointly use hint learning, distillation learning, and self learning to train a compact and fast neural network. The knowledge distilled from the cumbersome model is adaptively bounded and carefully perturbed to enforce differential privacy. We further propose an elegant query sample selection method to reduce the number of queries and control the privacy loss. A series of empirical evaluations as well as the implementation on an Android mobile device show that RONA can not only compress cumbersome models efficiently but also provide a strong privacy guarantee. For example, on SVHN, when a meaningful $(9.83,10^{-6})$-differential privacy is guaranteed, the compact model trained by RONA can obtain 20$\times$ compression ratio and 19$\times$ speed-up with merely 0.97% accuracy loss. |
Tasks | Model Compression |
Published | 2018-11-13 |
URL | http://arxiv.org/abs/1811.05072v1 |
PDF | http://arxiv.org/pdf/1811.05072v1.pdf |
PWC | https://paperswithcode.com/paper/private-model-compression-via-knowledge |
Repo | |
Framework | |
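"Bounded and perturbed" teacher knowledge usually means the teacher's outputs are norm-clipped, so one example's influence is bounded, and then noised before the student sees them. Below is a minimal sketch using per-example L2 clipping and Gaussian noise with standard deviation `sigma * clip_norm`. This is an assumed convention, not RONA's exact mechanism. Calibrating `sigma` to a target (epsilon, delta) requires a privacy accountant and is out of scope here.

```python
import numpy as np


def softmax(z: np.ndarray, T: float = 1.0) -> np.ndarray:
    """Temperature-scaled softmax over the last axis (numerically stabilized)."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def privatize_teacher_logits(logits: np.ndarray, clip_norm: float, sigma: float,
                             rng: np.random.Generator) -> np.ndarray:
    """Clip each row to L2 norm <= clip_norm, then add N(0, (sigma * clip_norm)^2) noise."""
    norms = np.linalg.norm(logits, axis=-1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = logits * scale
    return clipped + rng.normal(0.0, sigma * clip_norm, size=clipped.shape)


def distillation_loss(student_logits: np.ndarray, soft_targets: np.ndarray, T: float) -> float:
    """Cross-entropy of the student's softened predictions against the (noisy) soft targets."""
    p_s = softmax(student_logits, T)
    return float(-(soft_targets * np.log(p_s + 1e-12)).sum(axis=-1).mean())


rng = np.random.default_rng(0)
teacher = np.array([[3.0, 4.0, 0.0], [0.3, 0.4, 0.0]])
clipped = privatize_teacher_logits(teacher, clip_norm=1.0, sigma=0.0, rng=rng)  # noise off
noisy = privatize_teacher_logits(teacher, clip_norm=1.0, sigma=0.5, rng=rng)
targets = softmax(noisy, T=2.0)
loss = distillation_loss(np.zeros((2, 3)), targets, T=2.0)
```

With `sigma=0` only clipping applies. The first row (norm 5) is scaled down to unit norm, and the second row (norm 0.5) passes through unchanged.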
Sequence-Level Knowledge Distillation for Model Compression of Attention-based Sequence-to-Sequence Speech Recognition
Title | Sequence-Level Knowledge Distillation for Model Compression of Attention-based Sequence-to-Sequence Speech Recognition |
Authors | Raden Mu’az Mun’im, Nakamasa Inoue, Koichi Shinoda |
Abstract | We investigate the feasibility of sequence-level knowledge distillation of Sequence-to-Sequence (Seq2Seq) models for Large Vocabulary Continuous Speech Recognition (LVCSR). We first use a pre-trained larger teacher model to generate multiple hypotheses per utterance with beam search. With the same input, we then train the student model using these hypotheses generated by the teacher as pseudo labels in place of the original ground truth labels. We evaluate our proposed method on the Wall Street Journal (WSJ) corpus. It achieved up to $9.8\times$ parameter reduction at the cost of an increase in word error rate (WER) of up to 7.0%. |
Tasks | Large Vocabulary Continuous Speech Recognition, Model Compression, Sequence-To-Sequence Speech Recognition, Speech Recognition |
Published | 2018-11-12 |
URL | http://arxiv.org/abs/1811.04531v1 |
PDF | http://arxiv.org/pdf/1811.04531v1.pdf |
PWC | https://paperswithcode.com/paper/sequence-level-knowledge-distillation-for |
Repo | |
Framework | |
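The data side of sequence-level distillation is simple: the teacher's n-best beam hypotheses replace the ground-truth transcripts as student targets. A minimal sketch follows. `teacher_beam_search` is a hypothetical callable returning `(hypothesis, log_prob)` pairs sorted best-first, and the fake teacher exists only to make the example runnable.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface: (utterance, beam size) -> [(hypothesis, log_prob), ...], best first.
BeamSearch = Callable[[str, int], List[Tuple[str, float]]]


def build_seq_kd_dataset(utterances: Sequence[str], teacher_beam_search: BeamSearch,
                         n_best: int) -> List[Tuple[str, str]]:
    """Pair each input with the teacher's top-n hypotheses as pseudo labels."""
    pairs: List[Tuple[str, str]] = []
    for x in utterances:
        for hyp, _log_prob in teacher_beam_search(x, n_best)[:n_best]:
            pairs.append((x, hyp))  # student target = teacher hypothesis, not ground truth
    return pairs


def fake_teacher(x: str, beam: int) -> List[Tuple[str, float]]:
    """Stand-in for a real Seq2Seq teacher (illustration only)."""
    return [(x.upper(), -0.1), (x.upper() + "!", -1.2), (x, -3.0)][:beam]


pairs = build_seq_kd_dataset(["a", "b"], fake_teacher, n_best=2)
print(pairs)  # [('a', 'A'), ('a', 'A!'), ('b', 'B'), ('b', 'B!')]
```

In practice the inputs would be acoustic feature sequences rather than strings. The teacher's log-probabilities could also be used to weight the pairs, though whether the paper does so is not stated in the abstract.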
FLOPs as a Direct Optimization Objective for Learning Sparse Neural Networks
Title | FLOPs as a Direct Optimization Objective for Learning Sparse Neural Networks |
Authors | Raphael Tang, Ashutosh Adhikari, Jimmy Lin |
Abstract | There exists a plethora of techniques for inducing structured sparsity in parametric models during the optimization process, with the final goal of resource-efficient inference. However, few methods target a specific number of floating-point operations (FLOPs) as part of the optimization objective, despite many reporting FLOPs as part of the results. Furthermore, a one-size-fits-all approach ignores realistic system constraints, which differ significantly between, say, a GPU and a mobile phone – FLOPs on the former incur less latency than on the latter; thus, it is important for practitioners to be able to specify a target number of FLOPs during model compression. In this work, we extend a state-of-the-art technique to directly incorporate FLOPs as part of the optimization objective and show that, given a desired FLOPs requirement, different neural networks can be successfully trained for image classification. |
Tasks | Image Classification, Model Compression |
Published | 2018-11-07 |
URL | http://arxiv.org/abs/1811.03060v2 |
PDF | http://arxiv.org/pdf/1811.03060v2.pdf |
PWC | https://paperswithcode.com/paper/flops-as-a-direct-optimization-objective-for |
Repo | |
Framework | |
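Making FLOPs an optimization target requires a FLOPs count that depends on how many channels survive. The sketch below computes expected multiply-accumulates for a plain chain of convolutions, given per-channel keep probabilities such as those produced by stochastic gates. It then applies a hinge penalty above a target. Several points are assumptions rather than the paper's exact formulation: gates in different layers are treated as independent (so E[c_in * c_out] = E[c_in] * E[c_out]), bias and non-conv layers are ignored, and the layer shapes are hypothetical.

```python
from typing import Sequence, Tuple

import numpy as np


def conv_flops(c_in: float, c_out: float, k: int, out_h: int, out_w: int) -> float:
    """Multiply-accumulate count of a k x k convolution (bias ignored)."""
    return c_in * c_out * k * k * out_h * out_w


def expected_flops(keep_probs: Sequence[np.ndarray], kernel_sizes: Sequence[int],
                   out_hw: Sequence[Tuple[int, int]], in_channels: int) -> float:
    """Expected FLOPs of a sequential CNN whose output channels are gated independently."""
    total = 0.0
    c_in = float(in_channels)
    for p, k, (h, w) in zip(keep_probs, kernel_sizes, out_hw):
        c_out = float(np.sum(p))  # expected number of active output channels
        total += conv_flops(c_in, c_out, k, h, w)
        c_in = c_out              # this layer's outputs feed the next layer
    return total


def flops_penalty(flops: float, target: float) -> float:
    """Hinge: penalize only the FLOPs above the practitioner's budget."""
    return max(0.0, flops - target)


flops = expected_flops([np.ones(4), np.full(8, 0.5)], kernel_sizes=[3, 3],
                       out_hw=[(8, 8), (8, 8)], in_channels=3)
print(flops)  # 16128.0
```

The hinge leaves the loss untouched once the model is under budget, so the target acts as a constraint rather than pushing FLOPs ever lower. Different targets per device (GPU vs phone) then just mean different `target` values.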
Solving Archaeological Puzzles
Title | Solving Archaeological Puzzles |
Authors | Niv Derech, Ayellet Tal, Ilan Shimshoni |
Abstract | Puzzle solving is a difficult problem in its own right, even when the pieces are all square and build up a natural image. But what if these ideal conditions do not hold? One such application domain is archaeology, where restoring an artifact from its fragments is highly important. From the point of view of computer vision, archaeological puzzle solving is very challenging, due to three additional difficulties: the fragments are of general shape; they are abraded, especially at the boundaries (where the strongest cues for matching should exist); and the domain of valid transformations between the pieces is continuous. The key contribution of this paper is a fully-automatic and general algorithm that addresses puzzle solving in this intriguing domain. We show that our state-of-the-art approach manages to correctly reassemble dozens of broken artifacts and frescoes. |
Tasks | |
Published | 2018-12-26 |
URL | http://arxiv.org/abs/1812.10553v1 |
PDF | http://arxiv.org/pdf/1812.10553v1.pdf |
PWC | https://paperswithcode.com/paper/solving-archaeological-puzzles |
Repo | |
Framework | |