February 2, 2020

3301 words 16 mins read

Paper Group AWR 67

Quantum Entropy Scoring for Fast Robust Mean Estimation and Improved Outlier Detection. GraphSAINT: Graph Sampling Based Inductive Learning Method. Improving Model-based Genetic Programming for Symbolic Regression of Small Expressions. A Programmable Approach to Model Compression. Scalable Bayesian dynamic covariance modeling with variational Wisha …

Quantum Entropy Scoring for Fast Robust Mean Estimation and Improved Outlier Detection

Title Quantum Entropy Scoring for Fast Robust Mean Estimation and Improved Outlier Detection
Authors Yihe Dong, Samuel B. Hopkins, Jerry Li
Abstract We study two problems in high-dimensional robust statistics: \emph{robust mean estimation} and \emph{outlier detection}. In robust mean estimation the goal is to estimate the mean $\mu$ of a distribution on $\mathbb{R}^d$ given $n$ independent samples, an $\varepsilon$-fraction of which have been corrupted by a malicious adversary. In outlier detection the goal is to assign an \emph{outlier score} to each element of a data set such that elements more likely to be outliers are assigned higher scores. Our algorithms for both problems are based on a new outlier scoring method we call QUE-scoring based on \emph{quantum entropy regularization}. For robust mean estimation, this yields the first algorithm with optimal error rates and nearly-linear running time $\widetilde{O}(nd)$ in all parameters, improving on the previous fastest running time $\widetilde{O}(\min(nd/\varepsilon^6, nd^2))$. For outlier detection, we evaluate the performance of QUE-scoring via extensive experiments on synthetic and real data, and demonstrate that it often performs better than previously proposed algorithms. Code for these experiments is available at https://github.com/twistedcubic/que-outlier-detection .
Tasks Outlier Detection
Published 2019-06-26
URL https://arxiv.org/abs/1906.11366v1
PDF https://arxiv.org/pdf/1906.11366v1.pdf
PWC https://paperswithcode.com/paper/quantum-entropy-scoring-for-fast-robust-mean
Repo https://github.com/twistedcubic/que-outlier-detection
Framework pytorch
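
To make the QUE-scoring idea above more concrete, here is a minimal dense-matrix sketch: score each point by a quadratic form under a matrix exponential of the (normalized) empirical covariance, so that directions of unusually large variance are emphasized exponentially. This is only an illustrative reading of the abstract, not the authors' nearly-linear-time implementation (see the linked repo for that); the `alpha` value and the spectral-norm normalization are assumptions.

```python
# Illustrative QUE-style outlier scoring (dense, O(d^3) per call), not the paper's fast algorithm.
import numpy as np
from scipy.linalg import expm

def que_scores(X, alpha=4.0):
    """Score each row of X; higher scores indicate more likely outliers."""
    Xc = X - X.mean(axis=0)                      # center the data
    Sigma = Xc.T @ Xc / len(X)                   # empirical covariance (d x d)
    Sigma_n = Sigma / np.linalg.norm(Sigma, 2)   # normalize by the spectral norm (assumption)
    U = expm(alpha * Sigma_n)                    # quantum-entropy-style exponential weighting
    U /= np.trace(U)
    return np.einsum('ij,jk,ik->i', Xc, U, Xc)   # quadratic form x_i^T U x_i per sample

# Toy usage: 500 inliers plus 25 shifted outliers in 50 dimensions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(500, 50)), rng.normal(loc=2.0, size=(25, 50))])
scores = que_scores(X)          # the last 25 (shifted) points tend to receive the highest scores
```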

GraphSAINT: Graph Sampling Based Inductive Learning Method

Title GraphSAINT: Graph Sampling Based Inductive Learning Method
Authors Hanqing Zeng, Hongkuan Zhou, Ajitesh Srivastava, Rajgopal Kannan, Viktor Prasanna
Abstract Graph Convolutional Networks (GCNs) are powerful models for learning representations of attributed graphs. To scale GCNs to large graphs, state-of-the-art methods use various layer sampling techniques to alleviate the “neighbor explosion” problem during minibatch training. We propose GraphSAINT, a graph sampling based inductive learning method that improves training efficiency and accuracy in a fundamentally different way. By changing perspective, GraphSAINT constructs minibatches by sampling the training graph, rather than the nodes or edges across GCN layers. In each iteration, a complete GCN is built from the properly sampled subgraph. Thus, we ensure a fixed number of well-connected nodes in all layers. We further propose a normalization technique to eliminate bias, and sampling algorithms for variance reduction. Importantly, we can decouple the sampling from the forward and backward propagation, and extend GraphSAINT with many architecture variants (e.g., graph attention, jumping connection). GraphSAINT demonstrates superior performance in both accuracy and training time on five large graphs, and achieves new state-of-the-art F1 scores for PPI (0.995) and Reddit (0.970).
Tasks Graph Embedding, Graph Representation Learning, Node Classification
Published 2019-07-10
URL https://arxiv.org/abs/1907.04931v4
PDF https://arxiv.org/pdf/1907.04931v4.pdf
PWC https://paperswithcode.com/paper/graphsaint-graph-sampling-based-inductive
Repo https://github.com/GraphSAINT/GraphSAINT
Framework tf
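
The core idea of graph-sampling-based minibatching can be sketched in a few lines: sample a subgraph of the training graph, then run a full GCN on that induced subgraph. The uniform node sampler and the trailing pseudo-training-step shown here are simplifications; GraphSAINT's actual samplers and bias/variance normalizations live in the linked repo.

```python
# A simplified GraphSAINT-style subgraph minibatch (uniform node sampler, for illustration only).
import numpy as np
import scipy.sparse as sp

def sample_subgraph(adj: sp.csr_matrix, batch_nodes: int, rng):
    """Uniform node sampler: return the induced-subgraph adjacency and the sampled node ids."""
    nodes = rng.choice(adj.shape[0], size=batch_nodes, replace=False)
    return adj[nodes][:, nodes], nodes           # induced subgraph

def gcn_normalize(adj):
    """Symmetric GCN normalization D^-1/2 (A + I) D^-1/2."""
    adj = adj + sp.eye(adj.shape[0])
    d_inv_sqrt = sp.diags(1.0 / np.sqrt(np.asarray(adj.sum(1)).ravel()))
    return d_inv_sqrt @ adj @ d_inv_sqrt

# One training iteration would then look like (gcn, features, labels are placeholders):
#   sub_adj, nodes = sample_subgraph(adj, 2000, rng)
#   logits = gcn(gcn_normalize(sub_adj), features[nodes])   # full GCN on the sampled subgraph
#   loss = cross_entropy(logits, labels[nodes])              # paper: loss reweighted to remove bias
```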

Improving Model-based Genetic Programming for Symbolic Regression of Small Expressions

Title Improving Model-based Genetic Programming for Symbolic Regression of Small Expressions
Authors Marco Virgolin, Tanja Alderliesten, Cees Witteveen, Peter A. N. Bosman
Abstract The Gene-pool Optimal Mixing Evolutionary Algorithm (GOMEA) is a model-based EA framework that has been shown to perform well in several domains, including Genetic Programming (GP). Differently from traditional EAs where variation acts blindly, GOMEA learns a model of interdependencies within the genotype, i.e., the linkage, to estimate what patterns to propagate. In this article, we study the role of Linkage Learning (LL) performed by GOMEA in Symbolic Regression (SR). We show that the non-uniformity in the distribution of the genotype in GP populations negatively biases LL, and propose a method to correct for this. We also propose approaches to improve LL when ephemeral random constants are used. Furthermore, we adapt a scheme of interleaving runs to alleviate the burden of tuning the population size, a crucial parameter for LL, to SR. We run experiments on 10 real-world datasets, enforcing a strict limitation on solution size, to enable interpretability. We find that the new LL method outperforms the standard one, and that GOMEA outperforms both traditional and semantic GP. We also find that the small solutions evolved by GOMEA are competitive with tuned decision trees, making GOMEA a promising new approach to SR.
Tasks
Published 2019-04-03
URL https://arxiv.org/abs/1904.02050v3
PDF https://arxiv.org/pdf/1904.02050v3.pdf
PWC https://paperswithcode.com/paper/model-based-genetic-programming-with-gomea
Repo https://github.com/marcovirgolin/GP-GOMEA
Framework none
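
For readers unfamiliar with GOMEA, the variation operator at its heart, Gene-pool Optimal Mixing (GOM), can be illustrated on a toy bitstring problem rather than on GP trees: gene subsets are copied from random donors and kept only if fitness does not worsen. The fixed pairwise linkage sets below are a stand-in; GOMEA instead learns linkage from the population, which is exactly the part the paper improves for symbolic regression.

```python
# Toy Gene-pool Optimal Mixing step on OneMax (illustrative; not the GP-GOMEA implementation).
import random

def gom_step(solution, population, linkage_sets, fitness):
    """Greedily copy gene subsets from random donors, keeping changes that don't hurt fitness."""
    best = list(solution)
    best_f = fitness(best)
    for genes in linkage_sets:
        donor = random.choice(population)
        trial = list(best)
        for g in genes:                       # overwrite the whole linkage set at once
            trial[g] = donor[g]
        trial_f = fitness(trial)
        if trial_f >= best_f:                 # accept if not worse
            best, best_f = trial, trial_f
    return best

# Toy usage: OneMax with fixed pairwise linkage sets.
n = 20
population = [[random.randint(0, 1) for _ in range(n)] for _ in range(50)]
linkage_sets = [(i, i + 1) for i in range(0, n, 2)]
improved = gom_step(population[0], population, linkage_sets, fitness=sum)
```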

A Programmable Approach to Model Compression

Title A Programmable Approach to Model Compression
Authors Vinu Joseph, Saurav Muralidharan, Animesh Garg, Michael Garland, Ganesh Gopalakrishnan
Abstract Deep neural networks frequently contain far more weights, represented at a higher precision, than are required for the specific task which they are trained to perform. Consequently, they can often be compressed using techniques such as weight pruning and quantization that reduce both model size and inference time without appreciable loss in accuracy. Compressing models before they are deployed can therefore result in significantly more efficient systems. However, while the results are desirable, finding the best compression strategy for a given neural network, target platform, and optimization objective often requires extensive experimentation. Moreover, finding optimal hyperparameters for a given compression strategy typically results in even more expensive, frequently manual, trial-and-error exploration. In this paper, we introduce a programmable system for model compression called Condensa. Users programmatically compose simple operators, in Python, to build complex compression strategies. Given a strategy and a user-provided objective, such as minimization of running time, Condensa uses a novel sample-efficient constrained Bayesian optimization algorithm to automatically infer desirable sparsity ratios. Our experiments on three real-world image classification and language modeling tasks demonstrate memory footprint reductions of up to 65x and runtime throughput improvements of up to 2.22x using at most 10 samples per search. We have released a reference implementation of Condensa at https://github.com/NVlabs/condensa.
Tasks Image Classification, Language Modelling, Model Compression, Quantization
Published 2019-11-06
URL https://arxiv.org/abs/1911.02497v1
PDF https://arxiv.org/pdf/1911.02497v1.pdf
PWC https://paperswithcode.com/paper/a-programmable-approach-to-model-compression
Repo https://github.com/NVlabs/condensa
Framework pytorch
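
Condensa's selling point is that compression strategies are composed programmatically from simple operators. The snippet below only illustrates that composition pattern on a raw weight tensor with hypothetical helpers (magnitude pruning followed by uniform quantization); it is not Condensa's actual API, which is documented in the linked repo.

```python
# Generic "compose simple operators into a strategy" illustration (hypothetical helpers).
import numpy as np

def magnitude_prune(w, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of the weights."""
    k = int(sparsity * w.size)
    if k == 0:
        return w
    threshold = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    return np.where(np.abs(w) <= threshold, 0.0, w)

def quantize_uniform(w, bits=8):
    """Symmetric uniform quantization to 2^(bits-1) - 1 positive levels."""
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    return np.round(w / scale) * scale if scale > 0 else w

def compress(w, scheme):
    """Apply a list of operators in order, i.e. a 'strategy' composed from simple steps."""
    for op in scheme:
        w = op(w)
    return w

w = np.random.randn(256, 256).astype(np.float32)
w_compressed = compress(w, [lambda x: magnitude_prune(x, 0.9), quantize_uniform])
```

The missing piece relative to Condensa is the search: given a strategy like the one above and an objective such as runtime, Condensa's constrained Bayesian optimization chooses the sparsity ratios automatically rather than hard-coding `0.9`.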

Scalable Bayesian dynamic covariance modeling with variational Wishart and inverse Wishart processes

Title Scalable Bayesian dynamic covariance modeling with variational Wishart and inverse Wishart processes
Authors Creighton Heaukulani, Mark van der Wilk
Abstract We implement gradient-based variational inference routines for Wishart and inverse Wishart processes, which we apply as Bayesian models for the dynamic, heteroskedastic covariance matrix of a multivariate time series. The Wishart and inverse Wishart processes are constructed from i.i.d. Gaussian processes, existing variational inference algorithms for which form the basis of our approach. These methods are easy to implement as a black box and scale favorably with the length of the time series; however, they fail in the case of the Wishart process, an issue we resolve with a simple modification into an additive white noise parameterization of the model. This modification is also key to implementing a factored variant of the construction, allowing inference to additionally scale to high-dimensional covariance matrices. Through experimentation, we demonstrate that some (but not all) model variants outperform multivariate GARCH when forecasting the covariances of returns on financial instruments.
Tasks Gaussian Processes, Time Series
Published 2019-06-22
URL https://arxiv.org/abs/1906.09360v2
PDF https://arxiv.org/pdf/1906.09360v2.pdf
PWC https://paperswithcode.com/paper/scalable-bayesian-dynamic-covariance-modeling
Repo https://github.com/ckheaukulani/swpr
Framework tf
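
The generative construction behind the model is easy to sketch: a dynamic covariance is built from i.i.d. GP draws as Sigma(t) = A F(t) F(t)^T A^T, plus a small additive term that keeps the matrices well-conditioned (the role played by the white-noise parameterization in the paper). The prior sampling via a Cholesky of an RBF kernel below stands in for the variational GP machinery; the lengthscale, scale matrix and jitter are illustrative choices.

```python
# Generative sketch of a Wishart process built from i.i.d. GPs (prior samples, not inference).
import numpy as np

def rbf_kernel(t, lengthscale=1.0):
    d = t[:, None] - t[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def sample_wishart_process(t, dim, dof, rng, jitter=1e-4):
    """Return an array of shape (len(t), dim, dim) of dynamic covariance matrices."""
    K = rbf_kernel(t) + 1e-8 * np.eye(len(t))
    L = np.linalg.cholesky(K)
    # dim * dof i.i.d. GP draws over time, organised as F[t] with shape (dim, dof).
    F = (L @ rng.standard_normal((len(t), dim * dof))).reshape(len(t), dim, dof)
    A = rng.standard_normal((dim, dim)) / np.sqrt(dim)   # scale matrix (learned in the paper)
    AF = A @ F                                            # broadcasts over the time axis
    sigmas = np.einsum('tik,tjk->tij', AF, AF)            # A F F^T A^T at each time step
    return sigmas + jitter * np.eye(dim)                  # additive term keeps matrices PD

rng = np.random.default_rng(0)
covs = sample_wishart_process(np.linspace(0, 10, 200), dim=3, dof=5, rng=rng)
```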

Transfer Learning with intelligent training data selection for prediction of Alzheimer’s Disease

Title Transfer Learning with intelligent training data selection for prediction of Alzheimer’s Disease
Authors Naimul Mefraz Khan, Marcia Hon, Nabila Abraham
Abstract Detection of Alzheimer’s Disease (AD) from neuroimaging data such as MRI through machine learning has been a subject of intense research in recent years. The recent success of deep learning in computer vision has progressed such research further. However, common limitations of such algorithms are reliance on a large number of training images and the requirement of careful optimization of the architecture of deep networks. In this paper, we attempt to solve these issues with transfer learning, where the state-of-the-art VGG architecture is initialized with pre-trained weights from large benchmark datasets consisting of natural images. The network is then fine-tuned with layer-wise tuning, where only a pre-defined group of layers is trained on MRI images. To shrink the training data size, we employ image entropy to select the most informative slices. Through experimentation on the ADNI dataset, we show that with a training set 10 to 20 times smaller than those of other contemporary methods, we reach state-of-the-art performance on the AD vs. NC, AD vs. MCI, and MCI vs. NC classification problems, with a 4% and a 7% increase in accuracy over the state-of-the-art for AD vs. MCI and MCI vs. NC, respectively. We also provide a detailed analysis of the effect of the intelligent training data selection method, changing the training size, and changing the number of layers to be fine-tuned. Finally, we provide Class Activation Maps (CAM) that demonstrate how the proposed model focuses on discriminative image regions that are neuropathologically relevant, and can help the healthcare practitioner in interpreting the model’s decision making process.
Tasks Decision Making, Transfer Learning
Published 2019-06-04
URL https://arxiv.org/abs/1906.01160v1
PDF https://arxiv.org/pdf/1906.01160v1.pdf
PWC https://paperswithcode.com/paper/transfer-learning-with-intelligent-training
Repo https://github.com/marciahon29/AlzheimersProject
Framework none
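
The "intelligent training data selection" step described in the abstract reduces to ranking MRI slices by image entropy and keeping only the most informative ones for fine-tuning. A minimal sketch follows; the histogram bin count, slice axis, and number of kept slices are illustrative choices rather than the paper's exact settings.

```python
# Entropy-based slice selection: keep the k most informative slices of a 3-D volume.
import numpy as np

def slice_entropy(img2d, bins=256):
    """Shannon entropy (bits) of a 2-D slice's intensity histogram."""
    hist, _ = np.histogram(img2d, bins=bins)
    p = hist[hist > 0].astype(float)
    p /= p.sum()
    return -np.sum(p * np.log2(p))

def select_informative_slices(volume, k=32, axis=0):
    """volume: 3-D MRI array; return indices of the k slices with highest entropy."""
    entropies = [slice_entropy(np.take(volume, i, axis=axis)) for i in range(volume.shape[axis])]
    return np.argsort(entropies)[::-1][:k]

volume = np.random.rand(160, 192, 192)          # stand-in for a preprocessed MRI volume
keep = select_informative_slices(volume, k=32)  # only these slices feed the fine-tuned VGG
```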

Transfer Learning Toolkit: Primers and Benchmarks

Title Transfer Learning Toolkit: Primers and Benchmarks
Authors Fuzhen Zhuang, Keyu Duan, Tongjia Guo, Yongchun Zhu, Dongbo Xi, Zhiyuan Qi, Qing He
Abstract The transfer learning toolkit wraps the code of 17 transfer learning models and provides integrated interfaces, allowing users to use those models by calling a simple function. It is easy for primary researchers to use this toolkit and to choose proper models for real-world applications. The toolkit is written in Python and distributed under the MIT open source license. In this paper, the current state of this toolkit is described and the necessary environment setting and usage are introduced.
Tasks Transfer Learning
Published 2019-11-20
URL https://arxiv.org/abs/1911.08967v1
PDF https://arxiv.org/pdf/1911.08967v1.pdf
PWC https://paperswithcode.com/paper/transfer-learning-toolkit-primers-and
Repo https://github.com/FuzhenZhuang/Transfer-Learning-Toolkit
Framework none

Jointly embedding the local and global relations of heterogeneous graph for rumor detection

Title Jointly embedding the local and global relations of heterogeneous graph for rumor detection
Authors Chunyuan Yuan, Qianwen Ma, Wei Zhou, Jizhong Han, Songlin Hu
Abstract The development of social media has revolutionized the way people communicate, share information and make decisions, but it also provides an ideal platform for publishing and spreading rumors. Existing rumor detection methods focus on finding clues from text content, user profiles, and propagation patterns. However, the local semantic relation and global structural information in the message propagation graph have not been well utilized by previous works. In this paper, we present a novel global-local attention network (GLAN) for rumor detection, which jointly encodes the local semantic and global structural information. We first generate a better integrated representation for each source tweet by fusing the semantic information of related retweets with the attention mechanism. Then, we model the global relationships among all source tweets, retweets, and users as a heterogeneous graph to capture the rich structural information for rumor detection. We conduct experiments on three real-world datasets, and the results demonstrate that GLAN significantly outperforms the state-of-the-art models in both rumor detection and early detection scenarios.
Tasks
Published 2019-09-10
URL https://arxiv.org/abs/1909.04465v2
PDF https://arxiv.org/pdf/1909.04465v2.pdf
PWC https://paperswithcode.com/paper/jointly-embedding-the-local-and-global
Repo https://github.com/chunyuanY/RumorDetection
Framework pytorch
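
The "local" half of GLAN, fusing retweet representations into the source-tweet representation with attention, can be sketched with plain scaled dot-product attention using the source tweet as the query. The global heterogeneous-graph encoding is not shown; the residual-style fusion at the end is an assumption rather than the paper's exact formulation.

```python
# Minimal attention fusion of retweet embeddings into a source-tweet embedding (illustrative).
import numpy as np

def attention_fuse(source_vec, retweet_mat):
    """source_vec: (d,), retweet_mat: (m, d) -> fused (d,) representation."""
    scores = retweet_mat @ source_vec / np.sqrt(len(source_vec))   # scaled dot-product scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                                        # softmax over retweets
    context = weights @ retweet_mat                                 # attention-weighted retweets
    return source_vec + context                                     # fuse with the source tweet

src = np.random.randn(64)          # source tweet embedding (stand-in)
retweets = np.random.randn(12, 64) # related retweet embeddings (stand-in)
fused = attention_fuse(src, retweets)
```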

Forward and Backward Information Retention for Accurate Binary Neural Networks

Title Forward and Backward Information Retention for Accurate Binary Neural Networks
Authors Haotong Qin, Ruihao Gong, Xianglong Liu, Mingzhu Shen, Ziran Wei, Fengwei Yu, Jingkuan Song
Abstract Weight and activation binarization is an effective approach to deep neural network compression and can accelerate the inference by leveraging bitwise operations. Although many binarization methods have improved the accuracy of the model by minimizing the quantization error in forward propagation, there remains a noticeable performance gap between the binarized model and the full-precision one. Our empirical study indicates that the quantization brings information loss in both forward and backward propagation, which is the bottleneck of training accurate binary neural networks. To address these issues, we propose an Information Retention Network (IR-Net) to retain the information contained in the forward activations and backward gradients. IR-Net mainly relies on two technical contributions: (1) Libra Parameter Binarization (Libra-PB): simultaneously minimizing both quantization error and information loss of parameters by balanced and standardized weights in forward propagation; (2) Error Decay Estimator (EDE): minimizing the information loss of gradients by gradually approximating the sign function in backward propagation, jointly considering the updating ability and accurate gradients. We are the first to investigate both forward and backward processes of binary networks from the unified information perspective, which provides new insight into the mechanism of network binarization. Comprehensive experiments with various network structures on CIFAR-10 and ImageNet datasets manifest that the proposed IR-Net can consistently outperform state-of-the-art quantization methods.
Tasks Neural Network Compression, Quantization
Published 2019-09-24
URL https://arxiv.org/abs/1909.10788v4
PDF https://arxiv.org/pdf/1909.10788v4.pdf
PWC https://paperswithcode.com/paper/ir-net-forward-and-backward-information
Repo https://github.com/JDAI-CV/dabnn
Framework none
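
The forward half of Libra-PB as described in the abstract is simple to sketch: balance (zero-mean) and standardize the weights before taking the sign, with a per-layer scaling factor. The mean-absolute-value scale used below is a simplification (the paper restricts it further for efficiency), and the backward EDE estimator is only indicated in a comment.

```python
# Sketch of balanced, standardized weight binarization in the spirit of Libra-PB (forward only).
import torch

def libra_pb_binarize(w: torch.Tensor):
    """Balance and standardize the weights, then binarize with a per-layer scale."""
    w_std = (w - w.mean()) / (w.std() + 1e-12)   # balanced (zero-mean) and standardized weights
    scale = w_std.abs().mean()                    # per-layer scaling factor (simplified)
    return scale * torch.sign(w_std)

# During training, the non-differentiable sign is handled in the backward pass by EDE:
# a gradually sharpened approximation (e.g. the derivative of k*tanh(t*w) with t increasing
# over training); a straight-through estimator is the simplest stand-in.
w = torch.randn(128, 64)
w_bin = libra_pb_binarize(w)
```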

Focused Quantization for Sparse CNNs

Title Focused Quantization for Sparse CNNs
Authors Yiren Zhao, Xitong Gao, Daniel Bates, Robert Mullins, Cheng-Zhong Xu
Abstract Deep convolutional neural networks (CNNs) are powerful tools for a wide range of vision tasks, but the enormous amount of memory and compute resources required by CNNs pose a challenge in deploying them on constrained devices. Existing compression techniques, while excelling at reducing model sizes, struggle to be computationally friendly. In this paper, we attend to the statistical properties of sparse CNNs and present focused quantization, a novel quantization strategy based on power-of-two values, which exploits the weight distributions after fine-grained pruning. The proposed method dynamically discovers the most effective numerical representation for weights in layers with varying sparsities, significantly reducing model sizes. Multiplications in quantized CNNs are replaced with much cheaper bit-shift operations for efficient inference. Coupled with lossless encoding, we built a compression pipeline that provides CNNs with high compression ratios (CR), low computation cost and minimal loss in accuracy. On ResNet-50, we achieved an 18.08x CR with only 0.24% loss in top-5 accuracy, outperforming existing compression methods. We fully compressed a ResNet-18 and found that it is not only higher in CR and top-5 accuracy, but also more hardware-efficient, as it requires fewer logic gates to implement when compared to other state-of-the-art quantization methods assuming the same throughput.
Tasks Model Compression, Neural Network Compression, Quantization
Published 2019-03-07
URL https://arxiv.org/abs/1903.03046v3
PDF https://arxiv.org/pdf/1903.03046v3.pdf
PWC https://paperswithcode.com/paper/efficient-and-effective-quantization-for
Repo https://github.com/deep-fry/mayo
Framework tf
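
The core numerical trick exploited here, mapping the surviving (non-zero) weights of a pruned layer to powers of two so multiplications become bit shifts, can be sketched directly. The real method additionally fits the representation to each layer's weight distribution; the exponent range below is an illustrative choice.

```python
# Power-of-two quantization of a pruned (sparse) weight tensor (illustrative exponent range).
import numpy as np

def pow2_quantize(w, min_exp=-8, max_exp=0):
    """Map each non-zero weight to a nearby signed power of two; keep pruned zeros as zero."""
    q = np.zeros_like(w)
    nz = w != 0
    exps = np.clip(np.round(np.log2(np.abs(w[nz]))), min_exp, max_exp)
    q[nz] = np.sign(w[nz]) * 2.0 ** exps
    return q

w = np.random.randn(512) * 0.1
w[np.abs(w) < 0.05] = 0.0          # fine-grained pruning leaves a sparse tensor
w_q = pow2_quantize(w)             # multiplications by w_q reduce to bit shifts at inference
```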

FastFCN: Rethinking Dilated Convolution in the Backbone for Semantic Segmentation

Title FastFCN: Rethinking Dilated Convolution in the Backbone for Semantic Segmentation
Authors Huikai Wu, Junge Zhang, Kaiqi Huang, Kongming Liang, Yizhou Yu
Abstract Modern approaches for semantic segmentation usually employ dilated convolutions in the backbone to extract high-resolution feature maps, which brings heavy computational complexity and a large memory footprint. To replace the time- and memory-consuming dilated convolutions, we propose a novel joint upsampling module named Joint Pyramid Upsampling (JPU) by formulating the task of extracting high-resolution feature maps into a joint upsampling problem. With the proposed JPU, our method reduces the computational complexity by more than three times without performance loss. Experiments show that JPU is superior to other upsampling modules, and can be plugged into many existing approaches to reduce computational complexity and improve performance. By replacing dilated convolutions with the proposed JPU module, our method achieves state-of-the-art performance on the Pascal Context dataset (mIoU of 53.13%) and the ADE20K dataset (final score of 0.5584) while running 3 times faster.
Tasks Semantic Segmentation
Published 2019-03-28
URL http://arxiv.org/abs/1903.11816v1
PDF http://arxiv.org/pdf/1903.11816v1.pdf
PWC https://paperswithcode.com/paper/fastfcn-rethinking-dilated-convolution-in-the
Repo https://github.com/wuhuikai/FastFCN
Framework pytorch
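
A simplified PyTorch sketch of the JPU idea helps make the abstract concrete: upsample the backbone's lower-resolution feature maps to a common resolution, concatenate them, and apply parallel dilated convolutions on the joint input. Channel counts, the dilation set, and the use of plain (rather than separable) convolutions are illustrative simplifications of the module in the linked repo.

```python
# Simplified Joint Pyramid Upsampling (JPU) module (illustrative, not the reference design).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleJPU(nn.Module):
    def __init__(self, in_channels=(512, 1024, 2048), width=256, dilations=(1, 2, 4, 8)):
        super().__init__()
        self.reduce = nn.ModuleList(nn.Conv2d(c, width, 1) for c in in_channels)
        self.dilated = nn.ModuleList(
            nn.Conv2d(width * len(in_channels), width, 3, padding=d, dilation=d)
            for d in dilations
        )

    def forward(self, feats):
        """feats: backbone features ordered from highest to lowest resolution."""
        target = feats[0].shape[-2:]
        ups = [F.interpolate(r(f), size=target, mode='bilinear', align_corners=False)
               for r, f in zip(self.reduce, feats)]
        x = torch.cat(ups, dim=1)                      # joint upsampling input
        return torch.cat([d(x) for d in self.dilated], dim=1)

jpu = SimpleJPU()
feats = [torch.randn(1, 512, 64, 64), torch.randn(1, 1024, 32, 32), torch.randn(1, 2048, 16, 16)]
out = jpu(feats)   # high-resolution features without dilated convolutions in the backbone
```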

Evolving Deep Neural Networks by Multi-objective Particle Swarm Optimization for Image Classification

Title Evolving Deep Neural Networks by Multi-objective Particle Swarm Optimization for Image Classification
Authors Bin Wang, Yanan Sun, Bing Xue, Mengjie Zhang
Abstract In recent years, convolutional neural networks (CNNs) have become deeper in order to achieve better classification accuracy in image classification. However, it is difficult to deploy the state-of-the-art deep CNNs for industrial use due to the difficulty of manually fine-tuning the hyperparameters and the trade-off between classification accuracy and computational cost. This paper proposes a novel multi-objective optimization method for evolving state-of-the-art deep CNNs in real-life applications, which automatically evolves the non-dominated solutions at the Pareto front. Three major contributions are made: Firstly, a new encoding strategy is designed to encode one of the best state-of-the-art CNNs; With the classification accuracy and the number of floating point operations as the two objectives, a multi-objective particle swarm optimization method is developed to evolve the non-dominated solutions; Last but not least, a new infrastructure is designed to boost the experiments by concurrently running the experiments on multiple GPUs across multiple machines, and a Python library is developed and released to manage the infrastructure. The experimental results demonstrate that the non-dominated solutions found by the proposed algorithm form a clear Pareto front, and the proposed infrastructure is able to almost linearly reduce the running time.
Tasks Image Classification
Published 2019-03-21
URL http://arxiv.org/abs/1904.09035v2
PDF http://arxiv.org/pdf/1904.09035v2.pdf
PWC https://paperswithcode.com/paper/190409035
Repo https://github.com/wwwbbb8510/cudam
Framework none
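
The Pareto-dominance bookkeeping underlying the multi-objective search is worth spelling out: with classification error and FLOPs as the two minimization objectives, only the non-dominated candidate architectures are kept. A small sketch with made-up candidate values:

```python
# Pareto-front filtering over (error, FLOPs) pairs, both to be minimized (illustrative values).
import numpy as np

def non_dominated(points):
    """points: (n, 2) array of objectives; return a boolean mask of Pareto-optimal rows."""
    mask = np.ones(len(points), dtype=bool)
    for i in range(len(points)):
        if not mask[i]:
            continue
        # A point dominates i if it is no worse in both objectives and strictly better in one.
        dominates_i = np.all(points <= points[i], axis=1) & np.any(points < points[i], axis=1)
        if dominates_i.any():
            mask[i] = False
    return mask

candidates = np.array([[0.08, 4.1], [0.07, 6.0], [0.10, 2.0], [0.09, 5.5]])  # (error, GFLOPs)
front = candidates[non_dominated(candidates)]   # the swarm's current Pareto front
```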

Deep Learning for Image Super-resolution: A Survey

Title Deep Learning for Image Super-resolution: A Survey
Authors Zhihao Wang, Jian Chen, Steven C. H. Hoi
Abstract Image Super-Resolution (SR) is an important class of image processing techniques to enhance the resolution of images and videos in computer vision. Recent years have witnessed remarkable progress of image super-resolution using deep learning techniques. This article aims to provide a comprehensive survey on recent advances of image super-resolution using deep learning approaches. In general, we can roughly group the existing studies of SR techniques into three major categories: supervised SR, unsupervised SR, and domain-specific SR. In addition, we also cover some other important issues, such as publicly available benchmark datasets and performance evaluation metrics. Finally, we conclude this survey by highlighting several future directions and open issues which should be further addressed by the community in the future.
Tasks Image Super-Resolution, Super-Resolution
Published 2019-02-16
URL https://arxiv.org/abs/1902.06068v2
PDF https://arxiv.org/pdf/1902.06068v2.pdf
PWC https://paperswithcode.com/paper/deep-learning-for-image-super-resolution-a
Repo https://github.com/impredicative/irc-url-title-bot
Framework tf

Improving SIEM for Critical SCADA Water Infrastructures Using Machine Learning

Title Improving SIEM for Critical SCADA Water Infrastructures Using Machine Learning
Authors Hanan Hindy, David Brosset, Ethan Bayne, Amar Seeam, Xavier Bellekens
Abstract Networked Control Systems (NCS) have been used in many industrial processes. They aim to reduce the human-factor burden and efficiently handle the complex processes and communication of those systems. Supervisory control and data acquisition (SCADA) systems are used in industrial, infrastructure and facility processes (e.g. manufacturing, fabrication, oil and water pipelines, building ventilation, etc.). Like other Internet of Things (IoT) implementations, SCADA systems are vulnerable to cyber-attacks; therefore, robust anomaly detection is a major requirement. However, having an accurate anomaly detection system is not an easy task, due to the difficulty of differentiating between cyber-attacks and internal system failures (e.g. hardware failures). In this paper, we present a model that detects anomalous events in a water system controlled by SCADA. Six machine learning techniques have been used in building and evaluating the model. The model classifies different anomalous events including hardware failures (e.g. sensor failures), sabotage and cyber-attacks (e.g. DoS and spoofing). Unlike other detection systems, our proposed work focuses on notifying the operator when an anomaly occurs, along with the probability of the event occurring. This additional information helps accelerate the mitigation process. The model is trained and tested using a real-world dataset.
Tasks Anomaly Detection, Cyber Attack Detection
Published 2019-03-06
URL http://arxiv.org/abs/1904.05724v1
PDF http://arxiv.org/pdf/1904.05724v1.pdf
PWC https://paperswithcode.com/paper/improving-siem-for-critical-scada-water
Repo https://github.com/AbertayMachineLearningGroup/machine-learning-SIEM-water-infrastructure
Framework none
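
A minimal sketch of the kind of classifier-with-probabilities pipeline the paper evaluates: a supervised model labels each event (normal, sensor failure, DoS, spoofing, ...) and reports the class probability alongside the alert. The features and labels below are synthetic stand-ins, not the real SCADA water-system dataset, and random forests are only one of the six techniques compared.

```python
# Event classification with per-class probabilities reported to the operator (synthetic data).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 8))              # sensor/actuator readings per event (stand-in)
y = rng.integers(0, 4, size=2000)           # 0=normal, 1=sensor fault, 2=DoS, 3=spoofing (stand-in)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)             # per-class probabilities for each test event
alerts = proba.argmax(axis=1)               # predicted event type
confidence = proba.max(axis=1)              # probability reported with the alert to the operator
```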

An adaptive simulated annealing EM algorithm for inference on non-homogeneous hidden Markov models

Title An adaptive simulated annealing EM algorithm for inference on non-homogeneous hidden Markov models
Authors Aliaksandr Hubin
Abstract Non-homogeneous hidden Markov models (NHHMM) are a subclass of dependent mixture models used for semi-supervised learning, where both transition probabilities between the latent states and mean parameter of the probability distribution of the responses (for a given state) depend on the set of $p$ covariates. A priori we do not know which (and how) covariates influence the transition probabilities and the mean parameters. This induces a complex combinatorial optimization problem for model selection with $4^p$ potential configurations. To address the problem, in this article we propose an adaptive (A) simulated annealing (SA) expectation maximization (EM) algorithm (ASA-EM) for joint optimization of models and their parameters with respect to a criterion of interest.
Tasks Combinatorial Optimization, Model Selection
Published 2019-12-20
URL https://arxiv.org/abs/1912.09733v1
PDF https://arxiv.org/pdf/1912.09733v1.pdf
PWC https://paperswithcode.com/paper/an-adaptive-simulated-annealing-em-algorithm
Repo https://github.com/aliaksah/depmixS4pp
Framework none
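
The outer loop of ASA-EM can be pictured as simulated annealing over a binary configuration vector stating which of the p covariates enter the transition probabilities and which enter the mean parameters (hence the 4^p configurations). In the toy sketch below, `score` stands in for the EM-fitted model selection criterion, which is not implemented, and the fixed cooling schedule is a simplification of the adaptive scheme in the paper.

```python
# Toy simulated-annealing search over covariate-inclusion configurations (EM criterion mocked).
import math
import random

def anneal(p, score, iters=500, t0=1.0, cooling=0.995):
    """Maximize `score` over 2p binary inclusion indicators via simulated annealing."""
    state = [random.randint(0, 1) for _ in range(2 * p)]   # [transition flags | mean flags]
    cur_s = score(state)
    best, best_s, temp = list(state), cur_s, t0
    for _ in range(iters):
        cand = list(state)
        cand[random.randrange(2 * p)] ^= 1                  # flip one inclusion indicator
        cand_s = score(cand)
        if cand_s >= cur_s or random.random() < math.exp((cand_s - cur_s) / temp):
            state, cur_s = cand, cand_s                     # Metropolis acceptance
            if cur_s > best_s:
                best, best_s = list(state), cur_s
        temp *= cooling                                      # the paper adapts this schedule
    return best, best_s

# Toy criterion: prefer a sparse model that includes covariates 0 and 3 (hypothetical).
toy_score = lambda s: 2 * (s[0] + s[3]) - 0.5 * sum(s)
config, value = anneal(p=5, score=toy_score)
```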