February 2, 2020

3301 words 16 mins read

Paper Group AWR 67

Quantum Entropy Scoring for Fast Robust Mean Estimation and Improved Outlier Detection. GraphSAINT: Graph Sampling Based Inductive Learning Method. Improving Model-based Genetic Programming for Symbolic Regression of Small Expressions. A Programmable Approach to Model Compression. Scalable Bayesian dynamic covariance modeling with variational Wisha …

Quantum Entropy Scoring for Fast Robust Mean Estimation and Improved Outlier Detection

Title Quantum Entropy Scoring for Fast Robust Mean Estimation and Improved Outlier Detection
Authors Yihe Dong, Samuel B. Hopkins, Jerry Li
Abstract We study two problems in high-dimensional robust statistics: \emph{robust mean estimation} and \emph{outlier detection}. In robust mean estimation the goal is to estimate the mean $\mu$ of a distribution on $\mathbb{R}^d$ given $n$ independent samples, an $\varepsilon$-fraction of which have been corrupted by a malicious adversary. In outlier detection the goal is to assign an \emph{outlier score} to each element of a data set such that elements more likely to be outliers are assigned higher scores. Our algorithms for both problems are based on a new outlier scoring method we call QUE-scoring based on \emph{quantum entropy regularization}. For robust mean estimation, this yields the first algorithm with optimal error rates and nearly-linear running time $\widetilde{O}(nd)$ in all parameters, improving on the previous fastest running time $\widetilde{O}(\min(nd/\varepsilon^6, nd^2))$. For outlier detection, we evaluate the performance of QUE-scoring via extensive experiments on synthetic and real data, and demonstrate that it often performs better than previously proposed algorithms. Code for these experiments is available at https://github.com/twistedcubic/que-outlier-detection .
Tasks Outlier Detection
Published 2019-06-26
URL https://arxiv.org/abs/1906.11366v1
PDF https://arxiv.org/pdf/1906.11366v1.pdf
PWC https://paperswithcode.com/paper/quantum-entropy-scoring-for-fast-robust-mean
Repo https://github.com/twistedcubic/que-outlier-detection
Framework pytorch
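
To make the QUE-scoring idea above more concrete, here is a minimal dense-matrix sketch: score each point by a quadratic form under a matrix exponential of the (normalized) empirical covariance, so that directions of unusually large variance are emphasized exponentially. This is only an illustrative reading of the abstract, not the authors' nearly-linear-time implementation (see the linked repo for that); the `alpha` value and the spectral-norm normalization are assumptions.

```python
# Illustrative QUE-style outlier scoring (dense, O(d^3) per call), not the paper's fast algorithm.
import numpy as np
from scipy.linalg import expm

def que_scores(X, alpha=4.0):
    """Score each row of X; higher scores indicate more likely outliers."""
    Xc = X - X.mean(axis=0)                      # center the data
    Sigma = Xc.T @ Xc / len(X)                   # empirical covariance (d x d)
    Sigma_n = Sigma / np.linalg.norm(Sigma, 2)   # normalize by the spectral norm (assumption)
    U = expm(alpha * Sigma_n)                    # quantum-entropy-style exponential weighting
    U /= np.trace(U)
    return np.einsum('ij,jk,ik->i', Xc, U, Xc)   # quadratic form x_i^T U x_i per sample

# Toy usage: 500 inliers plus 25 shifted outliers in 50 dimensions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(500, 50)), rng.normal(loc=2.0, size=(25, 50))])
scores = que_scores(X)          # the last 25 (shifted) points tend to receive the highest scores
```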

GraphSAINT: Graph Sampling Based Inductive Learning Method

Title GraphSAINT: Graph Sampling Based Inductive Learning Method
Authors Hanqing Zeng, Hongkuan Zhou, Ajitesh Srivastava, Rajgopal Kannan, Viktor Prasanna
Abstract Graph Convolutional Networks (GCNs) are powerful models for learning representations of attributed graphs. To scale GCNs to large graphs, state-of-the-art methods use various layer sampling techniques to alleviate the “neighbor explosion” problem during minibatch training. We propose GraphSAINT, a graph sampling based inductive learning method that improves training efficiency and accuracy in a fundamentally different way. By changing perspective, GraphSAINT constructs minibatches by sampling the training graph, rather than the nodes or edges across GCN layers. In each iteration, a complete GCN is built from the properly sampled subgraph. Thus, we ensure a fixed number of well-connected nodes in all layers. We further propose a normalization technique to eliminate bias, and sampling algorithms for variance reduction. Importantly, we can decouple the sampling from the forward and backward propagation, and extend GraphSAINT with many architecture variants (e.g., graph attention, jumping connection). GraphSAINT demonstrates superior performance in both accuracy and training time on five large graphs, and achieves new state-of-the-art F1 scores for PPI (0.995) and Reddit (0.970).
Tasks Graph Embedding, Graph Representation Learning, Node Classification
Published 2019-07-10
URL https://arxiv.org/abs/1907.04931v4
PDF https://arxiv.org/pdf/1907.04931v4.pdf
PWC https://paperswithcode.com/paper/graphsaint-graph-sampling-based-inductive
Repo https://github.com/GraphSAINT/GraphSAINT
Framework tf
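
The core idea of graph-sampling-based minibatching can be sketched in a few lines: sample a subgraph of the training graph, then run a full GCN on that induced subgraph. The uniform node sampler and the trailing pseudo-training-step shown here are simplifications; GraphSAINT's actual samplers and bias/variance normalizations live in the linked repo.

```python
# A simplified GraphSAINT-style subgraph minibatch (uniform node sampler, for illustration only).
import numpy as np
import scipy.sparse as sp

def sample_subgraph(adj: sp.csr_matrix, batch_nodes: int, rng):
    """Uniform node sampler: return the induced-subgraph adjacency and the sampled node ids."""
    nodes = rng.choice(adj.shape[0], size=batch_nodes, replace=False)
    return adj[nodes][:, nodes], nodes           # induced subgraph

def gcn_normalize(adj):
    """Symmetric GCN normalization D^-1/2 (A + I) D^-1/2."""
    adj = adj + sp.eye(adj.shape[0])
    d_inv_sqrt = sp.diags(1.0 / np.sqrt(np.asarray(adj.sum(1)).ravel()))
    return d_inv_sqrt @ adj @ d_inv_sqrt

# One training iteration would then look like (gcn, features, labels are placeholders):
#   sub_adj, nodes = sample_subgraph(adj, 2000, rng)
#   logits = gcn(gcn_normalize(sub_adj), features[nodes])   # full GCN on the sampled subgraph
#   loss = cross_entropy(logits, labels[nodes])              # paper: loss reweighted to remove bias
```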

Improving Model-based Genetic Programming for Symbolic Regression of Small Expressions

Title Improving Model-based Genetic Programming for Symbolic Regression of Small Expressions
Authors Marco Virgolin, Tanja Alderliesten, Cees Witteveen, Peter A. N. Bosman
Abstract The Gene-pool Optimal Mixing Evolutionary Algorithm (GOMEA) is a model-based EA framework that has been shown to perform well in several domains, including Genetic Programming (GP). Differently from traditional EAs where variation acts blindly, GOMEA learns a model of interdependencies within the genotype, i.e., the linkage, to estimate what patterns to propagate. In this article, we study the role of Linkage Learning (LL) performed by GOMEA in Symbolic Regression (SR). We show that the non-uniformity in the distribution of the genotype in GP populations negatively biases LL, and propose a method to correct for this. We also propose approaches to improve LL when ephemeral random constants are used. Furthermore, we adapt a scheme of interleaving runs to alleviate the burden of tuning the population size, a crucial parameter for LL, to SR. We run experiments on 10 real-world datasets, enforcing a strict limitation on solution size, to enable interpretability. We find that the new LL method outperforms the standard one, and that GOMEA outperforms both traditional and semantic GP. We also find that the small solutions evolved by GOMEA are competitive with tuned decision trees, making GOMEA a promising new approach to SR.
Tasks
Published 2019-04-03
URL https://arxiv.org/abs/1904.02050v3
PDF https://arxiv.org/pdf/1904.02050v3.pdf
PWC https://paperswithcode.com/paper/model-based-genetic-programming-with-gomea
Repo https://github.com/marcovirgolin/GP-GOMEA
Framework none
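
For readers unfamiliar with GOMEA, the variation operator at its heart, Gene-pool Optimal Mixing (GOM), can be illustrated on a toy bitstring problem rather than on GP trees: gene subsets are copied from random donors and kept only if fitness does not worsen. The fixed pairwise linkage sets below are a stand-in; GOMEA instead learns linkage from the population, which is exactly the part the paper improves for symbolic regression.

```python
# Toy Gene-pool Optimal Mixing step on OneMax (illustrative; not the GP-GOMEA implementation).
import random

def gom_step(solution, population, linkage_sets, fitness):
    """Greedily copy gene subsets from random donors, keeping changes that don't hurt fitness."""
    best = list(solution)
    best_f = fitness(best)
    for genes in linkage_sets:
        donor = random.choice(population)
        trial = list(best)
        for g in genes:                       # overwrite the whole linkage set at once
            trial[g] = donor[g]
        trial_f = fitness(trial)
        if trial_f >= best_f:                 # accept if not worse
            best, best_f = trial, trial_f
    return best

# Toy usage: OneMax with fixed pairwise linkage sets.
n = 20
population = [[random.randint(0, 1) for _ in range(n)] for _ in range(50)]
linkage_sets = [(i, i + 1) for i in range(0, n, 2)]
improved = gom_step(population[0], population, linkage_sets, fitness=sum)
```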

A Programmable Approach to Model Compression

Title A Programmable Approach to Model Compression
Authors Vinu Joseph, Saurav Muralidharan, Animesh Garg, Michael Garland, Ganesh Gopalakrishnan
Abstract Deep neural networks frequently contain far more weights, represented at a higher precision, than are required for the specific task which they are trained to perform. Consequently, they can often be compressed using techniques such as weight pruning and quantization that reduce both model size and inference time without appreciable loss in accuracy. Compressing models before they are deployed can therefore result in significantly more efficient systems. However, while the results are desirable, finding the best compression strategy for a given neural network, target platform, and optimization objective often requires extensive experimentation. Moreover, finding optimal hyperparameters for a given compression strategy typically results in even more expensive, frequently manual, trial-and-error exploration. In this paper, we introduce a programmable system for model compression called Condensa. Users programmatically compose simple operators, in Python, to build complex compression strategies. Given a strategy and a user-provided objective, such as minimization of running time, Condensa uses a novel sample-efficient constrained Bayesian optimization algorithm to automatically infer desirable sparsity ratios. Our experiments on three real-world image classification and language modeling tasks demonstrate memory footprint reductions of up to 65x and runtime throughput improvements of up to 2.22x using at most 10 samples per search. We have released a reference implementation of Condensa at https://github.com/NVlabs/condensa.
Tasks Image Classification, Language Modelling, Model Compression, Quantization
Published 2019-11-06
URL https://arxiv.org/abs/1911.02497v1
PDF https://arxiv.org/pdf/1911.02497v1.pdf
PWC https://paperswithcode.com/paper/a-programmable-approach-to-model-compression
Repo https://github.com/NVlabs/condensa
Framework pytorch
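
Condensa's selling point is that compression strategies are composed programmatically from simple operators. The snippet below only illustrates that composition pattern on a raw weight tensor with hypothetical helpers (magnitude pruning followed by uniform quantization); it is not Condensa's actual API, which is documented in the linked repo.

```python
# Generic "compose simple operators into a strategy" illustration (hypothetical helpers).
import numpy as np

def magnitude_prune(w, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of the weights."""
    k = int(sparsity * w.size)
    if k == 0:
        return w
    threshold = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    return np.where(np.abs(w) <= threshold, 0.0, w)

def quantize_uniform(w, bits=8):
    """Symmetric uniform quantization to 2^(bits-1) - 1 positive levels."""
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    return np.round(w / scale) * scale if scale > 0 else w

def compress(w, scheme):
    """Apply a list of operators in order, i.e. a 'strategy' composed from simple steps."""
    for op in scheme:
        w = op(w)
    return w

w = np.random.randn(256, 256).astype(np.float32)
w_compressed = compress(w, [lambda x: magnitude_prune(x, 0.9), quantize_uniform])
```

The missing piece relative to Condensa is the search: given a strategy like the one above and an objective such as runtime, Condensa's constrained Bayesian optimization chooses the sparsity ratios automatically rather than hard-coding `0.9`.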

Scalable Bayesian dynamic covariance modeling with variational Wishart and inverse Wishart processes

Title Scalable Bayesian dynamic covariance modeling with variational Wishart and inverse Wishart processes
Authors Creighton Heaukulani, Mark van der Wilk
Abstract We implement gradient-based variational inference routines for Wishart and inverse Wishart processes, which we apply as Bayesian models for the dynamic, heteroskedastic covariance matrix of a multivariate time series. The Wishart and inverse Wishart processes are constructed from i.i.d. Gaussian processes, existing variational inference algorithms for which form the basis of our approach. These methods are easy to implement as a black box and scale favorably with the length of the time series; however, they fail in the case of the Wishart process, an issue we resolve with a simple modification into an additive white noise parameterization of the model. This modification is also key to implementing a factored variant of the construction, allowing inference to additionally scale to high-dimensional covariance matrices. Through experimentation, we demonstrate that some (but not all) model variants outperform multivariate GARCH when forecasting the covariances of returns on financial instruments.
Tasks Gaussian Processes, Time Series
Published 2019-06-22
URL https://arxiv.org/abs/1906.09360v2
PDF https://arxiv.org/pdf/1906.09360v2.pdf
PWC https://paperswithcode.com/paper/scalable-bayesian-dynamic-covariance-modeling
Repo https://github.com/ckheaukulani/swpr
Framework tf
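
The generative construction behind the model is easy to sketch: a dynamic covariance is built from i.i.d. GP draws as Sigma(t) = A F(t) F(t)^T A^T, plus a small additive term that keeps the matrices well-conditioned (the role played by the white-noise parameterization in the paper). The prior sampling via a Cholesky of an RBF kernel below stands in for the variational GP machinery; the lengthscale, scale matrix and jitter are illustrative choices.

```python
# Generative sketch of a Wishart process built from i.i.d. GPs (prior samples, not inference).
import numpy as np

def rbf_kernel(t, lengthscale=1.0):
    d = t[:, None] - t[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def sample_wishart_process(t, dim, dof, rng, jitter=1e-4):
    """Return an array of shape (len(t), dim, dim) of dynamic covariance matrices."""
    K = rbf_kernel(t) + 1e-8 * np.eye(len(t))
    L = np.linalg.cholesky(K)
    # dim * dof i.i.d. GP draws over time, organised as F[t] with shape (dim, dof).
    F = (L @ rng.standard_normal((len(t), dim * dof))).reshape(len(t), dim, dof)
    A = rng.standard_normal((dim, dim)) / np.sqrt(dim)   # scale matrix (learned in the paper)
    AF = A @ F                                            # broadcasts over the time axis
    sigmas = np.einsum('tik,tjk->tij', AF, AF)            # A F F^T A^T at each time step
    return sigmas + jitter * np.eye(dim)                  # additive term keeps matrices PD

rng = np.random.default_rng(0)
covs = sample_wishart_process(np.linspace(0, 10, 200), dim=3, dof=5, rng=rng)
```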

Transfer Learning with intelligent training data selection for prediction of Alzheimer’s Disease

Title Transfer Learning with intelligent training data selection for prediction of Alzheimer’s Disease
Authors Naimul Mefraz Khan, Marcia Hon, Nabila Abraham
Abstract Detection of Alzheimer’s Disease (AD) from neuroimaging data such as MRI through machine learning has been a subject of intense research in recent years. The recent success of deep learning in computer vision has progressed such research further. However, common limitations of such algorithms are reliance on a large number of training images and the requirement of careful optimization of the architecture of deep networks. In this paper, we attempt to solve these issues with transfer learning, where the state-of-the-art VGG architecture is initialized with pre-trained weights from large benchmark datasets consisting of natural images. The network is then fine-tuned with layer-wise tuning, where only a pre-defined group of layers is trained on MRI images. To shrink the training data size, we employ image entropy to select the most informative slices. Through experimentation on the ADNI dataset, we show that with a training set 10 to 20 times smaller than those of other contemporary methods, we reach state-of-the-art performance on the AD vs. NC, AD vs. MCI, and MCI vs. NC classification problems, with a 4% and a 7% increase in accuracy over the state-of-the-art for AD vs. MCI and MCI vs. NC, respectively. We also provide a detailed analysis of the effect of the intelligent training data selection method, changing the training size, and changing the number of layers to be fine-tuned. Finally, we provide Class Activation Maps (CAM) that demonstrate how the proposed model focuses on discriminative image regions that are neuropathologically relevant, and can help the healthcare practitioner in interpreting the model’s decision making process.
Tasks Decision Making, Transfer Learning
Published 2019-06-04
URL https://arxiv.org/abs/1906.01160v1
PDF https://arxiv.org/pdf/1906.01160v1.pdf
PWC https://paperswithcode.com/paper/transfer-learning-with-intelligent-training
Repo https://github.com/marciahon29/AlzheimersProject
Framework none
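
The "intelligent training data selection" step described in the abstract reduces to ranking MRI slices by image entropy and keeping only the most informative ones for fine-tuning. A minimal sketch follows; the histogram bin count, slice axis, and number of kept slices are illustrative choices rather than the paper's exact settings.

```python
# Entropy-based slice selection: keep the k most informative slices of a 3-D volume.
import numpy as np

def slice_entropy(img2d, bins=256):
    """Shannon entropy (bits) of a 2-D slice's intensity histogram."""
    hist, _ = np.histogram(img2d, bins=bins)
    p = hist[hist > 0].astype(float)
    p /= p.sum()
    return -np.sum(p * np.log2(p))

def select_informative_slices(volume, k=32, axis=0):
    """volume: 3-D MRI array; return indices of the k slices with highest entropy."""
    entropies = [slice_entropy(np.take(volume, i, axis=axis)) for i in range(volume.shape[axis])]
    return np.argsort(entropies)[::-1][:k]

volume = np.random.rand(160, 192, 192)          # stand-in for a preprocessed MRI volume
keep = select_informative_slices(volume, k=32)  # only these slices feed the fine-tuned VGG
```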

Transfer Learning Toolkit: Primers and Benchmarks

Title Transfer Learning Toolkit: Primers and Benchmarks
Authors Fuzhen Zhuang, Keyu Duan, Tongjia Guo, Yongchun Zhu, Dongbo Xi, Zhiyuan Qi, Qing He
Abstract The transfer learning toolkit wraps the code of 17 transfer learning models and provides integrated interfaces, allowing users to use those models by calling a simple function. It is easy for primary researchers to use this toolkit and to choose proper models for real-world applications. The toolkit is written in Python and distributed under the MIT open source license. In this paper, the current state of this toolkit is described and the necessary environment setting and usage are introduced.
Tasks Transfer Learning
Published 2019-11-20
URL https://arxiv.org/abs/1911.08967v1
PDF https://arxiv.org/pdf/1911.08967v1.pdf
PWC https://paperswithcode.com/paper/transfer-learning-toolkit-primers-and
Repo https://github.com/FuzhenZhuang/Transfer-Learning-Toolkit
Framework none

Jointly embedding the local and global relations of heterogeneous graph for rumor detection

Title Jointly embedding the local and global relations of heterogeneous graph for rumor detection
Authors Chunyuan Yuan, Qianwen Ma, Wei Zhou, Jizhong Han, Songlin Hu
Abstract The development of social media has revolutionized the way people communicate, share information and make decisions, but it also provides an ideal platform for publishing and spreading rumors. Existing rumor detection methods focus on finding clues from text content, user profiles, and propagation patterns. However, the local semantic relation and global structural information in the message propagation graph have not been well utilized by previous works. In this paper, we present a novel global-local attention network (GLAN) for rumor detection, which jointly encodes the local semantic and global structural information. We first generate a better integrated representation for each source tweet by fusing the semantic information of related retweets with the attention mechanism. Then, we model the global relationships among all source tweets, retweets, and users as a heterogeneous graph to capture the rich structural information for rumor detection. We conduct experiments on three real-world datasets, and the results demonstrate that GLAN significantly outperforms the state-of-the-art models in both rumor detection and early detection scenarios.
Tasks
Published 2019-09-10
URL https://arxiv.org/abs/1909.04465v2
PDF https://arxiv.org/pdf/1909.04465v2.pdf
PWC https://paperswithcode.com/paper/jointly-embedding-the-local-and-global
Repo https://github.com/chunyuanY/RumorDetection
Framework pytorch
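
The "local" half of GLAN, fusing retweet representations into the source-tweet representation with attention, can be sketched with plain scaled dot-product attention using the source tweet as the query. The global heterogeneous-graph encoding is not shown; the residual-style fusion at the end is an assumption rather than the paper's exact formulation.

```python
# Minimal attention fusion of retweet embeddings into a source-tweet embedding (illustrative).
import numpy as np

def attention_fuse(source_vec, retweet_mat):
    """source_vec: (d,), retweet_mat: (m, d) -> fused (d,) representation."""
    scores = retweet_mat @ source_vec / np.sqrt(len(source_vec))   # scaled dot-product scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                                        # softmax over retweets
    context = weights @ retweet_mat                                 # attention-weighted retweets
    return source_vec + context                                     # fuse with the source tweet

src = np.random.randn(64)          # source tweet embedding (stand-in)
retweets = np.random.randn(12, 64) # related retweet embeddings (stand-in)
fused = attention_fuse(src, retweets)
```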

Forward and Backward Information Retention for Accurate Binary Neural Networks

Title Forward and Backward Information Retention for Accurate Binary Neural Networks
Authors Haotong Qin, Ruihao Gong, Xianglong Liu, Mingzhu Shen, Ziran Wei, Fengwei Yu, Jingkuan Song
Abstract Weight and activation binarization is an effective approach to deep neural network compression and can accelerate the inference by leveraging bitwise operations. Although many binarization methods have improved the accuracy of the model by minimizing the quantization error in forward propagation, there remains a noticeable performance gap between the binarized model and the full-precision one. Our empirical study indicates that the quantization brings information loss in both forward and backward propagation, which is the bottleneck of training accurate binary neural networks. To address these issues, we propose an Information Retention Network (IR-Net) to retain the information contained in the forward activations and backward gradients. IR-Net mainly relies on two technical contributions: (1) Libra Parameter Binarization (Libra-PB): simultaneously minimizing both quantization error and information loss of parameters by balanced and standardized weights in forward propagation; (2) Error Decay Estimator (EDE): minimizing the information loss of gradients by gradually approximating the sign function in backward propagation, jointly considering the updating ability and accurate gradients. We are the first to investigate both forward and backward processes of binary networks from the unified information perspective, which provides new insight into the mechanism of network binarization. Comprehensive experiments with various network structures on CIFAR-10 and ImageNet datasets manifest that the proposed IR-Net can consistently outperform state-of-the-art quantization methods.
Tasks Neural Network Compression, Quantization
Published 2019-09-24
URL https://arxiv.org/abs/1909.10788v4
PDF https://arxiv.org/pdf/1909.10788v4.pdf
PWC https://paperswithcode.com/paper/ir-net-forward-and-backward-information
Repo https://github.com/JDAI-CV/dabnn
Framework none
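
The forward half of Libra-PB as described in the abstract is simple to sketch: balance (zero-mean) and standardize the weights before taking the sign, with a per-layer scaling factor. The mean-absolute-value scale used below is a simplification (the paper restricts it further for efficiency), and the backward EDE estimator is only indicated in a comment.

```python
# Sketch of balanced, standardized weight binarization in the spirit of Libra-PB (forward only).
import torch

def libra_pb_binarize(w: torch.Tensor):
    """Balance and standardize the weights, then binarize with a per-layer scale."""
    w_std = (w - w.mean()) / (w.std() + 1e-12)   # balanced (zero-mean) and standardized weights
    scale = w_std.abs().mean()                    # per-layer scaling factor (simplified)
    return scale * torch.sign(w_std)

# During training, the non-differentiable sign is handled in the backward pass by EDE:
# a gradually sharpened approximation (e.g. the derivative of k*tanh(t*w) with t increasing
# over training); a straight-through estimator is the simplest stand-in.
w = torch.randn(128, 64)
w_bin = libra_pb_binarize(w)
```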

Focused Quantization for Sparse CNNs

Title Focused Quantization for Sparse CNNs
Authors Yiren Zhao, Xitong Gao, Daniel Bates, Robert Mullins, Cheng-Zhong Xu
Abstract Deep convolutional neural networks (CNNs) are powerful tools for a wide range of vision tasks, but the enormous amount of memory and compute resources required by CNNs pose a challenge in deploying them on constrained devices. Existing compression techniques, while excelling at reducing model sizes, struggle to be computationally friendly. In this paper, we attend to the statistical properties of sparse CNNs and present focused quantization, a novel quantization strategy based on power-of-two values, which exploits the weight distributions after fine-grained pruning. The proposed method dynamically discovers the most effective numerical representation for weights in layers with varying sparsities, significantly reducing model sizes. Multiplications in quantized CNNs are replaced with much cheaper bit-shift operations for efficient inference. Coupled with lossless encoding, we built a compression pipeline that provides CNNs with high compression ratios (CR), low computation cost and minimal loss in accuracy. On ResNet-50, we achieved an 18.08x CR with only 0.24% loss in top-5 accuracy, outperforming existing compression methods. We fully compressed a ResNet-18 and found that it is not only higher in CR and top-5 accuracy, but also more hardware-efficient, as it requires fewer logic gates to implement when compared to other state-of-the-art quantization methods assuming the same throughput.
Tasks Model Compression, Neural Network Compression, Quantization
Published 2019-03-07
URL https://arxiv.org/abs/1903.03046v3
PDF https://arxiv.org/pdf/1903.03046v3.pdf
PWC https://paperswithcode.com/paper/efficient-and-effective-quantization-for
Repo https://github.com/deep-fry/mayo
Framework tf
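
The core numerical trick exploited here, mapping the surviving (non-zero) weights of a pruned layer to powers of two so multiplications become bit shifts, can be sketched directly. The real method additionally fits the representation to each layer's weight distribution; the exponent range below is an illustrative choice.

```python
# Power-of-two quantization of a pruned (sparse) weight tensor (illustrative exponent range).
import numpy as np

def pow2_quantize(w, min_exp=-8, max_exp=0):
    """Map each non-zero weight to a nearby signed power of two; keep pruned zeros as zero."""
    q = np.zeros_like(w)
    nz = w != 0
    exps = np.clip(np.round(np.log2(np.abs(w[nz]))), min_exp, max_exp)
    q[nz] = np.sign(w[nz]) * 2.0 ** exps
    return q

w = np.random.randn(512) * 0.1
w[np.abs(w) < 0.05] = 0.0          # fine-grained pruning leaves a sparse tensor
w_q = pow2_quantize(w)             # multiplications by w_q reduce to bit shifts at inference
```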

FastFCN: Rethinking Dilated Convolution in the Backbone for Semantic Segmentation

Title FastFCN: Rethinking Dilated Convolution in the Backbone for Semantic Segmentation
Authors Huikai Wu, Junge Zhang, Kaiqi Huang, Kongming Liang, Yizhou Yu
Abstract Modern approaches for semantic segmentation usually employ dilated convolutions in the backbone to extract high-resolution feature maps, which brings heavy computational complexity and a large memory footprint. To replace the time- and memory-consuming dilated convolutions, we propose a novel joint upsampling module named Joint Pyramid Upsampling (JPU) by formulating the task of extracting high-resolution feature maps into a joint upsampling problem. With the proposed JPU, our method reduces the computational complexity by more than three times without performance loss. Experiments show that JPU is superior to other upsampling modules, and can be plugged into many existing approaches to reduce computational complexity and improve performance. By replacing dilated convolutions with the proposed JPU module, our method achieves state-of-the-art performance on the Pascal Context dataset (mIoU of 53.13%) and the ADE20K dataset (final score of 0.5584) while running 3 times faster.
Tasks Semantic Segmentation
Published 2019-03-28
URL http://arxiv.org/abs/1903.11816v1
PDF http://arxiv.org/pdf/1903.11816v1.pdf
PWC https://paperswithcode.com/paper/fastfcn-rethinking-dilated-convolution-in-the
Repo https://github.com/wuhuikai/FastFCN
Framework pytorch
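
A simplified PyTorch sketch of the JPU idea helps make the abstract concrete: upsample the backbone's lower-resolution feature maps to a common resolution, concatenate them, and apply parallel dilated convolutions on the joint input. Channel counts, the dilation set, and the use of plain (rather than separable) convolutions are illustrative simplifications of the module in the linked repo.

```python
# Simplified Joint Pyramid Upsampling (JPU) module (illustrative, not the reference design).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleJPU(nn.Module):
    def __init__(self, in_channels=(512, 1024, 2048), width=256, dilations=(1, 2, 4, 8)):
        super().__init__()
        self.reduce = nn.ModuleList(nn.Conv2d(c, width, 1) for c in in_channels)
        self.dilated = nn.ModuleList(
            nn.Conv2d(width * len(in_channels), width, 3, padding=d, dilation=d)
            for d in dilations
        )

    def forward(self, feats):
        """feats: backbone features ordered from highest to lowest resolution."""
        target = feats[0].shape[-2:]
        ups = [F.interpolate(r(f), size=target, mode='bilinear', align_corners=False)
               for r, f in zip(self.reduce, feats)]
        x = torch.cat(ups, dim=1)                      # joint upsampling input
        return torch.cat([d(x) for d in self.dilated], dim=1)

jpu = SimpleJPU()
feats = [torch.randn(1, 512, 64, 64), torch.randn(1, 1024, 32, 32), torch.randn(1, 2048, 16, 16)]
out = jpu(feats)   # high-resolution features without dilated convolutions in the backbone
```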

Evolving Deep Neural Networks by Multi-objective Particle Swarm Optimization for Image Classification

Title Evolving Deep Neural Networks by Multi-objective Particle Swarm Optimization for Image Classification
Authors Bin Wang, Yanan Sun, Bing Xue, Mengjie Zhang
Abstract In recent years, convolutional neural networks (CNNs) have become deeper in order to achieve better classification accuracy in image classification. However, it is difficult to deploy the state-of-the-art deep CNNs for industrial use due to the difficulty of manually fine-tuning the hyperparameters and the trade-off between classification accuracy and computational cost. This paper proposes a novel multi-objective optimization method for evolving state-of-the-art deep CNNs in real-life applications, which automatically evolves the non-dominated solutions at the Pareto front. Three major contributions are made: Firstly, a new encoding strategy is designed to encode one of the best state-of-the-art CNNs; With the classification accuracy and the number of floating point operations as the two objectives, a multi-objective particle swarm optimization method is developed to evolve the non-dominated solutions; Last but not least, a new infrastructure is designed to boost the experiments by concurrently running the experiments on multiple GPUs across multiple machines, and a Python library is developed and released to manage the infrastructure. The experimental results demonstrate that the non-dominated solutions found by the proposed algorithm form a clear Pareto front, and the proposed infrastructure is able to almost linearly reduce the running time.
Tasks Image Classification
Published 2019-03-21
URL http://arxiv.org/abs/1904.09035v2
PDF http://arxiv.org/pdf/1904.09035v2.pdf
PWC https://paperswithcode.com/paper/190409035
Repo https://github.com/wwwbbb8510/cudam
Framework none
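
The Pareto-dominance bookkeeping underlying the multi-objective search is worth spelling out: with classification error and FLOPs as the two minimization objectives, only the non-dominated candidate architectures are kept. A small sketch with made-up candidate values:

```python
# Pareto-front filtering over (error, FLOPs) pairs, both to be minimized (illustrative values).
import numpy as np

def non_dominated(points):
    """points: (n, 2) array of objectives; return a boolean mask of Pareto-optimal rows."""
    mask = np.ones(len(points), dtype=bool)
    for i in range(len(points)):
        if not mask[i]:
            continue
        # A point dominates i if it is no worse in both objectives and strictly better in one.
        dominates_i = np.all(points <= points[i], axis=1) & np.any(points < points[i], axis=1)
        if dominates_i.any():
            mask[i] = False
    return mask

candidates = np.array([[0.08, 4.1], [0.07, 6.0], [0.10, 2.0], [0.09, 5.5]])  # (error, GFLOPs)
front = candidates[non_dominated(candidates)]   # the swarm's current Pareto front
```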

Deep Learning for Image Super-resolution: A Survey

Title Deep Learning for Image Super-resolution: A Survey
Authors Zhihao Wang, Jian Chen, Steven C. H. Hoi
Abstract Image Super-Resolution (SR) is an important class of image processing techniques to enhance the resolution of images and videos in computer vision. Recent years have witnessed remarkable progress of image super-resolution using deep learning techniques. This article aims to provide a comprehensive survey on recent advances of image super-resolution using deep learning approaches. In general, we can roughly group the existing studies of SR techniques into three major categories: supervised SR, unsupervised SR, and domain-specific SR. In addition, we also cover some other important issues, such as publicly available benchmark datasets and performance evaluation metrics. Finally, we conclude this survey by highlighting several future directions and open issues which should be further addressed by the community in the future.
Tasks Image Super-Resolution, Super-Resolution
Published 2019-02-16
URL https://arxiv.org/abs/1902.06068v2
PDF https://arxiv.org/pdf/1902.06068v2.pdf
PWC https://paperswithcode.com/paper/deep-learning-for-image-super-resolution-a
Repo https://github.com/impredicative/irc-url-title-bot
Framework tf

Improving SIEM for Critical SCADA Water Infrastructures Using Machine Learning

Title Improving SIEM for Critical SCADA Water Infrastructures Using Machine Learning
Authors Hanan Hindy, David Brosset, Ethan Bayne, Amar Seeam, Xavier Bellekens
Abstract Networked Control Systems (NCS) have been used in many industrial processes. They aim to reduce the human-factor burden and efficiently handle the complex processes and communication of those systems. Supervisory control and data acquisition (SCADA) systems are used in industrial, infrastructure and facility processes (e.g. manufacturing, fabrication, oil and water pipelines, building ventilation, etc.). Like other Internet of Things (IoT) implementations, SCADA systems are vulnerable to cyber-attacks; therefore, robust anomaly detection is a major requirement. However, having an accurate anomaly detection system is not an easy task, due to the difficulty of differentiating between cyber-attacks and internal system failures (e.g. hardware failures). In this paper, we present a model that detects anomalous events in a water system controlled by SCADA. Six machine learning techniques have been used in building and evaluating the model. The model classifies different anomalous events including hardware failures (e.g. sensor failures), sabotage and cyber-attacks (e.g. DoS and spoofing). Unlike other detection systems, our proposed work focuses on notifying the operator when an anomaly occurs, along with the probability of the event occurring. This additional information helps accelerate the mitigation process. The model is trained and tested using a real-world dataset.
Tasks Anomaly Detection, Cyber Attack Detection
Published 2019-03-06
URL http://arxiv.org/abs/1904.05724v1
PDF http://arxiv.org/pdf/1904.05724v1.pdf
PWC https://paperswithcode.com/paper/improving-siem-for-critical-scada-water
Repo https://github.com/AbertayMachineLearningGroup/machine-learning-SIEM-water-infrastructure
Framework none
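
A minimal sketch of the kind of classifier-with-probabilities pipeline the paper evaluates: a supervised model labels each event (normal, sensor failure, DoS, spoofing, ...) and reports the class probability alongside the alert. The features and labels below are synthetic stand-ins, not the real SCADA water-system dataset, and random forests are only one of the six techniques compared.

```python
# Event classification with per-class probabilities reported to the operator (synthetic data).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 8))              # sensor/actuator readings per event (stand-in)
y = rng.integers(0, 4, size=2000)           # 0=normal, 1=sensor fault, 2=DoS, 3=spoofing (stand-in)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)             # per-class probabilities for each test event
alerts = proba.argmax(axis=1)               # predicted event type
confidence = proba.max(axis=1)              # probability reported with the alert to the operator
```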

An adaptive simulated annealing EM algorithm for inference on non-homogeneous hidden Markov models

Title An adaptive simulated annealing EM algorithm for inference on non-homogeneous hidden Markov models
Authors Aliaksandr Hubin
Abstract Non-homogeneous hidden Markov models (NHHMM) are a subclass of dependent mixture models used for semi-supervised learning, where both transition probabilities between the latent states and mean parameter of the probability distribution of the responses (for a given state) depend on the set of $p$ covariates. A priori we do not know which (and how) covariates influence the transition probabilities and the mean parameters. This induces a complex combinatorial optimization problem for model selection with $4^p$ potential configurations. To address the problem, in this article we propose an adaptive (A) simulated annealing (SA) expectation maximization (EM) algorithm (ASA-EM) for joint optimization of models and their parameters with respect to a criterion of interest.
Tasks Combinatorial Optimization, Model Selection
Published 2019-12-20
URL https://arxiv.org/abs/1912.09733v1
PDF https://arxiv.org/pdf/1912.09733v1.pdf
PWC https://paperswithcode.com/paper/an-adaptive-simulated-annealing-em-algorithm
Repo https://github.com/aliaksah/depmixS4pp
Framework none
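
The outer loop of ASA-EM can be pictured as simulated annealing over a binary configuration vector stating which of the p covariates enter the transition probabilities and which enter the mean parameters (hence the 4^p configurations). In the toy sketch below, `score` stands in for the EM-fitted model selection criterion, which is not implemented, and the fixed cooling schedule is a simplification of the adaptive scheme in the paper.

```python
# Toy simulated-annealing search over covariate-inclusion configurations (EM criterion mocked).
import math
import random

def anneal(p, score, iters=500, t0=1.0, cooling=0.995):
    """Maximize `score` over 2p binary inclusion indicators via simulated annealing."""
    state = [random.randint(0, 1) for _ in range(2 * p)]   # [transition flags | mean flags]
    cur_s = score(state)
    best, best_s, temp = list(state), cur_s, t0
    for _ in range(iters):
        cand = list(state)
        cand[random.randrange(2 * p)] ^= 1                  # flip one inclusion indicator
        cand_s = score(cand)
        if cand_s >= cur_s or random.random() < math.exp((cand_s - cur_s) / temp):
            state, cur_s = cand, cand_s                     # Metropolis acceptance
            if cur_s > best_s:
                best, best_s = list(state), cur_s
        temp *= cooling                                      # the paper adapts this schedule
    return best, best_s

# Toy criterion: prefer a sparse model that includes covariates 0 and 3 (hypothetical).
toy_score = lambda s: 2 * (s[0] + s[3]) - 0.5 * sum(s)
config, value = anneal(p=5, score=toy_score)
```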