Paper Group AWR 67
Quantum Entropy Scoring for Fast Robust Mean Estimation and Improved Outlier Detection
Title | Quantum Entropy Scoring for Fast Robust Mean Estimation and Improved Outlier Detection |
Authors | Yihe Dong, Samuel B. Hopkins, Jerry Li |
Abstract | We study two problems in high-dimensional robust statistics: \emph{robust mean estimation} and \emph{outlier detection}. In robust mean estimation the goal is to estimate the mean $\mu$ of a distribution on $\mathbb{R}^d$ given $n$ independent samples, an $\varepsilon$-fraction of which have been corrupted by a malicious adversary. In outlier detection the goal is to assign an \emph{outlier score} to each element of a data set such that elements more likely to be outliers are assigned higher scores. Our algorithms for both problems are based on a new outlier scoring method we call QUE-scoring based on \emph{quantum entropy regularization}. For robust mean estimation, this yields the first algorithm with optimal error rates and nearly-linear running time $\widetilde{O}(nd)$ in all parameters, improving on the previous fastest running time $\widetilde{O}(\min(nd/\varepsilon^6, nd^2))$. For outlier detection, we evaluate the performance of QUE-scoring via extensive experiments on synthetic and real data, and demonstrate that it often performs better than previously proposed algorithms. Code for these experiments is available at https://github.com/twistedcubic/que-outlier-detection . |
Tasks | Outlier Detection |
Published | 2019-06-26 |
URL | https://arxiv.org/abs/1906.11366v1 |
PDF | https://arxiv.org/pdf/1906.11366v1.pdf |
PWC | https://paperswithcode.com/paper/quantum-entropy-scoring-for-fast-robust-mean |
Repo | https://github.com/twistedcubic/que-outlier-detection |
Framework | pytorch |
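The abstract above describes QUE-scoring only at a high level; the sketch below is an illustrative reconstruction (not the authors' implementation, which lives in the linked repo): each centered point is scored by a quadratic form against a matrix-exponential reweighting of the empirical covariance, so directions in which the data is unexpectedly spread out dominate the score. The normalization and the value of `alpha` are assumptions.

```python
import numpy as np
from scipy.linalg import expm

def que_scores(X, alpha=4.0):
    """Illustrative quantum-entropy-style (QUE) outlier scores for rows of X.

    Points aligned with the top directions of the empirical covariance,
    weighted through a matrix exponential, receive higher scores.
    """
    Xc = X - X.mean(axis=0)                      # center the data
    Sigma = Xc.T @ Xc / len(X)                   # empirical covariance
    Sigma = Sigma / np.linalg.norm(Sigma, 2)     # spectral-norm scaling (assumption)
    U = expm(alpha * Sigma)                      # quantum-entropy weighting
    U = U / np.trace(U)                          # normalize to a density matrix
    return np.einsum('ij,jk,ik->i', Xc, U, Xc)   # per-point score
```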
GraphSAINT: Graph Sampling Based Inductive Learning Method
Title | GraphSAINT: Graph Sampling Based Inductive Learning Method |
Authors | Hanqing Zeng, Hongkuan Zhou, Ajitesh Srivastava, Rajgopal Kannan, Viktor Prasanna |
Abstract | Graph Convolutional Networks (GCNs) are powerful models for learning representations of attributed graphs. To scale GCNs to large graphs, state-of-the-art methods use various layer sampling techniques to alleviate the “neighbor explosion” problem during minibatch training. We propose GraphSAINT, a graph sampling based inductive learning method that improves training efficiency and accuracy in a fundamentally different way. By changing perspective, GraphSAINT constructs minibatches by sampling the training graph, rather than the nodes or edges across GCN layers. In each iteration, a complete GCN is built from the properly sampled subgraph. Thus, we ensure a fixed number of well-connected nodes in all layers. We further propose a normalization technique to eliminate bias, and sampling algorithms for variance reduction. Importantly, we can decouple the sampling from the forward and backward propagation, and extend GraphSAINT with many architecture variants (e.g., graph attention, jumping connection). GraphSAINT demonstrates superior performance in both accuracy and training time on five large graphs, and achieves new state-of-the-art F1 scores for PPI (0.995) and Reddit (0.970). |
Tasks | Graph Embedding, Graph Representation Learning, Node Classification |
Published | 2019-07-10 |
URL | https://arxiv.org/abs/1907.04931v4 |
PDF | https://arxiv.org/pdf/1907.04931v4.pdf |
PWC | https://paperswithcode.com/paper/graphsaint-graph-sampling-based-inductive |
Repo | https://github.com/GraphSAINT/GraphSAINT |
Framework | tf |
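As a rough illustration of the minibatch construction the abstract describes (sampling the training graph rather than the layers), here is a toy degree-proportional node sampler; the paper's actual samplers, unbiasedness corrections, and variance-reduction schemes are more involved, and the dense adjacency matrix here is purely for readability.

```python
import numpy as np

def sample_subgraph(adj, train_nodes, budget, rng=np.random.default_rng()):
    """Toy GraphSAINT-style minibatch: an induced subgraph of sampled nodes.

    `adj` is a dense 0/1 adjacency matrix (the real code uses sparse formats);
    nodes are drawn with probability proportional to degree, and a full GCN
    would then be trained on the returned subgraph for this iteration.
    """
    train_nodes = np.asarray(train_nodes)
    deg = adj[train_nodes].sum(axis=1)
    p = deg / deg.sum()
    idx = rng.choice(len(train_nodes), size=budget, replace=False, p=p)
    nodes = train_nodes[idx]
    sub_adj = adj[np.ix_(nodes, nodes)]           # induced subgraph
    loss_weights = 1.0 / (budget * p[idx])        # crude bias correction (assumption)
    return nodes, sub_adj, loss_weights
```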
Improving Model-based Genetic Programming for Symbolic Regression of Small Expressions
Title | Improving Model-based Genetic Programming for Symbolic Regression of Small Expressions |
Authors | Marco Virgolin, Tanja Alderliesten, Cees Witteveen, Peter A. N. Bosman |
Abstract | The Gene-pool Optimal Mixing Evolutionary Algorithm (GOMEA) is a model-based EA framework that has been shown to perform well in several domains, including Genetic Programming (GP). Differently from traditional EAs where variation acts blindly, GOMEA learns a model of interdependencies within the genotype, i.e., the linkage, to estimate what patterns to propagate. In this article, we study the role of Linkage Learning (LL) performed by GOMEA in Symbolic Regression (SR). We show that the non-uniformity in the distribution of the genotype in GP populations negatively biases LL, and propose a method to correct for this. We also propose approaches to improve LL when ephemeral random constants are used. Furthermore, we adapt a scheme of interleaving runs to alleviate the burden of tuning the population size, a crucial parameter for LL, to SR. We run experiments on 10 real-world datasets, enforcing a strict limitation on solution size, to enable interpretability. We find that the new LL method outperforms the standard one, and that GOMEA outperforms both traditional and semantic GP. We also find that the small solutions evolved by GOMEA are competitive with tuned decision trees, making GOMEA a promising new approach to SR. |
Tasks | |
Published | 2019-04-03 |
URL | https://arxiv.org/abs/1904.02050v3 |
PDF | https://arxiv.org/pdf/1904.02050v3.pdf |
PWC | https://paperswithcode.com/paper/model-based-genetic-programming-with-gomea |
Repo | https://github.com/marcovirgolin/GP-GOMEA |
Framework | none |
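The linkage-learning step studied in the abstract can be pictured with a small dependency-estimation sketch: pairwise mutual information between genotype positions across the population, from which GOMEA would build a linkage tree. This is a generic illustration under those assumptions, not GP-GOMEA's actual procedure.

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def pairwise_linkage(population):
    """Estimate pairwise dependencies (linkage) between genotype positions.

    `population` is an (n_individuals, n_loci) integer array, e.g. node symbols
    of fixed-shape GP trees. GOMEA clusters loci with strong dependencies into
    a linkage tree that decides which gene subsets are copied together during
    variation; only the dependency-estimation step is shown here.
    """
    n_loci = population.shape[1]
    mi = np.zeros((n_loci, n_loci))
    for i in range(n_loci):
        for j in range(i + 1, n_loci):
            mi[i, j] = mi[j, i] = mutual_info_score(population[:, i], population[:, j])
    return mi
```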
A Programmable Approach to Model Compression
Title | A Programmable Approach to Model Compression |
Authors | Vinu Joseph, Saurav Muralidharan, Animesh Garg, Michael Garland, Ganesh Gopalakrishnan |
Abstract | Deep neural networks frequently contain far more weights, represented at a higher precision, than are required for the specific task which they are trained to perform. Consequently, they can often be compressed using techniques such as weight pruning and quantization that reduce both model size and inference time without appreciable loss in accuracy. Compressing models before they are deployed can therefore result in significantly more efficient systems. However, while the results are desirable, finding the best compression strategy for a given neural network, target platform, and optimization objective often requires extensive experimentation. Moreover, finding optimal hyperparameters for a given compression strategy typically results in even more expensive, frequently manual, trial-and-error exploration. In this paper, we introduce a programmable system for model compression called Condensa. Users programmatically compose simple operators, in Python, to build complex compression strategies. Given a strategy and a user-provided objective, such as minimization of running time, Condensa uses a novel sample-efficient constrained Bayesian optimization algorithm to automatically infer desirable sparsity ratios. Our experiments on three real-world image classification and language modeling tasks demonstrate memory footprint reductions of up to 65x and runtime throughput improvements of up to 2.22x using at most 10 samples per search. We have released a reference implementation of Condensa at https://github.com/NVlabs/condensa. |
Tasks | Image Classification, Language Modelling, Model Compression, Quantization |
Published | 2019-11-06 |
URL | https://arxiv.org/abs/1911.02497v1 |
PDF | https://arxiv.org/pdf/1911.02497v1.pdf |
PWC | https://paperswithcode.com/paper/a-programmable-approach-to-model-compression |
Repo | https://github.com/NVlabs/condensa |
Framework | pytorch |
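Condensa's real operator API and the Bayesian-optimization search live in the linked repo; the snippet below only sketches, in plain PyTorch, the compose-simple-operators idea from the abstract (magnitude pruning followed by simulated uniform quantization). All function names and the fixed sparsity value are assumptions for illustration.

```python
import torch

def prune(weight, sparsity):
    """Magnitude pruning: zero out the smallest-magnitude fraction of weights."""
    k = int(sparsity * weight.numel())
    if k == 0:
        return weight
    threshold = weight.abs().flatten().kthvalue(k).values
    return torch.where(weight.abs() > threshold, weight, torch.zeros_like(weight))

def quantize(weight, bits=8):
    """Simulated uniform quantization to `bits` bits (values stay float)."""
    scale = (weight.abs().max() / (2 ** (bits - 1) - 1)).clamp(min=1e-12)
    return torch.round(weight / scale) * scale

def compress(model, sparsity=0.9, bits=8):
    """Apply a composed prune-then-quantize 'strategy' to every parameter."""
    with torch.no_grad():
        for p in model.parameters():
            p.copy_(quantize(prune(p, sparsity), bits))
    return model
```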
Scalable Bayesian dynamic covariance modeling with variational Wishart and inverse Wishart processes
Title | Scalable Bayesian dynamic covariance modeling with variational Wishart and inverse Wishart processes |
Authors | Creighton Heaukulani, Mark van der Wilk |
Abstract | We implement gradient-based variational inference routines for Wishart and inverse Wishart processes, which we apply as Bayesian models for the dynamic, heteroskedastic covariance matrix of a multivariate time series. The Wishart and inverse Wishart processes are constructed from i.i.d. Gaussian processes, existing variational inference algorithms for which form the basis of our approach. These methods are easy to implement as a black-box and scale favorably with the length of the time series; however, they fail in the case of the Wishart process, an issue we resolve with a simple modification into an additive white noise parameterization of the model. This modification is also key to implementing a factored variant of the construction, allowing inference to additionally scale to high-dimensional covariance matrices. Through experimentation, we demonstrate that some (but not all) model variants outperform multivariate GARCH when forecasting the covariances of returns on financial instruments. |
Tasks | Gaussian Processes, Time Series |
Published | 2019-06-22 |
URL | https://arxiv.org/abs/1906.09360v2 |
PDF | https://arxiv.org/pdf/1906.09360v2.pdf |
PWC | https://paperswithcode.com/paper/scalable-bayesian-dynamic-covariance-modeling |
Repo | https://github.com/ckheaukulani/swpr |
Framework | tf |
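To make the construction in the abstract concrete, here is a minimal sketch of sampling a Wishart process from i.i.d. Gaussian processes: Sigma(t) = F(t) F(t)^T, with each entry of F an independent GP path. The RBF kernel and the identity scale matrix are assumptions; the paper's variational inference and additive-white-noise reparameterization are not shown.

```python
import numpy as np

def sample_wishart_process(times, dim, dof, lengthscale=1.0, seed=0):
    """Draw one Wishart-process path: a PSD covariance matrix per time point."""
    rng = np.random.default_rng(seed)
    t = np.asarray(times, dtype=float)[:, None]
    K = np.exp(-0.5 * (t - t.T) ** 2 / lengthscale ** 2) + 1e-8 * np.eye(len(t))
    L = np.linalg.cholesky(K)
    # dim * dof independent GP paths, one column each
    F = (L @ rng.standard_normal((len(t), dim * dof))).reshape(len(t), dim, dof)
    return np.einsum('tik,tjk->tij', F, F)        # Sigma(t) = F(t) F(t)^T
```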
Transfer Learning with intelligent training data selection for prediction of Alzheimer’s Disease
Title | Transfer Learning with intelligent training data selection for prediction of Alzheimer’s Disease |
Authors | Naimul Mefraz Khan, Marcia Hon, Nabila Abraham |
Abstract | Detection of Alzheimer’s Disease (AD) from neuroimaging data such as MRI through machine learning has been a subject of intense research in recent years. Recent success of deep learning in computer vision has progressed such research further. However, common limitations with such algorithms are reliance on a large number of training images, and requirement of careful optimization of the architecture of deep networks. In this paper, we attempt solving these issues with transfer learning, where the state-of-the-art VGG architecture is initialized with pre-trained weights from large benchmark datasets consisting of natural images. The network is then fine-tuned with layer-wise tuning, where only a pre-defined group of layers is trained on MRI images. To shrink the training data size, we employ image entropy to select the most informative slices. Through experimentation on the ADNI dataset, we show that with a training size 10 to 20 times smaller than that of other contemporary methods, we reach state-of-the-art performance in AD vs. NC, AD vs. MCI, and MCI vs. NC classification problems, with a 4% and a 7% increase in accuracy over the state-of-the-art for AD vs. MCI and MCI vs. NC, respectively. We also provide detailed analysis of the effect of the intelligent training data selection method, changing the training size, and changing the number of layers to be fine-tuned. Finally, we provide Class Activation Maps (CAM) that demonstrate how the proposed model focuses on discriminative image regions that are neuropathologically relevant, and can help the healthcare practitioner in interpreting the model’s decision making process. |
Tasks | Decision Making, Transfer Learning |
Published | 2019-06-04 |
URL | https://arxiv.org/abs/1906.01160v1 |
PDF | https://arxiv.org/pdf/1906.01160v1.pdf |
PWC | https://paperswithcode.com/paper/transfer-learning-with-intelligent-training |
Repo | https://github.com/marciahon29/AlzheimersProject |
Framework | none |
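The entropy-based slice selection mentioned in the abstract is simple enough to sketch; the snippet below scores each MRI slice by the Shannon entropy of its intensity histogram and keeps the highest-entropy slices. The bin count and the number of retained slices are assumptions, not the paper's settings.

```python
import numpy as np

def select_informative_slices(volume, n_slices=32, bins=64):
    """Keep the `n_slices` highest-entropy slices of a 3D MRI volume (S, H, W)."""
    scores = []
    for s in volume:
        hist, _ = np.histogram(s, bins=bins)
        p = hist / hist.sum()
        p = p[p > 0]
        scores.append(-(p * np.log2(p)).sum())    # Shannon entropy of the slice
    top = np.argsort(scores)[::-1][:n_slices]     # most informative slices
    return volume[np.sort(top)]
```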
Transfer Learning Toolkit: Primers and Benchmarks
Title | Transfer Learning Toolkit: Primers and Benchmarks |
Authors | Fuzhen Zhuang, Keyu Duan, Tongjia Guo, Yongchun Zhu, Dongbo Xi, Zhiyuan Qi, Qing He |
Abstract | The transfer learning toolkit wraps the codes of 17 transfer learning models and provides integrated interfaces, allowing users to use those models by calling a simple function. It is easy for primary researchers to use this toolkit and to choose proper models for real-world applications. The toolkit is written in Python and distributed under MIT open source license. In this paper, the current state of this toolkit is described and the necessary environment setting and usage are introduced. |
Tasks | Transfer Learning |
Published | 2019-11-20 |
URL | https://arxiv.org/abs/1911.08967v1 |
PDF | https://arxiv.org/pdf/1911.08967v1.pdf |
PWC | https://paperswithcode.com/paper/transfer-learning-toolkit-primers-and |
Repo | https://github.com/FuzhenZhuang/Transfer-Learning-Toolkit |
Framework | none |
Jointly embedding the local and global relations of heterogeneous graph for rumor detection
Title | Jointly embedding the local and global relations of heterogeneous graph for rumor detection |
Authors | Chunyuan Yuan, Qianwen Ma, Wei Zhou, Jizhong Han, Songlin Hu |
Abstract | The development of social media has revolutionized the way people communicate, share information and make decisions, but it also provides an ideal platform for publishing and spreading rumors. Existing rumor detection methods focus on finding clues from text content, user profiles, and propagation patterns. However, the local semantic relation and global structural information in the message propagation graph have not been well utilized by previous works. In this paper, we present a novel global-local attention network (GLAN) for rumor detection, which jointly encodes the local semantic and global structural information. We first generate a better integrated representation for each source tweet by fusing the semantic information of related retweets with the attention mechanism. Then, we model the global relationships among all source tweets, retweets, and users as a heterogeneous graph to capture the rich structural information for rumor detection. We conduct experiments on three real-world datasets, and the results demonstrate that GLAN significantly outperforms the state-of-the-art models in both rumor detection and early detection scenarios. |
Tasks | |
Published | 2019-09-10 |
URL | https://arxiv.org/abs/1909.04465v2 |
PDF | https://arxiv.org/pdf/1909.04465v2.pdf |
PWC | https://paperswithcode.com/paper/jointly-embedding-the-local-and-global |
Repo | https://github.com/chunyuanY/RumorDetection |
Framework | pytorch |
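As a minimal picture of the "local" fusion step the abstract describes, the sketch below lets a source-tweet vector attend over its retweet vectors and adds the weighted summary back in. GLAN's actual multi-head attention and the global heterogeneous-graph encoding are richer; this is an assumption-level illustration only.

```python
import torch
import torch.nn.functional as F

def fuse_retweets(source_vec, retweet_vecs):
    """Attention-weighted fusion of retweet semantics into a source-tweet vector.

    source_vec: (d,) tensor; retweet_vecs: (n_retweets, d) tensor.
    """
    scores = retweet_vecs @ source_vec / source_vec.shape[0] ** 0.5
    weights = F.softmax(scores, dim=0)            # attention over retweets
    context = weights @ retweet_vecs              # weighted retweet summary
    return source_vec + context                   # fused local representation
```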
Forward and Backward Information Retention for Accurate Binary Neural Networks
Title | Forward and Backward Information Retention for Accurate Binary Neural Networks |
Authors | Haotong Qin, Ruihao Gong, Xianglong Liu, Mingzhu Shen, Ziran Wei, Fengwei Yu, Jingkuan Song |
Abstract | Weight and activation binarization is an effective approach to deep neural network compression and can accelerate the inference by leveraging bitwise operations. Although many binarization methods have improved the accuracy of the model by minimizing the quantization error in forward propagation, there remains a noticeable performance gap between the binarized model and the full-precision one. Our empirical study indicates that the quantization brings information loss in both forward and backward propagation, which is the bottleneck of training accurate binary neural networks. To address these issues, we propose an Information Retention Network (IR-Net) to retain the information contained in the forward activations and backward gradients. IR-Net mainly relies on two technical contributions: (1) Libra Parameter Binarization (Libra-PB): simultaneously minimizing both quantization error and information loss of parameters by balanced and standardized weights in forward propagation; (2) Error Decay Estimator (EDE): minimizing the information loss of gradients by gradually approximating the sign function in backward propagation, jointly considering the updating ability and accurate gradients. We are the first to investigate both forward and backward processes of binary networks from the unified information perspective, which provides new insight into the mechanism of network binarization. Comprehensive experiments with various network structures on CIFAR-10 and ImageNet datasets demonstrate that the proposed IR-Net can consistently outperform state-of-the-art quantization methods. |
Tasks | Neural Network Compression, Quantization |
Published | 2019-09-24 |
URL | https://arxiv.org/abs/1909.10788v4 |
PDF | https://arxiv.org/pdf/1909.10788v4.pdf |
PWC | https://paperswithcode.com/paper/ir-net-forward-and-backward-information |
Repo | https://github.com/JDAI-CV/dabnn |
Framework | none |
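The two components named in the abstract can be sketched compactly: a balanced-and-standardized sign binarization in the spirit of Libra-PB, and a tanh surrogate whose temperature grows over training in the spirit of EDE. The power-of-two scaling factor and the exact schedules from the paper are omitted, so treat this as a sketch rather than IR-Net itself.

```python
import torch

def libra_pb_binarize(w):
    """Balance (zero-mean) and standardize weights before taking the sign,
    which keeps the binary weights maximally informative (Libra-PB idea)."""
    w_std = (w - w.mean()) / (w.std() + 1e-8)
    return torch.sign(w_std)

def ede_surrogate(w, t):
    """Backward-pass surrogate for sign: tanh(t * w) sharpens towards sign(w)
    as the temperature t grows during training, so the gradient error decays."""
    return torch.tanh(t * w)
```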
Focused Quantization for Sparse CNNs
Title | Focused Quantization for Sparse CNNs |
Authors | Yiren Zhao, Xitong Gao, Daniel Bates, Robert Mullins, Cheng-Zhong Xu |
Abstract | Deep convolutional neural networks (CNNs) are powerful tools for a wide range of vision tasks, but the enormous amount of memory and compute resources required by CNNs pose a challenge in deploying them on constrained devices. Existing compression techniques, while excelling at reducing model sizes, struggle to be computationally friendly. In this paper, we attend to the statistical properties of sparse CNNs and present focused quantization, a novel quantization strategy based on power-of-two values, which exploits the weight distributions after fine-grained pruning. The proposed method dynamically discovers the most effective numerical representation for weights in layers with varying sparsities, significantly reducing model sizes. Multiplications in quantized CNNs are replaced with much cheaper bit-shift operations for efficient inference. Coupled with lossless encoding, we built a compression pipeline that provides CNNs with high compression ratios (CR), low computation cost and minimal loss in accuracy. On ResNet-50, we achieved an 18.08x CR with only 0.24% loss in top-5 accuracy, outperforming existing compression methods. We fully compressed a ResNet-18 and found that it is not only higher in CR and top-5 accuracy, but also more hardware efficient as it requires fewer logic gates to implement when compared to other state-of-the-art quantization methods assuming the same throughput. |
Tasks | Model Compression, Neural Network Compression, Quantization |
Published | 2019-03-07 |
URL | https://arxiv.org/abs/1903.03046v3 |
PDF | https://arxiv.org/pdf/1903.03046v3.pdf |
PWC | https://paperswithcode.com/paper/efficient-and-effective-quantization-for |
Repo | https://github.com/deep-fry/mayo |
Framework | tf |
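A toy version of the power-of-two idea from the abstract is shown below: surviving (nonzero) weights are snapped to signed powers of two so that inference multiplications can become bit shifts. The per-layer "focused" choice of representation driven by the weight distribution, which is the paper's actual contribution, is not modelled here.

```python
import numpy as np

def power_of_two_quantize(weights, min_exp=-8, max_exp=0):
    """Snap nonzero weights to +/- 2^e with e clipped to [min_exp, max_exp]."""
    q = np.zeros_like(weights, dtype=float)
    nz = weights != 0
    exp = np.clip(np.round(np.log2(np.abs(weights[nz]))), min_exp, max_exp)
    q[nz] = np.sign(weights[nz]) * 2.0 ** exp
    return q
```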
FastFCN: Rethinking Dilated Convolution in the Backbone for Semantic Segmentation
Title | FastFCN: Rethinking Dilated Convolution in the Backbone for Semantic Segmentation |
Authors | Huikai Wu, Junge Zhang, Kaiqi Huang, Kongming Liang, Yizhou Yu |
Abstract | Modern approaches for semantic segmentation usually employ dilated convolutions in the backbone to extract high-resolution feature maps, which brings heavy computation complexity and memory footprint. To replace the time and memory consuming dilated convolutions, we propose a novel joint upsampling module named Joint Pyramid Upsampling (JPU) by formulating the task of extracting high-resolution feature maps into a joint upsampling problem. With the proposed JPU, our method reduces the computation complexity by more than three times without performance loss. Experiments show that JPU is superior to other upsampling modules, which can be plugged into many existing approaches to reduce computation complexity and improve performance. By replacing dilated convolutions with the proposed JPU module, our method achieves the state-of-the-art performance in Pascal Context dataset (mIoU of 53.13%) and ADE20K dataset (final score of 0.5584) while running 3 times faster. |
Tasks | Semantic Segmentation |
Published | 2019-03-28 |
URL | http://arxiv.org/abs/1903.11816v1 |
PDF | http://arxiv.org/pdf/1903.11816v1.pdf |
PWC | https://paperswithcode.com/paper/fastfcn-rethinking-dilated-convolution-in-the |
Repo | https://github.com/wuhuikai/FastFCN |
Framework | pytorch |
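To give the JPU idea from the abstract some shape, here is a stripped-down PyTorch module: the last three backbone feature maps are projected, upsampled to a common resolution, concatenated, and passed through parallel dilated convolutions. Channel sizes and dilation rates are assumptions; the authors' repo has the real module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleJPU(nn.Module):
    """Simplified Joint Pyramid Upsampling block (illustrative only)."""

    def __init__(self, in_channels=(512, 1024, 2048), width=256):
        super().__init__()
        self.proj = nn.ModuleList([nn.Conv2d(c, width, 3, padding=1) for c in in_channels])
        self.dilated = nn.ModuleList(
            [nn.Conv2d(3 * width, width, 3, padding=d, dilation=d) for d in (1, 2, 4, 8)]
        )

    def forward(self, feats):                     # feats: [stage3, stage4, stage5]
        size = feats[0].shape[-2:]
        ups = [F.interpolate(p(f), size=size, mode='bilinear', align_corners=False)
               for p, f in zip(self.proj, feats)]
        x = torch.cat(ups, dim=1)                 # joint high-resolution features
        return torch.cat([conv(x) for conv in self.dilated], dim=1)
```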
Evolving Deep Neural Networks by Multi-objective Particle Swarm Optimization for Image Classification
Title | Evolving Deep Neural Networks by Multi-objective Particle Swarm Optimization for Image Classification |
Authors | Bin Wang, Yanan Sun, Bing Xue, Mengjie Zhang |
Abstract | In recent years, convolutional neural networks (CNNs) have become deeper in order to achieve better classification accuracy in image classification. However, it is difficult to deploy the state-of-the-art deep CNNs for industrial use due to the difficulty of manually fine-tuning the hyperparameters and the trade-off between classification accuracy and computational cost. This paper proposes a novel multi-objective optimization method for evolving state-of-the-art deep CNNs in real-life applications, which automatically evolves the non-dominated solutions on the Pareto front. Three major contributions are made: Firstly, a new encoding strategy is designed to encode one of the best state-of-the-art CNNs; secondly, with the classification accuracy and the number of floating point operations as the two objectives, a multi-objective particle swarm optimization method is developed to evolve the non-dominated solutions; last but not least, a new infrastructure is designed to boost the experiments by concurrently running the experiments on multiple GPUs across multiple machines, and a Python library is developed and released to manage the infrastructure. The experimental results demonstrate that the non-dominated solutions found by the proposed algorithm form a clear Pareto front, and the proposed infrastructure is able to almost linearly reduce the running time. |
Tasks | Image Classification |
Published | 2019-03-21 |
URL | http://arxiv.org/abs/1904.09035v2 |
PDF | http://arxiv.org/pdf/1904.09035v2.pdf |
PWC | https://paperswithcode.com/paper/190409035 |
Repo | https://github.com/wwwbbb8510/cudam |
Framework | none |
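The Pareto-front bookkeeping mentioned in the abstract reduces to a dominance filter; a generic sketch is below, with both objectives expressed as minimization (classification error and FLOPs). It is not the paper's PSO update, only the non-dominated filtering step.

```python
import numpy as np

def non_dominated(objectives):
    """Boolean mask of Pareto-optimal rows; every objective is minimized."""
    obj = np.asarray(objectives, dtype=float)
    keep = np.ones(len(obj), dtype=bool)
    for i in range(len(obj)):
        if not keep[i]:
            continue
        # points that row i dominates: no worse everywhere, strictly better somewhere
        dominated = np.all(obj[i] <= obj, axis=1) & np.any(obj[i] < obj, axis=1)
        keep &= ~dominated
    return keep
```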
Deep Learning for Image Super-resolution: A Survey
Title | Deep Learning for Image Super-resolution: A Survey |
Authors | Zhihao Wang, Jian Chen, Steven C. H. Hoi |
Abstract | Image Super-Resolution (SR) is an important class of image processing techniques to enhance the resolution of images and videos in computer vision. Recent years have witnessed remarkable progress of image super-resolution using deep learning techniques. This article aims to provide a comprehensive survey on recent advances of image super-resolution using deep learning approaches. In general, we can roughly group the existing studies of SR techniques into three major categories: supervised SR, unsupervised SR, and domain-specific SR. In addition, we also cover some other important issues, such as publicly available benchmark datasets and performance evaluation metrics. Finally, we conclude this survey by highlighting several future directions and open issues which should be further addressed by the community in the future. |
Tasks | Image Super-Resolution, Super-Resolution |
Published | 2019-02-16 |
URL | https://arxiv.org/abs/1902.06068v2 |
PDF | https://arxiv.org/pdf/1902.06068v2.pdf |
PWC | https://paperswithcode.com/paper/deep-learning-for-image-super-resolution-a |
Repo | https://github.com/impredicative/irc-url-title-bot |
Framework | tf |
Improving SIEM for Critical SCADA Water Infrastructures Using Machine Learning
Title | Improving SIEM for Critical SCADA Water Infrastructures Using Machine Learning |
Authors | Hanan Hindy, David Brosset, Ethan Bayne, Amar Seeam, Xavier Bellekens |
Abstract | Network Control Systems (NAC) have been used in many industrial processes. They aim to reduce the human factor burden and efficiently handle the complex process and communication of those systems. Supervisory control and data acquisition (SCADA) systems are used in industrial, infrastructure and facility processes (e.g. manufacturing, fabrication, oil and water pipelines, building ventilation, etc.). Like other Internet of Things (IoT) implementations, SCADA systems are vulnerable to cyber-attacks; therefore, robust anomaly detection is a major requirement. However, having an accurate anomaly detection system is not an easy task, due to the difficulty of differentiating between cyber-attacks and system internal failures (e.g. hardware failures). In this paper, we present a model that detects anomaly events in a water system controlled by SCADA. Six Machine Learning techniques have been used in building and evaluating the model. The model classifies different anomaly events including hardware failures (e.g. sensor failures), sabotage and cyber-attacks (e.g. DoS and Spoofing). Unlike other detection systems, our proposed work focuses on notifying the operator when an anomaly occurs, together with the probability of the event occurring. This additional information helps in accelerating the mitigation process. The model is trained and tested using a real-world dataset. |
Tasks | Anomaly Detection, Cyber Attack Detection |
Published | 2019-03-06 |
URL | http://arxiv.org/abs/1904.05724v1 |
PDF | http://arxiv.org/pdf/1904.05724v1.pdf |
PWC | https://paperswithcode.com/paper/improving-siem-for-critical-scada-water |
Repo | https://github.com/AbertayMachineLearningGroup/machine-learning-SIEM-water-infrastructure |
Framework | none |
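The notify-with-probability behaviour described in the abstract maps naturally onto a classifier's class probabilities; the sketch below uses a random forest purely as a stand-in for the six techniques the paper compares, and the alert threshold is an assumption.

```python
from sklearn.ensemble import RandomForestClassifier

def train_and_alert(X_train, y_train, X_new, threshold=0.5):
    """Train on labelled SCADA events, then report class and probability."""
    clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
    proba = clf.predict_proba(X_new)
    for p in proba:
        if p.max() >= threshold:
            # operator sees both the predicted anomaly type and how likely it is
            print(f"event: {clf.classes_[p.argmax()]} (probability {p.max():.2f})")
    return proba
```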
An adaptive simulated annealing EM algorithm for inference on non-homogeneous hidden Markov models
Title | An adaptive simulated annealing EM algorithm for inference on non-homogeneous hidden Markov models |
Authors | Aliaksandr Hubin |
Abstract | Non-homogeneous hidden Markov models (NHHMM) are a subclass of dependent mixture models used for semi-supervised learning, where both transition probabilities between the latent states and mean parameter of the probability distribution of the responses (for a given state) depend on the set of $p$ covariates. A priori we do not know which (and how) covariates influence the transition probabilities and the mean parameters. This induces a complex combinatorial optimization problem for model selection with $4^p$ potential configurations. To address the problem, in this article we propose an adaptive (A) simulated annealing (SA) expectation maximization (EM) algorithm (ASA-EM) for joint optimization of models and their parameters with respect to a criterion of interest. |
Tasks | Combinatorial Optimization, Model Selection |
Published | 2019-12-20 |
URL | https://arxiv.org/abs/1912.09733v1 |
PDF | https://arxiv.org/pdf/1912.09733v1.pdf |
PWC | https://paperswithcode.com/paper/an-adaptive-simulated-annealing-em-algorithm |
Repo | https://github.com/aliaksah/depmixS4pp |
Framework | none |
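The search the abstract describes can be pictured as simulated annealing over covariate configurations; the sketch below collapses the 4^p space to a binary in/out choice per covariate for brevity, and `score_fn` stands in for an EM run returning the criterion of interest. The adaptivity of the annealing schedule is not modelled.

```python
import numpy as np

def anneal_model_search(score_fn, p, n_iter=500, t0=1.0, cooling=0.99, seed=0):
    """Simulated-annealing search over binary covariate configurations."""
    rng = np.random.default_rng(seed)
    cfg = rng.integers(0, 2, size=p)
    cur = score_fn(cfg)                           # e.g. EM-based criterion for cfg
    best, best_cfg, t = cur, cfg.copy(), t0
    for _ in range(n_iter):
        cand = cfg.copy()
        cand[rng.integers(p)] ^= 1                # flip one covariate in/out
        s = score_fn(cand)
        if s > cur or rng.random() < np.exp((s - cur) / t):
            cfg, cur = cand, s                    # accept (occasionally a worse move)
            if s > best:
                best, best_cfg = s, cfg.copy()
        t *= cooling                              # cool the temperature
    return best_cfg, best
```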