Paper Group ANR 212
Contents: Polynomially Coded Regression: Optimal Straggler Mitigation via Data Encoding · Multiple Manifolds Metric Learning with Application to Image Set Classification · Hartley Spectral Pooling for Deep Learning · Testing for Conditional Mean Independence with Covariates through Martingale Difference Divergence · A representer theorem for deep neural networks …
Polynomially Coded Regression: Optimal Straggler Mitigation via Data Encoding
Title | Polynomially Coded Regression: Optimal Straggler Mitigation via Data Encoding |
Authors | Songze Li, Seyed Mohammadreza Mousavi Kalan, Qian Yu, Mahdi Soltanolkotabi, A. Salman Avestimehr |
Abstract | We consider the problem of training a least-squares regression model on a large dataset using gradient descent. The computation is carried out on a distributed system consisting of a master node and multiple worker nodes. Such distributed systems are significantly slowed down due to the presence of slow-running machines (stragglers) as well as various communication bottlenecks. We propose “polynomially coded regression” (PCR), which substantially reduces the effect of stragglers and lessens the communication burden in such systems. The key idea of PCR is to encode the partial data stored at each worker, such that the computations at the workers can be viewed as evaluating a polynomial at distinct points. This allows the master to compute the final gradient by interpolating this polynomial. PCR significantly reduces the recovery threshold, defined as the number of workers the master has to wait for prior to computing the gradient. In particular, PCR requires a recovery threshold that scales inversely proportionally with the amount of computation/storage available at each worker. In comparison, state-of-the-art straggler-mitigation schemes require a much higher recovery threshold that only decreases linearly in the per-worker computation/storage load. We prove that PCR’s recovery threshold is near minimal and within a factor of two of the best possible scheme. Our experiments over Amazon EC2 demonstrate that compared with state-of-the-art schemes, PCR improves the run-time by 1.50x ~ 2.36x with naturally occurring stragglers, and by as much as 2.58x ~ 4.29x with artificial stragglers. |
Tasks | |
Published | 2018-05-24 |
URL | http://arxiv.org/abs/1805.09934v1 |
http://arxiv.org/pdf/1805.09934v1.pdf | |
PWC | https://paperswithcode.com/paper/polynomially-coded-regression-optimal |
Repo | |
Framework | |
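
The interpolation step at the heart of PCR can be illustrated with a small toy: treat the per-partition results as coefficients of a polynomial, let each worker return one evaluation at a distinct point, and let the master recover the sum from any m evaluations. This is a hedged numpy sketch of that idea only; PCR itself encodes the raw data so workers compute directly on coded partitions, and the function names and the Vandermonde solve below are illustrative assumptions, not the paper's scheme.

```python
import numpy as np

def worker_eval(parts, alpha):
    # One worker's contribution: evaluate p(a) = sum_j parts[j] * a**j,
    # the polynomial whose coefficients are the per-partition results.
    return sum(P * alpha**j for j, P in enumerate(parts))

def master_decode(evals, alphas):
    # Any m = len(parts) evaluations at distinct points determine the
    # degree-(m-1) polynomial; solve the Vandermonde system for its
    # coefficients, then sum them to recover sum_j parts[j].
    m = len(evals)
    V = np.vander(np.asarray(alphas, dtype=float), m, increasing=True)
    coeffs = np.linalg.solve(V, np.stack(evals))   # row j = parts[j]
    return coeffs.sum(axis=0)

# Toy run: 3 gradient partitions, 5 workers, master waits for any 3.
parts = [np.array([1.0, 2.0]), np.array([0.5, -1.0]), np.array([3.0, 0.0])]
alphas = [1.0, 2.0, 3.0, 4.0, 5.0]
evals = [worker_eval(parts, a) for a in alphas]
print(master_decode(evals[:3], alphas[:3]))   # == parts[0]+parts[1]+parts[2]
```

The straggler tolerance is visible here: the two slowest of the five workers are simply never consulted.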
Multiple Manifolds Metric Learning with Application to Image Set Classification
Title | Multiple Manifolds Metric Learning with Application to Image Set Classification |
Authors | Rui Wang, Xiao-Jun Wu, Kai-Xuan Chen, Josef Kittler |
Abstract | In image set classification, a considerable advance has been made by modeling the original image sets with second-order statistics or linear subspaces, which typically lie on Riemannian manifolds. Specifically, these are the Symmetric Positive Definite (SPD) manifold and the Grassmann manifold, respectively, and some algorithms have been developed on them for classification tasks. Motivated by the inability of existing methods to extract discriminative features for data on Riemannian manifolds, we propose a novel algorithm which combines multiple manifolds as the features of the original image sets. In order to fuse these manifolds, the well-studied Riemannian kernels are utilized to map the original Riemannian spaces into high-dimensional Hilbert spaces. A metric learning method is then devised to embed these kernel spaces into a lower-dimensional common subspace for classification. The state-of-the-art results achieved on three datasets corresponding to two different classification tasks, namely face recognition and object categorization, demonstrate the effectiveness of the proposed method. |
Tasks | Face Recognition, Metric Learning |
Published | 2018-05-30 |
URL | http://arxiv.org/abs/1805.11918v1 |
http://arxiv.org/pdf/1805.11918v1.pdf | |
PWC | https://paperswithcode.com/paper/multiple-manifolds-metric-learning-with |
Repo | |
Framework | |
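
The abstract does not name the Riemannian kernels; commonly used choices for these two manifolds are the log-Euclidean Gaussian kernel (SPD) and the projection kernel (Grassmann). A minimal sketch under that assumption, with `sigma` a hypothetical bandwidth parameter:

```python
import numpy as np
from scipy.linalg import logm

def spd_log_euclidean_kernel(S1, S2, sigma=1.0):
    # Gaussian kernel on SPD matrices under the log-Euclidean metric:
    # the distance is the Frobenius norm between matrix logarithms.
    d = np.linalg.norm(logm(S1) - logm(S2), ord='fro')
    return np.exp(-d**2 / (2.0 * sigma**2))

def grassmann_projection_kernel(Y1, Y2):
    # Projection kernel between subspaces; Y1, Y2 have orthonormal columns.
    return np.linalg.norm(Y1.T @ Y2, ord='fro')**2
```

Kernel matrices built this way give the high-dimensional Hilbert spaces that the proposed metric learning step then embeds into a common subspace.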
Hartley Spectral Pooling for Deep Learning
Title | Hartley Spectral Pooling for Deep Learning |
Authors | Hao Zhang, Jianwei Ma |
Abstract | In most convolutional neural networks (CNNs), hidden layers are downsampled to increase computational efficiency and the receptive field size. This operation is commonly called pooling. Maximization and averaging over sliding windows (max/average pooling) and plain downsampling in the form of strided convolution are popular pooling methods. Since pooling is a lossy procedure, a motivation of our work is to design a new pooling approach that loses less information during dimensionality reduction. Inspired by the Fourier spectral pooling (FSP) proposed by Rippel et al. [1], we present a Hartley-transform-based spectral pooling method for CNNs. Compared with FSP, the proposed spectral pooling avoids the use of complex arithmetic for the frequency representation and reduces computation. Spectral pooling preserves more structural features for the network’s discriminability than max and average pooling. We empirically show that Hartley spectral pooling aids the convergence of CNNs trained on the MNIST and CIFAR-10 datasets. |
Tasks | Dimensionality Reduction |
Published | 2018-10-07 |
URL | http://arxiv.org/abs/1810.04028v1 |
http://arxiv.org/pdf/1810.04028v1.pdf | |
PWC | https://paperswithcode.com/paper/hartley-spectral-pooling-for-deep-learning |
Repo | |
Framework | |
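
A minimal numpy sketch of the pooling idea: compute the 2-D discrete Hartley transform via the FFT (H = Re F - Im F, so the stored spectrum is real), keep the centered low-frequency block, and invert. The DHT-via-FFT identity and the involution property are standard; the centering and amplitude-rescaling conventions below are assumptions modeled on Rippel et al.'s Fourier spectral pooling, not the paper's code.

```python
import numpy as np

def dht2(x):
    # 2-D discrete Hartley transform via the FFT: H = Re(F) - Im(F).
    # The output is real, so no complex arithmetic is needed downstream.
    F = np.fft.fft2(x)
    return F.real - F.imag

def idht2(H):
    # The DHT is its own inverse up to scaling by the number of samples.
    return dht2(H) / H.size

def hartley_spectral_pool(x, out_h, out_w):
    # Keep the centered low-frequency block of the Hartley spectrum,
    # then transform back; rescale so amplitudes match the input's.
    h, w = x.shape
    H = np.fft.fftshift(dht2(x))
    top, left = (h - out_h) // 2, (w - out_w) // 2
    H = H[top:top + out_h, left:left + out_w]
    return idht2(np.fft.ifftshift(H)) * (out_h * out_w) / (h * w)
```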
Testing for Conditional Mean Independence with Covariates through Martingale Difference Divergence
Title | Testing for Conditional Mean Independence with Covariates through Martingale Difference Divergence |
Authors | Ze Jin, Xiaohan Yan, David S. Matteson |
Abstract | A crucial problem in statistics is deciding whether additional variables are needed in a regression model. We propose a new multivariate test to investigate the conditional mean independence of Y given X conditioning on some known effect Z, i.e., $E(Y \mid X, Z) = E(Y \mid Z)$. Assuming that $E(Y \mid Z)$ and Z are linearly related, we reformulate an equivalent notion of conditional mean independence through a transformation, which is approximated in practice. We apply the martingale difference divergence (Shao and Zhang, 2014) to measure conditional mean dependence, and show that the estimation error from the approximation is negligible, as it has no impact on the asymptotic distribution of the test statistic under some regularity assumptions. The implementation of our test is demonstrated by both simulations and a financial data example. |
Tasks | |
Published | 2018-05-17 |
URL | http://arxiv.org/abs/1805.06640v1 |
http://arxiv.org/pdf/1805.06640v1.pdf | |
PWC | https://paperswithcode.com/paper/testing-for-conditional-mean-independence |
Repo | |
Framework | |
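
For reference, the martingale difference divergence of Shao and Zhang (2014) underlying the test can be written as follows. This is a sketch from memory of the cited definition, and the plug-in V-statistic shown is an assumed form; signs and centering conventions should be checked against the paper.

```latex
% Population MDD: nonnegative, and zero iff E(Y | X) = E(Y) almost surely.
\operatorname{MDD}(Y \mid X)^2
  = -\,\mathbb{E}\bigl[(Y - \mathbb{E}Y)(Y' - \mathbb{E}Y')\,\lVert X - X'\rVert\bigr],
  \quad (X', Y') \text{ an i.i.d. copy of } (X, Y).

% Natural plug-in estimator from a sample (X_1, Y_1), ..., (X_n, Y_n):
\widehat{\operatorname{MDD}}_n^2
  = -\,\frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1}^{n}
    (Y_i - \bar{Y})(Y_j - \bar{Y})\,\lVert X_i - X_j \rVert .
```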
A representer theorem for deep neural networks
Title | A representer theorem for deep neural networks |
Authors | Michael Unser |
Abstract | We propose to optimize the activation functions of a deep neural network by adding a corresponding functional regularization to the cost function. We justify the use of a second-order total-variation criterion. This allows us to derive a general representer theorem for deep neural networks that makes a direct connection with splines and sparsity. Specifically, we show that the optimal network configuration can be achieved with activation functions that are nonuniform linear splines with adaptive knots. The bottom line is that the action of each neuron is encoded by a spline whose parameters (including the number of knots) are optimized during the training procedure. The scheme results in a computational structure that is compatible with the existing deep-ReLU, parametric ReLU, APL (adaptive piecewise-linear) and MaxOut architectures. It also suggests novel optimization challenges, while making the link with $\ell_1$ minimization and sparsity-promoting techniques explicit. |
Tasks | |
Published | 2018-02-26 |
URL | http://arxiv.org/abs/1802.09210v2 |
http://arxiv.org/pdf/1802.09210v2.pdf | |
PWC | https://paperswithcode.com/paper/a-representer-theorem-for-deep-neural |
Repo | |
Framework | |
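
The theorem's conclusion, that each neuron's optimal activation is a nonuniform linear spline with adaptive knots whose second-order total variation equals the $\ell_1$ norm of its hinge weights, suggests a parameterization like the following sketch; the explicit ReLU-hinge form and all names are illustrative assumptions, not the paper's code.

```python
import numpy as np

def spline_activation(x, knots, a, b0, b1):
    # Nonuniform linear spline: an affine part plus weighted ReLU hinges
    # at adaptive knot locations. Its second-order total variation equals
    # sum(|a|), so an l1 penalty on `a` promotes few active knots, which
    # is the sparsity the representer theorem connects to.
    hinges = np.maximum(x[..., None] - knots, 0.0)   # shape (..., K)
    return b0 + b1 * x + hinges @ a

# Example: three knots; lam * np.abs(a).sum() would be the regularizer.
x = np.linspace(-2, 2, 5)
print(spline_activation(x, knots=np.array([-1.0, 0.0, 1.0]),
                        a=np.array([0.5, -1.0, 2.0]), b0=0.0, b1=1.0))
```

With a = (1,) and a single knot at 0 this reduces to ReLU, which is why the result is compatible with deep-ReLU, parametric ReLU, and APL architectures.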
A New Framework for Machine Intelligence: Concepts and Prototype
Title | A New Framework for Machine Intelligence: Concepts and Prototype |
Authors | Abel Torres Montoya |
Abstract | Machine learning (ML) and artificial intelligence (AI) have become hot topics in many information processing areas, from chatbots to scientific data analysis. At the same time, there is uncertainty about the possibility of extending predominant ML technologies to become general solutions with continuous learning capabilities. Here, a simple, yet comprehensive, theoretical framework for intelligent systems is presented. A combination of Mirror Compositional Representations (MCR) and a Solution-Critic Loop (SCL) is proposed as a generic approach for different types of problems. A prototype implementation is presented for document comparison using the English Wikipedia corpus. |
Tasks | |
Published | 2018-06-06 |
URL | http://arxiv.org/abs/1806.02137v1 |
http://arxiv.org/pdf/1806.02137v1.pdf | |
PWC | https://paperswithcode.com/paper/a-new-framework-for-machine-intelligence |
Repo | |
Framework | |
A Theory of Dichotomous Valuation with Applications to Variable Selection
Title | A Theory of Dichotomous Valuation with Applications to Variable Selection |
Authors | Xingwei Hu |
Abstract | An econometric or statistical model may undergo a marginal gain if we admit a new variable to the model, and a marginal loss if we remove an existing variable from the model. Assuming equality of opportunity among all candidate variables, we derive a valuation framework based on the expected marginal gain and marginal loss over all potential modeling scenarios. However, marginal gain and loss are not symmetric; thus, we introduce three unbiased solutions. When used in variable selection, our new approaches significantly outperform several popular methods used in practice. The results also explore some novel traits of the Shapley value. |
Tasks | |
Published | 2018-08-01 |
URL | https://arxiv.org/abs/1808.00131v5 |
https://arxiv.org/pdf/1808.00131v5.pdf | |
PWC | https://paperswithcode.com/paper/a-theory-of-dichotomous-valuation-with |
Repo | |
Framework | |
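
Since the valuation framework is positioned relative to the Shapley value, the baseline it revisits can be sketched as a Monte Carlo average of each variable's marginal gain over random model-building orders. Here `score` is a hypothetical callable returning a model-fit measure for a variable subset; everything below is a toy, not the paper's method.

```python
import numpy as np

def mc_shapley(score, n_vars, n_perm=200, seed=0):
    # Average each variable's marginal gain when added in a random order;
    # with enough permutations this approximates the Shapley value.
    rng = np.random.default_rng(seed)
    phi = np.zeros(n_vars)
    for _ in range(n_perm):
        chosen, prev = set(), score(frozenset())
        for v in rng.permutation(n_vars):
            chosen.add(v)
            cur = score(frozenset(chosen))
            phi[v] += cur - prev
            prev = cur
    return phi / n_perm

# Toy score: an R^2-like gain that saturates as variables are added.
values = mc_shapley(lambda s: 1.0 - 0.5 ** len(s), n_vars=4)
print(values)   # symmetric variables receive equal value
```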
Moving Beyond Sub-Gaussianity in High-Dimensional Statistics: Applications in Covariance Estimation and Linear Regression
Title | Moving Beyond Sub-Gaussianity in High-Dimensional Statistics: Applications in Covariance Estimation and Linear Regression |
Authors | Arun Kumar Kuchibhotla, Abhishek Chakrabortty |
Abstract | Concentration inequalities form an essential toolkit in the study of high-dimensional statistical methods. Most of the relevant statistics literature is based on the assumptions of sub-Gaussian/sub-exponential random vectors. In this paper, we bring together various probability inequalities for sums of independent random variables under much weaker exponential-type (sub-Weibull) tail assumptions. These results extract a part sub-Gaussian tail behavior in finite samples, matching the asymptotics governed by the central limit theorem, and are compactly represented in terms of a new Orlicz quasi-norm - the Generalized Bernstein-Orlicz norm - that typifies such tail behaviors. We illustrate the usefulness of these inequalities through the analysis of four fundamental problems in high-dimensional statistics. In the first two problems, we study the rate of convergence of the sample covariance matrix in terms of the maximum elementwise norm and the maximum k-sub-matrix operator norm, which are key quantities of interest in bootstrap procedures and high-dimensional structured covariance matrix estimation. The third example concerns the restricted eigenvalue condition, required in high-dimensional linear regression, which we verify for all sub-Weibull random vectors under only marginal (not joint) tail assumptions on the covariates. To our knowledge, this is the first unified result obtained in such generality. In the final example, we consider the Lasso estimator for linear regression and establish its rate of convergence under much weaker tail assumptions (on the errors as well as the covariates) than those in the existing literature. The common feature in all our results is that the convergence rates under most exponential tails match the usual ones under sub-Gaussian assumptions. Finally, we also establish a high-dimensional CLT and tail bounds for empirical processes of sub-Weibull random variables. |
Tasks | |
Published | 2018-04-08 |
URL | http://arxiv.org/abs/1804.02605v2 |
http://arxiv.org/pdf/1804.02605v2.pdf | |
PWC | https://paperswithcode.com/paper/moving-beyond-sub-gaussianity-in-high |
Repo | |
Framework | |
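
The sub-Weibull tail condition that generalizes the sub-Gaussian (alpha = 2) and sub-exponential (alpha = 1) settings is standardly captured by an Orlicz-type norm; writing it out makes the paper's assumption concrete (the threshold 2 is the usual convention):

```latex
\|X\|_{\psi_\alpha} \;:=\; \inf\Bigl\{\eta > 0 :\;
  \mathbb{E}\exp\bigl(|X|^{\alpha} / \eta^{\alpha}\bigr) \le 2 \Bigr\},
  \qquad \alpha > 0.
% X is sub-Weibull of order alpha iff this norm is finite;
% alpha = 2 recovers sub-Gaussian, alpha = 1 sub-exponential, and
% alpha < 1 allows the heavier exponential-type tails studied here.
```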
OrthoSeg: A Deep Multimodal Convolutional Neural Network for Semantic Segmentation of Orthoimagery
Title | OrthoSeg: A Deep Multimodal Convolutional Neural Network for Semantic Segmentation of Orthoimagery |
Authors | Pankaj Bodani, Kumar Shreshtha, Shashikant Sharma |
Abstract | This paper addresses the task of semantic segmentation of orthoimagery using multimodal data, e.g., optical RGB, infrared, and digital surface models. We propose a deep convolutional neural network architecture termed OrthoSeg for semantic segmentation using multimodal, orthorectified, and coregistered data. We also propose a training procedure for supervised training of OrthoSeg. The training procedure complements the inherent architectural characteristics of OrthoSeg for preventing complex co-adaptations of learned features, which may arise due to probable high dimensionality and spatial correlation in multimodal and/or multispectral coregistered data. OrthoSeg consists of parallel encoding networks for independent encoding of multimodal feature maps and a decoder designed for efficiently fusing independently encoded multimodal feature maps. A softmax layer at the end of the network uses the features generated by the decoder for pixel-wise classification. The decoder fuses feature maps from the parallel encoders locally as well as contextually at multiple scales to generate per-pixel feature maps for final pixel-wise classification, resulting in segmented output. We experimentally show the merits of OrthoSeg by demonstrating state-of-the-art accuracy on the ISPRS Potsdam 2D Semantic Segmentation dataset. Adaptability is one of the key motivations behind OrthoSeg, so that it serves as a useful architectural option for a wide range of problems involving the task of semantic segmentation of coregistered multimodal and/or multispectral imagery. Hence, OrthoSeg is designed to enable independent scaling of the parallel encoder networks and decoder network to better match application requirements, such as the number of input channels, the effective field-of-view, and model capacity. |
Tasks | Semantic Segmentation, Semantic Segmentation Of Orthoimagery |
Published | 2018-11-19 |
URL | http://arxiv.org/abs/1811.07859v2 |
http://arxiv.org/pdf/1811.07859v2.pdf | |
PWC | https://paperswithcode.com/paper/orthoseg-a-deep-multimodal-convolutional |
Repo | |
Framework | |
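
The final fuse-then-classify step the abstract describes reduces, in its simplest form, to concatenating per-pixel features from the parallel encoders and applying a linear layer with softmax. The sketch below is that simplification only; the shapes and plain concatenation are assumptions, and the actual decoder fuses locally and contextually at multiple scales.

```python
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse_and_classify(feat_rgb, feat_ir, feat_dsm, W):
    # Concatenate per-pixel feature maps from the parallel encoders,
    # then classify every pixel with a linear layer + softmax.
    f = np.concatenate([feat_rgb, feat_ir, feat_dsm], axis=-1)  # (H, W, C)
    return softmax(f @ W)                                       # (H, W, classes)
```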
Image Classification at Supercomputer Scale
Title | Image Classification at Supercomputer Scale |
Authors | Chris Ying, Sameer Kumar, Dehao Chen, Tao Wang, Youlong Cheng |
Abstract | Deep learning is extremely computationally intensive, and hardware vendors have responded by building faster accelerators in large clusters. Training deep learning models at petaFLOPS scale requires overcoming both algorithmic and systems software challenges. In this paper, we discuss three systems-related optimizations: (1) distributed batch normalization to control per-replica batch sizes, (2) input pipeline optimizations to sustain model throughput, and (3) 2-D torus all-reduce to speed up gradient summation. We combine these optimizations to train ResNet-50 on ImageNet to 76.3% accuracy in 2.2 minutes on a 1024-chip TPU v3 Pod with a training throughput of over 1.05 million images/second and no accuracy drop. |
Tasks | Image Classification |
Published | 2018-11-16 |
URL | http://arxiv.org/abs/1811.06992v2 |
http://arxiv.org/pdf/1811.06992v2.pdf | |
PWC | https://paperswithcode.com/paper/image-classification-at-supercomputer-scale |
Repo | |
Framework | |
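
Of the three optimizations, distributed batch normalization is the simplest to sketch: normalization statistics are pooled over small groups of replicas, decoupling the effective batch-norm batch size from the per-replica batch. The numpy emulation below is an assumption about mechanics; a real implementation would all-reduce the mean and variance across each group rather than reshape a global tensor.

```python
import numpy as np

def grouped_batch_norm(x, group_size, eps=1e-5):
    # x: (num_replicas, per_replica_batch, features); num_replicas must be
    # divisible by group_size. Statistics are computed over each group of
    # replicas rather than per replica or globally.
    r, b, f = x.shape
    g = x.reshape(r // group_size, group_size * b, f)
    mean = g.mean(axis=1, keepdims=True)
    var = g.var(axis=1, keepdims=True)
    return ((g - mean) / np.sqrt(var + eps)).reshape(r, b, f)
```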
Bounded Policy Synthesis for POMDPs with Safe-Reachability Objectives
Title | Bounded Policy Synthesis for POMDPs with Safe-Reachability Objectives |
Authors | Yue Wang, Swarat Chaudhuri, Lydia E. Kavraki |
Abstract | Planning robust executions under uncertainty is a fundamental challenge for building autonomous robots. Partially Observable Markov Decision Processes (POMDPs) provide a standard framework for modeling uncertainty in many applications. In this work, we study POMDPs with safe-reachability objectives, which require that with a probability above some threshold, a goal state is eventually reached while keeping the probability of visiting unsafe states below some threshold. This formulation differs from traditional POMDP models with optimality objectives, and we show through an example that, in some cases, POMDPs with safe-reachability objectives can provide a better guarantee of both safety and reachability than existing POMDP models. A key algorithmic problem for POMDPs is policy synthesis, which requires reasoning over a vast space of beliefs (probability distributions). To address this challenge, we introduce the notion of a goal-constrained belief space, which only contains beliefs reachable from the initial belief under desired executions that can achieve the given safe-reachability objective. Our method compactly represents this space over a bounded horizon using symbolic constraints, and employs an incremental Satisfiability Modulo Theories (SMT) solver to efficiently search for a valid policy over it. We evaluate our method using a case study involving a partially observable robotic domain with uncertain obstacles. The results show that our method can synthesize policies over large belief spaces with a small number of SMT solver calls by focusing on the goal-constrained belief space. |
Tasks | |
Published | 2018-01-29 |
URL | http://arxiv.org/abs/1801.09780v2 |
http://arxiv.org/pdf/1801.09780v2.pdf | |
PWC | https://paperswithcode.com/paper/bounded-policy-synthesis-for-pomdps-with-safe |
Repo | |
Framework | |
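
The beliefs the synthesis method reasons over evolve by the standard Bayesian filter; a minimal sketch of one belief update follows (the tensor layout is an assumption). The paper's contribution is then to restrict attention to the goal-constrained subset of such belief sequences and search it symbolically with an SMT solver rather than enumerating updates.

```python
import numpy as np

def belief_update(b, a, o, T, O):
    # b: belief over states, shape (S,).  T[a]: (S, S) transition matrix.
    # O[a]: (S, num_obs) observation likelihoods for the successor state.
    bp = (b @ T[a]) * O[a][:, o]   # predict, then weight by the observation
    return bp / bp.sum()           # normalize back to a distribution
```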
Improving Active Learning in Systematic Reviews
Title | Improving Active Learning in Systematic Reviews |
Authors | Gaurav Singh, James Thomas, John Shawe-Taylor |
Abstract | Systematic reviews are essential to summarizing the results of different clinical and social science studies. The first step in a systematic review task is to identify all the studies relevant to the review. The task of identifying relevant studies for a given systematic review is usually performed manually and, as a result, involves a substantial amount of expensive human effort. Lately, there have been some attempts to reduce this manual effort using active learning. In this work, we build upon some such existing techniques, and validate them by experimenting on a larger and more comprehensive dataset than has been attempted until now. Our experiments provide insights on the use of different feature extraction models for different disciplines. More importantly, we identify that a naive active-learning-based screening process is biased in favour of selecting similar documents. We then successfully improve the performance of the screening process using a novel active learning algorithm. Additionally, we propose a mechanism to choose the best feature extraction method for a given review. |
Tasks | Active Learning |
Published | 2018-01-29 |
URL | http://arxiv.org/abs/1801.09496v1 |
http://arxiv.org/pdf/1801.09496v1.pdf | |
PWC | https://paperswithcode.com/paper/improving-active-learning-in-systematic |
Repo | |
Framework | |
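
The naive screening loop the paper identifies as biased is plain uncertainty sampling; sketching that baseline clarifies what the proposed algorithm must counteract (here `probs` is a hypothetical vector of predicted relevance from any classifier):

```python
import numpy as np

def uncertainty_sample(probs, k):
    # Select the k unlabeled documents whose predicted relevance is
    # closest to 0.5, the classic baseline. Because similar documents
    # receive similar scores, this tends to query near-duplicates,
    # which is the selection bias the paper's algorithm corrects.
    return np.argsort(np.abs(probs - 0.5))[:k]
```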
Scalable Formal Concept Analysis algorithm for large datasets using Spark
Title | Scalable Formal Concept Analysis algorithm for large datasets using Spark |
Authors | Raghavendra K Chunduri, Aswani Kumar Cherukuri |
Abstract | In the process of knowledge discovery and representation in large datasets using formal concept analysis, complexity plays a major role in identifying all the formal concepts and constructing the concept lattice (a digraph of the concepts). For identifying the formal concepts and constructing the digraph from the identified concepts in very large datasets, various distributed algorithms are available in the literature. However, the existing distributed algorithms are not well suited to concept generation because it is an iterative process. The existing algorithms are implemented using distributed frameworks like MapReduce and OpenMP, which are not appropriate for iterative applications. Hence, in this paper we propose efficient distributed algorithms for both formal concept generation and concept lattice digraph construction in large formal contexts using Apache Spark. Various performance metrics are considered for the evaluation of the proposed work, and the results prove that the proposed algorithms are more efficient for concept generation and lattice graph construction than the existing algorithms. |
Tasks | graph construction |
Published | 2018-07-06 |
URL | http://arxiv.org/abs/1807.02258v1 |
http://arxiv.org/pdf/1807.02258v1.pdf | |
PWC | https://paperswithcode.com/paper/scalable-formal-concept-analysis-algorithm |
Repo | |
Framework | |
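
The computation being distributed is built from the two derivation operators of formal concept analysis; a tiny in-memory version shows the fixed-point structure (a sequential toy, not the Spark algorithms):

```python
def extent(ctx, attrs):
    # Objects possessing every attribute in `attrs`.
    # ctx maps each object to its frozenset of attributes.
    return frozenset(g for g, a in ctx.items() if attrs <= a)

def intent(ctx, objs):
    # Attributes shared by every object in `objs`.
    sets = [ctx[g] for g in objs]
    return frozenset.intersection(*sets) if sets else \
        frozenset().union(*ctx.values())

# A formal concept is a fixed point: intent(extent(B)) == B.
ctx = {'g1': frozenset('ab'), 'g2': frozenset('bc'), 'g3': frozenset('ab')}
B = frozenset('ab')
print(extent(ctx, B), intent(ctx, extent(ctx, B)) == B)
```

Enumerating all such fixed points is iterative and exponential in the worst case, which is why iteration-friendly Spark is proposed over MapReduce-style frameworks.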
Convolutional Graph Auto-encoder: A Deep Generative Neural Architecture for Probabilistic Spatio-temporal Solar Irradiance Forecasting
Title | Convolutional Graph Auto-encoder: A Deep Generative Neural Architecture for Probabilistic Spatio-temporal Solar Irradiance Forecasting |
Authors | Mahdi Khodayar, Saeed Mohammadi, Mohammad Khodayar, Jianhui Wang, Guangyi Liu |
Abstract | Machine learning on graph-structured data is an important and omnipresent task for a vast variety of applications, including anomaly detection and dynamic network analysis. In this paper, a deep generative model is introduced to capture continuous probability densities corresponding to the nodes of an arbitrary graph. In contrast to all learning formulations in the area of discriminative pattern recognition, we propose a scalable generative optimization algorithm theoretically proven to capture distributions at the nodes of a graph. Our model is able to generate samples from the probability densities learned at each node. This probabilistic data generation model, i.e. the convolutional graph auto-encoder (CGAE), is devised based on the localized first-order approximation of spectral graph convolutions, deep learning, and variational Bayesian inference. We apply our CGAE to a new problem, spatio-temporal probabilistic solar irradiance prediction. Multiple solar radiation measurement sites in a wide area in the northern states of the US are modeled as an undirected graph. Using our proposed model, the distribution of future irradiance given historical radiation observations is estimated for every site/node. Numerical results on the National Solar Radiation Database show state-of-the-art performance for probabilistic radiation prediction on geographically distributed irradiance data in terms of reliability, sharpness, and continuous ranked probability score. |
Tasks | Anomaly Detection, Bayesian Inference |
Published | 2018-09-10 |
URL | http://arxiv.org/abs/1809.03538v1 |
http://arxiv.org/pdf/1809.03538v1.pdf | |
PWC | https://paperswithcode.com/paper/convolutional-graph-auto-encoder-a-deep |
Repo | |
Framework | |
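
The named building block, the localized first-order approximation of spectral graph convolutions, is the renormalized propagation rule of Kipf and Welling; a dense numpy sketch of one layer follows (CGAE stacks such layers around a variational bottleneck, which is omitted here):

```python
import numpy as np

def gcn_layer(A, H, W):
    # One graph-convolution layer: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W),
    # the localized first-order approximation of a spectral filter.
    A_hat = A + np.eye(A.shape[0])                 # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)
```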
Deep Reinforcement Learning using Capsules in Advanced Game Environments
Title | Deep Reinforcement Learning using Capsules in Advanced Game Environments |
Authors | Per-Arne Andersen |
Abstract | Reinforcement Learning (RL) is a research area that has blossomed tremendously in recent years and has shown remarkable potential for artificial-intelligence-based opponents in computer games. This success is primarily due to the vast capabilities of Convolutional Neural Networks (ConvNets), enabling algorithms to extract useful information from noisy environments. The Capsule Network (CapsNet) is a recent addition to the Deep Learning algorithm group and has only barely begun to be explored. The network is an architecture for image classification, with superior performance for classification of the MNIST dataset. CapsNets have not been explored beyond image classification. This thesis introduces the use of CapsNets for Q-Learning-based game algorithms. To successfully apply CapsNets in advanced game play, three main contributions follow. First, the introduction of four new game environments as frameworks for RL research with increasing complexity, namely Flash RL, Deep Line Wars, Deep RTS, and Deep Maze. These environments fill the gap between relatively simple and more complex game environments available for RL research, and are used in the thesis to test and explore CapsNet behavior. Second, the thesis introduces a generative modeling approach to produce artificial training data for use in Deep Learning models, including CapsNets. We empirically show that conditional generative modeling can successfully generate game data of sufficient quality to train a Deep Q-Network well. Third, we show that CapsNet is a reliable architecture for Deep Q-Learning-based algorithms for game AI. A capsule is a group of neurons that determines the presence of objects in the data and is shown in the literature to increase the robustness of training and predictions while lowering the amount of training data needed. It should, therefore, be ideally suited for game play. |
Tasks | Image Classification, Q-Learning |
Published | 2018-01-29 |
URL | http://arxiv.org/abs/1801.09597v1 |
http://arxiv.org/pdf/1801.09597v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-reinforcement-learning-using-capsules-in |
Repo | |
Framework | |
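
A capsule outputs a vector whose length encodes the presence probability of an entity; the standard squashing nonlinearity that produces this behavior is easy to sketch (a CapsNet primitive, not the thesis code):

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    # Shrink vectors to length < 1 while preserving orientation, so a
    # capsule's output length reads as an object-presence probability.
    n2 = np.sum(s * s, axis=axis, keepdims=True)
    return (n2 / (1.0 + n2)) * s / np.sqrt(n2 + eps)
```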