Paper Group ANR 371
VC-Dimension Based Generalization Bounds for Relational Learning
Title | VC-Dimension Based Generalization Bounds for Relational Learning |
Authors | Ondrej Kuzelka, Yuyi Wang, Steven Schockaert |
Abstract | In many applications of relational learning, the available data can be seen as a sample from a larger relational structure (e.g. we may be given a small fragment from some social network). In this paper we are particularly concerned with scenarios in which we can assume that (i) the domain elements appearing in the given sample have been uniformly sampled without replacement from the (unknown) full domain and (ii) the sample is complete for these domain elements (i.e. it is the full substructure induced by these elements). Within this setting, we study bounds on the error of sufficient statistics of relational models that are estimated on the available data. As our main result, we prove a bound based on a variant of the Vapnik-Chervonenkis dimension which is suitable for relational data. |
Tasks | Relational Reasoning |
Published | 2018-04-17 |
URL | http://arxiv.org/abs/1804.06188v2 |
http://arxiv.org/pdf/1804.06188v2.pdf | |
PWC | https://paperswithcode.com/paper/vc-dimension-based-generalization-bounds-for |
Repo | |
Framework | |
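The sampling setting in the abstract above, where (i) domain elements are drawn uniformly without replacement and (ii) the sample is the full substructure induced by them, can be sketched for the special case of a graph (e.g. a small social network). The function name and toy data below are illustrative, not from the paper.

```python
import random

def induced_sample(edges, domain, k, seed=0):
    """Draw k domain elements uniformly without replacement and return
    the full substructure they induce (here: the induced subgraph)."""
    rng = random.Random(seed)
    sample = set(rng.sample(sorted(domain), k))
    # Completeness assumption (ii): keep every edge whose endpoints
    # both lie in the sampled domain.
    induced = [(u, v) for (u, v) in edges if u in sample and v in sample]
    return sample, induced

# Toy friendship graph standing in for a relational structure.
edges = [(1, 2), (2, 3), (3, 4), (4, 1), (1, 3)]
nodes, sub = induced_sample(edges, {1, 2, 3, 4}, 3)
```

Sufficient statistics of a relational model would then be estimated on `sub`, and the paper's VC-style bound controls their deviation from the values on the full structure.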
Predictive Performance Modeling for Distributed Computing using Black-Box Monitoring and Machine Learning
Title | Predictive Performance Modeling for Distributed Computing using Black-Box Monitoring and Machine Learning |
Authors | Carl Witt, Marc Bux, Wladislaw Gusew, Ulf Leser |
Abstract | In many domains, the previous decade was characterized by increasing data volumes and growing complexity of computational workloads, creating new demands for highly data-parallel computing in distributed systems. Effective operation of these systems is challenging when facing uncertainties about the performance of jobs and tasks under varying resource configurations, e.g., for scheduling and resource allocation. We survey predictive performance modeling (PPM) approaches to estimate performance metrics such as execution duration, required memory or wait times of future jobs and tasks based on past performance observations. We focus on non-intrusive methods, i.e., methods that can be applied to any workload without modification, since the workload is usually a black-box from the perspective of the systems managing the computational infrastructure. We classify and compare sources of performance variation, predicted performance metrics, required training data, use cases, and the underlying prediction techniques. We conclude by identifying several open problems and pressing research needs in the field. |
Tasks | |
Published | 2018-05-30 |
URL | http://arxiv.org/abs/1805.11877v1 |
http://arxiv.org/pdf/1805.11877v1.pdf | |
PWC | https://paperswithcode.com/paper/predictive-performance-modeling-for |
Repo | |
Framework | |
Split learning for health: Distributed deep learning without sharing raw patient data
Title | Split learning for health: Distributed deep learning without sharing raw patient data |
Authors | Praneeth Vepakomma, Otkrist Gupta, Tristan Swedish, Ramesh Raskar |
Abstract | Can health entities collaboratively train deep learning models without sharing sensitive raw data? This paper proposes several configurations of a distributed deep learning method called SplitNN to facilitate such collaborations. SplitNN does not share raw data or model details with collaborating institutions. The proposed configurations of SplitNN cater to practical settings of i) entities holding different modalities of patient data, ii) centralized and local health entities collaborating on multiple tasks and iii) learning without sharing labels. We compare performance and resource-efficiency trade-offs of SplitNN and other distributed deep learning methods such as federated learning and large-batch synchronous stochastic gradient descent, and show highly encouraging results for SplitNN. |
Tasks | |
Published | 2018-12-03 |
URL | http://arxiv.org/abs/1812.00564v1 |
http://arxiv.org/pdf/1812.00564v1.pdf | |
PWC | https://paperswithcode.com/paper/split-learning-for-health-distributed-deep |
Repo | |
Framework | |
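The core SplitNN mechanism, in which the data holder computes activations up to a cut layer and shares only those "smashed" activations while the server runs the remaining layers, can be sketched with plain NumPy. The two-layer shapes and the label-sharing setup here are illustrative assumptions, not the paper's exact configurations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Client side: holds the raw patient data and the layers before the cut.
W_client = rng.standard_normal((4, 8))

def client_forward(x):
    # Only this "smashed" activation crosses the network boundary,
    # never the raw record x itself.
    return np.tanh(x @ W_client)

# Server side: holds the layers after the cut (and, in the
# label-sharing configuration, the labels).
W_server = rng.standard_normal((8, 2))

def server_forward(h):
    logits = h @ W_server
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

x = rng.standard_normal((5, 4))   # raw data stays with the client
probs = server_forward(client_forward(x))
```

During training the server would backpropagate gradients only down to the cut layer and return them to the client, so neither party ever sees the other's raw data or full weights.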
How deep is deep enough? – Quantifying class separability in the hidden layers of deep neural networks
Title | How deep is deep enough? – Quantifying class separability in the hidden layers of deep neural networks |
Authors | Achim Schilling, Claus Metzner, Jonas Rietsch, Richard Gerum, Holger Schulze, Patrick Krauss |
Abstract | Deep neural networks typically outperform more traditional machine learning models in their ability to classify complex data, and yet it is not clear how the individual hidden layers of a deep network contribute to the overall classification performance. We thus introduce a Generalized Discrimination Value (GDV) that measures, in a non-invasive manner, how well different data classes separate in each given network layer. The GDV can be used for the automatic tuning of hyper-parameters, such as the width profile and the total depth of a network. Moreover, the layer-dependent GDV(L) provides new insights into the data transformations that self-organize during training: In the case of multi-layer perceptrons trained with error backpropagation, we find that classification of highly complex data sets requires a temporal {\em reduction} of class separability, marked by a characteristic ‘energy barrier’ in the initial part of the GDV(L) curve. Even more surprisingly, for a given data set, the GDV(L) runs through a fixed ‘master curve’, independently of the total number of network layers. Furthermore, applying the GDV to Deep Belief Networks reveals that unsupervised training with the Contrastive Divergence method can also systematically increase class separability over tens of layers, even though the system does not ‘know’ the desired class labels. These results indicate that the GDV may become a useful tool to open the black box of deep learning. |
Tasks | |
Published | 2018-11-05 |
URL | https://arxiv.org/abs/1811.01753v2 |
https://arxiv.org/pdf/1811.01753v2.pdf | |
PWC | https://paperswithcode.com/paper/how-deep-is-deep-enough-optimizing-deep |
Repo | |
Framework | |
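A simplified class-separability score in the spirit of the GDV can be computed by z-scoring the features and comparing the mean within-class distance to the mean between-class distance (more negative means better separated). The paper's exact GDV uses a specific scaling convention; this version, including the name `separability`, is only an approximation.

```python
import numpy as np

def separability(X, y):
    """Simplified separability score: z-score each feature, then take
    (mean within-class distance - mean between-class distance),
    normalized by sqrt(dimension)."""
    X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    classes = np.unique(y)
    def mean_dist(A, B):
        return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1).mean()
    intra = np.mean([mean_dist(X[y == c], X[y == c]) for c in classes])
    inter = np.mean([mean_dist(X[y == a], X[y == b])
                     for i, a in enumerate(classes) for b in classes[i + 1:]])
    return (intra - inter) / np.sqrt(X.shape[1])

rng = np.random.default_rng(1)
y = np.array([0] * 20 + [1] * 20)
# Two tight, well-separated clusters vs. unstructured data.
X_sep = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(5, 0.1, (20, 2))])
X_mix = rng.normal(0, 1.0, (40, 2))
score_sep = separability(X_sep, y)
score_mix = separability(X_mix, y)
```

Applied to the activations of each hidden layer in turn, such a score traces out a layer-dependent curve like the GDV(L) discussed in the abstract.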
Guaranteed Sufficient Decrease for Stochastic Variance Reduced Gradient Optimization
Title | Guaranteed Sufficient Decrease for Stochastic Variance Reduced Gradient Optimization |
Authors | Fanhua Shang, Yuanyuan Liu, Kaiwen Zhou, James Cheng, Kelvin K. W. Ng, Yuichi Yoshida |
Abstract | In this paper, we propose a novel sufficient decrease technique for stochastic variance reduced gradient descent methods such as SVRG and SAGA. In order to achieve sufficient decrease for stochastic optimization, we design a new sufficient decrease criterion, which yields sufficient decrease versions of stochastic variance reduction algorithms such as SVRG-SD and SAGA-SD as a byproduct. We introduce a coefficient that scales the current iterate to satisfy the sufficient decrease property; this coefficient decides whether to shrink, expand, or even move in the opposite direction. We then give two specific update rules for the coefficient for Lasso and ridge regression. Moreover, we analyze the convergence properties of our algorithms for strongly convex problems, which show that our algorithms attain linear convergence rates. We also provide convergence guarantees for our algorithms on non-strongly convex problems. Our experimental results further verify that our algorithms achieve significantly better performance than their counterparts. |
Tasks | Stochastic Optimization |
Published | 2018-02-26 |
URL | http://arxiv.org/abs/1802.09933v1 |
http://arxiv.org/pdf/1802.09933v1.pdf | |
PWC | https://paperswithcode.com/paper/guaranteed-sufficient-decrease-for-stochastic |
Repo | |
Framework | |
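A minimal sketch of SVRG on ridge regression with a stand-in sufficient-decrease step: after each epoch, the iterate is rescaled by whichever candidate coefficient most reduces the objective. The paper derives closed-form update rules for this coefficient; the small grid of scalings below is a hypothetical substitute, as are all parameter values.

```python
import numpy as np

def ridge_loss(w, X, y, lam):
    return 0.5 * np.mean((X @ w - y) ** 2) + 0.5 * lam * w @ w

def svrg_sd(X, y, lam=0.1, eta=0.01, epochs=30, inner=100, seed=0):
    """SVRG with a toy 'sufficient decrease' scaling after each epoch."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        w_snap = w.copy()
        mu = X.T @ (X @ w_snap - y) / n + lam * w_snap   # full gradient
        for _ in range(inner):
            i = rng.integers(n)
            gi = X[i] * (X[i] @ w - y[i]) + lam * w
            gi_snap = X[i] * (X[i] @ w_snap - y[i]) + lam * w_snap
            w = w - eta * (gi - gi_snap + mu)            # variance-reduced step
        # Stand-in sufficient-decrease step: pick the best scaling of w.
        thetas = [0.9, 1.0, 1.1]
        w = w * min(thetas, key=lambda t: ridge_loss(t * w, X, y, lam))
    return w

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0])
w_est = svrg_sd(X, y)
```

Because the candidate set includes theta = 1, the scaling step can never increase the objective relative to the plain SVRG iterate.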
Cooperation in the spatial prisoner’s dilemma game with probabilistic abstention
Title | Cooperation in the spatial prisoner’s dilemma game with probabilistic abstention |
Authors | Marcos Cardinot, Josephine Griffith, Colm O’Riordan, Matjaz Perc |
Abstract | Research has shown that the addition of abstention as an option transforms social dilemmas to rock-paper-scissors-type games, where defectors dominate cooperators, cooperators dominate abstainers (loners), and abstainers (loners), in turn, dominate defectors. In this way, abstention can sustain cooperation even under adverse conditions, although defection also persists due to cyclic dominance. However, to abstain or to act as a loner has, to date, always been considered an independent, third strategy to complement traditional cooperation and defection. Here we consider probabilistic abstention, where each player is assigned a probability to abstain in a particular instance of the game. In the two limiting cases, the studied game reverts to the prisoner’s dilemma game without loners or to the optional prisoner’s dilemma game. For intermediate probabilities, we have a new hybrid game, which turns out to be most favorable for the successful evolution of cooperation. We hope this novel hybrid game provides a more realistic view of the dilemma of optional/voluntary participation. |
Tasks | |
Published | 2018-11-25 |
URL | http://arxiv.org/abs/1811.10114v2 |
http://arxiv.org/pdf/1811.10114v2.pdf | |
PWC | https://paperswithcode.com/paper/cooperation-in-the-spatial-prisoners-dilemma |
Repo | |
Framework | |
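The probabilistic-abstention game described above can be sketched as a payoff function: each player abstains with an assigned probability, and if either abstains, both collect the loner payoff. The payoff values and names below are illustrative, not taken from the paper.

```python
import random

# Weak prisoner's dilemma payoffs plus a loner payoff sigma
# (illustrative values: reward R, punishment P, temptation T, sucker S).
R, P, T, S, SIGMA = 1.0, 0.0, 1.3, 0.0, 0.3

def play(strat_a, strat_b, p_a, p_b, rng):
    """One interaction with probabilistic abstention: each player abstains
    with their own probability; if either abstains, both get sigma."""
    if rng.random() < p_a or rng.random() < p_b:
        return SIGMA, SIGMA
    table = {('C', 'C'): (R, R), ('C', 'D'): (S, T),
             ('D', 'C'): (T, S), ('D', 'D'): (P, P)}
    return table[(strat_a, strat_b)]

rng = random.Random(0)
# p = 0 recovers the ordinary PD; p = 1 means both players always abstain.
payoffs_p0 = [play('C', 'D', 0.0, 0.0, rng) for _ in range(100)]
payoffs_p1 = [play('C', 'D', 1.0, 1.0, rng) for _ in range(100)]
```

Intermediate abstention probabilities interpolate between these two limits, which is the hybrid regime the abstract reports as most favorable for cooperation.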
Predicting the Flu from Instagram
Title | Predicting the Flu from Instagram |
Authors | Oguzhan Gencoglu, Miikka Ermes |
Abstract | Conventional surveillance systems for monitoring infectious diseases, such as influenza, face challenges due to a shortage of skilled healthcare professionals, the remoteness of communities and the absence of communication infrastructure. Internet-based approaches for surveillance are appealing both logistically and economically. Search engine queries and Twitter have been the primary data sources in such approaches. The aim of this study is to assess the predictive power of an alternative data source, Instagram. Using 317 weeks of publicly available data from Instagram, we trained several machine learning algorithms to both nowcast and forecast the number of official influenza-like illness incidents in Finland, where population-wide official statistics about the weekly incidents are available. In addition to date and hashtag-count features of online posts, we were also able to utilize the visual content of the posted images with the help of deep convolutional neural networks. Our best nowcasting model reached a mean absolute error of 11.33 incidents per week and a correlation coefficient of 0.963 on the test data. Forecasting models for predicting 1 week and 2 weeks ahead showed statistical significance as well, reaching correlation coefficients of 0.903 and 0.862, respectively. This study demonstrates how social media, and in particular the digital photographs shared on it, can be a valuable source of information for the field of infodemiology. |
Tasks | |
Published | 2018-11-27 |
URL | http://arxiv.org/abs/1811.10949v1 |
http://arxiv.org/pdf/1811.10949v1.pdf | |
PWC | https://paperswithcode.com/paper/predicting-the-flu-from-instagram |
Repo | |
Framework | |
Towards continual learning in medical imaging
Title | Towards continual learning in medical imaging |
Authors | Chaitanya Baweja, Ben Glocker, Konstantinos Kamnitsas |
Abstract | This work investigates continual learning of two segmentation tasks in brain MRI with neural networks. To explore the capabilities of current methods for countering catastrophic forgetting of the first task when a new one is learned, we investigate elastic weight consolidation, a recently proposed method based on Fisher information, originally evaluated on reinforcement learning of Atari games. We use it to sequentially learn segmentation of normal brain structures and then segmentation of white matter lesions. Our findings show this recent method reduces catastrophic forgetting, although considerable room for improvement remains in these challenging settings for continual learning. |
Tasks | Atari Games, Continual Learning |
Published | 2018-11-06 |
URL | http://arxiv.org/abs/1811.02496v1 |
http://arxiv.org/pdf/1811.02496v1.pdf | |
PWC | https://paperswithcode.com/paper/towards-continual-learning-in-medical-imaging |
Repo | |
Framework | |
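Elastic weight consolidation, the method evaluated above, augments the new task's loss with a quadratic penalty weighted by the (diagonal) Fisher information of the old task's parameters, anchoring important parameters near their old-task optimum. A two-parameter toy sketch, with all values hypothetical:

```python
import numpy as np

def ewc_loss(theta, task_b_loss, theta_star, fisher, lam):
    """EWC objective: new-task loss plus a quadratic penalty that keeps
    parameters with large Fisher values near the old-task optimum."""
    penalty = 0.5 * lam * np.sum(fisher * (theta - theta_star) ** 2)
    return task_b_loss(theta) + penalty

# Toy setting: old task prefers theta ~ [1, 0], new task prefers [0, 1].
theta_star = np.array([1.0, 0.0])
fisher = np.array([10.0, 0.1])   # first parameter mattered for the old task
target_b = np.array([0.0, 1.0])

# Gradient descent on the combined objective (lam = 1).
theta = theta_star.copy()
for _ in range(500):
    grad_b = 2 * (theta - target_b)
    grad_pen = fisher * (theta - theta_star)
    theta -= 0.01 * (grad_b + grad_pen)
```

The protected parameter stays close to its old value (the stationary point is 10/12 for the first coordinate) while the unimportant one moves almost all the way to the new task's optimum (2/2.1 for the second), which is exactly the forgetting-vs-plasticity trade-off EWC encodes.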
Predicting the Computational Cost of Deep Learning Models
Title | Predicting the Computational Cost of Deep Learning Models |
Authors | Daniel Justus, John Brennan, Stephen Bonner, Andrew Stephen McGough |
Abstract | Deep learning is rapidly becoming a go-to tool for many artificial intelligence problems due to its ability to outperform other approaches and even humans at many problems. Despite its popularity we are still unable to accurately predict the time it will take to train a deep learning network to solve a given problem. This training time can be seen as the product of the training time per epoch and the number of epochs which need to be performed to reach the desired level of accuracy. Some work has been carried out to predict the training time for an epoch – most has been based on the assumption that the training time is linearly related to the number of floating point operations required. However, this relationship does not hold in practice and breaks down further in cases where other activities start to dominate the execution time, such as the time to load data from memory or performance lost to non-optimal parallel execution. In this work we propose an alternative approach in which we train a deep learning network to predict the execution time for parts of a deep learning network. Timings for these individual parts can then be combined to provide a prediction for the whole execution time. This has advantages over linear approaches as it can model more complex scenarios, and it also has the ability to predict execution times for scenarios unseen in the training data. Therefore, our approach can be used not only to infer the execution time for a batch, or entire epoch, but it can also support making a well-informed choice for the appropriate hardware and model. |
Tasks | |
Published | 2018-11-28 |
URL | http://arxiv.org/abs/1811.11880v1 |
http://arxiv.org/pdf/1811.11880v1.pdf | |
PWC | https://paperswithcode.com/paper/predicting-the-computational-cost-of-deep |
Repo | |
Framework | |
Recursive Sparse Pseudo-input Gaussian Process SARSA
Title | Recursive Sparse Pseudo-input Gaussian Process SARSA |
Authors | John Martin, Brendan Englot |
Abstract | The class of Gaussian Process (GP) methods for Temporal Difference learning has shown promise for data-efficient model-free Reinforcement Learning. In this paper, we consider a recent variant of the GP-SARSA algorithm, called Sparse Pseudo-input Gaussian Process SARSA (SPGP-SARSA), and derive recursive formulas for its predictive moments. This extension promotes greater memory efficiency, since previous computations can be reused, and, interestingly, it provides a technique for updating value estimates on multiple timescales. |
Tasks | |
Published | 2018-11-17 |
URL | http://arxiv.org/abs/1811.07201v1 |
http://arxiv.org/pdf/1811.07201v1.pdf | |
PWC | https://paperswithcode.com/paper/recursive-sparse-pseudo-input-gaussian |
Repo | |
Framework | |
Constant Regret, Generalized Mixability, and Mirror Descent
Title | Constant Regret, Generalized Mixability, and Mirror Descent |
Authors | Zakaria Mhammedi, Robert C. Williamson |
Abstract | We consider the setting of prediction with expert advice; a learner makes predictions by aggregating those of a group of experts. Under this setting, and for the right choice of loss function and “mixing” algorithm, it is possible for the learner to achieve a constant regret regardless of the number of prediction rounds. For example, a constant regret can be achieved for \emph{mixable} losses using the \emph{aggregating algorithm}. The \emph{Generalized Aggregating Algorithm} (GAA) is a name for a family of algorithms parameterized by convex functions on simplices (entropies), which reduce to the aggregating algorithm when using the \emph{Shannon entropy} $\operatorname{S}$. For a given entropy $\Phi$, losses for which a constant regret is possible using the \textsc{GAA} are called $\Phi$-mixable. Which losses are $\Phi$-mixable was previously left as an open question. We fully characterize $\Phi$-mixability and answer other open questions posed by \cite{Reid2015}. We show that the Shannon entropy $\operatorname{S}$ is fundamental in nature when it comes to mixability; any $\Phi$-mixable loss is necessarily $\operatorname{S}$-mixable, and the lowest worst-case regret of the \textsc{GAA} is achieved using the Shannon entropy. Finally, by leveraging the connection between the \emph{mirror descent algorithm} and the update step of the GAA, we suggest a new \emph{adaptive} generalized aggregating algorithm and analyze its performance in terms of the regret bound. |
Tasks | |
Published | 2018-02-20 |
URL | http://arxiv.org/abs/1802.06965v3 |
http://arxiv.org/pdf/1802.06965v3.pdf | |
PWC | https://paperswithcode.com/paper/constant-regret-generalized-mixability-and |
Repo | |
Framework | |
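For the log loss, which is mixable with learning rate eta = 1, the aggregating algorithm with the Shannon entropy reduces to an exponential-weights (Bayesian mixture) scheme and guarantees a constant regret of ln(number of experts). A minimal sketch; the expert forecasts and outcome sequence are made up:

```python
import math

def aggregate(predictions, weights):
    """Mix expert probability forecasts with normalized weights. For log
    loss with eta = 1 this weighted average is the aggregating-algorithm
    prediction and guarantees regret <= ln(number of experts)."""
    z = sum(weights)
    return sum(w / z * p for w, p in zip(weights, predictions))

def update(weights, predictions, outcome, eta=1.0):
    # Exponential-weights update: w_i <- w_i * exp(-eta * loss_i).
    losses = [-math.log(p if outcome else 1 - p) for p in predictions]
    return [w * math.exp(-eta * l) for w, l in zip(weights, losses)]

# Two experts forecasting a binary sequence; expert 0 is better calibrated.
experts = [0.9, 0.5]
weights = [1.0, 1.0]
learner_loss = expert0_loss = 0.0
for outcome in [1, 1, 0, 1, 1, 1, 0, 1]:
    p = aggregate(experts, weights)
    learner_loss += -math.log(p if outcome else 1 - p)
    expert0_loss += -math.log(experts[0] if outcome else 1 - experts[0])
    weights = update(weights, experts, outcome)

regret = learner_loss - expert0_loss
```

The paper's result says this behavior is essentially special to the Shannon entropy: any Phi-mixable loss is already S-mixable, and the Shannon entropy attains the lowest worst-case regret within the GAA family.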
Analysis of Nonautonomous Adversarial Systems
Title | Analysis of Nonautonomous Adversarial Systems |
Authors | Arash Mehrjou |
Abstract | Generative adversarial networks are used to generate images, but their convergence properties are still not well understood. A few studies have investigated the stability properties of GANs as a dynamical system, and this short note can be seen as a step in that direction. Among the proposed methods for stabilizing the training of GANs, $\beta$-GAN was the first to propose a complete annealing strategy to change high-level conditions of the GAN objective. In this note, we show by a simple example how the annealing strategy works in GANs. The theoretical analysis is supported by simple simulations. |
Tasks | |
Published | 2018-03-13 |
URL | http://arxiv.org/abs/1803.05045v1 |
http://arxiv.org/pdf/1803.05045v1.pdf | |
PWC | https://paperswithcode.com/paper/analysis-of-nonautonomous-adversarial-systems |
Repo | |
Framework | |
Distributed Differentially-Private Algorithms for Matrix and Tensor Factorization
Title | Distributed Differentially-Private Algorithms for Matrix and Tensor Factorization |
Authors | Hafiz Imtiaz, Anand D. Sarwate |
Abstract | In many signal processing and machine learning applications, datasets containing private information are held at different locations, requiring the development of distributed privacy-preserving algorithms. Tensor and matrix factorizations are key components of many processing pipelines. In the distributed setting, differentially private algorithms suffer because they introduce noise to guarantee privacy. This paper designs new and improved distributed and differentially private algorithms for two popular matrix and tensor factorization methods: principal component analysis (PCA) and orthogonal tensor decomposition (OTD). The new algorithms employ a correlated noise design scheme to alleviate the effects of noise and can achieve the same noise level as the centralized scenario. Experiments on synthetic and real data illustrate the regimes in which the correlated noise allows performance matching with the centralized setting, outperforming previous methods and demonstrating that meaningful utility is possible while guaranteeing differential privacy. |
Tasks | |
Published | 2018-04-26 |
URL | http://arxiv.org/abs/1804.10299v1 |
http://arxiv.org/pdf/1804.10299v1.pdf | |
PWC | https://paperswithcode.com/paper/distributed-differentially-private-algorithms |
Repo | |
Framework | |
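A basic distributed differentially-private PCA sketch: each site releases only a noise-perturbed second-moment matrix, never raw rows, and the aggregator averages the releases and extracts the top eigenvectors. The symmetric Gaussian perturbation here is a plain Gaussian-mechanism stand-in; the paper's actual contribution is a correlated-noise design that lets the site noises partially cancel at the aggregator, which this sketch does not implement.

```python
import numpy as np

def local_noisy_moment(X, sigma, rng):
    """A site shares a symmetrically perturbed second-moment matrix."""
    A = X.T @ X / len(X)
    E = rng.normal(0.0, sigma, A.shape)
    return A + (E + E.T) / 2

def aggregate_pca(noisy_moments, k):
    """Aggregator: average the noisy moments, take top-k eigenvectors."""
    avg = sum(noisy_moments) / len(noisy_moments)
    vals, vecs = np.linalg.eigh(avg)       # eigenvalues in ascending order
    return vecs[:, ::-1][:, :k]            # top-k principal directions

rng = np.random.default_rng(0)
# Data at 5 sites with dominant variance along the first coordinate axis.
direction = np.zeros(6); direction[0] = 1.0
sites = [rng.standard_normal((200, 1)) * 3 @ direction[None, :]
         + 0.1 * rng.standard_normal((200, 6)) for _ in range(5)]
V = aggregate_pca([local_noisy_moment(X, 0.05, rng) for X in sites], k=1)
```

Averaging over sites already dampens independent noise by a factor of the number of sites; the correlated-noise scheme improves on this to match the centralized noise level.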
Study of Sparsity-Aware Subband Adaptive Filtering Algorithms with Adjustable Penalties
Title | Study of Sparsity-Aware Subband Adaptive Filtering Algorithms with Adjustable Penalties |
Authors | Y. Yu, H. Zhao, R. C. de Lamare |
Abstract | We propose two sparsity-aware normalized subband adaptive filter (NSAF) algorithms by using the gradient descent method to minimize a combination of the original NSAF cost function and the l1-norm penalty function on the filter coefficients. This l1-norm penalty exploits the sparsity of a system in the coefficient update formulation, thus improving the performance when identifying sparse systems. Compared with prior work, the proposed algorithms have lower computational complexity with comparable performance. We study and devise statistical models for these sparsity-aware NSAF algorithms in the mean square sense involving their transient and steady-state behaviors. This study relies on the vectorization argument and the paraunitary assumption imposed on the analysis filter banks, and thus does not restrict the input signal to being Gaussian or having another distribution. In addition, we propose to adjust adaptively the intensity parameter of the sparsity attraction term. Finally, simulation results in sparse system identification demonstrate the effectiveness of our theoretical results. |
Tasks | |
Published | 2018-10-16 |
URL | http://arxiv.org/abs/1810.07559v1 |
http://arxiv.org/pdf/1810.07559v1.pdf | |
PWC | https://paperswithcode.com/paper/study-of-sparsity-aware-subband-adaptive |
Repo | |
Framework | |
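The l1-penalty idea can be illustrated with a fullband zero-attracting NLMS update, where the penalty's subgradient adds a small pull toward zero on every coefficient, helping when the true system is sparse. The paper works with normalized subband adaptive filters and an adaptively adjusted intensity parameter; this fullband version with a fixed rho is only a simplified stand-in.

```python
import numpy as np

def za_nlms(x, d, taps, mu=0.5, rho=1e-4, eps=1e-8):
    """Zero-attracting NLMS: the l1 penalty contributes -rho*sign(w),
    shrinking coefficients that carry no signal toward zero."""
    w = np.zeros(taps)
    for n in range(taps - 1, len(x)):
        u = x[n - taps + 1:n + 1][::-1]          # regressor [x[n], ..., x[n-taps+1]]
        e = d[n] - w @ u
        w += mu * e * u / (u @ u + eps) - rho * np.sign(w)
    return w

rng = np.random.default_rng(0)
h = np.zeros(16); h[[2, 9]] = [1.0, -0.5]        # sparse unknown system
x = rng.standard_normal(4000)
d = np.convolve(x, h)[:len(x)] + 0.01 * rng.standard_normal(len(x))
w = za_nlms(x, d, taps=16)
```

In the subband setting, the same penalized update is applied to each subband's normalized error signal, which is what the paper's mean-square analysis models.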
Zero Initialization of modified Gated Recurrent Encoder-Decoder Network for Short Term Load Forecasting
Title | Zero Initialization of modified Gated Recurrent Encoder-Decoder Network for Short Term Load Forecasting |
Authors | Vedanshu, M M Tripathi |
Abstract | A single-layer feedforward neural network (FNN) is often used as the last layer in models such as seq2seq, or may itself form a simple RNN. The role of such a layer is to transform the output to the required dimensions. When it comes to initializing its weights and biases, there is no specific technique known to speed up the learning process. We could rely on deep-network initialization techniques such as Xavier or He initialization, but these fail to show much improvement in learning speed or accuracy. In this paper we propose Zero Initialization (ZI) for the weights of a single-layer network. We first test this technique on a simple RNN and compare the results against Xavier, He and Identity initialization. As a final test we implement it on a seq2seq network. We find that ZI considerably reduces the number of epochs needed and improves accuracy. The developed model has been applied to short-term load forecasting using load data from the Australian Energy Market. The model is able to forecast the day-ahead load accurately, with an error of 0.94%. |
Tasks | Load Forecasting |
Published | 2018-12-09 |
URL | http://arxiv.org/abs/1812.03425v1 |
http://arxiv.org/pdf/1812.03425v1.pdf | |
PWC | https://paperswithcode.com/paper/zero-initialization-of-modified-gated |
Repo | |
Framework | |
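The zero-initialization idea applies to a lone linear output layer, where, unlike hidden layers, there is no weight symmetry to break, so starting from exactly zero is well defined. A minimal sketch comparing ZI with a Xavier-style random start on a least-squares layer; the data, shapes and hyperparameters are made up for illustration:

```python
import numpy as np

def train_linear(X, y, W0, lr=0.1, steps=200):
    """Gradient descent on a single linear output layer (squared loss).
    With only one layer, each output unit gets a distinct gradient, so a
    zero start does not collapse the units the way it would in a hidden layer."""
    W = W0.copy()
    for _ in range(steps):
        grad = X.T @ (X @ W - y) / len(X)
        W -= lr * grad
    return W

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 8))
W_true = rng.standard_normal((8, 3))
y = X @ W_true

W_zero = train_linear(X, y, np.zeros((8, 3)))                    # ZI
W_xavier = train_linear(X, y, rng.normal(0, np.sqrt(1 / 8), (8, 3)))
```

On this convex problem both starts reach the same solution; the paper's claim is about the speed and accuracy gains of the zero start in its RNN and seq2seq settings, which this sketch does not reproduce.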