Paper Group NANR 57
Continual Learning with Bayesian Neural Networks for Non-Stationary Data. Knossos: Compiling AI with AI. Dynamic Time Lag Regression: Predicting What & When. Reparameterized Variational Divergence Minimization for Stable Imitation. Deep Symbolic Superoptimization Without Human Knowledge. Provable Filter Pruning for Efficient Neural Networks. Compre …
Continual Learning with Bayesian Neural Networks for Non-Stationary Data
Title | Continual Learning with Bayesian Neural Networks for Non-Stationary Data |
Authors | Anonymous |
Abstract | This work addresses continual learning for non-stationary data, using Bayesian neural networks and memory-based online variational Bayes. We represent the posterior approximation of the network weights by a diagonal Gaussian distribution and a complementary memory of raw data. This raw data corresponds to likelihood terms that cannot be well approximated by the Gaussian. We introduce a novel method for sequentially updating both components of the posterior approximation. Furthermore, we propose Bayesian forgetting and a Gaussian diffusion process for adapting to non-stationary data. The experimental results show that our update method improves on existing approaches for streaming data. Additionally, the adaptation methods lead to better predictive performance for non-stationary data. |
Tasks | Continual Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SJlsFpVtDB |
https://openreview.net/pdf?id=SJlsFpVtDB | |
PWC | https://paperswithcode.com/paper/continual-learning-with-bayesian-neural |
Repo | |
Framework | |
Knossos: Compiling AI with AI
Title | Knossos: Compiling AI with AI |
Authors | Anonymous |
Abstract | Machine learning workloads are often expensive to train, taking weeks to converge. The current generation of frameworks relies on custom back-ends in order to achieve efficiency, making it impractical to train models on less common hardware where no such back-ends exist. Knossos builds on recent work that avoids the need for hand-written libraries and instead compiles machine learning models in much the same way one would compile other kinds of software. In order to make the resulting code efficient, the Knossos compiler directly optimises the abstract syntax tree of the program. However, in contrast to traditional compilers that employ hand-written optimisation passes, we take a rewriting approach driven by the $A^\star$ search algorithm and a learned value function that evaluates the future potential cost reduction of applying various rewriting actions to the program. We show that Knossos can automatically learn optimisations that past compilers had to implement by hand. Furthermore, we demonstrate that Knossos can achieve wall time reduction compared to a hand-tuned compiler on a suite of machine learning programs, including basic linear algebra and convolutional networks. The Knossos compiler has minimal dependencies and can be used on any architecture that supports a C++ toolchain. Since the cost model the proposed algorithm optimises can be tailored to a particular hardware architecture, the proposed approach can potentially be applied to a variety of hardware. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SylyHkHYDB |
https://openreview.net/pdf?id=SylyHkHYDB | |
PWC | https://paperswithcode.com/paper/knossos-compiling-ai-with-ai |
Repo | |
Framework | |
Dynamic Time Lag Regression: Predicting What & When
Title | Dynamic Time Lag Regression: Predicting What & When |
Authors | Anonymous |
Abstract | This paper tackles a new regression problem, called Dynamic Time-Lag Regression (DTLR), where a cause signal drives an effect signal with an unknown time delay. The motivating application, pertaining to space weather modelling, aims to predict the near-Earth solar wind speed based on estimates of the Sun’s coronal magnetic field. DTLR differs from mainstream regression and from sequence-to-sequence learning in two respects: firstly, no ground truth (e.g., pairs of associated sub-sequences) is available; secondly, the cause signal contains much information irrelevant to the effect signal (the solar magnetic field governs the solar wind propagation in the heliosphere, of which the Earth’s magnetosphere is but a minuscule region). A Bayesian approach is presented to tackle the specifics of the DTLR problem, with theoretical justifications based on linear stability analysis. A proof of concept on synthetic problems is presented. Finally, the empirical results on the solar wind modelling task improve on the state of the art in solar wind forecasting. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SkxybANtDB |
https://openreview.net/pdf?id=SkxybANtDB | |
PWC | https://paperswithcode.com/paper/dynamic-time-lag-regression-predicting-what |
Repo | |
Framework | |
Reparameterized Variational Divergence Minimization for Stable Imitation
Title | Reparameterized Variational Divergence Minimization for Stable Imitation |
Authors | Anonymous |
Abstract | State-of-the-art results in imitation learning are currently held by adversarial methods that iteratively estimate the divergence between student and expert policies and then minimize this divergence to bring the imitation policy closer to expert behavior. Analogous techniques for imitation learning from observations alone (without expert action labels), however, have not enjoyed the same ubiquitous successes. Recent work in adversarial methods for generative models has shown that the measure used to judge the discrepancy between real and synthetic samples is an algorithmic design choice, and that different choices can result in significant differences in model performance. Choices including Wasserstein distance and various $f$-divergences have already been explored in the adversarial networks literature, while more recently the latter class has been investigated for imitation learning. Unfortunately, we find that in practice this existing imitation-learning framework for using $f$-divergences suffers from numerical instabilities stemming from the combination of function approximation and policy-gradient reinforcement learning. In this work, we alleviate these challenges and offer a reparameterization of adversarial imitation learning as $f$-divergence minimization before further extending the framework to handle the problem of imitation from observations only. Empirically, we demonstrate that our design choices for coupling imitation learning and $f$-divergences are critical to recovering successful imitation policies. Moreover, we find that with the appropriate choice of $f$-divergence, we can obtain imitation-from-observation algorithms that outperform baseline approaches and more closely match expert performance in continuous-control tasks with low-dimensional observation spaces. With high-dimensional observations, we still observe a significant gap with and without action labels, offering an interesting avenue for future work. |
Tasks | Imitation Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SyxDXJStPS |
https://openreview.net/pdf?id=SyxDXJStPS | |
PWC | https://paperswithcode.com/paper/reparameterized-variational-divergence |
Repo | |
Framework | |
Deep Symbolic Superoptimization Without Human Knowledge
Title | Deep Symbolic Superoptimization Without Human Knowledge |
Authors | Anonymous |
Abstract | Deep symbolic superoptimization refers to the task of applying deep learning methods to simplify symbolic expressions. Existing approaches either perform supervised training on human-constructed datasets that define equivalent expression pairs, or apply reinforcement learning with human-defined equivalent transformation actions. In short, almost all existing methods rely on human knowledge to define equivalence, which suffers from large labeling cost and learning bias, because it is almost impossible to define a comprehensive equivalent set. We thus propose HISS, a reinforcement learning framework for symbolic superoptimization that keeps humans outside the loop. HISS introduces a tree-LSTM encoder-decoder network with attention to ensure tractable learning. Our experiments show that HISS can discover more simplification rules than existing human-dependent methods, and can learn meaningful embeddings for symbolic expressions, which are indicative of equivalence. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=r1egIyBFPS |
https://openreview.net/pdf?id=r1egIyBFPS | |
PWC | https://paperswithcode.com/paper/deep-symbolic-superoptimization-without-human |
Repo | |
Framework | |
Provable Filter Pruning for Efficient Neural Networks
Title | Provable Filter Pruning for Efficient Neural Networks |
Authors | Anonymous |
Abstract | We present a provable, sampling-based approach for generating compact Convolutional Neural Networks (CNNs) by identifying and removing redundant filters from an over-parameterized network. Our algorithm uses a small batch of input data points to assign a saliency score for each filter and constructs an importance sampling distribution where filters that highly affect the output are sampled with correspondingly high probability. Unlike weight pruning approaches that lead to irregular sparsity patterns – requiring specialized libraries or hardware to enable computational speedups – our approach compresses the original network to a slimmer subnetwork, which enables accelerated inference with any off-the-shelf deep learning library and hardware. Existing filter pruning methods are generally data-oblivious, rely on heuristics for evaluating the parameter importance, or require manual and tedious hyper-parameter tuning. In contrast, our method is data-informed, exhibits provable guarantees on the size and performance of the pruned network, and is widely applicable to varying network architectures and data sets. Our analytical bounds bridge the notions of compressibility and importance of network structures, which gives rise to a fully-automated procedure for identifying and preserving the filters in layers that are essential to the network’s performance. Our experimental results across varying pruning scenarios show that our algorithm consistently generates sparser and more efficient models than those generated by existing filter pruning approaches. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BJxkOlSYDH |
https://openreview.net/pdf?id=BJxkOlSYDH | |
PWC | https://paperswithcode.com/paper/provable-filter-pruning-for-efficient-neural |
Repo | |
Framework | |
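The sampling idea in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' algorithm: the saliency score used here (mean absolute activation per filter over a small batch) and the helper names `sampling_distribution` and `sample_filters` are assumptions made for the example, since the abstract does not specify them.

```python
import numpy as np

def sampling_distribution(saliency):
    """Normalize non-negative saliency scores into sampling probabilities."""
    s = np.asarray(saliency, dtype=float)
    if np.any(s < 0) or s.sum() == 0:
        raise ValueError("saliency scores must be non-negative and not all zero")
    return s / s.sum()

def sample_filters(saliency, num_draws, seed=0):
    """Draw filter indices with replacement; return the distinct filters kept."""
    rng = np.random.default_rng(seed)
    p = sampling_distribution(saliency)
    draws = rng.choice(len(p), size=num_draws, replace=True, p=p)
    return np.unique(draws)

# Toy activations: batch of 16 inputs, 6 filters, 4x4 feature maps.
# The per-filter scales are made up so that some filters matter more.
rng = np.random.default_rng(1)
scales = np.array([0.01, 1.0, 2.0, 0.01, 3.0, 0.5]).reshape(1, 6, 1, 1)
activations = rng.standard_normal((16, 6, 4, 4)) * scales

# Hypothetical saliency: mean absolute activation of each filter.
saliency = np.abs(activations).mean(axis=(0, 2, 3))
probs = sampling_distribution(saliency)
kept = sample_filters(saliency, num_draws=4)
```

Filters with larger scores are drawn more often, so the kept set tends to contain the filters that affect the output most; filters never drawn would be removed from the layer.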
Compressed Sensing with Deep Image Prior and Learned Regularization
Title | Compressed Sensing with Deep Image Prior and Learned Regularization |
Authors | Anonymous |
Abstract | We propose a novel method for compressed sensing recovery using untrained deep generative models. Our method is based on the recently proposed Deep Image Prior (DIP), wherein the convolutional weights of the network are optimized to match the observed measurements. We show that this approach can be applied to solve any differentiable linear inverse problem, outperforming previous unlearned methods. Unlike various learned approaches based on generative models, our method does not require pre-training over large datasets. We further introduce a novel learned regularization technique, which incorporates prior information on the network weights. This reduces reconstruction error, especially for noisy measurements. Finally, we prove that, using the DIP optimization approach, moderately overparameterized single-layer networks can perfectly fit any signal despite the nonconvex nature of the fitting problem. This theoretical result provides justification for early stopping. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Hkl_sAVtwr |
https://openreview.net/pdf?id=Hkl_sAVtwr | |
PWC | https://paperswithcode.com/paper/compressed-sensing-with-deep-image-prior-and-1 |
Repo | |
Framework | |
GENN: Predicting Correlated Drug-drug Interactions with Graph Energy Neural Networks
Title | GENN: Predicting Correlated Drug-drug Interactions with Graph Energy Neural Networks |
Authors | Anonymous |
Abstract | Gaining more comprehensive knowledge about drug-drug interactions (DDIs) is one of the most important tasks in drug development and medical practice. Recently, graph neural networks have achieved great success in this task by modeling drugs as nodes and drug-drug interactions as links and casting DDI predictions as link prediction problems. However, correlations between link labels (e.g., DDI types) were rarely considered in existing works. We propose the graph energy neural network (GENN) to explicitly model link type correlations. We formulate the DDI prediction task as a structure prediction problem and introduce a new energy-based model where the energy function is defined by graph neural networks. Experiments on two real-world DDI datasets demonstrated that GENN is superior to many baselines that do not consider link type correlations and achieved 13.77% and 5.01% PR-AUC improvement on the two datasets, respectively. We also present a case study in which GENN can better capture meaningful DDI correlations compared with baseline models. |
Tasks | Link Prediction |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=H1eNleBYwr |
https://openreview.net/pdf?id=H1eNleBYwr | |
PWC | https://paperswithcode.com/paper/genn-predicting-correlated-drug-drug |
Repo | |
Framework | |
A Closer Look at the Optimization Landscapes of Generative Adversarial Networks
Title | A Closer Look at the Optimization Landscapes of Generative Adversarial Networks |
Authors | Anonymous |
Abstract | Generative adversarial networks have been very successful in generative modeling; however, they remain relatively challenging to train compared to standard deep neural networks. In this paper, we propose new visualization techniques for the optimization landscapes of GANs that enable us to study the game vector field resulting from the concatenation of the gradients of both players. Using these visualization techniques, we try to bridge the gap between theory and practice by showing empirically that the training of GANs exhibits significant rotations around local stable stationary points (LSSP), similar to those predicted by theory on toy examples. Moreover, we provide empirical evidence that GAN training seems to converge to a stable stationary point which is a saddle point for the generator loss, not a minimum, while still achieving excellent performance. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HJeVnCEKwH |
https://openreview.net/pdf?id=HJeVnCEKwH | |
PWC | https://paperswithcode.com/paper/a-closer-look-at-the-optimization-landscapes-1 |
Repo | |
Framework | |
Representation Learning for Remote Sensing: An Unsupervised Sensor Fusion Approach
Title | Representation Learning for Remote Sensing: An Unsupervised Sensor Fusion Approach |
Authors | Anonymous |
Abstract | In the application of machine learning to remote sensing, labeled data is often scarce or expensive, which impedes the training of powerful models like deep convolutional neural networks. Although unlabeled data is abundant, recent self-supervised learning approaches are ill-suited to the remote sensing domain. In addition, most remote sensing applications currently use only a small subset of the multi-sensor, multi-channel information available, motivating the need for fused multi-sensor representations. We propose a new self-supervised training objective, Contrastive Sensor Fusion, which exploits coterminous data from multiple sources to learn useful representations of every possible combination of those sources. This method uses information common across multiple sensors and bands by training a single model to produce a representation that remains similar when any subset of its input channels is used. Using a dataset of 47 million unlabeled coterminous image triplets, we train an encoder to produce semantically meaningful representations from any possible combination of channels from the input sensors. These representations outperform fully supervised ImageNet weights on a remote sensing classification task and improve as more sensors are fused. |
Tasks | Representation Learning, Sensor Fusion |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SJlVn6NKPB |
https://openreview.net/pdf?id=SJlVn6NKPB | |
PWC | https://paperswithcode.com/paper/representation-learning-for-remote-sensing-an |
Repo | |
Framework | |
Reanalysis of Variance Reduced Temporal Difference Learning
Title | Reanalysis of Variance Reduced Temporal Difference Learning |
Authors | Anonymous |
Abstract | Temporal difference (TD) learning is a popular algorithm for policy evaluation in reinforcement learning, but the vanilla TD can substantially suffer from the inherent optimization variance. A variance reduced TD (VRTD) algorithm was proposed by Korda and La (2015), which applies the variance reduction technique directly to the online TD learning with Markovian samples. In this work, we first point out the technical errors in the analysis of VRTD in Korda and La (2015), and then provide a mathematically solid analysis of the non-asymptotic convergence of VRTD and its variance reduction performance. We show that VRTD is guaranteed to converge to a neighborhood of the fixed-point solution of TD at a linear convergence rate. Furthermore, the variance error (for both i.i.d. and Markovian sampling) and the bias error (for Markovian sampling) of VRTD are significantly reduced by the batch size of variance reduction in comparison to those of vanilla TD. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=S1ly10EKDS |
https://openreview.net/pdf?id=S1ly10EKDS | |
PWC | https://paperswithcode.com/paper/reanalysis-of-variance-reduced-temporal |
Repo | |
Framework | |
Learning Algorithmic Solutions to Symbolic Planning Tasks with a Neural Computer
Title | Learning Algorithmic Solutions to Symbolic Planning Tasks with a Neural Computer |
Authors | Anonymous |
Abstract | A key feature of intelligent behavior is the ability to learn abstract strategies that transfer to unfamiliar problems. Therefore, we present a novel architecture, based on memory-augmented networks, that is inspired by the von Neumann and Harvard architectures of modern computers. This architecture enables the learning of abstract algorithmic solutions via Evolution Strategies in a reinforcement learning setting. Applied to Sokoban, sliding block puzzle and robotic manipulation tasks, we show that the architecture can learn algorithmic solutions with strong generalization and abstraction: scaling to arbitrary task configurations and complexities, and being independent of both the data representation and the task domain. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=S1gNc3NtvB |
https://openreview.net/pdf?id=S1gNc3NtvB | |
PWC | https://paperswithcode.com/paper/learning-algorithmic-solutions-to-symbolic-1 |
Repo | |
Framework | |
Making Efficient Use of Demonstrations to Solve Hard Exploration Problems
Title | Making Efficient Use of Demonstrations to Solve Hard Exploration Problems |
Authors | Anonymous |
Abstract | This paper introduces R2D3, an agent that makes efficient use of demonstrations to solve hard exploration problems in partially observable environments with highly variable initial conditions. We also introduce a suite of eight tasks that combine these three properties, and show that R2D3 can solve several of the tasks where other state-of-the-art methods (both with and without demonstrations) fail to see even a single successful trajectory after tens of billions of steps of exploration. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SygKyeHKDH |
https://openreview.net/pdf?id=SygKyeHKDH | |
PWC | https://paperswithcode.com/paper/making-efficient-use-of-demonstrations-to-1 |
Repo | |
Framework | |
Learnable Group Transform For Time-Series
Title | Learnable Group Transform For Time-Series |
Authors | Anonymous |
Abstract | We propose to undertake the problem of representation learning for time-series by considering a Group Transform approach. This framework allows us to, first, generalize classical time-frequency transformations such as the Wavelet Transform, and second, to enable the learnability of the representation. While the creation of the Wavelet Transform filter-bank relies on the sampling of the affine group in order to transform the mother filter, our approach allows for non-linear transformations of the mother filter by introducing the group of strictly increasing and continuous functions. The transformations induced by such a group enable us to span a larger class of signal representations. The sampling of this group can be optimized with respect to a specific loss function and thus cast into a Deep Learning architecture. The experiments on diverse time-series datasets demonstrate the expressivity of this framework, which competes with state-of-the-art performances. |
Tasks | Representation Learning, Time Series |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HJgepaNtDS |
https://openreview.net/pdf?id=HJgepaNtDS | |
PWC | https://paperswithcode.com/paper/learnable-group-transform-for-time-series |
Repo | |
Framework | |
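The contrast drawn in the abstract above, between affine transformations of a mother filter and warping by a strictly increasing continuous function, can be illustrated with a small sketch. Everything here is an assumption for illustration: the Morlet-like mother filter, the warp `x + 0.5 * tanh(x)`, and the helper names are not taken from the paper, and the usual amplitude normalization of wavelets is omitted.

```python
import numpy as np

def mother_filter(t):
    """Real Morlet-like mother filter (an illustrative choice)."""
    return np.cos(5.0 * t) * np.exp(-0.5 * t ** 2)

def warp_filter(t, g):
    """Evaluate the mother filter on warped time g(t).

    g must be strictly increasing on the sampling grid, which is
    checked numerically here.
    """
    warped_t = g(t)
    if not np.all(np.diff(warped_t) > 0):
        raise ValueError("g must be strictly increasing on the grid")
    return mother_filter(warped_t)

t = np.linspace(-4.0, 4.0, 257)

# Affine warp: dilation by 2 and translation by 0.5, as in a wavelet filter-bank.
affine = warp_filter(t, lambda x: (x - 0.5) / 2.0)

# Non-linear warp: strictly increasing because its derivative is 1 + 0.5 / cosh(x)^2 > 0.
nonlinear = warp_filter(t, lambda x: x + 0.5 * np.tanh(x))
```

The affine case reproduces an ordinary dilated and shifted wavelet, while the non-linear case yields a filter outside that family; in a learnable setting the parameters of `g` would be the quantities optimized against the loss.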
Neural networks are a priori biased towards Boolean functions with low entropy
Title | Neural networks are a priori biased towards Boolean functions with low entropy |
Authors | Anonymous |
Abstract | Understanding the inductive bias of neural networks is critical to explaining their ability to generalise. Here, for one of the simplest neural networks – a single-layer perceptron with $n$ input neurons, one output neuron, and no threshold bias term – we prove that upon random initialisation of weights, the a priori probability $P(t)$ that it represents a Boolean function that classifies $t$ points in $\{0,1\}^n$ as $1$ has a remarkably simple form: $P(t) = 2^{-n} \,\, {\rm for} \,\, 0 \leq t < 2^n$. Since a perceptron can express far fewer Boolean functions with small or large values of $t$ (low “entropy”) than with intermediate values of $t$ (high “entropy”) there is, on average, a strong intrinsic a-priori bias towards individual functions with low entropy. Furthermore, within a class of functions with fixed $t$, we often observe a further intrinsic bias towards functions of lower complexity. Finally, we prove that, regardless of the distribution of inputs, the bias towards low entropy becomes monotonically stronger upon adding ReLU layers, and empirically show that increasing the variance of the bias term has a similar effect. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Skgeip4FPr |
https://openreview.net/pdf?id=Skgeip4FPr | |
PWC | https://paperswithcode.com/paper/neural-networks-are-a-priori-biased-towards |
Repo | |
Framework | |
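The closed form $P(t) = 2^{-n}$ stated in the abstract above can be checked numerically for small $n$. The sketch below is a Monte Carlo estimate under stated assumptions: weights are drawn from a standard Gaussian (one choice of random initialisation), an input is classified as $1$ when $w \cdot x > 0$, and the function names are made up for this example.

```python
import itertools
import numpy as np

def count_ones(weights, inputs):
    """Number of inputs x in {0,1}^n with w.x > 0 (no bias term)."""
    return int(np.sum(inputs @ weights > 0))

def estimate_p_of_t(n, num_samples, seed=0):
    """Monte Carlo estimate of P(t) for t = 0, ..., 2^n - 1."""
    rng = np.random.default_rng(seed)
    # All 2^n binary input vectors, one per row.
    inputs = np.array(list(itertools.product([0, 1], repeat=n)), dtype=float)
    counts = np.zeros(2 ** n, dtype=int)
    for _ in range(num_samples):
        w = rng.standard_normal(n)  # assumed initialisation
        # The all-zero input always gives w.x = 0, so t never reaches 2^n.
        counts[count_ones(w, inputs)] += 1
    return counts / num_samples

freqs = estimate_p_of_t(n=3, num_samples=20000)
```

For $n = 3$ each of the eight values of $t$ should appear with frequency close to $1/8$. The case $n = 1$ can be verified by hand: $t = 1$ exactly when the single weight is positive, which happens with probability $1/2 = 2^{-1}$.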