April 1, 2020

2763 words 13 mins read

Paper Group NANR 67

Improved Mutual Information Estimation. Stochastic Latent Residual Video Prediction. Octave Graph Convolutional Network. Generalization Puzzles in Deep Networks. Under what circumstances do local codes emerge in feed-forward neural networks. Using Objective Bayesian Methods to Determine the Optimal Degree of Curvature within the Loss Landscape. Adv …

Improved Mutual Information Estimation

Title Improved Mutual Information Estimation
Authors Anonymous
Abstract We propose a new variational lower bound on the KL divergence and show that the Mutual Information (MI) can be estimated by maximizing this bound using a witness function on a hypothesis function class and an auxiliary scalar variable. If the function class is in a Reproducing Kernel Hilbert Space (RKHS), this leads to a jointly convex problem. We analyze the bound by deriving its dual formulation and show its connection to a likelihood ratio estimation problem. We show that the auxiliary variable introduced in our variational form plays the role of a Lagrange multiplier that enforces a normalization constraint on the likelihood ratio. By extending the function space to neural networks, we propose an efficient neural MI estimator, and validate its performance on synthetic examples, showing advantage over the existing baselines. We then demonstrate the strength of our estimator in large-scale self-supervised representation learning through MI maximization.
Tasks Representation Learning
Published 2020-01-01
URL https://openreview.net/forum?id=S1lslCEYPB
PDF https://openreview.net/pdf?id=S1lslCEYPB
PWC https://paperswithcode.com/paper/improved-mutual-information-estimation
Repo
Framework
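
The abstract above does not spell out the exact form of the proposed bound, so the following is only a minimal sketch of a closely related variational MI lower bound that conveys the role of the auxiliary scalar: for any witness f and any a > 0, I(X;Y) ≥ E_p(x,y)[f] − E_p(x)p(y)[e^f]/a − log a + 1, where the optimal a normalizes the implied likelihood ratio. The toy problem (correlated Gaussians, PyTorch) and all hyperparameters are illustrative assumptions; the paper's RKHS formulation and training details may differ.

```python
# Minimal sketch (not the paper's exact estimator): a variational MI lower
# bound with a learned witness f and an auxiliary scalar a > 0 that acts as a
# normalizer of the implied likelihood ratio:
#   I(X;Y) >= E_p(x,y)[f] - E_p(x)p(y)[exp(f)] / a - log(a) + 1
import math
import torch
import torch.nn as nn

torch.manual_seed(0)
rho = 0.8                                      # toy correlated Gaussian pair
true_mi = -0.5 * math.log(1 - rho ** 2)        # analytic MI for comparison

witness = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1))
log_a = torch.zeros(1, requires_grad=True)     # auxiliary scalar, log-parametrized
opt = torch.optim.Adam(list(witness.parameters()) + [log_a], lr=1e-3)

for step in range(2000):
    x = torch.randn(512, 1)
    y = rho * x + (1 - rho ** 2) ** 0.5 * torch.randn(512, 1)
    f_joint = witness(torch.cat([x, y], dim=1)).mean()               # samples of p(x, y)
    f_marg = witness(torch.cat([x, y[torch.randperm(512)]], dim=1))  # samples of p(x)p(y)
    bound = f_joint - f_marg.exp().mean() / log_a.exp() - log_a + 1.0
    opt.zero_grad()
    (-bound).mean().backward()                 # maximize the lower bound
    opt.step()

print(f"estimated MI {bound.item():.3f}  vs  analytic MI {true_mi:.3f}")
```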

Stochastic Latent Residual Video Prediction

Title Stochastic Latent Residual Video Prediction
Authors Anonymous
Abstract Video prediction is a challenging task: models have to account for the inherent uncertainty of the future. Most works in the literature are based on stochastic image-autoregressive recurrent networks, raising several performance and applicability issues. An alternative is to use fully latent temporal models which untie frame synthesis and dynamics. However, no such model for video prediction has been proposed in the literature yet, due to design and training difficulties. In this paper, we overcome these difficulties by introducing a novel stochastic temporal model. It is based on residual updates of a latent state, motivated by discretization schemes of differential equations. This first-order principle naturally models video dynamics as it allows our simpler, lightweight, interpretable, latent model to outperform prior state-of-the-art methods on challenging datasets.
Tasks Video Prediction
Published 2020-01-01
URL https://openreview.net/forum?id=HyeqPJHYvH
PDF https://openreview.net/pdf?id=HyeqPJHYvH
PWC https://paperswithcode.com/paper/stochastic-latent-residual-video-prediction
Repo
Framework
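
A minimal sketch of the residual latent dynamic described in the abstract, i.e. an Euler-style first-order update of a latent state plus noise. The transition function and all sizes below are illustrative stand-ins; the full model also learns the transition, the noise distribution, and a frame decoder.

```python
# Sketch of residual latent dynamics: z_{t+1} = z_t + f(z_t) + noise
import numpy as np

rng = np.random.default_rng(0)
d = 16                                     # latent dimension (illustrative)
W = rng.normal(scale=0.1, size=(d, d))     # stand-in for a learned transition net

def transition(z):
    return np.tanh(z @ W)                  # f_theta(z_t): learned residual drift

z = rng.normal(size=d)                     # latent inferred from the last seen frame
dt, sigma = 1.0, 0.05
rollout = [z]
for t in range(10):                        # predict 10 future latent states
    eps = rng.normal(scale=sigma, size=d)  # stochastic residual (future uncertainty)
    z = z + dt * transition(z) + eps       # first-order (Euler-like) update
    rollout.append(z)
# each latent state would then be decoded into a frame by a separate decoder
```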

Octave Graph Convolutional Network

Title Octave Graph Convolutional Network
Authors Anonymous
Abstract Many variants of Graph Convolutional Networks (GCNs) for representation learning have been proposed recently and have achieved fruitful results in various domains. Among them, spectral-based GCNs are constructed via the convolution theorem, with a theoretical foundation in Graph Signal Processing (GSP). However, although most of them implicitly act as low-pass filters that generate smooth representations for each node, little work has fully exploited the underlying low-frequency information. Here, we first introduce the octave convolution on graphs in the spectral domain. Accordingly, we present Octave Graph Convolutional Network (OctGCN), a novel architecture that learns representations for different frequency components with weighted filters and graph wavelet bases. We empirically validate the importance of low-frequency components in graph signals on semi-supervised node classification and demonstrate that our model achieves state-of-the-art performance in comparison with both spectral-based and spatial-based baselines.
Tasks Node Classification, Representation Learning
Published 2020-01-01
URL https://openreview.net/forum?id=HkxSOAEFDB
PDF https://openreview.net/pdf?id=HkxSOAEFDB
PWC https://paperswithcode.com/paper/octave-graph-convolutional-network
Repo
Framework
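
To make "low-frequency components of graph signals" concrete, here is a minimal NumPy sketch of the spectral split that octave-style graph convolutions act on. The paper itself uses graph wavelet bases and learned per-band filters, so this shows only the underlying frequency decomposition, not the proposed layer; the toy graph and the threshold of 1.0 are illustrative choices.

```python
# Low-/high-frequency split of a graph signal via the normalized Laplacian
import numpy as np

A = np.array([[0, 1, 1, 0],                # toy 4-node graph adjacency
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
deg = A.sum(1)
L = np.eye(4) - A / np.sqrt(np.outer(deg, deg))   # normalized Laplacian
lam, U = np.linalg.eigh(L)                        # graph Fourier basis

x = np.array([1.0, 0.9, 1.1, -2.0])               # node signal
x_hat = U.T @ x                                   # graph Fourier transform
low = U @ (x_hat * (lam < 1.0))                   # low-frequency component
high = U @ (x_hat * (lam >= 1.0))                 # high-frequency component
# an octave-style GCN would apply separately learned filters to `low` and
# `high` and mix the results, instead of a single shared low-pass filter
```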

Generalization Puzzles in Deep Networks

Title Generalization Puzzles in Deep Networks
Authors Anonymous
Abstract In the last few years, deep learning has been tremendously successful in many applications. However, our theoretical understanding of deep learning, and thus the ability to provide principled improvements, seems to lag behind. A theoretical puzzle concerns the ability of deep networks to predict well despite their intriguing apparent lack of generalization: their classification accuracy on the training set is not a proxy for their performance on a test set. How is it possible that training performance is independent of testing performance? Do deep networks indeed require a drastically new theory of generalization? Or are there measurements based on the training data that are predictive of the network performance on future data? Here we show that when performance is measured appropriately, the training performance is in fact predictive of expected performance, consistent with classical machine learning theory.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=BkelnhNFwB
PDF https://openreview.net/pdf?id=BkelnhNFwB
PWC https://paperswithcode.com/paper/generalization-puzzles-in-deep-networks
Repo
Framework

Under what circumstances do local codes emerge in feed-forward neural networks

Title Under what circumstances do local codes emerge in feed-forward neural networks
Authors Anonymous
Abstract Localist coding schemes are more easily interpretable than distributed schemes but are generally believed to be biologically implausible. Recent results have found highly selective units and object detectors in NNs that are indicative of local codes (LCs). Here we undertake a constructionist study on feed-forward NNs and find LCs emerging in response to invariant features; this finding is robust until the invariant feature is perturbed by 40%. Decreasing the amount of input data, increasing the relative weight of the invariant features, and large values of dropout all increase the number of LCs. Longer training times increase the number of LCs, and the turning point of the LC-epoch curve correlates well with the point at which NNs reach 90-100% on both test and training accuracy. Pseudo-deep networks (2 hidden layers) which have many LCs lose them when common aspects of deep-NN research are applied (large training data, ReLU activations, early stopping on training accuracy and softmax), suggesting that LCs may not be found in deep NNs. Switching to more biologically feasible constraints (sigmoidal activation functions, longer training times, dropout, activation noise) increases the number of LCs. If LCs are not found in the feed-forward classification layers of modern deep CNNs, these data suggest this could be caused either by a lack of (moderately) invariant features being passed to the fully connected layers or by the choice of training conditions and architecture. Should the interpretability and resilience to noise of LCs be required, this work suggests how to tune an NN so they emerge.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=r1eU1gHFvH
PDF https://openreview.net/pdf?id=r1eU1gHFvH
PWC https://paperswithcode.com/paper/under-what-circumstances-do-local-codes
Repo
Framework

Using Objective Bayesian Methods to Determine the Optimal Degree of Curvature within the Loss Landscape

Title Using Objective Bayesian Methods to Determine the Optimal Degree of Curvature within the Loss Landscape
Authors Anonymous
Abstract The efficacy of the width of the basin of attraction surrounding a minimum in parameter space as an indicator of the generalizability of a model parametrization is a point of contention in the training of artificial neural networks, with the dominant view being that wider areas in the landscape reflect better generalizability of the trained model. In this work, however, we aim to show that this is only true for a noiseless system; in general, the trend of the model towards wide areas in the landscape reflects its propensity to overfit the training data. Utilizing the objective Bayesian (Jeffreys) prior, we instead propose a different determinant of the optimal width within the parameter landscape, determined solely by its curvature. In doing so we utilize the decomposition of the landscape into the dimensions of principal curvature and find the first principal curvature dimension of the parameter space to be independent of noise within the training data.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=HygXkJHtvB
PDF https://openreview.net/pdf?id=HygXkJHtvB
PWC https://paperswithcode.com/paper/using-objective-bayesian-methods-to-determine
Repo
Framework
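
A minimal sketch of the curvature decomposition the abstract refers to: the principal curvatures of a loss surface at a minimum are the eigenvalues of its Hessian, with the eigenvectors giving the principal curvature directions. The toy below uses a linear regression model, where the Hessian is available in closed form; the paper's analysis applies such a decomposition to trained networks and combines it with the Jeffreys prior, which this sketch does not attempt.

```python
# Principal curvatures of a loss surface = eigenvalues of its Hessian (toy case)
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=200)

# Hessian of the mean-squared-error loss of a linear model: 2/n * X^T X
H = 2.0 / len(X) * X.T @ X
curvatures, directions = np.linalg.eigh(H)   # principal curvatures / directions
print("principal curvatures:", np.round(curvatures, 3))
# flat directions (small eigenvalues) are what the "wide basin" argument appeals
# to; the abstract argues the first principal curvature is noise-independent
```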

Adversarially learned anomaly detection for time series data

Title Adversarially learned anomaly detection for time series data
Authors Anonymous
Abstract Anomaly detection in time series data is an important topic in many domains. However, time series are known to be particularly hard to analyze. Based on recent developments in adversarially learned models, we propose a new approach for anomaly detection in time series data. We build upon the idea of combining a reconstruction error with the output of a Critic network. To this end we propose a cycle-consistent GAN architecture for sequential data and a new way of measuring the reconstruction error. We then show in a detailed evaluation how the different parts of our model contribute to the final anomaly score and demonstrate how the method improves the results on several data sets. We also compare our model to other baseline anomaly detection methods to verify its performance.
Tasks Anomaly Detection, Time Series
Published 2020-01-01
URL https://openreview.net/forum?id=SJeQdeBtwB
PDF https://openreview.net/pdf?id=SJeQdeBtwB
PWC https://paperswithcode.com/paper/adversarially-learned-anomaly-detection-for
Repo
Framework
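
A minimal sketch of the scoring idea in the abstract: combine a reconstruction error with the output of a Critic network to produce an anomaly score per window. The generator reconstructions, critic outputs, mixing weight alpha, and the 3-sigma threshold below are all stand-ins; the paper's cycle-consistent GAN and its exact reconstruction measure and score definition may differ.

```python
# Anomaly score = mix of reconstruction error and critic output (illustrative)
import numpy as np

def reconstruction_error(x, x_hat):
    # plain pointwise squared error as a placeholder; the paper proposes its
    # own way of measuring reconstruction error
    return np.mean((x - x_hat) ** 2, axis=-1)

def anomaly_score(x, x_hat, critic_out, alpha=0.5):
    # convex combination of reconstruction error and critic "fakeness" score
    return alpha * reconstruction_error(x, x_hat) + (1 - alpha) * critic_out

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 50))                 # 100 windows of a univariate series
x_hat = x + 0.1 * rng.normal(size=x.shape)     # stand-in for GAN reconstructions
critic_out = rng.uniform(size=100)             # stand-in for critic outputs
scores = anomaly_score(x, x_hat, critic_out)
threshold = scores.mean() + 3 * scores.std()   # e.g. flag > 3 sigma as anomalous
print("flagged windows:", np.flatnonzero(scores > threshold))
```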

Latent Normalizing Flows for Many-to-Many Cross Domain Mappings

Title Latent Normalizing Flows for Many-to-Many Cross Domain Mappings
Authors Anonymous
Abstract Learned joint representations of images and text form the backbone of several important cross-domain tasks such as image captioning. Prior work mostly maps both domains into a common latent representation in a purely supervised fashion. This is rather restrictive, however, as the two domains follow distinct generative processes. Therefore, we propose a novel semi-supervised framework, which models shared information between domains and domain-specific information separately. The information shared between the domains is aligned with an invertible neural network. Our model integrates normalising flow-based priors for the domain-specific information, which allows us to learn diverse many-to-many mappings between the two domains. We demonstrate the effectiveness of our model on diverse tasks, including image captioning and text-to-image synthesis.
Tasks Image Captioning, Image Generation
Published 2020-01-01
URL https://openreview.net/forum?id=SJxE8erKDH
PDF https://openreview.net/pdf?id=SJxE8erKDH
PWC https://paperswithcode.com/paper/latent-normalizing-flows-for-many-to-many
Repo
Framework
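
For readers unfamiliar with the flow-based priors mentioned in the abstract, here is a minimal sketch of their standard building block: an affine coupling layer, which is exactly invertible and has a cheap Jacobian log-determinant. The random weight matrices stand in for learned networks; this illustrates the generic mechanism, not the paper's specific architecture.

```python
# Affine coupling layer: invertible transform with exact log-determinant
import numpy as np

rng = np.random.default_rng(0)
d = 8
W1 = rng.normal(scale=0.1, size=(d // 2, d // 2))   # stand-in for a scale net
W2 = rng.normal(scale=0.1, size=(d // 2, d // 2))   # stand-in for a shift net

def coupling_forward(z):
    z1, z2 = z[: d // 2], z[d // 2:]
    s, t = np.tanh(z1 @ W1), z1 @ W2                # scale/shift predicted from z1
    out = np.concatenate([z1, z2 * np.exp(s) + t])
    log_det = s.sum()                               # exact Jacobian log-determinant
    return out, log_det

def coupling_inverse(y):
    y1, y2 = y[: d // 2], y[d // 2:]
    s, t = np.tanh(y1 @ W1), y1 @ W2
    return np.concatenate([y1, (y2 - t) * np.exp(-s)])

z = rng.normal(size=d)
y, log_det = coupling_forward(z)
assert np.allclose(coupling_inverse(y), z)          # exact invertibility
```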

Learning Boolean Circuits with Neural Networks

Title Learning Boolean Circuits with Neural Networks
Authors Anonymous
Abstract Training neural networks is computationally hard. However, in practice they are trained efficiently using gradient-based algorithms, achieving remarkable performance on natural data. To bridge this gap, we observe the property of local correlation: correlation between small patterns of the input and the target label. We focus on learning deep neural networks with a variant of gradient descent when the target function is a tree-structured Boolean circuit. We show that in this case, the existence of correlation between the gates of the circuit and the target label determines whether the optimization succeeds or fails. Using this result, we show that neural networks can learn the (log n)-parity problem for most product distributions. These results hint that local correlation may play an important role in differentiating between distributions that are hard or easy to learn.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=BkxtNaEYDr
PDF https://openreview.net/pdf?id=BkxtNaEYDr
PWC https://paperswithcode.com/paper/learning-boolean-circuits-with-neural-1
Repo
Framework
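
A minimal sketch of the "local correlation" property the abstract relies on: under a biased product distribution, an intermediate gate of a tree-structured parity circuit correlates with the final label, while under the uniform distribution this correlation vanishes (which is what makes uniform parity hard). The bit count, bias, and gate choice below are illustrative; this demonstrates the property, not the paper's learning result.

```python
# Correlation between a circuit gate and the parity label under product distributions
import numpy as np

rng = np.random.default_rng(0)
n, m, p = 8, 100_000, 0.9
x = (rng.random((m, n)) < p).astype(int)        # biased product distribution
signed = 1 - 2 * x                              # map {0,1} -> {+1,-1}

label = signed.prod(axis=1)                     # parity of all n bits
gate = signed[:, :2].prod(axis=1)               # a depth-1 gate: parity of bits 0,1
print("corr(gate, label), biased :", np.corrcoef(gate, label)[0, 1].round(3))

x_uni = (rng.random((m, n)) < 0.5).astype(int)  # uniform inputs for contrast
s_uni = 1 - 2 * x_uni
print("corr(gate, label), uniform:",
      np.corrcoef(s_uni[:, :2].prod(1), s_uni.prod(1))[0, 1].round(3))
```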

Policy path programming

Title Policy path programming
Authors Anonymous
Abstract We develop a normative theory of hierarchical model-based policy optimization for Markov decision processes, resulting in a full-depth, full-width policy iteration algorithm. This method performs policy updates which integrate reward information over all states at all horizons simultaneously, thus sequentially maximizing the expected reward obtained per algorithmic iteration. Effectively, policy path programming ascends the expected cumulative reward gradient in the space of policies defined over all state-space paths. An exact formula is derived which finitely parametrizes these path gradients in terms of action preferences. Policy path gradients can be computed directly using an internal model, thus obviating the need to sample paths in order to optimize in depth. They are quadratic in successor representation entries and afford natural generalizations to higher-order gradient techniques. In simulations, it is shown that intuitive hierarchical reasoning is emergent within the associated policy optimization dynamics.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=ByliZgBKPH
PDF https://openreview.net/pdf?id=ByliZgBKPH
PWC https://paperswithcode.com/paper/policy-path-programming
Repo
Framework

On Predictive Information Sub-optimality of RNNs

Title On Predictive Information Sub-optimality of RNNs
Authors Anonymous
Abstract Certain biological neurons demonstrate a remarkable capability to optimally compress the history of sensory inputs while being maximally informative about the future. In this work, we investigate if the same can be said of artificial neurons in recurrent neural networks (RNNs) trained with maximum likelihood. In experiments on two datasets, restorative Brownian motion and a hand-drawn sketch dataset, we find that RNNs are sub-optimal in the information plane. Instead of optimally compressing past information, they extract additional information that is not relevant for predicting the future. Overcoming this limitation may require alternative training procedures and architectures, or objectives beyond maximum likelihood estimation.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=HklsHyBKDr
PDF https://openreview.net/pdf?id=HklsHyBKDr
PWC https://paperswithcode.com/paper/on-predictive-information-sub-optimality-of
Repo
Framework

Neural Communication Systems with Bandwidth-limited Channel

Title Neural Communication Systems with Bandwidth-limited Channel
Authors Anonymous
Abstract Reliably transmitting messages despite information loss due to a noisy channel is a core problem of information theory. One of the most important aspects of real-world communication is that it may happen at varying levels of information transfer; the bandwidth-limited channel models this phenomenon. In this study we consider learning joint coding with the bandwidth-limited channel. Although classical results suggest that it is asymptotically optimal to separate the sub-tasks of compression (source coding) and error correction (channel coding), it is well known that for finite block-length problems, and when there are restrictions on the computational complexity of coding, this optimality may not be achieved. Thus, we empirically compare the performance of joint and separate systems, and conclude that joint systems outperform their separate counterparts when coding is performed by flexible learnable function approximators such as neural networks. Specifically, we cast the joint communication problem as a variational learning problem. To facilitate this, we introduce a differentiable and computationally efficient version of this channel. We show that our design compensates for the loss of information by two mechanisms: (i) missing information is modelled by a prior model incorporated in the channel model, and (ii) sampling from the joint model is improved by auxiliary latent variables in the decoder. Experimental results justify the validity of our design decisions through improved distortion and FID scores.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=rJgD2ySFDr
PDF https://openreview.net/pdf?id=rJgD2ySFDr
PWC https://paperswithcode.com/paper/neural-communication-systems-with-bandwidth
Repo
Framework
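
The abstract does not specify the exact form of the differentiable channel, so the following is only one plausible reading offered as an assumption for illustration: a bandwidth-limited channel sketched as a mask that lets through a random prefix of latent dimensions per example, which keeps the received code differentiable with respect to the encoder output.

```python
# Hypothetical sketch of a bandwidth-limited channel as per-example masking
import numpy as np

rng = np.random.default_rng(0)

def bandwidth_limited_channel(z, max_dims):
    # z: (batch, d) encoder output; per example, keep a random prefix of dims
    batch, d = z.shape
    k = rng.integers(1, max_dims + 1, size=batch)           # available bandwidth
    mask = (np.arange(d)[None, :] < k[:, None]).astype(z.dtype)
    return z * mask, mask                                    # z * mask is differentiable in z

z = rng.normal(size=(4, 8))                                  # toy encoder outputs
z_received, mask = bandwidth_limited_channel(z, max_dims=8)
# a decoder would reconstruct from z_received; in the abstract, a prior model in
# the channel accounts for the lost (masked) information
```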

Training Deep Neural Networks by optimizing over nonlocal paths in hyperparameter space

Title Training Deep Neural Networks by optimizing over nonlocal paths in hyperparameter space
Authors Anonymous
Abstract Hyperparameter optimization is both a practical issue and an interesting theoretical problem in the training of deep architectures. Despite many recent advances, the most commonly used methods almost universally involve training multiple, decoupled copies of the model, in effect sampling the hyperparameter space. We show that at a negligible additional computational cost, results can be improved by sampling \emph{nonlocal paths} instead of points in hyperparameter space. To this end we interpret hyperparameters as controlling the level of correlated noise in training, which can be mapped to an effective temperature. The usually independent instances of the model are coupled and allowed to exchange their hyperparameters throughout training using the well-established parallel tempering technique of statistical physics. Each simulation then corresponds to a unique path, or history, in the joint hyperparameter/model-parameter space. We provide empirical tests of our method, in particular for dropout and learning rate optimization. We observe faster training and improved resistance to overfitting, and show a systematic decrease in the absolute validation error, improving over benchmark results.
Tasks Hyperparameter Optimization
Published 2020-01-01
URL https://openreview.net/forum?id=HJlPC6NKDH
PDF https://openreview.net/pdf?id=HJlPC6NKDH
PWC https://paperswithcode.com/paper/training-deep-neural-networks-by-optimizing-1
Repo
Framework
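
A minimal sketch of the parallel-tempering idea in the abstract: several training replicas run with different hyperparameter values, interpreted as temperatures, and periodically attempt Metropolis-style exchanges of those hyperparameters, so each replica traces a path through hyperparameter space rather than sitting at a fixed point. The `train_steps` function, the temperature ladder, and the use of validation loss as the "energy" are illustrative stand-ins, not the paper's exact protocol.

```python
# Parallel-tempering-style exchange of hyperparameters between training replicas
import numpy as np

rng = np.random.default_rng(0)
temps = np.array([0.01, 0.05, 0.1, 0.5])        # e.g. noise levels / learning rates

def train_steps(state, temp):
    # stand-in for a few SGD steps; returns (new_state, validation_loss)
    loss = (state - 1.0) ** 2 + temp * rng.normal() ** 2
    return state - 0.2 * (state - 1.0), loss

states = rng.normal(size=len(temps))
for epoch in range(50):
    losses = np.empty(len(temps))
    for i, t in enumerate(temps):
        states[i], losses[i] = train_steps(states[i], t)
    for i in range(len(temps) - 1):             # attempt swaps between neighbours
        beta_i, beta_j = 1 / temps[i], 1 / temps[i + 1]
        log_accept = (beta_i - beta_j) * (losses[i] - losses[i + 1])
        if np.log(rng.random()) < log_accept:   # Metropolis-style acceptance
            temps[i], temps[i + 1] = temps[i + 1], temps[i]   # exchange hyperparameters
```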

Adversarial Neural Pruning

Title Adversarial Neural Pruning
Authors Anonymous
Abstract Despite the remarkable performance of deep neural networks (DNNs) on various tasks, they are susceptible to adversarial perturbations, which makes it difficult to deploy them in real-world safety-critical applications. In this paper, we aim to obtain robust networks by sparsifying the DNN’s latent features that are sensitive to adversarial perturbation. Specifically, we define vulnerability at the latent feature space and then propose a Bayesian framework to prioritize/prune features based on their contribution to both the original and adversarial loss. We also suggest regularizing the features’ vulnerability during training to further improve robustness. While such network sparsification has been studied in the literature primarily for computational efficiency and its regularization effect on DNNs, we confirm through quantitative evaluation and qualitative analysis that it is also useful for designing a defense mechanism. We validate our method, \emph{Adversarial Neural Pruning (ANP)}, on multiple benchmark datasets, where it improves test accuracy and leads to state-of-the-art robustness. ANP also tackles the practical problem of obtaining sparse and robust networks at the same time, which could be crucial for ensuring adversarial robustness in lightweight networks deployed to computation- and memory-limited devices.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=SJe4SJrFDr
PDF https://openreview.net/pdf?id=SJe4SJrFDr
PWC https://paperswithcode.com/paper/adversarial-neural-pruning-1
Repo
Framework
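
A minimal sketch of the vulnerability notion the abstract builds on: a latent feature's vulnerability can be measured as the expected distortion between its clean and adversarially perturbed activations, and the most vulnerable features become pruning candidates. The activations, the distortion levels, and the hard quantile threshold below are stand-ins; the paper learns a Bayesian pruning mask rather than applying a hard cut.

```python
# Per-feature vulnerability score and a simple prune-by-threshold illustration
import numpy as np

rng = np.random.default_rng(0)
n_examples, n_features = 1000, 32
clean = rng.normal(size=(n_examples, n_features))        # stand-in latent activations
adv = clean + rng.normal(scale=np.linspace(0.01, 1.0, n_features),
                         size=(n_examples, n_features))  # per-feature adversarial distortion

vulnerability = np.abs(adv - clean).mean(axis=0)         # expected distortion per feature
keep = vulnerability <= np.quantile(vulnerability, 0.75) # prune the worst 25%
pruned_clean = clean * keep                              # masked feature map
print("pruned features:", np.flatnonzero(~keep))
```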

When Covariate-shifted Data Augmentation Increases Test Error And How to Fix It

Title When Covariate-shifted Data Augmentation Increases Test Error And How to Fix It
Authors Anonymous
Abstract Empirically, data augmentation sometimes improves and sometimes hurts test error, even when only adding points with labels from the true conditional distribution that the hypothesis class is expressive enough to fit. In this paper, we provide precise conditions under which data augmentation hurts test accuracy for minimum norm estimators in linear regression. To mitigate the failure modes of augmentation, we introduce X-regularization, which uses unlabeled data to regularize the parameters towards the non-augmented estimate. We prove that our new estimator never hurts test error and exhibits significant improvements over adversarial data augmentation on CIFAR-10.
Tasks Data Augmentation
Published 2020-01-01
URL https://openreview.net/forum?id=ByxduJBtPB
PDF https://openreview.net/pdf?id=ByxduJBtPB
PWC https://paperswithcode.com/paper/when-covariate-shifted-data-augmentation
Repo
Framework
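
The abstract does not give the exact X-regularization objective, so the following is only an assumed reading offered as a sketch: fit the augmented data, but penalize predictions on unlabeled points for drifting away from the non-augmented minimum-norm fit. The dimensions, noise level, regularization weight, and the particular penalty are all illustrative assumptions.

```python
# Hypothetical sketch: regularize toward the non-augmented estimate on unlabeled data
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 50))                        # overparameterized linear regression
w_true = rng.normal(size=50)
y = X @ w_true
w_orig = np.linalg.pinv(X) @ y                       # minimum-norm estimate, original data

X_extra = X + 0.3 * rng.normal(size=X.shape)         # augmented covariates
X_all = np.vstack([X, X_extra])
y_all = np.concatenate([y, X_extra @ w_true])        # labels from the true conditional
X_unl = rng.normal(size=(200, 50))                   # unlabeled data

lam = 1.0
# solve min_w ||X_all w - y_all||^2 + lam * ||X_unl (w - w_orig)||^2 in closed form
A = X_all.T @ X_all + lam * X_unl.T @ X_unl
b = X_all.T @ y_all + lam * X_unl.T @ (X_unl @ w_orig)
w_xreg = np.linalg.solve(A, b)
print("error, min-norm on augmented:", np.linalg.norm(np.linalg.pinv(X_all) @ y_all - w_true).round(3))
print("error, X-regularized estimate:", np.linalg.norm(w_xreg - w_true).round(3))
```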