Paper Group NANR 5
Incorporating Perceptual Prior to Improve Model’s Adversarial Robustness
Title | Incorporating Perceptual Prior to Improve Model’s Adversarial Robustness |
Authors | Anonymous |
Abstract | Deep Neural Networks trained using human-annotated data are able to achieve human-like accuracy on many computer vision tasks such as classification, object recognition and segmentation. However, they are still far from being as robust as the human visual system. In this paper, we demonstrate that even models that are trained to be robust to random perturbations do not necessarily learn robust representations. We propose to address this by imposing a perception based prior on the learned representations to ensure that perceptually similar images have similar representations. We demonstrate that, although this training method does not use adversarial samples during training, it significantly improves the network’s robustness to single-step and multi-step adversarial attacks, thus validating our hypothesis that the network indeed learns more robust representations. Our proposed method provides a means of achieving adversarial robustness at no additional computational cost when compared to normal training. |
Tasks | Object Recognition |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=B1grayHYDH |
PDF | https://openreview.net/pdf?id=B1grayHYDH |
PWC | https://paperswithcode.com/paper/incorporating-perceptual-prior-to-improve |
Repo | |
Framework | |
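A minimal sketch of the kind of perception-based prior the abstract describes: alongside the usual cross-entropy, penalize the distance between the representations of an image and a perceptually similar copy. The perturbation choice and the `features`/`classifier` split are assumptions for illustration, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def perceptual_prior_loss(model, x, y, noise_std=0.05, lam=1.0):
    """Cross-entropy plus a penalty that pulls together the representations of
    an image and a perceptually similar (slightly perturbed) copy.
    `model.features`, `model.classifier`, and the perturbation are illustrative
    assumptions, not the paper's exact choices."""
    x_sim = (x + noise_std * torch.randn_like(x)).clamp(0.0, 1.0)  # perceptually similar image
    z, z_sim = model.features(x), model.features(x_sim)            # learned representations
    logits = model.classifier(z)
    ce = F.cross_entropy(logits, y)
    prior = F.mse_loss(z, z_sim)                                   # representation-similarity prior
    return ce + lam * prior
```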
Neural Clustering Processes
Title | Neural Clustering Processes |
Authors | Anonymous |
Abstract | Mixture models, a basic building block in countless statistical models, involve latent random variables over discrete spaces, and existing posterior inference methods can be inaccurate and/or very slow. In this work we introduce a novel deep learning architecture for efficient amortized Bayesian inference over mixture models. While previous approaches to amortized clustering assumed a fixed or maximum number of mixture components and only amortized over the continuous parameters of each mixture component, our method amortizes over the local discrete labels of all the data points, and performs inference over an unbounded number of mixture components. The latter property makes our method natural for the challenging case of nonparametric Bayesian models, where the number of mixture components grows with the dataset. Our approach exploits the exchangeability of the generative models and is based on mapping distributed, permutation-invariant representations of discrete arrangements into varying-size multinomial conditional probabilities. The resulting algorithm parallelizes easily, yields iid samples from the approximate posteriors along with a normalized probability estimate of each sample (a quantity generally unavailable using Markov Chain Monte Carlo) and can easily be applied to both conjugate and non-conjugate models, as training only requires samples from the generative model. We also present an extension of the method to models of random communities (such as infinite relational or stochastic block models). As a scientific application, we present a novel approach to neural spike sorting for high-density multielectrode arrays. |
Tasks | Bayesian Inference |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=ryxF80NYwS |
PDF | https://openreview.net/pdf?id=ryxF80NYwS |
PWC | https://paperswithcode.com/paper/neural-clustering-processes |
Repo | |
Framework | |
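The abstract describes amortizing over discrete labels by mapping permutation-invariant summaries of the current partial clustering into varying-size multinomial conditionals. A toy sketch of that sequential sampling loop is below; the `score_net` stand-in replaces the paper's learned encoders and is purely an assumption.

```python
import torch

def sample_clustering(points, score_net):
    """Sequentially assign each point to an existing cluster or a new one.
    `score_net` maps (point, cluster summary) pairs to unnormalized scores;
    summing member embeddings gives a permutation-invariant cluster summary.
    This is an illustrative sketch, not the paper's architecture."""
    labels, summaries = [], []
    for x in points:
        cand = summaries + [torch.zeros_like(x)]   # existing clusters + "new cluster" option
        scores = torch.stack([score_net(x, s) for s in cand])
        probs = torch.softmax(scores, dim=0)       # varying-size multinomial conditional
        k = torch.multinomial(probs, 1).item()
        if k == len(summaries):
            summaries.append(x.clone())            # open a new cluster
        else:
            summaries[k] += x                      # update permutation-invariant summary
        labels.append(k)
    return labels

# toy usage with a stand-in scoring function
score_net = lambda x, s: torch.nn.functional.cosine_similarity(x, s + 1e-6, dim=0)
pts = torch.randn(10, 8)
print(sample_clustering(pts, score_net))
```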
Bayesian Inference for Large Scale Image Classification
Title | Bayesian Inference for Large Scale Image Classification |
Authors | Anonymous |
Abstract | Bayesian inference promises to ground and improve the performance of deep neural networks. It promises to be robust to overfitting, to simplify the training procedure and the space of hyperparameters, and to provide a calibrated measure of uncertainty that can enhance decision making, agent exploration and prediction fairness. Markov Chain Monte Carlo (MCMC) methods enable Bayesian inference by generating samples from the posterior distribution over model parameters. Despite the theoretical advantages of Bayesian inference and the similarity between MCMC and optimization methods, the performance of sampling methods has so far lagged behind optimization methods for large scale deep learning tasks. We aim to fill this gap and introduce ATMC, an adaptive noise MCMC algorithm that estimates and is able to sample from the posterior of a neural network. ATMC dynamically adjusts the amount of momentum and noise applied to each parameter update in order to compensate for the use of stochastic gradients. We use a ResNet architecture without batch normalization to test ATMC on the Cifar10 benchmark and the large scale ImageNet benchmark and show that, despite the absence of batch normalization, ATMC outperforms a strong optimization baseline in terms of both classification accuracy and test log-likelihood. We show that ATMC is intrinsically robust to overfitting on the training data and that ATMC provides a better calibrated measure of uncertainty compared to the optimization baseline. |
Tasks | Bayesian Inference, Decision Making, Image Classification |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rklFh34Kwr |
PDF | https://openreview.net/pdf?id=rklFh34Kwr |
PWC | https://paperswithcode.com/paper/bayesian-inference-for-large-scale-image-1 |
Repo | |
Framework | |
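The abstract does not give ATMC's exact update rule; as a hedged illustration of the SG-MCMC family it belongs to, here is a momentum-based stochastic-gradient MCMC step (SGHMC-style) with injected Gaussian noise. The fixed noise scale is exactly the part ATMC replaces with an adaptive, per-parameter estimate.

```python
import torch

def sgmcmc_step(params, momenta, grad_fn, lr=1e-4, friction=0.1):
    """One momentum-based SG-MCMC update (SGHMC-style). `grad_fn` returns
    stochastic gradients of the negative log posterior. ATMC adapts the
    momentum/noise per parameter; the noise scale here is fixed for illustration."""
    noise_scale = (2.0 * friction * lr) ** 0.5
    grads = grad_fn(params)
    for p, m, g in zip(params, momenta, grads):
        m.mul_(1.0 - friction).add_(-lr * g)         # friction + stochastic gradient
        m.add_(noise_scale * torch.randn_like(p))    # injected Gaussian noise
        p.add_(m)                                    # position (parameter) update
    return params, momenta

# toy usage: sample from a standard normal "posterior" over one parameter
params, momenta = [torch.zeros(1)], [torch.zeros(1)]
neg_log_post_grad = lambda ps: [ps[0]]               # gradient of 0.5 * theta^2
for _ in range(1000):
    params, momenta = sgmcmc_step(params, momenta, neg_log_post_grad, lr=1e-2)
print(params[0])
```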
Refining the variational posterior through iterative optimization
Title | Refining the variational posterior through iterative optimization |
Authors | Anonymous |
Abstract | Variational inference (VI) is a popular approach for approximate Bayesian inference that is particularly promising for highly parameterized models such as deep neural networks. A key challenge of variational inference is to approximate the posterior over model parameters with a distribution that is simpler and tractable yet sufficiently expressive. In this work, we propose a method for training highly flexible variational distributions by starting with a coarse approximation and iteratively refining it. Each refinement step makes cheap, local adjustments and only requires optimization of simple variational families. We demonstrate theoretically that our method always improves a bound on the approximation (the Evidence Lower BOund) and observe this empirically across a variety of benchmark tasks. In experiments, our method consistently outperforms recent variational inference methods for deep learning in terms of log-likelihood and the ELBO. We see that the gains are further amplified on larger scale models, significantly outperforming standard VI and deep ensembles on residual networks on CIFAR10. |
Tasks | Bayesian Inference |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rkglZyHtvH |
PDF | https://openreview.net/pdf?id=rkglZyHtvH |
PWC | https://paperswithcode.com/paper/refining-the-variational-posterior-through |
Repo | |
Framework | |
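A small sketch of the coarse-to-fine idea on a toy 2-D target: fit a diagonal Gaussian by maximizing a Monte-Carlo ELBO, then repeatedly optimize only a cheap local adjustment of the mean while earlier refinements stay frozen. The additive-mean parameterization is an assumption for illustration, not the paper's refinement operator.

```python
import math
import torch

def elbo(mu, log_sigma, log_p, n=64):
    """Reparameterized Monte-Carlo ELBO for a diagonal Gaussian q = N(mu, sigma^2)."""
    z = mu + log_sigma.exp() * torch.randn(n, mu.shape[0])
    entropy = (log_sigma + 0.5 * math.log(2 * math.pi * math.e)).sum()
    return log_p(z).mean() + entropy

log_p = lambda z: -0.5 * ((z - 3.0) ** 2).sum(-1)    # unnormalized target: N(3, I)
mu0 = torch.zeros(2)                                 # coarse initial mean
log_sigma = torch.zeros(2, requires_grad=True)
deltas = []                                          # local adjustments, optimized one at a time
for step in range(3):
    delta = torch.zeros(2, requires_grad=True)
    opt = torch.optim.Adam([delta, log_sigma], lr=0.1)
    for _ in range(200):
        mu = mu0 + sum(deltas) + delta               # earlier refinements stay frozen
        loss = -elbo(mu, log_sigma, log_p)
        opt.zero_grad(); loss.backward(); opt.step()
    deltas.append(delta.detach())
    print(f"refinement {step}: ELBO = {-loss.item():.3f}")
```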
Scale-Equivariant Neural Networks with Decomposed Convolutional Filters
Title | Scale-Equivariant Neural Networks with Decomposed Convolutional Filters |
Authors | Anonymous |
Abstract | Encoding the input scale information explicitly into the representation learned by a convolutional neural network (CNN) is beneficial for many vision tasks especially when dealing with multiscale input signals. We study, in this paper, a scale-equivariant CNN architecture with joint convolutions across the space and the scaling group, which is shown to be both sufficient and necessary to achieve scale-equivariant representations. To reduce the model complexity and computational burden, we decompose the convolutional filters under two pre-fixed separable bases and truncate the expansion to low-frequency components. A further benefit of the truncated filter expansion is the improved deformation robustness of the equivariant representation. Numerical experiments demonstrate that the proposed scale-equivariant neural network with decomposed convolutional filters (ScDCFNet) achieves significantly improved performance in multiscale image classification and better interpretability than regular CNNs at a reduced model size. |
Tasks | Image Classification |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rkgCJ64tDB |
PDF | https://openreview.net/pdf?id=rkgCJ64tDB |
PWC | https://paperswithcode.com/paper/scale-equivariant-neural-networks-with-1 |
Repo | |
Framework | |
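A simplified stand-in for a joint space-scale convolution: apply rescaled copies of the same filter so the output gains a scale axis. The bilinear filter rescaling below is an assumption; ScDCFNet instead expands filters in truncated separable bases to reduce cost and improve deformation robustness.

```python
import torch
import torch.nn.functional as F

def scale_equivariant_conv(x, weight, scales=(1.0, 1.5, 2.0)):
    """Apply the same filter at several rescalings so the output gains a scale
    axis: a simplified stand-in for the joint space-scale convolution."""
    outs = []
    for s in scales:
        k = int(round(weight.shape[-1] * s))
        k += (k + 1) % 2                              # keep kernel size odd so outputs align
        w = F.interpolate(weight, size=(k, k), mode="bilinear", align_corners=False)
        outs.append(F.conv2d(x, w, padding=k // 2))
    return torch.stack(outs, dim=2)                   # (batch, out_ch, scale, H, W)

x = torch.randn(1, 3, 32, 32)
w = torch.randn(8, 3, 3, 3)
print(scale_equivariant_conv(x, w).shape)
```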
Nonlinearities in activations substantially shape the loss surfaces of neural networks
Title | Nonlinearities in activations substantially shape the loss surfaces of neural networks |
Authors | Anonymous |
Abstract | Understanding the loss surfaces of neural networks is fundamentally important to understanding deep learning. This paper presents how the nonlinearities in activations substantially shape the loss surfaces of neural networks. We first prove that the loss surface of every neural network has infinitely many spurious local minima, which are defined as local minima with higher empirical risk than the global minima. Our result holds for any neural network with arbitrary depth and arbitrary piecewise linear activation functions (excluding linear functions) under most loss functions used in practice. This result demonstrates that nonlinear networks differ substantially from the well-studied linear neural networks. Essentially, the underlying assumptions for the above result are consistent with most practical circumstances, where the output layer is narrower than any hidden layer. We further prove a theorem that draws a big picture of the loss surfaces of nonlinear neural networks in the following respects. (1) Smooth and multilinear partition: the loss surface is partitioned into multiple smooth and multilinear open cells. (2) Local analogous convexity: within every cell, local minima are equally good; equivalently, they are all global minima within the cell. (3) Local minima valley: some local minima are concentrated into a valley in some cell, sharing the same empirical risk. (4) Linear collapse: when all activations are linear, the partitioned loss surface collapses to one single cell, which includes linear neural networks as a simplified case. The second result holds for one-hidden-layer networks for regression under convex loss, while all others apply to networks of arbitrary depth. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=B1x6BTEKwr |
PDF | https://openreview.net/pdf?id=B1x6BTEKwr |
PWC | https://paperswithcode.com/paper/nonlinearities-in-activations-substantially |
Repo | |
Framework | |
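One way to state the paper's central definition formally (phrased from the abstract; the paper's own notation may differ), with R_n denoting the empirical risk:

```latex
% \theta^* is a spurious local minimum of the empirical risk R_n if it is a
% local minimum whose risk exceeds the global infimum:
\exists\, \varepsilon > 0:\;
R_n(\theta^\ast) \le R_n(\theta)
\quad \forall\, \theta \ \text{with}\ \|\theta - \theta^\ast\| < \varepsilon,
\qquad\text{and}\qquad
R_n(\theta^\ast) > \inf_{\theta} R_n(\theta).
```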
Deep Coordination Graphs
Title | Deep Coordination Graphs |
Authors | Anonymous |
Abstract | This paper introduces the deep coordination graph (DCG) for collaborative multi-agent reinforcement learning. DCG strikes a flexible trade-off between representational capacity and generalization by factorizing the joint value function of all agents according to a coordination graph into payoffs between pairs of agents. The value can be maximized by local message passing along the graph, which allows training of the value function end-to-end with Q-learning. Payoff functions are approximated with deep neural networks and parameter sharing improves generalization over the state-action space. We show that DCG can solve challenging predator-prey tasks that are vulnerable to the relative overgeneralization pathology and in which all other known value factorization approaches fail. |
Tasks | Multi-agent Reinforcement Learning, Q-Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HklRKpEKDr |
PDF | https://openreview.net/pdf?id=HklRKpEKDr |
PWC | https://paperswithcode.com/paper/deep-coordination-graphs-1 |
Repo | |
Framework | |
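The factorization stated in the abstract, on a toy coordination graph: the joint value is a sum of per-agent utilities and pairwise payoffs along the edges. DCG parameterizes these terms with neural networks and maximizes by local message passing; this sketch simply brute-forces the tiny joint action space.

```python
import itertools
import numpy as np

# Toy coordination graph: 3 agents, 2 actions each, edges (0,1) and (1,2).
rng = np.random.default_rng(0)
n_agents, n_actions, edges = 3, 2, [(0, 1), (1, 2)]
f_i = rng.normal(size=(n_agents, n_actions))                         # utilities f_i(a_i)
f_ij = {e: rng.normal(size=(n_actions, n_actions)) for e in edges}   # payoffs f_ij(a_i, a_j)

def q_joint(actions):
    """Joint value factorized into per-agent utilities and pairwise payoffs."""
    return sum(f_i[i, a] for i, a in enumerate(actions)) + \
           sum(f_ij[(i, j)][actions[i], actions[j]] for i, j in edges)

best = max(itertools.product(range(n_actions), repeat=n_agents), key=q_joint)
print("greedy joint action:", best, "value:", round(q_joint(best), 3))
```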
Learning Surrogate Losses
Title | Learning Surrogate Losses |
Authors | Anonymous |
Abstract | The minimization of loss functions is the heart and soul of Machine Learning. In this paper, we propose an off-the-shelf optimization approach that can seamlessly minimize virtually any non-differentiable and non-decomposable loss function (e.g. Misclassification Rate, AUC, F1, Jaccard Index, Matthews Correlation Coefficient, etc.). Our strategy learns smooth, relaxed versions of the true losses by approximating them through a surrogate neural network. The proposed loss networks are set-wise models which are invariant to the order of mini-batch instances. Ultimately, the surrogate losses are learned jointly with the prediction model via bilevel optimization. Empirical results on multiple datasets with diverse real-life loss functions compared with state-of-the-art baselines demonstrate the efficiency of learning surrogate losses. |
Tasks | bilevel optimization |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BkePHaVKwS |
PDF | https://openreview.net/pdf?id=BkePHaVKwS |
PWC | https://paperswithcode.com/paper/learning-surrogate-losses-1 |
Repo | |
Framework | |
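A minimal sketch of the surrogate-loss idea under stated assumptions: a set-wise surrogate network is fit to a non-differentiable batch metric (error rate here), and the classifier is then updated through the differentiable surrogate. The simple alternating updates stand in for the paper's bilevel optimization.

```python
import torch
import torch.nn as nn

true_metric = lambda logits, y: (logits.argmax(1) != y).float().mean()     # non-differentiable

model = nn.Linear(20, 2)
surrogate = nn.Sequential(nn.Linear(3, 32), nn.ReLU(), nn.Linear(32, 1))   # per-example scores, mean-pooled (set-wise)
opt_m = torch.optim.Adam(model.parameters(), lr=1e-2)
opt_s = torch.optim.Adam(surrogate.parameters(), lr=1e-2)

x = torch.randn(512, 20)
y = (x[:, 0] > 0).long()
for step in range(200):
    logits = model(x)
    feats = torch.cat([logits, y.float().unsqueeze(1)], dim=1)      # (prediction, label) per example
    surr_value = surrogate(feats).mean()                            # permutation-invariant batch estimate
    # inner step: fit the surrogate to the true, non-differentiable metric
    loss_s = (surr_value - true_metric(logits, y)) ** 2
    opt_s.zero_grad(); loss_s.backward(); opt_s.step()
    # outer step: update the classifier through the differentiable surrogate
    surr_value = surrogate(torch.cat([model(x), y.float().unsqueeze(1)], dim=1)).mean()
    opt_m.zero_grad(); surr_value.backward(); opt_m.step()
print("final error rate:", true_metric(model(x), y).item())
```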
Gaussian Conditional Random Fields for Classification
Title | Gaussian Conditional Random Fields for Classification |
Authors | Anonymous |
Abstract | In this paper, a Gaussian conditional random field model for structured binary classification (GCRFBC) is proposed. The model is applicable to classification problems with undirected graphs that are intractable for standard classification CRFs. The model representation of GCRFBC is extended with latent variables, which yields some appealing properties. Thanks to the GCRF latent structure, the model becomes tractable, efficient, and open to improvements previously applied to GCRF regression. Two different forms of the algorithm are presented: GCRFBCb (GCRFBC - Bayesian) and GCRFBCnb (GCRFBC - non-Bayesian). The extended method of local variational approximation of the sigmoid function is used for solving empirical Bayes in the GCRFBCb variant, whereas the MAP value of the latent variables is the basis for learning and inference in the GCRFBCnb variant. The inference in GCRFBCb is solved by Newton-Cotes formulas for one-dimensional integration. Both models are evaluated on synthetic and real-world data, and both achieve better prediction performance than relevant baselines. Advantages and disadvantages of the proposed models are discussed. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=ryxC-kBYDS |
PDF | https://openreview.net/pdf?id=ryxC-kBYDS |
PWC | https://paperswithcode.com/paper/gaussian-conditional-random-fields-for-1 |
Repo | |
Framework | |
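The GCRFBCb variant builds on the local variational approximation of the sigmoid; the standard (Jaakkola-Jordan) bound underlying that approximation, with the variational parameter ξ tightened per data point, is:

```latex
% Local variational (Jaakkola--Jordan) lower bound on the logistic sigmoid,
% the standard form that the paper's empirical Bayes step extends:
\sigma(x) \;\ge\; \sigma(\xi)\,
  \exp\!\Big(\tfrac{x-\xi}{2} - \lambda(\xi)\,(x^2 - \xi^2)\Big),
\qquad
\lambda(\xi) = \frac{1}{2\xi}\Big(\sigma(\xi) - \tfrac{1}{2}\Big).
```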
Open-Set Domain Adaptation with Category-Agnostic Clusters
Title | Open-Set Domain Adaptation with Category-Agnostic Clusters |
Authors | Anonymous |
Abstract | Unsupervised domain adaptation has received significant attention in recent years. Most existing works tackle the closed-set scenario, assuming that the source and target domains share exactly the same categories. In practice, nevertheless, a target domain often contains samples of classes unseen in the source domain (i.e., unknown classes). The extension of domain adaptation from the closed-set to such an open-set situation is not trivial, since the target samples of unknown classes are not expected to align with the source. In this paper, we address this problem by augmenting the state-of-the-art domain adaptation technique, Self-Ensembling, with category-agnostic clusters in the target domain. Specifically, we present Self-Ensembling with Category-agnostic Clusters (SE-CC) — a novel architecture that steers domain adaptation with the additional guidance of category-agnostic clusters that are specific to the target domain. This clustering information provides domain-specific visual cues, facilitating the generalization of Self-Ensembling to both closed-set and open-set scenarios. Technically, clustering is first performed over all the unlabeled target samples to obtain the category-agnostic clusters, which reveal the underlying data space structure peculiar to the target domain. A clustering branch then ensures that the learnt representation preserves this underlying structure by matching the estimated assignment distribution over clusters to the inherent cluster distribution for each target sample. Furthermore, SE-CC enhances the learnt representation with mutual information maximization. Extensive experiments are conducted on the Office and VisDA datasets for both open-set and closed-set domain adaptation, and superior results are reported when comparing to state-of-the-art approaches. |
Tasks | Domain Adaptation, Unsupervised Domain Adaptation |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Bkgv71rtwr |
PDF | https://openreview.net/pdf?id=Bkgv71rtwr |
PWC | https://paperswithcode.com/paper/open-set-domain-adaptation-with-category |
Repo | |
Framework | |
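A hedged sketch of the clustering guidance described in the abstract: cluster all unlabeled target features (k-means is an assumed choice) and train the clustering branch to match the resulting assignments via cross-entropy. Everything beyond the abstract, including the exact matching loss, is an assumption.

```python
import torch
import torch.nn.functional as F
from sklearn.cluster import KMeans

def cluster_matching_loss(features, cluster_logits, n_clusters=10):
    """Cluster all target features (k-means, an assumed choice) and make the
    clustering branch's predicted assignment distribution match the inherent
    cluster assignment via cross-entropy."""
    with torch.no_grad():
        assign = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(features.cpu().numpy())
        target = torch.as_tensor(assign, dtype=torch.long, device=cluster_logits.device)
    return F.cross_entropy(cluster_logits, target)

feats = torch.randn(256, 64)                         # unlabeled target-domain features
logits = torch.randn(256, 10, requires_grad=True)    # clustering-branch outputs
print(cluster_matching_loss(feats, logits).item())
```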
Distribution Matching Prototypical Network for Unsupervised Domain Adaptation
Title | Distribution Matching Prototypical Network for Unsupervised Domain Adaptation |
Authors | Anonymous |
Abstract | State-of-the-art Unsupervised Domain Adaptation (UDA) methods learn transferable features by minimizing the feature distribution discrepancy between the source and target domains. Different from these methods, which do not model the feature distributions explicitly, in this paper we explore explicit feature distribution modeling for UDA. In particular, we propose the Distribution Matching Prototypical Network (DMPN), which models the deep features from each domain as Gaussian mixture distributions. With explicit feature distribution modeling, we can easily measure the discrepancy between the two domains. In DMPN, we propose two new domain discrepancy losses with probabilistic interpretations. The first minimizes the distances between the corresponding Gaussian component means of the source and target data. The second minimizes the pseudo negative log-likelihood of generating the target features from the source feature distribution. To learn both discriminative and domain-invariant features, DMPN is trained by jointly minimizing the classification loss on the labeled source data and the domain discrepancy losses. Extensive experiments are conducted over two UDA tasks. Our approach outperforms state-of-the-art approaches by a large margin on the Digits image transfer task. More remarkably, DMPN obtains a mean accuracy of 81.4% on the VisDA 2017 dataset. A hyper-parameter sensitivity analysis shows that our approach is robust w.r.t. hyper-parameter changes. |
Tasks | Domain Adaptation, Unsupervised Domain Adaptation |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=r1eX1yrKwB |
PDF | https://openreview.net/pdf?id=r1eX1yrKwB |
PWC | https://paperswithcode.com/paper/distribution-matching-prototypical-network |
Repo | |
Framework | |
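The two discrepancy losses from the abstract in a minimal form: (1) distances between corresponding Gaussian component means, and (2) the pseudo negative log-likelihood of target features under the source components. Unit covariances and uniform mixing weights are simplifying assumptions.

```python
import torch

def dmpn_discrepancy(src_means, tgt_means, tgt_feats):
    """Sketch of DMPN's two discrepancy losses: mean-matching between
    corresponding Gaussian components, and the pseudo negative log-likelihood
    of target features under the source components (unit covariance and
    uniform mixing assumed for illustration)."""
    mean_loss = ((src_means - tgt_means) ** 2).sum(dim=1).mean()
    d2 = torch.cdist(tgt_feats, src_means) ** 2              # (N, K) squared distances
    log_lik = torch.logsumexp(-0.5 * d2, dim=1) - torch.log(torch.tensor(float(src_means.shape[0])))
    nll_loss = -log_lik.mean()
    return mean_loss, nll_loss

K, D = 5, 64
src_mu, tgt_mu = torch.randn(K, D), torch.randn(K, D)
tgt_x = torch.randn(128, D)
print([v.item() for v in dmpn_discrepancy(src_mu, tgt_mu, tgt_x)])
```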
Enhanced Convolutional Neural Tangent Kernels
Title | Enhanced Convolutional Neural Tangent Kernels |
Authors | Anonymous |
Abstract | Recent research shows that, for training with l2 loss, convolutional neural networks (CNNs) whose width (number of channels in convolutional layers) goes to infinity correspond to regression with respect to the CNN Gaussian Process kernel (CNN-GP) if only the last layer is trained, and to regression with respect to the Convolutional Neural Tangent Kernel (CNTK) if all layers are trained. An exact algorithm to compute the CNTK (Arora et al., 2019) yielded the finding that the classification accuracy of the CNTK on CIFAR-10 is within 6-7% of that of the corresponding CNN architecture (the best figure being around 78%), which is interesting performance for a fixed kernel. Here we show how to significantly enhance the performance of these kernels using two ideas. (1) Modifying the kernel using a new operation called Local Average Pooling (LAP), which preserves efficient computability of the kernel and inherits the spirit of standard data augmentation using pixel shifts. Earlier papers were unable to incorporate naive data augmentation because of the quadratic training cost of kernel regression. This idea is inspired by Global Average Pooling (GAP), which, as we show for CNN-GP and CNTK, is equivalent to full translation data augmentation. (2) Representing the input image using a pre-processing technique proposed by Coates et al. (2011), which uses a single convolutional layer composed of random image patches. On CIFAR-10 the resulting kernel, CNN-GP with LAP and horizontal flip data augmentation, achieves 89% accuracy, matching the performance of AlexNet (Krizhevsky et al., 2012). Note that this is the best such result we know of for a classifier that is not a trained neural network. Similar improvements are obtained for Fashion-MNIST. |
Tasks | Data Augmentation |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BkgNqkHFPr |
PDF | https://openreview.net/pdf?id=BkgNqkHFPr |
PWC | https://paperswithcode.com/paper/enhanced-convolutional-neural-tangent-kernels |
Repo | |
Framework | |
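The kernel modifications themselves are hard to sketch briefly, but the second ingredient, the Coates et al. (2011) preprocessing, is a single convolutional layer whose filters are random image patches drawn from the data. A simplified version (normalization and whitening details omitted) could look like:

```python
import torch
import torch.nn.functional as F

def random_patch_features(images, n_filters=64, patch=5, seed=0):
    """One convolutional layer whose filters are random image patches drawn
    from the data, a simplified sketch of the Coates et al. (2011) preprocessing."""
    g = torch.Generator().manual_seed(seed)
    n, c, h, w = images.shape
    idx = torch.randint(0, n, (n_filters,), generator=g)
    ys = torch.randint(0, h - patch + 1, (n_filters,), generator=g)
    xs = torch.randint(0, w - patch + 1, (n_filters,), generator=g)
    filters = torch.stack([images[i, :, y:y + patch, x:x + patch]
                           for i, y, x in zip(idx, ys, xs)])
    filters = (filters - filters.mean(dim=(1, 2, 3), keepdim=True)) / \
              (filters.std(dim=(1, 2, 3), keepdim=True) + 1e-6)      # per-patch normalization
    return F.relu(F.conv2d(images, filters, padding=patch // 2))

imgs = torch.rand(8, 3, 32, 32)
print(random_patch_features(imgs).shape)    # (8, 64, 32, 32)
```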
Wildly Unsupervised Domain Adaptation and Its Powerful and Efficient Solution
Title | Wildly Unsupervised Domain Adaptation and Its Powerful and Efficient Solution |
Authors | Anonymous |
Abstract | In unsupervised domain adaptation (UDA), classifiers for the target domain (TD) are trained with clean labeled data from the source domain (SD) and unlabeled data from the TD. However, in the wild, it is hard to acquire a large amount of perfectly clean labeled data in the SD given a limited budget. Hence, we consider a new, more realistic and more challenging problem setting, where classifiers have to be trained with noisy labeled data from the SD and unlabeled data from the TD—we name it wildly UDA (WUDA). We show that WUDA ruins all UDA methods if label noise in the SD is not taken care of, and to this end, we propose the Butterfly framework, a powerful and efficient solution to WUDA. Butterfly maintains four models (e.g., deep networks) simultaneously, where two take care of all adaptations (i.e., noisy-to-clean, labeled-to-unlabeled, and SD-to-TD-distributional) and the other two focus on classification in the TD. As a consequence, Butterfly possesses all the conceptually necessary components for solving WUDA. Experiments demonstrate that under WUDA, Butterfly significantly outperforms existing baseline methods. |
Tasks | Domain Adaptation, Unsupervised Domain Adaptation |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rkl2s34twS |
PDF | https://openreview.net/pdf?id=rkl2s34twS |
PWC | https://paperswithcode.com/paper/wildly-unsupervised-domain-adaptation-and-its |
Repo | |
Framework | |
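The abstract does not spell out how the four models interact. One widely used mechanism for the noisy-to-clean part, and only an assumption here, is co-teaching-style small-loss selection between peer networks:

```python
import torch
import torch.nn.functional as F

def small_loss_selection(logits_a, logits_b, y, keep_ratio=0.7):
    """Each of two peer networks selects the small-loss (likely clean) samples
    for the other. This co-teaching-style step is an assumed illustration of
    the noisy-to-clean adaptation, not Butterfly's exact procedure."""
    loss_a = F.cross_entropy(logits_a, y, reduction="none")
    loss_b = F.cross_entropy(logits_b, y, reduction="none")
    k = int(keep_ratio * y.shape[0])
    idx_for_b = loss_a.topk(k, largest=False).indices   # A picks clean-looking samples for B
    idx_for_a = loss_b.topk(k, largest=False).indices   # B picks clean-looking samples for A
    return idx_for_a, idx_for_b
```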
Learning vector representation of local content and matrix representation of local motion, with implications for V1
Title | Learning vector representation of local content and matrix representation of local motion, with implications for V1 |
Authors | Anonymous |
Abstract | This paper proposes a representational model for image pairs, such as consecutive video frames, that are related by local pixel displacements, in the hope that the model may shed light on motion perception in the primary visual cortex (V1). The model couples the following two components: (1) vector representations of the local contents of images, and (2) matrix representations of the local pixel displacements caused by the relative motions between the agent and the objects in the 3D scene. When the image frame undergoes changes due to local pixel displacements, the vectors are multiplied by the matrices that represent the local displacements. Our experiments show that our model can learn to infer local motions. Moreover, the model can learn Gabor-like filter pairs of quadrature phases. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SyeLGlHtPS |
PDF | https://openreview.net/pdf?id=SyeLGlHtPS |
PWC | https://paperswithcode.com/paper/learning-vector-representation-of-local |
Repo | |
Framework | |
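The core coupling from the abstract: the local content is a vector v, a local displacement dx acts on it as a learned matrix M(dx), and the next frame's vector is predicted as M(dx) v. The toy training setup below (random vectors, a discrete displacement set, a synthetic ground-truth transform) is purely illustrative.

```python
import torch

dim, displacements = 16, [(-1, 0), (1, 0), (0, -1), (0, 1)]
# one learnable matrix M(dx) per displacement, initialized near the identity
M = {d: torch.nn.Parameter(torch.eye(dim) + 0.01 * torch.randn(dim, dim)) for d in displacements}
opt = torch.optim.Adam(M.values(), lr=1e-2)

def true_transform(v, d):
    """Synthetic stand-in for the ground-truth effect of the displacement."""
    return torch.roll(v, shifts=displacements.index(d) + 1, dims=-1)

for step in range(500):
    d = displacements[step % 4]
    v = torch.randn(32, dim)                                   # content vectors at time t
    loss = ((v @ M[d].T - true_transform(v, d)) ** 2).mean()   # || M(dx) v - v' ||^2
    opt.zero_grad(); loss.backward(); opt.step()
print("final loss:", round(loss.item(), 4))
```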
Differentiable Bayesian Neural Network Inference for Data Streams
Title | Differentiable Bayesian Neural Network Inference for Data Streams |
Authors | Anonymous |
Abstract | While deep neural networks (NNs) do not provide the confidence of their predictions, a Bayesian neural network (BNN) can estimate the uncertainty of a prediction. However, BNNs have not been widely used in practice due to the computational cost of predictive inference. This prohibitive computational cost is a hindrance especially when processing stream data with low latency. To address this problem, we propose a novel model which approximates BNNs for data streams. Instead of generating a separate prediction for each data sample independently, this model estimates the increment of the prediction for a new data sample from the previous predictions. The computational cost of this model is almost the same as that of non-Bayesian deep NNs. Experiments including semantic segmentation on real-world data show that this model runs significantly faster than BNNs while estimating uncertainty comparable to that of BNNs. |
Tasks | Semantic Segmentation |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rJx7wlSYvB |
PDF | https://openreview.net/pdf?id=rJx7wlSYvB |
PWC | https://paperswithcode.com/paper/differentiable-bayesian-neural-network-1 |
Repo | |
Framework | |
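A heavily hedged skeleton of the incremental idea in the abstract: pay the full (approximate) Bayesian inference cost once, then let a small network predict the change of the prediction from the change of the input. All architectural details below are assumptions.

```python
import torch
import torch.nn as nn

class IncrementalPredictor(nn.Module):
    """Estimate the new prediction as previous prediction + a learned increment,
    instead of re-running expensive BNN inference on every stream sample.
    The increment network and its inputs are illustrative assumptions."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.increment_net = nn.Sequential(nn.Linear(in_dim + out_dim, 64),
                                           nn.ReLU(), nn.Linear(64, out_dim))
        self.prev_x, self.prev_pred = None, None

    def forward(self, x, full_inference):
        if self.prev_x is None:                      # first sample: pay the full BNN cost once
            pred = full_inference(x)
        else:                                        # later samples: cheap incremental update
            delta = self.increment_net(torch.cat([x - self.prev_x, self.prev_pred], dim=-1))
            pred = self.prev_pred + delta
        self.prev_x, self.prev_pred = x.detach(), pred.detach()
        return pred

model = IncrementalPredictor(10, 3)
mc_bnn = lambda x: torch.randn(x.shape[0], 3)        # placeholder for an expensive MC-sampled BNN
for t in range(5):
    print(model(torch.randn(1, 10), mc_bnn).shape)
```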