April 1, 2020

Paper Group NANR 8

Demystifying Graph Neural Network Via Graph Filter Assessment

Title Demystifying Graph Neural Network Via Graph Filter Assessment
Authors Anonymous
Abstract Graph Neural Networks (GNNs) have received tremendous attention recently due to their power in handling graph data for different downstream tasks across different application domains. The key component of a GNN is its graph convolutional filter, and various kinds of filters have recently been designed. However, there is still a lack of in-depth analysis of (1) whether there exists a best filter that performs best on all graph data; (2) which graph properties influence the optimal choice of graph filter; (3) how to design an appropriate filter that adapts to the graph data. In this paper, we focus on addressing the above three questions. We first propose a novel assessment tool to evaluate the effectiveness of graph convolutional filters for a given graph. Using the assessment tool, we find that no single filter is a 'silver bullet' that performs best on all possible graphs. In addition, different graph structure properties influence the optimal graph convolutional filter's design choice. Based on these findings, we develop Adaptive Filter Graph Neural Network (AFGNN), a simple but powerful model that can adaptively learn a task-specific filter. For a given graph, it leverages graph filter assessment as regularization and learns to combine a set of base filters. Experiments on both synthetic and real-world benchmark datasets demonstrate that our proposed model can indeed learn an appropriate filter and perform well on graph tasks.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=r1erNxBtwr
PDF https://openreview.net/pdf?id=r1erNxBtwr
PWC https://paperswithcode.com/paper/demystifying-graph-neural-network-via-graph
Repo
Framework
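
The idea of learning to combine a set of base filters can be illustrated with a small sketch. This is not the AFGNN implementation: the softmax weighting, the two base filters, and the helper names `combine_filters` and `apply_filter` are assumptions chosen for illustration, and the assessment-based regularization from the abstract is not modelled.

```python
import math

def softmax(scores):
    """Turn unconstrained scores into positive weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def combine_filters(base_filters, scores):
    """Weighted sum of equally sized square filters (lists of lists).

    Assumption: the weights come from a softmax over learnable scores.
    """
    weights = softmax(scores)
    n = len(base_filters[0])
    return [[sum(w * f[i][j] for w, f in zip(weights, base_filters))
             for j in range(n)] for i in range(n)]

def apply_filter(filt, features):
    """Multiply an n x n filter by an n x d node-feature matrix."""
    return [[sum(filt[i][k] * features[k][j] for k in range(len(features)))
             for j in range(len(features[0]))] for i in range(len(filt))]

# Toy 3-node path graph 0-1-2 with two hypothetical base filters:
# the identity and the row-normalised adjacency with self-loops.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
row_norm = [[0.5, 0.5, 0.0], [1 / 3, 1 / 3, 1 / 3], [0.0, 0.5, 0.5]]

scores = [0.0, 0.0]  # equal scores give equal weights
combined = combine_filters([identity, row_norm], scores)
smoothed = apply_filter(combined, [[1.0], [0.0], [0.0]])
```

In a trained model the scores would be updated by gradient descent together with the other parameters; here they are fixed so the arithmetic is easy to follow.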

Wide Neural Networks are Interpolating Kernel Methods: Impact of Initialization on Generalization

Title Wide Neural Networks are Interpolating Kernel Methods: Impact of Initialization on Generalization
Authors Anonymous
Abstract The recently developed link between strongly overparametrized neural networks (NNs) and kernel methods has opened a new way to understand puzzling features of NNs, such as their convergence and generalization behaviors. In this paper, we make the bias of initialization on strongly overparametrized NNs under gradient descent explicit. We prove that fully-connected wide ReLU-NNs trained with squared loss are essentially a sum of two parts: The first is the minimum complexity solution of an interpolating kernel method, while the second contributes to the test error only and depends heavily on the initialization. This decomposition has two consequences: (a) the second part becomes negligible in the regime of small initialization variance, which allows us to transfer generalization bounds from minimum complexity interpolating kernel methods to NNs; (b) in the opposite regime, the test error of wide NNs increases significantly with the initialization variance, while still interpolating the training data perfectly. Our work shows that – contrary to common belief – the initialization scheme has a strong effect on generalization performance, providing a novel criterion to identify good initialization strategies.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=rJxvD3VKvr
PDF https://openreview.net/pdf?id=rJxvD3VKvr
PWC https://paperswithcode.com/paper/wide-neural-networks-are-interpolating-kernel
Repo
Framework

Optimising Neural Network Architectures for Provable Adversarial Robustness

Title Optimising Neural Network Architectures for Provable Adversarial Robustness
Authors Henry Gouk, Timothy M. Hospedales
Abstract Existing Lipschitz-based provable defences to adversarial examples only cover the L2 threat model. We introduce the first bound that makes use of Lipschitz continuity to provide a more general guarantee for threat models based on any p-norm. Additionally, a new strategy is proposed for designing network architectures that exhibit superior provable adversarial robustness over conventional convolutional neural networks. Experiments are conducted to validate our theoretical contributions, show that the assumptions made during the design of our novel architecture hold in practice, and quantify the empirical robustness of several Lipschitz-based adversarial defence methods.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=HJlAUaVYvH
PDF https://openreview.net/pdf?id=HJlAUaVYvH
PWC https://paperswithcode.com/paper/optimising-neural-network-architectures-for
Repo
Framework
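
As a rough illustration of how a Lipschitz constant yields a robustness guarantee outside the L2 setting, the sketch below uses the standard product-of-operator-norms bound for the infinity norm. It is not the bound introduced in the paper; the function names and the toy weights are assumptions made for this example.

```python
def linf_operator_norm(weight):
    """Operator norm induced by the infinity norm: the largest absolute row sum."""
    return max(sum(abs(w) for w in row) for row in weight)

def lipschitz_upper_bound(weights):
    """Upper bound for a feed-forward net with 1-Lipschitz activations (e.g. ReLU).

    The bound is the product of the per-layer operator norms; biases do not matter.
    """
    bound = 1.0
    for weight in weights:
        bound *= linf_operator_norm(weight)
    return bound

def certified_radius(logits, lipschitz):
    """Radius (in the infinity norm) within which the predicted class cannot change.

    Each logit moves by at most lipschitz * eps, so the gap between the top two
    logits shrinks by at most 2 * lipschitz * eps.
    """
    ordered = sorted(logits, reverse=True)
    margin = ordered[0] - ordered[1]
    return margin / (2.0 * lipschitz)

# Hypothetical two-layer network given only by its weight matrices.
layers = [[[1.0, -2.0], [0.5, 0.5]],
          [[1.0, 1.0], [-1.0, 0.0]]]
lipschitz = lipschitz_upper_bound(layers)        # 3 * 2 = 6
radius = certified_radius([2.0, -1.0], lipschitz)  # 3 / (2 * 6) = 0.25
```

The product bound is usually loose, which is one reason architectures and training procedures that keep per-layer norms small can certify larger radii.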

Convolutional Bipartite Attractor Networks

Title Convolutional Bipartite Attractor Networks
Authors Anonymous
Abstract In human perception and cognition, a fundamental operation that brains perform is interpretation: constructing coherent neural states from noisy, incomplete, and intrinsically ambiguous evidence. The problem of interpretation is well matched to an early and often overlooked architecture, the attractor network—a recurrent neural net that performs constraint satisfaction, imputation of missing features, and clean up of noisy data via energy minimization dynamics. We revisit attractor nets in light of modern deep learning methods and propose a convolutional bipartite architecture with a novel training loss, activation function, and connectivity constraints. We tackle larger problems than have been previously explored with attractor nets and demonstrate their potential for image completion and super-resolution. We argue that this architecture is better motivated than ever-deeper feedforward models and is a viable alternative to more costly sampling-based generative methods on a range of supervised and unsupervised tasks.
Tasks Imputation, Super-Resolution
Published 2020-01-01
URL https://openreview.net/forum?id=Hke0lRNYwS
PDF https://openreview.net/pdf?id=Hke0lRNYwS
PWC https://paperswithcode.com/paper/convolutional-bipartite-attractor-networks-1
Repo
Framework

Random Matrix Theory Proves that Deep Learning Representations of GAN-data Behave as Gaussian Mixtures

Title Random Matrix Theory Proves that Deep Learning Representations of GAN-data Behave as Gaussian Mixtures
Authors Anonymous
Abstract This paper shows that deep learning (DL) representations of data produced by generative adversarial nets (GANs) are random vectors which fall within the class of so-called concentrated random vectors. Further exploiting the fact that Gram matrices, of the type $G = X^\top X$ with $X = [x_1, \ldots, x_n] \in \mathbb{R}^{p \times n}$ and $x_i$ independent concentrated random vectors from a mixture model, behave asymptotically (as $n, p \to \infty$) as if the $x_i$ were drawn from a Gaussian mixture, suggests that DL representations of GAN-data can be fully described by their first two statistical moments for a wide range of standard classifiers. Our theoretical findings are validated by generating images with the BigGAN model and across different popular deep representation networks.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=rkgPnhNFPB
PDF https://openreview.net/pdf?id=rkgPnhNFPB
PWC https://paperswithcode.com/paper/random-matrix-theory-proves-that-deep
Repo
Framework

PCMC-Net: Feature-based Pairwise Choice Markov Chains

Title PCMC-Net: Feature-based Pairwise Choice Markov Chains
Authors Anonymous
Abstract Pairwise Choice Markov Chains (PCMC) have been recently introduced to overcome limitations of choice models based on traditional axioms unable to express empirical observations from modern behavior economics like framing effects and asymmetric dominance. The inference approach that estimates the transition rates between each possible pair of alternatives via maximum likelihood suffers when the examples of each alternative are scarce and is inappropriate when new alternatives can be observed at test time. In this work, we propose an amortized inference approach for PCMC by embedding its definition into a neural network that represents transition rates as a function of the alternatives’ and individual’s features. We apply our construction to the complex case of airline itinerary booking where singletons are common (due to varying prices and individual-specific itineraries), and asymmetric dominance and behaviors strongly dependent on market segments are observed. Experiments show our network significantly outperforming, in terms of prediction accuracy and logarithmic loss, feature engineered standard and latent class Multinomial Logit models as well as recent machine learning approaches.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=BJgWE1SFwS
PDF https://openreview.net/pdf?id=BJgWE1SFwS
PWC https://paperswithcode.com/paper/pcmc-net-feature-based-pairwise-choice-markov
Repo
Framework
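
In a PCMC model, the choice probabilities over a set of alternatives are the stationary distribution of a continuous-time Markov chain whose transition rates are defined pairwise. The sketch below computes that distribution for fixed rates. The function name, the power-iteration approach, and the toy rates are assumptions; in PCMC-Net the rates would instead be produced by a neural network from features, which is not shown here.

```python
def stationary_distribution(rates, iters=10_000, tol=1e-12):
    """Stationary distribution of a continuous-time Markov chain.

    rates[i][j] is the transition rate from alternative i to alternative j
    (diagonal entries are ignored). Assumes the chain is irreducible and that
    at least one rate is positive.
    """
    n = len(rates)
    out_rate = [sum(rates[i][j] for j in range(n) if j != i) for i in range(n)]
    # Uniformisation: P = I + Q / scale is a stochastic matrix with the same
    # stationary distribution; the 1.1 factor keeps some self-loop mass.
    scale = 1.1 * max(out_rate)
    pi = [1.0 / n] * n
    for _ in range(iters):
        new = [0.0] * n
        for i in range(n):
            new[i] += pi[i] * (1.0 - out_rate[i] / scale)
            for j in range(n):
                if j != i:
                    new[j] += pi[i] * rates[i][j] / scale
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi

# Two alternatives: the chain leaves alternative 0 at rate 1 and alternative 1 at rate 3,
# so it spends three times as long in alternative 0.
choice_probs = stationary_distribution([[0.0, 1.0], [3.0, 0.0]])
```

For this two-alternative example the balance condition gives probabilities of 0.75 and 0.25, which the iteration reproduces up to the tolerance.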

Functional vs. parametric equivalence of ReLU networks

Title Functional vs. parametric equivalence of ReLU networks
Authors Anonymous
Abstract We address the following question: How redundant is the parameterisation of ReLU networks? Specifically, we consider transformations of the weight space which leave the function implemented by the network intact. Two such transformations are known for feed-forward architectures: permutation of neurons within a layer, and positive scaling of all incoming weights of a neuron coupled with inverse scaling of its outgoing weights. In this work, we show for architectures with non-increasing widths that permutation and scaling are in fact the only function-preserving weight transformations. For any eligible architecture we give an explicit construction of a neural network such that any other network that implements the same function can be obtained from the original one by the application of permutations and rescaling. The proof relies on a geometric understanding of boundaries between linear regions of ReLU networks, and we hope the developed mathematical tools are of independent interest.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=Bylx-TNKvH
PDF https://openreview.net/pdf?id=Bylx-TNKvH
PWC https://paperswithcode.com/paper/functional-vs-parametric-equivalence-of-relu
Repo
Framework

Simple and Effective Regularization Methods for Training on Noisily Labeled Data with Generalization Guarantee

Title Simple and Effective Regularization Methods for Training on Noisily Labeled Data with Generalization Guarantee
Authors Anonymous
Abstract Over-parameterized deep neural networks trained by simple first-order methods are known to be able to fit any labeling of data. Such over-fitting ability hinders generalization when mislabeled training examples are present. On the other hand, simple regularization methods like early-stopping can often achieve highly nontrivial performance on clean test data in these scenarios, a phenomenon not theoretically understood. This paper proposes and analyzes two simple and intuitive regularization methods: (i) regularization by the distance between the network parameters to initialization, and (ii) adding a trainable auxiliary variable to the network output for each training example. Theoretically, we prove that gradient descent training with either of these two methods leads to a generalization guarantee on the clean data distribution despite being trained using noisy labels. Our generalization analysis relies on the connection between wide neural network and neural tangent kernel (NTK). The generalization bound is independent of the network size, and is comparable to the bound one can get when there is no label noise. Experimental results verify the effectiveness of these methods on noisily labeled datasets.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=Hke3gyHYwH
PDF https://openreview.net/pdf?id=Hke3gyHYwH
PWC https://paperswithcode.com/paper/simple-and-effective-regularization-methods
Repo
Framework
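
The first of the two regularizers, penalizing the distance between the parameters and their initialization, can be sketched on a one-parameter linear model. This is an illustration under assumptions, not the paper's setup: the model, the penalty scaling (lam / 2), the learning rate, and the toy data are all hypothetical, and the auxiliary-variable method is not shown.

```python
def train_with_init_penalty(xs, ys, w0, lam, lr=0.05, steps=2000):
    """Gradient descent on mean((w*x - y)^2) / 2 + (lam / 2) * (w - w0)^2.

    w0 is the initial weight; lam controls how strongly w is pulled back to it.
    """
    n = len(xs)
    w = w0
    for _ in range(steps):
        grad_fit = sum((w * x - y) * x for x, y in zip(xs, ys)) / n
        grad_penalty = lam * (w - w0)
        w -= lr * (grad_fit + grad_penalty)
    return w

xs = [1.0, 2.0, 3.0]
noisy = [2.0, 4.0, -6.0]  # hypothetical labels; the last one is corrupted

w_free = train_with_init_penalty(xs, noisy, w0=0.0, lam=0.0)
w_reg = train_with_init_penalty(xs, noisy, w0=0.0, lam=10.0)
```

With the penalty switched on, the learned weight stays closer to its initial value, so the corrupted label has less influence on the final parameters.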

PGCN-TCA: Pseudo Graph Convolutional Network With Temporal and Channel-Wise Attention for Skeleton-Based Action Recognition

Title PGCN-TCA: Pseudo Graph Convolutional Network With Temporal and Channel-Wise Attention for Skeleton-Based Action Recognition
Authors Hongye Yang, Yuzhang Gu, Jianchao Zhu, Keli Hu, Xiaolin Zhang
Abstract Skeleton-based human action recognition has become an active research area in recent years. The key to this task is to fully explore both spatial and temporal features. Recently, GCN-based methods, which model human body skeletons as spatial-temporal graphs, have achieved remarkable performance. However, most GCN-based methods use a fixed adjacency matrix defined by the dataset, which can only capture the structural information provided by joints directly connected through bones and ignores the dependencies between distant joints that are not connected. In addition, using such a fixed adjacency matrix in all layers prevents the network from extracting multi-level semantic features. In this paper we propose a pseudo graph convolutional network with temporal and channel-wise attention (PGCN-TCA) to solve this problem. The fixed normalized adjacency matrix is replaced with a learnable matrix. In this way, the matrix can learn the dependencies between connected joints and joints that are not physically connected. At the same time, learnable matrices in different layers can help the network capture multi-level features in the spatial domain. Moreover, since frames and input channels that contain outstanding characteristics play significant roles in distinguishing the action from others, we propose a mixed temporal and channel-wise attention. Our method achieves comparable performance to state-of-the-art methods on the NTU-RGB+D and HDM05 datasets.
Tasks Skeleton Based Action Recognition, Temporal Action Localization
Published 2020-01-06
URL https://doi.org/10.1109/ACCESS.2020.2964115
PDF https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8950167
PWC https://paperswithcode.com/paper/pgcn-tca-pseudo-graph-convolutional-network
Repo
Framework

Prox-SGD: Training Structured Neural Networks under Regularization and Constraints

Title Prox-SGD: Training Structured Neural Networks under Regularization and Constraints
Authors Anonymous
Abstract In this paper, we consider the problem of training neural networks (NN). To promote a NN with specific structures, we explicitly take into consideration the nonsmooth regularization (such as L1-norm) and constraints (such as interval constraint). This is formulated as a constrained nonsmooth nonconvex optimization problem, and we propose a convergent proximal-type stochastic gradient descent (Prox-SGD) algorithm. We show that under properly selected learning rates, momentum eventually resembles the unknown real gradient and thus is crucial in analyzing the convergence. We establish that with probability 1, every limit point of the sequence generated by the proposed Prox-SGD is a stationary point. Then the Prox-SGD is tailored to train a sparse neural network and a binary neural network, and the theoretical analysis is also supported by extensive numerical tests.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=HygpthEtvr
PDF https://openreview.net/pdf?id=HygpthEtvr
PWC https://paperswithcode.com/paper/prox-sgd-training-structured-neural-networks
Repo
Framework
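
A proximal-type update handles an L1 penalty and an interval constraint by following each gradient step with a cheap closed-form correction. The sketch below shows a plain proximal stochastic gradient step; it is not the paper's Prox-SGD update, which also involves momentum, and the function names and numbers are assumptions for illustration.

```python
def soft_threshold(value, threshold):
    """Proximal operator of threshold * |x|: shrink towards zero, snapping small values to 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def prox_step(weights, grads, lr, l1, lo, hi):
    """One proximal gradient step for an L1 penalty plus the constraint lo <= w <= hi.

    Per coordinate the problem is one-dimensional and convex, so the constrained
    minimiser is the soft-thresholded value clipped to [lo, hi].
    """
    updated = []
    for w, g in zip(weights, grads):
        shrunk = soft_threshold(w - lr * g, lr * l1)
        updated.append(min(max(shrunk, lo), hi))
    return updated

# Hypothetical weights and stochastic gradients.
updated = prox_step([0.5, -0.05, 2.0], [0.0, 0.0, -10.0],
                    lr=0.1, l1=1.0, lo=-1.0, hi=1.0)
```

In this example the first weight is shrunk, the second is set exactly to zero (which is how the L1 term produces sparsity), and the third is clipped to the upper end of the interval.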

LabelFool: A Trick in the Label Space

Title LabelFool: A Trick in the Label Space
Authors Anonymous
Abstract It is widely known that well-designed perturbations can cause state-of-the-art machine learning classifiers to mislabel an image, with sufficiently small perturbations that are imperceptible to the human eye. However, by detecting the inconsistency between the image and the wrong label, a human observer would be alerted to the attack. In this paper, we aim to design attacks that not only make classifiers generate wrong labels, but also make the wrong labels imperceptible to human observers. To achieve this, we propose an algorithm called LabelFool which identifies a target label similar to the ground truth label and finds a perturbation of the image for this target label. We first find the target label for an input image by a probability model, then move the input in the feature space towards the target label. Subjective studies on ImageNet show that in the label space, our attack is much less recognizable by human observers, while objective experimental results on ImageNet show that we maintain similar performance in the image space as well as attack rates to state-of-the-art attack algorithms.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=r1glDpNYwS
PDF https://openreview.net/pdf?id=r1glDpNYwS
PWC https://paperswithcode.com/paper/labelfool-a-trick-in-the-label-space
Repo
Framework

Context-Aware Cross-Attention for Skeleton-Based Human Action Recognition

Title Context-Aware Cross-Attention for Skeleton-Based Human Action Recognition
Authors Yanbo Fan, Shuchen Weng, Yong Zhang, Boxin Shi, Yi Zhang
Abstract Skeleton-based human action recognition is becoming popular due to its computational efficiency and robustness. Since not all skeleton joints are informative for action recognition, attention mechanisms are adopted to extract informative joints and suppress the influence of irrelevant ones. However, existing attention frameworks usually ignore helpful scenario context information. In this paper, we propose a cross-attention module that consists of a self-attention branch and a cross-attention branch for skeleton-based action recognition. It helps to extract joints that are not only more informative but also highly correlated to the corresponding scenario context information. Moreover, the cross-attention module maintains input variables’ size and can be flexibly incorporated into many existing frameworks without breaking their behaviors. To facilitate end-to-end training, we further develop a scenario context information extraction branch to extract context information from raw RGB video directly. We conduct comprehensive experiments on the NTU RGB+D and the Kinetics databases, and experimental results demonstrate the correctness and effectiveness of the proposed model.
Tasks Skeleton Based Action Recognition, Temporal Action Localization
Published 2020-01-20
URL https://doi.org/10.1109/ACCESS.2020.2968054
PDF https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8963933
PWC https://paperswithcode.com/paper/context-aware-cross-attention-for-skeleton
Repo
Framework

Spatially Parallel Attention and Component Extraction for Scene Decomposition

Title Spatially Parallel Attention and Component Extraction for Scene Decomposition
Authors Anonymous
Abstract We propose a generative latent variable model for unsupervised scene decomposition. Our model, SPACE, provides a unified probabilistic modeling framework to combine the best of previous models. SPACE can explicitly provide factorized object representation per foreground object while also decomposing background segments of complex morphology. Previous models are good at either of these, but not both. With the proposed parallel-spatial attention, SPACE also resolves the scalability problem of previous methods and thus makes the model applicable to scenes with a much larger number of objects without performance degradation. Besides, the foreground/background distinction of SPACE is more effective and intuitive than other methods because unlike other methods SPACE can detect static objects that look like background. In experiments on Atari and 3D-Rooms, we show that SPACE achieves the above properties consistently in all experiments in comparison to SPAIR, IODINE, and GENESIS.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=rkl03ySYDH
PDF https://openreview.net/pdf?id=rkl03ySYDH
PWC https://paperswithcode.com/paper/spatially-parallel-attention-and-component
Repo
Framework

Collaborative Training of Balanced Random Forests for Open Set Domain Adaptation

Title Collaborative Training of Balanced Random Forests for Open Set Domain Adaptation
Authors Anonymous
Abstract In this paper, we introduce a collaborative training algorithm of balanced random forests for domain adaptation tasks which can avoid the overfitting problem. In real scenarios, most domain adaptation algorithms face challenges from noisy, insufficient training data. Moreover, in open set categorization, unknown or misaligned source and target categories add difficulty. In such cases, conventional methods suffer from overfitting and fail to successfully transfer the knowledge of the source to the target domain. To address these issues, the following two techniques are proposed. First, we introduce an optimized decision tree construction method, in which the data at each node are split into equal sizes while maximizing the information gain. Compared to conventional random forests, it generates larger and more balanced decision trees due to the even-split constraint, which contributes to enhanced discrimination power and reduced overfitting. Second, to tackle the domain misalignment problem, we propose a domain alignment loss which penalizes uneven splits of the source and target domain data. By collaboratively optimizing the information gain of the labeled source data as well as the entropy of unlabeled target data distributions, the proposed CoBRF algorithm achieves significantly better performance than the state-of-the-art methods. The proposed algorithm is extensively evaluated in various experimental setups in challenging domain adaptation tasks with noisy and small training data as well as open set domain adaptation problems, with two backbone networks, AlexNet and ResNet-50.
Tasks Domain Adaptation
Published 2020-01-01
URL https://openreview.net/forum?id=SkeJPertPS
PDF https://openreview.net/pdf?id=SkeJPertPS
PWC https://paperswithcode.com/paper/collaborative-training-of-balanced-random
Repo
Framework
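
The even-split idea, dividing the data at each node into two equally sized halves while maximizing information gain, can be sketched for axis-aligned splits. This is a simplified illustration and not the CoBRF construction: splitting each feature at its median, the function names, and the toy data are assumptions, and the domain alignment loss is not modelled.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_balanced_split(rows, labels):
    """Pick the feature whose median split gives the highest information gain.

    Every candidate split sends half of the samples left and the rest right,
    so the resulting tree stays balanced by construction.
    Returns (feature_index, information_gain).
    """
    n = len(rows)
    half = n // 2
    parent = entropy(labels)
    best = None
    for feature in range(len(rows[0])):
        order = sorted(range(n), key=lambda i: rows[i][feature])
        left = [labels[i] for i in order[:half]]
        right = [labels[i] for i in order[half:]]
        gain = (parent
                - (len(left) / n) * entropy(left)
                - (len(right) / n) * entropy(right))
        if best is None or gain > best[1]:
            best = (feature, gain)
    return best

# Hypothetical data: feature 0 separates the classes, feature 1 does not.
rows = [[0.1, 5.0], [0.2, 1.0], [0.8, 4.0], [0.9, 2.0]]
labels = [0, 0, 1, 1]
feature, gain = best_balanced_split(rows, labels)
```

Here the median split on feature 0 yields two pure halves and the full one bit of information gain, while the median split on feature 1 yields none.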

RTFM: Generalising to New Environment Dynamics via Reading

Title RTFM: Generalising to New Environment Dynamics via Reading
Authors Anonymous
Abstract Obtaining policies that can generalise to new environments in reinforcement learning is challenging. In this work, we demonstrate that language understanding via a reading policy learner is a promising vehicle for generalisation to new environments. We propose a grounded policy learning problem, Read to Fight Monsters (RTFM), in which the agent must jointly reason over a language goal, relevant dynamics described in a document, and environment observations. We procedurally generate environment dynamics and corresponding language descriptions of the dynamics, such that agents must read to understand new environment dynamics instead of memorising any particular information. In addition, we propose txt2π, a model that captures three-way interactions between the goal, document, and observations. On RTFM, txt2π generalises to new environments with dynamics not seen during training via reading. Furthermore, our model outperforms baselines such as FiLM and language-conditioned CNNs on RTFM. Through curriculum learning, txt2π produces policies that excel on complex RTFM tasks requiring several reasoning and coreference steps.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=SJgob6NKvH
PDF https://openreview.net/pdf?id=SJgob6NKvH
PWC https://paperswithcode.com/paper/rtfm-generalising-to-new-environment-dynamics
Repo
Framework