April 1, 2020

Paper Group NANR 46


A Simple Technique to Enable Saliency Methods to Pass the Sanity Checks

Title A Simple Technique to Enable Saliency Methods to Pass the Sanity Checks
Authors Anonymous
Abstract Saliency methods attempt to explain a deep net’s decision by assigning a score to each feature/pixel in the input, often doing this credit assignment via the gradient of the output with respect to the input. Recently, Adebayo et al. questioned the validity of many of these methods, since they do not pass simple sanity checks that test whether the scores shift or vanish when layers of the trained net are randomized, or when the net is retrained using random labels for the inputs. We propose a simple fix to existing saliency methods that helps them pass sanity checks, which we call competition for pixels. This involves computing saliency maps for all possible labels in the classification task, and using a simple competition among them to identify and remove less relevant pixels from the map. Some theoretical justification is provided for the fix, and its performance is demonstrated empirically on several popular methods.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=BJeGZxrFvS
PDF https://openreview.net/pdf?id=BJeGZxrFvS
PWC https://paperswithcode.com/paper/a-simple-technique-to-enable-saliency-methods
Repo
Framework
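
The competition idea is concrete enough to sketch. Below is a minimal, hedged reading of it on top of gradient-times-input saliency: compute one map per label, then keep a pixel's score for the label of interest only where that label wins the per-pixel competition. The toy model, shapes, and the signed argmax are illustrative assumptions, not the authors' exact procedure.

```python
import torch
import torch.nn as nn

def saliency_maps_all_labels(model, x, num_classes):
    """Gradient-times-input saliency map for every class logit."""
    maps = []
    for c in range(num_classes):
        x_in = x.clone().requires_grad_(True)
        logit = model(x_in.unsqueeze(0))[0, c]
        grad, = torch.autograd.grad(logit, x_in)
        maps.append(grad * x_in)                 # one map per label
    return torch.stack(maps).detach()            # (C, *x.shape)

def competitive_saliency(model, x, target, num_classes):
    """Keep target-class scores only at pixels the target class wins."""
    maps = saliency_maps_all_labels(model, x, num_classes)
    winners = maps.argmax(dim=0)                 # per-pixel winning label
    out = maps[target].clone()
    out[winners != target] = 0.0                 # competition zeroes the rest
    return out

# Toy usage: random "image", linear classifier standing in for a deep net.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 10))
x = torch.randn(3, 8, 8)
sal = competitive_saliency(model, x, target=3, num_classes=10)
```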

Unsupervised Learning of Graph Hierarchical Abstractions with Differentiable Coarsening and Optimal Transport

Title Unsupervised Learning of Graph Hierarchical Abstractions with Differentiable Coarsening and Optimal Transport
Authors Anonymous
Abstract Hierarchical abstractions are a methodology for solving large-scale graph problems in various disciplines. Coarsening is one such approach: it generates a pyramid of graphs in which each level is a structural summary of the one before it. With a long history in scientific computing, many coarsening strategies have been developed based on mathematically driven heuristics. Recently, interest has resurged in deep learning in designing hierarchical methods that are learnable through differentiable parameterization. These approaches are paired with downstream tasks for supervised learning. In this work, we propose an unsupervised approach, coined OTCoarsening, based on optimal transport. Both the coarsening matrix and the transport cost matrix are parameterized, so that an optimal coarsening strategy can be learned and tailored to a given set of graphs. We demonstrate that the proposed approach produces meaningful coarse graphs and yields competitive performance compared with supervised methods for graph classification.
Tasks Graph Classification
Published 2020-01-01
URL https://openreview.net/forum?id=Bkf4XgrKvS
PDF https://openreview.net/pdf?id=Bkf4XgrKvS
PWC https://paperswithcode.com/paper/unsupervised-learning-of-graph-hierarchical
Repo
Framework
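
A loose sketch of the two ingredients, assuming a soft assignment matrix for the coarsening and plain pairwise feature distances as the transport cost (the paper parameterizes the cost matrix as well; everything named here is illustrative):

```python
import torch

def coarsen(A, X, S_logits):
    """Soft-assign n nodes to k clusters; return coarse graph and features."""
    S = torch.softmax(S_logits, dim=1)   # (n, k) learnable assignment
    A_c = S.t() @ A @ S                  # coarse adjacency (k, k)
    X_c = S.t() @ X                      # coarse node features (k, d)
    return A_c, X_c, S

def sinkhorn_cost(C, a, b, eps=0.1, iters=50):
    """Entropy-regularized OT cost between histograms a and b (Sinkhorn)."""
    K = torch.exp(-C / eps)
    u = torch.ones_like(a)
    for _ in range(iters):
        v = b / (K.t() @ u + 1e-9)
        u = a / (K @ v + 1e-9)
    P = u[:, None] * K * v[None, :]      # transport plan
    return (P * C).sum()

# Toy usage: 6 nodes coarsened to 2, trained so the coarse features stay
# close (in OT cost) to the originals. The cost here is raw feature
# distance -- an assumption, not the paper's parameterized cost.
n, k, d = 6, 2, 4
A = (torch.rand(n, n) > 0.5).float()
A = ((A + A.t()) > 0).float()                   # symmetrize
X = torch.randn(n, d)
S_logits = torch.randn(n, k, requires_grad=True)
opt = torch.optim.Adam([S_logits], lr=0.05)
for _ in range(100):
    A_c, X_c, S = coarsen(A, X, S_logits)
    C = torch.cdist(X, X_c)                     # (n, k) transport cost
    a = torch.full((n,), 1.0 / n)
    b = torch.full((k,), 1.0 / k)
    loss = sinkhorn_cost(C, a, b)
    opt.zero_grad(); loss.backward(); opt.step()
```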

How can we generalise learning distributed representations of graphs?

Title How can we generalise learning distributed representations of graphs?
Authors Anonymous
Abstract We propose a general framework for constructing unsupervised models capable of learning distributed representations of discrete structures such as graphs, drawing on R-convolution kernels and research in distributed semantics. Our framework combines the insights of Deep Graph Kernels and Graph2Vec into a unified methodology for similarity learning on graphs of arbitrary size. This is exemplified by our own instance, G2DR, which extends Graph2Vec from labelled to unlabelled graphs and tackles diagonal dominance by pruning the subgraph vocabulary from which graphs are composed. These changes produce new state-of-the-art results when G2DR embeddings are used for downstream graph classification with an off-the-shelf support vector machine, from binary classification on datasets of small labelled graphs to multi-class classification on large unlabelled graphs.
Tasks Graph Classification
Published 2020-01-01
URL https://openreview.net/forum?id=r1xI-gHFDH
PDF https://openreview.net/pdf?id=r1xI-gHFDH
PWC https://paperswithcode.com/paper/how-can-we-generalise-learning-distributed
Repo
Framework
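
To make the pipeline concrete, here is a rough sketch of its front end: Weisfeiler-Lehman relabeling turns each graph into a bag of rooted-subgraph "words", and the vocabulary is pruned before the graph documents are fed to a doc2vec-style embedding model. The hashing scheme, the degree-based initial labels for unlabelled graphs, and the pruning rule (dropping words too rare to be shared across graphs, which otherwise inflate self-similarity) are assumptions for illustration.

```python
from collections import Counter
import hashlib

def wl_words(adj, labels, iterations=2):
    """Bag of WL subtree identifiers for one graph (adjacency dict)."""
    words = list(labels.values())
    cur = dict(labels)
    for _ in range(iterations):
        nxt = {}
        for v in adj:
            sig = cur[v] + "|" + ",".join(sorted(cur[u] for u in adj[v]))
            nxt[v] = hashlib.md5(sig.encode()).hexdigest()[:8]
        cur = nxt
        words.extend(cur.values())
    return words

def prune_vocabulary(docs, min_df=2):
    """Drop words that appear in fewer than min_df graphs."""
    df = Counter(w for doc in docs for w in set(doc))
    keep = {w for w, c in df.items() if c >= min_df}
    return [[w for w in doc if w in keep] for doc in docs]

# Two toy graphs (triangle, path) with degree-based initial labels
# standing in for unlabelled graphs, as G2DR does.
tri = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
path = {0: [1], 1: [0, 2], 2: [1]}
docs = [wl_words(g, {v: str(len(g[v])) for v in g}) for g in (tri, path)]
docs = prune_vocabulary(docs)
# `docs` would now be fed to a doc2vec-style model: one "document" per
# graph, WL words as its tokens, yielding one embedding per graph.
```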

Are Powerful Graph Neural Nets Necessary? A Dissection on Graph Classification

Title Are Powerful Graph Neural Nets Necessary? A Dissection on Graph Classification
Authors Anonymous
Abstract Graph Neural Nets (GNNs) have received increasing attention, partially due to their superior performance in many node and graph classification tasks. However, there is a lack of understanding of what they are learning and how sophisticated the learned graph functions are. In this work, we dissect GNNs for graph classification into two parts: 1) the graph filtering part, where graph-based neighbor aggregations are performed, and 2) the set function part, where a set of hidden node features is composed for prediction. To study the importance of both parts, we propose to linearize them separately. Linearizing the graph filtering function first yields the Graph Feature Network (GFN), a simple lightweight neural net defined on a set of graph-augmented features. Further linearizing GFN’s set function yields the Graph Linear Network (GLN), which is a linear function. Empirically, we evaluate on common graph classification benchmarks. To our surprise, we find that, despite the simplification, GFN matches or exceeds the best accuracies produced by recently proposed GNNs (at a fraction of the computation cost), while GLN underperforms significantly. Our results demonstrate the importance of the non-linear set function, and suggest that linear graph filtering with a non-linear set function is an efficient and powerful scheme for modeling existing graph classification benchmarks.
Tasks Graph Classification
Published 2020-01-01
URL https://openreview.net/forum?id=BJxQxeBYwH
PDF https://openreview.net/pdf?id=BJxQxeBYwH
PWC https://paperswithcode.com/paper/are-powerful-graph-neural-nets-necessary-a
Repo
Framework
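
Since the dissection is architectural, a compact sketch helps: linear graph filtering amounts to stacking propagated features [X, AX, A²X], and the non-linear set function is an MLP applied per node followed by sum pooling. Hidden sizes and the number of propagation steps below are arbitrary; a normalized adjacency would typically replace the raw one.

```python
import torch
import torch.nn as nn

class GFN(nn.Module):
    def __init__(self, in_dim, hidden, classes, hops=2):
        super().__init__()
        self.hops = hops
        self.phi = nn.Sequential(               # per-node feature MLP
            nn.Linear(in_dim * (hops + 1), hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU())
        self.rho = nn.Linear(hidden, classes)   # readout after pooling

    def forward(self, A, X):
        feats = [X]
        for _ in range(self.hops):              # linear filtering: no
            feats.append(A @ feats[-1])         # weights, no non-linearity
        H = self.phi(torch.cat(feats, dim=1))   # set function on node set
        return self.rho(H.sum(dim=0))           # sum-pool, then classify

# Toy usage: one random graph with 5 nodes, 3 features, 2 classes.
A = (torch.rand(5, 5) > 0.5).float()
A = ((A + A.t()) > 0).float()                   # symmetrize
X = torch.randn(5, 3)
logits = GFN(in_dim=3, hidden=16, classes=2)(A, X)
```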

Attacking Graph Convolutional Networks via Rewiring

Title Attacking Graph Convolutional Networks via Rewiring
Authors Anonymous
Abstract Graph Neural Networks (GNNs) have boosted performance on many graph-related tasks such as node classification and graph classification. Recent research shows that graph neural networks are vulnerable to adversarial attacks, which deliberately add carefully crafted, unnoticeable perturbations to the graph structure. The perturbation is usually created by adding or deleting a few edges, which can be noticeable even when the number of modified edges is small. In this paper, we propose a graph rewiring operation that affects the graph in a less noticeable way than adding or deleting edges. We then use reinforcement learning to learn an attack strategy based on the proposed rewiring operation. Experiments on real-world graphs demonstrate the effectiveness of the proposed framework. To understand the framework, we further analyze how the perturbations it generates to the graph structure affect the output of the target model.
Tasks Graph Classification, Node Classification
Published 2020-01-01
URL https://openreview.net/forum?id=B1eXygBFPH
PDF https://openreview.net/pdf?id=B1eXygBFPH
PWC https://paperswithcode.com/paper/attacking-graph-convolutional-networks-via-1
Repo
Framework
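
The rewiring primitive itself is simple to write down. The sketch below removes an edge (u, v) and adds (u, w), preserving the degree of u and the total edge count; restricting w to the two-hop neighborhood of u via v is an assumption about what keeps the change unnoticeable, and the reinforcement-learned selection of which rewiring to apply is omitted.

```python
import numpy as np

def rewire(A, u, v, w):
    """Remove undirected edge (u, v), add (u, w). Returns a new matrix."""
    assert A[u, v] == 1 and A[u, w] == 0 and w != u
    B = A.copy()
    B[u, v] = B[v, u] = 0
    B[u, w] = B[w, u] = 1
    return B

def candidate_rewirings(A, u):
    """All (u, v, w) with v a neighbor of u and w a non-neighbor via v."""
    neigh = set(np.flatnonzero(A[u]))
    moves = []
    for v in neigh:
        for w in np.flatnonzero(A[v]):          # 2-hop candidates via v
            if w != u and w not in neigh:
                moves.append((u, v, int(w)))
    return moves

# Toy usage on a 4-node graph: enumerate legal rewirings around node 0.
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]])
print(candidate_rewirings(A, 0))   # e.g. (0, 1, 3): drop (0,1), add (0,3)
```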

Feature-based Augmentation for Semi-Supervised Learning

Title Feature-based Augmentation for Semi-Supervised Learning
Authors Anonymous
Abstract In this paper, we propose feature-based augmentation, a simple and efficient method for semi-supervised learning, where only a small part of the data is labeled. In semi-supervised learning, input image augmentation is a standard technique for ensuring generalization to unlabeled data. However, unlike general input augmentation (translation, flip, Gaussian noise, etc.), our method adds noise to the features that contribute most to the prediction, generating augmented features. We call this method “feature-based augmentation” because the noise is determined by the network weights themselves and augmentation is carried out at the feature level. The prediction from the augmented features is used as the target for unlabeled data. The target is stable because the noise is derived from the extracted features. Feature-based augmentation is applied to semi-supervised learning on the SVHN and CIFAR-10 datasets, where it achieves a state-of-the-art error rate. In particular, the performance gap over other methods grows as the number of labeled examples shrinks.
Tasks Image Augmentation
Published 2020-01-01
URL https://openreview.net/forum?id=BkgE2yHYDr
PDF https://openreview.net/pdf?id=BkgE2yHYDr
PWC https://paperswithcode.com/paper/feature-based-augmentation-for-semi
Repo
Framework
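
A hedged sketch of the mechanism: pick the hidden features that contribute most to the current prediction (gradient magnitude is one plausible proxy for "determined by the network weight itself"), perturb only those, and take the resulting prediction as the consistency target. The tiny encoder, the top-k rule, and the noise scale are all illustrative choices.

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32 * 3, 64), nn.ReLU())
head = nn.Linear(64, 10)

def augmented_target(x, top_k=8, scale=0.1):
    feats = encoder(x)                          # (B, 64) hidden features
    feats_d = feats.detach().requires_grad_(True)
    score = head(feats_d).max(dim=1).values.sum()
    score.backward()                            # contribution per feature
    idx = feats_d.grad.abs().topk(top_k, dim=1).indices
    noise = torch.zeros_like(feats)             # noise only on top features
    noise.scatter_(1, idx, scale * torch.randn(feats.size(0), top_k))
    with torch.no_grad():                       # target from noisy features
        return torch.softmax(head(feats + noise), dim=1)

x_unlabeled = torch.randn(4, 3, 32, 32)
target = augmented_target(x_unlabeled)          # train the model toward this
```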

Knowledge Graph Embedding: A Probabilistic Perspective and Generalization Bounds

Title Knowledge Graph Embedding: A Probabilistic Perspective and Generalization Bounds
Authors Anonymous
Abstract We study theoretical properties of embedding methods for knowledge graph completion under the missing completely at random assumption. We prove generalization error bounds for this setting. Even though the missing completely at random setting may seem naive, it is actually how knowledge graph embedding methods are typically benchmarked in the literature. Our results provide, to a certain extent, an explanation for why knowledge graph embedding methods work (as much as classical learning theory results provide explanations for classical learning from i.i.d. data).
Tasks Graph Embedding, Knowledge Graph Completion, Knowledge Graph Embedding
Published 2020-01-01
URL https://openreview.net/forum?id=SJg2j0VFPB
PDF https://openreview.net/pdf?id=SJg2j0VFPB
PWC https://paperswithcode.com/paper/knowledge-graph-embedding-a-probabilistic
Repo
Framework

Generative Hierarchical Models for Parts, Objects, and Scenes

Title Generative Hierarchical Models for Parts, Objects, and Scenes
Authors Anonymous
Abstract Hierarchical structures, such as part-whole relationships in objects and scenes, are among the most inherent structures in natural scenes. Learning such representations via unsupervised learning can provide benefits such as interpretability, compositionality, and transferability, which are important in many downstream tasks. In this paper, we propose the first hierarchical generative model for learning multiple latent part-whole relationships in a scene. During inference, our model takes a top-down approach: it infers the representation of the more abstract concept (e.g., an object) and then infers those of the more specific concepts (e.g., its parts) by conditioning on the corresponding abstract concept. This lets the model avoid the difficult problem of routing between parts and wholes. In experiments on images containing multiple objects with different shapes and part compositions, we demonstrate that our model can learn the latent hierarchical structure between parts and wholes and generate imaginary scenes.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=SkeATxrKwH
PDF https://openreview.net/pdf?id=SkeATxrKwH
PWC https://paperswithcode.com/paper/generative-hierarchical-models-for-parts
Repo
Framework
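
The top-down inference order is the part worth illustrating: the abstract latent is sampled first and each part latent is conditioned on it, so no bottom-up routing between parts and wholes is needed. The sketch below is a bare-bones conditional prior with arbitrary dimensions, not the paper's architecture.

```python
import torch
import torch.nn as nn

obj_dim, part_dim, n_parts = 16, 8, 3
part_prior = nn.Linear(obj_dim, 2 * part_dim)   # p(z_part | z_obj)

def sample_scene():
    z_obj = torch.randn(obj_dim)                # abstract concept first
    stats = part_prior(z_obj)
    mu, logvar = stats[:part_dim], stats[part_dim:]
    z_parts = [mu + (0.5 * logvar).exp() * torch.randn(part_dim)
               for _ in range(n_parts)]         # then its specific parts
    return z_obj, z_parts                       # decoders would render these
```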

Analyzing Privacy Loss in Updates of Natural Language Models

Title Analyzing Privacy Loss in Updates of Natural Language Models
Authors Anonymous
Abstract To continuously improve quality and reflect changes in data, machine learning-based services have to regularly re-train and update their core models. In the setting of language models, we show that a comparative analysis of model snapshots before and after an update can reveal a surprising amount of detailed information about the changes in the data used for training before and after the update. We discuss the privacy implications of our findings, propose mitigation strategies and evaluate their effect.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=B1xoserKPH
PDF https://openreview.net/pdf?id=B1xoserKPH
PWC https://paperswithcode.com/paper/analyzing-privacy-loss-in-updates-of-natural
Repo
Framework
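
The comparative analysis lends itself to a toy illustration: score candidate phrases by how much more likely the updated snapshot finds them than the old one; large jumps point at content that entered the training data between snapshots. Bigram lookup tables stand in for real language model snapshots here, and the scoring rule is an assumption in the spirit of the paper, not its exact method.

```python
import math

def log_prob(model, tokens):
    """Log-probability of a token sequence under a bigram table (dict)."""
    return sum(math.log(model.get((a, b), 1e-6))
               for a, b in zip(tokens, tokens[1:]))

def snapshot_diff(old, new, phrases):
    """Rank phrases by log p_new - log p_old (higher = more revealing)."""
    scores = {p: log_prob(new, p.split()) - log_prob(old, p.split())
              for p in phrases}
    return sorted(scores.items(), key=lambda kv: -kv[1])

old = {("my", "number"): 0.1, ("number", "is"): 0.2}
new = {("my", "number"): 0.1, ("number", "is"): 0.2, ("is", "12345"): 0.3}
# The phrase containing the newly learned continuation ranks first.
print(snapshot_diff(old, new, ["my number is 12345", "my number is"]))
```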

Wyner VAE: A Variational Autoencoder with Succinct Common Representation Learning

Title Wyner VAE: A Variational Autoencoder with Succinct Common Representation Learning
Authors Anonymous
Abstract A new variational autoencoder (VAE) model is proposed that learns a succinct common representation of two correlated data variables for conditional and joint generation tasks. The proposed Wyner VAE model is based on two information-theoretic problems—distributed simulation and channel synthesis—in which Wyner’s common information arises as the fundamental limit on the succinctness of the common representation. The Wyner VAE decomposes a pair of correlated data variables into their common representation (e.g., a shared concept) and local representations that capture the remaining randomness (e.g., texture and style) in the respective data variables, by imposing the mutual information between the data variables and the common representation as a regularization term. The utility of the proposed approach is demonstrated through experiments on joint and conditional generation with and without style control, using synthetic data and real images. Experimental results show that learning a succinct common representation achieves better generative performance, and that the proposed model outperforms existing VAE variants and the variational information bottleneck method.
Tasks Representation Learning
Published 2020-01-01
URL https://openreview.net/forum?id=r1g1CAEKDH
PDF https://openreview.net/pdf?id=r1g1CAEKDH
PWC https://paperswithcode.com/paper/wyner-vae-a-variational-autoencoder-with
Repo
Framework

Pre-trained Contextual Embedding of Source Code

Title Pre-trained Contextual Embedding of Source Code
Authors Anonymous
Abstract The source code of a program not only serves as a formal description of an executable task, but it also serves to communicate developer intent in a human-readable form. To facilitate this, developers use meaningful identifier names and natural-language documentation. This makes it possible to successfully apply sequence-modeling approaches, shown to be effective in natural-language processing, to source code. A major advancement in natural-language understanding has been the use of pre-trained token embeddings; BERT and other works have further shown that pre-trained contextual embeddings can be extremely powerful and can be finetuned effectively for a variety of downstream supervised tasks. Inspired by these developments, we present the first attempt to replicate this success on source code. We curate a massive corpus of Python programs from GitHub to pre-train a BERT model, which we call Code Understanding BERT (CuBERT). We also pre-train Word2Vec embeddings on the same dataset. We create a benchmark of five classification tasks and compare finetuned CuBERT against sequence models trained with and without the Word2Vec embeddings. Our results show that CuBERT outperforms the baseline methods by a margin of 2.9-22%. We also show its superiority when finetuned with smaller datasets, and over fewer epochs.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=rygoURNYvS
PDF https://openreview.net/pdf?id=rygoURNYvS
PWC https://paperswithcode.com/paper/pre-trained-contextual-embedding-of-source
Repo
Framework
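
The fine-tuning recipe is the standard BERT one, so a generic sketch conveys it. Note the checkpoint name below is a placeholder (a `transformers` hub identifier for CuBERT is not assumed to exist); any BERT-style checkpoint fits the same pattern, with source code treated as text.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tok = AutoTokenizer.from_pretrained("bert-base-uncased")  # placeholder
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)                    # e.g. buggy / ok

snippet = "def add(a, b):\n    return a - b"              # code as text
batch = tok(snippet, return_tensors="pt", truncation=True)
labels = torch.tensor([1])

out = model(**batch, labels=labels)                       # loss + logits
out.loss.backward()                                       # one finetune step
```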

Irrationality can help reward inference

Title Irrationality can help reward inference
Authors Anonymous
Abstract Specifying reward functions is difficult, which motivates the area of reward inference: learning rewards from human behavior. The starting assumption in the area is that human behavior is optimal given the desired reward function, but in reality people exhibit many forms of irrationality, from noise to myopia to risk aversion and beyond. This fact seems strictly harmful to reward inference: it is already hard to infer the reward from rational behavior, and noise and systematic biases make actions bear a less direct relationship to the reward. Our insight in this work is that, contrary to expectations, irrationality can actually help rather than hinder reward inference. For some types and amounts of irrationality, the expert produces more varied policies than rational behavior would, which helps disambiguate among reward parameters that would otherwise correspond to the same rational behavior. We put this to the test in a systematic analysis of the effect of irrationality on reward inference. We start by covering the space of irrationalities as deviations from the Bellman update, simulate expert behavior, and measure the accuracy of inference to contrast the different types and study the gains and losses. We provide a mutual-information-based analysis of our findings, and wrap up by discussing the need to accurately model irrationality, as well as to what extent we might expect (or be able to train) real people to exhibit helpful irrationalities when teaching rewards to learners.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=BJlo91BYPr
PDF https://openreview.net/pdf?id=BJlo91BYPr
PWC https://paperswithcode.com/paper/irrationality-can-help-reward-inference
Repo
Framework
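
A toy example makes the counterintuitive claim tangible: two reward hypotheses with the same optimal action are indistinguishable from a perfectly rational expert, but a Boltzmann-noisy expert (one simple deviation from the exact argmax/Bellman update) visits the suboptimal action at a rate that depends on the reward gap, which a posterior can pick up. The bandit, the candidate rewards, and the known temperature are all simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two reward hypotheses with the SAME optimal arm: a fully rational
# (argmax) expert behaves identically under both.
hypotheses = [np.array([1.0, 0.9, 0.0]), np.array([1.0, 0.1, 0.0])]

def boltzmann_policy(r, beta):
    """Softmax-noisy expert: beta controls how rational it is."""
    p = np.exp(beta * r)
    return p / p.sum()

def posterior(actions, beta):
    """Uniform-prior posterior over hypotheses given observed actions."""
    like = np.array([np.prod(boltzmann_policy(r, beta)[actions])
                     for r in hypotheses])
    return like / like.sum()

true_r = hypotheses[0]
for beta in (1.0, 5.0, 50.0):                # larger beta = more rational
    pi = boltzmann_policy(true_r, beta)
    actions = rng.choice(3, size=200, p=pi)  # simulated expert behavior
    print(beta, posterior(actions, beta))
# The frequency of the suboptimal arm is what separates the hypotheses;
# a perfectly rational expert would reveal nothing about the gap.
```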

Improving the robustness of ImageNet classifiers using elements of human visual cognition

Title Improving the robustness of ImageNet classifiers using elements of human visual cognition
Authors Anonymous
Abstract We investigate the robustness properties of image recognition models equipped with two features inspired by human vision, an explicit episodic memory and a shape bias, at the ImageNet scale. As reported in previous work, we show that an explicit episodic memory improves the robustness of image recognition models against small-norm adversarial perturbations under some threat models. It does not, however, improve robustness against more natural, and typically larger, perturbations. Learning more robust features during training appears to be necessary for robustness in this second sense. We show that features derived from a model that was encouraged to learn global, shape-based representations (Geirhos et al., 2019) not only improve robustness against natural perturbations, but, when used in conjunction with an episodic memory, also provide additional robustness against adversarial perturbations. Finally, we address three important design choices for the episodic memory: memory size, the dimensionality of the memories, and the retrieval method. We show that to make the episodic memory more compact, it is preferable to reduce the number of memories by clustering them, rather than to reduce their dimensionality.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=HylYtaVtwS
PDF https://openreview.net/pdf?id=HylYtaVtwS
PWC https://paperswithcode.com/paper/improving-the-robustness-of-imagenet-1
Repo
Framework
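
The memory-compaction finding is easy to sketch: classify by retrieving the nearest stored feature, and shrink the memory by clustering each class's features and keeping centroids, rather than by reducing dimensionality. Random vectors stand in for CNN features below, and 1-NN retrieval is a simplification of the cache-based retrieval studied in the paper.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
feats = rng.normal(size=(1000, 64))            # stand-in for CNN features
labels = rng.integers(0, 10, size=1000)

def compact_memory(feats, labels, per_class=8):
    """Replace each class's memories with its k-means centroids."""
    mem_f, mem_y = [], []
    for c in np.unique(labels):
        km = KMeans(n_clusters=per_class, n_init=10).fit(feats[labels == c])
        mem_f.append(km.cluster_centers_)
        mem_y.extend([c] * per_class)
    return np.vstack(mem_f), np.array(mem_y)

def predict(query, mem_f, mem_y):
    """Label of the nearest stored memory (1-NN retrieval)."""
    d = np.linalg.norm(mem_f - query, axis=1)
    return mem_y[d.argmin()]

mem_f, mem_y = compact_memory(feats, labels)   # 1000 -> 80 memories
print(predict(rng.normal(size=64), mem_f, mem_y))
```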

Contrastive Learning of Structured World Models

Title Contrastive Learning of Structured World Models
Authors Anonymous
Abstract A structured understanding of our world in terms of objects, relations, and hierarchies is an important component of human cognition. Learning such a structured world model from raw sensory data remains a challenge. As a step towards this goal, we introduce Contrastively-trained Structured World Models (C-SWMs). C-SWMs utilize a contrastive approach for representation learning in environments with compositional structure. We structure each state embedding as a set of object representations and their relations, modeled by a graph neural network. This allows objects to be discovered from raw pixel observations without direct supervision as part of the learning process. We evaluate C-SWMs on compositional environments involving multiple interacting objects that can be manipulated independently by an agent, simple Atari games, and a multi-object physics simulation. Our experiments demonstrate that C-SWMs can overcome limitations of models based on pixel reconstruction and outperform typical representatives of this model class in highly structured environments, while learning interpretable object-based representations.
Tasks Atari Games, Representation Learning
Published 2020-01-01
URL https://openreview.net/forum?id=H1gax6VtDB
PDF https://openreview.net/pdf?id=H1gax6VtDB
PWC https://paperswithcode.com/paper/contrastive-learning-of-structured-world
Repo
Framework
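
The contrastive training signal condenses well into code: an encoder and a transition model are trained so the predicted next-state embedding lands near the true one while a corrupted (negative) state stays at least a margin away. The sketch collapses the object-factored GNN into plain MLPs, so it shows the loss structure, not the full model.

```python
import torch
import torch.nn as nn

enc = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 4))
trans = nn.Sequential(nn.Linear(4 + 2, 32), nn.ReLU(), nn.Linear(32, 4))

def cswm_loss(s, a, s_next, s_neg, margin=1.0):
    z, z_next, z_neg = enc(s), enc(s_next), enc(s_neg)
    z_pred = z + trans(torch.cat([z, a], dim=1))   # predicted next state
    pos = ((z_pred - z_next) ** 2).sum(dim=1)      # pull prediction close
    neg = ((z_neg - z_next) ** 2).sum(dim=1)       # push negatives away
    return (pos + torch.relu(margin - neg)).mean()

# Toy usage on random transitions (state dim 10, action dim 2).
s, s_next, s_neg = (torch.randn(8, 10) for _ in range(3))
a = torch.randn(8, 2)
loss = cswm_loss(s, a, s_next, s_neg)
loss.backward()
```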

Efficient Systolic Array Based on Decomposable MAC for Quantized Deep Neural Networks

Title Efficient Systolic Array Based on Decomposable MAC for Quantized Deep Neural Networks
Authors Ning-Chi Huang, Huan-Jan Chou, Kai-Chiang Wu
Abstract Deep Neural Networks (DNNs) have achieved high accuracy in various machine learning applications in recent years. As the recognition accuracy of deep learning applications increases, reducing the complexity of these neural networks and performing DNN computation on embedded systems or mobile devices becomes an emerging and crucial challenge. Quantization has been proposed to reduce the utilization of computational resources by compressing the input data and weights from floating-point numbers to integers with shorter bit-widths. For practical power reduction, it is necessary to operate these DNNs with quantized parameters on appropriate hardware; systolic arrays are therefore adopted as the major computation units for matrix multiplication in DNN accelerators. To obtain a better tradeoff between precision/accuracy and power consumption, using parameters with various bit-widths in different layers of a DNN is an advanced quantization method. In this paper, we propose a novel decomposition strategy to construct a low-power decomposable multiplier-accumulator (MAC) for energy-efficient quantized DNNs. In our experiments, when 65% of the multiplication operations of VGG-16 are performed at a shorter bit-width with at most 1% accuracy loss on the CIFAR-10 dataset, our decomposable MAC achieves a 50% energy reduction compared with a non-decomposable MAC.
Tasks Quantization
Published 2020-01-01
URL https://openreview.net/forum?id=Hye-p0VFPB
PDF https://openreview.net/pdf?id=Hye-p0VFPB
PWC https://paperswithcode.com/paper/efficient-systolic-array-based-on
Repo
Framework
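
The decomposition such a MAC builds on is ordinary radix arithmetic: a 16-by-16-bit product is the weighted sum of four 8-by-8-bit partial products, so an array of narrow multipliers can either be composed into one wide multiply or run narrow quantized multiplies with the unused partial products gated off. A quick numeric check of the identity (not the paper's circuit):

```python
def split8(x):
    """Split a 16-bit value into (high byte, low byte)."""
    return x >> 8, x & 0xFF

def mul16_from_8(a, b):
    """Compose a 16x16 multiply from four 8x8 partial products."""
    ah, al = split8(a)
    bh, bl = split8(b)
    return ((ah * bh) << 16) + ((ah * bl + al * bh) << 8) + al * bl

a, b = 51234, 60000
assert mul16_from_8(a, b) == a * b           # composed wide multiply
# In 8-bit mode the array would instead compute al * bl directly,
# gating off the other three partial products for quantized layers.
print(mul16_from_8(a, b), a * b)
```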