April 1, 2020

3099 words 15 mins read

Paper Group NANR 65


Enhancing Adversarial Defense by k-Winners-Take-All. Learn to Explain Efficiently via Neural Logic Inductive Learning. Leveraging Entanglement Entropy for Deep Understanding of Attention Matrix in Text Matching. Reducing Transformer Depth on Demand with Structured Dropout. AlignNet: Self-supervised Alignment Module. Moniqua: Modulo Quantized Commun …

Enhancing Adversarial Defense by k-Winners-Take-All

Title Enhancing Adversarial Defense by k-Winners-Take-All
Authors Anonymous
Abstract We propose a simple change to existing neural network structures for better defending against gradient-based adversarial attacks. Instead of using popular activation functions (such as ReLU), we advocate the use of the k-Winners-Take-All (k-WTA) activation, a C^0 discontinuous function that purposely invalidates the neural network model’s gradient at densely distributed input data points. The proposed k-WTA activation can be readily used in nearly all existing networks and training methods with no significant overhead. Our proposal is theoretically rationalized: we analyze why the discontinuities in k-WTA networks can largely prevent gradient-based search for adversarial examples while remaining innocuous to network training. This understanding is also empirically backed. We test the k-WTA activation on various network structures, trained with or without adversarial training. In all cases, the robustness of k-WTA networks outperforms that of traditional networks under white-box attacks.
Tasks Adversarial Defense
Published 2020-01-01
URL https://openreview.net/forum?id=Skgvy64tvr
PDF https://openreview.net/pdf?id=Skgvy64tvr
PWC https://paperswithcode.com/paper/enhancing-adversarial-defense-by-k-winners
Repo
Framework
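
A minimal sketch of the k-WTA activation described above, in PyTorch (my illustration, not the authors’ released code): keep the k largest activations per sample and zero the rest, which is exactly what makes the function C^0 discontinuous and degrades gradient-based attacks.

```python
import torch
import torch.nn as nn

class KWTA(nn.Module):
    """k-Winners-Take-All: only the k largest activations per sample survive."""
    def __init__(self, sparsity=0.1):
        super().__init__()
        self.sparsity = sparsity  # fraction of units allowed to "fire"

    def forward(self, x):
        flat = x.flatten(start_dim=1)                 # (batch, features)
        k = max(1, int(self.sparsity * flat.shape[1]))
        kth = flat.topk(k, dim=1).values[:, -1:]      # k-th largest value per sample
        mask = (flat >= kth).float()                  # winners keep their value, losers -> 0
        return (flat * mask).view_as(x)

# Drop-in replacement for ReLU, e.g.:
# net = nn.Sequential(nn.Linear(784, 256), KWTA(0.2), nn.Linear(256, 10))
```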

Learn to Explain Efficiently via Neural Logic Inductive Learning

Title Learn to Explain Efficiently via Neural Logic Inductive Learning
Authors Anonymous
Abstract The capability of making interpretable and self-explanatory decisions is essential for developing responsible machine learning systems. In this work, we study the learning-to-explain problem in the scope of inductive logic programming (ILP). We propose Neural Logic Inductive Learning (NLIL), an efficient differentiable ILP framework that learns first-order logic rules that can explain the patterns in the data. In experiments, compared with state-of-the-art models, we find NLIL is able to search for rules that are 10× longer while remaining 3× faster. We also show that NLIL can scale to large image datasets, i.e., Visual Genome, with 1M entities.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=SJlh8CEYDB
PDF https://openreview.net/pdf?id=SJlh8CEYDB
PWC https://paperswithcode.com/paper/learn-to-explain-efficiently-via-neural-logic-1
Repo
Framework
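
As a schematic of the chained-operator idea behind differentiable ILP systems of this kind (my generic illustration, not the NLIL implementation): a binary predicate is an entity-by-entity adjacency matrix, and a chain-rule body such as parent(X,Z) ∧ parent(Z,Y) scores grandparent(X,Y) as a matrix product.

```python
import numpy as np

n = 4                                   # toy universe of 4 entities
parent = np.zeros((n, n))
parent[0, 1] = parent[1, 2] = 1.0       # parent chain: 0 -> 1 -> 2

grandparent = parent @ parent           # compose the relation along the chain
print(grandparent[0, 2])                # 1.0: entity 0 is a grandparent of entity 2

# A differentiable learner replaces the hard choice of which relation to
# compose at each hop with a softmax over candidate relation matrices,
# so rule search becomes gradient descent.
```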

Leveraging Entanglement Entropy for Deep Understanding of Attention Matrix in Text Matching

Title Leveraging Entanglement Entropy for Deep Understanding of Attention Matrix in Text Matching
Authors Anonymous
Abstract The formal understanding of deep learning has made great progress based on quantum many-body physics. For example, the entanglement entropy in quantum many-body systems can interpret the inductive bias of a neural network and thus guide the design of network structure and parameters for certain tasks. However, two unsolved problems in the current study of entanglement entropy limit its application potential. First, the theoretical benefits of entanglement entropy have only been investigated for the representation of a single object (e.g., an image or a sentence), and have not been well studied for the matching of two objects (e.g., question-answer pairs). Second, the entanglement entropy cannot be quantitatively calculated, owing to the exponentially increasing dimension of the matching matrix. In this paper, we address these two problems by investigating the fundamental connections between the entanglement entropy and the attention matrix. We prove that by a mapping (via the trace operator) on the high-dimensional matching matrix, a low-dimensional attention matrix can be derived. Based on such an attention matrix, we provide a feasible solution to the entanglement entropy that describes the correlation between the two objects in matching tasks. Inspired by the theoretical properties of the entanglement entropy, we can adaptively design the network architecture in a typical text matching task, i.e., question answering.
Tasks Question Answering, Text Matching
Published 2020-01-01
URL https://openreview.net/forum?id=rJx8ylSKvr
PDF https://openreview.net/pdf?id=rJx8ylSKvr
PWC https://paperswithcode.com/paper/leveraging-entanglement-entropy-for-deep
Repo
Framework
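
A hedged sketch of the entanglement-entropy computation this abstract refers to (my reading of the standard Schmidt-decomposition recipe, not the authors’ code): normalize the attention matrix like a bipartite state’s coefficient matrix, take singular values, and compute the von Neumann entropy of the squared spectrum.

```python
import numpy as np

def entanglement_entropy(attn):
    a = attn / np.linalg.norm(attn)       # unit Frobenius norm: sum of sigma^2 = 1
    s = np.linalg.svd(a, compute_uv=False)
    p = s ** 2                            # Schmidt weights
    p = p[p > 1e-12]                      # drop numerically-zero weights
    return float(-(p * np.log(p)).sum())

attn = np.random.rand(8, 8)               # e.g. question-answer attention scores
print(entanglement_entropy(attn))          # 0 for rank-1 (unentangled) matrices
```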

Reducing Transformer Depth on Demand with Structured Dropout

Title Reducing Transformer Depth on Demand with Structured Dropout
Authors Anonymous
Abstract Overparametrized transformer networks have obtained state-of-the-art results in various natural language processing tasks, such as machine translation, language modeling, and question answering. These models contain hundreds of millions of parameters, necessitating a large amount of computation and making them prone to overfitting. In this work, we explore LayerDrop, a form of structured dropout, which has a regularization effect during training and allows for efficient pruning at inference time. In particular, we show that it is possible to select sub-networks of any depth from one large network without having to finetune them and with limited impact on performance. We demonstrate the effectiveness of our approach by improving the state of the art on machine translation, language modeling, summarization, question answering, and language understanding benchmarks. Moreover, we show that our approach leads to small BERT-like models of higher quality than those trained from scratch or obtained via distillation.
Tasks Language Modelling, Machine Translation, Question Answering
Published 2020-01-01
URL https://openreview.net/forum?id=SylO2yStDr
PDF https://openreview.net/pdf?id=SylO2yStDr
PWC https://paperswithcode.com/paper/reducing-transformer-depth-on-demand-with
Repo
Framework
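
A minimal LayerDrop sketch (an assumed structure of mine, not the fairseq implementation): during training, each layer is skipped with probability p; at inference, a sub-network of any depth can be selected by keeping only a subset of layer indices, with no finetuning.

```python
import torch
import torch.nn as nn

class LayerDropStack(nn.Module):
    def __init__(self, layers, p=0.2):
        super().__init__()
        self.layers = nn.ModuleList(layers)
        self.p = p  # per-layer drop probability during training

    def forward(self, x, keep=None):
        # keep: optional set of layer indices to retain at inference (pruning)
        for i, layer in enumerate(self.layers):
            if self.training and torch.rand(1).item() < self.p:
                continue                  # structured dropout: skip the whole layer
            if keep is not None and i not in keep:
                continue                  # on-demand depth reduction
            x = layer(x)
        return x

# e.g. prune a 12-layer stack to 6 layers without finetuning:
# out = stack(x, keep=set(range(0, 12, 2)))
```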

AlignNet: Self-supervised Alignment Module

Title AlignNet: Self-supervised Alignment Module
Authors Anonymous
Abstract The natural world consists of objects that we perceive as persistent in space and time, even though these objects appear, disappear and reappear in our field of view as we move. This can be attributed to our notion of object persistence – our knowledge that objects typically continue to exist, even if we can no longer see them – and our ability to track objects. Drawing inspiration from the psychology literature on ‘sticky indices’, we propose the AlignNet, a model that learns to assign unique indices to new objects when they first appear and reassign the index to subsequent instances of that object. By introducing a persistent object-based memory, the AlignNet may be used to keep track of objects across time, even if they disappear and reappear later. We implement the AlignNet as a graph network applied to a bipartite graph, in which the input nodes are objects from two sets that we wish to align. The network is trained to predict the edges which connect two instances of the same object across sets. The model is also capable of identifying when there are no matches and dealing with these cases. We perform experiments to show the model’s ability to deal with the appearance, disappearance and reappearance of objects. Additionally, we demonstrate how a persistent object-based memory can help solve question-answering problems in a partially observable environment.
Tasks Question Answering
Published 2020-01-01
URL https://openreview.net/forum?id=H1gcw1HYPr
PDF https://openreview.net/pdf?id=H1gcw1HYPr
PWC https://paperswithcode.com/paper/alignnet-self-supervised-alignment-module
Repo
Framework
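
The AlignNet itself is a trained graph network; as a hedged stand-in, this sketch shows the alignment problem the entry above describes: match objects across two sets by embedding similarity, with a cost threshold for the “no match” case (all names and the threshold are mine).

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def align(prev_slots, new_slots, no_match_cost=0.5):
    # cost = distance between object embeddings across the two sets
    cost = np.linalg.norm(prev_slots[:, None, :] - new_slots[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)     # optimal bipartite matching
    # reuse the old index when a good match exists, else flag a new object (None)
    return [(r, c) if cost[r, c] < no_match_cost else (None, c)
            for r, c in zip(rows, cols)]

prev = np.random.rand(3, 8)                          # 3 tracked objects, 8-dim embeddings
new = prev[[2, 0, 1]] + 0.01 * np.random.rand(3, 8)  # same objects, permuted
print(align(prev, new))                              # recovers the permutation
```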

Moniqua: Modulo Quantized Communication in Decentralized SGD

Title Moniqua: Modulo Quantized Communication in Decentralized SGD
Authors Anonymous
Abstract Decentralized stochastic gradient descent (SGD), where parallel workers are connected to form a graph and communicate adjacently, has shown promising results both theoretically and empirically. In this paper we propose Moniqua, a technique that allows decentralized SGD to use quantized communication. We prove in theory that Moniqua communicates a provably bounded number of bits per iteration, while converging at the same asymptotic rate as the original algorithm does with full-precision communication. Moniqua improves upon prior works in that it (1) requires no additional memory, (2) applies to non-convex objectives, and (3) supports biased/linear quantizers. We demonstrate empirically that Moniqua converges faster with respect to wall clock time than other quantized decentralized algorithms. We also show that Moniqua is robust to very low bit-budgets, allowing less than 4-bits-per-parameter communication without affecting convergence when training VGG16 on CIFAR10.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=HyxhqhVKPB
PDF https://openreview.net/pdf?id=HyxhqhVKPB
PWC https://paperswithcode.com/paper/moniqua-modulo-quantized-communication-in
Repo
Framework
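
A hedged toy version of the modulo trick at Moniqua’s core (my simplification, not the paper’s full algorithm): a worker sends only x mod θ, which is bounded and hence cheap to quantize, and the receiver recovers x using its own local estimate, which decentralized SGD keeps close to the neighbor’s value.

```python
import numpy as np

def encode(x, theta):
    return np.mod(x, theta)        # bounded payload => few bits needed to quantize

def decode(q, local_estimate, theta):
    # exact recovery whenever |x - local_estimate| < theta / 2
    return q + theta * np.round((local_estimate - q) / theta)

x = np.array([3.7, -1.2])          # neighbor's true parameters
y = np.array([3.6, -1.3])          # receiver's local estimate
print(decode(encode(x, theta=1.0), y, theta=1.0))   # [ 3.7 -1.2]
```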

Towards Verified Robustness under Text Deletion Interventions

Title Towards Verified Robustness under Text Deletion Interventions
Authors Anonymous
Abstract Neural networks are widely used in Natural Language Processing, yet despite their empirical successes, their behaviour is brittle: they are both over-sensitive to small input changes, and under-sensitive to deletions of large fractions of input text. This paper aims to tackle under-sensitivity in the context of natural language inference by ensuring that models do not become more confident in their predictions as arbitrary subsets of words from the input text are deleted. We develop a novel technique for formal verification of this specification for models based on the popular decomposable attention mechanism by employing the efficient yet effective interval bound propagation (IBP) approach. Using this method we can efficiently prove, given a model, whether a particular sample is free from the under-sensitivity problem. We compare different training methods to address under-sensitivity, and compare metrics to measure it. In our experiments on the SNLI and MNLI datasets, we observe that IBP training leads to a significantly improved verified accuracy. On the SNLI test set, we can verify 18.4% of samples, a substantial improvement over only 2.8% using standard training.
Tasks Natural Language Inference
Published 2020-01-01
URL https://openreview.net/forum?id=SyxhVkrYvr
PDF https://openreview.net/pdf?id=SyxhVkrYvr
PWC https://paperswithcode.com/paper/towards-verified-robustness-under-text
Repo
Framework
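
For context, here is interval bound propagation through one affine layer (a generic IBP sketch, not the paper’s verification system): track a box [lower, upper] in center/radius form; the output radius uses |W| to account for the worst case over the input box.

```python
import numpy as np

def ibp_linear(lower, upper, W, b):
    center = (upper + lower) / 2.0
    radius = (upper - lower) / 2.0
    new_center = W @ center + b
    new_radius = np.abs(W) @ radius    # worst case over the whole input box
    return new_center - new_radius, new_center + new_radius

W = np.array([[1.0, -2.0], [0.5, 1.0]])
b = np.zeros(2)
lo, hi = ibp_linear(np.array([0.0, 0.0]), np.array([1.0, 1.0]), W, b)
print(lo, hi)   # sound (if loose) bounds on the layer outputs: [-2, 0], [1, 1.5]
```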

word2ket: Space-efficient Word Embeddings inspired by Quantum Entanglement

Title word2ket: Space-efficient Word Embeddings inspired by Quantum Entanglement
Authors Anonymous
Abstract Deep learning natural language processing models often use vector word embeddings, such as word2vec or GloVe, to represent words. A discrete sequence of words can be much more easily integrated with downstream neural layers if it is represented as a sequence of continuous vectors. Also, semantic relationships between words, learned from a text corpus, can be encoded in the relative configurations of the embedding vectors. However, storing and accessing embedding vectors for all words in a dictionary requires a large amount of space, and may strain systems with limited GPU memory. Here, we use approaches inspired by quantum computing to propose two related methods, word2ket and word2ketXS, for storing the word embedding matrix during training and inference in a highly efficient way. Our approach achieves a hundred-fold or more reduction in the space required to store the embeddings with almost no relative drop in accuracy in practical natural language processing tasks.
Tasks Word Embeddings
Published 2020-01-01
URL https://openreview.net/forum?id=HkxARkrFwB
PDF https://openreview.net/pdf?id=HkxARkrFwB
PWC https://paperswithcode.com/paper/word2ket-space-efficient-word-embeddings
Repo
Framework
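
A hedged sketch of the word2ket idea (my minimal version, not the released code): represent a long embedding as a sum of Kronecker (tensor) products of short vectors, so per-word storage drops from d parameters to roughly 2r·√d.

```python
import torch

d_small, rank = 32, 4        # 32*32 = 1024-dim effective embedding, rank-4 sum
a = torch.randn(rank, d_small, requires_grad=True)
b = torch.randn(rank, d_small, requires_grad=True)

def embedding():
    # sum_r a_r (x) b_r : an "entangled" 1024-dim embedding built from short vectors
    return sum(torch.kron(a[r], b[r]) for r in range(rank))

v = embedding()
print(v.shape)   # torch.Size([1024]) from only 2*4*32 = 256 stored parameters
```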

On Generalization Error Bounds of Noisy Gradient Methods for Non-Convex Learning

Title On Generalization Error Bounds of Noisy Gradient Methods for Non-Convex Learning
Authors Anonymous
Abstract Generalization error (also known as the out-of-sample error) measures how well the hypothesis learned from training data generalizes to previously unseen data. Proving tight generalization error bounds is a central question in statistical learning theory. In this paper, we obtain generalization error bounds for learning general non-convex objectives, which has attracted significant attention in recent years. We develop a new framework, termed Bayes-Stability, for proving algorithm-dependent generalization error bounds. The new framework combines ideas from both the PAC-Bayesian theory and the notion of algorithmic stability. Applying the Bayes-Stability method, we obtain new data-dependent generalization bounds for stochastic gradient Langevin dynamics (SGLD) and several other noisy gradient methods (e.g., with momentum, mini-batch and acceleration, Entropy-SGD). Our result recovers (and is typically tighter than) a recent result in Mou et al. (2018) and improves upon the results in Pensia et al. (2018). Our experiments demonstrate that our data-dependent bounds can distinguish randomly labelled data from normal data, which provides an explanation for the intriguing phenomena observed in Zhang et al. (2017a). We also study the setting where the total loss is the sum of a bounded loss and an additional $\ell_2$ regularization term. We obtain new generalization bounds for continuous Langevin dynamics in this setting by developing a new Log-Sobolev inequality for the parameter distribution at any time. Our new bounds are more desirable when the noise level of the process is not very small, and do not become vacuous even when $T$ tends to infinity.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=SkxxtgHKPS
PDF https://openreview.net/pdf?id=SkxxtgHKPS
PWC https://paperswithcode.com/paper/on-generalization-error-bounds-of-noisy-1
Repo
Framework
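
For reference, the noisy gradient method the bounds above target (SGLD) is a one-line update rule; this is a generic sketch of it, not the paper’s experimental code: gradient descent plus Gaussian noise scaled by the inverse temperature β.

```python
import numpy as np

rng = np.random.default_rng(0)

def sgld_step(theta, grad, lr=1e-3, beta=1e4):
    # theta_{t+1} = theta_t - lr * grad + sqrt(2 * lr / beta) * N(0, I)
    noise = rng.normal(size=theta.shape)
    return theta - lr * grad + np.sqrt(2.0 * lr / beta) * noise

theta = np.zeros(3)
grad = 2 * theta - 1.0          # gradient of a toy quadratic loss
print(sgld_step(theta, grad))
```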

CLN2INV: Learning Loop Invariants with Continuous Logic Networks

Title CLN2INV: Learning Loop Invariants with Continuous Logic Networks
Authors Anonymous
Abstract Program verification offers a framework for ensuring program correctness and therefore systematically eliminating different classes of bugs. Inferring loop invariants is one of the main challenges behind automated verification of real-world programs, which often contain many loops. In this paper, we present the Continuous Logic Network (CLN), a novel neural architecture for automatically learning loop invariants directly from program execution traces. Unlike existing neural networks, CLNs can learn precise and explicit representations of formulas in Satisfiability Modulo Theories (SMT) for loop invariants from program execution traces. We develop a new sound and complete semantic mapping for assigning SMT formulas to continuous truth values that allows CLNs to be trained efficiently. We use CLNs to implement a new inference system for loop invariants, CLN2INV, that significantly outperforms existing approaches on the popular Code2Inv dataset. CLN2INV is the first tool to solve all 124 theoretically solvable problems in the Code2Inv dataset. Moreover, CLN2INV takes only 1.1 seconds on average per problem, which is 40 times faster than existing approaches. We further demonstrate that CLN2INV can even learn 12 loop invariants significantly more complex than those required for the Code2Inv dataset.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=HJlfuTEtvB
PDF https://openreview.net/pdf?id=HJlfuTEtvB
PWC https://paperswithcode.com/paper/cln2inv-learning-loop-invariants-with-1
Repo
Framework
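
A generic continuous-logic sketch of the idea (hedged: CLN’s actual sound-and-complete semantic mapping is more careful than this): atomic SMT predicates get soft truth values via smooth functions of their margin, and connectives become t-norm-style operations, so a candidate invariant can be scored differentiably on execution-trace samples.

```python
import numpy as np

def soft_eq(x, y, beta=10.0):
    return np.exp(-beta * (x - y) ** 2)             # 1 iff x == y, decays smoothly

def soft_ge(x, y, beta=10.0):
    return 1.0 / (1.0 + np.exp(-beta * (x - y)))    # sigmoid of the margin

def t_and(p, q):
    return p * q                                    # product t-norm for conjunction

# score candidate invariant "x >= 0 AND x == 2*i" on one trace sample
x, i = 6.0, 3.0
print(t_and(soft_ge(x, 0.0), soft_eq(x, 2 * i)))    # close to 1 => consistent
```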

Understanding Attention Mechanisms

Title Understanding Attention Mechanisms
Authors Anonymous
Abstract Attention mechanisms have advanced the state of the art in several machine learning tasks. Despite significant empirical gains, there is a lack of theoretical analyses on understanding their effectiveness. In this paper, we address this problem by studying the landscape of population and empirical loss functions of attention-based neural networks. Our results show that, under mild assumptions, every local minimum of a two-layer global attention model has low prediction error, and attention models require lower sample complexity than models not employing attention. We then extend our analyses to the popular self-attention model, proving that it delivers consistent predictions with a more expressive class of functions. Additionally, our theoretical results provide several guidelines for designing attention mechanisms. Our findings are validated with satisfactory experimental results on the MNIST and IMDB review datasets.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=BylDrRNKvH
PDF https://openreview.net/pdf?id=BylDrRNKvH
PWC https://paperswithcode.com/paper/understanding-attention-mechanisms
Repo
Framework
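
A hedged sketch of a two-layer global attention model of the kind such analyses consider (my formulation, not the paper’s exact setup): softmax attention weights over input tokens, then a weighted average fed to a linear head.

```python
import numpy as np

def global_attention_predict(X, w_attn, w_out):
    scores = X @ w_attn                 # one attention score per token
    a = np.exp(scores - scores.max())
    a = a / a.sum()                     # softmax attention over tokens
    context = a @ X                     # attention-weighted token average
    return context @ w_out              # linear prediction head

X = np.random.rand(5, 8)                # 5 tokens, 8-dim features
print(global_attention_predict(X, np.random.rand(8), np.random.rand(8)))
```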

Ecological Reinforcement Learning

Title Ecological Reinforcement Learning
Authors Anonymous
Abstract Reinforcement learning algorithms have been shown to effectively learn tasks in a variety of static, deterministic, and simplistic environments, but their application to environments which are characteristic of dynamic lifelong settings encountered in the real world has been limited. Understanding the impact of specific environmental properties on the learning dynamics of reinforcement learning algorithms is important as we want to align the environments in which we develop our algorithms with the real world, and this is strongly coupled with the type of intelligence which can be learned. In this work, we study what we refer to as ecological reinforcement learning: the interaction between properties of the environment and the reinforcement learning agent. To this end, we introduce environments with characteristics that we argue better reflect natural environments: non-episodic learning, uninformative “fundamental drive” reward signals, and natural dynamics that cause the environment to change even when the agent fails to take intelligent actions. We show these factors can have a profound effect on the learning progress of reinforcement learning algorithms. Surprisingly, we find that these seemingly more challenging learning conditions can often make reinforcement learning agents learn more effectively. Through this study, we hope to shift the focus of the community towards learning in realistic, natural environments with dynamic elements.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=S1xxx64YwH
PDF https://openreview.net/pdf?id=S1xxx64YwH
PWC https://paperswithcode.com/paper/ecological-reinforcement-learning
Repo
Framework
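
A hedged sketch of the non-episodic (“reset-free”) interaction loop the entry above advocates; `env` and `agent` here are placeholders of mine, not an API from the paper:

```python
def run_non_episodic(env, agent, steps=100_000):
    """Lifelong interaction: the environment evolves and is never reset."""
    obs = env.observe()
    for _ in range(steps):
        action = agent.act(obs)
        next_obs, reward = env.step(action)   # dynamics advance regardless of the agent
        agent.update(obs, action, reward, next_obs)
        obs = next_obs                        # note: no env.reset() anywhere
```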

Adversarial Policies: Attacking Deep Reinforcement Learning

Title Adversarial Policies: Attacking Deep Reinforcement Learning
Authors Anonymous
Abstract Deep reinforcement learning (RL) policies are known to be vulnerable to adversarial perturbations to their observations, similar to adversarial examples for classifiers. However, an attacker is not usually able to directly modify another agent’s observations. This might lead one to wonder: is it possible to attack an RL agent simply by choosing an adversarial policy acting in a multi-agent environment so as to create natural observations that are adversarial? We demonstrate the existence of adversarial policies in zero-sum games between simulated humanoid robots with proprioceptive observations, against state-of-the-art victims trained via self-play to be robust to opponents. The adversarial policies reliably win against the victims but generate seemingly random and uncoordinated behavior. We find that these policies are more successful in high-dimensional environments, and induce substantially different activations in the victim policy network than when the victim plays against a normal opponent. Videos are available at https://attackingrl.github.io.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=HJgEMpVFwB
PDF https://openreview.net/pdf?id=HJgEMpVFwB
PWC https://paperswithcode.com/paper/adversarial-policies-attacking-deep-1
Repo
Framework
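
A hedged, abstracted sketch of the attack protocol (not the paper’s code, and `env`, `victim`, `adversary` are placeholders of mine): the victim policy is frozen, and only the adversary is trained in the shared two-player environment, so the “attack” is ordinary RL for one side.

```python
def train_adversarial_policy(env, victim, adversary, episodes=1000):
    for _ in range(episodes):
        obs_v, obs_a = env.reset()
        done = False
        while not done:
            a_v = victim.act(obs_v)           # fixed, pre-trained victim
            a_a = adversary.act(obs_a)        # only the attacker learns
            (obs_v, next_obs_a), reward_a, done = env.step(a_v, a_a)
            adversary.update(obs_a, a_a, reward_a, next_obs_a)
            obs_a = next_obs_a
```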

Curriculum Learning for Deep Generative Models with Clustering

Title Curriculum Learning for Deep Generative Models with Clustering
Authors Anonymous
Abstract Training generative models such as Generative Adversarial Networks (GANs) is challenging for noisy data. In this paper, a novel curriculum learning algorithm based on clustering is proposed to address this issue. The curriculum construction is based on the centrality of the underlying clusters in the data: data points of high centrality take priority in being fed into generative models during training. To make our algorithm scalable to large-scale data, an active set is devised, in the sense that every round of training proceeds only on an active subset containing a small fraction of already-trained data and the incremental data of lower centrality. Moreover, a geometric analysis is presented to interpret the necessity of the cluster curriculum for generative models. Experiments on cat and human-face data validate that our algorithm is able to learn the optimal generative models (e.g., ProGAN) with respect to specified quality metrics for noisy data. An interesting finding is that the optimal cluster curriculum is closely related to the critical point of the geometric percolation process formulated in the paper.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=BklTQCEtwH
PDF https://openreview.net/pdf?id=BklTQCEtwH
PWC https://paperswithcode.com/paper/curriculum-learning-for-deep-generative-1
Repo
Framework
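
A hedged sketch of the cluster-centrality curriculum (my simplification: local density as the centrality proxy): score each point by mean distance to its k nearest neighbors, train on the densest points first, and grow the active set with lower-centrality increments.

```python
import numpy as np

def centrality_order(X, k=10):
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    knn_dist = np.sort(d, axis=1)[:, 1:k + 1].mean(axis=1)  # skip self-distance
    return np.argsort(knn_dist)            # densest (most central) points first

X = np.random.rand(200, 2)
order = centrality_order(X)
stages = [order[:50], order[:100], order[:200]]   # growing active sets
# each stage: continue GAN training on X[stage] before enlarging the set
```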

Universal Approximation with Deep Narrow Networks

Title Universal Approximation with Deep Narrow Networks
Authors Anonymous
Abstract The classical Universal Approximation Theorem certifies that the universal approximation property holds for the class of neural networks of arbitrary width. Here we consider the natural ‘dual’ theorem for width-bounded networks of arbitrary depth. Precisely, let $n$ be the number of input neurons, $m$ be the number of output neurons, and let $\rho$ be any nonaffine continuous function, with a continuous nonzero derivative at some point. Then we show that the class of neural networks of arbitrary depth, width $n + m + 2$, and activation function $\rho$, exhibits the universal approximation property with respect to the uniform norm on compact subsets of $\mathbb{R}^n$. This covers every activation function possible to use in practice; in particular this includes polynomial activation functions, making this genuinely different to the classical case. We go on to consider extensions of this result. First we show an analogous result for a certain class of nowhere differentiable activation functions. Second we establish an analogous result for noncompact domains, by showing that deep narrow networks with the ReLU activation function exhibit the universal approximation property with respect to the $p$-norm on $\mathbb{R}^n$. Finally we show that a width of only $n + m + 1$ suffices for ‘most’ activation functions.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=B1xGGTEtDH
PDF https://openreview.net/pdf?id=B1xGGTEtDH
PWC https://paperswithcode.com/paper/universal-approximation-with-deep-narrow-1
Repo
Framework