Paper Group NANR 43
Continual Learning via Neural Pruning
Title | Continual Learning via Neural Pruning |
Authors | Anonymous |
Abstract | We introduce Continual Learning via Neural Pruning (CLNP), a new method aimed at lifelong learning in fixed-capacity models, based on neuronal model sparsification. In this method, subsequent tasks are trained using the inactive neurons and filters of the sparsified network and cause zero deterioration of the performance of previous tasks. To deal with the possible trade-off between model sparsity and performance, we formalize and incorporate the concept of *graceful forgetting*: the idea that it is preferable to suffer a small amount of forgetting in a controlled manner if it helps regain network capacity and prevents uncontrolled loss of performance during the training of future tasks. CLNP also provides simple continual learning diagnostic tools in terms of the number of free neurons left for the training of future tasks as well as the number of neurons that are being reused. In particular, we see in experiments that CLNP verifies and automatically takes advantage of the fact that the features of earlier layers are more transferable. We show empirically that CLNP leads to significantly improved results over current weight-elasticity-based methods. CLNP can also be applied in single-head architectures, providing the first viable such algorithm for continual learning. |
Tasks | Continual Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BkeJm6VtPH |
PDF | https://openreview.net/pdf?id=BkeJm6VtPH |
PWC | https://paperswithcode.com/paper/continual-learning-via-neural-pruning-1 |
Repo | |
Framework | |
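The core mechanics described in the abstract are simple to sketch: after a task is trained and the network sparsified, the neurons that stayed active are frozen, and the inactive ones are handed to the next task. A minimal NumPy sketch of that bookkeeping (the threshold and all names here are hypothetical, not the authors' code):

```python
# Hypothetical sketch of CLNP-style neuron allocation: split a layer's units
# into "active" (frozen for the current task) and "free" (available to the
# next task). Threshold and data are illustrative only.
import numpy as np

def partition_neurons(avg_activation, threshold=0.5):
    """Return boolean masks: active units kept for this task, free units
    that the next task may train without disturbing earlier tasks."""
    active = avg_activation > threshold
    return active, ~active

rng = np.random.default_rng(0)
avg_act = rng.random(16)               # stand-in per-neuron average activations
active, free = partition_neurons(avg_act)
print(f"task 1 keeps {active.sum()} neurons; {free.sum()} are free for task 2")
```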
apPILcation: an Android-based Tool for Learning Mansi
Title | apPILcation: an Android-based Tool for Learning Mansi |
Authors | Gábor Bobály, Csilla Horváth, Veronika Vincze |
Abstract | |
Tasks | |
Published | 2020-10-01 |
URL | https://www.aclweb.org/anthology/2020.iwclul-1.7/ |
PWC | https://paperswithcode.com/paper/appilcation-an-android-based-tool-for |
Repo | |
Framework | |
Quantum Graph Neural Networks
Title | Quantum Graph Neural Networks |
Authors | Anonymous |
Abstract | We introduce Quantum Graph Neural Networks (QGNN), a new class of quantum neural network ansatze tailored to represent quantum processes that have a graph structure, and which are particularly suitable for execution on distributed quantum systems over a quantum network. Along with this general class of ansatze, we introduce further specialized architectures, namely Quantum Graph Recurrent Neural Networks (QGRNN) and Quantum Graph Convolutional Neural Networks (QGCNN). We provide four example applications of QGNNs: learning Hamiltonian dynamics of quantum systems, learning how to create multipartite entanglement in a quantum network, unsupervised learning for spectral clustering, and supervised learning for graph isomorphism classification. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rkxZveSFDS |
PDF | https://openreview.net/pdf?id=rkxZveSFDS |
PWC | https://paperswithcode.com/paper/quantum-graph-neural-networks |
Repo | |
Framework | |
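The defining structural constraint of the QGNN ansatz is that multi-qubit couplings follow the edges of a graph. A small illustrative sketch, assuming a simple Ising-type ZZ coupling (the paper's ansatze are parameterized far more generally; this only shows the graph-structured Hamiltonian idea):

```python
# Illustrative only: an Ising-type Hamiltonian whose two-qubit ZZ couplings
# follow a graph's edges, then one unitary evolution step under it.
import numpy as np
from scipy.linalg import expm

I2 = np.eye(2)
Z = np.diag([1.0, -1.0])

def kron_chain(ops):
    out = np.array([[1.0]])
    for op in ops:
        out = np.kron(out, op)
    return out

def graph_hamiltonian(n_qubits, edges, coupling=1.0):
    """Sum of Z_i Z_j terms, one per edge of the graph."""
    H = np.zeros((2 ** n_qubits, 2 ** n_qubits))
    for i, j in edges:
        H += coupling * kron_chain([Z if k in (i, j) else I2
                                    for k in range(n_qubits)])
    return H

H = graph_hamiltonian(3, [(0, 1), (1, 2)])     # a 3-qubit path graph
U = expm(-1j * 0.1 * H)                        # one evolution step of a layer
print(np.allclose(U @ U.conj().T, np.eye(8)))  # unitary check: True
```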
ShardNet: One Filter Set to Rule Them All
Title | ShardNet: One Filter Set to Rule Them All |
Authors | Anonymous |
Abstract | Deep CNNs have achieved state-of-the-art performance for numerous machine learning and computer vision tasks in recent years, but as they have become increasingly deep, the number of parameters they use has also increased, making them hard to deploy in memory-constrained environments and difficult to interpret. Machine learning theory implies that such networks are highly over-parameterised and that it should be possible to reduce their size without sacrificing accuracy, and indeed many recent studies have begun to highlight specific redundancies that can be exploited to achieve this. In this paper, we take a further step in this direction by proposing a filter-sharing approach to compressing deep CNNs that reduces their memory footprint by repeatedly applying a single convolutional mapping of learned filters to simulate a CNN pipeline. We show, via experiments on CIFAR-10, CIFAR-100, Tiny ImageNet, and ImageNet that this allows us to reduce the parameter counts of networks based on common designs such as VGGNet and ResNet by a factor proportional to their depth, whilst leaving their accuracy largely unaffected. At a broader level, our approach also indicates how the scale-space regularities found in visual signals can be leveraged to build neural architectures that are more parsimonious and interpretable. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=S1xRxgSFvH |
PDF | https://openreview.net/pdf?id=S1xRxgSFvH |
PWC | https://paperswithcode.com/paper/shardnet-one-filter-set-to-rule-them-all |
Repo | |
Framework | |
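The compression idea is concrete enough to sketch: a single set of convolutional filters is applied repeatedly in place of a stack of distinct layers, so the parameter count no longer grows with depth. A minimal PyTorch sketch with hypothetical sizes (not the paper's architecture):

```python
# Minimal sketch of filter sharing: one conv layer reused at every depth.
# Channel counts, depth, and the classifier head are illustrative.
import torch
import torch.nn as nn

class SharedFilterNet(nn.Module):
    def __init__(self, channels=64, depth=8, num_classes=10):
        super().__init__()
        self.stem = nn.Conv2d(3, channels, 3, padding=1)
        self.shared = nn.Conv2d(channels, channels, 3, padding=1)  # one filter set
        self.depth = depth
        self.head = nn.Linear(channels, num_classes)

    def forward(self, x):
        x = torch.relu(self.stem(x))
        for _ in range(self.depth):      # reuse the same weights at every step
            x = torch.relu(self.shared(x))
        x = x.mean(dim=(2, 3))           # global average pooling
        return self.head(x)

net = SharedFilterNet()
print(net(torch.randn(2, 3, 32, 32)).shape)  # torch.Size([2, 10])
```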
Balancing Cost and Benefit with Tied-Multi Transformers
Title | Balancing Cost and Benefit with Tied-Multi Transformers |
Authors | Anonymous |
Abstract | This paper proposes a novel procedure for training multiple Transformers with tied parameters, which compresses multiple models into one and enables the number of encoder and decoder layers to be chosen dynamically during decoding. In sequence-to-sequence modeling, typically, the output of the last layer of the N-layer encoder is fed to the M-layer decoder, and the output of the last decoder layer is used to compute loss. Instead, our method computes a single loss consisting of NxM losses, where each loss is computed from the output of one of the M decoder layers connected to one of the N encoder layers. A single model trained by our method subsumes multiple models with different numbers of encoder and decoder layers, and can be used for decoding with fewer than the maximum number of encoder and decoder layers. We then propose a mechanism to choose a priori the number of encoder and decoder layers for faster decoding, and also explore recurrent stacking of layers and knowledge distillation to enable further parameter reduction. In a case study of neural machine translation, we present a cost-benefit analysis of the proposed approaches and empirically show that they greatly reduce decoding costs while preserving translation quality. |
Tasks | Machine Translation |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BygKZkBtDH |
PDF | https://openreview.net/pdf?id=BygKZkBtDH |
PWC | https://paperswithcode.com/paper/balancing-cost-and-benefit-with-tied-multi |
Repo | |
Framework | |
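The NxM loss is easy to make precise: keep the hidden state of every encoder depth, run each through every decoder depth, and sum one loss per (n, m) pair. A toy sketch with linear "layers" standing in for Transformer blocks (shapes and names are illustrative, not the paper's code):

```python
# Toy sketch of the NxM tied loss: every decoder depth m reads from every
# encoder depth n, and all N x M losses are summed into a single objective.
import torch
import torch.nn as nn
import torch.nn.functional as F

N, M, d = 3, 2, 8                      # encoder depth, decoder depth, width
enc_layers = nn.ModuleList([nn.Linear(d, d) for _ in range(N)])
dec_layers = nn.ModuleList([nn.Linear(d, d) for _ in range(M)])
head = nn.Linear(d, d)

x, target = torch.randn(4, d), torch.randn(4, d)

enc_states, h = [], x
for enc in enc_layers:                 # keep the output of every encoder depth
    h = torch.relu(enc(h))
    enc_states.append(h)

loss = 0.0
for e in enc_states:                   # N encoder depths ...
    h = e
    for dec in dec_layers:             # ... each feeding M decoder depths
        h = torch.relu(dec(h))
        loss = loss + F.mse_loss(head(h), target)
print(loss)                            # one scalar made of N x M terms
```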
AdaScale SGD: A Scale-Invariant Algorithm for Distributed Training
Title | AdaScale SGD: A Scale-Invariant Algorithm for Distributed Training |
Authors | Anonymous |
Abstract | When using distributed training to speed up stochastic gradient descent, learning rates must adapt to new scales in order to maintain training effectiveness. Re-tuning these parameters is resource intensive, while fixed scaling rules often degrade model quality. We propose AdaScale SGD, a practical and principled algorithm that is approximately scale invariant. By continually adapting to the gradient’s variance, AdaScale often trains at a wide range of scales with nearly identical results. We describe this invariance formally through AdaScale’s convergence bounds. As the batch size increases, the bounds maintain final objective values, while smoothly transitioning away from linear speed-ups. In empirical comparisons, AdaScale trains well beyond the batch size limits of popular “linear learning rate scaling” rules. This includes large-scale training without model degradation for machine translation, image classification, object detection, and speech recognition tasks. The algorithm introduces negligible computational overhead and no tuning parameters, making AdaScale an attractive choice for large-scale training. |
Tasks | Image Classification, Machine Translation, Object Detection, Speech Recognition |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rygxdA4YPS |
PDF | https://openreview.net/pdf?id=rygxdA4YPS |
PWC | https://paperswithcode.com/paper/adascale-sgd-a-scale-invariant-algorithm-for |
Repo | |
Framework | |
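The scale invariance comes from a "gain" r in [1, S] that rescales the learning rate based on gradient variance across the S workers. A rough plug-in estimator is sketched below, using the identities E‖g_i‖² = σ² + ‖μ‖² and E‖ḡ‖² = σ²/S + ‖μ‖²; the paper's exact estimator and its moving averages differ, so treat this as an illustration of the quantity, not the algorithm:

```python
# Rough plug-in estimate of an AdaScale-style gain from per-worker gradients.
# Toy data; not the paper's estimator.
import numpy as np

def adascale_gain(worker_grads):
    """worker_grads: (S, dim), one mini-batch gradient per worker."""
    g_bar = worker_grads.mean(axis=0)                  # aggregated gradient
    mean_sq = (worker_grads ** 2).sum(axis=1).mean()   # ~ sigma^2 + ||mu||^2
    agg_sq = (g_bar ** 2).sum()                        # ~ sigma^2/S + ||mu||^2
    return mean_sq / agg_sq                            # gain r in [1, S]

rng = np.random.default_rng(0)
grads = rng.normal(loc=1.0, scale=0.5, size=(8, 100))  # S = 8 workers
r = adascale_gain(grads)
print(f"gain r = {r:.2f}; scaled learning rate = r * base_lr")
```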
Visual Representation Learning with 3D View-Contrastive Inverse Graphics Networks
Title | Visual Representation Learning with 3D View-Contrastive Inverse Graphics Networks |
Authors | Anonymous |
Abstract | Predictive coding theories suggest that the brain learns by predicting observations at various levels of abstraction. One of the most basic prediction tasks is view prediction: how would a given scene look from an alternative viewpoint? Humans excel at this task. Our ability to imagine and fill in missing visual information is tightly coupled with perception: we feel as if we see the world in 3 dimensions, while in fact, information from only the front surface of the world hits our (2D) retinas. This paper explores the connection between view-predictive representation learning and its role in the development of 3D visual recognition. We propose inverse graphics networks, which take as input 2.5D video streams captured by a moving camera, and map to stable 3D feature maps of the scene, by disentangling the scene content from the motion of the camera. The model can also project its 3D feature maps to novel viewpoints, to predict and match against target views. We propose contrastive prediction losses that can handle stochasticity of the visual input and can scale view-predictive learning to more photorealistic scenes than those considered in previous works. We show that the proposed model learns 3D visual representations useful for (1) semi-supervised learning of 3D object detectors, and (2) unsupervised learning of 3D moving object detectors, by estimating motion of the inferred 3D feature maps in videos of dynamic scenes. To the best of our knowledge, this is the first work that empirically shows view prediction to be a useful and scalable self-supervised task beneficial to 3D object detection. |
Tasks | 3D Object Detection, Object Detection, Representation Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BJxt60VtPr |
PDF | https://openreview.net/pdf?id=BJxt60VtPr |
PWC | https://paperswithcode.com/paper/visual-representation-learning-with-3d-view |
Repo | |
Framework | |
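A contrastive view-prediction loss of the kind the abstract describes can be sketched as an InfoNCE objective between features predicted for a novel viewpoint and features extracted from the true view. The paper's actual losses operate over 3D feature maps and are richer; this is only the contrastive skeleton:

```python
# Sketch of a view-contrastive (InfoNCE-style) loss between predicted and
# actual view features. Feature dimensions and temperature are illustrative.
import torch
import torch.nn.functional as F

def view_contrastive_loss(pred, target, temperature=0.1):
    """pred, target: (B, D) feature vectors; matching rows are positives."""
    pred = F.normalize(pred, dim=1)
    target = F.normalize(target, dim=1)
    logits = pred @ target.t() / temperature   # (B, B) similarity matrix
    labels = torch.arange(pred.size(0))        # i-th pred matches i-th target
    return F.cross_entropy(logits, labels)

loss = view_contrastive_loss(torch.randn(16, 128), torch.randn(16, 128))
print(loss.item())
```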
Learning to Retrieve Reasoning Paths over Wikipedia Graph for Question Answering
Title | Learning to Retrieve Reasoning Paths over Wikipedia Graph for Question Answering |
Authors | Anonymous |
Abstract | Answering questions that require multi-hop reasoning at web-scale requires retrieving multiple evidence documents, one of which often has little lexical or semantic relationship to the question. This paper introduces a new graph-based recurrent retrieval approach that learns to retrieve reasoning paths over the Wikipedia graph to answer multi-hop open-domain questions. Our retriever trains a recurrent neural network that learns to sequentially retrieve evidence documents in the reasoning path by conditioning on the previously retrieved documents. Our reader ranks the reasoning paths and extracts the answer span included in the best reasoning path. Experimental results demonstrate state-of-the-art results in two open-domain QA datasets showcasing the robustness of our method. Notably, our method achieves significant improvement in HotpotQA fullwiki and distractor settings, outperforming the previous best model by more than 10 points. |
Tasks | Question Answering |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SJgVHkrYDH |
PDF | https://openreview.net/pdf?id=SJgVHkrYDH |
PWC | https://paperswithcode.com/paper/learning-to-retrieve-reasoning-paths-over |
Repo | |
Framework | |
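The recurrent retriever can be sketched in a few lines: an RNN state summarizes the documents picked so far, and each hop scores the remaining candidates against that state. A toy greedy version follows; the paper uses beam search over reasoning paths and learned document encoders, so every name and size here is illustrative:

```python
# Toy greedy sketch of recurrent path retrieval: condition each hop's
# document scores on an RNN state built from previously retrieved documents.
import torch
import torch.nn as nn

d, num_docs, hops = 32, 10, 2
doc_vecs = torch.randn(num_docs, d)    # stand-in document encodings
rnn = nn.GRUCell(d, d)
h = torch.randn(d)                     # state; would come from the question

path = []
for _ in range(hops):
    scores = doc_vecs @ h              # condition on retrieval history
    for p in path:
        scores[p] = -float("inf")      # don't revisit retrieved documents
    best = int(scores.argmax())
    path.append(best)
    h = rnn(doc_vecs[best].unsqueeze(0), h.unsqueeze(0)).squeeze(0)
print("reasoning path:", path)
```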
Accelerating First-Order Optimization Algorithms
Title | Accelerating First-Order Optimization Algorithms |
Authors | Anonymous |
Abstract | Several stochastic optimization algorithms are currently available. In most cases, selecting the best optimizer for a given problem is not an easy task. Therefore, instead of looking for yet another 'absolute' best optimizer, accelerating existing ones according to the context might prove more effective. This paper presents a simple and intuitive technique to accelerate first-order optimization algorithms. Accelerated in this way, they converge much more quickly and achieve lower function/loss values than their traditional counterparts. The proposed solution modifies the update rule based on the variation of the direction of the gradient during training. Several tests were conducted with SGD, AdaGrad, Adam and AMSGrad on three public datasets. Results clearly show that the proposed technique has the potential to improve the performance of existing optimization algorithms. |
Tasks | Stochastic Optimization |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HkxedlrFwB |
PDF | https://openreview.net/pdf?id=HkxedlrFwB |
PWC | https://paperswithcode.com/paper/accelerating-first-order-optimization-1 |
Repo | |
Framework | |
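The abstract states only that the update rule adapts to the variation of the gradient's direction, without giving the rule itself. The toy Rprop-like scheme below (grow the step while successive gradients align, shrink it on a direction flip) illustrates that general family of techniques, and is explicitly not the paper's method:

```python
# Toy direction-adaptive step-size rule, illustrating the general idea only.
# The multipliers, the rule, and the test function are all hypothetical.
import numpy as np

def direction_adaptive_sgd(grad_fn, x, lr=0.1, steps=50, up=1.1, down=0.5):
    prev = None
    for _ in range(steps):
        g = grad_fn(x)
        if prev is not None:
            # grow the step while the gradient keeps pointing the same way,
            # shrink it when the direction flips
            lr *= up if np.dot(g, prev) > 0 else down
        x = x - lr * g
        prev = g
    return x

x_final = direction_adaptive_sgd(lambda x: 2 * x, np.array([5.0]))
print(x_final)   # approaches 0, the minimizer of f(x) = x**2
```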
Defending Against Adversarial Examples by Regularized Deep Embedding
Title | Defending Against Adversarial Examples by Regularized Deep Embedding |
Authors | Anonymous |
Abstract | Recent studies have demonstrated the vulnerability of deep convolutional neural networks against adversarial examples. Inspired by the observation that the intrinsic dimension of image data is much smaller than its pixel-space dimension and that the vulnerability of neural networks grows with the input dimension, we propose to embed high-dimensional input images into a low-dimensional space to perform classification. However, arbitrarily projecting the input images to a low-dimensional space without regularization will not improve the robustness of deep neural networks. We propose a new framework, Embedding Regularized Classifier (ER-Classifier), which improves the adversarial robustness of the classifier through embedding regularization. Experimental results on several benchmark datasets show that our proposed framework achieves state-of-the-art performance against strong adversarial attack methods. |
Tasks | Adversarial Attack |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BygpAp4Ywr |
PDF | https://openreview.net/pdf?id=BygpAp4Ywr |
PWC | https://paperswithcode.com/paper/defending-against-adversarial-examples-by |
Repo | |
Framework | |
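Structurally, an ER-Classifier is an encoder into a low-dimensional space, a classifier on that space, and a regularizer pulling the embedding toward a prior. In the minimal sketch below, a simple moment penalty stands in for the paper's embedding regularizer, and all sizes are hypothetical:

```python
# Minimal sketch of the ER-Classifier structure: classify in a low-dim
# embedding, regularize the embedding toward N(0, I). The moment penalty
# here is a stand-in for the paper's regularizer.
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 64))  # high -> low dim
classifier = nn.Linear(64, 10)

x = torch.randn(32, 1, 28, 28)
y = torch.randint(0, 10, (32,))

z = encoder(x)
cls_loss = F.cross_entropy(classifier(z), y)
reg = z.mean() ** 2 + (z.var() - 1.0) ** 2   # pull embedding toward the prior
loss = cls_loss + 0.1 * reg
loss.backward()
print(float(loss))
```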
Emergent Tool Use From Multi-Agent Autocurricula
Title | Emergent Tool Use From Multi-Agent Autocurricula |
Authors | Anonymous |
Abstract | Through multi-agent competition, the simple objective of hide-and-seek, and standard reinforcement learning algorithms at scale, we find that agents create a self-supervised autocurriculum inducing multiple distinct rounds of emergent strategy, many of which require sophisticated tool use and coordination. We find clear evidence of six emergent phases in agent strategy in our environment, each of which creates a new pressure for the opposing team to adapt; for instance, agents learn to build multi-object shelters using moveable boxes, which in turn leads to agents discovering that they can overcome obstacles using ramps. We further provide evidence that multi-agent competition may scale better with increasing environment complexity and leads to behavior that centers around far more human-relevant skills than other self-supervised reinforcement learning methods such as intrinsic motivation. Finally, we propose transfer and fine-tuning as a way to quantitatively evaluate targeted capabilities, and we compare hide-and-seek agents to both intrinsic motivation and random initialization baselines in a suite of domain-specific intelligence tests. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SkxpxJBKwS |
PDF | https://openreview.net/pdf?id=SkxpxJBKwS |
PWC | https://paperswithcode.com/paper/emergent-tool-use-from-multi-agent-1 |
Repo | |
Framework | |
On the Anomalous Generalization of GANs
Title | On the Anomalous Generalization of GANs |
Authors | Anonymous |
Abstract | Generative models, especially Generative Adversarial Networks (GANs), have received significant attention recently. However, it has been observed that in terms of some attributes, e.g. the number of simple geometric primitives in an image, GANs are not able to learn the target distribution in practice. Motivated by this observation, we discover two specific problems of GANs leading to anomalous generalization behaviour, which we refer to as sample insufficiency and pixel-wise combination. For the first problem, sample insufficiency, we show theoretically and empirically that the batch size of the training samples in practice may be insufficient for the discriminator to learn an accurate discrimination function, which can result in unstable training dynamics for the generator, leading to anomalous generalization. For the second problem, pixel-wise combination, we find that besides recognizing the positive training samples as real, under certain circumstances the discriminator can be fooled into recognizing pixel-wise combinations (e.g. pixel-wise averages) of the positive training samples as real, even though those combinations can be visually different from the real samples in the target distribution. With the fooled discriminator as reference, the generator then obtains biased supervision, leading to anomalous generalization behaviour. Additionally, we propose methods to mitigate the anomalous generalization of GANs. Extensive experiments on benchmarks show that our proposed methods improve the FID score by up to 30% on natural image datasets. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BJldrxBtwB |
PDF | https://openreview.net/pdf?id=BJldrxBtwB |
PWC | https://paperswithcode.com/paper/on-the-anomalous-generalization-of-gans |
Repo | |
Framework | |
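The pixel-wise combination problem is simple to probe: average two real images pixel-wise and compare the discriminator's score on the blend with its scores on the originals; the failure mode is the blend scoring as high as real data even when it looks unrealistic. A toy probe (untrained linear discriminator, illustrative only):

```python
# Toy probe of the pixel-wise combination failure mode. With a trained
# discriminator, similar scores for real and blend would indicate the problem.
import torch
import torch.nn as nn

disc = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 1))  # toy D

real_a = torch.rand(1, 3, 32, 32)
real_b = torch.rand(1, 3, 32, 32)
blend = 0.5 * (real_a + real_b)     # pixel-wise average of two real samples

with torch.no_grad():
    print(disc(real_a).item(), disc(blend).item())
```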
Towards A Unified Min-Max Framework for Adversarial Exploration and Robustness
Title | Towards A Unified Min-Max Framework for Adversarial Exploration and Robustness |
Authors | Anonymous |
Abstract | The worst-case training principle that minimizes the maximal adversarial loss, also known as adversarial training (AT), has been shown to be a state-of-the-art approach for enhancing adversarial robustness against norm-ball bounded input perturbations. Nonetheless, min-max optimization beyond the purpose of AT has not been rigorously explored in research on adversarial attack and defense. In particular, given a set of risk sources (domains), minimizing the maximal loss induced from the domain set can be reformulated as a general min-max problem that is different from AT. Examples of this general formulation include attacking model ensembles, devising universal perturbations under multiple inputs or data transformations, and generalized AT over different types of attack models. We show that these problems can be solved under a unified and theoretically principled min-max optimization framework. We also show that the self-adjusted domain weights learned from our method provide a means to explain the difficulty level of attack and defense over multiple domains. Extensive experiments show that our approach leads to substantial performance improvements over the conventional averaging strategy. |
Tasks | Adversarial Attack |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=S1eik6EtPB |
PDF | https://openreview.net/pdf?id=S1eik6EtPB |
PWC | https://paperswithcode.com/paper/towards-a-unified-min-max-framework-for |
Repo | |
Framework | |
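The general recipe is alternating descent on the model parameters and ascent on simplex-constrained domain weights, so the optimization concentrates on the hardest domains instead of averaging over them. A toy sketch with quadratic per-domain losses, where a clip-and-normalize step stands in for exact Euclidean simplex projection:

```python
# Toy min-max over domains: weights w ascend toward the hardest domains
# while theta descends on the w-weighted loss. Losses and step sizes are
# illustrative, not the paper's setup.
import numpy as np

def project_simplex(w):
    w = np.clip(w, 0, None)
    return w / w.sum()                 # crude stand-in for exact projection

theta = np.array([2.0])
targets = np.array([0.0, 1.0, -1.0])   # three "domains"
w = np.ones(3) / 3

for _ in range(200):
    losses = (theta - targets) ** 2
    w = project_simplex(w + 0.1 * losses)        # ascent: upweight hard domains
    grad = (2 * w * (theta - targets)).sum()     # descent on weighted loss
    theta = theta - 0.05 * grad

print(theta, w.round(2))   # theta settles near the min-max point
```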
Harnessing the Power of Infinitely Wide Deep Nets on Small-data Tasks
Title | Harnessing the Power of Infinitely Wide Deep Nets on Small-data Tasks |
Authors | Anonymous |
Abstract | Recent research shows that the following two models are equivalent: (a) infinitely wide neural networks (NNs) trained under l2 loss by gradient descent with infinitesimally small learning rate (b) kernel regression with respect to so-called Neural Tangent Kernels (NTKs) (Jacot et al., 2018). An efficient algorithm to compute the NTK, as well as its convolutional counterparts, appears in Arora et al. (2019a), which allowed studying performance of infinitely wide nets on datasets like CIFAR-10. However, super-quadratic running time of kernel methods makes them best suited for small-data tasks. We report results suggesting neural tangent kernels perform strongly on low-data tasks. 1. On a standard testbed of classification/regression tasks from the UCI database, NTK SVM beats the previous gold standard, Random Forests (RF), and also the corresponding finite nets. 2. On CIFAR-10 with 10–640 training samples, Convolutional NTK consistently beats ResNet-34 by 1%–3%. 3. On VOC07 testbed for few-shot image classification tasks on ImageNet with transfer learning (Goyal et al., 2019), replacing the linear SVM currently used with a Convolutional NTK SVM consistently improves performance. 4. Comparing the performance of NTK with the finite-width net it was derived from, NTK behavior starts at lower net widths than suggested by theoretical analysis (Arora et al., 2019a). NTK's efficacy may trace to lower variance of output. |
Tasks | Few-Shot Image Classification, Image Classification, Transfer Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rkl8sJBYvH |
PDF | https://openreview.net/pdf?id=rkl8sJBYvH |
PWC | https://paperswithcode.com/paper/harnessing-the-power-of-infinitely-wide-deep |
Repo | |
Framework | |
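The pipeline in point 1 amounts to "compute the NTK, then do kernel regression". The sketch below uses one standard closed form of the two-layer ReLU NTK on unit-norm inputs (cf. Jacot et al., 2018) with kernel ridge regression; the convolutional NTK used in the paper is substantially more involved, so treat this as a schematic:

```python
# Sketch: two-layer ReLU NTK (one standard closed form, unit-norm inputs)
# plugged into kernel ridge regression. Not the paper's CNTK code.
import numpy as np

def ntk_2layer(X, Y):
    """X: (n, d), Y: (m, d); rows assumed unit-norm."""
    u = np.clip(X @ Y.T, -1.0, 1.0)
    k0 = (np.pi - np.arccos(u)) / np.pi
    k1 = (u * (np.pi - np.arccos(u)) + np.sqrt(1 - u ** 2)) / np.pi
    return u * k0 + k1

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
X /= np.linalg.norm(X, axis=1, keepdims=True)
y = np.sign(X[:, 0])                   # toy labels

K = ntk_2layer(X, X)
alpha = np.linalg.solve(K + 1e-3 * np.eye(len(K)), y)   # kernel ridge fit
pred = K @ alpha
print((np.sign(pred) == y).mean())     # training accuracy
```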
Divide-and-Conquer Adversarial Learning for High-Resolution Image Enhancement
Title | Divide-and-Conquer Adversarial Learning for High-Resolution Image Enhancement |
Authors | Anonymous |
Abstract | This paper introduces a divide-and-conquer inspired adversarial learning (DACAL) approach for photo enhancement. The key idea is to decompose the photo enhancement process into multiple hierarchically organized sub-problems, which can be better conquered from the bottom up. On the top level, we propose a perception-based division to learn the additive and multiplicative components required to translate a low-quality image into its high-quality counterpart. On the intermediate level, we use a frequency-based division with a generative adversarial network (GAN) to weakly supervise the photo enhancement process. On the lower level, we design a dimension-based division that enables the GAN model to better approximate the distribution distance on multiple independent one-dimensional components of the data. Considering all three hierarchies, we develop a multiscale training approach to optimize the image enhancement process, suitable for high-resolution images, in a weakly-supervised manner. Both quantitative and qualitative results clearly demonstrate that the proposed DACAL achieves state-of-the-art performance for high-resolution image enhancement. |
Tasks | Image Enhancement |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BJej7kBYwS |
PDF | https://openreview.net/pdf?id=BJej7kBYwS |
PWC | https://paperswithcode.com/paper/divide-and-conquer-adversarial-learning-for |
Repo | |
Framework | |
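The frequency-based division at the intermediate level can be illustrated with any low-pass/high-pass split. Below, a box blur provides the low-frequency component and the residual the high-frequency one; this is illustrative only, not the paper's operator:

```python
# Illustrative frequency-based division: low-pass via box blur, high-pass
# as the residual. The split is lossless by construction.
import numpy as np
from scipy.ndimage import uniform_filter

img = np.random.rand(64, 64)            # stand-in grayscale image
low = uniform_filter(img, size=7)       # low-frequency component (box blur)
high = img - low                        # residual high-frequency component
print(np.allclose(low + high, img))     # True: the division reconstructs img
```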