Paper Group NANR 29
OvA-INN: Continual Learning with Invertible Neural Networks. Building Deep Equivariant Capsule Networks. Beyond Classical Diffusion: Ballistic Graph Neural Network. Masked Translation Model. Unsupervised Video-to-Video Translation via Self-Supervised Learning. Progressive Knowledge Distillation For Generative Modeling. Semi-Supervised Few-Shot Lear …
OvA-INN: Continual Learning with Invertible Neural Networks
Title | OvA-INN: Continual Learning with Invertible Neural Networks |
Authors | Anonymous |
Abstract | In the field of Continual Learning, the objective is to learn several tasks one after the other without access to the data from previous tasks. Several solutions have been proposed to tackle this problem, but they usually assume that the user knows which task to perform at test time on a particular sample, or they rely on small samples of the previous data, and most of them suffer from a substantial drop in accuracy when updated with batches of only one class at a time. In this article, we propose a new method, OvA-INN, which is able to learn one class at a time without storing any of the previous data. To achieve this, for each class, we train a specific Invertible Neural Network to output the zero vector for its class. At test time, we predict the class of a sample by identifying which network outputs the vector with the smallest norm. With this method, we show that we can take advantage of pretrained models by stacking an invertible network on top of a feature extractor. This way, we are able to outperform state-of-the-art approaches that rely on feature learning for Continual Learning on the MNIST and CIFAR-100 datasets. In our experiments, we reach 72% accuracy on CIFAR-100 after training our model one class at a time. |
Tasks | Continual Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rJxcBpNKPr |
PDF | https://openreview.net/pdf?id=rJxcBpNKPr |
PWC | https://paperswithcode.com/paper/ova-inn-continual-learning-with-invertible |
Repo | |
Framework | |
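A minimal sketch of the one-vs-all prediction rule this abstract describes: one invertible network per class is trained to map that class's features to the zero vector, and at test time the class whose network produces the smallest-norm output wins. The `ClassINN` module (a single additive coupling layer) and the training loop below are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class ClassINN(nn.Module):
    """Toy invertible network (one additive coupling layer) for a single class.
    Illustrative only -- the paper's exact INN architecture may differ."""
    def __init__(self, dim):
        super().__init__()
        self.half = dim // 2
        self.net = nn.Sequential(nn.Linear(self.half, self.half), nn.ReLU(),
                                 nn.Linear(self.half, dim - self.half))

    def forward(self, x):
        x1, x2 = x[:, :self.half], x[:, self.half:]
        return torch.cat([x1, x2 + self.net(x1)], dim=1)  # invertible additive coupling

def train_class(inn, feats, epochs=100, lr=1e-3):
    """Train one class-specific INN to push its class's features toward the zero vector."""
    opt = torch.optim.Adam(inn.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = inn(feats).pow(2).sum(dim=1).mean()  # squared norm of the outputs
        loss.backward()
        opt.step()

def predict(inns, feats):
    """Predict the class whose INN yields the smallest-norm output."""
    norms = torch.stack([inn(feats).norm(dim=1) for inn in inns], dim=1)
    return norms.argmin(dim=1)
```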
Building Deep Equivariant Capsule Networks
Title | Building Deep Equivariant Capsule Networks |
Authors | Anonymous |
Abstract | Capsule networks are constrained by the parameter-expensive nature of their layers, and the general lack of provable equivariance guarantees. We present a variation of capsule networks that aims to remedy this. We identify that learning all pair-wise part-whole relationships between capsules of successive layers is inefficient. Further, we also realise that the choice of prediction networks and the routing mechanism are both key to equivariance. Based on these, we propose an alternative framework for capsule networks that learns to projectively encode the manifold of pose-variations, termed the space-of-variation (SOV), for every capsule-type of each layer. This is done using a trainable, equivariant function defined over a grid of group-transformations. Thus, the prediction-phase of routing involves projection into the SOV of a deeper capsule using the corresponding function. As a specific instantiation of this idea, and also in order to reap the benefits of increased parameter-sharing, we use type-homogeneous group-equivariant convolutions of shallower capsules in this phase. We also introduce an equivariant routing mechanism based on degree-centrality. We show that this particular instance of our general model is equivariant, and hence preserves the compositional representation of an input under transformations. We conduct several experiments on standard object-classification datasets that showcase the increased transformation-robustness, as well as the general performance, of our model relative to several capsule baselines. |
Tasks | Object Classification |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BJgNJgSFPS |
PDF | https://openreview.net/pdf?id=BJgNJgSFPS |
PWC | https://paperswithcode.com/paper/building-deep-equivariant-capsule-networks-1 |
Repo | |
Framework | |
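The degree-centrality routing mentioned in the abstract can be pictured as weighting each capsule's vote by how strongly it agrees with the other votes for the same deeper capsule. The sketch below is a schematic interpretation under that assumption, not the paper's exact routing algorithm: `votes` holds the projected predictions for one deeper capsule, the agreement graph is built from cosine similarities, and each vote's weight is its (softmaxed) degree centrality.

```python
import torch
import torch.nn.functional as F

def degree_centrality_routing(votes, temperature=1.0):
    """votes: (num_input_capsules, dim) predictions for one output capsule.
    Weights each vote by the degree centrality of the vote-agreement graph.
    Schematic sketch, not the paper's algorithm."""
    n = votes.shape[0]
    sims = F.cosine_similarity(votes.unsqueeze(0), votes.unsqueeze(1), dim=-1)
    sims = sims * (1.0 - torch.eye(n))            # drop self-agreement
    centrality = sims.sum(dim=1)                  # degree centrality per vote
    weights = F.softmax(centrality / temperature, dim=0)
    return (weights.unsqueeze(1) * votes).sum(dim=0)

# Example: 8 input capsules voting for a 16-dimensional output capsule.
pooled = degree_centrality_routing(torch.randn(8, 16))
```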
Beyond Classical Diffusion: Ballistic Graph Neural Network
Title | Beyond Classical Diffusion: Ballistic Graph Neural Network |
Authors | Anonymous |
Abstract | This paper presents the ballistic graph neural network. The ballistic graph neural network tackles the weight distribution from a transportation perspective and has many properties that differ from the traditional graph neural network pipeline. The ballistic graph neural network does not require calculating any eigenvalues. Its filters spread quadratically in time ($\sigma^2 \sim T^2$) rather than linearly ($\sigma^2 \sim T$) as in the traditional graph neural network. We use a perturbed coin operator to perturb and optimize the diffusion rate. Our results show that by selecting the diffusion speed, the network can reach a similar accuracy with fewer parameters. We also show that the perturbed filters act as better representations compared to pure ballistic ones. We provide a new perspective on training graph neural networks: by adjusting the diffusion rate, the network's performance can be improved. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=r1gV3nVKPS |
PDF | https://openreview.net/pdf?id=r1gV3nVKPS |
PWC | https://paperswithcode.com/paper/beyond-classical-diffusion-ballistic-graph |
Repo | |
Framework | |
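The ballistic-versus-diffusive scaling quoted in the abstract can be checked on a toy 1D example: a classical random walk spreads with variance proportional to T, while a coined (Hadamard) walk of the kind the "coin operator" language alludes to spreads with variance proportional to T². The script below is an independent illustration of that scaling, not the paper's graph filter.

```python
import numpy as np

def hadamard_walk_variance(steps):
    """Variance of position for a 1D discrete-time coined (Hadamard) walk.
    Ballistic spread: variance grows ~ steps**2."""
    n = 2 * steps + 1                        # positions -steps..steps
    psi = np.zeros((n, 2), dtype=complex)    # amplitude per (position, coin state)
    psi[steps] = [1 / np.sqrt(2), 1j / np.sqrt(2)]   # symmetric initial coin state
    H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)     # Hadamard coin operator
    for _ in range(steps):
        psi = psi @ H.T                      # coin toss
        shifted = np.zeros_like(psi)
        shifted[1:, 0] = psi[:-1, 0]         # coin state 0 moves right
        shifted[:-1, 1] = psi[1:, 1]         # coin state 1 moves left
        psi = shifted
    prob = (np.abs(psi) ** 2).sum(axis=1)
    x = np.arange(-steps, steps + 1)
    return (prob * x ** 2).sum() - (prob * x).sum() ** 2

for T in (20, 40, 80):
    # classical unbiased walk has variance T; the coined walk grows ~ T**2
    print(f"T={T:3d}  classical var = {T}  ballistic var = {hadamard_walk_variance(T):.1f}")
```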
Masked Translation Model
Title | Masked Translation Model |
Authors | Anonymous |
Abstract | We introduce the masked translation model (MTM) which combines encoding and decoding of sequences within the same model component. The MTM is based on the idea of masked language modeling and supports both autoregressive and non-autoregressive decoding strategies by simply changing the order of masking. In experiments on the WMT 2016 Romanian-English task, the MTM shows strong constant-time translation performance, beating all related approaches with comparable complexity. We also extensively compare various decoding strategies supported by the MTM, as well as several length modeling techniques and training settings. |
Tasks | Language Modelling |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HygaSxHYvH |
PDF | https://openreview.net/pdf?id=HygaSxHYvH |
PWC | https://paperswithcode.com/paper/masked-translation-model |
Repo | |
Framework | |
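As a rough picture of how a masked translation model can decode non-autoregressively, the sketch below implements a mask-predict-style loop: start from a fully masked target, fill in all positions, then repeatedly re-mask the least confident tokens and re-predict them. The `model` callable, `MASK_ID`, and the linear re-masking schedule are assumptions for illustration; the MTM's actual decoding strategies may differ.

```python
import torch

MASK_ID = 0  # assumed id of the [MASK] token

@torch.no_grad()
def mask_predict_decode(model, src, tgt_len, iterations=10):
    """Iterative non-autoregressive decoding (mask-predict-style sketch).
    `model(src, tgt)` is assumed to return logits of shape (tgt_len, vocab)."""
    tgt = torch.full((tgt_len,), MASK_ID, dtype=torch.long)
    probs = torch.zeros(tgt_len)
    for it in range(iterations):
        logits = model(src, tgt)                        # predict every target position
        new_probs, new_tokens = logits.softmax(-1).max(-1)
        mask = tgt.eq(MASK_ID)
        tgt[mask] = new_tokens[mask]                    # fill currently masked slots
        probs[mask] = new_probs[mask]
        n_mask = int(tgt_len * (1 - (it + 1) / iterations))  # linearly shrink the mask
        if n_mask == 0:
            break
        worst = probs.topk(n_mask, largest=False).indices
        tgt[worst] = MASK_ID                            # re-mask least confident tokens
    return tgt
```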
Unsupervised Video-to-Video Translation via Self-Supervised Learning
Title | Unsupervised Video-to-Video Translation via Self-Supervised Learning |
Authors | Anonymous |
Abstract | Existing unsupervised video-to-video translation methods fail to produce translated videos which are frame-wise realistic, semantic information preserving, and video-level consistent. In this work, we propose a novel unsupervised video-to-video translation model. Our model decomposes style and content, uses a specialized encoder-decoder structure, and propagates inter-frame information through bidirectional recurrent neural network (RNN) units. The style-content decomposition mechanism enables us to achieve long-term style-consistent video translation results and provides a good interface for modality-flexible translation. In addition, by varying the input frames and style codes used in translation, we propose a video interpolation loss that captures temporal information within the sequence and trains our building blocks in a self-supervised manner. Our model can produce photo-realistic, spatio-temporally consistent translated videos in a multimodal way. Subjective and objective experimental results validate the superiority of our model over the existing methods. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HkevCyrFDS |
PDF | https://openreview.net/pdf?id=HkevCyrFDS |
PWC | https://paperswithcode.com/paper/unsupervised-video-to-video-translation-via |
Repo | |
Framework | |
Progressive Knowledge Distillation For Generative Modeling
Title | Progressive Knowledge Distillation For Generative Modeling |
Authors | Anonymous |
Abstract | While modern generative models are able to synthesize high-fidelity, visually appealing images, successfully generating examples that are useful for recognition tasks remains an elusive goal. To this end, our key insight is that the examples should be synthesized to recover classifier decision boundaries that would be learned from a large amount of real examples. More concretely, we treat a classifier trained on synthetic examples as the "student" and a classifier trained on real examples as the "teacher". By introducing knowledge distillation into a meta-learning framework, we encourage the generative model to produce examples in a way that enables the student classifier to mimic the behavior of the teacher. To mitigate the potential gap between student and teacher classifiers, we further propose to distill the knowledge in a progressive manner, either by gradually strengthening the teacher or weakening the student. We demonstrate the use of our model-agnostic distillation approach to deal with data scarcity, significantly improving few-shot learning performance on miniImageNet and ImageNet1K benchmarks. |
Tasks | Few-Shot Learning, Meta-Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=ByxV2kBYwB |
PDF | https://openreview.net/pdf?id=ByxV2kBYwB |
PWC | https://paperswithcode.com/paper/progressive-knowledge-distillation-for |
Repo | |
Framework | |
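The core mechanism the abstract relies on is standard knowledge distillation: the student (trained on synthetic examples) is encouraged to match the softened output distribution of the teacher (trained on real examples). A minimal sketch of that distillation loss is below; the temperature, the weighting, and the way it is embedded in the paper's meta-learning loop are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Blend soft-target KL against the teacher with the usual hard-label loss.
    T and alpha are illustrative hyper-parameters, not the paper's values."""
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```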
Semi-Supervised Few-Shot Learning with a Controlled Degree of Task-Adaptive Conditioning
Title | Semi-Supervised Few-Shot Learning with a Controlled Degree of Task-Adaptive Conditioning |
Authors | Anonymous |
Abstract | Few-shot learning aims to handle previously unseen tasks using only a small amount of new training data. In preparing (or meta-training) a few-shot learner, however, massive labeled data are necessary. In the real world, unfortunately, labeled data are expensive and/or scarce. In this work, we propose a few-shot learner that can work well under the semi-supervised setting where a large portion of training data is unlabeled. Our method employs explicit task-conditioning in which unlabeled sample clustering for the current task takes place in a new projection space different from the embedding feature space. The conditioned clustering space is linearly constructed so as to quickly close the gap between the class centroids for the current task and the independent per-class reference vectors meta-trained across tasks. In a more general setting, our method introduces a concept of controlling the degree of task-conditioning for meta-learning: the amount of task-conditioning varies with the number of repetitive updates for the clustering space. During each update, the soft labels of the unlabeled samples estimated in the conditioned clustering space are used to update the class averages in the original embedded space, which in turn are used to reconstruct the clustering space. Extensive simulation results based on the miniImageNet and tieredImageNet datasets show state-of-the-art semi-supervised few-shot classification performance of the proposed method. Simulation results also indicate that the proposed task-adaptive clustering shows graceful degradation with a growing number of distractor samples, i.e., unlabeled samples coming from outside the candidate classes. |
Tasks | Few-Shot Learning, Meta-Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=B1xpI1BFDS |
PDF | https://openreview.net/pdf?id=B1xpI1BFDS |
PWC | https://paperswithcode.com/paper/semi-supervised-few-shot-learning-with-a |
Repo | |
Framework | |
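The repeated update the abstract describes, estimating soft labels for the unlabeled samples and then refining the class centroids with them, is closely related to the soft k-means refinement used in prototypical-network-style semi-supervised few-shot learning. The sketch below shows that generic refinement loop in the embedding space only; the paper's task-conditioned projection space is not modeled here.

```python
import torch
import torch.nn.functional as F

def refine_prototypes(support, support_labels, unlabeled, num_classes, steps=3):
    """support: (Ns, d) labeled embeddings; unlabeled: (Nu, d) embeddings.
    Repeatedly soft-assign unlabeled points to class prototypes and update the
    class means. Generic sketch; the paper adds a task-conditioned projection."""
    protos = torch.stack([support[support_labels == c].mean(0)
                          for c in range(num_classes)])          # (C, d)
    for _ in range(steps):
        dists = torch.cdist(unlabeled, protos)                   # (Nu, C)
        soft = F.softmax(-dists, dim=1)                          # soft labels
        new_protos = []
        for c in range(num_classes):
            sup_c = support[support_labels == c]
            mass = soft[:, c].sum() + sup_c.shape[0]
            new_protos.append((sup_c.sum(0)
                               + (soft[:, c:c + 1] * unlabeled).sum(0)) / mass)
        protos = torch.stack(new_protos)
    return protos
```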
Data augmentation instead of explicit regularization
Title | Data augmentation instead of explicit regularization |
Authors | Anonymous |
Abstract | Modern deep artificial neural networks have achieved impressive results through models with orders of magnitude more parameters than training examples, controlling overfitting with the help of regularization. Regularization can be implicit, as is the case of stochastic gradient descent and parameter sharing in convolutional layers, or explicit. Explicit regularization techniques, the most common of which are weight decay and dropout, have proven successful in terms of improved generalization, but they blindly reduce the effective capacity of the model, introduce sensitive hyper-parameters, and require deeper and wider architectures to compensate for the reduced capacity. In contrast, data augmentation techniques exploit domain knowledge to increase the number of training examples and improve generalization without reducing the effective capacity and without introducing model-dependent parameters, since they are applied to the training data. In this paper we systematically contrast data augmentation and explicit regularization on three popular architectures and three data sets. Our results demonstrate that data augmentation alone can achieve the same or higher performance than regularized models and exhibits much higher adaptability to changes in the architecture and the amount of training data. |
Tasks | Data Augmentation |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=H1eqOnNYDH |
PDF | https://openreview.net/pdf?id=H1eqOnNYDH |
PWC | https://paperswithcode.com/paper/data-augmentation-instead-of-explicit-2 |
Repo | |
Framework | |
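To make the contrast concrete, the sketch below sets up a light image-augmentation pipeline of the kind the paper compares against explicit regularizers such as weight decay and dropout. The specific transforms and their parameters are illustrative defaults, not the paper's exact configuration.

```python
import torchvision.transforms as T

# Augmentation route: enlarge the effective training set via label-preserving
# transformations (illustrative parameters, not the paper's exact setup).
augment = T.Compose([
    T.RandomCrop(32, padding=4),
    T.RandomHorizontalFlip(),
    T.ColorJitter(brightness=0.2, contrast=0.2),
    T.ToTensor(),
])

# Explicit-regularization route, for comparison: weight decay on the optimizer
# (and dropout layers inside the model) instead of augmented data.
# optimizer = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=5e-4)
```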
HIPPOCAMPAL NEURONAL REPRESENTATIONS IN CONTINUAL LEARNING
Title | HIPPOCAMPAL NEURONAL REPRESENTATIONS IN CONTINUAL LEARNING |
Authors | Anonymous |
Abstract | The hippocampus has long been associated with spatial memory and goal-directed spatial navigation. However, the region’s independent role in continual learning of navigational strategies has seldom been investigated. Here we analyse population-level activity of hippocampal CA1 neurons in the context of continual learning of two different spatial navigation strategies. Demixed Principal Component Analysis (dPCA) is applied to neuronal recordings from 612 hippocampal CA1 neurons of rodents learning to perform allocentric and egocentric spatial tasks. The components uncovered using dPCA from the firing activity reveal that hippocampal neurons encode relevant task variables such as decisions, navigational strategies, and reward location. We compare these hippocampal features with standard reinforcement learning algorithms, highlighting similarities and differences. Finally, we demonstrate that a standard deep reinforcement learning model achieves similar average performance when compared to animal learning, but fails to mimic animals during task switching. Overall, our results give insights into how the hippocampus solves reinforced spatial continual learning, and put forward a framework to explicitly compare biological and machine learning during spatial continual learning. |
Tasks | Continual Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Syxi6grFwH |
PDF | https://openreview.net/pdf?id=Syxi6grFwH |
PWC | https://paperswithcode.com/paper/hippocampal-neuronal-representations-in |
Repo | |
Framework | |
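The analysis pipeline the abstract describes starts from population activity organized by neuron, task condition, and time. The sketch below uses standard PCA on such a trial-averaged firing-rate array as a simplified stand-in for demixed PCA (dPCA), which additionally separates the contributions of individual task variables; the array shapes and random data are placeholders, not the recordings used in the paper.

```python
import numpy as np
from sklearn.decomposition import PCA

# firing_rates: (n_neurons, n_conditions, n_timepoints) trial-averaged activity.
# Placeholder random data; standard PCA here is a simplified stand-in for dPCA.
rng = np.random.default_rng(0)
firing_rates = rng.poisson(5.0, size=(612, 4, 100)).astype(float)

X = firing_rates.reshape(firing_rates.shape[0], -1)   # neurons x (conditions*time)
X -= X.mean(axis=1, keepdims=True)                    # center each neuron
components = PCA(n_components=10).fit_transform(X.T)  # population components over time
print(components.shape)                               # (conditions*time, 10)
```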
Unifying Graph Convolutional Networks as Matrix Factorization
Title | Unifying Graph Convolutional Networks as Matrix Factorization |
Authors | Zhaocheng Liu, Qiang Liu, Haoli Zhang, Jun Zhu |
Abstract | In recent years, substantial progress has been made on graph convolutional networks (GCN). In this paper, for the first time, we theoretically analyze the connections between GCN and matrix factorization (MF), and unify GCN as matrix factorization with co-training and unitization. Moreover, under the guidance of this theoretical analysis, we propose an alternative model to GCN named Co-training and Unitized Matrix Factorization (CUMF). The correctness of our analysis is verified by thorough experiments. The experimental results show that CUMF achieves similar or superior performance compared to GCN. In addition, CUMF inherits the benefits of MF-based methods to naturally support constructing mini-batches, and is more friendly to distributed computing compared with GCN. The distributed CUMF on semi-supervised node classification significantly outperforms distributed GCN methods. Thus, CUMF greatly benefits large-scale and complex real-world applications. |
Tasks | Node Classification |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HJxf53EtDr |
PDF | https://openreview.net/pdf?id=HJxf53EtDr |
PWC | https://paperswithcode.com/paper/unifying-graph-convolutional-networks-as |
Repo | |
Framework | |
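For reference, the GCN layer the analysis starts from is the standard propagation rule $H' = \sigma(\hat{A} H W)$ with the symmetrically normalized adjacency $\hat{A} = \hat{D}^{-1/2}(A + I)\hat{D}^{-1/2}$. The sketch below implements just that rule in NumPy; the CUMF factorization itself is not shown.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One standard GCN layer: H' = ReLU(D^-1/2 (A+I) D^-1/2 @ H @ W)."""
    A_hat = A + np.eye(A.shape[0])                               # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]   # symmetric normalization
    return np.maximum(A_norm @ H @ W, 0.0)

# Tiny example: 4 nodes, 3 input features, 2 output features.
A = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
H = np.random.randn(4, 3)
W = np.random.randn(3, 2)
print(gcn_layer(A, H, W).shape)  # (4, 2)
```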
Isolating Latent Structure with Cross-population Variational Autoencoders
Title | Isolating Latent Structure with Cross-population Variational Autoencoders |
Authors | Anonymous |
Abstract | A significant body of recent work has examined variational autoencoders as a powerful approach for tasks which involve modeling the distribution of complex data such as images and text. In this work, we present a framework for modeling multiple data sets which come from differing distributions but which share some common latent structure. By incorporating architectural constraints and using a mutual information regularized form of the variational objective, our method successfully models differing data populations while explicitly encouraging the isolation of the shared and private latent factors. This enables our model to learn useful shared structure across similar tasks and to disentangle cross-population representations in a weakly supervised way. We demonstrate the utility of our method on several applications including image denoising, sub-group discovery, and continual learning. |
Tasks | Continual Learning, Denoising, Image Denoising |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=r1eWdlBFwS |
PDF | https://openreview.net/pdf?id=r1eWdlBFwS |
PWC | https://paperswithcode.com/paper/isolating-latent-structure-with-cross |
Repo | |
Framework | |
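The architectural idea in the abstract, where each population gets its own private latent factors while all populations share a common latent block, can be pictured as an encoder that splits its output into a shared part and a population-specific part. The sketch below shows only that split (the Gaussian reparameterization and mutual-information regularizer are omitted); dimensions and module names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SplitEncoder(nn.Module):
    """Encode an input into shared and population-private latent codes (sketch)."""
    def __init__(self, in_dim=784, shared_dim=16, private_dim=8, num_populations=2):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU())
        self.to_shared = nn.Linear(256, shared_dim)
        # one private head per data population
        self.to_private = nn.ModuleList(
            nn.Linear(256, private_dim) for _ in range(num_populations))

    def forward(self, x, population):
        h = self.backbone(x)
        return self.to_shared(h), self.to_private[population](h)

enc = SplitEncoder()
z_shared, z_private = enc(torch.randn(4, 784), population=0)
```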
Discriminative Variational Autoencoder for Continual Learning with Generative Replay
Title | Discriminative Variational Autoencoder for Continual Learning with Generative Replay |
Authors | Anonymous |
Abstract | Generative replay (GR) is a method to alleviate catastrophic forgetting in continual learning (CL) by generating previous task data and learning them together with the data from new tasks. In this paper, we propose the discriminative variational autoencoder (DiVA) to address the GR-based CL problem. DiVA obtains class-wise discriminative latent embeddings by maximizing the mutual information between classes and latent variables of the VAE. Thus, DiVA is directly applicable to classification and class-conditional generation, which are efficient and effective properties in the GR-based CL scenario. Furthermore, we use a novel trick based on domain translation to cover natural images, which are challenging for GR-based methods. As a result, DiVA achieves competitive or higher accuracy compared to state-of-the-art algorithms in Permuted MNIST, Split MNIST, and Split CIFAR10 settings. |
Tasks | Continual Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SJxjPxSYDH |
PDF | https://openreview.net/pdf?id=SJxjPxSYDH |
PWC | https://paperswithcode.com/paper/discriminative-variational-autoencoder-for |
Repo | |
Framework | |
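The GR-based continual-learning loop the abstract builds on can be summarized in a few lines: before training on a new task, sample replayed data from the generator trained on earlier tasks, mix it with the new task's data, and train both classifier and generator on the mixture. The sketch below shows that generic loop; DiVA's class-conditional VAE and its mutual-information objective are not modeled, and `generator.sample` is an assumed interface.

```python
import torch

def continual_train(tasks, generator, classifier, train_step, replay_size=1024):
    """Generic generative-replay loop (sketch). `tasks` yields (x, y) tensors per task;
    `generator.sample(n)` is assumed to return (x_fake, y_fake) for earlier tasks."""
    seen_any = False
    for x_new, y_new in tasks:
        if seen_any:
            x_old, y_old = generator.sample(replay_size)   # replay earlier classes
            x = torch.cat([x_new, x_old])
            y = torch.cat([y_new, y_old])
        else:
            x, y = x_new, y_new
        train_step(classifier, generator, x, y)            # update both models
        seen_any = True
```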
Distribution-Guided Local Explanation for Black-Box Classifiers
Title | Distribution-Guided Local Explanation for Black-Box Classifiers |
Authors | Anonymous |
Abstract | Existing local explanation methods provide an explanation for each decision of black-box classifiers, in the form of relevance scores of features according to their contributions. To obtain satisfying explainability, many methods introduce ad hoc constraints into the classification loss to regularize these relevance scores. However, the large information gap between the classification loss and these constraints increases the difficulty of tuning hyper-parameters. To bridge this gap, in this paper we present a simple but effective mask predictor. Specifically, we model the above constraints with a distribution controller, and integrate it with a neural network to directly guide the distribution of relevance scores. The benefit of this strategy is to facilitate the setting of involved hyper-parameters, and enable discriminative scores over supporting features. The experimental results demonstrate that our method outperforms others in terms of faithfulness and explainability. Meanwhile, it also provides effective saliency maps for explaining each decision. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rJgFjREtwr |
PDF | https://openreview.net/pdf?id=rJgFjREtwr |
PWC | https://paperswithcode.com/paper/distribution-guided-local-explanation-for |
Repo | |
Framework | |
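At a high level, mask-based local explanation of the kind the abstract discusses optimizes a per-feature relevance mask so that the masked input keeps the classifier's prediction while some constraint shapes the relevance scores (e.g. sparsity). The sketch below shows that generic formulation with a simple L1 constraint; the paper's learned mask predictor and distribution controller are not implemented here, and all hyper-parameters are placeholders.

```python
import torch
import torch.nn.functional as F

def explain_instance(classifier, x, target, steps=200, lam=0.05, lr=0.1):
    """Optimize a relevance mask m in [0, 1] so that m*x keeps the target class
    while an L1 penalty keeps the mask sparse. Generic sketch, not the paper's method.
    x: (1, ...) input tensor; target: (1,) label tensor."""
    logit_m = torch.zeros_like(x, requires_grad=True)     # mask parameters (pre-sigmoid)
    opt = torch.optim.Adam([logit_m], lr=lr)
    for _ in range(steps):
        m = torch.sigmoid(logit_m)
        loss = F.cross_entropy(classifier(m * x), target) + lam * m.abs().mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return torch.sigmoid(logit_m).detach()                # relevance scores per feature
```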
Noisy Machines: Understanding noisy neural networks and enhancing robustness to analog hardware errors using distillation
Title | Noisy Machines: Understanding noisy neural networks and enhancing robustness to analog hardware errors using distillation |
Authors | Anonymous |
Abstract | The success of deep learning has brought forth a wave of interest in computer hardware design to better meet the high demands of neural network inference. In particular, analog computing hardware has been heavily motivated specifically for accelerating neural networks, based on either electronic, optical or photonic devices, which may well achieve lower power consumption than conventional digital electronics. However, these proposed analog accelerators suffer from the intrinsic noise generated by their physical components, which makes it challenging to achieve high accuracy on deep neural networks. Hence, for successful deployment on analog accelerators, it is essential to be able to train deep neural networks to be robust to random continuous noise in the network weights, which is a somewhat new challenge in machine learning. In this paper, we advance the understanding of noisy neural networks. We outline how a noisy neural network has reduced learning capacity as a result of loss of mutual information between its input and output. To combat this, we propose using knowledge distillation combined with noise injection during training to achieve more noise robust networks, which is demonstrated experimentally across different networks and datasets, including ImageNet. Our method achieves models with as much as 2X greater noise tolerance compared with the previous best attempts, which is a significant step towards making analog hardware practical for deep learning. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BklxN0NtvB |
PDF | https://openreview.net/pdf?id=BklxN0NtvB |
PWC | https://paperswithcode.com/paper/noisy-machines-understanding-noisy-neural |
Repo | |
Framework | |
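The training recipe the abstract proposes, injecting random noise into the weights during training while distilling from a clean teacher, can be sketched as a single training step as below. The noise model (additive Gaussian scaled by each weight tensor's mean magnitude), the distillation temperature, and the loss weighting are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def noisy_distill_step(student, teacher, x, y, opt, noise_std=0.05, T=2.0, alpha=0.5):
    """One training step: perturb the student's weights, distill from the clean
    teacher, then apply the gradient update to the unperturbed weights."""
    clean = [p.detach().clone() for p in student.parameters()]
    with torch.no_grad():
        for p in student.parameters():                       # inject weight noise
            p.add_(noise_std * p.abs().mean() * torch.randn_like(p))
        t_logits = teacher(x)
    s_logits = student(x)                                     # forward with noisy weights
    kd = F.kl_div(F.log_softmax(s_logits / T, -1), F.softmax(t_logits / T, -1),
                  reduction="batchmean") * T * T
    loss = alpha * kd + (1 - alpha) * F.cross_entropy(s_logits, y)
    opt.zero_grad()
    loss.backward()                                           # gradients at the noisy point
    with torch.no_grad():
        for p, c in zip(student.parameters(), clean):
            p.copy_(c)                                        # restore clean weights
    opt.step()                                                # update the clean weights
```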
STABILITY AND CONVERGENCE THEORY FOR LEARNING RESNET: A FULL CHARACTERIZATION
Title | STABILITY AND CONVERGENCE THEORY FOR LEARNING RESNET: A FULL CHARACTERIZATION |
Authors | Anonymous |
Abstract | The ResNet structure has achieved great success since its debut. In this paper, we study the stability of learning ResNet. Specifically, we consider the ResNet block $h_l = \phi(h_{l-1}+\tau\cdot g(h_{l-1}))$ where $\phi(\cdot)$ is the ReLU activation and $\tau$ is a scalar. We show that for the standard initialization used in practice, $\tau = 1/\Omega(\sqrt{L})$ is a sharp value characterizing the stability of the forward/backward process of ResNet, where $L$ is the number of residual blocks. Specifically, stability is guaranteed for $\tau\le 1/\Omega(\sqrt{L})$, while conversely the forward process explodes when $\tau>L^{-\frac{1}{2}+c}$ for a positive constant $c$. Moreover, if ResNet is properly over-parameterized, we show that for $\tau \le 1/\tilde{\Omega}(\sqrt{L})$ gradient descent is guaranteed to find the global minima \footnote{We use $\tilde{\Omega}(\cdot)$ to hide logarithmic factors.}, which significantly enlarges the range $\tau\le 1/\tilde{\Omega}(L)$ that admits global convergence in previous work. We also demonstrate that the over-parameterization requirement of ResNet only weakly depends on the depth, which corroborates the advantage of ResNet over vanilla feedforward networks. Empirically, with $\tau\le 1/\sqrt{L}$, deep ResNet can be easily trained even without a normalization layer. Moreover, adding $\tau=1/\sqrt{L}$ can also improve the performance of ResNet with a normalization layer. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HJe-oRVtPB |
PDF | https://openreview.net/pdf?id=HJe-oRVtPB |
PWC | https://paperswithcode.com/paper/stability-and-convergence-theory-for-learning |
Repo | |
Framework | |
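The block the abstract analyzes, $h_l = \phi(h_{l-1} + \tau \cdot g(h_{l-1}))$ with $\tau = 1/\sqrt{L}$, is straightforward to write down. The sketch below uses a small two-layer MLP for $g$ purely as a placeholder, since the abstract does not fix $g$'s architecture.

```python
import math
import torch
import torch.nn as nn

class ScaledResidualBlock(nn.Module):
    """h_l = ReLU(h_{l-1} + tau * g(h_{l-1})) with tau = 1/sqrt(L)."""
    def __init__(self, dim, num_blocks):
        super().__init__()
        self.tau = 1.0 / math.sqrt(num_blocks)
        self.g = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, h):
        return torch.relu(h + self.tau * self.g(h))

# A deep stack stays well-behaved at initialization even without normalization layers.
L, dim = 100, 64
net = nn.Sequential(*[ScaledResidualBlock(dim, L) for _ in range(L)])
out = net(torch.randn(8, dim))
print(out.norm(dim=1).mean())
```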