Paper Group NANR 117
StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding. Masked Based Unsupervised Content Transfer. Wasserstein Adversarial Regularization (WAR) on label noise. Understanding and Improving Information Transfer in Multi-Task Learning. On the Pareto Efficiency of Quantized CNN. Spike-based causal inference for …
StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding
Title | StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding |
Authors | Anonymous |
Abstract | Recently, the pre-trained language model BERT (and its robustly optimized version, RoBERTa) has attracted a lot of attention in natural language understanding (NLU), and achieved state-of-the-art accuracy in various NLU tasks, such as sentiment classification, natural language inference, semantic textual similarity and question answering. Inspired by the linearization exploration work of Elman, we extend BERT to a new model, StructBERT, by incorporating language structures into pre-training. Specifically, we pre-train StructBERT with two auxiliary tasks to make the most of the sequential order of words and sentences, which leverage language structures at the word and sentence levels, respectively. As a result, the new model is adapted to the different levels of language understanding required by downstream tasks. StructBERT with structural pre-training gives surprisingly good empirical results on a variety of downstream tasks, including pushing the state of the art on the GLUE benchmark to 89.0 (outperforming all published models), the F1 score on SQuAD v1.1 question answering to 93.0, and the accuracy on SNLI to 91.7. |
Tasks | Language Modelling, Natural Language Inference, Question Answering, Semantic Textual Similarity, Sentiment Analysis |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BJgQ4lSFPH |
PDF | https://openreview.net/pdf?id=BJgQ4lSFPH |
PWC | https://paperswithcode.com/paper/structbert-incorporating-language-structures-1 |
Repo | |
Framework | |
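The word-level auxiliary objective mentioned in the abstract trains the model to restore the order of deliberately shuffled tokens. As a hedged illustration of that data-corruption step only (the trigram span length, the function name, and the toy token ids are assumptions, not the paper's exact implementation), here is a minimal sketch:

```python
# Sketch of a word-structure corruption step: shuffle one random trigram of
# input tokens and keep the original tokens as the reconstruction target.
import random

def shuffle_trigram(token_ids, span=3, seed=None):
    """Return (corrupted_ids, target_ids, span_start) for a word-order objective."""
    rng = random.Random(seed)
    if len(token_ids) < span:
        return list(token_ids), list(token_ids), 0
    start = rng.randrange(len(token_ids) - span + 1)
    corrupted = list(token_ids)
    window = corrupted[start:start + span]
    rng.shuffle(window)
    corrupted[start:start + span] = window
    return corrupted, list(token_ids), start

tokens = [101, 2023, 2003, 1037, 7099, 6251, 102]   # toy token ids
print(shuffle_trigram(tokens, seed=0))
```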
Masked Based Unsupervised Content Transfer
Title | Masked Based Unsupervised Content Transfer |
Authors | Anonymous |
Abstract | We consider the problem of translating, in an unsupervised manner, between two domains where one contains some additional information compared to the other. The proposed method disentangles the common and separate parts of these domains and, through the generation of a mask, focuses the attention of the underlying network on the desired augmentation alone, without wastefully reconstructing the entire target. This enables state-of-the-art quality and variety of content translation, as demonstrated through extensive quantitative and qualitative evaluation. Our method is also capable of adding the separate content of different guide images and domains, as well as removing existing separate content. Furthermore, our method enables weakly-supervised semantic segmentation of the separate part of each domain, where only class labels are provided. Our code is available anonymously at http://bit.ly/2mXTizX. |
Tasks | Semantic Segmentation, Weakly-Supervised Semantic Segmentation |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BJe-91BtvH |
PDF | https://openreview.net/pdf?id=BJe-91BtvH |
PWC | https://paperswithcode.com/paper/masked-based-unsupervised-content-transfer |
Repo | |
Framework | |
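The core mechanism described in the abstract, generating only the masked region and copying the rest from the input, comes down to a convex blend between the source image and the generator output. Below is a minimal numpy sketch of that blending step; the shapes, variable names, and the hard-coded rectangular mask are illustrative assumptions (in the method, the mask is predicted by a network):

```python
import numpy as np

def blend_with_mask(x, generated, mask):
    """Compose the output: generated content inside the mask, the input elsewhere."""
    return mask * generated + (1.0 - mask) * x

x = np.zeros((64, 64, 3))            # source image without the separate content
generated = np.ones((64, 64, 3))     # network output for the added content
mask = np.zeros((64, 64, 1))
mask[20:40, 20:40] = 1.0             # attention mask (here hard-coded)
out = blend_with_mask(x, generated, mask)
print(out[30, 30, 0], out[0, 0, 0])  # 1.0 inside the mask, 0.0 outside
```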
Wasserstein Adversarial Regularization (WAR) on label noise
Title | Wasserstein Adversarial Regularization (WAR) on label noise |
Authors | Anonymous |
Abstract | Noisy labels often occur in vision datasets, especially when they are obtained from crowdsourcing or Web scraping. We propose a new regularization method which enables learning robust classifiers in the presence of noisy data. To achieve this goal, we propose a new adversarial regularization scheme based on the Wasserstein distance. Using this distance allows taking into account specific relations between classes by leveraging the geometric properties of the label space. Our Wasserstein Adversarial Regularization (WAR) encodes a selective regularization, which promotes smoothness of the classifier between some classes, while preserving sufficient complexity of the decision boundary between others. We first discuss how and why adversarial regularization can be used in the context of label noise and then show the effectiveness of our method on five datasets corrupted with noisy labels: on both benchmarks and real datasets, WAR outperforms the state-of-the-art competitors. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SJldu6EtDS |
PDF | https://openreview.net/pdf?id=SJldu6EtDS |
PWC | https://paperswithcode.com/paper/wasserstein-adversarial-regularization-war-on |
Repo | |
Framework | |
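WAR builds its regularizer around a Wasserstein distance whose ground cost encodes how semantically close two classes are, so smoothing the classifier toward a similar class is penalized less than toward a dissimilar one. The sketch below only computes that entropic-regularized (Sinkhorn) Wasserstein cost on a made-up 3-class geometry; the adversarial perturbation step of the method is omitted and all numbers are illustrative assumptions.

```python
import numpy as np

def sinkhorn_wasserstein(p, q, cost, eps=0.1, n_iter=200):
    """Entropic-regularized OT cost between discrete distributions p and q."""
    K = np.exp(-cost / eps)
    u = np.ones_like(p)
    for _ in range(n_iter):
        v = q / (K.T @ u)
        u = p / (K @ v)
    plan = u[:, None] * K * v[None, :]
    return float(np.sum(plan * cost))

# classes 0 and 1 are semantically close, class 2 is far away
cost = np.array([[0.0, 0.2, 1.0],
                 [0.2, 0.0, 1.0],
                 [1.0, 1.0, 0.0]])
label = np.array([1.0, 1e-6, 1e-6]); label /= label.sum()          # smoothed one-hot, class 0
pred_near = np.array([0.6, 0.4, 1e-6]); pred_near /= pred_near.sum()
pred_far = np.array([0.6, 1e-6, 0.4]); pred_far /= pred_far.sum()
print(sinkhorn_wasserstein(pred_near, label, cost))                # small penalty
print(sinkhorn_wasserstein(pred_far, label, cost))                 # large penalty
```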
Understanding and Improving Information Transfer in Multi-Task Learning
Title | Understanding and Improving Information Transfer in Multi-Task Learning |
Authors | Anonymous |
Abstract | We investigate multi-task learning approaches which use a shared feature representation for all tasks. To better understand the transfer of task information, we study an architecture with a shared module for all tasks and a separate output module for each task. We study the theory of this setting on linear and ReLU-activated models. Our key observation is that whether or not tasks’ data are well-aligned can significantly affect the performance of multi-task learning. We show that misalignment between task data can cause negative transfer (or hurt performance) and provide sufficient conditions for positive transfer. Inspired by the theoretical insights, we show that aligning tasks’ embedding layers leads to performance gains for multi-task training and transfer learning on the GLUE benchmark and sentiment analysis tasks; for example, we obtained a 2.35% GLUE score average improvement on 5 GLUE tasks over BERT LARGE using our alignment method. We also design an SVD-based task re-weighting scheme and show that it improves the robustness of multi-task training on a multi-label image dataset. |
Tasks | Multi-Task Learning, Sentiment Analysis, Transfer Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SylzhkBtDB |
PDF | https://openreview.net/pdf?id=SylzhkBtDB |
PWC | https://paperswithcode.com/paper/understanding-and-improving-information |
Repo | |
Framework | |
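The abstract's key point is that misaligned task representations cause negative transfer and that aligning them helps. As a hedged sketch of the covariance-alignment intuition only (not the paper's alignment method or its SVD-based re-weighting scheme; the data and names are made up), each task's features can be whitened so that their second-order statistics match before entering the shared module:

```python
import numpy as np

def whiten(X, eps=1e-8):
    """Map one task's features (n, d) to zero mean and approximately identity covariance."""
    Xc = X - X.mean(axis=0, keepdims=True)
    cov = Xc.T @ Xc / max(len(Xc) - 1, 1)
    vals, vecs = np.linalg.eigh(cov)
    W = vecs @ np.diag(1.0 / np.sqrt(vals + eps)) @ vecs.T
    return Xc @ W

rng = np.random.default_rng(0)
task_a = rng.normal(size=(200, 8)) @ rng.normal(size=(8, 8))   # two tasks with
task_b = rng.normal(size=(200, 8)) @ rng.normal(size=(8, 8))   # mismatched covariances
aligned_a, aligned_b = whiten(task_a), whiten(task_b)
# after whitening, both tasks share (near-)identity covariance, i.e. the same geometry
print(float(np.abs(np.cov(aligned_a.T) - np.eye(8)).max()))    # small deviation from identity
```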
On the Pareto Efficiency of Quantized CNN
Title | On the Pareto Efficiency of Quantized CNN |
Authors | Anonymous |
Abstract | Weight quantization for deep convolutional neural networks (CNNs) has shown promising results in compressing and accelerating CNN-powered applications such as semantic segmentation, gesture recognition, and scene understanding. Prior art has shown that different datasets, tasks, and network architectures admit different iso-accurate precision values, which increases the complexity of efficient quantized neural network implementations from both hardware and software perspectives. In this work, we show that when the number of channels is allowed to vary in an iso-model-size scenario, lower precision values Pareto-dominate higher precision ones (in accuracy vs. model size) for networks with standard convolutions. Relying on comprehensive empirical analyses, we find that the Pareto-optimal precision value of a convolution layer depends on the number of input channels per output filter, and we provide theoretical insights for it. Building on this, we develop a simple algorithm to select the precision values for CNNs that outperforms corresponding 8-bit quantized networks by 0.9% and 2.2% in top-1 accuracy on ImageNet for ResNet50 and MobileNetV2, respectively. |
Tasks | Gesture Recognition, Quantization, Scene Understanding, Semantic Segmentation |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BJeWVpNtwr |
PDF | https://openreview.net/pdf?id=BJeWVpNtwr |
PWC | https://paperswithcode.com/paper/on-the-pareto-efficiency-of-quantized-cnn |
Repo | |
Framework | |
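The iso-model-size argument can be made concrete with a one-line calculation: for standard convolutions the parameter count grows roughly with the square of a uniform channel multiplier, so at a fixed model size a lower bit width buys proportionally more channels. This back-of-the-envelope sketch is illustrative only and is not the paper's precision-selection algorithm:

```python
def width_multiplier(bits, reference_bits=8.0):
    """Channel multiplier keeping a standard-conv model's size fixed when precision changes."""
    # size ~ (multiplier ** 2) * params_ref * bits  =>  solve for the multiplier
    return (reference_bits / bits) ** 0.5

for b in (8, 4, 2, 1):
    print(f"{b}-bit weights -> ~{width_multiplier(b):.2f}x wider at equal model size")
```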
Spike-based causal inference for weight alignment
Title | Spike-based causal inference for weight alignment |
Authors | Anonymous |
Abstract | In artificial neural networks trained with gradient descent, the weights used for processing stimuli are also used during backward passes to calculate gradients. For the real brain to approximate gradients, gradient information would have to be propagated separately, such that one set of synaptic weights is used for processing and another set is used for backward passes. This produces the so-called “weight transport problem” for biological models of learning, where the backward weights used to calculate gradients need to mirror the forward weights used to process stimuli. This weight transport problem has been considered so hard that popular proposals for biological learning assume that the backward weights are simply random, as in the feedback alignment algorithm. However, such random weights do not appear to work well for large networks. Here we show how the discontinuity introduced in a spiking system can lead to a solution to this problem. The resulting algorithm is a special case of an estimator used for causal inference in econometrics, regression discontinuity design. We show empirically that this algorithm rapidly makes the backward weights approximate the forward weights. As the backward weights become correct, this improves learning performance over feedback alignment on tasks such as Fashion-MNIST and CIFAR-10. Our results demonstrate that a simple learning rule in a spiking network can allow neurons to produce the right backward connections and thus solve the weight transport problem. |
Tasks | Causal Inference |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rJxWxxSYvB |
PDF | https://openreview.net/pdf?id=rJxWxxSYvB |
PWC | https://paperswithcode.com/paper/spike-based-causal-inference-for-weight |
Repo | |
Framework | |
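The estimator named in the abstract, regression discontinuity design, compares outcomes just below and just above a threshold (here, the spiking threshold) to estimate a causal effect. The sketch below is the generic local-linear RDD estimate on synthetic data, not the paper's spiking learning rule; the bandwidth and the toy data are assumptions.

```python
import numpy as np

def rdd_effect(x, y, threshold=0.0, bandwidth=1.0):
    """Local-linear regression-discontinuity estimate of the jump in y at x = threshold."""
    left = (x >= threshold - bandwidth) & (x < threshold)
    right = (x >= threshold) & (x <= threshold + bandwidth)
    fit = lambda mask: np.polyfit(x[mask] - threshold, y[mask], deg=1)
    return np.polyval(fit(right), 0.0) - np.polyval(fit(left), 0.0)

rng = np.random.default_rng(0)
drive = rng.uniform(-2, 2, size=5000)         # input drive relative to the spike threshold
outcome = 0.5 * drive + 1.0 * (drive >= 0) + rng.normal(0, 0.1, size=5000)
print(rdd_effect(drive, outcome))             # close to the true jump of 1.0
```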
Understanding the functional and structural differences across excitatory and inhibitory neurons
Title | Understanding the functional and structural differences across excitatory and inhibitory neurons |
Authors | Anonymous |
Abstract | One of the most fundamental organizational principles of the brain is the separation of excitatory (E) and inhibitory (I) neurons. In addition to their opposing effects on post-synaptic neurons, E and I cells tend to differ in their selectivity and connectivity. Although many such differences have been characterized experimentally, it is not clear why they exist in the first place. We studied this question in deep networks equipped with E and I cells. We found that salient distinctions between E and I neurons emerge across various deep convolutional recurrent networks trained to perform standard object classification tasks. We explored the necessary conditions for the networks to develop distinct selectivity and connectivity across cell types. We found that neurons that project to higher-order areas will have greater stimulus selectivity, regardless of whether they are excitatory or not. Sparser connectivity is required for higher selectivity, but only when the recurrent connections are excitatory. These findings demonstrate that the functional and structural differences observed across E and I neurons are not independent, and can be explained using a smaller number of factors. |
Tasks | Object Classification |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=S1xsG0VYvB |
PDF | https://openreview.net/pdf?id=S1xsG0VYvB |
PWC | https://paperswithcode.com/paper/understanding-the-functional-and-structural |
Repo | |
Framework | |
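One common way to equip a deep or recurrent network with separate E and I cells (a plausible reading of "deep networks equipped with E and I cells" above, though not necessarily the paper's exact parameterization) is a Dale's-law sign constraint: every presynaptic unit's outgoing weights share one sign. A minimal sketch:

```python
import numpy as np

def dale_constrained(raw_weights, cell_sign):
    """Project raw weights onto Dale's law: each row (presynaptic unit) is all-E or all-I."""
    return np.abs(raw_weights) * cell_sign[:, None]

rng = np.random.default_rng(0)
n = 10
cell_sign = np.where(np.arange(n) < 8, 1.0, -1.0)           # 80% excitatory, 20% inhibitory
W = dale_constrained(rng.normal(size=(n, n)), cell_sign)
print(bool((W[:8] >= 0).all()), bool((W[8:] <= 0).all()))   # True True
```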
Learning in Confusion: Batch Active Learning with Noisy Oracle
Title | Learning in Confusion: Batch Active Learning with Noisy Oracle |
Authors | Anonymous |
Abstract | We study the problem of training machine learning models incrementally using active learning with access to imperfect or noisy oracles. We specifically consider the setting of batch active learning, in which multiple samples are selected, as opposed to a single sample as in classical settings, so as to reduce the training overhead. Our approach bridges between uniform randomness and score-based importance sampling of clusters when selecting a batch of new samples. Experiments on benchmark image classification datasets (MNIST, SVHN, and CIFAR10) show improvement over existing active learning strategies. We introduce an extra denoising layer to deep networks to make active learning robust to label noise and show significant improvements. |
Tasks | Active Learning, Denoising, Image Classification |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SJxIkkSKwB |
PDF | https://openreview.net/pdf?id=SJxIkkSKwB |
PWC | https://paperswithcode.com/paper/learning-in-confusion-batch-active-learning-1 |
Repo | |
Framework | |
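The selection rule described above interpolates between uniform random sampling and score-based importance sampling over clusters. The sketch below is one hedged way to write that interpolation; the mixing weight, the per-cluster mean aggregation, and the toy scores are assumptions, and the paper's denoising layer is not shown.

```python
import numpy as np

def batch_selection_probs(scores, cluster_ids, mix=0.5):
    """Blend uniform sampling with cluster-level, score-based importance sampling."""
    scores, cluster_ids = np.asarray(scores, float), np.asarray(cluster_ids)
    cluster_score = {c: scores[cluster_ids == c].mean() for c in np.unique(cluster_ids)}
    importance = np.array([cluster_score[c] for c in cluster_ids])
    importance /= importance.sum()
    uniform = np.full(len(scores), 1.0 / len(scores))
    return mix * uniform + (1.0 - mix) * importance

scores = [0.9, 0.8, 0.1, 0.2, 0.5, 0.4]          # e.g. predictive-uncertainty scores
clusters = [0, 0, 1, 1, 2, 2]
p = batch_selection_probs(scores, clusters, mix=0.3)
batch = np.random.default_rng(0).choice(len(scores), size=3, replace=False, p=p)
print(p.round(3), batch)                          # high-score clusters favoured, but not exclusively
```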
Samples Are Useful? Not Always: denoising policy gradient updates using variance explained
Title | Samples Are Useful? Not Always: denoising policy gradient updates using variance explained |
Authors | Anonymous |
Abstract | Policy gradient algorithms in reinforcement learning optimize the policy directly and rely on efficiently sampling an environment. However, while most sampling procedures are based solely on sampling the agent’s policy, other measures directly accessible through these algorithms could be used to improve sampling before each policy update. Following this line of thought, we propose the use of SAUNA, a method where transitions are rejected from the gradient updates if they do not meet a particular criterion, and kept otherwise. This criterion, the fraction of variance explained Vex, is a measure of the discrepancy between a model and actual samples. In this work, Vex is used to evaluate the impact each transition will have on learning: this criterion refines sampling and improves the policy gradient algorithm. In this paper: (a) we introduce and explore Vex, the criterion used for denoising policy gradient updates; (b) we conduct experiments across a variety of benchmark environments, including standard continuous control problems, and our results show better performance with SAUNA; (c) we investigate why Vex provides a reliable assessment for the selection of samples that will positively impact learning; and (d) we show how this criterion can work as a dynamic tool to adjust the ratio between exploration and exploitation. |
Tasks | Continuous Control, Denoising |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=B1g_BT4FvS |
PDF | https://openreview.net/pdf?id=B1g_BT4FvS |
PWC | https://paperswithcode.com/paper/samples-are-useful-not-always-denoising |
Repo | |
Framework | |
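The criterion Vex is the standard "fraction of variance explained" between empirical returns and the value function's predictions; SAUNA uses it to decide which transitions enter the gradient update. The sketch below computes Vex only; the rejection threshold and the toy numbers are assumptions, not the paper's settings.

```python
import numpy as np

def variance_explained(returns, values):
    """Vex = 1 - Var(returns - values) / Var(returns)."""
    returns, values = np.asarray(returns, float), np.asarray(values, float)
    return 1.0 - np.var(returns - values) / (np.var(returns) + 1e-8)

returns = [1.0, 2.0, 0.5, 1.5]
good_values = [0.9, 2.1, 0.6, 1.4]                # value function tracks the returns
bad_values = [2.0, 0.1, 1.9, 0.2]                 # value function explains nothing
print(variance_explained(returns, good_values))   # close to 1
print(variance_explained(returns, bad_values))    # negative: a candidate for rejection
```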
Needles in Haystacks: On Classifying Tiny Objects in Large Images
Title | Needles in Haystacks: On Classifying Tiny Objects in Large Images |
Authors | Anonymous |
Abstract | In some important computer vision domains, such as medical or hyperspectral imaging, we care about the classification of tiny objects in large images. However, most Convolutional Neural Networks (CNNs) for image classification were developed using biased datasets that contain large objects, in mostly central image positions. To assess whether classical CNN architectures work well for tiny object classification, we build a comprehensive testbed containing two datasets: one derived from MNIST digits and one from histopathology images. This testbed allows controlled experiments to stress-test CNN architectures with a broad spectrum of signal-to-noise ratios. Our observations indicate that: (1) there exists a limit to the signal-to-noise ratio below which CNNs fail to generalize, and this limit is affected by dataset size - more data leads to better performance; however, the amount of training data required for the model to generalize scales rapidly with the inverse of the object-to-image ratio; (2) in general, higher-capacity models exhibit better generalization; (3) when the approximate object sizes are known, adapting the receptive field is beneficial; and (4) for very small signal-to-noise ratios the choice of global pooling operation affects optimization, whereas for relatively large signal-to-noise values all tested global pooling operations exhibit similar performance. |
Tasks | Image Classification, Object Classification |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=H1xTup4KPr |
PDF | https://openreview.net/pdf?id=H1xTup4KPr |
PWC | https://paperswithcode.com/paper/needles-in-haystacks-on-classifying-tiny-1 |
Repo | |
Framework | |
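The testbed described above controls the object-to-image ratio by placing a tiny object inside a much larger, mostly uninformative canvas. Here is a hedged sketch of how such an example can be constructed; the canvas size, noise level, and the all-ones stand-in "digit" are assumptions, not the benchmark's exact recipe.

```python
import numpy as np

def embed_tiny_object(obj, canvas_size=256, noise_std=0.1, seed=0):
    """Place a small object patch at a random position in a large noisy canvas."""
    rng = np.random.default_rng(seed)
    canvas = rng.normal(0.0, noise_std, size=(canvas_size, canvas_size))
    h, w = obj.shape
    top = rng.integers(0, canvas_size - h)
    left = rng.integers(0, canvas_size - w)
    canvas[top:top + h, left:left + w] += obj
    return canvas, (h * w) / canvas_size ** 2     # image and its object-to-image ratio

digit = np.ones((28, 28))                         # stand-in for an MNIST digit
image, ratio = embed_tiny_object(digit)
print(image.shape, f"object-to-image ratio = {ratio:.4f}")
```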
Functional Regularisation for Continual Learning with Gaussian Processes
Title | Functional Regularisation for Continual Learning with Gaussian Processes |
Authors | Anonymous |
Abstract | We introduce a framework for Continual Learning (CL) based on Bayesian inference over the function space rather than the parameters of a deep neural network. This method, referred to as functional regularisation for Continual Learning, avoids forgetting a previous task by constructing and memorising an approximate posterior belief over the underlying task-specific function. To achieve this we rely on a Gaussian process obtained by treating the weights of the last layer of a neural network as random and Gaussian distributed. Then, the training algorithm sequentially encounters tasks and constructs posterior beliefs over the task-specific functions by using inducing point sparse Gaussian process methods. At each step a new task is first learnt and then a summary is constructed consisting of (i) inducing inputs – a fixed-size subset of the task inputs selected such that it optimally represents the task – and (ii) a posterior distribution over the function values at these inputs. This summary then regularises learning of future tasks, through Kullback-Leibler regularisation terms. Our method thus unites approaches focused on (pseudo-)rehearsal with those derived from a sequential Bayesian inference perspective in a principled way, leading to strong results on accepted benchmarks. |
Tasks | Bayesian Inference, Continual Learning, Gaussian Processes |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HkxCzeHFDB |
PDF | https://openreview.net/pdf?id=HkxCzeHFDB |
PWC | https://paperswithcode.com/paper/functional-regularisation-for-continual-1 |
Repo | |
Framework | |
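The regulariser sketched in the abstract is a Kullback-Leibler term between the posterior stored for an old task's function values at its inducing inputs and the current model's belief at the same inputs. Below is only the closed-form Gaussian KL that such a term relies on, with made-up 3-point summaries; how the summaries are built, selected, and weighted is the method itself.

```python
import numpy as np

def gaussian_kl(mu_q, cov_q, mu_p, cov_p):
    """KL( N(mu_q, cov_q) || N(mu_p, cov_p) ) for function values at inducing inputs."""
    d = len(mu_q)
    cov_p_inv = np.linalg.inv(cov_p)
    diff = mu_p - mu_q
    return 0.5 * (np.trace(cov_p_inv @ cov_q) + diff @ cov_p_inv @ diff - d
                  + np.log(np.linalg.det(cov_p) / np.linalg.det(cov_q)))

mu_old, cov_old = np.zeros(3), np.eye(3)                       # summary stored for an old task
mu_new, cov_new = np.array([0.2, -0.1, 0.0]), 0.9 * np.eye(3)  # current belief at the same inputs
print(gaussian_kl(mu_new, cov_new, mu_old, cov_old))           # penalty added while learning a new task
```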
Online Learned Continual Compression with Stacked Quantization Modules
Title | Online Learned Continual Compression with Stacked Quantization Modules |
Authors | Anonymous |
Abstract | We introduce and study the problem of Online Continual Compression, where one attempts to learn to compress and store a representative dataset from a non i.i.d data stream, while only observing each sample once. This problem is highly relevant for downstream online continual learning tasks, as well as standard learning methods under resource constrained data collection. We propose a new architecture which stacks Quantization Modules (SQM), consisting of a series of discrete autoencoders, each equipped with their own memory. Every added module is trained to reconstruct the latent space of the previous module using fewer bits, allowing the learned representation to become more compact as training progresses. This modularity has several advantages: 1) moderate compressions are quickly available early in training, which is crucial for remembering the early tasks, 2) as more data needs to be stored, earlier data becomes more compressed, freeing memory, 3) unlike previous methods, our approach does not require pretraining, even on challenging datasets. We show several potential applications of this method. We first replace the episodic memory used in Experience Replay with SQM, leading to significant gains on standard continual learning benchmarks using a fixed memory budget. We then apply our method to compressing larger images like those from Imagenet, and show that it is also effective with other modalities, such as LiDAR data. |
Tasks | Continual Learning, Quantization |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=S1xHfxHtPr |
PDF | https://openreview.net/pdf?id=S1xHfxHtPr |
PWC | https://paperswithcode.com/paper/online-learned-continual-compression-with |
Repo | |
Framework | |
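SQM's modules are learned discrete autoencoders, but the idea that each added module re-encodes the previous one with fewer bits can be illustrated with plain scalar quantization. The sketch below is only that toy illustration, with uniform quantization standing in for the learned codebooks:

```python
import numpy as np

def uniform_quantize(z, bits):
    """Quantize values in [0, 1] to 2**bits levels; return codes and their reconstruction."""
    levels = 2 ** bits - 1
    codes = np.round(np.clip(z, 0.0, 1.0) * levels).astype(np.int32)
    return codes, codes / levels

z = np.random.default_rng(0).uniform(size=16)     # stand-in for a latent vector
recon = z
for bits in (8, 4, 2):                            # each later module uses fewer bits
    codes, recon = uniform_quantize(recon, bits)
    print(f"{bits}-bit module: mean abs error {np.abs(recon - z).mean():.3f}")
```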
Continual Learning with Delayed Feedback
Title | Continual Learning with Delayed Feedback |
Authors | Anonymous |
Abstract | Most artificial neural networks rely on labeled datasets, whereas in the human brain learning is often unsupervised. The feedback, or label, for a given input or sensory stimulus is often not available instantly; after some time, when the brain receives the feedback, it updates its knowledge. That is how the brain learns. Moreover, there is no separate training or testing phase: humans learn continually. This work proposes a model-agnostic continual learning framework which can be used with neural networks as well as decision trees. Specifically, this work investigates how delayed feedback can be handled. In addition, a way to update machine learning models with unlabeled data is proposed. Promising results are obtained in experiments on neural networks and decision trees. |
Tasks | Continual Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SJlgTJHKwB |
PDF | https://openreview.net/pdf?id=SJlgTJHKwB |
PWC | https://paperswithcode.com/paper/continual-learning-with-delayed-feedback |
Repo | |
Framework | |
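The delayed-feedback setting can be pictured as a thin wrapper that predicts immediately, remembers the input, and only updates the model once the label arrives. The sketch assumes a scikit-learn-style incremental model exposing predict and partial_fit; the wrapper itself is an illustration, not the paper's framework.

```python
class DelayedFeedbackLearner:
    """Predict now, store the input, update the wrapped model when its label finally arrives."""

    def __init__(self, model):
        self.model = model          # any incremental learner with predict / partial_fit
        self.pending = {}           # input id -> stored features

    def predict(self, x_id, x):
        self.pending[x_id] = x
        return self.model.predict(x)

    def receive_feedback(self, x_id, label):
        x = self.pending.pop(x_id, None)
        if x is not None:
            self.model.partial_fit(x, label)   # incremental update with the delayed label
```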
Task-agnostic Continual Learning via Growing Long-Term Memory Networks
Title | Task-agnostic Continual Learning via Growing Long-Term Memory Networks |
Authors | Anonymous |
Abstract | As our experience shows, humans can learn and deploy a myriad of different skills to tackle the situations they encounter daily. Neural networks, in contrast, have a fixed memory capacity that prevents them from learning more than a few sets of skills before starting to forget them. In this work, we make a step to bridge neural networks with human-like learning capabilities. For this, we propose a model with a growing and open-bounded memory capacity that can be accessed based on the model’s current demands. To test this system, we introduce a continual learning task based on language modelling where the model is exposed to multiple languages and domains in sequence, without providing any explicit signal on the type of input it is currently dealing with. The proposed system exhibits improved adaptation skills in that it can recover faster than comparable baselines after a switch in the input language or domain. |
Tasks | Continual Learning, Language Modelling |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rJxoi1HtPr |
PDF | https://openreview.net/pdf?id=rJxoi1HtPr |
PWC | https://paperswithcode.com/paper/task-agnostic-continual-learning-via-growing |
Repo | |
Framework | |
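The growing, open-bounded memory in the abstract can be pictured as a key-value store that only ever appends and is read by nearest-neighbour lookup on the current context. This is a minimal sketch of that access pattern, not the paper's architecture; the keys, values, and dot-product similarity are assumptions.

```python
import numpy as np

class GrowingMemory:
    """Open-bounded key-value memory: write every new entry, read by nearest-neighbour lookup."""

    def __init__(self):
        self.keys, self.values = [], []

    def write(self, key, value):
        self.keys.append(np.asarray(key, float))
        self.values.append(value)

    def read(self, query):
        if not self.keys:
            return None
        sims = np.stack(self.keys) @ np.asarray(query, float)   # dot-product similarity
        return self.values[int(np.argmax(sims))]

memory = GrowingMemory()
memory.write([1.0, 0.0], "english-news module")
memory.write([0.0, 1.0], "french-wiki module")
print(memory.read([0.9, 0.1]))     # retrieves the entry closest to the current context
```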
Uncertainty-guided Continual Learning with Bayesian Neural Networks
Title | Uncertainty-guided Continual Learning with Bayesian Neural Networks |
Authors | Anonymous |
Abstract | Continual learning aims to learn new tasks without forgetting previously learned ones. This is especially challenging when one cannot access data from previous tasks and when the model has a fixed capacity. Current regularization-based continual learning algorithms need an external representation and extra computation to measure the parameters’ importance. In contrast, we propose Uncertainty-guided Continual Bayesian Neural Networks (UCB), where the learning rate adapts according to the uncertainty defined in the probability distribution of the weights in the network. Uncertainty is a natural way to identify what to remember and what to change as we continually learn, and thus to mitigate catastrophic forgetting. We also show a variant of our model which uses uncertainty for weight pruning and retains task performance after pruning by saving binary masks per task. We evaluate our UCB approach extensively on diverse object classification datasets with short and long sequences of tasks and report superior or on-par performance compared to existing approaches. Additionally, we show that our model does not necessarily need task information at test time, i.e., it does not presume knowledge of which task a sample belongs to. |
Tasks | Continual Learning, Object Classification |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HklUCCVKDB |
PDF | https://openreview.net/pdf?id=HklUCCVKDB |
PWC | https://paperswithcode.com/paper/uncertainty-guided-continual-learning-with-1 |
Repo | |
Framework | |
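The central idea above is to adapt each weight's learning rate to its posterior uncertainty, so confidently learned (low-variance) weights barely move on new tasks. One simple mapping from uncertainty to step size is sketched below; the exact scaling used by UCB is the paper's own and the numbers here are illustrative.

```python
import numpy as np

def uncertainty_scaled_lr(base_lr, sigma):
    """Per-parameter learning rates that shrink for low-uncertainty (important) weights."""
    sigma = np.asarray(sigma, float)
    return base_lr * sigma / sigma.max()

sigma = np.array([0.01, 0.2, 0.05, 0.5])      # posterior std dev of each weight
print(uncertainty_scaled_lr(0.1, sigma))      # certain weights get the smallest steps
```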