April 1, 2020

3047 words 15 mins read

Paper Group NANR 94

Multi-Precision Policy Enforced Training (MuPPET) : A precision-switching strategy for quantised fixed-point training of CNNs

Title Multi-Precision Policy Enforced Training (MuPPET) : A precision-switching strategy for quantised fixed-point training of CNNs
Authors Anonymous
Abstract Large-scale convolutional neural networks (CNNs) suffer from very long training times, spanning from hours to weeks, limiting the productivity and experimentation of deep learning practitioners. As networks grow in size and complexity, one approach to reducing training time is the use of low-precision data representation and computations during the training stage. However, in doing so, the final accuracy suffers due to the problem of vanishing gradients. Existing state-of-the-art methods combat this issue by means of a mixed-precision approach employing two different precision levels, FP32 (32-bit floating-point precision) and FP16/FP8 (16-/8-bit floating-point precision), leveraging the hardware support of recent GPU architectures for FP16 operations to obtain performance gains. This work pushes the boundary of quantised training by employing a multilevel optimisation approach that utilises multiple precisions including low-precision fixed-point representations. The training strategy, named MuPPET, combines the use of multiple number representation regimes together with a precision-switching mechanism that decides at run time the transition between different precisions. Overall, the proposed strategy tailors the training process to the capabilities of the targeted hardware architecture and yields improvements in training time and energy efficiency compared to state-of-the-art approaches. Applying MuPPET to the training of AlexNet, ResNet18 and GoogLeNet on ImageNet (ILSVRC12) and targeting an NVIDIA Turing GPU, the proposed method achieves the same accuracy as standard full-precision training with an average training-time speedup of 1.28× across the networks.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=H1xzdlStvB
PDF https://openreview.net/pdf?id=H1xzdlStvB
PWC https://paperswithcode.com/paper/multi-precision-policy-enforced-training
Repo
Framework
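
The abstract describes a run-time policy that climbs from low-precision fixed-point regimes toward full precision. Below is a minimal, hypothetical sketch of that idea, assuming the switch is triggered when a validation metric stalls; the metric, the patience threshold, and the precision ladder are illustrative assumptions, not MuPPET's actual decision criterion.

```python
# Hypothetical precision-switching policy (not MuPPET's actual metric):
# climb a ladder of precisions whenever validation accuracy stops
# improving for `patience` consecutive epochs.

PRECISION_LADDER = ["int8", "int12", "int16", "fp32"]  # assumed regimes

class PrecisionSwitcher:
    def __init__(self, patience=3):
        self.level = 0
        self.patience = patience
        self.best = float("-inf")
        self.stale_epochs = 0

    @property
    def precision(self):
        return PRECISION_LADDER[self.level]

    def update(self, val_accuracy):
        """Call once per epoch; returns the precision to use next epoch."""
        if val_accuracy > self.best:
            self.best = val_accuracy
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
        # Switch to the next, higher precision once progress stalls.
        if self.stale_epochs >= self.patience and self.level < len(PRECISION_LADDER) - 1:
            self.level += 1
            self.stale_epochs = 0
        return self.precision


# Usage: a training loop would quantise weights/activations to
# switcher.precision each epoch (the quantisation itself is omitted here).
switcher = PrecisionSwitcher(patience=2)
for epoch, acc in enumerate([0.41, 0.52, 0.53, 0.53, 0.53, 0.60]):
    print(epoch, switcher.update(acc))
```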

Feature-map-level Online Adversarial Knowledge Distillation

Title Feature-map-level Online Adversarial Knowledge Distillation
Authors Anonymous
Abstract Feature maps contain rich information about image intensity and spatial correlation. However, previous online knowledge distillation methods only utilize the class probabilities. Thus, in this paper, we propose an online knowledge distillation method that transfers not only the knowledge of the class probabilities but also that of the feature map, using the adversarial training framework. We train multiple networks simultaneously by employing discriminators to distinguish the feature map distributions of different networks. Each network has its corresponding discriminator, which discriminates the feature map from its own network as fake while classifying that of the other network as real. By training a network to fool the corresponding discriminator, it can learn the other network’s feature map distribution. Discriminators and networks are trained concurrently in a minimax two-player game. Also, we propose a novel cyclic learning scheme for training more than two networks together. We have applied our method to various network architectures on the classification task and observed a significant improvement in performance, especially when training a pair of a small network and a large one.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=Bkl086VYvH
PDF https://openreview.net/pdf?id=Bkl086VYvH
PWC https://paperswithcode.com/paper/feature-map-level-online-adversarial
Repo
Framework
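
A toy PyTorch sketch of the adversarial feature-map transfer described above, for a pair of networks: each discriminator labels its own network's feature map as fake and the other's as real, and each network is trained to fool its own discriminator. The architectures, batch shapes, and the use of plain BCE-with-logits losses are illustrative assumptions, not the paper's exact setup.

```python
import torch
import torch.nn as nn

# Toy sketch: two small CNNs exchange feature-map knowledge adversarially.
# D1 labels net1's feature map as fake (0) and net2's as real (1); net1 is
# trained to fool D1, and symmetrically for D2 / net2. In practice the
# networks and discriminators are updated alternately by separate optimizers.

def feature_extractor():
    return nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(16, 16, 3, padding=1), nn.ReLU())

def discriminator():
    return nn.Sequential(nn.Conv2d(16, 8, 3, stride=2, padding=1), nn.ReLU(),
                         nn.Flatten(), nn.LazyLinear(1))

net1, net2 = feature_extractor(), feature_extractor()
d1, d2 = discriminator(), discriminator()
bce = nn.BCEWithLogitsLoss()

x = torch.randn(4, 3, 32, 32)          # dummy batch
f1, f2 = net1(x), net2(x)

# Discriminator losses: own feature map -> fake, other's -> real.
d1_loss = bce(d1(f1.detach()), torch.zeros(4, 1)) + bce(d1(f2.detach()), torch.ones(4, 1))
d2_loss = bce(d2(f2.detach()), torch.zeros(4, 1)) + bce(d2(f1.detach()), torch.ones(4, 1))

# Network ("generator") losses: fool the corresponding discriminator.
g1_loss = bce(d1(f1), torch.ones(4, 1))
g2_loss = bce(d2(f2), torch.ones(4, 1))
```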

UNPAIRED POINT CLOUD COMPLETION ON REAL SCANS USING ADVERSARIAL TRAINING

Title UNPAIRED POINT CLOUD COMPLETION ON REAL SCANS USING ADVERSARIAL TRAINING
Authors Anonymous
Abstract As 3D scanning solutions become increasingly popular, several deep learning setups have been developed for the task of scan completion, i.e., plausibly filling in regions that were missed in the raw scans. These methods, however, largely rely on supervision in the form of paired training data, i.e., partial scans with corresponding desired completed scans. While these methods have been successfully demonstrated on synthetic data, the approaches cannot be directly used on real scans in the absence of suitable paired training data. We develop a first approach that works directly on input point clouds, does not require paired training data, and hence can be applied directly to real scans for scan completion. We evaluate the approach qualitatively on several real-world datasets (ScanNet, Matterport3D, KITTI), quantitatively on the 3D-EPN shape completion benchmark, and demonstrate realistic completions under varying levels of incompleteness.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=HkgrZ0EYwB
PDF https://openreview.net/pdf?id=HkgrZ0EYwB
PWC https://paperswithcode.com/paper/unpaired-point-cloud-completion-on-real-scans-1
Repo
Framework

A Probabilistic Formulation of Unsupervised Text Style Transfer

Title A Probabilistic Formulation of Unsupervised Text Style Transfer
Authors Anonymous
Abstract We present a deep generative model for unsupervised text style transfer that unifies previously proposed non-generative techniques. Our probabilistic approach models non-parallel data from two domains as a partially observed parallel corpus. By hypothesizing a parallel latent sequence that generates each observed sequence, our model learns to transform sequences from one domain to another in a completely unsupervised fashion. In contrast with traditional generative sequence models (e.g. the HMM), our model makes few assumptions about the data it generates: it uses a recurrent language model as a prior and an encoder-decoder as a transduction distribution. While computation of marginal data likelihood is intractable in this model class, we show that amortized variational inference admits a practical surrogate. Further, by drawing connections between our variational objective and other recent unsupervised style transfer and machine translation techniques, we show how our probabilistic view can unify some known non-generative objectives such as backtranslation and adversarial loss. Finally, we demonstrate the effectiveness of our method on a wide range of unsupervised style transfer tasks, including sentiment transfer, formality transfer, word decipherment, author imitation, and related language translation. Across all style transfer tasks, our approach yields substantial gains over state-of-the-art non-generative baselines, including the state-of-the-art unsupervised machine translation techniques that our approach generalizes. Further, we conduct experiments on a standard unsupervised machine translation task and find that our unified approach matches the current state-of-the-art.
Tasks Language Modelling, Machine Translation, Style Transfer, Text Style Transfer, Unsupervised Machine Translation
Published 2020-01-01
URL https://openreview.net/forum?id=HJlA0C4tPS
PDF https://openreview.net/pdf?id=HJlA0C4tPS
PWC https://paperswithcode.com/paper/a-probabilistic-formulation-of-unsupervised
Repo
Framework

Neural Network Branching for Neural Network Verification

Title Neural Network Branching for Neural Network Verification
Authors Anonymous
Abstract Formal verification of neural networks is essential for their deployment in safety-critical areas. Many available formal verification methods have been shown to be instances of a unified Branch and Bound (BaB) formulation. We propose a novel framework for designing an effective branching strategy for BaB. Specifically, we learn a graph neural network (GNN) to imitate the strong branching heuristic behaviour. Our framework differs from previous methods for learning to branch in two main aspects. Firstly, our framework directly treats the neural network we want to verify as a graph input for the GNN. Secondly, we develop an intuitive forward and backward embedding update schedule. Empirically, our framework achieves a roughly 50% reduction in both the number of branches and the time required for verification on various convolutional networks when compared to the best available hand-designed branching strategy. In addition, we show that our GNN model enjoys both horizontal and vertical transferability. Horizontally, the model trained on easy properties performs well on properties of increased difficulty levels. Vertically, the model trained on small neural networks achieves similar performance on large neural networks.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=B1evfa4tPB
PDF https://openreview.net/pdf?id=B1evfa4tPB
PWC https://paperswithcode.com/paper/neural-network-branching-for-neural-network
Repo
Framework
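
A toy sketch of the imitation-learning step behind learning to branch: a scorer is trained to reproduce expensive strong-branching scores collected offline, then used to pick branching candidates cheaply at verification time. The per-candidate features, the plain MLP scorer, and the MSE objective are illustrative assumptions; the paper embeds the verified network itself with a GNN.

```python
import torch
import torch.nn as nn

# Toy imitation sketch: a small MLP scores branching candidates from assumed
# precomputed features and is trained to match strong-branching scores.

scorer = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(scorer.parameters(), lr=1e-3)

# Dummy dataset: 100 branching states, 20 candidates each, 8 features per
# candidate, plus the (expensive) strong-branching score for every candidate.
features = torch.randn(100, 20, 8)
strong_scores = torch.randn(100, 20)

for _ in range(5):                                  # a few illustrative epochs
    pred = scorer(features).squeeze(-1)             # (100, 20)
    loss = nn.functional.mse_loss(pred, strong_scores)
    opt.zero_grad()
    loss.backward()
    opt.step()

# At verification time, branch on the candidate with the highest predicted score.
best_candidate = scorer(features[0]).squeeze(-1).argmax()
```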

Bayesian Meta Sampling for Fast Uncertainty Adaptation

Title Bayesian Meta Sampling for Fast Uncertainty Adaptation
Authors Anonymous
Abstract Meta learning has been making impressive progress for fast model adaptation. However, limited work has been done on learning fast uncertainty adaptation for Bayesian modeling. In this paper, we propose to achieve this goal by placing meta learning on the space of probability measures, inducing the concept of meta sampling for fast uncertainty adaptation. Specifically, we propose a Bayesian meta sampling framework consisting of two main components: a meta sampler and a sample adapter. The meta sampler is constructed by adopting a neural-inverse-autoregressive-flow (NIAF) structure, a variant of the recently proposed neural autoregressive flows, to efficiently generate meta samples to be adapted. The sample adapter moves meta samples to task-specific samples, based on a newly proposed and general Bayesian sampling technique, called optimal-transport Bayesian sampling. The combination of the two components allows a simple learning procedure for the meta sampler to be developed, which can be efficiently optimized via standard back-propagation. Extensive experimental results demonstrate the efficiency and effectiveness of the proposed framework, obtaining better sample quality and faster uncertainty adaptation compared to related methods.
Tasks Meta-Learning
Published 2020-01-01
URL https://openreview.net/forum?id=Bkxv90EKPB
PDF https://openreview.net/pdf?id=Bkxv90EKPB
PWC https://paperswithcode.com/paper/bayesian-meta-sampling-for-fast-uncertainty
Repo
Framework

Learning to Contextually Aggregate Multi-Source Supervision for Sequence Labeling

Title Learning to Contextually Aggregate Multi-Source Supervision for Sequence Labeling
Authors Anonymous
Abstract Sequence labeling is a fundamental framework for various natural language processing problems, including part-of-speech tagging and named entity recognition. Its performance is largely influenced by annotation quality and quantity in supervised learning scenarios. In many cases, ground truth labels are costly and time-consuming to collect, or even non-existent, while imperfect ones can be easily accessed or transferred from different domains. A typical example is crowd-sourced datasets, which have multiple annotations for each sentence that may be noisy or incomplete. Additionally, predictions from multiple source models in transfer learning can be seen as a case of multi-source supervision. In this paper, we propose a novel framework named Consensus Network (CONNET) to conduct training with imperfect annotations from multiple sources. It learns a representation for every weak supervision source and dynamically aggregates them by a context-aware attention mechanism. Finally, it leads to a model reflecting the consensus among multiple sources. We evaluate the proposed framework in two practical settings of multi-source learning: learning with crowd annotations and unsupervised cross-domain model adaptation. Extensive experimental results show that our model achieves significant improvements over existing methods in both settings.
Tasks Named Entity Recognition, Part-Of-Speech Tagging, Transfer Learning
Published 2020-01-01
URL https://openreview.net/forum?id=HJe9cR4KvB
PDF https://openreview.net/pdf?id=HJe9cR4KvB
PWC https://paperswithcode.com/paper/learning-to-contextually-aggregate-multi-1
Repo
Framework
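
A minimal PyTorch sketch of context-aware aggregation over multiple supervision sources: each source contributes its own representation, and attention weights computed from a context vector mix them into a consensus representation. The bilinear scoring function and the dimensions are illustrative assumptions, not CONNET's exact architecture.

```python
import torch
import torch.nn as nn

# Sketch: aggregate per-source token representations with attention weights
# derived from a context vector (e.g., a sentence encoding).

n_sources, hidden = 3, 64
score = nn.Bilinear(hidden, hidden, 1)          # (context, source) -> scalar

source_reprs = torch.randn(n_sources, hidden)   # one vector per weak source
context = torch.randn(hidden)                   # context for the current token

logits = score(context.expand(n_sources, -1), source_reprs).squeeze(-1)
weights = torch.softmax(logits, dim=0)          # how much to trust each source
consensus = (weights.unsqueeze(-1) * source_reprs).sum(dim=0)  # aggregated repr
```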

Individualised Dose-Response Estimation using Generative Adversarial Nets

Title Individualised Dose-Response Estimation using Generative Adversarial Nets
Authors Ioana Bica, James Jordon, Mihaela van der Schaar
Abstract The problem of estimating treatment responses from observational data is by now a well-studied one. Less well studied, though, is the problem of treatment response estimation when the treatments are accompanied by a continuous dosage parameter. In this paper, we tackle this lesser studied problem by building on a modification of the generative adversarial networks (GANs) framework that has already demonstrated effectiveness in the former problem. Our model, DRGAN, is flexible, capable of handling multiple treatments each accompanied by a dosage parameter. The key idea is to use a significantly modified GAN model to generate entire dose-response curves for each sample in the training data which will then allow us to use standard supervised methods to learn an inference model capable of estimating these curves for a new sample. Our model consists of 3 blocks: (1) a generator, (2) a discriminator, (3) an inference block. In order to address the challenge presented by the introduction of dosages, we propose novel architectures for both our generator and discriminator. We model the generator as a multi-task deep neural network. In order to address the increased complexity of the treatment space (because of the addition of dosages), we develop a hierarchical discriminator consisting of several networks: (a) a treatment discriminator, (b) a dosage discriminator for each treatment. In the experiments section, we introduce a new semi-synthetic data simulation for use in the dose-response setting and demonstrate improvements over the existing benchmark models.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=rJx9vaVtDS
PDF https://openreview.net/pdf?id=rJx9vaVtDS
PWC https://paperswithcode.com/paper/individualised-dose-response-estimation-using
Repo
Framework
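
The abstract describes a hierarchical discriminator: one head for treatments and one dosage head per treatment. Below is a structural sketch of that layout only; the feature size, the numbers of treatments and dosages, and the MLP heads are illustrative assumptions, and the generator, training losses, and inference block are omitted.

```python
import torch
import torch.nn as nn

# Structural sketch of a hierarchical discriminator for treatments with
# dosages: a treatment head plus a separate dosage head for each treatment.

n_treatments, n_dosages, feat = 3, 5, 32

treatment_disc = nn.Sequential(nn.Linear(feat, 64), nn.ReLU(),
                               nn.Linear(64, n_treatments))
dosage_discs = nn.ModuleList(
    nn.Sequential(nn.Linear(feat, 64), nn.ReLU(), nn.Linear(64, n_dosages))
    for _ in range(n_treatments))

h = torch.randn(8, feat)                       # features of 8 generated samples
treatment_logits = treatment_disc(h)           # which treatment looks factual?
dosage_logits = [d(h) for d in dosage_discs]   # which dosage, per treatment?
```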

Deep Learning For Symbolic Mathematics

Title Deep Learning For Symbolic Mathematics
Authors Anonymous
Abstract Neural networks have a reputation for being better at solving statistical or approximate problems than at performing calculations or working with symbolic data. In this paper, we show that they can be surprisingly good at more elaborate tasks in mathematics, such as symbolic integration and solving differential equations. We propose a syntax for representing these mathematical problems, and methods for generating large datasets that can be used to train sequence-to-sequence models. We achieve results that outperform commercial Computer Algebra Systems such as Matlab or Mathematica.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=S1eZYeHFDS
PDF https://openreview.net/pdf?id=S1eZYeHFDS
PWC https://paperswithcode.com/paper/deep-learning-for-symbolic-mathematics
Repo
Framework
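
A small sketch of the kind of serialization a sequence-to-sequence model for symbolic maths relies on: an expression tree flattened into prefix (Polish) notation so it can be fed to an encoder as a token sequence. The nested-tuple tree representation below is an illustrative assumption, not the paper's exact syntax.

```python
# Flatten a nested (operator, operands...) tuple into prefix tokens.

def to_prefix(expr):
    """Recursively convert an expression tree into a prefix token list."""
    if isinstance(expr, tuple):
        op, *args = expr
        tokens = [op]
        for a in args:
            tokens.extend(to_prefix(a))
        return tokens
    return [str(expr)]

# d/dx of x*cos(x), i.e. cos(x) - x*sin(x), as a tree, then flattened.
expression = ("add", ("cos", "x"), ("mul", "-1", ("mul", "x", ("sin", "x"))))
print(to_prefix(expression))
# ['add', 'cos', 'x', 'mul', '-1', 'mul', 'x', 'sin', 'x']
```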

Gap-Aware Mitigation of Gradient Staleness

Title Gap-Aware Mitigation of Gradient Staleness
Authors Anonymous
Abstract Cloud computing is becoming increasingly popular as a platform for distributed training of deep neural networks. Synchronous stochastic gradient descent (SSGD) suffers from substantial slowdowns due to stragglers if the environment is non-dedicated, as is common in cloud computing. Asynchronous SGD (ASGD) methods are immune to these slowdowns but are scarcely used due to gradient staleness, which encumbers the convergence process. Recent techniques have had limited success mitigating gradient staleness when scaling up to many workers (computing nodes). In this paper we define the Gap as a measure of gradient staleness and propose Gap-Aware (GA), a novel asynchronous-distributed method that penalizes stale gradients linearly to the Gap and performs well even when scaling to large numbers of workers. Our evaluation on the CIFAR, ImageNet, and WikiText-103 datasets shows that GA outperforms the currently accepted gradient penalization method in final test accuracy. We also provide a convergence rate proof for GA. Despite prior beliefs, we show that if GA is applied, momentum becomes beneficial in asynchronous environments, even when the number of workers scales up.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=B1lLw6EYwB
PDF https://openreview.net/pdf?id=B1lLw6EYwB
PWC https://paperswithcode.com/paper/gap-aware-mitigation-of-gradient-staleness-1
Repo
Framework
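
A minimal sketch of the penalization idea: a stale gradient is damped by a factor that grows with its measured staleness. Using the step delay as the "Gap" and dividing the gradient by it are simplifying assumptions for illustration, not the paper's exact definition of the Gap or its penalty.

```python
import numpy as np

# Sketch of gap-aware damping for asynchronous SGD: a worker's gradient is
# divided by a penalty that grows linearly with how stale it is.

def apply_stale_gradient(params, grad, lr, current_step, grad_step):
    gap = max(1, current_step - grad_step)   # staleness of this gradient (assumed definition)
    return params - lr * grad / gap          # older gradients contribute less

params = np.zeros(4)
grad = np.ones(4)
params = apply_stale_gradient(params, grad, lr=0.1, current_step=10, grad_step=7)
print(params)   # each coordinate moved by -0.1 / 3
```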

Self-Educated Language Agent with Hindsight Experience Replay for Instruction Following

Title Self-Educated Language Agent with Hindsight Experience Replay for Instruction Following
Authors Anonymous
Abstract Language creates a compact representation of the world and allows the description of unlimited situations and objectives through compositionality. These properties make it a natural fit to guide the training of interactive agents, as it could ease recurrent challenges in Reinforcement Learning such as sample complexity, generalization, or multi-tasking. Yet, it remains an open problem to relate language and RL in even simple instruction following scenarios. Current methods rely on expert demonstrations, auxiliary losses, or inductive biases in neural architectures. In this paper, we propose an orthogonal approach called Textual Hindsight Experience Replay (THER) that extends the Hindsight Experience Replay approach to the language setting. Whenever the agent does not fulfill its instruction, THER learns to output a new directive that matches the agent’s trajectory, and it relabels the episode with a positive reward. To do so, THER learns to map a state into an instruction by using past successful trajectories, which removes the need for external expert interventions to relabel episodes as in vanilla HER. We observe that this simple idea also initiates a learning synergy between language acquisition and policy learning on instruction following tasks in the BabyAI environment.
Tasks Language Acquisition
Published 2020-01-01
URL https://openreview.net/forum?id=S1g_t1StDB
PDF https://openreview.net/pdf?id=S1g_t1StDB
PWC https://paperswithcode.com/paper/self-educated-language-agent-with-hindsight
Repo
Framework
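
A minimal sketch of the textual hindsight relabeling step: when an episode fails its original instruction, a learned state-to-instruction mapping produces a substitute goal and the episode is stored again with a positive reward. The `instruction_generator` callable and the replay-tuple layout here are illustrative assumptions; training the generator from past successful trajectories is not shown.

```python
# Relabel failed episodes with an instruction describing what was achieved.

def relabel_failed_episode(episode, instruction_generator, replay_buffer):
    """episode: dict with 'trajectory', 'instruction', 'success', 'reward'."""
    replay_buffer.append(episode)                      # keep the original
    if not episode["success"]:
        achieved = instruction_generator(episode["trajectory"][-1])
        relabeled = dict(episode, instruction=achieved, success=True, reward=1.0)
        replay_buffer.append(relabeled)                # learn from the failure

# Usage with a dummy generator that "describes" the final state.
buffer = []
episode = {"trajectory": ["start", "picked red ball"],
           "instruction": "go to the blue door",
           "success": False, "reward": 0.0}
relabel_failed_episode(episode, lambda s: f"achieve: {s}", buffer)
print(len(buffer))   # 2: the failed episode plus its relabeled copy
```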

Wasserstein Robust Reinforcement Learning

Title Wasserstein Robust Reinforcement Learning
Authors Anonymous
Abstract Reinforcement learning algorithms, though successful, tend to over-fit to training environments, thereby hampering their application to the real world. This paper proposes $\text{WR}^{2}\text{L}$, a robust reinforcement learning algorithm with significant robust performance on low- and high-dimensional control tasks. Our method formalises robust reinforcement learning as a novel min-max game with a Wasserstein constraint for a correct and convergent solver. Apart from the formulation, we also propose an efficient and scalable solver following a novel zero-order optimisation method that we believe can be useful to numerical optimisation in general. We empirically demonstrate significant gains compared to standard and robust state-of-the-art algorithms on high-dimensional MuJoCo environments.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=HyxwZRNtDr
PDF https://openreview.net/pdf?id=HyxwZRNtDr
PWC https://paperswithcode.com/paper/wasserstein-robust-reinforcement-learning-1
Repo
Framework

A FRAMEWORK FOR ROBUSTNESS CERTIFICATION OF SMOOTHED CLASSIFIERS USING F-DIVERGENCES

Title A FRAMEWORK FOR ROBUSTNESS CERTIFICATION OF SMOOTHED CLASSIFIERS USING F-DIVERGENCES
Authors Anonymous
Abstract Formal verification techniques that compute provable guarantees on properties of machine learning models, like robustness to norm-bounded adversarial perturbations, have yielded impressive results. Although most techniques developed so far require knowledge of the architecture of the machine learning model and remain hard to scale to complex prediction pipelines, the method of randomized smoothing has been shown to overcome many of these obstacles. By requiring only black-box access to the underlying model, randomized smoothing scales to large architectures and is agnostic to the internals of the network. However, past work on randomized smoothing has focused on restricted classes of smoothing measures or perturbations (like Gaussian or discrete) and has only been able to prove robustness with respect to simple norm bounds. In this paper we introduce a general framework for proving robustness properties of smoothed machine learning models in the black-box setting. Specifically, we extend randomized smoothing procedures to handle arbitrary smoothing measures and prove robustness of the smoothed classifier by using $f$-divergences. Our methodology achieves state-of-the-art certified robustness on MNIST, CIFAR-10 and ImageNet, as well as on an audio classification task, LibriSpeech, with respect to several classes of adversarial perturbations.
Tasks Audio Classification
Published 2020-01-01
URL https://openreview.net/forum?id=SJlKrkSFPH
PDF https://openreview.net/pdf?id=SJlKrkSFPH
PWC https://paperswithcode.com/paper/a-framework-for-robustness-certification-of
Repo
Framework
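
A minimal sketch of the black-box smoothing step the framework builds on: the smoothed classifier predicts the class the base model returns most often under Gaussian input noise. The f-divergence certification itself is not reproduced; the noise level, sample count, and stand-in model are illustrative choices.

```python
import torch

# Black-box smoothed prediction via Monte Carlo sampling under Gaussian noise.

def smoothed_predict(base_model, x, sigma=0.25, n_samples=100):
    with torch.no_grad():
        noisy = x.unsqueeze(0) + sigma * torch.randn(n_samples, *x.shape)
        votes = base_model(noisy).argmax(dim=1)      # per-sample class votes
        return torch.bincount(votes).argmax().item() # majority class

# Usage with a stand-in "model" (any callable mapping a batch to class logits).
dummy_model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
x = torch.randn(3, 32, 32)
print(smoothed_predict(dummy_model, x))
```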

A Closer Look at Deep Policy Gradients

Title A Closer Look at Deep Policy Gradients
Authors Anonymous
Abstract We study how the behavior of deep policy gradient algorithms reflects the conceptual framework motivating their development. To this end, we propose a fine-grained analysis of state-of-the-art methods based on key elements of this framework: gradient estimation, value prediction, and optimization landscapes. Our results show that the behavior of deep policy gradient algorithms often deviates from what their motivating framework would predict: surrogate rewards do not match the true reward landscape, learned value estimators fail to fit the true value function, and gradient estimates poorly correlate with the “true” gradient. The mismatch between predicted and empirical behavior we uncover highlights our poor understanding of current methods, and indicates the need to move beyond current benchmark-centric evaluation methods.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=ryxdEkHtPS
PDF https://openreview.net/pdf?id=ryxdEkHtPS
PWC https://paperswithcode.com/paper/a-closer-look-at-deep-policy-gradients
Repo
Framework
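
One measurement the abstract alludes to is how well small-sample gradient estimates correlate with the "true" gradient. The sketch below illustrates that comparison on a toy stochastic objective using cosine similarity against a large-sample reference; the quadratic objective and sample sizes are illustrative stand-ins for a policy-gradient estimator.

```python
import torch

# Compare small-sample gradient estimates of a noisy objective against a
# large-sample "reference" gradient via cosine similarity.

def grad_estimate(theta, n_samples):
    noise = torch.randn(n_samples, theta.shape[0])
    loss = ((theta + noise) ** 2).sum(dim=1).mean()   # noisy toy objective
    return torch.autograd.grad(loss, theta)[0]

theta = torch.ones(10, requires_grad=True)
reference = grad_estimate(theta, n_samples=100_000)   # treat as the "true" gradient

for n in (10, 100, 1000):
    est = grad_estimate(theta, n_samples=n)
    cos = torch.nn.functional.cosine_similarity(est, reference, dim=0)
    print(n, round(cos.item(), 3))   # similarity improves as n grows
```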

Regularizing Trajectories to Mitigate Catastrophic Forgetting

Title Regularizing Trajectories to Mitigate Catastrophic Forgetting
Authors Anonymous
Abstract Regularization-based continual learning approaches generally prevent catastrophic forgetting by augmenting the training loss with an auxiliary objective. However, in most practical optimization scenarios with noisy data and/or gradients, it is possible that stochastic gradient descent can inadvertently change critical parameters. In this paper, we argue for the importance of regularizing optimization trajectories directly. We derive a new co-natural gradient update rule for continual learning whereby the new task gradients are preconditioned with the empirical Fisher information of previously learnt tasks. We show that using the co-natural gradient systematically reduces forgetting in continual learning. Moreover, it helps combat overfitting when learning a new task in a low resource scenario.
Tasks Continual Learning
Published 2020-01-01
URL https://openreview.net/forum?id=Hkl4EANFDH
PDF https://openreview.net/pdf?id=Hkl4EANFDH
PWC https://paperswithcode.com/paper/regularizing-trajectories-to-mitigate
Repo
Framework
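
A minimal sketch of preconditioning new-task gradients with Fisher information from an old task, so that directions the old task relies on receive smaller updates. The diagonal empirical Fisher (squared gradients) and the `1 + F` damping below are standard simplifications for illustration, not necessarily the paper's exact update rule.

```python
import torch

# Estimate a diagonal empirical Fisher on old-task data, then damp new-task
# gradient steps along directions where that Fisher is large.

def diagonal_fisher(model, loss_fn, old_batches):
    fisher = [torch.zeros_like(p) for p in model.parameters()]
    for x, y in old_batches:
        model.zero_grad()
        loss_fn(model(x), y).backward()
        for f, p in zip(fisher, model.parameters()):
            f += p.grad.detach() ** 2
    return [f / len(old_batches) for f in fisher]

def preconditioned_step(model, fisher, lr=0.1):
    with torch.no_grad():
        for p, f in zip(model.parameters(), fisher):
            p -= lr * p.grad / (1.0 + f)   # smaller steps where Fisher is large

# Usage on a toy regression model.
model = torch.nn.Linear(5, 1)
loss_fn = torch.nn.MSELoss()
old_batches = [(torch.randn(8, 5), torch.randn(8, 1)) for _ in range(4)]
fisher = diagonal_fisher(model, loss_fn, old_batches)

x_new, y_new = torch.randn(8, 5), torch.randn(8, 1)
model.zero_grad()
loss_fn(model(x_new), y_new).backward()
preconditioned_step(model, fisher)
```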