April 1, 2020

# Paper Group NANR 112

TSInsight: A local-global attribution framework for interpretability in time-series data. Connecting the Dots Between MLE and RL for Sequence Prediction. Quantum Semi-Supervised Kernel Learning. Meta-Graph: Few shot Link Prediction via Meta Learning. Learning transitional skills with intrinsic motivation. Distilling the Knowledge of BERT for Text G …

#### TSInsight: A local-global attribution framework for interpretability in time-series data

Title TSInsight: A local-global attribution framework for interpretability in time-series data
Authors Shoaib Ahmed Siddiqui, Dominique Mercier, Andreas Dengel, Sheraz Ahmed
Abstract With the rise in employment of deep learning methods in safety-critical scenarios, interpretability is more essential than ever before. Although many different directions regarding interpretability have been explored for visual modalities, time-series data has been neglected, with only a handful of methods tested due to their poor intelligibility. We approach the problem of interpretability in a novel way by proposing TSInsight, where we attach an auto-encoder with a sparsity-inducing norm on its output to the classifier and fine-tune it based on the gradients from the classifier and a reconstruction penalty. The auto-encoder learns to preserve features that are important for the prediction by the classifier and to suppress the ones that are irrelevant, i.e., it serves as a feature attribution method to boost interpretability. In other words, we ask the network to reconstruct only the parts that are useful for the classifier, i.e., those that are correlated with or causal for the prediction. In contrast to most other attribution frameworks, TSInsight is capable of generating both instance-based and model-based explanations. We evaluated TSInsight along with other commonly used attribution methods on a range of different time-series datasets to validate its efficacy. Furthermore, we analyzed the set of properties that TSInsight achieves out of the box, including adversarial robustness and output space contraction. The obtained results suggest that TSInsight can be an effective tool for the interpretability of deep time-series models.
Published 2020-01-01
URL https://openreview.net/forum?id=B1gzLaNYvr
PDF https://openreview.net/pdf?id=B1gzLaNYvr
Repo
Framework
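
The abstract describes fine-tuning an auto-encoder, attached in front of the classifier, with three signals: the classifier's loss on the reconstruction, a reconstruction penalty, and a sparsity-inducing norm on the auto-encoder output. A minimal pure-Python sketch of such a combined objective for a single series follows. It assumes an L1 norm for sparsity, mean squared error for reconstruction, and hypothetical weights `beta` and `gamma`; `classifier` is a stand-in callable returning class probabilities. This is an illustration, not the authors' exact loss.

```python
import math
from typing import Callable, Sequence

def tsinsight_style_loss(
    x: Sequence[float],
    x_hat: Sequence[float],
    label: int,
    classifier: Callable[[Sequence[float]], Sequence[float]],
    beta: float = 1.0,   # hypothetical weight on the reconstruction penalty
    gamma: float = 0.1,  # hypothetical weight on the L1 sparsity term
) -> float:
    """Combined objective for an auto-encoder output x_hat (sketch, not the paper's exact loss)."""
    # Classification term: cross-entropy of the classifier applied to the reconstruction.
    probs = classifier(x_hat)
    ce = -math.log(max(probs[label], 1e-12))
    # Reconstruction penalty: mean squared error between the input and its reconstruction.
    mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
    # Sparsity-inducing norm on the auto-encoder output (mean absolute value, i.e. L1).
    l1 = sum(abs(v) for v in x_hat) / len(x_hat)
    return ce + beta * mse + gamma * l1
```

When the classifier's confidence does not change, a reconstruction that zeroes out an irrelevant time step scores lower than one that keeps it, which is the pressure that turns the auto-encoder output into an attribution map.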

#### Connecting the Dots Between MLE and RL for Sequence Prediction

Title Connecting the Dots Between MLE and RL for Sequence Prediction
Authors Anonymous
Abstract Sequence prediction models can be learned from example sequences with a variety of training algorithms. Maximum likelihood learning is simple and efficient, yet can suffer from compounding error at test time. Reinforcement learning methods such as policy gradient address the issue but can have prohibitively poor exploration efficiency. A rich set of other algorithms, such as data noising, RAML, and softmax policy gradient, have also been developed from different perspectives. In this paper, we present a formalism of entropy regularized policy optimization, and show that the apparently distinct algorithms, including MLE, can be reformulated as special instances of the formulation. The difference between them is characterized by the reward function and two weight hyperparameters. The unifying interpretation enables us to systematically compare the algorithms side-by-side, and gain new insights into the trade-offs of the algorithm design. The new perspective also leads to an improved approach that dynamically interpolates among the family of algorithms, and learns the model in a scheduled way. Experiments on machine translation, text summarization, and game imitation learning demonstrate the superiority of the proposed approach.
Tasks Imitation Learning, Machine Translation, Text Summarization
Published 2020-01-01
URL https://openreview.net/forum?id=B1gX8JrYPr
PDF https://openreview.net/pdf?id=B1gX8JrYPr
PWC https://paperswithcode.com/paper/connecting-the-dots-between-mle-and-rl-for-1
Repo
Framework
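
The abstract says the algorithms differ only in a reward function and two weight hyperparameters. As a sketch, assume the objective has the form `max_q E_q[R(y)] - alpha * KL(q || p_theta) + beta * H(q)`; setting its derivative to zero gives `q(y) ∝ exp((alpha * log p_theta(y) + R(y)) / (alpha + beta))`. The pure-Python code below evaluates that closed form over a small, explicitly enumerated candidate set (real sequence spaces are far too large for this). The objective form and names are assumptions for illustration; consult the paper for its exact formulation.

```python
import math
from typing import Dict

def erpo_target(
    log_p: Dict[str, float],   # model log-probabilities log p_theta(y) over a small candidate set
    reward: Dict[str, float],  # reward R(y); use -math.inf to rule a candidate out
    alpha: float,              # weight on KL(q || p_theta)
    beta: float,               # weight on the entropy H(q); requires alpha + beta > 0
) -> Dict[str, float]:
    """Closed-form maximizer q(y) ∝ exp((alpha*log p(y) + R(y)) / (alpha + beta)).

    At least one candidate must have a finite score, otherwise normalization is undefined.
    """
    scores = {y: (alpha * log_p[y] + reward[y]) / (alpha + beta) for y in log_p}
    m = max(scores.values())  # subtract the max for numerical stability
    weights = {y: math.exp(s - m) for y, s in scores.items()}
    z = sum(weights.values())
    return {y: w / z for y, w in weights.items()}
```

With a delta reward (0 for the reference sequence, minus infinity elsewhere), `q` collapses onto the reference, so fitting `p_theta` to `q` with cross-entropy is exactly MLE. With zero reward and `beta = 0`, `q` equals the model itself. Per the abstract, other reward and weight choices recover the remaining algorithms in the family.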

#### Quantum Semi-Supervised Kernel Learning

Title Quantum Semi-Supervised Kernel Learning
Authors Anonymous
Abstract Quantum machine learning methods have the potential to facilitate learning using extremely large datasets. While the availability of data for training machine learning models is steadily increasing, it is often much easier to collect feature vectors than to obtain the corresponding labels. One approach to addressing this issue is semi-supervised learning, which leverages not only the labeled samples but also unlabeled feature vectors. Here, we present a quantum machine learning algorithm for training Semi-Supervised Kernel Support Vector Machines. The algorithm uses recent advances in quantum sample-based Hamiltonian simulation to extend the existing Quantum LS-SVM algorithm to handle the semi-supervised term in the loss, while maintaining the same quantum speedup as the Quantum LS-SVM.
Published 2020-01-01
URL https://openreview.net/forum?id=ByeqyxBKvS
PDF https://openreview.net/pdf?id=ByeqyxBKvS
PWC https://paperswithcode.com/paper/quantum-semi-supervised-kernel-learning
Repo
Framework
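
The Quantum LS-SVM that this work extends targets the linear system of the least-squares SVM dual, `[[0, 1^T], [1, K + I/reg]] [b; alpha] = [0; y]`, where `K` is the kernel matrix. For orientation, the sketch below solves that system classically in pure Python on a toy problem. It is only the supervised classical counterpart: the semi-supervised term, the Hamiltonian simulation, and any speedup are not represented, and the RBF kernel, `reg` value, and helper names are assumptions chosen for illustration.

```python
import math
from typing import List, Sequence, Tuple

def rbf(a: Sequence[float], b: Sequence[float], width: float = 1.0) -> float:
    """Gaussian (RBF) kernel; the kernel choice is an assumption for illustration."""
    return math.exp(-sum((u - v) ** 2 for u, v in zip(a, b)) / (2 * width ** 2))

def solve(a: List[List[float]], rhs: List[float]) -> List[float]:
    """Dense Gaussian elimination with partial pivoting (small systems only)."""
    n = len(rhs)
    m = [row[:] + [r] for row, r in zip(a, rhs)]  # augmented matrix copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def train_ls_svm(xs: List[List[float]], ys: List[float], reg: float = 10.0) -> Tuple[float, List[float]]:
    """Solve the classical LS-SVM dual system for the bias b and coefficients alpha."""
    n = len(xs)
    a = [[0.0] + [1.0] * n]  # first row enforces sum(alpha) = 0
    for i in range(n):
        a.append([1.0] + [rbf(xs[i], xs[j]) + (1.0 / reg if i == j else 0.0) for j in range(n)])
    sol = solve(a, [0.0] + list(ys))
    return sol[0], sol[1:]

def predict(x: List[float], xs: List[List[float]], b: float, alpha: List[float]) -> float:
    """Decision value f(x) = b + sum_i alpha_i k(x, x_i); its sign is the predicted class."""
    return b + sum(a_i * rbf(x, x_i) for a_i, x_i in zip(alpha, xs))
```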

#### Meta-Graph: Few shot Link Prediction via Meta Learning

Title Meta-Graph: Few shot Link Prediction via Meta Learning
Authors Anonymous
Abstract We consider the task of few shot link prediction, where the goal is to predict missing edges across multiple graphs using only a small sample of known edges. We show that current link prediction methods are generally ill-equipped to handle this task, as they cannot effectively transfer knowledge between graphs in a multi-graph setting and are unable to effectively learn from very sparse data. To address this challenge, we introduce a new gradient-based meta learning framework, Meta-Graph, that leverages higher-order gradients along with a learned graph signature function that conditionally generates a graph neural network initialization. Using a novel set of few shot link prediction benchmarks, we show that Meta-Graph enables not only fast adaptation but also better final convergence, and can effectively learn using only a small sample of true edges.
Published 2020-01-01
URL https://openreview.net/forum?id=BJepcaEtwB
PDF https://openreview.net/pdf?id=BJepcaEtwB
Repo
Framework

#### Learning transitional skills with intrinsic motivation

Title Learning transitional skills with intrinsic motivation
Authors Anonymous
Abstract By maximizing an information-theoretic objective, a few recent methods enable an agent to explore the environment and learn useful skills without supervision. However, when multiple consecutive skills are used to complete a specific task, success is not guaranteed at the transition from one skill to the next because of the evident gap between skills. In this paper, we propose to learn transitional skills (LTS) in addition to creating diverse primitive skills without a reward function. By introducing an extra latent variable for transitional skills, our LTS method discovers both primitive and transitional skills by minimizing the difference of mutual information and the similarity of skills. On various simulated robotic tasks, our results demonstrate the effectiveness of LTS in learning both diverse primitive skills and transitional skills, and show its superiority in smooth transitions between skills over the state-of-the-art baseline DIAYN.
Published 2020-01-01
URL https://openreview.net/forum?id=ryeRwlSYPH
PDF https://openreview.net/pdf?id=ryeRwlSYPH
PWC https://paperswithcode.com/paper/learning-transitional-skills-with-intrinsic
Repo
Framework
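
LTS is compared against DIAYN, which rewards an agent for reaching states from which a learned discriminator `q(z|s)` can identify the active skill `z`. The sketch below computes that baseline pseudo-reward, `log q(z|s) - log p(z)`, assuming a uniform prior over skills. It does not include LTS's extra latent variable for transitional skills or its difference-of-mutual-information objective, whose exact form the abstract does not give.

```python
import math
from typing import Sequence

def diayn_reward(q_z_given_s: Sequence[float], skill: int) -> float:
    """DIAYN-style pseudo-reward log q(z|s) - log p(z), with a uniform skill prior p(z)."""
    num_skills = len(q_z_given_s)
    log_p_z = -math.log(num_skills)  # uniform prior over the skills
    return math.log(max(q_z_given_s[skill], 1e-12)) - log_p_z
```

The reward peaks at `log K` for `K` skills when the discriminator is certain of the active skill and is zero at chance level, so each skill is pushed toward states that distinguish it from the others.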

#### Distilling the Knowledge of BERT for Text Generation

Title Distilling the Knowledge of BERT for Text Generation
Authors Anonymous
Abstract Large-scale pre-trained language models, such as BERT, have recently achieved great success in a wide range of language understanding tasks. However, it remains an open question how to utilize BERT for text generation tasks. In this paper, we present a novel approach to addressing this challenge in a generic sequence-to-sequence (Seq2Seq) setting. We first propose a new task, Conditional Masked Language Modeling (C-MLM), to enable fine-tuning of BERT on a target text-generation dataset. The fine-tuned BERT (i.e., the teacher) is then exploited as extra supervision to improve conventional Seq2Seq models (i.e., the student) for text generation. By leveraging BERT's idiosyncratic bidirectional nature, distilling the knowledge learned from BERT can encourage auto-regressive Seq2Seq models to plan ahead, imposing global sequence-level supervision for coherent text generation. Experiments show that the proposed approach significantly outperforms strong Transformer baselines on multiple text generation tasks, including machine translation (MT) and text summarization. Our proposed model also achieves new state-of-the-art results on the IWSLT German-English and English-Vietnamese MT datasets.
Tasks Language Modelling, Machine Translation, Text Generation, Text Summarization
Published 2020-01-01
URL https://openreview.net/forum?id=Bkgz_krKPB
PDF https://openreview.net/pdf?id=Bkgz_krKPB
PWC https://paperswithcode.com/paper/distilling-the-knowledge-of-bert-for-text
Repo
Framework
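
The abstract says the fine-tuned BERT teacher supplies extra supervision to a Seq2Seq student. A common way to do this is word-level knowledge distillation: mix cross-entropy against the teacher's soft distribution with standard MLE on the reference token. The sketch below shows that generic loss for one target position; the mixing weight `alpha` and the exact form are assumptions, not necessarily the paper's loss.

```python
import math
from typing import Sequence

def distill_token_loss(
    student_probs: Sequence[float],  # student's distribution over the vocabulary at one position
    teacher_probs: Sequence[float],  # teacher soft targets at the same position
    gold: int,                       # index of the reference token
    alpha: float = 0.5,              # hypothetical weight on the soft (teacher) term
) -> float:
    """alpha * CE(teacher, student) + (1 - alpha) * NLL(gold) for a single position."""
    eps = 1e-12
    soft = -sum(t * math.log(max(s, eps)) for t, s in zip(teacher_probs, student_probs))
    hard = -math.log(max(student_probs[gold], eps))
    return alpha * soft + (1.0 - alpha) * hard
```

If the teacher is a one-hot distribution on the gold token, the loss reduces to ordinary MLE; a teacher that spreads mass over plausible alternatives is what lets the bidirectional BERT signal reach the left-to-right student.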

#### Blending Diverse Physical Priors with Neural Networks

Title Blending Diverse Physical Priors with Neural Networks
Authors Anonymous
Abstract Rethinking physics in the era of deep learning is an increasingly important topic. This topic is special because, in addition to data, one can leverage a vast library of physical prior models (e.g., kinematics, fluid flow) to perform more robust inference. The nascent sub-field of physics-based learning (PBL) studies this problem of blending neural networks with physical priors. While previous PBL algorithms have been applied successfully to specific tasks, it is hard to generalize existing PBL methods to a wide range of physics-based problems. Such generalization would require an architecture that can adapt to variations in the correctness of the physics, or in the quality of training data. No such architecture exists. In this paper, we aim to generalize PBL by making a first attempt to bring neural architecture search (NAS) to the realm of PBL. We introduce a new method known as physics-based neural architecture search (PhysicsNAS) that is a top performer across a diverse range of physical-model and dataset quality.
Published 2020-01-01
URL https://openreview.net/forum?id=HkeQ6ANYDB
PDF https://openreview.net/pdf?id=HkeQ6ANYDB
PWC https://paperswithcode.com/paper/blending-diverse-physical-priors-with-neural-1
Repo
Framework

#### Hierarchical Summary-to-Article Generation

Title Hierarchical Summary-to-Article Generation
Authors Anonymous
Abstract In this paper, we explore *summary-to-article generation*: the task of generating long articles given a short summary, which provides finer-grained content control for the generated text. To prevent sequence-to-sequence (seq2seq) models from degenerating into language models and to better control the long text to be generated, we propose a hierarchical generation approach that first generates a sketch of intermediate length based on the summary and then completes the article by enriching the generated sketch. To mitigate the discrepancy between the "oracle" sketch used during training and the noisy sketch generated during inference, we propose an end-to-end joint training framework based on multi-agent reinforcement learning. For evaluation, we use text summarization corpora by reversing their inputs and outputs, and introduce a novel evaluation method that employs a summarization system to summarize the generated article and test its match with the original input summary. Experiments show that our proposed hierarchical generation approach can generate a coherent and relevant article based on the given summary, yielding significant improvements over conventional seq2seq models.
Tasks Multi-agent Reinforcement Learning, Text Summarization
Published 2020-01-01
URL https://openreview.net/forum?id=Hkl8Ia4YPH
PDF https://openreview.net/pdf?id=Hkl8Ia4YPH
PWC https://paperswithcode.com/paper/hierarchical-summary-to-article-generation
Repo
Framework
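
The proposed evaluation summarizes the generated article and checks how well that summary matches the original input summary. The pure-Python sketch below shows the shape of that loop under two loud assumptions: a hypothetical lead-k sentence extractor stands in for the summarization system, and a simplified unigram F1 (ROUGE-1-like, with no stemming) stands in for the match metric. Neither is claimed to be what the authors used.

```python
import re
from collections import Counter
from typing import List

def lead_k_summary(article: str, k: int = 1) -> str:
    """Hypothetical stand-in summarizer: keep the first k sentences."""
    sentences: List[str] = [s for s in re.split(r"(?<=[.!?])\s+", article.strip()) if s]
    return " ".join(sentences[:k])

def unigram_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1-style F1 over lowercase word tokens."""
    c = Counter(re.findall(r"\w+", candidate.lower()))
    r = Counter(re.findall(r"\w+", reference.lower()))
    overlap = sum((c & r).values())  # clipped token matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(c.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)

def summary_match_score(generated_article: str, input_summary: str, k: int = 1) -> float:
    """Summarize the generated article, then score the result against the input summary."""
    return unigram_f1(lead_k_summary(generated_article, k), input_summary)
```

Swapping in a trained summarizer and an established ROUGE implementation keeps the same structure; only the two helper functions change.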

#### TED: A Pretrained Unsupervised Summarization Model with Theme Modeling and Denoising

Title TED: A Pretrained Unsupervised Summarization Model with Theme Modeling and Denoising
Authors Anonymous
Abstract Text summarization aims to extract essential information from a piece of text and transform it into a concise version. Existing unsupervised abstractive summarization models use a recurrent neural network framework and ignore abundant unlabeled corpora. To address these issues, we propose TED, a transformer-based unsupervised summarization system with dataset-agnostic pretraining. We first leverage the lead bias in news articles to pretrain the model on large-scale corpora. Then, we finetune TED on target domains through theme modeling and a denoising autoencoder to enhance the quality of summaries. Notably, TED outperforms all unsupervised abstractive baselines on the NYT, CNN/DM and English Gigaword datasets, which cover various document styles. Further analysis shows that the summaries generated by TED are abstractive and contain even higher proportions of novel tokens than those from supervised models.
Tasks Abstractive Text Summarization, Denoising, Text Summarization
Published 2020-01-01
URL https://openreview.net/forum?id=Syxwsp4KDB
PDF https://openreview.net/pdf?id=Syxwsp4KDB
PWC https://paperswithcode.com/paper/ted-a-pretrained-unsupervised-summarization
Repo
Framework
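
Lead bias means the opening sentences of a news article often work as its summary, so unlabeled articles can be turned into pseudo (source, summary) pairs for pretraining. A minimal sketch of building one such pair, assuming a lead-3 target and a simple regex sentence splitter; any filtering or tokenization the authors apply is not shown.

```python
import re
from typing import List, Tuple

def lead_bias_pair(article: str, k: int = 3) -> Tuple[str, str]:
    """Build a (source, target) pretraining pair: target = first k sentences, source = the rest."""
    sentences: List[str] = [s for s in re.split(r"(?<=[.!?])\s+", article.strip()) if s]
    target = " ".join(sentences[:k])
    source = " ".join(sentences[k:])
    return source, target
```

Articles with `k` or fewer sentences yield an empty source and would normally be skipped before pretraining.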

#### On Mutual Information Maximization for Representation Learning

Title On Mutual Information Maximization for Representation Learning
Authors Anonymous
Abstract Many recent methods for unsupervised or self-supervised representation learning train feature extractors by maximizing an estimate of the mutual information (MI) between different views of the data. This comes with several immediate problems: For example, MI is notoriously hard to estimate, and using it as an objective for representation learning may lead to highly entangled representations due to its invariance under arbitrary invertible transformations. Nevertheless, these methods have been repeatedly shown to excel in practice. In this paper we argue, and provide empirical evidence, that the success of these methods cannot be attributed to the properties of MI alone, and that they strongly depend on the inductive bias in both the choice of feature extractor architectures and the parametrization of the employed MI estimators. Finally, we establish a connection to deep metric learning and argue that this interpretation may be a plausible explanation for the success of the recently introduced methods.
Published 2020-01-01
URL https://openreview.net/forum?id=rkxoh24FPH
PDF https://openreview.net/pdf?id=rkxoh24FPH
PWC https://paperswithcode.com/paper/on-mutual-information-maximization-for-1
Repo
Framework
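
One widely used MI estimator of the kind the abstract discusses is the InfoNCE lower bound: given a critic score `f(x_i, y_j)` for every pair in a batch of `N` matched views, it equals `log N` plus the mean log-softmax score of the matched pair in each row. A pure-Python sketch, with the critic scores passed in as a precomputed matrix:

```python
import math
from typing import Sequence

def info_nce_bound(scores: Sequence[Sequence[float]]) -> float:
    """InfoNCE lower bound on I(X; Y) from an N x N critic matrix.

    scores[i][j] = f(x_i, y_j); diagonal entries are the positive (matched) pairs.
    Returns log N + mean_i [ f(x_i, y_i) - logsumexp_j f(x_i, y_j) ].
    """
    n = len(scores)
    total = 0.0
    for i, row in enumerate(scores):
        m = max(row)
        lse = m + math.log(sum(math.exp(s - m) for s in row))  # stable log-sum-exp
        total += row[i] - lse
    return math.log(n) + total / n
```

Each row term is a log-probability and therefore at most zero, so the estimate can never exceed `log N` however large the true mutual information is. This is one concrete illustration of why such estimates are hard to interpret on their own.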

#### Frequency Principle: Fourier Analysis Sheds Light on Deep Neural Networks

Title Frequency Principle: Fourier Analysis Sheds Light on Deep Neural Networks
Authors Anonymous
Abstract We study the training process of Deep Neural Networks (DNNs) from the Fourier analysis perspective. We demonstrate a universal Frequency Principle (F-Principle), namely that DNNs often fit target functions from low to high frequencies, on high-dimensional benchmark datasets, such as MNIST/CIFAR10, and deep networks, such as VGG16. This F-Principle of DNNs is opposite to the learning behavior of most conventional iterative numerical schemes (e.g., the Jacobi method), which exhibit faster convergence for higher frequencies on various scientific computing problems. With a naive theory, we illustrate that this F-Principle results from the regularity of the commonly used activation functions. The F-Principle implies an implicit bias that DNNs tend to fit training data by a low-frequency function. This understanding provides an explanation of the good generalization of DNNs on most real datasets and their bad generalization on the parity function or randomized datasets.
Published 2020-01-01
URL https://openreview.net/forum?id=Skgb5h4KPH
PDF https://openreview.net/pdf?id=Skgb5h4KPH
PWC https://paperswithcode.com/paper/frequency-principle-fourier-analysis-sheds-1
Repo
Framework
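
The contrast drawn with conventional solvers can be checked numerically. The sketch runs weighted Jacobi on the 1-D discrete Poisson problem with zero boundaries, starting from a single sine error mode, and reports how much error survives. The weight `omega = 2/3` is an assumption chosen because plain Jacobi (`omega = 1`) damps both the lowest and the highest modes slowly, whereas the weighted variant makes the "high frequencies converge first" behavior clear.

```python
import math

def damped_jacobi_mode_amplitude(k: int, n: int = 31, omega: float = 2 / 3, iters: int = 10) -> float:
    """Apply weighted Jacobi to A e = 0 (A = tridiag(-1, 2, -1), zero Dirichlet boundaries),
    starting from the k-th sine mode, and return the max |error| left after `iters` sweeps."""
    e = [math.sin(k * math.pi * j / (n + 1)) for j in range(1, n + 1)]
    for _ in range(iters):
        padded = [0.0] + e + [0.0]  # boundary values stay zero
        e = [(1 - omega) * padded[j] + omega * 0.5 * (padded[j - 1] + padded[j + 1])
             for j in range(1, n + 1)]
    return max(abs(v) for v in e)
```

Each sine mode is an eigenvector of the iteration with factor `1 - 2 * omega * sin(k * pi / (2 * (n + 1)))**2`, so after ten sweeps the `k = 1` mode keeps roughly 97% of its amplitude while the `k = 28` mode is essentially gone. That is the reverse of the low-to-high order the F-Principle describes for DNN training.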

#### Gaussian Process Meta-Representations Of Neural Networks

Title Gaussian Process Meta-Representations Of Neural Networks
Authors Anonymous
Abstract Bayesian inference offers a theoretically grounded and general way to train neural networks and can potentially give calibrated uncertainty. It is, however, challenging to specify a meaningful and tractable prior over the network parameters. More crucially, many existing inference methods assume mean-field approximate posteriors, ignoring interactions between parameters in high-dimensional weight space. To this end, this paper introduces two innovations: (i) a Gaussian process-based hierarchical model for the network parameters based on recently introduced unit embeddings that can flexibly encode weight structures, and (ii) input-dependent contextual variables for the weight prior that can provide convenient ways to regularize the function space being modeled by the NN through the use of kernels. Furthermore, we develop an efficient structured variational inference scheme that alleviates the need to perform inference in the weight space whilst retaining and learning non-trivial correlations between network parameters. We show these models provide desirable test-time uncertainty estimates, demonstrate cases of modeling inductive biases for neural networks with kernels and demonstrate competitive predictive performance of the proposed model and algorithm over alternative approaches on a range of classification and active learning tasks.
Published 2020-01-01
URL https://openreview.net/forum?id=HkxwmRVtwH
PDF https://openreview.net/pdf?id=HkxwmRVtwH
PWC https://paperswithcode.com/paper/gaussian-process-meta-representations-of
Repo
Framework
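
The first innovation places a Gaussian-process prior on network weights using unit embeddings. As a sketch of what that buys, assume each weight `w_ij` of a layer is described by the concatenation of a hypothetical input-unit embedding and output-unit embedding, and that weights are compared with an RBF kernel. The code only builds the prior covariance over one layer's weights; inference, the input-dependent contextual variables, and the paper's exact kernel are not represented.

```python
import math
from typing import List, Sequence, Tuple

def weight_covariance(
    in_emb: Sequence[Sequence[float]],   # hypothetical embeddings of input units
    out_emb: Sequence[Sequence[float]],  # hypothetical embeddings of output units
    lengthscale: float = 1.0,
    variance: float = 1.0,
) -> Tuple[List[Tuple[int, int]], List[List[float]]]:
    """Prior covariance between all weights w_ij of one layer under an RBF kernel
    on the concatenated embeddings [in_emb[i], out_emb[j]]."""
    index = [(i, j) for i in range(len(in_emb)) for j in range(len(out_emb))]
    feats = [list(in_emb[i]) + list(out_emb[j]) for i, j in index]

    def k(a: List[float], b: List[float]) -> float:
        sq = sum((u - v) ** 2 for u, v in zip(a, b))
        return variance * math.exp(-0.5 * sq / lengthscale ** 2)

    cov = [[k(a, b) for b in feats] for a in feats]
    return index, cov
```

Weights that connect units with similar embeddings are correlated a priori, which is the kind of structure a fully factorized, mean-field prior cannot express.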

#### Decoupling Adaptation from Modeling with Meta-Optimizers for Meta Learning

Title Decoupling Adaptation from Modeling with Meta-Optimizers for Meta Learning
Authors Anonymous
Published 2020-01-01
URL https://openreview.net/forum?id=BkljIlHtvS
PDF https://openreview.net/pdf?id=BkljIlHtvS
Repo
Framework

#### LOGAN: Latent Optimisation for Generative Adversarial Networks

Title LOGAN: Latent Optimisation for Generative Adversarial Networks
Authors Anonymous
Abstract Training generative adversarial networks requires balancing delicate adversarial dynamics. Even with careful tuning, training may diverge or end up in a bad equilibrium with dropped modes. In this work, we introduce a new form of latent optimisation inspired by the CS-GAN and show that it improves adversarial dynamics by enhancing interactions between the discriminator and the generator. We develop supporting theoretical analysis from the perspectives of differentiable games and stochastic approximation. Our experiments demonstrate that latent optimisation can significantly improve GAN training, obtaining state-of-the-art performance for the ImageNet (128 x 128) dataset. Our model achieves an Inception Score (IS) of 148 and a Fréchet Inception Distance (FID) of 3.4, an improvement of 17% and 32% in IS and FID respectively, compared with the baseline BigGAN-deep model with the same architecture and number of parameters.
Published 2020-01-01
URL https://openreview.net/forum?id=rJeU_1SFvr
PDF https://openreview.net/pdf?id=rJeU_1SFvr
PWC https://paperswithcode.com/paper/logan-latent-optimisation-for-generative
Repo
Framework
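
As we understand latent optimisation, each sampled latent `z` is first moved a small step in the direction that increases the discriminator's score `f(z) = D(G(z))`, and the usual GAN updates are then computed on `G(z')`. The sketch below covers only that latent step, with a toy analytic gradient in place of backpropagation through real networks. The plain step is `dz = alpha * g`; the `natural` option uses an assumed damped normalisation `dz = alpha * g / (beta + ||g||^2)` in the spirit of the paper's natural-gradient variant. Step sizes are hypothetical; check the paper for its exact update and settings.

```python
from typing import Callable, List, Sequence

def latent_optimisation_step(
    z: Sequence[float],
    grad_f: Callable[[Sequence[float]], List[float]],  # gradient of f(z) = D(G(z)) w.r.t. z
    alpha: float = 0.9,    # hypothetical step size
    beta: float = 5.0,     # hypothetical damping term for the normalised step
    natural: bool = True,
) -> List[float]:
    """One latent update z' = z + dz that moves z toward a higher discriminator score."""
    g = grad_f(z)
    if natural:
        # Damped, normalised step: dz = alpha * g / (beta + ||g||^2)
        scale = alpha / (beta + sum(v * v for v in g))
    else:
        # Plain gradient ascent: dz = alpha * g
        scale = alpha
    return [zi + scale * gi for zi, gi in zip(z, g)]
```

Because the step is normalised by the gradient norm, the `natural` variant stays small even when the discriminator's gradient is large, which is one way to keep the latent change from destabilising training.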

#### Efficient High-Dimensional Data Representation Learning via Semi-Stochastic Block Coordinate Descent Methods

Title Efficient High-Dimensional Data Representation Learning via Semi-Stochastic Block Coordinate Descent Methods
Authors Anonymous
Abstract With the increase in data volume and data dimension, sparse representation learning is attracting more and more attention. For high-dimensional data, randomized block coordinate descent methods perform well because they do not need to calculate the gradient along the whole dimension. Existing hard thresholding algorithms evaluate gradients followed by a hard thresholding operation to update the model parameter, which leads to slow convergence. To address this issue, we propose a novel hard thresholding algorithm, called Semi-stochastic Block Coordinate Descent Hard Thresholding Pursuit (SBCD-HTP). Moreover, we present its sparse and asynchronous parallel variants. We theoretically analyze the convergence properties of our algorithms, which show that they have a significantly lower hard thresholding complexity than existing algorithms. Our empirical evaluations on real-world datasets and face recognition tasks demonstrate the superior performance of our algorithms for sparsity-constrained optimization problems.