April 1, 2020

Paper Group NANR 112

TSInsight: A local-global attribution framework for interpretability in time-series data


Title	TSInsight: A local-global attribution framework for interpretability in time-series data
Authors	Shoaib Ahmed Siddiqui, Dominique Mercier, Andreas Dengel, Sheraz Ahmed
Abstract	With the rise in employment of deep learning methods in safety-critical scenarios, interpretability is more essential than ever before. Although many different directions regarding interpretability have been explored for visual modalities, time-series data has been neglected with only a handful of methods tested due to their poor intelligibility. We approach the problem of interpretability in a novel way by proposing TSInsight, where we attach an auto-encoder with a sparsity-inducing norm on its output to the classifier and fine-tune it based on the gradients from the classifier and a reconstruction penalty. The auto-encoder learns to preserve features that are important for the prediction by the classifier and suppresses the ones that are irrelevant, i.e., it serves as a feature attribution method to boost interpretability. In other words, we ask the network to only reconstruct parts which are useful for the classifier, i.e., are correlated or causal for the prediction. In contrast to most other attribution frameworks, TSInsight is capable of generating both instance-based and model-based explanations. We evaluated TSInsight along with other commonly used attribution methods on a range of different time-series datasets to validate its efficacy. Furthermore, we analyzed the set of properties that TSInsight achieves out of the box including adversarial robustness and output space contraction. The obtained results suggest that TSInsight can be an effective tool for the interpretability of deep time-series models.
Tasks	Time Series
Published	2020-01-01
URL	https://openreview.net/forum?id=B1gzLaNYvr
PDF	https://openreview.net/pdf?id=B1gzLaNYvr
PWC	https://paperswithcode.com/paper/tsinsight-a-local-global-attribution
Repo
Framework
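
The objective described in the abstract above combines a classifier loss, a reconstruction penalty, and a sparsity-inducing norm on the auto-encoder output. The following is a minimal sketch of how such terms could be combined, not the authors' implementation: the squared-error reconstruction term, the L1 norm, and the weights `beta` and `gamma` are all assumptions made for illustration.

```python
def tsinsight_style_loss(x, x_hat, classifier_loss, beta=1.0, gamma=0.1):
    """Combine the three terms named in the abstract (weights are hypothetical).

    x               -- original time series (list of floats)
    x_hat           -- auto-encoder output that is fed to the classifier
    classifier_loss -- loss of the classifier evaluated on x_hat
    """
    n = len(x)
    # Reconstruction penalty: mean squared error (assumed form).
    reconstruction = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / n
    # Sparsity-inducing norm on the output: mean absolute value (assumed L1).
    sparsity = sum(abs(b) for b in x_hat) / n
    return classifier_loss + beta * reconstruction + gamma * sparsity
```

With these assumed weights, suppressing a time step in `x_hat` lowers the sparsity term but raises the reconstruction term, which is the trade-off the abstract describes.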

Connecting the Dots Between MLE and RL for Sequence Prediction


Title	Connecting the Dots Between MLE and RL for Sequence Prediction
Authors	Anonymous
Abstract	Sequence prediction models can be learned from example sequences with a variety of training algorithms. Maximum likelihood learning is simple and efficient, yet can suffer from compounding error at test time. Reinforcement learning such as policy gradient addresses the issue but can have prohibitively poor exploration efficiency. A rich set of other algorithms, such as data noising, RAML, and softmax policy gradient, have also been developed from different perspectives. In this paper, we present a formalism of entropy regularized policy optimization, and show that the apparently distinct algorithms, including MLE, can be reformulated as special instances of the formulation. The difference between them is characterized by the reward function and two weight hyperparameters. The unifying interpretation enables us to systematically compare the algorithms side-by-side, and gain new insights into the trade-offs of the algorithm design. The new perspective also leads to an improved approach that dynamically interpolates among the family of algorithms, and learns the model in a scheduled way. Experiments on machine translation, text summarization, and game imitation learning demonstrate superiority of the proposed approach.
Tasks	Imitation Learning, Machine Translation, Text Summarization
Published	2020-01-01
URL	https://openreview.net/forum?id=B1gX8JrYPr
PDF	https://openreview.net/pdf?id=B1gX8JrYPr
PWC	https://paperswithcode.com/paper/connecting-the-dots-between-mle-and-rl-for-1
Repo
Framework

Quantum Semi-Supervised Kernel Learning


Title	Quantum Semi-Supervised Kernel Learning
Authors	Anonymous
Abstract	Quantum machine learning methods have the potential to facilitate learning using extremely large datasets. While the availability of data for training machine learning models is steadily increasing, oftentimes it is much easier to collect feature vectors than to obtain the corresponding labels. One of the approaches for addressing this issue is to use semi-supervised learning, which leverages not only the labeled samples, but also unlabeled feature vectors. Here, we present a quantum machine learning algorithm for training Semi-Supervised Kernel Support Vector Machines. The algorithm uses recent advances in quantum sample-based Hamiltonian simulation to extend the existing Quantum LS-SVM algorithm to handle the semi-supervised term in the loss, while maintaining the same quantum speedup as the Quantum LS-SVM.
Tasks	Quantum Machine Learning
Published	2020-01-01
URL	https://openreview.net/forum?id=ByeqyxBKvS
PDF	https://openreview.net/pdf?id=ByeqyxBKvS
PWC	https://paperswithcode.com/paper/quantum-semi-supervised-kernel-learning
Repo
Framework

Meta-Graph: Few shot Link Prediction via Meta Learning


Title	Meta-Graph: Few shot Link Prediction via Meta Learning
Authors	Anonymous
Abstract	We consider the task of few shot link prediction, where the goal is to predict missing edges across multiple graphs using only a small sample of known edges. We show that current link prediction methods are generally ill-equipped to handle this task—as they cannot effectively transfer knowledge between graphs in a multi-graph setting and are unable to effectively learn from very sparse data. To address this challenge, we introduce a new gradient-based meta learning framework, Meta-Graph, that leverages higher-order gradients along with a learned graph signature function that conditionally generates a graph neural network initialization. Using a novel set of few shot link prediction benchmarks, we show that Meta-Graph enables not only fast adaptation but also better final convergence and can effectively learn using only a small sample of true edges.
Tasks	Link Prediction, Meta-Learning
Published	2020-01-01
URL	https://openreview.net/forum?id=BJepcaEtwB
PDF	https://openreview.net/pdf?id=BJepcaEtwB
PWC	https://paperswithcode.com/paper/meta-graph-few-shot-link-prediction-via-meta
Repo
Framework

Learning transitional skills with intrinsic motivation


Title	Learning transitional skills with intrinsic motivation
Authors	Anonymous
Abstract	By maximizing an information theoretic objective, a few recent methods empower the agent to explore the environment and learn useful skills without supervision. However, when multiple consecutive skills are used to complete a specific task, the transition from one to another cannot guarantee the success of the process due to the evident gap between skills. In this paper, we propose to learn transitional skills (LTS) in addition to creating diverse primitive skills without a reward function. By introducing an extra latent variable for transitional skills, our LTS method discovers both primitive and transitional skills by minimizing the difference of mutual information and the similarity of skills. By considering various simulated robotic tasks, our results demonstrate the effectiveness of LTS on learning both diverse primitive skills and transitional skills, and show its superiority in smooth transition of skills over the state-of-the-art baseline DIAYN.
Tasks
Published	2020-01-01
URL	https://openreview.net/forum?id=ryeRwlSYPH
PDF	https://openreview.net/pdf?id=ryeRwlSYPH
PWC	https://paperswithcode.com/paper/learning-transitional-skills-with-intrinsic
Repo
Framework

Distilling the Knowledge of BERT for Text Generation


Title	Distilling the Knowledge of BERT for Text Generation
Authors	Anonymous
Abstract	Large-scale pre-trained language models, such as BERT, have recently achieved great success in a wide range of language understanding tasks. However, it remains an open question how to utilize BERT for text generation tasks. In this paper, we present a novel approach to addressing this challenge in a generic sequence-to-sequence (Seq2Seq) setting. We first propose a new task, Conditional Masked Language Modeling (C-MLM), to enable fine-tuning of BERT on the target text-generation dataset. The fine-tuned BERT (i.e., teacher) is then exploited as extra supervision to improve conventional Seq2Seq models (i.e., student) for text generation. By leveraging BERT’s idiosyncratic bidirectional nature, distilling the knowledge learned from BERT can encourage auto-regressive Seq2Seq models to plan ahead, imposing global sequence-level supervision for coherent text generation. Experiments show that the proposed approach significantly outperforms strong baselines of Transformer on multiple text generation tasks, including machine translation (MT) and text summarization. Our proposed model also achieves new state-of-the-art results on the IWSLT German-English and English-Vietnamese MT datasets.
Tasks	Language Modelling, Machine Translation, Text Generation, Text Summarization
Published	2020-01-01
URL	https://openreview.net/forum?id=Bkgz_krKPB
PDF	https://openreview.net/pdf?id=Bkgz_krKPB
PWC	https://paperswithcode.com/paper/distilling-the-knowledge-of-bert-for-text
Repo
Framework
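
The teacher-student setup in the abstract above can be illustrated with a generic knowledge-distillation loss for a single target token. This is a sketch under stated assumptions rather than the paper's exact objective: the mixing weight `alpha` and the use of a plain cross-entropy against the teacher distribution are hypothetical choices.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_probs, target_index, alpha=0.5):
    """Mix a soft (teacher) and a hard (ground-truth) cross-entropy term.

    teacher_probs -- distribution over the vocabulary from the teacher
    target_index  -- index of the ground-truth token
    alpha         -- hypothetical weight on the teacher term
    """
    student_probs = softmax(student_logits)
    soft = -sum(t * math.log(s) for t, s in zip(teacher_probs, student_probs))
    hard = -math.log(student_probs[target_index])
    return alpha * soft + (1.0 - alpha) * hard
```

When the teacher distribution is one-hot on the ground-truth token, the two terms coincide and the loss reduces to ordinary cross-entropy.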

Blending Diverse Physical Priors with Neural Networks


Title	Blending Diverse Physical Priors with Neural Networks
Authors	Anonymous
Abstract	Rethinking physics in the era of deep learning is an increasingly important topic. This topic is special because, in addition to data, one can leverage a vast library of physical prior models (e.g. kinematics, fluid flow, etc) to perform more robust inference. The nascent sub-field of physics-based learning (PBL) studies this problem of blending neural networks with physical priors. While previous PBL algorithms have been applied successfully to specific tasks, it is hard to generalize existing PBL methods to a wide range of physics-based problems. Such generalization would require an architecture that can adapt to variations in the correctness of the physics, or in the quality of training data. No such architecture exists. In this paper, we aim to generalize PBL, by making a first attempt to bring neural architecture search (NAS) to the realm of PBL. We introduce a new method known as physics-based neural architecture search (PhysicsNAS) that is a top-performer across a diverse range of quality in the physical model and the dataset.
Tasks	Neural Architecture Search
Published	2020-01-01
URL	https://openreview.net/forum?id=HkeQ6ANYDB
PDF	https://openreview.net/pdf?id=HkeQ6ANYDB
PWC	https://paperswithcode.com/paper/blending-diverse-physical-priors-with-neural-1
Repo
Framework

Hierarchical Summary-to-Article Generation


Title	Hierarchical Summary-to-Article Generation
Authors	Anonymous
Abstract	In this paper, we explore summary-to-article generation: the task of generating long articles given a short summary, which provides finer-grained content control for the generated text. To prevent sequence-to-sequence (seq2seq) models from degenerating into language models and to better control the long text to be generated, we propose a hierarchical generation approach which first generates a sketch of intermediate length based on the summary and then completes the article by enriching the generated sketch. To mitigate the discrepancy between the "oracle" sketch used during training and the noisy sketch generated during inference, we propose an end-to-end joint training framework based on multi-agent reinforcement learning. For evaluation, we use text summarization corpora by reversing their inputs and outputs, and introduce a novel evaluation method that employs a summarization system to summarize the generated article and test its match with the original input summary. Experiments show that our proposed hierarchical generation approach can generate a coherent and relevant article based on the given summary, yielding significant improvements upon conventional seq2seq models.
Tasks	Multi-agent Reinforcement Learning, Text Summarization
Published	2020-01-01
URL	https://openreview.net/forum?id=Hkl8Ia4YPH
PDF	https://openreview.net/pdf?id=Hkl8Ia4YPH
PWC	https://paperswithcode.com/paper/hierarchical-summary-to-article-generation
Repo
Framework

TED: A Pretrained Unsupervised Summarization Model with Theme Modeling and Denoising


Title	TED: A Pretrained Unsupervised Summarization Model with Theme Modeling and Denoising
Authors	Anonymous
Abstract	Text summarization aims to extract essential information from a piece of text and transform it into a concise version. Existing unsupervised abstractive summarization models use a recurrent neural network framework and ignore abundant unlabeled corpora resources. In order to address these issues, we propose TED, a transformer-based unsupervised summarization system with dataset-agnostic pretraining. We first leverage the lead bias in news articles to pretrain the model on large-scale corpora. Then, we finetune TED on target domains through theme modeling and a denoising autoencoder to enhance the quality of summaries. Notably, TED outperforms all unsupervised abstractive baselines on NYT, CNN/DM and English Gigaword datasets with various document styles. Further analysis shows that the summaries generated by TED are abstractive and contain even higher proportions of novel tokens than those from supervised models.
Tasks	Abstractive Text Summarization, Denoising, Text Summarization
Published	2020-01-01
URL	https://openreview.net/forum?id=Syxwsp4KDB
PDF	https://openreview.net/pdf?id=Syxwsp4KDB
PWC	https://paperswithcode.com/paper/ted-a-pretrained-unsupervised-summarization
Repo
Framework
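
Lead bias, as used in the abstract above, refers to the tendency of news articles to put the key information in their opening sentences. One way to turn that into pretraining data is sketched below. The split into "first `num_lead` sentences as pseudo-summary, rest as input", the default of three sentences, and the naive sentence splitter are all assumptions for illustration, not details taken from the abstract.

```python
import re

def lead_bias_pair(article, num_lead=3):
    """Split an article into (pseudo_summary, remaining_body).

    Sentences are split naively on '.', '!' or '?' followed by whitespace.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", article.strip()) if s]
    summary = " ".join(sentences[:num_lead])
    body = " ".join(sentences[num_lead:])
    return summary, body
```

A real pipeline would likely need a proper sentence tokenizer and filtering of articles whose lead does not actually summarize the body.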

On Mutual Information Maximization for Representation Learning


Title	On Mutual Information Maximization for Representation Learning
Authors	Anonymous
Abstract	Many recent methods for unsupervised or self-supervised representation learning train feature extractors by maximizing an estimate of the mutual information (MI) between different views of the data. This comes with several immediate problems: For example, MI is notoriously hard to estimate, and using it as an objective for representation learning may lead to highly entangled representations due to its invariance under arbitrary invertible transformations. Nevertheless, these methods have been repeatedly shown to excel in practice. In this paper we argue, and provide empirical evidence, that the success of these methods cannot be attributed to the properties of MI alone, and that they strongly depend on the inductive bias in both the choice of feature extractor architectures and the parametrization of the employed MI estimators. Finally, we establish a connection to deep metric learning and argue that this interpretation may be a plausible explanation for the success of the recently introduced methods.
Tasks	Metric Learning, Representation Learning
Published	2020-01-01
URL	https://openreview.net/forum?id=rkxoh24FPH
PDF	https://openreview.net/pdf?id=rkxoh24FPH
PWC	https://paperswithcode.com/paper/on-mutual-information-maximization-for-1
Repo
Framework
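
To make "maximizing an estimate of the mutual information between different views" concrete, the sketch below computes one commonly used estimator, the InfoNCE lower bound, from a matrix of critic scores. The abstract does not single out this estimator, so its choice here is an assumption. The name `scores` and the convention that diagonal entries are the positive pairs are illustrative.

```python
import math

def info_nce(scores):
    """InfoNCE estimate from a K x K critic score matrix.

    scores[i][j] is the critic value for view i of one input paired with
    view j of another; diagonal entries are the matching (positive) pairs.
    The estimate can never exceed log(K).
    """
    k = len(scores)
    total = 0.0
    for i in range(k):
        row = scores[i]
        m = max(row)
        # log-sum-exp over the row, computed stably
        log_sum = m + math.log(sum(math.exp(v - m) for v in row))
        total += row[i] - log_sum
    return math.log(k) + total / k
```

An uninformative critic (all scores equal) yields an estimate of zero, while a critic that separates positives from negatives sharply pushes the estimate toward its ceiling of log(K). That ceiling is one reason such estimates can be loose.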

Frequency Principle: Fourier Analysis Sheds Light on Deep Neural Networks


Title	Frequency Principle: Fourier Analysis Sheds Light on Deep Neural Networks
Authors	Anonymous
Abstract	We study the training process of Deep Neural Networks (DNNs) from the Fourier analysis perspective. We demonstrate a very universal Frequency Principle (F-Principle), namely that DNNs often fit target functions from low to high frequencies, on high-dimensional benchmark datasets, such as MNIST/CIFAR10, and deep networks, such as VGG16. This F-Principle of DNNs is opposite to the learning behavior of most conventional iterative numerical schemes (e.g., Jacobi method), which exhibit faster convergence for higher frequencies, for various scientific computing problems. With a naive theory, we illustrate that this F-Principle results from the regularity of the commonly used activation functions. The F-Principle implies an implicit bias that DNNs tend to fit training data by a low-frequency function. This understanding provides an explanation of good generalization of DNNs on most real datasets and bad generalization of DNNs on parity functions or randomized datasets.
Tasks
Published	2020-01-01
URL	https://openreview.net/forum?id=Skgb5h4KPH
PDF	https://openreview.net/pdf?id=Skgb5h4KPH
PWC	https://paperswithcode.com/paper/frequency-principle-fourier-analysis-sheds-1
Repo
Framework
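
The contrast drawn in the abstract above, that Jacobi-type iterations remove high-frequency error faster than low-frequency error, can be checked on a small example. The sketch below is an illustration under assumptions: it uses the damped (weighted) Jacobi variant with weight 2/3 on a 1D Laplace problem with zero boundary values, and the grid size and sweep count are arbitrary.

```python
import math

def weighted_jacobi_error(mode, n=63, sweeps=10, omega=2.0 / 3.0):
    """Fraction of a sine-mode error left after some weighted Jacobi sweeps.

    The problem is -u'' = 0 with zero boundary values, whose exact solution
    is u = 0, so the current iterate is itself the error.
    """
    u = [math.sin(mode * math.pi * (i + 1) / (n + 1)) for i in range(n)]
    start = max(abs(v) for v in u)
    for _ in range(sweeps):
        new = []
        for i in range(n):
            left = u[i - 1] if i > 0 else 0.0
            right = u[i + 1] if i < n - 1 else 0.0
            jacobi = 0.5 * (left + right)  # plain Jacobi update
            new.append((1 - omega) * u[i] + omega * jacobi)
        u = new
    return max(abs(v) for v in u) / start

low = weighted_jacobi_error(mode=1)    # smoothest error component
high = weighted_jacobi_error(mode=63)  # most oscillatory error component
```

Here `high` ends up orders of magnitude smaller than `low`, the opposite ordering to the low-to-high fitting behavior the abstract reports for DNNs.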

Gaussian Process Meta-Representations Of Neural Networks


Title	Gaussian Process Meta-Representations Of Neural Networks
Authors	Anonymous
Abstract	Bayesian inference offers a theoretically grounded and general way to train neural networks and can potentially give calibrated uncertainty. It is, however, challenging to specify a meaningful and tractable prior over the network parameters. More crucially, many existing inference methods assume mean-field approximate posteriors, ignoring interactions between parameters in high-dimensional weight space. To this end, this paper introduces two innovations: (i) a Gaussian process-based hierarchical model for the network parameters based on recently introduced unit embeddings that can flexibly encode weight structures, and (ii) input-dependent contextual variables for the weight prior that can provide convenient ways to regularize the function space being modeled by the NN through the use of kernels. Furthermore, we develop an efficient structured variational inference scheme that alleviates the need to perform inference in the weight space whilst retaining and learning non-trivial correlations between network parameters. We show these models provide desirable test-time uncertainty estimates, demonstrate cases of modeling inductive biases for neural networks with kernels and demonstrate competitive predictive performance of the proposed model and algorithm over alternative approaches on a range of classification and active learning tasks.
Tasks	Active Learning, Bayesian Inference
Published	2020-01-01
URL	https://openreview.net/forum?id=HkxwmRVtwH
PDF	https://openreview.net/pdf?id=HkxwmRVtwH
PWC	https://paperswithcode.com/paper/gaussian-process-meta-representations-of
Repo
Framework

Decoupling Adaptation from Modeling with Meta-Optimizers for Meta Learning


Title	Decoupling Adaptation from Modeling with Meta-Optimizers for Meta Learning
Authors	Anonymous
Abstract	Meta-learning methods, most notably Model-Agnostic Meta-Learning (Finn et al., 2017) or MAML, have achieved great success in adapting to new tasks quickly, after having been trained on similar tasks. The mechanism behind their success, however, is poorly understood. We begin this work with an experimental analysis of MAML, finding that deep models are crucial for its success, even given sets of simple tasks where a linear model would suffice on any individual task. Furthermore, on image-recognition tasks, we find that the early layers of MAML-trained models learn task-invariant features, while later layers are used for adaptation, providing further evidence that these models require greater capacity than is strictly necessary for their individual tasks. Following our findings, we propose a method which enables better use of model capacity at inference time by separating the adaptation aspect of meta-learning into parameters that are only used for adaptation but are not part of the forward model. We find that our approach enables more effective meta-learning in smaller models, which are suitably sized for the individual tasks.
Tasks	Meta-Learning
Published	2020-01-01
URL	https://openreview.net/forum?id=BkljIlHtvS
PDF	https://openreview.net/pdf?id=BkljIlHtvS
PWC	https://paperswithcode.com/paper/decoupling-adaptation-from-modeling-with-meta
Repo
Framework

LOGAN: Latent Optimisation for Generative Adversarial Networks


Title	LOGAN: Latent Optimisation for Generative Adversarial Networks
Authors	Anonymous
Abstract	Training generative adversarial networks requires balancing of delicate adversarial dynamics. Even with careful tuning, training may diverge or end up in a bad equilibrium with dropped modes. In this work, we introduce a new form of latent optimisation inspired by the CS-GAN and show that it improves adversarial dynamics by enhancing interactions between the discriminator and the generator. We develop supporting theoretical analysis from the perspectives of differentiable games and stochastic approximation. Our experiments demonstrate that latent optimisation can significantly improve GAN training, obtaining state-of-the-art performance for the ImageNet (128 x 128) dataset. Our model achieves an Inception Score (IS) of 148 and a Fréchet Inception Distance (FID) of 3.4, an improvement of 17% and 32% in IS and FID respectively, compared with the baseline BigGAN-deep model with the same architecture and number of parameters.
Tasks
Published	2020-01-01
URL	https://openreview.net/forum?id=rJeU_1SFvr
PDF	https://openreview.net/pdf?id=rJeU_1SFvr
PWC	https://paperswithcode.com/paper/logan-latent-optimisation-for-generative
Repo
Framework
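
The abstract above does not state the latent update rule, so the following is only a sketch of the general idea of latent optimisation: nudging the latent vector so that the discriminator score of the generated sample increases. It assumes a plain gradient-ascent step with a finite-difference gradient. The step size `alpha` and the stand-in `toy_score` function are hypothetical, and the paper's actual update may differ.

```python
def latent_step(z, score, alpha=0.1, eps=1e-5):
    """One gradient-ascent step on the latent z to raise score(z) = D(G(z))."""
    stepped = []
    for i in range(len(z)):
        hi = z[:i] + [z[i] + eps] + z[i + 1:]
        lo = z[:i] + [z[i] - eps] + z[i + 1:]
        # Central finite difference approximates the partial derivative.
        grad_i = (score(hi) - score(lo)) / (2 * eps)
        stepped.append(z[i] + alpha * grad_i)
    return stepped

def toy_score(z):
    # Hypothetical stand-in for D(G(z)); it peaks at z = (1, -1).
    return -((z[0] - 1.0) ** 2 + (z[1] + 1.0) ** 2)

z0 = [0.0, 0.0]
z1 = latent_step(z0, toy_score)
```

In a real GAN the gradient would come from automatic differentiation through the discriminator and generator rather than from finite differences.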

Efficient High-Dimensional Data Representation Learning via Semi-Stochastic Block Coordinate Descent Methods


Title	Efficient High-Dimensional Data Representation Learning via Semi-Stochastic Block Coordinate Descent Methods
Authors	Anonymous
Abstract	With the increase of data volume and data dimension, sparse representation learning attracts more and more attention. For high-dimensional data, randomized block coordinate descent methods perform well because they do not need to calculate the gradient along the whole dimension. Existing hard thresholding algorithms evaluate gradients followed by a hard thresholding operation to update the model parameter, which leads to slow convergence. To address this issue, we propose a novel hard thresholding algorithm, called Semi-stochastic Block Coordinate Descent Hard Thresholding Pursuit (SBCD-HTP). Moreover, we present its sparse and asynchronous parallel variants. We theoretically analyze the convergence properties of our algorithms, which show that they have a significantly lower hard thresholding complexity than existing algorithms. Our empirical evaluations on real-world datasets and face recognition tasks demonstrate the superior performance of our algorithms for sparsity-constrained optimization problems.
Tasks	Face Recognition, Representation Learning
Published	2020-01-01
URL	https://openreview.net/forum?id=HkewNJStDr
PDF	https://openreview.net/pdf?id=HkewNJStDr
PWC	https://paperswithcode.com/paper/efficient-high-dimensional-data
Repo
Framework
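
The baseline pattern the abstract above refers to, a gradient evaluation followed by a hard thresholding operation, can be sketched in a few lines. This illustrates only that generic update, not the proposed SBCD-HTP algorithm. The function names and the step size are chosen for illustration.

```python
def hard_threshold(w, k):
    """Keep the k largest-magnitude entries of w and zero out the rest."""
    order = sorted(range(len(w)), key=lambda i: abs(w[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(w)]

def gradient_then_threshold(w, grad, step, k):
    """One gradient step followed by hard thresholding to sparsity level k."""
    moved = [wi - step * gi for wi, gi in zip(w, grad)]
    return hard_threshold(moved, k)
```

Because the thresholding step sorts by magnitude, it is comparatively expensive in high dimensions. That is why the abstract counts hard thresholding operations as a complexity measure.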