Paper Group NANR 138
Generalized Bayesian Posterior Expectation Distillation for Deep Neural Networks. AUGMENTED POLICY GRADIENT METHODS FOR EFFICIENT REINFORCEMENT LEARNING. Sample Efficient Policy Gradient Methods with Recursive Variance Reduction. Topological Autoencoders. Policy Optimization with Stochastic Mirror Descent. Generative Latent Flow. Neural Phrase-to-Phrase Machine Translation. In-training Matrix Factorization for Parameter-frugal Neural Machine Translation. UW-NET: AN INCEPTION-ATTENTION NETWORK FOR UNDERWATER IMAGE CLASSIFICATION. Uncertainty-aware Variational-Recurrent Imputation Network for Clinical Time Series. Compression without Quantization. Learning Cross-Context Entity Representations from Text. A new perspective in understanding of Adam-Type algorithms and beyond. Autoencoders and Generative Adversarial Networks for Imbalanced Sequence Classification. Statistically Consistent Saliency Estimation.
Generalized Bayesian Posterior Expectation Distillation for Deep Neural Networks
Title | Generalized Bayesian Posterior Expectation Distillation for Deep Neural Networks |
Authors | Anonymous |
Abstract | In this paper, we present a general framework for distilling expectations with respect to the Bayesian posterior distribution of a deep neural network, significantly extending prior work on a method known as "Bayesian Dark Knowledge." Our generalized framework applies to the case of classification models and takes as input the architecture of a "teacher" network, a general posterior expectation of interest, and the architecture of a "student" network. The distillation method performs an online compression of the selected posterior expectation using iteratively generated Monte Carlo samples from the parameter posterior of the teacher model. We further consider the problem of optimizing the student model architecture with respect to an accuracy-speed-storage trade-off. We present experimental results investigating multiple data sets, distillation targets, teacher model architectures, and approaches to searching for student model architectures. We establish the key result that distilling into a student model with an architecture that matches the teacher, as is done in Bayesian Dark Knowledge, can lead to sub-optimal performance. Lastly, we show that student architecture search methods can identify student models with significantly improved performance. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Byg_vREtvB |
https://openreview.net/pdf?id=Byg_vREtvB | |
PWC | https://paperswithcode.com/paper/generalized-bayesian-posterior-expectation |
Repo | |
Framework | |
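As a rough illustration of the distillation loop described in the abstract above, the sketch below maintains an online Monte Carlo estimate of the teacher's posterior predictive expectation and regresses a student onto it. The teacher's posterior sampler (an SGLD chain in practice) is replaced by a hypothetical random-draw stand-in, and all models and shapes are toy choices, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)

def student_forward(w, X):
    # Toy linear "student": logits -> softmax class probabilities.
    z = X @ w
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def teacher_posterior_samples(n_samples, X, n_classes=3):
    # Stand-in for iteratively generated posterior draws (SGLD in practice).
    for _ in range(n_samples):
        w_t = rng.normal(size=(X.shape[1], n_classes))  # one posterior sample
        yield student_forward(w_t, X)                   # teacher predictive probs

X = rng.normal(size=(32, 5))
w_s = np.zeros((5, 3))
running_mean = np.zeros((32, 3))

for t, p_teacher in enumerate(teacher_posterior_samples(200, X), start=1):
    # Online Monte Carlo estimate of the posterior predictive expectation.
    running_mean += (p_teacher - running_mean) / t
    # One SGD step moving the student toward the current expectation
    # (cross-entropy against the soft target; gradient is p_s - target).
    p_s = student_forward(w_s, X)
    grad = X.T @ (p_s - running_mean) / X.shape[0]
    w_s -= 0.1 * grad
```

In the actual method the distillation target can be any posterior expectation of interest (e.g., predictive means or uncertainties), not only class probabilities.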
AUGMENTED POLICY GRADIENT METHODS FOR EFFICIENT REINFORCEMENT LEARNING
Title | AUGMENTED POLICY GRADIENT METHODS FOR EFFICIENT REINFORCEMENT LEARNING |
Authors | Anonymous |
Abstract | We propose a new mixture of model-based and model-free reinforcement learning (RL) algorithms that combines the strengths of both approaches. Our goal is to reduce the sample complexity of model-free methods by utilizing fictitious trajectory rollouts performed on a learned dynamics model, improving the data efficiency of policy gradient methods while maintaining the same asymptotic behaviour. We suggest using a special type of uncertainty quantification by a stochastic dynamics model, in which the next-state prediction is randomly drawn from the distribution predicted by the dynamics model. As a result, the negative effect of exploiting erroneously optimistic regions of the dynamics model is addressed by next-state predictions based on an uncertainty-aware ensemble of dynamics models. The influence of the ensemble on the policy update is controlled by adjusting the number of virtually performed rollouts in the next iteration according to the ratio of the real and virtual total reward. Our approach, which we call Model-Based Policy Gradient Enrichment (MBPGE), is tested on a collection of benchmarks including simulated robotic locomotion. We compare our approach to plain model-free algorithms and a model-based one. Our evaluation shows that MBPGE learns faster in the early training stage and achieves improved asymptotic behaviour. |
Tasks | Policy Gradient Methods |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=S1gN8yrYwB |
https://openreview.net/pdf?id=S1gN8yrYwB | |
PWC | https://paperswithcode.com/paper/augmented-policy-gradient-methods-for |
Repo | |
Framework | |
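The rollout-budget rule in the abstract above (scaling the number of virtual rollouts by how well virtual returns track real returns) admits a one-line sketch. The function name, bounds, and the exact form of the ratio are our assumptions, for illustration only:

```python
def next_virtual_rollouts(n_current, real_return, virtual_return,
                          n_min=1, n_max=100):
    # If the model is optimistic (virtual > real), the ratio < 1 and the
    # virtual-rollout budget for the next iteration shrinks accordingly.
    ratio = real_return / max(virtual_return, 1e-8)
    return int(min(max(n_current * ratio, n_min), n_max))

print(next_virtual_rollouts(20, real_return=90.0, virtual_return=120.0))  # 15
```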
Sample Efficient Policy Gradient Methods with Recursive Variance Reduction
Title | Sample Efficient Policy Gradient Methods with Recursive Variance Reduction |
Authors | Anonymous |
Abstract | Improving the sample efficiency in reinforcement learning has been a long-standing research problem. In this work, we aim to reduce the sample complexity of existing policy gradient methods. We propose a novel policy gradient algorithm called SRVR-PG, which only requires $O(1/\epsilon^{3/2})$ episodes (where $O(\cdot)$ hides constant factors) to find an $\epsilon$-approximate stationary point of the nonconcave performance function $J(\boldsymbol{\theta})$ (i.e., $\boldsymbol{\theta}$ such that $\|\nabla J(\boldsymbol{\theta})\|_2^2\leq\epsilon$). This sample complexity improves the existing result $O(1/\epsilon^{5/3})$ for stochastic variance reduced policy gradient algorithms by a factor of $O(1/\epsilon^{1/6})$. In addition, we also propose a variant of SRVR-PG with parameter exploration, which explores the initial policy parameter from a prior probability distribution. We conduct numerical experiments on classic control problems in reinforcement learning to validate the performance of our proposed algorithms. |
Tasks | Policy Gradient Methods |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HJlxIJBFDr |
https://openreview.net/pdf?id=HJlxIJBFDr | |
PWC | https://paperswithcode.com/paper/sample-efficient-policy-gradient-methods-with-1 |
Repo | |
Framework | |
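A minimal numpy sketch of the recursive (SARAH-style) variance-reduced estimator underlying SRVR-PG: an outer loop refreshes the gradient estimate with a large batch of episodes, and inner steps correct it with gradient differences computed on small batches. The REINFORCE gradients are replaced here by a synthetic stochastic gradient of $f(\boldsymbol{\theta})=\frac{1}{2}\|\boldsymbol{\theta}\|^2$, so this is a schematic of the estimator, not the paper's algorithm:

```python
import numpy as np

rng = np.random.default_rng(1)

def paired_grads(theta_new, theta_old, episodes):
    # In SRVR-PG both gradients are evaluated on the SAME sampled episodes,
    # with importance weights correcting for the policy shift; shared noise
    # mimics that coupling on the synthetic objective.
    noise = rng.normal(scale=0.5, size=(episodes, theta_new.size)).mean(axis=0)
    return theta_new + noise, theta_old + noise

theta = rng.normal(size=10)
theta_prev = theta.copy()
v, _ = paired_grads(theta, theta, episodes=100)  # large reference batch
lr = 0.05

for t in range(1, 51):
    if t % 10 == 0:
        # Outer loop: refresh the estimate with a large batch of episodes.
        v, _ = paired_grads(theta, theta, episodes=100)
    else:
        # Inner loop: recursive semi-stochastic update from a small batch.
        g_new, g_old = paired_grads(theta, theta_prev, episodes=5)
        v = v + g_new - g_old
    theta_prev = theta.copy()
    theta = theta - lr * v

print(np.linalg.norm(theta))  # shrinks toward the optimum at 0
```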
Topological Autoencoders
Title | Topological Autoencoders |
Authors | Anonymous |
Abstract | We propose a novel approach for preserving topological structures of the input space in latent representations of autoencoders. Using persistent homology, a technique from topological data analysis, we calculate topological signatures of both the input and latent space to derive a topological loss term. Under weak theoretical assumptions, we can construct this loss in a differentiable manner, such that the encoding learns to retain multi-scale connectivity information. We show that our approach is theoretically well-founded and that it exhibits favourable latent representations on a synthetic manifold as well as on real-world image data sets, while preserving low reconstruction errors. |
Tasks | Topological Data Analysis |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HkgtJRVFPS |
https://openreview.net/pdf?id=HkgtJRVFPS | |
PWC | https://paperswithcode.com/paper/topological-autoencoders |
Repo | |
Framework | |
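Zero-dimensional persistence pairings of a point cloud coincide with the edges of a minimum spanning tree of its distance matrix, which suggests the following numpy/scipy sketch of a topological loss term: select the topologically relevant distances in each space and penalize their disagreement. This is our reading of the construction, with toy data standing in for an encoder:

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def mst_edge_pairs(X):
    # 0-dimensional persistence pairings correspond to MST edges of the
    # pairwise distance matrix of the batch.
    D = squareform(pdist(X))
    mst = minimum_spanning_tree(D).tocoo()
    return list(zip(mst.row, mst.col)), D

def topological_loss(X, Z):
    # Compare the "topologically relevant" distances selected in each space,
    # in both directions (input-selected and latent-selected pairs).
    pairs_x, Dx = mst_edge_pairs(X)
    pairs_z, Dz = mst_edge_pairs(Z)
    loss_x = sum((Dx[i, j] - Dz[i, j]) ** 2 for i, j in pairs_x)
    loss_z = sum((Dz[i, j] - Dx[i, j]) ** 2 for i, j in pairs_z)
    return 0.5 * (loss_x + loss_z)

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 10))  # input batch
Z = X[:, :2]                   # stand-in for encoder output
print(topological_loss(X, Z))
```

In the autoencoder this term is added to the reconstruction loss, so the encoding learns to retain the selected multi-scale connectivity information.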
Policy Optimization with Stochastic Mirror Descent
Title | Policy Optimization with Stochastic Mirror Descent |
Authors | Anonymous |
Abstract | Improving sample efficiency has been a longstanding goal in reinforcement learning. In this paper, we propose $\mathtt{VRMPO}$: a sample-efficient policy gradient method based on stochastic mirror descent. A novel variance-reduced policy gradient estimator is the key to $\mathtt{VRMPO}$'s improved sample efficiency. $\mathtt{VRMPO}$ needs only $\mathcal{O}(\epsilon^{-3})$ sample trajectories to achieve an $\epsilon$-approximate first-order stationary point, which matches the best-known sample complexity. We conduct extensive experiments showing that our algorithm outperforms state-of-the-art policy gradient methods in various settings. |
Tasks | Policy Gradient Methods |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SkxpDT4YvS |
https://openreview.net/pdf?id=SkxpDT4YvS | |
PWC | https://paperswithcode.com/paper/policy-optimization-with-stochastic-mirror-1 |
Repo | |
Framework | |
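For concreteness, a single stochastic mirror descent step under the negative-entropy mirror map (the exponentiated-gradient update) looks as follows; the simplex-constrained setting and the synthetic gradient are illustrative assumptions, not the paper's parameterization:

```python
import numpy as np

def md_step(p, grad, lr=0.1):
    # Mirror descent with the negative-entropy mirror map: take the step
    # in the dual space, then map back to the probability simplex.
    w = p * np.exp(-lr * grad)
    return w / w.sum()

p = np.full(4, 0.25)                  # current point on the simplex
g = np.array([1.0, 0.0, -1.0, 0.5])   # stand-in stochastic gradient
print(md_step(p, g))                  # mass shifts toward low-gradient coords
```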
Generative Latent Flow
Title | Generative Latent Flow |
Authors | Anonymous |
Abstract | In this work, we propose the Generative Latent Flow (GLF), an algorithm for generative modeling of the data distribution. GLF uses an Auto-encoder (AE) to learn latent representations of the data, and a normalizing flow to map the distribution of the latent variables to that of simple i.i.d. noise. In contrast to some other Auto-encoder based generative models, which use various regularizers that encourage the encoded latent distribution to match the prior distribution, our model explicitly constructs a mapping between these two distributions, leading to better density matching while avoiding over-regularizing the latent variables. We compare our model with several related techniques and show that it has many relative advantages, including fast convergence, single-stage training, and a minimal reconstruction trade-off. We also study the relationship between our model and its stochastic counterpart, and show that our model can be viewed as a vanishing-noise limit of VAEs with a flow prior. Quantitatively, under standardized evaluations, our method achieves state-of-the-art sample quality and diversity among AE-based models on commonly used datasets, and is competitive with GAN benchmarks. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Syg7VaNYPB |
https://openreview.net/pdf?id=Syg7VaNYPB | |
PWC | https://paperswithcode.com/paper/generative-latent-flow |
Repo | |
Framework | |
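A compressed sketch of the two GLF ingredients: an autoencoder trained for reconstruction, plus a flow fit by change-of-variables negative log-likelihood to map latents to standard normal noise. Here a single elementwise affine layer stands in for the full normalizing flow, and the shapes and the detached-latent choice are our assumptions:

```python
import torch
import torch.nn as nn

enc = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 20))
dec = nn.Sequential(nn.Linear(20, 128), nn.ReLU(), nn.Linear(128, 784))

log_scale = nn.Parameter(torch.zeros(20))  # toy "flow" parameters
shift = nn.Parameter(torch.zeros(20))

params = list(enc.parameters()) + list(dec.parameters()) + [log_scale, shift]
opt = torch.optim.Adam(params, lr=1e-3)

x = torch.rand(64, 784)                    # stand-in data batch
z = enc(x)
recon = ((dec(z) - x) ** 2).mean()         # AE reconstruction loss

# Change-of-variables NLL (constants dropped): u = (z - shift) * exp(-log_scale)
# should be standard normal; the flow sees detached latents.
u = (z.detach() - shift) * torch.exp(-log_scale)
nll = (0.5 * u.pow(2) + log_scale).sum(dim=1).mean()

loss = recon + nll
opt.zero_grad(); loss.backward(); opt.step()

# Sampling: draw i.i.d. noise, invert the flow, decode.
eps = torch.randn(16, 20)
samples = dec(eps * torch.exp(log_scale) + shift)
```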
Neural Phrase-to-Phrase Machine Translation
Title | Neural Phrase-to-Phrase Machine Translation |
Authors | Anonymous |
Abstract | We present Neural Phrase-to-Phrase Machine Translation (\nppmt), a phrase-based translation model that uses a novel phrase-attention mechanism to discover relevant input (source) segments for generating output (target) phrases. We propose an efficient dynamic programming algorithm to marginalize over all possible segmentations at training time, and use a greedy algorithm or beam search for decoding. We also show how to incorporate a memory module derived from an external phrase dictionary into \nppmt{} to improve decoding. Experimental results demonstrate that \nppmt{} outperforms the best neural phrase-based translation model (Huang et al., 2018) in terms of both performance and speed, and is comparable to a state-of-the-art Transformer-based machine translation system (Vaswani et al., 2017). |
Tasks | Machine Translation |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=S1gtclSFvr |
https://openreview.net/pdf?id=S1gtclSFvr | |
PWC | https://paperswithcode.com/paper/neural-phrase-to-phrase-machine-translation-1 |
Repo | |
Framework | |
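The marginalization over segmentations mentioned above is a standard forward dynamic program over segment boundaries; a sketch with a stand-in phrase score (the real model scores phrases via the phrase-attention mechanism):

```python
import numpy as np

def logsumexp(a):
    m = a.max()
    return m + np.log(np.exp(a - m).sum())

def marginalize_segmentations(seg_logp, T, max_len=4):
    # alpha[j] = log-marginal over all segmentations of the first j target
    # tokens; seg_logp(i, j) is the model's log-probability of emitting
    # tokens i..j-1 as one phrase.
    alpha = np.full(T + 1, -np.inf)
    alpha[0] = 0.0
    for j in range(1, T + 1):
        terms = [alpha[i] + seg_logp(i, j)
                 for i in range(max(0, j - max_len), j)]
        alpha[j] = logsumexp(np.array(terms))
    return alpha[T]   # log-marginal over every phrase segmentation

score = lambda i, j: -0.5 * (j - i)   # toy phrase score, for illustration
print(marginalize_segmentations(score, T=10))
```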
In-training Matrix Factorization for Parameter-frugal Neural Machine Translation
Title | In-training Matrix Factorization for Parameter-frugal Neural Machine Translation |
Authors | Anonymous |
Abstract | In this paper, we propose the use of in-training matrix factorization to reduce the model size for neural machine translation. Using in-training matrix factorization, parameter matrices may be decomposed into products of smaller matrices, which can compress large machine translation architectures by vastly reducing the number of learnable parameters. We apply in-training matrix factorization to different layers of standard neural architectures and show that it can reduce the number of learnable parameters by nearly 50% without any associated loss in BLEU score. Further, we find that in-training matrix factorization is especially powerful on embedding layers, providing a simple and effective method to curtail the number of parameters with minimal impact on model performance, and, at times, an increase in performance. |
Tasks | Machine Translation |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HJg0_eBFwB |
https://openreview.net/pdf?id=HJg0_eBFwB | |
PWC | https://paperswithcode.com/paper/in-training-matrix-factorization-for |
Repo | |
Framework | |
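In-training factorization amounts to parameterizing a weight matrix as the product of two low-rank factors from the start of training, rather than factorizing after the fact. A minimal torch sketch (hypothetical module, not the authors' code) with a parameter-count comparison:

```python
import torch
import torch.nn as nn

class FactorizedLinear(nn.Module):
    """Replace an (in x out) weight matrix with the product of (in x r) and
    (r x out) factors, trained directly in that decomposed form."""
    def __init__(self, d_in, d_out, rank):
        super().__init__()
        self.A = nn.Linear(d_in, rank, bias=False)
        self.B = nn.Linear(rank, d_out)

    def forward(self, x):
        return self.B(self.A(x))

full = nn.Linear(512, 512)               # 512*512 + 512 = 262,656 params
frugal = FactorizedLinear(512, 512, 64)  # 512*64 + 64*512 + 512 = 66,048

n = lambda m: sum(p.numel() for p in m.parameters())
print(n(full), n(frugal))  # ~75% fewer learnable parameters at rank 64
```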
UW-NET: AN INCEPTION-ATTENTION NETWORK FOR UNDERWATER IMAGE CLASSIFICATION
Title | UW-NET: AN INCEPTION-ATTENTION NETWORK FOR UNDERWATER IMAGE CLASSIFICATION |
Authors | Miao Yang and Ke Hu, Chongyi Li, Zhiqiang Wei |
Abstract | Classifying images captured in imaging environments other than air is a first challenge in extending the applications of deep learning. We report UW-Net (Underwater Network), a new convolutional neural network (CNN) for underwater image classification. In this model, we simulate the visual correlation of background attention with image understanding in special environments, such as fog and underwater, by constructing an inception-attention (I-A) module. The experimental results demonstrate that the proposed UW-Net achieves an accuracy of 99.3% on underwater image classification, significantly better than other image classification networks such as AlexNet, InceptionV3, ResNet and SE-ResNet. Moreover, we demonstrate that the proposed I-A module can be used to boost the performance of existing object recognition networks: by substituting the inception module with the I-A module, the Inception-ResNetV2 network achieves a 10.7% top-1 error rate and a 0% top-5 error rate on a subset of ILSVRC-2012, which further illustrates the role of background attention in image classification. |
Tasks | Image Classification, Object Recognition |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HklCmaVtPS |
https://openreview.net/pdf?id=HklCmaVtPS | |
PWC | https://paperswithcode.com/paper/uw-net-an-inception-attention-network-for |
Repo | |
Framework | |
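The abstract does not specify the I-A module's internals, so the torch sketch below is only one plausible reading: parallel inception-style branches whose concatenated output is reweighted by squeeze-and-excitation style channel attention. All layer choices are our assumptions:

```python
import torch
import torch.nn as nn

class InceptionAttention(nn.Module):
    """Illustrative I-A-style block (our reading, not the authors' code)."""
    def __init__(self, c_in, c_branch=32, reduction=4):
        super().__init__()
        self.b1 = nn.Conv2d(c_in, c_branch, 1)
        self.b3 = nn.Conv2d(c_in, c_branch, 3, padding=1)
        self.b5 = nn.Conv2d(c_in, c_branch, 5, padding=2)
        c = 3 * c_branch
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(c, c // reduction), nn.ReLU(),
            nn.Linear(c // reduction, c), nn.Sigmoid())

    def forward(self, x):
        y = torch.cat([self.b1(x), self.b3(x), self.b5(x)], dim=1)
        w = self.attn(y).unsqueeze(-1).unsqueeze(-1)
        return y * w  # channel reweighting, e.g. suppressing background

print(InceptionAttention(16)(torch.randn(2, 16, 32, 32)).shape)
```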
Uncertainty-aware Variational-Recurrent Imputation Network for Clinical Time Series
Title | Uncertainty-aware Variational-Recurrent Imputation Network for Clinical Time Series |
Authors | Anonymous |
Abstract | Electronic Health Records (EHR) comprise longitudinal clinical observations characterized by sparsity, irregularity, and high dimensionality, which are major obstacles to drawing reliable downstream conclusions. Although a great number of imputation methods have been proposed to tackle these issues, most existing methods ignore correlated features or temporal dynamics and entirely set aside uncertainty. In particular, since missing-value estimates risk being imprecise, we are motivated to treat reliable and less certain information differently. In this work, we propose a novel variational-recurrent imputation network (V-RIN), which unifies the imputation and prediction networks by taking into account correlated features and temporal dynamics, and further utilizes uncertainty to alleviate the risk of biased missing-value estimates. Specifically, we leverage a deep generative model to estimate missing values based on the distribution among variables, and a recurrent imputation network to exploit temporal relations in conjunction with the uncertainty. We validated the effectiveness of our proposed model on a publicly available real-world EHR dataset, PhysioNet Challenge 2012, and compared the results with other state-of-the-art competing methods in the literature. |
Tasks | Imputation, Time Series |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=ryg2wlSFwS |
https://openreview.net/pdf?id=ryg2wlSFwS | |
PWC | https://paperswithcode.com/paper/uncertainty-aware-variational-recurrent |
Repo | |
Framework | |
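One way to read the uncertainty-aware imputation above: keep observed entries, and downweight imputed estimates in proportion to the generative model's per-entry variance before feeding the result to the recurrent predictor. The weighting function below is an assumption for illustration, as are the stand-in model outputs:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy batch: x has missing entries wherever mask == 0.
x = rng.normal(size=(8, 5))
mask = rng.integers(0, 2, size=x.shape)

# Stand-ins for the generative model's outputs: a mean estimate per entry
# and a variance read as that estimate's uncertainty.
x_hat = rng.normal(size=x.shape)
var = rng.uniform(0.1, 2.0, size=x.shape)

# Uncertainty-aware fill-in: observed values pass through unchanged,
# imputed values are shrunk when the model is uncertain about them.
confidence = 1.0 / (1.0 + var)   # assumed weighting, for illustration
x_filled = mask * x + (1 - mask) * confidence * x_hat
```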
Compression without Quantization
Title | Compression without Quantization |
Authors | Anonymous |
Abstract | Standard compression algorithms work by mapping an image to discrete code using an encoder, from which the original image can be reconstructed through a decoder. This process, due to the quantization step, is inherently non-differentiable, so these algorithms must rely on approximate methods to train the encoder and decoder end-to-end. In this paper, we present an innovative framework for lossy image compression which is able to circumvent the quantization step by relying on a non-deterministic compression codec. The encoder maps the input image to a distribution in continuous space, from which a sample can be encoded with expected code length equal to the relative entropy to the encoding distribution, i.e. it is bits-back efficient. The result is a principled, end-to-end differentiable compression framework that can be straightforwardly trained using standard gradient-based optimizers. To showcase the efficiency of our method, we apply it to lossy image compression by training Probabilistic Ladder Networks (PLNs) on the CLIC 2018 dataset and show that their rate-distortion curves on the Kodak dataset are competitive with the state-of-the-art at low bitrates. |
Tasks | Image Compression, Quantization |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HyeG9lHYwH |
https://openreview.net/pdf?id=HyeG9lHYwH | |
PWC | https://paperswithcode.com/paper/compression-without-quantization |
Repo | |
Framework | |
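The rate term in this framework is a relative entropy rather than a quantized code length: communicating a sample from an encoding distribution $q$ under a prior $p$ costs roughly $\mathrm{KL}(q\,\|\,p)$ bits (the bits-back argument). A minimal sketch of that rate computation for diagonal Gaussians, with toy values:

```python
import numpy as np

def kl_bits(mu_q, sigma_q):
    # KL( N(mu_q, sigma_q^2) || N(0, 1) ) per dimension, summed, in bits:
    # 0.5 * (sigma^2 + mu^2 - 1 - 2*log(sigma)), converted from nats.
    kl_nats = 0.5 * np.sum(sigma_q**2 + mu_q**2 - 1.0 - 2.0 * np.log(sigma_q))
    return kl_nats / np.log(2.0)

mu = np.array([0.3, -1.2, 0.0])
sigma = np.array([0.9, 0.5, 1.1])
print(f"expected code length ~ {kl_bits(mu, sigma):.2f} bits")
```

Because this quantity is differentiable in the encoder's outputs, it can be minimized directly with standard gradient-based optimizers, which is the point of skipping quantization.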
Learning Cross-Context Entity Representations from Text
Title | Learning Cross-Context Entity Representations from Text |
Authors | Anonymous |
Abstract | Language modeling tasks, in which words, or word-pieces, are predicted on the basis of a local context, have been very effective for learning word embeddings and context dependent representations of phrases. Motivated by the observation that efforts to code world knowledge into machine readable knowledge bases or human readable encyclopedias tend to be entity-centric, we investigate the use of a fill-in-the-blank task to learn context independent representations of entities from the text contexts in which those entities were mentioned. We show that large scale training of neural models allows us to learn high quality entity representations, and we demonstrate successful results on four domains: (1) existing entity-level typing benchmarks, including a 64% error reduction over previous work on TypeNet (Murty et al., 2018); (2) a novel few-shot category reconstruction task; (3) existing entity linking benchmarks, where we achieve a score of 87.3% on TAC-KBP 2010 without using any alias table, external knowledge base or in domain training data and (4) answering trivia questions, which uniquely identify entities. Our global entity representations encode fine-grained type categories, such as “Scottish footballers”, and can answer trivia questions such as “Who was the last inmate of Spandau jail in Berlin?”. |
Tasks | Entity Linking, Language Modelling, Learning Word Embeddings, Word Embeddings |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HygwvC4tPH |
https://openreview.net/pdf?id=HygwvC4tPH | |
PWC | https://paperswithcode.com/paper/learning-cross-context-entity-representations |
Repo | |
Framework | |
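The fill-in-the-blank objective can be sketched as classification over an entity vocabulary: encode the context surrounding a blanked mention, then score it against a table of context-independent entity embeddings. Toy sizes and a toy encoder below; the paper trains this at much larger scale:

```python
import torch
import torch.nn as nn

n_entities, d = 10_000, 128
entity_emb = nn.Embedding(n_entities, d)   # context-independent entity table
context_enc = nn.Sequential(nn.Linear(300, d), nn.ReLU(), nn.Linear(d, d))

ctx = torch.randn(32, 300)                     # stand-in for blanked contexts
target = torch.randint(0, n_entities, (32,))   # gold entity ids

logits = context_enc(ctx) @ entity_emb.weight.T  # similarity to every entity
loss = nn.functional.cross_entropy(logits, target)
loss.backward()  # trains both the context encoder and the entity table
```

After training, the rows of the entity table serve as the global entity representations evaluated on typing, linking, and trivia tasks.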
A new perspective in understanding of Adam-Type algorithms and beyond
Title | A new perspective in understanding of Adam-Type algorithms and beyond |
Authors | Anonymous |
Abstract | First-order adaptive optimization algorithms such as Adam play an important role in modern deep learning due to their fast convergence in solving large-scale optimization problems. However, Adam's non-convergence behavior and regrettable generalization ability have placed it in a love-hate relationship with the deep learning community. Previous studies of Adam and its variants (referred to as Adam-type algorithms) mainly rely on theoretical regret-bound analysis, which overlooks the natural characteristics of such algorithms and limits our thinking. In this paper, we seek a different interpretation of Adam-type algorithms so that we can intuitively comprehend and improve them. Our approach is based on a traditional online convex optimization scheme known as mirror descent. By bridging Adam and mirror descent, we obtain a clear map of the functionality of each part of Adam. In addition, this new angle brings new insight into identifying the non-convergence issue of Adam. Moreover, we provide a new variant of Adam-type algorithm, namely AdamAL, which naturally mitigates the non-convergence issue of Adam and improves its performance. We further conduct experiments on various popular deep learning tasks and models, and the results are quite promising. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SyxM51BYPB |
https://openreview.net/pdf?id=SyxM51BYPB | |
PWC | https://paperswithcode.com/paper/a-new-perspective-in-understanding-of-adam |
Repo | |
Framework | |
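The bridge to mirror descent can be made concrete: an Adam step with preconditioner $H=\mathrm{diag}(\sqrt{\hat{v}_t}+\epsilon)$ is exactly the proximal/mirror step $\arg\min_x\ \eta\langle g_t, x\rangle + \tfrac{1}{2}(x-\theta_t)^\top H(x-\theta_t)$, whose closed form is $\theta_t - \eta H^{-1} g_t$. A small numerical check of that equivalence (bias-corrected moments are taken as given):

```python
import numpy as np

rng = np.random.default_rng(0)
g = rng.normal(size=4)             # bias-corrected first moment m_hat
v_hat = rng.uniform(0.5, 2.0, 4)   # bias-corrected second moment
theta = rng.normal(size=4)
lr, eps = 1e-3, 1e-8

# Standard Adam step.
adam_step = theta - lr * g / (np.sqrt(v_hat) + eps)

# The same step written as mirror descent with the adaptive metric H:
# the argmin above has the closed form theta - lr * H^{-1} g.
H = np.sqrt(v_hat) + eps
mirror_step = theta - lr * g / H

print(np.allclose(adam_step, mirror_step))  # True: same update, new lens
```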
Autoencoders and Generative Adversarial Networks for Imbalanced Sequence Classification
Title | Autoencoders and Generative Adversarial Networks for Imbalanced Sequence Classification |
Authors | Anonymous |
Abstract | We introduce a novel synthetic oversampling method for variable-length, multi-feature sequence datasets based on autoencoders and generative adversarial networks. We show that this method improves classification accuracy for highly imbalanced sequence classification tasks, and that it outperforms standard oversampling approaches such as SMOTE and autoencoder-based techniques. We also use generative adversarial networks trained on the majority class as an outlier detection method for novelty detection, with limited classification improvement. We show that the use of generative adversarial network based synthetic data improves classification model performance on a variety of sequence data sets. |
Tasks | Outlier Detection |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=ryxtCpNtDS |
https://openreview.net/pdf?id=ryxtCpNtDS | |
PWC | https://paperswithcode.com/paper/autoencoders-and-generative-adversarial-1 |
Repo | |
Framework | |
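A sketch of the latent-space oversampling idea: encode minority-class examples, interpolate SMOTE-style between random pairs of latents, and decode the interpolants as synthetic training data. The encoder/decoder are identity stand-ins here; in the method they would be the trained autoencoder, and the GAN-based generation branch is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for a trained encoder/decoder pair (identity for brevity).
encode = lambda x: x
decode = lambda z: z

minority = rng.normal(size=(20, 8))   # minority-class examples
Z = encode(minority)

def oversample(Z, n_new):
    # SMOTE-style interpolation, but in latent space: new points lie on
    # segments between random pairs of minority latents.
    i = rng.integers(0, len(Z), size=n_new)
    j = rng.integers(0, len(Z), size=n_new)
    lam = rng.uniform(size=(n_new, 1))
    return Z[i] + lam * (Z[j] - Z[i])

synthetic = decode(oversample(Z, n_new=100))
print(synthetic.shape)  # (100, 8) synthetic minority examples
```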
Statistically Consistent Saliency Estimation
Title | Statistically Consistent Saliency Estimation |
Authors | Anonymous |
Abstract | The use of deep learning for a wide range of data problems has increased the need for understanding and diagnosing these models, and deep learning interpretation techniques have become an essential tool for data analysts. Although numerous model interpretation methods have been proposed in recent years, most of these procedures are based on heuristics with little or no theoretical guarantees. In this work, we propose a statistical framework for saliency estimation for black box computer vision models. We build a model-agnostic estimation procedure that is statistically consistent and passes the saliency checks of Adebayo et al. (2018). Our method requires solving a linear program, whose solution can be efficiently computed in polynomial time. Through our theoretical analysis, we establish an upper bound on the number of model evaluations needed to recover the region of importance with high probability, and build a new perturbation scheme for estimation of local gradients that is shown to be more efficient than the commonly used random perturbation schemes. Validity of the new method is demonstrated through sensitivity analysis. |
Tasks | Saliency Prediction |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BJlrZyrKDB |
https://openreview.net/pdf?id=BJlrZyrKDB | |
PWC | https://paperswithcode.com/paper/statistically-consistent-saliency-estimation |
Repo | |
Framework | |
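The local-gradient estimation step can be sketched with the simplest perturbation scheme, antithetic random directions; the paper constructs a more sample-efficient scheme than this common baseline, so the code below is only the reference point it improves on:

```python
import numpy as np

rng = np.random.default_rng(0)

def black_box(x):
    # Stand-in for a black-box model score; any scalar-valued function works.
    return np.sin(x).sum()

def local_gradient(f, x, n_dirs=64, h=1e-2):
    # Antithetic finite differences: E[(u . grad) u] = grad / d for uniform
    # unit directions u, so rescaling by d yields an unbiased estimate.
    d = x.size
    g = np.zeros(d)
    for _ in range(n_dirs):
        u = rng.normal(size=d)
        u /= np.linalg.norm(u)
        g += (f(x + h * u) - f(x - h * u)) / (2 * h) * u
    return g * d / n_dirs

x = rng.normal(size=10)
# Error vs. the true gradient cos(x) shrinks as n_dirs grows.
print(np.linalg.norm(local_gradient(black_box, x) - np.cos(x)))
```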