Paper Group NANR 138
Generalized Bayesian Posterior Expectation Distillation for Deep Neural Networks. AUGMENTED POLICY GRADIENT METHODS FOR EFFICIENT REINFORCEMENT LEARNING. Sample Efficient Policy Gradient Methods with Recursive Variance Reduction. Topological Autoencoders. Policy Optimization with Stochastic Mirror Descent. Generative Latent Flow. Neural Phrase-to-Phrase Machine Translation. In-training Matrix Factorization for Parameter-frugal Neural Machine Translation. UW-NET: AN INCEPTION-ATTENTION NETWORK FOR UNDERWATER IMAGE CLASSIFICATION. Uncertainty-aware Variational-Recurrent Imputation Network for Clinical Time Series. Compression without Quantization. Learning Cross-Context Entity Representations from Text. A new perspective in understanding of Adam-Type algorithms and beyond. Autoencoders and Generative Adversarial Networks for Imbalanced Sequence Classification. Statistically Consistent Saliency Estimation.
Generalized Bayesian Posterior Expectation Distillation for Deep Neural Networks
Title | Generalized Bayesian Posterior Expectation Distillation for Deep Neural Networks |
Authors | Anonymous |
Abstract | In this paper, we present a general framework for distilling expectations with respect to the Bayesian posterior distribution of a deep neural network, significantly extending prior work on a method known as "Bayesian Dark Knowledge." Our generalized framework applies to the case of classification models and takes as input the architecture of a "teacher" network, a general posterior expectation of interest, and the architecture of a "student" network. The distillation method performs an online compression of the selected posterior expectation using iteratively generated Monte Carlo samples from the parameter posterior of the teacher model. We further consider the problem of optimizing the student model architecture with respect to an accuracy-speed-storage trade-off. We present experimental results investigating multiple data sets, distillation targets, teacher model architectures, and approaches to searching for student model architectures. We establish the key result that distilling into a student model with an architecture that matches the teacher, as is done in Bayesian Dark Knowledge, can lead to sub-optimal performance. Lastly, we show that student architecture search methods can identify student models with significantly improved performance. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Byg_vREtvB |
https://openreview.net/pdf?id=Byg_vREtvB | |
PWC | https://paperswithcode.com/paper/generalized-bayesian-posterior-expectation |
Repo | |
Framework | |
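As a rough illustration of the distillation loop described in the abstract above, the sketch below maintains an online Monte Carlo estimate of the teacher's posterior predictive expectation and regresses a student onto it. The teacher's posterior sampler (an SGLD chain in practice) is replaced by a hypothetical random-draw stand-in, and all models and shapes are toy choices, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)

def student_forward(w, X):
    # Toy linear "student": logits -> softmax class probabilities.
    z = X @ w
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def teacher_posterior_samples(n_samples, X, n_classes=3):
    # Stand-in for iteratively generated posterior draws (SGLD in practice).
    for _ in range(n_samples):
        w_t = rng.normal(size=(X.shape[1], n_classes))  # one posterior sample
        yield student_forward(w_t, X)                   # teacher predictive probs

X = rng.normal(size=(32, 5))
w_s = np.zeros((5, 3))
running_mean = np.zeros((32, 3))

for t, p_teacher in enumerate(teacher_posterior_samples(200, X), start=1):
    # Online Monte Carlo estimate of the posterior predictive expectation.
    running_mean += (p_teacher - running_mean) / t
    # One SGD step moving the student toward the current expectation
    # (cross-entropy against the soft target; gradient is p_s - target).
    p_s = student_forward(w_s, X)
    grad = X.T @ (p_s - running_mean) / X.shape[0]
    w_s -= 0.1 * grad
```

In the actual method the distillation target can be any posterior expectation of interest (e.g., predictive means or uncertainties), not only class probabilities.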
AUGMENTED POLICY GRADIENT METHODS FOR EFFICIENT REINFORCEMENT LEARNING
Title | AUGMENTED POLICY GRADIENT METHODS FOR EFFICIENT REINFORCEMENT LEARNING |
Authors | Anonymous |
Abstract | We propose a new mixture of model-based and model-free reinforcement learning (RL) algorithms that combines the strengths of both approaches. Our goal is to reduce the sample complexity of model-free methods by utilizing fictitious trajectory rollouts performed on a learned dynamics model, improving the data efficiency of policy gradient methods while maintaining the same asymptotic behaviour. We suggest using a special type of uncertainty quantification by a stochastic dynamics model, in which the next-state prediction is randomly drawn from the distribution predicted by the dynamics model. As a result, the negative effect of exploiting erroneously optimistic regions of the dynamics model is addressed by next-state predictions based on an uncertainty-aware ensemble of dynamics models. The influence of the ensemble on the policy update is controlled by adjusting the number of virtually performed rollouts in the next iteration according to the ratio of the real and virtual total reward. Our approach, which we call Model-Based Policy Gradient Enrichment (MBPGE), is tested on a collection of benchmarks including simulated robotic locomotion. We compare our approach to plain model-free algorithms and a model-based one. Our evaluation shows that MBPGE learns faster in the early training stage and achieves improved asymptotic behaviour. |
Tasks | Policy Gradient Methods |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=S1gN8yrYwB |
https://openreview.net/pdf?id=S1gN8yrYwB | |
PWC | https://paperswithcode.com/paper/augmented-policy-gradient-methods-for |
Repo | |
Framework | |
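The rollout-budget rule in the abstract above (scaling the number of virtual rollouts by how well virtual returns track real returns) admits a one-line sketch. The function name, bounds, and the exact form of the ratio are our assumptions, for illustration only:

```python
def next_virtual_rollouts(n_current, real_return, virtual_return,
                          n_min=1, n_max=100):
    # If the model is optimistic (virtual > real), the ratio < 1 and the
    # virtual-rollout budget for the next iteration shrinks accordingly.
    ratio = real_return / max(virtual_return, 1e-8)
    return int(min(max(n_current * ratio, n_min), n_max))

print(next_virtual_rollouts(20, real_return=90.0, virtual_return=120.0))  # 15
```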
Sample Efficient Policy Gradient Methods with Recursive Variance Reduction
Title | Sample Efficient Policy Gradient Methods with Recursive Variance Reduction |
Authors | Anonymous |
Abstract | Improving the sample efficiency in reinforcement learning has been a long-standing research problem. In this work, we aim to reduce the sample complexity of existing policy gradient methods. We propose a novel policy gradient algorithm called SRVR-PG, which only requires $O(1/\epsilon^{3/2})$ episodes (where $O(\cdot)$ hides constant factors) to find an $\epsilon$-approximate stationary point of the nonconcave performance function $J(\boldsymbol{\theta})$ (i.e., $\boldsymbol{\theta}$ such that $\|\nabla J(\boldsymbol{\theta})\|_2^2\leq\epsilon$). This sample complexity improves the existing result $O(1/\epsilon^{5/3})$ for stochastic variance reduced policy gradient algorithms by a factor of $O(1/\epsilon^{1/6})$. In addition, we also propose a variant of SRVR-PG with parameter exploration, which explores the initial policy parameter from a prior probability distribution. We conduct numerical experiments on classic control problems in reinforcement learning to validate the performance of our proposed algorithms. |
Tasks | Policy Gradient Methods |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HJlxIJBFDr |
https://openreview.net/pdf?id=HJlxIJBFDr | |
PWC | https://paperswithcode.com/paper/sample-efficient-policy-gradient-methods-with-1 |
Repo | |
Framework | |
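A minimal numpy sketch of the recursive (SARAH-style) variance-reduced estimator underlying SRVR-PG: an outer loop refreshes the gradient estimate with a large batch of episodes, and inner steps correct it with gradient differences computed on small batches. The REINFORCE gradients are replaced here by a synthetic stochastic gradient of $f(\boldsymbol{\theta})=\frac{1}{2}\|\boldsymbol{\theta}\|^2$, so this is a schematic of the estimator, not the paper's algorithm:

```python
import numpy as np

rng = np.random.default_rng(1)

def paired_grads(theta_new, theta_old, episodes):
    # In SRVR-PG both gradients are evaluated on the SAME sampled episodes,
    # with importance weights correcting for the policy shift; shared noise
    # mimics that coupling on the synthetic objective.
    noise = rng.normal(scale=0.5, size=(episodes, theta_new.size)).mean(axis=0)
    return theta_new + noise, theta_old + noise

theta = rng.normal(size=10)
theta_prev = theta.copy()
v, _ = paired_grads(theta, theta, episodes=100)  # large reference batch
lr = 0.05

for t in range(1, 51):
    if t % 10 == 0:
        # Outer loop: refresh the estimate with a large batch of episodes.
        v, _ = paired_grads(theta, theta, episodes=100)
    else:
        # Inner loop: recursive semi-stochastic update from a small batch.
        g_new, g_old = paired_grads(theta, theta_prev, episodes=5)
        v = v + g_new - g_old
    theta_prev = theta.copy()
    theta = theta - lr * v

print(np.linalg.norm(theta))  # shrinks toward the optimum at 0
```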
Topological Autoencoders
Title | Topological Autoencoders |
Authors | Anonymous |
Abstract | We propose a novel approach for preserving topological structures of the input space in latent representations of autoencoders. Using persistent homology, a technique from topological data analysis, we calculate topological signatures of both the input and latent space to derive a topological loss term. Under weak theoretical assumptions, we can construct this loss in a differentiable manner, such that the encoding learns to retain multi-scale connectivity information. We show that our approach is theoretically well-founded and that it exhibits favourable latent representations on a synthetic manifold as well as on real-world image data sets, while preserving low reconstruction errors. |
Tasks | Topological Data Analysis |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HkgtJRVFPS |
https://openreview.net/pdf?id=HkgtJRVFPS | |
PWC | https://paperswithcode.com/paper/topological-autoencoders |
Repo | |
Framework | |
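Zero-dimensional persistence pairings of a point cloud coincide with the edges of a minimum spanning tree of its distance matrix, which suggests the following numpy/scipy sketch of a topological loss term: select the topologically relevant distances in each space and penalize their disagreement. This is our reading of the construction, with toy data standing in for an encoder:

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def mst_edge_pairs(X):
    # 0-dimensional persistence pairings correspond to MST edges of the
    # pairwise distance matrix of the batch.
    D = squareform(pdist(X))
    mst = minimum_spanning_tree(D).tocoo()
    return list(zip(mst.row, mst.col)), D

def topological_loss(X, Z):
    # Compare the "topologically relevant" distances selected in each space,
    # in both directions (input-selected and latent-selected pairs).
    pairs_x, Dx = mst_edge_pairs(X)
    pairs_z, Dz = mst_edge_pairs(Z)
    loss_x = sum((Dx[i, j] - Dz[i, j]) ** 2 for i, j in pairs_x)
    loss_z = sum((Dz[i, j] - Dx[i, j]) ** 2 for i, j in pairs_z)
    return 0.5 * (loss_x + loss_z)

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 10))  # input batch
Z = X[:, :2]                   # stand-in for encoder output
print(topological_loss(X, Z))
```

In the autoencoder this term is added to the reconstruction loss, so the encoding learns to retain the selected multi-scale connectivity information.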
Policy Optimization with Stochastic Mirror Descent
Title | Policy Optimization with Stochastic Mirror Descent |
Authors | Anonymous |
Abstract | Improving sample efficiency has been a longstanding goal in reinforcement learning. In this paper, we propose $\mathtt{VRMPO}$: a sample-efficient policy gradient method based on stochastic mirror descent. A novel variance-reduced policy gradient estimator is the key to $\mathtt{VRMPO}$'s improved sample efficiency. $\mathtt{VRMPO}$ needs only $\mathcal{O}(\epsilon^{-3})$ sample trajectories to achieve an $\epsilon$-approximate first-order stationary point, which matches the best-known sample complexity. We conduct extensive experiments showing that our algorithm outperforms state-of-the-art policy gradient methods in various settings. |
Tasks | Policy Gradient Methods |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SkxpDT4YvS |
https://openreview.net/pdf?id=SkxpDT4YvS | |
PWC | https://paperswithcode.com/paper/policy-optimization-with-stochastic-mirror-1 |
Repo | |
Framework | |
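For concreteness, a single stochastic mirror descent step under the negative-entropy mirror map (the exponentiated-gradient update) looks as follows; the simplex-constrained setting and the synthetic gradient are illustrative assumptions, not the paper's parameterization:

```python
import numpy as np

def md_step(p, grad, lr=0.1):
    # Mirror descent with the negative-entropy mirror map: take the step
    # in the dual space, then map back to the probability simplex.
    w = p * np.exp(-lr * grad)
    return w / w.sum()

p = np.full(4, 0.25)                  # current point on the simplex
g = np.array([1.0, 0.0, -1.0, 0.5])   # stand-in stochastic gradient
print(md_step(p, g))                  # mass shifts toward low-gradient coords
```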
Generative Latent Flow
Title | Generative Latent Flow |
Authors | Anonymous |
Abstract | In this work, we propose the Generative Latent Flow (GLF), an algorithm for generative modeling of the data distribution. GLF uses an Auto-encoder (AE) to learn latent representations of the data, and a normalizing flow to map the distribution of the latent variables to that of simple i.i.d. noise. In contrast to some other Auto-encoder based generative models, which use various regularizers that encourage the encoded latent distribution to match the prior distribution, our model explicitly constructs a mapping between these two distributions, leading to better density matching while avoiding over-regularizing the latent variables. We compare our model with several related techniques and show that it has many relative advantages, including fast convergence, single-stage training, and a minimal reconstruction trade-off. We also study the relationship between our model and its stochastic counterpart, and show that our model can be viewed as a vanishing-noise limit of VAEs with a flow prior. Quantitatively, under standardized evaluations, our method achieves state-of-the-art sample quality and diversity among AE-based models on commonly used datasets, and is competitive with GAN benchmarks. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Syg7VaNYPB |
https://openreview.net/pdf?id=Syg7VaNYPB | |
PWC | https://paperswithcode.com/paper/generative-latent-flow |
Repo | |
Framework | |
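A compressed sketch of the two GLF ingredients: an autoencoder trained for reconstruction, plus a flow fit by change-of-variables negative log-likelihood to map latents to standard normal noise. Here a single elementwise affine layer stands in for the full normalizing flow, and the shapes and the detached-latent choice are our assumptions:

```python
import torch
import torch.nn as nn

enc = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 20))
dec = nn.Sequential(nn.Linear(20, 128), nn.ReLU(), nn.Linear(128, 784))

log_scale = nn.Parameter(torch.zeros(20))  # toy "flow" parameters
shift = nn.Parameter(torch.zeros(20))

params = list(enc.parameters()) + list(dec.parameters()) + [log_scale, shift]
opt = torch.optim.Adam(params, lr=1e-3)

x = torch.rand(64, 784)                    # stand-in data batch
z = enc(x)
recon = ((dec(z) - x) ** 2).mean()         # AE reconstruction loss

# Change-of-variables NLL (constants dropped): u = (z - shift) * exp(-log_scale)
# should be standard normal; the flow sees detached latents.
u = (z.detach() - shift) * torch.exp(-log_scale)
nll = (0.5 * u.pow(2) + log_scale).sum(dim=1).mean()

loss = recon + nll
opt.zero_grad(); loss.backward(); opt.step()

# Sampling: draw i.i.d. noise, invert the flow, decode.
eps = torch.randn(16, 20)
samples = dec(eps * torch.exp(log_scale) + shift)
```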
Neural Phrase-to-Phrase Machine Translation
Title | Neural Phrase-to-Phrase Machine Translation |
Authors | Anonymous |
Abstract | We present Neural Phrase-to-Phrase Machine Translation (\nppmt), a phrase-based translation model that uses a novel phrase-attention mechanism to discover relevant input (source) segments for generating output (target) phrases. We propose an efficient dynamic programming algorithm to marginalize over all possible segmentations at training time, and use a greedy algorithm or beam search for decoding. We also show how to incorporate a memory module derived from an external phrase dictionary into \nppmt{} to improve decoding. Experimental results demonstrate that \nppmt{} outperforms the best neural phrase-based translation model (Huang et al., 2018) in terms of both performance and speed, and is comparable to a state-of-the-art Transformer-based machine translation system (Vaswani et al., 2017). |
Tasks | Machine Translation |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=S1gtclSFvr |
https://openreview.net/pdf?id=S1gtclSFvr | |
PWC | https://paperswithcode.com/paper/neural-phrase-to-phrase-machine-translation-1 |
Repo | |
Framework | |
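The marginalization over segmentations mentioned above is a standard forward dynamic program over segment boundaries; a sketch with a stand-in phrase score (the real model scores phrases via the phrase-attention mechanism):

```python
import numpy as np

def logsumexp(a):
    m = a.max()
    return m + np.log(np.exp(a - m).sum())

def marginalize_segmentations(seg_logp, T, max_len=4):
    # alpha[j] = log-marginal over all segmentations of the first j target
    # tokens; seg_logp(i, j) is the model's log-probability of emitting
    # tokens i..j-1 as one phrase.
    alpha = np.full(T + 1, -np.inf)
    alpha[0] = 0.0
    for j in range(1, T + 1):
        terms = [alpha[i] + seg_logp(i, j)
                 for i in range(max(0, j - max_len), j)]
        alpha[j] = logsumexp(np.array(terms))
    return alpha[T]   # log-marginal over every phrase segmentation

score = lambda i, j: -0.5 * (j - i)   # toy phrase score, for illustration
print(marginalize_segmentations(score, T=10))
```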
In-training Matrix Factorization for Parameter-frugal Neural Machine Translation
Title | In-training Matrix Factorization for Parameter-frugal Neural Machine Translation |
Authors | Anonymous |
Abstract | In this paper, we propose the use of in-training matrix factorization to reduce the model size for neural machine translation. Using in-training matrix factorization, parameter matrices may be decomposed into products of smaller matrices, which can compress large machine translation architectures by vastly reducing the number of learnable parameters. We apply in-training matrix factorization to different layers of standard neural architectures and show that it can reduce the number of learnable parameters by nearly 50% without any associated loss in BLEU score. Further, we find that in-training matrix factorization is especially powerful on embedding layers, providing a simple and effective method to curtail the number of parameters with minimal impact on model performance, and, at times, an increase in performance. |
Tasks | Machine Translation |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HJg0_eBFwB |
https://openreview.net/pdf?id=HJg0_eBFwB | |
PWC | https://paperswithcode.com/paper/in-training-matrix-factorization-for |
Repo | |
Framework | |
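In-training factorization amounts to parameterizing a weight matrix as the product of two low-rank factors from the start of training, rather than factorizing after the fact. A minimal torch sketch (hypothetical module, not the authors' code) with a parameter-count comparison:

```python
import torch
import torch.nn as nn

class FactorizedLinear(nn.Module):
    """Replace an (in x out) weight matrix with the product of (in x r) and
    (r x out) factors, trained directly in that decomposed form."""
    def __init__(self, d_in, d_out, rank):
        super().__init__()
        self.A = nn.Linear(d_in, rank, bias=False)
        self.B = nn.Linear(rank, d_out)

    def forward(self, x):
        return self.B(self.A(x))

full = nn.Linear(512, 512)               # 512*512 + 512 = 262,656 params
frugal = FactorizedLinear(512, 512, 64)  # 512*64 + 64*512 + 512 = 66,048

n = lambda m: sum(p.numel() for p in m.parameters())
print(n(full), n(frugal))  # ~75% fewer learnable parameters at rank 64
```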
UW-NET: AN INCEPTION-ATTENTION NETWORK FOR UNDERWATER IMAGE CLASSIFICATION
Title | UW-NET: AN INCEPTION-ATTENTION NETWORK FOR UNDERWATER IMAGE CLASSIFICATION |
Authors | Miao Yang and Ke Hu, Chongyi Li, Zhiqiang Wei |
Abstract | Classifying images captured in imaging environments other than air is a first challenge in extending the applications of deep learning. We report UW-Net (Underwater Network), a new convolutional neural network (CNN) for underwater image classification. In this model, we simulate the visual correlation of background attention with image understanding in special environments, such as fog and underwater, by constructing an inception-attention (I-A) module. The experimental results demonstrate that the proposed UW-Net achieves an accuracy of 99.3% on underwater image classification, significantly better than other image classification networks such as AlexNet, InceptionV3, ResNet and SE-ResNet. Moreover, we demonstrate that the proposed I-A module can be used to boost the performance of existing object recognition networks: by substituting the inception module with the I-A module, the Inception-ResNetV2 network achieves a 10.7% top-1 error rate and a 0% top-5 error rate on a subset of ILSVRC-2012, which further illustrates the role of background attention in image classification. |
Tasks | Image Classification, Object Recognition |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HklCmaVtPS |
https://openreview.net/pdf?id=HklCmaVtPS | |
PWC | https://paperswithcode.com/paper/uw-net-an-inception-attention-network-for |
Repo | |
Framework | |
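The abstract does not specify the I-A module's internals, so the torch sketch below is only one plausible reading: parallel inception-style branches whose concatenated output is reweighted by squeeze-and-excitation style channel attention. All layer choices are our assumptions:

```python
import torch
import torch.nn as nn

class InceptionAttention(nn.Module):
    """Illustrative I-A-style block (our reading, not the authors' code)."""
    def __init__(self, c_in, c_branch=32, reduction=4):
        super().__init__()
        self.b1 = nn.Conv2d(c_in, c_branch, 1)
        self.b3 = nn.Conv2d(c_in, c_branch, 3, padding=1)
        self.b5 = nn.Conv2d(c_in, c_branch, 5, padding=2)
        c = 3 * c_branch
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(c, c // reduction), nn.ReLU(),
            nn.Linear(c // reduction, c), nn.Sigmoid())

    def forward(self, x):
        y = torch.cat([self.b1(x), self.b3(x), self.b5(x)], dim=1)
        w = self.attn(y).unsqueeze(-1).unsqueeze(-1)
        return y * w  # channel reweighting, e.g. suppressing background

print(InceptionAttention(16)(torch.randn(2, 16, 32, 32)).shape)
```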
Uncertainty-aware Variational-Recurrent Imputation Network for Clinical Time Series
Title | Uncertainty-aware Variational-Recurrent Imputation Network for Clinical Time Series |
Authors | Anonymous |
Abstract | Electronic Health Records (EHR) comprise longitudinal clinical observations characterized by sparsity, irregularity, and high dimensionality, which are major obstacles to drawing reliable downstream conclusions. Although a great number of imputation methods have been proposed to tackle these issues, most existing methods ignore correlated features or temporal dynamics and entirely set aside uncertainty. In particular, since missing-value estimates risk being imprecise, we are motivated to treat reliable and less certain information differently. In this work, we propose a novel variational-recurrent imputation network (V-RIN), which unifies the imputation and prediction networks by taking into account correlated features and temporal dynamics, and further utilizes uncertainty to alleviate the risk of biased missing-value estimates. Specifically, we leverage a deep generative model to estimate missing values based on the distribution among variables, and a recurrent imputation network to exploit temporal relations in conjunction with the uncertainty. We validated the effectiveness of our proposed model on a publicly available real-world EHR dataset, PhysioNet Challenge 2012, and compared the results with other state-of-the-art competing methods in the literature. |
Tasks | Imputation, Time Series |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=ryg2wlSFwS |
https://openreview.net/pdf?id=ryg2wlSFwS | |
PWC | https://paperswithcode.com/paper/uncertainty-aware-variational-recurrent |
Repo | |
Framework | |
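One way to read the uncertainty-aware imputation above: keep observed entries, and downweight imputed estimates in proportion to the generative model's per-entry variance before feeding the result to the recurrent predictor. The weighting function below is an assumption for illustration, as are the stand-in model outputs:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy batch: x has missing entries wherever mask == 0.
x = rng.normal(size=(8, 5))
mask = rng.integers(0, 2, size=x.shape)

# Stand-ins for the generative model's outputs: a mean estimate per entry
# and a variance read as that estimate's uncertainty.
x_hat = rng.normal(size=x.shape)
var = rng.uniform(0.1, 2.0, size=x.shape)

# Uncertainty-aware fill-in: observed values pass through unchanged,
# imputed values are shrunk when the model is uncertain about them.
confidence = 1.0 / (1.0 + var)   # assumed weighting, for illustration
x_filled = mask * x + (1 - mask) * confidence * x_hat
```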
Compression without Quantization
Title | Compression without Quantization |
Authors | Anonymous |
Abstract | Standard compression algorithms work by mapping an image to discrete code using an encoder, from which the original image can be reconstructed through a decoder. This process, due to the quantization step, is inherently non-differentiable, so these algorithms must rely on approximate methods to train the encoder and decoder end-to-end. In this paper, we present an innovative framework for lossy image compression which is able to circumvent the quantization step by relying on a non-deterministic compression codec. The encoder maps the input image to a distribution in continuous space, from which a sample can be encoded with expected code length equal to the relative entropy to the encoding distribution, i.e. it is bits-back efficient. The result is a principled, end-to-end differentiable compression framework that can be straightforwardly trained using standard gradient-based optimizers. To showcase the efficiency of our method, we apply it to lossy image compression by training Probabilistic Ladder Networks (PLNs) on the CLIC 2018 dataset and show that their rate-distortion curves on the Kodak dataset are competitive with the state-of-the-art at low bitrates. |
Tasks | Image Compression, Quantization |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HyeG9lHYwH |
https://openreview.net/pdf?id=HyeG9lHYwH | |
PWC | https://paperswithcode.com/paper/compression-without-quantization |
Repo | |
Framework | |
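The rate term in this framework is a relative entropy rather than a quantized code length: communicating a sample from an encoding distribution $q$ under a prior $p$ costs roughly $\mathrm{KL}(q\,\|\,p)$ bits (the bits-back argument). A minimal sketch of that rate computation for diagonal Gaussians, with toy values:

```python
import numpy as np

def kl_bits(mu_q, sigma_q):
    # KL( N(mu_q, sigma_q^2) || N(0, 1) ) per dimension, summed, in bits:
    # 0.5 * (sigma^2 + mu^2 - 1 - 2*log(sigma)), converted from nats.
    kl_nats = 0.5 * np.sum(sigma_q**2 + mu_q**2 - 1.0 - 2.0 * np.log(sigma_q))
    return kl_nats / np.log(2.0)

mu = np.array([0.3, -1.2, 0.0])
sigma = np.array([0.9, 0.5, 1.1])
print(f"expected code length ~ {kl_bits(mu, sigma):.2f} bits")
```

Because this quantity is differentiable in the encoder's outputs, it can be minimized directly with standard gradient-based optimizers, which is the point of skipping quantization.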
Learning Cross-Context Entity Representations from Text
Title | Learning Cross-Context Entity Representations from Text |
Authors | Anonymous |
Abstract | Language modeling tasks, in which words, or word-pieces, are predicted on the basis of a local context, have been very effective for learning word embeddings and context dependent representations of phrases. Motivated by the observation that efforts to code world knowledge into machine readable knowledge bases or human readable encyclopedias tend to be entity-centric, we investigate the use of a fill-in-the-blank task to learn context independent representations of entities from the text contexts in which those entities were mentioned. We show that large scale training of neural models allows us to learn high quality entity representations, and we demonstrate successful results on four domains: (1) existing entity-level typing benchmarks, including a 64% error reduction over previous work on TypeNet (Murty et al., 2018); (2) a novel few-shot category reconstruction task; (3) existing entity linking benchmarks, where we achieve a score of 87.3% on TAC-KBP 2010 without using any alias table, external knowledge base or in domain training data and (4) answering trivia questions, which uniquely identify entities. Our global entity representations encode fine-grained type categories, such as “Scottish footballers”, and can answer trivia questions such as “Who was the last inmate of Spandau jail in Berlin?”. |
Tasks | Entity Linking, Language Modelling, Learning Word Embeddings, Word Embeddings |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HygwvC4tPH |
https://openreview.net/pdf?id=HygwvC4tPH | |
PWC | https://paperswithcode.com/paper/learning-cross-context-entity-representations |
Repo | |
Framework | |
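The fill-in-the-blank objective can be sketched as classification over an entity vocabulary: encode the context surrounding a blanked mention, then score it against a table of context-independent entity embeddings. Toy sizes and a toy encoder below; the paper trains this at much larger scale:

```python
import torch
import torch.nn as nn

n_entities, d = 10_000, 128
entity_emb = nn.Embedding(n_entities, d)   # context-independent entity table
context_enc = nn.Sequential(nn.Linear(300, d), nn.ReLU(), nn.Linear(d, d))

ctx = torch.randn(32, 300)                     # stand-in for blanked contexts
target = torch.randint(0, n_entities, (32,))   # gold entity ids

logits = context_enc(ctx) @ entity_emb.weight.T  # similarity to every entity
loss = nn.functional.cross_entropy(logits, target)
loss.backward()  # trains both the context encoder and the entity table
```

After training, the rows of the entity table serve as the global entity representations evaluated on typing, linking, and trivia tasks.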
A new perspective in understanding of Adam-Type algorithms and beyond
Title | A new perspective in understanding of Adam-Type algorithms and beyond |
Authors | Anonymous |
Abstract | First-order adaptive optimization algorithms such as Adam play an important role in modern deep learning due to their fast convergence in solving large-scale optimization problems. However, Adam's non-convergence behavior and regrettable generalization ability have placed it in a love-hate relationship with the deep learning community. Previous studies of Adam and its variants (referred to as Adam-type algorithms) mainly rely on theoretical regret-bound analysis, which overlooks the natural characteristics of such algorithms and limits our thinking. In this paper, we seek a different interpretation of Adam-type algorithms so that we can intuitively comprehend and improve them. Our approach is based on a traditional online convex optimization scheme known as mirror descent. By bridging Adam and mirror descent, we obtain a clear map of the functionality of each part of Adam. In addition, this new angle brings new insight into identifying the non-convergence issue of Adam. Moreover, we provide a new variant of Adam-type algorithm, namely AdamAL, which naturally mitigates the non-convergence issue of Adam and improves its performance. We further conduct experiments on various popular deep learning tasks and models, and the results are quite promising. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SyxM51BYPB |
https://openreview.net/pdf?id=SyxM51BYPB | |
PWC | https://paperswithcode.com/paper/a-new-perspective-in-understanding-of-adam |
Repo | |
Framework | |
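The bridge to mirror descent can be made concrete: an Adam step with preconditioner $H=\mathrm{diag}(\sqrt{\hat{v}_t}+\epsilon)$ is exactly the proximal/mirror step $\arg\min_x\ \eta\langle g_t, x\rangle + \tfrac{1}{2}(x-\theta_t)^\top H(x-\theta_t)$, whose closed form is $\theta_t - \eta H^{-1} g_t$. A small numerical check of that equivalence (bias-corrected moments are taken as given):

```python
import numpy as np

rng = np.random.default_rng(0)
g = rng.normal(size=4)             # bias-corrected first moment m_hat
v_hat = rng.uniform(0.5, 2.0, 4)   # bias-corrected second moment
theta = rng.normal(size=4)
lr, eps = 1e-3, 1e-8

# Standard Adam step.
adam_step = theta - lr * g / (np.sqrt(v_hat) + eps)

# The same step written as mirror descent with the adaptive metric H:
# the argmin above has the closed form theta - lr * H^{-1} g.
H = np.sqrt(v_hat) + eps
mirror_step = theta - lr * g / H

print(np.allclose(adam_step, mirror_step))  # True: same update, new lens
```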
Autoencoders and Generative Adversarial Networks for Imbalanced Sequence Classification
Title | Autoencoders and Generative Adversarial Networks for Imbalanced Sequence Classification |
Authors | Anonymous |
Abstract | We introduce a novel synthetic oversampling method for variable-length, multi-feature sequence datasets based on autoencoders and generative adversarial networks. We show that this method improves classification accuracy for highly imbalanced sequence classification tasks, and that it outperforms standard oversampling approaches such as SMOTE and autoencoder-based techniques. We also use generative adversarial networks trained on the majority class as an outlier detection method for novelty detection, with limited classification improvement. We show that the use of generative adversarial network based synthetic data improves classification model performance on a variety of sequence data sets. |
Tasks | Outlier Detection |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=ryxtCpNtDS |
https://openreview.net/pdf?id=ryxtCpNtDS | |
PWC | https://paperswithcode.com/paper/autoencoders-and-generative-adversarial-1 |
Repo | |
Framework | |
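A sketch of the latent-space oversampling idea: encode minority-class examples, interpolate SMOTE-style between random pairs of latents, and decode the interpolants as synthetic training data. The encoder/decoder are identity stand-ins here; in the method they would be the trained autoencoder, and the GAN-based generation branch is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for a trained encoder/decoder pair (identity for brevity).
encode = lambda x: x
decode = lambda z: z

minority = rng.normal(size=(20, 8))   # minority-class examples
Z = encode(minority)

def oversample(Z, n_new):
    # SMOTE-style interpolation, but in latent space: new points lie on
    # segments between random pairs of minority latents.
    i = rng.integers(0, len(Z), size=n_new)
    j = rng.integers(0, len(Z), size=n_new)
    lam = rng.uniform(size=(n_new, 1))
    return Z[i] + lam * (Z[j] - Z[i])

synthetic = decode(oversample(Z, n_new=100))
print(synthetic.shape)  # (100, 8) synthetic minority examples
```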
Statistically Consistent Saliency Estimation
Title | Statistically Consistent Saliency Estimation |
Authors | Anonymous |
Abstract | The use of deep learning for a wide range of data problems has increased the need for understanding and diagnosing these models, and deep learning interpretation techniques have become an essential tool for data analysts. Although numerous model interpretation methods have been proposed in recent years, most of these procedures are based on heuristics with little or no theoretical guarantees. In this work, we propose a statistical framework for saliency estimation for black box computer vision models. We build a model-agnostic estimation procedure that is statistically consistent and passes the saliency checks of Adebayo et al. (2018). Our method requires solving a linear program, whose solution can be efficiently computed in polynomial time. Through our theoretical analysis, we establish an upper bound on the number of model evaluations needed to recover the region of importance with high probability, and build a new perturbation scheme for estimation of local gradients that is shown to be more efficient than the commonly used random perturbation schemes. Validity of the new method is demonstrated through sensitivity analysis. |
Tasks | Saliency Prediction |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BJlrZyrKDB |
https://openreview.net/pdf?id=BJlrZyrKDB | |
PWC | https://paperswithcode.com/paper/statistically-consistent-saliency-estimation |
Repo | |
Framework | |
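The local-gradient estimation step can be sketched with the simplest perturbation scheme, antithetic random directions; the paper constructs a more sample-efficient scheme than this common baseline, so the code below is only the reference point it improves on:

```python
import numpy as np

rng = np.random.default_rng(0)

def black_box(x):
    # Stand-in for a black-box model score; any scalar-valued function works.
    return np.sin(x).sum()

def local_gradient(f, x, n_dirs=64, h=1e-2):
    # Antithetic finite differences: E[(u . grad) u] = grad / d for uniform
    # unit directions u, so rescaling by d yields an unbiased estimate.
    d = x.size
    g = np.zeros(d)
    for _ in range(n_dirs):
        u = rng.normal(size=d)
        u /= np.linalg.norm(u)
        g += (f(x + h * u) - f(x - h * u)) / (2 * h) * u
    return g * d / n_dirs

x = rng.normal(size=10)
# Error vs. the true gradient cos(x) shrinks as n_dirs grows.
print(np.linalg.norm(local_gradient(black_box, x) - np.cos(x)))
```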