Paper Group NANR 71
Mix-review: Alleviate Forgetting in the Pretrain-Finetune Framework for Neural Language Generation Models
Title | Mix-review: Alleviate Forgetting in the Pretrain-Finetune Framework for Neural Language Generation Models |
Authors | Anonymous |
Abstract | In this work, we study how the large-scale pretrain-finetune framework changes the behavior of a neural language generator. We focus on the transformer encoder-decoder model for the open-domain dialogue response generation task. We find that after standard fine-tuning, the model forgets important language generation skills acquired during large-scale pre-training. We demonstrate the forgetting phenomenon through a detailed behavior analysis from the perspectives of context sensitivity and knowledge transfer. Adopting the concept of data mixing, we propose an intuitive fine-tuning strategy named “mix-review”. We find that mix-review effectively regularizes the fine-tuning process, and the forgetting problem is largely alleviated. Finally, we discuss interesting behavior of the resulting dialogue model and its implications. |
Tasks | Text Generation, Transfer Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=r1lUE04YPB |
PDF | https://openreview.net/pdf?id=r1lUE04YPB |
PWC | https://paperswithcode.com/paper/mix-review-alleviate-forgetting-in-the-1 |
Repo | |
Framework | |
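The abstract above names data mixing but does not spell out the procedure. A minimal sketch, assuming each fine-tuning epoch adds a random sample of pre-training examples to the fine-tuning set; `mixed_epoch`, `mix_schedule`, `mix_ratio`, and `decay` are hypothetical names and parameters for illustration, not taken from the paper:

```python
import random


def mixed_epoch(finetune_data, pretrain_data, mix_ratio, rng):
    """Build one epoch: every fine-tuning example plus a random sample of
    pre-training examples (assumed scheme, sized by mix_ratio)."""
    n_review = min(len(pretrain_data), int(mix_ratio * len(finetune_data)))
    review = rng.sample(pretrain_data, n_review)
    epoch = list(finetune_data) + review
    rng.shuffle(epoch)
    return epoch


def mix_schedule(initial_ratio, decay, num_epochs):
    """Hypothetical schedule: shrink the amount of reviewed data each epoch."""
    return [initial_ratio * decay ** e for e in range(num_epochs)]
```

With a decaying ratio, early epochs see mostly pre-training data and later epochs see mostly target data; whether the paper uses such a decay is not stated in the abstract.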
Mildly Overparametrized Neural Nets can Memorize Training Data Efficiently
Title | Mildly Overparametrized Neural Nets can Memorize Training Data Efficiently |
Authors | Anonymous |
Abstract | It has been observed (Zhang et al., 2016) that deep neural networks can memorize: they achieve 100% accuracy on training data. Recent theoretical results explained such behavior in highly overparametrized regimes, where the number of neurons in each layer is larger than the number of training samples. In this paper, we show that neural networks can be trained to memorize training data perfectly in a mildly overparametrized regime, where the number of parameters is just a constant factor more than the number of training samples, and the number of neurons is much smaller. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rkly70EKDH |
PDF | https://openreview.net/pdf?id=rkly70EKDH |
PWC | https://paperswithcode.com/paper/mildly-overparametrized-neural-nets-can-1 |
Repo | |
Framework | |
Neural Outlier Rejection for Self-Supervised Keypoint Learning
Title | Neural Outlier Rejection for Self-Supervised Keypoint Learning |
Authors | Anonymous |
Abstract | Generating reliable illumination- and viewpoint-invariant keypoints is critical for tasks such as feature-based SLAM and SfM. Recently, many learned keypoint methods have demonstrated improved performance on challenging benchmarks. However, it is extremely difficult to create consistent training samples for interest points in natural images, since they are hard to define clearly and consistently for a human annotator. In this work, we propose a novel end-to-end self-supervised learning scheme that can effectively exploit unlabeled data to provide more reliable keypoints under various scene conditions. Our key contributions are (i) a novel way of regressing keypoints, which avoids discretization errors introduced by related methods; (ii) a novel way of extracting associated descriptors by means of an upsampling step, which allows regressing the descriptors in finer detail for per-pixel metric learning; and (iii) a novel way of training the descriptor by using a proxy task, i.e., neural outlier rejection. By using this proxy task we can derive a fully self-supervised training loss for the descriptor, thus avoiding the need for manual annotation. We show that these three contributions greatly improve the quality of feature matching and homography estimation on challenging benchmarks over the state-of-the-art. |
Tasks | Homography Estimation, Metric Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Skx82ySYPH |
PDF | https://openreview.net/pdf?id=Skx82ySYPH |
PWC | https://paperswithcode.com/paper/neural-outlier-rejection-for-self-supervised |
Repo | |
Framework | |
Deep Hierarchical-Hyperspherical Learning (DH^2L)
Title | Deep Hierarchical-Hyperspherical Learning (DH^2L) |
Authors | Anonymous |
Abstract | Regularization is known to be an inexpensive and reasonable solution to alleviate over-fitting problems of inference models, including deep neural networks. In this paper, we propose a hierarchical regularization which preserves the semantic structure of a sample distribution. At the same time, this regularization promotes diversity by imposing distance between parameter vectors enlarged within semantic structures. To generate evenly distributed parameters, we constrain them to lie on hierarchical hyperspheres. Evenly distributed parameters are considered to be less redundant. To define the hierarchical parameter space, we propose to reformulate the topology space with multiple hypersphere spaces. On each hypersphere space, the projection parameter is defined by two individual parameters. Since maximizing groupwise pairwise distance between points on a hypersphere is nontrivial (generalized Thomson problem), we propose a new discrete metric integrated with a continuous angle metric. In extensive experiments on publicly available datasets (CIFAR-10, CIFAR-100, CUB200-2011, and Stanford Cars), our proposed method shows improved generalization performance, especially when the number of super-classes is larger. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=r1lHAAVtwr |
PDF | https://openreview.net/pdf?id=r1lHAAVtwr |
PWC | https://paperswithcode.com/paper/deep-hierarchical-hyperspherical-learning |
Repo | |
Framework | |
Physics-Aware Flow Data Completion Using Neural Inpainting
Title | Physics-Aware Flow Data Completion Using Neural Inpainting |
Authors | Anonymous |
Abstract | In this paper we propose a physics-aware neural network for inpainting fluid flow data. We consider that flow field data inherently follows the solution of the Navier-Stokes equations and hence our network is designed to capture physical laws. We use a DenseBlock U-Net architecture combined with a stream function formulation to inpaint missing velocity data. Our loss functions represent the relevant physical quantities: velocity, velocity Jacobian, vorticity, and divergence. Obstacles are treated as known priors, and each layer of the network receives the relevant information through concatenation with the previous layer’s output. Our results demonstrate the network’s capability for physics-aware completion tasks, and the presented ablation studies show the effectiveness of each proposed component. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BylldxBYwH |
PDF | https://openreview.net/pdf?id=BylldxBYwH |
PWC | https://paperswithcode.com/paper/physics-aware-flow-data-completion-using |
Repo | |
Framework | |
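The appeal of a stream function formulation can be illustrated numerically. The sketch below assumes a 2-D field (the abstract does not state the dimensionality) and uses the standard definition u = ∂ψ/∂y, v = −∂ψ/∂x, under which the velocity is divergence-free by construction. The function names and the grid layout (`psi[i][j]` with `i` along y and `j` along x) are illustrative choices, not the paper's code:

```python
import math


def velocity_from_stream(psi, h):
    """Central-difference velocity from a stream function on a uniform grid.
    u is left at 0.0 on the first/last row and v on the first/last column,
    where the central difference is undefined."""
    n_rows, n_cols = len(psi), len(psi[0])
    u = [[0.0] * n_cols for _ in range(n_rows)]
    v = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for j in range(n_cols):
            if 0 < i < n_rows - 1:
                u[i][j] = (psi[i + 1][j] - psi[i - 1][j]) / (2 * h)
            if 0 < j < n_cols - 1:
                v[i][j] = -(psi[i][j + 1] - psi[i][j - 1]) / (2 * h)
    return u, v


def max_interior_divergence(u, v, h):
    """Largest |du/dx + dv/dy| over interior grid points."""
    n_rows, n_cols = len(u), len(u[0])
    worst = 0.0
    for i in range(1, n_rows - 1):
        for j in range(1, n_cols - 1):
            du_dx = (u[i][j + 1] - u[i][j - 1]) / (2 * h)
            dv_dy = (v[i + 1][j] - v[i - 1][j]) / (2 * h)
            worst = max(worst, abs(du_dx + dv_dy))
    return worst
```

Because the two central-difference operators commute, the discrete divergence cancels up to floating-point rounding for any ψ, so a network that predicts ψ instead of (u, v) cannot produce a field that violates mass conservation in this discrete sense.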
Learning Neural Surrogate Model for Warm-Starting Bayesian Optimization
Title | Learning Neural Surrogate Model for Warm-Starting Bayesian Optimization |
Authors | Haotian Zhang, Jian Sun, Zongben Xu |
Abstract | Bayesian optimization is an effective tool for optimizing black-box functions and is popular for hyper-parameter tuning in machine learning. Traditional Bayesian optimization methods are based on Gaussian processes (GPs), relying on a GP-based surrogate model for sampling points of the function of interest. In this work, we consider transferring knowledge from related problems to a target problem by learning an initial surrogate model for warm-starting Bayesian optimization. We propose a neural network-based surrogate model to estimate the function mean value in the GP. Then we design a novel weighted Reptile algorithm with a sampling strategy to learn an initial surrogate model from a meta-training set. The initial surrogate model is learned so that it adapts well to new tasks. Extensive experiments show that this warm-starting technique enables us to find a better minimizer or better hyper-parameters than traditional GP and previous warm-starting methods. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=H1g6s0NtwS |
PDF | https://openreview.net/pdf?id=H1g6s0NtwS |
PWC | https://paperswithcode.com/paper/learning-neural-surrogate-model-for-warm |
Repo | |
Framework | |
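For readers unfamiliar with Reptile, the following is a minimal sketch of the plain (unweighted) algorithm on toy one-parameter tasks. It is not the authors' weighted variant, whose weighting and sampling strategy the abstract does not describe. The toy loss `(theta - center)**2` and all names and default values are assumptions for illustration:

```python
import random


def inner_sgd(theta, center, lr, steps):
    """Adapt theta to one toy task with loss (theta - center)**2."""
    for _ in range(steps):
        grad = 2.0 * (theta - center)
        theta -= lr * grad
    return theta


def reptile(task_centers, meta_lr=0.1, inner_lr=0.1, inner_steps=5,
            meta_iters=500, seed=0):
    """Plain Reptile: sample a task, adapt to it, then move the shared
    initialization a fraction meta_lr toward the adapted parameters."""
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(meta_iters):
        center = rng.choice(task_centers)
        adapted = inner_sgd(theta, center, inner_lr, inner_steps)
        theta += meta_lr * (adapted - theta)
    return theta
```

On these toy tasks the learned initialization settles among the task optima, so a few inner steps suffice to adapt to any one of them; this is the sense in which a meta-learned initial surrogate can warm-start a new problem.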
Recurrent Hierarchical Topic-Guided Neural Language Models
Title | Recurrent Hierarchical Topic-Guided Neural Language Models |
Authors | Anonymous |
Abstract | To simultaneously capture syntax and semantics from a text corpus, we propose a new larger-context language model that extracts recurrent hierarchical semantic structure via a dynamic deep topic model to guide natural language generation. Moving beyond a conventional language model that ignores long-range word dependencies and sentence order, the proposed model captures not only intra-sentence word dependencies, but also temporal transitions between sentences and inter-sentence topic dependences. For inference, we develop a hybrid of stochastic-gradient MCMC and recurrent autoencoding variational Bayes. Experimental results on a variety of real-world text corpora demonstrate that the proposed model not only outperforms state-of-the-art larger-context language models, but also learns interpretable recurrent multilayer topics and generates diverse sentences and paragraphs that are syntactically correct and semantically coherent. |
Tasks | Language Modelling, Text Generation |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Byl1W1rtvH |
PDF | https://openreview.net/pdf?id=Byl1W1rtvH |
PWC | https://paperswithcode.com/paper/recurrent-hierarchical-topic-guided-neural |
Repo | |
Framework | |
Exploration Based Language Learning for Text-Based Games
Title | Exploration Based Language Learning for Text-Based Games |
Authors | Anonymous |
Abstract | This work presents an exploration and imitation-learning-based agent capable of state-of-the-art performance in playing text-based computer games. Text-based computer games describe their world to the player through natural language and expect the player to interact with the game using text. These games are of interest as they can be seen as a testbed for language understanding, problem-solving, and language generation by artificial agents. Moreover, they provide a learning environment in which these skills can be acquired through interactions with an environment rather than using fixed corpora. One aspect that makes these games particularly challenging for learning agents is the combinatorially large action space. Existing methods for solving text-based games are limited to games that are either very simple or have an action space restricted to a predetermined set of admissible actions. In this work, we propose to use the exploration approach of Go-Explore (Ecoffet et al., 2019) for solving text-based games. More specifically, in an initial exploration phase, we first extract trajectories with high rewards, after which we train a policy to solve the game by imitating these trajectories. Our experiments show that this approach outperforms existing solutions in solving text-based games, and it is more sample efficient in terms of the number of interactions with the environment. Moreover, we show that the learned policy can generalize better than existing solutions to unseen games without using any restriction on the action space. |
Tasks | Imitation Learning, Text Generation |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BygSXCNFDB |
PDF | https://openreview.net/pdf?id=BygSXCNFDB |
PWC | https://paperswithcode.com/paper/exploration-based-language-learning-for-text |
Repo | |
Framework | |
Explanation by Progressive Exaggeration
Title | Explanation by Progressive Exaggeration |
Authors | Anonymous |
Abstract | As machine learning methods see greater adoption and implementation in high-stakes applications such as medical image diagnosis, the need for model interpretability and explanation has become more critical. Classical approaches that assess feature importance (e.g., saliency maps) do not explain how and why a particular region of an image is relevant to the prediction. We propose a method that explains the outcome of a classification black-box by gradually exaggerating the semantic effect of a given class. Given a query input to a classifier, our method produces a progressive set of plausible variations of that query, which gradually change the posterior probability from its original class to its negation. These counter-factually generated samples preserve features unrelated to the classification decision, such that a user can employ our method as a “tuning knob” to traverse a data manifold while crossing the decision boundary. Our method is model agnostic and only requires the output value and gradient of the predictor with respect to its input. |
Tasks | Feature Importance |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=H1xFWgrFPS |
PDF | https://openreview.net/pdf?id=H1xFWgrFPS |
PWC | https://paperswithcode.com/paper/explanation-by-progressive-exaggeration |
Repo | |
Framework | |
How noise affects the Hessian spectrum in overparameterized neural networks
Title | How noise affects the Hessian spectrum in overparameterized neural networks |
Authors | Anonymous |
Abstract | Stochastic gradient descent (SGD) forms the core optimization method for deep neural networks. While some theoretical progress has been made, it still remains unclear why SGD leads the learning dynamics in overparameterized networks to solutions that generalize well. Here we show that for overparameterized networks with a degenerate valley in their loss landscape, SGD on average decreases the trace of the Hessian of the loss. We also generalize this result to other noise structures and show that isotropic noise in the non-degenerate subspace of the Hessian decreases its determinant. In addition to explaining SGD’s role in sculpting the Hessian spectrum, this opens the door to new optimization approaches that may confer better generalization performance. We test our results with experiments on toy models and deep neural networks. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Hklcm0VYDS |
PDF | https://openreview.net/pdf?id=Hklcm0VYDS |
PWC | https://paperswithcode.com/paper/how-noise-affects-the-hessian-spectrum-in-1 |
Repo | |
Framework | |
Imagining the Latent Space of a Variational Auto-Encoders
Title | Imagining the Latent Space of a Variational Auto-Encoders |
Authors | Anonymous |
Abstract | Variational Auto-Encoders (VAEs) are designed to capture compressible information about a dataset. As a consequence, the information stored in the latent space is seldom sufficient to reconstruct a particular image. To help understand the type of information stored in the latent space, we train a GAN-style decoder constrained to produce images that the VAE encoder will map to the same region of latent space. This allows us to “imagine” the information captured in the latent space. We argue that this is necessary to make a VAE into a truly generative model. We use our GAN to visualise the latent space of a standard VAE and of a $\beta$-VAE. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BJe4PyrFvB |
PDF | https://openreview.net/pdf?id=BJe4PyrFvB |
PWC | https://paperswithcode.com/paper/imagining-the-latent-space-of-a-variational |
Repo | |
Framework | |
Attention Interpretability Across NLP Tasks
Title | Attention Interpretability Across NLP Tasks |
Authors | Anonymous |
Abstract | The attention layer in a neural network model provides insights into the model’s reasoning behind its prediction, which is usually criticized for being opaque. Recently, seemingly contradictory viewpoints have emerged about the interpretability of attention weights (Jain & Wallace, 2019; Vig & Belinkov, 2019). Amid such confusion arises the need to understand the attention mechanism more systematically. In this work, we attempt to fill this gap by giving a comprehensive explanation which justifies both kinds of observations (i.e., when attention is interpretable and when it is not). Through a series of experiments on diverse NLP tasks, we validate our observations and reinforce our claim of interpretability of attention through manual evaluation. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BJe-_CNKPH |
PDF | https://openreview.net/pdf?id=BJe-_CNKPH |
PWC | https://paperswithcode.com/paper/attention-interpretability-across-nlp-tasks-1 |
Repo | |
Framework | |
Self-Adversarial Learning with Comparative Discrimination for Text Generation
Title | Self-Adversarial Learning with Comparative Discrimination for Text Generation |
Authors | Anonymous |
Abstract | Conventional Generative Adversarial Networks (GANs) for text generation tend to have issues of reward sparsity and mode collapse that affect the quality and diversity of generated samples. To address these issues, we propose a novel self-adversarial learning (SAL) paradigm for improving GANs’ performance in text generation. In contrast to standard GANs that use a binary classifier as their discriminator to predict whether a sample is real or generated, SAL employs a comparative discriminator which is a pairwise classifier for comparing the text quality between a pair of samples. During training, SAL rewards the generator when its currently generated sentence is found to be better than its previously generated samples. This self-improvement reward mechanism allows the model to receive credits more easily and avoid collapsing towards the limited number of real samples, which not only helps alleviate the reward sparsity issue but also reduces the risk of mode collapse. Experiments on text generation benchmark datasets show that our proposed approach substantially improves both the quality and the diversity, and yields more stable performance compared to the previous GANs for text generation. |
Tasks | Text Generation |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=B1l8L6EtDS |
PDF | https://openreview.net/pdf?id=B1l8L6EtDS |
PWC | https://paperswithcode.com/paper/self-adversarial-learning-with-comparative |
Repo | |
Framework | |
Quantized Reinforcement Learning (QuaRL)
Title | Quantized Reinforcement Learning (QuaRL) |
Authors | Anonymous |
Abstract | Recent work has shown that quantization can help reduce the memory, compute, and energy demands of deep neural networks without significantly harming their quality. However, whether these prior techniques, applied traditionally to image-based models, work with the same efficacy for the sequential decision-making process in reinforcement learning remains an unanswered question. To address this void, we conduct the first comprehensive empirical study that quantifies the effects of quantization on various deep reinforcement learning policies with the intent to reduce their computational resource demands. We apply techniques such as post-training quantization and quantization-aware training to a spectrum of reinforcement learning tasks (such as Pong, Breakout, BeamRider and more) and training algorithms (such as PPO, A2C, DDPG, and DQN). Across this spectrum of tasks and learning algorithms, we show that policies can be quantized to 6-8 bits of precision without loss of accuracy. Additionally, we show that certain tasks and reinforcement learning algorithms yield policies that are more difficult to quantize due to their effect of widening the models’ distribution of weights, and that quantization-aware training consistently improves results over post-training quantization and oftentimes even over the full-precision baseline. Finally, we demonstrate the real-world applications of quantization for reinforcement learning. We use half-precision training to train a Pong model 50% faster, and we deploy a quantized reinforcement learning based navigation policy to an embedded system, achieving an 18x speedup and a 4x reduction in memory usage over an unquantized policy. |
Tasks | Decision Making, Quantization |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HJeEP04KDH |
PDF | https://openreview.net/pdf?id=HJeEP04KDH |
PWC | https://paperswithcode.com/paper/quantized-reinforcement-learning-quarl-1 |
Repo | |
Framework | |
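Post-training quantization as mentioned in the abstract can be sketched with a simple uniform affine scheme. The abstract does not say which quantizer the study uses, so the min/max-based mapping below, together with the names `quantize_uniform` and `dequantize`, is an assumption for illustration only:

```python
def quantize_uniform(weights, num_bits=8):
    """Map floats to integer codes in [0, 2**num_bits - 1] using the
    observed min/max range (assumed uniform affine scheme)."""
    qmax = 2 ** num_bits - 1
    w_min, w_max = min(weights), max(weights)
    # Guard against a zero range when all weights are equal.
    scale = (w_max - w_min) / qmax if w_max > w_min else 1.0
    codes = [round((w - w_min) / scale) for w in weights]
    return codes, scale, w_min


def dequantize(codes, scale, w_min):
    """Recover approximate float weights from integer codes."""
    return [c * scale + w_min for c in codes]
```

In this scheme the step size `scale` grows with the range of the weights, so a wider weight distribution means a coarser grid at a fixed bit width, which is consistent with the abstract's remark that policies with widened weight distributions are harder to quantize.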
Unsupervised Universal Self-Attention Network for Graph Classification
Title | Unsupervised Universal Self-Attention Network for Graph Classification |
Authors | Anonymous |
Abstract | Existing graph embedding models often have weaknesses in exploiting graph structure similarities, potential dependencies among nodes and global network properties. To this end, we present U2GAN, a novel unsupervised model leveraging the strength of the recently introduced universal self-attention network (Dehghani et al., 2019), to learn low-dimensional embeddings of graphs which can be used for graph classification. In particular, given an input graph, U2GAN first applies a self-attention computation, which is then followed by a recurrent transition to iteratively memorize its attention on vector representations of each node and its neighbors across each iteration. Thus, U2GAN can address the weaknesses in the existing models in order to produce plausible node embeddings whose sum is the final embedding of the whole graph. Experimental results show that our unsupervised U2GAN produces new state-of-the-art performances on a range of well-known benchmark datasets for the graph classification task. It even outperforms supervised methods in most benchmark cases. |
Tasks | Graph Classification, Graph Embedding |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HJeLBpEFPB |
PDF | https://openreview.net/pdf?id=HJeLBpEFPB |
PWC | https://paperswithcode.com/paper/unsupervised-universal-self-attention-network-1 |
Repo | |
Framework | |