April 1, 2020

Paper Group NANR 71

Mix-review: Alleviate Forgetting in the Pretrain-Finetune Framework for Neural Language Generation Models


Title	Mix-review: Alleviate Forgetting in the Pretrain-Finetune Framework for Neural Language Generation Models
Authors	Anonymous
Abstract	In this work, we study how the large-scale pretrain-finetune framework changes the behavior of a neural language generator. We focus on the transformer encoder-decoder model for the open-domain dialogue response generation task. We find that after standard fine-tuning, the model forgets important language generation skills acquired during large-scale pre-training. We demonstrate the forgetting phenomenon through a detailed behavior analysis from the perspectives of context sensitivity and knowledge transfer. Adopting the concept of data mixing, we propose an intuitive fine-tuning strategy named “mix-review”. We find that mix-review effectively regularizes the fine-tuning process, and the forgetting problem is largely alleviated. Finally, we discuss interesting behavior of the resulting dialogue model and its implications.
Tasks	Text Generation, Transfer Learning
Published	2020-01-01
URL	https://openreview.net/forum?id=r1lUE04YPB
PDF	https://openreview.net/pdf?id=r1lUE04YPB
PWC	https://paperswithcode.com/paper/mix-review-alleviate-forgetting-in-the-1
Repo
Framework
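
The data-mixing idea described in the abstract above can be illustrated with a minimal sketch. It assumes that each fine-tuning epoch is augmented with a random sample of pretraining data whose size shrinks from epoch to epoch. The function name `mix_review_epoch`, the parameters `mix_ratio` and `decay`, and the geometric schedule are hypothetical choices for illustration; the abstract does not specify them.

```python
import random

def mix_review_epoch(finetune_data, pretrain_data, epoch,
                     mix_ratio=4.0, decay=0.7, seed=0):
    """Return one epoch of training examples: all fine-tuning data plus a
    shrinking random sample of pretraining data (hypothetical schedule)."""
    rng = random.Random(seed + epoch)
    # Number of pretraining examples to mix in; decays geometrically per epoch.
    n_mix = int(len(finetune_data) * mix_ratio * (decay ** epoch))
    n_mix = min(n_mix, len(pretrain_data))
    mixed = list(finetune_data) + rng.sample(pretrain_data, n_mix)
    rng.shuffle(mixed)
    return mixed

# Toy usage: 10 dialogue examples, 100 pretraining examples.
finetune = [f"dialogue-{i}" for i in range(10)]
pretrain = [f"pretrain-{i}" for i in range(100)]
epoch0 = mix_review_epoch(finetune, pretrain, epoch=0)
epoch2 = mix_review_epoch(finetune, pretrain, epoch=2)
```

Under these assumptions, early epochs see mostly pretraining data and later epochs see mostly the target task, which is one way to read "regularizing" the fine-tuning process.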

Mildly Overparametrized Neural Nets can Memorize Training Data Efficiently


Title	Mildly Overparametrized Neural Nets can Memorize Training Data Efficiently
Authors	Anonymous
Abstract	It has been observed (Zhang et al., 2016) that deep neural networks can memorize: they achieve 100% accuracy on training data. Recent theoretical results explained such behavior in highly overparametrized regimes, where the number of neurons in each layer is larger than the number of training samples. In this paper, we show that neural networks can be trained to memorize training data perfectly in a mildly overparametrized regime, where the number of parameters is just a constant factor more than the number of training samples, and the number of neurons is much smaller.
Tasks
Published	2020-01-01
URL	https://openreview.net/forum?id=rkly70EKDH
PDF	https://openreview.net/pdf?id=rkly70EKDH
PWC	https://paperswithcode.com/paper/mildly-overparametrized-neural-nets-can-1
Repo
Framework

Neural Outlier Rejection for Self-Supervised Keypoint Learning


Title	Neural Outlier Rejection for Self-Supervised Keypoint Learning
Authors	Anonymous
Abstract	Generating reliable illumination and viewpoint invariant keypoints is critical for tasks such as feature-based SLAM and SfM. Recently, many learned keypoint methods have demonstrated improved performance on challenging benchmarks. However, it is extremely difficult to create consistent training samples for interest points in natural images, since they are hard to define clearly and consistently for a human annotator. In this work, we propose a novel end-to-end self-supervised learning scheme that can effectively exploit unlabeled data to provide more reliable keypoints under various scene conditions. Our key contributions are (i) a novel way of regressing keypoints, which avoids discretization errors introduced by related methods; (ii) a novel way of extracting associated descriptors by means of an upsampling step, which allows regressing the descriptors with a more fine-grained detail for the per-pixel level metric learning and (iii) a novel way of training the descriptor by using a proxy task, i.e. neural outlier rejection. By using this proxy task we can derive a fully self-supervised training loss for the descriptor, thus avoiding the need for manual annotation. We show that these three contributions greatly improve the quality of feature matching and homography estimation on challenging benchmarks over the state-of-the-art.
Tasks	Homography Estimation, Metric Learning
Published	2020-01-01
URL	https://openreview.net/forum?id=Skx82ySYPH
PDF	https://openreview.net/pdf?id=Skx82ySYPH
PWC	https://paperswithcode.com/paper/neural-outlier-rejection-for-self-supervised
Repo
Framework

Deep Hierarchical-Hyperspherical Learning (DH^2L)


Title	Deep Hierarchical-Hyperspherical Learning (DH^2L)
Authors	Anonymous
Abstract	Regularization is known to be an inexpensive and reasonable solution to alleviate over-fitting problems of inference models, including deep neural networks. In this paper, we propose a hierarchical regularization which preserves the semantic structure of a sample distribution. At the same time, this regularization promotes diversity by imposing distance between parameter vectors enlarged within semantic structures. To generate evenly distributed parameters, we constrain them to lie on hierarchical hyperspheres. Evenly distributed parameters are considered to be less redundant. To define the hierarchical parameter space, we propose to reformulate the topology space with multiple hypersphere spaces. On each hypersphere space, the projection parameter is defined by two individual parameters. Since maximizing the groupwise pairwise distance between points on a hypersphere is nontrivial (generalized Thomson problem), we propose a new discrete metric integrated with a continuous angle metric. In extensive experiments on publicly available datasets (CIFAR-10, CIFAR-100, CUB200-2011, and Stanford Cars), our proposed method shows improved generalization performance, especially when the number of super-classes is larger.
Tasks
Published	2020-01-01
URL	https://openreview.net/forum?id=r1lHAAVtwr
PDF	https://openreview.net/pdf?id=r1lHAAVtwr
PWC	https://paperswithcode.com/paper/deep-hierarchical-hyperspherical-learning
Repo
Framework

Physics-Aware Flow Data Completion Using Neural Inpainting


Title	Physics-Aware Flow Data Completion Using Neural Inpainting
Authors	Anonymous
Abstract	In this paper we propose a physics-aware neural network for inpainting fluid flow data. We consider that flow field data inherently follows the solution of the Navier-Stokes equations and hence our network is designed to capture physical laws. We use a DenseBlock U-Net architecture combined with a stream function formulation to inpaint missing velocity data. Our loss functions represent the relevant physical quantities velocity, velocity Jacobian, vorticity and divergence. Obstacles are treated as known priors, and each layer of the network receives the relevant information through concatenation with the previous layer’s output. Our results demonstrate the network’s capability for physics-aware completion tasks, and the presented ablation studies show the effectiveness of each proposed component.
Tasks
Published	2020-01-01
URL	https://openreview.net/forum?id=BylldxBYwH
PDF	https://openreview.net/pdf?id=BylldxBYwH
PWC	https://paperswithcode.com/paper/physics-aware-flow-data-completion-using
Repo
Framework
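
The appeal of a stream function formulation can be checked numerically. The sketch below assumes a 2D flow, where a scalar stream function psi defines the velocity as u = dpsi/dy and v = -dpsi/dx, so the velocity field is divergence-free by construction. The grid size, spacing `h`, and the sample function are illustrative assumptions and do not come from the paper.

```python
import math

def velocity_from_stream(psi, h):
    """Central-difference velocity (u, v) = (dpsi/dy, -dpsi/dx) on interior points.
    psi[i][j] holds the stream function at x = i*h, y = j*h."""
    nx, ny = len(psi), len(psi[0])
    u = [[0.0] * ny for _ in range(nx)]
    v = [[0.0] * ny for _ in range(nx)]
    for i in range(1, nx - 1):
        for j in range(1, ny - 1):
            u[i][j] = (psi[i][j + 1] - psi[i][j - 1]) / (2 * h)
            v[i][j] = -(psi[i + 1][j] - psi[i - 1][j]) / (2 * h)
    return u, v

def max_abs_divergence(u, v, h):
    """Largest |du/dx + dv/dy| over points whose four neighbours are interior."""
    nx, ny = len(u), len(u[0])
    worst = 0.0
    for i in range(2, nx - 2):
        for j in range(2, ny - 2):
            div = ((u[i + 1][j] - u[i - 1][j]) / (2 * h)
                   + (v[i][j + 1] - v[i][j - 1]) / (2 * h))
            worst = max(worst, abs(div))
    return worst

# Toy stream function on a 12 x 12 grid (illustrative choice).
h = 0.1
psi = [[math.sin(i * h) * math.cos(j * h) for j in range(12)] for i in range(12)]
u, v = velocity_from_stream(psi, h)
divergence = max_abs_divergence(u, v, h)
```

With central differences the mixed terms cancel exactly, so `divergence` is zero up to floating-point rounding. A network that predicts psi rather than (u, v) directly would inherit this property without needing a divergence penalty to enforce it.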

Learning Neural Surrogate Model for Warm-Starting Bayesian Optimization


Title	Learning Neural Surrogate Model for Warm-Starting Bayesian Optimization
Authors	Haotian Zhang, Jian Sun, Zongben Xu
Abstract	Bayesian optimization is an effective tool for optimizing black-box functions and is popular for hyper-parameter tuning in machine learning. Traditional Bayesian optimization methods are based on Gaussian processes (GPs), relying on a GP-based surrogate model for sampling points of the function of interest. In this work, we consider transferring knowledge from related problems to a target problem by learning an initial surrogate model for warm-starting Bayesian optimization. We propose a neural network-based surrogate model to estimate the function mean value in the GP. Then we design a novel weighted Reptile algorithm with a sampling strategy to learn an initial surrogate model from a meta-train set. The initial surrogate model is learned so that it adapts well to new tasks. Extensive experiments show that this warm-starting technique enables us to find better minimizers or hyper-parameters than traditional GP and previous warm-starting methods.
Tasks
Published	2020-01-01
URL	https://openreview.net/forum?id=H1g6s0NtwS
PDF	https://openreview.net/pdf?id=H1g6s0NtwS
PWC	https://paperswithcode.com/paper/learning-neural-surrogate-model-for-warm
Repo
Framework
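
For readers unfamiliar with Reptile, the sketch below shows the plain (unweighted) algorithm on a toy family of one-dimensional tasks. It is not the weighted variant or the sampling strategy proposed in the paper, whose details the abstract does not give. The task loss `(theta - target)^2`, the learning rates, and the function names are assumptions chosen for illustration.

```python
import random

def inner_sgd(theta, target, steps=5, lr=0.1):
    """A few gradient steps on the task loss (theta - target)^2."""
    phi = theta
    for _ in range(steps):
        phi -= lr * 2.0 * (phi - target)
    return phi

def reptile(task_targets, meta_steps=200, meta_lr=0.1, seed=0):
    """Plain Reptile: move the initialization toward each task's adapted weights."""
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(meta_steps):
        target = rng.choice(task_targets)      # sample a task uniformly
        phi = inner_sgd(theta, target)         # adapt to that task
        theta += meta_lr * (phi - theta)       # Reptile meta-update
    return theta

# The learned initialization ends up inside the range spanned by the task optima.
theta0 = reptile([1.0, 2.0, 3.0])
```

The resulting `theta0` is a starting point from which a few gradient steps reach any of the sampled tasks, which is the sense in which a meta-learned surrogate could warm-start optimization on a new problem.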

Recurrent Hierarchical Topic-Guided Neural Language Models


Title	Recurrent Hierarchical Topic-Guided Neural Language Models
Authors	Anonymous
Abstract	To simultaneously capture syntax and semantics from a text corpus, we propose a new larger-context language model that extracts recurrent hierarchical semantic structure via a dynamic deep topic model to guide natural language generation. Moving beyond a conventional language model that ignores long-range word dependencies and sentence order, the proposed model captures not only intra-sentence word dependencies, but also temporal transitions between sentences and inter-sentence topic dependences. For inference, we develop a hybrid of stochastic-gradient MCMC and recurrent autoencoding variational Bayes. Experimental results on a variety of real-world text corpora demonstrate that the proposed model not only outperforms state-of-the-art larger-context language models, but also learns interpretable recurrent multilayer topics and generates diverse sentences and paragraphs that are syntactically correct and semantically coherent.
Tasks	Language Modelling, Text Generation
Published	2020-01-01
URL	https://openreview.net/forum?id=Byl1W1rtvH
PDF	https://openreview.net/pdf?id=Byl1W1rtvH
PWC	https://paperswithcode.com/paper/recurrent-hierarchical-topic-guided-neural
Repo
Framework

Exploration Based Language Learning for Text-Based Games


Title	Exploration Based Language Learning for Text-Based Games
Authors	Anonymous
Abstract	This work presents an exploration and imitation-learning-based agent capable of state-of-the-art performance in playing text-based computer games. Text-based computer games describe their world to the player through natural language and expect the player to interact with the game using text. These games are of interest as they can be seen as a testbed for language understanding, problem-solving, and language generation by artificial agents. Moreover, they provide a learning environment in which these skills can be acquired through interactions with an environment rather than using fixed corpora. One aspect that makes these games particularly challenging for learning agents is the combinatorially large action space. Existing methods for solving text-based games are limited to games that are either very simple or have an action space restricted to a predetermined set of admissible actions. In this work, we propose to use the exploration approach of Go-Explore (Ecoffet et al., 2019) for solving text-based games. More specifically, in an initial exploration phase, we first extract trajectories with high rewards, after which we train a policy to solve the game by imitating these trajectories. Our experiments show that this approach outperforms existing solutions in solving text-based games, and it is more sample efficient in terms of the number of interactions with the environment. Moreover, we show that the learned policy can generalize better than existing solutions to unseen games without using any restriction on the action space.
Tasks	Imitation Learning, Text Generation
Published	2020-01-01
URL	https://openreview.net/forum?id=BygSXCNFDB
PDF	https://openreview.net/pdf?id=BygSXCNFDB
PWC	https://paperswithcode.com/paper/exploration-based-language-learning-for-text
Repo
Framework

Explanation by Progressive Exaggeration


Title	Explanation by Progressive Exaggeration
Authors	Anonymous
Abstract	As machine learning methods see greater adoption and implementation in high stakes applications such as medical image diagnosis, the need for model interpretability and explanation has become more critical. Classical approaches that assess feature importance (e.g., saliency maps) do not explain how and why a particular region of an image is relevant to the prediction. We propose a method that explains the outcome of a classification black-box by gradually exaggerating the semantic effect of a given class. Given a query input to a classifier, our method produces a progressive set of plausible variations of that query, which gradually change the posterior probability from its original class to its negation. These counter-factually generated samples preserve features unrelated to the classification decision, such that a user can employ our method as a “tuning knob” to traverse a data manifold while crossing the decision boundary. Our method is model agnostic and only requires the output value and gradient of the predictor with respect to its input.
Tasks	Feature Importance
Published	2020-01-01
URL	https://openreview.net/forum?id=H1xFWgrFPS
PDF	https://openreview.net/pdf?id=H1xFWgrFPS
PWC	https://paperswithcode.com/paper/explanation-by-progressive-exaggeration
Repo
Framework

How noise affects the Hessian spectrum in overparameterized neural networks


Title	How noise affects the Hessian spectrum in overparameterized neural networks
Authors	Anonymous
Abstract	Stochastic gradient descent (SGD) forms the core optimization method for deep neural networks. While some theoretical progress has been made, it still remains unclear why SGD leads the learning dynamics in overparameterized networks to solutions that generalize well. Here we show that for overparameterized networks with a degenerate valley in their loss landscape, SGD on average decreases the trace of the Hessian of the loss. We also generalize this result to other noise structures and show that isotropic noise in the non-degenerate subspace of the Hessian decreases its determinant. In addition to explaining SGD's role in sculpting the Hessian spectrum, this opens the door to new optimization approaches that may confer better generalization performance. We test our results with experiments on toy models and deep neural networks.
Tasks
Published	2020-01-01
URL	https://openreview.net/forum?id=Hklcm0VYDS
PDF	https://openreview.net/pdf?id=Hklcm0VYDS
PWC	https://paperswithcode.com/paper/how-noise-affects-the-hessian-spectrum-in-1
Repo
Framework

Imagining the Latent Space of a Variational Auto-Encoders


Title	Imagining the Latent Space of a Variational Auto-Encoders
Authors	Anonymous
Abstract	Variational Auto-Encoders (VAEs) are designed to capture compressible information about a dataset. As a consequence, the information stored in the latent space is seldom sufficient to reconstruct a particular image. To help understand the type of information stored in the latent space, we train a GAN-style decoder constrained to produce images that the VAE encoder will map to the same region of latent space. This allows us to “imagine” the information captured in the latent space. We argue that this is necessary to make a VAE into a truly generative model. We use our GAN to visualise the latent space of a standard VAE and of a $\beta$-VAE.
Tasks
Published	2020-01-01
URL	https://openreview.net/forum?id=BJe4PyrFvB
PDF	https://openreview.net/pdf?id=BJe4PyrFvB
PWC	https://paperswithcode.com/paper/imagining-the-latent-space-of-a-variational
Repo
Framework

Attention Interpretability Across NLP Tasks


Title	Attention Interpretability Across NLP Tasks
Authors	Anonymous
Abstract	The attention layer in a neural network model provides insights into the model’s reasoning behind its prediction, which are usually criticized for being opaque. Recently, seemingly contradictory viewpoints have emerged about the interpretability of attention weights (Jain & Wallace, 2019; Vig & Belinkov, 2019). Amid such confusion arises the need to understand the attention mechanism more systematically. In this work, we attempt to fill this gap by giving a comprehensive explanation which justifies both kinds of observations (i.e., when attention is interpretable and when it is not). Through a series of experiments on diverse NLP tasks, we validate our observations and reinforce our claim of interpretability of attention through manual evaluation.
Tasks
Published	2020-01-01
URL	https://openreview.net/forum?id=BJe-_CNKPH
PDF	https://openreview.net/pdf?id=BJe-_CNKPH
PWC	https://paperswithcode.com/paper/attention-interpretability-across-nlp-tasks-1
Repo
Framework

Self-Adversarial Learning with Comparative Discrimination for Text Generation


Title	Self-Adversarial Learning with Comparative Discrimination for Text Generation
Authors	Anonymous
Abstract	Conventional Generative Adversarial Networks (GANs) for text generation tend to have issues of reward sparsity and mode collapse that affect the quality and diversity of generated samples. To address these issues, we propose a novel self-adversarial learning (SAL) paradigm for improving GANs’ performance in text generation. In contrast to standard GANs, which use a binary classifier as their discriminator to predict whether a sample is real or generated, SAL employs a comparative discriminator, which is a pairwise classifier for comparing the text quality between a pair of samples. During training, SAL rewards the generator when its currently generated sentence is found to be better than its previously generated samples. This self-improvement reward mechanism allows the model to receive credits more easily and avoid collapsing towards the limited number of real samples, which not only helps alleviate the reward sparsity issue but also reduces the risk of mode collapse. Experiments on text generation benchmark datasets show that our proposed approach substantially improves both the quality and the diversity, and yields more stable performance compared to the previous GANs for text generation.
Tasks	Text Generation
Published	2020-01-01
URL	https://openreview.net/forum?id=B1l8L6EtDS
PDF	https://openreview.net/pdf?id=B1l8L6EtDS
PWC	https://paperswithcode.com/paper/self-adversarial-learning-with-comparative
Repo
Framework
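
The self-improvement reward described in the abstract above can be sketched as a pairwise comparison between the current sample and earlier ones. In the sketch, `toy_compare` is a hypothetical stand-in for the learned comparative discriminator (it simply counts distinct words), and the choice to average pairwise outcomes into a reward in [-1, 1] is an assumption for illustration, not the paper's reward definition.

```python
def toy_compare(a, b):
    """Hypothetical stand-in for a learned pairwise discriminator:
    returns +1 if a looks better than b, -1 if worse, 0 if tied.
    'Quality' here is just the number of distinct words."""
    qa, qb = len(set(a.split())), len(set(b.split()))
    return (qa > qb) - (qa < qb)

def self_improvement_reward(current, previous_samples, compare=toy_compare):
    """Average pairwise outcome of the current sample against earlier ones."""
    if not previous_samples:
        return 0.0
    total = sum(compare(current, p) for p in previous_samples)
    return total / len(previous_samples)

# The generator is rewarded for beating its own earlier outputs.
improved = self_improvement_reward("the cat sat on a mat", ["the the the", "a cat"])
regressed = self_improvement_reward("the the", ["the cat sat"])
```

Because the reference point is the generator's own history rather than real data, even a weak generator can earn a positive reward by improving slightly, which is how the abstract motivates the reduction in reward sparsity.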

Quantized Reinforcement Learning (QuaRL)


Title	Quantized Reinforcement Learning (QuaRL)
Authors	Anonymous
Abstract	Recent work has shown that quantization can help reduce the memory, compute, and energy demands of deep neural networks without significantly harming their quality. However, whether these prior techniques, applied traditionally to image-based models, work with the same efficacy to the sequential decision making process in reinforcement learning remains an unanswered question. To address this void, we conduct the first comprehensive empirical study that quantifies the effects of quantization on various deep reinforcement learning policies with the intent to reduce their computational resource demands. We apply techniques such as post-training quantization and quantization aware training to a spectrum of reinforcement learning tasks (such as Pong, Breakout, BeamRider and more) and training algorithms (such as PPO, A2C, DDPG, and DQN). Across this spectrum of tasks and learning algorithms, we show that policies can be quantized to 6-8 bits of precision without loss of accuracy. Additionally, we show that certain tasks and reinforcement learning algorithms yield policies that are more difficult to quantize due to their effect of widening the models’ distribution of weights and that quantization aware training consistently improves results over post-training quantization and oftentimes even over the full precision baseline. Finally, we demonstrate the real-world applications of quantization for reinforcement learning. We use half-precision training to train a Pong model 50% faster, and we deploy a quantized reinforcement learning based navigation policy to an embedded system, achieving an 18x speedup and a 4x reduction in memory usage over an unquantized policy.
Tasks	Decision Making, Quantization
Published	2020-01-01
URL	https://openreview.net/forum?id=HJeEP04KDH
PDF	https://openreview.net/pdf?id=HJeEP04KDH
PWC	https://paperswithcode.com/paper/quantized-reinforcement-learning-quarl-1
Repo
Framework
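
As a concrete reference point for post-training quantization, the sketch below applies uniform affine quantization to a list of weights and maps the integer codes back to floats. This is one common scheme and is not necessarily the one used in the paper; the function name and the per-tensor min/max range are assumptions for illustration.

```python
def quantize_dequantize(weights, num_bits=8):
    """Uniform affine quantization of a list of floats to num_bits,
    followed by reconstruction of the approximate float values."""
    w_min, w_max = min(weights), max(weights)
    levels = 2 ** num_bits - 1
    # Step size between adjacent codes; guard against a constant tensor.
    scale = (w_max - w_min) / levels if w_max > w_min else 1.0
    codes = [round((w - w_min) / scale) for w in weights]  # integers in [0, levels]
    restored = [w_min + c * scale for c in codes]
    return codes, restored, scale

weights = [-1.0, -0.25, 0.0, 0.5, 1.0]
codes, restored, scale = quantize_dequantize(weights, num_bits=8)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

The reconstruction error is bounded by half a step (`scale / 2`), and the step grows with the range `w_max - w_min`. This matches the abstract's observation that policies with a wider weight distribution are harder to quantize at a fixed bit width.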

Unsupervised Universal Self-Attention Network for Graph Classification


Title	Unsupervised Universal Self-Attention Network for Graph Classification
Authors	Anonymous
Abstract	Existing graph embedding models often have weaknesses in exploiting graph structure similarities, potential dependencies among nodes and global network properties. To this end, we present U2GAN, a novel unsupervised model leveraging the strength of the recently introduced universal self-attention network (Dehghani et al., 2019), to learn low-dimensional embeddings of graphs which can be used for graph classification. In particular, given an input graph, U2GAN first applies a self-attention computation, which is then followed by a recurrent transition to iteratively memorize its attention on vector representations of each node and its neighbors across each iteration. Thus, U2GAN can address the weaknesses in the existing models in order to produce plausible node embeddings whose sum is the final embedding of the whole graph. Experimental results show that our unsupervised U2GAN produces new state-of-the-art performances on a range of well-known benchmark datasets for the graph classification task. It even outperforms supervised methods in most benchmark cases.
Tasks	Graph Classification, Graph Embedding
Published	2020-01-01
URL	https://openreview.net/forum?id=HJeLBpEFPB
PDF	https://openreview.net/pdf?id=HJeLBpEFPB
PWC	https://paperswithcode.com/paper/unsupervised-universal-self-attention-network-1
Repo
Framework
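
The overall data flow described in the abstract above (attention over each node and its neighbors, repeated for several iterations, then a sum over node embeddings) can be sketched without any learned parameters. The code below is a deliberately simplified illustration: it uses plain dot-product attention with no projection matrices, no feed-forward transition, and no training objective, all of which the actual model would include. The function names are hypothetical.

```python
import math

def attention_step(features, neighbors):
    """One dot-product self-attention update per node over the node and its neighbors."""
    updated = []
    for node, feat in enumerate(features):
        group = [node] + list(neighbors[node])
        scores = [sum(a * b for a, b in zip(feat, features[k])) for k in group]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]   # numerically stable softmax
        z = sum(exps)
        new = [sum(e / z * features[k][d] for e, k in zip(exps, group))
               for d in range(len(feat))]
        updated.append(new)
    return updated

def graph_embedding(features, neighbors, steps=2):
    """Apply the attention update repeatedly, then sum node vectors into one graph vector."""
    for _ in range(steps):
        features = attention_step(features, neighbors)
    return [sum(column) for column in zip(*features)]

# Toy path graph 0 - 1 - 2 with 2-dimensional node features.
node_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
adjacency = [[1], [0, 2], [1]]
embedding = graph_embedding(node_features, adjacency)
```

Summing the node vectors makes the graph embedding independent of node ordering, which is what allows it to be fed to a standard classifier.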