Paper Group NANR 74
Angular Visual Hardness. The Discriminative Jackknife: Quantifying Uncertainty in Deep Learning via Higher-Order Influence Functions. A multi-task U-net for segmentation with lazy labels. Mint: Matrix-Interleaving for Multi-Task Learning. Gradient Surgery for Multi-Task Learning. Multi-Dimensional Explanation of Reviews. Underwhelming Generalization Improvements From Controlling Feature Attribution. Is Deep Reinforcement Learning Really Superhuman on Atari? Leveling the playing field. Generalization of Two-layer Neural Networks: An Asymptotic Viewpoint. Few-shot Text Classification with Distributional Signatures. Finding Winning Tickets with Limited (or No) Supervision. Adversarial Robustness as a Prior for Learned Representations. Denoising Improves Latent Space Geometry in Text Autoencoders. CaptainGAN: Navigate Through Embedding Space For Better Text Generation. Style Example-Guided Text Generation Using Generative Adversarial Transformers.
Angular Visual Hardness
Title | Angular Visual Hardness |
Authors | Anonymous |
Abstract | The mechanisms behind human visual systems and convolutional neural networks (CNNs) are vastly different. Hence, it is expected that they have different notions of ambiguity or hardness. In this paper, we make a surprising discovery: there exists a (nearly) universal score function for CNNs whose correlation with human visual hardness is statistically significant. We term this function angular visual hardness (AVH); in a CNN, it is given by the normalized angular distance between a feature embedding and the classifier weights of the corresponding target category. We conduct an in-depth scientific study. We observe that CNN models with the highest accuracy also have the best AVH scores. This agrees with an earlier finding that state-of-the-art models tend to improve on the classification of harder training examples. We find that AVH displays interesting dynamics during training: it quickly reaches a plateau even though the training loss keeps improving. This suggests the need for designing better loss functions that can target harder examples more effectively. Finally, we empirically show significant improvement in performance by using AVH as a measure of hardness in self-training tasks. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HkxJHlrFvr |
PDF | https://openreview.net/pdf?id=HkxJHlrFvr |
PWC | https://paperswithcode.com/paper/angular-visual-hardness |
Repo | |
Framework | |
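As a concrete reading of the AVH definition above, the sketch below computes the angular distance between a feature embedding and the target-class weight vector, normalized by the sum of angular distances to all class weights. The normalization choice is an assumption on our part; consult the paper for the exact formula.

```python
import numpy as np

def angular_visual_hardness(feature, class_weights, target):
    """AVH sketch: angle between the feature embedding and the target
    class weight, normalized over the angles to all class weights.
    feature: (d,) embedding; class_weights: (C, d); target: int."""
    # Angular distance A(u, v) = arccos(cosine similarity of u and v)
    norms = np.linalg.norm(class_weights, axis=1) * np.linalg.norm(feature)
    cos_sim = class_weights @ feature / norms
    angles = np.arccos(np.clip(cos_sim, -1.0, 1.0))
    return angles[target] / angles.sum()

# Toy usage: 5 classes, 8-dimensional embeddings
rng = np.random.default_rng(0)
w = rng.normal(size=(5, 8))
x = rng.normal(size=8)
print(angular_visual_hardness(x, w, target=2))
```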
The Discriminative Jackknife: Quantifying Uncertainty in Deep Learning via Higher-Order Influence Functions
Title | The Discriminative Jackknife: Quantifying Uncertainty in Deep Learning via Higher-Order Influence Functions |
Authors | Anonymous |
Abstract | Deep learning models achieve high predictive accuracy in a broad spectrum of tasks, but rigorously quantifying their predictive uncertainty remains challenging. Usable estimates of predictive uncertainty should (1) cover the true prediction target with a high probability, and (2) discriminate between high- and low-confidence prediction instances. State-of-the-art methods for uncertainty quantification are based predominantly on Bayesian neural networks. However, Bayesian methods may fall short of (1) and (2) — i.e., Bayesian credible intervals do not guarantee frequentist coverage, and approximate posterior inference may undermine discriminative accuracy. Given these shortcomings, this paper tackles the following question: can we devise an alternative frequentist approach for uncertainty quantification that satisfies (1) and (2)? To address this question, we develop the discriminative jackknife (DJ), a formal inference procedure that constructs predictive confidence intervals for a wide range of deep learning models, is easy to implement, and provides rigorous theoretical guarantees on (1) and (2). The DJ procedure uses higher-order influence functions (HOIFs) of the trained model parameters to construct a jackknife (leave-one-out) estimator of predictive confidence intervals. DJ computes HOIFs using a recursive formula that requires only oracle access to loss gradients and Hessian-vector products, hence it can be applied in a post-hoc fashion without compromising model accuracy or interfering with model training. Experiments demonstrate that DJ performs competitively compared to existing Bayesian and non-Bayesian baselines. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=H1xauR4Kvr |
PDF | https://openreview.net/pdf?id=H1xauR4Kvr |
PWC | https://paperswithcode.com/paper/the-discriminative-jackknife-quantifying |
Repo | |
Framework | |
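To make the interval construction concrete, here is a minimal sketch in the spirit of the DJ procedure. It builds jackknife-style predictive intervals from leave-one-out residuals, but for simplicity it refits a toy ridge model exactly instead of approximating the leave-one-out models with higher-order influence functions as the paper does; the function name and regularizer are illustrative.

```python
import numpy as np

def jackknife_interval(X, y, x_test, alpha=0.1, reg=1e-3):
    """Leave-one-out predictive interval sketch for ridge regression.
    The DJ paper replaces the exact refits below with influence-function
    approximations that need only gradients and Hessian-vector products."""
    n = len(y)
    residuals, loo_preds = [], []
    for i in range(n):
        keep = np.arange(n) != i
        A = X[keep].T @ X[keep] + reg * np.eye(X.shape[1])
        w = np.linalg.solve(A, X[keep].T @ y[keep])  # model without point i
        residuals.append(abs(y[i] - X[i] @ w))
        loo_preds.append(x_test @ w)
    loo_preds, residuals = np.array(loo_preds), np.array(residuals)
    return (np.quantile(loo_preds - residuals, alpha),
            np.quantile(loo_preds + residuals, 1 - alpha))

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)
print(jackknife_interval(X, y, x_test=np.ones(3)))
```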
A multi-task U-net for segmentation with lazy labels
Title | A multi-task U-net for segmentation with lazy labels |
Authors | Anonymous |
Abstract | The need for labour-intensive pixel-wise annotation is a major limitation of many fully supervised learning methods for image segmentation. In this paper, we propose a deep convolutional neural network for multi-class segmentation that circumvents this problem by being trainable on coarse data labels combined with only a very small number of images with pixel-wise annotations. We call this new labelling strategy ‘lazy’ labels. Image segmentation is then stratified into three connected tasks: rough detection of class instances, separation of wrongly connected objects without a clear boundary, and pixel-wise segmentation to find the accurate boundaries of each object. These problems are integrated into a multi-task learning framework and the model is trained end-to-end in a semi-supervised fashion. The method is demonstrated on two segmentation datasets: food microscopy images and histology images of tissue. We show that the model gives accurate segmentation results even if exact boundary labels are missing for a majority of the annotated data. By collecting more lazy (rough) annotations than precisely segmented images, this allows more flexibility and efficiency when training data-hungry deep neural networks in practical settings where manual annotation is expensive. |
Tasks | Multi-Task Learning, Semantic Segmentation |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=r1glygHtDB |
PDF | https://openreview.net/pdf?id=r1glygHtDB |
PWC | https://paperswithcode.com/paper/a-multi-task-u-net-for-segmentation-with-lazy-1 |
Repo | |
Framework | |
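A minimal sketch of how such a loss might be wired up: the coarse detection term is applied to every image, while the pixel-wise segmentation term is evaluated only on the few images carrying exact boundary labels. The names, weights, and two-task simplification are our assumptions, not the paper's exact three-task formulation.

```python
import torch
import torch.nn.functional as F

def lazy_label_loss(logits_det, logits_seg, coarse_target, fine_target,
                    has_fine, w_det=1.0, w_seg=1.0):
    """Hypothetical lazy-label multi-task loss.
    logits_det / coarse_target: coarse detection head, available for all
    images; logits_seg (B, C, H, W) / fine_target (B, H, W): pixel-wise
    head, supervised only where has_fine (bool, (B,)) is True."""
    loss = w_det * F.cross_entropy(logits_det, coarse_target)
    if has_fine.any():  # skip the term when no fine labels are in the batch
        loss = loss + w_seg * F.cross_entropy(
            logits_seg[has_fine], fine_target[has_fine])
    return loss
```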
Mint: Matrix-Interleaving for Multi-Task Learning
Title | Mint: Matrix-Interleaving for Multi-Task Learning |
Authors | Anonymous |
Abstract | Deep learning enables training of large and flexible function approximators from scratch at the cost of large amounts of data. Applications of neural networks often consider learning in the context of a single task. However, in many scenarios what we hope to learn is not just a single task, but a model that can be used to solve multiple different tasks. Such multi-task learning settings have the potential to improve data efficiency and generalization by sharing data and representations across tasks. However, in some challenging multi-task learning settings, particularly in reinforcement learning, it is very difficult to learn a single model that can solve all the tasks while realizing data efficiency and performance benefits. Learning each of the tasks independently from scratch can actually perform better in such settings, but it does not benefit from the representation sharing that multi-task learning can potentially provide. In this work, we develop an approach that endows a single model with the ability to represent both extremes: joint training and independent training. To this end, we introduce matrix-interleaving (Mint), a modification to standard neural network models that projects the activations for each task into a different learned subspace, represented by a per-task and per-layer matrix. By learning these matrices jointly with the other model parameters, the optimizer itself can decide how much to share representations between tasks. On three challenging multi-task supervised learning and reinforcement learning problems with varying degrees of shared task structure, we find that this model consistently matches or outperforms joint training and independent training, combining the best elements of both. |
Tasks | Multi-Task Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BJxnIxSKDr |
PDF | https://openreview.net/pdf?id=BJxnIxSKDr |
PWC | https://paperswithcode.com/paper/mint-matrix-interleaving-for-multi-task |
Repo | |
Framework | |
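The core mechanism is easy to sketch: a shared layer followed by a learned per-task matrix applied to its activations. Initializing the matrices to the identity starts the model at pure joint training, and the optimizer can then move each task toward more independent representations. Layer sizes and initialization below are illustrative choices, not the paper's.

```python
import torch
import torch.nn as nn

class MintLayer(nn.Module):
    """Matrix-interleaving sketch: shared linear layer whose activations
    are projected through a learned per-task matrix."""
    def __init__(self, d_in, d_out, num_tasks):
        super().__init__()
        self.shared = nn.Linear(d_in, d_out)
        # One d_out x d_out matrix per task, initialized to the identity
        self.task_mats = nn.Parameter(
            torch.eye(d_out).repeat(num_tasks, 1, 1))

    def forward(self, x, task_id):
        h = torch.relu(self.shared(x))
        return h @ self.task_mats[task_id].T

layer = MintLayer(16, 32, num_tasks=3)
out = layer(torch.randn(8, 16), task_id=1)  # batch of task-1 inputs
```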
Gradient Surgery for Multi-Task Learning
Title | Gradient Surgery for Multi-Task Learning |
Authors | Anonymous |
Abstract | While deep learning and deep reinforcement learning systems have demonstrated impressive results in domains such as image classification, game playing, and robotic control, data efficiency remains a major challenge, particularly as these algorithms learn individual tasks from scratch. Multi-task learning has emerged as a promising approach for sharing structure across multiple tasks to enable more efficient learning. However, the multi-task setting presents a number of optimization challenges, making it difficult to realize large efficiency gains compared to learning tasks independently. The reasons why multi-task learning is so challenging compared to single task learning are not fully understood. Motivated by the insight that gradient interference causes optimization challenges, we develop a simple and general approach for avoiding interference between gradients from different tasks, by altering the gradients through a technique we refer to as “gradient surgery”. We propose a form of gradient surgery that projects the gradient of a task onto the normal plane of the gradient of any other task that has a conflicting gradient. On a series of challenging multi-task supervised and multi-task reinforcement learning problems, we find that this approach leads to substantial gains in efficiency and performance. Further, it can be effectively combined with previously-proposed multi-task architectures for enhanced performance in a model-agnostic way. |
Tasks | Image Classification, Multi-Task Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HJewiCVFPB |
PDF | https://openreview.net/pdf?id=HJewiCVFPB |
PWC | https://paperswithcode.com/paper/gradient-surgery-for-multi-task-learning |
Repo | |
Framework | |
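The projection rule stated in the abstract is straightforward to implement. A minimal sketch follows; the paper additionally randomizes the order in which conflicting tasks are processed.

```python
import numpy as np

def gradient_surgery(grads):
    """PCGrad-style sketch: for each task gradient, remove the component
    that conflicts (negative dot product) with any other task's gradient
    by projecting onto that gradient's normal plane."""
    projected = []
    for i, g in enumerate(grads):
        g = g.astype(float).copy()
        for j, g_other in enumerate(grads):
            if i != j and g @ g_other < 0:  # conflicting gradients
                g -= (g @ g_other) / (g_other @ g_other) * g_other
        projected.append(g)
    return np.sum(projected, axis=0)  # combined update direction

# Two conflicting task gradients: surgery removes the interference.
print(gradient_surgery([np.array([1.0, 0.0]), np.array([-1.0, 1.0])]))
```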
Multi-Dimensional Explanation of Reviews
Title | Multi-Dimensional Explanation of Reviews |
Authors | Anonymous |
Abstract | Neural models have achieved considerable improvements on many natural language processing tasks, but they offer little transparency, and interpretability comes at a cost. In some domains, automated predictions without justifications have limited applicability. Recently, progress has been made regarding single-aspect sentiment analysis for reviews, where the ambiguity of a justification is minimal. In this context, a justification, or mask, consists of (long) word sequences from the input text, which suffice to make the prediction. Existing models cannot handle more than one aspect in a single training run and induce binary masks that might be ambiguous. In our work, we propose a neural model that predicts multi-aspect sentiments for reviews and simultaneously generates a probabilistic multi-dimensional mask (one per aspect), in an unsupervised and multi-task learning manner. Our evaluation shows that on three datasets, in the beer and hotel domains, our model outperforms strong baselines and generates masks that are: strong feature predictors, meaningful, and interpretable. |
Tasks | Multi-Task Learning, Sentiment Analysis |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=B1lyZpEYvH |
PDF | https://openreview.net/pdf?id=B1lyZpEYvH |
PWC | https://paperswithcode.com/paper/multi-dimensional-explanation-of-reviews-1 |
Repo | |
Framework | |
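A minimal sketch of the general idea: shared token encodings produce one soft mask per aspect, and each aspect's sentiment is predicted from its mask-weighted token average. The paper's actual architecture and unsupervised training objective differ in the details.

```python
import torch
import torch.nn as nn

class MultiAspectMasker(nn.Module):
    """Illustrative multi-aspect masking sketch."""
    def __init__(self, d_model, num_aspects):
        super().__init__()
        self.mask_head = nn.Linear(d_model, num_aspects)
        self.sentiment = nn.Linear(d_model, 1)

    def forward(self, tokens):                        # tokens: (B, T, d)
        masks = torch.sigmoid(self.mask_head(tokens))  # (B, T, A)
        # Mask-weighted average of token states, one vector per aspect
        ctx = torch.einsum('bta,btd->bad', masks, tokens)
        ctx = ctx / masks.sum(dim=1).unsqueeze(-1).clamp_min(1e-6)
        return self.sentiment(ctx).squeeze(-1), masks  # scores: (B, A)

model = MultiAspectMasker(d_model=32, num_aspects=5)
scores, masks = model(torch.randn(2, 10, 32))
```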
Underwhelming Generalization Improvements From Controlling Feature Attribution
Title | Underwhelming Generalization Improvements From Controlling Feature Attribution |
Authors | Anonymous |
Abstract | Overfitting is a common issue in machine learning, which can arise when the model learns to predict class membership using convenient but spuriously-correlated image features instead of the true image features that denote a class. These are typically visualized using saliency maps. In some object classification tasks such as for medical images, one may have some images with masks indicating a region of interest, i.e., which part of the image contains the most relevant information for the classification. We describe a simple method for taking advantage of such auxiliary labels on the training images for which masks are available: networks are trained to ignore the distracting features that may be extracted outside of the region of interest. This mask information is only used during training and has an impact on generalization accuracy in a dataset-dependent way. We observe an underwhelming relationship between controlling saliency maps and improving generalization performance. |
Tasks | Object Classification |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SJx0PAEFDS |
PDF | https://openreview.net/pdf?id=SJx0PAEFDS |
PWC | https://paperswithcode.com/paper/underwhelming-generalization-improvements-1 |
Repo | |
Framework | |
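One common way to implement this idea is to penalize input-gradient saliency outside the region of interest; the sketch below takes that reading, though the paper's exact attribution method and penalty may differ. roi_mask is assumed broadcastable to the input shape, with 1 inside the region of interest.

```python
import torch
import torch.nn.functional as F

def attribution_control_loss(model, x, y, roi_mask, lam=1.0):
    """Classification loss plus a penalty on input-gradient saliency
    falling outside the region of interest (sketch)."""
    x = x.clone().requires_grad_(True)
    ce = F.cross_entropy(model(x), y)
    # Saliency = gradient of the loss w.r.t. the input pixels
    grad, = torch.autograd.grad(ce, x, create_graph=True)
    outside = grad * (1.0 - roi_mask)  # attribution outside the ROI
    return ce + lam * outside.pow(2).mean()
```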
Is Deep Reinforcement Learning Really Superhuman on Atari? Leveling the playing field
Title | Is Deep Reinforcement Learning Really Superhuman on Atari? Leveling the playing field |
Authors | Anonymous |
Abstract | Consistent and reproducible evaluation of Deep Reinforcement Learning (DRL) is not straightforward. In the Arcade Learning Environment (ALE), small changes in environment parameters such as stochasticity or the maximum allowed play time can lead to very different performance. In this work, we discuss the difficulties of comparing different agents trained on ALE. In order to take a step further towards reproducible and comparable DRL, we introduce SABER, a Standardized Atari BEnchmark for general Reinforcement learning algorithms. Our methodology extends previous recommendations and contains a complete set of environment parameters as well as train and test procedures. We then use SABER to evaluate the current state of the art, Rainbow. Furthermore, we introduce a human world records baseline, and argue that previous claims of expert or superhuman performance of DRL might not be accurate. Finally, we propose Rainbow-IQN by extending Rainbow with Implicit Quantile Networks (IQN) leading to new state-of-the-art performance. Source code is available for reproducibility. |
Tasks | Atari Games |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=r1x_DaVKwH |
PDF | https://openreview.net/pdf?id=r1x_DaVKwH |
PWC | https://paperswithcode.com/paper/is-deep-reinforcement-learning-really-1 |
Repo | |
Framework | |
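A human-world-records baseline suggests a simple normalization, sketched below under our own assumptions (0% = random play, 100% = the human world record); the paper defines its own protocol, which may differ.

```python
def record_normalized_score(agent, random_agent, human_record):
    """Score relative to the human world record rather than an
    'average human' baseline (illustrative normalization)."""
    return 100.0 * (agent - random_agent) / (human_record - random_agent)

# Toy example: an agent scoring 8000 where random play scores 200 and
# the human record is 1,000,000 is far from superhuman.
print(record_normalized_score(8000, 200, 1_000_000))  # ~0.78%
```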
Generalization of Two-layer Neural Networks: An Asymptotic Viewpoint
Title | Generalization of Two-layer Neural Networks: An Asymptotic Viewpoint |
Authors | Anonymous |
Abstract | This paper investigates the generalization properties of two-layer neural networks in high-dimensions, i.e. when the number of samples $n$, features $d$, and neurons $h$ tend to infinity at the same rate. Specifically, we derive the exact population risk of the unregularized least squares regression problem with two-layer neural networks when either the first or the second layer is trained using a gradient flow under different initialization setups. When only the second layer coefficients are optimized, we recover the double descent phenomenon: a cusp in the population risk appears at $h\approx n$ and further overparameterization decreases the risk. In contrast, when the first layer weights are optimized, we highlight how different scales of initialization lead to different inductive bias, and show that the resulting risk is \textit{independent} of overparameterization. Our theoretical and experimental results suggest that previously studied model setups that provably give rise to double descent might not translate to two-layer neural networks. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=H1gBsgBYwH |
PDF | https://openreview.net/pdf?id=H1gBsgBYwH |
PWC | https://paperswithcode.com/paper/generalization-of-two-layer-neural-networks |
Repo | |
Framework | |
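The "train only the second layer" regime is equivalent to regression on random features, where the double descent cusp at $h\approx n$ is easy to reproduce in a few lines. Below is a toy simulation under our own choices of scales and label noise, not the paper's exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 20
w_true = rng.normal(size=d)
X, Xt = rng.normal(size=(n, d)), rng.normal(size=(2000, d))
y = X @ w_true + 0.5 * rng.normal(size=n)  # noisy training labels
yt = Xt @ w_true

# Frozen random first layer = random features; min-norm least squares
# on the second layer shows a risk cusp near h = n.
for h in [20, 50, 90, 100, 110, 200, 800]:
    W1 = rng.normal(size=(d, h)) / np.sqrt(d)
    feats, feats_t = np.tanh(X @ W1), np.tanh(Xt @ W1)
    a = np.linalg.lstsq(feats, y, rcond=None)[0]  # min-norm solution
    risk = np.mean((feats_t @ a - yt) ** 2)
    print(f"h={h:4d}  test risk={risk:.3f}")
```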
Few-shot Text Classification with Distributional Signatures
Title | Few-shot Text Classification with Distributional Signatures |
Authors | Anonymous |
Abstract | In this paper, we explore meta-learning for few-shot text classification. Meta-learning has shown strong performance in computer vision, where low-level patterns are transferable across learning tasks. However, directly applying this approach to text is challenging: lexical features that are highly informative for one task may be insignificant for another. Thus, rather than learning solely from words, our model also leverages their distributional signatures, which encode pertinent word occurrence patterns. Our model is trained within a meta-learning framework to map these signatures into attention scores, which are then used to weight the lexical representations of words. We demonstrate that our model consistently outperforms prototypical networks learned on lexical knowledge (Snell et al., 2017) in both few-shot text classification and relation classification by a significant margin across six benchmark datasets (19.96% on average in 1-shot classification). |
Tasks | Meta-Learning, Relation Classification, Text Classification |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=H1emfT4twB |
PDF | https://openreview.net/pdf?id=H1emfT4twB |
PWC | https://paperswithcode.com/paper/few-shot-text-classification-with-1 |
Repo | |
Framework | |
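To make "distributional signatures" concrete, here is a hand-crafted toy version of the intuition: words frequent in general text are down-weighted, and words salient in the current episode are up-weighted. The paper learns this mapping with attention inside a meta-learning loop rather than fixing it by hand, so treat this purely as illustration.

```python
import numpy as np

def signature_attention(counts_general, counts_support, eps=1e-10):
    """Toy attention over the vocabulary from word-frequency signatures:
    high for episode-discriminative words, low for generally common ones."""
    general = counts_general / counts_general.sum()
    support = counts_support / counts_support.sum()
    scores = (eps + support) / (eps + general)  # inverse general frequency
    return scores / scores.sum()

# Counts for 'the', 'is', 'striker', 'genome' in general vs. support text
general = np.array([1000.0, 800.0, 5.0, 3.0])
support = np.array([40.0, 30.0, 12.0, 0.1])
print(signature_attention(general, support))
```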
Finding Winning Tickets with Limited (or No) Supervision
Title | Finding Winning Tickets with Limited (or No) Supervision |
Authors | Anonymous |
Abstract | The lottery ticket hypothesis argues that neural networks contain sparse subnetworks which, if appropriately initialized (the winning tickets), are capable of matching the accuracy of the full network when trained in isolation. Made empirically in several contexts, this observation opens interesting questions about the dynamics of neural network optimization and the importance of their initializations. However, the properties of winning tickets are not well understood, especially the importance of supervision in the process that generates them. In this paper, we aim to answer the following open questions: Can we find winning tickets with few data samples or few labels? Can we even obtain good tickets without supervision? Perhaps surprisingly, we provide a positive answer to both, by generating winning tickets with limited access to data, or with self-supervision—thus without using manual annotations—and then demonstrating the transferability of the tickets to challenging classification tasks such as ImageNet. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SJx_QJHYDB |
PDF | https://openreview.net/pdf?id=SJx_QJHYDB |
PWC | https://paperswithcode.com/paper/finding-winning-tickets-with-limited-or-no |
Repo | |
Framework | |
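For reference, one step of the standard iterative-magnitude-pruning recipe used to find winning tickets is sketched below; in this paper, the training that precedes pruning may use few labels or a self-supervised objective.

```python
import torch

def magnitude_prune_mask(model, sparsity=0.8):
    """One magnitude-pruning step: keep the largest-magnitude weights
    globally, zero the rest. To search for a winning ticket, apply the
    mask, rewind surviving weights to their stored initial values, and
    retrain (sketch)."""
    all_w = torch.cat([p.abs().flatten()
                       for p in model.parameters() if p.dim() > 1])
    threshold = torch.quantile(all_w, sparsity)
    return {name: (p.abs() > threshold).float()
            for name, p in model.named_parameters() if p.dim() > 1}
    # Apply as p.data *= mask[name] before and during retraining.
```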
Adversarial Robustness as a Prior for Learned Representations
Title | Adversarial Robustness as a Prior for Learned Representations |
Authors | Anonymous |
Abstract | An important goal in deep learning is to learn versatile, high-level feature representations of input data. However, standard networks’ representations seem to possess shortcomings that, as we illustrate, prevent them from fully realizing this goal. In this work, we show that robust optimization can be re-cast as a tool for enforcing priors on the features learned by deep neural networks. It turns out that representations learned by robust models address the aforementioned shortcomings and make significant progress towards learning a high-level encoding of inputs. In particular, these representations are approximately invertible, while allowing for direct visualization and manipulation of salient input features. More broadly, our results indicate adversarial robustness as a promising avenue for improving learned representations. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rygvFyrKwH |
PDF | https://openreview.net/pdf?id=rygvFyrKwH |
PWC | https://paperswithcode.com/paper/adversarial-robustness-as-a-prior-for-learned |
Repo | |
Framework | |
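The robust optimization referred to is standard adversarial training; a minimal L-infinity PGD sketch is below (clipping x + delta to the valid pixel range is omitted for brevity). The paper's point is that training against such perturbations acts as a prior that changes which features the network learns.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=7):
    """Standard L-inf PGD: iteratively ascend the loss within an
    eps-ball around the input."""
    delta = torch.zeros_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps)
        delta = delta.detach().requires_grad_(True)
    return (x + delta).detach()  # train on these adversarial inputs
```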
Denoising Improves Latent Space Geometry in Text Autoencoders
Title | Denoising Improves Latent Space Geometry in Text Autoencoders |
Authors | Anonymous |
Abstract | Neural language models have recently shown impressive gains in unconditional text generation, but controllable generation and manipulation of text remain challenging. In particular, controlling text via latent space operations in autoencoders has been difficult, in part due to chaotic latent space geometry. We propose to employ adversarial autoencoders together with denoising (referred to as DAAE) to drive the latent space to organize itself. Theoretically, we prove that input sentence perturbations in the denoising approach encourage similar sentences to map to similar latent representations. Empirically, we illustrate the trade-off between text-generation and autoencoder-reconstruction capabilities, and our model significantly improves over other autoencoder variants. Even from completely unsupervised training, DAAE can successfully alter the tense/sentiment of sentences via simple latent vector arithmetic. |
Tasks | Denoising, Text Generation |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=ryl3blSFPr |
PDF | https://openreview.net/pdf?id=ryl3blSFPr |
PWC | https://paperswithcode.com/paper/denoising-improves-latent-space-geometry-in |
Repo | |
Framework | |
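The latent vector arithmetic mentioned at the end is simple to sketch: estimate an attribute direction from two groups of encoded sentences and shift a sentence's code along it before decoding. Names are illustrative; the z vectors would come from the trained DAAE encoder.

```python
import numpy as np

def attribute_vector_edit(z, z_positive, z_negative, scale=1.0):
    """Shift a latent code along the direction separating two attribute
    groups (e.g. past vs. present tense), then decode the result."""
    direction = z_positive.mean(axis=0) - z_negative.mean(axis=0)
    return z + scale * direction
```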
CaptainGAN: Navigate Through Embedding Space For Better Text Generation
Title | CaptainGAN: Navigate Through Embedding Space For Better Text Generation |
Authors | Anonymous |
Abstract | Score-function-based text generation approaches such as REINFORCE, in general, suffer from high computational complexity and training instability. This is mainly due to the non-differentiable nature of sampling in discrete space, which forces these methods to treat the discriminator as a reward function and ignore its gradient information. In this paper, we propose a novel approach, CaptainGAN, which adopts the straight-through gradient estimator and introduces a “re-centered” gradient estimation technique to steer the generator toward better text tokens through the embedding space. Our method is stable to train and converges quickly without maximum likelihood pre-training. On multiple metrics of text quality and diversity, our method outperforms existing GAN-based methods on natural language generation. |
Tasks | Text Generation |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=H1gy1erYDH |
PDF | https://openreview.net/pdf?id=H1gy1erYDH |
PWC | https://paperswithcode.com/paper/captaingan-navigate-through-embedding-space |
Repo | |
Framework | |
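A minimal sketch of the straight-through estimator in embedding space, which is the mechanism the method builds on; CaptainGAN's "re-centering" correction to the estimator is not reproduced here.

```python
import torch
import torch.nn.functional as F

def straight_through_sample(logits, embedding):
    """Sample a hard token for the forward pass but let gradients flow
    through the soft distribution into the embedding space."""
    probs = F.softmax(logits, dim=-1)
    hard = F.one_hot(torch.multinomial(probs, 1).squeeze(-1),
                     logits.size(-1)).float()
    # Forward: hard one-hot; backward: gradient of the soft probs
    st = hard + probs - probs.detach()
    return st @ embedding  # token embedding the discriminator sees

emb = torch.randn(1000, 64)    # vocabulary embedding table
logits = torch.randn(4, 1000)  # generator outputs for 4 positions
vec = straight_through_sample(logits, emb)  # (4, 64), differentiable
```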
Style Example-Guided Text Generation Using Generative Adversarial Transformers
Title | Style Example-Guided Text Generation Using Generative Adversarial Transformers |
Authors | Anonymous |
Abstract | We introduce a language generative model framework for generating a styled paragraph based on a context sentence and a style reference example. The framework consists of a style encoder and a text decoder. The style encoder extracts a style code from the reference example, and the text decoder generates texts based on the style code and the context. We propose a novel objective function to train our framework. We also investigate different network design choices. We conduct extensive experimental validation with comparison to strong baselines to validate the effectiveness of the proposed framework, using a newly collected dataset with diverse text styles. Both code and dataset will be released upon publication. |
Tasks | Text Generation |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BylIA1HYwS |
PDF | https://openreview.net/pdf?id=BylIA1HYwS |
PWC | https://paperswithcode.com/paper/style-example-guided-text-generation-using |
Repo | |
Framework | |
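A minimal sketch of the described two-part framework, with placeholder recurrent layers standing in for the paper's generative adversarial Transformers: the style encoder produces a style code from the reference example, and the decoder conditions on that code plus the context. All layer choices and sizes are our assumptions.

```python
import torch
import torch.nn as nn

class StyleGuidedDecoder(nn.Module):
    """Style encoder + conditioned text decoder (illustrative sketch)."""
    def __init__(self, vocab, d_model=256):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)
        self.style_encoder = nn.GRU(d_model, d_model, batch_first=True)
        self.decoder = nn.GRU(2 * d_model, d_model, batch_first=True)
        self.out = nn.Linear(d_model, vocab)

    def forward(self, reference, context):
        _, style = self.style_encoder(self.embed(reference))  # (1, B, d)
        ctx = self.embed(context)                             # (B, T, d)
        style_rep = style.transpose(0, 1).expand(-1, ctx.size(1), -1)
        h, _ = self.decoder(torch.cat([ctx, style_rep], dim=-1))
        return self.out(h)  # next-token logits for the styled paragraph

model = StyleGuidedDecoder(vocab=1000)
logits = model(reference=torch.randint(0, 1000, (2, 12)),
               context=torch.randint(0, 1000, (2, 7)))  # (2, 7, 1000)
```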