Paper Group NANR 41
Generalized Zero-shot ICD Coding. NORML: Nodal Optimization for Recurrent Meta-Learning. Noisy $\ell^{0}$-Sparse Subspace Clustering on Dimensionality Reduced Data. Towards an Adversarially Robust Normalization Approach. Evolutionary Reinforcement Learning for Sample-Efficient Multiagent Coordination. Iterative Target Augmentation for Effective Conditional Generation …
Generalized Zero-shot ICD Coding
Title | Generalized Zero-shot ICD Coding |
Authors | Anonymous |
Abstract | The International Classification of Diseases (ICD) is a list of classification codes for diagnoses. Automatic ICD coding is in high demand as manual coding can be labor-intensive and error-prone. It is a multi-label text classification task with an extremely long-tailed label distribution, making it difficult to perform fine-grained classification on both frequent and zero-shot codes at the same time. In this paper, we propose a latent feature generation framework for generalized zero-shot ICD coding, where we aim to improve prediction on codes that have no labeled data without compromising performance on seen codes. Our framework generates pseudo features conditioned on the ICD code descriptions and exploits the ICD code hierarchical structure. To guarantee semantic consistency between the generated features and real features, we reconstruct the keywords in the input documents that are related to the conditioned ICD codes. To the best of our knowledge, this work is the first to propose an adversarial generative model for generalized zero-shot learning on multi-label text classification. Extensive experiments demonstrate the effectiveness of our approach. On the public MIMIC-III dataset, our methods improve the F1 score from nearly 0 to 20.91% for the zero-shot codes, and increase the AUC score by 3% (absolute improvement) over the previous state of the art. We also show that the framework improves performance on few-shot codes. |
Tasks | Multi-Label Text Classification, Text Classification, Zero-Shot Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=S1lBTerYwH |
https://openreview.net/pdf?id=S1lBTerYwH | |
PWC | https://paperswithcode.com/paper/generalized-zero-shot-icd-coding |
Repo | |
Framework | |
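As an aside, the core idea in the abstract (pseudo document features generated adversarially, conditioned on an ICD code's text description) can be pictured with a minimal PyTorch sketch. Every choice below (layer sizes, hinge losses, random stand-in data) is an assumption for illustration, not the paper's model.

```python
# Minimal sketch of conditional pseudo-feature generation for unseen ICD codes,
# loosely following the abstract; sizes, losses, and data are assumptions.
import torch
import torch.nn as nn

class FeatureGenerator(nn.Module):
    """Maps (noise, code-description embedding) -> pseudo document feature."""
    def __init__(self, noise_dim=64, desc_dim=300, feat_dim=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(noise_dim + desc_dim, 512), nn.ReLU(),
            nn.Linear(512, feat_dim),
        )

    def forward(self, noise, desc_emb):
        return self.net(torch.cat([noise, desc_emb], dim=-1))

class FeatureDiscriminator(nn.Module):
    """Scores whether a feature is real or generated, conditioned on the code."""
    def __init__(self, desc_dim=300, feat_dim=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim + desc_dim, 512), nn.ReLU(),
            nn.Linear(512, 1),
        )

    def forward(self, feat, desc_emb):
        return self.net(torch.cat([feat, desc_emb], dim=-1))

# One adversarial step on a toy batch (hinge losses chosen for brevity).
G, D = FeatureGenerator(), FeatureDiscriminator()
real_feat = torch.randn(8, 512)      # features of documents labeled with a code
desc_emb = torch.randn(8, 300)       # embedding of that code's text description
fake_feat = G(torch.randn(8, 64), desc_emb)

d_loss = torch.relu(1 - D(real_feat, desc_emb)).mean() + \
         torch.relu(1 + D(fake_feat.detach(), desc_emb)).mean()
g_loss = -D(fake_feat, desc_emb).mean()
```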
NORML: Nodal Optimization for Recurrent Meta-Learning
Title | NORML: Nodal Optimization for Recurrent Meta-Learning |
Authors | Anonymous |
Abstract | Meta-learning is an exciting and powerful paradigm that aims to improve the effectiveness of current learning systems. By formulating the learning process as an optimization problem, a model can learn how to learn while requiring significantly less data or experience than traditional approaches. Gradient-based meta-learning methods aim to do just that; however, recent work has shown that the effectiveness of these approaches is primarily due to feature reuse, and very little has to do with priming the system for rapid learning (learning to make effective weight updates on unseen data distributions). This work introduces Nodal Optimization for Recurrent Meta-Learning (NORML), a novel meta-learning framework in which an LSTM-based meta-learner performs neuron-wise optimization on a learner for efficient task learning. Crucially, the number of meta-learner parameters needed in NORML increases linearly with the number of learner parameters, allowing NORML to potentially scale to learner networks with very large numbers of parameters. While NORML also benefits from feature reuse, it is shown experimentally that the meta-learner LSTM learns to make effective weight updates using information from previous data points and update steps. |
Tasks | Meta-Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rklj3gBYvH |
https://openreview.net/pdf?id=rklj3gBYvH | |
PWC | https://paperswithcode.com/paper/norml-nodal-optimization-for-recurrent-meta |
Repo | |
Framework | |
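A minimal sketch of the neuron-wise idea described in the abstract: an LSTM meta-learner consumes a small per-neuron input and emits a per-neuron step size, so the number of meta-parameters does not grow with the full learner weight count. The per-neuron inputs, shapes, and single-step update below are assumptions for illustration only.

```python
import torch
import torch.nn as nn

# Toy learner layer: 20 inputs -> 10 neurons.
learner = nn.Linear(20, 10)

# Meta-learner: an LSTM cell applied per neuron. Each neuron contributes a small
# input (here its bias gradient and mean pre-activation), and the LSTM emits a
# scalar update scale for that neuron. These inputs/outputs are illustrative.
meta_lstm = nn.LSTMCell(input_size=2, hidden_size=16)
to_update = nn.Linear(16, 1)

x, y = torch.randn(32, 20), torch.randn(32, 10)
loss = ((learner(x) - y) ** 2).mean()
w_grad, b_grad = torch.autograd.grad(loss, learner.parameters())

meta_in = torch.stack([b_grad, learner(x).mean(0).detach()], dim=-1)  # (10, 2)
h, c = meta_lstm(meta_in)                                             # one LSTM step per neuron
scale = to_update(h).squeeze(-1)                                      # (10,) per-neuron step sizes

# Neuron-wise update: each output neuron's weight row and bias get its own step.
with torch.no_grad():
    learner.weight -= scale.unsqueeze(1) * w_grad
    learner.bias -= scale * b_grad
```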
Noisy $\ell^{0}$-Sparse Subspace Clustering on Dimensionality Reduced Data
Title | Noisy $\ell^{0}$-Sparse Subspace Clustering on Dimensionality Reduced Data |
Authors | Anonymous |
Abstract | High-dimensional data often lie in or close to low-dimensional subspaces. Sparse subspace clustering methods with sparsity induced by the L0-norm, such as L0-Sparse Subspace Clustering (L0-SSC), have been demonstrated to be more effective than their L1 counterparts such as Sparse Subspace Clustering (SSC). However, these L0-norm based subspace clustering methods are restricted to clean data that lie exactly in subspaces. Real data often suffer from noise and may lie only close to subspaces. We propose noisy L0-SSC to handle noisy data so as to improve robustness. We show that the optimal solution to the optimization problem of noisy L0-SSC achieves the subspace detection property (SDP), a key element with which data from different subspaces are separated, under both deterministic and randomized models. Our results provide a theoretical guarantee on the correctness of noisy L0-SSC in terms of SDP on noisy data. We further propose Noisy-DR-L0-SSC, which provably recovers the subspaces on dimensionality-reduced data. Noisy-DR-L0-SSC first projects the data onto a lower-dimensional space by a linear transformation, then performs noisy L0-SSC on the dimensionality-reduced data so as to improve efficiency. The experimental results demonstrate the effectiveness of noisy L0-SSC and Noisy-DR-L0-SSC. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=H1gjM1SFDr |
https://openreview.net/pdf?id=H1gjM1SFDr | |
PWC | https://paperswithcode.com/paper/noisy-ell0-sparse-subspace-clustering-on |
Repo | |
Framework | |
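For intuition, the sketch below approximates the $\ell^{0}$-sparse self-representation with orthogonal matching pursuit (an explicit nonzero budget per point) on noisy synthetic subspace data, then clusters the resulting affinity spectrally. This is a common $\ell^{0}$-style surrogate used here only as an illustration, not the paper's noisy L0-SSC algorithm or its theory.

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(0)

# Two noisy 2-D subspaces embedded in R^10.
def sample_subspace(n, ambient=10, dim=2, noise=0.05):
    basis = np.linalg.qr(rng.standard_normal((ambient, dim)))[0]
    return (rng.standard_normal((n, dim)) @ basis.T
            + noise * rng.standard_normal((n, ambient)))

X = np.vstack([sample_subspace(50), sample_subspace(50)])
n = X.shape[0]

# Self-representation with an explicit sparsity budget (L0-style constraint):
# each point is regressed on all other points with at most k nonzero coefficients.
k = 5
C = np.zeros((n, n))
for i in range(n):
    others = np.delete(np.arange(n), i)
    omp = OrthogonalMatchingPursuit(n_nonzero_coefs=k)
    omp.fit(X[others].T, X[i])          # other points are the dictionary atoms
    C[i, others] = omp.coef_

affinity = np.abs(C) + np.abs(C).T      # symmetrized coefficient magnitudes
labels = SpectralClustering(n_clusters=2, affinity="precomputed",
                            random_state=0).fit_predict(affinity)
```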
Towards an Adversarially Robust Normalization Approach
Title | Towards an Adversarially Robust Normalization Approach |
Authors | Anonymous |
Abstract | Batch Normalization (BatchNorm) has been shown to be effective for improving and accelerating the training of deep neural networks. However, it has recently been shown that it is also vulnerable to adversarial perturbations. In this work, we investigate the cause of the adversarial vulnerability of BatchNorm. We hypothesize that the use of different normalization statistics during training and inference (mini-batch statistics for training and a moving average of these values at inference) is the main cause of this adversarial vulnerability in the BatchNorm layer. We verify this empirically through experiments on various neural network architectures and datasets. Furthermore, we introduce Robust Normalization (RobustNorm) and experimentally show that it is not only resilient to adversarial perturbation but also inherits the benefits of BatchNorm. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BJlEEaEFDS |
https://openreview.net/pdf?id=BJlEEaEFDS | |
PWC | https://paperswithcode.com/paper/towards-an-adversarially-robust-normalization |
Repo | |
Framework | |
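The hypothesized cause stated in the abstract (mini-batch statistics at training vs. moving averages at inference) is easy to see in code. The sketch below only illustrates that mismatch with a BatchNorm variant that uses the same batch statistics in both modes; it is not the paper's RobustNorm formulation.

```python
import torch
import torch.nn as nn

class ConsistentStatNorm(nn.BatchNorm2d):
    """BatchNorm2d variant that always normalizes with current-batch statistics,
    removing the train/inference statistics mismatch discussed in the abstract.
    (Illustrative only; the paper's RobustNorm is defined differently.)"""
    def forward(self, x):
        mean = x.mean(dim=(0, 2, 3), keepdim=True)
        var = x.var(dim=(0, 2, 3), unbiased=False, keepdim=True)
        x_hat = (x - mean) / torch.sqrt(var + self.eps)
        return x_hat * self.weight.view(1, -1, 1, 1) + self.bias.view(1, -1, 1, 1)

# The mismatch in standard BatchNorm: eval() switches to running averages,
# so the same input is normalized differently than during training.
bn = nn.BatchNorm2d(3)
x = torch.randn(8, 3, 16, 16)
train_out = bn(x)            # uses mini-batch statistics
bn.eval()
eval_out = bn(x)             # uses running averages accumulated so far
print((train_out - eval_out).abs().mean())                                        # nonzero gap
print((ConsistentStatNorm(3)(x) - ConsistentStatNorm(3).eval()(x)).abs().mean())  # zero gap
```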
Evolutionary Reinforcement Learning for Sample-Efficient Multiagent Coordination
Title | Evolutionary Reinforcement Learning for Sample-Efficient Multiagent Coordination |
Authors | Anonymous |
Abstract | Many cooperative multiagent reinforcement learning environments provide agents with a sparse team-based reward as well as a dense agent-specific reward that incentivizes learning basic skills. Training policies solely on the team-based reward is often difficult due to its sparsity. Also, relying solely on the agent-specific reward is sub-optimal because it usually does not capture the team coordination objective. A common approach is to use reward shaping to construct a proxy reward by combining the individual rewards. However, this requires manual tuning for each environment. We introduce Multiagent Evolutionary Reinforcement Learning (MERL), a split-level training platform that handles the two objectives separately through two optimization processes. An evolutionary algorithm maximizes the sparse team-based objective through neuroevolution on a population of teams. Concurrently, a gradient-based optimizer trains policies to only maximize the dense agent-specific rewards. The gradient-based policies are periodically added to the evolutionary population as a way of information transfer between the two optimization processes. This enables the evolutionary algorithm to use skills learned via the agent-specific rewards toward optimizing the global objective. Results demonstrate that MERL significantly outperforms state-of-the-art methods such as MADDPG on a number of difficult coordination benchmarks. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rkxtNaNKwr |
https://openreview.net/pdf?id=rkxtNaNKwr | |
PWC | https://paperswithcode.com/paper/evolutionary-reinforcement-learning-for-1 |
Repo | |
Framework | |
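A toy skeleton of the split-level scheme from the abstract: an evolutionary population is selected only on a sparse team reward, a separate learner follows the gradient of a dense reward, and the gradient learner is periodically injected into the population. The objectives and update rules below are stand-ins, not the paper's environments or MADDPG-style machinery.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 10                       # parameters of a toy "team policy"

def team_reward(theta):        # sparse team objective: nonzero only near the optimum
    return float(np.linalg.norm(theta - 1.0) < 0.5)

def dense_grad(theta):         # gradient of a dense agent-specific reward -(theta - 1)^2
    return 2.0 * (1.0 - theta)

population = [rng.standard_normal(dim) for _ in range(20)]   # evolved on the team reward only
grad_theta = rng.standard_normal(dim)                        # trained on the dense reward only

for generation in range(200):
    grad_theta += 0.05 * dense_grad(grad_theta)              # gradient-based optimizer

    # Neuroevolution on the sparse team objective: keep the fitter half, mutate copies.
    fitness = np.array([team_reward(t) for t in population])
    elites = [population[i] for i in np.argsort(fitness)[::-1][:10]]
    population = elites + [e + 0.1 * rng.standard_normal(dim) for e in elites]

    # Periodic information transfer: inject the gradient policy into the population.
    if generation % 10 == 0:
        population[-1] = grad_theta.copy()

print("best team reward in population:", max(team_reward(t) for t in population))
```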
Iterative Target Augmentation for Effective Conditional Generation
Title | Iterative Target Augmentation for Effective Conditional Generation |
Authors | Anonymous |
Abstract | Many challenging prediction problems, from molecular optimization to program synthesis, involve creating complex structured objects as outputs. However, available training data may not be sufficient for a generative model to learn all possible complex transformations. By leveraging the idea that evaluation is easier than generation, we show how a simple, broadly applicable, iterative target augmentation scheme can be surprisingly effective in guiding the training and use of such models. Our scheme views the generative model as a prior distribution, and employs a separately trained filter as the likelihood. In each augmentation step, we filter the model’s outputs to obtain additional prediction targets for the next training epoch. Our method is applicable in the supervised as well as semi-supervised settings. We demonstrate that our approach yields significant gains over strong baselines both in molecular optimization and program synthesis. In particular, our augmented model outperforms the previous state-of-the-art in molecular optimization by over 10% in absolute gain. |
Tasks | Program Synthesis |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rylztAEYvr |
https://openreview.net/pdf?id=rylztAEYvr | |
PWC | https://paperswithcode.com/paper/iterative-target-augmentation-for-effective |
Repo | |
Framework | |
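The augmentation loop itself is simple and can be sketched with toy stand-ins: the model proposes candidate outputs, a separately trained filter keeps only those satisfying the task constraint, and the accepted candidates become extra targets for the next epoch. The "model" and "filter" below are deliberately trivial assumptions, not the paper's generators or property checkers.

```python
import random

random.seed(0)

# Toy stand-ins: the "generator" proposes candidate outputs for an input, and a
# separately trained "filter" accepts a candidate only if it meets the constraint.
def sample_candidates(model_bias, x, k=20):
    return [x + model_bias + random.randint(-3, 3) for _ in range(k)]

def filter_accepts(candidate, x):
    return candidate >= x + 5            # the property a valid output must satisfy

dataset = [(x, x + 6) for x in range(50)]    # supervised (input, target) pairs
augmented = list(dataset)
model_bias = 0                               # the "model parameter" being trained

for epoch in range(5):
    # "Training": fit the toy model to the current (possibly augmented) targets.
    model_bias = round(sum(t - x for x, t in augmented) / len(augmented))

    # Iterative target augmentation: propose, filter, and keep extra targets.
    new_targets = []
    for x, _ in dataset:
        for cand in sample_candidates(model_bias, x):
            if filter_accepts(cand, x):
                new_targets.append((x, cand))
    augmented = list(dataset) + new_targets

print("learned offset:", model_bias)         # pulled toward outputs the filter accepts
```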
Benefits of Overparameterization in Single-Layer Latent Variable Generative Models
Title | Benefits of Overparameterization in Single-Layer Latent Variable Generative Models |
Authors | Anonymous |
Abstract | One of the most surprising and exciting discoveries in supervised learning was the benefit of overparameterization (i.e., training a very large model) in improving the optimization landscape of a problem, with minimal effect on statistical performance (i.e., generalization). In contrast, unsupervised settings have been under-explored, despite the fact that overparameterization has been observed to be helpful as early as Dasgupta & Schulman (2007). In this paper, we perform an exhaustive study of different aspects of overparameterization in unsupervised learning via synthetic and semi-synthetic experiments. We discuss benefits with respect to different metrics of success (recovering the parameters of the ground-truth model, held-out log-likelihood), sensitivity to variations of the training algorithm, and behavior as the amount of overparameterization increases. We find that, when learning with methods such as variational inference, larger models can significantly increase the number of ground-truth latent variables recovered. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rkg0_eHtDr |
https://openreview.net/pdf?id=rkg0_eHtDr | |
PWC | https://paperswithcode.com/paper/benefits-of-overparameterization-in-single-1 |
Repo | |
Framework | |
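A semi-synthetic flavor of the experimental question can be reproduced in a few lines: fit a single-layer latent variable model with more components than the ground truth and count how many true components are recovered. The choice of a variationally fitted Gaussian mixture here is an assumption for concreteness, not the paper's exact model family or protocol.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)

# Ground truth: 5 Gaussian components in R^2, 200 samples each.
true_means = rng.uniform(-10, 10, size=(5, 2))
X = np.vstack([m + rng.standard_normal((200, 2)) for m in true_means])

def n_recovered(fit_means, tol=1.0):
    """Count ground-truth components that some fitted mean lands within `tol` of."""
    return sum(np.min(np.linalg.norm(fit_means - m, axis=1)) < tol for m in true_means)

for k in (5, 10, 20, 40):      # increasing overparameterization
    vi = BayesianGaussianMixture(n_components=k, max_iter=500, random_state=0).fit(X)
    print(f"k={k:>2}: recovered {n_recovered(vi.means_)}/5 ground-truth components")
```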
CEB Improves Model Robustness
Title | CEB Improves Model Robustness |
Authors | Anonymous |
Abstract | We demonstrate that the Conditional Entropy Bottleneck (CEB) can improve model robustness. CEB is an easy strategy to implement and works in tandem with data augmentation procedures. We report results of a large scale adversarial robustness study on CIFAR-10, as well as the IMAGENET-C Common Corruptions Benchmark. |
Tasks | Data Augmentation |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SygEukHYvB |
https://openreview.net/pdf?id=SygEukHYvB | |
PWC | https://paperswithcode.com/paper/ceb-improves-model-robustness |
Repo | |
Framework | |
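A hedged sketch of a variational CEB-style objective: a forward encoder e(z|x), a class-conditional backward encoder b(z|y), and a classifier c(y|z), with a weighted residual-information term log e(z|x) - log b(z|y) added to the usual cross-entropy. The network shapes, Gaussian parameterization, and weight gamma are assumptions; see the CEB papers for the exact form used in this work.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.distributions import Normal

feat_dim, z_dim, n_classes, gamma = 64, 8, 10, 0.1

enc = nn.Linear(feat_dim, 2 * z_dim)                 # e(z|x): mean and log-std
backward_enc = nn.Embedding(n_classes, 2 * z_dim)    # b(z|y): one Gaussian per class
clf = nn.Linear(z_dim, n_classes)                    # c(y|z)

x = torch.randn(32, feat_dim)
y = torch.randint(0, n_classes, (32,))

mu_x, log_sig_x = enc(x).chunk(2, dim=-1)
e_zx = Normal(mu_x, log_sig_x.exp())
z = e_zx.rsample()                                   # reparameterized sample

mu_y, log_sig_y = backward_enc(y).chunk(2, dim=-1)
b_zy = Normal(mu_y, log_sig_y.exp())

# Residual-information term (a bound related to I(X;Z|Y)): log e(z|x) - log b(z|y),
# plus the usual classification term from c(y|z).
rate = (e_zx.log_prob(z) - b_zy.log_prob(z)).sum(-1).mean()
class_loss = F.cross_entropy(clf(z), y)
ceb_loss = class_loss + gamma * rate
```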
Learning with Long-term Remembering: Following the Lead of Mixed Stochastic Gradient
Title | Learning with Long-term Remembering: Following the Lead of Mixed Stochastic Gradient |
Authors | Anonymous |
Abstract | Current deep neural networks can achieve remarkable performance on a single task. However, when a deep neural network is continually trained on a sequence of tasks, it tends to gradually forget previously learned knowledge. This phenomenon is referred to as catastrophic forgetting and motivates the field of lifelong learning. The central question in lifelong learning is how to enable deep neural networks to maintain performance on old tasks while learning a new task. In this paper, we introduce a novel and effective lifelong learning algorithm, called MixEd stochastic GrAdient (MEGA), which allows deep neural networks to retain performance on old tasks while learning new tasks. MEGA modulates the balance between old tasks and the new task by integrating the current gradient with the gradient computed on a small reference episodic memory. Extensive experimental results show that the proposed MEGA algorithm significantly advances the state of the art on all four commonly used lifelong learning benchmarks, reducing the error by up to 18%. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=H1g6kaVKvH |
https://openreview.net/pdf?id=H1g6kaVKvH | |
PWC | https://paperswithcode.com/paper/learning-with-long-term-remembering-following-1 |
Repo | |
Framework | |
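The gradient-mixing mechanism described in the abstract can be sketched directly: compute one gradient on the current task batch and one on a small episodic memory of earlier tasks, then step along a combination of the two. The loss-proportional mixing weights below are an assumption for illustration, not necessarily MEGA's exact rule.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Linear(20, 5)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

def loss_and_grads(batch):
    x, y = batch
    loss = F.cross_entropy(model(x), y)
    return loss.detach(), torch.autograd.grad(loss, model.parameters())

current_batch = (torch.randn(16, 20), torch.randint(0, 5, (16,)))
memory_batch = (torch.randn(16, 20), torch.randint(0, 5, (16,)))   # episodic memory of old tasks

cur_loss, cur_grads = loss_and_grads(current_batch)
ref_loss, ref_grads = loss_and_grads(memory_batch)

# Mix the two gradients; weighting by the relative losses is one plausible choice
# (an assumption here) for balancing old-task retention against new-task learning.
a_cur = cur_loss / (cur_loss + ref_loss)
a_ref = ref_loss / (cur_loss + ref_loss)
for p, g_cur, g_ref in zip(model.parameters(), cur_grads, ref_grads):
    p.grad = a_cur * g_cur + a_ref * g_ref
opt.step()
```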
Graph inference learning for semi-supervised classification
Title | Graph inference learning for semi-supervised classification |
Authors | Anonymous |
Abstract | In this work, we address the semi-supervised classification of graph data, where the categories of unlabeled nodes are inferred from labeled nodes as well as the graph structure. Recent works often solve this problem with advanced graph convolutions in a conventional supervised manner, but performance can be heavily affected when labeled data are scarce. Here we propose a Graph Inference Learning (GIL) framework to boost the performance of node classification by learning to infer node labels over the graph topology. To bridge two nodes, we formally define a structure relation that encapsulates node attributes, between-node paths, and local topological structures, so that inference can be conveniently propagated from one node to another. To learn the inference process, we further introduce meta-optimization on structure relations from training nodes to validation nodes, such that the learnt graph inference capability can be better self-adapted to test nodes. Comprehensive evaluations on four benchmark datasets (Cora, Citeseer, Pubmed, and NELL) demonstrate the superiority of our GIL compared with other state-of-the-art methods on the semi-supervised node classification task. |
Tasks | Node Classification |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=r1evOhEKvH |
https://openreview.net/pdf?id=r1evOhEKvH | |
PWC | https://paperswithcode.com/paper/graph-inference-learning-for-semi-supervised |
Repo | |
Framework | |
Encoder-decoder Network as Loss Function for Summarization
Title | Encoder-decoder Network as Loss Function for Summarization |
Authors | Anonymous |
Abstract | We present a new approach to defining a sequence loss function to train a summarizer by using a secondary encoder-decoder as a loss function, alleviating a shortcoming of word level training for sequence outputs. The technique is based on the intuition that if a summary is a good one, it should contain the most essential information from the original article, and therefore should itself be a good input sequence, in lieu of the original, from which a summary can be generated. We present experimental results where we apply this additional loss function to a general abstractive summarizer on a news summarization dataset. The result is an improvement in the ROUGE metric and an especially large improvement in human evaluations, suggesting enhanced performance that is competitive with specialized state-of-the-art models. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SylkzaEYPS |
https://openreview.net/pdf?id=SylkzaEYPS | |
PWC | https://paperswithcode.com/paper/encoder-decoder-network-as-loss-function-for |
Repo | |
Framework | |
Gradient $\ell_1$ Regularization for Quantization Robustness
Title | Gradient $\ell_1$ Regularization for Quantization Robustness |
Authors | Anonymous |
Abstract | We analyze the effect of quantizing the weights and activations of neural networks on their loss and derive a simple regularization scheme that improves robustness against post-training quantization. By training quantization-ready networks, our approach enables storing a single set of weights that can be quantized on demand to different bit-widths as the energy and memory requirements of the application change. Unlike quantization-aware training using the straight-through estimator, which targets only a specific bit-width and requires access to the training data and pipeline, our regularization-based method paves the way for ``on the fly'' post-training quantization to various bit-widths. We show that by modeling quantization as an $\ell_\infty$-bounded perturbation, the first-order term in the loss expansion can be regularized using the $\ell_1$-norm of the gradients. We experimentally validate our method on different vision architectures on the CIFAR-10 and ImageNet datasets and show that regularizing a neural network with our method improves robustness against quantization noise. |
Tasks | Quantization |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=ryxK0JBtPr |
https://openreview.net/pdf?id=ryxK0JBtPr | |
PWC | https://paperswithcode.com/paper/gradient-ell_1-regularization-for |
Repo | |
Framework | |
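The regularizer described in the abstract is straightforward to implement with double backpropagation: penalize the $\ell_1$-norm of the gradient of the loss with respect to the weights (extending to activations is analogous). The toy model, data, and regularization strength below are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.01)
lam = 0.05                                   # regularization strength (assumption)

x = torch.randn(32, 20)
y = torch.randint(0, 10, (32,))

loss = F.cross_entropy(model(x), y)

# The first-order effect of an l_inf-bounded quantization perturbation of the
# weights is governed by the l_1 norm of the gradient w.r.t. those weights,
# so we penalize it. create_graph=True enables double backpropagation.
grads = torch.autograd.grad(loss, model.parameters(), create_graph=True)
grad_l1 = sum(g.abs().sum() for g in grads)

total = loss + lam * grad_l1
opt.zero_grad()
total.backward()
opt.step()
```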
Towards Understanding Generalization in Gradient-Based Meta-Learning
Title | Towards Understanding Generalization in Gradient-Based Meta-Learning |
Authors | Anonymous |
Abstract | In this work we study the generalization of neural networks in gradient-based meta-learning by analyzing various properties of the objective landscapes. We experimentally demonstrate that as meta-training progresses, the meta-test solutions, obtained by adapting the meta-train solution of the model to new tasks via a few steps of gradient-based fine-tuning, become flatter, lower in loss, and further away from the meta-train solution. We also show that those meta-test solutions become flatter even as generalization starts to degrade, thus providing experimental evidence against the correlation between generalization and flat minima in the paradigm of gradient-based meta-learning. Furthermore, we provide empirical evidence that generalization to new tasks is correlated with the coherence between their adaptation trajectories in parameter space, measured by the average cosine similarity between task-specific trajectory directions starting from the same meta-train solution. We also show that the coherence of meta-test gradients, measured by the average inner product between task-specific gradient vectors evaluated at the meta-train solution, is correlated with generalization. |
Tasks | Meta-Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SygT21SFvB |
https://openreview.net/pdf?id=SygT21SFvB | |
PWC | https://paperswithcode.com/paper/towards-understanding-generalization-in-1 |
Repo | |
Framework | |
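The trajectory-coherence measure from the abstract is easy to compute once per-task adaptation directions are available: the average pairwise cosine similarity of (adapted solution minus shared meta-train solution). The sketch below uses random stand-ins for the adapted solutions rather than real fine-tuning runs.

```python
import numpy as np

rng = np.random.default_rng(0)

meta_train_solution = rng.standard_normal(100)        # shared initialization
# Adapted solutions for 5 tasks (toy stand-ins for few-step fine-tuning results).
adapted = (meta_train_solution
           + 0.3 * rng.standard_normal(100)           # shared adaptation component
           + 0.1 * rng.standard_normal((5, 100)))     # task-specific components

directions = adapted - meta_train_solution            # task-specific trajectory directions
unit = directions / np.linalg.norm(directions, axis=1, keepdims=True)

cos = unit @ unit.T
coherence = cos[np.triu_indices_from(cos, k=1)].mean()  # average pairwise cosine similarity
print(f"trajectory coherence: {coherence:.3f}")
```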
Adversarial Partial Multi-label Learning
Title | Adversarial Partial Multi-label Learning |
Authors | Yan Yan, Yuhong Guo |
Abstract | Partial multi-label learning (PML), which tackles the problem of learning multi-label prediction models from instances with overcomplete noisy annotations, has recently started gaining attention from the research community. In this paper, we propose a novel adversarial learning model, PML-GAN, under a generalized encoder-decoder framework for partial multi-label learning. The PML-GAN model uses a disambiguation network to identify noisy labels and a multi-label prediction network to map the training instances to the disambiguated label vectors, while deploying a generative adversarial network as an inverse mapping from label vectors to data samples in the input feature space. The learning of the overall model corresponds to a minimax adversarial game, which enhances the correspondence of input features with the output labels. Extensive experiments are conducted on multiple datasets, and the proposed model demonstrates state-of-the-art performance for partial multi-label learning. |
Tasks | Multi-Label Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rkehoAVtvS |
https://openreview.net/pdf?id=rkehoAVtvS | |
PWC | https://paperswithcode.com/paper/adversarial-paritial-multi-label-learning |
Repo | |
Framework | |
Building Hierarchical Interpretations in Natural Language via Feature Interaction Detection
Title | Building Hierarchical Interpretations in Natural Language via Feature Interaction Detection |
Authors | Anonymous |
Abstract | The interpretability of neural networks has become crucial for their real-world applications, with respect to reliability and trustworthiness. Existing explanation generation methods usually provide important features by scoring their individual contributions to the model prediction and ignore the interactions between features, eventually providing a bag-of-words representation as the explanation. In natural language processing, this type of explanation makes it challenging for human users to understand the meaning of an explanation and to draw the connection between the explanation and the model prediction, especially for long texts. In this work, we focus on detecting the interactions between features and propose a novel approach to build a hierarchy of explanations based on feature interactions. The proposed method is evaluated with three neural classifiers, LSTM, CNN, and BERT, on two benchmark text classification datasets. The generated explanations are assessed by both automatic evaluation measurements and human evaluators. Experiments show the effectiveness of the proposed method in providing explanations that are both faithful to models and understandable to humans. |
Tasks | Text Classification |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=S1xD6xHKDr |
https://openreview.net/pdf?id=S1xD6xHKDr | |
PWC | https://paperswithcode.com/paper/building-hierarchical-interpretations-in |
Repo | |
Framework | |
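The general recipe in the abstract can be illustrated with a toy interaction score of the form f(both) - f(left) - f(right) + f(empty), followed by greedy agglomeration of the most strongly interacting adjacent spans into a hierarchy. The stand-in "classifier" and the specific score below are assumptions, not the paper's detection method.

```python
# Stand-in "classifier": returns a sentiment-like score for a set of kept words.
# The extra term makes "not" and "good" interact, so they are merged first.
def f(words):
    score = sum({"good": 1.0, "bad": -1.0}.get(w, 0.0) for w in words)
    if "not" in words and "good" in words:
        score -= 2.0
    return score

sentence = ["the", "movie", "was", "not", "good"]

def span_words(span):
    i, j = span
    return sentence[i:j]

def interaction(a, b):
    """f(a U b) - f(a) - f(b) + f(empty): how strongly the two spans interact."""
    return f(span_words(a) + span_words(b)) - f(span_words(a)) - f(span_words(b)) + f([])

# Greedy agglomeration: repeatedly merge the pair of adjacent spans with the
# strongest (absolute) interaction, recording the hierarchy bottom-up.
spans = [(i, i + 1) for i in range(len(sentence))]
hierarchy = []
while len(spans) > 1:
    k = max(range(len(spans) - 1),
            key=lambda i: abs(interaction(spans[i], spans[i + 1])))
    hierarchy.append((span_words(spans[k]), span_words(spans[k + 1])))
    spans[k:k + 2] = [(spans[k][0], spans[k + 1][1])]

for left, right in hierarchy:
    print(" ".join(left), "+", " ".join(right))
```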