Paper Group NANR 41
Generalized Zero-shot ICD Coding. NORML: Nodal Optimization for Recurrent Meta-Learning. Noisy $\ell^{0}$-Sparse Subspace Clustering on Dimensionality Reduced Data. Towards an Adversarially Robust Normalization Approach. Evolutionary Reinforcement Learning for Sample-Efficient Multiagent Coordination. Iterative Target Augmentation for Effective Conditional Generation …
Generalized Zero-shot ICD Coding
Title | Generalized Zero-shot ICD Coding |
Authors | Anonymous |
Abstract | The International Classification of Diseases (ICD) is a list of classification codes for diagnoses. Automatic ICD coding is in high demand as manual coding can be labor-intensive and error-prone. It is a multi-label text classification task with an extremely long-tailed label distribution, making it difficult to perform fine-grained classification on both frequent and zero-shot codes at the same time. In this paper, we propose a latent feature generation framework for generalized zero-shot ICD coding, where we aim to improve prediction on codes that have no labeled data without compromising performance on seen codes. Our framework generates pseudo features conditioned on the ICD code descriptions and exploits the ICD code hierarchical structure. To guarantee semantic consistency between the generated features and real features, we reconstruct the keywords in the input documents that are related to the conditioned ICD codes. To the best of our knowledge, this work is the first to propose an adversarial generative model for generalized zero-shot learning on multi-label text classification. Extensive experiments demonstrate the effectiveness of our approach. On the public MIMIC-III dataset, our methods improve the F1 score from nearly 0 to 20.91% for the zero-shot codes, and increase the AUC score by 3% (absolute improvement) over the previous state of the art. We also show that the framework improves performance on few-shot codes. |
Tasks | Multi-Label Text Classification, Text Classification, Zero-Shot Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=S1lBTerYwH |
https://openreview.net/pdf?id=S1lBTerYwH | |
PWC | https://paperswithcode.com/paper/generalized-zero-shot-icd-coding |
Repo | |
Framework | |
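As an aside, the core idea in the abstract (pseudo document features generated adversarially, conditioned on an ICD code's text description) can be pictured with a minimal PyTorch sketch. Every choice below (layer sizes, hinge losses, random stand-in data) is an assumption for illustration, not the paper's model.

```python
# Minimal sketch of conditional pseudo-feature generation for unseen ICD codes,
# loosely following the abstract; sizes, losses, and data are assumptions.
import torch
import torch.nn as nn

class FeatureGenerator(nn.Module):
    """Maps (noise, code-description embedding) -> pseudo document feature."""
    def __init__(self, noise_dim=64, desc_dim=300, feat_dim=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(noise_dim + desc_dim, 512), nn.ReLU(),
            nn.Linear(512, feat_dim),
        )

    def forward(self, noise, desc_emb):
        return self.net(torch.cat([noise, desc_emb], dim=-1))

class FeatureDiscriminator(nn.Module):
    """Scores whether a feature is real or generated, conditioned on the code."""
    def __init__(self, desc_dim=300, feat_dim=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim + desc_dim, 512), nn.ReLU(),
            nn.Linear(512, 1),
        )

    def forward(self, feat, desc_emb):
        return self.net(torch.cat([feat, desc_emb], dim=-1))

# One adversarial step on a toy batch (hinge losses chosen for brevity).
G, D = FeatureGenerator(), FeatureDiscriminator()
real_feat = torch.randn(8, 512)      # features of documents labeled with a code
desc_emb = torch.randn(8, 300)       # embedding of that code's text description
fake_feat = G(torch.randn(8, 64), desc_emb)

d_loss = torch.relu(1 - D(real_feat, desc_emb)).mean() + \
         torch.relu(1 + D(fake_feat.detach(), desc_emb)).mean()
g_loss = -D(fake_feat, desc_emb).mean()
```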
NORML: Nodal Optimization for Recurrent Meta-Learning
Title | NORML: Nodal Optimization for Recurrent Meta-Learning |
Authors | Anonymous |
Abstract | Meta-learning is an exciting and powerful paradigm that aims to improve the effectiveness of current learning systems. By formulating the learning process as an optimization problem, a model can learn how to learn while requiring significantly less data or experience than traditional approaches. Gradient-based meta-learning methods aim to do just that; however, recent work has shown that the effectiveness of these approaches is primarily due to feature reuse, and very little has to do with priming the system for rapid learning (learning to make effective weight updates on unseen data distributions). This work introduces Nodal Optimization for Recurrent Meta-Learning (NORML), a novel meta-learning framework in which an LSTM-based meta-learner performs neuron-wise optimization on a learner for efficient task learning. Crucially, the number of meta-learner parameters needed in NORML increases linearly with the number of learner parameters, allowing NORML to potentially scale to learner networks with very large numbers of parameters. While NORML also benefits from feature reuse, it is shown experimentally that the meta-learner LSTM learns to make effective weight updates using information from previous data points and update steps. |
Tasks | Meta-Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rklj3gBYvH |
https://openreview.net/pdf?id=rklj3gBYvH | |
PWC | https://paperswithcode.com/paper/norml-nodal-optimization-for-recurrent-meta |
Repo | |
Framework | |
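A minimal sketch of the neuron-wise idea described in the abstract: an LSTM meta-learner consumes a small per-neuron input and emits a per-neuron step size, so the number of meta-parameters does not grow with the full learner weight count. The per-neuron inputs, shapes, and single-step update below are assumptions for illustration only.

```python
import torch
import torch.nn as nn

# Toy learner layer: 20 inputs -> 10 neurons.
learner = nn.Linear(20, 10)

# Meta-learner: an LSTM cell applied per neuron. Each neuron contributes a small
# input (here its bias gradient and mean pre-activation), and the LSTM emits a
# scalar update scale for that neuron. These inputs/outputs are illustrative.
meta_lstm = nn.LSTMCell(input_size=2, hidden_size=16)
to_update = nn.Linear(16, 1)

x, y = torch.randn(32, 20), torch.randn(32, 10)
loss = ((learner(x) - y) ** 2).mean()
w_grad, b_grad = torch.autograd.grad(loss, learner.parameters())

meta_in = torch.stack([b_grad, learner(x).mean(0).detach()], dim=-1)  # (10, 2)
h, c = meta_lstm(meta_in)                                             # one LSTM step per neuron
scale = to_update(h).squeeze(-1)                                      # (10,) per-neuron step sizes

# Neuron-wise update: each output neuron's weight row and bias get its own step.
with torch.no_grad():
    learner.weight -= scale.unsqueeze(1) * w_grad
    learner.bias -= scale * b_grad
```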
Noisy $\ell^{0}$-Sparse Subspace Clustering on Dimensionality Reduced Data
Title | Noisy $\ell^{0}$-Sparse Subspace Clustering on Dimensionality Reduced Data |
Authors | Anonymous |
Abstract | High-dimensional data often lie in or close to low-dimensional subspaces. Sparse subspace clustering methods with sparsity induced by the L0-norm, such as L0-Sparse Subspace Clustering (L0-SSC), have been demonstrated to be more effective than their L1 counterparts such as Sparse Subspace Clustering (SSC). However, these L0-norm based subspace clustering methods are restricted to clean data that lie exactly in subspaces. Real data often suffer from noise and may lie only close to subspaces. We propose noisy L0-SSC to handle noisy data so as to improve robustness. We show that the optimal solution to the optimization problem of noisy L0-SSC achieves the subspace detection property (SDP), a key element with which data from different subspaces are separated, under both deterministic and randomized models. Our results provide a theoretical guarantee on the correctness of noisy L0-SSC in terms of SDP on noisy data. We further propose Noisy-DR-L0-SSC, which provably recovers the subspaces on dimensionality-reduced data. Noisy-DR-L0-SSC first projects the data onto a lower-dimensional space by a linear transformation, then performs noisy L0-SSC on the dimensionality-reduced data so as to improve efficiency. The experimental results demonstrate the effectiveness of noisy L0-SSC and Noisy-DR-L0-SSC. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=H1gjM1SFDr |
https://openreview.net/pdf?id=H1gjM1SFDr | |
PWC | https://paperswithcode.com/paper/noisy-ell0-sparse-subspace-clustering-on |
Repo | |
Framework | |
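For intuition, the sketch below approximates the $\ell^{0}$-sparse self-representation with orthogonal matching pursuit (an explicit nonzero budget per point) on noisy synthetic subspace data, then clusters the resulting affinity spectrally. This is a common $\ell^{0}$-style surrogate used here only as an illustration, not the paper's noisy L0-SSC algorithm or its theory.

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(0)

# Two noisy 2-D subspaces embedded in R^10.
def sample_subspace(n, ambient=10, dim=2, noise=0.05):
    basis = np.linalg.qr(rng.standard_normal((ambient, dim)))[0]
    return (rng.standard_normal((n, dim)) @ basis.T
            + noise * rng.standard_normal((n, ambient)))

X = np.vstack([sample_subspace(50), sample_subspace(50)])
n = X.shape[0]

# Self-representation with an explicit sparsity budget (L0-style constraint):
# each point is regressed on all other points with at most k nonzero coefficients.
k = 5
C = np.zeros((n, n))
for i in range(n):
    others = np.delete(np.arange(n), i)
    omp = OrthogonalMatchingPursuit(n_nonzero_coefs=k)
    omp.fit(X[others].T, X[i])          # other points are the dictionary atoms
    C[i, others] = omp.coef_

affinity = np.abs(C) + np.abs(C).T      # symmetrized coefficient magnitudes
labels = SpectralClustering(n_clusters=2, affinity="precomputed",
                            random_state=0).fit_predict(affinity)
```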
Towards an Adversarially Robust Normalization Approach
Title | Towards an Adversarially Robust Normalization Approach |
Authors | Anonymous |
Abstract | Batch Normalization (BatchNorm) has been shown to be effective for improving and accelerating the training of deep neural networks. However, it has recently been shown that it is also vulnerable to adversarial perturbations. In this work, we investigate the cause of the adversarial vulnerability of BatchNorm. We hypothesize that the use of different normalization statistics during training and inference (mini-batch statistics for training and a moving average of these values at inference) is the main cause of this adversarial vulnerability in the BatchNorm layer. We verify this empirically through experiments on various neural network architectures and datasets. Furthermore, we introduce Robust Normalization (RobustNorm) and experimentally show that it is not only resilient to adversarial perturbation but also inherits the benefits of BatchNorm. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BJlEEaEFDS |
https://openreview.net/pdf?id=BJlEEaEFDS | |
PWC | https://paperswithcode.com/paper/towards-an-adversarially-robust-normalization |
Repo | |
Framework | |
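The hypothesized cause stated in the abstract (mini-batch statistics at training vs. moving averages at inference) is easy to see in code. The sketch below only illustrates that mismatch with a BatchNorm variant that uses the same batch statistics in both modes; it is not the paper's RobustNorm formulation.

```python
import torch
import torch.nn as nn

class ConsistentStatNorm(nn.BatchNorm2d):
    """BatchNorm2d variant that always normalizes with current-batch statistics,
    removing the train/inference statistics mismatch discussed in the abstract.
    (Illustrative only; the paper's RobustNorm is defined differently.)"""
    def forward(self, x):
        mean = x.mean(dim=(0, 2, 3), keepdim=True)
        var = x.var(dim=(0, 2, 3), unbiased=False, keepdim=True)
        x_hat = (x - mean) / torch.sqrt(var + self.eps)
        return x_hat * self.weight.view(1, -1, 1, 1) + self.bias.view(1, -1, 1, 1)

# The mismatch in standard BatchNorm: eval() switches to running averages,
# so the same input is normalized differently than during training.
bn = nn.BatchNorm2d(3)
x = torch.randn(8, 3, 16, 16)
train_out = bn(x)            # uses mini-batch statistics
bn.eval()
eval_out = bn(x)             # uses running averages accumulated so far
print((train_out - eval_out).abs().mean())                                        # nonzero gap
print((ConsistentStatNorm(3)(x) - ConsistentStatNorm(3).eval()(x)).abs().mean())  # zero gap
```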
Evolutionary Reinforcement Learning for Sample-Efficient Multiagent Coordination
Title | Evolutionary Reinforcement Learning for Sample-Efficient Multiagent Coordination |
Authors | Anonymous |
Abstract | Many cooperative multiagent reinforcement learning environments provide agents with a sparse team-based reward as well as a dense agent-specific reward that incentivizes learning basic skills. Training policies solely on the team-based reward is often difficult due to its sparsity. Also, relying solely on the agent-specific reward is sub-optimal because it usually does not capture the team coordination objective. A common approach is to use reward shaping to construct a proxy reward by combining the individual rewards. However, this requires manual tuning for each environment. We introduce Multiagent Evolutionary Reinforcement Learning (MERL), a split-level training platform that handles the two objectives separately through two optimization processes. An evolutionary algorithm maximizes the sparse team-based objective through neuroevolution on a population of teams. Concurrently, a gradient-based optimizer trains policies to only maximize the dense agent-specific rewards. The gradient-based policies are periodically added to the evolutionary population as a way of information transfer between the two optimization processes. This enables the evolutionary algorithm to use skills learned via the agent-specific rewards toward optimizing the global objective. Results demonstrate that MERL significantly outperforms state-of-the-art methods such as MADDPG on a number of difficult coordination benchmarks. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rkxtNaNKwr |
https://openreview.net/pdf?id=rkxtNaNKwr | |
PWC | https://paperswithcode.com/paper/evolutionary-reinforcement-learning-for-1 |
Repo | |
Framework | |
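A toy skeleton of the split-level scheme from the abstract: an evolutionary population is selected only on a sparse team reward, a separate learner follows the gradient of a dense reward, and the gradient learner is periodically injected into the population. The objectives and update rules below are stand-ins, not the paper's environments or MADDPG-style machinery.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 10                       # parameters of a toy "team policy"

def team_reward(theta):        # sparse team objective: nonzero only near the optimum
    return float(np.linalg.norm(theta - 1.0) < 0.5)

def dense_grad(theta):         # gradient of a dense agent-specific reward -(theta - 1)^2
    return 2.0 * (1.0 - theta)

population = [rng.standard_normal(dim) for _ in range(20)]   # evolved on the team reward only
grad_theta = rng.standard_normal(dim)                        # trained on the dense reward only

for generation in range(200):
    grad_theta += 0.05 * dense_grad(grad_theta)              # gradient-based optimizer

    # Neuroevolution on the sparse team objective: keep the fitter half, mutate copies.
    fitness = np.array([team_reward(t) for t in population])
    elites = [population[i] for i in np.argsort(fitness)[::-1][:10]]
    population = elites + [e + 0.1 * rng.standard_normal(dim) for e in elites]

    # Periodic information transfer: inject the gradient policy into the population.
    if generation % 10 == 0:
        population[-1] = grad_theta.copy()

print("best team reward in population:", max(team_reward(t) for t in population))
```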
Iterative Target Augmentation for Effective Conditional Generation
Title | Iterative Target Augmentation for Effective Conditional Generation |
Authors | Anonymous |
Abstract | Many challenging prediction problems, from molecular optimization to program synthesis, involve creating complex structured objects as outputs. However, available training data may not be sufficient for a generative model to learn all possible complex transformations. By leveraging the idea that evaluation is easier than generation, we show how a simple, broadly applicable, iterative target augmentation scheme can be surprisingly effective in guiding the training and use of such models. Our scheme views the generative model as a prior distribution, and employs a separately trained filter as the likelihood. In each augmentation step, we filter the model’s outputs to obtain additional prediction targets for the next training epoch. Our method is applicable in the supervised as well as semi-supervised settings. We demonstrate that our approach yields significant gains over strong baselines both in molecular optimization and program synthesis. In particular, our augmented model outperforms the previous state-of-the-art in molecular optimization by over 10% in absolute gain. |
Tasks | Program Synthesis |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rylztAEYvr |
https://openreview.net/pdf?id=rylztAEYvr | |
PWC | https://paperswithcode.com/paper/iterative-target-augmentation-for-effective |
Repo | |
Framework | |
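The augmentation loop itself is simple and can be sketched with toy stand-ins: the model proposes candidate outputs, a separately trained filter keeps only those satisfying the task constraint, and the accepted candidates become extra targets for the next epoch. The "model" and "filter" below are deliberately trivial assumptions, not the paper's generators or property checkers.

```python
import random

random.seed(0)

# Toy stand-ins: the "generator" proposes candidate outputs for an input, and a
# separately trained "filter" accepts a candidate only if it meets the constraint.
def sample_candidates(model_bias, x, k=20):
    return [x + model_bias + random.randint(-3, 3) for _ in range(k)]

def filter_accepts(candidate, x):
    return candidate >= x + 5            # the property a valid output must satisfy

dataset = [(x, x + 6) for x in range(50)]    # supervised (input, target) pairs
augmented = list(dataset)
model_bias = 0                               # the "model parameter" being trained

for epoch in range(5):
    # "Training": fit the toy model to the current (possibly augmented) targets.
    model_bias = round(sum(t - x for x, t in augmented) / len(augmented))

    # Iterative target augmentation: propose, filter, and keep extra targets.
    new_targets = []
    for x, _ in dataset:
        for cand in sample_candidates(model_bias, x):
            if filter_accepts(cand, x):
                new_targets.append((x, cand))
    augmented = list(dataset) + new_targets

print("learned offset:", model_bias)         # pulled toward outputs the filter accepts
```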
Benefits of Overparameterization in Single-Layer Latent Variable Generative Models
Title | Benefits of Overparameterization in Single-Layer Latent Variable Generative Models |
Authors | Anonymous |
Abstract | One of the most surprising and exciting discoveries in supervised learning was the benefit of overparameterization (i.e., training a very large model) in improving the optimization landscape of a problem, with minimal effect on statistical performance (i.e., generalization). In contrast, unsupervised settings have been under-explored, despite the fact that overparameterization has been observed to be helpful as early as Dasgupta & Schulman (2007). In this paper, we perform an exhaustive study of different aspects of overparameterization in unsupervised learning via synthetic and semi-synthetic experiments. We discuss benefits with respect to different metrics of success (recovering the parameters of the ground-truth model, held-out log-likelihood), sensitivity to variations of the training algorithm, and behavior as the amount of overparameterization increases. We find that, when learning with methods such as variational inference, larger models can significantly increase the number of ground-truth latent variables recovered. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rkg0_eHtDr |
https://openreview.net/pdf?id=rkg0_eHtDr | |
PWC | https://paperswithcode.com/paper/benefits-of-overparameterization-in-single-1 |
Repo | |
Framework | |
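A semi-synthetic flavor of the experimental question can be reproduced in a few lines: fit a single-layer latent variable model with more components than the ground truth and count how many true components are recovered. The choice of a variationally fitted Gaussian mixture here is an assumption for concreteness, not the paper's exact model family or protocol.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)

# Ground truth: 5 Gaussian components in R^2, 200 samples each.
true_means = rng.uniform(-10, 10, size=(5, 2))
X = np.vstack([m + rng.standard_normal((200, 2)) for m in true_means])

def n_recovered(fit_means, tol=1.0):
    """Count ground-truth components that some fitted mean lands within `tol` of."""
    return sum(np.min(np.linalg.norm(fit_means - m, axis=1)) < tol for m in true_means)

for k in (5, 10, 20, 40):      # increasing overparameterization
    vi = BayesianGaussianMixture(n_components=k, max_iter=500, random_state=0).fit(X)
    print(f"k={k:>2}: recovered {n_recovered(vi.means_)}/5 ground-truth components")
```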
CEB Improves Model Robustness
Title | CEB Improves Model Robustness |
Authors | Anonymous |
Abstract | We demonstrate that the Conditional Entropy Bottleneck (CEB) can improve model robustness. CEB is an easy strategy to implement and works in tandem with data augmentation procedures. We report results of a large scale adversarial robustness study on CIFAR-10, as well as the IMAGENET-C Common Corruptions Benchmark. |
Tasks | Data Augmentation |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SygEukHYvB |
https://openreview.net/pdf?id=SygEukHYvB | |
PWC | https://paperswithcode.com/paper/ceb-improves-model-robustness |
Repo | |
Framework | |
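A hedged sketch of a variational CEB-style objective: a forward encoder e(z|x), a class-conditional backward encoder b(z|y), and a classifier c(y|z), with a weighted residual-information term log e(z|x) - log b(z|y) added to the usual cross-entropy. The network shapes, Gaussian parameterization, and weight gamma are assumptions; see the CEB papers for the exact form used in this work.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.distributions import Normal

feat_dim, z_dim, n_classes, gamma = 64, 8, 10, 0.1

enc = nn.Linear(feat_dim, 2 * z_dim)                 # e(z|x): mean and log-std
backward_enc = nn.Embedding(n_classes, 2 * z_dim)    # b(z|y): one Gaussian per class
clf = nn.Linear(z_dim, n_classes)                    # c(y|z)

x = torch.randn(32, feat_dim)
y = torch.randint(0, n_classes, (32,))

mu_x, log_sig_x = enc(x).chunk(2, dim=-1)
e_zx = Normal(mu_x, log_sig_x.exp())
z = e_zx.rsample()                                   # reparameterized sample

mu_y, log_sig_y = backward_enc(y).chunk(2, dim=-1)
b_zy = Normal(mu_y, log_sig_y.exp())

# Residual-information term (a bound related to I(X;Z|Y)): log e(z|x) - log b(z|y),
# plus the usual classification term from c(y|z).
rate = (e_zx.log_prob(z) - b_zy.log_prob(z)).sum(-1).mean()
class_loss = F.cross_entropy(clf(z), y)
ceb_loss = class_loss + gamma * rate
```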
Learning with Long-term Remembering: Following the Lead of Mixed Stochastic Gradient
Title | Learning with Long-term Remembering: Following the Lead of Mixed Stochastic Gradient |
Authors | Anonymous |
Abstract | Current deep neural networks can achieve remarkable performance on a single task. However, when a deep neural network is continually trained on a sequence of tasks, it tends to gradually forget previously learned knowledge. This phenomenon is referred to as catastrophic forgetting and motivates the field of lifelong learning. The central question in lifelong learning is how to enable deep neural networks to maintain performance on old tasks while learning a new task. In this paper, we introduce a novel and effective lifelong learning algorithm, called MixEd stochastic GrAdient (MEGA), which allows deep neural networks to retain performance on old tasks while learning new tasks. MEGA modulates the balance between old tasks and the new task by integrating the current gradient with the gradient computed on a small reference episodic memory. Extensive experimental results show that the proposed MEGA algorithm significantly advances the state of the art on all four commonly used lifelong learning benchmarks, reducing the error by up to 18%. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=H1g6kaVKvH |
https://openreview.net/pdf?id=H1g6kaVKvH | |
PWC | https://paperswithcode.com/paper/learning-with-long-term-remembering-following-1 |
Repo | |
Framework | |
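The gradient-mixing mechanism described in the abstract can be sketched directly: compute one gradient on the current task batch and one on a small episodic memory of earlier tasks, then step along a combination of the two. The loss-proportional mixing weights below are an assumption for illustration, not necessarily MEGA's exact rule.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Linear(20, 5)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

def loss_and_grads(batch):
    x, y = batch
    loss = F.cross_entropy(model(x), y)
    return loss.detach(), torch.autograd.grad(loss, model.parameters())

current_batch = (torch.randn(16, 20), torch.randint(0, 5, (16,)))
memory_batch = (torch.randn(16, 20), torch.randint(0, 5, (16,)))   # episodic memory of old tasks

cur_loss, cur_grads = loss_and_grads(current_batch)
ref_loss, ref_grads = loss_and_grads(memory_batch)

# Mix the two gradients; weighting by the relative losses is one plausible choice
# (an assumption here) for balancing old-task retention against new-task learning.
a_cur = cur_loss / (cur_loss + ref_loss)
a_ref = ref_loss / (cur_loss + ref_loss)
for p, g_cur, g_ref in zip(model.parameters(), cur_grads, ref_grads):
    p.grad = a_cur * g_cur + a_ref * g_ref
opt.step()
```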
Graph inference learning for semi-supervised classification
Title | Graph inference learning for semi-supervised classification |
Authors | Anonymous |
Abstract | In this work, we address the semi-supervised classification of graph data, where the categories of unlabeled nodes are inferred from labeled nodes as well as the graph structure. Recent works often solve this problem with advanced graph convolutions in a conventional supervised manner, but performance can be heavily affected when labeled data are scarce. Here we propose a Graph Inference Learning (GIL) framework to boost the performance of node classification by learning to infer node labels over the graph topology. To bridge two nodes, we formally define a structure relation that encapsulates node attributes, between-node paths, and local topological structures, so that inference can be conveniently propagated from one node to another. To learn the inference process, we further introduce meta-optimization on structure relations from training nodes to validation nodes, such that the learnt graph inference capability can be better self-adapted to test nodes. Comprehensive evaluations on four benchmark datasets (Cora, Citeseer, Pubmed, and NELL) demonstrate the superiority of our GIL compared with other state-of-the-art methods on the semi-supervised node classification task. |
Tasks | Node Classification |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=r1evOhEKvH |
https://openreview.net/pdf?id=r1evOhEKvH | |
PWC | https://paperswithcode.com/paper/graph-inference-learning-for-semi-supervised |
Repo | |
Framework | |
Encoder-decoder Network as Loss Function for Summarization
Title | Encoder-decoder Network as Loss Function for Summarization |
Authors | Anonymous |
Abstract | We present a new approach to defining a sequence loss function to train a summarizer by using a secondary encoder-decoder as a loss function, alleviating a shortcoming of word level training for sequence outputs. The technique is based on the intuition that if a summary is a good one, it should contain the most essential information from the original article, and therefore should itself be a good input sequence, in lieu of the original, from which a summary can be generated. We present experimental results where we apply this additional loss function to a general abstractive summarizer on a news summarization dataset. The result is an improvement in the ROUGE metric and an especially large improvement in human evaluations, suggesting enhanced performance that is competitive with specialized state-of-the-art models. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SylkzaEYPS |
https://openreview.net/pdf?id=SylkzaEYPS | |
PWC | https://paperswithcode.com/paper/encoder-decoder-network-as-loss-function-for |
Repo | |
Framework | |
Gradient $\ell_1$ Regularization for Quantization Robustness
Title | Gradient $\ell_1$ Regularization for Quantization Robustness |
Authors | Anonymous |
Abstract | We analyze the effect of quantizing the weights and activations of neural networks on their loss and derive a simple regularization scheme that improves robustness against post-training quantization. By training quantization-ready networks, our approach enables storing a single set of weights that can be quantized on demand to different bit-widths as the energy and memory requirements of the application change. Unlike quantization-aware training using the straight-through estimator, which targets only a specific bit-width and requires access to the training data and pipeline, our regularization-based method paves the way for ``on the fly'' post-training quantization to various bit-widths. We show that by modeling quantization as an $\ell_\infty$-bounded perturbation, the first-order term in the loss expansion can be regularized using the $\ell_1$-norm of the gradients. We experimentally validate our method on different vision architectures on the CIFAR-10 and ImageNet datasets and show that regularizing a neural network with our method improves robustness against quantization noise. |
Tasks | Quantization |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=ryxK0JBtPr |
https://openreview.net/pdf?id=ryxK0JBtPr | |
PWC | https://paperswithcode.com/paper/gradient-ell_1-regularization-for |
Repo | |
Framework | |
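The regularizer described in the abstract is straightforward to implement with double backpropagation: penalize the $\ell_1$-norm of the gradient of the loss with respect to the weights (extending to activations is analogous). The toy model, data, and regularization strength below are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.01)
lam = 0.05                                   # regularization strength (assumption)

x = torch.randn(32, 20)
y = torch.randint(0, 10, (32,))

loss = F.cross_entropy(model(x), y)

# The first-order effect of an l_inf-bounded quantization perturbation of the
# weights is governed by the l_1 norm of the gradient w.r.t. those weights,
# so we penalize it. create_graph=True enables double backpropagation.
grads = torch.autograd.grad(loss, model.parameters(), create_graph=True)
grad_l1 = sum(g.abs().sum() for g in grads)

total = loss + lam * grad_l1
opt.zero_grad()
total.backward()
opt.step()
```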
Towards Understanding Generalization in Gradient-Based Meta-Learning
Title | Towards Understanding Generalization in Gradient-Based Meta-Learning |
Authors | Anonymous |
Abstract | In this work we study the generalization of neural networks in gradient-based meta-learning by analyzing various properties of the objective landscapes. We experimentally demonstrate that as meta-training progresses, the meta-test solutions, obtained by adapting the meta-train solution of the model to new tasks via a few steps of gradient-based fine-tuning, become flatter, lower in loss, and further away from the meta-train solution. We also show that those meta-test solutions become flatter even as generalization starts to degrade, thus providing experimental evidence against the correlation between generalization and flat minima in the paradigm of gradient-based meta-learning. Furthermore, we provide empirical evidence that generalization to new tasks is correlated with the coherence between their adaptation trajectories in parameter space, measured by the average cosine similarity between task-specific trajectory directions starting from the same meta-train solution. We also show that the coherence of meta-test gradients, measured by the average inner product between task-specific gradient vectors evaluated at the meta-train solution, is correlated with generalization. |
Tasks | Meta-Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SygT21SFvB |
https://openreview.net/pdf?id=SygT21SFvB | |
PWC | https://paperswithcode.com/paper/towards-understanding-generalization-in-1 |
Repo | |
Framework | |
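The trajectory-coherence measure from the abstract is easy to compute once per-task adaptation directions are available: the average pairwise cosine similarity of (adapted solution minus shared meta-train solution). The sketch below uses random stand-ins for the adapted solutions rather than real fine-tuning runs.

```python
import numpy as np

rng = np.random.default_rng(0)

meta_train_solution = rng.standard_normal(100)        # shared initialization
# Adapted solutions for 5 tasks (toy stand-ins for few-step fine-tuning results).
adapted = (meta_train_solution
           + 0.3 * rng.standard_normal(100)           # shared adaptation component
           + 0.1 * rng.standard_normal((5, 100)))     # task-specific components

directions = adapted - meta_train_solution            # task-specific trajectory directions
unit = directions / np.linalg.norm(directions, axis=1, keepdims=True)

cos = unit @ unit.T
coherence = cos[np.triu_indices_from(cos, k=1)].mean()  # average pairwise cosine similarity
print(f"trajectory coherence: {coherence:.3f}")
```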
Adversarial Partial Multi-label Learning
Title | Adversarial Partial Multi-label Learning |
Authors | Yan Yan, Yuhong Guo |
Abstract | Partial multi-label learning (PML), which tackles the problem of learning multi-label prediction models from instances with overcomplete noisy annotations, has recently started gaining attention from the research community. In this paper, we propose a novel adversarial learning model, PML-GAN, under a generalized encoder-decoder framework for partial multi-label learning. The PML-GAN model uses a disambiguation network to identify noisy labels and a multi-label prediction network to map the training instances to the disambiguated label vectors, while deploying a generative adversarial network as an inverse mapping from label vectors to data samples in the input feature space. The learning of the overall model corresponds to a minimax adversarial game, which enhances the correspondence of input features with the output labels. Extensive experiments are conducted on multiple datasets, and the proposed model demonstrates state-of-the-art performance for partial multi-label learning. |
Tasks | Multi-Label Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rkehoAVtvS |
https://openreview.net/pdf?id=rkehoAVtvS | |
PWC | https://paperswithcode.com/paper/adversarial-paritial-multi-label-learning |
Repo | |
Framework | |
Building Hierarchical Interpretations in Natural Language via Feature Interaction Detection
Title | Building Hierarchical Interpretations in Natural Language via Feature Interaction Detection |
Authors | Anonymous |
Abstract | The interpretability of neural networks has become crucial for their real-world applications, with respect to reliability and trustworthiness. Existing explanation generation methods usually provide important features by scoring their individual contributions to the model prediction and ignore the interactions between features, eventually providing a bag-of-words representation as the explanation. In natural language processing, this type of explanation makes it challenging for human users to understand the meaning of an explanation and to draw the connection between the explanation and the model prediction, especially for long texts. In this work, we focus on detecting the interactions between features and propose a novel approach to build a hierarchy of explanations based on feature interactions. The proposed method is evaluated with three neural classifiers, LSTM, CNN, and BERT, on two benchmark text classification datasets. The generated explanations are assessed by both automatic evaluation measurements and human evaluators. Experiments show the effectiveness of the proposed method in providing explanations that are both faithful to models and understandable to humans. |
Tasks | Text Classification |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=S1xD6xHKDr |
https://openreview.net/pdf?id=S1xD6xHKDr | |
PWC | https://paperswithcode.com/paper/building-hierarchical-interpretations-in |
Repo | |
Framework | |
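The general recipe in the abstract can be illustrated with a toy interaction score of the form f(both) - f(left) - f(right) + f(empty), followed by greedy agglomeration of the most strongly interacting adjacent spans into a hierarchy. The stand-in "classifier" and the specific score below are assumptions, not the paper's detection method.

```python
# Stand-in "classifier": returns a sentiment-like score for a set of kept words.
# The extra term makes "not" and "good" interact, so they are merged first.
def f(words):
    score = sum({"good": 1.0, "bad": -1.0}.get(w, 0.0) for w in words)
    if "not" in words and "good" in words:
        score -= 2.0
    return score

sentence = ["the", "movie", "was", "not", "good"]

def span_words(span):
    i, j = span
    return sentence[i:j]

def interaction(a, b):
    """f(a U b) - f(a) - f(b) + f(empty): how strongly the two spans interact."""
    return f(span_words(a) + span_words(b)) - f(span_words(a)) - f(span_words(b)) + f([])

# Greedy agglomeration: repeatedly merge the pair of adjacent spans with the
# strongest (absolute) interaction, recording the hierarchy bottom-up.
spans = [(i, i + 1) for i in range(len(sentence))]
hierarchy = []
while len(spans) > 1:
    k = max(range(len(spans) - 1),
            key=lambda i: abs(interaction(spans[i], spans[i + 1])))
    hierarchy.append((span_words(spans[k]), span_words(spans[k + 1])))
    spans[k:k + 2] = [(spans[k][0], spans[k + 1][1])]

for left, right in hierarchy:
    print(" ".join(left), "+", " ".join(right))
```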