Paper Group NANR 38
Peer Loss Functions: Learning from Noisy Labels without Knowing Noise Rates. $\alpha^{\alpha}$-Rank: Scalable Multi-agent Evaluation through Evolution. Understanding l4-based Dictionary Learning: Interpretation, Stability, and Robustness. Zero-shot task adaptation by homoiconic meta-mapping. Towards Effective and Efficient Zero-shot Learning by Fin …
Peer Loss Functions: Learning from Noisy Labels without Knowing Noise Rates
Title | Peer Loss Functions: Learning from Noisy Labels without Knowing Noise Rates |
Authors | Anonymous |
Abstract | Learning with noisy labels is a common problem in supervised learning. Existing approaches require practitioners to specify noise rates, i.e., a set of parameters controlling the severity of label noise in the problem. In this work, we introduce a technique to learn from noisy labels that does not require a priori specification of the noise rates. In particular, we introduce a new family of loss functions that we name peer loss functions. Our approach then uses a standard empirical risk minimization (ERM) framework with peer loss functions. Peer loss functions associate each training sample with a certain form of “peer” samples, which evaluate a classifier’s predictions jointly. We show that, under mild conditions, performing ERM with peer loss functions on the noisy dataset leads to the optimal or a near-optimal classifier, as if performing ERM over the clean training data, which we do not have access to. To the best of our knowledge, this is the first result on “learning with noisy labels without knowing noise rates” with theoretical guarantees. We pair our results with an extensive set of experiments in which we compare with state-of-the-art techniques for learning with noisy labels. Our results show that the peer-loss-based method consistently outperforms the baseline benchmarks. Peer loss provides a way to simplify model development when facing potentially noisy training labels, and can be promoted as a robust candidate loss function in such situations. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Bkgq9ANKvB |
https://openreview.net/pdf?id=Bkgq9ANKvB | |
PWC | https://paperswithcode.com/paper/peer-loss-functions-learning-from-noisy-1 |
Repo | |
Framework | |
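To make the “peer” pairing concrete, below is a minimal sketch of the idea described in the abstract: each sample's loss is offset by a loss computed on an independently drawn peer prediction and peer label. The base loss, shapes, and pairing scheme are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def base_loss(p, y):
    # cross-entropy of a predicted probability vector p against an integer label y
    return -np.log(p[y] + 1e-12)

def peer_loss(probs, labels, rng):
    """probs: (n, k) predicted class probabilities; labels: (n,) integer labels."""
    n = len(labels)
    idx_x = rng.permutation(n)   # peer samples supplying the prediction term
    idx_y = rng.permutation(n)   # independently drawn peer labels
    per_sample = [
        base_loss(probs[i], labels[i]) - base_loss(probs[idx_x[i]], labels[idx_y[i]])
        for i in range(n)
    ]
    return float(np.mean(per_sample))

rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(3), size=8)   # 8 samples, 3 classes
labels = rng.integers(0, 3, size=8)
print(peer_loss(probs, labels, rng))
```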
$\alpha^{\alpha}$-Rank: Scalable Multi-agent Evaluation through Evolution
Title | $\alpha^{\alpha}$-Rank: Scalable Multi-agent Evaluation through Evolution |
Authors | Anonymous |
Abstract | Although challenging, strategy profile evaluation in large connected learner networks is crucial for enabling the next wave of machine learning applications. Recently, $\alpha$-Rank, an evolutionary algorithm, has been proposed as a solution for ranking joint policy profiles in multi-agent systems. $\alpha$-Rank claimed scalability through a polynomial-time implementation with respect to the total number of pure strategy profiles. In this paper, we formally prove that such a claim is not grounded. In fact, we show that $\alpha$-Rank exhibits exponential complexity in the number of agents, hindering its application beyond a small finite number of joint profiles. Realizing this limitation, we contribute by proposing a scalable evaluation protocol that we title $\alpha^{\alpha}$-Rank. Our method combines evolutionary dynamics with stochastic optimization and double oracles for \emph{truly} scalable ranking with linear (in the number of agents) time and memory complexities. Our contributions allow us, for the first time, to conduct large-scale evaluation experiments of multi-agent systems, where we show successful results on large joint strategy profiles with sizes on the order of $\mathcal{O}(2^{25})$ (i.e., $\approx 33$ million strategies) – a setting not evaluable using current techniques. |
Tasks | Stochastic Optimization |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Hkg_8xBYDS |
https://openreview.net/pdf?id=Hkg_8xBYDS | |
PWC | https://paperswithcode.com/paper/alphaalpha-rank-scalable-multi-agent |
Repo | |
Framework | |
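As a back-of-the-envelope illustration of the scalability argument in the abstract, the sketch below counts joint strategy profiles as the product of per-agent strategy counts; the $\mathcal{O}(2^{25})$ setting corresponds to 25 agents with 2 strategies each. The numbers are purely illustrative.

```python
# alpha-Rank builds a Markov chain over all joint strategy profiles, whose count
# grows exponentially with the number of agents.
num_agents = 25
strategies_per_agent = 2
joint_profiles = strategies_per_agent ** num_agents   # ~33.5 million, the O(2^25) setting
print(f"joint profiles: {joint_profiles:,}")
# A dense transition matrix over these profiles would have joint_profiles**2 entries,
# which is why the paper replaces exhaustive enumeration with stochastic optimization
# and double oracles.
print(f"dense transition-matrix entries: {joint_profiles**2:.3e}")
```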
Understanding l4-based Dictionary Learning: Interpretation, Stability, and Robustness
Title | Understanding l4-based Dictionary Learning: Interpretation, Stability, and Robustness |
Authors | Anonymous |
Abstract | Recently, $\ell^4$-norm maximization has been proposed to solve the sparse dictionary learning (SDL) problem. The simple MSP (matching, stretching, and projection) algorithm proposed by \cite{zhai2019a} has been shown to be surprisingly efficient and effective. This paper aims to better understand this algorithm through its strong geometric and statistical connections with the classic PCA and ICA, as well as their associated fixed-point style algorithms. Such connections provide a unified way of viewing problems that pursue {\em principal}, {\em independent}, or {\em sparse} components of high-dimensional data. Our studies reveal additional good properties of $\ell^4$-maximization: the MSP algorithm for sparse coding is not only insensitive to small noise, but also robust to outliers and resilient to sparse corruptions. We provide preliminary statistical justification for these desirable properties. To corroborate the theoretical analysis, we also provide extensive and compelling experimental evidence with both synthetic data and real images. |
Tasks | Dictionary Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SJeY-1BKDS |
https://openreview.net/pdf?id=SJeY-1BKDS | |
PWC | https://paperswithcode.com/paper/understanding-l4-based-dictionary-learning |
Repo | |
Framework | |
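For readers unfamiliar with MSP, the following is a minimal sketch of the fixed-point iteration as described in Zhai et al. (2019): maximize $\|AY\|_4^4$ over orthogonal $A$ by entrywise cubing ("stretching"), correlating with the data ("matching"), and projecting back onto the orthogonal group via an SVD ("projection"). Shapes and initialization are illustrative; consult the cited paper for the exact algorithm.

```python
import numpy as np

def msp_step(A, Y):
    # matching & stretching: correlate the entrywise-cubed responses with the data;
    # projection: map the result back onto the orthogonal group via its SVD
    G = ((A @ Y) ** 3) @ Y.T
    U, _, Vt = np.linalg.svd(G)
    return U @ Vt

def msp(Y, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    A, _ = np.linalg.qr(rng.standard_normal((Y.shape[0], Y.shape[0])))  # random orthogonal init
    for _ in range(n_iter):
        A = msp_step(A, Y)
    return A   # approximates the transpose of an orthogonal dictionary, up to sign/permutation

Y = np.random.default_rng(1).standard_normal((10, 500))   # toy data matrix
A = msp(Y)
print(np.allclose(A @ A.T, np.eye(10), atol=1e-8))   # A stays orthogonal
```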
Zero-shot task adaptation by homoiconic meta-mapping
Title | Zero-shot task adaptation by homoiconic meta-mapping |
Authors | Anonymous |
Abstract | How can deep learning systems flexibly reuse their knowledge? Toward this goal, we propose a new class of challenges, and a class of architectures that can solve them. The challenges are meta-mappings, which involve systematically transforming task behaviors to adapt to new tasks zero-shot. We suggest that the key to meeting these challenges is representing the task being performed in such a way that this task representation is itself transformable. We therefore draw inspiration from functional programming and recent work in meta-learning to propose a class of Homoiconic Meta-Mapping (HoMM) approaches that represent data points and tasks in a shared latent space, and learn to infer transformations of that space. HoMM approaches can be applied to any type of machine learning task, including supervised learning and reinforcement learning. We demonstrate the utility of this perspective by exhibiting zero-shot remapping of behavior to adapt to new tasks. |
Tasks | Meta-Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HyeX7aVKvr |
https://openreview.net/pdf?id=HyeX7aVKvr | |
PWC | https://paperswithcode.com/paper/zero-shot-task-adaptation-by-homoiconic-meta |
Repo | |
Framework | |
Towards Effective and Efficient Zero-shot Learning by Fine-tuning with Task Descriptions
Title | Towards Effective and Efficient Zero-shot Learning by Fine-tuning with Task Descriptions |
Authors | Anonymous |
Abstract | While current machine learning models have achieved great success with labeled data, we have to deal with classes that have little or no training data in many real-world applications. This leads to the study of zero-shot learning. The typical approach in zero-shot learning is to embed seen and unseen classes into a shared space using class meta-data, and construct classifiers on top of that. Yet previous methods either still require significant manual labor in obtaining useful meta-data, or utilize automatically collected meta-data while trading off performance. To achieve satisfactory performance under practical meta-data efficiency constraints, we propose \textbf{N\textsuperscript{3}} (\textbf{N}eural \textbf{N}etworks from \textbf{N}atural Language), a meta-model that maps natural language class descriptions to corresponding neural network classifiers. N\textsuperscript{3} leverages readily available online documents combined with pretrained language representations such as BERT to obtain expressive class embeddings. In addition, N\textsuperscript{3} generates parameter adaptations for pretrained neural networks using these class embeddings, effectively ``fine-tuning'' the network to classify unseen classes. Our experiments show that N\textsuperscript{3} outperforms previous methods across 8 different benchmark evaluations, and we show through ablation studies the contribution of each model component. To offer insight into how N\textsuperscript{3} ``fine-tunes'' the pretrained network, we also perform a range of qualitative and quantitative analyses. Our code will be released after the review period. |
Tasks | Zero-Shot Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rkxNQJrFPH |
https://openreview.net/pdf?id=rkxNQJrFPH | |
PWC | https://paperswithcode.com/paper/towards-effective-and-efficient-zero-shot |
Repo | |
Framework | |
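A very rough sketch of the kind of meta-model the abstract describes: a class-description embedding (stand-in for BERT) is mapped by a learned linear "meta" layer to classifier weights for possibly unseen classes. Every name, shape, and the hash-based stand-in encoder below is a hypothetical illustration, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
embed_dim, feat_dim = 16, 32          # illustrative sizes

def describe(text):
    # Hypothetical stand-in for a BERT encoder: maps a class description to a vector.
    h = abs(hash(text)) % (2 ** 32)
    return np.random.default_rng(h).standard_normal(embed_dim)

# Linear "meta-model": maps a description embedding to one classifier weight row.
W_meta = rng.standard_normal((feat_dim, embed_dim)) * 0.1

def build_classifier(descriptions):
    # One weight row per (possibly unseen) class, generated from its description.
    return np.stack([W_meta @ describe(d) for d in descriptions])

W = build_classifier(["a small songbird with a red chest",
                      "a large gray mammal with a trunk"])
features = rng.standard_normal(feat_dim)     # pretrained-backbone features for one image
print("predicted class:", int(np.argmax(W @ features)))
```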
Characterize and Transfer Attention in Graph Neural Networks
Title | Characterize and Transfer Attention in Graph Neural Networks |
Authors | Anonymous |
Abstract | Does attention matter and, if so, when and how? Our study on both inductive and transductive learning suggests that datasets have a strong influence on the effects of attention in graph neural networks. Independent of the learning setting, task, and attention variant, attention mostly degenerates to simple averaging on all three citation networks, whereas it behaves strikingly differently on the protein-protein interaction networks and molecular graphs: nodes attend to different neighbors per head and become more focused in deeper layers. Consequently, attention distributions become telltale features of the datasets themselves. We further explore the possibility of transferring attention for graph sparsification and show that, when applicable, attention-based sparsification retains enough information to obtain good performance while reducing computational and storage costs. Finally, we point out several possible directions for further study and transfer of attention. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SkeBBJrFPH |
https://openreview.net/pdf?id=SkeBBJrFPH | |
PWC | https://paperswithcode.com/paper/characterize-and-transfer-attention-in-graph |
Repo | |
Framework | |
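The transfer-to-sparsification idea can be illustrated with a small sketch: learned per-edge attention weights are reused to keep only each node's strongest incoming edges. The top-k rule below is an assumed, illustrative criterion rather than the paper's exact procedure.

```python
import numpy as np

def sparsify(edges, attn, k=2):
    """edges: list of (src, dst); attn: matching per-edge attention weights."""
    by_dst = {}
    for (src, dst), a in zip(edges, attn):
        by_dst.setdefault(dst, []).append((a, src))
    kept = []
    for dst, lst in by_dst.items():
        for a, src in sorted(lst, reverse=True)[:k]:   # keep the k strongest incoming edges
            kept.append((src, dst))
    return kept

edges = [(0, 2), (1, 2), (3, 2), (4, 2), (0, 1)]
attn = [0.5, 0.1, 0.3, 0.05, 0.9]
print(sparsify(edges, attn, k=2))   # node 2 keeps only its edges from 0 and 3
```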
Capsules with Inverted Dot-Product Attention Routing
Title | Capsules with Inverted Dot-Product Attention Routing |
Authors | Anonymous |
Abstract | We introduce a new routing algorithm for capsule networks, in which a child capsule is routed to a parent based only on agreement between the parent’s state and the child’s vote. Unlike previously proposed routing algorithms, the parent’s ability to reconstruct the child is not explicitly taken into account when updating the routing probabilities. This simplifies the routing procedure and improves performance on benchmark datasets such as CIFAR-10 and CIFAR-100. The new mechanism 1) designs routing via inverted dot-product attention; 2) imposes Layer Normalization as normalization; and 3) replaces sequential iterative routing with concurrent iterative routing. Besides outperforming existing capsule networks, our model performs on par with a powerful CNN (ResNet-18), using less than 25% of the parameters. On the different task of recognizing digits from overlaid digit images, the proposed capsule model performs favorably against CNNs given the same number of layers and neurons per layer. We believe that our work raises the possibility of applying capsule networks to complex real-world tasks. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HJe6uANtwH |
https://openreview.net/pdf?id=HJe6uANtwH | |
PWC | https://paperswithcode.com/paper/capsules-with-inverted-dot-product-attention |
Repo | |
Framework | |
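A compact sketch of the routing rule as the abstract describes it: agreement is a dot product between each parent's current pose and a child's vote, routing weights are a softmax over parents, and the pooled votes pass through layer normalization, with all parents updated concurrently per iteration. This is a schematic reading of the abstract, not the reference implementation.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    return (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + eps)

def route(votes, n_iters=3):
    """votes: (n_children, n_parents, dim) child-to-parent vote vectors."""
    n_children, n_parents, dim = votes.shape
    poses = np.zeros((n_parents, dim))                 # parent states start at zero
    for _ in range(n_iters):                           # concurrent iterative routing
        agreement = np.einsum("ijd,jd->ij", votes, poses)
        r = np.exp(agreement)
        r = r / r.sum(axis=1, keepdims=True)           # softmax over parents, per child
        poses = layer_norm(np.einsum("ij,ijd->jd", r, votes))
    return poses

votes = np.random.default_rng(0).standard_normal((6, 4, 8))
print(route(votes).shape)    # (4, 8): one pose per parent capsule
```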
Sparse Transformer: Concentrated Attention Through Explicit Selection
Title | Sparse Transformer: Concentrated Attention Through Explicit Selection |
Authors | Anonymous |
Abstract | The self-attention-based Transformer has demonstrated state-of-the-art performance on a number of natural language processing tasks. Self-attention is able to model long-term dependencies, but it may suffer from extracting irrelevant information from the context. To tackle this problem, we propose a novel model called Sparse Transformer. Sparse Transformer improves the concentration of attention on the global context through an explicit selection of the most relevant segments. Extensive experimental results on a series of natural language processing tasks, including neural machine translation, image captioning, and language modeling, all demonstrate the advantages of Sparse Transformer in model performance. Sparse Transformer achieves state-of-the-art performance on the IWSLT 2015 English-to-Vietnamese and IWSLT 2014 German-to-English translation tasks. In addition, we conduct qualitative analysis to account for Sparse Transformer’s superior performance. |
Tasks | Image Captioning, Language Modelling, Machine Translation |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Hye87grYDH |
https://openreview.net/pdf?id=Hye87grYDH | |
PWC | https://paperswithcode.com/paper/sparse-transformer-concentrated-attention |
Repo | |
Framework | |
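One way to read "explicit selection of the most relevant segments" is a top-k mask on the attention scores before the softmax, sketched below; the value of k and the masking scheme are illustrative assumptions rather than the paper's exact formulation.

```python
import numpy as np

def sparse_attention(Q, K, V, k=2):
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    # mask everything below each query's k-th largest score
    kth = np.sort(scores, axis=-1)[:, -k][:, None]
    masked = np.where(scores >= kth, scores, -np.inf)
    weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((5, 8)) for _ in range(3))
print(sparse_attention(Q, K, V, k=2).shape)   # (5, 8)
```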
Disentangled Representation Learning with Sequential Residual Variational Autoencoder
Title | Disentangled Representation Learning with Sequential Residual Variational Autoencoder |
Authors | Nanxiang Li, Shabnam Ghaffarzadegan, Liu Ren |
Abstract | Recent advancements in unsupervised disentangled representation learning focus on extending the variational autoencoder (VAE) with an augmented objective function to balance the trade-off between disentanglement and reconstruction. We propose the Sequential Residual Variational Autoencoder (SR-VAE), which defines a “residual learning” mechanism as the training regime instead of an augmented objective function. Our proposed solution deploys two important ideas in a single framework: (1) learning from the residual between the input data and the accumulated reconstruction of sequentially added latent variables; (2) decomposing the reconstruction into the decoder output and a residual term. This formulation encourages disentanglement in the latent space by inducing an explicit dependency structure, and reduces the VAE bottleneck by adding the residual term to facilitate reconstruction. More importantly, SR-VAE eliminates hyperparameter tuning, a crucial step for the prior state-of-the-art performance of the objective-function-augmentation approach. We demonstrate both qualitatively and quantitatively that SR-VAE improves the state of the art in unsupervised disentangled representation learning on a variety of complex datasets. |
Tasks | Representation Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Sklyn6EYvH |
https://openreview.net/pdf?id=Sklyn6EYvH | |
PWC | https://paperswithcode.com/paper/disentangled-representation-learning-with-1 |
Repo | |
Framework | |
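A schematic sketch of the "residual learning" regime the abstract describes: latent variables are introduced sequentially, each encoding the residual between the input and the reconstruction accumulated so far. The linear encoder/decoder placeholders below stand in for the neural networks and reparameterized sampling of the actual model.

```python
import numpy as np

rng = np.random.default_rng(0)
x_dim, n_latents = 8, 4
enc = rng.standard_normal((n_latents, x_dim)) * 0.1    # one 1-D encoder per latent
dec = rng.standard_normal((n_latents, x_dim)) * 0.1    # one 1-D decoder per latent

def sequential_residual_encode(x):
    recon = np.zeros_like(x)
    zs = []
    for t in range(n_latents):
        residual = x - recon                  # what the newly added latent must explain
        z_t = enc[t] @ residual               # (mean of) the t-th latent variable
        recon = recon + z_t * dec[t]          # accumulate the decoder's output
        zs.append(z_t)
    return np.array(zs), recon

x = rng.standard_normal(x_dim)
zs, recon = sequential_residual_encode(x)
print(zs.shape, float(np.linalg.norm(x - recon)))
```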
Sparsity Meets Robustness: Channel Pruning for the Feynman-Kac Formalism Principled Robust Deep Neural Nets
Title | Sparsity Meets Robustness: Channel Pruning for the Feynman-Kac Formalism Principled Robust Deep Neural Nets |
Authors | Anonymous |
Abstract | Compression of deep neural nets (DNNs) is crucial for adaptation to mobile devices. Though many successful algorithms exist to compress naturally trained DNNs, developing efficient and stable compression algorithms for robustly trained DNNs remains widely open. In this paper, we focus on a co-design of efficient DNN compression algorithms and sparse neural architectures for robust and accurate deep learning. Such a co-design enables us to advance the goal of accommodating both sparsity and robustness. With this objective in mind, we leverage relaxed augmented-Lagrangian-based algorithms to prune the weights of adversarially trained DNNs at both structured and unstructured levels. Using Feynman-Kac-formalism-principled robust and sparse DNNs, we can at least double the channel sparsity of an adversarially trained ResNet20 for CIFAR10 classification while improving the natural accuracy by 8.69% and the robust accuracy under the benchmark 20-iteration IFGSM attack by 5.42%. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=S1el9TEKPB |
https://openreview.net/pdf?id=S1el9TEKPB | |
PWC | https://paperswithcode.com/paper/sparsity-meets-robustness-channel-pruning-for |
Repo | |
Framework | |
Towards Interpretable Molecular Graph Representation Learning
Title | Towards Interpretable Molecular Graph Representation Learning |
Authors | Anonymous |
Abstract | Recent work in graph neural networks (GNNs) has led to improvements in molecular activity and property prediction tasks. Unfortunately, GNNs often fail to capture the relative importance of interactions between molecular substructures, in part due to the absence of efficient intermediate pooling steps. To address these issues, we propose LaPool (Laplacian Pooling), a novel, data-driven, and interpretable hierarchical graph pooling method that takes into account both node features and graph structure to improve molecular understanding. We benchmark LaPool and show that it not only outperforms recent GNNs on molecular graph understanding and prediction tasks but also remains highly competitive on other graph types. We then demonstrate the improved interpretability achieved with LaPool using both qualitative and quantitative assessments, highlighting its potential applications in drug discovery. |
Tasks | Drug Discovery, Graph Representation Learning, Representation Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HyljY04YDB |
https://openreview.net/pdf?id=HyljY04YDB | |
PWC | https://paperswithcode.com/paper/towards-interpretable-molecular-graph |
Repo | |
Framework | |
Robust Reinforcement Learning via Adversarial Training with Langevin Dynamics
Title | Robust Reinforcement Learning via Adversarial Training with Langevin Dynamics |
Authors | Anonymous |
Abstract | We re-think two-player reinforcement learning (RL) as an instance of a distribution sampling problem in infinite dimensions. Using the powerful Stochastic Gradient Langevin Dynamics, we propose a new two-player RL algorithm, which is a sampling variant of the two-player policy gradient method. Our new algorithm consistently outperforms existing baselines, in terms of generalization across differing training and testing conditions, on several MuJoCo environments. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BJl7mxBYvB |
https://openreview.net/pdf?id=BJl7mxBYvB | |
PWC | https://paperswithcode.com/paper/robust-reinforcement-learning-via-adversarial |
Repo | |
Framework | |
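To illustrate the sampling flavor of the proposed algorithm, here is a toy sketch of two-player stochastic gradient Langevin dynamics on a simple bilinear-plus-quadratic objective: each player takes a gradient step plus Gaussian noise scaled by the step size. The objective and step size are toy assumptions, not the MuJoCo policy-gradient setup used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = rng.standard_normal(2)   # protagonist parameters (ascends the objective)
phi = rng.standard_normal(2)     # adversary parameters (descends the objective)

def grads(theta, phi):
    # toy objective f(theta, phi) = theta.phi - 0.5*||theta||^2 + 0.5*||phi||^2
    return phi - theta, theta + phi   # (df/dtheta, df/dphi)

step = 1e-2
for _ in range(2000):
    g_theta, g_phi = grads(theta, phi)
    theta = theta + step * g_theta + np.sqrt(2 * step) * rng.standard_normal(2)  # Langevin ascent
    phi = phi - step * g_phi + np.sqrt(2 * step) * rng.standard_normal(2)        # Langevin descent

print("protagonist:", theta, "adversary:", phi)   # samples concentrate near the saddle at 0
```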
Hierarchical Graph-to-Graph Translation for Molecules
Title | Hierarchical Graph-to-Graph Translation for Molecules |
Authors | Anonymous |
Abstract | Accelerating drug discovery relies heavily on automatic tools that optimize precursor molecules to endow them with better biochemical properties. Our work in this paper substantially extends prior state-of-the-art graph-to-graph translation methods for molecular optimization. In particular, we realize coherent multi-resolution representations by interweaving the encoding of substructure components with the atom-level encoding of the original molecular graph. Moreover, our graph decoder is fully autoregressive, and interleaves each step of adding a new substructure with the process of resolving its attachment to the emerging molecule. We evaluate our model on multiple molecular optimization tasks and show that our model significantly outperforms previous state-of-the-art baselines. |
Tasks | Drug Discovery, Graph-To-Graph Translation |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rJeeKTNKDB |
https://openreview.net/pdf?id=rJeeKTNKDB | |
PWC | https://paperswithcode.com/paper/hierarchical-graph-to-graph-translation-for |
Repo | |
Framework | |
Disentangling Improves VAEs’ Robustness to Adversarial Attacks
Title | Disentangling Improves VAEs’ Robustness to Adversarial Attacks |
Authors | Anonymous |
Abstract | This paper is concerned with the robustness of VAEs to adversarial attacks. We highlight that conventional VAEs are brittle under attack, but that methods recently introduced for disentanglement, such as β-TCVAE (Chen et al., 2018), improve robustness, as demonstrated through a variety of previously proposed adversarial attacks (Tabacof et al., 2016; Gondim-Ribeiro et al., 2018; Kos et al., 2018). This motivated us to develop Seatbelt-VAE, a new hierarchical disentangled VAE that is designed to be significantly more robust to adversarial attacks than existing approaches, while retaining high-quality reconstructions. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rkeZ9a4Fwr |
https://openreview.net/pdf?id=rkeZ9a4Fwr | |
PWC | https://paperswithcode.com/paper/disentangling-improves-vaes-robustness-to |
Repo | |
Framework | |
Order Learning and Its Application to Age Estimation
Title | Order Learning and Its Application to Age Estimation |
Authors | Anonymous |
Abstract | We propose order learning to determine the order graph of classes, representing ranks or priorities, and classify an object instance into one of the classes. To this end, we design a pairwise comparator to categorize the relationship between two instances into one of three cases: one instance is `greater than,' `similar to,' or `smaller than' the other. Then, by comparing an input instance with reference instances and maximizing the consistency among the comparison results, the class of the input can be estimated reliably. We apply order learning to develop a facial age estimator, which provides state-of-the-art performance. Moreover, the performance is further improved when the order graph is divided into disjoint chains using gender and ethnic group information, or even in an unsupervised manner. |
Tasks | Age Estimation |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HygsuaNFwr |
https://openreview.net/pdf?id=HygsuaNFwr | |
PWC | https://paperswithcode.com/paper/order-learning-and-its-application-to-age |
Repo | |
Framework | |
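The inference rule in the abstract can be sketched in a few lines: a ternary comparator relates the input to reference instances of known rank, and the estimate is the candidate rank most consistent with those comparisons. The oracle comparator on scalar "ages" below is purely illustrative; in the paper the comparator is a learned network on face images.

```python
def compare(a, b, tau=3):
    # returns +1 ('greater than'), 0 ('similar to'), or -1 ('smaller than')
    return 0 if abs(a - b) <= tau else (1 if a > b else -1)

def estimate(x_age, ref_ages, candidates=range(0, 100)):
    votes = [compare(x_age, r) for r in ref_ages]            # comparator outputs vs. references
    def consistency(c):
        return sum(v == compare(c, r) for v, r in zip(votes, ref_ages))
    return max(candidates, key=consistency)                  # rank most consistent with the votes

refs = [5, 15, 25, 35, 45, 55, 65]
print(estimate(31, refs))    # a value close to 31
```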