Paper Group NANR 38
Peer Loss Functions: Learning from Noisy Labels without Knowing Noise Rates. $\alpha^{\alpha}$-Rank: Scalable Multi-agent Evaluation through Evolution. Understanding l4-based Dictionary Learning: Interpretation, Stability, and Robustness. Zero-shot task adaptation by homoiconic meta-mapping. Towards Effective and Efficient Zero-shot Learning by Fin …
Peer Loss Functions: Learning from Noisy Labels without Knowing Noise Rates
Title | Peer Loss Functions: Learning from Noisy Labels without Knowing Noise Rates |
Authors | Anonymous |
Abstract | Learning with noisy labels is a common problem in supervised learning. Existing approaches require practitioners to specify noise rates, i.e., a set of parameters controlling the severity of label noise in the problem. In this work, we introduce a technique to learn from noisy labels that does not require a priori specification of the noise rates. In particular, we introduce a new family of loss functions that we name peer loss functions. Our approach then uses a standard empirical risk minimization (ERM) framework with peer loss functions. Peer loss functions associate each training sample with a certain form of “peer” samples, which evaluate a classifier’s predictions jointly. We show that, under mild conditions, performing ERM with peer loss functions on the noisy dataset leads to the optimal or a near-optimal classifier, as if performing ERM over the clean training data, which we do not have access to. To the best of our knowledge, this is the first result on “learning with noisy labels without knowing noise rates” with theoretical guarantees. We pair our results with an extensive set of experiments in which we compare with state-of-the-art techniques for learning with noisy labels. Our results show that the peer-loss-based method consistently outperforms the baseline benchmarks. Peer loss provides a way to simplify model development when facing potentially noisy training labels, and can be promoted as a robust candidate loss function in such situations. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Bkgq9ANKvB |
https://openreview.net/pdf?id=Bkgq9ANKvB | |
PWC | https://paperswithcode.com/paper/peer-loss-functions-learning-from-noisy-1 |
Repo | |
Framework | |
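To make the “peer” pairing concrete, below is a minimal sketch of the idea described in the abstract: each sample's loss is offset by a loss computed on an independently drawn peer prediction and peer label. The base loss, shapes, and pairing scheme are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def base_loss(p, y):
    # cross-entropy of a predicted probability vector p against an integer label y
    return -np.log(p[y] + 1e-12)

def peer_loss(probs, labels, rng):
    """probs: (n, k) predicted class probabilities; labels: (n,) integer labels."""
    n = len(labels)
    idx_x = rng.permutation(n)   # peer samples supplying the prediction term
    idx_y = rng.permutation(n)   # independently drawn peer labels
    per_sample = [
        base_loss(probs[i], labels[i]) - base_loss(probs[idx_x[i]], labels[idx_y[i]])
        for i in range(n)
    ]
    return float(np.mean(per_sample))

rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(3), size=8)   # 8 samples, 3 classes
labels = rng.integers(0, 3, size=8)
print(peer_loss(probs, labels, rng))
```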
$\alpha^{\alpha}$-Rank: Scalable Multi-agent Evaluation through Evolution
Title | $\alpha^{\alpha}$-Rank: Scalable Multi-agent Evaluation through Evolution |
Authors | Anonymous |
Abstract | Although challenging, strategy profile evaluation in large connected learner networks is crucial for enabling the next wave of machine learning applications. Recently, $\alpha$-Rank, an evolutionary algorithm, has been proposed as a solution for ranking joint policy profiles in multi-agent systems. $\alpha$-Rank claimed scalability through a polynomial-time implementation with respect to the total number of pure strategy profiles. In this paper, we formally prove that such a claim is not grounded. In fact, we show that $\alpha$-Rank exhibits exponential complexity in the number of agents, hindering its application beyond a small finite number of joint profiles. Realizing this limitation, we contribute by proposing a scalable evaluation protocol that we title $\alpha^{\alpha}$-Rank. Our method combines evolutionary dynamics with stochastic optimization and double oracles for \emph{truly} scalable ranking with linear (in the number of agents) time and memory complexities. Our contributions allow us, for the first time, to conduct large-scale evaluation experiments of multi-agent systems, where we show successful results on large joint strategy profiles with sizes on the order of $\mathcal{O}(2^{25})$ (i.e., $\approx 33$ million strategies) – a setting not evaluable using current techniques. |
Tasks | Stochastic Optimization |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Hkg_8xBYDS |
https://openreview.net/pdf?id=Hkg_8xBYDS | |
PWC | https://paperswithcode.com/paper/alphaalpha-rank-scalable-multi-agent |
Repo | |
Framework | |
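As a back-of-the-envelope illustration of the scalability argument in the abstract, the sketch below counts joint strategy profiles as the product of per-agent strategy counts; the $\mathcal{O}(2^{25})$ setting corresponds to 25 agents with 2 strategies each. The numbers are purely illustrative.

```python
# alpha-Rank builds a Markov chain over all joint strategy profiles, whose count
# grows exponentially with the number of agents.
num_agents = 25
strategies_per_agent = 2
joint_profiles = strategies_per_agent ** num_agents   # ~33.5 million, the O(2^25) setting
print(f"joint profiles: {joint_profiles:,}")
# A dense transition matrix over these profiles would have joint_profiles**2 entries,
# which is why the paper replaces exhaustive enumeration with stochastic optimization
# and double oracles.
print(f"dense transition-matrix entries: {joint_profiles**2:.3e}")
```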
Understanding l4-based Dictionary Learning: Interpretation, Stability, and Robustness
Title | Understanding l4-based Dictionary Learning: Interpretation, Stability, and Robustness |
Authors | Anonymous |
Abstract | Recently, $\ell^4$-norm maximization has been proposed to solve the sparse dictionary learning (SDL) problem. The simple MSP (matching, stretching, and projection) algorithm proposed by \cite{zhai2019a} has been shown to be surprisingly efficient and effective. This paper aims to better understand this algorithm through its strong geometric and statistical connections with the classic PCA and ICA, as well as their associated fixed-point style algorithms. Such connections provide a unified way of viewing problems that pursue {\em principal}, {\em independent}, or {\em sparse} components of high-dimensional data. Our studies reveal additional good properties of $\ell^4$-maximization: the MSP algorithm for sparse coding is not only insensitive to small noise, but also robust to outliers and resilient to sparse corruptions. We provide preliminary statistical justification for these desirable properties. To corroborate the theoretical analysis, we also provide extensive and compelling experimental evidence with both synthetic data and real images. |
Tasks | Dictionary Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SJeY-1BKDS |
https://openreview.net/pdf?id=SJeY-1BKDS | |
PWC | https://paperswithcode.com/paper/understanding-l4-based-dictionary-learning |
Repo | |
Framework | |
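For readers unfamiliar with MSP, the following is a minimal sketch of the fixed-point iteration as described in Zhai et al. (2019): maximize $\|AY\|_4^4$ over orthogonal $A$ by entrywise cubing ("stretching"), correlating with the data ("matching"), and projecting back onto the orthogonal group via an SVD ("projection"). Shapes and initialization are illustrative; consult the cited paper for the exact algorithm.

```python
import numpy as np

def msp_step(A, Y):
    # matching & stretching: correlate the entrywise-cubed responses with the data;
    # projection: map the result back onto the orthogonal group via its SVD
    G = ((A @ Y) ** 3) @ Y.T
    U, _, Vt = np.linalg.svd(G)
    return U @ Vt

def msp(Y, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    A, _ = np.linalg.qr(rng.standard_normal((Y.shape[0], Y.shape[0])))  # random orthogonal init
    for _ in range(n_iter):
        A = msp_step(A, Y)
    return A   # approximates the transpose of an orthogonal dictionary, up to sign/permutation

Y = np.random.default_rng(1).standard_normal((10, 500))   # toy data matrix
A = msp(Y)
print(np.allclose(A @ A.T, np.eye(10), atol=1e-8))   # A stays orthogonal
```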
Zero-shot task adaptation by homoiconic meta-mapping
Title | Zero-shot task adaptation by homoiconic meta-mapping |
Authors | Anonymous |
Abstract | How can deep learning systems flexibly reuse their knowledge? Toward this goal, we propose a new class of challenges, and a class of architectures that can solve them. The challenges are meta-mappings, which involve systematically transforming task behaviors to adapt to new tasks zero-shot. We suggest that the key to meeting these challenges is representing the task being performed in such a way that this task representation is itself transformable. We therefore draw inspiration from functional programming and recent work in meta-learning to propose a class of Homoiconic Meta-Mapping (HoMM) approaches that represent data points and tasks in a shared latent space, and learn to infer transformations of that space. HoMM approaches can be applied to any type of machine learning task, including supervised learning and reinforcement learning. We demonstrate the utility of this perspective by exhibiting zero-shot remapping of behavior to adapt to new tasks. |
Tasks | Meta-Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HyeX7aVKvr |
https://openreview.net/pdf?id=HyeX7aVKvr | |
PWC | https://paperswithcode.com/paper/zero-shot-task-adaptation-by-homoiconic-meta |
Repo | |
Framework | |
Towards Effective and Efficient Zero-shot Learning by Fine-tuning with Task Descriptions
Title | Towards Effective and Efficient Zero-shot Learning by Fine-tuning with Task Descriptions |
Authors | Anonymous |
Abstract | While current machine learning models have achieved great success with labeled data, we have to deal with classes that have little or no training data in many real-world applications. This leads to the study of zero-shot learning. The typical approach in zero-shot learning is to embed seen and unseen classes into a shared space using class meta-data, and construct classifiers on top of that. Yet previous methods either still require significant manual labor in obtaining useful meta-data, or utilize automatically collected meta-data while trading off performance. To achieve satisfactory performance under practical meta-data efficiency constraints, we propose \textbf{N\textsuperscript{3}} (\textbf{N}eural \textbf{N}etworks from \textbf{N}atural Language), a meta-model that maps natural language class descriptions to corresponding neural network classifiers. N\textsuperscript{3} leverages readily available online documents combined with pretrained language representations such as BERT to obtain expressive class embeddings. In addition, N\textsuperscript{3} generates parameter adaptations for pretrained neural networks using these class embeddings, effectively ``fine-tuning'' the network to classify unseen classes. Our experiments show that N\textsuperscript{3} outperforms previous methods across 8 different benchmark evaluations, and we show through ablation studies the contribution of each model component. To offer insight into how N\textsuperscript{3} ``fine-tunes'' the pretrained network, we also perform a range of qualitative and quantitative analyses. Our code will be released after the review period. |
Tasks | Zero-Shot Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rkxNQJrFPH |
https://openreview.net/pdf?id=rkxNQJrFPH | |
PWC | https://paperswithcode.com/paper/towards-effective-and-efficient-zero-shot |
Repo | |
Framework | |
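A very rough sketch of the kind of meta-model the abstract describes: a class-description embedding (stand-in for BERT) is mapped by a learned linear "meta" layer to classifier weights for possibly unseen classes. Every name, shape, and the hash-based stand-in encoder below is a hypothetical illustration, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
embed_dim, feat_dim = 16, 32          # illustrative sizes

def describe(text):
    # Hypothetical stand-in for a BERT encoder: maps a class description to a vector.
    h = abs(hash(text)) % (2 ** 32)
    return np.random.default_rng(h).standard_normal(embed_dim)

# Linear "meta-model": maps a description embedding to one classifier weight row.
W_meta = rng.standard_normal((feat_dim, embed_dim)) * 0.1

def build_classifier(descriptions):
    # One weight row per (possibly unseen) class, generated from its description.
    return np.stack([W_meta @ describe(d) for d in descriptions])

W = build_classifier(["a small songbird with a red chest",
                      "a large gray mammal with a trunk"])
features = rng.standard_normal(feat_dim)     # pretrained-backbone features for one image
print("predicted class:", int(np.argmax(W @ features)))
```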
Characterize and Transfer Attention in Graph Neural Networks
Title | Characterize and Transfer Attention in Graph Neural Networks |
Authors | Anonymous |
Abstract | Does attention matter and, if so, when and how? Our study on both inductive and transductive learning suggests that datasets have a strong influence on the effects of attention in graph neural networks. Independent of the learning setting, task, and attention variant, attention mostly degenerates to simple averaging on all three citation networks, whereas it behaves strikingly differently on the protein-protein interaction networks and molecular graphs: nodes attend to different neighbors per head and become more focused in deeper layers. Consequently, attention distributions become telltale features of the datasets themselves. We further explore the possibility of transferring attention for graph sparsification and show that, when applicable, attention-based sparsification retains enough information to obtain good performance while reducing computational and storage costs. Finally, we point out several possible directions for further study and transfer of attention. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SkeBBJrFPH |
https://openreview.net/pdf?id=SkeBBJrFPH | |
PWC | https://paperswithcode.com/paper/characterize-and-transfer-attention-in-graph |
Repo | |
Framework | |
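The transfer-to-sparsification idea can be illustrated with a small sketch: learned per-edge attention weights are reused to keep only each node's strongest incoming edges. The top-k rule below is an assumed, illustrative criterion rather than the paper's exact procedure.

```python
import numpy as np

def sparsify(edges, attn, k=2):
    """edges: list of (src, dst); attn: matching per-edge attention weights."""
    by_dst = {}
    for (src, dst), a in zip(edges, attn):
        by_dst.setdefault(dst, []).append((a, src))
    kept = []
    for dst, lst in by_dst.items():
        for a, src in sorted(lst, reverse=True)[:k]:   # keep the k strongest incoming edges
            kept.append((src, dst))
    return kept

edges = [(0, 2), (1, 2), (3, 2), (4, 2), (0, 1)]
attn = [0.5, 0.1, 0.3, 0.05, 0.9]
print(sparsify(edges, attn, k=2))   # node 2 keeps only its edges from 0 and 3
```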
Capsules with Inverted Dot-Product Attention Routing
Title | Capsules with Inverted Dot-Product Attention Routing |
Authors | Anonymous |
Abstract | We introduce a new routing algorithm for capsule networks, in which a child capsule is routed to a parent based only on agreement between the parent’s state and the child’s vote. Unlike previously proposed routing algorithms, the parent’s ability to reconstruct the child is not explicitly taken into account when updating the routing probabilities. This simplifies the routing procedure and improves performance on benchmark datasets such as CIFAR-10 and CIFAR-100. The new mechanism 1) designs routing via inverted dot-product attention; 2) imposes Layer Normalization as normalization; and 3) replaces sequential iterative routing with concurrent iterative routing. Besides outperforming existing capsule networks, our model performs on par with a powerful CNN (ResNet-18), using less than 25% of the parameters. On the different task of recognizing digits from overlaid digit images, the proposed capsule model performs favorably against CNNs given the same number of layers and neurons per layer. We believe that our work raises the possibility of applying capsule networks to complex real-world tasks. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HJe6uANtwH |
https://openreview.net/pdf?id=HJe6uANtwH | |
PWC | https://paperswithcode.com/paper/capsules-with-inverted-dot-product-attention |
Repo | |
Framework | |
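A compact sketch of the routing rule as the abstract describes it: agreement is a dot product between each parent's current pose and a child's vote, routing weights are a softmax over parents, and the pooled votes pass through layer normalization, with all parents updated concurrently per iteration. This is a schematic reading of the abstract, not the reference implementation.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    return (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + eps)

def route(votes, n_iters=3):
    """votes: (n_children, n_parents, dim) child-to-parent vote vectors."""
    n_children, n_parents, dim = votes.shape
    poses = np.zeros((n_parents, dim))                 # parent states start at zero
    for _ in range(n_iters):                           # concurrent iterative routing
        agreement = np.einsum("ijd,jd->ij", votes, poses)
        r = np.exp(agreement)
        r = r / r.sum(axis=1, keepdims=True)           # softmax over parents, per child
        poses = layer_norm(np.einsum("ij,ijd->jd", r, votes))
    return poses

votes = np.random.default_rng(0).standard_normal((6, 4, 8))
print(route(votes).shape)    # (4, 8): one pose per parent capsule
```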
Sparse Transformer: Concentrated Attention Through Explicit Selection
Title | Sparse Transformer: Concentrated Attention Through Explicit Selection |
Authors | Anonymous |
Abstract | The self-attention-based Transformer has demonstrated state-of-the-art performance on a number of natural language processing tasks. Self-attention is able to model long-term dependencies, but it may suffer from extracting irrelevant information from the context. To tackle this problem, we propose a novel model called Sparse Transformer. Sparse Transformer improves the concentration of attention on the global context through an explicit selection of the most relevant segments. Extensive experimental results on a series of natural language processing tasks, including neural machine translation, image captioning, and language modeling, all demonstrate the advantages of Sparse Transformer in model performance. Sparse Transformer achieves state-of-the-art performance on the IWSLT 2015 English-to-Vietnamese and IWSLT 2014 German-to-English translation tasks. In addition, we conduct qualitative analysis to account for Sparse Transformer’s superior performance. |
Tasks | Image Captioning, Language Modelling, Machine Translation |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Hye87grYDH |
https://openreview.net/pdf?id=Hye87grYDH | |
PWC | https://paperswithcode.com/paper/sparse-transformer-concentrated-attention |
Repo | |
Framework | |
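One way to read "explicit selection of the most relevant segments" is a top-k mask on the attention scores before the softmax, sketched below; the value of k and the masking scheme are illustrative assumptions rather than the paper's exact formulation.

```python
import numpy as np

def sparse_attention(Q, K, V, k=2):
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    # mask everything below each query's k-th largest score
    kth = np.sort(scores, axis=-1)[:, -k][:, None]
    masked = np.where(scores >= kth, scores, -np.inf)
    weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((5, 8)) for _ in range(3))
print(sparse_attention(Q, K, V, k=2).shape)   # (5, 8)
```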
Disentangled Representation Learning with Sequential Residual Variational Autoencoder
Title | Disentangled Representation Learning with Sequential Residual Variational Autoencoder |
Authors | Nanxiang Li, Shabnam Ghaffarzadegan, Liu Ren |
Abstract | Recent advancements in unsupervised disentangled representation learning focus on extending the variational autoencoder (VAE) with an augmented objective function to balance the trade-off between disentanglement and reconstruction. We propose the Sequential Residual Variational Autoencoder (SR-VAE), which defines a “residual learning” mechanism as the training regime instead of an augmented objective function. Our proposed solution deploys two important ideas in a single framework: (1) learning from the residual between the input data and the accumulated reconstruction of sequentially added latent variables; (2) decomposing the reconstruction into the decoder output and a residual term. This formulation encourages disentanglement in the latent space by inducing an explicit dependency structure, and reduces the VAE bottleneck by adding the residual term to facilitate reconstruction. More importantly, SR-VAE eliminates hyperparameter tuning, a crucial step for the prior state-of-the-art performance of the objective-function-augmentation approach. We demonstrate both qualitatively and quantitatively that SR-VAE improves the state of the art in unsupervised disentangled representation learning on a variety of complex datasets. |
Tasks | Representation Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Sklyn6EYvH |
https://openreview.net/pdf?id=Sklyn6EYvH | |
PWC | https://paperswithcode.com/paper/disentangled-representation-learning-with-1 |
Repo | |
Framework | |
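A schematic sketch of the "residual learning" regime the abstract describes: latent variables are introduced sequentially, each encoding the residual between the input and the reconstruction accumulated so far. The linear encoder/decoder placeholders below stand in for the neural networks and reparameterized sampling of the actual model.

```python
import numpy as np

rng = np.random.default_rng(0)
x_dim, n_latents = 8, 4
enc = rng.standard_normal((n_latents, x_dim)) * 0.1    # one 1-D encoder per latent
dec = rng.standard_normal((n_latents, x_dim)) * 0.1    # one 1-D decoder per latent

def sequential_residual_encode(x):
    recon = np.zeros_like(x)
    zs = []
    for t in range(n_latents):
        residual = x - recon                  # what the newly added latent must explain
        z_t = enc[t] @ residual               # (mean of) the t-th latent variable
        recon = recon + z_t * dec[t]          # accumulate the decoder's output
        zs.append(z_t)
    return np.array(zs), recon

x = rng.standard_normal(x_dim)
zs, recon = sequential_residual_encode(x)
print(zs.shape, float(np.linalg.norm(x - recon)))
```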
Sparsity Meets Robustness: Channel Pruning for the Feynman-Kac Formalism Principled Robust Deep Neural Nets
Title | Sparsity Meets Robustness: Channel Pruning for the Feynman-Kac Formalism Principled Robust Deep Neural Nets |
Authors | Anonymous |
Abstract | Compression of deep neural nets (DNNs) is crucial for adaptation to mobile devices. Though many successful algorithms exist to compress naturally trained DNNs, developing efficient and stable compression algorithms for robustly trained DNNs remains widely open. In this paper, we focus on a co-design of efficient DNN compression algorithms and sparse neural architectures for robust and accurate deep learning. Such a co-design enables us to advance the goal of accommodating both sparsity and robustness. With this objective in mind, we leverage relaxed augmented-Lagrangian-based algorithms to prune the weights of adversarially trained DNNs at both structured and unstructured levels. Using Feynman-Kac-formalism-principled robust and sparse DNNs, we can at least double the channel sparsity of an adversarially trained ResNet20 for CIFAR10 classification while improving the natural accuracy by 8.69% and the robust accuracy under the benchmark 20-iteration IFGSM attack by 5.42%. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=S1el9TEKPB |
https://openreview.net/pdf?id=S1el9TEKPB | |
PWC | https://paperswithcode.com/paper/sparsity-meets-robustness-channel-pruning-for |
Repo | |
Framework | |
Towards Interpretable Molecular Graph Representation Learning
Title | Towards Interpretable Molecular Graph Representation Learning |
Authors | Anonymous |
Abstract | Recent work in graph neural networks (GNNs) has led to improvements in molecular activity and property prediction tasks. Unfortunately, GNNs often fail to capture the relative importance of interactions between molecular substructures, in part due to the absence of efficient intermediate pooling steps. To address these issues, we propose LaPool (Laplacian Pooling), a novel, data-driven, and interpretable hierarchical graph pooling method that takes into account both node features and graph structure to improve molecular understanding. We benchmark LaPool and show that it not only outperforms recent GNNs on molecular graph understanding and prediction tasks but also remains highly competitive on other graph types. We then demonstrate the improved interpretability achieved with LaPool using both qualitative and quantitative assessments, highlighting its potential applications in drug discovery. |
Tasks | Drug Discovery, Graph Representation Learning, Representation Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HyljY04YDB |
https://openreview.net/pdf?id=HyljY04YDB | |
PWC | https://paperswithcode.com/paper/towards-interpretable-molecular-graph |
Repo | |
Framework | |
Robust Reinforcement Learning via Adversarial Training with Langevin Dynamics
Title | Robust Reinforcement Learning via Adversarial Training with Langevin Dynamics |
Authors | Anonymous |
Abstract | We re-think two-player reinforcement learning (RL) as an instance of a distribution sampling problem in infinite dimensions. Using the powerful Stochastic Gradient Langevin Dynamics, we propose a new two-player RL algorithm, which is a sampling variant of the two-player policy gradient method. Our new algorithm consistently outperforms existing baselines, in terms of generalization across differing training and testing conditions, on several MuJoCo environments. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BJl7mxBYvB |
https://openreview.net/pdf?id=BJl7mxBYvB | |
PWC | https://paperswithcode.com/paper/robust-reinforcement-learning-via-adversarial |
Repo | |
Framework | |
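To illustrate the sampling flavor of the proposed algorithm, here is a toy sketch of two-player stochastic gradient Langevin dynamics on a simple bilinear-plus-quadratic objective: each player takes a gradient step plus Gaussian noise scaled by the step size. The objective and step size are toy assumptions, not the MuJoCo policy-gradient setup used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = rng.standard_normal(2)   # protagonist parameters (ascends the objective)
phi = rng.standard_normal(2)     # adversary parameters (descends the objective)

def grads(theta, phi):
    # toy objective f(theta, phi) = theta.phi - 0.5*||theta||^2 + 0.5*||phi||^2
    return phi - theta, theta + phi   # (df/dtheta, df/dphi)

step = 1e-2
for _ in range(2000):
    g_theta, g_phi = grads(theta, phi)
    theta = theta + step * g_theta + np.sqrt(2 * step) * rng.standard_normal(2)  # Langevin ascent
    phi = phi - step * g_phi + np.sqrt(2 * step) * rng.standard_normal(2)        # Langevin descent

print("protagonist:", theta, "adversary:", phi)   # samples concentrate near the saddle at 0
```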
Hierarchical Graph-to-Graph Translation for Molecules
Title | Hierarchical Graph-to-Graph Translation for Molecules |
Authors | Anonymous |
Abstract | Accelerating drug discovery relies heavily on automatic tools that optimize precursor molecules to endow them with better biochemical properties. Our work in this paper substantially extends prior state-of-the-art graph-to-graph translation methods for molecular optimization. In particular, we realize coherent multi-resolution representations by interweaving the encoding of substructure components with the atom-level encoding of the original molecular graph. Moreover, our graph decoder is fully autoregressive, and interleaves each step of adding a new substructure with the process of resolving its attachment to the emerging molecule. We evaluate our model on multiple molecular optimization tasks and show that our model significantly outperforms previous state-of-the-art baselines. |
Tasks | Drug Discovery, Graph-To-Graph Translation |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rJeeKTNKDB |
https://openreview.net/pdf?id=rJeeKTNKDB | |
PWC | https://paperswithcode.com/paper/hierarchical-graph-to-graph-translation-for |
Repo | |
Framework | |
Disentangling Improves VAEs’ Robustness to Adversarial Attacks
Title | Disentangling Improves VAEs’ Robustness to Adversarial Attacks |
Authors | Anonymous |
Abstract | This paper is concerned with the robustness of VAEs to adversarial attacks. We highlight that conventional VAEs are brittle under attack, but that methods recently introduced for disentanglement, such as β-TCVAE (Chen et al., 2018), improve robustness, as demonstrated through a variety of previously proposed adversarial attacks (Tabacof et al., 2016; Gondim-Ribeiro et al., 2018; Kos et al., 2018). This motivated us to develop Seatbelt-VAE, a new hierarchical disentangled VAE that is designed to be significantly more robust to adversarial attacks than existing approaches, while retaining high-quality reconstructions. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rkeZ9a4Fwr |
https://openreview.net/pdf?id=rkeZ9a4Fwr | |
PWC | https://paperswithcode.com/paper/disentangling-improves-vaes-robustness-to |
Repo | |
Framework | |
Order Learning and Its Application to Age Estimation
Title | Order Learning and Its Application to Age Estimation |
Authors | Anonymous |
Abstract | We propose order learning to determine the order graph of classes, representing ranks or priorities, and classify an object instance into one of the classes. To this end, we design a pairwise comparator to categorize the relationship between two instances into one of three cases: one instance is `greater than,' `similar to,' or `smaller than' the other. Then, by comparing an input instance with reference instances and maximizing the consistency among the comparison results, the class of the input can be estimated reliably. We apply order learning to develop a facial age estimator, which provides state-of-the-art performance. Moreover, the performance is further improved when the order graph is divided into disjoint chains using gender and ethnic group information, or even in an unsupervised manner. |
Tasks | Age Estimation |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HygsuaNFwr |
https://openreview.net/pdf?id=HygsuaNFwr | |
PWC | https://paperswithcode.com/paper/order-learning-and-its-application-to-age |
Repo | |
Framework | |
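The inference rule in the abstract can be sketched in a few lines: a ternary comparator relates the input to reference instances of known rank, and the estimate is the candidate rank most consistent with those comparisons. The oracle comparator on scalar "ages" below is purely illustrative; in the paper the comparator is a learned network on face images.

```python
def compare(a, b, tau=3):
    # returns +1 ('greater than'), 0 ('similar to'), or -1 ('smaller than')
    return 0 if abs(a - b) <= tau else (1 if a > b else -1)

def estimate(x_age, ref_ages, candidates=range(0, 100)):
    votes = [compare(x_age, r) for r in ref_ages]            # comparator outputs vs. references
    def consistency(c):
        return sum(v == compare(c, r) for v, r in zip(votes, ref_ages))
    return max(candidates, key=consistency)                  # rank most consistent with the votes

refs = [5, 15, 25, 35, 45, 55, 65]
print(estimate(31, refs))    # a value close to 31
```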