April 1, 2020

2987 words 15 mins read

Paper Group NANR 11


When Robustness Doesn’t Promote Robustness: Synthetic vs. Natural Distribution Shifts on ImageNet. A Unified framework for randomized smoothing based certified defenses. Augmenting Transformers with KNN-Based Composite Memory. Near-Zero-Cost Differentially Private Deep Learning with Teacher Ensembles. TOWARDS STABILIZING BATCH STATISTICS IN BACKWAR …

When Robustness Doesn’t Promote Robustness: Synthetic vs. Natural Distribution Shifts on ImageNet

Title When Robustness Doesn’t Promote Robustness: Synthetic vs. Natural Distribution Shifts on ImageNet
Authors Anonymous
Abstract We conduct a large experimental comparison of various robustness metrics for image classification. The main question of our study is to what extent current synthetic robustness interventions (lp-adversarial examples, noise corruptions, etc.) promote robustness under natural distribution shifts occurring in real data. To this end, we evaluate 147 ImageNet models under 199 different evaluation settings. We find that no current robustness intervention improves robustness on natural distribution shifts beyond a baseline given by standard models without a robustness intervention. The only exception is the use of larger training datasets, which provides a small increase in robustness on one natural distribution shift. Our results indicate that robustness improvements on real data may require new methodology and more evaluations on natural distribution shifts.
Tasks Image Classification
Published 2020-01-01
URL https://openreview.net/forum?id=HyxPIyrFvH
PDF https://openreview.net/pdf?id=HyxPIyrFvH
PWC https://paperswithcode.com/paper/when-robustness-doesnt-promote-robustness
Repo
Framework

A Unified framework for randomized smoothing based certified defenses

Title A Unified framework for randomized smoothing based certified defenses
Authors Anonymous
Abstract Randomized smoothing, which was recently proved to be a certified defensive technique, has received considerable attention due to its scalability to large datasets and neural networks. However, several important questions still remain unanswered in the existing frameworks, such as (i) whether the Gaussian mechanism is an optimal choice for certifying $\ell_2$-normed robustness, and (ii) whether randomized smoothing can certify $\ell_\infty$-normed robustness (on high-dimensional datasets like ImageNet). To answer these questions, we introduce a unified and self-contained framework to study randomized smoothing-based certified defenses, where we mainly focus on the two most popular norms in adversarial machine learning, i.e., the $\ell_2$ and $\ell_\infty$ norms. We answer the above two questions by first demonstrating that the Gaussian and Exponential mechanisms are the (near) optimal options to certify $\ell_2$- and $\ell_\infty$-normed robustness, respectively. We further show that the largest $\ell_\infty$ radius certified by randomized smoothing is upper bounded by $O(1/\sqrt{d})$, where $d$ is the dimensionality of the data. This theoretical finding suggests that certifying $\ell_\infty$-normed robustness by randomized smoothing may not be scalable to high-dimensional data. The veracity of our framework and analysis is verified by extensive evaluations on CIFAR10 and ImageNet.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=ryl71a4YPB
PDF https://openreview.net/pdf?id=ryl71a4YPB
PWC https://paperswithcode.com/paper/a-unified-framework-for-randomized-smoothing
Repo
Framework
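
For orientation, here is a minimal sketch of the standard Gaussian-mechanism procedure for $\ell_2$ certification that the framework above generalizes. `base_classifier`, `sigma`, and the sample count are illustrative placeholders, and a rigorous certificate would replace the empirical top-class probability with a confidence lower bound (e.g. Clopper-Pearson).

```python
# Minimal sketch: Monte-Carlo smoothed prediction and certified l2 radius via the
# Gaussian mechanism. `base_classifier` maps an input array to a class index.
import numpy as np
from scipy.stats import norm

def certify_l2(base_classifier, x, sigma, n_samples=1000, num_classes=10):
    counts = np.zeros(num_classes, dtype=int)
    for _ in range(n_samples):
        noisy = x + sigma * np.random.randn(*x.shape)   # Gaussian smoothing noise
        counts[base_classifier(noisy)] += 1
    top = int(counts.argmax())
    p_top = counts[top] / n_samples        # a real certificate would lower-bound this
    if p_top <= 0.5:
        return None, 0.0                   # abstain
    return top, sigma * norm.ppf(p_top)    # certified l2 radius
```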

Augmenting Transformers with KNN-Based Composite Memory

Title Augmenting Transformers with KNN-Based Composite Memory
Authors Anonymous
Abstract Various machine learning tasks can benefit from access to external information of different modalities, such as text and images. Recent work has focused on learning architectures with large memories capable of storing this knowledge. We propose augmenting Transformer neural networks with KNN-based Information Fetching (KIF) modules. Each KIF module learns a read operation to access fixed external knowledge. We apply these modules to generative dialogue modeling, a challenging task where information must be flexibly retrieved and incorporated to maintain the topic and flow of conversation. We demonstrate the effectiveness of our approach by identifying relevant knowledge from Wikipedia, images, and human-written dialogue utterances, and show that leveraging this retrieved information improves model performance, measured by automatic and human evaluation.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=H1gx1CNKPH
PDF https://openreview.net/pdf?id=H1gx1CNKPH
PWC https://paperswithcode.com/paper/augmenting-transformers-with-knn-based
Repo
Framework
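
A rough sketch of a KNN-based read over a fixed external memory, in the spirit of the KIF modules described above; `KIFModule` and its dimensions are an illustrative reconstruction from the abstract, not the authors' implementation.

```python
# Sketch of a learned read operation over fixed (frozen) external knowledge.
import torch
import torch.nn as nn

class KIFModule(nn.Module):
    def __init__(self, hidden_dim, memory_keys, memory_values, k=5):
        super().__init__()
        self.query_proj = nn.Linear(hidden_dim, memory_keys.size(1))  # learned read operation
        self.register_buffer("keys", memory_keys)      # fixed external knowledge embeddings
        self.register_buffer("values", memory_values)
        self.k = k

    def forward(self, hidden_state):                          # hidden_state: (batch, hidden_dim)
        q = self.query_proj(hidden_state)                     # (batch, key_dim)
        scores = q @ self.keys.t()                            # similarity to all memory slots
        topk = scores.topk(self.k, dim=-1)                    # KNN retrieval
        weights = torch.softmax(topk.values, dim=-1)          # weight the k fetched entries
        fetched = self.values[topk.indices]                   # (batch, k, value_dim)
        return (weights.unsqueeze(-1) * fetched).sum(dim=1)   # composite memory readout
```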

Near-Zero-Cost Differentially Private Deep Learning with Teacher Ensembles

Title Near-Zero-Cost Differentially Private Deep Learning with Teacher Ensembles
Authors Anonymous
Abstract Ensuring the privacy of sensitive data used to train modern machine learning models is of paramount importance in many areas of practice. One approach to study these concerns is through the lens of differential privacy. In this framework, privacy guarantees are generally obtained by perturbing models in such a way that specifics of data used to train the model are made ambiguous. A particular instance of this approach is through a "teacher-student" model, wherein the teacher, who owns the sensitive data, provides the student with useful, but noisy, information, hopefully allowing the student model to perform well on a given task without access to particular features of the sensitive data. Because stronger privacy guarantees generally involve more significant noising on the part of the teacher, deploying existing frameworks fundamentally involves a trade-off between utility and privacy guarantee. One of the most important techniques used in previous work involves an ensemble of teacher models, which return information to a student based on a noisy voting procedure. In this work, we propose a novel voting mechanism, which we call an Immutable Noisy ArgMax, that, under certain conditions, can bear very large random noising from the teacher without affecting the useful information transferred to the student. Our mechanisms improve over the state-of-the-art methods on all measures, and scale to larger tasks with both higher utility and stronger privacy ($\epsilon \approx 0$).
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=Syl38yrFwr
PDF https://openreview.net/pdf?id=Syl38yrFwr
PWC https://paperswithcode.com/paper/near-zero-cost-differentially-private-deep
Repo
Framework
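
A hedged sketch of the voting idea described above: adding a large constant to the leading vote count makes the argmax immune to substantial noise. The constant `c` and the noise scale below are illustrative, not the paper's calibrated values.

```python
# Immutable noisy argmax over an ensemble of teacher votes (illustrative only).
import numpy as np

def immutable_noisy_argmax(teacher_votes, c=1e4, noise_scale=100.0):
    """teacher_votes: per-class vote counts aggregated over the teacher ensemble."""
    counts = np.asarray(teacher_votes, dtype=float)
    counts[counts.argmax()] += c                      # make the leading class immutable to noise
    noisy = counts + np.random.laplace(scale=noise_scale, size=counts.shape)
    return int(noisy.argmax())                        # label released to the student

# Example: 50 teachers voting over 3 classes
print(immutable_noisy_argmax([30, 15, 5]))
```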

TOWARDS STABILIZING BATCH STATISTICS IN BACKWARD PROPAGATION OF BATCH NORMALIZATION

Title TOWARDS STABILIZING BATCH STATISTICS IN BACKWARD PROPAGATION OF BATCH NORMALIZATION
Authors Anonymous
Abstract Batch Normalization (BN) is one of the most widely used techniques in deep learning. However, its performance can degrade severely when the batch size is small. This weakness limits the usage of BN on many computer vision tasks like detection or segmentation, where the batch size is usually small due to the constraint of memory consumption. Many modified normalization techniques have therefore been proposed, which either fail to restore the performance of BN completely, or have to introduce additional nonlinear operations in the inference procedure at considerable extra cost. In this paper, we reveal that there are two extra batch statistics involved in the backward propagation of BN, which have never been well discussed before. These extra batch statistics, associated with the gradients, can also severely affect the training of deep neural networks. Based on our analysis, we propose a novel normalization method, named Moving Average Batch Normalization (MABN). MABN can completely restore the performance of vanilla BN in small-batch cases, without introducing any additional nonlinear operations in the inference procedure. We prove the benefits of MABN through both theoretical analysis and experiments, which demonstrate its effectiveness on multiple computer vision tasks, including ImageNet and COCO.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=SkgGjRVKDS
PDF https://openreview.net/pdf?id=SkgGjRVKDS
PWC https://paperswithcode.com/paper/towards-stabilizing-batch-statistics-in
Repo
Framework
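
A simplified sketch of normalizing with moving-average statistics rather than per-batch statistics; it illustrates the forward pass only and does not reproduce the paper's handling of the extra batch statistics in backward propagation.

```python
# Normalize with running (moving-average) statistics even during training,
# which is more stable than per-batch statistics when batches are tiny.
import torch
import torch.nn as nn

class MovingAverageNorm(nn.Module):
    def __init__(self, num_features, momentum=0.02, eps=1e-5):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(num_features))
        self.bias = nn.Parameter(torch.zeros(num_features))
        self.register_buffer("running_mean", torch.zeros(num_features))
        self.register_buffer("running_var", torch.ones(num_features))
        self.momentum, self.eps = momentum, eps

    def forward(self, x):                              # x: (N, C, H, W)
        if self.training:
            mean = x.mean(dim=(0, 2, 3))
            var = x.var(dim=(0, 2, 3), unbiased=False)
            with torch.no_grad():                      # update moving averages from the small batch
                self.running_mean += self.momentum * (mean - self.running_mean)
                self.running_var += self.momentum * (var - self.running_var)
        x_hat = (x - self.running_mean[None, :, None, None]) / torch.sqrt(
            self.running_var[None, :, None, None] + self.eps)
        return self.weight[None, :, None, None] * x_hat + self.bias[None, :, None, None]
```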

GroSS Decomposition: Group-Size Series Decomposition for Whole Search-Space Training

Title GroSS Decomposition: Group-Size Series Decomposition for Whole Search-Space Training
Authors Anonymous
Abstract We present Group-size Series (GroSS) decomposition, a mathematical formulation of tensor factorisation as a series of approximations with terms of increasing rank. GroSS allows for dynamic and differentiable selection of the factorisation rank, which is analogous to choosing the number of groups in a grouped convolution. Therefore, to the best of our knowledge, GroSS is the first method to simultaneously train differing numbers of groups within a single layer, as well as all possible combinations between layers. In doing so, GroSS trains an entire grouped-convolution architecture search space concurrently. We demonstrate this with a proof-of-concept exhaustive architecture search with a performance objective. GroSS represents a significant step towards liberating network architecture search from the burden of training and finetuning.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=rJe_cyrKPB
PDF https://openreview.net/pdf?id=rJe_cyrKPB
PWC https://paperswithcode.com/paper/gross-decomposition-group-size-series
Repo
Framework
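
As a loose illustration of training several group sizes concurrently, the sketch below mixes grouped convolutions with different group counts using differentiable selection weights. This is a reconstruction inspired by the abstract, not the GroSS decomposition itself.

```python
# Differentiable selection over candidate group sizes for one convolutional layer.
# Note: in_ch and out_ch must be divisible by every group option.
import torch
import torch.nn as nn

class GroupSizeSeriesConv(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size, group_options=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, kernel_size, padding=kernel_size // 2, groups=g)
            for g in group_options)
        self.alpha = nn.Parameter(torch.zeros(len(group_options)))  # selection logits

    def forward(self, x):
        weights = torch.softmax(self.alpha, dim=0)                  # differentiable group-size choice
        return sum(w * branch(x) for w, branch in zip(weights, self.branches))
```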

DeepPCM: Predicting Protein-Ligand Binding using Unsupervised Learned Representations

Title DeepPCM: Predicting Protein-Ligand Binding using Unsupervised Learned Representations
Authors Anonymous
Abstract In-silico protein-ligand binding prediction is an ongoing area of research in computational chemistry and machine learning based drug discovery, as an accurate predictive model could greatly reduce the time and resources necessary for the detection and prioritization of possible drug candidates. Proteochemometric modeling (PCM) attempts to make an accurate model of the protein-ligand interaction space by combining explicit protein and ligand descriptors. This requires the creation of information-rich, uniform and computer interpretable representations of proteins and ligands. Previous work in PCM modeling relies on pre-defined, handcrafted feature extraction methods, and many methods use protein descriptors that require alignment or are otherwise specific to a particular group of related proteins. However, recent advances in representation learning have shown that unsupervised machine learning can be used to generate embeddings which outperform complex, human-engineered representations. We apply this reasoning to propose a novel proteochemometric modeling methodology which, for the first time, uses embeddings generated via unsupervised representation learning for both the protein and ligand descriptors. We evaluate performance on various splits of a benchmark dataset, including a challenging split that tests the model’s ability to generalize to proteins for which bioactivity data is greatly limited, and we find that our method consistently outperforms state-of-the-art methods.
Tasks Drug Discovery, Representation Learning, Unsupervised Representation Learning
Published 2020-01-01
URL https://openreview.net/forum?id=SklEhlHtPr
PDF https://openreview.net/pdf?id=SklEhlHtPr
PWC https://paperswithcode.com/paper/deeppcm-predicting-protein-ligand-binding
Repo
Framework
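
A minimal sketch of the proteochemometric setup described above, assuming the protein and ligand embeddings have already been produced by unsupervised encoders; the concatenation-plus-regressor pipeline and the scikit-learn model are illustrative placeholders.

```python
# Concatenate precomputed protein and ligand embeddings and fit a bioactivity predictor.
import numpy as np
from sklearn.neural_network import MLPRegressor

def fit_pcm_model(protein_emb, ligand_emb, bioactivity):
    """protein_emb: (N, dp), ligand_emb: (N, dl), bioactivity: (N,)"""
    pairs = np.concatenate([protein_emb, ligand_emb], axis=1)   # joint interaction descriptor
    model = MLPRegressor(hidden_layer_sizes=(256, 128), max_iter=500)
    model.fit(pairs, bioactivity)
    return model
```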

Few-Shot One-Class Classification via Meta-Learning

Title Few-Shot One-Class Classification via Meta-Learning
Authors Anonymous
Abstract Although few-shot learning and one-class classification have been separately well studied, their intersection remains rather unexplored. Our work addresses the few-shot one-class classification problem and presents a meta-learning approach that requires only a few data examples from a single class to adapt to unseen tasks. The proposed method builds upon the model-agnostic meta-learning (MAML) algorithm (Finn et al., 2017) and explicitly trains for few-shot class-imbalance learning, aiming to learn a model initialization that is particularly suited for learning one-class classification tasks after observing only a few examples of one class. Experimental results on datasets from the image domain and the time-series domain show that our model substantially outperforms the baselines, including MAML, and demonstrate the ability to learn new tasks from only a few majority-class samples. Moreover, we successfully learn anomaly detectors for a real-world application involving sensor readings recorded during industrial manufacturing of workpieces with a CNC milling machine, using only a few examples from the normal class.
Tasks Few-Shot Learning, Meta-Learning, Time Series
Published 2020-01-01
URL https://openreview.net/forum?id=B1ltfgSYwS
PDF https://openreview.net/pdf?id=B1ltfgSYwS
PWC https://paperswithcode.com/paper/few-shot-one-class-classification-via-meta
Repo
Framework
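
A condensed sketch of a MAML-style episode for the one-class setting, assuming PyTorch >= 2.0 for `torch.func.functional_call`: the inner loop adapts on a few normal-class examples while the outer loss uses a class-balanced query set. Names and hyperparameters are illustrative, not the paper's.

```python
# One meta-training step: inner-loop adaptation on one-class support sets,
# outer-loop update from balanced query sets.
import torch
import torch.nn.functional as F
from torch.func import functional_call

def one_class_maml_step(model, tasks, meta_optimizer, inner_lr=0.01):
    meta_loss = 0.0
    for support_x, support_y, query_x, query_y in tasks:    # support_y: all ones (normal class), float
        params = dict(model.named_parameters())
        # inner-loop adaptation on the few one-class support examples
        logits = functional_call(model, params, (support_x,)).squeeze(-1)
        loss = F.binary_cross_entropy_with_logits(logits, support_y)
        grads = torch.autograd.grad(loss, list(params.values()), create_graph=True)
        adapted = {name: p - inner_lr * g for (name, p), g in zip(params.items(), grads)}
        # outer loss on a class-balanced query set
        q_logits = functional_call(model, adapted, (query_x,)).squeeze(-1)
        meta_loss = meta_loss + F.binary_cross_entropy_with_logits(q_logits, query_y)
    meta_optimizer.zero_grad()
    meta_loss.backward()
    meta_optimizer.step()
```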

VIMPNN: A physics informed neural network for estimating potential energies of out-of-equilibrium systems

Title VIMPNN: A physics informed neural network for estimating potential energies of out-of-equilibrium systems
Authors Anonymous
Abstract Simulation of molecular and crystal systems enables insight into interesting chemical properties that benefit processes ranging from drug discovery to material synthesis. However, these simulations can be computationally expensive and time consuming even with the approximations made by Density Functional Theory (DFT). We propose the Valence Interaction Message Passing Neural Network (VIMPNN) to approximate DFT's ground-state energy calculations. VIMPNN integrates physical prior knowledge, such as the existence of different interatomic bonds, to estimate more accurate energies. Furthermore, while many previous machine learning methods consider only stable systems, our proposed method is demonstrated on unstable systems at different atomic distances. VIMPNN predictions can be used to determine the stable configurations of systems, i.e., the stable distances between atoms, a necessary step for the future simulation of crystal growth, for example. Our method is extensively evaluated on an augmented version of the QM9 dataset that includes unstable molecules, as well as a new dataset of infinite- and finite-size crystals, and is compared with the Message Passing Neural Network (MPNN). VIMPNN has accuracy comparable to DFT while being roughly five orders of magnitude faster than DFT simulations, and produces more accurate and informative potential energy curves than MPNN for estimating stable configurations.
Tasks Drug Discovery
Published 2020-01-01
URL https://openreview.net/forum?id=HJl8SgHtwr
PDF https://openreview.net/pdf?id=HJl8SgHtwr
PWC https://paperswithcode.com/paper/vimpnn-a-physics-informed-neural-network-for
Repo
Framework
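
A generic message-passing sketch of the kind of model the abstract describes: atom states are updated from neighbor messages and pooled into an energy prediction. The valence-interaction priors specific to VIMPNN are not reproduced here; all dimensions are illustrative.

```python
# Generic MPNN for potential-energy regression on an atomistic graph.
import torch
import torch.nn as nn

class SimpleMPNN(nn.Module):
    def __init__(self, node_dim=16, edge_dim=4, hidden=64, steps=3):
        super().__init__()
        self.msg = nn.Sequential(nn.Linear(2 * node_dim + edge_dim, hidden),
                                 nn.ReLU(), nn.Linear(hidden, node_dim))
        self.update = nn.GRUCell(node_dim, node_dim)
        self.readout = nn.Sequential(nn.Linear(node_dim, hidden),
                                     nn.ReLU(), nn.Linear(hidden, 1))
        self.steps = steps

    def forward(self, h, edge_index, edge_attr):
        # h: (num_atoms, node_dim); edge_index: (2, num_edges); edge_attr: (num_edges, edge_dim)
        for _ in range(self.steps):
            src, dst = edge_index
            m = self.msg(torch.cat([h[src], h[dst], edge_attr], dim=-1))
            agg = torch.zeros_like(h).index_add_(0, dst, m)   # sum incoming messages per atom
            h = self.update(agg, h)
        return self.readout(h).sum()                          # total potential energy
```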

Semi-Supervised Learning with Normalizing Flows

Title Semi-Supervised Learning with Normalizing Flows
Authors Anonymous
Abstract We propose the Flow Gaussian Mixture Model (FlowGMM), a general-purpose method for semi-supervised learning based on a simple and principled probabilistic framework. We approximate the joint distribution of the labeled and unlabeled data with a flexible mixture model implemented as a Gaussian mixture transformed by a normalizing flow. We train the model by maximizing the exact joint likelihood of the labeled and unlabeled data. We evaluate FlowGMM on a wide range of semi-supervised classification problems across different data types: AG-News and Yahoo Answers text data, the MNIST, SVHN and CIFAR-10 image classification problems, as well as tabular UCI datasets. FlowGMM achieves promising results on image classification problems and outperforms the competing methods on the other types of data. FlowGMM learns an interpretable latent representation space and allows hyperparameter-free feature visualization at real-time rates. Finally, we show that FlowGMM can be calibrated to produce meaningful uncertainty estimates for its predictions.
Tasks Image Classification
Published 2020-01-01
URL https://openreview.net/forum?id=BJg_2JHKvH
PDF https://openreview.net/pdf?id=BJg_2JHKvH
PWC https://paperswithcode.com/paper/semi-supervised-learning-with-normalizing
Repo
Framework
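
A compact sketch of the joint-likelihood objective under stated assumptions: `flow(x)` returns the latent code and log-determinant, each class has its own Gaussian component, and the class prior is uniform (and hence dropped from the loss).

```python
# FlowGMM-style semi-supervised loss: exact likelihood for labeled data,
# marginal likelihood (logsumexp over classes) for unlabeled data.
import torch
import torch.distributions as D

def flowgmm_loss(flow, means, log_std, x_labeled, y_labeled, x_unlabeled):
    # means, log_std: (K, d) per-class Gaussian parameters; y_labeled: int64 class ids
    comps = D.Independent(D.Normal(means, log_std.exp()), 1)

    def log_joint(x):                                  # log p(x, y=k) for every class k
        z, logdet = flow(x)                            # z: (N, d), logdet: (N,)
        return comps.log_prob(z.unsqueeze(1)) + logdet.unsqueeze(1)   # (N, K)

    labeled_ll = log_joint(x_labeled).gather(1, y_labeled[:, None]).squeeze(1)
    unlabeled_ll = torch.logsumexp(log_joint(x_unlabeled), dim=1)     # marginalize the label
    return -(labeled_ll.mean() + unlabeled_ll.mean())  # negative exact joint likelihood
```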

Meta-Learning Runge-Kutta

Title Meta-Learning Runge-Kutta
Authors Anonymous
Abstract Initial value problems, i.e. differential equations with specific initial conditions, represent a classic problem within the field of ordinary differential equations (ODEs). While the simplest types of ODEs may have closed-form solutions, most interesting cases typically rely on iterative schemes for numerical integration, such as the family of Runge-Kutta methods. They are, however, sensitive to the strategy by which the step size is adapted during integration, which has to be chosen by the experimenter. In this paper, we show how the design of a step size controller can be cast as a learning problem, allowing deep networks to learn to exploit structure in the initial value problem at hand in an automatic way. The key ingredients for the resulting Meta-Learning Runge-Kutta (MLRK) are the development of a good performance measure and the identification of suitable input features. Traditional approaches suggest the local error estimates as input to the controller. However, by studying the characteristics of the local error function we show that including the partial derivatives of the initial value problem is favorable. Our experiments demonstrate considerable benefits over traditional approaches. In particular, MLRK is able to mitigate sudden spikes in the local error function by a faster adaptation of the step size. More importantly, the additional information in the form of partial derivatives and function values leads to a substantial improvement in performance. The source code can be found at https://www.dropbox.com/sh/rkctdfhkosywnnx/AABKadysCR8-aHW_0kb6vCtSa?dl=0
Tasks Meta-Learning
Published 2020-01-01
URL https://openreview.net/forum?id=rkesVkHtDr
PDF https://openreview.net/pdf?id=rkesVkHtDr
PWC https://paperswithcode.com/paper/meta-learning-runge-kutta
Repo
Framework
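
An illustrative sketch (not the released code) of embedding a learned step-size controller in an adaptive integrator. A Heun step with an embedded error estimate stands in for the full Runge-Kutta family, and `controller` is a hypothetical network mapping the error estimate, current step size, and partial derivatives to a step-size factor.

```python
# Heun (explicit RK2) integration with a learned step-size controller.
import numpy as np

def heun_with_learned_controller(f, df_dy, t0, y0, t_end, controller, h=1e-2):
    t, y = t0, np.asarray(y0, dtype=float)
    while t < t_end:
        h = min(h, t_end - t)
        k1 = f(t, y)
        k2 = f(t + h, y + h * k1)
        y = y + 0.5 * h * (k1 + k2)                     # Heun update
        err = np.linalg.norm(0.5 * h * (k2 - k1))       # embedded local error estimate
        t += h
        features = np.concatenate([[err, h], np.ravel(df_dy(t, y))])
        h = max(h * float(controller(features)), 1e-8)  # learned step-size adaptation
    return y
```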

Step Size Optimization

Title Step Size Optimization
Authors Anonymous
Abstract This paper proposes a new approach for step size adaptation in gradient methods. The proposed method, called step size optimization (SSO), formulates step size adaptation as an optimization problem which minimizes the loss function with respect to the step size for the given model parameters and gradients. The step size is then optimized using the alternating direction method of multipliers (ADMM). SSO does not require second-order information or any probabilistic models for adapting the step size, so it is efficient and easy to implement. Furthermore, we also introduce stochastic SSO for stochastic learning environments. In our experiments, we integrated SSO into vanilla SGD and Adam, and they outperformed state-of-the-art adaptive gradient methods, including RMSProp, Adam, L4-Adam, and AdaBound, on extensive benchmark datasets.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=Sygg3JHtwB
PDF https://openreview.net/pdf?id=Sygg3JHtwB
PWC https://paperswithcode.com/paper/step-size-optimization
Repo
Framework
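
A crude stand-in for the idea of treating the step size itself as the variable being optimized: pick the scalar step that minimizes the loss along the current gradient direction. The paper solves this subproblem with ADMM; the coarse 1-D search below is purely for illustration.

```python
# Choose the step size that minimizes the loss along the gradient direction.
import numpy as np

def optimized_step(loss_fn, params, grad, candidates=np.logspace(-4, 0, 20)):
    """Return updated parameters using the loss-minimizing candidate step size."""
    losses = [loss_fn(params - eta * grad) for eta in candidates]
    best_eta = candidates[int(np.argmin(losses))]
    return params - best_eta * grad, best_eta

# Example on a quadratic loss
loss = lambda w: float(np.sum(w ** 2))
w = np.array([3.0, -2.0])
w_new, eta = optimized_step(loss, w, 2 * w)
```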

Resolving Lexical Ambiguity in English–Japanese Neural Machine Translation

Title Resolving Lexical Ambiguity in English–Japanese Neural Machine Translation
Authors Anonymous
Abstract Lexical ambiguity, i.e., the presence of two or more meanings for a single word, is an inherent and challenging problem for machine translation systems. Even though the use of recurrent neural networks and attention mechanisms are expected to solve this problem, machine translation systems are not always able to correctly translate lexically ambiguous sentences. In this work, I attempt to resolve the problem of lexical ambiguity in English–Japanese neural machine translation systems by combining a pretrained Bidirectional Encoder Representations from Transformer (BERT) language model that can produce contextualized word embeddings and a Transformer translation model, which is a state-of-the-art architecture for the machine translation task. These two proposed architectures have been shown to be more effective in translating ambiguous sentences than a vanilla Transformer model and the Google Translate system. Furthermore, one of the proposed models, the Transformer_BERT-WE, achieves a higher BLEU score compared to the vanilla Transformer model in terms of general translation, which is concrete proof that the use of contextualized word embeddings from BERT can not only solve the problem of lexical ambiguity, but also boost the translation quality in general.
Tasks Language Modelling, Machine Translation, Word Embeddings
Published 2020-01-01
URL https://openreview.net/forum?id=HJeIrlSFDH
PDF https://openreview.net/pdf?id=HJeIrlSFDH
PWC https://paperswithcode.com/paper/resolving-lexical-ambiguity-in
Repo
Framework
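
A rough sketch of the Transformer_BERT-WE idea: use a frozen BERT model to produce contextualized embeddings of the English source, which then feed the translation model in place of learned source token embeddings. The model name and the downstream NMT encoder are assumptions for illustration.

```python
# Contextualized source-side word embeddings from a frozen BERT model.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased").eval()

def contextual_source_embeddings(sentences):
    batch = tokenizer(sentences, return_tensors="pt", padding=True)
    with torch.no_grad():                   # BERT used as a frozen feature extractor
        out = bert(**batch)
    return out.last_hidden_state            # (batch, src_len, 768), fed to the NMT encoder

src_features = contextual_source_embeddings(["The bank raised interest rates."])
```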

Behavior Regularized Offline Reinforcement Learning

Title Behavior Regularized Offline Reinforcement Learning
Authors Anonymous
Abstract In reinforcement learning (RL) research, it is common to assume access to direct online interactions with the environment. However in many real-world applications, access to the environment is limited to a fixed offline dataset of logged experience. In such settings, standard RL algorithms have been shown to diverge or otherwise yield poor performance. Accordingly, much recent work has suggested a number of remedies to these issues. In this work, we introduce a general framework, behavior regularized actor critic (BRAC), to empirically evaluate recently proposed methods as well as a number of simple baselines across a variety of offline continuous control tasks. Surprisingly, we find that many of the technical complexities introduced in recent methods are unnecessary to achieve strong performance. Additional ablations provide insights into which design choices matter most in the offline RL setting.
Tasks Continuous Control
Published 2020-01-01
URL https://openreview.net/forum?id=BJg9hTNKPH
PDF https://openreview.net/pdf?id=BJg9hTNKPH
PWC https://paperswithcode.com/paper/behavior-regularized-offline-reinforcement
Repo
Framework
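
A schematic sketch of the behavior-regularized actor objective: maximize the learned Q-value while penalizing divergence from the behavior policy. The sampled-KL penalty is only one of several divergence choices the paper evaluates; the network and policy objects are placeholders assumed to return PyTorch distributions over the full action vector.

```python
# BRAC-style actor loss with a sampled KL penalty toward an estimated behavior policy.
import torch

def brac_actor_loss(policy, behavior_policy, q_net, states, alpha=1.0):
    dist = policy(states)                           # torch.distributions over the action vector
    actions = dist.rsample()                        # reparameterized sample for pathwise gradients
    q_value = q_net(states, actions).squeeze(-1)
    # penalize deviation from the (estimated) behavior policy
    kl = dist.log_prob(actions) - behavior_policy(states).log_prob(actions)
    return (-q_value + alpha * kl).mean()
```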

On the Distribution of Penultimate Activations of Classification Networks

Title On the Distribution of Penultimate Activations of Classification Networks
Authors Anonymous
Abstract This paper considers probability distributions of penultimate activations in deep classification networks. We first identify a dual relation between the activations and the weights of the final fully connected layer: learning the networks with the cross-entropy loss makes their (normalized) penultimate activations follow a von Mises-Fisher distribution for each class, which is parameterized by the weights of the final fully-connected layer. Through this analysis, we derive a probability density function of penultimate activations per class. This generative model allows us to synthesize activations of classification networks without feeding images forward through them. We also demonstrate through experiments that our generative model of penultimate activations can be applied to real-world applications such as knowledge distillation and class-conditional image generation.
Tasks Conditional Image Generation, Image Generation
Published 2020-01-01
URL https://openreview.net/forum?id=SJx371HFvr
PDF https://openreview.net/pdf?id=SJx371HFvr
PWC https://paperswithcode.com/paper/on-the-distribution-of-penultimate
Repo
Framework
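
A small sketch of the generative view described above: the final fully connected layer's weight row for a class is read as the mean direction of a von Mises-Fisher distribution over normalized penultimate activations. The concentration scaling is an assumption for illustration, and sampling uses scipy.stats.vonmises_fisher (SciPy >= 1.11).

```python
# Synthesize per-class penultimate activations from the final FC weights via a vMF model.
import numpy as np
from scipy.stats import vonmises_fisher    # requires SciPy >= 1.11

def sample_penultimate_activations(fc_weight, class_idx, n=100):
    # fc_weight: (num_classes, feature_dim) weights of the final linear layer
    w = fc_weight[class_idx]
    mu = w / np.linalg.norm(w)                # mean direction on the unit sphere
    kappa = 10.0 * np.linalg.norm(w)          # concentration tied to the weight norm (assumption)
    return vonmises_fisher(mu, kappa).rvs(n)  # synthetic normalized activations for this class

fake_fc = np.random.randn(10, 64)
activations = sample_penultimate_activations(fake_fc, class_idx=3)
```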