April 1, 2020

3116 words 15 mins read

Paper Group NANR 48

Certified Robustness for Top-k Predictions against Adversarial Perturbations via Randomized Smoothing. Estimating counterfactual treatment outcomes over time through adversarially balanced representations. Generalized Convolutional Forest Networks for Domain Generalization and Visual Recognition. Hypermodels for Exploration. SpikeGrad: An ANN-equiv …

Certified Robustness for Top-k Predictions against Adversarial Perturbations via Randomized Smoothing

Title Certified Robustness for Top-k Predictions against Adversarial Perturbations via Randomized Smoothing
Authors Anonymous
Abstract It is well-known that classifiers are vulnerable to adversarial perturbations. To defend against adversarial perturbations, various certified robustness results have been derived. However, existing certified robustness results are limited to top-1 predictions. In many real-world applications, top-k predictions are more relevant. In this work, we aim to derive certified robustness for top-k predictions. In particular, our certified robustness is based on randomized smoothing, which turns any classifier into a new classifier by adding noise to an input example. We adopt randomized smoothing because it is scalable to large-scale neural networks and applicable to any classifier. We derive a tight robustness bound in the L_2 norm for top-k predictions when using randomized smoothing with Gaussian noise. We find that generalizing certified robustness from top-1 to top-k predictions poses significant technical challenges. We also empirically evaluate our method on CIFAR10 and ImageNet. For example, our method can obtain an ImageNet classifier with a certified top-5 accuracy of 62.8% when the L_2 norms of the adversarial perturbations are less than 0.5 (=127/255).
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=BkeWw6VFwr
PDF https://openreview.net/pdf?id=BkeWw6VFwr
PWC https://paperswithcode.com/paper/certified-robustness-for-top-k-predictions
Repo
Framework
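A minimal sketch of the randomized-smoothing idea behind this entry: vote over Gaussian-perturbed copies of the input and return the k most frequent classes. The base classifier `f`, noise level `sigma`, and sample count `n` are placeholder assumptions; the paper's actual certified-radius computation is not reproduced here.

```python
import numpy as np

def smoothed_topk(f, x, k=5, sigma=0.5, n=1000, num_classes=1000, rng=None):
    """f maps a batch of noisy inputs to hard predicted labels (shape (n,))."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.normal(scale=sigma, size=(n,) + x.shape)
    labels = f(x[None, ...] + noise)              # classify n noisy copies
    counts = np.bincount(labels, minlength=num_classes)
    return np.argsort(counts)[::-1][:k]           # the k most frequent classes
```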

Estimating counterfactual treatment outcomes over time through adversarially balanced representations

Title Estimating counterfactual treatment outcomes over time through adversarially balanced representations
Authors Anonymous
Abstract Identifying when to give treatments to patients and how to select among multiple treatments over time are important medical problems with a few existing solutions. In this paper, we introduce the Counterfactual Recurrent Network (CRN), a novel sequence-to-sequence model that leverages the increasingly available patient observational data to estimate treatment effects over time and answer such medical questions. To handle the bias from time-varying confounders, i.e., covariates affecting the treatment assignment policy in the observational data, CRN uses domain adversarial training to build balancing representations of the patient history. At each timestep, CRN constructs a treatment-invariant representation which removes the association between patient history and treatment assignments and thus can be reliably used for making counterfactual predictions. On a simulated model of tumour growth, with varying degrees of time-dependent confounding, we show how our model achieves lower error in estimating counterfactuals and in choosing the correct treatment and timing of treatment than current state-of-the-art methods.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=BJg866NFvB
PDF https://openreview.net/pdf?id=BJg866NFvB
PWC https://paperswithcode.com/paper/estimating-counterfactual-treatment-outcomes
Repo
Framework
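Domain adversarial training of the kind CRN describes is commonly implemented with a gradient reversal layer; the sketch below is one such implementation in PyTorch, with the encoder output, treatment-classifier head, and weight `lambd` as illustrative placeholders rather than the paper's exact architecture.

```python
import torch
import torch.nn.functional as F
from torch.autograd import Function

class GradReverse(Function):
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reversing the gradient pushes the encoder to remove information
        # about treatment assignment from the history representation.
        return -ctx.lambd * grad_output, None

def balancing_loss(history_repr, treatment_head, treatments, lambd=1.0):
    reversed_repr = GradReverse.apply(history_repr, lambd)
    return F.cross_entropy(treatment_head(reversed_repr), treatments)
```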

Generalized Convolutional Forest Networks for Domain Generalization and Visual Recognition

Title Generalized Convolutional Forest Networks for Domain Generalization and Visual Recognition
Authors Anonymous
Abstract When constructing random forests, it is of prime importance to ensure high accuracy and low correlation of individual tree classifiers for good performance. Nevertheless, it is typically difficult for existing random forest methods to strike a good balance between these conflicting factors. In this work, we propose generalized convolutional forest networks to learn a feature space that maximizes the strength of individual tree classifiers while minimizing their respective correlation. The feature space is iteratively constructed by a probabilistic triplet sampling method based on the distribution obtained from the splits of the random forest. The sampling process is designed to pull together data of the same label for higher strength and to push apart data that frequently fall into the same leaf nodes. We perform extensive experiments on five image classification and two domain generalization datasets with ResNet-50 and DenseNet-161 backbone networks. Experimental results show that the proposed algorithm performs favorably against state-of-the-art methods.
Tasks Domain Generalization, Image Classification
Published 2020-01-01
URL https://openreview.net/forum?id=H1lxVyStPH
PDF https://openreview.net/pdf?id=H1lxVyStPH
PWC https://paperswithcode.com/paper/generalized-convolutional-forest-networks-for
Repo
Framework
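The triplet idea in the abstract (pull together same-label samples, push apart samples that often co-occur in leaves) could look roughly like the sketch below; the leaf-assignment matrix `leaf_ids` and the proportional sampling rule are assumptions, not the paper's exact procedure.

```python
import numpy as np

def sample_triplet(leaf_ids, labels, rng=None):
    """leaf_ids: (n_samples, n_trees) leaf index per tree; labels: (n_samples,)."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(labels)
    anchor = rng.integers(n)
    # positive: another sample with the same label (pulled closer)
    pos_pool = np.where((labels == labels[anchor]) & (np.arange(n) != anchor))[0]
    positive = rng.choice(pos_pool)
    # negative: a differently-labeled sample, chosen more often the more
    # leaves it shares with the anchor (pushed apart)
    neg_pool = np.where(labels != labels[anchor])[0]
    co_occur = (leaf_ids[neg_pool] == leaf_ids[anchor]).mean(axis=1)
    probs = co_occur / co_occur.sum() if co_occur.sum() > 0 else None
    negative = rng.choice(neg_pool, p=probs)
    return anchor, positive, negative
```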

Hypermodels for Exploration

Title Hypermodels for Exploration
Authors Anonymous
Abstract We study the use of hypermodels to represent epistemic uncertainty and guide exploration. This generalizes and extends the use of ensembles to approximate Thompson sampling. The computational cost of training an ensemble grows with its size, and as such, prior work has typically been limited to ensembles with tens of elements. We show that alternative hypermodels can enjoy dramatic efficiency gains, enabling behavior that would otherwise require hundreds or thousands of elements, and even succeed in situations where ensemble methods fail to learn regardless of size. This allows more accurate approximation of Thompson sampling as well as use of more sophisticated exploration schemes. In particular, we consider an approximate form of information-directed sampling and demonstrate performance gains relative to Thompson sampling. As alternatives to ensembles, we consider linear and neural network hypermodels, also known as hypernetworks. We prove that, with neural network base models, a linear hypermodel can represent essentially any distribution over functions, and as such, hypernetworks do not extend what can be represented.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=ryx6WgStPB
PDF https://openreview.net/pdf?id=ryx6WgStPB
PWC https://paperswithcode.com/paper/hypermodels-for-exploration
Repo
Framework
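The abstract's linear hypermodel can be read as a linear map from a random index to the base network's weights; sampling an index then plays the role of drawing one ensemble member, as in the sketch below (the sizes and the 0.1 scale are arbitrary placeholders, and A, b would be learned in practice).

```python
import numpy as np

rng = np.random.default_rng(0)
index_dim, n_weights = 16, 10_000                    # hypothetical sizes

A = 0.1 * rng.normal(size=(n_weights, index_dim))    # learned in practice
b = 0.1 * rng.normal(size=n_weights)                 # learned in practice

def sample_base_weights():
    z = rng.normal(size=index_dim)    # one "ensemble member" per draw
    return A @ z + b                  # weights of the base model

theta = sample_base_weights()         # act greedily w.r.t. this sample
# Repeated draws approximate Thompson sampling without training a large ensemble.
```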

SpikeGrad: An ANN-equivalent Computation Model for Implementing Backpropagation with Spikes

Title SpikeGrad: An ANN-equivalent Computation Model for Implementing Backpropagation with Spikes
Authors Anonymous
Abstract Event-based neuromorphic systems promise to reduce the energy consumption of deep neural networks by replacing expensive floating point operations on dense matrices with low-energy, sparse operations on spike events. While these systems can be trained increasingly well using approximations of the backpropagation algorithm, this usually requires high-precision errors and is therefore incompatible with the typical communication infrastructure of neuromorphic circuits. In this work, we analyze how the gradient can be discretized into spike events when training a spiking neural network. To accelerate our simulation, we show that using a special implementation of the integrate-and-fire neuron allows us to describe the accumulated activations and errors of the spiking neural network in terms of an equivalent artificial neural network, allowing us to largely speed up training compared to an explicit simulation of all spike events. This way we are able to demonstrate that even for deep networks, the gradients can be discretized sufficiently well with spikes if the gradient is properly rescaled. This form of spike-based backpropagation enables us to achieve equivalent or better accuracies on the MNIST and CIFAR10 datasets than comparable state-of-the-art spiking neural networks trained with full-precision gradients. The algorithm, which we call SpikeGrad, is based only on accumulation and comparison operations and can naturally exploit sparsity in the gradient computation, which makes it an interesting choice for spiking neuromorphic systems with on-chip learning capacities.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=rkxs0yHFPH
PDF https://openreview.net/pdf?id=rkxs0yHFPH
PWC https://paperswithcode.com/paper/spikegrad-an-ann-equivalent-computation-model
Repo
Framework
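A loose sketch of discretizing a gradient into signed spikes with an integrate-and-fire accumulator, in the spirit of the description above; the threshold, step limit, and rescaling are assumptions rather than SpikeGrad's exact rule.

```python
import numpy as np

def gradient_to_spikes(grad, threshold=0.25, max_steps=32):
    potential = grad.copy()                   # integrate the incoming gradient
    spike_count = np.zeros_like(grad)
    for _ in range(max_steps):
        spikes = np.sign(potential) * (np.abs(potential) >= threshold)
        if not spikes.any():                  # nothing left above threshold
            break
        potential -= spikes * threshold       # fire and reset by subtraction
        spike_count += spikes
    return spike_count * threshold            # spike-based approximation of grad
```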

EnsembleNet: A novel architecture for Incremental Learning

Title EnsembleNet: A novel architecture for Incremental Learning
Authors Anonymous
Abstract Deep neural networks are used in many state-of-the-art systems for machine perception. Once a network is trained to do a specific task, it cannot be easily trained to do new tasks, as this leads to catastrophic forgetting of the previously learned tasks. We propose here a novel architecture called EnsembleNet that accommodates new classes of data without having to retrain previously trained sub-models. The novelty of our model lies in the fact that only a small portion of the network has to be retrained, which makes it extremely computationally efficient and also results in high performance compared to other architectures in the literature. We demonstrated our model on the MNIST Handwritten Digits, MNIST Fashion, and CIFAR10 datasets. The proposed architecture was benchmarked against other models in the literature on the Omega-new, Omega-base, and Omega-all metrics for the MNIST Handwritten dataset. The experimental results show that EnsembleNet overall outperformed every other model in the literature.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=SylDPJrYvS
PDF https://openreview.net/pdf?id=SylDPJrYvS
PWC https://paperswithcode.com/paper/ensemblenet-a-novel-architecture-for
Repo
Framework
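A hedged sketch of the incremental-learning pattern the abstract describes: previously trained components are frozen and only a small new head is trained for the added classes. Module names, the shared backbone, and sizes are illustrative, not the paper's exact architecture.

```python
import torch.nn as nn

class IncrementalEnsemble(nn.Module):
    def __init__(self, backbone, feat_dim, n_new_classes, old_heads=()):
        super().__init__()
        self.backbone = backbone
        self.old_heads = nn.ModuleList(old_heads)
        for p in list(self.backbone.parameters()) + list(self.old_heads.parameters()):
            p.requires_grad = False                          # old parts never retrained
        self.new_head = nn.Linear(feat_dim, n_new_classes)   # only this is trained

    def forward(self, x):
        feats = self.backbone(x)
        return [head(feats) for head in self.old_heads] + [self.new_head(feats)]
```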

Improving Sequential Latent Variable Models with Autoregressive Flows

Title Improving Sequential Latent Variable Models with Autoregressive Flows
Authors Anonymous
Abstract We propose an approach for sequence modeling based on autoregressive normalizing flows. Each autoregressive transform, acting across time, serves as a moving reference frame for modeling higher-level dynamics. This technique provides a simple, general-purpose method for improving sequence modeling, with connections to existing and classical techniques. We demonstrate the proposed approach both with standalone models and as part of larger sequential latent variable models. Results are presented on three benchmark video datasets, where flow-based dynamics improve log-likelihood performance over baseline models.
Tasks Latent Variable Models
Published 2020-01-01
URL https://openreview.net/forum?id=HklvmlrKPB
PDF https://openreview.net/pdf?id=HklvmlrKPB
PWC https://paperswithcode.com/paper/improving-sequential-latent-variable-models
Repo
Framework
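The core transform can be written as an affine autoregressive flow acting across time, u_t = (x_t - mu(x_{<t})) / sigma(x_{<t}), so the downstream latent-variable model only has to capture the residual dynamics. In the sketch below, mu_net and log_sigma_net (conditioning only on the previous frame) are placeholder assumptions.

```python
import torch

def flow_forward(x, mu_net, log_sigma_net):
    """x: (T, B, D) sequence. Returns residuals u and per-step log|det J|."""
    u, logdet = [], []
    prev = torch.zeros_like(x[0])
    for t in range(x.shape[0]):
        mu, log_sigma = mu_net(prev), log_sigma_net(prev)
        u.append((x[t] - mu) * torch.exp(-log_sigma))   # shift and scale
        logdet.append(-log_sigma.sum(dim=-1))
        prev = x[t]                                      # condition on the past
    return torch.stack(u), torch.stack(logdet)
```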

A Function Space View of Bounded Norm Infinite Width ReLU Nets: The Multivariate Case

Title A Function Space View of Bounded Norm Infinite Width ReLU Nets: The Multivariate Case
Authors Anonymous
Abstract A key element of understanding the efficacy of overparameterized neural networks is characterizing how they represent functions as the number of weights in the network approaches infinity. In this paper, we characterize the norm required to realize any function as a single hidden-layer ReLU network with an unbounded number of units (infinite width), but where the Euclidean norm of the weights is bounded, including precisely characterizing which functions can be realized with finite norm. This was settled for univariate functions in Savarese et al. (2019), where it was shown that the required norm is determined by the L1-norm of the second derivative of the function. We extend the characterization to multivariate functions (i.e., multiple input units), relating the required norm to the L1-norm of the Radon transform of a higher-order Laplacian of the function. This characterization allows us to show that all functions in a Sobolev space can be represented with bounded norm, to calculate the required norm for several specific functions, and to obtain a depth separation result. These results have important implications for understanding generalization performance and the distinction between neural networks and more traditional kernel learning.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=H1lNPxHKDH
PDF https://openreview.net/pdf?id=H1lNPxHKDH
PWC https://paperswithcode.com/paper/a-function-space-view-of-bounded-norm
Repo
Framework
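For reference, the univariate characterization from Savarese et al. (2019) that the abstract builds on, together with a schematic statement of the multivariate extension (constants and regularity conditions omitted; the multivariate line only indicates the form described in the abstract):

```latex
\begin{align}
  \text{(univariate)}\quad
  &\min_{\theta:\, f_\theta = f} \tfrac{1}{2}\lVert\theta\rVert_2^2
   = \max\!\Big(\int_{\mathbb{R}} \lvert f''(x)\rvert\,dx,\;
                \lvert f'(-\infty) + f'(+\infty)\rvert\Big), \\
  \text{(multivariate, schematic)}\quad
  &\min_{\theta:\, f_\theta = f} \tfrac{1}{2}\lVert\theta\rVert_2^2
   \;\propto\; \big\lVert \mathcal{R}\{\Delta^{(d+1)/2} f\} \big\rVert_{1},
\end{align}
% R: Radon transform, Delta: Laplacian, d: input dimension.
```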

Variational Recurrent Models for Solving Partially Observable Control Tasks

Title Variational Recurrent Models for Solving Partially Observable Control Tasks
Authors Anonymous
Abstract In partially observable (PO) environments, deep reinforcement learning (RL) agents often suffer from unsatisfactory performance, since two problems need to be tackled together: how to extract information from the raw observations to solve the task, and how to improve the policy. In this study, we propose an RL algorithm for solving PO tasks. Our method comprises two parts: a variational recurrent model (VRM) for modeling the environment, and an RL controller that has access to both the environment and the VRM. The proposed algorithm was tested in two types of PO robotic control tasks: those in which either coordinates or velocities were not observable, and those that require long-term memorization. Our experiments show that the proposed algorithm achieved better data efficiency and/or learned more optimal policies than alternative approaches in tasks in which unobserved states cannot be inferred from raw observations in a simple manner.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=r1lL4a4tDB
PDF https://openreview.net/pdf?id=r1lL4a4tDB
PWC https://paperswithcode.com/paper/variational-recurrent-models-for-solving
Repo
Framework
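A minimal sketch of the setup described above: a variational recurrent model summarizes the observation history into a latent state, and the controller acts on both the raw observation and that latent. Layer sizes and the single-GRU structure are placeholder assumptions.

```python
import torch
import torch.nn as nn

class VRM(nn.Module):
    def __init__(self, obs_dim, hid_dim=64, z_dim=16):
        super().__init__()
        self.rnn = nn.GRUCell(obs_dim + z_dim, hid_dim)
        self.mu = nn.Linear(hid_dim, z_dim)
        self.logvar = nn.Linear(hid_dim, z_dim)

    def step(self, obs, h, z_prev):
        h = self.rnn(torch.cat([obs, z_prev], dim=-1), h)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterize
        return h, z

# The RL controller would then act on torch.cat([obs, z], dim=-1).
```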

Kronecker Attention Networks

Title Kronecker Attention Networks
Authors Hongyang Gao, Zhengyang Wang, Shuiwang Ji
Abstract Attention operators have been applied on both 1-D data like texts and higher-order data such as images and videos. Use of attention operators on high-order data requires flattening of the spatial or spatial-temporal dimensions into a vector, which is assumed to follow a multivariate normal distribution. This not only incurs excessive requirements on computational resources, but also fails to preserve structures in data. In this work, we propose to avoid flattening by developing Kronecker attention operators (KAOs) that operate on high-order tensor data directly. KAOs lead to dramatic reductions in computational resources. Moreover, we analyze KAOs theoretically from a probabilistic perspective and point out that KAOs assume the data follow matrix-variate normal distributions. Experimental results show that KAOs reduce the amount of required computational resources by a factor of hundreds, with larger factors for higher-dimensional and higher-order data. Results also show that networks with KAOs outperform models without attention, while achieving performance competitive with those using the original attention operators.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=Hyx_h64Yvr
PDF https://openreview.net/pdf?id=Hyx_h64Yvr
PWC https://paperswithcode.com/paper/kronecker-attention-networks
Repo
Framework
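One plausible reading of operating on the tensor directly (stated here as an assumption, not the published KAO) is to attend over row- and column-averaged features instead of all H*W flattened positions, which already cuts the attention cost from O((HW)^2) to O(HW(H+W)):

```python
import torch

def axis_averaged_attention(x):
    """x: (B, C, H, W) feature map; illustrative only, not the published operator."""
    B, C, H, W = x.shape
    queries = x.reshape(B, C, H * W).transpose(1, 2)     # (B, HW, C)
    row_mean = x.mean(dim=3).transpose(1, 2)             # (B, H, C)
    col_mean = x.mean(dim=2).transpose(1, 2)             # (B, W, C)
    kv = torch.cat([row_mean, col_mean], dim=1)          # (B, H + W, C)
    attn = torch.softmax(queries @ kv.transpose(1, 2) / C ** 0.5, dim=-1)
    out = attn @ kv                                      # (B, HW, C)
    return out.transpose(1, 2).reshape(B, C, H, W)
```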

Constant Curvature Graph Convolutional Networks

Title Constant Curvature Graph Convolutional Networks
Authors Anonymous
Abstract Interest has been rising lately towards methods representing data in non-Euclidean spaces, e.g. hyperbolic or spherical. These geometries provide specific inductive biases useful for certain real-world data properties, e.g. scale-free or hierarchical graphs are best embedded in a hyperbolic space. However, the very popular class of graph neural networks is currently limited to model data only via Euclidean node embeddings and associated vector space operations. In this work, we bridge this gap by proposing mathematically grounded generalizations of graph convolutional networks (GCN) to (products of) constant curvature spaces. We do this by i) extending the gyro-vector space theory from hyperbolic to spherical spaces, providing a unified and smooth view of the two geometries, ii) leveraging gyro-barycentric coordinates that generalize the classic Euclidean concept of the center of mass. Our class of models gives strict generalizations in the sense that they recover their Euclidean counterparts when the curvature goes to zero from either side. Empirically, our methods outperform different types of classic Euclidean GCNs in the tasks of node classification and minimizing distortion for symbolic data exhibiting non-Euclidean behavior, according to their discrete curvature.
Tasks Node Classification
Published 2020-01-01
URL https://openreview.net/forum?id=BJg73xHtvr
PDF https://openreview.net/pdf?id=BJg73xHtvr
PWC https://paperswithcode.com/paper/constant-curvature-graph-convolutional
Repo
Framework
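A schematic piece of the unified-geometry machinery: the exponential map at the origin in a constant-curvature (kappa-stereographic) model, which smoothly interpolates between spherical (kappa > 0), Euclidean (kappa = 0), and hyperbolic (kappa < 0) behavior. This is only an illustration of the recover-the-Euclidean-limit property, not the full GCN layer.

```python
import numpy as np

def tan_kappa(x, kappa):
    if kappa > 0:
        return np.tan(np.sqrt(kappa) * x) / np.sqrt(kappa)      # spherical
    if kappa < 0:
        return np.tanh(np.sqrt(-kappa) * x) / np.sqrt(-kappa)   # hyperbolic
    return x                                                    # Euclidean limit

def exp_map_at_origin(v, kappa):
    norm = np.linalg.norm(v)
    return v if norm == 0 else tan_kappa(norm, kappa) * v / norm
```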

Discovering Topics With Neural Topic Models Built From PLSA Loss

Title Discovering Topics With Neural Topic Models Built From PLSA Loss
Authors Anonymous
Abstract In this paper we present a model for unsupervised topic discovery in text corpora. The proposed model uses document, word, and topic lookup-table embeddings as neural network parameters to build probabilities of words given topics and probabilities of topics given documents. These probabilities are used to recover, by marginalization, the probabilities of words given documents. For very large corpora, where the number of documents can be on the order of billions, using a neural auto-encoder-based document embedding is more scalable than using a lookup-table embedding as classically done. We thus extend the lookup-table-based document embedding model to a continuous auto-encoder-based model. Our models are trained using probabilistic latent semantic analysis (PLSA) assumptions. We evaluated our models on six datasets with a rich variety of contents. The conducted experiments demonstrate that the proposed neural topic models are very effective in capturing relevant topics. Furthermore, in terms of perplexity, the evaluation benchmarks show that our topic models outperform the latent Dirichlet allocation (LDA) model, which is classically used to address topic discovery tasks.
Tasks Document Embedding, Topic Models
Published 2020-01-01
URL https://openreview.net/forum?id=Skx24yHFDr
PDF https://openreview.net/pdf?id=Skx24yHFDr
PWC https://paperswithcode.com/paper/discovering-topics-with-neural-topic-models
Repo
Framework
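The factorization in the abstract, p(w|d) = sum_t p(w|t) p(t|d), built from word, topic, and document embeddings; the embedding sizes and the softmax parameterization below are assumptions.

```python
import torch
import torch.nn.functional as F

n_docs, n_words, n_topics, dim = 1_000, 5_000, 50, 128   # hypothetical sizes
doc_emb = torch.randn(n_docs, dim, requires_grad=True)
topic_emb = torch.randn(n_topics, dim, requires_grad=True)
word_emb = torch.randn(n_words, dim, requires_grad=True)

def p_word_given_doc():
    p_w_given_t = F.softmax(topic_emb @ word_emb.T, dim=1)   # (topics, words)
    p_t_given_d = F.softmax(doc_emb @ topic_emb.T, dim=1)    # (docs, topics)
    return p_t_given_d @ p_w_given_t                         # (docs, words)

# Training would maximize the log-likelihood of observed (doc, word) counts
# under p_word_given_doc(), i.e. a PLSA-style objective.
```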

Revisiting Self-Training for Neural Sequence Generation

Title Revisiting Self-Training for Neural Sequence Generation
Authors Anonymous
Abstract Self-training is one of the earliest and simplest semi-supervised methods. The key idea is to augment the original labeled dataset with unlabeled data paired with the model’s predictions. Self-training has mostly been studied for classification problems. However, in complex sequence generation tasks such as machine translation, it is still not clear how self-training works due to the compositionality of the target space. In this work, we first show that it is not only possible but recommended to apply self-training in sequence generation. Through careful examination of the performance gains, we find that the noise added to the hidden states (e.g. dropout) is critical to the success of self-training, as it acts like a regularizer which forces the model to yield similar predictions for similar inputs from unlabeled data. To further encourage this mechanism, we propose to inject noise into the input space, resulting in a “noisy” version of self-training. Empirical studies on standard benchmarks across machine translation and text summarization tasks under different resource settings show that noisy self-training is able to effectively utilize unlabeled data and improve the baseline performance by a large margin.
Tasks Machine Translation, Text Summarization
Published 2020-01-01
URL https://openreview.net/forum?id=SJgdnAVKDH
PDF https://openreview.net/pdf?id=SJgdnAVKDH
PWC https://paperswithcode.com/paper/revisiting-self-training-for-neural-sequence-1
Repo
Framework
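The noisy self-training loop described above, reduced to a placeholder-level sketch; `train`, `model.predict`, and `perturb_input` (e.g. token dropout or paraphrasing noise) are hypothetical stand-ins.

```python
def noisy_self_training(labeled, unlabeled, train, perturb_input, rounds=3):
    model = train(labeled)                                    # 1. fit on labeled data
    for _ in range(rounds):
        pseudo = [(x, model.predict(x)) for x in unlabeled]   # 2. pseudo-label
        noisy = [(perturb_input(x), y) for x, y in pseudo]    # 3. inject input noise
        model = train(labeled + noisy)                        # 4. retrain on the union
    return model
```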

The Secret Revealer: Generative Model Inversion Attacks Against Deep Neural Networks

Title The Secret Revealer: Generative Model Inversion Attacks Against Deep Neural Networks
Authors Anonymous
Abstract This paper studies model inversion attacks, in which access to a model is abused to infer information about the training data. Since their first introduction by Fredrikson et al. (2014), such attacks have raised serious concerns given that training data usually contain sensitive information. Thus far, successful model inversion attacks have only been demonstrated on simple models, such as linear regression and logistic regression. Previous attempts to invert neural networks, even ones with simple architectures, have failed to produce convincing results. We present a novel attack method, termed the generative model inversion attack, which can invert deep neural networks with high success rates. Rather than reconstructing private training data from scratch, we leverage partial public information, which can be very generic, to learn a distributional prior via generative adversarial networks (GANs) and use it to guide the inversion process. Moreover, we theoretically prove that a model’s predictive power and its vulnerability to inversion attacks are indeed two sides of the same coin: highly predictive models are able to establish a strong correlation between features and labels, which coincides exactly with what an adversary exploits to mount the attacks. Our experiments demonstrate that the proposed attack improves identification accuracy over existing work by about 75% for reconstructing face images from a state-of-the-art face recognition classifier. We also show that differential privacy, in its canonical form, is of little avail in protecting against our attacks.
Tasks Face Recognition
Published 2020-01-01
URL https://openreview.net/forum?id=ByevJerKwS
PDF https://openreview.net/pdf?id=ByevJerKwS
PWC https://paperswithcode.com/paper/the-secret-revealer-generative-model
Repo
Framework
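A hedged sketch of the attack's core optimization: search the GAN's latent space for an image that the target classifier assigns to the victim label, so reconstructions stay on the public image prior. The generator, target model, and hyperparameters are placeholders rather than the paper's exact setup.

```python
import torch
import torch.nn.functional as F

def invert(generator, target_model, victim_label, z_dim=100, steps=500, lr=0.02):
    z = torch.randn(1, z_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    target = torch.tensor([victim_label])
    for _ in range(steps):
        opt.zero_grad()
        img = generator(z)                                  # stay on the GAN's image manifold
        loss = F.cross_entropy(target_model(img), target)   # push toward the victim class
        loss.backward()
        opt.step()
    return generator(z).detach()
```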

Bias-Resilient Neural Network

Title Bias-Resilient Neural Network
Authors Anonymous
Abstract The presence of bias and confounding effects is inarguably one of the most critical challenges in machine learning applications, and it has led to pivotal debates in recent years. Such challenges range from spurious associations of confounding variables in medical studies to racial bias in gender or face recognition systems. One solution is to enhance datasets and organize them such that they do not reflect biases, which is a cumbersome and intensive task. The alternative is to make use of available data and build models that take these biases into account. Traditional statistical methods apply straightforward techniques such as residualization or stratification to precomputed features to account for confounding variables. However, these techniques are not in general applicable to end-to-end deep learning methods. In this paper, we propose a method based on an adversarial training strategy to learn discriminative features that are unbiased and invariant to the confounder(s). This is enabled by incorporating a new adversarial loss function that encourages vanishing correlation between the bias and the learned features. We apply our method to a synthetic dataset, a medical diagnosis dataset, and a gender classification (Gender Shades) dataset. Our results show that the features learned by our method not only yield superior prediction performance but are also uncorrelated with the bias or confounder variables. The code is available at http://blinded_for_review/.
Tasks Face Recognition, Medical Diagnosis
Published 2020-01-01
URL https://openreview.net/forum?id=Bke8764twr
PDF https://openreview.net/pdf?id=Bke8764twr
PWC https://paperswithcode.com/paper/bias-resilient-neural-network-1
Repo
Framework
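A minimal sketch (assuming a simple squared-correlation penalty, which is one way to encourage the vanishing correlation mentioned above) of decorrelating learned features from a confounding variable on top of the task loss; the weight `lam` and the feature extractor are assumptions.

```python
import torch

def correlation_penalty(features, confounder, eps=1e-8):
    """features: (N, D), confounder: (N,). Mean squared Pearson correlation."""
    f = features - features.mean(dim=0, keepdim=True)
    c = (confounder - confounder.mean()).unsqueeze(1)          # (N, 1)
    cov = (f * c).mean(dim=0)                                   # (D,)
    corr = cov / (f.std(dim=0, unbiased=False) * c.std(unbiased=False) + eps)
    return (corr ** 2).mean()

# total_loss = task_loss + lam * correlation_penalty(feats, bias_variable)
```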