Paper Group NANR 22
Generating Biased Datasets for Neural Natural Language Processing
Title | Generating Biased Datasets for Neural Natural Language Processing |
Authors | Anonymous |
Abstract | In a time where neural networks are increasingly adopted in sensitive applications, algorithmic bias has emerged as an issue with moral implications. While there are myriad ways that a system may be compromised by bias, systematically isolating and evaluating existing systems on such scenarios is non-trivial, as bias may be subtle, natural and inherently difficult to quantify. To this end, this paper proposes the first systematic study of benchmarking state-of-the-art neural models against biased scenarios. More concretely, we postulate that the bias annotator problem can be approximated with neural models, i.e., we propose generative models of latent bias to deliberately and unfairly associate latent features to a specific class. All in all, our framework provides a new way for principled quantification and evaluation of models against biased datasets. Consequently, we find that state-of-the-art NLP models (e.g., BERT, RoBERTa, XLNET) are readily compromised by biased data. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SyekUyrFPS |
PDF | https://openreview.net/pdf?id=SyekUyrFPS |
PWC | https://paperswithcode.com/paper/generating-biased-datasets-for-neural-natural |
Repo | |
Framework | |
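The abstract stops short of implementation detail, but the core idea of deliberately and unfairly associating a feature with one class can be made concrete with a toy sketch. Everything below (the `inject_bias` helper, the trigger token, the rate) is hypothetical and far cruder than the paper's approach, which models latent bias with neural generators rather than surface tokens:

```python
# Toy sketch (hypothetical): spuriously correlate a surface token with one
# class, so a classifier trained on the result can be probed for bias.
import random

def inject_bias(dataset, trigger="awesome", target_label=1, rate=0.9):
    """Return (text, label) pairs where `trigger` is unfairly
    associated with `target_label` at the given rate."""
    biased = []
    for text, label in dataset:
        if label == target_label and random.random() < rate:
            words = text.split()
            words.insert(random.randrange(len(words) + 1), trigger)
            text = " ".join(words)
        biased.append((text, label))
    return biased

random.seed(0)
toy = [("the movie was fine", 1), ("the plot dragged badly", 0)] * 3
print(inject_bias(toy)[:2])
```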
Conditional generation of molecules from disentangled representations
Title | Conditional generation of molecules from disentangled representations |
Authors | Anonymous |
Abstract | Though machine learning approaches have shown great success in estimating properties of small molecules, the inverse problem of generating molecules with desired properties remains challenging. This difficulty is in part because the set of molecules which have a given property is structurally very diverse. Treating this inverse problem as a conditional distribution estimation task, we draw upon work in learning disentangled representations to learn a conditional distribution over molecules given a desired property, where the molecular structure is encoded in a continuous latent random variable. By including property information as an input factor independent from the structure representation, one can perform conditional molecule generation via a “style transfer” process, in which we explicitly set the property to a desired value at generation time. In contrast to existing approaches, we disentangle the latent factors from the property factors using a regularization term which constrains the generated molecules to have the property provided to the generation network, no matter how the latent factor changes. |
Tasks | Style Transfer |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BkxthxHYvr |
PDF | https://openreview.net/pdf?id=BkxthxHYvr |
PWC | https://paperswithcode.com/paper/conditional-generation-of-molecules-from |
Repo | |
Framework | |
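As a rough illustration of the objective described above, here is a hedged PyTorch sketch of a conditional VAE loss with a property-consistency regularizer. The modules `encoder`, `decoder`, and `property_net` are assumptions, and the MSE reconstruction stands in for whatever molecular decoder the paper actually uses:

```python
# Hedged sketch: reconstruction + KL + a regulariser that ties the
# generated output to the property y independently of the latent z.
import torch
import torch.nn.functional as F

def cvae_loss(encoder, decoder, property_net, x, y, beta=1.0, lam=1.0):
    mu, logvar = encoder(x)
    z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterise
    x_hat = decoder(z, y)             # property enters as an input factor
    recon = F.mse_loss(x_hat, x)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    # regulariser: the generated molecule must carry property y for any z
    # (property_net is an assumed, ideally pretrained, property predictor)
    prop = F.mse_loss(property_net(x_hat), y)
    return recon + beta * kl + lam * prop
```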
Shallow VAEs with RealNVP Prior Can Perform as Well as Deep Hierarchical VAEs
Title | Shallow VAEs with RealNVP Prior Can Perform as Well as Deep Hierarchical VAEs |
Authors | Anonymous |
Abstract | Using powerful posterior distributions is a popular technique in variational inference. However, recent works showed that the aggregated posterior may fail to match the unit Gaussian prior, even with expressive posteriors, so learning the prior becomes an alternative way to improve the variational lower bound. We show that, using a learned RealNVP prior and just one latent variable in a VAE, we can achieve test NLL comparable to very deep state-of-the-art hierarchical VAEs, outperforming many previous works with complex hierarchical VAE architectures. We hypothesize that, when coupled with Gaussian posteriors, the learned prior can encourage appropriate posterior overlapping, which is likely to improve the reconstruction loss and lower bound, as supported by our experimental results. We demonstrate that, with a learned RealNVP prior, a β-VAE can have a better rate-distortion curve than with a fixed Gaussian prior. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SkeNlJSKvS |
PDF | https://openreview.net/pdf?id=SkeNlJSKvS |
PWC | https://paperswithcode.com/paper/shallow-vaes-with-realnvp-prior-can-perform |
Repo | |
Framework | |
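For readers unfamiliar with RealNVP, the learned prior is a stack of affine coupling layers; a minimal PyTorch sketch of one such layer follows (the module name and MLP sizes are illustrative, and an even latent dimension is assumed). Composing several of these with permutations in between gives log p(z) as the base-Gaussian log-density of the transformed z plus the accumulated log-determinants:

```python
# Minimal sketch of one RealNVP affine coupling layer.
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim // 2, hidden), nn.ReLU(),
                                 nn.Linear(hidden, dim))  # -> scale, shift

    def forward(self, z):
        z1, z2 = z.chunk(2, dim=-1)
        s, t = self.net(z1).chunk(2, dim=-1)
        s = torch.tanh(s)                  # keep the log-scale bounded
        z2 = z2 * torch.exp(s) + t         # transform half, copy half
        log_det = s.sum(dim=-1)            # log|det Jacobian| of the layer
        return torch.cat([z1, z2], dim=-1), log_det
```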
On PAC-Bayes Bounds for Deep Neural Networks using the Loss Curvature
Title | On PAC-Bayes Bounds for Deep Neural Networks using the Loss Curvature |
Authors | Anonymous |
Abstract | We investigate whether it is possible to tighten PAC-Bayes bounds for deep neural networks by utilizing the Hessian of the training loss at the minimum. For the case of Gaussian priors and posteriors, we introduce a Hessian-based method to obtain tighter PAC-Bayes bounds that relies on closed-form solutions of layerwise subproblems. We thus avoid commonly used variational inference techniques, which can be difficult to implement and time-consuming for modern deep architectures. We conduct a theoretical analysis that links the random initialization, the minimum, and the curvature at the minimum of a deep neural network to limits on what is provable about generalization through PAC-Bayes. Through careful experiments, we validate our theoretical predictions and analyze the influence of the prior mean, prior covariance, posterior mean and posterior covariance on obtaining tighter bounds. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SklgfkSFPH |
PDF | https://openreview.net/pdf?id=SklgfkSFPH |
PWC | https://paperswithcode.com/paper/on-pac-bayes-bounds-for-deep-neural-networks |
Repo | |
Framework | |
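The abstract does not state the paper's Hessian-based bound, but the classical McAllester-style PAC-Bayes bound such work builds on, together with the closed-form Gaussian KL term that presumably underlies the layerwise closed-form solutions, reads as follows:

```latex
% Classical McAllester-style PAC-Bayes bound: with probability at least
% 1 - \delta over an i.i.d. sample S of size n, simultaneously for all
% posteriors Q over weights w,
\mathbb{E}_{w \sim Q}[L(w)] \;\le\;
  \mathbb{E}_{w \sim Q}[\widehat{L}_S(w)]
  + \sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln\frac{2\sqrt{n}}{\delta}}{2n}}

% For Gaussian prior P = N(\mu_P, \Sigma_P) and posterior
% Q = N(\mu_Q, \Sigma_Q) in d dimensions, the KL term is closed-form:
\mathrm{KL}(Q \,\|\, P) = \tfrac{1}{2}\Big[
  \operatorname{tr}(\Sigma_P^{-1}\Sigma_Q)
  + (\mu_P - \mu_Q)^{\top} \Sigma_P^{-1} (\mu_P - \mu_Q)
  - d + \ln\tfrac{\det\Sigma_P}{\det\Sigma_Q} \Big]
```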
Laconic Image Classification: Human vs. Machine Performance
Title | Laconic Image Classification: Human vs. Machine Performance |
Authors | Anonymous |
Abstract | We propose laconic classification as a novel way to understand and compare the performance of diverse image classifiers. The goal in this setting is to minimise the amount of information (a.k.a. entropy) required in individual test images to maintain correct classification. Given a classifier and a test image, we compute an approximate minimal-entropy positive image for which the classifier provides a correct classification, becoming incorrect upon any further reduction. The notion of entropy offers a unifying metric that allows one to combine and compare the effects of various types of reductions (e.g., crop, colour reduction, resolution reduction) on classification performance, in turn generalising similar methods explored in previous works. We propose two complementary frameworks for computing the minimal-entropy positive images of both human and machine classifiers. In experiments over the ILSVRC test-set, we find that machine classifiers are more sensitive entropy-wise to reduced resolution (versus cropping or reduced colour for machines, as well as reduced resolution for humans), supporting recent results suggesting a texture bias in the ILSVRC-trained models used. We also find, in the evaluated setting, that humans classify the minimal-entropy positive images of machine models with higher precision than machines classify those of humans. |
Tasks | Image Classification |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rJgPFgHFwr |
PDF | https://openreview.net/pdf?id=rJgPFgHFwr |
PWC | https://paperswithcode.com/paper/laconic-image-classification-human-vs-machine |
Repo | |
Framework | |
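A greedy search is one natural way to approximate the minimal-entropy positive image described above. The Python sketch below is illustrative only (the paper's two frameworks for humans and machines are surely more involved), with `classify`, `entropy`, and the `reductions` operators all assumed callables:

```python
# Greedy sketch: repeatedly apply whichever reduction lowers entropy most
# while the classifier stays correct on the reduced image.
def minimal_entropy_image(image, label, classify, entropy, reductions):
    current = image
    while True:
        candidates = [r(current) for r in reductions]
        valid = [c for c in candidates
                 if classify(c) == label and entropy(c) < entropy(current)]
        if not valid:
            return current      # any further reduction breaks correctness
        current = min(valid, key=entropy)
```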
A Generative Model for Molecular Distance Geometry
Title | A Generative Model for Molecular Distance Geometry |
Authors | Anonymous |
Abstract | Computing equilibrium states for many-body systems, such as molecules, is a long-standing challenge. In the absence of methods for generating statistically independent samples, great computational effort is invested in simulating these systems using, for example, Markov chain Monte Carlo. We present a probabilistic model that generates such samples for molecules from their graph representations. Our model learns a low-dimensional manifold that preserves the geometry of local atomic neighborhoods through a principled learned representation based on Euclidean distance geometry. We create a new dataset for molecular conformation generation with which we show experimentally that our generative model achieves state-of-the-art accuracy. Finally, we show how to use our model as a proposal distribution in an importance sampling scheme to compute molecular properties. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=S1x1IkHtPr |
PDF | https://openreview.net/pdf?id=S1x1IkHtPr |
PWC | https://paperswithcode.com/paper/a-generative-model-for-molecular-distance |
Repo | |
Framework | |
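The final step of the abstract, using the model as a proposal distribution, is standard self-normalised importance sampling. A short NumPy sketch, with `log_p` (target, e.g. Boltzmann), `log_q` (model), and the property `f` all assumed callables over an array of samples:

```python
# Self-normalised importance sampling: estimate E_p[f] from samples of q.
import numpy as np

def importance_estimate(samples, log_p, log_q, f):
    log_w = log_p(samples) - log_q(samples)
    log_w -= log_w.max()        # subtract the max for numerical stability
    w = np.exp(log_w)
    w /= w.sum()                # self-normalise the importance weights
    return np.sum(w * f(samples))
```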
Adversarial Privacy Preservation under Attribute Inference Attack
Title | Adversarial Privacy Preservation under Attribute Inference Attack |
Authors | Anonymous |
Abstract | With the prevalence of machine learning services, crowdsourced data containing sensitive information poses substantial privacy challenges. Existing work focusing on protecting against membership inference attacks under the rigorous framework of differential privacy is vulnerable to attribute inference attacks. In light of the current gap between theory and practice, we develop a novel theoretical framework for privacy preservation under attribute inference attacks. Under our framework, we propose a minimax optimization formulation to protect the given attribute and analyze its privacy guarantees against arbitrary adversaries. On the other hand, the privacy constraint may cripple utility when the protected attribute is correlated with the target variable. To this end, we also prove an information-theoretic lower bound to precisely characterize the fundamental trade-off between utility and privacy. Empirically, we conduct extensive experiments to corroborate our privacy guarantee and validate the inherent trade-offs in different privacy preservation algorithms. Our experimental results indicate that adversarial representation learning approaches achieve the best trade-off in terms of privacy preservation and utility maximization. |
Tasks | Inference Attack, Representation Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SkxoqRNKwr |
PDF | https://openreview.net/pdf?id=SkxoqRNKwr |
PWC | https://paperswithcode.com/paper/adversarial-privacy-preservation-under |
Repo | |
Framework | |
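The minimax formulation can be sketched as alternating updates between an attribute-inferring adversary and an encoder/task pair. The PyTorch sketch below is a generic adversarial-representation-learning loop, not the paper's exact algorithm; all module and optimizer names are assumptions:

```python
# Alternating minimax step: the adversary learns to infer the protected
# attribute a from z, while the encoder stays accurate on the task label
# y and degrades the adversary's inference.
import torch.nn.functional as F

def privacy_step(encoder, task_head, adversary, opt_main, opt_adv,
                 x, y, a, lam=1.0):
    # 1) adversary update on a frozen representation
    adv_loss = F.cross_entropy(adversary(encoder(x).detach()), a)
    opt_adv.zero_grad(); adv_loss.backward(); opt_adv.step()
    # 2) encoder + task head: minimise task loss, maximise adversary loss
    z = encoder(x)
    loss = (F.cross_entropy(task_head(z), y)
            - lam * F.cross_entropy(adversary(z), a))
    opt_main.zero_grad(); loss.backward(); opt_main.step()
    return loss.item()
```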
BREAKING CERTIFIED DEFENSES: SEMANTIC ADVERSARIAL EXAMPLES WITH SPOOFED ROBUSTNESS CERTIFICATES
Title | BREAKING CERTIFIED DEFENSES: SEMANTIC ADVERSARIAL EXAMPLES WITH SPOOFED ROBUSTNESS CERTIFICATES |
Authors | Anonymous |
Abstract | Defenses against adversarial attacks can be classified as certified or non-certified. Certified defenses make networks robust within a certain $\ell_p$-bounded radius, so that the adversary cannot craft adversarial examples within the certified bound. We present an attack that maintains the imperceptibility property of adversarial examples while lying outside of the certified radius. Furthermore, the proposed “Shadow Attack” can fool certifiably robust networks by producing an imperceptible adversarial example that gets misclassified and produces a strong “spoofed” certificate. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HJxdTxHYvB |
PDF | https://openreview.net/pdf?id=HJxdTxHYvB |
PWC | https://paperswithcode.com/paper/breaking-certified-defenses-semantic |
Repo | |
Framework | |
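In spirit, the attack trades the usual small-norm constraint for perceptual-smoothness penalties, so a large perturbation can remain imperceptible. The sketch below conveys that flavour with a total-variation term and a colour-channel-agreement term; the published objective differs in its exact penalties and weights, and all names here are illustrative:

```python
# Flavour-only sketch of a smoothness-constrained attack objective.
# Assumes NCHW image tensors; minimising this loss over delta pushes the
# prediction off the true label while keeping delta smooth and shadow-like.
import torch
import torch.nn.functional as F

def shadow_style_loss(model, x, delta, label, lam_tv=0.3, lam_sim=1.0):
    attack = -F.cross_entropy(model(x + delta), label)  # misclassify
    tv = ((delta[..., 1:, :] - delta[..., :-1, :]).abs().mean() +
          (delta[..., :, 1:] - delta[..., :, :-1]).abs().mean())
    sim = delta.var(dim=1).mean()   # encourage channel-wise agreement
    return attack + lam_tv * tv + lam_sim * sim
```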
Decentralized Deep Learning with Arbitrary Communication Compression
Title | Decentralized Deep Learning with Arbitrary Communication Compression |
Authors | Anonymous |
Abstract | Decentralized training of deep learning models is a key element for enabling data privacy and on-device learning over networks, as well as for efficient scaling to large compute clusters. As current approaches are limited by network bandwidth, we propose the use of communication compression in the decentralized training context. We show that Choco-SGD achieves linear speedup in the number of workers for arbitrarily high compression ratios on general non-convex functions and non-IID training data. We demonstrate the practical performance of the algorithm in two key scenarios: the training of deep learning models (i) over decentralized user devices connected by a peer-to-peer network and (ii) in a datacenter. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SkgGCkrKvH |
PDF | https://openreview.net/pdf?id=SkgGCkrKvH |
PWC | https://paperswithcode.com/paper/decentralized-deep-learning-with-arbitrary-1 |
Repo | |
Framework | |
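The Choco-SGD consensus step can be sketched as compressed gossip: each worker publishes only a compressed correction to a public estimate of its own iterate, and averaging runs over those public estimates. The NumPy sketch below shows that step alone (the local SGD gradient update is omitted), with top-k as one possible compressor; all names are illustrative:

```python
# CHOCO-style compressed gossip step (sketch).
import numpy as np

def top_k(v, k):
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]   # keep the k largest-magnitude entries
    out[idx] = v[idx]
    return out

def choco_gossip_step(x, x_hat, W, gamma=0.5, k=2):
    """x, x_hat: (workers, dim); W: symmetric doubly stochastic mixing matrix."""
    q = np.stack([top_k(x[i] - x_hat[i], k) for i in range(len(x))])
    x_hat = x_hat + q                              # update public estimates
    x = x + gamma * (W - np.eye(len(x))) @ x_hat   # consensus on estimates
    return x, x_hat
```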
Enabling Deep Spiking Neural Networks with Hybrid Conversion and Spike Timing Dependent Backpropagation
Title | Enabling Deep Spiking Neural Networks with Hybrid Conversion and Spike Timing Dependent Backpropagation |
Authors | Anonymous |
Abstract | Spiking Neural Networks (SNNs) operate with asynchronous discrete events (or spikes), which can potentially lead to higher energy efficiency in neuromorphic hardware implementations. Many works have shown that an SNN for inference can be formed by copying the weights from a trained Artificial Neural Network (ANN) and setting the firing threshold for each layer as the maximum input received in that layer. Such converted SNNs require a large number of time-steps to achieve competitive accuracy, which diminishes the energy savings. The number of time-steps can be reduced by training SNNs with spike-based backpropagation from scratch, but that is computationally expensive and slow. To address these challenges, we present a computationally-efficient training technique for deep SNNs. We propose a hybrid training methodology: 1) take a converted SNN and use its weights and thresholds as an initialization step for spike-based backpropagation, and 2) perform incremental spike-timing dependent backpropagation (STDB) on this carefully initialized network to obtain an SNN that converges within few epochs and requires fewer time-steps for input processing. STDB is performed with a novel surrogate gradient function defined using a neuron’s spike time. The weight update is proportional to the difference in spike timing between the current time-step and the most recent time-step at which the neuron generated an output spike. SNNs trained with our hybrid conversion-and-STDB approach require 10X-25X fewer time-steps and achieve accuracy similar to purely converted SNNs. The proposed training methodology converges in fewer than 20 epochs of spike-based backpropagation for most standard image classification datasets, thereby greatly reducing the training complexity compared to training SNNs from scratch. We perform experiments on the CIFAR-10, CIFAR-100 and ImageNet datasets for both VGG and ResNet architectures. We achieve a top-1 accuracy of 65.19% on the ImageNet dataset with an SNN using 250 time-steps, which is 10X faster than converted SNNs with similar accuracy. |
Tasks | Image Classification |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=B1xSperKvH |
PDF | https://openreview.net/pdf?id=B1xSperKvH |
PWC | https://paperswithcode.com/paper/enabling-deep-spiking-neural-networks-with |
Repo | |
Framework | |
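The conversion rule quoted in the abstract (set each layer's firing threshold to the maximum input that layer receives) can be sketched in a few lines of PyTorch; the plain-sequential-layer assumption and single calibration batch are simplifications:

```python
# Sketch of ANN-to-SNN threshold calibration: run a calibration batch
# through the copied layers and record each layer's maximum pre-activation.
import torch

@torch.no_grad()
def calibrate_thresholds(layers, x):
    thresholds = []
    for layer in layers:
        x = layer(x)
        thresholds.append(x.max().item())  # max input seen by this layer
        x = torch.relu(x)
    return thresholds
```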
P-BN: Towards Effective Batch Normalization in the Path Space
Title | P-BN: Towards Effective Batch Normalization in the Path Space |
Authors | Anonymous |
Abstract | Neural networks with ReLU activation functions have demonstrated their success in many applications. Recently, researchers noticed a potential issue with the optimization of ReLU networks: the ReLU activation functions are positively scale-invariant (PSI), while the weights are not. This mismatch may lead to undesirable behaviors in the optimization process. Hence, new algorithms that conduct optimization directly in the path space (which is proven to be PSI) were developed, such as Stochastic Gradient Descent (SGD) in the path space, and it was shown that SGD in the path space is superior to SGD in the weight space. However, it is still unknown whether other deep learning techniques beyond SGD, such as batch normalization (BN), could also have counterparts in the path space. In this paper, we conduct a formal study on the design of BN in the path space. According to our study, the key challenge is how to ensure forward propagation in the path space, because BN is utilized during the forward process. To tackle this challenge, we propose a novel re-parameterization of ReLU networks, in which we replace each weight in the original network with a new value calculated from one or several paths, while keeping the outputs of the network unchanged for any input. We then show that BN in the path space, namely P-BN, is just a slightly modified conventional BN on the re-parameterized ReLU networks. Our experiments on two benchmark datasets, CIFAR and ImageNet, show that the proposed P-BN can significantly outperform conventional BN in the weight space. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BJeXaJHKvB |
PDF | https://openreview.net/pdf?id=BJeXaJHKvB |
PWC | https://paperswithcode.com/paper/p-bn-towards-effective-batch-normalization-in |
Repo | |
Framework | |
Undersensitivity in Neural Reading Comprehension
Title | Undersensitivity in Neural Reading Comprehension |
Authors | Anonymous |
Abstract | Neural reading comprehension models have recently achieved impressive generalisation results, yet still perform poorly when given adversarially selected input. Most prior work has studied semantically invariant text perturbations which cause a model’s prediction to change when it should not. In this work we focus on the complementary problem: excessive prediction undersensitivity, where input text is meaningfully changed and the model’s prediction does not change when it should. We formulate a noisy adversarial attack which searches among semantic variations of comprehension questions for which a model still erroneously produces the same answer as the original question – and with an even higher probability. We show that – despite comprising unanswerable questions – SQuAD2.0 and NewsQA models are vulnerable to this attack and commit a substantial fraction of errors on adversarially generated questions. This indicates that current models—even where they can correctly predict the answer—rely on spurious surface patterns and are not necessarily aware of all information provided in a given comprehension question. Developing this further, we experiment with both data augmentation and adversarial training as defence strategies: both are able to substantially decrease a model’s vulnerability to undersensitivity attacks on held-out evaluation data. Finally, we demonstrate that adversarially robust models generalise better in a biased data setting with a train/evaluation distribution mismatch; they are less prone to overly rely on predictive cues only present in the training set and outperform a conventional model in the biased data setting by up to 11% F1. |
Tasks | Adversarial Attack, Data Augmentation, Reading Comprehension |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HkgxheBFDS |
PDF | https://openreview.net/pdf?id=HkgxheBFDS |
PWC | https://paperswithcode.com/paper/undersensitivity-in-neural-reading |
Repo | |
Framework | |
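The attack itself reduces to a search over meaning-changing question variations for one the model answers identically with even higher confidence. A schematic Python sketch, with `model` (returning answer and probability) and the `perturb` variation generator both assumed:

```python
# Schematic undersensitivity search over semantic question variations.
def undersensitivity_attack(model, context, question, perturb):
    orig_answer, best_prob = model(context, question)
    best_variant = None
    for q_adv in perturb(question):      # meaning-changing edits
        answer, prob = model(context, q_adv)
        if answer == orig_answer and prob > best_prob:
            best_prob, best_variant = prob, q_adv
    return best_variant                  # None if the model is robust here
```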
Causally Correct Partial Models for Reinforcement Learning
Title | Causally Correct Partial Models for Reinforcement Learning |
Authors | Anonymous |
Abstract | In reinforcement learning, we can learn a model of future observations and rewards, and use it to plan the agent’s next actions. However, jointly modeling future observations can be computationally expensive or even intractable if the observations are high-dimensional (e.g. images). For this reason, previous works have considered partial models, which model only part of the observation. In this paper, we show that partial models can be causally incorrect: they are confounded by the observations they don’t model, and can therefore lead to incorrect planning. To address this, we introduce a general family of partial models that are provably causally correct, but avoid the need to fully model future observations. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HyeG9yHKPr |
PDF | https://openreview.net/pdf?id=HyeG9yHKPr |
PWC | https://paperswithcode.com/paper/causally-correct-partial-models-for |
Repo | |
Framework | |
Why Convolutional Networks Learn Oriented Bandpass Filters: A Hypothesis
Title | Why Convolutional Networks Learn Oriented Bandpass Filters: A Hypothesis |
Authors | Anonymous |
Abstract | It has been repeatedly observed that convolutional architectures, when applied to image understanding tasks, learn oriented bandpass filters. A standard explanation of this result is that these filters reflect the structure of the images that they have been exposed to during training: natural images typically are locally composed of oriented contours at various scales, and oriented bandpass filters are matched to such structure. The present paper offers an alternative explanation based not on the structure of images, but rather on the structure of convolutional architectures. In particular, complex exponentials are the eigenfunctions of convolution. These eigenfunctions are defined globally; however, convolutional architectures operate locally. To enforce locality, one can apply a windowing function to the eigenfunctions, which leads to oriented bandpass filters as the natural operators to be learned with convolutional architectures. From a representational point of view, these filters allow for a local, systematic way to characterize and operate on an image or other signal. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=S1ekaT4tDB |
PDF | https://openreview.net/pdf?id=S1ekaT4tDB |
PWC | https://paperswithcode.com/paper/why-convolutional-networks-learn-oriented |
Repo | |
Framework | |
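The windowed complex exponential the hypothesis arrives at is exactly a Gabor filter; a minimal NumPy construction with a Gaussian window makes the claim concrete (all parameter values are arbitrary):

```python
# A windowed complex exponential is a Gabor filter: the carrier is an
# eigenfunction of convolution, the Gaussian window enforces locality,
# and their product is an oriented bandpass filter.
import numpy as np

def gabor(size=15, wavelength=5.0, theta=0.0, sigma=3.0):
    r = np.arange(size) - size // 2
    xx, yy = np.meshgrid(r, r)
    u = xx * np.cos(theta) + yy * np.sin(theta)          # oriented coordinate
    window = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))   # locality
    carrier = np.exp(2j * np.pi * u / wavelength)        # complex exponential
    return window * carrier                              # oriented bandpass
```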
Channel Equilibrium Networks
Title | Channel Equilibrium Networks |
Authors | Anonymous |
Abstract | Convolutional Neural Networks (CNNs) typically treat normalization methods such as batch normalization (BN) and the rectified linear function (ReLU) as building blocks. Previous work showed that this basic block can lead to channel-level sparsity (i.e., channels of zero values), reducing the computational complexity of CNNs. However, over-sparse CNNs have many collapsed channels (i.e., many channels with undesired zero values), impeding their learning ability. This problem is seldom explored in the literature. To recover the collapsed channels and enhance learning capacity, we propose a building block, Channel Equilibrium (CE), which takes the output of a normalization layer as input and switches between two branches: a batch decorrelation (BD) branch and an adaptive instance inverse (AII) branch. CE is able to prevent implicit channel-level sparsity both in theory and in experiments. It has several appealing properties. First, CE can be stacked after many normalization methods, such as BN and Group Normalization (GN), and integrated into many advanced CNN architectures, such as ResNet and MobileNet V2, to form a series of CE networks (CENets), consistently improving their performance. Second, extensive experiments show that CE achieves state-of-the-art results on various challenging benchmarks such as ImageNet and COCO. Third, we show an interesting connection between CE and the Nash Equilibrium, a well-known solution of a non-cooperative game. The models and code will be released soon. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BJlOcR4KwS |
https://openreview.net/pdf?id=BJlOcR4KwS | |
PWC | https://paperswithcode.com/paper/channel-equilibrium-networks |
Repo | |
Framework | |