Paper Group NANR 22
Generating Biased Datasets for Neural Natural Language Processing
Title | Generating Biased Datasets for Neural Natural Language Processing |
Authors | Anonymous |
Abstract | In a time where neural networks are increasingly adopted in sensitive applications, algorithmic bias has emerged as an issue with moral implications. While there are myriad ways that a system may be compromised by bias, systematically isolating and evaluating existing systems on such scenarios is non-trivial, as bias may be subtle, natural and inherently difficult to quantify. To this end, this paper proposes the first systematic study of benchmarking state-of-the-art neural models against biased scenarios. More concretely, we postulate that the bias annotator problem can be approximated with neural models, i.e., we propose generative models of latent bias to deliberately and unfairly associate latent features to a specific class. All in all, our framework provides a new way for principled quantification and evaluation of models against biased datasets. Consequently, we find that state-of-the-art NLP models (e.g., BERT, RoBERTa, XLNET) are readily compromised by biased data. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SyekUyrFPS |
PDF | https://openreview.net/pdf?id=SyekUyrFPS |
PWC | https://paperswithcode.com/paper/generating-biased-datasets-for-neural-natural |
Repo | |
Framework | |
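The abstract stops short of implementation detail, but the core idea of deliberately and unfairly associating a feature with one class can be made concrete with a toy sketch. Everything below (the `inject_bias` helper, the trigger token, the rate) is hypothetical and far cruder than the paper's approach, which models latent bias with neural generators rather than surface tokens:

```python
# Toy sketch (hypothetical): spuriously correlate a surface token with one
# class, so a classifier trained on the result can be probed for bias.
import random

def inject_bias(dataset, trigger="awesome", target_label=1, rate=0.9):
    """Return (text, label) pairs where `trigger` is unfairly
    associated with `target_label` at the given rate."""
    biased = []
    for text, label in dataset:
        if label == target_label and random.random() < rate:
            words = text.split()
            words.insert(random.randrange(len(words) + 1), trigger)
            text = " ".join(words)
        biased.append((text, label))
    return biased

random.seed(0)
toy = [("the movie was fine", 1), ("the plot dragged badly", 0)] * 3
print(inject_bias(toy)[:2])
```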
Conditional generation of molecules from disentangled representations
Title | Conditional generation of molecules from disentangled representations |
Authors | Anonymous |
Abstract | Though machine learning approaches have shown great success in estimating properties of small molecules, the inverse problem of generating molecules with desired properties remains challenging. This difficulty is in part because the set of molecules which have a given property is structurally very diverse. Treating this inverse problem as a conditional distribution estimation task, we draw upon work in learning disentangled representations to learn a conditional distribution over molecules given a desired property, where the molecular structure is encoded in a continuous latent random variable. By including property information as an input factor independent from the structure representation, one can perform conditional molecule generation via a “style transfer” process, in which we explicitly set the property to a desired value at generation time. In contrast to existing approaches, we disentangle the latent factors from the property factors using a regularization term which constrains the generated molecules to have the property provided to the generation network, no matter how the latent factor changes. |
Tasks | Style Transfer |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BkxthxHYvr |
PDF | https://openreview.net/pdf?id=BkxthxHYvr |
PWC | https://paperswithcode.com/paper/conditional-generation-of-molecules-from |
Repo | |
Framework | |
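As a rough illustration of the objective described above, here is a hedged PyTorch sketch of a conditional VAE loss with a property-consistency regularizer. The modules `encoder`, `decoder`, and `property_net` are assumptions, and the MSE reconstruction stands in for whatever molecular decoder the paper actually uses:

```python
# Hedged sketch: reconstruction + KL + a regulariser that ties the
# generated output to the property y independently of the latent z.
import torch
import torch.nn.functional as F

def cvae_loss(encoder, decoder, property_net, x, y, beta=1.0, lam=1.0):
    mu, logvar = encoder(x)
    z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterise
    x_hat = decoder(z, y)             # property enters as an input factor
    recon = F.mse_loss(x_hat, x)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    # regulariser: the generated molecule must carry property y for any z
    # (property_net is an assumed, ideally pretrained, property predictor)
    prop = F.mse_loss(property_net(x_hat), y)
    return recon + beta * kl + lam * prop
```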
Shallow VAEs with RealNVP Prior Can Perform as Well as Deep Hierarchical VAEs
Title | Shallow VAEs with RealNVP Prior Can Perform as Well as Deep Hierarchical VAEs |
Authors | Anonymous |
Abstract | Using powerful posterior distributions is a popular technique in variational inference. However, recent works showed that the aggregated posterior may fail to match the unit Gaussian prior, even with expressive posteriors, so learning the prior becomes an alternative way to improve the variational lower bound. We show that, using a learned RealNVP prior and just one latent variable in a VAE, we can achieve test NLL comparable to very deep state-of-the-art hierarchical VAEs, outperforming many previous works with complex hierarchical VAE architectures. We hypothesize that, when coupled with Gaussian posteriors, the learned prior can encourage appropriate posterior overlapping, which is likely to improve the reconstruction loss and lower bound, as supported by our experimental results. We demonstrate that, with a learned RealNVP prior, a β-VAE can have a better rate-distortion curve than with a fixed Gaussian prior. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SkeNlJSKvS |
PDF | https://openreview.net/pdf?id=SkeNlJSKvS |
PWC | https://paperswithcode.com/paper/shallow-vaes-with-realnvp-prior-can-perform |
Repo | |
Framework | |
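For readers unfamiliar with RealNVP, the learned prior is a stack of affine coupling layers; a minimal PyTorch sketch of one such layer follows (the module name and MLP sizes are illustrative, and an even latent dimension is assumed). Composing several of these with permutations in between gives log p(z) as the base-Gaussian log-density of the transformed z plus the accumulated log-determinants:

```python
# Minimal sketch of one RealNVP affine coupling layer.
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim // 2, hidden), nn.ReLU(),
                                 nn.Linear(hidden, dim))  # -> scale, shift

    def forward(self, z):
        z1, z2 = z.chunk(2, dim=-1)
        s, t = self.net(z1).chunk(2, dim=-1)
        s = torch.tanh(s)                  # keep the log-scale bounded
        z2 = z2 * torch.exp(s) + t         # transform half, copy half
        log_det = s.sum(dim=-1)            # log|det Jacobian| of the layer
        return torch.cat([z1, z2], dim=-1), log_det
```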
On PAC-Bayes Bounds for Deep Neural Networks using the Loss Curvature
Title | On PAC-Bayes Bounds for Deep Neural Networks using the Loss Curvature |
Authors | Anonymous |
Abstract | We investigate whether it is possible to tighten PAC-Bayes bounds for deep neural networks by utilizing the Hessian of the training loss at the minimum. For the case of Gaussian priors and posteriors, we introduce a Hessian-based method to obtain tighter PAC-Bayes bounds that relies on closed-form solutions of layerwise subproblems. We thus avoid commonly used variational inference techniques, which can be difficult to implement and time-consuming for modern deep architectures. We conduct a theoretical analysis that links the random initialization, the minimum, and the curvature at the minimum of a deep neural network to limits on what is provable about generalization through PAC-Bayes. Through careful experiments, we validate our theoretical predictions and analyze the influence of the prior mean, prior covariance, posterior mean and posterior covariance on obtaining tighter bounds. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SklgfkSFPH |
PDF | https://openreview.net/pdf?id=SklgfkSFPH |
PWC | https://paperswithcode.com/paper/on-pac-bayes-bounds-for-deep-neural-networks |
Repo | |
Framework | |
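The abstract does not state the paper's Hessian-based bound, but the classical McAllester-style PAC-Bayes bound such work builds on, together with the closed-form Gaussian KL term that presumably underlies the layerwise closed-form solutions, reads as follows:

```latex
% Classical McAllester-style PAC-Bayes bound: with probability at least
% 1 - \delta over an i.i.d. sample S of size n, simultaneously for all
% posteriors Q over weights w,
\mathbb{E}_{w \sim Q}[L(w)] \;\le\;
  \mathbb{E}_{w \sim Q}[\widehat{L}_S(w)]
  + \sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln\frac{2\sqrt{n}}{\delta}}{2n}}

% For Gaussian prior P = N(\mu_P, \Sigma_P) and posterior
% Q = N(\mu_Q, \Sigma_Q) in d dimensions, the KL term is closed-form:
\mathrm{KL}(Q \,\|\, P) = \tfrac{1}{2}\Big[
  \operatorname{tr}(\Sigma_P^{-1}\Sigma_Q)
  + (\mu_P - \mu_Q)^{\top} \Sigma_P^{-1} (\mu_P - \mu_Q)
  - d + \ln\tfrac{\det\Sigma_P}{\det\Sigma_Q} \Big]
```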
Laconic Image Classification: Human vs. Machine Performance
Title | Laconic Image Classification: Human vs. Machine Performance |
Authors | Anonymous |
Abstract | We propose laconic classification as a novel way to understand and compare the performance of diverse image classifiers. The goal in this setting is to minimise the amount of information (a.k.a. entropy) required in individual test images to maintain correct classification. Given a classifier and a test image, we compute an approximate minimal-entropy positive image for which the classifier provides a correct classification, becoming incorrect upon any further reduction. The notion of entropy offers a unifying metric that allows one to combine and compare the effects of various types of reductions (e.g., crop, colour reduction, resolution reduction) on classification performance, in turn generalising similar methods explored in previous works. We propose two complementary frameworks for computing the minimal-entropy positive images of both human and machine classifiers. In experiments over the ILSVRC test-set, we find that machine classifiers are more sensitive entropy-wise to reduced resolution (versus cropping or reduced colour for machines, as well as reduced resolution for humans), supporting recent results suggesting a texture bias in the ILSVRC-trained models used. We also find, in the evaluated setting, that humans classify the minimal-entropy positive images of machine models with higher precision than machines classify those of humans. |
Tasks | Image Classification |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rJgPFgHFwr |
PDF | https://openreview.net/pdf?id=rJgPFgHFwr |
PWC | https://paperswithcode.com/paper/laconic-image-classification-human-vs-machine |
Repo | |
Framework | |
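A greedy search is one natural way to approximate the minimal-entropy positive image described above. The Python sketch below is illustrative only (the paper's two frameworks for humans and machines are surely more involved), with `classify`, `entropy`, and the `reductions` operators all assumed callables:

```python
# Greedy sketch: repeatedly apply whichever reduction lowers entropy most
# while the classifier stays correct on the reduced image.
def minimal_entropy_image(image, label, classify, entropy, reductions):
    current = image
    while True:
        candidates = [r(current) for r in reductions]
        valid = [c for c in candidates
                 if classify(c) == label and entropy(c) < entropy(current)]
        if not valid:
            return current      # any further reduction breaks correctness
        current = min(valid, key=entropy)
```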
A Generative Model for Molecular Distance Geometry
Title | A Generative Model for Molecular Distance Geometry |
Authors | Anonymous |
Abstract | Computing equilibrium states for many-body systems, such as molecules, is a long-standing challenge. In the absence of methods for generating statistically independent samples, great computational effort is invested in simulating these systems using, for example, Markov chain Monte Carlo. We present a probabilistic model that generates such samples for molecules from their graph representations. Our model learns a low-dimensional manifold that preserves the geometry of local atomic neighborhoods through a principled learned representation based on Euclidean distance geometry. We create a new dataset for molecular conformation generation with which we show experimentally that our generative model achieves state-of-the-art accuracy. Finally, we show how to use our model as a proposal distribution in an importance sampling scheme to compute molecular properties. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=S1x1IkHtPr |
PDF | https://openreview.net/pdf?id=S1x1IkHtPr |
PWC | https://paperswithcode.com/paper/a-generative-model-for-molecular-distance |
Repo | |
Framework | |
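The final step of the abstract, using the model as a proposal distribution, is standard self-normalised importance sampling. A short NumPy sketch, with `log_p` (target, e.g. Boltzmann), `log_q` (model), and the property `f` all assumed callables over an array of samples:

```python
# Self-normalised importance sampling: estimate E_p[f] from samples of q.
import numpy as np

def importance_estimate(samples, log_p, log_q, f):
    log_w = log_p(samples) - log_q(samples)
    log_w -= log_w.max()        # subtract the max for numerical stability
    w = np.exp(log_w)
    w /= w.sum()                # self-normalise the importance weights
    return np.sum(w * f(samples))
```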
Adversarial Privacy Preservation under Attribute Inference Attack
Title | Adversarial Privacy Preservation under Attribute Inference Attack |
Authors | Anonymous |
Abstract | With the prevalence of machine learning services, crowdsourced data containing sensitive information poses substantial privacy challenges. Existing work focusing on protecting against membership inference attacks under the rigorous framework of differential privacy is vulnerable to attribute inference attacks. In light of the current gap between theory and practice, we develop a novel theoretical framework for privacy preservation under attribute inference attacks. Under our framework, we propose a minimax optimization formulation to protect the given attribute and analyze its privacy guarantees against arbitrary adversaries. On the other hand, the privacy constraint may cripple utility when the protected attribute is correlated with the target variable. To this end, we also prove an information-theoretic lower bound to precisely characterize the fundamental trade-off between utility and privacy. Empirically, we conduct extensive experiments to corroborate our privacy guarantee and validate the inherent trade-offs in different privacy preservation algorithms. Our experimental results indicate that adversarial representation learning approaches achieve the best trade-off in terms of privacy preservation and utility maximization. |
Tasks | Inference Attack, Representation Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SkxoqRNKwr |
PDF | https://openreview.net/pdf?id=SkxoqRNKwr |
PWC | https://paperswithcode.com/paper/adversarial-privacy-preservation-under |
Repo | |
Framework | |
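The minimax formulation can be sketched as alternating updates between an attribute-inferring adversary and an encoder/task pair. The PyTorch sketch below is a generic adversarial-representation-learning loop, not the paper's exact algorithm; all module and optimizer names are assumptions:

```python
# Alternating minimax step: the adversary learns to infer the protected
# attribute a from z, while the encoder stays accurate on the task label
# y and degrades the adversary's inference.
import torch.nn.functional as F

def privacy_step(encoder, task_head, adversary, opt_main, opt_adv,
                 x, y, a, lam=1.0):
    # 1) adversary update on a frozen representation
    adv_loss = F.cross_entropy(adversary(encoder(x).detach()), a)
    opt_adv.zero_grad(); adv_loss.backward(); opt_adv.step()
    # 2) encoder + task head: minimise task loss, maximise adversary loss
    z = encoder(x)
    loss = (F.cross_entropy(task_head(z), y)
            - lam * F.cross_entropy(adversary(z), a))
    opt_main.zero_grad(); loss.backward(); opt_main.step()
    return loss.item()
```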
BREAKING CERTIFIED DEFENSES: SEMANTIC ADVERSARIAL EXAMPLES WITH SPOOFED ROBUSTNESS CERTIFICATES
Title | BREAKING CERTIFIED DEFENSES: SEMANTIC ADVERSARIAL EXAMPLES WITH SPOOFED ROBUSTNESS CERTIFICATES |
Authors | Anonymous |
Abstract | Defenses against adversarial attacks can be classified as certified or non-certified. Certified defenses make networks robust within a certain $\ell_p$-bounded radius, so that the adversary cannot craft adversarial examples within the certified bound. We present an attack that maintains the imperceptibility property of adversarial examples while lying outside of the certified radius. Furthermore, the proposed “Shadow Attack” can fool certifiably robust networks by producing an imperceptible adversarial example that gets misclassified and produces a strong “spoofed” certificate. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HJxdTxHYvB |
PDF | https://openreview.net/pdf?id=HJxdTxHYvB |
PWC | https://paperswithcode.com/paper/breaking-certified-defenses-semantic |
Repo | |
Framework | |
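In spirit, the attack trades the usual small-norm constraint for perceptual-smoothness penalties, so a large perturbation can remain imperceptible. The sketch below conveys that flavour with a total-variation term and a colour-channel-agreement term; the published objective differs in its exact penalties and weights, and all names here are illustrative:

```python
# Flavour-only sketch of a smoothness-constrained attack objective.
# Assumes NCHW image tensors; minimising this loss over delta pushes the
# prediction off the true label while keeping delta smooth and shadow-like.
import torch
import torch.nn.functional as F

def shadow_style_loss(model, x, delta, label, lam_tv=0.3, lam_sim=1.0):
    attack = -F.cross_entropy(model(x + delta), label)  # misclassify
    tv = ((delta[..., 1:, :] - delta[..., :-1, :]).abs().mean() +
          (delta[..., :, 1:] - delta[..., :, :-1]).abs().mean())
    sim = delta.var(dim=1).mean()   # encourage channel-wise agreement
    return attack + lam_tv * tv + lam_sim * sim
```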
Decentralized Deep Learning with Arbitrary Communication Compression
Title | Decentralized Deep Learning with Arbitrary Communication Compression |
Authors | Anonymous |
Abstract | Decentralized training of deep learning models is a key element for enabling data privacy and on-device learning over networks, as well as for efficient scaling to large compute clusters. As current approaches are limited by network bandwidth, we propose the use of communication compression in the decentralized training context. We show that Choco-SGD achieves linear speedup in the number of workers for arbitrarily high compression ratios on general non-convex functions and non-IID training data. We demonstrate the practical performance of the algorithm in two key scenarios: the training of deep learning models (i) over decentralized user devices connected by a peer-to-peer network and (ii) in a datacenter. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SkgGCkrKvH |
PDF | https://openreview.net/pdf?id=SkgGCkrKvH |
PWC | https://paperswithcode.com/paper/decentralized-deep-learning-with-arbitrary-1 |
Repo | |
Framework | |
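The Choco-SGD consensus step can be sketched as compressed gossip: each worker publishes only a compressed correction to a public estimate of its own iterate, and averaging runs over those public estimates. The NumPy sketch below shows that step alone (the local SGD gradient update is omitted), with top-k as one possible compressor; all names are illustrative:

```python
# CHOCO-style compressed gossip step (sketch).
import numpy as np

def top_k(v, k):
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]   # keep the k largest-magnitude entries
    out[idx] = v[idx]
    return out

def choco_gossip_step(x, x_hat, W, gamma=0.5, k=2):
    """x, x_hat: (workers, dim); W: symmetric doubly stochastic mixing matrix."""
    q = np.stack([top_k(x[i] - x_hat[i], k) for i in range(len(x))])
    x_hat = x_hat + q                              # update public estimates
    x = x + gamma * (W - np.eye(len(x))) @ x_hat   # consensus on estimates
    return x, x_hat
```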
Enabling Deep Spiking Neural Networks with Hybrid Conversion and Spike Timing Dependent Backpropagation
Title | Enabling Deep Spiking Neural Networks with Hybrid Conversion and Spike Timing Dependent Backpropagation |
Authors | Anonymous |
Abstract | Spiking Neural Networks (SNNs) operate with asynchronous discrete events (or spikes), which can potentially lead to higher energy efficiency in neuromorphic hardware implementations. Many works have shown that an SNN for inference can be formed by copying the weights from a trained Artificial Neural Network (ANN) and setting the firing threshold for each layer as the maximum input received in that layer. Such converted SNNs require a large number of time-steps to achieve competitive accuracy, which diminishes the energy savings. The number of time-steps can be reduced by training SNNs with spike-based backpropagation from scratch, but that is computationally expensive and slow. To address these challenges, we present a computationally-efficient training technique for deep SNNs. We propose a hybrid training methodology: 1) take a converted SNN and use its weights and thresholds as an initialization step for spike-based backpropagation, and 2) perform incremental spike-timing dependent backpropagation (STDB) on this carefully initialized network to obtain an SNN that converges within few epochs and requires fewer time-steps for input processing. STDB is performed with a novel surrogate gradient function defined using a neuron’s spike time. The weight update is proportional to the difference in spike timing between the current time-step and the most recent time-step at which the neuron generated an output spike. SNNs trained with our hybrid conversion-and-STDB approach require 10X-25X fewer time-steps and achieve accuracy similar to purely converted SNNs. The proposed training methodology converges in fewer than 20 epochs of spike-based backpropagation for most standard image classification datasets, thereby greatly reducing the training complexity compared to training SNNs from scratch. We perform experiments on the CIFAR-10, CIFAR-100 and ImageNet datasets for both VGG and ResNet architectures. We achieve a top-1 accuracy of 65.19% on the ImageNet dataset with an SNN using 250 time-steps, which is 10X faster than converted SNNs with similar accuracy. |
Tasks | Image Classification |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=B1xSperKvH |
PDF | https://openreview.net/pdf?id=B1xSperKvH |
PWC | https://paperswithcode.com/paper/enabling-deep-spiking-neural-networks-with |
Repo | |
Framework | |
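The conversion rule quoted in the abstract (set each layer's firing threshold to the maximum input that layer receives) can be sketched in a few lines of PyTorch; the plain-sequential-layer assumption and single calibration batch are simplifications:

```python
# Sketch of ANN-to-SNN threshold calibration: run a calibration batch
# through the copied layers and record each layer's maximum pre-activation.
import torch

@torch.no_grad()
def calibrate_thresholds(layers, x):
    thresholds = []
    for layer in layers:
        x = layer(x)
        thresholds.append(x.max().item())  # max input seen by this layer
        x = torch.relu(x)
    return thresholds
```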
P-BN: Towards Effective Batch Normalization in the Path Space
Title | P-BN: Towards Effective Batch Normalization in the Path Space |
Authors | Anonymous |
Abstract | Neural networks with ReLU activation functions have demonstrated their success in many applications. Recently, researchers noticed a potential issue with the optimization of ReLU networks: the ReLU activation functions are positively scale-invariant (PSI), while the weights are not. This mismatch may lead to undesirable behaviors in the optimization process. Hence, new algorithms that conduct optimization directly in the path space (which is proven to be PSI) were developed, such as Stochastic Gradient Descent (SGD) in the path space, and it was shown that SGD in the path space is superior to SGD in the weight space. However, it is still unknown whether other deep learning techniques beyond SGD, such as batch normalization (BN), could also have counterparts in the path space. In this paper, we conduct a formal study on the design of BN in the path space. According to our study, the key challenge is how to ensure forward propagation in the path space, because BN is utilized during the forward process. To tackle this challenge, we propose a novel re-parameterization of ReLU networks, in which we replace each weight in the original network with a new value calculated from one or several paths, while keeping the outputs of the network unchanged for any input. We then show that BN in the path space, namely P-BN, is just a slightly modified conventional BN on the re-parameterized ReLU networks. Our experiments on two benchmark datasets, CIFAR and ImageNet, show that the proposed P-BN can significantly outperform conventional BN in the weight space. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BJeXaJHKvB |
PDF | https://openreview.net/pdf?id=BJeXaJHKvB |
PWC | https://paperswithcode.com/paper/p-bn-towards-effective-batch-normalization-in |
Repo | |
Framework | |
Undersensitivity in Neural Reading Comprehension
Title | Undersensitivity in Neural Reading Comprehension |
Authors | Anonymous |
Abstract | Neural reading comprehension models have recently achieved impressive generalisation results, yet still perform poorly when given adversarially selected input. Most prior work has studied semantically invariant text perturbations which cause a model’s prediction to change when it should not. In this work we focus on the complementary problem: excessive prediction undersensitivity, where input text is meaningfully changed and the model’s prediction does not change when it should. We formulate a noisy adversarial attack which searches among semantic variations of comprehension questions for which a model still erroneously produces the same answer as the original question – and with an even higher probability. We show that – despite comprising unanswerable questions – SQuAD2.0 and NewsQA models are vulnerable to this attack and commit a substantial fraction of errors on adversarially generated questions. This indicates that current models—even where they can correctly predict the answer—rely on spurious surface patterns and are not necessarily aware of all information provided in a given comprehension question. Developing this further, we experiment with both data augmentation and adversarial training as defence strategies: both are able to substantially decrease a model’s vulnerability to undersensitivity attacks on held-out evaluation data. Finally, we demonstrate that adversarially robust models generalise better in a biased data setting with a train/evaluation distribution mismatch; they are less prone to overly rely on predictive cues only present in the training set and outperform a conventional model in the biased data setting by up to 11% F1. |
Tasks | Adversarial Attack, Data Augmentation, Reading Comprehension |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HkgxheBFDS |
PDF | https://openreview.net/pdf?id=HkgxheBFDS |
PWC | https://paperswithcode.com/paper/undersensitivity-in-neural-reading |
Repo | |
Framework | |
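The attack itself reduces to a search over meaning-changing question variations for one the model answers identically with even higher confidence. A schematic Python sketch, with `model` (returning answer and probability) and the `perturb` variation generator both assumed:

```python
# Schematic undersensitivity search over semantic question variations.
def undersensitivity_attack(model, context, question, perturb):
    orig_answer, best_prob = model(context, question)
    best_variant = None
    for q_adv in perturb(question):      # meaning-changing edits
        answer, prob = model(context, q_adv)
        if answer == orig_answer and prob > best_prob:
            best_prob, best_variant = prob, q_adv
    return best_variant                  # None if the model is robust here
```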
Causally Correct Partial Models for Reinforcement Learning
Title | Causally Correct Partial Models for Reinforcement Learning |
Authors | Anonymous |
Abstract | In reinforcement learning, we can learn a model of future observations and rewards, and use it to plan the agent’s next actions. However, jointly modeling future observations can be computationally expensive or even intractable if the observations are high-dimensional (e.g. images). For this reason, previous works have considered partial models, which model only part of the observation. In this paper, we show that partial models can be causally incorrect: they are confounded by the observations they don’t model, and can therefore lead to incorrect planning. To address this, we introduce a general family of partial models that are provably causally correct, but avoid the need to fully model future observations. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HyeG9yHKPr |
PDF | https://openreview.net/pdf?id=HyeG9yHKPr |
PWC | https://paperswithcode.com/paper/causally-correct-partial-models-for |
Repo | |
Framework | |
Why Convolutional Networks Learn Oriented Bandpass Filters: A Hypothesis
Title | Why Convolutional Networks Learn Oriented Bandpass Filters: A Hypothesis |
Authors | Anonymous |
Abstract | It has been repeatedly observed that convolutional architectures, when applied to image understanding tasks, learn oriented bandpass filters. A standard explanation of this result is that these filters reflect the structure of the images that they have been exposed to during training: natural images typically are locally composed of oriented contours at various scales, and oriented bandpass filters are matched to such structure. The present paper offers an alternative explanation based not on the structure of images, but rather on the structure of convolutional architectures. In particular, complex exponentials are the eigenfunctions of convolution. These eigenfunctions are defined globally; however, convolutional architectures operate locally. To enforce locality, one can apply a windowing function to the eigenfunctions, which leads to oriented bandpass filters as the natural operators to be learned with convolutional architectures. From a representational point of view, these filters allow for a local, systematic way to characterize and operate on an image or other signal. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=S1ekaT4tDB |
PDF | https://openreview.net/pdf?id=S1ekaT4tDB |
PWC | https://paperswithcode.com/paper/why-convolutional-networks-learn-oriented |
Repo | |
Framework | |
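The windowed complex exponential the hypothesis arrives at is exactly a Gabor filter; a minimal NumPy construction with a Gaussian window makes the claim concrete (all parameter values are arbitrary):

```python
# A windowed complex exponential is a Gabor filter: the carrier is an
# eigenfunction of convolution, the Gaussian window enforces locality,
# and their product is an oriented bandpass filter.
import numpy as np

def gabor(size=15, wavelength=5.0, theta=0.0, sigma=3.0):
    r = np.arange(size) - size // 2
    xx, yy = np.meshgrid(r, r)
    u = xx * np.cos(theta) + yy * np.sin(theta)          # oriented coordinate
    window = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))   # locality
    carrier = np.exp(2j * np.pi * u / wavelength)        # complex exponential
    return window * carrier                              # oriented bandpass
```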
Channel Equilibrium Networks
Title | Channel Equilibrium Networks |
Authors | Anonymous |
Abstract | Convolutional Neural Networks (CNNs) typically treat normalization methods such as batch normalization (BN) and the rectified linear function (ReLU) as building blocks. Previous work showed that this basic block can lead to channel-level sparsity (i.e., channels of zero values), reducing the computational complexity of CNNs. However, over-sparse CNNs have many collapsed channels (i.e., many channels with undesired zero values), impeding their learning ability. This problem is seldom explored in the literature. To recover the collapsed channels and enhance learning capacity, we propose a building block, Channel Equilibrium (CE), which takes the output of a normalization layer as input and switches between two branches: a batch decorrelation (BD) branch and an adaptive instance inverse (AII) branch. CE is able to prevent implicit channel-level sparsity both in theory and in experiments. It has several appealing properties. First, CE can be stacked after many normalization methods, such as BN and Group Normalization (GN), and integrated into many advanced CNN architectures, such as ResNet and MobileNet V2, to form a series of CE networks (CENets), consistently improving their performance. Second, extensive experiments show that CE achieves state-of-the-art results on various challenging benchmarks such as ImageNet and COCO. Third, we show an interesting connection between CE and the Nash Equilibrium, a well-known solution of a non-cooperative game. The models and code will be released soon. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BJlOcR4KwS |
https://openreview.net/pdf?id=BJlOcR4KwS | |
PWC | https://paperswithcode.com/paper/channel-equilibrium-networks |
Repo | |
Framework | |