April 1, 2020

2932 words 14 mins read

Paper Group NANR 23

Towards Stable and Efficient Training of Verifiably Robust Neural Networks. Generating Dialogue Responses From A Semantic Latent Space. Walking on the Edge: Fast, Low-Distortion Adversarial Examples. Shifted and Squeezed 8-bit Floating Point format for Low-Precision Training of Deep Neural Networks. Automatically Learning Feature Crossing from Mode …

Towards Stable and Efficient Training of Verifiably Robust Neural Networks

Title Towards Stable and Efficient Training of Verifiably Robust Neural Networks
Authors Anonymous
Abstract Training neural networks with verifiable robustness guarantees is challenging. Several existing approaches utilize linear relaxation based neural network output bounds under perturbation, but they can slow down training by a factor of hundreds depending on the underlying network architecture. Meanwhile, interval bound propagation (IBP) based training is efficient and significantly outperforms linear relaxation based methods on many tasks, yet it may suffer from stability issues since the bounds are much looser, especially at the beginning of training. In this paper, we propose a new certified adversarial training method, CROWN-IBP, which combines the fast IBP bounds in a forward bounding pass with a tight linear relaxation based bound, CROWN, in a backward bounding pass. CROWN-IBP is computationally efficient and consistently outperforms IBP baselines on training verifiably robust neural networks. We conduct large-scale experiments on the MNIST and CIFAR datasets and outperform all previous linear relaxation and bound propagation based certified defenses in L_inf robustness. Notably, we achieve 7.02% verified test error on MNIST at epsilon=0.3, and 66.94% on CIFAR-10 with epsilon=8/255.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=Skxuk1rFwB
PDF https://openreview.net/pdf?id=Skxuk1rFwB
PWC https://paperswithcode.com/paper/towards-stable-and-efficient-training-of-1
Repo
Framework
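
For intuition, here is a minimal NumPy sketch of the interval bound propagation (IBP) forward pass that CROWN-IBP pairs with a CROWN backward pass. It illustrates the bounding arithmetic only; the toy network and shapes are assumptions, not the authors' code.

```python
# Interval bound propagation through affine and ReLU layers.
import numpy as np

def ibp_affine(lower, upper, W, b):
    """Propagate elementwise input bounds through y = W @ x + b."""
    center = (upper + lower) / 2.0
    radius = (upper - lower) / 2.0
    new_center = W @ center + b
    new_radius = np.abs(W) @ radius     # worst case uses |W|
    return new_center - new_radius, new_center + new_radius

def ibp_relu(lower, upper):
    """ReLU is monotone, so bounds pass through directly."""
    return np.maximum(lower, 0.0), np.maximum(upper, 0.0)

# Bounds on a 2-layer net under an L_inf perturbation of radius eps.
rng = np.random.default_rng(0)
x, eps = rng.normal(size=4), 0.1
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)
W2, b2 = rng.normal(size=(3, 8)), np.zeros(3)
l, u = ibp_affine(x - eps, x + eps, W1, b1)
l, u = ibp_relu(l, u)
l, u = ibp_affine(l, u, W2, b2)
print("output lower:", l)
print("output upper:", u)
```

In CROWN-IBP, cheap bounds of this kind are combined with a tighter CROWN backward bounding pass, per the abstract above.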

Generating Dialogue Responses From A Semantic Latent Space

Title Generating Dialogue Responses From A Semantic Latent Space
Authors Anonymous
Abstract Generic responses are a known issue in open-domain dialog generation. Most current approaches model this one-to-many task as a one-to-one task, and hence cannot integrate information from the multiple semantically similar valid responses to a prompt. We propose a novel dialog generation model that learns a semantic latent space in which representations of semantically related sentences are close to each other. This latent space is learned by maximizing the correlation between the features extracted from prompts and responses. Learning the pair relationship between prompts and responses as a regression task on the latent space, instead of as classification over the vocabulary with an MLE loss, enables our model to view semantically related responses collectively. An additional autoencoder is trained to recover the full sentence from the latent space. Experimental results show that our proposed model eliminates the generic response problem while achieving comparable or better coherence than baselines.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=SyeUMRNYDr
PDF https://openreview.net/pdf?id=SyeUMRNYDr
PWC https://paperswithcode.com/paper/generating-dialogue-responses-from-a-semantic
Repo
Framework
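
To make the regression-on-latent-space idea concrete, here is a hedged PyTorch sketch. The GRU encoders, dimensions, and the plain MSE objective are stand-ins of this sketch; the paper learns the space by maximizing correlation between prompt and response features.

```python
import torch
import torch.nn as nn

latent_dim = 128
prompt_encoder = nn.GRU(300, latent_dim, batch_first=True)    # stand-in encoder
response_encoder = nn.GRU(300, latent_dim, batch_first=True)  # stand-in encoder
mapper = nn.Linear(latent_dim, latent_dim)

def regression_loss(prompt_emb, response_emb):
    # Fit prompt -> response as regression on the latent space. Because
    # semantically similar responses embed close together, many valid
    # responses act as one soft target, avoiding the generic-response
    # collapse of token-level MLE.
    return ((mapper(prompt_emb) - response_emb) ** 2).mean()

prompts = torch.randn(16, 10, 300)     # toy batch of prompt word vectors
responses = torch.randn(16, 12, 300)   # toy batch of response word vectors
_, h_p = prompt_encoder(prompts)
_, h_r = response_encoder(responses)
regression_loss(h_p[-1], h_r[-1]).backward()
```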

Walking on the Edge: Fast, Low-Distortion Adversarial Examples

Title Walking on the Edge: Fast, Low-Distortion Adversarial Examples
Authors Anonymous
Abstract Adversarial examples of deep neural networks are receiving ever-increasing attention because they help in understanding and reducing the sensitivity of these networks to their input. This is natural given the increasing applications of deep neural networks in our everyday lives. Since white-box attacks are almost always successful, it is typically only the distortion of the perturbations that matters in their evaluation. In this work, we argue that speed is important as well, especially when considering that fast attacks are required by adversarial training. Given more time, iterative methods can always find better solutions. We investigate this speed-distortion trade-off in some depth and introduce a new attack called boundary projection (BP) that improves upon existing methods by a large margin. Our key idea is that the classification boundary is a manifold in the image space: we therefore quickly reach the boundary and then optimize distortion on this manifold.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=BygkQeHKwB
PDF https://openreview.net/pdf?id=BygkQeHKwB
PWC https://paperswithcode.com/paper/walking-on-the-edge-fast-low-distortion
Repo
Framework
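
The two-stage structure of the key idea can be sketched as follows. This is a schematic re-statement, not the authors' BP algorithm: the margin function, step sizes, and the Newton-style projection are choices made for illustration.

```python
import torch

def margin(logits, y):
    """Margin of the labelled class over the runner-up; zero on the boundary."""
    others = logits.clone()
    others[0, y] = -float("inf")
    return logits[0, y] - others.max()

def boundary_walk(model, x, y, steps=100, alpha=0.05):
    x_adv = x.clone()
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        m = margin(model(x_adv), y)
        g = torch.autograd.grad(m, x_adv)[0]
        ghat = g / (g.norm() + 1e-12)
        if m > 0:
            # Stage 1: not yet adversarial -- walk down the margin.
            x_adv = x_adv - alpha * ghat
        else:
            # Stage 2: adversarial side. Move toward the original input
            # along the boundary's tangent plane (reducing distortion),
            # plus a Newton-like correction back onto the boundary.
            t = x - x_adv
            t = t - (t * ghat).sum() * ghat
            x_adv = x_adv + alpha * t - (m / (g.norm() + 1e-12)) * ghat
    return x_adv.detach()

model = torch.nn.Linear(10, 3)                 # toy classifier
x, y = torch.randn(1, 10), 0
x_adv = boundary_walk(model, x, y)
print("distortion:", (x_adv - x).norm().item())
```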

Shifted and Squeezed 8-bit Floating Point format for Low-Precision Training of Deep Neural Networks

Title Shifted and Squeezed 8-bit Floating Point format for Low-Precision Training of Deep Neural Networks
Authors Anonymous
Abstract Training with a larger number of parameters while keeping iterations fast is an increasingly adopted strategy and trend for developing better performing Deep Neural Network (DNN) models. This increases the memory footprint and computational requirements for training. Here we introduce a novel methodology for training deep neural networks using 8-bit floating point (FP8) numbers. Reduced bit precision allows for a larger effective memory and increased computational speed. We name this method Shifted and Squeezed FP8 (S2FP8). We show that, unlike previous 8-bit precision training methods, the proposed method works out of the box for representative models: ResNet-50, Transformer, and NCF. The method can maintain model accuracy without requiring fine-tuning of loss scaling parameters or keeping certain layers in single precision. We introduce two learnable statistics of the DNN tensors, shifted and squeezed factors, which are used to optimally adjust the range of the tensors in 8 bits, thus minimizing the loss of information due to quantization.
Tasks Quantization
Published 2020-01-01
URL https://openreview.net/forum?id=Bkxe2AVtPS
PDF https://openreview.net/pdf?id=Bkxe2AVtPS
PWC https://paperswithcode.com/paper/shifted-and-squeezed-8-bit-floating-point
Repo
Framework
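
One plausible reading of the shift-and-squeeze transform, sketched numerically: recenter and compress the log-magnitudes of a tensor so they fit a narrow FP8-like exponent range, invertibly. The constants and formulas below are illustrative guesses, not the paper's exact definitions (the paper treats the two factors as learnable statistics).

```python
import numpy as np

def shift_squeeze(x, half_range=4.0, eps=1e-30):
    """Map log-magnitudes of x into roughly [-half_range, half_range]."""
    logs = np.log2(np.abs(x) + eps)
    shift = logs.mean()                      # recenter the exponent distribution
    # Compress so ~2 standard deviations fit the target range; never expand.
    squeeze = max((logs - shift).std() / (half_range / 2.0), 1.0)
    y = np.sign(x) * 2.0 ** ((logs - shift) / squeeze)
    return y, shift, squeeze

def unsqueeze_unshift(y, shift, squeeze, eps=1e-30):
    """Invert the transform after the low-precision operation."""
    logs = np.log2(np.abs(y) + eps)
    return np.sign(y) * 2.0 ** (logs * squeeze + shift)

x = np.random.default_rng(0).lognormal(mean=0.0, sigma=10.0, size=5)
y, s, q = shift_squeeze(x)              # y fits a narrow FP8-like range
x_back = unsqueeze_unshift(y, s, q)     # round trip recovers x
print(np.allclose(x, x_back))
```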

Automatically Learning Feature Crossing from Model Interpretation for Tabular Data

Title Automatically Learning Feature Crossing from Model Interpretation for Tabular Data
Authors Anonymous
Abstract Automatic feature generation is a major topic in automated machine learning. Among various feature generation approaches, feature crossing, which takes the cross-product of sparse features, is a promising way to effectively capture the interactions among categorical features in tabular data. Previous works on feature crossing try to search the set of all possible cross feature fields, which is clearly inefficient when the number of original feature fields is large. Meanwhile, some deep learning-based methods combine deep neural networks and various interaction components. However, because of the Deep Neural Network (DNN) component, only a few cross features can be explicitly generated by the interaction components. Recently, piece-wise interpretation of DNNs has been widely studied, and the piece-wise interpretations are usually inconsistent across different samples. Inspired by this, we define interpretation inconsistency in DNNs and propose a novel method called CrossGO, which selects useful cross features according to interpretation inconsistency. The whole process of learning feature crossing can be done by simply training a DNN model and a logistic regression (LR) model. CrossGO generates a compact candidate set of cross feature fields and improves the efficiency of the search. Extensive experiments have been conducted on several real-world datasets. Cross features generated by CrossGO empower a simple LR model to achieve comparable or even better performance than complex DNN models.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=Sye2s2VtDr
PDF https://openreview.net/pdf?id=Sye2s2VtDr
PWC https://paperswithcode.com/paper/automatically-learning-feature-crossing-from
Repo
Framework
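
For readers unfamiliar with feature crossing itself, a small pandas illustration: the cross of two categorical fields is a new sparse field whose values are value pairs. CrossGO's contribution is pruning which crosses to generate; the naive pairwise enumeration below is exactly what becomes intractable for full subset search at scale.

```python
import itertools
import pandas as pd

df = pd.DataFrame({
    "device": ["ios", "android", "ios", "web"],
    "country": ["US", "US", "DE", "DE"],
    "hour": ["am", "pm", "pm", "am"],
})

def cross(df, fields):
    """Cross-product feature: concatenate the values of several fields."""
    name = "x".join(fields)
    df[name] = df[fields].astype(str).agg("_".join, axis=1)
    return name

# Naive search enumerates all O(2^n) field subsets; CrossGO prunes this
# to a compact candidate set before fitting a logistic regression.
for pair in itertools.combinations(["device", "country", "hour"], 2):
    cross(df, list(pair))
print(df)
```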

PAC-Bayesian Neural Network Bounds

Title PAC-Bayesian Neural Network Bounds
Authors Anonymous
Abstract Bayesian neural networks, which both use the negative log-likelihood loss function and average their predictions using a learned posterior over the parameters, have been used successfully across many scientific fields, partly due to their ability to "effortlessly" extract desired representations from many large-scale datasets. However, generalization bounds for this setting are still missing. In this paper, we present a new PAC-Bayesian generalization bound for the negative log-likelihood loss which utilizes the Herbst argument for the log-Sobolev inequality to bound the moment generating function of the learner's risk.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=HkgR8erKwB
PDF https://openreview.net/pdf?id=HkgR8erKwB
PWC https://paperswithcode.com/paper/pac-bayesian-neural-network-bounds
Repo
Framework
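
For context, one standard PAC-Bayes bound (in Maurer's form) for a loss bounded in [0, 1] is shown below; it is the classical starting point, not this paper's result. The paper's contribution is needed precisely because the negative log-likelihood is unbounded, which is where the Herbst argument and the log-Sobolev inequality come in.

```latex
% Classical PAC-Bayes bound (Maurer's form), for a loss in [0,1]:
% with probability at least 1 - \delta over an i.i.d. sample of size n,
% simultaneously for all posteriors \rho and a fixed prior \pi,
\mathbb{E}_{h \sim \rho}\!\left[L(h)\right] \;\le\;
\mathbb{E}_{h \sim \rho}\!\left[\widehat{L}_n(h)\right]
+ \sqrt{\frac{\operatorname{KL}(\rho \,\|\, \pi)
              + \ln\!\frac{2\sqrt{n}}{\delta}}{2n}}
```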

Real or Not Real, that is the Question

Title Real or Not Real, that is the Question
Authors Anonymous
Abstract While generative adversarial networks (GAN) have been widely adopted in various topics, in this paper we generalize the standard GAN to a new perspective by treating realness as a random variable that can be estimated from multiple angles. In this generalized framework, referred to as RealnessGAN, the discriminator outputs a distribution as the measure of realness. While RealnessGAN shares similar theoretical guarantees with the standard GAN, it provides more insights on adversarial learning. More importantly, compared to multiple baselines, RealnessGAN provides stronger guidance for the generator, achieving improvements on both synthetic and real-world datasets. Moreover, it enables the basic DCGAN architecture to generate realistic images at 1024×1024 resolution when trained from scratch.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=B1lPaCNtPB
PDF https://openreview.net/pdf?id=B1lPaCNtPB
PWC https://paperswithcode.com/paper/real-or-not-real-that-is-the-question
Repo
Framework
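
A hedged PyTorch sketch of a discriminator that outputs a distribution over realness rather than a scalar, trained by matching anchor distributions for real and fake samples. The bin count, anchor shapes, and KL direction are assumptions of this sketch, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

num_bins = 10
disc = torch.nn.Sequential(torch.nn.Linear(64, 128), torch.nn.ReLU(),
                           torch.nn.Linear(128, num_bins))

# Anchor distributions over realness bins (illustrative choices):
# fake mass concentrated on low bins, real mass on high bins.
anchor_fake = F.softmax(torch.linspace(3, -3, num_bins), dim=0)
anchor_real = F.softmax(torch.linspace(-3, 3, num_bins), dim=0)

def d_loss(x_real, x_fake):
    log_p_real = F.log_softmax(disc(x_real), dim=1)
    log_p_fake = F.log_softmax(disc(x_fake), dim=1)
    # KL(anchor || D(x)) pushes each sample's realness distribution
    # toward the matching anchor.
    kl_real = F.kl_div(log_p_real, anchor_real.expand_as(log_p_real),
                       reduction="batchmean")
    kl_fake = F.kl_div(log_p_fake, anchor_fake.expand_as(log_p_fake),
                       reduction="batchmean")
    return kl_real + kl_fake

loss = d_loss(torch.randn(8, 64), torch.randn(8, 64))  # toy features
loss.backward()
```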

Learning Temporal Abstraction with Information-theoretic Constraints for Hierarchical Reinforcement Learning

Title Learning Temporal Abstraction with Information-theoretic Constraints for Hierarchical Reinforcement Learning
Authors Anonymous
Abstract Applying reinforcement learning (RL) to real-world problems will require reasoning about action-reward correlations over long time horizons. Hierarchical reinforcement learning (HRL) methods handle this by dividing the task into hierarchies, often with hand-tuned network structure or pre-defined subgoals. We propose a novel HRL framework, TAIC, which learns temporal abstraction from past experience or expert demonstrations without task-specific knowledge. We formulate the temporal abstraction problem as learning latent representations of action sequences and present a novel approach to regularizing the latent space by adding information-theoretic constraints. Specifically, we maximize the mutual information between the latent variables and the state changes. A visualization of the latent space demonstrates that our algorithm learns an effective abstraction of long action sequences. The learned abstraction allows us to learn new tasks at a higher level more efficiently. We observe a significant speedup in convergence on benchmark learning problems. These results demonstrate that learning temporal abstractions is an effective technique for increasing the convergence rate and sample efficiency of RL algorithms.
Tasks Hierarchical Reinforcement Learning
Published 2020-01-01
URL https://openreview.net/forum?id=HkeUDCNFPS
PDF https://openreview.net/pdf?id=HkeUDCNFPS
PWC https://paperswithcode.com/paper/learning-temporal-abstraction-with
Repo
Framework
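
A sketch of the information-theoretic constraint: maximise a lower bound on the mutual information between the latent code of an action sequence and the resulting state change. InfoNCE serves as a stand-in MI estimator here; the paper's estimator, encoders, and dimensions may differ.

```python
import torch
import torch.nn.functional as F

def infonce_mi(z, delta_s, proj):
    """InfoNCE bound: matched (z, delta_s) pairs score above mismatched ones."""
    scores = z @ proj(delta_s).t()            # (B, B) similarity matrix
    labels = torch.arange(z.size(0))          # diagonal = matched pairs
    return -F.cross_entropy(scores, labels)   # maximise this lower bound

proj = torch.nn.Linear(6, 32)                 # assumed state-change projector
z = torch.randn(16, 32)                       # latent codes of action sequences
delta_s = torch.randn(16, 6)                  # corresponding state changes
mi_bound = infonce_mi(z, delta_s, proj)
(-mi_bound).backward()                        # minimise the negative MI bound
```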

Hierarchical Graph Matching Networks for Deep Graph Similarity Learning

Title Hierarchical Graph Matching Networks for Deep Graph Similarity Learning
Authors Anonymous
Abstract While celebrated graph neural networks yield effective representations for individual nodes of a graph, there has been relatively little success in extending them to deep graph similarity learning. Recent work has considered either global-level graph-graph interactions or low-level node-node interactions, ignoring the rich cross-level interactions between parts of a graph and a whole graph. In this paper, we propose a Hierarchical Graph Matching Network (HGMN) for computing the graph similarity between any pair of graph-structured objects. Our model jointly learns graph representations and a graph matching metric function for computing graph similarity in an end-to-end fashion. The proposed HGMN model consists of a multi-perspective node-graph matching network for effectively learning cross-level interactions between parts of a graph and a whole graph, and a siamese graph neural network for learning global-level interactions between two graphs. Our comprehensive experiments demonstrate that HGMN consistently outperforms state-of-the-art graph matching network baselines on both classification and regression tasks.
Tasks Graph Matching, Graph Similarity
Published 2020-01-01
URL https://openreview.net/forum?id=rkeqn1rtDH
PDF https://openreview.net/pdf?id=rkeqn1rtDH
PWC https://paperswithcode.com/paper/hierarchical-graph-matching-networks-for-deep
Repo
Framework
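
The cross-level "node-graph" matching can be sketched as comparing each node embedding of one graph with a pooled embedding of the other graph under multiple learned perspectives (elementwise reweightings). Dimensions and mean pooling are assumptions of this sketch.

```python
import torch

def multi_perspective_match(node_emb, graph_emb, W):
    """node_emb: (N, d); graph_emb: (d,); W: (P, d) perspective weights."""
    a = W.unsqueeze(0) * node_emb.unsqueeze(1)   # (N, P, d) reweighted nodes
    b = W * graph_emb                            # (P, d) reweighted graph
    return torch.cosine_similarity(a, b.unsqueeze(0), dim=-1)  # (N, P)

d, P = 64, 8
W = torch.randn(P, d, requires_grad=True)        # learned perspectives
nodes_g1 = torch.randn(12, d)                    # node embeddings of graph 1
graph_g2 = torch.randn(20, d).mean(dim=0)        # pooled embedding of graph 2
match = multi_perspective_match(nodes_g1, graph_g2, W)
print(match.shape)                               # torch.Size([12, 8])
```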

JAX MD: End-to-End Differentiable, Hardware Accelerated, Molecular Dynamics in Pure Python

Title JAX MD: End-to-End Differentiable, Hardware Accelerated, Molecular Dynamics in Pure Python
Authors Anonymous
Abstract A large fraction of computational science involves simulating the dynamics of particles that interact via pairwise or many-body interactions. These simulations, called Molecular Dynamics (MD), span a vast range of subjects from physics and materials science to biochemistry and drug discovery. Most MD software involves significant use of handwritten derivatives and code reuse across C++, FORTRAN, and CUDA, reminiscent of the state of machine learning before automatic differentiation became popular. In this work we bring the substantial advances in software that have taken place in machine learning to MD with JAX, M.D. (JAX MD). JAX MD is an end-to-end differentiable MD package written entirely in Python that can be just-in-time compiled to CPU, GPU, or TPU. JAX MD allows researchers to iterate extremely quickly and to easily incorporate machine learning models into their workflows. Finally, since all of the simulation code is written in Python, researchers have unprecedented flexibility in setting up experiments without having to edit any low-level C++ or CUDA code. In addition to making existing workloads easier, JAX MD allows researchers to take derivatives through whole simulations as well as seamlessly incorporate neural networks into simulations. This paper explores the architecture of JAX MD and its capabilities through several vignettes. Code is available at github.com/jaxmd/jax-md along with an interactive Colab notebook.
Tasks Drug Discovery
Published 2020-01-01
URL https://openreview.net/forum?id=r1xMnCNYvB
PDF https://openreview.net/pdf?id=r1xMnCNYvB
PWC https://paperswithcode.com/paper/jax-md-end-to-end-differentiable-hardware
Repo
Framework
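
In the same spirit as the package (though independent of the actual JAX MD API), here is a toy pair-potential energy written in pure Python/JAX, with forces obtained by automatic differentiation rather than handwritten derivatives:

```python
import jax
import jax.numpy as jnp

def lennard_jones(r, sigma=1.0, eps=1.0):
    """Classic 12-6 pair potential."""
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (sr6 ** 2 - sr6)

def total_energy(positions):
    diff = positions[:, None, :] - positions[None, :, :]
    n = positions.shape[0]
    # eye() keeps the self-distance away from zero; the mask drops it.
    r = jnp.sqrt((diff ** 2).sum(-1) + jnp.eye(n))
    mask = 1.0 - jnp.eye(n)
    return 0.5 * (mask * lennard_jones(r)).sum()

forces = jax.jit(jax.grad(lambda x: -total_energy(x)))  # F = -dE/dx
pos = jax.random.uniform(jax.random.PRNGKey(0), (10, 3)) * 5.0
print(forces(pos).shape)                                # (10, 3)
```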

Unsupervised Intuitive Physics from Past Experiences

Title Unsupervised Intuitive Physics from Past Experiences
Authors Anonymous
Abstract We consider the problem of learning models of intuitive physics from raw, unlabelled visual input. Differently from prior work, in addition to learning general physical principles, we are also interested in learning "on the fly" physical properties specific to new environments, based on a small number of environment-specific experiences. We do all this in an unsupervised manner, using a meta-learning formulation where the goal is to predict videos containing demonstrations of physical phenomena, such as objects moving and colliding with a complex background. We introduce the idea of summarizing past experiences in a very compact manner, in our case using dynamic images, and show that this can be used to solve the problem well and efficiently. Empirically, we show, via extensive experiments and ablation studies, that our model learns to perform physical predictions that generalize well in time and space, as well as to a variable number of interacting physical objects.
Tasks Meta-Learning
Published 2020-01-01
URL https://openreview.net/forum?id=SJlOq34Kwr
PDF https://openreview.net/pdf?id=SJlOq34Kwr
PWC https://paperswithcode.com/paper/unsupervised-intuitive-physics-from-past-1
Repo
Framework
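
A dynamic image can be computed by approximate rank pooling (Bilen et al., 2016): each frame receives a fixed temporal coefficient, and the weighted sum preserves the ordering of motion. A minimal NumPy sketch, with shapes chosen for illustration:

```python
import numpy as np

def dynamic_image(frames):
    """frames: (T, H, W, C) clip -> single (H, W, C) summary image."""
    T = frames.shape[0]
    t = np.arange(1, T + 1, dtype=np.float64)
    alpha = 2.0 * t - T - 1.0        # approximate rank pooling weights
    return np.tensordot(alpha, frames, axes=(0, 0))

video = np.random.rand(16, 32, 32, 3)   # a toy clip
print(dynamic_image(video).shape)        # (32, 32, 3)
```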

Localized Generations with Deep Neural Networks for Multi-Scale Structured Datasets

Title Localized Generations with Deep Neural Networks for Multi-Scale Structured Datasets
Authors Anonymous
Abstract Extracting the hidden structure of the external environment is an essential component of intelligent agents and human learning. The real-world datasets that we are interested in are often characterized by locality: the structural relationship between data points changes depending on their location in observation space. The local learning approach extracts semantic representations for these datasets by training an embedding model from scratch for each local neighborhood. However, this approach is limited to simple models, since complex models, including deep neural networks, require massive amounts of data and extended training time. In this study, we overcome this trade-off based on the insight that real-world datasets often share some structural similarity across neighborhoods. We propose to utilize the embedding models of other local structures as a weak form of supervision. Our proposed model, the Local VAE, generalizes the Variational Autoencoder to have different model parameters for each local subset and trains these local parameters by gradient-based meta-learning. Our experimental results show that the Local VAE succeeds in learning semantic representations for datasets with local structure, including the 3D Shapes dataset, and generates high-quality images.
Tasks Meta-Learning
Published 2020-01-01
URL https://openreview.net/forum?id=Skgaia4tDH
PDF https://openreview.net/pdf?id=Skgaia4tDH
PWC https://paperswithcode.com/paper/localized-generations-with-deep-neural
Repo
Framework
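
A schematic of the training idea: shared meta-parameters are adapted to each local neighborhood with a few gradient steps, so local models need not be trained from scratch. In this sketch, Reptile stands in for the paper's gradient-based meta-learning and the VAE is collapsed to a linear autoencoder; both are simplifying assumptions.

```python
import copy
import torch

def make_model():
    return torch.nn.Sequential(torch.nn.Linear(20, 4), torch.nn.Linear(4, 20))

def recon_loss(model, x):
    return ((model(x) - x) ** 2).mean()

meta = make_model()                          # shared (meta) parameters
meta_lr, inner_lr, inner_steps = 0.1, 0.01, 5

for step in range(20):                       # meta-training
    x_local = torch.randn(64, 20)            # one local neighborhood (toy data)
    local = copy.deepcopy(meta)              # local model starts from meta init
    opt = torch.optim.SGD(local.parameters(), lr=inner_lr)
    for _ in range(inner_steps):             # adapt to this neighborhood
        opt.zero_grad()
        recon_loss(local, x_local).backward()
        opt.step()
    # Reptile update: nudge meta parameters toward the adapted ones.
    with torch.no_grad():
        for p_meta, p_local in zip(meta.parameters(), local.parameters()):
            p_meta += meta_lr * (p_local - p_meta)
```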

ROBUST DISCRIMINATIVE REPRESENTATION LEARNING VIA GRADIENT RESCALING: AN EMPHASIS REGULARISATION PERSPECTIVE

Title ROBUST DISCRIMINATIVE REPRESENTATION LEARNING VIA GRADIENT RESCALING: AN EMPHASIS REGULARISATION PERSPECTIVE
Authors Anonymous
Abstract It is fundamental and challenging to train robust and accurate Deep Neural Networks (DNNs) when semantically abnormal examples exist. Although great progress has been made, one crucial research question is still not thoroughly explored: Which training examples should be focused on, and how much more should they be emphasised, to achieve robust learning? In this work, we study this question and propose gradient rescaling (GR). GR modifies the magnitude of the logit vector's gradient so as to emphasise relatively easier training data points when noise becomes more severe, which functions as explicit emphasis regularisation to improve the generalisation performance of DNNs. Beyond regularisation, we connect GR to example weighting and the design of robust loss functions. We empirically demonstrate that GR is highly anomaly-robust and outperforms the state-of-the-art by a large margin, e.g., a 7% improvement on CIFAR-100 with 40% noisy labels. It is also significantly superior to standard regularisers in both clean and abnormal settings. Furthermore, we present comprehensive ablation studies exploring the behaviour of GR in different cases, which is informative for applying GR in real-world scenarios.
Tasks Representation Learning
Published 2020-01-01
URL https://openreview.net/forum?id=rylUOn4Yvr
PDF https://openreview.net/pdf?id=rylUOn4Yvr
PWC https://paperswithcode.com/paper/robust-discriminative-representation-learning
Repo
Framework
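
A hedged sketch of emphasis regularisation via per-example gradient rescaling: weight each example's loss gradient by a function of its predicted probability for the labelled class, shifting emphasis toward easier (likely clean) examples as noise grows. The specific weighting function below is a guess for illustration; the paper derives its own transformation.

```python
import torch
import torch.nn.functional as F

def gr_loss(logits, targets, lam=2.0):
    p = F.softmax(logits, dim=1)
    p_y = p.gather(1, targets.unsqueeze(1)).squeeze(1).detach()
    w = p_y ** lam                   # larger lam => focus on easier examples
    w = w / (w.mean() + 1e-12)       # keep the average gradient scale fixed
    return (w * F.cross_entropy(logits, targets, reduction="none")).mean()

logits = torch.randn(32, 10, requires_grad=True)
targets = torch.randint(0, 10, (32,))
gr_loss(logits, targets).backward()  # per-example gradients are rescaled by w
```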

Blurring Structure and Learning to Optimize and Adapt Receptive Fields

Title Blurring Structure and Learning to Optimize and Adapt Receptive Fields
Authors Anonymous
Abstract The visual world is vast and varied, but its variations divide into structured and unstructured factors. We compose free-form filters and structured Gaussian filters, optimized end-to-end, to factorize deep representations and learn both local features and their degree of locality. In effect this optimizes over receptive field size and shape, tuning locality to the data and task. Our semi-structured composition is strictly more expressive than free-form filtering, and changes in its structured parameters would require changes in architecture for standard networks. Dynamic inference, in which the Gaussian structure varies with the input, adapts receptive field size to compensate for local scale variation. Optimizing receptive field size improves semantic segmentation accuracy on Cityscapes by 1-2 points for strong dilated and skip architectures and by up to 10 points for suboptimal designs. Adapting receptive fields by dynamic Gaussian structure further improves results, equaling the accuracy of free-form deformation while improving efficiency.
Tasks Semantic Segmentation
Published 2020-01-01
URL https://openreview.net/forum?id=r1ghgxHtPH
PDF https://openreview.net/pdf?id=r1ghgxHtPH
PWC https://paperswithcode.com/paper/blurring-structure-and-learning-to-optimize
Repo
Framework
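
The semi-structured composition can be sketched as a free-form convolution composed with a Gaussian filter whose standard deviation is a learned parameter, so receptive field size receives gradients end-to-end. Kernel size and the log-parameterization are choices of this sketch.

```python
import torch
import torch.nn.functional as F

log_sigma = torch.zeros(1, requires_grad=True)    # learnable structure parameter
free_form = torch.nn.Conv2d(1, 8, kernel_size=3, padding=1)

def gaussian_kernel(log_sigma, k=9):
    sigma = log_sigma.exp()
    ax = torch.arange(k, dtype=torch.float32) - (k - 1) / 2
    g1d = torch.exp(-(ax ** 2) / (2 * sigma ** 2))
    g2d = g1d[:, None] * g1d[None, :]
    return (g2d / g2d.sum()).view(1, 1, k, k)

def forward(x):
    blurred = F.conv2d(x, gaussian_kernel(log_sigma), padding=4)
    return free_form(blurred)    # effective receptive field grows with sigma

x = torch.randn(2, 1, 32, 32)
forward(x).mean().backward()
print(log_sigma.grad)            # gradient reaches the structure parameter
```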

BayesOpt Adversarial Attack

Title BayesOpt Adversarial Attack
Authors Anonymous
Abstract Black-box adversarial attacks require a large number of attempts before finding successful adversarial examples that are visually indistinguishable from the original input. Current approaches relying on substitute model training, gradient estimation or genetic algorithms often require an excessive number of queries. Therefore, they are not suitable for real-world systems where the maximum query number is limited due to cost. We propose a query-efficient black-box attack which uses Bayesian optimisation in combination with Bayesian model selection to optimise over the adversarial perturbation and the optimal degree of search space dimension reduction. We demonstrate empirically that our method can achieve comparable success rates with 2-5 times fewer queries compared to previous state-of-the-art black-box attacks.
Tasks Adversarial Attack, Bayesian Optimisation, Dimensionality Reduction, Model Selection
Published 2020-01-01
URL https://openreview.net/forum?id=Hkem-lrtvH
PDF https://openreview.net/pdf?id=Hkem-lrtvH
PWC https://paperswithcode.com/paper/bayesopt-adversarial-attack
Repo
Framework
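
The query-efficient loop can be sketched generically: Bayesian optimisation over a low-dimensional perturbation that is upsampled to image size before each black-box query. The GP surrogate, LCB acquisition, and toy victim below are stand-ins; the paper additionally selects the reduced dimension by Bayesian model selection.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def bayes_opt(objective, dim, n_init=5, n_iter=20, n_cand=500, seed=0):
    """Minimise a black-box objective with a GP surrogate and an LCB rule."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-1, 1, size=(n_init, dim))
    y = np.array([objective(x) for x in X])
    gp = GaussianProcessRegressor(normalize_y=True)
    for _ in range(n_iter):
        gp.fit(X, y)
        cand = rng.uniform(-1, 1, size=(n_cand, dim))
        mu, sd = gp.predict(cand, return_std=True)
        x_next = cand[np.argmin(mu - sd)]     # lower confidence bound
        X = np.vstack([X, x_next])
        y = np.append(y, objective(x_next))
    return X[np.argmin(y)], y.min()

# Toy victim: the "loss" is a linear margin; lower means more adversarial.
w = np.random.default_rng(1).normal(size=(28, 28))

def objective(delta_small, eps=8 / 255):
    # Optimise in 7x7 and upsample to 28x28: the search-space dimension
    # reduction that makes the query budget manageable.
    delta = np.kron(delta_small.reshape(7, 7), np.ones((4, 4)))
    return float((w * np.clip(eps * delta, -eps, eps)).sum())

best_delta, best_val = bayes_opt(objective, dim=49)
print(best_val)
```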