April 1, 2020

2904 words 14 mins read

Paper Group NANR 69

Measure by Measure: Automatic Music Composition with Traditional Western Music Notation

Title Measure by Measure: Automatic Music Composition with Traditional Western Music Notation
Authors Anonymous
Abstract In this paper, we present a system capable of generating long polyphonic music for a given number of measures, up to hundreds of measures. This is achieved by creating a measure model that imitates the object hierarchy used in common encodings of traditional western music notation. On top of this, we construct an inter-measure context model that spans the entire composition.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=Hklk6xrYPB
PDF https://openreview.net/pdf?id=Hklk6xrYPB
PWC https://paperswithcode.com/paper/measure-by-measure-automatic-music
Repo
Framework
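
The hierarchical design described in the abstract lends itself to a two-level autoregressive model. Below is a minimal sketch of that structure, assuming (our reading, not the authors' code) an inter-measure context RNN that spans the piece and a measure-level RNN that emits notation tokens within each measure; all names and dimensions are illustrative.

```python
# Hedged sketch of a measure-hierarchy model: a piece-level context RNN over
# measure summaries conditions a token-level RNN inside each measure.
import torch
import torch.nn as nn

class HierarchicalComposer(nn.Module):
    def __init__(self, vocab_size, d=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d)
        self.context_rnn = nn.GRU(d, d, batch_first=True)   # across measures
        self.measure_rnn = nn.GRU(d, d, batch_first=True)   # within a measure
        self.out = nn.Linear(d, vocab_size)

    def forward(self, measures):
        # measures: (batch, n_measures, tokens_per_measure) token ids
        b, m, t = measures.shape
        tok = self.embed(measures)                           # (b, m, t, d)
        summaries = tok.mean(dim=2)                          # (b, m, d)
        ctx, _ = self.context_rnn(summaries)                 # (b, m, d)
        # Condition measure i on the context up to measure i-1 (causal shift).
        ctx = torch.cat([torch.zeros_like(ctx[:, :1]), ctx[:, :-1]], dim=1)
        h0 = ctx.reshape(1, b * m, -1)
        dec, _ = self.measure_rnn(tok.reshape(b * m, t, -1), h0)
        return self.out(dec).reshape(b, m, t, -1)            # next-token logits
```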

Confidence-Calibrated Adversarial Training: Towards Robust Models Generalizing Beyond the Attack Used During Training

Title Confidence-Calibrated Adversarial Training: Towards Robust Models Generalizing Beyond the Attack Used During Training
Authors Anonymous
Abstract Adversarial training is the standard approach for training models that are robust to adversarial examples. However, especially for complex datasets, adversarial training incurs a significant loss in accuracy and is known to generalize poorly to stronger attacks, e.g., larger perturbations or other threat models. In this paper, we introduce confidence-calibrated adversarial training (CCAT), whose key idea is to enforce that the confidence on adversarial examples decays with their distance to the attacked examples. We show that CCAT better preserves the accuracy of normal training, while robustness against adversarial examples is achieved via confidence thresholding. Most importantly, and in strong contrast to adversarial training, the robustness of CCAT generalizes to larger perturbations and other threat models not encountered during training. We also discuss our extensive work on designing strong adaptive attacks against CCAT and standard adversarial training, which is of independent interest. We present experimental results on MNIST, SVHN and CIFAR-10.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=SJgwf04KPr
PDF https://openreview.net/pdf?id=SJgwf04KPr
PWC https://paperswithcode.com/paper/confidence-calibrated-adversarial-training-1
Repo
Framework
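
The key mechanism is easy to state concretely: during training, the target distribution on an adversarial example interpolates between the one-hot label and the uniform distribution as the perturbation grows. A hedged sketch, with the interpolation schedule (the power `rho`) assumed for illustration:

```python
# Hedged sketch of the CCAT training target: confidence decays from one-hot
# toward uniform with perturbation size. Schedule is an assumption.
import torch
import torch.nn.functional as F

def ccat_loss(model, x, y, delta, eps, num_classes, rho=10.0):
    # lam -> 1 for tiny perturbations (keep the label), -> 0 at the eps-ball
    # boundary (demand uniform, i.e. low confidence).
    norm = delta.flatten(1).abs().max(dim=1).values          # L_inf per example
    lam = (1.0 - torch.clamp(norm / eps, max=1.0)) ** rho    # (batch,)
    one_hot = F.one_hot(y, num_classes).float()
    uniform = torch.full_like(one_hot, 1.0 / num_classes)
    target = lam[:, None] * one_hot + (1 - lam[:, None]) * uniform
    logp = F.log_softmax(model(x + delta), dim=1)
    return -(target * logp).sum(dim=1).mean()                # cross-entropy
```

At test time, robustness then comes from confidence thresholding, per the abstract: inputs whose maximum predicted confidence falls below a threshold are rejected.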

PassNet: Learning pass probability surfaces from single-location labels. An architecture for visually-interpretable soccer analytics

Title PassNet: Learning pass probability surfaces from single-location labels. An architecture for visually-interpretable soccer analytics
Authors Anonymous
Abstract We propose a fully convolutional network architecture that is able to estimate a full surface of pass probabilities from single-location labels derived from high-frequency spatio-temporal data of professional soccer matches. The network performs remarkably well from low-level inputs by learning a feature hierarchy that produces predictions at different sampling levels, which are merged together to preserve both coarse and fine detail. Our approach presents an extreme case of weakly supervised learning in which there is just a single-pixel correspondence between ground-truth outcomes and the predicted probability map. By providing not just an accurate evaluation of observed events but also a visual interpretation of the results of other potential actions, our approach opens the door to spatio-temporal decision-making analysis, an as-yet little-explored area in sports. Our proposed deep learning architecture can be easily adapted to solve many other related problems in sports analytics; we demonstrate this by extending the network to estimate pass-selection likelihood.
Tasks Decision Making
Published 2020-01-01
URL https://openreview.net/forum?id=r1xxKJBKvr
PDF https://openreview.net/pdf?id=r1xxKJBKvr
PWC https://paperswithcode.com/paper/passnet-learning-pass-probability-surfaces
Repo
Framework
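
The "single-location label" setup can be made concrete: the network predicts a full surface, but the loss touches only one pixel per example, the location where the pass ended. A minimal sketch, assuming a binary completed/missed outcome and binary cross-entropy (the exact loss is not specified in the abstract):

```python
# Hedged sketch: full-surface prediction supervised at a single pixel.
import torch
import torch.nn.functional as F

def single_location_loss(surface_logits, rows, cols, outcomes):
    # surface_logits: (batch, H, W) raw scores over the pitch grid
    # rows, cols:     (batch,) grid coordinates where each pass ended
    # outcomes:       (batch,) 1.0 if the pass was completed, else 0.0
    idx = torch.arange(surface_logits.size(0))
    picked = surface_logits[idx, rows, cols]       # one pixel per example
    return F.binary_cross_entropy_with_logits(picked, outcomes)
```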

Global Relational Models of Source Code

Title Global Relational Models of Source Code
Authors Anonymous
Abstract Models of code can learn distributed representations of a program’s syntax and semantics to predict many non-trivial properties of a program. Recent state-of-the-art models leverage highly structured representations of programs, such as trees, graphs and paths therein (e.g. data-flow relations), which are precise and abundantly available for code. This provides a strong inductive bias towards semantically meaningful relations, yielding more generalizable representations than classical sequence-based models. Unfortunately, these models primarily rely on graph-based message passing to represent relations in code, which makes them de facto local due to the high cost of message-passing steps, quite in contrast to modern, global sequence-based models, such as the Transformer. In this work, we bridge this divide between global and structured models by introducing two new hybrid model families that are both global and incorporate structural bias: Graph Sandwiches, which wrap traditional (gated) graph message-passing layers in sequential message-passing layers; and Graph Relational Embedding Attention Transformers (GREAT for short), which bias traditional Transformers with relational information from graph edge types. By studying a popular, non-trivial program repair task, variable-misuse identification, we explore the relative merits of traditional and hybrid model families for code representation. Starting with a graph-based model that already improves upon the prior state-of-the-art for this task by 20%, we show that our proposed hybrid models improve an additional 10-15%, while training both faster and using fewer parameters.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=B1lnbRNtwr
PDF https://openreview.net/pdf?id=B1lnbRNtwr
PWC https://paperswithcode.com/paper/global-relational-models-of-source-code
Repo
Framework
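
One common way to bias a Transformer with relational information, in the spirit of GREAT (the paper's exact parameterization may differ), is to add a learned offset per graph edge type to the attention logits:

```python
# Hedged sketch of edge-type-biased self-attention; single head for brevity.
import torch
import torch.nn as nn

class EdgeBiasedAttention(nn.Module):
    def __init__(self, d, num_edge_types):
        super().__init__()
        self.q = nn.Linear(d, d)
        self.k = nn.Linear(d, d)
        self.v = nn.Linear(d, d)
        # One learnable logit offset per edge type (type 0 = "no edge").
        self.edge_bias = nn.Embedding(num_edge_types, 1)

    def forward(self, h, edge_types):
        # h: (batch, n, d) token states; edge_types: (batch, n, n) int ids
        q, k, v = self.q(h), self.k(h), self.v(h)
        logits = q @ k.transpose(-1, -2) / (h.size(-1) ** 0.5)
        logits = logits + self.edge_bias(edge_types).squeeze(-1)
        return torch.softmax(logits, dim=-1) @ v
```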

Semi-Supervised Named Entity Recognition with CRF-VAEs

Title Semi-Supervised Named Entity Recognition with CRF-VAEs
Authors Anonymous
Abstract We investigate methods for semi-supervised learning (SSL) of a neural linear-chain conditional random field (CRF) for Named Entity Recognition (NER) by treating the tagger as the amortized variational posterior in a generative model of text given tags. We first illustrate how to incorporate a CRF in a VAE, enabling end-to-end training on semi-supervised data. We then investigate a series of increasingly complex deep generative models of tokens given tags enabled by end-to-end optimization, comparing the proposed models against supervised and strong CRF SSL baselines on the OntoNotes 5.0 NER dataset. We find that our best proposed model consistently improves performance by $\approx 1\%$ F1 in low- and moderate-resource regimes and easily addresses degenerate model behavior in a more difficult, partially supervised setting.
Tasks Named Entity Recognition
Published 2020-01-01
URL https://openreview.net/forum?id=BkxnKkrtvS
PDF https://openreview.net/pdf?id=BkxnKkrtvS
PWC https://paperswithcode.com/paper/semi-supervised-named-entity-recognition-with
Repo
Framework
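
The training objective the abstract describes can be laid out schematically: a standard CRF negative log-likelihood on labeled data plus a single-sample ELBO on unlabeled data, with the CRF tagger as the variational posterior q(y|x). The sketch below is heavily hedged; all helper methods (`log_likelihood`, `sample_with_log_prob`, `entropy`, `log_prob`) are hypothetical stand-ins, not an existing API.

```python
# Schematic, hedged sketch of the semi-supervised CRF-VAE objective.
# crf models q(y|x); gen models p(x|y); tensors are assumed throughout.
def crf_vae_loss(crf, gen, x_lab, y_lab, x_unlab):
    # Supervised term: ordinary CRF negative log-likelihood.
    sup = -crf.log_likelihood(x_lab, y_lab).mean()
    # Unsupervised term: single-sample ELBO,
    #   E_{q(y|x)}[log p(x|y)] + H[q(y|x)],
    # with a score-function (REINFORCE) surrogate so gradients reach the CRF.
    y_samp, logq = crf.sample_with_log_prob(x_unlab)   # hypothetical API
    recon = gen.log_prob(x_unlab, y_samp)              # log p(x|y)
    surrogate = recon + recon.detach() * logq          # grad of E_q[recon]
    elbo = surrogate + crf.entropy(x_unlab)
    return sup - elbo.mean()
```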

End-to-end named entity recognition and relation extraction using pre-trained language models

Title End-to-end named entity recognition and relation extraction using pre-trained language models
Authors Anonymous
Abstract Named entity recognition (NER) and relation extraction (RE) are two important tasks in information extraction and retrieval (IE & IR). Recent work has demonstrated that it is beneficial to learn these tasks jointly, which avoids the propagation of error inherent in pipeline-based systems and improves performance. However, state-of-the-art joint models typically rely on external natural language processing (NLP) tools, such as dependency parsers, limiting their usefulness to domains (e.g. news) where those tools perform well. The few neural, end-to-end models that have been proposed are trained almost completely from scratch. In this paper, we propose a neural, end-to-end model for jointly extracting entities and their relations which does not rely on external NLP tools and which integrates a large, pre-trained language model. Because the bulk of our model’s parameters are pre-trained and we eschew recurrence for self-attention, our model is fast to train. On 5 datasets across 3 domains, our model matches or exceeds state-of-the-art performance, sometimes by a large margin.
Tasks Language Modelling, Named Entity Recognition, Relation Extraction
Published 2020-01-01
URL https://openreview.net/forum?id=rkgqm0VKwB
PDF https://openreview.net/pdf?id=rkgqm0VKwB
PWC https://paperswithcode.com/paper/end-to-end-named-entity-recognition-and
Repo
Framework
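
A minimal sketch of the overall shape of such a model, under our assumptions rather than the paper's exact architecture: a pre-trained Transformer encoder shared by a token-level tag classifier and a pairwise relation scorer (here scoring all token pairs; a real system would restrict scoring to predicted entity spans):

```python
# Hedged sketch of a joint NER + RE head over a pre-trained encoder.
import torch
import torch.nn as nn
from transformers import AutoModel

class JointNerRe(nn.Module):
    def __init__(self, num_tags, num_relations, name="bert-base-cased"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(name)
        d = self.encoder.config.hidden_size
        self.ner_head = nn.Linear(d, num_tags)
        self.re_head = nn.Sequential(nn.Linear(2 * d, d), nn.ReLU(),
                                     nn.Linear(d, num_relations))

    def forward(self, input_ids, attention_mask):
        h = self.encoder(input_ids,
                         attention_mask=attention_mask).last_hidden_state
        tag_logits = self.ner_head(h)                   # (b, n, num_tags)
        # Score every ordered (head, tail) token pair for a relation type.
        n = h.size(1)
        pairs = torch.cat([h.unsqueeze(2).expand(-1, -1, n, -1),
                           h.unsqueeze(1).expand(-1, n, -1, -1)], dim=-1)
        rel_logits = self.re_head(pairs)                # (b, n, n, num_rel)
        return tag_logits, rel_logits
```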

Autoencoder-based Initialization for Recurrent Neural Networks with a Linear Memory

Title Autoencoder-based Initialization for Recurrent Neural Networks with a Linear Memory
Authors Anonymous
Abstract Orthogonal recurrent neural networks address the vanishing gradient problem by parameterizing the recurrent connections with an orthogonal matrix. This class of models is particularly effective at solving tasks that require memorizing long sequences. We propose an alternative solution based on explicit memorization using linear autoencoders for sequences. We show how a recently proposed recurrent architecture, the Linear Memory Network, composed of a nonlinear feedforward layer and a separate linear recurrence, can be used to solve hard memorization tasks. We propose an initialization schema that sets the weights of a recurrent architecture to approximate a linear autoencoder of the input sequences, which can be found with a closed-form solution. The initialization schema can be easily adapted to any recurrent architecture. We argue that this approach is superior to random orthogonal initialization thanks to the autoencoder, which allows the memorization of long sequences even before training. The empirical analysis shows that our approach achieves competitive results against alternative orthogonal models, and the LSTM, on sequential MNIST, permuted MNIST and TIMIT.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=BkgM7xHYwH
PDF https://openreview.net/pdf?id=BkgM7xHYwH
PWC https://paperswithcode.com/paper/autoencoder-based-initialization-for
Repo
Framework
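
The "explicit memorization" idea has a transparent uncompressed form: a linear recurrence m_t = A x_t + B m_{t-1}, where A writes the current input into the top slot of the state and B shifts older slots down, so the state losslessly stores the last T inputs. The sketch below verifies that construction; the paper's contribution, the closed-form (SVD-based) compression of this autoencoder to the training data, is omitted here.

```python
# Hedged illustration of the uncompressed linear "memory" autoencoder.
import numpy as np

def linear_ae_weights(d, T):
    A = np.zeros((T * d, d)); A[:d, :] = np.eye(d)      # write into slot 0
    B = np.zeros((T * d, T * d))
    B[d:, :-d] = np.eye((T - 1) * d)                    # shift slots down
    return A, B

d, T = 3, 5
A, B = linear_ae_weights(d, T)
xs = np.random.randn(T, d)
m = np.zeros(T * d)
for x in xs:                                            # m_t = A x_t + B m_{t-1}
    m = A @ x + B @ m
# The state holds the inputs in reverse order, so decoding is a reshape.
assert np.allclose(m.reshape(T, d), xs[::-1])
```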

Relative Pixel Prediction For Autoregressive Image Generation

Title Relative Pixel Prediction For Autoregressive Image Generation
Authors Anonymous
Abstract In natural images, transitions between adjacent pixels tend to be smooth and gradual, a fact that has long been exploited in image compression models based on predictive coding. In contrast, existing neural autoregressive image generation models predict the absolute pixel intensity at each position, which is a more challenging problem. In this paper, we propose to predict pixels relatively, i.e., to predict new pixels relative to previously generated pixels (or pixels from the conditioning context, when available). We show that this form of prediction fares favorably against its absolute counterpart when used independently, but that coordinating the two under a unified probabilistic model yields the best performance, as the model learns to predict sharp transitions using the absolute predictor while generating smooth transitions using the relative predictor. Experiments on multiple benchmarks for unconditional image generation, image colorization, and super-resolution indicate that the presented mechanism improves likelihood compared to purely absolute-prediction counterparts.
Tasks Colorization, Image Compression, Image Generation, Super-Resolution
Published 2020-01-01
URL https://openreview.net/forum?id=SyedHyBFwS
PDF https://openreview.net/pdf?id=SyedHyBFwS
PWC https://paperswithcode.com/paper/relative-pixel-prediction-for-autoregressive
Repo
Framework
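
The coordination of absolute and relative predictors can be written down as a simple two-component mixture per pixel; the gating and discretization details below are assumptions for illustration, not the paper's exact model.

```python
# Hedged sketch: per-pixel mixture of an absolute-intensity distribution and
# an offset-from-previous-pixel distribution, combined by a learned gate.
import torch
import torch.nn.functional as F

def mixed_pixel_log_prob(abs_logits, rel_logits, gate_logit, prev, target):
    # abs_logits: (batch, 256) scores over absolute intensities 0..255
    # rel_logits: (batch, 511) scores over offsets -255..255
    # gate_logit: (batch, 1); prev, target: (batch,) integer intensities
    log_abs = F.log_softmax(abs_logits, dim=1).gather(1, target[:, None])
    offset = target - prev + 255                          # shift to 0..510
    log_rel = F.log_softmax(rel_logits, dim=1).gather(1, offset[:, None])
    log_g_abs = F.logsigmoid(gate_logit)                  # log p(use absolute)
    log_g_rel = F.logsigmoid(-gate_logit)                 # log p(use relative)
    return torch.logsumexp(
        torch.cat([log_g_abs + log_abs, log_g_rel + log_rel], dim=1), dim=1)
```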

AdaX: Adaptive Gradient Descent with Exponential Long Term Memory

Title AdaX: Adaptive Gradient Descent with Exponential Long Term Memory
Authors Anonymous
Abstract Adaptive optimization algorithms such as RMSProp and Adam enjoy fast convergence and a smooth learning process. Despite their success, they provably suffer from non-convergence even in convex optimization problems and can perform worse than first-order methods such as stochastic gradient descent (SGD). Several other algorithms, for example AMSGrad and AdaShift, have been proposed to alleviate these issues, but with only minor effect. This paper further analyzes the behavior of such algorithms in a non-convex setting by extending their non-convergence issue to a simple non-convex case, and shows that Adam’s design of update steps can lead the algorithm to poor local minima. To address these problems, we propose a novel adaptive gradient descent algorithm, named AdaX, which accumulates long-term past gradient information exponentially. We prove the convergence of AdaX in both convex and non-convex settings. Extensive experiments show that AdaX outperforms Adam in various computer vision and natural language processing tasks and can catch up with SGD.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=r1l-5pEtDr
PDF https://openreview.net/pdf?id=r1l-5pEtDr
PWC https://paperswithcode.com/paper/adax-adaptive-gradient-descent-with
Repo
Framework
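
The phrase "accumulates long-term past gradient information exponentially" suggests a second-moment estimate that grows rather than decays. The sketch below reflects our reading of the update rule; the constants and the exact bias correction should be checked against the paper.

```python
# Hedged sketch of an AdaX-style step as we read the abstract.
import numpy as np

def adax_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=1e-4, eps=1e-8):
    m = b1 * m + (1 - b1) * grad        # first moment, as in Adam
    v = (1 + b2) * v + b2 * grad**2     # second moment *grows*, so past
                                        # gradients are retained rather than
                                        # exponentially forgotten
    v_hat = v / ((1 + b2) ** t - 1)     # bias correction (our reading)
    theta = theta - lr * m / (np.sqrt(v_hat) + eps)
    return theta, m, v
```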

Fully Convolutional Graph Neural Networks using Bipartite Graph Convolutions

Title Fully Convolutional Graph Neural Networks using Bipartite Graph Convolutions
Authors Marcel Nassar, Xin Wang, Evren Tumer
Abstract Graph neural networks have been adopted in numerous applications ranging from learning relational representations to modeling data on irregular domains such as point clouds, social graphs, and molecular structures. Though diverse in nature, graph neural network architectures remain limited by the graph convolution operator, whose input and output graphs must have the same structure. With this restriction, representational hierarchy can only be built by graph convolution operations followed by non-parameterized pooling or expansion layers. This is much like early convolutional network architectures, which were later replaced by more effective parameterized strided and transpose convolution operations in combination with skip connections. To bring a similar change to graph convolutional networks, we introduce the bipartite graph convolution operation, a parameterized transformation between different input and output graphs. Our framework is general enough to subsume conventional graph convolution and pooling as special cases, and it supports multi-graph aggregation, leading to a class of flexible and adaptable network architectures termed BiGraphNet. By replacing the sequence of graph convolution and pooling in hierarchical architectures with a single parametric bipartite graph convolution, (i) we answer the question of whether graph pooling matters, and (ii) we accelerate computations and lower memory requirements in hierarchical networks by eliminating pooling layers. Then, with concrete examples, we demonstrate that the general BiGraphNet formalism (iii) provides the modeling flexibility to build efficient architectures such as graph skip connections and autoencoders.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=H1gWyJBFDr
PDF https://openreview.net/pdf?id=H1gWyJBFDr
PWC https://paperswithcode.com/paper/fully-convolutional-graph-neural-networks
Repo
Framework
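
The bipartite graph convolution can be sketched directly: messages flow from an input node set to a *different* output node set along a bipartite edge list, so a single layer can change the graph structure the way a strided convolution changes resolution. Mean aggregation below is an illustrative choice, not necessarily the paper's.

```python
# Hedged sketch of a bipartite graph convolution layer.
import torch
import torch.nn as nn

class BipartiteGraphConv(nn.Module):
    def __init__(self, d_in, d_out):
        super().__init__()
        self.lin = nn.Linear(d_in, d_out)

    def forward(self, x_in, edges, n_out):
        # x_in: (n_in, d_in) input-graph node features
        # edges: (2, E), rows = (source in input graph, target in output graph)
        # n_out: size of the output node set (may differ from n_in)
        src, dst = edges
        msg = self.lin(x_in)[src]                       # (E, d_out)
        out = torch.zeros(n_out, msg.size(1))
        cnt = torch.zeros(n_out, 1)
        out.index_add_(0, dst, msg)
        cnt.index_add_(0, dst, torch.ones(len(dst), 1))
        return out / cnt.clamp(min=1)                   # mean over messages
```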

Multi-Agent Interactions Modeling with Correlated Policies

Title Multi-Agent Interactions Modeling with Correlated Policies
Authors Anonymous
Abstract In multi-agent systems, complex interacting behaviors arise from heavy correlations among agents. However, prior work on modeling multi-agent interactions from demonstrations has largely been constrained by the assumption that policies and their reward structures are independent. In this paper, we cast the multi-agent interactions modeling problem as multi-agent imitation learning with explicit modeling of correlated policies, achieved by approximating opponents’ policies. Consequently, we develop a Decentralized Adversarial Imitation Learning algorithm with Correlated policies (CoDAIL), which allows for decentralized training and execution. Various experiments demonstrate that CoDAIL better fits the complex interactions of the demonstrators and outperforms state-of-the-art multi-agent imitation learning methods.
Tasks Imitation Learning
Published 2020-01-01
URL https://openreview.net/forum?id=B1gZV1HYvS
PDF https://openreview.net/pdf?id=B1gZV1HYvS
PWC https://paperswithcode.com/paper/multi-agent-interactions-modeling-with
Repo
Framework

Learning to Make Generalizable and Diverse Predictions for Retrosynthesis

Title Learning to Make Generalizable and Diverse Predictions for Retrosynthesis
Authors Anonymous
Abstract We propose a new model for making generalizable and diverse retrosynthetic reaction predictions. Given a target compound, the task is to predict the likely chemical reactants to produce the target. This generative task can be framed as a sequence-to-sequence problem by using the SMILES representations of the molecules. Building on top of the popular Transformer architecture, we propose two novel pre-training methods that construct relevant auxiliary tasks (plausible reactions) for our problem. Furthermore, we incorporate a discrete latent variable model into the architecture to encourage the model to produce a diverse set of alternative predictions. On the 50k subset of reaction examples from the United States patent literature (USPTO-50k) benchmark dataset, our model greatly improves performance over the baseline, while also generating predictions that are more diverse.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=BygfrANKvB
PDF https://openreview.net/pdf?id=BygfrANKvB
PWC https://paperswithcode.com/paper/learning-to-make-generalizable-and-diverse-1
Repo
Framework

iSparse: Output Informed Sparsification of Neural Networks

Title iSparse: Output Informed Sparsification of Neural Networks
Authors Anonymous
Abstract Deep neural networks have demonstrated unprecedented success in various knowledge management applications. However, the networks created are often very complex, with large numbers of trainable edges that require extensive computational resources. We note that many successful networks nevertheless contain large numbers of redundant edges, many of which make negligible contributions to the overall network performance. In this paper, we propose the novel iSparse framework and experimentally show that we can sparsify a network by 30-50% without impacting its performance. iSparse leverages a novel edge significance score, E, to determine the importance of an edge with respect to the final network output. Furthermore, iSparse can be applied both while training a model and on top of a pre-trained model, making it a retraining-free approach with minimal computational overhead. Comparisons of iSparse against PFEC, NISP, DropConnect, and Retraining-Free on benchmark datasets show that iSparse leads to effective network sparsification.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=ryefmpEYPr
PDF https://openreview.net/pdf?id=ryefmpEYPr
PWC https://paperswithcode.com/paper/isparse-output-informed-sparsification-of
Repo
Framework
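
The abstract does not define the edge significance score E, but one plausible reading of "output informed" is a backward pass that propagates importance from the output layer, so an edge matters only insofar as it feeds nodes that matter downstream. The sketch below is that reading, not the paper's definition.

```python
# Heavily hedged illustration of an output-informed edge score for a
# feedforward net, computed backward from the output.
import numpy as np

def edge_significance(weights):
    # weights: list of layer matrices W[l] with shape (fan_out, fan_in)
    node_sig = np.ones(weights[-1].shape[0])    # output nodes matter fully
    scores = []
    for W in reversed(weights):
        E = np.abs(W) * node_sig[:, None]       # edge score for this layer
        scores.append(E)
        node_sig = E.sum(axis=0)                # importance of nodes below
    return scores[::-1]

# Sparsify by dropping the globally lowest-scoring 30-50% of edges.
```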

Stein Bridging: Enabling Mutual Reinforcement between Explicit and Implicit Generative Models

Title Stein Bridging: Enabling Mutual Reinforcement between Explicit and Implicit Generative Models
Authors Anonymous
Abstract Deep generative models are generally categorized into explicit and implicit models. The former assume an explicit density form whose normalizing constant is often unknown, while the latter, including generative adversarial networks (GANs), generate samples using a push-forward mapping. In spite of substantial recent advances demonstrating the power of both classes of generative models in many applications, each, when used alone, suffers from its own limitations and drawbacks. To mitigate these issues, we propose Stein Bridging, a novel joint training framework that connects an explicit density estimator and an implicit sample generator via the Stein discrepancy. We show that the Stein bridge induces new regularization schemes for both explicit and implicit models. Convergence analysis and extensive experiments demonstrate that Stein Bridging (i) improves the stability and sample quality of GAN training, and (ii) helps the density estimator capture more modes of the data, alleviating mode collapse. Additionally, we discuss several applications of Stein Bridging and useful practical tricks used in our experiments.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=H1gx3kSKPS
PDF https://openreview.net/pdf?id=H1gx3kSKPS
PWC https://paperswithcode.com/paper/stein-bridging-enabling-mutual-reinforcement
Repo
Framework
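
The bridge between the two models is a Stein discrepancy. As a self-contained reference point, here is the standard kernelized Stein discrepancy (KSD) between samples and a density known up to a constant through its score function, with an RBF kernel; the paper's exact discrepancy and estimator may differ.

```python
# Standard kernelized Stein discrepancy (squared, V-statistic estimate) with
# an RBF kernel k(x,y) = exp(-||x-y||^2 / (2 h^2)).
import numpy as np

def ksd(x, score, h=1.0):
    # x: (n, d) samples; score: (n, d) values of s(x_i) = grad log p(x_i).
    n, d = x.shape
    diff = x[:, None, :] - x[None, :, :]                     # (n, n, d)
    sq = (diff ** 2).sum(-1)
    K = np.exp(-sq / (2 * h**2))
    s = score
    term1 = (s @ s.T) * K                                    # s(x)^T s(y) k
    term2 = np.einsum('id,ijd->ij', s, diff) * (K / h**2)    # s(x)^T grad_y k
    term3 = -np.einsum('jd,ijd->ij', s, diff) * (K / h**2)   # s(y)^T grad_x k
    term4 = (d / h**2 - sq / h**4) * K                       # tr(grad_x grad_y k)
    return (term1 + term2 + term3 + term4).mean()
```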

Towards understanding the true loss surface of deep neural networks using random matrix theory and iterative spectral methods

Title Towards understanding the true loss surface of deep neural networks using random matrix theory and iterative spectral methods
Authors Diego Granziol, Timur Garipov, Dmitry Vetrov, Stefan Zohren, Stephen Roberts, Andrew Gordon Wilson
Abstract The geometric properties of loss surfaces, such as the local flatness of a solution, are associated with generalization in deep learning. The Hessian is often used to understand these geometric properties. We investigate the differences between the eigenvalues of the neural network Hessian evaluated over the empirical dataset, the Empirical Hessian, and the eigenvalues of the Hessian under the data-generating distribution, which we term the True Hessian. Under mild assumptions, we use random matrix theory to show that the True Hessian has eigenvalues of smaller absolute value than the Empirical Hessian. We support these results for different SGD schedules on both a 110-layer ResNet and VGG-16. To perform these experiments, we propose a framework for spectral visualization based on GPU-accelerated stochastic Lanczos quadrature. This approach is an order of magnitude faster than state-of-the-art methods for spectral visualization and can be used generically to investigate the spectral properties of matrices in deep learning.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=H1gza2NtwH
PDF https://openreview.net/pdf?id=H1gza2NtwH
PWC https://paperswithcode.com/paper/towards-understanding-the-true-loss-surface
Repo
Framework
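
The spectral tool the abstract relies on is stochastic Lanczos quadrature: running Lanczos with Hessian-vector products from a random start vector yields a small tridiagonal matrix whose Ritz values and weights approximate the Hessian's spectral density, without ever forming the Hessian. A plain-NumPy sketch (single probe vector, no reorthogonalization; `hvp` stands for any Hessian-vector product, e.g. via double backprop), not the paper's full GPU-accelerated pipeline:

```python
# Hedged sketch of Lanczos-based spectrum estimation from matrix-vector
# products only.
import numpy as np
from scipy.linalg import eigh_tridiagonal

def lanczos_spectrum(hvp, dim, steps=80, seed=0):
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(dim); v /= np.linalg.norm(v)
    alphas, betas, v_prev, beta = [], [], np.zeros(dim), 0.0
    for _ in range(steps):
        w = hvp(v) - beta * v_prev          # three-term Lanczos recurrence
        alpha = w @ v
        w -= alpha * v
        beta = np.linalg.norm(w)            # assumed nonzero for this sketch
        alphas.append(alpha); betas.append(beta)
        v_prev, v = v, w / beta
    ritz, vecs = eigh_tridiagonal(alphas, betas[:-1])
    weights = vecs[0] ** 2                  # quadrature weights for the density
    return ritz, weights
```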