Paper Group NANR 69
Measure by Measure: Automatic Music Composition with Traditional Western Music Notation
Title | Measure by Measure: Automatic Music Composition with Traditional Western Music Notation |
Authors | Anonymous |
Abstract | In this paper, we present a system that is capable of generating long polyphonic music for a given number of measures, up to hundreds of measures. This is achieved by creating a measure model that imitates the object hierarchy used in common encodings of traditional Western music notation. On top of this, we construct an inter-measure context model that spans the entire composition. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Hklk6xrYPB |
PDF | https://openreview.net/pdf?id=Hklk6xrYPB |
PWC | https://paperswithcode.com/paper/measure-by-measure-automatic-music |
Repo | |
Framework | |
Confidence-Calibrated Adversarial Training: Towards Robust Models Generalizing Beyond the Attack Used During Training
Title | Confidence-Calibrated Adversarial Training: Towards Robust Models Generalizing Beyond the Attack Used During Training |
Authors | Anonymous |
Abstract | Adversarial training is the standard approach for training models that are robust to adversarial examples. However, especially for complex datasets, adversarial training incurs a significant loss in accuracy and is known to generalize poorly to stronger attacks, e.g., larger perturbations or other threat models. In this paper, we introduce confidence-calibrated adversarial training (CCAT), where the key idea is to enforce that the confidence on adversarial examples decays with their distance to the attacked examples. We show that CCAT better preserves the accuracy of normal training, while robustness against adversarial examples is achieved via confidence thresholding. Most importantly, in strong contrast to adversarial training, the robustness of CCAT generalizes to larger perturbations and other threat models not encountered during training. We also discuss our extensive work to design strong adaptive attacks against CCAT and standard adversarial training, which is of independent interest. We present experimental results on MNIST, SVHN, and CIFAR-10. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SJgwf04KPr |
PDF | https://openreview.net/pdf?id=SJgwf04KPr |
PWC | https://paperswithcode.com/paper/confidence-calibrated-adversarial-training-1 |
Repo | |
Framework | |
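The core mechanism in the CCAT abstract, confidence that decays with perturbation size, can be illustrated with a small training-loss sketch. Everything below is an assumption for illustration (the L-infinity distance measure, the power schedule `rho`, and the function names are not taken from the paper):

```python
import torch
import torch.nn.functional as F

def ccat_target(y_onehot, delta, eps, rho=10.0):
    # Interpolation weight decays from 1 (clean input) to 0 (perturbation at the eps-ball boundary).
    lam = (1.0 - (delta.flatten(1).norm(p=float("inf"), dim=1) / eps).clamp(max=1.0)) ** rho
    lam = lam.view(-1, 1)
    num_classes = y_onehot.size(1)
    uniform = torch.full_like(y_onehot, 1.0 / num_classes)
    return lam * y_onehot + (1.0 - lam) * uniform  # soft target: one-hot -> uniform

def ccat_loss(logits_adv, y_onehot, delta, eps):
    # Cross-entropy against the calibrated soft target instead of the hard label.
    target = ccat_target(y_onehot, delta, eps)
    return -(target * F.log_softmax(logits_adv, dim=1)).sum(dim=1).mean()
```

At test time, robustness then comes from rejecting inputs whose maximum predicted confidence falls below a threshold, as described in the abstract.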
PassNet: Learning pass probability surfaces from single-location labels. An architecture for visually-interpretable soccer analytics
Title | PassNet: Learning pass probability surfaces from single-location labels. An architecture for visually-interpretable soccer analytics |
Authors | Anonymous |
Abstract | We propose a fully convolutional network architecture that is able to estimate a full surface of pass probabilities from single-location labels derived from high-frequency spatio-temporal data of professional soccer matches. The network performs remarkably well from low-level inputs by learning a feature hierarchy that produces predictions at different sampling levels, which are merged to preserve both coarse and fine detail. Our approach presents an extreme case of weakly supervised learning, where there is just a single-pixel correspondence between ground-truth outcomes and the predicted probability map. By providing not just an accurate evaluation of observed events but also a visual interpretation of the results of other potential actions, our approach opens the door for spatio-temporal decision-making analysis, an as-yet little-explored area in sports. Our proposed deep learning architecture can be easily adapted to solve many other related problems in sports analytics; we demonstrate this by extending the network to estimate pass-selection likelihood. |
Tasks | Decision Making |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=r1xxKJBKvr |
PDF | https://openreview.net/pdf?id=r1xxKJBKvr |
PWC | https://paperswithcode.com/paper/passnet-learning-pass-probability-surfaces |
Repo | |
Framework | |
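The weak supervision described in the abstract, a full predicted surface constrained by a single labeled pixel, is easy to illustrate. The sketch below assumes tensor shapes and names that are not from the paper:

```python
import torch
import torch.nn.functional as F

def single_location_loss(pred_surface, dest_xy, outcome):
    """Sketch: the network predicts a full probability surface, but the loss
    is evaluated only at the pixel where the observed pass ended."""
    # pred_surface: (B, 1, H, W) logits
    # dest_xy: (B, 2) integer pixel coordinates (row, col) of the pass destination
    # outcome: (B,) 1.0 if the pass was completed, else 0.0
    b = torch.arange(pred_surface.size(0))
    logits_at_dest = pred_surface[b, 0, dest_xy[:, 0], dest_xy[:, 1]]
    return F.binary_cross_entropy_with_logits(logits_at_dest, outcome)
```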
Global Relational Models of Source Code
Title | Global Relational Models of Source Code |
Authors | Anonymous |
Abstract | Models of code can learn distributed representations of a program’s syntax and semantics to predict many non-trivial properties of a program. Recent state-of-the-art models leverage highly structured representations of programs, such as trees, graphs and paths therein (e.g. data-flow relations), which are precise and abundantly available for code. This provides a strong inductive bias towards semantically meaningful relations, yielding more generalizable representations than classical sequence-based models. Unfortunately, these models primarily rely on graph-based message passing to represent relations in code, which makes them de facto local due to the high cost of message-passing steps, quite in contrast to modern, global sequence-based models, such as the Transformer. In this work, we bridge this divide between global and structured models by introducing two new hybrid model families that are both global and incorporate structural bias: Graph Sandwiches, which wrap traditional (gated) graph message-passing layers in sequential message-passing layers; and Graph Relational Embedding Attention Transformers (GREAT for short), which bias traditional Transformers with relational information from graph edge types. By studying a popular, non-trivial program repair task, variable-misuse identification, we explore the relative merits of traditional and hybrid model families for code representation. Starting with a graph-based model that already improves upon the prior state-of-the-art for this task by 20%, we show that our proposed hybrid models improve by an additional 10-15%, while both training faster and using fewer parameters. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=B1lnbRNtwr |
PDF | https://openreview.net/pdf?id=B1lnbRNtwr |
PWC | https://paperswithcode.com/paper/global-relational-models-of-source-code |
Repo | |
Framework | |
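The GREAT idea as summarized in the abstract, biasing Transformer attention with relational information from graph edge types, can be sketched as a single attention layer. The additive per-edge-type bias and all names below are assumptions, not the paper's implementation:

```python
import torch
import torch.nn as nn

class EdgeBiasedAttention(nn.Module):
    """Sketch of attention whose logits are biased by graph edge types."""
    def __init__(self, dim, num_edge_types):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.edge_bias = nn.Embedding(num_edge_types + 1, 1)  # index 0 = "no edge"

    def forward(self, x, edge_type):
        # x: (B, N, dim) token states; edge_type: (B, N, N) integer edge labels
        scores = self.q(x) @ self.k(x).transpose(-2, -1) / x.size(-1) ** 0.5
        scores = scores + self.edge_bias(edge_type).squeeze(-1)  # relational bias
        return torch.softmax(scores, dim=-1) @ self.v(x)
```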
Semi-Supervised Named Entity Recognition with CRF-VAEs
Title | Semi-Supervised Named Entity Recognition with CRF-VAEs |
Authors | Anonymous |
Abstract | We investigate methods for semi-supervised learning (SSL) of a neural linear-chain conditional random field (CRF) for Named Entity Recognition (NER) by treating the tagger as the amortized variational posterior in a generative model of text given tags. We first illustrate how to incorporate a CRF in a VAE, enabling end-to-end training on semi-supervised data. We then investigate a series of increasingly complex deep generative models of tokens given tags enabled by end-to-end optimization, comparing the proposed models against supervised and strong CRF SSL baselines on the Ontonotes5 NER dataset. We find that our best proposed model consistently improves performance by $\approx 1\%$ F1 in low- and moderate-resource regimes and easily addresses degenerate model behavior in a more difficult, partially supervised setting. |
Tasks | Named Entity Recognition |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BkxnKkrtvS |
PDF | https://openreview.net/pdf?id=BkxnKkrtvS |
PWC | https://paperswithcode.com/paper/semi-supervised-named-entity-recognition-with |
Repo | |
Framework | |
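The training objective implied by the abstract, a CRF tagger serving as the amortized variational posterior in a generative model of tokens given tags, can be written as a standard semi-supervised VAE objective. The notation below is assumed rather than taken from the paper:

```latex
% Sketch: the CRF q_\phi(y \mid x) is the amortized posterior over tag sequences y
% for a generative model p_\theta(x \mid y)\,p(y) of tokens x given tags.
% Unlabeled sentences use the ELBO; labeled sentences use the supervised CRF loss.
\mathcal{L}_{\text{unsup}}(x) =
  \mathbb{E}_{q_\phi(y \mid x)}\big[\log p_\theta(x \mid y)\big]
  - \mathrm{KL}\big(q_\phi(y \mid x)\,\|\,p(y)\big),
\qquad
\mathcal{L}_{\text{sup}}(x, y) = -\log q_\phi(y \mid x)
```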
End-to-end named entity recognition and relation extraction using pre-trained language models
Title | End-to-end named entity recognition and relation extraction using pre-trained language models |
Authors | Anonymous |
Abstract | Named entity recognition (NER) and relation extraction (RE) are two important tasks in information extraction and retrieval (IE & IR). Recent work has demonstrated that it is beneficial to learn these tasks jointly, which avoids the propagation of error inherent in pipeline-based systems and improves performance. However, state-of-the-art joint models typically rely on external natural language processing (NLP) tools, such as dependency parsers, limiting their usefulness to domains (e.g. news) where those tools perform well. The few neural, end-to-end models that have been proposed are trained almost completely from scratch. In this paper, we propose a neural, end-to-end model for jointly extracting entities and their relations which does not rely on external NLP tools and which integrates a large, pre-trained language model. Because the bulk of our model’s parameters are pre-trained and we eschew recurrence for self-attention, our model is fast to train. On 5 datasets across 3 domains, our model matches or exceeds state-of-the-art performance, sometimes by a large margin. |
Tasks | Language Modelling, Named Entity Recognition, Relation Extraction |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rkgqm0VKwB |
PDF | https://openreview.net/pdf?id=rkgqm0VKwB |
PWC | https://paperswithcode.com/paper/end-to-end-named-entity-recognition-and |
Repo | |
Framework | |
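A minimal sketch of the kind of architecture the abstract describes: a shared pre-trained encoder feeding a token-level NER head and a relation head over entity-pair representations. The model name, the single head/tail pair per example, and all identifiers are illustrative assumptions; real joint models typically score all candidate pairs:

```python
import torch
import torch.nn as nn
from transformers import AutoModel

class JointNerRe(nn.Module):
    """Sketch of joint NER + relation extraction over a shared pre-trained encoder."""
    def __init__(self, model_name="bert-base-cased", num_tags=9, num_relations=7):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        self.ner_head = nn.Linear(hidden, num_tags)
        self.re_head = nn.Linear(2 * hidden, num_relations)  # scores one (head, tail) pair

    def forward(self, input_ids, attention_mask, head_idx, tail_idx):
        h = self.encoder(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        tag_logits = self.ner_head(h)                                  # (B, T, num_tags)
        b = torch.arange(h.size(0))
        pair = torch.cat([h[b, head_idx], h[b, tail_idx]], dim=-1)    # concat head/tail states
        rel_logits = self.re_head(pair)                                # (B, num_relations)
        return tag_logits, rel_logits
```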
Autoencoder-based Initialization for Recurrent Neural Networks with a Linear Memory
Title | Autoencoder-based Initialization for Recurrent Neural Networks with a Linear Memory |
Authors | Anonymous |
Abstract | Orthogonal recurrent neural networks address the vanishing gradient problem by parameterizing the recurrent connections using an orthogonal matrix. This class of models is particularly effective at solving tasks that require memorization of long sequences. We propose an alternative solution based on explicit memorization using linear autoencoders for sequences. We show how a recently proposed recurrent architecture, the Linear Memory Network, composed of a nonlinear feedforward layer and a separate linear recurrence, can be used to solve hard memorization tasks. We propose an initialization scheme that sets the weights of a recurrent architecture to approximate a linear autoencoder of the input sequences, which can be found with a closed-form solution. The initialization scheme can be easily adapted to any recurrent architecture. We argue that this approach is superior to a random orthogonal initialization because the autoencoder allows the memorization of long sequences even before training. The empirical analysis shows that our approach achieves competitive results against alternative orthogonal models and the LSTM on sequential MNIST, permuted MNIST, and TIMIT. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BkgM7xHYwH |
PDF | https://openreview.net/pdf?id=BkgM7xHYwH |
PWC | https://paperswithcode.com/paper/autoencoder-based-initialization-for |
Repo | |
Framework | |
Relative Pixel Prediction For Autoregressive Image Generation
Title | Relative Pixel Prediction For Autoregressive Image Generation |
Authors | Anonymous |
Abstract | In natural images, transitions between adjacent pixels tend to be smooth and gradual, a fact that has long been exploited in image compression models based on predictive coding. In contrast, existing neural autoregressive image generation models predict the absolute pixel intensities at each position, which is a more challenging problem. In this paper, we propose to predict pixels relatively, by predicting new pixels relative to previously generated pixels (or pixels from the conditioning context, when available). We show that this form of prediction fares favorably against its absolute counterpart when used independently, but that their combination under a unified probabilistic model yields the best performance, as the model learns to predict sharp transitions using the absolute predictor while generating smooth transitions using the relative predictor. Experiments on multiple benchmarks for unconditional image generation, image colorization, and super-resolution indicate that the proposed mechanism leads to improvements in likelihood compared to absolute-prediction counterparts. |
Tasks | Colorization, Image Compression, Image Generation, Super-Resolution |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SyedHyBFwS |
PDF | https://openreview.net/pdf?id=SyedHyBFwS |
PWC | https://paperswithcode.com/paper/relative-pixel-prediction-for-autoregressive |
Repo | |
Framework | |
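One plausible way to combine an absolute and a relative pixel predictor "under a unified probabilistic model" is a two-component mixture over discretized intensities. The sketch below is an assumption about that combination, not the paper's exact parameterization:

```python
import torch
import torch.nn.functional as F

def mixed_pixel_log_prob(abs_logits, rel_logits, gate_logit, prev_pixel, target):
    """Sketch: log-probability of a pixel under a gated mixture of an absolute
    predictor (over intensities) and a relative predictor (over deltas)."""
    # abs_logits: (B, 256) over absolute intensities
    # rel_logits: (B, 511) over deltas in [-255, 255]
    # gate_logit: (B,) mixing logit; prev_pixel, target: (B,) int64 in [0, 255]
    log_p_abs = F.log_softmax(abs_logits, dim=-1).gather(1, target.unsqueeze(1)).squeeze(1)
    delta_idx = (target - prev_pixel) + 255
    log_p_rel = F.log_softmax(rel_logits, dim=-1).gather(1, delta_idx.unsqueeze(1)).squeeze(1)
    log_gate = F.logsigmoid(gate_logit)        # P(use absolute predictor)
    log_anti = F.logsigmoid(-gate_logit)       # P(use relative predictor)
    return torch.logaddexp(log_gate + log_p_abs, log_anti + log_p_rel)
```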
AdaX: Adaptive Gradient Descent with Exponential Long Term Memory
Title | AdaX: Adaptive Gradient Descent with Exponential Long Term Memory |
Authors | Anonymous |
Abstract | Adaptive optimization algorithms such as RMSProp and Adam offer fast convergence and a smooth learning process. Despite their successes, they are proven to have non-convergence issues even in convex optimization problems, as well as weak performance compared with first-order gradient methods such as stochastic gradient descent (SGD). Several other algorithms, for example AMSGrad and AdaShift, have been proposed to alleviate these issues, but only minor effects have been observed. This paper further analyzes the performance of such algorithms in a non-convex setting by extending their non-convergence issue to a simple non-convex case, and shows that Adam’s design of update steps can lead the algorithm to poor local minima. To address the above problems, we propose a novel adaptive gradient descent algorithm, named AdaX, which accumulates long-term past gradient information exponentially. We prove the convergence of AdaX in both convex and non-convex settings. Extensive experiments show that AdaX outperforms Adam in various computer vision and natural language processing tasks and can catch up with SGD. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=r1l-5pEtDr |
PDF | https://openreview.net/pdf?id=r1l-5pEtDr |
PWC | https://paperswithcode.com/paper/adax-adaptive-gradient-descent-with |
Repo | |
Framework | |
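The abstract's key idea, accumulating long-term past gradient information exponentially, can be illustrated with a single optimizer step. The constants, bias correction, and update form below are assumptions for illustration; the exact AdaX update is defined in the paper:

```python
import torch

def adax_like_step(param, grad, state, lr=1e-3, beta1=0.9, beta2=1e-4, eps=1e-8):
    """Sketch of an adaptive step whose second moment gives exponentially
    *growing* weight to accumulated past gradients (illustrative only)."""
    state['t'] = state.get('t', 0) + 1
    state['m'] = beta1 * state.get('m', torch.zeros_like(param)) + (1 - beta1) * grad
    state['v'] = (1 + beta2) * state.get('v', torch.zeros_like(param)) + beta2 * grad ** 2
    v_hat = state['v'] / ((1 + beta2) ** state['t'] - 1)   # assumed bias correction
    param.data -= lr * state['m'] / (v_hat.sqrt() + eps)
```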
Fully Convolutional Graph Neural Networks using Bipartite Graph Convolutions
Title | Fully Convolutional Graph Neural Networks using Bipartite Graph Convolutions |
Authors | Marcel Nassar, Xin Wang, Evren Tumer |
Abstract | Graph neural networks have been adopted in numerous applications ranging from learning relational representations to modeling data on irregular domains such as point clouds, social graphs, and molecular structures. Though diverse in nature, graph neural network architectures remain limited by the graph convolution operator, whose input and output graphs must have the same structure. With this restriction, representational hierarchy can only be built by graph convolution operations followed by non-parameterized pooling or expansion layers. This is very much like early convolutional network architectures, which have since been replaced by more effective parameterized strided and transposed convolution operations in combination with skip connections. In order to bring a similar change to graph convolutional networks, here we introduce the bipartite graph convolution operation, a parameterized transformation between different input and output graphs. Our framework is general enough to subsume conventional graph convolution and pooling as its special cases and supports multi-graph aggregation, leading to a class of flexible and adaptable network architectures, termed BiGraphNet. By replacing the sequence of graph convolution and pooling in hierarchical architectures with a single parametric bipartite graph convolution, (i) we answer the question of whether graph pooling matters, and (ii) accelerate computations and lower memory requirements in hierarchical networks by eliminating pooling layers. Then, with concrete examples, we demonstrate that the general BiGraphNet formalism (iii) provides the modeling flexibility to build efficient architectures such as graph skip connections and autoencoders. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=H1gWyJBFDr |
PDF | https://openreview.net/pdf?id=H1gWyJBFDr |
PWC | https://paperswithcode.com/paper/fully-convolutional-graph-neural-networks |
Repo | |
Framework | |
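The bipartite graph convolution described in the abstract maps features on an input graph's nodes to a differently structured output graph's nodes through one parameterized transformation. A minimal sketch, with mean aggregation assumed:

```python
import torch
import torch.nn as nn

class BipartiteGraphConv(nn.Module):
    """Sketch: parameterized transformation from N_in input nodes to N_out output nodes."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x_in, bipartite_adj):
        # x_in: (N_in, in_dim) input-node features
        # bipartite_adj: (N_out, N_in) 0/1 incidence between output and input nodes
        deg = bipartite_adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        return torch.relu(self.lin(bipartite_adj @ x_in) / deg)   # mean-aggregated, transformed
```

Setting the output graph to a coarsened version of the input graph recovers convolution-plus-pooling as a special case, which is the replacement the abstract argues for.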
Multi-Agent Interactions Modeling with Correlated Policies
Title | Multi-Agent Interactions Modeling with Correlated Policies |
Authors | Anonymous |
Abstract | In multi-agent systems, complex interacting behaviors arise due to heavy correlations among agents. However, prior works on modeling multi-agent interactions from demonstrations have largely been constrained by assuming the independence among policies and their reward structures. In this paper, we cast the multi-agent interactions modeling problem into a multi-agent imitation learning framework with explicit modeling of correlated policies by approximating opponents’ policies. Consequently, we develop a Decentralized Adversarial Imitation Learning algorithm with Correlated policies (CoDAIL), which allows for decentralized training and execution. Various experiments demonstrate that CoDAIL can better fit complex interactions close to the demonstrators and outperforms state-of-the-art multi-agent imitation learning methods. |
Tasks | Imitation Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=B1gZV1HYvS |
PDF | https://openreview.net/pdf?id=B1gZV1HYvS |
PWC | https://paperswithcode.com/paper/multi-agent-interactions-modeling-with |
Repo | |
Framework | |
Learning to Make Generalizable and Diverse Predictions for Retrosynthesis
Title | Learning to Make Generalizable and Diverse Predictions for Retrosynthesis |
Authors | Anonymous |
Abstract | We propose a new model for making generalizable and diverse retrosynthetic reaction predictions. Given a target compound, the task is to predict the likely chemical reactants to produce the target. This generative task can be framed as a sequence-to-sequence problem by using the SMILES representations of the molecules. Building on top of the popular Transformer architecture, we propose two novel pre-training methods that construct relevant auxiliary tasks (plausible reactions) for our problem. Furthermore, we incorporate a discrete latent variable model into the architecture to encourage the model to produce a diverse set of alternative predictions. On the 50k subset of reaction examples from the United States patent literature (USPTO-50k) benchmark dataset, our model greatly improves performance over the baseline, while also generating predictions that are more diverse. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BygfrANKvB |
PDF | https://openreview.net/pdf?id=BygfrANKvB |
PWC | https://paperswithcode.com/paper/learning-to-make-generalizable-and-diverse-1 |
Repo | |
Framework | |
iSparse: Output Informed Sparsification of Neural Networks
Title | iSparse: Output Informed Sparsification of Neural Networks |
Authors | Anonymous |
Abstract | Deep neural networks have demonstrated unprecedented success in various knowledge management applications. However, the networks created are often very complex, with large numbers of trainable edges that require extensive computational resources. We note that many successful networks nevertheless often contain large numbers of redundant edges, and many of these edges may have negligible contributions towards the overall network performance. In this paper, we propose a novel framework, iSparse, and experimentally show that we can sparsify the network by 30-50% without impacting network performance. iSparse leverages a novel edge significance score, E, to determine the importance of an edge with respect to the final network output. Furthermore, iSparse can be applied either while training a model or on top of a pre-trained model, making it a retraining-free approach with minimal computational overhead. Comparisons of iSparse against PFEC, NISP, DropConnect, and Retraining-Free on benchmark datasets show that iSparse leads to effective network sparsification. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=ryefmpEYPr |
PDF | https://openreview.net/pdf?id=ryefmpEYPr |
PWC | https://paperswithcode.com/paper/isparse-output-informed-sparsification-of |
Repo | |
Framework | |
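The masking mechanics behind this kind of edge-level sparsification are straightforward to sketch. Note that the paper's output-informed significance score E is not reproduced here; a simple |weight| x mean|activation| proxy stands in for it:

```python
import torch
import torch.nn as nn

def sparsify_linear(layer: nn.Linear, inputs: torch.Tensor, keep_ratio=0.6):
    """Sketch: score every edge of a linear layer and zero out the least
    significant ones. The scoring rule here is a stand-in, not the paper's E."""
    with torch.no_grad():
        act = inputs.abs().mean(dim=0)                  # (in_features,) mean |activation|
        score = layer.weight.abs() * act                # (out_features, in_features)
        k = int(keep_ratio * score.numel())
        threshold = score.flatten().kthvalue(score.numel() - k + 1).values
        mask = (score >= threshold).float()
        layer.weight.mul_(mask)                         # prune edges below the threshold
    return mask
```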
Stein Bridging: Enabling Mutual Reinforcement between Explicit and Implicit Generative Models
Title | Stein Bridging: Enabling Mutual Reinforcement between Explicit and Implicit Generative Models |
Authors | Anonymous |
Abstract | Deep generative models are generally categorized into explicit models and implicit models. The former assumes an explicit density form whose normalizing constant is often unknown, while the latter, including generative adversarial networks (GANs), generates samples using a push-forward mapping. Despite substantial recent advances demonstrating the power of the two classes of generative models in many applications, each of them, when used alone, suffers from its own limitations and drawbacks. To mitigate these issues, we propose Stein Bridging, a novel joint training framework that connects an explicit density estimator and an implicit sample generator with the Stein discrepancy. We show that the Stein Bridge induces new regularization schemes for both explicit and implicit models. Convergence analysis and extensive experiments demonstrate that Stein Bridging i) improves the stability and sample quality of GAN training, and ii) helps the density estimator find more modes in the data, alleviating the mode-collapse issue. Additionally, we discuss several applications of Stein Bridging and useful practical implementation tricks used in our experiments. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=H1gx3kSKPS |
PDF | https://openreview.net/pdf?id=H1gx3kSKPS |
PWC | https://paperswithcode.com/paper/stein-bridging-enabling-mutual-reinforcement |
Repo | |
Framework | |
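The bridge between the two model classes rests on the Stein discrepancy, which compares the explicit model's score function against samples from the implicit generator. For reference, the standard kernelized Stein discrepancy has the form below; how the paper combines it with the GAN and density-estimation objectives is not shown here:

```latex
% Kernelized Stein discrepancy between samples x, x' ~ q from the implicit generator
% and the explicit model's score s_p(x) = \nabla_x \log p_\theta(x) (reference form only):
\mathrm{KSD}^2(q, p) = \mathbb{E}_{x, x' \sim q}\Big[
    s_p(x)^\top k(x, x')\, s_p(x')
  + s_p(x)^\top \nabla_{x'} k(x, x')
  + \nabla_{x} k(x, x')^\top s_p(x')
  + \operatorname{tr}\,\nabla_{x} \nabla_{x'} k(x, x') \Big]
```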
Towards understanding the true loss surface of deep neural networks using random matrix theory and iterative spectral methods
Title | Towards understanding the true loss surface of deep neural networks using random matrix theory and iterative spectral methods |
Authors | Diego Granziol, Timur Garipov, Dmitry Vetrov, Stefan Zohren, Stephen Roberts, Andrew Gordon Wilson |
Abstract | The geometric properties of loss surfaces, such as the local flatness of a solution, are associated with generalization in deep learning. The Hessian is often used to understand these geometric properties. We investigate the differences between the eigenvalues of the neural network Hessian evaluated over the empirical dataset, the Empirical Hessian, and the eigenvalues of the Hessian under the data generating distribution, which we term the True Hessian. Under mild assumptions, we use random matrix theory to show that the True Hessian has eigenvalues of smaller absolute value than the Empirical Hessian. We support these results for different SGD schedules on both a 110-Layer ResNet and VGG-16. To perform these experiments we propose a framework for spectral visualization, based on GPU accelerated stochastic Lanczos quadrature. This approach is an order of magnitude faster than state-of-the-art methods for spectral visualization, and can be generically used to investigate the spectral properties of matrices in deep learning. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=H1gza2NtwH |
PDF | https://openreview.net/pdf?id=H1gza2NtwH |
PWC | https://paperswithcode.com/paper/towards-understanding-the-true-loss-surface |
Repo | |
Framework | |
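Stochastic Lanczos quadrature estimates Hessian eigenvalue spectra using only Hessian-vector products. The sketch below shows the basic recipe with a single random probe and no reorthogonalization, unlike the paper's GPU-accelerated framework:

```python
import torch

def hvp(loss, params, v):
    """Hessian-vector product via double backprop (standard trick, not the paper's code)."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    flat = torch.cat([g.reshape(-1) for g in grads])
    hv = torch.autograd.grad((flat * v).sum(), params, retain_graph=True)
    return torch.cat([h.reshape(-1) for h in hv])

def lanczos_spectrum(loss, params, dim, steps=30):
    """Sketch: run Lanczos with one random probe; the tridiagonal matrix's
    eigenvalues (Ritz values) approximate the Hessian spectrum."""
    v = torch.randn(dim)
    v = v / v.norm()
    v_prev = torch.zeros(dim)
    alphas, betas, beta = [], [], 0.0
    for i in range(steps):
        w = hvp(loss, params, v) - beta * v_prev
        alpha = torch.dot(w, v).item()
        alphas.append(alpha)
        w = w - alpha * v
        beta = w.norm().item()
        if beta < 1e-10 or i == steps - 1:
            break
        betas.append(beta)
        v_prev, v = v, w / beta
    T = torch.diag(torch.tensor(alphas)) \
        + torch.diag(torch.tensor(betas), 1) + torch.diag(torch.tensor(betas), -1)
    return torch.linalg.eigvalsh(T)
```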