April 1, 2020

2871 words 14 mins read

Paper Group NANR 103

SCELMo: Source Code Embeddings from Language Models. AN ATTENTION-BASED DEEP NET FOR LEARNING TO RANK. Probability Calibration for Knowledge Graph Embedding Models. Random Bias Initialization Improving Binary Neural Network Training. LEVERAGING AUXILIARY TEXT FOR DEEP RECOGNITION OF UNSEEN VISUAL RELATIONSHIPS. Learning Likelihoods with Conditional …

SCELMo: Source Code Embeddings from Language Models

Title SCELMo: Source Code Embeddings from Language Models
Authors Anonymous
Abstract Continuous embeddings of tokens in computer programs have been used to support a variety of software development tools, including readability, code search, and program repair. Contextual embeddings are common in natural language processing but have not been previously applied in software engineering. We introduce a new set of deep contextualized word representations for computer programs based on language models. We train a set of embeddings using the ELMo (embeddings from language models) framework of Peters et al. (2018). We investigate whether these embeddings are effective when fine-tuned for the downstream task of bug detection. We show that even a low-dimensional embedding trained on a relatively small corpus of programs can improve a state-of-the-art machine learning system for bug detection.
Tasks Code Search
Published 2020-01-01
URL https://openreview.net/forum?id=ryxnJlSKvr
PDF https://openreview.net/pdf?id=ryxnJlSKvr
PWC https://paperswithcode.com/paper/scelmo-source-code-embeddings-from-language
Repo
Framework
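
To make the ELMo-style mechanism concrete, here is a minimal sketch of how contextual token embeddings can be formed as a learned, softmax-weighted mixture of layer representations from a small bidirectional LSTM language model run over code tokens. The vocabulary size, dimensions, and architecture below are illustrative assumptions, not the SCELMo release.

```python
# Minimal ELMo-style layer mixing for code tokens (illustrative sketch, not SCELMo itself).
import torch
import torch.nn as nn

class ELMoStyleEncoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=64, hidden_dim=64, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.bilstm = nn.LSTM(emb_dim, hidden_dim, num_layers=num_layers,
                              batch_first=True, bidirectional=True)
        # One scalar weight per layer representation (token embedding + LSTM output here),
        # normalized with softmax as in ELMo, plus a global scale gamma.
        self.layer_weights = nn.Parameter(torch.zeros(2))
        self.gamma = nn.Parameter(torch.ones(1))

    def forward(self, token_ids):
        e = self.embed(token_ids)                    # (batch, seq, emb_dim)
        h, _ = self.bilstm(e)                        # (batch, seq, 2*hidden_dim)
        e2 = torch.cat([e, e], dim=-1)               # crude width match for the sketch
        w = torch.softmax(self.layer_weights, dim=0)
        return self.gamma * (w[0] * e2 + w[1] * h)   # one contextual vector per token

# Usage: embed a (fake) tokenized code snippet.
encoder = ELMoStyleEncoder(vocab_size=100)
code_tokens = torch.randint(0, 100, (1, 12))         # placeholder ids for 12 code tokens
contextual = encoder(code_tokens)
print(contextual.shape)                              # torch.Size([1, 12, 128])
```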

AN ATTENTION-BASED DEEP NET FOR LEARNING TO RANK

Title AN ATTENTION-BASED DEEP NET FOR LEARNING TO RANK
Authors Anonymous
Abstract In information retrieval, learning to rank constructs a machine-based ranking model which given a query, sorts the search results by their degree of relevance or importance to the query. Neural networks have been successfully applied to this problem, and in this paper, we propose an attention-based deep neural network which better incorporates different embeddings of the queries and search results with an attention-based mechanism. This model also applies a decoder mechanism to learn the ranks of the search results in a listwise fashion. The embeddings are trained with convolutional neural networks or the word2vec model. We demonstrate the performance of this model with image retrieval and text querying data sets.
Tasks Image Retrieval, Information Retrieval, Learning-To-Rank
Published 2020-01-01
URL https://openreview.net/forum?id=BJgxzlSFvr
PDF https://openreview.net/pdf?id=BJgxzlSFvr
PWC https://paperswithcode.com/paper/an-attention-based-deep-net-for-learning-to-1
Repo
Framework
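
As a rough illustration of attention-based listwise ranking, the sketch below scores each search result against an attention-derived query context and trains with a ListNet-style listwise loss. The encoder, dimensions, and loss are assumptions made for illustration; the paper's exact attention and decoder design may differ.

```python
# Listwise ranking with attention over query/result embeddings (illustrative sketch only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionRanker(nn.Module):
    def __init__(self, dim=32):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim=dim, num_heads=4, batch_first=True)
        self.score = nn.Linear(dim, 1)

    def forward(self, query_emb, result_embs):
        # The query attends to the candidate list, producing a result-aware query context;
        # each candidate is then scored against that context.
        ctx, _ = self.attn(query_emb.unsqueeze(1), result_embs, result_embs)  # (B, 1, D)
        return self.score(result_embs * ctx).squeeze(-1)                      # (B, N)

def listwise_loss(scores, relevance):
    # ListNet-style loss: cross-entropy between softmax of relevance and softmax of scores.
    return -(F.softmax(relevance, dim=-1) * F.log_softmax(scores, dim=-1)).sum(-1).mean()

# Toy usage: a batch of 2 queries, each with 5 candidate results.
model = AttentionRanker()
q = torch.randn(2, 32)
docs = torch.randn(2, 5, 32)
rel = torch.rand(2, 5)            # graded relevance labels
loss = listwise_loss(model(q, docs), rel)
loss.backward()
```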

Probability Calibration for Knowledge Graph Embedding Models

Title Probability Calibration for Knowledge Graph Embedding Models
Authors Anonymous
Abstract Knowledge graph embedding research has overlooked the problem of probability calibration. We show popular embedding models are indeed uncalibrated. That means probability estimates associated with predicted triples are unreliable. We present a novel method to calibrate a model when ground truth negatives are not available, which is the usual case in knowledge graphs. We propose to use Platt scaling and isotonic regression alongside our method. Experiments on three datasets with ground truth negatives show our contribution leads to well-calibrated models when compared to the gold standard of using negatives. We get significantly better results than the uncalibrated models from all calibration methods. We show that isotonic regression offers the best performance overall, though not without trade-offs. We also show that calibrated models reach state-of-the-art accuracy without the need to define relation-specific decision thresholds.
Tasks Calibration, Graph Embedding, Knowledge Graph Embedding, Knowledge Graphs
Published 2020-01-01
URL https://openreview.net/forum?id=S1g8K1BFwS
PDF https://openreview.net/pdf?id=S1g8K1BFwS
PWC https://paperswithcode.com/paper/probability-calibration-for-knowledge-graph
Repo
Framework
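
The calibration step itself is standard and easy to reproduce: fit Platt scaling (a one-dimensional logistic regression) or isotonic regression on held-out triple scores. The sketch below uses synthetic scores in place of a trained KGE model; the paper's contribution of calibrating without ground-truth negatives is not reproduced here.

```python
# Calibrating raw knowledge-graph-embedding triple scores with Platt scaling and
# isotonic regression (sketch; synthetic scores stand in for a trained KGE model).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.isotonic import IsotonicRegression
from sklearn.metrics import brier_score_loss

rng = np.random.default_rng(0)
# Pretend these are model scores for held-out positive and negative triples.
pos_scores = rng.normal(2.0, 1.0, 1000)
neg_scores = rng.normal(0.0, 1.0, 1000)
scores = np.concatenate([pos_scores, neg_scores])
labels = np.concatenate([np.ones(1000), np.zeros(1000)])

# Platt scaling: a 1-D logistic regression on the scores.
platt = LogisticRegression().fit(scores.reshape(-1, 1), labels)
p_platt = platt.predict_proba(scores.reshape(-1, 1))[:, 1]

# Isotonic regression: monotone, piecewise-constant mapping from score to probability.
iso = IsotonicRegression(out_of_bounds="clip").fit(scores, labels)
p_iso = iso.predict(scores)

# Uncalibrated baseline: a plain sigmoid of the raw score.
p_raw = 1.0 / (1.0 + np.exp(-scores))

for name, p in [("raw sigmoid", p_raw), ("Platt", p_platt), ("isotonic", p_iso)]:
    print(f"{name:12s} Brier score: {brier_score_loss(labels, p):.4f}")
```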

Random Bias Initialization Improving Binary Neural Network Training

Title Random Bias Initialization Improving Binary Neural Network Training
Authors Anonymous
Abstract Edge intelligence, especially binary neural networks (BNNs), has recently attracted considerable attention from the artificial intelligence community. BNNs significantly reduce the computational cost, model size, and memory footprint. However, there is still a performance gap between successful full-precision neural networks with ReLU activation and BNNs. We argue that the accuracy drop of BNNs is due to their geometry. We analyze the behaviour of full-precision neural networks with ReLU activation and compare it with their binarized counterparts. This comparison suggests random bias initialization as a remedy to activation saturation in full-precision networks and leads us towards an improved BNN training. Our numerical experiments confirm our geometric intuition.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=SJx4Ogrtvr
PDF https://openreview.net/pdf?id=SJx4Ogrtvr
PWC https://paperswithcode.com/paper/random-bias-initialization-improving-binary
Repo
Framework
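
A minimal sketch of the remedy the abstract suggests: initialize layer biases randomly instead of at zero in a binary-activation layer, so that pre-activations are less likely to saturate. The distribution and scale of the random biases, and the straight-through binarization, are assumptions, not the paper's exact recipe.

```python
# Random bias initialization for a binarized layer (sketch).
import torch
import torch.nn as nn

class BinaryActLinear(nn.Module):
    """Linear layer with sign() activations and straight-through gradients."""
    def __init__(self, in_features, out_features, bias_std=0.1):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        # Standard practice initializes biases to zero; here they are drawn at random,
        # shifting the pre-activation distribution away from the saturation regime.
        nn.init.normal_(self.linear.bias, mean=0.0, std=bias_std)

    def forward(self, x):
        pre = self.linear(x)
        out = torch.sign(pre)
        # Straight-through estimator: forward uses sign, backward uses identity.
        return pre + (out - pre).detach()

layer = BinaryActLinear(128, 64)
y = layer(torch.randn(32, 128))
print(y.shape, layer.linear.bias.std().item())
```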

LEVERAGING AUXILIARY TEXT FOR DEEP RECOGNITION OF UNSEEN VISUAL RELATIONSHIPS

Title LEVERAGING AUXILIARY TEXT FOR DEEP RECOGNITION OF UNSEEN VISUAL RELATIONSHIPS
Authors Anonymous
Abstract One of the most difficult tasks in scene understanding is recognizing interactions between objects in an image. This task is often called visual relationship detection (VRD). We consider the question of whether, given auxiliary textual data in addition to the standard visual data used for training VRD models, VRD performance can be improved. We present a new deep model that can leverage additional textual data. Our model relies on a shared text–image representation of subject-verb-object relationships appearing in the text, and object interactions in images. Our method is the first to enable recognition of visual relationships missing in the visual training data and appearing only in the auxiliary text. We test our approach on two different text sources: text originating in images and text originating in books. We test and validate our approach using two large-scale recognition tasks: VRD and Scene Graph Generation. We show a surprising result: our approach works better with text originating in books, outperforming the model that uses text originating in images on the task of unseen relationship recognition and remaining comparable to it on the task of seen relationship recognition.
Tasks Graph Generation, Scene Graph Generation, Scene Understanding
Published 2020-01-01
URL https://openreview.net/forum?id=SyePj6NYwS
PDF https://openreview.net/pdf?id=SyePj6NYwS
PWC https://paperswithcode.com/paper/leveraging-auxiliary-text-for-deep
Repo
Framework
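
A minimal sketch of one way to realize a shared text–image representation for subject-verb-object relationships: project text-triple embeddings and image object-pair features into a common space and train with a contrastive loss. The encoders, dimensions, and loss below are assumptions, not the paper's architecture.

```python
# Shared embedding space for (subject, verb, object) triples from text and object-pair
# features from images (illustrative sketch).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedRelationSpace(nn.Module):
    def __init__(self, text_dim=300, visual_dim=2048, shared_dim=256):
        super().__init__()
        self.text_proj = nn.Sequential(nn.Linear(3 * text_dim, shared_dim), nn.ReLU(),
                                       nn.Linear(shared_dim, shared_dim))
        self.vis_proj = nn.Sequential(nn.Linear(visual_dim, shared_dim), nn.ReLU(),
                                      nn.Linear(shared_dim, shared_dim))

    def forward(self, svo_text_emb, pair_features):
        t = F.normalize(self.text_proj(svo_text_emb), dim=-1)
        v = F.normalize(self.vis_proj(pair_features), dim=-1)
        return t, v

def contrastive_loss(t, v, temperature=0.1):
    # InfoNCE-style: matched text/image relationship pairs sit on the diagonal.
    logits = t @ v.t() / temperature
    targets = torch.arange(t.size(0))
    return F.cross_entropy(logits, targets)

model = SharedRelationSpace()
svo = torch.randn(8, 3 * 300)       # e.g. word vectors for "person - rides - horse"
vis = torch.randn(8, 2048)          # pooled features for a subject/object box pair
t, v = model(svo, vis)
loss = contrastive_loss(t, v)
loss.backward()
```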

Learning Likelihoods with Conditional Normalizing Flows

Title Learning Likelihoods with Conditional Normalizing Flows
Authors Anonymous
Abstract Normalizing Flows (NFs) are able to model complicated distributions p(y) with strong inter-dimensional correlations and high multimodality by transforming a simple base density p(z) through an invertible neural network under the change of variables formula. Such behavior is desirable in multivariate structured prediction tasks, where handcrafted per-pixel loss-based methods inadequately capture strong correlations between output dimensions. We present a study of conditional normalizing flows (CNFs), a class of NFs where the base density to output space mapping is conditioned on an input x, to model conditional densities p(y|x). CNFs are efficient in sampling and inference, can be trained with a likelihood-based objective, and, being generative flows, do not suffer from mode collapse or training instabilities. We provide an effective method to train continuous CNFs for binary problems and, in particular, we apply these CNFs to super-resolution and vessel segmentation tasks, demonstrating competitive performance on standard benchmark datasets in terms of likelihood and conventional metrics.
Tasks Structured Prediction, Super-Resolution
Published 2020-01-01
URL https://openreview.net/forum?id=rJg3zxBYwH
PDF https://openreview.net/pdf?id=rJg3zxBYwH
PWC https://paperswithcode.com/paper/learning-likelihoods-with-conditional
Repo
Framework
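
To illustrate the conditional-flow idea, here is a single conditional affine flow step: a network maps the input x to a shift and log-scale, the output y is transformed to a base variable z, and the exact log-likelihood follows from the change-of-variables formula. Real CNFs stack many invertible layers; this one-layer version is only a sketch with assumed dimensions.

```python
# A single conditional affine flow step for p(y|x) (sketch).
import math
import torch
import torch.nn as nn

class ConditionalAffineFlow(nn.Module):
    def __init__(self, x_dim, y_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(x_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 2 * y_dim))

    def log_prob(self, y, x):
        shift, log_scale = self.net(x).chunk(2, dim=-1)
        z = (y - shift) * torch.exp(-log_scale)                     # inverse map y -> z
        base_logp = (-0.5 * z ** 2 - 0.5 * math.log(2 * math.pi)).sum(-1)
        log_det = -log_scale.sum(-1)                                # change-of-variables term
        return base_logp + log_det

    def sample(self, x):
        shift, log_scale = self.net(x).chunk(2, dim=-1)
        return torch.randn_like(shift) * torch.exp(log_scale) + shift

flow = ConditionalAffineFlow(x_dim=10, y_dim=4)
x, y = torch.randn(16, 10), torch.randn(16, 4)
nll = -flow.log_prob(y, x).mean()          # train by minimizing the negative log-likelihood
nll.backward()
```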

Population-Guided Parallel Policy Search for Reinforcement Learning

Title Population-Guided Parallel Policy Search for Reinforcement Learning
Authors Anonymous
Abstract In this paper, a new population-guided parallel learning scheme is proposed to enhance the performance of off-policy reinforcement learning (RL). In the proposed scheme, multiple identical learners with their own value-functions and policies share a common experience replay buffer and collaboratively search for a good policy under the guidance of the best policy's information. The key point is that the information of the best policy is fused in a soft manner by constructing an augmented loss function for the policy update, enlarging the overall search region covered by the multiple learners. The guidance by the previous best policy and the enlarged search range enable faster and better policy search, and monotone improvement of the expected cumulative return by the proposed scheme is proved theoretically. Working algorithms are constructed by applying the proposed scheme to the twin delayed deep deterministic (TD3) policy gradient algorithm, and numerical results show that the constructed P3S-TD3 outperforms most of the current state-of-the-art RL algorithms, with a significant gain in sparse-reward environments.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=rJeINp4KwH
PDF https://openreview.net/pdf?id=rJeINp4KwH
PWC https://paperswithcode.com/paper/population-guided-parallel-policy-search-for
Repo
Framework
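
A minimal sketch of the soft fusion described above: the usual actor loss is augmented with a distance term pulling a learner's actions toward those of the current best policy. The coefficient, the distance measure, and the best-policy selection logic are assumptions here.

```python
# Augmented policy loss fusing the best learner's policy in a soft manner (sketch).
import torch

def augmented_policy_loss(q_value, learner_actions, best_actions, beta=0.5):
    """
    q_value         : critic estimate Q(s, pi_i(s)) for this learner (to be maximized)
    learner_actions : actions proposed by this learner's policy, pi_i(s)
    best_actions    : actions proposed by the current best learner's policy, pi_best(s)
    """
    actor_loss = -q_value.mean()                                   # usual TD3-style actor loss
    guidance = ((learner_actions - best_actions.detach()) ** 2).mean()
    return actor_loss + beta * guidance

# Toy usage with random tensors standing in for a batch from the shared replay buffer.
q = torch.randn(256, 1, requires_grad=True)
a_i = torch.randn(256, 6, requires_grad=True)
a_best = torch.randn(256, 6)
loss = augmented_policy_loss(q, a_i, a_best)
loss.backward()
```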

Biologically Plausible Neural Networks via Evolutionary Dynamics and Dopaminergic Plasticity

Title Biologically Plausible Neural Networks via Evolutionary Dynamics and Dopaminergic Plasticity
Authors Anonymous
Abstract Artificial neural networks (ANNs) lack biological plausibility, chiefly because backpropagation requires a variant of plasticity (precise changes of the synaptic weights informed by neural events that occur downstream in the neural circuit) that is profoundly incompatible with the current understanding of the animal brain. Here we propose that backpropagation can happen in evolutionary time, instead of lifetime, in what we call neural net evolution (NNE). In NNE the weights of the links of the neural net are sparse linear functions of the animal’s genes, where each gene has two alleles, 0 and 1. In each generation, a population is generated at random based on current allele frequencies and tested on the learning task. The relative performance of the two alleles of each gene over the whole population is determined, and the allele frequencies are updated via the standard population genetics equations for the weak selection regime. We prove that, under certain assumptions, NNE succeeds in learning simple labeling functions with high probability, using polynomially many generations and individuals per generation. We test the NNE concept, with only one hidden layer, on MNIST with encouraging results. Finally, we explore a further version of biologically plausible ANNs inspired by the recent discovery in animals of dopaminergic plasticity: the increase of the strength of a synapse that fired if dopamine was released soon after the firing.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=rkg6FgrtPB
PDF https://openreview.net/pdf?id=rkg6FgrtPB
PWC https://paperswithcode.com/paper/biologically-plausible-neural-networks-via
Repo
Framework
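
Below is a toy simulation of the NNE loop under stated assumptions (a sparse random gene-to-weight map, accuracy as fitness, and a simple replicator-style weak-selection update): sample a population from the current allele frequencies, evaluate it, and shift each frequency toward the better-performing allele. This is an illustration of the mechanism only, not the paper's construction or proof.

```python
# Toy neural-net-evolution (NNE) loop (sketch; constants and the fitness task are assumptions).
import numpy as np

rng = np.random.default_rng(0)
n_genes, pop_size, n_generations = 50, 200, 30
# Sparse linear map from genes to the weights of a tiny 1-hidden-layer net on 2-D inputs.
G = rng.normal(0, 0.3, (n_genes, 2 * 8 + 8)) * (rng.random((n_genes, 2 * 8 + 8)) < 0.2)
freq = np.full(n_genes, 0.5)                       # allele frequencies for allele "1"

def fitness(genome, X, y):
    w = genome @ G
    W1, b1 = w[:16].reshape(2, 8), w[16:]
    h = np.maximum(X @ W1 + b1, 0)
    pred = (h.sum(axis=1) > 0).astype(float)
    return (pred == y).mean()                      # accuracy on a simple labeling task

X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
eta = 0.1                                          # weak-selection step size (assumption)
for gen in range(n_generations):
    genomes = (rng.random((pop_size, n_genes)) < freq).astype(float)
    fit = np.array([fitness(g, X, y) for g in genomes])
    # Per-gene advantage of allele 1 over allele 0 across the population.
    adv = np.array([fit[genomes[:, j] == 1].mean() - fit[genomes[:, j] == 0].mean()
                    if 0 < genomes[:, j].sum() < pop_size else 0.0
                    for j in range(n_genes)])
    freq = np.clip(freq + eta * freq * (1 - freq) * adv, 0.01, 0.99)
print("final mean population fitness:", fit.mean())
```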

A Latent Morphology Model for Open-Vocabulary Neural Machine Translation

Title A Latent Morphology Model for Open-Vocabulary Neural Machine Translation
Authors Anonymous
Abstract Translation into morphologically-rich languages challenges neural machine translation (NMT) models with extremely sparse vocabularies where atomic treatment of surface forms is unrealistic. This problem is typically addressed by either pre-processing words into subword units or performing translation directly at the level of characters. The former is based on word segmentation algorithms optimized using corpus-level statistics with no regard to the translation task. The latter learns directly from translation data but requires rather deep architectures. In this paper, we propose to translate words by modeling word formation through a hierarchical latent variable model which mimics the process of morphological inflection. Our model generates words one character at a time by composing two latent representations: a continuous one, aimed at capturing the lexical semantics, and a set of (approximately) discrete features, aimed at capturing the morphosyntactic function, which are shared among different surface forms. Our model achieves better accuracy in translation into three morphologically-rich languages than conventional open-vocabulary NMT methods, while also demonstrating a better generalization capacity under low to mid-resource settings.
Tasks Machine Translation, Morphological Inflection
Published 2020-01-01
URL https://openreview.net/forum?id=BJxSI1SKDH
PDF https://openreview.net/pdf?id=BJxSI1SKDH
PWC https://paperswithcode.com/paper/a-latent-morphology-model-for-open-vocabulary
Repo
Framework
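
A sketch of the latent-variable decoder described above: a continuous lexical vector (a reparameterized Gaussian) and approximately discrete morphological features (Gumbel-softmax) jointly initialize a character-level decoder. The dimensions, feature counts, and the GRU decoder are illustrative assumptions, not the paper's exact model.

```python
# Character-level word generation conditioned on a continuous lexical latent and
# approximately discrete morphological features (sketch).
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentMorphDecoder(nn.Module):
    def __init__(self, char_vocab=60, ctx_dim=256, lex_dim=64, n_feats=8, feat_vals=4):
        super().__init__()
        self.lex_head = nn.Linear(ctx_dim, 2 * lex_dim)            # mean and log-variance
        self.feat_head = nn.Linear(ctx_dim, n_feats * feat_vals)   # logits per feature
        self.char_emb = nn.Embedding(char_vocab, 32)
        self.gru = nn.GRU(32, lex_dim + n_feats * feat_vals, batch_first=True)
        self.out = nn.Linear(lex_dim + n_feats * feat_vals, char_vocab)
        self.n_feats, self.feat_vals = n_feats, feat_vals

    def forward(self, dec_context, char_inputs):
        mu, logvar = self.lex_head(dec_context).chunk(2, dim=-1)
        lex = mu + torch.randn_like(mu) * (0.5 * logvar).exp()      # reparameterized sample
        logits = self.feat_head(dec_context).view(-1, self.n_feats, self.feat_vals)
        feats = F.gumbel_softmax(logits, tau=0.5, hard=False).flatten(1)
        h0 = torch.cat([lex, feats], dim=-1).unsqueeze(0)           # initial decoder state
        out, _ = self.gru(self.char_emb(char_inputs), h0)
        return self.out(out)                                        # per-character logits

model = LatentMorphDecoder()
ctx = torch.randn(4, 256)                   # NMT decoder state for the next target word
chars = torch.randint(0, 60, (4, 10))       # gold characters (teacher forcing)
print(model(ctx, chars).shape)              # torch.Size([4, 10, 60])
```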

Toward Evaluating Robustness of Deep Reinforcement Learning with Continuous Control

Title Toward Evaluating Robustness of Deep Reinforcement Learning with Continuous Control
Authors Anonymous
Abstract Deep reinforcement learning has achieved great success in many previously difficult reinforcement learning tasks, yet recent studies show that deep RL agents are also unavoidably susceptible to adversarial perturbations, similar to deep neural networks in classification tasks. Prior works mostly focus on model-free adversarial attacks and agents with discrete actions. In this work, we study the problem of continuous control agents in deep RL under adversarial attacks and propose the first two-step algorithm based on learned model dynamics. Extensive experiments on various MuJoCo domains (Cartpole, Fish, Walker, Humanoid) demonstrate that our proposed framework is much more effective and efficient than model-free attack baselines in degrading agent performance as well as driving agents to unsafe states.
Tasks Continuous Control
Published 2020-01-01
URL https://openreview.net/forum?id=SylL0krYPS
PDF https://openreview.net/pdf?id=SylL0krYPS
PWC https://paperswithcode.com/paper/toward-evaluating-robustness-of-deep
Repo
Framework
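
A hedged sketch of the model-based attack idea: with a learned dynamics/reward model and a fixed victim policy, optimize a small observation perturbation (inside an L-infinity ball) that the learned model predicts will degrade performance. The exact two-step objective in the paper may differ; everything below is a toy stand-in.

```python
# Model-based observation attack sketch: perturb observations to minimize predicted reward.
import torch
import torch.nn as nn
import torch.nn.functional as F

def attack_observation(obs, policy, dynamics, reward_model, eps=0.05, steps=20, lr=0.01):
    delta = torch.zeros_like(obs, requires_grad=True)
    for _ in range(steps):
        action = policy(obs + delta)
        next_obs = dynamics(obs + delta, action)
        loss = reward_model(next_obs).mean()       # attacker minimizes the predicted reward
        loss.backward()
        with torch.no_grad():
            delta -= lr * delta.grad.sign()        # signed gradient step
            delta.clamp_(-eps, eps)                # stay inside the L_inf ball
            delta.grad.zero_()
    return (obs + delta).detach()

# Toy stand-ins for the victim policy and the learned dynamics / reward models.
policy = nn.Linear(8, 2)
W_dyn = torch.randn(8, 10)
dynamics = lambda s, a: F.linear(torch.cat([s, a], dim=-1), W_dyn)
reward_model = nn.Linear(8, 1)
obs = torch.randn(4, 8)
adv_obs = attack_observation(obs, policy, dynamics, reward_model)
print((adv_obs - obs).abs().max())                 # bounded by eps
```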

Contextualized Sparse Representation with Rectified N-Gram Attention for Open-Domain Question Answering

Title Contextualized Sparse Representation with Rectified N-Gram Attention for Open-Domain Question Answering
Authors Anonymous
Abstract A sparse representation is known to be an effective means to encode precise lexical cues in information retrieval tasks by associating each dimension with a unique n-gram-based feature. However, it has often relied on term frequency (such as tf-idf and BM25) or hand-engineered features that are coarse-grained (document-level) and often task-specific, hence not easily generalizable and not appropriate for fine-grained (word- or phrase-level) retrieval. In this work, we propose an effective method for learning a highly contextualized, word-level sparse representation by utilizing rectified self-attention weights on the neighboring n-grams. We kernelize the inner product space during training for memory efficiency, without the explicit mapping of the large sparse vectors. We particularly focus on the application of our model to the phrase retrieval problem, which has recently been shown to be a promising direction for open-domain question answering (QA) and requires lexically sensitive phrase encoding. We demonstrate the effectiveness of the learned sparse representations by not only drastically improving the phrase retrieval accuracy (by more than 4%) but also outperforming all other (pipeline-based) open-domain QA methods, with up to 97x faster inference, on SQuAD-open and CuratedTrec.
Tasks Information Retrieval, Open-Domain Question Answering, Question Answering
Published 2020-01-01
URL https://openreview.net/forum?id=ryxgegBKwr
PDF https://openreview.net/pdf?id=ryxgegBKwr
PWC https://paperswithcode.com/paper/contextualized-sparse-representation-with
Repo
Framework
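
A simplified sketch of a rectified-attention sparse encoder: attention logits between a target word and its neighboring n-grams are passed through a ReLU (giving exact zeros) and scattered into a large vector indexed by hashed n-gram ids. The window size, hashing, and the paper's kernelized training trick are simplified or omitted here.

```python
# Word-level sparse vector from rectified attention over neighboring n-grams (sketch).
import torch
import torch.nn as nn

class RectifiedNgramSparseEncoder(nn.Module):
    def __init__(self, hidden=64, sparse_dim=2**18, window=3):
        super().__init__()
        self.q = nn.Linear(hidden, hidden)
        self.k = nn.Linear(hidden, hidden)
        self.window = window
        self.sparse_dim = sparse_dim

    def forward(self, hidden_states, ngram_ids):
        """
        hidden_states : (seq, hidden) contextual vectors for each token
        ngram_ids     : (seq,) hashed ids of the n-gram starting at each position
        Returns one sparse_dim-sized vector for the target token at position 0.
        """
        q = self.q(hidden_states[0])                       # target word query
        k = self.k(hidden_states[: self.window * 2 + 1])   # neighboring n-gram keys
        weights = torch.relu(q @ k.t())                    # rectification -> exact zeros
        sparse = torch.zeros(self.sparse_dim)
        sparse.scatter_add_(0, ngram_ids[: k.size(0)], weights)
        return sparse

enc = RectifiedNgramSparseEncoder()
h = torch.randn(12, 64)
ids = torch.randint(0, 2**18, (12,))
vec = enc(h, ids)
print((vec != 0).sum())        # only a handful of n-gram dimensions are active
```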

BlockSwap: Fisher-guided Block Substitution for Network Compression on a Budget

Title BlockSwap: Fisher-guided Block Substitution for Network Compression on a Budget
Authors Anonymous
Abstract The desire to map neural networks to varying-capacity devices has led to the development of a wealth of compression techniques, many of which involve replacing standard convolutional blocks in a large network with cheap alternative blocks. However, not all blocks are created equal; for a required compute budget there may exist a potent combination of many different cheap blocks, though exhaustively searching for such a combination is prohibitively expensive. In this work, we develop BlockSwap: a fast algorithm for choosing networks with interleaved block types by passing a single minibatch of training data through randomly initialised networks and gauging their Fisher potential. These networks can then be used as students and distilled with the original large network as a teacher. We demonstrate the effectiveness of the chosen networks across CIFAR-10 and ImageNet for classification, and COCO for detection, and provide a comprehensive ablation study of our approach. BlockSwap quickly explores possible block configurations using a simple architecture ranking system, yielding highly competitive networks in orders of magnitude less time than most architecture search techniques (e.g. 8 minutes on a single CPU for CIFAR-10).
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=SklkDkSFPB
PDF https://openreview.net/pdf?id=SklkDkSFPB
PWC https://paperswithcode.com/paper/blockswap-fisher-guided-block-substitution-1
Repo
Framework
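
A sketch of ranking candidate networks by a Fisher-style saliency computed from a single minibatch: run one forward/backward pass through a randomly initialised candidate and sum (activation x gradient)^2 over its activations. The exact quantity and aggregation BlockSwap uses are an assumption here; the point is only that one minibatch suffices to produce a ranking signal.

```python
# Scoring a randomly initialised candidate network by a Fisher-style saliency (sketch).
import torch
import torch.nn as nn

def fisher_potential(model, x, y, loss_fn=nn.CrossEntropyLoss()):
    acts, hooks = {}, []

    def save(name):
        def hook(module, inp, out):
            out.retain_grad()              # keep the gradient of this activation
            acts[name] = out
        return hook

    for name, m in model.named_modules():
        if isinstance(m, nn.ReLU):
            hooks.append(m.register_forward_hook(save(name)))

    loss = loss_fn(model(x), y)
    loss.backward()
    # Fisher-style saliency per activation: (activation * its gradient)^2, summed.
    score = sum(((a * a.grad) ** 2).sum().item() for a in acts.values())
    for h in hooks:
        h.remove()
    return score

# Rank two toy candidate architectures with one random minibatch.
x, y = torch.randn(32, 128), torch.randint(0, 10, (32,))
cand_a = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))
cand_b = nn.Sequential(nn.Linear(128, 32), nn.ReLU(), nn.Linear(32, 10))
print(fisher_potential(cand_a, x, y), fisher_potential(cand_b, x, y))
```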

Decoding As Dynamic Programming For Recurrent Autoregressive Models

Title Decoding As Dynamic Programming For Recurrent Autoregressive Models
Authors Anonymous
Abstract Decoding in autoregressive models (ARMs) consists of searching for a high-scoring output sequence under the trained model. Standard decoding methods, based on a unidirectional greedy algorithm or beam search, are suboptimal due to error propagation and myopic decisions which do not account for future steps in the generation process. In this paper we present a novel decoding approach based on the method of auxiliary coordinates (Carreira-Perpinan & Wang, 2014) to address the aforementioned shortcomings. Our method introduces discrete variables for output tokens, and auxiliary continuous variables representing the states of the underlying ARM. The auxiliary variables lead to a factor graph approximation of the ARM, whose maximum a posteriori (MAP) inference is found exactly using dynamic programming. The MAP inference is then used to recreate an improved factor graph approximation of the ARM via updated auxiliary variables. We then extend our approach to decode with an ensemble of ARMs, possibly with different generation orders, which is out of reach for the standard unidirectional decoding algorithms. Experiments on the text infilling task over the SWAG and Daily Dialogue datasets show that our decoding method is superior to strong unidirectional decoding baselines.
Tasks Text Infilling
Published 2020-01-01
URL https://openreview.net/forum?id=HklOo0VFDH
PDF https://openreview.net/pdf?id=HklOo0VFDH
PWC https://paperswithcode.com/paper/decoding-as-dynamic-programming-for-recurrent
Repo
Framework
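
The exact-MAP-by-dynamic-programming component can be illustrated with plain Viterbi decoding over a chain factor graph of unary and pairwise scores. Constructing those factors from the ARM's auxiliary continuous variables, and alternating with their updates, is the paper's contribution and is not reproduced in this sketch.

```python
# Exact MAP decoding by dynamic programming (Viterbi) over a chain factor graph (sketch).
import numpy as np

def viterbi(unary, pairwise):
    """
    unary    : (T, V) score of token v at step t
    pairwise : (V, V) score of token v following token u
    Returns the highest-scoring token sequence.
    """
    T, V = unary.shape
    dp = np.zeros((T, V))
    back = np.zeros((T, V), dtype=int)
    dp[0] = unary[0]
    for t in range(1, T):
        cand = dp[t - 1][:, None] + pairwise + unary[t][None, :]   # (V_prev, V_cur)
        back[t] = cand.argmax(axis=0)
        dp[t] = cand.max(axis=0)
    seq = [int(dp[-1].argmax())]
    for t in range(T - 1, 0, -1):
        seq.append(int(back[t, seq[-1]]))
    return seq[::-1]

rng = np.random.default_rng(0)
tokens = viterbi(rng.normal(size=(6, 10)), rng.normal(size=(10, 10)))
print(tokens)    # MAP sequence of 6 token ids from a vocabulary of 10
```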

Sampling-Free Learning of Bayesian Quantized Neural Networks

Title Sampling-Free Learning of Bayesian Quantized Neural Networks
Authors Anonymous
Abstract Bayesian learning of model parameters in neural networks is important in scenarios where estimates with well-calibrated uncertainty are important. In this paper, we propose Bayesian quantized networks (BQNs), quantized neural networks (QNNs) for which we learn a posterior distribution over their discrete parameters. We provide a set of efficient algorithms for learning and prediction in BQNs without the need to sample from their parameters or activations, which not only allows for differentiable learning in quantized models but also reduces the variance of gradient estimation. We evaluate BQNs on the MNIST, Fashion-MNIST and KMNIST classification datasets, compared against a bootstrap ensemble of QNNs (E-QNN). We demonstrate that BQNs achieve both lower predictive errors and better-calibrated uncertainties than E-QNN (with less than 20% of the negative log-likelihood).
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=rylVHR4FPB
PDF https://openreview.net/pdf?id=rylVHR4FPB
PWC https://paperswithcode.com/paper/sampling-free-learning-of-bayesian-quantized
Repo
Framework
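
A sketch of the sampling-free idea for one layer: with stochastic binary weights parameterized by Bernoulli probabilities, the mean and variance of the pre-activation can be propagated analytically instead of drawing weight samples. BQNs use more refined propagation rules and quantized activations; the rule below is the generic moment-matching version under an independence assumption.

```python
# Sampling-free moment propagation through a layer with stochastic binary weights (sketch).
import torch

def binary_layer_moments(x_mean, x_var, w_prob):
    """
    w_prob : (in, out) probability that each weight is +1 (otherwise -1).
    Propagates the mean and variance of the pre-activation analytically,
    assuming independent weights and inputs.
    """
    w_mean = 2 * w_prob - 1                      # E[w]   for w in {-1, +1}
    y_mean = x_mean @ w_mean
    # Var[x*w] = E[x^2]E[w^2] - E[x]^2 E[w]^2, with E[w^2] = 1 for binary weights.
    y_var = (x_var + x_mean ** 2) @ torch.ones_like(w_mean) - (x_mean ** 2) @ (w_mean ** 2)
    return y_mean, y_var

x_mean, x_var = torch.randn(8, 32), torch.rand(8, 32)
w_prob = torch.sigmoid(torch.randn(32, 16))      # learnable Bernoulli parameters
m, v = binary_layer_moments(x_mean, x_var, w_prob)
print(m.shape, v.shape, bool((v >= 0).all()))
```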

On Solving Minimax Optimization Locally: A Follow-the-Ridge Approach

Title On Solving Minimax Optimization Locally: A Follow-the-Ridge Approach
Authors Anonymous
Abstract Many tasks in modern machine learning can be formulated as finding equilibria in sequential games. In particular, two-player zero-sum sequential games, also known as minimax optimization, have received growing interest. It is tempting to apply gradient descent to solve minimax optimization given its popularity and success in supervised learning. However, it has been noted that naive application of gradient descent fails to find some local minimax points and can converge to non-local-minimax points. In this paper, we propose Follow-the-Ridge (FR), a novel algorithm that provably converges to and only converges to local minimax points. We show theoretically that the algorithm addresses the notorious rotational behaviour of gradient dynamics, and is compatible with preconditioning and positive momentum. Empirically, FR solves toy minimax problems and improves the convergence of GAN training compared to recent minimax optimization algorithms.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=Hkx7_1rKwS
PDF https://openreview.net/pdf?id=Hkx7_1rKwS
PWC https://paperswithcode.com/paper/on-solving-minimax-optimization-locally-a
Repo
Framework
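
A toy illustration of the Follow-the-Ridge idea on a 2-D quadratic minimax problem: the leader (x) takes a gradient-descent step, and the follower (y) takes a gradient-ascent step plus a correction term H_yy^{-1} H_yx grad_x f that anticipates the leader's move and keeps y near its best response (differentiating grad_y f(x, y*(x)) = 0 gives dy*/dx = -H_yy^{-1} H_yx). The toy objective, step sizes, and hand-derived derivatives are assumptions for this sketch.

```python
# Follow-the-Ridge-style update on a toy quadratic minimax problem (sketch).
import numpy as np

a, b, c = 1.0, 2.0, 1.0            # f(x, y) = 0.5*a*x^2 + b*x*y - 0.5*c*y^2
eta_x, eta_y = 0.05, 0.05
x, y = 2.0, 1.5

for t in range(2000):
    gx = a * x + b * y              # df/dx
    gy = b * x - c * y              # df/dy
    h_yy = -c                       # d^2 f / dy^2
    h_yx = b                        # d^2 f / dy dx
    # Leader (x) does gradient descent; follower (y) does gradient ascent plus the
    # ridge-following correction that compensates for the leader's move.
    x_new = x - eta_x * gx
    y_new = y + eta_y * gy + eta_x * (1.0 / h_yy) * h_yx * gx
    x, y = x_new, y_new

print(f"converged to (x, y) = ({x:.4f}, {y:.4f})")   # the local minimax here is (0, 0)
```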