May 7, 2019

2659 words 13 mins read

Paper Group AWR 65

Asynchronous Stochastic Gradient Descent with Delay Compensation. Sifting Common Information from Many Variables. Dependency Sensitive Convolutional Neural Networks for Modeling Sentences and Documents. Averaged-DQN: Variance Reduction and Stabilization for Deep Reinforcement Learning. Language Models with Pre-Trained (GloVe) Word Embeddings. Preco …

Asynchronous Stochastic Gradient Descent with Delay Compensation

Title Asynchronous Stochastic Gradient Descent with Delay Compensation
Authors Shuxin Zheng, Qi Meng, Taifeng Wang, Wei Chen, Nenghai Yu, Zhi-Ming Ma, Tie-Yan Liu
Abstract With the fast development of deep learning, it has become common to learn big neural networks using massive training data. Asynchronous Stochastic Gradient Descent (ASGD) is widely adopted for this task because of its efficiency, but it is known to suffer from the problem of delayed gradients: by the time a local worker adds its gradient to the global model, the global model may already have been updated by other workers, so the gradient becomes “delayed”. We propose a novel technique to compensate for this delay and make the optimization behavior of ASGD closer to that of sequential SGD. This is achieved by leveraging a Taylor expansion of the gradient function and an efficient approximation to the Hessian matrix of the loss function. We call the new algorithm Delay Compensated ASGD (DC-ASGD). We evaluated the proposed algorithm on the CIFAR-10 and ImageNet datasets, and the experimental results demonstrate that DC-ASGD outperforms both synchronous SGD and asynchronous SGD, and nearly approaches the performance of sequential SGD.
Tasks
Published 2016-09-27
URL https://arxiv.org/abs/1609.08326v6
PDF https://arxiv.org/pdf/1609.08326v6.pdf
PWC https://paperswithcode.com/paper/asynchronous-stochastic-gradient-descent-with-1
Repo https://github.com/microsoft/dmtk
Framework torch
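The compensation step is easy to illustrate. Below is a minimal NumPy sketch of a server-side DC-ASGD update, assuming the diagonal gradient-outer-product approximation of the Hessian described in the abstract; the learning rate, the variance-control coefficient `lam`, and the toy quadratic loss are illustrative choices, not values from the paper.

```python
import numpy as np

def dc_asgd_update(w_global, grad, w_snapshot, lr=0.1, lam=0.04):
    """One server-side update with delay compensation (a sketch).

    grad       -- gradient the worker computed at its stale snapshot w_snapshot
    w_global   -- current global parameters (possibly updated by other workers)
    lam        -- coefficient for the diagonal Hessian approximation
    """
    # First-order Taylor correction: the Hessian is approximated by the
    # element-wise square of the gradient, so the compensated gradient is
    # g + lam * g * g * (w_global - w_snapshot).
    compensated = grad + lam * grad * grad * (w_global - w_snapshot)
    return w_global - lr * compensated

# Toy usage on the quadratic loss f(w) = 0.5 * ||w||^2, whose gradient is w.
w = np.ones(4)
snapshot = w.copy()               # worker pulls the model
stale_grad = snapshot.copy()      # gradient of f at the snapshot
w = w - 0.1 * np.ones(4)          # another worker updates the global model meanwhile
w = dc_asgd_update(w, stale_grad, snapshot)
```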

Sifting Common Information from Many Variables

Title Sifting Common Information from Many Variables
Authors Greg Ver Steeg, Shuyang Gao, Kyle Reing, Aram Galstyan
Abstract Measuring the relationship between any pair of variables is a rich and active area of research that is central to scientific practice. In contrast, characterizing the common information among any group of variables is typically a theoretical exercise with few practical methods for high-dimensional data. A promising solution would be a multivariate generalization of the famous Wyner common information, but this approach relies on solving an apparently intractable optimization problem. We leverage the recently introduced information sieve decomposition to formulate an incremental version of the common information problem that admits a simple fixed point solution, fast convergence, and complexity that is linear in the number of variables. This scalable approach allows us to demonstrate the usefulness of common information in high-dimensional learning problems. The sieve outperforms standard methods on dimensionality reduction tasks, solves a blind source separation problem that cannot be solved with ICA, and accurately recovers structure in brain imaging data.
Tasks Dimensionality Reduction
Published 2016-06-07
URL http://arxiv.org/abs/1606.02307v4
PDF http://arxiv.org/pdf/1606.02307v4.pdf
PWC https://paperswithcode.com/paper/sifting-common-information-from-many
Repo https://github.com/gregversteeg/LinearSieve
Framework tf

Dependency Sensitive Convolutional Neural Networks for Modeling Sentences and Documents

Title Dependency Sensitive Convolutional Neural Networks for Modeling Sentences and Documents
Authors Rui Zhang, Honglak Lee, Dragomir Radev
Abstract The goal of sentence and document modeling is to accurately represent the meaning of sentences and documents for various Natural Language Processing tasks. In this work, we present Dependency Sensitive Convolutional Neural Networks (DSCNN) as a general-purpose classification system for both sentences and documents. DSCNN hierarchically builds textual representations by processing pretrained word embeddings via Long Short-Term Memory networks and subsequently extracting features with convolution operators. Compared with existing recursive neural models with tree structures, DSCNN does not rely on parsers and expensive phrase labeling, and thus is not restricted to sentence-level tasks. Moreover, unlike other CNN-based models that analyze sentences locally by sliding windows, our system captures both the dependency information within each sentence and relationships across sentences in the same document. Experimental results demonstrate that our approach achieves state-of-the-art performance on several tasks, including sentiment analysis, question type classification, and subjectivity classification.
Tasks Sentence Embeddings, Sentiment Analysis, Word Embeddings
Published 2016-11-08
URL http://arxiv.org/abs/1611.02361v1
PDF http://arxiv.org/pdf/1611.02361v1.pdf
PWC https://paperswithcode.com/paper/dependency-sensitive-convolutional-neural
Repo https://github.com/ManuelVs/NeuralNetworks
Framework tf
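A minimal PyTorch sketch of the overall architecture (independent of the linked TensorFlow repo): pretrained embeddings feed an LSTM, a convolution with max-pooling extracts features from the LSTM outputs, and a linear layer classifies. All layer sizes and the random embedding table are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class DSCNNSketch(nn.Module):
    """Sentence classifier in the spirit of DSCNN:
    pretrained embeddings -> LSTM -> convolution -> max-pooling -> classifier."""
    def __init__(self, pretrained_emb, num_classes, hidden=100, n_filters=100, k=3):
        super().__init__()
        self.emb = nn.Embedding.from_pretrained(pretrained_emb, freeze=False)
        self.lstm = nn.LSTM(pretrained_emb.size(1), hidden, batch_first=True)
        self.conv = nn.Conv1d(hidden, n_filters, kernel_size=k, padding=k // 2)
        self.fc = nn.Linear(n_filters, num_classes)

    def forward(self, token_ids):                      # (batch, seq_len)
        h, _ = self.lstm(self.emb(token_ids))          # (batch, seq_len, hidden)
        c = torch.relu(self.conv(h.transpose(1, 2)))   # convolution over time
        pooled = c.max(dim=2).values                   # max over positions
        return self.fc(pooled)

# toy usage with a random stand-in for a pretrained embedding table
emb = torch.randn(5000, 300)
model = DSCNNSketch(emb, num_classes=2)
logits = model(torch.randint(0, 5000, (8, 20)))
```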

Averaged-DQN: Variance Reduction and Stabilization for Deep Reinforcement Learning

Title Averaged-DQN: Variance Reduction and Stabilization for Deep Reinforcement Learning
Authors Oron Anschel, Nir Baram, Nahum Shimkin
Abstract Instability and variability of Deep Reinforcement Learning (DRL) algorithms tend to adversely affect their performance. Averaged-DQN is a simple extension to the DQN algorithm, based on averaging previously learned Q-value estimates, which leads to a more stable training procedure and improved performance by reducing approximation error variance in the target values. To understand the effect of the algorithm, we examine the source of value function estimation errors and provide an analytical comparison within a simplified model. We further present experiments on the Arcade Learning Environment benchmark that demonstrate significantly improved stability and performance due to the proposed extension.
Tasks Atari Games
Published 2016-11-07
URL http://arxiv.org/abs/1611.01929v4
PDF http://arxiv.org/pdf/1611.01929v4.pdf
PWC https://paperswithcode.com/paper/averaged-dqn-variance-reduction-and
Repo https://github.com/qlan3/Explorer
Framework pytorch
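The core idea fits in a few lines: keep the K most recently learned Q-networks and use the average of their predictions as the bootstrap target. The sketch below assumes a standard DQN training loop elsewhere; `k`, `gamma`, and the helper names are illustrative.

```python
import copy
from collections import deque
import torch

class AveragedTarget:
    """Keeps the K most recent Q-networks and averages their outputs to form
    the DQN target. A sketch; replay buffer and exploration are omitted."""
    def __init__(self, k=10):
        self.snapshots = deque(maxlen=k)

    def push(self, q_network):
        # store a frozen copy of the current Q-network
        self.snapshots.append(copy.deepcopy(q_network).eval())

    @torch.no_grad()
    def q_values(self, states):
        # average Q-value estimates over the stored snapshots
        return torch.stack([net(states) for net in self.snapshots]).mean(dim=0)

def td_target(avg_target, rewards, next_states, dones, gamma=0.99):
    next_q = avg_target.q_values(next_states).max(dim=1).values
    return rewards + gamma * (1.0 - dones) * next_q

# toy usage: 4-dim states, 2 actions, a linear Q-network as a stand-in
q_net = torch.nn.Linear(4, 2)
target = AveragedTarget(k=3)
target.push(q_net)
y = td_target(target, torch.zeros(5), torch.randn(5, 4), torch.zeros(5))
```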

Language Models with Pre-Trained (GloVe) Word Embeddings

Title Language Models with Pre-Trained (GloVe) Word Embeddings
Authors Victor Makarenkov, Bracha Shapira, Lior Rokach
Abstract In this work we implement the training of a Language Model (LM) using a Recurrent Neural Network (RNN) and GloVe word embeddings, introduced by Pennington et al. in [1]. The implementation follows the general idea of training RNNs for LM tasks presented in [2], but uses a Gated Recurrent Unit (GRU) [3] as the memory cell rather than the more commonly used LSTM [4].
Tasks Language Modelling, Word Embeddings
Published 2016-10-12
URL http://arxiv.org/abs/1610.03759v2
PDF http://arxiv.org/pdf/1610.03759v2.pdf
PWC https://paperswithcode.com/paper/language-models-with-pre-trained-glove-word
Repo https://github.com/vicmak/ProofSeer
Framework none
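A minimal PyTorch sketch of the described setup: frozen pretrained GloVe vectors feed a GRU, and a linear layer produces next-word logits. The random embedding table and all dimensions below are placeholders for illustration, not the paper's settings.

```python
import torch
import torch.nn as nn

class GloveGRULM(nn.Module):
    """Word-level language model sketch: frozen pretrained (e.g. GloVe)
    embeddings -> GRU -> softmax over the vocabulary."""
    def __init__(self, glove_matrix, hidden=256):
        super().__init__()
        vocab, dim = glove_matrix.shape
        self.emb = nn.Embedding.from_pretrained(glove_matrix, freeze=True)
        self.gru = nn.GRU(dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, token_ids, state=None):           # (batch, seq_len)
        h, state = self.gru(self.emb(token_ids), state)
        return self.out(h), state                       # next-word logits

# next-word cross-entropy on a toy batch with a random stand-in embedding table
glove = torch.randn(10000, 300)
lm = GloveGRULM(glove)
x = torch.randint(0, 10000, (4, 12))
logits, _ = lm(x[:, :-1])
loss = nn.functional.cross_entropy(logits.reshape(-1, 10000), x[:, 1:].reshape(-1))
```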

Preconditioning Kernel Matrices

Title Preconditioning Kernel Matrices
Authors Kurt Cutajar, Michael A. Osborne, John P. Cunningham, Maurizio Filippone
Abstract The computational and storage complexity of kernel machines presents the primary barrier to their scaling to large, modern, datasets. A common way to tackle the scalability issue is to use the conjugate gradient algorithm, which relieves the constraints on both storage (the kernel matrix need not be stored) and computation (both stochastic gradients and parallelization can be used). Even so, conjugate gradient is not without its own issues: the conditioning of kernel matrices is often such that conjugate gradients will have poor convergence in practice. Preconditioning is a common approach to alleviating this issue. Here we propose preconditioned conjugate gradients for kernel machines, and develop a broad range of preconditioners particularly useful for kernel matrices. We describe a scalable approach to both solving kernel machines and learning their hyperparameters. We show this approach is exact in the limit of iterations and outperforms state-of-the-art approximations for a given computational budget.
Tasks
Published 2016-02-22
URL http://arxiv.org/abs/1602.06693v2
PDF http://arxiv.org/pdf/1602.06693v2.pdf
PWC https://paperswithcode.com/paper/preconditioning-kernel-matrices
Repo https://github.com/shrutimoy10/Preconditioning-Kernel-Matrices
Framework none
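In code, the approach amounts to solving the regularized kernel system with preconditioned conjugate gradients. The SciPy sketch below uses a simple Jacobi (diagonal) preconditioner as a stand-in for the richer preconditioners developed in the paper; the RBF kernel, noise level, and data are illustrative.

```python
import numpy as np
from scipy.sparse.linalg import cg, LinearOperator

def rbf_kernel(X, lengthscale=1.0):
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)

# Solve (K + sigma^2 I) alpha = y with conjugate gradients.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = np.sin(X).sum(axis=1)
sigma2 = 1e-2
A = rbf_kernel(X) + sigma2 * np.eye(len(X))

# Jacobi preconditioner: multiply by the inverse diagonal of A.
diag_inv = 1.0 / np.diag(A)
M = LinearOperator(A.shape, matvec=lambda v: diag_inv * v)

alpha, info = cg(A, y, M=M)   # info == 0 indicates convergence
```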

Learning without Forgetting

Title Learning without Forgetting
Authors Zhizhong Li, Derek Hoiem
Abstract When building a unified vision system or gradually adding new capabilities to a system, the usual assumption is that training data for all tasks is always available. However, as the number of tasks grows, storing and retraining on such data becomes infeasible. A new problem arises where we add new capabilities to a Convolutional Neural Network (CNN), but the training data for its existing capabilities are unavailable. We propose our Learning without Forgetting method, which uses only new task data to train the network while preserving the original capabilities. Our method performs favorably compared to commonly used feature extraction and fine-tuning adaptation techniques and performs similarly to multitask learning that uses original task data we assume unavailable. A more surprising observation is that Learning without Forgetting may be able to replace fine-tuning with similar old and new task datasets for improved new task performance.
Tasks
Published 2016-06-29
URL http://arxiv.org/abs/1606.09282v3
PDF http://arxiv.org/pdf/1606.09282v3.pdf
PWC https://paperswithcode.com/paper/learning-without-forgetting
Repo https://github.com/wannabeOG/ExpertNet-Pytorch
Framework pytorch
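The training objective can be sketched as new-task cross-entropy plus a distillation term that keeps the old-task head close to the responses recorded before training begins. The temperature `T` and weight `lam` below are illustrative hyperparameters, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def lwf_loss(new_logits, new_labels, old_logits, old_logits_recorded, T=2.0, lam=1.0):
    """Learning-without-Forgetting style objective (a sketch): cross-entropy on
    the new task plus a knowledge-distillation term on the old-task head."""
    ce = F.cross_entropy(new_logits, new_labels)
    # distillation with temperature T against the pre-training responses
    distill = F.kl_div(
        F.log_softmax(old_logits / T, dim=1),
        F.softmax(old_logits_recorded / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    return ce + lam * distill
```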

Efficient Diffusion on Region Manifolds: Recovering Small Objects with Compact CNN Representations

Title Efficient Diffusion on Region Manifolds: Recovering Small Objects with Compact CNN Representations
Authors Ahmet Iscen, Giorgos Tolias, Yannis Avrithis, Teddy Furon, Ondrej Chum
Abstract Query expansion is a popular method to improve the quality of image retrieval with both conventional and CNN representations. It has so far been limited to global image similarity. This work focuses on diffusion, a mechanism that captures the image manifold in the feature space. The diffusion is carried out on descriptors of overlapping image regions rather than on a global image descriptor as in previous approaches. An efficient off-line stage allows optional reduction in the number of stored regions. In the on-line stage, the proposed handling of queries unseen at indexing time removes the need for additional computation to adjust the precomputed data. We perform diffusion through a sparse linear system solver, yielding practical query times well below one second. Experimentally, we observe a significant boost in performance of image retrieval with compact CNN descriptors on standard benchmarks, especially when the query object covers only a small part of the image. Small objects have been a common failure case of CNN-based retrieval.
Tasks Image Retrieval
Published 2016-11-16
URL https://arxiv.org/abs/1611.05113v3
PDF https://arxiv.org/pdf/1611.05113v3.pdf
PWC https://paperswithcode.com/paper/efficient-diffusion-on-region-manifolds
Repo https://github.com/ahmetius/diffusion-retrieval
Framework none
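The diffusion itself reduces to one sparse linear system per query, solved with conjugate gradients. A hedged sketch follows, assuming a precomputed sparse affinity matrix over region descriptors; the truncation, regional pooling, and re-ranking details of the full pipeline are omitted.

```python
import numpy as np
from scipy.sparse import csr_matrix, identity, diags
from scipy.sparse.linalg import cg

def diffusion_scores(W, query_idx, alpha=0.99):
    """Diffusion on an affinity graph as a linear system: solve
    (I - alpha * S) f = y with conjugate gradients, where S is the
    symmetrically normalized affinity matrix and y marks the query regions."""
    d = np.asarray(W.sum(axis=1)).ravel()
    D_inv_sqrt = diags(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    S = D_inv_sqrt @ W @ D_inv_sqrt
    y = np.zeros(W.shape[0])
    y[query_idx] = 1.0
    A = identity(W.shape[0], format="csr") - alpha * S
    f, info = cg(A, y)
    return f   # ranking scores; higher means closer on the manifold

# toy usage on a 4-node graph
W = csr_matrix(np.array([[0, 1, 1, 0],
                         [1, 0, 1, 0],
                         [1, 1, 0, 1],
                         [0, 0, 1, 0]], dtype=float))
print(diffusion_scores(W, query_idx=[0]))
```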

A Subsequence Interleaving Model for Sequential Pattern Mining

Title A Subsequence Interleaving Model for Sequential Pattern Mining
Authors Jaroslav Fowkes, Charles Sutton
Abstract Recent sequential pattern mining methods have used the minimum description length (MDL) principle to define an encoding scheme which describes an algorithm for mining the most compressing patterns in a database. We present a novel subsequence interleaving model based on a probabilistic model of the sequence database, which allows us to search for the most compressing set of patterns without designing a specific encoding scheme. Our proposed algorithm is able to efficiently mine the most relevant sequential patterns and rank them using an associated measure of interestingness. The efficient inference in our model is a direct result of our use of a structural expectation-maximization framework, in which the expectation-step takes the form of a submodular optimization problem subject to a coverage constraint. We show on both synthetic and real world datasets that our model mines a set of sequential patterns with low spuriousness and redundancy, high interpretability and usefulness in real-world applications. Furthermore, we demonstrate that the quality of the patterns from our approach is comparable to, if not better than, existing state of the art sequential pattern mining algorithms.
Tasks Sequential Pattern Mining
Published 2016-02-16
URL http://arxiv.org/abs/1602.05012v2
PDF http://arxiv.org/pdf/1602.05012v2.pdf
PWC https://paperswithcode.com/paper/a-subsequence-interleaving-model-for
Repo https://github.com/mast-group/sequence-mining
Framework none

Diet Networks: Thin Parameters for Fat Genomics

Title Diet Networks: Thin Parameters for Fat Genomics
Authors Adriana Romero, Pierre Luc Carrier, Akram Erraqabi, Tristan Sylvain, Alex Auvolat, Etienne Dejoie, Marc-André Legault, Marie-Pierre Dubé, Julie G. Hussin, Yoshua Bengio
Abstract Learning tasks such as those involving genomic data often pose a serious challenge: the number of input features can be orders of magnitude larger than the number of training examples, making it difficult to avoid overfitting, even when using known regularization techniques. We focus here on tasks in which the input is a description of the genetic variation specific to a patient, the single nucleotide polymorphisms (SNPs), yielding millions of ternary inputs. Improving the ability of deep learning to handle such datasets could have an important impact in precision medicine, where high-dimensional data regarding a particular patient is used to make predictions of interest. Even though the amount of data for such tasks is increasing, this mismatch between the number of examples and the number of inputs remains a concern. Naive implementations of classifier neural networks involve a huge number of free parameters in their first layer: each input feature is associated with as many parameters as there are hidden units. We propose a novel neural network parametrization which considerably reduces the number of free parameters. It is based on the idea that we can first learn or provide a distributed representation for each input feature (e.g. for each position in the genome where variations are observed), and then learn (with another neural network called the parameter prediction network) how to map a feature’s distributed representation to the vector of parameters specific to that feature in the classifier neural network (the weights which link the value of the feature to each of the hidden units). We show experimentally on a population stratification task of interest to medical studies that the proposed approach can significantly reduce both the number of parameters and the error rate of the classifier.
Tasks
Published 2016-11-28
URL http://arxiv.org/abs/1611.09340v3
PDF http://arxiv.org/pdf/1611.09340v3.pdf
PWC https://paperswithcode.com/paper/diet-networks-thin-parameters-for-fat
Repo https://github.com/ze1gades/diary
Framework none
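The parameter-prediction idea can be shown in a small PyTorch sketch: an auxiliary network maps each feature's embedding to that feature's column of first-layer weights, so the number of free parameters no longer scales with the number of SNPs. The embedding source, layer sizes, and class count below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DietNetSketch(nn.Module):
    """Diet Networks idea in miniature: instead of learning a huge
    (n_features x n_hidden) first-layer weight matrix directly, a small
    auxiliary network maps each feature's embedding to its weight column.
    feature_emb is a precomputed (n_features, emb_dim) feature representation."""
    def __init__(self, feature_emb, n_hidden=100, n_classes=26):
        super().__init__()
        self.feature_emb = feature_emb
        self.param_predictor = nn.Sequential(        # embedding -> weight column
            nn.Linear(feature_emb.size(1), n_hidden), nn.ReLU(),
            nn.Linear(n_hidden, n_hidden),
        )
        self.classifier = nn.Linear(n_hidden, n_classes)

    def forward(self, x):                            # x: (batch, n_features)
        W = self.param_predictor(self.feature_emb)   # (n_features, n_hidden)
        h = torch.relu(x @ W)                        # predicted first layer
        return self.classifier(h)

# toy usage: 100k ternary features with 64-dim feature embeddings
emb = torch.randn(100_000, 64)
model = DietNetSketch(emb)
logits = model(torch.randint(0, 3, (8, 100_000)).float())
```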

Deep Motif: Visualizing Genomic Sequence Classifications

Title Deep Motif: Visualizing Genomic Sequence Classifications
Authors Jack Lanchantin, Ritambhara Singh, Zeming Lin, Yanjun Qi
Abstract This paper applies a deep convolutional/highway MLP framework to classify genomic sequences on the transcription factor binding site task. To make the model understandable, we propose an optimization-driven strategy to extract “motifs”, or symbolic patterns which visualize the positive class learned by the network. We show that our system, Deep Motif (DeMo), extracts motifs that are similar to, and in some cases outperform, currently well-known motifs. In addition, we find that a deeper model consisting of multiple convolutional and highway layers can outperform the single convolutional and fully connected layer used in the previous state-of-the-art.
Tasks
Published 2016-05-04
URL http://arxiv.org/abs/1605.01133v2
PDF http://arxiv.org/pdf/1605.01133v2.pdf
PWC https://paperswithcode.com/paper/deep-motif-visualizing-genomic-sequence
Repo https://github.com/bakirillov/deepmotif4pytorch
Framework pytorch
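The optimization-driven motif extraction can be approximated by gradient ascent on a soft one-hot input that maximizes the positive-class score. This sketch assumes a generic sequence classifier interface; the step count, learning rate, and softmax relaxation are illustrative choices rather than the paper's exact procedure.

```python
import torch

def extract_motif(model, seq_len=20, alphabet=4, steps=200, lr=0.1):
    """Start from a uniform soft one-hot sequence and run gradient ascent on
    the positive-class score; the argmax per position gives a symbolic motif.
    `model` takes a (1, seq_len, alphabet) tensor and returns class logits."""
    x = torch.full((1, seq_len, alphabet), 1.0 / alphabet, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        score = model(torch.softmax(x, dim=-1))[0, 1]   # positive-class logit
        (-score).backward()                             # ascend the class score
        opt.step()
    return torch.softmax(x, dim=-1).argmax(dim=-1)      # indices into A/C/G/T

# toy usage with a stand-in linear classifier over the flattened sequence
toy_model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(20 * 4, 2))
motif = extract_motif(toy_model)
```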

Visualization Regularizers for Neural Network based Image Recognition

Title Visualization Regularizers for Neural Network based Image Recognition
Authors Biswajit Paria, Vikas Reddy, Anirban Santara, Pabitra Mitra
Abstract The success of deep neural networks is mostly due to their ability to learn meaningful features from the data. Features learned in the hidden layers of deep neural networks trained on computer vision tasks have been shown to be similar to mid-level vision features. We leverage this fact in this work and propose the visualization regularizer for image tasks. The proposed regularization technique enforces smoothness of the features learned by hidden nodes and turns out to be a special case of Tikhonov regularization. We achieve higher classification accuracy compared to existing regularizers such as the L2 norm regularizer and dropout, on benchmark datasets, without changing the training computational complexity.
Tasks
Published 2016-04-10
URL http://arxiv.org/abs/1604.02646v3
PDF http://arxiv.org/pdf/1604.02646v3.pdf
PWC https://paperswithcode.com/paper/visualization-regularizers-for-neural-network
Repo https://github.com/biswajitsc/VisRegDL
Framework none
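A hedged sketch of the kind of smoothness penalty the abstract describes: view each hidden unit's incoming weights as an image and penalize squared differences between neighbouring pixels, then add the term to the task loss. The exact formulation in the paper may differ; the image shape and weighting coefficient are assumptions.

```python
import torch

def smoothness_penalty(first_layer_weights, img_shape=(28, 28)):
    """Tikhonov-style smoothness term: sum of squared differences between
    neighbouring pixels of each hidden unit's weight image."""
    W = first_layer_weights.view(-1, *img_shape)      # (n_hidden, H, W)
    dh = (W[:, 1:, :] - W[:, :-1, :]).pow(2).sum()    # vertical neighbours
    dw = (W[:, :, 1:] - W[:, :, :-1]).pow(2).sum()    # horizontal neighbours
    return dh + dw

# usage: add to the task loss with a small coefficient, e.g.
# loss = cross_entropy + 1e-4 * smoothness_penalty(model.fc1.weight)
```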

Evaluating Informal-Domain Word Representations With UrbanDictionary

Title Evaluating Informal-Domain Word Representations With UrbanDictionary
Authors Naomi Saphra, Adam Lopez
Abstract Existing corpora for intrinsic evaluation are not targeted towards tasks in informal domains such as Twitter or news comment forums. We want to test whether a representation of informal words fulfills the promise of eliding explicit text normalization as a preprocessing step. One possible evaluation metric for such domains is the proximity of spelling variants. We propose how such a metric might be computed and how a spelling variant dataset can be collected using UrbanDictionary.
Tasks
Published 2016-06-27
URL http://arxiv.org/abs/1606.08270v1
PDF http://arxiv.org/pdf/1606.08270v1.pdf
PWC https://paperswithcode.com/paper/evaluating-informal-domain-word
Repo https://github.com/nsaphra/urbandic-scraper
Framework none
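The proposed evaluation can be sketched as comparing embedding similarity of spelling-variant pairs against random word pairs. The function below assumes a dict of word vectors and a list of variant pairs such as would come from the UrbanDictionary scrape; names and setup are illustrative.

```python
import numpy as np

def variant_proximity(embeddings, variant_pairs, rng=np.random.default_rng(0)):
    """Mean cosine similarity of spelling-variant pairs vs. random word pairs.
    embeddings: dict mapping word -> vector; variant_pairs: list of (u, v)."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    words = list(embeddings)
    variant_sims = [cos(embeddings[u], embeddings[v])
                    for u, v in variant_pairs
                    if u in embeddings and v in embeddings]
    random_sims = [cos(embeddings[rng.choice(words)], embeddings[rng.choice(words)])
                   for _ in variant_sims]
    return np.mean(variant_sims), np.mean(random_sims)

# toy usage with made-up vectors
emb = {"tomorrow": np.ones(3), "tmrw": np.array([1.0, 0.9, 1.1]),
       "cat": np.array([0.1, -2.0, 0.5])}
print(variant_proximity(emb, [("tomorrow", "tmrw")]))
```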

DOLDA - a regularized supervised topic model for high-dimensional multi-class regression

Title DOLDA - a regularized supervised topic model for high-dimensional multi-class regression
Authors Måns Magnusson, Leif Jonsson, Mattias Villani
Abstract Generating user interpretable multi-class predictions in data rich environments with many classes and explanatory covariates is a daunting task. We introduce Diagonal Orthant Latent Dirichlet Allocation (DOLDA), a supervised topic model for multi-class classification that can handle both many classes as well as many covariates. To handle many classes we use the recently proposed Diagonal Orthant (DO) probit model (Johndrow et al., 2013) together with an efficient Horseshoe prior for variable selection/shrinkage (Carvalho et al., 2010). We propose a computationally efficient parallel Gibbs sampler for the new model. An important advantage of DOLDA is that learned topics are directly connected to individual classes without the need for a reference class. We evaluate the model’s predictive accuracy on two datasets and demonstrate DOLDA’s advantage in interpreting the generated predictions.
Tasks
Published 2016-01-31
URL http://arxiv.org/abs/1602.00260v2
PDF http://arxiv.org/pdf/1602.00260v2.pdf
PWC https://paperswithcode.com/paper/dolda-a-regularized-supervised-topic-model
Repo https://github.com/lejon/DiagonalOrthantLDA
Framework none

Greedy, Joint Syntactic-Semantic Parsing with Stack LSTMs

Title Greedy, Joint Syntactic-Semantic Parsing with Stack LSTMs
Authors Swabha Swayamdipta, Miguel Ballesteros, Chris Dyer, Noah A. Smith
Abstract We present a transition-based parser that jointly produces syntactic and semantic dependencies. It learns a representation of the entire algorithm state, using stack long short-term memories. Our greedy inference algorithm has linear time, including feature extraction. On the CoNLL 2008–9 English shared tasks, we obtain the best published parsing performance among models that jointly learn syntax and semantics.
Tasks Semantic Parsing
Published 2016-06-29
URL http://arxiv.org/abs/1606.08954v2
PDF http://arxiv.org/pdf/1606.08954v2.pdf
PWC https://paperswithcode.com/paper/greedy-joint-syntactic-semantic-parsing-with
Repo https://github.com/clab/joint-lstm-parser
Framework none