Paper Group AWR 65
Asynchronous Stochastic Gradient Descent with Delay Compensation. Sifting Common Information from Many Variables. Dependency Sensitive Convolutional Neural Networks for Modeling Sentences and Documents. Averaged-DQN: Variance Reduction and Stabilization for Deep Reinforcement Learning. Language Models with Pre-Trained (GloVe) Word Embeddings. Preco …
Asynchronous Stochastic Gradient Descent with Delay Compensation
Title | Asynchronous Stochastic Gradient Descent with Delay Compensation |
Authors | Shuxin Zheng, Qi Meng, Taifeng Wang, Wei Chen, Nenghai Yu, Zhi-Ming Ma, Tie-Yan Liu |
Abstract | With the fast development of deep learning, it has become common to learn big neural networks using massive training data. Asynchronous Stochastic Gradient Descent (ASGD) is widely adopted to fulfill this task for its efficiency, which is, however, known to suffer from the problem of delayed gradients. That is, when a local worker adds its gradient to the global model, the global model may have been updated by other workers and this gradient becomes “delayed”. We propose a novel technique to compensate for this delay, so as to make the optimization behavior of ASGD closer to that of sequential SGD. This is achieved by leveraging a Taylor expansion of the gradient function and an efficient approximation to the Hessian matrix of the loss function. We call the new algorithm Delay Compensated ASGD (DC-ASGD). We evaluated the proposed algorithm on the CIFAR-10 and ImageNet datasets, and the experimental results demonstrate that DC-ASGD outperforms both synchronous SGD and asynchronous SGD, and nearly approaches the performance of sequential SGD. |
Tasks | |
Published | 2016-09-27 |
URL | https://arxiv.org/abs/1609.08326v6 |
https://arxiv.org/pdf/1609.08326v6.pdf | |
PWC | https://paperswithcode.com/paper/asynchronous-stochastic-gradient-descent-with-1 |
Repo | https://github.com/microsoft/dmtk |
Framework | torch |
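The compensation itself is lightweight: the paper approximates the Hessian in the Taylor expansion by the element-wise square of the gradient, so the server applies g + λ·g⊙g⊙(w_current − w_backup) instead of the stale gradient g. Below is a minimal NumPy sketch of that server-side update; the function name, learning rate, and λ value are illustrative, not taken from the paper's experiments.

```python
import numpy as np

def dc_asgd_server_update(w_global, grad, w_backup, lr=0.1, lam=0.04):
    """Apply one delay-compensated ASGD update on the parameter server.

    w_global : current global parameters (may have moved since the worker pulled its copy)
    grad     : gradient the worker computed at its stale copy w_backup
    w_backup : snapshot of the parameters the worker used to compute grad
    lam      : variance-control coefficient (lambda in the paper)
    """
    # First-order Taylor correction of the delayed gradient, with the Hessian
    # approximated by the element-wise product g * g.
    compensated = grad + lam * grad * grad * (w_global - w_backup)
    return w_global - lr * compensated

# Toy usage: the global model drifted while the worker was computing its gradient.
w_backup = np.array([1.0, -0.5, 0.3])
w_global = np.array([0.9, -0.4, 0.2])   # already updated by other workers
grad = np.array([0.2, -0.1, 0.05])      # computed at w_backup
print(dc_asgd_server_update(w_global, grad, w_backup))
```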
Sifting Common Information from Many Variables
Title | Sifting Common Information from Many Variables |
Authors | Greg Ver Steeg, Shuyang Gao, Kyle Reing, Aram Galstyan |
Abstract | Measuring the relationship between any pair of variables is a rich and active area of research that is central to scientific practice. In contrast, characterizing the common information among any group of variables is typically a theoretical exercise with few practical methods for high-dimensional data. A promising solution would be a multivariate generalization of the famous Wyner common information, but this approach relies on solving an apparently intractable optimization problem. We leverage the recently introduced information sieve decomposition to formulate an incremental version of the common information problem that admits a simple fixed point solution, fast convergence, and complexity that is linear in the number of variables. This scalable approach allows us to demonstrate the usefulness of common information in high-dimensional learning problems. The sieve outperforms standard methods on dimensionality reduction tasks, solves a blind source separation problem that cannot be solved with ICA, and accurately recovers structure in brain imaging data. |
Tasks | Dimensionality Reduction |
Published | 2016-06-07 |
URL | http://arxiv.org/abs/1606.02307v4 |
http://arxiv.org/pdf/1606.02307v4.pdf | |
PWC | https://paperswithcode.com/paper/sifting-common-information-from-many |
Repo | https://github.com/gregversteeg/LinearSieve |
Framework | tf |
Dependency Sensitive Convolutional Neural Networks for Modeling Sentences and Documents
Title | Dependency Sensitive Convolutional Neural Networks for Modeling Sentences and Documents |
Authors | Rui Zhang, Honglak Lee, Dragomir Radev |
Abstract | The goal of sentence and document modeling is to accurately represent the meaning of sentences and documents for various Natural Language Processing tasks. In this work, we present Dependency Sensitive Convolutional Neural Networks (DSCNN) as a general-purpose classification system for both sentences and documents. DSCNN hierarchically builds textual representations by processing pretrained word embeddings via Long Short-Term Memory networks and subsequently extracting features with convolution operators. Compared with existing recursive neural models with tree structures, DSCNN does not rely on parsers and expensive phrase labeling, and thus is not restricted to sentence-level tasks. Moreover, unlike other CNN-based models that analyze sentences locally by sliding windows, our system captures both the dependency information within each sentence and relationships across sentences in the same document. Experimental results demonstrate that our approach achieves state-of-the-art performance on several tasks, including sentiment analysis, question type classification, and subjectivity classification. |
Tasks | Sentence Embeddings, Sentiment Analysis, Word Embeddings |
Published | 2016-11-08 |
URL | http://arxiv.org/abs/1611.02361v1 |
http://arxiv.org/pdf/1611.02361v1.pdf | |
PWC | https://paperswithcode.com/paper/dependency-sensitive-convolutional-neural |
Repo | https://github.com/ManuelVs/NeuralNetworks |
Framework | tf |
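A minimal PyTorch sketch of the sentence-level pipeline the abstract describes: pretrained embeddings fed through an LSTM, followed by convolution and max-over-time pooling. Layer sizes, kernel width, and the classifier head are illustrative choices, not the paper's configuration.

```python
import torch
import torch.nn as nn

class DSCNNSentence(nn.Module):
    """Sentence-level DSCNN sketch: embeddings -> LSTM -> conv -> max-pool -> classifier."""
    def __init__(self, pretrained_emb, hidden=128, n_filters=100, kernel=3, n_classes=2):
        super().__init__()
        # pretrained_emb: FloatTensor [vocab, emb_dim], e.g. GloVe/word2vec vectors
        self.embed = nn.Embedding.from_pretrained(pretrained_emb, freeze=False)
        self.lstm = nn.LSTM(pretrained_emb.size(1), hidden, batch_first=True)
        self.conv = nn.Conv1d(hidden, n_filters, kernel_size=kernel, padding=kernel // 2)
        self.fc = nn.Linear(n_filters, n_classes)

    def forward(self, tokens):                 # tokens: [batch, seq_len] word ids
        h, _ = self.lstm(self.embed(tokens))   # [batch, seq_len, hidden]
        h = self.conv(h.transpose(1, 2))       # [batch, n_filters, seq_len]
        h = torch.relu(h).max(dim=2).values    # max-over-time pooling
        return self.fc(h)                      # class logits

# Toy usage with random stand-ins for pretrained vectors.
emb = torch.randn(1000, 50)
model = DSCNNSentence(emb)
print(model(torch.randint(0, 1000, (4, 20))).shape)  # torch.Size([4, 2])
```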
Averaged-DQN: Variance Reduction and Stabilization for Deep Reinforcement Learning
Title | Averaged-DQN: Variance Reduction and Stabilization for Deep Reinforcement Learning |
Authors | Oron Anschel, Nir Baram, Nahum Shimkin |
Abstract | Instability and variability of Deep Reinforcement Learning (DRL) algorithms tend to adversely affect their performance. Averaged-DQN is a simple extension to the DQN algorithm, based on averaging previously learned Q-values estimates, which leads to a more stable training procedure and improved performance by reducing approximation error variance in the target values. To understand the effect of the algorithm, we examine the source of value function estimation errors and provide an analytical comparison within a simplified model. We further present experiments on the Arcade Learning Environment benchmark that demonstrate significantly improved stability and performance due to the proposed extension. |
Tasks | Atari Games |
Published | 2016-11-07 |
URL | http://arxiv.org/abs/1611.01929v4 |
http://arxiv.org/pdf/1611.01929v4.pdf | |
PWC | https://paperswithcode.com/paper/averaged-dqn-variance-reduction-and |
Repo | https://github.com/qlan3/Explorer |
Framework | pytorch |
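The only change to DQN is how targets are formed: the Q-value estimates of the K most recently learned networks are averaged before taking the maximum over actions. A short PyTorch sketch of that target computation, assuming the caller keeps frozen copies of the last K networks; names, K, and the discount factor are illustrative.

```python
import torch

def averaged_dqn_target(prev_q_nets, next_states, rewards, dones, gamma=0.99):
    """Averaged-DQN targets: r + gamma * max_a (1/K) * sum_k Q_k(s', a).

    prev_q_nets : list of the K most recently learned Q-networks (frozen copies)
    next_states : [batch, state_dim]; rewards, dones: [batch]
    """
    with torch.no_grad():
        # Average the action-value estimates of the K previous networks.
        q_avg = torch.stack([q(next_states) for q in prev_q_nets]).mean(dim=0)
        max_q = q_avg.max(dim=1).values
        return rewards + gamma * (1.0 - dones) * max_q

# Toy usage with K = 3 linear "Q-networks" over a 4-dim state and 2 actions.
nets = [torch.nn.Linear(4, 2) for _ in range(3)]
targets = averaged_dqn_target(nets, torch.randn(8, 4), torch.zeros(8), torch.zeros(8))
print(targets.shape)  # torch.Size([8])
```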
Language Models with Pre-Trained (GloVe) Word Embeddings
Title | Language Models with Pre-Trained (GloVe) Word Embeddings |
Authors | Victor Makarenkov, Bracha Shapira, Lior Rokach |
Abstract | In this work we implement the training of a Language Model (LM) using a Recurrent Neural Network (RNN) and GloVe word embeddings, introduced by Pennington et al. in [1]. The implementation follows the general idea of training RNNs for LM tasks presented in [2], but uses a Gated Recurrent Unit (GRU) [3] as the memory cell rather than the more commonly used LSTM [4]. |
Tasks | Language Modelling, Word Embeddings |
Published | 2016-10-12 |
URL | http://arxiv.org/abs/1610.03759v2 |
http://arxiv.org/pdf/1610.03759v2.pdf | |
PWC | https://paperswithcode.com/paper/language-models-with-pre-trained-glove-word |
Repo | https://github.com/vicmak/ProofSeer |
Framework | none |
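A minimal PyTorch sketch of the described setup: a GRU language model whose embedding layer is initialized from pretrained GloVe vectors and fine-tuned during training. Dimensions and the toy loss computation are illustrative; real use would load actual GloVe vectors and a tokenized corpus.

```python
import torch
import torch.nn as nn

class GRULanguageModel(nn.Module):
    """Next-word prediction with GloVe-initialized embeddings and a GRU memory cell."""
    def __init__(self, glove_vectors, hidden=256):
        super().__init__()
        vocab, emb_dim = glove_vectors.shape
        self.embed = nn.Embedding.from_pretrained(glove_vectors, freeze=False)
        self.gru = nn.GRU(emb_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)    # logits over the vocabulary

    def forward(self, tokens, state=None):     # tokens: [batch, seq_len]
        h, state = self.gru(self.embed(tokens), state)
        return self.out(h), state              # [batch, seq_len, vocab]

# Toy usage with random stand-ins for GloVe vectors (vocab=5000, dim=100).
glove = torch.randn(5000, 100)
lm = GRULanguageModel(glove)
tokens = torch.randint(0, 5000, (2, 12))
logits, _ = lm(tokens)
# Standard LM objective: predict token t+1 from the prefix up to t.
loss = nn.functional.cross_entropy(logits[:, :-1].reshape(-1, 5000),
                                   tokens[:, 1:].reshape(-1))
print(loss.item())
```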
Preconditioning Kernel Matrices
Title | Preconditioning Kernel Matrices |
Authors | Kurt Cutajar, Michael A. Osborne, John P. Cunningham, Maurizio Filippone |
Abstract | The computational and storage complexity of kernel machines presents the primary barrier to their scaling to large, modern, datasets. A common way to tackle the scalability issue is to use the conjugate gradient algorithm, which relieves the constraints on both storage (the kernel matrix need not be stored) and computation (both stochastic gradients and parallelization can be used). Even so, conjugate gradient is not without its own issues: the conditioning of kernel matrices is often such that conjugate gradients will have poor convergence in practice. Preconditioning is a common approach to alleviating this issue. Here we propose preconditioned conjugate gradients for kernel machines, and develop a broad range of preconditioners particularly useful for kernel matrices. We describe a scalable approach to both solving kernel machines and learning their hyperparameters. We show this approach is exact in the limit of iterations and outperforms state-of-the-art approximations for a given computational budget. |
Tasks | |
Published | 2016-02-22 |
URL | http://arxiv.org/abs/1602.06693v2 |
http://arxiv.org/pdf/1602.06693v2.pdf | |
PWC | https://paperswithcode.com/paper/preconditioning-kernel-matrices |
Repo | https://github.com/shrutimoy10/Preconditioning-Kernel-Matrices |
Framework | none |
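The computational core is solving (K + σ²I)α = y with preconditioned conjugate gradients, which needs only matrix-vector products with the kernel matrix. The sketch below uses a simple Jacobi (diagonal) preconditioner purely for illustration; the paper develops a much broader range of preconditioners for kernel matrices.

```python
import numpy as np

def pcg(A, b, M_inv, tol=1e-8, max_iter=500):
    """Preconditioned conjugate gradients for A x = b (A symmetric positive definite).

    M_inv(v) applies the inverse of the preconditioner to a vector; only
    matrix-vector products with A are needed.
    """
    x = np.zeros_like(b)
    r = b - A @ x
    z = M_inv(r)
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = M_inv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

# Toy kernel system: RBF Gram matrix plus a noise term, Jacobi preconditioner.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
sqd = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-0.5 * sqd) + 0.1 * np.eye(200)      # (K + sigma^2 I)
y = rng.normal(size=200)
alpha = pcg(K, y, M_inv=lambda v: v / np.diag(K))
print(np.linalg.norm(K @ alpha - y))            # residual should be tiny
```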
Learning without Forgetting
Title | Learning without Forgetting |
Authors | Zhizhong Li, Derek Hoiem |
Abstract | When building a unified vision system or gradually adding new capabilities to a system, the usual assumption is that training data for all tasks is always available. However, as the number of tasks grows, storing and retraining on such data becomes infeasible. A new problem arises where we add new capabilities to a Convolutional Neural Network (CNN), but the training data for its existing capabilities are unavailable. We propose our Learning without Forgetting method, which uses only new task data to train the network while preserving the original capabilities. Our method performs favorably compared to commonly used feature extraction and fine-tuning adaptation techniques and performs similarly to multitask learning that uses original task data we assume unavailable. A more surprising observation is that Learning without Forgetting may be able to replace fine-tuning with similar old and new task datasets for improved new task performance. |
Tasks | |
Published | 2016-06-29 |
URL | http://arxiv.org/abs/1606.09282v3 |
http://arxiv.org/pdf/1606.09282v3.pdf | |
PWC | https://paperswithcode.com/paper/learning-without-forgetting |
Repo | https://github.com/wannabeOG/ExpertNet-Pytorch |
Framework | pytorch |
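Training touches only new-task data: before training starts, the original network's old-task outputs are recorded on the new images, and a distillation term keeps the updated network's old-task head close to those recorded responses while the new head is trained normally. A minimal PyTorch sketch of the combined loss; the temperature, weighting, and Hinton-style KL form are common choices and should be read as assumptions rather than the paper's exact settings.

```python
import torch
import torch.nn.functional as F

def lwf_loss(new_logits, new_labels, old_logits, recorded_old_logits, lam=1.0, T=2.0):
    """Learning-without-Forgetting objective on a batch of *new-task* images.

    new_logits          : outputs of the new-task head
    old_logits          : current outputs of the old-task head on the same images
    recorded_old_logits : old-task outputs recorded from the original network
                          before training started (the 'soft labels')
    """
    ce = F.cross_entropy(new_logits, new_labels)
    # Knowledge-distillation term with temperature T.
    kd = F.kl_div(F.log_softmax(old_logits / T, dim=1),
                  F.softmax(recorded_old_logits / T, dim=1),
                  reduction="batchmean") * (T * T)
    return ce + lam * kd

# Toy usage: 8 images, 10 old classes, 5 new classes.
loss = lwf_loss(torch.randn(8, 5), torch.randint(0, 5, (8,)),
                torch.randn(8, 10), torch.randn(8, 10))
print(loss.item())
```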
Efficient Diffusion on Region Manifolds: Recovering Small Objects with Compact CNN Representations
Title | Efficient Diffusion on Region Manifolds: Recovering Small Objects with Compact CNN Representations |
Authors | Ahmet Iscen, Giorgos Tolias, Yannis Avrithis, Teddy Furon, Ondrej Chum |
Abstract | Query expansion is a popular method to improve the quality of image retrieval with both conventional and CNN representations. It has been so far limited to global image similarity. This work focuses on diffusion, a mechanism that captures the image manifold in the feature space. The diffusion is carried out on descriptors of overlapping image regions rather than on a global image descriptor like in previous approaches. An efficient off-line stage allows optional reduction in the number of stored regions. In the on-line stage, the proposed handling of unseen queries in the indexing stage removes additional computation to adjust the precomputed data. We perform diffusion through a sparse linear system solver, yielding practical query times well below one second. Experimentally, we observe a significant boost in performance of image retrieval with compact CNN descriptors on standard benchmarks, especially when the query object covers only a small part of the image. Small objects have been a common failure case of CNN-based retrieval. |
Tasks | Image Retrieval |
Published | 2016-11-16 |
URL | https://arxiv.org/abs/1611.05113v3 |
https://arxiv.org/pdf/1611.05113v3.pdf | |
PWC | https://paperswithcode.com/paper/efficient-diffusion-on-region-manifolds |
Repo | https://github.com/ahmetius/diffusion-retrieval |
Framework | none |
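The on-line step reduces to a sparse linear system: with S the symmetrically normalized affinity matrix over region descriptors and y a vector placing mass on the query's nearest regions, the diffusion scores solve (I − αS) f = y. A SciPy sketch with an illustrative random affinity matrix; in practice S would come from a k-NN graph over regional CNN descriptors.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import cg

def diffuse(affinity, y, alpha=0.99):
    """Solve (I - alpha * S) f = y with conjugate gradients.

    affinity : sparse symmetric affinity matrix (zero diagonal, nonnegative)
    y        : query vector (mass on query regions / their neighbours, 0 elsewhere)
    """
    d = np.asarray(affinity.sum(axis=1)).ravel()
    d_inv_sqrt = sp.diags(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    S = d_inv_sqrt @ affinity @ d_inv_sqrt          # symmetric normalization
    A = sp.identity(affinity.shape[0]) - alpha * S
    f, _ = cg(A, y)
    return f                                        # ranking scores for all regions

# Toy usage: a random sparse symmetric affinity over 100 "regions", query = region 0.
W = sp.random(100, 100, density=0.05, format="csr", random_state=0)
W = W + W.T                      # symmetrize
W = W - sp.diags(W.diagonal())   # zero the diagonal
y = np.zeros(100); y[0] = 1.0
print(diffuse(W, y)[:5])
```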
A Subsequence Interleaving Model for Sequential Pattern Mining
Title | A Subsequence Interleaving Model for Sequential Pattern Mining |
Authors | Jaroslav Fowkes, Charles Sutton |
Abstract | Recent sequential pattern mining methods have used the minimum description length (MDL) principle to define an encoding scheme which describes an algorithm for mining the most compressing patterns in a database. We present a novel subsequence interleaving model based on a probabilistic model of the sequence database, which allows us to search for the most compressing set of patterns without designing a specific encoding scheme. Our proposed algorithm is able to efficiently mine the most relevant sequential patterns and rank them using an associated measure of interestingness. The efficient inference in our model is a direct result of our use of a structural expectation-maximization framework, in which the expectation-step takes the form of a submodular optimization problem subject to a coverage constraint. We show on both synthetic and real world datasets that our model mines a set of sequential patterns with low spuriousness and redundancy, high interpretability and usefulness in real-world applications. Furthermore, we demonstrate that the quality of the patterns from our approach is comparable to, if not better than, existing state of the art sequential pattern mining algorithms. |
Tasks | Sequential Pattern Mining |
Published | 2016-02-16 |
URL | http://arxiv.org/abs/1602.05012v2 |
http://arxiv.org/pdf/1602.05012v2.pdf | |
PWC | https://paperswithcode.com/paper/a-subsequence-interleaving-model-for |
Repo | https://github.com/mast-group/sequence-mining |
Framework | none |
Diet Networks: Thin Parameters for Fat Genomics
Title | Diet Networks: Thin Parameters for Fat Genomics |
Authors | Adriana Romero, Pierre Luc Carrier, Akram Erraqabi, Tristan Sylvain, Alex Auvolat, Etienne Dejoie, Marc-André Legault, Marie-Pierre Dubé, Julie G. Hussin, Yoshua Bengio |
Abstract | Learning tasks such as those involving genomic data often pose a serious challenge: the number of input features can be orders of magnitude larger than the number of training examples, making it difficult to avoid overfitting, even when using known regularization techniques. We focus here on tasks in which the input is a description of the genetic variation specific to a patient, the single nucleotide polymorphisms (SNPs), yielding millions of ternary inputs. Improving the ability of deep learning to handle such datasets could have an important impact in precision medicine, where high-dimensional data regarding a particular patient is used to make predictions of interest. Even though the amount of data for such tasks is increasing, this mismatch between the number of examples and the number of inputs remains a concern. Naive implementations of classifier neural networks involve a huge number of free parameters in their first layer: each input feature is associated with as many parameters as there are hidden units. We propose a novel neural network parametrization which considerably reduces the number of free parameters. It is based on the idea that we can first learn or provide a distributed representation for each input feature (e.g. for each position in the genome where variations are observed), and then learn (with another neural network called the parameter prediction network) how to map a feature’s distributed representation to the vector of parameters specific to that feature in the classifier neural network (the weights which link the value of the feature to each of the hidden units). We show experimentally on a population stratification task of interest to medical studies that the proposed approach can significantly reduce both the number of parameters and the error rate of the classifier. |
Tasks | |
Published | 2016-11-28 |
URL | http://arxiv.org/abs/1611.09340v3 |
http://arxiv.org/pdf/1611.09340v3.pdf | |
PWC | https://paperswithcode.com/paper/diet-networks-thin-parameters-for-fat |
Repo | https://github.com/ze1gades/diary |
Framework | none |
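The parametrization trick: the huge first-layer weight matrix (one row per SNP feature) is never stored as free parameters; instead an auxiliary parameter-prediction network maps each feature's distributed representation to that feature's weight vector. A minimal PyTorch sketch with illustrative sizes; the feature embeddings here are random stand-ins for the per-SNP representations the paper learns or precomputes.

```python
import torch
import torch.nn as nn

class DietNetwork(nn.Module):
    """First-layer weights predicted from per-feature embeddings (fat input, thin parameters)."""
    def __init__(self, feature_embeddings, hidden=100, n_classes=26):
        super().__init__()
        # feature_embeddings: [n_features, emb_dim], one representation per SNP.
        self.register_buffer("feat_emb", feature_embeddings)
        emb_dim = feature_embeddings.size(1)
        # Parameter-prediction network: maps a feature embedding to its row of W.
        self.param_net = nn.Sequential(nn.Linear(emb_dim, 64), nn.ReLU(),
                                       nn.Linear(64, hidden))
        self.classifier = nn.Linear(hidden, n_classes)

    def forward(self, x):                        # x: [batch, n_features]
        W = self.param_net(self.feat_emb)        # [n_features, hidden], predicted, not stored
        h = torch.relu(x @ W)                    # acts as the (fat) first layer
        return self.classifier(h)

# Toy usage: 10k "SNPs", 20-dim feature embeddings, 8 patients.
model = DietNetwork(torch.randn(10_000, 20))
print(model(torch.randn(8, 10_000)).shape)   # torch.Size([8, 26])
```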
Deep Motif: Visualizing Genomic Sequence Classifications
Title | Deep Motif: Visualizing Genomic Sequence Classifications |
Authors | Jack Lanchantin, Ritambhara Singh, Zeming Lin, Yanjun Qi |
Abstract | This paper applies a deep convolutional/highway MLP framework to classify genomic sequences on the transcription factor binding site task. To make the model understandable, we propose an optimization-driven strategy to extract “motifs”, or symbolic patterns which visualize the positive class learned by the network. We show that our system, Deep Motif (DeMo), extracts motifs that are similar to, and in some cases outperform, the current well-known motifs. In addition, we find that a deeper model consisting of multiple convolutional and highway layers can outperform a single convolutional and fully connected layer in the previous state-of-the-art. |
Tasks | |
Published | 2016-05-04 |
URL | http://arxiv.org/abs/1605.01133v2 |
http://arxiv.org/pdf/1605.01133v2.pdf | |
PWC | https://paperswithcode.com/paper/deep-motif-visualizing-genomic-sequence |
Repo | https://github.com/bakirillov/deepmotif4pytorch |
Framework | pytorch |
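Motif extraction is optimization-driven: starting from the trained classifier, search for the input sequence (relaxed to a distribution over the four nucleotides at each position) that maximizes the positive-class score, then read off the dominant base per position. The sketch below illustrates this with gradient ascent and a stand-in scoring model; the sequence length, optimizer, and toy model are all assumptions for illustration.

```python
import torch

def extract_motif(model, seq_len=20, steps=200, lr=0.1):
    """Gradient-ascent search for a class-maximizing input ('motif'),
    represented as a position weight matrix over the 4 nucleotides."""
    logits_x = torch.zeros(1, seq_len, 4, requires_grad=True)   # relaxed one-hot sequence
    opt = torch.optim.Adam([logits_x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        x = torch.softmax(logits_x, dim=-1)      # keep each position a distribution over ACGT
        score = model(x)                         # scalar positive-class score
        (-score).backward()                      # ascend on the class score
        opt.step()
    pwm = torch.softmax(logits_x, dim=-1).detach().squeeze(0)
    return "".join("ACGT"[i] for i in pwm.argmax(dim=-1))       # dominant base per position

# Stand-in for a trained binding-site classifier: score = sum of a fixed projection.
proj = torch.randn(4)
model = lambda x: (x @ proj).sum()
print(extract_motif(model))
```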
Visualization Regularizers for Neural Network based Image Recognition
Title | Visualization Regularizers for Neural Network based Image Recognition |
Authors | Biswajit Paria, Vikas Reddy, Anirban Santara, Pabitra Mitra |
Abstract | The success of deep neural networks is mostly due to their ability to learn meaningful features from the data. Features learned in the hidden layers of deep neural networks trained on computer vision tasks have been shown to be similar to mid-level vision features. We leverage this fact in this work and propose the visualization regularizer for image tasks. The proposed regularization technique enforces smoothness of the features learned by hidden nodes and turns out to be a special case of Tikhonov regularization. We achieve higher classification accuracy than existing regularizers such as the L2 norm regularizer and dropout on benchmark datasets, without changing the training computational complexity. |
Tasks | |
Published | 2016-04-10 |
URL | http://arxiv.org/abs/1604.02646v3 |
http://arxiv.org/pdf/1604.02646v3.pdf | |
PWC | https://paperswithcode.com/paper/visualization-regularizers-for-neural-network |
Repo | https://github.com/biswajitsc/VisRegDL |
Framework | none |
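One concrete way to read the proposed regularizer — offered here as an illustration, not the paper's exact formulation — is to penalize non-smoothness of the first-layer filters viewed as images, i.e. summed squared differences between neighbouring pixel weights, added as a Tikhonov-style term to the usual classification loss.

```python
import torch
import torch.nn as nn

def smoothness_penalty(weight, img_shape=(28, 28)):
    """Squared differences between horizontally/vertically adjacent weights of
    first-layer filters, viewed as images (a Tikhonov-style smoothness term)."""
    filt = weight.view(weight.size(0), *img_shape)         # [n_hidden, H, W]
    dh = filt[:, 1:, :] - filt[:, :-1, :]
    dw = filt[:, :, 1:] - filt[:, :, :-1]
    return (dh ** 2).sum() + (dw ** 2).sum()

# Toy usage: add the penalty to an ordinary classification loss.
net = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 64), nn.ReLU(), nn.Linear(64, 10))
x, y = torch.randn(16, 1, 28, 28), torch.randint(0, 10, (16,))
loss = nn.functional.cross_entropy(net(x), y)
loss = loss + 1e-3 * smoothness_penalty(net[1].weight)     # net[1] is the first Linear layer
print(loss.item())
```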
Evaluating Informal-Domain Word Representations With UrbanDictionary
Title | Evaluating Informal-Domain Word Representations With UrbanDictionary |
Authors | Naomi Saphra, Adam Lopez |
Abstract | Existing corpora for intrinsic evaluation are not targeted towards tasks in informal domains such as Twitter or news comment forums. We want to test whether a representation of informal words fulfills the promise of eliding explicit text normalization as a preprocessing step. One possible evaluation metric for such domains is the proximity of spelling variants. We propose how such a metric might be computed and how a spelling variant dataset can be collected using UrbanDictionary. |
Tasks | |
Published | 2016-06-27 |
URL | http://arxiv.org/abs/1606.08270v1 |
http://arxiv.org/pdf/1606.08270v1.pdf | |
PWC | https://paperswithcode.com/paper/evaluating-informal-domain-word |
Repo | https://github.com/nsaphra/urbandic-scraper |
Framework | none |
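A sketch of the kind of metric the abstract suggests: given word vectors trained on an informal-domain corpus and spelling-variant pairs mined from UrbanDictionary, score the representation by how close the variants sit in embedding space (here, mean cosine similarity over the pairs). The pair list and vectors below are made-up placeholders.

```python
import numpy as np

def variant_proximity(embeddings, variant_pairs):
    """Mean cosine similarity between spelling-variant pairs, skipping OOV words.

    embeddings    : dict word -> vector
    variant_pairs : list of (canonical, variant) pairs, e.g. mined from UrbanDictionary
    """
    sims = []
    for a, b in variant_pairs:
        if a in embeddings and b in embeddings:
            va, vb = embeddings[a], embeddings[b]
            sims.append(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))
    return float(np.mean(sims)) if sims else float("nan")

# Toy usage with made-up vectors; real use would load Twitter-domain embeddings.
rng = np.random.default_rng(0)
emb = {w: rng.normal(size=50) for w in ["tomorrow", "tmrw", "because", "bc"]}
print(variant_proximity(emb, [("tomorrow", "tmrw"), ("because", "bc")]))
```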
DOLDA - a regularized supervised topic model for high-dimensional multi-class regression
Title | DOLDA - a regularized supervised topic model for high-dimensional multi-class regression |
Authors | Måns Magnusson, Leif Jonsson, Mattias Villani |
Abstract | Generating user interpretable multi-class predictions in data rich environments with many classes and explanatory covariates is a daunting task. We introduce Diagonal Orthant Latent Dirichlet Allocation (DOLDA), a supervised topic model for multi-class classification that can handle both many classes as well as many covariates. To handle many classes we use the recently proposed Diagonal Orthant (DO) probit model (Johndrow et al., 2013) together with an efficient Horseshoe prior for variable selection/shrinkage (Carvalho et al., 2010). We propose a computationally efficient parallel Gibbs sampler for the new model. An important advantage of DOLDA is that learned topics are directly connected to individual classes without the need for a reference class. We evaluate the model’s predictive accuracy on two datasets and demonstrate DOLDA’s advantage in interpreting the generated predictions. |
Tasks | |
Published | 2016-01-31 |
URL | http://arxiv.org/abs/1602.00260v2 |
http://arxiv.org/pdf/1602.00260v2.pdf | |
PWC | https://paperswithcode.com/paper/dolda-a-regularized-supervised-topic-model |
Repo | https://github.com/lejon/DiagonalOrthantLDA |
Framework | none |
Greedy, Joint Syntactic-Semantic Parsing with Stack LSTMs
Title | Greedy, Joint Syntactic-Semantic Parsing with Stack LSTMs |
Authors | Swabha Swayamdipta, Miguel Ballesteros, Chris Dyer, Noah A. Smith |
Abstract | We present a transition-based parser that jointly produces syntactic and semantic dependencies. It learns a representation of the entire algorithm state, using stack long short-term memories. Our greedy inference algorithm runs in linear time, including feature extraction. On the CoNLL 2008–9 English shared tasks, we obtain the best published parsing performance among models that jointly learn syntax and semantics. |
Tasks | Semantic Parsing |
Published | 2016-06-29 |
URL | http://arxiv.org/abs/1606.08954v2 |
http://arxiv.org/pdf/1606.08954v2.pdf | |
PWC | https://paperswithcode.com/paper/greedy-joint-syntactic-semantic-parsing-with |
Repo | https://github.com/clab/joint-lstm-parser |
Framework | none |
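The parser is greedy and transition-based: at each step a learned representation of the whole parser state (stack LSTMs in the paper) scores the legal transitions and the best one is applied, giving linear-time inference. The skeleton below shows only a greedy arc-standard control loop with a random stand-in scorer; the paper's transition system additionally interleaves semantic (predicate/argument) transitions and replaces this placeholder scorer with stack LSTMs.

```python
import random

SHIFT, LEFT_ARC, RIGHT_ARC = "shift", "left_arc", "right_arc"

def greedy_parse(words, score_fn):
    """Greedy arc-standard loop: at each step apply the highest-scoring legal
    transition. score_fn stands in for the learned parser-state representation."""
    buffer = list(range(len(words)))   # token indices, left to right
    stack, arcs = [], []               # arcs: (head, dependent) pairs
    while buffer or len(stack) > 1:
        legal = []
        if buffer:
            legal.append(SHIFT)
        if len(stack) >= 2:
            legal += [LEFT_ARC, RIGHT_ARC]
        action = max(legal, key=lambda a: score_fn(stack, buffer, a))
        if action == SHIFT:
            stack.append(buffer.pop(0))
        elif action == LEFT_ARC:       # top of stack becomes head of the item below it
            dep = stack.pop(-2)
            arcs.append((stack[-1], dep))
        else:                          # RIGHT_ARC: item below becomes head of the top
            dep = stack.pop()
            arcs.append((stack[-1], dep))
    return arcs

# Toy usage with a random stand-in for the learned scorer.
print(greedy_parse("the cat sat".split(), lambda s, b, a: random.random()))
```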