May 7, 2019

2659 words 13 mins read

Paper Group AWR 65

Asynchronous Stochastic Gradient Descent with Delay Compensation. Sifting Common Information from Many Variables. Dependency Sensitive Convolutional Neural Networks for Modeling Sentences and Documents. Averaged-DQN: Variance Reduction and Stabilization for Deep Reinforcement Learning. Language Models with Pre-Trained (GloVe) Word Embeddings. Preco …

Asynchronous Stochastic Gradient Descent with Delay Compensation

Title Asynchronous Stochastic Gradient Descent with Delay Compensation
Authors Shuxin Zheng, Qi Meng, Taifeng Wang, Wei Chen, Nenghai Yu, Zhi-Ming Ma, Tie-Yan Liu
Abstract With the fast development of deep learning, it has become common to learn big neural networks using massive training data. Asynchronous Stochastic Gradient Descent (ASGD) is widely adopted for this task because of its efficiency, but it is known to suffer from the problem of delayed gradients: by the time a local worker adds its gradient to the global model, the global model may already have been updated by other workers, so the gradient becomes “delayed”. We propose a novel technique to compensate for this delay and make the optimization behavior of ASGD closer to that of sequential SGD. This is achieved by leveraging a Taylor expansion of the gradient function and an efficient approximation to the Hessian matrix of the loss function. We call the new algorithm Delay Compensated ASGD (DC-ASGD). We evaluated the proposed algorithm on the CIFAR-10 and ImageNet datasets, and the experimental results demonstrate that DC-ASGD outperforms both synchronous SGD and asynchronous SGD, and nearly approaches the performance of sequential SGD.
Tasks
Published 2016-09-27
URL https://arxiv.org/abs/1609.08326v6
PDF https://arxiv.org/pdf/1609.08326v6.pdf
PWC https://paperswithcode.com/paper/asynchronous-stochastic-gradient-descent-with-1
Repo https://github.com/microsoft/dmtk
Framework torch
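The compensation step is easy to illustrate. Below is a minimal NumPy sketch of a server-side DC-ASGD update, assuming the diagonal gradient-outer-product approximation of the Hessian described in the abstract; the learning rate, the variance-control coefficient `lam`, and the toy quadratic loss are illustrative choices, not values from the paper.

```python
import numpy as np

def dc_asgd_update(w_global, grad, w_snapshot, lr=0.1, lam=0.04):
    """One server-side update with delay compensation (a sketch).

    grad       -- gradient the worker computed at its stale snapshot w_snapshot
    w_global   -- current global parameters (possibly updated by other workers)
    lam        -- coefficient for the diagonal Hessian approximation
    """
    # First-order Taylor correction: the Hessian is approximated by the
    # element-wise square of the gradient, so the compensated gradient is
    # g + lam * g * g * (w_global - w_snapshot).
    compensated = grad + lam * grad * grad * (w_global - w_snapshot)
    return w_global - lr * compensated

# Toy usage on the quadratic loss f(w) = 0.5 * ||w||^2, whose gradient is w.
w = np.ones(4)
snapshot = w.copy()               # worker pulls the model
stale_grad = snapshot.copy()      # gradient of f at the snapshot
w = w - 0.1 * np.ones(4)          # another worker updates the global model meanwhile
w = dc_asgd_update(w, stale_grad, snapshot)
```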

Sifting Common Information from Many Variables

Title Sifting Common Information from Many Variables
Authors Greg Ver Steeg, Shuyang Gao, Kyle Reing, Aram Galstyan
Abstract Measuring the relationship between any pair of variables is a rich and active area of research that is central to scientific practice. In contrast, characterizing the common information among any group of variables is typically a theoretical exercise with few practical methods for high-dimensional data. A promising solution would be a multivariate generalization of the famous Wyner common information, but this approach relies on solving an apparently intractable optimization problem. We leverage the recently introduced information sieve decomposition to formulate an incremental version of the common information problem that admits a simple fixed point solution, fast convergence, and complexity that is linear in the number of variables. This scalable approach allows us to demonstrate the usefulness of common information in high-dimensional learning problems. The sieve outperforms standard methods on dimensionality reduction tasks, solves a blind source separation problem that cannot be solved with ICA, and accurately recovers structure in brain imaging data.
Tasks Dimensionality Reduction
Published 2016-06-07
URL http://arxiv.org/abs/1606.02307v4
PDF http://arxiv.org/pdf/1606.02307v4.pdf
PWC https://paperswithcode.com/paper/sifting-common-information-from-many
Repo https://github.com/gregversteeg/LinearSieve
Framework tf

Dependency Sensitive Convolutional Neural Networks for Modeling Sentences and Documents

Title Dependency Sensitive Convolutional Neural Networks for Modeling Sentences and Documents
Authors Rui Zhang, Honglak Lee, Dragomir Radev
Abstract The goal of sentence and document modeling is to accurately represent the meaning of sentences and documents for various Natural Language Processing tasks. In this work, we present Dependency Sensitive Convolutional Neural Networks (DSCNN) as a general-purpose classification system for both sentences and documents. DSCNN hierarchically builds textual representations by processing pretrained word embeddings via Long Short-Term Memory networks and subsequently extracting features with convolution operators. Compared with existing recursive neural models with tree structures, DSCNN does not rely on parsers and expensive phrase labeling, and thus is not restricted to sentence-level tasks. Moreover, unlike other CNN-based models that analyze sentences locally by sliding windows, our system captures both the dependency information within each sentence and relationships across sentences in the same document. Experimental results demonstrate that our approach achieves state-of-the-art performance on several tasks, including sentiment analysis, question type classification, and subjectivity classification.
Tasks Sentence Embeddings, Sentiment Analysis, Word Embeddings
Published 2016-11-08
URL http://arxiv.org/abs/1611.02361v1
PDF http://arxiv.org/pdf/1611.02361v1.pdf
PWC https://paperswithcode.com/paper/dependency-sensitive-convolutional-neural
Repo https://github.com/ManuelVs/NeuralNetworks
Framework tf
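A minimal PyTorch sketch of the overall architecture (independent of the linked TensorFlow repo): pretrained embeddings feed an LSTM, a convolution with max-pooling extracts features from the LSTM outputs, and a linear layer classifies. All layer sizes and the random embedding table are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class DSCNNSketch(nn.Module):
    """Sentence classifier in the spirit of DSCNN:
    pretrained embeddings -> LSTM -> convolution -> max-pooling -> classifier."""
    def __init__(self, pretrained_emb, num_classes, hidden=100, n_filters=100, k=3):
        super().__init__()
        self.emb = nn.Embedding.from_pretrained(pretrained_emb, freeze=False)
        self.lstm = nn.LSTM(pretrained_emb.size(1), hidden, batch_first=True)
        self.conv = nn.Conv1d(hidden, n_filters, kernel_size=k, padding=k // 2)
        self.fc = nn.Linear(n_filters, num_classes)

    def forward(self, token_ids):                      # (batch, seq_len)
        h, _ = self.lstm(self.emb(token_ids))          # (batch, seq_len, hidden)
        c = torch.relu(self.conv(h.transpose(1, 2)))   # convolution over time
        pooled = c.max(dim=2).values                   # max over positions
        return self.fc(pooled)

# toy usage with a random stand-in for a pretrained embedding table
emb = torch.randn(5000, 300)
model = DSCNNSketch(emb, num_classes=2)
logits = model(torch.randint(0, 5000, (8, 20)))
```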

Averaged-DQN: Variance Reduction and Stabilization for Deep Reinforcement Learning

Title Averaged-DQN: Variance Reduction and Stabilization for Deep Reinforcement Learning
Authors Oron Anschel, Nir Baram, Nahum Shimkin
Abstract Instability and variability of Deep Reinforcement Learning (DRL) algorithms tend to adversely affect their performance. Averaged-DQN is a simple extension to the DQN algorithm, based on averaging previously learned Q-value estimates, which leads to a more stable training procedure and improved performance by reducing approximation error variance in the target values. To understand the effect of the algorithm, we examine the source of value function estimation errors and provide an analytical comparison within a simplified model. We further present experiments on the Arcade Learning Environment benchmark that demonstrate significantly improved stability and performance due to the proposed extension.
Tasks Atari Games
Published 2016-11-07
URL http://arxiv.org/abs/1611.01929v4
PDF http://arxiv.org/pdf/1611.01929v4.pdf
PWC https://paperswithcode.com/paper/averaged-dqn-variance-reduction-and
Repo https://github.com/qlan3/Explorer
Framework pytorch
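The core idea fits in a few lines: keep the K most recently learned Q-networks and use the average of their predictions as the bootstrap target. The sketch below assumes a standard DQN training loop elsewhere; `k`, `gamma`, and the helper names are illustrative.

```python
import copy
from collections import deque
import torch

class AveragedTarget:
    """Keeps the K most recent Q-networks and averages their outputs to form
    the DQN target. A sketch; replay buffer and exploration are omitted."""
    def __init__(self, k=10):
        self.snapshots = deque(maxlen=k)

    def push(self, q_network):
        # store a frozen copy of the current Q-network
        self.snapshots.append(copy.deepcopy(q_network).eval())

    @torch.no_grad()
    def q_values(self, states):
        # average Q-value estimates over the stored snapshots
        return torch.stack([net(states) for net in self.snapshots]).mean(dim=0)

def td_target(avg_target, rewards, next_states, dones, gamma=0.99):
    next_q = avg_target.q_values(next_states).max(dim=1).values
    return rewards + gamma * (1.0 - dones) * next_q

# toy usage: 4-dim states, 2 actions, a linear Q-network as a stand-in
q_net = torch.nn.Linear(4, 2)
target = AveragedTarget(k=3)
target.push(q_net)
y = td_target(target, torch.zeros(5), torch.randn(5, 4), torch.zeros(5))
```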

Language Models with Pre-Trained (GloVe) Word Embeddings

Title Language Models with Pre-Trained (GloVe) Word Embeddings
Authors Victor Makarenkov, Bracha Shapira, Lior Rokach
Abstract In this work we implement the training of a Language Model (LM) using a Recurrent Neural Network (RNN) and GloVe word embeddings, introduced by Pennington et al. in [1]. The implementation follows the general idea of training RNNs for LM tasks presented in [2], but uses a Gated Recurrent Unit (GRU) [3] as the memory cell rather than the more commonly used LSTM [4].
Tasks Language Modelling, Word Embeddings
Published 2016-10-12
URL http://arxiv.org/abs/1610.03759v2
PDF http://arxiv.org/pdf/1610.03759v2.pdf
PWC https://paperswithcode.com/paper/language-models-with-pre-trained-glove-word
Repo https://github.com/vicmak/ProofSeer
Framework none
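A minimal PyTorch sketch of the described setup: frozen pretrained GloVe vectors feed a GRU, and a linear layer produces next-word logits. The random embedding table and all dimensions below are placeholders for illustration, not the paper's settings.

```python
import torch
import torch.nn as nn

class GloveGRULM(nn.Module):
    """Word-level language model sketch: frozen pretrained (e.g. GloVe)
    embeddings -> GRU -> softmax over the vocabulary."""
    def __init__(self, glove_matrix, hidden=256):
        super().__init__()
        vocab, dim = glove_matrix.shape
        self.emb = nn.Embedding.from_pretrained(glove_matrix, freeze=True)
        self.gru = nn.GRU(dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, token_ids, state=None):           # (batch, seq_len)
        h, state = self.gru(self.emb(token_ids), state)
        return self.out(h), state                       # next-word logits

# next-word cross-entropy on a toy batch with a random stand-in embedding table
glove = torch.randn(10000, 300)
lm = GloveGRULM(glove)
x = torch.randint(0, 10000, (4, 12))
logits, _ = lm(x[:, :-1])
loss = nn.functional.cross_entropy(logits.reshape(-1, 10000), x[:, 1:].reshape(-1))
```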

Preconditioning Kernel Matrices

Title Preconditioning Kernel Matrices
Authors Kurt Cutajar, Michael A. Osborne, John P. Cunningham, Maurizio Filippone
Abstract The computational and storage complexity of kernel machines presents the primary barrier to their scaling to large, modern, datasets. A common way to tackle the scalability issue is to use the conjugate gradient algorithm, which relieves the constraints on both storage (the kernel matrix need not be stored) and computation (both stochastic gradients and parallelization can be used). Even so, conjugate gradient is not without its own issues: the conditioning of kernel matrices is often such that conjugate gradients will have poor convergence in practice. Preconditioning is a common approach to alleviating this issue. Here we propose preconditioned conjugate gradients for kernel machines, and develop a broad range of preconditioners particularly useful for kernel matrices. We describe a scalable approach to both solving kernel machines and learning their hyperparameters. We show this approach is exact in the limit of iterations and outperforms state-of-the-art approximations for a given computational budget.
Tasks
Published 2016-02-22
URL http://arxiv.org/abs/1602.06693v2
PDF http://arxiv.org/pdf/1602.06693v2.pdf
PWC https://paperswithcode.com/paper/preconditioning-kernel-matrices
Repo https://github.com/shrutimoy10/Preconditioning-Kernel-Matrices
Framework none
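In code, the approach amounts to solving the regularized kernel system with preconditioned conjugate gradients. The SciPy sketch below uses a simple Jacobi (diagonal) preconditioner as a stand-in for the richer preconditioners developed in the paper; the RBF kernel, noise level, and data are illustrative.

```python
import numpy as np
from scipy.sparse.linalg import cg, LinearOperator

def rbf_kernel(X, lengthscale=1.0):
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)

# Solve (K + sigma^2 I) alpha = y with conjugate gradients.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = np.sin(X).sum(axis=1)
sigma2 = 1e-2
A = rbf_kernel(X) + sigma2 * np.eye(len(X))

# Jacobi preconditioner: multiply by the inverse diagonal of A.
diag_inv = 1.0 / np.diag(A)
M = LinearOperator(A.shape, matvec=lambda v: diag_inv * v)

alpha, info = cg(A, y, M=M)   # info == 0 indicates convergence
```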

Learning without Forgetting

Title Learning without Forgetting
Authors Zhizhong Li, Derek Hoiem
Abstract When building a unified vision system or gradually adding new capabilities to a system, the usual assumption is that training data for all tasks is always available. However, as the number of tasks grows, storing and retraining on such data becomes infeasible. A new problem arises where we add new capabilities to a Convolutional Neural Network (CNN), but the training data for its existing capabilities are unavailable. We propose our Learning without Forgetting method, which uses only new task data to train the network while preserving the original capabilities. Our method performs favorably compared to commonly used feature extraction and fine-tuning adaptation techniques and performs similarly to multitask learning that uses original task data we assume unavailable. A more surprising observation is that Learning without Forgetting may be able to replace fine-tuning with similar old and new task datasets for improved new task performance.
Tasks
Published 2016-06-29
URL http://arxiv.org/abs/1606.09282v3
PDF http://arxiv.org/pdf/1606.09282v3.pdf
PWC https://paperswithcode.com/paper/learning-without-forgetting
Repo https://github.com/wannabeOG/ExpertNet-Pytorch
Framework pytorch
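The training objective can be sketched as new-task cross-entropy plus a distillation term that keeps the old-task head close to the responses recorded before training begins. The temperature `T` and weight `lam` below are illustrative hyperparameters, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def lwf_loss(new_logits, new_labels, old_logits, old_logits_recorded, T=2.0, lam=1.0):
    """Learning-without-Forgetting style objective (a sketch): cross-entropy on
    the new task plus a knowledge-distillation term on the old-task head."""
    ce = F.cross_entropy(new_logits, new_labels)
    # distillation with temperature T against the pre-training responses
    distill = F.kl_div(
        F.log_softmax(old_logits / T, dim=1),
        F.softmax(old_logits_recorded / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    return ce + lam * distill
```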

Efficient Diffusion on Region Manifolds: Recovering Small Objects with Compact CNN Representations

Title Efficient Diffusion on Region Manifolds: Recovering Small Objects with Compact CNN Representations
Authors Ahmet Iscen, Giorgos Tolias, Yannis Avrithis, Teddy Furon, Ondrej Chum
Abstract Query expansion is a popular method to improve the quality of image retrieval with both conventional and CNN representations. It has so far been limited to global image similarity. This work focuses on diffusion, a mechanism that captures the image manifold in the feature space. The diffusion is carried out on descriptors of overlapping image regions rather than on a global image descriptor as in previous approaches. An efficient off-line stage allows optional reduction in the number of stored regions. In the on-line stage, the proposed handling of queries unseen at indexing time removes the need for additional computation to adjust the precomputed data. We perform diffusion through a sparse linear system solver, yielding practical query times well below one second. Experimentally, we observe a significant boost in performance of image retrieval with compact CNN descriptors on standard benchmarks, especially when the query object covers only a small part of the image. Small objects have been a common failure case of CNN-based retrieval.
Tasks Image Retrieval
Published 2016-11-16
URL https://arxiv.org/abs/1611.05113v3
PDF https://arxiv.org/pdf/1611.05113v3.pdf
PWC https://paperswithcode.com/paper/efficient-diffusion-on-region-manifolds
Repo https://github.com/ahmetius/diffusion-retrieval
Framework none
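The diffusion itself reduces to one sparse linear system per query, solved with conjugate gradients. A hedged sketch follows, assuming a precomputed sparse affinity matrix over region descriptors; the truncation, regional pooling, and re-ranking details of the full pipeline are omitted.

```python
import numpy as np
from scipy.sparse import csr_matrix, identity, diags
from scipy.sparse.linalg import cg

def diffusion_scores(W, query_idx, alpha=0.99):
    """Diffusion on an affinity graph as a linear system: solve
    (I - alpha * S) f = y with conjugate gradients, where S is the
    symmetrically normalized affinity matrix and y marks the query regions."""
    d = np.asarray(W.sum(axis=1)).ravel()
    D_inv_sqrt = diags(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    S = D_inv_sqrt @ W @ D_inv_sqrt
    y = np.zeros(W.shape[0])
    y[query_idx] = 1.0
    A = identity(W.shape[0], format="csr") - alpha * S
    f, info = cg(A, y)
    return f   # ranking scores; higher means closer on the manifold

# toy usage on a 4-node graph
W = csr_matrix(np.array([[0, 1, 1, 0],
                         [1, 0, 1, 0],
                         [1, 1, 0, 1],
                         [0, 0, 1, 0]], dtype=float))
print(diffusion_scores(W, query_idx=[0]))
```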

A Subsequence Interleaving Model for Sequential Pattern Mining

Title A Subsequence Interleaving Model for Sequential Pattern Mining
Authors Jaroslav Fowkes, Charles Sutton
Abstract Recent sequential pattern mining methods have used the minimum description length (MDL) principle to define an encoding scheme which describes an algorithm for mining the most compressing patterns in a database. We present a novel subsequence interleaving model based on a probabilistic model of the sequence database, which allows us to search for the most compressing set of patterns without designing a specific encoding scheme. Our proposed algorithm is able to efficiently mine the most relevant sequential patterns and rank them using an associated measure of interestingness. The efficient inference in our model is a direct result of our use of a structural expectation-maximization framework, in which the expectation-step takes the form of a submodular optimization problem subject to a coverage constraint. We show on both synthetic and real world datasets that our model mines a set of sequential patterns with low spuriousness and redundancy, high interpretability and usefulness in real-world applications. Furthermore, we demonstrate that the quality of the patterns from our approach is comparable to, if not better than, existing state of the art sequential pattern mining algorithms.
Tasks Sequential Pattern Mining
Published 2016-02-16
URL http://arxiv.org/abs/1602.05012v2
PDF http://arxiv.org/pdf/1602.05012v2.pdf
PWC https://paperswithcode.com/paper/a-subsequence-interleaving-model-for
Repo https://github.com/mast-group/sequence-mining
Framework none

Diet Networks: Thin Parameters for Fat Genomics

Title Diet Networks: Thin Parameters for Fat Genomics
Authors Adriana Romero, Pierre Luc Carrier, Akram Erraqabi, Tristan Sylvain, Alex Auvolat, Etienne Dejoie, Marc-André Legault, Marie-Pierre Dubé, Julie G. Hussin, Yoshua Bengio
Abstract Learning tasks such as those involving genomic data often pose a serious challenge: the number of input features can be orders of magnitude larger than the number of training examples, making it difficult to avoid overfitting, even when using known regularization techniques. We focus here on tasks in which the input is a description of the genetic variation specific to a patient, the single nucleotide polymorphisms (SNPs), yielding millions of ternary inputs. Improving the ability of deep learning to handle such datasets could have an important impact in precision medicine, where high-dimensional data regarding a particular patient is used to make predictions of interest. Even though the amount of data for such tasks is increasing, this mismatch between the number of examples and the number of inputs remains a concern. Naive implementations of classifier neural networks involve a huge number of free parameters in their first layer: each input feature is associated with as many parameters as there are hidden units. We propose a novel neural network parametrization which considerably reduces the number of free parameters. It is based on the idea that we can first learn or provide a distributed representation for each input feature (e.g. for each position in the genome where variations are observed), and then learn (with another neural network called the parameter prediction network) how to map a feature’s distributed representation to the vector of parameters specific to that feature in the classifier neural network (the weights which link the value of the feature to each of the hidden units). We show experimentally on a population stratification task of interest to medical studies that the proposed approach can significantly reduce both the number of parameters and the error rate of the classifier.
Tasks
Published 2016-11-28
URL http://arxiv.org/abs/1611.09340v3
PDF http://arxiv.org/pdf/1611.09340v3.pdf
PWC https://paperswithcode.com/paper/diet-networks-thin-parameters-for-fat
Repo https://github.com/ze1gades/diary
Framework none
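The parameter-prediction idea can be shown in a small PyTorch sketch: an auxiliary network maps each feature's embedding to that feature's column of first-layer weights, so the number of free parameters no longer scales with the number of SNPs. The embedding source, layer sizes, and class count below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DietNetSketch(nn.Module):
    """Diet Networks idea in miniature: instead of learning a huge
    (n_features x n_hidden) first-layer weight matrix directly, a small
    auxiliary network maps each feature's embedding to its weight column.
    feature_emb is a precomputed (n_features, emb_dim) feature representation."""
    def __init__(self, feature_emb, n_hidden=100, n_classes=26):
        super().__init__()
        self.feature_emb = feature_emb
        self.param_predictor = nn.Sequential(        # embedding -> weight column
            nn.Linear(feature_emb.size(1), n_hidden), nn.ReLU(),
            nn.Linear(n_hidden, n_hidden),
        )
        self.classifier = nn.Linear(n_hidden, n_classes)

    def forward(self, x):                            # x: (batch, n_features)
        W = self.param_predictor(self.feature_emb)   # (n_features, n_hidden)
        h = torch.relu(x @ W)                        # predicted first layer
        return self.classifier(h)

# toy usage: 100k ternary features with 64-dim feature embeddings
emb = torch.randn(100_000, 64)
model = DietNetSketch(emb)
logits = model(torch.randint(0, 3, (8, 100_000)).float())
```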

Deep Motif: Visualizing Genomic Sequence Classifications

Title Deep Motif: Visualizing Genomic Sequence Classifications
Authors Jack Lanchantin, Ritambhara Singh, Zeming Lin, Yanjun Qi
Abstract This paper applies a deep convolutional/highway MLP framework to classify genomic sequences on the transcription factor binding site task. To make the model understandable, we propose an optimization-driven strategy to extract “motifs”, or symbolic patterns which visualize the positive class learned by the network. We show that our system, Deep Motif (DeMo), extracts motifs that are similar to, and in some cases outperform, currently well-known motifs. In addition, we find that a deeper model consisting of multiple convolutional and highway layers can outperform the single convolutional and fully connected layer used in the previous state-of-the-art.
Tasks
Published 2016-05-04
URL http://arxiv.org/abs/1605.01133v2
PDF http://arxiv.org/pdf/1605.01133v2.pdf
PWC https://paperswithcode.com/paper/deep-motif-visualizing-genomic-sequence
Repo https://github.com/bakirillov/deepmotif4pytorch
Framework pytorch
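The optimization-driven motif extraction can be approximated by gradient ascent on a soft one-hot input that maximizes the positive-class score. This sketch assumes a generic sequence classifier interface; the step count, learning rate, and softmax relaxation are illustrative choices rather than the paper's exact procedure.

```python
import torch

def extract_motif(model, seq_len=20, alphabet=4, steps=200, lr=0.1):
    """Start from a uniform soft one-hot sequence and run gradient ascent on
    the positive-class score; the argmax per position gives a symbolic motif.
    `model` takes a (1, seq_len, alphabet) tensor and returns class logits."""
    x = torch.full((1, seq_len, alphabet), 1.0 / alphabet, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        score = model(torch.softmax(x, dim=-1))[0, 1]   # positive-class logit
        (-score).backward()                             # ascend the class score
        opt.step()
    return torch.softmax(x, dim=-1).argmax(dim=-1)      # indices into A/C/G/T

# toy usage with a stand-in linear classifier over the flattened sequence
toy_model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(20 * 4, 2))
motif = extract_motif(toy_model)
```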

Visualization Regularizers for Neural Network based Image Recognition

Title Visualization Regularizers for Neural Network based Image Recognition
Authors Biswajit Paria, Vikas Reddy, Anirban Santara, Pabitra Mitra
Abstract The success of deep neural networks is mostly due to their ability to learn meaningful features from the data. Features learned in the hidden layers of deep neural networks trained on computer vision tasks have been shown to be similar to mid-level vision features. We leverage this fact in this work and propose the visualization regularizer for image tasks. The proposed regularization technique enforces smoothness of the features learned by hidden nodes and turns out to be a special case of Tikhonov regularization. We achieve higher classification accuracy compared to existing regularizers such as the L2 norm regularizer and dropout, on benchmark datasets, without changing the training computational complexity.
Tasks
Published 2016-04-10
URL http://arxiv.org/abs/1604.02646v3
PDF http://arxiv.org/pdf/1604.02646v3.pdf
PWC https://paperswithcode.com/paper/visualization-regularizers-for-neural-network
Repo https://github.com/biswajitsc/VisRegDL
Framework none
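A hedged sketch of the kind of smoothness penalty the abstract describes: view each hidden unit's incoming weights as an image and penalize squared differences between neighbouring pixels, then add the term to the task loss. The exact formulation in the paper may differ; the image shape and weighting coefficient are assumptions.

```python
import torch

def smoothness_penalty(first_layer_weights, img_shape=(28, 28)):
    """Tikhonov-style smoothness term: sum of squared differences between
    neighbouring pixels of each hidden unit's weight image."""
    W = first_layer_weights.view(-1, *img_shape)      # (n_hidden, H, W)
    dh = (W[:, 1:, :] - W[:, :-1, :]).pow(2).sum()    # vertical neighbours
    dw = (W[:, :, 1:] - W[:, :, :-1]).pow(2).sum()    # horizontal neighbours
    return dh + dw

# usage: add to the task loss with a small coefficient, e.g.
# loss = cross_entropy + 1e-4 * smoothness_penalty(model.fc1.weight)
```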

Evaluating Informal-Domain Word Representations With UrbanDictionary

Title Evaluating Informal-Domain Word Representations With UrbanDictionary
Authors Naomi Saphra, Adam Lopez
Abstract Existing corpora for intrinsic evaluation are not targeted towards tasks in informal domains such as Twitter or news comment forums. We want to test whether a representation of informal words fulfills the promise of eliding explicit text normalization as a preprocessing step. One possible evaluation metric for such domains is the proximity of spelling variants. We propose how such a metric might be computed and how a spelling variant dataset can be collected using UrbanDictionary.
Tasks
Published 2016-06-27
URL http://arxiv.org/abs/1606.08270v1
PDF http://arxiv.org/pdf/1606.08270v1.pdf
PWC https://paperswithcode.com/paper/evaluating-informal-domain-word
Repo https://github.com/nsaphra/urbandic-scraper
Framework none
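The proposed evaluation can be sketched as comparing embedding similarity of spelling-variant pairs against random word pairs. The function below assumes a dict of word vectors and a list of variant pairs such as would come from the UrbanDictionary scrape; names and setup are illustrative.

```python
import numpy as np

def variant_proximity(embeddings, variant_pairs, rng=np.random.default_rng(0)):
    """Mean cosine similarity of spelling-variant pairs vs. random word pairs.
    embeddings: dict mapping word -> vector; variant_pairs: list of (u, v)."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    words = list(embeddings)
    variant_sims = [cos(embeddings[u], embeddings[v])
                    for u, v in variant_pairs
                    if u in embeddings and v in embeddings]
    random_sims = [cos(embeddings[rng.choice(words)], embeddings[rng.choice(words)])
                   for _ in variant_sims]
    return np.mean(variant_sims), np.mean(random_sims)

# toy usage with made-up vectors
emb = {"tomorrow": np.ones(3), "tmrw": np.array([1.0, 0.9, 1.1]),
       "cat": np.array([0.1, -2.0, 0.5])}
print(variant_proximity(emb, [("tomorrow", "tmrw")]))
```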

DOLDA - a regularized supervised topic model for high-dimensional multi-class regression

Title DOLDA - a regularized supervised topic model for high-dimensional multi-class regression
Authors Måns Magnusson, Leif Jonsson, Mattias Villani
Abstract Generating user interpretable multi-class predictions in data rich environments with many classes and explanatory covariates is a daunting task. We introduce Diagonal Orthant Latent Dirichlet Allocation (DOLDA), a supervised topic model for multi-class classification that can handle both many classes as well as many covariates. To handle many classes we use the recently proposed Diagonal Orthant (DO) probit model (Johndrow et al., 2013) together with an efficient Horseshoe prior for variable selection/shrinkage (Carvalho et al., 2010). We propose a computationally efficient parallel Gibbs sampler for the new model. An important advantage of DOLDA is that learned topics are directly connected to individual classes without the need for a reference class. We evaluate the model’s predictive accuracy on two datasets and demonstrate DOLDA’s advantage in interpreting the generated predictions.
Tasks
Published 2016-01-31
URL http://arxiv.org/abs/1602.00260v2
PDF http://arxiv.org/pdf/1602.00260v2.pdf
PWC https://paperswithcode.com/paper/dolda-a-regularized-supervised-topic-model
Repo https://github.com/lejon/DiagonalOrthantLDA
Framework none

Greedy, Joint Syntactic-Semantic Parsing with Stack LSTMs

Title Greedy, Joint Syntactic-Semantic Parsing with Stack LSTMs
Authors Swabha Swayamdipta, Miguel Ballesteros, Chris Dyer, Noah A. Smith
Abstract We present a transition-based parser that jointly produces syntactic and semantic dependencies. It learns a representation of the entire algorithm state, using stack long short-term memories. Our greedy inference algorithm has linear time, including feature extraction. On the CoNLL 2008–9 English shared tasks, we obtain the best published parsing performance among models that jointly learn syntax and semantics.
Tasks Semantic Parsing
Published 2016-06-29
URL http://arxiv.org/abs/1606.08954v2
PDF http://arxiv.org/pdf/1606.08954v2.pdf
PWC https://paperswithcode.com/paper/greedy-joint-syntactic-semantic-parsing-with
Repo https://github.com/clab/joint-lstm-parser
Framework none