July 29, 2019

2964 words 14 mins read

Paper Group AWR 172

Billion-scale similarity search with GPUs. Finite Sample Guarantees for PCA in Non-Isotropic and Data-Dependent Noise. Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. CausalGAN: Learning Causal Implicit Generative Models with Adversarial Training. Orthogonal Recurrent Neural Networks with Scaled Cayley Transform. Gr …

Billion-scale similarity search with GPUs

Title Billion-scale similarity search with GPUs
Authors Jeff Johnson, Matthijs Douze, Hervé Jégou
Abstract Similarity search finds application in specialized database systems handling complex data such as images or videos, which are typically represented by high-dimensional features and require specific indexing structures. This paper tackles the problem of better utilizing GPUs for this task. While GPUs excel at data-parallel tasks, prior approaches are bottlenecked by algorithms that expose less parallelism, such as k-min selection, or make poor use of the memory hierarchy. We propose a design for k-selection that operates at up to 55% of theoretical peak performance, enabling a nearest neighbor implementation that is 8.5x faster than the prior GPU state of the art. We apply it in different similarity search scenarios by proposing optimized designs for brute-force, approximate, and compressed-domain search based on product quantization. In all these setups, we outperform the state of the art by large margins. Our implementation enables the construction of a high-accuracy k-NN graph on 95 million images from the Yfcc100M dataset in 35 minutes, and of a graph connecting 1 billion vectors in less than 12 hours on 4 Maxwell Titan X GPUs. We have open-sourced our approach for the sake of comparison and reproducibility.
Tasks Image Similarity Search, Quantization
Published 2017-02-28
URL http://arxiv.org/abs/1702.08734v1
PDF http://arxiv.org/pdf/1702.08734v1.pdf
PWC https://paperswithcode.com/paper/billion-scale-similarity-search-with-gpus
Repo https://github.com/CoderINusE/unbounded-cache-lm
Framework pytorch
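
A minimal sketch of the brute-force and compressed-domain (IVF+PQ) search flow using the Faiss library that accompanies the paper; dataset sizes and index parameters below are illustrative, and the GPU transfer is only attempted when a GPU build of Faiss is installed:

```python
import numpy as np
import faiss

d, nb, nq, k = 64, 100_000, 1_000, 10
xb = np.random.rand(nb, d).astype("float32")   # database vectors
xq = np.random.rand(nq, d).astype("float32")   # query vectors

# Exact (brute-force) k-NN search; move the index to GPU 0 if a GPU build is present.
index = faiss.IndexFlatL2(d)
if hasattr(faiss, "StandardGpuResources"):
    index = faiss.index_cpu_to_gpu(faiss.StandardGpuResources(), 0, index)
index.add(xb)
D, I = index.search(xq, k)                     # (nq, k) distances and neighbor ids

# Compressed-domain search: inverted file + product quantization.
quantizer = faiss.IndexFlatL2(d)
ivfpq = faiss.IndexIVFPQ(quantizer, d, 256, 8, 8)  # 256 lists, 8 sub-quantizers, 8 bits
ivfpq.train(xb)
ivfpq.add(xb)
ivfpq.nprobe = 16                              # inverted lists visited per query
D2, I2 = ivfpq.search(xq, k)
```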

Finite Sample Guarantees for PCA in Non-Isotropic and Data-Dependent Noise

Title Finite Sample Guarantees for PCA in Non-Isotropic and Data-Dependent Noise
Authors Namrata Vaswani, Praneeth Narayanamurthy
Abstract This work obtains novel finite sample guarantees for Principal Component Analysis (PCA). These hold even when the corrupting noise is non-isotropic, and a part (or all of it) is data-dependent. Because of the latter, in general, the noise and the true data are correlated. The results in this work are a significant improvement over those given in our earlier work where this “correlated-PCA” problem was first studied. In fact, in certain regimes, our results imply that the sample complexity required to achieve subspace recovery error that is a constant fraction of the noise level is near-optimal. Useful corollaries of our result include guarantees for PCA in sparse data-dependent noise and for PCA with missing data. An important application of the former is in proving correctness of the subspace update step of a popular online algorithm for dynamic robust PCA.
Tasks
Published 2017-09-19
URL http://arxiv.org/abs/1709.06255v1
PDF http://arxiv.org/pdf/1709.06255v1.pdf
PWC https://paperswithcode.com/paper/finite-sample-guarantees-for-pca-in-non
Repo https://github.com/praneethmurthy/correlated-pca
Framework none
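
A small numpy illustration of the setting (not the paper's analysis): data lying in a low-dimensional subspace, corrupted by sparse data-dependent noise, PCA via the SVD, and the subspace recovery error. All dimensions and noise levels are made up for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
n, r, alpha = 100, 5, 400                          # ambient dim, subspace dim, #samples

P = np.linalg.qr(rng.standard_normal((n, r)))[0]   # true subspace basis
a = rng.standard_normal((r, alpha))                # subspace coefficients
L = P @ a                                          # true low-rank data

# Data-dependent sparse noise: each column's noise is a sparse function of the data itself.
mask = rng.random((n, alpha)) < 0.05
W = 0.3 * mask * L
Y = L + W                                          # observed data

# PCA estimate: top-r left singular vectors of the observed data.
P_hat = np.linalg.svd(Y, full_matrices=False)[0][:, :r]

# Subspace recovery error: spectral norm of (I - P P^T) P_hat.
err = np.linalg.norm((np.eye(n) - P @ P.T) @ P_hat, 2)
print(f"subspace recovery error: {err:.3f}")
```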

Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks

Title Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks
Authors Jun-Yan Zhu, Taesung Park, Phillip Isola, Alexei A. Efros
Abstract Image-to-image translation is a class of vision and graphics problems where the goal is to learn the mapping between an input image and an output image using a training set of aligned image pairs. However, for many tasks, paired training data will not be available. We present an approach for learning to translate an image from a source domain $X$ to a target domain $Y$ in the absence of paired examples. Our goal is to learn a mapping $G: X \rightarrow Y$ such that the distribution of images from $G(X)$ is indistinguishable from the distribution $Y$ using an adversarial loss. Because this mapping is highly under-constrained, we couple it with an inverse mapping $F: Y \rightarrow X$ and introduce a cycle consistency loss to push $F(G(X)) \approx X$ (and vice versa). Qualitative results are presented on several tasks where paired training data does not exist, including collection style transfer, object transfiguration, season transfer, photo enhancement, etc. Quantitative comparisons against several prior methods demonstrate the superiority of our approach.
Tasks Image-to-Image Translation, Multimodal Unsupervised Image-To-Image Translation, Style Transfer, Unsupervised Image-To-Image Translation
Published 2017-03-30
URL http://arxiv.org/abs/1703.10593v6
PDF http://arxiv.org/pdf/1703.10593v6.pdf
PWC https://paperswithcode.com/paper/unpaired-image-to-image-translation-using
Repo https://github.com/bareeka/ganproject
Framework tf
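
A minimal PyTorch sketch of the cycle-consistency term; G and F stand for any pair of image-to-image generators, and the identity networks in the usage snippet are placeholders:

```python
import torch
import torch.nn as nn

def cycle_consistency_loss(G, F, real_x, real_y, lam=10.0):
    """L1 cycle loss ||F(G(x)) - x||_1 + ||G(F(y)) - y||_1, weighted by lam."""
    l1 = nn.L1Loss()
    forward_cycle = l1(F(G(real_x)), real_x)    # X -> Y -> X
    backward_cycle = l1(G(F(real_y)), real_y)   # Y -> X -> Y
    return lam * (forward_cycle + backward_cycle)

# Toy usage with identity "generators" just to show the call shape.
G = F = nn.Identity()
x = torch.randn(4, 3, 64, 64)
y = torch.randn(4, 3, 64, 64)
print(cycle_consistency_loss(G, F, x, y))
```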

CausalGAN: Learning Causal Implicit Generative Models with Adversarial Training

Title CausalGAN: Learning Causal Implicit Generative Models with Adversarial Training
Authors Murat Kocaoglu, Christopher Snyder, Alexandros G. Dimakis, Sriram Vishwanath
Abstract We propose an adversarial training procedure for learning a causal implicit generative model for a given causal graph. We show that adversarial training can be used to learn a generative model with true observational and interventional distributions if the generator architecture is consistent with the given causal graph. We consider the application of generating faces based on given binary labels where the dependency structure between the labels is preserved with a causal graph. This problem can be seen as learning a causal implicit generative model for the image and labels. We devise a two-stage procedure for this problem. First we train a causal implicit generative model over binary labels using a neural network consistent with a causal graph as the generator. We empirically show that WassersteinGAN can be used to output discrete labels. Later, we propose two new conditional GAN architectures, which we call CausalGAN and CausalBEGAN. We show that the optimal generator of the CausalGAN, given the labels, samples from the image distributions conditioned on these labels. The conditional GAN combined with a trained causal implicit generative model for the labels is then a causal implicit generative model over the labels and the generated image. We show that the proposed architectures can be used to sample from observational and interventional image distributions, even for interventions which do not naturally occur in the dataset.
Tasks Face Generation
Published 2017-09-06
URL http://arxiv.org/abs/1709.02023v2
PDF http://arxiv.org/pdf/1709.02023v2.pdf
PWC https://paperswithcode.com/paper/causalgan-learning-causal-implicit-generative
Repo https://github.com/mkocaoglu/CausalGAN
Framework tf
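
A simplified PyTorch sketch of the first stage only: a generator over binary labels whose structure follows a causal DAG, so each label is produced from exogenous noise plus its parents' samples, and interventions simply clamp a node. The graph, layer sizes, and the soft sigmoid outputs (instead of the discrete labels produced by the paper's WGAN training) are illustrative assumptions:

```python
import torch
import torch.nn as nn

CAUSAL_GRAPH = {"Male": [], "Mustache": ["Male"], "Smiling": []}  # child -> parents

class CausalLabelGenerator(nn.Module):
    def __init__(self, graph, noise_dim=10):
        super().__init__()
        self.graph, self.noise_dim = graph, noise_dim
        self.nets = nn.ModuleDict({
            v: nn.Sequential(nn.Linear(noise_dim + len(parents), 16),
                             nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())
            for v, parents in graph.items()
        })

    def forward(self, batch_size, interventions=None):
        interventions = interventions or {}
        values = {}
        for v in self._topological_order():
            if v in interventions:                 # do(v := value): clamp the node
                values[v] = torch.full((batch_size, 1), float(interventions[v]))
                continue
            z = torch.randn(batch_size, self.noise_dim)
            parents = [values[p] for p in self.graph[v]]
            values[v] = self.nets[v](torch.cat([z] + parents, dim=1))
        return values

    def _topological_order(self):
        order, seen = [], set()
        def visit(v):
            if v in seen:
                return
            seen.add(v)
            for p in self.graph[v]:
                visit(p)
            order.append(v)
        for v in self.graph:
            visit(v)
        return order

labels = CausalLabelGenerator(CAUSAL_GRAPH)(batch_size=8, interventions={"Mustache": 1})
```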

Orthogonal Recurrent Neural Networks with Scaled Cayley Transform

Title Orthogonal Recurrent Neural Networks with Scaled Cayley Transform
Authors Kyle Helfrich, Devin Willmott, Qiang Ye
Abstract Recurrent Neural Networks (RNNs) are designed to handle sequential data but suffer from vanishing or exploding gradients. Recent work on Unitary Recurrent Neural Networks (uRNNs) has addressed this issue and, in some cases, exceeded the capabilities of Long Short-Term Memory networks (LSTMs). We propose a simpler and novel update scheme to maintain orthogonal recurrent weight matrices without using complex-valued matrices. This is done by parametrizing with a skew-symmetric matrix using the Cayley transform. Such a parametrization is unable to represent matrices with negative one eigenvalues, but this limitation is overcome by scaling the recurrent weight matrix by a diagonal matrix consisting of ones and negative ones. The proposed training scheme involves a straightforward gradient calculation and update step. In several experiments, the proposed scaled Cayley orthogonal recurrent neural network (scoRNN) achieves superior results with fewer trainable parameters than other unitary RNNs.
Tasks
Published 2017-07-29
URL http://arxiv.org/abs/1707.09520v3
PDF http://arxiv.org/pdf/1707.09520v3.pdf
PWC https://paperswithcode.com/paper/orthogonal-recurrent-neural-networks-with
Repo https://github.com/SpartinStuff/scoRNN
Framework tf
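
A numpy sketch of the scaled Cayley parametrization W = (I + A)^{-1}(I - A)D: A is skew-symmetric (the trainable part in the paper) and D is a fixed diagonal of ±1 entries; here the number of −1 entries is chosen at random rather than tuned as a hyperparameter:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6

T = rng.standard_normal((n, n))
A = T - T.T                                    # skew-symmetric: A^T = -A
D = np.diag(np.sign(rng.standard_normal(n)))   # diagonal of +1/-1 entries

# Scaled Cayley transform: W = (I + A)^{-1} (I - A) D is orthogonal.
W = np.linalg.solve(np.eye(n) + A, (np.eye(n) - A) @ D)

print(np.allclose(W.T @ W, np.eye(n)))         # True: W^T W = I
```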

Group Invariance, Stability to Deformations, and Complexity of Deep Convolutional Representations

Title Group Invariance, Stability to Deformations, and Complexity of Deep Convolutional Representations
Authors Alberto Bietti, Julien Mairal
Abstract The success of deep convolutional architectures is often attributed in part to their ability to learn multiscale and invariant representations of natural signals. However, a precise study of these properties and how they affect learning guarantees is still missing. In this paper, we consider deep convolutional representations of signals; we study their invariance to translations and to more general groups of transformations, their stability to the action of diffeomorphisms, and their ability to preserve signal information. This analysis is carried out by introducing a multilayer kernel based on convolutional kernel networks and by studying the geometry induced by the kernel mapping. We then characterize the corresponding reproducing kernel Hilbert space (RKHS), showing that it contains a large class of convolutional neural networks with homogeneous activation functions. This analysis allows us to separate data representation from learning, and to provide a canonical measure of model complexity, the RKHS norm, which controls both stability and generalization of any learned model. In addition to models in the constructed RKHS, our stability analysis also applies to convolutional networks with generic activations such as rectified linear units, and we discuss its relationship with recent generalization bounds based on spectral norms.
Tasks
Published 2017-06-09
URL http://arxiv.org/abs/1706.03078v4
PDF http://arxiv.org/pdf/1706.03078v4.pdf
PWC https://paperswithcode.com/paper/group-invariance-stability-to-deformations
Repo https://github.com/albietz/ckn_kernel
Framework none

Learning to Skim Text

Title Learning to Skim Text
Authors Adams Wei Yu, Hongrae Lee, Quoc V. Le
Abstract Recurrent Neural Networks are showing much promise in many sub-areas of natural language processing, ranging from document classification to machine translation to automatic question answering. Despite their promise, many recurrent models have to read the whole text word by word, making it slow to handle long documents. For example, it is difficult to use a recurrent network to read a book and answer questions about it. In this paper, we present an approach to reading text that skips irrelevant information when needed. The underlying model is a recurrent network that learns how far to jump after reading a few words of the input text. We employ a standard policy gradient method to train the model to make discrete jumping decisions. In our benchmarks on four different tasks, including number prediction, sentiment analysis, news article classification and automatic Q&A, our proposed model, a modified LSTM with jumping, is up to 6 times faster than the standard sequential LSTM, while maintaining the same or even better accuracy.
Tasks Document Classification, Machine Translation, Question Answering, Sentiment Analysis
Published 2017-04-23
URL http://arxiv.org/abs/1704.06877v2
PDF http://arxiv.org/pdf/1704.06877v2.pdf
PWC https://paperswithcode.com/paper/learning-to-skim-text
Repo https://github.com/COMP6248-Reproducability-Challenge/Differentiables_FastAccurateTextClassification
Framework none
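
A toy PyTorch sketch of the jumping mechanism: after reading a fixed number of tokens, the model samples a jump size from a softmax over its hidden state, and the log-probabilities are kept for a REINFORCE-style update. All sizes and names are illustrative:

```python
import torch
import torch.nn as nn

class JumpingLSTMReader(nn.Module):
    def __init__(self, vocab, emb=32, hid=64, read_every=4, max_jump=5):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.cell = nn.LSTMCell(emb, hid)
        self.jump_head = nn.Linear(hid, max_jump + 1)   # jump size in {0, ..., max_jump}
        self.read_every = read_every

    def forward(self, tokens):                          # tokens: (seq_len,) LongTensor
        h = c = torch.zeros(1, self.cell.hidden_size)
        log_probs, i, read = [], 0, 0
        while i < len(tokens):
            h, c = self.cell(self.embed(tokens[i:i + 1]), (h, c))
            i, read = i + 1, read + 1
            if read == self.read_every:                 # decide how far to jump
                dist = torch.distributions.Categorical(logits=self.jump_head(h))
                jump = dist.sample()
                log_probs.append(dist.log_prob(jump))   # kept for the policy gradient
                i, read = i + int(jump), 0
        return h, log_probs                             # final state + policy log-probs

reader = JumpingLSTMReader(vocab=1000)
state, log_probs = reader(torch.randint(0, 1000, (50,)))
# Training would add a classifier on `state` plus a REINFORCE loss
# of the form -reward * sum(log_probs) on the sampled jumps.
```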

OpenML Benchmarking Suites

Title OpenML Benchmarking Suites
Authors Bernd Bischl, Giuseppe Casalicchio, Matthias Feurer, Frank Hutter, Michel Lang, Rafael G. Mantovani, Jan N. van Rijn, Joaquin Vanschoren
Abstract Machine learning research depends on objectively interpretable, comparable, and reproducible algorithm benchmarks. Therefore, we advocate the use of curated, comprehensive suites of machine learning tasks to standardize the setup, execution, and reporting of benchmarks. We enable this through software tools that help to create and leverage these benchmarking suites. These are seamlessly integrated into the OpenML platform, and accessible through interfaces in Python, Java, and R. OpenML benchmarking suites are (a) easy to use through standardized data formats, APIs, and client libraries; (b) machine-readable, with extensive meta-information on the included datasets; and (c) allow benchmarks to be shared and reused in future studies. We also present a first, carefully curated and practical benchmarking suite for classification: the OpenML Curated Classification benchmarking suite 2018 (OpenML-CC18).
Tasks
Published 2017-08-11
URL https://arxiv.org/abs/1708.03731v2
PDF https://arxiv.org/pdf/1708.03731v2.pdf
PWC https://paperswithcode.com/paper/openml-benchmarking-suites-and-the-openml100
Repo https://github.com/jorisvandenbossche/target-encoder-benchmarks
Framework none
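
A short sketch of pulling the OpenML-CC18 suite and running a baseline model on a few of its tasks, assuming a recent version of the openml Python client and scikit-learn:

```python
import openml
from sklearn.tree import DecisionTreeClassifier

suite = openml.study.get_suite("OpenML-CC18")        # curated classification suite
for task_id in suite.tasks[:3]:                      # a few tasks as a demo
    task = openml.tasks.get_task(task_id)
    print(task_id, task.get_dataset().name)
    run = openml.runs.run_model_on_task(DecisionTreeClassifier(), task)
    # run.publish() would upload the result back to OpenML for sharing and reuse
```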

Can Decentralized Algorithms Outperform Centralized Algorithms? A Case Study for Decentralized Parallel Stochastic Gradient Descent

Title Can Decentralized Algorithms Outperform Centralized Algorithms? A Case Study for Decentralized Parallel Stochastic Gradient Descent
Authors Xiangru Lian, Ce Zhang, Huan Zhang, Cho-Jui Hsieh, Wei Zhang, Ji Liu
Abstract Most distributed machine learning systems nowadays, including TensorFlow and CNTK, are built in a centralized fashion. One bottleneck of centralized algorithms lies in the high communication cost on the central node. Motivated by this, we ask: can decentralized algorithms be faster than their centralized counterparts? Although decentralized PSGD (D-PSGD) algorithms have been studied by the control community, existing analysis and theory do not show any advantage over centralized PSGD (C-PSGD) algorithms, simply assuming the application scenario where only the decentralized network is available. In this paper, we study a D-PSGD algorithm and provide the first theoretical analysis that indicates a regime in which decentralized algorithms might outperform centralized algorithms for distributed stochastic gradient descent. This is because D-PSGD has a total computational complexity comparable to that of C-PSGD but requires much less communication cost on the busiest node. We further conduct an empirical study to validate our theoretical analysis across multiple frameworks (CNTK and Torch), different network configurations, and computation platforms with up to 112 GPUs. On network configurations with low bandwidth or high latency, D-PSGD can be up to one order of magnitude faster than its well-optimized centralized counterparts.
Tasks
Published 2017-05-25
URL http://arxiv.org/abs/1705.09056v5
PDF http://arxiv.org/pdf/1705.09056v5.pdf
PWC https://paperswithcode.com/paper/can-decentralized-algorithms-outperform
Repo https://github.com/facebookresearch/stochastic_gradient_push
Framework pytorch
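
A toy numpy simulation of the D-PSGD pattern on a ring of workers: each worker takes a local stochastic gradient step and then averages its parameters only with its ring neighbours, so no central node is involved. The least-squares objective and all sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n_workers, dim, lr = 8, 10, 0.05
x_true = rng.standard_normal(dim)
params = [rng.standard_normal(dim) for _ in range(n_workers)]

def local_gradient(w):
    """Stochastic gradient of a toy least-squares objective."""
    a = rng.standard_normal(dim)
    return (a @ w - a @ x_true) * a

for step in range(200):
    # 1) local SGD step on each worker
    params = [w - lr * local_gradient(w) for w in params]
    # 2) gossip step: mix with the two ring neighbours (a doubly-stochastic averaging)
    params = [(params[i - 1] + params[i] + params[(i + 1) % n_workers]) / 3.0
              for i in range(n_workers)]

print("mean error:", np.mean([np.linalg.norm(w - x_true) for w in params]))
```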

dna2vec: Consistent vector representations of variable-length k-mers

Title dna2vec: Consistent vector representations of variable-length k-mers
Authors Patrick Ng
Abstract A ubiquitous representation of a long DNA sequence divides it into shorter k-mer components. Unfortunately, the straightforward encoding of a k-mer as a one-hot vector is vulnerable to the curse of dimensionality. Worse yet, all pairs of one-hot vectors are equidistant. This is particularly problematic when applying the latest machine learning algorithms to solve problems in biological sequence analysis. In this paper, we propose a novel method to train distributed representations of variable-length k-mers. Our method is based on the popular word embedding model word2vec, which is trained on a shallow two-layer neural network. Our experiments provide evidence that summing dna2vec vectors is akin to nucleotide concatenation. We also demonstrate that there is a correlation between the Needleman-Wunsch similarity score and the cosine similarity of dna2vec vectors.
Tasks
Published 2017-01-23
URL http://arxiv.org/abs/1701.06279v1
PDF http://arxiv.org/pdf/1701.06279v1.pdf
PWC https://paperswithcode.com/paper/dna2vec-consistent-vector-representations-of
Repo https://github.com/DamLabResources/journalclub
Framework none
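
A rough sketch of the idea: fragment reads into consecutive variable-length k-mers and train word2vec-style embeddings on them. It assumes the gensim 4.x Word2Vec API and synthetic reads; the real dna2vec training setup differs in detail:

```python
import random
from gensim.models import Word2Vec

random.seed(0)
reads = ["".join(random.choice("ACGT") for _ in range(200)) for _ in range(500)]

def to_kmers(seq, k_low=3, k_high=8):
    """Split a sequence into consecutive k-mers with randomly chosen lengths."""
    kmers, i = [], 0
    while i < len(seq) - k_high:
        k = random.randint(k_low, k_high)
        kmers.append(seq[i:i + k])
        i += k
    return kmers

corpus = [to_kmers(read) for read in reads]
model = Word2Vec(corpus, vector_size=100, window=5, min_count=1, sg=1)

# Cosine similarity between two k-mers of different lengths.
print(model.wv.similarity("ACG", "ACGT"))
```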

Spinal cord gray matter segmentation using deep dilated convolutions

Title Spinal cord gray matter segmentation using deep dilated convolutions
Authors Christian S. Perone, Evan Calabrese, Julien Cohen-Adad
Abstract Gray matter (GM) tissue changes have been associated with a wide range of neurological disorders and were also recently found relevant as a biomarker for disability in amyotrophic lateral sclerosis. The ability to automatically segment the GM is, therefore, an important task for modern studies of the spinal cord. In this work, we devise a modern, simple and end-to-end fully automated human spinal cord gray matter segmentation method using Deep Learning that works on both in vivo and ex vivo MRI acquisitions. We evaluate our method against six independently developed methods on a GM segmentation challenge and report state-of-the-art results in 8 out of 10 evaluation metrics, as well as a major reduction in network parameters compared to traditional medical imaging architectures such as U-Nets.
Tasks Medical Image Segmentation
Published 2017-10-02
URL http://arxiv.org/abs/1710.01269v1
PDF http://arxiv.org/pdf/1710.01269v1.pdf
PWC https://paperswithcode.com/paper/spinal-cord-gray-matter-segmentation-using
Repo https://github.com/neuropoly/multiclass-segmentation
Framework pytorch
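
A minimal PyTorch sketch of a dilated-convolution segmentation block that keeps the spatial resolution while growing the receptive field; channel counts, dilation rates, and class count are illustrative, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class DilatedBlock(nn.Module):
    """Stack of 3x3 convolutions with growing dilation, preserving spatial size."""
    def __init__(self, in_ch=1, feat=32, n_classes=2, dilations=(1, 2, 4, 8)):
        super().__init__()
        layers, ch = [], in_ch
        for d in dilations:
            layers += [nn.Conv2d(ch, feat, kernel_size=3, padding=d, dilation=d),
                       nn.BatchNorm2d(feat), nn.ReLU(inplace=True)]
            ch = feat
        self.features = nn.Sequential(*layers)
        self.classifier = nn.Conv2d(feat, n_classes, kernel_size=1)

    def forward(self, x):
        return self.classifier(self.features(x))   # per-pixel class logits

logits = DilatedBlock()(torch.randn(1, 1, 128, 128))
print(logits.shape)   # torch.Size([1, 2, 128, 128])
```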

Faster K-Means Cluster Estimation

Title Faster K-Means Cluster Estimation
Authors Siddhesh Khandelwal, Amit Awekar
Abstract There has been considerable work on improving the popular clustering algorithm k-means in terms of both mean squared error (MSE) and speed. However, most k-means variants compute the distance from each data point to each cluster centroid in every iteration. We propose a fast heuristic to overcome this bottleneck with only a marginal increase in MSE. We observe that, across all iterations of k-means, a data point changes its membership only among a small subset of clusters. Our heuristic predicts such clusters for each data point by looking at nearby clusters after the first iteration of k-means. We augment well-known variants of k-means with our heuristic to demonstrate its effectiveness. For various synthetic and real-world datasets, our heuristic achieves a speed-up of up to 3x compared to efficient variants of k-means.
Tasks
Published 2017-01-17
URL http://arxiv.org/abs/1701.04600v1
PDF http://arxiv.org/pdf/1701.04600v1.pdf
PWC https://paperswithcode.com/paper/faster-k-means-cluster-estimation
Repo https://github.com/siddheshk/Faster-Kmeans
Framework none
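
A toy numpy sketch of the heuristic: after the first full iteration, every point keeps only its t nearest centroids as candidates, and later iterations compute distances to those candidates only. Parameter values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
X, k, t, iters = rng.standard_normal((2000, 16)), 20, 5, 10
centroids = X[rng.choice(len(X), k, replace=False)]

def all_dists(X, C):
    """Squared Euclidean distances from every point to every centroid."""
    return ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)

# Iteration 1: full distance computation, also used to pick candidate clusters.
d = all_dists(X, centroids)
candidates = np.argsort(d, axis=1)[:, :t]           # t nearest centroids per point
assign = candidates[:, 0]

for _ in range(iters - 1):
    for j in range(k):                               # standard centroid update
        if np.any(assign == j):
            centroids[j] = X[assign == j].mean(axis=0)
    # Later iterations: distances only to each point's candidate clusters.
    cand_d = ((X[:, None, :] - centroids[candidates]) ** 2).sum(-1)
    assign = candidates[np.arange(len(X)), cand_d.argmin(axis=1)]

print("MSE:", ((X - centroids[assign]) ** 2).sum(-1).mean())
```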

Shake-Shake regularization

Title Shake-Shake regularization
Authors Xavier Gastaldi
Abstract The method introduced in this paper aims at helping deep learning practitioners faced with an overfit problem. The idea is to replace, in a multi-branch network, the standard summation of parallel branches with a stochastic affine combination. Applied to 3-branch residual networks, shake-shake regularization improves on the best single shot published results on CIFAR-10 and CIFAR-100 by reaching test errors of 2.86% and 15.85%. Experiments on architectures without skip connections or Batch Normalization show encouraging results and open the door to a large set of applications. Code is available at https://github.com/xgastaldi/shake-shake
Tasks
Published 2017-05-21
URL http://arxiv.org/abs/1705.07485v2
PDF http://arxiv.org/pdf/1705.07485v2.pdf
PWC https://paperswithcode.com/paper/shake-shake-regularization
Repo https://github.com/motokimura/shake_shake_chainer
Framework pytorch
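
A simplified PyTorch sketch of a shake-shake block: two residual branches mixed by a random coefficient at training time and by 0.5 at test time. The full method also draws an independent coefficient for the backward pass, which this sketch omits:

```python
import torch
import torch.nn as nn

class ShakeShakeBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        def branch():
            return nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=1), nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, channels, 3, padding=1), nn.BatchNorm2d(channels))
        self.branch1, self.branch2 = branch(), branch()

    def forward(self, x):
        b1, b2 = self.branch1(x), self.branch2(x)
        if self.training:   # per-sample random mixing coefficient during training
            alpha = torch.rand(x.size(0), 1, 1, 1, device=x.device)
        else:               # deterministic average at test time
            alpha = 0.5
        return x + alpha * b1 + (1 - alpha) * b2    # stochastic affine combination

out = ShakeShakeBlock(16)(torch.randn(2, 16, 32, 32))
```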

Action Schema Networks: Generalised Policies with Deep Learning

Title Action Schema Networks: Generalised Policies with Deep Learning
Authors Sam Toyer, Felipe Trevizan, Sylvie Thiébaux, Lexing Xie
Abstract In this paper, we introduce the Action Schema Network (ASNet): a neural network architecture for learning generalised policies for probabilistic planning problems. By mimicking the relational structure of planning problems, ASNets are able to adopt a weight-sharing scheme which allows the network to be applied to any problem from a given planning domain. This allows the cost of training the network to be amortised over all problems in that domain. Further, we propose a training method which balances exploration and supervised training on small problems to produce a policy which remains robust when evaluated on larger problems. In experiments, we show that ASNet’s learning capability allows it to significantly outperform traditional non-learning planners in several challenging domains.
Tasks
Published 2017-09-13
URL http://arxiv.org/abs/1709.04271v2
PDF http://arxiv.org/pdf/1709.04271v2.pdf
PWC https://paperswithcode.com/paper/action-schema-networks-generalised-policies
Repo https://github.com/qxcv/asnets
Framework none
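
A heavily simplified PyTorch sketch of the weight-sharing idea only: all ground actions that instantiate the same action schema are scored by one shared module, so the same weights transfer to any problem from the domain. The feature inputs and module shapes are illustrative and ignore ASNet's alternating action/proposition layers:

```python
import torch
import torch.nn as nn

class SharedSchemaScorer(nn.Module):
    def __init__(self, schemas, feat_dim=8, hidden=16):
        super().__init__()
        # One module per action schema, shared across all of its ground actions.
        self.by_schema = nn.ModuleDict({
            s: nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))
            for s in schemas})

    def forward(self, ground_actions):
        """ground_actions: list of (schema_name, feature_tensor) pairs."""
        scores = torch.cat([self.by_schema[s](f) for s, f in ground_actions])
        return torch.softmax(scores.squeeze(-1), dim=0)   # policy over ground actions

scorer = SharedSchemaScorer(["move", "pickup"])
policy = scorer([("move", torch.randn(1, 8)), ("move", torch.randn(1, 8)),
                 ("pickup", torch.randn(1, 8))])
print(policy)
```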

Input Fast-Forwarding for Better Deep Learning

Title Input Fast-Forwarding for Better Deep Learning
Authors Ahmed Ibrahim, A. Lynn Abbott, Mohamed E. Hussein
Abstract This paper introduces a new architectural framework, known as input fast-forwarding, that can enhance the performance of deep networks. The main idea is to incorporate a parallel path that sends representations of input values forward to deeper network layers. This scheme is substantially different from “deep supervision”, in which the loss layer is re-introduced at earlier layers. The parallel path provided by fast-forwarding enhances the training process in two ways. First, it enables the individual layers to combine higher-level information (from the standard processing path) with lower-level information (from the fast-forward path). Second, this new architecture reduces the problem of vanishing gradients substantially because the fast-forwarding path provides a shorter route for gradient backpropagation. In order to evaluate the utility of the proposed technique, a Fast-Forward Network (FFNet), with 20 convolutional layers along with parallel fast-forward paths, has been created and tested. The paper presents empirical results that demonstrate the improved learning capacity of FFNet due to fast-forwarding, as compared to GoogLeNet (with deep supervision) and CaffeNet, which are 4x and 18x larger in size, respectively. All of the source code and deep learning models described in this paper will be made available to the entire research community.
Tasks
Published 2017-05-23
URL http://arxiv.org/abs/1705.08479v1
PDF http://arxiv.org/pdf/1705.08479v1.pdf
PWC https://paperswithcode.com/paper/input-fast-forwarding-for-better-deep
Repo https://github.com/aicentral/FFNet
Framework none
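
A minimal PyTorch sketch of the fast-forwarding idea: a cheap parallel path carries a representation of the raw input to a deeper layer, where it is concatenated with the features from the standard path. Layer sizes are illustrative, not FFNet's actual configuration:

```python
import torch
import torch.nn as nn

class FastForwardNet(nn.Module):
    def __init__(self, in_ch=3, feat=32, n_classes=10):
        super().__init__()
        self.standard_path = nn.Sequential(               # deep processing path
            nn.Conv2d(in_ch, feat, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat, feat, 3, stride=2, padding=1), nn.ReLU(inplace=True))
        self.fast_forward = nn.Conv2d(in_ch, feat, 1, stride=4)   # shallow shortcut
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(2 * feat, n_classes))

    def forward(self, x):
        deep = self.standard_path(x)
        shortcut = self.fast_forward(x)                    # input sent forward
        return self.head(torch.cat([deep, shortcut], dim=1))

logits = FastForwardNet()(torch.randn(4, 3, 32, 32))
print(logits.shape)   # torch.Size([4, 10])
```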