July 29, 2019

2964 words 14 mins read

Paper Group AWR 172

Billion-scale similarity search with GPUs. Finite Sample Guarantees for PCA in Non-Isotropic and Data-Dependent Noise. Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. CausalGAN: Learning Causal Implicit Generative Models with Adversarial Training. Orthogonal Recurrent Neural Networks with Scaled Cayley Transform. Gr …

Billion-scale similarity search with GPUs

Title Billion-scale similarity search with GPUs
Authors Jeff Johnson, Matthijs Douze, Hervé Jégou
Abstract Similarity search finds application in specialized database systems handling complex data such as images or videos, which are typically represented by high-dimensional features and require specific indexing structures. This paper tackles the problem of better utilizing GPUs for this task. While GPUs excel at data-parallel tasks, prior approaches are bottlenecked by algorithms that expose less parallelism, such as k-min selection, or make poor use of the memory hierarchy. We propose a design for k-selection that operates at up to 55% of theoretical peak performance, enabling a nearest neighbor implementation that is 8.5x faster than the prior GPU state of the art. We apply it in different similarity search scenarios by proposing optimized designs for brute-force, approximate, and compressed-domain search based on product quantization. In all these setups, we outperform the state of the art by large margins. Our implementation enables the construction of a high-accuracy k-NN graph on 95 million images from the Yfcc100M dataset in 35 minutes, and of a graph connecting 1 billion vectors in less than 12 hours on 4 Maxwell Titan X GPUs. We have open-sourced our approach for the sake of comparison and reproducibility.
Tasks Image Similarity Search, Quantization
Published 2017-02-28
URL http://arxiv.org/abs/1702.08734v1
PDF http://arxiv.org/pdf/1702.08734v1.pdf
PWC https://paperswithcode.com/paper/billion-scale-similarity-search-with-gpus
Repo https://github.com/CoderINusE/unbounded-cache-lm
Framework pytorch
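
A minimal sketch of the brute-force and compressed-domain (IVF+PQ) search flow using the Faiss library that accompanies the paper; dataset sizes and index parameters below are illustrative, and the GPU transfer is only attempted when a GPU build of Faiss is installed:

```python
import numpy as np
import faiss

d, nb, nq, k = 64, 100_000, 1_000, 10
xb = np.random.rand(nb, d).astype("float32")   # database vectors
xq = np.random.rand(nq, d).astype("float32")   # query vectors

# Exact (brute-force) k-NN search; move the index to GPU 0 if a GPU build is present.
index = faiss.IndexFlatL2(d)
if hasattr(faiss, "StandardGpuResources"):
    index = faiss.index_cpu_to_gpu(faiss.StandardGpuResources(), 0, index)
index.add(xb)
D, I = index.search(xq, k)                     # (nq, k) distances and neighbor ids

# Compressed-domain search: inverted file + product quantization.
quantizer = faiss.IndexFlatL2(d)
ivfpq = faiss.IndexIVFPQ(quantizer, d, 256, 8, 8)  # 256 lists, 8 sub-quantizers, 8 bits
ivfpq.train(xb)
ivfpq.add(xb)
ivfpq.nprobe = 16                              # inverted lists visited per query
D2, I2 = ivfpq.search(xq, k)
```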

Finite Sample Guarantees for PCA in Non-Isotropic and Data-Dependent Noise

Title Finite Sample Guarantees for PCA in Non-Isotropic and Data-Dependent Noise
Authors Namrata Vaswani, Praneeth Narayanamurthy
Abstract This work obtains novel finite sample guarantees for Principal Component Analysis (PCA). These hold even when the corrupting noise is non-isotropic, and a part (or all of it) is data-dependent. Because of the latter, in general, the noise and the true data are correlated. The results in this work are a significant improvement over those given in our earlier work where this “correlated-PCA” problem was first studied. In fact, in certain regimes, our results imply that the sample complexity required to achieve subspace recovery error that is a constant fraction of the noise level is near-optimal. Useful corollaries of our result include guarantees for PCA in sparse data-dependent noise and for PCA with missing data. An important application of the former is in proving correctness of the subspace update step of a popular online algorithm for dynamic robust PCA.
Tasks
Published 2017-09-19
URL http://arxiv.org/abs/1709.06255v1
PDF http://arxiv.org/pdf/1709.06255v1.pdf
PWC https://paperswithcode.com/paper/finite-sample-guarantees-for-pca-in-non
Repo https://github.com/praneethmurthy/correlated-pca
Framework none
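
A small numpy illustration of the setting (not the paper's analysis): data lying in a low-dimensional subspace, corrupted by sparse data-dependent noise, PCA via the SVD, and the subspace recovery error. All dimensions and noise levels are made up for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
n, r, alpha = 100, 5, 400                          # ambient dim, subspace dim, #samples

P = np.linalg.qr(rng.standard_normal((n, r)))[0]   # true subspace basis
a = rng.standard_normal((r, alpha))                # subspace coefficients
L = P @ a                                          # true low-rank data

# Data-dependent sparse noise: each column's noise is a sparse function of the data itself.
mask = rng.random((n, alpha)) < 0.05
W = 0.3 * mask * L
Y = L + W                                          # observed data

# PCA estimate: top-r left singular vectors of the observed data.
P_hat = np.linalg.svd(Y, full_matrices=False)[0][:, :r]

# Subspace recovery error: spectral norm of (I - P P^T) P_hat.
err = np.linalg.norm((np.eye(n) - P @ P.T) @ P_hat, 2)
print(f"subspace recovery error: {err:.3f}")
```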

Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks

Title Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks
Authors Jun-Yan Zhu, Taesung Park, Phillip Isola, Alexei A. Efros
Abstract Image-to-image translation is a class of vision and graphics problems where the goal is to learn the mapping between an input image and an output image using a training set of aligned image pairs. However, for many tasks, paired training data will not be available. We present an approach for learning to translate an image from a source domain $X$ to a target domain $Y$ in the absence of paired examples. Our goal is to learn a mapping $G: X \rightarrow Y$ such that the distribution of images from $G(X)$ is indistinguishable from the distribution $Y$ using an adversarial loss. Because this mapping is highly under-constrained, we couple it with an inverse mapping $F: Y \rightarrow X$ and introduce a cycle consistency loss to push $F(G(X)) \approx X$ (and vice versa). Qualitative results are presented on several tasks where paired training data does not exist, including collection style transfer, object transfiguration, season transfer, photo enhancement, etc. Quantitative comparisons against several prior methods demonstrate the superiority of our approach.
Tasks Image-to-Image Translation, Multimodal Unsupervised Image-To-Image Translation, Style Transfer, Unsupervised Image-To-Image Translation
Published 2017-03-30
URL http://arxiv.org/abs/1703.10593v6
PDF http://arxiv.org/pdf/1703.10593v6.pdf
PWC https://paperswithcode.com/paper/unpaired-image-to-image-translation-using
Repo https://github.com/bareeka/ganproject
Framework tf
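
A minimal PyTorch sketch of the cycle-consistency term; G and F stand for any pair of image-to-image generators, and the identity networks in the usage snippet are placeholders:

```python
import torch
import torch.nn as nn

def cycle_consistency_loss(G, F, real_x, real_y, lam=10.0):
    """L1 cycle loss ||F(G(x)) - x||_1 + ||G(F(y)) - y||_1, weighted by lam."""
    l1 = nn.L1Loss()
    forward_cycle = l1(F(G(real_x)), real_x)    # X -> Y -> X
    backward_cycle = l1(G(F(real_y)), real_y)   # Y -> X -> Y
    return lam * (forward_cycle + backward_cycle)

# Toy usage with identity "generators" just to show the call shape.
G = F = nn.Identity()
x = torch.randn(4, 3, 64, 64)
y = torch.randn(4, 3, 64, 64)
print(cycle_consistency_loss(G, F, x, y))
```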

CausalGAN: Learning Causal Implicit Generative Models with Adversarial Training

Title CausalGAN: Learning Causal Implicit Generative Models with Adversarial Training
Authors Murat Kocaoglu, Christopher Snyder, Alexandros G. Dimakis, Sriram Vishwanath
Abstract We propose an adversarial training procedure for learning a causal implicit generative model for a given causal graph. We show that adversarial training can be used to learn a generative model with true observational and interventional distributions if the generator architecture is consistent with the given causal graph. We consider the application of generating faces based on given binary labels where the dependency structure between the labels is preserved with a causal graph. This problem can be seen as learning a causal implicit generative model for the image and labels. We devise a two-stage procedure for this problem. First we train a causal implicit generative model over binary labels using a neural network consistent with a causal graph as the generator. We empirically show that WassersteinGAN can be used to output discrete labels. Later, we propose two new conditional GAN architectures, which we call CausalGAN and CausalBEGAN. We show that the optimal generator of the CausalGAN, given the labels, samples from the image distributions conditioned on these labels. The conditional GAN combined with a trained causal implicit generative model for the labels is then a causal implicit generative model over the labels and the generated image. We show that the proposed architectures can be used to sample from observational and interventional image distributions, even for interventions which do not naturally occur in the dataset.
Tasks Face Generation
Published 2017-09-06
URL http://arxiv.org/abs/1709.02023v2
PDF http://arxiv.org/pdf/1709.02023v2.pdf
PWC https://paperswithcode.com/paper/causalgan-learning-causal-implicit-generative
Repo https://github.com/mkocaoglu/CausalGAN
Framework tf
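
A simplified PyTorch sketch of the first stage only: a generator over binary labels whose structure follows a causal DAG, so each label is produced from exogenous noise plus its parents' samples, and interventions simply clamp a node. The graph, layer sizes, and the soft sigmoid outputs (instead of the discrete labels produced by the paper's WGAN training) are illustrative assumptions:

```python
import torch
import torch.nn as nn

CAUSAL_GRAPH = {"Male": [], "Mustache": ["Male"], "Smiling": []}  # child -> parents

class CausalLabelGenerator(nn.Module):
    def __init__(self, graph, noise_dim=10):
        super().__init__()
        self.graph, self.noise_dim = graph, noise_dim
        self.nets = nn.ModuleDict({
            v: nn.Sequential(nn.Linear(noise_dim + len(parents), 16),
                             nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())
            for v, parents in graph.items()
        })

    def forward(self, batch_size, interventions=None):
        interventions = interventions or {}
        values = {}
        for v in self._topological_order():
            if v in interventions:                 # do(v := value): clamp the node
                values[v] = torch.full((batch_size, 1), float(interventions[v]))
                continue
            z = torch.randn(batch_size, self.noise_dim)
            parents = [values[p] for p in self.graph[v]]
            values[v] = self.nets[v](torch.cat([z] + parents, dim=1))
        return values

    def _topological_order(self):
        order, seen = [], set()
        def visit(v):
            if v in seen:
                return
            seen.add(v)
            for p in self.graph[v]:
                visit(p)
            order.append(v)
        for v in self.graph:
            visit(v)
        return order

labels = CausalLabelGenerator(CAUSAL_GRAPH)(batch_size=8, interventions={"Mustache": 1})
```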

Orthogonal Recurrent Neural Networks with Scaled Cayley Transform

Title Orthogonal Recurrent Neural Networks with Scaled Cayley Transform
Authors Kyle Helfrich, Devin Willmott, Qiang Ye
Abstract Recurrent Neural Networks (RNNs) are designed to handle sequential data but suffer from vanishing or exploding gradients. Recent work on Unitary Recurrent Neural Networks (uRNNs) has addressed this issue and, in some cases, exceeded the capabilities of Long Short-Term Memory networks (LSTMs). We propose a simpler and novel update scheme to maintain orthogonal recurrent weight matrices without using complex-valued matrices. This is done by parametrizing with a skew-symmetric matrix using the Cayley transform. Such a parametrization is unable to represent matrices with negative one eigenvalues, but this limitation is overcome by scaling the recurrent weight matrix by a diagonal matrix consisting of ones and negative ones. The proposed training scheme involves a straightforward gradient calculation and update step. In several experiments, the proposed scaled Cayley orthogonal recurrent neural network (scoRNN) achieves superior results with fewer trainable parameters than other unitary RNNs.
Tasks
Published 2017-07-29
URL http://arxiv.org/abs/1707.09520v3
PDF http://arxiv.org/pdf/1707.09520v3.pdf
PWC https://paperswithcode.com/paper/orthogonal-recurrent-neural-networks-with
Repo https://github.com/SpartinStuff/scoRNN
Framework tf
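
A numpy sketch of the scaled Cayley parametrization W = (I + A)^{-1}(I - A)D: A is skew-symmetric (the trainable part in the paper) and D is a fixed diagonal of ±1 entries; here the number of −1 entries is chosen at random rather than tuned as a hyperparameter:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6

T = rng.standard_normal((n, n))
A = T - T.T                                    # skew-symmetric: A^T = -A
D = np.diag(np.sign(rng.standard_normal(n)))   # diagonal of +1/-1 entries

# Scaled Cayley transform: W = (I + A)^{-1} (I - A) D is orthogonal.
W = np.linalg.solve(np.eye(n) + A, (np.eye(n) - A) @ D)

print(np.allclose(W.T @ W, np.eye(n)))         # True: W^T W = I
```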

Group Invariance, Stability to Deformations, and Complexity of Deep Convolutional Representations

Title Group Invariance, Stability to Deformations, and Complexity of Deep Convolutional Representations
Authors Alberto Bietti, Julien Mairal
Abstract The success of deep convolutional architectures is often attributed in part to their ability to learn multiscale and invariant representations of natural signals. However, a precise study of these properties and how they affect learning guarantees is still missing. In this paper, we consider deep convolutional representations of signals; we study their invariance to translations and to more general groups of transformations, their stability to the action of diffeomorphisms, and their ability to preserve signal information. This analysis is carried out by introducing a multilayer kernel based on convolutional kernel networks and by studying the geometry induced by the kernel mapping. We then characterize the corresponding reproducing kernel Hilbert space (RKHS), showing that it contains a large class of convolutional neural networks with homogeneous activation functions. This analysis allows us to separate data representation from learning, and to provide a canonical measure of model complexity, the RKHS norm, which controls both stability and generalization of any learned model. In addition to models in the constructed RKHS, our stability analysis also applies to convolutional networks with generic activations such as rectified linear units, and we discuss its relationship with recent generalization bounds based on spectral norms.
Tasks
Published 2017-06-09
URL http://arxiv.org/abs/1706.03078v4
PDF http://arxiv.org/pdf/1706.03078v4.pdf
PWC https://paperswithcode.com/paper/group-invariance-stability-to-deformations
Repo https://github.com/albietz/ckn_kernel
Framework none

Learning to Skim Text

Title Learning to Skim Text
Authors Adams Wei Yu, Hongrae Lee, Quoc V. Le
Abstract Recurrent Neural Networks are showing much promise in many sub-areas of natural language processing, ranging from document classification to machine translation to automatic question answering. Despite their promise, many recurrent models have to read the whole text word by word, making it slow to handle long documents. For example, it is difficult to use a recurrent network to read a book and answer questions about it. In this paper, we present an approach to reading text that skips irrelevant information when needed. The underlying model is a recurrent network that learns how far to jump after reading a few words of the input text. We employ a standard policy gradient method to train the model to make discrete jumping decisions. In our benchmarks on four different tasks, including number prediction, sentiment analysis, news article classification and automatic Q&A, our proposed model, a modified LSTM with jumping, is up to 6 times faster than the standard sequential LSTM, while maintaining the same or even better accuracy.
Tasks Document Classification, Machine Translation, Question Answering, Sentiment Analysis
Published 2017-04-23
URL http://arxiv.org/abs/1704.06877v2
PDF http://arxiv.org/pdf/1704.06877v2.pdf
PWC https://paperswithcode.com/paper/learning-to-skim-text
Repo https://github.com/COMP6248-Reproducability-Challenge/Differentiables_FastAccurateTextClassification
Framework none
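
A toy PyTorch sketch of the jumping mechanism: after reading a fixed number of tokens, the model samples a jump size from a softmax over its hidden state, and the log-probabilities are kept for a REINFORCE-style update. All sizes and names are illustrative:

```python
import torch
import torch.nn as nn

class JumpingLSTMReader(nn.Module):
    def __init__(self, vocab, emb=32, hid=64, read_every=4, max_jump=5):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.cell = nn.LSTMCell(emb, hid)
        self.jump_head = nn.Linear(hid, max_jump + 1)   # jump size in {0, ..., max_jump}
        self.read_every = read_every

    def forward(self, tokens):                          # tokens: (seq_len,) LongTensor
        h = c = torch.zeros(1, self.cell.hidden_size)
        log_probs, i, read = [], 0, 0
        while i < len(tokens):
            h, c = self.cell(self.embed(tokens[i:i + 1]), (h, c))
            i, read = i + 1, read + 1
            if read == self.read_every:                 # decide how far to jump
                dist = torch.distributions.Categorical(logits=self.jump_head(h))
                jump = dist.sample()
                log_probs.append(dist.log_prob(jump))   # kept for the policy gradient
                i, read = i + int(jump), 0
        return h, log_probs                             # final state + policy log-probs

reader = JumpingLSTMReader(vocab=1000)
state, log_probs = reader(torch.randint(0, 1000, (50,)))
# Training would add a classifier on `state` plus a REINFORCE loss
# of the form -reward * sum(log_probs) on the sampled jumps.
```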

OpenML Benchmarking Suites

Title OpenML Benchmarking Suites
Authors Bernd Bischl, Giuseppe Casalicchio, Matthias Feurer, Frank Hutter, Michel Lang, Rafael G. Mantovani, Jan N. van Rijn, Joaquin Vanschoren
Abstract Machine learning research depends on objectively interpretable, comparable, and reproducible algorithm benchmarks. Therefore, we advocate the use of curated, comprehensive suites of machine learning tasks to standardize the setup, execution, and reporting of benchmarks. We enable this through software tools that help to create and leverage these benchmarking suites. These are seamlessly integrated into the OpenML platform, and accessible through interfaces in Python, Java, and R. OpenML benchmarking suites are (a) easy to use through standardized data formats, APIs, and client libraries; (b) machine-readable, with extensive meta-information on the included datasets; and (c) allow benchmarks to be shared and reused in future studies. We also present a first, carefully curated and practical benchmarking suite for classification: the OpenML Curated Classification benchmarking suite 2018 (OpenML-CC18).
Tasks
Published 2017-08-11
URL https://arxiv.org/abs/1708.03731v2
PDF https://arxiv.org/pdf/1708.03731v2.pdf
PWC https://paperswithcode.com/paper/openml-benchmarking-suites-and-the-openml100
Repo https://github.com/jorisvandenbossche/target-encoder-benchmarks
Framework none
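
A short sketch of pulling the OpenML-CC18 suite and running a baseline model on a few of its tasks, assuming a recent version of the openml Python client and scikit-learn:

```python
import openml
from sklearn.tree import DecisionTreeClassifier

suite = openml.study.get_suite("OpenML-CC18")        # curated classification suite
for task_id in suite.tasks[:3]:                      # a few tasks as a demo
    task = openml.tasks.get_task(task_id)
    print(task_id, task.get_dataset().name)
    run = openml.runs.run_model_on_task(DecisionTreeClassifier(), task)
    # run.publish() would upload the result back to OpenML for sharing and reuse
```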

Can Decentralized Algorithms Outperform Centralized Algorithms? A Case Study for Decentralized Parallel Stochastic Gradient Descent

Title Can Decentralized Algorithms Outperform Centralized Algorithms? A Case Study for Decentralized Parallel Stochastic Gradient Descent
Authors Xiangru Lian, Ce Zhang, Huan Zhang, Cho-Jui Hsieh, Wei Zhang, Ji Liu
Abstract Most distributed machine learning systems nowadays, including TensorFlow and CNTK, are built in a centralized fashion. One bottleneck of centralized algorithms lies in the high communication cost on the central node. Motivated by this, we ask: can decentralized algorithms be faster than their centralized counterparts? Although decentralized PSGD (D-PSGD) algorithms have been studied by the control community, existing analysis and theory do not show any advantage over centralized PSGD (C-PSGD) algorithms, simply assuming the application scenario where only the decentralized network is available. In this paper, we study a D-PSGD algorithm and provide the first theoretical analysis that indicates a regime in which decentralized algorithms might outperform centralized algorithms for distributed stochastic gradient descent. This is because D-PSGD has a total computational complexity comparable to that of C-PSGD but requires much less communication cost on the busiest node. We further conduct an empirical study to validate our theoretical analysis across multiple frameworks (CNTK and Torch), different network configurations, and computation platforms with up to 112 GPUs. On network configurations with low bandwidth or high latency, D-PSGD can be up to one order of magnitude faster than its well-optimized centralized counterparts.
Tasks
Published 2017-05-25
URL http://arxiv.org/abs/1705.09056v5
PDF http://arxiv.org/pdf/1705.09056v5.pdf
PWC https://paperswithcode.com/paper/can-decentralized-algorithms-outperform
Repo https://github.com/facebookresearch/stochastic_gradient_push
Framework pytorch
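
A toy numpy simulation of the D-PSGD pattern on a ring of workers: each worker takes a local stochastic gradient step and then averages its parameters only with its ring neighbours, so no central node is involved. The least-squares objective and all sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n_workers, dim, lr = 8, 10, 0.05
x_true = rng.standard_normal(dim)
params = [rng.standard_normal(dim) for _ in range(n_workers)]

def local_gradient(w):
    """Stochastic gradient of a toy least-squares objective."""
    a = rng.standard_normal(dim)
    return (a @ w - a @ x_true) * a

for step in range(200):
    # 1) local SGD step on each worker
    params = [w - lr * local_gradient(w) for w in params]
    # 2) gossip step: mix with the two ring neighbours (a doubly-stochastic averaging)
    params = [(params[i - 1] + params[i] + params[(i + 1) % n_workers]) / 3.0
              for i in range(n_workers)]

print("mean error:", np.mean([np.linalg.norm(w - x_true) for w in params]))
```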

dna2vec: Consistent vector representations of variable-length k-mers

Title dna2vec: Consistent vector representations of variable-length k-mers
Authors Patrick Ng
Abstract A ubiquitous representation of a long DNA sequence divides it into shorter k-mer components. Unfortunately, the straightforward encoding of a k-mer as a one-hot vector is vulnerable to the curse of dimensionality. Worse yet, all pairs of one-hot vectors are equidistant. This is particularly problematic when applying the latest machine learning algorithms to solve problems in biological sequence analysis. In this paper, we propose a novel method to train distributed representations of variable-length k-mers. Our method is based on the popular word embedding model word2vec, which is trained on a shallow two-layer neural network. Our experiments provide evidence that summing dna2vec vectors is akin to nucleotide concatenation. We also demonstrate that there is a correlation between the Needleman-Wunsch similarity score and the cosine similarity of dna2vec vectors.
Tasks
Published 2017-01-23
URL http://arxiv.org/abs/1701.06279v1
PDF http://arxiv.org/pdf/1701.06279v1.pdf
PWC https://paperswithcode.com/paper/dna2vec-consistent-vector-representations-of
Repo https://github.com/DamLabResources/journalclub
Framework none
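
A rough sketch of the idea: fragment reads into consecutive variable-length k-mers and train word2vec-style embeddings on them. It assumes the gensim 4.x Word2Vec API and synthetic reads; the real dna2vec training setup differs in detail:

```python
import random
from gensim.models import Word2Vec

random.seed(0)
reads = ["".join(random.choice("ACGT") for _ in range(200)) for _ in range(500)]

def to_kmers(seq, k_low=3, k_high=8):
    """Split a sequence into consecutive k-mers with randomly chosen lengths."""
    kmers, i = [], 0
    while i < len(seq) - k_high:
        k = random.randint(k_low, k_high)
        kmers.append(seq[i:i + k])
        i += k
    return kmers

corpus = [to_kmers(read) for read in reads]
model = Word2Vec(corpus, vector_size=100, window=5, min_count=1, sg=1)

# Cosine similarity between two k-mers of different lengths.
print(model.wv.similarity("ACG", "ACGT"))
```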

Spinal cord gray matter segmentation using deep dilated convolutions

Title Spinal cord gray matter segmentation using deep dilated convolutions
Authors Christian S. Perone, Evan Calabrese, Julien Cohen-Adad
Abstract Gray matter (GM) tissue changes have been associated with a wide range of neurological disorders and were also recently found relevant as a biomarker for disability in amyotrophic lateral sclerosis. The ability to automatically segment the GM is, therefore, an important task for modern studies of the spinal cord. In this work, we devise a modern, simple and end-to-end fully automated human spinal cord gray matter segmentation method using Deep Learning that works on both in vivo and ex vivo MRI acquisitions. We evaluate our method against six independently developed methods on a GM segmentation challenge and report state-of-the-art results in 8 out of 10 evaluation metrics, as well as a major reduction in network parameters compared to traditional medical imaging architectures such as U-Nets.
Tasks Medical Image Segmentation
Published 2017-10-02
URL http://arxiv.org/abs/1710.01269v1
PDF http://arxiv.org/pdf/1710.01269v1.pdf
PWC https://paperswithcode.com/paper/spinal-cord-gray-matter-segmentation-using
Repo https://github.com/neuropoly/multiclass-segmentation
Framework pytorch
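
A minimal PyTorch sketch of a dilated-convolution segmentation block that keeps the spatial resolution while growing the receptive field; channel counts, dilation rates, and class count are illustrative, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class DilatedBlock(nn.Module):
    """Stack of 3x3 convolutions with growing dilation, preserving spatial size."""
    def __init__(self, in_ch=1, feat=32, n_classes=2, dilations=(1, 2, 4, 8)):
        super().__init__()
        layers, ch = [], in_ch
        for d in dilations:
            layers += [nn.Conv2d(ch, feat, kernel_size=3, padding=d, dilation=d),
                       nn.BatchNorm2d(feat), nn.ReLU(inplace=True)]
            ch = feat
        self.features = nn.Sequential(*layers)
        self.classifier = nn.Conv2d(feat, n_classes, kernel_size=1)

    def forward(self, x):
        return self.classifier(self.features(x))   # per-pixel class logits

logits = DilatedBlock()(torch.randn(1, 1, 128, 128))
print(logits.shape)   # torch.Size([1, 2, 128, 128])
```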

Faster K-Means Cluster Estimation

Title Faster K-Means Cluster Estimation
Authors Siddhesh Khandelwal, Amit Awekar
Abstract There has been considerable work on improving the popular clustering algorithm k-means in terms of both mean squared error (MSE) and speed. However, most k-means variants compute the distance from each data point to each cluster centroid in every iteration. We propose a fast heuristic to overcome this bottleneck with only a marginal increase in MSE. We observe that, across all iterations of k-means, a data point changes its membership only among a small subset of clusters. Our heuristic predicts such clusters for each data point by looking at nearby clusters after the first iteration of k-means. We augment well-known variants of k-means with our heuristic to demonstrate its effectiveness. For various synthetic and real-world datasets, our heuristic achieves a speed-up of up to 3x compared to efficient variants of k-means.
Tasks
Published 2017-01-17
URL http://arxiv.org/abs/1701.04600v1
PDF http://arxiv.org/pdf/1701.04600v1.pdf
PWC https://paperswithcode.com/paper/faster-k-means-cluster-estimation
Repo https://github.com/siddheshk/Faster-Kmeans
Framework none
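
A toy numpy sketch of the heuristic: after the first full iteration, every point keeps only its t nearest centroids as candidates, and later iterations compute distances to those candidates only. Parameter values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
X, k, t, iters = rng.standard_normal((2000, 16)), 20, 5, 10
centroids = X[rng.choice(len(X), k, replace=False)]

def all_dists(X, C):
    """Squared Euclidean distances from every point to every centroid."""
    return ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)

# Iteration 1: full distance computation, also used to pick candidate clusters.
d = all_dists(X, centroids)
candidates = np.argsort(d, axis=1)[:, :t]           # t nearest centroids per point
assign = candidates[:, 0]

for _ in range(iters - 1):
    for j in range(k):                               # standard centroid update
        if np.any(assign == j):
            centroids[j] = X[assign == j].mean(axis=0)
    # Later iterations: distances only to each point's candidate clusters.
    cand_d = ((X[:, None, :] - centroids[candidates]) ** 2).sum(-1)
    assign = candidates[np.arange(len(X)), cand_d.argmin(axis=1)]

print("MSE:", ((X - centroids[assign]) ** 2).sum(-1).mean())
```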

Shake-Shake regularization

Title Shake-Shake regularization
Authors Xavier Gastaldi
Abstract The method introduced in this paper aims at helping deep learning practitioners faced with an overfit problem. The idea is to replace, in a multi-branch network, the standard summation of parallel branches with a stochastic affine combination. Applied to 3-branch residual networks, shake-shake regularization improves on the best single shot published results on CIFAR-10 and CIFAR-100 by reaching test errors of 2.86% and 15.85%. Experiments on architectures without skip connections or Batch Normalization show encouraging results and open the door to a large set of applications. Code is available at https://github.com/xgastaldi/shake-shake
Tasks
Published 2017-05-21
URL http://arxiv.org/abs/1705.07485v2
PDF http://arxiv.org/pdf/1705.07485v2.pdf
PWC https://paperswithcode.com/paper/shake-shake-regularization
Repo https://github.com/motokimura/shake_shake_chainer
Framework pytorch
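
A simplified PyTorch sketch of a shake-shake block: two residual branches mixed by a random coefficient at training time and by 0.5 at test time. The full method also draws an independent coefficient for the backward pass, which this sketch omits:

```python
import torch
import torch.nn as nn

class ShakeShakeBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        def branch():
            return nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=1), nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, channels, 3, padding=1), nn.BatchNorm2d(channels))
        self.branch1, self.branch2 = branch(), branch()

    def forward(self, x):
        b1, b2 = self.branch1(x), self.branch2(x)
        if self.training:   # per-sample random mixing coefficient during training
            alpha = torch.rand(x.size(0), 1, 1, 1, device=x.device)
        else:               # deterministic average at test time
            alpha = 0.5
        return x + alpha * b1 + (1 - alpha) * b2    # stochastic affine combination

out = ShakeShakeBlock(16)(torch.randn(2, 16, 32, 32))
```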

Action Schema Networks: Generalised Policies with Deep Learning

Title Action Schema Networks: Generalised Policies with Deep Learning
Authors Sam Toyer, Felipe Trevizan, Sylvie Thiébaux, Lexing Xie
Abstract In this paper, we introduce the Action Schema Network (ASNet): a neural network architecture for learning generalised policies for probabilistic planning problems. By mimicking the relational structure of planning problems, ASNets are able to adopt a weight-sharing scheme which allows the network to be applied to any problem from a given planning domain. This allows the cost of training the network to be amortised over all problems in that domain. Further, we propose a training method which balances exploration and supervised training on small problems to produce a policy which remains robust when evaluated on larger problems. In experiments, we show that ASNet’s learning capability allows it to significantly outperform traditional non-learning planners in several challenging domains.
Tasks
Published 2017-09-13
URL http://arxiv.org/abs/1709.04271v2
PDF http://arxiv.org/pdf/1709.04271v2.pdf
PWC https://paperswithcode.com/paper/action-schema-networks-generalised-policies
Repo https://github.com/qxcv/asnets
Framework none
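
A heavily simplified PyTorch sketch of the weight-sharing idea only: all ground actions that instantiate the same action schema are scored by one shared module, so the same weights transfer to any problem from the domain. The feature inputs and module shapes are illustrative and ignore ASNet's alternating action/proposition layers:

```python
import torch
import torch.nn as nn

class SharedSchemaScorer(nn.Module):
    def __init__(self, schemas, feat_dim=8, hidden=16):
        super().__init__()
        # One module per action schema, shared across all of its ground actions.
        self.by_schema = nn.ModuleDict({
            s: nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))
            for s in schemas})

    def forward(self, ground_actions):
        """ground_actions: list of (schema_name, feature_tensor) pairs."""
        scores = torch.cat([self.by_schema[s](f) for s, f in ground_actions])
        return torch.softmax(scores.squeeze(-1), dim=0)   # policy over ground actions

scorer = SharedSchemaScorer(["move", "pickup"])
policy = scorer([("move", torch.randn(1, 8)), ("move", torch.randn(1, 8)),
                 ("pickup", torch.randn(1, 8))])
print(policy)
```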

Input Fast-Forwarding for Better Deep Learning

Title Input Fast-Forwarding for Better Deep Learning
Authors Ahmed Ibrahim, A. Lynn Abbott, Mohamed E. Hussein
Abstract This paper introduces a new architectural framework, known as input fast-forwarding, that can enhance the performance of deep networks. The main idea is to incorporate a parallel path that sends representations of input values forward to deeper network layers. This scheme is substantially different from “deep supervision”, in which the loss layer is re-introduced at earlier layers. The parallel path provided by fast-forwarding enhances the training process in two ways. First, it enables the individual layers to combine higher-level information (from the standard processing path) with lower-level information (from the fast-forward path). Second, this new architecture reduces the problem of vanishing gradients substantially because the fast-forwarding path provides a shorter route for gradient backpropagation. In order to evaluate the utility of the proposed technique, a Fast-Forward Network (FFNet), with 20 convolutional layers along with parallel fast-forward paths, has been created and tested. The paper presents empirical results that demonstrate the improved learning capacity of FFNet due to fast-forwarding, as compared to GoogLeNet (with deep supervision) and CaffeNet, which are 4x and 18x larger in size, respectively. All of the source code and deep learning models described in this paper will be made available to the entire research community.
Tasks
Published 2017-05-23
URL http://arxiv.org/abs/1705.08479v1
PDF http://arxiv.org/pdf/1705.08479v1.pdf
PWC https://paperswithcode.com/paper/input-fast-forwarding-for-better-deep
Repo https://github.com/aicentral/FFNet
Framework none
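
A minimal PyTorch sketch of the fast-forwarding idea: a cheap parallel path carries a representation of the raw input to a deeper layer, where it is concatenated with the features from the standard path. Layer sizes are illustrative, not FFNet's actual configuration:

```python
import torch
import torch.nn as nn

class FastForwardNet(nn.Module):
    def __init__(self, in_ch=3, feat=32, n_classes=10):
        super().__init__()
        self.standard_path = nn.Sequential(               # deep processing path
            nn.Conv2d(in_ch, feat, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat, feat, 3, stride=2, padding=1), nn.ReLU(inplace=True))
        self.fast_forward = nn.Conv2d(in_ch, feat, 1, stride=4)   # shallow shortcut
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(2 * feat, n_classes))

    def forward(self, x):
        deep = self.standard_path(x)
        shortcut = self.fast_forward(x)                    # input sent forward
        return self.head(torch.cat([deep, shortcut], dim=1))

logits = FastForwardNet()(torch.randn(4, 3, 32, 32))
print(logits.shape)   # torch.Size([4, 10])
```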