July 29, 2019

Paper Group AWR 102

Unbounded cache model for online language modeling with open vocabulary


Title	Unbounded cache model for online language modeling with open vocabulary
Authors	Edouard Grave, Moustapha Cisse, Armand Joulin
Abstract	Recently, continuous cache models were proposed as extensions to recurrent neural network language models, to adapt their predictions to local changes in the data distribution. These models only capture the local context, of up to a few thousand tokens. In this paper, we propose an extension of continuous cache models, which can scale to larger contexts. In particular, we use a large-scale non-parametric memory component that stores all the hidden activations seen in the past. We leverage recent advances in approximate nearest neighbor search and quantization algorithms to store millions of representations while searching them efficiently. We conduct extensive experiments showing that our approach significantly improves the perplexity of pre-trained language models on new distributions, and can scale efficiently to much larger contexts than previously proposed local cache models.
Tasks	Language Modelling, Quantization
Published	2017-11-07
URL	http://arxiv.org/abs/1711.02604v1
PDF	http://arxiv.org/pdf/1711.02604v1.pdf
PWC	https://paperswithcode.com/paper/unbounded-cache-model-for-online-language
Repo	https://github.com/CoderINusE/unbounded-cache-lm
Framework	pytorch
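
The memory component described in the abstract can be pictured with a small sketch. It is only an illustration under stated assumptions: the class name, the Gaussian-style kernel, the bandwidth and the interpolation weight `lam` are hypothetical choices, and the nearest-neighbor search is exact brute force, whereas the abstract relies on approximate search with quantization to reach millions of entries.

```python
import math
from collections import defaultdict


class UnboundedCache:
    """Toy non-parametric memory: keeps every (hidden state, next token) pair."""

    def __init__(self):
        self.keys = []    # hidden activations seen so far
        self.values = []  # the token that followed each activation

    def add(self, hidden, next_token):
        self.keys.append(list(hidden))
        self.values.append(next_token)

    def cache_probs(self, query, k=3, bandwidth=1.0):
        """Kernel-weighted distribution over the tokens of the k nearest keys."""
        if not self.keys:
            return {}
        # Brute-force squared Euclidean distances (assumption: exact search).
        dists = [
            (sum((q - h) ** 2 for q, h in zip(query, key)), tok)
            for key, tok in zip(self.keys, self.values)
        ]
        dists.sort(key=lambda pair: pair[0])
        weights = defaultdict(float)
        for sq_dist, tok in dists[:k]:
            weights[tok] += math.exp(-sq_dist / bandwidth)
        total = sum(weights.values())
        if total == 0.0:
            return {}  # all neighbors too far away for this bandwidth
        return {tok: w / total for tok, w in weights.items()}


def interpolate(model_probs, cache_probs, lam=0.3):
    """Mix the base language model with the cache (lam is a hypothetical weight)."""
    vocab = set(model_probs) | set(cache_probs)
    return {
        tok: (1 - lam) * model_probs.get(tok, 0.0) + lam * cache_probs.get(tok, 0.0)
        for tok in vocab
    }
```

Because the cache is keyed by hidden states rather than by a fixed vocabulary, a token never seen in training can still receive probability mass once it has been observed and stored, which is one way to read the "open vocabulary" part of the title.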

Domain Adaptation for Visual Applications: A Comprehensive Survey


Title	Domain Adaptation for Visual Applications: A Comprehensive Survey
Authors	Gabriela Csurka
Abstract	The aim of this paper is to give an overview of domain adaptation and transfer learning with a specific view on visual applications. After a general motivation, we first position domain adaptation in the larger transfer learning problem. Second, we try to address and analyze briefly the state-of-the-art methods for different types of scenarios, first describing the historical shallow methods, addressing both the homogeneous and the heterogeneous domain adaptation methods. Third, we discuss the effect of the success of deep convolutional architectures, which led to new types of domain adaptation methods that integrate the adaptation within the deep architecture. Fourth, we overview the methods that go beyond image categorization, such as object detection or image segmentation, video analyses or learning visual attributes. Finally, we conclude the paper with a section where we relate domain adaptation to other machine learning solutions.
Tasks	Domain Adaptation, Image Categorization, Object Detection, Semantic Segmentation, Transfer Learning
Published	2017-02-17
URL	http://arxiv.org/abs/1702.05374v2
PDF	http://arxiv.org/pdf/1702.05374v2.pdf
PWC	https://paperswithcode.com/paper/domain-adaptation-for-visual-applications-a
Repo	https://github.com/zxlzr/Deep-Transfer-Learning
Framework	tf

Adversarial Discriminative Domain Adaptation


Title	Adversarial Discriminative Domain Adaptation
Authors	Eric Tzeng, Judy Hoffman, Kate Saenko, Trevor Darrell
Abstract	Adversarial learning methods are a promising approach to training robust deep networks, and can generate complex samples across diverse domains. They also can improve recognition despite the presence of domain shift or dataset bias: several adversarial approaches to unsupervised domain adaptation have recently been introduced, which reduce the difference between the training and test domain distributions and thus improve generalization performance. Prior generative approaches show compelling visualizations, but are not optimal on discriminative tasks and can be limited to smaller shifts. Prior discriminative approaches could handle larger domain shifts, but imposed tied weights on the model and did not exploit a GAN-based loss. We first outline a novel generalized framework for adversarial adaptation, which subsumes recent state-of-the-art approaches as special cases, and we use this generalized view to better relate the prior approaches. We propose a previously unexplored instance of our general framework which combines discriminative modeling, untied weight sharing, and a GAN loss, which we call Adversarial Discriminative Domain Adaptation (ADDA). We show that ADDA is more effective yet considerably simpler than competing domain-adversarial methods, and demonstrate the promise of our approach by exceeding state-of-the-art unsupervised adaptation results on standard cross-domain digit classification tasks and a new more difficult cross-modality object classification task.
Tasks	Domain Adaptation, Object Classification, Unsupervised Domain Adaptation, Unsupervised Image-To-Image Translation
Published	2017-02-17
URL	http://arxiv.org/abs/1702.05464v1
PDF	http://arxiv.org/pdf/1702.05464v1.pdf
PWC	https://paperswithcode.com/paper/adversarial-discriminative-domain-adaptation
Repo	https://github.com/Fujiki-Nakamura/ADDA.PyTorch
Framework	pytorch

Parallel Streaming Wasserstein Barycenters


Title	Parallel Streaming Wasserstein Barycenters
Authors	Matthew Staib, Sebastian Claici, Justin Solomon, Stefanie Jegelka
Abstract	Efficiently aggregating data from different sources is a challenging problem, particularly when samples from each source are distributed differently. These differences can be inherent to the inference task or present for other reasons: sensors in a sensor network may be placed far apart, affecting their individual measurements. Conversely, it is computationally advantageous to split Bayesian inference tasks across subsets of data, but data need not be identically distributed across subsets. One principled way to fuse probability distributions is via the lens of optimal transport: the Wasserstein barycenter is a single distribution that summarizes a collection of input measures while respecting their geometry. However, computing the barycenter scales poorly and requires discretization of all input distributions and the barycenter itself. Improving on this situation, we present a scalable, communication-efficient, parallel algorithm for computing the Wasserstein barycenter of arbitrary distributions. Our algorithm can operate directly on continuous input distributions and is optimized for streaming data. Our method is even robust to nonstationary input distributions and produces a barycenter estimate that tracks the input measures over time. The algorithm is semi-discrete, needing to discretize only the barycenter estimate. To the best of our knowledge, we also provide the first bounds on the quality of the approximate barycenter as the discretization becomes finer. Finally, we demonstrate the practical effectiveness of our method, both in tracking moving distributions on a sphere, as well as in a large-scale Bayesian inference task.
Tasks	Bayesian Inference
Published	2017-05-21
URL	http://arxiv.org/abs/1705.07443v2
PDF	http://arxiv.org/pdf/1705.07443v2.pdf
PWC	https://paperswithcode.com/paper/parallel-streaming-wasserstein-barycenters
Repo	https://github.com/mstaib/stochastic-barycenter-code
Framework	none
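
To make the notion of a Wasserstein barycenter concrete, here is a small sketch of the one-dimensional special case. It is not the parallel streaming algorithm of the paper: it uses the standard fact that, on the real line, the 2-Wasserstein barycenter of empirical measures with equally many equally weighted points is obtained by averaging their sorted samples. The function name and the equal-size restriction are assumptions made for the illustration.

```python
def barycenter_1d(sample_sets, weights=None):
    """2-Wasserstein barycenter of 1D empirical measures with equal sample counts.

    Each measure puts mass 1/n on each of its n points. In one dimension the
    barycenter's quantile function is the weighted average of the input
    quantile functions, which here reduces to averaging order statistics.
    """
    n = len(sample_sets[0])
    if any(len(s) != n for s in sample_sets):
        raise ValueError("this sketch assumes all sample sets have the same size")
    if weights is None:
        weights = [1.0 / len(sample_sets)] * len(sample_sets)
    sorted_sets = [sorted(s) for s in sample_sets]
    return [
        sum(w * s[i] for w, s in zip(weights, sorted_sets))
        for i in range(n)
    ]
```

Note how the result keeps the shape of the inputs and only moves it: averaging two shifted copies of a distribution yields a copy shifted halfway, rather than the two-bump mixture that pointwise averaging of densities would give.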

PersonaBank: A Corpus of Personal Narratives and Their Story Intention Graphs


Title	PersonaBank: A Corpus of Personal Narratives and Their Story Intention Graphs
Authors	Stephanie M. Lukin, Kevin Bowden, Casey Barackman, Marilyn A. Walker
Abstract	We present a new corpus, PersonaBank, consisting of 108 personal stories from weblogs that have been annotated with their Story Intention Graphs, a deep representation of the fabula of a story. We describe the topics of the stories and the basis of the Story Intention Graph representation, as well as the process of annotating the stories to produce the Story Intention Graphs and the challenges of adapting the tool to this new personal narrative domain. We also discuss how the corpus can be used in applications that retell the story using different styles of tellings, co-tellings, or as a content planner.
Tasks
Published	2017-08-30
URL	http://arxiv.org/abs/1708.09082v1
PDF	http://arxiv.org/pdf/1708.09082v1.pdf
PWC	https://paperswithcode.com/paper/personabank-a-corpus-of-personal-narratives
Repo	https://github.com/ShivamGaurUQ/Automated-hashtag-generation-using-Deep-Learning
Framework	pytorch

Global versus Localized Generative Adversarial Nets


Title	Global versus Localized Generative Adversarial Nets
Authors	Guo-Jun Qi, Liheng Zhang, Hao Hu, Marzieh Edraki, Jingdong Wang, Xian-Sheng Hua
Abstract	In this paper, we present a novel localized Generative Adversarial Net (GAN) to learn on the manifold of real data. Compared with the classic GAN that globally parameterizes a manifold, the Localized GAN (LGAN) uses local coordinate charts to parameterize distinct local geometry of how data points can transform at different locations on the manifold. Specifically, around each point there exists a local generator that can produce data following diverse patterns of transformations on the manifold. The locality nature of LGAN enables local generators to adapt to and directly access the local geometry without the need to invert the generator in a global GAN. Furthermore, it can prevent the manifold from being locally collapsed to a dimensionally deficient tangent subspace by imposing an orthonormality prior between tangents. This provides a geometric approach to alleviating mode collapse at least locally on the manifold by imposing independence between data transformations in different tangent directions. We will also demonstrate that the LGAN can be applied to train a robust classifier that prefers locally consistent classification decisions on the manifold, and the resultant regularizer is closely related to the Laplace-Beltrami operator. Our experiments show that the proposed LGANs can not only produce diverse image transformations, but also deliver superior classification performance.
Tasks
Published	2017-11-16
URL	http://arxiv.org/abs/1711.06020v2
PDF	http://arxiv.org/pdf/1711.06020v2.pdf
PWC	https://paperswithcode.com/paper/global-versus-localized-generative
Repo	https://github.com/z331565360/Localized-GAN
Framework	pytorch

Practical Hash Functions for Similarity Estimation and Dimensionality Reduction


Title	Practical Hash Functions for Similarity Estimation and Dimensionality Reduction
Authors	Søren Dahlgaard, Mathias Bæk Tejs Knudsen, Mikkel Thorup
Abstract	Hashing is a basic tool for dimensionality reduction employed in several aspects of machine learning. However, the performance analysis is often carried out under the abstract assumption that a truly random unit cost hash function is used, without concern for which concrete hash function is employed. The concrete hash function may work fine on sufficiently random input. The question is if it can be trusted in the real world when faced with more structured input. In this paper we focus on two prominent applications of hashing, namely similarity estimation with the one permutation hashing (OPH) scheme of Li et al. [NIPS’12] and feature hashing (FH) of Weinberger et al. [ICML’09], both of which have found numerous applications, e.g. in approximate near-neighbour search with LSH and large-scale classification with SVM. We consider mixed tabulation hashing of Dahlgaard et al. [FOCS’15] which was proved to perform like a truly random hash function in many applications, including OPH. Here we first show improved concentration bounds for FH with truly random hashing and then argue that mixed tabulation performs similarly for sparse input. Our main contribution, however, is an experimental comparison of different hashing schemes when used inside FH, OPH, and LSH. We find that mixed tabulation hashing is almost as fast as the multiply-mod-prime scheme ax+b mod p. Multiply-mod-prime is guaranteed to work well on sufficiently random data, but we demonstrate that in the above applications, it can lead to bias and poor concentration on both real-world and synthetic data. We also compare with the popular MurmurHash3, which has no proven guarantees. Mixed tabulation and MurmurHash3 both perform similarly to truly random hashing in our experiments. However, mixed tabulation is 40% faster than MurmurHash3, and it has the proven guarantee of good performance on all possible input.
Tasks	Dimensionality Reduction
Published	2017-11-23
URL	http://arxiv.org/abs/1711.08797v1
PDF	http://arxiv.org/pdf/1711.08797v1.pdf
PWC	https://paperswithcode.com/paper/practical-hash-functions-for-similarity
Repo	https://github.com/zera/Nips_MT
Framework	none
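
As a rough illustration of two ingredients named in the abstract, the sketch below plugs the multiply-mod-prime scheme (ax+b mod p) into feature hashing. Everything beyond those two ideas is an assumption for the example: the Mersenne prime 2^61 - 1 as modulus, integer feature IDs as keys, and the fixed multipliers, which would be drawn at random in practice. Mixed tabulation, the scheme the paper favors, is not implemented here.

```python
P = (1 << 61) - 1  # a Mersenne prime, used here as the modulus p


def make_multiply_mod_prime(a, b, m):
    """Return h(x) = ((a*x + b) mod p) mod m for integer keys x."""
    def h(x):
        return ((a * x + b) % P) % m
    return h


def feature_hash(features, dim, index_hash, sign_hash):
    """Project a sparse {feature_id: value} map into a dense vector of size dim.

    index_hash picks the output coordinate; sign_hash (with range {0, 1})
    picks a sign so that colliding features cancel in expectation.
    """
    out = [0.0] * dim
    for key, value in features.items():
        sign = 1.0 if sign_hash(key) == 1 else -1.0
        out[index_hash(key)] += sign * value
    return out
```

Swapping `index_hash` and `sign_hash` for another family is the only change needed to compare schemes, which mirrors the kind of experiment the abstract describes.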

End-to-end Trained CNN Encode-Decoder Networks for Image Steganography


Title	End-to-end Trained CNN Encode-Decoder Networks for Image Steganography
Authors	Atique ur Rehman, Rafia Rahim, M Shahroz Nadeem, Sibt ul Hussain
Abstract	All the existing image steganography methods use manually crafted features to hide binary payloads into cover images. This leads to small payload capacity and image distortion. Here we propose a convolutional neural network based encoder-decoder architecture for embedding of images as payload. To this end, we make the following three major contributions: (i) we propose a deep learning based generic encoder-decoder architecture for image steganography; (ii) we introduce a new loss function that ensures joint end-to-end training of encoder-decoder networks; (iii) we perform extensive empirical evaluation of the proposed architecture on a range of challenging publicly available datasets (MNIST, CIFAR10, PASCAL-VOC12, ImageNet, LFW) and report state-of-the-art payload capacity at high PSNR and SSIM values.
Tasks	Image Steganography
Published	2017-11-20
URL	http://arxiv.org/abs/1711.07201v1
PDF	http://arxiv.org/pdf/1711.07201v1.pdf
PWC	https://paperswithcode.com/paper/end-to-end-trained-cnn-encode-decoder
Repo	https://github.com/saadzia10/Steganography-Deep-Learning
Framework	tf

Efficient Processing of Deep Neural Networks: A Tutorial and Survey


Title	Efficient Processing of Deep Neural Networks: A Tutorial and Survey
Authors	Vivienne Sze, Yu-Hsin Chen, Tien-Ju Yang, Joel Emer
Abstract	Deep neural networks (DNNs) are currently widely used for many artificial intelligence (AI) applications including computer vision, speech recognition, and robotics. While DNNs deliver state-of-the-art accuracy on many AI tasks, it comes at the cost of high computational complexity. Accordingly, techniques that enable efficient processing of DNNs to improve energy efficiency and throughput without sacrificing application accuracy or increasing hardware cost are critical to the wide deployment of DNNs in AI systems. This article aims to provide a comprehensive tutorial and survey about the recent advances towards the goal of enabling efficient processing of DNNs. Specifically, it will provide an overview of DNNs, discuss various hardware platforms and architectures that support DNNs, and highlight key trends in reducing the computation cost of DNNs either solely via hardware design changes or via joint hardware design and DNN algorithm changes. It will also summarize various development resources that enable researchers and practitioners to quickly get started in this field, and highlight important benchmarking metrics and design considerations that should be used for evaluating the rapidly growing number of DNN hardware designs, optionally including algorithmic co-designs, being proposed in academia and industry. The reader will take away the following concepts from this article: understand the key design considerations for DNNs; be able to evaluate different DNN hardware implementations with benchmarks and comparison metrics; understand the trade-offs between various hardware architectures and platforms; be able to evaluate the utility of various DNN design techniques for efficient processing; and understand recent implementation trends and opportunities.
Tasks	Speech Recognition
Published	2017-03-27
URL	http://arxiv.org/abs/1703.09039v2
PDF	http://arxiv.org/pdf/1703.09039v2.pdf
PWC	https://paperswithcode.com/paper/efficient-processing-of-deep-neural-networks
Repo	https://github.com/Dhananjayadmd/DNN_MP
Framework	none

A General and Adaptive Robust Loss Function


Title	A General and Adaptive Robust Loss Function
Authors	Jonathan T. Barron
Abstract	We present a generalization of the Cauchy/Lorentzian, Geman-McClure, Welsch/Leclerc, generalized Charbonnier, Charbonnier/pseudo-Huber/L1-L2, and L2 loss functions. By introducing robustness as a continuous parameter, our loss function allows algorithms built around robust loss minimization to be generalized, which improves performance on basic vision tasks such as registration and clustering. Interpreting our loss as the negative log of a univariate density yields a general probability distribution that includes normal and Cauchy distributions as special cases. This probabilistic interpretation enables the training of neural networks in which the robustness of the loss automatically adapts itself during training, which improves performance on learning-based tasks such as generative image synthesis and unsupervised monocular depth estimation, without requiring any manual parameter tuning.
Tasks	Depth Estimation, Image Generation, Monocular Depth Estimation
Published	2017-01-11
URL	http://arxiv.org/abs/1701.03077v10
PDF	http://arxiv.org/pdf/1701.03077v10.pdf
PWC	https://paperswithcode.com/paper/a-general-and-adaptive-robust-loss-function
Repo	https://github.com/jonbarron/robust_loss_pytorch
Framework	pytorch
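
The idea of treating robustness as a continuous parameter can be sketched in a few lines. The function below is a scalar sketch of the general loss as commonly written, with shape parameter `alpha` and scale `c`; the special-case branches handle the values of `alpha` where the general expression is singular. It is not the author's implementation (see the linked repository for that), and the adaptive variant in which `alpha` is learned during training is not shown.

```python
import math


def general_robust_loss(x, alpha, c=1.0):
    """Scalar robust loss with shape parameter alpha and scale c > 0.

    alpha = 2    -> L2 (quadratic) loss
    alpha = 1    -> Charbonnier / pseudo-Huber loss
    alpha = 0    -> Cauchy / Lorentzian loss
    alpha = -2   -> Geman-McClure loss
    alpha = -inf -> Welsch / Leclerc loss
    """
    z = (x / c) ** 2
    if alpha == 2.0:
        return 0.5 * z
    if alpha == 0.0:
        return math.log(0.5 * z + 1.0)
    if alpha == -math.inf:
        return 1.0 - math.exp(-0.5 * z)
    b = abs(alpha - 2.0)
    return (b / alpha) * ((z / b + 1.0) ** (alpha / 2.0) - 1.0)
```

Lowering `alpha` flattens the loss for large residuals, so outliers contribute less; that single knob is what lets one formula cover the whole family listed in the abstract.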

A Random Matrix Approach to Neural Networks


Title	A Random Matrix Approach to Neural Networks
Authors	Cosme Louart, Zhenyu Liao, Romain Couillet
Abstract	This article studies the Gram random matrix model $G=\frac1T\Sigma^{\rm T}\Sigma$, $\Sigma=\sigma(WX)$, classically found in the analysis of random feature maps and random neural networks, where $X=[x_1,\ldots,x_T]\in{\mathbb R}^{p\times T}$ is a (data) matrix of bounded norm, $W\in{\mathbb R}^{n\times p}$ is a matrix of independent zero-mean unit variance entries, and $\sigma:{\mathbb R}\to{\mathbb R}$ is a Lipschitz continuous (activation) function, with $\sigma(WX)$ understood entry-wise. By means of a key concentration of measure lemma arising from non-asymptotic random matrix arguments, we prove that, as $n,p,T$ grow large at the same rate, the resolvent $Q=(G+\gamma I_T)^{-1}$, for $\gamma>0$, has a similar behavior as that met in sample covariance matrix models, involving notably the moment $\Phi=\frac{T}n{\mathbb E}[G]$, which provides in passing a deterministic equivalent for the empirical spectral measure of $G$. Application-wise, this result enables the estimation of the asymptotic performance of single-layer random neural networks. This in turn provides practical insights into the underlying mechanisms at play in random neural networks, entailing several unexpected consequences, as well as a fast practical means to tune the network hyperparameters.
Tasks
Published	2017-02-17
URL	http://arxiv.org/abs/1702.05419v2
PDF	http://arxiv.org/pdf/1702.05419v2.pdf
PWC	https://paperswithcode.com/paper/a-random-matrix-approach-to-neural-networks
Repo	https://github.com/Zhenyu-LIAO/RMT4ELM
Framework	none
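
For readers who want to see the object under study, the following sketch builds the Gram matrix $G=\frac1T\Sigma^{\rm T}\Sigma$ with $\Sigma=\sigma(WX)$ for a small data matrix. The choices of ReLU as the Lipschitz activation, standard Gaussian entries for $W$, the fixed seed and the function names are assumptions for the illustration; the resolvent $Q$ and the asymptotic analysis are not reproduced.

```python
import random


def relu(v):
    """A Lipschitz activation, applied entry-wise (an assumed choice of sigma)."""
    return max(0.0, v)


def gram_matrix(X, n, sigma=relu, seed=0):
    """Return the T x T matrix G = (1/T) * Sigma^T Sigma with Sigma = sigma(W X).

    X is a p x T data matrix given as a list of p rows; W is n x p with
    independent zero-mean, unit-variance (here Gaussian) entries.
    """
    rng = random.Random(seed)
    p, T = len(X), len(X[0])
    W = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    # Sigma[i][t] = sigma(sum_k W[i][k] * X[k][t]), an n x T matrix.
    Sigma = [
        [sigma(sum(W[i][k] * X[k][t] for k in range(p))) for t in range(T)]
        for i in range(n)
    ]
    return [
        [sum(Sigma[i][s] * Sigma[i][t] for i in range(n)) / T for t in range(T)]
        for s in range(T)
    ]
```

The output is symmetric and positive semi-definite by construction, so $G+\gamma I_T$ is invertible for any $\gamma>0$, which is why the resolvent in the abstract is well defined.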

Sample-Efficient Algorithms for Recovering Structured Signals from Magnitude-Only Measurements


Title	Sample-Efficient Algorithms for Recovering Structured Signals from Magnitude-Only Measurements
Authors	Gauri Jagatap, Chinmay Hegde
Abstract	We consider the problem of recovering a signal $\mathbf{x}^* \in \mathbf{R}^n$, from magnitude-only measurements $y_i = |\left\langle\mathbf{a}_i,\mathbf{x}^*\right\rangle|$ for $i=[m]$. Also called phase retrieval, this is a fundamental challenge in bio- and astronomical imaging and speech processing. The problem above is ill-posed; additional assumptions on the signal and/or the measurements are necessary. In this paper we first study the case where the signal $\mathbf{x}^*$ is $s$-sparse. We develop a novel algorithm that we call Compressive Phase Retrieval with Alternating Minimization, or CoPRAM. Our algorithm is simple; it combines the classical alternating minimization approach for phase retrieval with the CoSaMP algorithm for sparse recovery. Despite its simplicity, we prove that CoPRAM achieves a sample complexity of $O(s^2\log n)$ with Gaussian measurements $\mathbf{a}_i$, matching the best known existing results; moreover, it demonstrates linear convergence in theory and practice. Additionally, it requires no extra tuning parameters other than signal sparsity $s$ and is robust to noise. When the sorted coefficients of the sparse signal exhibit a power law decay, we show that CoPRAM achieves a sample complexity of $O(s\log n)$, which is close to the information-theoretic limit. We also consider the case where the signal $\mathbf{x}^*$ arises from structured sparsity models. We specifically examine the case of block-sparse signals with uniform block size of $b$ and block sparsity $k=s/b$. For this problem, we design a recovery algorithm Block CoPRAM that further reduces the sample complexity to $O(ks\log n)$. For sufficiently large block lengths of $b=\Theta(s)$, this bound equates to $O(s\log n)$. To our knowledge, this constitutes the first end-to-end algorithm for phase retrieval where the Gaussian sample complexity has a sub-quadratic dependence on the signal sparsity level.
Tasks
Published	2017-05-18
URL	http://arxiv.org/abs/1705.06412v2
PDF	http://arxiv.org/pdf/1705.06412v2.pdf
PWC	https://paperswithcode.com/paper/sample-efficient-algorithms-for-recovering
Repo	https://github.com/Jay-Lewis/phase_retrieval
Framework	none
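
The measurement model and the reason the problem is ill-posed can be shown directly. The sketch below computes magnitude-only measurements and includes a hard-thresholding helper of the kind sparse-recovery methods use to enforce $s$-sparsity. It does not implement CoPRAM or CoSaMP; the function names are hypothetical.

```python
def magnitude_measurements(A, x):
    """Return y_i = |<a_i, x>| for each row a_i of the measurement matrix A."""
    return [abs(sum(a_j * x_j for a_j, x_j in zip(row, x))) for row in A]


def keep_top_s(x, s):
    """Zero out all but the s largest-magnitude entries of x (hard thresholding)."""
    order = sorted(range(len(x)), key=lambda j: abs(x[j]), reverse=True)
    keep = set(order[:s])
    return [v if j in keep else 0.0 for j, v in enumerate(x)]
```

Since `magnitude_measurements(A, x)` and `magnitude_measurements(A, [-v for v in x])` coincide, the signal can only ever be recovered up to a global sign, and further structure such as sparsity is what makes recovery from few measurements possible.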

Coordinating Filters for Faster Deep Neural Networks


Title	Coordinating Filters for Faster Deep Neural Networks
Authors	Wei Wen, Cong Xu, Chunpeng Wu, Yandan Wang, Yiran Chen, Hai Li
Abstract	Very large-scale Deep Neural Networks (DNNs) have achieved remarkable successes in a large variety of computer vision tasks. However, the high computation intensity of DNNs makes it challenging to deploy these models on resource-limited systems. Some studies used low-rank approaches that approximate the filters by low-rank basis to accelerate the testing. Those works directly decomposed the pre-trained DNNs by Low-Rank Approximations (LRA). How to train DNNs toward lower-rank space for more efficient DNNs, however, remains an open area. To solve the issue, in this work, we propose Force Regularization, which uses attractive forces to enforce filters so as to coordinate more weight information into lower-rank space. We mathematically and empirically verify that after applying our technique, standard LRA methods can reconstruct filters using much lower basis and thus result in faster DNNs. The effectiveness of our approach is comprehensively evaluated in ResNets, AlexNet, and GoogLeNet. In AlexNet, for example, Force Regularization gains 2x speedup on modern GPU without accuracy loss and 4.05x speedup on CPU by paying small accuracy degradation. Moreover, Force Regularization better initializes the low-rank DNNs such that the fine-tuning can converge faster toward higher accuracy. The obtained lower-rank DNNs can be further sparsified, proving that Force Regularization can be integrated with state-of-the-art sparsity-based acceleration methods. Source code is available at https://github.com/wenwei202/caffe
Tasks
Published	2017-03-28
URL	http://arxiv.org/abs/1703.09746v3
PDF	http://arxiv.org/pdf/1703.09746v3.pdf
PWC	https://paperswithcode.com/paper/coordinating-filters-for-faster-deep-neural
Repo	https://github.com/Lanselott/FM_caffe
Framework	none

PacGAN: The power of two samples in generative adversarial networks


Title	PacGAN: The power of two samples in generative adversarial networks
Authors	Zinan Lin, Ashish Khetan, Giulia Fanti, Sewoong Oh
Abstract	Generative adversarial networks (GANs) are innovative techniques for learning generative models of complex data distributions from samples. Despite remarkable recent improvements in generating realistic images, one of their major shortcomings is the fact that in practice, they tend to produce samples with little diversity, even when trained on diverse datasets. This phenomenon, known as mode collapse, has been the main focus of several recent advances in GANs. Yet there is little understanding of why mode collapse happens and why existing approaches are able to mitigate mode collapse. We propose a principled approach to handling mode collapse, which we call packing. The main idea is to modify the discriminator to make decisions based on multiple samples from the same class, either real or artificially generated. We borrow analysis tools from binary hypothesis testing, in particular the seminal result of Blackwell [Bla53], to prove a fundamental connection between packing and mode collapse. We show that packing naturally penalizes generators with mode collapse, thereby favoring generator distributions with less mode collapse during the training process. Numerical experiments on benchmark datasets suggest that packing provides significant improvements in practice as well.
Tasks
Published	2017-12-12
URL	http://arxiv.org/abs/1712.04086v3
PDF	http://arxiv.org/pdf/1712.04086v3.pdf
PWC	https://paperswithcode.com/paper/pacgan-the-power-of-two-samples-in-generative
Repo	https://github.com/alex98chen/testGAN
Framework	tf
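
The packing idea amounts to a change in what the discriminator receives as input. A minimal sketch, assuming samples are flat feature lists and that packed samples are simply concatenated (the function name and the concatenation layout are assumptions, and no GAN training is shown):

```python
def pack(samples, m):
    """Group consecutive samples into packs of m by concatenating their features.

    All samples in one call must come from the same source (all real or all
    generated), so that each pack carries a single real/fake label.
    """
    if m <= 0 or len(samples) % m != 0:
        raise ValueError("number of samples must be a positive multiple of m")
    return [
        [v for sample in samples[i:i + m] for v in sample]
        for i in range(0, len(samples), m)
    ]
```

A discriminator fed such packs judges m samples jointly, so a generator that keeps emitting near-identical samples is easier to tell apart from real data, whose packs show more variety.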

Robust Unsupervised Domain Adaptation for Neural Networks via Moment Alignment


Title	Robust Unsupervised Domain Adaptation for Neural Networks via Moment Alignment
Authors	Werner Zellinger, Bernhard A. Moser, Thomas Grubinger, Edwin Lughofer, Thomas Natschläger, Susanne Saminger-Platz
Abstract	A novel approach for unsupervised domain adaptation for neural networks is proposed. It relies on metric-based regularization of the learning process. The metric-based regularization aims at domain-invariant latent feature representations by means of maximizing the similarity between domain-specific activation distributions. The proposed metric results from modifying an integral probability metric such that it becomes less translation-sensitive on a polynomial function space. The metric has an intuitive interpretation in the dual space as the sum of differences of higher order central moments of the corresponding activation distributions. Under appropriate assumptions on the input distributions, error minimization is proven for the continuous case. As demonstrated by an analysis of standard benchmark experiments for sentiment analysis, object recognition and digit recognition, the outlined approach is robust regarding parameter changes and achieves higher classification accuracies than comparable approaches. The source code is available at https://github.com/wzell/mann.
Tasks	Domain Adaptation, Object Recognition, Sentiment Analysis, Unsupervised Domain Adaptation
Published	2017-11-16
URL	https://arxiv.org/abs/1711.06114v4
PDF	https://arxiv.org/pdf/1711.06114v4.pdf
PWC	https://paperswithcode.com/paper/robust-unsupervised-domain-adaptation-for
Repo	https://github.com/wzell/cmd
Framework	none
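
The abstract describes the metric as a sum of differences of higher-order central moments of the activation distributions. A plain-Python sketch of an empirical version follows. The scaling of the order-k term by 1/|b-a|^k for activations bounded in [a, b], the default K=5 and the function names are assumptions for the illustration; the authors' repository is the reference implementation.

```python
import math


def _l2(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(u, v)))


def central_moment_discrepancy(X, Y, K=5, a=0.0, b=1.0):
    """Empirical moment-matching distance between two sets of activations.

    X and Y are lists of d-dimensional samples with entries assumed in [a, b].
    The first term compares means; terms k = 2..K compare the coordinate-wise
    k-th central moments.
    """
    span = abs(b - a)
    d = len(X[0])
    mean_x = [sum(row[j] for row in X) / len(X) for j in range(d)]
    mean_y = [sum(row[j] for row in Y) / len(Y) for j in range(d)]
    total = _l2(mean_x, mean_y) / span
    for k in range(2, K + 1):
        ck_x = [sum((row[j] - mean_x[j]) ** k for row in X) / len(X) for j in range(d)]
        ck_y = [sum((row[j] - mean_y[j]) ** k for row in Y) / len(Y) for j in range(d)]
        total += _l2(ck_x, ck_y) / span ** k
    return total
```

Used as a regularizer, this quantity would be computed between source-domain and target-domain hidden activations and added to the task loss, pushing the two activation distributions toward each other.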