January 26, 2020

Paper Group ANR 1347

The intriguing role of module criticality in the generalization of deep networks. Revisiting CycleGAN for semi-supervised segmentation. Dirichlet Variational Autoencoder. GAN2GAN: Generative Noise Learning for Blind Image Denoising with Single Noisy Images. SNDCNN: Self-normalizing deep CNNs with scaled exponential linear units for speech recogniti …

The intriguing role of module criticality in the generalization of deep networks


Title	The intriguing role of module criticality in the generalization of deep networks
Authors	Niladri S. Chatterji, Behnam Neyshabur, Hanie Sedghi
Abstract	We study the phenomenon that some modules of deep neural networks (DNNs) are more critical than others, meaning that rewinding their parameter values back to initialization, while keeping other modules fixed at the trained parameters, results in a large drop in the network’s performance. Our analysis reveals interesting properties of the loss landscape, which lead us to propose a complexity measure, called module criticality, based on the shape of the valleys that connect the initial and final values of the module parameters. We formulate how generalization relates to module criticality and show that this measure is able to explain the superior generalization performance of some architectures over others, whereas earlier measures fail to do so.
Tasks
Published	2019-12-02
URL	https://arxiv.org/abs/1912.00528v3
PDF	https://arxiv.org/pdf/1912.00528v3.pdf
PWC	https://paperswithcode.com/paper/the-intriguing-role-of-module-criticality-in-1
Repo
Framework
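
The rewinding experiment described in this abstract can be illustrated with a minimal sketch, assuming a toy two-layer linear network trained by gradient descent on synthetic data, with each weight matrix treated as one "module" and a straight-line path between a module's initial and trained values. It only traces the loss along that path; the paper's module criticality measure is defined more carefully and is not reproduced here. All names (`loss`, `train`, `rewind_curve`) are hypothetical.

```python
import numpy as np

def loss(params, X, Y):
    """Mean squared error of a toy two-layer linear network Y ~ X @ W1 @ W2."""
    return float(np.mean((X @ params["layer1"] @ params["layer2"] - Y) ** 2))

def train(params, X, Y, lr=0.05, steps=500):
    """Full-batch gradient descent on the toy network; returns new arrays, inputs untouched."""
    p = {k: v.copy() for k, v in params.items()}
    scale = 2.0 / Y.size  # derivative of the mean over all n * d_out squared residuals
    for _ in range(steps):
        H = X @ p["layer1"]
        R = H @ p["layer2"] - Y          # residuals
        g2 = scale * H.T @ R
        g1 = scale * X.T @ (R @ p["layer2"].T)
        p["layer1"] -= lr * g1
        p["layer2"] -= lr * g2
    return p

def rewind_curve(init, final, module, X, Y, alphas):
    """Loss as one module moves from its trained value (alpha=0) back to its
    initial value (alpha=1), with every other module held at its trained value."""
    curve = []
    for a in alphas:
        p = dict(final)  # shallow copy: other modules stay at trained values
        p[module] = (1.0 - a) * final[module] + a * init[module]
        curve.append(loss(p, X, Y))
    return curve

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
Y = X @ rng.normal(size=(5, 3))
init = {"layer1": 0.1 * rng.normal(size=(5, 4)), "layer2": 0.1 * rng.normal(size=(4, 3))}
final = train(init, X, Y)
alphas = np.linspace(0.0, 1.0, 11)
curves = {m: rewind_curve(init, final, m, X, Y, alphas) for m in final}
```

A module whose curve rises steeply as `alpha` approaches 1 is "critical" in the abstract's sense; a flat curve means that module can be rewound with little loss in performance.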

Revisiting CycleGAN for semi-supervised segmentation


Title	Revisiting CycleGAN for semi-supervised segmentation
Authors	Arnab Kumar Mondal, Aniket Agarwal, Jose Dolz, Christian Desrosiers
Abstract	In this work, we study the problem of training deep networks for semantic image segmentation using only a fraction of annotated images, which may significantly reduce human annotation efforts. In particular, we propose a strategy that exploits the unpaired image style transfer capabilities of CycleGAN in semi-supervised segmentation. Unlike recent works using adversarial learning for semi-supervised segmentation, we enforce cycle consistency to learn a bidirectional mapping between unpaired images and segmentation masks. This adds an unsupervised regularization effect that boosts segmentation performance when annotated data is limited. Experiments on three different public segmentation benchmarks (PASCAL VOC 2012, Cityscapes and ACDC) demonstrate the effectiveness of the proposed method. The proposed model achieves a 2-4% improvement over the baseline and outperforms recent approaches for this task, particularly in the low labeled-data regime.
Tasks	Semantic Segmentation, Style Transfer
Published	2019-08-30
URL	https://arxiv.org/abs/1908.11569v1
PDF	https://arxiv.org/pdf/1908.11569v1.pdf
PWC	https://paperswithcode.com/paper/revisiting-cyclegan-for-semi-supervised
Repo
Framework
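
The cycle-consistency idea can be sketched as a loss function, assuming a generator `G` from images to masks and `F` from masks to images (hypothetical toy linear maps here, not the paper's networks). The L1 supervised term is a stand-in for a pixel-wise segmentation loss, and the adversarial terms of a full CycleGAN are left out.

```python
import numpy as np

def l1(a, b):
    """Mean absolute difference."""
    return float(np.mean(np.abs(a - b)))

def semi_supervised_cycle_loss(G, F, labeled, images, masks, lam_cyc=1.0):
    """Supervised loss on labeled pairs plus cycle consistency on unpaired images and masks."""
    x_l, y_l = labeled
    sup = l1(G(x_l), y_l)                 # stand-in for e.g. cross-entropy on labeled data
    cyc_img = l1(F(G(images)), images)    # x -> G(x) -> F(G(x)) should return to x
    cyc_mask = l1(G(F(masks)), masks)     # y -> F(y) -> G(F(y)) should return to y
    return sup + lam_cyc * (cyc_img + cyc_mask)

A = np.array([[2.0, 0.0], [0.0, 0.5]])

def G(x):
    return x @ A  # toy image -> mask map

def F_inverse(y):
    return y @ np.linalg.inv(A)  # exact inverse of G: cycle terms vanish

def F_identity(y):
    return y  # not an inverse of G: cycle terms are positive

rng = np.random.default_rng(0)
x_lab = rng.normal(size=(4, 2))
labeled = (x_lab, G(x_lab))
images, masks = rng.normal(size=(8, 2)), rng.normal(size=(8, 2))
good = semi_supervised_cycle_loss(G, F_inverse, labeled, images, masks)
bad = semi_supervised_cycle_loss(G, F_identity, labeled, images, masks)
```

The unpaired `images` and `masks` contribute only through the cycle terms, which is how unlabeled data regularizes the segmenter `G`.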

Dirichlet Variational Autoencoder


Title	Dirichlet Variational Autoencoder
Authors	Weonyoung Joo, Wonsung Lee, Sungrae Park, Il-Chul Moon
Abstract	This paper proposes the Dirichlet Variational Autoencoder (DirVAE), which uses a Dirichlet prior for a continuous latent variable that exhibits the characteristics of categorical probabilities. To infer the parameters of DirVAE, we use the stochastic gradient method by approximating the Gamma distribution, a component of the Dirichlet distribution, with the inverse Gamma CDF approximation. Additionally, we reframe the component collapsing issue by investigating two sources of the problem, decoder weight collapsing and latent value collapsing, and we show that DirVAE has no component collapsing, while the Gaussian VAE exhibits decoder weight collapsing and the Stick-Breaking VAE shows latent value collapsing. The experimental results show that 1) DirVAE models the latent representation with the best log-likelihood compared to the baselines; and 2) DirVAE produces more interpretable latent values, free of the collapsing issues that the baseline models suffer from. We also show that the latent representation learned by DirVAE achieves the best classification accuracy in the semi-supervised and supervised classification tasks on MNIST, OMNIGLOT, and SVHN compared to the baseline VAEs. Finally, we demonstrate that DirVAE-augmented topic models perform better in most cases.
Tasks	Omniglot, Topic Models
Published	2019-01-09
URL	http://arxiv.org/abs/1901.02739v1
PDF	http://arxiv.org/pdf/1901.02739v1.pdf
PWC	https://paperswithcode.com/paper/dirichlet-variational-autoencoder
Repo
Framework
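
The inverse Gamma CDF approximation mentioned above can be sketched as follows. Near zero, the Gamma($\alpha$, $\beta$) CDF behaves like $(\beta x)^\alpha / (\alpha\Gamma(\alpha))$, so a uniform draw $u$ maps to $x \approx \beta^{-1}(u\,\alpha\,\Gamma(\alpha))^{1/\alpha}$, and normalizing independent Gamma draws gives a Dirichlet sample that is a differentiable function of $\alpha$. This minimal NumPy sketch assumes $\beta = 1$ (a common rate cancels in the normalization); the function name is hypothetical, and the approximation is mainly accurate for small $\alpha$.

```python
import math
import numpy as np

def sample_dirichlet_approx(alpha, rng):
    """Approximate Dirichlet(alpha) sample via the small-shape inverse Gamma CDF."""
    alpha = np.asarray(alpha, dtype=float)
    u = rng.uniform(size=alpha.shape)
    lgam = np.array([math.lgamma(a) for a in alpha])
    # log of (u * alpha * Gamma(alpha)) ** (1 / alpha), i.e. an approximate Gamma(alpha_k, 1) draw
    log_x = (np.log(u) + np.log(alpha) + lgam) / alpha
    log_x -= log_x.max()  # numerical stabilization; cancels in the normalization
    x = np.exp(log_x)
    return x / x.sum()

rng = np.random.default_rng(0)
z = sample_dirichlet_approx([0.3, 0.5, 0.8], rng)
```

Because `z` depends on `alpha` only through smooth operations once `u` is fixed, gradients can flow from a decoder loss back into encoder-predicted concentration parameters.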

GAN2GAN: Generative Noise Learning for Blind Image Denoising with Single Noisy Images

Title	GAN2GAN: Generative Noise Learning for Blind Image Denoising with Single Noisy Images
Authors	Sungmin Cha, Taesup Moon
Abstract	We tackle a challenging blind image denoising problem in which only single, distinct noisy images are available for training a denoiser, and nothing is known about the noise except that it is zero-mean, additive, and independent of the clean image. In such a setting, which often occurs in practice, it is not possible to train a denoiser with standard discriminative training or with the recently developed Noise2Noise (N2N) training: the former requires the underlying clean image for each noisy image, and the latter requires a pair of independently realized noisy images of the same clean image. To that end, we propose the GAN2GAN (Generated-Artificial-Noise to Generated-Artificial-Noise) method, which first learns a generative model that can synthesize noisy image pairs by simulating independent realizations of the noise in the given single noisy images, and then iteratively trains a denoiser with those synthesized pairs, as in N2N training. In our results, the denoiser trained with GAN2GAN in the blind denoising setting achieves impressive performance: it nearly matches the standard discriminatively trained and N2N-trained models, which have access to more information than ours, and significantly outperforms both the recent baseline for the same setting, Noise2Void, and a more conventional yet strong one, BM3D.
Tasks	Denoising, Image Denoising
Published	2019-05-25
URL	https://arxiv.org/abs/1905.10488v2
PDF	https://arxiv.org/pdf/1905.10488v2.pdf
PWC	https://paperswithcode.com/paper/gan2gan-generative-noise-learning-for-blind
Repo
Framework
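
GAN2GAN relies on the Noise2Noise principle: when the noise is zero-mean and independent across two copies, regressing one noisy copy onto the other has the same minimizer as regressing onto the clean image. A minimal sketch of that principle only, assuming Gaussian signal and noise and a one-parameter denoiser $\hat{x} = w y$ (not the paper's generative model or network):

```python
import numpy as np

def fit_scalar_denoiser(noisy_input, target):
    """Least-squares w minimizing mean((w * noisy_input - target) ** 2)."""
    return float(np.dot(noisy_input, target) / np.dot(noisy_input, noisy_input))

rng = np.random.default_rng(0)
n = 200_000
clean = rng.normal(0.0, 1.0, n)            # signal variance s^2 = 1
noisy1 = clean + rng.normal(0.0, 1.0, n)   # two independent zero-mean noise realizations,
noisy2 = clean + rng.normal(0.0, 1.0, n)   # noise variance sigma^2 = 1

w_supervised = fit_scalar_denoiser(noisy1, clean)   # requires the clean signal
w_n2n = fit_scalar_denoiser(noisy1, noisy2)         # requires only a noisy pair
```

Both estimates approach $s^2/(s^2+\sigma^2) = 0.5$, because $E[y_1 y_2] = E[y_1 x] = s^2$; the second never sees a clean sample. GAN2GAN's contribution is synthesizing such pairs when only one noisy copy exists.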

SNDCNN: Self-normalizing deep CNNs with scaled exponential linear units for speech recognition


Title	SNDCNN: Self-normalizing deep CNNs with scaled exponential linear units for speech recognition
Authors	Zhen Huang, Tim Ng, Leo Liu, Henry Mason, Xiaodan Zhuang, Daben Liu
Abstract	Very deep CNNs achieve state-of-the-art results in both computer vision and speech recognition, but are difficult to train. The most popular way to train very deep CNNs is to use shortcut connections (SC) together with batch normalization (BN). Inspired by Self-Normalizing Neural Networks, we propose the self-normalizing deep CNN (SNDCNN) based acoustic model topology, obtained by removing the SC/BN and replacing the typical ReLU activations with scaled exponential linear units (SELU) in ResNet-50. SELU activations make the network self-normalizing and remove the need for both shortcut connections and batch normalization. Compared to ResNet-50, we achieve the same or lower (up to 4.5% relative) word error rate (WER) while boosting both training and inference speed by 60%-80%. We also explore other model inference optimization schemes to further reduce latency for production use.
Tasks	Speech Recognition
Published	2019-10-04
URL	https://arxiv.org/abs/1910.01992v3
PDF	https://arxiv.org/pdf/1910.01992v3.pdf
PWC	https://paperswithcode.com/paper/sndcnn-self-normalizing-deep-cnns-with-scaled
Repo
Framework
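
SELU is $\lambda x$ for $x > 0$ and $\lambda\alpha(e^x - 1)$ otherwise, with constants chosen (Klambauer et al., 2017) so that zero-mean, unit-variance inputs map to zero-mean, unit-variance outputs. A minimal NumPy sketch, including a deep stack of plain SELU layers without shortcut connections or batch normalization; the LeCun-normal weight initialization (variance 1/fan-in) is an assumption of this sketch.

```python
import numpy as np

# Constants from "Self-Normalizing Neural Networks" (Klambauer et al., 2017).
SELU_ALPHA = 1.6732632423543772
SELU_LAMBDA = 1.0507009873554805

def selu(x):
    x = np.asarray(x, dtype=float)
    # np.minimum keeps expm1 from overflowing on large positive inputs (np.where evaluates both branches)
    return SELU_LAMBDA * np.where(x > 0, x, SELU_ALPHA * np.expm1(np.minimum(x, 0.0)))

rng = np.random.default_rng(0)
z = selu(rng.standard_normal(1_000_000))  # standard normal in, roughly standard out

# Deep stack with no SC/BN: activation statistics stay near mean 0, variance 1.
width, depth = 256, 20
h = rng.standard_normal((1000, width))
for _ in range(depth):
    W = rng.normal(0.0, 1.0 / np.sqrt(width), size=(width, width))
    h = selu(h @ W)
```

With ReLU in place of `selu`, the same stack would have non-zero-mean activations, which is part of why ResNet-style models lean on batch normalization.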

LISA: Towards Learned DNA Sequence Search


Title	LISA: Towards Learned DNA Sequence Search
Authors	Darryl Ho, Jialin Ding, Sanchit Misra, Nesime Tatbul, Vikram Nathan, Vasimuddin Md, Tim Kraska
Abstract	Next-generation sequencing (NGS) technologies have enabled affordable sequencing of billions of short DNA fragments at high throughput, paving the way for population-scale genomics. Genomics data analytics at this scale requires overcoming performance bottlenecks, such as searching for short DNA sequences over long reference sequences. In this paper, we introduce LISA (Learned Indexes for Sequence Analysis), a novel learning-based approach to DNA sequence search. As a first proof of concept, we focus on accelerating one of the most essential flavors of the problem, called exact search. LISA builds on and extends the FM-index, which is the state-of-the-art technique widely deployed in genomics tool-chains. Initial experiments with human genome datasets indicate that LISA achieves up to a 4x speedup over its traditional counterpart.
Tasks
Published	2019-10-10
URL	https://arxiv.org/abs/1910.04728v1
PDF	https://arxiv.org/pdf/1910.04728v1.pdf
PWC	https://paperswithcode.com/paper/lisa-towards-learned-dna-sequence-search
Repo
Framework
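
LISA builds on the FM-index, whose exact-match query is backward search over the Burrows-Wheeler transform (BWT). The sketch below shows only that classic baseline, with a naive quadratic-time construction for readability; LISA's learned-index component is not modeled, and all function names are hypothetical.

```python
def bwt_and_sa(text):
    """BWT and suffix array via naive suffix sorting (fine for small examples only)."""
    text += "$"  # unique sentinel, lexicographically smaller than A, C, G, T
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)  # text[-1] is the sentinel when i == 0
    return bwt, sa

def build_fm_index(text):
    bwt, sa = bwt_and_sa(text)
    alphabet = sorted(set(bwt))
    # C[c]: number of characters in the text that are strictly smaller than c
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += bwt.count(c)
    # occ[c][i]: occurrences of c in bwt[:i]
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return bwt, sa, C, occ

def backward_search(pattern, fm):
    """Sorted start positions of exact matches, narrowing a suffix-array interval right to left."""
    bwt, sa, C, occ = fm
    lo, hi = 0, len(bwt)  # half-open interval [lo, hi) of suffix-array rows
    for c in reversed(pattern):
        if c not in C:
            return []
        lo = C[c] + occ[c][lo]
        hi = C[c] + occ[c][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])

fm = build_fm_index("ACGTACGTAC")
```

Each pattern character costs two rank (`occ`) lookups, which is the kind of step a learned index could try to replace or shortcut.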

Attending to Future Tokens For Bidirectional Sequence Generation


Title	Attending to Future Tokens For Bidirectional Sequence Generation
Authors	Carolin Lawrence, Bhushan Kotnis, Mathias Niepert
Abstract	Neural sequence generation is typically performed token by token and left to right. Whenever a token is generated, only previously produced tokens are taken into consideration. In contrast, for problems such as sequence classification, bidirectional attention, which takes both past and future tokens into consideration, has been shown to perform much better. We propose to make the sequence generation process bidirectional by employing special placeholder tokens. Treated as a node in a fully connected graph, a placeholder token can take past and future tokens into consideration when generating the actual output token. We verify the effectiveness of our approach experimentally on two conversational tasks, where the proposed bidirectional model outperforms competitive baselines by a large margin.
Tasks
Published	2019-08-16
URL	https://arxiv.org/abs/1908.05915v2
PDF	https://arxiv.org/pdf/1908.05915v2.pdf
PWC	https://paperswithcode.com/paper/attending-to-future-tokens-for-bidirectional
Repo
Framework
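
Why placeholder tokens help can be seen from the attention mask alone. In this minimal sketch (single head, with queries, keys and values all equal to the input, an assumption made for brevity), a causal mask makes position 0 ignore a change to a later token, while a full mask, as used for placeholders in a fully connected graph, lets it respond.

```python
import numpy as np

def attention(X, mask):
    """Single-head scaled dot-product self-attention with Q = K = V = X."""
    scores = X @ X.T / np.sqrt(X.shape[1])
    scores = np.where(mask, scores, -np.inf)        # disallowed positions get zero weight
    scores -= scores.max(axis=1, keepdims=True)     # the diagonal is always allowed, so this is finite
    w = np.exp(scores)
    w /= w.sum(axis=1, keepdims=True)
    return w @ X

def causal_mask(n):
    return np.tril(np.ones((n, n), dtype=bool))     # attend to self and earlier positions only

def full_mask(n):
    return np.ones((n, n), dtype=bool)              # attend to past and future positions

rng = np.random.default_rng(0)
n, d = 5, 8
X = rng.normal(size=(n, d))
X_future = X.copy()
X_future[-1] = rng.normal(size=d)  # change only the last (future) token
```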

Refined $\alpha$-Divergence Variational Inference via Rejection Sampling


Title	Refined $\alpha$-Divergence Variational Inference via Rejection Sampling
Authors	Rahul Sharma, Abhishek Kumar, Piyush Rai
Abstract	We present an approximate inference method based on a synergistic combination of Rényi $\alpha$-divergence variational inference (RDVI) and rejection sampling (RS). RDVI is based on minimization of the Rényi $\alpha$-divergence $D_\alpha(p\|q)$ between the true distribution $p(x)$ and a variational approximation $q(x)$; RS draws samples from a distribution $p(x) = \tilde{p}(x)/Z_{p}$ using a proposal $q(x)$ such that $Mq(x) \geq \tilde{p}(x), \forall x$. Our inference method is based on a crucial observation that $D_\infty(p\|q)$ equals $\log M(\theta)$, where $M(\theta)$ is the optimal value of the RS constant for a given proposal $q_\theta(x)$. This enables us to develop a two-stage hybrid inference algorithm. Stage 1 performs RDVI to learn $q_\theta$ by minimizing an estimator of $D_\alpha(p\|q)$, and uses the learned $q_\theta$ to find an (approximately) optimal $\tilde{M}(\theta)$. Stage 2 performs RS using the constant $\tilde{M}(\theta)$ to improve the approximate distribution $q_\theta$ and obtain a sample-based approximation. We prove that this two-stage method allows us to learn considerably more accurate approximations of the target distribution than RDVI alone. We demonstrate our method’s efficacy via several experiments on synthetic and real datasets.
Tasks
Published	2019-09-17
URL	https://arxiv.org/abs/1909.07627v3
PDF	https://arxiv.org/pdf/1909.07627v3.pdf
PWC	https://paperswithcode.com/paper/refined-divergence-variational-inference-via
Repo
Framework
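
The two-stage idea can be sketched in 1D, assuming an unnormalized Gaussian target $\tilde{p}$ with mean 1 and unit variance (so $Z_p = \sqrt{2\pi}$) and a wider Gaussian proposal. Stage 1 is simplified here: instead of learning $q_\theta$ by minimizing an $\alpha$-divergence estimator, $q$ is fixed and $\tilde{M}$ is estimated as the largest observed ratio $\tilde{p}/q$. Stage 2 is ordinary rejection sampling with that constant.

```python
import numpy as np

def log_p_tilde(x):
    """Unnormalized target: N(1, 1) without its 1/sqrt(2*pi) factor."""
    return -0.5 * (x - 1.0) ** 2

def log_q(x, sigma=2.0):
    """Normalized proposal density N(0, sigma^2)."""
    return -0.5 * (x / sigma) ** 2 - np.log(sigma * np.sqrt(2.0 * np.pi))

rng = np.random.default_rng(0)

# Stage 1 (simplified stand-in): estimate the RS constant from proposal draws.
probe = rng.normal(0.0, 2.0, 100_000)
log_M = float(np.max(log_p_tilde(probe) - log_q(probe)))

# Stage 2: accept x ~ q with probability p_tilde(x) / (M q(x)).
cand = rng.normal(0.0, 2.0, 200_000)
accept = np.log(rng.uniform(size=cand.size)) < log_p_tilde(cand) - log_q(cand) - log_M
samples = cand[accept]

# With an unnormalized target the constant absorbs Z_p, so subtract log Z_p.
d_inf = log_M - 0.5 * np.log(2.0 * np.pi)
```

For this pair the exact value is $D_\infty(p\|q) = 1/6 + \log 2$, attained at $x = 4/3$, and the acceptance rate is about $Z_p/M \approx 0.42$; a better-fitted $q_\theta$ lowers $M$ and raises that rate.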

A Path Towards Quantum Advantage in Training Deep Generative Models with Quantum Annealers


Title	A Path Towards Quantum Advantage in Training Deep Generative Models with Quantum Annealers
Authors	Walter Vinci, Lorenzo Buffoni, Hossein Sadeghi, Amir Khoshaman, Evgeny Andriyash, Mohammad H. Amin
Abstract	The development of quantum-classical hybrid (QCH) algorithms is critical to achieve state-of-the-art computational models. A QCH variational autoencoder (QVAE) was introduced in Ref. [1] by some of the authors of this paper. QVAE consists of a classical auto-encoding structure realized by traditional deep neural networks to perform inference to, and generation from, a discrete latent space. The latent generative process is formalized as thermal sampling from either a quantum or classical Boltzmann machine (QBM or BM). This setup allows quantum-assisted training of deep generative models by physically simulating the generative process with quantum annealers. In this paper, we have successfully employed D-Wave quantum annealers as Boltzmann samplers to perform quantum-assisted, end-to-end training of QVAE. The hybrid structure of QVAE allows us to deploy current-generation quantum annealers in QCH generative models to achieve competitive performance on datasets such as MNIST. The results presented in this paper suggest that commercially available quantum annealers can be deployed, in conjunction with well-crafted classical deep neural networks, to achieve competitive results in unsupervised and semi-supervised tasks on large-scale datasets. We also provide evidence that our setup is able to exploit large latent-space (Q)BMs, which develop slowly mixing modes. This expressive latent space results in slow and inefficient classical sampling, and paves the way to achieve quantum advantage with quantum annealing in realistic sampling applications.
Tasks
Published	2019-12-04
URL	https://arxiv.org/abs/1912.02119v1
PDF	https://arxiv.org/pdf/1912.02119v1.pdf
PWC	https://paperswithcode.com/paper/a-path-towards-quantum-advantage-in-training
Repo
Framework
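
The classical alternative to annealer-based Boltzmann sampling is Markov chain Monte Carlo. A minimal sketch of block Gibbs sampling for a binary restricted Boltzmann machine, assuming a bipartite topology and the energy stated in the docstring (the QVAE latent model may use a different connectivity); when the energy has well-separated modes, chains like this move between them slowly, which is the regime the abstract points to.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gibbs_sample_rbm(W, b, c, n_steps, n_chains, rng):
    """Block Gibbs sampling from a binary RBM with energy E(v, h) = -b.v - c.h - v.W.h."""
    n_visible, n_hidden = W.shape
    v = (rng.uniform(size=(n_chains, n_visible)) < 0.5).astype(float)
    h = np.zeros((n_chains, n_hidden))
    for _ in range(n_steps):
        # All hidden units are conditionally independent given v, and vice versa.
        h = (rng.uniform(size=(n_chains, n_hidden)) < sigmoid(c + v @ W)).astype(float)
        v = (rng.uniform(size=(n_chains, n_visible)) < sigmoid(b + h @ W.T)).astype(float)
    return v, h

rng = np.random.default_rng(0)
W = rng.normal(0.0, 0.1, size=(6, 4))
v, h = gibbs_sample_rbm(W, np.zeros(6), np.zeros(4), n_steps=50, n_chains=100, rng=rng)

# Sanity check: with no couplings and a strong visible bias, P(v_i = 1) = sigmoid(5) ~ 0.993.
v_biased, _ = gibbs_sample_rbm(np.zeros((6, 4)), 5.0 * np.ones(6), np.zeros(4),
                               n_steps=10, n_chains=2000, rng=rng)
```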

L3 Fusion: Fast Transformed Convolutions on CPUs


Title	L3 Fusion: Fast Transformed Convolutions on CPUs
Authors	Rati Gelashvili, Nir Shavit, Aleksandar Zlateski
Abstract	Fast convolutions via transforms, either Winograd or FFT, have emerged as a preferred way of computing convolutional layers, as they greatly reduce the number of required operations. Recent work shows that, for many layer structures, a well-designed implementation of fast convolutions can make very effective use of modern CPUs, significantly reducing compute time. However, the generous amount of shared L3 cache present on modern CPUs is often neglected, and the algorithms are optimized solely for the private L2 cache. In this paper we propose an efficient L3 Fusion algorithm that is specifically designed for CPUs with a significant amount of shared L3 cache. Using the hierarchical roofline model, we show that in many cases, especially for layers with fewer channels, the L3-fused approach can greatly outperform the standard three-stage approach provided by major vendors such as Intel. We validate our theoretical findings by benchmarking our L3-fused implementation against publicly available state-of-the-art implementations.
Tasks
Published	2019-12-04
URL	https://arxiv.org/abs/1912.02165v1
PDF	https://arxiv.org/pdf/1912.02165v1.pdf
PWC	https://paperswithcode.com/paper/l3-fusion-fast-transformed-convolutions-on
Repo
Framework
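
The operation-count saving from transforms is easiest to see in the smallest Winograd case, F(2,3): two outputs of a 3-tap filter from four multiplications instead of six, once the filter-side transform is computed. A minimal 1D NumPy sketch (it does not model the paper's L3-fused scheduling, which concerns how transformed tiles move through the cache hierarchy; function names are hypothetical):

```python
import numpy as np

def transform_filter(g):
    """Filter-side transform, computed once per filter and reused for every tile."""
    return np.array([g[0], (g[0] + g[1] + g[2]) / 2, (g[0] - g[1] + g[2]) / 2, g[2]])

def winograd_f23_tile(d, u):
    """Two outputs of a 3-tap correlation over a 4-sample tile d."""
    v = np.array([d[0] - d[2], d[1] + d[2], d[2] - d[1], d[1] - d[3]])  # input transform
    m = u * v                                                          # the 4 multiplications
    return np.array([m[0] + m[1] + m[2], m[1] - m[2] - m[3]])          # output transform

def winograd_conv1d(x, g):
    """'Valid' 1D correlation (the CNN 'convolution') of x with a 3-tap filter, tile by tile."""
    n_out = len(x) - 2
    if n_out % 2:
        raise ValueError("this sketch assumes an even number of outputs")
    u = transform_filter(g)
    out = np.empty(n_out)
    for i in range(0, n_out, 2):  # tiles overlap by 2 samples
        out[i:i + 2] = winograd_f23_tile(x[i:i + 4], u)
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=10)
g = rng.normal(size=3)
y = winograd_conv1d(x, g)
```

In 2D the same construction, F(2x2, 3x3), needs 16 multiplications per tile instead of 36; the three stages (input transform, elementwise products, output transform) are the "3 stage" structure the abstract refers to.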

M-BERT: Injecting Multimodal Information in the BERT Structure


Title	M-BERT: Injecting Multimodal Information in the BERT Structure
Authors	Wasifur Rahman, Md Kamrul Hasan, Amir Zadeh, Louis-Philippe Morency, Mohammed Ehsan Hoque
Abstract	Multimodal language analysis is an emerging research area in natural language processing that models language in a multimodal manner. It aims to understand language from the text, visual, and acoustic modalities by modeling both intra-modal and cross-modal interactions. BERT (Bidirectional Encoder Representations from Transformers) provides strong contextual language representations after training on large-scale unlabeled corpora. Fine-tuning the vanilla BERT model has shown promising results in building state-of-the-art models for diverse NLP tasks like question answering and language inference. However, fine-tuning BERT in the presence of information from other modalities remains an open research problem. In this paper, we inject multimodal information within the input space of the BERT network for modeling multimodal language. The proposed injection method allows BERT to reach a new state-of-the-art binary accuracy of 84.38% on the CMU-MOSI dataset (multimodal sentiment analysis), a gap of 5.98 percent over the previous state of the art and 1.02 percent over text-only BERT.
Tasks	Multimodal Sentiment Analysis, Question Answering, Sentiment Analysis
Published	2019-08-15
URL	https://arxiv.org/abs/1908.05787v1
PDF	https://arxiv.org/pdf/1908.05787v1.pdf
PWC	https://paperswithcode.com/paper/m-bert-injecting-multimodal-information-in
Repo
Framework
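
The abstract does not specify the injection mechanism. The sketch below shows one plausible form, stated as an assumption rather than the paper's method: each word embedding is shifted by a gated displacement computed from word-aligned visual and acoustic features, with the shift norm capped relative to the embedding norm. All weight names, shapes, and the capping rule are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def inject_nonverbal(h, v, a, params, beta=1.0, eps=1e-6):
    """Shift word embeddings h (T x d) using visual v (T x dv) and acoustic a (T x da) features."""
    g_v = sigmoid(np.concatenate([h, v], axis=1) @ params["W_gv"])  # gate for the visual stream
    g_a = sigmoid(np.concatenate([h, a], axis=1) @ params["W_ga"])  # gate for the acoustic stream
    shift = g_v * (v @ params["W_v"]) + g_a * (a @ params["W_a"])
    # Cap ||shift|| at beta * ||h|| per token so the text signal is not overwhelmed.
    ratio = beta * np.linalg.norm(h, axis=1, keepdims=True) / (
        np.linalg.norm(shift, axis=1, keepdims=True) + eps)
    return h + np.minimum(ratio, 1.0) * shift

rng = np.random.default_rng(0)
T, d, dv, da = 4, 6, 3, 2
h = rng.normal(size=(T, d))   # stand-in for BERT input embeddings of T tokens
v = rng.normal(size=(T, dv))  # hypothetical word-aligned visual features
a = rng.normal(size=(T, da))  # hypothetical word-aligned acoustic features
params = {
    "W_gv": rng.normal(size=(d + dv, d)),
    "W_ga": rng.normal(size=(d + da, d)),
    "W_v": rng.normal(size=(dv, d)),
    "W_a": rng.normal(size=(da, d)),
}
out = inject_nonverbal(h, v, a, params, beta=0.5)

# With zero nonverbal projections the embeddings pass through unchanged.
silent = dict(params, W_v=np.zeros((dv, d)), W_a=np.zeros((da, d)))
out_silent = inject_nonverbal(h, v, a, silent, beta=0.5)
```

Keeping the output in the embedding space means the rest of a pretrained BERT can consume it unchanged, which is the appeal of injecting information at the input.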

Risk Averse Robust Adversarial Reinforcement Learning


Title	Risk Averse Robust Adversarial Reinforcement Learning
Authors	Xinlei Pan, Daniel Seita, Yang Gao, John Canny
Abstract	Deep reinforcement learning has recently made significant progress in solving computer games and robotic control tasks. A known problem, though, is that policies overfit to the training environment and may not avoid rare, catastrophic events such as automotive accidents. A classical technique for improving the robustness of reinforcement learning algorithms is to train on a set of randomized environments, but this approach only guards against common situations. Recently, robust adversarial reinforcement learning (RARL) was developed, which allows efficient applications of random and systematic perturbations by a trained adversary. A limitation of RARL is that only the expected control objective is optimized; there is no explicit modeling or optimization of risk. Thus the agents do not consider the probability of catastrophic events (i.e., those inducing abnormally large negative reward), except through their effect on the expected objective. In this paper we introduce risk-averse robust adversarial reinforcement learning (RARARL), using a risk-averse protagonist and a risk-seeking adversary. We test our approach on a self-driving vehicle controller. We use an ensemble of policy networks to model risk as the variance of value functions. We show through experiments that a risk-averse agent is better equipped to handle a risk-seeking adversary, and experiences substantially fewer crashes compared to agents trained without an adversary.
Tasks
Published	2019-03-31
URL	http://arxiv.org/abs/1904.00511v1
PDF	http://arxiv.org/pdf/1904.00511v1.pdf
PWC	https://paperswithcode.com/paper/risk-averse-robust-adversarial-reinforcement
Repo
Framework
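
The abstract models risk as the variance of value estimates across an ensemble of policy networks. A minimal sketch of risk-sensitive action selection, assuming a score of ensemble mean minus (risk-averse protagonist) or plus (risk-seeking adversary) $\lambda$ times ensemble variance; each agent would apply it to its own ensemble, and the exact form used in the paper may differ.

```python
import numpy as np

def risk_adjusted_action(q_ensemble, lam, risk_seeking=False):
    """Pick an action from Q-value estimates of shape (n_models, n_actions).

    Risk is the variance across ensemble members: penalized when risk-averse,
    rewarded when risk-seeking.
    """
    mean = q_ensemble.mean(axis=0)
    var = q_ensemble.var(axis=0)
    score = mean + lam * var if risk_seeking else mean - lam * var
    return int(np.argmax(score))

q_ensemble = np.array([
    [1.0, 0.9],
    [1.0, 2.1],
    [1.0, -0.3],
])  # rows: ensemble members; columns: actions (action 1 is uncertain)
cautious = risk_adjusted_action(q_ensemble, lam=1.0)
reckless = risk_adjusted_action(q_ensemble, lam=1.0, risk_seeking=True)
```

Action 0 has a certain value of 1.0, while action 1 has a mean of 0.9 and a variance of 0.96, so the risk-averse score prefers action 0 and the risk-seeking score prefers action 1.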

Spoken Language Identification using ConvNets


Title	Spoken Language Identification using ConvNets
Authors	Sarthak, Shikhar Shukla, Govind Mittal
Abstract	Language Identification (LI) is an important first step in several speech processing systems. With a growing number of voice-based assistants, speech LI has emerged as a widely researched field. To approach the problem of identifying languages, we can adopt either an implicit approach, where only the speech for a language is present, or an explicit one, where the speech is available together with its corresponding transcript. This paper focuses on an implicit approach due to the absence of transcribed data. It benchmarks existing models and proposes a new attention-based model for language identification that uses log-Mel spectrogram images as input. We also examine the effectiveness of raw waveforms as features for neural network models on LI tasks. Using data from the VoxForge dataset for training and evaluation, we classify six languages (English, French, German, Spanish, Russian and Italian) with an accuracy of 95.4% and four languages (English, French, German, Spanish) with an accuracy of 96.3%. This approach can be scaled further to incorporate more languages.
Tasks	Language Identification
Published	2019-10-09
URL	https://arxiv.org/abs/1910.04269v1
PDF	https://arxiv.org/pdf/1910.04269v1.pdf
PWC	https://paperswithcode.com/paper/spoken-language-identification-using-convnets
Repo
Framework
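
The log-Mel spectrogram input mentioned above can be computed in a few steps: frame the waveform, take the power spectrum, pool it with triangular filters spaced evenly on the Mel scale, then take the log. A minimal NumPy sketch; the parameter values (16 kHz audio, 512-point FFT, 10 ms hop, 40 Mel bands) are assumptions, not the paper's settings.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def log_mel_spectrogram(signal, sr=16000, n_fft=512, hop=160, n_mels=40, eps=1e-10):
    """Returns an array of shape (n_frames, n_mels)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft  # (n_frames, n_fft // 2 + 1)

    # Triangular filters whose edges and centers are equally spaced on the Mel scale.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):   # rising edge
            fbank[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling edge, peak of 1 at the center bin
            fbank[m - 1, k] = (right - k) / max(right - center, 1)
    return np.log(power @ fbank.T + eps)  # eps avoids log(0) in silent bands

sr = 16000
t = np.arange(sr) / sr
features = log_mel_spectrogram(np.sin(2 * np.pi * 440.0 * t), sr=sr)  # 1 s of a 440 Hz tone
```

Treated as a 2D image (time by Mel band), this array is the kind of input a convolutional or attention-based classifier can consume directly.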

Evaluation of Seed Set Selection Approaches and Active Learning Strategies in Predictive Coding


Title	Evaluation of Seed Set Selection Approaches and Active Learning Strategies in Predictive Coding
Authors	Christian J. Mahoney, Nathaniel Huber-Fliflet, Haozhen Zhao, Jianping Zhang, Peter Gronvall, Shi Ye
Abstract	Active learning is a popular methodology in text classification, known in the legal domain as “predictive coding”, “Technology Assisted Review” or “TAR”, due to its potential to minimize the review effort required to build effective classifiers. In this study, we use extensive experimentation to examine the impact of popular seed set selection strategies in active learning, within a predictive coding exercise, and evaluate different active learning strategies against well-researched continuous active learning strategies for the purpose of determining efficient training methods for classifying large populations quickly and precisely. We study how random sampling, keyword models and clustering-based seed set selection strategies, combined with top-ranked, uncertain, random, recall-inspired, and hybrid active learning document selection strategies, affect the performance of active learning for predictive coding. We use the percentage of documents requiring review to reach 75% recall as the “benchmark” metric to evaluate and compare our approaches. In most cases we find that seed set selection methods have a minor impact, though they do have a significant impact on lower-richness data sets or when a top-ranked active learning selection strategy is chosen. Our results also show that active learning selection strategies implementing uncertainty, random, or 75% recall selection have the potential to reach the optimum active learning round much earlier than the popular continuous active learning approach (top-ranked selection). The results of our research shed light on the impact of active learning seed set selection strategies and on the effectiveness of the selection strategies for the following learning rounds. Legal practitioners can use the results of this study to enhance the efficiency, precision, and simplicity of their predictive coding process.
Tasks	Active Learning, Text Classification
Published	2019-06-11
URL	https://arxiv.org/abs/1906.04367v1
PDF	https://arxiv.org/pdf/1906.04367v1.pdf
PWC	https://paperswithcode.com/paper/evaluation-of-seed-set-selection-approaches
Repo
Framework
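
The benchmark metric above can be computed directly from the order in which documents are reviewed. A minimal sketch, assuming binary relevance labels listed in review order (the function name is hypothetical):

```python
import numpy as np

def review_percentage_at_recall(ranked_labels, target_recall=0.75):
    """Percentage of the collection reviewed, in order, before reaching target recall.

    ranked_labels: 1 for relevant and 0 for non-relevant, in review order.
    """
    labels = np.asarray(ranked_labels)
    total_relevant = labels.sum()
    if total_relevant == 0:
        raise ValueError("collection has no relevant documents")
    found = np.cumsum(labels)
    # First (1-based) position where found / total_relevant >= target_recall.
    k = int(np.argmax(found >= target_recall * total_relevant)) + 1
    return 100.0 * k / len(labels)

pct = review_percentage_at_recall([1, 0, 1, 1, 0, 0, 1, 0, 0, 0])
```

Here 4 of 10 documents are relevant, and the third relevant one appears at position 4, so reaching 75% recall requires reviewing 40% of the collection. A better selection strategy moves relevant documents earlier and lowers this percentage.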

Beta DVBF: Learning State-Space Models for Control from High Dimensional Observations


Title	Beta DVBF: Learning State-Space Models for Control from High Dimensional Observations
Authors	Neha Das, Maximilian Karl, Philip Becker-Ehmck, Patrick van der Smagt
Abstract	Learning a model of dynamics from high-dimensional images can be a core ingredient for success in many applications across different domains, especially in sequential decision making. However, currently prevailing methods based on latent-variable models are limited to low-resolution images. In this work, we show that some of the issues with using high-dimensional observations arise from the discrepancy between the dimensionality of the latent and observable spaces, and propose solutions to overcome them.
Tasks	Decision Making, Latent Variable Models
Published	2019-11-02
URL	https://arxiv.org/abs/1911.00756v1
PDF	https://arxiv.org/pdf/1911.00756v1.pdf
PWC	https://paperswithcode.com/paper/beta-dvbf-learning-state-space-models-for
Repo
Framework