Paper Group ANR 1347
The intriguing role of module criticality in the generalization of deep networks
Title | The intriguing role of module criticality in the generalization of deep networks |
Authors | Niladri S. Chatterji, Behnam Neyshabur, Hanie Sedghi |
Abstract | We study the phenomenon that some modules of deep neural networks (DNNs) are more critical than others, meaning that rewinding their parameter values back to initialization, while keeping other modules fixed at the trained parameters, results in a large drop in the network’s performance. Our analysis reveals interesting properties of the loss landscape, which lead us to propose a complexity measure, called module criticality, based on the shape of the valleys that connect the initial and final values of the module parameters. We formulate how generalization relates to the module criticality, and show that this measure is able to explain the superior generalization performance of some architectures over others, whereas earlier measures fail to do so. |
Tasks | |
Published | 2019-12-02 |
URL | https://arxiv.org/abs/1912.00528v3 |
https://arxiv.org/pdf/1912.00528v3.pdf | |
PWC | https://paperswithcode.com/paper/the-intriguing-role-of-module-criticality-in-1 |
Repo | |
Framework | |
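The rewinding experiment described in the abstract above can be sketched as follows. This is only a minimal illustration, assuming plain dictionaries stand in for module parameters; `rewind_module`, `performance_drop`, and the toy scoring function are hypothetical names, not the authors' code, and the sketch covers the rewinding test only, not the module criticality measure itself.

```python
from typing import Callable, Dict, List

Params = Dict[str, List[float]]  # module name -> flat list of weights


def rewind_module(trained: Params, initial: Params, module: str) -> Params:
    """Return a copy of `trained` with one module reset to its initial values."""
    rewound = {name: list(weights) for name, weights in trained.items()}
    rewound[module] = list(initial[module])
    return rewound


def performance_drop(
    evaluate: Callable[[Params], float], trained: Params, initial: Params
) -> Dict[str, float]:
    """Score drop caused by rewinding each module in turn (larger = more critical)."""
    base = evaluate(trained)
    return {m: base - evaluate(rewind_module(trained, initial, m)) for m in trained}


# Toy example (hypothetical): the "score" depends only on the first layer's weights.
initial = {"layer1": [0.0, 0.0], "layer2": [0.0, 0.0]}
trained = {"layer1": [1.0, 2.0], "layer2": [0.5, -0.5]}


def toy_score(params: Params) -> float:
    return sum(params["layer1"])


drops = performance_drop(toy_score, trained, initial)
```

In this toy setup only `layer1` affects the score, so rewinding it is the only change that lowers performance.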
Revisiting CycleGAN for semi-supervised segmentation
Title | Revisiting CycleGAN for semi-supervised segmentation |
Authors | Arnab Kumar Mondal, Aniket Agarwal, Jose Dolz, Christian Desrosiers |
Abstract | In this work, we study the problem of training deep networks for semantic image segmentation using only a fraction of annotated images, which may significantly reduce human annotation efforts. Particularly, we propose a strategy that exploits the unpaired image style transfer capabilities of CycleGAN in semi-supervised segmentation. Unlike recent works using adversarial learning for semi-supervised segmentation, we enforce cycle consistency to learn a bidirectional mapping between unpaired images and segmentation masks. This adds an unsupervised regularization effect that boosts the segmentation performance when annotated data is limited. Experiments on three different public segmentation benchmarks (PASCAL VOC 2012, Cityscapes and ACDC) demonstrate the effectiveness of the proposed method. The proposed model achieves a 2-4% improvement with respect to the baseline and outperforms recent approaches for this task, particularly in the low labeled data regime. |
Tasks | Semantic Segmentation, Style Transfer |
Published | 2019-08-30 |
URL | https://arxiv.org/abs/1908.11569v1 |
https://arxiv.org/pdf/1908.11569v1.pdf | |
PWC | https://paperswithcode.com/paper/revisiting-cyclegan-for-semi-supervised |
Repo | |
Framework | |
Dirichlet Variational Autoencoder
Title | Dirichlet Variational Autoencoder |
Authors | Weonyoung Joo, Wonsung Lee, Sungrae Park, Il-Chul Moon |
Abstract | This paper proposes Dirichlet Variational Autoencoder (DirVAE) using a Dirichlet prior for a continuous latent variable that exhibits the characteristic of the categorical probabilities. To infer the parameters of DirVAE, we utilize the stochastic gradient method by approximating the Gamma distribution, which is a component of the Dirichlet distribution, with the inverse Gamma CDF approximation. Additionally, we reshape the component collapsing issue by investigating two problem sources, which are decoder weight collapsing and latent value collapsing, and we show that DirVAE has no component collapsing; while Gaussian VAE exhibits the decoder weight collapsing and Stick-Breaking VAE shows the latent value collapsing. The experimental results show that 1) DirVAE models the latent representation result with the best log-likelihood compared to the baselines; and 2) DirVAE produces more interpretable latent values with no collapsing issues which the baseline models suffer from. Also, we show that the learned latent representation from the DirVAE achieves the best classification accuracy in the semi-supervised and the supervised classification tasks on MNIST, OMNIGLOT, and SVHN compared to the baseline VAEs. Finally, we demonstrated that the DirVAE augmented topic models show better performances in most cases. |
Tasks | Omniglot, Topic Models |
Published | 2019-01-09 |
URL | http://arxiv.org/abs/1901.02739v1 |
http://arxiv.org/pdf/1901.02739v1.pdf | |
PWC | https://paperswithcode.com/paper/dirichlet-variational-autoencoder |
Repo | |
Framework | |
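The abstract above describes the Gamma distribution as a component of the Dirichlet distribution. A minimal sketch of that relationship: a Dirichlet sample can be obtained by normalizing independent Gamma draws. This uses the standard library sampler rather than the inverse Gamma CDF approximation the paper relies on for gradient-based training; `sample_dirichlet` is a hypothetical helper name.

```python
import random
from typing import List


def sample_dirichlet(alphas: List[float], rng: random.Random) -> List[float]:
    """Draw one Dirichlet(alphas) sample by normalizing independent Gamma(alpha_k, 1) draws."""
    gammas = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(gammas)
    return [g / total for g in gammas]


rng = random.Random(0)  # fixed seed so the example is repeatable
sample = sample_dirichlet([0.5, 0.5, 0.5], rng)
```

The components are non-negative and sum to one, which is why such a latent variable behaves like a vector of categorical probabilities.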
GAN2GAN: Generative Noise Learning for Blind Image Denoising with Single Noisy Images
Title | GAN2GAN: Generative Noise Learning for Blind Image Denoising with Single Noisy Images |
Authors | Sungmin Cha, Taesup Moon |
Abstract | We tackle a challenging blind image denoising problem, in which only single distinct noisy images are available for training a denoiser, and no information about noise is known, except for it being zero-mean, additive, and independent of the clean image. In such a setting, which often occurs in practice, it is not possible to train a denoiser with the standard discriminative training or with the recently developed Noise2Noise (N2N) training; the former requires the underlying clean image for the given noisy image, and the latter requires a pair of independently realized noisy images for each clean image. To that end, we propose the GAN2GAN (Generated-Artificial-Noise to Generated-Artificial-Noise) method, which first learns a generative model that can synthesize noisy image pairs based on simulating independent realizations of the noise in given single noisy images, then iteratively trains a denoiser with those synthesized pairs, as in the N2N training. Our results show that the denoiser trained with our GAN2GAN method for the blind denoising setting achieves an impressive denoising performance; it almost approaches the performance of the standard discriminatively-trained or N2N-trained models that have more information than ours, and significantly outperforms the recent baseline for the same setting, i.e., Noise2Void, and a more conventional yet strong one, BM3D. |
Tasks | Denoising, Image Denoising |
Published | 2019-05-25 |
URL | https://arxiv.org/abs/1905.10488v2 |
https://arxiv.org/pdf/1905.10488v2.pdf | |
PWC | https://paperswithcode.com/paper/gan2gan-generative-noise-learning-for-blind |
Repo | |
Framework | |
SNDCNN: Self-normalizing deep CNNs with scaled exponential linear units for speech recognition
Title | SNDCNN: Self-normalizing deep CNNs with scaled exponential linear units for speech recognition |
Authors | Zhen Huang, Tim Ng, Leo Liu, Henry Mason, Xiaodan Zhuang, Daben Liu |
Abstract | Very deep CNNs achieve state-of-the-art results in both computer vision and speech recognition, but are difficult to train. The most popular way to train very deep CNNs is to use shortcut connections (SC) together with batch normalization (BN). Inspired by Self-Normalizing Neural Networks, we propose the self-normalizing deep CNN (SNDCNN) based acoustic model topology, by removing the SC/BN and replacing the typical ReLU activations with scaled exponential linear units (SELU) in ResNet-50. SELU activations make the network self-normalizing and remove the need for both shortcut connections and batch normalization. Compared to ResNet-50, we can achieve the same or lower (up to 4.5% relative) word error rate (WER) while boosting both training and inference speed by 60%-80%. We also explore other model inference optimization schemes to further reduce latency for production use. |
Tasks | Speech Recognition |
Published | 2019-10-04 |
URL | https://arxiv.org/abs/1910.01992v3 |
https://arxiv.org/pdf/1910.01992v3.pdf | |
PWC | https://paperswithcode.com/paper/sndcnn-self-normalizing-deep-cnns-with-scaled |
Repo | |
Framework | |
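For reference, the SELU activation named in the abstract above can be written in a few lines. This is a minimal scalar sketch using the constants commonly published for SELU; it is not taken from the SNDCNN implementation.

```python
import math

# Commonly published SELU constants (self-normalizing networks formulation).
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805


def selu(x: float) -> float:
    """SELU: scale * x for x > 0, scale * alpha * (exp(x) - 1) otherwise."""
    if x > 0:
        return SELU_SCALE * x
    return SELU_SCALE * SELU_ALPHA * math.expm1(x)
```

Positive inputs are scaled linearly, while negative inputs saturate towards `-SELU_SCALE * SELU_ALPHA`.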
LISA: Towards Learned DNA Sequence Search
Title | LISA: Towards Learned DNA Sequence Search |
Authors | Darryl Ho, Jialin Ding, Sanchit Misra, Nesime Tatbul, Vikram Nathan, Vasimuddin Md, Tim Kraska |
Abstract | Next-generation sequencing (NGS) technologies have enabled affordable sequencing of billions of short DNA fragments at high throughput, paving the way for population-scale genomics. Genomics data analytics at this scale requires overcoming performance bottlenecks, such as searching for short DNA sequences over long reference sequences. In this paper, we introduce LISA (Learned Indexes for Sequence Analysis), a novel learning-based approach to DNA sequence search. As a first proof of concept, we focus on accelerating one of the most essential flavors of the problem, called exact search. LISA builds on and extends FM-index, which is the state-of-the-art technique widely deployed in genomics tool-chains. Initial experiments with human genome datasets indicate that LISA achieves up to a factor of 4X performance speedup against its traditional counterpart. |
Tasks | |
Published | 2019-10-10 |
URL | https://arxiv.org/abs/1910.04728v1 |
https://arxiv.org/pdf/1910.04728v1.pdf | |
PWC | https://paperswithcode.com/paper/lisa-towards-learned-dna-sequence-search |
Repo | |
Framework | |
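The exact search problem that the abstract above builds on can be illustrated with a classic FM-index backward search. The sketch below is a naive, unoptimized version of that traditional technique only; it does not include any learned component, and `build_fm_index` and `count_exact` are hypothetical names.

```python
from typing import Dict, List, Tuple

FMIndex = Tuple[Dict[str, int], Dict[str, List[int]], int]


def build_fm_index(text: str) -> FMIndex:
    """Build a naive FM-index (C table and occurrence table over the BWT)."""
    text += "$"  # sentinel, assumed absent from the text and smaller than all symbols
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in suffix_array)
    alphabet = sorted(set(text))
    # c_table[ch] = number of symbols in the text strictly smaller than ch
    c_table: Dict[str, int] = {}
    total = 0
    for ch in alphabet:
        c_table[ch] = total
        total += text.count(ch)
    # occ[ch][i] = occurrences of ch in bwt[:i]
    occ: Dict[str, List[int]] = {ch: [0] for ch in alphabet}
    for symbol in bwt:
        for ch in alphabet:
            occ[ch].append(occ[ch][-1] + (1 if symbol == ch else 0))
    return c_table, occ, len(bwt)


def count_exact(pattern: str, index: FMIndex) -> int:
    """Count exact occurrences of `pattern` using backward search."""
    c_table, occ, n = index
    low, high = 0, n  # half-open interval of suffix-array rows
    for ch in reversed(pattern):
        if ch not in c_table:
            return 0
        low = c_table[ch] + occ[ch][low]
        high = c_table[ch] + occ[ch][high]
        if low >= high:
            return 0
    return high - low


index = build_fm_index("ACGTACGTAC")
```

Each pattern symbol narrows the suffix-array interval by one step, so the number of steps grows with the pattern length rather than the reference length.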
Attending to Future Tokens For Bidirectional Sequence Generation
Title | Attending to Future Tokens For Bidirectional Sequence Generation |
Authors | Carolin Lawrence, Bhushan Kotnis, Mathias Niepert |
Abstract | Neural sequence generation is typically performed token-by-token and left-to-right. Whenever a token is generated only previously produced tokens are taken into consideration. In contrast, for problems such as sequence classification, bidirectional attention, which takes both past and future tokens into consideration, has been shown to perform much better. We propose to make the sequence generation process bidirectional by employing special placeholder tokens. Treated as a node in a fully connected graph, a placeholder token can take past and future tokens into consideration when generating the actual output token. We verify the effectiveness of our approach experimentally on two conversational tasks where the proposed bidirectional model outperforms competitive baselines by a large margin. |
Tasks | |
Published | 2019-08-16 |
URL | https://arxiv.org/abs/1908.05915v2 |
https://arxiv.org/pdf/1908.05915v2.pdf | |
PWC | https://paperswithcode.com/paper/attending-to-future-tokens-for-bidirectional |
Repo | |
Framework | |
Refined $α$-Divergence Variational Inference via Rejection Sampling
Title | Refined $α$-Divergence Variational Inference via Rejection Sampling |
Authors | Rahul Sharma, Abhishek Kumar, Piyush Rai |
Abstract | We present an approximate inference method, based on a synergistic combination of Rényi $\alpha$-divergence variational inference (RDVI) and rejection sampling (RS). RDVI is based on minimization of Rényi $\alpha$-divergence $D_\alpha(p\|q)$ between the true distribution $p(x)$ and a variational approximation $q(x)$; RS draws samples from a distribution $p(x) = \tilde{p}(x)/Z_{p}$ using a proposal $q(x)$, s.t. $Mq(x) \geq \tilde{p}(x), \forall x$. Our inference method is based on a crucial observation that $D_\infty(p\|q)$ equals $\log M(\theta)$ where $M(\theta)$ is the optimal value of the RS constant for a given proposal $q_\theta(x)$. This enables us to develop a two-stage hybrid inference algorithm. Stage-1 performs RDVI to learn $q_\theta$ by minimizing an estimator of $D_\alpha(p\|q)$, and uses the learned $q_\theta$ to find an (approximately) optimal $\tilde{M}(\theta)$. Stage-2 performs RS using the constant $\tilde{M}(\theta)$ to improve the approximate distribution $q_\theta$ and obtain a sample-based approximation. We prove that this two-stage method allows us to learn considerably more accurate approximations of the target distribution as compared to RDVI. We demonstrate our method’s efficacy via several experiments on synthetic and real datasets. |
Tasks | |
Published | 2019-09-17 |
URL | https://arxiv.org/abs/1909.07627v3 |
https://arxiv.org/pdf/1909.07627v3.pdf | |
PWC | https://paperswithcode.com/paper/refined-divergence-variational-inference-via |
Repo | |
Framework | |
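The link between the rejection sampling constant and $D_\infty$ stated in the abstract above can be illustrated on a small discrete example. This is a sketch under simplifying assumptions: the target `p` is already normalized, the proposal `q` is fixed rather than learned by RDVI, and the function names are hypothetical.

```python
import math
import random
from typing import Dict, List


def optimal_rs_constant(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Smallest M with M * q(x) >= p(x) for all x, i.e. the maximum ratio p/q."""
    return max(p[x] / q[x] for x in p if p[x] > 0)


def rejection_sample(
    p: Dict[str, float], q: Dict[str, float], m: float, n: int, rng: random.Random
) -> List[str]:
    """Draw n samples from p by proposing from q and accepting with probability p/(M q)."""
    support = list(q)
    weights = [q[x] for x in support]
    accepted: List[str] = []
    while len(accepted) < n:
        x = rng.choices(support, weights=weights)[0]
        if rng.random() * m * q[x] <= p[x]:
            accepted.append(x)
    return accepted


p = {"a": 0.7, "b": 0.2, "c": 0.1}            # toy target (normalized)
q = {"a": 1 / 3, "b": 1 / 3, "c": 1 / 3}      # toy uniform proposal
m = optimal_rs_constant(p, q)
d_infinity = math.log(m)                       # log of the maximum ratio p/q
samples = rejection_sample(p, q, m, 2000, random.Random(0))
```

With a normalized target, the expected acceptance rate is `1 / m`, so a proposal with smaller $D_\infty$ wastes fewer draws.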
A Path Towards Quantum Advantage in Training Deep Generative Models with Quantum Annealers
Title | A Path Towards Quantum Advantage in Training Deep Generative Models with Quantum Annealers |
Authors | Walter Vinci, Lorenzo Buffoni, Hossein Sadeghi, Amir Khoshaman, Evgeny Andriyash, Mohammad H. Amin |
Abstract | The development of quantum-classical hybrid (QCH) algorithms is critical to achieving state-of-the-art computational models. A QCH variational autoencoder (QVAE) was introduced in Ref. [1] by some of the authors of this paper. QVAE consists of a classical auto-encoding structure realized by traditional deep neural networks to perform inference to, and generation from, a discrete latent space. The latent generative process is formalized as thermal sampling from either a quantum or classical Boltzmann machine (QBM or BM). This setup allows quantum-assisted training of deep generative models by physically simulating the generative process with quantum annealers. In this paper, we have successfully employed D-Wave quantum annealers as Boltzmann samplers to perform quantum-assisted, end-to-end training of QVAE. The hybrid structure of QVAE allows us to deploy current-generation quantum annealers in QCH generative models to achieve competitive performance on datasets such as MNIST. The results presented in this paper suggest that commercially available quantum annealers can be deployed, in conjunction with well-crafted classical deep neural networks, to achieve competitive results in unsupervised and semi-supervised tasks on large-scale datasets. We also provide evidence that our setup is able to exploit large latent-space (Q)BMs, which develop slowly mixing modes. This expressive latent space results in slow and inefficient classical sampling, and paves the way to achieve quantum advantage with quantum annealing in realistic sampling applications. |
Tasks | |
Published | 2019-12-04 |
URL | https://arxiv.org/abs/1912.02119v1 |
https://arxiv.org/pdf/1912.02119v1.pdf | |
PWC | https://paperswithcode.com/paper/a-path-towards-quantum-advantage-in-training |
Repo | |
Framework | |
L3 Fusion: Fast Transformed Convolutions on CPUs
Title | L3 Fusion: Fast Transformed Convolutions on CPUs |
Authors | Rati Gelashvili, Nir Shavit, Aleksandar Zlateski |
Abstract | Fast convolutions via transforms, either Winograd or FFT, have emerged as a preferred way of performing the computation of convolutional layers, as they greatly reduce the number of required operations. Recent work shows that, for many layer structures, a well-designed implementation of fast convolutions can greatly utilize modern CPUs, significantly reducing the compute time. However, the generous amount of shared L3 cache present on modern CPUs is often neglected, and the algorithms are optimized solely for the private L2 cache. In this paper we propose an efficient L3 Fusion algorithm that is specifically designed for CPUs with a significant amount of shared L3 cache. Using the hierarchical roofline model, we show that in many cases, especially for layers with fewer channels, the L3 fused approach can greatly outperform the standard 3-stage one provided by big vendors such as Intel. We validate our theoretical findings by benchmarking our L3 fused implementation against publicly available state of the art. |
Tasks | |
Published | 2019-12-04 |
URL | https://arxiv.org/abs/1912.02165v1 |
https://arxiv.org/pdf/1912.02165v1.pdf | |
PWC | https://paperswithcode.com/paper/l3-fusion-fast-transformed-convolutions-on |
Repo | |
Framework | |
M-BERT: Injecting Multimodal Information in the BERT Structure
Title | M-BERT: Injecting Multimodal Information in the BERT Structure |
Authors | Wasifur Rahman, Md Kamrul Hasan, Amir Zadeh, Louis-Philippe Morency, Mohammed Ehsan Hoque |
Abstract | Multimodal language analysis is an emerging research area in natural language processing that models language in a multimodal manner. It aims to understand language from the modalities of text, visual, and acoustic by modeling both intra-modal and cross-modal interactions. BERT (Bidirectional Encoder Representations from Transformers) provides strong contextual language representations after training on large-scale unlabeled corpora. Fine-tuning the vanilla BERT model has shown promising results in building state-of-the-art models for diverse NLP tasks like question answering and language inference. However, fine-tuning BERT in the presence of information from other modalities remains an open research problem. In this paper, we inject multimodal information within the input space of the BERT network for modeling multimodal language. The proposed injection method allows BERT to reach a new state of the art of 84.38% binary accuracy on the CMU-MOSI dataset (multimodal sentiment analysis) with a gap of 5.98 percent to the previous state of the art and 1.02 percent to the text-only BERT. |
Tasks | Multimodal Sentiment Analysis, Question Answering, Sentiment Analysis |
Published | 2019-08-15 |
URL | https://arxiv.org/abs/1908.05787v1 |
https://arxiv.org/pdf/1908.05787v1.pdf | |
PWC | https://paperswithcode.com/paper/m-bert-injecting-multimodal-information-in |
Repo | |
Framework | |
Risk Averse Robust Adversarial Reinforcement Learning
Title | Risk Averse Robust Adversarial Reinforcement Learning |
Authors | Xinlei Pan, Daniel Seita, Yang Gao, John Canny |
Abstract | Deep reinforcement learning has recently made significant progress in solving computer games and robotic control tasks. A known problem, though, is that policies overfit to the training environment and may not avoid rare, catastrophic events such as automotive accidents. A classical technique for improving the robustness of reinforcement learning algorithms is to train on a set of randomized environments, but this approach only guards against common situations. Recently, robust adversarial reinforcement learning (RARL) was developed, which allows efficient applications of random and systematic perturbations by a trained adversary. A limitation of RARL is that only the expected control objective is optimized; there is no explicit modeling or optimization of risk. Thus the agents do not consider the probability of catastrophic events (i.e., those inducing abnormally large negative reward), except through their effect on the expected objective. In this paper we introduce risk-averse robust adversarial reinforcement learning (RARARL), using a risk-averse protagonist and a risk-seeking adversary. We test our approach on a self-driving vehicle controller. We use an ensemble of policy networks to model risk as the variance of value functions. We show through experiments that a risk-averse agent is better equipped to handle a risk-seeking adversary, and experiences substantially fewer crashes compared to agents trained without an adversary. |
Tasks | |
Published | 2019-03-31 |
URL | http://arxiv.org/abs/1904.00511v1 |
http://arxiv.org/pdf/1904.00511v1.pdf | |
PWC | https://paperswithcode.com/paper/risk-averse-robust-adversarial-reinforcement |
Repo | |
Framework | |
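The abstract above models risk as the variance of value functions across an ensemble. A minimal sketch of that idea follows; the linear weighting of the variance term and the name `risk_adjusted_value` are assumptions made for illustration, not details confirmed by the abstract.

```python
from statistics import mean, pvariance
from typing import Sequence


def risk_adjusted_value(ensemble_values: Sequence[float], risk_weight: float) -> float:
    """Ensemble mean plus a weighted variance term.

    A negative weight penalizes disagreement (risk-averse);
    a positive weight rewards it (risk-seeking).
    """
    return mean(ensemble_values) + risk_weight * pvariance(ensemble_values)


# Toy value estimates for one action from a two-member ensemble (hypothetical numbers).
values = [1.0, 3.0]
protagonist_score = risk_adjusted_value(values, risk_weight=-0.5)
adversary_score = risk_adjusted_value(values, risk_weight=0.5)
```

The same ensemble disagreement lowers the score seen by a risk-averse protagonist and raises the score seen by a risk-seeking adversary.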
Spoken Language Identification using ConvNets
Title | Spoken Language Identification using ConvNets |
Authors | Sarthak, Shikhar Shukla, Govind Mittal |
Abstract | Language Identification (LI) is an important first step in several speech processing systems. With a growing number of voice-based assistants, speech LI has emerged as a widely researched field. To approach the problem of identifying languages, we can either adopt an implicit approach where only the speech for a language is present or an explicit one where text is available with its corresponding transcript. This paper focuses on an implicit approach due to the absence of transcriptive data. This paper benchmarks existing models and proposes a new attention based model for language identification which uses log-Mel spectrogram images as input. We also present the effectiveness of raw waveforms as features to neural network models for LI tasks. For training and evaluation of models, we classified six languages (English, French, German, Spanish, Russian and Italian) with an accuracy of 95.4% and four languages (English, French, German, Spanish) with an accuracy of 96.3% obtained from the VoxForge dataset. This approach can further be scaled to incorporate more languages. |
Tasks | Language Identification |
Published | 2019-10-09 |
URL | https://arxiv.org/abs/1910.04269v1 |
https://arxiv.org/pdf/1910.04269v1.pdf | |
PWC | https://paperswithcode.com/paper/spoken-language-identification-using-convnets |
Repo | |
Framework | |
Evaluation of Seed Set Selection Approaches and Active Learning Strategies in Predictive Coding
Title | Evaluation of Seed Set Selection Approaches and Active Learning Strategies in Predictive Coding |
Authors | Christian J. Mahoney, Nathaniel Huber-Fliflet, Haozhen Zhao, Jianping Zhang, Peter Gronvall, Shi Ye |
Abstract | Active learning is a popular methodology in text classification - known in the legal domain as “predictive coding” or “Technology Assisted Review” or “TAR” - due to its potential to minimize the required review effort to build effective classifiers. In this study, we use extensive experimentation to examine the impact of popular seed set selection strategies in active learning, within a predictive coding exercise, and evaluate different active learning strategies against well-researched continuous active learning strategies for the purpose of determining efficient training methods for classifying large populations quickly and precisely. We study how random sampling, keyword models and clustering-based seed set selection strategies combined with top-ranked, uncertain, random, recall-inspired, and hybrid active learning document selection strategies affect the performance of active learning for predictive coding. We use the percentage of documents requiring review to reach 75% recall as the “benchmark” metric to evaluate and compare our approaches. In most cases we find that seed set selection methods have a minor impact, though they do show significant impact in lower richness data sets or when choosing a top-ranked active learning selection strategy. Our results also show that active learning selection strategies implementing uncertainty, random, or 75% recall selection strategies have the potential to reach the optimum active learning round much earlier than the popular continuous active learning approach (top-ranked selection). The results of our research shed light on the impact of active learning seed set selection strategies and also the effectiveness of the selection strategies for the following learning rounds. Legal practitioners can use the results of this study to enhance the efficiency, precision, and simplicity of their predictive coding process. |
Tasks | Active Learning, Text Classification |
Published | 2019-06-11 |
URL | https://arxiv.org/abs/1906.04367v1 |
https://arxiv.org/pdf/1906.04367v1.pdf | |
PWC | https://paperswithcode.com/paper/evaluation-of-seed-set-selection-approaches |
Repo | |
Framework | |
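The benchmark metric named in the abstract above, the percentage of documents requiring review to reach 75% recall, can be computed from a ranked review order. The sketch below assumes documents are reviewed strictly in ranked order and that true relevance labels are known; `review_fraction_for_recall` is a hypothetical helper.

```python
import math
from typing import List


def review_fraction_for_recall(ranked_labels: List[bool], target_recall: float = 0.75) -> float:
    """Fraction of the ranked list that must be reviewed to reach the target recall."""
    total_relevant = sum(ranked_labels)
    if total_relevant == 0:
        raise ValueError("no relevant documents, so recall is undefined")
    needed = math.ceil(target_recall * total_relevant)
    found = 0
    for reviewed, is_relevant in enumerate(ranked_labels, start=1):
        found += is_relevant
        if found >= needed:
            return reviewed / len(ranked_labels)
    return 1.0


# Toy ranking (hypothetical): True marks a relevant document, best-ranked first.
ranking = [True, True, False, True, False, False, True, False, False, False]
fraction = review_fraction_for_recall(ranking)
```

Here three of the four relevant documents appear within the first four positions, so 40% of the collection must be reviewed to reach 75% recall.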
Beta DVBF: Learning State-Space Models for Control from High Dimensional Observations
Title | Beta DVBF: Learning State-Space Models for Control from High Dimensional Observations |
Authors | Neha Das, Maximilian Karl, Philip Becker-Ehmck, Patrick van der Smagt |
Abstract | Learning a model of dynamics from high-dimensional images can be a core ingredient for success in many applications across different domains, especially in sequential decision making. However, currently prevailing methods based on latent-variable models are limited to working with low resolution images only. In this work, we show that some of the issues with using high-dimensional observations arise from the discrepancy between the dimensionality of the latent and observable space, and propose solutions to overcome them. |
Tasks | Decision Making, Latent Variable Models |
Published | 2019-11-02 |
URL | https://arxiv.org/abs/1911.00756v1 |
https://arxiv.org/pdf/1911.00756v1.pdf | |
PWC | https://paperswithcode.com/paper/beta-dvbf-learning-state-space-models-for |
Repo | |
Framework | |