January 31, 2020

2809 words 14 mins read

Paper Group AWR 434

Efficient Per-Example Gradient Computations in Convolutional Neural Networks. Temporal Attentive Alignment for Video Domain Adaptation. Information Extraction Tool Text2ALM: From Narratives to Action Language System Descriptions. Bilinear Representation for Language-based Image Editing Using Conditional Generative Adversarial Networks. Jasper: An E …

Efficient Per-Example Gradient Computations in Convolutional Neural Networks

Title Efficient Per-Example Gradient Computations in Convolutional Neural Networks
Authors Gaspar Rochette, Andre Manoel, Eric W. Tramel
Abstract Deep learning frameworks leverage GPUs to perform massively-parallel computations over batches of many training examples efficiently. However, for certain tasks, one may be interested in performing per-example computations, for instance using per-example gradients to evaluate a quantity of interest unique to each example. One notable application comes from the field of differential privacy, where per-example gradients must be norm-bounded in order to limit the impact of each example on the aggregated batch gradient. In this work, we discuss how per-example gradients can be efficiently computed in convolutional neural networks (CNNs). We compare existing strategies by performing a few steps of differentially-private training on CNNs of varying sizes. We also introduce a new strategy for per-example gradient calculation, which is shown to be advantageous depending on the model architecture and how the model is trained. This is a first step in making differentially-private training of CNNs practical.
Tasks
Published 2019-12-12
URL https://arxiv.org/abs/1912.06015v1
PDF https://arxiv.org/pdf/1912.06015v1.pdf
PWC https://paperswithcode.com/paper/efficient-per-example-gradient-computations-1
Repo https://github.com/owkin/grad-cnns
Framework pytorch
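
The per-example clipping mentioned in the abstract can be made concrete with the straightforward (but slow) approach of one backward pass per example; below is a minimal PyTorch sketch with an assumed toy model and clip norm. The paper's contribution is precisely about computing these gradients more efficiently in CNNs, which this sketch does not reproduce.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Assumed toy CNN; not the architectures benchmarked in the paper.
model = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                      nn.Flatten(), nn.Linear(8 * 28 * 28, 10))
x, y = torch.randn(16, 1, 28, 28), torch.randint(0, 10, (16,))
clip_norm = 1.0

clipped_sum = [torch.zeros_like(p) for p in model.parameters()]
for xi, yi in zip(x, y):
    model.zero_grad()
    loss = F.cross_entropy(model(xi.unsqueeze(0)), yi.unsqueeze(0))
    loss.backward()
    # Norm of this example's gradient across all parameters.
    norm = torch.sqrt(sum(p.grad.pow(2).sum() for p in model.parameters()))
    scale = (clip_norm / (norm + 1e-12)).clamp(max=1.0)
    for acc, p in zip(clipped_sum, model.parameters()):
        acc += scale * p.grad

# Average of the clipped per-example gradients (DP noise addition omitted).
avg_grads = [g / x.size(0) for g in clipped_sum]
```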

Temporal Attentive Alignment for Video Domain Adaptation

Title Temporal Attentive Alignment for Video Domain Adaptation
Authors Min-Hung Chen, Zsolt Kira, Ghassan AlRegib
Abstract Although various image-based domain adaptation (DA) techniques have been proposed in recent years, domain shift in videos is still not well-explored. Most previous works only evaluate performance on small-scale datasets which are saturated. Therefore, we first propose a larger-scale dataset with larger domain discrepancy: UCF-HMDB_full. Second, we investigate different DA integration methods for videos, and show that simultaneously aligning and learning temporal dynamics achieves effective alignment even without sophisticated DA methods. Finally, we propose Temporal Attentive Adversarial Adaptation Network (TA3N), which explicitly attends to the temporal dynamics using domain discrepancy for more effective domain alignment, achieving state-of-the-art performance on three video DA datasets. The code and data are released at http://github.com/cmhungsteve/TA3N.
Tasks Domain Adaptation
Published 2019-05-26
URL https://arxiv.org/abs/1905.10861v5
PDF https://arxiv.org/pdf/1905.10861v5.pdf
PWC https://paperswithcode.com/paper/temporal-attentive-alignment-for-video-domain
Repo https://github.com/olivesgatech/TA3N
Framework pytorch
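
Adversarial domain alignment of the kind TA3N extends is commonly implemented with a gradient reversal layer feeding a domain classifier; the sketch below shows that generic mechanism on pooled frame features. The temporal attention and the exact alignment modules of TA3N are not reproduced, and all shapes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

class DomainClassifier(nn.Module):
    def __init__(self, feat_dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(),
                                 nn.Linear(128, 2))  # source vs. target

    def forward(self, feats, lambd=1.0):
        # Gradient reversal pushes the feature extractor toward
        # domain-invariant features while the classifier learns to tell domains apart.
        return self.net(GradReverse.apply(feats, lambd))

frame_feats = torch.randn(8, 5, 256)       # batch x frames x feature dim
video_feats = frame_feats.mean(dim=1)      # plain temporal pooling (no attention here)
domain_logits = DomainClassifier()(video_feats)
```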

Information Extraction Tool Text2ALM: From Narratives to Action Language System Descriptions

Title Information Extraction Tool Text2ALM: From Narratives to Action Language System Descriptions
Authors Craig Olson, Yuliya Lierler
Abstract In this work we design a narrative understanding tool, Text2ALM. This tool uses the action language ALM to perform inferences on complex interactions of events described in narratives. The methodology used to implement the Text2ALM system was originally outlined by Lierler, Inclezan, and Gelfond (2017) via a manual process of converting a narrative to an ALM model. It relies on a conglomeration of resources and techniques from two distinct fields of artificial intelligence, namely natural language processing and knowledge representation and reasoning. The effectiveness of the Text2ALM system is measured by its ability to correctly answer questions from the bAbI tasks published by Facebook Research in 2015. The tool matched or exceeded the performance of state-of-the-art machine learning methods in six of the seven tested tasks. We also illustrate that the Text2ALM approach generalizes to a broader spectrum of narratives.
Tasks
Published 2019-09-18
URL https://arxiv.org/abs/1909.08235v1
PDF https://arxiv.org/pdf/1909.08235v1.pdf
PWC https://paperswithcode.com/paper/information-extraction-tool-text2alm-from
Repo https://github.com/cdolson19/Text2ALM
Framework none

Bilinear Representation for Language-based Image Editing Using Conditional Generative Adversarial Networks

Title Bilinear Representation for Language-based Image Editing Using Conditional Generative Adversarial Networks
Authors Xiaofeng Mao, Yuefeng Chen, Yuhong Li, Tao Xiong, Yuan He, Hui Xue
Abstract The task of Language-Based Image Editing (LBIE) aims at generating a target image by editing the source image based on a given language description. The main challenge of LBIE is to disentangle the semantics in image and text and then combine them to generate realistic images. The editing performance is therefore heavily dependent on the learned representation. In this work, a conditional generative adversarial network (cGAN) is utilized for LBIE. We find that existing conditioning methods in cGANs lack representation power, as they cannot learn the second-order correlation between the two conditioning vectors. To solve this problem, we propose an improved conditional layer named Bilinear Residual Layer (BRL) to learn more powerful representations for the LBIE task. Qualitative and quantitative comparisons demonstrate that our method can generate images of higher quality than previous LBIE techniques.
Tasks
Published 2019-03-18
URL http://arxiv.org/abs/1903.07499v1
PDF http://arxiv.org/pdf/1903.07499v1.pdf
PWC https://paperswithcode.com/paper/bilinear-representation-for-language-based
Repo https://github.com/vtddggg/BilinearGAN_for_LBIE
Framework pytorch
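
The second-order correlation the abstract refers to can be illustrated with a low-rank bilinear interaction between an image feature map and a text embedding, added back as a residual. This is a hedged sketch of the general idea only, not the paper's exact BRL formulation; the low-rank factorization and all dimensions are assumptions.

```python
import torch
import torch.nn as nn

class BilinearResidual(nn.Module):
    def __init__(self, img_channels=64, txt_dim=128, rank=32):
        super().__init__()
        self.proj_img = nn.Conv2d(img_channels, rank, kernel_size=1)
        self.proj_txt = nn.Linear(txt_dim, rank)
        self.out = nn.Conv2d(rank, img_channels, kernel_size=1)

    def forward(self, img_feat, txt_emb):
        # Second-order term: the elementwise product of projected image and text
        # features captures correlations that purely additive conditioning misses.
        bilinear = self.proj_img(img_feat) * self.proj_txt(txt_emb)[:, :, None, None]
        return img_feat + self.out(bilinear)   # residual connection

layer = BilinearResidual()
out = layer(torch.randn(4, 64, 32, 32), torch.randn(4, 128))
```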

Jasper: An End-to-End Convolutional Neural Acoustic Model

Title Jasper: An End-to-End Convolutional Neural Acoustic Model
Authors Jason Li, Vitaly Lavrukhin, Boris Ginsburg, Ryan Leary, Oleksii Kuchaiev, Jonathan M. Cohen, Huyen Nguyen, Ravi Teja Gadde
Abstract In this paper, we report state-of-the-art results on LibriSpeech among end-to-end speech recognition models without any external training data. Our model, Jasper, uses only 1D convolutions, batch normalization, ReLU, dropout, and residual connections. To improve training, we further introduce a new layer-wise optimizer called NovoGrad. Through experiments, we demonstrate that the proposed deep architecture performs as well or better than more complex choices. Our deepest Jasper variant uses 54 convolutional layers. With this architecture, we achieve 2.95% WER using a beam-search decoder with an external neural language model and 3.86% WER with a greedy decoder on LibriSpeech test-clean. We also report competitive results on the Wall Street Journal and the Hub5’00 conversational evaluation datasets.
Tasks End-To-End Speech Recognition, Language Modelling, Speech Recognition
Published 2019-04-05
URL https://arxiv.org/abs/1904.03288v3
PDF https://arxiv.org/pdf/1904.03288v3.pdf
PWC https://paperswithcode.com/paper/jasper-an-end-to-end-convolutional-neural
Repo https://github.com/NVIDIA/OpenSeq2Seq
Framework tf
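
The abstract spells out Jasper's ingredient list (1D convolutions, batch norm, ReLU, dropout, residual connections), so a sketch of a Jasper-style block is easy to write down. The exact placement of the residual connection, channel counts, and kernel widths below are illustrative rather than the released configuration (see the OpenSeq2Seq repo for that).

```python
import torch
import torch.nn as nn

class JasperSubBlock(nn.Module):
    def __init__(self, channels=256, kernel=11, dropout=0.2):
        super().__init__()
        self.conv = nn.Conv1d(channels, channels, kernel, padding=kernel // 2)
        self.bn = nn.BatchNorm1d(channels)
        self.act = nn.ReLU()
        self.drop = nn.Dropout(dropout)

    def forward(self, x):
        return self.drop(self.act(self.bn(self.conv(x))))

class JasperBlock(nn.Module):
    """Several sub-blocks with a residual connection from the block input."""
    def __init__(self, channels=256, repeats=3):
        super().__init__()
        self.subs = nn.Sequential(*[JasperSubBlock(channels) for _ in range(repeats)])

    def forward(self, x):
        return self.subs(x) + x

feats = torch.randn(8, 256, 400)   # batch x channels x time (e.g. spectrogram features)
out = JasperBlock()(feats)
```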

Deep Learning Accelerated Light Source Experiments

Title Deep Learning Accelerated Light Source Experiments
Authors Zhengchun Liu, Tekin Bicer, Rajkumar Kettimuthu, Ian Foster
Abstract Experimental protocols at synchrotron light sources typically process and validate data only after an experiment has completed, which can lead to undetected errors and cannot enable online steering. Real-time data analysis can enable both detection of, and recovery from, errors, and optimization of data acquisition. However, modern scientific instruments, such as detectors at synchrotron light sources, can generate data at GBs/sec rates. Data processing methods such as the widely used computational tomography usually require considerable computational resources, and yield poor quality reconstructions in the early stages of data acquisition when available views are sparse. We describe here how a deep convolutional neural network can be integrated into the real-time streaming tomography pipeline to enable better-quality images in the early stages of data acquisition. Compared with conventional streaming tomography processing, our method can significantly improve tomography image quality, deliver comparable images using only 32% of the data needed for conventional streaming processing, and save 68% experiment time for data acquisition.
Tasks
Published 2019-10-09
URL https://arxiv.org/abs/1910.04081v1
PDF https://arxiv.org/pdf/1910.04081v1.pdf
PWC https://paperswithcode.com/paper/deep-learning-accelerated-light-source
Repo https://github.com/ramsesproject/TomoGAN
Framework tf

Wasserstein Adversarial Examples via Projected Sinkhorn Iterations

Title Wasserstein Adversarial Examples via Projected Sinkhorn Iterations
Authors Eric Wong, Frank R. Schmidt, J. Zico Kolter
Abstract A rapidly growing area of work has studied the existence of adversarial examples, datapoints which have been perturbed to fool a classifier, but the vast majority of these works have focused primarily on threat models defined by $\ell_p$ norm-bounded perturbations. In this paper, we propose a new threat model for adversarial attacks based on the Wasserstein distance. In the image classification setting, such distances measure the cost of moving pixel mass, which naturally cover “standard” image manipulations such as scaling, rotation, translation, and distortion (and can potentially be applied to other settings as well). To generate Wasserstein adversarial examples, we develop a procedure for projecting onto the Wasserstein ball, based upon a modified version of the Sinkhorn iteration. The resulting algorithm can successfully attack image classification models, bringing traditional CIFAR10 models down to 3% accuracy within a Wasserstein ball with radius 0.1 (i.e., moving 10% of the image mass 1 pixel), and we demonstrate that PGD-based adversarial training can improve this adversarial accuracy to 76%. In total, this work opens up a new direction of study in adversarial robustness, more formally considering convex metrics that accurately capture the invariances that we typically believe should exist in classifiers. Code for all experiments in the paper is available at https://github.com/locuslab/projected_sinkhorn.
Tasks Adversarial Attack, Adversarial Defense, Image Classification
Published 2019-02-21
URL https://arxiv.org/abs/1902.07906v2
PDF https://arxiv.org/pdf/1902.07906v2.pdf
PWC https://paperswithcode.com/paper/wasserstein-adversarial-examples-via
Repo https://github.com/locuslab/projected_sinkhorn
Framework pytorch
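
For reference, the vanilla entropy-regularized Sinkhorn iteration that the paper's projection step modifies looks as follows; it computes an approximate optimal transport plan between two histograms. The actual attack projects a perturbed image onto a Wasserstein ball with a modified iteration that this sketch does not implement, and the toy cost matrix is an assumption.

```python
import torch

def sinkhorn(a, b, cost, reg=0.1, iters=200):
    # a, b: probability vectors; cost: pairwise transport cost matrix.
    K = torch.exp(-cost / reg)
    u = torch.ones_like(a)
    for _ in range(iters):
        v = b / (K.t() @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]   # approximate transport plan

n = 5
a = torch.full((n,), 1.0 / n)
b = torch.full((n,), 1.0 / n)
cost = (torch.arange(n)[:, None] - torch.arange(n)[None, :]).abs().float()
plan = sinkhorn(a, b, cost)
```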

A Coefficient of Determination for Probabilistic Topic Models

Title A Coefficient of Determination for Probabilistic Topic Models
Authors Tommy Jones
Abstract This research proposes a new (old) metric for evaluating goodness of fit in topic models, the coefficient of determination, or $R^2$. Within the context of topic modeling, $R^2$ has the same interpretation that it does when used in a broader class of statistical models. Reporting $R^2$ with topic models addresses two current problems in topic modeling: a lack of standard cross-contextual evaluation metrics for topic modeling and ease of communication with lay audiences. The author proposes that $R^2$ should be reported as a standard metric when constructing topic models.
Tasks Topic Models
Published 2019-11-20
URL https://arxiv.org/abs/1911.11061v2
PDF https://arxiv.org/pdf/1911.11061v2.pdf
PWC https://paperswithcode.com/paper/a-coefficient-of-determination-for
Repo https://github.com/TommyJones/tidylda
Framework none
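
One way to make the proposal concrete, assuming the standard R² = 1 − SSE/SST definition applied to reconstructed word counts; the exact formulation used in the paper and in tidylda may differ, and the data below are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
dtm = rng.poisson(1.0, size=(20, 50))           # document-term matrix of word counts
theta = rng.dirichlet(np.ones(5), size=20)      # documents x topics
phi = rng.dirichlet(np.ones(50), size=5)        # topics x vocabulary

# Predicted counts: document lengths times the model's word probabilities.
expected = dtm.sum(axis=1, keepdims=True) * (theta @ phi)
sse = ((dtm - expected) ** 2).sum()
sst = ((dtm - dtm.mean()) ** 2).sum()
r_squared = 1.0 - sse / sst
print(f"R^2 = {r_squared:.3f}")
```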

Topic Modeling in Embedding Spaces

Title Topic Modeling in Embedding Spaces
Authors Adji B. Dieng, Francisco J. R. Ruiz, David M. Blei
Abstract Topic modeling analyzes documents to learn meaningful patterns of words. However, existing topic models fail to learn interpretable topics when working with large and heavy-tailed vocabularies. To this end, we develop the Embedded Topic Model (ETM), a generative model of documents that marries traditional topic models with word embeddings. In particular, it models each word with a categorical distribution whose natural parameter is the inner product between a word embedding and an embedding of its assigned topic. To fit the ETM, we develop an efficient amortized variational inference algorithm. The ETM discovers interpretable topics even with large vocabularies that include rare words and stop words. It outperforms existing document models, such as latent Dirichlet allocation (LDA), in terms of both topic quality and predictive performance.
Tasks Topic Models, Word Embeddings
Published 2019-07-08
URL https://arxiv.org/abs/1907.04907v1
PDF https://arxiv.org/pdf/1907.04907v1.pdf
PWC https://paperswithcode.com/paper/topic-modeling-in-embedding-spaces
Repo https://github.com/adjidieng/DETM
Framework pytorch
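
The decoder described in the abstract is compact enough to sketch directly: each topic's word distribution is a softmax over inner products between the word embeddings and that topic's embedding. The encoder and the amortized variational inference procedure are omitted, and all dimensions are illustrative.

```python
import torch
import torch.nn.functional as F

vocab_size, emb_dim, num_topics, num_docs = 1000, 300, 20, 8
word_emb = torch.randn(vocab_size, emb_dim)      # rho: word embeddings
topic_emb = torch.randn(num_topics, emb_dim)     # alpha: topic embeddings

# beta[k, v] = softmax over v of <alpha_k, rho_v>: per-topic word distributions.
beta = F.softmax(topic_emb @ word_emb.t(), dim=-1)

theta = F.softmax(torch.randn(num_docs, num_topics), dim=-1)  # doc-topic proportions
word_probs = theta @ beta   # per-document distribution over the vocabulary
```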

Large-scale weakly-supervised pre-training for video action recognition

Title Large-scale weakly-supervised pre-training for video action recognition
Authors Deepti Ghadiyaram, Matt Feiszli, Du Tran, Xueting Yan, Heng Wang, Dhruv Mahajan
Abstract Current fully-supervised video datasets consist of only a few hundred thousand videos and fewer than a thousand domain-specific labels. This hinders the progress towards advanced video architectures. This paper presents an in-depth study of using large volumes of web videos for pre-training video models for the task of action recognition. Our primary empirical finding is that pre-training at very large scale (over 65 million videos), despite relying on noisy social-media videos and hashtags, substantially improves the state-of-the-art on three challenging public action recognition datasets. Further, we examine three questions in the construction of weakly-supervised video action datasets. First, given that actions involve interactions with objects, how should one construct a verb-object pre-training label space to benefit transfer learning the most? Second, frame-based models perform quite well on action recognition; is pre-training for good image features sufficient, or is pre-training for spatio-temporal features valuable for optimal transfer learning? Finally, actions are generally less well-localized in long videos than in short videos; since action labels are provided at the video level, how should one choose video clips for best performance, given a fixed budget on the number or duration of videos?
Tasks Action Classification, Action Recognition In Videos, Activity Recognition, Activity Recognition In Videos, Temporal Action Localization, Transfer Learning
Published 2019-05-02
URL http://arxiv.org/abs/1905.00561v1
PDF http://arxiv.org/pdf/1905.00561v1.pdf
PWC https://paperswithcode.com/paper/large-scale-weakly-supervised-pre-training
Repo https://github.com/microsoft/computervision-recipes
Framework pytorch

Going Deeper with Point Networks

Title Going Deeper with Point Networks
Authors Eric-Tuan Le, Iasonas Kokkinos, Niloy J. Mitra
Abstract In this work, we introduce three generic point cloud processing blocks that improve both the accuracy and memory consumption of state-of-the-art networks, thus allowing the design of deeper and more accurate networks. The novel processing blocks are: a multi-resolution point cloud processing block; a convolution-type operation for point sets that blends neighborhood information in a memory-efficient manner; and a crosslink block that efficiently shares information across low- and high-resolution processing branches. Combining these blocks allows us to design significantly wider and deeper architectures. We extensively evaluate the proposed architectures on multiple point segmentation benchmarks (ShapeNet-Part, ScanNet, PartNet) and report systematic improvements in terms of both accuracy and memory consumption by using our generic modules in conjunction with multiple recent architectures (PointNet++, DGCNN, SpiderCNN, PointCNN). We report a 3.4% increase in IoU on the most complex dataset, PartNet, while decreasing the memory footprint by 57%.
Tasks
Published 2019-07-01
URL https://arxiv.org/abs/1907.00960v1
PDF https://arxiv.org/pdf/1907.00960v1.pdf
PWC https://paperswithcode.com/paper/going-deeper-with-point-networks
Repo https://github.com/erictuanle/GoingDeeperwPointNetworks
Framework pytorch

Which Algorithmic Choices Matter at Which Batch Sizes? Insights From a Noisy Quadratic Model

Title Which Algorithmic Choices Matter at Which Batch Sizes? Insights From a Noisy Quadratic Model
Authors Guodong Zhang, Lala Li, Zachary Nado, James Martens, Sushant Sachdeva, George E. Dahl, Christopher J. Shallue, Roger Grosse
Abstract Increasing the batch size is a popular way to speed up neural network training, but beyond some critical batch size, larger batch sizes yield diminishing returns. In this work, we study how the critical batch size changes based on properties of the optimization algorithm, including acceleration and preconditioning, through two different lenses: large scale experiments, and analysis of a simple noisy quadratic model (NQM). We experimentally demonstrate that optimization algorithms that employ preconditioning, specifically Adam and K-FAC, result in much larger critical batch sizes than stochastic gradient descent with momentum. We also demonstrate that the NQM captures many of the essential features of real neural network training, despite being drastically simpler to work with. The NQM predicts our results with preconditioned optimizers, previous results with accelerated gradient descent, and other results around optimal learning rates and large batch training, making it a useful tool to generate testable predictions about neural network optimization.
Tasks
Published 2019-07-09
URL https://arxiv.org/abs/1907.04164v2
PDF https://arxiv.org/pdf/1907.04164v2.pdf
PWC https://paperswithcode.com/paper/which-algorithmic-choices-matter-at-which
Repo https://github.com/gd-zhang/noisy-quadratic-model
Framework none
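
The NQM itself is simple enough to simulate in a few lines: minimize a diagonal quadratic under gradient noise that shrinks as 1/√(batch size) and count the steps needed to reach a target loss. The curvature spectrum, noise scale, and learning rate below are illustrative assumptions; the paper additionally analyzes momentum, Adam, and K-FAC in this model.

```python
import numpy as np

def steps_to_target(batch_size, lr=0.1, target=1e-2, dim=100, max_steps=100_000, seed=0):
    rng = np.random.default_rng(seed)
    h = 1.0 / np.arange(1, dim + 1)     # eigenvalues of the quadratic
    x = np.ones(dim)
    for step in range(1, max_steps + 1):
        # Gradient noise scales as 1/sqrt(batch size), mimicking mini-batch averaging.
        grad = h * x + rng.normal(scale=0.1 / np.sqrt(batch_size), size=dim)
        x = x - lr * grad
        if 0.5 * np.sum(h * x ** 2) < target:
            return step
    return None  # noise floor too high: this batch size cannot reach the target loss

for b in (1, 8, 64, 512):
    print(f"batch {b:4d}: steps = {steps_to_target(b)}")
```

Beyond some batch size the step counts barely improve, which is the diminishing-returns behavior the paper studies.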

Customizing Student Networks From Heterogeneous Teachers via Adaptive Knowledge Amalgamation

Title Customizing Student Networks From Heterogeneous Teachers via Adaptive Knowledge Amalgamation
Authors Chengchao Shen, Mengqi Xue, Xinchao Wang, Jie Song, Li Sun, Mingli Song
Abstract A massive number of well-trained deep networks have been released by developers online. These networks may focus on different tasks and in many cases are optimized for different datasets. In this paper, we study how to exploit such heterogeneous pre-trained networks, known as teachers, so as to train a customized student network that tackles a set of selective tasks defined by the user. We assume no human annotations are available, and each teacher may be either single- or multi-task. To this end, we introduce a dual-step strategy that first extracts the task-specific knowledge from the heterogeneous teachers sharing the same sub-task, and then amalgamates the extracted knowledge to build the student network. To facilitate the training, we employ a selective learning scheme where, for each unlabelled sample, the student learns adaptively from only the teacher with the least prediction ambiguity. We evaluate the proposed approach on several datasets and experimental results demonstrate that the student, learned by such adaptive knowledge amalgamation, achieves performances even better than those of the teachers.
Tasks
Published 2019-08-20
URL https://arxiv.org/abs/1908.07121v1
PDF https://arxiv.org/pdf/1908.07121v1.pdf
PWC https://paperswithcode.com/paper/customizing-student-networks-from
Repo https://github.com/UpCoder/KnowledgeAmalgamationModule
Framework tf
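
The selective learning scheme in the abstract can be sketched as entropy-based teacher selection plus a distillation loss. The entropy criterion and the toy setup (two teachers over the same label set) are assumptions standing in for the paper's ambiguity measure and amalgamation modules.

```python
import torch
import torch.nn.functional as F

def selective_distillation_loss(student_logits, teacher_logits_list):
    # Stack teacher predictions: (num_teachers, batch, classes).
    teacher_probs = torch.stack([F.softmax(t, dim=-1) for t in teacher_logits_list])
    # Prediction entropy as an ambiguity proxy (assumption).
    entropy = -(teacher_probs * teacher_probs.clamp_min(1e-12).log()).sum(-1)
    best = entropy.argmin(dim=0)   # least-ambiguous teacher for each sample
    targets = teacher_probs[best, torch.arange(student_logits.size(0))]
    return F.kl_div(F.log_softmax(student_logits, dim=-1), targets, reduction="batchmean")

# Toy usage: two single-task teachers over the same 10 classes.
student_logits = torch.randn(4, 10, requires_grad=True)
loss = selective_distillation_loss(student_logits, [torch.randn(4, 10), torch.randn(4, 10)])
loss.backward()
```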

Humor in Word Embeddings: Cockamamie Gobbledegook for Nincompoops

Title Humor in Word Embeddings: Cockamamie Gobbledegook for Nincompoops
Authors Limor Gultchin, Genevieve Patterson, Nancy Baym, Nathaniel Swinger, Adam Tauman Kalai
Abstract While humor is often thought to be beyond the reach of Natural Language Processing, we show that several aspects of single-word humor correlate with simple linear directions in Word Embeddings. In particular: (a) the word vectors capture multiple aspects discussed in humor theories from various disciplines; (b) each individual’s sense of humor can be represented by a vector, which can predict differences in people’s senses of humor on new, unrated, words; and (c) upon clustering humor ratings of multiple demographic groups, different humor preferences emerge across the different groups. Humor ratings are taken from the work of Engelthaler and Hills (2017) as well as from an original crowdsourcing study of 120,000 words. Our dataset further includes annotations for the theoretically-motivated humor features we identify.
Tasks Word Embeddings
Published 2019-02-08
URL https://arxiv.org/abs/1902.02783v3
PDF https://arxiv.org/pdf/1902.02783v3.pdf
PWC https://paperswithcode.com/paper/humor-in-word-embeddings-cockamamie
Repo https://github.com/zhou059/w266-project
Framework none
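
A "simple linear direction" in embedding space amounts to a vector of regression weights: fit it with ridge regression and a word's predicted humor is just a dot product with its embedding. The arrays below are random placeholders standing in for pretrained embeddings and the paper's humor ratings.

```python
import numpy as np

rng = np.random.default_rng(0)
word_vectors = rng.normal(size=(500, 300))      # pretrained word embeddings (placeholder)
humor_ratings = rng.normal(size=500)            # per-word humor scores (placeholder)

lam = 1.0                                       # ridge penalty
X, y = word_vectors, humor_ratings
humor_direction = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# Predicted humor of an unseen word is its embedding projected onto the direction.
new_word_vec = rng.normal(size=300)
predicted_humor = new_word_vec @ humor_direction
```

Fitting one such direction per rater or demographic group gives the per-person humor vectors the abstract describes.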

Splitting Steepest Descent for Growing Neural Architectures

Title Splitting Steepest Descent for Growing Neural Architectures
Authors Qiang Liu, Lemeng Wu, Dilin Wang
Abstract We develop a progressive training approach for neural networks which adaptively grows the network structure by splitting existing neurons into multiple off-springs. By leveraging a functional steepest descent idea, we derive a simple criterion for deciding the best subset of neurons to split and a splitting gradient for optimally updating the off-springs. Theoretically, our splitting strategy is a second-order functional steepest descent for escaping saddle points in an $\infty$-Wasserstein metric space, on which the standard parametric gradient descent is a first-order steepest descent. Our method provides a new computationally efficient approach for optimizing neural network structures, especially for learning lightweight neural architectures in resource-constrained settings.
Tasks
Published 2019-10-06
URL https://arxiv.org/abs/1910.02366v3
PDF https://arxiv.org/pdf/1910.02366v3.pdf
PWC https://paperswithcode.com/paper/splitting-steepest-descent-for-growing-neural
Repo https://github.com/klightz/splitting
Framework pytorch
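
The mechanics of splitting one hidden neuron into two off-springs can be sketched for a small MLP: duplicate its incoming weights with a small symmetric perturbation and share its outgoing weights between the two copies, which approximately preserves the network's function. The paper's actual contribution, the functional steepest-descent criterion for choosing which neurons to split and in which direction, is not reproduced here.

```python
import torch
import torch.nn as nn

def split_neuron(fc1: nn.Linear, fc2: nn.Linear, idx: int, eps: float = 1e-2):
    w_in, b_in = fc1.weight.data, fc1.bias.data            # (hidden, in), (hidden,)
    w_out = fc2.weight.data                                 # (out, hidden)
    delta = eps * torch.randn_like(w_in[idx])
    new_fc1 = nn.Linear(fc1.in_features, fc1.out_features + 1)
    new_fc2 = nn.Linear(fc2.in_features + 1, fc2.out_features)
    # Off-springs get symmetrically perturbed copies of the incoming weights.
    new_fc1.weight.data = torch.cat([w_in, (w_in[idx] + delta).unsqueeze(0)])
    new_fc1.weight.data[idx] -= delta
    new_fc1.bias.data = torch.cat([b_in, b_in[idx:idx + 1]])
    # Outgoing weight is split evenly so the block's output is (almost) unchanged.
    new_fc2.weight.data = torch.cat([w_out, w_out[:, idx:idx + 1] / 2], dim=1)
    new_fc2.weight.data[:, idx] /= 2
    new_fc2.bias.data = fc2.bias.data.clone()
    return new_fc1, new_fc2

fc1, fc2 = nn.Linear(4, 8), nn.Linear(8, 3)
fc1, fc2 = split_neuron(fc1, fc2, idx=2)
print(fc1, fc2)   # hidden width grew from 8 to 9
```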