February 1, 2020

Paper Group AWR 292

Equi-normalization of Neural Networks

Title Equi-normalization of Neural Networks
Authors Pierre Stock, Benjamin Graham, Rémi Gribonval, Hervé Jégou
Abstract Modern neural networks are over-parametrized. In particular, each rectified linear hidden unit can be modified by a multiplicative factor by adjusting input and output weights, without changing the rest of the network. Inspired by the Sinkhorn-Knopp algorithm, we introduce a fast iterative method for minimizing the L2 norm of the weights, equivalently the weight decay regularizer. It provably converges to a unique solution. Interleaving our algorithm with SGD during training improves the test accuracy. For small batches, our approach offers an alternative to batch- and group-normalization on CIFAR-10 and ImageNet with a ResNet-18.
Tasks
Published 2019-02-27
URL http://arxiv.org/abs/1902.10416v1
PDF http://arxiv.org/pdf/1902.10416v1.pdf
PWC https://paperswithcode.com/paper/equi-normalization-of-neural-networks
Repo https://github.com/facebookresearch/enorm
Framework pytorch
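
The rescaling invariance described in the abstract can be checked on a toy network. The sketch below is a minimal illustration, assuming a single hidden ReLU layer and one balancing pass; it is not the authors' full iterative algorithm, and the helper names (`forward`, `balance_hidden_units`, `sq_norm`) are hypothetical.

```python
import math

def relu(v):
    return [max(0.0, x) for x in v]

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def forward(W1, W2, x):
    # Two-layer network: y = W2 * relu(W1 * x)
    return matvec(W2, relu(matvec(W1, x)))

def sq_norm(W1, W2):
    # Sum of squared weights, i.e. the weight decay regularizer
    return sum(w * w for W in (W1, W2) for row in W for w in row)

def balance_hidden_units(W1, W2):
    """Scale each hidden unit's input weights by d > 0 and its output
    weights by 1/d so that both norms become equal (assumed toy setup)."""
    new_W1 = [row[:] for row in W1]
    new_W2 = [row[:] for row in W2]
    for i, row in enumerate(W1):
        in_norm = math.sqrt(sum(w * w for w in row))
        out_norm = math.sqrt(sum(r[i] * r[i] for r in W2))
        if in_norm == 0.0 or out_norm == 0.0:
            continue  # nothing to balance for a dead unit
        d = math.sqrt(out_norm / in_norm)
        new_W1[i] = [w * d for w in row]
        for r in new_W2:
            r[i] /= d
    return new_W1, new_W2

W1 = [[2.0, -1.0], [0.5, 3.0], [-4.0, 1.0]]  # 3 hidden units, 2 inputs
W2 = [[0.1, -0.2, 0.05]]                     # 1 output
x = [1.0, 2.0]
B1, B2 = balance_hidden_units(W1, W2)
```

Because relu(d * z) = d * relu(z) for any d > 0, `forward(B1, B2, x)` matches `forward(W1, W2, x)` up to floating-point error, while `sq_norm(B1, B2)` is no larger than `sq_norm(W1, W2)`.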

Complementary-Similarity Learning using Quadruplet Network

Title Complementary-Similarity Learning using Quadruplet Network
Authors Mansi Ranjit Mane, Stephen Guo, Kannan Achan
Abstract We propose a novel learning framework to answer questions such as “if a user is purchasing a shirt, what other items will (s)he need with the shirt?” Our framework learns distributed representations for items from available textual data, with the learned representations representing items in a latent space expressing functional complementarity as well as similarity. In particular, our framework places functionally similar items close together in the latent space, while also placing complementary items closer than non-complementary items, but farther away than similar items. In this study, we introduce a new dataset of similar, complementary, and negative items derived from the Amazon co-purchase dataset. For evaluation purposes, we focus our approach on clothing and fashion verticals. To our knowledge, this is the first attempt to learn similar and complementary relationships simultaneously through just textual title metadata. Our framework is applicable across a broad set of items in the product catalog and can generate quality complementary item recommendations at scale.
Tasks
Published 2019-08-26
URL https://arxiv.org/abs/1908.09928v2
PDF https://arxiv.org/pdf/1908.09928v2.pdf
PWC https://paperswithcode.com/paper/complementary-similarity-learning-using
Repo https://github.com/mansimane/quadnet-comp-sim
Framework tf
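
The ordering described in the abstract (similar items closest, complementary items next, negatives farthest) can be expressed as a pair of hinge terms. This is a minimal sketch under assumed Euclidean distances and hypothetical margins `m_sim` and `m_comp`; the paper's exact loss may differ.

```python
import math

def dist(u, v):
    # Euclidean distance between two embedding vectors
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def quadruplet_loss(anchor, similar, complementary, negative,
                    m_sim=0.5, m_comp=0.5):
    """Zero when d(anchor, similar) + m_sim <= d(anchor, complementary)
    and d(anchor, complementary) + m_comp <= d(anchor, negative).
    Margins are illustrative assumptions, not values from the paper."""
    d_s = dist(anchor, similar)
    d_c = dist(anchor, complementary)
    d_n = dist(anchor, negative)
    return max(0.0, d_s - d_c + m_sim) + max(0.0, d_c - d_n + m_comp)

# Well-ordered quadruplet: loss is zero
ok = quadruplet_loss([0.0, 0.0], [0.1, 0.0], [1.0, 0.0], [3.0, 0.0])
# Complementary and negative swapped: the second hinge is active
bad = quadruplet_loss([0.0, 0.0], [0.1, 0.0], [3.0, 0.0], [1.0, 0.0])
```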

Automated shapeshifting for function recovery in damaged robots

Title Automated shapeshifting for function recovery in damaged robots
Authors Sam Kriegman, Stephanie Walker, Dylan Shah, Michael Levin, Rebecca Kramer-Bottiglio, Josh Bongard
Abstract A robot’s mechanical parts routinely wear out from normal functioning and can be lost to injury. For autonomous robots operating in isolated or hostile environments, repair from a human operator is often not possible. Thus, much work has sought to automate damage recovery in robots. However, every case reported in the literature to date has accepted the damaged mechanical structure as fixed, and focused on learning new ways to control it. Here we show for the first time a robot that automatically recovers from unexpected damage by deforming its resting mechanical structure without changing its control policy. We found that, especially in the case of “deep insult”, such as removal of all four of the robot’s legs, the damaged machine evolves shape changes that not only recover the original level of function (locomotion) as before, but can in fact surpass the original level of performance (speed). This suggests that shape change, instead of control readaptation, may be a better method to recover function after damage in some cases.
Tasks
Published 2019-05-22
URL https://arxiv.org/abs/1905.09264v1
PDF https://arxiv.org/pdf/1905.09264v1.pdf
PWC https://paperswithcode.com/paper/automated-shapeshifting-for-function-recovery
Repo https://github.com/skriegman/2019-RSS
Framework none

Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding

Title Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding
Authors Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao
Abstract This paper explores the use of knowledge distillation to improve a Multi-Task Deep Neural Network (MT-DNN) (Liu et al., 2019) for learning text representations across multiple natural language understanding tasks. Although ensemble learning can improve model performance, serving an ensemble of large DNNs such as MT-DNN can be prohibitively expensive. Here we apply the knowledge distillation method (Hinton et al., 2015) in the multi-task learning setting. For each task, we train an ensemble of different MT-DNNs (teacher) that outperforms any single model, and then train a single MT-DNN (student) via multi-task learning to distill knowledge from these ensemble teachers. We show that the distilled MT-DNN significantly outperforms the original MT-DNN on 7 out of 9 GLUE tasks, pushing the GLUE benchmark (single model) to 83.7% (1.5% absolute improvement, based on the GLUE leaderboard at https://gluebenchmark.com/leaderboard as of April 1, 2019). The code and pre-trained models will be made publicly available at https://github.com/namisan/mt-dnn.
Tasks Multi-Task Learning
Published 2019-04-20
URL http://arxiv.org/abs/1904.09482v1
PDF http://arxiv.org/pdf/1904.09482v1.pdf
PWC https://paperswithcode.com/paper/improving-multi-task-deep-neural-networks-via
Repo https://github.com/namisan/mt-dnn
Framework pytorch
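
One common way to distill from an ensemble, in the spirit of Hinton et al. (2015), is to average the teachers' class probabilities into soft targets and train the student with cross-entropy against them. The sketch below assumes that scheme for a single classification task; the function names are hypothetical and the MT-DNN training loop itself is not reproduced.

```python
import math

def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def ensemble_soft_targets(teacher_logits):
    # Average the class probabilities predicted by each teacher
    probs = [softmax(logits) for logits in teacher_logits]
    n = len(probs)
    return [sum(p[k] for p in probs) / n for k in range(len(probs[0]))]

def soft_cross_entropy(student_logits, soft_targets):
    # Cross-entropy between soft targets and the student's distribution
    q = softmax(student_logits)
    return -sum(t * math.log(qk) for t, qk in zip(soft_targets, q))

teachers = [[2.0, 0.0, -1.0], [1.5, 0.5, -1.0]]
targets = ensemble_soft_targets(teachers)
loss_close = soft_cross_entropy([1.8, 0.2, -1.0], targets)
loss_far = soft_cross_entropy([-1.0, 0.2, 1.8], targets)
```

A student whose logits resemble the teachers' (`loss_close`) incurs a smaller loss than one that ranks the classes in the opposite order (`loss_far`).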

Patient Knowledge Distillation for BERT Model Compression

Title Patient Knowledge Distillation for BERT Model Compression
Authors Siqi Sun, Yu Cheng, Zhe Gan, Jingjing Liu
Abstract Pre-trained language models such as BERT have proven to be highly effective for natural language processing (NLP) tasks. However, the high demand for computing resources in training such models hinders their application in practice. In order to alleviate this resource hunger in large-scale model training, we propose a Patient Knowledge Distillation approach to compress an original large model (teacher) into an equally-effective lightweight shallow network (student). Different from previous knowledge distillation methods, which only use the output from the last layer of the teacher network for distillation, our student model patiently learns from multiple intermediate layers of the teacher model for incremental knowledge extraction, following two strategies: (i) PKD-Last: learning from the last k layers; and (ii) PKD-Skip: learning from every k layers. These two patient distillation schemes enable the exploitation of rich information in the teacher’s hidden layers, and encourage the student model to patiently learn from and imitate the teacher through a multi-layer distillation process. Empirically, this translates into improved results on multiple NLP tasks with significant gain in training efficiency, without sacrificing model accuracy.
Tasks Model Compression
Published 2019-08-25
URL https://arxiv.org/abs/1908.09355v1
PDF https://arxiv.org/pdf/1908.09355v1.pdf
PWC https://paperswithcode.com/paper/patient-knowledge-distillation-for-bert-model
Repo https://github.com/liujingqiao/distillation_model_keras_bert
Framework none
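
The two layer-selection strategies named in the abstract can be sketched as index helpers, followed by a simple matching loss. This is an illustration only: layers are 1-indexed, whether the teacher's final layer is included is an assumption, and the normalized squared-distance loss is one plausible choice rather than a confirmed detail of the paper.

```python
import math

def pkd_last(num_teacher_layers, k):
    # PKD-Last: the last k teacher layers (1-indexed)
    return list(range(num_teacher_layers - k + 1, num_teacher_layers + 1))

def pkd_skip(num_teacher_layers, k):
    # PKD-Skip: every k-th teacher layer (1-indexed)
    return list(range(k, num_teacher_layers + 1, k))

def normalize(v):
    # Assumes a non-zero vector
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def patient_loss(student_states, teacher_states, teacher_layers):
    """Sum of squared distances between normalized student hidden states
    and the selected teacher hidden states (assumed loss form)."""
    selected = [teacher_states[i - 1] for i in teacher_layers]
    if len(selected) != len(student_states):
        raise ValueError("one teacher layer is needed per student layer")
    total = 0.0
    for s, t in zip(student_states, selected):
        s_hat, t_hat = normalize(s), normalize(t)
        total += sum((a - b) ** 2 for a, b in zip(s_hat, t_hat))
    return total

print(pkd_last(12, 5))  # [8, 9, 10, 11, 12]
print(pkd_skip(12, 2))  # [2, 4, 6, 8, 10, 12]
```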

Detecting the Unexpected via Image Resynthesis

Title Detecting the Unexpected via Image Resynthesis
Authors Krzysztof Lis, Krishna Nakka, Pascal Fua, Mathieu Salzmann
Abstract Classical semantic segmentation methods, including the recent deep learning ones, assume that all classes observed at test time have been seen during training. In this paper, we tackle the more realistic scenario where unexpected objects of unknown classes can appear at test time. The main trends in this area either leverage the notion of prediction uncertainty to flag the regions with low confidence as unknown, or rely on autoencoders and highlight poorly-decoded regions. Having observed that, in both cases, the detected regions typically do not correspond to unexpected objects, in this paper, we introduce a drastically different strategy: It relies on the intuition that the network will produce spurious labels in regions depicting unexpected objects. Therefore, resynthesizing the image from the resulting semantic map will yield significant appearance differences with respect to the input image. In other words, we translate the problem of detecting unknown classes to one of identifying poorly-resynthesized image regions. We show that this outperforms both uncertainty- and autoencoder-based methods.
Tasks Semantic Segmentation
Published 2019-04-16
URL http://arxiv.org/abs/1904.07595v2
PDF http://arxiv.org/pdf/1904.07595v2.pdf
PWC https://paperswithcode.com/paper/detecting-the-unexpected-via-image
Repo https://github.com/adynathos/LabelGrab
Framework none

Large Scale Adversarial Representation Learning

Title Large Scale Adversarial Representation Learning
Authors Jeff Donahue, Karen Simonyan
Abstract Adversarially trained generative models (GANs) have recently achieved compelling image synthesis results. But despite early successes in using GANs for unsupervised representation learning, they have since been superseded by approaches based on self-supervision. In this work we show that progress in image generation quality translates to substantially improved representation learning performance. Our approach, BigBiGAN, builds upon the state-of-the-art BigGAN model, extending it to representation learning by adding an encoder and modifying the discriminator. We extensively evaluate the representation learning and generation capabilities of these BigBiGAN models, demonstrating that these generation-based models achieve the state of the art in unsupervised representation learning on ImageNet, as well as in unconditional image generation. Pretrained BigBiGAN models – including image generators and encoders – are available on TensorFlow Hub (https://tfhub.dev/s?publisher=deepmind&q=bigbigan).
Tasks Image Generation, Representation Learning, Self-Supervised Image Classification, Semi-Supervised Image Classification, Unsupervised Representation Learning
Published 2019-07-04
URL https://arxiv.org/abs/1907.02544v2
PDF https://arxiv.org/pdf/1907.02544v2.pdf
PWC https://paperswithcode.com/paper/large-scale-adversarial-representation
Repo https://github.com/LEGO999/BIgBiGAN
Framework tf

Coupled Variational Recurrent Collaborative Filtering

Title Coupled Variational Recurrent Collaborative Filtering
Authors Qingquan Song, Shiyu Chang, Xia Hu
Abstract We focus on the problem of streaming recommender systems and explore novel collaborative filtering algorithms to handle the data dynamicity and complexity in a streaming manner. Although deep neural networks have demonstrated their effectiveness on recommendation tasks, there is a lack of exploration on integrating probabilistic models and deep architectures under streaming recommendation settings. Conjoining the complementary advantages of probabilistic models and deep neural networks could enhance both model effectiveness and the understanding of inference uncertainties. To bridge the gap, in this paper, we propose a Coupled Variational Recurrent Collaborative Filtering (CVRCF) framework based on the idea of Deep Bayesian Learning to handle the streaming recommendation problem. The framework jointly combines stochastic processes and deep factorization models under a Bayesian paradigm to model the generation and evolution of users’ preferences and items’ popularities. To ensure efficient optimization and streaming update, we further propose a sequential variational inference algorithm based on a cross variational recurrent neural network structure. Experimental results on three benchmark datasets demonstrate that the proposed framework performs favorably against the state-of-the-art methods in terms of both temporal dependency modeling and predictive accuracy. The learned latent variables also provide visualized interpretations for the evolution of temporal dynamics.
Tasks Recommendation Systems
Published 2019-06-11
URL https://arxiv.org/abs/1906.04386v1
PDF https://arxiv.org/pdf/1906.04386v1.pdf
PWC https://paperswithcode.com/paper/coupled-variational-recurrent-collaborative
Repo https://github.com/song3134/CVRCF
Framework tf

Ensuring Readability and Data-fidelity using Head-modifier Templates in Deep Type Description Generation

Title Ensuring Readability and Data-fidelity using Head-modifier Templates in Deep Type Description Generation
Authors Jiangjie Chen, Ao Wang, Haiyun Jiang, Suo Feng, Chenguang Li, Yanghua Xiao
Abstract A type description is a succinct noun compound which helps humans and machines to quickly grasp the informative and distinctive information of an entity. Entities in most knowledge graphs (KGs) still lack such descriptions, thus calling for automatic methods to supplement such information. However, existing generative methods either overlook the grammatical structure or make factual mistakes in generated texts. To solve these problems, we propose a head-modifier template-based method to ensure the readability and data fidelity of generated type descriptions. We also propose a new dataset and two automatic metrics for this task. Experiments show that our method improves substantially compared with baselines and achieves state-of-the-art performance on both datasets.
Tasks Knowledge Graphs
Published 2019-05-29
URL https://arxiv.org/abs/1905.12198v1
PDF https://arxiv.org/pdf/1905.12198v1.pdf
PWC https://paperswithcode.com/paper/ensuring-readability-and-data-fidelity-using
Repo https://github.com/Michael0134/HedModTmplGen
Framework pytorch

Pricing options and computing implied volatilities using neural networks

Title Pricing options and computing implied volatilities using neural networks
Authors Shuaiqiang Liu, Cornelis W. Oosterlee, Sander M. Bohte
Abstract This paper proposes a data-driven approach, by means of an Artificial Neural Network (ANN), to value financial options and to calculate implied volatilities with the aim of accelerating the corresponding numerical methods. With ANNs being universal function approximators, this method trains an optimized ANN on a data set generated by a sophisticated financial model, and runs the trained ANN as an agent of the original solver in a fast and efficient way. We test this approach on three different types of solvers, including the analytic solution for the Black-Scholes equation, the COS method for the Heston stochastic volatility model and Brent’s iterative root-finding method for the calculation of implied volatilities. The numerical results show that the ANN solver can reduce the computing time significantly.
Tasks
Published 2019-01-25
URL http://arxiv.org/abs/1901.08943v2
PDF http://arxiv.org/pdf/1901.08943v2.pdf
PWC https://paperswithcode.com/paper/pricing-options-and-computing-implied
Repo https://github.com/MoSharieff/BSNeuralNet
Framework none
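
For reference, two of the solvers the abstract mentions can be written in a few lines: the analytic Black-Scholes price of a European call, and a root-finder that inverts it for implied volatility. The sketch below uses plain bisection as a simple stand-in for Brent's method, and the bracket `[lo, hi]` and tolerance are assumptions; the neural network surrogate itself is not shown.

```python
import math

def norm_cdf(x):
    # Standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(spot, strike, maturity, rate, sigma):
    """Black-Scholes price of a European call (no dividends)."""
    vol_sqrt_t = sigma * math.sqrt(maturity)
    d1 = (math.log(spot / strike)
          + (rate + 0.5 * sigma ** 2) * maturity) / vol_sqrt_t
    d2 = d1 - vol_sqrt_t
    return spot * norm_cdf(d1) - strike * math.exp(-rate * maturity) * norm_cdf(d2)

def implied_vol(price, spot, strike, maturity, rate,
                lo=1e-6, hi=5.0, tol=1e-10):
    """Invert bs_call for sigma by bisection (stand-in for Brent's method).
    Relies on the call price being increasing in sigma."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if bs_call(spot, strike, maturity, rate, mid) > price:
            hi = mid
        else:
            lo = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

price = bs_call(100.0, 100.0, 1.0, 0.05, 0.2)
sigma = implied_vol(price, 100.0, 100.0, 1.0, 0.05)
```

Pairs of inputs and outputs generated this way are the kind of data set on which a surrogate network could be trained; the round trip `implied_vol(bs_call(..., sigma))` should recover `sigma`.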

The Secret Revealer: Generative Model-Inversion Attacks Against Deep Neural Networks

Title The Secret Revealer: Generative Model-Inversion Attacks Against Deep Neural Networks
Authors Yuheng Zhang, Ruoxi Jia, Hengzhi Pei, Wenxiao Wang, Bo Li, Dawn Song
Abstract This paper studies model-inversion attacks, in which the access to a model is abused to infer information about the training data. Since its first introduction by Fredrikson et al. (2014), such attacks have raised serious concerns given that training data usually contain privacy sensitive information. Thus far, successful model-inversion attacks have only been demonstrated on simple models, such as linear regression and logistic regression. Previous attempts to invert neural networks, even the ones with simple architectures, have failed to produce convincing results. Here we present a novel attack method, termed the generative model-inversion attack, which can invert deep neural networks with high success rates. Rather than reconstructing private training data from scratch, we leverage partial public information, which can be very generic, to learn a distributional prior via generative adversarial networks (GANs) and use it to guide the inversion process. Moreover, we theoretically prove that a model’s predictive power and its vulnerability to inversion attacks are indeed two sides of the same coin: highly predictive models are able to establish a strong correlation between features and labels, which coincides exactly with what an adversary exploits to mount the attacks. Our extensive experiments demonstrate that the proposed attack improves identification accuracy over the existing work by about 75% for reconstructing face images from a state-of-the-art face recognition classifier. We also show that differential privacy, in its canonical form, is of little avail to defend against our attacks.
Tasks Face Recognition
Published 2019-11-17
URL https://arxiv.org/abs/1911.07135v1
PDF https://arxiv.org/pdf/1911.07135v1.pdf
PWC https://paperswithcode.com/paper/the-secret-revealer-generative-model-1
Repo https://github.com/JJublanc/model_inversion_attack
Framework none

Small-Footprint Keyword Spotting on Raw Audio Data with Sinc-Convolutions

Title Small-Footprint Keyword Spotting on Raw Audio Data with Sinc-Convolutions
Authors Simon Mittermaier, Ludwig Kürzinger, Bernd Waschneck, Gerhard Rigoll
Abstract Keyword Spotting (KWS) enables speech-based user interaction on smart devices. Always-on and battery-powered application scenarios for smart devices put constraints on hardware resources and power consumption, while also demanding high accuracy as well as real-time capability. Previous architectures first extracted acoustic features and then applied a neural network to classify keyword probabilities, optimizing towards memory footprint and execution time. Compared to previous publications, we took additional steps to reduce power and memory consumption without reducing classification accuracy. Power-consuming audio preprocessing and data transfer steps are eliminated by directly classifying from raw audio. For this, our end-to-end architecture extracts spectral features using parametrized Sinc-convolutions. Its memory footprint is further reduced by grouping depthwise separable convolutions. Our network achieves the competitive accuracy of 96.4% on Google’s Speech Commands test set with only 62k parameters.
Tasks Keyword Spotting, Small-Footprint Keyword Spotting
Published 2019-11-05
URL https://arxiv.org/abs/1911.02086v1
PDF https://arxiv.org/pdf/1911.02086v1.pdf
PWC https://paperswithcode.com/paper/small-footprint-keyword-spotting-on-raw-audio
Repo https://github.com/renyuanL/ry-Speech-commands
Framework tf
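
A parametrized sinc-convolution replaces free convolution weights with a band-pass filter defined by just two cutoff frequencies. The sketch below builds one such kernel as the difference of two windowed sinc low-pass filters; the Hamming window, kernel size, and cutoffs are assumptions chosen for illustration, not the configuration used in the paper.

```python
import math

def sinc(x):
    return 1.0 if x == 0.0 else math.sin(x) / x

def sinc_bandpass(f_low, f_high, kernel_size, sample_rate):
    """Band-pass FIR kernel defined only by its two cutoffs (in Hz).
    kernel_size is assumed to be odd so the kernel is symmetric."""
    half = kernel_size // 2
    f1, f2 = f_low / sample_rate, f_high / sample_rate  # cycles per sample
    kernel = []
    for i in range(kernel_size):
        n = i - half
        ideal = (2 * f2 * sinc(2 * math.pi * f2 * n)
                 - 2 * f1 * sinc(2 * math.pi * f1 * n))
        window = 0.54 - 0.46 * math.cos(2 * math.pi * i / (kernel_size - 1))
        kernel.append(ideal * window)
    return kernel

def gain(kernel, freq, sample_rate):
    # Magnitude of the kernel's frequency response at freq (Hz)
    half = len(kernel) // 2
    re = sum(k * math.cos(2 * math.pi * freq * (i - half) / sample_rate)
             for i, k in enumerate(kernel))
    im = sum(k * math.sin(2 * math.pi * freq * (i - half) / sample_rate)
             for i, k in enumerate(kernel))
    return math.hypot(re, im)

kernel = sinc_bandpass(1000.0, 2000.0, kernel_size=101, sample_rate=16000.0)
```

In a learnable layer, `f_low` and `f_high` would be the only trainable parameters per filter, which is what keeps the parameter count small compared with a free-form convolution of the same length.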

Saliency-driven Word Alignment Interpretation for Neural Machine Translation

Title Saliency-driven Word Alignment Interpretation for Neural Machine Translation
Authors Shuoyang Ding, Hainan Xu, Philipp Koehn
Abstract Despite their original goal to jointly learn to align and translate, Neural Machine Translation (NMT) models, especially Transformer, are often perceived as not learning interpretable word alignments. In this paper, we show that NMT models do learn interpretable word alignments, which could only be revealed with proper interpretation methods. We propose a series of such methods that are model-agnostic, are able to be applied either offline or online, and do not require parameter update or architectural change. We show that under the force decoding setup, the alignments induced by our interpretation method are of better quality than fast-align for some systems, and when performing free decoding, they agree well with the alignments induced by automatic alignment tools.
Tasks Machine Translation, Word Alignment
Published 2019-06-25
URL https://arxiv.org/abs/1906.10282v2
PDF https://arxiv.org/pdf/1906.10282v2.pdf
PWC https://paperswithcode.com/paper/saliency-driven-word-alignment-interpretation
Repo https://github.com/shuoyangd/meerkat
Framework pytorch

Distilling Task-Specific Knowledge from BERT into Simple Neural Networks

Title Distilling Task-Specific Knowledge from BERT into Simple Neural Networks
Authors Raphael Tang, Yao Lu, Linqing Liu, Lili Mou, Olga Vechtomova, Jimmy Lin
Abstract In the natural language processing literature, neural networks are becoming increasingly deep and complex. The recent poster child of this trend is the deep language representation model, which includes BERT, ELMo, and GPT. These developments have led to the conviction that previous-generation, shallower neural networks for language understanding are obsolete. In this paper, however, we demonstrate that rudimentary, lightweight neural networks can still be made competitive without architecture changes, external training data, or additional input features. We propose to distill knowledge from BERT, a state-of-the-art language representation model, into a single-layer BiLSTM, as well as its siamese counterpart for sentence-pair tasks. Across multiple datasets in paraphrasing, natural language inference, and sentiment classification, we achieve comparable results with ELMo, while using roughly 100 times fewer parameters and 15 times less inference time.
Tasks Natural Language Inference, Sentiment Analysis
Published 2019-03-28
URL http://arxiv.org/abs/1903.12136v1
PDF http://arxiv.org/pdf/1903.12136v1.pdf
PWC https://paperswithcode.com/paper/distilling-task-specific-knowledge-from-bert
Repo https://github.com/pvgladkov/knowledge-distillation
Framework pytorch

CROSSBOW: Scaling Deep Learning with Small Batch Sizes on Multi-GPU Servers

Title CROSSBOW: Scaling Deep Learning with Small Batch Sizes on Multi-GPU Servers
Authors Alexandros Koliousis, Pijika Watcharapichat, Matthias Weidlich, Luo Mai, Paolo Costa, Peter Pietzuch
Abstract Deep learning models are trained on servers with many GPUs, and training must scale with the number of GPUs. Systems such as TensorFlow and Caffe2 train models with parallel synchronous stochastic gradient descent: they process a batch of training data at a time, partitioned across GPUs, and average the resulting partial gradients to obtain an updated global model. To fully utilise all GPUs, systems must increase the batch size, which hinders statistical efficiency. Users tune hyper-parameters such as the learning rate to compensate for this, which is complex and model-specific. We describe CROSSBOW, a new single-server multi-GPU system for training deep learning models that enables users to freely choose their preferred batch size - however small - while scaling to multiple GPUs. CROSSBOW uses many parallel model replicas and avoids reduced statistical efficiency through a new synchronous training method. We introduce SMA, a synchronous variant of model averaging in which replicas independently explore the solution space with gradient descent, but adjust their search synchronously based on the trajectory of a globally-consistent average model. CROSSBOW achieves high hardware efficiency with small batch sizes by potentially training multiple model replicas per GPU, automatically tuning the number of replicas to maximise throughput. Our experiments show that CROSSBOW improves the training time of deep learning models on an 8-GPU server by 1.3-4x compared to TensorFlow.
Tasks
Published 2019-01-08
URL http://arxiv.org/abs/1901.02244v1
PDF http://arxiv.org/pdf/1901.02244v1.pdf
PWC https://paperswithcode.com/paper/crossbow-scaling-deep-learning-with-small
Repo https://github.com/lsds/Crossbow
Framework tf
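
The idea of replicas that explore independently but are pulled toward a shared average model can be illustrated on a toy problem. The sketch below is a simplified reading of the abstract, not the CROSSBOW implementation: the correction rule, the absence of momentum, and the names `sma_step`, `lr`, and `alpha` are all assumptions.

```python
def sma_step(replicas, center, grads, lr=0.1, alpha=0.1):
    """One synchronous step: each replica follows its own gradient and is
    corrected toward the central average model; the central model absorbs
    the sum of the corrections (assumed update rule)."""
    corrections = [[alpha * (w - z) for w, z in zip(rep, center)]
                   for rep in replicas]
    new_replicas = [
        [w - lr * g - c for w, g, c in zip(rep, grad, corr)]
        for rep, grad, corr in zip(replicas, grads, corrections)
    ]
    new_center = [z + sum(corr[i] for corr in corrections)
                  for i, z in enumerate(center)]
    return new_replicas, new_center

# Toy objective per replica: f(w) = 0.5 * (w - 3)^2, so grad = w - 3
replicas = [[0.0], [10.0]]
center = [5.0]
for _ in range(200):
    grads = [[w - 3.0 for w in rep] for rep in replicas]
    replicas, center = sma_step(replicas, center, grads)
```

On this quadratic, both replicas and the central model settle near the minimizer at 3; in a real system each replica would compute its gradient on its own small batch.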