Paper Group AWR 146
Distillation $\approx$ Early Stopping? Harvesting Dark Knowledge Utilizing Anisotropic Information Retrieval For Overparameterized Neural Network. Contextual Hybrid Session-based News Recommendation with Recurrent Neural Networks. Learning to learn with quantum neural networks via classical neural networks. LayoutLM: Pre-training of Text and Layout …
Distillation $\approx$ Early Stopping? Harvesting Dark Knowledge Utilizing Anisotropic Information Retrieval For Overparameterized Neural Network
Title | Distillation $\approx$ Early Stopping? Harvesting Dark Knowledge Utilizing Anisotropic Information Retrieval For Overparameterized Neural Network |
Authors | Bin Dong, Jikai Hou, Yiping Lu, Zhihua Zhang |
Abstract | Distillation is a method to transfer knowledge from one model to another and often achieves higher accuracy with the same capacity. In this paper, we aim to provide a theoretical understanding of what mainly helps with distillation. Our answer is “early stopping”. Assuming that the teacher network is overparameterized, we argue that the teacher network is essentially harvesting dark knowledge from the data via early stopping. This can be justified by a new concept, Anisotropic Information Retrieval (AIR), which means that the neural network tends to fit the informative information first and the non-informative information (including noise) later. Motivated by recent developments in theoretically analyzing overparameterized neural networks, we can characterize AIR by the eigenspace of the Neural Tangent Kernel (NTK). AIR facilitates a new understanding of distillation. With that, we further utilize distillation to refine noisy labels. We propose a self-distillation algorithm that sequentially distills knowledge from the network in the previous training epoch to avoid memorizing the wrong labels. We also demonstrate, both theoretically and empirically, that self-distillation can benefit from more than just early stopping. Theoretically, we prove convergence of the proposed algorithm to the ground truth labels for randomly initialized overparameterized neural networks in terms of $\ell_2$ distance, while the previous result was on convergence in $0$-$1$ loss. The theoretical result ensures the learned neural network enjoys a margin on the training data, which leads to better generalization. Empirically, we achieve better testing accuracy and entirely avoid early stopping, which makes the algorithm more user-friendly. |
Tasks | Information Retrieval |
Published | 2019-10-02 |
URL | https://arxiv.org/abs/1910.01255v1 |
https://arxiv.org/pdf/1910.01255v1.pdf | |
PWC | https://paperswithcode.com/paper/distillation-approx-early-stopping-harvesting |
Repo | https://github.com/lizhemin15/self-distillation |
Framework | pytorch |
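The self-distillation loop described in the abstract can be summarized in a few lines. Below is a minimal PyTorch-style sketch, assuming a generic `model`, data `loader`, and optimizer; the mixing weight `alpha` and temperature `T` are illustrative hyperparameters, not values from the paper.

```python
import copy
import torch
import torch.nn.functional as F

def self_distill_epoch(model, prev_model, loader, optimizer, alpha=0.5, T=2.0):
    """One epoch of self-distillation: a frozen snapshot of the network from the
    previous epoch supplies soft targets, discouraging the current network from
    memorizing (possibly noisy) hard labels."""
    prev_model.eval()
    model.train()
    for x, y in loader:
        with torch.no_grad():
            soft = F.softmax(prev_model(x) / T, dim=1)   # soft labels from last epoch
        logits = model(x)
        loss_hard = F.cross_entropy(logits, y)
        loss_soft = F.kl_div(F.log_softmax(logits / T, dim=1), soft,
                             reduction="batchmean") * (T * T)
        loss = (1 - alpha) * loss_hard + alpha * loss_soft
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

# After each epoch, snapshot the current network to act as the next epoch's teacher:
#   prev_model = copy.deepcopy(model)
```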
Contextual Hybrid Session-based News Recommendation with Recurrent Neural Networks
Title | Contextual Hybrid Session-based News Recommendation with Recurrent Neural Networks |
Authors | Gabriel de Souza Pereira Moreira, Dietmar Jannach, Adilson Marques da Cunha |
Abstract | Recommender systems help users deal with information overload by providing tailored item suggestions to them. The recommendation of news is often considered to be challenging, since the relevance of an article for a user can depend on a variety of factors, including the user’s short-term reading interests, the reader’s context, or the recency or popularity of an article. Previous work has shown that the use of Recurrent Neural Networks is promising for the next-in-session prediction task, but has certain limitations when only recorded item click sequences are used as input. In this work, we present a contextual hybrid, deep learning based approach for session-based news recommendation that is able to leverage a variety of information types. We evaluated our approach on two public datasets, using a temporal evaluation protocol that simulates the dynamics of a news portal in a realistic way. Our results confirm the benefits of considering additional types of information, including article popularity and recency, in the proposed way, resulting in significantly higher recommendation accuracy and catalog coverage than other session-based algorithms. Additional experiments show that the proposed parameterizable loss function used in our method also allows us to balance two usually conflicting quality factors, accuracy and novelty. Keywords: Artificial Neural Networks, Context-Aware Recommender Systems, Hybrid Recommender Systems, News Recommender Systems, Session-based Recommendation |
Tasks | Recommendation Systems, Session-Based Recommendations |
Published | 2019-04-15 |
URL | https://arxiv.org/abs/1904.10367v2 |
https://arxiv.org/pdf/1904.10367v2.pdf | |
PWC | https://paperswithcode.com/paper/190410367 |
Repo | https://github.com/gabrielspmoreira/chameleon_recsys |
Framework | tf |
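As a rough illustration of the hybrid architecture the abstract describes, the sketch below concatenates article content embeddings with context features (recency, popularity, time of day, etc.) and feeds them to a GRU that scores the next item. It is a PyTorch-style sketch with assumed dimensions, not the authors' TensorFlow implementation.

```python
import torch
import torch.nn as nn

class ContextualSessionRNN(nn.Module):
    """Each click in a session is represented by an article content embedding
    concatenated with context features; a GRU predicts the next article."""
    def __init__(self, content_dim, context_dim, hidden_dim, n_items):
        super().__init__()
        self.rnn = nn.GRU(content_dim + context_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, n_items)

    def forward(self, content_seq, context_seq):
        # content_seq: (batch, seq_len, content_dim); context_seq: (batch, seq_len, context_dim)
        x = torch.cat([content_seq, context_seq], dim=-1)
        h, _ = self.rnn(x)
        return self.out(h)   # next-item scores at every position in the session
```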
Learning to learn with quantum neural networks via classical neural networks
Title | Learning to learn with quantum neural networks via classical neural networks |
Authors | Guillaume Verdon, Michael Broughton, Jarrod R. McClean, Kevin J. Sung, Ryan Babbush, Zhang Jiang, Hartmut Neven, Masoud Mohseni |
Abstract | Quantum Neural Networks (QNNs) are a promising variational learning paradigm with applications to near-term quantum processors, however they still face some significant challenges. One such challenge is finding good parameter initialization heuristics that ensure rapid and consistent convergence to local minima of the parameterized quantum circuit landscape. In this work, we train classical neural networks to assist in the quantum learning process, also known as meta-learning, to rapidly find approximate optima in the parameter landscape for several classes of quantum variational algorithms. Specifically, we train classical recurrent neural networks to find approximately optimal parameters within a small number of queries of the cost function for the Quantum Approximate Optimization Algorithm (QAOA) for MaxCut, QAOA for the Sherrington-Kirkpatrick Ising model, and a Variational Quantum Eigensolver for the Hubbard model. By initializing other optimizers at parameter values suggested by the classical neural network, we demonstrate a significant improvement in the total number of optimization iterations required to reach a given accuracy. We further demonstrate that the optimization strategies learned by the neural network generalize well across a range of problem instance sizes. This opens up the possibility of training on small, classically simulatable problem instances in order to initialize larger problem instances, which are classically intractable to simulate, on quantum devices, thereby significantly reducing the number of required quantum-classical optimization iterations. |
Tasks | Meta-Learning |
Published | 2019-07-11 |
URL | https://arxiv.org/abs/1907.05415v1 |
https://arxiv.org/pdf/1907.05415v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-learn-with-quantum-neural |
Repo | https://github.com/dumkar/learning-to-learn-qnn |
Framework | pytorch |
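The meta-learning setup in the abstract amounts to a recurrent network that observes (parameters, cost) pairs and proposes the next parameters for a black-box cost function such as a simulated QAOA objective. The sketch below is a generic learned-optimizer loop with assumed shapes and names; it contains no quantum simulation and is not the authors' code.

```python
import torch
import torch.nn as nn

class RNNOptimizer(nn.Module):
    """An LSTM meta-learner: at each step it sees the current parameter guess and
    its cost, and proposes an update; the meta-loss is the cumulative cost."""
    def __init__(self, n_params, hidden=32):
        super().__init__()
        self.cell = nn.LSTMCell(n_params + 1, hidden)
        self.head = nn.Linear(hidden, n_params)

    def forward(self, cost_fn, steps=10):
        # cost_fn maps a (1, n_params) tensor to a scalar tensor (e.g. a simulated QAOA energy)
        n_params = self.head.out_features
        h = torch.zeros(1, self.cell.hidden_size)
        c = torch.zeros_like(h)
        theta = torch.zeros(1, n_params)
        total_cost = 0.0
        for _ in range(steps):
            y = cost_fn(theta)
            total_cost = total_cost + y
            h, c = self.cell(torch.cat([theta, y.view(1, 1)], dim=1), (h, c))
            theta = theta + self.head(h)   # propose the next parameter guess
        return theta, total_cost           # final parameters and meta-loss
```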
LayoutLM: Pre-training of Text and Layout for Document Image Understanding
Title | LayoutLM: Pre-training of Text and Layout for Document Image Understanding |
Authors | Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou |
Abstract | Pre-training techniques have been verified successfully in a variety of NLP tasks in recent years. Despite the widespread use of pre-training models for NLP applications, they have focused almost exclusively on text-level manipulation, while neglecting the layout and style information that is vital for document image understanding. In this paper, we propose LayoutLM to jointly model the interaction between text and layout information across scanned document images, which is beneficial for a great number of real-world document image understanding tasks such as information extraction from scanned documents. Furthermore, we also leverage image features to incorporate the visual information of words into LayoutLM. To the best of our knowledge, this is the first time that text and layout are jointly learned in a single framework for document-level pre-training. It achieves new state-of-the-art results in several downstream tasks, including form understanding (from 70.72 to 79.27), receipt understanding (from 94.02 to 95.24) and document image classification (from 93.07 to 94.42). The code and pre-trained LayoutLM models are publicly available at https://github.com/microsoft/unilm/tree/master/layoutlm. |
Tasks | Document Image Classification, Document Layout Analysis, Image Classification |
Published | 2019-12-31 |
URL | https://arxiv.org/abs/1912.13318v3 |
https://arxiv.org/pdf/1912.13318v3.pdf | |
PWC | https://paperswithcode.com/paper/layoutlm-pre-training-of-text-and-layout-for |
Repo | https://github.com/microsoft/unilm/tree/master/layoutlm |
Framework | pytorch |
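The key idea of LayoutLM is that every token embedding is combined with 2-D position embeddings derived from the token's bounding box on the page. A minimal sketch of that embedding layer follows, with assumed vocabulary size and coordinate grid; it is illustrative only, not the released code at the repo above.

```python
import torch
import torch.nn as nn

class TextLayoutEmbedding(nn.Module):
    """Sum a word embedding with learned embeddings of its bounding-box
    coordinates (x0, y0, x1, y1), quantized to an integer grid."""
    def __init__(self, vocab_size, dim, max_pos=1024):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, dim)
        self.x_emb = nn.Embedding(max_pos, dim)
        self.y_emb = nn.Embedding(max_pos, dim)

    def forward(self, token_ids, boxes):
        # token_ids: (batch, seq_len); boxes: (batch, seq_len, 4) integer coordinates < max_pos
        x0, y0, x1, y1 = boxes.unbind(-1)
        return (self.tok(token_ids)
                + self.x_emb(x0) + self.y_emb(y0)
                + self.x_emb(x1) + self.y_emb(y1))
```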
Path-Augmented Graph Transformer Network
Title | Path-Augmented Graph Transformer Network |
Authors | Benson Chen, Regina Barzilay, Tommi Jaakkola |
Abstract | Much of the recent work on learning molecular representations has been based on Graph Convolution Networks (GCN). These models rely on local aggregation operations and can therefore miss higher-order graph properties. To remedy this, we propose Path-Augmented Graph Transformer Networks (PAGTN) that are explicitly built on longer-range dependencies in graph-structured data. Specifically, we use path features in molecular graphs to create global attention layers. We compare our PAGTN model against the GCN model and show that our model consistently outperforms GCNs on molecular property prediction datasets including quantum chemistry (QM7, QM8, QM9), physical chemistry (ESOL, Lipophilicity) and biochemistry (BACE, BBBP). |
Tasks | Molecular Property Prediction |
Published | 2019-05-29 |
URL | https://arxiv.org/abs/1905.12712v1 |
https://arxiv.org/pdf/1905.12712v1.pdf | |
PWC | https://paperswithcode.com/paper/path-augmented-graph-transformer-network |
Repo | https://github.com/benatorc/PA-Graph-Transformer |
Framework | pytorch |
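To make the "path features in global attention layers" idea concrete, the sketch below adds a learned score of pairwise path features to the attention logits of a single-head attention layer over all atoms. Shapes and names are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class PathAugmentedAttention(nn.Module):
    """Global attention over all atom pairs, with a bias computed from the
    path features (e.g. shortest-path bond/atom features) between each pair."""
    def __init__(self, dim, path_feat_dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.path_score = nn.Linear(path_feat_dim, 1)

    def forward(self, h, path_feats):
        # h: (n_atoms, dim); path_feats: (n_atoms, n_atoms, path_feat_dim)
        logits = self.q(h) @ self.k(h).T / h.shape[-1] ** 0.5
        logits = logits + self.path_score(path_feats).squeeze(-1)
        attn = torch.softmax(logits, dim=-1)
        return attn @ self.v(h)
```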
Benchmarking Zero-shot Text Classification: Datasets, Evaluation and Entailment Approach
Title | Benchmarking Zero-shot Text Classification: Datasets, Evaluation and Entailment Approach |
Authors | Wenpeng Yin, Jamaal Hay, Dan Roth |
Abstract | Zero-shot text classification (0Shot-TC) is a challenging NLU problem to which little attention has been paid by the research community. 0Shot-TC aims to associate an appropriate label with a piece of text, irrespective of the text domain and the aspect (e.g., topic, emotion, event, etc.) described by the label. And there are only a few articles studying 0Shot-TC, all focusing only on topical categorization which, we argue, is just the tip of the iceberg in 0Shot-TC. In addition, the chaotic experiments in literature make no uniform comparison, which blurs the progress. This work benchmarks the 0Shot-TC problem by providing unified datasets, standardized evaluations, and state-of-the-art baselines. Our contributions include: i) The datasets we provide facilitate studying 0Shot-TC relative to conceptually different and diverse aspects: the “topic” aspect includes “sports” and “politics” as labels; the “emotion” aspect includes “joy” and “anger”; the “situation” aspect includes “medical assistance” and “water shortage”. ii) We extend the existing evaluation setup (label-partially-unseen) – given a dataset, train on some labels, test on all labels – to include a more challenging yet realistic evaluation, label-fully-unseen 0Shot-TC (Chang et al., 2008), aiming at classifying text snippets without seeing task-specific training data at all. iii) We unify the 0Shot-TC of diverse aspects within a textual entailment formulation and study it this way. Code & Data: https://github.com/yinwenpeng/BenchmarkingZeroShot |
Tasks | Natural Language Inference, Text Classification |
Published | 2019-08-31 |
URL | https://arxiv.org/abs/1909.00161v1 |
https://arxiv.org/pdf/1909.00161v1.pdf | |
PWC | https://paperswithcode.com/paper/benchmarking-zero-shot-text-classification |
Repo | https://github.com/yinwenpeng/BenchmarkingZeroShot |
Framework | pytorch |
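The entailment formulation in contribution iii) is simple to state in code: the input text becomes the premise, and each candidate label is turned into a templated hypothesis scored by a pretrained NLI model. Below is a minimal sketch assuming a generic `nli(premise, hypothesis)` callable that returns an entailment probability; the template wording is illustrative, not the paper's exact setup.

```python
def zero_shot_classify(text, labels, nli, template="This text is about {}."):
    """Score each candidate label via textual entailment and return the most
    entailed label together with all scores."""
    scores = {label: nli(premise=text, hypothesis=template.format(label))
              for label in labels}
    return max(scores, key=scores.get), scores

# e.g. zero_shot_classify("The team won the championship.", ["sports", "politics"], nli)
```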
The Broad Optimality of Profile Maximum Likelihood
Title | The Broad Optimality of Profile Maximum Likelihood |
Authors | Yi Hao, Alon Orlitsky |
Abstract | We study three fundamental statistical-learning problems: distribution estimation, property estimation, and property testing. We establish the profile maximum likelihood (PML) estimator as the first unified sample-optimal approach to a wide range of learning tasks. In particular, for every alphabet size $k$ and desired accuracy $\varepsilon$: $\textbf{Distribution estimation}$ Under $\ell_1$ distance, PML yields optimal $\Theta(k/(\varepsilon^2\log k))$ sample complexity for sorted-distribution estimation, and a PML-based estimator empirically outperforms the Good-Turing estimator on the actual distribution; $\textbf{Additive property estimation}$ For a broad class of additive properties, the PML plug-in estimator uses just four times the sample size required by the best estimator to achieve roughly twice its error, with exponentially higher confidence; $\boldsymbol{\alpha}\textbf{-Rényi entropy estimation}$ For integer $\alpha>1$, the PML plug-in estimator has optimal $k^{1-1/\alpha}$ sample complexity; for non-integer $\alpha>3/4$, the PML plug-in estimator has sample complexity lower than the state of the art; $\textbf{Identity testing}$ In testing whether an unknown distribution is equal to or at least $\varepsilon$ far from a given distribution in $\ell_1$ distance, a PML-based tester achieves the optimal sample complexity up to logarithmic factors of $k$. Most of these results also hold for a near-linear-time computable variant of PML. Stronger results hold for a different and novel variant called truncated PML (TPML). |
Tasks | |
Published | 2019-06-10 |
URL | https://arxiv.org/abs/1906.03794v3 |
https://arxiv.org/pdf/1906.03794v3.pdf | |
PWC | https://paperswithcode.com/paper/the-broad-optimality-of-profile-maximum |
Repo | https://github.com/ucsdyi/PML_poster |
Framework | none |
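A sample's profile, the statistic that PML works with, is just the multiset of multiplicities: how many distinct symbols appear once, twice, and so on. The sketch below computes a profile; actually maximizing the profile likelihood over distributions is the hard part and is not shown.

```python
from collections import Counter

def profile(sample):
    """Return the profile of a sample as a map from multiplicity mu to the
    number of distinct symbols that appear exactly mu times."""
    symbol_counts = Counter(sample)                 # symbol -> multiplicity
    return dict(Counter(symbol_counts.values()))    # multiplicity -> # of symbols

# profile("abracadabra") == {5: 1, 2: 2, 1: 2}   (a appears 5 times; b, r twice; c, d once)
```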
UAV-GESTURE: A Dataset for UAV Control and Gesture Recognition
Title | UAV-GESTURE: A Dataset for UAV Control and Gesture Recognition |
Authors | Asanka G Perera, Yee Wei Law, Javaan Chahl |
Abstract | Current UAV-recorded datasets are mostly limited to action recognition and object tracking, whereas gesture signal datasets have mostly been recorded in indoor spaces. Currently, there is no outdoor recorded public video dataset for UAV commanding signals. Gesture signals can be effectively used with UAVs by leveraging the UAVs’ visual sensors and operational simplicity. To fill this gap and enable research in wider application areas, we present a UAV gesture signals dataset recorded in an outdoor setting. We selected 13 gestures suitable for basic UAV navigation and command from general aircraft handling and helicopter handling signals. We provide 119 high-definition video clips consisting of 37151 frames. The overall baseline gesture recognition performance computed using a Pose-based Convolutional Neural Network (P-CNN) is 91.9%. All the frames are annotated with body joints and gesture classes in order to extend the dataset’s applicability to a wider research area including gesture recognition, action recognition, human pose recognition and situation awareness. |
Tasks | Gesture Recognition, Object Tracking, Temporal Action Localization |
Published | 2019-01-09 |
URL | http://arxiv.org/abs/1901.02602v1 |
http://arxiv.org/pdf/1901.02602v1.pdf | |
PWC | https://paperswithcode.com/paper/uav-gesture-a-dataset-for-uav-control-and |
Repo | https://github.com/asankagp/UAV-GESTURE |
Framework | none |
Toward Controlling Discrimination in Online Ad Auctions
Title | Toward Controlling Discrimination in Online Ad Auctions |
Authors | L. Elisa Celis, Anay Mehrotra, Nisheeth K. Vishnoi |
Abstract | Online advertising platforms are thriving due to the customizable audiences they offer advertisers. However, recent studies show that advertisements can be discriminatory with respect to the gender or race of the audience that sees the ad, and may inadvertently cross ethical and/or legal boundaries. To prevent this, we propose a constrained ad auction framework that maximizes the platform’s revenue conditioned on ensuring that the audience seeing an advertiser’s ad is distributed appropriately across sensitive types such as gender or race. Building upon Myerson’s classic work, we first present an optimal auction mechanism for a large class of fairness constraints. Finding the parameters of this optimal auction, however, turns out to be a non-convex problem. We show that this non-convex problem can be reformulated as a more structured non-convex problem with no saddle points or local maxima; this allows us to develop a gradient-descent-based algorithm to solve it. Our empirical results on the A1 Yahoo! dataset demonstrate that our algorithm can obtain uniform coverage across different user types for each advertiser at a minor loss to the revenue of the platform, and a small change to the size of the audience each advertiser reaches. |
Tasks | |
Published | 2019-01-29 |
URL | https://arxiv.org/abs/1901.10450v2 |
https://arxiv.org/pdf/1901.10450v2.pdf | |
PWC | https://paperswithcode.com/paper/fair-online-advertising |
Repo | https://github.com/AnayMehrotra/Fair-Online-Advertising |
Framework | none |
Adversarial Learning of Deepfakes in Accounting
Title | Adversarial Learning of Deepfakes in Accounting |
Authors | Marco Schreyer, Timur Sattarov, Bernd Reimer, Damian Borth |
Abstract | Nowadays, organizations collect vast quantities of accounting relevant transactions, referred to as ‘journal entries’, in ‘Enterprise Resource Planning’ (ERP) systems. The aggregation of those entries ultimately defines an organization’s financial statement. To detect potential misstatements and fraud, international audit standards require auditors to directly assess journal entries using ‘Computer Assisted Audit Techniques’ (CAATs). At the same time, discoveries in deep learning research revealed that machine learning models are vulnerable to ‘adversarial attacks’. It also became evident that such attack techniques can be misused to generate ‘Deepfakes’ designed to directly attack the perception of humans by creating convincingly altered media content. The research of such developments and their potential impact on the finance and accounting domain is still in its early stage. We believe that it is of vital relevance to investigate how such techniques could be maliciously misused in this sphere. In this work, we show an adversarial attack against CAATs using deep neural networks. We first introduce a real-world ‘threat model’ designed to camouflage accounting anomalies such as fraudulent journal entries. Second, we show that adversarial autoencoder neural networks are capable of learning a human interpretable model of journal entries that disentangles the entries’ latent generative factors. Finally, we demonstrate how such a model can be maliciously misused by a perpetrator to generate robust ‘adversarial’ journal entries that mislead CAATs. |
Tasks | Adversarial Attack |
Published | 2019-10-09 |
URL | https://arxiv.org/abs/1910.03810v1 |
https://arxiv.org/pdf/1910.03810v1.pdf | |
PWC | https://paperswithcode.com/paper/adversarial-learning-of-deepfakes-in |
Repo | https://github.com/GitiHubi/deepPaper |
Framework | none |
A principled approach for generating adversarial images under non-smooth dissimilarity metrics
Title | A principled approach for generating adversarial images under non-smooth dissimilarity metrics |
Authors | Aram-Alexandre Pooladian, Chris Finlay, Tim Hoheisel, Adam Oberman |
Abstract | Deep neural networks perform well on real world data but are prone to adversarial perturbations: small changes in the input easily lead to misclassification. In this work, we propose an attack methodology not only for cases where the perturbations are measured by $\ell_p$ norms, but in fact any adversarial dissimilarity metric with a closed proximal form. This includes, but is not limited to, $\ell_1, \ell_2$, and $\ell_\infty$ perturbations; the $\ell_0$ counting “norm” (i.e. true sparseness); and the total variation seminorm, which is a (non-$\ell_p$) convolutional dissimilarity measuring local pixel changes. Our approach is a natural extension of a recent adversarial attack method, and eliminates the differentiability requirement of the metric. We demonstrate our algorithm, ProxLogBarrier, on the MNIST, CIFAR10, and ImageNet-1k datasets. We consider undefended and defended models, and show that our algorithm easily transfers to various datasets. We observe that ProxLogBarrier outperforms a host of modern adversarial attacks specialized for the $\ell_0$ case. Moreover, by altering images in the total variation seminorm, we shed light on a new class of perturbations that exploit neighboring pixel information. |
Tasks | Adversarial Attack |
Published | 2019-08-05 |
URL | https://arxiv.org/abs/1908.01667v2 |
https://arxiv.org/pdf/1908.01667v2.pdf | |
PWC | https://paperswithcode.com/paper/a-principled-approach-for-generating |
Repo | https://github.com/APooladian/ProxLogBarrierAttack |
Framework | pytorch |
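The core computational step the abstract relies on is a proximal-gradient update: take a gradient step on a misclassification loss with respect to the perturbation, then apply the proximal operator of the chosen (possibly non-smooth) dissimilarity metric. The sketch below shows that generic scheme with the ℓ1 prox (soft-thresholding) as an example; it is not the ProxLogBarrier algorithm itself, and the step sizes are illustrative.

```python
import torch
import torch.nn.functional as F

def soft_threshold(delta, lam):
    """Proximal operator of lam * ||.||_1 (soft-thresholding)."""
    return torch.sign(delta) * torch.clamp(delta.abs() - lam, min=0)

def prox_attack_step(model, x, delta, target, step=0.01, lam=0.001):
    """One proximal-gradient step of a generic targeted l1 attack: move the
    perturbation toward the target class, then soft-threshold it so the
    perturbation stays sparse."""
    delta = delta.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x + delta), target)   # targeted: minimize loss w.r.t. target class
    grad, = torch.autograd.grad(loss, delta)
    with torch.no_grad():
        return soft_threshold(delta - step * grad, step * lam)
```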
TextKD-GAN: Text Generation using Knowledge Distillation and Generative Adversarial Networks
Title | TextKD-GAN: Text Generation using Knowledge Distillation and Generative Adversarial Networks |
Authors | Md. Akmal Haidar, Mehdi Rezagholizadeh |
Abstract | Text generation is of particular interest in many NLP applications such as machine translation, language modeling, and text summarization. Generative adversarial networks (GANs) have achieved remarkable success in high quality image generation in computer vision, and recently, GANs have gained lots of interest from the NLP community as well. However, achieving similar success in NLP would be more challenging due to the discrete nature of text. In this work, we introduce a method using knowledge distillation to effectively exploit the GAN setup for text generation. We demonstrate how autoencoders (AEs) can be used to provide a continuous representation of sentences, which is a smooth representation that assigns non-zero probabilities to more than one word. We distill this representation to train the generator to synthesize similar smooth representations. We perform a number of experiments to validate our idea using different datasets and show that our proposed approach yields better performance in terms of the BLEU score and Jensen-Shannon distance (JSD) measure compared to traditional GAN-based text generation approaches without pre-training. |
Tasks | Image Generation, Language Modelling, Machine Translation, Text Generation, Text Summarization |
Published | 2019-04-23 |
URL | http://arxiv.org/abs/1905.01976v1 |
http://arxiv.org/pdf/1905.01976v1.pdf | |
PWC | https://paperswithcode.com/paper/190501976 |
Repo | https://github.com/Ankur3107/awesome-daily-blog |
Framework | tf |
Few-Shot NLG with Pre-Trained Language Model
Title | Few-Shot NLG with Pre-Trained Language Model |
Authors | Zhiyu Chen, Harini Eavani, Wenhu Chen, Yinyin Liu, William Yang Wang |
Abstract | Neural-based end-to-end approaches to natural language generation (NLG) from structured data or knowledge are data-hungry, making their adoption for real-world applications difficult with limited data. In this work, we propose the new task of \textit{few-shot natural language generation}. Motivated by how humans tend to summarize tabular data, we propose a simple yet effective approach and show that it not only demonstrates strong performance but also provides good generalization across domains. The design of the model architecture is based on two aspects: content selection/copying from input data and language modeling to compose coherent sentences, which can be acquired from prior knowledge. Accordingly, we employ a pre-trained domain-independent language model to serve as the prior, while the content selection/copying can be learned with only a few in-domain training instances, thus attaining the few-shot learning objective. To demonstrate that our approach generalizes across domains, we curated table-to-text data from multiple domains. With just 200 training examples, we show that our approach achieves very reasonable performance and outperforms the strongest baseline by an average of over 8.0 BLEU points. Our code and data are publicly available at https://github.com/czyssrs/Few-Shot-NLG |
Tasks | Few-Shot Learning, Language Modelling, Text Generation |
Published | 2019-04-21 |
URL | https://arxiv.org/abs/1904.09521v2 |
https://arxiv.org/pdf/1904.09521v2.pdf | |
PWC | https://paperswithcode.com/paper/few-shot-nlg-with-pre-trained-language-model |
Repo | https://github.com/czyssrs/Few-Shot-NLG |
Framework | tf |
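The "content selection/copying" half of the model can be illustrated with a standard pointer-style copy mixture: the output distribution blends the language model's vocabulary distribution with a copy distribution induced by attention over the input table tokens. The sketch below shows that generic mechanism, not the paper's exact architecture; all tensor names are assumptions.

```python
import torch

def copy_mixture(vocab_logits, attn_weights, src_token_ids, p_gen):
    """Blend generation and copying:
       P(w) = p_gen * P_vocab(w) + (1 - p_gen) * P_copy(w),
    where P_copy places the attention mass of each source position on its token id."""
    vocab_dist = torch.softmax(vocab_logits, dim=-1)         # (batch, vocab_size)
    copy_dist = torch.zeros_like(vocab_dist)
    copy_dist.scatter_add_(1, src_token_ids, attn_weights)   # (batch, src_len) -> vocab bins
    return p_gen * vocab_dist + (1 - p_gen) * copy_dist
```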
Detecting Adversarial Examples and Other Misclassifications in Neural Networks by Introspection
Title | Detecting Adversarial Examples and Other Misclassifications in Neural Networks by Introspection |
Authors | Jonathan Aigrain, Marcin Detyniecki |
Abstract | Despite their excellent performance on a wide variety of tasks, modern neural networks are unable to provide a reliable confidence value that allows the detection of misclassifications. This limitation is at the heart of what is known as an adversarial example, where the network provides a wrong prediction with strong confidence for a slightly modified image. Moreover, this overconfidence issue has also been observed for regular errors and out-of-distribution data. We tackle this problem by what we call introspection, i.e. using the information provided by the logits of an already pretrained neural network. We show that by training a simple 3-layer neural network on top of the logit activations, we are able to detect misclassifications at a competitive level. |
Tasks | |
Published | 2019-05-22 |
URL | https://arxiv.org/abs/1905.09186v1 |
https://arxiv.org/pdf/1905.09186v1.pdf | |
PWC | https://paperswithcode.com/paper/detecting-adversarial-examples-and-other |
Repo | https://github.com/gietema/umpire |
Framework | tf |
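The introspection detector itself is tiny: a small MLP that reads the pretrained classifier's logits and predicts whether the prediction is wrong. A minimal sketch follows; hidden sizes are assumed, not taken from the paper.

```python
import torch.nn as nn

class IntrospectionDetector(nn.Module):
    """3-layer MLP on top of a frozen classifier's logits, trained to output the
    logit of P(prediction is a misclassification)."""
    def __init__(self, n_classes, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_classes, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, logits):
        return self.net(logits)
```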
Improving Neural Network Quantization without Retraining using Outlier Channel Splitting
Title | Improving Neural Network Quantization without Retraining using Outlier Channel Splitting |
Authors | Ritchie Zhao, Yuwei Hu, Jordan Dotzel, Christopher De Sa, Zhiru Zhang |
Abstract | Quantization can improve the execution latency and energy efficiency of neural networks on both commodity GPUs and specialized accelerators. The majority of existing literature focuses on training quantized DNNs, while this work examines the less-studied topic of quantizing a floating-point model without (re)training. DNN weights and activations follow a bell-shaped distribution post-training, while practical hardware uses a linear quantization grid. This leads to challenges in dealing with outliers in the distribution. Prior work has addressed this by clipping the outliers or using specialized hardware. In this work, we propose outlier channel splitting (OCS), which duplicates channels containing outliers, then halves the channel values. The network remains functionally identical, but affected outliers are moved toward the center of the distribution. OCS requires no additional training and works on commodity hardware. Experimental evaluation on ImageNet classification and language modeling shows that OCS can outperform state-of-the-art clipping techniques with only minor overhead. |
Tasks | Language Modelling, Neural Network Compression, Quantization |
Published | 2019-01-28 |
URL | https://arxiv.org/abs/1901.09504v3 |
https://arxiv.org/pdf/1901.09504v3.pdf | |
PWC | https://paperswithcode.com/paper/improving-neural-network-quantization-without |
Repo | https://github.com/NervanaSystems/distiller |
Framework | pytorch |
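Outlier channel splitting itself is a small weight transformation: duplicate the input channels whose weights contain the largest-magnitude values and halve both copies, which shrinks the range seen by a linear quantizer while keeping the layer's function unchanged (provided the preceding layer's corresponding outputs are duplicated too). A minimal NumPy sketch, not the authors' implementation:

```python
import numpy as np

def outlier_channel_split(W, n_split):
    """Split the n_split input channels of a (out, in) weight matrix with the
    largest absolute weights: halve them and append the halved copies as new
    channels. The preceding layer's matching output channels must be duplicated
    as well (not shown) so the network stays functionally identical."""
    W = W.copy()
    outliers = np.argsort(np.abs(W).max(axis=0))[-n_split:]   # input channels with the worst outliers
    halves = W[:, outliers] / 2.0
    W[:, outliers] = halves
    return np.concatenate([W, halves], axis=1), outliers      # new shape: (out, in + n_split)
```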