April 2, 2020

3461 words 17 mins read

Paper Group ANR 278

Rethinking Batch Normalization in Transformers

Title Rethinking Batch Normalization in Transformers
Authors Sheng Shen, Zhewei Yao, Amir Gholami, Michael Mahoney, Kurt Keutzer
Abstract The standard normalization method for neural network (NN) models used in Natural Language Processing (NLP) is layer normalization (LN). This is different than batch normalization (BN), which is widely-adopted in Computer Vision. The preferred use of LN in NLP is principally due to the empirical observation that a (naive/vanilla) use of BN leads to significant performance degradation for NLP tasks; however, a thorough understanding of the underlying reasons for this is not always evident. In this paper, we perform a systematic study of NLP transformer models to understand why BN has a poor performance, as compared to LN. We find that the statistics of NLP data across the batch dimension exhibit large fluctuations throughout training. This results in instability, if BN is naively implemented. To address this, we propose Power Normalization (PN), a novel normalization scheme that resolves this issue by (i) relaxing zero-mean normalization in BN, (ii) incorporating a running quadratic mean instead of per batch statistics to stabilize fluctuations, and (iii) using an approximate backpropagation for incorporating the running statistics in the forward pass. We show theoretically, under mild assumptions, that PN leads to a smaller Lipschitz constant for the loss, compared with BN. Furthermore, we prove that the approximate backpropagation scheme leads to bounded gradients. We extensively test PN for transformers on a range of NLP tasks, and we show that it significantly outperforms both LN and BN. In particular, PN outperforms LN by 0.4/0.6 BLEU on IWSLT14/WMT14 and 5.6/3.0 PPL on PTB/WikiText-103.
Tasks
Published 2020-03-17
URL https://arxiv.org/abs/2003.07845v1
PDF https://arxiv.org/pdf/2003.07845v1.pdf
PWC https://paperswithcode.com/paper/rethinking-batch-normalization-in
Repo
Framework
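
As a rough illustration of the method, here is a minimal PyTorch sketch of the core idea: drop the zero-mean step and scale activations by a running quadratic mean rather than per-batch statistics. The momentum value and module shape are assumptions, and the paper's approximate backpropagation through the running statistic is not reproduced here.

```python
import torch
import torch.nn as nn

class PowerNormSketch(nn.Module):
    """Illustrative Power Normalization: no mean subtraction, and activations
    are scaled by a running quadratic mean instead of per-batch statistics."""
    def __init__(self, dim, alpha=0.9, eps=1e-5):
        super().__init__()
        self.gamma = nn.Parameter(torch.ones(dim))
        self.beta = nn.Parameter(torch.zeros(dim))
        self.register_buffer("running_quad_mean", torch.ones(dim))
        self.alpha, self.eps = alpha, eps

    def forward(self, x):  # x: (batch, seq_len, dim)
        if self.training:
            quad_mean = x.detach().pow(2).mean(dim=(0, 1))
            # Exponential moving average of the quadratic mean across batches.
            self.running_quad_mean.mul_(self.alpha).add_((1 - self.alpha) * quad_mean)
        x_hat = x / torch.sqrt(self.running_quad_mean + self.eps)
        return self.gamma * x_hat + self.beta
```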

Deep Learning Approach for Enhanced Cyber Threat Indicators in Twitter Stream

Title Deep Learning Approach for Enhanced Cyber Threat Indicators in Twitter Stream
Authors Simran K, Prathiksha Balakrishna, Vinayakumar R, Soman KP
Abstract In recent times, the amount of cyber security text data shared via social media resources, mainly Twitter, has increased. An accurate analysis of this data can help to develop a cyber threat situational awareness framework. This work proposes a deep learning based approach for tweet data analysis. To convert the tweets into numerical representations, various text representations are employed. These features are fed into a deep learning architecture for optimal feature extraction as well as classification. Various hyperparameter tuning approaches are used to identify the optimal text representation method as well as optimal network parameters and network structures for the deep learning models. For comparative analysis, a classical text representation method with a classical machine learning algorithm is employed. From a detailed analysis of the experiments, we found that the deep learning architectures with advanced text representation methods performed better than the classical text representation and classical machine learning algorithms. The primary reason is that the advanced text representation methods can learn the sequential properties that exist in textual data, while the deep learning architectures learn optimal features and reduce the feature size.
Tasks
Published 2020-03-31
URL https://arxiv.org/abs/2004.00503v1
PDF https://arxiv.org/pdf/2004.00503v1.pdf
PWC https://paperswithcode.com/paper/deep-learning-approach-for-enhanced-cyber
Repo
Framework
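
The pipeline is described only at a high level, so the following is just a toy sklearn analogue of the classical baseline the authors compare against (a classical text representation paired with a classical classifier); the tweets and labels are made up.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical tweets: 1 = cyber-threat related, 0 = benign.
tweets = ["New exploit for this CVE spotted in the wild",
          "Ransomware campaign targets hospitals",
          "Lovely weather in Amsterdam today",
          "Just finished a great book"]
labels = [1, 1, 0, 0]

# Classical representation (TF-IDF) + classical learner (logistic regression),
# the kind of baseline the deep models are evaluated against.
baseline = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
baseline.fit(tweets, labels)
print(baseline.predict(["patch released for critical zero-day vulnerability"]))
```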

PointAR: Efficient Lighting Estimation for Mobile Augmented Reality

Title PointAR: Efficient Lighting Estimation for Mobile Augmented Reality
Authors Yiqin Zhao, Tian Guo
Abstract We propose an efficient lighting estimation pipeline that is suitable to run on modern mobile devices, with resource complexities comparable to state-of-the-art on-device deep learning models. Our pipeline, referred to as PointAR, takes a single RGB-D image captured from the mobile camera and a 2D location in that image, and estimates 2nd-order spherical harmonics coefficients which can be directly utilized by rendering engines for indoor lighting in the context of augmented reality. Our key insight is to formulate lighting estimation as a learning problem directly on point clouds, which is in part inspired by the Monte Carlo integration leveraged by real-time spherical harmonics lighting. While existing approaches estimate lighting information with complex deep learning pipelines, our method focuses on reducing the computational complexity. Through both quantitative and qualitative experiments, we demonstrate that PointAR achieves lower lighting estimation errors compared to state-of-the-art methods. Further, our method requires an order of magnitude fewer resources, comparable to that of mobile-specific DNNs.
Tasks
Published 2020-03-30
URL https://arxiv.org/abs/2004.00006v1
PDF https://arxiv.org/pdf/2004.00006v1.pdf
PWC https://paperswithcode.com/paper/pointar-efficient-lighting-estimation-for
Repo
Framework
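
PointAR itself is a learned model, but the Monte Carlo spherical harmonics integration it cites as inspiration is easy to sketch. The numpy snippet below estimates the nine 2nd-order SH coefficients per color channel from sampled directions and radiance; the uniform-sphere sampling and constant white light are toy assumptions.

```python
import numpy as np

def sh_basis_l2(d):
    """Real spherical harmonics basis up to order 2 (9 terms) for unit vectors d (N, 3)."""
    x, y, z = d[:, 0], d[:, 1], d[:, 2]
    return np.stack([
        0.282095 * np.ones_like(x),                   # l = 0
        0.488603 * y, 0.488603 * z, 0.488603 * x,     # l = 1
        1.092548 * x * y, 1.092548 * y * z,
        0.315392 * (3 * z**2 - 1),
        1.092548 * x * z, 0.546274 * (x**2 - y**2),   # l = 2
    ], axis=1)

def sh_coeffs_monte_carlo(dirs, radiance):
    """Monte Carlo estimate of 2nd-order SH lighting coefficients from
    sampled directions (N, 3) and per-sample RGB radiance (N, 3)."""
    basis = sh_basis_l2(dirs)                          # (N, 9)
    # (4*pi / N) * sum of Y(d) L(d) approximates the integral over the sphere.
    return 4 * np.pi * basis.T @ radiance / len(dirs)  # (9, 3)

# Toy example: uniform directions under constant white light.
rng = np.random.default_rng(0)
d = rng.normal(size=(4096, 3))
d /= np.linalg.norm(d, axis=1, keepdims=True)
print(sh_coeffs_monte_carlo(d, np.ones((4096, 3)))[0])  # l=0 term ~ [3.54, 3.54, 3.54]
```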

Investigating the Impact of Inclusion in Face Recognition Training Data on Individual Face Identification

Title Investigating the Impact of Inclusion in Face Recognition Training Data on Individual Face Identification
Authors Chris Dulhanty, Alexander Wong
Abstract Modern face recognition systems leverage datasets containing images of hundreds of thousands of specific individuals’ faces to train deep convolutional neural networks to learn an embedding space that maps an arbitrary individual’s face to a vector representation of their identity. The performance of a face recognition system in face verification (1:1) and face identification (1:N) tasks is directly related to the ability of an embedding space to discriminate between identities. Recently, there has been significant public scrutiny into the source and privacy implications of large-scale face recognition training datasets such as MS-Celeb-1M and MegaFace, as many people are uncomfortable with their face being used to train dual-use technologies that can enable mass surveillance. However, the impact of an individual’s inclusion in training data on a derived system’s ability to recognize them has not previously been studied. In this work, we audit ArcFace, a state-of-the-art, open source face recognition system, in a large-scale face identification experiment with more than one million distractor images. We find a Rank-1 face identification accuracy of 79.71% for individuals present in the model’s training data and an accuracy of 75.73% for those not present. This modest difference in accuracy demonstrates that face recognition systems using deep learning work better for individuals they are trained on, which has serious privacy implications when one considers that all major open source face recognition training datasets do not obtain informed consent from individuals during their collection.
Tasks Face Identification, Face Recognition, Face Verification
Published 2020-01-09
URL https://arxiv.org/abs/2001.03071v2
PDF https://arxiv.org/pdf/2001.03071v2.pdf
PWC https://paperswithcode.com/paper/investigating-the-impact-of-inclusion-in-face
Repo
Framework
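
The audit protocol reduces to a nearest-neighbor search in embedding space. Here is a minimal numpy sketch of rank-1 identification, assuming L2-normalized embeddings (e.g., produced by ArcFace) and toy array shapes:

```python
import numpy as np

def rank1_accuracy(probe_emb, probe_ids, gallery_emb, gallery_ids):
    """Rank-1 face identification (1:N): each probe is assigned the identity of
    its most similar gallery image; distractors simply enlarge the gallery.
    probe_emb: (P, d), gallery_emb: (G, d), both L2-normalized."""
    sims = probe_emb @ gallery_emb.T                 # cosine similarity matrix (P, G)
    predicted = gallery_ids[np.argmax(sims, axis=1)]
    return float(np.mean(predicted == probe_ids))
```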

Distributional semantic modeling: a revised technique to train term/word vector space models applying the ontology-related approach

Title Distributional semantic modeling: a revised technique to train term/word vector space models applying the ontology-related approach
Authors Oleksandr Palagin, Vitalii Velychko, Kyrylo Malakhov, Oleksandr Shchurov
Abstract We design a new technique for distributional semantic modeling with a neural network-based approach to learn distributed term representations (term embeddings), i.e., term vector space models, inspired by the recent ontology-related approach (using different types of contextual knowledge such as syntactic, terminological, and semantic knowledge) to the identification of terms (term extraction) and relations between them (relation extraction), called semantic pre-processing technology (SPT). Our method relies on automatic term extraction from natural language texts and the subsequent formation of problem-oriented or application-oriented (and deeply annotated) text corpora in which the fundamental entity is the term (including both non-compositional and compositional terms). This gives us the opportunity to switch from distributed word representations (word embeddings) to distributed term representations (term embeddings). This transition makes it possible to generate more accurate semantic maps of different subject domains (and of relations between input terms, which is useful for exploring clusters and oppositions, or for testing hypotheses about them). The semantic map can be represented as a graph using Vec2graph, a Python library for visualizing word embeddings (term embeddings in our case) as dynamic and interactive graphs. The Vec2graph library coupled with term embeddings will not only improve accuracy in solving standard NLP tasks, but also update the conventional concept of automated ontology development. The main practical result of our work is a development kit (a set of toolkits exposed as web service APIs and a web application) that provides all the routines needed for basic linguistic pre-processing and semantic pre-processing of natural language texts in Ukrainian, for the future training of term vector space models.
Tasks Relation Extraction, Word Embeddings
Published 2020-03-06
URL https://arxiv.org/abs/2003.03350v1
PDF https://arxiv.org/pdf/2003.03350v1.pdf
PWC https://paperswithcode.com/paper/distributional-semantic-modeling-a-revised
Repo
Framework
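
The authors' toolkit targets Ukrainian text and their SPT pipeline, neither of which is reproduced here; this gensim toy only illustrates the word-to-term shift by joining extracted multi-word terms into single tokens before training, with a made-up corpus.

```python
from gensim.models import Word2Vec

# Hypothetical corpus where multi-word terms have already been extracted
# and joined into single tokens, turning word embeddings into term embeddings.
term_sentences = [
    ["neural_network", "learns", "distributed_term_representation"],
    ["ontology", "relates", "term_extraction", "and", "relation_extraction"],
    ["term_extraction", "precedes", "relation_extraction"],
]

# Skip-gram model over term-level tokens (hyperparameters are illustrative).
model = Word2Vec(term_sentences, vector_size=100, window=5, min_count=1, sg=1)
print(model.wv.most_similar("term_extraction", topn=3))
```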

EQL – an extremely easy to learn knowledge graph query language, achieving highspeed and precise search

Title EQL – an extremely easy to learn knowledge graph query language, achieving highspeed and precise search
Authors Han Liu, Shantao Liu
Abstract EQL, also named Extremely Simple Query Language, can be widely used in the fields of knowledge graphs, precise search, strong artificial intelligence, databases, smart speakers, patent search, and others. EQL adopts the principle of minimalism in design and pursues simplicity and ease of learning, so that everyone can master it quickly. The EQL language and lambda calculus are interconvertible, which reveals the mathematical nature of the EQL language and lays a solid foundation for its rigor and logical integrity. The EQL language, together with a comprehensive knowledge graph system of the world’s common sense, can form the foundation of strong AI in the future and make up for current AI systems’ lack of understanding of the world’s common sense. The EQL language can be used not only by humans, but also as a basic language for data query and data exchange between robots.
Tasks
Published 2020-03-19
URL https://arxiv.org/abs/2003.11105v1
PDF https://arxiv.org/pdf/2003.11105v1.pdf
PWC https://paperswithcode.com/paper/eql-an-extremely-easy-to-learn-knowledge
Repo
Framework

Why Do Deep Residual Networks Generalize Better than Deep Feedforward Networks? – A Neural Tangent Kernel Perspective

Title Why Do Deep Residual Networks Generalize Better than Deep Feedforward Networks? – A Neural Tangent Kernel Perspective
Authors Kaixuan Huang, Yuqing Wang, Molei Tao, Tuo Zhao
Abstract Deep residual networks (ResNets) have demonstrated better generalization performance than deep feedforward networks (FFNets). However, the theory behind such a phenomenon is still largely unknown. This paper studies this fundamental problem in deep learning from a so-called “neural tangent kernel” perspective. Specifically, we first show that under proper conditions, as the width goes to infinity, training deep ResNets can be viewed as learning reproducing kernel functions with some kernel function. We then compare the kernel of deep ResNets with that of deep FFNets and discover that the class of functions induced by the kernel of FFNets is asymptotically not learnable, as the depth goes to infinity. In contrast, the class of functions induced by the kernel of ResNets does not exhibit such degeneracy. Our discovery partially justifies the advantages of deep ResNets over deep FFNets in generalization abilities. Numerical results are provided to support our claim.
Tasks
Published 2020-02-14
URL https://arxiv.org/abs/2002.06262v1
PDF https://arxiv.org/pdf/2002.06262v1.pdf
PWC https://paperswithcode.com/paper/why-do-deep-residual-networks-generalize
Repo
Framework
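
The paper works in the infinite-width limit, but the kernel object itself can be probed at finite width. Below is a small PyTorch sketch of an empirical NTK entry, k(x1, x2) = ⟨∇θ f(x1), ∇θ f(x2)⟩, for an assumed toy feedforward network (not the paper's analytical construction).

```python
import torch
import torch.nn as nn

def empirical_ntk(model, x1, x2):
    """Finite-width empirical NTK entry for a scalar-output network."""
    def grad_vec(x):
        model.zero_grad()
        model(x).sum().backward()
        # torch.cat copies the gradients, so the vector survives the next backward.
        return torch.cat([p.grad.flatten() for p in model.parameters()])
    g1 = grad_vec(x1)
    g2 = grad_vec(x2)
    return torch.dot(g1, g2).item()

ffnet = nn.Sequential(nn.Linear(4, 512), nn.ReLU(), nn.Linear(512, 1))
x1, x2 = torch.randn(1, 4), torch.randn(1, 4)
print(empirical_ntk(ffnet, x1, x2))
```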

Why Molière most likely did write his plays

Title Why Molière most likely did write his plays
Authors Florian Cafiero, Jean-Baptiste Camps
Abstract As with Shakespeare, a hard-fought debate has emerged about Molière, a supposedly uneducated actor who, according to some, could not have written the masterpieces attributed to him. In the past decades, the century-old thesis according to which Pierre Corneille would be their actual author has become popular, mostly because of new works in computational linguistics. These results are reassessed here through state-of-the-art attribution methods. We study a corpus of comedies in verse by major authors of Molière and Corneille’s time. Analysis of lexicon, rhymes, word forms, affixes, morphosyntactic sequences, and function words does not give any clue that another author among the major playwrights of the time would have written the plays signed under the name Molière.
Tasks
Published 2020-01-02
URL https://arxiv.org/abs/2001.01595v1
PDF https://arxiv.org/pdf/2001.01595v1.pdf
PWC https://paperswithcode.com/paper/why-moliere-most-likely-did-write-his-plays
Repo
Framework
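
The paper's full battery covers lexicon, rhymes, affixes, and morphosyntax, which is beyond a snippet; as a flavor of the function-word side of such analyses, here is a toy Burrows' Delta attribution on most-frequent-word rates. The corpus, author labels, and disputed text would be supplied by the user.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

def burrows_delta_attribution(texts, authors, disputed, top_n=100):
    """Toy Burrows' Delta: z-score the relative frequencies of the top_n most
    frequent words, then attribute the disputed text to the closest author."""
    vec = CountVectorizer(max_features=top_n)
    counts = vec.fit_transform(texts + [disputed]).toarray().astype(float)
    rates = counts / counts.sum(axis=1, keepdims=True)              # relative frequencies
    z = (rates - rates.mean(axis=0)) / (rates.std(axis=0) + 1e-9)   # word-wise z-scores
    deltas = np.abs(z[:-1] - z[-1]).mean(axis=1)                    # mean |dz| to disputed text
    return authors[int(np.argmin(deltas))]
```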

AvgOut: A Simple Output-Probability Measure to Eliminate Dull Responses

Title AvgOut: A Simple Output-Probability Measure to Eliminate Dull Responses
Authors Tong Niu, Mohit Bansal
Abstract Many sequence-to-sequence dialogue models tend to generate safe, uninformative responses. There have been various useful efforts on trying to eliminate them. However, these approaches either improve decoding algorithms during inference, rely on hand-crafted features, or employ complex models. In our work, we build dialogue models that are dynamically aware of what utterances or tokens are dull without any feature-engineering. Specifically, we start with a simple yet effective automatic metric, AvgOut, which calculates the average output probability distribution of all time steps on the decoder side during training. This metric directly estimates which tokens are more likely to be generated, thus making it a faithful evaluation of the model diversity (i.e., for diverse models, the token probabilities should be more evenly distributed rather than peaked at a few dull tokens). We then leverage this novel metric to propose three models that promote diversity without losing relevance. The first model, MinAvgOut, directly maximizes the diversity score through the output distributions of each batch; the second model, Label Fine-Tuning (LFT), prepends to the source sequence a label continuously scaled by the diversity score to control the diversity level; the third model, RL, adopts Reinforcement Learning and treats the diversity score as a reward signal. Moreover, we experiment with a hybrid model by combining the loss terms of MinAvgOut and RL. All four models outperform their base LSTM-RNN model on both diversity and relevance by a large margin, and are comparable to or better than competitive baselines (also verified via human evaluation). Moreover, our approaches are orthogonal to the base model, making them applicable as an add-on to other emerging better dialogue models in the future.
Tasks Feature Engineering
Published 2020-01-15
URL https://arxiv.org/abs/2001.05467v1
PDF https://arxiv.org/pdf/2001.05467v1.pdf
PWC https://paperswithcode.com/paper/avgout-a-simple-output-probability-measure-to
Repo
Framework
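
A sketch of the metric itself in PyTorch: the averaging over decoder time steps follows the abstract, while the entropy-based diversity proxy at the end is our assumption, not the paper's exact score.

```python
import torch

def avgout(decoder_probs, mask):
    """AvgOut sketch: average the decoder's output distribution over all
    non-padded time steps; a peaked average indicates dull, generic responses.
    decoder_probs: (batch, steps, vocab) softmax outputs; mask: (batch, steps)."""
    weighted = decoder_probs * mask.unsqueeze(-1)
    avg_dist = weighted.sum(dim=(0, 1)) / mask.sum()   # (vocab,), sums to 1
    # One possible diversity proxy (an assumption): entropy of the averaged
    # distribution -- higher means probability mass is spread more evenly.
    diversity = -(avg_dist * (avg_dist + 1e-12).log()).sum()
    return avg_dist, diversity
```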

Information Extraction based on Named Entity for Tourism Corpus

Title Information Extraction based on Named Entity for Tourism Corpus
Authors Chantana Chantrapornchai, Aphisit Tunsakul
Abstract Tourism information is scattered across many sources nowadays. To search for this information, it is usually time consuming to browse through the results from a search engine and to select and view the details of each accommodation. In this paper, we present a methodology to extract particular information from the full text returned by the search engine, to assist the users. The users can then look specifically at the desired relevant information. The approach can be used for the same task in other domains. The main steps are 1) building training data and 2) building a recognition model. First, the tourism data is gathered and the vocabularies are built. The raw corpus is used to train vocabulary embeddings, and also to create annotated data. The process of creating named entity annotations is presented. Then, a recognition model for a given entity type can be built. From the experiments, given a hotel description, the model can extract the desired entities, i.e., name, location, and facility. The extracted data can further be stored as structured information, e.g., in ontology format, for future querying and inference. The model for automatic named entity identification, based on machine learning, yields errors ranging from 8% to 25%.
Tasks
Published 2020-01-03
URL https://arxiv.org/abs/2001.01588v1
PDF https://arxiv.org/pdf/2001.01588v1.pdf
PWC https://paperswithcode.com/paper/information-extraction-based-on-named-entity
Repo
Framework
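
Once the recognition model tags tokens, turning tags into structured records is a simple decode. A sketch with hypothetical BIO tags for the name/location/facility entities mentioned in the abstract:

```python
def extract_entities(tokens, bio_tags):
    """Collect (entity_text, entity_type) spans from BIO tags, e.g. the
    NAME / LOCATION / FACILITY entities extracted from hotel descriptions."""
    entities, span, span_type = [], [], None
    for tok, tag in zip(tokens, bio_tags):
        if tag.startswith("B-"):
            if span:
                entities.append((" ".join(span), span_type))
            span, span_type = [tok], tag[2:]
        elif tag.startswith("I-") and span_type == tag[2:]:
            span.append(tok)
        else:
            if span:
                entities.append((" ".join(span), span_type))
            span, span_type = [], None
    if span:  # flush the final open span
        entities.append((" ".join(span), span_type))
    return entities

tokens = ["Grand", "Palace", "Hotel", "in", "Bangkok", "has", "free", "wifi"]
tags = ["B-NAME", "I-NAME", "I-NAME", "O", "B-LOCATION", "O", "B-FACILITY", "I-FACILITY"]
print(extract_entities(tokens, tags))
# [('Grand Palace Hotel', 'NAME'), ('Bangkok', 'LOCATION'), ('free wifi', 'FACILITY')]
```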

Learning light field synthesis with Multi-Plane Images: scene encoding as a recurrent segmentation task

Title Learning light field synthesis with Multi-Plane Images: scene encoding as a recurrent segmentation task
Authors Tomás Völker, Guillaume Boisson, Bertrand Chupeau
Abstract In this paper we address the problem of view synthesis from large-baseline light fields by turning a sparse set of input views into a Multi-Plane Image (MPI). Because available datasets are scarce, we propose a lightweight network that does not require extensive training. Unlike the latest approaches, our model does not learn to estimate RGB layers but only encodes the scene geometry within the MPI alpha layers, which comes down to a segmentation task. A Learned Gradient Descent (LGD) framework is used to cascade the same convolutional network in a recurrent fashion in order to refine the volumetric representation obtained. Thanks to its low number of parameters, our model trains successfully on a small light field video dataset and provides visually appealing results. It also exhibits convenient generalization properties regarding the number of input views, the number of depth planes in the MPI, and the number of refinement iterations.
Tasks
Published 2020-02-12
URL https://arxiv.org/abs/2002.05028v2
PDF https://arxiv.org/pdf/2002.05028v2.pdf
PWC https://paperswithcode.com/paper/learning-light-field-synthesis-with-multi
Repo
Framework
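
The model predicts only the MPI alpha layers, but it may help to see how an MPI renders once the alphas are known. A minimal back-to-front "over" compositing sketch in numpy, with assumed array shapes:

```python
import numpy as np

def composite_mpi(rgb_layers, alpha_layers):
    """Back-to-front 'over' compositing of a Multi-Plane Image.
    rgb_layers: (D, H, W, 3), alpha_layers: (D, H, W, 1); plane 0 is farthest."""
    out = np.zeros(rgb_layers.shape[1:])
    for rgb, alpha in zip(rgb_layers, alpha_layers):
        out = rgb * alpha + out * (1.0 - alpha)   # nearer planes occlude farther ones
    return out
```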

Scalable Neural Methods for Reasoning With a Symbolic Knowledge Base

Title Scalable Neural Methods for Reasoning With a Symbolic Knowledge Base
Authors William W. Cohen, Haitian Sun, R. Alex Hofer, Matthew Siegler
Abstract We describe a novel way of representing a symbolic knowledge base (KB) called a sparse-matrix reified KB. This representation enables neural modules that are fully differentiable, faithful to the original semantics of the KB, expressive enough to model multi-hop inferences, and scalable enough to use with realistically large KBs. The sparse-matrix reified KB can be distributed across multiple GPUs, can scale to tens of millions of entities and facts, and is orders of magnitude faster than naive sparse-matrix implementations. The reified KB enables very simple end-to-end architectures to obtain competitive performance on several benchmarks representing two families of tasks: KB completion, and learning semantic parsers from denotations.
Tasks
Published 2020-02-14
URL https://arxiv.org/abs/2002.06115v1
PDF https://arxiv.org/pdf/2002.06115v1.pdf
PWC https://paperswithcode.com/paper/scalable-neural-methods-for-reasoning-with-a-1
Repo
Framework
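
A numpy/scipy sketch of the relation-following primitive as we read it from the abstract: facts are reified as rows of three sparse matrices, so following a relation from a weighted entity set is a handful of sparse products. The toy KB, entity indices, and `follow` signature are our assumptions.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Toy reified KB: each fact gets a row in three sparse matrices.
# facts: f0 = (paris, capital_of, france), f1 = (rome, capital_of, italy)
n_e, n_r, n_f = 4, 1, 2   # entities: paris=0 france=1 rome=2 italy=3; relations: capital_of=0
M_subj = csr_matrix(([1., 1.], ([0, 1], [0, 2])), shape=(n_f, n_e))
M_rel  = csr_matrix(([1., 1.], ([0, 1], [0, 0])), shape=(n_f, n_r))
M_obj  = csr_matrix(([1., 1.], ([0, 1], [1, 3])), shape=(n_f, n_e))

def follow(x, r):
    """Differentiable relation-following: weighted entity set x (n_e,) and
    weighted relation set r (n_r,) -> weighted set of objects of matching facts."""
    fact_scores = (M_subj @ x) * (M_rel @ r)   # which reified facts fire
    return M_obj.T @ fact_scores

x = np.array([1., 0., 0., 0.])   # {paris}
r = np.array([1.])               # {capital_of}
print(follow(x, r))              # mass moves to france: [0. 1. 0. 0.]
```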

Explicit Mean-Square Error Bounds for Monte-Carlo and Linear Stochastic Approximation

Title Explicit Mean-Square Error Bounds for Monte-Carlo and Linear Stochastic Approximation
Authors Shuhang Chen, Adithya M. Devraj, Ana Bušić, Sean Meyn
Abstract This paper concerns error bounds for recursive equations subject to Markovian disturbances. Motivating examples abound within the fields of Markov chain Monte Carlo (MCMC) and Reinforcement Learning (RL), and many of these algorithms can be interpreted as special cases of stochastic approximation (SA). It is argued that it is not possible in general to obtain a Hoeffding bound on the error sequence, even when the underlying Markov chain is reversible and geometrically ergodic, such as the M/M/1 queue. This is motivation for the focus on mean square error bounds for parameter estimates. It is shown that mean square error achieves the optimal rate of $O(1/n)$, subject to conditions on the step-size sequence. Moreover, the exact constants in the rate are obtained, which is of great value in algorithm design.
Tasks
Published 2020-02-07
URL https://arxiv.org/abs/2002.02584v1
PDF https://arxiv.org/pdf/2002.02584v1.pdf
PWC https://paperswithcode.com/paper/explicit-mean-square-error-bounds-for-monte
Repo
Framework
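
The O(1/n) rate is easy to see in the simplest special case: Monte Carlo averaging written as stochastic approximation with step size 1/n. This toy simulation uses i.i.d. noise, whereas the paper's setting allows Markovian disturbances.

```python
import numpy as np

# Linear SA recursion theta_{n} = theta_{n-1} + a_n (W_n - theta_{n-1}),
# with a_n = 1/n this is exactly the running Monte Carlo average of W.
rng = np.random.default_rng(0)

def mse_at(n_steps, runs=2000, mean=1.0, std=2.0):
    errs = []
    for _ in range(runs):
        theta = 0.0
        for n in range(1, n_steps + 1):
            w = rng.normal(loc=mean, scale=std)  # i.i.d. here; the paper allows Markovian noise
            theta += (w - theta) / n
        errs.append((theta - mean) ** 2)
    return np.mean(errs)

for n in (100, 200, 400):
    m = mse_at(n)
    print(n, m, "n * MSE ~", n * m)   # n * MSE hovers near std**2 = 4, i.e. MSE = O(1/n)
```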

Beat the AI: Investigating Adversarial Human Annotations for Reading Comprehension

Title Beat the AI: Investigating Adversarial Human Annotations for Reading Comprehension
Authors Max Bartolo, Alastair Roberts, Johannes Welbl, Sebastian Riedel, Pontus Stenetorp
Abstract Innovations in annotation methodology have been a propellant for Reading Comprehension (RC) datasets and models. One recent trend to challenge current RC models is to involve a model in the annotation process: humans create questions adversarially, such that the model fails to answer them correctly. In this work we investigate this annotation approach and apply it in three different settings, collecting a total of 36,000 samples with progressively stronger models in the annotation loop. This allows us to explore questions such as the reproducibility of the adversarial effect, transfer from data collected with varying model-in-the-loop strengths, and generalisation to data collected without a model. We find that training on adversarially collected samples leads to strong generalisation to non-adversarially collected datasets, yet with progressive deterioration as the model-in-the-loop strength increases. Furthermore, we find that stronger models can still learn from datasets collected with substantially weaker models in the loop: when trained on data collected with a BiDAF model in the loop, RoBERTa achieves 36.0 F1 on questions that it cannot answer when trained on SQuAD, only marginally lower than when trained on data collected using RoBERTa itself.
Tasks Reading Comprehension
Published 2020-02-02
URL https://arxiv.org/abs/2002.00293v1
PDF https://arxiv.org/pdf/2002.00293v1.pdf
PWC https://paperswithcode.com/paper/beat-the-ai-investigating-adversarial-human
Repo
Framework

A Zero-Shot based Fingerprint Presentation Attack Detection System

Title A Zero-Shot based Fingerprint Presentation Attack Detection System
Authors Haozhe Liu, Wentian Zhang, Guojie Liu, Feng Liu
Abstract With the development of presentation attacks, Automated Fingerprint Recognition Systems (AFRSs) have become vulnerable to them. Thus, numerous presentation attack detection (PAD) methods have been proposed to ensure the normal utilization of AFRSs. However, the demand for large-scale presentation attack images and the low generalization ability of existing PAD methods restrict their actual performance. Therefore, we propose a novel Zero-Shot Presentation Attack Detection (ZSPAD) model to guarantee the generalization of the PAD model. The proposed ZSPAD-Model, based on a generative model, does not utilize any negative samples during its construction, which ensures robustness to presentation attacks based on various types or materials. Different from other auto-encoder based models, a Fine-grained Map architecture is proposed to refine the reconstruction error of the auto-encoder networks, and a task-specific Gaussian model is utilized to improve the quality of clustering. Meanwhile, in order to improve the performance of the proposed model, 9 confidence scores are discussed in this article. Experimental results showed that the ZSPAD-Model is the state of the art for ZSPAD, and the MS-Score is the best confidence score. Compared with existing methods, the proposed ZSPAD-Model performs better than the feature-based method, and under the multi-shot setting it outperforms the learning-based method with little training data. When large training data is available, their results are similar.
Tasks
Published 2020-02-12
URL https://arxiv.org/abs/2002.04908v1
PDF https://arxiv.org/pdf/2002.04908v1.pdf
PWC https://paperswithcode.com/paper/a-zero-shot-based-fingerprint-presentation
Repo
Framework
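
The exact architecture and the nine confidence scores are not specified in the abstract; as an assumed simplification, a one-dimensional Gaussian over bona fide reconstruction errors gives the flavor of a reconstruction-error confidence score.

```python
import numpy as np

def gaussian_confidence(recon_errors_train, recon_error_test):
    """Sketch of a reconstruction-error confidence score: fit a Gaussian to the
    autoencoder's errors on bona fide (live) training prints, then score a test
    print by how typical its error is; a low score suggests a presentation attack.
    This is an assumed simplification of the paper's task-specific Gaussian model."""
    mu = np.mean(recon_errors_train)
    sigma = np.std(recon_errors_train) + 1e-9
    z = (recon_error_test - mu) / sigma
    return np.exp(-0.5 * z ** 2)   # unnormalized Gaussian confidence in (0, 1]
```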