Paper Group ANR 278
Rethinking Batch Normalization in Transformers
Title | Rethinking Batch Normalization in Transformers |
Authors | Sheng Shen, Zhewei Yao, Amir Gholami, Michael Mahoney, Kurt Keutzer |
Abstract | The standard normalization method for neural network (NN) models used in Natural Language Processing (NLP) is layer normalization (LN). This is different than batch normalization (BN), which is widely-adopted in Computer Vision. The preferred use of LN in NLP is principally due to the empirical observation that a (naive/vanilla) use of BN leads to significant performance degradation for NLP tasks; however, a thorough understanding of the underlying reasons for this is not always evident. In this paper, we perform a systematic study of NLP transformer models to understand why BN has a poor performance, as compared to LN. We find that the statistics of NLP data across the batch dimension exhibit large fluctuations throughout training. This results in instability, if BN is naively implemented. To address this, we propose Power Normalization (PN), a novel normalization scheme that resolves this issue by (i) relaxing zero-mean normalization in BN, (ii) incorporating a running quadratic mean instead of per batch statistics to stabilize fluctuations, and (iii) using an approximate backpropagation for incorporating the running statistics in the forward pass. We show theoretically, under mild assumptions, that PN leads to a smaller Lipschitz constant for the loss, compared with BN. Furthermore, we prove that the approximate backpropagation scheme leads to bounded gradients. We extensively test PN for transformers on a range of NLP tasks, and we show that it significantly outperforms both LN and BN. In particular, PN outperforms LN by 0.4/0.6 BLEU on IWSLT14/WMT14 and 5.6/3.0 PPL on PTB/WikiText-103. |
Tasks | |
Published | 2020-03-17 |
URL | https://arxiv.org/abs/2003.07845v1 |
https://arxiv.org/pdf/2003.07845v1.pdf | |
PWC | https://paperswithcode.com/paper/rethinking-batch-normalization-in |
Repo | |
Framework | |
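To make the Power Normalization idea above concrete, here is a minimal PyTorch sketch of the core mechanism described in the abstract: dropping zero-mean normalization and rescaling by a running quadratic mean rather than per-batch statistics. The class name and hyperparameters are illustrative, and the authors' approximate backward pass for the running statistic is not reproduced.

```python
import torch
import torch.nn as nn

class PowerNormSketch(nn.Module):
    """Illustrative simplification of Power Normalization: scale activations by a
    running quadratic mean (no zero-mean step, no per-batch variance). Not the
    authors' exact layer, which also uses an approximate backward pass."""

    def __init__(self, dim, alpha=0.9, eps=1e-5):
        super().__init__()
        self.gamma = nn.Parameter(torch.ones(dim))
        self.beta = nn.Parameter(torch.zeros(dim))
        self.register_buffer("running_quad_mean", torch.ones(dim))
        self.alpha, self.eps = alpha, eps

    def forward(self, x):                                  # x: (batch, seq, dim)
        if self.training:
            with torch.no_grad():                          # update running E[x^2]
                quad_mean = (x.detach() ** 2).mean(dim=(0, 1))
                self.running_quad_mean.mul_(self.alpha).add_((1 - self.alpha) * quad_mean)
        x_hat = x / torch.sqrt(self.running_quad_mean + self.eps)
        return self.gamma * x_hat + self.beta
```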
Deep Learning Approach for Enhanced Cyber Threat Indicators in Twitter Stream
Title | Deep Learning Approach for Enhanced Cyber Threat Indicators in Twitter Stream |
Authors | Simran K, Prathiksha Balakrishna, Vinayakumar R, Soman KP |
Abstract | Recently, the amount of cyber security text data shared via social media resources, mainly Twitter, has increased. An accurate analysis of this data can help to develop a cyber threat situational awareness framework. This work proposes a deep learning based approach for tweet data analysis. To convert the tweets into numerical representations, various text representations are employed. These features are fed into a deep learning architecture for optimal feature extraction as well as classification. Various hyperparameter tuning approaches are used to identify the optimal text representation method as well as optimal network parameters and network structures for the deep learning models. For comparative analysis, a classical text representation method with a classical machine learning algorithm is employed. From a detailed analysis of the experiments, we found that the deep learning architectures with advanced text representation methods performed better than the classical text representation and classical machine learning algorithms. The primary reason for this is that the advanced text representation methods can learn the sequential properties that exist in the textual data, and the deep learning architectures learn the optimal features while reducing the feature size. |
Tasks | |
Published | 2020-03-31 |
URL | https://arxiv.org/abs/2004.00503v1 |
https://arxiv.org/pdf/2004.00503v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-approach-for-enhanced-cyber |
Repo | |
Framework | |
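The abstract contrasts deep models against a classical text representation with a classical classifier. Below is a hedged sketch of such a classical baseline (TF-IDF plus logistic regression); the file name, column names, and hyperparameters are placeholders, not the paper's actual setup.

```python
# Hypothetical classical baseline: TF-IDF features + logistic regression for
# classifying tweets as cyber-threat-related or not. "tweets.csv" and its
# columns are assumed placeholders.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

df = pd.read_csv("tweets.csv")                       # assumed columns: text, label
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.2, random_state=42)

vectorizer = TfidfVectorizer(max_features=20000, ngram_range=(1, 2))
clf = LogisticRegression(max_iter=1000)
clf.fit(vectorizer.fit_transform(X_train), y_train)
print(classification_report(y_test, clf.predict(vectorizer.transform(X_test))))
```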
PointAR: Efficient Lighting Estimation for Mobile Augmented Reality
Title | PointAR: Efficient Lighting Estimation for Mobile Augmented Reality |
Authors | Yiqin Zhao, Tian Guo |
Abstract | We propose an efficient lighting estimation pipeline that is suitable to run on modern mobile devices, with resource complexities comparable to state-of-the-art on-device deep learning models. Our pipeline, referred to as PointAR, takes a single RGB-D image captured from the mobile camera and a 2D location in that image, and estimates 2nd-order spherical harmonics coefficients that can be directly utilized by rendering engines for indoor lighting in the context of augmented reality. Our key insight is to formulate lighting estimation as a learning problem directly from point clouds, which is in part inspired by the Monte Carlo integration leveraged by real-time spherical harmonics lighting. While existing approaches estimate lighting information with complex deep learning pipelines, our method focuses on reducing the computational complexity. Through both quantitative and qualitative experiments, we demonstrate that PointAR achieves lower lighting estimation errors compared to state-of-the-art methods. Further, our method requires an order of magnitude fewer resources, comparable to that of mobile-specific DNNs. |
Tasks | |
Published | 2020-03-30 |
URL | https://arxiv.org/abs/2004.00006v1 |
https://arxiv.org/pdf/2004.00006v1.pdf | |
PWC | https://paperswithcode.com/paper/pointar-efficient-lighting-estimation-for |
Repo | |
Framework | |
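PointAR regresses 2nd-order spherical harmonics (SH) lighting, and the abstract notes the formulation is inspired by Monte Carlo integration for real-time SH lighting. The sketch below shows only that background operation: projecting sampled radiance onto the nine real 2nd-order SH basis functions via Monte Carlo integration. It is not the learned pipeline itself; the sampling and data here are toy assumptions.

```python
import numpy as np

def sh_basis_order2(d):
    """Nine real spherical-harmonics basis values for unit directions d of shape (N, 3)."""
    x, y, z = d[:, 0], d[:, 1], d[:, 2]
    return np.stack([
        0.282095 * np.ones_like(x),        # Y_0^0
        0.488603 * y,                      # Y_1^-1
        0.488603 * z,                      # Y_1^0
        0.488603 * x,                      # Y_1^1
        1.092548 * x * y,                  # Y_2^-2
        1.092548 * y * z,                  # Y_2^-1
        0.315392 * (3 * z ** 2 - 1),       # Y_2^0
        1.092548 * x * z,                  # Y_2^1
        0.546274 * (x ** 2 - y ** 2),      # Y_2^2
    ], axis=1)

def sh_coeffs_monte_carlo(directions, radiance):
    """Monte Carlo projection of sampled radiance onto 2nd-order SH.
    directions: (N, 3) unit vectors drawn uniformly on the sphere
    radiance:   (N, 3) RGB radiance observed along each direction."""
    basis = sh_basis_order2(directions)                        # (N, 9)
    return 4 * np.pi * basis.T @ radiance / len(radiance)      # (9, 3) coefficients

# Toy usage: uniform sphere samples with constant white radiance.
dirs = np.random.normal(size=(10000, 3))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
coeffs = sh_coeffs_monte_carlo(dirs, np.ones((10000, 3)))
```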
Investigating the Impact of Inclusion in Face Recognition Training Data on Individual Face Identification
Title | Investigating the Impact of Inclusion in Face Recognition Training Data on Individual Face Identification |
Authors | Chris Dulhanty, Alexander Wong |
Abstract | Modern face recognition systems leverage datasets containing images of hundreds of thousands of specific individuals’ faces to train deep convolutional neural networks to learn an embedding space that maps an arbitrary individual’s face to a vector representation of their identity. The performance of a face recognition system in face verification (1:1) and face identification (1:N) tasks is directly related to the ability of an embedding space to discriminate between identities. Recently, there has been significant public scrutiny into the source and privacy implications of large-scale face recognition training datasets such as MS-Celeb-1M and MegaFace, as many people are uncomfortable with their face being used to train dual-use technologies that can enable mass surveillance. However, the impact of an individual’s inclusion in training data on a derived system’s ability to recognize them has not previously been studied. In this work, we audit ArcFace, a state-of-the-art, open source face recognition system, in a large-scale face identification experiment with more than one million distractor images. We find a Rank-1 face identification accuracy of 79.71% for individuals present in the model’s training data and an accuracy of 75.73% for those not present. This modest difference in accuracy demonstrates that face recognition systems using deep learning work better for individuals they are trained on, which has serious privacy implications when one considers all major open source face recognition training datasets do not obtain informed consent from individuals during their collection. |
Tasks | Face Identification, Face Recognition, Face Verification |
Published | 2020-01-09 |
URL | https://arxiv.org/abs/2001.03071v2 |
https://arxiv.org/pdf/2001.03071v2.pdf | |
PWC | https://paperswithcode.com/paper/investigating-the-impact-of-inclusion-in-face |
Repo | |
Framework | |
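The audit reports Rank-1 accuracy for face identification (1:N search against a gallery containing distractors). As a reference point, here is a small numpy sketch of how Rank-1 accuracy is computed from precomputed, L2-normalized embeddings with cosine similarity; the study itself uses ArcFace embeddings and over a million distractors, and the toy data below is random.

```python
import numpy as np

def rank1_accuracy(probe_emb, probe_ids, gallery_emb, gallery_ids):
    """Rank-1 identification accuracy for a 1:N search.
    probe_emb:   (P, D) L2-normalized probe embeddings
    gallery_emb: (G, D) L2-normalized gallery embeddings (targets + distractors)."""
    sims = probe_emb @ gallery_emb.T          # cosine similarity for unit vectors
    best = sims.argmax(axis=1)                # closest gallery face per probe
    return float(np.mean(gallery_ids[best] == probe_ids))

# Toy usage with synthetic embeddings (a real audit would use ArcFace features).
rng = np.random.default_rng(0)
gallery = rng.normal(size=(1000, 512))
gallery /= np.linalg.norm(gallery, axis=1, keepdims=True)
probes = gallery[:50] + 0.1 * rng.normal(size=(50, 512))
probes /= np.linalg.norm(probes, axis=1, keepdims=True)
print(rank1_accuracy(probes, np.arange(50), gallery, np.arange(1000)))
```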
Distributional semantic modeling: a revised technique to train term/word vector space models applying the ontology-related approach
Title | Distributional semantic modeling: a revised technique to train term/word vector space models applying the ontology-related approach |
Authors | Oleksandr Palagin, Vitalii Velychko, Kyrylo Malakhov, Oleksandr Shchurov |
Abstract | We design a new technique for distributional semantic modeling with a neural network-based approach to learn distributed term representations (or term embeddings), and thus term vector space models, inspired by the recent ontology-related approach (using different types of contextual knowledge such as syntactic knowledge, terminological knowledge, semantic knowledge, etc.) to the identification of terms (term extraction) and relations between them (relation extraction), called semantic pre-processing technology (SPT). Our method relies on automatic term extraction from natural language texts and the subsequent formation of problem-oriented or application-oriented (and deeply annotated) text corpora in which the fundamental entity is the term (including non-compositional and compositional terms). This gives us an opportunity to change over from distributed word representations (or word embeddings) to distributed term representations (or term embeddings). This transition makes it possible to generate more accurate semantic maps of different subject domains (and of relations between input terms - it is useful to explore clusters and oppositions, or to test hypotheses about them). The semantic map can be represented as a graph using Vec2graph - a Python library for visualizing word embeddings (term embeddings in our case) as dynamic and interactive graphs. The Vec2graph library coupled with term embeddings will not only improve accuracy in solving standard NLP tasks, but also update the conventional concept of automated ontology development. The main practical result of our work is a development kit (a set of toolkits exposed as web service APIs and a web application) that provides all the routines necessary for the basic linguistic and semantic pre-processing of natural language texts in Ukrainian for the future training of term vector space models. |
Tasks | Relation Extraction, Word Embeddings |
Published | 2020-03-06 |
URL | https://arxiv.org/abs/2003.03350v1 |
https://arxiv.org/pdf/2003.03350v1.pdf | |
PWC | https://paperswithcode.com/paper/distributional-semantic-modeling-a-revised |
Repo | |
Framework | |
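The core move described above is switching from word embeddings to term embeddings once terms have been extracted. A hedged gensim sketch of that idea follows: multi-word terms are joined into single tokens so Word2Vec learns term-level vectors. The corpus, term list, and hyperparameters are toy placeholders; the actual pipeline (SPT, Ukrainian corpora, Vec2graph visualization) is considerably richer.

```python
# Hedged sketch: join known multi-word terms into single tokens so that
# Word2Vec learns *term* embeddings rather than word embeddings.
from gensim.models import Word2Vec

corpus = [
    "the knowledge base stores semantic relations between terms".split(),
    "a term vector space model encodes distributional semantics".split(),
]
terms = [("knowledge", "base"), ("vector", "space", "model")]   # assumed term list

def merge_terms(tokens, terms):
    """Replace known multi-word terms with single underscore-joined tokens."""
    out, i = [], 0
    while i < len(tokens):
        for t in terms:
            if tuple(tokens[i:i + len(t)]) == t:
                out.append("_".join(t))
                i += len(t)
                break
        else:
            out.append(tokens[i])
            i += 1
    return out

sentences = [merge_terms(s, terms) for s in corpus]
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, epochs=50)
print(model.wv.most_similar("knowledge_base", topn=3))
```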
EQL – an extremely easy-to-learn knowledge graph query language, achieving high-speed and precise search
Title | EQL – an extremely easy-to-learn knowledge graph query language, achieving high-speed and precise search |
Authors | Han Liu, Shantao Liu |
Abstract | EQL, also named the Extremely Simple Query Language, can be widely used in the fields of knowledge graphs, precise search, strong artificial intelligence, databases, smart speakers, patent search, and others. EQL adopts the principle of minimalism in design and pursues simplicity and ease of learning so that everyone can master it quickly. The EQL language and the lambda calculus are interconvertible, which reveals the mathematical nature of the EQL language and lays a solid foundation for its rigor and logical integrity. The EQL language and a comprehensive knowledge graph system with the world’s common sense can together form the foundation of strong AI in the future, and make up for current AI systems’ lack of understanding of the world’s common sense. The EQL language can be used not only by humans, but also as a basic language for data query and data exchange between robots. |
Tasks | |
Published | 2020-03-19 |
URL | https://arxiv.org/abs/2003.11105v1 |
https://arxiv.org/pdf/2003.11105v1.pdf | |
PWC | https://paperswithcode.com/paper/eql-an-extremely-easy-to-learn-knowledge |
Repo | |
Framework | |
Why Do Deep Residual Networks Generalize Better than Deep Feedforward Networks? – A Neural Tangent Kernel Perspective
Title | Why Do Deep Residual Networks Generalize Better than Deep Feedforward Networks? – A Neural Tangent Kernel Perspective |
Authors | Kaixuan Huang, Yuqing Wang, Molei Tao, Tuo Zhao |
Abstract | Deep residual networks (ResNets) have demonstrated better generalization performance than deep feedforward networks (FFNets). However, the theory behind such a phenomenon is still largely unknown. This paper studies this fundamental problem in deep learning from a so-called “neural tangent kernel” perspective. Specifically, we first show that under proper conditions, as the width goes to infinity, training deep ResNets can be viewed as learning reproducing kernel functions with some kernel function. We then compare the kernel of deep ResNets with that of deep FFNets and discover that the class of functions induced by the kernel of FFNets is asymptotically not learnable, as the depth goes to infinity. In contrast, the class of functions induced by the kernel of ResNets does not exhibit such degeneracy. Our discovery partially justifies the advantages of deep ResNets over deep FFNets in generalization abilities. Numerical results are provided to support our claim. |
Tasks | |
Published | 2020-02-14 |
URL | https://arxiv.org/abs/2002.06262v1 |
https://arxiv.org/pdf/2002.06262v1.pdf | |
PWC | https://paperswithcode.com/paper/why-do-deep-residual-networks-generalize |
Repo | |
Framework | |
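The paper's argument lives in the infinite-width neural tangent kernel (NTK) regime. For intuition only, the sketch below computes the empirical NTK of a small finite feedforward network, K(x, x') = ⟨∇θ f(x), ∇θ f(x')⟩, via PyTorch autograd; a residual block would simply add skip connections x + f(x). This is an assumption-laden illustration, not the paper's infinite-width analysis.

```python
import torch
import torch.nn as nn

def empirical_ntk(model, x1, x2):
    """Empirical NTK K(x1, x2) = <df(x1)/dtheta, df(x2)/dtheta> for a
    scalar-output model, via explicit parameter gradients."""
    def grad_vector(x):
        out = model(x.unsqueeze(0)).squeeze()
        grads = torch.autograd.grad(out, list(model.parameters()))
        return torch.cat([g.reshape(-1) for g in grads])
    return torch.dot(grad_vector(x1), grad_vector(x2)).item()

# Toy feedforward net; widths and depth are arbitrary for illustration.
net = nn.Sequential(nn.Linear(10, 64), nn.ReLU(),
                    nn.Linear(64, 64), nn.ReLU(),
                    nn.Linear(64, 1))
x_a, x_b = torch.randn(10), torch.randn(10)
print(empirical_ntk(net, x_a, x_b))
```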
Why Molière most likely did write his plays
Title | Why Molière most likely did write his plays |
Authors | Florian Cafiero, Jean-Baptiste Camps |
Abstract | As for Shakespeare, a hard-fought debate has emerged about Molière, a supposedly uneducated actor who, according to some, could not have written the masterpieces attributed to him. In the past decades, the century-old thesis according to which Pierre Corneille would be their actual author has become popular, mostly because of new works in computational linguistics. These results are reassessed here through state-of-the-art attribution methods. We study a corpus of comedies in verse by major authors of Molière and Corneille’s time. Analysis of lexicon, rhymes, word forms, affixes, morphosyntactic sequences, and function words does not give any clue that another author among the major playwrights of the time would have written the plays signed under the name Molière. |
Tasks | |
Published | 2020-01-02 |
URL | https://arxiv.org/abs/2001.01595v1 |
https://arxiv.org/pdf/2001.01595v1.pdf | |
PWC | https://paperswithcode.com/paper/why-moliere-most-likely-did-write-his-plays |
Repo | |
Framework | |
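One of the feature families the abstract names is function words, a staple of stylometric attribution. The sketch below shows that single ingredient with scikit-learn: relative function-word frequencies fed to a linear classifier. The texts, labels, and tiny function-word list are invented placeholders; the paper's corpus, feature set, and attribution protocol are far richer.

```python
# Hedged stylometry sketch: relative function-word frequencies + linear SVM.
# All texts and labels below are toy placeholders, not the paper's corpus.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

texts = [
    "il ne vous en que de la le je et il ne que",
    "vous je je vous et et le la de de ne il",
    "que que ne ne il il en en la le vous je",
    "je je je vous vous et le la la de il ne",
]
labels = ["A", "A", "B", "B"]                     # candidate authors (placeholder)
function_words = ["le", "la", "de", "et", "que", "il", "ne", "vous", "je", "en"]

vectorizer = CountVectorizer(vocabulary=function_words)
X = vectorizer.fit_transform(texts).toarray().astype(float)
X /= X.sum(axis=1, keepdims=True)                 # relative frequencies (length-invariant)
clf = LinearSVC().fit(X, labels)
print(clf.predict(X))
```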
AvgOut: A Simple Output-Probability Measure to Eliminate Dull Responses
Title | AvgOut: A Simple Output-Probability Measure to Eliminate Dull Responses |
Authors | Tong Niu, Mohit Bansal |
Abstract | Many sequence-to-sequence dialogue models tend to generate safe, uninformative responses. There have been various useful efforts on trying to eliminate them. However, these approaches either improve decoding algorithms during inference, rely on hand-crafted features, or employ complex models. In our work, we build dialogue models that are dynamically aware of what utterances or tokens are dull without any feature-engineering. Specifically, we start with a simple yet effective automatic metric, AvgOut, which calculates the average output probability distribution of all time steps on the decoder side during training. This metric directly estimates which tokens are more likely to be generated, thus making it a faithful evaluation of the model diversity (i.e., for diverse models, the token probabilities should be more evenly distributed rather than peaked at a few dull tokens). We then leverage this novel metric to propose three models that promote diversity without losing relevance. The first model, MinAvgOut, directly maximizes the diversity score through the output distributions of each batch; the second model, Label Fine-Tuning (LFT), prepends to the source sequence a label continuously scaled by the diversity score to control the diversity level; the third model, RL, adopts Reinforcement Learning and treats the diversity score as a reward signal. Moreover, we experiment with a hybrid model by combining the loss terms of MinAvgOut and RL. All four models outperform their base LSTM-RNN model on both diversity and relevance by a large margin, and are comparable to or better than competitive baselines (also verified via human evaluation). Moreover, our approaches are orthogonal to the base model, making them applicable as an add-on to other emerging better dialogue models in the future. |
Tasks | Feature Engineering |
Published | 2020-01-15 |
URL | https://arxiv.org/abs/2001.05467v1 |
https://arxiv.org/pdf/2001.05467v1.pdf | |
PWC | https://paperswithcode.com/paper/avgout-a-simple-output-probability-measure-to |
Repo | |
Framework | |
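AvgOut itself is a simple quantity: the decoder's output distributions averaged over all time steps, with a peaked average (mass concentrated on a few dull tokens) signalling low diversity. The sketch below computes that averaged distribution and uses normalized entropy as a stand-in diversity proxy; the paper's exact score and the MinAvgOut/LFT/RL training objectives are not reproduced, and the toy logits are random.

```python
import torch
import torch.nn.functional as F

def avgout(logits):
    """Average output distribution over decoder time steps.
    logits: (batch, seq_len, vocab) raw decoder scores."""
    probs = F.softmax(logits, dim=-1)          # per-step distributions
    return probs.mean(dim=(0, 1))              # (vocab,) averaged distribution

def diversity_proxy(avg_dist):
    """Normalized entropy of the averaged distribution: higher when probability
    mass is spread out rather than peaked on a few dull tokens. This is an
    illustrative stand-in for the paper's diversity score."""
    entropy = -(avg_dist * (avg_dist + 1e-12).log()).sum()
    return (entropy / torch.log(torch.tensor(float(avg_dist.numel())))).item()

logits = torch.randn(8, 20, 5000)              # toy decoder outputs
print(diversity_proxy(avgout(logits)))
```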
Information Extraction based on Named Entity for Tourism Corpus
Title | Information Extraction based on Named Entity for Tourism Corpus |
Authors | Chantana Chantrapornchai, Aphisit Tunsakul |
Abstract | Tourism information is scattered around nowadays. To search for information, it is usually time-consuming to browse through the results from a search engine and to select and view the details of each accommodation. In this paper, we present a methodology to extract particular information from the full text returned by the search engine in order to assist users. The users can then focus on the desired relevant information. The approach can be used for the same task in other domains. The main steps are 1) building training data and 2) building the recognition model. First, the tourism data is gathered and the vocabularies are built. The raw corpus is used to train vocabulary embeddings. It is also used to create annotated data. The process of creating named entity annotations is presented. Then, the recognition model for a given entity type can be built. In the experiments, given a hotel description, the model can extract the desired entities, i.e., name, location, and facility. The extracted data can further be stored as structured information, e.g., in an ontology format, for future querying and inference. The model for automatic named entity identification, based on machine learning, yields errors ranging from 8% to 25%. |
Tasks | |
Published | 2020-01-03 |
URL | https://arxiv.org/abs/2001.01588v1 |
https://arxiv.org/pdf/2001.01588v1.pdf | |
PWC | https://paperswithcode.com/paper/information-extraction-based-on-named-entity |
Repo | |
Framework | |
Learning light field synthesis with Multi-Plane Images: scene encoding as a recurrent segmentation task
Title | Learning light field synthesis with Multi-Plane Images: scene encoding as a recurrent segmentation task |
Authors | Tomás Völker, Guillaume Boisson, Bertrand Chupeau |
Abstract | In this paper we address the problem of view synthesis from large baseline light fields, by turning a sparse set of input views into a Multi-plane Image (MPI). Because available datasets are scarce, we propose a lightweight network that does not require extensive training. Unlike the latest approaches, our model does not learn to estimate RGB layers but only encodes the scene geometry within MPI alpha layers, which comes down to a segmentation task. A Learned Gradient Descent (LGD) framework is used to cascade the same convolutional network in a recurrent fashion in order to refine the volumetric representation obtained. Thanks to its low number of parameters, our model trains successfully on a small light field video dataset and provides visually appealing results. It also exhibits convenient generalization properties regarding the number of input views, the number of depth planes in the MPI, and the number of refinement iterations. |
Tasks | |
Published | 2020-02-12 |
URL | https://arxiv.org/abs/2002.05028v2 |
https://arxiv.org/pdf/2002.05028v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-light-field-synthesis-with-multi |
Repo | |
Framework | |
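For readers unfamiliar with Multi-Plane Images: once the alpha (and RGB) layers exist, a novel view is rendered by compositing the planes with the standard "over" operator. The numpy sketch below shows that back-to-front compositing step only; the learned-gradient-descent network that produces the layers is not shown, and the toy MPI is random.

```python
import numpy as np

def composite_mpi(rgb_layers, alpha_layers):
    """Back-to-front 'over' compositing of a Multi-Plane Image.
    rgb_layers:   (D, H, W, 3) color per depth plane, back plane first
    alpha_layers: (D, H, W, 1) opacity per depth plane in [0, 1]."""
    out = np.zeros(rgb_layers.shape[1:])
    for rgb, a in zip(rgb_layers, alpha_layers):       # back to front
        out = rgb * a + out * (1.0 - a)                # over operator
    return out

# Toy MPI: 8 planes of random colors and opacities.
D, H, W = 8, 64, 64
image = composite_mpi(np.random.rand(D, H, W, 3), np.random.rand(D, H, W, 1))
print(image.shape)
```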
Scalable Neural Methods for Reasoning With a Symbolic Knowledge Base
Title | Scalable Neural Methods for Reasoning With a Symbolic Knowledge Base |
Authors | William W. Cohen, Haitian Sun, R. Alex Hofer, Matthew Siegler |
Abstract | We describe a novel way of representing a symbolic knowledge base (KB) called a sparse-matrix reified KB. This representation enables neural modules that are fully differentiable, faithful to the original semantics of the KB, expressive enough to model multi-hop inferences, and scalable enough to use with realistically large KBs. The sparse-matrix reified KB can be distributed across multiple GPUs, can scale to tens of millions of entities and facts, and is orders of magnitude faster than naive sparse-matrix implementations. The reified KB enables very simple end-to-end architectures to obtain competitive performance on several benchmarks representing two families of tasks: KB completion, and learning semantic parsers from denotations. |
Tasks | |
Published | 2020-02-14 |
URL | https://arxiv.org/abs/2002.06115v1 |
https://arxiv.org/pdf/2002.06115v1.pdf | |
PWC | https://paperswithcode.com/paper/scalable-neural-methods-for-reasoning-with-a-1 |
Repo | |
Framework | |
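The reified KB stores one row per triple in three sparse matrices (subject, relation, object), so a relation-following step becomes sparse matrix-vector products plus an element-wise product over triples. The scipy sketch below reflects my reading of that scheme on a three-entity toy KB; the names and the exact form of the follow operation are assumptions, and the paper's batched, multi-GPU formulation is not reproduced.

```python
import numpy as np
from scipy import sparse

# Toy KB with 3 entities {0,1,2}, 2 relations {0,1}, and triples:
#   (0, r0, 1), (1, r0, 2), (0, r1, 2)
triples = [(0, 0, 1), (1, 0, 2), (0, 1, 2)]
n_entities, n_relations = 3, 2

def one_hot_matrix(indices, width):
    """Sparse (n_triples, width) matrix with a single 1 per triple row."""
    rows = np.arange(len(indices))
    return sparse.csr_matrix((np.ones(len(indices)), (rows, indices)),
                             shape=(len(indices), width))

M_subj = one_hot_matrix([t[0] for t in triples], n_entities)
M_rel  = one_hot_matrix([t[1] for t in triples], n_relations)
M_obj  = one_hot_matrix([t[2] for t in triples], n_entities)

def follow(x, r):
    """Weighted set of objects reachable from entity weights x via relation
    weights r: select triples whose subject and relation match, then project
    the triple weights onto object entities."""
    triple_weights = (M_subj @ x) * (M_rel @ r)    # element-wise over triples
    return M_obj.T @ triple_weights                # distribute onto entities

x = np.array([1.0, 0.0, 0.0])     # start at entity 0
r = np.array([1.0, 0.0])          # follow relation r0
print(follow(x, r))               # -> weight lands on entity 1
```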
Explicit Mean-Square Error Bounds for Monte-Carlo and Linear Stochastic Approximation
Title | Explicit Mean-Square Error Bounds for Monte-Carlo and Linear Stochastic Approximation |
Authors | Shuhang Chen, Adithya M. Devraj, Ana Bušić, Sean Meyn |
Abstract | This paper concerns error bounds for recursive equations subject to Markovian disturbances. Motivating examples abound within the fields of Markov chain Monte Carlo (MCMC) and Reinforcement Learning (RL), and many of these algorithms can be interpreted as special cases of stochastic approximation (SA). It is argued that it is not possible in general to obtain a Hoeffding bound on the error sequence, even when the underlying Markov chain is reversible and geometrically ergodic, such as the M/M/1 queue. This is motivation for the focus on mean square error bounds for parameter estimates. It is shown that mean square error achieves the optimal rate of $O(1/n)$, subject to conditions on the step-size sequence. Moreover, the exact constants in the rate are obtained, which is of great value in algorithm design. |
Tasks | |
Published | 2020-02-07 |
URL | https://arxiv.org/abs/2002.02584v1 |
https://arxiv.org/pdf/2002.02584v1.pdf | |
PWC | https://paperswithcode.com/paper/explicit-mean-square-error-bounds-for-monte |
Repo | |
Framework | |
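The headline result is that the mean-square error of the estimates decays at the optimal O(1/n) rate under step-size conditions. The numpy sketch below illustrates that rate empirically for the simplest case, a running average with step size 1/n; the noise here is i.i.d. for brevity, whereas the paper's setting covers Markovian disturbances.

```python
import numpy as np

# Empirical illustration of the O(1/n) mean-square-error rate for a running
# average theta_n of a noisy mean (i.i.d. noise; the paper allows Markovian).
rng = np.random.default_rng(0)
true_mean, n_steps, n_runs = 2.0, 10_000, 200

errors_sq = np.zeros(n_steps)
for _ in range(n_runs):
    samples = true_mean + rng.standard_normal(n_steps)
    theta = np.cumsum(samples) / np.arange(1, n_steps + 1)   # step size 1/n
    errors_sq += (theta - true_mean) ** 2
mse = errors_sq / n_runs

# n * MSE(n) should hover near the noise variance (= 1), i.e. MSE ~ 1/n.
print(1_000 * mse[999], 10_000 * mse[9999])
```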
Beat the AI: Investigating Adversarial Human Annotations for Reading Comprehension
Title | Beat the AI: Investigating Adversarial Human Annotations for Reading Comprehension |
Authors | Max Bartolo, Alastair Roberts, Johannes Welbl, Sebastian Riedel, Pontus Stenetorp |
Abstract | Innovations in annotation methodology have been a propellant for Reading Comprehension (RC) datasets and models. One recent trend to challenge current RC models is to involve a model in the annotation process: humans create questions adversarially, such that the model fails to answer them correctly. In this work we investigate this annotation approach and apply it in three different settings, collecting a total of 36,000 samples with progressively stronger models in the annotation loop. This allows us to explore questions such as the reproducibility of the adversarial effect, transfer from data collected with varying model-in-the-loop strengths, and generalisation to data collected without a model. We find that training on adversarially collected samples leads to strong generalisation to non-adversarially collected datasets, yet with progressive deterioration as the model-in-the-loop strength increases. Furthermore, we find that stronger models can still learn from datasets collected with substantially weaker models in the loop: when trained on data collected with a BiDAF model in the loop, RoBERTa achieves 36.0 F1 on questions that it cannot answer when trained on SQuAD - only marginally lower than when trained on data collected using RoBERTa itself. |
Tasks | Reading Comprehension |
Published | 2020-02-02 |
URL | https://arxiv.org/abs/2002.00293v1 |
https://arxiv.org/pdf/2002.00293v1.pdf | |
PWC | https://paperswithcode.com/paper/beat-the-ai-investigating-adversarial-human |
Repo | |
Framework | |
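The F1 numbers quoted above are the standard SQuAD-style token-overlap F1 between a predicted and a gold answer. For reference, a simplified sketch of that metric follows (the official script's normalization is slightly more involved).

```python
import re
from collections import Counter

def normalize(text):
    """Lower-case, strip punctuation and English articles (simplified SQuAD-style)."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return [t for t in text.split() if t not in {"a", "an", "the"}]

def answer_f1(prediction, gold):
    """Token-overlap F1 between a predicted and a gold answer string."""
    pred, ref = normalize(prediction), normalize(gold)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

print(answer_f1("the Eiffel Tower", "Eiffel Tower in Paris"))  # partial credit
```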
A Zero-Shot based Fingerprint Presentation Attack Detection System
Title | A Zero-Shot based Fingerprint Presentation Attack Detection System |
Authors | Haozhe Liu, Wentian Zhang, Guojie Liu, Feng Liu |
Abstract | With the development of presentation attacks, Automated Fingerprint Recognition Systems (AFRSs) are vulnerable to presentation attacks. Thus, numerous methods of presentation attack detection (PAD) have been proposed to ensure the normal utilization of AFRSs. However, the demand for large-scale presentation attack images and the limited generalization ability constrain the practical performance of existing PAD methods. Therefore, we propose a novel Zero-Shot Presentation Attack Detection Model to guarantee the generalization of the PAD model. The proposed ZSPAD-Model, based on a generative model, does not utilize any negative samples in the process of establishment, which ensures robustness against presentation attacks of various types and materials. Different from other auto-encoder based models, a Fine-grained Map architecture is proposed to refine the reconstruction error of the auto-encoder networks, and a task-specific Gaussian model is utilized to improve the quality of clustering. Meanwhile, in order to improve the performance of the proposed model, 9 confidence scores are discussed in this article. Experimental results showed that the ZSPAD-Model is the state of the art for ZSPAD, and the MS-Score is the best confidence score. Compared with existing methods, the proposed ZSPAD-Model performs better than the feature-based method, and under the multi-shot setting, the proposed method outperforms the learning-based method when little training data is available. When large training data is available, their results are similar. |
Tasks | |
Published | 2020-02-12 |
URL | https://arxiv.org/abs/2002.04908v1 |
https://arxiv.org/pdf/2002.04908v1.pdf | |
PWC | https://paperswithcode.com/paper/a-zero-shot-based-fingerprint-presentation |
Repo | |
Framework | |
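The zero-shot idea above is one-class in spirit: build the model from bona fide samples only, score presentations by reconstruction error, and model those scores with a Gaussian. Below is a hedged PyTorch sketch of that generic pattern on placeholder features; the ZSPAD-Model's fine-grained maps and MS-Score are not reproduced.

```python
# Hedged one-class sketch: autoencoder trained on bona fide features only,
# Gaussian model over its reconstruction errors, threshold-based decision.
# Feature dimension, architecture, and data are illustrative assumptions.
import torch
import torch.nn as nn

autoencoder = nn.Sequential(
    nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 16),              # encoder
    nn.ReLU(), nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 256))   # decoder

bona_fide = torch.randn(512, 256)            # placeholder live-finger features
optim = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)
for _ in range(200):                         # short training loop for the sketch
    loss = ((autoencoder(bona_fide) - bona_fide) ** 2).mean()
    optim.zero_grad()
    loss.backward()
    optim.step()

with torch.no_grad():
    errs = ((autoencoder(bona_fide) - bona_fide) ** 2).mean(dim=1).numpy()
mu, sigma = errs.mean(), errs.std()          # Gaussian model of bona fide errors

def is_attack(sample, k=3.0):
    """Flag a presentation whose reconstruction error exceeds mu + k*sigma."""
    with torch.no_grad():
        err = ((autoencoder(sample) - sample) ** 2).mean().item()
    return err > mu + k * sigma

print(is_attack(torch.randn(1, 256) * 5))    # a far-off sample should be flagged
```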