February 1, 2020

Paper Group AWR 150

Extracting Multiple-Relations in One-Pass with Pre-Trained Transformers

Title Extracting Multiple-Relations in One-Pass with Pre-Trained Transformers
Authors Haoyu Wang, Ming Tan, Mo Yu, Shiyu Chang, Dakuo Wang, Kun Xu, Xiaoxiao Guo, Saloni Potdar
Abstract Most approaches to extracting multiple relations from a paragraph require multiple passes over the paragraph. In practice, multiple passes are computationally expensive, which makes it difficult to scale to longer paragraphs and larger text corpora. In this work, we focus on the task of multiple relation extraction by encoding the paragraph only once (one-pass). We build our solution on pre-trained self-attentive (Transformer) models, where we first add a structured prediction layer to handle extraction between multiple entity pairs, then enhance the paragraph embedding to capture multiple relational information associated with each entity with an entity-aware attention technique. We show that our approach is not only scalable but also achieves state-of-the-art performance on the standard benchmark ACE 2005.
Tasks Relation Extraction, Structured Prediction
Published 2019-02-04
URL https://arxiv.org/abs/1902.01030v2
PDF https://arxiv.org/pdf/1902.01030v2.pdf
PWC https://paperswithcode.com/paper/extracting-multiple-relations-in-one-pass
Repo https://github.com/helloeve/mre-in-one-pass
Framework tf

GANs for Semi-Supervised Opinion Spam Detection

Title GANs for Semi-Supervised Opinion Spam Detection
Authors Gray Stanton, Athirai A. Irissappane
Abstract Online reviews have become a vital source of information in purchasing a service (product). Opinion spammers manipulate reviews, affecting the overall perception of the service. A key challenge in detecting opinion spam is obtaining ground truth. Though there exists a large set of reviews online, only a few of them have been labeled spam or non-spam. In this paper, we propose spamGAN, a generative adversarial network which relies on a limited set of labeled data as well as unlabeled data for opinion spam detection. spamGAN improves on the state-of-the-art GAN-based techniques for text classification. Experiments on the TripAdvisor dataset show that spamGAN outperforms existing spam detection techniques when limited labeled data is used. Apart from detecting spam reviews, spamGAN can also generate reviews with reasonable perplexity.
Tasks Text Classification
Published 2019-03-19
URL https://arxiv.org/abs/1903.08289v2
PDF https://arxiv.org/pdf/1903.08289v2.pdf
PWC https://paperswithcode.com/paper/gans-for-semi-supervised-opinion-spam
Repo https://github.com/gray-stanton/spamGAN
Framework tf

Enhancing the Transformer with Explicit Relational Encoding for Math Problem Solving

Title Enhancing the Transformer with Explicit Relational Encoding for Math Problem Solving
Authors Imanol Schlag, Paul Smolensky, Roland Fernandez, Nebojsa Jojic, Jürgen Schmidhuber, Jianfeng Gao
Abstract We incorporate Tensor-Product Representations within the Transformer in order to better support the explicit representation of relation structure. Our Tensor-Product Transformer (TP-Transformer) sets a new state of the art on the recently-introduced Mathematics Dataset containing 56 categories of free-form math word-problems. The essential component of the model is a novel attention mechanism, called TP-Attention, which explicitly encodes the relations between each Transformer cell and the other cells from which values have been retrieved by attention. TP-Attention goes beyond linear combination of retrieved values, strengthening representation-building and resolving ambiguities introduced by multiple layers of standard attention. The TP-Transformer’s attention maps give better insights into how it is capable of solving the Mathematics Dataset’s challenging problems. Pretrained models and code will be made available after publication.
Tasks
Published 2019-10-15
URL https://arxiv.org/abs/1910.06611v1
PDF https://arxiv.org/pdf/1910.06611v1.pdf
PWC https://paperswithcode.com/paper/enhancing-the-transformer-with-explicit-1
Repo https://github.com/ischlag/TP-Transformer
Framework pytorch

Multi-Grained Named Entity Recognition

Title Multi-Grained Named Entity Recognition
Authors Congying Xia, Chenwei Zhang, Tao Yang, Yaliang Li, Nan Du, Xian Wu, Wei Fan, Fenglong Ma, Philip Yu
Abstract This paper presents a novel framework, MGNER, for Multi-Grained Named Entity Recognition where multiple entities or entity mentions in a sentence could be non-overlapping or totally nested. Unlike traditional approaches that treat NER as a sequential labeling task and annotate entities consecutively, MGNER detects and recognizes entities on multiple granularities: it is able to recognize named entities without explicitly assuming non-overlapping or totally nested structures. MGNER consists of a Detector that examines all possible word segments and a Classifier that categorizes entities. In addition, contextual information and a self-attention mechanism are utilized throughout the framework to improve the NER performance. Experimental results show that MGNER outperforms current state-of-the-art baselines by up to 4.4% in terms of the F1 score among nested/non-overlapping NER tasks.
Tasks Multi-Grained Named Entity Recognition, Named Entity Recognition, Nested Mention Recognition, Nested Named Entity Recognition
Published 2019-06-20
URL https://arxiv.org/abs/1906.08449v1
PDF https://arxiv.org/pdf/1906.08449v1.pdf
PWC https://paperswithcode.com/paper/multi-grained-named-entity-recognition
Repo https://github.com/congyingxia/Multi-Grained-NER
Framework tf
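
The MGNER abstract says the Detector "examines all possible word segments". The following is a minimal sketch of what enumerating such candidate segments could look like. It is not taken from the MGNER repository: the function name `enumerate_segments` and the `max_len` cap are assumptions made for illustration, and the scoring and classification steps are left out entirely.

```python
from typing import List, Tuple


def enumerate_segments(tokens: List[str], max_len: int) -> List[Tuple[int, int, str]]:
    """Return every contiguous word segment of at most max_len tokens.

    Each item is (start, end_exclusive, text). The name and the max_len
    cap are hypothetical; they are not part of the MGNER code base.
    """
    segments = []
    for start in range(len(tokens)):
        # end is exclusive, so the segment covers tokens[start:end]
        last_end = min(start + max_len, len(tokens))
        for end in range(start + 1, last_end + 1):
            segments.append((start, end, " ".join(tokens[start:end])))
    return segments


tokens = ["Bank", "of", "China"]
segments = enumerate_segments(tokens, max_len=3)
for start, end, text in segments:
    print(start, end, text)
```

Because every segment is a separate candidate, a nested mention such as "China" inside "Bank of China" can be proposed alongside the longer one, which a one-label-per-token sequence tagger cannot express directly.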

Deep learning for Chemometric and non-translational data

Title Deep learning for Chemometric and non-translational data
Authors Jacob Søgaard Larsen, Line Clemmensen
Abstract We propose a novel method to train deep convolutional neural networks which learn from multiple data sets of varying input sizes through weight sharing. This is an advantage in chemometrics where individual measurements represent exact chemical compounds and thus signals cannot be translated or resized without disturbing their interpretation. Our approach shows superior performance compared to transfer learning when a medium sized and a small data set are trained together. We also observe a small improvement compared to individual training when two medium sized data sets are trained together, in particular through a reduction in the variance.
Tasks Transfer Learning
Published 2019-10-01
URL https://arxiv.org/abs/1910.00391v4
PDF https://arxiv.org/pdf/1910.00391v4.pdf
PWC https://paperswithcode.com/paper/deep-learning-for-chemometric-and-non
Repo https://github.com/DTUComputeStatisticsAndDataAnalysis/Weight-Share
Framework tf

BERT and PALs: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning

Title BERT and PALs: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning
Authors Asa Cooper Stickland, Iain Murray
Abstract Multi-task learning shares information between related tasks, sometimes reducing the number of parameters required. State-of-the-art results across multiple natural language understanding tasks in the GLUE benchmark have previously used transfer from a single large task: unsupervised pre-training with BERT, where a separate BERT model was fine-tuned for each task. We explore multi-task approaches that share a single BERT model with a small number of additional task-specific parameters. Using new adaptation modules, PALs or ‘projected attention layers’, we match the performance of separately fine-tuned models on the GLUE benchmark with roughly 7 times fewer parameters, and obtain state-of-the-art results on the Recognizing Textual Entailment dataset.
Tasks Multi-Task Learning, Natural Language Inference
Published 2019-02-07
URL https://arxiv.org/abs/1902.02671v2
PDF https://arxiv.org/pdf/1902.02671v2.pdf
PWC https://paperswithcode.com/paper/bert-and-pals-projected-attention-layers-for
Repo https://github.com/AsaCooperStickland/Bert-n-Pals
Framework pytorch

ERNIE: Enhanced Language Representation with Informative Entities

Title ERNIE: Enhanced Language Representation with Informative Entities
Authors Zhengyan Zhang, Xu Han, Zhiyuan Liu, Xin Jiang, Maosong Sun, Qun Liu
Abstract Neural language representation models such as BERT pre-trained on large-scale corpora can well capture rich semantic patterns from plain text, and be fine-tuned to consistently improve the performance of various NLP tasks. However, the existing pre-trained language models rarely consider incorporating knowledge graphs (KGs), which can provide rich structured knowledge facts for better language understanding. We argue that informative entities in KGs can enhance language representation with external knowledge. In this paper, we utilize both large-scale textual corpora and KGs to train an enhanced language representation model (ERNIE), which can take full advantage of lexical, syntactic, and knowledge information simultaneously. The experimental results have demonstrated that ERNIE achieves significant improvements on various knowledge-driven tasks, and meanwhile is comparable with the state-of-the-art model BERT on other common NLP tasks. The source code of this paper can be obtained from https://github.com/thunlp/ERNIE.
Tasks Entity Typing, Knowledge Graphs, Natural Language Inference, Relation Extraction, Sentiment Analysis
Published 2019-05-17
URL https://arxiv.org/abs/1905.07129v3
PDF https://arxiv.org/pdf/1905.07129v3.pdf
PWC https://paperswithcode.com/paper/ernie-enhanced-language-representation-with
Repo https://github.com/thunlp/ERNIE
Framework pytorch

CMRNet: Camera to LiDAR-Map Registration

Title CMRNet: Camera to LiDAR-Map Registration
Authors Daniele Cattaneo, Matteo Vaghi, Augusto Luis Ballardini, Simone Fontana, Domenico Giorgio Sorrenti, Wolfram Burgard
Abstract In this paper we present CMRNet, a realtime approach based on a Convolutional Neural Network to localize an RGB image of a scene in a map built from LiDAR data. Our network is not trained in the working area, i.e. CMRNet does not learn the map. Instead it learns to match an image to the map. We validate our approach on the KITTI dataset, processing each frame independently without any tracking procedure. CMRNet achieves 0.27m and 1.07deg median localization accuracy on the sequence 00 of the odometry dataset, starting from a rough pose estimate displaced up to 3.5m and 17deg. To the best of our knowledge this is the first CNN-based approach that learns to match images from a monocular camera to a given, preexisting 3D LiDAR-map.
Tasks
Published 2019-06-24
URL https://arxiv.org/abs/1906.10109v2
PDF https://arxiv.org/pdf/1906.10109v2.pdf
PWC https://paperswithcode.com/paper/cmrnet-camera-to-lidar-map-registration
Repo https://github.com/catta202000/CMRNet
Framework pytorch

Understanding Generalization through Visualizations

Title Understanding Generalization through Visualizations
Authors W. Ronny Huang, Zeyad Emam, Micah Goldblum, Liam Fowl, Justin K. Terry, Furong Huang, Tom Goldstein
Abstract The power of neural networks lies in their ability to generalize to unseen data, yet the underlying reasons for this phenomenon remain elusive. Numerous rigorous attempts have been made to explain generalization, but available bounds are still quite loose, and analysis does not always lead to true understanding. The goal of this work is to make generalization more intuitive. Using visualization methods, we discuss the mystery of generalization, the geometry of loss landscapes, and how the curse (or, rather, the blessing) of dimensionality causes optimizers to settle into minima that generalize well.
Tasks
Published 2019-06-07
URL https://arxiv.org/abs/1906.03291v4
PDF https://arxiv.org/pdf/1906.03291v4.pdf
PWC https://paperswithcode.com/paper/understanding-generalization-through
Repo https://github.com/wronnyhuang/gen-viz
Framework pytorch

Improving Black-box Adversarial Attacks with a Transfer-based Prior

Title Improving Black-box Adversarial Attacks with a Transfer-based Prior
Authors Shuyu Cheng, Yinpeng Dong, Tianyu Pang, Hang Su, Jun Zhu
Abstract We consider the black-box adversarial setting, where the adversary has to generate adversarial perturbations without access to the target models to compute gradients. Previous methods tried to approximate the gradient either by using a transfer gradient of a surrogate white-box model, or based on the query feedback. However, these methods often suffer from low attack success rates or poor query efficiency since it is non-trivial to estimate the gradient in a high-dimensional space with limited information. To address these problems, we propose a prior-guided random gradient-free (P-RGF) method to improve black-box adversarial attacks, which takes advantage of a transfer-based prior and the query information simultaneously. The transfer-based prior given by the gradient of a surrogate model is appropriately integrated into our algorithm by an optimal coefficient derived by a theoretical analysis. Extensive experiments demonstrate that our method requires far fewer queries to attack black-box models with higher success rates compared with the alternative state-of-the-art methods.
Tasks
Published 2019-06-17
URL https://arxiv.org/abs/1906.06919v2
PDF https://arxiv.org/pdf/1906.06919v2.pdf
PWC https://paperswithcode.com/paper/improving-black-box-adversarial-attacks-with
Repo https://github.com/thu-ml/Prior-Guided-RGF
Framework tf
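
To make the idea of combining a transfer-based prior with query feedback more concrete, here is a rough sketch of a random gradient-free estimator whose sampling directions are biased toward a prior direction. Everything in it is an assumption for illustration: the function name `prior_guided_rgf`, the fixed mixing weight `lam` (the abstract says the paper derives an optimal coefficient, which is not reproduced here), and the toy linear loss. It should not be read as the authors' implementation.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(a):
    n = math.sqrt(dot(a, a))
    return [x / n for x in a]


def prior_guided_rgf(f, x, prior, lam=0.5, q=20, sigma=1e-4, seed=0):
    """Estimate the gradient of f at x from q finite-difference queries.

    Each direction u mixes the unit prior v with a random unit vector
    orthogonal to v. lam is a hand-picked weight (an assumption), not the
    theoretically optimal coefficient mentioned in the abstract.
    """
    rng = random.Random(seed)
    v = normalize(prior)
    fx = f(x)
    grad = [0.0] * len(x)
    for _ in range(q):
        xi = [rng.gauss(0.0, 1.0) for _ in x]
        proj = dot(xi, v)
        # Remove the component along v, then rescale to unit length.
        orth = normalize([a - proj * b for a, b in zip(xi, v)])
        u = [math.sqrt(lam) * b + math.sqrt(1.0 - lam) * o for b, o in zip(v, orth)]
        # One query to the black box gives a directional derivative along u.
        slope = (f([a + sigma * c for a, c in zip(x, u)]) - fx) / sigma
        grad = [g + slope * c / q for g, c in zip(grad, u)]
    return grad


# Toy black box: a linear loss whose true gradient is known.
true_grad = [1.0, -2.0, 0.5, 3.0]
loss = lambda x: dot(true_grad, x)
surrogate_grad = [1.0, -1.0, 0.0, 2.0]  # imperfect prior from a "surrogate"

est = prior_guided_rgf(loss, [0.0] * 4, surrogate_grad)
cosine = dot(normalize(est), normalize(true_grad))
print(cosine)
```

With `lam=0` the sampled directions carry no component along the prior, which is close to a plain random gradient-free estimator; larger values of `lam` place more trust in the surrogate gradient.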

A Direct Approach to Robust Deep Learning Using Adversarial Networks

Title A Direct Approach to Robust Deep Learning Using Adversarial Networks
Authors Huaxia Wang, Chun-Nam Yu
Abstract Deep neural networks have been shown to perform well in many classical machine learning problems, especially in image classification tasks. However, researchers have found that neural networks can be easily fooled, and they are surprisingly sensitive to small perturbations imperceptible to humans. Carefully crafted input images (adversarial examples) can force a well-trained neural network to provide arbitrary outputs. Including adversarial examples during training is a popular defense mechanism against adversarial attacks. In this paper we propose a new defensive mechanism under the generative adversarial network (GAN) framework. We model the adversarial noise using a generative network, trained jointly with a classification discriminative network as a minimax game. We show empirically that our adversarial network approach works well against black box attacks, with performance on par with state-of-the-art methods such as ensemble adversarial training and adversarial training with projected gradient descent.
Tasks Image Classification
Published 2019-05-23
URL https://arxiv.org/abs/1905.09591v1
PDF https://arxiv.org/pdf/1905.09591v1.pdf
PWC https://paperswithcode.com/paper/a-direct-approach-to-robust-deep-learning-1
Repo https://github.com/whxbergkamp/RobustDL_GAN
Framework tf

Where’s My Head? Definition, Dataset and Models for Numeric Fused-Heads Identification and Resolution

Title Where’s My Head? Definition, Dataset and Models for Numeric Fused-Heads Identification and Resolution
Authors Yanai Elazar, Yoav Goldberg
Abstract We provide the first computational treatment of fused-heads constructions (FH), focusing on the numeric fused-heads (NFH). FH constructions are noun phrases (NPs) in which the head noun is missing and is said to be ‘fused’ with its dependent modifier. This missing information is implicit and is important for sentence understanding. The missing references are easily filled in by humans but pose a challenge for computational models. We formulate the handling of FH as a two-stage process: identification of the FH construction and resolution of the missing head. We explore the NFH phenomena in large corpora of English text and create (1) a dataset and a highly accurate method for NFH identification; (2) a crowd-sourced dataset of 10k examples (1M tokens) for NFH resolution; and (3) a neural baseline for the NFH resolution task. We release our code and dataset, in the hope of fostering further research into this challenging problem.
Tasks
Published 2019-05-26
URL https://arxiv.org/abs/1905.10886v1
PDF https://arxiv.org/pdf/1905.10886v1.pdf
PWC https://paperswithcode.com/paper/wheres-my-head-definition-dataset-and-models
Repo https://github.com/yanaiela/num_fh
Framework none

Meta-Learning with Implicit Gradients

Title Meta-Learning with Implicit Gradients
Authors Aravind Rajeswaran, Chelsea Finn, Sham Kakade, Sergey Levine
Abstract A core capability of intelligent systems is the ability to quickly learn new tasks by drawing on prior experience. Gradient (or optimization) based meta-learning has recently emerged as an effective approach for few-shot learning. In this formulation, meta-parameters are learned in the outer loop, while task-specific models are learned in the inner-loop, by using only a small amount of data from the current task. A key challenge in scaling these approaches is the need to differentiate through the inner loop learning process, which can impose considerable computational and memory burdens. By drawing upon implicit differentiation, we develop the implicit MAML algorithm, which depends only on the solution to the inner level optimization and not the path taken by the inner loop optimizer. This effectively decouples the meta-gradient computation from the choice of inner loop optimizer. As a result, our approach is agnostic to the choice of inner loop optimizer and can gracefully handle many gradient steps without vanishing gradients or memory constraints. Theoretically, we prove that implicit MAML can compute accurate meta-gradients with a memory footprint that is, up to small constant factors, no more than that which is required to compute a single inner loop gradient and at no overall increase in the total computational cost. Experimentally, we show that these benefits of implicit MAML translate into empirical gains on few-shot image recognition benchmarks.
Tasks Few-Shot Image Classification, Few-Shot Learning, Meta-Learning
Published 2019-09-10
URL https://arxiv.org/abs/1909.04630v1
PDF https://arxiv.org/pdf/1909.04630v1.pdf
PWC https://paperswithcode.com/paper/meta-learning-with-implicit-gradients
Repo https://github.com/spiglerg/pyMeta
Framework tf

Time to Die: Death Prediction in Dota 2 using Deep Learning

Title Time to Die: Death Prediction in Dota 2 using Deep Learning
Authors Adam Katona, Ryan Spick, Victoria Hodge, Simon Demediuk, Florian Block, Anders Drachen, James Alfred Walker
Abstract Esports have become major international sports with hundreds of millions of spectators. Esports games generate massive amounts of telemetry data. Using these to predict the outcome of esports matches has received considerable attention, but micro-predictions, which seek to predict events inside a match, are as yet unknown territory. Micro-predictions are, however, of perennial interest to esports commentators and audiences, because they provide the ability to observe events that might otherwise be missed: esports games are highly complex with fast-moving action where the balance of a game can change in the span of seconds, and where events can happen in multiple areas of the playing field at the same time. Such events can happen rapidly, and it is easy for commentators and viewers alike to miss an event and only observe its subsequent impact. In Dota 2, a player hero being killed by the opposing team is a key event of interest to commentators and audiences. We present a deep learning network with shared weights which provides accurate death predictions within a five-second window. The network is trained on a vast selection of Dota 2 gameplay features and a professional/semi-professional level match dataset. Even though death events are rare within a game (1% of the data), the model achieves 0.377 precision with 0.725 recall on test data when prompted to predict which of any of the 10 players of either team will die within 5 seconds. An example of the system applied to a Dota 2 match is presented. This model enables real-time micro-predictions of kills in Dota 2, one of the most played esports titles in the world, giving commentators and viewers time to move their attention to these key events.
Tasks Dota 2
Published 2019-05-21
URL https://arxiv.org/abs/1906.03939v1
PDF https://arxiv.org/pdf/1906.03939v1.pdf
PWC https://paperswithcode.com/paper/time-to-die-death-prediction-in-dota-2-using
Repo https://github.com/adam-katona/dota2_death_prediction
Framework pytorch
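
The abstract reports precision and recall separately. As a small worked example, the snippet below combines the two reported numbers into an F1 score using the standard harmonic-mean formula. The F1 value is derived here for convenience and is not a figure stated in the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Values quoted in the abstract above.
precision = 0.377
recall = 0.725

f1 = f1_score(precision, recall)
print(round(f1, 3))  # 0.496
```

A recall well above precision on a class that makes up about 1% of the data suggests the model flags most deaths at the cost of a fair number of false alarms.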

Training neural networks to mimic the brain improves object recognition performance

Title Training neural networks to mimic the brain improves object recognition performance
Authors Callie Federer, Haoyan Xu, Alona Fyshe, Joel Zylberberg
Abstract The current state-of-the-art object recognition algorithms, deep convolutional neural networks (DCNNs), are inspired by the architecture of the mammalian visual system, and capable of human-level performance on many tasks. However, even these algorithms make errors. As DCNNs train on object recognition tasks, they develop representations in their hidden layers that become more similar to those observed in mammalian brains. Moreover, DCNNs trained on object recognition tasks are currently among the best models we have of the mammalian visual system. This led us to hypothesize that teaching DCNNs to achieve even more brain-like representations could improve their performance. To test this, we trained DCNNs on a composite task, wherein networks were trained to: a) classify images of objects; while b) having intermediate representations that resemble those observed in neural recordings from monkey visual cortex. Compared with DCNNs trained purely for object categorization, DCNNs trained on the composite task had better object recognition performance, made more reasonable errors, and were more robust to label corruption. Our results outline a new way to train object recognition networks, using strategies in which the brain serves as a teacher signal for training DCNNs.
Tasks Object Recognition, Transfer Learning
Published 2019-05-25
URL https://arxiv.org/abs/1905.10679v2
PDF https://arxiv.org/pdf/1905.10679v2.pdf
PWC https://paperswithcode.com/paper/training-neural-networks-to-have-brain-like
Repo https://github.com/cfederer/TrainCNNsWithNeuralData
Framework tf