February 1, 2020


Paper Group AWR 150

Extracting Multiple-Relations in One-Pass with Pre-Trained Transformers. GANs for Semi-Supervised Opinion Spam Detection. Enhancing the Transformer with Explicit Relational Encoding for Math Problem Solving. Multi-Grained Named Entity Recognition. Deep learning for Chemometric and non-translational data. BERT and PALs: Projected Attention Layers fo …

Extracting Multiple-Relations in One-Pass with Pre-Trained Transformers


Title	Extracting Multiple-Relations in One-Pass with Pre-Trained Transformers
Authors	Haoyu Wang, Ming Tan, Mo Yu, Shiyu Chang, Dakuo Wang, Kun Xu, Xiaoxiao Guo, Saloni Potdar
Abstract	Most approaches to extracting multiple relations from a paragraph require multiple passes over the paragraph. In practice, multiple passes are computationally expensive, which makes it difficult to scale to longer paragraphs and larger text corpora. In this work, we focus on the task of multiple relation extraction by encoding the paragraph only once (one-pass). We build our solution on pre-trained self-attentive (Transformer) models, where we first add a structured prediction layer to handle extraction between multiple entity pairs, then enhance the paragraph embedding to capture multiple relational information associated with each entity with an entity-aware attention technique. We show that our approach is not only scalable but also achieves state-of-the-art performance on the standard benchmark ACE 2005.
Tasks	Relation Extraction, Structured Prediction
Published	2019-02-04
URL	https://arxiv.org/abs/1902.01030v2
PDF	https://arxiv.org/pdf/1902.01030v2.pdf
PWC	https://paperswithcode.com/paper/extracting-multiple-relations-in-one-pass
Repo	https://github.com/helloeve/mre-in-one-pass
Framework	tf

GANs for Semi-Supervised Opinion Spam Detection


Title	GANs for Semi-Supervised Opinion Spam Detection
Authors	Gray Stanton, Athirai A. Irissappane
Abstract	Online reviews have become a vital source of information in purchasing a service (product). Opinion spammers manipulate reviews, affecting the overall perception of the service. A key challenge in detecting opinion spam is obtaining ground truth. Though there exists a large set of reviews online, only a few of them have been labeled spam or non-spam. In this paper, we propose spamGAN, a generative adversarial network which relies on a limited set of labeled data as well as unlabeled data for opinion spam detection. spamGAN improves on the state-of-the-art GAN-based techniques for text classification. Experiments on the TripAdvisor dataset show that spamGAN outperforms existing spam detection techniques when limited labeled data is used. Apart from detecting spam reviews, spamGAN can also generate reviews with reasonable perplexity.
Tasks	Text Classification
Published	2019-03-19
URL	https://arxiv.org/abs/1903.08289v2
PDF	https://arxiv.org/pdf/1903.08289v2.pdf
PWC	https://paperswithcode.com/paper/gans-for-semi-supervised-opinion-spam
Repo	https://github.com/gray-stanton/spamGAN
Framework	tf

Enhancing the Transformer with Explicit Relational Encoding for Math Problem Solving


Title	Enhancing the Transformer with Explicit Relational Encoding for Math Problem Solving
Authors	Imanol Schlag, Paul Smolensky, Roland Fernandez, Nebojsa Jojic, Jürgen Schmidhuber, Jianfeng Gao
Abstract	We incorporate Tensor-Product Representations within the Transformer in order to better support the explicit representation of relation structure. Our Tensor-Product Transformer (TP-Transformer) sets a new state of the art on the recently-introduced Mathematics Dataset containing 56 categories of free-form math word-problems. The essential component of the model is a novel attention mechanism, called TP-Attention, which explicitly encodes the relations between each Transformer cell and the other cells from which values have been retrieved by attention. TP-Attention goes beyond linear combination of retrieved values, strengthening representation-building and resolving ambiguities introduced by multiple layers of standard attention. The TP-Transformer’s attention maps give better insights into how it is capable of solving the Mathematics Dataset’s challenging problems. Pretrained models and code will be made available after publication.
Tasks
Published	2019-10-15
URL	https://arxiv.org/abs/1910.06611v1
PDF	https://arxiv.org/pdf/1910.06611v1.pdf
PWC	https://paperswithcode.com/paper/enhancing-the-transformer-with-explicit-1
Repo	https://github.com/ischlag/TP-Transformer
Framework	pytorch

Multi-Grained Named Entity Recognition


Title	Multi-Grained Named Entity Recognition
Authors	Congying Xia, Chenwei Zhang, Tao Yang, Yaliang Li, Nan Du, Xian Wu, Wei Fan, Fenglong Ma, Philip Yu
Abstract	This paper presents a novel framework, MGNER, for Multi-Grained Named Entity Recognition where multiple entities or entity mentions in a sentence could be non-overlapping or totally nested. Unlike traditional approaches, which treat NER as a sequential labeling task and annotate entities consecutively, MGNER detects and recognizes entities on multiple granularities: it is able to recognize named entities without explicitly assuming non-overlapping or totally nested structures. MGNER consists of a Detector that examines all possible word segments and a Classifier that categorizes entities. In addition, contextual information and a self-attention mechanism are utilized throughout the framework to improve the NER performance. Experimental results show that MGNER outperforms current state-of-the-art baselines by up to 4.4% in terms of the F1 score among nested/non-overlapping NER tasks.
Tasks	Multi-Grained Named Entity Recognition, Named Entity Recognition, Nested Mention Recognition, Nested Named Entity Recognition
Published	2019-06-20
URL	https://arxiv.org/abs/1906.08449v1
PDF	https://arxiv.org/pdf/1906.08449v1.pdf
PWC	https://paperswithcode.com/paper/multi-grained-named-entity-recognition
Repo	https://github.com/congyingxia/Multi-Grained-NER
Framework	tf
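
The Detector described in the abstract examines all possible word segments. A minimal sketch of that enumeration step is shown below; the function name `enumerate_segments` and the `max_len` cap are illustrative assumptions and are not taken from the paper or its repository. Scoring and classifying the candidates, which MGNER does with neural components, is not reproduced here.

```python
def enumerate_segments(tokens, max_len):
    """Return every contiguous word segment of at most max_len tokens.

    Each candidate is (start, end, words) with an exclusive end index.
    max_len is a hypothetical cap that keeps the candidate set manageable.
    """
    segments = []
    for start in range(len(tokens)):
        # The longest segment starting here is limited by max_len and sentence end.
        last_end = min(start + max_len, len(tokens))
        for end in range(start + 1, last_end + 1):
            segments.append((start, end, tokens[start:end]))
    return segments


# Nested candidates such as "China" inside "Bank of China" appear side by side.
segments = enumerate_segments(["Bank", "of", "China"], max_len=3)
for start, end, words in segments:
    print(start, end, " ".join(words))
```

Because candidates are generated independently of one another, nothing in this step forces entities to be non-overlapping, which is the property the abstract emphasizes.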

Deep learning for Chemometric and non-translational data


Title	Deep learning for Chemometric and non-translational data
Authors	Jacob Søgaard Larsen, Line Clemmensen
Abstract	We propose a novel method to train deep convolutional neural networks which learn from multiple data sets of varying input sizes through weight sharing. This is an advantage in chemometrics where individual measurements represent exact chemical compounds and thus signals cannot be translated or resized without disturbing their interpretation. Our approach shows superior performance compared to transfer learning when a medium sized and a small data set are trained together. We also observe a small improvement compared to individual training when two medium sized data sets are trained together, in particular through a reduction in the variance.
Tasks	Transfer Learning
Published	2019-10-01
URL	https://arxiv.org/abs/1910.00391v4
PDF	https://arxiv.org/pdf/1910.00391v4.pdf
PWC	https://paperswithcode.com/paper/deep-learning-for-chemometric-and-non
Repo	https://github.com/DTUComputeStatisticsAndDataAnalysis/Weight-Share
Framework	tf

BERT and PALs: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning


Title	BERT and PALs: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning
Authors	Asa Cooper Stickland, Iain Murray
Abstract	Multi-task learning shares information between related tasks, sometimes reducing the number of parameters required. State-of-the-art results across multiple natural language understanding tasks in the GLUE benchmark have previously used transfer from a single large task: unsupervised pre-training with BERT, where a separate BERT model was fine-tuned for each task. We explore multi-task approaches that share a single BERT model with a small number of additional task-specific parameters. Using new adaptation modules, PALs or 'projected attention layers', we match the performance of separately fine-tuned models on the GLUE benchmark with roughly 7 times fewer parameters, and obtain state-of-the-art results on the Recognizing Textual Entailment dataset.
Tasks	Multi-Task Learning, Natural Language Inference
Published	2019-02-07
URL	https://arxiv.org/abs/1902.02671v2
PDF	https://arxiv.org/pdf/1902.02671v2.pdf
PWC	https://paperswithcode.com/paper/bert-and-pals-projected-attention-layers-for
Repo	https://github.com/AsaCooperStickland/Bert-n-Pals
Framework	pytorch

ERNIE: Enhanced Language Representation with Informative Entities


Title	ERNIE: Enhanced Language Representation with Informative Entities
Authors	Zhengyan Zhang, Xu Han, Zhiyuan Liu, Xin Jiang, Maosong Sun, Qun Liu
Abstract	Neural language representation models such as BERT, pre-trained on large-scale corpora, can capture rich semantic patterns from plain text and be fine-tuned to consistently improve the performance of various NLP tasks. However, the existing pre-trained language models rarely consider incorporating knowledge graphs (KGs), which can provide rich structured knowledge facts for better language understanding. We argue that informative entities in KGs can enhance language representation with external knowledge. In this paper, we utilize both large-scale textual corpora and KGs to train an enhanced language representation model (ERNIE), which can take full advantage of lexical, syntactic, and knowledge information simultaneously. The experimental results demonstrate that ERNIE achieves significant improvements on various knowledge-driven tasks, while remaining comparable with the state-of-the-art model BERT on other common NLP tasks. The source code of this paper can be obtained from https://github.com/thunlp/ERNIE.
Tasks	Entity Typing, Knowledge Graphs, Natural Language Inference, Relation Extraction, Sentiment Analysis
Published	2019-05-17
URL	https://arxiv.org/abs/1905.07129v3
PDF	https://arxiv.org/pdf/1905.07129v3.pdf
PWC	https://paperswithcode.com/paper/ernie-enhanced-language-representation-with
Repo	https://github.com/thunlp/ERNIE
Framework	pytorch

CMRNet: Camera to LiDAR-Map Registration


Title	CMRNet: Camera to LiDAR-Map Registration
Authors	Daniele Cattaneo, Matteo Vaghi, Augusto Luis Ballardini, Simone Fontana, Domenico Giorgio Sorrenti, Wolfram Burgard
Abstract	In this paper we present CMRNet, a realtime approach based on a Convolutional Neural Network to localize an RGB image of a scene in a map built from LiDAR data. Our network is not trained in the working area, i.e. CMRNet does not learn the map. Instead it learns to match an image to the map. We validate our approach on the KITTI dataset, processing each frame independently without any tracking procedure. CMRNet achieves 0.27 m and 1.07° median localization accuracy on sequence 00 of the odometry dataset, starting from a rough pose estimate displaced up to 3.5 m and 17°. To the best of our knowledge this is the first CNN-based approach that learns to match images from a monocular camera to a given, preexisting 3D LiDAR-map.
Tasks
Published	2019-06-24
URL	https://arxiv.org/abs/1906.10109v2
PDF	https://arxiv.org/pdf/1906.10109v2.pdf
PWC	https://paperswithcode.com/paper/cmrnet-camera-to-lidar-map-registration
Repo	https://github.com/catta202000/CMRNet
Framework	pytorch

Understanding Generalization through Visualizations


Title	Understanding Generalization through Visualizations
Authors	W. Ronny Huang, Zeyad Emam, Micah Goldblum, Liam Fowl, Justin K. Terry, Furong Huang, Tom Goldstein
Abstract	The power of neural networks lies in their ability to generalize to unseen data, yet the underlying reasons for this phenomenon remain elusive. Numerous rigorous attempts have been made to explain generalization, but available bounds are still quite loose, and analysis does not always lead to true understanding. The goal of this work is to make generalization more intuitive. Using visualization methods, we discuss the mystery of generalization, the geometry of loss landscapes, and how the curse (or, rather, the blessing) of dimensionality causes optimizers to settle into minima that generalize well.
Tasks
Published	2019-06-07
URL	https://arxiv.org/abs/1906.03291v4
PDF	https://arxiv.org/pdf/1906.03291v4.pdf
PWC	https://paperswithcode.com/paper/understanding-generalization-through
Repo	https://github.com/wronnyhuang/gen-viz
Framework	pytorch

Improving Black-box Adversarial Attacks with a Transfer-based Prior


Title	Improving Black-box Adversarial Attacks with a Transfer-based Prior
Authors	Shuyu Cheng, Yinpeng Dong, Tianyu Pang, Hang Su, Jun Zhu
Abstract	We consider the black-box adversarial setting, where the adversary has to generate adversarial perturbations without access to the target models to compute gradients. Previous methods tried to approximate the gradient either by using a transfer gradient of a surrogate white-box model, or based on the query feedback. However, these methods often suffer from low attack success rates or poor query efficiency since it is non-trivial to estimate the gradient in a high-dimensional space with limited information. To address these problems, we propose a prior-guided random gradient-free (P-RGF) method to improve black-box adversarial attacks, which takes advantage of a transfer-based prior and the query information simultaneously. The transfer-based prior given by the gradient of a surrogate model is appropriately integrated into our algorithm by an optimal coefficient derived by a theoretical analysis. Extensive experiments demonstrate that our method requires far fewer queries to attack black-box models with higher success rates compared with the alternative state-of-the-art methods.
Tasks
Published	2019-06-17
URL	https://arxiv.org/abs/1906.06919v2
PDF	https://arxiv.org/pdf/1906.06919v2.pdf
PWC	https://paperswithcode.com/paper/improving-black-box-adversarial-attacks-with
Repo	https://github.com/thu-ml/Prior-Guided-RGF
Framework	tf
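
To make the idea of combining a transfer-based prior with query feedback more concrete, here is a rough sketch of a random gradient-free estimator whose search directions are biased toward a prior direction. It is an illustration under stated assumptions, not the authors' implementation: the function name `prior_guided_estimate` and all parameter names are hypothetical, and the mixing weight `lam` is simply fixed, whereas the abstract describes an optimal coefficient derived by theoretical analysis that is not reproduced here.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def prior_guided_estimate(f, x, prior, lam=0.5, q=20, sigma=1e-3, rng=None):
    """Estimate the gradient of f at x from q function queries.

    Each unit direction mixes the normalized prior (weight sqrt(lam)) with a
    random unit vector orthogonal to it (weight sqrt(1 - lam)).
    lam is a fixed, hypothetical mixing weight chosen for illustration.
    """
    rng = rng or random.Random(0)
    v = normalize(prior)
    fx = f(x)
    grad = [0.0] * len(x)
    for _ in range(q):
        xi = [rng.gauss(0.0, 1.0) for _ in x]
        along = dot(xi, v)
        # Remove the component along the prior, then rescale to unit length.
        ortho = normalize([a - along * b for a, b in zip(xi, v)])
        u = [math.sqrt(lam) * b + math.sqrt(1.0 - lam) * o
             for b, o in zip(v, ortho)]
        # Forward finite difference: one extra query per direction.
        shifted = [a + sigma * b for a, b in zip(x, u)]
        slope = (f(shifted) - fx) / sigma
        grad = [g + slope * b / q for g, b in zip(grad, u)]
    return grad


# Toy check: a linear "loss" whose true gradient is `true_grad`,
# with a noisy copy of it standing in for the surrogate-model gradient.
true_grad = [1.0, -2.0, 0.5, 3.0]
noisy_prior = [1.2, -1.5, 0.0, 2.5]
estimate = prior_guided_estimate(lambda z: dot(true_grad, z),
                                 [0.0] * 4, noisy_prior)
print(dot(normalize(estimate), normalize(true_grad)))  # cosine similarity
```

With `lam` close to 1 the estimator trusts the prior almost entirely, and with `lam` close to 0 it falls back to unbiased random search in the subspace orthogonal to the prior.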

A Direct Approach to Robust Deep Learning Using Adversarial Networks


Title	A Direct Approach to Robust Deep Learning Using Adversarial Networks
Authors	Huaxia Wang, Chun-Nam Yu
Abstract	Deep neural networks have been shown to perform well in many classical machine learning problems, especially in image classification tasks. However, researchers have found that neural networks can be easily fooled, and they are surprisingly sensitive to small perturbations imperceptible to humans. Carefully crafted input images (adversarial examples) can force a well-trained neural network to provide arbitrary outputs. Including adversarial examples during training is a popular defense mechanism against adversarial attacks. In this paper we propose a new defensive mechanism under the generative adversarial network (GAN) framework. We model the adversarial noise using a generative network, trained jointly with a classification discriminative network as a minimax game. We show empirically that our adversarial network approach works well against black box attacks, with performance on par with state-of-the-art methods such as ensemble adversarial training and adversarial training with projected gradient descent.
Tasks	Image Classification
Published	2019-05-23
URL	https://arxiv.org/abs/1905.09591v1
PDF	https://arxiv.org/pdf/1905.09591v1.pdf
PWC	https://paperswithcode.com/paper/a-direct-approach-to-robust-deep-learning-1
Repo	https://github.com/whxbergkamp/RobustDL_GAN
Framework	tf

Where’s My Head? Definition, Dataset and Models for Numeric Fused-Heads Identification and Resolution


Title	Where’s My Head? Definition, Dataset and Models for Numeric Fused-Heads Identification and Resolution
Authors	Yanai Elazar, Yoav Goldberg
Abstract	We provide the first computational treatment of fused-heads constructions (FH), focusing on the numeric fused-heads (NFH). FH constructions are noun phrases (NPs) in which the head noun is missing and is said to be 'fused' with its dependent modifier. This missing information is implicit and is important for sentence understanding. The missing references are easily filled in by humans but pose a challenge for computational models. We formulate the handling of FH as a two-stage process: identification of the FH construction and resolution of the missing head. We explore the NFH phenomena in large corpora of English text and create (1) a dataset and a highly accurate method for NFH identification; (2) a crowd-sourced dataset of 10k examples (1M tokens) for NFH resolution; and (3) a neural baseline for the NFH resolution task. We release our code and dataset, in the hope of fostering further research into this challenging problem.
Tasks
Published	2019-05-26
URL	https://arxiv.org/abs/1905.10886v1
PDF	https://arxiv.org/pdf/1905.10886v1.pdf
PWC	https://paperswithcode.com/paper/wheres-my-head-definition-dataset-and-models
Repo	https://github.com/yanaiela/num_fh
Framework	none

Meta-Learning with Implicit Gradients


Title	Meta-Learning with Implicit Gradients
Authors	Aravind Rajeswaran, Chelsea Finn, Sham Kakade, Sergey Levine
Abstract	A core capability of intelligent systems is the ability to quickly learn new tasks by drawing on prior experience. Gradient (or optimization) based meta-learning has recently emerged as an effective approach for few-shot learning. In this formulation, meta-parameters are learned in the outer loop, while task-specific models are learned in the inner-loop, by using only a small amount of data from the current task. A key challenge in scaling these approaches is the need to differentiate through the inner loop learning process, which can impose considerable computational and memory burdens. By drawing upon implicit differentiation, we develop the implicit MAML algorithm, which depends only on the solution to the inner level optimization and not the path taken by the inner loop optimizer. This effectively decouples the meta-gradient computation from the choice of inner loop optimizer. As a result, our approach is agnostic to the choice of inner loop optimizer and can gracefully handle many gradient steps without vanishing gradients or memory constraints. Theoretically, we prove that implicit MAML can compute accurate meta-gradients with a memory footprint that is, up to small constant factors, no more than that which is required to compute a single inner loop gradient and at no overall increase in the total computational cost. Experimentally, we show that these benefits of implicit MAML translate into empirical gains on few-shot image recognition benchmarks.
Tasks	Few-Shot Image Classification, Few-Shot Learning, Meta-Learning
Published	2019-09-10
URL	https://arxiv.org/abs/1909.04630v1
PDF	https://arxiv.org/pdf/1909.04630v1.pdf
PWC	https://paperswithcode.com/paper/meta-learning-with-implicit-gradients
Repo	https://github.com/spiglerg/pyMeta
Framework	tf

Time to Die: Death Prediction in Dota 2 using Deep Learning


Title	Time to Die: Death Prediction in Dota 2 using Deep Learning
Authors	Adam Katona, Ryan Spick, Victoria Hodge, Simon Demediuk, Florian Block, Anders Drachen, James Alfred Walker
Abstract	Esports have become major international sports with hundreds of millions of spectators. Esports games generate massive amounts of telemetry data. Using these to predict the outcome of esports matches has received considerable attention, but micro-predictions, which seek to predict events inside a match, are as yet unknown territory. Micro-predictions are, however, of perennial interest to esports commentators and audiences, because they provide the ability to observe events that might otherwise be missed: esports games are highly complex with fast-moving action where the balance of a game can change in the span of seconds, and where events can happen in multiple areas of the playing field at the same time. Such events can happen rapidly, and it is easy for commentators and viewers alike to miss an event and only observe its subsequent impact. In Dota 2, a player hero being killed by the opposing team is a key event of interest to commentators and audience. We present a deep learning network with shared weights which provides accurate death predictions within a five-second window. The network is trained on a vast selection of Dota 2 gameplay features and a dataset of professional and semi-professional level matches. Even though death events are rare within a game (1% of the data), the model achieves 0.377 precision with 0.725 recall on test data when prompted to predict which of any of the 10 players of either team will die within 5 seconds. An example of the system applied to a Dota 2 match is presented. This model enables real-time micro-predictions of kills in Dota 2, one of the most played esports titles in the world, giving commentators and viewers time to move their attention to these key events.
Tasks	Dota 2
Published	2019-05-21
URL	https://arxiv.org/abs/1906.03939v1
PDF	https://arxiv.org/pdf/1906.03939v1.pdf
PWC	https://paperswithcode.com/paper/time-to-die-death-prediction-in-dota-2-using
Repo	https://github.com/adam-katona/dota2_death_prediction
Framework	pytorch
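
The reported precision and recall are easier to interpret alongside the 1% event rate. The short calculation below derives two figures that the abstract does not state: the F1 score implied by the reported numbers, and the precision lift over a predictor that flags deaths at random, taking the quoted 1% as the base rate (an assumption about how that figure should be read).

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2.0 * precision * recall / (precision + recall)


precision, recall, base_rate = 0.377, 0.725, 0.01

f1 = f1_score(precision, recall)
# A random predictor's precision equals the base rate of positive events.
lift = precision / base_rate

print(round(f1, 3))    # 0.496
print(round(lift, 1))  # 37.7
```

Read this way, roughly three of every eight predicted deaths are correct, against about one in a hundred for random guessing.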

Training neural networks to mimic the brain improves object recognition performance


Title	Training neural networks to mimic the brain improves object recognition performance
Authors	Callie Federer, Haoyan Xu, Alona Fyshe, Joel Zylberberg
Abstract	The current state-of-the-art object recognition algorithms, deep convolutional neural networks (DCNNs), are inspired by the architecture of the mammalian visual system, and capable of human-level performance on many tasks. However, even these algorithms make errors. As DCNNs train on object recognition tasks, they develop representations in their hidden layers that become more similar to those observed in mammalian brains. Moreover, DCNNs trained on object recognition tasks are currently among the best models we have of the mammalian visual system. This led us to hypothesize that teaching DCNNs to achieve even more brain-like representations could improve their performance. To test this, we trained DCNNs on a composite task, wherein networks were trained to: a) classify images of objects; while b) having intermediate representations that resemble those observed in neural recordings from monkey visual cortex. Compared with DCNNs trained purely for object categorization, DCNNs trained on the composite task had better object recognition performance, made more reasonable errors, and were more robust to label corruption. Our results outline a new way to train object recognition networks, using strategies in which the brain serves as a teacher signal for training DCNNs.
Tasks	Object Recognition, Transfer Learning
Published	2019-05-25
URL	https://arxiv.org/abs/1905.10679v2
PDF	https://arxiv.org/pdf/1905.10679v2.pdf
PWC	https://paperswithcode.com/paper/training-neural-networks-to-have-brain-like
Repo	https://github.com/cfederer/TrainCNNsWithNeuralData
Framework	tf