Paper Group AWR 150
Extracting Multiple-Relations in One-Pass with Pre-Trained Transformers
Title | Extracting Multiple-Relations in One-Pass with Pre-Trained Transformers |
Authors | Haoyu Wang, Ming Tan, Mo Yu, Shiyu Chang, Dakuo Wang, Kun Xu, Xiaoxiao Guo, Saloni Potdar |
Abstract | Most approaches to extracting multiple relations from a paragraph require multiple passes over the paragraph. In practice, multiple passes are computationally expensive, which makes it difficult to scale to longer paragraphs and larger text corpora. In this work, we focus on the task of multiple relation extraction by encoding the paragraph only once (one-pass). We build our solution on pre-trained self-attentive (Transformer) models, where we first add a structured prediction layer to handle extraction between multiple entity pairs, then enhance the paragraph embedding to capture multiple relational information associated with each entity with an entity-aware attention technique. We show that our approach is not only scalable but can also achieve state-of-the-art performance on the standard benchmark ACE 2005. |
Tasks | Relation Extraction, Structured Prediction |
Published | 2019-02-04 |
URL | https://arxiv.org/abs/1902.01030v2 |
https://arxiv.org/pdf/1902.01030v2.pdf | |
PWC | https://paperswithcode.com/paper/extracting-multiple-relations-in-one-pass |
Repo | https://github.com/helloeve/mre-in-one-pass |
Framework | tf |
GANs for Semi-Supervised Opinion Spam Detection
Title | GANs for Semi-Supervised Opinion Spam Detection |
Authors | Gray Stanton, Athirai A. Irissappane |
Abstract | Online reviews have become a vital source of information in purchasing a service (product). Opinion spammers manipulate reviews, affecting the overall perception of the service. A key challenge in detecting opinion spam is obtaining ground truth. Though there exists a large set of reviews online, only a few of them have been labeled spam or non-spam. In this paper, we propose spamGAN, a generative adversarial network which relies on a limited set of labeled data as well as unlabeled data for opinion spam detection. spamGAN improves the state-of-the-art GAN-based techniques for text classification. Experiments on the TripAdvisor dataset show that spamGAN outperforms existing spam detection techniques when limited labeled data is used. Apart from detecting spam reviews, spamGAN can also generate reviews with reasonable perplexity. |
Tasks | Text Classification |
Published | 2019-03-19 |
URL | https://arxiv.org/abs/1903.08289v2 |
https://arxiv.org/pdf/1903.08289v2.pdf | |
PWC | https://paperswithcode.com/paper/gans-for-semi-supervised-opinion-spam |
Repo | https://github.com/gray-stanton/spamGAN |
Framework | tf |
Enhancing the Transformer with Explicit Relational Encoding for Math Problem Solving
Title | Enhancing the Transformer with Explicit Relational Encoding for Math Problem Solving |
Authors | Imanol Schlag, Paul Smolensky, Roland Fernandez, Nebojsa Jojic, Jürgen Schmidhuber, Jianfeng Gao |
Abstract | We incorporate Tensor-Product Representations within the Transformer in order to better support the explicit representation of relation structure. Our Tensor-Product Transformer (TP-Transformer) sets a new state of the art on the recently-introduced Mathematics Dataset containing 56 categories of free-form math word-problems. The essential component of the model is a novel attention mechanism, called TP-Attention, which explicitly encodes the relations between each Transformer cell and the other cells from which values have been retrieved by attention. TP-Attention goes beyond linear combination of retrieved values, strengthening representation-building and resolving ambiguities introduced by multiple layers of standard attention. The TP-Transformer’s attention maps give better insights into how it is capable of solving the Mathematics Dataset’s challenging problems. Pretrained models and code will be made available after publication. |
Tasks | |
Published | 2019-10-15 |
URL | https://arxiv.org/abs/1910.06611v1 |
https://arxiv.org/pdf/1910.06611v1.pdf | |
PWC | https://paperswithcode.com/paper/enhancing-the-transformer-with-explicit-1 |
Repo | https://github.com/ischlag/TP-Transformer |
Framework | pytorch |
Multi-Grained Named Entity Recognition
Title | Multi-Grained Named Entity Recognition |
Authors | Congying Xia, Chenwei Zhang, Tao Yang, Yaliang Li, Nan Du, Xian Wu, Wei Fan, Fenglong Ma, Philip Yu |
Abstract | This paper presents a novel framework, MGNER, for Multi-Grained Named Entity Recognition where multiple entities or entity mentions in a sentence could be non-overlapping or totally nested. Unlike traditional approaches, which regard NER as a sequential labeling task and annotate entities consecutively, MGNER detects and recognizes entities on multiple granularities: it is able to recognize named entities without explicitly assuming non-overlapping or totally nested structures. MGNER consists of a Detector that examines all possible word segments and a Classifier that categorizes entities. In addition, contextual information and a self-attention mechanism are utilized throughout the framework to improve the NER performance. Experimental results show that MGNER outperforms current state-of-the-art baselines by up to 4.4% in terms of the F1 score among nested/non-overlapping NER tasks. |
Tasks | Multi-Grained Named Entity Recognition, Named Entity Recognition, Nested Mention Recognition, Nested Named Entity Recognition |
Published | 2019-06-20 |
URL | https://arxiv.org/abs/1906.08449v1 |
https://arxiv.org/pdf/1906.08449v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-grained-named-entity-recognition |
Repo | https://github.com/congyingxia/Multi-Grained-NER |
Framework | tf |
Deep learning for Chemometric and non-translational data
Title | Deep learning for Chemometric and non-translational data |
Authors | Jacob Søgaard Larsen, Line Clemmensen |
Abstract | We propose a novel method to train deep convolutional neural networks which learn from multiple data sets of varying input sizes through weight sharing. This is an advantage in chemometrics where individual measurements represent exact chemical compounds and thus signals cannot be translated or resized without disturbing their interpretation. Our approach shows superior performance compared to transfer learning when a medium sized and a small data set are trained together. We also observe a small improvement compared to individual training when two medium sized data sets are trained together, in particular through a reduction in the variance. |
Tasks | Transfer Learning |
Published | 2019-10-01 |
URL | https://arxiv.org/abs/1910.00391v4 |
https://arxiv.org/pdf/1910.00391v4.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-for-chemometric-and-non |
Repo | https://github.com/DTUComputeStatisticsAndDataAnalysis/Weight-Share |
Framework | tf |
BERT and PALs: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning
Title | BERT and PALs: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning |
Authors | Asa Cooper Stickland, Iain Murray |
Abstract | Multi-task learning shares information between related tasks, sometimes reducing the number of parameters required. State-of-the-art results across multiple natural language understanding tasks in the GLUE benchmark have previously used transfer from a single large task: unsupervised pre-training with BERT, where a separate BERT model was fine-tuned for each task. We explore multi-task approaches that share a single BERT model with a small number of additional task-specific parameters. Using new adaptation modules, PALs or ‘projected attention layers’, we match the performance of separately fine-tuned models on the GLUE benchmark with roughly 7 times fewer parameters, and obtain state-of-the-art results on the Recognizing Textual Entailment dataset. |
Tasks | Multi-Task Learning, Natural Language Inference |
Published | 2019-02-07 |
URL | https://arxiv.org/abs/1902.02671v2 |
https://arxiv.org/pdf/1902.02671v2.pdf | |
PWC | https://paperswithcode.com/paper/bert-and-pals-projected-attention-layers-for |
Repo | https://github.com/AsaCooperStickland/Bert-n-Pals |
Framework | pytorch |
ERNIE: Enhanced Language Representation with Informative Entities
Title | ERNIE: Enhanced Language Representation with Informative Entities |
Authors | Zhengyan Zhang, Xu Han, Zhiyuan Liu, Xin Jiang, Maosong Sun, Qun Liu |
Abstract | Neural language representation models such as BERT pre-trained on large-scale corpora can well capture rich semantic patterns from plain text, and be fine-tuned to consistently improve the performance of various NLP tasks. However, the existing pre-trained language models rarely consider incorporating knowledge graphs (KGs), which can provide rich structured knowledge facts for better language understanding. We argue that informative entities in KGs can enhance language representation with external knowledge. In this paper, we utilize both large-scale textual corpora and KGs to train an enhanced language representation model (ERNIE), which can take full advantage of lexical, syntactic, and knowledge information simultaneously. The experimental results have demonstrated that ERNIE achieves significant improvements on various knowledge-driven tasks, and meanwhile is comparable with the state-of-the-art model BERT on other common NLP tasks. The source code of this paper can be obtained from https://github.com/thunlp/ERNIE. |
Tasks | Entity Typing, Knowledge Graphs, Natural Language Inference, Relation Extraction, Sentiment Analysis |
Published | 2019-05-17 |
URL | https://arxiv.org/abs/1905.07129v3 |
https://arxiv.org/pdf/1905.07129v3.pdf | |
PWC | https://paperswithcode.com/paper/ernie-enhanced-language-representation-with |
Repo | https://github.com/thunlp/ERNIE |
Framework | pytorch |
CMRNet: Camera to LiDAR-Map Registration
Title | CMRNet: Camera to LiDAR-Map Registration |
Authors | Daniele Cattaneo, Matteo Vaghi, Augusto Luis Ballardini, Simone Fontana, Domenico Giorgio Sorrenti, Wolfram Burgard |
Abstract | In this paper we present CMRNet, a realtime approach based on a Convolutional Neural Network to localize an RGB image of a scene in a map built from LiDAR data. Our network is not trained in the working area, i.e. CMRNet does not learn the map. Instead it learns to match an image to the map. We validate our approach on the KITTI dataset, processing each frame independently without any tracking procedure. CMRNet achieves 0.27m and 1.07deg median localization accuracy on the sequence 00 of the odometry dataset, starting from a rough pose estimate displaced up to 3.5m and 17deg. To the best of our knowledge this is the first CNN-based approach that learns to match images from a monocular camera to a given, preexisting 3D LiDAR-map. |
Tasks | |
Published | 2019-06-24 |
URL | https://arxiv.org/abs/1906.10109v2 |
https://arxiv.org/pdf/1906.10109v2.pdf | |
PWC | https://paperswithcode.com/paper/cmrnet-camera-to-lidar-map-registration |
Repo | https://github.com/catta202000/CMRNet |
Framework | pytorch |
Understanding Generalization through Visualizations
Title | Understanding Generalization through Visualizations |
Authors | W. Ronny Huang, Zeyad Emam, Micah Goldblum, Liam Fowl, Justin K. Terry, Furong Huang, Tom Goldstein |
Abstract | The power of neural networks lies in their ability to generalize to unseen data, yet the underlying reasons for this phenomenon remain elusive. Numerous rigorous attempts have been made to explain generalization, but available bounds are still quite loose, and analysis does not always lead to true understanding. The goal of this work is to make generalization more intuitive. Using visualization methods, we discuss the mystery of generalization, the geometry of loss landscapes, and how the curse (or, rather, the blessing) of dimensionality causes optimizers to settle into minima that generalize well. |
Tasks | |
Published | 2019-06-07 |
URL | https://arxiv.org/abs/1906.03291v4 |
https://arxiv.org/pdf/1906.03291v4.pdf | |
PWC | https://paperswithcode.com/paper/understanding-generalization-through |
Repo | https://github.com/wronnyhuang/gen-viz |
Framework | pytorch |
Improving Black-box Adversarial Attacks with a Transfer-based Prior
Title | Improving Black-box Adversarial Attacks with a Transfer-based Prior |
Authors | Shuyu Cheng, Yinpeng Dong, Tianyu Pang, Hang Su, Jun Zhu |
Abstract | We consider the black-box adversarial setting, where the adversary has to generate adversarial perturbations without access to the target models to compute gradients. Previous methods tried to approximate the gradient either by using a transfer gradient of a surrogate white-box model, or based on the query feedback. However, these methods often suffer from low attack success rates or poor query efficiency since it is non-trivial to estimate the gradient in a high-dimensional space with limited information. To address these problems, we propose a prior-guided random gradient-free (P-RGF) method to improve black-box adversarial attacks, which takes advantage of a transfer-based prior and the query information simultaneously. The transfer-based prior given by the gradient of a surrogate model is appropriately integrated into our algorithm by an optimal coefficient derived by a theoretical analysis. Extensive experiments demonstrate that our method requires far fewer queries to attack black-box models, with higher success rates, compared with the alternative state-of-the-art methods. |
Tasks | |
Published | 2019-06-17 |
URL | https://arxiv.org/abs/1906.06919v2 |
https://arxiv.org/pdf/1906.06919v2.pdf | |
PWC | https://paperswithcode.com/paper/improving-black-box-adversarial-attacks-with |
Repo | https://github.com/thu-ml/Prior-Guided-RGF |
Framework | tf |
A Direct Approach to Robust Deep Learning Using Adversarial Networks
Title | A Direct Approach to Robust Deep Learning Using Adversarial Networks |
Authors | Huaxia Wang, Chun-Nam Yu |
Abstract | Deep neural networks have been shown to perform well in many classical machine learning problems, especially in image classification tasks. However, researchers have found that neural networks can be easily fooled, and they are surprisingly sensitive to small perturbations imperceptible to humans. Carefully crafted input images (adversarial examples) can force a well-trained neural network to provide arbitrary outputs. Including adversarial examples during training is a popular defense mechanism against adversarial attacks. In this paper we propose a new defensive mechanism under the generative adversarial network (GAN) framework. We model the adversarial noise using a generative network, trained jointly with a classification discriminative network as a minimax game. We show empirically that our adversarial network approach works well against black box attacks, with performance on par with state-of-the-art methods such as ensemble adversarial training and adversarial training with projected gradient descent. |
Tasks | Image Classification |
Published | 2019-05-23 |
URL | https://arxiv.org/abs/1905.09591v1 |
https://arxiv.org/pdf/1905.09591v1.pdf | |
PWC | https://paperswithcode.com/paper/a-direct-approach-to-robust-deep-learning-1 |
Repo | https://github.com/whxbergkamp/RobustDL_GAN |
Framework | tf |
Where’s My Head? Definition, Dataset and Models for Numeric Fused-Heads Identification and Resolution
Title | Where’s My Head? Definition, Dataset and Models for Numeric Fused-Heads Identification and Resolution |
Authors | Yanai Elazar, Yoav Goldberg |
Abstract | We provide the first computational treatment of fused-heads constructions (FH), focusing on the numeric fused-heads (NFH). FH constructions are noun phrases (NPs) in which the head noun is missing and is said to be ‘fused’ with its dependent modifier. This missing information is implicit and is important for sentence understanding. The missing references are easily filled in by humans but pose a challenge for computational models. We formulate the handling of FH as a two-stage process: identification of the FH construction and resolution of the missing head. We explore the NFH phenomena in large corpora of English text and create (1) a dataset and a highly accurate method for NFH identification; (2) a 10k examples (1M tokens) crowd-sourced dataset of NFH resolution; and (3) a neural baseline for the NFH resolution task. We release our code and dataset, in the hope of fostering further research into this challenging problem. |
Tasks | |
Published | 2019-05-26 |
URL | https://arxiv.org/abs/1905.10886v1 |
https://arxiv.org/pdf/1905.10886v1.pdf | |
PWC | https://paperswithcode.com/paper/wheres-my-head-definition-dataset-and-models |
Repo | https://github.com/yanaiela/num_fh |
Framework | none |
Meta-Learning with Implicit Gradients
Title | Meta-Learning with Implicit Gradients |
Authors | Aravind Rajeswaran, Chelsea Finn, Sham Kakade, Sergey Levine |
Abstract | A core capability of intelligent systems is the ability to quickly learn new tasks by drawing on prior experience. Gradient (or optimization) based meta-learning has recently emerged as an effective approach for few-shot learning. In this formulation, meta-parameters are learned in the outer loop, while task-specific models are learned in the inner-loop, by using only a small amount of data from the current task. A key challenge in scaling these approaches is the need to differentiate through the inner loop learning process, which can impose considerable computational and memory burdens. By drawing upon implicit differentiation, we develop the implicit MAML algorithm, which depends only on the solution to the inner level optimization and not the path taken by the inner loop optimizer. This effectively decouples the meta-gradient computation from the choice of inner loop optimizer. As a result, our approach is agnostic to the choice of inner loop optimizer and can gracefully handle many gradient steps without vanishing gradients or memory constraints. Theoretically, we prove that implicit MAML can compute accurate meta-gradients with a memory footprint that is, up to small constant factors, no more than that which is required to compute a single inner loop gradient and at no overall increase in the total computational cost. Experimentally, we show that these benefits of implicit MAML translate into empirical gains on few-shot image recognition benchmarks. |
Tasks | Few-Shot Image Classification, Few-Shot Learning, Meta-Learning |
Published | 2019-09-10 |
URL | https://arxiv.org/abs/1909.04630v1 |
https://arxiv.org/pdf/1909.04630v1.pdf | |
PWC | https://paperswithcode.com/paper/meta-learning-with-implicit-gradients |
Repo | https://github.com/spiglerg/pyMeta |
Framework | tf |
Time to Die: Death Prediction in Dota 2 using Deep Learning
Title | Time to Die: Death Prediction in Dota 2 using Deep Learning |
Authors | Adam Katona, Ryan Spick, Victoria Hodge, Simon Demediuk, Florian Block, Anders Drachen, James Alfred Walker |
Abstract | Esports have become major international sports with hundreds of millions of spectators. Esports games generate massive amounts of telemetry data. Using these to predict the outcome of esports matches has received considerable attention, but micro-predictions, which seek to predict events inside a match, are as yet unknown territory. Micro-predictions are however of perennial interest across esports commentators and audience, because they provide the ability to observe events that might otherwise be missed: esports games are highly complex with fast-moving action where the balance of a game can change in the span of seconds, and where events can happen in multiple areas of the playing field at the same time. Such events can happen rapidly, and it is easy for commentators and viewers alike to miss an event and only observe the following impact of events. In Dota 2, a player hero being killed by the opposing team is a key event of interest to commentators and audience. We present a deep learning network with shared weights which provides accurate death predictions within a five-second window. The network is trained on a vast selection of Dota 2 gameplay features and a professional/semi-professional level match dataset. Even though death events are rare within a game (1% of the data), the model achieves 0.377 precision with 0.725 recall on test data when prompted to predict which of any of the 10 players of either team will die within 5 seconds. An example of the system applied to a Dota 2 match is presented. This model enables real-time micro-predictions of kills in Dota 2, one of the most played esports titles in the world, giving commentators and viewers time to move their attention to these key events. |
Tasks | Dota 2 |
Published | 2019-05-21 |
URL | https://arxiv.org/abs/1906.03939v1 |
https://arxiv.org/pdf/1906.03939v1.pdf | |
PWC | https://paperswithcode.com/paper/time-to-die-death-prediction-in-dota-2-using |
Repo | https://github.com/adam-katona/dota2_death_prediction |
Framework | pytorch |
Training neural networks to mimic the brain improves object recognition performance
Title | Training neural networks to mimic the brain improves object recognition performance |
Authors | Callie Federer, Haoyan Xu, Alona Fyshe, Joel Zylberberg |
Abstract | The current state-of-the-art object recognition algorithms, deep convolutional neural networks (DCNNs), are inspired by the architecture of the mammalian visual system, and capable of human-level performance on many tasks. However, even these algorithms make errors. As DCNNs train on object recognition tasks, they develop representations in their hidden layers that become more similar to those observed in the mammalian brain. Moreover, DCNNs trained on object recognition tasks are currently among the best models we have of the mammalian visual system. This led us to hypothesize that teaching DCNNs to achieve even more brain-like representations could improve their performance. To test this, we trained DCNNs on a composite task, wherein networks were trained to: a) classify images of objects; while b) having intermediate representations that resemble those observed in neural recordings from monkey visual cortex. Compared with DCNNs trained purely for object categorization, DCNNs trained on the composite task had better object recognition performance, made more reasonable errors and were more robust to label corruption. Our results outline a new way to train object recognition networks, using strategies in which the brain serves as a teacher signal for training DCNNs. |
Tasks | Object Recognition, Transfer Learning |
Published | 2019-05-25 |
URL | https://arxiv.org/abs/1905.10679v2 |
https://arxiv.org/pdf/1905.10679v2.pdf | |
PWC | https://paperswithcode.com/paper/training-neural-networks-to-have-brain-like |
Repo | https://github.com/cfederer/TrainCNNsWithNeuralData |
Framework | tf |