February 1, 2020

Paper Group AWR 100

Self-supervised Spatio-temporal Representation Learning for Videos by Predicting Motion and Appearance Statistics. GPU-Accelerated Atari Emulation for Reinforcement Learning. Keyphrase Generation for Scientific Articles using GANs. Towards Open-Domain Named Entity Recognition via Neural Correction Models. Finding Generalizable Evidence by Learning …

Self-supervised Spatio-temporal Representation Learning for Videos by Predicting Motion and Appearance Statistics

Title Self-supervised Spatio-temporal Representation Learning for Videos by Predicting Motion and Appearance Statistics
Authors Jiangliu Wang, Jianbo Jiao, Linchao Bao, Shengfeng He, Yunhui Liu, Wei Liu
Abstract We address the problem of video representation learning without human-annotated labels. While previous efforts address the problem by designing novel self-supervised tasks using video data, the learned features are merely frame-by-frame, which makes them inapplicable to many video analytic tasks where spatio-temporal features prevail. In this paper we propose a novel self-supervised approach to learn spatio-temporal features for video representation. Inspired by the success of two-stream approaches in video classification, we propose to learn visual features by regressing both motion and appearance statistics along spatial and temporal dimensions, given only the input video data. Specifically, we extract statistical concepts (fast-motion region and the corresponding dominant direction, spatio-temporal color diversity, dominant color, etc.) from simple patterns in both spatial and temporal domains. Unlike prior puzzles that are hard even for humans to solve, the proposed approach is consistent with inherent human visual habits and therefore easy to answer. We conduct extensive experiments with C3D to validate the effectiveness of our proposed approach. The experiments show that our approach can significantly improve the performance of C3D when applied to video classification tasks. Code is available at https://github.com/laura-wang/video_repres_mas.
Tasks Action Recognition In Videos, Representation Learning, Video Classification
Published 2019-04-07
URL http://arxiv.org/abs/1904.03597v1
PDF http://arxiv.org/pdf/1904.03597v1.pdf
PWC https://paperswithcode.com/paper/self-supervised-spatio-temporal
Repo https://github.com/laura-wang/video_repres_mas
Framework tf

GPU-Accelerated Atari Emulation for Reinforcement Learning

Title GPU-Accelerated Atari Emulation for Reinforcement Learning
Authors Steven Dalton, Iuri Frosio, Michael Garland
Abstract We designed and implemented a CUDA port of the Atari Learning Environment (ALE), a system for developing and evaluating deep reinforcement learning algorithms using Atari games. Our CUDA Learning Environment (CuLE) overcomes many limitations of existing CPU-based Atari emulators and scales naturally to multi-GPU systems. It leverages the parallelization capability of GPUs to run thousands of Atari games simultaneously; by rendering frames directly on the GPU, CuLE avoids the bottleneck arising from the limited CPU-GPU communication bandwidth. As a result, CuLE is able to generate between 40M and 190M frames per hour using a single GPU, a throughput that could previously be achieved only through a cluster of CPUs. We demonstrate the advantages of CuLE by effectively training agents with traditional deep reinforcement learning algorithms and measuring the utilization and throughput of the GPU. Our analysis further highlights the differences in the data generation pattern for emulators running on CPUs or GPUs. CuLE is available at https://github.com/NVLabs/cule .
Tasks Atari Games
Published 2019-07-19
URL https://arxiv.org/abs/1907.08467v1
PDF https://arxiv.org/pdf/1907.08467v1.pdf
PWC https://paperswithcode.com/paper/gpu-accelerated-atari-emulation-for
Repo https://github.com/NVLABs/cule
Framework pytorch

Keyphrase Generation for Scientific Articles using GANs

Title Keyphrase Generation for Scientific Articles using GANs
Authors Avinash Swaminathan, Raj Kuwar Gupta, Haimin Zhang, Debanjan Mahata, Rakesh Gosangi, Rajiv Ratn Shah
Abstract In this paper, we present a keyphrase generation approach using conditional Generative Adversarial Networks (GAN). In our GAN model, the generator outputs a sequence of keyphrases based on the title and abstract of a scientific article. The discriminator learns to distinguish between machine-generated and human-curated keyphrases. We evaluate this approach on standard benchmark datasets. Our model achieves state-of-the-art performance in generation of abstractive keyphrases and is also comparable to the best performing extractive techniques. We also demonstrate that our method generates more diverse keyphrases and make our implementation publicly available.
Tasks
Published 2019-09-24
URL https://arxiv.org/abs/1909.12229v1
PDF https://arxiv.org/pdf/1909.12229v1.pdf
PWC https://paperswithcode.com/paper/keyphrase-generation-for-scientific-articles
Repo https://github.com/AilabUdineGit/keyphrase-gan-master
Framework pytorch

Towards Open-Domain Named Entity Recognition via Neural Correction Models

Title Towards Open-Domain Named Entity Recognition via Neural Correction Models
Authors Mengdi Zhu, Zheye Deng, Wenhan Xiong, Mo Yu, Ming Zhang, William Yang Wang
Abstract Named Entity Recognition (NER) plays an important role in a wide range of natural language processing tasks, such as relation extraction, question answering, etc. However, previous studies on NER are limited to a particular genre, using small manually-annotated or large but low-quality datasets. In this work, we propose a semi-supervised annotation framework to make full use of abstracts from Wikipedia and obtain a large and high-quality dataset called AnchorNER. We assume anchored strings in abstracts are named entities and annotate them with entity types mentioned in DBpedia. To improve the coverage, we design a neural correction model trained with a human-annotated NER dataset, DocRED, to correct the false-negative entity labels, and then train a BERT model with the corrected dataset. We evaluate our trained model on six NER datasets and our experimental results show that we have obtained state-of-the-art open-domain performances — on top of the strong baselines BERT-base and BERT-large, we achieve relative improvements of 4.66% and 3.07% respectively.
Tasks Named Entity Recognition, Question Answering, Relation Extraction
Published 2019-09-13
URL https://arxiv.org/abs/1909.06058v1
PDF https://arxiv.org/pdf/1909.06058v1.pdf
PWC https://paperswithcode.com/paper/towards-open-domain-named-entity-recognition
Repo https://github.com/zmd971202/OpenNER
Framework pytorch

Finding Generalizable Evidence by Learning to Convince Q&A Models

Title Finding Generalizable Evidence by Learning to Convince Q&A Models
Authors Ethan Perez, Siddharth Karamcheti, Rob Fergus, Jason Weston, Douwe Kiela, Kyunghyun Cho
Abstract We propose a system that finds the strongest supporting evidence for a given answer to a question, using passage-based question-answering (QA) as a testbed. We train evidence agents to select the passage sentences that most convince a pretrained QA model of a given answer, if the QA model received those sentences instead of the full passage. Rather than finding evidence that convinces one model alone, we find that agents select evidence that generalizes; agent-chosen evidence increases the plausibility of the supported answer, as judged by other QA models and humans. Given its general nature, this approach improves QA in a robust manner: using agent-selected evidence (i) humans can correctly answer questions with only ~20% of the full passage and (ii) QA models can generalize to longer passages and harder questions.
Tasks Question Answering
Published 2019-09-12
URL https://arxiv.org/abs/1909.05863v1
PDF https://arxiv.org/pdf/1909.05863v1.pdf
PWC https://paperswithcode.com/paper/finding-generalizable-evidence-by-learning-to
Repo https://github.com/ethanjperez/convince
Framework none
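
The selection step described in the abstract above (pick the passage sentence that most convinces a QA model of a given answer) can be illustrated with a minimal sketch. This is not the authors' method: the paper trains evidence agents, whereas the sketch simply searches over sentences, and `toy_qa_score` is a hypothetical word-overlap stand-in for a pretrained QA model. All names and the example passage are assumptions made for illustration.

```python
def toy_qa_score(sentence, answer):
    """Hypothetical stand-in for a QA model: fraction of answer words found in the sentence."""
    answer_words = set(answer.lower().split())
    sentence_words = set(sentence.lower().replace(".", "").split())
    return len(answer_words & sentence_words) / len(answer_words)


def select_evidence(sentences, answer, score_fn):
    """Return the single sentence that gives the answer the highest score."""
    return max(sentences, key=lambda s: score_fn(s, answer))


# Invented toy passage, used only to exercise the function.
passage = [
    "The museum opened in 1901.",
    "Its founder was Ada Reyes.",
    "Tickets are sold online.",
]
best = select_evidence(passage, "Ada Reyes", toy_qa_score)
print(best)
```

Passing the scorer in as `score_fn` keeps the search independent of any particular model, so a real QA model's answer probability could be substituted for the toy scorer.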

Sampling-free Epistemic Uncertainty Estimation Using Approximated Variance Propagation

Title Sampling-free Epistemic Uncertainty Estimation Using Approximated Variance Propagation
Authors Janis Postels, Francesco Ferroni, Huseyin Coskun, Nassir Navab, Federico Tombari
Abstract We present a sampling-free approach for computing the epistemic uncertainty of a neural network. Epistemic uncertainty is an important quantity for the deployment of deep neural networks in safety-critical applications, since it represents how much one can trust predictions on new data. Recently, promising works have proposed using noise injection combined with Monte-Carlo sampling at inference time to estimate this quantity (e.g., Monte-Carlo dropout). Our main contribution is an approximation of the epistemic uncertainty estimated by these methods that does not require sampling, thus notably reducing the computational overhead. We apply our approach to large-scale visual tasks (i.e., semantic segmentation and depth regression) to demonstrate the advantages of our method compared to sampling-based approaches in terms of both the quality of the uncertainty estimates and the computational overhead.
Tasks Semantic Segmentation
Published 2019-08-01
URL https://arxiv.org/abs/1908.00598v3
PDF https://arxiv.org/pdf/1908.00598v3.pdf
PWC https://paperswithcode.com/paper/sampling-free-epistemic-uncertainty
Repo https://github.com/janisgp/Sampling-free-Epistemic-Uncertainty
Framework tf
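
To make the contrast between sampling and sampling-free estimation concrete, here is a minimal sketch for the simplest possible case: inverted dropout followed by one linear layer. The abstract does not give the propagation rules, so this is not the paper's algorithm. It assumes independent dropout masks and tracks only per-unit variances, a setting in which the closed form is exact. The function names, weights, and inputs are all illustrative assumptions.

```python
import random


def dropout_linear_moments(x, W, b, drop_rate):
    """Closed-form mean and variance of y = W z + b, where z is inverted dropout applied to x."""
    keep = 1.0 - drop_rate
    # z_i = x_i * m_i / keep with m_i ~ Bernoulli(keep), so Var[z_i] = x_i^2 * drop_rate / keep.
    var_z = [xi * xi * drop_rate / keep for xi in x]
    mean_y = [sum(w * xi for w, xi in zip(row, x)) + bj for row, bj in zip(W, b)]
    # Independent masks: variances add, weighted by the squared weights.
    var_y = [sum(w * w * v for w, v in zip(row, var_z)) for row in W]
    return mean_y, var_y


def mc_dropout_variance(x, W, b, drop_rate, n_samples, seed=0):
    """Monte-Carlo estimate of the same output variance, for comparison."""
    rng = random.Random(seed)
    keep = 1.0 - drop_rate
    samples = []
    for _ in range(n_samples):
        z = [xi / keep if rng.random() < keep else 0.0 for xi in x]
        samples.append([sum(w * zi for w, zi in zip(row, z)) + bj for row, bj in zip(W, b)])
    columns = list(zip(*samples))
    means = [sum(col) / n_samples for col in columns]
    return [sum((v - m) ** 2 for v in col) / (n_samples - 1) for col, m in zip(columns, means)]


x = [1.0, -2.0, 0.5]
W = [[0.5, -1.0, 2.0], [1.5, 0.0, -0.5]]
b = [0.1, -0.2]
mean_y, var_y = dropout_linear_moments(x, W, b, drop_rate=0.2)
var_mc = mc_dropout_variance(x, W, b, drop_rate=0.2, n_samples=20000)
print(mean_y, var_y, var_mc)
```

The closed form needs one pass over the weights, while the Monte-Carlo estimate needs one forward pass per sample. Nonlinear layers would require approximations that this sketch does not attempt.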

LINSPECTOR WEB: A Multilingual Probing Suite for Word Representations

Title LINSPECTOR WEB: A Multilingual Probing Suite for Word Representations
Authors Max Eichler, Gözde Gül Şahin, Iryna Gurevych
Abstract We present LINSPECTOR WEB, an open source multilingual inspector to analyze word representations. Our system provides researchers working in low-resource settings with an easily accessible web based probing tool to gain quick insights into their word embeddings especially outside of the English language. To do this we employ 16 simple linguistic probing tasks such as gender, case marking, and tense for a diverse set of 28 languages. We support probing of static word embeddings along with pretrained AllenNLP models that are commonly used for NLP downstream tasks such as named entity recognition, natural language inference and dependency parsing. The results are visualized in a polar chart and also provided as a table. LINSPECTOR WEB is available as an offline tool or at https://linspector.ukp.informatik.tu-darmstadt.de.
Tasks Dependency Parsing, Named Entity Recognition, Natural Language Inference, Word Embeddings
Published 2019-07-26
URL https://arxiv.org/abs/1907.11438v3
PDF https://arxiv.org/pdf/1907.11438v3.pdf
PWC https://paperswithcode.com/paper/linspector-web-a-multilingual-probing-suite
Repo https://github.com/UKPLab/linspector-web
Framework none

Visualizing the decision-making process in deep neural decision forest

Title Visualizing the decision-making process in deep neural decision forest
Authors Shichao Li, Kwang-Ting Cheng
Abstract Deep neural decision forest (NDF) achieved remarkable performance on various vision tasks by combining decision trees and deep representation learning. In this work, we first trace the decision-making process of this model and visualize saliency maps to understand which portions of the input influence it most for both classification and regression problems. We then apply NDF to a multi-task coordinate regression problem and demonstrate the distribution of routing probabilities, which is vital for interpreting NDF yet has not been shown for regression problems. The pre-trained model and code for visualization will be available at https://github.com/Nicholasli1995/VisualizingNDF
Tasks Decision Making, Representation Learning
Published 2019-04-19
URL http://arxiv.org/abs/1904.09201v1
PDF http://arxiv.org/pdf/1904.09201v1.pdf
PWC https://paperswithcode.com/paper/visualizing-the-decision-making-process-in
Repo https://github.com/Nicholasli1995/VisualizingNDF
Framework pytorch

What Is Wrong With Scene Text Recognition Model Comparisons? Dataset and Model Analysis

Title What Is Wrong With Scene Text Recognition Model Comparisons? Dataset and Model Analysis
Authors Jeonghun Baek, Geewook Kim, Junyeop Lee, Sungrae Park, Dongyoon Han, Sangdoo Yun, Seong Joon Oh, Hwalsuk Lee
Abstract Many new proposals for scene text recognition (STR) models have been introduced in recent years. While each claims to have pushed the boundary of the technology, a holistic and fair comparison has been largely missing in the field due to the inconsistent choices of training and evaluation datasets. This paper addresses this difficulty with three major contributions. First, we examine the inconsistencies of training and evaluation datasets, and the performance gap that results from these inconsistencies. Second, we introduce a unified four-stage STR framework that most existing STR models fit into. Using this framework allows for the extensive evaluation of previously proposed STR modules and the discovery of previously unexplored module combinations. Third, we analyze the module-wise contributions to performance in terms of accuracy, speed, and memory demand, under one consistent set of training and evaluation datasets. Such analyses remove the obstacles that hinder current comparisons, making the performance gain of the existing modules easier to understand.
Tasks Scene Text Recognition
Published 2019-04-03
URL https://arxiv.org/abs/1904.01906v4
PDF https://arxiv.org/pdf/1904.01906v4.pdf
PWC https://paperswithcode.com/paper/what-is-wrong-with-scene-text-recognition
Repo https://github.com/clovaai/deep-text-recognition-benchmark
Framework pytorch

Team Papelo: Transformer Networks at FEVER

Title Team Papelo: Transformer Networks at FEVER
Authors Christopher Malon
Abstract We develop a system for the FEVER fact extraction and verification challenge that uses a high precision entailment classifier based on transformer networks pretrained with language modeling, to classify a broad set of potential evidence. The precision of the entailment classifier allows us to enhance recall by considering every statement from several articles to decide upon each claim. We include not only the articles best matching the claim text by TFIDF score, but read additional articles whose titles match named entities and capitalized expressions occurring in the claim text. The entailment module evaluates potential evidence one statement at a time, together with the title of the page the evidence came from (providing a hint about possible pronoun antecedents). In preliminary evaluation, the system achieves .5736 FEVER score, .6108 label accuracy, and .6485 evidence F1 on the FEVER shared task test set.
Tasks Language Modelling
Published 2019-01-08
URL http://arxiv.org/abs/1901.02534v1
PDF http://arxiv.org/pdf/1901.02534v1.pdf
PWC https://paperswithcode.com/paper/team-papelo-transformer-networks-at-fever
Repo https://github.com/cdmalon/finetune-transformer-lm
Framework tf

Mapping State Space using Landmarks for Universal Goal Reaching

Title Mapping State Space using Landmarks for Universal Goal Reaching
Authors Zhiao Huang, Fangchen Liu, Hao Su
Abstract An agent that has understood the environment well should be able to apply its skills to any given goal, leading to the fundamental problem of learning the Universal Value Function Approximator (UVFA). A UVFA learns to predict the cumulative rewards between all state-goal pairs. However, empirically, the value function for long-range goals is always hard to estimate and may consequently result in a failed policy. This has presented challenges to the learning process and the capability of neural networks. We propose a method to address this issue in large MDPs with sparse rewards, in which exploration and routing across remote states are both extremely challenging. Our method explicitly models the environment in a hierarchical manner, with a high-level dynamic landmark-based map abstracting the visited state space, and a low-level value network to derive precise local decisions. We use farthest point sampling to select landmark states from past experience, which improves exploration compared with simple uniform sampling. Experimentally, we show that our method enables the agent to reach long-range goals at the early training stage and achieves better performance than standard RL algorithms on a number of challenging tasks.
Tasks
Published 2019-08-15
URL https://arxiv.org/abs/1908.05451v1
PDF https://arxiv.org/pdf/1908.05451v1.pdf
PWC https://paperswithcode.com/paper/mapping-state-space-using-landmarks-for
Repo https://github.com/FangchenLiu/map_planner
Framework pytorch
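
The farthest point sampling step named in the abstract above can be sketched as a greedy loop. This is a generic version of the technique, not code from the authors' repository. It assumes states are plain coordinate tuples compared with Euclidean distance, and the function name and example states are illustrative.

```python
import math


def farthest_point_sampling(points, k, start=0):
    """Greedily pick k indices so each new point is the one farthest from those already chosen."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    chosen = [start]
    # min_dist[i] holds the distance from points[i] to its nearest chosen landmark.
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(chosen) < k:
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        chosen.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return chosen


# Two tight clusters plus one isolated state (invented data).
states = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (0.0, 5.0)]
landmarks = farthest_point_sampling(states, k=3)
print(landmarks)  # [0, 4, 5]
```

With three landmarks the selection covers both clusters and the isolated state, whereas uniform sampling could easily draw all three from the same cluster.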

MintNet: Building Invertible Neural Networks with Masked Convolutions

Title MintNet: Building Invertible Neural Networks with Masked Convolutions
Authors Yang Song, Chenlin Meng, Stefano Ermon
Abstract We propose a new way of constructing invertible neural networks by combining simple building blocks with a novel set of composition rules. This leads to a rich set of invertible architectures, including those similar to ResNets. Inversion is achieved with a locally convergent iterative procedure that is parallelizable and very fast in practice. Additionally, the determinant of the Jacobian can be computed analytically and efficiently, enabling their generative use as flow models. To demonstrate their flexibility, we show that our invertible neural networks are competitive with ResNets on MNIST and CIFAR-10 classification. When trained as generative models, our invertible networks achieve competitive likelihoods on MNIST, CIFAR-10 and ImageNet 32x32, with bits per dimension of 0.98, 3.32 and 4.06 respectively.
Tasks Image Generation
Published 2019-07-18
URL https://arxiv.org/abs/1907.07945v2
PDF https://arxiv.org/pdf/1907.07945v2.pdf
PWC https://paperswithcode.com/paper/mintnet-building-invertible-neural-networks
Repo https://github.com/ermongroup/mintnet
Framework pytorch
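
One way to see why a Jacobian determinant can be computed analytically and cheaply is the triangular case, where the determinant is the product of the diagonal entries. The sketch below assumes a plain lower-triangular linear map as a toy stand-in for a masked layer; the abstract does not spell out this structure. It also inverts by forward substitution rather than the iterative procedure the abstract mentions. All names and values are illustrative.

```python
import math


def apply_lower_triangular(L, x):
    """Compute y = L x for a lower-triangular matrix L."""
    return [sum(L[i][j] * x[j] for j in range(i + 1)) for i in range(len(x))]


def log_abs_det_triangular(L):
    """log|det L| of a triangular matrix is the sum of log|diagonal entries|."""
    return sum(math.log(abs(L[i][i])) for i in range(len(L)))


def invert_lower_triangular(L, y):
    """Recover x from y = L x by forward substitution (requires a nonzero diagonal)."""
    x = []
    for i in range(len(y)):
        acc = sum(L[i][j] * x[j] for j in range(i))
        x.append((y[i] - acc) / L[i][i])
    return x


L = [[2.0, 0.0, 0.0], [0.5, 1.0, 0.0], [-1.0, 3.0, 4.0]]
x = [1.0, -2.0, 0.5]
y = apply_lower_triangular(L, x)
x_rec = invert_lower_triangular(L, y)
log_det = log_abs_det_triangular(L)
print(y, x_rec, log_det)
```

In a flow model the log-determinant term enters the likelihood through the change-of-variables formula, so a diagonal-only computation avoids the cubic cost of a general determinant.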

Learnable Embedding Space for Efficient Neural Architecture Compression

Title Learnable Embedding Space for Efficient Neural Architecture Compression
Authors Shengcao Cao, Xiaofang Wang, Kris M. Kitani
Abstract We propose a method to incrementally learn an embedding space over the domain of network architectures, to enable the careful selection of architectures for evaluation during compressed architecture search. Given a teacher network, we search for a compressed network architecture by using Bayesian Optimization (BO) with a kernel function defined over our proposed embedding space to select architectures for evaluation. We demonstrate that our search algorithm can significantly outperform various baseline methods, such as random search and reinforcement learning (Ashok et al., 2018). The compressed architectures found by our method are also better than the state-of-the-art manually-designed compact architecture ShuffleNet (Zhang et al., 2018). We also demonstrate that the learned embedding space can be transferred to new settings for architecture search, such as a larger teacher network or a teacher network in a different architecture family, without any training. Code is publicly available here: https://github.com/Friedrich1006/ESNAC .
Tasks Neural Architecture Search
Published 2019-02-01
URL http://arxiv.org/abs/1902.00383v2
PDF http://arxiv.org/pdf/1902.00383v2.pdf
PWC https://paperswithcode.com/paper/learnable-embedding-space-for-efficient
Repo https://github.com/Friedrich1006/ESNAC
Framework pytorch

Inverse Reinforcement Learning in Contextual MDPs

Title Inverse Reinforcement Learning in Contextual MDPs
Authors Philip Korsunsky, Stav Belogolovsky, Tom Zahavy, Chen Tessler, Shie Mannor
Abstract We consider the Inverse Reinforcement Learning problem in Contextual Markov Decision Processes. In this setting, the reward, which is unknown to the agent, is a function of a static parameter referred to as the context. There is also an “expert” who knows this mapping and acts according to the optimal policy for each context. The goal of the agent is to learn the expert’s mapping by observing demonstrations. We define an optimization problem for finding this mapping and show that when it is linear, the problem is convex. We present and analyze the sample complexity of three algorithms for solving this problem: the mirrored descent algorithm, evolution strategies, and the ellipsoid method. We also extend the first two methods to work with general reward functions, e.g., deep neural networks, but without the theoretical guarantees. Finally, we compare the different techniques empirically in driving simulation and a medical treatment regime.
Tasks Autonomous Driving
Published 2019-05-23
URL https://arxiv.org/abs/1905.09710v3
PDF https://arxiv.org/pdf/1905.09710v3.pdf
PWC https://paperswithcode.com/paper/inverse-reinforcement-learning-in-contextual
Repo https://github.com/CIRLMDP/CIRL
Framework none

Class-Based Styling: Real-time Localized Style Transfer with Semantic Segmentation

Title Class-Based Styling: Real-time Localized Style Transfer with Semantic Segmentation
Authors Lironne Kurzman, David Vazquez, Issam Laradji
Abstract We propose a Class-Based Styling method (CBS) that can map different styles for different object classes in real-time. CBS achieves real-time performance by carrying out two steps simultaneously. While a semantic segmentation method is used to obtain the mask of each object class in a video frame, a styling method is used to style that frame globally. Then an object class can be styled by combining the segmentation mask and the styled image. The user can also select multiple styles so that different object classes can have different styles in a single frame. For semantic segmentation, we leverage DABNet that achieves high accuracy, yet only has 0.76 million parameters and runs at 104 FPS. For the style transfer step, we use a popular real-time method proposed by Johnson et al. [7]. We evaluated CBS on a video of the CityScapes dataset and observed high-quality localized style transfer results for different object classes and real-time performance.
Tasks Semantic Segmentation, Style Transfer
Published 2019-08-30
URL https://arxiv.org/abs/1908.11525v1
PDF https://arxiv.org/pdf/1908.11525v1.pdf
PWC https://paperswithcode.com/paper/class-based-styling-real-time-localized-style
Repo https://github.com/IssamLaradji/CBStyling
Framework pytorch
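
The combination step described in the abstract above (a segmentation mask selects which pixels take the styled image) can be sketched on toy data. This is not the authors' implementation. It assumes images are nested lists of grayscale values and that the mask holds one class label per pixel; the function name, labels, and pixel values are illustrative.

```python
def composite_by_class(frame, styled_by_class, seg_mask):
    """Per pixel: take the styled image chosen for that pixel's class, else keep the original frame."""
    out = []
    for y, row in enumerate(frame):
        out_row = []
        for x, pixel in enumerate(row):
            styled = styled_by_class.get(seg_mask[y][x])
            out_row.append(styled[y][x] if styled is not None else pixel)
        out.append(out_row)
    return out


frame = [[10, 20], [30, 40]]
styled_a = [[11, 21], [31, 41]]  # whole frame rendered in hypothetical style A
styled_b = [[12, 22], [32, 42]]  # whole frame rendered in hypothetical style B
seg_mask = [["car", "road"], ["sky", "car"]]

result = composite_by_class(frame, {"car": styled_a, "road": styled_b}, seg_mask)
print(result)  # [[11, 22], [30, 41]]
```

Because each style is applied to the whole frame and the mask is computed separately, the two steps do not depend on each other, which is what lets them run at the same time as the abstract describes.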