Paper Group AWR 100
Self-supervised Spatio-temporal Representation Learning for Videos by Predicting Motion and Appearance Statistics. GPU-Accelerated Atari Emulation for Reinforcement Learning. Keyphrase Generation for Scientific Articles using GANs. Towards Open-Domain Named Entity Recognition via Neural Correction Models. Finding Generalizable Evidence by Learning to Convince Q&A Models. Sampling-free Epistemic Uncertainty Estimation Using Approximated Variance Propagation. LINSPECTOR WEB: A Multilingual Probing Suite for Word Representations. Visualizing the decision-making process in deep neural decision forest. What Is Wrong With Scene Text Recognition Model Comparisons? Dataset and Model Analysis. Team Papelo: Transformer Networks at FEVER. Mapping State Space using Landmarks for Universal Goal Reaching. MintNet: Building Invertible Neural Networks with Masked Convolutions. Learnable Embedding Space for Efficient Neural Architecture Compression. Inverse Reinforcement Learning in Contextual MDPs. Class-Based Styling: Real-time Localized Style Transfer with Semantic Segmentation.
Self-supervised Spatio-temporal Representation Learning for Videos by Predicting Motion and Appearance Statistics
Title | Self-supervised Spatio-temporal Representation Learning for Videos by Predicting Motion and Appearance Statistics |
Authors | Jiangliu Wang, Jianbo Jiao, Linchao Bao, Shengfeng He, Yunhui Liu, Wei Liu |
Abstract | We address the problem of video representation learning without human-annotated labels. While previous efforts address the problem by designing novel self-supervised tasks using video data, the learned features are merely on a frame-by-frame basis, which are not applicable to many video analytics tasks where spatio-temporal features prevail. In this paper, we propose a novel self-supervised approach to learn spatio-temporal features for video representation. Inspired by the success of two-stream approaches in video classification, we propose to learn visual features by regressing both motion and appearance statistics along spatial and temporal dimensions, given only the input video data. Specifically, we extract statistical concepts (fast-motion region and the corresponding dominant direction, spatio-temporal color diversity, dominant color, etc.) from simple patterns in both spatial and temporal domains. Unlike prior puzzle-like tasks that are hard even for humans to solve, the proposed approach is consistent with inherent human visual habits and therefore easy to answer. We conduct extensive experiments with C3D to validate the effectiveness of our proposed approach. The experiments show that our approach can significantly improve the performance of C3D when applied to video classification tasks. Code is available at https://github.com/laura-wang/video_repres_mas. |
Tasks | Action Recognition In Videos, Representation Learning, Video Classification |
Published | 2019-04-07 |
URL | http://arxiv.org/abs/1904.03597v1, http://arxiv.org/pdf/1904.03597v1.pdf |
PWC | https://paperswithcode.com/paper/self-supervised-spatio-temporal |
Repo | https://github.com/laura-wang/video_repres_mas |
Framework | tf |
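The statistics-regression pretext task above is easy to ground in code. Below is a minimal sketch of how a coarse "fastest-motion block" label could be derived from a clip; the paper computes its motion statistics from optical flow, so the frame differencing, grid size, and clip shape here are illustrative assumptions, not the authors' implementation.

```python
# Sketch: derive a coarse "fastest-motion block" label from a clip via frame
# differencing. The paper computes its statistics from optical flow; plain
# frame differences are used here only to keep the example dependency-free.
import numpy as np

def fastest_motion_block(clip: np.ndarray, grid: int = 4) -> int:
    """clip: (T, H, W) grayscale video with values in [0, 1].
    Returns the row-major index of the grid block with the largest
    accumulated temporal change, a stand-in for the paper's motion label."""
    diffs = np.abs(np.diff(clip, axis=0)).sum(axis=0)   # (H, W) motion energy
    h, w = diffs.shape
    bh, bw = h // grid, w // grid
    energy = [
        diffs[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw].mean()
        for i in range(grid) for j in range(grid)
    ]
    return int(np.argmax(energy))

clip = np.random.rand(16, 112, 112)           # toy 16-frame clip
label = fastest_motion_block(clip)            # self-supervised target in [0, 15]
print(f"fastest-motion block: {label}")
```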
GPU-Accelerated Atari Emulation for Reinforcement Learning
Title | GPU-Accelerated Atari Emulation for Reinforcement Learning |
Authors | Steven Dalton, Iuri Frosio, Michael Garland |
Abstract | We designed and implemented a CUDA port of the Atari Learning Environment (ALE), a system for developing and evaluating deep reinforcement learning algorithms using Atari games. Our CUDA Learning Environment (CuLE) overcomes many limitations of existing CPU-based Atari emulators and scales naturally to multi-GPU systems. It leverages the parallelization capability of GPUs to run thousands of Atari games simultaneously; by rendering frames directly on the GPU, CuLE avoids the bottleneck arising from the limited CPU-GPU communication bandwidth. As a result, CuLE is able to generate between 40M and 190M frames per hour using a single GPU, a throughput that previously could be achieved only with a cluster of CPUs. We demonstrate the advantages of CuLE by effectively training agents with traditional deep reinforcement learning algorithms and measuring the utilization and throughput of the GPU. Our analysis further highlights the differences in the data-generation pattern for emulators running on CPUs or GPUs. CuLE is available at https://github.com/NVLabs/cule . |
Tasks | Atari Games |
Published | 2019-07-19 |
URL | https://arxiv.org/abs/1907.08467v1, https://arxiv.org/pdf/1907.08467v1.pdf |
PWC | https://paperswithcode.com/paper/gpu-accelerated-atari-emulation-for |
Repo | https://github.com/NVLabs/cule
Framework | pytorch |
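To see why batched GPU emulation pays off, the toy sketch below steps thousands of environment instances with single tensor operations. It deliberately does not use CuLE's actual API (see the repo above for that); the random-walk dynamics are a made-up stand-in that only illustrates the data-generation pattern.

```python
# Illustration of the batched-emulation pattern CuLE exploits: state for
# thousands of environment instances lives in one GPU tensor, and a single
# vectorized op steps all of them at once, removing the per-env CPU loop and
# the CPU-GPU transfer bottleneck. This is NOT CuLE's API.
import torch

num_envs = 4096
device = "cuda" if torch.cuda.is_available() else "cpu"
state = torch.zeros(num_envs, device=device)

def step(state, actions):
    """Step every environment with one tensor op; toy reward near the origin."""
    state = state + torch.where(actions == 1, 0.01, -0.01)
    reward = (state.abs() < 0.1).float()
    return state, reward

for _ in range(100):
    actions = torch.randint(0, 2, (num_envs,), device=device)
    state, reward = step(state, actions)
print(f"mean reward over {num_envs} envs: {reward.mean().item():.3f}")
```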
Keyphrase Generation for Scientific Articles using GANs
Title | Keyphrase Generation for Scientific Articles using GANs |
Authors | Avinash Swaminathan, Raj Kuwar Gupta, Haimin Zhang, Debanjan Mahata, Rakesh Gosangi, Rajiv Ratn Shah |
Abstract | In this paper, we present a keyphrase generation approach using conditional Generative Adversarial Networks (GANs). In our GAN model, the generator outputs a sequence of keyphrases based on the title and abstract of a scientific article. The discriminator learns to distinguish between machine-generated and human-curated keyphrases. We evaluate this approach on standard benchmark datasets. Our model achieves state-of-the-art performance in the generation of abstractive keyphrases and is also comparable to the best-performing extractive techniques. We also demonstrate that our method generates more diverse keyphrases, and we make our implementation publicly available. |
Tasks | |
Published | 2019-09-24 |
URL | https://arxiv.org/abs/1909.12229v1, https://arxiv.org/pdf/1909.12229v1.pdf |
PWC | https://paperswithcode.com/paper/keyphrase-generation-for-scientific-articles |
Repo | https://github.com/AilabUdineGit/keyphrase-gan-master |
Framework | pytorch |
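A minimal sketch of the discriminator side of the setup described above: a small GRU scores keyphrase token sequences as human-curated versus machine-generated. The vocabulary size, dimensions, and random stand-in batches are assumptions; the paper's generator is a conditional seq2seq model, which is omitted here.

```python
# Sketch of the adversarial objective: a sequence discriminator trained with
# BCE to separate curated from generated keyphrases. Shapes are made up.
import torch
import torch.nn as nn

VOCAB, EMB, HID = 5000, 64, 128

class KeyphraseDiscriminator(nn.Module):
    """Scores a padded keyphrase token sequence as curated vs generated."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, EMB)
        self.gru = nn.GRU(EMB, HID, batch_first=True)
        self.out = nn.Linear(HID, 1)

    def forward(self, tokens):                      # tokens: (batch, seq_len)
        _, h = self.gru(self.emb(tokens))           # h: (1, batch, HID)
        return self.out(h.squeeze(0)).squeeze(-1)   # logits: (batch,)

disc = KeyphraseDiscriminator()
bce = nn.BCEWithLogitsLoss()
real = torch.randint(0, VOCAB, (32, 8))   # stand-in for curated keyphrases
fake = torch.randint(0, VOCAB, (32, 8))   # stand-in for generator samples
loss = bce(disc(real), torch.ones(32)) + bce(disc(fake), torch.zeros(32))
loss.backward()   # the generator would be trained against D's score as reward
```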
Towards Open-Domain Named Entity Recognition via Neural Correction Models
Title | Towards Open-Domain Named Entity Recognition via Neural Correction Models |
Authors | Mengdi Zhu, Zheye Deng, Wenhan Xiong, Mo Yu, Ming Zhang, William Yang Wang |
Abstract | Named Entity Recognition (NER) plays an important role in a wide range of natural language processing tasks, such as relation extraction, question answering, etc. However, previous studies on NER are limited to a particular genre, using small manually-annotated or large but low-quality datasets. In this work, we propose a semi-supervised annotation framework to make full use of abstracts from Wikipedia and obtain a large and high-quality dataset called AnchorNER. We assume anchored strings in abstracts are named entities and annotate them with entity types mentioned in DBpedia. To improve the coverage, we design a neural correction model trained with a human-annotated NER dataset, DocRED, to correct the false-negative entity labels, and then train a BERT model with the corrected dataset. We evaluate our trained model on six NER datasets, and our experimental results show that we obtain state-of-the-art open-domain performance: on top of the strong baselines BERT-base and BERT-large, we achieve relative improvements of 4.66% and 3.07% respectively. |
Tasks | Named Entity Recognition, Question Answering, Relation Extraction |
Published | 2019-09-13 |
URL | https://arxiv.org/abs/1909.06058v1, https://arxiv.org/pdf/1909.06058v1.pdf |
PWC | https://paperswithcode.com/paper/towards-open-domain-named-entity-recognition |
Repo | https://github.com/zmd971202/OpenNER |
Framework | pytorch |
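The anchor-based weak labeling step lends itself to a short sketch: anchored spans become BIO-tagged entities carrying their DBpedia types. The span format and the example types below are hypothetical, and the correction model that fixes false negatives is not shown.

```python
# Sketch of anchor-based weak labeling: treat hyperlinked (anchored) strings
# in a Wikipedia abstract as entities and tag them with DBpedia types.
def bio_tags(tokens, anchors):
    """tokens: list of words; anchors: list of (start_idx, end_idx, dbpedia_type),
    with end_idx exclusive. Returns one BIO tag per token."""
    tags = ["O"] * len(tokens)
    for start, end, etype in anchors:
        tags[start] = f"B-{etype}"
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"
    return tags

tokens = "Barack Obama was born in Hawaii .".split()
anchors = [(0, 2, "PER"), (5, 6, "LOC")]   # hypothetical anchored spans
print(list(zip(tokens, bio_tags(tokens, anchors))))
```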
Finding Generalizable Evidence by Learning to Convince Q&A Models
Title | Finding Generalizable Evidence by Learning to Convince Q&A Models |
Authors | Ethan Perez, Siddharth Karamcheti, Rob Fergus, Jason Weston, Douwe Kiela, Kyunghyun Cho |
Abstract | We propose a system that finds the strongest supporting evidence for a given answer to a question, using passage-based question-answering (QA) as a testbed. We train evidence agents to select the passage sentences that most convince a pretrained QA model of a given answer, if the QA model received those sentences instead of the full passage. Rather than finding evidence that convinces one model alone, we find that agents select evidence that generalizes; agent-chosen evidence increases the plausibility of the supported answer, as judged by other QA models and humans. Given its general nature, this approach improves QA in a robust manner: using agent-selected evidence (i) humans can correctly answer questions with only ~20% of the full passage and (ii) QA models can generalize to longer passages and harder questions. |
Tasks | Question Answering |
Published | 2019-09-12 |
URL | https://arxiv.org/abs/1909.05863v1, https://arxiv.org/pdf/1909.05863v1.pdf |
PWC | https://paperswithcode.com/paper/finding-generalizable-evidence-by-learning-to |
Repo | https://github.com/ethanjperez/convince |
Framework | none |
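One way to picture the evidence agents is as a greedy search over passage sentences, scored by how much they raise a QA model's probability of the target answer. The sketch below assumes a generic `answer_prob` scoring callable (the toy scorer is made up); the paper also trains learned agents rather than only searching.

```python
# Sketch of an evidence agent as greedy search: repeatedly add the passage
# sentence that most raises a QA model's probability of the target answer.
def select_evidence(sentences, answer_prob, k=3):
    """Greedily pick k sentences maximizing answer_prob(selected_sentences)."""
    chosen = []
    remaining = list(sentences)
    for _ in range(k):
        best = max(remaining, key=lambda s: answer_prob(chosen + [s]))
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Toy stand-in scorer: prefers sentences mentioning the answer string.
answer = "1969"
prob = lambda sents: sum(answer in s for s in sents) / (len(sents) + 1)
passage = ["Apollo 11 launched in 1969.", "The crew had three members.",
           "Armstrong walked first.", "It returned safely in 1969."]
print(select_evidence(passage, prob, k=2))
```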
Sampling-free Epistemic Uncertainty Estimation Using Approximated Variance Propagation
Title | Sampling-free Epistemic Uncertainty Estimation Using Approximated Variance Propagation |
Authors | Janis Postels, Francesco Ferroni, Huseyin Coskun, Nassir Navab, Federico Tombari |
Abstract | We present a sampling-free approach for computing the epistemic uncertainty of a neural network. Epistemic uncertainty is an important quantity for the deployment of deep neural networks in safety-critical applications, since it represents how much one can trust predictions on new data. Recently, promising works have proposed using noise injection combined with Monte Carlo sampling at inference time to estimate this quantity (e.g., Monte Carlo dropout). Our main contribution is an approximation of the epistemic uncertainty estimated by these methods that does not require sampling, thus notably reducing the computational overhead. We apply our approach to large-scale visual tasks (i.e., semantic segmentation and depth regression) to demonstrate the advantages of our method compared to sampling-based approaches, in terms of both the quality of the uncertainty estimates and the computational overhead. |
Tasks | Semantic Segmentation |
Published | 2019-08-01 |
URL | https://arxiv.org/abs/1908.00598v3, https://arxiv.org/pdf/1908.00598v3.pdf |
PWC | https://paperswithcode.com/paper/sampling-free-epistemic-uncertainty |
Repo | https://github.com/janisgp/Sampling-free-Epistemic-Uncertainty |
Framework | tf |
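The core idea above can be sketched in a few lines: treat dropout as injected noise with known variance, then push mean and variance analytically through each layer instead of sampling. The layer sizes, the independence assumption between units, and the first-order ReLU rule below are simplifications of the paper's propagation scheme.

```python
# Sketch of sampling-free variance propagation: dropout injects noise with
# variance x^2 * p/(1-p) (inverted dropout), and mean/variance are pushed
# analytically through linear and ReLU layers under an independence
# assumption. A first-order caricature of the method, with made-up sizes.
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(64, 32)) * 0.1
W2 = rng.normal(size=(10, 64)) * 0.1
p = 0.5                                        # dropout rate

x = rng.normal(size=32)
mean, var = x, x**2 * p / (1 - p)              # noise injected by dropout

# Linear layer: mean -> W @ mean, var -> W^2 @ var (independent units).
mean, var = W1 @ mean, (W1**2) @ var
# ReLU, first order: local gradient is 1 where the mean is positive, else 0.
keep = (mean > 0).astype(float)
mean, var = mean * keep, var * keep
mean, var = W2 @ mean, (W2**2) @ var

print("predictive std per output:", np.sqrt(var).round(3))
```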
LINSPECTOR WEB: A Multilingual Probing Suite for Word Representations
Title | LINSPECTOR WEB: A Multilingual Probing Suite for Word Representations |
Authors | Max Eichler, Gözde Gül Şahin, Iryna Gurevych |
Abstract | We present LINSPECTOR WEB, an open-source multilingual inspector to analyze word representations. Our system provides researchers working in low-resource settings with an easily accessible web-based probing tool to gain quick insights into their word embeddings, especially outside of the English language. To do this, we employ 16 simple linguistic probing tasks such as gender, case marking, and tense for a diverse set of 28 languages. We support probing of static word embeddings along with pretrained AllenNLP models that are commonly used for NLP downstream tasks such as named entity recognition, natural language inference and dependency parsing. The results are visualized in a polar chart and also provided as a table. LINSPECTOR WEB is available as an offline tool or at https://linspector.ukp.informatik.tu-darmstadt.de. |
Tasks | Dependency Parsing, Named Entity Recognition, Natural Language Inference, Word Embeddings |
Published | 2019-07-26 |
URL | https://arxiv.org/abs/1907.11438v3, https://arxiv.org/pdf/1907.11438v3.pdf |
PWC | https://paperswithcode.com/paper/linspector-web-a-multilingual-probing-suite |
Repo | https://github.com/UKPLab/linspector-web |
Framework | none |
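A probing task of the kind LINSPECTOR runs reduces to fitting a simple classifier from embeddings to a linguistic property and reading off its accuracy. The sketch below uses synthetic embeddings and a fake binary "tense" label purely to show the shape of the procedure.

```python
# Sketch of a single probing task: fit a linear classifier that predicts a
# linguistic property from static word embeddings; probe accuracy indicates
# whether the property is linearly recoverable. All data here is synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
emb = rng.normal(size=(1000, 50))              # 1000 words, 50-dim embeddings
labels = (emb[:, 0] + 0.1 * rng.normal(size=1000) > 0).astype(int)  # "tense"

X_tr, X_te, y_tr, y_te = train_test_split(emb, labels, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(f"probing accuracy: {probe.score(X_te, y_te):.2f}")
```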
Visualizing the decision-making process in deep neural decision forest
Title | Visualizing the decision-making process in deep neural decision forest |
Authors | Shichao Li, Kwang-Ting Cheng |
Abstract | Deep neural decision forest (NDF) achieved remarkable performance on various vision tasks by combining decision trees and deep representation learning. In this work, we first trace the decision-making process of this model and visualize saliency maps to understand which portions of the input influence it most, for both classification and regression problems. We then apply NDF to a multi-task coordinate regression problem and demonstrate the distribution of routing probabilities, which is vital for interpreting NDF yet has not been shown for regression problems. The pre-trained model and code for visualization will be available at https://github.com/Nicholasli1995/VisualizingNDF |
Tasks | Decision Making, Representation Learning |
Published | 2019-04-19 |
URL | http://arxiv.org/abs/1904.09201v1, http://arxiv.org/pdf/1904.09201v1.pdf |
PWC | https://paperswithcode.com/paper/visualizing-the-decision-making-process-in |
Repo | https://github.com/Nicholasli1995/VisualizingNDF |
Framework | pytorch |
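The saliency-map tracing described above is, at its simplest, the absolute gradient of the predicted score with respect to the input. The sketch below uses a tiny stand-in linear model rather than an actual NDF, which is the only assumption here.

```python
# Sketch of gradient-based saliency: the absolute input gradient of the
# predicted score highlights influential pixels. Any differentiable
# classifier works; a tiny stand-in model replaces the NDF.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))  # stand-in model
x = torch.rand(1, 1, 28, 28, requires_grad=True)

score = model(x).max()                         # score of the predicted class
score.backward()
saliency = x.grad.abs().squeeze()              # (28, 28) importance map
print("most influential pixel:", divmod(saliency.argmax().item(), 28))
```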
What Is Wrong With Scene Text Recognition Model Comparisons? Dataset and Model Analysis
Title | What Is Wrong With Scene Text Recognition Model Comparisons? Dataset and Model Analysis |
Authors | Jeonghun Baek, Geewook Kim, Junyeop Lee, Sungrae Park, Dongyoon Han, Sangdoo Yun, Seong Joon Oh, Hwalsuk Lee |
Abstract | Many new proposals for scene text recognition (STR) models have been introduced in recent years. While each claims to have pushed the boundary of the technology, a holistic and fair comparison has been largely missing in the field due to inconsistent choices of training and evaluation datasets. This paper addresses this difficulty with three major contributions. First, we examine the inconsistencies of training and evaluation datasets, and the performance gaps that result from these inconsistencies. Second, we introduce a unified four-stage STR framework that most existing STR models fit into. Using this framework allows for the extensive evaluation of previously proposed STR modules and the discovery of previously unexplored module combinations. Third, we analyze the module-wise contributions to performance in terms of accuracy, speed, and memory demand, under one consistent set of training and evaluation datasets. Such analyses remove the obstacles that current comparisons face in understanding the performance gains of existing modules. |
Tasks | Scene Text Recognition |
Published | 2019-04-03 |
URL | https://arxiv.org/abs/1904.01906v4, https://arxiv.org/pdf/1904.01906v4.pdf |
PWC | https://paperswithcode.com/paper/what-is-wrong-with-scene-text-recognition |
Repo | https://github.com/clovaai/deep-text-recognition-benchmark |
Framework | pytorch |
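The unified four-stage framework (transformation, feature extraction, sequence modeling, prediction) maps naturally onto a module skeleton. The sketch below uses placeholder stages with assumed shapes; the benchmark repo plugs in TPS, VGG/ResNet, BiLSTM, and CTC/attention variants at each slot.

```python
# Skeleton of the four-stage STR framework: Trans -> Feat -> Seq -> Pred.
# Each stage is a placeholder here, not the benchmark's actual modules.
import torch
import torch.nn as nn

class STRModel(nn.Module):
    def __init__(self, num_classes=37):
        super().__init__()
        self.trans = nn.Identity()                       # e.g. TPS rectification
        self.feat = nn.Sequential(                       # e.g. VGG/ResNet backbone
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, 25)))               # -> (B, 64, 1, 25)
        self.seq = nn.LSTM(64, 128, bidirectional=True,  # e.g. BiLSTM context
                           batch_first=True)
        self.pred = nn.Linear(256, num_classes)          # e.g. CTC/attention head

    def forward(self, images):                           # (B, 1, 32, 100)
        x = self.feat(self.trans(images)).squeeze(2)     # (B, 64, 25)
        x, _ = self.seq(x.transpose(1, 2))               # (B, 25, 256)
        return self.pred(x)                              # per-step class logits

logits = STRModel()(torch.rand(2, 1, 32, 100))
print(logits.shape)                                      # torch.Size([2, 25, 37])
```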
Team Papelo: Transformer Networks at FEVER
Title | Team Papelo: Transformer Networks at FEVER |
Authors | Christopher Malon |
Abstract | We develop a system for the FEVER fact extraction and verification challenge that uses a high-precision entailment classifier, based on transformer networks pretrained with language modeling, to classify a broad set of potential evidence. The precision of the entailment classifier allows us to enhance recall by considering every statement from several articles when deciding upon each claim. We include not only the articles best matching the claim text by TF-IDF score, but also read additional articles whose titles match named entities and capitalized expressions occurring in the claim text. The entailment module evaluates potential evidence one statement at a time, together with the title of the page the evidence came from (providing a hint about possible pronoun antecedents). In preliminary evaluation, the system achieves a 0.5736 FEVER score, 0.6108 label accuracy, and 0.6485 evidence F1 on the FEVER shared task test set. |
Tasks | Language Modelling |
Published | 2019-01-08 |
URL | http://arxiv.org/abs/1901.02534v1, http://arxiv.org/pdf/1901.02534v1.pdf |
PWC | https://paperswithcode.com/paper/team-papelo-transformer-networks-at-fever |
Repo | https://github.com/cdmalon/finetune-transformer-lm |
Framework | tf |
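The retrieval heuristic mentioned above, reading extra articles whose titles match capitalized expressions in the claim, can be sketched with a simple regex. Both the regex and the toy title list are simplifications, not the system's actual code.

```python
# Sketch of the title-matching retrieval heuristic: find capitalized
# expressions in the claim and fetch articles whose titles match them.
import re

def capitalized_expressions(claim: str):
    """Find runs of capitalized words, a cheap proxy for named entities."""
    return set(re.findall(r"(?:[A-Z][\w-]*)(?:\s+[A-Z][\w-]*)*", claim))

def title_matches(claim: str, titles):
    exprs = capitalized_expressions(claim)
    return [t for t in titles if t in exprs]

claim = "Barack Obama was born in Kenya."
wiki_titles = ["Barack Obama", "Kenya", "Hawaii"]    # toy title index
print(title_matches(claim, wiki_titles))             # ['Barack Obama', 'Kenya']
```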
Mapping State Space using Landmarks for Universal Goal Reaching
Title | Mapping State Space using Landmarks for Universal Goal Reaching |
Authors | Zhiao Huang, Fangchen Liu, Hao Su |
Abstract | An agent that has well understood the environment should be able to apply its skills to any given goal, leading to the fundamental problem of learning the Universal Value Function Approximator (UVFA). A UVFA learns to predict the cumulative rewards between all state-goal pairs. However, empirically, the value function for long-range goals is always hard to estimate and may consequently result in a failed policy. This has presented challenges to the learning process and the capability of neural networks. We propose a method to address this issue in large MDPs with sparse rewards, in which exploration and routing across remote states are both extremely challenging. Our method explicitly models the environment in a hierarchical manner, with a high-level dynamic landmark-based map abstracting the visited state space, and a low-level value network to derive precise local decisions. We use farthest point sampling to select landmark states from past experience, which improves exploration compared with simple uniform sampling. Experimentally, we show that our method enables the agent to reach long-range goals at an early training stage and achieves better performance than standard RL algorithms on a number of challenging tasks. |
Tasks | |
Published | 2019-08-15 |
URL | https://arxiv.org/abs/1908.05451v1, https://arxiv.org/pdf/1908.05451v1.pdf |
PWC | https://paperswithcode.com/paper/mapping-state-space-using-landmarks-for |
Repo | https://github.com/FangchenLiu/map_planner |
Framework | pytorch |
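Farthest point sampling, which the paper uses to choose landmark states, is compact enough to sketch directly. Euclidean distance on raw state vectors below stands in for the learned, value-based distance the method actually uses.

```python
# Sketch of farthest point sampling (FPS) for landmark selection: each new
# landmark is the point farthest from all landmarks chosen so far.
import numpy as np

def farthest_point_sampling(points: np.ndarray, k: int) -> np.ndarray:
    """Pick k indices so each new point maximizes its distance to the set."""
    chosen = [0]
    dists = np.linalg.norm(points - points[0], axis=1)
    for _ in range(k - 1):
        nxt = int(dists.argmax())
        chosen.append(nxt)
        dists = np.minimum(dists, np.linalg.norm(points - points[nxt], axis=1))
    return np.array(chosen)

states = np.random.default_rng(0).normal(size=(500, 2))   # toy visited states
landmarks = farthest_point_sampling(states, k=10)
print(states[landmarks].round(2))
```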
MintNet: Building Invertible Neural Networks with Masked Convolutions
Title | MintNet: Building Invertible Neural Networks with Masked Convolutions |
Authors | Yang Song, Chenlin Meng, Stefano Ermon |
Abstract | We propose a new way of constructing invertible neural networks by combining simple building blocks with a novel set of composition rules. This leads to a rich set of invertible architectures, including those similar to ResNets. Inversion is achieved with a locally convergent iterative procedure that is parallelizable and very fast in practice. Additionally, the determinant of the Jacobian can be computed analytically and efficiently, enabling their generative use as flow models. To demonstrate their flexibility, we show that our invertible neural networks are competitive with ResNets on MNIST and CIFAR-10 classification. When trained as generative models, our invertible networks achieve competitive likelihoods on MNIST, CIFAR-10 and ImageNet 32x32, with bits per dimension of 0.98, 3.32 and 4.06 respectively. |
Tasks | Image Generation |
Published | 2019-07-18 |
URL | https://arxiv.org/abs/1907.07945v2, https://arxiv.org/pdf/1907.07945v2.pdf |
PWC | https://paperswithcode.com/paper/mintnet-building-invertible-neural-networks |
Repo | https://github.com/ermongroup/mintnet |
Framework | pytorch |
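The "locally convergent iterative procedure" for inversion can be illustrated on a one-dimensional invertible map: when the residual part of f is a contraction, the fixed-point update x ← y − residual(x) converges to f^-1(y). MintNet's real update additionally exploits the triangular Jacobian of its masked convolutions, which this sketch ignores.

```python
# Sketch of inversion by fixed-point iteration. f(x) = x + 0.5*tanh(x) is
# invertible because its residual has derivative bounded by 0.5 < 1, so
# x <- y - 0.5*tanh(x) is a contraction converging to f^{-1}(y), and the
# update is parallel over all coordinates (as in MintNet's inversion).
import numpy as np

f = lambda x: x + 0.5 * np.tanh(x)

def invert(y, iters=50):
    x = y.copy()                               # y itself is a good initial guess
    for _ in range(iters):
        x = y - 0.5 * np.tanh(x)
    return x

y = f(np.array([-2.0, 0.3, 1.7]))
x = invert(y)
print(np.allclose(f(x), y))                    # True: f(invert(y)) == y
```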
Learnable Embedding Space for Efficient Neural Architecture Compression
Title | Learnable Embedding Space for Efficient Neural Architecture Compression |
Authors | Shengcao Cao, Xiaofang Wang, Kris M. Kitani |
Abstract | We propose a method to incrementally learn an embedding space over the domain of network architectures, to enable the careful selection of architectures for evaluation during compressed architecture search. Given a teacher network, we search for a compressed network architecture by using Bayesian Optimization (BO) with a kernel function defined over our proposed embedding space to select architectures for evaluation. We demonstrate that our search algorithm can significantly outperform various baseline methods, such as random search and reinforcement learning (Ashok et al., 2018). The compressed architectures found by our method are also better than the state-of-the-art manually-designed compact architecture ShuffleNet (Zhang et al., 2018). We also demonstrate that the learned embedding space can be transferred to new settings for architecture search, such as a larger teacher network or a teacher network in a different architecture family, without any training. Code is publicly available here: https://github.com/Friedrich1006/ESNAC . |
Tasks | Neural Architecture Search |
Published | 2019-02-01 |
URL | http://arxiv.org/abs/1902.00383v2, http://arxiv.org/pdf/1902.00383v2.pdf |
PWC | https://paperswithcode.com/paper/learnable-embedding-space-for-efficient |
Repo | https://github.com/Friedrich1006/ESNAC |
Framework | pytorch |
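The search loop, BO with a kernel over architecture embeddings, can be sketched with a small Gaussian process and a UCB acquisition. The embeddings and reward function below are synthetic stand-ins, and unlike the paper the embedding space here is fixed rather than learned incrementally during search.

```python
# Sketch of BO over an embedding space: a GP with an RBF kernel on embedding
# vectors scores unevaluated candidates; UCB picks the next one to train.
import numpy as np

rng = np.random.default_rng(2)
emb = rng.normal(size=(200, 16))               # candidate architecture embeddings
reward = lambda i: -np.linalg.norm(emb[i])     # stand-in for accuracy/compression

def rbf(A, B, ls=2.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * ls ** 2))

evaluated, scores = [0], [reward(0)]
for _ in range(10):
    X = emb[evaluated]
    Kinv = np.linalg.inv(rbf(X, X) + 1e-6 * np.eye(len(X)))
    Ks = rbf(emb, X)
    mu = Ks @ Kinv @ np.array(scores)                      # GP posterior mean
    var = 1.0 - np.einsum("ij,ij->i", Ks @ Kinv, Ks)       # GP posterior variance
    ucb = mu + np.sqrt(np.maximum(var, 0))                 # acquisition value
    ucb[evaluated] = -np.inf                               # no re-evaluation
    nxt = int(ucb.argmax())
    evaluated.append(nxt)
    scores.append(reward(nxt))

print("best architecture index:", evaluated[int(np.argmax(scores))])
```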
Inverse Reinforcement Learning in Contextual MDPs
Title | Inverse Reinforcement Learning in Contextual MDPs |
Authors | Philip Korsunsky, Stav Belogolovsky, Tom Zahavy, Chen Tessler, Shie Mannor |
Abstract | We consider the Inverse Reinforcement Learning problem in Contextual Markov Decision Processes. In this setting, the reward, which is unknown to the agent, is a function of a static parameter referred to as the context. There is also an “expert” who knows this mapping and acts according to the optimal policy for each context. The goal of the agent is to learn the expert’s mapping by observing demonstrations. We define an optimization problem for finding this mapping and show that when it is linear, the problem is convex. We present and analyze the sample complexity of three algorithms for solving this problem: the mirrored descent algorithm, evolution strategies, and the ellipsoid method. We also extend the first two methods to work with general reward functions, e.g., deep neural networks, but without the theoretical guarantees. Finally, we compare the different techniques empirically in a driving simulation and a medical treatment regime. |
Tasks | Autonomous Driving |
Published | 2019-05-23 |
URL | https://arxiv.org/abs/1905.09710v3, https://arxiv.org/pdf/1905.09710v3.pdf |
PWC | https://paperswithcode.com/paper/inverse-reinforcement-learning-in-contextual |
Repo | https://github.com/CIRLMDP/CIRL |
Framework | none |
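A heavily reduced sketch of the feature-matching update behind algorithms like the paper's mirrored descent: shrink the contextual MDP to a contextual bandit so the agent's feature expectations are available in closed form, then move the context-to-reward mapping toward the expert's choices. Everything below (dimensions, dynamics, step size) is a made-up illustration, not one of the paper's three analyzed algorithms.

```python
# Sketch: learn a linear context->reward mapping W by matching the expert's
# action features, in a contextual bandit stand-in for the contextual MDP.
import numpy as np

rng = np.random.default_rng(3)
K, d_ctx, d_feat = 5, 3, 4
phi = rng.normal(size=(K, d_feat))             # per-action feature vectors
W_star = rng.normal(size=(d_feat, d_ctx))      # true context -> reward mapping
W = np.zeros_like(W_star)                      # the agent's estimate

def best_action(W, c):
    """Index of the optimal action under reward weights W @ c."""
    return int((phi @ (W @ c)).argmax())

for _ in range(2000):
    c = rng.normal(size=d_ctx)                 # observe a context
    mu_expert = phi[best_action(W_star, c)]    # expert's chosen features
    mu_agent = phi[best_action(W, c)]          # agent's current choice
    W += 0.05 * np.outer(mu_expert - mu_agent, c)   # feature-matching step

test = rng.normal(size=(200, d_ctx))
agree = np.mean([best_action(W, c) == best_action(W_star, c) for c in test])
print(f"action agreement with the expert: {agree:.0%}")
```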
Class-Based Styling: Real-time Localized Style Transfer with Semantic Segmentation
Title | Class-Based Styling: Real-time Localized Style Transfer with Semantic Segmentation |
Authors | Lironne Kurzman, David Vazquez, Issam Laradji |
Abstract | We propose a Class-Based Styling method (CBS) that can apply different styles to different object classes in real time. CBS achieves real-time performance by carrying out two steps simultaneously: while a semantic segmentation method is used to obtain the mask of each object class in a video frame, a styling method is used to style that frame globally. An object class can then be styled by combining the segmentation mask and the styled image. The user can also select multiple styles so that different object classes have different styles in a single frame. For semantic segmentation, we leverage DABNet, which achieves high accuracy with only 0.76 million parameters and runs at 104 FPS. For the style transfer step, we use a popular real-time method proposed by Johnson et al. [7]. We evaluated CBS on a video from the Cityscapes dataset and observed high-quality localized style transfer results for different object classes along with real-time performance. |
Tasks | Semantic Segmentation, Style Transfer |
Published | 2019-08-30 |
URL | https://arxiv.org/abs/1908.11525v1, https://arxiv.org/pdf/1908.11525v1.pdf |
PWC | https://paperswithcode.com/paper/class-based-styling-real-time-localized-style |
Repo | https://github.com/IssamLaradji/CBStyling |
Framework | pytorch |
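The compositing step at the heart of CBS reduces to a per-class masked blend of the globally styled frame with the original. The arrays below are random stand-ins for a real frame, its styled versions, and a segmentation map.

```python
# Sketch of CBS compositing: a per-class mask from the segmentation network
# selects which pixels take the styled frame; other pixels keep the original.
import numpy as np

H, W = 64, 64
frame = np.random.rand(H, W, 3)                    # original video frame
styled = {"car": np.random.rand(H, W, 3),          # one globally styled frame
          "person": np.random.rand(H, W, 3)}       # per user-selected style
seg = np.random.randint(0, 3, size=(H, W))         # 0=background, 1=car, 2=person
class_ids = {"car": 1, "person": 2}

out = frame.copy()
for cls, style_img in styled.items():
    mask = (seg == class_ids[cls])[..., None]      # (H, W, 1) boolean mask
    out = np.where(mask, style_img, out)           # style only this class
print(out.shape)
```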