Paper Group AWR 100
Self-supervised Spatio-temporal Representation Learning for Videos by Predicting Motion and Appearance Statistics. GPU-Accelerated Atari Emulation for Reinforcement Learning. Keyphrase Generation for Scientific Articles using GANs. Towards Open-Domain Named Entity Recognition via Neural Correction Models. Finding Generalizable Evidence by Learning to Convince Q&A Models. Sampling-free Epistemic Uncertainty Estimation Using Approximated Variance Propagation. LINSPECTOR WEB: A Multilingual Probing Suite for Word Representations. Visualizing the decision-making process in deep neural decision forest. What Is Wrong With Scene Text Recognition Model Comparisons? Dataset and Model Analysis. Team Papelo: Transformer Networks at FEVER. Mapping State Space using Landmarks for Universal Goal Reaching. MintNet: Building Invertible Neural Networks with Masked Convolutions. Learnable Embedding Space for Efficient Neural Architecture Compression. Inverse Reinforcement Learning in Contextual MDPs. Class-Based Styling: Real-time Localized Style Transfer with Semantic Segmentation.
Self-supervised Spatio-temporal Representation Learning for Videos by Predicting Motion and Appearance Statistics
Title | Self-supervised Spatio-temporal Representation Learning for Videos by Predicting Motion and Appearance Statistics |
Authors | Jiangliu Wang, Jianbo Jiao, Linchao Bao, Shengfeng He, Yunhui Liu, Wei Liu |
Abstract | We address the problem of video representation learning without human-annotated labels. While previous efforts address the problem by designing novel self-supervised tasks using video data, the learned features are merely on a frame-by-frame basis, which are not applicable to many video analytics tasks where spatio-temporal features prevail. In this paper, we propose a novel self-supervised approach to learn spatio-temporal features for video representation. Inspired by the success of two-stream approaches in video classification, we propose to learn visual features by regressing both motion and appearance statistics along spatial and temporal dimensions, given only the input video data. Specifically, we extract statistical concepts (fast-motion region and the corresponding dominant direction, spatio-temporal color diversity, dominant color, etc.) from simple patterns in both spatial and temporal domains. Unlike prior puzzle-like tasks that are hard even for humans to solve, the proposed approach is consistent with inherent human visual habits and therefore easy to answer. We conduct extensive experiments with C3D to validate the effectiveness of our proposed approach. The experiments show that our approach can significantly improve the performance of C3D when applied to video classification tasks. Code is available at https://github.com/laura-wang/video_repres_mas. |
Tasks | Action Recognition In Videos, Representation Learning, Video Classification |
Published | 2019-04-07 |
URL | http://arxiv.org/abs/1904.03597v1, http://arxiv.org/pdf/1904.03597v1.pdf |
PWC | https://paperswithcode.com/paper/self-supervised-spatio-temporal |
Repo | https://github.com/laura-wang/video_repres_mas |
Framework | tf |
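The statistics-regression pretext task above is easy to ground in code. Below is a minimal sketch of how a coarse "fastest-motion block" label could be derived from a clip; the paper computes its motion statistics from optical flow, so the frame differencing, grid size, and clip shape here are illustrative assumptions, not the authors' implementation.

```python
# Sketch: derive a coarse "fastest-motion block" label from a clip via frame
# differencing. The paper computes its statistics from optical flow; plain
# frame differences are used here only to keep the example dependency-free.
import numpy as np

def fastest_motion_block(clip: np.ndarray, grid: int = 4) -> int:
    """clip: (T, H, W) grayscale video with values in [0, 1].
    Returns the row-major index of the grid block with the largest
    accumulated temporal change, a stand-in for the paper's motion label."""
    diffs = np.abs(np.diff(clip, axis=0)).sum(axis=0)   # (H, W) motion energy
    h, w = diffs.shape
    bh, bw = h // grid, w // grid
    energy = [
        diffs[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw].mean()
        for i in range(grid) for j in range(grid)
    ]
    return int(np.argmax(energy))

clip = np.random.rand(16, 112, 112)           # toy 16-frame clip
label = fastest_motion_block(clip)            # self-supervised target in [0, 15]
print(f"fastest-motion block: {label}")
```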
GPU-Accelerated Atari Emulation for Reinforcement Learning
Title | GPU-Accelerated Atari Emulation for Reinforcement Learning |
Authors | Steven Dalton, Iuri Frosio, Michael Garland |
Abstract | We designed and implemented a CUDA port of the Atari Learning Environment (ALE), a system for developing and evaluating deep reinforcement learning algorithms using Atari games. Our CUDA Learning Environment (CuLE) overcomes many limitations of existing CPU-based Atari emulators and scales naturally to multi-GPU systems. It leverages the parallelization capability of GPUs to run thousands of Atari games simultaneously; by rendering frames directly on the GPU, CuLE avoids the bottleneck arising from the limited CPU-GPU communication bandwidth. As a result, CuLE is able to generate between 40M and 190M frames per hour using a single GPU, a throughput that previously could be achieved only with a cluster of CPUs. We demonstrate the advantages of CuLE by effectively training agents with traditional deep reinforcement learning algorithms and measuring the utilization and throughput of the GPU. Our analysis further highlights the differences in the data-generation pattern for emulators running on CPUs or GPUs. CuLE is available at https://github.com/NVLabs/cule . |
Tasks | Atari Games |
Published | 2019-07-19 |
URL | https://arxiv.org/abs/1907.08467v1, https://arxiv.org/pdf/1907.08467v1.pdf |
PWC | https://paperswithcode.com/paper/gpu-accelerated-atari-emulation-for |
Repo | https://github.com/NVLabs/cule
Framework | pytorch |
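To see why batched GPU emulation pays off, the toy sketch below steps thousands of environment instances with single tensor operations. It deliberately does not use CuLE's actual API (see the repo above for that); the random-walk dynamics are a made-up stand-in that only illustrates the data-generation pattern.

```python
# Illustration of the batched-emulation pattern CuLE exploits: state for
# thousands of environment instances lives in one GPU tensor, and a single
# vectorized op steps all of them at once, removing the per-env CPU loop and
# the CPU-GPU transfer bottleneck. This is NOT CuLE's API.
import torch

num_envs = 4096
device = "cuda" if torch.cuda.is_available() else "cpu"
state = torch.zeros(num_envs, device=device)

def step(state, actions):
    """Step every environment with one tensor op; toy reward near the origin."""
    state = state + torch.where(actions == 1, 0.01, -0.01)
    reward = (state.abs() < 0.1).float()
    return state, reward

for _ in range(100):
    actions = torch.randint(0, 2, (num_envs,), device=device)
    state, reward = step(state, actions)
print(f"mean reward over {num_envs} envs: {reward.mean().item():.3f}")
```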
Keyphrase Generation for Scientific Articles using GANs
Title | Keyphrase Generation for Scientific Articles using GANs |
Authors | Avinash Swaminathan, Raj Kuwar Gupta, Haimin Zhang, Debanjan Mahata, Rakesh Gosangi, Rajiv Ratn Shah |
Abstract | In this paper, we present a keyphrase generation approach using conditional Generative Adversarial Networks (GANs). In our GAN model, the generator outputs a sequence of keyphrases based on the title and abstract of a scientific article. The discriminator learns to distinguish between machine-generated and human-curated keyphrases. We evaluate this approach on standard benchmark datasets. Our model achieves state-of-the-art performance in the generation of abstractive keyphrases and is also comparable to the best-performing extractive techniques. We also demonstrate that our method generates more diverse keyphrases, and we make our implementation publicly available. |
Tasks | |
Published | 2019-09-24 |
URL | https://arxiv.org/abs/1909.12229v1, https://arxiv.org/pdf/1909.12229v1.pdf |
PWC | https://paperswithcode.com/paper/keyphrase-generation-for-scientific-articles |
Repo | https://github.com/AilabUdineGit/keyphrase-gan-master |
Framework | pytorch |
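A minimal sketch of the discriminator side of the setup described above: a small GRU scores keyphrase token sequences as human-curated versus machine-generated. The vocabulary size, dimensions, and random stand-in batches are assumptions; the paper's generator is a conditional seq2seq model, which is omitted here.

```python
# Sketch of the adversarial objective: a sequence discriminator trained with
# BCE to separate curated from generated keyphrases. Shapes are made up.
import torch
import torch.nn as nn

VOCAB, EMB, HID = 5000, 64, 128

class KeyphraseDiscriminator(nn.Module):
    """Scores a padded keyphrase token sequence as curated vs generated."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, EMB)
        self.gru = nn.GRU(EMB, HID, batch_first=True)
        self.out = nn.Linear(HID, 1)

    def forward(self, tokens):                      # tokens: (batch, seq_len)
        _, h = self.gru(self.emb(tokens))           # h: (1, batch, HID)
        return self.out(h.squeeze(0)).squeeze(-1)   # logits: (batch,)

disc = KeyphraseDiscriminator()
bce = nn.BCEWithLogitsLoss()
real = torch.randint(0, VOCAB, (32, 8))   # stand-in for curated keyphrases
fake = torch.randint(0, VOCAB, (32, 8))   # stand-in for generator samples
loss = bce(disc(real), torch.ones(32)) + bce(disc(fake), torch.zeros(32))
loss.backward()   # the generator would be trained against D's score as reward
```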
Towards Open-Domain Named Entity Recognition via Neural Correction Models
Title | Towards Open-Domain Named Entity Recognition via Neural Correction Models |
Authors | Mengdi Zhu, Zheye Deng, Wenhan Xiong, Mo Yu, Ming Zhang, William Yang Wang |
Abstract | Named Entity Recognition (NER) plays an important role in a wide range of natural language processing tasks, such as relation extraction, question answering, etc. However, previous studies on NER are limited to a particular genre, using small manually-annotated or large but low-quality datasets. In this work, we propose a semi-supervised annotation framework to make full use of abstracts from Wikipedia and obtain a large and high-quality dataset called AnchorNER. We assume anchored strings in abstracts are named entities and annotate them with entity types mentioned in DBpedia. To improve the coverage, we design a neural correction model trained with a human-annotated NER dataset, DocRED, to correct the false-negative entity labels, and then train a BERT model with the corrected dataset. We evaluate our trained model on six NER datasets, and our experimental results show that we obtain state-of-the-art open-domain performance: on top of the strong baselines BERT-base and BERT-large, we achieve relative improvements of 4.66% and 3.07% respectively. |
Tasks | Named Entity Recognition, Question Answering, Relation Extraction |
Published | 2019-09-13 |
URL | https://arxiv.org/abs/1909.06058v1, https://arxiv.org/pdf/1909.06058v1.pdf |
PWC | https://paperswithcode.com/paper/towards-open-domain-named-entity-recognition |
Repo | https://github.com/zmd971202/OpenNER |
Framework | pytorch |
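The anchor-based weak labeling step lends itself to a short sketch: anchored spans become BIO-tagged entities carrying their DBpedia types. The span format and the example types below are hypothetical, and the correction model that fixes false negatives is not shown.

```python
# Sketch of anchor-based weak labeling: treat hyperlinked (anchored) strings
# in a Wikipedia abstract as entities and tag them with DBpedia types.
def bio_tags(tokens, anchors):
    """tokens: list of words; anchors: list of (start_idx, end_idx, dbpedia_type),
    with end_idx exclusive. Returns one BIO tag per token."""
    tags = ["O"] * len(tokens)
    for start, end, etype in anchors:
        tags[start] = f"B-{etype}"
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"
    return tags

tokens = "Barack Obama was born in Hawaii .".split()
anchors = [(0, 2, "PER"), (5, 6, "LOC")]   # hypothetical anchored spans
print(list(zip(tokens, bio_tags(tokens, anchors))))
```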
Finding Generalizable Evidence by Learning to Convince Q&A Models
Title | Finding Generalizable Evidence by Learning to Convince Q&A Models |
Authors | Ethan Perez, Siddharth Karamcheti, Rob Fergus, Jason Weston, Douwe Kiela, Kyunghyun Cho |
Abstract | We propose a system that finds the strongest supporting evidence for a given answer to a question, using passage-based question-answering (QA) as a testbed. We train evidence agents to select the passage sentences that most convince a pretrained QA model of a given answer, if the QA model received those sentences instead of the full passage. Rather than finding evidence that convinces one model alone, we find that agents select evidence that generalizes; agent-chosen evidence increases the plausibility of the supported answer, as judged by other QA models and humans. Given its general nature, this approach improves QA in a robust manner: using agent-selected evidence (i) humans can correctly answer questions with only ~20% of the full passage and (ii) QA models can generalize to longer passages and harder questions. |
Tasks | Question Answering |
Published | 2019-09-12 |
URL | https://arxiv.org/abs/1909.05863v1, https://arxiv.org/pdf/1909.05863v1.pdf |
PWC | https://paperswithcode.com/paper/finding-generalizable-evidence-by-learning-to |
Repo | https://github.com/ethanjperez/convince |
Framework | none |
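One way to picture the evidence agents is as a greedy search over passage sentences, scored by how much they raise a QA model's probability of the target answer. The sketch below assumes a generic `answer_prob` scoring callable (the toy scorer is made up); the paper also trains learned agents rather than only searching.

```python
# Sketch of an evidence agent as greedy search: repeatedly add the passage
# sentence that most raises a QA model's probability of the target answer.
def select_evidence(sentences, answer_prob, k=3):
    """Greedily pick k sentences maximizing answer_prob(selected_sentences)."""
    chosen = []
    remaining = list(sentences)
    for _ in range(k):
        best = max(remaining, key=lambda s: answer_prob(chosen + [s]))
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Toy stand-in scorer: prefers sentences mentioning the answer string.
answer = "1969"
prob = lambda sents: sum(answer in s for s in sents) / (len(sents) + 1)
passage = ["Apollo 11 launched in 1969.", "The crew had three members.",
           "Armstrong walked first.", "It returned safely in 1969."]
print(select_evidence(passage, prob, k=2))
```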
Sampling-free Epistemic Uncertainty Estimation Using Approximated Variance Propagation
Title | Sampling-free Epistemic Uncertainty Estimation Using Approximated Variance Propagation |
Authors | Janis Postels, Francesco Ferroni, Huseyin Coskun, Nassir Navab, Federico Tombari |
Abstract | We present a sampling-free approach for computing the epistemic uncertainty of a neural network. Epistemic uncertainty is an important quantity for the deployment of deep neural networks in safety-critical applications, since it represents how much one can trust predictions on new data. Recently, promising works have proposed using noise injection combined with Monte Carlo sampling at inference time to estimate this quantity (e.g., Monte Carlo dropout). Our main contribution is an approximation of the epistemic uncertainty estimated by these methods that does not require sampling, thus notably reducing the computational overhead. We apply our approach to large-scale visual tasks (i.e., semantic segmentation and depth regression) to demonstrate the advantages of our method compared to sampling-based approaches, in terms of both the quality of the uncertainty estimates and the computational overhead. |
Tasks | Semantic Segmentation |
Published | 2019-08-01 |
URL | https://arxiv.org/abs/1908.00598v3, https://arxiv.org/pdf/1908.00598v3.pdf |
PWC | https://paperswithcode.com/paper/sampling-free-epistemic-uncertainty |
Repo | https://github.com/janisgp/Sampling-free-Epistemic-Uncertainty |
Framework | tf |
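The core idea above can be sketched in a few lines: treat dropout as injected noise with known variance, then push mean and variance analytically through each layer instead of sampling. The layer sizes, the independence assumption between units, and the first-order ReLU rule below are simplifications of the paper's propagation scheme.

```python
# Sketch of sampling-free variance propagation: dropout injects noise with
# variance x^2 * p/(1-p) (inverted dropout), and mean/variance are pushed
# analytically through linear and ReLU layers under an independence
# assumption. A first-order caricature of the method, with made-up sizes.
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(64, 32)) * 0.1
W2 = rng.normal(size=(10, 64)) * 0.1
p = 0.5                                        # dropout rate

x = rng.normal(size=32)
mean, var = x, x**2 * p / (1 - p)              # noise injected by dropout

# Linear layer: mean -> W @ mean, var -> W^2 @ var (independent units).
mean, var = W1 @ mean, (W1**2) @ var
# ReLU, first order: local gradient is 1 where the mean is positive, else 0.
keep = (mean > 0).astype(float)
mean, var = mean * keep, var * keep
mean, var = W2 @ mean, (W2**2) @ var

print("predictive std per output:", np.sqrt(var).round(3))
```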
LINSPECTOR WEB: A Multilingual Probing Suite for Word Representations
Title | LINSPECTOR WEB: A Multilingual Probing Suite for Word Representations |
Authors | Max Eichler, Gözde Gül Şahin, Iryna Gurevych |
Abstract | We present LINSPECTOR WEB, an open-source multilingual inspector to analyze word representations. Our system provides researchers working in low-resource settings with an easily accessible web-based probing tool to gain quick insights into their word embeddings, especially outside of the English language. To do this, we employ 16 simple linguistic probing tasks such as gender, case marking, and tense for a diverse set of 28 languages. We support probing of static word embeddings along with pretrained AllenNLP models that are commonly used for NLP downstream tasks such as named entity recognition, natural language inference and dependency parsing. The results are visualized in a polar chart and also provided as a table. LINSPECTOR WEB is available as an offline tool or at https://linspector.ukp.informatik.tu-darmstadt.de. |
Tasks | Dependency Parsing, Named Entity Recognition, Natural Language Inference, Word Embeddings |
Published | 2019-07-26 |
URL | https://arxiv.org/abs/1907.11438v3, https://arxiv.org/pdf/1907.11438v3.pdf |
PWC | https://paperswithcode.com/paper/linspector-web-a-multilingual-probing-suite |
Repo | https://github.com/UKPLab/linspector-web |
Framework | none |
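A probing task of the kind LINSPECTOR runs reduces to fitting a simple classifier from embeddings to a linguistic property and reading off its accuracy. The sketch below uses synthetic embeddings and a fake binary "tense" label purely to show the shape of the procedure.

```python
# Sketch of a single probing task: fit a linear classifier that predicts a
# linguistic property from static word embeddings; probe accuracy indicates
# whether the property is linearly recoverable. All data here is synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
emb = rng.normal(size=(1000, 50))              # 1000 words, 50-dim embeddings
labels = (emb[:, 0] + 0.1 * rng.normal(size=1000) > 0).astype(int)  # "tense"

X_tr, X_te, y_tr, y_te = train_test_split(emb, labels, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(f"probing accuracy: {probe.score(X_te, y_te):.2f}")
```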
Visualizing the decision-making process in deep neural decision forest
Title | Visualizing the decision-making process in deep neural decision forest |
Authors | Shichao Li, Kwang-Ting Cheng |
Abstract | Deep neural decision forest (NDF) achieved remarkable performance on various vision tasks by combining decision trees and deep representation learning. In this work, we first trace the decision-making process of this model and visualize saliency maps to understand which portions of the input influence it most, for both classification and regression problems. We then apply NDF to a multi-task coordinate regression problem and demonstrate the distribution of routing probabilities, which is vital for interpreting NDF yet has not been shown for regression problems. The pre-trained model and code for visualization will be available at https://github.com/Nicholasli1995/VisualizingNDF |
Tasks | Decision Making, Representation Learning |
Published | 2019-04-19 |
URL | http://arxiv.org/abs/1904.09201v1, http://arxiv.org/pdf/1904.09201v1.pdf |
PWC | https://paperswithcode.com/paper/visualizing-the-decision-making-process-in |
Repo | https://github.com/Nicholasli1995/VisualizingNDF |
Framework | pytorch |
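The saliency-map tracing described above is, at its simplest, the absolute gradient of the predicted score with respect to the input. The sketch below uses a tiny stand-in linear model rather than an actual NDF, which is the only assumption here.

```python
# Sketch of gradient-based saliency: the absolute input gradient of the
# predicted score highlights influential pixels. Any differentiable
# classifier works; a tiny stand-in model replaces the NDF.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))  # stand-in model
x = torch.rand(1, 1, 28, 28, requires_grad=True)

score = model(x).max()                         # score of the predicted class
score.backward()
saliency = x.grad.abs().squeeze()              # (28, 28) importance map
print("most influential pixel:", divmod(saliency.argmax().item(), 28))
```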
What Is Wrong With Scene Text Recognition Model Comparisons? Dataset and Model Analysis
Title | What Is Wrong With Scene Text Recognition Model Comparisons? Dataset and Model Analysis |
Authors | Jeonghun Baek, Geewook Kim, Junyeop Lee, Sungrae Park, Dongyoon Han, Sangdoo Yun, Seong Joon Oh, Hwalsuk Lee |
Abstract | Many new proposals for scene text recognition (STR) models have been introduced in recent years. While each claims to have pushed the boundary of the technology, a holistic and fair comparison has been largely missing in the field due to inconsistent choices of training and evaluation datasets. This paper addresses this difficulty with three major contributions. First, we examine the inconsistencies of training and evaluation datasets, and the performance gaps that result from these inconsistencies. Second, we introduce a unified four-stage STR framework that most existing STR models fit into. Using this framework allows for the extensive evaluation of previously proposed STR modules and the discovery of previously unexplored module combinations. Third, we analyze the module-wise contributions to performance in terms of accuracy, speed, and memory demand, under one consistent set of training and evaluation datasets. Such analyses remove the obstacles that current comparisons face in understanding the performance gains of existing modules. |
Tasks | Scene Text Recognition |
Published | 2019-04-03 |
URL | https://arxiv.org/abs/1904.01906v4, https://arxiv.org/pdf/1904.01906v4.pdf |
PWC | https://paperswithcode.com/paper/what-is-wrong-with-scene-text-recognition |
Repo | https://github.com/clovaai/deep-text-recognition-benchmark |
Framework | pytorch |
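The unified four-stage framework (transformation, feature extraction, sequence modeling, prediction) maps naturally onto a module skeleton. The sketch below uses placeholder stages with assumed shapes; the benchmark repo plugs in TPS, VGG/ResNet, BiLSTM, and CTC/attention variants at each slot.

```python
# Skeleton of the four-stage STR framework: Trans -> Feat -> Seq -> Pred.
# Each stage is a placeholder here, not the benchmark's actual modules.
import torch
import torch.nn as nn

class STRModel(nn.Module):
    def __init__(self, num_classes=37):
        super().__init__()
        self.trans = nn.Identity()                       # e.g. TPS rectification
        self.feat = nn.Sequential(                       # e.g. VGG/ResNet backbone
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, 25)))               # -> (B, 64, 1, 25)
        self.seq = nn.LSTM(64, 128, bidirectional=True,  # e.g. BiLSTM context
                           batch_first=True)
        self.pred = nn.Linear(256, num_classes)          # e.g. CTC/attention head

    def forward(self, images):                           # (B, 1, 32, 100)
        x = self.feat(self.trans(images)).squeeze(2)     # (B, 64, 25)
        x, _ = self.seq(x.transpose(1, 2))               # (B, 25, 256)
        return self.pred(x)                              # per-step class logits

logits = STRModel()(torch.rand(2, 1, 32, 100))
print(logits.shape)                                      # torch.Size([2, 25, 37])
```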
Team Papelo: Transformer Networks at FEVER
Title | Team Papelo: Transformer Networks at FEVER |
Authors | Christopher Malon |
Abstract | We develop a system for the FEVER fact extraction and verification challenge that uses a high-precision entailment classifier, based on transformer networks pretrained with language modeling, to classify a broad set of potential evidence. The precision of the entailment classifier allows us to enhance recall by considering every statement from several articles when deciding upon each claim. We include not only the articles best matching the claim text by TF-IDF score, but also read additional articles whose titles match named entities and capitalized expressions occurring in the claim text. The entailment module evaluates potential evidence one statement at a time, together with the title of the page the evidence came from (providing a hint about possible pronoun antecedents). In preliminary evaluation, the system achieves a 0.5736 FEVER score, 0.6108 label accuracy, and 0.6485 evidence F1 on the FEVER shared task test set. |
Tasks | Language Modelling |
Published | 2019-01-08 |
URL | http://arxiv.org/abs/1901.02534v1, http://arxiv.org/pdf/1901.02534v1.pdf |
PWC | https://paperswithcode.com/paper/team-papelo-transformer-networks-at-fever |
Repo | https://github.com/cdmalon/finetune-transformer-lm |
Framework | tf |
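The retrieval heuristic mentioned above, reading extra articles whose titles match capitalized expressions in the claim, can be sketched with a simple regex. Both the regex and the toy title list are simplifications, not the system's actual code.

```python
# Sketch of the title-matching retrieval heuristic: find capitalized
# expressions in the claim and fetch articles whose titles match them.
import re

def capitalized_expressions(claim: str):
    """Find runs of capitalized words, a cheap proxy for named entities."""
    return set(re.findall(r"(?:[A-Z][\w-]*)(?:\s+[A-Z][\w-]*)*", claim))

def title_matches(claim: str, titles):
    exprs = capitalized_expressions(claim)
    return [t for t in titles if t in exprs]

claim = "Barack Obama was born in Kenya."
wiki_titles = ["Barack Obama", "Kenya", "Hawaii"]    # toy title index
print(title_matches(claim, wiki_titles))             # ['Barack Obama', 'Kenya']
```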
Mapping State Space using Landmarks for Universal Goal Reaching
Title | Mapping State Space using Landmarks for Universal Goal Reaching |
Authors | Zhiao Huang, Fangchen Liu, Hao Su |
Abstract | An agent that has well understood the environment should be able to apply its skills to any given goal, leading to the fundamental problem of learning the Universal Value Function Approximator (UVFA). A UVFA learns to predict the cumulative rewards between all state-goal pairs. However, empirically, the value function for long-range goals is always hard to estimate and may consequently result in a failed policy. This has presented challenges to the learning process and the capability of neural networks. We propose a method to address this issue in large MDPs with sparse rewards, in which exploration and routing across remote states are both extremely challenging. Our method explicitly models the environment in a hierarchical manner, with a high-level dynamic landmark-based map abstracting the visited state space, and a low-level value network to derive precise local decisions. We use farthest point sampling to select landmark states from past experience, which improves exploration compared with simple uniform sampling. Experimentally, we show that our method enables the agent to reach long-range goals at an early training stage and achieves better performance than standard RL algorithms on a number of challenging tasks. |
Tasks | |
Published | 2019-08-15 |
URL | https://arxiv.org/abs/1908.05451v1, https://arxiv.org/pdf/1908.05451v1.pdf |
PWC | https://paperswithcode.com/paper/mapping-state-space-using-landmarks-for |
Repo | https://github.com/FangchenLiu/map_planner |
Framework | pytorch |
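Farthest point sampling, which the paper uses to choose landmark states, is compact enough to sketch directly. Euclidean distance on raw state vectors below stands in for the learned, value-based distance the method actually uses.

```python
# Sketch of farthest point sampling (FPS) for landmark selection: each new
# landmark is the point farthest from all landmarks chosen so far.
import numpy as np

def farthest_point_sampling(points: np.ndarray, k: int) -> np.ndarray:
    """Pick k indices so each new point maximizes its distance to the set."""
    chosen = [0]
    dists = np.linalg.norm(points - points[0], axis=1)
    for _ in range(k - 1):
        nxt = int(dists.argmax())
        chosen.append(nxt)
        dists = np.minimum(dists, np.linalg.norm(points - points[nxt], axis=1))
    return np.array(chosen)

states = np.random.default_rng(0).normal(size=(500, 2))   # toy visited states
landmarks = farthest_point_sampling(states, k=10)
print(states[landmarks].round(2))
```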
MintNet: Building Invertible Neural Networks with Masked Convolutions
Title | MintNet: Building Invertible Neural Networks with Masked Convolutions |
Authors | Yang Song, Chenlin Meng, Stefano Ermon |
Abstract | We propose a new way of constructing invertible neural networks by combining simple building blocks with a novel set of composition rules. This leads to a rich set of invertible architectures, including those similar to ResNets. Inversion is achieved with a locally convergent iterative procedure that is parallelizable and very fast in practice. Additionally, the determinant of the Jacobian can be computed analytically and efficiently, enabling their generative use as flow models. To demonstrate their flexibility, we show that our invertible neural networks are competitive with ResNets on MNIST and CIFAR-10 classification. When trained as generative models, our invertible networks achieve competitive likelihoods on MNIST, CIFAR-10 and ImageNet 32x32, with bits per dimension of 0.98, 3.32 and 4.06 respectively. |
Tasks | Image Generation |
Published | 2019-07-18 |
URL | https://arxiv.org/abs/1907.07945v2, https://arxiv.org/pdf/1907.07945v2.pdf |
PWC | https://paperswithcode.com/paper/mintnet-building-invertible-neural-networks |
Repo | https://github.com/ermongroup/mintnet |
Framework | pytorch |
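The "locally convergent iterative procedure" for inversion can be illustrated on a one-dimensional invertible map: when the residual part of f is a contraction, the fixed-point update x ← y − residual(x) converges to f^-1(y). MintNet's real update additionally exploits the triangular Jacobian of its masked convolutions, which this sketch ignores.

```python
# Sketch of inversion by fixed-point iteration. f(x) = x + 0.5*tanh(x) is
# invertible because its residual has derivative bounded by 0.5 < 1, so
# x <- y - 0.5*tanh(x) is a contraction converging to f^{-1}(y), and the
# update is parallel over all coordinates (as in MintNet's inversion).
import numpy as np

f = lambda x: x + 0.5 * np.tanh(x)

def invert(y, iters=50):
    x = y.copy()                               # y itself is a good initial guess
    for _ in range(iters):
        x = y - 0.5 * np.tanh(x)
    return x

y = f(np.array([-2.0, 0.3, 1.7]))
x = invert(y)
print(np.allclose(f(x), y))                    # True: f(invert(y)) == y
```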
Learnable Embedding Space for Efficient Neural Architecture Compression
Title | Learnable Embedding Space for Efficient Neural Architecture Compression |
Authors | Shengcao Cao, Xiaofang Wang, Kris M. Kitani |
Abstract | We propose a method to incrementally learn an embedding space over the domain of network architectures, to enable the careful selection of architectures for evaluation during compressed architecture search. Given a teacher network, we search for a compressed network architecture by using Bayesian Optimization (BO) with a kernel function defined over our proposed embedding space to select architectures for evaluation. We demonstrate that our search algorithm can significantly outperform various baseline methods, such as random search and reinforcement learning (Ashok et al., 2018). The compressed architectures found by our method are also better than the state-of-the-art manually-designed compact architecture ShuffleNet (Zhang et al., 2018). We also demonstrate that the learned embedding space can be transferred to new settings for architecture search, such as a larger teacher network or a teacher network in a different architecture family, without any training. Code is publicly available here: https://github.com/Friedrich1006/ESNAC . |
Tasks | Neural Architecture Search |
Published | 2019-02-01 |
URL | http://arxiv.org/abs/1902.00383v2, http://arxiv.org/pdf/1902.00383v2.pdf |
PWC | https://paperswithcode.com/paper/learnable-embedding-space-for-efficient |
Repo | https://github.com/Friedrich1006/ESNAC |
Framework | pytorch |
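The search loop, BO with a kernel over architecture embeddings, can be sketched with a small Gaussian process and a UCB acquisition. The embeddings and reward function below are synthetic stand-ins, and unlike the paper the embedding space here is fixed rather than learned incrementally during search.

```python
# Sketch of BO over an embedding space: a GP with an RBF kernel on embedding
# vectors scores unevaluated candidates; UCB picks the next one to train.
import numpy as np

rng = np.random.default_rng(2)
emb = rng.normal(size=(200, 16))               # candidate architecture embeddings
reward = lambda i: -np.linalg.norm(emb[i])     # stand-in for accuracy/compression

def rbf(A, B, ls=2.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * ls ** 2))

evaluated, scores = [0], [reward(0)]
for _ in range(10):
    X = emb[evaluated]
    Kinv = np.linalg.inv(rbf(X, X) + 1e-6 * np.eye(len(X)))
    Ks = rbf(emb, X)
    mu = Ks @ Kinv @ np.array(scores)                      # GP posterior mean
    var = 1.0 - np.einsum("ij,ij->i", Ks @ Kinv, Ks)       # GP posterior variance
    ucb = mu + np.sqrt(np.maximum(var, 0))                 # acquisition value
    ucb[evaluated] = -np.inf                               # no re-evaluation
    nxt = int(ucb.argmax())
    evaluated.append(nxt)
    scores.append(reward(nxt))

print("best architecture index:", evaluated[int(np.argmax(scores))])
```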
Inverse Reinforcement Learning in Contextual MDPs
Title | Inverse Reinforcement Learning in Contextual MDPs |
Authors | Philip Korsunsky, Stav Belogolovsky, Tom Zahavy, Chen Tessler, Shie Mannor |
Abstract | We consider the Inverse Reinforcement Learning problem in Contextual Markov Decision Processes. In this setting, the reward, which is unknown to the agent, is a function of a static parameter referred to as the context. There is also an “expert” who knows this mapping and acts according to the optimal policy for each context. The goal of the agent is to learn the expert’s mapping by observing demonstrations. We define an optimization problem for finding this mapping and show that when it is linear, the problem is convex. We present and analyze the sample complexity of three algorithms for solving this problem: the mirrored descent algorithm, evolution strategies, and the ellipsoid method. We also extend the first two methods to work with general reward functions, e.g., deep neural networks, but without the theoretical guarantees. Finally, we compare the different techniques empirically in a driving simulation and a medical treatment regime. |
Tasks | Autonomous Driving |
Published | 2019-05-23 |
URL | https://arxiv.org/abs/1905.09710v3, https://arxiv.org/pdf/1905.09710v3.pdf |
PWC | https://paperswithcode.com/paper/inverse-reinforcement-learning-in-contextual |
Repo | https://github.com/CIRLMDP/CIRL |
Framework | none |
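A heavily reduced sketch of the feature-matching update behind algorithms like the paper's mirrored descent: shrink the contextual MDP to a contextual bandit so the agent's feature expectations are available in closed form, then move the context-to-reward mapping toward the expert's choices. Everything below (dimensions, dynamics, step size) is a made-up illustration, not one of the paper's three analyzed algorithms.

```python
# Sketch: learn a linear context->reward mapping W by matching the expert's
# action features, in a contextual bandit stand-in for the contextual MDP.
import numpy as np

rng = np.random.default_rng(3)
K, d_ctx, d_feat = 5, 3, 4
phi = rng.normal(size=(K, d_feat))             # per-action feature vectors
W_star = rng.normal(size=(d_feat, d_ctx))      # true context -> reward mapping
W = np.zeros_like(W_star)                      # the agent's estimate

def best_action(W, c):
    """Index of the optimal action under reward weights W @ c."""
    return int((phi @ (W @ c)).argmax())

for _ in range(2000):
    c = rng.normal(size=d_ctx)                 # observe a context
    mu_expert = phi[best_action(W_star, c)]    # expert's chosen features
    mu_agent = phi[best_action(W, c)]          # agent's current choice
    W += 0.05 * np.outer(mu_expert - mu_agent, c)   # feature-matching step

test = rng.normal(size=(200, d_ctx))
agree = np.mean([best_action(W, c) == best_action(W_star, c) for c in test])
print(f"action agreement with the expert: {agree:.0%}")
```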
Class-Based Styling: Real-time Localized Style Transfer with Semantic Segmentation
Title | Class-Based Styling: Real-time Localized Style Transfer with Semantic Segmentation |
Authors | Lironne Kurzman, David Vazquez, Issam Laradji |
Abstract | We propose a Class-Based Styling method (CBS) that can apply different styles to different object classes in real time. CBS achieves real-time performance by carrying out two steps simultaneously: while a semantic segmentation method is used to obtain the mask of each object class in a video frame, a styling method is used to style that frame globally. An object class can then be styled by combining the segmentation mask and the styled image. The user can also select multiple styles so that different object classes have different styles in a single frame. For semantic segmentation, we leverage DABNet, which achieves high accuracy with only 0.76 million parameters and runs at 104 FPS. For the style transfer step, we use a popular real-time method proposed by Johnson et al. [7]. We evaluated CBS on a video from the Cityscapes dataset and observed high-quality localized style transfer results for different object classes along with real-time performance. |
Tasks | Semantic Segmentation, Style Transfer |
Published | 2019-08-30 |
URL | https://arxiv.org/abs/1908.11525v1, https://arxiv.org/pdf/1908.11525v1.pdf |
PWC | https://paperswithcode.com/paper/class-based-styling-real-time-localized-style |
Repo | https://github.com/IssamLaradji/CBStyling |
Framework | pytorch |
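The compositing step at the heart of CBS reduces to a per-class masked blend of the globally styled frame with the original. The arrays below are random stand-ins for a real frame, its styled versions, and a segmentation map.

```python
# Sketch of CBS compositing: a per-class mask from the segmentation network
# selects which pixels take the styled frame; other pixels keep the original.
import numpy as np

H, W = 64, 64
frame = np.random.rand(H, W, 3)                    # original video frame
styled = {"car": np.random.rand(H, W, 3),          # one globally styled frame
          "person": np.random.rand(H, W, 3)}       # per user-selected style
seg = np.random.randint(0, 3, size=(H, W))         # 0=background, 1=car, 2=person
class_ids = {"car": 1, "person": 2}

out = frame.copy()
for cls, style_img in styled.items():
    mask = (seg == class_ids[cls])[..., None]      # (H, W, 1) boolean mask
    out = np.where(mask, style_img, out)           # style only this class
print(out.shape)
```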