Paper Group AWR 427
Benchmarking Natural Language Understanding Services for building Conversational Agents
Title | Benchmarking Natural Language Understanding Services for building Conversational Agents |
Authors | Xingkun Liu, Arash Eshghi, Pawel Swietojanski, Verena Rieser |
Abstract | We have recently seen the emergence of several publicly available Natural Language Understanding (NLU) toolkits, which map user utterances to structured, but more abstract, Dialogue Act (DA) or Intent specifications, while making this process accessible to the lay developer. In this paper, we present the first wide coverage evaluation and comparison of some of the most popular NLU services, on a large, multi-domain (21 domains) dataset of 25K user utterances that we have collected and annotated with Intent and Entity Type specifications and which will be released as part of this submission. The results show that on Intent classification Watson significantly outperforms the other platforms, namely, Dialogflow, LUIS and Rasa; though these also perform well. Interestingly, on Entity Type recognition, Watson performs significantly worse due to its low Precision. Again, Dialogflow, LUIS and Rasa perform well on this task. |
Tasks | Intent Classification |
Published | 2019-03-13 |
URL | http://arxiv.org/abs/1903.05566v3 |
PDF | http://arxiv.org/pdf/1903.05566v3.pdf |
PWC | https://paperswithcode.com/paper/benchmarking-natural-language-understanding |
Repo | https://github.com/xliuhw/NLU-Evaluation-Data |
Framework | none |
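As a quick illustration of how an intent-classification benchmark like the one above can be scored, the sketch below computes per-platform accuracy and macro-F1 with scikit-learn. The utterances, intents and platform names are invented placeholders, not the released 25K-utterance dataset.

```python
# Toy evaluation of intent predictions against gold labels.
# The utterances, intents and platform names below are illustrative only;
# the real benchmark uses the 25K-utterance dataset released by the authors.
from sklearn.metrics import accuracy_score, f1_score

gold = ["set_alarm", "play_music", "weather_query", "set_alarm", "play_music"]
predictions = {
    "platform_a": ["set_alarm", "play_music", "weather_query", "set_alarm", "weather_query"],
    "platform_b": ["set_alarm", "play_music", "play_music", "set_alarm", "play_music"],
}

for platform, pred in predictions.items():
    acc = accuracy_score(gold, pred)
    macro_f1 = f1_score(gold, pred, average="macro")
    print(f"{platform}: accuracy={acc:.2f}  macro-F1={macro_f1:.2f}")
```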
Deep Learning for Multiple-Image Super-Resolution
Title | Deep Learning for Multiple-Image Super-Resolution |
Authors | Michal Kawulok, Pawel Benecki, Szymon Piechaczek, Krzysztof Hrynczenko, Daniel Kostrzewa, Jakub Nalepa |
Abstract | Super-resolution reconstruction (SRR) is a process aimed at enhancing the spatial resolution of images, either from a single observation, based on the learned relation between low and high resolution, or from multiple images presenting the same scene. SRR is particularly valuable if it is infeasible to acquire images at the desired resolution, but many images of the same scene are available at lower resolution—this is inherent to a variety of remote sensing scenarios. Recently, we have witnessed substantial improvement in single-image SRR attributed to the use of deep neural networks for learning the relation between low and high resolution. Importantly, deep learning has not yet been exploited for multiple-image SRR, which benefits from information fusion and in general allows for achieving higher reconstruction accuracy. In this letter, we introduce a new method which combines the advantages of multiple-image fusion with learning the low-to-high resolution mapping using deep networks. The reported experimental results indicate that our algorithm outperforms the state-of-the-art SRR methods, including those that operate from a single image, as well as those that perform multiple-image fusion. |
Tasks | Image Super-Resolution, Super-Resolution |
Published | 2019-03-01 |
URL | http://arxiv.org/abs/1903.00440v1 |
PDF | http://arxiv.org/pdf/1903.00440v1.pdf |
PWC | https://paperswithcode.com/paper/deep-learning-for-multiple-image-super |
Repo | https://github.com/ajinkya933/Image_repo |
Framework | none |
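The abstract above describes fusing several low-resolution observations of the same scene while learning the low-to-high mapping. The toy PyTorch module below is a minimal sketch of that idea, assuming the observations are already co-registered and stacked along the channel axis; it is not the authors' architecture.

```python
# Toy multiple-image super-resolution network (not the paper's architecture):
# it fuses N co-registered low-resolution observations stacked as channels
# and upsamples the fused representation by a factor of 2 with PixelShuffle.
import torch
import torch.nn as nn

class ToyMultiImageSR(nn.Module):
    def __init__(self, num_images=4, features=32, scale=2):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(num_images, features, 3, padding=1), nn.ReLU(),
            nn.Conv2d(features, features, 3, padding=1), nn.ReLU(),
        )
        self.upsample = nn.Sequential(
            nn.Conv2d(features, scale * scale, 3, padding=1),
            nn.PixelShuffle(scale),  # (B, s*s, H, W) -> (B, 1, s*H, s*W)
        )

    def forward(self, lr_stack):          # lr_stack: (B, num_images, H, W)
        return self.upsample(self.fuse(lr_stack))

lr_stack = torch.rand(1, 4, 32, 32)       # four grayscale low-res observations
print(ToyMultiImageSR()(lr_stack).shape)  # torch.Size([1, 1, 64, 64])
```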
KG-BERT: BERT for Knowledge Graph Completion
Title | KG-BERT: BERT for Knowledge Graph Completion |
Authors | Liang Yao, Chengsheng Mao, Yuan Luo |
Abstract | Knowledge graphs are important resources for many artificial intelligence tasks but often suffer from incompleteness. In this work, we propose to use pre-trained language models for knowledge graph completion. We treat triples in knowledge graphs as textual sequences and propose a novel framework named Knowledge Graph Bidirectional Encoder Representations from Transformer (KG-BERT) to model these triples. Our method takes the entity and relation descriptions of a triple as input and computes the scoring function of the triple with the KG-BERT language model. Experimental results on multiple benchmark knowledge graphs show that our method can achieve state-of-the-art performance in triple classification, link prediction and relation prediction tasks. |
Tasks | Knowledge Graph Completion, Knowledge Graphs, Language Modelling, Link Prediction |
Published | 2019-09-07 |
URL | https://arxiv.org/abs/1909.03193v2 |
PDF | https://arxiv.org/pdf/1909.03193v2.pdf |
PWC | https://paperswithcode.com/paper/kg-bert-bert-for-knowledge-graph-completion |
Repo | https://github.com/ManasRMohanty/DS5500-capstone |
Framework | none |
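KG-BERT scores a triple by feeding the textual descriptions of its head, relation and tail to a BERT sequence classifier. The sketch below shows that setup with Hugging Face Transformers, with two simplifications: the relation and tail are packed into the second segment (the paper uses three separated segments), and the classification head is untrained, so the printed score is illustrative only.

```python
# Sketch of KG-BERT-style triple scoring with Hugging Face Transformers.
# Simplification: relation and tail descriptions share the second segment,
# and the classification head below is untrained, so the score is not
# meaningful until the model is fine-tuned on labelled triples.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

head = "Steve Jobs: co-founder of Apple Inc."
relation = "founded"
tail = "Apple Inc.: American technology company"

inputs = tokenizer(head, f"{relation} [SEP] {tail}", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
plausibility = torch.softmax(logits, dim=-1)[0, 1].item()  # P(triple is valid)
print(f"plausibility (untrained head): {plausibility:.3f}")
```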
Towards VQA Models That Can Read
Title | Towards VQA Models That Can Read |
Authors | Amanpreet Singh, Vivek Natarajan, Meet Shah, Yu Jiang, Xinlei Chen, Dhruv Batra, Devi Parikh, Marcus Rohrbach |
Abstract | Studies have shown that a dominant class of questions asked by visually impaired users on images of their surroundings involves reading text in the image. But today’s VQA models cannot read! Our paper takes a first step towards addressing this problem. First, we introduce a new “TextVQA” dataset to facilitate progress on this important problem. Existing datasets either have a small proportion of questions about text (e.g., the VQA dataset) or are too small (e.g., the VizWiz dataset). TextVQA contains 45,336 questions on 28,408 images that require reasoning about text to answer. Second, we introduce a novel model architecture that reads text in the image, reasons about it in the context of the image and the question, and predicts an answer which might be a deduction based on the text and the image or composed of the strings found in the image. Consequently, we call our approach Look, Read, Reason & Answer (LoRRA). We show that LoRRA outperforms existing state-of-the-art VQA models on our TextVQA dataset. We find that the gap between human performance and machine performance is significantly larger on TextVQA than on VQA 2.0, suggesting that TextVQA is well-suited to benchmark progress along directions complementary to VQA 2.0. |
Tasks | Visual Question Answering |
Published | 2019-04-18 |
URL | https://arxiv.org/abs/1904.08920v2 |
PDF | https://arxiv.org/pdf/1904.08920v2.pdf |
PWC | https://paperswithcode.com/paper/towards-vqa-models-that-can-read |
Repo | https://github.com/xinke-wang/Awesome-Text-VQA |
Framework | none |
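A central ingredient described above is letting the answer be either a fixed-vocabulary entry or a string copied from the OCR tokens detected in the image. The toy module below sketches that copy-style answer-space extension; all dimensions are arbitrary and this is not the released LoRRA implementation.

```python
# Toy illustration of the answer-space extension used by reading-aware VQA
# models: scores over a fixed answer vocabulary are concatenated with scores
# over the OCR tokens detected in the image, so the predicted answer can be
# copied from the image text. This is not the released LoRRA implementation.
import torch
import torch.nn as nn

class ToyCopyHead(nn.Module):
    def __init__(self, fused_dim=256, ocr_dim=256, vocab_size=3000):
        super().__init__()
        self.vocab_scores = nn.Linear(fused_dim, vocab_size)
        self.ocr_proj = nn.Linear(ocr_dim, fused_dim)

    def forward(self, fused, ocr_feats):
        # fused: (B, fused_dim) question+image representation
        # ocr_feats: (B, num_ocr, ocr_dim) features of detected OCR tokens
        fixed = self.vocab_scores(fused)                                            # (B, vocab)
        copy = torch.bmm(self.ocr_proj(ocr_feats), fused.unsqueeze(2)).squeeze(2)   # (B, num_ocr)
        return torch.cat([fixed, copy], dim=1)                                      # (B, vocab + num_ocr)

head = ToyCopyHead()
scores = head(torch.rand(2, 256), torch.rand(2, 10, 256))
print(scores.shape)  # torch.Size([2, 3010]); indices >= 3000 select an OCR token
```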
ELG: An Event Logic Graph
Title | ELG: An Event Logic Graph |
Authors | Xiao Ding, Zhongyang Li, Ting Liu, Kuo Liao |
Abstract | The evolution and development of events have their own basic principles, which make events happen sequentially. Therefore, the discovery of such evolutionary patterns among events is of great value for event prediction, decision-making and scenario design of dialog systems. However, conventional knowledge graphs mainly focus on entities and their relations, and neglect real-world events. In this paper, we present a novel type of knowledge base - the Event Logic Graph (ELG) - which can reveal evolutionary patterns and development logics of real-world events. Specifically, an ELG is a directed cyclic graph whose nodes are events and whose edges stand for the sequential, causal, conditional or hypernym-hyponym (is-a) relations between events. We constructed two domain-specific ELGs: a financial-domain ELG consisting of more than 1.5 million event nodes and more than 1.8 million directed edges, and a travel-domain ELG consisting of about 30 thousand event nodes and more than 234 thousand directed edges. Experimental results show that the ELG is effective for the task of script event prediction. |
Tasks | Decision Making |
Published | 2019-07-18 |
URL | https://arxiv.org/abs/1907.08015v2 |
PDF | https://arxiv.org/pdf/1907.08015v2.pdf |
PWC | https://paperswithcode.com/paper/elg-an-event-logic-graph |
Repo | https://github.com/shengyp/Temporal-and-Evolving-KG |
Framework | none |
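An event logic graph can be prototyped as a directed graph whose edges carry a relation type and, optionally, a transition probability. The sketch below uses networkx with invented travel-domain events to show how a simple script-event-prediction query might look; it is an illustration, not the authors' construction pipeline.

```python
# Minimal sketch of an event logic graph: nodes are abstract events, edges
# carry a relation type (sequential, causal, conditional, is-a) and, where
# useful, a transition probability. The events below are invented examples.
import networkx as nx

elg = nx.DiGraph()
elg.add_edge("book flight", "check in", relation="sequential", prob=0.8)
elg.add_edge("check in", "board plane", relation="sequential", prob=0.9)
elg.add_edge("flight delayed", "miss connection", relation="causal", prob=0.4)
elg.add_edge("book flight", "travel event", relation="is-a")

def predict_next(graph, event, relation="sequential"):
    """Rank candidate follow-up events of a given relation type."""
    candidates = [
        (succ, data.get("prob", 0.0))
        for succ, data in graph[event].items()
        if data["relation"] == relation
    ]
    return sorted(candidates, key=lambda x: x[1], reverse=True)

print(predict_next(elg, "book flight"))  # [('check in', 0.8)]
```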
Presence-Only Geographical Priors for Fine-Grained Image Classification
Title | Presence-Only Geographical Priors for Fine-Grained Image Classification |
Authors | Oisin Mac Aodha, Elijah Cole, Pietro Perona |
Abstract | Appearance information alone is often not sufficient to accurately differentiate between fine-grained visual categories. Human experts make use of additional cues such as where, and when, a given image was taken in order to inform their final decision. This contextual information is readily available in many online image collections but has been underutilized by existing image classifiers that focus solely on making predictions based on the image contents. We propose an efficient spatio-temporal prior that, when conditioned on a geographical location and time, estimates the probability that a given object category occurs at that location. Our prior is trained from presence-only observation data and jointly models object categories, their spatio-temporal distributions, and photographer biases. Experiments performed on multiple challenging image classification datasets show that combining our prior with the predictions from image classifiers results in a large improvement in final classification performance. |
Tasks | Fine-Grained Image Classification, Image Classification |
Published | 2019-06-12 |
URL | https://arxiv.org/abs/1906.05272v3 |
PDF | https://arxiv.org/pdf/1906.05272v3.pdf |
PWC | https://paperswithcode.com/paper/presence-only-geographical-priors-for-fine |
Repo | https://github.com/visipedia/fg_geo |
Framework | none |
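At test time, a prior of this kind is combined with the image classifier by multiplying the classifier's class probabilities with the prior's per-class presence probabilities and renormalizing. The snippet below shows that combination on made-up numbers; the real prior is a trained location/time encoder, not a hand-written table.

```python
# Combining an image classifier with a geographical prior at test time:
# the final score is the product of the classifier's class probabilities
# and the prior's probability that each category occurs at the photo's
# location/time. The prior values below are made up; the paper learns them
# from presence-only observations with a trainable location encoder.
import numpy as np

classes = ["common loon", "arctic tern", "house sparrow"]
p_image = np.array([0.45, 0.40, 0.15])        # classifier softmax for one photo
p_location = np.array([0.70, 0.05, 0.60])     # prior: P(category present | lat, lon, date)

combined = p_image * p_location
combined /= combined.sum()                    # renormalize over categories
for name, p in zip(classes, combined):
    print(f"{name}: {p:.3f}")
# The geographically implausible "arctic tern" is strongly down-weighted.
```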
Approximate Bayesian Computation with the Sliced-Wasserstein Distance
Title | Approximate Bayesian Computation with the Sliced-Wasserstein Distance |
Authors | Kimia Nadjahi, Valentin De Bortoli, Alain Durmus, Roland Badeau, Umut Şimşekli |
Abstract | Approximate Bayesian Computation (ABC) is a popular method for approximate inference in generative models with intractable but easy-to-sample likelihood. It constructs an approximate posterior distribution by finding parameters for which the simulated data are close to the observations in terms of summary statistics. These statistics are defined beforehand and might induce a loss of information, which has been shown to deteriorate the quality of the approximation. To overcome this problem, Wasserstein-ABC has recently been proposed; it compares the datasets via the Wasserstein distance between their empirical distributions, but does not scale well with the dimension or the number of samples. We propose a new ABC technique, called Sliced-Wasserstein ABC, based on the Sliced-Wasserstein distance, which has better computational and statistical properties. We derive two theoretical results showing the asymptotic consistency of our approach, and we illustrate its advantages on synthetic data and an image denoising task. |
Tasks | Denoising, Image Denoising |
Published | 2019-10-28 |
URL | https://arxiv.org/abs/1910.12815v2 |
PDF | https://arxiv.org/pdf/1910.12815v2.pdf |
PWC | https://paperswithcode.com/paper/approximate-bayesian-computation-with-the |
Repo | https://github.com/kimiandj/slicedwass_abc |
Framework | none |
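The Sliced-Wasserstein distance averages 1D Wasserstein distances over random projections, which for equal-size samples reduces to comparing sorted projections. The sketch below plugs a Monte Carlo estimate of it into plain ABC rejection sampling on a toy 2D Gaussian model; the prior, tolerance and projection count are illustrative choices, not the paper's experimental settings.

```python
# Sketch of ABC rejection sampling with a Monte Carlo estimate of the
# Sliced-Wasserstein distance: project both datasets onto random directions
# and average the 1D Wasserstein distances (computed from sorted samples).
# The Gaussian toy model, prior and tolerance below are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def sliced_wasserstein(x, y, n_projections=50):
    d = x.shape[1]
    thetas = rng.normal(size=(n_projections, d))
    thetas /= np.linalg.norm(thetas, axis=1, keepdims=True)
    dists = []
    for theta in thetas:
        px, py = np.sort(x @ theta), np.sort(y @ theta)
        dists.append(np.mean(np.abs(px - py)))   # 1D W1 between equal-size samples
    return float(np.mean(dists))

# Observed data from a 2D Gaussian with mean (1, -1), unknown to the sampler.
observed = rng.normal(loc=[1.0, -1.0], size=(200, 2))

accepted = []
for _ in range(2000):
    mu = rng.uniform(-3, 3, size=2)              # prior over the mean
    simulated = rng.normal(loc=mu, size=(200, 2))
    if sliced_wasserstein(observed, simulated) < 0.5:   # tolerance epsilon
        accepted.append(mu)

if accepted:
    print(len(accepted), "accepted; posterior mean estimate:", np.mean(accepted, axis=0))
```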
Collaborative Evolutionary Reinforcement Learning
Title | Collaborative Evolutionary Reinforcement Learning |
Authors | Shauharda Khadka, Somdeb Majumdar, Tarek Nassar, Zach Dwiel, Evren Tumer, Santiago Miret, Yinyin Liu, Kagan Tumer |
Abstract | Deep reinforcement learning algorithms have been successfully applied to a range of challenging control tasks. However, these methods typically struggle with achieving effective exploration and are extremely sensitive to the choice of hyperparameters. One reason is that most approaches use a noisy version of their operating policy to explore - thereby limiting the range of exploration. In this paper, we introduce Collaborative Evolutionary Reinforcement Learning (CERL), a scalable framework that comprises a portfolio of policies that simultaneously explore and exploit diverse regions of the solution space. A collection of learners - typically proven algorithms like TD3 - optimize over varying time horizons leading to this diverse portfolio. All learners contribute to and use a shared replay buffer to achieve greater sample efficiency. Computational resources are dynamically distributed to favor the best learners as a form of online algorithm selection. Neuroevolution binds this entire process to generate a single emergent learner that exceeds the capabilities of any individual learner. Experiments in a range of continuous control benchmarks demonstrate that the emergent learner significantly outperforms its composite learners while remaining overall more sample-efficient - notably solving the MuJoCo Humanoid benchmark where all of its composite learners (TD3) fail entirely in isolation. |
Tasks | Continuous Control |
Published | 2019-05-02 |
URL | https://arxiv.org/abs/1905.00976v2 |
PDF | https://arxiv.org/pdf/1905.00976v2.pdf |
PWC | https://paperswithcode.com/paper/collaborative-evolutionary-reinforcement |
Repo | https://github.com/intelai/cerl |
Framework | pytorch |
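The sketch below is a structural toy of the resource-allocation loop described above: learners with different discount factors (time horizons) share one replay buffer, and the rollout budget shifts toward learners with higher recent returns. The environment, returns and "learners" are random stubs; no TD3 or neuroevolution is actually run.

```python
# Structural sketch of CERL-style resource allocation: learners with different
# time horizons (discount factors) share a replay buffer, and rollout budget
# is redistributed toward learners with higher recent returns. The policies
# and environment here are random stubs; this is not a working CERL trainer.
import math
import random
from collections import deque

random.seed(0)
shared_replay = deque(maxlen=10_000)
learners = [{"gamma": g, "recent_return": 0.0} for g in (0.9, 0.99, 0.997, 0.9995)]

def softmax(values, temperature=2.0):
    m = max(values)
    exps = [math.exp((v - m) / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def rollout(learner):
    """Stub rollout: pushes fake transitions into the shared buffer and
    returns a noisy score loosely tied to the learner's gamma (illustrative)."""
    for _ in range(20):
        shared_replay.append(("state", "action", random.random(), "next_state"))
    return random.gauss(mu=10 * learner["gamma"], sigma=1.0)

for generation in range(5):
    weights = softmax([l["recent_return"] for l in learners])
    budget = [max(1, round(w * 16)) for w in weights]   # 16 rollout workers per generation
    for learner, n_rollouts in zip(learners, budget):
        returns = [rollout(learner) for _ in range(n_rollouts)]
        learner["recent_return"] = sum(returns) / len(returns)
    print(f"gen {generation}: budget={budget}, shared buffer size={len(shared_replay)}")
```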
PiiGAN: Generative Adversarial Networks for Pluralistic Image Inpainting
Title | PiiGAN: Generative Adversarial Networks for Pluralistic Image Inpainting |
Authors | Weiwei Cai, Zhanguo Wei |
Abstract | The latest deep-learning-based methods have achieved impressive results on the difficult task of inpainting large missing areas in an image. However, this type of method generally attempts to generate one single “optimal” result, ignoring many other plausible results. Considering the uncertainty of the inpainting task, one sole result can hardly be regarded as a desired regeneration of the missing area. In view of this weakness, which stems from the design of previous algorithms, we propose a novel deep generative model equipped with a new style extractor which can extract the style feature (latent vector) from the ground truth. Once obtained, the extracted style feature and the ground truth are both input into the generator. We also craft a consistency loss that guides the generated image to approximate the ground truth. After training, our generator is able to learn the mapping of styles corresponding to multiple sets of vectors. The proposed model can generate a large number of results consistent with the context semantics of the image. Moreover, we evaluated the effectiveness of our model on three datasets, i.e., CelebA, PlantVillage, and MauFlex. Compared to state-of-the-art inpainting methods, this model is able to offer desirable inpainting results with both better quality and higher diversity. The code and model will be made available at https://github.com/vivitsai/PiiGAN. |
Tasks | Image Inpainting |
Published | 2019-12-04 |
URL | https://arxiv.org/abs/1912.01834v2 |
PDF | https://arxiv.org/pdf/1912.01834v2.pdf |
PWC | https://paperswithcode.com/paper/diversity-generated-image-inpainting-with |
Repo | https://github.com/vivitsai/SEGAN |
Framework | tf |
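The snippet below is a toy forward pass illustrating the roles of the style extractor and the consistency loss described above: the extractor maps the ground truth to a latent style vector, the generator fills the masked image conditioned on it, and the loss pulls the output (and its re-extracted style) back toward the ground truth. The tiny modules are placeholders, not the released PiiGAN code.

```python
# Toy forward pass for a pluralistic-inpainting setup: a style extractor maps
# the ground truth to a latent vector, the generator fills the masked image
# conditioned on that vector, and a consistency loss pulls the output toward
# the ground truth. These tiny modules are illustrative, not the PiiGAN code.
import torch
import torch.nn as nn

style_dim = 8
extractor = nn.Sequential(                       # image -> style vector
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, style_dim),
)
generator = nn.Sequential(                       # masked image + broadcast style -> image
    nn.Conv2d(3 + style_dim, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 3, 3, padding=1), nn.Sigmoid(),
)

ground_truth = torch.rand(1, 3, 64, 64)
mask = torch.ones_like(ground_truth)
mask[:, :, 16:48, 16:48] = 0.0                   # hole in the centre
masked = ground_truth * mask

style = extractor(ground_truth)                                  # (1, style_dim)
style_map = style[:, :, None, None].expand(-1, -1, 64, 64)       # broadcast spatially
output = generator(torch.cat([masked, style_map], dim=1))

l1 = nn.L1Loss()
consistency_loss = l1(output, ground_truth) + l1(extractor(output), style)
print(float(consistency_loss))
```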
Instance-Level Meta Normalization
Title | Instance-Level Meta Normalization |
Authors | Songhao Jia, Ding-Jie Chen, Hwann-Tzong Chen |
Abstract | This paper presents a normalization mechanism called Instance-Level Meta Normalization (ILM Norm) to address a learning-to-normalize problem. ILM Norm learns to predict the normalization parameters via both the feature feed-forward and the gradient back-propagation paths. ILM Norm provides a meta normalization mechanism and has several good properties. It can be easily plugged into existing instance-level normalization schemes such as Instance Normalization, Layer Normalization, or Group Normalization. ILM Norm normalizes each instance individually and therefore maintains high performance even when a small mini-batch is used. The experimental results show that ILM Norm adapts well to different network architectures and tasks, and it consistently improves the performance of the original models. The code is available at https://github.com/Gasoonjia/ILM-Norm. |
Tasks | |
Published | 2019-04-06 |
URL | http://arxiv.org/abs/1904.03516v1 |
PDF | http://arxiv.org/pdf/1904.03516v1.pdf |
PWC | https://paperswithcode.com/paper/instance-level-meta-normalization |
Repo | https://github.com/Gasoonjia/ILM-Norm |
Framework | pytorch |
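The sketch below illustrates the general learning-to-normalize idea: instance-normalize the features without fixed affine parameters and let a tiny MLP predict per-channel scale and shift from the instance's own pooled statistics. It conveys how such a layer plugs into an instance-level scheme, but it is not the authors' exact ILM Norm design.

```python
# Sketch of an instance-level "learning to normalize" layer: the features are
# instance-normalized without fixed affine parameters, and a tiny MLP predicts
# per-channel scale and shift from the instance's own pooled statistics.
# This illustrates the idea of ILM Norm but is not the authors' exact design.
import torch
import torch.nn as nn

class MetaInstanceNorm(nn.Module):
    def __init__(self, channels, hidden=16):
        super().__init__()
        self.norm = nn.InstanceNorm2d(channels, affine=False)
        self.predictor = nn.Sequential(          # pooled stats -> (gamma, beta)
            nn.Linear(2 * channels, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * channels),
        )

    def forward(self, x):                        # x: (B, C, H, W)
        flat = x.flatten(2)                      # (B, C, H*W)
        stats = torch.cat([flat.mean(dim=2), flat.std(dim=2)], dim=1)
        gamma, beta = self.predictor(stats).chunk(2, dim=1)     # each (B, C)
        out = self.norm(x)
        return (1 + gamma)[:, :, None, None] * out + beta[:, :, None, None]

layer = MetaInstanceNorm(channels=8)
print(layer(torch.rand(4, 8, 16, 16)).shape)     # torch.Size([4, 8, 16, 16])
```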
DocRED: A Large-Scale Document-Level Relation Extraction Dataset
Title | DocRED: A Large-Scale Document-Level Relation Extraction Dataset |
Authors | Yuan Yao, Deming Ye, Peng Li, Xu Han, Yankai Lin, Zhenghao Liu, Zhiyuan Liu, Lixin Huang, Jie Zhou, Maosong Sun |
Abstract | Multiple entities in a document generally exhibit complex inter-sentence relations, and cannot be well handled by existing relation extraction (RE) methods that typically focus on extracting intra-sentence relations for single entity pairs. In order to accelerate the research on document-level RE, we introduce DocRED, a new dataset constructed from Wikipedia and Wikidata with three features: (1) DocRED annotates both named entities and relations, and is the largest human-annotated dataset for document-level RE from plain text; (2) DocRED requires reading multiple sentences in a document to extract entities and infer their relations by synthesizing all information of the document; (3) along with the human-annotated data, we also offer large-scale distantly supervised data, which enables DocRED to be adopted for both supervised and weakly supervised scenarios. In order to verify the challenges of document-level RE, we implement recent state-of-the-art methods for RE and conduct a thorough evaluation of these methods on DocRED. Empirical results show that DocRED is challenging for existing RE methods, which indicates that document-level RE remains an open problem and requires further efforts. Based on the detailed analysis on the experiments, we discuss multiple promising directions for future research. |
Tasks | Relation Extraction |
Published | 2019-06-14 |
URL | https://arxiv.org/abs/1906.06127v3 |
PDF | https://arxiv.org/pdf/1906.06127v3.pdf |
PWC | https://paperswithcode.com/paper/docred-a-large-scale-document-level-relation |
Repo | https://github.com/thunlp/DocRED |
Framework | pytorch |
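A typical first step with DocRED is checking how many relation instances actually span sentences. The sketch below counts intra- versus inter-sentence instances, assuming the JSON field names of the public release ('vertexSet', 'labels', 'h', 't', 'sent_id'); verify them against the downloaded files before relying on the numbers.

```python
# Sketch of counting intra- vs. inter-sentence relation instances in a
# DocRED-style file. Field names ('vertexSet', 'labels', 'h', 't', 'sent_id')
# follow my reading of the public release and should be checked against the
# actual data before use.
import json

def intra_inter_counts(path):
    with open(path, encoding="utf-8") as f:
        documents = json.load(f)
    intra = inter = 0
    for doc in documents:
        entities = doc["vertexSet"]              # each entity: list of mentions
        for label in doc["labels"]:
            head_sents = {m["sent_id"] for m in entities[label["h"]]}
            tail_sents = {m["sent_id"] for m in entities[label["t"]]}
            if head_sents & tail_sents:          # some mention pair shares a sentence
                intra += 1
            else:
                inter += 1                       # requires reading multiple sentences
    return intra, inter

# Example usage (the path is hypothetical):
# intra, inter = intra_inter_counts("train_annotated.json")
# print(intra, inter)
```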
Accurate, reliable and fast robustness evaluation
Title | Accurate, reliable and fast robustness evaluation |
Authors | Wieland Brendel, Jonas Rauber, Matthias Kümmerer, Ivan Ustyuzhaninov, Matthias Bethge |
Abstract | Throughout the past five years, the susceptibility of neural networks to minimal adversarial perturbations has moved from a peculiar phenomenon to a core issue in Deep Learning. Despite much attention, however, progress towards more robust models is significantly impaired by the difficulty of evaluating the robustness of neural network models. Today’s methods are either fast but brittle (gradient-based attacks), or they are fairly reliable but slow (score- and decision-based attacks). We here develop a new set of gradient-based adversarial attacks which (a) are more reliable in the face of gradient-masking than other gradient-based attacks, (b) perform better and are more query efficient than current state-of-the-art gradient-based attacks, (c) can be flexibly adapted to a wide range of adversarial criteria and (d) require virtually no hyperparameter tuning. These findings are carefully validated across a diverse set of six different models and hold for L0, L1, L2 and Linf in both targeted as well as untargeted scenarios. Implementations will soon be available in all major toolboxes (Foolbox, CleverHans and ART). We hope that this class of attacks will make robustness evaluations easier and more reliable, thus contributing to more signal in the search for more robust machine learning models. |
Tasks | |
Published | 2019-07-01 |
URL | https://arxiv.org/abs/1907.01003v2 |
PDF | https://arxiv.org/pdf/1907.01003v2.pdf |
PWC | https://paperswithcode.com/paper/accurate-reliable-and-fast-robustness |
Repo | https://github.com/wielandbrendel/brendel_bethge_attack |
Framework | none |
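The attacks themselves are distributed through toolboxes such as Foolbox, so rather than guess at their APIs, the sketch below shows a plain L2 projected-gradient (PGD) attack in PyTorch. It illustrates the generic gradient-based attack setup the abstract discusses; it is explicitly not the Brendel-Bethge attack.

```python
# Generic L2 projected-gradient attack in PyTorch, shown only to illustrate
# the basic gradient-based attack setup the abstract discusses; this is plain
# PGD, not the Brendel-Bethge attacks (those ship with Foolbox and friends).
import torch
import torch.nn as nn

def pgd_l2(model, x, y, epsilon=1.0, step_size=0.2, steps=10):
    """Untargeted L2 PGD: maximise cross-entropy within an L2 ball of radius epsilon."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = nn.functional.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        grad = grad / (grad.flatten(1).norm(dim=1).view(-1, 1, 1, 1) + 1e-12)
        x_adv = x_adv.detach() + step_size * grad
        delta = x_adv - x
        norms = delta.flatten(1).norm(dim=1).view(-1, 1, 1, 1)
        delta = delta * torch.clamp(epsilon / (norms + 1e-12), max=1.0)  # project to L2 ball
        x_adv = torch.clamp(x + delta, 0.0, 1.0)                         # keep valid pixels
    return x_adv.detach()

# Toy usage with an untrained classifier on random "images".
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
images, labels = torch.rand(4, 3, 32, 32), torch.randint(0, 10, (4,))
adversarial = pgd_l2(model, images, labels)
print((adversarial - images).flatten(1).norm(dim=1))  # perturbation norms <= epsilon
```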
VIBE: Video Inference for Human Body Pose and Shape Estimation
Title | VIBE: Video Inference for Human Body Pose and Shape Estimation |
Authors | Muhammed Kocabas, Nikos Athanasiou, Michael J. Black |
Abstract | Human motion is fundamental to understanding behavior. Despite progress on single-image 3D pose and shape estimation, existing video-based state-of-the-art methods fail to produce accurate and natural motion sequences due to a lack of ground-truth 3D motion data for training. To address this problem, we propose Video Inference for Body Pose and Shape Estimation (VIBE), which makes use of an existing large-scale motion capture dataset (AMASS) together with unpaired, in-the-wild, 2D keypoint annotations. Our key novelty is an adversarial learning framework that leverages AMASS to discriminate between real human motions and those produced by our temporal pose and shape regression networks. We define a temporal network architecture and show that adversarial training, at the sequence level, produces kinematically plausible motion sequences without in-the-wild ground-truth 3D labels. We perform extensive experimentation to analyze the importance of motion and demonstrate the effectiveness of VIBE on challenging 3D pose estimation datasets, achieving state-of-the-art performance. Code and pretrained models are available at https://github.com/mkocabas/VIBE. |
Tasks | 3D Human Pose Estimation, 3D Pose Estimation, Motion Capture, Pose Estimation |
Published | 2019-12-11 |
URL | https://arxiv.org/abs/1912.05656v2 |
PDF | https://arxiv.org/pdf/1912.05656v2.pdf |
PWC | https://paperswithcode.com/paper/vibe-video-inference-for-human-body-pose-and |
Repo | https://github.com/mkocabas/VIBE |
Framework | pytorch |
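The sketch below mocks up the two components named in the abstract: a GRU temporal regressor that maps per-frame image features to pose/shape parameters, and a motion discriminator that scores whether a pose sequence looks like real motion (trained adversarially against AMASS in the paper). All dimensions are illustrative and this is not the released VIBE model.

```python
# Toy versions of the two components described above: a GRU temporal encoder
# that regresses per-frame pose/shape parameters from image features, and a
# motion discriminator that scores pose sequences (adversarially trained
# against real mocap in the paper). Dimensions are illustrative, not VIBE's.
import torch
import torch.nn as nn

class TemporalRegressor(nn.Module):
    def __init__(self, feat_dim=2048, hidden=256, pose_dim=85):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, pose_dim)

    def forward(self, frame_features):             # (B, T, feat_dim)
        hidden_states, _ = self.gru(frame_features)
        return self.head(hidden_states)             # (B, T, pose_dim)

class MotionDiscriminator(nn.Module):
    def __init__(self, pose_dim=85, hidden=128):
        super().__init__()
        self.gru = nn.GRU(pose_dim, hidden, batch_first=True)
        self.score = nn.Linear(hidden, 1)

    def forward(self, pose_sequence):                # (B, T, pose_dim)
        _, last_hidden = self.gru(pose_sequence)     # (1, B, hidden)
        return self.score(last_hidden[-1])           # (B, 1) realism logit

regressor, discriminator = TemporalRegressor(), MotionDiscriminator()
features = torch.rand(2, 16, 2048)                   # 2 clips of 16 frames
predicted_motion = regressor(features)
adv_loss = nn.functional.binary_cross_entropy_with_logits(
    discriminator(predicted_motion), torch.ones(2, 1))  # generator wants "real"
print(predicted_motion.shape, float(adv_loss))
```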
Challenging Environments for Traffic Sign Detection: Reliability Assessment under Inclement Conditions
Title | Challenging Environments for Traffic Sign Detection: Reliability Assessment under Inclement Conditions |
Authors | Dogancan Temel, Tariq Alshawi, Min-Hung Chen, Ghassan AlRegib |
Abstract | State-of-the-art algorithms successfully localize and recognize traffic signs over existing datasets, which are limited in terms of challenging condition type and severity. Therefore, it is not possible to estimate the performance of traffic sign detection algorithms under overlooked challenging conditions. Another shortcoming of existing datasets is the limited utilization of temporal information and the unavailability of consecutive frames and annotations. To overcome these shortcomings, we generated the CURE-TSD video dataset and hosted the first IEEE Video and Image Processing (VIP) Cup within the IEEE Signal Processing Society. In this paper, we provide a detailed description of the CURE-TSD dataset, analyze the characteristics of the top performing algorithms, and provide a performance benchmark. Moreover, we investigate the robustness of the benchmarked algorithms with respect to sign size, challenge type and severity. Benchmarked algorithms are based on state-of-the-art and custom convolutional neural networks, which achieved a precision of 0.55, a recall of 0.32, an F0.5 score of 0.48, and an F2 score of 0.35. Experimental results show that benchmarked algorithms are highly sensitive to the tested challenging conditions, which result in an average performance drop of 0.17 in precision and a performance drop of 0.28 in recall under severe conditions. The dataset is publicly available at https://github.com/olivesgatech/CURE-TSD. |
Tasks | |
Published | 2019-02-19 |
URL | https://arxiv.org/abs/1902.06857v2 |
PDF | https://arxiv.org/pdf/1902.06857v2.pdf |
PWC | https://paperswithcode.com/paper/challenging-environments-for-traffic-sign |
Repo | https://github.com/olivesgatech/CURE-TSR |
Framework | pytorch |
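The F-scores quoted in the abstract follow from the reported precision and recall via the standard F-beta formula, as the quick check below confirms.

```python
# Quick check that the F-scores quoted in the abstract follow from the
# reported precision (0.55) and recall (0.32) via the standard F-beta formula
#   F_beta = (1 + beta^2) * P * R / (beta^2 * P + R)
def f_beta(precision, recall, beta):
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)

p, r = 0.55, 0.32
print(f"F0.5 = {f_beta(p, r, 0.5):.2f}")  # 0.48, as reported
print(f"F2   = {f_beta(p, r, 2.0):.2f}")  # 0.35, as reported
```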
Empirical Upper Bound in Object Detection and More
Title | Empirical Upper Bound in Object Detection and More |
Authors | Ali Borji, Seyed Mehdi Iranmanesh |
Abstract | Object detection remains one of the most notorious open problems in computer vision. Despite large strides in accuracy in recent years, modern object detectors have started to saturate on popular benchmarks, raising the question of how far we can reach with deep learning tools and tricks. Here, by employing 2 state-of-the-art object detection benchmarks, and analyzing more than 15 models over 4 large scale datasets, we I) carefully determine the upper bound in AP, which is 91.6% on VOC (test2007), 78.2% on COCO (val2017), and 58.9% on OpenImages V4 (validation), regardless of the IOU. These numbers are much better than the mAP of the best model (47.9% on VOC, and 46.9% on COCO; IOUs=.5:.95), II) characterize the sources of errors in object detectors, in a novel and intuitive way, and find that classification error (confusion with other classes and misses) explains the largest fraction of errors and weighs more than localization and duplicate errors, and III) analyze the invariance properties of models when the surrounding context of an object is removed, when an object is placed in an incongruent background, and when images are blurred or flipped vertically. We find that models generate boxes on empty regions and that context is more important for detecting small objects than larger ones. Our work taps into the tight relationship between recognition and detection and offers insights for building better models. |
Tasks | Object Detection |
Published | 2019-11-27 |
URL | https://arxiv.org/abs/1911.12451v3 |
PDF | https://arxiv.org/pdf/1911.12451v3.pdf |
PWC | https://paperswithcode.com/paper/empirical-upper-bound-in-object-detection-and |
Repo | https://github.com/aliborji/DeetctionUpperbound |
Framework | pytorch |
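Point (II) above hinges on assigning each detection to an error type. The sketch below is a deliberately simplified version of that bookkeeping (it ignores duplicates and missed ground truths): each detection is labelled by its IoU and class agreement with the ground-truth boxes; the thresholds and toy boxes are illustrative, not the paper's exact protocol.

```python
# Simplified sketch of an error breakdown for object detectors: each detection
# is labelled correct, localization error, classification error (confusion),
# or background false positive, based on IoU and class agreement with ground
# truth. Thresholds and the toy boxes are illustrative, not the paper's setup.
def iou(a, b):
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def categorize(detection, ground_truths, t_fg=0.5, t_bg=0.1):
    cls, box = detection
    best_iou, best_cls = 0.0, None
    for gt_cls, gt_box in ground_truths:
        overlap = iou(box, gt_box)
        if overlap > best_iou:
            best_iou, best_cls = overlap, gt_cls
    if best_iou >= t_fg:
        return "correct" if cls == best_cls else "classification error"
    if best_iou >= t_bg:
        return "localization error" if cls == best_cls else "classification error"
    return "background (false positive)"

ground_truths = [("dog", (10, 10, 60, 60)), ("cat", (70, 70, 120, 120))]
detections = [("dog", (12, 12, 58, 58)), ("cat", (15, 15, 65, 65)), ("dog", (200, 200, 230, 230))]
for det in detections:
    print(det[0], det[1], "->", categorize(det, ground_truths))
```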