Paper Group AWR 427
Benchmarking Natural Language Understanding Services for building Conversational Agents
Title | Benchmarking Natural Language Understanding Services for building Conversational Agents |
Authors | Xingkun Liu, Arash Eshghi, Pawel Swietojanski, Verena Rieser |
Abstract | We have recently seen the emergence of several publicly available Natural Language Understanding (NLU) toolkits, which map user utterances to structured, but more abstract, Dialogue Act (DA) or Intent specifications, while making this process accessible to the lay developer. In this paper, we present the first wide coverage evaluation and comparison of some of the most popular NLU services, on a large, multi-domain (21 domains) dataset of 25K user utterances that we have collected and annotated with Intent and Entity Type specifications and which will be released as part of this submission. The results show that on Intent classification Watson significantly outperforms the other platforms, namely, Dialogflow, LUIS and Rasa; though these also perform well. Interestingly, on Entity Type recognition, Watson performs significantly worse due to its low Precision. Again, Dialogflow, LUIS and Rasa perform well on this task. |
Tasks | Intent Classification |
Published | 2019-03-13 |
URL | http://arxiv.org/abs/1903.05566v3 |
PDF | http://arxiv.org/pdf/1903.05566v3.pdf |
PWC | https://paperswithcode.com/paper/benchmarking-natural-language-understanding |
Repo | https://github.com/xliuhw/NLU-Evaluation-Data |
Framework | none |
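As a quick illustration of how an intent-classification benchmark like the one above can be scored, the sketch below computes per-platform accuracy and macro-F1 with scikit-learn. The utterances, intents and platform names are invented placeholders, not the released 25K-utterance dataset.

```python
# Toy evaluation of intent predictions against gold labels.
# The utterances, intents and platform names below are illustrative only;
# the real benchmark uses the 25K-utterance dataset released by the authors.
from sklearn.metrics import accuracy_score, f1_score

gold = ["set_alarm", "play_music", "weather_query", "set_alarm", "play_music"]
predictions = {
    "platform_a": ["set_alarm", "play_music", "weather_query", "set_alarm", "weather_query"],
    "platform_b": ["set_alarm", "play_music", "play_music", "set_alarm", "play_music"],
}

for platform, pred in predictions.items():
    acc = accuracy_score(gold, pred)
    macro_f1 = f1_score(gold, pred, average="macro")
    print(f"{platform}: accuracy={acc:.2f}  macro-F1={macro_f1:.2f}")
```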
Deep Learning for Multiple-Image Super-Resolution
Title | Deep Learning for Multiple-Image Super-Resolution |
Authors | Michal Kawulok, Pawel Benecki, Szymon Piechaczek, Krzysztof Hrynczenko, Daniel Kostrzewa, Jakub Nalepa |
Abstract | Super-resolution reconstruction (SRR) is a process aimed at enhancing the spatial resolution of images, either from a single observation, based on the learned relation between low and high resolution, or from multiple images presenting the same scene. SRR is particularly valuable if it is infeasible to acquire images at the desired resolution, but many images of the same scene are available at lower resolution—this is inherent to a variety of remote sensing scenarios. Recently, we have witnessed substantial improvement in single-image SRR attributed to the use of deep neural networks for learning the relation between low and high resolution. Importantly, deep learning has not yet been exploited for multiple-image SRR, which benefits from information fusion and in general allows for achieving higher reconstruction accuracy. In this letter, we introduce a new method which combines the advantages of multiple-image fusion with learning the low-to-high resolution mapping using deep networks. The reported experimental results indicate that our algorithm outperforms the state-of-the-art SRR methods, including those that operate from a single image, as well as those that perform multiple-image fusion. |
Tasks | Image Super-Resolution, Super-Resolution |
Published | 2019-03-01 |
URL | http://arxiv.org/abs/1903.00440v1 |
PDF | http://arxiv.org/pdf/1903.00440v1.pdf |
PWC | https://paperswithcode.com/paper/deep-learning-for-multiple-image-super |
Repo | https://github.com/ajinkya933/Image_repo |
Framework | none |
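The abstract above describes fusing several low-resolution observations of the same scene while learning the low-to-high mapping. The toy PyTorch module below is a minimal sketch of that idea, assuming the observations are already co-registered and stacked along the channel axis; it is not the authors' architecture.

```python
# Toy multiple-image super-resolution network (not the paper's architecture):
# it fuses N co-registered low-resolution observations stacked as channels
# and upsamples the fused representation by a factor of 2 with PixelShuffle.
import torch
import torch.nn as nn

class ToyMultiImageSR(nn.Module):
    def __init__(self, num_images=4, features=32, scale=2):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(num_images, features, 3, padding=1), nn.ReLU(),
            nn.Conv2d(features, features, 3, padding=1), nn.ReLU(),
        )
        self.upsample = nn.Sequential(
            nn.Conv2d(features, scale * scale, 3, padding=1),
            nn.PixelShuffle(scale),  # (B, s*s, H, W) -> (B, 1, s*H, s*W)
        )

    def forward(self, lr_stack):          # lr_stack: (B, num_images, H, W)
        return self.upsample(self.fuse(lr_stack))

lr_stack = torch.rand(1, 4, 32, 32)       # four grayscale low-res observations
print(ToyMultiImageSR()(lr_stack).shape)  # torch.Size([1, 1, 64, 64])
```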
KG-BERT: BERT for Knowledge Graph Completion
Title | KG-BERT: BERT for Knowledge Graph Completion |
Authors | Liang Yao, Chengsheng Mao, Yuan Luo |
Abstract | Knowledge graphs are important resources for many artificial intelligence tasks but often suffer from incompleteness. In this work, we propose to use pre-trained language models for knowledge graph completion. We treat triples in knowledge graphs as textual sequences and propose a novel framework named Knowledge Graph Bidirectional Encoder Representations from Transformer (KG-BERT) to model these triples. Our method takes the entity and relation descriptions of a triple as input and computes the scoring function of the triple with the KG-BERT language model. Experimental results on multiple benchmark knowledge graphs show that our method can achieve state-of-the-art performance in triple classification, link prediction and relation prediction tasks. |
Tasks | Knowledge Graph Completion, Knowledge Graphs, Language Modelling, Link Prediction |
Published | 2019-09-07 |
URL | https://arxiv.org/abs/1909.03193v2 |
PDF | https://arxiv.org/pdf/1909.03193v2.pdf |
PWC | https://paperswithcode.com/paper/kg-bert-bert-for-knowledge-graph-completion |
Repo | https://github.com/ManasRMohanty/DS5500-capstone |
Framework | none |
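KG-BERT scores a triple by feeding the textual descriptions of its head, relation and tail to a BERT sequence classifier. The sketch below shows that setup with Hugging Face Transformers, with two simplifications: the relation and tail are packed into the second segment (the paper uses three separated segments), and the classification head is untrained, so the printed score is illustrative only.

```python
# Sketch of KG-BERT-style triple scoring with Hugging Face Transformers.
# Simplification: relation and tail descriptions share the second segment,
# and the classification head below is untrained, so the score is not
# meaningful until the model is fine-tuned on labelled triples.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

head = "Steve Jobs: co-founder of Apple Inc."
relation = "founded"
tail = "Apple Inc.: American technology company"

inputs = tokenizer(head, f"{relation} [SEP] {tail}", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
plausibility = torch.softmax(logits, dim=-1)[0, 1].item()  # P(triple is valid)
print(f"plausibility (untrained head): {plausibility:.3f}")
```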
Towards VQA Models That Can Read
Title | Towards VQA Models That Can Read |
Authors | Amanpreet Singh, Vivek Natarajan, Meet Shah, Yu Jiang, Xinlei Chen, Dhruv Batra, Devi Parikh, Marcus Rohrbach |
Abstract | Studies have shown that a dominant class of questions asked by visually impaired users on images of their surroundings involves reading text in the image. But today’s VQA models cannot read! Our paper takes a first step towards addressing this problem. First, we introduce a new “TextVQA” dataset to facilitate progress on this important problem. Existing datasets either have a small proportion of questions about text (e.g., the VQA dataset) or are too small (e.g., the VizWiz dataset). TextVQA contains 45,336 questions on 28,408 images that require reasoning about text to answer. Second, we introduce a novel model architecture that reads text in the image, reasons about it in the context of the image and the question, and predicts an answer which might be a deduction based on the text and the image or composed of the strings found in the image. Consequently, we call our approach Look, Read, Reason & Answer (LoRRA). We show that LoRRA outperforms existing state-of-the-art VQA models on our TextVQA dataset. We find that the gap between human performance and machine performance is significantly larger on TextVQA than on VQA 2.0, suggesting that TextVQA is well-suited to benchmark progress along directions complementary to VQA 2.0. |
Tasks | Visual Question Answering |
Published | 2019-04-18 |
URL | https://arxiv.org/abs/1904.08920v2 |
PDF | https://arxiv.org/pdf/1904.08920v2.pdf |
PWC | https://paperswithcode.com/paper/towards-vqa-models-that-can-read |
Repo | https://github.com/xinke-wang/Awesome-Text-VQA |
Framework | none |
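A central ingredient described above is letting the answer be either a fixed-vocabulary entry or a string copied from the OCR tokens detected in the image. The toy module below sketches that copy-style answer-space extension; all dimensions are arbitrary and this is not the released LoRRA implementation.

```python
# Toy illustration of the answer-space extension used by reading-aware VQA
# models: scores over a fixed answer vocabulary are concatenated with scores
# over the OCR tokens detected in the image, so the predicted answer can be
# copied from the image text. This is not the released LoRRA implementation.
import torch
import torch.nn as nn

class ToyCopyHead(nn.Module):
    def __init__(self, fused_dim=256, ocr_dim=256, vocab_size=3000):
        super().__init__()
        self.vocab_scores = nn.Linear(fused_dim, vocab_size)
        self.ocr_proj = nn.Linear(ocr_dim, fused_dim)

    def forward(self, fused, ocr_feats):
        # fused: (B, fused_dim) question+image representation
        # ocr_feats: (B, num_ocr, ocr_dim) features of detected OCR tokens
        fixed = self.vocab_scores(fused)                                            # (B, vocab)
        copy = torch.bmm(self.ocr_proj(ocr_feats), fused.unsqueeze(2)).squeeze(2)   # (B, num_ocr)
        return torch.cat([fixed, copy], dim=1)                                      # (B, vocab + num_ocr)

head = ToyCopyHead()
scores = head(torch.rand(2, 256), torch.rand(2, 10, 256))
print(scores.shape)  # torch.Size([2, 3010]); indices >= 3000 select an OCR token
```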
ELG: An Event Logic Graph
Title | ELG: An Event Logic Graph |
Authors | Xiao Ding, Zhongyang Li, Ting Liu, Kuo Liao |
Abstract | The evolution and development of events have their own basic principles, which make events happen sequentially. Therefore, the discovery of such evolutionary patterns among events is of great value for event prediction, decision-making and scenario design of dialog systems. However, conventional knowledge graphs mainly focus on entities and their relations, and neglect real-world events. In this paper, we present a novel type of knowledge base - the Event Logic Graph (ELG) - which can reveal evolutionary patterns and development logics of real-world events. Specifically, an ELG is a directed cyclic graph whose nodes are events and whose edges stand for the sequential, causal, conditional or hypernym-hyponym (is-a) relations between events. We constructed two domain-specific ELGs: a financial-domain ELG consisting of more than 1.5 million event nodes and more than 1.8 million directed edges, and a travel-domain ELG consisting of about 30 thousand event nodes and more than 234 thousand directed edges. Experimental results show that the ELG is effective for the task of script event prediction. |
Tasks | Decision Making |
Published | 2019-07-18 |
URL | https://arxiv.org/abs/1907.08015v2 |
PDF | https://arxiv.org/pdf/1907.08015v2.pdf |
PWC | https://paperswithcode.com/paper/elg-an-event-logic-graph |
Repo | https://github.com/shengyp/Temporal-and-Evolving-KG |
Framework | none |
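An event logic graph can be prototyped as a directed graph whose edges carry a relation type and, optionally, a transition probability. The sketch below uses networkx with invented travel-domain events to show how a simple script-event-prediction query might look; it is an illustration, not the authors' construction pipeline.

```python
# Minimal sketch of an event logic graph: nodes are abstract events, edges
# carry a relation type (sequential, causal, conditional, is-a) and, where
# useful, a transition probability. The events below are invented examples.
import networkx as nx

elg = nx.DiGraph()
elg.add_edge("book flight", "check in", relation="sequential", prob=0.8)
elg.add_edge("check in", "board plane", relation="sequential", prob=0.9)
elg.add_edge("flight delayed", "miss connection", relation="causal", prob=0.4)
elg.add_edge("book flight", "travel event", relation="is-a")

def predict_next(graph, event, relation="sequential"):
    """Rank candidate follow-up events of a given relation type."""
    candidates = [
        (succ, data.get("prob", 0.0))
        for succ, data in graph[event].items()
        if data["relation"] == relation
    ]
    return sorted(candidates, key=lambda x: x[1], reverse=True)

print(predict_next(elg, "book flight"))  # [('check in', 0.8)]
```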
Presence-Only Geographical Priors for Fine-Grained Image Classification
Title | Presence-Only Geographical Priors for Fine-Grained Image Classification |
Authors | Oisin Mac Aodha, Elijah Cole, Pietro Perona |
Abstract | Appearance information alone is often not sufficient to accurately differentiate between fine-grained visual categories. Human experts make use of additional cues such as where, and when, a given image was taken in order to inform their final decision. This contextual information is readily available in many online image collections but has been underutilized by existing image classifiers that focus solely on making predictions based on the image contents. We propose an efficient spatio-temporal prior that, when conditioned on a geographical location and time, estimates the probability that a given object category occurs at that location. Our prior is trained from presence-only observation data and jointly models object categories, their spatio-temporal distributions, and photographer biases. Experiments performed on multiple challenging image classification datasets show that combining our prior with the predictions from image classifiers results in a large improvement in final classification performance. |
Tasks | Fine-Grained Image Classification, Image Classification |
Published | 2019-06-12 |
URL | https://arxiv.org/abs/1906.05272v3 |
PDF | https://arxiv.org/pdf/1906.05272v3.pdf |
PWC | https://paperswithcode.com/paper/presence-only-geographical-priors-for-fine |
Repo | https://github.com/visipedia/fg_geo |
Framework | none |
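At test time, a prior of this kind is combined with the image classifier by multiplying the classifier's class probabilities with the prior's per-class presence probabilities and renormalizing. The snippet below shows that combination on made-up numbers; the real prior is a trained location/time encoder, not a hand-written table.

```python
# Combining an image classifier with a geographical prior at test time:
# the final score is the product of the classifier's class probabilities
# and the prior's probability that each category occurs at the photo's
# location/time. The prior values below are made up; the paper learns them
# from presence-only observations with a trainable location encoder.
import numpy as np

classes = ["common loon", "arctic tern", "house sparrow"]
p_image = np.array([0.45, 0.40, 0.15])        # classifier softmax for one photo
p_location = np.array([0.70, 0.05, 0.60])     # prior: P(category present | lat, lon, date)

combined = p_image * p_location
combined /= combined.sum()                    # renormalize over categories
for name, p in zip(classes, combined):
    print(f"{name}: {p:.3f}")
# The geographically implausible "arctic tern" is strongly down-weighted.
```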
Approximate Bayesian Computation with the Sliced-Wasserstein Distance
Title | Approximate Bayesian Computation with the Sliced-Wasserstein Distance |
Authors | Kimia Nadjahi, Valentin De Bortoli, Alain Durmus, Roland Badeau, Umut Şimşekli |
Abstract | Approximate Bayesian Computation (ABC) is a popular method for approximate inference in generative models with intractable but easy-to-sample likelihood. It constructs an approximate posterior distribution by finding parameters for which the simulated data are close to the observations in terms of summary statistics. These statistics are defined beforehand and might induce a loss of information, which has been shown to deteriorate the quality of the approximation. To overcome this problem, Wasserstein-ABC has recently been proposed; it compares the datasets via the Wasserstein distance between their empirical distributions, but does not scale well with the dimension or the number of samples. We propose a new ABC technique, called Sliced-Wasserstein ABC, based on the Sliced-Wasserstein distance, which has better computational and statistical properties. We derive two theoretical results showing the asymptotic consistency of our approach, and we illustrate its advantages on synthetic data and an image denoising task. |
Tasks | Denoising, Image Denoising |
Published | 2019-10-28 |
URL | https://arxiv.org/abs/1910.12815v2 |
PDF | https://arxiv.org/pdf/1910.12815v2.pdf |
PWC | https://paperswithcode.com/paper/approximate-bayesian-computation-with-the |
Repo | https://github.com/kimiandj/slicedwass_abc |
Framework | none |
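The Sliced-Wasserstein distance averages 1D Wasserstein distances over random projections, which for equal-size samples reduces to comparing sorted projections. The sketch below plugs a Monte Carlo estimate of it into plain ABC rejection sampling on a toy 2D Gaussian model; the prior, tolerance and projection count are illustrative choices, not the paper's experimental settings.

```python
# Sketch of ABC rejection sampling with a Monte Carlo estimate of the
# Sliced-Wasserstein distance: project both datasets onto random directions
# and average the 1D Wasserstein distances (computed from sorted samples).
# The Gaussian toy model, prior and tolerance below are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def sliced_wasserstein(x, y, n_projections=50):
    d = x.shape[1]
    thetas = rng.normal(size=(n_projections, d))
    thetas /= np.linalg.norm(thetas, axis=1, keepdims=True)
    dists = []
    for theta in thetas:
        px, py = np.sort(x @ theta), np.sort(y @ theta)
        dists.append(np.mean(np.abs(px - py)))   # 1D W1 between equal-size samples
    return float(np.mean(dists))

# Observed data from a 2D Gaussian with mean (1, -1), unknown to the sampler.
observed = rng.normal(loc=[1.0, -1.0], size=(200, 2))

accepted = []
for _ in range(2000):
    mu = rng.uniform(-3, 3, size=2)              # prior over the mean
    simulated = rng.normal(loc=mu, size=(200, 2))
    if sliced_wasserstein(observed, simulated) < 0.5:   # tolerance epsilon
        accepted.append(mu)

if accepted:
    print(len(accepted), "accepted; posterior mean estimate:", np.mean(accepted, axis=0))
```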
Collaborative Evolutionary Reinforcement Learning
Title | Collaborative Evolutionary Reinforcement Learning |
Authors | Shauharda Khadka, Somdeb Majumdar, Tarek Nassar, Zach Dwiel, Evren Tumer, Santiago Miret, Yinyin Liu, Kagan Tumer |
Abstract | Deep reinforcement learning algorithms have been successfully applied to a range of challenging control tasks. However, these methods typically struggle with achieving effective exploration and are extremely sensitive to the choice of hyperparameters. One reason is that most approaches use a noisy version of their operating policy to explore - thereby limiting the range of exploration. In this paper, we introduce Collaborative Evolutionary Reinforcement Learning (CERL), a scalable framework that comprises a portfolio of policies that simultaneously explore and exploit diverse regions of the solution space. A collection of learners - typically proven algorithms like TD3 - optimize over varying time horizons leading to this diverse portfolio. All learners contribute to and use a shared replay buffer to achieve greater sample efficiency. Computational resources are dynamically distributed to favor the best learners as a form of online algorithm selection. Neuroevolution binds this entire process to generate a single emergent learner that exceeds the capabilities of any individual learner. Experiments in a range of continuous control benchmarks demonstrate that the emergent learner significantly outperforms its composite learners while remaining overall more sample-efficient - notably solving the MuJoCo Humanoid benchmark where all of its composite learners (TD3) fail entirely in isolation. |
Tasks | Continuous Control |
Published | 2019-05-02 |
URL | https://arxiv.org/abs/1905.00976v2 |
PDF | https://arxiv.org/pdf/1905.00976v2.pdf |
PWC | https://paperswithcode.com/paper/collaborative-evolutionary-reinforcement |
Repo | https://github.com/intelai/cerl |
Framework | pytorch |
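The sketch below is a structural toy of the resource-allocation loop described above: learners with different discount factors (time horizons) share one replay buffer, and the rollout budget shifts toward learners with higher recent returns. The environment, returns and "learners" are random stubs; no TD3 or neuroevolution is actually run.

```python
# Structural sketch of CERL-style resource allocation: learners with different
# time horizons (discount factors) share a replay buffer, and rollout budget
# is redistributed toward learners with higher recent returns. The policies
# and environment here are random stubs; this is not a working CERL trainer.
import math
import random
from collections import deque

random.seed(0)
shared_replay = deque(maxlen=10_000)
learners = [{"gamma": g, "recent_return": 0.0} for g in (0.9, 0.99, 0.997, 0.9995)]

def softmax(values, temperature=2.0):
    m = max(values)
    exps = [math.exp((v - m) / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def rollout(learner):
    """Stub rollout: pushes fake transitions into the shared buffer and
    returns a noisy score loosely tied to the learner's gamma (illustrative)."""
    for _ in range(20):
        shared_replay.append(("state", "action", random.random(), "next_state"))
    return random.gauss(mu=10 * learner["gamma"], sigma=1.0)

for generation in range(5):
    weights = softmax([l["recent_return"] for l in learners])
    budget = [max(1, round(w * 16)) for w in weights]   # 16 rollout workers per generation
    for learner, n_rollouts in zip(learners, budget):
        returns = [rollout(learner) for _ in range(n_rollouts)]
        learner["recent_return"] = sum(returns) / len(returns)
    print(f"gen {generation}: budget={budget}, shared buffer size={len(shared_replay)}")
```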
PiiGAN: Generative Adversarial Networks for Pluralistic Image Inpainting
Title | PiiGAN: Generative Adversarial Networks for Pluralistic Image Inpainting |
Authors | Weiwei Cai, Zhanguo Wei |
Abstract | The latest deep-learning-based methods have achieved impressive results on the difficult task of inpainting large missing areas in an image. However, this type of method generally attempts to generate one single “optimal” result, ignoring many other plausible results. Considering the uncertainty of the inpainting task, one sole result can hardly be regarded as a desired regeneration of the missing area. In view of this weakness, which stems from the design of previous algorithms, we propose a novel deep generative model equipped with a new style extractor which can extract the style feature (latent vector) from the ground truth. Once obtained, the extracted style feature and the ground truth are both input into the generator. We also craft a consistency loss that guides the generated image to approximate the ground truth. After training, our generator is able to learn the mapping of styles corresponding to multiple sets of vectors. The proposed model can generate a large number of results consistent with the context semantics of the image. Moreover, we evaluated the effectiveness of our model on three datasets, i.e., CelebA, PlantVillage, and MauFlex. Compared to state-of-the-art inpainting methods, this model is able to offer desirable inpainting results with both better quality and higher diversity. The code and model will be made available at https://github.com/vivitsai/PiiGAN. |
Tasks | Image Inpainting |
Published | 2019-12-04 |
URL | https://arxiv.org/abs/1912.01834v2 |
PDF | https://arxiv.org/pdf/1912.01834v2.pdf |
PWC | https://paperswithcode.com/paper/diversity-generated-image-inpainting-with |
Repo | https://github.com/vivitsai/SEGAN |
Framework | tf |
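The snippet below is a toy forward pass illustrating the roles of the style extractor and the consistency loss described above: the extractor maps the ground truth to a latent style vector, the generator fills the masked image conditioned on it, and the loss pulls the output (and its re-extracted style) back toward the ground truth. The tiny modules are placeholders, not the released PiiGAN code.

```python
# Toy forward pass for a pluralistic-inpainting setup: a style extractor maps
# the ground truth to a latent vector, the generator fills the masked image
# conditioned on that vector, and a consistency loss pulls the output toward
# the ground truth. These tiny modules are illustrative, not the PiiGAN code.
import torch
import torch.nn as nn

style_dim = 8
extractor = nn.Sequential(                       # image -> style vector
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, style_dim),
)
generator = nn.Sequential(                       # masked image + broadcast style -> image
    nn.Conv2d(3 + style_dim, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 3, 3, padding=1), nn.Sigmoid(),
)

ground_truth = torch.rand(1, 3, 64, 64)
mask = torch.ones_like(ground_truth)
mask[:, :, 16:48, 16:48] = 0.0                   # hole in the centre
masked = ground_truth * mask

style = extractor(ground_truth)                                  # (1, style_dim)
style_map = style[:, :, None, None].expand(-1, -1, 64, 64)       # broadcast spatially
output = generator(torch.cat([masked, style_map], dim=1))

l1 = nn.L1Loss()
consistency_loss = l1(output, ground_truth) + l1(extractor(output), style)
print(float(consistency_loss))
```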
Instance-Level Meta Normalization
Title | Instance-Level Meta Normalization |
Authors | Songhao Jia, Ding-Jie Chen, Hwann-Tzong Chen |
Abstract | This paper presents a normalization mechanism called Instance-Level Meta Normalization (ILM Norm) to address a learning-to-normalize problem. ILM Norm learns to predict the normalization parameters via both the feature feed-forward and the gradient back-propagation paths. ILM Norm provides a meta normalization mechanism and has several good properties. It can be easily plugged into existing instance-level normalization schemes such as Instance Normalization, Layer Normalization, or Group Normalization. ILM Norm normalizes each instance individually and therefore maintains high performance even when a small mini-batch is used. The experimental results show that ILM Norm adapts well to different network architectures and tasks, and it consistently improves the performance of the original models. The code is available at https://github.com/Gasoonjia/ILM-Norm. |
Tasks | |
Published | 2019-04-06 |
URL | http://arxiv.org/abs/1904.03516v1 |
PDF | http://arxiv.org/pdf/1904.03516v1.pdf |
PWC | https://paperswithcode.com/paper/instance-level-meta-normalization |
Repo | https://github.com/Gasoonjia/ILM-Norm |
Framework | pytorch |
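The sketch below illustrates the general learning-to-normalize idea: instance-normalize the features without fixed affine parameters and let a tiny MLP predict per-channel scale and shift from the instance's own pooled statistics. It conveys how such a layer plugs into an instance-level scheme, but it is not the authors' exact ILM Norm design.

```python
# Sketch of an instance-level "learning to normalize" layer: the features are
# instance-normalized without fixed affine parameters, and a tiny MLP predicts
# per-channel scale and shift from the instance's own pooled statistics.
# This illustrates the idea of ILM Norm but is not the authors' exact design.
import torch
import torch.nn as nn

class MetaInstanceNorm(nn.Module):
    def __init__(self, channels, hidden=16):
        super().__init__()
        self.norm = nn.InstanceNorm2d(channels, affine=False)
        self.predictor = nn.Sequential(          # pooled stats -> (gamma, beta)
            nn.Linear(2 * channels, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * channels),
        )

    def forward(self, x):                        # x: (B, C, H, W)
        flat = x.flatten(2)                      # (B, C, H*W)
        stats = torch.cat([flat.mean(dim=2), flat.std(dim=2)], dim=1)
        gamma, beta = self.predictor(stats).chunk(2, dim=1)     # each (B, C)
        out = self.norm(x)
        return (1 + gamma)[:, :, None, None] * out + beta[:, :, None, None]

layer = MetaInstanceNorm(channels=8)
print(layer(torch.rand(4, 8, 16, 16)).shape)     # torch.Size([4, 8, 16, 16])
```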
DocRED: A Large-Scale Document-Level Relation Extraction Dataset
Title | DocRED: A Large-Scale Document-Level Relation Extraction Dataset |
Authors | Yuan Yao, Deming Ye, Peng Li, Xu Han, Yankai Lin, Zhenghao Liu, Zhiyuan Liu, Lixin Huang, Jie Zhou, Maosong Sun |
Abstract | Multiple entities in a document generally exhibit complex inter-sentence relations, and cannot be well handled by existing relation extraction (RE) methods that typically focus on extracting intra-sentence relations for single entity pairs. In order to accelerate the research on document-level RE, we introduce DocRED, a new dataset constructed from Wikipedia and Wikidata with three features: (1) DocRED annotates both named entities and relations, and is the largest human-annotated dataset for document-level RE from plain text; (2) DocRED requires reading multiple sentences in a document to extract entities and infer their relations by synthesizing all information of the document; (3) along with the human-annotated data, we also offer large-scale distantly supervised data, which enables DocRED to be adopted for both supervised and weakly supervised scenarios. In order to verify the challenges of document-level RE, we implement recent state-of-the-art methods for RE and conduct a thorough evaluation of these methods on DocRED. Empirical results show that DocRED is challenging for existing RE methods, which indicates that document-level RE remains an open problem and requires further efforts. Based on the detailed analysis on the experiments, we discuss multiple promising directions for future research. |
Tasks | Relation Extraction |
Published | 2019-06-14 |
URL | https://arxiv.org/abs/1906.06127v3 |
PDF | https://arxiv.org/pdf/1906.06127v3.pdf |
PWC | https://paperswithcode.com/paper/docred-a-large-scale-document-level-relation |
Repo | https://github.com/thunlp/DocRED |
Framework | pytorch |
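A typical first step with DocRED is checking how many relation instances actually span sentences. The sketch below counts intra- versus inter-sentence instances, assuming the JSON field names of the public release ('vertexSet', 'labels', 'h', 't', 'sent_id'); verify them against the downloaded files before relying on the numbers.

```python
# Sketch of counting intra- vs. inter-sentence relation instances in a
# DocRED-style file. Field names ('vertexSet', 'labels', 'h', 't', 'sent_id')
# follow my reading of the public release and should be checked against the
# actual data before use.
import json

def intra_inter_counts(path):
    with open(path, encoding="utf-8") as f:
        documents = json.load(f)
    intra = inter = 0
    for doc in documents:
        entities = doc["vertexSet"]              # each entity: list of mentions
        for label in doc["labels"]:
            head_sents = {m["sent_id"] for m in entities[label["h"]]}
            tail_sents = {m["sent_id"] for m in entities[label["t"]]}
            if head_sents & tail_sents:          # some mention pair shares a sentence
                intra += 1
            else:
                inter += 1                       # requires reading multiple sentences
    return intra, inter

# Example usage (the path is hypothetical):
# intra, inter = intra_inter_counts("train_annotated.json")
# print(intra, inter)
```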
Accurate, reliable and fast robustness evaluation
Title | Accurate, reliable and fast robustness evaluation |
Authors | Wieland Brendel, Jonas Rauber, Matthias Kümmerer, Ivan Ustyuzhaninov, Matthias Bethge |
Abstract | Throughout the past five years, the susceptibility of neural networks to minimal adversarial perturbations has moved from a peculiar phenomenon to a core issue in Deep Learning. Despite much attention, however, progress towards more robust models is significantly impaired by the difficulty of evaluating the robustness of neural network models. Today’s methods are either fast but brittle (gradient-based attacks), or they are fairly reliable but slow (score- and decision-based attacks). We here develop a new set of gradient-based adversarial attacks which (a) are more reliable in the face of gradient-masking than other gradient-based attacks, (b) perform better and are more query efficient than current state-of-the-art gradient-based attacks, (c) can be flexibly adapted to a wide range of adversarial criteria and (d) require virtually no hyperparameter tuning. These findings are carefully validated across a diverse set of six different models and hold for L0, L1, L2 and Linf in both targeted as well as untargeted scenarios. Implementations will soon be available in all major toolboxes (Foolbox, CleverHans and ART). We hope that this class of attacks will make robustness evaluations easier and more reliable, thus contributing to more signal in the search for more robust machine learning models. |
Tasks | |
Published | 2019-07-01 |
URL | https://arxiv.org/abs/1907.01003v2 |
PDF | https://arxiv.org/pdf/1907.01003v2.pdf |
PWC | https://paperswithcode.com/paper/accurate-reliable-and-fast-robustness |
Repo | https://github.com/wielandbrendel/brendel_bethge_attack |
Framework | none |
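The attacks themselves are distributed through toolboxes such as Foolbox, so rather than guess at their APIs, the sketch below shows a plain L2 projected-gradient (PGD) attack in PyTorch. It illustrates the generic gradient-based attack setup the abstract discusses; it is explicitly not the Brendel-Bethge attack.

```python
# Generic L2 projected-gradient attack in PyTorch, shown only to illustrate
# the basic gradient-based attack setup the abstract discusses; this is plain
# PGD, not the Brendel-Bethge attacks (those ship with Foolbox and friends).
import torch
import torch.nn as nn

def pgd_l2(model, x, y, epsilon=1.0, step_size=0.2, steps=10):
    """Untargeted L2 PGD: maximise cross-entropy within an L2 ball of radius epsilon."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = nn.functional.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        grad = grad / (grad.flatten(1).norm(dim=1).view(-1, 1, 1, 1) + 1e-12)
        x_adv = x_adv.detach() + step_size * grad
        delta = x_adv - x
        norms = delta.flatten(1).norm(dim=1).view(-1, 1, 1, 1)
        delta = delta * torch.clamp(epsilon / (norms + 1e-12), max=1.0)  # project to L2 ball
        x_adv = torch.clamp(x + delta, 0.0, 1.0)                         # keep valid pixels
    return x_adv.detach()

# Toy usage with an untrained classifier on random "images".
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
images, labels = torch.rand(4, 3, 32, 32), torch.randint(0, 10, (4,))
adversarial = pgd_l2(model, images, labels)
print((adversarial - images).flatten(1).norm(dim=1))  # perturbation norms <= epsilon
```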
VIBE: Video Inference for Human Body Pose and Shape Estimation
Title | VIBE: Video Inference for Human Body Pose and Shape Estimation |
Authors | Muhammed Kocabas, Nikos Athanasiou, Michael J. Black |
Abstract | Human motion is fundamental to understanding behavior. Despite progress on single-image 3D pose and shape estimation, existing video-based state-of-the-art methods fail to produce accurate and natural motion sequences due to a lack of ground-truth 3D motion data for training. To address this problem, we propose Video Inference for Body Pose and Shape Estimation (VIBE), which makes use of an existing large-scale motion capture dataset (AMASS) together with unpaired, in-the-wild, 2D keypoint annotations. Our key novelty is an adversarial learning framework that leverages AMASS to discriminate between real human motions and those produced by our temporal pose and shape regression networks. We define a temporal network architecture and show that adversarial training, at the sequence level, produces kinematically plausible motion sequences without in-the-wild ground-truth 3D labels. We perform extensive experimentation to analyze the importance of motion and demonstrate the effectiveness of VIBE on challenging 3D pose estimation datasets, achieving state-of-the-art performance. Code and pretrained models are available at https://github.com/mkocabas/VIBE. |
Tasks | 3D Human Pose Estimation, 3D Pose Estimation, Motion Capture, Pose Estimation |
Published | 2019-12-11 |
URL | https://arxiv.org/abs/1912.05656v2 |
PDF | https://arxiv.org/pdf/1912.05656v2.pdf |
PWC | https://paperswithcode.com/paper/vibe-video-inference-for-human-body-pose-and |
Repo | https://github.com/mkocabas/VIBE |
Framework | pytorch |
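The sketch below mocks up the two components named in the abstract: a GRU temporal regressor that maps per-frame image features to pose/shape parameters, and a motion discriminator that scores whether a pose sequence looks like real motion (trained adversarially against AMASS in the paper). All dimensions are illustrative and this is not the released VIBE model.

```python
# Toy versions of the two components described above: a GRU temporal encoder
# that regresses per-frame pose/shape parameters from image features, and a
# motion discriminator that scores pose sequences (adversarially trained
# against real mocap in the paper). Dimensions are illustrative, not VIBE's.
import torch
import torch.nn as nn

class TemporalRegressor(nn.Module):
    def __init__(self, feat_dim=2048, hidden=256, pose_dim=85):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, pose_dim)

    def forward(self, frame_features):             # (B, T, feat_dim)
        hidden_states, _ = self.gru(frame_features)
        return self.head(hidden_states)             # (B, T, pose_dim)

class MotionDiscriminator(nn.Module):
    def __init__(self, pose_dim=85, hidden=128):
        super().__init__()
        self.gru = nn.GRU(pose_dim, hidden, batch_first=True)
        self.score = nn.Linear(hidden, 1)

    def forward(self, pose_sequence):                # (B, T, pose_dim)
        _, last_hidden = self.gru(pose_sequence)     # (1, B, hidden)
        return self.score(last_hidden[-1])           # (B, 1) realism logit

regressor, discriminator = TemporalRegressor(), MotionDiscriminator()
features = torch.rand(2, 16, 2048)                   # 2 clips of 16 frames
predicted_motion = regressor(features)
adv_loss = nn.functional.binary_cross_entropy_with_logits(
    discriminator(predicted_motion), torch.ones(2, 1))  # generator wants "real"
print(predicted_motion.shape, float(adv_loss))
```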
Challenging Environments for Traffic Sign Detection: Reliability Assessment under Inclement Conditions
Title | Challenging Environments for Traffic Sign Detection: Reliability Assessment under Inclement Conditions |
Authors | Dogancan Temel, Tariq Alshawi, Min-Hung Chen, Ghassan AlRegib |
Abstract | State-of-the-art algorithms successfully localize and recognize traffic signs over existing datasets, which are limited in terms of challenging condition type and severity. Therefore, it is not possible to estimate the performance of traffic sign detection algorithms under overlooked challenging conditions. Another shortcoming of existing datasets is the limited utilization of temporal information and the unavailability of consecutive frames and annotations. To overcome these shortcomings, we generated the CURE-TSD video dataset and hosted the first IEEE Video and Image Processing (VIP) Cup within the IEEE Signal Processing Society. In this paper, we provide a detailed description of the CURE-TSD dataset, analyze the characteristics of the top performing algorithms, and provide a performance benchmark. Moreover, we investigate the robustness of the benchmarked algorithms with respect to sign size, challenge type and severity. Benchmarked algorithms are based on state-of-the-art and custom convolutional neural networks, which achieved a precision of 0.55, a recall of 0.32, an F0.5 score of 0.48, and an F2 score of 0.35. Experimental results show that benchmarked algorithms are highly sensitive to the tested challenging conditions, which result in an average performance drop of 0.17 in precision and a performance drop of 0.28 in recall under severe conditions. The dataset is publicly available at https://github.com/olivesgatech/CURE-TSD. |
Tasks | |
Published | 2019-02-19 |
URL | https://arxiv.org/abs/1902.06857v2 |
PDF | https://arxiv.org/pdf/1902.06857v2.pdf |
PWC | https://paperswithcode.com/paper/challenging-environments-for-traffic-sign |
Repo | https://github.com/olivesgatech/CURE-TSR |
Framework | pytorch |
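The F-scores quoted in the abstract follow from the reported precision and recall via the standard F-beta formula, as the quick check below confirms.

```python
# Quick check that the F-scores quoted in the abstract follow from the
# reported precision (0.55) and recall (0.32) via the standard F-beta formula
#   F_beta = (1 + beta^2) * P * R / (beta^2 * P + R)
def f_beta(precision, recall, beta):
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)

p, r = 0.55, 0.32
print(f"F0.5 = {f_beta(p, r, 0.5):.2f}")  # 0.48, as reported
print(f"F2   = {f_beta(p, r, 2.0):.2f}")  # 0.35, as reported
```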
Empirical Upper Bound in Object Detection and More
Title | Empirical Upper Bound in Object Detection and More |
Authors | Ali Borji, Seyed Mehdi Iranmanesh |
Abstract | Object detection remains one of the most notorious open problems in computer vision. Despite large strides in accuracy in recent years, modern object detectors have started to saturate on popular benchmarks, raising the question of how far we can reach with deep learning tools and tricks. Here, by employing 2 state-of-the-art object detection benchmarks, and analyzing more than 15 models over 4 large scale datasets, we I) carefully determine the upper bound in AP, which is 91.6% on VOC (test2007), 78.2% on COCO (val2017), and 58.9% on OpenImages V4 (validation), regardless of the IOU. These numbers are much better than the mAP of the best model (47.9% on VOC, and 46.9% on COCO; IOUs=.5:.95), II) characterize the sources of errors in object detectors, in a novel and intuitive way, and find that classification error (confusion with other classes and misses) explains the largest fraction of errors and weighs more than localization and duplicate errors, and III) analyze the invariance properties of models when the surrounding context of an object is removed, when an object is placed in an incongruent background, and when images are blurred or flipped vertically. We find that models generate boxes on empty regions and that context is more important for detecting small objects than larger ones. Our work taps into the tight relationship between recognition and detection and offers insights for building better models. |
Tasks | Object Detection |
Published | 2019-11-27 |
URL | https://arxiv.org/abs/1911.12451v3 |
PDF | https://arxiv.org/pdf/1911.12451v3.pdf |
PWC | https://paperswithcode.com/paper/empirical-upper-bound-in-object-detection-and |
Repo | https://github.com/aliborji/DeetctionUpperbound |
Framework | pytorch |
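Point (II) above hinges on assigning each detection to an error type. The sketch below is a deliberately simplified version of that bookkeeping (it ignores duplicates and missed ground truths): each detection is labelled by its IoU and class agreement with the ground-truth boxes; the thresholds and toy boxes are illustrative, not the paper's exact protocol.

```python
# Simplified sketch of an error breakdown for object detectors: each detection
# is labelled correct, localization error, classification error (confusion),
# or background false positive, based on IoU and class agreement with ground
# truth. Thresholds and the toy boxes are illustrative, not the paper's setup.
def iou(a, b):
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def categorize(detection, ground_truths, t_fg=0.5, t_bg=0.1):
    cls, box = detection
    best_iou, best_cls = 0.0, None
    for gt_cls, gt_box in ground_truths:
        overlap = iou(box, gt_box)
        if overlap > best_iou:
            best_iou, best_cls = overlap, gt_cls
    if best_iou >= t_fg:
        return "correct" if cls == best_cls else "classification error"
    if best_iou >= t_bg:
        return "localization error" if cls == best_cls else "classification error"
    return "background (false positive)"

ground_truths = [("dog", (10, 10, 60, 60)), ("cat", (70, 70, 120, 120))]
detections = [("dog", (12, 12, 58, 58)), ("cat", (15, 15, 65, 65)), ("dog", (200, 200, 230, 230))]
for det in detections:
    print(det[0], det[1], "->", categorize(det, ground_truths))
```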