February 1, 2020

2886 words 14 mins read

Paper Group AWR 126


Accurate Visual Localization for Automotive Applications. Unlearn Dataset Bias in Natural Language Inference by Fitting the Residual. Emotion-Cause Pair Extraction: A New Task to Emotion Analysis in Texts. Single Image Deraining: A Comprehensive Benchmark Analysis. Constructive Type-Logical Supertagging with Self-Attention Networks. Improving Robus …

Accurate Visual Localization for Automotive Applications

Title Accurate Visual Localization for Automotive Applications
Authors Eli Brosh, Matan Friedmann, Ilan Kadar, Lev Yitzhak Lavy, Elad Levi, Shmuel Rippa, Yair Lempert, Bruno Fernandez-Ruiz, Roei Herzig, Trevor Darrell
Abstract Accurate vehicle localization is a crucial step towards building effective Vehicle-to-Vehicle networks and automotive applications. Yet standard grade GPS data, such as that provided by mobile phones, is often noisy and exhibits significant localization errors in many urban areas. Approaches for accurate localization from imagery often rely on structure-based techniques, and thus are limited in scale and are expensive to compute. In this paper, we present a scalable visual localization approach geared for real-time performance. We propose a hybrid coarse-to-fine approach that leverages visual and GPS location cues. Our solution uses a self-supervised approach to learn a compact road image representation. This representation enables efficient visual retrieval and provides coarse localization cues, which are fused with vehicle ego-motion to obtain high accuracy location estimates. As a benchmark to evaluate the performance of our visual localization approach, we introduce a new large-scale driving dataset based on video and GPS data obtained from a large-scale network of connected dash-cams. Our experiments confirm that our approach is highly effective in challenging urban environments, reducing localization error by an order of magnitude.
Tasks Visual Localization
Published 2019-05-01
URL http://arxiv.org/abs/1905.03706v1
PDF http://arxiv.org/pdf/1905.03706v1.pdf
PWC https://paperswithcode.com/paper/190503706
Repo https://github.com/getnexar/Nexar-Visual-Localization
Framework none
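
The coarse stage described above amounts to GPS-constrained image retrieval over compact road-image embeddings. The sketch below illustrates only that retrieval step; the embedding model, the ego-motion fusion, and all names and parameters (`coarse_localize`, `radius_m`) are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def coarse_localize(query_emb, db_embs, db_coords, gps_fix, radius_m=150.0):
    """Coarse localization by image retrieval, restricted by a noisy GPS prior.

    query_emb : (d,) L2-normalized embedding of the query frame
    db_embs   : (N, d) L2-normalized embeddings of geo-tagged road images
    db_coords : (N, 2) planar (x, y) positions of the database images, in meters
    gps_fix   : (2,) noisy GPS position of the vehicle, in the same frame
    """
    # Keep only database images within the GPS uncertainty radius.
    dists = np.linalg.norm(db_coords - gps_fix, axis=1)
    candidates = np.where(dists <= radius_m)[0]
    if candidates.size == 0:
        return gps_fix  # fall back to the raw GPS fix

    # Cosine similarity reduces to a dot product for normalized embeddings.
    sims = db_embs[candidates] @ query_emb
    best = candidates[np.argmax(sims)]
    return db_coords[best]
```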

Unlearn Dataset Bias in Natural Language Inference by Fitting the Residual

Title Unlearn Dataset Bias in Natural Language Inference by Fitting the Residual
Authors He He, Sheng Zha, Haohan Wang
Abstract Statistical natural language inference (NLI) models are susceptible to learning dataset bias: superficial cues that happen to associate with the label on a particular dataset, but are not useful in general, e.g., negation words indicate contradiction. As exposed by several recent challenge datasets, these models perform poorly when such association is absent, e.g., predicting that “I love dogs” contradicts “I don’t love cats”. Our goal is to design learning algorithms that guard against known dataset bias. We formalize the concept of dataset bias under the framework of distribution shift and present a simple debiasing algorithm based on residual fitting, which we call DRiFt. We first learn a biased model that only uses features that are known to relate to dataset bias. Then, we train a debiased model that fits to the residual of the biased model, focusing on examples that cannot be predicted well by biased features only. We use DRiFt to train three high-performing NLI models on two benchmark datasets, SNLI and MNLI. Our debiased models achieve significant gains over baseline models on two challenge test sets, while maintaining reasonable performance on the original test sets.
Tasks Natural Language Inference
Published 2019-08-28
URL https://arxiv.org/abs/1908.10763v2
PDF https://arxiv.org/pdf/1908.10763v2.pdf
PWC https://paperswithcode.com/paper/unlearn-dataset-bias-in-natural-language
Repo https://github.com/hhexiy/debiased
Framework mxnet
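
The residual-fitting recipe above lends itself to a compact training step: freeze the biased model and train the debiased model so that the *combined* log-probabilities explain the label. The PyTorch sketch below is a hedged reading of that idea; the model architectures, the bias-prone input `x_bias`, and the function name are assumptions.

```python
import torch
import torch.nn.functional as F

def drift_step(biased_model, debiased_model, optimizer, x_bias, x_full, y):
    """One residual-fitting training step (a sketch of the DRiFt idea).

    `biased_model` was pre-trained on bias-prone features (x_bias) and is frozen;
    `debiased_model` sees the full input and is trained so that the combined
    log-probabilities fit the label, i.e. it only needs to explain what the
    biased model gets wrong.
    """
    with torch.no_grad():
        log_p_biased = F.log_softmax(biased_model(x_bias), dim=-1)

    logits_debiased = debiased_model(x_full)
    # Combine in log space: the debiased logits act as a residual correction
    # on top of the frozen biased predictions.
    combined = log_p_biased + logits_debiased
    loss = F.cross_entropy(combined, y)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```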

Emotion-Cause Pair Extraction: A New Task to Emotion Analysis in Texts

Title Emotion-Cause Pair Extraction: A New Task to Emotion Analysis in Texts
Authors Rui Xia, Zixiang Ding
Abstract Emotion cause extraction (ECE), the task aimed at extracting the potential causes behind certain emotions in text, has gained much attention in recent years due to its wide applications. However, it suffers from two shortcomings: 1) the emotion must be annotated before cause extraction in ECE, which greatly limits its applications in real-world scenarios; 2) the way to first annotate emotion and then extract the cause ignores the fact that they are mutually indicative. In this work, we propose a new task: emotion-cause pair extraction (ECPE), which aims to extract the potential pairs of emotions and corresponding causes in a document. We propose a 2-step approach to address this new ECPE task, which first performs individual emotion extraction and cause extraction via multi-task learning, and then conducts emotion-cause pairing and filtering. The experimental results on a benchmark emotion cause corpus prove the feasibility of the ECPE task as well as the effectiveness of our approach.
Tasks Emotion Recognition, Multi-Task Learning
Published 2019-06-04
URL https://arxiv.org/abs/1906.01267v1
PDF https://arxiv.org/pdf/1906.01267v1.pdf
PWC https://paperswithcode.com/paper/emotion-cause-pair-extraction-a-new-task-to
Repo https://github.com/NUSTM/ECPE
Framework tf
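
The second step, pairing and filtering, can be read as scoring the Cartesian product of the clauses found in step 1. A minimal sketch, assuming a hypothetical trained `pair_scorer` and the clause lists produced by step 1:

```python
from itertools import product

def extract_pairs(emotion_clauses, cause_clauses, pair_scorer, threshold=0.5):
    """Pair-and-filter step of the 2-step ECPE approach (a sketch).

    `emotion_clauses` / `cause_clauses` are clause indices predicted in step 1;
    `pair_scorer(e, c)` is a hypothetical trained binary classifier returning
    the probability that clause e's emotion is caused by clause c.
    """
    candidate_pairs = product(emotion_clauses, cause_clauses)
    return [(e, c) for e, c in candidate_pairs if pair_scorer(e, c) >= threshold]
```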

Single Image Deraining: A Comprehensive Benchmark Analysis

Title Single Image Deraining: A Comprehensive Benchmark Analysis
Authors Siyuan Li, Iago Breno Araujo, Wenqi Ren, Zhangyang Wang, Eric K. Tokuda, Roberto Hirata Junior, Roberto Cesar-Junior, Jiawan Zhang, Xiaojie Guo, Xiaochun Cao
Abstract We present a comprehensive study and evaluation of existing single image deraining algorithms, using a new large-scale benchmark consisting of both synthetic and real-world rainy images. This dataset highlights diverse data sources and image contents, and is divided into three subsets (rain streak, rain drop, rain and mist), each serving different training or evaluation purposes. We further provide a rich variety of criteria for deraining algorithm evaluation, ranging from full-reference metrics, to no-reference metrics, to subjective evaluation and the novel task-driven evaluation. Experiments on the dataset shed light on the comparisons and limitations of state-of-the-art deraining algorithms, and suggest promising future directions.
Tasks Rain Removal, Single Image Deraining
Published 2019-03-20
URL http://arxiv.org/abs/1903.08558v1
PDF http://arxiv.org/pdf/1903.08558v1.pdf
PWC https://paperswithcode.com/paper/single-image-deraining-a-comprehensive
Repo https://github.com/lsy17096535/Single-Image-Deraining
Framework none
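
For the full-reference criteria mentioned above, PSNR and SSIM against the clean image are the standard choices. A small helper using scikit-image (the metric library is my assumption; the paper does not prescribe an implementation) might look like:

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def full_reference_scores(derained, ground_truth):
    """Full-reference evaluation of a derained image against its clean ground
    truth. Assumes uint8 RGB arrays of identical shape; the channel_axis
    argument requires scikit-image >= 0.19.
    """
    psnr = peak_signal_noise_ratio(ground_truth, derained, data_range=255)
    ssim = structural_similarity(ground_truth, derained,
                                 channel_axis=-1, data_range=255)
    return {"PSNR": psnr, "SSIM": ssim}
```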

Constructive Type-Logical Supertagging with Self-Attention Networks

Title Constructive Type-Logical Supertagging with Self-Attention Networks
Authors Konstantinos Kogkalidis, Michael Moortgat, Tejaswini Deoskar
Abstract We propose a novel application of self-attention networks towards grammar induction. We present an attention-based supertagger for a refined type-logical grammar, trained on constructing types inductively. In addition to achieving a high overall type accuracy, our model is able to learn the syntax of the grammar’s type system along with its denotational semantics. This lifts the closed world assumption commonly made by lexicalized grammar supertaggers, greatly enhancing its generalization potential. This is evidenced both by its adequate accuracy over sparse word types and its ability to correctly construct complex types never seen during training, which, to the best of our knowledge, was as of yet unaccomplished.
Tasks
Published 2019-05-31
URL https://arxiv.org/abs/1905.13418v1
PDF https://arxiv.org/pdf/1905.13418v1.pdf
PWC https://paperswithcode.com/paper/constructive-type-logical-supertagging-with
Repo https://github.com/konstantinosKokos/Lassy-TLG-Supertagging
Framework pytorch

Improving Robustness of Deep Learning Based Knee MRI Segmentation: Mixup and Adversarial Domain Adaptation

Title Improving Robustness of Deep Learning Based Knee MRI Segmentation: Mixup and Adversarial Domain Adaptation
Authors Egor Panfilov, Aleksei Tiulpin, Stefan Klein, Miika T. Nieminen, Simo Saarakkala
Abstract Degeneration of articular cartilage (AC) is actively studied in knee osteoarthritis (OA) research via magnetic resonance imaging (MRI). Segmentation of AC tissues from MRI data is an essential step in quantification of their damage. Deep learning (DL) based methods have shown potential in this realm and are the current state-of-the-art; however, their robustness to heterogeneity of MRI acquisition settings remains an open problem. In this study, we investigated two modern regularization techniques – mixup and adversarial unsupervised domain adaptation (UDA) – to improve the robustness of DL-based knee cartilage segmentation to new MRI acquisition settings. Our validation setup included two datasets produced by different MRI scanners and using distinct data acquisition protocols. We assessed the robustness of automatic segmentation by comparing mixup and UDA approaches to a strong baseline method at different OA severity stages and, additionally, in relation to anatomical locations. Our results showed that for moderate changes in knee MRI data acquisition settings both approaches may provide notable improvements in the robustness, which are consistent for all stages of the disease and affect the clinically important areas of the knee joint. However, mixup may be considered as a recommended approach, since it is more computationally efficient and does not require additional data from the target acquisition setup.
Tasks Domain Adaptation, Unsupervised Domain Adaptation
Published 2019-08-12
URL https://arxiv.org/abs/1908.04126v3
PDF https://arxiv.org/pdf/1908.04126v3.pdf
PWC https://paperswithcode.com/paper/improving-robustness-of-deep-learning-based
Repo https://github.com/MIPT-Oulu/RobustCartilageSegmentation
Framework none
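
Of the two regularizers studied, mixup is the simpler to reproduce: training proceeds on convex combinations of input pairs and of their labels. A generic PyTorch sketch (batch layout, one-hot/mask label format, and the α value are assumptions, not the paper's exact setup):

```python
import numpy as np
import torch

def mixup_batch(x, y_onehot, alpha=0.4):
    """Standard mixup (Zhang et al., 2018): convex combinations of sample
    pairs and of their one-hot labels (or segmentation masks).
    """
    lam = np.random.beta(alpha, alpha)
    perm = torch.randperm(x.size(0))
    x_mixed = lam * x + (1.0 - lam) * x[perm]
    y_mixed = lam * y_onehot + (1.0 - lam) * y_onehot[perm]
    return x_mixed, y_mixed
```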

Semi-supervised representation learning via dual autoencoders for domain adaptation

Title Semi-supervised representation learning via dual autoencoders for domain adaptation
Authors Shuai Yang, Hao Wang, Yuhong Zhang, Pei-Pei Li, Yi Zhu, Xuegang Hu
Abstract Domain adaptation aims to exploit the knowledge in the source domain to promote the learning tasks in the target domain, which plays a critical role in real-world applications. Recently, many deep learning approaches based on autoencoders have achieved significant performance in domain adaptation. However, most existing methods focus on minimizing the distribution divergence by putting the source and target data together to learn global feature representations, while they do not consider the local relationship between instances in the same category from different domains. To address this problem, we propose a novel Semi-Supervised Representation Learning framework via Dual Autoencoders for domain adaptation, named SSRLDA. More specifically, we extract richer feature representations by learning the global and local feature representations simultaneously using two novel autoencoders, which are referred to as marginalized denoising autoencoder with adaptation distribution (MDAad) and multi-class marginalized denoising autoencoder (MMDA) respectively. Meanwhile, we make full use of label information to optimize feature representations. Experimental results show that our proposed approach outperforms several state-of-the-art baseline methods.
Tasks Denoising, Domain Adaptation, Representation Learning, Unsupervised Domain Adaptation
Published 2019-08-04
URL https://arxiv.org/abs/1908.01342v4
PDF https://arxiv.org/pdf/1908.01342v4.pdf
PWC https://paperswithcode.com/paper/semi-supervised-representation-learning-via
Repo https://github.com/Minminhfut/SSRLDACode
Framework none
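
Both of the paper's autoencoders build on the denoising-autoencoder idea: reconstruct clean features from corrupted inputs. The sketch below shows only that generic building block; the marginalization, adaptation-distribution, and multi-class terms of MDAad/MMDA are not reproduced, and all names are assumptions.

```python
import torch
import torch.nn as nn

class DenoisingAutoencoder(nn.Module):
    """A plain single-layer denoising autoencoder, shown only to illustrate
    the building block that SSRLDA's two autoencoders extend."""
    def __init__(self, dim_in, dim_hidden, corruption=0.3):
        super().__init__()
        self.corruption = corruption
        self.encoder = nn.Linear(dim_in, dim_hidden)
        self.decoder = nn.Linear(dim_hidden, dim_in)

    def forward(self, x):
        # Mask-out (dropout-style) corruption of the input features.
        noisy = x * (torch.rand_like(x) > self.corruption).float()
        hidden = torch.tanh(self.encoder(noisy))
        return self.decoder(hidden), hidden
```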

Heterogeneous Gaussian Mechanism: Preserving Differential Privacy in Deep Learning with Provable Robustness

Title Heterogeneous Gaussian Mechanism: Preserving Differential Privacy in Deep Learning with Provable Robustness
Authors NhatHai Phan, Minh Vu, Yang Liu, Ruoming Jin, Dejing Dou, Xintao Wu, My T. Thai
Abstract In this paper, we propose a novel Heterogeneous Gaussian Mechanism (HGM) to preserve differential privacy in deep neural networks, with provable robustness against adversarial examples. We first relax the constraint of the privacy budget in the traditional Gaussian Mechanism from (0, 1] to (0, ∞), with a new bound of the noise scale to preserve differential privacy. The noise in our mechanism can be arbitrarily redistributed, offering a distinctive ability to address the trade-off between model utility and privacy loss. To derive provable robustness, our HGM is applied to inject Gaussian noise into the first hidden layer. Then, a tighter robustness bound is proposed. Theoretical analysis and thorough evaluations show that our mechanism notably improves the robustness of differentially private deep neural networks, compared with baseline approaches, under a variety of model attacks.
Tasks
Published 2019-06-02
URL https://arxiv.org/abs/1906.01444v1
PDF https://arxiv.org/pdf/1906.01444v1.pdf
PWC https://paperswithcode.com/paper/heterogeneous-gaussian-mechanism-preserving
Repo https://github.com/haiphanNJIT/SecureSGD
Framework tf
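
The mechanism's key move is injecting Gaussian noise with a redistributable, per-neuron scale into the first hidden layer. A hedged sketch of that injection (the paper's noise-scale bound tied to the privacy budget is not reproduced; the function name and the `weights` redistribution vector are assumptions):

```python
import torch

def noisy_first_layer(h, base_sigma, weights):
    """Add heterogeneous Gaussian noise to first-hidden-layer activations.

    h           : (batch, d) pre-noise activations of the first hidden layer
    base_sigma  : scalar noise scale derived from the privacy budget
    weights     : (d,) per-neuron redistribution factors (the "heterogeneous" part)
    """
    sigma = base_sigma * weights            # per-neuron noise scale
    return h + torch.randn_like(h) * sigma
```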

Source Camera Verification from Strongly Stabilized Videos

Title Source Camera Verification from Strongly Stabilized Videos
Authors Enes Altinisik, Husrev Taha Sencar
Abstract The in-camera image stabilization technology deployed by most cameras today poses one of the most significant challenges to photo-response non-uniformity based source camera attribution from videos. When performed digitally, stabilization involves cropping, warping, and inpainting of video frames to eliminate unwanted camera motion. Hence, successful attribution requires inversion of these transformations in a blind manner. To address this challenge, we introduce a source camera verification method for videos that takes into account the spatially variant nature of stabilization transformations and assumes a larger degree of freedom in their search. Our method identifies transformations at a sub-frame level and incorporates a number of constraints to validate their correctness. The method also adopts a holistic approach in countering disruptive effects of other video generation steps, such as video coding and downsizing, for more reliable attribution. Tests performed on one public and two custom datasets show that the proposed method is able to verify the source of 23-40% of videos that underwent stronger stabilization without a significant impact on the false attribution rate.
Tasks Video Generation
Published 2019-11-26
URL https://arxiv.org/abs/1912.05018v2
PDF https://arxiv.org/pdf/1912.05018v2.pdf
PWC https://paperswithcode.com/paper/source-camera-attribution-from-strongly
Repo https://github.com/VideoPRNUExtractor/Weighter
Framework none
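
Attribution ultimately rests on correlating a frame's noise residual with the camera's PRNU fingerprint; stabilization inversion exists to make this test meaningful again. A basic normalized-correlation check is sketched below; residual and fingerprint extraction, and the paper's sub-frame transformation search, are not shown.

```python
import numpy as np

def fingerprint_correlation(residual, fingerprint):
    """Normalized correlation between a frame's noise residual and a camera's
    PRNU fingerprint (both 2D arrays of the same shape)."""
    r = residual - residual.mean()
    f = fingerprint - fingerprint.mean()
    return float((r * f).sum() / (np.linalg.norm(r) * np.linalg.norm(f) + 1e-12))
```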

CITE: A Corpus of Image-Text Discourse Relations

Title CITE: A Corpus of Image-Text Discourse Relations
Authors Malihe Alikhani, Sreyasi Nag Chowdhury, Gerard de Melo, Matthew Stone
Abstract This paper presents a novel crowd-sourced resource for multimodal discourse: our resource characterizes inferences in image-text contexts in the domain of cooking recipes in the form of coherence relations. Like previous corpora annotating discourse structure between text arguments, such as the Penn Discourse Treebank, our new corpus aids in establishing a better understanding of natural communication and common-sense reasoning, while our findings have implications for a wide range of applications, such as understanding and generation of multimodal documents.
Tasks Common Sense Reasoning
Published 2019-04-12
URL http://arxiv.org/abs/1904.06286v2
PDF http://arxiv.org/pdf/1904.06286v2.pdf
PWC https://paperswithcode.com/paper/cite-a-corpus-of-image-text-discourse
Repo https://github.com/malihealikhani/CITE
Framework none

Time2Vec: Learning a Vector Representation of Time

Title Time2Vec: Learning a Vector Representation of Time
Authors Seyed Mehran Kazemi, Rishab Goel, Sepehr Eghbali, Janahan Ramanan, Jaspreet Sahota, Sanjay Thakur, Stella Wu, Cathal Smyth, Pascal Poupart, Marcus Brubaker
Abstract Time is an important feature in many applications involving events that occur synchronously and/or asynchronously. To effectively consume time information, recent studies have focused on designing new architectures. In this paper, we take an orthogonal but complementary approach by providing a model-agnostic vector representation for time, called Time2Vec, that can be easily imported into many existing and future architectures and improve their performances. We show on a range of models and problems that replacing the notion of time with its Time2Vec representation improves the performance of the final model.
Tasks
Published 2019-07-11
URL https://arxiv.org/abs/1907.05321v1
PDF https://arxiv.org/pdf/1907.05321v1.pdf
PWC https://paperswithcode.com/paper/time2vec-learning-a-vector-representation-of
Repo https://github.com/avinashbarnwal/Time2Vec
Framework none
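
The representation itself is small: one linear component plus k periodic components of a learned affine transform of time, with sin as the usual periodic activation. A PyTorch sketch consistent with that description (layer and parameter names are mine):

```python
import torch
import torch.nn as nn

class Time2Vec(nn.Module):
    """Time2Vec layer: t2v(tau)[0] is linear in tau, the remaining k entries
    are sin of learned affine transforms of tau."""
    def __init__(self, k):
        super().__init__()
        self.w = nn.Parameter(torch.randn(k + 1))
        self.b = nn.Parameter(torch.randn(k + 1))

    def forward(self, tau):
        # tau: (batch, 1) scalar time values
        v = tau * self.w + self.b                      # (batch, k + 1)
        return torch.cat([v[:, :1], torch.sin(v[:, 1:])], dim=-1)
```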

ProtoAttend: Attention-Based Prototypical Learning

Title ProtoAttend: Attention-Based Prototypical Learning
Authors Sercan O. Arik, Tomas Pfister
Abstract We propose a novel inherently interpretable machine learning method that bases decisions on few relevant examples that we call prototypes. Our method, ProtoAttend, can be integrated into a wide range of neural network architectures including pre-trained models. It utilizes an attention mechanism that relates the encoded representations to samples in order to determine prototypes. The resulting model outperforms state of the art in three high impact problems without sacrificing accuracy of the original model: (1) it enables high-quality interpretability that outputs samples most relevant to the decision-making (i.e. a sample-based interpretability method); (2) it achieves state of the art confidence estimation by quantifying the mismatch across prototype labels; and (3) it obtains state of the art in distribution mismatch detection. All this can be achieved with minimal additional test time and a practically viable training time computational cost.
Tasks Decision Making, Interpretable Machine Learning
Published 2019-02-17
URL https://arxiv.org/abs/1902.06292v4
PDF https://arxiv.org/pdf/1902.06292v4.pdf
PWC https://paperswithcode.com/paper/attention-based-prototypical-learning-towards
Repo https://github.com/google-research/google-research/tree/master/protoattend
Framework tf
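
The core step is an attention distribution over a database of candidate samples; those weights are the per-sample prototype relevances from which the decision and confidence are derived. A minimal sketch of that step only (the encoders, sparsity regularization, and the exact confidence rule are omitted; names are assumptions):

```python
import torch
import torch.nn.functional as F

def prototype_weights(query_emb, candidate_embs, temperature=1.0):
    """Attention of an encoded query over encoded candidate samples.

    query_emb      : (d,) encoding of the input to be explained
    candidate_embs : (N, d) encodings of the candidate database
    Returns the softmax weights, interpreted as prototype relevances.
    """
    scores = candidate_embs @ query_emb / temperature   # (N,)
    return F.softmax(scores, dim=0)
```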

Fonts-2-Handwriting: A Seed-Augment-Train framework for universal digit classification

Title Fonts-2-Handwriting: A Seed-Augment-Train framework for universal digit classification
Authors Vinay Uday Prabhu, Sanghyun Han, Dian Ang Yap, Mihail Douhaniaris, Preethi Seshadri, John Whaley
Abstract In this paper, we propose a Seed-Augment-Train/Transfer (SAT) framework that contains a synthetic seed image dataset generation procedure for languages with different numeral systems using freely available open font file datasets. This seed dataset of images is then augmented to create a purely synthetic training dataset, which is in turn used to train a deep neural network and test on held-out real world handwritten digits dataset spanning five Indic scripts, Kannada, Tamil, Gujarati, Malayalam, and Devanagari. We showcase the efficacy of this approach both qualitatively, by training a Boundary-seeking GAN (BGAN) that generates realistic digit images in the five languages, and also quantitatively by testing a CNN trained on the synthetic data on the real-world datasets. This establishes not only an interesting nexus between the font-datasets-world and transfer learning but also provides a recipe for universal-digit classification in any script.
Tasks Transfer Learning
Published 2019-05-16
URL https://arxiv.org/abs/1905.08633v1
PDF https://arxiv.org/pdf/1905.08633v1.pdf
PWC https://paperswithcode.com/paper/190508633
Repo https://github.com/unifyid-labs/DeepGenStruct-Notebooks
Framework none
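
The "Seed" stage reduces to rendering glyphs from open font files. A small sketch with Pillow is given below; the font path and layout constants are placeholders, and the augmentation and CNN-training stages follow separately.

```python
from PIL import Image, ImageDraw, ImageFont

def render_seed_digit(char, font_path, size=32):
    """Render one synthetic 'seed' glyph on a black canvas.

    char      : the digit character to render (in any target script)
    font_path : path to an open .ttf font covering that script (placeholder)
    """
    img = Image.new("L", (size, size), color=0)
    font = ImageFont.truetype(font_path, int(size * 0.8))
    draw = ImageDraw.Draw(img)
    draw.text((size // 6, 0), char, font=font, fill=255)
    return img
```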

A Learnable Safety Measure

Title A Learnable Safety Measure
Authors Steve Heim, Alexander von Rohr, Sebastian Trimpe, Alexander Badri-Spröwitz
Abstract Failures are challenging for learning to control physical systems since they risk damage, time-consuming resets, and often provide little gradient information. Adding safety constraints to exploration typically requires a lot of prior knowledge and domain expertise. We present a safety measure which implicitly captures how the system dynamics relate to a set of failure states. Not only can this measure be used as a safety function, but also to directly compute the set of safe state-action pairs. Further, we show a model-free approach to learn this measure by active sampling using Gaussian processes. While safety can only be guaranteed after learning the safety measure, we show that failures can already be greatly reduced by using the estimated measure during learning.
Tasks Gaussian Processes
Published 2019-10-07
URL https://arxiv.org/abs/1910.02835v1
PDF https://arxiv.org/pdf/1910.02835v1.pdf
PWC https://paperswithcode.com/paper/a-learnable-safety-measure
Repo https://github.com/sheim/vibly
Framework none
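
As a much-simplified proxy for the idea — actively sampled outcomes fit with a Gaussian process, then thresholded into a safe set — one could fit a GP classifier over (state, action) samples, as sketched below. This is not the paper's measure, which additionally encodes how the system dynamics relate to the failure set; all names and the threshold are assumptions.

```python
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

def estimate_safe_set(states_actions, failed, grid, threshold=0.95):
    """Fit a GP classifier on sampled outcomes and threshold it into a safe set.

    states_actions : (N, d) sampled state-action pairs
    failed         : (N,) boolean array, True where the sample led to failure
    grid           : (M, d) state-action pairs to classify as safe/unsafe
    """
    gp = GaussianProcessClassifier(kernel=1.0 * RBF(length_scale=1.0))
    gp.fit(states_actions, (~failed).astype(int))   # 1 = survived
    p_safe = gp.predict_proba(grid)[:, 1]
    return grid[p_safe >= threshold]
```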

Visual Semantic Reasoning for Image-Text Matching

Title Visual Semantic Reasoning for Image-Text Matching
Authors Kunpeng Li, Yulun Zhang, Kai Li, Yuanyuan Li, Yun Fu
Abstract Image-text matching has been a hot research topic bridging the vision and language areas. It remains challenging because the current representation of image usually lacks global semantic concepts as in its corresponding text caption. To address this issue, we propose a simple and interpretable reasoning model to generate visual representation that captures key objects and semantic concepts of a scene. Specifically, we first build up connections between image regions and perform reasoning with Graph Convolutional Networks to generate features with semantic relationships. Then, we propose to use the gate and memory mechanism to perform global semantic reasoning on these relationship-enhanced features, select the discriminative information and gradually generate the representation for the whole scene. Experiments validate that our method achieves a new state-of-the-art for the image-text matching on MS-COCO and Flickr30K datasets. It outperforms the current best method by 6.8% relatively for image retrieval and 4.8% relatively for caption retrieval on MS-COCO (Recall@1 using 1K test set). On Flickr30K, our model improves image retrieval by 12.6% relatively and caption retrieval by 5.8% relatively (Recall@1). Our code is available at https://github.com/KunpengLi1994/VSRN.
Tasks Image Retrieval, Text Matching
Published 2019-09-06
URL https://arxiv.org/abs/1909.02701v1
PDF https://arxiv.org/pdf/1909.02701v1.pdf
PWC https://paperswithcode.com/paper/visual-semantic-reasoning-for-image-text
Repo https://github.com/KunpengLi1994/VSRN
Framework pytorch
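
The first stage, region-level reasoning with Graph Convolutional Networks over a learned affinity graph, can be sketched as a single propagation step; the gated/memory global reasoning and the matching objective are omitted, and all layer names are assumptions rather than the released code's.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RegionGCN(nn.Module):
    """One graph-convolution step over detected image regions: a learned
    pairwise affinity graph propagates information between region features."""
    def __init__(self, dim):
        super().__init__()
        self.query = nn.Linear(dim, dim)
        self.key = nn.Linear(dim, dim)
        self.update = nn.Linear(dim, dim)

    def forward(self, regions):
        # regions: (num_regions, dim) features from an object detector
        affinity = self.query(regions) @ self.key(regions).t()
        adj = F.softmax(affinity, dim=-1)           # learned relationship graph
        return F.relu(self.update(adj @ regions)) + regions
```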