February 1, 2020

2886 words 14 mins read

Paper Group AWR 126


Accurate Visual Localization for Automotive Applications. Unlearn Dataset Bias in Natural Language Inference by Fitting the Residual. Emotion-Cause Pair Extraction: A New Task to Emotion Analysis in Texts. Single Image Deraining: A Comprehensive Benchmark Analysis. Constructive Type-Logical Supertagging with Self-Attention Networks. Improving Robus …

Accurate Visual Localization for Automotive Applications

Title Accurate Visual Localization for Automotive Applications
Authors Eli Brosh, Matan Friedmann, Ilan Kadar, Lev Yitzhak Lavy, Elad Levi, Shmuel Rippa, Yair Lempert, Bruno Fernandez-Ruiz, Roei Herzig, Trevor Darrell
Abstract Accurate vehicle localization is a crucial step towards building effective Vehicle-to-Vehicle networks and automotive applications. Yet standard grade GPS data, such as that provided by mobile phones, is often noisy and exhibits significant localization errors in many urban areas. Approaches for accurate localization from imagery often rely on structure-based techniques, and thus are limited in scale and are expensive to compute. In this paper, we present a scalable visual localization approach geared for real-time performance. We propose a hybrid coarse-to-fine approach that leverages visual and GPS location cues. Our solution uses a self-supervised approach to learn a compact road image representation. This representation enables efficient visual retrieval and provides coarse localization cues, which are fused with vehicle ego-motion to obtain high accuracy location estimates. As a benchmark to evaluate the performance of our visual localization approach, we introduce a new large-scale driving dataset based on video and GPS data obtained from a large-scale network of connected dash-cams. Our experiments confirm that our approach is highly effective in challenging urban environments, reducing localization error by an order of magnitude.
Tasks Visual Localization
Published 2019-05-01
URL http://arxiv.org/abs/1905.03706v1
PDF http://arxiv.org/pdf/1905.03706v1.pdf
PWC https://paperswithcode.com/paper/190503706
Repo https://github.com/getnexar/Nexar-Visual-Localization
Framework none
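
The coarse stage described above amounts to GPS-constrained image retrieval over compact road-image embeddings. The sketch below illustrates only that retrieval step; the embedding model, the ego-motion fusion, and all names and parameters (`coarse_localize`, `radius_m`) are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def coarse_localize(query_emb, db_embs, db_coords, gps_fix, radius_m=150.0):
    """Coarse localization by image retrieval, restricted by a noisy GPS prior.

    query_emb : (d,) L2-normalized embedding of the query frame
    db_embs   : (N, d) L2-normalized embeddings of geo-tagged road images
    db_coords : (N, 2) planar (x, y) positions of the database images, in meters
    gps_fix   : (2,) noisy GPS position of the vehicle, in the same frame
    """
    # Keep only database images within the GPS uncertainty radius.
    dists = np.linalg.norm(db_coords - gps_fix, axis=1)
    candidates = np.where(dists <= radius_m)[0]
    if candidates.size == 0:
        return gps_fix  # fall back to the raw GPS fix

    # Cosine similarity reduces to a dot product for normalized embeddings.
    sims = db_embs[candidates] @ query_emb
    best = candidates[np.argmax(sims)]
    return db_coords[best]
```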

Unlearn Dataset Bias in Natural Language Inference by Fitting the Residual

Title Unlearn Dataset Bias in Natural Language Inference by Fitting the Residual
Authors He He, Sheng Zha, Haohan Wang
Abstract Statistical natural language inference (NLI) models are susceptible to learning dataset bias: superficial cues that happen to associate with the label on a particular dataset, but are not useful in general, e.g., negation words indicate contradiction. As exposed by several recent challenge datasets, these models perform poorly when such association is absent, e.g., predicting that “I love dogs” contradicts “I don’t love cats”. Our goal is to design learning algorithms that guard against known dataset bias. We formalize the concept of dataset bias under the framework of distribution shift and present a simple debiasing algorithm based on residual fitting, which we call DRiFt. We first learn a biased model that only uses features that are known to relate to dataset bias. Then, we train a debiased model that fits to the residual of the biased model, focusing on examples that cannot be predicted well by biased features only. We use DRiFt to train three high-performing NLI models on two benchmark datasets, SNLI and MNLI. Our debiased models achieve significant gains over baseline models on two challenge test sets, while maintaining reasonable performance on the original test sets.
Tasks Natural Language Inference
Published 2019-08-28
URL https://arxiv.org/abs/1908.10763v2
PDF https://arxiv.org/pdf/1908.10763v2.pdf
PWC https://paperswithcode.com/paper/unlearn-dataset-bias-in-natural-language
Repo https://github.com/hhexiy/debiased
Framework mxnet
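
The residual-fitting recipe above lends itself to a compact training step: freeze the biased model and train the debiased model so that the *combined* log-probabilities explain the label. The PyTorch sketch below is a hedged reading of that idea; the model architectures, the bias-prone input `x_bias`, and the function name are assumptions.

```python
import torch
import torch.nn.functional as F

def drift_step(biased_model, debiased_model, optimizer, x_bias, x_full, y):
    """One residual-fitting training step (a sketch of the DRiFt idea).

    `biased_model` was pre-trained on bias-prone features (x_bias) and is frozen;
    `debiased_model` sees the full input and is trained so that the combined
    log-probabilities fit the label, i.e. it only needs to explain what the
    biased model gets wrong.
    """
    with torch.no_grad():
        log_p_biased = F.log_softmax(biased_model(x_bias), dim=-1)

    logits_debiased = debiased_model(x_full)
    # Combine in log space: the debiased logits act as a residual correction
    # on top of the frozen biased predictions.
    combined = log_p_biased + logits_debiased
    loss = F.cross_entropy(combined, y)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```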

Emotion-Cause Pair Extraction: A New Task to Emotion Analysis in Texts

Title Emotion-Cause Pair Extraction: A New Task to Emotion Analysis in Texts
Authors Rui Xia, Zixiang Ding
Abstract Emotion cause extraction (ECE), the task aimed at extracting the potential causes behind certain emotions in text, has gained much attention in recent years due to its wide applications. However, it suffers from two shortcomings: 1) the emotion must be annotated before cause extraction in ECE, which greatly limits its applications in real-world scenarios; 2) the way to first annotate emotion and then extract the cause ignores the fact that they are mutually indicative. In this work, we propose a new task: emotion-cause pair extraction (ECPE), which aims to extract the potential pairs of emotions and corresponding causes in a document. We propose a 2-step approach to address this new ECPE task, which first performs individual emotion extraction and cause extraction via multi-task learning, and then conducts emotion-cause pairing and filtering. The experimental results on a benchmark emotion cause corpus prove the feasibility of the ECPE task as well as the effectiveness of our approach.
Tasks Emotion Recognition, Multi-Task Learning
Published 2019-06-04
URL https://arxiv.org/abs/1906.01267v1
PDF https://arxiv.org/pdf/1906.01267v1.pdf
PWC https://paperswithcode.com/paper/emotion-cause-pair-extraction-a-new-task-to
Repo https://github.com/NUSTM/ECPE
Framework tf
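
The second step, pairing and filtering, can be read as scoring the Cartesian product of the clauses found in step 1. A minimal sketch, assuming a hypothetical trained `pair_scorer` and the clause lists produced by step 1:

```python
from itertools import product

def extract_pairs(emotion_clauses, cause_clauses, pair_scorer, threshold=0.5):
    """Pair-and-filter step of the 2-step ECPE approach (a sketch).

    `emotion_clauses` / `cause_clauses` are clause indices predicted in step 1;
    `pair_scorer(e, c)` is a hypothetical trained binary classifier returning
    the probability that clause e's emotion is caused by clause c.
    """
    candidate_pairs = product(emotion_clauses, cause_clauses)
    return [(e, c) for e, c in candidate_pairs if pair_scorer(e, c) >= threshold]
```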

Single Image Deraining: A Comprehensive Benchmark Analysis

Title Single Image Deraining: A Comprehensive Benchmark Analysis
Authors Siyuan Li, Iago Breno Araujo, Wenqi Ren, Zhangyang Wang, Eric K. Tokuda, Roberto Hirata Junior, Roberto Cesar-Junior, Jiawan Zhang, Xiaojie Guo, Xiaochun Cao
Abstract We present a comprehensive study and evaluation of existing single image deraining algorithms, using a new large-scale benchmark consisting of both synthetic and real-world rainy images. This dataset highlights diverse data sources and image contents, and is divided into three subsets (rain streak, rain drop, rain and mist), each serving different training or evaluation purposes. We further provide a rich variety of criteria for deraining algorithm evaluation, ranging from full-reference metrics, to no-reference metrics, to subjective evaluation and the novel task-driven evaluation. Experiments on the dataset shed light on the comparisons and limitations of state-of-the-art deraining algorithms, and suggest promising future directions.
Tasks Rain Removal, Single Image Deraining
Published 2019-03-20
URL http://arxiv.org/abs/1903.08558v1
PDF http://arxiv.org/pdf/1903.08558v1.pdf
PWC https://paperswithcode.com/paper/single-image-deraining-a-comprehensive
Repo https://github.com/lsy17096535/Single-Image-Deraining
Framework none
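
For the full-reference criteria mentioned above, PSNR and SSIM against the clean image are the standard choices. A small helper using scikit-image (the metric library is my assumption; the paper does not prescribe an implementation) might look like:

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def full_reference_scores(derained, ground_truth):
    """Full-reference evaluation of a derained image against its clean ground
    truth. Assumes uint8 RGB arrays of identical shape; the channel_axis
    argument requires scikit-image >= 0.19.
    """
    psnr = peak_signal_noise_ratio(ground_truth, derained, data_range=255)
    ssim = structural_similarity(ground_truth, derained,
                                 channel_axis=-1, data_range=255)
    return {"PSNR": psnr, "SSIM": ssim}
```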

Constructive Type-Logical Supertagging with Self-Attention Networks

Title Constructive Type-Logical Supertagging with Self-Attention Networks
Authors Konstantinos Kogkalidis, Michael Moortgat, Tejaswini Deoskar
Abstract We propose a novel application of self-attention networks towards grammar induction. We present an attention-based supertagger for a refined type-logical grammar, trained on constructing types inductively. In addition to achieving a high overall type accuracy, our model is able to learn the syntax of the grammar’s type system along with its denotational semantics. This lifts the closed world assumption commonly made by lexicalized grammar supertaggers, greatly enhancing its generalization potential. This is evidenced both by its adequate accuracy over sparse word types and its ability to correctly construct complex types never seen during training, which, to the best of our knowledge, was as of yet unaccomplished.
Tasks
Published 2019-05-31
URL https://arxiv.org/abs/1905.13418v1
PDF https://arxiv.org/pdf/1905.13418v1.pdf
PWC https://paperswithcode.com/paper/constructive-type-logical-supertagging-with
Repo https://github.com/konstantinosKokos/Lassy-TLG-Supertagging
Framework pytorch

Improving Robustness of Deep Learning Based Knee MRI Segmentation: Mixup and Adversarial Domain Adaptation

Title Improving Robustness of Deep Learning Based Knee MRI Segmentation: Mixup and Adversarial Domain Adaptation
Authors Egor Panfilov, Aleksei Tiulpin, Stefan Klein, Miika T. Nieminen, Simo Saarakkala
Abstract Degeneration of articular cartilage (AC) is actively studied in knee osteoarthritis (OA) research via magnetic resonance imaging (MRI). Segmentation of AC tissues from MRI data is an essential step in quantification of their damage. Deep learning (DL) based methods have shown potential in this realm and are the current state-of-the-art; however, their robustness to heterogeneity of MRI acquisition settings remains an open problem. In this study, we investigated two modern regularization techniques – mixup and adversarial unsupervised domain adaptation (UDA) – to improve the robustness of DL-based knee cartilage segmentation to new MRI acquisition settings. Our validation setup included two datasets produced by different MRI scanners and using distinct data acquisition protocols. We assessed the robustness of automatic segmentation by comparing mixup and UDA approaches to a strong baseline method at different OA severity stages and, additionally, in relation to anatomical locations. Our results showed that for moderate changes in knee MRI data acquisition settings both approaches may provide notable improvements in the robustness, which are consistent for all stages of the disease and affect the clinically important areas of the knee joint. However, mixup may be considered as a recommended approach, since it is more computationally efficient and does not require additional data from the target acquisition setup.
Tasks Domain Adaptation, Unsupervised Domain Adaptation
Published 2019-08-12
URL https://arxiv.org/abs/1908.04126v3
PDF https://arxiv.org/pdf/1908.04126v3.pdf
PWC https://paperswithcode.com/paper/improving-robustness-of-deep-learning-based
Repo https://github.com/MIPT-Oulu/RobustCartilageSegmentation
Framework none
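
Of the two regularizers studied, mixup is the simpler to reproduce: training proceeds on convex combinations of input pairs and of their labels. A generic PyTorch sketch (batch layout, one-hot/mask label format, and the α value are assumptions, not the paper's exact setup):

```python
import numpy as np
import torch

def mixup_batch(x, y_onehot, alpha=0.4):
    """Standard mixup (Zhang et al., 2018): convex combinations of sample
    pairs and of their one-hot labels (or segmentation masks).
    """
    lam = np.random.beta(alpha, alpha)
    perm = torch.randperm(x.size(0))
    x_mixed = lam * x + (1.0 - lam) * x[perm]
    y_mixed = lam * y_onehot + (1.0 - lam) * y_onehot[perm]
    return x_mixed, y_mixed
```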

Semi-supervised representation learning via dual autoencoders for domain adaptation

Title Semi-supervised representation learning via dual autoencoders for domain adaptation
Authors Shuai Yang, Hao Wang, Yuhong Zhang, Pei-Pei Li, Yi Zhu, Xuegang Hu
Abstract Domain adaptation aims to exploit the knowledge in the source domain to promote the learning tasks in the target domain, which plays a critical role in real-world applications. Recently, many deep learning approaches based on autoencoders have achieved significant performance in domain adaptation. However, most existing methods focus on minimizing the distribution divergence by putting the source and target data together to learn global feature representations, while they do not consider the local relationship between instances in the same category from different domains. To address this problem, we propose a novel Semi-Supervised Representation Learning framework via Dual Autoencoders for domain adaptation, named SSRLDA. More specifically, we extract richer feature representations by learning the global and local feature representations simultaneously using two novel autoencoders, which are referred to as marginalized denoising autoencoder with adaptation distribution (MDAad) and multi-class marginalized denoising autoencoder (MMDA) respectively. Meanwhile, we make full use of label information to optimize feature representations. Experimental results show that our proposed approach outperforms several state-of-the-art baseline methods.
Tasks Denoising, Domain Adaptation, Representation Learning, Unsupervised Domain Adaptation
Published 2019-08-04
URL https://arxiv.org/abs/1908.01342v4
PDF https://arxiv.org/pdf/1908.01342v4.pdf
PWC https://paperswithcode.com/paper/semi-supervised-representation-learning-via
Repo https://github.com/Minminhfut/SSRLDACode
Framework none
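
Both of the paper's autoencoders build on the denoising-autoencoder idea: reconstruct clean features from corrupted inputs. The sketch below shows only that generic building block; the marginalization, adaptation-distribution, and multi-class terms of MDAad/MMDA are not reproduced, and all names are assumptions.

```python
import torch
import torch.nn as nn

class DenoisingAutoencoder(nn.Module):
    """A plain single-layer denoising autoencoder, shown only to illustrate
    the building block that SSRLDA's two autoencoders extend."""
    def __init__(self, dim_in, dim_hidden, corruption=0.3):
        super().__init__()
        self.corruption = corruption
        self.encoder = nn.Linear(dim_in, dim_hidden)
        self.decoder = nn.Linear(dim_hidden, dim_in)

    def forward(self, x):
        # Mask-out (dropout-style) corruption of the input features.
        noisy = x * (torch.rand_like(x) > self.corruption).float()
        hidden = torch.tanh(self.encoder(noisy))
        return self.decoder(hidden), hidden
```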

Heterogeneous Gaussian Mechanism: Preserving Differential Privacy in Deep Learning with Provable Robustness

Title Heterogeneous Gaussian Mechanism: Preserving Differential Privacy in Deep Learning with Provable Robustness
Authors NhatHai Phan, Minh Vu, Yang Liu, Ruoming Jin, Dejing Dou, Xintao Wu, My T. Thai
Abstract In this paper, we propose a novel Heterogeneous Gaussian Mechanism (HGM) to preserve differential privacy in deep neural networks, with provable robustness against adversarial examples. We first relax the constraint of the privacy budget in the traditional Gaussian Mechanism from (0, 1] to (0, ∞), with a new bound of the noise scale to preserve differential privacy. The noise in our mechanism can be arbitrarily redistributed, offering a distinctive ability to address the trade-off between model utility and privacy loss. To derive provable robustness, our HGM is applied to inject Gaussian noise into the first hidden layer. Then, a tighter robustness bound is proposed. Theoretical analysis and thorough evaluations show that our mechanism notably improves the robustness of differentially private deep neural networks, compared with baseline approaches, under a variety of model attacks.
Tasks
Published 2019-06-02
URL https://arxiv.org/abs/1906.01444v1
PDF https://arxiv.org/pdf/1906.01444v1.pdf
PWC https://paperswithcode.com/paper/heterogeneous-gaussian-mechanism-preserving
Repo https://github.com/haiphanNJIT/SecureSGD
Framework tf
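
The mechanism's key move is injecting Gaussian noise with a redistributable, per-neuron scale into the first hidden layer. A hedged sketch of that injection (the paper's noise-scale bound tied to the privacy budget is not reproduced; the function name and the `weights` redistribution vector are assumptions):

```python
import torch

def noisy_first_layer(h, base_sigma, weights):
    """Add heterogeneous Gaussian noise to first-hidden-layer activations.

    h           : (batch, d) pre-noise activations of the first hidden layer
    base_sigma  : scalar noise scale derived from the privacy budget
    weights     : (d,) per-neuron redistribution factors (the "heterogeneous" part)
    """
    sigma = base_sigma * weights            # per-neuron noise scale
    return h + torch.randn_like(h) * sigma
```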

Source Camera Verification from Strongly Stabilized Videos

Title Source Camera Verification from Strongly Stabilized Videos
Authors Enes Altinisik, Husrev Taha Sencar
Abstract The in-camera image stabilization technology deployed by most cameras today poses one of the most significant challenges to photo-response non-uniformity based source camera attribution from videos. When performed digitally, stabilization involves cropping, warping, and inpainting of video frames to eliminate unwanted camera motion. Hence, successful attribution requires inversion of these transformations in a blind manner. To address this challenge, we introduce a source camera verification method for videos that takes into account the spatially variant nature of stabilization transformations and assumes a larger degree of freedom in their search. Our method identifies transformations at a sub-frame level and incorporates a number of constraints to validate their correctness. The method also adopts a holistic approach in countering disruptive effects of other video generation steps, such as video coding and downsizing, for more reliable attribution. Tests performed on one public and two custom datasets show that the proposed method is able to verify the source of 23-40% of videos that underwent stronger stabilization without a significant impact on the false attribution rate.
Tasks Video Generation
Published 2019-11-26
URL https://arxiv.org/abs/1912.05018v2
PDF https://arxiv.org/pdf/1912.05018v2.pdf
PWC https://paperswithcode.com/paper/source-camera-attribution-from-strongly
Repo https://github.com/VideoPRNUExtractor/Weighter
Framework none
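
Attribution ultimately rests on correlating a frame's noise residual with the camera's PRNU fingerprint; stabilization inversion exists to make this test meaningful again. A basic normalized-correlation check is sketched below; residual and fingerprint extraction, and the paper's sub-frame transformation search, are not shown.

```python
import numpy as np

def fingerprint_correlation(residual, fingerprint):
    """Normalized correlation between a frame's noise residual and a camera's
    PRNU fingerprint (both 2D arrays of the same shape)."""
    r = residual - residual.mean()
    f = fingerprint - fingerprint.mean()
    return float((r * f).sum() / (np.linalg.norm(r) * np.linalg.norm(f) + 1e-12))
```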

CITE: A Corpus of Image-Text Discourse Relations

Title CITE: A Corpus of Image-Text Discourse Relations
Authors Malihe Alikhani, Sreyasi Nag Chowdhury, Gerard de Melo, Matthew Stone
Abstract This paper presents a novel crowd-sourced resource for multimodal discourse: our resource characterizes inferences in image-text contexts in the domain of cooking recipes in the form of coherence relations. Like previous corpora annotating discourse structure between text arguments, such as the Penn Discourse Treebank, our new corpus aids in establishing a better understanding of natural communication and common-sense reasoning, while our findings have implications for a wide range of applications, such as understanding and generation of multimodal documents.
Tasks Common Sense Reasoning
Published 2019-04-12
URL http://arxiv.org/abs/1904.06286v2
PDF http://arxiv.org/pdf/1904.06286v2.pdf
PWC https://paperswithcode.com/paper/cite-a-corpus-of-image-text-discourse
Repo https://github.com/malihealikhani/CITE
Framework none

Time2Vec: Learning a Vector Representation of Time

Title Time2Vec: Learning a Vector Representation of Time
Authors Seyed Mehran Kazemi, Rishab Goel, Sepehr Eghbali, Janahan Ramanan, Jaspreet Sahota, Sanjay Thakur, Stella Wu, Cathal Smyth, Pascal Poupart, Marcus Brubaker
Abstract Time is an important feature in many applications involving events that occur synchronously and/or asynchronously. To effectively consume time information, recent studies have focused on designing new architectures. In this paper, we take an orthogonal but complementary approach by providing a model-agnostic vector representation for time, called Time2Vec, that can be easily imported into many existing and future architectures and improve their performances. We show on a range of models and problems that replacing the notion of time with its Time2Vec representation improves the performance of the final model.
Tasks
Published 2019-07-11
URL https://arxiv.org/abs/1907.05321v1
PDF https://arxiv.org/pdf/1907.05321v1.pdf
PWC https://paperswithcode.com/paper/time2vec-learning-a-vector-representation-of
Repo https://github.com/avinashbarnwal/Time2Vec
Framework none
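
The representation itself is small: one linear component plus k periodic components of a learned affine transform of time, with sin as the usual periodic activation. A PyTorch sketch consistent with that description (layer and parameter names are mine):

```python
import torch
import torch.nn as nn

class Time2Vec(nn.Module):
    """Time2Vec layer: t2v(tau)[0] is linear in tau, the remaining k entries
    are sin of learned affine transforms of tau."""
    def __init__(self, k):
        super().__init__()
        self.w = nn.Parameter(torch.randn(k + 1))
        self.b = nn.Parameter(torch.randn(k + 1))

    def forward(self, tau):
        # tau: (batch, 1) scalar time values
        v = tau * self.w + self.b                      # (batch, k + 1)
        return torch.cat([v[:, :1], torch.sin(v[:, 1:])], dim=-1)
```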

ProtoAttend: Attention-Based Prototypical Learning

Title ProtoAttend: Attention-Based Prototypical Learning
Authors Sercan O. Arik, Tomas Pfister
Abstract We propose a novel inherently interpretable machine learning method that bases decisions on few relevant examples that we call prototypes. Our method, ProtoAttend, can be integrated into a wide range of neural network architectures including pre-trained models. It utilizes an attention mechanism that relates the encoded representations to samples in order to determine prototypes. The resulting model outperforms state of the art in three high impact problems without sacrificing accuracy of the original model: (1) it enables high-quality interpretability that outputs samples most relevant to the decision-making (i.e. a sample-based interpretability method); (2) it achieves state of the art confidence estimation by quantifying the mismatch across prototype labels; and (3) it obtains state of the art in distribution mismatch detection. All this can be achieved with minimal additional test time and a practically viable training time computational cost.
Tasks Decision Making, Interpretable Machine Learning
Published 2019-02-17
URL https://arxiv.org/abs/1902.06292v4
PDF https://arxiv.org/pdf/1902.06292v4.pdf
PWC https://paperswithcode.com/paper/attention-based-prototypical-learning-towards
Repo https://github.com/google-research/google-research/tree/master/protoattend
Framework tf
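
The core step is an attention distribution over a database of candidate samples; those weights are the per-sample prototype relevances from which the decision and confidence are derived. A minimal sketch of that step only (the encoders, sparsity regularization, and the exact confidence rule are omitted; names are assumptions):

```python
import torch
import torch.nn.functional as F

def prototype_weights(query_emb, candidate_embs, temperature=1.0):
    """Attention of an encoded query over encoded candidate samples.

    query_emb      : (d,) encoding of the input to be explained
    candidate_embs : (N, d) encodings of the candidate database
    Returns the softmax weights, interpreted as prototype relevances.
    """
    scores = candidate_embs @ query_emb / temperature   # (N,)
    return F.softmax(scores, dim=0)
```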

Fonts-2-Handwriting: A Seed-Augment-Train framework for universal digit classification

Title Fonts-2-Handwriting: A Seed-Augment-Train framework for universal digit classification
Authors Vinay Uday Prabhu, Sanghyun Han, Dian Ang Yap, Mihail Douhaniaris, Preethi Seshadri, John Whaley
Abstract In this paper, we propose a Seed-Augment-Train/Transfer (SAT) framework that contains a synthetic seed image dataset generation procedure for languages with different numeral systems using freely available open font file datasets. This seed dataset of images is then augmented to create a purely synthetic training dataset, which is in turn used to train a deep neural network and test on held-out real world handwritten digits dataset spanning five Indic scripts, Kannada, Tamil, Gujarati, Malayalam, and Devanagari. We showcase the efficacy of this approach both qualitatively, by training a Boundary-seeking GAN (BGAN) that generates realistic digit images in the five languages, and also quantitatively by testing a CNN trained on the synthetic data on the real-world datasets. This establishes not only an interesting nexus between the font-datasets-world and transfer learning but also provides a recipe for universal-digit classification in any script.
Tasks Transfer Learning
Published 2019-05-16
URL https://arxiv.org/abs/1905.08633v1
PDF https://arxiv.org/pdf/1905.08633v1.pdf
PWC https://paperswithcode.com/paper/190508633
Repo https://github.com/unifyid-labs/DeepGenStruct-Notebooks
Framework none
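
The "Seed" stage reduces to rendering glyphs from open font files. A small sketch with Pillow is given below; the font path and layout constants are placeholders, and the augmentation and CNN-training stages follow separately.

```python
from PIL import Image, ImageDraw, ImageFont

def render_seed_digit(char, font_path, size=32):
    """Render one synthetic 'seed' glyph on a black canvas.

    char      : the digit character to render (in any target script)
    font_path : path to an open .ttf font covering that script (placeholder)
    """
    img = Image.new("L", (size, size), color=0)
    font = ImageFont.truetype(font_path, int(size * 0.8))
    draw = ImageDraw.Draw(img)
    draw.text((size // 6, 0), char, font=font, fill=255)
    return img
```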

A Learnable Safety Measure

Title A Learnable Safety Measure
Authors Steve Heim, Alexander von Rohr, Sebastian Trimpe, Alexander Badri-Spröwitz
Abstract Failures are challenging for learning to control physical systems since they risk damage, time-consuming resets, and often provide little gradient information. Adding safety constraints to exploration typically requires a lot of prior knowledge and domain expertise. We present a safety measure which implicitly captures how the system dynamics relate to a set of failure states. Not only can this measure be used as a safety function, but also to directly compute the set of safe state-action pairs. Further, we show a model-free approach to learn this measure by active sampling using Gaussian processes. While safety can only be guaranteed after learning the safety measure, we show that failures can already be greatly reduced by using the estimated measure during learning.
Tasks Gaussian Processes
Published 2019-10-07
URL https://arxiv.org/abs/1910.02835v1
PDF https://arxiv.org/pdf/1910.02835v1.pdf
PWC https://paperswithcode.com/paper/a-learnable-safety-measure
Repo https://github.com/sheim/vibly
Framework none
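
As a much-simplified proxy for the idea — actively sampled outcomes fit with a Gaussian process, then thresholded into a safe set — one could fit a GP classifier over (state, action) samples, as sketched below. This is not the paper's measure, which additionally encodes how the system dynamics relate to the failure set; all names and the threshold are assumptions.

```python
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

def estimate_safe_set(states_actions, failed, grid, threshold=0.95):
    """Fit a GP classifier on sampled outcomes and threshold it into a safe set.

    states_actions : (N, d) sampled state-action pairs
    failed         : (N,) boolean array, True where the sample led to failure
    grid           : (M, d) state-action pairs to classify as safe/unsafe
    """
    gp = GaussianProcessClassifier(kernel=1.0 * RBF(length_scale=1.0))
    gp.fit(states_actions, (~failed).astype(int))   # 1 = survived
    p_safe = gp.predict_proba(grid)[:, 1]
    return grid[p_safe >= threshold]
```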

Visual Semantic Reasoning for Image-Text Matching

Title Visual Semantic Reasoning for Image-Text Matching
Authors Kunpeng Li, Yulun Zhang, Kai Li, Yuanyuan Li, Yun Fu
Abstract Image-text matching has been a hot research topic bridging the vision and language areas. It remains challenging because the current representation of image usually lacks global semantic concepts as in its corresponding text caption. To address this issue, we propose a simple and interpretable reasoning model to generate visual representation that captures key objects and semantic concepts of a scene. Specifically, we first build up connections between image regions and perform reasoning with Graph Convolutional Networks to generate features with semantic relationships. Then, we propose to use the gate and memory mechanism to perform global semantic reasoning on these relationship-enhanced features, select the discriminative information and gradually generate the representation for the whole scene. Experiments validate that our method achieves a new state-of-the-art for the image-text matching on MS-COCO and Flickr30K datasets. It outperforms the current best method by 6.8% relatively for image retrieval and 4.8% relatively for caption retrieval on MS-COCO (Recall@1 using 1K test set). On Flickr30K, our model improves image retrieval by 12.6% relatively and caption retrieval by 5.8% relatively (Recall@1). Our code is available at https://github.com/KunpengLi1994/VSRN.
Tasks Image Retrieval, Text Matching
Published 2019-09-06
URL https://arxiv.org/abs/1909.02701v1
PDF https://arxiv.org/pdf/1909.02701v1.pdf
PWC https://paperswithcode.com/paper/visual-semantic-reasoning-for-image-text
Repo https://github.com/KunpengLi1994/VSRN
Framework pytorch
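
The first stage, region-level reasoning with Graph Convolutional Networks over a learned affinity graph, can be sketched as a single propagation step; the gated/memory global reasoning and the matching objective are omitted, and all layer names are assumptions rather than the released code's.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RegionGCN(nn.Module):
    """One graph-convolution step over detected image regions: a learned
    pairwise affinity graph propagates information between region features."""
    def __init__(self, dim):
        super().__init__()
        self.query = nn.Linear(dim, dim)
        self.key = nn.Linear(dim, dim)
        self.update = nn.Linear(dim, dim)

    def forward(self, regions):
        # regions: (num_regions, dim) features from an object detector
        affinity = self.query(regions) @ self.key(regions).t()
        adj = F.softmax(affinity, dim=-1)           # learned relationship graph
        return F.relu(self.update(adj @ regions)) + regions
```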