February 1, 2020

3112 words 15 mins read

Paper Group AWR 163

Why are Saliency Maps Noisy? Cause of and Solution to Noisy Saliency Maps. UNITER: Learning UNiversal Image-TExt Representations. Don’t Take the Easy Way Out: Ensemble Based Methods for Avoiding Known Dataset Biases. Deep Leakage from Gradients. STGAN: A Unified Selective Transfer Network for Arbitrary Image Attribute Editing. To Tune or Not To Tun …

Why are Saliency Maps Noisy? Cause of and Solution to Noisy Saliency Maps

Title Why are Saliency Maps Noisy? Cause of and Solution to Noisy Saliency Maps
Authors Beomsu Kim, Junghoon Seo, SeungHyun Jeon, Jamyoung Koo, Jeongyeol Choe, Taegyun Jeon
Abstract The saliency map, the gradient of the score function with respect to the input, is the most basic technique for interpreting deep neural network decisions. However, saliency maps are often visually noisy. Although several hypotheses have been proposed to account for this phenomenon, few works provide rigorous analyses of noisy saliency maps. In this paper, we first propose a new hypothesis that noise may occur in saliency maps when irrelevant features pass through ReLU activation functions. We then propose Rectified Gradient, a method that alleviates this problem through layer-wise thresholding during backpropagation. Experiments with neural networks trained on CIFAR-10 and ImageNet demonstrate the effectiveness of our method and its superiority to other attribution methods.
Tasks
Published 2019-02-13
URL https://arxiv.org/abs/1902.04893v3
PDF https://arxiv.org/pdf/1902.04893v3.pdf
PWC https://paperswithcode.com/paper/why-are-saliency-maps-noisy-cause-of-and
Repo https://github.com/1202kbs/Rectified-Gradient
Framework tf
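
The abstract describes computing a saliency map as the input gradient and denoising it with layer-wise thresholding at ReLUs. Below is a minimal, hypothetical PyTorch sketch of that idea: a vanilla gradient saliency map plus a RectGrad-style backward hook that keeps gradients only where activation times gradient exceeds a per-example quantile. The toy network, the quantile q, and the exact placement of the threshold are assumptions for illustration, not the official implementation (see the linked repo for that).

```python
import torch
import torch.nn as nn

class SmallNet(nn.Module):                        # toy CIFAR-10-sized network
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 10))

    def forward(self, x):
        return self.net(x)

model = SmallNet().eval()
cache = {}                                        # ReLU outputs seen in the forward pass

def save_output(module, inputs, output):
    cache[module] = output.detach()

def rectgrad(module, grad_input, grad_output, q=0.8):
    # Propagate the gradient through ReLU only where activation * gradient
    # exceeds the per-example q-quantile (the thresholding rule is assumed).
    prod = cache[module] * grad_output[0]
    tau = torch.quantile(prod.flatten(1), q, dim=1)
    tau = tau.view(-1, *([1] * (prod.dim() - 1)))
    return ((prod > tau).float() * grad_input[0],)

for m in model.modules():
    if isinstance(m, nn.ReLU):
        m.register_forward_hook(save_output)
        m.register_full_backward_hook(rectgrad)

x = torch.rand(1, 3, 32, 32, requires_grad=True)
model(x).max().backward()                         # gradient of the top class score
saliency = x.grad.abs().max(dim=1).values         # (1, 32, 32) saliency map
```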

UNITER: Learning UNiversal Image-TExt Representations

Title UNITER: Learning UNiversal Image-TExt Representations
Authors Yen-Chun Chen, Linjie Li, Licheng Yu, Ahmed El Kholy, Faisal Ahmed, Zhe Gan, Yu Cheng, Jingjing Liu
Abstract Joint image-text embedding is the bedrock for most Vision-and-Language (V+L) tasks, where multimodal inputs are jointly processed for visual and textual understanding. In this paper, we introduce UNITER, a UNiversal Image-TExt Representation, learned through large-scale pre-training over four image-text datasets (COCO, Visual Genome, Conceptual Captions, and SBU Captions), which can power heterogeneous downstream V+L tasks with joint multimodal embeddings. We design three pre-training tasks: Masked Language Modeling (MLM), Image-Text Matching (ITM), and Masked Region Modeling (MRM, with three variants). Unlike concurrent work on multimodal pre-training that applies joint random masking to both modalities, we use conditioned masking in the pre-training tasks (i.e., masked language/region modeling is conditioned on full observation of the image/text). Comprehensive analysis shows that conditioned masking yields better performance than unconditioned masking. We also conduct a thorough ablation study to find an optimal setting for the combination of pre-training tasks. Extensive experiments show that UNITER achieves new state of the art across six V+L tasks (over nine datasets), including Visual Question Answering, Image-Text Retrieval, Referring Expression Comprehension, Visual Commonsense Reasoning, Visual Entailment, and NLVR2.
Tasks Language Modelling, Question Answering, Text Matching, Visual Commonsense Reasoning, Visual Question Answering
Published 2019-09-25
URL https://arxiv.org/abs/1909.11740v1
PDF https://arxiv.org/pdf/1909.11740v1.pdf
PWC https://paperswithcode.com/paper/uniter-learning-universal-image-text-1
Repo https://github.com/ChenRocks/UNITER
Framework pytorch
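
As a toy illustration of the conditioned masking the abstract contrasts with joint random masking, the sketch below masks only text tokens for MLM (leaving all image regions observed) and only region features for MRM (leaving all text observed). The tensor shapes, mask probability, and mask token id are illustrative assumptions, not UNITER's actual preprocessing code.

```python
import torch

def conditioned_mask(text_ids, region_feats, task, p=0.15, mask_id=103):
    """Mask one modality while keeping the other fully observed."""
    text_ids, region_feats = text_ids.clone(), region_feats.clone()
    if task == "mlm":                     # mask text tokens, keep all regions
        m = torch.rand(text_ids.shape) < p
        text_ids[m] = mask_id
    elif task == "mrm":                   # mask region features, keep all text
        m = torch.rand(region_feats.shape[:2]) < p
        region_feats[m] = 0.0
    return text_ids, region_feats

ids = torch.randint(1000, (2, 12))        # toy token ids: batch of 2, length 12
feats = torch.randn(2, 36, 2048)          # toy region features: 36 boxes each
masked_ids, feats_unchanged = conditioned_mask(ids, feats, "mlm")
```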

Don’t Take the Easy Way Out: Ensemble Based Methods for Avoiding Known Dataset Biases

Title Don’t Take the Easy Way Out: Ensemble Based Methods for Avoiding Known Dataset Biases
Authors Christopher Clark, Mark Yatskar, Luke Zettlemoyer
Abstract State-of-the-art models often make use of superficial patterns in the data that do not generalize well to out-of-domain or adversarial settings. For example, textual entailment models often learn that particular key words imply entailment, irrespective of context, and visual question answering models learn to predict prototypical answers, without considering evidence in the image. In this paper, we show that if we have prior knowledge of such biases, we can train a model to be more robust to domain shift. Our method has two stages: we (1) train a naive model that makes predictions exclusively based on dataset biases, and (2) train a robust model as part of an ensemble with the naive one in order to encourage it to focus on other patterns in the data that are more likely to generalize. Experiments on five datasets with out-of-domain test sets show significantly improved robustness in all settings, including a 12 point gain on a changing priors visual question answering dataset and a 9 point gain on an adversarial question answering test set.
Tasks Natural Language Inference, Question Answering, Visual Question Answering
Published 2019-09-09
URL https://arxiv.org/abs/1909.03683v1
PDF https://arxiv.org/pdf/1909.03683v1.pdf
PWC https://paperswithcode.com/paper/dont-take-the-easy-way-out-ensemble-based
Repo https://github.com/chrisc36/debias
Framework tf
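
The two-stage recipe above lends itself to a product-of-experts style objective: the robust model's log-probabilities are summed with those of a frozen bias-only model before the cross-entropy is applied, so the robust model is pushed to explain what the bias model cannot. The sketch below is a hedged illustration of that idea; the paper evaluates several ensemble variants, and this is not a copy of the released code.

```python
import torch
import torch.nn.functional as F

def ensemble_loss(robust_logits, bias_logits, labels):
    # Combine the two models in log space; the bias model is detached, so
    # gradients flow only through the robust model.
    combined = F.log_softmax(robust_logits, dim=-1) + \
               F.log_softmax(bias_logits, dim=-1).detach()
    # cross_entropy renormalizes, giving the NLL of the product distribution.
    return F.cross_entropy(combined, labels)

robust_logits = torch.randn(4, 3, requires_grad=True)   # robust model output
bias_logits = torch.randn(4, 3)                          # frozen bias-only model
loss = ensemble_loss(robust_logits, bias_logits, torch.tensor([0, 1, 2, 0]))
loss.backward()                                          # updates the robust model only
```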

Deep Leakage from Gradients

Title Deep Leakage from Gradients
Authors Ligeng Zhu, Zhijian Liu, Song Han
Abstract Exchanging gradients is a widely used method in modern multi-node machine learning systems (e.g., distributed training, collaborative learning). For a long time, people believed that gradients were safe to share, i.e., that the training data would not be leaked by gradient exchange. However, we show that it is possible to obtain the private training data from the publicly shared gradients. We name this leakage Deep Leakage from Gradients and empirically validate its effectiveness on both computer vision and natural language processing tasks. Experimental results show that our attack is much stronger than previous approaches: the recovery is pixel-wise accurate for images and token-wise matching for texts. We want to raise awareness that gradients should not be assumed safe to share. Finally, we discuss several possible strategies to prevent such deep leakage. The most effective defense method is gradient pruning.
Tasks
Published 2019-06-21
URL https://arxiv.org/abs/1906.08935v2
PDF https://arxiv.org/pdf/1906.08935v2.pdf
PWC https://paperswithcode.com/paper/deep-leakage-from-gradients
Repo https://github.com/mit-han-lab/dlg
Framework pytorch
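
The attack described above can be sketched as gradient matching: dummy inputs and labels are optimized until the gradients they induce match the gradients shared by the victim. The toy linear model, data sizes, and iteration count below are illustrative assumptions; the paper's experiments use real vision and language models.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Linear(20, 5)                               # toy "victim" model
x_true, y_true = torch.randn(1, 20), torch.tensor([3])
true_grads = torch.autograd.grad(
    F.cross_entropy(model(x_true), y_true), model.parameters())

x_dummy = torch.randn(1, 20, requires_grad=True)       # randomly initialized data
y_dummy = torch.randn(1, 5, requires_grad=True)        # soft dummy label
opt = torch.optim.LBFGS([x_dummy, y_dummy])

def closure():
    opt.zero_grad()
    # Soft-label cross entropy needs PyTorch >= 1.10.
    loss = F.cross_entropy(model(x_dummy), F.softmax(y_dummy, dim=-1))
    dummy_grads = torch.autograd.grad(loss, model.parameters(),
                                      create_graph=True)
    diff = sum(((dg - tg) ** 2).sum()
               for dg, tg in zip(dummy_grads, true_grads))
    diff.backward()                                    # gradients w.r.t. the dummy data
    return diff

for _ in range(50):
    opt.step(closure)                                  # x_dummy drifts toward x_true
```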

STGAN: A Unified Selective Transfer Network for Arbitrary Image Attribute Editing

Title STGAN: A Unified Selective Transfer Network for Arbitrary Image Attribute Editing
Authors Ming Liu, Yukang Ding, Min Xia, Xiao Liu, Errui Ding, Wangmeng Zuo, Shilei Wen
Abstract Arbitrary attribute editing can generally be tackled by combining an encoder-decoder with generative adversarial networks. However, the bottleneck layer in the encoder-decoder usually gives rise to blurry, low-quality editing results, while adding skip connections improves image quality at the cost of weakened attribute manipulation ability. Moreover, existing methods exploit the target attribute vector to guide flexible translation to the desired target domain. In this work, we suggest addressing these issues from a selective transfer perspective. Since a specific editing task involves only the changed attributes rather than all target attributes, our model selectively takes the difference between the target and source attribute vectors as input. Furthermore, selective transfer units are incorporated into the encoder-decoder to adaptively select and modify encoder features for enhanced attribute editing. Experiments show that our method (i.e., STGAN) simultaneously improves attribute manipulation accuracy and perceptual quality, and performs favorably against state-of-the-art methods in arbitrary facial attribute editing and season translation.
Tasks
Published 2019-04-22
URL http://arxiv.org/abs/1904.09709v1
PDF http://arxiv.org/pdf/1904.09709v1.pdf
PWC https://paperswithcode.com/paper/stgan-a-unified-selective-transfer-network
Repo https://github.com/csmliu/STGAN
Framework tf
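
A toy sketch of the difference attribute vector idea from the abstract: the generator is conditioned on target minus source attributes, so unchanged attributes contribute zeros. The tiny encoder/decoder below is a placeholder and omits STGAN's GRU-style selective transfer units; the attribute count of 13 is only an example.

```python
import torch
import torch.nn as nn

class DiffAttrEditor(nn.Module):
    def __init__(self, n_attrs=13):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(3, 32, 4, 2, 1), nn.ReLU())
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(32 + n_attrs, 3, 4, 2, 1), nn.Tanh())

    def forward(self, img, src_attr, tgt_attr):
        diff = tgt_attr - src_attr                 # zeros for unchanged attributes
        h = self.enc(img)
        d = diff.view(*diff.shape, 1, 1).expand(-1, -1, *h.shape[2:])
        return self.dec(torch.cat([h, d], dim=1))

model = DiffAttrEditor()
img = torch.randn(2, 3, 128, 128)
src = torch.randint(0, 2, (2, 13)).float()
tgt = src.clone()
tgt[:, 0] = 1 - tgt[:, 0]                          # flip a single attribute
out = model(img, src, tgt)                         # edited images, (2, 3, 128, 128)
```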

To Tune or Not To Tune? How About the Best of Both Worlds?

Title To Tune or Not To Tune? How About the Best of Both Worlds?
Authors Ran Wang, Haibo Su, Chunye Wang, Kailin Ji, Jupeng Ding
Abstract The introduction of pre-trained language models has revolutionized natural language research communities. However, researchers still know relatively little about their theoretical and empirical properties. In this regard, Peters et al. perform several experiments demonstrating that it is better to adapt BERT with a lightweight task-specific head, keeping the parameters of the language model frozen, than to build a complex head on top of the pre-trained model. However, there is another option. In this paper, we propose a new adaptation method in which we first train the task model with the BERT parameters frozen and then fine-tune the entire model together. Our experimental results show that our model adaptation method achieves a 4.7% accuracy improvement on the semantic similarity task, a 0.99% accuracy improvement on the sequence labeling task, and a 0.72% accuracy improvement on the text classification task.
Tasks Language Modelling, Semantic Similarity, Semantic Textual Similarity, Text Classification
Published 2019-07-09
URL https://arxiv.org/abs/1907.05338v1
PDF https://arxiv.org/pdf/1907.05338v1.pdf
PWC https://paperswithcode.com/paper/to-tune-or-not-to-tune-how-about-the-best-of
Repo https://github.com/uzaymacar/comparatively-finetuning-bert
Framework pytorch
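
A minimal sketch of the two-stage adaptation described above: first train only the task head with the pre-trained encoder frozen, then unfreeze everything and fine-tune the whole model together. The stand-in encoder, fake batches, learning rates, and step counts are illustrative assumptions rather than the paper's setup.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Embedding(30522, 256), nn.Flatten(),
                        nn.Linear(256 * 32, 256))      # stand-in for BERT
head = nn.Linear(256, 2)                               # light-weight task head

def train(params, lr, steps):
    opt = torch.optim.AdamW(params, lr=lr)
    for _ in range(steps):
        ids = torch.randint(30522, (8, 32))            # fake batch of token ids
        labels = torch.randint(2, (8,))
        loss = F.cross_entropy(head(encoder(ids)), labels)
        opt.zero_grad()
        loss.backward()
        opt.step()

# Stage 1: freeze the encoder and train only the task head.
for p in encoder.parameters():
    p.requires_grad = False
train(head.parameters(), lr=1e-3, steps=100)

# Stage 2: unfreeze and fine-tune the entire model together.
for p in encoder.parameters():
    p.requires_grad = True
train(list(encoder.parameters()) + list(head.parameters()), lr=2e-5, steps=100)
```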

Neural Snowball for Few-Shot Relation Learning

Title Neural Snowball for Few-Shot Relation Learning
Authors Tianyu Gao, Xu Han, Ruobing Xie, Zhiyuan Liu, Fen Lin, Leyu Lin, Maosong Sun
Abstract Knowledge graphs typically undergo open-ended growth of new relations. This cannot be well handled by relation extraction that focuses on pre-defined relations with sufficient training data. To address new relations with few-shot instances, we propose a novel bootstrapping approach, Neural Snowball, to learn new relations by transferring semantic knowledge about existing relations. More specifically, we use Relational Siamese Networks (RSN) to learn the metric of relational similarities between instances based on existing relations and their labeled data. Afterwards, given a new relation and its few-shot instances, we use RSN to accumulate reliable instances from unlabeled corpora; these instances are used to train a relation classifier, which can further identify new facts of the new relation. The process is conducted iteratively like a snowball. Experiments show that our model can gather high-quality instances for better few-shot relation learning and achieves significant improvement compared to baselines. Codes and datasets are released on https://github.com/thunlp/Neural-Snowball.
Tasks Knowledge Graphs, Relation Extraction
Published 2019-08-29
URL https://arxiv.org/abs/1908.11007v2
PDF https://arxiv.org/pdf/1908.11007v2.pdf
PWC https://paperswithcode.com/paper/neural-snowball-for-few-shot-relation
Repo https://github.com/thunlp/Neural-Snowball
Framework pytorch
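
The snowball procedure above can be summarized as the high-level loop below. `rsn_similarity` and `train_classifier` are placeholders for the Relational Siamese Network and the relation classifier; the thresholds and number of rounds are assumptions, not the paper's hyperparameters.

```python
def neural_snowball(seed_instances, unlabeled_pool, rsn_similarity,
                    train_classifier, rounds=3, sim_thr=0.9, conf_thr=0.9):
    """Iteratively grow a new relation's training set and classifier."""
    support = list(seed_instances)                 # few-shot seed instances
    classifier = None
    for _ in range(rounds):
        # Phase 1: add unlabeled instances the RSN judges similar enough
        # to something already in the support set.
        for x in list(unlabeled_pool):
            if max(rsn_similarity(x, s) for s in support) > sim_thr:
                support.append(x)
                unlabeled_pool.remove(x)
        # Phase 2: train a relation classifier on the enlarged support set
        # and let its confident predictions grow the set further.
        classifier = train_classifier(support)
        for x in list(unlabeled_pool):
            if classifier(x) > conf_thr:
                support.append(x)
                unlabeled_pool.remove(x)
    return classifier, support
```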

ContactDB: Analyzing and Predicting Grasp Contact via Thermal Imaging

Title ContactDB: Analyzing and Predicting Grasp Contact via Thermal Imaging
Authors Samarth Brahmbhatt, Cusuh Ham, Charles C. Kemp, James Hays
Abstract Grasping and manipulating objects is an important human skill. Since hand-object contact is fundamental to grasping, capturing it can lead to important insights. However, observing contact through external sensors is challenging because of occlusion and the complexity of the human hand. We present ContactDB, a novel dataset of contact maps for household objects that captures the rich hand-object contact that occurs during grasping, enabled by use of a thermal camera. Participants in our study grasped 3D printed objects with a post-grasp functional intent. ContactDB includes 3750 3D meshes of 50 household objects textured with contact maps and 375K frames of synchronized RGB-D+thermal images. To the best of our knowledge, this is the first large-scale dataset that records detailed contact maps for human grasps. Analysis of this data shows the influence of functional intent and object size on grasping, the tendency to touch/avoid ‘active areas’, and the high frequency of palm and proximal finger contact. Finally, we train state-of-the-art image translation and 3D convolution algorithms to predict diverse contact patterns from object shape. Data, code and models are available at https://contactdb.cc.gatech.edu.
Tasks Human Grasp Contact Prediction
Published 2019-04-15
URL http://arxiv.org/abs/1904.06830v1
PDF http://arxiv.org/pdf/1904.06830v1.pdf
PWC https://paperswithcode.com/paper/contactdb-analyzing-and-predicting-grasp
Repo https://github.com/samarth-robo/contactdb_utils
Framework none

Hybrid Planning for Dynamic Multimodal Stochastic Shortest Paths

Title Hybrid Planning for Dynamic Multimodal Stochastic Shortest Paths
Authors Shushman Choudhury, Mykel J. Kochenderfer
Abstract Sequential decision problems in applications such as manipulation in warehouses, multi-step meal preparation, and routing in autonomous vehicle networks often involve reasoning about uncertainty, planning over discrete modes as well as continuous states, and reacting to dynamic updates. To formalize such problems generally, we introduce a class of Markov Decision Processes (MDPs) called Dynamic Multimodal Stochastic Shortest Paths (DMSSPs). Much of the work in these domains solves deterministic variants, which can yield poor results when the uncertainty has downstream effects. We develop a Hybrid Stochastic Planning (HSP) algorithm, which uses domain-agnostic abstractions to efficiently unify heuristic search for planning over discrete modes, approximate dynamic programming for stochastic planning over continuous states, and hierarchical interleaved planning and execution. In the domain of autonomous multimodal routing, HSP obtains significantly higher quality solutions than a state-of-the-art Upper Confidence Trees algorithm and a two-level Receding Horizon Control algorithm.
Tasks
Published 2019-06-21
URL https://arxiv.org/abs/1906.09094v1
PDF https://arxiv.org/pdf/1906.09094v1.pdf
PWC https://paperswithcode.com/paper/hybrid-planning-for-dynamic-multimodal
Repo https://github.com/sisl/CMSSPs
Framework none

DZip: improved general-purpose lossless compression based on novel neural network modeling

Title DZip: improved general-purpose lossless compression based on novel neural network modeling
Authors Mohit Goyal, Kedar Tatwawadi, Shubham Chandak, Idoia Ochoa
Abstract We consider lossless compression based on statistical data modeling followed by prediction-based encoding, where an accurate statistical model for the input data leads to substantial improvements in compression. We propose DZip, a general-purpose compressor for sequential data that exploits the well-known modeling capabilities of neural networks (NNs) for prediction, followed by arithmetic coding. DZip uses a novel hybrid architecture based on adaptive and semi-adaptive training. Unlike most NN-based compressors, DZip does not require additional training data and is not restricted to specific data types, only needing the alphabet size of the input data. The proposed compressor outperforms general-purpose compressors such as Gzip (on average 26% reduction) on a variety of real datasets, achieves near-optimal compression on synthetic datasets, and performs close to specialized compressors for large sequence lengths, without any human input. The main limitation of DZip in its current implementation is the encoding/decoding time, which limits its practicality. Nevertheless, the results showcase the potential of developing improved general-purpose compressors based on neural networks and hybrid modeling.
Tasks
Published 2019-11-08
URL https://arxiv.org/abs/1911.03572v1
PDF https://arxiv.org/pdf/1911.03572v1.pdf
PWC https://paperswithcode.com/paper/dzip-improved-general-purpose-lossless
Repo https://github.com/mohit1997/DZip
Framework tf
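
A minimal sketch of prediction-based compression as described above: a neural model predicts a distribution over the next symbol, and the ideal code length for the true symbol is -log2 p, which an arithmetic coder would realize in practice. The small GRU below is a stand-in for DZip's hybrid adaptive/semi-adaptive model, and the toy sequence is random, so the estimate will sit close to the raw entropy.

```python
import math
import torch
import torch.nn as nn

class NextSymbolModel(nn.Module):
    def __init__(self, alphabet_size, hidden=64):
        super().__init__()
        self.emb = nn.Embedding(alphabet_size, hidden)
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, alphabet_size)

    def forward(self, prefix):                     # prefix: (1, t) symbol ids
        h, _ = self.rnn(self.emb(prefix))
        return torch.softmax(self.out(h[:, -1]), dim=-1)

data = torch.randint(0, 4, (1, 200))               # toy sequence, alphabet size 4
model = NextSymbolModel(alphabet_size=4)
bits = 0.0
with torch.no_grad():
    for t in range(1, data.shape[1]):
        p = model(data[:, :t])[0, data[0, t]]      # probability of the true symbol
        bits += -math.log2(p.item())               # ideal arithmetic-code length
print(f"estimated compressed size: {bits / 8:.1f} bytes for 200 symbols")
```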

Synchronous Bidirectional Inference for Neural Sequence Generation

Title Synchronous Bidirectional Inference for Neural Sequence Generation
Authors Jiajun Zhang, Long Zhou, Yang Zhao, Chengqing Zong
Abstract In sequence-to-sequence generation tasks (e.g., machine translation and abstractive summarization), inference is generally performed in a left-to-right manner to produce the result token by token. Neural approaches such as LSTMs and self-attention networks can make full use of all the predicted history hypotheses from the left side during inference, but cannot access any future (right-side) information, and usually generate unbalanced outputs in which the left parts are much more accurate than the right ones. In this work, we propose a synchronous bidirectional inference model that generates outputs using both left-to-right and right-to-left decoding simultaneously and interactively. First, we introduce a novel beam search algorithm that facilitates synchronous bidirectional decoding. Then, we present the core approach, which enables left-to-right and right-to-left decoding to interact with each other so as to utilize both history and future predictions simultaneously during inference. We apply the proposed model to both LSTM and self-attention networks. In addition, we propose two strategies for parameter optimization. Extensive experiments on machine translation and abstractive summarization demonstrate that our synchronous bidirectional inference model achieves remarkable improvements over strong baselines.
Tasks Abstractive Text Summarization, Machine Translation
Published 2019-02-24
URL http://arxiv.org/abs/1902.08955v1
PDF http://arxiv.org/pdf/1902.08955v1.pdf
PWC https://paperswithcode.com/paper/synchronous-bidirectional-inference-for
Repo https://github.com/ZNLP/sb-nmt
Framework tf
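
At a very high level, the step-synchronous decoding described above can be sketched as below: a left-to-right and a right-to-left hypothesis are extended in lockstep, and each step conditions on the other direction's partial output. `step_l2r` and `step_r2l` are placeholder decoder calls; the actual method uses an interactive beam search over both directions rather than this greedy loop.

```python
def synchronous_bidirectional_decode(step_l2r, step_r2l, max_len, eos):
    """Extend a left-to-right and a right-to-left hypothesis in lockstep."""
    l2r, r2l = [], []                    # tokens produced so far per direction
    for _ in range(max_len):
        # Each direction sees its own history plus the other direction's
        # partial output (its "future" tokens in reading order).
        nxt_l2r = step_l2r(history=l2r, future=list(reversed(r2l)))
        nxt_r2l = step_r2l(history=r2l, future=list(reversed(l2r)))
        l2r.append(nxt_l2r)
        r2l.append(nxt_r2l)
        if nxt_l2r == eos and nxt_r2l == eos:
            break
    return l2r, list(reversed(r2l))
```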

Abstractive Summarization of Spoken and Written Conversation

Title Abstractive Summarization of Spoken and Written Conversation
Authors Prakhar Ganesh, Saket Dingliwal
Abstract Nowadays, a lot of information is available in the form of dialogues. We propose a novel abstractive summarization system for conversations. We use sequence tagging of utterances to identify the discourse relations of the dialogue. After capturing these relations in a paragraph, we feed it into an attention-based pointer network to produce abstractive summaries. We obtain ROUGE-1 and ROUGE-2 F-scores similar to those of extractive summaries in various previous works.
Tasks Abstractive Text Summarization
Published 2019-02-05
URL http://arxiv.org/abs/1902.01615v1
PDF http://arxiv.org/pdf/1902.01615v1.pdf
PWC https://paperswithcode.com/paper/abstractive-summarization-of-spoken-and
Repo https://github.com/saketdingliwal/Abstractive-Dialogue-Summarization
Framework tf

Deep localization of protein structures in fluorescence microscopy images

Title Deep localization of protein structures in fluorescence microscopy images
Authors Muhammad Tahir, Saeed Anwar, Ajmal Mian
Abstract Accurate localization of proteins from fluorescence microscopy images is a challenging task due to inter-class similarities and intra-class disparities, which complicate multi-class classification. Conventional machine learning-based image prediction relies heavily on pre-processing, such as normalization and segmentation, followed by hand-crafted feature extraction before classification to identify useful, informative, and application-specific features. We propose an end-to-end Protein Localization Convolutional Neural Network (PLCNN) that classifies protein localization images more accurately and reliably. PLCNN directly processes raw imagery without any pre-processing steps and produces outputs without any customization or parameter adjustment for a particular dataset. The output of our approach is computed from probabilities produced by the network. Experimental analysis is performed on five publicly available benchmark datasets. PLCNN consistently outperformed the existing state-of-the-art approaches from machine learning and deep architectures.
Tasks
Published 2019-10-09
URL https://arxiv.org/abs/1910.04287v1
PDF https://arxiv.org/pdf/1910.04287v1.pdf
PWC https://paperswithcode.com/paper/deep-localization-of-protein-structures-in
Repo https://github.com/saeed-anwar/PLCNN
Framework pytorch

Generating Multiple Objects at Spatially Distinct Locations

Title Generating Multiple Objects at Spatially Distinct Locations
Authors Tobias Hinz, Stefan Heinrich, Stefan Wermter
Abstract Recent improvements to Generative Adversarial Networks (GANs) have made it possible to generate realistic images in high resolution based on natural language descriptions such as image captions. Furthermore, conditional GANs allow us to control the image generation process through labels or even natural language descriptions. However, fine-grained control of the image layout, i.e., where in the image specific objects should be located, is still difficult to achieve. This is especially true for images that should contain multiple distinct objects at different spatial locations. We introduce a new approach that allows us to control the location of arbitrarily many objects within an image by adding an object pathway to both the generator and the discriminator. Our approach does not need a detailed semantic layout; only the bounding boxes and labels of the desired objects are needed. The object pathway focuses solely on the individual objects and is iteratively applied at the locations specified by the bounding boxes. The global pathway focuses on the image background and the general image layout. We perform experiments on the Multi-MNIST, CLEVR, and the more complex MS-COCO datasets. Our experiments show that through the use of the object pathway we can control object locations within images and can model complex scenes with multiple objects at various locations. We further show that the object pathway focuses on the individual objects and learns features relevant to them, while the global pathway focuses on global image characteristics and the image background.
Tasks Conditional Image Generation, Image Generation, Text-to-Image Generation
Published 2019-01-03
URL http://arxiv.org/abs/1901.00686v1
PDF http://arxiv.org/pdf/1901.00686v1.pdf
PWC https://paperswithcode.com/paper/generating-multiple-objects-at-spatially
Repo https://github.com/tohinz/multiple-objects-gan
Framework pytorch
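
A toy sketch of the object pathway idea from the abstract: a small generator produces a feature patch per (label, bounding box), the patches are pasted onto an empty feature canvas at their box locations, and a global pathway then processes the combined features. The module sizes, canvas resolution, and pasting-by-slicing mechanism are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

n_labels, feat_ch, canvas = 10, 16, 32
object_path = nn.Sequential(nn.Linear(n_labels, feat_ch * 4 * 4), nn.ReLU())
global_path = nn.Conv2d(feat_ch + 1, feat_ch, 3, padding=1)

def render(noise_map, objects):
    """objects: list of (one-hot label, (x0, y0, x1, y1)) in canvas coordinates."""
    canvas_feat = torch.zeros(1, feat_ch, canvas, canvas)
    for label, (x0, y0, x1, y1) in objects:
        patch = object_path(label).view(1, feat_ch, 4, 4)
        patch = F.interpolate(patch, size=(y1 - y0, x1 - x0), mode="bilinear",
                              align_corners=False)
        canvas_feat[:, :, y0:y1, x0:x1] += patch   # paste at the bounding box
    return global_path(torch.cat([canvas_feat, noise_map], dim=1))

objs = [(F.one_hot(torch.tensor([3]), n_labels).float(), (2, 2, 14, 14)),
        (F.one_hot(torch.tensor([7]), n_labels).float(), (16, 18, 30, 30))]
out = render(torch.randn(1, 1, canvas, canvas), objs)   # (1, 16, 32, 32) features
```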

LitGen: Genetic Literature Recommendation Guided by Human Explanations

Title LitGen: Genetic Literature Recommendation Guided by Human Explanations
Authors Allen Nie, Arturo L. Pineda, Matt W. Wright, Hannah Wand, Bryan Wulf, Helio A. Costa, Ronak Y. Patel, Carlos D. Bustamante, James Zou
Abstract As genetic sequencing costs decrease, the lack of clinical interpretation of variants has become the bottleneck in using genetics data. A major rate-limiting step in clinical interpretation is the manual curation of evidence in the genetic literature by highly trained biocurators. What makes curation particularly time-consuming is that the curator needs to identify papers that study variant pathogenicity using different types of approaches and evidence, e.g., biochemical assays or case-control analysis. In collaboration with the Clinical Genomic Resource (ClinGen), the flagship NIH program for clinical curation, we propose the first machine learning system, LitGen, that can retrieve papers for a particular variant and filter them by the specific evidence types used by curators to assess pathogenicity. LitGen uses semi-supervised deep learning to predict the type of evidence provided by each paper. It is trained on papers annotated by ClinGen curators and systematically evaluated on new test data collected by ClinGen. LitGen further leverages rich human explanations and unlabeled data to gain a 7.9%-12.6% relative performance improvement over models learned only on the annotated papers. It is a useful framework to improve clinical variant curation.
Tasks
Published 2019-09-24
URL https://arxiv.org/abs/1909.10699v1
PDF https://arxiv.org/pdf/1909.10699v1.pdf
PWC https://paperswithcode.com/paper/litgen-genetic-literature-recommendation
Repo https://github.com/windweller/ClinGenML
Framework none