Paper Group AWR 416
Kernel Graph Attention Network for Fact Verification. Learning Individual Styles of Conversational Gesture. Improving Adversarial Robustness via Guided Complement Entropy. Simple Applications of BERT for Ad Hoc Document Retrieval. Memeify: A Large-Scale Meme Generation System. Object-Oriented Dynamics Learning through Multi-Level Abstraction. Scalable Learning-Based Sampling Optimization for Compressive Dynamic MRI. Spatio-temporal crop classification of low-resolution satellite imagery with capsule layers and distributed attention. Gradient Methods for Solving Stackelberg Games. Learning to navigate image manifolds induced by generative adversarial networks for unsupervised video generation. Video Person Re-ID: Fantastic Techniques and Where to Find Them. Modular Multimodal Architecture for Document Classification. FlowDelta: Modeling Flow Information Gain in Reasoning for Conversational Machine Comprehension. Massively Multilingual Transfer for NER. The Pupil Has Become the Master: Teacher-Student Model-Based Word Embedding Distillation with Ensemble Learning.
Kernel Graph Attention Network for Fact Verification
Title | Kernel Graph Attention Network for Fact Verification |
Authors | Zhenghao Liu, Chenyan Xiong, Maosong Sun |
Abstract | This paper presents Kernel Graph Attention Network (KGAT), which conducts more fine-grained evidence selection and reasoning for the fact verification task. Given a claim and a set of potential supporting evidence sentences, KGAT constructs a graph attention network using the evidence sentences as its nodes and learns to verify the claim integrity using its edge kernels and node kernels, where the edge kernels learn to propagate information across the evidence graph, and the node kernels learn to merge node level information to the graph level. KGAT reaches a comparable performance (69.4%) on FEVER, a large-scale benchmark for fact verification. Our experiments find that KGAT thrives on verification scenarios where multiple evidence pieces are required. This advantage mainly comes from the sparse and fine-grained attention mechanisms from our kernel technique. |
Tasks | |
Published | 2019-10-22 |
URL | https://arxiv.org/abs/1910.09796v2 |
https://arxiv.org/pdf/1910.09796v2.pdf | |
PWC | https://paperswithcode.com/paper/kernel-graph-attention-network-for-fact |
Repo | https://github.com/thunlp/KernelGAT |
Framework | pytorch |
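A minimal sketch of the kernel-attention idea in the abstract: per-node (evidence) attention weights are derived from RBF kernel features over claim-evidence token similarities, the "node kernel" part of KGAT. Kernel means/widths, dimensions, and the readout layer below are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class KernelReadout(nn.Module):
    """Scores each evidence node with RBF kernel features over
    claim-evidence token similarities (the 'node kernel' idea)."""
    def __init__(self, n_kernels: int = 11):
        super().__init__()
        # Evenly spaced kernel means on the cosine-similarity range.
        self.mus = torch.linspace(-0.9, 1.0, n_kernels)
        self.sigma = 0.1
        self.score = nn.Linear(n_kernels, 1)

    def forward(self, claim_tok: torch.Tensor, evid_tok: torch.Tensor) -> torch.Tensor:
        # claim_tok: (Lc, d), evid_tok: (n_nodes, Le, d)
        sim = torch.einsum("ld,nmd->nlm",
                           F.normalize(claim_tok, dim=-1),
                           F.normalize(evid_tok, dim=-1))              # (n, Lc, Le)
        # RBF kernels over the similarity matrix, then log-sum pooling.
        k = torch.exp(-((sim.unsqueeze(-1) - self.mus) ** 2) / (2 * self.sigma ** 2))
        feats = torch.log1p(k.sum(dim=2)).sum(dim=1)                   # (n, n_kernels)
        return torch.softmax(self.score(feats).squeeze(-1), dim=0)     # per-node attention

claim = torch.randn(8, 64)          # toy claim token embeddings
evidence = torch.randn(5, 12, 64)   # 5 evidence sentences, 12 tokens each
print(KernelReadout()(claim, evidence))  # attention over the 5 evidence nodes
```

The edge kernels in the paper apply the same kernel-pooling trick to propagate information between evidence nodes; the sketch only shows the node-level readout.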
Learning Individual Styles of Conversational Gesture
Title | Learning Individual Styles of Conversational Gesture |
Authors | Shiry Ginosar, Amir Bar, Gefen Kohavi, Caroline Chan, Andrew Owens, Jitendra Malik |
Abstract | Human speech is often accompanied by hand and arm gestures. Given audio speech input, we generate plausible gestures to go along with the sound. Specifically, we perform cross-modal translation from “in-the-wild” monologue speech of a single speaker to their hand and arm motion. We train on unlabeled videos for which we only have noisy pseudo ground truth from an automatic pose detection system. Our proposed model significantly outperforms baseline methods in a quantitative comparison. To support research toward obtaining a computational understanding of the relationship between gesture and speech, we release a large video dataset of person-specific gestures. The project website with video, code and data can be found at http://people.eecs.berkeley.edu/~shiry/speech2gesture. |
Tasks | Speech-to-Gesture Translation |
Published | 2019-06-10 |
URL | https://arxiv.org/abs/1906.04160v1 |
https://arxiv.org/pdf/1906.04160v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-individual-styles-of-conversational-1 |
Repo | https://github.com/amirbar/speech2gesture |
Framework | none |
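A minimal sketch of the audio-to-gesture regression setup described above: a temporal conv net maps log-mel audio features to per-frame 2D keypoints and is trained against pseudo ground truth from a pose detector. The layer sizes, the 64-keypoint layout, and the plain L1 objective are illustrative assumptions; the paper uses a specific U-Net-style architecture with an added adversarial term.

```python
import torch
import torch.nn as nn

class Speech2GestureSketch(nn.Module):
    def __init__(self, n_mels: int = 80, n_keypoints: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_mels, 256, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(256, 256, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(256, n_keypoints * 2, kernel_size=1),  # (x, y) per keypoint
        )

    def forward(self, mel: torch.Tensor) -> torch.Tensor:
        # mel: (batch, n_mels, T) -> poses: (batch, T, n_keypoints, 2)
        out = self.net(mel)
        return out.permute(0, 2, 1).reshape(mel.size(0), mel.size(2), -1, 2)

model = Speech2GestureSketch()
mel = torch.randn(2, 80, 120)              # 2 clips, 120 audio frames
pseudo_gt = torch.randn(2, 120, 64, 2)     # noisy poses from a pose detector
loss = nn.L1Loss()(model(mel), pseudo_gt)  # regression to pseudo ground truth
loss.backward()
```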
Improving Adversarial Robustness via Guided Complement Entropy
Title | Improving Adversarial Robustness via Guided Complement Entropy |
Authors | Hao-Yun Chen, Jhao-Hong Liang, Shih-Chieh Chang, Jia-Yu Pan, Yu-Ting Chen, Wei Wei, Da-Cheng Juan |
Abstract | Adversarial robustness has emerged as an important topic in deep learning as carefully crafted attack samples can significantly disturb the performance of a model. Many recent methods have proposed to improve adversarial robustness by utilizing adversarial training or model distillation, which adds additional procedures to model training. In this paper, we propose a new training paradigm called Guided Complement Entropy (GCE) that is capable of achieving “adversarial defense for free,” which involves no additional procedures in the process of improving adversarial robustness. In addition to maximizing model probabilities on the ground-truth class like cross-entropy, we neutralize its probabilities on the incorrect classes along with a “guided” term to balance between these two terms. We show in the experiments that our method achieves better model robustness with even better performance compared to the commonly used cross-entropy training objective. We also show that our method can be used orthogonally with adversarial training across well-known methods with noticeable robustness gain. To the best of our knowledge, our approach is the first one that improves model robustness without compromising performance. |
Tasks | Adversarial Defense |
Published | 2019-03-23 |
URL | https://arxiv.org/abs/1903.09799v3 |
https://arxiv.org/pdf/1903.09799v3.pdf | |
PWC | https://paperswithcode.com/paper/improving-adversarial-robustness-via-guided |
Repo | https://github.com/Line290/FeatureAttack |
Framework | pytorch |
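An illustrative approximation of the guided complement-entropy idea described above: cross-entropy on the ground-truth class plus a term that flattens the predicted distribution over the incorrect classes, scaled by the ground-truth confidence raised to a power alpha. The exact normalization, sign convention, and how the two terms are combined follow the paper; this sketch is only one plausible reading.

```python
import torch
import torch.nn.functional as F

def guided_complement_entropy_sketch(logits, targets, alpha: float = 0.2):
    probs = F.softmax(logits, dim=-1)                       # (batch, K)
    p_g = probs.gather(1, targets.unsqueeze(1)).squeeze(1)  # ground-truth prob
    # Distribution over incorrect classes only, renormalized.
    comp = probs.scatter(1, targets.unsqueeze(1), 0.0)
    comp = comp / comp.sum(dim=1, keepdim=True).clamp_min(1e-12)
    comp_entropy = -(comp.clamp_min(1e-12).log() * comp).sum(dim=1)
    k = logits.size(1)
    # Maximize the (guided) complement entropy => minimize its negative.
    guided = -(p_g ** alpha) * comp_entropy / torch.log(torch.tensor(float(k - 1)))
    return F.cross_entropy(logits, targets) + guided.mean()

logits = torch.randn(4, 10, requires_grad=True)
targets = torch.randint(0, 10, (4,))
guided_complement_entropy_sketch(logits, targets).backward()
```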
Simple Applications of BERT for Ad Hoc Document Retrieval
Title | Simple Applications of BERT for Ad Hoc Document Retrieval |
Authors | Wei Yang, Haotian Zhang, Jimmy Lin |
Abstract | Following recent successes in applying BERT to question answering, we explore simple applications to ad hoc document retrieval. This required confronting the challenge posed by documents that are typically longer than the length of input BERT was designed to handle. We address this issue by applying inference on sentences individually, and then aggregating sentence scores to produce document scores. Experiments on TREC microblog and newswire test collections show that our approach is simple yet effective, as we report the highest average precision on these datasets by neural approaches that we are aware of. |
Tasks | Ad-Hoc Information Retrieval, Question Answering |
Published | 2019-03-26 |
URL | http://arxiv.org/abs/1903.10972v1 |
http://arxiv.org/pdf/1903.10972v1.pdf | |
PWC | https://paperswithcode.com/paper/simple-applications-of-bert-for-ad-hoc |
Repo | https://github.com/castorini/birch |
Framework | pytorch |
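A sketch of the sentence-level scoring and aggregation strategy described in the abstract: score each sentence of a long document against the query with a BERT cross-encoder, then combine the top sentence scores into a document score. The checkpoint below is an untuned placeholder and the decay weights are illustrative assumptions; the paper tunes its relevance model and aggregation on TREC data.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=1)

def document_score(query: str, sentences: list, top_k: int = 3) -> float:
    enc = tokenizer([query] * len(sentences), sentences,
                    padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        sent_scores = model(**enc).logits.squeeze(-1)        # one score per sentence
    top = sent_scores.topk(min(top_k, len(sentences))).values
    weights = torch.tensor([1.0, 0.5, 0.25])[: top.numel()]  # assumed decay weights
    return float((weights * top).sum())

doc = ["BERT adapts well to passage ranking.",
       "Long documents exceed BERT's input length.",
       "Sentence-level scores can be aggregated per document."]
print(document_score("ad hoc document retrieval with BERT", doc))
```

This sidesteps BERT's input-length limit because no single forward pass ever sees more than one sentence paired with the query.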
Memeify: A Large-Scale Meme Generation System
Title | Memeify: A Large-Scale Meme Generation System |
Authors | Suryatej Reddy Vyalla, Vishaal Udandarao, Tanmoy Chakraborty |
Abstract | Interest in the research areas related to meme propagation and generation has been increasing rapidly in the last couple of years. Meme datasets available online are either specific to a context or contain no class information. Here, we prepare a large-scale dataset of memes with captions and class labels. The dataset consists of 1.1 million meme captions from 128 classes. We also provide reasoning for the existence of broad categories, called “themes”, across the meme dataset; each theme consists of multiple meme classes. Our generation system uses a trained state-of-the-art transformer-based model for caption generation by employing an encoder-decoder architecture. We develop a web interface, called Memeify, for users to generate memes of their choice, and explain in detail the working of the individual components of the system. We also perform a qualitative evaluation of the generated memes by conducting a user study. A link to the demonstration of the Memeify system is https://youtu.be/P_Tfs0X-czs. |
Tasks | |
Published | 2019-10-27 |
URL | https://arxiv.org/abs/1910.12279v2 |
https://arxiv.org/pdf/1910.12279v2.pdf | |
PWC | https://paperswithcode.com/paper/memeify-a-large-scale-meme-generation-system |
Repo | https://github.com/suryatejreddy/Memeify |
Framework | none |
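A minimal sketch of class-conditioned caption generation in the spirit of the system above: prepend a meme-class token to the prompt and sample from a causal language model. The `<|class|>` prompt format and the off-the-shelf GPT-2 checkpoint are illustrative assumptions; Memeify trains its own transformer-based model on the 1.1 million captions.

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Hypothetical prompt format: class tag followed by a caption marker.
prompt = "<|class|> distracted boyfriend <|caption|>"
inputs = tokenizer(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=20, do_sample=True, top_p=0.9,
                     pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```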
Object-Oriented Dynamics Learning through Multi-Level Abstraction
Title | Object-Oriented Dynamics Learning through Multi-Level Abstraction |
Authors | Guangxiang Zhu, Jianhao Wang, Zhizhou Ren, Zichuan Lin, Chongjie Zhang |
Abstract | Object-based approaches for learning action-conditioned dynamics have demonstrated promise for generalization and interpretability. However, existing approaches suffer from structural limitations and optimization difficulties for common environments with multiple dynamic objects. In this paper, we present a novel self-supervised learning framework, called Multi-level Abstraction Object-oriented Predictor (MAOP), which employs a three-level learning architecture that enables efficient object-based dynamics learning from raw visual observations. We also design a spatial-temporal relational reasoning mechanism for MAOP to support instance-level dynamics learning and handle partial observability. Our results show that MAOP significantly outperforms previous methods in terms of sample efficiency and generalization to novel environments for learning environment models. We also demonstrate that learned dynamics models enable efficient planning in unseen environments, comparable to true environment models. In addition, MAOP learns semantically and visually interpretable disentangled representations. |
Tasks | Relational Reasoning |
Published | 2019-04-16 |
URL | https://arxiv.org/abs/1904.07482v4 |
https://arxiv.org/pdf/1904.07482v4.pdf | |
PWC | https://paperswithcode.com/paper/object-oriented-dynamics-learning-through |
Repo | https://github.com/mig-zh/OODP |
Framework | tf |
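A sketch of object-centric, action-conditioned dynamics with simple relational reasoning, the core idea in the abstract: each object's next state is predicted from its own state, the action, and a sum of pairwise relation features. Dimensions and the MLPs are illustrative assumptions; MAOP additionally learns the object masks from pixels and uses a three-level architecture.

```python
import torch
import torch.nn as nn

class ObjectDynamicsSketch(nn.Module):
    def __init__(self, obj_dim=16, act_dim=4, hid=64):
        super().__init__()
        self.relation = nn.Sequential(nn.Linear(2 * obj_dim, hid), nn.ReLU(),
                                      nn.Linear(hid, hid))
        self.dynamics = nn.Sequential(nn.Linear(obj_dim + act_dim + hid, hid), nn.ReLU(),
                                      nn.Linear(hid, obj_dim))

    def forward(self, objects, action):
        # objects: (n_obj, obj_dim), action: (act_dim,)
        n = objects.size(0)
        pairs = torch.cat([objects.unsqueeze(1).expand(n, n, -1),
                           objects.unsqueeze(0).expand(n, n, -1)], dim=-1)
        rel = self.relation(pairs).sum(dim=1)                    # aggregate over partners
        act = action.unsqueeze(0).expand(n, -1)
        delta = self.dynamics(torch.cat([objects, act, rel], dim=-1))
        return objects + delta                                   # predicted next states

model = ObjectDynamicsSketch()
next_states = model(torch.randn(5, 16), torch.tensor([0., 1., 0., 0.]))
print(next_states.shape)  # torch.Size([5, 16])
```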
Scalable Learning-Based Sampling Optimization for Compressive Dynamic MRI
Title | Scalable Learning-Based Sampling Optimization for Compressive Dynamic MRI |
Authors | Thomas Sanchez, Baran Gözcü, Ruud B. van Heeswijk, Armin Eftekhari, Efe Ilıcak, Tolga Çukur, Volkan Cevher |
Abstract | Compressed sensing applied to magnetic resonance imaging (MRI) allows the scanning time to be reduced by enabling images to be reconstructed from highly undersampled data. In this paper, we tackle the problem of designing a sampling mask for an arbitrary reconstruction method and a limited acquisition budget. Namely, we look for an optimal probability distribution from which a mask with a fixed cardinality is drawn. We demonstrate that this problem admits a compactly supported solution, which leads to a deterministic optimal sampling mask. We then propose a stochastic greedy algorithm that (i) provides an approximate solution to this problem, and (ii) resolves the scaling issues of [1,2]. We validate its performance on in vivo dynamic MRI with retrospective undersampling, showing that our method preserves the performance of [1,2] while reducing the computational burden by a factor close to 200. |
Tasks | |
Published | 2019-02-01 |
URL | https://arxiv.org/abs/1902.00386v5 |
https://arxiv.org/pdf/1902.00386v5.pdf | |
PWC | https://paperswithcode.com/paper/scalable-learning-based-sampling-optimization |
Repo | https://github.com/t-sanchez/stochasticGreedyMRI |
Framework | none |
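A sketch of a stochastic greedy mask search as described above: grow the sampling mask one location at a time, at each step evaluating only a random subset of candidate locations on a random batch of training images. `reconstruct` and `quality` are hypothetical stand-ins for the reconstruction method and the image-quality metric (e.g. PSNR) that the real pipeline would supply.

```python
import random

def stochastic_greedy_mask(candidates, budget, train_images,
                           reconstruct, quality, n_cand=20, n_imgs=5):
    """Greedily select `budget` sampling locations, subsampling both the
    candidate set and the training images at every step to keep it cheap."""
    mask = set()
    while len(mask) < budget:
        pool = [c for c in candidates if c not in mask]
        sample_cand = random.sample(pool, k=min(n_cand, len(pool)))
        sample_imgs = random.sample(train_images, k=min(n_imgs, len(train_images)))
        best_c, best_q = None, float("-inf")
        for c in sample_cand:
            trial = mask | {c}
            # Score the trial mask by reconstruction quality on the image batch.
            q = sum(quality(img, reconstruct(img, trial)) for img in sample_imgs)
            if q > best_q:
                best_c, best_q = c, q
        mask.add(best_c)
    return mask
```

Subsampling candidates and images at every step is what buys the roughly 200x reduction in computation the abstract mentions, at the cost of an approximate rather than exact greedy choice.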
Spatio-temporal crop classification of low-resolution satellite imagery with capsule layers and distributed attention
Title | Spatio-temporal crop classification of low-resolution satellite imagery with capsule layers and distributed attention |
Authors | John Brandt |
Abstract | Land use classification of low-resolution spatial imagery is one of the most extensively researched fields in remote sensing. Despite significant advancements in satellite technology, high resolution imagery lacks global coverage and can be prohibitively expensive to procure for extended time periods. Accurately classifying land use change without high resolution imagery offers the potential to monitor vital aspects of the global development agenda, including climate-smart agriculture, drought-resistant crops, and sustainable land management. Utilizing a combination of capsule layers and long short-term memory layers with distributed attention, the present paper achieves state-of-the-art accuracy on temporal crop type classification at a 30x30m resolution with Sentinel 2 imagery. |
Tasks | Crop Classification |
Published | 2019-04-23 |
URL | http://arxiv.org/abs/1904.10130v1 |
http://arxiv.org/pdf/1904.10130v1.pdf | |
PWC | https://paperswithcode.com/paper/spatio-temporal-crop-classification-of-low |
Repo | https://github.com/JohnMBrandt/capsule-attention-networks |
Framework | none |
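A sketch of the temporal classification pipeline described above: per-timestep image features feed an LSTM, and an attention layer pools across timesteps before classification. A plain conv encoder stands in for the capsule layers, and the attention is a simple learned softmax over timesteps; both are illustrative simplifications of the paper's architecture.

```python
import torch
import torch.nn as nn

class TemporalCropClassifier(nn.Module):
    def __init__(self, n_classes=10, n_bands=10, hid=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Conv2d(n_bands, 32, 3, padding=1), nn.ReLU(),
                                     nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.lstm = nn.LSTM(32, hid, batch_first=True)
        self.attn = nn.Linear(hid, 1)
        self.head = nn.Linear(hid, n_classes)

    def forward(self, x):
        # x: (batch, time, bands, H, W) -- e.g. a Sentinel-2 pixel-window time series
        b, t = x.shape[:2]
        feats = self.encoder(x.flatten(0, 1)).view(b, t, -1)
        seq, _ = self.lstm(feats)
        weights = torch.softmax(self.attn(seq), dim=1)   # attention over timesteps
        pooled = (weights * seq).sum(dim=1)
        return self.head(pooled)

logits = TemporalCropClassifier()(torch.randn(2, 12, 10, 8, 8))
print(logits.shape)  # torch.Size([2, 10])
```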
Gradient Methods for Solving Stackelberg Games
Title | Gradient Methods for Solving Stackelberg Games |
Authors | Roi Naveiro, David Ríos Insua |
Abstract | Stackelberg games have gained importance in recent years due to the rise of Adversarial Machine Learning (AML). Within this context, a new paradigm must be faced: in classical game theory, the intervening agents were humans whose decisions are generally discrete and low dimensional. In AML, decisions are made by algorithms and are usually continuous and high dimensional, e.g. choosing the weights of a neural network. As closed-form solutions for Stackelberg games generally do not exist, efficient algorithms are needed to search for numerical solutions. We study two different procedures for solving this type of game using gradient methods. We study the time and space scalability of both approaches and discuss in which situations it is more appropriate to use each of them. Finally, we illustrate their use in an adversarial prediction problem. |
Tasks | |
Published | 2019-08-19 |
URL | https://arxiv.org/abs/1908.06901v3 |
https://arxiv.org/pdf/1908.06901v3.pdf | |
PWC | https://paperswithcode.com/paper/gradient-methods-for-solving-stackelberg |
Repo | https://github.com/roinaveiro/GM_SG |
Framework | pytorch |
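A sketch of one generic gradient procedure for a Stackelberg game: for each leader update, approximate the follower's best response with a few inner gradient steps, then take a leader gradient step against that response. The toy quadratic payoffs, step sizes, and iteration counts are illustrative assumptions; the paper analyzes two specific procedures and their time/space scalability.

```python
import torch

leader = torch.zeros(2, requires_grad=True)
follower = torch.zeros(2, requires_grad=True)

def leader_cost(x, y):    # leader minimizes this, anticipating the follower
    return (x - 1).pow(2).sum() + (x - y).pow(2).sum()

def follower_cost(x, y):  # follower minimizes this, given the leader's move x
    return (y - x / 2).pow(2).sum()

for _ in range(200):
    # Inner loop: approximate the follower's best response to the current leader move.
    for _ in range(10):
        g = torch.autograd.grad(follower_cost(leader.detach(), follower), follower)[0]
        follower.data -= 0.1 * g
    # Outer loop: leader descends its own cost at the follower's (approximate) response.
    g = torch.autograd.grad(leader_cost(leader, follower.detach()), leader)[0]
    leader.data -= 0.05 * g

print(leader.detach(), follower.detach())
```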
Learning to navigate image manifolds induced by generative adversarial networks for unsupervised video generation
Title | Learning to navigate image manifolds induced by generative adversarial networks for unsupervised video generation |
Authors | Isabela Albuquerque, João Monteiro, Tiago H. Falk |
Abstract | In this work, we introduce a two-step framework for generative modeling of temporal data. Specifically, the generative adversarial networks (GANs) setting is employed to generate synthetic scenes of moving objects. To do so, we propose a two-step training scheme within which: a generator of static frames is trained first. Afterwards, a recurrent model is trained with the goal of providing a sequence of inputs to the previously trained frames generator, thus yielding scenes which look natural. The adversarial setting is employed in both training steps. However, with the aim of avoiding known training instabilities in GANs, a multiple discriminator approach is used to train both models. Results in the studied video dataset indicate that, by employing such an approach, the recurrent part is able to learn how to coherently navigate the image manifold induced by the frames generator, thus yielding more natural-looking scenes. |
Tasks | Video Generation |
Published | 2019-01-23 |
URL | http://arxiv.org/abs/1901.11384v1 |
http://arxiv.org/pdf/1901.11384v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-navigate-image-manifolds-induced |
Repo | https://github.com/belaalb/frameGAN |
Framework | pytorch |
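A sketch of the two-step wiring described above: a pretrained (here untrained, purely for illustration) frame generator stays frozen, while a recurrent model produces the sequence of latent codes that drives it, yielding a video. The layer sizes are assumptions, and the adversarial training of the recurrent part (with multiple discriminators in the paper) is omitted.

```python
import torch
import torch.nn as nn

latent_dim, seq_len = 32, 16

frame_generator = nn.Sequential(            # stand-in for a pretrained GAN generator
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, 3 * 32 * 32), nn.Tanh(),
)
for p in frame_generator.parameters():       # step 2 keeps the frame generator frozen
    p.requires_grad_(False)

class LatentNavigator(nn.Module):
    """Recurrent model that learns to walk the generator's latent space."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(latent_dim, 128, batch_first=True)
        self.to_z = nn.Linear(128, latent_dim)

    def forward(self, z0):
        steps = z0.unsqueeze(1).repeat(1, seq_len, 1)   # seed every timestep
        h, _ = self.rnn(steps)
        return self.to_z(h)                             # (batch, seq_len, latent_dim)

z0 = torch.randn(4, latent_dim)
z_seq = LatentNavigator()(z0)
video = frame_generator(z_seq).view(4, seq_len, 3, 32, 32)
print(video.shape)  # torch.Size([4, 16, 3, 32, 32])
```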
Video Person Re-ID: Fantastic Techniques and Where to Find Them
Title | Video Person Re-ID: Fantastic Techniques and Where to Find Them |
Authors | Priyank Pathak, Amir Erfan Eshratifar, Michael Gormish |
Abstract | The ability to identify the same person from multiple camera views without the explicit use of facial recognition is receiving commercial and academic interest. The current status-quo solutions are based on attention neural models. In this paper, we propose Attention and CL loss, which is a hybrid of center and Online Soft Mining (OSM) loss added to the attention loss on top of a temporal attention-based neural network. The proposed loss function applied with bag-of-tricks for training surpasses the state of the art on the common person Re-ID datasets, MARS and PRID 2011. Our source code is publicly available on github. |
Tasks | Person Re-Identification, Video-Based Person Re-Identification |
Published | 2019-11-21 |
URL | https://arxiv.org/abs/1912.05295v1 |
https://arxiv.org/pdf/1912.05295v1.pdf | |
PWC | https://paperswithcode.com/paper/video-person-re-id-fantastic-techniques-and |
Repo | https://github.com/ppriyank/Video-Person-Re-ID-Fantastic-Techniques-and-Where-to-Find-Them |
Framework | pytorch |
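A sketch of the temporal attention idea underlying the model above: per-frame CNN embeddings are pooled into a single clip embedding with learned attention weights, which is then used for identity classification and metric losses. The backbone and dimensions are illustrative; the paper's hybrid center + OSM "Attention and CL" loss is not reproduced here.

```python
import torch
import torch.nn as nn

class TemporalAttentionPool(nn.Module):
    def __init__(self, feat_dim=512, n_ids=625):
        super().__init__()
        self.attn = nn.Sequential(nn.Linear(feat_dim, 128), nn.Tanh(), nn.Linear(128, 1))
        self.classifier = nn.Linear(feat_dim, n_ids)

    def forward(self, frame_feats):
        # frame_feats: (batch, n_frames, feat_dim), e.g. from a ResNet backbone
        weights = torch.softmax(self.attn(frame_feats), dim=1)   # attention over frames
        clip_embedding = (weights * frame_feats).sum(dim=1)
        return clip_embedding, self.classifier(clip_embedding)

emb, logits = TemporalAttentionPool()(torch.randn(8, 4, 512))
print(emb.shape, logits.shape)  # torch.Size([8, 512]) torch.Size([8, 625])
```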
Modular Multimodal Architecture for Document Classification
Title | Modular Multimodal Architecture for Document Classification |
Authors | Tyler Dauphinee, Nikunj Patel, Mohammad Rashidi |
Abstract | Page classification is a crucial component to any document analysis system, allowing for complex branching control flows for different components of a given document. Utilizing both the visual and textual content of a page, the proposed method exceeds the current state-of-the-art performance on the RVL-CDIP benchmark at 93.03% test accuracy. |
Tasks | Document Classification |
Published | 2019-12-09 |
URL | https://arxiv.org/abs/1912.04376v1 |
https://arxiv.org/pdf/1912.04376v1.pdf | |
PWC | https://paperswithcode.com/paper/modular-multimodal-architecture-for-document |
Repo | https://github.com/microsoft/unilm/tree/master/layoutlm |
Framework | pytorch |
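A sketch of the visual + textual fusion described above: an image encoder and a text encoder each produce a page embedding, and a small head classifies their concatenation. The encoders here are simple stand-ins (the paper combines a CNN over the page image with features from the page's text), and all sizes are illustrative assumptions; 16 classes matches RVL-CDIP.

```python
import torch
import torch.nn as nn

class MultimodalPageClassifier(nn.Module):
    def __init__(self, vocab=5000, n_classes=16):
        super().__init__()
        self.image_enc = nn.Sequential(nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
                                       nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.text_enc = nn.Sequential(nn.Linear(vocab, 128), nn.ReLU())  # bag-of-words stand-in
        self.head = nn.Sequential(nn.Linear(16 + 128, 128), nn.ReLU(),
                                  nn.Linear(128, n_classes))

    def forward(self, page_image, text_bow):
        fused = torch.cat([self.image_enc(page_image), self.text_enc(text_bow)], dim=-1)
        return self.head(fused)

model = MultimodalPageClassifier()
logits = model(torch.randn(2, 1, 64, 64), torch.rand(2, 5000))
print(logits.shape)  # torch.Size([2, 16])
```

Late fusion like this keeps the two branches modular, which is the property the title emphasizes: either encoder can be swapped without touching the other.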
FlowDelta: Modeling Flow Information Gain in Reasoning for Conversational Machine Comprehension
Title | FlowDelta: Modeling Flow Information Gain in Reasoning for Conversational Machine Comprehension |
Authors | Yi-Ting Yeh, Yun-Nung Chen |
Abstract | Conversational machine comprehension requires deep understanding of the dialogue flow, and the prior work proposed FlowQA to implicitly model the context representations in reasoning for better understanding. This paper proposes to explicitly model the information gain through dialogue reasoning in order to allow the model to focus on more informative cues. The proposed model achieves state-of-the-art performance in a conversational QA dataset QuAC and sequential instruction understanding dataset SCONE, which shows the effectiveness of the proposed mechanism and demonstrates its capability of generalization to different QA models and tasks. |
Tasks | Reading Comprehension |
Published | 2019-08-14 |
URL | https://arxiv.org/abs/1908.05117v3 |
https://arxiv.org/pdf/1908.05117v3.pdf | |
PWC | https://paperswithcode.com/paper/flowdelta-modeling-flow-information-gain-in |
Repo | https://github.com/MiuLab/FlowDelta |
Framework | pytorch |
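A sketch of the "information gain" idea described above: take the difference between a passage's flow representation at consecutive dialogue turns and feed it alongside the current representation, so the reader can attend to what each turn added. How the delta is consumed downstream (and the full FlowQA-style reader around it) is simplified away; the dimensions are illustrative.

```python
import torch
import torch.nn as nn

class FlowDeltaSketch(nn.Module):
    def __init__(self, hid=128):
        super().__init__()
        self.fuse = nn.Linear(2 * hid, hid)

    def forward(self, flow):
        # flow: (n_turns, passage_len, hid) -- context representation per dialogue turn
        delta = flow[1:] - flow[:-1]                        # information gained per turn
        delta = torch.cat([torch.zeros_like(flow[:1]), delta], dim=0)
        return self.fuse(torch.cat([flow, delta], dim=-1))  # current state + its gain

out = FlowDeltaSketch()(torch.randn(5, 40, 128))
print(out.shape)  # torch.Size([5, 40, 128])
```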
Massively Multilingual Transfer for NER
Title | Massively Multilingual Transfer for NER |
Authors | Afshin Rahimi, Yuan Li, Trevor Cohn |
Abstract | In cross-lingual transfer, NLP models over one or more source languages are applied to a low-resource target language. While most prior work has used a single source model or a few carefully selected models, here we consider a “massive” setting with many such models. This setting raises the problem of poor transfer, particularly from distant languages. We propose two techniques for modulating the transfer, suitable for zero-shot or few-shot learning, respectively. Evaluating on named entity recognition, we show that our techniques are much more effective than strong baselines, including standard ensembling, and our unsupervised method rivals oracle selection of the single best individual model. |
Tasks | Cross-Lingual Transfer, Few-Shot Learning, Named Entity Recognition |
Published | 2019-02-01 |
URL | https://arxiv.org/abs/1902.00193v4 |
https://arxiv.org/pdf/1902.00193v4.pdf | |
PWC | https://paperswithcode.com/paper/multilingual-ner-transfer-for-low-resource |
Repo | https://github.com/afshinrahimi/mmner |
Framework | tf |
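A sketch of modulated many-source transfer as described above: each source-language model votes a label distribution for every token, and per-source reliability weights modulate the combination (uniform weights recover standard ensembling). How those weights are estimated zero-shot or few-shot is the paper's contribution and is not reproduced here; shapes below are illustrative.

```python
import torch

n_sources, n_tokens, n_labels = 40, 12, 9   # e.g. 40 source models, BIO-style tag set

# Predicted label distributions from each source model on target-language text.
source_preds = torch.softmax(torch.randn(n_sources, n_tokens, n_labels), dim=-1)

# Per-source reliability weights (uniform here; the paper estimates these).
reliability = torch.full((n_sources,), 1.0 / n_sources)

combined = torch.einsum("s,stl->tl", reliability, source_preds)
predicted_tags = combined.argmax(dim=-1)
print(predicted_tags)
```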
The Pupil Has Become the Master: Teacher-Student Model-Based Word Embedding Distillation with Ensemble Learning
Title | The Pupil Has Become the Master: Teacher-Student Model-Based Word Embedding Distillation with Ensemble Learning |
Authors | Bonggun Shin, Hao Yang, Jinho D. Choi |
Abstract | Recent advances in deep learning have facilitated the demand of neural models for real applications. In practice, these applications often need to be deployed with limited resources while keeping high accuracy. This paper touches the core of neural models in NLP, word embeddings, and presents a new embedding distillation framework that remarkably reduces the dimension of word embeddings without compromising accuracy. A novel distillation ensemble approach is also proposed that trains a highly efficient student model using multiple teacher models. In our approach, the teacher models play roles only during training such that the student model operates on its own without support from the teacher models during decoding, which makes it eighty times faster and lighter than other typical ensemble methods. All models are evaluated on seven document classification datasets and show a significant advantage over the teacher models in most cases. Our analysis depicts insightful transformations of word embeddings from distillation and suggests a future direction for ensemble approaches using neural models. |
Tasks | Document Classification, Word Embeddings |
Published | 2019-05-31 |
URL | https://arxiv.org/abs/1906.00095v1 |
https://arxiv.org/pdf/1906.00095v1.pdf | |
PWC | https://paperswithcode.com/paper/190600095 |
Repo | https://github.com/bgshin/distill_demo |
Framework | tf |
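A sketch of the ensemble-distillation idea above: several teacher classifiers built on large word embeddings guide a small student whose embeddings are much lower-dimensional, and at inference only the student is used. The averaging of teacher distributions, the objective, and all sizes below are illustrative assumptions about one reasonable instantiation, not the paper's exact scheme.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab, n_classes, seq_len = 1000, 5, 20
teachers = [nn.Sequential(nn.EmbeddingBag(vocab, 300), nn.Linear(300, n_classes))
            for _ in range(3)]                       # large-embedding teachers
student = nn.Sequential(nn.EmbeddingBag(vocab, 50), nn.Linear(50, n_classes))

tokens = torch.randint(0, vocab, (8, seq_len))       # a toy batch of documents
labels = torch.randint(0, n_classes, (8,))

with torch.no_grad():                                # teachers are used at training time only
    teacher_probs = torch.stack([F.softmax(t(tokens), dim=-1) for t in teachers]).mean(0)

student_logits = student(tokens)
loss = F.cross_entropy(student_logits, labels) + \
       F.kl_div(F.log_softmax(student_logits, dim=-1), teacher_probs, reduction="batchmean")
loss.backward()   # only the student receives gradients; decoding later needs no teachers
```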