February 2, 2020

Paper Group AWR 41

MUREL: Multimodal Relational Reasoning for Visual Question Answering


Title	MUREL: Multimodal Relational Reasoning for Visual Question Answering
Authors	Remi Cadene, Hedi Ben-younes, Matthieu Cord, Nicolas Thome
Abstract	Multimodal attentional networks are currently state-of-the-art models for Visual Question Answering (VQA) tasks involving real images. Although attention allows models to focus on the visual content relevant to the question, this simple mechanism is arguably insufficient to model complex reasoning features required for VQA or other high-level tasks. In this paper, we propose MuRel, a multimodal relational network which is learned end-to-end to reason over real images. Our first contribution is the introduction of the MuRel cell, an atomic reasoning primitive representing interactions between question and image regions by a rich vectorial representation, and modeling region relations with pairwise combinations. Secondly, we incorporate the cell into a full MuRel network, which progressively refines visual and question interactions, and can be leveraged to define visualization schemes finer than mere attention maps. We validate the relevance of our approach with various ablation studies, and show its superiority to attention-based methods on three datasets: VQA 2.0, VQA-CP v2 and TDIUC. Our final MuRel network is competitive with or outperforms state-of-the-art results in this challenging context. Our code is available: https://github.com/Cadene/murel.bootstrap.pytorch
Tasks	Relational Reasoning, Visual Question Answering
Published	2019-02-25
URL	http://arxiv.org/abs/1902.09487v1
PDF	http://arxiv.org/pdf/1902.09487v1.pdf
PWC	https://paperswithcode.com/paper/murel-multimodal-relational-reasoning-for
Repo	https://github.com/Cadene/murel.bootstrap.pytorch
Framework	pytorch
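
The abstract describes the MuRel cell as modeling region relations with pairwise combinations. The sketch below is a toy illustration of that idea, not the authors' code: `fuse` is a hypothetical placeholder for the learned fusion, question conditioning is omitted, and each region is updated with an element-wise max over its pairwise relations plus a residual connection (both assumptions on our part).

```python
def fuse(a, b):
    # Hypothetical stand-in for a learned bilinear fusion:
    # element-wise product followed by a ReLU.
    return [max(0.0, x * y) for x, y in zip(a, b)]


def pairwise_relational_update(regions):
    """Update each region vector with the max over its pairwise relations.

    regions: list of equal-length feature vectors, one per image region.
    Returns new vectors where region i = s_i + max_{j != i} fuse(s_i, s_j).
    """
    updated = []
    for i, s_i in enumerate(regions):
        relations = [fuse(s_i, s_j) for j, s_j in enumerate(regions) if j != i]
        if relations:
            # Element-wise max pooling over all pairwise relation vectors.
            pooled = [max(values) for values in zip(*relations)]
        else:
            pooled = [0.0] * len(s_i)
        # Residual connection keeps the original region features.
        updated.append([x + p for x, p in zip(s_i, pooled)])
    return updated


regions = [[1.0, 2.0], [3.0, -1.0], [0.5, 0.5]]
print(pairwise_relational_update(regions))
```

Max pooling keeps the strongest relation per feature dimension, so each region's update does not depend on how many neighbors it has.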

Rates of Convergence for Sparse Variational Gaussian Process Regression


Title	Rates of Convergence for Sparse Variational Gaussian Process Regression
Authors	David R. Burt, Carl E. Rasmussen, Mark van der Wilk
Abstract	Excellent variational approximations to Gaussian process posteriors have been developed which avoid the $\mathcal{O}\left(N^3\right)$ scaling with dataset size $N$. They reduce the computational cost to $\mathcal{O}\left(NM^2\right)$, with $M\ll N$ being the number of inducing variables, which summarise the process. While the computational cost seems to be linear in $N$, the true complexity of the algorithm depends on how $M$ must increase to ensure a certain quality of approximation. We address this by characterising the behavior of an upper bound on the KL divergence to the posterior. We show that with high probability the KL divergence can be made arbitrarily small by growing $M$ more slowly than $N$. A particular case of interest is that for regression with normally distributed inputs in $D$ dimensions with the popular Squared Exponential kernel, $M=\mathcal{O}(\log^D N)$ is sufficient. Our results show that as datasets grow, Gaussian process posteriors can truly be approximated cheaply, and provide a concrete rule for how to increase $M$ in continual learning scenarios.
Tasks	Continual Learning
Published	2019-03-08
URL	https://arxiv.org/abs/1903.03571v3
PDF	https://arxiv.org/pdf/1903.03571v3.pdf
PWC	https://paperswithcode.com/paper/rates-of-convergence-for-sparse-variational
Repo	https://github.com/DavidBurt2/Rates-of-Convergence-SGPR
Framework	none
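
To make the $M=\mathcal{O}(\log^D N)$ rule concrete, the sketch below compares leading-order costs. The abstract gives only the order of growth, so the constant `c` is a hypothetical placeholder, and the cost functions ignore constant factors.

```python
import math


def inducing_points_needed(n, d, c=1.0):
    """Illustrative M = c * (log N)^D, rounded up; c is a hypothetical constant."""
    return math.ceil(c * math.log(n) ** d)


def sparse_cost(n, m):
    """Leading-order cost of sparse variational GP regression, O(N M^2)."""
    return n * m ** 2


def exact_cost(n):
    """Leading-order cost of exact GP regression, O(N^3)."""
    return n ** 3


if __name__ == "__main__":
    for n in (10 ** 3, 10 ** 4, 10 ** 5, 10 ** 6):
        m = inducing_points_needed(n, d=2)
        print(n, m, sparse_cost(n, m) / exact_cost(n))
```

Because $M$ grows only polylogarithmically, the sparse cost stays close to linear in $N$, which is the sense in which the abstract calls the approximation cheap.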

Sliced Wasserstein Generative Models


Title	Sliced Wasserstein Generative Models
Authors	Jiqing Wu, Zhiwu Huang, Dinesh Acharya, Wen Li, Janine Thoma, Danda Pani Paudel, Luc Van Gool
Abstract	In generative modeling, the Wasserstein distance (WD) has emerged as a useful metric to measure the discrepancy between generated and real data distributions. Unfortunately, it is challenging to approximate the WD of high-dimensional distributions. In contrast, the sliced Wasserstein distance (SWD) factorizes high-dimensional distributions into their multiple one-dimensional marginal distributions and is thus easier to approximate. In this paper, we introduce novel approximations of the primal and dual SWD. Instead of using a large number of random projections, as is done by conventional SWD approximation methods, we propose to approximate SWDs with a small number of parameterized orthogonal projections in an end-to-end deep learning fashion. As concrete applications of our SWD approximations, we design two types of differentiable SWD blocks to equip modern generative frameworks: Auto-Encoders (AE) and Generative Adversarial Networks (GAN). In the experiments, we not only show the superiority of the proposed generative models on standard image synthesis benchmarks, but also demonstrate state-of-the-art performance on challenging high-resolution image and video generation in an unsupervised manner.
Tasks	Image Generation, Video Generation
Published	2019-04-10
URL	http://arxiv.org/abs/1904.05408v2
PDF	http://arxiv.org/pdf/1904.05408v2.pdf
PWC	https://paperswithcode.com/paper/sliced-wasserstein-generative-models-1
Repo	https://github.com/musikisomorphie/swd
Framework	tf
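
For reference, the sketch below implements the *conventional* Monte Carlo SWD that the paper contrasts with: project both samples onto random unit directions and average the closed-form one-dimensional Wasserstein distances. It does not implement the paper's learned orthogonal projections. It assumes equal-size samples, for which the optimal 1-D coupling simply matches sorted values.

```python
import math
import random


def wasserstein_1d(a, b, p=2):
    """p-Wasserstein distance between two equal-size 1-D empirical samples."""
    if len(a) != len(b):
        raise ValueError("samples must have the same size")
    # In 1-D, the optimal coupling matches the sorted values.
    total = sum(abs(x - y) ** p for x, y in zip(sorted(a), sorted(b)))
    return (total / len(a)) ** (1.0 / p)


def random_direction(dim, rng):
    """Uniformly random unit vector, obtained by normalizing a Gaussian draw."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def sliced_wasserstein(xs, ys, n_projections=50, p=2, seed=0):
    """Monte Carlo estimate of the sliced p-Wasserstein distance."""
    rng = random.Random(seed)
    dim = len(xs[0])
    total = 0.0
    for _ in range(n_projections):
        theta = random_direction(dim, rng)
        # Project every point onto the line spanned by theta.
        px = [sum(t * c for t, c in zip(theta, x)) for x in xs]
        py = [sum(t * c for t, c in zip(theta, y)) for y in ys]
        total += wasserstein_1d(px, py, p) ** p
    return (total / n_projections) ** (1.0 / p)


square = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
shifted = [[x + 3.0, y] for x, y in square]
print(sliced_wasserstein(square, shifted))
```

The estimate's variance depends on `n_projections`; the paper's motivation is to replace many such random directions with a few learned ones.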

Unifying Knowledge Graph Learning and Recommendation: Towards a Better Understanding of User Preferences


Title	Unifying Knowledge Graph Learning and Recommendation: Towards a Better Understanding of User Preferences
Authors	Yixin Cao, Xiang Wang, Xiangnan He, Zikun Hu, Tat-Seng Chua
Abstract	Incorporating a knowledge graph (KG) into a recommender system is a promising way to improve recommendation accuracy and explainability. However, existing methods largely assume that a KG is complete and simply transfer the “knowledge” in the KG at the shallow level of raw entity data or embeddings. This may lead to suboptimal performance, since a practical KG can hardly be complete: it is common for a KG to have missing facts, relations, and entities. Thus, we argue that it is crucial to consider the incomplete nature of KGs when incorporating them into recommender systems. In this paper, we jointly learn the model of recommendation and knowledge graph completion. Distinct from previous KG-based recommendation methods, we transfer the relation information in the KG, so as to understand the reasons that a user likes an item. As an example, if a user has watched several movies directed by (relation) the same person (entity), we can infer that the director relation plays a critical role when the user makes the decision, thus helping to understand the user’s preference at a finer granularity. Technically, we contribute a new translation-based recommendation model, which specifically accounts for various preferences in translating a user to an item, and then jointly train it with a KG completion model by combining several transfer schemes. Extensive experiments on two benchmark datasets show that our method outperforms state-of-the-art KG-based recommendation methods. Further analysis verifies the positive effect of joint training on both recommendation and KG completion, and the advantage of our model in understanding user preferences. We publish our project at https://github.com/TaoMiner/joint-kg-recommender.
Tasks	Knowledge Graph Completion, Recommendation Systems
Published	2019-02-17
URL	http://arxiv.org/abs/1902.06236v1
PDF	http://arxiv.org/pdf/1902.06236v1.pdf
PWC	https://paperswithcode.com/paper/unifying-knowledge-graph-learning-and
Repo	https://github.com/TaoMiner/joint-kg-recommender
Framework	pytorch
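
"Translating a user to an item" refers to TransE-style scoring, where a user vector plus a preference (relation) vector should land near the item vector. The sketch below shows only that scoring step with a given preference vector; how the paper induces the preference for each user-item pair, and the joint KG training, are not shown, and all names are illustrative.

```python
import math


def translation_score(user, relation, item):
    """TransE-style plausibility: 0 is best, more negative means user + relation is far from item."""
    return -math.sqrt(sum((u + r - i) ** 2 for u, r, i in zip(user, relation, item)))


def rank_items(user, relation, items):
    """Return item ids sorted from most to least plausible under the translation."""
    return sorted(items, key=lambda k: translation_score(user, relation, items[k]), reverse=True)


user = [0.0, 0.0]
prefers_director = [1.0, 1.0]  # hypothetical preference vector
items = {"film_a": [1.0, 1.0], "film_b": [3.0, 0.0]}
print(rank_items(user, prefers_director, items))
```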

Frequency Principle: Fourier Analysis Sheds Light on Deep Neural Networks


Title	Frequency Principle: Fourier Analysis Sheds Light on Deep Neural Networks
Authors	Zhi-Qin John Xu, Yaoyu Zhang, Tao Luo, Yanyang Xiao, Zheng Ma
Abstract	We study the training process of Deep Neural Networks (DNNs) from the Fourier analysis perspective. We demonstrate a universal Frequency Principle (F-Principle), namely that DNNs often fit target functions from low to high frequencies, on high-dimensional benchmark datasets such as MNIST/CIFAR10 and deep neural networks such as VGG16. This F-Principle of DNNs is opposite to the behavior of most conventional iterative numerical schemes (e.g., the Jacobi method), which exhibit faster convergence for higher frequencies in various scientific computing problems. With a simple theory, we illustrate that this F-Principle results from the regularity of the commonly used activation functions. The F-Principle implies an implicit bias: DNNs tend to fit training data by a low-frequency function. This understanding explains the good generalization of DNNs on most real datasets and their bad generalization on the parity function or randomized datasets.
Tasks
Published	2019-01-19
URL	https://arxiv.org/abs/1901.06523v5
PDF	https://arxiv.org/pdf/1901.06523v5.pdf
PWC	https://paperswithcode.com/paper/frequency-principle-fourier-analysis-sheds
Repo	https://github.com/xuzhiqin1990/F-Principle
Framework	tf
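
One way to observe the F-Principle is to compare the network output $h$ with the target $y$ frequency by frequency, using the relative error $|\hat h(k) - \hat y(k)| / |\hat y(k)|$. The sketch below computes that diagnostic with a naive DFT on a 1-D grid; the "network output" here is a hand-made function that has captured only the low frequency, standing in for a snapshot taken early in training (an assumption for illustration, no training is performed).

```python
import cmath
import math


def dft(samples):
    """Naive discrete Fourier transform, O(N^2), adequate for small N."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))
        for k in range(n)
    ]


def relative_error_per_frequency(target, output, frequencies):
    """|F[h](k) - F[y](k)| / |F[y](k)| for each requested frequency index k."""
    ft, fo = dft(target), dft(output)
    return {k: abs(fo[k] - ft[k]) / abs(ft[k]) for k in frequencies}


n = 32
xs = [2 * math.pi * t / n for t in range(n)]
target = [math.sin(x) + math.sin(5 * x) for x in xs]
output = [math.sin(x) for x in xs]  # stand-in for an early-training network output
errors = relative_error_per_frequency(target, output, [1, 5])
print(errors)
```

Tracking these errors across training checkpoints would show, under the F-Principle, the low-frequency error shrinking before the high-frequency one.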

Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic Image Segmentation


Title	Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic Image Segmentation
Authors	Chenxi Liu, Liang-Chieh Chen, Florian Schroff, Hartwig Adam, Wei Hua, Alan Yuille, Li Fei-Fei
Abstract	Recently, Neural Architecture Search (NAS) has successfully identified neural network architectures that exceed human designed ones on large-scale image classification. In this paper, we study NAS for semantic image segmentation. Existing works often focus on searching the repeatable cell structure, while hand-designing the outer network structure that controls the spatial resolution changes. This choice simplifies the search space, but becomes increasingly problematic for dense image prediction, which exhibits far more network-level architectural variation. Therefore, we propose to search the network level structure in addition to the cell level structure, which forms a hierarchical architecture search space. We present a network level search space that includes many popular designs, and develop a formulation that allows efficient gradient-based architecture search (3 P100 GPU days on Cityscapes images). We demonstrate the effectiveness of the proposed method on the challenging Cityscapes, PASCAL VOC 2012, and ADE20K datasets. Auto-DeepLab, our architecture searched specifically for semantic image segmentation, attains state-of-the-art performance without any ImageNet pretraining.
Tasks	Image Classification, Neural Architecture Search, Semantic Segmentation
Published	2019-01-10
URL	http://arxiv.org/abs/1901.02985v2
PDF	http://arxiv.org/pdf/1901.02985v2.pdf
PWC	https://paperswithcode.com/paper/auto-deeplab-hierarchical-neural-architecture
Repo	https://github.com/NoamRosenberg/AutoML
Framework	pytorch
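
A network-level search space of this kind can be pictured as a lattice: at each layer the feature map sits at some resolution level (for example strides 4, 8, 16, 32) and can move down, stay, or move up one level. One common way to turn learned transition probabilities into a single architecture is the Viterbi algorithm, sketched below. The probabilities in the example are made up, and the `betas` data layout is our own assumption.

```python
import math


def viterbi_decode(betas, num_levels, start=0):
    """Most probable resolution path through a layer-by-level lattice.

    betas[l][s] maps a next level s' (s - 1, s or s + 1) to the probability of
    moving from level s at layer l to level s' at layer l + 1.
    Returns the visited level indices, starting with `start`.
    """
    neg_inf = float("-inf")
    # best[s] = highest log-probability of any path currently ending at level s.
    best = [neg_inf] * num_levels
    best[start] = 0.0
    backpointers = []
    for layer in betas:
        new_best = [neg_inf] * num_levels
        pointer = [None] * num_levels
        for s, score in enumerate(best):
            if score == neg_inf:
                continue
            for s_next, prob in layer.get(s, {}).items():
                # Only adjacent levels are reachable in one step.
                if not 0 <= s_next < num_levels or abs(s_next - s) > 1 or prob <= 0:
                    continue
                candidate = score + math.log(prob)
                if candidate > new_best[s_next]:
                    new_best[s_next] = candidate
                    pointer[s_next] = s
        best = new_best
        backpointers.append(pointer)
    # Trace back from the most probable final level.
    level = max(range(num_levels), key=lambda s: best[s])
    path = [level]
    for pointer in reversed(backpointers):
        level = pointer[level]
        path.append(level)
    return list(reversed(path))


betas = [
    {0: {0: 0.6, 1: 0.4}},
    {0: {0: 0.5, 1: 0.5}, 1: {2: 1.0}},
]
path = viterbi_decode(betas, num_levels=3)
print(path, [4 * 2 ** s for s in path])  # level indices and corresponding strides
```

In this example a greedy choice would stay at stride 4 after the first layer (0.6 > 0.4), while the full path through stride 8 to stride 16 has the higher total probability (0.4 versus 0.3).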

Correlation-aware Adversarial Domain Adaptation and Generalization


Title	Correlation-aware Adversarial Domain Adaptation and Generalization
Authors	Mohammad Mahfujur Rahman, Clinton Fookes, Mahsa Baktashmotlagh, Sridha Sridharan
Abstract	Domain adaptation (DA) and domain generalization (DG) have emerged as a solution to the domain shift problem, where the distributions of the source and target data differ. DG is more challenging than DA because the target data is completely unseen during training in DG scenarios. The current state of the art employs adversarial techniques; however, these are rarely considered for the DG problem. Furthermore, these approaches do not consider correlation alignment, which has been proven highly beneficial for minimizing domain discrepancy. In this paper, we propose a correlation-aware adversarial DA and DG framework in which the discrepancy between source and target features is minimized using correlation alignment along with adversarial learning. Incorporating the correlation alignment module along with adversarial learning helps to achieve a more domain-agnostic model, owing to its improved ability to reduce domain discrepancy with unlabeled target data. Experiments on benchmark datasets show that our proposed method improves on state-of-the-art performance.
Tasks	Domain Adaptation, Domain Generalization
Published	2019-11-29
URL	https://arxiv.org/abs/1911.12983v1
PDF	https://arxiv.org/pdf/1911.12983v1.pdf
PWC	https://paperswithcode.com/paper/correlation-aware-adversarial-domain
Repo	https://github.com/mahfujur1/CA-DA-DG
Framework	none
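
The abstract does not give the exact form of its correlation alignment term. A widely used formulation is the Deep CORAL loss, $\frac{1}{4d^2}\lVert C_S - C_T \rVert_F^2$, where $C_S$ and $C_T$ are the $d \times d$ feature covariances of a source and a target batch. The sketch below computes that loss in plain Python; treat it as an assumption about what such a module computes, not as the paper's implementation.

```python
def covariance(features):
    """Unbiased d x d covariance matrix of n feature vectors (requires n >= 2)."""
    n, d = len(features), len(features[0])
    means = [sum(row[j] for row in features) / n for j in range(d)]
    return [
        [sum((row[a] - means[a]) * (row[b] - means[b]) for row in features) / (n - 1)
         for b in range(d)]
        for a in range(d)
    ]


def coral_loss(source, target):
    """Deep CORAL loss: squared Frobenius distance between covariances over 4 d^2."""
    cs, ct = covariance(source), covariance(target)
    d = len(cs)
    frob_sq = sum((cs[a][b] - ct[a][b]) ** 2 for a in range(d) for b in range(d))
    return frob_sq / (4 * d * d)


source = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
target = [[0.0, 0.0], [2.0, 2.0], [4.0, 4.0]]
print(coral_loss(source, target))
```

The loss uses only unlabeled target features, which is why it can complement an adversarial domain discriminator.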

Domain Generalization via Model-Agnostic Learning of Semantic Features


Title	Domain Generalization via Model-Agnostic Learning of Semantic Features
Authors	Qi Dou, Daniel C. Castro, Konstantinos Kamnitsas, Ben Glocker
Abstract	Generalization capability to unseen domains is crucial for machine learning models when deploying to real-world conditions. We investigate the challenging problem of domain generalization, i.e., training a model on multi-domain source data such that it can directly generalize to target domains with unknown statistics. We adopt a model-agnostic learning paradigm with gradient-based meta-train and meta-test procedures to expose the optimization to domain shift. Further, we introduce two complementary losses which explicitly regularize the semantic structure of the feature space. Globally, we align a derived soft confusion matrix to preserve general knowledge about inter-class relationships. Locally, we promote domain-independent class-specific cohesion and separation of sample features with a metric-learning component. The effectiveness of our method is demonstrated with new state-of-the-art results on two common object recognition benchmarks. Our method also shows consistent improvement on a medical image segmentation task.
Tasks	Domain Generalization, Medical Image Segmentation, Metric Learning, Object Recognition, Semantic Segmentation
Published	2019-10-29
URL	https://arxiv.org/abs/1910.13580v1
PDF	https://arxiv.org/pdf/1910.13580v1.pdf
PWC	https://paperswithcode.com/paper/domain-generalization-via-model-agnostic
Repo	https://github.com/biomedia-mira/masf
Framework	tf
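
The global loss aligns a "soft confusion matrix" across domains: for each class, average the temperature-softened predicted distributions of that class's samples, then penalize the divergence between matching rows from two domains. The sketch below assumes a symmetrized KL divergence averaged over classes and a temperature of 2; both choices, and all function names, are our assumptions rather than details stated in the abstract. Every class must appear at least once in each domain's batch.

```python
import math


def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_confusion(logits_by_sample, labels, num_classes, temperature=2.0):
    """Row c = mean softened class-probability vector of the samples labeled c."""
    rows = []
    for c in range(num_classes):
        probs = [softmax(z, temperature) for z, y in zip(logits_by_sample, labels) if y == c]
        rows.append([sum(p[k] for p in probs) / len(probs) for k in range(num_classes)])
    return rows


def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def global_alignment_loss(conf_a, conf_b):
    """Symmetrized KL between matching rows of two soft confusion matrices."""
    return sum(0.5 * (kl(a, b) + kl(b, a)) for a, b in zip(conf_a, conf_b)) / len(conf_a)


labels = [0, 1]
conf_a = soft_confusion([[2.0, 0.0], [0.0, 2.0]], labels, num_classes=2)
conf_b = soft_confusion([[1.0, 0.0], [0.0, 1.0]], labels, num_classes=2)
print(global_alignment_loss(conf_a, conf_b))
```

Softening with a temperature keeps information about which wrong classes a sample resembles, which is the inter-class knowledge the abstract says the global loss preserves.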

GluonTS: Probabilistic Time Series Models in Python


Title	GluonTS: Probabilistic Time Series Models in Python
Authors	Alexander Alexandrov, Konstantinos Benidis, Michael Bohlke-Schneider, Valentin Flunkert, Jan Gasthaus, Tim Januschowski, Danielle C. Maddix, Syama Rangapuram, David Salinas, Jasper Schulz, Lorenzo Stella, Ali Caner Türkmen, Yuyang Wang
Abstract	We introduce Gluon Time Series (GluonTS, available at https://gluon-ts.mxnet.io), a library for deep-learning-based time series modeling. GluonTS simplifies the development of and experimentation with time series models for common tasks such as forecasting or anomaly detection. It provides all necessary components and tools that scientists need for quickly building new models, for efficiently running and analyzing experiments and for evaluating model accuracy.
Tasks	Anomaly Detection, Time Series, Time Series Forecasting, Time Series Prediction
Published	2019-06-12
URL	https://arxiv.org/abs/1906.05264v2
PDF	https://arxiv.org/pdf/1906.05264v2.pdf
PWC	https://paperswithcode.com/paper/gluonts-probabilistic-time-series-models-in
Repo	https://github.com/mbohlkeschneider/gluon-ts
Framework	mxnet
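
As an illustration of the forecast-and-evaluate loop such a library supports, the stdlib sketch below pairs a seasonal naive forecaster (repeat the last observed season) with the mean absolute scaled error (MASE). It does not use GluonTS itself, and the function names are hypothetical.

```python
def seasonal_naive_forecast(history, season_length, horizon):
    """Repeat the last observed season to forecast `horizon` future steps."""
    if len(history) < season_length:
        raise ValueError("need at least one full season of history")
    last_season = history[-season_length:]
    return [last_season[h % season_length] for h in range(horizon)]


def mase(history, actual, forecast, season_length):
    """Mean absolute error scaled by the in-sample seasonal naive error."""
    diffs = [abs(history[t] - history[t - season_length])
             for t in range(season_length, len(history))]
    scale = sum(diffs) / len(diffs)
    if scale == 0:
        raise ValueError("MASE is undefined for a perfectly seasonal history")
    mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    return mae / scale


history = [1, 2, 3, 2, 3, 4, 3, 4, 5]
forecast = seasonal_naive_forecast(history, season_length=3, horizon=4)
actual = [4, 5, 6, 5]
print(forecast, mase(history, actual, forecast, season_length=3))
```

A MASE below 1 means a model beats the in-sample seasonal naive baseline on average, which makes it convenient for comparing models across series with different scales.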

Decentralized & Collaborative AI on Blockchain


Title	Decentralized & Collaborative AI on Blockchain
Authors	Justin D. Harris, Bo Waggoner
Abstract	Machine learning has recently enabled large advances in artificial intelligence, but these tend to be highly centralized. The large datasets required are generally proprietary; predictions are often sold on a per-query basis; and published models can quickly become out of date without effort to acquire more data and re-train them. We propose a framework for participants to collaboratively build a dataset and use smart contracts to host a continuously updated model. This model will be shared publicly on a blockchain where it can be free to use for inference. Ideal learning problems include scenarios where a model is used many times for similar input such as personal assistants, playing games, recommender systems, etc. In order to maintain the model’s accuracy with respect to some test set we propose both financial and non-financial (gamified) incentive structures for providing good data. A free and open source implementation for the Ethereum blockchain is provided at https://github.com/microsoft/0xDeCA10B.
Tasks	Recommendation Systems
Published	2019-07-16
URL	https://arxiv.org/abs/1907.07247v1
PDF	https://arxiv.org/pdf/1907.07247v1.pdf
PWC	https://paperswithcode.com/paper/decentralized-collaborative-ai-on-blockchain
Repo	https://github.com/microsoft/0xDeCA10B
Framework	none
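
The incentive idea can be illustrated with a deposit-based bookkeeping scheme: contributors stake a deposit with each data point; after a waiting period they can reclaim it if the shared model still agrees with their label, or anyone can report the point and take the deposit if the model disagrees. The Python sketch below is a simplified, hypothetical model of that scheme, not the Ethereum contracts in the repository; in particular, it does not update the model when data is added.

```python
from dataclasses import dataclass


@dataclass
class Contribution:
    contributor: str
    features: tuple
    label: int
    deposit: float
    submitted_at: float
    settled: bool = False


class IncentiveMechanism:
    """Toy deposit/refund/report bookkeeping (hypothetical, not the 0xDeCA10B code)."""

    def __init__(self, model, wait_time):
        self.model = model          # callable: features -> predicted label
        self.wait_time = wait_time  # time a contribution must wait before settling
        self.contributions = []
        self.balances = {}

    def add_data(self, contributor, features, label, deposit, now):
        self.contributions.append(Contribution(contributor, tuple(features), label, deposit, now))
        self.balances[contributor] = self.balances.get(contributor, 0.0) - deposit
        return len(self.contributions) - 1

    def _settle(self, index, claimant, now, model_must_agree):
        c = self.contributions[index]
        if c.settled or now - c.submitted_at < self.wait_time:
            return False
        agrees = self.model(c.features) == c.label
        if agrees != model_must_agree:
            return False
        c.settled = True
        self.balances[claimant] = self.balances.get(claimant, 0.0) + c.deposit
        return True

    def refund(self, index, now):
        """The original contributor reclaims the deposit if the model still agrees."""
        return self._settle(index, self.contributions[index].contributor, now, True)

    def report(self, index, reporter, now):
        """Anyone takes the deposit if the model disagrees with the submitted label."""
        return self._settle(index, reporter, now, False)


mech = IncentiveMechanism(model=lambda x: 1 if x[0] > 0 else 0, wait_time=10)
good = mech.add_data("alice", [1.0], 1, deposit=5.0, now=0)
bad = mech.add_data("bob", [1.0], 0, deposit=5.0, now=0)
```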

Learning to Find Common Objects Across Few Image Collections


Title	Learning to Find Common Objects Across Few Image Collections
Authors	Amirreza Shaban, Amir Rahimi, Shray Bansal, Stephen Gould, Byron Boots, Richard Hartley
Abstract	Given a collection of bags where each bag is a set of images, our goal is to select one image from each bag such that the selected images are from the same object class. We model the selection as an energy minimization problem with unary and pairwise potential functions. Inspired by recent few-shot learning algorithms, we propose an approach to learn the potential functions directly from the data. Furthermore, we propose a fast greedy inference algorithm for energy minimization. We evaluate our approach on few-shot common object recognition as well as object co-localization tasks. Our experiments show that learning the pairwise and unary terms greatly improves the performance of the model over several well-known methods for these tasks. The proposed greedy optimization algorithm achieves performance comparable to state-of-the-art structured inference algorithms while being ~10 times faster. The code is publicly available on https://github.com/haamoon/finding_common_object.
Tasks	Few-Shot Learning, Object Recognition
Published	2019-04-29
URL	https://arxiv.org/abs/1904.12936v2
PDF	https://arxiv.org/pdf/1904.12936v2.pdf
PWC	https://paperswithcode.com/paper/learning-to-find-common-objects-across-image
Repo	https://github.com/haamoon/finding_common_object
Framework	tf

Revealing the Dark Secrets of BERT


Title	Revealing the Dark Secrets of BERT
Authors	Olga Kovaleva, Alexey Romanov, Anna Rogers, Anna Rumshisky
Abstract	BERT-based architectures currently give state-of-the-art performance on many NLP tasks, but little is known about the exact mechanisms that contribute to their success. In the current work, we focus on the interpretation of self-attention, which is one of the fundamental underlying components of BERT. Using a subset of GLUE tasks and a set of handcrafted features-of-interest, we propose the methodology and carry out a qualitative and quantitative analysis of the information encoded by individual BERT heads. Our findings suggest that there is a limited set of attention patterns that are repeated across different heads, indicating overall model overparametrization. While different heads consistently use the same attention patterns, they have varying impact on performance across different tasks. We show that manually disabling attention in certain heads leads to a performance improvement over the regular fine-tuned BERT models.
Tasks
Published	2019-08-21
URL	https://arxiv.org/abs/1908.08593v2
PDF	https://arxiv.org/pdf/1908.08593v2.pdf
PWC	https://paperswithcode.com/paper/revealing-the-dark-secrets-of-bert
Repo	https://github.com/KzKe/Transformer-models
Framework	none
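
"Disabling attention in certain heads" can be implemented with a per-head mask that multiplies each head's output by 0 (disabled) or 1 (kept) before the outputs are concatenated. The stdlib sketch below shows this on a tiny two-token sequence. It omits the output projection and the rest of a BERT layer, and the function names are our own.

```python
import math


def attention_head(queries, keys, values):
    """Scaled dot-product attention for one head over a short sequence."""
    d = len(queries[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs


def multi_head(heads, head_mask):
    """Run each head, scale its output by head_mask[h] (0 disables it), then concatenate.

    heads: list of (queries, keys, values) triples, one per head.
    """
    per_head = []
    for (q, k, v), keep in zip(heads, head_mask):
        out = attention_head(q, k, v)
        per_head.append([[keep * x for x in row] for row in out])
    return [sum((head[pos] for head in per_head), []) for pos in range(len(per_head[0]))]


head = ([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 2.0], [3.0, 4.0]])
print(multi_head([head, head], head_mask=[1, 0]))
```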

User Intent Prediction in Information-seeking Conversations


Title	User Intent Prediction in Information-seeking Conversations
Authors	Chen Qu, Liu Yang, Bruce Croft, Yongfeng Zhang, Johanne R. Trippas, Minghui Qiu
Abstract	Conversational assistants are being progressively adopted by the general population. However, they are not capable of handling complicated information-seeking tasks that involve multiple turns of information exchange. Due to the limited communication bandwidth in conversational search, it is important for conversational assistants to accurately detect and predict user intent in information-seeking conversations. In this paper, we investigate two aspects of user intent prediction in an information-seeking setting. First, we extract features based on the content, structural, and sentiment characteristics of a given utterance, and use classic machine learning methods to perform user intent prediction. We then conduct an in-depth feature importance analysis to identify key features in this prediction task. We find that structural features contribute most to the prediction performance. Given this finding, we construct neural classifiers to incorporate context information and achieve better performance without feature engineering. Our findings can provide insights into the important factors and effective methods of user intent prediction in information-seeking conversations.
Tasks	Feature Engineering, Feature Importance
Published	2019-01-11
URL	http://arxiv.org/abs/1901.03489v1
PDF	http://arxiv.org/pdf/1901.03489v1.pdf
PWC	https://paperswithcode.com/paper/user-intent-prediction-in-information-seeking
Repo	https://github.com/prdwb/UserIntentPrediction
Framework	none
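
Since structural features contributed most in this study, the sketch below shows what structural features of an utterance might look like: its position in the dialog, its length, and whether it comes from the user who started the conversation. The feature names and definitions are illustrative assumptions, not the paper's exact feature set.

```python
def structural_features(dialog, index):
    """Structural features of utterance `index`.

    dialog: list of (speaker, text) pairs in conversation order.
    """
    speaker, text = dialog[index]
    words = text.lower().split()
    return {
        "absolute_position": index + 1,
        "normalized_position": (index + 1) / len(dialog),
        "utterance_length": len(words),
        "unique_words": len(set(words)),
        "is_starter": speaker == dialog[0][0],
    }


dialog = [
    ("user", "How do I reset my password?"),
    ("agent", "Click forgot password."),
    ("user", "Thanks that worked"),
]
print(structural_features(dialog, 2))
```

Features like these could then be fed to a classic classifier, which is the setting in which the abstract reports its feature importance analysis.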

A Simple BERT-Based Approach for Lexical Simplification


Title	A Simple BERT-Based Approach for Lexical Simplification
Authors	Jipeng Qiang, Yun Li, Yi Zhu, Yunhao Yuan, Xindong Wu
Abstract	Lexical simplification (LS) aims to replace complex words in a given sentence with simpler alternatives of equivalent meaning. Recent unsupervised lexical simplification approaches rely only on the complex word itself, regardless of the given sentence, to generate candidate substitutions, which inevitably produces a large number of spurious candidates. We present a simple BERT-based LS approach that makes use of the pre-trained unsupervised deep bidirectional representations of BERT. Despite being entirely unsupervised, our approach obtains clear improvements over baselines that leverage linguistic databases and parallel corpora, outperforming the state of the art by more than 11 accuracy points on three well-known benchmarks.
Tasks	Language Modelling, Lexical Simplification
Published	2019-07-14
URL	https://arxiv.org/abs/1907.06226v4
PDF	https://arxiv.org/pdf/1907.06226v4.pdf
PWC	https://paperswithcode.com/paper/a-simple-bert-based-approach-for-lexical
Repo	https://github.com/qiang2100/BERT-LS
Framework	pytorch
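
A BERT-based LS pipeline conditions candidate generation on the sentence. One way to do this is to feed BERT the original sentence paired with a copy in which the complex word is masked, then rank the predicted candidates with several features. The sketch below shows only the input construction and an average-rank combination of hypothetical feature scores; no model is called, and token handling is simplified to whitespace tokens.

```python
def build_ls_input(sentence_tokens, complex_index, mask_token="[MASK]"):
    """Pair the original sentence with a copy where the complex word is masked."""
    masked = list(sentence_tokens)
    masked[complex_index] = mask_token
    return ["[CLS]"] + list(sentence_tokens) + ["[SEP]"] + masked + ["[SEP]"]


def rank_by_average_rank(candidates, feature_scores):
    """Order candidates by mean rank across features (higher score = better)."""
    ranks = {c: 0.0 for c in candidates}
    for scores in feature_scores:
        ordered = sorted(candidates, key=lambda c: scores[c], reverse=True)
        for position, c in enumerate(ordered, start=1):
            ranks[c] += position
    return sorted(candidates, key=lambda c: ranks[c] / len(feature_scores))


tokens = ["the", "cat", "perched", "on", "the", "mat"]
print(build_ls_input(tokens, complex_index=2))

# Hypothetical per-feature scores for three substitution candidates.
features = [
    {"sat": 0.9, "rested": 0.5, "stood": 0.3},
    {"sat": 0.7, "rested": 0.8, "stood": 0.1},
    {"sat": 0.6, "rested": 0.2, "stood": 0.4},
]
print(rank_by_average_rank(["sat", "rested", "stood"], features))
```

Keeping the unmasked sentence in the input lets the model see the complex word itself, which helps it propose substitutes that preserve meaning.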

SynC: A Unified Framework for Generating Synthetic Population with Gaussian Copula


Title	SynC: A Unified Framework for Generating Synthetic Population with Gaussian Copula
Authors	Colin Wan, Zheng Li, Alicia Guo, Yue Zhao
Abstract	Synthetic population generation is the process of combining multiple socioeconomic and demographic datasets from different sources and/or granularity levels, and downscaling them to an individual level. Although it is a fundamental step for many data science tasks, an efficient and standard framework is absent. In this study, we propose a multi-stage framework called SynC (Synthetic Population via Gaussian Copula) to fill the gap. SynC first removes potential outliers in the data and then fits the filtered data with a Gaussian copula model to correctly capture dependencies and marginal distributions of sampled survey data. Finally, SynC leverages predictive models to merge datasets into one and then scales them accordingly to match the marginal constraints. We make four key contributions in this work: 1) propose a novel framework for generating individual level data from aggregated data sources by combining state-of-the-art machine learning and statistical techniques, 2) demonstrate, through two real-world datasets, its value as a feature engineering tool and as an alternative to data collection in situations where gathering data is difficult, 3) release an easy-to-use framework implementation for reproducibility, and 4) ensure the methodology is scalable at the production level and can easily incorporate new data.
Tasks	Feature Engineering
Published	2019-04-16
URL	https://arxiv.org/abs/1904.07998v2
PDF	https://arxiv.org/pdf/1904.07998v2.pdf
PWC	https://paperswithcode.com/paper/sync-a-unified-framework-for-generating
Repo	https://github.com/winstonll/SynC
Framework	none
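
The Gaussian copula step can be sketched as follows: draw correlated standard normals via a Cholesky factor of a correlation matrix, map each one to a uniform with the normal CDF, then push each uniform through a column's inverse CDF to obtain the desired marginal. The sketch covers only this sampling step of a SynC-style pipeline (no outlier removal, merging, or scaling), and the two marginals are hypothetical.

```python
import math
import random
from statistics import NormalDist


def cholesky(matrix):
    """Lower-triangular L with L L^T = matrix (matrix must be positive definite)."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def sample_gaussian_copula(correlation, inverse_cdfs, n_samples, seed=0):
    """Draw rows whose dependence structure follows a Gaussian copula.

    inverse_cdfs: one function per column mapping a uniform in [0, 1] to a value.
    """
    rng = random.Random(seed)
    lower = cholesky(correlation)
    std_normal = NormalDist()
    rows = []
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in correlation]
        # Correlate the independent normals: x = L z.
        x = [sum(lower[i][k] * z[k] for k in range(i + 1)) for i in range(len(z))]
        # Normal CDF gives uniforms; each inverse CDF imposes a marginal.
        rows.append([inv(std_normal.cdf(xi)) for inv, xi in zip(inverse_cdfs, x)])
    return rows


def age(u):  # hypothetical uniform marginal on [18, 80]
    return 18 + 62 * u


def household_size(u):  # hypothetical categorical marginal
    return 1 if u < 0.3 else (2 if u < 0.7 else 3)


rows = sample_gaussian_copula([[1.0, 0.8], [0.8, 1.0]], [age, household_size], n_samples=2000)
```

Separating dependence (the correlation matrix) from marginals (the inverse CDFs) is what lets a copula reproduce each column's distribution while still capturing how the columns co-vary.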