Paper Group AWR 41
MUREL: Multimodal Relational Reasoning for Visual Question Answering
Title | MUREL: Multimodal Relational Reasoning for Visual Question Answering |
Authors | Remi Cadene, Hedi Ben-younes, Matthieu Cord, Nicolas Thome |
Abstract | Multimodal attentional networks are currently state-of-the-art models for Visual Question Answering (VQA) tasks involving real images. Although attention allows the model to focus on the visual content relevant to the question, this simple mechanism is arguably insufficient to model complex reasoning features required for VQA or other high-level tasks. In this paper, we propose MuRel, a multimodal relational network which is learned end-to-end to reason over real images. Our first contribution is the introduction of the MuRel cell, an atomic reasoning primitive representing interactions between question and image regions by a rich vectorial representation, and modeling region relations with pairwise combinations. Secondly, we incorporate the cell into a full MuRel network, which progressively refines visual and question interactions, and can be leveraged to define visualization schemes finer than mere attention maps. We validate the relevance of our approach with various ablation studies, and show its superiority to attention-based methods on three datasets: VQA 2.0, VQA-CP v2 and TDIUC. Our final MuRel network is competitive with or outperforms state-of-the-art results in this challenging context. Our code is available: https://github.com/Cadene/murel.bootstrap.pytorch |
Tasks | Relational Reasoning, Visual Question Answering |
Published | 2019-02-25 |
URL | http://arxiv.org/abs/1902.09487v1 |
PDF | http://arxiv.org/pdf/1902.09487v1.pdf |
PWC | https://paperswithcode.com/paper/murel-multimodal-relational-reasoning-for |
Repo | https://github.com/Cadene/murel.bootstrap.pytorch |
Framework | pytorch |
Rates of Convergence for Sparse Variational Gaussian Process Regression
Title | Rates of Convergence for Sparse Variational Gaussian Process Regression |
Authors | David R. Burt, Carl E. Rasmussen, Mark van der Wilk |
Abstract | Excellent variational approximations to Gaussian process posteriors have been developed which avoid the $\mathcal{O}\left(N^3\right)$ scaling with dataset size $N$. They reduce the computational cost to $\mathcal{O}\left(NM^2\right)$, with $M\ll N$ being the number of inducing variables, which summarise the process. While the computational cost seems to be linear in $N$, the true complexity of the algorithm depends on how $M$ must increase to ensure a certain quality of approximation. We address this by characterising the behavior of an upper bound on the KL divergence to the posterior. We show that with high probability the KL divergence can be made arbitrarily small by growing $M$ more slowly than $N$. A particular case of interest is that for regression with normally distributed inputs in D-dimensions with the popular Squared Exponential kernel, $M=\mathcal{O}(\log^D N)$ is sufficient. Our results show that as datasets grow, Gaussian process posteriors can truly be approximated cheaply, and provide a concrete rule for how to increase $M$ in continual learning scenarios. |
Tasks | Continual Learning |
Published | 2019-03-08 |
URL | https://arxiv.org/abs/1903.03571v3 |
PDF | https://arxiv.org/pdf/1903.03571v3.pdf |
PWC | https://paperswithcode.com/paper/rates-of-convergence-for-sparse-variational |
Repo | https://github.com/DavidBurt2/Rates-of-Convergence-SGPR |
Framework | none |
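To make the scaling claim in the abstract above concrete, here is a minimal sketch that compares the sparse cost $NM^2$ with the exact cost $N^3$ when $M$ grows like $\log^D N$. The constant `c`, the function names, and the rounding rule are assumptions made for illustration only; the paper's actual bounds involve constants and conditions that are not reproduced here.

```python
import math

def inducing_points(n, dim, c=1.0):
    """Hypothetical rule M = ceil(c * (log n)^dim); c is an assumed constant."""
    return max(1, math.ceil(c * math.log(n) ** dim))

def cost_ratio(n, dim, c=1.0):
    """Ratio of the sparse cost N*M^2 to the exact cost N^3, ignoring constants."""
    m = inducing_points(n, dim, c)
    return (n * m ** 2) / n ** 3

# The ratio shrinks as N grows, because M increases only polylogarithmically.
for n in (10**3, 10**4, 10**5, 10**6):
    print(n, inducing_points(n, dim=2), cost_ratio(n, dim=2))
```

Under these assumptions, the number of inducing variables stays in the hundreds even at a million data points for `dim=2`, which is the intuition behind the "approximated cheaply" statement.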
Sliced Wasserstein Generative Models
Title | Sliced Wasserstein Generative Models |
Authors | Jiqing Wu, Zhiwu Huang, Dinesh Acharya, Wen Li, Janine Thoma, Danda Pani Paudel, Luc Van Gool |
Abstract | In generative modeling, the Wasserstein distance (WD) has emerged as a useful metric to measure the discrepancy between generated and real data distributions. Unfortunately, it is challenging to approximate the WD of high-dimensional distributions. In contrast, the sliced Wasserstein distance (SWD) factorizes high-dimensional distributions into their multiple one-dimensional marginal distributions and is thus easier to approximate. In this paper, we introduce novel approximations of the primal and dual SWD. Instead of using a large number of random projections, as it is done by conventional SWD approximation methods, we propose to approximate SWDs with a small number of parameterized orthogonal projections in an end-to-end deep learning fashion. As concrete applications of our SWD approximations, we design two types of differentiable SWD blocks to equip modern generative frameworks—Auto-Encoders (AE) and Generative Adversarial Networks (GAN). In the experiments, we not only show the superiority of the proposed generative models on standard image synthesis benchmarks, but also demonstrate the state-of-the-art performance on challenging high resolution image and video generation in an unsupervised manner. |
Tasks | Image Generation, Video Generation |
Published | 2019-04-10 |
URL | http://arxiv.org/abs/1904.05408v2 |
PDF | http://arxiv.org/pdf/1904.05408v2.pdf |
PWC | https://paperswithcode.com/paper/sliced-wasserstein-generative-models-1 |
Repo | https://github.com/musikisomorphie/swd |
Framework | tf |
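For readers unfamiliar with the sliced Wasserstein distance, the sketch below shows the conventional random-projection estimator that the abstract above contrasts with. It is not the authors' learned orthogonal-projection method: it projects two equal-size samples onto random unit directions, sorts the one-dimensional projections, and averages the squared differences. Function names and the sample data are hypothetical.

```python
import math
import random

def random_unit_vector(dim, rng):
    """Draw a direction uniformly on the unit sphere via a normalized Gaussian."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def sliced_wasserstein(xs, ys, n_projections=50, seed=0):
    """Monte Carlo estimate of the sliced 2-Wasserstein distance
    between two samples of equal size."""
    assert len(xs) == len(ys), "this sketch assumes equal sample sizes"
    rng = random.Random(seed)
    dim = len(xs[0])
    total = 0.0
    for _ in range(n_projections):
        theta = random_unit_vector(dim, rng)
        # In one dimension, optimal transport simply matches sorted values.
        px = sorted(sum(a * t for a, t in zip(x, theta)) for x in xs)
        py = sorted(sum(a * t for a, t in zip(y, theta)) for y in ys)
        total += sum((a - b) ** 2 for a, b in zip(px, py)) / len(px)
    return math.sqrt(total / n_projections)

# Toy data: a 2-D Gaussian sample and a copy shifted along the first axis.
data_rng = random.Random(1)
real = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(200)]
fake = [[x + 3.0, y] for x, y in real]
print(sliced_wasserstein(real, real))
print(sliced_wasserstein(real, fake))
```

The estimate is zero for identical samples and grows with the shift; its accuracy depends on the number of projections, which is the cost the paper aims to reduce.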
Unifying Knowledge Graph Learning and Recommendation: Towards a Better Understanding of User Preferences
Title | Unifying Knowledge Graph Learning and Recommendation: Towards a Better Understanding of User Preferences |
Authors | Yixin Cao, Xiang Wang, Xiangnan He, Zikun Hu, Tat-Seng Chua |
Abstract | Incorporating a knowledge graph (KG) into a recommender system is a promising way to improve recommendation accuracy and explainability. However, existing methods largely assume that a KG is complete and simply transfer the “knowledge” in the KG at the shallow level of entity raw data or embeddings. This may lead to suboptimal performance, since a practical KG can hardly be complete, and it is common that a KG has missing facts, relations, and entities. Thus, we argue that it is crucial to consider the incomplete nature of a KG when incorporating it into a recommender system. In this paper, we jointly learn the model of recommendation and knowledge graph completion. Distinct from previous KG-based recommendation methods, we transfer the relation information in the KG, so as to understand the reasons that a user likes an item. As an example, if a user has watched several movies directed by (relation) the same person (entity), we can infer that the director relation plays a critical role when the user makes the decision, which helps to understand the user’s preference at a finer granularity. Technically, we contribute a new translation-based recommendation model, which specifically accounts for various preferences in translating a user to an item, and then jointly train it with a KG completion model by combining several transfer schemes. Extensive experiments on two benchmark datasets show that our method outperforms state-of-the-art KG-based recommendation methods. Further analysis verifies the positive effect of joint training on both tasks of recommendation and KG completion, and the advantage of our model in understanding user preference. We publish our project at https://github.com/TaoMiner/joint-kg-recommender. |
Tasks | Knowledge Graph Completion, Recommendation Systems |
Published | 2019-02-17 |
URL | http://arxiv.org/abs/1902.06236v1 |
PDF | http://arxiv.org/pdf/1902.06236v1.pdf |
PWC | https://paperswithcode.com/paper/unifying-knowledge-graph-learning-and |
Repo | https://github.com/TaoMiner/joint-kg-recommender |
Framework | pytorch |
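The idea of "translating a user to an item" through a preference can be illustrated with a generic translation-style score, where an item ranks highly when user + preference lands close to it in embedding space. This is a simplified sketch in the spirit of translation-based embeddings, not the model from the paper; the embeddings, names, and the squared-distance scoring are all assumed for illustration.

```python
def translation_score(user, preference, item):
    """Higher is better: negative squared distance between (user + preference) and item."""
    return -sum((u + p - i) ** 2 for u, p, i in zip(user, preference, item))

# Hypothetical 2-D embeddings; the preference vector plays the role of a
# relation such as "likes movies by this director".
user = [0.2, 0.1]
preference = [0.5, 0.4]
items = {"item_a": [0.7, 0.5], "item_b": [0.0, 0.9]}

best = max(items, key=lambda name: translation_score(user, preference, items[name]))
print(best)
```

Because the preference vector is explicit, inspecting which preference best explains an interaction is what gives this family of models its explainability.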
Frequency Principle: Fourier Analysis Sheds Light on Deep Neural Networks
Title | Frequency Principle: Fourier Analysis Sheds Light on Deep Neural Networks |
Authors | Zhi-Qin John Xu, Yaoyu Zhang, Tao Luo, Yanyang Xiao, Zheng Ma |
Abstract | We study the training process of Deep Neural Networks (DNNs) from the Fourier analysis perspective. We demonstrate a very universal Frequency Principle (F-Principle) — DNNs often fit target functions from low to high frequencies — on high-dimensional benchmark datasets such as MNIST/CIFAR10 and deep neural networks such as VGG16. This F-Principle of DNNs is opposite to the behavior of most conventional iterative numerical schemes (e.g., Jacobi method), which exhibit faster convergence for higher frequencies for various scientific computing problems. With a simple theory, we illustrate that this F-Principle results from the regularity of the commonly used activation functions. The F-Principle implies an implicit bias that DNNs tend to fit training data by a low-frequency function. This understanding provides an explanation of good generalization of DNNs on most real datasets and bad generalization of DNNs on parity function or randomized dataset. |
Tasks | |
Published | 2019-01-19 |
URL | https://arxiv.org/abs/1901.06523v5 |
PDF | https://arxiv.org/pdf/1901.06523v5.pdf |
PWC | https://paperswithcode.com/paper/frequency-principle-fourier-analysis-sheds |
Repo | https://github.com/xuzhiqin1990/F-Principle |
Framework | tf |
Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic Image Segmentation
Title | Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic Image Segmentation |
Authors | Chenxi Liu, Liang-Chieh Chen, Florian Schroff, Hartwig Adam, Wei Hua, Alan Yuille, Li Fei-Fei |
Abstract | Recently, Neural Architecture Search (NAS) has successfully identified neural network architectures that exceed human designed ones on large-scale image classification. In this paper, we study NAS for semantic image segmentation. Existing works often focus on searching the repeatable cell structure, while hand-designing the outer network structure that controls the spatial resolution changes. This choice simplifies the search space, but becomes increasingly problematic for dense image prediction which exhibits a lot more network level architectural variations. Therefore, we propose to search the network level structure in addition to the cell level structure, which forms a hierarchical architecture search space. We present a network level search space that includes many popular designs, and develop a formulation that allows efficient gradient-based architecture search (3 P100 GPU days on Cityscapes images). We demonstrate the effectiveness of the proposed method on the challenging Cityscapes, PASCAL VOC 2012, and ADE20K datasets. Auto-DeepLab, our architecture searched specifically for semantic image segmentation, attains state-of-the-art performance without any ImageNet pretraining. |
Tasks | Image Classification, Neural Architecture Search, Semantic Segmentation |
Published | 2019-01-10 |
URL | http://arxiv.org/abs/1901.02985v2 |
PDF | http://arxiv.org/pdf/1901.02985v2.pdf |
PWC | https://paperswithcode.com/paper/auto-deeplab-hierarchical-neural-architecture |
Repo | https://github.com/NoamRosenberg/AutoML |
Framework | pytorch |
Correlation-aware Adversarial Domain Adaptation and Generalization
Title | Correlation-aware Adversarial Domain Adaptation and Generalization |
Authors | Mohammad Mahfujur Rahman, Clinton Fookes, Mahsa Baktashmotlagh, Sridha Sridharan |
Abstract | Domain adaptation (DA) and domain generalization (DG) have emerged as a solution to the domain shift problem where the distribution of the source and target data is different. The task of DG is more challenging than DA as the target data is totally unseen during the training phase in DG scenarios. The current state-of-the-art employs adversarial techniques; however, these are rarely considered for the DG problem. Furthermore, these approaches do not consider correlation alignment, which has been proven highly beneficial for minimizing domain discrepancy. In this paper, we propose a correlation-aware adversarial DA and DG framework where the discrepancy between the features of the source and target data is minimized using correlation alignment along with adversarial learning. Incorporating the correlation alignment module along with adversarial learning helps to achieve a more domain-agnostic model due to the improved ability to reduce domain discrepancy with unlabeled target data. Experiments on benchmark datasets serve as evidence that our proposed method yields improved state-of-the-art performance. |
Tasks | Domain Adaptation, Domain Generalization |
Published | 2019-11-29 |
URL | https://arxiv.org/abs/1911.12983v1 |
PDF | https://arxiv.org/pdf/1911.12983v1.pdf |
PWC | https://paperswithcode.com/paper/correlation-aware-adversarial-domain |
Repo | https://github.com/mahfujur1/CA-DA-DG |
Framework | none |
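Correlation alignment, mentioned in the abstract above, is commonly implemented as a CORAL-style loss: the squared Frobenius distance between the source and target feature covariance matrices, scaled by $1/(4d^2)$. The sketch below shows that standard formulation in plain Python. It is an assumption that the paper uses this form; the adversarial component is omitted entirely, and the toy features are hypothetical.

```python
def covariance(features):
    """Unbiased sample covariance matrix of a list of feature vectors."""
    n, d = len(features), len(features[0])
    means = [sum(row[j] for row in features) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in features]
    return [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
            for a in range(d)]

def coral_loss(source, target):
    """Squared Frobenius distance between covariances, divided by 4 * d^2."""
    d = len(source[0])
    cs, ct = covariance(source), covariance(target)
    sq = sum((cs[a][b] - ct[a][b]) ** 2 for a in range(d) for b in range(d))
    return sq / (4 * d * d)

# Toy features: the scaled copy has different second-order statistics.
source = [[0.0, 0.0], [1.0, 2.0], [2.0, 1.0], [3.0, 3.0]]
target = [[2.0 * x, 2.0 * y] for x, y in source]
print(coral_loss(source, source))
print(coral_loss(source, target))
```

Note that this loss only compares second-order statistics, so a pure mean shift between domains leaves it unchanged; this is one reason it is typically combined with another alignment signal.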
Domain Generalization via Model-Agnostic Learning of Semantic Features
Title | Domain Generalization via Model-Agnostic Learning of Semantic Features |
Authors | Qi Dou, Daniel C. Castro, Konstantinos Kamnitsas, Ben Glocker |
Abstract | Generalization capability to unseen domains is crucial for machine learning models when deploying to real-world conditions. We investigate the challenging problem of domain generalization, i.e., training a model on multi-domain source data such that it can directly generalize to target domains with unknown statistics. We adopt a model-agnostic learning paradigm with gradient-based meta-train and meta-test procedures to expose the optimization to domain shift. Further, we introduce two complementary losses which explicitly regularize the semantic structure of the feature space. Globally, we align a derived soft confusion matrix to preserve general knowledge about inter-class relationships. Locally, we promote domain-independent class-specific cohesion and separation of sample features with a metric-learning component. The effectiveness of our method is demonstrated with new state-of-the-art results on two common object recognition benchmarks. Our method also shows consistent improvement on a medical image segmentation task. |
Tasks | Domain Generalization, Medical Image Segmentation, Metric Learning, Object Recognition, Semantic Segmentation |
Published | 2019-10-29 |
URL | https://arxiv.org/abs/1910.13580v1 |
PDF | https://arxiv.org/pdf/1910.13580v1.pdf |
PWC | https://paperswithcode.com/paper/domain-generalization-via-model-agnostic |
Repo | https://github.com/biomedia-mira/masf |
Framework | tf |
GluonTS: Probabilistic Time Series Models in Python
Title | GluonTS: Probabilistic Time Series Models in Python |
Authors | Alexander Alexandrov, Konstantinos Benidis, Michael Bohlke-Schneider, Valentin Flunkert, Jan Gasthaus, Tim Januschowski, Danielle C. Maddix, Syama Rangapuram, David Salinas, Jasper Schulz, Lorenzo Stella, Ali Caner Türkmen, Yuyang Wang |
Abstract | We introduce Gluon Time Series (GluonTS, available at https://gluon-ts.mxnet.io), a library for deep-learning-based time series modeling. GluonTS simplifies the development of and experimentation with time series models for common tasks such as forecasting or anomaly detection. It provides all necessary components and tools that scientists need for quickly building new models, for efficiently running and analyzing experiments and for evaluating model accuracy. |
Tasks | Anomaly Detection, Time Series, Time Series Forecasting, Time Series Prediction |
Published | 2019-06-12 |
URL | https://arxiv.org/abs/1906.05264v2 |
PDF | https://arxiv.org/pdf/1906.05264v2.pdf |
PWC | https://paperswithcode.com/paper/gluonts-probabilistic-time-series-models-in |
Repo | https://github.com/mbohlkeschneider/gluon-ts |
Framework | mxnet |
Decentralized & Collaborative AI on Blockchain
Title | Decentralized & Collaborative AI on Blockchain |
Authors | Justin D. Harris, Bo Waggoner |
Abstract | Machine learning has recently enabled large advances in artificial intelligence, but these tend to be highly centralized. The large datasets required are generally proprietary; predictions are often sold on a per-query basis; and published models can quickly become out of date without effort to acquire more data and re-train them. We propose a framework for participants to collaboratively build a dataset and use smart contracts to host a continuously updated model. This model will be shared publicly on a blockchain where it can be free to use for inference. Ideal learning problems include scenarios where a model is used many times for similar input such as personal assistants, playing games, recommender systems, etc. In order to maintain the model’s accuracy with respect to some test set we propose both financial and non-financial (gamified) incentive structures for providing good data. A free and open source implementation for the Ethereum blockchain is provided at https://github.com/microsoft/0xDeCA10B. |
Tasks | Recommendation Systems |
Published | 2019-07-16 |
URL | https://arxiv.org/abs/1907.07247v1 |
PDF | https://arxiv.org/pdf/1907.07247v1.pdf |
PWC | https://paperswithcode.com/paper/decentralized-collaborative-ai-on-blockchain |
Repo | https://github.com/microsoft/0xDeCA10B |
Framework | none |
Learning to Find Common Objects Across Few Image Collections
Title | Learning to Find Common Objects Across Few Image Collections |
Authors | Amirreza Shaban, Amir Rahimi, Shray Bansal, Stephen Gould, Byron Boots, Richard Hartley |
Abstract | Given a collection of bags where each bag is a set of images, our goal is to select one image from each bag such that the selected images are from the same object class. We model the selection as an energy minimization problem with unary and pairwise potential functions. Inspired by recent few-shot learning algorithms, we propose an approach to learn the potential functions directly from the data. Furthermore, we propose a fast greedy inference algorithm for energy minimization. We evaluate our approach on few-shot common object recognition as well as object co-localization tasks. Our experiments show that learning the pairwise and unary terms greatly improves the performance of the model over several well-known methods for these tasks. The proposed greedy optimization algorithm achieves performance comparable to state-of-the-art structured inference algorithms while being ~10 times faster. The code is publicly available on https://github.com/haamoon/finding_common_object. |
Tasks | Few-Shot Learning, Object Recognition |
Published | 2019-04-29 |
URL | https://arxiv.org/abs/1904.12936v2 |
PDF | https://arxiv.org/pdf/1904.12936v2.pdf |
PWC | https://paperswithcode.com/paper/learning-to-find-common-objects-across-image |
Repo | https://github.com/haamoon/finding_common_object |
Framework | tf |
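The selection problem in the abstract above (one image per bag, scored by unary and pairwise potentials) can be illustrated with a simple greedy pass. This is a hypothetical greedy scheme written for illustration, not the inference algorithm from the paper: it visits bags in order and picks the image whose unary cost plus pairwise cost to the earlier picks is lowest. The potentials here are hand-written; in the paper they are learned.

```python
def greedy_select(unary, pairwise):
    """unary[b][i] is the cost of image i in bag b;
    pairwise(b, i, c, j) is the cost of picking image i in bag b
    together with image j in an earlier bag c."""
    selected = []
    for b, costs in enumerate(unary):
        def energy(i):
            return costs[i] + sum(pairwise(b, i, c, j)
                                  for c, j in enumerate(selected))
        selected.append(min(range(len(costs)), key=energy))
    return selected

# Toy example: class labels stand in for image content.
labels = [["cat", "dog"], ["dog", "bird"], ["dog", "cat"]]
unary = [[0.5, 0.1], [0.0, 0.9], [0.2, 0.3]]

def pairwise(b, i, c, j):
    # Cost 0 when the two picks share a class, 1 otherwise.
    return 0.0 if labels[b][i] == labels[c][j] else 1.0

picks = greedy_select(unary, pairwise)
print([labels[b][i] for b, i in enumerate(picks)])
```

A single greedy pass like this depends on the bag order and can miss the global minimum, which is the usual trade-off against slower structured inference.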
Revealing the Dark Secrets of BERT
Title | Revealing the Dark Secrets of BERT |
Authors | Olga Kovaleva, Alexey Romanov, Anna Rogers, Anna Rumshisky |
Abstract | BERT-based architectures currently give state-of-the-art performance on many NLP tasks, but little is known about the exact mechanisms that contribute to their success. In the current work, we focus on the interpretation of self-attention, which is one of the fundamental underlying components of BERT. Using a subset of GLUE tasks and a set of handcrafted features-of-interest, we propose the methodology and carry out a qualitative and quantitative analysis of the information encoded by BERT’s individual heads. Our findings suggest that there is a limited set of attention patterns that are repeated across different heads, indicating the overall model overparametrization. While different heads consistently use the same attention patterns, they have varying impact on performance across different tasks. We show that manually disabling attention in certain heads leads to a performance improvement over the regular fine-tuned BERT models. |
Tasks | |
Published | 2019-08-21 |
URL | https://arxiv.org/abs/1908.08593v2 |
PDF | https://arxiv.org/pdf/1908.08593v2.pdf |
PWC | https://paperswithcode.com/paper/revealing-the-dark-secrets-of-bert |
Repo | https://github.com/KzKe/Transformer-models |
Framework | none |
User Intent Prediction in Information-seeking Conversations
Title | User Intent Prediction in Information-seeking Conversations |
Authors | Chen Qu, Liu Yang, Bruce Croft, Yongfeng Zhang, Johanne R. Trippas, Minghui Qiu |
Abstract | Conversational assistants are being progressively adopted by the general population. However, they are not capable of handling complicated information-seeking tasks that involve multiple turns of information exchange. Due to the limited communication bandwidth in conversational search, it is important for conversational assistants to accurately detect and predict user intent in information-seeking conversations. In this paper, we investigate two aspects of user intent prediction in an information-seeking setting. First, we extract features based on the content, structural, and sentiment characteristics of a given utterance, and use classic machine learning methods to perform user intent prediction. We then conduct an in-depth feature importance analysis to identify key features in this prediction task. We find that structural features contribute most to the prediction performance. Given this finding, we construct neural classifiers to incorporate context information and achieve better performance without feature engineering. Our findings can provide insights into the important factors and effective methods of user intent prediction in information-seeking conversations. |
Tasks | Feature Engineering, Feature Importance |
Published | 2019-01-11 |
URL | http://arxiv.org/abs/1901.03489v1 |
PDF | http://arxiv.org/pdf/1901.03489v1.pdf |
PWC | https://paperswithcode.com/paper/user-intent-prediction-in-information-seeking |
Repo | https://github.com/prdwb/UserIntentPrediction |
Framework | none |
A Simple BERT-Based Approach for Lexical Simplification
Title | A Simple BERT-Based Approach for Lexical Simplification |
Authors | Jipeng Qiang, Yun Li, Yi Zhu, Yunhao Yuan, Xindong Wu |
Abstract | Lexical simplification (LS) aims to replace complex words in a given sentence with their simpler alternatives of equivalent meaning. Recent unsupervised lexical simplification approaches rely only on the complex word itself, regardless of the given sentence, to generate candidate substitutions, which inevitably produces a large number of spurious candidates. We present a simple BERT-based LS approach that makes use of the pre-trained unsupervised deep bidirectional representations of BERT. Experimental results show that our approach, despite being entirely unsupervised, obtains a clear improvement over baselines that leverage linguistic databases and parallel corpora, outperforming the state-of-the-art by more than 11 accuracy points on three well-known benchmarks. |
Tasks | Language Modelling, Lexical Simplification |
Published | 2019-07-14 |
URL | https://arxiv.org/abs/1907.06226v4 |
PDF | https://arxiv.org/pdf/1907.06226v4.pdf |
PWC | https://paperswithcode.com/paper/a-simple-bert-based-approach-for-lexical |
Repo | https://github.com/qiang2100/BERT-LS |
Framework | pytorch |
SynC: A Unified Framework for Generating Synthetic Population with Gaussian Copula
Title | SynC: A Unified Framework for Generating Synthetic Population with Gaussian Copula |
Authors | Colin Wan, Zheng Li, Alicia Guo, Yue Zhao |
Abstract | Synthetic population generation is the process of combining multiple socioeconomic and demographic datasets from different sources and/or granularity levels, and downscaling them to an individual level. Although it is a fundamental step for many data science tasks, an efficient and standard framework is absent. In this study, we propose a multi-stage framework called SynC (Synthetic Population via Gaussian Copula) to fill the gap. SynC first removes potential outliers in the data and then fits the filtered data with a Gaussian copula model to correctly capture dependencies and marginal distributions of sampled survey data. Finally, SynC leverages predictive models to merge datasets into one and then scales them accordingly to match the marginal constraints. We make four key contributions in this work: 1) propose a novel framework for generating individual level data from aggregated data sources by combining state-of-the-art machine learning and statistical techniques, 2) demonstrate its value as a feature engineering tool, as well as an alternative to data collection in situations where gathering is difficult through two real-world datasets, 3) release an easy-to-use framework implementation for reproducibility, and 4) ensure the methodology is scalable at the production level and can easily incorporate new data. |
Tasks | Feature Engineering |
Published | 2019-04-16 |
URL | https://arxiv.org/abs/1904.07998v2 |
PDF | https://arxiv.org/pdf/1904.07998v2.pdf |
PWC | https://paperswithcode.com/paper/sync-a-unified-framework-for-generating |
Repo | https://github.com/winstonll/SynC |
Framework | none |
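The Gaussian copula step named in the abstract above can be sketched as follows: draw correlated standard normals, map them to uniforms with the normal CDF, then push each uniform through the inverse CDF of the desired marginal. This is a minimal two-variable illustration, not the SynC implementation; the correlation `rho`, the marginals (uniform ages, exponential incomes), and all names are assumptions chosen for the example.

```python
import math
import random
from statistics import NormalDist

def sample_gaussian_copula(rho, inverse_cdfs, n, seed=0):
    """Draw n pairs whose dependence follows a Gaussian copula with
    correlation rho and whose marginals are given by two inverse CDFs."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    rows = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        # Second normal is correlated with the first (2x2 Cholesky factor).
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        # Clamp the uniforms away from 0 and 1 so inverse CDFs stay finite.
        u1 = min(max(std_normal.cdf(z1), 1e-12), 1.0 - 1e-12)
        u2 = min(max(std_normal.cdf(z2), 1e-12), 1.0 - 1e-12)
        rows.append((inverse_cdfs[0](u1), inverse_cdfs[1](u2)))
    return rows

def age_inverse_cdf(u):
    """Hypothetical marginal: age uniform on [18, 90]."""
    return 18.0 + 72.0 * u

def income_inverse_cdf(u):
    """Hypothetical marginal: income exponential with mean 50000."""
    return -50000.0 * math.log(1.0 - u)

people = sample_gaussian_copula(0.8, (age_inverse_cdf, income_inverse_cdf), n=1000)
print(people[:3])
```

Separating the dependence structure (the copula) from the marginals is what lets a framework of this kind match known aggregate distributions while still producing correlated individual-level records.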