April 2, 2020

3026 words 15 mins read

Paper Group ANR 297

K-Adapter: Infusing Knowledge into Pre-Trained Models with Adapters. Compact Deep Aggregation for Set Retrieval. Manifold-based Test Generation for Image Classifiers. Superpixel Image Classification with Graph Attention Networks. High-dimensional, multiscale online changepoint detection. Theoretical Analysis of Divide-and-Conquer ERM: Beyond Square …

K-Adapter: Infusing Knowledge into Pre-Trained Models with Adapters

Title K-Adapter: Infusing Knowledge into Pre-Trained Models with Adapters
Authors Ruize Wang, Duyu Tang, Nan Duan, Zhongyu Wei, Xuanjing Huang, Jianshu Ji, Guihong Cao, Daxin Jiang, Ming Zhou
Abstract We study the problem of injecting knowledge into large pre-trained models like BERT and RoBERTa. Existing methods typically update the original parameters of pre-trained models when injecting knowledge. However, when multiple kinds of knowledge are injected, they may suffer from the problem of catastrophic forgetting. To address this, we propose K-Adapter, which keeps the original parameters of the pre-trained model fixed and supports continual knowledge infusion. Taking RoBERTa as the pre-trained model, K-Adapter has a neural adapter for each kind of infused knowledge, like a plug-in connected to RoBERTa. There is no information flow between different adapters, so different adapters can be trained efficiently in a distributed way. We inject two kinds of knowledge: factual knowledge obtained from automatically aligned text-triplets on Wikipedia and Wikidata, and linguistic knowledge obtained from dependency parsing. Results on three knowledge-driven tasks (six datasets in total), including relation classification, entity typing and question answering, demonstrate that each adapter improves the performance, and the combination of both adapters brings further improvements. Probing experiments further show that K-Adapter captures richer factual and commonsense knowledge than RoBERTa.
Tasks Dependency Parsing, Entity Typing, Question Answering, Relation Classification
Published 2020-02-05
URL https://arxiv.org/abs/2002.01808v3
PDF https://arxiv.org/pdf/2002.01808v3.pdf
PWC https://paperswithcode.com/paper/k-adapter-infusing-knowledge-into-pre-trained
Repo
Framework
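
The core mechanism is easy to picture: the pre-trained encoder is frozen and only small bottleneck adapters are trained. Below is a minimal PyTorch sketch of that idea; the layer sizes, module names, and the way adapter outputs are combined are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal sketch: frozen encoder + independently trained adapters.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, hidden=768, bottleneck=64):  # sizes are illustrative
        super().__init__()
        self.down = nn.Linear(hidden, bottleneck)
        self.up = nn.Linear(bottleneck, hidden)
        self.act = nn.GELU()

    def forward(self, h):
        # Residual bottleneck: h + up(act(down(h)))
        return h + self.up(self.act(self.down(h)))

class KAdapterStyleModel(nn.Module):
    def __init__(self, encoder, n_adapters=2):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():
            p.requires_grad = False          # original parameters stay fixed
        self.adapters = nn.ModuleList([Adapter() for _ in range(n_adapters)])

    def forward(self, x):
        h = self.encoder(x)                  # frozen features
        # Each adapter reads the frozen output independently;
        # there is no information flow between adapters.
        outs = [a(h) for a in self.adapters]
        return torch.cat(outs, dim=-1)
```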

Compact Deep Aggregation for Set Retrieval

Title Compact Deep Aggregation for Set Retrieval
Authors Yujie Zhong, Relja Arandjelović, Andrew Zisserman
Abstract The objective of this work is to learn a compact embedding of a set of descriptors that is suitable for efficient retrieval and ranking, whilst maintaining discriminability of the individual descriptors. We focus on a specific example of this general problem: retrieving images containing multiple faces from a large-scale dataset of images. Here the set consists of the face descriptors in each image, and given a query for multiple identities, the goal is then to retrieve, in order, images which contain all the identities, all but one, etc. To this end, we make the following contributions: first, we propose a CNN architecture, SetNet, to achieve the objective: it learns face descriptors and their aggregation over a set to produce a compact fixed-length descriptor designed for set retrieval, and the score of an image is a count of the number of identities that match the query; second, we show that this compact descriptor has minimal loss of discriminability up to two faces per image, and degrades slowly after that, far exceeding a number of baselines; third, we explore the speed vs. retrieval-quality trade-off for set retrieval using this compact descriptor; and, finally, we collect and annotate a large dataset of images containing various numbers of celebrities, which we use for evaluation and which is publicly released.
Tasks
Published 2020-03-26
URL https://arxiv.org/abs/2003.11794v1
PDF https://arxiv.org/pdf/2003.11794v1.pdf
PWC https://paperswithcode.com/paper/compact-deep-aggregation-for-set-retrieval
Repo
Framework
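
To make the set-retrieval scoring concrete, here is a toy sketch assuming per-face descriptors are already computed. SetNet itself learns both the descriptors and the aggregation end-to-end, so this sum-pool baseline only illustrates the interface, not the learned model.

```python
# Toy sketch of compact set aggregation and count-based scoring.
import numpy as np

def aggregate(face_descriptors):
    """Sum-pool a set of L2-normalised descriptors into one fixed-length
    vector, then re-normalise (a common compact aggregation baseline)."""
    v = np.sum(face_descriptors, axis=0)
    return v / (np.linalg.norm(v) + 1e-12)

def set_score(image_agg, query_descriptors, threshold=0.5):
    """Score an image by counting how many query identities match its
    aggregated descriptor above a similarity threshold (illustrative)."""
    sims = query_descriptors @ image_agg
    return int((sims > threshold).sum())
```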

Manifold-based Test Generation for Image Classifiers

Title Manifold-based Test Generation for Image Classifiers
Authors Taejoon Byun, Abhishek Vijayakumar, Sanjai Rayadurgam, Darren Cofer
Abstract Neural networks used for image classification tasks in critical applications must be tested with sufficient realistic data to assure their correctness. To effectively test an image classification neural network, one must obtain realistic test data adequate to inspire confidence that differences between the implicit requirements and the learned model would be exposed. This raises two challenges: first, an adequate subset of the data points must be carefully chosen to inspire confidence, and second, the implicit requirements must be meaningfully extrapolated to data points beyond those in the explicit training set. This paper proposes a novel framework to address these challenges. Our approach is based on the premise that patterns in a large input data space can be effectively captured in a smaller manifold space, from which similar yet novel test cases (both the input and the label) can be sampled and generated. A variant of the Conditional Variational Autoencoder (CVAE) is used to capture this manifold with a generative function, and a search technique is applied on this manifold space to efficiently find fault-revealing inputs. Experiments show that this approach enables efficient generation of thousands of realistic yet fault-revealing test cases, even for well-trained models.
Tasks Image Classification
Published 2020-02-15
URL https://arxiv.org/abs/2002.06337v1
PDF https://arxiv.org/pdf/2002.06337v1.pdf
PWC https://paperswithcode.com/paper/manifold-based-test-generation-for-image
Repo
Framework
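
A sketch of the search over the learned manifold, assuming a trained conditional decoder `decode(z, label)` and a model under test `predict` (both hypothetical placeholders). The paper applies a more directed search technique; random latent sampling is shown only to make the loop concrete.

```python
# Sketch: sample the latent manifold, decode realistic labelled inputs,
# and keep those on which the model under test disagrees with the label.
import numpy as np

def find_fault_revealing(decode, predict, label, latent_dim=8,
                         n_samples=1000, seed=0):
    rng = np.random.default_rng(seed)
    faults = []
    for _ in range(n_samples):
        z = rng.standard_normal(latent_dim)   # point in the manifold space
        x = decode(z, label)                  # realistic input with known label
        if predict(x) != label:               # disagreement = revealed fault
            faults.append((z, x))
    return faults
```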

Superpixel Image Classification with Graph Attention Networks

Title Superpixel Image Classification with Graph Attention Networks
Authors Pedro H. C. Avelar, Anderson R. Tavares, Thiago L. T. da Silveira, Cláudio R. Jung, Luís C. Lamb
Abstract This document reports the use of Graph Attention Networks for classifying oversegmented images, as well as a general procedure for generating oversegmented versions of image-based datasets. The code and learnt models for/from the experiments are available on GitHub. The experiments were run from June 2019 until December 2019. By using self-attention on a more sparsely connected graph network instead of geometric distance-based attention, we obtained better results than the baseline models.
Tasks Image Classification
Published 2020-02-13
URL https://arxiv.org/abs/2002.05544v1
PDF https://arxiv.org/pdf/2002.05544v1.pdf
PWC https://paperswithcode.com/paper/superpixel-image-classification-with-graph
Repo
Framework
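
The oversegmentation-to-graph step can be sketched as follows, assuming scikit-image (0.17 or later for `start_label`). The node features and adjacency rule here are illustrative choices, and the GAT classifier itself is omitted.

```python
# Sketch: turn an image into graph inputs (node features + edges)
# via SLIC superpixels; the GAT classifier would consume these.
import numpy as np
from skimage.segmentation import slic
from skimage.measure import regionprops

def image_to_graph(image, n_segments=75):
    labels = slic(image, n_segments=n_segments, compactness=10, start_label=1)
    props = regionprops(labels)
    # Node features: normalised superpixel centroids (illustrative choice).
    feats = np.array([p.centroid for p in props]) / np.array(labels.shape)
    # Edges: superpixels whose masks touch (4-neighbour adjacency).
    edges = set()
    h, w = labels.shape
    for y in range(h - 1):
        for x in range(w - 1):
            a = labels[y, x]
            for b in (labels[y + 1, x], labels[y, x + 1]):
                if a != b:
                    edges.add((min(a, b), max(a, b)))
    return feats, sorted(edges)
```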

High-dimensional, multiscale online changepoint detection

Title High-dimensional, multiscale online changepoint detection
Authors Yudong Chen, Tengyao Wang, Richard J. Samworth
Abstract We introduce a new method for high-dimensional, online changepoint detection in settings where a $p$-variate Gaussian data stream may undergo a change in mean. The procedure works by performing likelihood ratio tests against simple alternatives of different scales in each coordinate, and then aggregating test statistics across scales and coordinates. The algorithm is online in the sense that its worst-case computational complexity per new observation, namely $O\bigl(p^2 \log (ep)\bigr)$, is independent of the number of previous observations; in practice, it may even be significantly faster than this. We prove that the patience, or average run length under the null, of our procedure is at least at the desired nominal level, and provide guarantees on its response delay under the alternative that depend on the sparsity of the vector of mean change. Simulations confirm the practical effectiveness of our proposal.
Tasks
Published 2020-03-07
URL https://arxiv.org/abs/2003.03668v1
PDF https://arxiv.org/pdf/2003.03668v1.pdf
PWC https://paperswithcode.com/paper/high-dimensional-multiscale-online
Repo
Framework
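
As rough intuition for the multiscale aggregation, the toy detector below compares recent means over dyadic window sizes against a known pre-change mean of zero (assuming unit-variance coordinates) and takes a maximum over scales and coordinates. The actual procedure uses likelihood ratio tests with calibrated thresholds and attains the stated $O(p^2 \log(ep))$ per-observation cost; this is only a simplified illustration.

```python
# Toy multiscale online detector: max over dyadic windows and coordinates.
import numpy as np
from collections import deque

class SimpleMultiscaleDetector:
    def __init__(self, max_window=64, threshold=4.0):
        self.buf = deque(maxlen=max_window)
        self.windows = [2 ** k for k in range(int(np.log2(max_window)) + 1)]
        self.threshold = threshold

    def update(self, x):
        """Feed one p-variate observation; return True if a change is flagged."""
        self.buf.append(np.asarray(x, dtype=float))
        data = np.stack(self.buf)            # (t, p)
        stat = 0.0
        for w in self.windows:
            if len(data) >= w:
                # scaled mean of the last w observations, per coordinate
                s = np.abs(data[-w:].mean(axis=0)) * np.sqrt(w)
                stat = max(stat, s.max())
        return stat > self.threshold
```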

Theoretical Analysis of Divide-and-Conquer ERM: Beyond Square Loss and RKHS

Title Theoretical Analysis of Divide-and-Conquer ERM: Beyond Square Loss and RKHS
Authors Yong Liu, Lizhong Ding, Weiping Wang
Abstract Theoretical analysis of divide-and-conquer based distributed learning with least-squares loss in the reproducing kernel Hilbert space (RKHS) has recently been explored within the framework of learning theory. However, studies on learning theory for general loss functions and hypothesis spaces remain limited. To fill the gap, we study the risk performance of distributed empirical risk minimization (ERM) for general loss functions and hypothesis spaces. The main contributions are two-fold. First, we derive two tight risk bounds under certain basic assumptions on the hypothesis space, as well as on the smoothness, Lipschitz continuity, and strong convexity of the loss function. Second, we develop a more general risk bound for distributed ERM without the restriction of strong convexity.
Tasks
Published 2020-03-09
URL https://arxiv.org/abs/2003.03882v2
PDF https://arxiv.org/pdf/2003.03882v2.pdf
PWC https://paperswithcode.com/paper/risk-analysis-of-divide-and-conquer-erm
Repo
Framework
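
The divide-and-conquer ERM setting itself is simple to sketch: solve a regularised ERM on each data partition and average the local solutions. The toy below uses ridge regression as a strongly convex example; it illustrates the estimator the bounds are about, not the paper's analysis.

```python
# Toy divide-and-conquer ERM: local ridge solutions, then averaging.
import numpy as np

def distributed_ridge(X, y, n_machines=4, lam=1.0):
    n, d = X.shape
    parts = np.array_split(np.arange(n), n_machines)
    thetas = []
    for idx in parts:
        Xi, yi = X[idx], y[idx]
        # Local ERM: solve (Xi'Xi + lam*I) theta = Xi'y on this partition.
        theta = np.linalg.solve(Xi.T @ Xi + lam * np.eye(d), Xi.T @ yi)
        thetas.append(theta)
    return np.mean(thetas, axis=0)           # averaged global estimator
```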

C-CoCoA: A Continuous Cooperative Constraint Approximation Algorithm to Solve Functional DCOPs

Title C-CoCoA: A Continuous Cooperative Constraint Approximation Algorithm to Solve Functional DCOPs
Authors Amit Sarker, Abdullahil Baki Arif, Moumita Choudhury, Md. Mosaddek Khan
Abstract Distributed Constraint Optimization Problems (DCOPs) have been widely used to coordinate interactions (i.e. constraints) in cooperative multi-agent systems. The traditional DCOP model assumes that variables owned by the agents can take only discrete values and that constraints’ cost functions are defined for every possible value assignment of a set of variables. While this formulation is often reasonable, there are many applications where the variables are continuous decision variables and the constraints are in functional form. To overcome this limitation, the Functional DCOP (F-DCOP) model has been proposed, which is able to model problems with continuous variables. However, existing F-DCOP algorithms experience huge computation and communication overhead. This paper applies continuous non-linear optimization methods to the Cooperative Constraint Approximation (CoCoA) algorithm. We empirically show that our algorithm is able to provide high-quality solutions with smaller communication cost and execution time than the existing F-DCOP algorithms.
Tasks
Published 2020-02-27
URL https://arxiv.org/abs/2002.12427v1
PDF https://arxiv.org/pdf/2002.12427v1.pdf
PWC https://paperswithcode.com/paper/c-cocoa-a-continuous-cooperative-constraint
Repo
Framework
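
The essence of applying continuous non-linear optimisation to a functional DCOP can be sketched as a sweep in which each agent re-optimises its own continuous variable against its neighbours' current values. This simplification drops CoCoA's actual coordination scheme and is only a sketch under those assumptions.

```python
# Sketch: iterative per-agent minimisation over continuous variables.
from scipy.optimize import minimize_scalar

def solve_fdcop(costs, neighbours, init, sweeps=10, bounds=(-10, 10)):
    """costs[(i, j)] is a callable f(xi, xj), defined for every ordered
    neighbour pair; neighbours[i] lists agent i's neighbours."""
    x = dict(init)
    for _ in range(sweeps):
        for i in x:
            # Agent i minimises the sum of its constraint costs, holding
            # its neighbours' current values fixed.
            local = lambda v: sum(costs[(i, j)](v, x[j]) for j in neighbours[i])
            x[i] = minimize_scalar(local, bounds=bounds, method="bounded").x
    return x
```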

Learning Oracle Attention for High-fidelity Face Completion

Title Learning Oracle Attention for High-fidelity Face Completion
Authors Tong Zhou, Changxing Ding, Shaowen Lin, Xinchao Wang, Dacheng Tao
Abstract High-fidelity face completion is a challenging task due to the rich and subtle facial textures involved. What makes it more complicated is the correlations between different facial components, for example, the symmetry in texture and structure between both eyes. While recent works adopted the attention mechanism to learn the contextual relations among elements of the face, they have largely overlooked the disastrous impacts of inaccurate attention scores; in addition, they fail to pay sufficient attention to key facial components, the completion results of which largely determine the authenticity of a face image. Accordingly, in this paper, we design a comprehensive framework for face completion based on the U-Net structure. Specifically, we propose a dual spatial attention module to efficiently learn the correlations between facial textures at multiple scales; moreover, we provide an oracle supervision signal to the attention module to ensure that the obtained attention scores are reasonable. Furthermore, we take the location of the facial components as prior knowledge and impose a multi-discriminator on these regions, with which the fidelity of facial components is significantly improved. Extensive experiments on two high-resolution face datasets including CelebA-HQ and Flickr-Faces-HQ demonstrate that the proposed approach outperforms state-of-the-art methods by large margins.
Tasks Facial Inpainting
Published 2020-03-31
URL https://arxiv.org/abs/2003.13903v1
PDF https://arxiv.org/pdf/2003.13903v1.pdf
PWC https://paperswithcode.com/paper/learning-oracle-attention-for-high-fidelity
Repo
Framework
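
For orientation, here is a generic spatial self-attention block of the kind such attention modules build on. The layer sizes are illustrative, and the paper's dual multi-scale design and oracle supervision signal are not reproduced here.

```python
# Generic spatial self-attention over feature-map positions (sketch).
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, 1)
        self.key = nn.Conv2d(channels, channels // 8, 1)
        self.value = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)   # (b, hw, c/8)
        k = self.key(x).flatten(2)                     # (b, c/8, hw)
        attn = torch.softmax(q @ k, dim=-1)            # (b, hw, hw) scores
        v = self.value(x).flatten(2)                   # (b, c, hw)
        out = v @ attn.transpose(1, 2)                 # attend over positions
        return out.view(b, c, h, w) + x                # residual connection
```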

DPGN: Distribution Propagation Graph Network for Few-shot Learning

Title DPGN: Distribution Propagation Graph Network for Few-shot Learning
Authors Ling Yang, Liangliang Li, Zilun Zhang, Xinyu Zhou, Erjin Zhou, Yu Liu
Abstract Most graph-network-based meta-learning approaches model instance-level relations between examples. We extend this idea further to explicitly model the distribution-level relation of one example to all other examples in a 1-vs-N manner. We propose a novel approach named distribution propagation graph network (DPGN) for few-shot learning. It conveys both the distribution-level relations and instance-level relations in each few-shot learning task. To combine the distribution-level relations and instance-level relations for all examples, we construct a dual complete graph network which consists of a point graph and a distribution graph, with each node standing for an example. Equipped with this dual graph architecture, DPGN propagates label information from labeled examples to unlabeled examples within several update generations. In extensive experiments on few-shot learning benchmarks, DPGN outperforms state-of-the-art results by a large margin of 5%–12% under the supervised setting and 7%–13% under the semi-supervised setting. Code will be released.
Tasks Few-Shot Learning, Meta-Learning
Published 2020-03-31
URL https://arxiv.org/abs/2003.14247v2
PDF https://arxiv.org/pdf/2003.14247v2.pdf
PWC https://paperswithcode.com/paper/dpgn-distribution-propagation-graph-network
Repo
Framework
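
One propagation cycle between the point graph and the distribution graph can be condensed as below. Real DPGN uses learned edge networks over several generations, so this parameter-free version is only a sketch of the data flow under that simplifying assumption.

```python
# Condensed sketch of one point-graph / distribution-graph update cycle.
import torch
import torch.nn.functional as F

def dpgn_step(feats, labels_onehot):
    """feats: (n, d) instance features; labels_onehot: (n, c), all-zero
    rows for unlabeled examples. Returns refined features and label scores."""
    # Point graph: pairwise similarity between instance features.
    p_edge = F.softmax(feats @ feats.t(), dim=-1)          # (n, n)
    # Distribution features: each node's relations to all others (1-vs-N).
    dist = p_edge
    # Distribution graph: similarity between distribution features.
    d_edge = F.softmax(dist @ dist.t(), dim=-1)            # (n, n)
    # Propagate instance features and label information over the graphs.
    new_feats = d_edge @ feats
    label_scores = p_edge @ labels_onehot
    return new_feats, label_scores
```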

Learning Options from Demonstration using Skill Segmentation

Title Learning Options from Demonstration using Skill Segmentation
Authors Matthew Cockcroft, Shahil Mawjee, Steven James, Pravesh Ranchod
Abstract We present a method for learning options from segmented demonstration trajectories. The trajectories are first segmented into skills using nonparametric Bayesian clustering, and a reward function for each segment is then learned using inverse reinforcement learning. From this, a set of inferred trajectories for the demonstration is generated. Option initiation sets and termination conditions are learned from these trajectories using the one-class support vector machine clustering algorithm. We demonstrate our method in the four rooms domain, where an agent is able to autonomously discover usable options from human demonstration. Our results show that these inferred options can then be used to improve learning and planning.
Tasks
Published 2020-01-19
URL https://arxiv.org/abs/2001.06793v1
PDF https://arxiv.org/pdf/2001.06793v1.pdf
PWC https://paperswithcode.com/paper/learning-options-from-demonstration-using
Repo
Framework
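
The initiation-set step maps cleanly onto scikit-learn, assuming the trajectories have already been segmented: fit a one-class SVM on the states where a skill's segments begin, giving a membership test for that option. Hyperparameters below are illustrative.

```python
# Sketch: learn an option's initiation set with a one-class SVM.
import numpy as np
from sklearn.svm import OneClassSVM

def learn_initiation_set(segment_start_states):
    """segment_start_states: (n, state_dim) start states for one skill."""
    clf = OneClassSVM(kernel="rbf", nu=0.1, gamma="scale")
    clf.fit(segment_start_states)
    return clf          # clf.predict(s) == 1 means s is in the initiation set

# Example: states in the four rooms domain as (x, y) positions.
starts = np.random.rand(50, 2)
init_set = learn_initiation_set(starts)
print(init_set.predict(starts[:5]))
```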

A Hierarchical Transitive-Aligned Graph Kernel for Un-attributed Graphs

Title A Hierarchical Transitive-Aligned Graph Kernel for Un-attributed Graphs
Authors Lu Bai, Lixin Cui, Edwin R. Hancock
Abstract In this paper, we develop a new graph kernel, namely the Hierarchical Transitive-Aligned kernel, by transitively aligning the vertices between graphs through a family of hierarchical prototype graphs. Compared to most existing state-of-the-art graph kernels, the proposed kernel has three theoretical advantages. First, it incorporates the locational correspondence information between graphs into the kernel computation, and thus overcomes the shortcoming of ignoring structural correspondences that arises in most R-convolution kernels. Second, it guarantees transitivity between the correspondence information, which is not available in most existing matching kernels. Third, it incorporates the information of all graphs under comparison into the kernel computation process, and thus encapsulates richer characteristics. By transductively training a C-SVM classifier, experimental evaluations demonstrate the effectiveness of the new transitive-aligned kernel. The proposed kernel can outperform state-of-the-art graph kernels on standard graph-based datasets in terms of classification accuracy.
Tasks
Published 2020-02-08
URL https://arxiv.org/abs/2002.04425v1
PDF https://arxiv.org/pdf/2002.04425v1.pdf
PWC https://paperswithcode.com/paper/a-hierarchical-transitive-aligned-graph
Repo
Framework
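
A heavily simplified picture of transitive alignment through a prototype: match each graph's vertices to prototype slots by a one-dimensional feature (vertex degree here, as an assumption) and count slot co-assignments between graphs. Transitivity holds because every alignment goes through the shared prototype; the hierarchical construction is omitted.

```python
# Toy transitive-aligned kernel: align vertices to prototype slots
# by degree, then count slot co-occupations between two graphs.
import numpy as np

def align_to_prototype(degrees, prototype_degrees):
    """Nearest-slot assignment of each vertex by its degree value."""
    return np.argmin(np.abs(degrees[:, None] - prototype_degrees[None, :]),
                     axis=1)

def transitive_aligned_kernel(deg_g, deg_h, prototype_degrees):
    a = align_to_prototype(np.asarray(deg_g, float), prototype_degrees)
    b = align_to_prototype(np.asarray(deg_h, float), prototype_degrees)
    # Vertices of G and H matched via the same prototype slot contribute.
    return sum(np.sum(a == s) * np.sum(b == s)
               for s in range(len(prototype_degrees)))
```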

Meta Fine-Tuning Neural Language Models for Multi-Domain Text Mining

Title Meta Fine-Tuning Neural Language Models for Multi-Domain Text Mining
Authors Chengyu Wang, Minghui Qiu, Jun Huang, Xiaofeng He
Abstract Pre-trained neural language models bring significant improvements to various NLP tasks by fine-tuning the models on task-specific training sets. During fine-tuning, the parameters are initialized from pre-trained models directly, which ignores how the learning processes of similar NLP tasks in different domains are correlated and mutually reinforced. In this paper, we propose an effective learning procedure named Meta Fine-Tuning (MFT), which serves as a meta-learner to solve a group of similar NLP tasks for neural language models. Instead of simply multi-task training over all the datasets, MFT learns only from typical instances of various domains to acquire highly transferable knowledge. It further encourages the language model to encode domain-invariant representations by optimizing a series of novel domain corruption loss functions. After MFT, the model can be fine-tuned for each domain with better parameter initializations and higher generalization ability. We implement MFT upon BERT to solve several multi-domain text mining tasks. Experimental results confirm the effectiveness of MFT and its usefulness for few-shot learning.
Tasks Few-Shot Learning, Language Modelling
Published 2020-03-29
URL https://arxiv.org/abs/2003.13003v1
PDF https://arxiv.org/pdf/2003.13003v1.pdf
PWC https://paperswithcode.com/paper/meta-fine-tuning-neural-language-models-for
Repo
Framework
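
As a stand-in for the domain-invariance objective (the paper's actual domain corruption losses are not reproduced here), one can combine the task loss with a term that pushes a domain classifier's predictions toward uniform, so the encoder hides domain identity.

```python
# Sketch of a generic domain-invariance loss of the kind MFT optimises
# alongside the task loss; this is an assumption, not the paper's loss.
import torch
import torch.nn as nn

def mft_style_loss(task_logits, task_labels, domain_logits, n_domains):
    task_loss = nn.functional.cross_entropy(task_logits, task_labels)
    # Push domain predictions toward uniform over the n_domains classes.
    uniform = torch.full_like(domain_logits, 1.0 / n_domains)
    domain_loss = nn.functional.kl_div(
        domain_logits.log_softmax(dim=-1), uniform, reduction="batchmean")
    return task_loss + domain_loss
```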

Keeping Community in the Loop: Understanding Wikipedia Stakeholder Values for Machine Learning-Based Systems

Title Keeping Community in the Loop: Understanding Wikipedia Stakeholder Values for Machine Learning-Based Systems
Authors C. Estelle Smith, Bowen Yu, Anjali Srivastava, Aaron Halfaker, Loren Terveen, Haiyi Zhu
Abstract On Wikipedia, sophisticated algorithmic tools are used to assess the quality of edits and take corrective actions. However, algorithms can fail to solve the problems they were designed for if they conflict with the values of the communities who use them. In this study, we take a Value-Sensitive Algorithm Design approach to understanding a community-created and -maintained machine learning-based algorithm called the Objective Revision Evaluation System (ORES), a quality prediction system used in numerous Wikipedia applications and contexts. Five major values converged across stakeholder groups: ORES (and its dependent applications) should (1) reduce the effort of community maintenance, (2) maintain human judgement as the final authority, (3) support different people's differing workflows, (4) encourage positive engagement with diverse editor groups, and (5) establish the trustworthiness of people and algorithms within the community. We reveal tensions between these values and discuss implications for future research to improve algorithms like ORES.
Tasks
Published 2020-01-14
URL https://arxiv.org/abs/2001.04879v1
PDF https://arxiv.org/pdf/2001.04879v1.pdf
PWC https://paperswithcode.com/paper/keeping-community-in-the-loop-understanding
Repo
Framework

Generative Partial Multi-View Clustering

Title Generative Partial Multi-View Clustering
Authors Qianqian Wang, Zhengming Ding, Zhiqiang Tao, Quanxue Gao, Yun Fu
Abstract Nowadays, with the rapid development of data collection sources and feature extraction methods, multi-view data are becoming easy to obtain and have received increasing research attention in recent years. Among multi-view methods, multi-view clustering (MVC) forms a mainstream research direction and is widely used in data analysis. However, existing MVC methods mainly assume that each sample appears in all the views, without considering the incomplete-view case due to data corruption, sensor failure, equipment malfunction, etc. In this study, we design and build a generative partial multi-view clustering model, named GP-MVC, to address the incomplete multi-view problem by explicitly generating the data of missing views. The main idea of GP-MVC is two-fold. First, multi-view encoder networks are trained to learn common low-dimensional representations, followed by a clustering layer to capture the consistent cluster structure across multiple views. Second, view-specific generative adversarial networks are developed to generate the missing data of one view conditioned on the shared representation given by the other views. These two steps promote each other, as learning common representations facilitates data imputation and the generated data further exploit view consistency. Moreover, a weighted adaptive fusion scheme is implemented to exploit the complementary information among different views. Experimental results on four benchmark datasets show the effectiveness of the proposed GP-MVC over state-of-the-art methods.
Tasks Imputation
Published 2020-03-29
URL https://arxiv.org/abs/2003.13088v1
PDF https://arxiv.org/pdf/2003.13088v1.pdf
PWC https://paperswithcode.com/paper/generative-partial-multi-view-clustering
Repo
Framework
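
The missing-view generation step can be sketched as a conditional generator over the shared representation; the network sizes are illustrative and the adversarial training loop (with view-specific discriminators) is omitted.

```python
# Sketch: generate a missing view conditioned on the shared representation.
import torch
import torch.nn as nn

class MissingViewGenerator(nn.Module):
    """Generates one view's data from the shared low-dimensional code."""
    def __init__(self, shared_dim=64, view_dim=128):  # sizes illustrative
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(shared_dim, 256), nn.ReLU(),
            nn.Linear(256, view_dim))

    def forward(self, shared_repr):
        return self.net(shared_repr)

# Impute a missing view from the representation given by observed views.
gen = MissingViewGenerator()
shared = torch.randn(8, 64)        # shared representation (batch of 8)
imputed_view = gen(shared)         # (8, 128) generated view data
```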

Unified Multi-Domain Learning and Data Imputation using Adversarial Autoencoder

Title Unified Multi-Domain Learning and Data Imputation using Adversarial Autoencoder
Authors Andre Mendes, Julian Togelius, Leandro dos Santos Coelho
Abstract We present a novel framework that combines multi-domain learning (MDL), data imputation (DI) and multi-task learning (MTL) to improve performance on classification and regression tasks in different domains. The core of our method is an adversarial autoencoder that can: (1) learn to produce domain-invariant embeddings to reduce the difference between domains; (2) learn the data distribution for each domain and correctly perform data imputation on missing data. For MDL, we use the Maximum Mean Discrepancy (MMD) measure to align the domain distributions. For DI, we use an adversarial approach in which a generator fills in information for missing data and a discriminator tries to distinguish between real and imputed values. Finally, using the universal feature representation in the embeddings, we train a classifier using MTL that, given input from any domain, can predict labels for all domains. We demonstrate the superior performance of our approach compared to other state-of-the-art methods in three distinct settings: DG-DI in image recognition with unstructured data, MTL-DI in grade estimation with structured data, and MDMTL-DI in a selection process using mixed data.
Tasks Imputation, Multi-Task Learning
Published 2020-03-15
URL https://arxiv.org/abs/2003.07779v1
PDF https://arxiv.org/pdf/2003.07779v1.pdf
PWC https://paperswithcode.com/paper/unified-multi-domain-learning-and-data
Repo
Framework
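
The MMD term used to align domain distributions has a compact standard form; the sketch below uses a Gaussian (RBF) kernel with an illustrative bandwidth.

```python
# Standard biased MMD^2 estimate with a Gaussian kernel (sketch).
import torch

def rbf_mmd(x, y, bandwidth=1.0):
    """Biased MMD^2 estimate between samples x (n, d) and y (m, d)."""
    def k(a, b):
        d2 = torch.cdist(a, b) ** 2
        return torch.exp(-d2 / (2 * bandwidth ** 2))
    # MMD^2 = E[k(x,x')] + E[k(y,y')] - 2 E[k(x,y)]
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()
```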