Paper Group NANR 13
Weakly-supervised Knowledge Graph Alignment with Adversarial Learning
Title | Weakly-supervised Knowledge Graph Alignment with Adversarial Learning |
Authors | Anonymous |
Abstract | This paper studies aligning knowledge graphs from different sources or languages. Most existing methods are supervised and usually require a large number of aligned knowledge triplets. However, such a large number of aligned triplets may be unavailable or expensive to obtain in many domains. Therefore, in this paper we propose to study aligning knowledge graphs in a fully-unsupervised or weakly-supervised fashion, i.e., without or with only a few aligned triplets. We propose an unsupervised framework that aligns the entity and relation embeddings of different knowledge graphs through adversarial learning. Moreover, a regularization term which maximizes the mutual information between the embeddings of different knowledge graphs is used to mitigate the problem of mode collapse when learning the alignment functions. Such a framework can be further seamlessly integrated with existing supervised methods by utilizing a limited number of aligned triplets as guidance. Experimental results on multiple datasets demonstrate the effectiveness of our proposed approach in both the unsupervised and the weakly-supervised settings. |
Tasks | Knowledge Graphs |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SygfNCEYDH |
https://openreview.net/pdf?id=SygfNCEYDH | |
PWC | https://paperswithcode.com/paper/weakly-supervised-knowledge-graph-alignment-2 |
Repo | |
Framework | |
Utilizing Edge Features in Graph Neural Networks via Variational Information Maximization
Title | Utilizing Edge Features in Graph Neural Networks via Variational Information Maximization |
Authors | Anonymous |
Abstract | Graph Neural Networks (GNNs) broadly follow the scheme that the representation vector of each node is updated recursively using the message from neighbor nodes, where the message of a neighbor is usually pre-processed with a parameterized transform matrix. To make better use of edge features, we propose the Edge Information maximized Graph Neural Network (EIGNN) that maximizes the Mutual Information (MI) between edge features and message passing channels. The MI is reformulated as a differentiable objective via a variational approach. We theoretically show that the newly introduced objective enables the model to preserve edge information, and empirically corroborate the enhanced performance of MI-maximized models across a broad range of learning tasks including regression on molecular graphs and relation prediction in knowledge graphs. |
Tasks | Knowledge Graphs |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BygZK2VYvB |
https://openreview.net/pdf?id=BygZK2VYvB | |
PWC | https://paperswithcode.com/paper/utilizing-edge-features-in-graph-neural-1 |
Repo | |
Framework | |
Learning deep graph matching with channel-independent embedding and Hungarian attention
Title | Learning deep graph matching with channel-independent embedding and Hungarian attention |
Authors | Tianshu Yu, Runzhong Wang, Junchi Yan, Baoxin Li |
Abstract | Graph matching aims to establish node-wise correspondence between two graphs, which is a classic combinatorial problem and in general NP-complete. Only very recently have deep graph matching methods started to use deep networks to achieve unprecedented matching accuracy. Along this direction, this paper makes two complementary contributions, which can also be reused as plugins in existing works: i) a novel node and edge embedding strategy which stimulates the multi-head strategy in attention models and allows the information in each channel to be merged independently. In contrast, only node embedding is considered in previous works; ii) a general masking mechanism over the loss function, devised to improve the smoothness of objective learning for graph matching. Using the Hungarian algorithm, it dynamically constructs a structured and sparsely connected layer, taking into account the most contributing matching pairs as hard attention. Our approach performs competitively, and can also improve state-of-the-art methods as a plugin, in terms of matching accuracy on three public benchmarks. |
Tasks | Graph Matching |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rJgBd2NYPH |
https://openreview.net/pdf?id=rJgBd2NYPH | |
PWC | https://paperswithcode.com/paper/learning-deep-graph-matching-with-channel |
Repo | |
Framework | |
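The masking idea in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, a brute-force search over permutations stands in for the Hungarian algorithm (so it only suits tiny matrices), and building the mask as the union of the predicted assignment and the ground-truth pairs is an illustrative assumption that the abstract does not state.

```python
from itertools import permutations
import math


def best_assignment(scores):
    """Return the row-to-column assignment with the highest total score.

    Brute-force stand-in for the Hungarian algorithm (assumption: tiny inputs).
    """
    n = len(scores)
    return max(permutations(range(n)),
               key=lambda perm: sum(scores[i][perm[i]] for i in range(n)))


def hard_attention_mask(scores, target):
    """Binary mask over pairs: predicted assignment OR ground-truth match.

    The union with the ground truth is an illustrative choice, not taken
    from the abstract.
    """
    n = len(scores)
    perm = best_assignment(scores)
    return [[1 if (perm[i] == j or target[i][j] == 1) else 0 for j in range(n)]
            for i in range(n)]


def masked_cross_entropy(scores, target, mask):
    """Binary cross-entropy summed over the masked pairs only."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for i, row in enumerate(scores):
        for j, s in enumerate(row):
            if mask[i][j]:
                t = target[i][j]
                total -= t * math.log(s + eps) + (1 - t) * math.log(1 - s + eps)
    return total


# Hypothetical soft matching scores and an identity ground-truth matching.
scores = [[0.7, 0.2, 0.1],
          [0.3, 0.3, 0.4],
          [0.1, 0.6, 0.3]]
target = [[1, 0, 0],
          [0, 1, 0],
          [0, 0, 1]]

mask = hard_attention_mask(scores, target)
loss = masked_cross_entropy(scores, target, mask)
```

Only the pairs selected by the mask contribute to the loss, so confident and already-correct non-matches are ignored. In practice a polynomial-time solver would replace the brute-force search.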
MMD GAN with Random-Forest Kernels
Title | MMD GAN with Random-Forest Kernels |
Authors | Tao Huang, Zhen Han, Xu Jia, Hanyuan Hang |
Abstract | In this paper, we propose a novel kind of kernel, the random-forest kernel, to enhance the empirical performance of MMD GAN. Unlike common forests with deterministic routing, our random-forest kernel uses a probabilistic routing variant, which makes it possible to merge with CNN frameworks. Our proposed random-forest kernel has the following advantages: from the perspective of random forests, the output of the GAN discriminator can be viewed as feature inputs to the forest, where each tree gets access to merely a fraction of the features, and thus the entire forest benefits from ensemble learning. In terms of kernel methods, the random-forest kernel is proved to be characteristic, and therefore suitable for the MMD structure. Besides, being an asymmetric kernel, our random-forest kernel is much more flexible in terms of capturing the differences between distributions. Sharing the advantages of CNNs, kernel methods, and ensemble learning, our random-forest kernel based MMD GAN obtains desirable empirical performance on the CIFAR-10, CelebA and LSUN bedroom data sets. Furthermore, for the sake of completeness, we also provide a comprehensive theoretical analysis to support our experimental results. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HJxhWa4KDr |
https://openreview.net/pdf?id=HJxhWa4KDr | |
PWC | https://paperswithcode.com/paper/mmd-gan-with-random-forest-kernels |
Repo | |
Framework | |
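For readers unfamiliar with the quantity that an MMD GAN optimizes, the sketch below computes the standard biased estimate of the squared maximum mean discrepancy between two samples. It deliberately uses a Gaussian kernel on scalars as a stand-in; the random-forest kernel proposed in the paper is not reproduced here, and all names and the `bandwidth` parameter are assumptions made for illustration.

```python
import math


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel on scalars; a stand-in for the paper's kernel."""
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, kernel):
    """Average kernel value over all pairs (x, y)."""
    return sum(kernel(x, y) for x in xs for y in ys) / (len(xs) * len(ys))


def mmd_squared(xs, ys, kernel=gaussian_kernel):
    """Biased (V-statistic) estimate of MMD^2 between samples xs and ys."""
    return (mean_kernel(xs, xs, kernel)
            + mean_kernel(ys, ys, kernel)
            - 2.0 * mean_kernel(xs, ys, kernel))


# Hypothetical one-dimensional samples.
real = [0.0, 0.1, 0.2]
fake = [3.0, 3.1, 3.2]

same = mmd_squared(real, real)       # identical samples
different = mmd_squared(real, fake)  # well-separated samples
```

The estimate is zero for identical samples and grows as the two samples separate, which is why a generator can be trained by minimizing it.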
Locally Constant Networks
Title | Locally Constant Networks |
Authors | Anonymous |
Abstract | We show how neural models can be used to realize piece-wise constant functions such as decision trees. Our approach builds on ReLU networks that are piece-wise linear and hence their associated gradients with respect to the inputs are locally constant. We formally establish the equivalence between the classes of locally constant networks and decision trees. Moreover, we highlight several advantageous properties of locally constant networks, including how they realize decision trees with parameter sharing across branching / leaves. Indeed, only $M$ neurons suffice to implicitly model an oblique decision tree with $2^M$ leaf nodes. The neural representation also enables us to adopt many tools developed for deep networks (e.g., DropConnect (Wan et al., 2013)) while implicitly training decision trees. We demonstrate that our method outperforms alternative techniques for training oblique decision trees in the context of molecular property classification and regression tasks. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Bke8UR4FPB |
https://openreview.net/pdf?id=Bke8UR4FPB | |
PWC | https://paperswithcode.com/paper/locally-constant-networks-1 |
Repo | |
Framework | |
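The key observation in the abstract above, that the input gradient of a ReLU network is constant within each activation region, can be checked on a toy model. The sketch assumes a single hidden layer f(x) = sum_i v_i * relu(w_i . x + b_i) with hypothetical weights; it is not the architecture used in the paper. With M hidden neurons there are at most 2^M activation patterns, each acting like one leaf.

```python
def activation_pattern(x, weights, biases):
    """Tuple of 0/1 flags: which hidden ReLU units are active at input x."""
    return tuple(
        int(sum(w_k * x_k for w_k, x_k in zip(w, x)) + b > 0)
        for w, b in zip(weights, biases)
    )


def input_gradient(x, weights, biases, out_weights):
    """Gradient of f(x) = sum_i v_i * relu(w_i . x + b_i) with respect to x.

    Only active units contribute, so the result depends on x solely through
    its activation pattern.
    """
    pattern = activation_pattern(x, weights, biases)
    grad = [0.0] * len(x)
    for active, w, v in zip(pattern, weights, out_weights):
        if active:
            for k in range(len(x)):
                grad[k] += v * w[k]
    return grad


# Hypothetical network with M = 2 hidden units on 2-D inputs.
weights = [[1.0, 0.0], [0.0, 1.0]]
biases = [0.0, 0.0]
out_weights = [2.0, -3.0]

g_a = input_gradient([1.0, 1.0], weights, biases, out_weights)
g_b = input_gradient([2.0, 5.0], weights, biases, out_weights)   # same region as g_a
g_c = input_gradient([1.0, -1.0], weights, biases, out_weights)  # different region
```

Inputs that share an activation pattern receive the same gradient vector, so the gradient behaves like a piece-wise constant function of the input.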
Learning Long- and Short-Term User Literal-Preference with Multimodal Hierarchical Transformer Network for Personalized Image Caption
Title | Learning Long- and Short-Term User Literal-Preference with Multimodal Hierarchical Transformer Network for Personalized Image Caption |
Authors | Wei Zhang, Yue Ying, Pan Lu, Hongyuan Zha |
Abstract | Personalized image captioning, a natural extension of the standard image captioning task, requires generating brief image descriptions tailored to users’ writing style and traits, and is more practical for meeting users’ real demands. Only a few recent studies shed light on this crucial task, and they learn static user representations to capture long-term literal-preference. However, this is insufficient to achieve satisfactory performance, because users have not only long-term literal-preference but also short-term literal-preference, which is associated with their recent states. To bridge this gap, we develop a novel multimodal hierarchical transformer network (MHTN) for personalized image captioning in this paper. It learns short-term user literal-preference based on users’ recent captions through a short-term user encoder at the low level. At the high level, the multimodal encoder integrates target image representations with short-term literal-preference, as well as long-term literal-preference learned from user IDs. These two encoders enjoy the advantages of the powerful transformer networks. Extensive experiments on two real datasets show the effectiveness of considering the two types of user literal-preference simultaneously and better performance over state-of-the-art models. |
Tasks | Image Captioning |
Published | 2020-02-04 |
URL | https://lupantech.github.io/papers/aaai20_caption.pdf |
https://lupantech.github.io/papers/aaai20_caption.pdf | |
PWC | https://paperswithcode.com/paper/learning-long-and-short-term-user-literal |
Repo | |
Framework | |
Sparse and Structured Visual Attention
Title | Sparse and Structured Visual Attention |
Authors | Anonymous |
Abstract | Visual attention mechanisms have been widely used in image captioning models. In this paper, to better link the image structure with the generated text, we replace the traditional softmax attention mechanism with two alternative sparsity-promoting transformations: sparsemax and Total-Variation Sparse Attention (TVmax). With sparsemax, we obtain sparse attention weights, selecting relevant features. In order to promote sparsity and encourage fusing of the related adjacent spatial locations, we propose TVmax. By selecting relevant groups of features, the TVmax transformation improves interpretability. We present results on the Microsoft COCO and Flickr30k datasets, obtaining gains in comparison to softmax. TVmax outperforms the other compared attention mechanisms in terms of human-rated caption quality and attention relevance. |
Tasks | Image Captioning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=r1e8WTEYPB |
https://openreview.net/pdf?id=r1e8WTEYPB | |
PWC | https://paperswithcode.com/paper/sparse-and-structured-visual-attention |
Repo | |
Framework | |
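As a point of reference for the abstract above, the sketch below implements the standard sparsemax transformation (the Euclidean projection of a score vector onto the probability simplex) next to softmax. It covers sparsemax only; TVmax, the paper's own proposal, is not reproduced. The function names and example scores are illustrative.

```python
import math


def softmax(z):
    """Standard softmax; every output is strictly positive."""
    m = max(z)
    exps = [math.exp(z_i - m) for z_i in z]
    total = sum(exps)
    return [e / total for e in exps]


def sparsemax(z):
    """Euclidean projection of z onto the probability simplex.

    Sort the scores, find the largest k with 1 + k * z_(k) > sum of the top-k
    scores, derive the threshold tau from that prefix, then clip at zero.
    """
    z_sorted = sorted(z, reverse=True)
    cumsum = 0.0
    k_star, sum_star = 0, 0.0
    for k, z_k in enumerate(z_sorted, start=1):
        cumsum += z_k
        if 1.0 + k * z_k > cumsum:  # always true for k = 1
            k_star, sum_star = k, cumsum
    tau = (sum_star - 1.0) / k_star
    return [max(z_i - tau, 0.0) for z_i in z]


# Hypothetical attention scores over three image regions.
scores = [1.0, 0.8, 0.1]
dense = softmax(scores)     # all three weights are non-zero
sparse = sparsemax(scores)  # the lowest-scoring region gets exactly zero weight
```

Both outputs sum to one, but sparsemax assigns exactly zero weight to low-scoring regions, which is the sparsity property the abstract relies on.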
A Causal View on Robustness of Neural Networks
Title | A Causal View on Robustness of Neural Networks |
Authors | Anonymous |
Abstract | We present a causal view on the robustness of neural networks against input manipulations, which applies not only to traditional classification tasks but also to general measurement data. Based on this view, we design a deep causal manipulation augmented model (deep CAMA) which explicitly models the manipulations of data as a cause to the observed effect variables. We further develop data augmentation and test-time fine-tuning methods to improve deep CAMA’s robustness. When compared with discriminative deep neural networks, our proposed model shows superior robustness against unseen manipulations. As a by-product, our model achieves disentangled representation which separates the representation of manipulations from those of other latent causes. |
Tasks | Data Augmentation |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Hkxvl0EtDH |
https://openreview.net/pdf?id=Hkxvl0EtDH | |
PWC | https://paperswithcode.com/paper/a-causal-view-on-robustness-of-neural |
Repo | |
Framework | |
Concise Multi-head Attention Models
Title | Concise Multi-head Attention Models |
Authors | Anonymous |
Abstract | The attention-based Transformer architecture has enabled significant advances in the field of natural language processing. In addition to new pre-training techniques, recent improvements crucially rely on working with a relatively larger embedding dimension for tokens. This leads to models that are prohibitively large to be employed in downstream tasks. In this paper we identify one of the important factors contributing to the large embedding size requirement. In particular, our analysis highlights that the scaling between the number of heads and the size of each head in the existing architectures gives rise to this limitation, which we further validate with our experiments. As a solution, we propose a new way to set the projection size in attention heads that allows us to train models with a relatively smaller embedding dimension, without sacrificing performance. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=r1eVXa4KvH |
https://openreview.net/pdf?id=r1eVXa4KvH | |
PWC | https://paperswithcode.com/paper/concise-multi-head-attention-models |
Repo | |
Framework | |
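The coupling between head count and head size mentioned in the abstract above can be made concrete with a small calculation. In the standard Transformer, each head has size `d_model / num_heads`, so adding heads shrinks every head unless the embedding grows. The sketch below counts the query, key, value, and output projection weights (biases ignored) and allows an explicit `head_size` override. That override is a hypothetical illustration of decoupling the two quantities; the abstract does not specify how the authors set the projection size.

```python
def attention_projection_params(d_model, num_heads, head_size=None):
    """Weight count of the Q, K, V and output projections (biases ignored).

    By default the head size follows the standard rule d_model / num_heads.
    Passing head_size explicitly decouples it from d_model (an assumption
    made here for illustration only).
    """
    if head_size is None:
        if d_model % num_heads != 0:
            raise ValueError("d_model must be divisible by num_heads")
        head_size = d_model // num_heads
    inner = num_heads * head_size
    # Q, K, V map d_model -> inner; the output projection maps inner -> d_model.
    return 3 * d_model * inner + inner * d_model


# Standard coupling: 8 heads of size 64 need a 512-dimensional embedding.
standard = attention_projection_params(d_model=512, num_heads=8)

# Hypothetical decoupling: keep 8 heads of size 64 with a 256-dimensional embedding.
decoupled = attention_projection_params(d_model=256, num_heads=8, head_size=64)
```

Under these assumptions, fixing the head size lets the embedding dimension shrink without shrinking the heads, at the cost of projections that are no longer square.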
On Variational Learning of Controllable Representations for Text without Supervision
Title | On Variational Learning of Controllable Representations for Text without Supervision |
Authors | Anonymous |
Abstract | The variational autoencoder (VAE) has found success in modelling the manifold of natural images on certain datasets, allowing meaningful images to be generated while interpolating or extrapolating in the latent code space, but it is unclear whether similar capabilities are feasible for text considering its discrete nature. In this work, we investigate the reason why unsupervised learning of controllable representations fails for text. We find that traditional sequence VAEs can learn disentangled representations through their latent codes to some extent, but they often fail to properly decode when the latent factor is being manipulated, because the manipulated codes often land in holes or vacant regions in the aggregated posterior latent space, which the decoding network is not trained to process. Both as a validation of the explanation and as a fix to the problem, we propose to constrain the posterior mean to a learned probability simplex, and to perform manipulation within this simplex. Our proposed method mitigates the latent vacancy problem and achieves the first success in unsupervised learning of controllable representations for text. Empirically, our method significantly outperforms unsupervised baselines and is competitive with strong supervised approaches on text style transfer. Furthermore, when switching the latent factor (e.g., topic) during a long sentence generation, our proposed framework can often complete the sentence in a seemingly natural way, a capability that has never been attempted by previous methods. |
Tasks | Style Transfer, Text Style Transfer |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Hkex2a4FPr |
https://openreview.net/pdf?id=Hkex2a4FPr | |
PWC | https://paperswithcode.com/paper/on-variational-learning-of-controllable |
Repo | |
Framework | |
SCALABLE OBJECT-ORIENTED SEQUENTIAL GENERATIVE MODELS
Title | SCALABLE OBJECT-ORIENTED SEQUENTIAL GENERATIVE MODELS |
Authors | Anonymous |
Abstract | The most significant limitation of previous approaches to unsupervised learning for object-oriented representation is their scalability. Most of the previous models have been shown to work only on scenes with a few objects. In this paper, we propose SCALOR, a generative model for Scalable Sequential Object-Oriented Representation. With the spatially parallel attention and proposal-rejection mechanism, SCALOR is a scalable model that can deal with orders of magnitude more objects than previous models. Besides, we introduce a background model so that it can model the foreground objects and a complex background together. In experiments on large-scale MNIST and DSprite datasets, we demonstrate that SCALOR can deal with scenes with nearly 100 objects as well as model complex natural background images. Importantly, using SCALOR, we demonstrate for the first time a result of modeling natural scenes with several tens of moving objects. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SJxrKgStDH |
https://openreview.net/pdf?id=SJxrKgStDH | |
PWC | https://paperswithcode.com/paper/scalable-object-oriented-sequential |
Repo | |
Framework | |
Closed loop deep Bayesian inversion: Uncertainty driven acquisition for fast MRI
Title | Closed loop deep Bayesian inversion: Uncertainty driven acquisition for fast MRI |
Authors | Anonymous |
Abstract | This work proposes a closed-loop, uncertainty-driven adaptive sampling framework (CLUDAS) for accelerating magnetic resonance imaging (MRI) via deep Bayesian inversion. By closed-loop, we mean that our samples adapt in real time to the incoming data. To our knowledge, we demonstrate the first generative adversarial network (GAN) based framework for posterior estimation over a continuum of sampling rates of an inverse problem. We use this estimator to drive the sampling for accelerated MRI. Our numerical evidence demonstrates that the variance estimate strongly correlates with the expected MSE improvement for different acceleration rates even with few posterior samples. Moreover, the resulting masks bring improvements to the state-of-the-art fixed and active mask designing approaches across MSE, posterior variance and SSIM on real undersampled MRI scans. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BJlPOlBKDB |
https://openreview.net/pdf?id=BJlPOlBKDB | |
PWC | https://paperswithcode.com/paper/closed-loop-deep-bayesian-inversion |
Repo | |
Framework | |
Mutual Information Gradient Estimation for Representation Learning
Title | Mutual Information Gradient Estimation for Representation Learning |
Authors | Anonymous |
Abstract | Mutual information (MI) plays an important role in representation learning. However, MI is unfortunately intractable in continuous and high-dimensional settings. Recent advances establish tractable and scalable MI estimators to discover useful representations. However, most existing methods are not capable of providing an accurate, low-variance estimate of MI when the MI is large. We argue that estimating gradients of MI is more appealing for representation learning than directly estimating MI, due to the difficulty of estimating MI. Therefore, we propose the Mutual Information Gradient Estimator (MIGE) for representation learning based on score estimation of implicit distributions. It exhibits a tight and smooth gradient estimation of MI in the high-dimensional and large-MI setting. We expand the applications of MIGE to both unsupervised learning of deep representations based on InfoMax and the Information Bottleneck method. Experimental results indicate a remarkable performance improvement in learning useful representations. |
Tasks | Representation Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=ByxaUgrFvH |
https://openreview.net/pdf?id=ByxaUgrFvH | |
PWC | https://paperswithcode.com/paper/mutual-information-gradient-estimation-for |
Repo | |
Framework | |
Efficient Deep Representation Learning by Adaptive Latent Space Sampling
Title | Efficient Deep Representation Learning by Adaptive Latent Space Sampling |
Authors | Anonymous |
Abstract | Supervised deep learning requires a large number of training samples with annotations (e.g. label class for a classification task, pixel- or voxel-wise label map for segmentation tasks), which are expensive and time-consuming to obtain. During the training of a deep neural network, the annotated samples are fed into the network in mini-batches, where they are often regarded as being of equal importance. However, some of the samples may become less informative during training, as the magnitude of the gradient starts to vanish for these samples. Meanwhile, other samples of higher utility or hardness may be needed more for the training process to proceed and may require more exploitation. To address the challenges of expensive annotations and loss of sample informativeness, here we propose a novel training framework which adaptively selects informative samples that are fed to the training process. The adaptive selection or sampling is performed based on a hardness-aware strategy in the latent space constructed by a generative model. To evaluate the proposed training framework, we perform experiments on three different datasets, including MNIST and CIFAR-10 for an image classification task and a medical image dataset, IVUS, for a biophysical simulation task. On all three datasets, the proposed framework outperforms a random sampling method, which demonstrates the effectiveness of our framework. |
Tasks | Image Classification, Representation Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Byl3HxBFwH |
https://openreview.net/pdf?id=Byl3HxBFwH | |
PWC | https://paperswithcode.com/paper/efficient-deep-representation-learning-by |
Repo | |
Framework | |
GraphQA: Protein Model Quality Assessment using Graph Convolutional Network
Title | GraphQA: Protein Model Quality Assessment using Graph Convolutional Network |
Authors | Anonymous |
Abstract | Proteins are ubiquitous molecules whose function in biological processes is determined by their 3D structure. Experimental identification of a protein’s structure can be time-consuming, prohibitively expensive, and not always possible. Alternatively, protein folding can be modeled using computational methods, which, however, are not guaranteed to always produce optimal results. GraphQA is a graph-based method to estimate the quality of protein models that possesses favorable properties such as representation learning, explicit modeling of both sequential and 3D structure, geometric invariance and computational efficiency. In this work, we demonstrate significant improvements over the state of the art for both hand-engineered and representation-learning approaches, and carefully evaluate the individual contributions of GraphQA. |
Tasks | Representation Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HyxgBerKwB |
https://openreview.net/pdf?id=HyxgBerKwB | |
PWC | https://paperswithcode.com/paper/graphqa-protein-model-quality-assessment |
Repo | |
Framework | |