January 25, 2020

Paper Group NAWR 21

A-LINK: Recognizing Disguised Faces via Active Learning based Inter-Domain Knowledge


Title	A-LINK: Recognizing Disguised Faces via Active Learning based Inter-Domain Knowledge
Authors	Anshuman Suri, Mayank Vatsa, Richa Singh
Abstract	Recent advancements in deep learning have significantly increased the capabilities of face recognition. However, face recognition in an unconstrained environment is still an active research challenge. Covariates such as pose and low resolution have received significant attention, but “disguise” is considered an onerous covariate of face recognition. One primary reason for this is the unavailability of large and representative databases. To address the problem of recognizing disguised faces, we propose an active learning framework A-LINK, that intelligently selects training samples from the target domain data, such that the decision boundary does not overfit to a particular set of variations, and better generalizes to encode variability. The framework further applies domain adaptation with the actively selected training samples to fine-tune the network. We demonstrate the effectiveness of the proposed framework on DFW and Multi-PIE datasets with state-of-the-art models such as LCSSE and DenseNet.
Tasks	Active Learning, Domain Adaptation, Face Recognition, Heterogeneous Face Recognition
Published	2019-09-23
URL	http://iab-rubric.org/papers/2019_BTAS_ALINK.pdf
PDF	http://iab-rubric.org/papers/2019_BTAS_ALINK.pdf
PWC	https://paperswithcode.com/paper/a-link-recognizing-disguised-faces-via-active
Repo	https://github.com/iamgroot42/A-LINK
Framework	none

Capsule Graph Neural Network


Title	Capsule Graph Neural Network
Authors	Zhang Xinyi, Lihui Chen
Abstract	The high-quality node embeddings learned from the Graph Neural Networks (GNNs) have been applied to a wide range of node-based applications and some of them have achieved state-of-the-art (SOTA) performance. However, when applying node embeddings learned from GNNs to generate graph embeddings, the scalar node representation may not suffice to preserve the node/graph properties efficiently, resulting in sub-optimal graph embeddings. Inspired by the Capsule Neural Network (CapsNet), we propose the Capsule Graph Neural Network (CapsGNN), which adopts the concept of capsules to address the weakness in existing GNN-based graph embedding algorithms. By extracting node features in the form of capsules, a routing mechanism can be utilized to capture important information at the graph level. As a result, our model generates multiple embeddings for each graph to capture graph properties from different aspects. The attention module incorporated in CapsGNN is used to tackle graphs of various sizes, which also enables the model to focus on critical parts of the graphs. Our extensive evaluations with 10 graph-structured datasets demonstrate that CapsGNN has a powerful mechanism that captures macroscopic properties of the whole graph in a data-driven manner. It outperforms other SOTA techniques on several graph classification tasks, by virtue of the new instrument.
Tasks	Graph Classification
Published	2019-05-01
URL	https://openreview.net/forum?id=Byl8BnRcYm
PDF	https://openreview.net/pdf?id=Byl8BnRcYm
PWC	https://paperswithcode.com/paper/capsule-graph-neural-network
Repo	https://github.com/benedekrozemberczki/CapsGNN
Framework	pytorch

A Persistent Weisfeiler–Lehman Procedure for Graph Classification


Title	A Persistent Weisfeiler–Lehman Procedure for Graph Classification
Authors	Bastian Rieck, Christian Bock, Karsten Borgwardt
Abstract	The Weisfeiler–Lehman graph kernel exhibits competitive performance in many graph classification tasks. However, its subtree features are not able to capture connected components and cycles, topological features known for characterising graphs. To extract such features, we leverage propagated node label information and transform unweighted graphs into metric ones. This permits us to augment the subtree features with topological information obtained using persistent homology, a concept from topological data analysis. Our method, which we formalise as a generalisation of Weisfeiler–Lehman subtree features, exhibits favourable classification accuracy and its improvements in predictive performance are mainly driven by including cycle information.
Tasks	Graph Classification, Topological Data Analysis
Published	2019-06-09
URL	http://proceedings.mlr.press/v97/rieck19a.html
PDF	http://proceedings.mlr.press/v97/rieck19a/rieck19a.pdf
PWC	https://paperswithcode.com/paper/a-persistent-weisfeilerlehman-procedure-for
Repo	https://github.com/BorgwardtLab/P-WL
Framework	none
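
The Weisfeiler-Lehman subtree features that the abstract above generalises can be sketched in a few lines. This is a minimal illustration of plain label refinement only; it does not include the persistent-homology step the paper adds, and the function names (`wl_subtree_features`, `wl_kernel`) and the dict-based graph representation are assumptions made for this example, not the P-WL repository's API.

```python
from collections import Counter


def wl_subtree_features(adjacency, labels, iterations=2):
    """Count node labels over several rounds of Weisfeiler-Lehman refinement.

    adjacency: dict mapping node -> list of neighbour nodes (undirected graph).
    labels: dict mapping node -> initial label (hashable and sortable).
    Returns a Counter keyed by (iteration, label).
    """
    current = dict(labels)
    features = Counter((0, lab) for lab in current.values())
    for h in range(1, iterations + 1):
        refined = {}
        for node, neighbours in adjacency.items():
            # New label = own label plus the sorted multiset of neighbour labels.
            neighbour_labels = tuple(sorted(current[n] for n in neighbours))
            refined[node] = (current[node], neighbour_labels)
        current = refined
        features.update((h, lab) for lab in current.values())
    return features


def wl_kernel(features_a, features_b):
    """Dot product of two label-count feature maps."""
    return sum(count * features_b[key] for key, count in features_a.items())


# Two small unlabelled graphs: a triangle and a path on three nodes.
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
path = {0: [1], 1: [0, 2], 2: [1]}
uniform = {0: "a", 1: "a", 2: "a"}

f_triangle = wl_subtree_features(triangle, uniform, iterations=1)
f_path = wl_subtree_features(path, uniform, iterations=1)
print(wl_kernel(f_triangle, f_path))
```

After one refinement round the triangle and the path already receive different label counts, because the end nodes of the path have a single neighbour.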

Graph Kernels Based on Linear Patterns: Theoretical and Experimental Comparisons


Title	Graph Kernels Based on Linear Patterns: Theoretical and Experimental Comparisons
Authors	Linlin Jia, Benoit Gaüzère, Paul Honeine
Abstract	Graph kernels are powerful tools to bridge the gap between machine learning and data encoded as graphs. Most graph kernels are based on the decomposition of graphs into a set of patterns. The similarity between two graphs is then deduced from the similarity between corresponding patterns. Kernels based on linear patterns constitute a good trade-off between accuracy and computational complexity. In this work, we propose a thorough investigation and comparison of graph kernels based on different linear patterns, namely walks and paths. First, all these kernels are explored in detail, including their mathematical foundations, structures of patterns and computational complexity. Then, experiments are performed on various benchmark datasets exhibiting different types of graphs, including labeled and unlabeled graphs, graphs with different numbers of vertices, graphs with different average vertex degrees, cyclic and acyclic graphs. Finally, for regression and classification tasks, performance and computational complexity of kernels are compared and analyzed, and suggestions are proposed to choose kernels according to the types of graph datasets. This work leads to a clear comparison of strengths and weaknesses of these kernels. An open-source Python library containing an implementation of all discussed kernels is publicly available on GitHub, with the aim of promoting and facilitating the use of graph kernels in machine learning problems.
Tasks	Graph Classification
Published	2019-03-01
URL	https://hal-normandie-univ.archives-ouvertes.fr/hal-02053946/
PDF	https://hal-normandie-univ.archives-ouvertes.fr/hal-02053946/document
PWC	https://paperswithcode.com/paper/graph-kernels-based-on-linear-patterns
Repo	https://github.com/jajupmochi/py-graph
Framework	none
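
As an illustration of a kernel built on walks, one of the linear patterns discussed in the abstract above, the following sketch counts common walks of two node-labelled graphs through their direct product graph. It is a simplified example under stated assumptions: the function name `walk_kernel`, the `decay` weighting, and the dict-based graph format are hypothetical and are not taken from the authors' library.

```python
def walk_kernel(adj_a, labels_a, adj_b, labels_b, max_length=3, decay=0.5):
    """Weighted count of common walks of length 0..max_length.

    adj_*: dict mapping node -> list of neighbours (undirected graph).
    labels_*: dict mapping node -> node label.
    A walk is "common" when the node labels match step by step.
    """
    # Nodes of the direct product graph: node pairs with matching labels.
    pairs = [(u, v) for u in adj_a for v in adj_b if labels_a[u] == labels_b[v]]
    pair_set = set(pairs)
    # counts[p] = number of common walks of the current length ending at p.
    counts = {p: 1 for p in pairs}
    total = float(sum(counts.values()))  # walks of length 0
    weight = 1.0
    for _ in range(max_length):
        weight *= decay
        new_counts = {}
        for (u, v) in pairs:
            # Extend every walk that ends at a neighbouring pair by one step.
            new_counts[(u, v)] = sum(
                counts[(x, y)]
                for x in adj_a[u]
                for y in adj_b[v]
                if (x, y) in pair_set
            )
        counts = new_counts
        total += weight * sum(counts.values())
    return total


edge = {0: [1], 1: [0]}
same = {0: "a", 1: "a"}
print(walk_kernel(edge, same, edge, same, max_length=2, decay=0.5))
```

The decay factor keeps long walks from dominating the sum; choosing it is a design decision the sketch leaves to the caller.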

Selective Sparse Sampling for Fine-Grained Image Recognition


Title	Selective Sparse Sampling for Fine-Grained Image Recognition
Authors	Yao Ding, Yanzhao Zhou, Yi Zhu, Qixiang Ye, Jianbin Jiao
Abstract	Fine-grained recognition poses the unique challenge of capturing subtle inter-class differences under considerable intra-class variances (e.g., beaks for bird species). Conventional approaches crop local regions and learn detailed representation from those regions, but suffer from the fixed number of parts and missing of surrounding context. In this paper, we propose a simple yet effective framework, called Selective Sparse Sampling, to capture diverse and fine-grained details. The framework is implemented using Convolutional Neural Networks, referred to as Selective Sparse Sampling Networks (S3Ns). With image-level supervision, S3Ns collect peaks, i.e., local maximums, from class response maps to estimate informative, receptive fields and learn a set of sparse attention for capturing fine-detailed visual evidence as well as preserving context. The evidence is selectively sampled to extract discriminative and complementary features, which significantly enrich the learned representation and guide the network to discover more subtle cues. Extensive experiments and ablation studies show that the proposed method consistently outperforms the state-of-the-art methods on challenging benchmarks including CUB-200-2011, FGVC-Aircraft, and Stanford Cars.
Tasks	Fine-Grained Image Classification, Fine-Grained Image Recognition
Published	2019-10-01
URL	http://openaccess.thecvf.com/content_ICCV_2019/html/Ding_Selective_Sparse_Sampling_for_Fine-Grained_Image_Recognition_ICCV_2019_paper.html
PDF	http://openaccess.thecvf.com/content_ICCV_2019/papers/Ding_Selective_Sparse_Sampling_for_Fine-Grained_Image_Recognition_ICCV_2019_paper.pdf
PWC	https://paperswithcode.com/paper/selective-sparse-sampling-for-fine-grained
Repo	https://github.com/Yao-DD/S3N
Framework	pytorch

Comparing distributions: $\ell_1$ geometry improves kernel two-sample testing


Title	Comparing distributions: $\ell_1$ geometry improves kernel two-sample testing
Authors	Meyer Scetbon, Gael Varoquaux
Abstract	Are two sets of observations drawn from the same distribution? This problem is a two-sample test. Kernel methods lead to many appealing properties. Indeed, state-of-the-art approaches use the $L^2$ distance between kernel-based distribution representatives to derive their test statistics. Here, we show that $L^p$ distances (with $p\geq 1$) between these distribution representatives give metrics on the space of distributions that are well-behaved to detect differences between distributions as they metrize the weak convergence. Moreover, for analytic kernels, we show that the $L^1$ geometry gives improved testing power for scalable computational procedures. Specifically, we derive a finite dimensional approximation of the metric given as the $\ell_1$ norm of a vector which captures differences of expectations of analytic functions evaluated at spatial locations or frequencies (i.e., features). The features can be chosen to maximize the differences of the distributions and give interpretable indications of how they differ. Using an $\ell_1$ norm gives better detection because differences between representatives are dense as we use analytic kernels (non-zero almost everywhere). The tests are consistent, while much faster than state-of-the-art quadratic-time kernel-based tests. Experiments on artificial and real-world problems demonstrate a better power/time tradeoff than the state of the art, based on $\ell_2$ norms, and in some cases, better outright power than even the most expensive quadratic-time tests. This performance gain is retained even in high dimensions.
Tasks
Published	2019-12-01
URL	http://papers.nips.cc/paper/9398-comparing-distributions-ell_1-geometry-improves-kernel-two-sample-testing
PDF	http://papers.nips.cc/paper/9398-comparing-distributions-ell_1-geometry-improves-kernel-two-sample-testing.pdf
PWC	https://paperswithcode.com/paper/comparing-distributions-ell_1-geometry-1
Repo	https://github.com/meyerscetbon/l1_two_sample_test
Framework	none
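
The finite-dimensional quantity described in the abstract above, an $\ell_1$ norm of differences of expectations evaluated at a few locations, can be sketched as follows. This is an unnormalised illustration only: the paper's actual test statistic, its normalisation, and its rejection threshold are not reproduced here, and the Gaussian kernel, the `bandwidth` parameter, and all function names are assumptions chosen for the example.

```python
import math


def gaussian_kernel(x, t, bandwidth=1.0):
    """Gaussian kernel between a sample x and a test location t (lists of floats)."""
    squared = sum((xi - ti) ** 2 for xi, ti in zip(x, t))
    return math.exp(-squared / (2.0 * bandwidth ** 2))


def mean_embedding_diff(xs, ys, locations, bandwidth=1.0):
    """Difference of empirical kernel means of xs and ys at each location."""
    diffs = []
    for t in locations:
        mean_x = sum(gaussian_kernel(x, t, bandwidth) for x in xs) / len(xs)
        mean_y = sum(gaussian_kernel(y, t, bandwidth) for y in ys) / len(ys)
        diffs.append(mean_x - mean_y)
    return diffs


def l1_statistic(xs, ys, locations, bandwidth=1.0):
    """l1 norm of the vector of mean-embedding differences."""
    return sum(abs(d) for d in mean_embedding_diff(xs, ys, locations, bandwidth))


def l2_statistic(xs, ys, locations, bandwidth=1.0):
    """l2 norm of the same vector, for comparison."""
    diffs = mean_embedding_diff(xs, ys, locations, bandwidth)
    return math.sqrt(sum(d * d for d in diffs))


xs = [[0.0], [0.1], [-0.1]]
ys = [[2.0], [2.1], [1.9]]
locations = [[0.0], [2.0]]
print(l1_statistic(xs, ys, locations), l2_statistic(xs, ys, locations))
```

Each pass over the data costs time linear in the sample size for a fixed number of locations, which is the source of the speed advantage over quadratic-time tests mentioned in the abstract.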

Destruction and Construction Learning for Fine-Grained Image Recognition


Title	Destruction and Construction Learning for Fine-Grained Image Recognition
Authors	Yue Chen, Yalong Bai, Wei Zhang, Tao Mei
Abstract	Delicate feature representation about object parts plays a critical role in fine-grained recognition. For example, experts can even distinguish fine-grained objects relying only on object parts according to professional knowledge. In this paper, we propose a novel “Destruction and Construction Learning” (DCL) method to enhance the difficulty of fine-grained recognition and exercise the classification model to acquire expert knowledge. Besides the standard classification backbone network, another “destruction and construction” stream is introduced to carefully “destruct” and then “reconstruct” the input image, for learning discriminative regions and features. More specifically, for “destruction”, we first partition the input image into local regions and then shuffle them by a Region Confusion Mechanism (RCM). To correctly recognize these destructed images, the classification network has to pay more attention to discriminative regions for spotting the differences. To compensate for the noise introduced by RCM, an adversarial loss, which distinguishes original images from destructed ones, is applied to reject noisy patterns introduced by RCM. For “construction”, a region alignment network, which tries to restore the original spatial layout of local regions, follows to model the semantic correlation among local regions. By jointly training with parameter sharing, our proposed DCL injects more discriminative local details to the classification network. Experimental results show that our proposed framework achieves state-of-the-art performance on three standard benchmarks. Moreover, our proposed method does not need any external knowledge during training, and there is no computation overhead at inference time except the standard classification network feed-forwarding. Source code: https://github.com/JDAI-CV/DCL.
Tasks	Fine-Grained Image Classification, Fine-Grained Image Recognition
Published	2019-06-01
URL	http://openaccess.thecvf.com/content_CVPR_2019/html/Chen_Destruction_and_Construction_Learning_for_Fine-Grained_Image_Recognition_CVPR_2019_paper.html
PDF	http://openaccess.thecvf.com/content_CVPR_2019/papers/Chen_Destruction_and_Construction_Learning_for_Fine-Grained_Image_Recognition_CVPR_2019_paper.pdf
PWC	https://paperswithcode.com/paper/destruction-and-construction-learning-for
Repo	https://github.com/JDAI-CV/DCL
Framework	pytorch
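
The "destruction" step in the abstract above, partitioning an image into local regions and shuffling them, can be illustrated with a small sketch. It uses a uniform random permutation of regions, which is an assumption: the actual Region Confusion Mechanism may constrain how far a region can move. The function name `shuffle_regions` and the list-of-rows image format are hypothetical.

```python
import random


def shuffle_regions(image, n, rng=None):
    """Split an image into n x n regions and shuffle them.

    image: list of rows (height x width); both sizes must be divisible by n.
    Returns (shuffled_image, permutation), where output slot k holds the
    original region permutation[k] (regions numbered in row-major order).
    """
    rng = rng or random.Random()
    height, width = len(image), len(image[0])
    if height % n or width % n:
        raise ValueError("image size must be divisible by n")
    region_h, region_w = height // n, width // n

    # Cut the image into n * n regions in row-major order.
    regions = []
    for r in range(n):
        for c in range(n):
            rows = image[r * region_h:(r + 1) * region_h]
            regions.append([row[c * region_w:(c + 1) * region_w] for row in rows])

    permutation = list(range(n * n))
    rng.shuffle(permutation)

    # Reassemble the image from the permuted regions.
    shuffled = []
    for r in range(n):
        for y in range(region_h):
            row = []
            for c in range(n):
                row.extend(regions[permutation[r * n + c]][y])
            shuffled.append(row)
    return shuffled, permutation
```

Returning the permutation alongside the image is a design choice for the sketch: a region alignment objective of the kind the abstract describes would need to know where each region came from.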

Reproduction study using public data of: Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs


Title	Reproduction study using public data of: Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs
Authors	Mike Voets, Kajsa Møllersen, Lars Ailo Bongo
Abstract	We have attempted to reproduce the results in Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs, published in JAMA 2016; 316(22), using publicly available data sets. We re-implemented the main method in the original study since the source code is not available. The original study used non-public fundus images from EyePACS and three hospitals in India for training. We used a different EyePACS data set from Kaggle. The original study used the benchmark data set Messidor-2 to evaluate the algorithm’s performance. We used another distribution of the Messidor-2 data set, since the original data set is no longer available. In the original study, ophthalmologists re-graded all images for diabetic retinopathy, macular edema, and image gradability. We have one diabetic retinopathy grade per image for our data sets, and we assessed image gradability ourselves. We were not able to reproduce the original study’s results with publicly available data. Our algorithm’s area under the receiver operating characteristic curve (AUC) of 0.951 (95% CI, 0.947-0.956) on the Kaggle EyePACS test set and 0.853 (95% CI, 0.835-0.871) on Messidor-2 did not come close to the reported AUC of 0.99 on both test sets in the original study. This may be caused by the use of a single grade per image, or different data. This study shows the challenges of reproducing deep learning method results, and the need for more replication and reproduction studies to validate deep learning methods, especially for medical image analysis. Our source code and instructions are available at: https://github.com/mikevoets/jama16-retina-replication.
Tasks
Published	2019-06-06
URL	https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0217541
PDF	https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0217541&type=printable
PWC	https://paperswithcode.com/paper/reproduction-study-using-public-data-of
Repo	https://github.com/mikevoets/jama16-retina-replication
Framework	tf

Transferable Normalization: Towards Improving Transferability of Deep Neural Networks


Title	Transferable Normalization: Towards Improving Transferability of Deep Neural Networks
Authors	Ximei Wang, Ying Jin, Mingsheng Long, Jianmin Wang, Michael I. Jordan
Abstract	Deep neural networks (DNNs) excel at learning representations when trained on large-scale datasets. Pre-trained DNNs also show strong transferability when fine-tuned to other labeled datasets. However, such transferability becomes weak when the target dataset is fully unlabeled as in Unsupervised Domain Adaptation (UDA). We envision that the loss of transferability may stem from the intrinsic limitation of the architecture design of DNNs. In this paper, we delve into the components of DNN architectures and propose Transferable Normalization (TransNorm) in place of existing normalization techniques. TransNorm is an end-to-end trainable layer to make DNNs more transferable across domains. As a general method, TransNorm can be easily applied to various deep neural networks and domain adaption methods, without introducing any extra hyper-parameters or learnable parameters. Empirical results justify that TransNorm not only improves classification accuracies but also accelerates convergence for mainstream DNN-based domain adaptation methods.
Tasks	Domain Adaptation, Unsupervised Domain Adaptation
Published	2019-12-01
URL	http://papers.nips.cc/paper/8470-transferable-normalization-towards-improving-transferability-of-deep-neural-networks
PDF	http://papers.nips.cc/paper/8470-transferable-normalization-towards-improving-transferability-of-deep-neural-networks.pdf
PWC	https://paperswithcode.com/paper/transferable-normalization-towards-improving
Repo	https://github.com/thuml/TransNorm
Framework	pytorch

ReacNetGenerator: an Automatic Reaction Network Generator for Reactive Molecular Dynamic Simulations


Title	ReacNetGenerator: an Automatic Reaction Network Generator for Reactive Molecular Dynamic Simulations
Authors	Jinzhe Zeng, Liqun Cao, Chih-Hao Chin, Haisheng Ren, John Z.H. Zhang, Tong Zhu
Abstract	Reactive molecular dynamics (MD) simulation makes it possible to study the reaction mechanisms of complex reaction systems at the atomic level. However, the analysis of the MD trajectories which contain thousands of species and reaction pathways has become a major obstacle to the application of reactive MD simulation in large-scale systems. Here, we report the development and application of the Reaction Network Generator (ReacNetGenerator) method. It can automatically extract the reaction network from the reaction trajectory without any predefined reaction coordinates and elementary reaction steps. Molecular species can be automatically identified from the cartesian coordinates of atoms and the hidden Markov model is used to filter the trajectory noises which makes the analysis process easier and more accurate. The ReacNetGenerator has been successfully used to analyze the reactive MD trajectories of the combustion of methane and 4-component surrogate fuel for rocket propellant 3 (RP-3), and it has great advantages in efficiency and accuracy compared to traditional manual analysis.
Tasks
Published	2019-11-26
URL	https://doi.org/10.1039/C9CP05091D
PDF	https://doi.org/10.1039/C9CP05091D
PWC	https://paperswithcode.com/paper/reacnetgenerator-an-automatic-reaction
Repo	https://github.com/tongzhugroup/reacnetgenerator
Framework	none

Effective End-to-end Unsupervised Outlier Detection via Inlier Priority of Discriminative Network


Title	Effective End-to-end Unsupervised Outlier Detection via Inlier Priority of Discriminative Network
Authors	Siqi Wang, Yijie Zeng, Xinwang Liu, En Zhu, Jianping Yin, Chuanfu Xu, Marius Kloft
Abstract	Despite the wide success of deep neural networks (DNN), little progress has been made on end-to-end unsupervised outlier detection (UOD) from high dimensional data like raw images. In this paper, we propose a framework named E^3Outlier, which can perform UOD in a both effective and end-to-end manner: First, instead of the commonly-used autoencoders in previous end-to-end UOD methods, E^3Outlier for the first time leverages a discriminative DNN for better representation learning, by using surrogate supervision to create multiple pseudo classes from original unlabelled data. Next, unlike classic UOD that utilizes data characteristics like density or proximity, we exploit a novel property named inlier priority to enable end-to-end UOD by discriminative DNN. We demonstrate theoretically and empirically that the intrinsic class imbalance of inliers/outliers will make the network prioritize minimizing inliers’ loss when inliers/outliers are indiscriminately fed into the network for training, which enables us to differentiate outliers directly from DNN’s outputs. Finally, based on inlier priority, we propose the negative entropy based score as a simple and effective outlierness measure. Extensive evaluations show that E^3Outlier significantly advances UOD performance by up to 30% AUROC against state-of-the-art counterparts, especially on relatively difficult benchmarks.
Tasks	Outlier Detection, Representation Learning
Published	2019-12-01
URL	http://papers.nips.cc/paper/8830-effective-end-to-end-unsupervised-outlier-detection-via-inlier-priority-of-discriminative-network
PDF	http://papers.nips.cc/paper/8830-effective-end-to-end-unsupervised-outlier-detection-via-inlier-priority-of-discriminative-network.pdf
PWC	https://paperswithcode.com/paper/effective-end-to-end-unsupervised-outlier
Repo	https://github.com/demonzyj56/E3Outlier
Framework	pytorch
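
The negative-entropy score mentioned in the abstract above can be sketched from a classifier's output distribution. This is a minimal illustration, assuming the score is computed as the sum of p log p over softmax probabilities; any averaging over pseudo classes or transformations used by the authors is omitted, and the function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def negative_entropy(probs, eps=1e-12):
    """Sum of p * log(p); closer to 0 means a more confident prediction."""
    return sum(p * math.log(p + eps) for p in probs)


confident = negative_entropy(softmax([10.0, 0.0, 0.0]))
uncertain = negative_entropy(softmax([0.0, 0.0, 0.0]))
print(confident, uncertain)
```

Under the inlier-priority argument, the network is expected to fit inliers better, so a flatter output distribution (a lower negative entropy) would point towards an outlier. The sign convention here is an assumption of the sketch.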

Vector space explorations of literary language


Title	Vector space explorations of literary language
Authors	Andreas van Cranenburgh, Karina van Dalen-Oskam, Joris van Zundert
Abstract	Literary novels are said to distinguish themselves from other novels through conventions associated with literariness. We investigate the task of predicting the literariness of novels as perceived by readers, based on a large reader survey of contemporary Dutch novels. Previous research showed that ratings of literariness are predictable from texts to a substantial extent using machine learning, suggesting that it may be possible to explain the consensus among readers on which novels are literary as a consensus on the kind of writing style that characterizes literature. Although we have not yet collected human judgments to establish the influence of writing style directly (we use a survey with judgments based on the titles of novels), we can try to analyze the behavior of machine learning models on particular text fragments as a proxy for human judgments. In order to explore aspects of the texts associated with literariness, we divide the texts of the novels in chunks of 2–3 pages and create vector space representations using topic models (Latent Dirichlet Allocation) and neural document embeddings (Distributed Bag-of-Words Paragraph Vectors). We analyze the semantic complexity of the novels using distance measures, supporting the notion that literariness can be partly explained as a deviation from the norm. Furthermore, we build predictive models and identify specific keywords and stylistic markers related to literariness. While genre plays a role, we find that the greater part of factors affecting judgments of literariness are explicable in bag-of-words terms, even in short text fragments and among novels with higher literary ratings. The code and notebook used to produce the results in this paper are available at https://github.com/andreasvc/litvecspace.
Tasks	Topic Models
Published	2019-02-09
URL	https://doi.org/10.1007/s10579-018-09442-4
PDF	https://link.springer.com/content/pdf/10.1007%2Fs10579-018-09442-4.pdf
PWC	https://paperswithcode.com/paper/vector-space-explorations-of-literary
Repo	https://github.com/andreasvc/litvecspace
Framework	none

Domain Generalization by Solving Jigsaw Puzzles


Title	Domain Generalization by Solving Jigsaw Puzzles
Authors	Fabio M. Carlucci, Antonio D’Innocente, Silvia Bucci, Barbara Caputo, Tatiana Tommasi
Abstract	Human adaptability relies crucially on the ability to learn and merge knowledge both from supervised and unsupervised learning: the parents point out a few important concepts, but then the children fill in the gaps on their own. This is particularly effective, because supervised learning can never be exhaustive and thus learning autonomously allows one to discover invariances and regularities that help to generalize. In this paper we propose to apply a similar approach to the task of object recognition across domains: our model learns the semantic labels in a supervised fashion, and broadens its understanding of the data by learning from self-supervised signals how to solve a jigsaw puzzle on the same images. This secondary task helps the network to learn the concepts of spatial correlation while acting as a regularizer for the classification task. Multiple experiments on the PACS, VLCS, Office-Home and digits datasets confirm our intuition and show that this simple method outperforms previous domain generalization and adaptation solutions. An ablation study further illustrates the inner workings of our approach.
Tasks	Domain Generalization, Object Recognition
Published	2019-06-01
URL	http://openaccess.thecvf.com/content_CVPR_2019/html/Carlucci_Domain_Generalization_by_Solving_Jigsaw_Puzzles_CVPR_2019_paper.html
PDF	http://openaccess.thecvf.com/content_CVPR_2019/papers/Carlucci_Domain_Generalization_by_Solving_Jigsaw_Puzzles_CVPR_2019_paper.pdf
PWC	https://paperswithcode.com/paper/domain-generalization-by-solving-jigsaw-1
Repo	https://github.com/fmcarlucci/JigenDG
Framework	pytorch
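
The idea in the abstract above, a supervised classification objective regularised by a self-supervised jigsaw task, can be written as a weighted sum of two cross-entropy terms. This is a sketch under assumptions: the weight `jigsaw_weight` and its default value are placeholders rather than the paper's settings, and the function names are hypothetical.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy of a single example from raw logits."""
    top = max(logits)
    log_sum_exp = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum_exp - logits[target]


def joint_loss(class_logits, class_label, jigsaw_logits, permutation_index,
               jigsaw_weight=0.7):
    """Classification loss plus a weighted jigsaw (permutation) loss.

    jigsaw_logits scores each candidate permutation of the image tiles;
    permutation_index is the permutation that was actually applied.
    """
    supervised = cross_entropy(class_logits, class_label)
    self_supervised = cross_entropy(jigsaw_logits, permutation_index)
    return supervised + jigsaw_weight * self_supervised


print(joint_loss([2.0, 0.5, -1.0], 0, [0.1, 0.2, 1.5, 0.0], 2))
```

Setting the weight to zero recovers plain supervised training, which makes the regularising role of the jigsaw term easy to ablate.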

Boltzmann Exploration Expectation–Maximisation


Title	Boltzmann Exploration Expectation–Maximisation
Authors	Mathias Edman, Neil Dhir
Abstract	We present a general method for fitting finite mixture models (FMM). Learning in a mixture model consists of finding the most likely cluster assignment for each data-point, as well as finding the parameters of the clusters themselves. In many mixture models, this is difficult with current learning methods, where the most common approach is to employ monotone learning algorithms, e.g. the conventional expectation-maximisation algorithm. While effective, the success of any monotone algorithm is crucially dependent on good parameter initialisation, where a common choice is K-means initialisation, commonly employed for Gaussian mixture models. For other types of mixture models, the path to good initialisation parameters is often unclear and may require a problem-specific solution. To this end, we propose a general heuristic learning algorithm that utilises Boltzmann exploration to assign each observation to a specific base distribution within the mixture model, which we call Boltzmann exploration expectation-maximisation (BEEM). With BEEM, hard assignments allow straightforward parameter learning for each base distribution by conditioning only on its assigned observations. Consequently, it can be applied to mixtures of any base distribution where single component parameter learning is tractable. The stochastic learning procedure is able to escape local optima and is thus insensitive to parameter initialisation. We show competitive performance on a number of synthetic benchmark cases as well as on real-world datasets.
Tasks	Iris Segmentation
Published	2019-12-18
URL	https://arxiv.org/abs/1912.08869
PDF	https://arxiv.org/pdf/1912.08869.pdf
PWC	https://paperswithcode.com/paper/boltzmann-exploration-expectationmaximisation
Repo	https://github.com/kaminAI/beem
Framework	none
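
The procedure outlined in the abstract above, stochastic hard assignments followed by per-component fitting, can be sketched for a one-dimensional Gaussian mixture. Everything specific here is an assumption made for illustration: the geometric cooling schedule, the omission of mixing weights, the variance floor, and the function names are not taken from the paper or the `beem` repository.

```python
import math
import random


def gaussian_logpdf(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def boltzmann_assign(x, params, temperature, rng):
    """Sample a component with probability proportional to exp(loglik / T)."""
    scores = [gaussian_logpdf(x, m, v) / temperature for m, v in params]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    return rng.choices(range(len(params)), weights=weights)[0]


def fit_component(points, fallback, min_var=1e-3):
    """Maximum-likelihood mean and variance; keep old values if too few points."""
    if len(points) < 2:
        return fallback
    mean = sum(points) / len(points)
    var = sum((p - mean) ** 2 for p in points) / len(points)
    return (mean, max(var, min_var))


def beem_sketch(data, params, steps=50, t_start=5.0, t_end=0.1, seed=0):
    """Alternate Boltzmann hard assignments with per-component refits."""
    rng = random.Random(seed)
    params = list(params)
    for step in range(steps):
        # Geometric cooling from t_start to t_end (an assumed schedule).
        fraction = step / max(steps - 1, 1)
        temperature = t_start * (t_end / t_start) ** fraction
        assignments = [boltzmann_assign(x, params, temperature, rng) for x in data]
        params = [
            fit_component([x for x, a in zip(data, assignments) if a == k], params[k])
            for k in range(len(params))
        ]
    return params


data = [-0.2, -0.1, 0.0, 0.1, 0.2, 9.8, 9.9, 10.0, 10.1, 10.2]
print(beem_sketch(data, [(1.0, 1.0), (2.0, 1.0)]))
```

Because each component is refit only on its own assigned points, the same loop would work for any base distribution with a tractable single-component fit, which is the generality the abstract claims.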

Specializing Distributional Vectors of All Words for Lexical Entailment


Title	Specializing Distributional Vectors of All Words for Lexical Entailment
Authors	Aishwarya Kamath, Jonas Pfeiffer, Edoardo Maria Ponti, Goran Glavaš, Ivan Vulić
Abstract	Semantic specialization methods fine-tune distributional word vectors using lexical knowledge from external resources (e.g. WordNet) to accentuate a particular relation between words. However, such post-processing methods suffer from limited coverage as they affect only vectors of words seen in the external resources. We present the first post-processing method that specializes vectors of all vocabulary words, including those unseen in the resources, for the asymmetric relation of lexical entailment (LE) (i.e., hyponymy-hypernymy relation). Leveraging a partially LE-specialized distributional space, our POSTLE (i.e., post-specialization for LE) model learns an explicit global specialization function, allowing for specialization of vectors of unseen words, as well as word vectors from other languages via cross-lingual transfer. We capture the function as a deep feed-forward neural network: its objective re-scales vector norms to reflect the concept hierarchy while simultaneously attracting hyponymy-hypernymy pairs to better reflect semantic similarity. An extended model variant augments the basic architecture with an adversarial discriminator. We demonstrate the usefulness and versatility of POSTLE models with different input distributional spaces in different scenarios (monolingual LE and zero-shot cross-lingual LE transfer) and tasks (binary and graded LE). We report consistent gains over state-of-the-art LE-specialization methods, and successfully LE-specialize word vectors for languages without any external lexical knowledge.
Tasks	Cross-Lingual Transfer, Semantic Similarity, Semantic Textual Similarity
Published	2019-08-01
URL	https://www.aclweb.org/anthology/W19-4310/
PDF	https://www.aclweb.org/anthology/W19-4310
PWC	https://paperswithcode.com/paper/specializing-distributional-vectors-of-all
Repo	https://github.com/ashkamath/POSTLE
Framework	none