April 2, 2020

3811 words 18 mins read

Paper Group ANR 348

Unsupervised Multilingual Alignment using Wasserstein Barycenter. Learning Complexity of Simulated Annealing. Incorporating Expert Prior in Bayesian Optimisation via Space Warping. From Topic Networks to Distributed Cognitive Maps: Zipfian Topic Universes in the Area of Volunteered Geographic Information. Sequential Bayesian Experimental Design for …

Unsupervised Multilingual Alignment using Wasserstein Barycenter

Title Unsupervised Multilingual Alignment using Wasserstein Barycenter
Authors Xin Lian, Kshitij Jain, Jakub Truszkowski, Pascal Poupart, Yaoliang Yu
Abstract We study unsupervised multilingual alignment, the problem of finding word-to-word translations between multiple languages without using any parallel data. One popular strategy is to reduce multilingual alignment to the much-simplified bilingual setting by picking one of the input languages as a pivot language to transit through. However, it is well known that transiting through a poorly chosen pivot language (such as English) may severely degrade translation quality, since the assumed transitive relations among all pairs of languages may not be enforced in the training process. Instead of going through a rather arbitrarily chosen pivot language, we propose to use the Wasserstein barycenter as a more informative "mean" language: it encapsulates information from all languages and minimizes all pairwise transportation costs. We evaluate our method on standard benchmarks and demonstrate state-of-the-art performance.
Tasks
Published 2020-01-28
URL https://arxiv.org/abs/2002.00743v1
PDF https://arxiv.org/pdf/2002.00743v1.pdf
PWC https://paperswithcode.com/paper/unsupervised-multilingual-alignment-using
Repo
Framework
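
As a rough illustration of the barycenter idea, here is a minimal sketch using the POT library's free-support Wasserstein barycenter; the toy random matrices stand in for real embedding clouds, and the paper's actual alignment pipeline is more involved than this.

```python
# A minimal sketch (not the paper's implementation) of computing a
# Wasserstein barycenter of word-embedding clouds with the POT library.
import numpy as np
import ot  # pip install POT

rng = np.random.default_rng(0)

# Toy stand-ins for embedding matrices of three languages (n_words x dim);
# real inputs would be normalized fastText vectors, for example.
langs = [rng.normal(size=(200, 50)) for _ in range(3)]
weights = [np.full(X.shape[0], 1.0 / X.shape[0]) for X in langs]

# Free-support barycenter: 200 support points in the embedding space
# that minimize the sum of Wasserstein distances to all languages.
X_init = rng.normal(size=(200, 50))
bary = ot.lp.free_support_barycenter(langs, weights, X_init, numItermax=20)

# Each language can then be aligned to the barycenter (the "mean"
# language) instead of to an arbitrary pivot such as English.
print(bary.shape)  # (200, 50)
```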

Learning Complexity of Simulated Annealing

Title Learning Complexity of Simulated Annealing
Authors Avrim Blum, Chen Dan, Saeed Seddighin
Abstract Simulated annealing is an effective and general means of optimization. It is inspired by metallurgy, where the temperature of a material determines its thermodynamic behavior. Likewise, in simulated annealing, the actions that the algorithm takes depend entirely on the value of a variable which captures the notion of temperature. Typically, simulated annealing starts with a high temperature, which makes the algorithm quite unpredictable, and gradually cools the temperature down to become more stable. A key component that plays a crucial role in the performance of simulated annealing is the criterion under which the temperature changes, namely the cooling schedule. Motivated by this, we study the following question in this work: “Given enough samples of the instances of a specific class of optimization problems, can we design optimal (or approximately optimal) cooling schedules that minimize the runtime or maximize the success rate of the algorithm on average when the underlying problem is drawn uniformly at random from the same class?” We provide positive results both in terms of sample complexity and simulation complexity. For sample complexity, we show that $\tilde O(\sqrt{m})$ samples suffice to find an approximately optimal cooling schedule of length $m$. We complement this result by giving a lower bound of $\tilde \Omega(m^{1/3})$ on the sample complexity of any learning algorithm that provides an almost optimal cooling schedule. These results are general and rely on no assumptions. For simulation complexity, however, we make additional assumptions to measure the success rate of an algorithm. To this end, we introduce the monotone stationary graph, which models the performance of simulated annealing. Based on this model, we present polynomial-time algorithms with provable guarantees for the learning problem.
Tasks
Published 2020-03-06
URL https://arxiv.org/abs/2003.02981v1
PDF https://arxiv.org/pdf/2003.02981v1.pdf
PWC https://paperswithcode.com/paper/learning-complexity-of-simulated-annealing
Repo
Framework
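
For readers unfamiliar with the object being learned, here is a minimal simulated-annealing loop whose behavior is governed entirely by a temperature schedule; the geometric schedule below is one illustrative choice, not the paper's learned schedule.

```python
# A minimal sketch of simulated annealing controlled by a cooling schedule,
# the object the paper proposes to learn from sample problem instances.
import math
import random

def simulated_annealing(f, neighbor, x0, schedule):
    """Minimize f starting from x0; `schedule` is a list of temperatures
    (a cooling schedule of length m, in the paper's terminology)."""
    x, fx = x0, f(x0)
    for T in schedule:
        y = neighbor(x)
        fy = f(y)
        # Accept downhill moves always; uphill moves with prob e^(-delta/T).
        if fy <= fx or random.random() < math.exp(-(fy - fx) / T):
            x, fx = y, fy
    return x, fx

# Example: geometric cooling on a 1-D quadratic.
schedule = [10.0 * 0.95 ** t for t in range(500)]
x, fx = simulated_annealing(
    f=lambda x: (x - 3.0) ** 2,
    neighbor=lambda x: x + random.uniform(-0.5, 0.5),
    x0=0.0,
    schedule=schedule,
)
print(round(x, 2), round(fx, 4))
```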

Incorporating Expert Prior in Bayesian Optimisation via Space Warping

Title Incorporating Expert Prior in Bayesian Optimisation via Space Warping
Authors Anil Ramachandran, Sunil Gupta, Santu Rana, Cheng Li, Svetha Venkatesh
Abstract Bayesian optimisation is a well-known sample-efficient method for the optimisation of expensive black-box functions. However, when dealing with large search spaces, the algorithm passes through several low-function-value regions before reaching the optimum. Since function evaluations are expensive in terms of both money and time, it may be desirable to alleviate this problem. One approach to shortening this cold-start phase is to use prior knowledge that can accelerate the optimisation. In its standard form, Bayesian optimisation assumes that every point in the search space is equally likely to be the optimum. Therefore, any prior knowledge that provides information about the optimum of the function can improve optimisation performance. In this paper, we represent prior knowledge about the function optimum through a prior distribution, which is then used to warp the search space so that it expands around the high-probability region of the function optimum and shrinks around the low-probability region. We incorporate this prior directly into the function model (a Gaussian process) by redefining the kernel matrix, which makes the method acquisition-agnostic, i.e. it works with any acquisition function. We show the superiority of our method over standard Bayesian optimisation through the optimisation of several benchmark functions and the hyperparameter tuning of two algorithms: Support Vector Machines (SVM) and Random Forests.
Tasks Bayesian Optimisation
Published 2020-03-27
URL https://arxiv.org/abs/2003.12250v1
PDF https://arxiv.org/pdf/2003.12250v1.pdf
PWC https://paperswithcode.com/paper/incorporating-expert-prior-in-bayesian
Repo
Framework
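
One way to read the warping idea is to push inputs through the CDF of the prior over the optimum before applying a standard kernel, which stretches high-prior regions and compresses low-prior ones. A minimal sketch under that assumed reading (the paper's exact kernel construction may differ):

```python
# A minimal sketch of space warping: warp 1-D inputs through the CDF of a
# prior over the optimum's location, then apply an ordinary RBF kernel, so
# high-prior regions are expanded and low-prior regions shrink.
import numpy as np
from scipy.stats import norm

def warped_rbf(X1, X2, prior=norm(loc=0.5, scale=0.1), lengthscale=0.2):
    # The CDF changes fastest where prior density is high, stretching space there.
    U1, U2 = prior.cdf(X1), prior.cdf(X2)
    sq = (U1[:, None] - U2[None, :]) ** 2
    return np.exp(-0.5 * sq / lengthscale ** 2)

X = np.linspace(0, 1, 5)
K = warped_rbf(X, X)  # kernel matrix usable by any GP and acquisition function
print(np.round(K, 3))
```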

From Topic Networks to Distributed Cognitive Maps: Zipfian Topic Universes in the Area of Volunteered Geographic Information

Title From Topic Networks to Distributed Cognitive Maps: Zipfian Topic Universes in the Area of Volunteered Geographic Information
Authors Alexander Mehler, Rüdiger Gleim, Regina Gaitsch, Wahed Hemati, Tolga Uslu
Abstract Are nearby places (e.g. cities) described by related words? In this article we transfer this research question from the field of lexical encoding of geographic information onto the level of intertextuality. To this end, we explore Volunteered Geographic Information (VGI) to model texts addressing places at the level of cities or regions with the help of so-called topic networks, in order to examine how language encodes and networks geographic information on the aboutness level of texts. Our hypothesis is that the networked thematizations of places are similar, regardless of their distances and the underlying communities of authors. To investigate this, we introduce Multiplex Topic Networks (MTN), which we automatically derive from Linguistic Multilayer Networks (LMN) as a novel model, especially of thematic networking in text corpora. Our study shows a Zipfian organization of the thematic universe in which geographical places (especially cities) are located in online communication. We interpret this finding in the context of cognitive maps, a notion we extend by so-called thematic maps. According to our interpretation, the organization of thematic maps as part of cognitive maps results from a tendency of authors to generate shareable content that ensures the continued existence of the underlying media. We test our hypothesis using special wikis and extracts of Wikipedia, and come to the conclusion that places, whether close to each other or not, span similar subnetworks in the topic universe.
Tasks
Published 2020-02-04
URL https://arxiv.org/abs/2002.01454v1
PDF https://arxiv.org/pdf/2002.01454v1.pdf
PWC https://paperswithcode.com/paper/from-topic-networks-to-distributed-cognitive
Repo
Framework
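
The Zipfian claim can be checked with a standard rank-frequency fit; a minimal sketch with toy topic counts (the paper's actual analysis operates on full multiplex topic networks):

```python
# A minimal sketch of checking for a Zipfian organization: fit the
# rank-frequency relation of topic occurrences on a log-log scale.
import numpy as np

def zipf_exponent(frequencies):
    """Estimate s in freq(rank) ~ rank^(-s) by least squares in log space."""
    freqs = np.sort(np.asarray(frequencies, dtype=float))[::-1]
    ranks = np.arange(1, len(freqs) + 1)
    slope, _ = np.polyfit(np.log(ranks), np.log(freqs), 1)
    return -slope

# Toy topic-frequency counts; a value near 1 would be classically Zipfian.
counts = [1000, 480, 330, 260, 190, 160, 140, 120, 110, 100]
print(round(zipf_exponent(counts), 2))
```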

Sequential Bayesian Experimental Design for Implicit Models via Mutual Information

Title Sequential Bayesian Experimental Design for Implicit Models via Mutual Information
Authors Steven Kleinegesse, Christopher Drovandi, Michael U. Gutmann
Abstract Bayesian experimental design (BED) is a framework that uses statistical models and decision making under uncertainty to optimise the cost and performance of a scientific experiment. Sequential BED, as opposed to static BED, considers the scenario where we can sequentially update our beliefs about the model parameters through data gathered in the experiment. A class of models of particular interest for the natural and medical sciences are implicit models, where the data generating distribution is intractable, but sampling from it is possible. Even though there has been a lot of work on static BED for implicit models in the past few years, the notoriously difficult problem of sequential BED for implicit models has barely been touched upon. We address this gap in the literature by devising a novel sequential design framework for parameter estimation that uses the Mutual Information (MI) between model parameters and simulated data as a utility function to find optimal experimental designs, which has not been done before for implicit models. Our approach uses likelihood-free inference by ratio estimation to simultaneously estimate posterior distributions and the MI. During the sequential BED procedure we utilise Bayesian optimisation to help us optimise the MI utility. We find that our framework is efficient for the various implicit models tested, yielding accurate parameter estimates after only a few iterations.
Tasks Bayesian Optimisation, Decision Making, Decision Making Under Uncertainty
Published 2020-03-20
URL https://arxiv.org/abs/2003.09379v1
PDF https://arxiv.org/pdf/2003.09379v1.pdf
PWC https://paperswithcode.com/paper/sequential-bayesian-experimental-design-for
Repo
Framework
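
A minimal sketch of the ratio-estimation ingredient: train a classifier to distinguish joint (parameter, data) samples from shuffled ones; its logits then estimate the mutual information of a candidate design. This is illustrative of the general technique, not the authors' LFIRE implementation:

```python
# A minimal sketch of MI estimation by density-ratio (classifier) methods.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def mi_ratio_estimate(theta, y):
    """Estimate I(theta; y) via a classifier between joint and shuffled pairs."""
    joint = np.hstack([theta, y])
    marg = np.hstack([theta, y[rng.permutation(len(y))]])  # break dependence
    X = np.vstack([joint, marg])
    labels = np.r_[np.ones(len(joint)), np.zeros(len(marg))]
    clf = LogisticRegression(max_iter=1000).fit(X, labels)
    # The classifier logit approximates log p(joint)/p(marginal);
    # averaging it over joint samples estimates the MI in nats.
    return clf.decision_function(joint).mean()

# Toy implicit model: y = d*theta + noise, simulated at design d = 2.0.
theta = rng.normal(size=(2000, 1))
y = 2.0 * theta + 0.5 * rng.normal(size=(2000, 1))
print(round(mi_ratio_estimate(theta, y), 2))  # larger MI = better design
```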

Single Unit Status in Deep Convolutional Neural Network Codes for Face Identification: Sparseness Redefined

Title Single Unit Status in Deep Convolutional Neural Network Codes for Face Identification: Sparseness Redefined
Authors Connor J. Parde, Y. Ivette Colón, Matthew Q. Hill, Carlos D. Castillo, Prithviraj Dhar, Alice J. O’Toole
Abstract Deep convolutional neural networks (DCNNs) trained for face identification develop representations that generalize over variable images, while retaining subject (e.g., gender) and image (e.g., viewpoint) information. Identity, gender, and viewpoint codes were studied at the “neural unit” and ensemble levels of a face-identification network. At the unit level, identification, gender classification, and viewpoint estimation were measured by deleting units to create variably-sized, randomly-sampled subspaces at the top network layer. Identification of 3,531 identities remained high (area under the ROC approximately 1.0) as dimensionality decreased from 512 units to 16 (0.95), 4 (0.80), and 2 (0.72) units. Individual identities separated statistically on every top-layer unit. Cross-unit responses were minimally correlated, indicating that units code non-redundant identity cues. This “distributed” code requires only a sparse, random sample of units to identify faces accurately. Gender classification declined gradually and viewpoint estimation fell steeply as dimensionality decreased. Individual units were weakly predictive of gender and viewpoint, but ensembles proved effective predictors. Therefore, distributed and sparse codes co-exist in the network units to represent different face attributes. At the ensemble level, principal component analysis of face representations showed that identity, gender, and viewpoint information separated into high-dimensional subspaces, ordered by explained variance. Identity, gender, and viewpoint information contributed to all individual unit responses, undercutting a neural tuning analogy for face attributes. Interpretation of neural-like codes from DCNNs, and by analogy, high-level visual codes, cannot be inferred from single unit responses. Instead, “meaning” is encoded by directions in the high-dimensional space.
Tasks Face Identification, Viewpoint Estimation
Published 2020-02-14
URL https://arxiv.org/abs/2002.06274v2
PDF https://arxiv.org/pdf/2002.06274v2.pdf
PWC https://paperswithcode.com/paper/single-unit-status-in-deep-convolutional
Repo
Framework
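
The unit-deletion analysis amounts to scoring face verification from randomly sampled subsets of top-layer units; a minimal sketch with synthetic embeddings standing in for the network's 512-unit face codes:

```python
# A minimal sketch of the random-subspace analysis: measure verification
# AUC from randomly sampled subsets of top-layer units.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def subspace_auc(embeddings, ids, k, n_pairs=2000):
    """AUC for same/different-identity pairs using only k random units."""
    units = rng.choice(embeddings.shape[1], size=k, replace=False)
    E = embeddings[:, units]
    E = E / np.linalg.norm(E, axis=1, keepdims=True)
    i = rng.integers(0, len(E), n_pairs)
    j = rng.integers(0, len(E), n_pairs)
    scores = np.sum(E[i] * E[j], axis=1)       # cosine similarity
    labels = (ids[i] == ids[j]).astype(int)    # 1 = same identity
    return roc_auc_score(labels, scores)

# Toy embeddings: 512-d vectors loosely clustered by identity.
ids = np.repeat(np.arange(100), 10)
emb = rng.normal(size=(100, 512))[ids] + 0.8 * rng.normal(size=(1000, 512))
for k in (512, 16, 4, 2):
    print(k, round(subspace_auc(emb, ids, k), 2))
```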

Split-BOLFI for misspecification-robust likelihood-free inference in high dimensions

Title Split-BOLFI for misspecification-robust likelihood-free inference in high dimensions
Authors Owen Thomas, Henri Pesonen, Raquel Sá-Leão, Hermínia de Lencastre, Samuel Kaski, Jukka Corander
Abstract Likelihood-free inference for simulator-based statistical models has recently grown rapidly from its infancy to a useful tool for practitioners. However, models with more than a very small number of parameters as the target of inference have remained an enigma, in particular for the approximate Bayesian computation (ABC) community. To advance the possibilities for performing likelihood-free inference in high-dimensional parameter spaces, we introduce an extension of the popular Bayesian optimisation-based approach that approximates discrepancy functions in a probabilistic manner, which lends itself to efficient exploration of the parameter space. Our method achieves computational scalability by using separate acquisition procedures for the discrepancies defined for different parameters. These efficient high-dimensional simulation acquisitions are combined with exponentiated loss-likelihoods to provide a misspecification-robust characterisation of the marginal posterior distribution for all model parameters. The method successfully performs computationally efficient inference in a 100-dimensional space on canonical examples and compares favourably to existing Copula-ABC methods. We further illustrate the potential of this approach by fitting a bacterial transmission dynamics model to daycare centre data, which provides biologically coherent results on strain competition in a 30-dimensional parameter space.
Tasks Bayesian Optimisation, Efficient Exploration
Published 2020-02-21
URL https://arxiv.org/abs/2002.09377v1
PDF https://arxiv.org/pdf/2002.09377v1.pdf
PWC https://paperswithcode.com/paper/split-bolfi-for-for-misspecification-robust
Repo
Framework
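
A minimal sketch of the exponentiated loss-likelihood ingredient: a per-parameter discrepancy is turned into an unnormalized marginal posterior. The toy quadratic discrepancy and temperature are assumptions for illustration:

```python
# A minimal sketch of an exponentiated loss-likelihood: turn a per-parameter
# discrepancy into an (unnormalized) marginal posterior, illustrative of the
# Split-BOLFI recipe of separately acquired discrepancies.
import numpy as np

def marginal_posterior(theta_grid, discrepancy, prior_pdf, temperature=1.0):
    """Posterior proportional to prior(t) * exp(-discrepancy(t)/temperature)."""
    post = prior_pdf(theta_grid) * np.exp(-discrepancy(theta_grid) / temperature)
    return post / np.trapz(post, theta_grid)

grid = np.linspace(-3, 3, 601)
post = marginal_posterior(
    grid,
    discrepancy=lambda t: (t - 1.0) ** 2,       # toy distance to observed data
    prior_pdf=lambda t: np.exp(-0.5 * t ** 2),  # standard normal prior
)
print(round(grid[np.argmax(post)], 2))  # posterior mode near 0.67
```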

Global Attention based Graph Convolutional Neural Networks for Improved Materials Property Prediction

Title Global Attention based Graph Convolutional Neural Networks for Improved Materials Property Prediction
Authors Steph-Yves Louis, Yong Zhao, Alireza Nasiri, Xiran Wong, Yuqi Song, Fei Liu, Jianjun Hu
Abstract Machine learning (ML) methods have gained increasing popularity in exploring and developing new materials. More specifically, graph neural networks (GNNs) have been applied to predicting material properties. In this work, we develop a novel model, GATGNN, for predicting inorganic material properties based on graph neural networks composed of multiple graph-attention (GAT) layers and a global attention layer. Through the application of the GAT layers, our model can efficiently learn the complex bonds shared among the atoms within each atom’s local neighborhood. Subsequently, the global attention layer provides the weight coefficient of each atom in the inorganic crystal material, which considerably improves our model’s performance. Notably, we show that our method both outperforms previous models’ predictions and provides insight into the crystallization of the material.
Tasks
Published 2020-03-11
URL https://arxiv.org/abs/2003.13379v1
PDF https://arxiv.org/pdf/2003.13379v1.pdf
PWC https://paperswithcode.com/paper/global-attention-based-graph-convolutional
Repo
Framework
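
A minimal sketch of what a global attention layer over atom features can look like: one learned score per atom, softmax-normalized into weight coefficients, then a weighted sum into a crystal-level representation. This is an illustrative reading of the idea, not the authors' architecture:

```python
# A minimal sketch of global attention pooling over atom embeddings.
import torch
import torch.nn as nn

class GlobalAttentionPool(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)  # one attention score per atom

    def forward(self, atom_feats):                        # (n_atoms, dim)
        w = torch.softmax(self.score(atom_feats), dim=0)  # atom weight coefficients
        return (w * atom_feats).sum(dim=0)                # crystal-level vector

atoms = torch.randn(12, 64)   # 12 atoms with 64-d features from GAT layers
pooled = GlobalAttentionPool(64)(atoms)
print(pooled.shape)           # torch.Size([64])
```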

Multi-SimLex: A Large-Scale Evaluation of Multilingual and Cross-Lingual Lexical Semantic Similarity

Title Multi-SimLex: A Large-Scale Evaluation of Multilingual and Cross-Lingual Lexical Semantic Similarity
Authors Ivan Vulić, Simon Baker, Edoardo Maria Ponti, Ulla Petti, Ira Leviant, Kelly Wing, Olga Majewska, Eden Bar, Matt Malone, Thierry Poibeau, Roi Reichart, Anna Korhonen
Abstract We introduce Multi-SimLex, a large-scale lexical resource and evaluation benchmark covering datasets for 12 typologically diverse languages, including major languages (e.g., Mandarin Chinese, Spanish, Russian) as well as less-resourced ones (e.g., Welsh, Kiswahili). Each language dataset is annotated for the lexical relation of semantic similarity and contains 1,888 semantically aligned concept pairs, providing a representative coverage of word classes (nouns, verbs, adjectives, adverbs), frequency ranks, similarity intervals, lexical fields, and concreteness levels. Additionally, owing to the alignment of concepts across languages, we provide a suite of 66 cross-lingual semantic similarity datasets. Due to its extensive size and language coverage, Multi-SimLex provides entirely novel opportunities for experimental evaluation and analysis. On its monolingual and cross-lingual benchmarks, we evaluate and analyze a wide array of recent state-of-the-art monolingual and cross-lingual representation models, including static and contextualized word embeddings (such as fastText, M-BERT and XLM), externally informed lexical representations, as well as fully unsupervised and (weakly) supervised cross-lingual word embeddings. We also present a step-by-step protocol for creating consistent, Multi-SimLex-style resources for additional languages. We make these contributions – the public release of Multi-SimLex datasets, their creation protocol, strong baseline results, and in-depth analyses which can be helpful in guiding future developments in multilingual lexical semantics and representation learning – available via a website which will encourage community effort in further expansion of Multi-SimLex to many more languages. Such a large-scale semantic resource could inspire significant further advances in NLP across languages.
Tasks Representation Learning, Semantic Similarity, Semantic Textual Similarity, Word Embeddings
Published 2020-03-10
URL https://arxiv.org/abs/2003.04866v1
PDF https://arxiv.org/pdf/2003.04866v1.pdf
PWC https://paperswithcode.com/paper/multi-simlex-a-large-scale-evaluation-of
Repo
Framework
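
Benchmarks of this kind are typically scored by the Spearman correlation between model similarities and human ratings; a minimal sketch of such an evaluation loop, with random vectors standing in for real embeddings:

```python
# A minimal sketch of a Multi-SimLex-style evaluation: Spearman correlation
# between model cosine similarities and human similarity scores.
import numpy as np
from scipy.stats import spearmanr

def evaluate(pairs, human_scores, embed):
    """`embed` maps a word to a vector; returns Spearman's rho."""
    sims = []
    for w1, w2 in pairs:
        v1, v2 = embed(w1), embed(w2)
        sims.append(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))
    return spearmanr(sims, human_scores).correlation

# Toy usage with random vectors standing in for fastText/M-BERT embeddings.
rng = np.random.default_rng(0)
vecs = {w: rng.normal(size=300) for w in ["cat", "dog", "car", "automobile"]}
pairs = [("cat", "dog"), ("car", "automobile"), ("cat", "car")]
print(evaluate(pairs, [7.5, 9.2, 1.1], vecs.__getitem__))
```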

Adaptive Personalized Federated Learning

Title Adaptive Personalized Federated Learning
Authors Yuyang Deng, Mohammad Mahdi Kamani, Mehrdad Mahdavi
Abstract Investigation of the degree of personalization in federated learning algorithms has shown that only maximizing the performance of the global model will confine the capacity of the local models to personalize. In this paper, we advocate an adaptive personalized federated learning (APFL) algorithm, where each client trains its local model while contributing to the global model. Theoretically, we show that the mixture of local and global models can reduce the generalization error, using multi-domain learning theory. We also propose a communication-reduced bilevel optimization method, which reduces the communication rounds to $O(\sqrt{T})$, and we show that, under strong convexity and smoothness assumptions, the proposed algorithm can achieve a convergence rate of $O(1/T)$ with some residual error. The residual error is related to the gradient diversity among local models and the gap between the optimal local and global models.
Tasks bilevel optimization
Published 2020-03-30
URL https://arxiv.org/abs/2003.13461v1
PDF https://arxiv.org/pdf/2003.13461v1.pdf
PWC https://paperswithcode.com/paper/adaptive-personalized-federated-learning
Repo
Framework
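
A minimal sketch of the personalization mechanism as described: each client's predictor is an $\alpha$-mixture of its local model and the global model, and local updates differentiate through the mixture. The quadratic toy losses and step sizes are assumptions, a drastic simplification of the paper's bilevel procedure:

```python
# A minimal sketch of the APFL mixture idea.
import numpy as np

def client_update(w_global, v_local, grad_fn, alpha=0.5, lr=0.1, steps=10):
    """One client's round: update the local model through the mixed predictor."""
    for _ in range(steps):
        v_bar = alpha * v_local + (1 - alpha) * w_global  # personalized model
        v_local = v_local - lr * alpha * grad_fn(v_bar)   # chain rule on the mix
    return v_local

# Toy quadratic losses with different client optima (data heterogeneity).
grad = lambda opt: (lambda w: 2 * (w - opt))
w_global = np.zeros(2)
v = client_update(w_global, np.zeros(2), grad(np.array([1.0, -1.0])))
print(np.round(v, 2))
```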

Language-Independent Tokenisation Rivals Language-Specific Tokenisation for Word Similarity Prediction

Title Language-Independent Tokenisation Rivals Language-Specific Tokenisation for Word Similarity Prediction
Authors Danushka Bollegala, Ryuichi Kiryo, Kosuke Tsujino, Haruki Yukawa
Abstract Language-independent tokenisation (LIT) methods that do not require labelled language resources or lexicons have recently gained popularity because of their applicability in resource-poor languages. Moreover, they compactly represent a language using a fixed-size vocabulary and can efficiently handle unseen or rare words. On the other hand, language-specific tokenisation (LST) methods have a long and established history, and are developed using carefully created lexicons and training resources. Unlike subtokens produced by LIT methods, LST methods produce valid morphological subwords. Despite the contrasting trade-offs between LIT and LST methods, their performance on downstream NLP tasks remains unclear. In this paper, we empirically compare the two approaches using semantic similarity measurement as an evaluation task across a diverse set of languages. Our experimental results covering eight languages show that LST consistently outperforms LIT when the vocabulary size is large, but LIT can produce comparable or better results than LST in many languages with comparatively smaller (i.e. less than 100K words) vocabulary sizes, encouraging the use of LIT when language-specific resources are unavailable, incomplete, or a smaller model is required. Moreover, we find smoothed inverse frequency (SIF) to be an accurate method for creating word embeddings from subword embeddings for multilingual semantic similarity prediction tasks. Further analysis of the nearest neighbours of tokens shows that semantically and syntactically related tokens are closely embedded in subword embedding spaces.
Tasks Semantic Similarity, Semantic Textual Similarity, Word Embeddings
Published 2020-02-25
URL https://arxiv.org/abs/2002.11004v1
PDF https://arxiv.org/pdf/2002.11004v1.pdf
PWC https://paperswithcode.com/paper/language-independent-tokenisation-rivals
Repo
Framework
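
A minimal sketch of SIF-style composition of a word vector from subword vectors, weighting each subword by a/(a + p(subword)); the constant a = 1e-3 is a conventional choice, and the full SIF method additionally removes a common principal component:

```python
# A minimal sketch of SIF weighting over subword embeddings.
import numpy as np

def sif_word_vector(subwords, subword_vecs, subword_prob, a=1e-3):
    """Weighted average of subword embeddings, down-weighting frequent ones."""
    vecs = np.array([subword_vecs[s] for s in subwords])
    w = np.array([a / (a + subword_prob[s]) for s in subwords])
    return (w[:, None] * vecs).sum(axis=0) / w.sum()

# Toy subword inventory for a segmentation like "un ##believ ##able".
rng = np.random.default_rng(0)
V = {s: rng.normal(size=50) for s in ["un", "believ", "able"]}
p = {"un": 0.02, "believ": 0.0005, "able": 0.01}  # corpus frequencies
print(sif_word_vector(["un", "believ", "able"], V, p).shape)  # (50,)
```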

User-in-the-loop Adaptive Intent Detection for Instructable Digital Assistant

Title User-in-the-loop Adaptive Intent Detection for Instructable Digital Assistant
Authors Nicolas Lair, Clément Delgrange, David Mugisha, Jean-Michel Dussoux, Pierre-Yves Oudeyer, Peter Ford Dominey
Abstract People are becoming increasingly comfortable using Digital Assistants (DAs) to interact with services or connected objects. However, for non-programming users, the available possibilities for customizing their DA are limited and do not include the possibility of teaching the assistant new tasks. To make the most of the potential of DAs, users should be able to customize assistants by instructing them through Natural Language (NL). To provide such functionality, NL interpretation in traditional assistants should be improved: (1) The intent identification system should be able to recognize new forms of known intents, and to acquire new intents as they are expressed by the user. (2) In order to be adaptive to novel intents, the Natural Language Understanding module should be sample-efficient, and should not rely on a pretrained model. Rather, the system should continuously collect training data as it learns new intents from the user. In this work, we propose AidMe (Adaptive Intent Detection in Multi-Domain Environments), a user-in-the-loop adaptive intent detection framework that allows the assistant to adapt to its user by learning the user’s intents as their interaction progresses. AidMe builds its repertoire of intents and collects data to train a model of semantic similarity evaluation that can discriminate between the learned intents and autonomously discover new forms of known intents. AidMe addresses two major issues, intent learning and user adaptation, for instructable digital assistants. We demonstrate the capabilities of AidMe as a standalone system by comparing it with a one-shot learning system and a pretrained NLU module through simulations of interactions with a user. We also show how AidMe can smoothly integrate into an existing instructable digital assistant.
Tasks Intent Detection, One-Shot Learning, Semantic Similarity, Semantic Textual Similarity
Published 2020-01-16
URL https://arxiv.org/abs/2001.06007v1
PDF https://arxiv.org/pdf/2001.06007v1.pdf
PWC https://paperswithcode.com/paper/user-in-the-loop-adaptive-intent-detection
Repo
Framework
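
A minimal sketch of similarity-based intent detection with a confidence threshold, where an unmatched utterance (returned as None) would trigger the user-in-the-loop teaching step; the class, method names, and threshold are hypothetical, not AidMe's API:

```python
# A minimal sketch of threshold-based intent matching with new-intent discovery.
import numpy as np

class IntentMatcher:
    def __init__(self, embed, threshold=0.7):
        self.embed, self.threshold = embed, threshold
        self.intents = {}  # intent name -> list of utterance vectors

    def teach(self, intent, utterance):
        self.intents.setdefault(intent, []).append(self.embed(utterance))

    def detect(self, utterance):
        """Return the best matching intent, or None to ask the user."""
        v = self.embed(utterance)
        best, best_sim = None, -1.0
        for intent, vecs in self.intents.items():
            sim = max(float(v @ u) for u in vecs)
            if sim > best_sim:
                best, best_sim = intent, sim
        return best if best_sim >= self.threshold else None  # None = unknown

# Toy usage: random vectors stand in for a learned similarity model, so the
# query likely comes back None, triggering a teaching interaction.
rng = np.random.default_rng(0)
emb = {u: rng.normal(size=8) for u in ["turn on the light", "lights on"]}
m = IntentMatcher(lambda u: emb[u] / np.linalg.norm(emb[u]))
m.teach("lights_on", "turn on the light")
print(m.detect("lights on"))
```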

Generating Interpretable Poverty Maps using Object Detection in Satellite Images

Title Generating Interpretable Poverty Maps using Object Detection in Satellite Images
Authors Kumar Ayush, Burak Uzkent, Marshall Burke, David Lobell, Stefano Ermon
Abstract Accurate local-level poverty measurement is an essential task for governments and humanitarian organizations to track progress towards improving livelihoods and distribute scarce resources. Recent computer vision advances in using satellite imagery to predict poverty have shown increasing accuracy, but they do not generate features that are interpretable to policymakers, inhibiting adoption by practitioners. Here we demonstrate an interpretable computational framework to accurately predict poverty at a local level by applying object detectors to high-resolution (30cm) satellite images. Using the weighted counts of objects as features, we achieve a Pearson’s $r^2$ of 0.539 in predicting village-level poverty in Uganda, a 31% improvement over existing (and less interpretable) benchmarks. Feature importance and ablation analyses reveal intuitive relationships between object counts and poverty predictions. Our results suggest that interpretability does not have to come at the cost of performance, at least in this important domain.
Tasks Feature Importance, Object Detection
Published 2020-02-05
URL https://arxiv.org/abs/2002.01612v2
PDF https://arxiv.org/pdf/2002.01612v2.pdf
PWC https://paperswithcode.com/paper/generating-interpretable-poverty-maps-using
Repo
Framework
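
The pipeline reduces to counting detected objects per village and regressing a poverty index on those counts; a minimal sketch with simulated data (the object categories, coefficients, and data are invented for illustration):

```python
# A minimal sketch of the interpretable pipeline: detector counts per
# village as features, a simple regressor, and Pearson's r^2 as the metric.
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
categories = ["truck", "building", "farmland", "boat"]  # detector classes

# Toy data: counts of detected objects per village, and a poverty index.
counts = rng.poisson(lam=[3, 20, 8, 1], size=(200, 4)).astype(float)
poverty = 2.0 - 0.05 * counts[:, 1] + 0.3 * rng.normal(size=200)

model = Ridge().fit(counts[:100], poverty[:100])
pred = model.predict(counts[100:])
r, _ = pearsonr(pred, poverty[100:])
print("Pearson r^2:", round(r ** 2, 3))
# Coefficients per object class make the prediction interpretable:
print(dict(zip(categories, np.round(model.coef_, 3))))
```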

BMI: A Behavior Measurement Indicator for Fuel Poverty Using Aggregated Load Readings from Smart Meters

Title BMI: A Behavior Measurement Indicator for Fuel Poverty Using Aggregated Load Readings from Smart Meters
Authors P. Fergus, C. Chalmers
Abstract Fuel poverty affects between 50 and 125 million households in Europe and is a significant issue for both developed and developing countries globally. It means that fuel-poor residents are unable to adequately warm their homes and run the energy services needed for lighting, cooking, hot water, and electrical appliances. The problem is complex but is typically caused by three factors: low income, high energy costs, and energy-inefficient homes. In the United Kingdom (UK), 4 million families are currently living in fuel poverty. Those in serious financial difficulty are either forced to self-disconnect or have their services terminated by energy providers. Fuel poverty contributed to 10,000 reported deaths in England in the winter of 2016-2017 due to homes being cold. While it is recognized by governments as a social, public health, and environmental policy issue, the European Union (EU) has failed to provide a common definition of fuel poverty or a conventional set of indicators to measure it. This chapter discusses current fuel poverty strategies across the EU and proposes a new and foundational behavior measurement indicator designed to directly assess and monitor fuel poverty risks in households using smart meters, Consumer Access Device (CAD) data, and machine learning. By detecting Activities of Daily Living (ADLs) through household appliance usage, it is possible to spot the early signs of financial difficulty and identify when support packages are required.
Tasks
Published 2020-02-16
URL https://arxiv.org/abs/2002.12899v1
PDF https://arxiv.org/pdf/2002.12899v1.pdf
PWC https://paperswithcode.com/paper/bmi-a-behavior-measurement-indicator-for-fuel
Repo
Framework

Deep Gaussian Markov random fields

Title Deep Gaussian Markov random fields
Authors Per Sidén, Fredrik Lindsten
Abstract Gaussian Markov random fields (GMRFs) are probabilistic graphical models widely used in spatial statistics and related fields to model dependencies over spatial structures. We establish a formal connection between GMRFs and convolutional neural networks (CNNs). Common GMRFs are special cases of a generative model where the inverse mapping from data to latent variables is given by a 1-layer linear CNN. This connection allows us to generalize GMRFs to multi-layer CNN architectures, effectively increasing the order of the corresponding GMRF in a way which has favorable computational scaling. We describe how well-established tools, such as autodiff and variational inference, can be used for simple and efficient inference and learning of the deep GMRF. We demonstrate the flexibility of the proposed model and show that it outperforms the state-of-the-art on a dataset of satellite temperatures, in terms of prediction and predictive uncertainty.
Tasks
Published 2020-02-18
URL https://arxiv.org/abs/2002.07467v1
PDF https://arxiv.org/pdf/2002.07467v1.pdf
PWC https://paperswithcode.com/paper/deep-gaussian-markov-random-fields
Repo
Framework
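
The GMRF-CNN connection can be made concrete with the classic 5-point Laplacian stencil: a single fixed linear convolution maps the spatial field to latents whose Gaussian density gives the GMRF log-density. A minimal sketch, up to normalizing constants:

```python
# A minimal sketch of the GMRF-as-CNN connection: a 1-layer linear
# convolution z = Gx with a second-difference (Laplacian) stencil,
# corresponding to a first-order intrinsic GMRF.
import torch
import torch.nn.functional as F

stencil = torch.tensor([[0., -1., 0.],
                        [-1., 4., -1.],
                        [0., -1., 0.]]).view(1, 1, 3, 3)

x = torch.randn(1, 1, 32, 32)            # observed spatial field
z = F.conv2d(x, stencil, padding=1)      # z = Gx, the 1-layer linear CNN
log_density = -0.5 * (z ** 2).sum()      # log N(z; 0, I) up to constants
print(log_density.item())
```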