January 28, 2020

Paper Group ANR 932

A Bandit Approach to Posterior Dialog Orchestration Under a Budget. On the estimation of the Wasserstein distance in generative models. On the Difference Between the Information Bottleneck and the Deep Information Bottleneck. Decay Replay Mining to Predict Next Process Events. A Formalization of The Natural Gradient Method for General Similarity Me …

A Bandit Approach to Posterior Dialog Orchestration Under a Budget

Title A Bandit Approach to Posterior Dialog Orchestration Under a Budget
Authors Sohini Upadhyay, Mayank Agarwal, Djallel Bounneffouf, Yasaman Khazaeni
Abstract Building multi-domain AI agents is a challenging task and an open problem in the area of AI. Within the domain of dialog, the ability to orchestrate multiple independently trained dialog agents, or skills, to create a unified system is of particular significance. In this work, we study the task of online posterior dialog orchestration, where we define posterior orchestration as the task of selecting a subset of skills which most appropriately answer a user input using features extracted from both the user input and the individual skills. To account for the various costs associated with extracting skill features, we consider online posterior orchestration under a skill execution budget. We formalize this setting as Context Attentive Bandit with Observations (CABO), a variant of context attentive bandits, and evaluate it on simulated non-conversational and proprietary conversational datasets.
Tasks
Published 2019-06-22
URL https://arxiv.org/abs/1906.09384v1
PDF https://arxiv.org/pdf/1906.09384v1.pdf
PWC https://paperswithcode.com/paper/a-bandit-approach-to-posterior-dialog
Repo
Framework

On the estimation of the Wasserstein distance in generative models

Title On the estimation of the Wasserstein distance in generative models
Authors Thomas Pinetz, Daniel Soukup, Thomas Pock
Abstract Generative Adversarial Networks (GANs) have been used to model the underlying probability distribution of sample-based datasets. GANs are notorious for training difficulties and their dependence on arbitrary hyperparameters. One recent improvement in the GAN literature is to use the Wasserstein distance as the loss function, leading to Wasserstein Generative Adversarial Networks (WGANs). Using this as a basis, we show various ways in which the Wasserstein distance is estimated for the task of generative modelling. Additionally, the secrets in training such models are shown and summarized at the end of this work. Where applicable, we extend current works to different algorithms, different cost functions, and different regularization schemes to improve generative models.
Tasks
Published 2019-10-02
URL https://arxiv.org/abs/1910.00888v1
PDF https://arxiv.org/pdf/1910.00888v1.pdf
PWC https://paperswithcode.com/paper/on-the-estimation-of-the-wasserstein-distance
Repo
Framework
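
As a point of reference for what "estimating the Wasserstein distance" means in the simplest setting, here is a minimal sketch for two one-dimensional samples of equal size, where the 1-Wasserstein distance reduces to the mean absolute difference between the sorted samples. This is not one of the estimators studied in the paper (those target high-dimensional generative models); the function name `wasserstein_1d` is an assumption made for illustration.

```python
def wasserstein_1d(x, y):
    """Empirical 1-Wasserstein distance between two equal-sized 1-D samples.

    In one dimension the optimal transport plan matches the i-th smallest
    point of x to the i-th smallest point of y, so sorting is enough.
    """
    xs, ys = sorted(x), sorted(y)
    if not xs or len(xs) != len(ys):
        raise ValueError("this sketch assumes non-empty samples of equal size")
    return sum(abs(a - b) for a, b in zip(xs, ys)) / len(xs)


if __name__ == "__main__":
    # Shifting every point by 1 moves each unit of mass a distance of 1.
    print(wasserstein_1d([0.0, 1.0, 2.0], [3.0, 1.0, 2.0]))
```

In higher dimensions no such closed form exists, which is why WGAN-style methods estimate the distance with a learned critic instead.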

On the Difference Between the Information Bottleneck and the Deep Information Bottleneck

Title On the Difference Between the Information Bottleneck and the Deep Information Bottleneck
Authors Aleksander Wieczorek, Volker Roth
Abstract Combining the Information Bottleneck model with deep learning by replacing mutual information terms with deep neural nets has proved successful in areas ranging from generative modelling to interpreting deep neural networks. In this paper, we revisit the Deep Variational Information Bottleneck and the assumptions needed for its derivation. The two assumed properties of the data $X$, $Y$ and their latent representation $T$ take the form of two Markov chains $T-X-Y$ and $X-T-Y$. Requiring both to hold during the optimisation process can be limiting for the set of potential joint distributions $P(X,Y,T)$. We therefore show how to circumvent this limitation by optimising a lower bound for $I(T;Y)$ for which only the latter Markov chain has to be satisfied. The actual mutual information consists of the lower bound which is optimised in DVIB and cognate models in practice and of two terms measuring how much the former requirement $T-X-Y$ is violated. Finally, we propose to interpret the family of information bottleneck models as directed graphical models and show that in this framework the original and deep information bottlenecks are special cases of a fundamental IB model.
Tasks
Published 2019-12-31
URL https://arxiv.org/abs/1912.13480v1
PDF https://arxiv.org/pdf/1912.13480v1.pdf
PWC https://paperswithcode.com/paper/on-the-difference-between-the-information
Repo
Framework
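
For orientation, the classical information bottleneck objective that the abstract builds on can be written with the same symbols $X$, $Y$, $T$. The abstract does not state it; this is the standard textbook form, with the trade-off parameter $\beta$ introduced here as an assumption of notation:

$$
\min_{p(t \mid x)} \; I(X;T) \;-\; \beta \, I(T;Y), \qquad \beta > 0,
$$

where the minimisation is over encoders $p(t \mid x)$ and the Markov chain $T-X-Y$ expresses that $T$ depends on $Y$ only through $X$.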

Decay Replay Mining to Predict Next Process Events

Title Decay Replay Mining to Predict Next Process Events
Authors Julian Theis, Houshang Darabi
Abstract In complex processes, various events can happen in different sequences. The prediction of the next event given an a-priori process state is of importance in such processes. Recent methods have proposed deep learning techniques such as recurrent neural networks, developed on raw event logs, to predict the next event from a process state. However, such deep learning models by themselves lack a clear representation of the process states. At the same time, recent methods have neglected the time feature of event instances. In this paper, we take advantage of Petri nets as a powerful tool in modeling complex process behaviors considering time as an elemental variable. We propose an approach which starts from a Petri net process model constructed by a process mining algorithm. We enhance the Petri net model with time decay functions to create continuous process state samples. Finally, we use these samples in combination with discrete token movement counters and Petri net markings to train a deep learning model that predicts the next event. We demonstrate significant performance improvements and outperform the state-of-the-art methods on nine real-world benchmark event logs.
Tasks
Published 2019-03-12
URL https://arxiv.org/abs/1903.05084v3
PDF https://arxiv.org/pdf/1903.05084v3.pdf
PWC https://paperswithcode.com/paper/dream-nap-decay-replay-mining-to-predict-next
Repo
Framework

A Formalization of The Natural Gradient Method for General Similarity Measures

Title A Formalization of The Natural Gradient Method for General Similarity Measures
Authors Anton Mallasto, Tom Dela Haije, Aasa Feragen
Abstract In optimization, the natural gradient method is well-known for likelihood maximization. The method uses the Kullback-Leibler divergence, corresponding infinitesimally to the Fisher-Rao metric, which is pulled back to the parameter space of a family of probability distributions. This way, gradients with respect to the parameters respect the Fisher-Rao geometry of the space of distributions, which might differ vastly from the standard Euclidean geometry of the parameter space, often leading to faster convergence. However, when minimizing an arbitrary similarity measure between distributions, it is generally unclear which metric to use. We provide a general framework that, given a similarity measure, derives a metric for the natural gradient. We then discuss connections between the natural gradient method and multiple other optimization techniques in the literature. Finally, we provide computations of the formal natural gradient to show overlap with well-known cases and to compute natural gradients in novel frameworks.
Tasks
Published 2019-02-24
URL http://arxiv.org/abs/1902.08959v1
PDF http://arxiv.org/pdf/1902.08959v1.pdf
PWC https://paperswithcode.com/paper/a-formalization-of-the-natural-gradient
Repo
Framework
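
As a reminder of the classical case the abstract starts from, the natural gradient update for a loss $L(\theta)$ over a parametric family $p_\theta$ is commonly written as below. The symbols $\eta$ (step size), $F$ (Fisher information matrix), and $\delta$ (a small parameter perturbation) are notation assumed here, not taken from the paper:

$$
\theta_{k+1} = \theta_k - \eta \, F(\theta_k)^{-1} \nabla_\theta L(\theta_k),
\qquad
F(\theta) = \mathbb{E}_{x \sim p_\theta}\!\left[ \nabla_\theta \log p_\theta(x) \, \nabla_\theta \log p_\theta(x)^{\top} \right].
$$

The link to the Kullback-Leibler divergence is the second-order expansion

$$
\mathrm{KL}\!\left(p_\theta \,\|\, p_{\theta+\delta}\right) \approx \tfrac{1}{2} \, \delta^{\top} F(\theta) \, \delta ,
$$

which is the sense in which the divergence corresponds infinitesimally to the Fisher-Rao metric.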

A sparse semismooth Newton based augmented Lagrangian method for large-scale support vector machines

Title A sparse semismooth Newton based augmented Lagrangian method for large-scale support vector machines
Authors Dunbiao Niu, Chengjing Wang, Peipei Tang, Qingsong Wang, Enbin Song
Abstract Support vector machines (SVMs) are successful modeling and prediction tools with a variety of applications. Previous work has demonstrated the superiority of the SVMs in dealing with the high dimensional, low sample size problems. However, the numerical difficulties of the SVMs will become severe with the increase of the sample size. Although there exist many solvers for the SVMs, only few of them are designed by exploiting the special structures of the SVMs. In this paper, we propose a highly efficient sparse semismooth Newton based augmented Lagrangian method for solving a large-scale convex quadratic programming problem with a linear equality constraint and a simple box constraint, which is generated from the dual problems of the SVMs. By leveraging the primal-dual error bound result, the fast local convergence rate of the augmented Lagrangian method can be guaranteed. Furthermore, by exploiting the second-order sparsity of the problem when using the semismooth Newton method, the algorithm can efficiently solve the aforementioned difficult problems. Finally, numerical comparisons demonstrate that the proposed algorithm outperforms the current state-of-the-art solvers for the large-scale SVMs.
Tasks
Published 2019-10-03
URL https://arxiv.org/abs/1910.01312v1
PDF https://arxiv.org/pdf/1910.01312v1.pdf
PWC https://paperswithcode.com/paper/a-sparse-semismooth-newton-based-augmented
Repo
Framework
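
The "convex quadratic programming problem with a linear equality constraint and a simple box constraint" matches the shape of the standard soft-margin SVM dual. A sketch of that textbook form follows; the exact formulation used in the paper may differ, and the symbols $\alpha$, $Q$, $K$, $C$, and $e$ (the all-ones vector) are notation assumed here:

$$
\min_{\alpha \in \mathbb{R}^n} \; \tfrac{1}{2} \, \alpha^{\top} Q \, \alpha - e^{\top} \alpha
\quad \text{s.t.} \quad y^{\top} \alpha = 0, \qquad 0 \le \alpha \le C e,
$$

with $Q_{ij} = y_i y_j K(x_i, x_j)$ for labels $y_i \in \{-1, +1\}$ and kernel $K$. Since $Q$ is $n \times n$, the problem grows with the sample size $n$, which is consistent with the scaling difficulty the abstract describes.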

Transcoding compositionally: using attention to find more generalizable solutions

Title Transcoding compositionally: using attention to find more generalizable solutions
Authors Kris Korrel, Dieuwke Hupkes, Verna Dankers, Elia Bruni
Abstract While sequence-to-sequence models have shown remarkable generalization power across several natural language tasks, the solutions they construct are argued to be less compositional than human-like generalization. In this paper, we present seq2attn, a new architecture that is specifically designed to exploit attention to find compositional patterns in the input. In seq2attn, the two standard components of an encoder-decoder model are connected via a transcoder that modulates the information flow between them. We show that seq2attn can successfully generalize, without requiring any additional supervision, on two tasks which are specifically constructed to challenge the compositional skills of neural networks. The solutions found by the model are highly interpretable, allowing easy analysis of both the types of solutions that are found and potential causes for mistakes. We exploit this opportunity to introduce a new paradigm to test compositionality that studies the extent to which a model overgeneralizes when confronted with exceptions. We show that seq2attn exhibits such overgeneralization to a larger degree than a standard sequence-to-sequence model.
Tasks
Published 2019-06-04
URL https://arxiv.org/abs/1906.01234v2
PDF https://arxiv.org/pdf/1906.01234v2.pdf
PWC https://paperswithcode.com/paper/transcoding-compositionally-using-attention
Repo
Framework

Decomposing Generalization: Models of Generic, Habitual, and Episodic Statements

Title Decomposing Generalization: Models of Generic, Habitual, and Episodic Statements
Authors Venkata Subrahmanyan Govindarajan, Benjamin Van Durme, Aaron Steven White
Abstract We present a novel semantic framework for modeling linguistic expressions of generalization—generic, habitual, and episodic statements—as combinations of simple, real-valued referential properties of predicates and their arguments. We use this framework to construct a dataset covering the entirety of the Universal Dependencies English Web Treebank. We use this dataset to probe the efficacy of type-level and token-level information—including hand-engineered features and static (GloVe) and contextual (ELMo) word embeddings—for predicting expressions of generalization. Data and code are available at decomp.io.
Tasks Word Embeddings
Published 2019-01-31
URL https://arxiv.org/abs/1901.11429v2
PDF https://arxiv.org/pdf/1901.11429v2.pdf
PWC https://paperswithcode.com/paper/decomposing-generalization-models-of-generic
Repo
Framework

Deep-MAPS: Machine Learning based Mobile Air Pollution Sensing

Title Deep-MAPS: Machine Learning based Mobile Air Pollution Sensing
Authors Jun Song, Ke Han
Abstract Mobile and ubiquitous sensing of urban air quality has received increased attention as an economically and operationally viable means to survey atmospheric environment with high spatial-temporal resolution. This paper proposes a machine learning based mobile air pollution sensing framework, called Deep-MAPS, and demonstrates its scientific and financial values in the following aspects. (1) Based on a network of fixed and mobile air quality sensors, we perform spatial inference of PM2.5 concentrations in Beijing (3,025 km², 19 Jun-16 Jul 2018) for a spatial-temporal resolution of 1 km by 1 km and 1 hour, with over 85% accuracy. (2) We leverage urban big data to generate insights regarding the potential cause of pollution, which facilitates evidence-based sustainable urban management. (3) To achieve such spatial-temporal coverage and accuracy, Deep-MAPS can save up to 90% hardware investment, compared with ubiquitous sensing that relies primarily on fixed sensors.
Tasks
Published 2019-04-28
URL https://arxiv.org/abs/1904.12303v2
PDF https://arxiv.org/pdf/1904.12303v2.pdf
PWC https://paperswithcode.com/paper/exploring-urban-air-quality-with-maps-mobile
Repo
Framework

User-Interactive Machine Learning Model for Identifying Structural Relationships of Code Features

Title User-Interactive Machine Learning Model for Identifying Structural Relationships of Code Features
Authors Ankit Gupta, Kartik Chugh, Andrea Solis, Thomas LaToza
Abstract Traditional machine learning based intelligent systems assist users by learning patterns in data and making recommendations. However, these systems are limited in that the user has little means of understanding the rationale behind the system's suggestions, communicating their own understanding of patterns, or correcting system behavior. In this project, we outline a model for intelligent software based on a human-computer feedback loop. The Machine Learning (ML) system's recommendations are reviewed by the user, and in turn, this information shapes the system's decision making. Our model was applied to developing an HTML editor that integrates ML with user interaction to ascertain structural relationships between HTML document features and apply them for code completion. The editor utilizes the ID3 algorithm to build decision trees, sequences of rules for predicting code the user will type. The editor displays the decision tree's rules in the Interactive Rules Interface System (IRIS), which allows developers to prioritize, modify, or delete them. These interactions alter the data processed by ID3, providing the developer some control over the autocomplete system. Validation indicates that, absent user interaction, the ML model is able to predict tags with 78.4 percent accuracy, attributes with 62.9 percent accuracy, and values with 12.8 percent accuracy. Based on the results of the user study, user interaction with the rules interface corrects feature relationships missed or mistaken by the automated process, enhancing autocomplete accuracy and developer productivity. Additionally, interaction is proven to help developers work with greater awareness of code patterns. Our research demonstrates the viability of a software integration of machine intelligence with human feedback.
Tasks Decision Making
Published 2019-07-18
URL https://arxiv.org/abs/1907.07679v1
PDF https://arxiv.org/pdf/1907.07679v1.pdf
PWC https://paperswithcode.com/paper/user-interactive-machine-learning-model-for
Repo
Framework
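
The core step of ID3, which the abstract names as the tree-building algorithm, is to split on the feature with the highest information gain. The sketch below illustrates only that step. It is not the editor's implementation, and the feature names (`parent`, `depth`) and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy reduction obtained by partitioning the data on one feature."""
    total = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_feature(rows, labels, features):
    """ID3 splitting rule: choose the feature with the largest gain."""
    return max(features, key=lambda f: information_gain(rows, labels, f))


if __name__ == "__main__":
    # Hypothetical training rows: context features -> next tag typed.
    rows = [
        {"parent": "ul", "depth": 1},
        {"parent": "ul", "depth": 2},
        {"parent": "table", "depth": 1},
        {"parent": "table", "depth": 2},
    ]
    labels = ["li", "li", "tr", "tr"]
    print(best_feature(rows, labels, ["parent", "depth"]))
```

In this toy data the parent tag fully determines the next tag while depth carries no information, so the split falls on `parent`. Editing or deleting a rule, as the abstract describes for IRIS, would amount to changing the rows that feed this computation.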

Domain Adaptation with Asymmetrically-Relaxed Distribution Alignment

Title Domain Adaptation with Asymmetrically-Relaxed Distribution Alignment
Authors Yifan Wu, Ezra Winston, Divyansh Kaushik, Zachary Lipton
Abstract Domain adaptation addresses the common problem when the target distribution generating our test data drifts from the source (training) distribution. While absent assumptions, domain adaptation is impossible, strict conditions, e.g. covariate or label shift, enable principled algorithms. Recently-proposed domain-adversarial approaches consist of aligning source and target encodings, often motivating this approach as minimizing two (of three) terms in a theoretical bound on target error. Unfortunately, this minimization can cause arbitrary increases in the third term, e.g. they can break down under shifting label distributions. We propose asymmetrically-relaxed distribution alignment, a new approach that overcomes some limitations of standard domain-adversarial algorithms. Moreover, we characterize precise assumptions under which our algorithm is theoretically principled and demonstrate empirical benefits on both synthetic and real datasets.
Tasks Domain Adaptation
Published 2019-03-05
URL http://arxiv.org/abs/1903.01689v2
PDF http://arxiv.org/pdf/1903.01689v2.pdf
PWC https://paperswithcode.com/paper/domain-adaptation-with-asymmetrically-relaxed
Repo
Framework

From Crowdsourcing to Crowdmining: Using Implicit Human Intelligence for Better Understanding of Crowdsourced Data

Title From Crowdsourcing to Crowdmining: Using Implicit Human Intelligence for Better Understanding of Crowdsourced Data
Authors Bin Guo, Huihui Chen, Yan Liu, Chao Chen, Qi Han, Zhiwen Yu
Abstract With the development of mobile social networks, more and more crowdsourced data are generated on the Web or collected from real-world sensing. The fragmented, heterogeneous, and noisy nature of online/offline crowdsourced data, however, makes them difficult to understand. Traditional content-based analysis methods suffer from potential issues such as computational intensiveness and poor performance. To address them, this paper presents CrowdMining. In particular, we observe that the knowledge hidden in the process of data generation, regarding individual/crowd behavior patterns (e.g., mobility patterns, community contexts such as social ties and structure) and crowd-object interaction patterns (flickering or tweeting patterns), is neglected in crowdsourced data mining. Therefore, a novel approach that leverages implicit human intelligence (implicit HI) for crowdsourced data mining and understanding is proposed. Two studies titled CrowdEvent and CrowdRoute are presented to showcase its usage, where implicit HIs are extracted either from online or offline crowdsourced data. A generic model for CrowdMining is further proposed based on a set of existing studies. Experiments based on real-world datasets demonstrate the effectiveness of CrowdMining.
Tasks
Published 2019-08-07
URL https://arxiv.org/abs/1908.02412v1
PDF https://arxiv.org/pdf/1908.02412v1.pdf
PWC https://paperswithcode.com/paper/from-crowdsourcing-to-crowdmining-using
Repo
Framework

A new direction to promote the implementation of artificial intelligence in natural clinical settings

Title A new direction to promote the implementation of artificial intelligence in natural clinical settings
Authors Yunyou Huang, Zhifei Zhang, Nana Wang, Nengquan Li, Mengjia Du, Tianshu Hao, Jianfeng Zhan
Abstract Artificial intelligence (AI) researchers claim that they have made great 'achievements' in clinical realms. However, clinicians point out that the so-called 'achievements' cannot be implemented in natural clinical settings. The root cause for this huge gap is that many essential features of natural clinical tasks are overlooked by AI system developers without a medical background. In this paper, we propose that the clinical benchmark suite is a novel and promising direction to capture the essential features of real-world clinical tasks, and hence qualifies itself for guiding the development of AI systems and promoting the implementation of AI in real-world clinical practice.
Tasks
Published 2019-05-08
URL https://arxiv.org/abs/1905.02940v1
PDF https://arxiv.org/pdf/1905.02940v1.pdf
PWC https://paperswithcode.com/paper/a-new-direction-to-promote-the-implementation
Repo
Framework

Variance-Reduced Decentralized Stochastic Optimization with Gradient Tracking – Part II: GT-SVRG

Title Variance-Reduced Decentralized Stochastic Optimization with Gradient Tracking – Part II: GT-SVRG
Authors Ran Xin, Usman A. Khan, Soummya Kar
Abstract Decentralized stochastic optimization has recently benefited from gradient tracking methods providing efficient solutions for large-scale empirical risk minimization problems. In Part I of this work, we develop GT-SAGA, which is based on a decentralized implementation of SAGA using gradient tracking, and discuss regimes of practical interest where GT-SAGA outperforms existing decentralized approaches in terms of the total number of local gradient computations. In this paper, we describe GT-SVRG, which develops a decentralized gradient tracking based implementation of SVRG, another well-known variance-reduction technique. We show that the convergence rate of GT-SVRG matches that of GT-SAGA for smooth and strongly-convex functions and highlight different trade-offs between the two algorithms in various settings.
Tasks Stochastic Optimization
Published 2019-10-08
URL https://arxiv.org/abs/1910.04057v2
PDF https://arxiv.org/pdf/1910.04057v2.pdf
PWC https://paperswithcode.com/paper/variance-reduced-decentralized-stochastic
Repo
Framework

Game Design for Eliciting Distinguishable Behavior

Title Game Design for Eliciting Distinguishable Behavior
Authors Fan Yang, Liu Leqi, Yifan Wu, Zachary C. Lipton, Pradeep Ravikumar, William W. Cohen, Tom Mitchell
Abstract The ability to infer latent psychological traits from human behavior is key to developing personalized human-interacting machine learning systems. Approaches to infer such traits range from surveys to manually-constructed experiments and games. However, these traditional games are limited because they are typically designed based on heuristics. In this paper, we formulate the task of designing behavior diagnostic games that elicit distinguishable behavior as a mutual information maximization problem, which can be solved by optimizing a variational lower bound. Our framework is instantiated by using prospect theory to model varying player traits, and Markov Decision Processes to parameterize the games. We validate our approach empirically, showing that our designed games can successfully distinguish among players with different traits, outperforming manually-designed ones by a large margin.
Tasks
Published 2019-12-12
URL https://arxiv.org/abs/1912.06074v1
PDF https://arxiv.org/pdf/1912.06074v1.pdf
PWC https://paperswithcode.com/paper/game-design-for-eliciting-distinguishable-1
Repo
Framework