Paper Group ANR 932
A Bandit Approach to Posterior Dialog Orchestration Under a Budget. On the estimation of the Wasserstein distance in generative models. On the Difference Between the Information Bottleneck and the Deep Information Bottleneck. Decay Replay Mining to Predict Next Process Events. A Formalization of The Natural Gradient Method for General Similarity Me …
A Bandit Approach to Posterior Dialog Orchestration Under a Budget
Title | A Bandit Approach to Posterior Dialog Orchestration Under a Budget |
Authors | Sohini Upadhyay, Mayank Agarwal, Djallel Bouneffouf, Yasaman Khazaeni |
Abstract | Building multi-domain AI agents is a challenging task and an open problem in the area of AI. Within the domain of dialog, the ability to orchestrate multiple independently trained dialog agents, or skills, to create a unified system is of particular significance. In this work, we study the task of online posterior dialog orchestration, where we define posterior orchestration as the task of selecting a subset of skills which most appropriately answer a user input using features extracted from both the user input and the individual skills. To account for the various costs associated with extracting skill features, we consider online posterior orchestration under a skill execution budget. We formalize this setting as Context Attentive Bandit with Observations (CABO), a variant of context attentive bandits, and evaluate it on simulated non-conversational and proprietary conversational datasets. |
Tasks | |
Published | 2019-06-22 |
URL | https://arxiv.org/abs/1906.09384v1 |
https://arxiv.org/pdf/1906.09384v1.pdf | |
PWC | https://paperswithcode.com/paper/a-bandit-approach-to-posterior-dialog |
Repo | |
Framework | |
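To make the budgeted-orchestration setting concrete, here is a minimal epsilon-greedy sketch: on each turn we may extract features for at most `budget` skills and must pick among those. This is an illustration of the problem setting only, not the paper's CABO algorithm; all names, constants, and the "least-tried first" budget rule are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_skills, dim, horizon, budget, eps = 4, 3, 300, 2, 0.1
theta_true = rng.normal(size=(n_skills, dim))   # hidden per-skill quality

est = np.zeros((n_skills, dim))                 # running per-skill estimates
counts = np.zeros(n_skills)
total_reward = 0.0
for _ in range(horizon):
    ctx = rng.normal(size=dim)
    # spend the feature-extraction budget on the least-tried skills
    observed = np.argsort(counts)[:budget]
    if rng.random() < eps:
        arm = int(rng.choice(observed))          # explore
    else:                                        # exploit among observed skills
        arm = int(observed[np.argmax([est[a] @ ctx for a in observed])])
    r = float(theta_true[arm] @ ctx) + rng.normal(scale=0.1)
    counts[arm] += 1
    est[arm] += (r - est[arm] @ ctx) * ctx / counts[arm]  # crude SGD-style update
    total_reward += r
```

The key constraint the paper studies is visible in the loop: the bandit never sees features for skills outside the budgeted subset.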
On the estimation of the Wasserstein distance in generative models
Title | On the estimation of the Wasserstein distance in generative models |
Authors | Thomas Pinetz, Daniel Soukup, Thomas Pock |
Abstract | Generative Adversarial Networks (GANs) have been used to model the underlying probability distribution of sample-based datasets. GANs are notorious for training difficulties and their dependence on arbitrary hyperparameters. One recent improvement in the GAN literature is to use the Wasserstein distance as the loss function, leading to Wasserstein Generative Adversarial Networks (WGANs). Using this as a basis, we show various ways in which the Wasserstein distance is estimated for the task of generative modelling. Additionally, the secrets of training such models are shown and summarized at the end of this work. Where applicable, we extend current works to different algorithms, different cost functions, and different regularization schemes to improve generative models. |
Tasks | |
Published | 2019-10-02 |
URL | https://arxiv.org/abs/1910.00888v1 |
https://arxiv.org/pdf/1910.00888v1.pdf | |
PWC | https://paperswithcode.com/paper/on-the-estimation-of-the-wasserstein-distance |
Repo | |
Framework | |
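In one dimension the Wasserstein distance can be estimated exactly from equal-size samples by sorting, which makes a useful reference point for the neural estimators the paper surveys. A small sketch (the function name is ours):

```python
import numpy as np

def wasserstein_1d(x, y, p=1):
    """Empirical p-Wasserstein distance between equal-size 1-D samples:
    sort both samples and average |x_(i) - y_(i)|^p, since the monotone
    coupling is optimal in one dimension."""
    xs, ys = np.sort(x), np.sort(y)
    return float(np.mean(np.abs(xs - ys) ** p) ** (1.0 / p))

a = np.array([0.0, 1.0, 2.0])
b = np.array([1.0, 2.0, 3.0])
dist = wasserstein_1d(a, b)  # shifting a sample by 1 costs exactly 1
```

In higher dimensions no such closed form exists, which is why WGANs estimate the distance with a trained critic instead.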
On the Difference Between the Information Bottleneck and the Deep Information Bottleneck
Title | On the Difference Between the Information Bottleneck and the Deep Information Bottleneck |
Authors | Aleksander Wieczorek, Volker Roth |
Abstract | Combining the Information Bottleneck model with deep learning by replacing mutual information terms with deep neural nets has proved successful in areas ranging from generative modelling to interpreting deep neural networks. In this paper, we revisit the Deep Variational Information Bottleneck and the assumptions needed for its derivation. The two assumed properties of the data $X$, $Y$ and their latent representation $T$ take the form of two Markov chains $T-X-Y$ and $X-T-Y$. Requiring both to hold during the optimisation process can be limiting for the set of potential joint distributions $P(X,Y,T)$. We therefore show how to circumvent this limitation by optimising a lower bound for $I(T;Y)$ for which only the latter Markov chain has to be satisfied. The actual mutual information consists of the lower bound which is optimised in DVIB and cognate models in practice and of two terms measuring how much the former requirement $T-X-Y$ is violated. Finally, we propose to interpret the family of information bottleneck models as directed graphical models and show that in this framework the original and deep information bottlenecks are special cases of a fundamental IB model. |
Tasks | |
Published | 2019-12-31 |
URL | https://arxiv.org/abs/1912.13480v1 |
https://arxiv.org/pdf/1912.13480v1.pdf | |
PWC | https://paperswithcode.com/paper/on-the-difference-between-the-information |
Repo | |
Framework | |
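The quantities traded off in any information bottleneck, such as I(T;Y) above, can be computed directly for small discrete distributions, which helps build intuition for the lower bound the paper analyzes. A minimal sketch (our own helper, not code from the paper):

```python
import numpy as np

def mutual_information(pxy):
    """I(X;Y) in nats for a discrete joint distribution given as a matrix
    pxy[i, j] = P(X=i, Y=j)."""
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    mask = pxy > 0                      # skip zero-probability cells
    return float(np.sum(pxy[mask] * np.log((pxy / (px * py))[mask])))

indep = np.outer([0.5, 0.5], [0.5, 0.5])   # X independent of Y -> I = 0
copy = np.array([[0.5, 0.0], [0.0, 0.5]])  # Y = X -> I = log 2
```

The DVIB replaces such exact terms with neural-network bounds; the paper's point is that only the Markov chain $X-T-Y$ is needed for the bound on I(T;Y) to hold.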
Decay Replay Mining to Predict Next Process Events
Title | Decay Replay Mining to Predict Next Process Events |
Authors | Julian Theis, Houshang Darabi |
Abstract | In complex processes, various events can happen in different sequences. The prediction of the next event given an a-priori process state is of importance in such processes. Recent methods have proposed deep learning techniques such as recurrent neural networks, developed on raw event logs, to predict the next event from a process state. However, such deep learning models by themselves lack a clear representation of the process states. At the same time, recent methods have neglected the time feature of event instances. In this paper, we take advantage of Petri nets as a powerful tool in modeling complex process behaviors considering time as an elemental variable. We propose an approach which starts from a Petri net process model constructed by a process mining algorithm. We enhance the Petri net model with time decay functions to create continuous process state samples. Finally, we use these samples in combination with discrete token movement counters and Petri net markings to train a deep learning model that predicts the next event. We demonstrate significant performance improvements and outperform the state-of-the-art methods on nine real-world benchmark event logs. |
Tasks | |
Published | 2019-03-12 |
URL | https://arxiv.org/abs/1903.05084v3 |
https://arxiv.org/pdf/1903.05084v3.pdf | |
PWC | https://paperswithcode.com/paper/dream-nap-decay-replay-mining-to-predict-next |
Repo | |
Framework | |
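The core trick, turning discrete token timestamps into a continuous state vector via decay functions, can be sketched in a few lines. This is a hedged illustration with an exponential decay; the paper attaches its own decay functions to Petri-net places and combines them with token counters and markings.

```python
import numpy as np

def decayed_state(t_now, t_last_token, alpha=0.5):
    """Continuous process-state vector from discrete token timestamps:
    each place's activation decays as exp(-alpha * elapsed) since it last
    held a token, so recent activity maps to values near 1."""
    return np.exp(-alpha * (t_now - t_last_token))

t_last = np.array([0.0, 2.0, 4.5])   # last time each place received a token
state = decayed_state(5.0, t_last)   # most recently active place is largest
```

Sampling this vector over time yields the continuous training samples that, per the abstract, feed the next-event prediction model.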
A Formalization of The Natural Gradient Method for General Similarity Measures
Title | A Formalization of The Natural Gradient Method for General Similarity Measures |
Authors | Anton Mallasto, Tom Dela Haije, Aasa Feragen |
Abstract | In optimization, the natural gradient method is well-known for likelihood maximization. The method uses the Kullback-Leibler divergence, corresponding infinitesimally to the Fisher-Rao metric, which is pulled back to the parameter space of a family of probability distributions. This way, gradients with respect to the parameters respect the Fisher-Rao geometry of the space of distributions, which might differ vastly from the standard Euclidean geometry of the parameter space, often leading to faster convergence. However, when minimizing an arbitrary similarity measure between distributions, it is generally unclear which metric to use. We provide a general framework that, given a similarity measure, derives a metric for the natural gradient. We then discuss connections between the natural gradient method and multiple other optimization techniques in the literature. Finally, we provide computations of the formal natural gradient to show overlap with well-known cases and to compute natural gradients in novel frameworks. |
Tasks | |
Published | 2019-02-24 |
URL | http://arxiv.org/abs/1902.08959v1 |
http://arxiv.org/pdf/1902.08959v1.pdf | |
PWC | https://paperswithcode.com/paper/a-formalization-of-the-natural-gradient |
Repo | |
Framework | |
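For a one-parameter family the pullback construction above reduces to preconditioning the Euclidean gradient by the inverse Fisher information. A minimal worked example for a Bernoulli model (standard textbook case, not the paper's general framework):

```python
def natural_gradient_step(p, grad, lr=0.1):
    """Bernoulli(p): Fisher information F(p) = 1 / (p * (1 - p)).
    The natural gradient preconditions the Euclidean gradient by F^{-1},
    so steps shrink near the boundary where the geometry is curved."""
    fisher = 1.0 / (p * (1.0 - p))
    return p + lr * grad / fisher

# Euclidean gradient of the log-likelihood of one observation x=1: 1/p
p = 0.2
p_new = natural_gradient_step(p, 1.0 / p)   # F^{-1} * (1/p) = 1 - p
```

The paper's contribution is to derive the analogue of `fisher` for an arbitrary similarity measure rather than the KL divergence.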
A sparse semismooth Newton based augmented Lagrangian method for large-scale support vector machines
Title | A sparse semismooth Newton based augmented Lagrangian method for large-scale support vector machines |
Authors | Dunbiao Niu, Chengjing Wang, Peipei Tang, Qingsong Wang, Enbin Song |
Abstract | Support vector machines (SVMs) are successful modeling and prediction tools with a variety of applications. Previous work has demonstrated the superiority of the SVMs in dealing with high-dimensional, low-sample-size problems. However, the numerical difficulties of the SVMs become severe as the sample size increases. Although there exist many solvers for the SVMs, only a few of them are designed by exploiting the special structures of the SVMs. In this paper, we propose a highly efficient sparse semismooth Newton based augmented Lagrangian method for solving a large-scale convex quadratic programming problem with a linear equality constraint and a simple box constraint, which is generated from the dual problems of the SVMs. By leveraging the primal-dual error bound result, the fast local convergence rate of the augmented Lagrangian method can be guaranteed. Furthermore, by exploiting the second-order sparsity of the problem when using the semismooth Newton method, the algorithm can efficiently solve the aforementioned difficult problems. Finally, numerical comparisons demonstrate that the proposed algorithm outperforms the current state-of-the-art solvers for the large-scale SVMs. |
Tasks | |
Published | 2019-10-03 |
URL | https://arxiv.org/abs/1910.01312v1 |
https://arxiv.org/pdf/1910.01312v1.pdf | |
PWC | https://paperswithcode.com/paper/a-sparse-semismooth-newton-based-augmented |
Repo | |
Framework | |
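The dual problem the paper targets is a box-constrained convex QP. As a point of reference, here is the simplest possible baseline solver, projected gradient on the dual of a bias-free SVM, which drops the equality constraint the paper's biased formulation carries. The paper's semismooth-Newton augmented Lagrangian method is far more sophisticated; this sketch only shows the problem structure.

```python
import numpy as np

def svm_dual_projected_gradient(K, y, C=1.0, lr=0.01, iters=500):
    """Projected gradient on the dual of a bias-free SVM:
        min_a 0.5 a^T Q a - 1^T a,  0 <= a <= C,  Q = (y y^T) * K.
    (The biased SVM adds the equality y^T a = 0, which the paper's
    augmented Lagrangian method handles.)"""
    Q = np.outer(y, y) * K
    a = np.zeros(len(y))
    for _ in range(iters):
        a = np.clip(a - lr * (Q @ a - 1.0), 0.0, C)  # gradient step + box projection
    return a

K = np.array([[1.0, -1.0], [-1.0, 1.0]])   # linear kernel of x1 = +1, x2 = -1
y = np.array([1.0, -1.0])
a = svm_dual_projected_gradient(K, y)
```

First-order methods like this scale poorly in the large-sample regime, which is exactly the motivation for the second-order method the abstract describes.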
Transcoding compositionally: using attention to find more generalizable solutions
Title | Transcoding compositionally: using attention to find more generalizable solutions |
Authors | Kris Korrel, Dieuwke Hupkes, Verna Dankers, Elia Bruni |
Abstract | While sequence-to-sequence models have shown remarkable generalization power across several natural language tasks, the solutions they construct are argued to be less compositional than human-like generalization. In this paper, we present seq2attn, a new architecture that is specifically designed to exploit attention to find compositional patterns in the input. In seq2attn, the two standard components of an encoder-decoder model are connected via a transcoder that modulates the information flow between them. We show that seq2attn can successfully generalize, without requiring any additional supervision, on two tasks which are specifically constructed to challenge the compositional skills of neural networks. The solutions found by the model are highly interpretable, allowing easy analysis of both the types of solutions that are found and potential causes for mistakes. We exploit this opportunity to introduce a new paradigm to test compositionality that studies the extent to which a model overgeneralizes when confronted with exceptions. We show that seq2attn exhibits such overgeneralization to a larger degree than a standard sequence-to-sequence model. |
Tasks | |
Published | 2019-06-04 |
URL | https://arxiv.org/abs/1906.01234v2 |
https://arxiv.org/pdf/1906.01234v2.pdf | |
PWC | https://paperswithcode.com/paper/transcoding-compositionally-using-attention |
Repo | |
Framework | |
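The mechanism seq2attn builds on is ordinary scaled dot-product attention, whose weights form the interpretable patterns the abstract mentions. A generic numpy sketch of that mechanism (not the paper's transcoder architecture):

```python
import numpy as np

def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.
    The attention weights are an explicit, inspectable alignment
    between decoder steps and encoder positions."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))  # stable softmax
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ values, weights

q = np.array([[1.0, 0.0]])                   # one query
k = np.array([[1.0, 0.0], [0.0, 1.0]])       # two keys
v = np.array([[1.0], [2.0]])
out, w = attention(q, k, v)                  # weight mass favors the first key
```

Because `w` is an explicit distribution over input positions, architectures that route all information through it, as seq2attn's transcoder does, stay analyzable.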
Decomposing Generalization: Models of Generic, Habitual, and Episodic Statements
Title | Decomposing Generalization: Models of Generic, Habitual, and Episodic Statements |
Authors | Venkata Subrahmanyan Govindarajan, Benjamin Van Durme, Aaron Steven White |
Abstract | We present a novel semantic framework for modeling linguistic expressions of generalization—generic, habitual, and episodic statements—as combinations of simple, real-valued referential properties of predicates and their arguments. We use this framework to construct a dataset covering the entirety of the Universal Dependencies English Web Treebank. We use this dataset to probe the efficacy of type-level and token-level information—including hand-engineered features and static (GloVe) and contextual (ELMo) word embeddings—for predicting expressions of generalization. Data and code are available at decomp.io. |
Tasks | Word Embeddings |
Published | 2019-01-31 |
URL | https://arxiv.org/abs/1901.11429v2 |
https://arxiv.org/pdf/1901.11429v2.pdf | |
PWC | https://paperswithcode.com/paper/decomposing-generalization-models-of-generic |
Repo | |
Framework | |
Deep-MAPS: Machine Learning based Mobile Air Pollution Sensing
Title | Deep-MAPS: Machine Learning based Mobile Air Pollution Sensing |
Authors | Jun Song, Ke Han |
Abstract | Mobile and ubiquitous sensing of urban air quality has received increased attention as an economically and operationally viable means to survey atmospheric environment with high spatial-temporal resolution. This paper proposes a machine learning based mobile air pollution sensing framework, called Deep-MAPS, and demonstrates its scientific and financial values in the following aspects. (1) Based on a network of fixed and mobile air quality sensors, we perform spatial inference of PM2.5 concentrations in Beijing (3,025 km2, 19 Jun-16 Jul 2018) for a spatial-temporal resolution of 1km-by-1km and 1 hour, with over 85% accuracy. (2) We leverage urban big data to generate insights regarding the potential cause of pollution, which facilitates evidence-based sustainable urban management. (3) To achieve such spatial-temporal coverage and accuracy, Deep-MAPS can save up to 90% hardware investment, compared with ubiquitous sensing that relies primarily on fixed sensors. |
Tasks | |
Published | 2019-04-28 |
URL | https://arxiv.org/abs/1904.12303v2 |
https://arxiv.org/pdf/1904.12303v2.pdf | |
PWC | https://paperswithcode.com/paper/exploring-urban-air-quality-with-maps-mobile |
Repo | |
Framework | |
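A classical baseline for the kind of 1 km-by-1 km spatial inference described in point (1) is inverse-distance weighting from the fixed and mobile sensor readings. The sketch below is our own illustrative baseline; Deep-MAPS itself uses a learned model, and all names here are hypothetical.

```python
import numpy as np

def idw_interpolate(sensor_xy, sensor_pm25, grid_xy, power=2.0):
    """Inverse-distance-weighted PM2.5 interpolation: each grid cell is a
    weighted mean of sensor readings, with weights 1 / distance^power."""
    d = np.linalg.norm(grid_xy[:, None, :] - sensor_xy[None, :, :], axis=-1)
    w = 1.0 / np.maximum(d, 1e-9) ** power   # clamp avoids division by zero
    return (w * sensor_pm25).sum(axis=1) / w.sum(axis=1)

sensors = np.array([[0.0, 0.0], [1.0, 0.0]])
readings = np.array([10.0, 30.0])            # ug/m^3 at each sensor
grid = np.array([[0.0, 0.0], [0.5, 0.0]])    # one cell at a sensor, one between
pm = idw_interpolate(sensors, readings, grid)
```

Learned models earn their keep over this baseline by exploiting covariates (traffic, meteorology, urban form), which is where the "urban big data" of point (2) enters.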
User-Interactive Machine Learning Model for Identifying Structural Relationships of Code Features
Title | User-Interactive Machine Learning Model for Identifying Structural Relationships of Code Features |
Authors | Ankit Gupta, Kartik Chugh, Andrea Solis, Thomas LaToza |
Abstract | Traditional machine learning based intelligent systems assist users by learning patterns in data and making recommendations. However, these systems are limited in that the user has little means of understanding the rationale behind the system's suggestions, communicating their own understanding of patterns, or correcting system behavior. In this project, we outline a model for intelligent software based on a human-computer feedback loop. The Machine Learning (ML) system's recommendations are reviewed by the user, and in turn, this information shapes the system's decision making. Our model was applied to developing an HTML editor that integrates ML with user interaction to ascertain structural relationships between HTML document features and apply them for code completion. The editor utilizes the ID3 algorithm to build decision trees, sequences of rules for predicting code the user will type. The editor displays the decision trees' rules in the Interactive Rules Interface System (IRIS), which allows developers to prioritize, modify, or delete them. These interactions alter the data processed by ID3, providing the developer some control over the autocomplete system. Validation indicates that, absent user interaction, the ML model is able to predict tags with 78.4 percent accuracy, attributes with 62.9 percent accuracy, and values with 12.8 percent accuracy. Based on the results of the user study, user interaction with the rules interface corrects feature relationships missed or mistaken by the automated process, enhancing autocomplete accuracy and developer productivity. Additionally, interaction is shown to help developers work with greater awareness of code patterns. Our research demonstrates the viability of a software integration of machine intelligence with human feedback. |
Tasks | Decision Making |
Published | 2019-07-18 |
URL | https://arxiv.org/abs/1907.07679v1 |
https://arxiv.org/pdf/1907.07679v1.pdf | |
PWC | https://paperswithcode.com/paper/user-interactive-machine-learning-model-for |
Repo | |
Framework | |
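ID3, the tree-building algorithm the editor relies on, chooses splits by information gain. A minimal sketch of that criterion with a hypothetical tag-prediction example (the dataset and attribute names are ours, not the paper's):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label list, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """ID3's split criterion: entropy reduction from partitioning on attr."""
    n = len(labels)
    by_value = {}
    for row, lab in zip(rows, labels):
        by_value.setdefault(row[attr], []).append(lab)
    remainder = sum(len(sub) / n * entropy(sub) for sub in by_value.values())
    return entropy(labels) - remainder

rows = [{"tag": "div"}, {"tag": "div"}, {"tag": "span"}, {"tag": "span"}]
labels = ["block", "block", "inline", "inline"]
gain = information_gain(rows, labels, "tag")  # perfect split: gain = full entropy
```

When the developer edits or deletes a rule in IRIS, the training data fed to this criterion changes, which is how user feedback steers the trees.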
Domain Adaptation with Asymmetrically-Relaxed Distribution Alignment
Title | Domain Adaptation with Asymmetrically-Relaxed Distribution Alignment |
Authors | Yifan Wu, Ezra Winston, Divyansh Kaushik, Zachary Lipton |
Abstract | Domain adaptation addresses the common problem in which the target distribution generating our test data drifts from the source (training) distribution. While domain adaptation is impossible absent assumptions, strict conditions, e.g. covariate or label shift, enable principled algorithms. Recently proposed domain-adversarial approaches consist of aligning source and target encodings, often motivating this approach as minimizing two (of three) terms in a theoretical bound on target error. Unfortunately, this minimization can cause arbitrary increases in the third term; e.g., such approaches can break down under shifting label distributions. We propose asymmetrically-relaxed distribution alignment, a new approach that overcomes some limitations of standard domain-adversarial algorithms. Moreover, we characterize precise assumptions under which our algorithm is theoretically principled and demonstrate empirical benefits on both synthetic and real datasets. |
Tasks | Domain Adaptation |
Published | 2019-03-05 |
URL | http://arxiv.org/abs/1903.01689v2 |
http://arxiv.org/pdf/1903.01689v2.pdf | |
PWC | https://paperswithcode.com/paper/domain-adaptation-with-asymmetrically-relaxed |
Repo | |
Framework | |
From Crowdsourcing to Crowdmining: Using Implicit Human Intelligence for Better Understanding of Crowdsourced Data
Title | From Crowdsourcing to Crowdmining: Using Implicit Human Intelligence for Better Understanding of Crowdsourced Data |
Authors | Bin Guo, Huihui Chen, Yan Liu, Chao Chen, Qi Han, Zhiwen Yu |
Abstract | With the development of mobile social networks, more and more crowdsourced data are generated on the Web or collected from real-world sensing. The fragmented, heterogeneous, and noisy nature of online/offline crowdsourced data, however, makes it difficult to understand. Traditional content-based analyzing methods suffer from potential issues such as computational intensiveness and poor performance. To address them, this paper presents CrowdMining. In particular, we observe that the knowledge hidden in the process of data generation, regarding individual/crowd behavior patterns (e.g., mobility patterns, community contexts such as social ties and structure) and crowd-object interaction patterns (flickering or tweeting patterns), is neglected in crowdsourced data mining. Therefore, a novel approach that leverages implicit human intelligence (implicit HI) for crowdsourced data mining and understanding is proposed. Two studies titled CrowdEvent and CrowdRoute are presented to showcase its usage, where implicit HIs are extracted either from online or offline crowdsourced data. A generic model for CrowdMining is further proposed based on a set of existing studies. Experiments based on real-world datasets demonstrate the effectiveness of CrowdMining. |
Tasks | |
Published | 2019-08-07 |
URL | https://arxiv.org/abs/1908.02412v1 |
https://arxiv.org/pdf/1908.02412v1.pdf | |
PWC | https://paperswithcode.com/paper/from-crowdsourcing-to-crowdmining-using |
Repo | |
Framework | |
A new direction to promote the implementation of artificial intelligence in natural clinical settings
Title | A new direction to promote the implementation of artificial intelligence in natural clinical settings |
Authors | Yunyou Huang, Zhifei Zhang, Nana Wang, Nengquan Li, Mengjia Du, Tianshu Hao, Jianfeng Zhan |
Abstract | Artificial intelligence (AI) researchers claim that they have made great achievements in clinical realms. However, clinicians point out that these so-called achievements cannot be implemented in natural clinical settings. The root cause of this huge gap is that many essential features of natural clinical tasks are overlooked by AI system developers without medical background. In this paper, we propose that the clinical benchmark suite is a novel and promising direction to capture the essential features of real-world clinical tasks, hence qualifying itself for guiding the development of AI systems and promoting the implementation of AI in real-world clinical practice. |
Tasks | |
Published | 2019-05-08 |
URL | https://arxiv.org/abs/1905.02940v1 |
https://arxiv.org/pdf/1905.02940v1.pdf | |
PWC | https://paperswithcode.com/paper/a-new-direction-to-promote-the-implementation |
Repo | |
Framework | |
Variance-Reduced Decentralized Stochastic Optimization with Gradient Tracking – Part II: GT-SVRG
Title | Variance-Reduced Decentralized Stochastic Optimization with Gradient Tracking – Part II: GT-SVRG |
Authors | Ran Xin, Usman A. Khan, Soummya Kar |
Abstract | Decentralized stochastic optimization has recently benefited from gradient tracking methods [DSGT_Pu, DSGT_Xin] providing efficient solutions for large-scale empirical risk minimization problems. In Part I [GT_SAGA] of this work, we develop GT-SAGA, which is based on a decentralized implementation of SAGA [SAGA] using gradient tracking, and discuss regimes of practical interest where GT-SAGA outperforms existing decentralized approaches in terms of the total number of local gradient computations. In this paper, we describe GT-SVRG, a decentralized gradient-tracking-based implementation of SVRG [SVRG], another well-known variance-reduction technique. We show that the convergence rate of GT-SVRG matches that of GT-SAGA for smooth and strongly-convex functions and highlight different trade-offs between the two algorithms in various settings. |
Tasks | Stochastic Optimization |
Published | 2019-10-08 |
URL | https://arxiv.org/abs/1910.04057v2 |
https://arxiv.org/pdf/1910.04057v2.pdf | |
PWC | https://paperswithcode.com/paper/variance-reduced-decentralized-stochastic |
Repo | |
Framework | |
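The variance-reduction technique being decentralized here is SVRG, whose inner update corrects each stochastic gradient with a periodically recomputed full gradient at a snapshot. A centralized sketch on a toy least-squares problem (GT-SVRG additionally distributes this across a network with gradient tracking, which this sketch omits):

```python
import numpy as np

def svrg(grad_i, w0, n, lr=0.1, epochs=20, m=None, rng=None):
    """Centralized SVRG: each inner step uses
    grad_i(w) - grad_i(w_snap) + full_grad(w_snap),
    an unbiased gradient whose variance vanishes as w -> w_snap -> w*."""
    rng = rng if rng is not None else np.random.default_rng(0)
    m = m if m is not None else n
    w = w0.copy()
    for _ in range(epochs):
        w_snap = w.copy()
        full = np.mean([grad_i(w_snap, i) for i in range(n)], axis=0)
        for _ in range(m):
            i = int(rng.integers(n))
            w = w - lr * (grad_i(w, i) - grad_i(w_snap, i) + full)
    return w

# toy problem: f_i(w) = 0.5 (w - t_i)^2, so the minimizer is mean(t) = 2.5
targets = np.array([1.0, 2.0, 3.0, 4.0])
w_star = svrg(lambda w, i: w - targets[i], np.zeros(1), n=4)
```

The snapshot/full-gradient structure is what gives SVRG (and hence GT-SVRG) its linear convergence on smooth, strongly-convex objectives.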
Game Design for Eliciting Distinguishable Behavior
Title | Game Design for Eliciting Distinguishable Behavior |
Authors | Fan Yang, Liu Leqi, Yifan Wu, Zachary C. Lipton, Pradeep Ravikumar, William W. Cohen, Tom Mitchell |
Abstract | The ability to infer latent psychological traits from human behavior is key to developing personalized human-interacting machine learning systems. Approaches to infer such traits range from surveys to manually-constructed experiments and games. However, these traditional games are limited because they are typically designed based on heuristics. In this paper, we formulate the task of designing \emph{behavior diagnostic games} that elicit distinguishable behavior as a mutual information maximization problem, which can be solved by optimizing a variational lower bound. Our framework is instantiated by using prospect theory to model varying player traits, and Markov Decision Processes to parameterize the games. We validate our approach empirically, showing that our designed games can successfully distinguish among players with different traits, outperforming manually-designed ones by a large margin. |
Tasks | |
Published | 2019-12-12 |
URL | https://arxiv.org/abs/1912.06074v1 |
https://arxiv.org/pdf/1912.06074v1.pdf | |
PWC | https://paperswithcode.com/paper/game-design-for-eliciting-distinguishable-1 |
Repo | |
Framework | |