Paper Group ANR 942
Federated Evaluation of On-device Personalization. Off-Policy Estimation of Long-Term Average Outcomes with Applications to Mobile Health. Empirical Bayes Method for Boltzmann Machines. Intermittent Demand Forecasting with Deep Renewal Processes. GeoSQA: A Benchmark for Scenario-based Question Answering in the Geography Domain at High School Level. …
Federated Evaluation of On-device Personalization
Title | Federated Evaluation of On-device Personalization |
Authors | Kangkang Wang, Rajiv Mathews, Chloé Kiddon, Hubert Eichner, Françoise Beaufays, Daniel Ramage |
Abstract | Federated learning is a distributed, on-device computation framework that enables training global models without exporting sensitive user data to servers. In this work, we describe methods to extend the federation framework to evaluate strategies for personalization of global models. We present tools to analyze the effects of personalization and evaluate conditions under which personalization yields desirable models. We report on our experiments personalizing a language model for a virtual keyboard for smartphones with a population of tens of millions of users. We show that a significant fraction of users benefit from personalization. |
Tasks | Language Modelling |
Published | 2019-10-22 |
URL | https://arxiv.org/abs/1910.10252v1 |
https://arxiv.org/pdf/1910.10252v1.pdf | |
PWC | https://paperswithcode.com/paper/federated-evaluation-of-on-device |
Repo | |
Framework | |
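A minimal sketch of the evaluation idea from the abstract (not the authors' framework): each client fine-tunes the global model on its own data, scores it on held-out local data before and after, and only the per-client metric deltas are aggregated, so raw data never leaves the device. `ToyClient`, `personalize`, and `evaluate` below are illustrative stand-ins, not names from the paper.

```python
import numpy as np

def federated_personalization_eval(clients, baseline_model, personalize, evaluate):
    """Return the fraction of clients whose held-out metric improves after on-device personalization."""
    deltas = []
    for client in clients:
        train, test = client.local_train_data(), client.local_eval_data()
        before = evaluate(baseline_model, test)
        personal_model = personalize(baseline_model, train)    # runs on the device
        deltas.append(evaluate(personal_model, test) - before) # only this scalar leaves the device
    return sum(d > 0 for d in deltas) / len(deltas), deltas

# Toy usage: a "model" is just a scalar bias, personalization nudges it toward the
# client's local mean, and the metric is negative squared error on held-out points.
rng = np.random.default_rng(0)

class ToyClient:
    def __init__(self, mean):
        self.train, self.test = rng.normal(mean, 1, 20), rng.normal(mean, 1, 20)
    def local_train_data(self): return self.train
    def local_eval_data(self): return self.test

clients = [ToyClient(m) for m in rng.normal(0, 2, 50)]
frac, _ = federated_personalization_eval(
    clients, baseline_model=0.0,
    personalize=lambda model, train: 0.5 * model + 0.5 * float(np.mean(train)),
    evaluate=lambda model, test: -float(np.mean((test - model) ** 2)))
print(f"fraction of clients improved by personalization: {frac:.2f}")
```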
Off-Policy Estimation of Long-Term Average Outcomes with Applications to Mobile Health
Title | Off-Policy Estimation of Long-Term Average Outcomes with Applications to Mobile Health |
Authors | Peng Liao, Predrag Klasnja, Susan Murphy |
Abstract | Due to recent advancements in mobile device and sensing technology, health scientists are increasingly interested in developing mobile health (mHealth) treatments that are delivered to individuals at moments in which they are most effective in influencing the individual’s behavior. The mHealth intervention policies, also called just-in-time adaptive interventions, are decision rules that map an individual’s context to a treatment at each of many time points. Many mHealth interventions are designed for longer-term use; however, their long-term efficacy is not well understood. In this work, we provide an approach for conducting inference about the long-term performance of one or more such policies using historical data collected under a possibly different policy. Our performance measure is the average of proximal outcomes (rewards) over a long time period should the particular mHealth policy be followed. We model the relative value function by a nonparametric function class and develop a coupled, penalized estimator of the average reward. We show that the proposed estimator is asymptotically normal when the number of trajectories goes to infinity. This work is motivated by HeartSteps, an mHealth physical activity intervention. |
Tasks | |
Published | 2019-12-30 |
URL | https://arxiv.org/abs/1912.13088v2 |
https://arxiv.org/pdf/1912.13088v2.pdf | |
PWC | https://paperswithcode.com/paper/off-policy-estimation-of-long-term-average |
Repo | |
Framework | |
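For readers less familiar with the average-reward setting, the quantities in the abstract can be written in standard notation (generic notation, not necessarily the paper's exact formulation):

```latex
% Long-run average reward of a policy \pi and its relative (differential)
% action-value function; the estimator in the paper targets \eta^{\pi}.
\[
  \eta^{\pi} \;=\; \lim_{T \to \infty} \frac{1}{T}\,
      \mathbb{E}_{\pi}\!\left[\sum_{t=1}^{T} R_t\right],
  \qquad
  Q^{\pi}(s, a) \;=\; \mathbb{E}_{\pi}\!\left[\sum_{t=1}^{\infty}
      \bigl(R_t - \eta^{\pi}\bigr) \,\middle|\, S_1 = s,\ A_1 = a\right].
\]
% The pair (\eta^{\pi}, Q^{\pi}) satisfies the average-reward Bellman equation
\[
  Q^{\pi}(s, a) + \eta^{\pi}
  \;=\; \mathbb{E}\!\left[\, R_1 + Q^{\pi}(S_2, A_2) \,\middle|\, S_1 = s,\ A_1 = a \right],
  \qquad A_2 \sim \pi(\cdot \mid S_2),
\]
% which is what makes a coupled estimator of the average reward and the
% relative value function natural.
```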
Empirical Bayes Method for Boltzmann Machines
Title | Empirical Bayes Method for Boltzmann Machines |
Authors | Muneki Yasuda, Tomoyuki Obuchi |
Abstract | In this study, we consider an empirical Bayes method for Boltzmann machines and propose an algorithm for it. The empirical Bayes method allows estimation of the values of the hyperparameters of the Boltzmann machine by maximizing a specific likelihood function referred to as the empirical Bayes likelihood function in this study. However, the maximization is computationally hard because the empirical Bayes likelihood function involves intractable integrations of the partition function. The proposed algorithm avoids this computational problem by using the replica method and the Plefka expansion. Our method does not require any iterative procedures and is quite simple and fast, though it introduces a bias to the estimate, which exhibits an unnatural behavior with respect to the size of the dataset. This peculiar behavior is supposed to be due to the approximate treatment by the Plefka expansion. A possible extension to overcome this behavior is also discussed. |
Tasks | |
Published | 2019-06-14 |
URL | https://arxiv.org/abs/1906.06002v2 |
https://arxiv.org/pdf/1906.06002v2.pdf | |
PWC | https://paperswithcode.com/paper/empirical-bayes-method-for-boltzmann-machines |
Repo | |
Framework | |
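In standard notation (again generic, not necessarily the paper's), the empirical Bayes likelihood mentioned in the abstract is the marginal likelihood of the data with the Boltzmann machine parameters integrated out under a hyperparameter-indexed prior:

```latex
% Boltzmann machine over x in {-1,+1}^n with couplings J and biases h.
\[
  p(x \mid J, h) \;=\; \frac{1}{Z(J, h)}
     \exp\!\Bigl(\textstyle\sum_{i<j} J_{ij} x_i x_j + \sum_i h_i x_i\Bigr),
  \qquad
  Z(J, h) \;=\; \sum_{x} \exp\!\Bigl(\textstyle\sum_{i<j} J_{ij} x_i x_j + \sum_i h_i x_i\Bigr).
\]
% Empirical Bayes likelihood for a dataset D = {x^(1), ..., x^(N)} under a prior
% p(J, h | theta) with hyperparameters theta, maximized to estimate theta:
\[
  L_{\mathrm{EB}}(\theta)
  \;=\; \int \Bigl[\prod_{\mu=1}^{N} p\bigl(x^{(\mu)} \mid J, h\bigr)\Bigr]\,
        p(J, h \mid \theta)\, \mathrm{d}J\, \mathrm{d}h,
  \qquad
  \hat{\theta} \;=\; \arg\max_{\theta}\, L_{\mathrm{EB}}(\theta).
\]
% Each likelihood factor carries 1/Z(J, h), an intractable sum over 2^n states,
% which is the integration the replica method and the Plefka expansion approximate.
```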
Intermittent Demand Forecasting with Deep Renewal Processes
Title | Intermittent Demand Forecasting with Deep Renewal Processes |
Authors | Ali Caner Turkmen, Yuyang Wang, Tim Januschowski |
Abstract | Intermittent demand, where demand occurrences appear sporadically in time, is a common and challenging problem in forecasting. In this paper, we first make the connections between renewal processes and a collection of current models used for intermittent demand forecasting. We then develop a set of models that benefit from recurrent neural networks to parameterize conditional interdemand time and size distributions, building on the latest paradigm in “deep” temporal point processes. We present favorable empirical findings on discrete and continuous time intermittent demand data, validating the practical value of our approach. |
Tasks | Point Processes |
Published | 2019-11-23 |
URL | https://arxiv.org/abs/1911.10416v1 |
https://arxiv.org/pdf/1911.10416v1.pdf | |
PWC | https://paperswithcode.com/paper/intermittent-demand-forecasting-with-deep |
Repo | |
Framework | |
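A minimal sketch of the modelling idea, under stated assumptions: an RNN reads the history of (inter-demand time, size) pairs and emits parameters of the next-interval and next-size distributions, trained by negative log-likelihood. The geometric/negative-binomial choice, layer sizes, and names are illustrative, not the paper's exact parameterization.

```python
import torch
import torch.nn as nn
from torch.distributions import Geometric, NegativeBinomial

class DeepRenewalSketch(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.rnn = nn.GRU(input_size=2, hidden_size=hidden, batch_first=True)
        self.interval_head = nn.Linear(hidden, 1)   # logits of a geometric over the next inter-demand gap
        self.size_head = nn.Linear(hidden, 2)       # (log total_count, logits) of a neg. binomial over size

    def forward(self, x):
        # x: (batch, steps, 2) with columns [inter-demand time, demand size]
        h, _ = self.rnn(x)
        interval_dist = Geometric(logits=self.interval_head(h).squeeze(-1))
        size_params = self.size_head(h)
        size_dist = NegativeBinomial(size_params[..., 0].exp(), logits=size_params[..., 1])
        return interval_dist, size_dist

def nll(model, x, next_gap, next_size):
    # next_gap is zero-based (gap minus one) so it lies in the geometric support {0, 1, ...}.
    interval_dist, size_dist = model(x)
    return -(interval_dist.log_prob(next_gap) + size_dist.log_prob(next_size)).mean()

# Toy usage on random data, just to show the shapes.
x = torch.rand(8, 12, 2)
loss = nll(DeepRenewalSketch(), x, next_gap=torch.zeros(8, 12), next_size=torch.ones(8, 12))
print(float(loss))
```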
GeoSQA: A Benchmark for Scenario-based Question Answering in the Geography Domain at High School Level
Title | GeoSQA: A Benchmark for Scenario-based Question Answering in the Geography Domain at High School Level |
Authors | Zixian Huang, Yulin Shen, Xiao Li, Yuang Wei, Gong Cheng, Lin Zhou, Xinyu Dai, Yuzhong Qu |
Abstract | Scenario-based question answering (SQA) has attracted increasing research attention. It typically requires retrieving and integrating knowledge from multiple sources, and applying general knowledge to a specific case described by a scenario. SQA widely exists in the medical, geography, and legal domains, both in practice and in exams. In this paper, we introduce the GeoSQA dataset. It consists of 1,981 scenarios and 4,110 multiple-choice questions in the geography domain at high school level, where diagrams (e.g., maps, charts) have been manually annotated with natural language descriptions to benefit NLP research. Benchmark results on a variety of state-of-the-art methods for question answering, textual entailment, and reading comprehension demonstrate the unique challenges presented by SQA for future research. |
Tasks | Natural Language Inference, Question Answering, Reading Comprehension |
Published | 2019-08-20 |
URL | https://arxiv.org/abs/1908.07855v1 |
https://arxiv.org/pdf/1908.07855v1.pdf | |
PWC | https://paperswithcode.com/paper/190807855 |
Repo | |
Framework | |
Visually Grounded Generation of Entailments from Premises
Title | Visually Grounded Generation of Entailments from Premises |
Authors | Somaye Jafaritazehjani, Albert Gatt, Marc Tanti |
Abstract | Natural Language Inference (NLI) is the task of determining the semantic relationship between a premise and a hypothesis. In this paper, we focus on the *generation* of hypotheses from premises in a multimodal setting, to generate a sentence (hypothesis) given an image and/or its description (premise) as the input. The main goals of this paper are (a) to investigate whether it is reasonable to frame NLI as a generation task; and (b) to consider the degree to which grounding textual premises in visual information is beneficial to generation. We compare different neural architectures, showing through automatic and human evaluation that entailments can indeed be generated successfully. We also show that multimodal models outperform unimodal models in this task, albeit marginally. |
Tasks | Natural Language Inference |
Published | 2019-09-21 |
URL | https://arxiv.org/abs/1909.09788v1 |
https://arxiv.org/pdf/1909.09788v1.pdf | |
PWC | https://paperswithcode.com/paper/190909788 |
Repo | |
Framework | |
Improving Background Based Conversation with Context-aware Knowledge Pre-selection
Title | Improving Background Based Conversation with Context-aware Knowledge Pre-selection |
Authors | Yangjun Zhang, Pengjie Ren, Maarten de Rijke |
Abstract | Background Based Conversations (BBCs) have been developed to make dialogue systems generate more informative and natural responses by leveraging background knowledge. Existing methods for BBCs can be grouped into two categories: extraction-based methods and generation-based methods. The former extract spans from background material as responses that are not necessarily natural. The latter generate responses that are natural but not necessarily effective in leveraging background knowledge. In this paper, we focus on generation-based methods and propose a model, namely Context-aware Knowledge Pre-selection (CaKe), which introduces a pre-selection process that uses dynamic bi-directional attention to improve knowledge selection by using the utterance history context as prior information to select the most relevant background material. Experimental results show that our model is superior to current state-of-the-art baselines, indicating that it benefits from the pre-selection process, thus improving informativeness and fluency. |
Tasks | |
Published | 2019-06-16 |
URL | https://arxiv.org/abs/1906.06685v1 |
https://arxiv.org/pdf/1906.06685v1.pdf | |
PWC | https://paperswithcode.com/paper/improving-background-based-conversation-with |
Repo | |
Framework | |
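A minimal sketch of the pre-selection step as described in the abstract (the full CaKe model is more involved): score each background sentence against the utterance-history context with a simple bi-directional attention and keep the top-k sentences before generation. Embeddings and names below are illustrative placeholders.

```python
import numpy as np

def preselect_background(context_vecs, sentence_vecs_list, k=2):
    # context_vecs: (n_ctx_tokens, d) token embeddings of the dialogue history;
    # sentence_vecs_list: one (n_tokens, d) array per background sentence.
    scores = []
    for sent in sentence_vecs_list:
        affinity = context_vecs @ sent.T              # token-level affinities
        ctx_to_sent = affinity.max(axis=1).mean()     # how well the sentence covers the context
        sent_to_ctx = affinity.max(axis=0).mean()     # how well the context covers the sentence
        scores.append(ctx_to_sent + sent_to_ctx)      # bi-directional score
    top = np.argsort(scores)[::-1][:k]
    return sorted(top.tolist())                       # indices of the pre-selected sentences

# Toy usage with random embeddings.
rng = np.random.default_rng(0)
ctx = rng.normal(size=(10, 8))
sentences = [rng.normal(size=(n, 8)) for n in (5, 7, 6)]
print(preselect_background(ctx, sentences))
```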
Detecting Machine-Translated Text using Back Translation
Title | Detecting Machine-Translated Text using Back Translation |
Authors | Hoang-Quoc Nguyen-Son, Tran Phuong Thao, Seira Hidano, Shinsaku Kiyomoto |
Abstract | Machine-translated text plays a crucial role in the communication of people using different languages. However, adversaries can use such text for malicious purposes such as plagiarism and fake reviews. Existing methods detect machine-translated text using only the text’s intrinsic content, but they are unsuitable for distinguishing machine-translated from human-written texts that have the same meaning. We propose a method that extracts features for distinguishing machine from human text based on the similarity between the original text and its back-translation. Evaluation on detecting sentences translated with French shows that our method achieves 75.0% in both accuracy and F-score, outperforming existing methods whose best accuracy is 62.8% and best F-score is 62.7%. The proposed method detects back-translated text even more effectively, with 83.4% accuracy, which is higher than the best previous accuracy of 66.7%. We achieve similar results for the F-score and in analogous experiments with Japanese. Moreover, we show that our detector can recognize both machine-translated and machine-back-translated texts without knowing the language used to generate them. This demonstrates the robustness of our method across applications in both low- and rich-resource languages. |
Tasks | |
Published | 2019-10-15 |
URL | https://arxiv.org/abs/1910.06558v1 |
https://arxiv.org/pdf/1910.06558v1.pdf | |
PWC | https://paperswithcode.com/paper/detecting-machine-translated-text-using-back |
Repo | |
Framework | |
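A minimal sketch of the feature described in the abstract, under stated assumptions: round-trip a sentence through a pivot language, measure how similar the back-translation is to the original (a BLEU-style score is one natural choice; the paper's exact features may differ), and feed that score to a simple classifier. The `translate` function is a hypothetical stand-in for whatever MT system is available.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from sklearn.linear_model import LogisticRegression

def back_translation_similarity(text, translate, pivot="fr"):
    # Round-trip English -> pivot -> English and score how much of the original survives.
    round_trip = translate(translate(text, src="en", tgt=pivot), src=pivot, tgt="en")
    smooth = SmoothingFunction().method1
    return sentence_bleu([text.split()], round_trip.split(), smoothing_function=smooth)

# Usage sketch: one similarity feature per sentence; label 1 for machine-translated,
# 0 for human-written. Machine-translated text tends to survive the round trip more
# faithfully than human-written text, which is what the classifier picks up on.
# X = [[back_translation_similarity(s, translate)] for s in sentences]
# clf = LogisticRegression().fit(X, labels)
```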
Training products of expert capsules with mixing by dynamic routing
Title | Training products of expert capsules with mixing by dynamic routing |
Authors | Michael Hauser |
Abstract | This study develops an unsupervised learning algorithm for products of expert capsules with dynamic routing. Analogous to binary-valued neurons in Restricted Boltzmann Machines, the magnitude of a squashed capsule firing takes values between zero and one, representing the probability of the capsule being on. This analogy motivates the design of an energy function for capsule networks. In order to have an efficient sampling procedure where hidden layer nodes are not connected, the energy function is made consistent with dynamic routing in the sense of the probability of a capsule firing, and inference on the capsule network is computed with the dynamic routing between capsules procedure. In order to optimize the log-likelihood of the visible layer capsules, the gradient is found in terms of this energy function. The developed unsupervised learning algorithm is used to train a capsule network on standard vision datasets, and is able to generate realistic looking images from its learned distribution. |
Tasks | |
Published | 2019-07-26 |
URL | https://arxiv.org/abs/1907.11643v1 |
https://arxiv.org/pdf/1907.11643v1.pdf | |
PWC | https://paperswithcode.com/paper/training-products-of-expert-capsules-with |
Repo | |
Framework | |
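For context, a short sketch of the two ingredients the abstract refers to: the squashing nonlinearity whose output magnitude is read as the probability of a capsule being on, and routing-by-agreement between capsule layers (standard dynamic routing in the sense of Sabour et al., 2017; the paper's energy-based training is built on top of this inference step).

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    # Keeps the direction of s and maps its magnitude into [0, 1).
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def dynamic_routing(u_hat, n_iters=3):
    # u_hat: predictions from lower capsules for upper capsules, shape (n_lower, n_upper, dim).
    b = np.zeros(u_hat.shape[:2])                              # routing logits
    for _ in range(n_iters):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)   # coupling coefficients (softmax over upper capsules)
        v = squash((c[..., None] * u_hat).sum(axis=0))         # (n_upper, dim) upper-capsule outputs
        b = b + (u_hat * v[None]).sum(axis=-1)                 # agreement update
    return v

# Toy usage: 6 lower capsules routing to 3 upper capsules of dimension 4.
print(dynamic_routing(np.random.default_rng(0).normal(size=(6, 3, 4))).shape)
```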
An Empirical Study on Learning Fairness Metrics for COMPAS Data with Human Supervision
Title | An Empirical Study on Learning Fairness Metrics for COMPAS Data with Human Supervision |
Authors | Hanchen Wang, Nina Grgic-Hlaca, Preethi Lahoti, Krishna P. Gummadi, Adrian Weller |
Abstract | The notion of individual fairness requires that similar people receive similar treatment. However, this is hard to achieve in practice because it is difficult to specify the appropriate similarity metric. In this work, we attempt to learn such a similarity metric from human-annotated data. We gather a new dataset of human judgments on a criminal recidivism prediction (COMPAS) task. By assuming that the human supervision obeys the principle of individual fairness, we leverage prior work on metric learning, evaluate the performance of several metric learning methods on our dataset, and show that the learned metrics outperform the Euclidean and Precision metrics under various criteria. We do not provide a way to directly learn a similarity metric that satisfies individual fairness; rather, we provide an empirical study of how a similarity metric can be derived from human supervision, which future work can use as a tool to understand human supervision. |
Tasks | Metric Learning |
Published | 2019-10-22 |
URL | https://arxiv.org/abs/1910.10255v2 |
https://arxiv.org/pdf/1910.10255v2.pdf | |
PWC | https://paperswithcode.com/paper/an-empirical-study-on-learning-fairness |
Repo | |
Framework | |
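A minimal sketch of one simple way to turn pairwise human judgments into a metric (a diagonal-Mahalanobis baseline for illustration; the paper evaluates several established metric learning methods, not this one): regress the annotators' similar/dissimilar labels on per-feature absolute differences and read per-feature distance scales off the fitted weights.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_diagonal_metric(pairs_a, pairs_b, similar):
    # pairs_a, pairs_b: (n_pairs, d) feature vectors of the two cases in each pair;
    # similar: 1 if annotators judged the pair should receive similar treatment.
    diffs = np.abs(pairs_a - pairs_b)
    clf = LogisticRegression().fit(diffs, similar)
    # Larger differences should lower P(similar), so coefficients are typically
    # negative; their clipped negation acts as a per-feature importance weight.
    weights = np.maximum(-clf.coef_.ravel(), 0.0)
    def distance(x, y):
        return float(np.sqrt(weights @ (x - y) ** 2))
    return distance

# Toy usage: pairs that are close on feature 0 tend to be labelled similar.
rng = np.random.default_rng(0)
a, b = rng.normal(size=(200, 3)), rng.normal(size=(200, 3))
labels = (np.abs(a[:, 0] - b[:, 0]) < 0.5).astype(int)
dist = fit_diagonal_metric(a, b, labels)
print(dist(a[0], b[0]))
```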
Asymptotic nonparametric statistical analysis of stationary time series
Title | Asymptotic nonparametric statistical analysis of stationary time series |
Authors | Daniil Ryabko |
Abstract | Stationarity is a very general, qualitative assumption that can be assessed on the basis of application specifics. It is thus a rather attractive assumption to base statistical analysis on, especially for problems for which less general qualitative assumptions, such as independence or finite memory, clearly fail. However, it has long been considered too general to allow for statistical inference to be made. One of the reasons for this is that rates of convergence, even of frequencies to the mean, are not available under this assumption alone. Recently, it has been shown that, while some natural and simple problems, such as homogeneity, are indeed provably impossible to solve if one only assumes that the data is stationary (or stationary ergodic), many others can be solved using rather simple and intuitive algorithms. The latter problems include clustering and change point estimation. In this volume I summarize these results. The emphasis is on asymptotic consistency, since this is the strongest property one can obtain assuming stationarity alone. While for most of the problems for which a solution is found this solution is algorithmically realizable, the main objective in this area of research, an objective which is only partially attained, is to understand what is possible and what is not possible to do for stationary time series. The considered problems include homogeneity testing, clustering with respect to distribution, clustering with respect to independence, change-point estimation, identity testing, and the general question of composite hypotheses testing. For the latter problem, a topological criterion for the existence of a consistent test is presented. In addition, several open questions are discussed. |
Tasks | Time Series |
Published | 2019-03-30 |
URL | http://arxiv.org/abs/1904.00173v1 |
http://arxiv.org/pdf/1904.00173v1.pdf | |
PWC | https://paperswithcode.com/paper/asymptotic-nonparametric-statistical-analysis |
Repo | |
Framework | |
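The procedures alluded to in the abstract (homogeneity testing, clustering by distribution, change-point estimation) are typically built on an empirical distributional distance between sequences. A minimal sketch of that distance, in my paraphrase of the standard construction rather than code from the monograph:

```python
from collections import Counter

def empirical_distributional_distance(x, y, max_len=3):
    # Compare empirical frequencies of all words up to length max_len, with
    # geometrically decaying weight on longer words; consistency holds for
    # stationary ergodic sequences as their lengths grow.
    d = 0.0
    for k in range(1, max_len + 1):
        fx = Counter(tuple(x[i:i + k]) for i in range(len(x) - k + 1))
        fy = Counter(tuple(y[i:i + k]) for i in range(len(y) - k + 1))
        nx, ny = sum(fx.values()), sum(fy.values())
        d += 2.0 ** (-k) * sum(abs(fx[w] / nx - fy[w] / ny) for w in set(fx) | set(fy))
    return d

# Usage sketch: asymptotically consistent clustering or homogeneity tests can be
# built by comparing sequences (or segments) through this distance.
print(empirical_distributional_distance("0101010101", "0011001100"))
```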
Learning a faceted customer segmentation for discovering new business opportunities at Intel
Title | Learning a faceted customer segmentation for discovering new business opportunities at Intel |
Authors | Itay Lieder, Meirav Segal, Eran Avidan, Asaf Cohen, Tom Hope |
Abstract | For sales and marketing organizations within large enterprises, identifying and understanding new markets, customers and partners is a key challenge. Intel’s Sales and Marketing Group (SMG) faces similar challenges while growing in new markets and domains and evolving its existing business. In today’s complex technological and commercial landscape, there is need for intelligent automation supporting a fine-grained understanding of businesses in order to help SMG sift through millions of companies across many geographies and languages and identify relevant directions. We present a system developed in our company that mines millions of public business web pages, and extracts a faceted customer representation. We focus on two key customer aspects that are essential for finding relevant opportunities: industry segments (ranging from broad verticals such as healthcare, to more specific fields such as ‘video analytics’) and functional roles (e.g., ‘manufacturer’ or ‘retail’). To address the challenge of labeled data collection, we enrich our data with external information gleaned from Wikipedia, and develop a semi-supervised multi-label, multi-lingual deep learning model that parses customer website texts and classifies them into their respective facets. Our system scans and indexes companies as part of a large-scale knowledge graph that currently holds tens of millions of connected entities with thousands being fetched, enriched and connected to the graph by the hour in real time, and also supports knowledge and insight discovery. In experiments conducted in our company, we are able to significantly boost the performance of sales personnel in the task of discovering new customers and commercial partnership opportunities. |
Tasks | |
Published | 2019-11-27 |
URL | https://arxiv.org/abs/1912.00778v1 |
https://arxiv.org/pdf/1912.00778v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-a-faceted-customer-segmentation-for |
Repo | |
Framework | |
Automatic Model Parallelism for Deep Neural Networks with Compiler and Hardware Support
Title | Automatic Model Parallelism for Deep Neural Networks with Compiler and Hardware Support |
Authors | Sanket Tavarageri, Srinivas Sridharan, Bharat Kaul |
Abstract | Deep neural networks (DNNs) have been enormously successful in tasks that were hitherto in the human-only realm, such as image recognition and language translation. Owing to their success, DNNs are being explored for use in ever more sophisticated tasks. One way DNNs are made to scale for these complex undertakings is by increasing their size: deeper and wider networks can model the additional complexity well. Such large models are trained using model parallelism on multiple compute devices such as multi-GPU and multi-node systems. In this paper, we develop a compiler-driven approach to achieve model parallelism. We model the computation and communication costs of a dataflow graph that embodies the neural network training process and then partition the graph using heuristics so that the communication between compute devices is minimized and the load is well balanced. Hardware scheduling assistants are proposed to assist the compiler in fine-tuning the distribution of work at runtime. |
Tasks | |
Published | 2019-06-11 |
URL | https://arxiv.org/abs/1906.08168v1 |
https://arxiv.org/pdf/1906.08168v1.pdf | |
PWC | https://paperswithcode.com/paper/automatic-model-parallelism-for-deep-neural |
Repo | |
Framework | |
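A minimal sketch of the kind of cost-driven placement the abstract describes (a toy greedy heuristic, not the paper's partitioner): operators are placed one by one on the device where the sum of the current load, the operator's compute cost, and the communication cost of edges that would cross devices is smallest. The costs and graph below are illustrative.

```python
def greedy_partition(nodes, edges, compute_cost, comm_cost, n_devices=2, alpha=1.0):
    # nodes are assumed to be in topological order of the dataflow graph.
    load = [0.0] * n_devices
    placement = {}
    for node in nodes:
        best_dev, best_score = 0, float("inf")
        for dev in range(n_devices):
            cross = sum(comm_cost[(u, v)] for (u, v) in edges
                        if v == node and u in placement and placement[u] != dev)
            score = load[dev] + compute_cost[node] + alpha * cross
            if score < best_score:
                best_dev, best_score = dev, score
        placement[node] = best_dev
        load[best_dev] += compute_cost[node]
    return placement

# Toy usage on a four-operator chain a -> b -> c -> d.
nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("c", "d")]
compute = {"a": 1.0, "b": 4.0, "c": 4.0, "d": 1.0}
comm = {e: 2.0 for e in edges}
print(greedy_partition(nodes, edges, compute, comm))
```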
Evaluating time series forecasting models: An empirical study on performance estimation methods
Title | Evaluating time series forecasting models: An empirical study on performance estimation methods |
Authors | Vitor Cerqueira, Luis Torgo, Igor Mozetic |
Abstract | Performance estimation aims at estimating the loss that a predictive model will incur on unseen data. These procedures are part of the pipeline in every machine learning project and are used for assessing the overall generalisation ability of predictive models. In this paper we address the application of these methods to time series forecasting tasks. For independent and identically distributed data the most common approach is cross-validation. However, the dependency among observations in time series raises some caveats about the most appropriate way to estimate performance in this type of data and currently there is no settled way to do so. We compare different variants of cross-validation and of out-of-sample approaches using two case studies: One with 62 real-world time series and another with three synthetic time series. Results show noticeable differences in the performance estimation methods in the two scenarios. In particular, empirical experiments suggest that cross-validation approaches can be applied to stationary time series. However, in real-world scenarios, when different sources of non-stationary variation are at play, the most accurate estimates are produced by out-of-sample methods that preserve the temporal order of observations. |
Tasks | Time Series, Time Series Forecasting |
Published | 2019-05-28 |
URL | https://arxiv.org/abs/1905.11744v1 |
https://arxiv.org/pdf/1905.11744v1.pdf | |
PWC | https://paperswithcode.com/paper/evaluating-time-series-forecasting-models-an |
Repo | |
Framework | |
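A minimal sketch of the comparison the paper makes, on toy data: estimate forecasting error with shuffled k-fold cross-validation versus an out-of-sample scheme that preserves temporal order. On non-stationary series the two estimates can diverge, which is the paper's central point. The random-walk data and lag features below are illustrative.

```python
import numpy as np
from sklearn.model_selection import KFold, TimeSeriesSplit, cross_val_score
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
y = np.cumsum(rng.normal(size=300))                              # a non-stationary random walk
X = np.column_stack([np.roll(y, k) for k in (1, 2, 3)])[3:]      # three lag features
y = y[3:]

model = LinearRegression()
cv_mse = -cross_val_score(model, X, y, scoring="neg_mean_squared_error",
                          cv=KFold(n_splits=5, shuffle=True, random_state=0)).mean()
oos_mse = -cross_val_score(model, X, y, scoring="neg_mean_squared_error",
                           cv=TimeSeriesSplit(n_splits=5)).mean()
print(f"shuffled CV estimate: {cv_mse:.3f}   temporal out-of-sample estimate: {oos_mse:.3f}")
```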
Isolation Kernel: The X Factor in Efficient and Effective Large Scale Online Kernel Learning
Title | Isolation Kernel: The X Factor in Efficient and Effective Large Scale Online Kernel Learning |
Authors | Kai Ming Ting, Jonathan R. Wells, Takashi Washio |
Abstract | Large scale online kernel learning aims to build an efficient and scalable kernel-based predictive model incrementally from a sequence of potentially infinite data points. A current key approach focuses on ways to produce an approximate finite-dimensional feature map, assuming that the kernel used has a feature map with intractable dimensionality, an assumption traditionally held in kernel-based methods. While this approach can deal with large scale datasets efficiently, it achieves this outcome by compromising predictive accuracy because of the approximation. We offer an alternative approach that overrides this assumption and puts the kernel used at the heart of the method. It focuses on creating an exact, sparse and finite-dimensional feature map of a kernel called Isolation Kernel. With this new approach, achieving the above aim of large scale online kernel learning becomes extremely simple: simply use Isolation Kernel instead of a kernel having a feature map with intractable dimensionality. We show that, using Isolation Kernel, large scale online kernel learning can be achieved efficiently without sacrificing accuracy. |
Tasks | |
Published | 2019-07-02 |
URL | https://arxiv.org/abs/1907.01104v2 |
https://arxiv.org/pdf/1907.01104v2.pdf | |
PWC | https://paperswithcode.com/paper/isolation-kernel-the-x-factor-in-efficient |
Repo | |
Framework | |
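A minimal sketch of an Isolation-Kernel style feature map (the general idea via random Voronoi partitions; the authors' construction and implementation details may differ): each of t partitionings samples psi reference points and assigns every instance to its nearest sample, and the feature map one-hot encodes the cell index, so the dot product of two feature vectors is the fraction of partitionings in which the points share a cell.

```python
import numpy as np

def isolation_feature_map(X, reference, t=100, psi=16, seed=0):
    # Returns a finite-dimensional map phi(X) of shape (n, t * psi) with exactly
    # t non-zeros per row, scaled so that <phi(x), phi(x)> = 1.
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    phi = np.zeros((n, t * psi))
    for i in range(t):
        centers = reference[rng.choice(reference.shape[0], size=psi, replace=False)]
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)   # squared distances to the psi samples
        phi[np.arange(n), i * psi + d.argmin(axis=1)] = 1.0        # one-hot of the Voronoi cell
    return phi / np.sqrt(t)

# Usage sketch: the explicit, sparse map can be fed directly to a linear online
# learner, which is what makes large scale online kernel learning simple here.
data = np.random.default_rng(1).normal(size=(200, 5))
phi = isolation_feature_map(data, reference=data)
print(phi.shape, float(phi[0] @ phi[0]))   # (200, 1600) and self-similarity 1.0
```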