January 28, 2020

3092 words 15 mins read

Paper Group ANR 942


Federated Evaluation of On-device Personalization. Off-Policy Estimation of Long-Term Average Outcomes with Applications to Mobile Health. Empirical Bayes Method for Boltzmann Machines. Intermittent Demand Forecasting with Deep Renewal Processes. GeoSQA: A Benchmark for Scenario-based Question Answering in the Geography Domain at High School Level. …

Federated Evaluation of On-device Personalization

Title Federated Evaluation of On-device Personalization
Authors Kangkang Wang, Rajiv Mathews, Chloé Kiddon, Hubert Eichner, Françoise Beaufays, Daniel Ramage
Abstract Federated learning is a distributed, on-device computation framework that enables training global models without exporting sensitive user data to servers. In this work, we describe methods to extend the federation framework to evaluate strategies for personalization of global models. We present tools to analyze the effects of personalization and evaluate conditions under which personalization yields desirable models. We report on our experiments personalizing a language model for a virtual keyboard for smartphones with a population of tens of millions of users. We show that a significant fraction of users benefit from personalization.
Tasks Language Modelling
Published 2019-10-22
URL https://arxiv.org/abs/1910.10252v1
PDF https://arxiv.org/pdf/1910.10252v1.pdf
PWC https://paperswithcode.com/paper/federated-evaluation-of-on-device
Repo
Framework
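The evaluation loop described in the abstract can be sketched as follows: each client fine-tunes the global model on its local data and reports only the change in its evaluation metric, so raw data never leaves the device. This is a toy illustration with a one-parameter model and synthetic clients, not the paper's actual framework, model, or metrics.

```python
import random

def local_eval(model_bias, data):
    # mean squared error of a constant predictor on the client's data
    return sum((x - model_bias) ** 2 for x in data) / len(data)

def personalize(model_bias, train, lr=0.5):
    # one step of local fine-tuning: move the bias toward the client mean
    client_mean = sum(train) / len(train)
    return model_bias + lr * (client_mean - model_bias)

def federated_personalization_eval(global_bias, clients):
    """Each client fine-tunes locally and reports only a metric delta."""
    deltas = []
    for train, test in clients:
        before = local_eval(global_bias, test)
        after = local_eval(personalize(global_bias, train), test)
        deltas.append(before - after)   # positive = personalization helped
    return deltas

random.seed(0)
clients = []
for _ in range(100):
    mu = random.gauss(0, 2)             # per-client shift from the global mean
    data = [random.gauss(mu, 1) for _ in range(20)]
    clients.append((data[:10], data[10:]))

deltas = federated_personalization_eval(0.0, clients)
helped = sum(d > 0 for d in deltas) / len(deltas)
print(f"fraction of clients improved: {helped:.2f}")
```

The server only ever sees the list of metric deltas, which is what makes this compatible with the federated setting.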

Off-Policy Estimation of Long-Term Average Outcomes with Applications to Mobile Health

Title Off-Policy Estimation of Long-Term Average Outcomes with Applications to Mobile Health
Authors Peng Liao, Predrag Klasnja, Susan Murphy
Abstract Due to recent advancements in mobile device and sensing technology, health scientists are increasingly interested in developing mobile health (mHealth) treatments that are delivered to individuals at moments in which they are most effective in influencing the individual’s behavior. The mHealth intervention policies, also called just-in-time adaptive interventions, are decision rules that map an individual’s context to a treatment at each of many time points. Many mHealth interventions are designed for longer-term use, however their long-term efficacy is not well understood. In this work, we provide an approach for conducting inference about the long-term performance of one or more such policies using historical data collected under a possibly different policy. Our performance measure is the average of proximal outcomes (rewards) over a long time period should the particular mHealth policy be followed. We model the relative value function by a nonparametric function class and develop a coupled, penalized estimator of the average reward. We show that the proposed estimator is asymptotically normal when the number of trajectories goes to infinity. This work is motivated by HeartSteps, an mHealth physical activity intervention.
Tasks
Published 2019-12-30
URL https://arxiv.org/abs/1912.13088v2
PDF https://arxiv.org/pdf/1912.13088v2.pdf
PWC https://paperswithcode.com/paper/off-policy-estimation-of-long-term-average
Repo
Framework
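A minimal illustration of the underlying off-policy idea: estimate the average reward a new policy would obtain using data logged under a different policy, by importance-weighting each observed reward. The self-normalized stepwise estimator below is a textbook baseline, not the paper's coupled, penalized estimator built on a nonparametric relative value function.

```python
import random

def off_policy_avg_reward(trajectory, pi_eval, pi_behavior):
    """Self-normalized importance-sampling estimate of average reward.

    trajectory: list of (state, action, reward) collected under pi_behavior.
    """
    num, den = 0.0, 0.0
    for s, a, r in trajectory:
        w = pi_eval(a, s) / pi_behavior(a, s)   # importance ratio
        num += w * r
        den += w
    return num / den

random.seed(1)
# behavior policy: uniform over {0, 1}; evaluation policy prefers action 1
pi_b = lambda a, s: 0.5
pi_e = lambda a, s: 0.8 if a == 1 else 0.2
# toy rewards: action 1 yields ~1, action 0 yields ~0 (state is ignored here)
traj = []
for t in range(10000):
    a = random.randint(0, 1)
    r = a + random.gauss(0, 0.1)
    traj.append((t, a, r))

est = off_policy_avg_reward(traj, pi_e, pi_b)
print(f"estimated average reward under pi_e: {est:.3f}")
```

Under the evaluation policy the true average reward in this toy is 0.8, and the estimate should land close to it.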

Empirical Bayes Method for Boltzmann Machines

Title Empirical Bayes Method for Boltzmann Machines
Authors Muneki Yasuda, Tomoyuki Obuchi
Abstract In this study, we consider an empirical Bayes method for Boltzmann machines and propose an algorithm for it. The empirical Bayes method allows estimation of the values of the hyperparameters of the Boltzmann machine by maximizing a specific likelihood function referred to as the empirical Bayes likelihood function in this study. However, the maximization is computationally hard because the empirical Bayes likelihood function involves intractable integrations of the partition function. The proposed algorithm avoids this computational problem by using the replica method and the Plefka expansion. Our method does not require any iterative procedures and is quite simple and fast, though it introduces a bias to the estimate, which exhibits an unnatural behavior with respect to the size of the dataset. This peculiar behavior is supposed to be due to the approximate treatment by the Plefka expansion. A possible extension to overcome this behavior is also discussed.
Tasks
Published 2019-06-14
URL https://arxiv.org/abs/1906.06002v2
PDF https://arxiv.org/pdf/1906.06002v2.pdf
PWC https://paperswithcode.com/paper/empirical-bayes-method-for-boltzmann-machines
Repo
Framework

Intermittent Demand Forecasting with Deep Renewal Processes

Title Intermittent Demand Forecasting with Deep Renewal Processes
Authors Ali Caner Turkmen, Yuyang Wang, Tim Januschowski
Abstract Intermittent demand, where demand occurrences appear sporadically in time, is a common and challenging problem in forecasting. In this paper, we first make the connection between renewal processes and a collection of current models used for intermittent demand forecasting. We then develop a set of models that use recurrent neural networks to parameterize conditional inter-demand time and size distributions, building on the latest paradigm in “deep” temporal point processes. We present favorable empirical findings on discrete and continuous time intermittent demand data, validating the practical value of our approach.
Tasks Point Processes
Published 2019-11-23
URL https://arxiv.org/abs/1911.10416v1
PDF https://arxiv.org/pdf/1911.10416v1.pdf
PWC https://paperswithcode.com/paper/intermittent-demand-forecasting-with-deep
Repo
Framework
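For context on the renewal-process view, Croston's classical method forecasts intermittent demand by separately smoothing demand sizes and inter-demand intervals; the paper's models can be read as replacing this kind of exponential smoothing with RNN-parameterized size and interval distributions. A sketch of the classical baseline (not the paper's model):

```python
def croston(demand, alpha=0.1):
    """Croston's method: exponentially smooth nonzero demand sizes (z) and
    inter-demand intervals (p), forecasting size / interval per period."""
    z, p, q = None, None, 0   # smoothed size, smoothed interval, periods since demand
    forecasts = []
    for y in demand:
        q += 1
        if y > 0:
            if z is None:
                z, p = y, q                   # initialize on first demand
            else:
                z = z + alpha * (y - z)       # update smoothed size
                p = p + alpha * (q - p)       # update smoothed interval
            q = 0
        forecasts.append(z / p if z is not None else 0.0)
    return forecasts

series = [0, 0, 3, 0, 0, 0, 2, 0, 4, 0, 0, 5]
fc = croston(series)
print(f"final per-period demand forecast: {fc[-1]:.3f}")
```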

GeoSQA: A Benchmark for Scenario-based Question Answering in the Geography Domain at High School Level

Title GeoSQA: A Benchmark for Scenario-based Question Answering in the Geography Domain at High School Level
Authors Zixian Huang, Yulin Shen, Xiao Li, Yuang Wei, Gong Cheng, Lin Zhou, Xinyu Dai, Yuzhong Qu
Abstract Scenario-based question answering (SQA) has attracted increasing research attention. It typically requires retrieving and integrating knowledge from multiple sources, and applying general knowledge to a specific case described by a scenario. SQA widely exists in the medical, geography, and legal domains, both in practice and in examinations. In this paper, we introduce the GeoSQA dataset. It consists of 1,981 scenarios and 4,110 multiple-choice questions in the geography domain at high school level, where diagrams (e.g., maps, charts) have been manually annotated with natural language descriptions to benefit NLP research. Benchmark results on a variety of state-of-the-art methods for question answering, textual entailment, and reading comprehension demonstrate the unique challenges presented by SQA for future research.
Tasks Natural Language Inference, Question Answering, Reading Comprehension
Published 2019-08-20
URL https://arxiv.org/abs/1908.07855v1
PDF https://arxiv.org/pdf/1908.07855v1.pdf
PWC https://paperswithcode.com/paper/190807855
Repo
Framework

Visually Grounded Generation of Entailments from Premises

Title Visually Grounded Generation of Entailments from Premises
Authors Somaye Jafaritazehjani, Albert Gatt, Marc Tanti
Abstract Natural Language Inference (NLI) is the task of determining the semantic relationship between a premise and a hypothesis. In this paper, we focus on the generation of hypotheses from premises in a multimodal setting: generating a sentence (hypothesis) given an image and/or its description (premise) as input. The main goals of this paper are (a) to investigate whether it is reasonable to frame NLI as a generation task; and (b) to consider the degree to which grounding textual premises in visual information is beneficial to generation. We compare different neural architectures, showing through automatic and human evaluation that entailments can indeed be generated successfully. We also show that multimodal models outperform unimodal models in this task, albeit marginally.
Tasks Natural Language Inference
Published 2019-09-21
URL https://arxiv.org/abs/1909.09788v1
PDF https://arxiv.org/pdf/1909.09788v1.pdf
PWC https://paperswithcode.com/paper/190909788
Repo
Framework

Improving Background Based Conversation with Context-aware Knowledge Pre-selection

Title Improving Background Based Conversation with Context-aware Knowledge Pre-selection
Authors Yangjun Zhang, Pengjie Ren, Maarten de Rijke
Abstract Background Based Conversations (BBCs) have been developed to make dialogue systems generate more informative and natural responses by leveraging background knowledge. Existing methods for BBCs can be grouped into two categories: extraction-based methods and generation-based methods. The former extract spans from background material as responses that are not necessarily natural. The latter generate responses that are natural but not necessarily effective in leveraging background knowledge. In this paper, we focus on generation-based methods and propose a model, namely Context-aware Knowledge Pre-selection (CaKe), which introduces a pre-selection process that uses dynamic bi-directional attention to improve knowledge selection by using the utterance history context as prior information to select the most relevant background material. Experimental results show that our model is superior to current state-of-the-art baselines, indicating that it benefits from the pre-selection process, thus improving informativeness and fluency.
Tasks
Published 2019-06-16
URL https://arxiv.org/abs/1906.06685v1
PDF https://arxiv.org/pdf/1906.06685v1.pdf
PWC https://paperswithcode.com/paper/improving-background-based-conversation-with
Repo
Framework
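The pre-selection idea can be illustrated crudely: score each background sentence against the utterance history and keep only the top candidates before generation. Bag-of-words cosine similarity below is a simple stand-in for the model's dynamic bi-directional attention; all names and data are illustrative.

```python
from collections import Counter
import math

def bow(text):
    # bag-of-words counts over lowercased whitespace tokens
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def preselect(history, background, k=2):
    """Rank background sentences by similarity to the utterance history
    and keep the top k, mimicking a context-aware pre-selection step."""
    ctx = bow(" ".join(history))
    scored = sorted(background, key=lambda s: cosine(ctx, bow(s)), reverse=True)
    return scored[:k]

history = ["who directed the movie", "tell me about the director"]
background = [
    "The movie was released in 1994.",
    "The director also wrote the screenplay.",
    "It grossed 300 million dollars.",
]
print(preselect(history, background, k=1))
```

The generator would then condition only on the pre-selected material rather than on all of the background text.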

Detecting Machine-Translated Text using Back Translation

Title Detecting Machine-Translated Text using Back Translation
Authors Hoang-Quoc Nguyen-Son, Tran Phuong Thao, Seira Hidano, Shinsaku Kiyomoto
Abstract Machine-translated text plays a crucial role in communication among people using different languages. However, adversaries can use such text for malicious purposes such as plagiarism and fake reviews. Existing methods detect machine-translated text using only the text’s intrinsic content, but they are unsuitable for distinguishing machine-translated from human-written texts with the same meaning. We propose a method to extract features for distinguishing machine from human text based on the similarity between the original text and its back-translation. In evaluations on detecting translated French sentences, our method achieves 75.0% in both accuracy and F-score, outperforming existing methods, whose best accuracy is 62.8% and best F-score is 62.7%. The proposed method detects back-translated text even more effectively, with 83.4% accuracy, higher than the best previous accuracy of 66.7%. We obtain similar results for the F-score, as well as in analogous experiments on Japanese. Moreover, we show that our detector can recognize both machine-translated and machine-back-translated texts without knowing the language used to generate them, demonstrating the applicability of our method in both low- and rich-resource languages.
Tasks
Published 2019-10-15
URL https://arxiv.org/abs/1910.06558v1
PDF https://arxiv.org/pdf/1910.06558v1.pdf
PWC https://paperswithcode.com/paper/detecting-machine-translated-text-using-back
Repo
Framework
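The core feature is easy to sketch: translate the text to a pivot language and back, then measure how much it changed; machine-translated text tends to survive the round trip more intact than human-written text. The snippet below assumes the round-trip translations are already available and uses difflib similarity as a stand-in for the paper's BLEU-based features; the threshold and examples are hypothetical.

```python
import difflib

def backtranslate_similarity(text, round_trip):
    """Token-level similarity between a text and its back-translation."""
    return difflib.SequenceMatcher(None, text.split(), round_trip.split()).ratio()

def classify(text, round_trip, threshold=0.7):
    # hypothetical threshold; the paper learns a classifier over such features
    return "machine" if backtranslate_similarity(text, round_trip) >= threshold else "human"

# toy examples: the round trips here are illustrative, not real MT output
mt_text = "the cat sits on the mat"
mt_round_trip = "the cat sits on the mat"            # survives nearly unchanged
human_text = "the old tomcat lounges upon the rug"
human_round_trip = "the old cat lies on the carpet"  # drifts after the round trip

print(classify(mt_text, mt_round_trip))
print(classify(human_text, human_round_trip))
```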

Training products of expert capsules with mixing by dynamic routing

Title Training products of expert capsules with mixing by dynamic routing
Authors Michael Hauser
Abstract This study develops an unsupervised learning algorithm for products of expert capsules with dynamic routing. Analogous to binary-valued neurons in Restricted Boltzmann Machines, the magnitude of a squashed capsule firing takes values between zero and one, representing the probability of the capsule being on. This analogy motivates the design of an energy function for capsule networks. In order to have an efficient sampling procedure where hidden layer nodes are not connected, the energy function is made consistent with dynamic routing in the sense of the probability of a capsule firing, and inference on the capsule network is computed with the dynamic routing between capsules procedure. In order to optimize the log-likelihood of the visible layer capsules, the gradient is found in terms of this energy function. The developed unsupervised learning algorithm is used to train a capsule network on standard vision datasets, and is able to generate realistic looking images from its learned distribution.
Tasks
Published 2019-07-26
URL https://arxiv.org/abs/1907.11643v1
PDF https://arxiv.org/pdf/1907.11643v1.pdf
PWC https://paperswithcode.com/paper/training-products-of-expert-capsules-with
Repo
Framework
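The RBM analogy in the abstract rests on the capsule squashing nonlinearity, whose output magnitude lies in [0, 1) and can therefore be read as the probability of the capsule being on. A minimal sketch of that standard function (the paper's energy function and routing procedure are not reproduced here):

```python
import math

def squash(vec):
    """Capsule squashing: preserves direction, maps the magnitude into
    [0, 1) via ||s||^2 / (1 + ||s||^2), interpretable as P(capsule on)."""
    norm_sq = sum(v * v for v in vec)
    norm = math.sqrt(norm_sq)
    if norm == 0:
        return [0.0] * len(vec)
    scale = norm_sq / (1.0 + norm_sq)
    return [scale * v / norm for v in vec]

v = squash([3.0, 4.0])          # input magnitude 5
mag = math.sqrt(sum(x * x for x in v))
print(f"squashed magnitude: {mag:.4f}")
```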

An Empirical Study on Learning Fairness Metrics for COMPAS Data with Human Supervision

Title An Empirical Study on Learning Fairness Metrics for COMPAS Data with Human Supervision
Authors Hanchen Wang, Nina Grgic-Hlaca, Preethi Lahoti, Krishna P. Gummadi, Adrian Weller
Abstract The notion of individual fairness requires that similar people receive similar treatment. However, this is hard to achieve in practice because it is difficult to specify the appropriate similarity metric. In this work, we attempt to learn such a similarity metric from human-annotated data. We gather a new dataset of human judgments on a criminal recidivism prediction (COMPAS) task. Assuming the human supervision obeys the principle of individual fairness, we leverage prior work on metric learning, evaluate the performance of several metric learning methods on our dataset, and show that the learned metrics outperform the Euclidean and Precision metrics under various criteria. We do not provide a way to directly learn a similarity metric that satisfies individual fairness; rather, we provide an empirical study of how to derive a similarity metric from human supervision, which future work can use as a tool to understand human supervision.
Tasks Metric Learning
Published 2019-10-22
URL https://arxiv.org/abs/1910.10255v2
PDF https://arxiv.org/pdf/1910.10255v2.pdf
PWC https://paperswithcode.com/paper/an-empirical-study-on-learning-fairness
Repo
Framework
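A minimal sketch of the general setup, under the simplifying assumption of a diagonal Mahalanobis metric learned from pairwise similar/dissimilar judgments (the paper evaluates richer, established metric-learning methods on real COMPAS annotations; everything below is a toy):

```python
def learn_diag_metric(pairs, dim=2, margin=1.0, lr=0.05, epochs=200):
    """Fit a diagonal metric d_w(x, y)^2 = sum_i w_i (x_i - y_i)^2 from
    pairwise 'similar'/'dissimilar' judgments via simple gradient steps."""
    w = [1.0] * dim
    for _ in range(epochs):
        for x, y, similar in pairs:
            diff2 = [(a - b) ** 2 for a, b in zip(x, y)]
            d2 = sum(wi * di for wi, di in zip(w, diff2))
            if similar:                       # pull similar pairs together
                w = [wi - lr * di for wi, di in zip(w, diff2)]
            elif d2 < margin:                 # push dissimilar pairs apart
                w = [wi + lr * di for wi, di in zip(w, diff2)]
            w = [max(0.0, wi) for wi in w]    # keep weights non-negative
    return w

# toy judgments: feature 0 matters to annotators, feature 1 should not
pairs = [
    ([0, 0], [0, 1], True),    # differ only in the irrelevant feature
    ([1, 0], [1, 1], True),
    ([0, 0], [1, 0], False),   # differ in the relevant feature
    ([0, 1], [1, 1], False),
]
w = learn_diag_metric(pairs)
print(f"learned weights: {[round(x, 2) for x in w]}")
```

The learned metric downweights the feature that human judgments treat as irrelevant, which is the behavior one hopes for when the annotations encode individual fairness.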

Asymptotic nonparametric statistical analysis of stationary time series

Title Asymptotic nonparametric statistical analysis of stationary time series
Authors Daniil Ryabko
Abstract Stationarity is a very general, qualitative assumption that can be assessed on the basis of application specifics. It is thus a rather attractive assumption on which to base statistical analysis, especially for problems where less general qualitative assumptions, such as independence or finite memory, clearly fail. However, it has long been considered too general to allow statistical inference. One reason is that rates of convergence, even of frequencies to the mean, are not available under this assumption alone. Recently, it has been shown that, while some natural and simple problems, such as homogeneity, are indeed provably impossible to solve if one only assumes that the data is stationary (or stationary ergodic), many others can be solved using rather simple and intuitive algorithms. The latter problems include clustering and change point estimation. In this volume I summarize these results. The emphasis is on asymptotic consistency, since this is the strongest property one can obtain assuming stationarity alone. While for most of the problems for which a solution is found that solution is algorithmically realizable, the main objective in this area of research, an objective which is only partially attained, is to understand what is and is not possible to do for stationary time series. The problems considered include homogeneity testing, clustering with respect to distribution, clustering with respect to independence, change-point estimation, identity testing, and the general question of composite hypothesis testing. For the latter problem, a topological criterion for the existence of a consistent test is presented. In addition, several open questions are discussed.
Tasks Time Series
Published 2019-03-30
URL http://arxiv.org/abs/1904.00173v1
PDF http://arxiv.org/pdf/1904.00173v1.pdf
PWC https://paperswithcode.com/paper/asymptotic-nonparametric-statistical-analysis
Repo
Framework

Learning a faceted customer segmentation for discovering new business opportunities at Intel

Title Learning a faceted customer segmentation for discovering new business opportunities at Intel
Authors Itay Lieder, Meirav Segal, Eran Avidan, Asaf Cohen, Tom Hope
Abstract For sales and marketing organizations within large enterprises, identifying and understanding new markets, customers and partners is a key challenge. Intel’s Sales and Marketing Group (SMG) faces similar challenges while growing in new markets and domains and evolving its existing business. In today’s complex technological and commercial landscape, there is a need for intelligent automation supporting a fine-grained understanding of businesses in order to help SMG sift through millions of companies across many geographies and languages and identify relevant directions. We present a system developed in our company that mines millions of public business web pages and extracts a faceted customer representation. We focus on two key customer aspects that are essential for finding relevant opportunities: industry segments (ranging from broad verticals such as healthcare, to more specific fields such as ‘video analytics’) and functional roles (e.g., ‘manufacturer’ or ‘retail’). To address the challenge of labeled data collection, we enrich our data with external information gleaned from Wikipedia, and develop a semi-supervised multi-label, multi-lingual deep learning model that parses customer website texts and classifies them into their respective facets. Our system scans and indexes companies as part of a large-scale knowledge graph that currently holds tens of millions of connected entities, with thousands more fetched, enriched and connected to the graph every hour in real time; the graph also supports knowledge and insight discovery. In experiments conducted in our company, we are able to significantly boost the performance of sales personnel in the task of discovering new customers and commercial partnership opportunities.
Tasks
Published 2019-11-27
URL https://arxiv.org/abs/1912.00778v1
PDF https://arxiv.org/pdf/1912.00778v1.pdf
PWC https://paperswithcode.com/paper/learning-a-faceted-customer-segmentation-for
Repo
Framework

Automatic Model Parallelism for Deep Neural Networks with Compiler and Hardware Support

Title Automatic Model Parallelism for Deep Neural Networks with Compiler and Hardware Support
Authors Sanket Tavarageri, Srinivas Sridharan, Bharat Kaul
Abstract Deep neural networks (DNNs) have been enormously successful in tasks that were hitherto in the human-only realm, such as image recognition and language translation. Owing to their success, DNNs are being explored for use in ever more sophisticated tasks. One way DNNs are made to scale for complex undertakings is by increasing their size: deeper and wider networks can model the additional complexity well. Such large models are trained using model parallelism on multiple compute devices, such as multi-GPU and multi-node systems. In this paper, we develop a compiler-driven approach to achieve model parallelism. We model the computation and communication costs of a dataflow graph that embodies the neural network training process and then partition the graph using heuristics so that communication between compute devices is minimal and the load is well balanced. Hardware scheduling assistants are proposed to assist the compiler in fine-tuning the distribution of work at runtime.
Tasks
Published 2019-06-11
URL https://arxiv.org/abs/1906.08168v1
PDF https://arxiv.org/pdf/1906.08168v1.pdf
PWC https://paperswithcode.com/paper/automatic-model-parallelism-for-deep-neural
Repo
Framework
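The partitioning step can be caricatured as follows: walk the dataflow graph's ops in topological order and greedily assign each one to the least-loaded device. This toy ignores communication costs, which the paper's cost model explicitly trades off against load balance; the op names and costs are made up.

```python
def partition_graph(costs, n_devices=2):
    """Greedy load balancing: assign each op (in topological order) to the
    currently least-loaded device. Communication costs are ignored."""
    loads = [0.0] * n_devices
    assignment = {}
    for op, cost in costs:
        dev = min(range(n_devices), key=lambda d: loads[d])
        assignment[op] = dev
        loads[dev] += cost
    return assignment, loads

# hypothetical ops with compute costs, already in topological order
ops = [("conv1", 4.0), ("conv2", 3.0), ("fc1", 2.0), ("fc2", 1.0)]
assignment, loads = partition_graph(ops)
print(assignment)
print(f"per-device loads: {loads}")
```

A real cost-model-driven partitioner would additionally penalize cut edges, since every edge crossing a device boundary incurs a transfer.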

Evaluating time series forecasting models: An empirical study on performance estimation methods

Title Evaluating time series forecasting models: An empirical study on performance estimation methods
Authors Vitor Cerqueira, Luis Torgo, Igor Mozetic
Abstract Performance estimation aims at estimating the loss that a predictive model will incur on unseen data. These procedures are part of the pipeline in every machine learning project and are used for assessing the overall generalisation ability of predictive models. In this paper we address the application of these methods to time series forecasting tasks. For independent and identically distributed data the most common approach is cross-validation. However, the dependency among observations in time series raises some caveats about the most appropriate way to estimate performance in this type of data and currently there is no settled way to do so. We compare different variants of cross-validation and of out-of-sample approaches using two case studies: One with 62 real-world time series and another with three synthetic time series. Results show noticeable differences in the performance estimation methods in the two scenarios. In particular, empirical experiments suggest that cross-validation approaches can be applied to stationary time series. However, in real-world scenarios, when different sources of non-stationary variation are at play, the most accurate estimates are produced by out-of-sample methods that preserve the temporal order of observations.
Tasks Time Series, Time Series Forecasting
Published 2019-05-28
URL https://arxiv.org/abs/1905.11744v1
PDF https://arxiv.org/pdf/1905.11744v1.pdf
PWC https://paperswithcode.com/paper/evaluating-time-series-forecasting-models-an
Repo
Framework
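The out-of-sample procedures that fare best in the study's non-stationary scenarios share one property: training data always precedes test data. A minimal forward-chaining splitter illustrates the contrast with standard cross-validation, which would let training folds contain observations from the future:

```python
def out_of_sample_splits(n, n_splits=3, test_size=10):
    """Forward-chaining splits that preserve temporal order: train on a
    prefix of the series, test on the block that immediately follows."""
    splits = []
    for i in range(n_splits):
        test_end = n - i * test_size
        test_start = test_end - test_size
        splits.append((list(range(0, test_start)),
                       list(range(test_start, test_end))))
    return splits

for train, test in out_of_sample_splits(50, n_splits=2, test_size=10):
    print(f"train [0..{train[-1]}] -> test [{test[0]}..{test[-1]}]")
```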

Isolation Kernel: The X Factor in Efficient and Effective Large Scale Online Kernel Learning

Title Isolation Kernel: The X Factor in Efficient and Effective Large Scale Online Kernel Learning
Authors Kai Ming Ting, Jonathan R. Wells, Takashi Washio
Abstract Large scale online kernel learning aims to build an efficient and scalable kernel-based predictive model incrementally from a sequence of potentially infinite data points. A current key approach focuses on ways to produce an approximate finite-dimensional feature map, assuming that the kernel used has a feature map with intractable dimensionality, an assumption traditionally held in kernel-based methods. While this approach can deal with large scale datasets efficiently, it compromises predictive accuracy because of the approximation. We offer an alternative approach which overrides this assumption and puts the kernel used at the heart of the approach. It focuses on creating an exact, sparse and finite-dimensional feature map of a kernel called Isolation Kernel. With this new approach, achieving the above aim of large scale online kernel learning becomes extremely simple: use Isolation Kernel instead of a kernel having a feature map with intractable dimensionality. We show that, using Isolation Kernel, large scale online kernel learning can be achieved efficiently without sacrificing accuracy.
Tasks
Published 2019-07-02
URL https://arxiv.org/abs/1907.01104v2
PDF https://arxiv.org/pdf/1907.01104v2.pdf
PWC https://paperswithcode.com/paper/isolation-kernel-the-x-factor-in-efficient
Repo
Framework
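The exact finite-dimensional feature map can be sketched for 1-D data: build t random Voronoi partitions, each from ψ reference points sampled from the data, and encode each point by a one-hot of the cell it falls in; the kernel value is then the fraction of partitions in which two points share a cell. The parameter names t and psi follow common Isolation Kernel notation; the rest is a toy, not the paper's implementation.

```python
import random

def isolation_feature_map(x, partitions):
    """Exact, sparse, finite-dimensional feature map: one-hot of the
    nearest reference point within each random Voronoi partition."""
    feat = []
    for refs in partitions:
        nearest = min(range(len(refs)), key=lambda i: abs(x - refs[i]))
        block = [0] * len(refs)
        block[nearest] = 1
        feat.extend(block)
    return feat

def isolation_kernel(x, y, partitions):
    # dot product of feature maps, normalized by the number of partitions
    fx = isolation_feature_map(x, partitions)
    fy = isolation_feature_map(y, partitions)
    return sum(a * b for a, b in zip(fx, fy)) / len(partitions)

random.seed(0)
data = [random.uniform(0, 10) for _ in range(200)]
t, psi = 100, 4                     # t partitions, psi reference points each
partitions = [random.sample(data, psi) for _ in range(t)]

print(f"K(2.0, 2.1) = {isolation_kernel(2.0, 2.1, partitions):.2f}")
print(f"K(2.0, 9.0) = {isolation_kernel(2.0, 9.0, partitions):.2f}")
```

Because the feature map is exact and sparse, an online linear learner over these features uses the kernel without any approximation step.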