October 16, 2019

2870 words 14 mins read

Paper Group ANR 1034

Pilot study for the COST Action “Reassembling the Republic of Letters”: language-driven network analysis of letters from the Hartlib’s Papers

Title Pilot study for the COST Action “Reassembling the Republic of Letters”: language-driven network analysis of letters from the Hartlib’s Papers
Authors Barbara McGillivray, Federico Sangati
Abstract The present report summarizes an exploratory study which we carried out in the context of the COST Action IS1310 “Reassembling the Republic of Letters, 1500-1800”, and which is relevant to the activities of Working Group 3 “Texts and Topics” and Working Group 2 “People and Networks”. In this study we investigated the use of Natural Language Processing (NLP) and Network Text Analysis on a small sample of seventeenth-century letters selected from the Hartlib Papers, whose records are in one of the catalogues of Early Modern Letters Online (EMLO) and whose online edition is available on the website of the Humanities Research Institute at the University of Sheffield (http://www.hrionline.ac.uk/hartlib/). We outline the NLP pipeline used to automatically process the texts into a network representation, in order to identify the texts’ “narrative centrality”, i.e. the most central entities in the texts, and the relations between them.
Tasks
Published 2018-01-30
URL http://arxiv.org/abs/1801.09896v1
PDF http://arxiv.org/pdf/1801.09896v1.pdf
PWC https://paperswithcode.com/paper/pilot-study-for-the-cost-action-reassembling
Repo
Framework
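
For readers curious what a "text to network" pipeline of this kind looks like, here is a minimal sketch: extract named entities per sentence, connect co-occurring entities, and rank them by centrality. The paper's actual pipeline and tooling are not specified in the abstract; spaCy and networkx are assumptions.

```python
# Hypothetical sketch of a "text to network" pipeline in the spirit of the
# abstract above. The paper's actual tools are not named; spaCy and networkx
# are assumptions for illustration.
import itertools
import networkx as nx
import spacy

nlp = spacy.load("en_core_web_sm")

def narrative_centrality(text):
    """Build an entity co-occurrence graph and return entities by centrality."""
    doc = nlp(text)
    graph = nx.Graph()
    for sent in doc.sents:
        entities = {ent.text for ent in sent.ents
                    if ent.label_ in ("PERSON", "GPE", "ORG")}
        # Entities mentioned in the same sentence are connected.
        for a, b in itertools.combinations(sorted(entities), 2):
            weight = graph.get_edge_data(a, b, {"weight": 0})["weight"]
            graph.add_edge(a, b, weight=weight + 1)
    centrality = nx.degree_centrality(graph)
    return sorted(centrality.items(), key=lambda kv: -kv[1])
```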

Dimension-Robust MCMC in Bayesian Inverse Problems

Title Dimension-Robust MCMC in Bayesian Inverse Problems
Authors Victor Chen, Matthew M. Dunlop, Omiros Papaspiliopoulos, Andrew M. Stuart
Abstract The methodology developed in this article is motivated by a wide range of prediction and uncertainty quantification problems that arise in Statistics, Machine Learning and Applied Mathematics, such as non-parametric regression, multi-class classification and inversion of partial differential equations. One popular formulation of such problems is as Bayesian inverse problems, where a prior distribution is used to regularize inference on a high-dimensional latent state, typically a function or a field. It is common that such priors are non-Gaussian, for example piecewise-constant or heavy-tailed, and/or hierarchical, in the sense of involving a further set of low-dimensional parameters, which, for example, control the scale or smoothness of the latent state. In this formulation prediction and uncertainty quantification rely on efficient exploration of the posterior distribution of latent states and parameters. This article introduces a framework for efficient MCMC sampling in Bayesian inverse problems that capitalizes upon two fundamental ideas in MCMC, non-centred parameterisations of hierarchical models and dimension-robust samplers for latent Gaussian processes. Using a range of diverse applications we showcase that the proposed framework is dimension-robust, that is, the efficiency of the MCMC sampling does not deteriorate as the dimension of the latent state gets higher. We showcase the full potential of the machinery we develop in the article in semi-supervised multi-class classification, where our sampling algorithm is used within an active learning framework to guide the selection of input data to manually label in order to achieve high predictive accuracy with a minimal number of labelled data.
Tasks Active Learning, Efficient Exploration, Gaussian Processes
Published 2018-03-09
URL http://arxiv.org/abs/1803.03344v2
PDF http://arxiv.org/pdf/1803.03344v2.pdf
PWC https://paperswithcode.com/paper/robust-mcmc-sampling-with-non-gaussian-and
Repo
Framework
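
The dimension-robust samplers for latent Gaussian processes that the abstract builds on are typified by the preconditioned Crank-Nicolson (pCN) proposal, whose acceptance ratio depends only on the likelihood and therefore does not degrade as the discretization is refined. A minimal sketch, assuming a whitened N(0, I) prior and omitting the paper's non-centred hierarchical extension:

```python
# Minimal pCN sampler sketch: prior-reversible proposal, so acceptance
# depends only on the negative log-likelihood `neg_log_likelihood` (assumed
# supplied by the caller).
import numpy as np

def pcn_sample(neg_log_likelihood, dim, n_steps=10_000, beta=0.2, rng=None):
    rng = rng or np.random.default_rng(0)
    u = rng.standard_normal(dim)          # current state, prior N(0, I)
    phi = neg_log_likelihood(u)
    samples = []
    for _ in range(n_steps):
        # pCN proposal: sqrt(1 - beta^2) * u + beta * xi, xi ~ N(0, I).
        v = np.sqrt(1 - beta**2) * u + beta * rng.standard_normal(dim)
        phi_v = neg_log_likelihood(v)
        # Accept with probability min(1, exp(phi(u) - phi(v))).
        if np.log(rng.uniform()) < phi - phi_v:
            u, phi = v, phi_v
        samples.append(u.copy())
    return np.array(samples)
```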

Algorithms Inspired by Nature: A Survey

Title Algorithms Inspired by Nature: A Survey
Authors Pranshu Gupta
Abstract Nature is known to be the best optimizer. Natural processes more often than not reach an optimal equilibrium. Scientists have always strived to understand and model such processes. Thus, many algorithms exist today that are inspired by nature. Many of these algorithms and heuristics can be used to solve problems for which no polynomial-time algorithms exist, such as Job Shop Scheduling and many other Combinatorial Optimization problems. We will discuss some of these algorithms and heuristics and how they help us solve complex problems of practical importance.
Tasks Combinatorial Optimization
Published 2018-12-13
URL http://arxiv.org/abs/1903.01893v1
PDF http://arxiv.org/pdf/1903.01893v1.pdf
PWC https://paperswithcode.com/paper/algorithms-inspired-by-nature-a-survey
Repo
Framework
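
As a concrete instance of the surveyed family, here is a minimal simulated annealing sketch for a generic combinatorial objective; the `cost` and `neighbor` callables are placeholders supplied by the caller.

```python
# Simulated annealing: a nature-inspired heuristic (modeled on metal cooling)
# for combinatorial problems where exact polynomial-time algorithms are absent.
import math
import random

def simulated_annealing(initial, cost, neighbor,
                        t0=10.0, cooling=0.999, steps=100_000):
    state, state_cost = initial, cost(initial)
    best, best_cost = state, state_cost
    t = t0
    for _ in range(steps):
        candidate = neighbor(state)
        cand_cost = cost(candidate)
        delta = cand_cost - state_cost
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta < 0 or random.random() < math.exp(-delta / t):
            state, state_cost = candidate, cand_cost
            if state_cost < best_cost:
                best, best_cost = state, state_cost
        t *= cooling  # geometric cooling schedule
    return best
```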

An Estimation and Analysis Framework for the Rasch Model

Title An Estimation and Analysis Framework for the Rasch Model
Authors Andrew S. Lan, Mung Chiang, Christoph Studer
Abstract The Rasch model is widely used for item response analysis in applications ranging from recommender systems to psychology, education, and finance. While a number of estimators have been proposed for the Rasch model over the last decades, the available analytical performance guarantees are mostly asymptotic. This paper provides a framework that relies on a novel linear minimum mean-squared error (L-MMSE) estimator which enables an exact, nonasymptotic, and closed-form analysis of the parameter estimation error under the Rasch model. The proposed framework provides guidelines on the number of items and responses required to attain low estimation errors in tests or surveys. We furthermore demonstrate its efficacy on a number of real-world collaborative filtering datasets, which reveals that the proposed L-MMSE estimator performs on par with state-of-the-art nonlinear estimators in terms of predictive performance.
Tasks Recommendation Systems
Published 2018-06-09
URL http://arxiv.org/abs/1806.03551v1
PDF http://arxiv.org/pdf/1806.03551v1.pdf
PWC https://paperswithcode.com/paper/an-estimation-and-analysis-framework-for-the
Repo
Framework
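
The Rasch model itself is compact: the probability that user i answers item j correctly is sigma(a_i - d_j), with user ability a_i and item difficulty d_j. The sketch below fits it by plain gradient ascent on the likelihood; it is not the paper's L-MMSE estimator, whose closed form is not reproduced here.

```python
# Plain maximum-likelihood fit of the Rasch model (not the paper's L-MMSE
# estimator). Y is a binary response matrix (users x items), NaN for missing.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fit_rasch(Y, n_iters=500, lr=0.05):
    n_users, n_items = Y.shape
    ability = np.zeros(n_users)
    difficulty = np.zeros(n_items)
    mask = ~np.isnan(Y)
    for _ in range(n_iters):
        P = sigmoid(ability[:, None] - difficulty[None, :])
        # Gradient of the Bernoulli log-likelihood w.r.t. the logit is (y - p).
        resid = np.where(mask, np.nan_to_num(Y) - P, 0.0)
        ability += lr * resid.sum(axis=1) / np.maximum(mask.sum(axis=1), 1)
        difficulty -= lr * resid.sum(axis=0) / np.maximum(mask.sum(axis=0), 1)
    return ability, difficulty
```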

Mixed Uncertainty Sets for Robust Combinatorial Optimization

Title Mixed Uncertainty Sets for Robust Combinatorial Optimization
Authors Trivikram Dokka, Marc Goerigk, Rahul Roy
Abstract In robust optimization, the uncertainty set is used to model all possible outcomes of uncertain parameters. In the classic setting, one assumes that this set is provided by the decision maker based on the data available to her. Only recently has it been recognized that the process of building useful uncertainty sets is in itself a challenging task that requires mathematical support. In this paper, we propose an approach that goes beyond the classic setting, by assuming that multiple uncertainty sets are prepared, each with a weight showing the degree of belief that the set is a “true” model of uncertainty. We consider theoretical aspects of this approach and show that it is as easy to model as the classic setting. In an extensive computational study using a shortest path problem based on real-world data, we auto-tune uncertainty sets to the available data, and show that with regard to out-of-sample performance, the combination of multiple sets can give better results than each set on its own.
Tasks Combinatorial Optimization
Published 2018-12-12
URL http://arxiv.org/abs/1812.04895v2
PDF http://arxiv.org/pdf/1812.04895v2.pdf
PWC https://paperswithcode.com/paper/mixed-uncertainty-sets-for-robust
Repo
Framework
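
One way to read the proposal is as a weighted sum of per-set worst cases. A sketch under the assumption of discrete scenario sets (the paper's exact objective may differ):

```python
# Evaluating a combinatorial solution under a mixture of uncertainty sets,
# each weighted by a degree of belief. Discrete scenario sets are an
# assumption for illustration.
import numpy as np

def mixed_robust_value(x, uncertainty_sets, weights):
    """x: binary decision vector; uncertainty_sets: list of
    (n_scenarios x n_vars) cost matrices; weights: beliefs summing to 1."""
    total = 0.0
    for U, w in zip(uncertainty_sets, weights):
        worst_case = np.max(U @ x)   # worst realized cost within this set
        total += w * worst_case
    return total

# Choosing the best x then means minimizing mixed_robust_value over the
# feasible combinatorial set, e.g. all s-t paths of a graph.
```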

Predicting Audio Advertisement Quality

Title Predicting Audio Advertisement Quality
Authors Samaneh Ebrahimi, Hossein Vahabi, Matthew Prockup, Oriol Nieto
Abstract Online audio advertising is a particular form of advertising used abundantly in online music streaming services. In these platforms, which tend to host tens of thousands of unique audio advertisements (ads), providing high quality ads ensures a better user experience and results in longer user engagement. Therefore, the automatic assessment of these ads is an important step toward audio ads ranking and better audio ads creation. In this paper we propose one way to measure the quality of the audio ads using a proxy metric called Long Click Rate (LCR), which is defined by the amount of time a user engages with the follow-up display ad (that is shown while the audio ad is playing) divided by the number of impressions. We later focus on predicting the audio ad quality using only acoustic features such as harmony, rhythm, and timbre of the audio, extracted from the raw waveform. We discuss how the characteristics of the sound can be connected to concepts such as the clarity of the audio ad message, its trustworthiness, etc. Finally, we propose a new deep learning model for audio ad quality prediction, which outperforms the other discussed models trained on hand-crafted features. To the best of our knowledge, this is the first large-scale audio ad quality prediction study.
Tasks
Published 2018-02-09
URL http://arxiv.org/abs/1802.03319v1
PDF http://arxiv.org/pdf/1802.03319v1.pdf
PWC https://paperswithcode.com/paper/predicting-audio-advertisement-quality
Repo
Framework
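
Read literally, the LCR proxy is total engagement time with the follow-up display ad divided by the number of impressions. A tiny sketch of that reading (one interpretation of the abstract's wording, not necessarily the paper's exact definition):

```python
# LCR proxy, read literally from the abstract: total time users engaged with
# the companion display ad, normalized by impressions. The exact definition
# in the paper may differ.
def long_click_rate(engagement_seconds, n_impressions):
    """engagement_seconds: per-impression engagement times with the
    follow-up display ad; n_impressions: total impressions served."""
    if n_impressions == 0:
        return 0.0
    return sum(engagement_seconds) / n_impressions
```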

Measuring the Stability of EHR- and EKG-based Predictive Models

Title Measuring the Stability of EHR- and EKG-based Predictive Models
Authors Andrew C. Miller, Ziad Obermeyer, Sendhil Mullainathan
Abstract Databases of electronic health records (EHRs) are increasingly used to inform clinical decisions. Machine learning methods can find patterns in EHRs that are predictive of future adverse outcomes. However, statistical models may be built upon patterns of health-seeking behavior that vary across patient subpopulations, leading to poor predictive performance when training on one patient population and predicting on another. This note proposes two tests to better measure and understand model generalization. We use these tests to compare models derived from two data sources: (i) historical medical records, and (ii) electrocardiogram (EKG) waveforms. In a predictive task, we show that EKG-based models can be more stable than EHR-based models across different patient populations.
Tasks
Published 2018-12-01
URL http://arxiv.org/abs/1812.00210v1
PDF http://arxiv.org/pdf/1812.00210v1.pdf
PWC https://paperswithcode.com/paper/measuring-the-stability-of-ehr-and-ekg-based
Repo
Framework
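
The abstract does not spell out the two tests, so the following is only a baseline check in the same spirit: fit on one population and compare discrimination (AUC) across subpopulations, treating a large spread as a sign of instability. scikit-learn and a logistic model are assumptions.

```python
# A natural stability baseline (not the paper's two tests): train on one
# patient population, evaluate AUC on each subpopulation, report the spread.
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def stability_gap(X_train, y_train, populations):
    """populations: dict mapping name -> (X_test, y_test) per subpopulation."""
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    aucs = {
        name: roc_auc_score(y, model.predict_proba(X)[:, 1])
        for name, (X, y) in populations.items()
    }
    # A large spread in AUC across subpopulations signals instability.
    return aucs, max(aucs.values()) - min(aucs.values())
```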

Memory Augmented Self-Play

Title Memory Augmented Self-Play
Authors Shagun Sodhani, Vardaan Pahuja
Abstract Self-play is an unsupervised training procedure which enables reinforcement learning agents to explore the environment without requiring any external rewards. We augment the self-play setting by providing an external memory where the agent can store experience from previous tasks. This enables the agent to come up with more diverse self-play tasks, resulting in faster exploration of the environment. The agent pretrained in the memory-augmented self-play setting easily outperforms the agent pretrained in the no-memory self-play setting.
Tasks
Published 2018-05-28
URL http://arxiv.org/abs/1805.11016v2
PDF http://arxiv.org/pdf/1805.11016v2.pdf
PWC https://paperswithcode.com/paper/memory-augmented-self-play
Repo
Framework
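
A minimal sketch of the external-memory component: store episodes from earlier self-play tasks and sample from them when proposing the next task. The retrieval scheme and episode encoding here are illustrative assumptions, not the paper's design.

```python
# Hypothetical external memory for self-play: the task-setting agent stores
# experience from past tasks and can condition on retrieved episodes to
# propose more diverse goals.
import random
from collections import deque

class SelfPlayMemory:
    def __init__(self, capacity=1000):
        self.episodes = deque(maxlen=capacity)  # drop oldest beyond capacity

    def store(self, task_embedding, trajectory):
        self.episodes.append((task_embedding, trajectory))

    def retrieve(self, k=5):
        """Sample k past episodes to inform the next self-play task proposal."""
        k = min(k, len(self.episodes))
        return random.sample(list(self.episodes), k)
```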

Mitigating Bias in Adaptive Data Gathering via Differential Privacy

Title Mitigating Bias in Adaptive Data Gathering via Differential Privacy
Authors Seth Neel, Aaron Roth
Abstract Data that is gathered adaptively — via bandit algorithms, for example — exhibits bias. This is true both when gathering simple numeric valued data — the empirical means kept track of by stochastic bandit algorithms are biased downwards — and when gathering more complicated data — running hypothesis tests on complex data gathered via contextual bandit algorithms leads to false discovery. In this paper, we show that this problem is mitigated if the data collection procedure is differentially private. This lets us both bound the bias of simple numeric valued quantities (like the empirical means of stochastic bandit algorithms), and correct the p-values of hypothesis tests run on the adaptively gathered data. Moreover, there exist differentially private bandit algorithms with near optimal regret bounds: we apply existing theorems in the simple stochastic case, and give a new analysis for linear contextual bandits. We complement our theoretical results with experiments validating our theory.
Tasks Multi-Armed Bandits
Published 2018-06-06
URL http://arxiv.org/abs/1806.02329v1
PDF http://arxiv.org/pdf/1806.02329v1.pdf
PWC https://paperswithcode.com/paper/mitigating-bias-in-adaptive-data-gathering
Repo
Framework
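
The core mechanism can be illustrated with a one-shot Laplace release of an arm's empirical mean; the paper's bandit algorithms use more careful private counters, so this is a sketch of the idea rather than their construction.

```python
# Epsilon-DP release of a bandit arm's empirical mean via the Laplace
# mechanism. Rewards are assumed bounded in [0, reward_range], so the mean's
# sensitivity to one reward is reward_range / n.
import numpy as np

def private_mean(rewards, epsilon, reward_range=1.0, rng=None):
    rng = rng or np.random.default_rng()
    n = len(rewards)
    noise = rng.laplace(loc=0.0, scale=reward_range / (n * epsilon))
    return float(np.mean(rewards) + noise)
```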

Deep Embedding Kernel

Title Deep Embedding Kernel
Authors Linh Le, Ying Xie
Abstract In this paper, we propose a novel supervised learning method called Deep Embedding Kernel (DEK). DEK combines the advantages of deep learning and kernel methods in a unified framework. More specifically, DEK is a learnable kernel represented by a newly designed deep architecture. Compared with pre-defined kernels, this kernel can be explicitly trained to map data to an optimized high-level feature space where the data may exhibit features favorable to the application. Compared with typical deep learning using SoftMax or logistic regression as the top layer, DEK is expected to generalize better to new data. Experimental results show that DEK outperforms typical machine learning methods in identity detection, classification, regression, dimension reduction, and transfer learning.
Tasks Dimensionality Reduction, Transfer Learning
Published 2018-04-16
URL http://arxiv.org/abs/1804.05806v1
PDF http://arxiv.org/pdf/1804.05806v1.pdf
PWC https://paperswithcode.com/paper/deep-embedding-kernel
Repo
Framework
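
Structurally, DEK amounts to a shared embedding network feeding a second network that maps a pair of embeddings to a similarity score. A PyTorch sketch with illustrative layer sizes and pair-combination rule (both assumptions):

```python
# Sketch of the Deep Embedding Kernel structure: shared embedding network,
# then a kernel network over the concatenated pair. Layer sizes and the
# concatenation rule are illustrative assumptions.
import torch
import torch.nn as nn

class DeepEmbeddingKernel(nn.Module):
    def __init__(self, in_dim, embed_dim=64):
        super().__init__()
        self.embed = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(),
            nn.Linear(128, embed_dim), nn.ReLU(),
        )
        self.kernel = nn.Sequential(
            nn.Linear(2 * embed_dim, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Sigmoid(),   # similarity in (0, 1)
        )

    def forward(self, x, y):
        zx, zy = self.embed(x), self.embed(y)
        return self.kernel(torch.cat([zx, zy], dim=-1))
```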

Evotype: Towards the Evolution of Type Stencils

Title Evotype: Towards the Evolution of Type Stencils
Authors Tiago Martins, João Correia, Ernesto Costa, Penousal Machado
Abstract Typefaces are an essential resource employed by graphic designers. The increasing demand for innovative type design work increases the need for good technological means to assist the designer in the creation of a typeface. We present an evolutionary computation approach for the generation of type stencils to draw coherent glyphs for different characters. The proposed system employs a Genetic Algorithm to evolve populations of type stencils. The evaluation of each candidate stencil uses a hill climbing algorithm to search the best configurations to draw the target glyphs. We study the interplay between legibility, coherence and expressiveness, and show how our framework can be used in practice.
Tasks
Published 2018-06-26
URL http://arxiv.org/abs/1806.09731v1
PDF http://arxiv.org/pdf/1806.09731v1.pdf
PWC https://paperswithcode.com/paper/evotype-towards-the-evolution-of-type
Repo
Framework
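
The evolutionary loop described above can be skeletonized as follows: a Genetic Algorithm evolves stencils, and each candidate's fitness is obtained by hill climbing over its drawing configurations. Representation, operators, and the fitness function are all placeholders.

```python
# GA skeleton in the spirit of Evotype: fitness_by_hill_climbing is assumed
# to run the paper's hill climber and return a legibility/coherence score.
import random

def evolve_stencils(init_population, fitness_by_hill_climbing,
                    crossover, mutate, generations=100, elite=2):
    population = list(init_population)
    for _ in range(generations):
        scored = sorted(population, key=fitness_by_hill_climbing, reverse=True)
        next_gen = scored[:elite]                          # elitism
        while len(next_gen) < len(population):
            # Truncation selection from the better half, then variation.
            a, b = random.sample(scored[: len(scored) // 2], 2)
            next_gen.append(mutate(crossover(a, b)))
        population = next_gen
    return max(population, key=fitness_by_hill_climbing)
```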

Learning Matching Models with Weak Supervision for Response Selection in Retrieval-based Chatbots

Title Learning Matching Models with Weak Supervision for Response Selection in Retrieval-based Chatbots
Authors Yu Wu, Wei Wu, Zhoujun Li, Ming Zhou
Abstract We propose a method that can leverage unlabeled data to learn a matching model for response selection in retrieval-based chatbots. The method employs a sequence-to-sequence architecture (Seq2Seq) model as a weak annotator to judge the matching degree of unlabeled pairs, and then performs learning with both the weak signals and the unlabeled data. Experimental results on two public data sets indicate that matching models get significant improvements when they are learned with the proposed method.
Tasks
Published 2018-05-07
URL http://arxiv.org/abs/1805.02333v2
PDF http://arxiv.org/pdf/1805.02333v2.pdf
PWC https://paperswithcode.com/paper/learning-matching-models-with-weak
Repo
Framework
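
The training signal can be sketched as a regression of the matching model onto the weak annotator's scores; the loss form and both model interfaces below are assumptions for illustration.

```python
# Weak-supervision training step: a Seq2Seq weak annotator scores unlabeled
# context-response pairs, and the matching model fits those soft scores.
# `weak_annotator.score` and the MSE loss are illustrative assumptions.
import torch
import torch.nn.functional as F

def train_step(matching_model, weak_annotator, contexts, responses, optimizer):
    with torch.no_grad():
        # Weak signal, e.g. normalized Seq2Seq log-likelihood of the response.
        weak_scores = weak_annotator.score(contexts, responses)
    pred = matching_model(contexts, responses)
    loss = F.mse_loss(pred, weak_scores)   # fit the weak annotator's judgment
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```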

Selective Distillation of Weakly Annotated GTD for Vision-based Slab Identification System

Title Selective Distillation of Weakly Annotated GTD for Vision-based Slab Identification System
Authors Sang Jun Lee, Sang Woo Kim, Wookyong Kwon, Gyogwon Koo, Jong Pil Yun
Abstract This paper proposes an algorithm for recognizing slab identification numbers in factory scenes. In the development of a deep-learning based system, manual labeling to produce ground truth data (GTD) is an important but expensive task. Furthermore, the quality of GTD is closely related to the performance of a supervised learning algorithm. To reduce manual work in the labeling process, we generated weakly annotated GTD by marking only character centroids. Whereas bounding boxes for characters require at least a drag-and-drop operation or two clicks to annotate a character location, the weakly annotated GTD require a single click to record a character location. The main contribution of this paper is selective distillation to improve the quality of the weakly annotated GTD. Because manual GTD are usually generated by many people, they may contain personal bias or human error. To address this problem, the information in the manual GTD is integrated and refined by selective distillation. In the process of selective distillation, a fully convolutional network is trained using the weakly annotated GTD, and its prediction maps are selectively used to revise locations and boundaries of semantic regions of characters in the initial GTD. The modified GTD are used in the main training stage, and post-processing is conducted to retrieve text information. Experiments were thoroughly conducted on actual industry data collected at a steelmaking factory to demonstrate the effectiveness of the proposed method.
Tasks
Published 2018-10-09
URL http://arxiv.org/abs/1810.04029v2
PDF http://arxiv.org/pdf/1810.04029v2.pdf
PWC https://paperswithcode.com/paper/selective-distillation-of-weakly-annotated
Repo
Framework
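
The revision step can be caricatured as: trust the FCN's prediction map where it is confident, keep the manual annotation elsewhere. The confidence rule below is an illustrative assumption, not the paper's exact criterion.

```python
# Hypothetical selective-distillation step: revise the initial GTD only at
# pixels where the FCN's character-probability map is confidently high or low.
import numpy as np

def selectively_distill(initial_gtd, prediction_map, confidence_threshold=0.9):
    """initial_gtd, prediction_map: (H, W) arrays; prediction_map holds
    per-pixel character probabilities from the trained FCN."""
    confident = ((prediction_map >= confidence_threshold) |
                 (prediction_map <= 1 - confidence_threshold))
    revised = initial_gtd.copy()
    # Keep manual annotations where the FCN is unsure; trust the FCN elsewhere.
    revised[confident] = (prediction_map[confident] >= 0.5).astype(initial_gtd.dtype)
    return revised
```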

An empirical learning-based validation procedure for simulation workflow

Title An empirical learning-based validation procedure for simulation workflow
Authors Zhuqing Liu, Liyuanjun Lai, Lin Zhang
Abstract Simulation workflow is a top-level model for the design and control of a simulation process. It connects multiple simulation components with time and interaction restrictions to form a complete simulation system. Before the construction and evaluation of the component models, the validation of the upper-layer simulation workflow is of the utmost importance in a simulation system. However, methods specifically for validating simulation workflows are very limited. Many of the existing validation techniques are domain-dependent, with cumbersome questionnaire design and expert scoring. Therefore, this paper presents an empirical learning-based validation procedure to implement a semi-automated evaluation for simulation workflow. First, representative features of general simulation workflow and their relations with validation indices are proposed. The calculation process of workflow credibility based on the Analytic Hierarchy Process (AHP) is then introduced. In order to make full use of the historical data and implement more efficient validation, four learning algorithms, including back propagation neural network (BPNN), extreme learning machine (ELM), evolving neo-fuzzy neuron (eNFN) and fast incremental Gaussian mixture model (FIGMN), are introduced for constructing the empirical relation between workflow credibility and its features. A case study on a landing-process simulation workflow is established to test the feasibility of the proposed procedure. The experimental results also provide a useful overview of the state-of-the-art learning algorithms on the credibility evaluation of simulation models.
Tasks
Published 2018-09-11
URL http://arxiv.org/abs/1809.04441v1
PDF http://arxiv.org/pdf/1809.04441v1.pdf
PWC https://paperswithcode.com/paper/an-empirical-learning-based-validation
Repo
Framework
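
The AHP step mentioned in the abstract is standard: index weights come from the principal eigenvector of a pairwise-comparison matrix, and credibility is the weighted sum of index scores. A minimal sketch of that standard step (the paper's own index hierarchy is not reproduced):

```python
# Standard AHP weighting: principal eigenvector of the pairwise-comparison
# matrix, normalized to sum to one, then a weighted sum of index scores.
import numpy as np

def ahp_weights(comparison):
    """comparison[i, j] states how much more important index i is than j."""
    eigvals, eigvecs = np.linalg.eig(comparison)
    principal = eigvecs[:, np.argmax(eigvals.real)].real
    return principal / principal.sum()

def credibility(index_scores, comparison):
    return float(ahp_weights(comparison) @ index_scores)
```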

Variational Community Partition with Novel Network Structure Centrality Prior

Title Variational Community Partition with Novel Network Structure Centrality Prior
Authors Yiguang Bai, Sanyang Liu, Ke Yin, Jing Yuan
Abstract In this paper, we propose a novel two-stage optimization method for network community partition, which is based on inherent network structure information. The introduced optimization approach utilizes a new network centrality measure of both links and vertices to construct the key affinity description of the given network, where direct similarities between graph nodes or nodal features are not available to obtain the classical affinity matrix. Indeed, the calculated network centrality information captures the essential structure of the network and hence provides a proper measure for detecting network communities; it also introduces a ‘confidence’ criterion for referencing new labeled benchmark nodes. For the resulting challenging combinatorial optimization problem of graph clustering, the proposed method iteratively employs an efficient convex optimization algorithm developed under a new variational perspective of primal and dual. Experiments on both artificial and real-world network datasets demonstrate that the proposed community detection strategy significantly improves result accuracy and outperforms state-of-the-art algorithms in terms of accuracy and reliability.
Tasks Combinatorial Optimization, Community Detection, Graph Clustering, Network Community Partition
Published 2018-11-12
URL http://arxiv.org/abs/1811.04543v1
PDF http://arxiv.org/pdf/1811.04543v1.pdf
PWC https://paperswithcode.com/paper/variational-community-partition-with-novel
Repo
Framework
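
In the same spirit as the abstract, when nodal features are unavailable one can build an affinity from structural centrality and cluster on it. Below, edge betweenness stands in for the paper's novel centrality measure and spectral clustering for its variational optimizer, so this is an analogy rather than the method itself.

```python
# Centrality-driven community partition sketch: structural centrality of
# links builds the affinity matrix, then an off-the-shelf clusterer splits
# the graph. Both choices are stand-ins for the paper's components.
import networkx as nx
import numpy as np
from sklearn.cluster import SpectralClustering

def centrality_partition(graph, n_communities=2):
    nodes = list(graph.nodes)
    index = {v: i for i, v in enumerate(nodes)}
    affinity = np.zeros((len(nodes), len(nodes)))
    betweenness = nx.edge_betweenness_centrality(graph)
    for (u, v), c in betweenness.items():
        # Highly central edges tend to bridge communities: assign low affinity.
        affinity[index[u], index[v]] = affinity[index[v], index[u]] = 1.0 / (1.0 + c)
    labels = SpectralClustering(n_clusters=n_communities,
                                affinity="precomputed").fit_predict(affinity)
    return dict(zip(nodes, labels))
```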