January 30, 2020

Paper Group ANR 364

Evaluating model calibration in classification. Multi Sense Embeddings from Topic Models. Evaluating Topic Quality with Posterior Variability. Conditional Finite Mixtures of Poisson Distributions for Context-Dependent Neural Correlations. A realistic and robust model for Chinese word segmentation. A Seq-to-Seq Transformer Premised Temporal Convolut …

Evaluating model calibration in classification


Title	Evaluating model calibration in classification
Authors	Juozas Vaicenavicius, David Widmann, Carl Andersson, Fredrik Lindsten, Jacob Roll, Thomas B. Schön
Abstract	Probabilistic classifiers output a probability distribution on target classes rather than just a class prediction. Besides providing a clear separation of prediction and decision making, the main advantage of probabilistic models is their ability to represent uncertainty about predictions. In safety-critical applications, it is pivotal for a model to possess an adequate sense of uncertainty, which for probabilistic classifiers translates into outputting probability distributions that are consistent with the empirical frequencies observed from realized outcomes. A classifier with such a property is called calibrated. In this work, we develop a general theoretical calibration evaluation framework grounded in probability theory, and point out subtleties present in model calibration evaluation that lead to refined interpretations of existing evaluation techniques. Lastly, we propose new ways to quantify and visualize miscalibration in probabilistic classification, including novel multidimensional reliability diagrams.
Tasks	Calibration, Decision Making
Published	2019-02-19
URL	http://arxiv.org/abs/1902.06977v1
PDF	http://arxiv.org/pdf/1902.06977v1.pdf
PWC	https://paperswithcode.com/paper/evaluating-model-calibration-in
Repo
Framework
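
To make "consistent with the empirical frequencies" concrete, the sketch below computes the commonly used binned expected calibration error (ECE) for top-label confidences. It is a standard existing estimator, not one of the new measures the abstract proposes. The function name, the equal-width binning, and the toy data are assumptions made for illustration.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: weighted average of |accuracy - mean confidence| over bins."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Equal-width bins on [0, 1]; a confidence of exactly 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece


# Hypothetical toy predictions: top-label confidence and whether the label was right.
confidences = [0.9, 0.9, 0.9, 0.9, 0.6, 0.6]
correct = [True, True, True, False, True, False]
print(expected_calibration_error(confidences, correct))
```

A perfectly calibrated model would score 0 here. The result depends on the choice of `n_bins`, which is one reason binned estimates need careful interpretation.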

Multi Sense Embeddings from Topic Models


Title	Multi Sense Embeddings from Topic Models
Authors	Shobhit Jain, Sravan Babu Bodapati, Ramesh Nallapati, Anima Anandkumar
Abstract	Distributed word embeddings have yielded state-of-the-art performance in many NLP tasks, mainly due to their success in capturing useful semantic information. These representations assign only a single vector to each word whereas a large number of words are polysemous (i.e., have multiple meanings). In this work, we approach this critical problem in lexical semantics, namely that of representing various senses of polysemous words in vector spaces. We propose a topic modeling based skip-gram approach for learning multi-prototype word embeddings. We also introduce a method to prune the embeddings determined by the probabilistic representation of the word in each topic. We use our embeddings to show that they can capture the context and word similarity strongly and outperform various state-of-the-art implementations.
Tasks	Topic Models, Word Embeddings
Published	2019-09-17
URL	https://arxiv.org/abs/1909.07746v2
PDF	https://arxiv.org/pdf/1909.07746v2.pdf
PWC	https://paperswithcode.com/paper/multi-sense-embeddings-from-topic-models
Repo
Framework

Evaluating Topic Quality with Posterior Variability


Title	Evaluating Topic Quality with Posterior Variability
Authors	Linzi Xing, Michael J. Paul, Giuseppe Carenini
Abstract	Probabilistic topic models such as latent Dirichlet allocation (LDA) are popularly used with Bayesian inference methods such as Gibbs sampling to learn posterior distributions over topic model parameters. We derive a novel measure of LDA topic quality using the variability of the posterior distributions. Compared to several existing baselines for automatic topic evaluation, the proposed metric achieves state-of-the-art correlations with human judgments of topic quality in experiments on three corpora. We additionally demonstrate that topic quality estimation can be further improved using a supervised estimator that combines multiple metrics.
Tasks	Bayesian Inference, Topic Models
Published	2019-09-08
URL	https://arxiv.org/abs/1909.03524v2
PDF	https://arxiv.org/pdf/1909.03524v2.pdf
PWC	https://paperswithcode.com/paper/evaluating-topic-quality-with-posterior
Repo
Framework

Conditional Finite Mixtures of Poisson Distributions for Context-Dependent Neural Correlations


Title	Conditional Finite Mixtures of Poisson Distributions for Context-Dependent Neural Correlations
Authors	Sacha Sokoloski, Ruben Coen-Cagli
Abstract	Parallel recordings of neural spike counts have revealed the existence of context-dependent noise correlations in neural populations. Theories of population coding have also shown that such correlations can impact the information encoded by neural populations about external stimuli. Although studies have shown that these correlations often have a low-dimensional structure, it has proven difficult to capture this structure in a model that is compatible with theories of rate coding in correlated populations. To address this difficulty we develop a novel model based on conditional finite mixtures of independent Poisson distributions. The model can be conditioned on context variables (e.g. stimuli or task variables), and the number of mixture components in the model can be cross-validated to estimate the dimensionality of the target correlations. We derive an expectation-maximization algorithm to efficiently fit the model to realistic amounts of data from large neural populations. We then demonstrate that the model successfully captures stimulus-dependent correlations in the responses of macaque V1 neurons to oriented gratings. Our model incorporates arbitrary nonlinear context-dependence, and can thus be applied to improve predictions of neural activity based on deep neural networks.
Tasks
Published	2019-08-01
URL	https://arxiv.org/abs/1908.00637v2
PDF	https://arxiv.org/pdf/1908.00637v2.pdf
PWC	https://paperswithcode.com/paper/conditional-finite-mixtures-of-poisson
Repo
Framework
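
The following sketch illustrates why a finite mixture of independent Poisson distributions can represent correlated spike counts: within each component the neurons are independent, but mixing over components induces covariance. It only computes the mixture's mean and covariance from the law of total covariance. It is not the paper's conditional model or its EM algorithm, and the function name, weights, and rates are hypothetical.

```python
def mixture_moments(weights, rates):
    """Mean vector and covariance matrix of a mixture of independent Poissons.

    rates[k][i] is the Poisson rate of neuron i under mixture component k.
    """
    n = len(rates[0])
    mean = [sum(w * r[i] for w, r in zip(weights, rates)) for i in range(n)]
    cov = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Covariance of the component-conditional means (between-component term).
            second = sum(w * r[i] * r[j] for w, r in zip(weights, rates))
            cov[i][j] = second - mean[i] * mean[j]
            if i == j:
                # Poisson variance within a component equals its rate.
                cov[i][j] += mean[i]
    return mean, cov


# Two hypothetical components: a "low" and a "high" shared state for two neurons.
weights = [0.5, 0.5]
rates = [[2.0, 2.0], [6.0, 6.0]]
mean, cov = mixture_moments(weights, rates)
correlation = cov[0][1] / (cov[0][0] * cov[1][1]) ** 0.5
print(mean, cov, correlation)
```

With a single component the off-diagonal entries vanish, so the correlations here come entirely from the mixing.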

A realistic and robust model for Chinese word segmentation


Title	A realistic and robust model for Chinese word segmentation
Authors	Chu-Ren Huang, Ting-Shuo Yo, Petr Simon, Shu-Kai Hsieh
Abstract	A realistic Chinese word segmentation tool must adapt to textual variations with minimal training input and yet be robust enough to yield reliable segmentation results for all variants. Various lexicon-driven approaches to Chinese segmentation, e.g. [1,16], achieve high f-scores yet require massive training for any variation. Text-driven approaches, e.g. [12], can be easily adapted for domain and genre changes yet have difficulty matching the high f-scores of the lexicon-driven approaches. In this paper, we refine and implement an innovative text-driven word boundary decision (WBD) segmentation model proposed in [15]. The WBD model treats word segmentation simply and efficiently as a binary decision on whether to realize the natural textual break between two adjacent characters as a word boundary. The WBD model allows simple and quick training data preparation by converting characters into contextual vectors for learning the word boundary decision. Machine learning experiments with four different classifiers show that training with 1,000 vectors and training with 1 million vectors achieve comparable and reliable results. In addition, when applied to SigHAN Bakeoff 3 competition data, the WBD model produces OOV recall rates that are higher than all published results. Unlike all previous work, our OOV recall rate is comparable to our own F-score. Both experiments support the claim that the WBD model is a realistic model for Chinese word segmentation, as it can be easily adapted for new variants with robust results. In conclusion, we will discuss linguistic ramifications as well as future implications for the WBD approach.
Tasks	Chinese Word Segmentation
Published	2019-05-21
URL	https://arxiv.org/abs/1905.08732v1
PDF	https://arxiv.org/pdf/1905.08732v1.pdf
PWC	https://paperswithcode.com/paper/a-realistic-and-robust-model-for-chinese-word
Repo
Framework
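
The binary-decision framing in the abstract can be sketched as follows: every gap between two adjacent characters receives a label, 1 for a word boundary and 0 otherwise, and a segmentation is recovered from those labels. This shows only the labeling scheme. The contextual vectors and classifiers used in the paper are not reproduced, and the function names and example sentence are hypothetical.

```python
def boundary_labels(words):
    """Return the unsegmented text and one 0/1 label per gap between characters."""
    text = "".join(words)
    labels = []
    for word in words:
        # Gaps inside a word are non-boundaries; the gap after the word is a boundary.
        labels.extend([0] * (len(word) - 1))
        labels.append(1)
    return text, labels[:-1]  # no gap follows the final character


def segment(text, labels):
    """Rebuild the word list from the text and its per-gap boundary decisions."""
    words, start = [], 0
    for i, is_boundary in enumerate(labels):
        if is_boundary:
            words.append(text[start:i + 1])
            start = i + 1
    words.append(text[start:])
    return words


text, labels = boundary_labels(["我们", "喜欢", "自然语言"])
print(text, labels)
print(segment(text, labels))
```

Because each gap is an independent training instance, a gold-segmented corpus converts directly into labeled examples for any binary classifier.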

A Seq-to-Seq Transformer Premised Temporal Convolutional Network for Chinese Word Segmentation


Title	A Seq-to-Seq Transformer Premised Temporal Convolutional Network for Chinese Word Segmentation
Authors	Wei Jiang, Yan Tang
Abstract	Prevalent approaches to the Chinese word segmentation task mostly rely on the Bi-LSTM neural network. However, methods based on the Bi-LSTM have some inherent drawbacks: they are hard to parallelize, they are inefficient in applying the Dropout method to inhibit overfitting, and they are inefficient in capturing character information at more distant sites of a long sentence for the word segmentation task. In this work, we propose a sequence-to-sequence transformer model for Chinese word segmentation, which is premised on a type of convolutional neural network named the temporal convolutional network. The model uses the temporal convolutional network to construct an encoder and one fully-connected neural network layer to build a decoder, applies the Dropout method to inhibit overfitting, captures character information at distant sites of a sentence by adding encoder layers, binds a Conditional Random Fields model to train parameters, and uses the Viterbi algorithm to infer the final result of the Chinese word segmentation. Experiments on traditional Chinese corpora and simplified Chinese corpora show that the segmentation performance of the model is equivalent to that of the methods based on the Bi-LSTM, and that the model offers a tremendous improvement in parallel computing over the models based on the Bi-LSTM.
Tasks	Chinese Word Segmentation
Published	2019-05-21
URL	https://arxiv.org/abs/1905.08454v1
PDF	https://arxiv.org/pdf/1905.08454v1.pdf
PWC	https://paperswithcode.com/paper/a-seq-to-seq-transformer-premised-temporal
Repo
Framework
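
The final decoding step mentioned in the abstract, Viterbi inference over tag scores, can be sketched in a few lines. The B/M/E/S tag set (begin, middle, end, single), the allowed transitions, and all scores below are assumptions chosen for illustration. They are not taken from the paper, whose trained CRF parameters are not reproduced here.

```python
def viterbi(emissions, transitions, tags):
    """Best tag path given log-scores emissions[t][tag] and transitions[(prev, cur)].

    Transitions missing from the dictionary are treated as forbidden.
    """
    neg_inf = float("-inf")
    best = [{tag: emissions[0][tag] for tag in tags}]
    back = []
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for cur in tags:
            # Pick the predecessor that maximizes the path score into `cur`.
            prev = max(tags, key=lambda p: best[-1][p] + transitions.get((p, cur), neg_inf))
            scores[cur] = best[-1][prev] + transitions.get((prev, cur), neg_inf) + emissions[t][cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final tag.
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


TAGS = ["B", "M", "E", "S"]
# Hypothetical hard constraints: only these tag bigrams are allowed (score 0).
ALLOWED = [("B", "M"), ("B", "E"), ("M", "M"), ("M", "E"),
           ("E", "B"), ("E", "S"), ("S", "B"), ("S", "S")]
transitions = {pair: 0.0 for pair in ALLOWED}
# Hypothetical per-character log-scores for a three-character sentence.
emissions = [
    {"B": -0.1, "M": -5.0, "E": -5.0, "S": -1.0},
    {"B": -2.0, "M": -1.0, "E": -0.2, "S": -3.0},
    {"B": -1.0, "M": -4.0, "E": -1.5, "S": -0.3},
]
print(viterbi(emissions, transitions, TAGS))
```

Under these scores the decoded path corresponds to a two-character word followed by a single-character word.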

Revisiting Sample Selection Approach to Positive-Unlabeled Learning: Turning Unlabeled Data into Positive rather than Negative


Title	Revisiting Sample Selection Approach to Positive-Unlabeled Learning: Turning Unlabeled Data into Positive rather than Negative
Authors	Miao Xu, Bingcong Li, Gang Niu, Bo Han, Masashi Sugiyama
Abstract	In the early history of positive-unlabeled (PU) learning, the sample selection approach, which heuristically selects negative (N) data from U data, was explored extensively. However, this approach was later dominated by the importance reweighting approach, which carefully treats all U data as N data. May there be a new sample selection method that can outperform the latest importance reweighting method in the deep learning age? This paper is devoted to answering this question affirmatively—we propose to label large-loss U data as P, based on the memorization properties of deep networks. Since P data selected in such a way are biased, we develop a novel learning objective that can handle such biased P data properly. Experiments confirm the superiority of the proposed method.
Tasks
Published	2019-01-29
URL	http://arxiv.org/abs/1901.10155v1
PDF	http://arxiv.org/pdf/1901.10155v1.pdf
PWC	https://paperswithcode.com/paper/revisiting-sample-selection-approach-to
Repo
Framework

Findings of the First Shared Task on Machine Translation Robustness


Title	Findings of the First Shared Task on Machine Translation Robustness
Authors	Xian Li, Paul Michel, Antonios Anastasopoulos, Yonatan Belinkov, Nadir Durrani, Orhan Firat, Philipp Koehn, Graham Neubig, Juan Pino, Hassan Sajjad
Abstract	We share the findings of the first shared task on improving robustness of Machine Translation (MT). The task provides a testbed representing challenges facing MT models deployed in the real world, and facilitates new approaches to improve models' robustness to noisy input and domain mismatch. We focus on two language pairs (English-French and English-Japanese), and the submitted systems are evaluated on a blind test set consisting of noisy comments on Reddit and professionally sourced translations. As a new task, we received 23 submissions by 11 participating teams from universities, companies, national labs, etc. All submitted systems achieved large improvements over baselines, with the best improvement having +22.33 BLEU. We evaluated submissions by both human judgment and automatic evaluation (BLEU), which shows high correlations (Pearson’s r = 0.94 and 0.95). Furthermore, we conducted a qualitative analysis of the submitted systems using compare-mt, which revealed their salient differences in handling challenges in this task. Such analysis provides additional insights when there is occasional disagreement between human judgment and BLEU, e.g. systems better at producing colloquial expressions received higher scores from human judgment.
Tasks	Machine Translation
Published	2019-06-27
URL	https://arxiv.org/abs/1906.11943v2
PDF	https://arxiv.org/pdf/1906.11943v2.pdf
PWC	https://paperswithcode.com/paper/findings-of-the-first-shared-task-on-machine
Repo
Framework
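
The agreement between human judgment and BLEU is reported above as Pearson's r. As a reminder of what that statistic measures, here is a minimal sketch of the sample correlation coefficient. The system scores are invented placeholders, not the shared task's data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Sample Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical per-system scores: BLEU and an average human rating.
bleu = [20.1, 25.4, 30.2, 33.8]
human = [55.0, 61.0, 70.0, 72.5]
print(pearson_r(bleu, human))
```

A value near 1 means systems are ranked and spaced similarly by both metrics. It says nothing about individual sentences, which is where the qualitative analysis comes in.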

Diversity of Ensembles for Data Stream Classification


Title	Diversity of Ensembles for Data Stream Classification
Authors	Mohamed Souhayel Abassi
Abstract	When constructing a classifier ensemble, diversity among the base classifiers is one of the important characteristics. Several studies have been made in the context of standard static data, in particular, when analyzing the relationship between a high ensemble predictive performance and the diversity of its components. Besides, ensembles of learning machines have been used to learn in the presence of concept drift and adapt to it. However, diversity measures have not received much research interest in evolving data streams. Only a few researchers directly consider promoting diversity while constructing an ensemble or rebuilding it at the moment of detecting drifts. In this paper, we present a theoretical analysis of different diversity measures and relate them to the success of ensemble learning algorithms for streaming data. The analysis provides a deeper understanding of the concept of diversity and its impact on online ensemble learning in the presence of concept drift. More precisely, we are interested in answering the following research question: Which commonly used diversity measures are used in the context of static-data ensembles and how far are they applicable in the context of streaming data ensembles?
Tasks
Published	2019-02-22
URL	http://arxiv.org/abs/1902.08466v1
PDF	http://arxiv.org/pdf/1902.08466v1.pdf
PWC	https://paperswithcode.com/paper/diversity-of-ensembles-for-data-stream
Repo
Framework
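
For readers unfamiliar with diversity measures, the sketch below computes two widely cited pairwise measures from the static-data literature, the disagreement measure and Yule's Q statistic, from per-example correctness flags of two base classifiers. The abstract does not say which measures the paper analyzes, so these are illustrative choices, and the data is hypothetical.

```python
def pairwise_diversity(correct_a, correct_b):
    """Disagreement measure and Q statistic for two classifiers.

    correct_a[i] and correct_b[i] say whether each classifier got example i right.
    """
    n11 = sum(1 for a, b in zip(correct_a, correct_b) if a and b)
    n00 = sum(1 for a, b in zip(correct_a, correct_b) if not a and not b)
    n10 = sum(1 for a, b in zip(correct_a, correct_b) if a and not b)
    n01 = sum(1 for a, b in zip(correct_a, correct_b) if not a and b)
    total = n11 + n00 + n10 + n01
    # Fraction of examples on which exactly one classifier is correct.
    disagreement = (n01 + n10) / total
    denom = n11 * n00 + n01 * n10
    # Q is undefined when the denominator is zero; report None in that case.
    q_stat = (n11 * n00 - n01 * n10) / denom if denom else None
    return disagreement, q_stat


# Hypothetical correctness flags on six examples.
a = [True, True, False, False, True, False]
b = [True, False, False, True, True, False]
print(pairwise_diversity(a, b))
```

In a streaming setting such counts would typically be maintained over a sliding window rather than a fixed dataset. That windowing is an assumption here, not something the abstract specifies.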


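Learning in Modal Space: Solving Time-Dependent Stochastic PDEs Using Physics-Informed Neural Networks


Title	Learning in Modal Space: Solving Time-Dependent Stochastic PDEs Using Physics-Informed Neural Networks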
Authors	Dongkun Zhang, Ling Guo, George Em Karniadakis
Abstract	One of the open problems in scientific computing is the long-time integration of nonlinear stochastic partial differential equations (SPDEs). We address this problem by taking advantage of recent advances in scientific machine learning and the dynamically orthogonal (DO) and bi-orthogonal (BO) methods for representing stochastic processes. Specifically, we propose two new Physics-Informed Neural Networks (PINNs) for solving time-dependent SPDEs, namely the NN-DO/BO methods, which incorporate the DO/BO constraints into the loss function with an implicit form instead of generating explicit expressions for the temporal derivatives of the DO/BO modes. Hence, the proposed methods overcome some of the drawbacks of the original DO/BO methods: we do not need the assumption that the covariance matrix of the random coefficients is invertible as in the original DO method, and we can remove the assumption of no eigenvalue crossing as in the original BO method. Moreover, the NN-DO/BO methods can be used to solve time-dependent stochastic inverse problems with the same formulation and computational complexity as for forward problems. We demonstrate the capability of the proposed methods via several numerical examples: (1) A linear stochastic advection equation with deterministic initial condition where the original DO/BO method would fail; (2) Long-time integration of the stochastic Burgers’ equation with many eigenvalue crossings during the whole time evolution where the original BO method fails. (3) Nonlinear reaction diffusion equation: we consider both the forward and the inverse problem, including noisy initial data, to investigate the flexibility of the NN-DO/BO methods in handling inverse and mixed type problems. Taken together, these simulation results demonstrate that the NN-DO/BO methods can be employed to effectively quantify uncertainty propagation in a wide range of physical problems.
Tasks
Published	2019-05-03
URL	https://arxiv.org/abs/1905.01205v2
PDF	https://arxiv.org/pdf/1905.01205v2.pdf
PWC	https://paperswithcode.com/paper/learning-in-modal-space-solving-time
Repo
Framework

Toward Fast and Accurate Neural Chinese Word Segmentation with Multi-Criteria Learning


Title	Toward Fast and Accurate Neural Chinese Word Segmentation with Multi-Criteria Learning
Authors	Weipeng Huang, Xingyi Cheng, Kunlong Chen, Taifeng Wang, Wei Chu
Abstract	Ambiguous annotation criteria lead to divergence among Chinese Word Segmentation (CWS) datasets with various granularities. Multi-criteria learning leverages the annotation style of individual datasets and mines their common basic knowledge. In this paper, we propose a domain adaptive segmenter to capture diverse criteria of datasets. Our model is based on Bidirectional Encoder Representations from Transformers (BERT), which is responsible for introducing external knowledge. We also optimize its computational efficiency via model pruning, quantization, and compiler optimization. Experiments show that our segmenter outperforms the previous results on 10 CWS datasets and is faster than the previous state-of-the-art Bi-LSTM-CRF model.
Tasks	Chinese Word Segmentation, Quantization
Published	2019-03-11
URL	http://arxiv.org/abs/1903.04190v1
PDF	http://arxiv.org/pdf/1903.04190v1.pdf
PWC	https://paperswithcode.com/paper/toward-fast-and-accurate-neural-chinese-word
Repo
Framework

Causal structure based root cause analysis of outliers


Title	Causal structure based root cause analysis of outliers
Authors	Dominik Janzing, Kailash Budhathoki, Lenon Minorics, Patrick Blöbaum
Abstract	We describe a formal approach to identify ‘root causes’ of outliers observed in $n$ variables $X_1,\dots,X_n$ in a scenario where the causal relation between the variables is a known directed acyclic graph (DAG). To this end, we first introduce a systematic way to define outlier scores. Further, we introduce the concept of ‘conditional outlier score’ which measures whether a value of some variable is unexpected given the value of its parents in the DAG, if one were to assume that the causal structure and the corresponding conditional distributions are also valid for the anomaly. Finally, we quantify to what extent the high outlier score of some target variable can be attributed to outliers of its ancestors. This quantification is defined via Shapley values from cooperative game theory.
Tasks
Published	2019-12-05
URL	https://arxiv.org/abs/1912.02724v1
PDF	https://arxiv.org/pdf/1912.02724v1.pdf
PWC	https://paperswithcode.com/paper/causal-structure-based-root-cause-analysis-of
Repo
Framework
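
The attribution step in the abstract relies on Shapley values. The sketch below computes exact Shapley values for a small cooperative game by averaging each player's marginal contribution over all orderings. The value function is a toy stand-in. The paper's outlier-score-based value function over a causal DAG is not reproduced, and the names are hypothetical.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values; `value` maps a frozenset of players to a number."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal contribution of p when it joins the current coalition.
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: total / len(orderings) for p, total in totals.items()}


def toy_value(coalition):
    # Hypothetical game: the payoff appears only when both "a" and "b" are present.
    return 1.0 if {"a", "b"} <= coalition else 0.0


print(shapley_values(["a", "b", "c"], toy_value))
```

Enumerating all orderings costs factorial time, so this is practical only for a handful of variables. Larger problems usually need sampling-based approximations.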

Chinese Word Segmentation: Another Decade Review (2007-2017)


Title	Chinese Word Segmentation: Another Decade Review (2007-2017)
Authors	Hai Zhao, Deng Cai, Changning Huang, Chunyu Kit
Abstract	This paper reviews the development of Chinese word segmentation (CWS) in the most recent decade, 2007-2017. Special attention was paid to the deep learning technologies that have already permeated into most areas of natural language processing (NLP). The basic view we have arrived at is that compared to traditional supervised learning methods, neural network based methods have not shown any superior performance. The most critical challenge still lies in balancing the recognition of in-vocabulary (IV) and out-of-vocabulary (OOV) words. However, as neural models have the potential to capture the essential linguistic structure of natural language, we are optimistic that significant progress may arrive in the near future.
Tasks	Chinese Word Segmentation
Published	2019-01-18
URL	http://arxiv.org/abs/1901.06079v1
PDF	http://arxiv.org/pdf/1901.06079v1.pdf
PWC	https://paperswithcode.com/paper/chinese-word-segmentation-another-decade
Repo
Framework
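
The IV/OOV balance mentioned above is usually tracked with separate recall figures. The sketch below shows one plausible way to compute them: a gold word counts as recalled if the predicted segmentation contains a word with exactly the same character span, and it is IV or OOV depending on whether it appears in the training vocabulary. The function names and toy data are assumptions, not the scoring script of any particular bakeoff.

```python
def spans(words):
    """Set of (start, end) character spans covered by a word sequence."""
    out, start = set(), 0
    for word in words:
        out.add((start, start + len(word)))
        start += len(word)
    return out


def iv_oov_recall(gold, predicted, vocabulary):
    """Recall over gold words, split by membership in the training vocabulary."""
    pred_spans = spans(predicted)
    hits = {"iv": 0, "oov": 0}
    totals = {"iv": 0, "oov": 0}
    start = 0
    for word in gold:
        kind = "iv" if word in vocabulary else "oov"
        totals[kind] += 1
        if (start, start + len(word)) in pred_spans:
            hits[kind] += 1
        start += len(word)
    # None marks a category with no gold words.
    return {k: (hits[k] / totals[k] if totals[k] else None) for k in totals}


# Hypothetical example using Latin letters as stand-in characters.
gold = ["ab", "c", "de"]
predicted = ["ab", "cd", "e"]
print(iv_oov_recall(gold, predicted, vocabulary={"ab", "c"}))
```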

Transferability of Operational Status Classification Models Among Different Wind Turbine Types


Title	Transferability of Operational Status Classification Models Among Different Wind Turbine Types
Authors	Z. Trstanova, A. Martinsson, C. Matthews, S. Jimenez, B. Leimkuhler, T. Van Delft, M. Wilkinson
Abstract	A detailed understanding of wind turbine performance status classification can improve operations and maintenance in the wind energy industry. Due to different engineering properties of wind turbines, the standard supervised learning models used for classification do not generalize across data sets obtained from different wind sites. We propose two methods to deal with the transferability of the trained models: first, data normalization in the form of power curve alignment, and second, a robust method based on convolutional neural networks and feature-space extension. We demonstrate the success of our methods on real-world data sets with industrial applications.
Tasks
Published	2019-03-21
URL	http://arxiv.org/abs/1903.08901v1
PDF	http://arxiv.org/pdf/1903.08901v1.pdf
PWC	https://paperswithcode.com/paper/transferability-of-operational-status
Repo
Framework

Predicting overweight and obesity in later life from childhood data: A review of predictive modeling approaches


Title	Predicting overweight and obesity in later life from childhood data: A review of predictive modeling approaches
Authors	Ilkka Rautiainen, Sami Äyrämö
Abstract	Background: Overweight and obesity are an increasing phenomenon worldwide. Reliably predicting future overweight or obesity in early childhood could enable a successful intervention by experts. While a lot of research has been done using explanatory modeling methods, the capabilities of machine learning, and predictive modeling in particular, remain mainly unexplored. In predictive modeling, models are validated with previously unseen examples, giving a more accurate estimate of their performance and generalization ability in real-life scenarios. Objective: To find and review existing overweight or obesity research from the perspective of employing childhood data and predictive modeling methods. Methods: The initial phase included bibliographic searches using relevant search terms in PubMed, IEEE database and Google Scholar. The second phase consisted of iteratively searching references of potential studies and recent research that cite the potential studies. Results: Eight research articles and three review articles were identified as relevant for this review. Conclusions: Prediction models with high performance either have a relatively short time period to predict and/or are based on late childhood data. Logistic regression is currently the most often used method in forming the prediction models. In addition to the child’s own weight and height information, maternal weight status or body mass index was often used as a predictor in the models.
Tasks
Published	2019-11-19
URL	https://arxiv.org/abs/1911.08361v1
PDF	https://arxiv.org/pdf/1911.08361v1.pdf
PWC	https://paperswithcode.com/paper/predicting-overweight-and-obesity-in-later
Repo
Framework