January 26, 2020

Paper Group ANR 1408

Calibration, Entropy Rates, and Memory in Language Models

Title Calibration, Entropy Rates, and Memory in Language Models
Authors Mark Braverman, Xinyi Chen, Sham M. Kakade, Karthik Narasimhan, Cyril Zhang, Yi Zhang
Abstract Building accurate language models that capture meaningful long-term dependencies is a core challenge in natural language processing. Towards this end, we present a calibration-based approach to measure long-term discrepancies between a generative sequence model and the true distribution, and use these discrepancies to improve the model. Empirically, we show that state-of-the-art language models, including LSTMs and Transformers, are miscalibrated: the entropy rates of their generations drift dramatically upward over time. We then provide provable methods to mitigate this phenomenon. Furthermore, we show how this calibration-based approach can also be used to measure the amount of memory that language models use for prediction.
Tasks Calibration
Published 2019-06-11
URL https://arxiv.org/abs/1906.05664v1
PDF https://arxiv.org/pdf/1906.05664v1.pdf
PWC https://paperswithcode.com/paper/calibration-entropy-rates-and-memory-in
Repo
Framework
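
As a rough illustration of the quantity this abstract tracks, the sketch below computes a running entropy rate from a list of next-token distributions. The function names and the toy input are assumptions made for illustration, not code from the paper; in practice the distributions would come from a language model conditioned on its own generated prefix, and an upward-drifting curve would correspond to the miscalibration described above.

```python
import math


def step_entropy(probs):
    """Shannon entropy (in nats) of a single next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_rate_curve(step_distributions):
    """Running mean of per-step entropies.

    curve[t] is the average entropy over generation steps 0..t, so a
    curve that keeps rising indicates upward entropy drift.
    """
    curve = []
    total = 0.0
    for t, probs in enumerate(step_distributions, start=1):
        total += step_entropy(probs)
        curve.append(total / t)
    return curve


# Toy input (hypothetical): the predictions become more uniform over time.
toy_steps = [
    [0.97, 0.01, 0.01, 0.01],
    [0.70, 0.10, 0.10, 0.10],
    [0.25, 0.25, 0.25, 0.25],
]
curve = entropy_rate_curve(toy_steps)
```

On this toy input the curve increases at every step, because each distribution is flatter than the one before it.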

Analyzing the Role of Model Uncertainty for Electronic Health Records

Title Analyzing the Role of Model Uncertainty for Electronic Health Records
Authors Michael W. Dusenberry, Dustin Tran, Edward Choi, Jonas Kemp, Jeremy Nixon, Ghassen Jerfel, Katherine Heller, Andrew M. Dai
Abstract In medicine, both ethical and monetary costs of incorrect predictions can be significant, and the complexity of the problems often necessitates increasingly complex models. Recent work has shown that changing just the random seed is enough for otherwise well-tuned deep neural networks to vary in their individual predicted probabilities. In light of this, we investigate the role of model uncertainty methods in the medical domain. Using RNN ensembles and various Bayesian RNNs, we show that population-level metrics, such as AUC-PR, AUC-ROC, log-likelihood, and calibration error, do not capture model uncertainty. Meanwhile, the presence of significant variability in patient-specific predictions and optimal decisions motivates the need for capturing model uncertainty. Understanding the uncertainty for individual patients is an area with clear clinical impact, such as determining when a model decision is likely to be brittle. We further show that RNNs with only Bayesian embeddings can be a more efficient way to capture model uncertainty compared to ensembles, and we analyze how model uncertainty is impacted across individual input features and patient subgroups.
Tasks Calibration
Published 2019-06-10
URL https://arxiv.org/abs/1906.03842v3
PDF https://arxiv.org/pdf/1906.03842v3.pdf
PWC https://paperswithcode.com/paper/analyzing-the-role-of-model-uncertainty-for
Repo
Framework

A Deep Factorization of Style and Structure in Fonts

Title A Deep Factorization of Style and Structure in Fonts
Authors Akshay Srivatsan, Jonathan T. Barron, Dan Klein, Taylor Berg-Kirkpatrick
Abstract We propose a deep factorization model for typographic analysis that disentangles content from style. Specifically, a variational inference procedure factors each training glyph into the combination of a character-specific content embedding and a latent font-specific style variable. The underlying generative model combines these factors through an asymmetric transpose convolutional process to generate the image of the glyph itself. When trained on corpora of fonts, our model learns a manifold over font styles that can be used to analyze or reconstruct new, unseen fonts. On the task of reconstructing missing glyphs from an unknown font given only a small number of observations, our model outperforms both a strong nearest neighbors baseline and a state-of-the-art discriminative model from prior work.
Tasks
Published 2019-10-02
URL https://arxiv.org/abs/1910.00748v1
PDF https://arxiv.org/pdf/1910.00748v1.pdf
PWC https://paperswithcode.com/paper/a-deep-factorization-of-style-and-structure
Repo
Framework

The few-get-richer: a surprising consequence of popularity-based rankings

Title The few-get-richer: a surprising consequence of popularity-based rankings
Authors Fabrizio Germano, Vicenç Gómez, Gaël Le Mens
Abstract Ranking algorithms play a crucial role in online platforms ranging from search engines to recommender systems. In this paper, we identify a surprising consequence of popularity-based rankings: the fewer the items reporting a given signal, the higher the share of the overall traffic they collectively attract. This few-get-richer effect emerges in settings where there are few distinct classes of items (e.g., left-leaning news sources versus right-leaning news sources), and items are ranked based on their popularity. We demonstrate analytically that the few-get-richer effect emerges when people tend to click on top-ranked items and have heterogeneous preferences for the classes of items. Using simulations, we analyze how the strength of the effect changes with assumptions about the setting and human behavior. We also test our predictions experimentally in an online experiment with human participants. Our findings have important implications to understand the spread of misinformation.
Tasks Recommendation Systems
Published 2019-02-07
URL https://arxiv.org/abs/1902.02580v2
PDF https://arxiv.org/pdf/1902.02580v2.pdf
PWC https://paperswithcode.com/paper/the-few-get-richer-a-surprising-consequence
Repo
Framework

Deep Learning Methods for Signature Verification

Title Deep Learning Methods for Signature Verification
Authors Zihan Zeng, Jing Tian
Abstract Signatures are widely used in daily life and serve as a supplementary characteristic for verifying human identity. However, there is little work on signature verification. In this paper, we propose a few deep learning architectures to tackle this task, ranging from CNN and RNN to a compact CNN-RNN model. We also improve Path Signature Features by encoding temporal information in order to enlarge the discrepancy between genuine and forged signatures. Our numerical experiments demonstrate the effectiveness of our constructed models and feature representations.
Tasks
Published 2019-12-08
URL https://arxiv.org/abs/1912.05435v1
PDF https://arxiv.org/pdf/1912.05435v1.pdf
PWC https://paperswithcode.com/paper/deep-learning-methods-for-signature
Repo
Framework

Mixture Probabilistic Principal Geodesic Analysis

Title Mixture Probabilistic Principal Geodesic Analysis
Authors Youshan Zhang, Jiarui Xing, Miaomiao Zhang
Abstract Dimensionality reduction on Riemannian manifolds is challenging due to the complex nonlinear data structures. While probabilistic principal geodesic analysis (PPGA) has been proposed to generalize conventional principal component analysis (PCA) onto manifolds, its effectiveness is limited to data with a single modality. In this paper, we present a novel Gaussian latent variable model that provides a unique way to integrate multiple PGA models into a maximum-likelihood framework. This leads to a well-defined mixture model of probabilistic principal geodesic analysis (MPPGA) on sub-populations, where parameters of the principal subspaces are automatically estimated by employing an Expectation Maximization algorithm. We further develop a mixture Bayesian PGA (MBPGA) model that automatically reduces data dimensionality by suppressing irrelevant principal geodesics. We demonstrate the advantages of our model in the contexts of clustering and statistical shape analysis, using synthetic sphere data, real corpus callosum, and mandible data from human brain magnetic resonance (MR) and CT images.
Tasks Dimensionality Reduction
Published 2019-09-03
URL https://arxiv.org/abs/1909.01412v2
PDF https://arxiv.org/pdf/1909.01412v2.pdf
PWC https://paperswithcode.com/paper/mixture-probabilistic-principal
Repo
Framework

Image Formation Model Guided Deep Image Super-Resolution

Title Image Formation Model Guided Deep Image Super-Resolution
Authors Jinshan Pan, Yang Liu, Deqing Sun, Jimmy Ren, Ming-Ming Cheng, Jian Yang, Jinhui Tang
Abstract We present a simple and effective image super-resolution algorithm that imposes an image formation constraint on the deep neural networks via pixel substitution. The proposed algorithm first uses a deep neural network to estimate intermediate high-resolution images, blurs the intermediate images using known blur kernels, and then substitutes values of the pixels at the un-decimated positions with those of the corresponding pixels from the low-resolution images. The output of the pixel substitution process strictly satisfies the image formation model and is further refined by the same deep neural network in a cascaded manner. The proposed framework is trained in an end-to-end fashion and can work with existing feed-forward deep neural networks for super-resolution and converges fast in practice. Extensive experimental results show that the proposed algorithm performs favorably against state-of-the-art methods.
Tasks Image Super-Resolution, Super-Resolution
Published 2019-08-18
URL https://arxiv.org/abs/1908.06444v3
PDF https://arxiv.org/pdf/1908.06444v3.pdf
PWC https://paperswithcode.com/paper/image-formation-model-guided-deep-image-super
Repo
Framework
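
The pixel substitution step described in this abstract can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: the helper names are hypothetical, the blur is a plain correlation with an odd-sized kernel and edge padding, and decimation is assumed to keep every `scale`-th pixel starting at index 0. The deep network that produces `hr_estimate` is out of scope here.

```python
import numpy as np


def blur(image, kernel):
    """Same-size 2-D correlation with edge padding (odd-sized kernel assumed)."""
    kh, kw = kernel.shape
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="edge")
    out = np.zeros(image.shape, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + image.shape[0], j:j + image.shape[1]]
    return out


def pixel_substitution(hr_estimate, lr_image, kernel, scale):
    """Blur the intermediate high-resolution estimate, then overwrite the
    pixels that survive decimation with the observed low-resolution pixels.

    Assumption: decimation keeps positions (0, scale, 2 * scale, ...) on
    both axes, so downsampling the result reproduces lr_image exactly.
    """
    substituted = blur(hr_estimate, kernel)
    substituted[::scale, ::scale] = lr_image
    return substituted


# Toy example with made-up values.
hr_estimate = np.arange(16, dtype=float).reshape(4, 4)
lr_image = np.array([[100.0, 200.0], [300.0, 400.0]])
box_kernel = np.full((3, 3), 1.0 / 9.0)
result = pixel_substitution(hr_estimate, lr_image, box_kernel, scale=2)
```

By construction, `result[::2, ::2]` equals `lr_image`, which is the sense in which the output satisfies the image formation constraint under the assumptions above.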

A close-up comparison of the misclassification error distance and the adjusted Rand index for external clustering evaluation

Title A close-up comparison of the misclassification error distance and the adjusted Rand index for external clustering evaluation
Authors José E. Chacón
Abstract The misclassification error distance and the adjusted Rand index are two of the most commonly used criteria to evaluate the performance of clustering algorithms. This paper provides an in-depth comparison of the two criteria, aimed to better understand exactly what they measure, their properties and their differences. Starting from their population origins, the investigation includes many data analysis examples and the study of particular cases in great detail. An exhaustive simulation study allows inspecting the criteria distributions and reveals some previous misconceptions.
Tasks
Published 2019-07-26
URL https://arxiv.org/abs/1907.11505v1
PDF https://arxiv.org/pdf/1907.11505v1.pdf
PWC https://paperswithcode.com/paper/a-close-up-comparison-of-the
Repo
Framework
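
For readers who want to compare the two criteria on small examples, the sketch below computes both from two label vectors. It uses the standard contingency-table formula for the adjusted Rand index and a brute-force search over label matchings for the misclassification error distance; the function names are assumptions for illustration, and the brute-force search is only practical for a handful of clusters.

```python
from collections import Counter
from itertools import permutations
from math import comb


def misclassification_error(labels_a, labels_b):
    """Smallest fraction of points that disagree over all one-to-one
    matchings between the clusters of the two labelings (brute force)."""
    counts = Counter(zip(labels_a, labels_b))
    clusters_a = list(set(labels_a))
    clusters_b = list(set(labels_b))
    # Pad the shorter list with dummy clusters so a full matching exists.
    size = max(len(clusters_a), len(clusters_b))
    clusters_a += [object() for _ in range(size - len(clusters_a))]
    clusters_b += [object() for _ in range(size - len(clusters_b))]
    best_agreement = max(
        sum(counts.get((a, b), 0) for a, b in zip(clusters_a, perm))
        for perm in permutations(clusters_b)
    )
    return 1.0 - best_agreement / len(labels_a)


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index from the contingency table of the two labelings.

    Undefined (division by zero) for fewer than two points or when both
    labelings put every point in a single cluster.
    """
    n = len(labels_a)
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    maximum = 0.5 * (sum_a + sum_b)
    return (sum_cells - expected) / (maximum - expected)


truth = [0, 0, 1, 1]
guess = [0, 0, 0, 1]
me = misclassification_error(truth, guess)   # one of four points is off
ari = adjusted_rand_index(truth, guess)
```

On this toy pair the misclassification error distance is 0.25 while the adjusted Rand index is 0, a small reminder that the two criteria measure different things.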

Towards Practical Lipschitz Stochastic Bandits

Title Towards Practical Lipschitz Stochastic Bandits
Authors Tianyu Wang, Weicheng Ye, Dawei Geng, Cynthia Rudin
Abstract Stochastic Lipschitz bandit algorithms are methods that govern exploration-exploitation tradeoffs, and have been used for a variety of important task domains, including zeroth order optimization. While beautiful theory has been developed for the stochastic Lipschitz bandit problem, the methods arising from these theories are not practical, and accordingly, the development of practical well-performing Lipschitz bandit algorithms has stalled in recent years. To remedy this, we present a framework for Lipschitz bandit methods that adaptively learns partitions of context- and arm-space. Due to this flexibility, the algorithm is able to efficiently optimize rewards and minimize regret, by focusing on the portions of the space that are most relevant. Our experiments show that (1) using adaptively-learned partitioning, our method can surpass existing stochastic Lipschitz bandit algorithms, and (2) our algorithms can achieve state-of-the-art performance in challenging real-world tasks such as neural network hyperparameter tuning.
Tasks
Published 2019-01-26
URL https://arxiv.org/abs/1901.09277v4
PDF https://arxiv.org/pdf/1901.09277v4.pdf
PWC https://paperswithcode.com/paper/a-practical-bandit-method-with-advantages-in
Repo
Framework

Fact-Checking Meets Fauxtography: Verifying Claims About Images

Title Fact-Checking Meets Fauxtography: Verifying Claims About Images
Authors Dimitrina Zlatkova, Preslav Nakov, Ivan Koychev
Abstract The recent explosion of false claims in social media and on the Web in general has given rise to a lot of manual fact-checking initiatives. Unfortunately, the number of claims that need to be fact-checked is several orders of magnitude larger than what humans can handle manually. Thus, there has been a lot of research aiming at automating the process. Interestingly, previous work has largely ignored the growing number of claims about images. This is despite the fact that visual imagery is more influential than text and naturally appears alongside fake news. Here we aim at bridging this gap. In particular, we create a new dataset for this problem, and we explore a variety of features modeling the claim, the image, and the relationship between the claim and the image. The evaluation results show sizable improvements over the baseline. We release our dataset, hoping to enable further research on fact-checking claims about images.
Tasks
Published 2019-08-30
URL https://arxiv.org/abs/1908.11722v1
PDF https://arxiv.org/pdf/1908.11722v1.pdf
PWC https://paperswithcode.com/paper/fact-checking-meets-fauxtography-verifying
Repo
Framework

Gender Bias in Contextualized Word Embeddings

Title Gender Bias in Contextualized Word Embeddings
Authors Jieyu Zhao, Tianlu Wang, Mark Yatskar, Ryan Cotterell, Vicente Ordonez, Kai-Wei Chang
Abstract In this paper, we quantify, analyze and mitigate gender bias exhibited in ELMo’s contextualized word vectors. First, we conduct several intrinsic analyses and find that (1) training data for ELMo contains significantly more male than female entities, (2) the trained ELMo embeddings systematically encode gender information and (3) ELMo unequally encodes gender information about male and female entities. Then, we show that a state-of-the-art coreference system that depends on ELMo inherits its bias and demonstrates significant bias on the WinoBias probing corpus. Finally, we explore two methods to mitigate such gender bias and show that the bias demonstrated on WinoBias can be eliminated.
Tasks Word Embeddings
Published 2019-04-05
URL http://arxiv.org/abs/1904.03310v1
PDF http://arxiv.org/pdf/1904.03310v1.pdf
PWC https://paperswithcode.com/paper/gender-bias-in-contextualized-word-embeddings
Repo
Framework

Generalized Belief Function: A new concept for uncertainty modelling and processing

Title Generalized Belief Function: A new concept for uncertainty modelling and processing
Authors Fuyuan Xiao
Abstract In this paper, we generalize the belief function on complex plane from another point of view. We first propose a new concept of complex mass function based on the complex number, called complex basic belief assignment, which is a generalization of the traditional mass function in Dempster-Shafer evidence theory. On the basis of the definition of complex mass function, the belief function and plausibility function are generalized. In particular, when the complex mass function is degenerated from complex numbers to real numbers, the generalized belief and plausibility functions degenerate into the traditional belief and plausibility functions in DSE theory, respectively.
Tasks
Published 2019-07-03
URL https://arxiv.org/abs/1907.04719v1
PDF https://arxiv.org/pdf/1907.04719v1.pdf
PWC https://paperswithcode.com/paper/generalized-belief-function-a-new-concept-for
Repo
Framework

Long-span language modeling for speech recognition

Title Long-span language modeling for speech recognition
Authors Sarangarajan Parthasarathy, William Gale, Xie Chen, George Polovets, Shuangyu Chang
Abstract We explore neural language modeling for speech recognition where the context spans multiple sentences. Rather than encode history beyond the current sentence using a cache of words or document-level features, we focus our study on the ability of LSTM and Transformer language models to implicitly learn to carry over context across sentence boundaries. We introduce a new architecture that incorporates an attention mechanism into LSTM to combine the benefits of recurrent and attention architectures. We conduct language modeling and speech recognition experiments on the publicly available LibriSpeech corpus. We show that conventional training on a paragraph-level corpus results in significant reductions in perplexity compared to training on a sentence-level corpus. We also describe speech recognition experiments using long-span language models in second-pass re-ranking, and provide insights into the ability of such models to take advantage of context beyond the current sentence.
Tasks Language Modelling, Speech Recognition
Published 2019-11-11
URL https://arxiv.org/abs/1911.04571v1
PDF https://arxiv.org/pdf/1911.04571v1.pdf
PWC https://paperswithcode.com/paper/long-span-language-modeling-for-speech
Repo
Framework

Consistent recovery threshold of hidden nearest neighbor graphs

Title Consistent recovery threshold of hidden nearest neighbor graphs
Authors Jian Ding, Yihong Wu, Jiaming Xu, Dana Yang
Abstract Motivated by applications such as discovering strong ties in social networks and assembling genome subsequences in biology, we study the problem of recovering a hidden $2k$-nearest neighbor (NN) graph in an $n$-vertex complete graph, whose edge weights are independent and distributed according to $P_n$ for edges in the hidden $2k$-NN graph and $Q_n$ otherwise. The special case of Bernoulli distributions corresponds to a variant of the Watts-Strogatz small-world graph. We focus on two types of asymptotic recovery guarantees as $n\to \infty$: (1) exact recovery: all edges are classified correctly with probability tending to one; (2) almost exact recovery: the expected number of misclassified edges is $o(nk)$. We show that the maximum likelihood estimator achieves (1) exact recovery for $2 \le k \le n^{o(1)}$ if $\liminf \frac{2\alpha_n}{\log n}>1$; (2) almost exact recovery for $1 \le k \le o\left( \frac{\log n}{\log \log n} \right)$ if $\liminf \frac{kD(P_n\|Q_n)}{\log n}>1$, where $\alpha_n \triangleq -2 \log \int \sqrt{d P_n d Q_n}$ is the Rényi divergence of order $\frac{1}{2}$ and $D(P_n\|Q_n)$ is the Kullback-Leibler divergence. Under mild distributional assumptions, these conditions are shown to be information-theoretically necessary for any algorithm to succeed. A key challenge in the analysis is the enumeration of $2k$-NN graphs that differ from the hidden one by a given number of edges.
Tasks
Published 2019-11-18
URL https://arxiv.org/abs/1911.08004v1
PDF https://arxiv.org/pdf/1911.08004v1.pdf
PWC https://paperswithcode.com/paper/consistent-recovery-threshold-of-hidden
Repo
Framework
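
To make the two divergences in this abstract concrete, the sketch below evaluates them for the Bernoulli special case it mentions, with edge weights Bernoulli(p) inside the hidden graph and Bernoulli(q) outside. The function names and the finite-n ratios are assumptions for illustration: the stated conditions are asymptotic (a liminf as n grows), so evaluating the ratios at a single n only gives a feel for the quantities and proves nothing.

```python
import math


def renyi_half_bernoulli(p, q):
    """Order-1/2 Renyi divergence between Bernoulli(p) and Bernoulli(q):
    -2 * log( sqrt(p*q) + sqrt((1-p)*(1-q)) )."""
    affinity = math.sqrt(p * q) + math.sqrt((1.0 - p) * (1.0 - q))
    return -2.0 * math.log(affinity)


def kl_bernoulli(p, q):
    """Kullback-Leibler divergence D(Bernoulli(p) || Bernoulli(q)), 0 < q < 1."""
    total = 0.0
    if p > 0.0:
        total += p * math.log(p / q)
    if p < 1.0:
        total += (1.0 - p) * math.log((1.0 - p) / (1.0 - q))
    return total


def finite_n_ratios(p, q, n, k):
    """Finite-n versions of the two ratios compared against 1 in the abstract:
    2 * alpha_n / log n  and  k * D(P_n || Q_n) / log n."""
    log_n = math.log(n)
    return (2.0 * renyi_half_bernoulli(p, q) / log_n,
            k * kl_bernoulli(p, q) / log_n)


# Hypothetical parameter values chosen only to exercise the functions.
exact_ratio, almost_exact_ratio = finite_n_ratios(p=0.9, q=0.1, n=1000, k=3)
```

Both divergences vanish when p equals q, and the order-1/2 Rényi divergence never exceeds the Kullback-Leibler divergence.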

Optimal experimental design via Bayesian optimization: active causal structure learning for Gaussian process networks

Title Optimal experimental design via Bayesian optimization: active causal structure learning for Gaussian process networks
Authors Julius von Kügelgen, Paul K Rubenstein, Bernhard Schölkopf, Adrian Weller
Abstract We study the problem of causal discovery through targeted interventions. Starting from few observational measurements, we follow a Bayesian active learning approach to perform those experiments which, in expectation with respect to the current model, are maximally informative about the underlying causal structure. Unlike previous work, we consider the setting of continuous random variables with non-linear functional relationships, modelled with Gaussian process priors. To address the arising problem of choosing from an uncountable set of possible interventions, we propose to use Bayesian optimisation to efficiently maximise a Monte Carlo estimate of the expected information gain.
Tasks Active Learning, Bayesian Optimisation, Causal Discovery
Published 2019-10-09
URL https://arxiv.org/abs/1910.03962v1
PDF https://arxiv.org/pdf/1910.03962v1.pdf
PWC https://paperswithcode.com/paper/optimal-experimental-design-via-bayesian
Repo
Framework