Paper Group ANR 1408
Calibration, Entropy Rates, and Memory in Language Models
Title | Calibration, Entropy Rates, and Memory in Language Models |
Authors | Mark Braverman, Xinyi Chen, Sham M. Kakade, Karthik Narasimhan, Cyril Zhang, Yi Zhang |
Abstract | Building accurate language models that capture meaningful long-term dependencies is a core challenge in natural language processing. Towards this end, we present a calibration-based approach to measure long-term discrepancies between a generative sequence model and the true distribution, and use these discrepancies to improve the model. Empirically, we show that state-of-the-art language models, including LSTMs and Transformers, are *miscalibrated*: the entropy rates of their generations drift dramatically upward over time. We then provide provable methods to mitigate this phenomenon. Furthermore, we show how this calibration-based approach can also be used to measure the amount of memory that language models use for prediction. |
Tasks | Calibration |
Published | 2019-06-11 |
URL | https://arxiv.org/abs/1906.05664v1 |
PDF | https://arxiv.org/pdf/1906.05664v1.pdf |
PWC | https://paperswithcode.com/paper/calibration-entropy-rates-and-memory-in |
Repo | |
Framework | |
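To make the paper's diagnostic concrete, here is a minimal sketch (not the authors' code) that tracks the per-step entropy of a model's predictive distribution during sampling. The abstract's finding is that this curve drifts upward for LSTMs and Transformers, whereas a well-calibrated model's curve should stay roughly flat; `step_fn` is a hypothetical stand-in for any autoregressive language-model interface.

```python
import numpy as np

def entropy(probs, eps=1e-12):
    """Shannon entropy (in nats) of a next-token distribution."""
    probs = np.clip(probs, eps, 1.0)
    return float(-(probs * np.log(probs)).sum())

def entropy_rate_curve(step_fn, prompt_ids, horizon=200, seed=0):
    """Sample a rollout and record the model's entropy at each step.

    step_fn(ids) -> probability vector over the vocabulary for the next
    token (an assumed interface). An upward-drifting curve is the
    miscalibration symptom described in the abstract."""
    rng = np.random.default_rng(seed)
    ids = list(prompt_ids)
    curve = []
    for _ in range(horizon):
        probs = step_fn(ids)
        curve.append(entropy(probs))
        ids.append(int(rng.choice(len(probs), p=probs)))
    return np.asarray(curve)
```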
Analyzing the Role of Model Uncertainty for Electronic Health Records
Title | Analyzing the Role of Model Uncertainty for Electronic Health Records |
Authors | Michael W. Dusenberry, Dustin Tran, Edward Choi, Jonas Kemp, Jeremy Nixon, Ghassen Jerfel, Katherine Heller, Andrew M. Dai |
Abstract | In medicine, both ethical and monetary costs of incorrect predictions can be significant, and the complexity of the problems often necessitates increasingly complex models. Recent work has shown that changing just the random seed is enough for otherwise well-tuned deep neural networks to vary in their individual predicted probabilities. In light of this, we investigate the role of model uncertainty methods in the medical domain. Using RNN ensembles and various Bayesian RNNs, we show that population-level metrics, such as AUC-PR, AUC-ROC, log-likelihood, and calibration error, do not capture model uncertainty. Meanwhile, the presence of significant variability in patient-specific predictions and optimal decisions motivates the need for capturing model uncertainty. Understanding the uncertainty for individual patients is an area with clear clinical impact, such as determining when a model decision is likely to be brittle. We further show that RNNs with only Bayesian embeddings can be a more efficient way to capture model uncertainty compared to ensembles, and we analyze how model uncertainty is impacted across individual input features and patient subgroups. |
Tasks | Calibration |
Published | 2019-06-10 |
URL | https://arxiv.org/abs/1906.03842v3 |
PDF | https://arxiv.org/pdf/1906.03842v3.pdf |
PWC | https://paperswithcode.com/paper/analyzing-the-role-of-model-uncertainty-for |
Repo | |
Framework | |
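The abstract's central contrast — ensemble members agree at the population level while diverging on individual patients — can be summarized in a few lines. This is a minimal illustration with made-up numbers, not the paper's models:

```python
import numpy as np

def patient_level_disagreement(member_probs):
    """member_probs: (n_members, n_patients) predicted probabilities from
    independently trained ensemble members (e.g., different random seeds).
    Returns the ensemble mean and across-member standard deviation, a
    simple per-patient proxy for model uncertainty."""
    member_probs = np.asarray(member_probs)
    return member_probs.mean(axis=0), member_probs.std(axis=0)

# Three "seeds" that agree closely on the first two patients but diverge
# sharply on the third.
probs = [[0.90, 0.20, 0.35],
         [0.88, 0.22, 0.70],
         [0.91, 0.18, 0.15]]
mean, spread = patient_level_disagreement(probs)
print(mean)    # similar average behaviour
print(spread)  # large spread on patient 3 flags a brittle decision
```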
A Deep Factorization of Style and Structure in Fonts
Title | A Deep Factorization of Style and Structure in Fonts |
Authors | Akshay Srivatsan, Jonathan T. Barron, Dan Klein, Taylor Berg-Kirkpatrick |
Abstract | We propose a deep factorization model for typographic analysis that disentangles content from style. Specifically, a variational inference procedure factors each training glyph into the combination of a character-specific content embedding and a latent font-specific style variable. The underlying generative model combines these factors through an asymmetric transpose convolutional process to generate the image of the glyph itself. When trained on corpora of fonts, our model learns a manifold over font styles that can be used to analyze or reconstruct new, unseen fonts. On the task of reconstructing missing glyphs from an unknown font given only a small number of observations, our model outperforms both a strong nearest neighbors baseline and a state-of-the-art discriminative model from prior work. |
Tasks | |
Published | 2019-10-02 |
URL | https://arxiv.org/abs/1910.00748v1 |
PDF | https://arxiv.org/pdf/1910.00748v1.pdf |
PWC | https://paperswithcode.com/paper/a-deep-factorization-of-style-and-structure |
Repo | |
Framework | |
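As a rough illustration of the generative structure described above — a character-specific content embedding fused with a font-specific style latent and decoded through transpose convolutions — here is a minimal PyTorch sketch. All dimensions and the fusion-by-concatenation are our assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class GlyphDecoder(nn.Module):
    def __init__(self, n_chars=52, content_dim=32, style_dim=32):
        super().__init__()
        self.content = nn.Embedding(n_chars, content_dim)   # per-character
        self.fc = nn.Linear(content_dim + style_dim, 128 * 4 * 4)
        self.deconv = nn.Sequential(                        # 4x4 -> 32x32
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, char_ids, style_z):
        h = torch.cat([self.content(char_ids), style_z], dim=-1)
        h = self.fc(h).view(-1, 128, 4, 4)
        return self.deconv(h)

# Two glyphs of different characters rendered in the same sampled style.
glyphs = GlyphDecoder()(torch.tensor([0, 1]), torch.randn(2, 32))
print(glyphs.shape)  # torch.Size([2, 1, 32, 32])
```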
The few-get-richer: a surprising consequence of popularity-based rankings
Title | The few-get-richer: a surprising consequence of popularity-based rankings |
Authors | Fabrizio Germano, Vicenç Gómez, Gaël Le Mens |
Abstract | Ranking algorithms play a crucial role in online platforms ranging from search engines to recommender systems. In this paper, we identify a surprising consequence of popularity-based rankings: the fewer the items reporting a given signal, the higher the share of the overall traffic they collectively attract. This few-get-richer effect emerges in settings where there are few distinct classes of items (e.g., left-leaning news sources versus right-leaning news sources), and items are ranked based on their popularity. We demonstrate analytically that the few-get-richer effect emerges when people tend to click on top-ranked items and have heterogeneous preferences for the classes of items. Using simulations, we analyze how the strength of the effect changes with assumptions about the setting and human behavior. We also test our predictions experimentally in an online experiment with human participants. Our findings have important implications for understanding the spread of misinformation. |
Tasks | Recommendation Systems |
Published | 2019-02-07 |
URL | https://arxiv.org/abs/1902.02580v2 |
PDF | https://arxiv.org/pdf/1902.02580v2.pdf |
PWC | https://paperswithcode.com/paper/the-few-get-richer-a-surprising-consequence |
Repo | |
Framework | |
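The mechanism is easy to simulate. The toy sketch below uses our own assumptions about click behaviour, not the paper's exact model: items are ranked by accumulated clicks, clicks follow a position bias, and each user prefers one of two classes. In runs of this toy model the minority class typically attracts well above its per-item share of traffic.

```python
import numpy as np

def simulate(n_items=20, minority_frac=0.2, n_users=20000, pos_bias=1.0,
             p_prefer_minority=0.5, seed=0):
    rng = np.random.default_rng(seed)
    is_minority = rng.random(n_items) < minority_frac
    clicks = np.ones(n_items)                       # popularity counts
    for _ in range(n_users):
        order = np.argsort(-clicks)                 # rank by popularity
        weights = 1.0 / (np.arange(1, n_items + 1) ** pos_bias)  # position bias
        prefers_minority = rng.random() < p_prefer_minority
        match = is_minority[order] == prefers_minority
        w = weights * np.where(match, 1.0, 0.1)     # class preference
        w /= w.sum()
        clicks[order[rng.choice(n_items, p=w)]] += 1
    share = clicks[is_minority].sum() / clicks.sum()
    return share, is_minority.mean()

share, frac = simulate()
print(f"minority items: {frac:.0%} of items, {share:.0%} of clicks")
```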
Deep Learning Methods for Signature Verification
Title | Deep Learning Methods for Signature Verification |
Authors | Zihan Zeng, Jing Tian |
Abstract | Signatures are widely used in daily life and serve as a supplementary characteristic for verifying human identity. However, there is little work on signature verification. In this paper, we propose several deep learning architectures to tackle this task, ranging from CNNs and RNNs to a compact CNN-RNN model. We also improve Path Signature Features by encoding temporal information in order to enlarge the discrepancy between genuine and forged signatures. Our numerical experiments demonstrate the effectiveness of our models and feature representations. |
Tasks | |
Published | 2019-12-08 |
URL | https://arxiv.org/abs/1912.05435v1 |
PDF | https://arxiv.org/pdf/1912.05435v1.pdf |
PWC | https://paperswithcode.com/paper/deep-learning-methods-for-signature |
Repo | |
Framework | |
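To ground the model families mentioned above, here is a minimal PyTorch sketch of a compact CNN-RNN verifier over online pen trajectories: 1-D convolutions extract local stroke features, a GRU summarizes them, and a linear head scores genuine versus forgery. Input features, layer sizes, and the single-logit head are illustrative assumptions rather than the paper's exact architectures.

```python
import torch
import torch.nn as nn

class CNNRNNVerifier(nn.Module):
    def __init__(self, in_dim=3, hidden=64):   # (x, y, pressure) per step
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(in_dim, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
        )
        self.rnn = nn.GRU(64, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)        # logit: genuine vs. forgery

    def forward(self, strokes):                 # (batch, time, in_dim)
        h = self.conv(strokes.transpose(1, 2)).transpose(1, 2)
        _, last = self.rnn(h)
        return self.head(last[-1])

logit = CNNRNNVerifier()(torch.randn(4, 200, 3))
print(logit.shape)  # torch.Size([4, 1])
```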
Mixture Probabilistic Principal Geodesic Analysis
Title | Mixture Probabilistic Principal Geodesic Analysis |
Authors | Youshan Zhang, Jiarui Xing, Miaomiao Zhang |
Abstract | Dimensionality reduction on Riemannian manifolds is challenging due to the complex nonlinear data structures. While probabilistic principal geodesic analysis (PPGA) has been proposed to generalize conventional principal component analysis (PCA) onto manifolds, its effectiveness is limited to data with a single modality. In this paper, we present a novel Gaussian latent variable model that provides a unique way to integrate multiple PGA models into a maximum-likelihood framework. This leads to a well-defined mixture model of probabilistic principal geodesic analysis (MPPGA) on sub-populations, where parameters of the principal subspaces are automatically estimated by employing an Expectation Maximization algorithm. We further develop a mixture Bayesian PGA (MBPGA) model that automatically reduces data dimensionality by suppressing irrelevant principal geodesics. We demonstrate the advantages of our model in the contexts of clustering and statistical shape analysis, using synthetic sphere data, real corpus callosum, and mandible data from human brain magnetic resonance (MR) and CT images. |
Tasks | Dimensionality Reduction |
Published | 2019-09-03 |
URL | https://arxiv.org/abs/1909.01412v2 |
PDF | https://arxiv.org/pdf/1909.01412v2.pdf |
PWC | https://paperswithcode.com/paper/mixture-probabilistic-principal |
Repo | |
Framework | |
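For intuition, the mixture construction can be written down in the Euclidean special case, where PGA reduces to probabilistic PCA; this summary is ours, not the paper's notation. On a manifold, the residual $x_i - \mu_k$ is replaced by the Riemannian log map $\mathrm{Log}_{\mu_k}(x_i)$, and EM alternates between the responsibilities below and maximum-likelihood updates of $(\pi_k, \mu_k, W_k, \sigma_k)$.

```latex
% Mixture of probabilistic principal component analyzers
% (the Euclidean special case of MPPGA):
p(x_i) = \sum_{k=1}^{K} \pi_k \,
         \mathcal{N}\!\left(x_i \mid \mu_k,\; W_k W_k^{\top} + \sigma_k^2 I\right),
\qquad \sum_{k=1}^{K} \pi_k = 1.

% E-step responsibilities used by the EM algorithm:
\gamma_{ik} = \frac{\pi_k \, \mathcal{N}(x_i \mid \mu_k, C_k)}
                   {\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_i \mid \mu_j, C_j)},
\qquad C_k = W_k W_k^{\top} + \sigma_k^2 I.
```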
Image Formation Model Guided Deep Image Super-Resolution
Title | Image Formation Model Guided Deep Image Super-Resolution |
Authors | Jinshan Pan, Yang Liu, Deqing Sun, Jimmy Ren, Ming-Ming Cheng, Jian Yang, Jinhui Tang |
Abstract | We present a simple and effective image super-resolution algorithm that imposes an image formation constraint on the deep neural networks via pixel substitution. The proposed algorithm first uses a deep neural network to estimate intermediate high-resolution images, blurs the intermediate images using known blur kernels, and then substitutes values of the pixels at the un-decimated positions with those of the corresponding pixels from the low-resolution images. The output of the pixel substitution process strictly satisfies the image formation model and is further refined by the same deep neural network in a cascaded manner. The proposed framework is trained in an end-to-end fashion, works with existing feed-forward deep neural networks for super-resolution, and converges quickly in practice. Extensive experimental results show that the proposed algorithm performs favorably against state-of-the-art methods. |
Tasks | Image Super-Resolution, Super-Resolution |
Published | 2019-08-18 |
URL | https://arxiv.org/abs/1908.06444v3 |
PDF | https://arxiv.org/pdf/1908.06444v3.pdf |
PWC | https://paperswithcode.com/paper/image-formation-model-guided-deep-image-super |
Repo | |
Framework | |
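The pixel-substitution step lends itself to a few lines of NumPy. This is our reading of the abstract, not the released code: `scale` is the decimation factor, the un-decimated positions are taken to be every `scale`-th pixel of the high-resolution grid, and the image dimensions are assumed divisible by `scale`.

```python
import numpy as np
from scipy.ndimage import convolve

def pixel_substitution(hr_estimate, lr_image, kernel, scale):
    """Blur the network's intermediate HR estimate with the known kernel,
    then overwrite the pixels at the un-decimated grid positions with the
    observed LR pixels, so the result exactly satisfies the formation
    model LR = decimate(blur(HR))."""
    blurred = convolve(hr_estimate, kernel, mode="reflect")
    constrained = blurred.copy()
    constrained[::scale, ::scale] = lr_image   # un-decimated positions
    return constrained

# In the cascaded scheme, `constrained` is fed back into the same network
# for refinement, and the substitute-then-refine cycle is repeated.
```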
A close-up comparison of the misclassification error distance and the adjusted Rand index for external clustering evaluation
Title | A close-up comparison of the misclassification error distance and the adjusted Rand index for external clustering evaluation |
Authors | José E. Chacón |
Abstract | The misclassification error distance and the adjusted Rand index are two of the most commonly used criteria to evaluate the performance of clustering algorithms. This paper provides an in-depth comparison of the two criteria, aimed at better understanding exactly what they measure, their properties, and their differences. Starting from their population origins, the investigation includes many data analysis examples and the study of particular cases in great detail. An exhaustive simulation study makes it possible to inspect the distributions of the criteria and reveals some previous misconceptions. |
Tasks | |
Published | 2019-07-26 |
URL | https://arxiv.org/abs/1907.11505v1 |
PDF | https://arxiv.org/pdf/1907.11505v1.pdf |
PWC | https://paperswithcode.com/paper/a-close-up-comparison-of-the |
Repo | |
Framework | |
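Both criteria are easy to compute for a pair of label vectors, which makes the comparison concrete. A runnable sketch using the standard definitions (not the paper's code): the misclassification error distance maximizes the confusion-matrix trace over one-to-one label matchings via the Hungarian algorithm, and the adjusted Rand index comes from scikit-learn.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import adjusted_rand_score, confusion_matrix

def misclassification_error_distance(labels_a, labels_b):
    cm = confusion_matrix(labels_a, labels_b)
    rows, cols = linear_sum_assignment(-cm)    # best one-to-one matching
    return 1.0 - cm[rows, cols].sum() / cm.sum()

a = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
b = np.array([1, 1, 0, 0, 0, 0, 2, 2, 2])     # labels 0/1 swapped + one error
print(misclassification_error_distance(a, b)) # 1/9 ~ 0.111
print(adjusted_rand_score(a, b))
```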
Towards Practical Lipschitz Stochastic Bandits
Title | Towards Practical Lipschitz Stochastic Bandits |
Authors | Tianyu Wang, Weicheng Ye, Dawei Geng, Cynthia Rudin |
Abstract | Stochastic Lipschitz bandit algorithms are methods that govern exploration-exploitation tradeoffs, and have been used in a variety of important task domains, including zeroth-order optimization. While beautiful theory has been developed for the stochastic Lipschitz bandit problem, the methods arising from these theories are not practical, and accordingly, the development of practical well-performing Lipschitz bandit algorithms has stalled in recent years. To remedy this, we present a framework for Lipschitz bandit methods that adaptively learns partitions of context- and arm-space. Due to this flexibility, the algorithm is able to efficiently optimize rewards and minimize regret, by focusing on the portions of the space that are most relevant. Our experiments show that (1) using adaptively-learned partitioning, our method can surpass existing stochastic Lipschitz bandit algorithms, and (2) our algorithms can achieve state-of-the-art performance in challenging real-world tasks such as neural network hyperparameter tuning. |
Tasks | |
Published | 2019-01-26 |
URL | https://arxiv.org/abs/1901.09277v4 |
PDF | https://arxiv.org/pdf/1901.09277v4.pdf |
PWC | https://paperswithcode.com/paper/a-practical-bandit-method-with-advantages-in |
Repo | |
Framework | |
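To illustrate the adaptive-partitioning idea in one dimension — a toy sketch of the general principle, not the paper's algorithm — keep a set of interval cells, pull from the cell with the highest upper confidence bound plus a Lipschitz slack proportional to its width, and split well-sampled cells so that resolution concentrates in promising regions.

```python
import math, random

def adaptive_partition_bandit(reward_fn, horizon=2000, split_after=30):
    cells = [{"lo": 0.0, "hi": 1.0, "n": 0, "mean": 0.0}]
    for t in range(1, horizon + 1):
        def ucb(c):
            bonus = math.sqrt(2 * math.log(t) / c["n"]) if c["n"] else float("inf")
            return c["mean"] + bonus + (c["hi"] - c["lo"])  # Lipschitz slack
        c = max(cells, key=ucb)
        x = random.uniform(c["lo"], c["hi"])
        r = reward_fn(x)
        c["n"] += 1
        c["mean"] += (r - c["mean"]) / c["n"]
        if c["n"] >= split_after:                           # refine this cell
            mid = (c["lo"] + c["hi"]) / 2
            cells.remove(c)
            cells += [{"lo": c["lo"], "hi": mid, "n": 0, "mean": 0.0},
                      {"lo": mid, "hi": c["hi"], "n": 0, "mean": 0.0}]
    return max(cells, key=lambda c: c["mean"])

# Noisy peak at x = 0.7; the returned cell should bracket it tightly.
best = adaptive_partition_bandit(lambda x: 1 - abs(x - 0.7) + random.gauss(0, 0.1))
print(best["lo"], best["hi"])
```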
Fact-Checking Meets Fauxtography: Verifying Claims About Images
Title | Fact-Checking Meets Fauxtography: Verifying Claims About Images |
Authors | Dimitrina Zlatkova, Preslav Nakov, Ivan Koychev |
Abstract | The recent explosion of false claims in social media and on the Web in general has given rise to a lot of manual fact-checking initiatives. Unfortunately, the number of claims that need to be fact-checked is several orders of magnitude larger than what humans can handle manually. Thus, there has been a lot of research aiming at automating the process. Interestingly, previous work has largely ignored the growing number of claims about images. This is despite the fact that visual imagery is more influential than text and naturally appears alongside fake news. Here we aim at bridging this gap. In particular, we create a new dataset for this problem, and we explore a variety of features modeling the claim, the image, and the relationship between the claim and the image. The evaluation results show sizable improvements over the baseline. We release our dataset, hoping to enable further research on fact-checking claims about images. |
Tasks | |
Published | 2019-08-30 |
URL | https://arxiv.org/abs/1908.11722v1 |
PDF | https://arxiv.org/pdf/1908.11722v1.pdf |
PWC | https://paperswithcode.com/paper/fact-checking-meets-fauxtography-verifying |
Repo | |
Framework | |
Gender Bias in Contextualized Word Embeddings
Title | Gender Bias in Contextualized Word Embeddings |
Authors | Jieyu Zhao, Tianlu Wang, Mark Yatskar, Ryan Cotterell, Vicente Ordonez, Kai-Wei Chang |
Abstract | In this paper, we quantify, analyze and mitigate gender bias exhibited in ELMo’s contextualized word vectors. First, we conduct several intrinsic analyses and find that (1) training data for ELMo contains significantly more male than female entities, (2) the trained ELMo embeddings systematically encode gender information and (3) ELMo unequally encodes gender information about male and female entities. Then, we show that a state-of-the-art coreference system that depends on ELMo inherits its bias and demonstrates significant bias on the WinoBias probing corpus. Finally, we explore two methods to mitigate such gender bias and show that the bias demonstrated on WinoBias can be eliminated. |
Tasks | Word Embeddings |
Published | 2019-04-05 |
URL | http://arxiv.org/abs/1904.03310v1 |
PDF | http://arxiv.org/pdf/1904.03310v1.pdf |
PWC | https://paperswithcode.com/paper/gender-bias-in-contextualized-word-embeddings |
Repo | |
Framework | |
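One way to make finding (2) concrete is a linear probe: if a classifier can recover the gender of the context from word vectors, the embeddings systematically encode gender information. The sketch below uses synthetic placeholder vectors with an injected signal purely for demonstration; the paper's analyses operate on actual ELMo states.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# Placeholder "embeddings": 200 vectors with a faint gender-correlated axis.
gender = rng.integers(0, 2, size=200)
vecs = rng.normal(size=(200, 50))
vecs[:, 0] += 0.8 * (gender - 0.5)          # injected signal for the demo

probe = LogisticRegression(max_iter=1000)
acc = cross_val_score(probe, vecs, gender, cv=5).mean()
print(f"probe accuracy: {acc:.2f}  (0.50 would mean no encoded gender)")
```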
Generalized Belief Function: A new concept for uncertainty modelling and processing
Title | Generalized Belief Function: A new concept for uncertainty modelling and processing |
Authors | Fuyuan Xiao |
Abstract | In this paper, we generalize the belief function to the complex plane from another point of view. We first propose a new concept of complex mass function based on complex numbers, called the complex basic belief assignment, which is a generalization of the traditional mass function in Dempster-Shafer evidence (DSE) theory. On the basis of the definition of the complex mass function, the belief function and plausibility function are generalized. In particular, when the complex mass function degenerates from complex numbers to real numbers, the generalized belief and plausibility functions degenerate into the traditional belief and plausibility functions in DSE theory, respectively. |
Tasks | |
Published | 2019-07-03 |
URL | https://arxiv.org/abs/1907.04719v1 |
PDF | https://arxiv.org/pdf/1907.04719v1.pdf |
PWC | https://paperswithcode.com/paper/generalized-belief-function-a-new-concept-for |
Repo | |
Framework | |
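The construction is compact enough to sketch directly: represent a complex basic belief assignment as a map from subsets of the frame of discernment to complex masses, and define belief and plausibility by the usual Dempster-Shafer sums, now complex-valued. With all-real masses this reduces to the classical belief and plausibility functions. The example masses below are our own.

```python
def belief(mass, A):
    """Bel(A) = sum of masses of all focal sets contained in A."""
    return sum(v for B, v in mass.items() if B <= A)    # B is a subset of A

def plausibility(mass, A):
    """Pl(A) = sum of masses of all focal sets intersecting A."""
    return sum(v for B, v in mass.items() if B & A)     # B meets A

frame = frozenset({"a", "b"})
mass = {frozenset({"a"}): 0.6 + 0.2j,
        frozenset({"b"}): 0.2 - 0.2j,
        frame: 0.2 + 0.0j}                              # masses sum to 1
A = frozenset({"a"})
print(belief(mass, A), plausibility(mass, A))
```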
Long-span language modeling for speech recognition
Title | Long-span language modeling for speech recognition |
Authors | Sarangarajan Parthasarathy, William Gale, Xie Chen, George Polovets, Shuangyu Chang |
Abstract | We explore neural language modeling for speech recognition where the context spans multiple sentences. Rather than encode history beyond the current sentence using a cache of words or document-level features, we focus our study on the ability of LSTM and Transformer language models to implicitly learn to carry over context across sentence boundaries. We introduce a new architecture that incorporates an attention mechanism into LSTM to combine the benefits of recurrent and attention architectures. We conduct language modeling and speech recognition experiments on the publicly available LibriSpeech corpus. We show that conventional training on a paragraph-level corpus results in significant reductions in perplexity compared to training on a sentence-level corpus. We also describe speech recognition experiments using long-span language models in second-pass re-ranking, and provide insights into the ability of such models to take advantage of context beyond the current sentence. |
Tasks | Language Modelling, Speech Recognition |
Published | 2019-11-11 |
URL | https://arxiv.org/abs/1911.04571v1 |
PDF | https://arxiv.org/pdf/1911.04571v1.pdf |
PWC | https://paperswithcode.com/paper/long-span-language-modeling-for-speech |
Repo | |
Framework | |
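As a sketch of the second-pass use described above (assumed interfaces, not a specific toolkit): an external language model rescores first-pass hypotheses, and conditioning the score on the previous sentences is what lets a long-span model exploit context beyond the current sentence.

```python
def rerank(nbest, lm_logprob, history="", lm_weight=0.5):
    """Pick the best hypothesis after second-pass LM rescoring.

    nbest: list of (hypothesis_text, first_pass_score) pairs.
    lm_logprob: assumed callable returning a log-probability for text;
    prepending `history` supplies cross-sentence context."""
    def total(hyp, score):
        return score + lm_weight * lm_logprob(history + " " + hyp)
    return max(nbest, key=lambda pair: total(*pair))

# Usage: best, _ = rerank(hyps, lm_logprob=my_lm.score, history=prev_sents)
```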
Consistent recovery threshold of hidden nearest neighbor graphs
Title | Consistent recovery threshold of hidden nearest neighbor graphs |
Authors | Jian Ding, Yihong Wu, Jiaming Xu, Dana Yang |
Abstract | Motivated by applications such as discovering strong ties in social networks and assembling genome subsequences in biology, we study the problem of recovering a hidden $2k$-nearest neighbor (NN) graph in an $n$-vertex complete graph, whose edge weights are independent and distributed according to $P_n$ for edges in the hidden $2k$-NN graph and $Q_n$ otherwise. The special case of Bernoulli distributions corresponds to a variant of the Watts-Strogatz small-world graph. We focus on two types of asymptotic recovery guarantees as $n\to \infty$: (1) exact recovery: all edges are classified correctly with probability tending to one; (2) almost exact recovery: the expected number of misclassified edges is $o(nk)$. We show that the maximum likelihood estimator achieves (1) exact recovery for $2 \le k \le n^{o(1)}$ if $\liminf \frac{2\alpha_n}{\log n}>1$; (2) almost exact recovery for $1 \le k \le o\left( \frac{\log n}{\log \log n} \right)$ if $\liminf \frac{kD(P_n\|Q_n)}{\log n}>1$, where $\alpha_n \triangleq -2 \log \int \sqrt{d P_n d Q_n}$ is the Rényi divergence of order $\frac{1}{2}$ and $D(P_n\|Q_n)$ is the Kullback-Leibler divergence. Under mild distributional assumptions, these conditions are shown to be information-theoretically necessary for any algorithm to succeed. A key challenge in the analysis is the enumeration of $2k$-NN graphs that differ from the hidden one by a given number of edges. |
Tasks | |
Published | 2019-11-18 |
URL | https://arxiv.org/abs/1911.08004v1 |
PDF | https://arxiv.org/pdf/1911.08004v1.pdf |
PWC | https://paperswithcode.com/paper/consistent-recovery-threshold-of-hidden |
Repo | |
Framework | |
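For concreteness in the Bernoulli special case mentioned above, the order-$\frac{1}{2}$ Rényi divergence has a closed form via the Bhattacharyya coefficient (a standard identity, our addition):

```latex
% P_n = \mathrm{Bern}(p), \; Q_n = \mathrm{Bern}(q):
\alpha_n \;=\; -2\log \int \sqrt{dP_n \, dQ_n}
         \;=\; -2\log\!\left(\sqrt{pq} + \sqrt{(1-p)(1-q)}\right).
```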
Optimal experimental design via Bayesian optimization: active causal structure learning for Gaussian process networks
Title | Optimal experimental design via Bayesian optimization: active causal structure learning for Gaussian process networks |
Authors | Julius von Kügelgen, Paul K Rubenstein, Bernhard Schölkopf, Adrian Weller |
Abstract | We study the problem of causal discovery through targeted interventions. Starting from few observational measurements, we follow a Bayesian active learning approach to perform those experiments which, in expectation with respect to the current model, are maximally informative about the underlying causal structure. Unlike previous work, we consider the setting of continuous random variables with non-linear functional relationships, modelled with Gaussian process priors. To address the arising problem of choosing from an uncountable set of possible interventions, we propose to use Bayesian optimisation to efficiently maximise a Monte Carlo estimate of the expected information gain. |
Tasks | Active Learning, Bayesian Optimisation, Causal Discovery |
Published | 2019-10-09 |
URL | https://arxiv.org/abs/1910.03962v1 |
PDF | https://arxiv.org/pdf/1910.03962v1.pdf |
PWC | https://paperswithcode.com/paper/optimal-experimental-design-via-bayesian |
Repo | |
Framework | |
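A minimal sketch of the quantity being maximized above — a Monte Carlo estimate of the expected information gain of a candidate intervention — with heavily assumed interfaces; an outer Bayesian-optimisation loop would then maximise this estimate over the continuous intervention space.

```python
import numpy as np

def expected_information_gain(x, structures, simulate, log_lik, log_pred,
                              draws_per_structure=8):
    """x: candidate intervention. structures: posterior draws over causal
    models. simulate(s, x) -> simulated outcome y; log_lik(y, s, x) and
    log_pred(y, x) are assumed callables giving the model likelihood and
    the mixture posterior-predictive density, respectively."""
    gains = [log_lik(y, s, x) - log_pred(y, x)   # log p(y|s,do(x)) - log p(y|do(x))
             for s in structures
             for y in (simulate(s, x) for _ in range(draws_per_structure))]
    return float(np.mean(gains))

# An outer loop (any Bayesian-optimisation library) maximises
# expected_information_gain over the space of interventions x.
```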