Paper Group ANR 1408
Calibration, Entropy Rates, and Memory in Language Models
Title | Calibration, Entropy Rates, and Memory in Language Models |
Authors | Mark Braverman, Xinyi Chen, Sham M. Kakade, Karthik Narasimhan, Cyril Zhang, Yi Zhang |
Abstract | Building accurate language models that capture meaningful long-term dependencies is a core challenge in natural language processing. Towards this end, we present a calibration-based approach to measure long-term discrepancies between a generative sequence model and the true distribution, and use these discrepancies to improve the model. Empirically, we show that state-of-the-art language models, including LSTMs and Transformers, are *miscalibrated*: the entropy rates of their generations drift dramatically upward over time. We then provide provable methods to mitigate this phenomenon. Furthermore, we show how this calibration-based approach can also be used to measure the amount of memory that language models use for prediction. |
Tasks | Calibration |
Published | 2019-06-11 |
URL | https://arxiv.org/abs/1906.05664v1 |
PDF | https://arxiv.org/pdf/1906.05664v1.pdf |
PWC | https://paperswithcode.com/paper/calibration-entropy-rates-and-memory-in |
Repo | |
Framework | |
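To make the paper's diagnostic concrete, here is a minimal sketch (not the authors' code) that tracks the per-step entropy of a model's predictive distribution during sampling. The abstract's finding is that this curve drifts upward for LSTMs and Transformers, whereas a well-calibrated model's curve should stay roughly flat; `step_fn` is a hypothetical stand-in for any autoregressive language-model interface.

```python
import numpy as np

def entropy(probs, eps=1e-12):
    """Shannon entropy (in nats) of a next-token distribution."""
    probs = np.clip(probs, eps, 1.0)
    return float(-(probs * np.log(probs)).sum())

def entropy_rate_curve(step_fn, prompt_ids, horizon=200, seed=0):
    """Sample a rollout and record the model's entropy at each step.

    step_fn(ids) -> probability vector over the vocabulary for the next
    token (an assumed interface). An upward-drifting curve is the
    miscalibration symptom described in the abstract."""
    rng = np.random.default_rng(seed)
    ids = list(prompt_ids)
    curve = []
    for _ in range(horizon):
        probs = step_fn(ids)
        curve.append(entropy(probs))
        ids.append(int(rng.choice(len(probs), p=probs)))
    return np.asarray(curve)
```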
Analyzing the Role of Model Uncertainty for Electronic Health Records
Title | Analyzing the Role of Model Uncertainty for Electronic Health Records |
Authors | Michael W. Dusenberry, Dustin Tran, Edward Choi, Jonas Kemp, Jeremy Nixon, Ghassen Jerfel, Katherine Heller, Andrew M. Dai |
Abstract | In medicine, both ethical and monetary costs of incorrect predictions can be significant, and the complexity of the problems often necessitates increasingly complex models. Recent work has shown that changing just the random seed is enough for otherwise well-tuned deep neural networks to vary in their individual predicted probabilities. In light of this, we investigate the role of model uncertainty methods in the medical domain. Using RNN ensembles and various Bayesian RNNs, we show that population-level metrics, such as AUC-PR, AUC-ROC, log-likelihood, and calibration error, do not capture model uncertainty. Meanwhile, the presence of significant variability in patient-specific predictions and optimal decisions motivates the need for capturing model uncertainty. Understanding the uncertainty for individual patients is an area with clear clinical impact, such as determining when a model decision is likely to be brittle. We further show that RNNs with only Bayesian embeddings can be a more efficient way to capture model uncertainty compared to ensembles, and we analyze how model uncertainty is impacted across individual input features and patient subgroups. |
Tasks | Calibration |
Published | 2019-06-10 |
URL | https://arxiv.org/abs/1906.03842v3 |
PDF | https://arxiv.org/pdf/1906.03842v3.pdf |
PWC | https://paperswithcode.com/paper/analyzing-the-role-of-model-uncertainty-for |
Repo | |
Framework | |
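The abstract's central contrast — ensemble members agree at the population level while diverging on individual patients — can be summarized in a few lines. This is a minimal illustration with made-up numbers, not the paper's models:

```python
import numpy as np

def patient_level_disagreement(member_probs):
    """member_probs: (n_members, n_patients) predicted probabilities from
    independently trained ensemble members (e.g., different random seeds).
    Returns the ensemble mean and across-member standard deviation, a
    simple per-patient proxy for model uncertainty."""
    member_probs = np.asarray(member_probs)
    return member_probs.mean(axis=0), member_probs.std(axis=0)

# Three "seeds" that agree closely on the first two patients but diverge
# sharply on the third.
probs = [[0.90, 0.20, 0.35],
         [0.88, 0.22, 0.70],
         [0.91, 0.18, 0.15]]
mean, spread = patient_level_disagreement(probs)
print(mean)    # similar average behaviour
print(spread)  # large spread on patient 3 flags a brittle decision
```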
A Deep Factorization of Style and Structure in Fonts
Title | A Deep Factorization of Style and Structure in Fonts |
Authors | Akshay Srivatsan, Jonathan T. Barron, Dan Klein, Taylor Berg-Kirkpatrick |
Abstract | We propose a deep factorization model for typographic analysis that disentangles content from style. Specifically, a variational inference procedure factors each training glyph into the combination of a character-specific content embedding and a latent font-specific style variable. The underlying generative model combines these factors through an asymmetric transpose convolutional process to generate the image of the glyph itself. When trained on corpora of fonts, our model learns a manifold over font styles that can be used to analyze or reconstruct new, unseen fonts. On the task of reconstructing missing glyphs from an unknown font given only a small number of observations, our model outperforms both a strong nearest neighbors baseline and a state-of-the-art discriminative model from prior work. |
Tasks | |
Published | 2019-10-02 |
URL | https://arxiv.org/abs/1910.00748v1 |
PDF | https://arxiv.org/pdf/1910.00748v1.pdf |
PWC | https://paperswithcode.com/paper/a-deep-factorization-of-style-and-structure |
Repo | |
Framework | |
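As a rough illustration of the generative structure described above — a character-specific content embedding fused with a font-specific style latent and decoded through transpose convolutions — here is a minimal PyTorch sketch. All dimensions and the fusion-by-concatenation are our assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class GlyphDecoder(nn.Module):
    def __init__(self, n_chars=52, content_dim=32, style_dim=32):
        super().__init__()
        self.content = nn.Embedding(n_chars, content_dim)   # per-character
        self.fc = nn.Linear(content_dim + style_dim, 128 * 4 * 4)
        self.deconv = nn.Sequential(                        # 4x4 -> 32x32
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, char_ids, style_z):
        h = torch.cat([self.content(char_ids), style_z], dim=-1)
        h = self.fc(h).view(-1, 128, 4, 4)
        return self.deconv(h)

# Two glyphs of different characters rendered in the same sampled style.
glyphs = GlyphDecoder()(torch.tensor([0, 1]), torch.randn(2, 32))
print(glyphs.shape)  # torch.Size([2, 1, 32, 32])
```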
The few-get-richer: a surprising consequence of popularity-based rankings
Title | The few-get-richer: a surprising consequence of popularity-based rankings |
Authors | Fabrizio Germano, Vicenç Gómez, Gaël Le Mens |
Abstract | Ranking algorithms play a crucial role in online platforms ranging from search engines to recommender systems. In this paper, we identify a surprising consequence of popularity-based rankings: the fewer the items reporting a given signal, the higher the share of the overall traffic they collectively attract. This few-get-richer effect emerges in settings where there are few distinct classes of items (e.g., left-leaning news sources versus right-leaning news sources), and items are ranked based on their popularity. We demonstrate analytically that the few-get-richer effect emerges when people tend to click on top-ranked items and have heterogeneous preferences for the classes of items. Using simulations, we analyze how the strength of the effect changes with assumptions about the setting and human behavior. We also test our predictions experimentally in an online experiment with human participants. Our findings have important implications for understanding the spread of misinformation. |
Tasks | Recommendation Systems |
Published | 2019-02-07 |
URL | https://arxiv.org/abs/1902.02580v2 |
PDF | https://arxiv.org/pdf/1902.02580v2.pdf |
PWC | https://paperswithcode.com/paper/the-few-get-richer-a-surprising-consequence |
Repo | |
Framework | |
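The mechanism is easy to simulate. The toy sketch below uses our own assumptions about click behaviour, not the paper's exact model: items are ranked by accumulated clicks, clicks follow a position bias, and each user prefers one of two classes. In runs of this toy model the minority class typically attracts well above its per-item share of traffic.

```python
import numpy as np

def simulate(n_items=20, minority_frac=0.2, n_users=20000, pos_bias=1.0,
             p_prefer_minority=0.5, seed=0):
    rng = np.random.default_rng(seed)
    is_minority = rng.random(n_items) < minority_frac
    clicks = np.ones(n_items)                       # popularity counts
    for _ in range(n_users):
        order = np.argsort(-clicks)                 # rank by popularity
        weights = 1.0 / (np.arange(1, n_items + 1) ** pos_bias)  # position bias
        prefers_minority = rng.random() < p_prefer_minority
        match = is_minority[order] == prefers_minority
        w = weights * np.where(match, 1.0, 0.1)     # class preference
        w /= w.sum()
        clicks[order[rng.choice(n_items, p=w)]] += 1
    share = clicks[is_minority].sum() / clicks.sum()
    return share, is_minority.mean()

share, frac = simulate()
print(f"minority items: {frac:.0%} of items, {share:.0%} of clicks")
```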
Deep Learning Methods for Signature Verification
Title | Deep Learning Methods for Signature Verification |
Authors | Zihan Zeng, Jing Tian |
Abstract | Signatures are widely used in daily life and serve as a supplementary characteristic for verifying human identity. However, there is little work on signature verification. In this paper, we propose several deep learning architectures to tackle this task, ranging from CNNs and RNNs to a compact CNN-RNN model. We also improve Path Signature Features by encoding temporal information in order to enlarge the discrepancy between genuine and forged signatures. Our numerical experiments demonstrate the effectiveness of our models and feature representations. |
Tasks | |
Published | 2019-12-08 |
URL | https://arxiv.org/abs/1912.05435v1 |
PDF | https://arxiv.org/pdf/1912.05435v1.pdf |
PWC | https://paperswithcode.com/paper/deep-learning-methods-for-signature |
Repo | |
Framework | |
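To ground the model families mentioned above, here is a minimal PyTorch sketch of a compact CNN-RNN verifier over online pen trajectories: 1-D convolutions extract local stroke features, a GRU summarizes them, and a linear head scores genuine versus forgery. Input features, layer sizes, and the single-logit head are illustrative assumptions rather than the paper's exact architectures.

```python
import torch
import torch.nn as nn

class CNNRNNVerifier(nn.Module):
    def __init__(self, in_dim=3, hidden=64):   # (x, y, pressure) per step
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(in_dim, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
        )
        self.rnn = nn.GRU(64, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)        # logit: genuine vs. forgery

    def forward(self, strokes):                 # (batch, time, in_dim)
        h = self.conv(strokes.transpose(1, 2)).transpose(1, 2)
        _, last = self.rnn(h)
        return self.head(last[-1])

logit = CNNRNNVerifier()(torch.randn(4, 200, 3))
print(logit.shape)  # torch.Size([4, 1])
```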
Mixture Probabilistic Principal Geodesic Analysis
Title | Mixture Probabilistic Principal Geodesic Analysis |
Authors | Youshan Zhang, Jiarui Xing, Miaomiao Zhang |
Abstract | Dimensionality reduction on Riemannian manifolds is challenging due to the complex nonlinear data structures. While probabilistic principal geodesic analysis (PPGA) has been proposed to generalize conventional principal component analysis (PCA) onto manifolds, its effectiveness is limited to data with a single modality. In this paper, we present a novel Gaussian latent variable model that provides a unique way to integrate multiple PGA models into a maximum-likelihood framework. This leads to a well-defined mixture model of probabilistic principal geodesic analysis (MPPGA) on sub-populations, where parameters of the principal subspaces are automatically estimated by employing an Expectation Maximization algorithm. We further develop a mixture Bayesian PGA (MBPGA) model that automatically reduces data dimensionality by suppressing irrelevant principal geodesics. We demonstrate the advantages of our model in the contexts of clustering and statistical shape analysis, using synthetic sphere data, real corpus callosum, and mandible data from human brain magnetic resonance (MR) and CT images. |
Tasks | Dimensionality Reduction |
Published | 2019-09-03 |
URL | https://arxiv.org/abs/1909.01412v2 |
PDF | https://arxiv.org/pdf/1909.01412v2.pdf |
PWC | https://paperswithcode.com/paper/mixture-probabilistic-principal |
Repo | |
Framework | |
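For intuition, the mixture construction can be written down in the Euclidean special case, where PGA reduces to probabilistic PCA; this summary is ours, not the paper's notation. On a manifold, the residual $x_i - \mu_k$ is replaced by the Riemannian log map $\mathrm{Log}_{\mu_k}(x_i)$, and EM alternates between the responsibilities below and maximum-likelihood updates of $(\pi_k, \mu_k, W_k, \sigma_k)$.

```latex
% Mixture of probabilistic principal component analyzers
% (the Euclidean special case of MPPGA):
p(x_i) = \sum_{k=1}^{K} \pi_k \,
         \mathcal{N}\!\left(x_i \mid \mu_k,\; W_k W_k^{\top} + \sigma_k^2 I\right),
\qquad \sum_{k=1}^{K} \pi_k = 1.

% E-step responsibilities used by the EM algorithm:
\gamma_{ik} = \frac{\pi_k \, \mathcal{N}(x_i \mid \mu_k, C_k)}
                   {\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_i \mid \mu_j, C_j)},
\qquad C_k = W_k W_k^{\top} + \sigma_k^2 I.
```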
Image Formation Model Guided Deep Image Super-Resolution
Title | Image Formation Model Guided Deep Image Super-Resolution |
Authors | Jinshan Pan, Yang Liu, Deqing Sun, Jimmy Ren, Ming-Ming Cheng, Jian Yang, Jinhui Tang |
Abstract | We present a simple and effective image super-resolution algorithm that imposes an image formation constraint on the deep neural networks via pixel substitution. The proposed algorithm first uses a deep neural network to estimate intermediate high-resolution images, blurs the intermediate images using known blur kernels, and then substitutes values of the pixels at the un-decimated positions with those of the corresponding pixels from the low-resolution images. The output of the pixel substitution process strictly satisfies the image formation model and is further refined by the same deep neural network in a cascaded manner. The proposed framework is trained in an end-to-end fashion, works with existing feed-forward deep neural networks for super-resolution, and converges quickly in practice. Extensive experimental results show that the proposed algorithm performs favorably against state-of-the-art methods. |
Tasks | Image Super-Resolution, Super-Resolution |
Published | 2019-08-18 |
URL | https://arxiv.org/abs/1908.06444v3 |
PDF | https://arxiv.org/pdf/1908.06444v3.pdf |
PWC | https://paperswithcode.com/paper/image-formation-model-guided-deep-image-super |
Repo | |
Framework | |
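The pixel-substitution step lends itself to a few lines of NumPy. This is our reading of the abstract, not the released code: `scale` is the decimation factor, the un-decimated positions are taken to be every `scale`-th pixel of the high-resolution grid, and the image dimensions are assumed divisible by `scale`.

```python
import numpy as np
from scipy.ndimage import convolve

def pixel_substitution(hr_estimate, lr_image, kernel, scale):
    """Blur the network's intermediate HR estimate with the known kernel,
    then overwrite the pixels at the un-decimated grid positions with the
    observed LR pixels, so the result exactly satisfies the formation
    model LR = decimate(blur(HR))."""
    blurred = convolve(hr_estimate, kernel, mode="reflect")
    constrained = blurred.copy()
    constrained[::scale, ::scale] = lr_image   # un-decimated positions
    return constrained

# In the cascaded scheme, `constrained` is fed back into the same network
# for refinement, and the substitute-then-refine cycle is repeated.
```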
A close-up comparison of the misclassification error distance and the adjusted Rand index for external clustering evaluation
Title | A close-up comparison of the misclassification error distance and the adjusted Rand index for external clustering evaluation |
Authors | José E. Chacón |
Abstract | The misclassification error distance and the adjusted Rand index are two of the most commonly used criteria to evaluate the performance of clustering algorithms. This paper provides an in-depth comparison of the two criteria, aimed at better understanding exactly what they measure, their properties, and their differences. Starting from their population origins, the investigation includes many data analysis examples and the study of particular cases in great detail. An exhaustive simulation study makes it possible to inspect the distributions of the criteria and reveals some previous misconceptions. |
Tasks | |
Published | 2019-07-26 |
URL | https://arxiv.org/abs/1907.11505v1 |
PDF | https://arxiv.org/pdf/1907.11505v1.pdf |
PWC | https://paperswithcode.com/paper/a-close-up-comparison-of-the |
Repo | |
Framework | |
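Both criteria are easy to compute for a pair of label vectors, which makes the comparison concrete. A runnable sketch using the standard definitions (not the paper's code): the misclassification error distance maximizes the confusion-matrix trace over one-to-one label matchings via the Hungarian algorithm, and the adjusted Rand index comes from scikit-learn.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import adjusted_rand_score, confusion_matrix

def misclassification_error_distance(labels_a, labels_b):
    cm = confusion_matrix(labels_a, labels_b)
    rows, cols = linear_sum_assignment(-cm)    # best one-to-one matching
    return 1.0 - cm[rows, cols].sum() / cm.sum()

a = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
b = np.array([1, 1, 0, 0, 0, 0, 2, 2, 2])     # labels 0/1 swapped + one error
print(misclassification_error_distance(a, b)) # 1/9 ~ 0.111
print(adjusted_rand_score(a, b))
```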
Towards Practical Lipschitz Stochastic Bandits
Title | Towards Practical Lipschitz Stochastic Bandits |
Authors | Tianyu Wang, Weicheng Ye, Dawei Geng, Cynthia Rudin |
Abstract | Stochastic Lipschitz bandit algorithms are methods that govern exploration-exploitation tradeoffs, and have been used in a variety of important task domains, including zeroth-order optimization. While beautiful theory has been developed for the stochastic Lipschitz bandit problem, the methods arising from these theories are not practical, and accordingly, the development of practical well-performing Lipschitz bandit algorithms has stalled in recent years. To remedy this, we present a framework for Lipschitz bandit methods that adaptively learns partitions of context- and arm-space. Due to this flexibility, the algorithm is able to efficiently optimize rewards and minimize regret, by focusing on the portions of the space that are most relevant. Our experiments show that (1) using adaptively-learned partitioning, our method can surpass existing stochastic Lipschitz bandit algorithms, and (2) our algorithms can achieve state-of-the-art performance in challenging real-world tasks such as neural network hyperparameter tuning. |
Tasks | |
Published | 2019-01-26 |
URL | https://arxiv.org/abs/1901.09277v4 |
PDF | https://arxiv.org/pdf/1901.09277v4.pdf |
PWC | https://paperswithcode.com/paper/a-practical-bandit-method-with-advantages-in |
Repo | |
Framework | |
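To illustrate the adaptive-partitioning idea in one dimension — a toy sketch of the general principle, not the paper's algorithm — keep a set of interval cells, pull from the cell with the highest upper confidence bound plus a Lipschitz slack proportional to its width, and split well-sampled cells so that resolution concentrates in promising regions.

```python
import math, random

def adaptive_partition_bandit(reward_fn, horizon=2000, split_after=30):
    cells = [{"lo": 0.0, "hi": 1.0, "n": 0, "mean": 0.0}]
    for t in range(1, horizon + 1):
        def ucb(c):
            bonus = math.sqrt(2 * math.log(t) / c["n"]) if c["n"] else float("inf")
            return c["mean"] + bonus + (c["hi"] - c["lo"])  # Lipschitz slack
        c = max(cells, key=ucb)
        x = random.uniform(c["lo"], c["hi"])
        r = reward_fn(x)
        c["n"] += 1
        c["mean"] += (r - c["mean"]) / c["n"]
        if c["n"] >= split_after:                           # refine this cell
            mid = (c["lo"] + c["hi"]) / 2
            cells.remove(c)
            cells += [{"lo": c["lo"], "hi": mid, "n": 0, "mean": 0.0},
                      {"lo": mid, "hi": c["hi"], "n": 0, "mean": 0.0}]
    return max(cells, key=lambda c: c["mean"])

# Noisy peak at x = 0.7; the returned cell should bracket it tightly.
best = adaptive_partition_bandit(lambda x: 1 - abs(x - 0.7) + random.gauss(0, 0.1))
print(best["lo"], best["hi"])
```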
Fact-Checking Meets Fauxtography: Verifying Claims About Images
Title | Fact-Checking Meets Fauxtography: Verifying Claims About Images |
Authors | Dimitrina Zlatkova, Preslav Nakov, Ivan Koychev |
Abstract | The recent explosion of false claims in social media and on the Web in general has given rise to a lot of manual fact-checking initiatives. Unfortunately, the number of claims that need to be fact-checked is several orders of magnitude larger than what humans can handle manually. Thus, there has been a lot of research aiming at automating the process. Interestingly, previous work has largely ignored the growing number of claims about images. This is despite the fact that visual imagery is more influential than text and naturally appears alongside fake news. Here we aim at bridging this gap. In particular, we create a new dataset for this problem, and we explore a variety of features modeling the claim, the image, and the relationship between the claim and the image. The evaluation results show sizable improvements over the baseline. We release our dataset, hoping to enable further research on fact-checking claims about images. |
Tasks | |
Published | 2019-08-30 |
URL | https://arxiv.org/abs/1908.11722v1 |
PDF | https://arxiv.org/pdf/1908.11722v1.pdf |
PWC | https://paperswithcode.com/paper/fact-checking-meets-fauxtography-verifying |
Repo | |
Framework | |
Gender Bias in Contextualized Word Embeddings
Title | Gender Bias in Contextualized Word Embeddings |
Authors | Jieyu Zhao, Tianlu Wang, Mark Yatskar, Ryan Cotterell, Vicente Ordonez, Kai-Wei Chang |
Abstract | In this paper, we quantify, analyze and mitigate gender bias exhibited in ELMo’s contextualized word vectors. First, we conduct several intrinsic analyses and find that (1) training data for ELMo contains significantly more male than female entities, (2) the trained ELMo embeddings systematically encode gender information and (3) ELMo unequally encodes gender information about male and female entities. Then, we show that a state-of-the-art coreference system that depends on ELMo inherits its bias and demonstrates significant bias on the WinoBias probing corpus. Finally, we explore two methods to mitigate such gender bias and show that the bias demonstrated on WinoBias can be eliminated. |
Tasks | Word Embeddings |
Published | 2019-04-05 |
URL | http://arxiv.org/abs/1904.03310v1 |
PDF | http://arxiv.org/pdf/1904.03310v1.pdf |
PWC | https://paperswithcode.com/paper/gender-bias-in-contextualized-word-embeddings |
Repo | |
Framework | |
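One way to make finding (2) concrete is a linear probe: if a classifier can recover the gender of the context from word vectors, the embeddings systematically encode gender information. The sketch below uses synthetic placeholder vectors with an injected signal purely for demonstration; the paper's analyses operate on actual ELMo states.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# Placeholder "embeddings": 200 vectors with a faint gender-correlated axis.
gender = rng.integers(0, 2, size=200)
vecs = rng.normal(size=(200, 50))
vecs[:, 0] += 0.8 * (gender - 0.5)          # injected signal for the demo

probe = LogisticRegression(max_iter=1000)
acc = cross_val_score(probe, vecs, gender, cv=5).mean()
print(f"probe accuracy: {acc:.2f}  (0.50 would mean no encoded gender)")
```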
Generalized Belief Function: A new concept for uncertainty modelling and processing
Title | Generalized Belief Function: A new concept for uncertainty modelling and processing |
Authors | Fuyuan Xiao |
Abstract | In this paper, we generalize the belief function to the complex plane from another point of view. We first propose a new concept of complex mass function based on complex numbers, called the complex basic belief assignment, which is a generalization of the traditional mass function in Dempster-Shafer evidence (DSE) theory. On the basis of the definition of the complex mass function, the belief function and plausibility function are generalized. In particular, when the complex mass function degenerates from complex numbers to real numbers, the generalized belief and plausibility functions degenerate into the traditional belief and plausibility functions in DSE theory, respectively. |
Tasks | |
Published | 2019-07-03 |
URL | https://arxiv.org/abs/1907.04719v1 |
PDF | https://arxiv.org/pdf/1907.04719v1.pdf |
PWC | https://paperswithcode.com/paper/generalized-belief-function-a-new-concept-for |
Repo | |
Framework | |
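The construction is compact enough to sketch directly: represent a complex basic belief assignment as a map from subsets of the frame of discernment to complex masses, and define belief and plausibility by the usual Dempster-Shafer sums, now complex-valued. With all-real masses this reduces to the classical belief and plausibility functions. The example masses below are our own.

```python
def belief(mass, A):
    """Bel(A) = sum of masses of all focal sets contained in A."""
    return sum(v for B, v in mass.items() if B <= A)    # B is a subset of A

def plausibility(mass, A):
    """Pl(A) = sum of masses of all focal sets intersecting A."""
    return sum(v for B, v in mass.items() if B & A)     # B meets A

frame = frozenset({"a", "b"})
mass = {frozenset({"a"}): 0.6 + 0.2j,
        frozenset({"b"}): 0.2 - 0.2j,
        frame: 0.2 + 0.0j}                              # masses sum to 1
A = frozenset({"a"})
print(belief(mass, A), plausibility(mass, A))
```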
Long-span language modeling for speech recognition
Title | Long-span language modeling for speech recognition |
Authors | Sarangarajan Parthasarathy, William Gale, Xie Chen, George Polovets, Shuangyu Chang |
Abstract | We explore neural language modeling for speech recognition where the context spans multiple sentences. Rather than encode history beyond the current sentence using a cache of words or document-level features, we focus our study on the ability of LSTM and Transformer language models to implicitly learn to carry over context across sentence boundaries. We introduce a new architecture that incorporates an attention mechanism into LSTM to combine the benefits of recurrent and attention architectures. We conduct language modeling and speech recognition experiments on the publicly available LibriSpeech corpus. We show that conventional training on a paragraph-level corpus results in significant reductions in perplexity compared to training on a sentence-level corpus. We also describe speech recognition experiments using long-span language models in second-pass re-ranking, and provide insights into the ability of such models to take advantage of context beyond the current sentence. |
Tasks | Language Modelling, Speech Recognition |
Published | 2019-11-11 |
URL | https://arxiv.org/abs/1911.04571v1 |
PDF | https://arxiv.org/pdf/1911.04571v1.pdf |
PWC | https://paperswithcode.com/paper/long-span-language-modeling-for-speech |
Repo | |
Framework | |
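As a sketch of the second-pass use described above (assumed interfaces, not a specific toolkit): an external language model rescores first-pass hypotheses, and conditioning the score on the previous sentences is what lets a long-span model exploit context beyond the current sentence.

```python
def rerank(nbest, lm_logprob, history="", lm_weight=0.5):
    """Pick the best hypothesis after second-pass LM rescoring.

    nbest: list of (hypothesis_text, first_pass_score) pairs.
    lm_logprob: assumed callable returning a log-probability for text;
    prepending `history` supplies cross-sentence context."""
    def total(hyp, score):
        return score + lm_weight * lm_logprob(history + " " + hyp)
    return max(nbest, key=lambda pair: total(*pair))

# Usage: best, _ = rerank(hyps, lm_logprob=my_lm.score, history=prev_sents)
```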
Consistent recovery threshold of hidden nearest neighbor graphs
Title | Consistent recovery threshold of hidden nearest neighbor graphs |
Authors | Jian Ding, Yihong Wu, Jiaming Xu, Dana Yang |
Abstract | Motivated by applications such as discovering strong ties in social networks and assembling genome subsequences in biology, we study the problem of recovering a hidden $2k$-nearest neighbor (NN) graph in an $n$-vertex complete graph, whose edge weights are independent and distributed according to $P_n$ for edges in the hidden $2k$-NN graph and $Q_n$ otherwise. The special case of Bernoulli distributions corresponds to a variant of the Watts-Strogatz small-world graph. We focus on two types of asymptotic recovery guarantees as $n\to \infty$: (1) exact recovery: all edges are classified correctly with probability tending to one; (2) almost exact recovery: the expected number of misclassified edges is $o(nk)$. We show that the maximum likelihood estimator achieves (1) exact recovery for $2 \le k \le n^{o(1)}$ if $\liminf \frac{2\alpha_n}{\log n}>1$; (2) almost exact recovery for $1 \le k \le o\left( \frac{\log n}{\log \log n} \right)$ if $\liminf \frac{kD(P_n\|Q_n)}{\log n}>1$, where $\alpha_n \triangleq -2 \log \int \sqrt{d P_n d Q_n}$ is the Rényi divergence of order $\frac{1}{2}$ and $D(P_n\|Q_n)$ is the Kullback-Leibler divergence. Under mild distributional assumptions, these conditions are shown to be information-theoretically necessary for any algorithm to succeed. A key challenge in the analysis is the enumeration of $2k$-NN graphs that differ from the hidden one by a given number of edges. |
Tasks | |
Published | 2019-11-18 |
URL | https://arxiv.org/abs/1911.08004v1 |
PDF | https://arxiv.org/pdf/1911.08004v1.pdf |
PWC | https://paperswithcode.com/paper/consistent-recovery-threshold-of-hidden |
Repo | |
Framework | |
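For concreteness in the Bernoulli special case mentioned above, the order-$\frac{1}{2}$ Rényi divergence has a closed form via the Bhattacharyya coefficient (a standard identity, our addition):

```latex
% P_n = \mathrm{Bern}(p), \; Q_n = \mathrm{Bern}(q):
\alpha_n \;=\; -2\log \int \sqrt{dP_n \, dQ_n}
         \;=\; -2\log\!\left(\sqrt{pq} + \sqrt{(1-p)(1-q)}\right).
```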
Optimal experimental design via Bayesian optimization: active causal structure learning for Gaussian process networks
Title | Optimal experimental design via Bayesian optimization: active causal structure learning for Gaussian process networks |
Authors | Julius von Kügelgen, Paul K Rubenstein, Bernhard Schölkopf, Adrian Weller |
Abstract | We study the problem of causal discovery through targeted interventions. Starting from few observational measurements, we follow a Bayesian active learning approach to perform those experiments which, in expectation with respect to the current model, are maximally informative about the underlying causal structure. Unlike previous work, we consider the setting of continuous random variables with non-linear functional relationships, modelled with Gaussian process priors. To address the arising problem of choosing from an uncountable set of possible interventions, we propose to use Bayesian optimisation to efficiently maximise a Monte Carlo estimate of the expected information gain. |
Tasks | Active Learning, Bayesian Optimisation, Causal Discovery |
Published | 2019-10-09 |
URL | https://arxiv.org/abs/1910.03962v1 |
PDF | https://arxiv.org/pdf/1910.03962v1.pdf |
PWC | https://paperswithcode.com/paper/optimal-experimental-design-via-bayesian |
Repo | |
Framework | |
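A minimal sketch of the quantity being maximized above — a Monte Carlo estimate of the expected information gain of a candidate intervention — with heavily assumed interfaces; an outer Bayesian-optimisation loop would then maximise this estimate over the continuous intervention space.

```python
import numpy as np

def expected_information_gain(x, structures, simulate, log_lik, log_pred,
                              draws_per_structure=8):
    """x: candidate intervention. structures: posterior draws over causal
    models. simulate(s, x) -> simulated outcome y; log_lik(y, s, x) and
    log_pred(y, x) are assumed callables giving the model likelihood and
    the mixture posterior-predictive density, respectively."""
    gains = [log_lik(y, s, x) - log_pred(y, x)   # log p(y|s,do(x)) - log p(y|do(x))
             for s in structures
             for y in (simulate(s, x) for _ in range(draws_per_structure))]
    return float(np.mean(gains))

# An outer loop (any Bayesian-optimisation library) maximises
# expected_information_gain over the space of interventions x.
```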