January 29, 2020

3098 words 15 mins read

Paper Group ANR 528

$b$-Bit Sketch Trie: Scalable Similarity Search on Integer Sketches. First order expansion of convex regularized estimators. Explaining Predictions from Tree-based Boosting Ensembles. Commonsense Reasoning Using WordNet and SUMO: a Detailed Analysis. Can You Trust Your Model’s Uncertainty? Evaluating Predictive Uncertainty Under Dataset Shift. An i …

$b$-Bit Sketch Trie: Scalable Similarity Search on Integer Sketches

Title $b$-Bit Sketch Trie: Scalable Similarity Search on Integer Sketches
Authors Shunsuke Kanda, Yasuo Tabei
Abstract Recently, randomly mapping vectorial data to strings of discrete symbols (i.e., sketches) for fast and space-efficient similarity searches has become popular. Such random mapping is called similarity-preserving hashing and approximates a similarity metric by using the Hamming distance. Although many efficient similarity searches have been proposed, most of them are designed for binary sketches; similarity searches on integer sketches are still in their infancy. In this paper, we present a novel space-efficient trie named the $b$-bit sketch trie on integer sketches for scalable similarity searches, leveraging the idea behind succinct data structures (i.e., space-efficient data structures that support various data operations in compressed format) and a favorable property of integer sketches as fixed-length strings. Our experimental results on real-world datasets show that a trie-based index built from integer sketches efficiently performs similarity searches by pruning useless portions of the search space, which greatly improves the search time and space efficiency of similarity search. The experiments show that our similarity search is up to one order of magnitude faster than state-of-the-art similarity searches. Moreover, our method needs only 10 GiB of memory on a billion-scale database, while state-of-the-art similarity searches need 29 GiB of memory.
Tasks
Published 2019-10-18
URL https://arxiv.org/abs/1910.08278v1
PDF https://arxiv.org/pdf/1910.08278v1.pdf
PWC https://paperswithcode.com/paper/b-bit-sketch-trie-scalable-similarity-search
Repo
Framework
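To make the query model concrete, here is a minimal, hypothetical baseline in Python: integer sketches compared by generalized Hamming distance with a brute-force range search. The trie in the paper accelerates exactly this kind of query by pruning the search space; the linear scan below is only an illustration, and all sizes are made-up toy values.

```python
import numpy as np

# Hypothetical baseline: integer sketches as fixed-length strings of small
# integers, compared by (generalized) Hamming distance. The paper's b-bit
# sketch trie prunes this search; a linear scan is shown only to make the
# query semantics concrete.

rng = np.random.default_rng(0)
n_items, sketch_len, alphabet = 100_000, 32, 16          # assumed toy sizes
database = rng.integers(0, alphabet, size=(n_items, sketch_len), dtype=np.uint8)

def hamming(query: np.ndarray, db: np.ndarray) -> np.ndarray:
    """Number of positions where the integer symbols differ."""
    return (db != query).sum(axis=1)

def range_search(query: np.ndarray, db: np.ndarray, radius: int) -> np.ndarray:
    """Return indices of sketches within the given Hamming radius of the query."""
    return np.flatnonzero(hamming(query, db) <= radius)

query = database[42]
hits = range_search(query, database, radius=4)
print(len(hits), "sketches within radius 4")
```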

First order expansion of convex regularized estimators

Title First order expansion of convex regularized estimators
Authors Pierre C Bellec, Arun K Kuchibhotla
Abstract We consider first order expansions of convex penalized estimators in high-dimensional regression problems with random designs. Our setting includes linear regression and logistic regression as special cases. For a given penalty function $h$ and the corresponding penalized estimator $\hat\beta$, we construct a quantity $\eta$, the first order expansion of $\hat\beta$, such that the distance between $\hat\beta$ and $\eta$ is an order of magnitude smaller than the estimation error $\hat{\beta} - \beta^*$. In this sense, the first order expansion $\eta$ can be thought of as a generalization of influence functions from the mathematical statistics literature to regularized estimators in high dimensions. This first order expansion implies that the risk of $\hat{\beta}$ is asymptotically the same as the risk of $\eta$, which leads to a precise characterization of the MSE of $\hat\beta$; this characterization takes a particularly simple form for isotropic designs. The first order expansion also leads to inference results based on $\hat{\beta}$. We provide sufficient conditions for the existence of such a first order expansion for three regularizers: the Lasso in its constrained form, the Lasso in its penalized form, and the Group Lasso. The results apply to general loss functions under some conditions, and those conditions are satisfied for the squared loss in linear regression and for the logistic loss in the logistic model.
Tasks
Published 2019-10-12
URL https://arxiv.org/abs/1910.05480v2
PDF https://arxiv.org/pdf/1910.05480v2.pdf
PWC https://paperswithcode.com/paper/first-order-expansion-of-convex-regularized
Repo
Framework
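Stated compactly, in the abstract's own notation (this is only a restatement of the property described there, not a new result):

```latex
% The defining property of the first order expansion \eta, and the consequence
% for the risk of \hat\beta that the abstract draws from it.
\[
  \|\hat\beta - \eta\| = o_P\bigl(\|\hat\beta - \beta^*\|\bigr)
  \quad\Longrightarrow\quad
  \operatorname{risk}(\hat\beta) \approx \operatorname{risk}(\eta)
  \ \text{asymptotically.}
\]
```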

Explaining Predictions from Tree-based Boosting Ensembles

Title Explaining Predictions from Tree-based Boosting Ensembles
Authors Ana Lucic, Hinda Haned, Maarten de Rijke
Abstract Understanding how “black-box” models arrive at their predictions has sparked significant interest from both within and outside the AI community. Our work focuses on generating local explanations about individual predictions for tree-based ensembles, specifically Gradient Boosting Decision Trees (GBDTs). Given a correctly predicted instance in the training set, we wish to generate a counterfactual explanation for this instance, that is, the minimal perturbation of this instance such that the prediction flips to the opposite class. Most existing methods for counterfactual explanations are (1) model-agnostic, so they do not take into account the structure of the original model, and/or (2) involve building a surrogate model on top of the original model, which is not guaranteed to represent the original model accurately. There exists a method specifically for random forests; we wish to extend this method to GBDTs. This involves accounting for (1) the sequential dependency between trees and (2) training on the negative gradients instead of the original labels.
Tasks
Published 2019-07-04
URL https://arxiv.org/abs/1907.02582v1
PDF https://arxiv.org/pdf/1907.02582v1.pdf
PWC https://paperswithcode.com/paper/explaining-predictions-from-tree-based
Repo
Framework
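As a point of reference for the problem statement, the hedged sketch below searches for a minimal single-feature perturbation that flips a GBDT prediction, using scikit-learn. This is the model-agnostic style of search the abstract contrasts with, not the authors' method; the dataset, model settings, and search grid are arbitrary choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Model-agnostic counterfactual search: find the smallest single-feature shift
# that flips the GBDT's prediction. The paper's method instead exploits the
# tree structure and the sequential dependency between boosting stages; this
# sketch only makes the goal concrete.

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X, y)

def one_feature_counterfactual(x, model, steps=201, max_shift=3.0):
    """Smallest single-feature shift (by |delta|) that flips the prediction."""
    original = model.predict(x.reshape(1, -1))[0]
    deltas = sorted(np.linspace(-max_shift, max_shift, steps), key=abs)
    best = None
    for j in range(x.shape[0]):
        for delta in deltas:
            x_cf = x.copy()
            x_cf[j] += delta
            if model.predict(x_cf.reshape(1, -1))[0] != original:
                if best is None or abs(delta) < abs(best[1]):
                    best = (j, delta)
                break
    return best   # (feature index, shift) or None if nothing flips within max_shift

print(one_feature_counterfactual(X[0], model))
```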

Commonsense Reasoning Using WordNet and SUMO: a Detailed Analysis

Title Commonsense Reasoning Using WordNet and SUMO: a Detailed Analysis
Authors Javier Álvez, Itziar Gonzalez-Dios, German Rigau
Abstract We describe a detailed analysis of a sample from a large benchmark of commonsense reasoning problems that has been automatically obtained from WordNet, SUMO and their mapping. The objective is to provide a better assessment of the quality of both the benchmark and the involved knowledge resources for advanced commonsense reasoning tasks. By means of this analysis, we are able to detect some knowledge misalignments, mapping errors and gaps in knowledge and resources. Our final objective is to extract guidelines for better exploiting this commonsense knowledge framework by improving the included resources.
Tasks
Published 2019-09-05
URL https://arxiv.org/abs/1909.02314v2
PDF https://arxiv.org/pdf/1909.02314v2.pdf
PWC https://paperswithcode.com/paper/commonsense-reasoningusing-wordnet-and-sumo-a
Repo
Framework
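The WordNet side of this framework is easy to inspect with NLTK; the SUMO ontology and the WordNet-SUMO mapping that the benchmark relies on are not bundled with NLTK, so the sketch below only shows the kind of lexical knowledge the mapping connects to SUMO concepts.

```python
import nltk
nltk.download("wordnet", quiet=True)   # one-time download of the WordNet corpus
from nltk.corpus import wordnet as wn

# Show synsets and (the tail of) a hypernym chain for one word. This is the
# WordNet knowledge that the WordNet-SUMO mapping links to SUMO concepts;
# the word "knife" is an arbitrary example.

for synset in wn.synsets("knife")[:3]:
    chain = synset.hypernym_paths()[0]
    print(synset.name(), "->", [s.name() for s in chain[-4:]])
```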

Can You Trust Your Model’s Uncertainty? Evaluating Predictive Uncertainty Under Dataset Shift

Title Can You Trust Your Model’s Uncertainty? Evaluating Predictive Uncertainty Under Dataset Shift
Authors Yaniv Ovadia, Emily Fertig, Jie Ren, Zachary Nado, D Sculley, Sebastian Nowozin, Joshua V. Dillon, Balaji Lakshminarayanan, Jasper Snoek
Abstract Modern machine learning methods including deep learning have achieved great success in predictive accuracy for supervised learning tasks, but may still fall short in giving useful estimates of their predictive uncertainty. Quantifying uncertainty is especially critical in real-world settings, which often involve input distributions that are shifted from the training distribution due to a variety of factors including sample bias and non-stationarity. In such settings, well-calibrated uncertainty estimates convey information about when a model’s output should (or should not) be trusted. Many probabilistic deep learning methods, including Bayesian and non-Bayesian methods, have been proposed in the literature for quantifying predictive uncertainty, but to our knowledge there has not previously been a rigorous large-scale empirical comparison of these methods under dataset shift. We present a large-scale benchmark of existing state-of-the-art methods on classification problems and investigate the effect of dataset shift on accuracy and calibration. We find that traditional post-hoc calibration does indeed fall short, as do several other previous methods. However, some methods that marginalize over models give surprisingly strong results across a broad spectrum of tasks.
Tasks Calibration
Published 2019-06-06
URL https://arxiv.org/abs/1906.02530v2
PDF https://arxiv.org/pdf/1906.02530v2.pdf
PWC https://paperswithcode.com/paper/can-you-trust-your-models-uncertainty
Repo
Framework
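One standard calibration metric used in this kind of benchmark is the Expected Calibration Error (ECE); a minimal NumPy sketch is shown below, with synthetic, deliberately overconfident predictions standing in for real model outputs.

```python
import numpy as np

# Expected Calibration Error: average gap between confidence and accuracy
# over equal-width confidence bins. Inputs: predicted confidence for the
# argmax class and binary correctness of each prediction.

def expected_calibration_error(confidences, correct, n_bins=15):
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap      # weight each bin by its share of samples
    return ece

rng = np.random.default_rng(0)
conf = rng.uniform(0.5, 1.0, size=10_000)
correct = rng.uniform(size=conf.shape) < conf * 0.9     # deliberately overconfident
print(f"ECE = {expected_calibration_error(conf, correct):.3f}")
```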

An introduction to flexible methods for policy evaluation

Title An introduction to flexible methods for policy evaluation
Authors Martin Huber
Abstract This chapter covers different approaches to policy evaluation for assessing the causal effect of a treatment or intervention on an outcome of interest. As an introduction to causal inference, the discussion starts with the experimental evaluation of a randomized treatment. It then reviews evaluation methods based on selection on observables (assuming a quasi-random treatment given observed covariates), instrumental variables (inducing a quasi-random shift in the treatment), difference-in-differences and changes-in-changes (exploiting changes in outcomes over time), as well as regression discontinuities and kinks (using changes in treatment assignment at some threshold of a running variable). The chapter discusses methods particularly suited for data with many observations, permitting flexible (i.e., semi- or nonparametric) modeling of treatment effects, and/or many (i.e., high-dimensional) observed covariates, applying machine learning to select and control for covariates in a data-driven way. This is useful not only for tackling confounding by controlling, for instance, for factors jointly affecting the treatment and the outcome, but also for learning effect heterogeneities across subgroups defined by observable covariates and optimally targeting those groups for which the treatment is most effective.
Tasks Causal Inference
Published 2019-10-01
URL https://arxiv.org/abs/1910.00641v1
PDF https://arxiv.org/pdf/1910.00641v1.pdf
PWC https://paperswithcode.com/paper/an-introduction-to-flexible-methods-for
Repo
Framework
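As a concrete example of one of the surveyed designs, here is a toy 2x2 difference-in-differences estimate on simulated data; the data-generating process, effect size, and column names are invented for the illustration.

```python
import numpy as np
import pandas as pd

# Canonical 2x2 difference-in-differences: compare the before/after change in
# outcomes for the treated group against the same change for the control group.

rng = np.random.default_rng(0)
n = 2_000
df = pd.DataFrame({
    "treated": rng.integers(0, 2, n),
    "post": rng.integers(0, 2, n),
})
true_effect = 1.5
df["outcome"] = (
    0.5 * df["treated"] + 0.8 * df["post"]
    + true_effect * df["treated"] * df["post"]
    + rng.normal(scale=1.0, size=n)
)

means = df.groupby(["treated", "post"])["outcome"].mean()
did = (means.loc[(1, 1)] - means.loc[(1, 0)]) - (means.loc[(0, 1)] - means.loc[(0, 0)])
print(f"DiD estimate: {did:.2f} (true effect {true_effect})")
```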

Contraction Clustering (RASTER): A Very Fast Big Data Algorithm for Sequential and Parallel Density-Based Clustering in Linear Time, Constant Memory, and a Single Pass

Title Contraction Clustering (RASTER): A Very Fast Big Data Algorithm for Sequential and Parallel Density-Based Clustering in Linear Time, Constant Memory, and a Single Pass
Authors Gregor Ulm, Simon Smith, Adrian Nilsson, Emil Gustavsson, Mats Jirstrand
Abstract Clustering is an essential data mining tool for analyzing and grouping similar objects. In big data applications, however, many clustering algorithms are infeasible due to their high memory requirements and/or unfavorable runtime complexity. In contrast, Contraction Clustering (RASTER) is a single-pass algorithm for identifying density-based clusters with linear time complexity. Due to its favorable runtime and the fact that its memory requirements are constant, this algorithm is highly suitable for big data applications where the amount of data to be processed is huge. It consists of two steps: (1) a contraction step which projects objects onto tiles and (2) an agglomeration step which groups tiles into clusters. This algorithm is extremely fast in both sequential and parallel execution. Our quantitative evaluation shows that a sequential implementation of RASTER performs significantly better than various standard clustering algorithms. Furthermore, the parallel speedup is significant: on a contemporary workstation, an implementation in Rust processes a batch of 500 million points with 1 million clusters in less than 50 seconds on one core. With 8 cores, the algorithm is about four times faster.
Tasks
Published 2019-07-08
URL https://arxiv.org/abs/1907.03620v2
PDF https://arxiv.org/pdf/1907.03620v2.pdf
PWC https://paperswithcode.com/paper/contraction-clustering-raster-a-very-fast-big
Repo
Framework
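A heavily simplified Python sketch of the two steps (not the authors' Rust implementation, and with made-up parameter values and toy data) looks roughly like this:

```python
from collections import defaultdict, deque

# RASTER-like sketch: (1) contraction projects points onto grid tiles in a
# single pass, (2) agglomeration groups adjacent significant tiles into clusters.

def raster_like(points, precision=1.0, min_points=4):
    # Contraction: count points per tile in one pass over the data.
    counts = defaultdict(int)
    for x, y in points:
        counts[(int(x // precision), int(y // precision))] += 1
    significant = {tile for tile, c in counts.items() if c >= min_points}

    # Agglomeration: flood-fill over the 8-neighbourhood of significant tiles.
    clusters, seen = [], set()
    for start in significant:
        if start in seen:
            continue
        cluster, queue = [], deque([start])
        seen.add(start)
        while queue:
            tx, ty = queue.popleft()
            cluster.append((tx, ty))
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (tx + dx, ty + dy)
                    if nb in significant and nb not in seen:
                        seen.add(nb)
                        queue.append(nb)
        clusters.append(cluster)
    return clusters

points = [(0.1, 0.2), (0.3, 0.1), (0.2, 0.4), (0.5, 0.3),
          (5.1, 5.2), (5.3, 5.0), (5.2, 5.4), (5.0, 5.1)]
print(raster_like(points, precision=1.0, min_points=3))
```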

On the discriminative power of Hyper-parameters in Cross-Validation and how to choose them

Title On the discriminative power of Hyper-parameters in Cross-Validation and how to choose them
Authors Vito Walter Anelli, Tommaso Di Noia, Eugenio Di Sciascio, Claudio Pomo, Azzurra Ragone
Abstract Hyper-parameter tuning is a crucial task for making a model perform at its best. However, despite well-established methodologies, some aspects of tuning remain unexplored. For example, tuning may affect not just accuracy but also novelty, and its effect may depend on the adopted dataset. Moreover, it may sometimes be sufficient to concentrate on a single parameter (or a few of them) instead of the full set. In this paper we report on our investigation of hyper-parameter tuning, performing an extensive 10-fold cross-validation on MovieLens and Amazon Movies for three well-known baselines: User-kNN, Item-kNN, and BPR-MF. We adopted a grid search strategy considering approximately 15 values for each parameter, and then evaluated each combination of parameters in terms of accuracy and novelty. We investigated the discriminative power of nDCG, Precision, Recall, MRR, EFD, and EPC, and finally analyzed the role of parameters in model evaluation under cross-validation.
Tasks
Published 2019-09-05
URL https://arxiv.org/abs/1909.02523v1
PDF https://arxiv.org/pdf/1909.02523v1.pdf
PWC https://paperswithcode.com/paper/on-the-discriminative-power-of-hyper
Repo
Framework
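For concreteness, one of the accuracy metrics listed above, nDCG@k, can be computed as in the following sketch; the relevance grades are made up.

```python
import numpy as np

# nDCG@k: discounted cumulative gain of the ranked list, normalized by the
# DCG of the ideal (descending-relevance) ordering. `relevance` holds graded
# relevance of recommended items in ranked order.

def dcg(relevance, k):
    rel = np.asarray(relevance, dtype=float)[:k]
    discounts = np.log2(np.arange(2, rel.size + 2))
    return float((rel / discounts).sum())

def ndcg(relevance, k):
    ideal_dcg = dcg(sorted(relevance, reverse=True), k)
    return dcg(relevance, k) / ideal_dcg if ideal_dcg > 0 else 0.0

print(f"nDCG@5 = {ndcg([3, 2, 0, 1, 2, 0], k=5):.3f}")
```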

Efficient Parameter Estimation of Sampled Random Fields

Title Efficient Parameter Estimation of Sampled Random Fields
Authors Arthur P. Guillaumin, Adam M. Sykulski, Sofia C. Olhede, Frederik J. Simons
Abstract We provide a computationally and statistically efficient method for estimating the parameters of a stochastic Gaussian model observed on a spatial grid, which need not be rectangular. Standard methods are plagued by computational intractability, and designing methods that can be implemented for realistically sized problems has been a long-standing issue. This has motivated the use of the Fourier Transform and the Whittle likelihood approximation. The challenge of frequency-domain methods is to determine and account for observational boundary effects, missing data, and the shape of the observed spatial grid. In this paper we address these effects explicitly by proposing a new quasi-likelihood estimator. We prove consistency and asymptotic normality of our estimator, and show that the proposed method solves boundary issues with Whittle estimation for finite samples, yielding parameter estimates with significantly reduced bias and error. We demonstrate the effectiveness of our method for incomplete lattices, in comparison to other recent methods. Finally, we apply our method to estimate the parameters of a Matérn process used to model data from Venus’ topography.
Tasks
Published 2019-07-04
URL https://arxiv.org/abs/1907.02447v2
PDF https://arxiv.org/pdf/1907.02447v2.pdf
PWC https://paperswithcode.com/paper/efficient-parameter-estimation-of-sampled
Repo
Framework
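The paper's estimator handles boundary effects, missing data, and irregular spatial grids; as a much simpler illustration of the underlying Whittle approximation, the sketch below fits a 1D AR(1) coefficient by minimizing the Whittle negative log-likelihood built from the periodogram. This is our own toy example, not the authors' estimator, and the innovation variance is fixed at 1 to keep it one-dimensional.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Whittle approximation in 1D: sum over Fourier frequencies of
# log f(w) + I(w) / f(w), with I the periodogram and f the model spectrum.

rng = np.random.default_rng(0)
n, phi_true = 4096, 0.7
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi_true * x[t - 1] + rng.normal()       # simulate an AR(1) series

freqs = 2 * np.pi * np.arange(1, n // 2) / n        # Fourier frequencies (excl. 0)
periodogram = np.abs(np.fft.fft(x)[1:n // 2]) ** 2 / (2 * np.pi * n)

def whittle_neg_loglik(phi):
    spec = 1.0 / (2 * np.pi * np.abs(1 - phi * np.exp(-1j * freqs)) ** 2)
    return np.sum(np.log(spec) + periodogram / spec)

result = minimize_scalar(whittle_neg_loglik, bounds=(-0.99, 0.99), method="bounded")
print(f"Whittle estimate of phi: {result.x:.3f} (true {phi_true})")
```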

Neural Memory Networks for Seizure Type Classification

Title Neural Memory Networks for Seizure Type Classification
Authors David Ahmedt-Aristizabal, Tharindu Fernando, Simon Denman, Lars Petersson, Matthew J. Aburn, Clinton Fookes
Abstract Classification of seizure type is a key step in the clinical process for evaluating an individual who presents with seizures. It determines the course of clinical diagnosis and treatment, and its impact stretches beyond the clinical domain to epilepsy research and the development of novel therapies. Automated identification of seizure type may facilitate understanding of the disease, and seizure detection and prediction have been the focus of recent research that has sought to exploit the benefits of machine learning and deep learning architectures. Nevertheless, there is not yet a definitive solution for automating the classification of seizure type, a task that must currently be performed by an expert epileptologist. Inspired by recent advances in neural memory networks (NMNs), we introduce a novel approach for the classification of seizure type using electrophysiological data. We first explore the performance of traditional deep learning techniques which use convolutional and recurrent neural networks, and enhance these architectures by using external memory modules with trainable neural plasticity. We show that our model achieves a state-of-the-art weighted F1 score of 0.945 for seizure type classification on the TUH EEG Seizure Corpus with the IBM TUSZ preprocessed data. This work highlights the potential of neural memory networks to support the field of epilepsy research, along with biomedical research and signal analysis more broadly.
Tasks EEG, Seizure Detection
Published 2019-12-10
URL https://arxiv.org/abs/1912.04968v2
PDF https://arxiv.org/pdf/1912.04968v2.pdf
PWC https://paperswithcode.com/paper/neural-memory-networks-for-robust
Repo
Framework

An Analisys of Application Logs with Splunk: developing an App for the synthetic analysis of data and security incidents

Title An Analisys of Application Logs with Splunk: developing an App for the synthetic analysis of data and security incidents
Authors Roberto Bruzzese
Abstract The present work aims to enhance the application logs of a hypothetical infrastructure platform and to build an App that displays synthetic data about performance, anomalies and security incidents in the form of a dashboard. The reference architecture, with multiple applications and multiple hardware distributions, implementing a Service-Oriented Architecture, is a real case whose details have been abstracted, because we want to extend the concept to all architectures with similar characteristics.
Tasks
Published 2019-12-24
URL https://arxiv.org/abs/1912.11283v1
PDF https://arxiv.org/pdf/1912.11283v1.pdf
PWC https://paperswithcode.com/paper/an-analisys-of-application-logs-with-splunk
Repo
Framework

Exploring Neural Net Augmentation to BERT for Question Answering on SQUAD 2.0

Title Exploring Neural Net Augmentation to BERT for Question Answering on SQUAD 2.0
Authors Suhas Gupta
Abstract Enhancing machine capabilities to answer questions has been a topic of considerable focus in recent years of NLP research. Language models like Embeddings from Language Models (ELMo) [1] and Bidirectional Encoder Representations from Transformers (BERT) [2] have been very successful in developing general-purpose language models that can be optimized for a large number of downstream language tasks. In this work, we focused on augmenting the pre-trained BERT language model with different output neural net architectures and compared their performance on the question answering task posed by the Stanford Question Answering Dataset 2.0 (SQUAD 2.0) [3]. Additionally, we fine-tuned the pre-trained BERT model parameters to demonstrate its effectiveness in adapting to specialized language tasks. Our best output network is the contextualized CNN, which achieves F1 scores of 75.32 and 64.85 on the unanswerable and answerable question answering tasks, respectively.
Tasks Language Modelling, Question Answering
Published 2019-08-04
URL https://arxiv.org/abs/1908.01767v3
PDF https://arxiv.org/pdf/1908.01767v3.pdf
PWC https://paperswithcode.com/paper/exploring-neural-net-augmentation-to-bert-for
Repo
Framework
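The abstract does not specify the architecture of the "contextualized CNN" head, so the following is only a guess at that style of augmentation: a convolutional layer over BERT's contextual embeddings feeding start/end span logits, written with Hugging Face transformers and PyTorch. The layer sizes, kernel width, and model checkpoint are assumptions, not the paper's settings.

```python
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizerFast

# A hedged sketch of augmenting BERT with a convolutional output head for
# extractive QA: convolve over the contextual token embeddings, then project
# each position to start/end logits.

class BertCnnForQA(nn.Module):
    def __init__(self, hidden=768, channels=256, kernel=3):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        self.conv = nn.Conv1d(hidden, channels, kernel, padding=kernel // 2)
        self.qa_outputs = nn.Linear(channels, 2)     # start and end logits

    def forward(self, input_ids, attention_mask):
        hidden = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        feats = torch.relu(self.conv(hidden.transpose(1, 2))).transpose(1, 2)
        start_logits, end_logits = self.qa_outputs(feats).split(1, dim=-1)
        return start_logits.squeeze(-1), end_logits.squeeze(-1)

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
batch = tokenizer("Who wrote it?", "It was written by Ada.", return_tensors="pt")
model = BertCnnForQA()
start, end = model(batch["input_ids"], batch["attention_mask"])
print(start.shape, end.shape)        # (1, seq_len) each
```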

Thinking Slow about Latency Evaluation for Simultaneous Machine Translation

Title Thinking Slow about Latency Evaluation for Simultaneous Machine Translation
Authors Colin Cherry, George Foster
Abstract Simultaneous machine translation attempts to translate a source sentence before it is finished being spoken, with applications to translation of spoken language for live streaming and conversation. Since simultaneous systems trade quality to reduce latency, having an effective and interpretable latency metric is crucial. We introduce a variant of the recently proposed Average Lagging (AL) metric, which we call Differentiable Average Lagging (DAL). It distinguishes itself by being differentiable and internally consistent to its underlying mathematical model.
Tasks Machine Translation
Published 2019-05-31
URL https://arxiv.org/abs/1906.00048v1
PDF https://arxiv.org/pdf/1906.00048v1.pdf
PWC https://paperswithcode.com/paper/190600048
Repo
Framework
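The precise definition of DAL is given in the paper; as a hedged sketch, the snippet below computes an Average-Lagging-style score with the per-token minimum-delay recurrence that, to our reading, is what makes the metric internally consistent. Treat the exact formula as an assumption.

```python
# `delays[t]` is the number of source tokens read before emitting target token t.
# The recurrence enforces that each emitted token costs at least 1/gamma source
# tokens of delay relative to its predecessor, so a system cannot "earn back"
# latency by emitting many tokens at once.

def differentiable_average_lagging(delays, src_len, tgt_len):
    gamma = tgt_len / src_len                      # target/source length ratio
    g_prime, lag_sum = 0.0, 0.0
    for t, g in enumerate(delays):
        g_prime = g if t == 0 else max(g, g_prime + 1.0 / gamma)
        lag_sum += g_prime - t / gamma
    return lag_sum / tgt_len

# Example: a wait-3 policy on a 6-token source / 6-token target pair
# yields a lag of 3 source tokens.
delays = [3, 4, 5, 6, 6, 6]
print(differentiable_average_lagging(delays, src_len=6, tgt_len=6))
```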

SpaRCe: Sparse reservoir computing

Title SpaRCe: Sparse reservoir computing
Authors Luca Manneschi, Andrew C. Lin, Eleni Vasilaki
Abstract “Sparse” neural networks, in which relatively few neurons or connections are active, are common in both machine learning and neuroscience. Whereas in machine learning, “sparseness” is related to a penalty term which effectively leads to some connecting weights becoming small or zero, in biological brains, sparseness is often created when high spiking thresholds prevent neuronal activity. Inspired by neuroscience, here we introduce sparseness into a reservoir computing network via neuron-specific learnable thresholds of activity, allowing neurons with low thresholds to give output but silencing outputs from neurons with high thresholds. This approach, which we term “SpaRCe”, optimises the sparseness level of the reservoir and applies the threshold mechanism to the information received by the read-out weights. Both the read-out weights and the thresholds are learned by a standard on-line gradient rule that minimises an error function on the outputs of the network. Threshold learning occurs by the balance of two opposing forces: reducing inter-neuronal correlations in the reservoir by deactivating redundant neurons, while increasing the activity of neurons participating in correct decisions. We test SpaRCe in a set of classification problems and find that introducing threshold learning improves performance compared to standard reservoir computing networks.
Tasks
Published 2019-12-04
URL https://arxiv.org/abs/1912.08124v1
PDF https://arxiv.org/pdf/1912.08124v1.pdf
PWC https://paperswithcode.com/paper/sparce-sparse-reservoir-computing
Repo
Framework
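A heavily simplified toy version of the idea (our own simplification, not the paper's exact update rule) is an echo-state reservoir whose states pass through per-neuron learnable thresholds before a linear read-out, with both thresholds and read-out weights trained by an online gradient rule:

```python
import numpy as np

# Toy SpaRCe-style sketch: reservoir states are silenced below a per-neuron
# threshold; thresholds and read-out weights are learned online by gradient
# descent on the squared output error. Sizes and rates are arbitrary.

rng = np.random.default_rng(0)
n_res, n_in, n_out, lr = 100, 1, 1, 0.01

W_in = rng.normal(scale=0.5, size=(n_res, n_in))
W_res = rng.normal(size=(n_res, n_res))
W_res *= 0.9 / np.max(np.abs(np.linalg.eigvals(W_res)))     # spectral radius < 1
W_out = np.zeros((n_out, n_res))
theta = np.zeros(n_res)                                      # learnable thresholds

def sparse_state(x, theta):
    return np.maximum(x - theta, 0.0)                        # silenced below threshold

x = np.zeros(n_res)
for t in range(2000):
    u = np.sin(0.1 * t)                                      # toy input signal
    target = np.sin(0.1 * (t + 1))                           # predict the next value
    x = np.tanh(W_res @ x + W_in @ np.array([u]))
    s = sparse_state(x, theta)
    y = W_out @ s
    err = y - target
    W_out -= lr * np.outer(err, s)                           # grad of squared error
    theta -= lr * (-(W_out.T @ err) * (x > theta))           # chain rule through ReLU

print("final squared error:", float(err ** 2))
```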

A Model of Double Descent for High-dimensional Binary Linear Classification

Title A Model of Double Descent for High-dimensional Binary Linear Classification
Authors Zeyu Deng, Abla Kammoun, Christos Thrampoulidis
Abstract We consider a model for logistic regression where only a subset of features of size $p$ is used for training a linear classifier over $n$ training samples. The classifier is obtained by running gradient-descent (GD) on the logistic-loss. For this model, we investigate the dependence of the generalization error on the overparameterization ratio $\kappa=p/n$. First, building on known deterministic results on convergence properties of the GD, we uncover a phase-transition phenomenon for the case of Gaussian regressors: the generalization error of GD is the same as that of the maximum-likelihood (ML) solution when $\kappa<\kappa_\star$, and that of the max-margin (SVM) solution when $\kappa>\kappa_\star$. Next, using the convex Gaussian min-max theorem (CGMT), we sharply characterize the performance of both the ML and SVM solutions. Combining these results, we obtain curves that explicitly characterize the generalization error of GD for varying values of $\kappa$. The numerical results validate the theoretical predictions and unveil double-descent phenomena that complement similar recent observations in linear regression settings.
Tasks
Published 2019-11-13
URL https://arxiv.org/abs/1911.05822v1
PDF https://arxiv.org/pdf/1911.05822v1.pdf
PWC https://paperswithcode.com/paper/a-model-of-double-descent-for-high
Repo
Framework
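A small numerical experiment in this spirit (our own toy setup, not the paper's theoretical analysis) trains logistic regression by plain gradient descent on Gaussian features and sweeps the overparameterization ratio, recording test error at each ratio; all sizes and the signal strength are arbitrary choices for illustration.

```python
import numpy as np
from scipy.special import expit

# Sweep kappa = p / n for gradient-descent logistic regression on Gaussian
# features with a planted linear signal, and record the test error.

rng = np.random.default_rng(0)
n_train, n_test, p_total = 200, 5000, 600
beta_star = rng.normal(size=p_total)
beta_star /= np.linalg.norm(beta_star)

def sample(n):
    X = rng.normal(size=(n, p_total))
    y = (rng.uniform(size=n) < expit(5 * X @ beta_star)).astype(float)
    return X, y

X_tr, y_tr = sample(n_train)
X_te, y_te = sample(n_test)

def gd_logistic(X, y, steps=2000, lr=0.1):
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w -= lr * X.T @ (expit(X @ w) - y) / len(y)
    return w

for p in (50, 100, 150, 200, 250, 400, 600):        # kappa = p / 200
    w = gd_logistic(X_tr[:, :p], y_tr)
    err = np.mean((X_te[:, :p] @ w > 0) != (y_te > 0.5))
    print(f"kappa = {p / n_train:.2f}   test error = {err:.3f}")
```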