Paper Group ANR 467
Revisiting Ensembles in an Adversarial Context: Improving Natural Accuracy. Towards a Kernel based Physical Interpretation of Model Uncertainty. Ripple Walk Training: A Subgraph-based training framework for Large and Deep Graph Neural Network. Soft Threshold Weight Reparameterization for Learnable Sparsity. Curating Social Media Data. Quantile Regu …
Revisiting Ensembles in an Adversarial Context: Improving Natural Accuracy
Title | Revisiting Ensembles in an Adversarial Context: Improving Natural Accuracy |
Authors | Aditya Saligrama, Guillaume Leclerc |
Abstract | A necessary characteristic for the deployment of deep learning models in real world applications is resistance to small adversarial perturbations while maintaining accuracy on non-malicious inputs. While robust training provides models that exhibit better adversarial accuracy than standard models, there is still a significant gap in natural accuracy between robust and non-robust models which we aim to bridge. We consider a number of ensemble methods designed to mitigate this performance difference. Our key insight is that model trained to withstand small attacks, when ensembled, can often withstand significantly larger attacks, and this concept can in turn be leveraged to optimize natural accuracy. We consider two schemes, one that combines predictions from several randomly initialized robust models, and the other that fuses features from robust and standard models. |
Tasks | |
Published | 2020-02-26 |
URL | https://arxiv.org/abs/2002.11572v1 |
https://arxiv.org/pdf/2002.11572v1.pdf | |
PWC | https://paperswithcode.com/paper/revisiting-ensembles-in-an-adversarial |
Repo | |
Framework | |
Towards a Kernel based Physical Interpretation of Model Uncertainty
Title | Towards a Kernel based Physical Interpretation of Model Uncertainty |
Authors | Rishabh Singh, Jose C. Principe |
Abstract | This paper introduces a new information theoretic framework that provides a sensitive multi-modal quantification of data uncertainty by leveraging a quantum physical description of its metric space. We specifically work with the kernel mean embedding metric which yields an intuitive physical interpretation of the signal as a potential field, resulting in its new energy based formulation. This enables one to extract multi-scale uncertainty features of data in the form of information eigenmodes by utilizing moment decomposition concepts of quantum physics. In essence, this approach decomposes local realizations of the signal’s PDF in terms of quantum uncertainty moments. We specifically present the application of this framework as a non-parametric and non-intrusive surrogate tool for predictive uncertainty quantification of point-prediction neural network models, overcoming various limitations of conventional Bayesian and ensemble based UQ methods. Experimental comparisons with some established uncertainty quantification methods illustrate performance advantages exhibited by our framework. |
Tasks | |
Published | 2020-01-30 |
URL | https://arxiv.org/abs/2001.11495v2 |
https://arxiv.org/pdf/2001.11495v2.pdf | |
PWC | https://paperswithcode.com/paper/towards-a-kernel-based-physical |
Repo | |
Framework | |
Ripple Walk Training: A Subgraph-based training framework for Large and Deep Graph Neural Network
Title | Ripple Walk Training: A Subgraph-based training framework for Large and Deep Graph Neural Network |
Authors | Jiyang Bai, Yuxiang Ren, Jiawei Zhang |
Abstract | Graph neural networks (GNNs) have achieved outstanding performance in learning graph-structured data. Many current GNNs suffer from three problems when facing large-size graphs and using a deeper structure: neighbors explosion, node dependence, and oversmoothing. In this paper, we propose a general subgraph-based training framework, namely Ripple Walk Training (RWT), for deep and large graph neural networks. RWT samples subgraphs from the full graph to constitute a mini-batch and the full GNN is updated based on the mini-batch gradient. We analyze the high-quality subgraphs required in a mini-batch in a theoretical way. A novel sampling method Ripple Walk Sampler works for sampling these high-quality subgraphs to constitute the mini-batch, which considers both the randomness and connectivity of the graph-structured data. Extensive experiments on different sizes of graphs demonstrate the effectiveness of RWT in training various GNNs (GCN & GAT). |
Tasks | |
Published | 2020-02-17 |
URL | https://arxiv.org/abs/2002.07206v2 |
https://arxiv.org/pdf/2002.07206v2.pdf | |
PWC | https://paperswithcode.com/paper/ripple-walk-training-a-subgraph-based |
Repo | |
Framework | |
Soft Threshold Weight Reparameterization for Learnable Sparsity
Title | Soft Threshold Weight Reparameterization for Learnable Sparsity |
Authors | Aditya Kusupati, Vivek Ramanujan, Raghav Somani, Mitchell Wortsman, Prateek Jain, Sham Kakade, Ali Farhadi |
Abstract | Sparsity in Deep Neural Networks (DNNs) is studied extensively with the focus of maximizing prediction accuracy given an overall parameter budget. Existing methods rely on uniform or heuristic non-uniform sparsity budgets which have sub-optimal layer-wise parameter allocation resulting in a) lower prediction accuracy or b) higher inference cost (FLOPs). This work proposes Soft Threshold Reparameterization (STR), a novel use of the soft-threshold operator on DNN weights. STR smoothly induces sparsity while learning pruning thresholds thereby obtaining a non-uniform sparsity budget. Our method achieves state-of-the-art accuracy for unstructured sparsity in CNNs (ResNet50 and MobileNetV1 on ImageNet-1K), and, additionally, learns non-uniform budgets that empirically reduce the FLOPs by up to 50%. Notably, STR boosts the accuracy over existing results by up to 10% in the ultra sparse (99%) regime and can also be used to induce low-rank (structured sparsity) in RNNs. In short, STR is a simple mechanism which learns effective sparsity budgets that contrast with popular heuristics. |
Tasks | |
Published | 2020-02-08 |
URL | https://arxiv.org/abs/2002.03231v4 |
https://arxiv.org/pdf/2002.03231v4.pdf | |
PWC | https://paperswithcode.com/paper/soft-threshold-weight-reparameterization-for |
Repo | |
Framework | |
Curating Social Media Data
Title | Curating Social Media Data |
Authors | Kushal Vaghani |
Abstract | Social media platforms have empowered the democratization of the pulse of people in the modern era. Due to its immense popularity and high usage, data published on social media sites (e.g., Twitter, Facebook and Tumblr) is a rich ocean of information. Therefore data-driven analytics of social imprints has become a vital asset for organisations and governments to further improve their products and services. However, due to the dynamic and noisy nature of social media data, performing accurate analysis on raw data is a challenging task. A key requirement is to curate the raw data before fed into analytics pipelines. This curation process transforms the raw data into contextualized data and knowledge. We propose a data curation pipeline, namely CrowdCorrect, to enable analysts cleansing and curating social data and preparing it for reliable analytics. Our pipeline provides an automatic feature extraction from a corpus of social media data using existing in-house tools. Further, we offer a dual-correction mechanism using both automated and crowd-sourced approaches. The implementation of this pipeline also includes a set of tools for automatically creating micro-tasks to facilitate the contribution of crowd users in curating the raw data. For the purposes of this research, we use Twitter as our motivational social media data platform due to its popularity. |
Tasks | |
Published | 2020-02-21 |
URL | https://arxiv.org/abs/2002.09202v1 |
https://arxiv.org/pdf/2002.09202v1.pdf | |
PWC | https://paperswithcode.com/paper/curating-social-media-data |
Repo | |
Framework | |
Quantile Regularization: Towards Implicit Calibration of Regression Models
Title | Quantile Regularization: Towards Implicit Calibration of Regression Models |
Authors | Saiteja Utpala, Piyush Rai |
Abstract | Recent works have shown that most deep learning models are often poorly calibrated, i.e., they may produce overconfident predictions that are wrong. It is therefore desirable to have models that produce predictive uncertainty estimates that are reliable. Several approaches have been proposed recently to calibrate classification models. However, there is relatively little work on calibrating regression models. We present a method for calibrating regression models based on a novel quantile regularizer defined as the cumulative KL divergence between two CDFs. Unlike most of the existing approaches for calibrating regression models, which are based on post-hoc processing of the model’s output and require an additional dataset, our method is trainable in an end-to-end fashion without requiring an additional dataset. The proposed regularizer can be used with any training objective for regression. We also show that post-hoc calibration methods like Isotonic Calibration sometimes compound miscalibration whereas our method provides consistently better calibrations. We provide empirical results demonstrating that the proposed quantile regularizer significantly improves calibration for regression models trained using approaches, such as Dropout VI and Deep Ensembles. |
Tasks | Calibration |
Published | 2020-02-28 |
URL | https://arxiv.org/abs/2002.12860v1 |
https://arxiv.org/pdf/2002.12860v1.pdf | |
PWC | https://paperswithcode.com/paper/quantile-regularization-towards-implicit |
Repo | |
Framework | |
Multiple Metric Learning for Structured Data
Title | Multiple Metric Learning for Structured Data |
Authors | Nicolo Colombo |
Abstract | We address the problem of merging graph and feature-space information while learning a metric from structured data. Existing algorithms tackle the problem in an asymmetric way, by either extracting vectorized summaries of the graph structure or adding hard constraints to feature-space algorithms. Following a different path, we define a metric regression scheme where we train metric-constrained linear combinations of dissimilarity matrices. The idea is that the input matrices can be pre-computed dissimilarity measures obtained from any kind of available data (e.g. node attributes or edge structure). As the model inputs are distance measures, we do not need to assume the existence of any underlying feature space. Main challenge is that metric constraints (especially positive-definiteness and sub-additivity), are not automatically respected if, for example, the coefficients of the linear combination are allowed to be negative. Both positive and sub-additive constraints are linear inequalities, but the computational complexity of imposing them scales as O(D3), where D is the size of the input matrices (i.e. the size of the data set). This becomes quickly prohibitive, even when D is relatively small. We propose a new graph-based technique for optimizing under such constraints and show that, in some cases, our approach may reduce the original computational complexity of the optimization process by one order of magnitude. Contrarily to existing methods, our scheme applies to any (possibly non-convex) metric-constrained objective function. |
Tasks | Metric Learning |
Published | 2020-02-13 |
URL | https://arxiv.org/abs/2002.05747v1 |
https://arxiv.org/pdf/2002.05747v1.pdf | |
PWC | https://paperswithcode.com/paper/multiple-metric-learning-for-structured-data |
Repo | |
Framework | |
Double Explore-then-Commit: Asymptotic Optimality and Beyond
Title | Double Explore-then-Commit: Asymptotic Optimality and Beyond |
Authors | Tianyuan Jin, Pan Xu, Xiaokui Xiao, Quanquan Gu |
Abstract | We study the two-armed bandit problem with subGaussian rewards. The explore-then-commit (ETC) strategy, which consists of an exploration phase followed by an exploitation phase, is one of the most widely used algorithms in a variety of online decision applications. Nevertheless, it has been shown in Garivier et al. (2016) that ETC is suboptimal in the asymptotic sense as the horizon grows, and thus, is worse than fully sequential strategies such as Upper Confidence Bound (UCB). In this paper, we argue that a variant of ETC algorithm can actually achieve the asymptotically optimal regret bounds for multi-armed bandit problems as UCB-type algorithms do. Specifically, we propose a double explore-then-commit (DETC) algorithm that has two exploration and exploitation phases. We prove that DETC achieves the asymptotically optimal regret bound as the time horizon goes to infinity. To our knowledge, DETC is the first non-fully-sequential algorithm that achieves such asymptotic optimality. In addition, we extend DETC to batched bandit problems, where (i) the exploration process is split into a small number of batches and (ii) the round complexity is of central interest. We prove that a batched version of DETC can achieve the asymptotic optimality with only constant round complexity. This is the first batched bandit algorithm that can attain asymptotic optimality in terms of both regret and round complexity. |
Tasks | |
Published | 2020-02-21 |
URL | https://arxiv.org/abs/2002.09174v1 |
https://arxiv.org/pdf/2002.09174v1.pdf | |
PWC | https://paperswithcode.com/paper/double-explore-then-commit-asymptotic |
Repo | |
Framework | |
Secure multiparty computations in floating-point arithmetic
Title | Secure multiparty computations in floating-point arithmetic |
Authors | Chuan Guo, Awni Hannun, Brian Knott, Laurens van der Maaten, Mark Tygert, Ruiyu Zhu |
Abstract | Secure multiparty computations enable the distribution of so-called shares of sensitive data to multiple parties such that the multiple parties can effectively process the data while being unable to glean much information about the data (at least not without collusion among all parties to put back together all the shares). Thus, the parties may conspire to send all their processed results to a trusted third party (perhaps the data provider) at the conclusion of the computations, with only the trusted third party being able to view the final results. Secure multiparty computations for privacy-preserving machine-learning turn out to be possible using solely standard floating-point arithmetic, at least with a carefully controlled leakage of information less than the loss of accuracy due to roundoff, all backed by rigorous mathematical proofs of worst-case bounds on information loss and numerical stability in finite-precision arithmetic. Numerical examples illustrate the high performance attained on commodity off-the-shelf hardware for generalized linear models, including ordinary linear least-squares regression, binary and multinomial logistic regression, probit regression, and Poisson regression. |
Tasks | Mathematical Proofs |
Published | 2020-01-09 |
URL | https://arxiv.org/abs/2001.03192v1 |
https://arxiv.org/pdf/2001.03192v1.pdf | |
PWC | https://paperswithcode.com/paper/secure-multiparty-computations-in-floating |
Repo | |
Framework | |
Learning Latent Causal Structures with a Redundant Input Neural Network
Title | Learning Latent Causal Structures with a Redundant Input Neural Network |
Authors | Jonathan D. Young, Bryan Andrews, Gregory F. Cooper, Xinghua Lu |
Abstract | Most causal discovery algorithms find causal structure among a set of observed variables. Learning the causal structure among latent variables remains an important open problem, particularly when using high-dimensional data. In this paper, we address a problem for which it is known that inputs cause outputs, and these causal relationships are encoded by a causal network among a set of an unknown number of latent variables. We developed a deep learning model, which we call a redundant input neural network (RINN), with a modified architecture and a regularized objective function to find causal relationships between input, hidden, and output variables. More specifically, our model allows input variables to directly interact with all latent variables in a neural network to influence what information the latent variables should encode in order to generate the output variables accurately. In this setting, the direct connections between input and latent variables makes the latent variables partially interpretable; furthermore, the connectivity among the latent variables in the neural network serves to model their potential causal relationships to each other and to the output variables. A series of simulation experiments provide support that the RINN method can successfully recover latent causal structure between input and output variables. |
Tasks | Causal Discovery |
Published | 2020-03-29 |
URL | https://arxiv.org/abs/2003.13135v1 |
https://arxiv.org/pdf/2003.13135v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-latent-causal-structures-with-a |
Repo | |
Framework | |
A lower bound for the ELBO of the Bernoulli Variational Autoencoder
Title | A lower bound for the ELBO of the Bernoulli Variational Autoencoder |
Authors | Robert Sicks, Ralf Korn, Stefanie Schwaar |
Abstract | We consider a variational autoencoder (VAE) for binary data. Our main innovations are an interpretable lower bound for its training objective, a modified initialization and architecture of such a VAE that leads to faster training, and a decision support for finding the appropriate dimension of the latent space via using a PCA. Numerical examples illustrate our theoretical result and the performance of the new architecture. |
Tasks | |
Published | 2020-03-26 |
URL | https://arxiv.org/abs/2003.11830v1 |
https://arxiv.org/pdf/2003.11830v1.pdf | |
PWC | https://paperswithcode.com/paper/a-lower-bound-for-the-elbo-of-the-bernoulli |
Repo | |
Framework | |
Sampling and Learning for Boolean Function
Title | Sampling and Learning for Boolean Function |
Authors | Chuyu Xiong |
Abstract | In this article, we continue our study on universal learning machine by introducing new tools. We first discuss boolean function and boolean circuit, and we establish one set of tools, namely, fitting extremum and proper sampling set. We proved the fundamental relationship between proper sampling set and complexity of boolean circuit. Armed with this set of tools, we then introduce much more effective learning strategies. We show that with such learning strategies and learning dynamics, universal learning can be achieved, and requires much less data. |
Tasks | |
Published | 2020-01-21 |
URL | https://arxiv.org/abs/2001.07317v1 |
https://arxiv.org/pdf/2001.07317v1.pdf | |
PWC | https://paperswithcode.com/paper/sampling-and-learning-for-boolean-function |
Repo | |
Framework | |
Using Image Captions and Multitask Learning for Recommending Query Reformulations
Title | Using Image Captions and Multitask Learning for Recommending Query Reformulations |
Authors | Gaurav Verma, Vishwa Vinay, Sahil Bansal, Shashank Oberoi, Makkunda Sharma, Prakhar Gupta |
Abstract | Interactive search sessions often contain multiple queries, where the user submits a reformulated version of the previous query in response to the original results. We aim to enhance the query recommendation experience for a commercial image search engine. Our proposed methodology incorporates current state-of-the-art practices from relevant literature – the use of generation-based sequence-to-sequence models that capture session context, and a multitask architecture that simultaneously optimizes the ranking of results. We extend this setup by driving the learning of such a model with captions of clicked images as the target, instead of using the subsequent query within the session. Since these captions tend to be linguistically richer, the reformulation mechanism can be seen as assistance to construct more descriptive queries. In addition, via the use of a pairwise loss for the secondary ranking task, we show that the generated reformulations are more diverse. |
Tasks | Image Captioning, Image Retrieval |
Published | 2020-03-02 |
URL | https://arxiv.org/abs/2003.00708v1 |
https://arxiv.org/pdf/2003.00708v1.pdf | |
PWC | https://paperswithcode.com/paper/using-image-captions-and-multitask-learning |
Repo | |
Framework | |
The Ladder Algorithm: Finding Repetitive Structures in Medical Images by Induction
Title | The Ladder Algorithm: Finding Repetitive Structures in Medical Images by Induction |
Authors | Rhydian Windsor, Amir Jamaludin |
Abstract | In this paper we introduce the Ladder Algorithm; a novel recurrent algorithm to detect repetitive structures in natural images with high accuracy using little training data. We then demonstrate the algorithm on the task of extracting vertebrae from whole spine magnetic resonance scans with only lumbar MR scans for training data. It is shown to achieve high perforamance with 99.8% precision and recall, exceeding current state of the art approaches for lumbar vertebrae detection in T1 and T2 weighted scans. It also generalises without retraining to whole spine images with minimal drop in accuracy, achieving 99.4% detection rate. |
Tasks | |
Published | 2020-01-30 |
URL | https://arxiv.org/abs/2001.11284v1 |
https://arxiv.org/pdf/2001.11284v1.pdf | |
PWC | https://paperswithcode.com/paper/the-ladder-algorithm-finding-repetitive |
Repo | |
Framework | |
AutoML-Zero: Evolving Machine Learning Algorithms From Scratch
Title | AutoML-Zero: Evolving Machine Learning Algorithms From Scratch |
Authors | Esteban Real, Chen Liang, David R. So, Quoc V. Le |
Abstract | Machine learning research has advanced in multiple aspects, including model structures and learning methods. The effort to automate such research, known as AutoML, has also made significant progress. However, this progress has largely focused on the architecture of neural networks, where it has relied on sophisticated expert-designed layers as building blocks—or similarly restrictive search spaces. Our goal is to show that AutoML can go further: it is possible today to automatically discover complete machine learning algorithms just using basic mathematical operations as building blocks. We demonstrate this by introducing a novel framework that significantly reduces human bias through a generic search space. Despite the vastness of this space, evolutionary search can still discover two-layer neural networks trained by backpropagation. These simple neural networks can then be surpassed by evolving directly on tasks of interest, e.g. CIFAR-10 variants, where modern techniques emerge in the top algorithms, such as bilinear interactions, normalized gradients, and weight averaging. Moreover, evolution adapts algorithms to different task types: e.g., dropout-like techniques appear when little data is available. We believe these preliminary successes in discovering machine learning algorithms from scratch indicate a promising new direction for the field. |
Tasks | AutoML |
Published | 2020-03-06 |
URL | https://arxiv.org/abs/2003.03384v1 |
https://arxiv.org/pdf/2003.03384v1.pdf | |
PWC | https://paperswithcode.com/paper/automl-zero-evolving-machine-learning |
Repo | |
Framework | |