April 3, 2020

# Paper Group AWR 50

The Reciprocal Bayesian LASSO. Certified Defenses for Adversarial Patches. Training Binary Neural Networks using the Bayesian Learning Rule. AMAGOLD: Amortized Metropolis Adjustment for Efficient Stochastic Gradient MCMC. Tidying Deep Saliency Prediction Architectures. Directional Deep Embedding and Appearance Learning for Fast Video Object Segment …

#### The Reciprocal Bayesian LASSO

Title The Reciprocal Bayesian LASSO
Authors Himel Mallick, Rahim Alhamzawi, Vladimir Svetnik
Abstract A reciprocal LASSO (rLASSO) regularization employs a decreasing penalty function as opposed to conventional penalization methods that use increasing penalties on the coefficients, leading to stronger parsimony and superior model selection relative to traditional shrinkage methods. Here we consider a fully Bayesian formulation of the rLASSO problem, which is based on the observation that the rLASSO estimate for linear regression parameters can be interpreted as a Bayesian posterior mode estimate when the regression parameters are assigned independent inverse Laplace priors. Bayesian inference from this posterior is possible using an expanded hierarchy motivated by a scale mixture of double Pareto or truncated normal distributions. On simulated and real datasets, we show that the Bayesian formulation outperforms its classical cousin in estimation, prediction, and variable selection across a wide range of scenarios while offering the advantage of posterior inference. Finally, we discuss other variants of this new approach and provide a unified framework for variable selection using flexible reciprocal penalties. All methods described in this paper are publicly available as an R package at: https://github.com/himelmallick/BayesRecipe.
Published 2020-01-23
URL https://arxiv.org/abs/2001.08327v1
PDF https://arxiv.org/pdf/2001.08327v1.pdf
PWC https://paperswithcode.com/paper/the-reciprocal-bayesian-lasso
Repo https://github.com/himelmallick/BayesRecipe
Framework none

#### Certified Defenses for Adversarial Patches

Title Certified Defenses for Adversarial Patches
Authors Ping-Yeh Chiang, Renkun Ni, Ahmed Abdelkader, Chen Zhu, Christoph Studor, Tom Goldstein
Abstract Adversarial patch attacks are among one of the most practical threat models against real-world computer vision systems. This paper studies certified and empirical defenses against patch attacks. We begin with a set of experiments showing that most existing defenses, which work by pre-processing input images to mitigate adversarial patches, are easily broken by simple white-box adversaries. Motivated by this finding, we propose the first certified defense against patch attacks, and propose faster methods for its training. Furthermore, we experiment with different patch shapes for testing, obtaining surprisingly good robustness transfer across shapes, and present preliminary results on certified defense against sparse attacks. Our complete implementation can be found on: https://github.com/Ping-C/certifiedpatchdefense.
Published 2020-03-14
URL https://arxiv.org/abs/2003.06693v1
PDF https://arxiv.org/pdf/2003.06693v1.pdf
Repo https://github.com/Ping-C/certifiedpatchdefense
Framework pytorch

#### Training Binary Neural Networks using the Bayesian Learning Rule

Title Training Binary Neural Networks using the Bayesian Learning Rule
Authors Xiangming Meng, Roman Bachmann, Mohammad Emtiyaz Khan
Abstract Neural networks with binary weights are computation-efficient and hardware-friendly, but their training is challenging because it involves a discrete optimization problem. Surprisingly, ignoring the discrete nature of the problem and using gradient-based methods, such as Straight-Through Estimator, still works well in practice. This raises the question: are there principled approaches which justify such methods? In this paper, we propose such an approach using the Bayesian learning rule. The rule, when applied to estimate a Bernoulli distribution over the binary weights, results in an algorithm which justifies some of the algorithmic choices made by the previous approaches. The algorithm not only obtains state-of-the-art performance, but also enables uncertainty estimation for continual learning to avoid catastrophic forgetting. Our work provides a principled approach for training binary neural networks which justifies and extends existing approaches.
Published 2020-02-25
URL https://arxiv.org/abs/2002.10778v2
PDF https://arxiv.org/pdf/2002.10778v2.pdf
PWC https://paperswithcode.com/paper/training-binary-neural-networks-using-the
Repo https://github.com/team-approx-bayes/BayesBiNN
Framework pytorch

Authors Ruqi Zhang, A. Feder Cooper, Christopher De Sa
Abstract Stochastic gradient Hamiltonian Monte Carlo (SGHMC) is an efficient method for sampling from continuous distributions. It is a faster alternative to HMC: instead of using the whole dataset at each iteration, SGHMC uses only a subsample. This improves performance, but introduces bias that can cause SGHMC to converge to the wrong distribution. One can prevent this using a step size that decays to zero, but such a step size schedule can drastically slow down convergence. To address this tension, we propose a novel second-order SG-MCMC algorithm—AMAGOLD—that infrequently uses Metropolis-Hastings (M-H) corrections to remove bias. The infrequency of corrections amortizes their cost. We prove AMAGOLD converges to the target distribution with a fixed, rather than a diminishing, step size, and that its convergence rate is at most a constant factor slower than a full-batch baseline. We empirically demonstrate AMAGOLD’s effectiveness on synthetic distributions, Bayesian logistic regression, and Bayesian neural networks.
Published 2020-02-29
URL https://arxiv.org/abs/2003.00193v1
PDF https://arxiv.org/pdf/2003.00193v1.pdf
Repo https://github.com/ruqizhang/amagold
Framework pytorch

#### Tidying Deep Saliency Prediction Architectures

Title Tidying Deep Saliency Prediction Architectures
Abstract Learning computational models for visual attention (saliency estimation) is an effort to inch machines/robots closer to human visual cognitive abilities. Data-driven efforts have dominated the landscape since the introduction of deep neural network architectures. In deep learning research, the choices in architecture design are often empirical and frequently lead to more complex models than necessary. The complexity, in turn, hinders the application requirements. In this paper, we identify four key components of saliency models, i.e., input features, multi-level integration, readout architecture, and loss functions. We review the existing state of the art models on these four components and propose novel and simpler alternatives. As a result, we propose two novel end-to-end architectures called SimpleNet and MDNSal, which are neater, minimal, more interpretable and achieve state of the art performance on public saliency benchmarks. SimpleNet is an optimized encoder-decoder architecture and brings notable performance gains on the SALICON dataset (the largest saliency benchmark). MDNSal is a parametric model that directly predicts parameters of a GMM distribution and is aimed to bring more interpretability to the prediction maps. The proposed saliency models can be inferred at 25fps, making them suitable for real-time applications. Code and pre-trained models are available at https://github.com/samyak0210/saliency.
Published 2020-03-10
URL https://arxiv.org/abs/2003.04942v1
PDF https://arxiv.org/pdf/2003.04942v1.pdf
PWC https://paperswithcode.com/paper/tidying-deep-saliency-prediction
Repo https://github.com/samyak0210/saliency
Framework pytorch

#### Directional Deep Embedding and Appearance Learning for Fast Video Object Segmentation

Title Directional Deep Embedding and Appearance Learning for Fast Video Object Segmentation
Authors Yingjie Yin, De Xu, Xingang Wang, Lei Zhang
Abstract Most recent semi-supervised video object segmentation (VOS) methods rely on fine-tuning deep convolutional neural networks online using the given mask of the first frame or predicted masks of subsequent frames. However, the online fine-tuning process is usually time-consuming, limiting the practical use of such methods. We propose a directional deep embedding and appearance learning (DDEAL) method, which is free of the online fine-tuning process, for fast VOS. First, a global directional matching module, which can be efficiently implemented by parallel convolutional operations, is proposed to learn a semantic pixel-wise embedding as an internal guidance. Second, an effective directional appearance model based statistics is proposed to represent the target and background on a spherical embedding space for VOS. Equipped with the global directional matching module and the directional appearance model learning module, DDEAL learns static cues from the labeled first frame and dynamically updates cues of the subsequent frames for object segmentation. Our method exhibits state-of-the-art VOS performance without using online fine-tuning. Specifically, it achieves a J & F mean score of 74.8% on DAVIS 2017 dataset and an overall score G of 71.3% on the large-scale YouTube-VOS dataset, while retaining a speed of 25 fps with a single NVIDIA TITAN Xp GPU. Furthermore, our faster version runs 31 fps with only a little accuracy loss. Our code and trained networks are available at https://github.com/YingjieYin/Directional-Deep-Embedding-and-Appearance-Learning-for-Fast-Video-Object-Segmentation.
Tasks Semantic Segmentation, Semi-supervised Video Object Segmentation, Video Object Segmentation, Video Semantic Segmentation
Published 2020-02-17
URL https://arxiv.org/abs/2002.06736v1
PDF https://arxiv.org/pdf/2002.06736v1.pdf
PWC https://paperswithcode.com/paper/directional-deep-embedding-and-appearance
Repo https://github.com/YingjieYin/Directional-Deep-Embedding-and-Appearance-Learning-for-Fast-Video-Object-Segmentation
Framework pytorch

#### Statistically Efficient, Polynomial Time Algorithms for Combinatorial Semi Bandits

Title Statistically Efficient, Polynomial Time Algorithms for Combinatorial Semi Bandits
Authors Thibaut Cuvelier, Richard Combes, Eric Gourdin
Abstract We consider combinatorial semi-bandits over a set of arms ${\cal X} \subset {0,1}^d$ where rewards are uncorrelated across items. For this problem, the algorithm ESCB yields the smallest known regret bound $R(T) = {\cal O}\Big( {d (\ln m)^2 (\ln T) \over \Delta_{\min} }\Big)$, but it has computational complexity ${\cal O}({\cal X})$ which is typically exponential in $d$, and cannot be used in large dimensions. We propose the first algorithm which is both computationally and statistically efficient for this problem with regret $R(T) = {\cal O} \Big({d (\ln m)^2 (\ln T)\over \Delta_{\min} }\Big)$ and computational complexity ${\cal O}(T {\bf poly}(d))$. Our approach involves carefully designing an approximate version of ESCB with the same regret guarantees, showing that this approximate algorithm can be implemented in time ${\cal O}(T {\bf poly}(d))$ by repeatedly maximizing a linear function over ${\cal X}$ subject to a linear budget constraint, and showing how to solve this maximization problems efficiently.
Published 2020-02-17
URL https://arxiv.org/abs/2002.07258v1
PDF https://arxiv.org/pdf/2002.07258v1.pdf
PWC https://paperswithcode.com/paper/statistically-efficient-polynomial-time
Repo https://github.com/dourouc05/CombinatorialBandits.jl
Framework none

#### Convolutional Kernel Networks for Graph-Structured Data

Title Convolutional Kernel Networks for Graph-Structured Data
Authors Dexiong Chen, Laurent Jacob, Julien Mairal
Abstract We introduce a family of multilayer graph kernels and establish new links between graph convolutional neural networks and kernel methods. Our approach generalizes convolutional kernel networks to graph-structured data, by representing graphs as a sequence of kernel feature maps, where each node carries information about local graph substructures. On the one hand, the kernel point of view offers an unsupervised, expressive, and easy-to-regularize data representation, which is useful when limited samples are available. On the other hand, our model can also be trained end-to-end on large-scale data, leading to new types of graph convolutional neural networks. We show that our method achieves competitive performance on several graph classification benchmarks, while offering simple model interpretation. Our code is freely available at https://github.com/claying/GCKN.
Published 2020-03-11
URL https://arxiv.org/abs/2003.05189v1
PDF https://arxiv.org/pdf/2003.05189v1.pdf
PWC https://paperswithcode.com/paper/convolutional-kernel-networks-for-graph
Repo https://github.com/claying/GCKN
Framework pytorch

#### COVID-CT-Dataset: A CT Scan Dataset about COVID-19

Title COVID-CT-Dataset: A CT Scan Dataset about COVID-19
Authors Jinyu Zhao, Yichen Zhang, Xuehai He, Pengtao Xie
Abstract CT scans are promising in providing accurate, fast, and cheap screening and testing of COVID-19. In this paper, we build a publicly available COVID-CT dataset, containing 275 CT scans that are positive for COVID-19, to foster the research and development of deep learning methods which predict whether a person is affected with COVID-19 by analyzing his/her CTs. We train a deep convolutional neural network on this dataset and achieve an F1 of 0.85 which is a promising performance but yet to be further improved. The data and code are available at https://github.com/UCSD-AI4H/COVID-CT
Published 2020-03-30
URL https://arxiv.org/abs/2003.13865v1
PDF https://arxiv.org/pdf/2003.13865v1.pdf
Repo https://github.com/UCSD-AI4H/COVID-CT
Framework pytorch

#### Flows for simultaneous manifold learning and density estimation

Title Flows for simultaneous manifold learning and density estimation
Authors Johann Brehmer, Kyle Cranmer
Abstract We introduce manifold-modeling flows (MFMFs), a new class of generative models that simultaneously learn the data manifold as well as a tractable probability density on that manifold. Combining aspects of normalizing flows, GANs, autoencoders, and energy-based models, they have the potential to represent data sets with a manifold structure more faithfully and provide handles on dimensionality reduction, denoising, and out-of-distribution detection. We argue why such models should not be trained by maximum likelihood alone and present a new training algorithm that separates manifold and density updates. With two pedagogical examples we demonstrate how manifold-modeling flows let us learn the data manifold and allow for better inference than standard flows in the ambient data space.
Tasks Denoising, Density Estimation, Dimensionality Reduction, Out-of-Distribution Detection
Published 2020-03-31
URL https://arxiv.org/abs/2003.13913v1
PDF https://arxiv.org/pdf/2003.13913v1.pdf
PWC https://paperswithcode.com/paper/flows-for-simultaneous-manifold-learning-and
Repo https://github.com/johannbrehmer/manifold-flow
Framework pytorch

#### Unsupervised Neural Universal Denoiser for Finite-Input General-Output Noisy Channel

Title Unsupervised Neural Universal Denoiser for Finite-Input General-Output Noisy Channel
Authors Tae-Eon Park, Taesup Moon
Abstract We devise a novel neural network-based universal denoiser for the finite-input, general-output (FIGO) channel. Based on the assumption of known noisy channel densities, which is realistic in many practical scenarios, we train the network such that it can denoise as well as the best sliding window denoiser for any given underlying clean source data. Our algorithm, dubbed as Generalized CUDE (Gen-CUDE), enjoys several desirable properties; it can be trained in an unsupervised manner (solely based on the noisy observation data), has much smaller computational complexity compared to the previously developed universal denoiser for the same setting, and has much tighter upper bound on the denoising performance, which is obtained by a theoretical analysis. In our experiments, we show such tighter upper bound is also realized in practice by showing that Gen-CUDE achieves much better denoising results compared to other strong baselines for both synthetic and real underlying clean sequences.
Published 2020-03-05
URL https://arxiv.org/abs/2003.02623v1
PDF https://arxiv.org/pdf/2003.02623v1.pdf
PWC https://paperswithcode.com/paper/unsupervised-neural-universal-denoiser-for
Repo https://github.com/pte1236/Gen-CUDE
Framework none

#### Automated Augmented Conjugate Inference for Non-conjugate Gaussian Process Models

Title Automated Augmented Conjugate Inference for Non-conjugate Gaussian Process Models
Authors Théo Galy-Fajou, Florian Wenzel, Manfred Opper
Abstract We propose automated augmented conjugate inference, a new inference method for non-conjugate Gaussian processes (GP) models. Our method automatically constructs an auxiliary variable augmentation that renders the GP model conditionally conjugate. Building on the conjugate structure of the augmented model, we develop two inference methods. First, a fast and scalable stochastic variational inference method that uses efficient block coordinate ascent updates, which are computed in closed form. Second, an asymptotically correct Gibbs sampler that is useful for small datasets. Our experiments show that our method are up two orders of magnitude faster and more robust than existing state-of-the-art black-box methods.
Published 2020-02-26
URL https://arxiv.org/abs/2002.11451v1
PDF https://arxiv.org/pdf/2002.11451v1.pdf
PWC https://paperswithcode.com/paper/automated-augmented-conjugate-inference-for
Repo https://github.com/theogf/AutoConjGP_Exp
Framework tf

#### Bulbar ALS Detection Based on Analysis of Voice Perturbation and Vibrato

Title Bulbar ALS Detection Based on Analysis of Voice Perturbation and Vibrato
Authors Maxim Vashkevich, Alexander Petrovsky, Yuliya Rushkevich
Abstract On average the lack of biological markers causes a one year diagnostic delay to detect amyotrophic lateral sclerosis (ALS). To improve the diagnostic process an automatic voice assessment based on acoustic analysis can be used. The purpose of this work was to verify the sutability of the sustain vowel phonation test for automatic detection of patients with ALS. We proposed enhanced procedure for separation of voice signal into fundamental periods that requires for calculation of perturbation measurements (such as jitter and shimmer). Also we proposed method for quantitative assessment of pathological vibrato manifestations in sustain vowel phonation. The study’s experiments show that using the proposed acoustic analysis methods, the classifier based on linear discriminant analysis attains 90.7% accuracy with 86.7% sensitivity and 92.2% specificity.
Published 2020-03-24
URL https://arxiv.org/abs/2003.10806v1
PDF https://arxiv.org/pdf/2003.10806v1.pdf
PWC https://paperswithcode.com/paper/bulbar-als-detection-based-on-analysis-of
Repo https://github.com/Mak-Sim/Troparion
Framework none

#### On the Effectiveness of Mitigating Data Poisoning Attacks with Gradient Shaping

Title On the Effectiveness of Mitigating Data Poisoning Attacks with Gradient Shaping
Authors Sanghyun Hong, Varun Chandrasekaran, Yiğitcan Kaya, Tudor Dumitraş, Nicolas Papernot
Abstract Machine learning algorithms are vulnerable to data poisoning attacks. Prior taxonomies that focus on specific scenarios, e.g., indiscriminate or targeted, have enabled defenses for the corresponding subset of known attacks. Yet, this introduces an inevitable arms race between adversaries and defenders. In this work, we study the feasibility of an attack-agnostic defense relying on artifacts that are common to all poisoning attacks. Specifically, we focus on a common element between all attacks: they modify gradients computed to train the model. We identify two main artifacts of gradients computed in the presence of poison: (1) their $\ell_2$ norms have significantly higher magnitudes than those of clean gradients, and (2) their orientation differs from clean gradients. Based on these observations, we propose the prerequisite for a generic poisoning defense: it must bound gradient magnitudes and minimize differences in orientation. We call this gradient shaping. As an exemplar tool to evaluate the feasibility of gradient shaping, we use differentially private stochastic gradient descent (DP-SGD), which clips and perturbs individual gradients during training to obtain privacy guarantees. We find that DP-SGD, even in configurations that do not result in meaningful privacy guarantees, increases the model’s robustness to indiscriminate attacks. It also mitigates worst-case targeted attacks and increases the adversary’s cost in multi-poison scenarios. The only attack we find DP-SGD to be ineffective against is a strong, yet unrealistic, indiscriminate attack. Our results suggest that, while we currently lack a generic poisoning defense, gradient shaping is a promising direction for future research.
Published 2020-02-26
URL https://arxiv.org/abs/2002.11497v2
PDF https://arxiv.org/pdf/2002.11497v2.pdf
PWC https://paperswithcode.com/paper/on-the-effectiveness-of-mitigating-data
Framework jax

#### What is Normal, What is Strange, and What is Missing in a Knowledge Graph: Unified Characterization via Inductive Summarization

Title What is Normal, What is Strange, and What is Missing in a Knowledge Graph: Unified Characterization via Inductive Summarization
Authors Caleb Belth, Xinyi Zheng, Jilles Vreeken, Danai Koutra
Abstract Knowledge graphs (KGs) store highly heterogeneous information about the world in the structure of a graph, and are useful for tasks such as question answering and reasoning. However, they often contain errors and are missing information. Vibrant research in KG refinement has worked to resolve these issues, tailoring techniques to either detect specific types of errors or complete a KG. In this work, we introduce a unified solution to KG characterization by formulating the problem as unsupervised KG summarization with a set of inductive, soft rules, which describe what is normal in a KG, and thus can be used to identify what is abnormal, whether it be strange or missing. Unlike first-order logic rules, our rules are labeled, rooted graphs, i.e., patterns that describe the expected neighborhood around a (seen or unseen) node, based on its type, and information in the KG. Stepping away from the traditional support/confidence-based rule mining techniques, we propose KGist, Knowledge Graph Inductive SummarizaTion, which learns a summary of inductive rules that best compress the KG according to the Minimum Description Length principle—a formulation that we are the first to use in the context of KG rule mining. We apply our rules to three large KGs (NELL, DBpedia, and Yago), and tasks such as compression, various types of error detection, and identification of incomplete information. We show that KGist outperforms task-specific, supervised and unsupervised baselines in error detection and incompleteness identification, (identifying the location of up to 93% of missing entities—over 10% more than baselines), while also being efficient for large knowledge graphs.