April 3, 2020

# Paper Group AWR 3

#### TFApprox: Towards a Fast Emulation of DNN Approximate Hardware Accelerators on GPU

Title TFApprox: Towards a Fast Emulation of DNN Approximate Hardware Accelerators on GPU
Authors Filip Vaverka, Vojtech Mrazek, Zdenek Vasicek, Lukas Sekanina
Abstract Energy efficiency of hardware accelerators of deep neural networks (DNN) can be improved by introducing approximate arithmetic circuits. In order to quantify the error introduced by using these circuits and avoid the expensive hardware prototyping, a software emulator of the DNN accelerator is usually executed on CPU or GPU. However, this emulation is typically two or three orders of magnitude slower than a software DNN implementation running on CPU or GPU and operating with standard floating point arithmetic instructions and common DNN libraries. The reason is that there is no hardware support for approximate arithmetic operations on common CPUs and GPUs and these operations have to be expensively emulated. In order to address this issue, we propose an efficient emulation method for approximate circuits utilized in a given DNN accelerator which is emulated on GPU. All relevant approximate circuits are implemented as look-up tables and accessed through a texture memory mechanism of CUDA capable GPUs. We exploit the fact that the texture memory is optimized for irregular read-only access and in some GPU architectures is even implemented as a dedicated cache. This technique allowed us to reduce the inference time of the emulated DNN accelerator approximately 200 times with respect to an optimized CPU version on complex DNNs such as ResNet. The proposed approach extends the TensorFlow library and is available online at https://github.com/ehw-fit/tf-approximate.
Published 2020-02-21
URL https://arxiv.org/abs/2002.09481v1
PDF https://arxiv.org/pdf/2002.09481v1.pdf
PWC https://paperswithcode.com/paper/tfapprox-towards-a-fast-emulation-of-dnn
Repo https://github.com/ehw-fit/tf-approximate
Framework tf
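
The look-up-table idea in the abstract above can be illustrated with a minimal CPU-side sketch. It is not the TFApprox implementation: `approx_mul8` is a hypothetical approximate multiplier invented for this example, and an ordinary Python list stands in for the CUDA texture memory that the authors use.

```python
# Sketch only: approx_mul8 is a made-up approximate multiplier, not a circuit
# from the paper, and a Python list stands in for GPU texture memory.

def approx_mul8(a: int, b: int) -> int:
    """Hypothetical approximate 8-bit multiplier: ignores the two low bits of each operand."""
    return ((a >> 2) * (b >> 2)) << 4


def build_lut(mul) -> list:
    """Tabulate a two-input 8-bit circuit once: 256 x 256 entries."""
    return [[mul(a, b) for b in range(256)] for a in range(256)]


def lut_dot(lut, xs, ws) -> int:
    """Dot product in which every multiplication is a table look-up."""
    return sum(lut[x][w] for x, w in zip(xs, ws))


LUT = build_lut(approx_mul8)
xs = [12, 200, 37]   # 8-bit activations (illustrative values)
ws = [5, 17, 250]    # 8-bit weights (illustrative values)
exact = sum(x * w for x, w in zip(xs, ws))
approx = lut_dot(LUT, xs, ws)
print(exact, approx, exact - approx)  # exact result, emulated result, error
```

Because the table is built once and then only read, any two-input 8-bit circuit can be swapped in by passing a different function to `build_lut`; the inner loop does not change.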

#### Detecting Patch Adversarial Attacks with Image Residuals

Title Detecting Patch Adversarial Attacks with Image Residuals
Authors Marius Arvinte, Ahmed Tewfik, Sriram Vishwanath
Abstract We introduce an adversarial sample detection algorithm based on image residuals, specifically designed to guard against patch-based attacks. The image residual is obtained as the difference between an input image and a denoised version of it, and a discriminator is trained to distinguish between clean and adversarial samples. More precisely, we use a wavelet domain algorithm for denoising images and demonstrate that the obtained residuals act as a digital fingerprint for adversarial attacks. To emulate the limitations of a physical adversary, we evaluate the performance of our approach against localized (patch-based) adversarial attacks, including in settings where the adversary has complete knowledge about the detection scheme. Our results show that the proposed detection method generalizes to previously unseen, stronger attacks and that it is able to reduce the success rate (conversely, increase the computational effort) of an adaptive attacker.
Published 2020-02-28
URL https://arxiv.org/abs/2002.12504v2
PDF https://arxiv.org/pdf/2002.12504v2.pdf
Repo https://github.com/mariusarvinte/wavelet-patch-detection
Framework tf
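
The image residual described in the abstract above (input minus its denoised version) can be sketched in a few lines. This is an illustration under a stated substitution: the paper denoises in the wavelet domain, whereas the sketch uses a plain 3x3 mean filter as a stand-in denoiser, and all function names are hypothetical.

```python
# Sketch only: a 3x3 mean filter replaces the paper's wavelet-domain denoiser.

def mean_filter3(img):
    """Denoise a 2-D list of floats by averaging each pixel with its neighbours."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = sum(vals) / len(vals)
    return out


def residual(img, denoise=mean_filter3):
    """Residual = input image minus its denoised version."""
    den = denoise(img)
    return [[img[y][x] - den[y][x] for x in range(len(img[0]))]
            for y in range(len(img))]


flat = [[0.5] * 5 for _ in range(5)]      # smooth image: residual is zero
patched = [[0.0] * 5 for _ in range(5)]
patched[2][2] = 1.0                       # a one-pixel "patch" for illustration
flat_res = residual(flat)
patch_res = residual(patched)
```

In the setting of the abstract, residuals such as `patch_res` would be the input to a trained discriminator; that classifier is outside the scope of this sketch.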

#### Limitations of weak labels for embedding and tagging

Title Limitations of weak labels for embedding and tagging
Authors Nicolas Turpault, Romain Serizel, Emmanuel Vincent
Abstract While many datasets and approaches in ambient sound analysis use weakly labeled data, the impact of weak labels on the performance in comparison to strong labels remains unclear. Indeed, weakly labeled data is usually used because it is too expensive to annotate every data with a strong label and for some use cases strong labels are not sure to give better results. Moreover, weak labels are usually mixed with various other challenges like multilabels, unbalanced classes, overlapping events. In this paper, we formulate a supervised problem which involves weak labels. We create a dataset that focuses on difference between strong and weak labels. We investigate the impact of weak labels when training an embedding or an end-to-end classifier. Different experimental scenarios are discussed to give insights into which type of applications are most sensitive to weakly labeled data.
Published 2020-02-05
URL https://arxiv.org/abs/2002.01687v2
PDF https://arxiv.org/pdf/2002.01687v2.pdf
PWC https://paperswithcode.com/paper/limitations-of-weak-labels-for-embedding-and
Repo https://github.com/turpaultn/walle
Framework none

#### Manifold Regularization for Adversarial Robustness

Title Manifold Regularization for Adversarial Robustness
Authors Charles Jin, Martin Rinard
Abstract Manifold regularization is a technique that penalizes the complexity of learned functions over the intrinsic geometry of input data. We develop a connection to learning functions which are “locally stable”, and propose new regularization terms for training deep neural networks that are stable against a class of local perturbations. These regularizers enable us to train a network to state-of-the-art robust accuracy of 70% on CIFAR-10 against a PGD adversary using $\ell_\infty$ perturbations of size $\epsilon = 8/255$. Furthermore, our techniques do not rely on the construction of any adversarial examples, thus running orders of magnitude faster than standard algorithms for adversarial training.
Published 2020-03-09
URL https://arxiv.org/abs/2003.04286v1
PDF https://arxiv.org/pdf/2003.04286v1.pdf
Framework pytorch

#### Unified Image and Video Saliency Modeling

Title Unified Image and Video Saliency Modeling
Authors Richard Droste, Jianbo Jiao, J. Alison Noble
Abstract Visual saliency modeling for images and videos is treated as two independent tasks in recent computer vision literature. On the one hand, image saliency modeling is a well-studied problem and progress on benchmarks like SALICON and MIT300 is slowing. For video saliency prediction on the other hand, rapid gains have been achieved on the recent DHF1K benchmark through network architectures that are optimized for this task. Here, we take a step back and ask: Can image and video saliency modeling be approached via a unified model, with mutual benefit? We find that it is crucial to model the domain shift between image and video saliency data and between different video saliency datasets for effective joint modeling. We identify different sources of domain shift and address them through four novel domain adaptation techniques - Domain-Adaptive Priors, Domain-Adaptive Fusion, Domain-Adaptive Smoothing and Bypass-RNN - in addition to an improved formulation of learned Gaussian priors. We integrate these techniques into a simple and lightweight encoder-RNN-decoder-style network, UNISAL, and train the entire network simultaneously with image and video saliency data. We evaluate our method on the video saliency datasets DHF1K, Hollywood-2 and UCF-Sports, as well as the image saliency datasets SALICON and MIT300. With one set of parameters, our method achieves state-of-the-art performance on all video saliency datasets and is on par with the state-of-the-art for image saliency prediction, despite a 5 to 20-fold reduction in model size and the fastest runtime among all competing deep models. We provide retrospective analyses and ablation studies which demonstrate the importance of the domain shift modeling. The code is available at https://github.com/rdroste/unisal.
Published 2020-03-11
URL https://arxiv.org/abs/2003.05477v1
PDF https://arxiv.org/pdf/2003.05477v1.pdf
PWC https://paperswithcode.com/paper/unified-image-and-video-saliency-modeling
Repo https://github.com/rdroste/unisal
Framework pytorch

#### Joint Geographical and Temporal Modeling based on Matrix Factorization for Point-of-Interest Recommendation

Title Joint Geographical and Temporal Modeling based on Matrix Factorization for Point-of-Interest Recommendation
Abstract With the popularity of Location-based Social Networks, Point-of-Interest (POI) recommendation has become an important task, which learns the users’ preferences and mobility patterns to recommend POIs. Previous studies show that incorporating contextual information such as geographical and temporal influences is necessary to improve POI recommendation by addressing the data sparsity problem. However, existing methods model the geographical influence based on the physical distance between POIs and users, while ignoring the temporal characteristics of such geographical influences. In this paper, we perform a study on the user mobility patterns where we find out that users’ check-ins happen around several centers depending on their current temporal state. Next, we propose a spatio-temporal activity-centers algorithm to model users’ behavior more accurately. Finally, we demonstrate the effectiveness of our proposed contextual model by incorporating it into the matrix factorization model under two different settings: i) static and ii) temporal. To show the effectiveness of our proposed method, which we refer to as STACP, we conduct experiments on two well-known real-world datasets acquired from Gowalla and Foursquare LBSNs. Experimental results show that the STACP model achieves a statistically significant performance improvement, compared to the state-of-the-art techniques. Also, we demonstrate the effectiveness of capturing geographical and temporal information for modeling users’ activity centers and the importance of modeling them jointly.
Published 2020-01-24
URL https://arxiv.org/abs/2001.08961v1
PDF https://arxiv.org/pdf/2001.08961v1.pdf
PWC https://paperswithcode.com/paper/joint-geographical-and-temporal-modeling
Repo https://github.com/rahmanidashti/STACP
Framework none

#### DHP: Differentiable Meta Pruning via HyperNetworks

Title DHP: Differentiable Meta Pruning via HyperNetworks
Authors Yawei Li, Shuhang Gu, Kai Zhang, Luc Van Gool, Radu Timofte
Abstract Network pruning has been the driving force for the efficient inference of neural networks and the alleviation of model storage and transmission burden. Traditional network pruning methods focus on the per-filter influence on the network accuracy by analyzing the filter distribution. With the advent of AutoML and neural architecture search (NAS), pruning has become topical with automatic mechanism and searching based architecture optimization. However, current automatic designs rely on either reinforcement learning or evolutionary algorithm, which often do not have a theoretical convergence guarantee or do not converge in a meaningful time limit. In this paper, we propose a differentiable pruning method via hypernetworks for automatic network pruning and layer-wise configuration optimization. A hypernetwork is designed to generate the weights of the backbone network. The input of the hypernetwork, namely, the latent vectors control the output channels of the layers of backbone network. By applying $\ell_1$ sparsity regularization to the latent vectors and utilizing proximal gradient, sparse latent vectors can be obtained with removed zero elements. Thus, the corresponding elements of the hypernetwork outputs can also be removed, achieving the effect of network pruning. The latent vectors of all the layers are pruned together, resulting in an automatic layer configuration. Extensive experiments are conducted on various networks for image classification, single image super-resolution, and denoising. And the experimental results validate the proposed method.
Tasks AutoML, Denoising, Image Classification, Image Super-Resolution, Network Pruning, Neural Architecture Search, Super-Resolution
Published 2020-03-30
URL https://arxiv.org/abs/2003.13683v1
PDF https://arxiv.org/pdf/2003.13683v1.pdf
PWC https://paperswithcode.com/paper/dhp-differentiable-meta-pruning-via
Repo https://github.com/ofsoundof/dhp
Framework none
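
The abstract above combines $\ell_1$ regularization with a proximal gradient step. For an $\ell_1$ penalty the proximal operator is the standard soft-thresholding function, which sets small entries exactly to zero. The following is a minimal sketch of one such update on a latent vector; the gradient values, learning rate, and penalty weight are illustrative assumptions, and the function names are not taken from the DHP code.

```python
# Sketch only: one proximal-gradient update with an l1 penalty.
# Gradients, learning rate and lambda below are made-up illustrative values.

def soft_threshold(z: float, t: float) -> float:
    """Proximal operator of t * |z|: shrink towards zero, clamp small values to 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def proximal_step(latent, grad, lr, lam):
    """Gradient step on the smooth loss, then soft-thresholding with threshold lr * lam."""
    return [soft_threshold(z - lr * g, lr * lam) for z, g in zip(latent, grad)]


def kept_channels(latent):
    """Indices of latent entries that survived, i.e. channels that are not pruned."""
    return [i for i, z in enumerate(latent) if z != 0.0]


latent = [1.0, 0.05, -0.5, 0.08]
grad = [0.0, 0.0, 0.0, 0.0]       # placeholder gradient of the task loss
updated = proximal_step(latent, grad, lr=0.1, lam=1.0)
print(updated, kept_channels(updated))
```

Entries whose magnitude falls below `lr * lam` become exactly zero, which is what allows the corresponding hypernetwork outputs, and hence channels, to be removed.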

#### Increasing the robustness of DNNs against image corruptions by playing the Game of Noise

Title Increasing the robustness of DNNs against image corruptions by playing the Game of Noise
Authors Evgenia Rusak, Lukas Schott, Roland S. Zimmermann, Julian Bitterwolf, Oliver Bringmann, Matthias Bethge, Wieland Brendel
Abstract The human visual system is remarkably robust against a wide range of naturally occurring variations and corruptions like rain or snow. In contrast, the performance of modern image recognition models strongly degrades when evaluated on previously unseen corruptions. Here, we demonstrate that a simple but properly tuned training with additive Gaussian and Speckle noise generalizes surprisingly well to unseen corruptions, easily reaching the previous state of the art on the corruption benchmark ImageNet-C (with ResNet50) and on MNIST-C. We build on top of these strong baseline results and show that an adversarial training of the recognition model against uncorrelated worst-case noise distributions leads to an additional increase in performance. This regularization can be combined with previously proposed defense methods for further improvement.
Published 2020-01-16
URL https://arxiv.org/abs/2001.06057v3
PDF https://arxiv.org/pdf/2001.06057v3.pdf
PWC https://paperswithcode.com/paper/increasing-the-robustness-of-dnns-against
Repo https://github.com/hendrycks/robustness
Framework pytorch
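
The two noise types named in the abstract above can be sketched as simple data augmentations. The sketch assumes pixel values in [0, 1], models speckle noise as multiplicative Gaussian noise (`p + p * n`), and uses an arbitrary `sigma`; the tuned noise levels and the adversarial noise training from the paper are not reproduced here.

```python
# Sketch only: sigma = 0.1 and the [0, 1] pixel range are assumptions.
import random


def clip01(v: float) -> float:
    """Keep a pixel value inside the valid [0, 1] range."""
    return min(1.0, max(0.0, v))


def gaussian_noise(pixels, sigma, rng):
    """Additive Gaussian noise: p + n with n ~ N(0, sigma^2)."""
    return [clip01(p + rng.gauss(0.0, sigma)) for p in pixels]


def speckle_noise(pixels, sigma, rng):
    """Speckle noise: p + p * n, so the perturbation scales with intensity."""
    return [clip01(p + p * rng.gauss(0.0, sigma)) for p in pixels]


rng = random.Random(0)              # seeded for repeatability
pixels = [0.0, 0.25, 0.5, 1.0]      # a flattened toy image
noisy_g = gaussian_noise(pixels, 0.1, rng)
noisy_s = speckle_noise(pixels, 0.1, rng)
```

Note that speckle noise leaves black pixels untouched, whereas additive Gaussian noise perturbs every pixel regardless of intensity.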

#### f-BRS: Rethinking Backpropagating Refinement for Interactive Segmentation

Title f-BRS: Rethinking Backpropagating Refinement for Interactive Segmentation
Authors Konstantin Sofiiuk, Ilia Petrov, Olga Barinova, Anton Konushin
Abstract Deep neural networks have become a mainstream approach to interactive segmentation. As we show in our experiments, while for some images a trained network provides accurate segmentation result with just a few clicks, for some unknown objects it cannot achieve satisfactory result even with a large amount of user input. Recently proposed backpropagating refinement (BRS) scheme introduces an optimization problem for interactive segmentation that results in significantly better performance for the hard cases. At the same time, BRS requires running forward and backward pass through a deep network several times that leads to significantly increased computational budget per click compared to other methods. We propose f-BRS (feature backpropagating refinement scheme) that solves an optimization problem with respect to auxiliary variables instead of the network inputs, and requires running forward and backward pass just for a small part of a network. Experiments on GrabCut, Berkeley, DAVIS and SBD datasets set new state-of-the-art at an order of magnitude lower time per click compared to original BRS. The code and trained models are available at https://github.com/saic-vul/fbrs_interactive_segmentation .
Published 2020-01-28
URL https://arxiv.org/abs/2001.10331v1
PDF https://arxiv.org/pdf/2001.10331v1.pdf
PWC https://paperswithcode.com/paper/f-brs-rethinking-backpropagating-refinement
Repo https://github.com/saic-vul/fbrs_interactive_segmentation
Framework pytorch

#### A Benchmark for Point Clouds Registration Algorithms

Title A Benchmark for Point Clouds Registration Algorithms
Authors Simone Fontana, Daniele Cattaneo, Augusto Luis Ballardini, Matteo Vaghi, Domenico Giorgio Sorrenti
Abstract Point clouds registration is a fundamental step of many point clouds processing pipelines; however, most algorithms are tested on data collected ad-hoc and not shared with the research community. These data often cover only a very limited set of use cases; therefore, the results cannot be generalised. Public datasets proposed until now, taken individually, cover only a few kinds of environment and mostly a single sensor. For these reasons, we developed a benchmark, for localization and mapping applications, using multiple publicly available datasets. In this way, we have been able to cover many kinds of environments and many kinds of sensor that can produce point clouds. Furthermore, the ground truth has been thoroughly inspected and evaluated to ensure its quality. For some of the datasets, the accuracy of the ground truth system was not reported by the original authors, therefore we estimated it with our own novel method, based on an iterative registration algorithm. Along with the data, we provide a broad set of registration problems, chosen to cover different types of initial misalignment, various degrees of overlap, and different kinds of registration problems. Lastly, we propose a metric to measure the performances of registration algorithms: it combines the commonly used rotation and translation errors together, to allow an objective comparison of the alignments. This work aims at encouraging authors to use a public and shared benchmark, instead of data collected ad-hoc, to ensure objectivity and repeatability, two fundamental characteristics in any scientific field.
Published 2020-03-28
URL https://arxiv.org/abs/2003.12841v1
PDF https://arxiv.org/pdf/2003.12841v1.pdf
PWC https://paperswithcode.com/paper/a-benchmark-for-point-clouds-registration
Repo https://github.com/iralabdisco/point_clouds_registration_benchmark
Framework none

#### Salvaging Federated Learning by Local Adaptation

Title Salvaging Federated Learning by Local Adaptation
Authors Tao Yu, Eugene Bagdasaryan, Vitaly Shmatikov
Abstract Federated learning (FL) is a heavily promoted approach for training ML models on sensitive data, e.g., text typed by users on their smartphones. FL is expressly designed for training on data that are unbalanced and non-iid across the participants. To ensure privacy and integrity of the federated model, latest FL approaches use differential privacy or robust aggregation to limit the influence of “outlier” participants. First, we show that on standard tasks such as next-word prediction, many participants gain no benefit from FL because the federated model is less accurate on their data than the models they can train locally on their own. Second, we show that differential privacy and robust aggregation make this problem worse by further destroying the accuracy of the federated model for many participants. Then, we evaluate three techniques for local adaptation of federated models: fine-tuning, multi-task learning, and knowledge distillation. We analyze where each technique is applicable and demonstrate that all participants benefit from local adaptation. Participants whose local models are poor obtain big accuracy improvements over conventional FL. Participants whose local models are better than the federated model and who have no incentive to participate in FL today improve less, but sufficiently to make the adapted federated model better than their local models.
Published 2020-02-12
URL https://arxiv.org/abs/2002.04758v1
PDF https://arxiv.org/pdf/2002.04758v1.pdf
PWC https://paperswithcode.com/paper/salvaging-federated-learning-by-local
Framework pytorch

#### A Physics-based Noise Formation Model for Extreme Low-light Raw Denoising

Title A Physics-based Noise Formation Model for Extreme Low-light Raw Denoising
Authors Kaixuan Wei, Ying Fu, Jiaolong Yang, Hua Huang
Abstract Lacking rich and realistic data, learned single image denoising algorithms generalize poorly to real raw images that do not resemble the data used for training. Although the problem can be alleviated by the heteroscedastic Gaussian model for noise synthesis, the noise sources caused by digital camera electronics are still largely overlooked, despite their significant effect on raw measurement, especially under extremely low-light condition. To address this issue, we present a highly accurate noise formation model based on the characteristics of CMOS photosensors, thereby enabling us to synthesize realistic samples that better match the physics of image formation process. Given the proposed noise model, we additionally propose a method to calibrate the noise parameters for available modern digital cameras, which is simple and reproducible for any new device. We systematically study the generalizability of a neural network trained with existing schemes, by introducing a new low-light denoising dataset that covers many modern digital cameras from diverse brands. Extensive empirical results collectively show that by utilizing our proposed noise formation model, a network can reach the capability as if it had been trained with rich real data, which demonstrates the effectiveness of our noise formation model.
Published 2020-03-28
URL https://arxiv.org/abs/2003.12751v1
PDF https://arxiv.org/pdf/2003.12751v1.pdf
PWC https://paperswithcode.com/paper/a-physics-based-noise-formation-model-for
Repo https://github.com/Vandermode/NoiseModel
Framework none

#### Real-Time Detection of Dictionary DGA Network Traffic using Deep Learning

Title Real-Time Detection of Dictionary DGA Network Traffic using Deep Learning
Authors Kate Highnam, Domenic Puzio, Song Luo, Nicholas R. Jennings
Abstract Botnets and malware continue to avoid detection by static rules engines when using domain generation algorithms (DGAs) for callouts to unique, dynamically generated web addresses. Common DGA detection techniques fail to reliably detect DGA variants that combine random dictionary words to create domain names that closely mirror legitimate domains. To combat this, we created a novel hybrid neural network, Bilbo the bagging model, that analyses domains and scores the likelihood they are generated by such algorithms and therefore are potentially malicious. Bilbo is the first parallel usage of a convolutional neural network (CNN) and a long short-term memory (LSTM) network for DGA detection. Our unique architecture is found to be the most consistent in performance in terms of AUC, F1 score, and accuracy when generalising across different dictionary DGA classification tasks compared to current state-of-the-art deep learning architectures. We validate using reverse-engineered dictionary DGA domains and detail our real-time implementation strategy for scoring real-world network logs within a large financial enterprise. In four hours of actual network traffic, the model discovered at least five potential command-and-control networks that commercial vendor tools did not flag.
Published 2020-03-28
URL https://arxiv.org/abs/2003.12805v1
PDF https://arxiv.org/pdf/2003.12805v1.pdf
PWC https://paperswithcode.com/paper/real-time-detection-of-dictionary-dga-network
Repo https://github.com/jinxmirror13/bilbo-bagging-hybrid
Framework none

#### Weakly-Supervised Action Localization by Generative Attention Modeling

Title Weakly-Supervised Action Localization by Generative Attention Modeling
Authors Baifeng Shi, Qi Dai, Yadong Mu, Jingdong Wang
Abstract Weakly-supervised temporal action localization is a problem of learning an action localization model with only video-level action labeling available. The general framework largely relies on the classification activation, which employs an attention model to identify the action-related frames and then categorizes them into different classes. Such a method results in the action-context confusion issue: context frames near action clips tend to be recognized as action frames themselves, since they are closely related to the specific classes. To solve the problem, in this paper we propose to model the class-agnostic frame-wise probability conditioned on the frame attention using a conditional Variational Auto-Encoder (VAE). With the observation that the context exhibits a notable difference from the action at the representation level, a probabilistic model, i.e., conditional VAE, is learned to model the likelihood of each frame given the attention. By maximizing the conditional probability with respect to the attention, the action and non-action frames are well separated. Experiments on THUMOS14 and ActivityNet1.2 demonstrate the advantage of our method and its effectiveness in handling the action-context confusion problem. Code is now available on GitHub.
Tasks Action Localization, Temporal Action Localization, Weakly Supervised Action Localization, Weakly-supervised Temporal Action Localization
Published 2020-03-27
URL https://arxiv.org/abs/2003.12424v2
PDF https://arxiv.org/pdf/2003.12424v2.pdf
PWC https://paperswithcode.com/paper/weakly-supervised-action-localization-by-2
Repo https://github.com/bfshi/DGAM-Weakly-Supervised-Action-Localization
Framework pytorch

#### X-Stance: A Multilingual Multi-Target Dataset for Stance Detection

Title X-Stance: A Multilingual Multi-Target Dataset for Stance Detection
Authors Jannis Vamvas, Rico Sennrich
Abstract We extract a large-scale stance detection dataset from comments written by candidates of elections in Switzerland. The dataset consists of German, French and Italian text, allowing for a cross-lingual evaluation of stance detection. It contains 67 000 comments on more than 150 political issues (targets). Unlike stance detection models that have specific target issues, we use the dataset to train a single model on all the issues. To make learning across targets possible, we prepend to each instance a natural question that represents the target (e.g. “Do you support X?”). Baseline results from multilingual BERT show that zero-shot cross-lingual and cross-target transfer of stance detection is moderately successful with this approach.
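
The question-prepending step in the abstract above amounts to turning each comment into a (question, comment) pair before it reaches the model. A minimal sketch follows; the example question and comment are invented, and the ` [SEP] ` separator is an assumption about how a BERT-style tokenizer would join the two segments, not a detail taken from the dataset.

```python
# Sketch only: example texts and the separator string are assumptions.

def make_instance(question: str, comment: str) -> tuple:
    """Pair the target, phrased as a natural question, with the comment."""
    return (question.strip(), comment.strip())


def as_text(instance, sep: str = " [SEP] ") -> str:
    """Flatten the pair into one string, question first."""
    question, comment = instance
    return question + sep + comment


instance = make_instance(
    "Do you support X?",                              # target as a question
    "  Yes, because it would lower costs for families. ",
)
text = as_text(instance)
print(text)
```

Since the target is expressed as text rather than as a fixed label, the same model can in principle be queried about an issue it never saw during training, which is the cross-target setting the abstract evaluates.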