February 1, 2020


Paper Group AWR 83



An Empirical Investigation of Randomized Defenses against Adversarial Attacks

Title An Empirical Investigation of Randomized Defenses against Adversarial Attacks
Authors Yannik Potdevin, Dirk Nowotka, Vijay Ganesh
Abstract In recent years, Deep Neural Networks (DNNs) have had a dramatic impact on a variety of problems that were long considered very difficult, e.g., image classification and automatic language translation, to name just a few. The accuracy of modern DNNs in classification tasks is remarkable indeed. At the same time, attackers have devised powerful methods to construct specially-crafted malicious inputs (often referred to as adversarial examples) that can trick DNNs into misclassifying them. Worse, despite the many defense mechanisms proposed to protect DNNs against adversarial attacks, attackers are often able to circumvent these defenses, rendering them useless. This state of affairs is extremely worrying, especially as machine learning systems are adopted at scale. In this paper, we propose a scientific evaluation methodology aimed at assessing the quality, efficacy, robustness and efficiency of randomized defenses that protect DNNs against adversarial examples. Using this methodology, we evaluate a variety of defense mechanisms. In addition, we propose a defense mechanism of our own, called Randomly Perturbed Ensemble Neural Networks (RPENNs). We provide a thorough and comprehensive evaluation of the considered defense mechanisms against a white-box attacker model and six different adversarial attack methods, using the ILSVRC2012 validation data set.
Tasks Adversarial Attack, Image Classification
Published 2019-09-12
URL https://arxiv.org/abs/1909.05580v1
PDF https://arxiv.org/pdf/1909.05580v1.pdf
PWC https://paperswithcode.com/paper/an-empirical-investigation-of-randomized
Repo https://github.com/ypotdevin/randomized-defenses
Framework none
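
The RPENN idea — ensembling randomly perturbed copies of a trained network — can be illustrated with a short sketch. This is a minimal reading of the abstract, not the authors' implementation: the Gaussian noise model, the `sigma` value, and softmax averaging are all assumptions.

```python
import copy
import torch

@torch.no_grad()
def rpenn_predict(model, x, n_members=5, sigma=0.01):
    """Average softmax outputs over randomly perturbed copies of the weights
    (noise model and aggregation are assumptions, not the paper's spec)."""
    probs = []
    for _ in range(n_members):
        member = copy.deepcopy(model).eval()
        for p in member.parameters():
            p.add_(sigma * torch.randn_like(p))    # independent Gaussian weight noise
        probs.append(torch.softmax(member(x), dim=-1))
    return torch.stack(probs).mean(dim=0)          # ensemble-averaged class probabilities
```

The randomness makes the effective decision boundary a moving target, which is the intuition behind the randomized defenses the paper evaluates.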

Using Self-Supervised Learning Can Improve Model Robustness and Uncertainty

Title Using Self-Supervised Learning Can Improve Model Robustness and Uncertainty
Authors Dan Hendrycks, Mantas Mazeika, Saurav Kadavath, Dawn Song
Abstract Self-supervision provides effective representations for downstream tasks without requiring labels. However, existing approaches lag behind fully supervised training and are often not thought beneficial beyond obviating or reducing the need for annotations. We find that self-supervision can benefit robustness in a variety of ways, including robustness to adversarial examples, label corruption, and common input corruptions. Additionally, self-supervision greatly benefits out-of-distribution detection on difficult, near-distribution outliers, so much so that it exceeds the performance of fully supervised methods. These results demonstrate the promise of self-supervision for improving robustness and uncertainty estimation and establish these tasks as new axes of evaluation for future self-supervised learning research.
Tasks Anomaly Detection, Outlier Detection, Out-of-Distribution Detection
Published 2019-06-28
URL https://arxiv.org/abs/1906.12340v2
PDF https://arxiv.org/pdf/1906.12340v2.pdf
PWC https://paperswithcode.com/paper/using-self-supervised-learning-can-improve
Repo https://github.com/hendrycks/ss-ood
Framework pytorch
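
A common instantiation of this recipe (and the one used in the linked repo) is rotation prediction as the self-supervised task. The sketch below shows the combined objective; the two heads, the shared backbone, and the `lambda_rot` weight are illustrative rather than the repo's exact code.

```python
import torch
import torch.nn.functional as F

def rotation_batch(x):  # x: (B, C, H, W)
    """Four rotated copies of the batch plus rotation labels 0..3."""
    rots = [torch.rot90(x, k, dims=(2, 3)) for k in range(4)]
    labels = torch.arange(4, device=x.device).repeat_interleave(x.size(0))
    return torch.cat(rots), labels

def total_loss(backbone, cls_head, rot_head, x, y, lambda_rot=0.5):
    sup = F.cross_entropy(cls_head(backbone(x)), y)    # supervised task loss
    xr, yr = rotation_batch(x)
    ssl = F.cross_entropy(rot_head(backbone(xr)), yr)  # self-supervised rotation loss
    return sup + lambda_rot * ssl
```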

Hierarchical Back Projection Network for Image Super-Resolution

Title Hierarchical Back Projection Network for Image Super-Resolution
Authors Zhi-Song Liu, Li-Wen Wang, Chu-Tak Li, Wan-Chi Siu
Abstract Deep learning based single image super-resolution methods use large training datasets and have recently achieved great progress in quality, both quantitatively and qualitatively. Most deep networks focus on the nonlinear mapping from low-resolution inputs to high-resolution outputs via residual learning, without exploring feature abstraction and analysis. We propose a Hierarchical Back Projection Network (HBPN) that cascades multiple HourGlass (HG) modules to process features bottom-up and top-down across all scales, capturing various spatial correlations and then consolidating the best representation for reconstruction. We adopt back projection blocks in our proposed network to provide error-correlated up- and down-sampling, replacing simple deconvolution and pooling for better estimation. A new Softmax based Weighted Reconstruction (WR) process is used to combine the outputs of the HG modules to further improve super-resolution. Experimental results on various datasets (including the NTIRE2019 validation dataset of the Real Image Super-Resolution Challenge) show that our proposed approach matches and improves upon the performance of state-of-the-art methods at different scaling factors.
Tasks Image Super-Resolution, Super-Resolution
Published 2019-06-17
URL https://arxiv.org/abs/1906.06874v2
PDF https://arxiv.org/pdf/1906.06874v2.pdf
PWC https://paperswithcode.com/paper/hierarchical-back-projection-network-for
Repo https://github.com/Holmes-Alan/HBPN
Framework none
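
The Softmax based Weighted Reconstruction step can be pictured as a learned convex combination of the HG module outputs. A minimal sketch, assuming one learnable score per module (the paper's exact weighting may differ):

```python
import torch
import torch.nn as nn

class WeightedReconstruction(nn.Module):
    """Blend the SR estimates of cascaded HourGlass modules with softmax weights."""
    def __init__(self, n_modules):
        super().__init__()
        self.scores = nn.Parameter(torch.zeros(n_modules))  # one score per HG output

    def forward(self, hg_outputs):                          # list of (B, C, H, W) tensors
        w = torch.softmax(self.scores, dim=0)               # convex combination weights
        return sum(wi * out for wi, out in zip(w, hg_outputs))
```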

Predicting Multiple Demographic Attributes with Task Specific Embedding Transformation and Attention Network

Title Predicting Multiple Demographic Attributes with Task Specific Embedding Transformation and Attention Network
Authors Raehyun Kim, Hyunjae Kim, Janghyuk Lee, Jaewoo Kang
Abstract Most companies utilize demographic information to develop their market strategy. However, such information is not available to most retail companies. Several studies have been conducted to predict the demographic attributes of users from their transaction histories, but they have some limitations. First, they focused on parameter sharing to predict all attributes, but capturing task-specific features is also important in multi-task learning. Second, they assumed that all transactions are equally important in predicting demographic attributes. However, some transactions are more useful than others for predicting a certain attribute. Furthermore, the decision-making processes of these models cannot be interpreted, as they work in a black-box manner. To address these limitations, we propose an Embedding Transformation Network with Attention (ETNA) model, which shares representations at the bottom of the model structure and transforms them into task-specific representations using a simple linear transformation. In addition, the attention mechanism lets us identify which transactions are most informative for predicting a given attribute. The experimental results show that our model outperforms previous models on all tasks. In our qualitative analysis, we visualize the attention weights, which provides business managers with useful insights.
Tasks Decision Making, Multi-Task Learning
Published 2019-03-25
URL http://arxiv.org/abs/1903.10144v1
PDF http://arxiv.org/pdf/1903.10144v1.pdf
PWC https://paperswithcode.com/paper/predicting-multiple-demographic-attributes
Repo https://github.com/dmis-lab/demographic-prediction
Framework pytorch
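
The architecture the abstract describes — shared bottom embeddings, a per-task linear transformation, and per-task attention over transactions — maps onto a compact module. All dimensions and layer choices below are illustrative:

```python
import torch
import torch.nn as nn

class ETNASketch(nn.Module):
    def __init__(self, n_items, dim, n_tasks, n_classes):
        super().__init__()
        self.embed = nn.Embedding(n_items, dim)  # shared bottom representation
        self.transform = nn.ModuleList([nn.Linear(dim, dim) for _ in range(n_tasks)])
        self.attn = nn.ModuleList([nn.Linear(dim, 1) for _ in range(n_tasks)])
        self.out = nn.ModuleList([nn.Linear(dim, n_classes) for _ in range(n_tasks)])

    def forward(self, transactions):             # (B, T) item-id sequences
        shared = self.embed(transactions)        # (B, T, D)
        logits = []
        for f, a, o in zip(self.transform, self.attn, self.out):
            h = f(shared)                                        # task-specific space
            w = torch.softmax(a(h).squeeze(-1), dim=1)           # per-transaction weights
            logits.append(o((w.unsqueeze(-1) * h).sum(dim=1)))   # attentive pooling
        return logits
```

The attention weights `w` are what the qualitative analysis visualizes: they indicate which transactions drove each demographic prediction.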

Looking back at Labels: A Class based Domain Adaptation Technique

Title Looking back at Labels: A Class based Domain Adaptation Technique
Authors Vinod Kumar Kurmi, Vinay P. Namboodiri
Abstract In this paper, we address the problem of adapting classifiers across domains. We consider domain adaptation for multi-class classification, where we are given a labeled source dataset and an unlabeled target dataset. In this setting, we propose an adversarial-discriminator-based approach. While approaches based on adversarial discriminators have been proposed before, in this paper we present an informed adversarial discriminator. Our approach relies on the observation that if the discriminator has access to all available information, including the class structure present in the source dataset, it can guide the transformation of target features toward a more structure-adapted space. Using this formulation, we obtain state-of-the-art results on the standard benchmark datasets. We further provide a detailed analysis showing that using all the labeled information results in improved domain adaptation.
Tasks Domain Adaptation, Image Classification
Published 2019-04-02
URL http://arxiv.org/abs/1904.01341v1
PDF http://arxiv.org/pdf/1904.01341v1.pdf
PWC https://paperswithcode.com/paper/looking-back-at-labels-a-class-based-domain
Repo https://github.com/vinodkkurmi/DiscriminatorDomainAdaptation
Framework pytorch
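
One way to read "informed adversarial discriminator" is a discriminator whose output space is structured by the source classes rather than a single real/fake bit. The sketch below is a guess at that layout, not the paper's architecture:

```python
import torch.nn as nn

class InformedDiscriminator(nn.Module):
    """Scores each (domain, class) pair instead of a binary source/target output,
    so source label structure can inform target feature alignment (assumed layout)."""
    def __init__(self, feat_dim, n_classes):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(),
            nn.Linear(256, 2 * n_classes),   # first C logits: source classes; last C: target
        )

    def forward(self, features):
        return self.net(features)            # (B, 2*C) class-aware domain logits
```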

A Self-Attentive Emotion Recognition Network

Title A Self-Attentive Emotion Recognition Network
Authors Harris Partaourides, Kostantinos Papadamou, Nicolas Kourtellis, Ilias Leontiadis, Sotirios Chatzis
Abstract Modern deep learning approaches have achieved groundbreaking performance in modeling and classifying sequential data. Specifically, attention networks constitute the state-of-the-art paradigm for capturing long temporal dynamics. This paper examines the efficacy of this paradigm in the challenging task of emotion recognition in dyadic conversations. In contrast to existing approaches, our work introduces a novel attention mechanism capable of inferring the magnitude of the effect of each past utterance on the current speaker's emotional state. The proposed attention mechanism performs this inference without the need for a decoder network; this is achieved by means of innovative self-attention arguments. Our self-attention networks capture the correlation patterns among consecutive encoder network states, allowing us to robustly and effectively model temporal dynamics over arbitrarily long temporal horizons and thus capture strong affective patterns over the course of long discussions. We demonstrate the effectiveness of our approach on the challenging IEMOCAP benchmark. As we show, our methodology outperforms state-of-the-art alternatives and commonly used approaches, giving rise to promising new research directions in the context of Online Social Network (OSN) analysis tasks.
Tasks Emotion Recognition
Published 2019-04-24
URL http://arxiv.org/abs/1905.01972v1
PDF http://arxiv.org/pdf/1905.01972v1.pdf
PWC https://paperswithcode.com/paper/190501972
Repo https://github.com/Partaourides/SERN
Framework tf
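
The decoder-free self-attention the abstract describes can be sketched as causal scaled dot-product attention over the encoder states of successive utterances (shapes and the masking choice are assumptions):

```python
import math
import torch

def self_attend(states):  # states: (B, T, D) encoder outputs, one per utterance
    d = states.size(-1)
    scores = states @ states.transpose(1, 2) / math.sqrt(d)   # pairwise relevance
    # causal mask: each utterance attends only to itself and earlier utterances
    t = states.size(1)
    mask = torch.triu(torch.ones(t, t, device=states.device), diagonal=1).bool()
    scores = scores.masked_fill(mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ states             # context-enriched states
```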

Uncertainty Propagation in Deep Neural Network Using Active Subspace

Title Uncertainty Propagation in Deep Neural Network Using Active Subspace
Authors Weiqi Ji, Zhuyin Ren, Chung K. Law
Abstract The inputs to a deep neural network (DNN) from real-world data usually come with uncertainties. Yet it is challenging to propagate the uncertainty in the input features to the DNN predictions at low computational cost. This work employs a gradient-based subspace method and a response-surface technique to accelerate uncertainty propagation in DNNs. Specifically, the active subspace method is employed to identify the most important subspace in the input features using the gradient of the DNN output with respect to the inputs. A response surface within that low-dimensional subspace can then be built efficiently, and the uncertainty of the prediction can be acquired by evaluating the computationally cheap response surface instead of the DNN model. In addition, the subspace can help explain adversarial examples. The approach is demonstrated on the MNIST dataset with a convolutional neural network. Code is available at: https://github.com/jiweiqi/nnsubspace.
Tasks
Published 2019-03-10
URL https://arxiv.org/abs/1903.03989v2
PDF https://arxiv.org/pdf/1903.03989v2.pdf
PWC https://paperswithcode.com/paper/uncertainty-propagation-in-deep-neural-1
Repo https://github.com/jiweiqi/nnsubspace
Framework none
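
The active subspace construction itself is standard: estimate the covariance of the input gradients and keep its leading eigenvectors. A compact sketch (the scalarization of the network output is a simplification):

```python
import torch

def active_subspace(model, inputs, k=2):
    """Top-k active directions from the gradient covariance (illustrative)."""
    inputs = inputs.clone().requires_grad_(True)
    out = model(inputs).sum()       # scalar target; for a classifier, sum one chosen logit instead
    grads = torch.autograd.grad(out, inputs)[0].flatten(1)  # (N, D) per-sample gradients
    cov = grads.T @ grads / grads.size(0)                   # Monte Carlo estimate of E[g g^T]
    _, eigvecs = torch.linalg.eigh(cov)                     # eigenvalues in ascending order
    return eigvecs[:, -k:]                                  # dominant subspace basis
```

A response surface is then fit over the projections onto this basis, and input uncertainty is propagated by sampling that cheap surrogate instead of the DNN.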

Scaling description of generalization with number of parameters in deep learning

Title Scaling description of generalization with number of parameters in deep learning
Authors Mario Geiger, Arthur Jacot, Stefano Spigler, Franck Gabriel, Levent Sagun, Stéphane d’Ascoli, Giulio Biroli, Clément Hongler, Matthieu Wyart
Abstract Supervised deep learning involves the training of neural networks with a large number $N$ of parameters. For large enough $N$, in the so-called over-parametrized regime, one can essentially fit the training data points. Sparsity-based arguments would suggest that the generalization error increases as $N$ grows past a certain threshold $N^{*}$. Instead, empirical studies have shown that in the over-parametrized regime, generalization error keeps decreasing with $N$. We resolve this paradox through a new framework. We rely on the so-called Neural Tangent Kernel, which connects large neural nets to kernel methods, to show that the initialization causes finite-size random fluctuations $\|f_{N}-\bar{f}_{N}\|\sim N^{-1/4}$ of the neural net output function $f_{N}$ around its expectation $\bar{f}_{N}$. These affect the generalization error $\epsilon_{N}$ for classification: under natural assumptions, it decays to a plateau value $\epsilon_{\infty}$ in a power-law fashion $\sim N^{-1/2}$. This description breaks down at a so-called jamming transition $N=N^{*}$. At this threshold, we argue that $\|f_{N}\|$ diverges. This result leads to a plausible explanation for the cusp in test error known to occur at $N^{*}$. Our results are confirmed by extensive empirical observations on the MNIST and CIFAR image datasets. Our analysis finally suggests that, given a computational envelope, the smallest generalization error is obtained using several networks of intermediate sizes, just beyond $N^{*}$, and averaging their outputs.
Tasks
Published 2019-01-06
URL https://arxiv.org/abs/1901.01608v5
PDF https://arxiv.org/pdf/1901.01608v5.pdf
PWC https://paperswithcode.com/paper/scaling-description-of-generalization-with
Repo https://github.com/glouppe/info8010-deep-learning
Framework pytorch
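
The closing recommendation is directly actionable: under a fixed compute budget, train several networks of intermediate size just beyond $N^{*}$ and average their outputs, which suppresses the $N^{-1/4}$ initialization fluctuations. A trivial sketch of the averaging step (training omitted):

```python
import torch

@torch.no_grad()
def ensemble_output(models, x):
    """Average outputs of independently initialized nets: an estimate of the
    expected output function, with initialization noise reduced."""
    return torch.stack([m(x) for m in models]).mean(dim=0)
```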

An Attentive Survey of Attention Models

Title An Attentive Survey of Attention Models
Authors Sneha Chaudhari, Gungor Polatkan, Rohan Ramanath, Varun Mithal
Abstract The attention model has become an important concept in neural networks and has been researched across diverse application domains. This survey provides a structured and comprehensive overview of developments in modeling attention. In particular, we propose a taxonomy that groups existing techniques into coherent categories. We review the different neural architectures in which attention has been incorporated and show how attention improves the interpretability of neural models. Finally, we discuss applications in which modeling attention has a significant impact. We hope this survey will provide a succinct introduction to attention models and guide practitioners in developing approaches for their applications.
Tasks
Published 2019-04-05
URL http://arxiv.org/abs/1904.02874v1
PDF http://arxiv.org/pdf/1904.02874v1.pdf
PWC https://paperswithcode.com/paper/an-attentive-survey-of-attention-models
Repo https://github.com/dddfgkl/paper
Framework none
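
As a reference point for the taxonomy, the core computation every surveyed variant builds on is a softmax-weighted summary of values; the shapes below are illustrative:

```python
import math
import torch

def attention(query, keys, values):
    # query: (B, D); keys, values: (B, T, D) -> context vector and weights
    scores = (keys @ query.unsqueeze(-1)).squeeze(-1) / math.sqrt(query.size(-1))
    weights = torch.softmax(scores, dim=-1)    # alignment distribution over T positions
    context = (weights.unsqueeze(-1) * values).sum(dim=1)
    return context, weights                    # the weights are what interpretability work inspects
```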

Soft Actor-Critic for Discrete Action Settings

Title Soft Actor-Critic for Discrete Action Settings
Authors Petros Christodoulou
Abstract Soft Actor-Critic is a state-of-the-art reinforcement learning algorithm for continuous action settings that is not applicable to discrete action settings. Many important settings involve discrete actions, however, and so here we derive an alternative version of the Soft Actor-Critic algorithm that is applicable to discrete action settings. We then show that, even without any hyperparameter tuning, it is competitive with the tuned model-free state-of-the-art on a selection of games from the Atari suite.
Tasks Atari Games
Published 2019-10-16
URL https://arxiv.org/abs/1910.07207v2
PDF https://arxiv.org/pdf/1910.07207v2.pdf
PWC https://paperswithcode.com/paper/soft-actor-critic-for-discrete-action
Repo https://github.com/p-christ/Deep-Reinforcement-Learning-Algorithms-with-PyTorch
Framework pytorch
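
The key change for discrete actions is that the policy's action distribution is available in closed form, so the soft state value and the policy objective become exact expectations over actions rather than Monte Carlo estimates. A sketch of those two quantities (`alpha` is the entropy temperature; the rest of the SAC machinery is unchanged):

```python
import torch

def soft_state_value(q_values, log_probs, alpha):
    # q_values, log_probs: (B, A) over the discrete action set
    probs = log_probs.exp()
    return (probs * (q_values - alpha * log_probs)).sum(dim=-1)        # V(s)

def policy_loss(q_values, log_probs, alpha):
    probs = log_probs.exp()
    return (probs * (alpha * log_probs - q_values)).sum(dim=-1).mean()
```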

A Hierarchical Architecture for Sequential Decision-Making in Autonomous Driving using Deep Reinforcement Learning

Title A Hierarchical Architecture for Sequential Decision-Making in Autonomous Driving using Deep Reinforcement Learning
Authors Majid Moghadam, Gabriel Hugh Elkaim
Abstract Tactical decision making is a critical feature for advanced driving systems and involves several challenges, such as the complexity of the uncertain environment and the reliability required of the autonomous system. In this work, we develop a multi-modal architecture that models the environment around the ego vehicle and train a deep reinforcement learning (DRL) agent that yields consistent performance in stochastic highway driving scenarios. To this end, we feed the occupancy grid of the ego vehicle's surroundings into the DRL agent and obtain high-level sequential commands (i.e., lane changes) to send to lower-level controllers. We show that dividing the autonomous driving problem into a multi-layer control architecture enables us to leverage AI to solve each layer separately and achieve an admissible reliability score. Compared with end-to-end approaches, this architecture yields a more reliable system that can be implemented in actual self-driving cars.
Tasks Autonomous Driving, Decision Making, Self-Driving Cars
Published 2019-06-20
URL https://arxiv.org/abs/1906.08464v1
PDF https://arxiv.org/pdf/1906.08464v1.pdf
PWC https://paperswithcode.com/paper/a-hierarchical-architecture-for-sequential
Repo https://github.com/MajidMoghadam2006/deepcars-reinforcement-learning
Framework none
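
The layering the abstract argues for reduces to a simple control flow: perception builds an occupancy grid, the DRL agent picks a discrete high-level command, and a conventional controller executes it. All names below are hypothetical placeholders for that pipeline:

```python
# Hypothetical pipeline sketch; `policy` and `controller` stand in for the
# paper's DRL agent and lower-level controller.
HIGH_LEVEL_ACTIONS = ["keep_lane", "change_left", "change_right"]

def drive_step(policy, controller, occupancy_grid, vehicle_state):
    action_idx = policy(occupancy_grid)                # DRL agent: grid -> action index
    command = HIGH_LEVEL_ACTIONS[action_idx]           # high-level command, e.g. a lane change
    return controller.execute(command, vehicle_state)  # low-level steering/throttle
```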

Visual Dialogue State Tracking for Question Generation

Title Visual Dialogue State Tracking for Question Generation
Authors Wei Pang, Xiaojie Wang
Abstract GuessWhat?! is a visual dialogue task between a guesser and an oracle. The guesser aims to locate an object chosen by the oracle in an image by asking a sequence of Yes/No questions. Asking proper questions as the dialogue progresses is vital for a successful final guess; as a result, the progress of the dialogue should be properly represented and tracked. Previous models for question generation pay little attention to the representation and tracking of dialogue states and are therefore prone to asking low-quality questions, such as repeated questions. This paper proposes a visual dialogue state tracking (VDST) based method for question generation. A visual dialogue state is defined as a distribution over the objects in the image together with representations of those objects. The object representations are updated as the distribution over objects changes. An object-difference based attention is used to decode new questions, and the distribution over objects is updated by comparing the question-answer pair with the objects. Experimental results on the GuessWhat?! dataset show that our model significantly outperforms existing methods and achieves new state-of-the-art performance. Notably, our model reduces the rate of repeated questions from more than 50% to 21.9% compared with previous state-of-the-art methods.
Tasks Dialogue State Tracking, Question Generation, Visual Dialog
Published 2019-11-12
URL https://arxiv.org/abs/1911.07928v2
PDF https://arxiv.org/pdf/1911.07928v2.pdf
PWC https://paperswithcode.com/paper/visual-dialogue-state-tracking-for-question
Repo https://github.com/xubuvd/guesswhat
Framework tf
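
The state update at the heart of VDST can be pictured as re-weighting a distribution over candidate objects after every question-answer pair. The sketch below is a loose reading of the abstract; the compatibility scorer and the multiplicative update are assumptions:

```python
import torch

def update_object_beliefs(beliefs, object_reprs, qa_embedding, score_fn):
    # beliefs: (K,) distribution over K candidate objects; object_reprs: (K, D)
    compat = score_fn(object_reprs, qa_embedding)         # fit of each object to the QA pair
    return torch.softmax(beliefs.log() + compat, dim=0)   # Bayesian-style re-weighting
```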

Modeling Sequences with Quantum States: A Look Under the Hood

Title Modeling Sequences with Quantum States: A Look Under the Hood
Authors Tai-Danae Bradley, E. Miles Stoudenmire, John Terilla
Abstract Classical probability distributions on sets of sequences can be modeled using quantum states. Here, we do so with a quantum state that is pure and entangled. Because it is entangled, the reduced densities that describe subsystems also carry information about the complementary subsystem. This is in contrast to the classical marginal distributions on a subsystem, in which information about the complementary system has been integrated out and lost. A training algorithm based on the density matrix renormalization group (DMRG) procedure uses the extra information contained in the reduced densities and organizes it into a tensor network model. An understanding of the extra information contained in the reduced densities allows us to examine the mechanics of this DMRG algorithm and study the generalization error of the resulting model. As an illustration, we work with the even-parity dataset and produce an estimate for the generalization error as a function of the fraction of the dataset used in training.
Tasks
Published 2019-10-16
URL https://arxiv.org/abs/1910.07425v1
PDF https://arxiv.org/pdf/1910.07425v1.pdf
PWC https://paperswithcode.com/paper/modeling-sequences-with-quantum-states-a-look
Repo https://github.com/emstoudenmire/parity
Framework none
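
The claim that reduced densities retain information about the complementary subsystem is concrete linear algebra: for a pure state on a bipartite system, the reduced density matrix is a partial trace. A minimal sketch:

```python
import numpy as np

def reduced_density(psi, dim_a, dim_b):
    """rho_A = Tr_B |psi><psi| for a pure state psi on a dim_a x dim_b system."""
    m = psi.reshape(dim_a, dim_b)   # amplitudes as a matrix indexed by (A, B) basis states
    return m @ m.conj().T           # partial trace over subsystem B
```

For an entangled state the spectrum of `rho_A` is non-trivial, and it is exactly this extra structure that the DMRG-based training organizes into a tensor network.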

Polysemous Visual-Semantic Embedding for Cross-Modal Retrieval

Title Polysemous Visual-Semantic Embedding for Cross-Modal Retrieval
Authors Yale Song, Mohammad Soleymani
Abstract Visual-semantic embedding aims to find a shared latent space where related visual and textual instances are close to each other. Most current methods learn injective embedding functions that map an instance to a single point in the shared space. Unfortunately, injective embedding cannot effectively handle polysemous instances with multiple possible meanings; at best, it finds an average representation of the different meanings. This hinders its use in real-world scenarios where individual instances and their cross-modal associations are often ambiguous. In this work, we introduce Polysemous Instance Embedding Networks (PIE-Nets) that compute multiple and diverse representations of an instance by combining global context with locally-guided features via multi-head self-attention and residual learning. To learn visual-semantic embedding, we tie two PIE-Nets together and optimize them jointly in the multiple instance learning framework. Most existing work on cross-modal retrieval focuses on image-text data. Here, we also tackle the more challenging case of video-text retrieval. To facilitate further research in video-text retrieval, we release a new dataset of 50K video-sentence pairs collected from social media, dubbed MRW (my reaction when). We demonstrate our approach on both image-text and video-text retrieval scenarios using MS-COCO, TGIF, and our new MRW dataset.
Tasks Cross-Modal Retrieval, Multiple Instance Learning
Published 2019-06-11
URL https://arxiv.org/abs/1906.04402v2
PDF https://arxiv.org/pdf/1906.04402v2.pdf
PWC https://paperswithcode.com/paper/polysemous-visual-semantic-embedding-for-1
Repo https://github.com/yalesong/pvse
Framework pytorch
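
A PIE-Net head, as described, produces K diverse embeddings by combining attention-pooled local features with the global feature through a residual connection. The sketch below approximates that with learned queries and multi-head attention (K, the dimensions, and the residual form are assumptions; `dim` must be divisible by `k`):

```python
import torch
import torch.nn as nn

class PolysemousHead(nn.Module):
    def __init__(self, dim, k):
        super().__init__()
        self.query = nn.Parameter(torch.randn(1, k, dim))   # one learned query per embedding
        self.attn = nn.MultiheadAttention(dim, num_heads=k, batch_first=True)

    def forward(self, local_feats, global_feat):  # (B, T, D), (B, D)
        q = self.query.expand(local_feats.size(0), -1, -1)
        local, _ = self.attn(q, local_feats, local_feats)   # K locally guided views
        return global_feat.unsqueeze(1) + local             # residual fusion -> (B, K, D)
```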

Pathomic Fusion: An Integrated Framework for Fusing Histopathology and Genomic Features for Cancer Diagnosis and Prognosis

Title Pathomic Fusion: An Integrated Framework for Fusing Histopathology and Genomic Features for Cancer Diagnosis and Prognosis
Authors Richard J. Chen, Ming Y. Lu, Jingwen Wang, Drew F. K. Williamson, Scott J. Rodig, Neal I. Lindeman, Faisal Mahmood
Abstract Cancer diagnosis, prognosis, and therapeutic response predictions are based on morphological information from histology slides and molecular profiles from genomic data. However, most deep learning-based objective outcome prediction and grading paradigms are based on histology or genomics alone and do not make use of the complementary information in an intuitive manner. In this work, we propose Pathomic Fusion, a strategy for end-to-end multimodal fusion of histology image and genomic (mutations, CNV, mRNAseq) features for survival outcome prediction. Our approach models pairwise feature interactions across modalities by taking the Kronecker product of gated feature representations and controls the expressiveness of each representation via a gating-based attention mechanism. The proposed framework is able to model pairwise interactions across features in different modalities and control their relative importance. We validate our approach using glioma datasets from the Cancer Genome Atlas (TCGA), which contain paired whole-slide image, genotype, and transcriptome data with ground-truth survival and histologic grade labels. Based on a rigorous 15-fold cross-validation, our results demonstrate that the proposed multimodal fusion paradigm improves prognostic determinations over grading and molecular subtyping, as well as over unimodal deep networks trained on histology or genomic data alone. The proposed method establishes insight and theory on how to train deep networks on multimodal biomedical data in an intuitive manner, which will be useful for other problems in medicine that seek to combine heterogeneous data streams for understanding diseases and predicting response and resistance to treatment.
Tasks
Published 2019-12-18
URL https://arxiv.org/abs/1912.08937v2
PDF https://arxiv.org/pdf/1912.08937v2.pdf
PWC https://paperswithcode.com/paper/pathomic-fusion-an-integrated-framework-for
Repo https://github.com/mahmoodlab/PathomicFusion
Framework pytorch
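
The fusion step is a batched Kronecker (outer) product of gated unimodal features; appending a constant 1 to each vector keeps the unimodal terms alongside the pairwise interactions. A sketch under those assumptions (the gating networks are assumed to be produced elsewhere):

```python
import torch

def pathomic_fuse(h_feat, g_feat, h_gate, g_gate):
    h = torch.sigmoid(h_gate) * h_feat                    # gated histology features (B, Dh)
    g = torch.sigmoid(g_gate) * g_feat                    # gated genomic features  (B, Dg)
    h1 = torch.cat([h, h.new_ones(h.size(0), 1)], dim=1)  # append 1 to keep unimodal terms
    g1 = torch.cat([g, g.new_ones(g.size(0), 1)], dim=1)
    return torch.einsum("bi,bj->bij", h1, g1).flatten(1)  # all pairwise interactions
```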