February 2, 2020


Paper Group AWR 18

AdvHat: Real-world adversarial attack on ArcFace Face ID system. CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval. The Implicit Bias of Depth: How Incremental Learning Drives Generalization. Geometric Back-projection Network for Point Cloud Classification. Towards Robust Deep Reinforcement Learning for Traffic Signal Control: Dem …

AdvHat: Real-world adversarial attack on ArcFace Face ID system

Title AdvHat: Real-world adversarial attack on ArcFace Face ID system
Authors Stepan Komkov, Aleksandr Petiushko
Abstract In this paper we propose a novel, easily reproducible technique to attack the best public Face ID system, ArcFace, in different shooting conditions. To create an attack, we print a rectangular paper sticker on a common color printer and put it on a hat. The adversarial sticker is prepared with a novel algorithm for off-plane transformations of the image which imitates the sticker's location on the hat. Such an approach confuses the state-of-the-art public Face ID model LResNet100E-IR, ArcFace@ms1m-refine-v2, and is transferable to other Face ID models.
Tasks Adversarial Attack
Published 2019-08-23
URL https://arxiv.org/abs/1908.08705v1
PDF https://arxiv.org/pdf/1908.08705v1.pdf
PWC https://paperswithcode.com/paper/advhat-real-world-adversarial-attack-on
Repo https://github.com/papermsucode/advhat
Framework tf
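
A minimal sketch of the patch-optimization idea described in the abstract, written independently in PyTorch (the official repo is TensorFlow): a sticker region is optimized by gradient descent so that the face embedding drifts away from the true identity. The tiny ConvNet is only a placeholder for ArcFace, and the paper's off-plane sticker projection and smoothness penalty are omitted.

```python
# Hedged sketch: optimize a rectangular "sticker" so the patched face's
# embedding moves away from the anchor embedding. The small ConvNet stands in
# for LResNet100E-IR; this is not the authors' implementation.
import torch
import torch.nn.functional as F

embedder = torch.nn.Sequential(              # placeholder for ArcFace
    torch.nn.Conv2d(3, 16, 3, stride=2, padding=1), torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool2d(1), torch.nn.Flatten(), torch.nn.Linear(16, 128),
).eval()

face = torch.rand(1, 3, 112, 112)            # aligned face crop (dummy data)
with torch.no_grad():
    anchor = F.normalize(embedder(face), dim=1)   # ground-truth identity embedding

sticker = torch.zeros(1, 3, 30, 60, requires_grad=True)  # printable patch
opt = torch.optim.Adam([sticker], lr=0.05)

for step in range(100):
    patched = face.clone()
    patched[:, :, 5:35, 26:86] = torch.sigmoid(sticker)  # paste patch on the "forehead"
    emb = F.normalize(embedder(patched), dim=1)
    loss = F.cosine_similarity(emb, anchor).mean()        # push away from the anchor
    opt.zero_grad(); loss.backward(); opt.step()
```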

CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval

Title CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval
Authors Zihao Wang, Xihui Liu, Hongsheng Li, Lu Sheng, Junjie Yan, Xiaogang Wang, Jing Shao
Abstract Text-image cross-modal retrieval is a challenging task in the field of language and vision. Most previous approaches independently embed images and sentences into a joint embedding space and compare their similarities. However, they rarely explore the interactions between images and sentences before calculating similarities in the joint space. Intuitively, when matching between images and sentences, human beings would alternately attend to regions in images and words in sentences, and select the most salient information considering the interaction between both modalities. In this paper, we propose Cross-modal Adaptive Message Passing (CAMP), which adaptively controls the information flow for message passing across modalities. Our approach not only takes comprehensive and fine-grained cross-modal interactions into account, but also properly handles negative pairs and irrelevant information with an adaptive gating scheme. Moreover, instead of conventional joint embedding approaches for text-image matching, we infer the matching score based on the fused features, and propose a hardest negative binary cross-entropy loss for training. Results on COCO and Flickr30k significantly surpass state-of-the-art methods, demonstrating the effectiveness of our approach.
Tasks Cross-Modal Retrieval, Image Retrieval
Published 2019-09-12
URL https://arxiv.org/abs/1909.05506v1
PDF https://arxiv.org/pdf/1909.05506v1.pdf
PWC https://paperswithcode.com/paper/camp-cross-modal-adaptive-message-passing-for
Repo https://github.com/ZihaoWang-CV/CAMP_iccv19
Framework pytorch
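
The gist of adaptive message passing can be sketched as a gated cross-attention block. The module below is an illustrative PyTorch reading of the abstract with made-up dimensions, not the released CAMP code.

```python
# Hedged sketch: image-region features attend over word features, and a learned
# gate decides how much of the aggregated textual message to fuse into each
# region, suppressing irrelevant information.
import torch
import torch.nn as nn

class GatedCrossMessagePassing(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.gate = nn.Linear(2 * dim, dim)    # adaptive gating over the message

    def forward(self, regions, words):
        # regions: (B, R, D) image-region features; words: (B, W, D) token features
        attn = torch.softmax(
            self.q(regions) @ self.k(words).transpose(1, 2) / regions.size(-1) ** 0.5,
            dim=-1)                            # (B, R, W) cross-modal attention
        message = attn @ self.v(words)         # (B, R, D) text message per region
        g = torch.sigmoid(self.gate(torch.cat([regions, message], dim=-1)))
        return regions + g * message           # gated fusion

fused = GatedCrossMessagePassing()(torch.randn(2, 36, 256), torch.randn(2, 12, 256))
print(fused.shape)                             # (2, 36, 256)
```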

The Implicit Bias of Depth: How Incremental Learning Drives Generalization

Title The Implicit Bias of Depth: How Incremental Learning Drives Generalization
Authors Daniel Gissin, Shai Shalev-Shwartz, Amit Daniely
Abstract A leading hypothesis for the surprising generalization of neural networks is that the dynamics of gradient descent bias the model towards simple solutions, by searching through the solution space in an incremental order of complexity. We formally define the notion of incremental learning dynamics and derive the conditions on depth and initialization for which this phenomenon arises in deep linear models. Our main theoretical contribution is a dynamical depth separation result, proving that while shallow models can exhibit incremental learning dynamics, they require the initialization to be exponentially small for these dynamics to present themselves. However, once the model becomes deeper, the dependence becomes polynomial and incremental learning can arise in more natural settings. We complement our theoretical findings by experimenting with deep matrix sensing, quadratic neural networks and with binary classification using diagonal and convolutional linear networks, showing all of these models exhibit incremental learning.
Tasks
Published 2019-09-26
URL https://arxiv.org/abs/1909.12051v2
PDF https://arxiv.org/pdf/1909.12051v2.pdf
PWC https://paperswithcode.com/paper/the-implicit-bias-of-depth-how-incremental
Repo https://github.com/dsgissin/Incremental-Learning
Framework tf
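
A toy NumPy experiment in the spirit of the paper's deep linear models: each coordinate of the predictor is a product of L scalar layer weights, and with small initialization gradient descent fits large target coordinates before small ones. All settings below are illustrative, not taken from the paper's experiments.

```python
# Hedged sketch of incremental learning in a depth-L "diagonal linear network".
import numpy as np

target = np.array([1.0, 0.5, 0.1, 0.0])       # target coordinates of different sizes
L, lr, init = 3, 0.05, 0.3                    # depth, step size, small initialization
u = np.full((L, target.size), init)           # one weight per layer and coordinate

for step in range(2001):
    w = u.prod(axis=0)                        # effective linear predictor
    grad_w = w - target                       # gradient of 0.5 * ||w - target||^2
    # chain rule: d w_i / d u_{l,i} is the product of the other layers' weights
    grads = [grad_w * np.delete(u, l, axis=0).prod(axis=0) for l in range(L)]
    for l in range(L):
        u[l] -= lr * grads[l]
    if step % 500 == 0:
        print(step, np.round(w, 3))           # large coordinates are fitted first
```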

Geometric Back-projection Network for Point Cloud Classification

Title Geometric Back-projection Network for Point Cloud Classification
Authors Shi Qiu, Saeed Anwar, Nick Barnes
Abstract As the basic task of point cloud learning, classification is fundamental but always challenging. To address some unsolved problems of existing methods, we propose a CNN-based network that leverages an error-correcting feedback structure to comprehensively capture the local features of 3D point clouds. Besides, we also enrich the explicit and implicit geometric information of point clouds in low-level 3D space and high-level feature space, respectively. By applying an attention module based on channel affinity that focuses on distinct channels, the learned feature map of our network effectively avoids redundancy. The performance on synthetic and real-world datasets demonstrates the superiority and applicability of our network. Compared with other state-of-the-art methods, our approach balances accuracy and efficiency.
Tasks
Published 2019-11-28
URL https://arxiv.org/abs/1911.12885v3
PDF https://arxiv.org/pdf/1911.12885v3.pdf
PWC https://paperswithcode.com/paper/geometric-feedback-network-for-point-cloud
Repo https://github.com/ShiQiu0419/GFNet
Framework pytorch
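
The channel-affinity attention mentioned in the abstract can be sketched roughly as a generic channel-attention block over per-point features; this is an assumption-laden illustration, not the paper's exact module.

```python
# Hedged sketch of channel-affinity attention: compute a channel-to-channel
# affinity matrix and re-weight channels so that redundant (highly similar)
# channels contribute less. Shapes are illustrative.
import torch
import torch.nn as nn

class ChannelAffinityAttention(nn.Module):
    def forward(self, x):
        # x: (B, C, N) per-point features for a point cloud with N points
        affinity = torch.bmm(x, x.transpose(1, 2))                 # (B, C, C)
        # emphasize distinct channels: invert the affinity relative to each row's max
        affinity = affinity.max(dim=-1, keepdim=True).values - affinity
        weights = torch.softmax(affinity, dim=-1)                  # (B, C, C)
        return x + torch.bmm(weights, x)                           # re-weighted features

out = ChannelAffinityAttention()(torch.randn(4, 64, 1024))         # B=4, C=64, N=1024
```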

Towards Robust Deep Reinforcement Learning for Traffic Signal Control: Demand Surges, Incidents and Sensor Failures

Title Towards Robust Deep Reinforcement Learning for Traffic Signal Control: Demand Surges, Incidents and Sensor Failures
Authors Filipe Rodrigues, Carlos Lima Azevedo
Abstract Reinforcement learning (RL) constitutes a promising solution for alleviating the problem of traffic congestion. In particular, deep RL algorithms have been shown to produce adaptive traffic signal controllers that outperform conventional systems. However, in order to be reliable in highly dynamic urban areas, such controllers need to be robust with respect to a series of exogenous sources of uncertainty. In this paper, we develop an open-source callback-based framework for promoting the flexible evaluation of different deep RL configurations under a traffic simulation environment. With this framework, we investigate how deep RL-based adaptive traffic controllers perform under different scenarios, namely under demand surges caused by special events, capacity reductions from incidents, and sensor failures. We extract several key insights for the development of robust deep RL algorithms for traffic control and propose concrete designs to mitigate the impact of the considered exogenous uncertainties.
Tasks
Published 2019-04-17
URL https://arxiv.org/abs/1904.08353v2
PDF https://arxiv.org/pdf/1904.08353v2.pdf
PWC https://paperswithcode.com/paper/towards-robust-deep-reinforcement-learning
Repo https://github.com/fmpr/CAREL
Framework tf
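
The callback idea reads roughly like the sketch below: disturbances such as demand surges are injected into the evaluation loop through callback hooks, so the same controller can be stress-tested under many scenarios. The callback and environment interfaces here are hypothetical and are not the CAREL API.

```python
# Hedged sketch of callback-based scenario evaluation for a traffic controller.
import random

class DemandSurge:
    """Hypothetical callback that boosts demand during a special event."""
    def __init__(self, start, end, factor):
        self.start, self.end, self.factor = start, end, factor
    def on_step(self, t, env):
        env.demand_multiplier = self.factor if self.start <= t < self.end else 1.0

class ToyTrafficEnv:                                # toy stand-in for a simulator wrapper
    def __init__(self):
        self.demand_multiplier = 1.0
    def step(self, action):
        queue = random.expovariate(1.0) * self.demand_multiplier  # toy dynamics ignore the action
        return -queue                               # reward: negative queue length

def evaluate(policy, env, callbacks, horizon=3600):
    total = 0.0
    for t in range(horizon):
        for cb in callbacks:
            cb.on_step(t, env)                      # callbacks perturb the scenario
        total += env.step(policy(t))
    return total / horizon

print(evaluate(lambda t: t % 4, ToyTrafficEnv(), [DemandSurge(1000, 2000, 3.0)]))
```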

Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference

Title Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference
Authors R. Thomas McCoy, Ellie Pavlick, Tal Linzen
Abstract A machine learning system can score well on a given test set by relying on heuristics that are effective for frequent example types but break down in more challenging cases. We study this issue within natural language inference (NLI), the task of determining whether one sentence entails another. We hypothesize that statistical NLI models may adopt three fallible syntactic heuristics: the lexical overlap heuristic, the subsequence heuristic, and the constituent heuristic. To determine whether models have adopted these heuristics, we introduce a controlled evaluation set called HANS (Heuristic Analysis for NLI Systems), which contains many examples where the heuristics fail. We find that models trained on MNLI, including BERT, a state-of-the-art model, perform very poorly on HANS, suggesting that they have indeed adopted these heuristics. We conclude that there is substantial room for improvement in NLI systems, and that the HANS dataset can motivate and measure progress in this area.
Tasks Natural Language Inference
Published 2019-02-04
URL https://arxiv.org/abs/1902.01007v4
PDF https://arxiv.org/pdf/1902.01007v4.pdf
PWC https://paperswithcode.com/paper/right-for-the-wrong-reasons-diagnosing
Repo https://github.com/tommccoy1/hans
Framework none
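
A small illustration of the lexical overlap heuristic the paper diagnoses: a rule that predicts entailment whenever every hypothesis word appears in the premise gets HANS-style examples wrong. The example pairs below are hand-written imitations, not items from the released dataset.

```python
# Hedged illustration: the lexical-overlap heuristic fails on pairs whose words
# fully overlap but whose meanings differ.
examples = [
    # (premise, hypothesis, gold label)
    ("The doctor paid the actor.", "The actor paid the doctor.", "non-entailment"),
    ("The lawyer near the judge smiled.", "The judge smiled.", "non-entailment"),
    ("The doctor paid the actor.", "The doctor paid the actor.", "entailment"),
]

def lexical_overlap_heuristic(premise, hypothesis):
    """Predict 'entailment' whenever every hypothesis word appears in the premise."""
    p = set(premise.lower().strip(".").split())
    h = set(hypothesis.lower().strip(".").split())
    return "entailment" if h <= p else "non-entailment"

for prem, hyp, gold in examples:
    pred = lexical_overlap_heuristic(prem, hyp)
    print(f"{pred:>15} (gold: {gold})  {hyp}")
```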

Differentially Private Bayesian Linear Regression

Title Differentially Private Bayesian Linear Regression
Authors Garrett Bernstein, Daniel Sheldon
Abstract Linear regression is an important tool across many fields that work with sensitive human-sourced data. Significant prior work has focused on producing differentially private point estimates, which provide a privacy guarantee to individuals while still allowing modelers to draw insights from data by estimating regression coefficients. We investigate the problem of Bayesian linear regression, with the goal of computing posterior distributions that correctly quantify uncertainty given privately released statistics. We show that a naive approach that ignores the noise injected by the privacy mechanism does a poor job in realistic data settings. We then develop noise-aware methods that perform inference over the privacy mechanism and produce correct posteriors across a wide range of scenarios.
Tasks
Published 2019-10-29
URL https://arxiv.org/abs/1910.13153v1
PDF https://arxiv.org/pdf/1910.13153v1.pdf
PWC https://paperswithcode.com/paper/differentially-private-bayesian-linear
Repo https://github.com/gbernstein6/private_bayesian_regression
Framework none
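
The setting can be sketched as follows: the analyst sees only noisy sufficient statistics and forms a conjugate posterior as if they were exact, which is the naive baseline the paper shows is miscalibrated. The noise below is an illustrative Gaussian perturbation, not a calibrated DP mechanism, and the paper's noise-aware inference is not reproduced here.

```python
# Hedged sketch: noisy release of (X'X, X'y) followed by a naive conjugate
# posterior that ignores the injected noise.
import numpy as np

rng = np.random.default_rng(0)
n, d, sigma = 200, 2, 1.0
X = rng.normal(size=(n, d))
beta_true = np.array([1.5, -0.7])
y = X @ beta_true + rng.normal(scale=sigma, size=n)

noise = 5.0                                   # placeholder noise scale, not calibrated to any epsilon
XtX = X.T @ X + rng.normal(scale=noise, size=(d, d))
Xty = X.T @ y + rng.normal(scale=noise, size=d)

prior_prec = np.eye(d)                        # N(0, I) prior on the coefficients
post_cov = np.linalg.inv(prior_prec + XtX / sigma**2)   # pretends the statistics are exact
post_mean = post_cov @ (Xty / sigma**2)
print("naive posterior mean:", post_mean.round(3), " true beta:", beta_true)
```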

Efficient Adaptation of Pretrained Transformers for Abstractive Summarization

Title Efficient Adaptation of Pretrained Transformers for Abstractive Summarization
Authors Andrew Hoang, Antoine Bosselut, Asli Celikyilmaz, Yejin Choi
Abstract Large-scale learning of transformer language models has yielded improvements on a variety of natural language understanding tasks. Whether they can be effectively adapted for summarization, however, has been less explored, as the learned representations are less seamlessly integrated into existing neural text production architectures. In this work, we propose two solutions for efficiently adapting pretrained transformer language models as text summarizers: source embeddings and domain-adaptive training. We test these solutions on three abstractive summarization datasets, achieving new state-of-the-art performance on two of them. Finally, we show that these improvements are achieved by producing more focused summaries with less superfluous content, and that performance improvements are more pronounced on more abstractive datasets.
Tasks Abstractive Text Summarization
Published 2019-06-01
URL https://arxiv.org/abs/1906.00138v1
PDF https://arxiv.org/pdf/1906.00138v1.pdf
PWC https://paperswithcode.com/paper/190600138
Repo https://github.com/Andrew03/transformer-abstractive-summarization
Framework pytorch
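
The source-embeddings idea can be sketched as an extra learned segment embedding added to the token embeddings, so a pretrained decoder can tell document tokens from summary tokens. Sizes below are placeholders, and this is not the authors' implementation.

```python
# Hedged sketch: token embedding + source/summary segment embedding.
import torch
import torch.nn as nn

class SourceAwareEmbedding(nn.Module):
    def __init__(self, vocab=32000, dim=512):
        super().__init__()
        self.tok = nn.Embedding(vocab, dim)
        self.seg = nn.Embedding(2, dim)        # 0 = source document, 1 = summary

    def forward(self, token_ids, segment_ids):
        return self.tok(token_ids) + self.seg(segment_ids)

tokens = torch.randint(0, 32000, (1, 10))
segments = torch.tensor([[0, 0, 0, 0, 0, 0, 1, 1, 1, 1]])   # document tokens, then summary tokens
hidden = SourceAwareEmbedding()(tokens, segments)            # would feed the pretrained transformer
print(hidden.shape)                                          # (1, 10, 512)
```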

Deep Generalized Method of Moments for Instrumental Variable Analysis

Title Deep Generalized Method of Moments for Instrumental Variable Analysis
Authors Andrew Bennett, Nathan Kallus, Tobias Schnabel
Abstract Instrumental variable analysis is a powerful tool for estimating causal effects when randomization or full control of confounders is not possible. The application of standard methods such as 2SLS, GMM, and more recent variants is significantly impeded when the causal effects are complex, the instruments are high-dimensional, and/or the treatment is high-dimensional. In this paper, we propose the DeepGMM algorithm to overcome this. Our algorithm is based on a new variational reformulation of GMM with optimal inverse-covariance weighting that allows us to efficiently control very many moment conditions. We further develop practical techniques for optimization and model selection that make it particularly successful in practice. Our algorithm is also computationally tractable and can handle large-scale datasets. Numerical results show our algorithm matches the performance of the best tuned methods in standard settings and continues to work in high-dimensional settings where even recent methods break.
Tasks Model Selection
Published 2019-05-29
URL https://arxiv.org/abs/1905.12495v1
PDF https://arxiv.org/pdf/1905.12495v1.pdf
PWC https://paperswithcode.com/paper/deep-generalized-method-of-moments-for
Repo https://github.com/CausalML/DeepGMM
Framework pytorch
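
A rough PyTorch sketch of the adversarial moment-matching flavour of the method: a response network g(x) is trained against a critic f(z) over the instruments so that the moment E[f(z)(y - g(x))] is driven toward zero. The actual DeepGMM objective uses an optimal inverse-covariance weighting, which is replaced here by a simple penalty on the critic.

```python
# Hedged sketch: alternating updates of a response network and a moment critic
# on toy instrumental-variable data (true causal coefficient is 2).
import torch
import torch.nn as nn

torch.manual_seed(0)
g = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))   # response g(x)
f = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))   # critic over the instrument z
opt_g = torch.optim.Adam(g.parameters(), lr=1e-3)
opt_f = torch.optim.Adam(f.parameters(), lr=1e-3)

z = torch.randn(512, 1); u = torch.randn(512, 1)   # instrument and confounder
x = z + 0.5 * u
y = 2.0 * x + u + 0.1 * torch.randn(512, 1)

for step in range(500):
    # critic step: find a moment the current response violates (penalty keeps it bounded)
    moment = (f(z) * (y - g(x).detach())).mean()
    loss_f = -moment + 0.1 * (f(z) ** 2).mean()
    opt_f.zero_grad(); loss_f.backward(); opt_f.step()
    # response step: shrink the moment the critic found
    moment = (f(z).detach() * (y - g(x))).mean()
    opt_g.zero_grad(); (moment ** 2).backward(); opt_g.step()
```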

Aggregation via Separation: Boosting Facial Landmark Detector with Semi-Supervised Style Translation

Title Aggregation via Separation: Boosting Facial Landmark Detector with Semi-Supervised Style Translation
Authors Shengju Qian, Keqiang Sun, Wayne Wu, Chen Qian, Jiaya Jia
Abstract Facial landmark detection, or face alignment, is a fundamental task that has been extensively studied. In this paper, we investigate a new perspective of facial landmark detection and demonstrate it leads to further notable improvement. Given that any face images can be factored into space of style that captures lighting, texture and image environment, and a style-invariant structure space, our key idea is to leverage disentangled style and shape space of each individual to augment existing structures via style translation. With these augmented synthetic samples, our semi-supervised model surprisingly outperforms the fully-supervised one by a large margin. Extensive experiments verify the effectiveness of our idea with state-of-the-art results on WFLW, 300W, COFW, and AFLW datasets. Our proposed structure is general and could be assembled into any face alignment frameworks. The code is made publicly available at https://github.com/thesouthfrog/stylealign.
Tasks Face Alignment, Facial Landmark Detection
Published 2019-08-18
URL https://arxiv.org/abs/1908.06440v1
PDF https://arxiv.org/pdf/1908.06440v1.pdf
PWC https://paperswithcode.com/paper/aggregation-via-separation-boosting-facial
Repo https://github.com/thesouthfrog/stylealign
Framework tf

On the Optimality of Perturbations in Stochastic and Adversarial Multi-armed Bandit Problems

Title On the Optimality of Perturbations in Stochastic and Adversarial Multi-armed Bandit Problems
Authors Baekjin Kim, Ambuj Tewari
Abstract We investigate the optimality of perturbation based algorithms in the stochastic and adversarial multi-armed bandit problems. For the stochastic case, we provide a unified regret analysis for both sub-Weibull and bounded perturbations when rewards are sub-Gaussian. Our bounds are instance optimal for sub-Weibull perturbations with parameter 2 that also have a matching lower tail bound, and all bounded support perturbations where there is sufficient probability mass at the extremes of the support. For the adversarial setting, we prove rigorous barriers against two natural solution approaches using tools from discrete choice theory and extreme value theory. Our results suggest that the optimal perturbation, if it exists, will be of Frechet-type.
Tasks
Published 2019-02-02
URL https://arxiv.org/abs/1902.00610v4
PDF https://arxiv.org/pdf/1902.00610v4.pdf
PWC https://paperswithcode.com/paper/on-the-optimality-of-perturbations-in
Repo https://github.com/Kimbaekjin/Perturbation-Methods-StochasticMAB
Framework none
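
A minimal example of the perturbation-based index policies the paper analyzes: Gaussian noise (a sub-Weibull perturbation with parameter 2) scaled by one over the square root of the pull count is added to each arm's empirical mean. Constants are illustrative and untuned.

```python
# Hedged sketch of a follow-the-perturbed-leader style stochastic bandit policy.
import numpy as np

rng = np.random.default_rng(1)
true_means = np.array([0.3, 0.5, 0.7])
K, T = len(true_means), 5000
pulls, sums = np.zeros(K), np.zeros(K)

for t in range(T):
    if t < K:
        arm = t                                        # play each arm once
    else:
        index = sums / pulls + rng.normal(size=K) / np.sqrt(pulls)
        arm = int(np.argmax(index))
    reward = rng.normal(true_means[arm], 0.1)          # sub-Gaussian rewards
    pulls[arm] += 1; sums[arm] += reward

print("pull counts:", pulls.astype(int))               # the best arm dominates
```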

Learning to compress and search visual data in large-scale systems

Title Learning to compress and search visual data in large-scale systems
Authors Sohrab Ferdowsi
Abstract The problem of high-dimensional and large-scale representation of visual data is addressed from an unsupervised learning perspective. The emphasis is put on discrete representations, where the description length can be measured in bits and hence the model capacity can be controlled. The algorithmic infrastructure is developed based on the synthesis and analysis prior models whose rate-distortion properties, as well as capacity vs. sample complexity trade-offs, are carefully optimized. These models are then extended to multiple layers, namely the RRQ and the ML-STC frameworks, where the latter is further evolved into a powerful deep neural network architecture with fast and sample-efficient training and discrete representations. For the developed algorithms, three important applications are developed. First, the problem of large-scale similarity search in retrieval systems is addressed, where a two-stage solution is proposed, leading to faster query times and more compact database storage. Second, the problem of learned image compression is targeted, where the proposed models can capture more redundancies from the training images than the conventional compression codecs. Finally, the proposed algorithms are used to solve ill-posed inverse problems. In particular, the problems of image denoising and compressive sensing are addressed with promising results.
Tasks Compressive Sensing, Denoising, Image Compression, Image Denoising
Published 2019-01-24
URL http://arxiv.org/abs/1901.08437v1
PDF http://arxiv.org/pdf/1901.08437v1.pdf
PWC https://paperswithcode.com/paper/learning-to-compress-and-search-visual-data
Repo https://github.com/sssohrab/PhDthesis
Framework none
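
Plain residual quantization, the building block behind multi-layer discrete codes like those in the thesis, can be sketched with two k-means layers; the regularized (RRQ) and sparse ternary (ML-STC) variants described in the abstract are not reproduced here.

```python
# Hedged sketch of two-layer residual quantization: each layer quantizes the
# residual left by the previous layer, yielding short discrete codes.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 32))                # toy visual descriptors

residual, codebooks, codes = X.copy(), [], []
for layer in range(2):                         # two quantization layers
    km = KMeans(n_clusters=64, n_init=10, random_state=0).fit(residual)
    codebooks.append(km.cluster_centers_)
    codes.append(km.labels_)                   # 6 bits per layer per vector
    residual = residual - km.cluster_centers_[km.labels_]

reconstruction = sum(cb[c] for cb, c in zip(codebooks, codes))
print("MSE:", float(np.mean((X - reconstruction) ** 2)))
```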

vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations

Title vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations
Authors Alexei Baevski, Steffen Schneider, Michael Auli
Abstract We propose vq-wav2vec to learn discrete representations of audio segments through a wav2vec-style self-supervised context prediction task. The algorithm uses either a gumbel softmax or online k-means clustering to quantize the dense representations. Discretization enables the direct application of algorithms from the NLP community which require discrete inputs. Experiments show that BERT pre-training achieves a new state of the art on TIMIT phoneme classification and WSJ speech recognition.
Tasks Speech Recognition
Published 2019-10-12
URL https://arxiv.org/abs/1910.05453v3
PDF https://arxiv.org/pdf/1910.05453v3.pdf
PWC https://paperswithcode.com/paper/vq-wav2vec-self-supervised-learning-of-1
Repo https://github.com/pytorch/fairseq
Framework pytorch
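
The Gumbel-softmax route to discretization can be sketched as below: encoder features are mapped to logits over a codebook, sampled with the straight-through Gumbel-softmax, and replaced by the selected codewords. Codebook size and dimensions are placeholders, not the fairseq configuration.

```python
# Hedged sketch of Gumbel-softmax quantization of dense frame features.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GumbelQuantizer(nn.Module):
    def __init__(self, dim=256, codebook_size=320):
        super().__init__()
        self.logits = nn.Linear(dim, codebook_size)
        self.codebook = nn.Embedding(codebook_size, dim)

    def forward(self, z, tau=2.0):
        # z: (B, T, D) dense features from a wav2vec-style encoder
        one_hot = F.gumbel_softmax(self.logits(z), tau=tau, hard=True)  # (B, T, V)
        indices = one_hot.argmax(dim=-1)                                 # discrete codes
        quantized = one_hot @ self.codebook.weight                       # (B, T, D)
        return quantized, indices

q, idx = GumbelQuantizer()(torch.randn(2, 50, 256))
print(q.shape, idx.shape)                                                # (2, 50, 256) (2, 50)
```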

Bayesian Volumetric Autoregressive generative models for better semisupervised learning

Title Bayesian Volumetric Autoregressive generative models for better semisupervised learning
Authors Guilherme Pombo, Robert Gray, Tom Varsavsky, John Ashburner, Parashkev Nachev
Abstract Deep generative models are rapidly gaining traction in medical imaging. Nonetheless, most generative architectures struggle to capture the underlying probability distributions of volumetric data, exhibit convergence problems, and offer no robust indices of model uncertainty. By comparison, the autoregressive generative model PixelCNN can be extended to volumetric data with relative ease, it readily attempts to learn the true underlying probability distribution, and it admits a Bayesian reformulation that provides a principled framework for reasoning about model uncertainty. Our contributions in this paper are twofold: first, we extend PixelCNN to work with volumetric brain magnetic resonance imaging data. Second, we show that reformulating this model to approximate a deep Gaussian process yields a measure of uncertainty that improves the performance of semi-supervised learning, in particular classification performance in settings where the proportion of labelled data is low. We quantify this improvement across classification, regression, and semantic segmentation tasks, training and testing on clinical magnetic resonance brain imaging data comprising T1-weighted and diffusion-weighted sequences.
Tasks Semantic Segmentation
Published 2019-07-26
URL https://arxiv.org/abs/1907.11559v1
PDF https://arxiv.org/pdf/1907.11559v1.pdf
PWC https://paperswithcode.com/paper/bayesian-volumetric-autoregressive-generative
Repo https://github.com/guilherme-pombo/3DPixelCNN
Framework tf
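
The uncertainty recipe the abstract points to (dropout networks as approximate deep Gaussian processes) can be sketched with Monte Carlo dropout on a toy 3D convolutional network; this stands in for, and is far simpler than, the volumetric PixelCNN in the paper.

```python
# Hedged sketch of MC-dropout uncertainty on a toy 3D ConvNet.
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Conv3d(1, 8, 3, padding=1), nn.ReLU(), nn.Dropout3d(0.5),
    nn.Conv3d(8, 1, 3, padding=1),
)

volume = torch.randn(1, 1, 16, 16, 16)         # toy MRI volume
net.train()                                    # keep dropout active at test time
with torch.no_grad():
    samples = torch.stack([net(volume) for _ in range(20)])

mean = samples.mean(dim=0)                     # predictive mean
uncertainty = samples.var(dim=0)               # voxel-wise model uncertainty
print(uncertainty.mean().item())
```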

RTHN: A RNN-Transformer Hierarchical Network for Emotion Cause Extraction

Title RTHN: A RNN-Transformer Hierarchical Network for Emotion Cause Extraction
Authors Rui Xia, Mengran Zhang, Zixiang Ding
Abstract The emotion cause extraction (ECE) task aims at discovering the potential causes behind a certain emotion expression in a document. Techniques including rule-based methods, traditional machine learning methods and deep neural networks have been proposed to solve this task. However, most of the previous work considered ECE as a set of independent clause classification problems and ignored the relations between multiple clauses in a document. In this work, we propose a joint emotion cause extraction framework, named RNN-Transformer Hierarchical Network (RTHN), to encode and classify multiple clauses synchronously. RTHN is composed of a lower word-level encoder based on RNNs to encode multiple words in each clause, and an upper clause-level encoder based on Transformer to learn the correlation between multiple clauses in a document. We furthermore propose ways to encode relative position and global prediction information into the Transformer, which can capture the causality between clauses and make RTHN more efficient. We finally achieve the best performance among 12 compared systems and improve the F1 score of the state-of-the-art from 72.69% to 76.77%.
Tasks
Published 2019-06-04
URL https://arxiv.org/abs/1906.01236v1
PDF https://arxiv.org/pdf/1906.01236v1.pdf
PWC https://paperswithcode.com/paper/rthn-a-rnn-transformer-hierarchical-network
Repo https://github.com/NUSTM/RTHN
Framework tf
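
The hierarchical encoder described in the abstract can be sketched as a word-level BiGRU feeding a clause-level Transformer, followed by per-clause classification. The relative-position and global-prediction encodings from the paper are omitted, and all sizes are placeholders; this is an independent PyTorch sketch, not the released TensorFlow code.

```python
# Hedged sketch of an RNN + Transformer hierarchical clause classifier.
import torch
import torch.nn as nn

class ClauseCauseTagger(nn.Module):
    def __init__(self, vocab=10000, dim=128):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.word_rnn = nn.GRU(dim, dim // 2, batch_first=True, bidirectional=True)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.clause_enc = nn.TransformerEncoder(layer, num_layers=2)
        self.cls = nn.Linear(dim, 2)                    # cause vs. non-cause clause

    def forward(self, docs):
        # docs: (B, n_clauses, n_words) word ids for every clause of each document
        B, C, W = docs.shape
        words = self.emb(docs).view(B * C, W, -1)
        _, h = self.word_rnn(words)                     # h: (2, B*C, dim // 2)
        clauses = h.transpose(0, 1).reshape(B, C, -1)   # (B, C, dim) clause vectors
        clauses = self.clause_enc(clauses)              # clause-level Transformer
        return self.cls(clauses)                        # (B, C, 2) logits per clause

logits = ClauseCauseTagger()(torch.randint(0, 10000, (2, 8, 20)))
print(logits.shape)                                     # (2, 8, 2)
```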