February 2, 2020


Paper Group AWR 18

AdvHat: Real-world adversarial attack on ArcFace Face ID system. CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval. The Implicit Bias of Depth: How Incremental Learning Drives Generalization. Geometric Back-projection Network for Point Cloud Classification. Towards Robust Deep Reinforcement Learning for Traffic Signal Control: Dem …

AdvHat: Real-world adversarial attack on ArcFace Face ID system


Title	AdvHat: Real-world adversarial attack on ArcFace Face ID system
Authors	Stepan Komkov, Aleksandr Petiushko
Abstract	In this paper we propose a novel, easily reproducible technique to attack the best public Face ID system, ArcFace, in different shooting conditions. To create an attack, we print a rectangular paper sticker on a common color printer and put it on a hat. The adversarial sticker is prepared with a novel algorithm for off-plane transformations of the image which imitates the sticker's location on the hat. Such an approach confuses the state-of-the-art public Face ID model LResNet100E-IR, ArcFace@ms1m-refine-v2 and is transferable to other Face ID models.
Tasks	Adversarial Attack
Published	2019-08-23
URL	https://arxiv.org/abs/1908.08705v1
PDF	https://arxiv.org/pdf/1908.08705v1.pdf
PWC	https://paperswithcode.com/paper/advhat-real-world-adversarial-attack-on
Repo	https://github.com/papermsucode/advhat
Framework	tf


CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval


Title	CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval
Authors	Zihao Wang, Xihui Liu, Hongsheng Li, Lu Sheng, Junjie Yan, Xiaogang Wang, Jing Shao
Abstract	Text-image cross-modal retrieval is a challenging task in the field of language and vision. Most previous approaches independently embed images and sentences into a joint embedding space and compare their similarities. However, previous approaches rarely explore the interactions between images and sentences before calculating similarities in the joint space. Intuitively, when matching between images and sentences, human beings would alternatively attend to regions in images and words in sentences, and select the most salient information considering the interaction between both modalities. In this paper, we propose Cross-modal Adaptive Message Passing (CAMP), which adaptively controls the information flow for message passing across modalities. Our approach not only takes comprehensive and fine-grained cross-modal interactions into account, but also properly handles negative pairs and irrelevant information with an adaptive gating scheme. Moreover, instead of conventional joint embedding approaches for text-image matching, we infer the matching score based on the fused features, and propose a hardest negative binary cross-entropy loss for training. Results on COCO and Flickr30k significantly surpass state-of-the-art methods, demonstrating the effectiveness of our approach.
Tasks	Cross-Modal Retrieval, Image Retrieval
Published	2019-09-12
URL	https://arxiv.org/abs/1909.05506v1
PDF	https://arxiv.org/pdf/1909.05506v1.pdf
PWC	https://paperswithcode.com/paper/camp-cross-modal-adaptive-message-passing-for
Repo	https://github.com/ZihaoWang-CV/CAMP_iccv19
Framework	pytorch
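
The "hardest negative binary cross-entropy loss" named in the CAMP abstract can be illustrated with a small sketch. This is one plausible reading of the phrase, not the authors' implementation: the function name `hardest_negative_bce`, the square score matrix with matched pairs on the diagonal, and the `eps` guard are all assumptions made for illustration.

```python
import math


def hardest_negative_bce(scores):
    """Hypothetical hardest-negative BCE over a batch.

    scores[i][j] is a matching score in (0, 1) for image i and sentence j.
    Matched pairs are assumed to lie on the diagonal; the batch needs at
    least two pairs so that every item has a negative.
    """
    n = len(scores)
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for i in range(n):
        positive = scores[i][i]
        # Highest-scoring wrong sentence for image i.
        hard_sentence = max(scores[i][j] for j in range(n) if j != i)
        # Highest-scoring wrong image for sentence i.
        hard_image = max(scores[j][i] for j in range(n) if j != i)
        total += -math.log(positive + eps)
        total += -math.log(1.0 - hard_sentence + eps)
        total += -math.log(1.0 - hard_image + eps)
    return total / n


well_separated = [[0.9, 0.1], [0.2, 0.8]]
confused = [[0.3, 0.7], [0.6, 0.4]]
loss_good = hardest_negative_bce(well_separated)
loss_bad = hardest_negative_bce(confused)
```

Only the single hardest negative per row and per column contributes, so the loss is lower when matched pairs outscore every mismatched pair in the batch.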

The Implicit Bias of Depth: How Incremental Learning Drives Generalization


Title	The Implicit Bias of Depth: How Incremental Learning Drives Generalization
Authors	Daniel Gissin, Shai Shalev-Shwartz, Amit Daniely
Abstract	A leading hypothesis for the surprising generalization of neural networks is that the dynamics of gradient descent bias the model towards simple solutions, by searching through the solution space in an incremental order of complexity. We formally define the notion of incremental learning dynamics and derive the conditions on depth and initialization for which this phenomenon arises in deep linear models. Our main theoretical contribution is a dynamical depth separation result, proving that while shallow models can exhibit incremental learning dynamics, they require the initialization to be exponentially small for these dynamics to present themselves. However, once the model becomes deeper, the dependence becomes polynomial and incremental learning can arise in more natural settings. We complement our theoretical findings by experimenting with deep matrix sensing, quadratic neural networks and with binary classification using diagonal and convolutional linear networks, showing all of these models exhibit incremental learning.
Tasks
Published	2019-09-26
URL	https://arxiv.org/abs/1909.12051v2
PDF	https://arxiv.org/pdf/1909.12051v2.pdf
PWC	https://paperswithcode.com/paper/the-implicit-bias-of-depth-how-incremental
Repo	https://github.com/dsgissin/Incremental-Learning
Framework	tf

Geometric Back-projection Network for Point Cloud Classification


Title	Geometric Back-projection Network for Point Cloud Classification
Authors	Shi Qiu, Saeed Anwar, Nick Barnes
Abstract	As the basic task of point cloud learning, classification is fundamental but always challenging. To address some unsolved problems of existing methods, we propose a CNN-based network leveraging the idea of an error-correcting feedback structure to comprehensively capture the local features of 3D point clouds. Besides, we also enrich the explicit and implicit geometric information of point clouds in low-level 3D space and high-level feature space, respectively. By applying an attention module based on channel affinity that focuses on distinct channels, the learned feature map of our network can effectively avoid redundancy. The performance on synthetic and real-world datasets demonstrates the superiority and applicability of our network. Compared with other state-of-the-art methods, our approach balances accuracy and efficiency.
Tasks
Published	2019-11-28
URL	https://arxiv.org/abs/1911.12885v3
PDF	https://arxiv.org/pdf/1911.12885v3.pdf
PWC	https://paperswithcode.com/paper/geometric-feedback-network-for-point-cloud
Repo	https://github.com/ShiQiu0419/GFNet
Framework	pytorch

Towards Robust Deep Reinforcement Learning for Traffic Signal Control: Demand Surges, Incidents and Sensor Failures


Title	Towards Robust Deep Reinforcement Learning for Traffic Signal Control: Demand Surges, Incidents and Sensor Failures
Authors	Filipe Rodrigues, Carlos Lima Azevedo
Abstract	Reinforcement learning (RL) constitutes a promising solution for alleviating the problem of traffic congestion. In particular, deep RL algorithms have been shown to produce adaptive traffic signal controllers that outperform conventional systems. However, in order to be reliable in highly dynamic urban areas, such controllers need to be robust with respect to a series of exogenous sources of uncertainty. In this paper, we develop an open-source callback-based framework for promoting the flexible evaluation of different deep RL configurations under a traffic simulation environment. With this framework, we investigate how deep RL-based adaptive traffic controllers perform under different scenarios, namely under demand surges caused by special events, capacity reductions from incidents and sensor failures. We extract several key insights for the development of robust deep RL algorithms for traffic control and propose concrete designs to mitigate the impact of the considered exogenous uncertainties.
Tasks
Published	2019-04-17
URL	https://arxiv.org/abs/1904.08353v2
PDF	https://arxiv.org/pdf/1904.08353v2.pdf
PWC	https://paperswithcode.com/paper/towards-robust-deep-reinforcement-learning
Repo	https://github.com/fmpr/CAREL
Framework	tf

Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference


Title	Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference
Authors	R. Thomas McCoy, Ellie Pavlick, Tal Linzen
Abstract	A machine learning system can score well on a given test set by relying on heuristics that are effective for frequent example types but break down in more challenging cases. We study this issue within natural language inference (NLI), the task of determining whether one sentence entails another. We hypothesize that statistical NLI models may adopt three fallible syntactic heuristics: the lexical overlap heuristic, the subsequence heuristic, and the constituent heuristic. To determine whether models have adopted these heuristics, we introduce a controlled evaluation set called HANS (Heuristic Analysis for NLI Systems), which contains many examples where the heuristics fail. We find that models trained on MNLI, including BERT, a state-of-the-art model, perform very poorly on HANS, suggesting that they have indeed adopted these heuristics. We conclude that there is substantial room for improvement in NLI systems, and that the HANS dataset can motivate and measure progress in this area.
Tasks	Natural Language Inference
Published	2019-02-04
URL	https://arxiv.org/abs/1902.01007v4
PDF	https://arxiv.org/pdf/1902.01007v4.pdf
PWC	https://paperswithcode.com/paper/right-for-the-wrong-reasons-diagnosing
Repo	https://github.com/tommccoy1/hans
Framework	none
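
To make the lexical overlap heuristic from the abstract above concrete, here is a minimal sketch of a "model" that applies nothing but that heuristic. The function names, the tokeniser, and the example sentences are illustrative assumptions and are not taken from the HANS repository.

```python
import re


def tokens(sentence):
    # Lowercase word tokens; punctuation is dropped.
    return re.findall(r"[a-z']+", sentence.lower())


def lexical_overlap_heuristic(premise, hypothesis):
    """Predict entailment whenever every hypothesis word occurs in the premise."""
    premise_words = set(tokens(premise))
    return all(word in premise_words for word in tokens(hypothesis))


premise = "The doctor was paid by the actor."
hypothesis = "The doctor paid the actor."
heuristic_says_entailment = lexical_overlap_heuristic(premise, hypothesis)
```

The heuristic predicts entailment for this pair because all hypothesis words appear in the premise, although the premise says the actor did the paying. Pairs of this kind are where a purely overlap-based predictor breaks down.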

Differentially Private Bayesian Linear Regression


Title	Differentially Private Bayesian Linear Regression
Authors	Garrett Bernstein, Daniel Sheldon
Abstract	Linear regression is an important tool across many fields that work with sensitive human-sourced data. Significant prior work has focused on producing differentially private point estimates, which provide a privacy guarantee to individuals while still allowing modelers to draw insights from data by estimating regression coefficients. We investigate the problem of Bayesian linear regression, with the goal of computing posterior distributions that correctly quantify uncertainty given privately released statistics. We show that a naive approach that ignores the noise injected by the privacy mechanism does a poor job in realistic data settings. We then develop noise-aware methods that perform inference over the privacy mechanism and produce correct posteriors across a wide range of scenarios.
Tasks
Published	2019-10-29
URL	https://arxiv.org/abs/1910.13153v1
PDF	https://arxiv.org/pdf/1910.13153v1.pdf
PWC	https://paperswithcode.com/paper/differentially-private-bayesian-linear
Repo	https://github.com/gbernstein6/private_bayesian_regression
Framework	none
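
The "noise injected by the privacy mechanism" in the abstract above can be pictured with a small sketch of the Laplace mechanism applied to regression sufficient statistics. The choice of statistics, the clipping range of [-1, 1], the sensitivity bound, and all function names are assumptions for illustration and may differ from what the paper and its repository do.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two independent Exp(1) draws is Laplace(0, 1).
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def private_sufficient_statistics(xs, ys, epsilon, rng):
    """Release noisy (sum x^2, sum x*y, sum y^2) for one-dimensional data.

    Inputs are clipped to [-1, 1] (an assumption) so sensitivity is bounded.
    """
    def clip(v):
        return max(-1.0, min(1.0, v))

    xs = [clip(x) for x in xs]
    ys = [clip(y) for y in ys]
    exact = [
        sum(x * x for x in xs),
        sum(x * y for x, y in zip(xs, ys)),
        sum(y * y for y in ys),
    ]
    # Adding or removing one clipped record changes each statistic by at
    # most 1, so the L1 sensitivity of the 3-vector is at most 3.
    scale = 3.0 / epsilon
    return [s + laplace_noise(scale, rng) for s in exact]


rng = random.Random(0)
xs = [0.1, 0.5, -0.3]
ys = [0.2, 1.0, -0.6]
noisy_xx, noisy_xy, noisy_yy = private_sufficient_statistics(xs, ys, 1.0, rng)
naive_slope = noisy_xy / noisy_xx  # treats the noisy statistics as exact
```

The last line is the kind of naive estimate the abstract warns about: it plugs the noisy statistics into the usual formula and ignores the added noise, which can dominate the signal when the dataset is small or epsilon is low.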

Efficient Adaptation of Pretrained Transformers for Abstractive Summarization


Title	Efficient Adaptation of Pretrained Transformers for Abstractive Summarization
Authors	Andrew Hoang, Antoine Bosselut, Asli Celikyilmaz, Yejin Choi
Abstract	Large-scale learning of transformer language models has yielded improvements on a variety of natural language understanding tasks. Whether they can be effectively adapted for summarization, however, has been less explored, as the learned representations are less seamlessly integrated into existing neural text production architectures. In this work, we propose two solutions for efficiently adapting pretrained transformer language models as text summarizers: source embeddings and domain-adaptive training. We test these solutions on three abstractive summarization datasets, achieving new state of the art performance on two of them. Finally, we show that these improvements are achieved by producing more focused summaries with fewer superfluous and that performance improvements are more pronounced on more abstractive datasets.
Tasks	Abstractive Text Summarization
Published	2019-06-01
URL	https://arxiv.org/abs/1906.00138v1
PDF	https://arxiv.org/pdf/1906.00138v1.pdf
PWC	https://paperswithcode.com/paper/190600138
Repo	https://github.com/Andrew03/transformer-abstractive-summarization
Framework	pytorch

Deep Generalized Method of Moments for Instrumental Variable Analysis


Title	Deep Generalized Method of Moments for Instrumental Variable Analysis
Authors	Andrew Bennett, Nathan Kallus, Tobias Schnabel
Abstract	Instrumental variable analysis is a powerful tool for estimating causal effects when randomization or full control of confounders is not possible. The application of standard methods such as 2SLS, GMM, and more recent variants are significantly impeded when the causal effects are complex, the instruments are high-dimensional, and/or the treatment is high-dimensional. In this paper, we propose the DeepGMM algorithm to overcome this. Our algorithm is based on a new variational reformulation of GMM with optimal inverse-covariance weighting that allows us to efficiently control very many moment conditions. We further develop practical techniques for optimization and model selection that make it particularly successful in practice. Our algorithm is also computationally tractable and can handle large-scale datasets. Numerical results show our algorithm matches the performance of the best tuned methods in standard settings and continues to work in high-dimensional settings where even recent methods break.
Tasks	Model Selection
Published	2019-05-29
URL	https://arxiv.org/abs/1905.12495v1
PDF	https://arxiv.org/pdf/1905.12495v1.pdf
PWC	https://paperswithcode.com/paper/deep-generalized-method-of-moments-for
Repo	https://github.com/CausalML/DeepGMM
Framework	pytorch
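
For context on the baseline the abstract calls 2SLS, the sketch below shows the simplest instrumental-variable case: one instrument, one treatment, and a linear effect, where two-stage least squares reduces to a ratio of covariances. It is not DeepGMM, and the toy data and function names are assumptions chosen so the arithmetic is easy to follow.

```python
def mean(values):
    return sum(values) / len(values)


def covariance(a, b):
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)


def ols_slope(x, y):
    # Ordinary least squares slope of y on x (with intercept).
    return covariance(x, y) / covariance(x, x)


def iv_slope(z, x, y):
    """Single-instrument 2SLS estimate, which equals cov(z, y) / cov(z, x)."""
    return covariance(z, y) / covariance(z, x)


# Toy data: u is an unobserved confounder, uncorrelated with the instrument z.
z = [0.0, 0.0, 1.0, 1.0]
u = [1.0, -1.0, 1.0, -1.0]
x = [zi + ui for zi, ui in zip(z, u)]              # treatment
y = [2.0 * xi + 3.0 * ui for xi, ui in zip(x, u)]  # true causal effect is 2

biased = ols_slope(x, y)       # about 4.4, inflated by the confounder
recovered = iv_slope(z, x, y)  # 2.0, the true effect
```

Because the instrument moves the treatment but is unrelated to the confounder, the covariance ratio isolates the causal slope, while plain regression absorbs the confounder's contribution.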

Aggregation via Separation: Boosting Facial Landmark Detector with Semi-Supervised Style Translation


Title	Aggregation via Separation: Boosting Facial Landmark Detector with Semi-Supervised Style Translation
Authors	Shengju Qian, Keqiang Sun, Wayne Wu, Chen Qian, Jiaya Jia
Abstract	Facial landmark detection, or face alignment, is a fundamental task that has been extensively studied. In this paper, we investigate a new perspective of facial landmark detection and demonstrate it leads to further notable improvement. Given that any face image can be factored into a style space that captures lighting, texture and image environment, and a style-invariant structure space, our key idea is to leverage the disentangled style and shape spaces of each individual to augment existing structures via style translation. With these augmented synthetic samples, our semi-supervised model surprisingly outperforms the fully-supervised one by a large margin. Extensive experiments verify the effectiveness of our idea with state-of-the-art results on WFLW, 300W, COFW, and AFLW datasets. Our proposed structure is general and could be assembled into any face alignment framework. The code is made publicly available at https://github.com/thesouthfrog/stylealign.
Tasks	Face Alignment, Facial Landmark Detection
Published	2019-08-18
URL	https://arxiv.org/abs/1908.06440v1
PDF	https://arxiv.org/pdf/1908.06440v1.pdf
PWC	https://paperswithcode.com/paper/aggregation-via-separation-boosting-facial
Repo	https://github.com/thesouthfrog/stylealign
Framework	tf

On the Optimality of Perturbations in Stochastic and Adversarial Multi-armed Bandit Problems


Title	On the Optimality of Perturbations in Stochastic and Adversarial Multi-armed Bandit Problems
Authors	Baekjin Kim, Ambuj Tewari
Abstract	We investigate the optimality of perturbation based algorithms in the stochastic and adversarial multi-armed bandit problems. For the stochastic case, we provide a unified regret analysis for both sub-Weibull and bounded perturbations when rewards are sub-Gaussian. Our bounds are instance optimal for sub-Weibull perturbations with parameter 2 that also have a matching lower tail bound, and all bounded support perturbations where there is sufficient probability mass at the extremes of the support. For the adversarial setting, we prove rigorous barriers against two natural solution approaches using tools from discrete choice theory and extreme value theory. Our results suggest that the optimal perturbation, if it exists, will be of Frechet-type.
Tasks
Published	2019-02-02
URL	https://arxiv.org/abs/1902.00610v4
PDF	https://arxiv.org/pdf/1902.00610v4.pdf
PWC	https://paperswithcode.com/paper/on-the-optimality-of-perturbations-in
Repo	https://github.com/Kimbaekjin/Perturbation-Methods-StochasticMAB
Framework	none
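
A perturbation-based bandit algorithm of the kind analysed in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions: Gaussian noise stands in for a perturbation with Gaussian-like tails, the noise is scaled by one over the square root of the pull count, rewards are Bernoulli, and the function name is hypothetical. It is not taken from the authors' repository.

```python
import math
import random


def perturbed_leader_bandit(arm_means, rounds, rng):
    """Each round, pull the arm with the largest perturbed empirical mean."""
    k = len(arm_means)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(rounds):
        if t < k:
            arm = t  # pull every arm once to initialise the estimates
        else:
            indices = [
                sums[i] / counts[i] + rng.gauss(0.0, 1.0) / math.sqrt(counts[i])
                for i in range(k)
            ]
            arm = max(range(k), key=lambda i: indices[i])
        # Bernoulli reward with the chosen arm's success probability.
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts


pull_counts = perturbed_leader_bandit([0.2, 0.8], 2000, random.Random(0))
```

The random perturbation supplies the exploration: rarely pulled arms receive larger noise and are occasionally tried again, while frequently pulled arms are judged almost entirely on their empirical mean.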

Learning to compress and search visual data in large-scale systems


Title	Learning to compress and search visual data in large-scale systems
Authors	Sohrab Ferdowsi
Abstract	The problem of high-dimensional and large-scale representation of visual data is addressed from an unsupervised learning perspective. The emphasis is put on discrete representations, where the description length can be measured in bits and hence the model capacity can be controlled. The algorithmic infrastructure is developed based on the synthesis and analysis prior models whose rate-distortion properties, as well as capacity vs. sample complexity trade-offs are carefully optimized. These models are then extended to multi-layers, namely the RRQ and the ML-STC frameworks, where the latter is further evolved as a powerful deep neural network architecture with fast and sample-efficient training and discrete representations. For the developed algorithms, three important applications are developed. First, the problem of large-scale similarity search in retrieval systems is addressed, where a double-stage solution is proposed leading to faster query times and shorter database storage. Second, the problem of learned image compression is targeted, where the proposed models can capture more redundancies from the training images than the conventional compression codecs. Finally, the proposed algorithms are used to solve ill-posed inverse problems. In particular, the problems of image denoising and compressive sensing are addressed with promising results.
Tasks	Compressive Sensing, Denoising, Image Compression, Image Denoising
Published	2019-01-24
URL	http://arxiv.org/abs/1901.08437v1
PDF	http://arxiv.org/pdf/1901.08437v1.pdf
PWC	https://paperswithcode.com/paper/learning-to-compress-and-search-visual-data
Repo	https://github.com/sssohrab/PhDthesis
Framework	none

vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations


Title	vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations
Authors	Alexei Baevski, Steffen Schneider, Michael Auli
Abstract	We propose vq-wav2vec to learn discrete representations of audio segments through a wav2vec-style self-supervised context prediction task. The algorithm uses either a gumbel softmax or online k-means clustering to quantize the dense representations. Discretization enables the direct application of algorithms from the NLP community which require discrete inputs. Experiments show that BERT pre-training achieves a new state of the art on TIMIT phoneme classification and WSJ speech recognition.
Tasks	Speech Recognition
Published	2019-10-12
URL	https://arxiv.org/abs/1910.05453v3
PDF	https://arxiv.org/pdf/1910.05453v3.pdf
PWC	https://paperswithcode.com/paper/vq-wav2vec-self-supervised-learning-of-1
Repo	https://github.com/pytorch/fairseq
Framework	pytorch
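
The k-means variant of quantization mentioned in the vq-wav2vec abstract amounts to replacing each dense vector with the index of its nearest codebook entry. The sketch below shows only that assignment step, with a fixed toy codebook; codebook training, the Gumbel softmax alternative, and the actual fairseq implementation are out of scope, and all names and values here are illustrative assumptions.

```python
def nearest_codeword(vector, codebook):
    """Return the index of the codebook entry closest to vector (squared L2)."""
    def squared_distance(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    return min(
        range(len(codebook)),
        key=lambda k: squared_distance(vector, codebook[k]),
    )


def quantize(frames, codebook):
    # Map every dense frame representation to a discrete token id.
    return [nearest_codeword(frame, codebook) for frame in frames]


codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]
frames = [[0.1, -0.2], [0.9, 1.2], [-0.8, 1.7], [0.6, 0.7]]
token_ids = quantize(frames, codebook)  # [0, 1, 2, 1]
```

Once audio frames are expressed as token ids like these, they can be fed to models that expect discrete inputs, which is the property the abstract relies on for BERT pre-training.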

Bayesian Volumetric Autoregressive generative models for better semisupervised learning


Title	Bayesian Volumetric Autoregressive generative models for better semisupervised learning
Authors	Guilherme Pombo, Robert Gray, Tom Varsavsky, John Ashburner, Parashkev Nachev
Abstract	Deep generative models are rapidly gaining traction in medical imaging. Nonetheless, most generative architectures struggle to capture the underlying probability distributions of volumetric data, exhibit convergence problems, and offer no robust indices of model uncertainty. By comparison, the autoregressive generative model PixelCNN can be extended to volumetric data with relative ease; it readily attempts to learn the true underlying probability distribution, and it still admits a Bayesian reformulation that provides a principled framework for reasoning about model uncertainty. Our contributions in this paper are twofold: first, we extend PixelCNN to work with volumetric brain magnetic resonance imaging data. Second, we show that reformulating this model to approximate a deep Gaussian process yields a measure of uncertainty that improves the performance of semi-supervised learning, in particular classification performance in settings where the proportion of labelled data is low. We quantify this improvement across classification, regression, and semantic segmentation tasks, training and testing on clinical magnetic resonance brain imaging data comprising T1-weighted and diffusion-weighted sequences.
Tasks	Semantic Segmentation
Published	2019-07-26
URL	https://arxiv.org/abs/1907.11559v1
PDF	https://arxiv.org/pdf/1907.11559v1.pdf
PWC	https://paperswithcode.com/paper/bayesian-volumetric-autoregressive-generative
Repo	https://github.com/guilherme-pombo/3DPixelCNN
Framework	tf

RTHN: A RNN-Transformer Hierarchical Network for Emotion Cause Extraction


Title	RTHN: A RNN-Transformer Hierarchical Network for Emotion Cause Extraction
Authors	Rui Xia, Mengran Zhang, Zixiang Ding
Abstract	The emotion cause extraction (ECE) task aims at discovering the potential causes behind a certain emotion expression in a document. Techniques including rule-based methods, traditional machine learning methods and deep neural networks have been proposed to solve this task. However, most of the previous work considered ECE as a set of independent clause classification problems and ignored the relations between multiple clauses in a document. In this work, we propose a joint emotion cause extraction framework, named RNN-Transformer Hierarchical Network (RTHN), to encode and classify multiple clauses synchronously. RTHN is composed of a lower word-level encoder based on RNNs to encode multiple words in each clause, and an upper clause-level encoder based on Transformer to learn the correlation between multiple clauses in a document. We furthermore propose ways to encode the relative position and global predication information into Transformer that can capture the causality between clauses and make RTHN more efficient. We finally achieve the best performance among 12 compared systems and improve the F1 score of the state-of-the-art from 72.69% to 76.77%.
Tasks
Published	2019-06-04
URL	https://arxiv.org/abs/1906.01236v1
PDF	https://arxiv.org/pdf/1906.01236v1.pdf
PWC	https://paperswithcode.com/paper/rthn-a-rnn-transformer-hierarchical-network
Repo	https://github.com/NUSTM/RTHN
Framework	tf