February 1, 2020

Paper Group AWR 120

Locality-constrained Spatial Transformer Network for Video Crowd Counting


Title	Locality-constrained Spatial Transformer Network for Video Crowd Counting
Authors	Yanyan Fang, Biyun Zhan, Wandi Cai, Shenghua Gao, Bo Hu
Abstract	Compared with single-image crowd counting, video provides spatial-temporal information about the crowd that can help improve the robustness of crowd counting. However, translation, rotation and scaling of people change the density map of heads between neighbouring frames. Meanwhile, people walking in or out, or being occluded in dynamic scenes, change the head counts. To alleviate these issues in video crowd counting, a Locality-constrained Spatial Transformer Network (LSTN) is proposed. Specifically, we first leverage a Convolutional Neural Network to estimate the density map for each frame. Then, to relate the density maps between neighbouring frames, a Locality-constrained Spatial Transformer (LST) module is introduced to estimate the density map of the next frame from that of the current frame. To facilitate the performance evaluation, a large-scale video crowd counting dataset is collected, which contains 15K frames with about 394K annotated heads captured from 13 different scenes. As far as we know, it is the largest video crowd counting dataset. Extensive experiments on our dataset and other crowd counting datasets validate the effectiveness of our LSTN for crowd counting.
Tasks	Crowd Counting
Published	2019-07-18
URL	https://arxiv.org/abs/1907.07911v1
PDF	https://arxiv.org/pdf/1907.07911v1.pdf
PWC	https://paperswithcode.com/paper/locality-constrained-spatial-transformer
Repo	https://github.com/sweetyy83/Lstn_fdst_dataset
Framework	none

Robustness of Generalized Learning Vector Quantization Models against Adversarial Attacks


Title	Robustness of Generalized Learning Vector Quantization Models against Adversarial Attacks
Authors	Sascha Saralajew, Lars Holdijk, Maike Rees, Thomas Villmann
Abstract	Adversarial attacks and the development of (deep) neural networks robust against them are currently two widely researched topics. The robustness of Learning Vector Quantization (LVQ) models against adversarial attacks has however not yet been studied to the same extent. We therefore present an extensive evaluation of three LVQ models: Generalized LVQ, Generalized Matrix LVQ and Generalized Tangent LVQ. The evaluation suggests that both Generalized LVQ and Generalized Tangent LVQ have a high base robustness, on par with the current state-of-the-art in robust neural network methods. In contrast to this, Generalized Matrix LVQ shows a high susceptibility to adversarial attacks, scoring consistently behind all other models. Additionally, our numerical evaluation indicates that increasing the number of prototypes per class improves the robustness of the models.
Tasks	Quantization
Published	2019-02-01
URL	http://arxiv.org/abs/1902.00577v2
PDF	http://arxiv.org/pdf/1902.00577v2.pdf
PWC	https://paperswithcode.com/paper/robustness-of-generalized-learning-vector
Repo	https://github.com/LarsHoldijk/robust_LVQ_models
Framework	tf
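
For readers unfamiliar with LVQ, the following minimal sketch shows nearest-prototype classification and the GLVQ relative distance difference mu = (d_plus - d_minus) / (d_plus + d_minus), which is negative when a sample is classified correctly. It assumes squared Euclidean distance; the prototypes and the sample are made up for illustration, and this is not taken from the authors' repository.

```python
from typing import List, Sequence, Tuple

# A prototype is a (vector, class label) pair. All values below are hypothetical.
Prototype = Tuple[Sequence[float], int]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict(x: Sequence[float], prototypes: List[Prototype]) -> int:
    """Return the label of the closest prototype."""
    return min(prototypes, key=lambda p: sq_dist(x, p[0]))[1]


def glvq_mu(x: Sequence[float], label: int, prototypes: List[Prototype]) -> float:
    """Relative distance difference in [-1, 1]; negative means correctly classified.

    Assumes x does not coincide with prototypes of both classes at once,
    otherwise the denominator would be zero.
    """
    d_plus = min(sq_dist(x, w) for w, c in prototypes if c == label)
    d_minus = min(sq_dist(x, w) for w, c in prototypes if c != label)
    return (d_plus - d_minus) / (d_plus + d_minus)


# Two prototypes for class 0 and one for class 1 (illustrative only).
PROTOTYPES: List[Prototype] = [((0.0, 0.0), 0), ((1.0, 1.0), 0), ((4.0, 4.0), 1)]

sample = (1.0, 0.0)
print(predict(sample, PROTOTYPES), glvq_mu(sample, 0, PROTOTYPES))
```

Adding prototypes to a class can only shrink d_plus for samples of that class, which is one informal way to read the abstract's observation about the number of prototypes per class.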

Adversarial Defense by Suppressing High-frequency Components


Title	Adversarial Defense by Suppressing High-frequency Components
Authors	Zhendong Zhang, Cheolkon Jung, Xiaolong Liang
Abstract	Recent works show that deep neural networks trained on image classification datasets are biased towards textures. Those models are easily fooled by applying small high-frequency perturbations to clean images. In this paper, we learn robust image classification models by removing high-frequency components. Specifically, we develop a differentiable high-frequency suppression module based on the discrete Fourier transform (DFT). Combining this module with adversarial training, we won 5th place in the IJCAI-2019 Alibaba Adversarial AI Challenge. Our code is available online.
Tasks	Adversarial Defense, Image Classification
Published	2019-08-19
URL	https://arxiv.org/abs/1908.06566v3
PDF	https://arxiv.org/pdf/1908.06566v3.pdf
PWC	https://paperswithcode.com/paper/adversarial-defense-by-suppressing-high
Repo	https://github.com/zzd1992/Adversarial-Defense-by-Suppressing-High-Frequencies
Framework	pytorch
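
To make the idea of suppressing high-frequency components concrete, here is a toy sketch under strong simplifying assumptions: a 1D signal instead of an image, a naive O(n^2) DFT from the standard library, and a hard cutoff mask. The function names and the `cutoff` parameter are hypothetical; the paper's module is differentiable and operates on images, and its exact mask is not described in the abstract.

```python
import cmath
import math
from typing import List


def dft(x: List[float]) -> List[complex]:
    """Naive discrete Fourier transform (fine for short illustrative signals)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum: List[complex]) -> List[complex]:
    """Inverse of dft above."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def suppress_high_freq(signal: List[float], cutoff: int) -> List[float]:
    """Zero every DFT coefficient whose frequency index exceeds `cutoff`."""
    n = len(signal)
    spectrum = dft(signal)
    for k in range(n):
        freq = min(k, n - k)  # bins k and n - k carry the same frequency for real input
        if freq > cutoff:
            spectrum[k] = 0
    return [v.real for v in idft(spectrum)]


N = 16
clean = [math.cos(2 * math.pi * t / N) for t in range(N)]        # low-frequency content
noisy = [c + 0.2 * (-1) ** t for t, c in enumerate(clean)]       # plus highest-frequency noise
restored = suppress_high_freq(noisy, cutoff=2)
print(max(abs(r - c) for r, c in zip(restored, clean)))
```

In this toy case the perturbation lives entirely in the highest frequency bin, so the hard mask removes it while leaving the low-frequency signal intact. Real adversarial perturbations are not confined to a single bin.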

Personalizing Graph Neural Networks with Attention Mechanism for Session-based Recommendation


Title	Personalizing Graph Neural Networks with Attention Mechanism for Session-based Recommendation
Authors	Shu Wu, Mengqi Zhang, Xin Jiang, Xu Ke, Liang Wang
Abstract	The problem of personalized session-based recommendation aims to predict users’ next click based on their sequential behaviors. Existing session-based recommendation methods treat all sessions of a user as a single sequence, ignoring the relationships among sessions. Moreover, most of them neglect complex transitions between items and the collaborative relationship between users and items. To this end, we propose a novel method, named Personalizing Graph Neural Networks with Attention Mechanism, or A-PGNN for brevity. A-PGNN mainly consists of two components. One is the Personalizing Graph Neural Network (PGNN), which is used to capture complex transitions in the user session sequence. Compared with the traditional Graph Neural Network (GNN) model, it also considers the role of users in the sequence. The other is the Dot-Product Attention mechanism, which draws on the attention mechanism in machine translation to explicitly model the effect of historical sessions on the current session. These two parts make it possible to learn the multi-level transition relationships between items and sessions in a user-specific fashion. Extensive experiments conducted on two real-world data sets show that A-PGNN consistently and significantly outperforms state-of-the-art personalized session-based recommendation methods.
Tasks	Machine Translation, Session-Based Recommendations
Published	2019-10-20
URL	https://arxiv.org/abs/1910.08887v2
PDF	https://arxiv.org/pdf/1910.08887v2.pdf
PWC	https://paperswithcode.com/paper/personalizing-graph-neural-networks-with
Repo	https://github.com/CRIPAC-DIG/A-PGNN
Framework	tf
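
The role of dot-product attention over historical sessions can be illustrated with a small sketch. It assumes each session is already summarized as a fixed-length embedding and uses plain unscaled dot products with a softmax; the vectors are invented, and any projections or scaling used in A-PGNN itself are not shown here.

```python
import math
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Sequence[float],
           keys: List[Sequence[float]],
           values: List[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Weight each value by the softmax of the query-key dot products."""
    weights = softmax([dot(query, k) for k in keys])
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, context


# Hypothetical embeddings: one current session and three historical sessions.
current_session = [1.0, 0.0]
historical_sessions = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

weights, context = attend(current_session, historical_sessions, historical_sessions)
print(weights, context)
```

Historical sessions whose embeddings align with the current session receive larger weights, so the resulting context vector is dominated by the most similar past behavior.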

C^3 Framework: An Open-source PyTorch Code for Crowd Counting


Title	C^3 Framework: An Open-source PyTorch Code for Crowd Counting
Authors	Junyu Gao, Wei Lin, Bin Zhao, Dong Wang, Chenyu Gao, Jun Wen
Abstract	This technical report attempts to provide an efficient and solid toolkit for the field of crowd counting, denoted the Crowd Counting Code Framework (C^3F). The contributions of C^3F are threefold: 1) Some solid baseline networks are presented, which achieve state-of-the-art results. 2) Some flexible parameter setting strategies are provided to further improve performance. 3) A powerful logging system is developed to record the experiment process, which can enhance the reproducibility of each experiment. Our code is made publicly available at https://github.com/gjy3035/C-3-Framework. Furthermore, we also post a Chinese blog (https://zhuanlan.zhihu.com/p/65650998) to describe the details and insights of crowd counting.
Tasks	Crowd Counting
Published	2019-07-05
URL	https://arxiv.org/abs/1907.02724v1
PDF	https://arxiv.org/pdf/1907.02724v1.pdf
PWC	https://paperswithcode.com/paper/c3-framework-an-open-source-pytorch-code-for
Repo	https://github.com/surajdakua/Crowd-Counting-Using-Pytorch
Framework	pytorch

Attention Is (not) All You Need for Commonsense Reasoning


Title	Attention Is (not) All You Need for Commonsense Reasoning
Authors	Tassilo Klein, Moin Nabi
Abstract	The recently introduced BERT model exhibits strong performance on several language understanding benchmarks. In this paper, we describe a simple re-implementation of BERT for commonsense reasoning. We show that the attentions produced by BERT can be directly utilized for tasks such as the Pronoun Disambiguation Problem and Winograd Schema Challenge. Our proposed attention-guided commonsense reasoning method is conceptually simple yet empirically powerful. Experimental analysis on multiple datasets demonstrates that our proposed system performs remarkably well on all cases while outperforming the previously reported state of the art by a margin. While results suggest that BERT seems to implicitly learn to establish complex relationships between entities, solving commonsense reasoning tasks might require more than unsupervised models learned from huge text corpora.
Tasks
Published	2019-05-31
URL	https://arxiv.org/abs/1905.13497v1
PDF	https://arxiv.org/pdf/1905.13497v1.pdf
PWC	https://paperswithcode.com/paper/attention-is-not-all-you-need-for-commonsense
Repo	https://github.com/SAP-samples/acl2019-commonsense-reasoning
Framework	pytorch

Fast 3D Line Segment Detection From Unorganized Point Cloud


Title	Fast 3D Line Segment Detection From Unorganized Point Cloud
Authors	Xiaohu Lu, Yahui Liu, Kai Li
Abstract	This paper presents a very simple but efficient algorithm for 3D line segment detection from large-scale unorganized point clouds. Unlike traditional methods, which usually extract 3D edge points first and then link them to fit 3D line segments, we propose a very simple 3D line segment detection algorithm based on point cloud segmentation and 2D line detection. Given the input unorganized point cloud, three steps are performed to detect 3D line segments. Firstly, the point cloud is segmented into 3D planes via region growing and region merging. Secondly, for each 3D plane, all the points belonging to it are projected onto the plane itself to form a 2D image, which is followed by 2D contour extraction and Least Square Fitting to get the 2D line segments. Those 2D line segments are then re-projected onto the 3D plane to get the corresponding 3D line segments. Finally, a post-processing procedure is proposed to eliminate outliers and merge adjacent 3D line segments. Experiments on several public datasets demonstrate the efficiency and robustness of our method. More results and the C++ source code of the proposed algorithm are publicly available at https://github.com/xiaohulugo/3DLineDetection.
Tasks	Line Segment Detection
Published	2019-01-08
URL	http://arxiv.org/abs/1901.02532v1
PDF	http://arxiv.org/pdf/1901.02532v1.pdf
PWC	https://paperswithcode.com/paper/fast-3d-line-segment-detection-from
Repo	https://github.com/xiaohulugo/3DLineDetection
Framework	none
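
The second step of the pipeline described above (project onto the plane, fit a 2D line, re-project to 3D) can be sketched as follows. This is a simplified illustration, not the authors' C++ implementation: it assumes the plane is already given by an origin and two orthonormal in-plane axes, skips the image rasterization and contour extraction, and fits a single segment with a total least squares (principal axis) fit. All function names are hypothetical.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Vec2 = Tuple[float, float]


def dot(a: Vec3, b: Vec3) -> float:
    return sum(x * y for x, y in zip(a, b))


def to_plane(p: Vec3, origin: Vec3, u: Vec3, v: Vec3) -> Vec2:
    """2D coordinates of p in the plane spanned by orthonormal axes u and v."""
    d = (p[0] - origin[0], p[1] - origin[1], p[2] - origin[2])
    return (dot(d, u), dot(d, v))


def to_space(q: Vec2, origin: Vec3, u: Vec3, v: Vec3) -> Vec3:
    """Map plane coordinates back to 3D."""
    return tuple(o + q[0] * a + q[1] * b for o, a, b in zip(origin, u, v))


def fit_segment_2d(points: List[Vec2]) -> Tuple[Vec2, Vec2]:
    """Fit a line through the centroid along the principal axis, clipped to the data."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    sxx = sum((p[0] - cx) ** 2 for p in points)
    syy = sum((p[1] - cy) ** 2 for p in points)
    sxy = sum((p[0] - cx) * (p[1] - cy) for p in points)
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)  # direction of largest variance
    dx, dy = math.cos(theta), math.sin(theta)
    ts = [(p[0] - cx) * dx + (p[1] - cy) * dy for p in points]
    t0, t1 = min(ts), max(ts)
    return (cx + t0 * dx, cy + t0 * dy), (cx + t1 * dx, cy + t1 * dy)


def detect_segment_3d(points: List[Vec3], origin: Vec3, u: Vec3, v: Vec3) -> Tuple[Vec3, Vec3]:
    flat = [to_plane(p, origin, u, v) for p in points]
    a, b = fit_segment_2d(flat)
    return to_space(a, origin, u, v), to_space(b, origin, u, v)


# Collinear points on the plane z = 2 (illustrative data only).
pts = [(0.0, 0.0, 2.0), (1.0, 1.0, 2.0), (2.0, 2.0, 2.0), (3.0, 3.0, 2.0)]
start, end = detect_segment_3d(pts, (0.0, 0.0, 2.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
print(start, end)
```

Reducing the fit to 2D is what lets the method reuse standard 2D line detection; the plane basis carries the result back into 3D without further optimization.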

W-PoseNet: Dense Correspondence Regularized Pixel Pair Pose Regression


Title	W-PoseNet: Dense Correspondence Regularized Pixel Pair Pose Regression
Authors	Zelin Xu, Ke Chen, Kui Jia
Abstract	6D pose estimation is non-trivial because it must cope with intrinsic appearance and shape variation and severe inter-object occlusion, and it is made more challenging by large extrinsic illumination changes and the low quality of data acquired in uncontrolled environments. This paper introduces a novel pose estimation algorithm, W-PoseNet, which densely regresses from input data to 6D pose and also to 3D coordinates in model space. In other words, local features learned for pose regression in our deep network are regularized by explicitly learning a pixel-wise correspondence mapping onto 3D pose-sensitive coordinates as an auxiliary task. Moreover, a sparse pair combination of pixel-wise features and soft voting on pixel-pair pose predictions are designed to improve robustness to inconsistent and sparse local features. Experimental results on the popular YCB-Video and LineMOD benchmarks show that the proposed W-PoseNet consistently achieves superior performance to the state-of-the-art algorithms.
Tasks	6D Pose Estimation, Pose Estimation
Published	2019-12-26
URL	https://arxiv.org/abs/1912.11888v1
PDF	https://arxiv.org/pdf/1912.11888v1.pdf
PWC	https://paperswithcode.com/paper/w-posenet-dense-correspondence-regularized
Repo	https://github.com/xzlscut/W-PoseNet
Framework	none

STELA: A Real-Time Scene Text Detector with Learned Anchor


Title	STELA: A Real-Time Scene Text Detector with Learned Anchor
Authors	Linjie Deng, Yanxiang Gong, Xinchen Lu, Yi Lin, Zheng Ma, Mei Xie
Abstract	To achieve high coverage of target boxes, a common strategy of conventional one-stage anchor-based detectors is to utilize multiple priors at each spatial position, especially in scene text detection tasks. In this work, we present a simple and intuitive method for multi-oriented text detection where each location of the feature maps is associated with only one reference box. The idea is inspired by the two-stage R-CNN framework, which can estimate the location of objects of any shape by using learned proposals. The aim of our method is to integrate this mechanism into a one-stage detector and to employ the learned anchor, obtained through a regression operation, in place of the original one in the final predictions. Based on RetinaNet, our method achieves competitive performance on several public benchmarks with real-time efficiency (26.5 fps at 800p), which surpasses all anchor-based scene text detectors. In addition, since it demands less attention to anchor design, we believe our method can easily be applied to other analogous detection tasks. The code will be publicly available at https://github.com/xhzdeng/stela.
Tasks	Scene Text Detection
Published	2019-09-17
URL	https://arxiv.org/abs/1909.07549v2
PDF	https://arxiv.org/pdf/1909.07549v2.pdf
PWC	https://paperswithcode.com/paper/stela-a-real-time-scene-text-detector-with
Repo	https://github.com/xhzdeng/stela
Framework	pytorch

Practical Calculation of Gittins Indices for Multi-armed Bandits


Title	Practical Calculation of Gittins Indices for Multi-armed Bandits
Authors	James Edwards
Abstract	Gittins indices provide an optimal solution to the classical multi-armed bandit problem. An obstacle to their use has been the common perception that their computation is very difficult. This paper demonstrates an accessible general methodology for calculating Gittins indices for the multi-armed bandit, with a detailed study of the cases of Bernoulli and Gaussian rewards. With accompanying easy-to-use open source software, this work removes computation as a barrier to using Gittins indices in these commonly found settings.
Tasks	Multi-Armed Bandits
Published	2019-09-11
URL	https://arxiv.org/abs/1909.05075v1
PDF	https://arxiv.org/pdf/1909.05075v1.pdf
PWC	https://paperswithcode.com/paper/practical-calculation-of-gittins-indices-for
Repo	https://github.com/jedwards24/gittins
Framework	none

Toward Unsupervised Text Content Manipulation


Title	Toward Unsupervised Text Content Manipulation
Authors	Wentao Wang, Zhiting Hu, Zichao Yang, Haoran Shi, Frank Xu, Eric Xing
Abstract	Controlled generation of text is of high practical use. Recent efforts have made impressive progress in generating or editing sentences with given textual attributes (e.g., sentiment). This work studies a new practical setting of text content manipulation. Given a structured record, such as ‘(PLAYER: Lebron, POINTS: 20, ASSISTS: 10)’, and a reference sentence, such as ‘Kobe easily dropped 30 points’, we aim to generate a sentence that accurately describes the full content in the record, with the same writing style (e.g., wording, transitions) of the reference. The problem is unsupervised due to lack of parallel data in practice, and is challenging to minimally yet effectively manipulate the text (by rewriting/adding/deleting text portions) to ensure fidelity to the structured content. We derive a dataset from a basketball game report corpus as our testbed, and develop a neural method with unsupervised competing objectives and explicit content coverage constraints. Automatic and human evaluations show superiority of our approach over competitive methods including a strong rule-based baseline and prior approaches designed for style transfer.
Tasks	Style Transfer
Published	2019-01-28
URL	http://arxiv.org/abs/1901.09501v2
PDF	http://arxiv.org/pdf/1901.09501v2.pdf
PWC	https://paperswithcode.com/paper/toward-unsupervised-text-content-manipulation
Repo	https://github.com/ZhitingHu/text_content_manipulation
Framework	none

Extraction of digital wavefront sets using applied harmonic analysis and deep neural networks


Title	Extraction of digital wavefront sets using applied harmonic analysis and deep neural networks
Authors	Héctor Andrade-Loarca, Gitta Kutyniok, Ozan Öktem, Philipp Petersen
Abstract	Microlocal analysis provides deep insight into singularity structures and is often crucial for solving inverse problems, predominantly in the imaging sciences. Of particular importance is the analysis of wavefront sets and their correct extraction. In this paper, we introduce the first algorithmic approach to extract the wavefront set of images, which combines data-based and model-based methods. Based on a celebrated property of the shearlet transform to unravel information on the wavefront set, we extract the wavefront set of an image by first applying a discrete shearlet transform and then feeding local patches of this transform to a deep convolutional neural network trained on labeled data. The resulting algorithm outperforms all competing algorithms in edge-orientation and ramp-orientation detection.
Tasks
Published	2019-01-05
URL	https://arxiv.org/abs/1901.01388v2
PDF	https://arxiv.org/pdf/1901.01388v2.pdf
PWC	https://paperswithcode.com/paper/extraction-of-digital-wavefront-sets-using
Repo	https://github.com/arsenal9971/DeNSE
Framework	none

Are Graph Neural Networks Miscalibrated?


Title	Are Graph Neural Networks Miscalibrated?
Authors	Leonardo Teixeira, Brian Jalaian, Bruno Ribeiro
Abstract	Graph Neural Networks (GNNs) have proven to be successful in many classification tasks, outperforming previous state-of-the-art methods in terms of accuracy. However, accuracy alone is not enough for high-stakes decision making. Decision makers want to know the likelihood that a specific GNN prediction is correct. For this purpose, obtaining calibrated models is essential. In this work, we perform an empirical evaluation of the calibration of state-of-the-art GNNs on multiple datasets. Our experiments show that GNNs can be calibrated in some datasets but also badly miscalibrated in others, and that state-of-the-art calibration methods are helpful but do not fix the problem.
Tasks	Calibration, Decision Making
Published	2019-05-07
URL	https://arxiv.org/abs/1905.02296v2
PDF	https://arxiv.org/pdf/1905.02296v2.pdf
PWC	https://paperswithcode.com/paper/are-graph-neural-networks-miscalibrated
Repo	https://github.com/PurdueMINDS/GNNsMiscalibrated
Framework	pytorch
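
As a concrete illustration of what measuring calibration can look like, the sketch below computes the expected calibration error (ECE), a commonly used summary that compares average confidence with accuracy inside equal-width confidence bins. The abstract does not say which metric the paper uses, so ECE, the bin count and the sample data here are assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(confidences: Sequence[float],
                               correct: Sequence[bool],
                               n_bins: int = 10) -> float:
    """Weighted average of |mean confidence - accuracy| over equal-width bins."""
    n = len(confidences)
    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += len(members) / n * abs(avg_conf - accuracy)
    return ece


# Hypothetical predictions: 75% confident and right 3 times out of 4.
calibrated = expected_calibration_error([0.75] * 4, [True, True, True, False])
# Hypothetical predictions: 95% confident but right only half of the time.
overconfident = expected_calibration_error([0.95] * 4, [True, False, True, False])
print(calibrated, overconfident)
```

A model can be accurate and still score poorly here, which is the distinction between accuracy and calibration that the abstract draws.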

A comparison of some conformal quantile regression methods


Title	A comparison of some conformal quantile regression methods
Authors	Matteo Sesia, Emmanuel J. Candès
Abstract	We compare two recently proposed methods that combine ideas from conformal inference and quantile regression to produce locally adaptive and marginally valid prediction intervals under sample exchangeability (Romano et al., 2019; Kivaranovic et al., 2019). First, we prove that these two approaches are asymptotically efficient in large samples, under some additional assumptions. Then we compare them empirically on simulated and real data. Our results demonstrate that the method in Romano et al. (2019) typically yields tighter prediction intervals in finite samples. Finally, we discuss how to tune these procedures by fixing the relative proportions of observations used for training and conformalization.
Tasks
Published	2019-09-12
URL	https://arxiv.org/abs/1909.05433v1
PDF	https://arxiv.org/pdf/1909.05433v1.pdf
PWC	https://paperswithcode.com/paper/a-comparison-of-some-conformal-quantile
Repo	https://github.com/msesia/cqr-comparison
Framework	pytorch
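
The conformalization step of one of the two compared methods, conformalized quantile regression as proposed by Romano et al. (2019), can be sketched as follows. The sketch assumes lower and upper quantile predictions are already available from some fitted model for a held-out calibration set; the function names and the numbers are hypothetical, and the second method in the comparison is not shown.

```python
import math
from typing import Sequence, Tuple


def cqr_correction(lo: Sequence[float], hi: Sequence[float],
                   y: Sequence[float], alpha: float) -> float:
    """Margin added to both interval ends so coverage is at least 1 - alpha.

    lo, hi: predicted lower and upper quantiles on the calibration set.
    y: observed calibration responses.
    """
    # Conformity score: how far y falls outside [lo, hi] (negative if inside).
    scores = sorted(max(l - t, t - h) for l, h, t in zip(lo, hi, y))
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf  # too few calibration points for this alpha
    return scores[rank - 1]


def conformalize(lo_new: float, hi_new: float, q: float) -> Tuple[float, float]:
    """Widen (or shrink, if q is negative) a predicted interval by the margin q."""
    return lo_new - q, hi_new + q


# Hypothetical calibration data with constant quantile predictions [0, 1].
cal_lo = [0.0] * 7
cal_hi = [1.0] * 7
cal_y = [0.5, 0.2, 1.1, -0.3, 0.9, 1.4, 0.0]

q = cqr_correction(cal_lo, cal_hi, cal_y, alpha=0.25)
print(q, conformalize(0.0, 1.0, q))
```

Because the margin is a single number learned on the calibration split, the fraction of data reserved for that split trades off against the data left for training the quantile model, which is the tuning question raised at the end of the abstract.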

Efficient Pose Selection for Interactive Camera Calibration


Title	Efficient Pose Selection for Interactive Camera Calibration
Authors	Pavel Rojtberg, Arjan Kuijper
Abstract	The choice of poses for camera calibration with planar patterns is only rarely considered, yet the calibration precision heavily depends on it. This work presents a pose selection method that finds a compact and robust set of calibration poses and is suitable for interactive calibration. Consequently, singular poses that would lead to an unreliable solution are avoided explicitly, while poses reducing the uncertainty of the calibration are favoured. For this, we use uncertainty propagation. Our method takes advantage of a self-identifying calibration pattern to track the camera pose in real-time. This makes it possible to iteratively guide the user to the target poses until the desired quality level is reached. Therefore, only a sparse set of key-frames is needed for calibration. The method is evaluated on separate training and testing sets, as well as on synthetic data. Our approach performs better than comparable solutions while requiring 30% fewer calibration frames.
Tasks	Calibration
Published	2019-07-09
URL	https://arxiv.org/abs/1907.04096v1
PDF	https://arxiv.org/pdf/1907.04096v1.pdf
PWC	https://paperswithcode.com/paper/efficient-pose-selection-for-interactive
Repo	https://github.com/paroj/pose_calib
Framework	none