February 1, 2020

3226 words 16 mins read

Paper Group AWR 214


Parsimonious Black-Box Adversarial Attacks via Efficient Combinatorial Optimization. COMET: Commonsense Transformers for Automatic Knowledge Graph Construction. Similarity Learning via Kernel Preserving Embedding. Robust Building-based Registration of Airborne LiDAR Data and Optical Imagery on Urban Scenes. CityFlow: A Multi-Agent Reinforcement Lea …

Parsimonious Black-Box Adversarial Attacks via Efficient Combinatorial Optimization

Title Parsimonious Black-Box Adversarial Attacks via Efficient Combinatorial Optimization
Authors Seungyong Moon, Gaon An, Hyun Oh Song
Abstract Solving for adversarial examples with projected gradient descent has been demonstrated to be highly effective in fooling neural network based classifiers. However, in the black-box setting, the attacker is limited to query access to the network, and solving for a successful adversarial example becomes much more difficult. To this end, recent methods aim at estimating the true gradient signal based on the input queries, but at the cost of excessive queries. We propose an efficient discrete surrogate to the optimization problem which does not require estimating the gradient, and consequently becomes free of first order update hyperparameters to tune. Our experiments on Cifar-10 and ImageNet show state-of-the-art black-box attack performance with a significant reduction in the required queries compared to a number of recently proposed methods. The source code is available at https://github.com/snu-mllab/parsimonious-blackbox-attack.
Tasks Combinatorial Optimization
Published 2019-05-16
URL https://arxiv.org/abs/1905.06635v1
PDF https://arxiv.org/pdf/1905.06635v1.pdf
PWC https://paperswithcode.com/paper/parsimonious-black-box-adversarial-attacks
Repo https://github.com/snu-mllab/parsimonious-blackbox-attack
Framework tf
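
A minimal sketch of the query-based, block-wise local search idea behind this attack (not the authors' implementation): each image block carries a +eps or -eps perturbation, and blocks are greedily flipped whenever a query to the black-box loss improves. `query_loss` is a hypothetical stand-in for the attacker's query access to the target model.

```python
import numpy as np

def local_search_attack(x, query_loss, eps=8 / 255, block=8, n_rounds=3):
    """x: (H, W, C) image in [0, 1]; query_loss: callable returning the model's loss."""
    h, w, c = x.shape
    # Sign pattern per block: +1 or -1, shared across channels.
    signs = np.ones((h // block, w // block), dtype=np.float32)

    def perturb(s):
        # Upsample the block-wise sign pattern to pixel resolution.
        full = np.kron(s, np.ones((block, block), dtype=np.float32))[..., None]
        return np.clip(x + eps * full, 0.0, 1.0)

    best = query_loss(perturb(signs))
    for _ in range(n_rounds):
        for i in range(signs.shape[0]):
            for j in range(signs.shape[1]):
                signs[i, j] *= -1.0                 # tentatively flip one block
                loss = query_loss(perturb(signs))   # one query per candidate flip
                if loss > best:
                    best = loss                     # keep the flip if the loss increases
                else:
                    signs[i, j] *= -1.0             # otherwise revert
    return perturb(signs)
```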

COMET: Commonsense Transformers for Automatic Knowledge Graph Construction

Title COMET: Commonsense Transformers for Automatic Knowledge Graph Construction
Authors Antoine Bosselut, Hannah Rashkin, Maarten Sap, Chaitanya Malaviya, Asli Celikyilmaz, Yejin Choi
Abstract We present the first comprehensive study on automatic knowledge base construction for two prevalent commonsense knowledge graphs: ATOMIC (Sap et al., 2019) and ConceptNet (Speer et al., 2017). Contrary to many conventional KBs that store knowledge with canonical templates, commonsense KBs only store loosely structured open-text descriptions of knowledge. We posit that an important step toward automatic commonsense completion is the development of generative models of commonsense knowledge, and propose COMmonsEnse Transformers (COMET) that learn to generate rich and diverse commonsense descriptions in natural language. Despite the challenges of commonsense modeling, our investigation reveals promising results when implicit knowledge from deep pre-trained language models is transferred to generate explicit knowledge in commonsense knowledge graphs. Empirical results demonstrate that COMET is able to generate novel knowledge that humans rate as high quality, with up to 77.5% (ATOMIC) and 91.7% (ConceptNet) precision at top 1, which approaches human performance for these resources. Our findings suggest that using generative commonsense models for automatic commonsense KB completion could soon be a plausible alternative to extractive methods.
Tasks graph construction, Knowledge Graphs
Published 2019-06-12
URL https://arxiv.org/abs/1906.05317v2
PDF https://arxiv.org/pdf/1906.05317v2.pdf
PWC https://paperswithcode.com/paper/comet-commonsense-transformers-for-automatic
Repo https://github.com/Saner3/pytorch-transformers-comet
Framework pytorch
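
An illustrative sketch of COMET's generation setup, not the released model: a pre-trained language model is conditioned on a "head phrase + relation" prompt and asked to complete the tail of a commonsense triple. The actual COMET code fine-tunes the LM on ATOMIC/ConceptNet triples first; this only shows the input/output formulation with an off-the-shelf GPT-2.

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

prompt = "PersonX goes to the mall xIntent"   # head event + ATOMIC relation
input_ids = tokenizer.encode(prompt, return_tensors="pt")
output = model.generate(input_ids, max_length=input_ids.shape[1] + 10,
                        do_sample=False, pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(output[0][input_ids.shape[1]:]))  # generated tail tokens
```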

Similarity Learning via Kernel Preserving Embedding

Title Similarity Learning via Kernel Preserving Embedding
Authors Zhao Kang, Yiwei Lu, Yuanzhang Su, Changsheng Li, Zenglin Xu
Abstract Data similarity is a key concept in many data-driven applications. Many algorithms are sensitive to similarity measures. To tackle this fundamental problem, automatic learning of similarity information from data via self-expression has been developed and successfully applied in various models, such as low-rank representation, sparse subspace learning, and semi-supervised learning. However, this approach only tries to reconstruct the original data, so some valuable information, e.g., the manifold structure, is largely ignored. In this paper, we argue that it is beneficial to preserve the overall relations when we extract similarity information. Specifically, we propose a novel similarity learning framework by minimizing the reconstruction error of kernel matrices, rather than the reconstruction error of original data adopted by existing work. Taking the clustering task as an example to evaluate our method, we observe considerable improvements compared to other state-of-the-art methods. More importantly, our proposed framework is very general and provides a novel and fundamental building block for many other similarity-based tasks. Besides, the proposed kernel-preserving embedding opens up a large number of possibilities to embed high-dimensional data into low-dimensional space.
Tasks
Published 2019-03-11
URL http://arxiv.org/abs/1903.04235v1
PDF http://arxiv.org/pdf/1903.04235v1.pdf
PWC https://paperswithcode.com/paper/similarity-learning-via-kernel-preserving
Repo https://github.com/sckangz/SLKE
Framework none
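
A minimal numpy sketch of the kernel self-expression idea described above: instead of reconstructing the raw data X, the kernel matrix K is reconstructed from itself, min_Z ||K - K Z||_F^2 + lam ||Z||_F^2, and the learned Z is used as a similarity graph (e.g., for spectral clustering). The Frobenius regularizer and closed-form solution are simplifying assumptions; the paper studies more general variants.

```python
import numpy as np

def kernel_preserving_similarity(X, gamma=1.0, lam=0.1):
    # RBF kernel K_ij = exp(-gamma * ||x_i - x_j||^2)
    sq = np.sum(X**2, 1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))
    # Closed-form minimiser of ||K - K Z||_F^2 + lam * ||Z||_F^2
    n = K.shape[0]
    Z = np.linalg.solve(K.T @ K + lam * np.eye(n), K.T @ K)
    # Symmetrise to obtain a non-negative affinity matrix
    return 0.5 * (np.abs(Z) + np.abs(Z.T))

S = kernel_preserving_similarity(np.random.rand(50, 5))
```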

Robust Building-based Registration of Airborne LiDAR Data and Optical Imagery on Urban Scenes

Title Robust Building-based Registration of Airborne LiDAR Data and Optical Imagery on Urban Scenes
Authors Thanh Huy Nguyen, Sylvie Daniel, Didier Gueriot, Christophe Sintes, Jean-Marc Le Caillec
Abstract The motivation of this paper is to address the problem of registering airborne LiDAR data and optical aerial or satellite imagery acquired from different platforms, at different times, with different points of view and levels of detail. In this paper, we present a robust registration method based on building regions, which are extracted from optical images using mean shift segmentation, and from LiDAR data using a 3D point cloud filtering process. The matching of the extracted building segments is then carried out using Graph Transformation Matching (GTM), which determines a common pattern of relative positions of segment centers. Thanks to this registration, the relative shifts between the data sets are significantly reduced, which enables a subsequent fine registration and a resulting high-quality data fusion.
Tasks
Published 2019-04-07
URL http://arxiv.org/abs/1904.03668v1
PDF http://arxiv.org/pdf/1904.03668v1.pdf
PWC https://paperswithcode.com/paper/robust-building-based-registration-of
Repo https://github.com/nthuy190991/igarss2019
Framework none
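
A small sketch of the coarse alignment step described above: once Graph Transformation Matching has paired building-segment centers extracted from the optical image and from the LiDAR data, a 2D rigid transform (rotation + translation) can be estimated from the matched centers by least squares. The segment extraction and GTM matching themselves are outside this snippet.

```python
import numpy as np

def estimate_rigid_2d(src, dst):
    """src, dst: (N, 2) arrays of matched segment centers; returns R (2x2), t (2,)."""
    mu_s, mu_d = src.mean(0), dst.mean(0)
    H = (src - mu_s).T @ (dst - mu_d)          # cross-covariance of centered points
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T                             # optimal rotation (Kabsch solution)
    if np.linalg.det(R) < 0:                   # avoid reflections
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = mu_d - R @ mu_s                        # translation aligning the centroids
    return R, t
```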

CityFlow: A Multi-Agent Reinforcement Learning Environment for Large Scale City Traffic Scenario

Title CityFlow: A Multi-Agent Reinforcement Learning Environment for Large Scale City Traffic Scenario
Authors Huichu Zhang, Siyuan Feng, Chang Liu, Yaoyao Ding, Yichen Zhu, Zihan Zhou, Weinan Zhang, Yong Yu, Haiming Jin, Zhenhui Li
Abstract Traffic signal control is an emerging application scenario for reinforcement learning. Besides being an important problem that affects people's daily commute, traffic signal control poses unique challenges for reinforcement learning in terms of adapting to a dynamic traffic environment and coordinating thousands of agents including vehicles and pedestrians. A key factor in the success of modern reinforcement learning is a good simulator that can generate a large number of data samples for learning. The most commonly used open-source traffic simulator SUMO is, however, not scalable to large road networks and large traffic flows, which hinders the study of reinforcement learning on traffic scenarios. This motivates us to create a new traffic simulator, CityFlow, with fundamentally optimized data structures and efficient algorithms. CityFlow can support flexible definitions of road networks and traffic flows based on synthetic and real-world data. It also provides a user-friendly interface for reinforcement learning. Most importantly, CityFlow is more than twenty times faster than SUMO and is capable of supporting city-wide traffic simulation with an interactive renderer for monitoring. Besides traffic signal control, CityFlow could serve as the base for other transportation studies and can create new possibilities to test machine learning methods in the intelligent transportation domain.
Tasks Multi-agent Reinforcement Learning
Published 2019-05-13
URL https://arxiv.org/abs/1905.05217v1
PDF https://arxiv.org/pdf/1905.05217v1.pdf
PWC https://paperswithcode.com/paper/cityflow-a-multi-agent-reinforcement-learning
Repo https://github.com/cityflow-project/CityFlow
Framework none
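
A minimal control loop based on CityFlow's public Python API (see the repo docs; exact method names may differ between versions). The config file, intersection id, and the fixed 4-phase cycle below are placeholders for illustration.

```python
import cityflow

eng = cityflow.Engine("examples/config.json", thread_num=1)

for step in range(3600):                       # simulate one hour at 1-second resolution
    if step % 30 == 0:                         # switch the signal phase every 30 steps
        eng.set_tl_phase("intersection_1_1", (step // 30) % 4)
    eng.next_step()
    # Observations a traffic-signal RL agent might use as its state:
    waiting = eng.get_lane_waiting_vehicle_count()
```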

Adversarial Explanations for Understanding Image Classification Decisions and Improved Neural Network Robustness

Title Adversarial Explanations for Understanding Image Classification Decisions and Improved Neural Network Robustness
Authors Walt Woods, Jack Chen, Christof Teuscher
Abstract For sensitive problems, such as medical imaging or fraud detection, Neural Network (NN) adoption has been slow due to concerns about their reliability, leading to a number of algorithms for explaining their decisions. NNs have also been found vulnerable to a class of imperceptible attacks, called adversarial examples, which arbitrarily alter the output of the network. Here we demonstrate both that these attacks can invalidate prior attempts to explain the decisions of NNs, and that with very robust networks, the attacks themselves may be leveraged as explanations with greater fidelity to the model. We show that the introduction of a novel regularization technique inspired by the Lipschitz constraint, alongside other proposed improvements, greatly improves an NN’s resistance to adversarial examples. On the ImageNet classification task, we demonstrate a network with an Accuracy-Robustness Area (ARA) of 0.0053, an ARA 2.4x greater than the previous state of the art. Improving the mechanisms by which NN decisions are understood is an important direction for both establishing trust in sensitive domains and learning more about the stimuli to which NNs respond.
Tasks Fraud Detection, Image Classification
Published 2019-06-07
URL https://arxiv.org/abs/1906.02896v2
PDF https://arxiv.org/pdf/1906.02896v2.pdf
PWC https://paperswithcode.com/paper/reliable-classification-explanations-via
Repo https://github.com/wwoods/adversarial-explanations-cifar
Framework pytorch
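
A hedged PyTorch sketch of a Lipschitz-style input-gradient penalty, shown only to illustrate the kind of regularization discussed above; it is not the paper's exact term (see the linked repo for that). The penalty discourages large changes in the loss for small input perturbations.

```python
import torch
import torch.nn.functional as F

def loss_with_gradient_penalty(model, x, y, lam=0.1):
    x = x.clone().requires_grad_(True)
    logits = model(x)
    ce = F.cross_entropy(logits, y)
    grad, = torch.autograd.grad(ce, x, create_graph=True)   # d(loss)/d(input)
    penalty = grad.flatten(1).norm(dim=1).pow(2).mean()     # squared input-gradient norm
    return ce + lam * penalty
```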

Controllable Text-to-Image Generation

Title Controllable Text-to-Image Generation
Authors Bowen Li, Xiaojuan Qi, Thomas Lukasiewicz, Philip H. S. Torr
Abstract In this paper, we propose a novel controllable text-to-image generative adversarial network (ControlGAN), which can effectively synthesise high-quality images and also control parts of the image generation according to natural language descriptions. To achieve this, we introduce a word-level spatial and channel-wise attention-driven generator that can disentangle different visual attributes, and allow the model to focus on generating and manipulating subregions corresponding to the most relevant words. Also, a word-level discriminator is proposed to provide fine-grained supervisory feedback by correlating words with image regions, facilitating training an effective generator which is able to manipulate specific visual attributes without affecting the generation of other content. Furthermore, perceptual loss is adopted to reduce the randomness involved in the image generation, and to encourage the generator to manipulate specific attributes required in the modified text. Extensive experiments on benchmark datasets demonstrate that our method outperforms existing state of the art, and is able to effectively manipulate synthetic images using natural language descriptions. Code is available at https://github.com/mrlibw/ControlGAN.
Tasks Image Generation, Text-to-Image Generation
Published 2019-09-16
URL https://arxiv.org/abs/1909.07083v2
PDF https://arxiv.org/pdf/1909.07083v2.pdf
PWC https://paperswithcode.com/paper/controllable-text-to-image-generation
Repo https://github.com/taki0112/ControlGAN-Tensorflow
Framework tf
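
A compact sketch of word-level spatial attention in the spirit of ControlGAN's attention-driven generator (not the authors' exact module): each spatial location of the image feature map attends over word embeddings, producing a word-aware context feature that can be fused with the visual feature.

```python
import torch
import torch.nn.functional as F

def word_level_attention(img_feat, word_feat):
    """img_feat: (B, C, H, W) visual features; word_feat: (B, C, L) word features."""
    B, C, H, W = img_feat.shape
    query = img_feat.view(B, C, H * W).transpose(1, 2)       # (B, HW, C)
    attn = torch.bmm(query, word_feat)                       # (B, HW, L) similarity
    attn = F.softmax(attn, dim=-1)                           # attend over words
    context = torch.bmm(attn, word_feat.transpose(1, 2))     # (B, HW, C) word context
    return context.transpose(1, 2).view(B, C, H, W)

out = word_level_attention(torch.randn(2, 64, 16, 16), torch.randn(2, 64, 12))
```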

PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection

Title PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection
Authors Shaoshuai Shi, Chaoxu Guo, Li Jiang, Zhe Wang, Jianping Shi, Xiaogang Wang, Hongsheng Li
Abstract We present a novel and high-performance 3D object detection framework, named PointVoxel-RCNN (PV-RCNN), for accurate 3D object detection from point clouds. Our proposed method deeply integrates both 3D voxel Convolutional Neural Network (CNN) and PointNet-based set abstraction to learn more discriminative point cloud features. It takes advantage of the efficient learning and high-quality proposals of the 3D voxel CNN and the flexible receptive fields of the PointNet-based networks. Specifically, the proposed framework summarizes the 3D scene with a 3D voxel CNN into a small set of keypoints via a novel voxel set abstraction module to save follow-up computations and also to encode representative scene features. Given the high-quality 3D proposals generated by the voxel CNN, the RoI-grid pooling is proposed to abstract proposal-specific features from the keypoints to the RoI-grid points via keypoint set abstraction with multiple receptive fields. Compared with conventional pooling operations, the RoI-grid feature points encode much richer context information for accurately estimating object confidences and locations. Extensive experiments on both the KITTI dataset and the Waymo Open dataset show that our proposed PV-RCNN surpasses state-of-the-art 3D detection methods with remarkable margins by using only point clouds.
Tasks 3D Object Detection, Object Detection
Published 2019-12-31
URL https://arxiv.org/abs/1912.13192v1
PDF https://arxiv.org/pdf/1912.13192v1.pdf
PWC https://paperswithcode.com/paper/pv-rcnn-point-voxel-feature-set-abstraction
Repo https://github.com/jhultman/PV-RCNN
Framework pytorch
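
A simplified sketch of the RoI-grid pooling idea described above (radius grouping plus pooling around each grid point); the real PV-RCNN module uses PointNet-style MLPs and multiple radii, which are omitted here for brevity.

```python
import torch

def roi_grid_pool(grid_points, keypoints, keypoint_feats, radius=1.0):
    """grid_points: (G, 3); keypoints: (K, 3); keypoint_feats: (K, C) -> pooled (G, C)."""
    dist = torch.cdist(grid_points, keypoints)               # (G, K) pairwise distances
    mask = (dist < radius).float().unsqueeze(-1)             # (G, K, 1) in-ball indicator
    feats = keypoint_feats.unsqueeze(0) * mask               # zero out far keypoints
    # Max-pool the features of keypoints inside each grid-point ball (empty balls pool to zero).
    return feats.max(dim=1).values                           # (G, C)

pooled = roi_grid_pool(torch.rand(216, 3), torch.rand(2048, 3), torch.rand(2048, 64))
```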

Self-Attention Graph Pooling

Title Self-Attention Graph Pooling
Authors Junhyun Lee, Inyeop Lee, Jaewoo Kang
Abstract Advanced methods of applying deep learning to structured data such as graphs have been proposed in recent years. In particular, studies have focused on generalizing convolutional neural networks to graph data, which includes redefining the convolution and the downsampling (pooling) operations for graphs. The method of generalizing the convolution operation to graphs has been proven to improve performance and is widely used. However, the method of applying downsampling to graphs is still difficult to perform and has room for improvement. In this paper, we propose a graph pooling method based on self-attention. Self-attention using graph convolution allows our pooling method to consider both node features and graph topology. To ensure a fair comparison, the same training procedures and model architectures were used for the existing pooling methods and our method. The experimental results demonstrate that our method achieves superior graph classification performance on the benchmark datasets using a reasonable number of parameters.
Tasks Graph Classification
Published 2019-04-17
URL https://arxiv.org/abs/1904.08082v4
PDF https://arxiv.org/pdf/1904.08082v4.pdf
PWC https://paperswithcode.com/paper/self-attention-graph-pooling
Repo https://github.com/inyeoplee77/SAGPool
Framework pytorch
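
A dense-adjacency sketch of the self-attention pooling step: a graph convolution produces one attention score per node, the top-k nodes are kept, and the pooled graph retains their (gated) features and induced adjacency. The released implementation (linked repo) uses PyTorch Geometric and sparse graphs.

```python
import torch

def sag_pool(X, A, W, ratio=0.5):
    """X: (N, F) node features; A: (N, N) adjacency with self-loops; W: (F, 1) weights."""
    deg = A.sum(dim=1, keepdim=True)
    score = torch.tanh((A @ X @ W) / deg).squeeze(-1)   # GCN-style attention score per node
    k = max(1, int(ratio * X.size(0)))
    idx = torch.topk(score, k).indices                  # keep the top-k scoring nodes
    X_pool = X[idx] * score[idx].unsqueeze(-1)          # gate kept features by their score
    A_pool = A[idx][:, idx]                             # induced subgraph adjacency
    return X_pool, A_pool
```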

Toward a Procedural Fruit Tree Rendering Framework for Image Analysis

Title Toward a Procedural Fruit Tree Rendering Framework for Image Analysis
Authors Thomas Duboudin, Maxime Petit, Liming Chen
Abstract We propose a procedural fruit tree rendering framework, based on Blender and Python scripts, allowing quick generation of labeled datasets (i.e. including ground-truth semantic segmentation). It is designed to train image analysis deep learning methods (e.g. in a robotic fruit harvesting context), where real labeled training datasets are usually scarce and existing synthetic ones are too specialized. Moreover, the framework includes the possibility to introduce parametrized variations in the model (e.g. lighting conditions, background), producing a dataset with an embedded domain randomization aspect.
Tasks Semantic Segmentation
Published 2019-07-10
URL https://arxiv.org/abs/1907.04759v1
PDF https://arxiv.org/pdf/1907.04759v1.pdf
PWC https://paperswithcode.com/paper/toward-a-procedural-fruit-tree-rendering
Repo https://github.com/tduboudi/IAMPS2019-Procedural-Fruit-Tree-Rendering-Framework
Framework none
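
A tiny illustration (only runnable inside Blender's Python environment) of the kind of parametrized variation such scripts introduce, here randomized lighting for domain randomization; the actual tree and fruit generation scripts live in the linked repo.

```python
import random
import bpy

# Add a sun lamp with a randomised orientation and intensity.
bpy.ops.object.light_add(type='SUN', location=(0.0, 0.0, 10.0))
sun = bpy.context.object
sun.rotation_euler = (random.uniform(0.0, 1.2), 0.0, random.uniform(0.0, 6.28))
sun.data.energy = random.uniform(2.0, 6.0)
```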

PiNet: A Permutation Invariant Graph Neural Network for Graph Classification

Title PiNet: A Permutation Invariant Graph Neural Network for Graph Classification
Authors Peter Meltzer, Marcelo Daniel Gutierrez Mallea, Peter J. Bentley
Abstract We propose an end-to-end deep learning model for graph classification and representation learning that is invariant to permutation of the nodes of the input graphs. We address the challenge of learning a fixed size graph representation for graphs of varying dimensions through a differentiable node attention pooling mechanism. In addition to a theoretical proof of its invariance to permutation, we provide empirical evidence demonstrating the statistically significant gain in accuracy when faced with an isomorphic graph classification task given only a small number of training examples. We analyse the effect of four different matrices used to facilitate the local message passing mechanism by which graph convolutions are performed, versus a matrix parametrised by a learned parameter pair able to transition smoothly between them. Finally, we show that our model achieves competitive classification performance with existing techniques on a set of molecule datasets.
Tasks Graph Classification, Representation Learning
Published 2019-05-08
URL https://arxiv.org/abs/1905.03046v1
PDF https://arxiv.org/pdf/1905.03046v1.pdf
PWC https://paperswithcode.com/paper/pinet-a-permutation-invariant-graph-neural
Repo https://github.com/meltzerpete/PiNet
Framework pytorch
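
A small sketch of the differentiable node-attention pooling that maps a variable-sized graph to a fixed-size representation (permutation-invariant because it is a weighted sum over nodes); the message-passing layers that produce the node features and attention logits in PiNet are omitted here.

```python
import torch
import torch.nn.functional as F

def attention_pool(node_feats, attn_logits):
    """node_feats: (N, F); attn_logits: (N, 1) -> fixed-size (F,) graph embedding."""
    weights = F.softmax(attn_logits, dim=0)          # attention distribution over nodes
    return (weights * node_feats).sum(dim=0)         # permutation-invariant readout

g = attention_pool(torch.randn(17, 32), torch.randn(17, 1))   # works for any node count
```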

Semi-Supervised Unconstrained Action Unit Detection via Latent Feature Domain

Title Semi-Supervised Unconstrained Action Unit Detection via Latent Feature Domain
Authors Zhiwen Shao, Jianfei Cai, Tat-Jen Cham, Xuequan Lu, Lizhuang Ma
Abstract Facial action unit (AU) detection in the wild is a challenging problem, due to the unconstrained variability in facial appearances and the lack of accurate AU annotations. Most existing methods either depend on impractical labor-intensive labeling by experts, or inaccurate pseudo labels. In this paper, we propose an end-to-end semi-supervised unconstrained AU detection framework, which transfers accurate AU labels from a constrained source domain to an unconstrained target domain by exploiting accurate labels of AU-related facial landmarks. Specifically, we map a source image with AU label and a target image without AU label into a latent feature domain by combining source landmark-related feature with target landmark-free feature. Due to the combination of source AU-related information and target AU-free information, the latent feature domain with the transferred source AU label can be learned by maximizing the target-domain AU detection performance. Moreover, to disentangle landmark-related and landmark-free features, we introduce a novel landmark adversarial loss which can solve the multi-player minimax game in adversarial learning. Our framework can also be naturally extended for use with target-domain pseudo AU labels. Extensive experiments show that our method soundly outperforms the baselines, upper-bounds and state-of-the-art approaches on the challenging BP4D, GFT and EmotioNet benchmarks. The code for our method is available at https://github.com/ZhiwenShao/ADLD.
Tasks Action Unit Detection, Image Generation
Published 2019-03-25
URL https://arxiv.org/abs/1903.10143v3
PDF https://arxiv.org/pdf/1903.10143v3.pdf
PWC https://paperswithcode.com/paper/weakly-supervised-unconstrained-action-unit
Repo https://github.com/ZhiwenShao/ADLD
Framework pytorch
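
A structural sketch of the cross-domain feature combination described above (not the authors' network): a source image contributes its landmark-related feature, a target image contributes its landmark-free feature, and the combined latent code is trained with the source AU labels so that AU supervision transfers to the target domain. The encoders and classifier below are placeholder modules; the adversarial disentanglement loss is omitted.

```python
import torch
import torch.nn as nn

feat_rel = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 128))   # landmark-related encoder
feat_free = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 128))  # landmark-free encoder
au_classifier = nn.Linear(256, 12)                                    # 12 AUs, multi-label head

source, target = torch.rand(8, 3, 64, 64), torch.rand(8, 3, 64, 64)
source_au = torch.randint(0, 2, (8, 12)).float()

latent = torch.cat([feat_rel(source), feat_free(target)], dim=1)      # latent feature domain
loss = nn.functional.binary_cross_entropy_with_logits(au_classifier(latent), source_au)
```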

Scalability in Perception for Autonomous Driving: Waymo Open Dataset

Title Scalability in Perception for Autonomous Driving: Waymo Open Dataset
Authors Pei Sun, Henrik Kretzschmar, Xerxes Dotiwalla, Aurelien Chouard, Vijaysai Patnaik, Paul Tsui, James Guo, Yin Zhou, Yuning Chai, Benjamin Caine, Vijay Vasudevan, Wei Han, Jiquan Ngiam, Hang Zhao, Aleksei Timofeev, Scott Ettinger, Maxim Krivokon, Amy Gao, Aditya Joshi, Sheng Zhao, Shuyang Cheng, Yu Zhang, Jonathon Shlens, Zhifeng Chen, Dragomir Anguelov
Abstract The research community has increasing interest in autonomous driving research, despite the resource intensity of obtaining representative real world data. Existing self-driving datasets are limited in the scale and variation of the environments they capture, even though generalization within and between operating regions is crucial to the overall viability of the technology. In an effort to help align the research community’s contributions with real-world self-driving problems, we introduce a new large scale, high quality, diverse dataset. Our new dataset consists of 1150 scenes that each span 20 seconds, consisting of well synchronized and calibrated high quality LiDAR and camera data captured across a range of urban and suburban geographies. It is 15x more diverse than the largest camera+LiDAR dataset available based on our proposed diversity metric. We exhaustively annotated this data with 2D (camera image) and 3D (LiDAR) bounding boxes, with consistent identifiers across frames. Finally, we provide strong baselines for 2D as well as 3D detection and tracking tasks. We further study the effects of dataset size and generalization across geographies on 3D detection methods. Find data, code and more up-to-date information at http://www.waymo.com/open.
Tasks Autonomous Driving
Published 2019-12-10
URL https://arxiv.org/abs/1912.04838v6
PDF https://arxiv.org/pdf/1912.04838v6.pdf
PWC https://paperswithcode.com/paper/scalability-in-perception-for-autonomous
Repo https://github.com/JdeRobot/BehaviorSuite
Framework tf
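
A hedged quickstart for reading the released TFRecords with the waymo-open-dataset tooling (the package API has changed between releases; consult waymo.com/open for the current version). The file path is a placeholder.

```python
import tensorflow as tf
from waymo_open_dataset import dataset_pb2 as open_dataset

dataset = tf.data.TFRecordDataset("segment-XXXX.tfrecord", compression_type="")
for data in dataset.take(1):
    frame = open_dataset.Frame()
    frame.ParseFromString(bytearray(data.numpy()))     # one Frame proto per record
    print(frame.context.name, len(frame.lasers), len(frame.images))
```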

Order-free Learning Alleviating Exposure Bias in Multi-label Classification

Title Order-free Learning Alleviating Exposure Bias in Multi-label Classification
Authors Che-Ping Tsai, Hung-Yi Lee
Abstract Multi-label classification (MLC) assigns multiple labels to each sample. Prior studies show that MLC can be transformed to a sequence prediction problem with a recurrent neural network (RNN) decoder to model the label dependency. However, training an RNN decoder requires a predefined order of labels, which is not directly available in the MLC specification. Besides, an RNN thus trained tends to overfit the label combinations in the training set and has difficulty generating unseen label sequences. In this paper, we propose a new framework for MLC which does not rely on a predefined label order and thus alleviates exposure bias. The experimental results on three multi-label classification benchmark datasets show that our method outperforms competitive baselines by a large margin. We also find that the proposed approach has a higher probability of generating label combinations not seen during training than the baseline models. This result shows that the proposed approach has better generalization capability.
Tasks Multi-Label Classification
Published 2019-09-08
URL https://arxiv.org/abs/1909.03434v1
PDF https://arxiv.org/pdf/1909.03434v1.pdf
PWC https://paperswithcode.com/paper/order-free-learning-alleviating-exposure-bias
Repo https://github.com/jackyyy0228/Order-free-Learning-Alleviating-Exposure-Bias-in-Multi-label-Classification
Framework pytorch
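
A schematic PyTorch sketch of an order-free decoding loss in the spirit of the paper: at each step the decoder is not forced to follow one fixed gold order, but is rewarded for putting probability mass on any gold label it has not emitted yet. This is a simplified illustration, not the authors' exact training objective.

```python
import torch

def order_free_step_loss(step_logits, gold_labels, emitted):
    """step_logits: (L,) logits over the label vocab; gold_labels, emitted: sets of label ids."""
    remaining = list(gold_labels - emitted)          # gold labels still to be produced
    log_probs = torch.log_softmax(step_logits, dim=-1)
    # Negative log of the total probability assigned to the remaining gold labels.
    return -torch.logsumexp(log_probs[remaining], dim=0)

loss = order_free_step_loss(torch.randn(20), {3, 7, 11}, {7})
```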

FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving

Title FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving
Authors Varun Ravi Kumar, Sandesh Athni Hiremath, Stefan Milz, Christian Witt, Clement Pinnard, Senthil Yogamani, Patrick Mader
Abstract Fisheye cameras are commonly used in applications like autonomous driving and surveillance to provide a large field of view ($>180^\circ$). However, they come at the cost of strong non-linear distortion which requires more complex algorithms. In this paper, we explore Euclidean distance estimation on fisheye cameras for automotive scenes. Obtaining accurate and dense depth supervision is difficult in practice, but self-supervised learning approaches show promising results and could potentially overcome the problem. We present a novel self-supervised scale-aware framework for learning Euclidean distance and ego-motion from raw monocular fisheye videos without applying rectification. While it is possible to perform a piece-wise linear approximation of the fisheye projection surface and apply standard rectilinear models, it has its own set of issues like re-sampling distortion and discontinuities in transition regions. To encourage further research in this area, we will release this dataset as part of our WoodScape project \cite{yogamani2019woodscape}. We further evaluated the proposed algorithm on the KITTI dataset and obtained state-of-the-art results comparable to other self-supervised monocular methods. Qualitative results on an unseen fisheye video demonstrate impressive performance, see https://youtu.be/Sgq1WzoOmXg .
Tasks Autonomous Driving
Published 2019-10-07
URL https://arxiv.org/abs/1910.04076v2
PDF https://arxiv.org/pdf/1910.04076v2.pdf
PWC https://paperswithcode.com/paper/fisheyedistancenet-self-supervised-scale
Repo https://github.com/rvarun7777/FisheyeDistanceNet
Framework none
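
A generic sketch of the self-supervised photometric objective that this family of methods builds on: a view of the scene synthesised from a neighbouring frame is compared to the target frame, with an edge-aware smoothness term on the predicted distance map. The fisheye-specific projection/warping and the scale-aware ego-motion handling described in the abstract are not shown here.

```python
import torch

def photometric_l1(target, reconstructed):
    # Mean absolute error between the target frame and the view synthesised
    # from an adjacent frame using the predicted distance and ego-motion.
    return (target - reconstructed).abs().mean()

def edge_aware_smoothness(dist, image):
    # Penalise distance-map gradients except where the image itself has strong edges.
    dd_x = (dist[:, :, :, 1:] - dist[:, :, :, :-1]).abs()
    dd_y = (dist[:, :, 1:, :] - dist[:, :, :-1, :]).abs()
    di_x = (image[:, :, :, 1:] - image[:, :, :, :-1]).abs().mean(1, keepdim=True)
    di_y = (image[:, :, 1:, :] - image[:, :, :-1, :]).abs().mean(1, keepdim=True)
    return (dd_x * torch.exp(-di_x)).mean() + (dd_y * torch.exp(-di_y)).mean()
```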