February 1, 2020

3226 words 16 mins read

Paper Group AWR 214


Parsimonious Black-Box Adversarial Attacks via Efficient Combinatorial Optimization. COMET: Commonsense Transformers for Automatic Knowledge Graph Construction. Similarity Learning via Kernel Preserving Embedding. Robust Building-based Registration of Airborne LiDAR Data and Optical Imagery on Urban Scenes. CityFlow: A Multi-Agent Reinforcement Lea …

Parsimonious Black-Box Adversarial Attacks via Efficient Combinatorial Optimization

Title Parsimonious Black-Box Adversarial Attacks via Efficient Combinatorial Optimization
Authors Seungyong Moon, Gaon An, Hyun Oh Song
Abstract Solving for adversarial examples with projected gradient descent has been demonstrated to be highly effective in fooling neural network based classifiers. However, in the black-box setting, the attacker is limited to query access to the network, and solving for a successful adversarial example becomes much more difficult. To this end, recent methods aim at estimating the true gradient signal based on the input queries, but at the cost of excessive queries. We propose an efficient discrete surrogate to the optimization problem which does not require estimating the gradient, and consequently becomes free of first order update hyperparameters to tune. Our experiments on Cifar-10 and ImageNet show state-of-the-art black-box attack performance with a significant reduction in the required queries compared to a number of recently proposed methods. The source code is available at https://github.com/snu-mllab/parsimonious-blackbox-attack.
Tasks Combinatorial Optimization
Published 2019-05-16
URL https://arxiv.org/abs/1905.06635v1
PDF https://arxiv.org/pdf/1905.06635v1.pdf
PWC https://paperswithcode.com/paper/parsimonious-black-box-adversarial-attacks
Repo https://github.com/snu-mllab/parsimonious-blackbox-attack
Framework tf
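
A minimal sketch of the query-based, block-wise local search idea behind this attack (not the authors' implementation): each image block carries a +eps or -eps perturbation, and blocks are greedily flipped whenever a query to the black-box loss improves. `query_loss` is a hypothetical stand-in for the attacker's query access to the target model.

```python
import numpy as np

def local_search_attack(x, query_loss, eps=8 / 255, block=8, n_rounds=3):
    """x: (H, W, C) image in [0, 1]; query_loss: callable returning the model's loss."""
    h, w, c = x.shape
    # Sign pattern per block: +1 or -1, shared across channels.
    signs = np.ones((h // block, w // block), dtype=np.float32)

    def perturb(s):
        # Upsample the block-wise sign pattern to pixel resolution.
        full = np.kron(s, np.ones((block, block), dtype=np.float32))[..., None]
        return np.clip(x + eps * full, 0.0, 1.0)

    best = query_loss(perturb(signs))
    for _ in range(n_rounds):
        for i in range(signs.shape[0]):
            for j in range(signs.shape[1]):
                signs[i, j] *= -1.0                 # tentatively flip one block
                loss = query_loss(perturb(signs))   # one query per candidate flip
                if loss > best:
                    best = loss                     # keep the flip if the loss increases
                else:
                    signs[i, j] *= -1.0             # otherwise revert
    return perturb(signs)
```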

COMET: Commonsense Transformers for Automatic Knowledge Graph Construction

Title COMET: Commonsense Transformers for Automatic Knowledge Graph Construction
Authors Antoine Bosselut, Hannah Rashkin, Maarten Sap, Chaitanya Malaviya, Asli Celikyilmaz, Yejin Choi
Abstract We present the first comprehensive study on automatic knowledge base construction for two prevalent commonsense knowledge graphs: ATOMIC (Sap et al., 2019) and ConceptNet (Speer et al., 2017). Contrary to many conventional KBs that store knowledge with canonical templates, commonsense KBs only store loosely structured open-text descriptions of knowledge. We posit that an important step toward automatic commonsense completion is the development of generative models of commonsense knowledge, and propose COMmonsEnse Transformers (COMET) that learn to generate rich and diverse commonsense descriptions in natural language. Despite the challenges of commonsense modeling, our investigation reveals promising results when implicit knowledge from deep pre-trained language models is transferred to generate explicit knowledge in commonsense knowledge graphs. Empirical results demonstrate that COMET is able to generate novel knowledge that humans rate as high quality, with up to 77.5% (ATOMIC) and 91.7% (ConceptNet) precision at top 1, which approaches human performance for these resources. Our findings suggest that using generative commonsense models for automatic commonsense KB completion could soon be a plausible alternative to extractive methods.
Tasks graph construction, Knowledge Graphs
Published 2019-06-12
URL https://arxiv.org/abs/1906.05317v2
PDF https://arxiv.org/pdf/1906.05317v2.pdf
PWC https://paperswithcode.com/paper/comet-commonsense-transformers-for-automatic
Repo https://github.com/Saner3/pytorch-transformers-comet
Framework pytorch
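
An illustrative sketch of COMET's generation setup, not the released model: a pre-trained language model is conditioned on a "head phrase + relation" prompt and asked to complete the tail of a commonsense triple. The actual COMET code fine-tunes the LM on ATOMIC/ConceptNet triples first; this only shows the input/output formulation with an off-the-shelf GPT-2.

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

prompt = "PersonX goes to the mall xIntent"   # head event + ATOMIC relation
input_ids = tokenizer.encode(prompt, return_tensors="pt")
output = model.generate(input_ids, max_length=input_ids.shape[1] + 10,
                        do_sample=False, pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(output[0][input_ids.shape[1]:]))  # generated tail tokens
```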

Similarity Learning via Kernel Preserving Embedding

Title Similarity Learning via Kernel Preserving Embedding
Authors Zhao Kang, Yiwei Lu, Yuanzhang Su, Changsheng Li, Zenglin Xu
Abstract Data similarity is a key concept in many data-driven applications. Many algorithms are sensitive to similarity measures. To tackle this fundamental problem, automatic learning of similarity information from data via self-expression has been developed and successfully applied in various models, such as low-rank representation, sparse subspace learning, and semi-supervised learning. However, this approach only tries to reconstruct the original data, so some valuable information, e.g., the manifold structure, is largely ignored. In this paper, we argue that it is beneficial to preserve the overall relations when we extract similarity information. Specifically, we propose a novel similarity learning framework by minimizing the reconstruction error of kernel matrices, rather than the reconstruction error of original data adopted by existing work. Taking the clustering task as an example to evaluate our method, we observe considerable improvements compared to other state-of-the-art methods. More importantly, our proposed framework is very general and provides a novel and fundamental building block for many other similarity-based tasks. Besides, the proposed kernel-preserving embedding opens up a large number of possibilities to embed high-dimensional data into low-dimensional space.
Tasks
Published 2019-03-11
URL http://arxiv.org/abs/1903.04235v1
PDF http://arxiv.org/pdf/1903.04235v1.pdf
PWC https://paperswithcode.com/paper/similarity-learning-via-kernel-preserving
Repo https://github.com/sckangz/SLKE
Framework none
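
A minimal numpy sketch of the kernel self-expression idea described above: instead of reconstructing the raw data X, the kernel matrix K is reconstructed from itself, min_Z ||K - K Z||_F^2 + lam ||Z||_F^2, and the learned Z is used as a similarity graph (e.g., for spectral clustering). The Frobenius regularizer and closed-form solution are simplifying assumptions; the paper studies more general variants.

```python
import numpy as np

def kernel_preserving_similarity(X, gamma=1.0, lam=0.1):
    # RBF kernel K_ij = exp(-gamma * ||x_i - x_j||^2)
    sq = np.sum(X**2, 1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))
    # Closed-form minimiser of ||K - K Z||_F^2 + lam * ||Z||_F^2
    n = K.shape[0]
    Z = np.linalg.solve(K.T @ K + lam * np.eye(n), K.T @ K)
    # Symmetrise to obtain a non-negative affinity matrix
    return 0.5 * (np.abs(Z) + np.abs(Z.T))

S = kernel_preserving_similarity(np.random.rand(50, 5))
```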

Robust Building-based Registration of Airborne LiDAR Data and Optical Imagery on Urban Scenes

Title Robust Building-based Registration of Airborne LiDAR Data and Optical Imagery on Urban Scenes
Authors Thanh Huy Nguyen, Sylvie Daniel, Didier Gueriot, Christophe Sintes, Jean-Marc Le Caillec
Abstract The motivation of this paper is to address the problem of registering airborne LiDAR data and optical aerial or satellite imagery acquired from different platforms, at different times, with different points of view and levels of detail. In this paper, we present a robust registration method based on building regions, which are extracted from optical images using mean shift segmentation, and from LiDAR data using a 3D point cloud filtering process. The matching of the extracted building segments is then carried out using Graph Transformation Matching (GTM), which determines a common pattern of relative positions of segment centers. Thanks to this registration, the relative shifts between the data sets are significantly reduced, which enables a subsequent fine registration and a resulting high-quality data fusion.
Tasks
Published 2019-04-07
URL http://arxiv.org/abs/1904.03668v1
PDF http://arxiv.org/pdf/1904.03668v1.pdf
PWC https://paperswithcode.com/paper/robust-building-based-registration-of
Repo https://github.com/nthuy190991/igarss2019
Framework none
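
A small sketch of the coarse alignment step described above: once Graph Transformation Matching has paired building-segment centers extracted from the optical image and from the LiDAR data, a 2D rigid transform (rotation + translation) can be estimated from the matched centers by least squares. The segment extraction and GTM matching themselves are outside this snippet.

```python
import numpy as np

def estimate_rigid_2d(src, dst):
    """src, dst: (N, 2) arrays of matched segment centers; returns R (2x2), t (2,)."""
    mu_s, mu_d = src.mean(0), dst.mean(0)
    H = (src - mu_s).T @ (dst - mu_d)          # cross-covariance of centered points
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T                             # optimal rotation (Kabsch solution)
    if np.linalg.det(R) < 0:                   # avoid reflections
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = mu_d - R @ mu_s                        # translation aligning the centroids
    return R, t
```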

CityFlow: A Multi-Agent Reinforcement Learning Environment for Large Scale City Traffic Scenario

Title CityFlow: A Multi-Agent Reinforcement Learning Environment for Large Scale City Traffic Scenario
Authors Huichu Zhang, Siyuan Feng, Chang Liu, Yaoyao Ding, Yichen Zhu, Zihan Zhou, Weinan Zhang, Yong Yu, Haiming Jin, Zhenhui Li
Abstract Traffic signal control is an emerging application scenario for reinforcement learning. Besides being an important problem that affects people's daily commute, traffic signal control poses unique challenges for reinforcement learning in terms of adapting to a dynamic traffic environment and coordinating thousands of agents including vehicles and pedestrians. A key factor in the success of modern reinforcement learning is a good simulator that can generate a large number of data samples for learning. The most commonly used open-source traffic simulator SUMO is, however, not scalable to large road networks and large traffic flows, which hinders the study of reinforcement learning on traffic scenarios. This motivates us to create a new traffic simulator, CityFlow, with fundamentally optimized data structures and efficient algorithms. CityFlow can support flexible definitions of road networks and traffic flows based on synthetic and real-world data. It also provides a user-friendly interface for reinforcement learning. Most importantly, CityFlow is more than twenty times faster than SUMO and is capable of supporting city-wide traffic simulation with an interactive renderer for monitoring. Besides traffic signal control, CityFlow could serve as the base for other transportation studies and can create new possibilities to test machine learning methods in the intelligent transportation domain.
Tasks Multi-agent Reinforcement Learning
Published 2019-05-13
URL https://arxiv.org/abs/1905.05217v1
PDF https://arxiv.org/pdf/1905.05217v1.pdf
PWC https://paperswithcode.com/paper/cityflow-a-multi-agent-reinforcement-learning
Repo https://github.com/cityflow-project/CityFlow
Framework none
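
A minimal control loop based on CityFlow's public Python API (see the repo docs; exact method names may differ between versions). The config file, intersection id, and the fixed 4-phase cycle below are placeholders for illustration.

```python
import cityflow

eng = cityflow.Engine("examples/config.json", thread_num=1)

for step in range(3600):                       # simulate one hour at 1-second resolution
    if step % 30 == 0:                         # switch the signal phase every 30 steps
        eng.set_tl_phase("intersection_1_1", (step // 30) % 4)
    eng.next_step()
    # Observations a traffic-signal RL agent might use as its state:
    waiting = eng.get_lane_waiting_vehicle_count()
```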

Adversarial Explanations for Understanding Image Classification Decisions and Improved Neural Network Robustness

Title Adversarial Explanations for Understanding Image Classification Decisions and Improved Neural Network Robustness
Authors Walt Woods, Jack Chen, Christof Teuscher
Abstract For sensitive problems, such as medical imaging or fraud detection, Neural Network (NN) adoption has been slow due to concerns about their reliability, leading to a number of algorithms for explaining their decisions. NNs have also been found vulnerable to a class of imperceptible attacks, called adversarial examples, which arbitrarily alter the output of the network. Here we demonstrate both that these attacks can invalidate prior attempts to explain the decisions of NNs, and that with very robust networks, the attacks themselves may be leveraged as explanations with greater fidelity to the model. We show that the introduction of a novel regularization technique inspired by the Lipschitz constraint, alongside other proposed improvements, greatly improves an NN’s resistance to adversarial examples. On the ImageNet classification task, we demonstrate a network with an Accuracy-Robustness Area (ARA) of 0.0053, an ARA 2.4x greater than the previous state of the art. Improving the mechanisms by which NN decisions are understood is an important direction for both establishing trust in sensitive domains and learning more about the stimuli to which NNs respond.
Tasks Fraud Detection, Image Classification
Published 2019-06-07
URL https://arxiv.org/abs/1906.02896v2
PDF https://arxiv.org/pdf/1906.02896v2.pdf
PWC https://paperswithcode.com/paper/reliable-classification-explanations-via
Repo https://github.com/wwoods/adversarial-explanations-cifar
Framework pytorch
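
A hedged PyTorch sketch of a Lipschitz-style input-gradient penalty, shown only to illustrate the kind of regularization discussed above; it is not the paper's exact term (see the linked repo for that). The penalty discourages large changes in the loss for small input perturbations.

```python
import torch
import torch.nn.functional as F

def loss_with_gradient_penalty(model, x, y, lam=0.1):
    x = x.clone().requires_grad_(True)
    logits = model(x)
    ce = F.cross_entropy(logits, y)
    grad, = torch.autograd.grad(ce, x, create_graph=True)   # d(loss)/d(input)
    penalty = grad.flatten(1).norm(dim=1).pow(2).mean()     # squared input-gradient norm
    return ce + lam * penalty
```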

Controllable Text-to-Image Generation

Title Controllable Text-to-Image Generation
Authors Bowen Li, Xiaojuan Qi, Thomas Lukasiewicz, Philip H. S. Torr
Abstract In this paper, we propose a novel controllable text-to-image generative adversarial network (ControlGAN), which can effectively synthesise high-quality images and also control parts of the image generation according to natural language descriptions. To achieve this, we introduce a word-level spatial and channel-wise attention-driven generator that can disentangle different visual attributes, and allow the model to focus on generating and manipulating subregions corresponding to the most relevant words. Also, a word-level discriminator is proposed to provide fine-grained supervisory feedback by correlating words with image regions, facilitating training an effective generator which is able to manipulate specific visual attributes without affecting the generation of other content. Furthermore, perceptual loss is adopted to reduce the randomness involved in the image generation, and to encourage the generator to manipulate specific attributes required in the modified text. Extensive experiments on benchmark datasets demonstrate that our method outperforms existing state of the art, and is able to effectively manipulate synthetic images using natural language descriptions. Code is available at https://github.com/mrlibw/ControlGAN.
Tasks Image Generation, Text-to-Image Generation
Published 2019-09-16
URL https://arxiv.org/abs/1909.07083v2
PDF https://arxiv.org/pdf/1909.07083v2.pdf
PWC https://paperswithcode.com/paper/controllable-text-to-image-generation
Repo https://github.com/taki0112/ControlGAN-Tensorflow
Framework tf
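
A compact sketch of word-level spatial attention in the spirit of ControlGAN's attention-driven generator (not the authors' exact module): each spatial location of the image feature map attends over word embeddings, producing a word-aware context feature that can be fused with the visual feature.

```python
import torch
import torch.nn.functional as F

def word_level_attention(img_feat, word_feat):
    """img_feat: (B, C, H, W) visual features; word_feat: (B, C, L) word features."""
    B, C, H, W = img_feat.shape
    query = img_feat.view(B, C, H * W).transpose(1, 2)       # (B, HW, C)
    attn = torch.bmm(query, word_feat)                       # (B, HW, L) similarity
    attn = F.softmax(attn, dim=-1)                           # attend over words
    context = torch.bmm(attn, word_feat.transpose(1, 2))     # (B, HW, C) word context
    return context.transpose(1, 2).view(B, C, H, W)

out = word_level_attention(torch.randn(2, 64, 16, 16), torch.randn(2, 64, 12))
```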

PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection

Title PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection
Authors Shaoshuai Shi, Chaoxu Guo, Li Jiang, Zhe Wang, Jianping Shi, Xiaogang Wang, Hongsheng Li
Abstract We present a novel and high-performance 3D object detection framework, named PointVoxel-RCNN (PV-RCNN), for accurate 3D object detection from point clouds. Our proposed method deeply integrates both 3D voxel Convolutional Neural Network (CNN) and PointNet-based set abstraction to learn more discriminative point cloud features. It takes advantage of the efficient learning and high-quality proposals of the 3D voxel CNN and the flexible receptive fields of the PointNet-based networks. Specifically, the proposed framework summarizes the 3D scene with a 3D voxel CNN into a small set of keypoints via a novel voxel set abstraction module to save follow-up computations and also to encode representative scene features. Given the high-quality 3D proposals generated by the voxel CNN, the RoI-grid pooling is proposed to abstract proposal-specific features from the keypoints to the RoI-grid points via keypoint set abstraction with multiple receptive fields. Compared with conventional pooling operations, the RoI-grid feature points encode much richer context information for accurately estimating object confidences and locations. Extensive experiments on both the KITTI dataset and the Waymo Open dataset show that our proposed PV-RCNN surpasses state-of-the-art 3D detection methods with remarkable margins by using only point clouds.
Tasks 3D Object Detection, Object Detection
Published 2019-12-31
URL https://arxiv.org/abs/1912.13192v1
PDF https://arxiv.org/pdf/1912.13192v1.pdf
PWC https://paperswithcode.com/paper/pv-rcnn-point-voxel-feature-set-abstraction
Repo https://github.com/jhultman/PV-RCNN
Framework pytorch
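
A simplified sketch of the RoI-grid pooling idea described above (radius grouping plus pooling around each grid point); the real PV-RCNN module uses PointNet-style MLPs and multiple radii, which are omitted here for brevity.

```python
import torch

def roi_grid_pool(grid_points, keypoints, keypoint_feats, radius=1.0):
    """grid_points: (G, 3); keypoints: (K, 3); keypoint_feats: (K, C) -> pooled (G, C)."""
    dist = torch.cdist(grid_points, keypoints)               # (G, K) pairwise distances
    mask = (dist < radius).float().unsqueeze(-1)             # (G, K, 1) in-ball indicator
    feats = keypoint_feats.unsqueeze(0) * mask               # zero out far keypoints
    # Max-pool the features of keypoints inside each grid-point ball (empty balls pool to zero).
    return feats.max(dim=1).values                           # (G, C)

pooled = roi_grid_pool(torch.rand(216, 3), torch.rand(2048, 3), torch.rand(2048, 64))
```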

Self-Attention Graph Pooling

Title Self-Attention Graph Pooling
Authors Junhyun Lee, Inyeop Lee, Jaewoo Kang
Abstract Advanced methods of applying deep learning to structured data such as graphs have been proposed in recent years. In particular, studies have focused on generalizing convolutional neural networks to graph data, which includes redefining the convolution and the downsampling (pooling) operations for graphs. The method of generalizing the convolution operation to graphs has been proven to improve performance and is widely used. However, the method of applying downsampling to graphs is still difficult to perform and has room for improvement. In this paper, we propose a graph pooling method based on self-attention. Self-attention using graph convolution allows our pooling method to consider both node features and graph topology. To ensure a fair comparison, the same training procedures and model architectures were used for the existing pooling methods and our method. The experimental results demonstrate that our method achieves superior graph classification performance on the benchmark datasets using a reasonable number of parameters.
Tasks Graph Classification
Published 2019-04-17
URL https://arxiv.org/abs/1904.08082v4
PDF https://arxiv.org/pdf/1904.08082v4.pdf
PWC https://paperswithcode.com/paper/self-attention-graph-pooling
Repo https://github.com/inyeoplee77/SAGPool
Framework pytorch
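
A dense-adjacency sketch of the self-attention pooling step: a graph convolution produces one attention score per node, the top-k nodes are kept, and the pooled graph retains their (gated) features and induced adjacency. The released implementation (linked repo) uses PyTorch Geometric and sparse graphs.

```python
import torch

def sag_pool(X, A, W, ratio=0.5):
    """X: (N, F) node features; A: (N, N) adjacency with self-loops; W: (F, 1) weights."""
    deg = A.sum(dim=1, keepdim=True)
    score = torch.tanh((A @ X @ W) / deg).squeeze(-1)   # GCN-style attention score per node
    k = max(1, int(ratio * X.size(0)))
    idx = torch.topk(score, k).indices                  # keep the top-k scoring nodes
    X_pool = X[idx] * score[idx].unsqueeze(-1)          # gate kept features by their score
    A_pool = A[idx][:, idx]                             # induced subgraph adjacency
    return X_pool, A_pool
```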

Toward a Procedural Fruit Tree Rendering Framework for Image Analysis

Title Toward a Procedural Fruit Tree Rendering Framework for Image Analysis
Authors Thomas Duboudin, Maxime Petit, Liming Chen
Abstract We propose a procedural fruit tree rendering framework, based on Blender and Python scripts, allowing quick generation of labeled datasets (i.e. including ground-truth semantic segmentation). It is designed to train image analysis deep learning methods (e.g. in a robotic fruit harvesting context), where real labeled training datasets are usually scarce and existing synthetic ones are too specialized. Moreover, the framework includes the possibility to introduce parametrized variations in the model (e.g. lighting conditions, background), producing a dataset with an embedded domain randomization aspect.
Tasks Semantic Segmentation
Published 2019-07-10
URL https://arxiv.org/abs/1907.04759v1
PDF https://arxiv.org/pdf/1907.04759v1.pdf
PWC https://paperswithcode.com/paper/toward-a-procedural-fruit-tree-rendering
Repo https://github.com/tduboudi/IAMPS2019-Procedural-Fruit-Tree-Rendering-Framework
Framework none
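
A tiny illustration (only runnable inside Blender's Python environment) of the kind of parametrized variation such scripts introduce, here randomized lighting for domain randomization; the actual tree and fruit generation scripts live in the linked repo.

```python
import random
import bpy

# Add a sun lamp with a randomised orientation and intensity.
bpy.ops.object.light_add(type='SUN', location=(0.0, 0.0, 10.0))
sun = bpy.context.object
sun.rotation_euler = (random.uniform(0.0, 1.2), 0.0, random.uniform(0.0, 6.28))
sun.data.energy = random.uniform(2.0, 6.0)
```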

PiNet: A Permutation Invariant Graph Neural Network for Graph Classification

Title PiNet: A Permutation Invariant Graph Neural Network for Graph Classification
Authors Peter Meltzer, Marcelo Daniel Gutierrez Mallea, Peter J. Bentley
Abstract We propose an end-to-end deep learning model for graph classification and representation learning that is invariant to permutation of the nodes of the input graphs. We address the challenge of learning a fixed size graph representation for graphs of varying dimensions through a differentiable node attention pooling mechanism. In addition to a theoretical proof of its invariance to permutation, we provide empirical evidence demonstrating the statistically significant gain in accuracy when faced with an isomorphic graph classification task given only a small number of training examples. We analyse the effect of four different matrices used to facilitate the local message passing mechanism by which graph convolutions are performed, versus a matrix parametrised by a learned parameter pair able to transition smoothly between them. Finally, we show that our model achieves competitive classification performance with existing techniques on a set of molecule datasets.
Tasks Graph Classification, Representation Learning
Published 2019-05-08
URL https://arxiv.org/abs/1905.03046v1
PDF https://arxiv.org/pdf/1905.03046v1.pdf
PWC https://paperswithcode.com/paper/pinet-a-permutation-invariant-graph-neural
Repo https://github.com/meltzerpete/PiNet
Framework pytorch
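
A small sketch of the differentiable node-attention pooling that maps a variable-sized graph to a fixed-size representation (permutation-invariant because it is a weighted sum over nodes); the message-passing layers that produce the node features and attention logits in PiNet are omitted here.

```python
import torch
import torch.nn.functional as F

def attention_pool(node_feats, attn_logits):
    """node_feats: (N, F); attn_logits: (N, 1) -> fixed-size (F,) graph embedding."""
    weights = F.softmax(attn_logits, dim=0)          # attention distribution over nodes
    return (weights * node_feats).sum(dim=0)         # permutation-invariant readout

g = attention_pool(torch.randn(17, 32), torch.randn(17, 1))   # works for any node count
```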

Semi-Supervised Unconstrained Action Unit Detection via Latent Feature Domain

Title Semi-Supervised Unconstrained Action Unit Detection via Latent Feature Domain
Authors Zhiwen Shao, Jianfei Cai, Tat-Jen Cham, Xuequan Lu, Lizhuang Ma
Abstract Facial action unit (AU) detection in the wild is a challenging problem, due to the unconstrained variability in facial appearances and the lack of accurate AU annotations. Most existing methods either depend on impractical labor-intensive labeling by experts, or inaccurate pseudo labels. In this paper, we propose an end-to-end semi-supervised unconstrained AU detection framework, which transfers accurate AU labels from a constrained source domain to an unconstrained target domain by exploiting accurate labels of AU-related facial landmarks. Specifically, we map a source image with AU label and a target image without AU label into a latent feature domain by combining source landmark-related feature with target landmark-free feature. Due to the combination of source AU-related information and target AU-free information, the latent feature domain with the transferred source AU label can be learned by maximizing the target-domain AU detection performance. Moreover, to disentangle landmark-related and landmark-free features, we introduce a novel landmark adversarial loss which can solve the multi-player minimax game in adversarial learning. Our framework can also be naturally extended for use with target-domain pseudo AU labels. Extensive experiments show that our method soundly outperforms the baselines, upper-bounds and state-of-the-art approaches on the challenging BP4D, GFT and EmotioNet benchmarks. The code for our method is available at https://github.com/ZhiwenShao/ADLD.
Tasks Action Unit Detection, Image Generation
Published 2019-03-25
URL https://arxiv.org/abs/1903.10143v3
PDF https://arxiv.org/pdf/1903.10143v3.pdf
PWC https://paperswithcode.com/paper/weakly-supervised-unconstrained-action-unit
Repo https://github.com/ZhiwenShao/ADLD
Framework pytorch
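
A structural sketch of the cross-domain feature combination described above (not the authors' network): a source image contributes its landmark-related feature, a target image contributes its landmark-free feature, and the combined latent code is trained with the source AU labels so that AU supervision transfers to the target domain. The encoders and classifier below are placeholder modules; the adversarial disentanglement loss is omitted.

```python
import torch
import torch.nn as nn

feat_rel = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 128))   # landmark-related encoder
feat_free = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 128))  # landmark-free encoder
au_classifier = nn.Linear(256, 12)                                    # 12 AUs, multi-label head

source, target = torch.rand(8, 3, 64, 64), torch.rand(8, 3, 64, 64)
source_au = torch.randint(0, 2, (8, 12)).float()

latent = torch.cat([feat_rel(source), feat_free(target)], dim=1)      # latent feature domain
loss = nn.functional.binary_cross_entropy_with_logits(au_classifier(latent), source_au)
```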

Scalability in Perception for Autonomous Driving: Waymo Open Dataset

Title Scalability in Perception for Autonomous Driving: Waymo Open Dataset
Authors Pei Sun, Henrik Kretzschmar, Xerxes Dotiwalla, Aurelien Chouard, Vijaysai Patnaik, Paul Tsui, James Guo, Yin Zhou, Yuning Chai, Benjamin Caine, Vijay Vasudevan, Wei Han, Jiquan Ngiam, Hang Zhao, Aleksei Timofeev, Scott Ettinger, Maxim Krivokon, Amy Gao, Aditya Joshi, Sheng Zhao, Shuyang Cheng, Yu Zhang, Jonathon Shlens, Zhifeng Chen, Dragomir Anguelov
Abstract The research community has increasing interest in autonomous driving research, despite the resource intensity of obtaining representative real world data. Existing self-driving datasets are limited in the scale and variation of the environments they capture, even though generalization within and between operating regions is crucial to the overall viability of the technology. In an effort to help align the research community’s contributions with real-world self-driving problems, we introduce a new large scale, high quality, diverse dataset. Our new dataset consists of 1150 scenes that each span 20 seconds, consisting of well synchronized and calibrated high quality LiDAR and camera data captured across a range of urban and suburban geographies. It is 15x more diverse than the largest camera+LiDAR dataset available based on our proposed diversity metric. We exhaustively annotated this data with 2D (camera image) and 3D (LiDAR) bounding boxes, with consistent identifiers across frames. Finally, we provide strong baselines for 2D as well as 3D detection and tracking tasks. We further study the effects of dataset size and generalization across geographies on 3D detection methods. Find data, code and more up-to-date information at http://www.waymo.com/open.
Tasks Autonomous Driving
Published 2019-12-10
URL https://arxiv.org/abs/1912.04838v6
PDF https://arxiv.org/pdf/1912.04838v6.pdf
PWC https://paperswithcode.com/paper/scalability-in-perception-for-autonomous
Repo https://github.com/JdeRobot/BehaviorSuite
Framework tf
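
A hedged quickstart for reading the released TFRecords with the waymo-open-dataset tooling (the package API has changed between releases; consult waymo.com/open for the current version). The file path is a placeholder.

```python
import tensorflow as tf
from waymo_open_dataset import dataset_pb2 as open_dataset

dataset = tf.data.TFRecordDataset("segment-XXXX.tfrecord", compression_type="")
for data in dataset.take(1):
    frame = open_dataset.Frame()
    frame.ParseFromString(bytearray(data.numpy()))     # one Frame proto per record
    print(frame.context.name, len(frame.lasers), len(frame.images))
```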

Order-free Learning Alleviating Exposure Bias in Multi-label Classification

Title Order-free Learning Alleviating Exposure Bias in Multi-label Classification
Authors Che-Ping Tsai, Hung-Yi Lee
Abstract Multi-label classification (MLC) assigns multiple labels to each sample. Prior studies show that MLC can be transformed to a sequence prediction problem with a recurrent neural network (RNN) decoder to model the label dependency. However, training an RNN decoder requires a predefined order of labels, which is not directly available in the MLC specification. Besides, an RNN thus trained tends to overfit the label combinations in the training set and has difficulty generating unseen label sequences. In this paper, we propose a new framework for MLC which does not rely on a predefined label order and thus alleviates exposure bias. The experimental results on three multi-label classification benchmark datasets show that our method outperforms competitive baselines by a large margin. We also find that the proposed approach has a higher probability of generating label combinations not seen during training than the baseline models. This result shows that the proposed approach has better generalization capability.
Tasks Multi-Label Classification
Published 2019-09-08
URL https://arxiv.org/abs/1909.03434v1
PDF https://arxiv.org/pdf/1909.03434v1.pdf
PWC https://paperswithcode.com/paper/order-free-learning-alleviating-exposure-bias
Repo https://github.com/jackyyy0228/Order-free-Learning-Alleviating-Exposure-Bias-in-Multi-label-Classification
Framework pytorch
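
A schematic PyTorch sketch of an order-free decoding loss in the spirit of the paper: at each step the decoder is not forced to follow one fixed gold order, but is rewarded for putting probability mass on any gold label it has not emitted yet. This is a simplified illustration, not the authors' exact training objective.

```python
import torch

def order_free_step_loss(step_logits, gold_labels, emitted):
    """step_logits: (L,) logits over the label vocab; gold_labels, emitted: sets of label ids."""
    remaining = list(gold_labels - emitted)          # gold labels still to be produced
    log_probs = torch.log_softmax(step_logits, dim=-1)
    # Negative log of the total probability assigned to the remaining gold labels.
    return -torch.logsumexp(log_probs[remaining], dim=0)

loss = order_free_step_loss(torch.randn(20), {3, 7, 11}, {7})
```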

FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving

Title FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving
Authors Varun Ravi Kumar, Sandesh Athni Hiremath, Stefan Milz, Christian Witt, Clement Pinnard, Senthil Yogamani, Patrick Mader
Abstract Fisheye cameras are commonly used in applications like autonomous driving and surveillance to provide a large field of view ($>180^\circ$). However, they come at the cost of strong non-linear distortion which requires more complex algorithms. In this paper, we explore Euclidean distance estimation on fisheye cameras for automotive scenes. Obtaining accurate and dense depth supervision is difficult in practice, but self-supervised learning approaches show promising results and could potentially overcome the problem. We present a novel self-supervised scale-aware framework for learning Euclidean distance and ego-motion from raw monocular fisheye videos without applying rectification. While it is possible to perform a piece-wise linear approximation of the fisheye projection surface and apply standard rectilinear models, it has its own set of issues like re-sampling distortion and discontinuities in transition regions. To encourage further research in this area, we will release this dataset as part of our WoodScape project \cite{yogamani2019woodscape}. We further evaluated the proposed algorithm on the KITTI dataset and obtained state-of-the-art results comparable to other self-supervised monocular methods. Qualitative results on an unseen fisheye video demonstrate impressive performance, see https://youtu.be/Sgq1WzoOmXg .
Tasks Autonomous Driving
Published 2019-10-07
URL https://arxiv.org/abs/1910.04076v2
PDF https://arxiv.org/pdf/1910.04076v2.pdf
PWC https://paperswithcode.com/paper/fisheyedistancenet-self-supervised-scale
Repo https://github.com/rvarun7777/FisheyeDistanceNet
Framework none
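
A generic sketch of the self-supervised photometric objective that this family of methods builds on: a view of the scene synthesised from a neighbouring frame is compared to the target frame, with an edge-aware smoothness term on the predicted distance map. The fisheye-specific projection/warping and the scale-aware ego-motion handling described in the abstract are not shown here.

```python
import torch

def photometric_l1(target, reconstructed):
    # Mean absolute error between the target frame and the view synthesised
    # from an adjacent frame using the predicted distance and ego-motion.
    return (target - reconstructed).abs().mean()

def edge_aware_smoothness(dist, image):
    # Penalise distance-map gradients except where the image itself has strong edges.
    dd_x = (dist[:, :, :, 1:] - dist[:, :, :, :-1]).abs()
    dd_y = (dist[:, :, 1:, :] - dist[:, :, :-1, :]).abs()
    di_x = (image[:, :, :, 1:] - image[:, :, :, :-1]).abs().mean(1, keepdim=True)
    di_y = (image[:, :, 1:, :] - image[:, :, :-1, :]).abs().mean(1, keepdim=True)
    return (dd_x * torch.exp(-di_x)).mean() + (dd_y * torch.exp(-di_y)).mean()
```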