April 3, 2020

# Paper Group AWR 66

M2m: Imbalanced Classification via Major-to-minor Translation. Toronto-3D: A Large-scale Mobile LiDAR Dataset for Semantic Segmentation of Urban Roadways. Learn to Design the Heuristics for Vehicle Routing Problem. Graph Neural Distance Metric Learning with Graph-Bert. Sound Event Detection with Depthwise Separable and Dilated Convolutions. Regiona …

#### M2m: Imbalanced Classification via Major-to-minor Translation

Title M2m: Imbalanced Classification via Major-to-minor Translation
Authors Jaehyung Kim, Jongheon Jeong, Jinwoo Shin
Abstract In most real-world scenarios, labeled training datasets are highly class-imbalanced, and deep neural networks trained on them struggle to generalize to a balanced testing criterion. In this paper, we explore a novel yet simple way to alleviate this issue by augmenting less-frequent classes via translating samples (e.g., images) from more-frequent classes. This simple approach enables a classifier to learn more generalizable features of minority classes by transferring and leveraging the diversity of the majority information. Our experimental results on a variety of class-imbalanced datasets show that the proposed method significantly improves generalization on minority classes compared to existing re-sampling and re-weighting methods. The performance of our method even surpasses that of previous state-of-the-art methods for imbalanced classification.
Published 2020-04-01
URL https://arxiv.org/abs/2004.00431v1
PDF https://arxiv.org/pdf/2004.00431v1.pdf
PWC https://paperswithcode.com/paper/m2m-imbalanced-classification-via-major-to
Repo https://github.com/alinlab/M2m
Framework none
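The core "major-to-minor" idea can be illustrated with a toy sketch. A majority-class sample x0 is perturbed, x = x0 + delta, by gradient descent on the input so that a pre-trained classifier g assigns it to a minority class. The sketch assumes g is a hypothetical linear softmax classifier. It omits the paper's extra regularization term and its acceptance/rejection criterion, so it only shows the translation step, not the full M2m procedure.

```python
import math


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def logits(W, b, x):
    """Hypothetical linear classifier g(x) = W x + b, standing in for the pre-trained network."""
    return [sum(w * xi for w, xi in zip(row, x)) + bj for row, bj in zip(W, b)]


def translate(W, b, x0, target, lr=0.5, steps=50):
    """Optimise x = x0 + delta by gradient descent on the cross-entropy of `target`
    under g, pushing a majority-class sample x0 towards the minority class."""
    x = list(x0)
    for _ in range(steps):
        p = softmax(logits(W, b, x))
        # dCE/dz = p - onehot(target); through z = W x + b this gives dCE/dx = W^T (p - onehot)
        err = [pj - (1.0 if j == target else 0.0) for j, pj in enumerate(p)]
        grad = [sum(err[j] * W[j][i] for j in range(len(W))) for i in range(len(x))]
        x = [xi - lr * gi for xi, gi in zip(x, grad)]
    return x


W, b = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
x0 = [2.0, 0.0]                       # classified as class 0 (the "majority" class)
x_star = translate(W, b, x0, target=1)
z = logits(W, b, x_star)
print(z.index(max(z)))                # → 1
```

In the paper the translated samples are added to the minority classes' training data. This toy only shows that a small input-space optimisation can relabel a sample under g.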

#### Toronto-3D: A Large-scale Mobile LiDAR Dataset for Semantic Segmentation of Urban Roadways

Title Toronto-3D: A Large-scale Mobile LiDAR Dataset for Semantic Segmentation of Urban Roadways
Authors Weikai Tan, Nannan Qin, Lingfei Ma, Ying Li, Jing Du, Guorong Cai, Ke Yang, Jonathan Li
Abstract Semantic segmentation of large-scale outdoor point clouds is essential for urban scene understanding in various applications, especially autonomous driving and urban high-definition (HD) mapping. With the rapid development of mobile laser scanning (MLS) or mobile Light Detection and Ranging (LiDAR) systems, massive point clouds are available for scene understanding, but publicly accessible large-scale labeled datasets, which are essential for developing learning-based methods, are still limited. This paper introduces Toronto-3D, a large-scale urban outdoor point cloud dataset for semantic segmentation, acquired by an MLS system in Toronto, Canada. The dataset covers approximately 1 km of road and consists of about 78.3 million points with 8 labeled object classes. Baseline semantic segmentation experiments were conducted, and the results confirmed that the dataset can be used to train deep learning models effectively. Toronto-3D is publicly released to encourage new research, and the labels will be further improved and updated with feedback from the research community. The Toronto-3D dataset can be downloaded at https://github.com/WeikaiTan/Toronto-3D
Tasks Autonomous Driving, Scene Understanding, Semantic Segmentation
Published 2020-03-18
URL https://arxiv.org/abs/2003.08284v2
PDF https://arxiv.org/pdf/2003.08284v2.pdf
PWC https://paperswithcode.com/paper/toronto-3d-a-large-scale-mobile-lidar-dataset
Repo https://github.com/WeikaiTan/Toronto-3D
Framework none

#### Learn to Design the Heuristics for Vehicle Routing Problem

Title Learn to Design the Heuristics for Vehicle Routing Problem
Authors Lei Gao, Mingxiang Chen, Qichang Chen, Ganzhong Luo, Nuoyi Zhu, Zhixin Liu
Abstract This paper presents an approach to learning the local-search heuristics that iteratively improve the solution of the Vehicle Routing Problem (VRP). A local-search heuristic is composed of a destroy operator that destroys a candidate solution, followed by a repair operator that rebuilds the destroyed solution into a new one. The proposed neural network, trained with an actor-critic framework, consists of an encoder in the form of a modified Graph Attention Network in which node embeddings and edge embeddings are integrated, and a GRU-based decoder that renders a pair of destroy and repair operators. Experimental results show that it outperforms both traditional heuristic algorithms and existing neural combinatorial optimization methods for VRP on medium-scale data sets, and that it can tackle large-scale data sets (e.g., over 400 nodes), which is a considerable challenge in this area. Moreover, the need for expertise and hand-crafted heuristic design is eliminated, because the proposed network learns to design heuristics with better performance. Our implementation is available online.
Published 2020-02-20
URL https://arxiv.org/abs/2002.08539v1
PDF https://arxiv.org/pdf/2002.08539v1.pdf
PWC https://paperswithcode.com/paper/learn-to-design-the-heuristics-for-vehicle
Repo https://github.com/water-mirror/NeuLNS
Framework pytorch
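The destroy-and-repair loop described in the abstract has the same shape as classic large neighbourhood search. Below is a minimal sketch of that loop with two assumptions. First, the learned GAT/GRU policy is swapped out for two hand-crafted operators: random removal and greedy cheapest insertion. Second, the problem is simplified to a single uncapacitated route starting at a depot (node 0). All function names are hypothetical.

```python
import math
import random


def tour_length(tour, pts):
    """Length of a closed route visiting pts in the given order."""
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))


def destroy(tour, k, rng):
    """Random-removal destroy operator: drop k customers; node 0 (the depot) is kept."""
    removed = rng.sample(tour[1:], k)
    return [c for c in tour if c not in removed], removed


def repair(partial, removed, pts):
    """Greedy cheapest-insertion repair operator."""
    tour = list(partial)
    for c in removed:
        best_pos, best_cost = None, math.inf
        for i in range(len(tour)):
            a, b = pts[tour[i]], pts[tour[(i + 1) % len(tour)]]
            cost = math.dist(a, pts[c]) + math.dist(pts[c], b) - math.dist(a, b)
            if cost < best_cost:
                best_pos, best_cost = i + 1, cost
        tour.insert(best_pos, c)
    return tour


def lns(pts, iters=200, k=3, seed=0):
    """Destroy-and-repair local search that keeps a candidate only if it is shorter."""
    rng = random.Random(seed)
    best = list(range(len(pts)))
    best_len = tour_length(best, pts)
    for _ in range(iters):
        partial, removed = destroy(best, k, rng)
        cand = repair(partial, removed, pts)
        cand_len = tour_length(cand, pts)
        if cand_len < best_len:
            best, best_len = cand, cand_len
    return best, best_len


pts = [(0, 0), (2, 2), (0, 2), (2, 0), (1, 0), (0, 1), (2, 1), (1, 2)]
route, length = lns(pts)
print(route, round(length, 3))
```

In the paper's setting, the network chooses which destroy and repair operators to apply instead of the fixed random/greedy pair used here.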

#### Graph Neural Distance Metric Learning with Graph-Bert

Title Graph Neural Distance Metric Learning with Graph-Bert
Authors Jiawei Zhang
Abstract Graph distance metric learning serves as the foundation for many graph learning problems, e.g., graph clustering, graph classification and graph matching. Existing works on graph distance metric (or graph kernel) learning fail to maintain the basic properties of such metrics, e.g., non-negativity, identity of indiscernibles, symmetry and the triangle inequality. In this paper, we introduce a new graph neural network based distance metric learning approach, namely GB-DISTANCE (GRAPH-BERT based Neural Distance). Based solely on the attention mechanism, GB-DISTANCE can learn graph instance representations effectively based on a pre-trained GRAPH-BERT model. Unlike existing supervised/unsupervised metrics, GB-DISTANCE can be learned effectively in a semi-supervised manner. In addition, GB-DISTANCE maintains the basic distance metric properties mentioned above. Extensive experiments on several benchmark graph datasets demonstrate that GB-DISTANCE outperforms the existing baseline methods, especially recent graph neural network based graph metrics, by a significant margin in computing graph distance.
Tasks Graph Classification, Graph Clustering, Graph Matching, Metric Learning
Published 2020-02-09
URL https://arxiv.org/abs/2002.03427v1
PDF https://arxiv.org/pdf/2002.03427v1.pdf
PWC https://paperswithcode.com/paper/graph-neural-distance-metric-learning-with
Repo https://github.com/jwzhanggy/graph_bert_work
Framework none
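The four properties listed in the abstract can be checked directly on a pairwise distance matrix. The sketch below uses a hypothetical helper, `metric_violations`, to report which axioms a matrix breaks. It also shows that a Euclidean distance between embedding vectors satisfies all of them. The embeddings and the "learned" score matrix are invented for illustration, and we do not claim this is exactly how GB-DISTANCE enforces the properties.

```python
import itertools
import math


def metric_violations(D, tol=1e-9):
    """Return the metric axioms violated by a square distance matrix D.
    Identity of indiscernibles is only checked in one direction (zero self-distance),
    since D alone does not say which items are actually identical."""
    n = len(D)
    issues = []
    for i, j in itertools.product(range(n), repeat=2):
        if D[i][j] < -tol:
            issues.append(("non-negativity", i, j))
        if i == j and abs(D[i][j]) > tol:
            issues.append(("self-distance", i))
        if i < j and abs(D[i][j] - D[j][i]) > tol:
            issues.append(("symmetry", i, j))
    for i, j, k in itertools.product(range(n), repeat=3):
        if D[i][k] > D[i][j] + D[j][k] + tol:
            issues.append(("triangle", i, j, k))
    return issues


# Euclidean distance between (hypothetical) graph embeddings is a metric.
emb = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
D_euclid = [[math.dist(a, b) for b in emb] for a in emb]
# Hypothetical raw pairwise scores that are neither symmetric nor triangle-consistent.
D_learned = [[0.0, 1.0, 5.0], [2.0, 0.0, 1.0], [5.0, 1.0, 0.0]]

print(metric_violations(D_euclid))   # → []
print(metric_violations(D_learned))
```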

#### Sound Event Detection with Depthwise Separable and Dilated Convolutions

Title Sound Event Detection with Depthwise Separable and Dilated Convolutions
Authors Konstantinos Drossos, Stylianos I. Mimilakis, Shayan Gharib, Yanxiong Li, Tuomas Virtanen
Abstract State-of-the-art sound event detection (SED) methods usually employ a series of convolutional neural networks (CNNs) to extract useful features from the input audio signal, followed by recurrent neural networks (RNNs) to model longer temporal context in the extracted features. The number of channels of the CNNs and the size of the weight matrices of the RNNs directly affect the total number of parameters of the SED method, which can amount to a couple of million. Additionally, the usually long sequences used as input to an SED method, combined with the use of an RNN, lead to increased training time, difficulty with gradient flow, and impeded parallelization of the SED method. To tackle all these problems, we propose replacing the CNNs with depthwise separable convolutions and the RNNs with dilated convolutions. We compare the proposed method to a baseline convolutional neural network on an SED task, and achieve a reduction of the number of parameters by 85% and of the average training time per epoch by 78%, as well as an increase of the average frame-wise F1 score by 4.6% and a reduction of the average error rate by 3.8%.
Published 2020-02-02
URL https://arxiv.org/abs/2002.00476v1
PDF https://arxiv.org/pdf/2002.00476v1.pdf
PWC https://paperswithcode.com/paper/sound-event-detection-with-depthwise
Repo https://github.com/dr-costas/dnd-sed
Framework pytorch
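Two standard formulas show where the savings come from. The first is the parameter count of a standard versus a depthwise separable 2D convolution. The second is the receptive field of stacked dilated 1D convolutions, which lets convolutions cover long temporal context without recurrence. The layer sizes below (5x5 kernels, 256 channels, dilations 1/2/4/8) are illustrative assumptions, not the paper's configuration. The paper's 85% figure refers to its whole model.

```python
def conv2d_params(k_h, k_w, c_in, c_out, bias=True):
    """Standard 2D convolution: one k_h x k_w x c_in filter per output channel."""
    return k_h * k_w * c_in * c_out + (c_out if bias else 0)


def dws_conv2d_params(k_h, k_w, c_in, c_out, bias=True):
    """Depthwise separable convolution = depthwise conv + 1x1 pointwise conv."""
    depthwise = k_h * k_w * c_in + (c_in if bias else 0)   # one spatial filter per input channel
    pointwise = c_in * c_out + (c_out if bias else 0)      # 1x1 conv that mixes channels
    return depthwise + pointwise


def dilated_receptive_field(kernel, dilations):
    """Receptive field (in frames) of stacked stride-1 1D convolutions with the given dilations."""
    return 1 + sum((kernel - 1) * d for d in dilations)


print(conv2d_params(5, 5, 256, 256))             # → 1638656
print(dws_conv2d_params(5, 5, 256, 256))         # → 72448
print(dilated_receptive_field(3, [1, 2, 4, 8]))  # → 31
print(dilated_receptive_field(3, [1, 1, 1, 1]))  # → 9
```

With the same depth, doubling the dilation at each layer grows the receptive field exponentially (31 frames versus 9). This is why dilated convolutions can replace an RNN's long temporal context while remaining fully parallelizable.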

#### Regional Registration of Whole Slide Image Stacks Containing Highly Deformed Artefacts

Title Regional Registration of Whole Slide Image Stacks Containing Highly Deformed Artefacts
Authors Mahsa Paknezhad, Sheng Yang Michael Loh, Yukti Choudhury, Valerie Koh Cui Koh, Timothy Tay Kwang Yong, Hui Shan Tan, Ravindran Kanesvaran, Puay Hoon Tan, John Yuen Shyi Peng, Weimiao Yu, Yongcheng Benjamin Tan, Yong Zhen Loy, Min-Han Tan, Hwee Kuan Lee
Abstract Motivation: High-resolution 2D whole slide imaging provides rich information about tissue structure. This information becomes much richer if the 2D images can be stacked into a 3D tissue volume. A 3D analysis, however, requires accurate reconstruction of the tissue volume from the 2D image stack. This task is not trivial due to the distortions that each individual tissue slice experiences while the tissue is cut and mounted on the glass slide. Registration of whole tissue slices may be adversely affected by the deformed tissue regions; consequently, regional registration is found to be more effective. In this paper, we propose an accurate and robust regional registration algorithm for whole slide images which incrementally focuses registration on the area around the region of interest. Results: Using mean similarity index as the metric, the proposed algorithm (mean $\pm$ std: $0.84 \pm 0.11$) followed by a fine registration algorithm ($0.86 \pm 0.08$) outperformed the state-of-the-art linear whole tissue registration algorithm ($0.74 \pm 0.19$) and the regional version of this algorithm ($0.81 \pm 0.15$). The proposed algorithm also outperforms the state-of-the-art nonlinear registration algorithm (original: $0.82 \pm 0.12$, regional: $0.77 \pm 0.22$) for whole slide images and a recently proposed patch-based registration algorithm (patch size 256: $0.79 \pm 0.16$, patch size 512: $0.77 \pm 0.16$) for medical images. Availability: The C++ implementation is available online in the GitHub repository: https://github.com/MahsaPaknezhad/WSIRegistration
Published 2020-02-28
URL https://arxiv.org/abs/2002.12588v1
PDF https://arxiv.org/pdf/2002.12588v1.pdf
PWC https://paperswithcode.com/paper/regional-registration-of-whole-slide-image
Framework none

#### Learning Compositional Rules via Neural Program Synthesis

Title Learning Compositional Rules via Neural Program Synthesis
Authors Maxwell I. Nye, Armando Solar-Lezama, Joshua B. Tenenbaum, Brenden M. Lake
Abstract Many aspects of human reasoning, including language, require learning rules from very little data. Humans can do this, often learning systematic rules from very few examples, and combining these rules to form compositional rule-based systems. Current neural architectures, on the other hand, often fail to generalize in a compositional manner, especially when evaluated in ways that vary systematically from training. In this work, we present a neuro-symbolic model which learns entire rule systems from a small set of examples. Instead of directly predicting outputs from inputs, we train our model to induce the explicit system of rules governing a set of previously seen examples, drawing upon techniques from the neural program synthesis literature. Our rule-synthesis approach outperforms neural meta-learning techniques in three domains: an artificial instruction-learning domain used to evaluate human learning, the SCAN challenge datasets, and learning rule-based translations of number words into integers for a wide range of human languages.
Published 2020-03-12
URL https://arxiv.org/abs/2003.05562v1
PDF https://arxiv.org/pdf/2003.05562v1.pdf
PWC https://paperswithcode.com/paper/learning-compositional-rules-via-neural
Repo https://github.com/mtensor/rulesynthesis
Framework pytorch
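The number-word domain gives a concrete picture of what an explicit rule system looks like. Below is a small hand-written rule set for English. Units and tens add, "hundred" multiplies the current group, and scale words close a group. It is only meant to illustrate the kind of compositional rules the paper's model is trained to induce from a few examples. The rules here are written by hand, not synthesized, and are not in the paper's rule format.

```python
UNITS = {w: i for i, w in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve thirteen "
    "fourteen fifteen sixteen seventeen eighteen nineteen".split())}
TENS = {w: 10 * i for i, w in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2)}
SCALES = {"thousand": 10**3, "million": 10**6}


def words_to_int(text):
    """Interpret an English number phrase with a handful of compositional rules."""
    total, current = 0, 0
    for w in text.lower().replace("-", " ").split():
        if w == "and":
            continue                      # filler word, as in "three hundred and two"
        if w in UNITS:
            current += UNITS[w]           # rule: units add
        elif w in TENS:
            current += TENS[w]            # rule: tens add
        elif w == "hundred":
            current *= 100                # rule: "hundred" multiplies the group so far
        elif w in SCALES:
            total += current * SCALES[w]  # rule: a scale word closes the current group
            current = 0
        else:
            raise ValueError(f"unknown word: {w}")
    return total + current


print(words_to_int("three hundred and forty-two"))        # → 342
print(words_to_int("one million two hundred thousand"))   # → 1200000
```

Other languages need different rules (for example, different word orders for tens and units). This is why learning the rule system from examples, rather than hand-coding it, is the interesting part.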

#### Incorporating Relational Background Knowledge into Reinforcement Learning via Differentiable Inductive Logic Programming

Title Incorporating Relational Background Knowledge into Reinforcement Learning via Differentiable Inductive Logic Programming
Authors Ali Payani, Faramarz Fekri
Abstract Relational Reinforcement Learning (RRL) offers various desirable features. Most importantly, it allows expert knowledge to be incorporated into learning, leading to much faster learning and better generalization compared to standard deep reinforcement learning. However, most existing RRL approaches are either incapable of incorporating expert background knowledge (e.g., in the form of an explicit predicate language) or unable to learn directly from non-relational data such as images. In this paper, we propose a novel deep RRL approach based on differentiable Inductive Logic Programming (ILP) that can effectively learn relational information from images and represent the state of the environment as first-order logic predicates. Additionally, it can take expert background knowledge and incorporate it into the learning problem using appropriate predicates. The differentiable ILP allows end-to-end optimization of the entire framework for learning the policy in RRL. We show the efficacy of this novel RRL framework using environments such as BoxWorld and GridWorld, as well as relational reasoning on the Sort-of-CLEVR dataset.
Published 2020-03-23
URL https://arxiv.org/abs/2003.10386v1
PDF https://arxiv.org/pdf/2003.10386v1.pdf
PWC https://paperswithcode.com/paper/incorporating-relational-background-knowledge
Repo https://github.com/dnlRRL2020/RRL
Framework tf
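A common ingredient of differentiable ILP is to replace Boolean AND/OR with relaxations that work on beliefs in [0, 1] and are differentiable almost everywhere. Rules can then be applied by soft forward chaining. The sketch below applies a transitivity rule, using product for AND and max for OR. These operators are one common choice, assumed here for illustration. The paper's own differentiable ILP layers may use different operators, and the predicate `left_of` and its confidences are hypothetical.

```python
def soft_and(a, b):
    return a * b          # product t-norm


def soft_or(a, b):
    return max(a, b)      # max t-conorm (repeated derivations do not inflate beliefs)


def forward_chain_transitive(rel, steps=3):
    """Soft forward chaining of the rule R(x, z) <- R(x, y) AND R(y, z).
    rel[i][j] in [0, 1] is the current belief that R(i, j) holds."""
    n = len(rel)
    v = [row[:] for row in rel]
    for _ in range(steps):
        new = [row[:] for row in v]
        for x in range(n):
            for z in range(n):
                derived = 0.0
                for y in range(n):
                    derived = soft_or(derived, soft_and(v[x][y], v[y][z]))
                new[x][z] = soft_or(v[x][z], derived)
        v = new
    return v


# left_of(0, 1) and left_of(1, 2) believed with confidence 0.9 and 0.8,
# e.g. as produced by a perception module from an image (hypothetical numbers).
left_of = [[0.0, 0.9, 0.0],
           [0.0, 0.0, 0.8],
           [0.0, 0.0, 0.0]]
beliefs = forward_chain_transitive(left_of)
print(round(beliefs[0][2], 4))   # → 0.72
```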

#### X-Linear Attention Networks for Image Captioning

Title X-Linear Attention Networks for Image Captioning
Authors Yingwei Pan, Ting Yao, Yehao Li, Tao Mei
Abstract Recent progress on fine-grained visual recognition and visual question answering has featured bilinear pooling, which effectively models the 2$^{nd}$ order interactions across multi-modal inputs. Nevertheless, there has been no evidence supporting building such interactions concurrently with the attention mechanism for image captioning. In this paper, we introduce a unified attention block, the X-Linear attention block, that fully employs bilinear pooling to selectively capitalize on visual information or perform multi-modal reasoning. Technically, the X-Linear attention block simultaneously exploits both spatial and channel-wise bilinear attention distributions to capture the 2$^{nd}$ order interactions between the input single-modal or multi-modal features. Higher and even infinite order feature interactions are readily modeled by stacking multiple X-Linear attention blocks and by equipping the block with an Exponential Linear Unit (ELU) in a parameter-free fashion, respectively. Furthermore, we present X-Linear Attention Networks (dubbed X-LAN), which integrate X-Linear attention block(s) into the image encoder and sentence decoder of an image captioning model to leverage higher order intra- and inter-modal interactions. Experiments on the COCO benchmark demonstrate that X-LAN obtains the best published CIDEr performance to date, 132.0% on the COCO Karpathy test split. When Transformer is further endowed with X-Linear attention blocks, CIDEr is boosted to 132.8%. Source code is available at https://github.com/Panda-Peter/image-captioning.
Published 2020-03-31
URL https://arxiv.org/abs/2003.14080v1
PDF https://arxiv.org/pdf/2003.14080v1.pdf
PWC https://paperswithcode.com/paper/x-linear-attention-networks-for-image
Repo https://github.com/Panda-Peter/image-captioning
Framework pytorch
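Low-rank bilinear pooling takes the elementwise product of two linear projections, so each output channel is a 2nd-order interaction of the query and key. The sketch below combines it with a softmax over regions (spatial attention) and a sigmoid gate over channels (channel-wise attention). This is a simplified illustration under stated assumptions. It omits the ELU and embedding layers, the exact X-Linear channel branch, and all learned training. The weights `Wq`, `Wk`, `Wv` and `w_spatial` are hypothetical.

```python
import math


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def low_rank_bilinear(Wq, Wk, q, k):
    """(Wq q) * (Wk k), elementwise. Entry i equals q^T (Wq[i] outer Wk[i]) k,
    so every output channel is a 2nd-order (bilinear) interaction of q and k."""
    return [a * b for a, b in zip(matvec(Wq, q), matvec(Wk, k))]


def bilinear_attention(Wq, Wk, Wv, w_spatial, q, regions):
    """Simplified spatial + channel-wise attention over region features."""
    joint = [low_rank_bilinear(Wq, Wk, q, r) for r in regions]
    # spatial attention: one weight per region, normalised over regions
    beta = softmax([sum(w * j for w, j in zip(w_spatial, jr)) for jr in joint])
    # channel-wise attention: sigmoid gate on the region-averaged joint features
    mean = [sum(col) / len(joint) for col in zip(*joint)]
    gate = [1.0 / (1.0 + math.exp(-m)) for m in mean]
    values = [low_rank_bilinear(Wq, Wv, q, r) for r in regions]
    pooled = [sum(b * v[c] for b, v in zip(beta, values)) for c in range(len(gate))]
    return [g * p for g, p in zip(gate, pooled)]


Wq = [[1.0, 0.0], [0.5, -1.0]]   # hypothetical rank-2 projections
Wk = [[0.0, 1.0], [1.0, 1.0]]
q, k = [1.0, 2.0], [3.0, -1.0]
print(low_rank_bilinear(Wq, Wk, q, k))   # → [-1.0, -3.0]
```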

Authors An Zhao, Mingyu Ding, Zhiwu Lu, Tao Xiang, Yulei Niu, Jiechao Guan, Ji-Rong Wen, Ping Luo
Abstract Existing few-shot learning (FSL) methods make the implicit assumption that the few target class samples are from the same domain as the source class samples. However, in practice this assumption is often invalid – the target classes could come from a different domain. This poses an additional challenge of domain adaptation (DA) with few training samples. In this paper, the problem of domain-adaptive few-shot learning (DA-FSL) is tackled, which requires solving FSL and DA in a unified framework. To this end, we propose a novel domain-adversarial prototypical network (DAPN) model. It is designed to address a specific challenge in DA-FSL: the DA objective means that the source and target data distributions need to be aligned, typically through a shared domain-adaptive feature embedding space; but the FSL objective dictates that the target domain per class distribution must be different from that of any source domain class, meaning aligning the distributions across domains may harm the FSL performance. How to achieve global domain distribution alignment whilst maintaining source/target per-class discriminativeness thus becomes the key. Our solution is to explicitly enhance the source/target per-class separation before domain-adaptive feature embedding learning in the DAPN, in order to alleviate the negative effect of domain alignment on FSL. Extensive experiments show that our DAPN outperforms the state-of-the-art FSL and DA models, as well as their naïve combinations. The code is available at https://github.com/dingmyu/DAPN.
Published 2020-03-19
URL https://arxiv.org/abs/2003.08626v1
PDF https://arxiv.org/pdf/2003.08626v1.pdf
Repo https://github.com/dingmyu/DAPN
Framework pytorch
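DAPN builds on the prototypical-network idea. Each class is represented by the mean embedding of its few support samples, and a query is classified by a softmax over negative squared distances to these prototypes. The sketch shows only that core step. It assumes the embeddings are already given and omits DAPN's domain-adversarial alignment and per-class separation. The class names and vectors are hypothetical.

```python
import math


def prototypes(support):
    """support: {class: [embedding, ...]} -> {class: mean embedding}."""
    return {c: [sum(col) / len(vs) for col in zip(*vs)] for c, vs in support.items()}


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(query, protos):
    """Softmax over negative squared distances to the class prototypes."""
    classes = list(protos)
    scores = [-sq_dist(query, protos[c]) for c in classes]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    probs = {c: e / total for c, e in zip(classes, exps)}
    return max(probs, key=probs.get), probs


support = {"cat": [[0.0, 0.0], [0.0, 2.0]], "dog": [[4.0, 4.0], [6.0, 4.0]]}
protos = prototypes(support)          # cat -> [0.0, 1.0], dog -> [5.0, 4.0]
label, probs = classify([1.0, 1.0], protos)
print(label)                          # → cat
```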

#### Distributional Soft Actor-Critic: Off-Policy Reinforcement Learning for Addressing Value Estimation Errors

Title Distributional Soft Actor-Critic: Off-Policy Reinforcement Learning for Addressing Value Estimation Errors
Authors Jingliang Duan, Yang Guan, Shengbo Eben Li, Yangang Ren, Bo Cheng
Abstract In current reinforcement learning (RL) methods, function approximation errors are known to lead to overestimated or underestimated Q-value estimates, resulting in suboptimal policies. We show that learning a state-action return distribution function can improve the accuracy of Q-value estimation. We employ the return distribution function within the maximum entropy RL framework to develop what we call the Distributional Soft Actor-Critic (DSAC) algorithm, an off-policy method for continuous control settings. Unlike traditional distributional RL algorithms, which typically learn only a discrete return distribution, DSAC directly learns a continuous return distribution by truncating the difference between the target and current distributions to prevent gradient explosion. Additionally, we propose a new Parallel Asynchronous Buffer-Actor-Learner architecture (PABAL) to improve learning efficiency, which generalizes current high-throughput learning architectures. We evaluate our method on the suite of MuJoCo continuous control tasks, achieving state-of-the-art performance.
Published 2020-01-09
URL https://arxiv.org/abs/2001.02811v2
PDF https://arxiv.org/pdf/2001.02811v2.pdf
Repo https://github.com/Jingliang-Duan/Distributional-Soft-Actor-Critic
Framework pytorch
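Truncation helps because of how the gradient with respect to the standard deviation behaves. Assume the return distribution is modelled as a Gaussian N(mean, sigma^2). The negative log-likelihood gradient in sigma then grows with the squared gap between a sampled target and the mean, so one outlying target can produce a huge update. Clamping the target to a band around the mean bounds that gradient. The Gaussian parameterization, the band of 3 * sigma, and the numbers below are assumptions made for illustration, not settings taken from the paper.

```python
def soft_target(reward, gamma, z_next, alpha, log_prob_next):
    """Maximum-entropy bootstrapped target sample: r + gamma * (z' - alpha * log pi(a'|s'))."""
    return reward + gamma * (z_next - alpha * log_prob_next)


def clip_target(y, mean, bound):
    """Truncate the target so it lies within [mean - bound, mean + bound]."""
    return max(mean - bound, min(mean + bound, y))


def nll_grad_sigma(y, mean, sigma):
    """d/dsigma of -log N(y; mean, sigma^2) = 1/sigma - (y - mean)^2 / sigma^3."""
    return 1.0 / sigma - (y - mean) ** 2 / sigma ** 3


mean, sigma = 0.0, 0.1          # hypothetical current estimate of the return distribution
y = soft_target(reward=50.0, gamma=0.99, z_next=60.0, alpha=0.2, log_prob_next=-1.0)
raw = nll_grad_sigma(y, mean, sigma)
clipped = nll_grad_sigma(clip_target(y, mean, bound=3 * sigma), mean, sigma)
print(raw, clipped)             # raw is about -1.2e7; clipped is about -80
```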

#### Disrupting Deepfakes: Adversarial Attacks Against Conditional Image Translation Networks and Facial Manipulation Systems

Title Disrupting Deepfakes: Adversarial Attacks Against Conditional Image Translation Networks and Facial Manipulation Systems
Authors Nataniel Ruiz, Sarah Adel Bargal, Stan Sclaroff
Abstract Face modification systems using deep learning have become increasingly powerful and accessible. Given images of a person’s face, such systems can generate new images of that same person under different expressions and poses. Some systems can also modify targeted attributes such as hair color or age. Such manipulated images and videos have been dubbed Deepfakes. To prevent a malicious user from generating modified images of a person without their consent, we tackle the new problem of generating adversarial attacks against such image translation systems that disrupt the resulting output image. We call this problem disrupting deepfakes. Most image translation architectures are generative models conditioned on an attribute (e.g., put a smile on this person’s face). We are the first to propose and successfully apply (1) class transferable adversarial attacks that generalize to different classes, which means that the attacker does not need knowledge of the conditioning class, and (2) adversarial training for generative adversarial networks (GANs) as a first step towards robust image translation networks. Finally, in gray-box scenarios, blurring can mount a successful defense against disruption. We present a spread-spectrum adversarial attack, which evades blur defenses.
Published 2020-03-03
URL https://arxiv.org/abs/2003.01279v2
PDF https://arxiv.org/pdf/2003.01279v2.pdf
Repo https://github.com/natanielruiz/disrupting-deepfakes
Framework pytorch
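A disruption attack searches for a small perturbation eta of the input image x that maximizes the distortion between the generator's outputs on x + eta and on x. It uses iterative sign-gradient steps projected onto an L-infinity ball, in the style of I-FGSM, one standard way to build such perturbations. To stay self-contained, the sketch replaces the image translation GAN with a hypothetical linear map G(x) = A x, whose gradient can be written by hand. A real attack would backpropagate through the trained generator instead. The values of eps, step, and the random start are assumptions.

```python
import random


def generate(A, x):
    """Hypothetical toy 'image translation network': a fixed linear map G(x) = A x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def distortion(A, x, eta):
    """||G(x + eta) - G(x)||^2, the quantity the disruption attack maximises."""
    adv = generate(A, [xi + e for xi, e in zip(x, eta)])
    ref = generate(A, x)
    return sum((p - r) ** 2 for p, r in zip(adv, ref))


def distortion_grad(A, eta):
    """For linear G the distortion is ||A eta||^2, whose gradient is 2 A^T A eta."""
    Ae = generate(A, eta)
    return [2 * sum(A[j][i] * Ae[j] for j in range(len(A))) for i in range(len(eta))]


def sign(v):
    return (v > 0) - (v < 0)


def ifgsm_disrupt(A, x, eps=0.05, step=0.01, iters=20, seed=0):
    """Iterative sign-gradient ascent, projected onto the L-inf ball of radius eps
    and onto the valid pixel range [0, 1]."""
    rng = random.Random(seed)
    eta = [rng.uniform(-eps, eps) for _ in x]   # random start: the gradient is 0 at eta = 0
    for _ in range(iters):
        g = distortion_grad(A, eta)
        new_eta = []
        for xi, e, gi in zip(x, eta, g):
            lo, hi = max(-eps, -xi), min(eps, 1.0 - xi)
            new_eta.append(max(lo, min(hi, e + step * sign(gi))))
        eta = new_eta
    return eta


A = [[1.0, 0.5], [0.5, 1.0]]
x = [0.5, 0.5]
eta = ifgsm_disrupt(A, x)
print(eta, distortion(A, x, eta))
```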

#### Look-into-Object: Self-supervised Structure Modeling for Object Recognition

Title Look-into-Object: Self-supervised Structure Modeling for Object Recognition
Authors Mohan Zhou, Yalong Bai, Wei Zhang, Tiejun Zhao, Tao Mei
Abstract Most object recognition approaches predominantly focus on learning discriminative visual patterns while overlooking the holistic object structure. Though important, structure modeling usually requires significant manual annotation and is therefore labor-intensive. In this paper, we propose to “look into object” (explicitly yet intrinsically model the object structure) by incorporating self-supervision into the traditional framework. We show that the recognition backbone can be substantially enhanced for more robust representation learning, without any extra annotation cost or loss of inference speed. Specifically, we first propose an object-extent learning module for localizing the object according to the visual patterns shared among instances of the same category. We then design a spatial context learning module for modeling the internal structure of the object by predicting relative positions within the extent. These two modules can be easily plugged into any backbone network during training and detached at inference time. Extensive experiments show that our look-into-object approach (LIO) achieves large performance gains on a number of benchmarks, including generic object recognition (ImageNet) and fine-grained object recognition tasks (CUB, Cars, Aircraft). We also show that this learning paradigm is highly generalizable to other tasks such as object detection and segmentation (MS COCO). Project page: https://github.com/JDAI-CV/LIO.
Tasks Object Detection, Object Recognition, Representation Learning
Published 2020-03-31
URL https://arxiv.org/abs/2003.14142v1
PDF https://arxiv.org/pdf/2003.14142v1.pdf
PWC https://paperswithcode.com/paper/look-into-object-self-supervised-structure
Repo https://github.com/JDAI-CV/LIO
Framework none
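The spatial context module is trained to predict relative positions within the object extent. One way to build such self-supervised targets is to express each cell of the extent in polar coordinates (normalized distance and angle) relative to a reference cell. In the sketch below, the polar encoding, the choice of reference cell, and the normalization are assumptions for illustration. The paper's exact target definition may differ.

```python
import math


def polar_targets(mask, ref):
    """For every cell inside a binary object-extent mask, return its
    (normalised distance, normalised angle) relative to the reference cell `ref`."""
    cells = [(i, j) for i, row in enumerate(mask) for j, v in enumerate(row) if v]
    r0, c0 = ref
    dist = {cell: math.hypot(cell[0] - r0, cell[1] - c0) for cell in cells}
    max_d = max(dist.values()) or 1.0                         # avoid dividing by 0 for a 1-cell extent
    targets = {}
    for i, j in cells:
        angle = math.atan2(i - r0, j - c0) % (2 * math.pi)   # in [0, 2*pi)
        targets[(i, j)] = (dist[(i, j)] / max_d, angle / (2 * math.pi))
    return targets


mask = [[1, 1, 1],
        [1, 1, 1],
        [0, 1, 0]]     # hypothetical object extent on a 3x3 feature map
t = polar_targets(mask, ref=(1, 1))
print(t[(1, 2)], t[(0, 1)])
```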

#### Discovering Nonlinear Relations with Minimum Predictive Information Regularization

Title Discovering Nonlinear Relations with Minimum Predictive Information Regularization
Authors Tailin Wu, Thomas Breuel, Michael Skuhersky, Jan Kautz
Abstract Identifying the underlying directional relations from observational time series with nonlinear interactions and complex relational structures is key to a wide range of applications, yet remains a hard problem. In this work, we introduce a novel minimum predictive information regularization method to infer directional relations from time series, allowing deep learning models to discover nonlinear relations. Our method substantially outperforms other methods for learning nonlinear relations in synthetic datasets, and discovers the directional relations in a video game environment and a heart-rate vs. breath-rate dataset.
Published 2020-01-07
URL https://arxiv.org/abs/2001.01885v1
PDF https://arxiv.org/pdf/2001.01885v1.pdf
PWC https://paperswithcode.com/paper/discovering-nonlinear-relations-with-minimum
Repo https://github.com/tailintalent/causal
Framework pytorch

#### Biologically-Motivated Deep Learning Method using Hierarchical Competitive Learning

Title Biologically-Motivated Deep Learning Method using Hierarchical Competitive Learning
Authors Takashi Shinozaki
Abstract This study proposes a novel biologically-motivated learning method for deep convolutional neural networks (CNNs). The combination of CNNs and back propagation (BP) learning is the most powerful method in recent machine learning regimes. However, it requires large amounts of labeled data for training, and this requirement can occasionally become a barrier for real-world applications. To address this problem and utilize unlabeled data, I propose introducing unsupervised competitive learning, which only requires forward-propagating signals, as a pre-training method for CNNs. The method was evaluated on image discrimination tasks using the MNIST, CIFAR-10, and ImageNet datasets, and it achieved state-of-the-art performance among biologically-motivated methods in the ImageNet experiment. The results suggest that the method enables learning higher-level representations solely from forward-propagating signals, without a backward error signal for the learning of convolutional layers. The proposed method could be useful for a variety of poorly labeled data, for example, time series or medical data.