Paper Group AWR 63
Closed-loop deep learning: generating forward models with back-propagation. GenDICE: Generalized Offline Estimation of Stationary Values. Towards Rapid and Robust Adversarial Training with One-Step Attacks. An Auxiliary Task for Learning Nuclei Segmentation in 3D Microscopy Images. Adversarial Training for Aspect-Based Sentiment Analysis with BERT. …
Closed-loop deep learning: generating forward models with back-propagation
Title | Closed-loop deep learning: generating forward models with back-propagation |
Authors | Sama Daryanavard, Bernd Porr |
Abstract | A reflex is a simple closed-loop control approach that tries to minimise an error but fails to do so because it always reacts too late. An adaptive algorithm can use this error to learn a forward model with the help of predictive cues. For example, a driver learns to improve their steering by looking ahead so as to avoid steering at the last moment. In order to process complex cues such as the road ahead, deep learning is a natural choice. However, this is usually achieved only indirectly, by employing deep reinforcement learning with a discrete state space. Here, we show how it can be achieved directly by embedding deep learning into a closed-loop system while preserving its continuous processing. Specifically, we show how error back-propagation can be performed in z-space and, more generally, how gradient-based approaches can be analysed in such closed-loop scenarios. The performance of this learning paradigm is demonstrated with a line follower, both in simulation and on a real robot, which exhibits very fast and continuous learning. |
Tasks | |
Published | 2020-01-09 |
URL | https://arxiv.org/abs/2001.02970v2 |
PDF | https://arxiv.org/pdf/2001.02970v2.pdf |
PWC | https://paperswithcode.com/paper/closed-loop-deep-learning-generating-forward |
Repo | https://github.com/a2198699s/AI-Line-Follower |
Framework | none |
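
The closed-loop learning rule can be illustrated with a toy version: a predictive cue arrives a few steps before the disturbance it predicts, and a single weight is updated with the product of the reflex error and that cue. This is only a minimal single-weight sketch of the idea; the paper embeds a full deep network and derives back-propagation in z-space, neither of which is reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
lr, T, delay = 0.01, 5000, 5

w = 0.0                       # single weight standing in for the forward model
x_buf = np.zeros(delay + 1)   # predictive cue arrives `delay` steps early
errors = []

for t in range(T):
    cue = rng.uniform(-1, 1)              # predictive input (e.g. road ahead)
    x_buf = np.roll(x_buf, 1)
    x_buf[0] = cue
    old_cue = x_buf[-1]                   # the cue observed `delay` steps ago
    disturbance = old_cue                 # toy world: the cue fully predicts it
    action = w * old_cue                  # control computed from the cue
    error = disturbance - action          # residual error the reflex still sees
    w += lr * error * old_cue             # correlate the error with the cue
    errors.append(abs(error))

print(f"mean |error|, first 100 steps: {np.mean(errors[:100]):.3f}")
print(f"mean |error|, last 100 steps:  {np.mean(errors[-100:]):.3f}")
```

As learning proceeds, the forward model cancels the disturbance before the reflex has to react, which is the behaviour the line-follower experiments demonstrate at scale.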
GenDICE: Generalized Offline Estimation of Stationary Values
Title | GenDICE: Generalized Offline Estimation of Stationary Values |
Authors | Ruiyi Zhang, Bo Dai, Lihong Li, Dale Schuurmans |
Abstract | An important problem that arises in reinforcement learning and Monte Carlo methods is estimating quantities defined by the stationary distribution of a Markov chain. In many real-world applications, access to the underlying transition operator is limited to a fixed set of data that has already been collected, without additional interaction with the environment being available. We show that consistent estimation remains possible in this challenging scenario, and that effective estimation can still be achieved in important applications. Our approach is based on estimating a ratio that corrects for the discrepancy between the stationary and empirical distributions, derived from fundamental properties of the stationary distribution, and exploiting constraint reformulations based on variational divergence minimization. The resulting algorithm, GenDICE, is straightforward and effective. We prove its consistency under general conditions, provide an error analysis, and demonstrate strong empirical performance on benchmark problems, including off-line PageRank and off-policy policy evaluation. |
Tasks | |
Published | 2020-02-21 |
URL | https://arxiv.org/abs/2002.09072v1 |
PDF | https://arxiv.org/pdf/2002.09072v1.pdf |
PWC | https://paperswithcode.com/paper/gendice-generalized-offline-estimation-of-1 |
Repo | https://github.com/zhangry868/GenDICE |
Framework | none |
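
The quantity GenDICE estimates can be made concrete on a toy chain. Below, the stationary distribution of a known transition matrix is computed exactly and used to form the correction ratio between the stationary and empirical distributions. This is only a sketch of what is being estimated: GenDICE itself never sees the transition matrix and instead solves a variational divergence minimization over logged transitions.

```python
import numpy as np

# Toy 3-state Markov chain. GenDICE does not assume access to P like this;
# it estimates the ratio purely from a fixed dataset of transitions.
P = np.array([[0.9, 0.1, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.2, 0.8]])

# Stationary distribution: left eigenvector of P with eigenvalue 1.
vals, vecs = np.linalg.eig(P.T)
mu = np.real(vecs[:, np.argmax(np.real(vals))])
mu /= mu.sum()

# Data were logged under a different empirical distribution p.
p = np.array([0.5, 0.3, 0.2])

# The correction ratio GenDICE targets: tau = mu / p.
tau = mu / p

# Importance-corrected estimate of a stationary expectation E_mu[f].
f = np.array([1.0, 2.0, 3.0])
print("E_mu[f] directly:       ", mu @ f)
print("E_mu[f] via tau under p:", p @ (tau * f))
```

Both printed values agree, which is exactly the identity E_mu[f] = E_p[tau * f] that the learned ratio exploits for off-policy evaluation and offline PageRank.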
Towards Rapid and Robust Adversarial Training with One-Step Attacks
Title | Towards Rapid and Robust Adversarial Training with One-Step Attacks |
Authors | Leo Schwinn, René Raab, Björn Eskofier |
Abstract | Adversarial training is the most successful empirical method for increasing the robustness of neural networks against adversarial attacks. However, the most effective approaches, such as training with Projected Gradient Descent (PGD), come with high computational complexity. In this paper, we present two ideas that, in combination, enable adversarial training with the computationally less expensive Fast Gradient Sign Method (FGSM). First, we add uniform noise to the initial data point of the FGSM attack, which creates a wider variety of adversaries, thus prohibiting overfitting to one particular perturbation bound. Further, we add a learnable regularization step prior to the neural network, which we call Pixelwise Noise Injection Layer (PNIL). Inputs propagated through the PNIL are resampled from a learned Gaussian distribution. The regularization induced by the PNIL prevents the model from learning to obfuscate its gradients, a factor that hindered prior approaches from successfully applying one-step methods for adversarial training. We show that noise injection in conjunction with FGSM-based adversarial training achieves comparable results to adversarial training with PGD while being considerably faster. Moreover, we outperform PGD-based adversarial training by combining noise injection and the PNIL. |
Tasks | |
Published | 2020-02-24 |
URL | https://arxiv.org/abs/2002.10097v4 |
PDF | https://arxiv.org/pdf/2002.10097v4.pdf |
PWC | https://paperswithcode.com/paper/fast-and-stable-adversarial-training-through |
Repo | https://github.com/SchwinnL/ML |
Framework | tf |
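
The first ingredient, FGSM with a uniform random start, is easy to sketch in PyTorch. This is a hedged sketch of that one component only: the PNIL layer and the paper's exact step sizes and input ranges are not reproduced, and the final clamp assumes images scaled to [0, 1].

```python
import torch

def fgsm_with_noise(model, loss_fn, x, y, eps):
    """One-step FGSM with a uniform random start inside the eps-ball.
    Sketch of the paper's first idea; the PNIL layer is not included."""
    delta = torch.empty_like(x).uniform_(-eps, eps)  # diverse starting points
    delta.requires_grad_(True)
    loss = loss_fn(model(x + delta), y)
    grad, = torch.autograd.grad(loss, delta)
    # One signed gradient step, projected back into the eps-ball.
    delta = (delta + eps * grad.sign()).clamp(-eps, eps).detach()
    return (x + delta).clamp(0.0, 1.0)  # assumes inputs in [0, 1]

# In an adversarial training loop, per batch, one would then do:
#   x_adv = fgsm_with_noise(model, loss_fn, x, y, eps=8 / 255)
#   loss = loss_fn(model(x_adv), y); loss.backward(); optimizer.step()
```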
An Auxiliary Task for Learning Nuclei Segmentation in 3D Microscopy Images
Title | An Auxiliary Task for Learning Nuclei Segmentation in 3D Microscopy Images |
Authors | Peter Hirsch, Dagmar Kainmueller |
Abstract | Segmentation of cell nuclei in microscopy images is a prevalent necessity in cell biology. Especially for three-dimensional datasets, manual segmentation is prohibitively time-consuming, motivating the need for automated methods. Learning-based methods trained on pixel-wise ground-truth segmentations have been shown to yield state-of-the-art results on 2D benchmark image data of nuclei, yet a respective benchmark is missing for 3D image data. In this work, we perform a comparative evaluation of nuclei segmentation algorithms on a database of manually segmented 3D light microscopy volumes. We propose a novel learning strategy that boosts segmentation accuracy by means of a simple auxiliary task, thereby robustly outperforming each of our baselines. Furthermore, we show that one of our baselines, the popular three-label model, when trained with our proposed auxiliary task, outperforms the recent StarDist-3D. As an additional, practical contribution, we benchmark nuclei segmentation against nuclei detection, i.e., the task of merely pinpointing individual nuclei without generating respective pixel-accurate segmentations. For learning nuclei detection, large 3D training datasets of manually annotated nuclei center points are available. However, the impact on detection accuracy caused by training on such sparse ground truth, as opposed to dense pixel-wise ground truth, has not yet been quantified. To this end, we compare nuclei detection accuracy yielded by training on dense vs. sparse ground truth. Our results suggest that training on sparse ground truth yields competitive nuclei detection rates. |
Tasks | |
Published | 2020-02-07 |
URL | https://arxiv.org/abs/2002.02857v1 |
PDF | https://arxiv.org/pdf/2002.02857v1.pdf |
PWC | https://paperswithcode.com/paper/an-auxiliary-task-for-learning-nuclei |
Repo | https://github.com/Kainmueller-Lab/aux_cpv_loss |
Framework | tf |
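
The auxiliary-task setup can be pictured as a shared trunk with two heads trained jointly. The sketch below is generic and assumes the auxiliary target is a per-voxel vector pointing to the nucleus center (suggested by the repo name aux_cpv_loss, but that reading is an assumption); the trunk, head sizes, and loss weighting are placeholders rather than the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SegWithAuxHead(nn.Module):
    """Shared trunk, one segmentation head, one auxiliary head."""
    def __init__(self, in_ch=1, feat=32, n_classes=3):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv3d(in_ch, feat, 3, padding=1), nn.ReLU(),
            nn.Conv3d(feat, feat, 3, padding=1), nn.ReLU())
        self.seg_head = nn.Conv3d(feat, n_classes, 1)  # e.g. three-label model
        self.aux_head = nn.Conv3d(feat, 3, 1)          # assumed: 3D offsets

    def forward(self, x):
        h = self.trunk(x)
        return self.seg_head(h), self.aux_head(h)

def joint_loss(seg_logits, seg_gt, aux_pred, aux_gt, aux_weight=1.0):
    # The auxiliary loss only shapes the shared features; at test time the
    # segmentation head alone produces the output.
    return (F.cross_entropy(seg_logits, seg_gt)
            + aux_weight * F.mse_loss(aux_pred, aux_gt))
```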
Adversarial Training for Aspect-Based Sentiment Analysis with BERT
Title | Adversarial Training for Aspect-Based Sentiment Analysis with BERT |
Authors | Akbar Karimi, Leonardo Rossi, Andrea Prati, Katharina Full |
Abstract | Aspect-Based Sentiment Analysis (ABSA) deals with the extraction of sentiments and their targets. Collecting labeled data for this task, in order to help neural networks generalize better, can be laborious and time-consuming. As an alternative, data similar to real-world examples can be produced artificially through an adversarial process carried out in the embedding space. Although these examples are not real sentences, they have been shown to act as a regularization method which can make neural networks more robust. In this work, we apply adversarial training, which was put forward by Goodfellow et al. (2014), to the post-trained BERT (BERT-PT) language model proposed by Xu et al. (2019) on the two major tasks of Aspect Extraction and Aspect Sentiment Classification in sentiment analysis. After improving the results of post-trained BERT with an ablation study, we propose a novel architecture called BERT Adversarial Training (BAT) to utilize adversarial training in ABSA. The proposed model outperforms post-trained BERT in both tasks. To the best of our knowledge, this is the first study on the application of adversarial training in ABSA. |
Tasks | Aspect-Based Sentiment Analysis, Aspect Extraction, Language Modelling, Sentiment Analysis |
Published | 2020-01-30 |
URL | https://arxiv.org/abs/2001.11316v2 |
PDF | https://arxiv.org/pdf/2001.11316v2.pdf |
PWC | https://paperswithcode.com/paper/adversarial-training-for-aspect-based |
Repo | https://github.com/akkarimi/Adversarial-Training-for-ABSA |
Framework | pytorch |
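
Adversarial examples in the embedding space, as the abstract describes, amount to perturbing the word embeddings along the loss gradient rather than the discrete tokens. A minimal sketch follows, assuming a model that accepts embeddings directly (e.g. the inputs_embeds argument of Hugging Face BERT); the L2 normalization and epsilon are common choices, not necessarily BAT's exact settings.

```python
import torch

def embedding_adversarial_loss(model, embeds, labels, loss_fn, eps=1.0):
    """Clean loss plus loss on embeddings perturbed along the gradient
    (Goodfellow et al., 2014, applied in embedding space)."""
    embeds = embeds.detach().requires_grad_(True)
    clean_loss = loss_fn(model(embeds), labels)
    grad, = torch.autograd.grad(clean_loss, embeds, retain_graph=True)
    # L2-normalized perturbation of magnitude eps per token embedding.
    delta = eps * grad / (grad.norm(dim=-1, keepdim=True) + 1e-12)
    adv_loss = loss_fn(model(embeds + delta.detach()), labels)
    return clean_loss + adv_loss
```

Because the perturbation lives in the continuous embedding space, the adversarial "sentences" need not correspond to any real token sequence, which is exactly the regularization effect the abstract points out.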
RPM-Net: Robust Point Matching using Learned Features
Title | RPM-Net: Robust Point Matching using Learned Features |
Authors | Zi Jian Yew, Gim Hee Lee |
Abstract | Iterative Closest Point (ICP) solves the rigid point cloud registration problem iteratively in two steps: (1) make hard assignments of spatially closest point correspondences, and then (2) find the least-squares rigid transformation. The hard assignments of closest point correspondences based on spatial distances are sensitive to the initial rigid transformation and noisy/outlier points, which often cause ICP to converge to wrong local minima. In this paper, we propose RPM-Net, a deep learning-based approach to rigid point cloud registration that is less sensitive to initialization and more robust. To this end, our network uses the differentiable Sinkhorn layer and annealing to get soft assignments of point correspondences from hybrid features learned from both spatial coordinates and local geometry. To further improve registration performance, we introduce a secondary network to predict optimal annealing parameters. Unlike some existing methods, our RPM-Net handles missing correspondences and point clouds with partial visibility. Experimental results show that our RPM-Net achieves state-of-the-art performance compared to existing non-deep-learning and recent deep learning methods. Our source code is available at the project website https://github.com/yewzijian/RPMNet . |
Tasks | Point Cloud Registration |
Published | 2020-03-30 |
URL | https://arxiv.org/abs/2003.13479v1 |
PDF | https://arxiv.org/pdf/2003.13479v1.pdf |
PWC | https://paperswithcode.com/paper/2003-13479 |
Repo | https://github.com/yewzijian/RPMNet |
Framework | none |
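
The Sinkhorn-with-annealing step at the heart of the abstract is compact enough to sketch. The version below uses random stand-in features and omits the slack row/column that RPM-Net uses to absorb outliers and partial visibility; it only shows how annealing the inverse temperature sharpens the doubly-stochastic assignment.

```python
import numpy as np
from scipy.special import logsumexp

def sinkhorn(log_scores, n_iters=20):
    """Alternating row/column normalization in log space, producing a
    (near) doubly-stochastic soft assignment matrix."""
    log_a = log_scores.copy()
    for _ in range(n_iters):
        log_a -= logsumexp(log_a, axis=1, keepdims=True)  # normalize rows
        log_a -= logsumexp(log_a, axis=0, keepdims=True)  # normalize columns
    return np.exp(log_a)

# Random stand-in features; RPM-Net learns hybrid spatial/geometric ones.
rng = np.random.default_rng(0)
fx, fy = rng.normal(size=(5, 16)), rng.normal(size=(5, 16))
dist = ((fx[:, None, :] - fy[None, :, :]) ** 2).sum(-1)

for beta in (0.1, 1.0, 10.0):  # annealing: higher beta -> harder assignments
    assign = sinkhorn(-beta * dist)
    print(f"beta={beta:>4}: largest soft assignment {assign.max():.2f}")
```

Because every operation is differentiable, gradients can flow from the soft assignments back into the features, which is what lets the secondary network learn to predict the annealing schedule.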
Projected Stein Variational Gradient Descent
Title | Projected Stein Variational Gradient Descent |
Authors | Peng Chen, Omar Ghattas |
Abstract | The curse of dimensionality is a critical challenge in Bayesian inference for high dimensional parameters. In this work, we address this challenge by developing a projected Stein variational gradient descent (pSVGD) method, which projects the parameters into a subspace that is adaptively constructed using the gradient of the log-likelihood, and applies SVGD for the much lower-dimensional coefficients of the projection. We provide an upper bound for the projection error with respect to the posterior and demonstrate the accuracy (compared to SVGD) and scalability of pSVGD with respect to the number of parameters, samples, data points, and processor cores. |
Tasks | Bayesian Inference |
Published | 2020-02-09 |
URL | https://arxiv.org/abs/2002.03469v1 |
PDF | https://arxiv.org/pdf/2002.03469v1.pdf |
PWC | https://paperswithcode.com/paper/projected-stein-variational-gradient-descent |
Repo | https://github.com/cpempire/pSVGD |
Framework | none |
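
For reference, a plain SVGD step (Liu and Wang, 2016) is sketched below on a toy Gaussian target; what pSVGD adds, namely the adaptive projection of parameters onto a low-dimensional subspace built from log-likelihood gradients, is precisely the part omitted here.

```python
import numpy as np

def svgd_step(X, grad_logp, h=1.0, lr=0.1):
    """One SVGD update with an RBF kernel; pSVGD would first project X
    onto an adaptive subspace and run this update on the coefficients."""
    diff = X[:, None, :] - X[None, :, :]            # pairwise differences
    K = np.exp(-(diff ** 2).sum(-1) / (2 * h))      # RBF kernel matrix
    repulsion = (-diff / h * K[:, :, None]).sum(0)  # kernel gradients, summed
    phi = (K @ grad_logp(X) + repulsion) / len(X)
    return X + lr * phi

# Toy target: standard 2D Gaussian, so grad log p(x) = -x.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2)) * 3 + 5
for _ in range(200):
    X = svgd_step(X, lambda Z: -Z)
print("particle mean (should be near 0):", X.mean(axis=0))
```

The repulsion term keeps particles from collapsing onto the mode; pSVGD's contribution is that this update stays effective when the parameter dimension is large, because it acts only on the projected coefficients.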
Augmented Transformer Achieves 97% and 85% for Top5 Prediction of Direct and Classical Retro-Synthesis
Title | Augmented Transformer Achieves 97% and 85% for Top5 Prediction of Direct and Classical Retro-Synthesis |
Authors | Igor V. Tetko, Pavel Karpov, Ruud Van Deursen, Guillaume Godin |
Abstract | We investigated the effect of different augmentation scenarios on predicting the (retro)synthesis of chemical compounds using the SMILES representation. We showed that augmentation of not only input sequences but also, importantly, of the target data eliminated the effect of data memorization by neural networks and improved their generalization performance for the prediction of new sequences. The Top-5 accuracy was 85.4% for the prediction of the largest fragment (thus identifying the principal transformation for classical retro-synthesis) on the USPTO-50k test dataset, achieved by a combination of SMILES augmentation and beam search. The same approach also outperformed the best published results for the prediction of direct reactions from the USPTO-MIT test set. Our model achieved 90.4% Top-1 and 96.5% Top-5 accuracy for its most challenging mixed set and 97% Top-5 accuracy for the USPTO-MIT separated set. The appearance frequency of the most abundantly generated SMILES was well correlated with the prediction outcome and can be used as a measure of the quality of reaction prediction. |
Tasks | |
Published | 2020-03-05 |
URL | https://arxiv.org/abs/2003.02804v1 |
PDF | https://arxiv.org/pdf/2003.02804v1.pdf |
PWC | https://paperswithcode.com/paper/augmented-transformer-achieves-97-and-85-for |
Repo | https://github.com/bigchem/synthesis |
Framework | tf |
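
SMILES augmentation as described, randomizing the atom ordering so one molecule yields many equivalent strings, can be sketched with RDKit. This assumes a reasonably recent RDKit where MolToSmiles accepts a doRandom flag; the paper's exact augmentation scenarios (including the target-side augmentation) are not reproduced.

```python
from rdkit import Chem

def augment_smiles(smiles, n=5):
    """Generate up to n distinct randomized SMILES for one molecule."""
    mol = Chem.MolFromSmiles(smiles)
    variants = set()
    for _ in range(n * 10):  # oversample; duplicates are discarded by the set
        variants.add(Chem.MolToSmiles(mol, canonical=False, doRandom=True))
        if len(variants) >= n:
            break
    return sorted(variants)

for s in augment_smiles("CC(=O)Oc1ccccc1C(=O)O"):  # aspirin
    print(s)  # several strings, all parsing back to the same molecule
```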
PolarNet: An Improved Grid Representation for Online LiDAR Point Clouds Semantic Segmentation
Title | PolarNet: An Improved Grid Representation for Online LiDAR Point Clouds Semantic Segmentation |
Authors | Yang Zhang, Zixiang Zhou, Philip David, Xiangyu Yue, Zerong Xi, Hassan Foroosh |
Abstract | The requirement of fine-grained perception by autonomous driving systems has resulted in recently increased research in the online semantic segmentation of single-scan LiDAR. Emerging datasets and technological advancements have enabled researchers to benchmark this problem and improve the applicable semantic segmentation algorithms. Still, online semantic segmentation of LiDAR scans in autonomous driving applications remains challenging for three reasons: (1) the need for near-real-time latency with limited hardware, (2) points are distributed unevenly across space, and (3) an increasing number of more fine-grained semantic classes. The combination of these challenges motivates us to propose a new LiDAR-specific, KNN-free segmentation algorithm: PolarNet. Instead of using a common spherical or bird’s-eye-view projection, our polar bird’s-eye-view representation balances the points per grid cell and thus indirectly redistributes the network’s attention over the long-tailed distribution of points along the radial axis in polar coordinates. We find that our encoding scheme greatly increases the mIoU on three drastically different real urban LiDAR single-scan segmentation datasets while retaining ultra-low latency and near-real-time throughput. |
Tasks | Autonomous Driving, Semantic Segmentation |
Published | 2020-03-31 |
URL | https://arxiv.org/abs/2003.14032v1 |
PDF | https://arxiv.org/pdf/2003.14032v1.pdf |
PWC | https://paperswithcode.com/paper/polarnet-an-improved-grid-representation-for |
Repo | https://github.com/edwardzhou130/PolarSeg |
Framework | pytorch |
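
The polar bird's-eye-view quantization itself is a few lines of NumPy. The sketch below (grid sizes and range are arbitrary choices, not the paper's) shows why the representation balances points per cell: near-range cells of a polar grid are small exactly where LiDAR returns are dense.

```python
import numpy as np

def polar_bev_bins(xyz, r_bins=64, theta_bins=64, r_max=50.0):
    """Quantize LiDAR points into a polar bird's-eye-view grid."""
    r = np.hypot(xyz[:, 0], xyz[:, 1])
    theta = np.arctan2(xyz[:, 1], xyz[:, 0])          # angle in [-pi, pi)
    ri = np.clip((r / r_max * r_bins).astype(int), 0, r_bins - 1)
    ti = ((theta + np.pi) / (2 * np.pi) * theta_bins).astype(int) % theta_bins
    return ri, ti

# Toy scan: points dense near the sensor, as in real LiDAR.
rng = np.random.default_rng(0)
r = rng.exponential(scale=10.0, size=100_000)
a = rng.uniform(-np.pi, np.pi, size=100_000)
pts = np.stack([r * np.cos(a), r * np.sin(a), np.zeros_like(r)], axis=1)

ri, ti = polar_bev_bins(pts)
counts = np.bincount(ri * 64 + ti, minlength=64 * 64)
print("occupied cells:", (counts > 0).sum(), "max points/cell:", counts.max())
```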
Nonparametric Continuous Sensor Registration
Title | Nonparametric Continuous Sensor Registration |
Authors | William Clark, Maani Ghaffari, Anthony Bloch |
Abstract | This paper develops a new mathematical framework that enables nonparametric joint semantic/appearance and geometric representation of continuous functions using data. The joint semantic and geometric embedding is modeled by representing the processes in a reproducing kernel Hilbert space. The framework allows the functions to be defined on arbitrary smooth manifolds, where the action of a Lie group is used to align them. The continuous functions make the registration independent of a specific signal resolution, and the framework is fully analytical, with a closed-form derivation of the Riemannian gradient and Hessian. We study a more specialized but widely used case where the Lie group acts on functions isometrically. We solve the problem by maximizing the inner product between two functions defined over data, while the continuous action of the rigid body motion Lie group is captured through the integration of the flow in the corresponding Lie algebra. Low-dimensional cases are derived with numerical examples to show the generality of the proposed framework. The high-dimensional derivation for the special Euclidean group acting on the Euclidean space showcases point cloud registration and bird’s-eye-view map registration abilities. A specific derivation and implementation of this framework for RGB-D cameras outperforms the state-of-the-art robust visual odometry methods and performs well in texture- and structure-scarce environments. |
Tasks | Point Cloud Registration, Visual Odometry |
Published | 2020-01-08 |
URL | https://arxiv.org/abs/2001.04286v2 |
PDF | https://arxiv.org/pdf/2001.04286v2.pdf |
PWC | https://paperswithcode.com/paper/nonparametric-continuous-sensor-registration |
Repo | https://github.com/MaaniGhaffari/c-sensor-registration |
Framework | none |
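
The core objective, maximizing an RKHS inner product between two functions built from data, reduces in the simplest geometric-only case to a kernel correlation between point clouds. The toy below recovers a 2D rotation by grid search over the angle; the paper instead follows the closed-form Riemannian gradient on the Lie group and additionally weights the kernel by semantic/appearance channels, none of which is reproduced here.

```python
import numpy as np

def kernel_correlation(X, Y, h=0.1):
    """Inner product surrogate between two point clouds represented as
    RKHS functions (a sum of Gaussian kernel evaluations)."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * h)).sum()

def rot2d(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s], [s, c]])

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
Y = X @ rot2d(0.4).T                      # target: X rotated by 0.4 rad

# Grid search over the rotation angle stands in for the paper's
# gradient flow on the Lie group.
angles = np.linspace(-1, 1, 201)
best = angles[np.argmax([kernel_correlation(X @ rot2d(a).T, Y)
                         for a in angles])]
print(f"recovered angle: {best:.2f} (true 0.40)")
```

Because the objective is built from continuous functions rather than hard correspondences, it needs no nearest-neighbour matching and no fixed signal resolution, which is the registration-independence property the abstract emphasizes.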
SPARQA: Skeleton-based Semantic Parsing for Complex Questions over Knowledge Bases
Title | SPARQA: Skeleton-based Semantic Parsing for Complex Questions over Knowledge Bases |
Authors | Yawei Sun, Lingling Zhang, Gong Cheng, Yuzhong Qu |
Abstract | Semantic parsing transforms a natural language question into a formal query over a knowledge base. Many existing methods rely on syntactic parsing like dependencies. However, the accuracy of producing such expressive formalisms is not satisfying on long complex questions. In this paper, we propose a novel skeleton grammar to represent the high-level structure of a complex question. This dedicated coarse-grained formalism with a BERT-based parsing algorithm helps to improve the accuracy of the downstream fine-grained semantic parsing. Besides, to align the structure of a question with the structure of a knowledge base, our multi-strategy method combines sentence-level and word-level semantics. Our approach shows promising performance on several datasets. |
Tasks | Semantic Parsing |
Published | 2020-03-31 |
URL | https://arxiv.org/abs/2003.13956v1 |
PDF | https://arxiv.org/pdf/2003.13956v1.pdf |
PWC | https://paperswithcode.com/paper/sparqa-skeleton-based-semantic-parsing-for |
Repo | https://github.com/nju-websoft/SPARQA |
Framework | pytorch |
TVR: A Large-Scale Dataset for Video-Subtitle Moment Retrieval
Title | TVR: A Large-Scale Dataset for Video-Subtitle Moment Retrieval |
Authors | Jie Lei, Licheng Yu, Tamara L. Berg, Mohit Bansal |
Abstract | We introduce a new multimodal retrieval task - TV show Retrieval (TVR), in which a short video moment has to be localized from a large video (with subtitles) corpus, given a natural language query. Different from previous moment retrieval tasks dealing with videos only, TVR requires the system to understand both the video and the associated subtitle text, making it a more realistic task. To support the study of this new task, we have collected a large-scale, high-quality dataset consisting of 108,965 queries on 21,793 videos from 6 TV shows of diverse genres, where each query is associated with a tight temporal alignment. Strict qualification and post-annotation verification tests are applied to ensure the quality of the collected data. We present several baselines and a novel Cross-modal Moment Localization (XML) modular network for this new dataset and task. The proposed XML model surpasses all presented baselines by a large margin and with better efficiency, providing a strong starting point for future work. Extensive analysis experiments also show that incorporating both video and subtitle modules yields better performance than either alone. Lastly, we have also collected additional descriptions for each annotated moment in TVR to form a new multimodal captioning dataset with 262K captions, named the TV show Caption dataset (TVC). Here, models need to jointly use the video and subtitles to generate a caption description. Both datasets are publicly available at https://tvr.cs.unc.edu |
Tasks | |
Published | 2020-01-24 |
URL | https://arxiv.org/abs/2001.09099v1 |
PDF | https://arxiv.org/pdf/2001.09099v1.pdf |
PWC | https://paperswithcode.com/paper/tvr-a-large-scale-dataset-for-video-subtitle |
Repo | https://github.com/jayleicn/TVCaption |
Framework | pytorch |
A Survey of Adversarial Learning on Graphs
Title | A Survey of Adversarial Learning on Graphs |
Authors | Liang Chen, Jintang Li, Jiaying Peng, Tao Xie, Zengxu Cao, Kun Xu, Xiangnan He, Zibin Zheng |
Abstract | Deep learning models on graphs have achieved remarkable performance in various graph analysis tasks, e.g., node classification, link prediction, and graph clustering. However, they are exposed to uncertainty and unreliability against well-designed inputs, i.e., adversarial examples. Accordingly, various studies have emerged for both attack and defense in different graph analysis tasks, leading to an arms race in graph adversarial learning. For instance, attackers have poisoning and evasion attacks, and the defense side correspondingly has preprocessing-based and adversarial-based methods. Despite this booming body of work, a unified problem definition and a comprehensive review are still lacking. To bridge this gap, we systematically investigate and summarize the existing works on graph adversarial learning tasks. Specifically, we survey and unify the existing works with respect to attack and defense in graph analysis tasks, and give proper definitions and taxonomies at the same time. Besides, we emphasize the importance of related evaluation metrics, and investigate and summarize them comprehensively. We hope our work can serve as a reference for the relevant researchers, thus providing assistance for their studies. More details of our work are available at https://github.com/gitgiter/Graph-Adversarial-Learning. |
Tasks | Graph Clustering, Link Prediction, Node Classification |
Published | 2020-03-10 |
URL | https://arxiv.org/abs/2003.05730v1 |
PDF | https://arxiv.org/pdf/2003.05730v1.pdf |
PWC | https://paperswithcode.com/paper/a-survey-of-adversarial-learning-on-graphs |
Repo | https://github.com/gitgiter/Graph-Adversarial-Learning |
Framework | tf |
Learning Fast and Robust Target Models for Video Object Segmentation
Title | Learning Fast and Robust Target Models for Video Object Segmentation |
Authors | Andreas Robinson, Felix Järemo Lawin, Martin Danelljan, Fahad Shahbaz Khan, Michael Felsberg |
Abstract | Video object segmentation (VOS) is a highly challenging problem since the initial mask, defining the target object, is only given at test-time. The main difficulty is to effectively handle appearance changes and similar background objects, while maintaining accurate segmentation. Most previous approaches fine-tune segmentation networks on the first frame, resulting in impractical frame-rates and risk of overfitting. More recent methods integrate generative target appearance models, but either achieve limited robustness or require large amounts of training data. We propose a novel VOS architecture consisting of two network components. The target appearance model consists of a light-weight module, which is learned during the inference stage using fast optimization techniques to predict a coarse but robust target segmentation. The segmentation model is exclusively trained offline, designed to process the coarse scores into high quality segmentation masks. Our method is fast, easily trainable and remains highly effective in cases of limited training data. We perform extensive experiments on the challenging YouTube-VOS and DAVIS datasets. Our network achieves favorable performance, while operating at higher frame-rates than state-of-the-art approaches. Code and trained models are available at https://github.com/andr345/frtm-vos. |
Tasks | Semantic Segmentation, Video Object Segmentation, Video Semantic Segmentation |
Published | 2020-02-27 |
URL | https://arxiv.org/abs/2003.00908v2 |
PDF | https://arxiv.org/pdf/2003.00908v2.pdf |
PWC | https://paperswithcode.com/paper/learning-fast-and-robust-target-models-for |
Repo | https://github.com/andr345/frtm-vos |
Framework | none |
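
The division of labour, a light-weight target model fitted at inference time producing coarse scores that an offline-trained network refines, can be sketched with the target model alone. Below it is a closed-form ridge regression from per-pixel features to the first-frame mask; the paper's actual module is a small convolutional model fitted with fast iterative optimization, so treat this purely as an illustration of the idea.

```python
import numpy as np

def fit_target_model(feats, mask, lam=1e-2):
    """Fit a linear target model on first-frame features (closed-form
    ridge regression standing in for the paper's fast solvers)."""
    F = feats.reshape(-1, feats.shape[-1])          # (H*W, C)
    y = mask.reshape(-1).astype(float)
    return np.linalg.solve(F.T @ F + lam * np.eye(F.shape[1]), F.T @ y)

def coarse_scores(feats, w):
    return feats @ w   # per-pixel target score, refined by the offline net

# Toy features where channel 0 carries the target signal.
rng = np.random.default_rng(0)
feats = rng.normal(size=(32, 32, 8))
mask = np.zeros((32, 32), dtype=bool)
mask[8:20, 10:24] = True
feats[..., 0] += 2.0 * mask

w = fit_target_model(feats, mask)
pred = coarse_scores(feats, w) > 0.5
print("IoU on the training frame:", (pred & mask).sum() / (pred | mask).sum())
```

Because only this small model is (re)fitted per video, inference stays fast, while the heavy segmentation network never needs test-time training.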
Suppressing Uncertainties for Large-Scale Facial Expression Recognition
Title | Suppressing Uncertainties for Large-Scale Facial Expression Recognition |
Authors | Kai Wang, Xiaojiang Peng, Jianfei Yang, Shijian Lu, Yu Qiao |
Abstract | Annotating a large-scale facial expression dataset with high-quality labels is extremely difficult due to the uncertainties caused by ambiguous facial expressions, low-quality facial images, and the subjectiveness of annotators. These uncertainties lead to a key challenge of large-scale Facial Expression Recognition (FER) in the deep learning era. To address this problem, this paper proposes a simple yet efficient Self-Cure Network (SCN) which suppresses the uncertainties efficiently and prevents deep networks from over-fitting uncertain facial images. Specifically, SCN suppresses uncertainty from two different aspects: 1) a self-attention mechanism over mini-batches to weight each training sample with a ranking regularization, and 2) a careful relabeling mechanism to modify the labels of the samples in the lowest-ranked group. Experiments on synthetic FER datasets and our collected WebEmotion dataset validate the effectiveness of our method. Results on public benchmarks demonstrate that our SCN outperforms current state-of-the-art methods with 88.14% on RAF-DB, 60.23% on AffectNet, and 89.35% on FERPlus. The code will be available at https://github.com/kaiwang960112/Self-Cure-Network. |
Tasks | Facial Expression Recognition |
Published | 2020-02-24 |
URL | https://arxiv.org/abs/2002.10392v2 |
PDF | https://arxiv.org/pdf/2002.10392v2.pdf |
PWC | https://paperswithcode.com/paper/suppressing-uncertainties-for-large-scale |
Repo | https://github.com/kaiwang960112/Self-Cure-Network |
Framework | none |
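
Two of SCN's ingredients, the ranking regularization on importance weights and the relabeling of the lowest-ranked group, can be sketched compactly. The margins, group ratios, and thresholds below are illustrative assumptions, not the paper's tuned values.

```python
import torch

def rank_regularization(weights, margin=0.15, high_ratio=0.7):
    """Encourage a margin between the mean importance weight of the
    high-ranked group and that of the low-ranked group."""
    k = int(len(weights) * high_ratio)
    top, _ = torch.topk(weights, k)
    bottom, _ = torch.topk(weights, len(weights) - k, largest=False)
    return torch.relu(bottom.mean() - top.mean() + margin)

def relabel_lowest_group(probs, labels, weights, margin=0.2, low_ratio=0.25):
    """For the lowest-weighted samples, replace the given label with the
    network's prediction when its confidence beats the label's by a margin."""
    n_low = max(1, int(len(weights) * low_ratio))
    low = torch.argsort(weights)[:n_low]
    pred_conf, pred_lab = probs[low].max(dim=1)
    given_conf = probs[low, labels[low]]
    flip = pred_conf - given_conf > margin
    new_labels = labels.clone()
    new_labels[low[flip]] = pred_lab[flip]
    return new_labels
```

The weights themselves come from SCN's self-attention over the mini-batch; the regularizer keeps them informative (instead of collapsing to a constant), and the relabeling then cleans exactly the samples the network has learned to distrust.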