April 3, 2020

3268 words 16 mins read

Paper Group AWR 75

Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning. Inf-VAE: A Variational Autoencoder Framework to Integrate Homophily and Influence in Diffusion Prediction. Good Feature Matching: Towards Accurate, Robust VO/VSLAM with Low Latency. Unbiased Scene Graph Generation from Biased Training. MagNet: Discovering Multi-agen …

Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning

Title Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning
Authors Tabish Rashid, Mikayel Samvelyan, Christian Schroeder de Witt, Gregory Farquhar, Jakob Foerster, Shimon Whiteson
Abstract In many real-world settings, a team of agents must coordinate its behaviour while acting in a decentralised fashion. At the same time, it is often possible to train the agents in a centralised fashion where global state information is available and communication constraints are lifted. Learning joint action-values conditioned on extra state information is an attractive way to exploit centralised learning, but the best strategy for then extracting decentralised policies is unclear. Our solution is QMIX, a novel value-based method that can train decentralised policies in a centralised end-to-end fashion. QMIX employs a mixing network that estimates joint action-values as a monotonic combination of per-agent values. We structurally enforce that the joint-action value is monotonic in the per-agent values, through the use of non-negative weights in the mixing network, which guarantees consistency between the centralised and decentralised policies. To evaluate the performance of QMIX, we propose the StarCraft Multi-Agent Challenge (SMAC) as a new benchmark for deep multi-agent reinforcement learning. We evaluate QMIX on a challenging set of SMAC scenarios and show that it significantly outperforms existing multi-agent reinforcement learning methods.
Tasks Multi-agent Reinforcement Learning, Starcraft
Published 2020-03-19
URL https://arxiv.org/abs/2003.08839v1
PDF https://arxiv.org/pdf/2003.08839v1.pdf
PWC https://paperswithcode.com/paper/monotonic-value-function-factorisation-for
Repo https://github.com/oxwhirl/pymarl
Framework pytorch
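
The monotonicity trick described above is compact enough to sketch directly. Below is a minimal PyTorch mixer in the spirit of QMIX: hypernetworks conditioned on the global state generate the mixing weights, and an absolute value keeps those weights non-negative, which makes the joint value monotonic in every per-agent value. Layer names and sizes here are illustrative; the pymarl repo holds the reference implementation.

```python
import torch
import torch.nn as nn

class MonotonicMixer(nn.Module):
    """Toy QMIX-style mixer: joint Q is a monotonic function of per-agent Qs."""
    def __init__(self, n_agents, state_dim, embed_dim=32):
        super().__init__()
        self.n_agents = n_agents
        # Hypernetworks map the global state to mixing weights and biases.
        self.hyper_w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.hyper_b1 = nn.Linear(state_dim, embed_dim)
        self.hyper_w2 = nn.Linear(state_dim, embed_dim)
        self.hyper_b2 = nn.Sequential(nn.Linear(state_dim, embed_dim),
                                      nn.ReLU(),
                                      nn.Linear(embed_dim, 1))

    def forward(self, agent_qs, state):
        # agent_qs: (batch, n_agents); state: (batch, state_dim)
        b = agent_qs.size(0)
        # abs() enforces non-negative weights, hence monotonicity in each Q_i.
        w1 = torch.abs(self.hyper_w1(state)).view(b, self.n_agents, -1)
        b1 = self.hyper_b1(state).view(b, 1, -1)
        hidden = torch.relu(torch.bmm(agent_qs.unsqueeze(1), w1) + b1)
        w2 = torch.abs(self.hyper_w2(state)).view(b, -1, 1)
        b2 = self.hyper_b2(state).view(b, 1, 1)
        return (torch.bmm(hidden, w2) + b2).view(b, 1)  # Q_tot
```

Because every mixing weight is non-negative, each agent greedily maximising its own Q-value also maximises Q_tot, which is exactly the centralised-decentralised consistency the abstract refers to.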

Inf-VAE: A Variational Autoencoder Framework to Integrate Homophily and Influence in Diffusion Prediction

Title Inf-VAE: A Variational Autoencoder Framework to Integrate Homophily and Influence in Diffusion Prediction
Authors Aravind Sankar, Xinyang Zhang, Adit Krishnan, Jiawei Han
Abstract Recent years have witnessed tremendous interest in understanding and predicting information spread on social media platforms such as Twitter, Facebook, etc. Existing diffusion prediction methods primarily exploit the sequential order of influenced users by projecting diffusion cascades onto their local social neighborhoods. However, this fails to capture global social structures that do not explicitly manifest in any of the cascades, resulting in poor performance for inactive users with limited historical activities. In this paper, we present a novel variational autoencoder framework (Inf-VAE) to jointly embed homophily and influence through proximity-preserving social and position-encoded temporal latent variables. To model social homophily, Inf-VAE utilizes powerful graph neural network architectures to learn social variables that selectively exploit the social connections of users. Given a sequence of seed user activations, Inf-VAE uses a novel expressive co-attentive fusion network that jointly attends over their social and temporal variables to predict the set of all influenced users. Our experimental results on multiple real-world social network datasets, including Digg, Weibo, and Stack-Exchanges, demonstrate significant gains (22% MAP@10) for Inf-VAE over state-of-the-art diffusion prediction models; we achieve massive gains for users with sparse activities and for users who lack direct social neighbors in seed sets.
Tasks
Published 2020-01-01
URL https://arxiv.org/abs/2001.00132v1
PDF https://arxiv.org/pdf/2001.00132v1.pdf
PWC https://paperswithcode.com/paper/inf-vae-a-variational-autoencoder-framework
Repo https://github.com/aravindsankar28/Inf-VAE
Framework tf
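
As a rough illustration of the co-attentive fusion step described above, here is a PyTorch sketch (the released code is TensorFlow) in which the social and temporal embeddings of the seed users attend over each other and the fused vector scores every candidate user. The specific layers, pooling, and dimensions are our assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class CoAttentiveFusion(nn.Module):
    """Illustrative co-attention between social and temporal seed embeddings."""
    def __init__(self, dim):
        super().__init__()
        # Each modality queries the other (dim must be divisible by num_heads).
        self.social_to_temporal = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.temporal_to_social = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, social, temporal, candidate_embs):
        # social, temporal: (batch, n_seeds, dim) embeddings of seed activations
        s_att, _ = self.social_to_temporal(social, temporal, temporal)
        t_att, _ = self.temporal_to_social(temporal, social, social)
        fused = self.fuse(torch.cat([s_att.mean(1), t_att.mean(1)], dim=-1))
        # Logits over all users: higher means more likely to be influenced.
        return fused @ candidate_embs.t()  # (batch, n_users)
```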

Good Feature Matching: Towards Accurate, Robust VO/VSLAM with Low Latency

Title Good Feature Matching: Towards Accurate, Robust VO/VSLAM with Low Latency
Authors Yipu Zhao, Patricio A. Vela
Abstract Analysis of state-of-the-art VO/VSLAM systems exposes a gap in balancing performance (accuracy & robustness) and efficiency (latency). Feature-based systems exhibit good performance, yet have higher latency due to explicit data association; direct & semidirect systems have lower latency, but are inapplicable in some target scenarios or exhibit lower accuracy than feature-based ones. This paper aims to fill the performance-efficiency gap with an enhancement applied to feature-based VSLAM. We present good feature matching, an active map-to-frame feature matching method. Feature matching effort is tied to submatrix selection, which has combinatorial time complexity and requires choosing a scoring metric. Via simulation, the Max-logDet matrix revealing metric is shown to perform best. For real-time applicability, the combination of deterministic selection and randomized acceleration is studied. The proposed algorithm is integrated into monocular & stereo feature-based VSLAM systems. Extensive evaluations on multiple benchmarks and compute hardware quantify the latency reduction and the accuracy & robustness preservation.
Tasks
Published 2020-01-03
URL https://arxiv.org/abs/2001.00714v1
PDF https://arxiv.org/pdf/2001.00714v1.pdf
PWC https://paperswithcode.com/paper/good-feature-matching-towards-accurate-robust
Repo https://github.com/ivalab/gf_orb_slam2
Framework none
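
The submatrix-selection step above can be approximated with a plain greedy Max-logDet heuristic, sketched below. `Hs`, `k`, and `lam` are hypothetical inputs (one information matrix per candidate feature), and the lazier deterministic and randomized accelerations the paper studies are omitted here.

```python
import numpy as np

def greedy_max_logdet(Hs, k, lam=1e-6):
    """Greedily pick k candidates whose summed information matrix has
    maximal log-determinant. Hs: list of (d, d) PSD matrices."""
    d = Hs[0].shape[0]
    selected = []
    M = lam * np.eye(d)  # small damping keeps logdet finite at the start
    for _ in range(k):
        best, best_val = None, -np.inf
        for i, H in enumerate(Hs):
            if i in selected:
                continue
            _sign, logdet = np.linalg.slogdet(M + H)
            if logdet > best_val:
                best, best_val = i, logdet
        selected.append(best)
        M = M + Hs[best]
    return selected
```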

Unbiased Scene Graph Generation from Biased Training

Title Unbiased Scene Graph Generation from Biased Training
Authors Kaihua Tang, Yulei Niu, Jianqiang Huang, Jiaxin Shi, Hanwang Zhang
Abstract Today’s scene graph generation (SGG) task is still far from practical, mainly due to severe training bias, e.g., collapsing diverse “human walk on / sit on / lay on beach” into “human on beach”. Given such SGG, downstream tasks such as VQA can hardly infer better scene structures than merely a bag of objects. However, debiasing in SGG is not trivial because traditional debiasing methods cannot distinguish between good and bad bias, e.g., good context priors (e.g., “person read book” rather than “eat”) versus bad long-tailed bias (e.g., “near” dominating “behind / in front of”). In this paper, we present a novel SGG framework based on causal inference rather than conventional likelihood. We first build a causal graph for SGG and perform traditional biased training with the graph. Then, we propose to draw the counterfactual causality from the trained graph to infer the effect of the bad bias, which should be removed. In particular, we use the Total Direct Effect (TDE) as the final predicate score for unbiased SGG. Note that our framework is agnostic to the underlying SGG model and can thus be widely applied by anyone in the community seeking unbiased predictions. Using the proposed Scene Graph Diagnosis toolkit on the Visual Genome SGG benchmark and several prevailing models, we observed significant improvements over the previous state-of-the-art methods.
Tasks Causal Inference, Graph Generation, Scene Graph Generation
Published 2020-02-27
URL https://arxiv.org/abs/2002.11949v3
PDF https://arxiv.org/pdf/2002.11949v3.pdf
PWC https://paperswithcode.com/paper/unbiased-scene-graph-generation-from-biased
Repo https://github.com/KaihuaTang/Scene-Graph-Benchmark.pytorch
Framework pytorch
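
A minimal sketch of the Total Direct Effect idea, assuming a hypothetical predicate classifier `model` and a "wiped" mean context `mean_ctx` standing in for the counterfactual input: the counterfactual logits are subtracted from the factual ones, leaving the direct effect of the pair's visual evidence.

```python
import torch

def tde_predicate_score(model, image_feats, pair, mean_ctx):
    """TDE scoring sketch: factual logits minus counterfactual logits,
    where the counterfactual replaces the pair's visual input with a
    mean context while keeping everything else fixed. `model`, `pair`,
    and `mean_ctx` are hypothetical stand-ins for an SGG pipeline."""
    with torch.no_grad():
        factual = model(image_feats, pair)       # biased predicate logits
        counterfactual = model(mean_ctx, pair)   # context-only logits
    return factual - counterfactual              # unbiased predicate score
```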

MagNet: Discovering Multi-agent Interaction Dynamics using Neural Network

Title MagNet: Discovering Multi-agent Interaction Dynamics using Neural Network
Authors Priyabrata Saha, Arslan Ali, Burhan A. Mudassar, Yun Long, Saibal Mukhopadhyay
Abstract We present MagNet, a neural network-based multi-agent interaction model that discovers the governing dynamics and predicts the evolution of a complex multi-agent system from observations. We formulate a multi-agent system as a coupled non-linear network with a generic ordinary differential equation (ODE) based state evolution, and develop a neural network-based realization of its time-discretized model. MagNet is trained to discover the core dynamics of a multi-agent system from observations, and tuned on-line to learn agent-specific parameters of the dynamics so that prediction remains accurate even when physical or relational attributes of agents, or the number of agents, change. We evaluate MagNet on a point-mass system in two-dimensional space, Kuramoto phase-synchronization dynamics, and predator-swarm interaction dynamics, demonstrating orders-of-magnitude improvement in prediction accuracy over traditional deep learning models.
Tasks
Published 2020-01-24
URL https://arxiv.org/abs/2001.09001v2
PDF https://arxiv.org/pdf/2001.09001v2.pdf
PWC https://paperswithcode.com/paper/magnet-discovering-multi-agent-interaction
Repo https://github.com/sahapriyabrata/MagNet
Framework pytorch
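
A toy PyTorch rendering of the ODE view above: a self-dynamics network plus a shared pairwise network estimate dx/dt for every agent, and trajectories are rolled out with simple Euler steps. The architecture and step size are illustrative, not MagNet's.

```python
import torch
import torch.nn as nn

class AgentDynamics(nn.Module):
    """dx_i/dt = f_self(x_i) + mean_j f_pair(x_i, x_j), a coupled ODE."""
    def __init__(self, state_dim, hidden=64):
        super().__init__()
        self.self_net = nn.Sequential(nn.Linear(state_dim, hidden), nn.Tanh(),
                                      nn.Linear(hidden, state_dim))
        self.pair_net = nn.Sequential(nn.Linear(2 * state_dim, hidden), nn.Tanh(),
                                      nn.Linear(hidden, state_dim))

    def forward(self, x):                      # x: (n_agents, state_dim)
        n = x.size(0)
        xi = x.unsqueeze(1).expand(n, n, -1)   # receiver states
        xj = x.unsqueeze(0).expand(n, n, -1)   # sender states
        interaction = self.pair_net(torch.cat([xi, xj], dim=-1)).mean(dim=1)
        return self.self_net(x) + interaction  # estimated dx/dt

def rollout(model, x0, steps, dt=0.01):
    """Time-discretized evolution via explicit Euler steps."""
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] + dt * model(xs[-1]))
    return torch.stack(xs)
```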

Toward Adversarial Robustness via Semi-supervised Robust Training

Title Toward Adversarial Robustness via Semi-supervised Robust Training
Authors Yiming Li, Baoyuan Wu, Yan Feng, Yanbo Fan, Yong Jiang, Zhifeng Li, Shutao Xia
Abstract Adversarial examples have been shown to be a severe threat to deep neural networks (DNNs). One of the most effective adversarial defense methods is adversarial training (AT), which minimizes the adversarial risk $R_{adv}$, encouraging both the benign example $x$ and its adversarially perturbed neighborhoods within the $\ell_{p}$-ball to be predicted as the ground-truth label. In this work, we propose a novel defense method, robust training (RT), which jointly minimizes two separate risks ($R_{stand}$ and $R_{rob}$), defined with respect to the benign example and its neighborhoods, respectively. The motivation is to explicitly and jointly enhance accuracy and adversarial robustness. We prove that $R_{adv}$ is upper-bounded by $R_{stand} + R_{rob}$, which implies that RT has a similar effect to AT. Intuitively, minimizing the standard risk enforces the benign example to be correctly predicted, while minimizing the robust risk encourages the predictions of neighboring examples to be consistent with the prediction of the benign example. Moreover, since $R_{rob}$ is independent of the ground-truth label, RT extends naturally to a semi-supervised mode (i.e., SRT) that further enhances adversarial robustness. We also extend the $\ell_{p}$-bounded neighborhood to a general case covering different types of perturbations, such as pixel-wise (i.e., $x + \delta$) or spatial (i.e., $Ax + b$) perturbations. Extensive experiments on benchmark datasets not only verify the superiority of the proposed SRT method over state-of-the-art methods for defending against pixel-wise or spatial perturbations separately, but also demonstrate its robustness to both perturbations simultaneously. The code for reproducing the main results is available at \url{https://github.com/THUYimingLi/Semi-supervised_Robust_Training}.
Tasks Adversarial Defense
Published 2020-03-16
URL https://arxiv.org/abs/2003.06974v1
PDF https://arxiv.org/pdf/2003.06974v1.pdf
PWC https://paperswithcode.com/paper/toward-adversarial-robustness-via-semi
Repo https://github.com/THUYimingLi/Semi-supervised_Robust_Training
Framework none
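
The risk decomposition maps directly onto a loss function. Below is a minimal PyTorch sketch, assuming a KL consistency term as the robust risk (the particular divergence and the detach on the clean prediction are our choices); note that the robust term never touches labels, which is what makes the semi-supervised extension possible.

```python
import torch.nn.functional as F

def robust_training_loss(model, x, y, x_adv, unlabeled=None, unlabeled_adv=None):
    """R_stand + R_rob, with an optional label-free term on unlabeled data (SRT)."""
    # Standard risk: the benign example should be predicted correctly.
    r_stand = F.cross_entropy(model(x), y)

    def r_rob(clean, perturbed):
        # Robust risk: predictions on perturbed neighbors should match the
        # prediction on the clean input; no ground-truth label is needed.
        p_clean = F.softmax(model(clean).detach(), dim=1)
        return F.kl_div(F.log_softmax(model(perturbed), dim=1),
                        p_clean, reduction="batchmean")

    loss = r_stand + r_rob(x, x_adv)
    if unlabeled is not None:
        loss = loss + r_rob(unlabeled, unlabeled_adv)  # semi-supervised term
    return loss
```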

Mnemonics Training: Multi-Class Incremental Learning without Forgetting

Title Mnemonics Training: Multi-Class Incremental Learning without Forgetting
Authors Yaoyao Liu, An-An Liu, Yuting Su, Bernt Schiele, Qianru Sun
Abstract Multi-Class Incremental Learning (MCIL) aims to learn new concepts by incrementally updating a model trained on previous concepts. However, there is an inherent trade-off to effectively learning new concepts without catastrophic forgetting of previous ones. To alleviate this issue, it has been proposed to keep around a few examples of the previous concepts but the effectiveness of this approach heavily depends on the representativeness of these examples. This paper proposes a novel and automatic framework we call mnemonics, where we parameterize exemplars and make them optimizable in an end-to-end manner. We train the framework through bilevel optimizations, i.e., model-level and exemplar-level. We conduct extensive experiments on three MCIL benchmarks, CIFAR-100, ImageNet-Subset and ImageNet, and show that using mnemonics exemplars can surpass the state-of-the-art by a large margin. Interestingly and quite intriguingly, the mnemonics exemplars tend to be on the boundaries between classes.
Tasks
Published 2020-02-24
URL https://arxiv.org/abs/2002.10211v2
PDF https://arxiv.org/pdf/2002.10211v2.pdf
PWC https://paperswithcode.com/paper/mnemonics-training-multi-class-incremental
Repo https://github.com/yaoyao-liu/mnemonics
Framework pytorch
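
A one-step bilevel sketch of the idea above, assuming `torch.func.functional_call` (PyTorch 2.x): the exemplars are plain learnable tensors, the model takes one differentiable inner step on them, and a validation loss is then backpropagated into the exemplar pixels. The single inner step and the optimizer choices are simplifications of the paper's bilevel program, in which model-level and exemplar-level optimization alternate.

```python
import torch
from torch.func import functional_call

# Hypothetical setup: 20 learnable exemplar images for one old class.
exemplars = torch.randn(20, 3, 32, 32, requires_grad=True)
exemplar_opt = torch.optim.Adam([exemplars], lr=1e-2)

def mnemonics_step(model, loss_fn, exemplar_labels, val_x, val_y, inner_lr=0.1):
    # Inner (model-level): one SGD step on the exemplars, kept differentiable.
    inner_loss = loss_fn(model(exemplars), exemplar_labels)
    grads = torch.autograd.grad(inner_loss, tuple(model.parameters()),
                                create_graph=True)
    adapted = {name: p - inner_lr * g
               for (name, p), g in zip(model.named_parameters(), grads)}
    # Outer (exemplar-level): judge the adapted model on held-out data and
    # push the resulting gradient back into the exemplars themselves.
    outer_loss = loss_fn(functional_call(model, adapted, (val_x,)), val_y)
    exemplar_opt.zero_grad()
    outer_loss.backward()
    exemplar_opt.step()
    return outer_loss.item()
```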

On the limits of cross-domain generalization in automated X-ray prediction

Title On the limits of cross-domain generalization in automated X-ray prediction
Authors Joseph Paul Cohen, Mohammad Hashir, Rupert Brooks, Hadrien Bertrand
Abstract This large-scale study focuses on quantifying which X-ray diagnostic prediction tasks generalize well across multiple datasets. We present evidence that the issue of generalization is not due to a shift in the images but rather to a shift in the labels. We study cross-domain performance, agreement between models, and model representations. We find interesting discrepancies between performance and agreement: models that both achieve good performance can disagree in their predictions, while models that agree can achieve poor performance. We also test for concept similarity by regularizing a network to group tasks across multiple datasets together and observe variation across the tasks.
Tasks Domain Generalization
Published 2020-02-06
URL https://arxiv.org/abs/2002.02497v1
PDF https://arxiv.org/pdf/2002.02497v1.pdf
PWC https://paperswithcode.com/paper/on-the-limits-of-cross-domain-generalization
Repo https://github.com/ieee8023/covid-chestxray-dataset
Framework none

Recognizing Characters in Art History Using Deep Learning

Title Recognizing Characters in Art History Using Deep Learning
Authors Prathmesh Madhu, Ronak Kosti, Lara Mührenberg, Peter Bell, Andreas Maier, Vincent Christlein
Abstract In the field of Art History, images of artworks and their contexts are core to understanding the underlying semantic information. However, the highly complex and sophisticated representation of these artworks makes it difficult, even for experts, to analyze the scene. From the computer vision perspective, the task of analyzing such artworks can be divided into sub-problems by taking a bottom-up approach. In this paper, we focus on the problem of recognizing characters in Art History. From the iconography of the Annunciation of the Lord (Figure 1), we consider the representation of the main protagonists, Mary and Gabriel, across different artworks and styles. We investigate and present the findings of training a character classifier on features extracted from their face images. The limitations of this method, and the inherent ambiguity in the representation of Gabriel, motivated us to analyze their bodies (a larger context) in order to recognize the characters. Convolutional Neural Networks (CNNs) trained on the bodies of Mary and Gabriel are able to learn person-related features and ultimately improve the performance of character recognition. We introduce a new technique that generates more data with similar styles, effectively creating data in a similar domain. We present experiments and analysis on three different models and show that the model trained on domain-related data gives the best performance for recognizing characters. Additionally, we analyze the localized image regions behind the network predictions. Code is open-sourced and available at https://github.com/prathmeshrmadhu/recognize_characters_art_history and the published peer-reviewed article is at https://dl.acm.org/citation.cfm?id=3357242.
Tasks
Published 2020-03-31
URL https://arxiv.org/abs/2003.14171v2
PDF https://arxiv.org/pdf/2003.14171v2.pdf
PWC https://paperswithcode.com/paper/recognizing-characters-in-art-history-using
Repo https://github.com/prathmeshrmadhu/recognize_characters_art_history
Framework tf

Structure-Preserving Super Resolution with Gradient Guidance

Title Structure-Preserving Super Resolution with Gradient Guidance
Authors Cheng Ma, Yongming Rao, Yean Cheng, Ce Chen, Jiwen Lu, Jie Zhou
Abstract Structures matter in single image super resolution (SISR). Recent studies benefiting from generative adversarial networks (GANs) have promoted the development of SISR by recovering photo-realistic images. However, there are always undesired structural distortions in the recovered images. In this paper, we propose a structure-preserving super resolution method to alleviate this issue while maintaining the merits of GAN-based methods in generating perceptually pleasant details. Specifically, we exploit gradient maps of images to guide the recovery in two aspects. On the one hand, we restore high-resolution gradient maps with a gradient branch to provide additional structure priors for the SR process. On the other hand, we propose a gradient loss which imposes a second-order restriction on the super-resolved images. Along with the previous image-space loss functions, the gradient-space objectives help generative networks concentrate more on geometric structures. Moreover, our method is model-agnostic and can potentially be used with off-the-shelf SR networks. Experimental results show that we achieve the best PI and LPIPS performance, with PSNR and SSIM comparable to state-of-the-art perception-driven SR methods. Visual results demonstrate our superiority in restoring structures while generating natural SR images.
Tasks Image Super-Resolution, Super-Resolution
Published 2020-03-29
URL https://arxiv.org/abs/2003.13081v1
PDF https://arxiv.org/pdf/2003.13081v1.pdf
PWC https://paperswithcode.com/paper/structure-preserving-super-resolution-with
Repo https://github.com/Maclory/SPSR
Framework pytorch
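
The gradient-space objective is simple to sketch: build gradient maps with fixed difference kernels, then penalize the distance between the SR output's map and the ground truth's. The L1 penalty and the specific kernels below are our illustrative choices, not necessarily the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def gradient_map(img):
    """Per-channel gradient magnitude via fixed difference kernels.
    img: (batch, channels, H, W)."""
    kx = torch.tensor([[[[-1., 0., 1.]]]], device=img.device)  # horizontal diff
    ky = kx.transpose(2, 3)                                    # vertical diff
    c = img.size(1)
    gx = F.conv2d(img, kx.repeat(c, 1, 1, 1), padding=(0, 1), groups=c)
    gy = F.conv2d(img, ky.repeat(c, 1, 1, 1), padding=(1, 0), groups=c)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-6)

def gradient_loss(sr, hr):
    # Second-order restriction: match the gradient maps of the
    # super-resolved output and the ground-truth HR image.
    return F.l1_loss(gradient_map(sr), gradient_map(hr))
```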

NWPU-Crowd: A Large-Scale Benchmark for Crowd Counting

Title NWPU-Crowd: A Large-Scale Benchmark for Crowd Counting
Authors Qi Wang, Junyu Gao, Wei Lin, Xuelong Li
Abstract In the last decade, crowd counting has attracted much attention from researchers due to its widespread applications, including crowd monitoring, public safety, space design, etc. Many Convolutional Neural Networks (CNNs) have been designed to tackle this task. However, currently released datasets are so small-scale that they cannot meet the needs of supervised CNN-based algorithms. To remedy this problem, we construct a large-scale congested crowd counting dataset, NWPU-Crowd, consisting of 5,109 images with a total of 2,133,238 annotated heads. Compared with other real-world datasets, it contains various illumination scenes and has the largest density range (0–20,033). Besides, a benchmark website is developed for impartially evaluating different methods, allowing researchers to submit results on the test set. Based on the proposed dataset, we further describe the data characteristics, evaluate the performance of some mainstream state-of-the-art (SOTA) methods, and analyze the new problems that arise on the new data. Moreover, the NWPU-Crowd dataset is available at \url{http://www.crowdbenchmark.com/}, and the code is open-sourced at \url{https://github.com/gjy3035/NWPU-Crowd-Sample-Code}.
Tasks Crowd Counting
Published 2020-01-10
URL https://arxiv.org/abs/2001.03360v1
PDF https://arxiv.org/pdf/2001.03360v1.pdf
PWC https://paperswithcode.com/paper/nwpu-crowd-a-large-scale-benchmark-for-crowd
Repo https://github.com/gjy3035/Awesome-Crowd-Counting
Framework pytorch

PoPS: Policy Pruning and Shrinking for Deep Reinforcement Learning

Title PoPS: Policy Pruning and Shrinking for Deep Reinforcement Learning
Authors Dor Livne, Kobi Cohen
Abstract The recent success of deep neural networks (DNNs) for function approximation in reinforcement learning has triggered the development of Deep Reinforcement Learning (DRL) algorithms in various fields, such as robotics, computer games, natural language processing, computer vision, sensing systems, and wireless networking. Unfortunately, DNNs suffer from high computational cost and memory consumption, which limits the use of DRL algorithms in systems with limited hardware resources. In recent years, pruning algorithms have demonstrated considerable success in reducing the redundancy of DNNs in classification tasks. However, existing algorithms suffer from a significant performance reduction in the DRL domain. In this paper, we develop the first effective solution to this performance-reduction problem and establish a working algorithm, named Policy Pruning and Shrinking (PoPS), that trains DRL models with strong performance while achieving a compact DNN representation. The framework is based on a novel iterative policy pruning and shrinking method that leverages the power of transfer learning when training the DRL model. We present an extensive experimental study that demonstrates the strong performance of PoPS using the popular Cartpole, Lunar Lander, Pong, and Pacman environments. Finally, we release open-source software for the benefit of researchers and developers in related fields.
Tasks Transfer Learning
Published 2020-01-14
URL https://arxiv.org/abs/2001.05012v1
PDF https://arxiv.org/pdf/2001.05012v1.pdf
PWC https://paperswithcode.com/paper/pops-policy-pruning-and-shrinking-for-deep
Repo https://github.com/dorlivne/PoPS
Framework tf
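
One iteration of the pruning-plus-distillation loop might look like the following PyTorch sketch (the released code is TensorFlow): magnitude-prune the policy network, then recover performance by distilling the pre-pruning teacher policy into the pruned student on observed states. The pruning ratio, temperature, KL objective, and omission of the shrinking stage are all simplifications.

```python
import torch
import torch.nn.functional as F
import torch.nn.utils.prune as prune

def prune_and_distill_step(student, teacher, states, optimizer,
                           amount=0.2, T=1.0):
    """Magnitude-prune the student's linear layers, then take one policy
    distillation step toward the (frozen) teacher's action distribution."""
    for module in student.modules():
        if isinstance(module, torch.nn.Linear):
            prune.l1_unstructured(module, name="weight", amount=amount)
    with torch.no_grad():
        target = F.softmax(teacher(states) / T, dim=1)  # teacher policy
    loss = F.kl_div(F.log_softmax(student(states) / T, dim=1),
                    target, reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```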

BirdNet+: End-to-End 3D Object Detection in LiDAR Bird’s Eye View

Title BirdNet+: End-to-End 3D Object Detection in LiDAR Bird’s Eye View
Authors Alejandro Barrera, Carlos Guindel, Jorge Beltrán, Fernando García
Abstract On-board 3D object detection in autonomous vehicles often relies on geometry information captured by LiDAR devices. Although image features are typically preferred for detection, numerous approaches take only spatial data as input. Exploiting this information at inference usually involves compact representations such as the Bird’s Eye View (BEV) projection, which entails a loss of information and thus hinders the joint inference of all the parameters of the objects’ 3D boxes. In this paper, we present a fully end-to-end 3D object detection framework that can infer oriented 3D boxes solely from BEV images by using a two-stage object detector and ad hoc regression branches, eliminating the need for a post-processing stage. The method outperforms its predecessor (BirdNet) by a large margin and obtains state-of-the-art results on the KITTI 3D Object Detection Benchmark for all the categories in evaluation.
Tasks 3D Object Detection, Autonomous Vehicles, Object Detection
Published 2020-03-09
URL https://arxiv.org/abs/2003.04188v1
PDF https://arxiv.org/pdf/2003.04188v1.pdf
PWC https://paperswithcode.com/paper/birdnet-end-to-end-3d-object-detection-in
Repo https://github.com/beltransen/lidar_bev
Framework none
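
For readers unfamiliar with the BEV input itself, here is a sketch of how a LiDAR point cloud is commonly rasterized into a BEV image with height, intensity, and density channels. The ranges, resolution, and normalization below are illustrative defaults, not the exact BirdNet+ configuration.

```python
import numpy as np

def lidar_to_bev(points, x_range=(0., 70.), y_range=(-35., 35.),
                 res=0.1, max_height=3.0):
    """Rasterize a point cloud (N, 4: x, y, z, intensity) into a
    3-channel BEV image: max height, max intensity, point density."""
    mask = ((points[:, 0] >= x_range[0]) & (points[:, 0] < x_range[1]) &
            (points[:, 1] >= y_range[0]) & (points[:, 1] < y_range[1]))
    pts = points[mask]
    h = int((x_range[1] - x_range[0]) / res)
    w = int((y_range[1] - y_range[0]) / res)
    rows = ((pts[:, 0] - x_range[0]) / res).astype(int)
    cols = ((pts[:, 1] - y_range[0]) / res).astype(int)
    bev = np.zeros((3, h, w), dtype=np.float32)
    np.maximum.at(bev[0], (rows, cols), pts[:, 2] / max_height)  # height
    np.maximum.at(bev[1], (rows, cols), pts[:, 3])               # intensity
    np.add.at(bev[2], (rows, cols), 1.0)                         # density
    bev[2] = np.minimum(1.0, np.log1p(bev[2]) / np.log(64.))     # normalize
    return bev
```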

Neural Networks on Random Graphs

Title Neural Networks on Random Graphs
Authors Romuald A. Janik, Aleksandra Nowak
Abstract We performed a massive evaluation of neural networks with architectures corresponding to random graphs of various types. Apart from the classical random graph families, including random, scale-free, and small-world graphs, we introduced a novel and flexible algorithm for directly generating random directed acyclic graphs (DAGs) and studied a class of graphs derived from functional resting-state fMRI networks. A majority of the best performing networks were indeed in these new families. We also proposed a general procedure for turning a graph into the DAG necessary for a feed-forward neural network. We investigated various structural and numerical properties of the graphs in relation to neural network test accuracy. Since none of the classical numerical graph invariants by itself seems sufficient to single out the best networks, we introduced new numerical characteristics that selected a set of quasi-1-dimensional graphs, which were the majority among the best performing networks.
Tasks
Published 2020-02-19
URL https://arxiv.org/abs/2002.08104v1
PDF https://arxiv.org/pdf/2002.08104v1.pdf
PWC https://paperswithcode.com/paper/neural-networks-on-random-graphs
Repo https://github.com/rmldj/random-graph-nn-paper
Framework pytorch
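
The graph-to-DAG step can be sketched in a few lines: fix a node ordering and orient every edge from the earlier node to the later one, which rules out cycles by construction. The plain node-order choice below is illustrative; the paper proposes a more general procedure.

```python
import networkx as nx

def to_feedforward_dag(g, order=None):
    """Orient an undirected graph into a DAG suitable as a feed-forward
    architecture: edges always point from lower to higher rank."""
    order = list(g.nodes()) if order is None else order
    rank = {v: i for i, v in enumerate(order)}
    dag = nx.DiGraph()
    dag.add_nodes_from(g.nodes())
    dag.add_edges_from((u, v) if rank[u] < rank[v] else (v, u)
                       for u, v in g.edges())
    assert nx.is_directed_acyclic_graph(dag)
    return dag

# e.g., orient a small-world graph from one of the families studied above
dag = to_feedforward_dag(nx.watts_strogatz_graph(16, 4, 0.5))
```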

Bidirectional Generative Modeling Using Adversarial Gradient Estimation

Title Bidirectional Generative Modeling Using Adversarial Gradient Estimation
Authors Xinwei Shen, Tong Zhang, Kani Chen
Abstract This paper considers the general $f$-divergence formulation of bidirectional generative modeling, which includes VAE and BiGAN as special cases. We present a new optimization method for this formulation, where the gradient is computed using an adversarially learned discriminator. In our framework, we show that different divergences induce similar algorithms in terms of gradient evaluation, except with different scaling. Therefore this paper gives a general recipe for a class of principled $f$-divergence based generative modeling methods. Theoretical justifications and extensive empirical studies are provided to demonstrate the advantage of our approach over existing methods.
Tasks
Published 2020-02-21
URL https://arxiv.org/abs/2002.09161v1
PDF https://arxiv.org/pdf/2002.09161v1.pdf
PWC https://paperswithcode.com/paper/bidirectional-generative-modeling-using
Repo https://github.com/xwshen51/AGE
Framework pytorch
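
A rough sketch of the BiGAN-style ingredient underlying the adversarial gradient estimation: a discriminator learns to separate encoder pairs (x, E(x)) from decoder pairs (G(z), z), and its output supplies the gradient signal, with different f-divergences changing only the scaling. The logistic loss and the flattened inputs below are our assumptions, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def discriminator_step(D, E, G, x, z, d_opt):
    """One update of the adversarially learned discriminator.
    Assumes x is flattened to (batch, x_dim) and z is (batch, z_dim),
    so pairs can be concatenated along the feature dimension."""
    with torch.no_grad():  # E and G are fixed during the D update
        enc_pair = torch.cat([x, E(x)], dim=1)     # from the encoder joint
        dec_pair = torch.cat([G(z), z], dim=1)     # from the decoder joint
    d_enc = D(enc_pair)
    d_dec = D(dec_pair)
    loss = (F.binary_cross_entropy_with_logits(d_enc, torch.ones_like(d_enc)) +
            F.binary_cross_entropy_with_logits(d_dec, torch.zeros_like(d_dec)))
    d_opt.zero_grad()
    loss.backward()
    d_opt.step()
    return loss.item()
```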