February 1, 2020

3428 words 17 mins read

Paper Group AWR 227


The StarCraft Multi-Agent Challenge

Title The StarCraft Multi-Agent Challenge
Authors Mikayel Samvelyan, Tabish Rashid, Christian Schroeder de Witt, Gregory Farquhar, Nantas Nardelli, Tim G. J. Rudner, Chia-Man Hung, Philip H. S. Torr, Jakob Foerster, Shimon Whiteson
Abstract In the last few years, deep multi-agent reinforcement learning (RL) has become a highly active area of research. A particularly challenging class of problems in this area is partially observable, cooperative, multi-agent learning, in which teams of agents must learn to coordinate their behaviour while conditioning only on their private observations. This is an attractive research area since such problems are relevant to a large number of real-world systems and are also more amenable to evaluation than general-sum problems. Standardised environments such as the ALE and MuJoCo have allowed single-agent RL to move beyond toy domains, such as grid worlds. However, there is no comparable benchmark for cooperative multi-agent RL. As a result, most papers in this field use one-off toy problems, making it difficult to measure real progress. In this paper, we propose the StarCraft Multi-Agent Challenge (SMAC) as a benchmark problem to fill this gap. SMAC is based on the popular real-time strategy game StarCraft II and focuses on micromanagement challenges where each unit is controlled by an independent agent that must act based on local observations. We offer a diverse set of challenge maps and recommendations for best practices in benchmarking and evaluation. We also open-source a deep multi-agent RL framework including state-of-the-art algorithms. We believe that SMAC can provide a standard benchmark environment for years to come. Videos of our best agents for several SMAC scenarios are available at: https://youtu.be/VZ7zmQ_obZ0.
Tasks Multi-agent Reinforcement Learning, Real-Time Strategy Games, Starcraft, Starcraft II
Published 2019-02-11
URL https://arxiv.org/abs/1902.04043v5
PDF https://arxiv.org/pdf/1902.04043v5.pdf
PWC https://paperswithcode.com/paper/the-starcraft-multi-agent-challenge
Repo https://github.com/oxwhirl/smac
Framework pytorch
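
As a flavor of the interface, here is a minimal random-agent loop following the pattern documented in the oxwhirl/smac repository (a sketch; it assumes StarCraft II and the SMAC maps are installed):

```python
import numpy as np
from smac.env import StarCraft2Env

# Each of the 8 marines on the "8m" map is an independent agent acting on
# local observations; actions must be chosen from each agent's availability mask.
env = StarCraft2Env(map_name="8m")
env_info = env.get_env_info()
n_agents = env_info["n_agents"]

env.reset()
terminated, episode_reward = False, 0.0
while not terminated:
    actions = []
    for agent_id in range(n_agents):
        avail = np.nonzero(env.get_avail_agent_actions(agent_id))[0]
        actions.append(np.random.choice(avail))  # random valid action
    reward, terminated, _ = env.step(actions)    # shared team reward
    episode_reward += reward
env.close()
print("episode reward:", episode_reward)
```

Learning agents would replace the random choice with a policy conditioned on `env.get_obs()`, while `env.get_state()` exposes the global state for centralised training.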

Mapping Informal Settlements in Developing Countries using Machine Learning and Low Resolution Multi-spectral Data

Title Mapping Informal Settlements in Developing Countries using Machine Learning and Low Resolution Multi-spectral Data
Authors Bradley Gram-Hansen, Patrick Helber, Indhu Varatharajan, Faiza Azam, Alejandro Coca-Castro, Veronika Kopackova, Piotr Bilinski
Abstract Informal settlements are home to the most socially and economically vulnerable people on the planet. In order to deliver effective economic and social aid, non-government organizations (NGOs), such as the United Nations Children’s Fund (UNICEF), require detailed maps of the locations of informal settlements. However, data regarding informal and formal settlements is largely unavailable, and where it exists it is often incomplete. This is due, in part, to the cost and complexity of gathering data on a large scale. To address these challenges, we provide three contributions in this work. 1) A new machine learning dataset, purposely developed for informal settlement detection. 2) We show that it is possible to detect informal settlements using freely available low-resolution (LR) data, in contrast to previous studies that use very-high-resolution (VHR) satellite and aerial imagery, which is cost-prohibitive for NGOs. 3) We demonstrate two effective classification schemes on our curated dataset, one that is cost-efficient for NGOs and another that is cost-prohibitive for NGOs but has additional utility. We integrate these schemes into a semi-automated pipeline that converts either an LR or a VHR satellite image into a binary map that encodes the locations of informal settlements.
Tasks
Published 2019-01-03
URL https://arxiv.org/abs/1901.00861v3
PDF https://arxiv.org/pdf/1901.00861v3.pdf
PWC https://paperswithcode.com/paper/mapping-informal-settlements-in-developing
Repo https://github.com/FrontierDevelopmentLab/informal-settlements
Framework none
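
The cost-efficient scheme amounts to a per-pixel classifier over low-resolution multi-spectral bands. The sketch below shows that shape of pipeline with synthetic data, using a generic random forest as a stand-in for the classifiers evaluated in the paper:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for per-pixel Sentinel-2 spectra: 13 bands per pixel.
H, W, BANDS = 64, 64, 13
rng = np.random.default_rng(0)
pixels = rng.random((H * W, BANDS))         # per-pixel reflectance values
labels = (pixels[:, 3] > 0.5).astype(int)   # placeholder binary labels

clf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0)
clf.fit(pixels, labels)

# Binary map encoding predicted informal-settlement locations.
settlement_map = clf.predict(pixels).reshape(H, W)
```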

Semantic Graph Convolutional Networks for 3D Human Pose Regression

Title Semantic Graph Convolutional Networks for 3D Human Pose Regression
Authors Long Zhao, Xi Peng, Yu Tian, Mubbasir Kapadia, Dimitris N. Metaxas
Abstract In this paper, we study the problem of learning Graph Convolutional Networks (GCNs) for regression. Current GCN architectures are limited by the small receptive field of convolution filters and a shared transformation matrix for each node. To address these limitations, we propose Semantic Graph Convolutional Networks (SemGCN), a novel neural network architecture that operates on regression tasks with graph-structured data. SemGCN learns to capture semantic information such as local and global node relationships, which is not explicitly represented in the graph. These semantic relationships can be learned through end-to-end training from the ground truth without additional supervision or hand-crafted rules. We further investigate applying SemGCN to 3D human pose regression. Our formulation is intuitive and sufficient since both 2D and 3D human poses can be represented as a structured graph encoding the relationships between joints in the skeleton of a human body. We carry out comprehensive studies to validate our method. The results show that SemGCN outperforms the state of the art while using 90% fewer parameters.
Tasks 3D Human Pose Estimation
Published 2019-04-06
URL https://arxiv.org/abs/1904.03345v3
PDF https://arxiv.org/pdf/1904.03345v3.pdf
PWC https://paperswithcode.com/paper/semantic-graph-convolutional-networks-for-3d
Repo https://github.com/garyzhao/SemGCN
Framework pytorch
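
SemGCN's central idea is a graph convolution whose per-edge weights are learned rather than fixed by the skeleton adjacency. A rough PyTorch sketch of such a layer (an illustration of the idea, not the repository's implementation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SemGraphConv(nn.Module):
    """Graph convolution with learnable weights on fixed skeleton edges."""

    def __init__(self, in_dim, out_dim, adj):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)
        self.edge_mask = adj > 0                      # fixed skeleton edges
        self.edge_logits = nn.Parameter(torch.zeros_like(adj))

    def forward(self, x):                             # x: (batch, joints, in_dim)
        logits = self.edge_logits.masked_fill(~self.edge_mask, float("-inf"))
        weights = F.softmax(logits, dim=-1)           # learned per-edge weighting
        return weights @ self.W(x)

# Toy 16-joint "skeleton" whose edges are just self-loops, for shape checking.
adj = torch.eye(16)
layer = SemGraphConv(2, 128, adj)
print(layer(torch.randn(8, 16, 2)).shape)             # torch.Size([8, 16, 128])
```

A real skeleton adjacency would mark parent-child joint pairs; the softmax then learns how strongly each joint attends to each of its neighbours.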

Learning Invariant Representations of Social Media Users

Title Learning Invariant Representations of Social Media Users
Authors Nicholas Andrews, Marcus Bishop
Abstract The evolution of social media users’ behavior over time complicates user-level comparison tasks such as verification, classification, clustering, and ranking. As a result, naïve approaches may fail to generalize to new users or even to future observations of previously known users. In this paper, we propose a novel procedure to learn a mapping from short episodes of user activity on social media to a vector space in which the distance between points captures the similarity of the corresponding users’ invariant features. We fit the model by optimizing a surrogate metric learning objective over a large corpus of unlabeled social media content. Once learned, the mapping may be applied to users not seen at training time and enables efficient comparisons of users in the resulting vector space. We present a comprehensive evaluation to validate the benefits of the proposed approach using data from Reddit, Twitter, and Wikipedia.
Tasks Metric Learning
Published 2019-10-11
URL https://arxiv.org/abs/1910.04979v1
PDF https://arxiv.org/pdf/1910.04979v1.pdf
PWC https://paperswithcode.com/paper/learning-invariant-representations-of-social
Repo https://github.com/noa/iur
Framework tf
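
The training setup pairs an episode encoder with a metric-learning objective so that episodes by the same author embed nearby. An illustrative PyTorch sketch (the released code is TensorFlow; the bag-of-embeddings encoder and triplet objective here are simplified stand-ins):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EpisodeEncoder(nn.Module):
    """Maps a short episode of token ids to a unit-norm user vector."""

    def __init__(self, vocab_size, dim=256):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, dim)  # mean of token embeddings
        self.proj = nn.Linear(dim, dim)

    def forward(self, token_ids):                      # (batch, episode_len)
        return F.normalize(self.proj(self.embed(token_ids)), dim=-1)

encoder = EpisodeEncoder(vocab_size=50_000)
anchor = encoder(torch.randint(0, 50_000, (32, 20)))    # episodes by user u
positive = encoder(torch.randint(0, 50_000, (32, 20)))  # other episodes by u
negative = encoder(torch.randint(0, 50_000, (32, 20)))  # episodes by others
loss = F.triplet_margin_loss(anchor, positive, negative, margin=0.2)
loss.backward()
```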

DOVER: A Method for Combining Diarization Outputs

Title DOVER: A Method for Combining Diarization Outputs
Authors Andreas Stolcke, Takuya Yoshioka
Abstract Speech recognition and other natural language tasks have long benefited from voting-based algorithms as a method to aggregate outputs from several systems to achieve a higher accuracy than any of the individual systems. Diarization, the task of segmenting an audio stream into speaker-homogeneous and co-indexed regions, has so far not seen the benefit of this strategy because the structure of the task does not lend itself to a simple voting approach. This paper presents DOVER (diarization output voting error reduction), an algorithm for weighted voting among diarization hypotheses, in the spirit of the ROVER algorithm for combining speech recognition hypotheses. We evaluate the algorithm for diarization of meeting recordings with multiple microphones, and find that it consistently reduces diarization error rate over the average of results from individual channels, and often improves on the single best channel chosen by an oracle.
Tasks Speech Recognition
Published 2019-09-17
URL https://arxiv.org/abs/1909.08090v2
PDF https://arxiv.org/pdf/1909.08090v2.pdf
PWC https://paperswithcode.com/paper/dover-a-method-for-combining-diarization
Repo https://github.com/stolcke/dover
Framework none
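
Once the hypotheses' speaker labels have been mapped into a common label space (the alignment stage of DOVER, omitted here), the voting step reduces to weighted majority voting per time unit. A minimal sketch over already-aligned frame-level labels:

```python
from collections import Counter

def vote(hypotheses, weights=None):
    """hypotheses: per-frame speaker-label sequences, one per diarization system."""
    weights = weights or [1.0] * len(hypotheses)
    combined = []
    for frame in zip(*hypotheses):               # labels for one time frame
        tally = Counter()
        for label, w in zip(frame, weights):
            tally[label] += w                    # weighted vote per system
        combined.append(tally.most_common(1)[0][0])
    return combined

print(vote([["A", "A", "B"], ["A", "B", "B"], ["A", "A", "B"]]))
# ['A', 'A', 'B']
```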

Protecting Geolocation Privacy of Photo Collections

Title Protecting Geolocation Privacy of Photo Collections
Authors Jinghan Yang, Ayan Chakrabarti, Yevgeniy Vorobeychik
Abstract People increasingly share personal information, including their photos and photo collections, on social media. This information, however, can compromise individual privacy, particularly as social media platforms use it to infer detailed models of user behavior, including tracking their location. We consider the specific issue of location privacy as potentially revealed by posting photo collections, which facilitate accurate geolocation with the help of deep learning methods even in the absence of geotags. One means to limit associated inadvertent geolocation privacy disclosure is by carefully pruning select photos from photo collections before these are posted publicly. We study this problem formally as a combinatorial optimization problem in the context of geolocation prediction facilitated by deep learning. We first demonstrate the complexity both by showing that a natural greedy algorithm can be arbitrarily bad and by proving that the problem is NP-Hard. We then exhibit an important tractable special case, as well as a more general approach based on mixed-integer linear programming. Through extensive experiments on real photo collections, we demonstrate that our approaches are indeed highly effective at preserving geolocation privacy.
Tasks Combinatorial Optimization
Published 2019-12-04
URL https://arxiv.org/abs/1912.02085v1
PDF https://arxiv.org/pdf/1912.02085v1.pdf
PWC https://paperswithcode.com/paper/protecting-geolocation-privacy-of-photo
Repo https://github.com/jinghanY/geoPrivacyAlbum
Framework tf
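
The greedy baseline the paper proves can be arbitrarily bad is easy to state: repeatedly drop the photo whose removal most reduces the model's confidence in the true location. A sketch, with `geo_confidence` as a hypothetical stand-in for a deep geolocation model:

```python
def greedy_prune(photos, true_location, geo_confidence, budget):
    """Remove up to `budget` photos, greedily minimizing geolocation confidence."""
    kept = list(photos)
    for _ in range(budget):
        best = min(
            range(len(kept)),
            key=lambda i: geo_confidence(kept[:i] + kept[i + 1:], true_location),
        )
        kept.pop(best)                # drop the most location-revealing photo
    return kept

# Toy usage with a dummy confidence function that only counts photos.
print(greedy_prune(["p1", "p2", "p3", "p4"], "paris",
                   lambda kept, loc: 0.2 * len(kept), budget=2))
```

The paper's tractable special case and mixed-integer linear program replace this myopic loop with formulations that account for interactions between photos.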

Trajectory Space Factorization for Deep Video-Based 3D Human Pose Estimation

Title Trajectory Space Factorization for Deep Video-Based 3D Human Pose Estimation
Authors Jiahao Lin, Gim Hee Lee
Abstract Existing deep learning approaches to 3D human pose estimation for videos are based on either Recurrent or Convolutional Neural Networks (RNNs or CNNs). However, RNN-based frameworks can only tackle sequences with a limited number of frames because sequential models are sensitive to bad frames and tend to drift over long sequences. Although existing CNN-based temporal frameworks attempt to address the sensitivity and drift problems by processing all input frames in the sequence concurrently, the existing state-of-the-art CNN-based framework is limited to estimating the 3D pose of a single frame from a sequential input. In this paper, we propose a deep learning-based framework that utilizes matrix factorization for sequential 3D human pose estimation. Our approach processes all input frames concurrently to avoid the sensitivity and drift problems, and yet outputs 3D pose estimates for every frame in the input sequence. More specifically, the 3D poses in all frames are represented as a motion matrix factorized into a trajectory basis matrix and a trajectory coefficient matrix. The trajectory basis matrix is precomputed from matrix factorization approaches such as Singular Value Decomposition (SVD) or the Discrete Cosine Transform (DCT), and the problem of sequential 3D pose estimation reduces to training a deep network to regress the trajectory coefficient matrix. We demonstrate the effectiveness of our framework on long sequences by achieving state-of-the-art performance on multiple benchmark datasets. Our source code is available at: https://github.com/jiahaoLjh/trajectory-pose-3d.
Tasks 3D Human Pose Estimation, 3D Pose Estimation, Pose Estimation
Published 2019-08-22
URL https://arxiv.org/abs/1908.08289v1
PDF https://arxiv.org/pdf/1908.08289v1.pdf
PWC https://paperswithcode.com/paper/trajectory-space-factorization-for-deep-video
Repo https://github.com/jiahaoLjh/trajectory-pose-3d
Framework pytorch
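
The factorization is straightforward to make concrete: with truncated DCT trajectory bases, the length-T motion matrix is B @ C, and only the coefficient matrix C needs to be regressed by the network. A numpy sketch (joint count and basis truncation are illustrative choices):

```python
import numpy as np

T, K = 50, 8                    # sequence length, number of bases kept
n = np.arange(T)
# DCT-II trajectory bases: smooth temporal basis vectors, precomputed once.
B = np.stack([np.cos(np.pi * (n + 0.5) * k / T) for k in range(K)], axis=1)

coeffs = np.random.randn(K, 17 * 3)   # coefficients for 17 joints in 3D
motion = B @ coeffs                   # (T, 51): a 3D pose for every frame
print(motion.shape)
```

Because K is much smaller than T, the network regresses a compact representation while still producing pose estimates for all frames at once.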

Policy Poisoning in Batch Reinforcement Learning and Control

Title Policy Poisoning in Batch Reinforcement Learning and Control
Authors Yuzhe Ma, Xuezhou Zhang, Wen Sun, Xiaojin Zhu
Abstract We study a security threat to batch reinforcement learning and control where the attacker aims to poison the learned policy. The victim is a reinforcement learner / controller that first estimates the dynamics and the rewards from a batch data set, and then solves for the optimal policy with respect to those estimates. The attacker can modify the data set slightly before learning happens, and wants to force the learner into learning a target policy chosen by the attacker. We present a unified framework for solving batch policy poisoning attacks, and instantiate the attack on two standard victims: a tabular certainty-equivalence learner in reinforcement learning and a linear quadratic regulator in control. We show that both instantiations result in a convex optimization problem for which global optimality is guaranteed, and provide analysis of attack feasibility and attack cost. Experiments show the effectiveness of policy poisoning attacks.
Tasks
Published 2019-10-13
URL https://arxiv.org/abs/1910.05821v2
PDF https://arxiv.org/pdf/1910.05821v2.pdf
PWC https://paperswithcode.com/paper/policy-poisoning-in-batch-reinforcement
Repo https://github.com/myzwisc/PPRL_NeurIPS19
Framework none
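
For the tabular certainty-equivalence victim, the attack can be phrased as a convex program: minimally perturb the estimated rewards so the target policy satisfies the Bellman optimality conditions with a margin. A small cvxpy sketch under that reading (synthetic MDP; the margin `eps` is an illustrative choice):

```python
import cvxpy as cp
import numpy as np

S, A, gamma, eps = 4, 2, 0.9, 0.1
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(S), size=(S, A))   # estimated dynamics P[s, a]
r0 = rng.standard_normal((S, A))             # rewards estimated from data
target = np.zeros(S, dtype=int)              # target policy: always action 0

r = cp.Variable((S, A))                      # poisoned rewards
V = cp.Variable(S)                           # value of the target policy
cons = []
for s in range(S):
    for a in range(A):
        q_sa = r[s, a] + gamma * P[s, a] @ V
        if a == target[s]:
            cons.append(V[s] == q_sa)        # Bellman equation for target policy
        else:
            cons.append(V[s] >= q_sa + eps)  # target action strictly preferred
prob = cp.Problem(cp.Minimize(cp.sum_squares(r - r0)), cons)
prob.solve()
print("attack cost:", prob.value)
```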

Spatial Mixture Models with Learnable Deep Priors for Perceptual Grouping

Title Spatial Mixture Models with Learnable Deep Priors for Perceptual Grouping
Authors Jinyang Yuan, Bin Li, Xiangyang Xue
Abstract Humans perceive the seemingly chaotic world in a structured and compositional way, with the prerequisite of being able to segregate conceptual entities from complex visual scenes. The mechanism of grouping basic visual elements of scenes into conceptual entities is termed perceptual grouping. In this work, we propose a new type of spatial mixture model with learnable priors for perceptual grouping. Different from existing methods, the proposed method disentangles the attributes of an object into “shape” and “appearance”, which are modeled separately by the mixture weights and the mixture components. More specifically, each object in the visual scene is fully characterized by one latent representation, which is in turn transformed into the parameters of the mixture weight and the mixture component by two neural networks. The mixture weights focus on modeling spatial dependencies (i.e., shape) and the mixture components deal with intra-object variations (i.e., appearance). In addition, the background is separately modeled as a special component complementary to the foreground objects. Our extensive empirical tests on two perceptual grouping datasets demonstrate that the proposed method outperforms the state-of-the-art methods under most experimental configurations. The learned conceptual entities are generalizable to novel visual scenes and insensitive to the diversity of objects. Code is available at https://github.com/jinyangyuan/learnable-deep-priors.
Tasks
Published 2019-02-07
URL http://arxiv.org/abs/1902.02502v2
PDF http://arxiv.org/pdf/1902.02502v2.pdf
PWC https://paperswithcode.com/paper/spatial-mixture-models-with-learnable-deep
Repo https://github.com/jinyangyuan/learnable-deep-priors
Framework pytorch
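
The shape/appearance split can be pictured directly: mixture weights carve out pixelwise "shape" while mixture components model "appearance". A toy PyTorch sketch of the resulting pixelwise mixture likelihood (synthetic decoder outputs; not the repository's model):

```python
import torch

K, H, W = 4, 32, 32                   # objects, image height, image width
logits = torch.randn(K, H, W)         # decoder 1 output: unnormalized shapes
means = torch.rand(K, 3, H, W)        # decoder 2 output: per-object appearance
image = torch.rand(3, H, W)

log_pi = torch.log_softmax(logits, dim=0)           # (K, H, W) mixing weights
log_comp = -0.5 * ((image - means) ** 2).sum(1)     # (K, H, W) Gaussian term
log_px = torch.logsumexp(log_pi + log_comp, dim=0)  # pixelwise mixture log-lik
print(log_px.shape)                                 # torch.Size([32, 32])
```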

Word-level Deep Sign Language Recognition from Video: A New Large-scale Dataset and Methods Comparison

Title Word-level Deep Sign Language Recognition from Video: A New Large-scale Dataset and Methods Comparison
Authors Dongxu Li, Cristian Rodriguez Opazo, Xin Yu, Hongdong Li
Abstract Vision-based sign language recognition aims at helping deaf people communicate with others. However, most existing sign language datasets are limited to a small number of words. Due to the limited vocabulary size, models learned from those datasets cannot be applied in practice. In this paper, we introduce a new large-scale Word-Level American Sign Language (WLASL) video dataset, containing more than 2000 words performed by over 100 signers. This dataset will be made publicly available to the research community. To our knowledge, it is by far the largest public ASL dataset facilitating word-level sign recognition research. Based on this new large-scale dataset, we are able to experiment with several deep learning methods for word-level sign recognition and evaluate their performance in large-scale scenarios. Specifically, we implement and compare two different models, i.e., (i) a holistic visual appearance-based approach, and (ii) a 2D human pose-based approach. Both models are valuable baselines that will benefit the community for method benchmarking. Moreover, we also propose a novel pose-based temporal graph convolution network (Pose-TGCN) that models spatial and temporal dependencies in human pose trajectories simultaneously, which further boosts the performance of the pose-based method. Our results show that pose-based and appearance-based models achieve comparable performance of up to 66% top-10 accuracy on 2,000 words/glosses, demonstrating the validity and challenges of our dataset. Our dataset and baseline deep models are available at https://dxli94.github.io/WLASL/.
Tasks Sign Language Recognition
Published 2019-10-24
URL https://arxiv.org/abs/1910.11006v2
PDF https://arxiv.org/pdf/1910.11006v2.pdf
PWC https://paperswithcode.com/paper/word-level-deep-sign-language-recognition
Repo https://github.com/dxli94/WLASL
Framework none
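
The reported 66% figure is a standard top-k accuracy; a minimal sketch of the metric with synthetic logits over 2,000 glosses:

```python
import torch

def topk_accuracy(logits, labels, k=10):
    """Fraction of samples whose true gloss is among the k highest logits."""
    topk = logits.topk(k, dim=1).indices             # (batch, k) predictions
    return (topk == labels.unsqueeze(1)).any(1).float().mean().item()

logits = torch.randn(64, 2000)                       # synthetic model outputs
labels = torch.randint(0, 2000, (64,))
print(topk_accuracy(logits, labels, k=10))
```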

Combinatorial Optimization by Graph Pointer Networks and Hierarchical Reinforcement Learning

Title Combinatorial Optimization by Graph Pointer Networks and Hierarchical Reinforcement Learning
Authors Qiang Ma, Suwen Ge, Danyang He, Darshan Thaker, Iddo Drori
Abstract In this work, we introduce Graph Pointer Networks (GPNs) trained using reinforcement learning (RL) for tackling the traveling salesman problem (TSP). GPNs build upon Pointer Networks by introducing a graph embedding layer on the input, which captures relationships between nodes. Furthermore, to approximate solutions to constrained combinatorial optimization problems such as the TSP with time windows, we train hierarchical GPNs (HGPNs) using RL, which learn a hierarchical policy to find an optimal city permutation under constraints. Each layer of the hierarchy is designed with a separate reward function, resulting in stable training. Our results demonstrate that GPNs trained on small-scale TSP50/100 problems generalize well to larger-scale TSP500/1000 problems, with shorter tour lengths and faster computation times. We verify that for constrained TSP problems such as the TSP with time windows, the feasible solutions found via hierarchical RL training outperform previous baselines. In the spirit of reproducible research we make our data, models, and code publicly available.
Tasks Combinatorial Optimization, Graph Embedding, Hierarchical Reinforcement Learning
Published 2019-11-12
URL https://arxiv.org/abs/1911.04936v1
PDF https://arxiv.org/pdf/1911.04936v1.pdf
PWC https://paperswithcode.com/paper/combinatorial-optimization-by-graph-pointer
Repo https://github.com/qiang-ma/graph-pointer-network
Framework pytorch
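
The graph embedding layer enriches each city's features with information aggregated from the whole instance before pointer decoding. A simplified PyTorch sketch of that idea (mean aggregation is an illustrative choice, not the authors' exact layer):

```python
import torch
import torch.nn as nn

class GraphEmbedding(nn.Module):
    """Mixes per-city coordinates with a whole-graph aggregate."""

    def __init__(self, dim=128):
        super().__init__()
        self.node = nn.Linear(2, dim)
        self.context = nn.Linear(2, dim)

    def forward(self, coords):                    # (batch, n_cities, 2)
        agg = coords.mean(dim=1, keepdim=True)    # aggregate over the graph
        return torch.relu(self.node(coords) + self.context(agg))

emb = GraphEmbedding()(torch.rand(16, 50, 2))     # a batch of TSP50 instances
print(emb.shape)                                  # torch.Size([16, 50, 128])
```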

SignCol: Open-Source Software for Collecting Sign Language Gestures

Title SignCol: Open-Source Software for Collecting Sign Language Gestures
Authors Mohammad Eslami, Mahdi Karami, Sedigheh Eslami, Solale Tabarestani, Farah Torkamani-Azar, Christoph Meinel
Abstract Sign(ed) languages use gestures, such as hand or head movements, for communication. Sign language recognition is an assistive technology for individuals with hearing disability, and its goal is to improve such individuals’ quality of life by facilitating their social involvement. Since sign languages vary vastly in their alphabets, known as signs, sign recognition software should be capable of handling eight different types of sign combinations, e.g. numbers, letters, words and sentences. Due to the intrinsic complexity and diversity of symbolic gestures, recognition algorithms need a comprehensive visual dataset to learn from. In this paper, we describe the design and implementation of a Microsoft Kinect-based open-source software package, called SignCol, for capturing and saving the gestures used in sign languages. Our work supports a multi-language database and reports statistics on the recorded items. SignCol can simultaneously capture and store colored (RGB) frames, depth frames, infrared frames, body index frames, coordinate-mapped color-body frames, skeleton information for each frame, and camera parameters.
Tasks Sign Language Recognition
Published 2019-10-31
URL https://arxiv.org/abs/1911.00071v1
PDF https://arxiv.org/pdf/1911.00071v1.pdf
PWC https://paperswithcode.com/paper/signcol-open-source-software-for-collecting
Repo https://github.com/mohaEs/SignCol
Framework none

A Fine-Grained Spectral Perspective on Neural Networks

Title A Fine-Grained Spectral Perspective on Neural Networks
Authors Greg Yang, Hadi Salman
Abstract Are neural networks biased toward simple functions? Does depth always help learn more complex features? Is training the last layer of a network as good as training all layers? These questions seem unrelated at face value, but in this work we give all of them a common treatment from the spectral perspective. We study the spectra of the Conjugate Kernel, CK (also called the Neural Network-Gaussian Process Kernel), and the Neural Tangent Kernel, NTK. Roughly, the CK and the NTK tell us respectively “what a network looks like at initialization” and “what a network looks like during and after training.” Their spectra then encode valuable information about the initial distribution and the training and generalization properties of neural networks. By analyzing the eigenvalues, we lend novel insights into the questions put forth at the beginning, and we verify these insights by extensive experiments on neural networks. We believe the computational tools we develop here for analyzing the spectra of the CK and NTK serve as a solid foundation for future studies of deep neural networks. We have open-sourced the code for these tools and for generating the plots in this paper at github.com/thegregyang/NNspectra.
Tasks
Published 2019-07-24
URL https://arxiv.org/abs/1907.10599v2
PDF https://arxiv.org/pdf/1907.10599v2.pdf
PWC https://paperswithcode.com/paper/a-fine-grained-spectral-perspective-on-neural
Repo https://github.com/thegregyang/NNspectra
Framework pytorch
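
The NTK can also be probed empirically without the paper's analytic machinery; the sketch below forms the empirical NTK Gram matrix of a tiny network, whose eigenvalues are the kind of spectra the paper studies (an empirical stand-in, not the paper's exact-computation code):

```python
import torch

net = torch.nn.Sequential(torch.nn.Linear(10, 64), torch.nn.ReLU(),
                          torch.nn.Linear(64, 1))
X = torch.randn(20, 10)

def grad_vec(x):
    """Gradient of the network output w.r.t. all parameters, flattened."""
    net.zero_grad()
    net(x.unsqueeze(0)).sum().backward()
    return torch.cat([p.grad.flatten() for p in net.parameters()])

J = torch.stack([grad_vec(x) for x in X])   # (20, n_params) Jacobian
ntk = J @ J.T                               # empirical NTK Gram matrix
print(torch.linalg.eigvalsh(ntk)[-5:])      # its largest eigenvalues
```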

CPGAN: Full-Spectrum Content-Parsing Generative Adversarial Networks for Text-to-Image Synthesis

Title CPGAN: Full-Spectrum Content-Parsing Generative Adversarial Networks for Text-to-Image Synthesis
Authors Jiadong Liang, Wenjie Pei, Feng Lu
Abstract Typical methods for text-to-image synthesis seek to design an effective generative architecture that models the text-to-image mapping directly. This is fairly arduous due to the cross-modality translation involved in text-to-image synthesis. In this paper we circumvent this problem by focusing on parsing the content of both the input text and the synthesized image thoroughly to model the text-to-image consistency at the semantic level. In particular, we design a memory structure to parse the textual content by exploring the semantic correspondence between each word in the vocabulary and its various visual contexts across relevant images in the training data during text encoding. On the other hand, the synthesized image is parsed to learn its semantics in an object-aware manner. Moreover, we customize a conditional discriminator, which models the fine-grained correlations between words and image sub-regions to push for cross-modality semantic alignment between the input text and the synthesized image. Thus, a full-spectrum content-oriented parsing at the deep semantic level is performed by our model, which is referred to as Content-Parsing Generative Adversarial Networks (CPGAN). Extensive experiments on the COCO dataset demonstrate that CPGAN advances the state-of-the-art performance significantly.
Tasks Image Generation
Published 2019-12-18
URL https://arxiv.org/abs/1912.08562v1
PDF https://arxiv.org/pdf/1912.08562v1.pdf
PWC https://paperswithcode.com/paper/cpgan-full-spectrum-content-parsing
Repo https://github.com/dongdongdong666/CPGAN
Framework pytorch
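
One way to picture the textual-content memory: for each vocabulary word, accumulate visual features of the image regions it is grounded to across the training set, yielding a word-to-visual-context lookup table. A loose PyTorch sketch under that reading (synthetic features; `update_memory` is a hypothetical helper, not the authors' construction):

```python
import torch

vocab_size, feat_dim = 5000, 256
memory = torch.zeros(vocab_size, feat_dim)   # running sum of region features
counts = torch.zeros(vocab_size, 1)          # how often each word was seen

def update_memory(word_ids, region_feats):
    """word_ids: (n,) caption words; region_feats: (n, feat_dim) features of
    the image regions those words are grounded to."""
    memory.index_add_(0, word_ids, region_feats)
    counts.index_add_(0, word_ids, torch.ones(len(word_ids), 1))

update_memory(torch.tensor([3, 17]), torch.randn(2, feat_dim))
word_visual_context = memory / counts.clamp(min=1)   # per-word visual context
```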

Joint Learning of Neural Networks via Iterative Reweighted Least Squares

Title Joint Learning of Neural Networks via Iterative Reweighted Least Squares
Authors Zaiwei Zhang, Xiangru Huang, Qixing Huang, Xiao Zhang, Yuan Li
Abstract In this paper, we introduce the problem of jointly learning feed-forward neural networks across a set of relevant but diverse datasets. Compared to learning a separate network from each dataset in isolation, joint learning enables us to extract correlated information across multiple datasets to significantly improve the quality of the learned networks. We formulate this problem as joint learning of multiple copies of the same network architecture and enforce the network weights to be shared across these networks. Instead of hand-encoding the shared network layers, we solve an optimization problem to automatically determine how layers should be shared between each pair of datasets. Experimental results show that our approach outperforms baselines without joint learning and those using pretraining and fine-tuning. We show the effectiveness of our approach on three tasks: image classification, learning auto-encoders, and image generation.
Tasks Image Classification, Image Generation
Published 2019-05-16
URL https://arxiv.org/abs/1905.06526v2
PDF https://arxiv.org/pdf/1905.06526v2.pdf
PWC https://paperswithcode.com/paper/joint-learning-of-neural-networks-via
Repo https://github.com/zaiweizhang/Joint-Learning-of-NN
Framework tf
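
The title's iterative reweighted least squares suggests one reading of the layer-sharing mechanism: an L2 coupling between corresponding layers of the per-dataset networks, with coupling weights re-estimated from the current differences so that pairs of layers that diverge are allowed to "unshare". A speculative PyTorch sketch of that principle (not the authors' exact objective):

```python
import torch

def coupling_penalty(nets, eps=1e-3):
    """IRLS-style penalty pulling corresponding layers of the nets together."""
    penalty = 0.0
    params = [list(n.parameters()) for n in nets]
    for layer_group in zip(*params):              # same layer across all nets
        for i in range(len(layer_group)):
            for j in range(i + 1, len(layer_group)):
                diff = (layer_group[i] - layer_group[j]).pow(2).sum()
                # Reweighting: strong pull when layers agree, weak when they
                # differ, approximating a sparsity-inducing square-root penalty.
                w = 1.0 / (diff.detach().sqrt() + eps)
                penalty = penalty + w * diff
    return penalty

nets = [torch.nn.Linear(10, 10) for _ in range(3)]  # toy per-dataset "networks"
print(coupling_penalty(nets))
```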