January 25, 2020


Paper Group NAWR 43


A Non-negative Symmetric Encoder-Decoder Approach for Community Detection


Title	A Non-negative Symmetric Encoder-Decoder Approach for Community Detection
Authors	Bing-Jie Sun, Huawei Shen, Jinhua Gao, Wentao Ouyang, Xueqi Cheng
Abstract	Community detection, or graph clustering, is crucial to understanding the structure of complex networks and extracting relevant knowledge from networked data. Latent factor models, e.g., non-negative matrix factorization and the mixed membership block model, are among the most successful methods for community detection. Latent factor models for community detection aim to find a distributed and generally low-dimensional representation, or coding, that captures the structural regularity of the network and reflects the community membership of nodes. Existing latent factor models are mainly based on reconstructing a network from the representation of its nodes, namely a network decoder, while constraining the representation to have certain desirable properties. These methods, however, lack an encoder that transforms nodes into their representation. Consequently, they fail to give a clear explanation of the meaning of a community and suffer from undesired computational problems. In this paper, we propose a non-negative symmetric encoder-decoder approach for community detection. By explicitly integrating a decoder and an encoder into a unified loss function, the proposed approach achieves better performance than state-of-the-art latent factor models on the community detection task. Moreover, unlike existing methods that explicitly impose a sparsity constraint on the representation of nodes, the proposed approach implicitly achieves sparsity of the node representation through its symmetric and non-negative properties, making the optimization much easier than in competing methods based on sparse matrix factorization.
Tasks	Community Detection, Graph Clustering, Network Embedding, Node Classification
Published	2019-12-24
URL	https://dl.acm.org/citation.cfm?id=3132902
PDF	http://www.bigdatalab.ac.cn/~shenhuawei/publications/2017/cikm-sun.pdf
PWC	https://paperswithcode.com/paper/a-non-negative-symmetric-encoder-decoder
Repo	https://github.com/benedekrozemberczki/karateclub
Framework	none
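
The abstract describes combining a decoder (reconstructing the network from node representations) and an encoder (mapping nodes to representations) in one loss under non-negativity constraints. One way to write such an objective is sketched below; it is an illustration, not necessarily the authors' exact formulation. All symbols are assumptions introduced here: $A$ is the $n \times n$ adjacency matrix, $W$ an $n \times k$ non-negative matrix, and $Z$ the $k \times n$ non-negative node representations.

```latex
\min_{W \ge 0,\; Z \ge 0}\;
\underbrace{\lVert A - W Z \rVert_F^2}_{\text{decoder: } A \approx W Z}
\;+\;
\underbrace{\lVert Z - W^{\top} A \rVert_F^2}_{\text{encoder: } Z \approx W^{\top} A}
```

In this form the same matrix $W$ appears in both terms, which is one reading of "symmetric" in the abstract.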

A Late Fusion CNN for Digital Matting


Title	A Late Fusion CNN for Digital Matting
Authors	Yunke Zhang, Lixue Gong, Lubin Fan, Peiran Ren, Qixing Huang, Hujun Bao, Weiwei Xu
Abstract	This paper studies the structure of a deep convolutional neural network to predict the foreground alpha matte by taking a single RGB image as input. Our network is fully convolutional with two decoder branches for the foreground and background classification respectively. Then a fusion branch is used to integrate the two classification results which gives rise to alpha values as the soft segmentation result. This design provides more degrees of freedom than a single decoder branch for the network to obtain better alpha values during training. The network can implicitly produce trimaps without user interaction, which is easy to use for novices without expertise in digital matting. Experimental results demonstrate that our network can achieve high-quality alpha mattes for various types of objects and outperform the state-of-the-art CNN-based image matting methods on the human image matting task.
Tasks	Image Matting
Published	2019-06-01
URL	http://openaccess.thecvf.com/content_CVPR_2019/html/Zhang_A_Late_Fusion_CNN_for_Digital_Matting_CVPR_2019_paper.html
PDF	http://openaccess.thecvf.com/content_CVPR_2019/papers/Zhang_A_Late_Fusion_CNN_for_Digital_Matting_CVPR_2019_paper.pdf
PWC	https://paperswithcode.com/paper/a-late-fusion-cnn-for-digital-matting
Repo	https://github.com/yunkezhang/FusionMatting
Framework	none

Greedy Sampling for Approximate Clustering in the Presence of Outliers


Title	Greedy Sampling for Approximate Clustering in the Presence of Outliers
Authors	Aditya Bhaskara, Sharvaree Vadgama, Hong Xu
Abstract	Greedy algorithms such as adaptive sampling (k-means++) and furthest point traversal are popular choices for clustering problems. On the one hand, they possess good theoretical approximation guarantees, and on the other, they are fast and easy to implement. However, one main issue with these algorithms is the sensitivity to noise/outliers in the data. In this work we show that for k-means and k-center clustering, simple modifications to the well-studied greedy algorithms result in nearly identical guarantees, while additionally being robust to outliers. For instance, in the case of k-means++, we show that a simple thresholding operation on the distances suffices to obtain an O(\log k) approximation to the objective. We obtain similar results for the simpler k-center problem. Finally, we show experimentally that our algorithms are easy to implement and scale well. We also measure their ability to identify noisy points added to a dataset.
Tasks
Published	2019-12-01
URL	http://papers.nips.cc/paper/9294-greedy-sampling-for-approximate-clustering-in-the-presence-of-outliers
PDF	http://papers.nips.cc/paper/9294-greedy-sampling-for-approximate-clustering-in-the-presence-of-outliers.pdf
PWC	https://paperswithcode.com/paper/greedy-sampling-for-approximate-clustering-in
Repo	https://github.com/Sharvaree/KMeans_Experiments
Framework	none
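
The abstract mentions "a simple thresholding operation on the distances" for k-means++. A minimal sketch of that idea follows, assuming the threshold is applied by capping each point's squared distance before sampling. The function name `thresholded_kmeanspp`, the fixed `threshold` argument, and the capping rule are illustrative assumptions, not the authors' implementation.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def thresholded_kmeanspp(points, k, threshold, seed=0):
    """Pick up to k centers by D^2 sampling with capped squared distances.

    Assumption: each point's sampling weight is min(d^2, threshold), so a
    far-away outlier cannot dominate the sampling distribution.
    """
    rng = random.Random(seed)
    centers = [rng.choice(points)]
    # Squared distance from every point to its nearest chosen center.
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        weights = [min(d, threshold) for d in d2]
        if sum(weights) == 0:
            break  # every point coincides with a chosen center
        idx = rng.choices(range(len(points)), weights=weights, k=1)[0]
        centers.append(points[idx])
        d2 = [min(d, sq_dist(p, points[idx])) for d, p in zip(d2, points)]
    return centers
```

With plain k-means++ a single distant outlier receives almost all of the sampling mass; capping the weight bounds its influence to that of any other point beyond the threshold.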

PIE: A Large-Scale Dataset and Models for Pedestrian Intention Estimation and Trajectory Prediction


Title	PIE: A Large-Scale Dataset and Models for Pedestrian Intention Estimation and Trajectory Prediction
Authors	Amir Rasouli, Iuliia Kotseruba, Toni Kunic, John K. Tsotsos
Abstract	Pedestrian behavior anticipation is a key challenge in the design of assistive and autonomous driving systems suitable for urban environments. An intelligent system should be able to understand the intentions or underlying motives of pedestrians and to predict their forthcoming actions. To date, only a few public datasets were proposed for the purpose of studying pedestrian behavior prediction in the context of intelligent driving. To this end, we propose a novel large-scale dataset designed for pedestrian intention estimation (PIE). We conducted a large-scale human experiment to establish human reference data for pedestrian intention in traffic scenes. We propose models for estimating pedestrian crossing intention and predicting their future trajectory. Our intention estimation model achieves 79% accuracy and our trajectory prediction algorithm outperforms state-of-the-art by 26% on the proposed dataset. We further show that combining pedestrian intention with observed motion improves trajectory prediction. The dataset and models are available at http://data.nvision2.eecs.yorku.ca/PIE_dataset/.
Tasks	Autonomous Driving, Trajectory Prediction
Published	2019-10-01
URL	http://openaccess.thecvf.com/content_ICCV_2019/html/Rasouli_PIE_A_Large-Scale_Dataset_and_Models_for_Pedestrian_Intention_Estimation_ICCV_2019_paper.html
PDF	http://openaccess.thecvf.com/content_ICCV_2019/papers/Rasouli_PIE_A_Large-Scale_Dataset_and_Models_for_Pedestrian_Intention_Estimation_ICCV_2019_paper.pdf
PWC	https://paperswithcode.com/paper/pie-a-large-scale-dataset-and-models-for
Repo	https://github.com/aras62/PIE
Framework	none

Splitter: Learning Node Representations that Capture Multiple Social Contexts


Title	Splitter: Learning Node Representations that Capture Multiple Social Contexts
Authors	Alessandro Epasto, Bryan Perozzi
Abstract	Recent interest in graph embedding methods has focused on learning a single representation for each node in the graph. But can nodes really be best described by a single vector representation? In this work, we propose a method for learning multiple representations of the nodes in a graph (e.g., the users of a social network). Based on a principled decomposition of the ego-network, each representation encodes the role of the node in a different local community in which the nodes participate. These representations allow for improved reconstruction of the nuanced relationships that occur in the graph, a phenomenon that we illustrate through state-of-the-art results on link prediction tasks on a variety of graphs, reducing the error by up to 90%. In addition, we show that these embeddings allow for effective visual analysis of the learned community structure.
Tasks	Graph Embedding, Link Prediction, Network Embedding
Published	2019-03-18
URL	http://epasto.org/papers/www2019splitter.pdf
PDF	http://epasto.org/papers/www2019splitter.pdf
PWC	https://paperswithcode.com/paper/splitter-learning-node-representations-that
Repo	https://github.com/benedekrozemberczki/Splitter
Framework	pytorch
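
The abstract refers to a decomposition of the ego-network in which each piece corresponds to a local community of the node. A minimal sketch of one such decomposition is shown below, assuming the ego-network is the subgraph induced on a node's neighbors (the node itself excluded) and that its connected components serve as the local communities. The function name `ego_net_components` and the choice of connected components as the clustering step are assumptions made for illustration.

```python
def ego_net_components(adj, u):
    """Split the neighbors of u into connected components of u's ego-network.

    adj maps each node to the set of its neighbors (undirected, no self-loops).
    Each returned set is a candidate "context" for which u could receive
    its own representation.
    """
    neighbors = adj[u]
    seen, components = set(), []
    for start in sorted(neighbors):
        if start in seen:
            continue
        component, stack = set(), [start]
        seen.add(start)
        while stack:
            v = stack.pop()
            component.add(v)
            # Follow only edges that stay inside the ego-network of u.
            for w in adj[v]:
                if w in neighbors and w not in seen:
                    seen.add(w)
                    stack.append(w)
        components.append(component)
    return components


# Node 0 sits between two otherwise unconnected groups, {1, 2} and {3, 4}.
adj = {
    0: {1, 2, 3, 4},
    1: {0, 2},
    2: {0, 1},
    3: {0, 4},
    4: {0, 3},
}
contexts = ego_net_components(adj, 0)
```

Here node 0 would be assigned two representations, one per component, while nodes 1 through 4 each have a single context.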

Attribute-aware non-linear co-embeddings of graph features


Title	Attribute-aware non-linear co-embeddings of graph features
Authors	Ahmed Rashed, Josif Grabocka, Lars Schmidt-Thieme
Abstract	In very sparse recommender data sets, attributes of users such as age, gender and home location and attributes of items such as, in the case of movies, genre, release year, and director can improve the recommendation accuracy, especially for users and items that have few ratings. While most recommendation models can be extended to take attributes of users and items into account, their architectures usually become more complicated. While attributes for items are often easy to be provided, attributes for users are often scarce for reasons of privacy or simply because they are not relevant to the operational process at hand. In this paper, we address these two problems for attribute-aware recommender systems by proposing a simple model that co-embeds users and items into a joint latent space in a similar way as a vanilla matrix factorization, but with non-linear latent features construction that seamlessly can ingest user or item attributes or both (GraphRec). To address the second problem, scarce attributes, the proposed model treats the user-item relation as a bipartite graph and constructs generic user and item attributes via the Laplacian of the user-item co-occurrence graph that requires no further external side information but the mere rating matrix. In experiments on three recommender datasets, we show that GraphRec significantly outperforms existing state-of-the-art attribute-aware and content-aware recommender systems even without using any side information.
Tasks	Recommendation Systems
Published	2019-09-16
URL	https://www.ismll.uni-hildesheim.de/pub/pdfs/Ahmed_RecSys19.pdf
PDF	https://www.ismll.uni-hildesheim.de/pub/pdfs/Ahmed_RecSys19.pdf
PWC	https://paperswithcode.com/paper/attribute-aware-non-linear-co-embeddings-of
Repo	https://github.com/ahmedrashed-ml/GraphRec
Framework	tf

Unsupervised Disentangling of Appearance and Geometry by Deformable Generator Network


Title	Unsupervised Disentangling of Appearance and Geometry by Deformable Generator Network
Authors	Xianglei Xing, Tian Han, Ruiqi Gao, Song-Chun Zhu, Ying Nian Wu
Abstract	We present a deformable generator model to disentangle the appearance and geometric information in a purely unsupervised manner. The appearance generator models the appearance-related information of an image, including color, illumination, identity or category, while the geometric generator performs geometry-related warping, such as rotation and stretching, by generating a displacement of the coordinates of each pixel to obtain the final image. The two generators act upon independent latent factors to extract disentangled appearance and geometric information from an image. The proposed scheme is general and can be easily integrated into different generative models. An extensive set of qualitative and quantitative experiments shows that the appearance and geometric information can be well disentangled, and the learned geometric generator can be conveniently transferred to other image datasets to facilitate knowledge transfer tasks.
Tasks	Transfer Learning
Published	2019-06-01
URL	http://openaccess.thecvf.com/content_CVPR_2019/html/Xing_Unsupervised_Disentangling_of_Appearance_and_Geometry_by_Deformable_Generator_Network_CVPR_2019_paper.html
PDF	http://openaccess.thecvf.com/content_CVPR_2019/papers/Xing_Unsupervised_Disentangling_of_Appearance_and_Geometry_by_Deformable_Generator_Network_CVPR_2019_paper.pdf
PWC	https://paperswithcode.com/paper/unsupervised-disentangling-of-appearance-and
Repo	https://github.com/andyxingxl/Deformable-generator
Framework	tf

In Defense of Pre-Trained ImageNet Architectures for Real-Time Semantic Segmentation of Road-Driving Images


Title	In Defense of Pre-Trained ImageNet Architectures for Real-Time Semantic Segmentation of Road-Driving Images
Authors	Marin Orsic, Ivan Kreso, Petra Bevandic, Sinisa Segvic
Abstract	The recent success of semantic segmentation approaches on demanding road driving datasets has spurred interest in many related application fields. Many of these applications involve real-time prediction on mobile platforms such as cars, drones and various kinds of robots. The real-time setup is challenging due to the extraordinary computational complexity involved. Many previous works address the challenge with custom lightweight architectures which decrease computational complexity by reducing depth, width and layer capacity with respect to general purpose architectures. We propose an alternative approach which achieves significantly better performance across a wide range of computing budgets. First, we rely on a light-weight general purpose architecture as the main recognition engine. Then, we leverage light-weight upsampling with lateral connections as the most cost-effective solution to restore the prediction resolution. Finally, we propose to enlarge the receptive field by fusing shared features at multiple resolutions in a novel fashion. Experiments on several road driving datasets show a substantial advantage of the proposed approach, either with ImageNet pre-trained parameters or when we learn from scratch. Our Cityscapes test submission entitled SwiftNetRN-18 delivers 75.5% MIoU and achieves 39.9 Hz on 1024x2048 images on GTX1080Ti.
Tasks	Real-Time Semantic Segmentation, Semantic Segmentation
Published	2019-06-01
URL	http://openaccess.thecvf.com/content_CVPR_2019/html/Orsic_In_Defense_of_Pre-Trained_ImageNet_Architectures_for_Real-Time_Semantic_Segmentation_CVPR_2019_paper.html
PDF	http://openaccess.thecvf.com/content_CVPR_2019/papers/Orsic_In_Defense_of_Pre-Trained_ImageNet_Architectures_for_Real-Time_Semantic_Segmentation_CVPR_2019_paper.pdf
PWC	https://paperswithcode.com/paper/in-defense-of-pre-trained-imagenet-1
Repo	https://github.com/orsic/swiftnet
Framework	pytorch

Light Field Messaging With Deep Photographic Steganography


Title	Light Field Messaging With Deep Photographic Steganography
Authors	Eric Wengrowski, Kristin Dana
Abstract	We develop Light Field Messaging (LFM), a process of embedding, transmitting, and receiving hidden information in video that is displayed on a screen and captured by a handheld camera. The goal of the system is to minimize perceived visual artifacts of the message embedding, while simultaneously maximizing the accuracy of message recovery on the camera side. LFM requires photographic steganography for embedding messages that can be displayed and camera-captured. Unlike digital steganography, the embedding requirements are significantly more challenging due to the combined effect of the screen’s radiometric emittance function, the camera’s sensitivity function, and the camera-display relative geometry. We devise and train a network to jointly learn a deep embedding and recovery algorithm that requires no multi-frame synchronization. A key novel component is the camera display transfer function (CDTF) to model the camera-display pipeline. To learn this CDTF we introduce a dataset (Camera-Display 1M) of 1,000,000 camera-captured images collected from 25 camera-display pairs. The result of this work is a high-performance real-time LFM system using consumer-grade displays and smartphone cameras.
Tasks
Published	2019-06-01
URL	http://openaccess.thecvf.com/content_CVPR_2019/html/Wengrowski_Light_Field_Messaging_With_Deep_Photographic_Steganography_CVPR_2019_paper.html
PDF	http://openaccess.thecvf.com/content_CVPR_2019/papers/Wengrowski_Light_Field_Messaging_With_Deep_Photographic_Steganography_CVPR_2019_paper.pdf
PWC	https://paperswithcode.com/paper/light-field-messaging-with-deep-photographic
Repo	https://github.com/mathski/LFM
Framework	pytorch

Limitations of Lazy Training of Two-layers Neural Network


Title	Limitations of Lazy Training of Two-layers Neural Network
Authors	Song Mei, Theodor Misiakiewicz, Behrooz Ghorbani, Andrea Montanari
Abstract	We study the supervised learning problem under either of the following two models: (1) Feature vectors x_i are d-dimensional Gaussian and responses are y_i = f_*(x_i) for f_* an unknown quadratic function; (2) Feature vectors x_i are distributed as a mixture of two d-dimensional centered Gaussians, and y_i’s are the corresponding class labels. We use two-layers neural networks with quadratic activations, and compare three different learning regimes: the random features (RF) regime in which we only train the second-layer weights; the neural tangent (NT) regime in which we train a linearization of the neural network around its initialization; the fully trained neural network (NN) regime in which we train all the weights in the network. We prove that, even for the simple quadratic model of point (1), there is a potentially unbounded gap between the prediction risk achieved in these three training regimes, when the number of neurons is smaller than the ambient dimension. When the number of neurons is larger than the number of dimensions, the problem is significantly easier and both NT and NN learning achieve zero risk.
Tasks
Published	2019-12-01
URL	http://papers.nips.cc/paper/9111-limitations-of-lazy-training-of-two-layers-neural-network
PDF	http://papers.nips.cc/paper/9111-limitations-of-lazy-training-of-two-layers-neural-network.pdf
PWC	https://paperswithcode.com/paper/limitations-of-lazy-training-of-two-layers-1
Repo	https://github.com/bGhorbani/Lazy-Training-Neural-Nets
Framework	tf
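
The three regimes compared in the abstract can be written side by side. The notation below is an assumption introduced for illustration ($N$ neurons, first-layer weights $w_i$, second-layer weights $a_i$, parameters $\theta = (a, W)$, initialization $\theta_0$) and may differ from the paper's parametrization.

```latex
% Two-layer network with quadratic activation
f(x; \theta) = \sum_{i=1}^{N} a_i \,\langle w_i, x \rangle^{2}

% RF: first-layer weights frozen at their random initialization w_i^0; only a is trained
f_{\mathrm{RF}}(x; a) = \sum_{i=1}^{N} a_i \,\langle w_i^{0}, x \rangle^{2}

% NT: first-order Taylor expansion of the network around theta_0, trained in theta
f_{\mathrm{NT}}(x; \theta) = f(x; \theta_0)
  + \langle \nabla_{\theta} f(x; \theta_0),\, \theta - \theta_0 \rangle

% NN: all of theta = (a, W) is trained in the original, non-linearized model
f_{\mathrm{NN}}(x; \theta) = f(x; \theta)
```

Both the RF and NT models are linear in their trainable parameters, which is what separates them from the fully trained network.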

TAB-VCR: Tags and Attributes based VCR Baselines


Title	TAB-VCR: Tags and Attributes based VCR Baselines
Authors	Jingxiang Lin, Unnat Jain, Alexander Schwing
Abstract	Reasoning is an important ability that we learn from a very early age. Yet, reasoning is extremely hard for algorithms. Despite impressive recent progress that has been reported on tasks that necessitate reasoning, such as visual question answering and visual dialog, models often exploit biases in datasets. To develop models with better reasoning abilities, the new visual commonsense reasoning (VCR) task has recently been introduced. Not only do models have to answer questions, but they also have to provide a reason for the given answer. The proposed baseline achieved compelling results, leveraging a meticulously designed model composed of LSTM modules and attention nets. Here we show that a much simpler model obtained by ablating and pruning the existing intricate baseline can perform better with half the number of trainable parameters. By associating visual features with attribute information and better text-to-image grounding, we obtain further improvements for our simpler and effective baseline, TAB-VCR. We show that this approach results in 5.3%, 4.4% and 6.5% absolute improvements over the previous state-of-the-art on question answering, answer justification and holistic VCR, respectively. Webpage: https://deanplayerljx.github.io/tabvcr/
Tasks	Question Answering, Visual Commonsense Reasoning, Visual Dialog, Visual Question Answering
Published	2019-12-01
URL	http://papers.nips.cc/paper/9693-tab-vcr-tags-and-attributes-based-vcr-baselines
PDF	http://papers.nips.cc/paper/9693-tab-vcr-tags-and-attributes-based-vcr-baselines.pdf
PWC	https://paperswithcode.com/paper/tab-vcr-tags-and-attributes-based-vcr-1
Repo	https://github.com/Deanplayerljx/tab-vcr
Framework	pytorch

Learning Erdos-Renyi Random Graphs via Edge Detecting Queries


Title	Learning Erdos-Renyi Random Graphs via Edge Detecting Queries
Authors	Zihan Li, Matthias Fresacher, Jonathan Scarlett
Abstract	In this paper, we consider the problem of learning an unknown graph via queries on groups of nodes, with the result indicating whether or not at least one edge is present among those nodes. While learning arbitrary graphs with $n$ nodes and $k$ edges is known to be hard in the sense of requiring $\Omega(\min\{k^2 \log n, n^2\})$ tests (even when a small probability of error is allowed), we show that learning an Erdős-Rényi random graph with an average of $\bar{k}$ edges is much easier; namely, one can attain asymptotically vanishing error probability with only $O(\bar{k} \log n)$ tests. We establish such bounds for a variety of algorithms inspired by the group testing problem, with explicit constant factors indicating a near-optimal number of tests, and in some cases asymptotic optimality including constant factors. In addition, we present an alternative design that permits a near-optimal sublinear decoding time of $O(\bar{k} \log^2 \bar{k} + \bar{k} \log n)$.
Tasks
Published	2019-12-01
URL	http://papers.nips.cc/paper/8332-learning-erdos-renyi-random-graphs-via-edge-detecting-queries
PDF	http://papers.nips.cc/paper/8332-learning-erdos-renyi-random-graphs-via-edge-detecting-queries.pdf
PWC	https://paperswithcode.com/paper/learning-erdos-renyi-random-graphs-via-edge
Repo	https://github.com/scarlett-nus/er_edge_det
Framework	none
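
The query model in the abstract (a test on a group of nodes reports whether at least one edge lies among them) can be sketched in a few lines. The helper names `sample_er_graph` and `edge_query` are hypothetical and only illustrate the setting; they are not taken from the linked repository.

```python
import random
from itertools import combinations


def sample_er_graph(n, p, seed=0):
    """Sample an Erdős-Rényi graph on nodes 0..n-1 with edge probability p."""
    rng = random.Random(seed)
    return {frozenset(e) for e in combinations(range(n), 2) if rng.random() < p}


def edge_query(edges, nodes):
    """Return True iff at least one edge has both endpoints in `nodes`."""
    node_set = set(nodes)
    return any(edge <= node_set for edge in edges)


# A fixed toy graph with edges {0, 1} and {2, 3}.
toy_edges = {frozenset({0, 1}), frozenset({2, 3})}
hit = edge_query(toy_edges, [0, 1, 4])   # contains both endpoints of {0, 1}
miss = edge_query(toy_edges, [0, 2, 4])  # no edge lies fully inside the group
```

A learning algorithm in this setting only sees the Boolean outcomes of such queries and must reconstruct the edge set from them.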

Domes to Drones: Self-Supervised Active Triangulation for 3D Human Pose Reconstruction


Title	Domes to Drones: Self-Supervised Active Triangulation for 3D Human Pose Reconstruction
Authors	Aleksis Pirinen, Erik Gärtner, Cristian Sminchisescu
Abstract	Existing state-of-the-art estimation systems can detect 2d poses of multiple people in images quite reliably. In contrast, 3d pose estimation from a single image is ill-posed due to occlusion and depth ambiguities. Assuming access to multiple cameras, or given an active system able to position itself to observe the scene from multiple viewpoints, reconstructing 3d pose from 2d measurements becomes well-posed within the framework of standard multi-view geometry. Less clear is what is an informative set of viewpoints for accurate 3d reconstruction, particularly in complex scenes, where people are occluded by others or by scene objects. In order to address the view selection problem in a principled way, we here introduce ACTOR, an active triangulation agent for 3d human pose reconstruction. Our fully trainable agent consists of a 2d pose estimation network (any of which would work) and a deep reinforcement learning-based policy for camera viewpoint selection. The policy predicts observation viewpoints, the number of which varies adaptively depending on scene content, and the associated images are fed to an underlying pose estimator. Importantly, training the policy requires no annotations - given a 2d pose estimator, ACTOR is trained in a self-supervised manner. In extensive evaluations on complex multi-people scenes filmed in a Panoptic dome, under multiple viewpoints, we compare our active triangulation agent to strong multi-view baselines, and show that ACTOR produces significantly more accurate 3d pose reconstructions. We also provide a proof-of-concept experiment indicating the potential of connecting our view selection policy to a physical drone observer.
Tasks	3D Pose Estimation, 3D Reconstruction, Pose Estimation
Published	2019-12-01
URL	http://papers.nips.cc/paper/8646-domes-to-drones-self-supervised-active-triangulation-for-3d-human-pose-reconstruction
PDF	http://papers.nips.cc/paper/8646-domes-to-drones-self-supervised-active-triangulation-for-3d-human-pose-reconstruction.pdf
PWC	https://paperswithcode.com/paper/domes-to-drones-self-supervised-active
Repo	https://github.com/ErikGartner/actor
Framework	tf

Divergence-Augmented Policy Optimization


Title	Divergence-Augmented Policy Optimization
Authors	Qing Wang, Yingru Li, Jiechao Xiong, Tong Zhang
Abstract	In deep reinforcement learning, policy optimization methods need to deal with issues such as function approximation and the reuse of off-policy data. Standard policy gradient methods do not handle off-policy data well, leading to premature convergence and instability. This paper introduces a method to stabilize policy optimization when off-policy data are reused. The idea is to include a Bregman divergence between the behavior policy that generates the data and the current policy to ensure small and safe policy updates with off-policy data. The Bregman divergence is calculated between the state distributions of two policies, instead of only on the action probabilities, leading to a divergence augmentation formulation. Empirical experiments on Atari games show that in the data-scarce scenario where the reuse of off-policy data becomes necessary, our method can achieve better performance than other state-of-the-art deep reinforcement learning algorithms.
Tasks	Atari Games, Policy Gradient Methods
Published	2019-12-01
URL	http://papers.nips.cc/paper/8842-divergence-augmented-policy-optimization
PDF	http://papers.nips.cc/paper/8842-divergence-augmented-policy-optimization.pdf
PWC	https://paperswithcode.com/paper/divergence-augmented-policy-optimization
Repo	https://github.com/lns/dapo
Framework	none

Photonic human identification based on deep learning of back scattered laser speckle patterns


Title	Photonic human identification based on deep learning of back scattered laser speckle patterns
Authors	Zeev Kalyzhner, Or Levitas, Felix Kalichman, Ron Jacobson, Zeev Zalevsky
Abstract	The analysis of the dynamics of speckle patterns generated when laser light is back scattered from tissue has recently been shown to be well suited to remote sensing of various bio-medical parameters. In this work, we present how the analysis of a static single speckle pattern scattered from the forehead of a subject, together with advanced machine learning techniques based on multilayered neural networks, can offer a novel approach to accurate identification within a small predefined number of classes (e.g., a ‘smart home’ setting which restricts its operations to family members only). Processing the static scattering speckle pattern with neural networks enables extraction of unique features with no prior expert knowledge required. Using the right model allows for very accurate differentiation between the desired categories, and that model can form a basis for using speckle patterns as an identity measure, a ‘forehead-print’.
Tasks
Published	2019-11-25
URL	https://www.osapublishing.org/oe/abstract.cfm?uri=oe-27-24-36002
PDF	https://www.osapublishing.org/DirectPDFAccess/B480DF87-04B7-7FB5-63D05DE2AF76AACE_423449/oe-27-24-36002.pdf?da=1&id=423449&seq=0&mobile=no
PWC	https://paperswithcode.com/paper/photonic-human-identification-based-on-deep
Repo	https://github.com/zeevikal/speckles-classification
Framework	tf