January 25, 2020

Paper Group NAWR 43

A Non-negative Symmetric Encoder-Decoder Approach for Community Detection. A Late Fusion CNN for Digital Matting. Greedy Sampling for Approximate Clustering in the Presence of Outliers. PIE: A Large-Scale Dataset and Models for Pedestrian Intention Estimation and Trajectory Prediction. Splitter: Learning Node Representations that Capture Multiple S …

A Non-negative Symmetric Encoder-Decoder Approach for Community Detection

Title A Non-negative Symmetric Encoder-Decoder Approach for Community Detection
Authors Bing-Jie Sun, Huawei Shen, Jinhua Gao, Wentao Ouyang, Xueqi Cheng
Abstract Community detection, or graph clustering, is crucial to understanding the structure of complex networks and extracting relevant knowledge from networked data. Latent factor models, e.g., non-negative matrix factorization and the mixed membership block model, are among the most successful methods for community detection. Latent factor models for community detection aim to find a distributed and generally low-dimensional representation, or coding, that captures the structural regularity of the network and reflects the community membership of nodes. Existing latent factor models are mainly based on reconstructing a network from the representation of its nodes, namely a network decoder, while constraining the representation to have certain desirable properties. These methods, however, lack an encoder that transforms nodes into their representation. Consequently, they fail to give a clear explanation of the meaning of a community and suffer from undesirable computational problems. In this paper, we propose a non-negative symmetric encoder-decoder approach for community detection. By explicitly integrating a decoder and an encoder into a unified loss function, the proposed approach outperforms state-of-the-art latent factor models on the community detection task. Moreover, unlike existing methods that explicitly impose a sparsity constraint on the representation of nodes, the proposed approach achieves sparse node representations implicitly through its symmetric and non-negative properties, making the optimization much easier than in competing methods based on sparse matrix factorization.
Tasks Community Detection, Graph Clustering, Network Embedding, Node Classification
Published 2019-12-24
URL https://dl.acm.org/citation.cfm?id=3132902
PDF http://www.bigdatalab.ac.cn/~shenhuawei/publications/2017/cikm-sun.pdf
PWC https://paperswithcode.com/paper/a-non-negative-symmetric-encoder-decoder
Repo https://github.com/benedekrozemberczki/karateclub
Framework none
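
To make the encoder-decoder idea concrete, here is one hedged way to write such an objective. The symbols are assumptions, not taken from the abstract:

- $A \in \mathbb{R}^{n \times n}$ is the adjacency matrix.
- $W \in \mathbb{R}^{k \times n}_{\ge 0}$ is a single non-negative weight matrix shared by the encoder and the decoder (the "symmetric" part).
- $H$ holds the $k$-dimensional node codes.
- $\lambda > 0$ is a hypothetical trade-off weight.

The paper's exact loss may differ.

```latex
\min_{W \ge 0,\; H \ge 0} \;
\underbrace{\lVert A - W^{\top} H \rVert_F^{2}}_{\text{decoder: reconstruct the network}}
\; + \; \lambda \,
\underbrace{\lVert H - W A \rVert_F^{2}}_{\text{encoder: map nodes to codes}}
```

In this sketch, column $j$ of $H$ is node $j$'s code, and a hard community assignment could be read off as $\arg\max_c H_{cj}$. No explicit sparsity penalty appears, which is consistent with the abstract's point that the approach avoids an explicit sparsity constraint.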

A Late Fusion CNN for Digital Matting

Title A Late Fusion CNN for Digital Matting
Authors Yunke Zhang, Lixue Gong, Lubin Fan, Peiran Ren, Qixing Huang, Hujun Bao, Weiwei Xu
Abstract This paper studies the structure of a deep convolutional neural network that predicts the foreground alpha matte from a single RGB image. Our network is fully convolutional, with two decoder branches for foreground and background classification, respectively. A fusion branch then integrates the two classification results, producing alpha values as the soft segmentation result. This design gives the network more degrees of freedom than a single decoder branch to obtain better alpha values during training. The network can implicitly produce trimaps without user interaction, which makes it easy to use for novices without expertise in digital matting. Experimental results demonstrate that our network can achieve high-quality alpha mattes for various types of objects and outperforms state-of-the-art CNN-based image matting methods on the human image matting task.
Tasks Image Matting
Published 2019-06-01
URL http://openaccess.thecvf.com/content_CVPR_2019/html/Zhang_A_Late_Fusion_CNN_for_Digital_Matting_CVPR_2019_paper.html
PDF http://openaccess.thecvf.com/content_CVPR_2019/papers/Zhang_A_Late_Fusion_CNN_for_Digital_Matting_CVPR_2019_paper.pdf
PWC https://paperswithcode.com/paper/a-late-fusion-cnn-for-digital-matting
Repo https://github.com/yunkezhang/FusionMatting
Framework none
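
The fusion step can be sketched per pixel. Assume the following hypothetical outputs:

- the foreground branch outputs a probability F;
- the background branch outputs a probability B;
- the fusion branch outputs a blending weight beta in [0, 1].

One natural blend is then alpha = beta*F + (1 - beta)*(1 - B). Treat this formula and the names below as assumptions for illustration, not as the paper's exact layer.

```python
from typing import List

Matrix = List[List[float]]


def fuse_alpha(fg: Matrix, bg: Matrix, beta: Matrix) -> Matrix:
    """Blend foreground/background probabilities into an alpha matte.

    Assumed per-pixel rule (illustrative): alpha = beta*F + (1 - beta)*(1 - B),
    clipped to [0, 1]. fg, bg and beta must have the same height and width.
    """
    alpha: Matrix = []
    for f_row, b_row, w_row in zip(fg, bg, beta):
        out_row = []
        for f, b, w in zip(f_row, b_row, w_row):
            a = w * f + (1.0 - w) * (1.0 - b)
            out_row.append(min(1.0, max(0.0, a)))  # keep alpha a valid opacity
        alpha.append(out_row)
    return alpha


if __name__ == "__main__":
    print(fuse_alpha([[1.0, 0.6]], [[0.0, 0.2]], [[0.5, 0.5]]))
```

Under this rule, beta only changes the result where the two branches disagree, i.e. where F differs from 1 - B. When F = 1 - B, alpha equals F for any beta.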

Greedy Sampling for Approximate Clustering in the Presence of Outliers

Title Greedy Sampling for Approximate Clustering in the Presence of Outliers
Authors Aditya Bhaskara, Sharvaree Vadgama, Hong Xu
Abstract Greedy algorithms such as adaptive sampling (k-means++) and furthest point traversal are popular choices for clustering problems. On the one hand, they possess good theoretical approximation guarantees; on the other, they are fast and easy to implement. However, one main issue with these algorithms is their sensitivity to noise and outliers in the data. In this work we show that for k-means and k-center clustering, simple modifications to the well-studied greedy algorithms yield nearly identical guarantees while additionally being robust to outliers. For instance, in the case of k-means++, we show that a simple thresholding operation on the distances suffices to obtain an $O(\log k)$ approximation to the objective. We obtain similar results for the simpler k-center problem. Finally, we show experimentally that our algorithms are easy to implement and scale well. We also measure their ability to identify noisy points added to a dataset.
Tasks
Published 2019-12-01
URL http://papers.nips.cc/paper/9294-greedy-sampling-for-approximate-clustering-in-the-presence-of-outliers
PDF http://papers.nips.cc/paper/9294-greedy-sampling-for-approximate-clustering-in-the-presence-of-outliers.pdf
PWC https://paperswithcode.com/paper/greedy-sampling-for-approximate-clustering-in
Repo https://github.com/Sharvaree/KMeans_Experiments
Framework none
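
Below is a minimal sketch of k-means++ (D² sampling) seeding with a distance cap. The abstract only mentions "a simple thresholding operation on the distances". Here we assume it means capping each point's squared distance at a hypothetical parameter `tau` before sampling, so distant outliers cannot dominate the sampling probabilities. The paper's exact rule and its choice of threshold may differ.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(p: Sequence[float], q: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(p, q))


def thresholded_kmeanspp(points: List[Point], k: int, tau: float,
                         rng: random.Random) -> List[Point]:
    """Choose up to k centers by D^2 sampling with squared distances capped at tau.

    tau is a hypothetical threshold parameter used for illustration.
    """
    centers = [rng.choice(points)]  # first center: uniform at random
    # d2[i] = squared distance from points[i] to its nearest chosen center
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        weights = [min(d, tau) for d in d2]  # thresholding step (assumed form)
        if sum(weights) == 0:
            break  # every point already coincides with a center
        nxt = rng.choices(points, weights=weights, k=1)[0]
        centers.append(nxt)
        d2 = [min(d, sq_dist(p, nxt)) for d, p in zip(d2, points)]
    return centers
```

With `tau = float("inf")` the cap never applies, and the routine reduces to standard k-means++ seeding.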

PIE: A Large-Scale Dataset and Models for Pedestrian Intention Estimation and Trajectory Prediction

Title PIE: A Large-Scale Dataset and Models for Pedestrian Intention Estimation and Trajectory Prediction
Authors Amir Rasouli, Iuliia Kotseruba, Toni Kunic, John K. Tsotsos
Abstract Pedestrian behavior anticipation is a key challenge in the design of assistive and autonomous driving systems suitable for urban environments. An intelligent system should be able to understand the intentions or underlying motives of pedestrians and to predict their forthcoming actions. To date, only a few public datasets have been proposed for studying pedestrian behavior prediction in the context of intelligent driving. To address this, we propose a novel large-scale dataset designed for pedestrian intention estimation (PIE). We conducted a large-scale human experiment to establish human reference data for pedestrian intention in traffic scenes. We propose models for estimating pedestrian crossing intention and predicting pedestrians' future trajectories. Our intention estimation model achieves 79% accuracy, and our trajectory prediction algorithm outperforms the state of the art by 26% on the proposed dataset. We further show that combining pedestrian intention with observed motion improves trajectory prediction. The dataset and models are available at http://data.nvision2.eecs.yorku.ca/PIE_dataset/.
Tasks Autonomous Driving, Trajectory Prediction
Published 2019-10-01
URL http://openaccess.thecvf.com/content_ICCV_2019/html/Rasouli_PIE_A_Large-Scale_Dataset_and_Models_for_Pedestrian_Intention_Estimation_ICCV_2019_paper.html
PDF http://openaccess.thecvf.com/content_ICCV_2019/papers/Rasouli_PIE_A_Large-Scale_Dataset_and_Models_for_Pedestrian_Intention_Estimation_ICCV_2019_paper.pdf
PWC https://paperswithcode.com/paper/pie-a-large-scale-dataset-and-models-for
Repo https://github.com/aras62/PIE
Framework none

Splitter: Learning Node Representations that Capture Multiple Social Contexts

Title Splitter: Learning Node Representations that Capture Multiple Social Contexts
Authors Alessandro Epasto, Bryan Perozzi
Abstract Recent interest in graph embedding methods has focused on learning a single representation for each node in the graph. But can nodes really be best described by a single vector representation? In this work, we propose a method for learning multiple representations of the nodes in a graph (e.g., the users of a social network). Based on a principled decomposition of the ego-network, each representation encodes the role of the node in a different local community in which the node participates. These representations allow for improved reconstruction of the nuanced relationships that occur in the graph, a phenomenon that we illustrate through state-of-the-art results on link prediction tasks on a variety of graphs, reducing the error by up to 90%. In addition, we show that these embeddings allow for effective visual analysis of the learned community structure.
Tasks Graph Embedding, Link Prediction, Network Embedding
Published 2019-03-18
URL http://epasto.org/papers/www2019splitter.pdf
PDF http://epasto.org/papers/www2019splitter.pdf
PWC https://paperswithcode.com/paper/splitter-learning-node-representations-that
Repo https://github.com/benedekrozemberczki/Splitter
Framework pytorch
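
The ego-network decomposition can be sketched in three steps:

1. For each node u, take the subgraph induced by its neighbors, excluding u itself.
2. Split that subgraph into clusters.
3. Create one "persona" of u per cluster.

Using connected components as the clustering step is an assumption made here for simplicity. Splitter then learns embeddings for the personas, which this sketch does not cover. The graph is assumed undirected, stored as a symmetric adjacency dict.

```python
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]


def components(nodes: Set[int], graph: Graph) -> List[Set[int]]:
    """Connected components of the subgraph induced by `nodes`."""
    seen: Set[int] = set()
    comps: List[Set[int]] = []
    for start in nodes:
        if start in seen:
            continue
        comp, stack = set(), [start]
        seen.add(start)
        while stack:
            v = stack.pop()
            comp.add(v)
            for w in graph[v] & nodes:  # only follow edges inside the ego-net
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        comps.append(comp)
    return comps


def persona_split(graph: Graph) -> Dict[int, List[Set[int]]]:
    """Map each node to its personas; each persona is a set of neighbors."""
    return {u: components(set(nbrs), graph) for u, nbrs in graph.items()}
```

For example, a node that bridges two otherwise disconnected triangles receives two personas, one for each triangle.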

Attribute-aware non-linear co-embeddings of graph features

Title Attribute-aware non-linear co-embeddings of graph features
Authors Ahmed Rashed, Josif Grabocka, Lars Schmidt-Thieme
Abstract In very sparse recommender datasets, attributes of users such as age, gender, and home location, and attributes of items such as (in the case of movies) genre, release year, and director, can improve recommendation accuracy, especially for users and items that have few ratings. While most recommendation models can be extended to take user and item attributes into account, their architectures usually become more complicated. And while item attributes are often easy to provide, user attributes are often scarce for reasons of privacy or simply because they are not relevant to the operational process at hand. In this paper, we address these two problems for attribute-aware recommender systems by proposing a simple model (GraphRec) that co-embeds users and items into a joint latent space in a similar way to vanilla matrix factorization, but with a non-linear latent feature construction that can seamlessly ingest user attributes, item attributes, or both. To address the second problem, scarce attributes, the proposed model treats the user-item relation as a bipartite graph and constructs generic user and item attributes via the Laplacian of the user-item co-occurrence graph, which requires no external side information beyond the rating matrix itself. In experiments on three recommender datasets, we show that GraphRec significantly outperforms existing state-of-the-art attribute-aware and content-aware recommender systems even without using any side information.
Tasks Recommendation Systems
Published 2019-09-16
URL https://www.ismll.uni-hildesheim.de/pub/pdfs/Ahmed_RecSys19.pdf
PDF https://www.ismll.uni-hildesheim.de/pub/pdfs/Ahmed_RecSys19.pdf
PWC https://paperswithcode.com/paper/attribute-aware-non-linear-co-embeddings-of
Repo https://github.com/ahmedrashed-ml/GraphRec
Framework tf
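
Here is a sketch of the graph-construction step: treat observed ratings as edges of a bipartite user-item graph and build its Laplacian L = D - A. The following choices are assumptions:

- the unnormalized Laplacian;
- binarized ratings (any positive rating becomes an edge);
- node ordering with users first, then items.

The abstract does not say which normalization GraphRec uses or which features it extracts from the Laplacian.

```python
from typing import List

Matrix = List[List[float]]


def bipartite_laplacian(ratings: Matrix) -> Matrix:
    """Laplacian L = D - A of the user-item graph built from a rating matrix.

    ratings[u][i] > 0 means user u rated item i (an edge); 0 means no rating.
    Nodes are ordered users 0..U-1 followed by items U..U+I-1.
    """
    n_users, n_items = len(ratings), len(ratings[0])
    n = n_users + n_items
    adj = [[0.0] * n for _ in range(n)]
    for u in range(n_users):
        for i in range(n_items):
            if ratings[u][i] > 0:
                adj[u][n_users + i] = adj[n_users + i][u] = 1.0  # undirected edge
    lap = [[0.0 - a for a in row] for row in adj]  # off-diagonal: -A
    for v in range(n):
        lap[v][v] = sum(adj[v])  # diagonal: node degree
    return lap
```

Every row of L sums to zero. Its eigenvectors with the smallest nonzero eigenvalues are a common source of structural node features, and would typically be computed with a linear-algebra routine such as `numpy.linalg.eigh`.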

Unsupervised Disentangling of Appearance and Geometry by Deformable Generator Network

Title Unsupervised Disentangling of Appearance and Geometry by Deformable Generator Network
Authors Xianglei Xing, Tian Han, Ruiqi Gao, Song-Chun Zhu, Ying Nian Wu
Abstract We present a deformable generator model to disentangle appearance and geometric information in a purely unsupervised manner. The appearance generator models the appearance-related information of an image, including color, illumination, identity, or category, while the geometric generator performs geometry-related warping, such as rotation and stretching, by generating displacements of the coordinates of each pixel to obtain the final image. The two generators act upon independent latent factors to extract disentangled appearance and geometric information from images. The proposed scheme is general and can be easily integrated into different generative models. An extensive set of qualitative and quantitative experiments shows that appearance and geometric information can be well disentangled, and that the learned geometric generator can be conveniently transferred to other image datasets to facilitate knowledge transfer tasks.
Tasks Transfer Learning
Published 2019-06-01
URL http://openaccess.thecvf.com/content_CVPR_2019/html/Xing_Unsupervised_Disentangling_of_Appearance_and_Geometry_by_Deformable_Generator_Network_CVPR_2019_paper.html
PDF http://openaccess.thecvf.com/content_CVPR_2019/papers/Xing_Unsupervised_Disentangling_of_Appearance_and_Geometry_by_Deformable_Generator_Network_CVPR_2019_paper.pdf
PWC https://paperswithcode.com/paper/unsupervised-disentangling-of-appearance-and
Repo https://github.com/andyxingxl/Deformable-generator
Framework tf
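
The geometric generator's role can be illustrated with a plain warp. Given an appearance image and a per-pixel displacement field (dy, dx), each output pixel samples the input at its displaced coordinates. Nearest-neighbour sampling and border clamping are simplifying assumptions here. A trainable model would normally use a differentiable sampler such as bilinear interpolation.

```python
from typing import List, Tuple

Image = List[List[float]]
Field = List[List[Tuple[float, float]]]


def warp(image: Image, displacement: Field) -> Image:
    """Warp `image` by a per-pixel displacement field (nearest-neighbour sampling).

    Output pixel (y, x) takes the input value at (y + dy, x + dx), with the
    coordinates rounded and clamped to the image border.
    """
    h, w = len(image), len(image[0])
    out: Image = []
    for y in range(h):
        row = []
        for x in range(w):
            dy, dx = displacement[y][x]
            sy = min(h - 1, max(0, round(y + dy)))
            sx = min(w - 1, max(0, round(x + dx)))
            row.append(image[sy][sx])
        out.append(row)
    return out
```

Changing only the displacement field alters the geometry (for example, a shift), while the pixel values themselves come unchanged from the appearance image. This mirrors the separation between the two generators described above.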

In Defense of Pre-Trained ImageNet Architectures for Real-Time Semantic Segmentation of Road-Driving Images

Title In Defense of Pre-Trained ImageNet Architectures for Real-Time Semantic Segmentation of Road-Driving Images
Authors Marin Orsic, Ivan Kreso, Petra Bevandic, Sinisa Segvic
Abstract The recent success of semantic segmentation approaches on demanding road-driving datasets has spurred interest in many related application fields. Many of these applications involve real-time prediction on mobile platforms such as cars, drones, and various kinds of robots. The real-time setup is challenging due to the extraordinary computational complexity involved. Many previous works address the challenge with custom lightweight architectures, which decrease computational complexity by reducing depth, width, and layer capacity with respect to general-purpose architectures. We propose an alternative approach that achieves significantly better performance across a wide range of computing budgets. First, we rely on a lightweight general-purpose architecture as the main recognition engine. Then, we leverage lightweight upsampling with lateral connections as the most cost-effective solution to restore the prediction resolution. Finally, we propose to enlarge the receptive field by fusing shared features at multiple resolutions in a novel fashion. Experiments on several road-driving datasets show a substantial advantage of the proposed approach, either with ImageNet pre-trained parameters or when we learn from scratch. Our Cityscapes test submission, entitled SwiftNetRN-18, delivers 75.5% mIoU and runs at 39.9 Hz on 1024x2048 images on a GTX 1080Ti.
Tasks Real-Time Semantic Segmentation, Semantic Segmentation
Published 2019-06-01
URL http://openaccess.thecvf.com/content_CVPR_2019/html/Orsic_In_Defense_of_Pre-Trained_ImageNet_Architectures_for_Real-Time_Semantic_Segmentation_CVPR_2019_paper.html
PDF http://openaccess.thecvf.com/content_CVPR_2019/papers/Orsic_In_Defense_of_Pre-Trained_ImageNet_Architectures_for_Real-Time_Semantic_Segmentation_CVPR_2019_paper.pdf
PWC https://paperswithcode.com/paper/in-defense-of-pre-trained-imagenet-1
Repo https://github.com/orsic/swiftnet
Framework pytorch
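
The "lightweight upsampling with lateral connections" can be sketched on single-channel feature maps: upsample the coarse map by 2 and add the encoder map of matching resolution. Nearest-neighbour upsampling and plain element-wise addition are assumptions for illustration. The actual SwiftNet modules, such as channel projections and convolutions after the sum, are not reproduced. As a side calculation, 39.9 Hz corresponds to roughly 1000 / 39.9 ≈ 25 ms per frame.

```python
from typing import List

FMap = List[List[float]]


def upsample2x(fmap: FMap) -> FMap:
    """Nearest-neighbour 2x upsampling: each value becomes a 2x2 block."""
    out: FMap = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))  # duplicate the row (as a copy) for the 2x height
    return out


def lateral_merge(coarse: FMap, skip: FMap) -> FMap:
    """Upsample the coarse map and add the same-resolution lateral (skip) map."""
    up = upsample2x(coarse)
    if len(up) != len(skip) or len(up[0]) != len(skip[0]):
        raise ValueError("skip map must have twice the coarse resolution")
    return [[a + b for a, b in zip(ur, sr)] for ur, sr in zip(up, skip)]
```

Repeating this merge once per encoder stage is the usual ladder-style pattern for restoring resolution step by step.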

Light Field Messaging With Deep Photographic Steganography

Title Light Field Messaging With Deep Photographic Steganography
Authors Eric Wengrowski, Kristin Dana
Abstract We develop Light Field Messaging (LFM), a process of embedding, transmitting, and receiving hidden information in video that is displayed on a screen and captured by a handheld camera. The goal of the system is to minimize perceived visual artifacts of the message embedding, while simultaneously maximizing the accuracy of message recovery on the camera side. LFM requires photographic steganography for embedding messages that can be displayed and camera-captured. Unlike digital steganography, the embedding requirements are significantly more challenging due to the combined effect of the screen’s radiometric emittance function, the camera’s sensitivity function, and the camera-display relative geometry. We devise and train a network to jointly learn a deep embedding and recovery algorithm that requires no multi-frame synchronization. A key novel component is the camera display transfer function (CDTF) to model the camera-display pipeline. To learn this CDTF we introduce a dataset (Camera-Display 1M) of 1,000,000 camera-captured images collected from 25 camera-display pairs. The result of this work is a high-performance real-time LFM system using consumer-grade displays and smartphone cameras.
Tasks
Published 2019-06-01
URL http://openaccess.thecvf.com/content_CVPR_2019/html/Wengrowski_Light_Field_Messaging_With_Deep_Photographic_Steganography_CVPR_2019_paper.html
PDF http://openaccess.thecvf.com/content_CVPR_2019/papers/Wengrowski_Light_Field_Messaging_With_Deep_Photographic_Steganography_CVPR_2019_paper.pdf
PWC https://paperswithcode.com/paper/light-field-messaging-with-deep-photographic
Repo https://github.com/mathski/LFM
Framework pytorch

Limitations of Lazy Training of Two-layers Neural Network

Title Limitations of Lazy Training of Two-layers Neural Network
Authors Song Mei, Theodor Misiakiewicz, Behrooz Ghorbani, Andrea Montanari
Abstract We study the supervised learning problem under either of the following two models: (1) feature vectors $x_i$ are $d$-dimensional Gaussian and responses are $y_i = f_*(x_i)$ for $f_*$ an unknown quadratic function; (2) feature vectors $x_i$ are distributed as a mixture of two $d$-dimensional centered Gaussians, and the $y_i$ are the corresponding class labels. We use two-layer neural networks with quadratic activations and compare three different learning regimes: the random features (RF) regime, in which we only train the second-layer weights; the neural tangent (NT) regime, in which we train a linearization of the neural network around its initialization; and the fully trained neural network (NN) regime, in which we train all the weights in the network. We prove that, even for the simple quadratic model of point (1), there is a potentially unbounded gap between the prediction risk achieved in these three training regimes when the number of neurons is smaller than the ambient dimension. When the number of neurons is larger than the number of dimensions, the problem is significantly easier and both NT and NN learning achieve zero risk.
Tasks
Published 2019-12-01
URL http://papers.nips.cc/paper/9111-limitations-of-lazy-training-of-two-layers-neural-network
PDF http://papers.nips.cc/paper/9111-limitations-of-lazy-training-of-two-layers-neural-network.pdf
PWC https://paperswithcode.com/paper/limitations-of-lazy-training-of-two-layers-1
Repo https://github.com/bGhorbani/Lazy-Training-Neural-Nets
Framework tf
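
For quadratic activations, the NT regime has a particularly clean form. Assume a network f(x; W) = Σ_i a_i (w_i · x)² whose second-layer weights a_i are held fixed; the names here are illustrative. Linearizing in the first-layer weights around W0, in direction ε·Δ, gives:

f(x; W0) + ε Σ_i 2 a_i (w_i · x)(Δ_i · x)

Because f is quadratic in W, the exact remainder is ε² Σ_i a_i (Δ_i · x)². The sketch below checks this numerically. In the same notation, the RF regime corresponds to freezing every w_i and fitting only the a_i.

```python
from typing import List, Sequence

Vec = Sequence[float]


def dot(u: Vec, v: Vec) -> float:
    return sum(p * q for p, q in zip(u, v))


def f(x: Vec, W: List[Vec], a: Vec) -> float:
    """Two-layer network with quadratic activation: sum_i a_i * (w_i . x)^2."""
    return sum(ai * dot(wi, x) ** 2 for ai, wi in zip(a, W))


def f_nt(x: Vec, W0: List[Vec], a: Vec, delta: List[Vec], eps: float) -> float:
    """First-order (neural tangent) expansion of f around W0 along eps * delta."""
    grad_term = sum(2 * ai * dot(wi, x) * dot(di, x)
                    for ai, wi, di in zip(a, W0, delta))
    return f(x, W0, a) + eps * grad_term


def shift(W0: List[Vec], delta: List[Vec], eps: float) -> List[List[float]]:
    """First-layer weights after the move W0 + eps * delta."""
    return [[w + eps * d for w, d in zip(wi, di)] for wi, di in zip(W0, delta)]
```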

TAB-VCR: Tags and Attributes based VCR Baselines

Title TAB-VCR: Tags and Attributes based VCR Baselines
Authors Jingxiang Lin, Unnat Jain, Alexander Schwing
Abstract Reasoning is an important ability that we learn from a very early age. Yet, reasoning is extremely hard for algorithms. Despite impressive recent progress on tasks that necessitate reasoning, such as visual question answering and visual dialog, models often exploit biases in datasets. To develop models with better reasoning abilities, the visual commonsense reasoning (VCR) task has recently been introduced. Not only do models have to answer questions, they also have to provide a reason for the given answer. The originally proposed baseline achieved compelling results, leveraging a meticulously designed model composed of LSTM modules and attention nets. Here we show that a much simpler model, obtained by ablating and pruning the existing intricate baseline, can perform better with half the number of trainable parameters. By associating visual features with attribute information and improving text-to-image grounding, we obtain further improvements for our simpler and effective baseline, TAB-VCR. We show that this approach results in 5.3%, 4.4%, and 6.5% absolute improvements over the previous state of the art on question answering, answer justification, and holistic VCR. Webpage: https://deanplayerljx.github.io/tabvcr/
Tasks Question Answering, Visual Commonsense Reasoning, Visual Dialog, Visual Question Answering
Published 2019-12-01
URL http://papers.nips.cc/paper/9693-tab-vcr-tags-and-attributes-based-vcr-baselines
PDF http://papers.nips.cc/paper/9693-tab-vcr-tags-and-attributes-based-vcr-baselines.pdf
PWC https://paperswithcode.com/paper/tab-vcr-tags-and-attributes-based-vcr-1
Repo https://github.com/Deanplayerljx/tab-vcr
Framework pytorch

Learning Erdos-Renyi Random Graphs via Edge Detecting Queries

Title Learning Erdos-Renyi Random Graphs via Edge Detecting Queries
Authors Zihan Li, Matthias Fresacher, Jonathan Scarlett
Abstract In this paper, we consider the problem of learning an unknown graph via queries on groups of nodes, with the result indicating whether or not at least one edge is present among those nodes. While learning arbitrary graphs with $n$ nodes and $k$ edges is known to be hard in the sense of requiring $\Omega(\min\{k^2 \log n, n^2\})$ tests (even when a small probability of error is allowed), we show that learning an Erdős-Rényi random graph with an average of $\bar{k}$ edges is much easier; namely, one can attain asymptotically vanishing error probability with only $O(\bar{k} \log n)$ tests. We establish such bounds for a variety of algorithms inspired by the group testing problem, with explicit constant factors indicating a near-optimal number of tests, and in some cases asymptotic optimality including constant factors. In addition, we present an alternative design that permits a near-optimal sublinear decoding time of $O(\bar{k} \log^2 \bar{k} + \bar{k} \log n)$.
Tasks
Published 2019-12-01
URL http://papers.nips.cc/paper/8332-learning-erdos-renyi-random-graphs-via-edge-detecting-queries
PDF http://papers.nips.cc/paper/8332-learning-erdos-renyi-random-graphs-via-edge-detecting-queries.pdf
PWC https://paperswithcode.com/paper/learning-erdos-renyi-random-graphs-via-edge
Repo https://github.com/scarlett-nus/er_edge_det
Framework none
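
The query model and a simple non-adaptive decoder can be sketched as follows. A test on a node subset S is positive if and only if some edge has both endpoints in S. The decoder shown is the COMP rule from group testing:

- every node pair that appears together in at least one negative test is ruled out;
- every remaining pair is declared an edge.

COMP never misses a true edge, because a true edge inside S always makes the test positive. It is offered as an illustration of the group-testing flavour, not as one of the paper's specific designs. The Bernoulli test design with inclusion probability `q` is likewise a hypothetical choice.

```python
import itertools
import random
from typing import FrozenSet, List, Set

Edge = FrozenSet[int]


def edge_test(subset: Set[int], edges: Set[Edge]) -> bool:
    """Edge-detecting query: True iff at least one edge lies inside `subset`."""
    return any(e <= subset for e in edges)


def comp_decode(n: int, tests: List[Set[int]], outcomes: List[bool]) -> Set[Edge]:
    """COMP decoding: discard every pair that appears together in a negative test."""
    candidates = {frozenset(p) for p in itertools.combinations(range(n), 2)}
    for s, positive in zip(tests, outcomes):
        if not positive:
            candidates -= {frozenset(p) for p in itertools.combinations(sorted(s), 2)}
    return candidates


def random_tests(n: int, num_tests: int, q: float, rng: random.Random) -> List[Set[int]]:
    """Bernoulli design: each node joins each test independently with probability q."""
    return [{v for v in range(n) if rng.random() < q} for _ in range(num_tests)]
```

Testing every pair individually would take n(n-1)/2 queries. The abstract's result is that, for sparse Erdős-Rényi graphs, on the order of k̄ log n pooled tests suffice.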

Domes to Drones: Self-Supervised Active Triangulation for 3D Human Pose Reconstruction

Title Domes to Drones: Self-Supervised Active Triangulation for 3D Human Pose Reconstruction
Authors Aleksis Pirinen, Erik Gärtner, Cristian Sminchisescu
Abstract Existing state-of-the-art estimation systems can detect 2d poses of multiple people in images quite reliably. In contrast, 3d pose estimation from a single image is ill-posed due to occlusion and depth ambiguities. Assuming access to multiple cameras, or given an active system able to position itself to observe the scene from multiple viewpoints, reconstructing 3d pose from 2d measurements becomes well-posed within the framework of standard multi-view geometry. Less clear is what constitutes an informative set of viewpoints for accurate 3d reconstruction, particularly in complex scenes where people are occluded by others or by scene objects. To address the view selection problem in a principled way, we introduce ACTOR, an active triangulation agent for 3d human pose reconstruction. Our fully trainable agent consists of a 2d pose estimation network (any such network can be used) and a deep reinforcement learning-based policy for camera viewpoint selection. The policy predicts observation viewpoints, the number of which varies adaptively depending on scene content, and the associated images are fed to an underlying pose estimator. Importantly, training the policy requires no annotations: given a 2d pose estimator, ACTOR is trained in a self-supervised manner. In extensive evaluations on complex multi-person scenes filmed in a Panoptic dome under multiple viewpoints, we compare our active triangulation agent to strong multi-view baselines and show that ACTOR produces significantly more accurate 3d pose reconstructions. We also provide a proof-of-concept experiment indicating the potential of connecting our view selection policy to a physical drone observer.
Tasks 3D Pose Estimation, 3D Reconstruction, Pose Estimation
Published 2019-12-01
URL http://papers.nips.cc/paper/8646-domes-to-drones-self-supervised-active-triangulation-for-3d-human-pose-reconstruction
PDF http://papers.nips.cc/paper/8646-domes-to-drones-self-supervised-active-triangulation-for-3d-human-pose-reconstruction.pdf
PWC https://paperswithcode.com/paper/domes-to-drones-self-supervised-active
Repo https://github.com/ErikGartner/actor
Framework tf
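
Once two calibrated views of a joint are available, its 3d position can be recovered by triangulation. Below is a minimal sketch of the midpoint method:

- each view contributes a ray (camera center plus viewing direction);
- the estimate is the midpoint of the shortest segment between the two rays.

The midpoint method is chosen here for brevity. It is not necessarily what ACTOR's pipeline uses, and practical systems typically triangulate from more than two views.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(p * q for p, q in zip(u, v))


def triangulate_midpoint(p1: Vec3, d1: Vec3, p2: Vec3, d2: Vec3) -> Vec3:
    """Midpoint of the shortest segment between rays p1 + s*d1 and p2 + t*d2."""
    w0 = [a - b for a, b in zip(p1, p2)]
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w0), _dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are (nearly) parallel; depth is not observable")
    s = (b * e - c * d) / denom  # parameter of the closest point on ray 1
    t = (a * e - b * d) / denom  # parameter of the closest point on ray 2
    q1 = [p + s * u for p, u in zip(p1, d1)]
    q2 = [p + t * u for p, u in zip(p2, d2)]
    return tuple((x + y) / 2 for x, y in zip(q1, q2))
```

The parallel-ray failure case shows why viewpoint choice matters: views with nearly identical directions give poorly conditioned depth estimates.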

Divergence-Augmented Policy Optimization

Title Divergence-Augmented Policy Optimization
Authors Qing Wang, Yingru Li, Jiechao Xiong, Tong Zhang
Abstract In deep reinforcement learning, policy optimization methods need to deal with issues such as function approximation and the reuse of off-policy data. Standard policy gradient methods do not handle off-policy data well, leading to premature convergence and instability. This paper introduces a method to stabilize policy optimization when off-policy data are reused. The idea is to include a Bregman divergence between the behavior policy that generates the data and the current policy to ensure small and safe policy updates with off-policy data. The Bregman divergence is calculated between the state distributions of two policies, instead of only on the action probabilities, leading to a divergence augmentation formulation. Empirical experiments on Atari games show that in the data-scarce scenario where the reuse of off-policy data becomes necessary, our method can achieve better performance than other state-of-the-art deep reinforcement learning algorithms.
Tasks Atari Games, Policy Gradient Methods
Published 2019-12-01
URL http://papers.nips.cc/paper/8842-divergence-augmented-policy-optimization
PDF http://papers.nips.cc/paper/8842-divergence-augmented-policy-optimization.pdf
PWC https://paperswithcode.com/paper/divergence-augmented-policy-optimization
Repo https://github.com/lns/dapo
Framework none
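
The Bregman divergence generated by a convex function F is:

D_F(p, q) = F(p) - F(q) - <∇F(q), p - q>

With F the negative entropy Σ_i p_i log p_i and p, q probability vectors, it reduces to the KL divergence KL(p || q). The sketch below computes D_F for this choice over action probabilities only. DAPO's distinguishing point, measuring the divergence over the state distributions of the two policies, is not reproduced here.

```python
import math
from typing import Sequence


def neg_entropy(p: Sequence[float]) -> float:
    """F(p) = sum_i p_i log p_i, using the convention 0 log 0 = 0."""
    return sum(x * math.log(x) for x in p if x > 0)


def bregman_neg_entropy(p: Sequence[float], q: Sequence[float]) -> float:
    """Bregman divergence D_F(p, q) = F(p) - F(q) - <grad F(q), p - q>.

    grad F(q)_i = log q_i + 1, so every entry of q must be strictly positive.
    """
    grad_q = [math.log(x) + 1.0 for x in q]
    inner = sum(g * (a - b) for g, a, b in zip(grad_q, p, q))
    return neg_entropy(p) - neg_entropy(q) - inner


def kl(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)
```

In a penalized policy update, adding a term proportional to such a divergence between the behavior and current policies discourages large updates when off-policy data are reused.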

Photonic human identification based on deep learning of back scattered laser speckle patterns

Title Photonic human identification based on deep learning of back scattered laser speckle patterns
Authors Zeev Kalyzhner, Or Levitas, Felix Kalichman, Ron Jacobson, Zeev Zalevsky
Abstract The analysis of the dynamics of speckle patterns generated when laser light is back-scattered from tissue has recently been shown to be highly applicable to remote sensing of various biomedical parameters. In this work, we show how the analysis of a single static speckle pattern scattered from a subject's forehead, together with advanced machine learning techniques based on multilayer neural networks, can offer a novel approach to accurate identification within a small predefined number of classes (e.g., a ‘smart home’ setting that restricts its operations to family members only). Processing the static scattered speckle pattern with neural networks enables the extraction of unique features without requiring prior expert knowledge. Using the right model allows very accurate differentiation between the desired categories, and that model can form the basis for using speckle patterns as an identity measure, a ‘forehead-print’.
Tasks
Published 2019-11-25
URL https://www.osapublishing.org/oe/abstract.cfm?uri=oe-27-24-36002
PDF https://www.osapublishing.org/DirectPDFAccess/B480DF87-04B7-7FB5-63D05DE2AF76AACE_423449/oe-27-24-36002.pdf?da=1&id=423449&seq=0&mobile=no
PWC https://paperswithcode.com/paper/photonic-human-identification-based-on-deep
Repo https://github.com/zeevikal/speckles-classification
Framework tf