Paper Group ANR 1401
SPGNet: Semantic Prediction Guidance for Scene Parsing. Program-Guided Image Manipulators. Lunar surface image restoration using U-net based deep neural networks. Clustering of solutions in the symmetric binary perceptron. Harmonizing Maximum Likelihood with GANs for Multimodal Conditional Generation. Generating 3D People in Scenes without People. …
SPGNet: Semantic Prediction Guidance for Scene Parsing
Title | SPGNet: Semantic Prediction Guidance for Scene Parsing |
Authors | Bowen Cheng, Liang-Chieh Chen, Yunchao Wei, Yukun Zhu, Zilong Huang, Jinjun Xiong, Thomas Huang, Wen-Mei Hwu, Honghui Shi |
Abstract | Multi-scale context modules and single-stage encoder-decoder structures are commonly employed for semantic segmentation. A multi-scale context module aggregates feature responses from a large spatial extent, while a single-stage encoder-decoder structure encodes high-level semantic information in the encoder path and recovers boundary information in the decoder path. In contrast, multi-stage encoder-decoder networks have been widely used in human pose estimation and show superior performance to their single-stage counterparts. However, few attempts have been made to bring this effective design to semantic segmentation. In this work, we propose a Semantic Prediction Guidance (SPG) module which learns to re-weight local features through guidance from pixel-wise semantic prediction. We find that by carefully re-weighting features across stages, a two-stage encoder-decoder network coupled with our proposed SPG module can significantly outperform its one-stage counterpart with a similar number of parameters and computations. Finally, we report experimental results on the semantic segmentation benchmark Cityscapes, on which our SPGNet attains 81.1% on the test set using only ‘fine’ annotations. |
Tasks | Pose Estimation, Scene Parsing, Semantic Segmentation |
Published | 2019-08-26 |
URL | https://arxiv.org/abs/1908.09798v1 |
https://arxiv.org/pdf/1908.09798v1.pdf | |
PWC | https://paperswithcode.com/paper/spgnet-semantic-prediction-guidance-for-scene |
Repo | |
Framework | |
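The re-weighting idea in the abstract can be sketched numerically. This is a minimal illustration, not the paper's learned SPG module: it assumes the per-pixel softmax confidence of the semantic prediction is used to gate the features passed between stages, and all names below are hypothetical.

```python
import numpy as np

def spg_reweight(features, semantic_logits):
    """Re-weight local features by pixel-wise semantic prediction confidence.

    features: (C, H, W) feature map; semantic_logits: (K, H, W) class logits.
    The softmax over classes gives a per-pixel confidence map in (0, 1]
    that gates the features forwarded to the next stage.
    """
    e = np.exp(semantic_logits - semantic_logits.max(axis=0, keepdims=True))
    probs = e / e.sum(axis=0, keepdims=True)          # (K, H, W)
    confidence = probs.max(axis=0, keepdims=True)     # (1, H, W)
    return features * confidence                      # broadcast over channels
```

With uniform logits every pixel has confidence 1/K, so the features are scaled down uniformly; a confident prediction passes its features through nearly unchanged.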
Program-Guided Image Manipulators
Title | Program-Guided Image Manipulators |
Authors | Jiayuan Mao, Xiuming Zhang, Yikai Li, William T. Freeman, Joshua B. Tenenbaum, Jiajun Wu |
Abstract | Humans are capable of building holistic representations for images at various levels, from local objects, to pairwise relations, to global structures. The interpretation of structures involves reasoning over repetition and symmetry of the objects in the image. In this paper, we present the Program-Guided Image Manipulator (PG-IM), inducing neuro-symbolic program-like representations to represent and manipulate images. Given an image, PG-IM detects repeated patterns, induces symbolic programs, and manipulates the image using a neural network that is guided by the program. PG-IM learns from a single image, exploiting its internal statistics. Despite being trained only on image inpainting, PG-IM is directly capable of extrapolation and regularity editing in a unified framework. Extensive experiments show that PG-IM achieves superior performance on all the tasks. |
Tasks | Image Inpainting |
Published | 2019-09-04 |
URL | https://arxiv.org/abs/1909.02116v1 |
https://arxiv.org/pdf/1909.02116v1.pdf | |
PWC | https://paperswithcode.com/paper/program-guided-image-manipulators |
Repo | |
Framework | |
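The repeated-pattern detection step can be illustrated with a 1-D toy: find the smallest period under which a sequence repeats. This is only a sketch of the idea; PG-IM's actual detector operates on image regions.

```python
def detect_period(seq):
    """Smallest p such that seq[i] == seq[i % p] for all i — a 1-D toy
    version of the repeated-pattern detection PG-IM performs on images."""
    n = len(seq)
    for p in range(1, n):
        if all(seq[i] == seq[i % p] for i in range(n)):
            return p
    return n  # no shorter period: the sequence is its own pattern
```

Once the period is known, extrapolation amounts to continuing the pattern (`seq[i % p]` for i beyond the end), which mirrors how an induced program drives the manipulation.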
Lunar surface image restoration using U-net based deep neural networks
Title | Lunar surface image restoration using U-net based deep neural networks |
Authors | Hiya Roy, Subhajit Chaudhury, Toshihiko Yamasaki, Danielle DeLatte, Makiko Ohtake, Tatsuaki Hashimoto |
Abstract | Image restoration is a technique that reconstructs a feasible estimate of the original image from a noisy observation. In this paper, we present a U-Net based deep neural network model to restore the missing pixels of lunar surface images in a context-aware fashion, a task often known as the image inpainting problem. We use grayscale images of the lunar surface captured by the Multiband Imager (MI) onboard the Kaguya satellite for our experiments, and the results show that our method can reconstruct the lunar surface image with good visual quality and improved PSNR values. |
Tasks | Image Inpainting, Image Restoration |
Published | 2019-04-14 |
URL | http://arxiv.org/abs/1904.06683v1 |
http://arxiv.org/pdf/1904.06683v1.pdf | |
PWC | https://paperswithcode.com/paper/lunar-surface-image-restoration-using-u-net |
Repo | |
Framework | |
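The setup above reduces to: mask out the missing pixels, let the network fill them, and score the result by PSNR. A minimal sketch of those two pieces (the network itself is omitted; helper names are hypothetical):

```python
import numpy as np

def psnr(reference, restored, max_val=1.0):
    """Peak signal-to-noise ratio in dB, the quality metric the paper reports."""
    mse = np.mean((np.asarray(reference) - np.asarray(restored)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)

def mask_missing_pixels(image, hole_mask):
    """Zero out the pixels the network must inpaint (hole_mask == 1 marks holes)."""
    return np.asarray(image) * (1 - np.asarray(hole_mask))
```

For images in [0, 1], a uniform error of 0.1 per pixel gives a PSNR of exactly 20 dB, which is a useful sanity check when wiring up an evaluation loop.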
Clustering of solutions in the symmetric binary perceptron
Title | Clustering of solutions in the symmetric binary perceptron |
Authors | Carlo Baldassi, Riccardo Della Vecchia, Carlo Lucibello, Riccardo Zecchina |
Abstract | The geometrical features of the (non-convex) loss landscape of neural network models are crucial in ensuring successful optimization and, most importantly, the capability to generalize well. While minimizers’ flatness consistently correlates with good generalization, there has been little rigorous work exploring the conditions for the existence of such minimizers, even in toy models. Here we consider a simple neural network model, the symmetric perceptron, with binary weights. Phrasing the learning problem as a constraint satisfaction problem, the analog of a flat minimizer becomes a large and dense cluster of solutions, while the narrowest minimizers are isolated solutions. We take the first steps toward a rigorous proof of the existence of a dense cluster in certain regimes of the parameters, by computing first- and second-moment upper bounds for the existence of pairs of arbitrarily close solutions. Moreover, we present a non-rigorous derivation of the same bounds for sets of $y$ solutions at fixed pairwise distances. |
Tasks | |
Published | 2019-11-15 |
URL | https://arxiv.org/abs/1911.06756v2 |
https://arxiv.org/pdf/1911.06756v2.pdf | |
PWC | https://paperswithcode.com/paper/clustering-of-solutions-in-the-symmetric |
Repo | |
Framework | |
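The constraint satisfaction problem in question is easy to state concretely. In the symmetric binary perceptron, a weight vector w in {-1, +1}^n is a solution when every stored pattern produces a small symmetric margin. A sketch of the membership check (the helper name is illustrative; the paper studies the geometry of this solution set, not this code):

```python
import numpy as np

def is_solution(w, patterns, K):
    """Symmetric binary perceptron constraint: w in {-1, +1}^n is a
    solution iff |w . x| <= K * sqrt(n) for every stored pattern x."""
    n = len(w)
    margins = np.abs(np.asarray(patterns) @ np.asarray(w)) / np.sqrt(n)
    return bool(np.all(margins <= K))
```

Clusters of solutions are then sets of such w at small pairwise Hamming distance, which is what the moment bounds in the paper count.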
Harmonizing Maximum Likelihood with GANs for Multimodal Conditional Generation
Title | Harmonizing Maximum Likelihood with GANs for Multimodal Conditional Generation |
Authors | Soochan Lee, Junsoo Ha, Gunhee Kim |
Abstract | Recent advances in conditional image generation tasks, such as image-to-image translation and image inpainting, are largely attributed to the success of conditional GAN models, which are often optimized by the joint use of the GAN loss and the reconstruction loss. However, we reveal that this training recipe, shared by almost all existing methods, causes one critical side effect: a lack of diversity in output samples. To achieve both training stability and multimodal output generation, we propose novel training schemes with a new set of losses, named moment reconstruction losses, that simply replace the reconstruction loss. We show that our approach is applicable to any conditional generation task by performing thorough experiments on image-to-image translation, super-resolution, and image inpainting using the Cityscapes and CelebA datasets. Quantitative evaluations also confirm that our methods achieve great diversity in outputs while retaining or even improving the visual fidelity of generated samples. |
Tasks | Conditional Image Generation, Image Generation, Image Inpainting, Image-to-Image Translation, Super-Resolution |
Published | 2019-02-25 |
URL | http://arxiv.org/abs/1902.09225v1 |
http://arxiv.org/pdf/1902.09225v1.pdf | |
PWC | https://paperswithcode.com/paper/harmonizing-maximum-likelihood-with-gans-for |
Repo | |
Framework | |
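The core idea, scoring the target against moments estimated from several generated samples instead of pulling a single output toward the target, can be sketched as a Gaussian negative log-likelihood. This is a simplified illustration; the paper's MR and proxy-MR losses are more involved.

```python
import numpy as np

def moment_reconstruction_loss(samples, target, eps=1e-6):
    """Score the target under a Gaussian whose mean and variance are
    estimated from several generated samples, instead of an L1/L2 loss
    to a single output. Keeping the variance in the objective is what
    leaves room for diverse outputs."""
    samples = np.asarray(samples, dtype=float)
    mu = samples.mean(axis=0)             # first moment of the outputs
    var = samples.var(axis=0) + eps       # second (central) moment
    nll = 0.5 * np.log(2 * np.pi * var) + (target - mu) ** 2 / (2 * var)
    return float(np.mean(nll))
```

A generator whose samples straddle the target (matching its conditional mean) is penalized less than one whose mean is far off, without forcing all samples onto a single mode.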
Generating 3D People in Scenes without People
Title | Generating 3D People in Scenes without People |
Authors | Yan Zhang, Mohamed Hassan, Heiko Neumann, Michael J. Black, Siyu Tang |
Abstract | We present a fully-automatic system that takes a 3D scene and generates plausible 3D human bodies that are posed naturally in that 3D scene. Given a 3D scene without people, humans can easily imagine how people could interact with the scene and the objects in it. However, this is a challenging task for a computer, as solving it requires that (1) the generated human bodies be semantically plausible within the 3D environment, e.g. people sitting on the sofa or cooking near the stove, and (2) the generated human-scene interaction be physically feasible, in the sense that the human body and scene do not interpenetrate while, at the same time, body-scene contact supports physical interaction. To that end, we make use of the surface-based 3D human model SMPL-X. We first train a conditional variational autoencoder to predict semantically plausible 3D human poses conditioned on latent scene representations, then we further refine the generated 3D bodies using scene constraints to enforce feasible physical interaction. We show that our approach is able to synthesize realistic and expressive 3D human bodies that naturally interact with the 3D environment. We perform extensive experiments demonstrating that our generative framework compares favorably with existing methods, both qualitatively and quantitatively. We believe that our scene-conditioned 3D human generation pipeline will be useful for numerous applications, e.g. generating training data for human pose estimation, in video games, and in VR/AR. |
Tasks | Pose Estimation |
Published | 2019-12-05 |
URL | https://arxiv.org/abs/1912.02923v2 |
https://arxiv.org/pdf/1912.02923v2.pdf | |
PWC | https://paperswithcode.com/paper/generating-3d-people-in-scenes-without-people |
Repo | |
Framework | |
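The two physical-feasibility requirements from the abstract can be sketched with a signed distance field (SDF) of the scene sampled at body vertices. This is a hypothetical simplification of the refinement terms, assuming negative SDF values mean a vertex is inside scene geometry:

```python
import numpy as np

def scene_constraint_terms(vertex_sdf):
    """Penetration: total depth of vertices inside geometry (to push the
    body out). Contact: if nothing penetrates, the smallest positive SDF
    value measures how far the body floats above the nearest surface
    (to pull it onto the sofa/floor rather than hovering)."""
    sdf = np.asarray(vertex_sdf, dtype=float)
    penetration = float(np.abs(sdf[sdf < 0]).sum())
    contact = float(sdf.min()) if penetration == 0 else 0.0
    return penetration, contact
```

Minimizing both terms during refinement drives the sampled body toward interpenetration-free, surface-supported poses, which is the role the scene constraints play in the pipeline.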
On Voting Strategies and Emergent Communication
Title | On Voting Strategies and Emergent Communication |
Authors | Shubham Gupta, Ambedkar Dukkipati |
Abstract | Humans use language not only as a referential tool for referring to physical entities but also to collectively execute complex strategies. While existing approaches study the emergence of language in settings where it mainly acts as a referential tool, in this paper we study the role of emergent languages in discovering and implementing strategies in a multi-agent setting. The agents in our setup are connected via a network and are allowed to exchange messages in the form of sequences of discrete symbols. We formulate the problem as a voting game, where two candidate agents are contesting an election and their goal is to convince the population members (other agents) in the network to vote for them by sending them messages. We use neural networks to parameterize the policies followed by agents in the game. We investigate the effect of choosing different training objectives and strategies for agents in the game and make observations about the emergent language in each case. To the best of our knowledge, this is the first work that explores the emergence of language for discovering and implementing strategies in a setting where agents are connected via an underlying network. |
Tasks | |
Published | 2019-02-19 |
URL | http://arxiv.org/abs/1902.06897v1 |
http://arxiv.org/pdf/1902.06897v1.pdf | |
PWC | https://paperswithcode.com/paper/on-voting-strategies-and-emergent |
Repo | |
Framework | |
IENet: Interacting Embranchment One Stage Anchor Free Detector for Orientation Aerial Object Detection
Title | IENet: Interacting Embranchment One Stage Anchor Free Detector for Orientation Aerial Object Detection |
Authors | Youtian Lin, Pengming Feng, Jian Guan |
Abstract | Object detection in aerial images is a challenging task due to the lack of visible features and the varying orientations of objects. Currently, a number of R-CNN-framework-based detectors have made significant progress in predicting targets with horizontal bounding boxes (HBB) and oriented bounding boxes (OBB). However, there is still open space for one-stage anchor-free solutions. This paper proposes a one-stage anchor-free detector for oriented objects in aerial images, built upon a per-pixel-prediction detector. We make this possible by developing a branch-interacting module with a self-attention mechanism to fuse features from the classification and box regression branches. Moreover, a geometric transformation is employed in angle prediction to make it more manageable for the prediction network. We also introduce an IoU loss for OBB detection, which is more efficient than the regular polygon IoU. The proposed method is evaluated on the DOTA and HRSC2016 datasets, and the results show higher OBB detection performance from our proposed IENet when compared with state-of-the-art detectors. |
Tasks | Object Detection, Object Detection In Aerial Images |
Published | 2019-12-02 |
URL | https://arxiv.org/abs/1912.00969v1 |
https://arxiv.org/pdf/1912.00969v1.pdf | |
PWC | https://paperswithcode.com/paper/ienet-interacting-embranchment-one-stage |
Repo | |
Framework | |
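IENet's exact geometric transformation is not reproduced here, but the motivation is common to oriented detectors: raw angle regression has a discontinuity at the wrap-around. One standard transformation with the same goal, shown purely as an illustrative stand-in, is to regress the (sin, cos) pair:

```python
import math

def encode_angle(theta):
    """Regress (sin, cos) instead of the raw angle so the regression
    target is continuous across the +/- pi wrap-around (a hypothetical
    stand-in for the transformation used in IENet's angle branch)."""
    return math.sin(theta), math.cos(theta)

def decode_angle(s, c):
    """Invert the encoding; atan2 recovers the angle in (-pi, pi]."""
    return math.atan2(s, c)
```

The network predicts the continuous pair, and decoding with `atan2` recovers the box orientation without the network ever seeing a discontinuous target.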
Read, Watch, and Move: Reinforcement Learning for Temporally Grounding Natural Language Descriptions in Videos
Title | Read, Watch, and Move: Reinforcement Learning for Temporally Grounding Natural Language Descriptions in Videos |
Authors | Dongliang He, Xiang Zhao, Jizhou Huang, Fu Li, Xiao Liu, Shilei Wen |
Abstract | The task of video grounding, which temporally localizes a natural language description in a video, plays an important role in understanding videos. Existing studies have adopted strategies of sliding a window over the entire video or exhaustively ranking all possible clip-sentence pairs in a pre-segmented video, which inevitably suffer from exhaustively enumerated candidates. To alleviate this problem, we formulate the task as a sequential decision-making problem, learning an agent that progressively regulates the temporal grounding boundaries based on its policy. Specifically, we propose a reinforcement-learning-based framework improved by multi-task learning, which shows steady performance gains when additional supervised boundary information is considered during training. Our proposed framework achieves state-of-the-art performance on the ActivityNet’18 DenseCaption and Charades-STA datasets while observing only 10 or fewer clips per video. |
Tasks | Decision Making, Multi-Task Learning |
Published | 2019-01-21 |
URL | http://arxiv.org/abs/1901.06829v1 |
http://arxiv.org/pdf/1901.06829v1.pdf | |
PWC | https://paperswithcode.com/paper/read-watch-and-move-reinforcement-learning |
Repo | |
Framework | |
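The sequential formulation can be made concrete with a toy: a window of (start, end) boundaries is adjusted step by step to maximize temporal IoU with the described segment. The greedy chooser below is a hypothetical stand-in for the learned policy; in the paper the agent picks actions from its policy network and the reward is not available at test time.

```python
def temporal_iou(a, b):
    """Temporal IoU between two (start, end) segments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = max(a[1], b[1]) - min(a[0], b[0])
    return inter / union if union > 0 else 0.0

def adjust_boundaries(window, target, actions, max_steps=50):
    """At each step apply the boundary action (shift/expand/shrink) that
    most increases temporal IoU with the described segment; stop when no
    action helps. A greedy oracle, standing in for the trained agent."""
    for _ in range(max_steps):
        best = max(actions, key=lambda act: temporal_iou(act(window), target))
        if temporal_iou(best(window), target) <= temporal_iou(window, target):
            break
        window = best(window)
    return window
```

Because each step only nudges the boundaries, the agent inspects far fewer clips than a sliding-window scan, which is the efficiency argument in the abstract.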
Credit Assignment Techniques in Stochastic Computation Graphs
Title | Credit Assignment Techniques in Stochastic Computation Graphs |
Authors | Théophane Weber, Nicolas Heess, Lars Buesing, David Silver |
Abstract | Stochastic computation graphs (SCGs) provide a formalism to represent structured optimization problems arising in artificial intelligence, including supervised, unsupervised, and reinforcement learning. Previous work has shown that an unbiased estimator of the gradient of the expected loss of SCGs can be derived from a single principle. However, this estimator often has high variance and requires a full model evaluation per data point, making this algorithm costly in large graphs. In this work, we address these problems by generalizing concepts from the reinforcement learning literature. We introduce the concepts of value functions, baselines and critics for arbitrary SCGs, and show how to use them to derive lower-variance gradient estimates from partial model evaluations, paving the way towards general and efficient credit assignment for gradient-based optimization. In doing so, we demonstrate how our results unify recent advances in the probabilistic inference and reinforcement learning literature. |
Tasks | |
Published | 2019-01-07 |
URL | http://arxiv.org/abs/1901.01761v1 |
http://arxiv.org/pdf/1901.01761v1.pdf | |
PWC | https://paperswithcode.com/paper/credit-assignment-techniques-in-stochastic |
Repo | |
Framework | |
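The baseline machinery the paper generalizes to arbitrary SCGs has a one-node special case that is easy to demonstrate: a score-function (REINFORCE-style) gradient estimate of a Bernoulli expectation, with a constant baseline subtracted. All names are illustrative.

```python
import random
import statistics

def score_function_grad(theta, loss, n=200, baseline=0.0, seed=0):
    """Score-function estimate of d/dtheta E[loss(x)], x ~ Bernoulli(theta).
    Subtracting a baseline leaves the estimate unbiased (the score has
    zero mean) but can drastically reduce its variance."""
    rng = random.Random(seed)
    grads = []
    for _ in range(n):
        x = 1 if rng.random() < theta else 0
        score = (x - theta) / (theta * (1 - theta))  # d/dtheta log p(x; theta)
        grads.append((loss(x) - baseline) * score)
    return statistics.mean(grads), statistics.pstdev(grads)
```

For loss(x) = x the true gradient is 1; with the optimal baseline (the mean loss, 0.5) every per-sample estimate equals 1 exactly, so the sample variance collapses to zero, which is the effect critics and value functions buy in larger graphs.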
Learning adaptively from the unknown for few-example video person re-ID
Title | Learning adaptively from the unknown for few-example video person re-ID |
Authors | Jian Han |
Abstract | This paper studies one-example and few-example video person re-identification. We propose PAM, a multi-branch network that jointly learns local and global features. PAM has high accuracy, few parameters, and fast convergence, making it suitable for few-example person re-identification. We iteratively estimate labels for unlabeled samples, incorporate them into the training set, and train a more robust network. We propose a static relative distance sampling (SRD) strategy based on the relative distance between classes. Because SRD cannot use all unlabeled samples, we further propose an adaptive relative distance sampling (ARD) strategy. In the one-example setting, we obtain 89.78% and 56.13% rank-1 accuracy on PRID2011 and iLIDS-VID respectively, and 85.16% and 45.36% mAP on DukeMTMC and MARS respectively, exceeding previous methods by a large margin. |
Tasks | Person Re-Identification, Video-Based Person Re-Identification |
Published | 2019-08-25 |
URL | https://arxiv.org/abs/1908.09340v1 |
https://arxiv.org/pdf/1908.09340v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-adaptively-from-the-unknown-for-few |
Repo | |
Framework | |
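The iterative label-estimation loop can be sketched with a relative-distance criterion: an unlabeled feature only adopts a pseudo-label when its nearest class centroid is much closer than the second-nearest. The names and exact criterion below are illustrative; the paper's SRD/ARD strategies differ in detail.

```python
import math

def assign_pseudo_labels(centroids, unlabeled, ratio=0.5):
    """Label an unlabeled feature with its nearest class only when that
    distance is at most `ratio` times the second-nearest, so only
    relatively unambiguous samples enter the next training round."""
    pseudo = {}
    for i, feat in enumerate(unlabeled):
        (d1, label1), (d2, _) = sorted(
            (math.dist(feat, c), label) for label, c in centroids.items()
        )[:2]
        if d1 <= ratio * d2:
            pseudo[i] = label1
    return pseudo
```

Each round, the newly labeled samples are added to the training set, the network is retrained, features and centroids are recomputed, and the criterion is applied again; ARD's adaptation would amount to relaxing `ratio` over rounds so that eventually all samples participate.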
Compatible and Diverse Fashion Image Inpainting
Title | Compatible and Diverse Fashion Image Inpainting |
Authors | Xintong Han, Zuxuan Wu, Weilin Huang, Matthew R. Scott, Larry S. Davis |
Abstract | Visual compatibility is critical for fashion analysis, yet is missing in existing fashion image synthesis systems. In this paper, we propose to explicitly model visual compatibility through fashion image inpainting. To this end, we present Fashion Inpainting Networks (FiNet), a two-stage image-to-image generation framework that is able to perform compatible and diverse inpainting. Disentangling the generation of shape and appearance to ensure photorealistic results, our framework consists of a shape generation network and an appearance generation network. More importantly, for each generation network, we introduce two encoders interacting with one another to learn latent codes in a shared compatibility space. The latent representations are jointly optimized with the corresponding generation network to condition the synthesis process, encouraging a diverse set of generated results that are visually compatible with existing fashion garments. In addition, our framework is readily extended to clothing reconstruction and fashion transfer, with impressive results. Extensive experiments and comparisons with state-of-the-art approaches on the fashion synthesis task quantitatively and qualitatively demonstrate the effectiveness of our method. |
Tasks | Image Generation, Image Inpainting |
Published | 2019-02-04 |
URL | http://arxiv.org/abs/1902.01096v2 |
http://arxiv.org/pdf/1902.01096v2.pdf | |
PWC | https://paperswithcode.com/paper/compatible-and-diverse-fashion-image |
Repo | |
Framework | |
Inference under Information Constraints II: Communication Constraints and Shared Randomness
Title | Inference under Information Constraints II: Communication Constraints and Shared Randomness |
Authors | Jayadev Acharya, Clément L. Canonne, Himanshu Tyagi |
Abstract | A central server needs to perform statistical inference based on samples that are distributed over multiple users who can each send a message of limited length to the center. We study problems of distribution learning and identity testing in this distributed inference setting and examine the role of shared randomness as a resource. We propose a general-purpose simulate-and-infer strategy that uses only private-coin communication protocols and is sample-optimal for distribution learning. This general strategy turns out to be sample-optimal even for distribution testing among private-coin protocols. Interestingly, we propose a public-coin protocol that outperforms simulate-and-infer for distribution testing and is, in fact, sample-optimal. Underlying our public-coin protocol is a random hash that when applied to the samples minimally contracts the chi-squared distance of their distribution to the uniform distribution. |
Tasks | |
Published | 2019-05-20 |
URL | https://arxiv.org/abs/1905.08302v1 |
https://arxiv.org/pdf/1905.08302v1.pdf | |
PWC | https://paperswithcode.com/paper/inference-under-information-constraints-ii |
Repo | |
Framework | |
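The public-coin idea from the abstract can be sketched in a few lines: shared randomness (here, a common seed) lets the server and every user agree on a random subset of the domain, and each user transmits a single bit indicating membership. This is a simplified illustration, not the paper's exact hash construction.

```python
import random

def public_coin_round(samples, domain_size, seed):
    """One round of a public-coin protocol: shared randomness selects a
    random subset S of the domain, and each user sends the one bit
    1[x in S]. The server, which also knows S, can then compare the
    frequency of 1s against |S| / domain_size to test uniformity."""
    rng = random.Random(seed)   # same seed on server and users = shared randomness
    subset = {x for x in range(domain_size) if rng.random() < 0.5}
    bits = [1 if x in subset else 0 for x in samples]
    return bits, subset
```

A private-coin protocol cannot coordinate the subset choice across users, which is the gap that makes the public-coin tester sample-optimal where simulate-and-infer is not.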
Deep Non-Rigid Structure from Motion
Title | Deep Non-Rigid Structure from Motion |
Authors | Chen Kong, Simon Lucey |
Abstract | Current non-rigid structure from motion (NRSfM) algorithms are mainly limited with respect to: (i) the number of images, and (ii) the type of shape variability they can handle. This has hampered the practical utility of NRSfM for many applications within vision. In this paper we propose a novel deep neural network to recover camera poses and 3D points solely from an ensemble of 2D image coordinates. The proposed neural network is mathematically interpretable as a multi-layer block sparse dictionary learning problem, and can handle problems of unprecedented scale and shape complexity. Extensive experiments demonstrate the impressive performance of our approach, which exhibits precision and robustness superior to all available state-of-the-art works by an order of magnitude. We further propose a quality measure (based on the network weights) that circumvents the need for 3D ground truth to ascertain the confidence we have in the reconstruction. |
Tasks | Dictionary Learning |
Published | 2019-07-30 |
URL | https://arxiv.org/abs/1908.00052v2 |
https://arxiv.org/pdf/1908.00052v2.pdf | |
PWC | https://paperswithcode.com/paper/deep-non-rigid-structure-from-motion-1 |
Repo | |
Framework | |
Course Concept Expansion in MOOCs with External Knowledge and Interactive Game
Title | Course Concept Expansion in MOOCs with External Knowledge and Interactive Game |
Authors | Jifan Yu, Chenyu Wang, Gan Luo, Lei Hou, Juanzi Li, Jie Tang, Zhiyuan Liu |
Abstract | As Massive Open Online Courses (MOOCs) become increasingly popular, it is promising to automatically provide extracurricular knowledge for MOOC users. Suffering from semantic drift and a lack of knowledge guidance, existing methods cannot effectively expand course concepts in complex MOOC environments. In this paper, we first build a novel boundary during the search for new concepts via an external knowledge base, and then utilize heterogeneous features to verify the high-quality results. In addition, to involve human effort in our model, we design an interactive optimization mechanism based on a game. Our experiments on four datasets from Coursera and XuetangX show that the proposed method achieves significant improvements (+0.19 MAP) over existing methods. The source code and datasets have been published. |
Tasks | |
Published | 2019-09-17 |
URL | https://arxiv.org/abs/1909.07739v1 |
https://arxiv.org/pdf/1909.07739v1.pdf | |
PWC | https://paperswithcode.com/paper/course-concept-expansion-in-moocs-with-1 |
Repo | |
Framework | |