Paper Group ANR 1401
SPGNet: Semantic Prediction Guidance for Scene Parsing. Program-Guided Image Manipulators. Lunar surface image restoration using U-net based deep neural networks. Clustering of solutions in the symmetric binary perceptron. Harmonizing Maximum Likelihood with GANs for Multimodal Conditional Generation. Generating 3D People in Scenes without People. …
SPGNet: Semantic Prediction Guidance for Scene Parsing
Title | SPGNet: Semantic Prediction Guidance for Scene Parsing |
Authors | Bowen Cheng, Liang-Chieh Chen, Yunchao Wei, Yukun Zhu, Zilong Huang, Jinjun Xiong, Thomas Huang, Wen-Mei Hwu, Honghui Shi |
Abstract | Multi-scale context modules and single-stage encoder-decoder structures are commonly employed for semantic segmentation. A multi-scale context module aggregates feature responses from a large spatial extent, while a single-stage encoder-decoder structure encodes high-level semantic information in the encoder path and recovers boundary information in the decoder path. In contrast, multi-stage encoder-decoder networks have been widely used in human pose estimation and show superior performance to their single-stage counterparts. However, few attempts have been made to bring this effective design to semantic segmentation. In this work, we propose a Semantic Prediction Guidance (SPG) module which learns to re-weight local features through guidance from pixel-wise semantic prediction. We find that by carefully re-weighting features across stages, a two-stage encoder-decoder network coupled with our proposed SPG module can significantly outperform its one-stage counterpart with a similar number of parameters and computations. Finally, we report experimental results on the semantic segmentation benchmark Cityscapes, on which our SPGNet attains 81.1% on the test set using only ‘fine’ annotations. |
Tasks | Pose Estimation, Scene Parsing, Semantic Segmentation |
Published | 2019-08-26 |
URL | https://arxiv.org/abs/1908.09798v1 |
https://arxiv.org/pdf/1908.09798v1.pdf | |
PWC | https://paperswithcode.com/paper/spgnet-semantic-prediction-guidance-for-scene |
Repo | |
Framework | |
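The re-weighting idea in the abstract can be sketched numerically. This is a minimal illustration, not the paper's learned SPG module: it assumes the per-pixel softmax confidence of the semantic prediction is used to gate the features passed between stages, and all names below are hypothetical.

```python
import numpy as np

def spg_reweight(features, semantic_logits):
    """Re-weight local features by pixel-wise semantic prediction confidence.

    features: (C, H, W) feature map; semantic_logits: (K, H, W) class logits.
    The softmax over classes gives a per-pixel confidence map in (0, 1]
    that gates the features forwarded to the next stage.
    """
    e = np.exp(semantic_logits - semantic_logits.max(axis=0, keepdims=True))
    probs = e / e.sum(axis=0, keepdims=True)          # (K, H, W)
    confidence = probs.max(axis=0, keepdims=True)     # (1, H, W)
    return features * confidence                      # broadcast over channels
```

With uniform logits every pixel has confidence 1/K, so the features are scaled down uniformly; a confident prediction passes its features through nearly unchanged.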
Program-Guided Image Manipulators
Title | Program-Guided Image Manipulators |
Authors | Jiayuan Mao, Xiuming Zhang, Yikai Li, William T. Freeman, Joshua B. Tenenbaum, Jiajun Wu |
Abstract | Humans are capable of building holistic representations for images at various levels, from local objects, to pairwise relations, to global structures. The interpretation of structures involves reasoning over repetition and symmetry of the objects in the image. In this paper, we present the Program-Guided Image Manipulator (PG-IM), inducing neuro-symbolic program-like representations to represent and manipulate images. Given an image, PG-IM detects repeated patterns, induces symbolic programs, and manipulates the image using a neural network that is guided by the program. PG-IM learns from a single image, exploiting its internal statistics. Despite being trained only on image inpainting, PG-IM is directly capable of extrapolation and regularity editing in a unified framework. Extensive experiments show that PG-IM achieves superior performance on all the tasks. |
Tasks | Image Inpainting |
Published | 2019-09-04 |
URL | https://arxiv.org/abs/1909.02116v1 |
https://arxiv.org/pdf/1909.02116v1.pdf | |
PWC | https://paperswithcode.com/paper/program-guided-image-manipulators |
Repo | |
Framework | |
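The repeated-pattern detection step can be illustrated with a 1-D toy: find the smallest period under which a sequence repeats. This is only a sketch of the idea; PG-IM's actual detector operates on image regions.

```python
def detect_period(seq):
    """Smallest p such that seq[i] == seq[i % p] for all i — a 1-D toy
    version of the repeated-pattern detection PG-IM performs on images."""
    n = len(seq)
    for p in range(1, n):
        if all(seq[i] == seq[i % p] for i in range(n)):
            return p
    return n  # no shorter period: the sequence is its own pattern
```

Once the period is known, extrapolation amounts to continuing the pattern (`seq[i % p]` for i beyond the end), which mirrors how an induced program drives the manipulation.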
Lunar surface image restoration using U-net based deep neural networks
Title | Lunar surface image restoration using U-net based deep neural networks |
Authors | Hiya Roy, Subhajit Chaudhury, Toshihiko Yamasaki, Danielle DeLatte, Makiko Ohtake, Tatsuaki Hashimoto |
Abstract | Image restoration is a technique that reconstructs a feasible estimate of the original image from a noisy observation. In this paper, we present a U-Net based deep neural network model to restore the missing pixels of lunar surface images in a context-aware fashion, a task often known as the image inpainting problem. We use grayscale images of the lunar surface captured by the Multiband Imager (MI) onboard the Kaguya satellite for our experiments, and the results show that our method can reconstruct the lunar surface image with good visual quality and improved PSNR values. |
Tasks | Image Inpainting, Image Restoration |
Published | 2019-04-14 |
URL | http://arxiv.org/abs/1904.06683v1 |
http://arxiv.org/pdf/1904.06683v1.pdf | |
PWC | https://paperswithcode.com/paper/lunar-surface-image-restoration-using-u-net |
Repo | |
Framework | |
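The setup above reduces to: mask out the missing pixels, let the network fill them, and score the result by PSNR. A minimal sketch of those two pieces (the network itself is omitted; helper names are hypothetical):

```python
import numpy as np

def psnr(reference, restored, max_val=1.0):
    """Peak signal-to-noise ratio in dB, the quality metric the paper reports."""
    mse = np.mean((np.asarray(reference) - np.asarray(restored)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)

def mask_missing_pixels(image, hole_mask):
    """Zero out the pixels the network must inpaint (hole_mask == 1 marks holes)."""
    return np.asarray(image) * (1 - np.asarray(hole_mask))
```

For images in [0, 1], a uniform error of 0.1 per pixel gives a PSNR of exactly 20 dB, which is a useful sanity check when wiring up an evaluation loop.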
Clustering of solutions in the symmetric binary perceptron
Title | Clustering of solutions in the symmetric binary perceptron |
Authors | Carlo Baldassi, Riccardo Della Vecchia, Carlo Lucibello, Riccardo Zecchina |
Abstract | The geometrical features of the (non-convex) loss landscape of neural network models are crucial in ensuring successful optimization and, most importantly, the capability to generalize well. While minimizers’ flatness consistently correlates with good generalization, there has been little rigorous work exploring the conditions for the existence of such minimizers, even in toy models. Here we consider a simple neural network model, the symmetric perceptron, with binary weights. Phrasing the learning problem as a constraint satisfaction problem, the analog of a flat minimizer becomes a large and dense cluster of solutions, while the narrowest minimizers are isolated solutions. We take the first steps toward a rigorous proof of the existence of a dense cluster in certain regimes of the parameters, by computing first- and second-moment upper bounds for the existence of pairs of arbitrarily close solutions. Moreover, we present a non-rigorous derivation of the same bounds for sets of $y$ solutions at fixed pairwise distances. |
Tasks | |
Published | 2019-11-15 |
URL | https://arxiv.org/abs/1911.06756v2 |
https://arxiv.org/pdf/1911.06756v2.pdf | |
PWC | https://paperswithcode.com/paper/clustering-of-solutions-in-the-symmetric |
Repo | |
Framework | |
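The constraint satisfaction problem in question is easy to state concretely. In the symmetric binary perceptron, a weight vector w in {-1, +1}^n is a solution when every stored pattern produces a small symmetric margin. A sketch of the membership check (the helper name is illustrative; the paper studies the geometry of this solution set, not this code):

```python
import numpy as np

def is_solution(w, patterns, K):
    """Symmetric binary perceptron constraint: w in {-1, +1}^n is a
    solution iff |w . x| <= K * sqrt(n) for every stored pattern x."""
    n = len(w)
    margins = np.abs(np.asarray(patterns) @ np.asarray(w)) / np.sqrt(n)
    return bool(np.all(margins <= K))
```

Clusters of solutions are then sets of such w at small pairwise Hamming distance, which is what the moment bounds in the paper count.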
Harmonizing Maximum Likelihood with GANs for Multimodal Conditional Generation
Title | Harmonizing Maximum Likelihood with GANs for Multimodal Conditional Generation |
Authors | Soochan Lee, Junsoo Ha, Gunhee Kim |
Abstract | Recent advances in conditional image generation tasks, such as image-to-image translation and image inpainting, are largely attributed to the success of conditional GAN models, which are often optimized by the joint use of the GAN loss and the reconstruction loss. However, we reveal that this training recipe, shared by almost all existing methods, causes one critical side effect: a lack of diversity in output samples. To achieve both training stability and multimodal output generation, we propose novel training schemes with a new set of losses, named moment reconstruction losses, that simply replace the reconstruction loss. We show that our approach is applicable to any conditional generation task by performing thorough experiments on image-to-image translation, super-resolution, and image inpainting using the Cityscapes and CelebA datasets. Quantitative evaluations also confirm that our methods achieve great diversity in outputs while retaining or even improving the visual fidelity of generated samples. |
Tasks | Conditional Image Generation, Image Generation, Image Inpainting, Image-to-Image Translation, Super-Resolution |
Published | 2019-02-25 |
URL | http://arxiv.org/abs/1902.09225v1 |
http://arxiv.org/pdf/1902.09225v1.pdf | |
PWC | https://paperswithcode.com/paper/harmonizing-maximum-likelihood-with-gans-for |
Repo | |
Framework | |
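The core idea, scoring the target against moments estimated from several generated samples instead of pulling a single output toward the target, can be sketched as a Gaussian negative log-likelihood. This is a simplified illustration; the paper's MR and proxy-MR losses are more involved.

```python
import numpy as np

def moment_reconstruction_loss(samples, target, eps=1e-6):
    """Score the target under a Gaussian whose mean and variance are
    estimated from several generated samples, instead of an L1/L2 loss
    to a single output. Keeping the variance in the objective is what
    leaves room for diverse outputs."""
    samples = np.asarray(samples, dtype=float)
    mu = samples.mean(axis=0)             # first moment of the outputs
    var = samples.var(axis=0) + eps       # second (central) moment
    nll = 0.5 * np.log(2 * np.pi * var) + (target - mu) ** 2 / (2 * var)
    return float(np.mean(nll))
```

A generator whose samples straddle the target (matching its conditional mean) is penalized less than one whose mean is far off, without forcing all samples onto a single mode.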
Generating 3D People in Scenes without People
Title | Generating 3D People in Scenes without People |
Authors | Yan Zhang, Mohamed Hassan, Heiko Neumann, Michael J. Black, Siyu Tang |
Abstract | We present a fully-automatic system that takes a 3D scene and generates plausible 3D human bodies that are posed naturally in that 3D scene. Given a 3D scene without people, humans can easily imagine how people could interact with the scene and the objects in it. However, this is a challenging task for a computer, as solving it requires that (1) the generated human bodies be semantically plausible within the 3D environment, e.g. people sitting on the sofa or cooking near the stove, and (2) the generated human-scene interaction be physically feasible, in the sense that the human body and scene do not interpenetrate while, at the same time, body-scene contact supports physical interaction. To that end, we make use of the surface-based 3D human model SMPL-X. We first train a conditional variational autoencoder to predict semantically plausible 3D human poses conditioned on latent scene representations, then we further refine the generated 3D bodies using scene constraints to enforce feasible physical interaction. We show that our approach is able to synthesize realistic and expressive 3D human bodies that naturally interact with the 3D environment. We perform extensive experiments demonstrating that our generative framework compares favorably with existing methods, both qualitatively and quantitatively. We believe that our scene-conditioned 3D human generation pipeline will be useful for numerous applications, e.g. generating training data for human pose estimation, in video games, and in VR/AR. |
Tasks | Pose Estimation |
Published | 2019-12-05 |
URL | https://arxiv.org/abs/1912.02923v2 |
https://arxiv.org/pdf/1912.02923v2.pdf | |
PWC | https://paperswithcode.com/paper/generating-3d-people-in-scenes-without-people |
Repo | |
Framework | |
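The two physical-feasibility requirements from the abstract can be sketched with a signed distance field (SDF) of the scene sampled at body vertices. This is a hypothetical simplification of the refinement terms, assuming negative SDF values mean a vertex is inside scene geometry:

```python
import numpy as np

def scene_constraint_terms(vertex_sdf):
    """Penetration: total depth of vertices inside geometry (to push the
    body out). Contact: if nothing penetrates, the smallest positive SDF
    value measures how far the body floats above the nearest surface
    (to pull it onto the sofa/floor rather than hovering)."""
    sdf = np.asarray(vertex_sdf, dtype=float)
    penetration = float(np.abs(sdf[sdf < 0]).sum())
    contact = float(sdf.min()) if penetration == 0 else 0.0
    return penetration, contact
```

Minimizing both terms during refinement drives the sampled body toward interpenetration-free, surface-supported poses, which is the role the scene constraints play in the pipeline.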
On Voting Strategies and Emergent Communication
Title | On Voting Strategies and Emergent Communication |
Authors | Shubham Gupta, Ambedkar Dukkipati |
Abstract | Humans use language not only as a referential tool for referring to physical entities but also to collectively execute complex strategies. While existing approaches study the emergence of language in settings where it mainly acts as a referential tool, in this paper we study the role of emergent languages in discovering and implementing strategies in a multi-agent setting. The agents in our setup are connected via a network and are allowed to exchange messages in the form of sequences of discrete symbols. We formulate the problem as a voting game, where two candidate agents are contesting an election and their goal is to convince the population members (other agents) in the network to vote for them by sending them messages. We use neural networks to parameterize the policies followed by agents in the game. We investigate the effect of choosing different training objectives and strategies for agents in the game and make observations about the emergent language in each case. To the best of our knowledge, this is the first work that explores the emergence of language for discovering and implementing strategies in a setting where agents are connected via an underlying network. |
Tasks | |
Published | 2019-02-19 |
URL | http://arxiv.org/abs/1902.06897v1 |
http://arxiv.org/pdf/1902.06897v1.pdf | |
PWC | https://paperswithcode.com/paper/on-voting-strategies-and-emergent |
Repo | |
Framework | |
IENet: Interacting Embranchment One Stage Anchor Free Detector for Orientation Aerial Object Detection
Title | IENet: Interacting Embranchment One Stage Anchor Free Detector for Orientation Aerial Object Detection |
Authors | Youtian Lin, Pengming Feng, Jian Guan |
Abstract | Object detection in aerial images is a challenging task due to the lack of visible features and the varying orientations of objects. Currently, a number of R-CNN-framework-based detectors have made significant progress in predicting targets with horizontal bounding boxes (HBB) and oriented bounding boxes (OBB). However, there is still open space for one-stage anchor-free solutions. This paper proposes a one-stage anchor-free detector for oriented objects in aerial images, built upon a per-pixel-prediction detector. We make this possible by developing a branch-interacting module with a self-attention mechanism to fuse features from the classification and box regression branches. Moreover, a geometric transformation is employed in angle prediction to make it more manageable for the prediction network. We also introduce an IoU loss for OBB detection, which is more efficient than the regular polygon IoU. The proposed method is evaluated on the DOTA and HRSC2016 datasets, and the results show higher OBB detection performance from our proposed IENet when compared with state-of-the-art detectors. |
Tasks | Object Detection, Object Detection In Aerial Images |
Published | 2019-12-02 |
URL | https://arxiv.org/abs/1912.00969v1 |
https://arxiv.org/pdf/1912.00969v1.pdf | |
PWC | https://paperswithcode.com/paper/ienet-interacting-embranchment-one-stage |
Repo | |
Framework | |
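IENet's exact geometric transformation is not reproduced here, but the motivation is common to oriented detectors: raw angle regression has a discontinuity at the wrap-around. One standard transformation with the same goal, shown purely as an illustrative stand-in, is to regress the (sin, cos) pair:

```python
import math

def encode_angle(theta):
    """Regress (sin, cos) instead of the raw angle so the regression
    target is continuous across the +/- pi wrap-around (a hypothetical
    stand-in for the transformation used in IENet's angle branch)."""
    return math.sin(theta), math.cos(theta)

def decode_angle(s, c):
    """Invert the encoding; atan2 recovers the angle in (-pi, pi]."""
    return math.atan2(s, c)
```

The network predicts the continuous pair, and decoding with `atan2` recovers the box orientation without the network ever seeing a discontinuous target.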
Read, Watch, and Move: Reinforcement Learning for Temporally Grounding Natural Language Descriptions in Videos
Title | Read, Watch, and Move: Reinforcement Learning for Temporally Grounding Natural Language Descriptions in Videos |
Authors | Dongliang He, Xiang Zhao, Jizhou Huang, Fu Li, Xiao Liu, Shilei Wen |
Abstract | The task of video grounding, which temporally localizes a natural language description in a video, plays an important role in understanding videos. Existing studies have adopted strategies of sliding a window over the entire video or exhaustively ranking all possible clip-sentence pairs in a pre-segmented video, which inevitably suffer from exhaustively enumerated candidates. To alleviate this problem, we formulate the task as a sequential decision-making problem, learning an agent that progressively regulates the temporal grounding boundaries based on its policy. Specifically, we propose a reinforcement-learning-based framework improved by multi-task learning, which shows steady performance gains when additional supervised boundary information is considered during training. Our proposed framework achieves state-of-the-art performance on the ActivityNet’18 DenseCaption and Charades-STA datasets while observing only 10 or fewer clips per video. |
Tasks | Decision Making, Multi-Task Learning |
Published | 2019-01-21 |
URL | http://arxiv.org/abs/1901.06829v1 |
http://arxiv.org/pdf/1901.06829v1.pdf | |
PWC | https://paperswithcode.com/paper/read-watch-and-move-reinforcement-learning |
Repo | |
Framework | |
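The sequential formulation can be made concrete with a toy: a window of (start, end) boundaries is adjusted step by step to maximize temporal IoU with the described segment. The greedy chooser below is a hypothetical stand-in for the learned policy; in the paper the agent picks actions from its policy network and the reward is not available at test time.

```python
def temporal_iou(a, b):
    """Temporal IoU between two (start, end) segments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = max(a[1], b[1]) - min(a[0], b[0])
    return inter / union if union > 0 else 0.0

def adjust_boundaries(window, target, actions, max_steps=50):
    """At each step apply the boundary action (shift/expand/shrink) that
    most increases temporal IoU with the described segment; stop when no
    action helps. A greedy oracle, standing in for the trained agent."""
    for _ in range(max_steps):
        best = max(actions, key=lambda act: temporal_iou(act(window), target))
        if temporal_iou(best(window), target) <= temporal_iou(window, target):
            break
        window = best(window)
    return window
```

Because each step only nudges the boundaries, the agent inspects far fewer clips than a sliding-window scan, which is the efficiency argument in the abstract.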
Credit Assignment Techniques in Stochastic Computation Graphs
Title | Credit Assignment Techniques in Stochastic Computation Graphs |
Authors | Théophane Weber, Nicolas Heess, Lars Buesing, David Silver |
Abstract | Stochastic computation graphs (SCGs) provide a formalism to represent structured optimization problems arising in artificial intelligence, including supervised, unsupervised, and reinforcement learning. Previous work has shown that an unbiased estimator of the gradient of the expected loss of SCGs can be derived from a single principle. However, this estimator often has high variance and requires a full model evaluation per data point, making this algorithm costly in large graphs. In this work, we address these problems by generalizing concepts from the reinforcement learning literature. We introduce the concepts of value functions, baselines and critics for arbitrary SCGs, and show how to use them to derive lower-variance gradient estimates from partial model evaluations, paving the way towards general and efficient credit assignment for gradient-based optimization. In doing so, we demonstrate how our results unify recent advances in the probabilistic inference and reinforcement learning literature. |
Tasks | |
Published | 2019-01-07 |
URL | http://arxiv.org/abs/1901.01761v1 |
http://arxiv.org/pdf/1901.01761v1.pdf | |
PWC | https://paperswithcode.com/paper/credit-assignment-techniques-in-stochastic |
Repo | |
Framework | |
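The baseline machinery the paper generalizes to arbitrary SCGs has a one-node special case that is easy to demonstrate: a score-function (REINFORCE-style) gradient estimate of a Bernoulli expectation, with a constant baseline subtracted. All names are illustrative.

```python
import random
import statistics

def score_function_grad(theta, loss, n=200, baseline=0.0, seed=0):
    """Score-function estimate of d/dtheta E[loss(x)], x ~ Bernoulli(theta).
    Subtracting a baseline leaves the estimate unbiased (the score has
    zero mean) but can drastically reduce its variance."""
    rng = random.Random(seed)
    grads = []
    for _ in range(n):
        x = 1 if rng.random() < theta else 0
        score = (x - theta) / (theta * (1 - theta))  # d/dtheta log p(x; theta)
        grads.append((loss(x) - baseline) * score)
    return statistics.mean(grads), statistics.pstdev(grads)
```

For loss(x) = x the true gradient is 1; with the optimal baseline (the mean loss, 0.5) every per-sample estimate equals 1 exactly, so the sample variance collapses to zero, which is the effect critics and value functions buy in larger graphs.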
Learning adaptively from the unknown for few-example video person re-ID
Title | Learning adaptively from the unknown for few-example video person re-ID |
Authors | Jian Han |
Abstract | This paper studies one-example and few-example video person re-identification. We propose PAM, a multi-branch network that jointly learns local and global features. PAM has high accuracy, few parameters, and fast convergence, making it suitable for few-example person re-identification. We iteratively estimate labels for unlabeled samples, incorporate them into the training set, and train a more robust network. We propose a static relative distance sampling (SRD) strategy based on the relative distance between classes. Because SRD cannot use all unlabeled samples, we further propose an adaptive relative distance sampling (ARD) strategy. In the one-example setting, we obtain 89.78% and 56.13% rank-1 accuracy on PRID2011 and iLIDS-VID respectively, and 85.16% and 45.36% mAP on DukeMTMC and MARS respectively, exceeding previous methods by a large margin. |
Tasks | Person Re-Identification, Video-Based Person Re-Identification |
Published | 2019-08-25 |
URL | https://arxiv.org/abs/1908.09340v1 |
https://arxiv.org/pdf/1908.09340v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-adaptively-from-the-unknown-for-few |
Repo | |
Framework | |
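The iterative label-estimation loop can be sketched with a relative-distance criterion: an unlabeled feature only adopts a pseudo-label when its nearest class centroid is much closer than the second-nearest. The names and exact criterion below are illustrative; the paper's SRD/ARD strategies differ in detail.

```python
import math

def assign_pseudo_labels(centroids, unlabeled, ratio=0.5):
    """Label an unlabeled feature with its nearest class only when that
    distance is at most `ratio` times the second-nearest, so only
    relatively unambiguous samples enter the next training round."""
    pseudo = {}
    for i, feat in enumerate(unlabeled):
        (d1, label1), (d2, _) = sorted(
            (math.dist(feat, c), label) for label, c in centroids.items()
        )[:2]
        if d1 <= ratio * d2:
            pseudo[i] = label1
    return pseudo
```

Each round, the newly labeled samples are added to the training set, the network is retrained, features and centroids are recomputed, and the criterion is applied again; ARD's adaptation would amount to relaxing `ratio` over rounds so that eventually all samples participate.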
Compatible and Diverse Fashion Image Inpainting
Title | Compatible and Diverse Fashion Image Inpainting |
Authors | Xintong Han, Zuxuan Wu, Weilin Huang, Matthew R. Scott, Larry S. Davis |
Abstract | Visual compatibility is critical for fashion analysis, yet is missing in existing fashion image synthesis systems. In this paper, we propose to explicitly model visual compatibility through fashion image inpainting. To this end, we present Fashion Inpainting Networks (FiNet), a two-stage image-to-image generation framework that is able to perform compatible and diverse inpainting. Disentangling the generation of shape and appearance to ensure photorealistic results, our framework consists of a shape generation network and an appearance generation network. More importantly, for each generation network, we introduce two encoders interacting with one another to learn latent codes in a shared compatibility space. The latent representations are jointly optimized with the corresponding generation network to condition the synthesis process, encouraging a diverse set of generated results that are visually compatible with existing fashion garments. In addition, our framework is readily extended to clothing reconstruction and fashion transfer, with impressive results. Extensive experiments and comparisons with state-of-the-art approaches on the fashion synthesis task quantitatively and qualitatively demonstrate the effectiveness of our method. |
Tasks | Image Generation, Image Inpainting |
Published | 2019-02-04 |
URL | http://arxiv.org/abs/1902.01096v2 |
http://arxiv.org/pdf/1902.01096v2.pdf | |
PWC | https://paperswithcode.com/paper/compatible-and-diverse-fashion-image |
Repo | |
Framework | |
Inference under Information Constraints II: Communication Constraints and Shared Randomness
Title | Inference under Information Constraints II: Communication Constraints and Shared Randomness |
Authors | Jayadev Acharya, Clément L. Canonne, Himanshu Tyagi |
Abstract | A central server needs to perform statistical inference based on samples that are distributed over multiple users who can each send a message of limited length to the center. We study problems of distribution learning and identity testing in this distributed inference setting and examine the role of shared randomness as a resource. We propose a general-purpose simulate-and-infer strategy that uses only private-coin communication protocols and is sample-optimal for distribution learning. This general strategy turns out to be sample-optimal even for distribution testing among private-coin protocols. Interestingly, we propose a public-coin protocol that outperforms simulate-and-infer for distribution testing and is, in fact, sample-optimal. Underlying our public-coin protocol is a random hash that when applied to the samples minimally contracts the chi-squared distance of their distribution to the uniform distribution. |
Tasks | |
Published | 2019-05-20 |
URL | https://arxiv.org/abs/1905.08302v1 |
https://arxiv.org/pdf/1905.08302v1.pdf | |
PWC | https://paperswithcode.com/paper/inference-under-information-constraints-ii |
Repo | |
Framework | |
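The public-coin idea from the abstract can be sketched in a few lines: shared randomness (here, a common seed) lets the server and every user agree on a random subset of the domain, and each user transmits a single bit indicating membership. This is a simplified illustration, not the paper's exact hash construction.

```python
import random

def public_coin_round(samples, domain_size, seed):
    """One round of a public-coin protocol: shared randomness selects a
    random subset S of the domain, and each user sends the one bit
    1[x in S]. The server, which also knows S, can then compare the
    frequency of 1s against |S| / domain_size to test uniformity."""
    rng = random.Random(seed)   # same seed on server and users = shared randomness
    subset = {x for x in range(domain_size) if rng.random() < 0.5}
    bits = [1 if x in subset else 0 for x in samples]
    return bits, subset
```

A private-coin protocol cannot coordinate the subset choice across users, which is the gap that makes the public-coin tester sample-optimal where simulate-and-infer is not.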
Deep Non-Rigid Structure from Motion
Title | Deep Non-Rigid Structure from Motion |
Authors | Chen Kong, Simon Lucey |
Abstract | Current non-rigid structure from motion (NRSfM) algorithms are mainly limited with respect to: (i) the number of images, and (ii) the type of shape variability they can handle. This has hampered the practical utility of NRSfM for many applications within vision. In this paper we propose a novel deep neural network to recover camera poses and 3D points solely from an ensemble of 2D image coordinates. The proposed neural network is mathematically interpretable as a multi-layer block sparse dictionary learning problem, and can handle problems of unprecedented scale and shape complexity. Extensive experiments demonstrate the impressive performance of our approach, which exhibits precision and robustness superior to all available state-of-the-art works by an order of magnitude. We further propose a quality measure (based on the network weights) that circumvents the need for 3D ground truth to ascertain the confidence we have in the reconstruction. |
Tasks | Dictionary Learning |
Published | 2019-07-30 |
URL | https://arxiv.org/abs/1908.00052v2 |
https://arxiv.org/pdf/1908.00052v2.pdf | |
PWC | https://paperswithcode.com/paper/deep-non-rigid-structure-from-motion-1 |
Repo | |
Framework | |
Course Concept Expansion in MOOCs with External Knowledge and Interactive Game
Title | Course Concept Expansion in MOOCs with External Knowledge and Interactive Game |
Authors | Jifan Yu, Chenyu Wang, Gan Luo, Lei Hou, Juanzi Li, Jie Tang, Zhiyuan Liu |
Abstract | As Massive Open Online Courses (MOOCs) become increasingly popular, it is promising to automatically provide extracurricular knowledge for MOOC users. Suffering from semantic drift and a lack of knowledge guidance, existing methods cannot effectively expand course concepts in complex MOOC environments. In this paper, we first build a novel boundary during the search for new concepts via an external knowledge base, and then utilize heterogeneous features to verify the high-quality results. In addition, to involve human effort in our model, we design an interactive optimization mechanism based on a game. Our experiments on four datasets from Coursera and XuetangX show that the proposed method achieves significant improvements (+0.19 MAP) over existing methods. The source code and datasets have been published. |
Tasks | |
Published | 2019-09-17 |
URL | https://arxiv.org/abs/1909.07739v1 |
https://arxiv.org/pdf/1909.07739v1.pdf | |
PWC | https://paperswithcode.com/paper/course-concept-expansion-in-moocs-with-1 |
Repo | |
Framework | |