Paper Group ANR 464
On Demand Solid Texture Synthesis Using Deep 3D Networks
Title | On Demand Solid Texture Synthesis Using Deep 3D Networks |
Authors | Jorge Gutierrez, Julien Rabin, Bruno Galerne, Thomas Hurtut |
Abstract | This paper describes a novel approach for on-demand volumetric texture synthesis based on a deep learning framework that allows for the generation of high-quality 3D data at interactive rates. Based on a few example images of textures, a generative network is trained to synthesize coherent portions of solid textures of arbitrary sizes that reproduce the visual characteristics of the examples along some directions. To cope with the memory limitations and computational complexity that are inherent to both high-resolution and 3D processing on the GPU, only 2D textures referred to as “slices” are generated during the training stage. These synthetic textures are compared to exemplar images via a perceptual loss function based on a pre-trained deep network. The proposed network is very light (less than 100k parameters); it therefore requires only modest training time (a few hours) and is capable of very fast generation (around a second for $256^3$ voxels) on a single GPU. Integrated with a spatially seeded PRNG, the proposed generator network directly returns an RGB value given a set of 3D coordinates. The synthesized volumes have good visual results that are at least equivalent to state-of-the-art patch-based approaches. They are naturally seamlessly tileable and can be fully generated in parallel. |
Tasks | Texture Synthesis |
Published | 2020-01-13 |
URL | https://arxiv.org/abs/2001.04528v1 |
https://arxiv.org/pdf/2001.04528v1.pdf | |
PWC | https://paperswithcode.com/paper/on-demand-solid-texture-synthesis-using-deep |
Repo | |
Framework | |
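The abstract above describes a coordinate-seeded generator trained only on 2D slices. The sketch below is a minimal, hedged illustration of that idea, not the authors' code: a tiny 3D convolutional generator maps deterministically seeded noise to RGB voxels, and training would compare axis-aligned slices of the output to the exemplar with a perceptual (Gram-matrix) loss. The network sizes, the seeding scheme, and the loss backbone are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SolidTextureGenerator(nn.Module):
    """Tiny 3D conv net: seeded noise volume -> RGB solid texture (illustrative)."""
    def __init__(self, noise_ch=8, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(noise_ch, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv3d(hidden, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv3d(hidden, 3, 1), nn.Sigmoid(),
        )

    def forward(self, noise):
        return self.net(noise)

def seeded_noise(block_origin, size, noise_ch=8, seed=0):
    """Deterministic noise for a block of 3D coordinates, so any sub-volume
    can be regenerated independently (the on-demand, tileable part)."""
    g = torch.Generator().manual_seed(hash((seed, block_origin)) % (2**31))
    return torch.randn(1, noise_ch, *size, generator=g)

gen = SolidTextureGenerator()
block = gen(seeded_noise((0, 0, 0), (32, 32, 32)))   # (1, 3, 32, 32, 32)

# During training, only 2D slices of the volume are compared to the exemplar:
slice_xy = block[:, :, 16, :, :]                      # (1, 3, 32, 32)
# perceptual_loss(slice_xy, exemplar) would use Gram matrices of a pre-trained CNN.
```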
Guidance and Evaluation: Semantic-Aware Image Inpainting for Mixed Scenes
Title | Guidance and Evaluation: Semantic-Aware Image Inpainting for Mixed Scenes |
Authors | Liang Liao, Jing Xiao, Zheng Wang, Chia-wen Lin, Shin’ichi Satoh |
Abstract | Completing a corrupted image with correct structures and reasonable textures for a mixed scene remains an elusive challenge. Since the missing hole in a mixed scene of a corrupted image often contains various semantic information, conventional two-stage approaches utilizing structural information often lead to unreliable structural prediction and ambiguous image texture generation. In this paper, we propose a Semantic Guidance and Evaluation Network (SGE-Net) to iteratively update the structural priors and the inpainted image in an interplay framework of semantics extraction and image inpainting. It utilizes a semantic segmentation map as guidance at each scale of inpainting, under which location-dependent inferences are re-evaluated and, accordingly, poorly inferred regions are refined in subsequent scales. Extensive experiments on real-world images of mixed scenes demonstrate the superiority of our proposed method over state-of-the-art approaches in terms of clear boundaries and photo-realistic textures. |
Tasks | Image Inpainting, Semantic Segmentation, Texture Synthesis |
Published | 2020-03-15 |
URL | https://arxiv.org/abs/2003.06877v2 |
https://arxiv.org/pdf/2003.06877v2.pdf | |
PWC | https://paperswithcode.com/paper/guidance-and-evaluation-semantic-aware-image |
Repo | |
Framework | |
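To make the iterative guidance-and-evaluation loop concrete, here is a schematic sketch with placeholder modules, not the SGE-Net architecture: at each scale a segmentation head provides semantic guidance for inpainting, a confidence map plays the role of the evaluation step, and only low-confidence hole regions are overwritten at the next scale. All shapes and thresholds are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaleBlock(nn.Module):
    def __init__(self, ch=64, n_classes=8):
        super().__init__()
        self.seg_head = nn.Conv2d(ch, n_classes, 1)      # semantic guidance
        self.inpaint = nn.Conv2d(ch + n_classes, ch, 3, padding=1)
        self.to_rgb = nn.Conv2d(ch, 3, 1)
        self.confidence = nn.Conv2d(ch, 1, 1)            # evaluation signal

    def forward(self, feat):
        seg = self.seg_head(feat)
        feat = F.relu(self.inpaint(torch.cat([feat, seg.softmax(1)], dim=1)))
        return feat, self.to_rgb(feat), torch.sigmoid(self.confidence(feat))

def coarse_to_fine(features_per_scale, blocks, image, mask):
    out = image
    for feat, block in zip(features_per_scale, blocks):  # coarse -> fine
        feat = F.interpolate(feat, size=out.shape[-2:], mode="bilinear",
                             align_corners=False)
        _, rgb, conf = block(feat)
        # re-evaluate: keep confident predictions, refine low-confidence holes later
        update = mask * (conf > 0.5).float()
        out = out * (1 - update) + rgb * update
    return out

blocks = [ScaleBlock() for _ in range(3)]
feats = [torch.randn(1, 64, 32 * 2**i, 32 * 2**i) for i in range(3)]   # coarse -> fine
image = torch.rand(1, 3, 128, 128)
mask = torch.zeros(1, 1, 128, 128); mask[..., 40:90, 40:90] = 1        # hole region
result = coarse_to_fine(feats, blocks, image, mask)
```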
Adversarial Generation of Informative Trajectories for Dynamics System Identification
Title | Adversarial Generation of Informative Trajectories for Dynamics System Identification |
Authors | Marija Jegorova, Joshua Smith, Michael Mistry, Timothy Hospedales |
Abstract | Dynamic system identification approaches usually rely heavily on evolutionary and gradient-based optimisation techniques to produce optimal excitation trajectories for determining the physical parameters of robot platforms. Current optimisation techniques tend to generate single trajectories. This is expensive and intractable for longer trajectories, thus limiting their efficacy for system identification. We propose to tackle this issue by using multiple shorter cyclic trajectories, which can be generated in parallel and subsequently combined to achieve the same effect as a longer trajectory. Crucially, we show how to scale this approach even further by increasing the generation speed and quality of the dataset through the use of generative adversarial network (GAN) based architectures to produce large databases of valid and diverse excitation trajectories. To the best of our knowledge, this is the first robotics work to explore system identification with multiple cyclic trajectories and to develop GAN-based techniques for scalably producing excitation trajectories that are diverse in both control-parameter and inertial-parameter spaces. We show that our approach dramatically accelerates trajectory optimisation while simultaneously providing more accurate system identification than the conventional approach. |
Tasks | |
Published | 2020-03-02 |
URL | https://arxiv.org/abs/2003.01190v1 |
https://arxiv.org/pdf/2003.01190v1.pdf | |
PWC | https://paperswithcode.com/paper/adversarial-generation-of-informative |
Repo | |
Framework | |
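The following hedged sketch illustrates the general idea of GAN-generated cyclic excitation trajectories: a generator outputs finite-Fourier-series coefficients per joint (a common parameterisation for periodic excitation in system identification), and candidate trajectories are kept only if they pass a crude feasibility check. The joint count, harmonic count, and limits are assumptions, not the paper's settings.

```python
import numpy as np
import torch
import torch.nn as nn

N_JOINTS, N_HARMONICS, BASE_FREQ = 6, 5, 0.1   # assumed robot/trajectory setup

class TrajectoryGenerator(nn.Module):
    def __init__(self, z_dim=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim, 128), nn.ReLU(),
            nn.Linear(128, N_JOINTS * N_HARMONICS * 2),  # sin & cos coefficients
        )

    def forward(self, z):
        return self.net(z).view(-1, N_JOINTS, N_HARMONICS, 2)

def fourier_trajectory(coeffs, t):
    """Cyclic joint positions q(t) from Fourier coefficients."""
    q = np.zeros((len(t), N_JOINTS))
    for j in range(N_JOINTS):
        for k in range(N_HARMONICS):
            w = 2 * np.pi * BASE_FREQ * (k + 1)
            q[:, j] += coeffs[j, k, 0] * np.sin(w * t) + coeffs[j, k, 1] * np.cos(w * t)
    return q

gen = TrajectoryGenerator()
t = np.linspace(0, 1 / BASE_FREQ, 500)                  # one period -> cyclic
coeffs = gen(torch.randn(1, 16))[0].detach().numpy()
q = fourier_trajectory(coeffs, t)
feasible = np.all(np.abs(q) < np.pi)                    # crude joint-limit check
```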
One-Shot Domain Adaptation For Face Generation
Title | One-Shot Domain Adaptation For Face Generation |
Authors | Chao Yang, Ser-Nam Lim |
Abstract | In this paper, we propose a framework capable of generating face images that fall into the same distribution as that of a given one-shot example. We leverage a pre-trained StyleGAN model that has already learned the generic face distribution. Given the one-shot target, we develop an iterative optimization scheme that rapidly adapts the weights of the model to shift the output’s high-level distribution to the target’s. To generate images of the same distribution, we introduce a style-mixing technique that transfers the low-level statistics from the target to faces randomly generated with the model. With that, we are able to generate an unlimited number of faces that inherit from the distribution of both generic human faces and the one-shot example. The newly generated faces can serve as augmented training data for other downstream tasks. Such a setting is appealing as it requires labeling very few examples, or even just one, in the target domain, which is often the case for real-world face manipulations that result from a variety of unknown and unique distributions, each with extremely low prevalence. We show the effectiveness of our one-shot approach for detecting face manipulations and compare it with other few-shot domain adaptation methods qualitatively and quantitatively. |
Tasks | Domain Adaptation, Face Generation |
Published | 2020-03-28 |
URL | https://arxiv.org/abs/2003.12869v1 |
https://arxiv.org/pdf/2003.12869v1.pdf | |
PWC | https://paperswithcode.com/paper/one-shot-domain-adaptation-for-face |
Repo | |
Framework | |
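A hedged sketch of the two steps described in the abstract, using a toy stand-in generator rather than an actual StyleGAN: (1) briefly fine-tune the generator so its outputs move toward the one-shot target, and (2) mix per-layer style codes so that fine-grained (late-layer) statistics come from the target while the remaining codes stay random. All module shapes and the reconstruction loss are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyStyleGenerator(nn.Module):
    """Stand-in: per-layer style codes modulate a small decoder."""
    def __init__(self, z_dim=64, n_layers=4):
        super().__init__()
        self.layers = nn.ModuleList([nn.Linear(z_dim, z_dim) for _ in range(n_layers)])
        self.out = nn.Linear(z_dim, 3 * 8 * 8)

    def forward(self, styles):                  # styles: list of per-layer codes
        h = torch.zeros_like(styles[0])
        for layer, s in zip(self.layers, styles):
            h = F.relu(layer(h + s))            # crude style "modulation"
        return self.out(h).view(-1, 3, 8, 8)

G = ToyStyleGenerator()
target = torch.rand(1, 3, 8, 8)                 # the one-shot example
w_target = [torch.randn(1, 64, requires_grad=True) for _ in range(4)]

# Step 1: adapt generator weights toward the target (a few iterations).
opt = torch.optim.Adam(list(G.parameters()) + w_target, lr=1e-3)
for _ in range(100):
    loss = F.mse_loss(G(w_target), target)      # a perceptual loss in practice
    opt.zero_grad(); loss.backward(); opt.step()

# Step 2: style mixing -- late (fine-detail) layers from the target, the rest random.
w_random = [torch.randn(1, 64) for _ in range(4)]
mixed = w_random[:2] + [w.detach() for w in w_target[2:]]
sample = G(mixed)
```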
Learning to Fly via Deep Model-Based Reinforcement Learning
Title | Learning to Fly via Deep Model-Based Reinforcement Learning |
Authors | Philip Becker-Ehmck, Maximilian Karl, Jan Peters, Patrick van der Smagt |
Abstract | Learning to control robots without requiring models has been a long-term goal, promising diverse and novel applications. Yet, reinforcement learning has only achieved limited impact on real-time robot control due to its high demand for real-world interactions. In this work, by leveraging a learnt probabilistic model of drone dynamics, we achieve human-like quadrotor control through model-based reinforcement learning. No prior knowledge of the flight dynamics is assumed; instead, a sequential latent variable model, used generatively and as an online filter, is learnt from raw sensory input. The controller and value function are optimised entirely by propagating stochastic analytic gradients through generated latent trajectories. We show that “learning to fly” can be achieved with less than 30 minutes of experience with a single drone, and can be deployed solely using onboard computational resources and sensors, on a self-built drone. |
Tasks | |
Published | 2020-03-19 |
URL | https://arxiv.org/abs/2003.08876v1 |
https://arxiv.org/pdf/2003.08876v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-fly-via-deep-model-based |
Repo | |
Framework | |
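The core training idea, optimising the controller by backpropagating through imagined latent rollouts of a learnt stochastic dynamics model, can be sketched as below. This is an illustrative toy, not the paper's model: the dynamics, reward, and policy networks, the horizon, and all sizes are assumptions.

```python
import torch
import torch.nn as nn

latent, act = 16, 4

dynamics = nn.Sequential(nn.Linear(latent + act, 64), nn.ReLU(),
                         nn.Linear(64, 2 * latent))       # mean and log-std
reward_model = nn.Sequential(nn.Linear(latent, 64), nn.ReLU(), nn.Linear(64, 1))
policy = nn.Sequential(nn.Linear(latent, 64), nn.Tanh(), nn.Linear(64, act), nn.Tanh())

opt = torch.optim.Adam(policy.parameters(), lr=3e-4)

def imagined_return(z0, horizon=15):
    z, total = z0, 0.0
    for _ in range(horizon):
        a = policy(z)
        mu, log_std = dynamics(torch.cat([z, a], dim=-1)).chunk(2, dim=-1)
        z = mu + log_std.exp() * torch.randn_like(mu)      # reparameterised sample
        total = total + reward_model(z).mean()
    return total

for _ in range(10):                                        # toy training loop
    z0 = torch.randn(32, latent)                           # from the online filter in practice
    loss = -imagined_return(z0)                            # maximise expected return
    opt.zero_grad(); loss.backward(); opt.step()
```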
Dual-Attention GAN for Large-Pose Face Frontalization
Title | Dual-Attention GAN for Large-Pose Face Frontalization |
Authors | Yu Yin, Songyao Jiang, Joseph P. Robinson, Yun Fu |
Abstract | Face frontalization provides an effective and efficient way for face data augmentation and further improves face recognition performance in extreme pose scenarios. Despite recent advances in deep learning-based face synthesis approaches, this problem is still challenging due to significant pose and illumination discrepancies. In this paper, we present a novel Dual-Attention Generative Adversarial Network (DA-GAN) for photo-realistic face frontalization by capturing both contextual dependencies and local consistency during GAN training. Specifically, a self-attention-based generator is introduced to integrate local features with their long-range dependencies, yielding better feature representations and hence generating faces that preserve identities better, especially for larger pose angles. Moreover, a novel face-attention-based discriminator is applied to emphasize local features of face regions and hence reinforce the realism of synthetic frontal faces. Guided by semantic segmentation, four independent discriminators are used to distinguish between different aspects of a face (i.e., skin, keypoints, hairline, and frontalized face). By introducing these two complementary attention mechanisms in the generator and discriminator separately, we can learn a richer feature representation and generate identity-preserving inference of frontal views with much finer details (i.e., more accurate facial appearance and textures) compared to the state-of-the-art. Quantitative and qualitative experimental results demonstrate the effectiveness and efficiency of our DA-GAN approach. |
Tasks | Data Augmentation, Face Generation, Face Recognition, Semantic Segmentation |
Published | 2020-02-17 |
URL | https://arxiv.org/abs/2002.07227v1 |
https://arxiv.org/pdf/2002.07227v1.pdf | |
PWC | https://paperswithcode.com/paper/dual-attention-gan-for-large-pose-face |
Repo | |
Framework | |
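A hedged sketch of the kind of self-attention block used in attention-based GAN generators (SAGAN-style); the actual DA-GAN layers may differ in detail. The face-attention discriminator could analogously be approximated by masking the input with segmentation-derived region masks before four separate patch discriminators, which is not shown here.

```python
import torch
import torch.nn as nn

class SelfAttention2d(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.q = nn.Conv2d(ch, ch // 8, 1)
        self.k = nn.Conv2d(ch, ch // 8, 1)
        self.v = nn.Conv2d(ch, ch, 1)
        self.gamma = nn.Parameter(torch.zeros(1))   # start as an identity mapping

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.q(x).flatten(2).transpose(1, 2)    # (b, hw, c//8)
        k = self.k(x).flatten(2)                    # (b, c//8, hw)
        attn = torch.softmax(q @ k, dim=-1)         # long-range dependencies
        v = self.v(x).flatten(2)                    # (b, c, hw)
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return self.gamma * out + x                 # residual connection

x = torch.randn(2, 64, 32, 32)
y = SelfAttention2d(64)(x)                          # same shape as x
```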
FakeLocator: Robust Localization of GAN-Based Face Manipulations via Semantic Segmentation Networks with Bells and Whistles
Title | FakeLocator: Robust Localization of GAN-Based Face Manipulations via Semantic Segmentation Networks with Bells and Whistles |
Authors | Yihao Huang, Felix Juefei-Xu, Run Wang, Xiaofei Xie, Lei Ma, Jianwen Li, Weikai Miao, Yang Liu, Geguang Pu |
Abstract | Nowadays, full face synthesis and partial face manipulation by virtue of generative adversarial networks (GANs) have raised wide public concern. In the digital media forensics area, detecting and ultimately locating image forgeries have become imperative. Although many methods focus on fake detection, only a few put emphasis on the localization of the fake regions. Through analyzing the imperfections in the upsampling procedures of GAN-based methods and recasting the fake localization problem as a modified semantic segmentation one, our proposed FakeLocator can obtain high localization accuracy, at full resolution, on manipulated facial images. To the best of our knowledge, this is the very first attempt to solve the GAN-based fake localization problem with a semantic segmentation map. As an improvement, the real-numbered segmentation map we propose preserves more information about the fake regions. We also identify suitable loss functions for this new type of segmentation map. Experimental results on the CelebA and FFHQ databases with seven different SOTA GAN-based face generation methods show the effectiveness of our method. Compared with the baseline, our method performs several times better on various metrics. Moreover, the proposed method is robust against various real-world facial image degradations such as JPEG compression, low resolution, noise, and blur. |
Tasks | Face Generation, Semantic Segmentation |
Published | 2020-01-27 |
URL | https://arxiv.org/abs/2001.09598v2 |
https://arxiv.org/pdf/2001.09598v2.pdf | |
PWC | https://paperswithcode.com/paper/fakelocator-robust-localization-of-gan-based |
Repo | |
Framework | |
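The core formulation, fake-region localisation as dense prediction with a real-valued target map trained by regression rather than hard per-pixel classes, can be sketched as follows; the encoder-decoder is a toy stand-in, not the FakeLocator architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FakeMapNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                                 nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        self.dec = nn.Sequential(nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
                                 nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Sigmoid())

    def forward(self, x):
        return self.dec(self.enc(x))          # per-pixel "fakeness" in [0, 1]

net = FakeMapNet()
img = torch.rand(4, 3, 128, 128)
target = torch.rand(4, 1, 128, 128)           # real-valued fakeness map (0 = real)
loss = F.l1_loss(net(img), target)            # regression loss on the soft map
loss.backward()
```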
GLIB: Exploration via Goal-Literal Babbling for Lifted Operator Learning
Title | GLIB: Exploration via Goal-Literal Babbling for Lifted Operator Learning |
Authors | Rohan Chitnis, Tom Silver, Joshua Tenenbaum, Leslie Pack Kaelbling, Tomas Lozano-Perez |
Abstract | We address the problem of efficient exploration for learning lifted operators in sequential decision-making problems without extrinsic goals or rewards. Inspired by human curiosity, we propose goal-literal babbling (GLIB), a simple and general method for exploration in such problems. GLIB samples goals that are conjunctions of literals, which can be understood as specific, targeted effects that the agent would like to achieve in the world, and plans to achieve these goals using the operators being learned. We conduct a case study to elucidate two key benefits of GLIB: robustness to overly general preconditions and efficient exploration in domains with effects at long horizons. We also provide theoretical guarantees and further empirical results, finding GLIB to be effective on a range of benchmark planning tasks. |
Tasks | Decision Making, Efficient Exploration |
Published | 2020-01-22 |
URL | https://arxiv.org/abs/2001.08299v1 |
https://arxiv.org/pdf/2001.08299v1.pdf | |
PWC | https://paperswithcode.com/paper/glib-exploration-via-goal-literal-babbling |
Repo | |
Framework | |
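A schematic, hedged sketch of the GLIB exploration loop as described in the abstract: repeatedly sample a small conjunction of literals as a goal, try to plan to it with the operators learned so far, execute, and record transitions for operator learning. The planner, environment, and operator learner here are stand-ins that only show the control flow.

```python
import random

def sample_goal(known_literals, max_size=2):
    """A goal is a conjunction of a few ground literals."""
    k = random.randint(1, max_size)
    return frozenset(random.sample(sorted(known_literals), k))

def plan(state, goal, operators):
    """Stand-in planner: returns a (possibly empty) action sequence."""
    return [random.choice(["move", "pick", "place"]) for _ in range(3)] if operators else []

def explore_glib(env_step, init_state, known_literals, n_episodes=5):
    operators, transitions = {"dummy-operator": None}, []
    state = init_state
    for _ in range(n_episodes):
        goal = sample_goal(known_literals)
        for action in plan(state, goal, operators):
            next_state = env_step(state, action)          # execute in the world
            transitions.append((state, action, next_state))
            state = next_state
        # operator learning would refit preconditions/effects from `transitions`
    return transitions

literals = {"on(a,b)", "clear(a)", "holding(c)", "ontable(b)"}
trace = explore_glib(lambda s, a: s, frozenset(literals), literals)
```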
Long Short-Term Relation Networks for Video Action Detection
Title | Long Short-Term Relation Networks for Video Action Detection |
Authors | Dong Li, Ting Yao, Zhaofan Qiu, Houqiang Li, Tao Mei |
Abstract | It has been well recognized that modeling human-object or object-object relations would be helpful for the detection task. Nevertheless, the problem is not trivial, especially when exploring the interactions between human actor, object and scene (collectively as human-context) to boost video action detectors. The difficulty originates from the aspect that reliable relations in a video should depend not only on the short-term human-context relation in the present clip but also on the temporal dynamics distilled over a long-range span of the video. This motivates us to capture both short-term and long-term relations in a video. In this paper, we present new Long Short-Term Relation Networks, dubbed LSTR, which aggregate and propagate relations to augment features for video action detection. Technically, a Region Proposal Network (RPN) is remoulded to first produce 3D bounding boxes, i.e., tubelets, in each video clip. LSTR then models short-term human-context interactions within each clip through a spatio-temporal attention mechanism and reasons about long-term temporal dynamics across video clips via Graph Convolutional Networks (GCN) in a cascaded manner. Extensive experiments are conducted on four benchmark datasets, and superior results are reported when compared to state-of-the-art methods. |
Tasks | Action Detection |
Published | 2020-03-31 |
URL | https://arxiv.org/abs/2003.14065v1 |
https://arxiv.org/pdf/2003.14065v1.pdf | |
PWC | https://paperswithcode.com/paper/long-short-term-relation-networks-for-video |
Repo | |
Framework | |
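The long-term reasoning step can be illustrated with a minimal graph-convolution layer over tubelet features: nodes are tubelets from different clips, edges come from feature similarity, and one propagation step augments each node with relational context. This is a generic GCN sketch, not the exact LSTR formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RelationGCN(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.w = nn.Linear(dim, dim)

    def forward(self, nodes):                         # nodes: (N, dim) tubelet features
        sim = nodes @ nodes.t()                       # pairwise similarity
        adj = F.softmax(sim, dim=-1)                  # normalised adjacency
        return F.relu(self.w(adj @ nodes)) + nodes    # propagate relations + residual

tubelets = torch.randn(20, 256)                       # e.g. 20 tubelets across many clips
augmented = RelationGCN(256)(tubelets)                # relation-augmented features
```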
Revisiting Few-shot Activity Detection with Class Similarity Control
Title | Revisiting Few-shot Activity Detection with Class Similarity Control |
Authors | Huijuan Xu, Ximeng Sun, Eric Tzeng, Abir Das, Kate Saenko, Trevor Darrell |
Abstract | Many interesting events in the real world are rare, which in turn makes pre-annotated, machine-learning-ready videos a rarity. Thus, temporal activity detection models that are able to learn from a few examples are desirable. In this paper, we present a conceptually simple and general yet novel framework for few-shot temporal activity detection based on proposal regression, which detects the start and end time of the activities in untrimmed videos. Our model is end-to-end trainable, takes into account the frame rate differences between few-shot activities and untrimmed test videos, and can benefit from additional few-shot examples. We experiment on three large-scale benchmarks for temporal activity detection (the ActivityNet1.2, ActivityNet1.3 and THUMOS14 datasets) in a few-shot setting. We also study the effect on performance of different amounts of overlap with the activities used to pretrain the video classification backbone and propose corrective measures for future work in this domain. Our code will be made available. |
Tasks | Action Detection, Activity Detection, Video Classification |
Published | 2020-03-31 |
URL | https://arxiv.org/abs/2004.00137v1 |
https://arxiv.org/pdf/2004.00137v1.pdf | |
PWC | https://paperswithcode.com/paper/revisiting-few-shot-activity-detection-with |
Repo | |
Framework | |
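A hedged sketch of proposal regression in a few-shot setting: clip-level proposal features are compared with a class prototype built from the few support examples, and a small head regresses a class score plus start/end offsets per proposal. Dimensions and the prototype construction are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ProposalRegressor(nn.Module):
    def __init__(self, dim=512):
        super().__init__()
        self.head = nn.Sequential(nn.Linear(2 * dim, 256), nn.ReLU(),
                                  nn.Linear(256, 3))    # (score, d_start, d_end)

    def forward(self, proposal_feats, prototype):
        proto = prototype.expand_as(proposal_feats)
        out = self.head(torch.cat([proposal_feats, proto], dim=-1))
        return torch.sigmoid(out[:, 0]), out[:, 1:]     # class score, boundary offsets

few_shot_examples = torch.randn(5, 512)                 # features of 5 support videos
prototype = few_shot_examples.mean(0, keepdim=True)
proposals = torch.randn(100, 512)                       # untrimmed-video proposal features
scores, offsets = ProposalRegressor()(proposals, prototype)
```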
DeepEnroll: Patient-Trial Matching with Deep Embedding and Entailment Prediction
Title | DeepEnroll: Patient-Trial Matching with Deep Embedding and Entailment Prediction |
Authors | Xingyao Zhang, Cao Xiao, Lucas M. Glass, Jimeng Sun |
Abstract | Clinical trials are essential for drug development but often suffer from expensive, inaccurate and insufficient patient recruitment. The core problem of patient-trial matching is to find qualified patients for a trial, where patient information is stored in electronic health records (EHR) while trial eligibility criteria (EC) are described in text documents available on the web. How to represent longitudinal patient EHR? How to extract complex logical rules from EC? Most existing works rely on manual rule-based extraction, which is time-consuming and inflexible for complex inference. To address these challenges, we propose DeepEnroll, a cross-modal inference learning model that jointly encodes enrollment criteria (text) and patient records (tabular data) into a shared latent space for matching inference. DeepEnroll applies a pre-trained Bidirectional Encoder Representations from Transformers (BERT) model to encode clinical trial information into sentence embeddings and uses a hierarchical embedding model to represent longitudinal patient EHR. In addition, DeepEnroll is augmented by a numerical information embedding and entailment module to reason over numerical information in both EC and EHR. These encoders are trained jointly to optimize the patient-trial matching score. We evaluated DeepEnroll on the patient-trial matching task on real-world datasets. DeepEnroll outperformed the best baseline by up to 12.4% in average F1. |
Tasks | Sentence Embedding |
Published | 2020-01-22 |
URL | https://arxiv.org/abs/2001.08179v2 |
https://arxiv.org/pdf/2001.08179v2.pdf | |
PWC | https://paperswithcode.com/paper/deepenroll-patient-trial-matching-with-deep |
Repo | |
Framework | |
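A hedged sketch of the matching setup: eligibility-criterion sentences and a patient's longitudinal EHR are encoded into a shared space and scored for alignment. Both encoders below are small stand-ins (a real system would use a pre-trained BERT for the criteria text and a richer hierarchical model for the EHR), and the numerical entailment module is omitted.

```python
import torch
import torch.nn as nn

class CriterionEncoder(nn.Module):
    """Stand-in for a BERT sentence encoder."""
    def __init__(self, vocab=1000, dim=128):
        super().__init__()
        self.emb = nn.EmbeddingBag(vocab, dim)

    def forward(self, token_ids):                       # (n_criteria, seq_len)
        return self.emb(token_ids)

class HierarchicalEHREncoder(nn.Module):
    """Visit-level codes -> visit embedding -> patient embedding (GRU over visits)."""
    def __init__(self, n_codes=500, dim=128):
        super().__init__()
        self.code_emb = nn.EmbeddingBag(n_codes, dim)
        self.visit_rnn = nn.GRU(dim, dim, batch_first=True)

    def forward(self, visits):                          # (n_visits, codes_per_visit)
        visit_vecs = self.code_emb(visits).unsqueeze(0)
        _, h = self.visit_rnn(visit_vecs)
        return h[-1]                                    # (1, dim) patient embedding

criteria = torch.randint(0, 1000, (6, 20))              # 6 criteria, 20 tokens each
ehr = torch.randint(0, 500, (10, 15))                   # 10 visits, 15 codes each
c_vec = CriterionEncoder()(criteria)                    # (6, 128)
p_vec = HierarchicalEHREncoder()(ehr)                   # (1, 128)
match_score = torch.sigmoid(c_vec @ p_vec.t()).mean()   # aggregate matching score
```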
Just Noticeable Difference for Machines to Generate Adversarial Images
Title | Just Noticeable Difference for Machines to Generate Adversarial Images |
Authors | Adil Kaan Akan, Mehmet Ali Genc, Fatos T. Yarman Vural |
Abstract | One way of designing a robust machine learning algorithm is to generate authentic adversarial images that can trick the algorithm as much as possible. In this study, we propose a new method to generate adversarial images which are very similar to the true images, yet are discriminated from the original ones and assigned to another category by the model. The proposed method is based on a popular concept of experimental psychology called Just Noticeable Difference. We define Just Noticeable Difference for a machine learning model and generate a least perceptible difference for adversarial images that can trick a model. The suggested model iteratively distorts a true image via gradient descent until the machine learning algorithm outputs a false label. Deep Neural Networks are trained for object detection and classification tasks. The cost function includes regularization terms to generate just noticeably different adversarial images that can be detected by the model. The adversarial images generated in this study look more natural compared to the output of state-of-the-art adversarial image generators. |
Tasks | Object Detection |
Published | 2020-01-29 |
URL | https://arxiv.org/abs/2001.11064v1 |
https://arxiv.org/pdf/2001.11064v1.pdf | |
PWC | https://paperswithcode.com/paper/just-noticeable-difference-for-machines-to |
Repo | |
Framework | |
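The iterative scheme described above, distorting a true image with gradient steps until the classifier's label flips while a regulariser keeps the perturbation just noticeable, can be sketched as follows. The toy classifier and the weight of the penalty term are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))   # toy classifier
x_true = torch.rand(1, 3, 32, 32)
y_true = model(x_true).argmax(1)

x_adv = x_true.clone().requires_grad_(True)
opt = torch.optim.SGD([x_adv], lr=0.05)
lam = 0.1                                           # perceptibility trade-off

for step in range(200):
    logits = model(x_adv)
    if logits.argmax(1) != y_true:                  # stop at the first label flip
        break
    # push away from the true class while staying close to the original image
    loss = -F.cross_entropy(logits, y_true) + lam * F.mse_loss(x_adv, x_true)
    opt.zero_grad(); loss.backward(); opt.step()
    x_adv.data.clamp_(0, 1)                         # keep a valid image
```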
A Comprehensive Scoping Review of Bayesian Networks in Healthcare: Past, Present and Future
Title | A Comprehensive Scoping Review of Bayesian Networks in Healthcare: Past, Present and Future |
Authors | Evangelia Kyrimi, Scott McLachlan, Kudakwashe Dube, Mariana R. Neves, Ali Fahmi, Norman Fenton |
Abstract | No comprehensive review of Bayesian networks (BNs) in healthcare has been published in the past, making it difficult to organize the research contributions in the present and identify challenges and neglected areas that need to be addressed in the future. This unique and novel scoping review of BNs in healthcare provides an analytical framework for comprehensively characterizing the domain and its current state. The review shows that: (1) BNs in healthcare are not used to their full potential; (2) a generic BN development process is lacking; (3) limitations exist in the way BNs in healthcare are presented in the literature, which impacts understanding, consensus towards systematic methodologies, practice and adoption of BNs; and (4) a gap exists between having an accurate BN and a useful BN that impacts clinical practice. This review empowers researchers and clinicians with an analytical framework and findings that will enable understanding of the need to address the problems of restricted aims of BNs, ad hoc BN development methods, and the lack of BN adoption in practice. To map the way forward, the paper proposes future research directions and makes recommendations regarding BN development methods and adoption in practice. |
Tasks | |
Published | 2020-02-20 |
URL | https://arxiv.org/abs/2002.08627v2 |
https://arxiv.org/pdf/2002.08627v2.pdf | |
PWC | https://paperswithcode.com/paper/a-comprehensive-scoping-review-of-bayesian |
Repo | |
Framework | |
Comprehensive Instructional Video Analysis: The COIN Dataset and Performance Evaluation
Title | Comprehensive Instructional Video Analysis: The COIN Dataset and Performance Evaluation |
Authors | Yansong Tang, Jiwen Lu, Jie Zhou |
Abstract | Thanks to the substantial and explosively increasing number of instructional videos on the Internet, novices are able to acquire knowledge for completing various tasks. Over the past decade, growing efforts have been devoted to investigating the problem of instructional video analysis. However, most existing datasets in this area have limitations in diversity and scale, which makes them far from many real-world applications where more diverse activities occur. To address this, we present a large-scale dataset named “COIN” for COmprehensive INstructional video analysis. Organized with a hierarchical structure, the COIN dataset contains 11,827 videos of 180 tasks in 12 domains (e.g., vehicles, gadgets, etc.) related to our daily life. With a newly developed toolbox, all the videos are annotated efficiently with a series of step labels and the corresponding temporal boundaries. In order to provide a benchmark for instructional video analysis, we evaluate a variety of approaches on the COIN dataset under five different settings. Furthermore, we exploit two important characteristics (i.e., task-consistency and ordering-dependency) for localizing important steps in instructional videos. Accordingly, we propose two simple yet effective methods, which can be easily plugged into conventional proposal-based action detection models. We believe the introduction of the COIN dataset will promote future in-depth research on instructional video analysis in the community. Our dataset, annotation toolbox and source code are available at http://coin-dataset.github.io. |
Tasks | Action Detection |
Published | 2020-03-20 |
URL | https://arxiv.org/abs/2003.09392v1 |
https://arxiv.org/pdf/2003.09392v1.pdf | |
PWC | https://paperswithcode.com/paper/comprehensive-instructional-video-analysis |
Repo | |
Framework | |
A Novel Online Action Detection Framework from Untrimmed Video Streams
Title | A Novel Online Action Detection Framework from Untrimmed Video Streams |
Authors | Da-Hye Yoon, Nam-Gyu Cho, Seong-Whan Lee |
Abstract | Online temporal action localization from an untrimmed video stream is a challenging problem in computer vision. It is challenging because (i) in an untrimmed video stream, more than one action instance may appear, including background scenes, and (ii) in online settings, only past and current information is available. Therefore, temporal priors, such as the average action duration of training data, which have been exploited by previous action detection methods, are not suitable for this task because of the high intra-class variation in human actions. We propose a novel online action detection framework that considers actions as a set of temporally ordered subclasses and leverages a future frame generation network to cope with the limited-information issue outlined above. Additionally, we augment our data by varying the lengths of videos to allow the proposed method to learn about the high intra-class variation in human actions. We evaluate our method using two benchmark datasets, THUMOS’14 and ActivityNet, for the online temporal action localization scenario and demonstrate that its performance is comparable to that of state-of-the-art methods proposed for offline settings. |
Tasks | Action Detection, Action Localization, Temporal Action Localization |
Published | 2020-03-17 |
URL | https://arxiv.org/abs/2003.07734v1 |
https://arxiv.org/pdf/2003.07734v1.pdf | |
PWC | https://paperswithcode.com/paper/a-novel-online-action-detection-framework |
Repo | |
Framework | |
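A schematic, hedged sketch of online inference in the spirit of this framework: at each time step only past and current features are available, a future feature is hallucinated by a small generator, and a classifier predicts temporally ordered sub-classes (e.g. start/middle/end) per action. All modules are toy stand-ins that show only the data flow.

```python
import torch
import torch.nn as nn

N_ACTIONS, N_SUBCLASSES, DIM = 20, 3, 256           # 3 ordered sub-classes per action

future_gen = nn.GRU(DIM, DIM, batch_first=True)     # predicts the next feature
classifier = nn.Linear(2 * DIM, N_ACTIONS * N_SUBCLASSES)

def online_step(past_feats):                        # past_feats: (1, t, DIM), causal only
    _, h = future_gen(past_feats)
    future_feat = h[-1]                             # hallucinated next-frame feature
    current = past_feats[:, -1, :]
    logits = classifier(torch.cat([current, future_feat], dim=-1))
    return logits.view(N_ACTIONS, N_SUBCLASSES).softmax(dim=-1)

stream = torch.randn(1, 50, DIM)                    # growing, untrimmed stream
for t in range(1, 51):
    probs = online_step(stream[:, :t, :])           # only past and current frames
```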