January 28, 2020

3180 words · 15 min read

Paper Group ANR 788



Learning to Infer User Interface Attributes from Images

Title Learning to Infer User Interface Attributes from Images
Authors Philippe Schlattner, Pavol Bielik, Martin Vechev
Abstract We explore a new domain of learning to infer user interface attributes that helps developers automate the process of user interface implementation. Concretely, given an input image created by a designer, we learn to infer its implementation which, when rendered, looks visually the same as the input image. To achieve this, we take a black-box rendering engine and a set of attributes it supports (e.g., colors, border radius, shadow, or text properties), use it to generate a suitable synthetic training dataset, and then train specialized neural models to predict each of the attribute values. To improve pixel-level accuracy, we additionally use imitation learning to train a neural policy that refines the predicted attribute values by learning to compute the similarity of the original and rendered images in their attribute space, rather than based on the difference of pixel values. We instantiate our approach on the task of inferring Android Button attribute values and achieve 92.5% accuracy on a dataset consisting of real-world Google Play Store applications.
Tasks Imitation Learning
Published 2019-12-31
URL https://arxiv.org/abs/1912.13243v1
PDF https://arxiv.org/pdf/1912.13243v1.pdf
PWC https://paperswithcode.com/paper/learning-to-infer-user-interface-attributes-1
Repo
Framework
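
The per-attribute setup described in the abstract can be made concrete. Below is a minimal sketch, assuming a shared CNN encoder with one classification head per attribute; the attribute names and value counts are hypothetical placeholders, not the paper's actual attribute set.

```python
# Sketch only: shared encoder, one classification head per UI attribute.
import torch
import torch.nn as nn

ATTRIBUTES = {"text_color": 256, "border_radius": 32, "shadow": 2}  # hypothetical

class AttributePredictor(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # one specialized head per attribute, as the abstract describes
        self.heads = nn.ModuleDict(
            {name: nn.Linear(64, n) for name, n in ATTRIBUTES.items()}
        )

    def forward(self, image):
        features = self.encoder(image)
        return {name: head(features) for name, head in self.heads.items()}

logits = AttributePredictor()(torch.randn(1, 3, 64, 64))
predicted = {name: l.argmax(dim=1).item() for name, l in logits.items()}
```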

VITAL: A Visual Interpretation on Text with Adversarial Learning for Image Labeling

Title VITAL: A Visual Interpretation on Text with Adversarial Learning for Image Labeling
Authors Tao Hu, Chengjiang Long, Leheng Zhang, Chunxia Xiao
Abstract In this paper, we propose a novel way to interpret text information by extracting visual feature representations from multiple high-resolution and photo-realistic synthetic images generated by a Text-to-image Generative Adversarial Network (GAN) to improve the performance of image labeling. First, we design a stacked Generative Multi-Adversarial Network (GMAN), StackGMAN++, a modified version of the current state-of-the-art Text-to-image GAN, StackGAN++, to generate multiple synthetic images with various prior noises conditioned on a text. We then extract deep visual features from the generated synthetic images to explore the underlying visual concepts for the text. Finally, we combine image-level visual features, text-level features, and the visual features based on the synthetic images to predict labels for images. We conduct experiments on two benchmark datasets and the experimental results clearly demonstrate the efficacy of our proposed approach.
Tasks
Published 2019-07-26
URL https://arxiv.org/abs/1907.11811v2
PDF https://arxiv.org/pdf/1907.11811v2.pdf
PWC https://paperswithcode.com/paper/vital-a-visual-interpretation-on-text-with
Repo
Framework
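
A hedged sketch of the final fusion step the abstract describes: pool deep features from the k GAN-synthesized images and combine them with the real-image and text features before label prediction. The mean pooling and the dimensions are assumptions; the abstract does not specify them.

```python
import numpy as np

def fuse_features(image_feat, text_feat, synthetic_feats):
    """image_feat: (d_img,), text_feat: (d_txt,), synthetic_feats: (k, d_img)."""
    synth_feat = synthetic_feats.mean(axis=0)  # pool over the k synthetic images
    return np.concatenate([image_feat, text_feat, synth_feat])

fused = fuse_features(np.random.rand(2048), np.random.rand(300),
                      np.random.rand(5, 2048))  # fed to the label predictor
```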

FontGAN: A Unified Generative Framework for Chinese Character Stylization and De-stylization

Title FontGAN: A Unified Generative Framework for Chinese Character Stylization and De-stylization
Authors Xiyan Liu, Gaofeng Meng, Shiming Xiang, Chunhong Pan
Abstract Chinese character synthesis involves two related aspects, i.e., style maintenance and content consistency. Although some methods have achieved remarkable success in synthesizing a character in a specified style from a standard font, how to map characters to a specified style domain without losing their identifiability remains very challenging. In this paper, we propose a novel model named FontGAN, which integrates character stylization and de-stylization into a unified framework. In our model, we decouple character images into a style representation and a content representation, which facilitates more precise control of these two types of variables, thereby improving the quality of the generated results. We also introduce two modules, namely, a font consistency module (FCM) and a content prior module (CPM). FCM exploits a category-guided Kullback-Leibler loss to embed the style representation into different Gaussian distributions. It constrains the characters of the same font in the training set globally. On the other hand, it enables our model to obtain style variables through sampling in the testing phase. CPM provides a content prior for the model to guide the content encoding process and alleviates the problem of stroke deficiency during de-stylization. Extensive experimental results on character stylization and de-stylization have demonstrated the effectiveness of our method.
Tasks
Published 2019-10-28
URL https://arxiv.org/abs/1910.12604v1
PDF https://arxiv.org/pdf/1910.12604v1.pdf
PWC https://paperswithcode.com/paper/fontgan-a-unified-generative-framework-for
Repo
Framework
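
A minimal sketch of a category-guided Kullback-Leibler term in the spirit of FCM: each font category's style codes are pulled toward that category's own Gaussian. The per-font target means and unit variances below are assumptions; the abstract only states that style representations are embedded into different Gaussian distributions.

```python
import torch

def category_kl(mu, logvar, font_ids, font_means):
    """mu, logvar: (B, d) style-encoder outputs; font_ids: (B,) long;
    font_means: (C, d) one target mean per font category (assumed)."""
    target_mu = font_means[font_ids]  # (B, d)
    # KL( N(mu, exp(logvar)) || N(target_mu, I) ), closed form per dimension
    kl = 0.5 * (logvar.exp() + (mu - target_mu) ** 2 - 1.0 - logvar)
    return kl.sum(dim=1).mean()

loss = category_kl(torch.randn(8, 16), torch.zeros(8, 16),
                   torch.randint(0, 4, (8,)), torch.randn(4, 16))
```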

A New Framework for Query Efficient Active Imitation Learning

Title A New Framework for Query Efficient Active Imitation Learning
Authors Daniel Hsu
Abstract We seek to align agent policy with human expert behavior in a reinforcement learning (RL) setting, without any prior knowledge about dynamics, the reward function, and unsafe states. A human expert knows the rewards and unsafe states based on their preferences and objectives, but querying that expert is expensive. To address this challenge, we propose a new framework for imitation learning (IL) that actively and interactively learns a model of the user’s reward function with efficient queries. We build an adversarial generative model of states and a successor feature (SR) model, trained over transition experience collected by the learning policy. Our method uses these models to select state-action pairs, asks the user to comment on their optimality or safety, and trains an adversarial neural network to predict the rewards. Different from previous papers, which are almost all based on uncertainty sampling, the key idea is to actively and efficiently select state-action pairs from both on-policy and off-policy experience, by discriminating the queried (expert) and unqueried (generated) data and maximizing the efficiency of value function learning. We call this method adversarial reward query with successor representation. We evaluate the proposed method with a simulated human on a state-based 2D navigation task, robotic control tasks, and image-based video games, which have high-dimensional observations and complex state dynamics. The results show that the proposed method significantly outperforms uncertainty-based methods on learning reward models, achieving better query efficiency, where the adversarial discriminator makes the agent learn human behavior more efficiently and the SR selects states which have a stronger impact on the value function. Moreover, the proposed method can also learn to avoid unsafe states when training the reward model.
Tasks Imitation Learning
Published 2019-12-30
URL https://arxiv.org/abs/1912.13037v1
PDF https://arxiv.org/pdf/1912.13037v1.pdf
PWC https://paperswithcode.com/paper/a-new-framework-for-query-efficient-active
Repo
Framework
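
A hedged sketch of the query-selection idea: a discriminator trained to separate queried (expert-labeled) from unqueried (generated) pairs scores the candidates, and the pairs it finds most "unqueried-looking" are sent to the human. The scoring rule and budget are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def select_queries(candidates, discriminator_prob_queried, budget):
    """candidates: list of (state, action);
    discriminator_prob_queried: callable (state, action) -> prob in [0, 1]."""
    scores = np.array([1.0 - discriminator_prob_queried(s, a)
                       for s, a in candidates])  # high = looks unqueried
    return [candidates[i] for i in np.argsort(-scores)[:budget]]

picked = select_queries([((0, 0), 1), ((1, 2), 0)],
                        lambda s, a: 0.3, budget=1)  # toy discriminator
```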

Towards Shape Biased Unsupervised Representation Learning for Domain Generalization

Title Towards Shape Biased Unsupervised Representation Learning for Domain Generalization
Authors Nader Asadi, Amir M. Sarfi, Mehrdad Hosseinzadeh, Zahra Karimpour, Mahdi Eftekhari
Abstract It is known that, without our awareness of the process, the brain appears to focus on the general shape of objects rather than on superficial statistics of context. On the other hand, learning autonomously allows discovering invariant regularities that help generalization. In this work, we propose a learning framework to improve the shape bias property of self-supervised methods. Our method learns semantic and shape-biased representations by integrating domain diversification and jigsaw puzzles. The first module enables the model to create a dynamic environment across arbitrary domains and provides a domain exploration vs. exploitation trade-off, while the second module allows the model to explore this environment autonomously. This universal framework does not require prior knowledge of the domain of interest. Extensive experiments are conducted on several domain generalization datasets, namely, PACS, Office-Home, VLCS, and Digits. We show that our framework outperforms state-of-the-art domain generalization methods by a large margin.
Tasks Domain Generalization, Representation Learning, Unsupervised Representation Learning
Published 2019-09-18
URL https://arxiv.org/abs/1909.08245v2
PDF https://arxiv.org/pdf/1909.08245v2.pdf
PWC https://paperswithcode.com/paper/towards-shape-biased-unsupervised
Repo
Framework
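
The jigsaw module can be summarized concretely: image tiles are shuffled by a permutation drawn from a fixed set, and the network must classify which permutation was applied. A minimal sketch, assuming a 3x3 grid and a 30-permutation set (the standard jigsaw setup; the paper's exact configuration may differ).

```python
import random
import numpy as np

PERMUTATIONS = [tuple(np.random.permutation(9)) for _ in range(30)]  # fixed set

def make_jigsaw_example(image):
    """image: (H, W, C) with H, W divisible by 3. Returns (shuffled, label)."""
    h, w = image.shape[0] // 3, image.shape[1] // 3
    tiles = [image[r*h:(r+1)*h, c*w:(c+1)*w] for r in range(3) for c in range(3)]
    label = random.randrange(len(PERMUTATIONS))   # self-supervised target
    order = PERMUTATIONS[label]
    rows = [np.concatenate([tiles[order[r*3 + c]] for c in range(3)], axis=1)
            for r in range(3)]
    return np.concatenate(rows, axis=0), label

shuffled, label = make_jigsaw_example(np.zeros((96, 96, 3)))
```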

Justifying Diagnosis Decisions by Deep Neural Networks

Title Justifying Diagnosis Decisions by Deep Neural Networks
Authors Graham Spinks, Marie-Francine Moens
Abstract An integrated approach across visual and textual data is proposed to both determine and justify a medical diagnosis by a neural network. As deep learning techniques improve, interest in applying them to medical applications grows. To enable a transition to machine-learning-aided workflows in a medical context, such algorithms need to help justify the obtained outcome so that human clinicians can judge its validity. In this work, deep learning methods are used to map a frontal X-Ray image to a continuous textual representation. This textual representation is decoded into a diagnosis and the associated textual justification that will help a clinician evaluate the outcome. Additionally, more explanatory data is provided for the diagnosis by generating a realistic X-Ray that belongs to the nearest alternative diagnosis. With a clinical expert opinion study on a subset of the X-Ray data set from the Indiana University hospital network, we demonstrate that our justification mechanism significantly outperforms existing methods that use saliency maps. While performing multi-task training with multiple loss functions, our method achieves excellent diagnosis accuracy and captioning quality when compared to current state-of-the-art single-task methods.
Tasks Medical Diagnosis
Published 2019-07-12
URL https://arxiv.org/abs/1907.05671v1
PDF https://arxiv.org/pdf/1907.05671v1.pdf
PWC https://paperswithcode.com/paper/justifying-diagnosis-decisions-by-deep-neural
Repo
Framework
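
A minimal sketch of the multi-task objective implied by the abstract: one loss for the diagnosis label and one for the justification caption, optimized jointly. The loss weights are hypothetical.

```python
import torch
import torch.nn.functional as F

def multitask_loss(diag_logits, diag_target, caption_logits, caption_tokens,
                   w_diag=1.0, w_caption=1.0):
    """diag_logits: (B, n_classes); caption_logits: (B, T, vocab)."""
    loss_diag = F.cross_entropy(diag_logits, diag_target)
    loss_caption = F.cross_entropy(
        caption_logits.reshape(-1, caption_logits.size(-1)),
        caption_tokens.reshape(-1))
    return w_diag * loss_diag + w_caption * loss_caption

loss = multitask_loss(torch.randn(2, 5), torch.tensor([1, 3]),
                      torch.randn(2, 7, 100), torch.randint(0, 100, (2, 7)))
```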

A Light Dual-Task Neural Network for Haze Removal

Title A Light Dual-Task Neural Network for Haze Removal
Authors Yu Zhang, Xinchao Wang, Xiaojun Bi, Dacheng Tao
Abstract Single-image dehazing is a challenging problem due to its ill-posed nature. Existing methods rely on a suboptimal two-step approach, where an intermediate product like a depth map is estimated, based on which the haze-free image is subsequently generated using an artificial prior formula. In this paper, we propose a light dual-task neural network called LDTNet that restores the haze-free image in one shot. We use transmission map estimation as an auxiliary task to assist the main task, haze removal, in feature extraction and to enhance the generalization of the network. In LDTNet, the haze-free image and the transmission map are produced simultaneously. As a result, reliance on the artificial prior is reduced to a minimum. Extensive experiments demonstrate that our algorithm achieves superior performance over state-of-the-art methods on both synthetic and real-world images.
Tasks Image Dehazing, Single Image Dehazing
Published 2019-04-12
URL http://arxiv.org/abs/1904.06024v1
PDF http://arxiv.org/pdf/1904.06024v1.pdf
PWC https://paperswithcode.com/paper/a-light-dual-task-neural-network-for-haze
Repo
Framework
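
A hedged sketch of the dual-task idea (not the published LDTNet architecture): a shared backbone with two heads that produce the haze-free image and the transmission map in one forward pass.

```python
import torch
import torch.nn as nn

class DualTaskDehazer(nn.Module):
    def __init__(self):
        super().__init__()
        self.shared = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU())
        self.image_head = nn.Conv2d(32, 3, 3, padding=1)          # main task
        self.transmission_head = nn.Conv2d(32, 1, 3, padding=1)   # auxiliary task

    def forward(self, hazy):
        f = self.shared(hazy)  # both outputs come from one shared pass
        return torch.sigmoid(self.image_head(f)), torch.sigmoid(self.transmission_head(f))

clean, transmission = DualTaskDehazer()(torch.rand(1, 3, 64, 64))
```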

A Layered Architecture for Active Perception: Image Classification using Deep Reinforcement Learning

Title A Layered Architecture for Active Perception: Image Classification using Deep Reinforcement Learning
Authors Hossein K. Mousavi, Guangyi Liu, Weihang Yuan, Martin Takáč, Héctor Muñoz-Avila, Nader Motee
Abstract We propose a planning and perception mechanism for a robot (agent) that can only partially observe the underlying environment, in order to solve an image classification problem. We suggest a three-layer architecture consisting of a meta-layer that decides the intermediate goals, an action-layer that selects local actions as the agent navigates towards a goal, and a classification-layer that evaluates the reward and makes a prediction. We design and implement these layers using deep reinforcement learning. A generalized policy gradient algorithm is utilized to learn the parameters of these layers to maximize the expected reward. Our proposed methodology is tested on the MNIST dataset of handwritten digits, which provides us with a level of explainability when interpreting the agent’s intermediate goals and course of action.
Tasks Image Classification
Published 2019-09-20
URL https://arxiv.org/abs/1909.09705v1
PDF https://arxiv.org/pdf/1909.09705v1.pdf
PWC https://paperswithcode.com/paper/190909705
Repo
Framework
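
A hedged sketch of the three-layer control loop: the meta-layer picks intermediate goals, the action-layer navigates toward them, and the classification-layer makes the final prediction. The `choose_goal`/`step_toward`/`classify` interfaces are hypothetical stand-ins for the learned policies.

```python
def run_episode(meta_layer, action_layer, classifier, env, max_steps=20):
    observation = env.reset()                # partial view of the image
    goal = meta_layer.choose_goal(observation)
    for _ in range(max_steps):
        action = action_layer.step_toward(goal, observation)
        observation, reached = env.step(action)
        if reached:                          # re-plan once a goal is hit
            goal = meta_layer.choose_goal(observation)
    return classifier.classify(observation)  # final prediction earns the reward
```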

Embodied Vision-and-Language Navigation with Dynamic Convolutional Filters

Title Embodied Vision-and-Language Navigation with Dynamic Convolutional Filters
Authors Federico Landi, Lorenzo Baraldi, Massimiliano Corsini, Rita Cucchiara
Abstract In Vision-and-Language Navigation (VLN), an embodied agent needs to reach a target destination with a natural language instruction as its only guidance. To explore the environment and progress towards the target location, the agent must perform a series of low-level actions, such as rotating, before stepping ahead. In this paper, we propose to exploit dynamic convolutional filters to efficiently encode the visual information and the linguistic description. Differently from some previous works that abstract away from the agent perspective and use high-level navigation spaces, we design a policy which decodes the information provided by dynamic convolution into a series of low-level, agent-friendly actions. Results show that our model exploiting dynamic filters performs better than other architectures with traditional convolution, establishing the new state of the art for embodied VLN in the low-level action space. Additionally, we attempt to categorize recent work on VLN according to architectural choices and distinguish two main groups: we call them low-level actions and high-level actions models. To the best of our knowledge, we are the first to propose this analysis and categorization for VLN.
Tasks
Published 2019-07-05
URL https://arxiv.org/abs/1907.02985v2
PDF https://arxiv.org/pdf/1907.02985v2.pdf
PWC https://paperswithcode.com/paper/embodied-vision-and-language-navigation-with
Repo
Framework
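
A minimal sketch of dynamic convolutional filters: filter weights are generated from the instruction embedding and convolved with the visual features, so the same network responds differently per instruction. The shapes and the 1x1 filter choice are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConv(nn.Module):
    def __init__(self, text_dim=256, channels=64, n_filters=8):
        super().__init__()
        self.channels, self.n_filters = channels, n_filters
        self.generator = nn.Linear(text_dim, n_filters * channels)  # 1x1 filters

    def forward(self, visual_feat, text_emb):
        """visual_feat: (B, C, H, W); text_emb: (B, text_dim)."""
        b = visual_feat.size(0)
        filters = self.generator(text_emb).view(
            b * self.n_filters, self.channels, 1, 1)
        # grouped conv applies each sample's own filters to its own features
        out = F.conv2d(visual_feat.view(1, -1, *visual_feat.shape[2:]),
                       filters, groups=b)
        return out.view(b, self.n_filters, *visual_feat.shape[2:])

response = DynamicConv()(torch.randn(2, 64, 7, 7), torch.randn(2, 256))
```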

Tackling Partial Domain Adaptation with Self-Supervision

Title Tackling Partial Domain Adaptation with Self-Supervision
Authors Silvia Bucci, Antonio D’Innocente, Tatiana Tommasi
Abstract Domain adaptation approaches have shown promising results in reducing the marginal distribution difference among visual domains. They make it possible to train reliable models that work over datasets of a different nature (photos, paintings, etc.), but they still struggle when the domains do not share an identical label space. In the partial domain adaptation setting, where the target covers only a subset of the source classes, it is challenging to reduce the domain gap without incurring negative transfer. Many solutions simply keep the standard domain adaptation techniques and add heuristic sample-weighting strategies. In this work we show how the self-supervisory signal obtained from the spatial co-location of patches can be used to define a side task that supports adaptation regardless of the exact label-sharing condition across domains. We build on a recent work that introduced a jigsaw puzzle task for domain generalization: we describe how to reformulate this approach for partial domain adaptation and we show how it boosts existing adaptive solutions when combined with them. The experimental results obtained on three datasets support the effectiveness of our approach.
Tasks Domain Adaptation, Domain Generalization, Partial Domain Adaptation
Published 2019-06-12
URL https://arxiv.org/abs/1906.05199v1
PDF https://arxiv.org/pdf/1906.05199v1.pdf
PWC https://paperswithcode.com/paper/tackling-partial-domain-adaptation-with-self
Repo
Framework
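
A hedged sketch of the combined objective: supervised classification on labeled source data plus the self-supervised jigsaw loss on both domains, giving an adaptation signal that does not depend on a shared label space. The weight alpha is a hypothetical hyperparameter.

```python
import torch
import torch.nn.functional as F

def partial_da_loss(class_logits_src, labels_src,
                    jigsaw_logits_src, perm_src,
                    jigsaw_logits_tgt, perm_tgt, alpha=0.7):
    cls = F.cross_entropy(class_logits_src, labels_src)      # source labels only
    jig = (F.cross_entropy(jigsaw_logits_src, perm_src) +
           F.cross_entropy(jigsaw_logits_tgt, perm_tgt))     # both domains
    return cls + alpha * jig

loss = partial_da_loss(torch.randn(4, 10), torch.randint(0, 10, (4,)),
                       torch.randn(4, 30), torch.randint(0, 30, (4,)),
                       torch.randn(4, 30), torch.randint(0, 30, (4,)))
```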

End-to-End Time-Lapse Video Synthesis from a Single Outdoor Image

Title End-to-End Time-Lapse Video Synthesis from a Single Outdoor Image
Authors Seonghyeon Nam, Chongyang Ma, Menglei Chai, William Brendel, Ning Xu, Seon Joo Kim
Abstract Time-lapse videos usually contain visually appealing content but are often difficult and costly to create. In this paper, we present an end-to-end solution to synthesize a time-lapse video from a single outdoor image using deep neural networks. Our key idea is to train a conditional generative adversarial network based on existing datasets of time-lapse videos and image sequences. We propose a multi-frame joint conditional generation framework to effectively learn the correlation between the illumination change of an outdoor scene and the time of day. We further present a multi-domain training scheme for robust training of our generative models from two datasets with different distributions and missing timestamp labels. Compared to alternative time-lapse video synthesis algorithms, our method uses the timestamp as the control variable and does not require a reference video to guide the synthesis of the final output. We conduct ablation studies to validate our algorithm and compare with state-of-the-art techniques both qualitatively and quantitatively.
Tasks
Published 2019-04-01
URL http://arxiv.org/abs/1904.00680v1
PDF http://arxiv.org/pdf/1904.00680v1.pdf
PWC https://paperswithcode.com/paper/end-to-end-time-lapse-video-synthesis-from-a
Repo
Framework
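
A minimal sketch of using the timestamp as the control variable, assuming a cyclic time-of-day encoding concatenated with the image code before decoding; this illustrates the conditioning idea, not the paper's generator.

```python
import math
import torch

def timestamp_embedding(hour_of_day):
    """Cyclic encoding so hour 23 stays close to hour 0."""
    angle = 2.0 * math.pi * hour_of_day / 24.0
    return torch.tensor([math.sin(angle), math.cos(angle)])

def condition(image_code, hour_of_day):
    return torch.cat([image_code, timestamp_embedding(hour_of_day)])

z = condition(torch.randn(128), hour_of_day=18)  # code fed to the frame decoder
```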

Online control of the familywise error rate

Title Online control of the familywise error rate
Authors Jinjin Tian, Aaditya Ramdas
Abstract Biological research often involves testing a growing number of null hypotheses as new data is accumulated over time. We study the problem of online control of the familywise error rate (FWER), that is, testing an a priori unbounded sequence of hypotheses (p-values) one by one over time without knowing the future, such that with high probability there are no false discoveries in the entire sequence. This paper unifies algorithmic concepts developed for offline (single-batch) FWER control and online false discovery rate control to develop novel online FWER control methods. Though many offline FWER methods (e.g., Bonferroni, fallback procedures, and Sidak’s method) can trivially be extended to the online setting, our main contribution is the design of new, powerful, adaptive online algorithms that control the FWER when the p-values are independent or locally dependent in time. Our experiments demonstrate substantial gains in power, which are also formally proved in a Gaussian sequence model.
Tasks
Published 2019-10-10
URL https://arxiv.org/abs/1910.04900v2
PDF https://arxiv.org/pdf/1910.04900v2.pdf
PWC https://paperswithcode.com/paper/online-control-of-the-familywise-error-rate
Repo
Framework
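
For reference, the trivially-extended online Bonferroni baseline the abstract mentions fits in a few lines: spend the budget alpha over the infinite sequence with weights summing to one (here proportional to 1/t^2) and reject H_t when p_t <= alpha * w_t. The paper's adaptive algorithms are more powerful; this sketch only illustrates the online FWER setting.

```python
import math

def online_bonferroni(p_values, alpha=0.05):
    """Reject H_t if p_t <= alpha * w_t with sum_t w_t = 1 (w_t = 6 / (pi^2 t^2)),
    so total error spending never exceeds alpha by the union bound."""
    rejections = []
    for t, p in enumerate(p_values, start=1):
        budget_t = alpha * (6.0 / (math.pi ** 2 * t ** 2))
        rejections.append(p <= budget_t)
    return rejections

print(online_bonferroni([0.0001, 0.2, 0.003, 0.5]))
```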

A Content-Based Approach to Email Triage Action Prediction: Exploration and Evaluation

Title A Content-Based Approach to Email Triage Action Prediction: Exploration and Evaluation
Authors Sudipto Mukherjee, Ke Jiang
Abstract Email has remained a principal form of communication among people, both in enterprise and social settings. With a deluge of emails crowding our mailboxes daily, there is a dire need for smart email systems that can recover important emails and make personalized recommendations. In this work, we study the problem of predicting user triage actions on incoming emails, taking reply prediction as a working example. Different from existing methods, we formulate triage action prediction as a recommendation problem and focus on a content-based approach, where users are represented using the content of current and past emails. We also introduce additional similarity features to further explore the affinities between users and emails. Experiments on the publicly available Avocado email collection demonstrate the advantages of our proposed recommendation framework, and our method achieves better performance than state-of-the-art deep recommendation methods. More importantly, we provide valuable insight into the effectiveness of different textual and user representations and show that traditional bag-of-words approaches, with help from the similarity features, compete favorably with more advanced neural embedding methods.
Tasks
Published 2019-04-30
URL http://arxiv.org/abs/1905.01991v1
PDF http://arxiv.org/pdf/1905.01991v1.pdf
PWC https://paperswithcode.com/paper/190501991
Repo
Framework
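
A hedged sketch of one content-based similarity feature: represent the user by the TF-IDF bag-of-words of past emails and score an incoming email by cosine similarity. The toy emails and the mean-pooled profile are illustrative assumptions.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

past_emails = ["quarterly report attached for review",
               "notes from this morning's standup meeting"]
incoming = ["please review the quarterly numbers"]

vectorizer = TfidfVectorizer()
# user profile = mean TF-IDF vector of past emails (one simple choice)
profile = np.asarray(vectorizer.fit_transform(past_emails).mean(axis=0))
score = cosine_similarity(vectorizer.transform(incoming), profile)[0, 0]
```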

Reinforced Bit Allocation under Task-Driven Semantic Distortion Metrics

Title Reinforced Bit Allocation under Task-Driven Semantic Distortion Metrics
Authors Jun Shi, Zhibo Chen
Abstract Rapidly growing intelligent applications require optimized bit allocation in image/video coding to support specific task-driven scenarios such as detection, classification, and segmentation. Some learning-based frameworks have been proposed for this purpose due to their inherent end-to-end optimization mechanisms. However, it is still quite challenging to integrate these task-driven metrics seamlessly into the traditional hybrid coding framework. To the best of our knowledge, this paper is the first work to solve this challenge with a reinforcement learning (RL) approach. Specifically, we formulate the bit allocation problem as a Markov decision process (MDP) and train RL agents to automatically decide the quantization parameter (QP) of each coding tree unit (CTU) for HEVC intra coding, according to the task-driven semantic distortion metrics. This bit allocation scheme can maximize the semantic-level fidelity of the task, such as classification accuracy, while minimizing the bit-rate. We also employ gradient class activation map (Grad-CAM) and Mask R-CNN tools to extract task-related importance maps to help the agents make decisions. Extensive experimental results demonstrate the superior performance of our approach, achieving 43.1% to 73.2% bit-rate savings over the HEVC anchor under equivalent task-related distortions.
Tasks Quantization
Published 2019-10-16
URL https://arxiv.org/abs/1910.07392v2
PDF https://arxiv.org/pdf/1910.07392v2.pdf
PWC https://paperswithcode.com/paper/reinforced-bit-allocation-under-task-driven
Repo
Framework
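
A hedged sketch of the per-CTU decision loop implied by the MDP formulation: observe CTU features plus the task-importance map, pick a QP, and receive a reward trading off bits against semantic distortion. The agent/codec interfaces and the weight lam are hypothetical stand-ins.

```python
def allocate_bits(agent, codec, ctus, importance_map, lam=1.0):
    total_reward = 0.0
    for ctu in ctus:
        state = (ctu.features, importance_map[ctu.index])  # includes importance
        qp = agent.select_qp(state)                # action: quantization parameter
        bits, semantic_distortion = codec.encode(ctu, qp)
        total_reward += -(bits + lam * semantic_distortion)  # rate vs. task fidelity
    return total_reward
```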

Training Object Detectors With Noisy Data

Title Training Object Detectors With Noisy Data
Authors Simon Chadwick, Paul Newman
Abstract The availability of a large quantity of labelled training data is crucial for the training of modern object detectors. Hand-labelling training data is time-consuming and expensive, while automatic labelling methods inevitably add unwanted noise to the labels. We examine the effect of different types of label noise on the performance of an object detector. We then show how co-teaching, a method developed for handling noisy labels and previously demonstrated on a classification problem, can be improved to mitigate the effects of label noise in an object detection setting. We illustrate our results using simulated noise on the KITTI dataset and on a vehicle detection task using automatically labelled data.
Tasks Object Detection
Published 2019-05-17
URL https://arxiv.org/abs/1905.07202v1
PDF https://arxiv.org/pdf/1905.07202v1.pdf
PWC https://paperswithcode.com/paper/training-object-detectors-with-noisy-data
Repo
Framework
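
A minimal sketch of the co-teaching selection step the paper builds on: two networks each keep the small-loss fraction of a batch (presumed clean) and each is trained on the samples chosen by its peer. The keep fraction is a hyperparameter.

```python
import torch

def co_teaching_step(loss_a, loss_b, keep_fraction=0.8):
    """loss_a, loss_b: per-sample losses (B,) from the two networks.
    Returns the loss each network should backpropagate (peer's picks)."""
    k = max(1, int(keep_fraction * loss_a.numel()))
    idx_a = torch.argsort(loss_a)[:k]   # small-loss samples net A trusts
    idx_b = torch.argsort(loss_b)[:k]   # small-loss samples net B trusts
    return loss_a[idx_b].mean(), loss_b[idx_a].mean()

step_a, step_b = co_teaching_step(torch.rand(16), torch.rand(16))
```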