January 29, 2020

2960 words 14 mins read

Paper Group ANR 518

Self-supervised Data Bootstrapping for Deep Optical Character Recognition of Identity Documents

Title Self-supervised Data Bootstrapping for Deep Optical Character Recognition of Identity Documents
Authors Oliver Mothes, Joachim Denzler
Abstract The essential task of verifying person identities at airports and national borders is very time-consuming. To accelerate it, optical character recognition for identity documents (IDs) using dictionaries is not appropriate due to the high variability of the text content in IDs, e.g., individual street names or surnames. Additionally, no properties of the fonts used in IDs are known. Therefore, we propose an iterative self-supervised bootstrapping approach using a smart strategy to mine real character data from IDs. In combination with synthetically generated character data, the real data is used to train efficient convolutional neural networks for character classification, achieving both a practical runtime and high accuracy. On a dataset with 74 character classes, we achieve an average class-wise accuracy of 99.4 %. In contrast, a classifier trained only on synthetic data reaches just 58.1 %. Finally, we show that our whole proposed pipeline outperforms an established open-source framework.
Tasks Optical Character Recognition
Published 2019-08-12
URL https://arxiv.org/abs/1908.04027v1
PDF https://arxiv.org/pdf/1908.04027v1.pdf
PWC https://paperswithcode.com/paper/self-supervised-data-bootstrapping-for-deep
Repo
Framework
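
As a rough illustration of the bootstrapping idea in the abstract above, the sketch below alternates between training a character classifier and mining confidently classified real character crops into the training set. It is a minimal self-training loop, not the paper's pipeline; the names (`synthetic_X`, `real_X`), the logistic-regression classifier and the confidence threshold are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def bootstrap_ocr(synthetic_X, synthetic_y, real_X, rounds=5, threshold=0.95):
    """Self-training sketch: start from synthetic characters, then repeatedly
    add confidently classified real character crops to the training set."""
    train_X, train_y = synthetic_X, synthetic_y
    clf = LogisticRegression(max_iter=1000)
    for _ in range(rounds):
        clf.fit(train_X, train_y)
        proba = clf.predict_proba(real_X)
        conf, pred = proba.max(axis=1), proba.argmax(axis=1)
        keep = conf >= threshold                     # mine only confident real samples
        if not keep.any():
            break
        train_X = np.vstack([synthetic_X, real_X[keep]])
        train_y = np.concatenate([synthetic_y, clf.classes_[pred[keep]]])
    return clf
```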

Bayesian Curiosity for Efficient Exploration in Reinforcement Learning

Title Bayesian Curiosity for Efficient Exploration in Reinforcement Learning
Authors Tom Blau, Lionel Ott, Fabio Ramos
Abstract Balancing exploration and exploitation is a fundamental part of reinforcement learning, yet most state-of-the-art algorithms use a naive exploration protocol like $\epsilon$-greedy. This contributes to the problem of high sample complexity, as the algorithm wastes effort by repeatedly visiting parts of the state space that have already been explored. We introduce a novel method based on Bayesian linear regression and latent space embedding to generate an intrinsic reward signal that encourages the learning agent to seek out unexplored parts of the state space. This method is computationally efficient, simple to implement, and can extend any state-of-the-art reinforcement learning algorithm. We evaluate the method on a range of algorithms and challenging control tasks, on both simulated and physical robots, demonstrating how the proposed method can significantly improve sample complexity.
Tasks Efficient Exploration
Published 2019-11-20
URL https://arxiv.org/abs/1911.08701v1
PDF https://arxiv.org/pdf/1911.08701v1.pdf
PWC https://paperswithcode.com/paper/bayesian-curiosity-for-efficient-exploration
Repo
Framework
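
The intrinsic-reward mechanism can be pictured with a small Bayesian linear regression over a fixed state embedding: states whose features lie in rarely visited regions have high predictive variance, and that variance serves as a curiosity bonus. This is a hedged sketch under standard Bayesian linear-regression assumptions; the embedding, regression targets and hyper-parameters (`alpha`, `beta`) are illustrative, not the paper's.

```python
import numpy as np

class BayesianCuriosity:
    """Predictive variance of a Bayesian linear model used as a novelty bonus."""
    def __init__(self, dim, alpha=1.0, beta=1.0):
        self.beta = beta                      # observation noise precision
        self.A = alpha * np.eye(dim)          # posterior precision matrix
        self.b = np.zeros(dim)

    def update(self, phi, target):
        self.A += self.beta * np.outer(phi, phi)
        self.b += self.beta * target * phi

    def intrinsic_reward(self, phi):
        # variance is large in rarely visited regions of the feature space
        return 1.0 / self.beta + phi @ np.linalg.solve(self.A, phi)
```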

Attention-based Fusion for Multi-source Human Image Generation

Title Attention-based Fusion for Multi-source Human Image Generation
Authors Stéphane Lathuilière, Enver Sangineto, Aliaksandr Siarohin, Nicu Sebe
Abstract We present a generalization of the person-image generation task, in which a human image is generated conditioned on a target pose and a set X of source appearance images. In this way, we can exploit multiple, possibly complementary images of the same person which are usually available at training and at testing time. The solution we propose is mainly based on a local attention mechanism which selects relevant information from different source image regions, avoiding the necessity to build specific generators for each specific cardinality of X. The empirical evaluation of our method shows the practical interest of addressing the person-image generation problem in a multi-source setting.
Tasks Image Generation
Published 2019-05-07
URL https://arxiv.org/abs/1905.02655v1
PDF https://arxiv.org/pdf/1905.02655v1.pdf
PWC https://paperswithcode.com/paper/attention-based-fusion-for-multi-source-human
Repo
Framework
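
A minimal picture of attention-based fusion over a variable-size source set X: per-source, per-location attention scores are softmax-normalised over the K sources and used to mix the source feature maps, so the same module handles any cardinality of X. The shapes and the way the attention logits are produced are assumptions for illustration; the paper's generator is not reproduced here.

```python
import numpy as np

def attention_fuse(source_feats, attention_logits):
    """source_feats: (K, C, H, W) features from K source images;
    attention_logits: (K, H, W) per-source, per-location scores."""
    w = np.exp(attention_logits - attention_logits.max(axis=0, keepdims=True))
    w /= w.sum(axis=0, keepdims=True)                      # softmax over the K sources
    return (w[:, None, :, :] * source_feats).sum(axis=0)   # fused (C, H, W) map
```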

Towards non-toxic landscapes: Automatic toxic comment detection using DNN

Title Towards non-toxic landscapes: Automatic toxic comment detection using DNN
Authors Ashwin Geet D’Sa, Irina Illina, Dominique Fohr
Abstract The rapid expansion of the Internet, together with the fact that many countries prohibit hate speech in public media, has given rise to a new research problem in natural language processing: automatic toxic comment detection. There is no clear and formal definition of hate, offensive, toxic and abusive speech. In this article, we put all these terms under the “umbrella” of toxic speech. The contribution of this paper is the design of binary classification and regression-based approaches aiming to predict whether a comment is toxic or not. We compare different unsupervised word representations and different DNN classifiers. Moreover, we study the robustness of the proposed approaches to adversarial attacks that add a single (healthy or toxic) word. We evaluate the proposed methodology on the English Wikipedia Detox corpus. Our experiments show that BERT fine-tuning outperforms feature-based BERT, Mikolov’s word embeddings and fastText representations with different DNN classifiers.
Tasks
Published 2019-11-19
URL https://arxiv.org/abs/1911.08395v1
PDF https://arxiv.org/pdf/1911.08395v1.pdf
PWC https://paperswithcode.com/paper/towards-non-toxic-landscapes-automatic-toxic
Repo
Framework
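
Since the abstract reports BERT fine-tuning as the strongest configuration, a minimal fine-tuning step for binary toxicity classification might look like the sketch below, using the Hugging Face `transformers` API. The model name, learning rate and training loop are illustrative defaults, not the paper's exact setup.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
optim = torch.optim.AdamW(model.parameters(), lr=2e-5)

def train_step(comments, labels):
    """One fine-tuning step on a batch of raw comment strings and 0/1 labels."""
    batch = tok(comments, padding=True, truncation=True, return_tensors="pt")
    out = model(**batch, labels=torch.tensor(labels))   # cross-entropy loss built in
    out.loss.backward()
    optim.step()
    optim.zero_grad()
    return out.loss.item()
```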

Multimodal deep networks for text and image-based document classification

Title Multimodal deep networks for text and image-based document classification
Authors Nicolas Audebert, Catherine Herold, Kuider Slimani, Cédric Vidal
Abstract Classification of document images is a critical step for the archival of old manuscripts, online subscription and administrative procedures. Computer vision and deep learning have been suggested as a first solution to classify documents based on their visual appearance. However, the fine-grained classification required in real-world settings cannot be achieved by visual analysis alone. Often, the relevant information is in the actual text content of the document. We design a multimodal neural network that is able to learn from word embeddings, computed on text extracted by OCR, and from the image. We show that this approach boosts pure image accuracy by 3% on Tobacco3482 and RVL-CDIP augmented by our new QS-OCR text dataset (https://github.com/Quicksign/ocrized-text-dataset), even without clean text information.
Tasks Document Classification, Optical Character Recognition, Word Embeddings
Published 2019-07-15
URL https://arxiv.org/abs/1907.06370v1
PDF https://arxiv.org/pdf/1907.06370v1.pdf
PWC https://paperswithcode.com/paper/multimodal-deep-networks-for-text-and-image
Repo
Framework
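
A toy version of the multimodal fusion described above: an image feature vector from a visual backbone is concatenated with an averaged word-embedding vector of the OCR'd text and fed to a small classification head. The dimensions, the bag-of-embeddings pooling and the 16-way output are illustrative assumptions rather than the paper's architecture.

```python
import torch
import torch.nn as nn

class MultimodalDocClassifier(nn.Module):
    """Concatenate an image feature vector with pooled word embeddings of the
    OCR text, then classify the document."""
    def __init__(self, img_dim=512, txt_dim=300, n_classes=16):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(img_dim + txt_dim, 256), nn.ReLU(),
            nn.Linear(256, n_classes),
        )

    def forward(self, img_feat, word_embeds):
        txt_feat = word_embeds.mean(dim=1)     # (B, txt_dim) bag-of-embeddings
        return self.head(torch.cat([img_feat, txt_feat], dim=1))
```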

Compositional pre-training for neural semantic parsing

Title Compositional pre-training for neural semantic parsing
Authors Amir Ziai
Abstract Semantic parsing is the process of translating natural language utterances into logical forms, with many important applications such as question answering and instruction following. Sequence-to-sequence models have been very successful across many NLP tasks. However, a lack of task-specific prior knowledge can be detrimental to the performance of these models. Prior work has used frameworks for inducing grammars over the training examples, which capture conditional independence properties that the model can leverage. Inspired by recent success stories such as BERT, we set out to extend this augmentation framework into two stages. The first stage is to pre-train on a corpus of augmented examples in an unsupervised manner. The second stage is to fine-tune on a domain-specific task. In addition, since the pre-training stage is separate from training on the main task, we can also expand the universe of possible augmentations without causing catastrophic interference. We also propose a novel data augmentation strategy that interchanges tokens that co-occur in similar contexts to produce new training pairs. We demonstrate that the proposed two-stage framework improves parsing accuracy on GeoQuery, a standard dataset for generating logical forms from questions about US geography.
Tasks Data Augmentation, Question Answering, Semantic Parsing
Published 2019-05-27
URL https://arxiv.org/abs/1905.11531v1
PDF https://arxiv.org/pdf/1905.11531v1.pdf
PWC https://paperswithcode.com/paper/compositional-pre-training-for-neural
Repo
Framework
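
The token-interchange augmentation can be approximated by treating tokens that share at least one left/right context in the corpus as interchangeable, as in the sketch below. This is a simplification under that assumption; the paper's grammar-induction framework is not reproduced.

```python
from collections import defaultdict

def context_similar_pairs(utterances, window=1):
    """Collect (token, token) pairs that share at least one local context."""
    contexts = defaultdict(set)
    for toks in utterances:
        for i, t in enumerate(toks):
            left = tuple(toks[max(0, i - window):i])
            right = tuple(toks[i + 1:i + 1 + window])
            contexts[t].add((left, right))
    tokens = list(contexts)
    return [(a, b) for i, a in enumerate(tokens) for b in tokens[i + 1:]
            if contexts[a] & contexts[b]]

def augment(toks, pairs):
    """Produce a new utterance by swapping interchangeable tokens."""
    swaps = dict(pairs)
    swaps.update({b: a for a, b in pairs})
    return [swaps.get(t, t) for t in toks]
```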

Efficient Exploration through Intrinsic Motivation Learning for Unsupervised Subgoal Discovery in Model-Free Hierarchical Reinforcement Learning

Title Efficient Exploration through Intrinsic Motivation Learning for Unsupervised Subgoal Discovery in Model-Free Hierarchical Reinforcement Learning
Authors Jacob Rafati, David C. Noelle
Abstract Efficient exploration for automatic subgoal discovery is a challenging problem in Hierarchical Reinforcement Learning (HRL). In this paper, we show that intrinsic motivation learning increases the efficiency of exploration, leading to successful subgoal discovery. We introduce a model-free subgoal discovery method based on unsupervised learning over a limited memory of the agent’s experiences collected during intrinsic motivation learning. Additionally, we offer a unified approach to learning representations in model-free HRL.
Tasks Efficient Exploration, Hierarchical Reinforcement Learning
Published 2019-11-18
URL https://arxiv.org/abs/1911.10164v1
PDF https://arxiv.org/pdf/1911.10164v1.pdf
PWC https://paperswithcode.com/paper/efficient-exploration-through-intrinsic
Repo
Framework
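
One common way to realise "unsupervised learning over a limited memory of experiences" is to cluster the visited states and treat the cluster centroids as candidate subgoals; the sketch below assumes exactly that (k-means, with an arbitrary k), which may differ from the paper's actual method.

```python
import numpy as np
from sklearn.cluster import KMeans

def discover_subgoals(experience_states, k=8):
    """Cluster a limited memory of visited states; centroids become subgoal
    candidates for the higher-level policy."""
    km = KMeans(n_clusters=k, n_init=10).fit(np.asarray(experience_states))
    return km.cluster_centers_
```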

SENSE: Semantically Enhanced Node Sequence Embedding

Title SENSE: Semantically Enhanced Node Sequence Embedding
Authors Swati Rallapalli, Liang Ma, Mudhakar Srivatsa, Ananthram Swami, Heesung Kwon, Graham Bent, Christopher Simpkin
Abstract Effectively capturing graph node sequences in the form of vector embeddings is critical to many applications. We achieve this by (i) first learning vector embeddings of single graph nodes and (ii) then composing them to compactly represent node sequences. Specifically, we propose SENSE-S (Semantically Enhanced Node Sequence Embedding - for Single nodes), a novel skip-gram-based embedding mechanism for single graph nodes that co-learns graph structure as well as textual node descriptions. We demonstrate that SENSE-S vectors increase the accuracy of multi-label classification tasks by up to 50% and of link-prediction tasks by up to 78% under a variety of scenarios using real datasets. Based on SENSE-S, we next propose the generic SENSE to compute composite vectors that represent a sequence of nodes, where preserving the node order is important. We prove that this approach is efficient in embedding node sequences, and our experiments on real data confirm its high accuracy in node order decoding.
Tasks Link Prediction, Multi-Label Classification
Published 2019-11-07
URL https://arxiv.org/abs/1911.02970v1
PDF https://arxiv.org/pdf/1911.02970v1.pdf
PWC https://paperswithcode.com/paper/sense-semantically-enhanced-node-sequence-1
Repo
Framework
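
A rough approximation of the SENSE-S co-learning objective is to build skip-gram "sentences" that mix graph-walk neighbourhoods with the words of each node's textual description and train a standard skip-gram model on them. The corpus construction below is an assumption made for illustration; the adjacency dictionary `adj` and the `descriptions` mapping are hypothetical inputs, and the paper's joint objective is not reproduced.

```python
import random
from gensim.models import Word2Vec

def mixed_corpus(adj, descriptions, walks_per_node=10, walk_len=5):
    """Build skip-gram sentences from both random walks (structure) and
    node description words (text)."""
    corpus = []
    for node in adj:
        corpus.append([str(node)] + descriptions.get(node, "").lower().split())
        for _ in range(walks_per_node):
            walk, cur = [str(node)], node
            for _ in range(walk_len):
                if not adj[cur]:
                    break
                cur = random.choice(adj[cur])
                walk.append(str(cur))
            corpus.append(walk)
    return corpus

# model = Word2Vec(mixed_corpus(adj, descriptions), vector_size=128, window=3, min_count=1)
```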

What can be estimated? Identifiability, estimability, causal inference and ill-posed inverse problems

Title What can be estimated? Identifiability, estimability, causal inference and ill-posed inverse problems
Authors Oliver J. Maclaren, Ruanui Nicholson
Abstract Here we consider, in the context of causal inference, the basic question: ‘what can be estimated from data?’. We call this the question of estimability. We consider the usual definition adopted in the causal inference literature – identifiability – in a general mathematical setting and show why it is an inadequate formal translation of the concept of estimability. Despite showing that identifiability implies the existence of a Fisher-consistent estimator, we show that this estimator may be discontinuous, and hence unstable, in general. The difficulty arises because the causal inference problem is in general an ill-posed inverse problem. Inverse problems have three conditions which must be satisfied in order to be considered well-posed: existence, uniqueness, and stability of solutions. We illustrate how identifiability corresponds to the question of uniqueness; in contrast, we take estimability to mean satisfaction of all three conditions, i.e. well-posedness. It follows that mere identifiability does not guarantee well-posedness of a causal inference procedure, i.e. estimability, and apparent solutions to causal inference problems can be essentially useless with even the smallest amount of imperfection. These concerns apply, in particular, to causal inference approaches that focus on identifiability while ignoring the additional stability requirements needed for estimability.
Tasks Causal Inference
Published 2019-04-04
URL http://arxiv.org/abs/1904.02826v2
PDF http://arxiv.org/pdf/1904.02826v2.pdf
PWC https://paperswithcode.com/paper/what-can-be-estimated-identifiability
Repo
Framework
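
For readers who want the three well-posedness conditions in symbols, a hedged formalisation (with notation that is ours, not the paper's) is:

```latex
% Hadamard's well-posedness conditions, with identifiability as the uniqueness part.
% \Phi maps parameters to observable distributions; notation is illustrative.
\begin{align*}
  &\text{Model map: } \Phi : \theta \mapsto P_\theta \\
  &\text{Existence: every admissible } P \text{ lies in the range of } \Phi \\
  &\text{Uniqueness (identifiability): } \Phi(\theta_1) = \Phi(\theta_2) \implies \theta_1 = \theta_2 \\
  &\text{Stability: } \Phi^{-1} \text{ is continuous, i.e. } P_n \to P \implies \Phi^{-1}(P_n) \to \Phi^{-1}(P)
\end{align*}
```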

Last-iterate convergence rates for min-max optimization

Title Last-iterate convergence rates for min-max optimization
Authors Jacob Abernethy, Kevin A. Lai, Andre Wibisono
Abstract While classic work in convex-concave min-max optimization relies on average-iterate convergence results, the emergence of nonconvex applications such as training Generative Adversarial Networks has led to renewed interest in last-iterate convergence guarantees. Proving last-iterate convergence is challenging because many natural algorithms, such as Simultaneous Gradient Descent/Ascent, provably diverge or cycle even in simple convex-concave min-max settings, and previous work on global last-iterate convergence rates has been limited to the bilinear and convex-strongly concave settings. In this work, we show that the Hamiltonian Gradient Descent (HGD) algorithm achieves linear convergence in a variety of more general settings, including convex-concave problems that satisfy a “sufficiently bilinear” condition. We also prove similar convergence rates for the Consensus Optimization (CO) algorithm of [MNG17] for some parameter settings of CO.
Tasks
Published 2019-06-05
URL https://arxiv.org/abs/1906.02027v3
PDF https://arxiv.org/pdf/1906.02027v3.pdf
PWC https://paperswithcode.com/paper/last-iterate-convergence-rates-for-min-max
Repo
Framework
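
Hamiltonian Gradient Descent can be summarised as plain gradient descent on the squared gradient norm H(x, y) = 0.5 * (||grad_x f||^2 + ||grad_y f||^2). The sketch below implements one HGD step with automatic differentiation and runs it on a toy bilinear saddle where simultaneous gradient descent/ascent would cycle; the step size and toy objective are illustrative choices, not the paper's experiments.

```python
import torch

def hgd_step(f, x, y, lr=0.1):
    """One Hamiltonian Gradient Descent step: descend on the squared gradient norm."""
    gx, gy = torch.autograd.grad(f(x, y), (x, y), create_graph=True)
    H = 0.5 * (gx.pow(2).sum() + gy.pow(2).sum())
    hx, hy = torch.autograd.grad(H, (x, y))
    with torch.no_grad():
        x -= lr * hx
        y -= lr * hy
    return H.item()

# toy bilinear saddle f(x, y) = x * y: HGD drives (x, y) toward the saddle at the origin
x = torch.tensor([1.0], requires_grad=True)
y = torch.tensor([1.0], requires_grad=True)
for _ in range(100):
    hgd_step(lambda a, b: (a * b).sum(), x, y)
```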

Meta-Learning PAC-Bayes Priors in Model Averaging

Title Meta-Learning PAC-Bayes Priors in Model Averaging
Authors Yimin Huang, Weiran Huang, Liang Li, Zhenguo Li
Abstract Nowadays, model uncertainty has become one of the most important problems in both academia and industry. In this paper, we consider the scenario in which a common model set is used for model averaging, instead of selecting a single final model via a model selection procedure, in order to account for model uncertainty and improve the reliability and accuracy of inferences. Here, one main challenge is to learn the prior over the model set. To tackle this problem, we propose two data-based algorithms for obtaining proper priors for model averaging. One is for the meta-learner: the analyst uses historical, similar tasks to extract information about the prior. The other is for the base-learner: a subsampling method is used to process the data step by step. Theoretically, an upper bound on the risk of our algorithm is presented to guarantee worst-case performance. In practice, both methods perform well in simulations and real data studies, especially with poor-quality data.
Tasks Meta-Learning, Model Selection
Published 2019-12-24
URL https://arxiv.org/abs/1912.11252v2
PDF https://arxiv.org/pdf/1912.11252v2.pdf
PWC https://paperswithcode.com/paper/meta-learning-pac-bayes-priors-in-model
Repo
Framework
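
Model averaging with a learned prior can be pictured, very roughly, as weighting each candidate model by its prior mass times a data-fit term. The sketch below assumes exponential down-weighting by a loss, which is only a stand-in for the paper's PAC-Bayes machinery; how the prior itself is meta-learned from historical tasks is not shown.

```python
import numpy as np

def model_average(predictions, losses, prior):
    """predictions: (M, N) per-model predictions; losses: length-M fit terms;
    prior: length-M prior weights over the model set."""
    w = np.asarray(prior) * np.exp(-np.asarray(losses))
    w /= w.sum()                                   # posterior-style model weights
    return np.tensordot(w, np.asarray(predictions), axes=1)
```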

Semi-supervised Skin Detection by Network with Mutual Guidance

Title Semi-supervised Skin Detection by Network with Mutual Guidance
Authors Yi He, Jiayuan Shi, Chuan Wang, Haibin Huang, Jiaming Liu, Guanbin Li, Risheng Liu, Jue Wang
Abstract In this paper we present a new data-driven method for robust skin detection from a single human portrait image. Unlike previous methods, we incorporate the human body as weak semantic guidance for this task, considering that acquiring large-scale human-labeled skin data is commonly expensive and time-consuming. To be specific, we propose a dual-task neural network for joint detection of skin and body via a semi-supervised learning strategy. The dual-task network contains a shared encoder but two decoders for skin and body separately. For each decoder, its output also serves as a guidance for its counterpart, making both decoders mutually guided. Extensive experiments demonstrate the effectiveness of our network with mutual guidance, and the results show that it outperforms the state-of-the-art in skin detection.
Tasks
Published 2019-08-06
URL https://arxiv.org/abs/1908.01977v1
PDF https://arxiv.org/pdf/1908.01977v1.pdf
PWC https://paperswithcode.com/paper/semi-supervised-skin-detection-by-network
Repo
Framework
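
The shared-encoder, dual-decoder layout with mutual guidance can be sketched as below: a first pass produces coarse skin and body maps, and each map is then fed back as an extra guidance channel to the other decoder. The layer sizes and the two-pass scheme are illustrative simplifications of the paper's network, not its actual architecture.

```python
import torch
import torch.nn as nn

class MutualGuidanceNet(nn.Module):
    """Shared encoder with two decoders; each decoder receives the other's
    prediction as a guidance channel on the second pass."""
    def __init__(self, ch=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Conv2d(3, ch, 3, padding=1), nn.ReLU())
        self.skin_dec = nn.Conv2d(ch + 1, 1, 3, padding=1)   # +1 guidance channel
        self.body_dec = nn.Conv2d(ch + 1, 1, 3, padding=1)

    def forward(self, img):
        feat = self.encoder(img)
        zero = torch.zeros_like(feat[:, :1])
        skin = torch.sigmoid(self.skin_dec(torch.cat([feat, zero], 1)))
        body = torch.sigmoid(self.body_dec(torch.cat([feat, zero], 1)))
        # second pass: each decoder is guided by its counterpart's prediction
        skin = torch.sigmoid(self.skin_dec(torch.cat([feat, body], 1)))
        body = torch.sigmoid(self.body_dec(torch.cat([feat, skin], 1)))
        return skin, body
```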

Scene-and-Process-Dependent Spatial Image Quality Metrics

Title Scene-and-Process-Dependent Spatial Image Quality Metrics
Authors Edward W. S. Fry, Sophie Triantaphillidou, Robin B. Jenkin, Ralph E. Jacobson, John R. Jarvis
Abstract Spatial image quality metrics designed for camera systems generally employ the Modulation Transfer Function (MTF), the Noise Power Spectrum (NPS), and a visual contrast detection model. Prior art indicates that scene-dependent characteristics of non-linear, content-aware image processing are unaccounted for by MTFs and NPSs measured using traditional methods. We present two novel metrics: the log Noise Equivalent Quanta (log NEQ) and the Visual log NEQ. They both employ scene-and-process-dependent MTF (SPD-MTF) and NPS (SPD-NPS) measures, which account for signal-transfer and noise scene-dependency, respectively. We also investigate implementing contrast detection and discrimination models that account for scene-dependent visual masking, and we revise three leading camera metrics to use the above scene-dependent measures. All metrics are validated by examining correlations with the perceived quality of images produced by simulated camera pipelines. Metric accuracy improved consistently when the SPD-MTFs and SPD-NPSs were implemented, and the novel metrics outperformed existing metrics of the same genre.
Tasks
Published 2019-07-21
URL https://arxiv.org/abs/1907.08926v1
PDF https://arxiv.org/pdf/1907.08926v1.pdf
PWC https://paperswithcode.com/paper/scene-and-process-dependent-spatial-image
Repo
Framework
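
For context, the conventional Noise Equivalent Quanta that the proposed log NEQ metrics build on is usually written as below, with S the mean signal; in the paper's metrics the MTF and NPS would be replaced by their scene-and-process-dependent (SPD) counterparts. This is background notation, not the paper's exact formulation.

```latex
% Conventional NEQ at spatial frequency u, with S the mean (large-area) signal.
\[
  \mathrm{NEQ}(u) \;=\; \frac{S^{2}\,\mathrm{MTF}^{2}(u)}{\mathrm{NPS}(u)}
\]
```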

Automatic Left Atrial Appendage Orifice Detection for Preprocedural Planning of Appendage Closure

Title Automatic Left Atrial Appendage Orifice Detection for Preprocedural Planning of Appendage Closure
Authors Walid Abdullah Al, Il Dong Yun, Eun Ju Chun
Abstract In preoperative planning of left atrial appendage closure (LAAC) with CT angiography, the assessment of the appendage orifice plays a crucial role in choosing an appropriate LAAC device size and a proper C-arm angulation. However, accurate orifice detection is laborious because of the high anatomic variation of the appendage, as well as the unclear orifice position and orientation in the available views. We propose an automatic orifice detection approach that performs a search on the principal medial axis of the appendage, with an efficient iterative algorithm to grow the axis from the appendage into the left atrium. We propose to use the axis-to-surface distance of the appendage for efficient and effective detection. To localize the initial seed needed for growing the medial axis, we train an artificial localization agent using an actor-critic reinforcement learning approach, framing localization as a sequential decision process. The entire detection process takes only about 8 seconds, and the variance of the detected orifice with respect to annotations from two experts is small and below the inter-observer variance. The proposed orifice search on the medial axis of the appendage, comparing only its distance from the surface, provides a simple yet robust solution for orifice detection. Being the first fully automatic approach and providing a detection error below the inter-observer difference, our method improves detection efficiency eighteen-fold compared to the existing solution and can therefore be useful to physicians.
Tasks
Published 2019-04-02
URL http://arxiv.org/abs/1904.01241v1
PDF http://arxiv.org/pdf/1904.01241v1.pdf
PWC https://paperswithcode.com/paper/automatic-left-atrial-appendage-orifice
Repo
Framework
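
One simple reading of the axis-to-surface-distance criterion is to walk along the medial axis and flag the point where the distance to the surface changes most abruptly, roughly where the narrow appendage meets the wider atrium. The sketch below assumes that criterion, which may differ from the paper's, and omits the RL-based seed localization entirely.

```python
import numpy as np

def detect_orifice(axis_points, surface_points):
    """axis_points: (N, 3) medial-axis points ordered along the axis;
    surface_points: (M, 3) points on the appendage/atrium surface."""
    axis = np.asarray(axis_points)
    surf = np.asarray(surface_points)
    # distance from each axis point to the nearest surface point (local radius)
    d = np.array([np.min(np.linalg.norm(surf - p, axis=1)) for p in axis])
    i = int(np.argmax(np.abs(np.diff(d)))) + 1     # sharpest change in radius
    return axis[i], d
```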

Learning to Transfer Examples for Partial Domain Adaptation

Title Learning to Transfer Examples for Partial Domain Adaptation
Authors Zhangjie Cao, Kaichao You, Mingsheng Long, Jianmin Wang, Qiang Yang
Abstract Domain adaptation is critical for learning in new and unseen environments. With domain adversarial training, deep networks can learn disentangled and transferable features that effectively diminish the dataset shift between the source and target domains for knowledge transfer. In the era of Big Data, the ready availability of large-scale labeled datasets has stimulated wide interest in partial domain adaptation (PDA), which transfers a recognizer from a labeled large domain to an unlabeled small domain. It extends standard domain adaptation to the scenario where target labels are only a subset of source labels. Under the condition that target labels are unknown, the key challenge of PDA is how to transfer relevant examples in the shared classes to promote positive transfer, and ignore irrelevant ones in the specific classes to mitigate negative transfer. In this work, we propose a unified approach to PDA, Example Transfer Network (ETN), which jointly learns domain-invariant representations across the source and target domains, and a progressive weighting scheme that quantifies the transferability of source examples while controlling their importance to the learning task in the target domain. A thorough evaluation on several benchmark datasets shows that our approach achieves state-of-the-art results for partial domain adaptation tasks.
Tasks Domain Adaptation, Partial Domain Adaptation, Transfer Learning
Published 2019-03-28
URL http://arxiv.org/abs/1903.12230v2
PDF http://arxiv.org/pdf/1903.12230v2.pdf
PWC https://paperswithcode.com/paper/learning-to-transfer-examples-for-partial
Repo
Framework
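
The example-weighting idea behind ETN's progressive scheme can be caricatured as follows: an auxiliary discriminator scores how target-like each source example is, and the normalised scores down-weight source examples from classes absent in the target domain. The normalisation below is an assumption for illustration, not ETN's exact weighting.

```python
import numpy as np

def example_weights(disc_scores):
    """disc_scores: per-source-example scores in (0, 1), higher = more target-like.
    Returns weights normalised to mean 1 over the batch."""
    s = np.asarray(disc_scores, dtype=float)
    return s / s.mean()
```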