January 25, 2020

Paper Group ANR 1630

Automatic Information Extraction from Piping and Instrumentation Diagrams. Disentangled Deep Autoencoding Regularization for Robust Image Classification. Improving Long Handwritten Text Line Recognition with Convolutional Multi-way Associative Memory. Learning Actions from Human Demonstration Video for Robotic Manipulation. On the Correctness and S …

Automatic Information Extraction from Piping and Instrumentation Diagrams


Title	Automatic Information Extraction from Piping and Instrumentation Diagrams
Authors	Rohit Rahul, Shubham Paliwal, Monika Sharma, Lovekesh Vig
Abstract	One of the most common modes of representing engineering schematics is Piping and Instrumentation Diagrams (P&IDs), which describe the layout of an engineering process flow along with the interconnected process equipment. Over the years, P&IDs have been manually generated, scanned and stored as image files. These files need to be digitized for purposes of inventory management and updating, and for easy reference to different components of the schematics. There are several challenging vision problems associated with digitizing real-world P&IDs. Real-world P&IDs come in several different resolutions, and often contain noisy textual information. Extraction of instrumentation information from these diagrams involves accurate detection of symbols that frequently have minute visual differences between them. Identification of pipelines that may converge and diverge at different points in the image is a further cause for concern. For these reasons, to the best of our knowledge, no system has been proposed for end-to-end data extraction from P&IDs. However, with the advent of deep learning and the spectacular successes it has achieved in vision, we hypothesized that it is now possible to re-examine this problem armed with the latest deep learning models. To that end, we present a novel pipeline for information extraction from P&ID sheets via a combination of traditional vision techniques and state-of-the-art deep learning models to identify and isolate pipeline codes, pipelines, inlets and outlets, and to detect symbols. This is followed by association of the detected components with the appropriate pipeline. The extracted pipeline information is used to populate a tree-like data structure that captures the structure of the piping schematics. We evaluated the proposed method on a real-world dataset of P&ID sheets obtained from an oil firm and obtained promising results.
Tasks
Published	2019-01-28
URL	http://arxiv.org/abs/1901.11383v1
PDF	http://arxiv.org/pdf/1901.11383v1.pdf
PWC	https://paperswithcode.com/paper/automatic-information-extraction-from-piping
Repo
Framework

Disentangled Deep Autoencoding Regularization for Robust Image Classification


Title	Disentangled Deep Autoencoding Regularization for Robust Image Classification
Authors	Zhenyu Duan, Martin Renqiang Min, Li Erran Li, Mingbo Cai, Yi Xu, Bingbing Ni
Abstract	In spite of achieving revolutionary successes in machine learning, deep convolutional neural networks have recently been found to be vulnerable to adversarial attacks and difficult to generalize to novel test images with reasonably large geometric transformations. Inspired by a recent neuroscience discovery revealing that the primate brain employs disentangled shape and appearance representations for object recognition, we propose a general disentangled deep autoencoding regularization framework that can be easily applied to any deep embedding based classification model for improving the robustness of deep neural networks. Our framework effectively learns a disentangled appearance code and geometric code for robust image classification; it is the first disentangling-based method defending against adversarial attacks and is complementary to standard defense methods. Extensive experiments on several benchmark datasets show that our proposed regularization framework leveraging disentangled embedding significantly outperforms traditional unregularized convolutional neural networks for image classification on robustness against adversarial attacks and generalization to novel test data.
Tasks	Image Classification, Object Recognition
Published	2019-02-27
URL	http://arxiv.org/abs/1902.11134v1
PDF	http://arxiv.org/pdf/1902.11134v1.pdf
PWC	https://paperswithcode.com/paper/disentangled-deep-autoencoding-regularization
Repo
Framework

Improving Long Handwritten Text Line Recognition with Convolutional Multi-way Associative Memory


Title	Improving Long Handwritten Text Line Recognition with Convolutional Multi-way Associative Memory
Authors	Duc Nguyen, Nhan Tran, Hung Le
Abstract	Convolutional Recurrent Neural Networks (CRNNs) excel at scene text recognition. Unfortunately, they are likely to suffer from vanishing/exploding gradient problems when processing long text images, which are commonly found in scanned documents. This poses a major challenge to the goal of completely solving the Optical Character Recognition (OCR) problem. Inspired by recently proposed memory-augmented neural networks (MANNs) for long-term sequential modeling, we present a new architecture dubbed Convolutional Multi-way Associative Memory (CMAM) to tackle the limitation of current CRNNs. By leveraging recent memory accessing mechanisms in MANNs, our architecture demonstrates superior performance against other CRNN counterparts on three real-world long text OCR datasets.
Tasks	Optical Character Recognition, Scene Text Recognition
Published	2019-11-05
URL	https://arxiv.org/abs/1911.01577v2
PDF	https://arxiv.org/pdf/1911.01577v2.pdf
PWC	https://paperswithcode.com/paper/improving-long-handwritten-text-line
Repo
Framework

Learning Actions from Human Demonstration Video for Robotic Manipulation


Title	Learning Actions from Human Demonstration Video for Robotic Manipulation
Authors	Shuo Yang, Wei Zhang, Weizhi Lu, Hesheng Wang, Yibin Li
Abstract	Learning actions from human demonstration is an emerging trend for designing intelligent robotic systems, which can be referred to as video to command. The performance of such an approach relies heavily on the quality of video captioning. However, general video captioning methods focus more on understanding the full frame and give little consideration to the specific objects of interest in robotic manipulation. We propose a novel deep model to learn actions from human demonstration video for robotic manipulation. It consists of two deep networks, a grasp detection network (GNet) and a video captioning network (CNet). GNet performs two functions: providing grasp solutions and extracting the local features for the objects of interest in robotic manipulation. CNet outputs the captioning results by fusing the features of both full frames and local objects. Experimental results on a UR5 robotic arm show that our method produces more accurate commands from video demonstration than state-of-the-art work, thereby leading to more robust grasping performance.
Tasks	Video Captioning
Published	2019-09-10
URL	https://arxiv.org/abs/1909.04312v1
PDF	https://arxiv.org/pdf/1909.04312v1.pdf
PWC	https://paperswithcode.com/paper/learning-actions-from-human-demonstration
Repo
Framework

On the Correctness and Sample Complexity of Inverse Reinforcement Learning


Title	On the Correctness and Sample Complexity of Inverse Reinforcement Learning
Authors	Abi Komanduru, Jean Honorio
Abstract	Inverse reinforcement learning (IRL) is the problem of finding a reward function that generates a given optimal policy for a given Markov Decision Process. This paper presents an algorithm-independent geometric analysis of the IRL problem with finite states and actions. An L1-regularized Support Vector Machine formulation of the IRL problem, motivated by the geometric analysis, is then proposed with the basic objective of the inverse reinforcement problem in mind: to find a reward function that generates a specified optimal policy. The paper further analyzes the proposed formulation of inverse reinforcement learning with $n$ states and $k$ actions, and shows a sample complexity of $O(n^2 \log (nk))$ for recovering a reward function that generates a policy that satisfies Bellman’s optimality condition with respect to the true transition probabilities.
Tasks
Published	2019-06-02
URL	https://arxiv.org/abs/1906.00422v1
PDF	https://arxiv.org/pdf/1906.00422v1.pdf
PWC	https://paperswithcode.com/paper/190600422
Repo
Framework
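
The abstract above refers to a policy that satisfies Bellman's optimality condition under the true transition probabilities. As a minimal sketch of what that check looks like for a finite MDP (not the paper's SVM formulation), the snippet below evaluates a fixed policy and verifies that no other action has a higher expected next-state value. The layout `P[a][s][t]`, the state-only reward `R[s]`, the discount `gamma` and the function names are all illustrative assumptions.

```python
from typing import List

Transition = List[List[List[float]]]  # assumed layout: P[action][state][next_state]


def evaluate_policy(P: Transition, R: List[float], policy: List[int],
                    gamma: float, iters: int = 1000) -> List[float]:
    """Iterative policy evaluation for a state-only reward R[s]."""
    n = len(R)
    V = [0.0] * n
    for _ in range(iters):
        V = [R[s] + gamma * sum(P[policy[s]][s][t] * V[t] for t in range(n))
             for s in range(n)]
    return V


def is_bellman_optimal(P: Transition, R: List[float], policy: List[int],
                       gamma: float, tol: float = 1e-8) -> bool:
    """True if, in every state, the policy's action is at least as good as any other."""
    n = len(R)
    V = evaluate_policy(P, R, policy, gamma)
    for s in range(n):
        chosen = sum(P[policy[s]][s][t] * V[t] for t in range(n))
        for a in range(len(P)):
            other = sum(P[a][s][t] * V[t] for t in range(n))
            if other > chosen + tol:
                return False
    return True


# Toy MDP (hypothetical): action 0 stays in place, action 1 switches state.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay
    [[0.0, 1.0], [1.0, 0.0]],  # action 1: switch
]
R = [0.0, 1.0]  # only state 1 is rewarding
good_policy = [1, 0]  # move to state 1, then stay there
bad_policy = [0, 1]   # stay in state 0, leave state 1
```

In an IRL setting, a check like this could be run on a recovered reward to confirm that it makes the demonstrated policy optimal; here `R` is simply given.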

Pseudo-Labeling Curriculum for Unsupervised Domain Adaptation


Title	Pseudo-Labeling Curriculum for Unsupervised Domain Adaptation
Authors	Jaehoon Choi, Minki Jeong, Taekyung Kim, Changick Kim
Abstract	To learn target discriminative representations, using pseudo-labels is a simple yet effective approach for unsupervised domain adaptation. However, the existence of false pseudo-labels, which may have a detrimental influence on learning target representations, remains a major challenge. To overcome this issue, we propose a pseudo-labeling curriculum based on a density-based clustering algorithm. Since samples with high density values are more likely to have correct pseudo-labels, we leverage these subsets to train our target network at the early stage, and utilize data subsets with low density values at the later stage. We can progressively improve the capability of our network to generate pseudo-labels, and thus these target samples with pseudo-labels are effective for training our model. Moreover, we present a clustering constraint to enhance the discriminative power of the learned target features. Our approach achieves state-of-the-art performance on three benchmarks: Office-31, imageCLEF-DA, and Office-Home.
Tasks	Domain Adaptation, Semi-Supervised Image Classification, Unsupervised Domain Adaptation
Published	2019-08-01
URL	https://arxiv.org/abs/1908.00262v1
PDF	https://arxiv.org/pdf/1908.00262v1.pdf
PWC	https://paperswithcode.com/paper/pseudo-labeling-curriculum-for-unsupervised
Repo
Framework
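
The curriculum described in the abstract above (train first on high-density target samples, then bring in low-density ones) can be illustrated with a simplified sketch. The density measure used here, a count of neighbours within a fixed radius, is an assumption chosen for brevity and is not the density-based clustering algorithm from the paper; `radius`, `n_stages` and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def local_density(points: Sequence[Sequence[float]], radius: float) -> List[int]:
    """Count, for each point, how many other points lie within `radius`."""
    return [
        sum(1 for j, q in enumerate(points) if j != i and math.dist(p, q) <= radius)
        for i, p in enumerate(points)
    ]


def curriculum_stages(points: Sequence[Sequence[float]], radius: float,
                      n_stages: int) -> List[List[int]]:
    """Split sample indices into stages, densest samples first."""
    if not points or n_stages < 1:
        return []
    density = local_density(points, radius)
    # Stable sort: ties keep their original index order.
    order = sorted(range(len(points)), key=lambda i: density[i], reverse=True)
    size = math.ceil(len(order) / n_stages)
    return [order[start:start + size] for start in range(0, len(order), size)]


# Three tightly clustered target features and one outlier (toy data).
features = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
stages = curriculum_stages(features, radius=0.5, n_stages=2)
```

With this toy data the outlier at index 3 lands in the final stage, which mirrors the idea that low-density samples, whose pseudo-labels are less trustworthy, are used later in training.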

Few-Shot Meta-Denoising


Title	Few-Shot Meta-Denoising
Authors	Leslie Casas, Attila Klimmek, Gustavo Carneiro, Nassir Navab, Vasileios Belagiannis
Abstract	We study the problem of few-shot learning-based denoising where the training set contains just a handful of clean and noisy samples. A solution to mitigate the small training set issue is to pre-train a denoising model with small training sets containing pairs of clean and synthesized noisy signals, produced from empirical noise priors, and fine-tune on the available small training set. While such transfer learning seems effective, it may not generalize well because of the limited amount of training data. In this work, we propose a new meta-learning training approach for few-shot learning-based denoising problems. Our model is meta-trained using known synthetic noise models, and then fine-tuned with the small training set, with the real noise, as a few-shot learning task. Meta-learning from small training sets of synthetically generated data during meta-training enables us to not only generate an infinite number of training tasks, but also train a model to learn with small training sets – both advantages have the potential to improve the generalisation of the denoising model. Our approach is empirically shown to produce more accurate denoising results than supervised learning and transfer learning in three denoising evaluations for images and 1-D signals. Interestingly, our study provides strong indications that meta-learning has the potential to become the main learning algorithm for denoising.
Tasks	Denoising, Few-Shot Learning, Meta-Learning, Transfer Learning
Published	2019-07-31
URL	https://arxiv.org/abs/1908.00111v2
PDF	https://arxiv.org/pdf/1908.00111v2.pdf
PWC	https://paperswithcode.com/paper/few-shot-meta-denoising
Repo
Framework

Tagged Back-Translation


Title	Tagged Back-Translation
Authors	Isaac Caswell, Ciprian Chelba, David Grangier
Abstract	Recent work in Neural Machine Translation (NMT) has shown significant quality gains from noised-beam decoding during back-translation, a method to generate synthetic parallel data. We show that the main role of such synthetic noise is not to diversify the source side, as previously suggested, but simply to indicate to the model that the given source is synthetic. We propose a simpler alternative to noising techniques, consisting of tagging back-translated source sentences with an extra token. Our results on WMT outperform noised back-translation in English-Romanian and match performance on English-German, re-defining state-of-the-art in the former.
Tasks	Machine Translation
Published	2019-06-15
URL	https://arxiv.org/abs/1906.06442v1
PDF	https://arxiv.org/pdf/1906.06442v1.pdf
PWC	https://paperswithcode.com/paper/tagged-back-translation
Repo
Framework
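
The tagging idea in the abstract above is simple enough to sketch directly: back-translated source sentences receive an extra token, while genuine parallel data is left untouched. The token string `<BT>`, its position at the start of the sentence, and the helper names are assumptions made for illustration; the abstract only states that an extra token is added.

```python
from typing import List, Tuple

SentencePair = Tuple[str, str]  # (source sentence, target sentence)

BT_TAG = "<BT>"  # hypothetical reserved token marking synthetic sources


def tag_back_translated(pairs: List[SentencePair], tag: str = BT_TAG) -> List[SentencePair]:
    """Prepend the tag to the source side of each back-translated pair."""
    return [(f"{tag} {src}", tgt) for src, tgt in pairs]


def build_training_set(genuine: List[SentencePair],
                       back_translated: List[SentencePair],
                       tag: str = BT_TAG) -> List[SentencePair]:
    """Combine untagged genuine pairs with tagged synthetic pairs."""
    return list(genuine) + tag_back_translated(back_translated, tag)


genuine = [("ein Haus", "a house")]
synthetic = [("eine Katze", "a cat")]  # source side produced by back-translation
train = build_training_set(genuine, synthetic)
```

In practice the tag would need to be a token that the tokenizer keeps intact and that never occurs in genuine text, so the model can rely on it as a signal that the source is synthetic.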

Boosting Few-Shot Visual Learning with Self-Supervision


Title	Boosting Few-Shot Visual Learning with Self-Supervision
Authors	Spyros Gidaris, Andrei Bursuc, Nikos Komodakis, Patrick Pérez, Matthieu Cord
Abstract	Few-shot learning and self-supervised learning address different facets of the same problem: how to train a model with little or no labeled data. Few-shot learning aims for optimization methods and models that can learn efficiently to recognize patterns in the low data regime. Self-supervised learning focuses instead on unlabeled data and looks into it for the supervisory signal to feed high capacity deep neural networks. In this work we exploit the complementarity of these two domains and propose an approach for improving few-shot learning through self-supervision. We use self-supervision as an auxiliary task in a few-shot learning pipeline, enabling feature extractors to learn richer and more transferable visual representations while still using few annotated samples. Through self-supervision, our approach can be naturally extended towards using diverse unlabeled data from other datasets in the few-shot setting. We report consistent improvements across an array of architectures, datasets and self-supervision techniques.
Tasks	Few-Shot Learning
Published	2019-06-12
URL	https://arxiv.org/abs/1906.05186v1
PDF	https://arxiv.org/pdf/1906.05186v1.pdf
PWC	https://paperswithcode.com/paper/boosting-few-shot-visual-learning-with-self
Repo
Framework

Sparsely Activated Networks


Title	Sparsely Activated Networks
Authors	Paschalis Bizopoulos, Dimitrios Koutsouris
Abstract	Previous literature on unsupervised learning focused on designing structural priors with the aim of learning meaningful features. However, this was done without considering the description length of the learned representations which is a direct and unbiased measure of the model complexity. In this paper, first we introduce the $\varphi$ metric that evaluates unsupervised models based on their reconstruction accuracy and the degree of compression of their internal representations. We then present and define two activation functions (Identity, ReLU) as base of reference and three sparse activation functions (top-k absolutes, Extrema-Pool indices, Extrema) as candidate structures that minimize the previously defined $\varphi$. We lastly present Sparsely Activated Networks (SANs) that consist of kernels with shared weights that, during encoding, are convolved with the input and then passed through a sparse activation function. During decoding, the same weights are convolved with the sparse activation map and subsequently the partial reconstructions from each weight are summed to reconstruct the input. We compare SANs using the five previously defined activation functions on a variety of datasets (Physionet, UCI-epilepsy, MNIST, FMNIST) and show that models that are selected using $\varphi$ have small description representation length and consist of interpretable kernels.
Tasks	Model Selection
Published	2019-07-12
URL	https://arxiv.org/abs/1907.06592v3
PDF	https://arxiv.org/pdf/1907.06592v3.pdf
PWC	https://paperswithcode.com/paper/sparsely-activated-networks
Repo
Framework
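
As a rough illustration of the encode/decode cycle described in the abstract above, the sketch below uses a single kernel and a top-k absolutes activation on a 1-D signal. It is a simplified reading of the abstract under stated assumptions, not the authors' implementation: the zero-padded "same" convolution, the single-kernel restriction and all function names are illustrative choices, and the $\varphi$ metric is not computed.

```python
from typing import List, Tuple


def topk_absolutes(values: List[float], k: int) -> List[float]:
    """Keep the k entries with the largest absolute value and zero out the rest."""
    k = max(0, min(k, len(values)))
    ranked = sorted(range(len(values)), key=lambda i: abs(values[i]), reverse=True)
    keep = set(ranked[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(values)]


def convolve_same(signal: List[float], kernel: List[float]) -> List[float]:
    """Zero-padded convolution whose output has the same length as the signal."""
    n, m = len(signal), len(kernel)
    half = m // 2
    out = []
    for i in range(n):
        acc = 0.0
        for j in range(m):
            s = i + half - j
            if 0 <= s < n:
                acc += signal[s] * kernel[j]
        out.append(acc)
    return out


def san_forward(signal: List[float], kernel: List[float],
                k: int) -> Tuple[List[float], List[float]]:
    """Encode with the kernel, sparsify, then decode with the same kernel."""
    similarity = convolve_same(signal, kernel)         # encoding
    sparse_map = topk_absolutes(similarity, k)         # sparse activation
    reconstruction = convolve_same(sparse_map, kernel) # decoding, shared weights
    return sparse_map, reconstruction


signal = [0.1, -3.0, 2.0, 0.5]
sparse_map, reconstruction = san_forward(signal, kernel=[1.0], k=2)
```

With several kernels, the abstract indicates that the partial reconstructions from each kernel would be summed; that step is omitted here to keep the sketch short.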

On Object Symmetries and 6D Pose Estimation from Images


Title	On Object Symmetries and 6D Pose Estimation from Images
Authors	Giorgia Pitteri, Michaël Ramamonjisoa, Slobodan Ilic, Vincent Lepetit
Abstract	Objects with symmetries are common in our daily life and in industrial contexts, but are often ignored in the recent literature on 6D pose estimation from images. In this paper, we study in an analytical way the link between the symmetries of a 3D object and its appearance in images. We explain why symmetrical objects can be a challenge when training machine learning algorithms that aim at estimating their 6D pose from images. We propose an efficient and simple solution that relies on the normalization of the pose rotation. Our approach is general and can be used with any 6D pose estimation algorithm. Moreover, our method is also beneficial for objects that are ‘almost symmetrical’, i.e. objects for which only a detail breaks the symmetry. We validate our approach within a Faster-RCNN framework on a synthetic dataset made with objects from the T-Less dataset, which exhibit various types of symmetries, as well as real sequences from T-Less.
Tasks	6D Pose Estimation, Pose Estimation
Published	2019-08-20
URL	https://arxiv.org/abs/1908.07640v1
PDF	https://arxiv.org/pdf/1908.07640v1.pdf
PWC	https://paperswithcode.com/paper/190807640
Repo
Framework

Attention-Aware Answers of the Crowd


Title	Attention-Aware Answers of the Crowd
Authors	Jingzheng Tu, Guoxian Yu, Jun Wang, Carlotta Domeniconi, Xiangliang Zhang
Abstract	Crowdsourcing is a relatively economical and efficient way to collect annotations from the crowd through online platforms. Answers collected from workers with different expertise may be noisy and unreliable, and the quality of annotated data needs to be further maintained. Various solutions have been attempted to obtain high-quality annotations. However, they all assume that workers’ label quality is stable over time (always at the same level whenever they conduct the tasks). In practice, workers’ attention level changes over time, and ignoring this can affect the reliability of the annotations. In this paper, we focus on a novel and realistic crowdsourcing scenario involving attention-aware annotations. We propose a new probabilistic model that takes into account workers’ attention to estimate the label quality. Expectation propagation is adopted for efficient Bayesian inference of our model, and a generalized Expectation Maximization algorithm is derived to estimate both the ground truth of all tasks and the label quality of each individual crowd worker with attention. In addition, the number of tasks best suited for a worker is estimated according to changes in attention. Experiments against related methods on three real-world datasets and one semi-simulated dataset demonstrate that our method quantifies the relationship between workers’ attention and label quality on the given tasks, and improves the aggregated labels.
Tasks	Bayesian Inference
Published	2019-12-24
URL	https://arxiv.org/abs/1912.11238v2
PDF	https://arxiv.org/pdf/1912.11238v2.pdf
PWC	https://paperswithcode.com/paper/attention-aware-answers-of-the-crowd
Repo
Framework

Realization of spatial sparseness by deep ReLU nets with massive data


Title	Realization of spatial sparseness by deep ReLU nets with massive data
Authors	Charles K. Chui, Shao-Bo Lin, Bo Zhang, Ding-Xuan Zhou
Abstract	The great success of deep learning poses urgent challenges for understanding its working mechanism and rationality. The depth, structure, and massive size of the data are recognized to be three key ingredients for deep learning. Most of the recent theoretical studies of deep learning focus on the necessity and advantages of the depth and structure of neural networks. In this paper, we aim at rigorous verification of the importance of massive data in embodying the out-performance of deep learning. To approximate and learn spatially sparse and smooth functions, we establish a novel sampling theorem in learning theory to show the necessity of massive data. We then prove that implementing classical empirical risk minimization on some deep nets facilitates the realization of the optimal learning rates derived in the sampling theorem. This perhaps explains why deep learning performs so well in the era of big data.
Tasks
Published	2019-12-16
URL	https://arxiv.org/abs/1912.07464v1
PDF	https://arxiv.org/pdf/1912.07464v1.pdf
PWC	https://paperswithcode.com/paper/realization-of-spatial-sparseness-by-deep
Repo
Framework

Sparse Learning for Variable Selection with Structures and Nonlinearities


Title	Sparse Learning for Variable Selection with Structures and Nonlinearities
Authors	Magda Gregorova
Abstract	In this thesis we discuss machine learning methods performing automated variable selection for learning sparse predictive models. There are multiple reasons for promoting sparsity in predictive models. By relying on a limited set of input variables, the models naturally counteract the overfitting problem ubiquitous in learning from finite sets of training points. Sparse models are cheaper to use for predictions: they usually require fewer computational resources and, by relying on smaller sets of inputs, can reduce costs for data collection and storage. Sparse models can also contribute to a better understanding of the investigated phenomena, as they are easier to interpret than full models.
Tasks	Sparse Learning
Published	2019-03-26
URL	http://arxiv.org/abs/1903.10978v1
PDF	http://arxiv.org/pdf/1903.10978v1.pdf
PWC	https://paperswithcode.com/paper/sparse-learning-for-variable-selection-with
Repo
Framework

Correctness Verification of Neural Networks


Title	Correctness Verification of Neural Networks
Authors	Yichen Yang, Martin Rinard
Abstract	We present the first verification that a neural network produces a correct output within a specified tolerance for every input of interest. We define correctness relative to a specification which identifies 1) a state space consisting of all relevant states of the world and 2) an observation process that produces neural network inputs from the states of the world. Tiling the state and input spaces with a finite number of tiles, obtaining ground truth bounds from the state tiles and network output bounds from the input tiles, then comparing the ground truth and network output bounds delivers an upper bound on the network output error for any input of interest. Results from a case study highlight the ability of our technique to deliver tight error bounds for all inputs of interest and show how the error bounds vary over the state and input spaces.
Tasks
Published	2019-06-03
URL	https://arxiv.org/abs/1906.01030v2
PDF	https://arxiv.org/pdf/1906.01030v2.pdf
PWC	https://paperswithcode.com/paper/correctness-verification-of-neural-networks
Repo
Framework
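
The tiling procedure in the abstract above can be illustrated on a deliberately tiny example. In this sketch the state space is the interval [0, 1], the observation process is the identity, the ground truth equals the state, and the "network" is a single affine function `w * x + b` whose output bounds over an interval are exact. All of these are assumptions chosen to keep the example self-contained; bounding a real neural network over an input tile would need a dedicated bound-propagation or solver step that is not shown here.

```python
from typing import List, Tuple

Interval = Tuple[float, float]


def tile_interval(lo: float, hi: float, n_tiles: int) -> List[Interval]:
    """Split [lo, hi] into n_tiles equal-width tiles."""
    width = (hi - lo) / n_tiles
    return [(lo + i * width, lo + (i + 1) * width) for i in range(n_tiles)]


def affine_bounds(w: float, b: float, lo: float, hi: float) -> Interval:
    """Exact output range of x -> w * x + b over [lo, hi]."""
    ends = (w * lo + b, w * hi + b)
    return min(ends), max(ends)


def error_upper_bound(tiles: List[Interval], w: float, b: float) -> float:
    """Upper bound on |network output - ground truth| over all tiles."""
    worst = 0.0
    for lo, hi in tiles:
        gt_lo, gt_hi = lo, hi  # ground truth is the state itself (assumption)
        out_lo, out_hi = affine_bounds(w, b, lo, hi)
        # Compare the two bound intervals in both directions.
        worst = max(worst, out_hi - gt_lo, gt_hi - out_lo)
    return worst


coarse = error_upper_bound(tile_interval(0.0, 1.0, 10), w=0.9, b=0.05)
fine = error_upper_bound(tile_interval(0.0, 1.0, 100), w=0.9, b=0.05)
```

For this toy function the true maximum error on [0, 1] is 0.05. Both bounds stay above that value, and the finer tiling gives the tighter bound, which reflects the trade-off between the number of tiles and the tightness of the resulting guarantee.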