Paper Group ANR 1630
Automatic Information Extraction from Piping and Instrumentation Diagrams. Disentangled Deep Autoencoding Regularization for Robust Image Classification. Improving Long Handwritten Text Line Recognition with Convolutional Multi-way Associative Memory. Learning Actions from Human Demonstration Video for Robotic Manipulation. On the Correctness and S …
Automatic Information Extraction from Piping and Instrumentation Diagrams
Title | Automatic Information Extraction from Piping and Instrumentation Diagrams |
Authors | Rohit Rahul, Shubham Paliwal, Monika Sharma, Lovekesh Vig |
Abstract | One of the most common modes of representing engineering schematics is the Piping and Instrumentation Diagram (P&ID), which describes the layout of an engineering process flow along with the interconnected process equipment. Over the years, P&ID diagrams have been manually generated, scanned and stored as image files. These files need to be digitized for inventory management and updating, and for easy reference to different components of the schematics. There are several challenging vision problems associated with digitizing real-world P&ID diagrams. Real-world P&IDs come in several different resolutions and often contain noisy textual information. Extraction of instrumentation information from these diagrams involves accurate detection of symbols that frequently have minute visual differences between them. Identification of pipelines that may converge and diverge at different points in the image is a further cause for concern. Due to these reasons, to the best of our knowledge, no system has been proposed for end-to-end data extraction from P&ID diagrams. However, with the advent of deep learning and the spectacular successes it has achieved in vision, we hypothesized that it is now possible to re-examine this problem armed with the latest deep learning models. To that end, we present a novel pipeline for information extraction from P&ID sheets via a combination of traditional vision techniques and state-of-the-art deep learning models to identify and isolate pipeline codes, pipelines, inlets and outlets, and to detect symbols. This is followed by association of the detected components with the appropriate pipeline. The extracted pipeline information is used to populate a tree-like data structure that captures the structure of the piping schematics. We evaluated the proposed method on a real-world dataset of P&ID sheets obtained from an oil firm and obtained promising results. |
Tasks | |
Published | 2019-01-28 |
URL | http://arxiv.org/abs/1901.11383v1 |
http://arxiv.org/pdf/1901.11383v1.pdf | |
PWC | https://paperswithcode.com/paper/automatic-information-extraction-from-piping |
Repo | |
Framework | |
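The abstract mentions populating a tree-like data structure with the extracted pipeline information. A purely illustrative Python sketch of what such a structure might look like — the class, field names, and example codes below are assumptions, not the authors' implementation:

```python
# Hypothetical sketch of a tree capturing piping schematics: each node is a
# pipeline segment with its code, attached symbols, and downstream branches.
from dataclasses import dataclass, field
from typing import List

@dataclass
class PipelineNode:
    pipeline_code: str                                            # detected line code
    symbols: List[str] = field(default_factory=list)              # associated symbols
    branches: List["PipelineNode"] = field(default_factory=list)  # divergence points

    def add_branch(self, child: "PipelineNode") -> None:
        self.branches.append(child)

    def walk(self, depth: int = 0):
        """Depth-first traversal from an inlet toward the outlets."""
        yield depth, self
        for child in self.branches:
            yield from child.walk(depth + 1)

# Usage: an inlet feeding two diverging pipelines.
root = PipelineNode("IN-01")
root.add_branch(PipelineNode("P-101", symbols=["valve", "flow-meter"]))
root.add_branch(PipelineNode("P-102", symbols=["pump"]))
for depth, node in root.walk():
    print("  " * depth + node.pipeline_code)
```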
Disentangled Deep Autoencoding Regularization for Robust Image Classification
Title | Disentangled Deep Autoencoding Regularization for Robust Image Classification |
Authors | Zhenyu Duan, Martin Renqiang Min, Li Erran Li, Mingbo Cai, Yi Xu, Bingbing Ni |
Abstract | In spite of achieving revolutionary successes in machine learning, deep convolutional neural networks have recently been found to be vulnerable to adversarial attacks and difficult to generalize to novel test images with reasonably large geometric transformations. Inspired by a recent neuroscience discovery revealing that the primate brain employs disentangled shape and appearance representations for object recognition, we propose a general disentangled deep autoencoding regularization framework that can be easily applied to any deep embedding-based classification model to improve the robustness of deep neural networks. Our framework effectively learns a disentangled appearance code and geometric code for robust image classification; it is the first disentangling-based method for defending against adversarial attacks and is complementary to standard defense methods. Extensive experiments on several benchmark datasets show that our proposed regularization framework, leveraging disentangled embeddings, significantly outperforms traditional unregularized convolutional neural networks for image classification in terms of robustness against adversarial attacks and generalization to novel test data. |
Tasks | Image Classification, Object Recognition |
Published | 2019-02-27 |
URL | http://arxiv.org/abs/1902.11134v1 |
http://arxiv.org/pdf/1902.11134v1.pdf | |
PWC | https://paperswithcode.com/paper/disentangled-deep-autoencoding-regularization |
Repo | |
Framework | |
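The framework described above pairs a standard classifier with an autoencoding path whose embedding is split into an appearance code and a geometric code. A minimal PyTorch sketch of that structure, with toy layer sizes, a plain MSE reconstruction loss, and a 0.1 loss weight as illustrative assumptions (not the paper's model):

```python
# Toy classifier regularized by an autoencoder over a split embedding.
import torch
import torch.nn as nn

class DisentangledRegularizedNet(nn.Module):
    def __init__(self, feat_dim=128, code_dim=32, n_classes=10):
        super().__init__()
        self.encoder = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, feat_dim), nn.ReLU())
        self.to_appearance = nn.Linear(feat_dim, code_dim)   # appearance code
        self.to_geometry = nn.Linear(feat_dim, code_dim)     # geometric code
        self.classifier = nn.Linear(2 * code_dim, n_classes)
        self.decoder = nn.Linear(2 * code_dim, 28 * 28)      # reconstructs the input

    def forward(self, x):
        h = self.encoder(x)
        code = torch.cat([self.to_appearance(h), self.to_geometry(h)], dim=1)
        return self.classifier(code), self.decoder(code).view_as(x)

model = DisentangledRegularizedNet()
x, y = torch.randn(8, 1, 28, 28), torch.randint(0, 10, (8,))
logits, recon = model(x)
# Classification loss plus autoencoding regularization on the split codes.
loss = nn.functional.cross_entropy(logits, y) + 0.1 * nn.functional.mse_loss(recon, x)
loss.backward()
```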
Improving Long Handwritten Text Line Recognition with Convolutional Multi-way Associative Memory
Title | Improving Long Handwritten Text Line Recognition with Convolutional Multi-way Associative Memory |
Authors | Duc Nguyen, Nhan Tran, Hung Le |
Abstract | Convolutional Recurrent Neural Networks (CRNNs) excel at scene text recognition. Unfortunately, they are likely to suffer from vanishing/exploding gradient problems when processing long text images, which are commonly found in scanned documents. This poses a major challenge to the goal of completely solving the Optical Character Recognition (OCR) problem. Inspired by recently proposed memory-augmented neural networks (MANNs) for long-term sequential modeling, we present a new architecture dubbed Convolutional Multi-way Associative Memory (CMAM) to tackle the limitations of current CRNNs. By leveraging recent memory-access mechanisms in MANNs, our architecture demonstrates superior performance over CRNN counterparts on three real-world long-text OCR datasets. |
Tasks | Optical Character Recognition, Scene Text Recognition |
Published | 2019-11-05 |
URL | https://arxiv.org/abs/1911.01577v2 |
https://arxiv.org/pdf/1911.01577v2.pdf | |
PWC | https://paperswithcode.com/paper/improving-long-handwritten-text-line |
Repo | |
Framework | |
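CMAM borrows its memory-access mechanisms from MANNs, whose basic building block is content-based addressing over an external memory. A toy sketch of that read operation (the actual CMAM multi-way read/write heads are more involved):

```python
# Content-based memory read: a controller query attends over memory slots.
import torch
import torch.nn.functional as F

def content_read(memory: torch.Tensor, query: torch.Tensor) -> torch.Tensor:
    """memory: (slots, dim); query: (dim,). Returns a weighted read vector."""
    scores = F.cosine_similarity(memory, query.unsqueeze(0), dim=1)  # (slots,)
    weights = F.softmax(scores, dim=0)   # soft addressing over slots
    return weights @ memory              # (dim,) read vector

memory = torch.randn(16, 64)   # 16 slots of 64-d features from the CNN encoder
query = torch.randn(64)        # controller state at the current timestep
print(content_read(memory, query).shape)   # torch.Size([64])
```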
Learning Actions from Human Demonstration Video for Robotic Manipulation
Title | Learning Actions from Human Demonstration Video for Robotic Manipulation |
Authors | Shuo Yang, Wei Zhang, Weizhi Lu, Hesheng Wang, Yibin Li |
Abstract | Learning actions from human demonstration is an emerging trend in the design of intelligent robotic systems and can be referred to as video-to-command. The performance of such an approach relies heavily on the quality of video captioning. However, general video captioning methods focus on understanding the full frame and fail to consider the specific objects of interest in robotic manipulation. We propose a novel deep model to learn actions from human demonstration videos for robotic manipulation. It consists of two deep networks: a grasp detection network (GNet) and a video captioning network (CNet). GNet performs two functions: providing grasp solutions and extracting local features for the objects of interest in robotic manipulation. CNet outputs the captioning results by fusing the features of both full frames and local objects. Experimental results on a UR5 robotic arm show that our method produces more accurate commands from video demonstrations than state-of-the-art work, thereby leading to more robust grasping performance. |
Tasks | Video Captioning |
Published | 2019-09-10 |
URL | https://arxiv.org/abs/1909.04312v1 |
https://arxiv.org/pdf/1909.04312v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-actions-from-human-demonstration |
Repo | |
Framework | |
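The core design is late fusion: CNet conditions its captioning on both full-frame features and the local object features extracted by GNet. A hedged PyTorch sketch of such a fusion captioner; the concatenation-based fusion and all dimensions are assumptions, and the toy layers stand in for the paper's deep GNet/CNet:

```python
# Fuse per-frame global features with local object features before captioning.
import torch
import torch.nn as nn

class FusionCaptioner(nn.Module):
    def __init__(self, frame_dim=512, obj_dim=256, hidden=256, vocab=1000):
        super().__init__()
        self.fuse = nn.Linear(frame_dim + obj_dim, hidden)
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)   # per-step command-token logits

    def forward(self, frame_feats, obj_feats):
        # frame_feats: (B, T, frame_dim) full-frame features
        # obj_feats:   (B, T, obj_dim) local features from the grasp detector
        fused = torch.relu(self.fuse(torch.cat([frame_feats, obj_feats], dim=-1)))
        h, _ = self.rnn(fused)
        return self.out(h)   # (B, T, vocab)

logits = FusionCaptioner()(torch.randn(2, 8, 512), torch.randn(2, 8, 256))
print(logits.shape)   # torch.Size([2, 8, 1000])
```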
On the Correctness and Sample Complexity of Inverse Reinforcement Learning
Title | On the Correctness and Sample Complexity of Inverse Reinforcement Learning |
Authors | Abi Komanduru, Jean Honorio |
Abstract | Inverse reinforcement learning (IRL) is the problem of finding a reward function that generates a given optimal policy for a given Markov Decision Process. This paper presents an algorithm-independent geometric analysis of the IRL problem with finite states and actions. An $L_1$-regularized Support Vector Machine formulation of the IRL problem, motivated by the geometric analysis, is then proposed with the basic objective of the inverse reinforcement learning problem in mind: to find a reward function that generates a specified optimal policy. The paper further analyzes the proposed formulation with $n$ states and $k$ actions, and shows a sample complexity of $O(n^2 \log (nk))$ for recovering a reward function that generates a policy satisfying Bellman's optimality condition with respect to the true transition probabilities. |
Tasks | |
Published | 2019-06-02 |
URL | https://arxiv.org/abs/1906.00422v1 |
https://arxiv.org/pdf/1906.00422v1.pdf | |
PWC | https://paperswithcode.com/paper/190600422 |
Repo | |
Framework | |
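For context, geometric analyses of finite-state IRL typically start from the classical characterization of the feasible reward set due to Ng & Russell (2000). A sketch of that condition and the role of the $L_1$ penalty follows; the paper's exact SVM formulation may differ:

```latex
% Feasible reward set for finite-state IRL (Ng & Russell, 2000): if taking
% action a_1 in every state is optimal, a reward vector R is consistent
% if and only if
\[
  (\mathbf{P}_{a_1} - \mathbf{P}_a)\,(\mathbf{I} - \gamma \mathbf{P}_{a_1})^{-1}\,\mathbf{R}
  \;\succeq\; \mathbf{0}
  \qquad \text{for all actions } a \neq a_1 ,
\]
% where P_a is the transition matrix of action a and gamma is the discount
% factor. An L1-regularized formulation trades the margin of these
% constraints against \lambda \lVert \mathbf{R} \rVert_1 to select sparse
% reward functions.
```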
Pseudo-Labeling Curriculum for Unsupervised Domain Adaptation
Title | Pseudo-Labeling Curriculum for Unsupervised Domain Adaptation |
Authors | Jaehoon Choi, Minki Jeong, Taekyung Kim, Changick Kim |
Abstract | Using pseudo-labels to learn target-discriminative representations is a simple yet effective approach for unsupervised domain adaptation. However, the existence of false pseudo-labels, which may have a detrimental influence on learning target representations, remains a major challenge. To overcome this issue, we propose a pseudo-labeling curriculum based on a density-based clustering algorithm. Since samples with high density values are more likely to have correct pseudo-labels, we leverage these subsets to train our target network at the early stage, and utilize data subsets with low density values at the later stage. In this way, we progressively improve the network's ability to generate pseudo-labels, so that the pseudo-labeled target samples become effective for training our model. Moreover, we present a clustering constraint to enhance the discriminative power of the learned target features. Our approach achieves state-of-the-art performance on three benchmarks: Office-31, ImageCLEF-DA, and Office-Home. |
Tasks | Domain Adaptation, Semi-Supervised Image Classification, Unsupervised Domain Adaptation |
Published | 2019-08-01 |
URL | https://arxiv.org/abs/1908.00262v1 |
https://arxiv.org/pdf/1908.00262v1.pdf | |
PWC | https://paperswithcode.com/paper/pseudo-labeling-curriculum-for-unsupervised |
Repo | |
Framework | |
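The curriculum can be summarized as: score pseudo-labeled target samples by local density, train on the densest (most reliably labeled) subset first, and grow the training set toward low-density samples in later stages. A hedged sketch, with a kNN-distance density proxy and uniform stage cutoffs as illustrative assumptions rather than the paper's exact algorithm:

```python
# Density-ordered pseudo-labeling curriculum (illustrative).
import numpy as np
from sklearn.neighbors import NearestNeighbors

def density_scores(features: np.ndarray, k: int = 10) -> np.ndarray:
    """Higher score = denser neighborhood = pseudo-label more likely correct."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(features)
    dists, _ = nn.kneighbors(features)
    return -dists[:, 1:].mean(axis=1)   # negative mean kNN distance (skip self)

def curriculum_stages(features, pseudo_labels, n_stages=3):
    """Yield growing subsets: dense samples first, sparse ones added later."""
    order = np.argsort(-density_scores(features))   # densest first
    for stage in range(1, n_stages + 1):
        idx = order[: int(len(order) * stage / n_stages)]
        yield features[idx], pseudo_labels[idx]

feats = np.random.randn(100, 32)
labels = np.random.randint(0, 5, size=100)
for stage, (x, y) in enumerate(curriculum_stages(feats, labels), 1):
    print(f"stage {stage}: training on {len(x)} pseudo-labeled samples")
```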
Few-Shot Meta-Denoising
Title | Few-Shot Meta-Denoising |
Authors | Leslie Casas, Attila Klimmek, Gustavo Carneiro, Nassir Navab, Vasileios Belagiannis |
Abstract | We study the problem of few-shot learning-based denoising, where the training set contains just a handful of clean and noisy samples. One way to mitigate the small-training-set issue is to pre-train a denoising model on small training sets containing pairs of clean and synthesized noisy signals, produced from empirical noise priors, and then fine-tune it on the available small training set. While such transfer learning seems effective, it may not generalize well because of the limited amount of training data. In this work, we propose a new meta-learning training approach for few-shot learning-based denoising problems. Our model is meta-trained using known synthetic noise models, and then fine-tuned with the small training set, with the real noise, as a few-shot learning task. Meta-learning on small training sets of synthetically generated data enables us not only to generate an infinite number of training tasks, but also to train a model to learn with small training sets, and both advantages have the potential to improve the generalization of the denoising model. Our approach is empirically shown to produce more accurate denoising results than supervised learning and transfer learning in three denoising evaluations for images and 1-D signals. Interestingly, our study provides strong indications that meta-learning has the potential to become the main learning algorithm for denoising. |
Tasks | Denoising, Few-Shot Learning, Meta-Learning, Transfer Learning |
Published | 2019-07-31 |
URL | https://arxiv.org/abs/1908.00111v2 |
https://arxiv.org/pdf/1908.00111v2.pdf | |
PWC | https://paperswithcode.com/paper/few-shot-meta-denoising |
Repo | |
Framework | |
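The training recipe is: meta-train on tasks built from synthetic noise models, then fine-tune on the small real-noise set as a few-shot task. A minimal sketch using a Reptile-style first-order meta-update and a Gaussian noise prior as simplifying assumptions; the paper's meta-learner and noise models may differ:

```python
# First-order meta-training of a toy denoiser on synthetic-noise tasks.
import copy
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 64))  # toy denoiser
meta_lr, inner_lr, inner_steps = 0.1, 0.01, 5

def sample_synthetic_task(batch=16):
    """One meta-training task: clean signals plus one synthetic noise model."""
    clean = torch.randn(batch, 64)
    noisy = clean + 0.3 * torch.randn_like(clean)   # e.g. a Gaussian noise prior
    return noisy, clean

for it in range(100):                       # meta-training loop
    task_model = copy.deepcopy(model)
    opt = torch.optim.SGD(task_model.parameters(), lr=inner_lr)
    for _ in range(inner_steps):            # inner adaptation on the task
        noisy, clean = sample_synthetic_task()
        loss = nn.functional.mse_loss(task_model(noisy), clean)
        opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():                   # Reptile-style meta-update
        for p, q in zip(model.parameters(), task_model.parameters()):
            p += meta_lr * (q - p)
# At deployment: fine-tune `model` for a few steps on the small real-noise set.
```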
Tagged Back-Translation
Title | Tagged Back-Translation |
Authors | Isaac Caswell, Ciprian Chelba, David Grangier |
Abstract | Recent work in Neural Machine Translation (NMT) has shown significant quality gains from noised-beam decoding during back-translation, a method to generate synthetic parallel data. We show that the main role of such synthetic noise is not to diversify the source side, as previously suggested, but simply to indicate to the model that the given source is synthetic. We propose a simpler alternative to noising techniques, consisting of tagging back-translated source sentences with an extra token. Our results on WMT outperform noised back-translation on English-Romanian and match its performance on English-German, redefining the state of the art in the former. |
Tasks | Machine Translation |
Published | 2019-06-15 |
URL | https://arxiv.org/abs/1906.06442v1 |
https://arxiv.org/pdf/1906.06442v1.pdf | |
PWC | https://paperswithcode.com/paper/tagged-back-translation |
Repo | |
Framework | |
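The proposed alternative is deliberately simple: instead of noising back-translated sources, mark them with a reserved token. A sketch of the whole trick (the tag string itself is a convention, not fixed by the paper):

```python
# Prepend a reserved tag to every back-translated source sentence so the
# model can tell synthetic data from genuine bitext.
BT_TAG = "<BT>"

def tag_back_translated(source_sentences):
    """Mark synthetic (back-translated) sources; real bitext stays untagged."""
    return [f"{BT_TAG} {s}" for s in source_sentences]

real_pairs = [("the house is small", "das Haus ist klein")]   # untouched
synthetic_sources = tag_back_translated(["a house is very small"])
print(synthetic_sources[0])   # "<BT> a house is very small"
```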
Boosting Few-Shot Visual Learning with Self-Supervision
Title | Boosting Few-Shot Visual Learning with Self-Supervision |
Authors | Spyros Gidaris, Andrei Bursuc, Nikos Komodakis, Patrick Pérez, Matthieu Cord |
Abstract | Few-shot learning and self-supervised learning address different facets of the same problem: how to train a model with little or no labeled data. Few-shot learning aims for optimization methods and models that can learn efficiently to recognize patterns in the low-data regime. Self-supervised learning focuses instead on unlabeled data, looking into it for the supervisory signal to feed high-capacity deep neural networks. In this work, we exploit the complementarity of these two domains and propose an approach for improving few-shot learning through self-supervision. We use self-supervision as an auxiliary task in a few-shot learning pipeline, enabling feature extractors to learn richer and more transferable visual representations while still using few annotated samples. Through self-supervision, our approach can be naturally extended towards using diverse unlabeled data from other datasets in the few-shot setting. We report consistent improvements across an array of architectures, datasets and self-supervision techniques. |
Tasks | Few-Shot Learning |
Published | 2019-06-12 |
URL | https://arxiv.org/abs/1906.05186v1 |
https://arxiv.org/pdf/1906.05186v1.pdf | |
PWC | https://paperswithcode.com/paper/boosting-few-shot-visual-learning-with-self |
Repo | |
Framework | |
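Concretely, the self-supervised objective is attached as an auxiliary head on the shared feature extractor. A hedged sketch using rotation prediction, one of the common self-supervision techniques in this setting, with a toy backbone and loss weight as assumptions:

```python
# Few-shot classification loss plus a 4-way rotation-prediction auxiliary loss
# on a shared backbone.
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32 * 3, 128), nn.ReLU())
fewshot_head = nn.Linear(128, 5)    # e.g. 5-way episode classifier
rotation_head = nn.Linear(128, 4)   # predicts 0/90/180/270 degrees

def rotate_batch(x):
    """Make 4 rotated copies of each image, labeled by rotation index."""
    rots = [torch.rot90(x, k, dims=(2, 3)) for k in range(4)]
    labels = torch.arange(4).repeat_interleave(x.size(0))
    return torch.cat(rots), labels

x = torch.randn(8, 3, 32, 32)
y = torch.randint(0, 5, (8,))
x_rot, y_rot = rotate_batch(x)

loss = nn.functional.cross_entropy(fewshot_head(backbone(x)), y) \
     + 0.5 * nn.functional.cross_entropy(rotation_head(backbone(x_rot)), y_rot)
loss.backward()
```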
Sparsely Activated Networks
Title | Sparsely Activated Networks |
Authors | Paschalis Bizopoulos, Dimitrios Koutsouris |
Abstract | Previous literature on unsupervised learning focused on designing structural priors with the aim of learning meaningful features, but without considering the description length of the learned representations, which is a direct and unbiased measure of model complexity. In this paper, we first introduce the $\varphi$ metric, which evaluates unsupervised models based on their reconstruction accuracy and the degree of compression of their internal representations. We then define two activation functions (Identity, ReLU) as baselines and three sparse activation functions (top-k absolutes, Extrema-Pool indices, Extrema) as candidate structures that minimize $\varphi$. Lastly, we present Sparsely Activated Networks (SANs), which consist of kernels with shared weights that, during encoding, are convolved with the input and then passed through a sparse activation function. During decoding, the same weights are convolved with the sparse activation map, and the partial reconstructions from each weight are summed to reconstruct the input. We compare SANs using the five previously defined activation functions on a variety of datasets (Physionet, UCI-epilepsy, MNIST, FMNIST) and show that models selected using $\varphi$ have representations with small description length and consist of interpretable kernels. |
Tasks | Model Selection |
Published | 2019-07-12 |
URL | https://arxiv.org/abs/1907.06592v3 |
https://arxiv.org/pdf/1907.06592v3.pdf | |
PWC | https://paperswithcode.com/paper/sparsely-activated-networks |
Repo | |
Framework | |
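Of the three sparse activation functions, top-k absolutes is the easiest to state: keep the k largest-magnitude activations and zero the rest. A small sketch (k and the kernel sizes are the paper's hyperparameters, not fixed here):

```python
# "Top-k absolutes" sparse activation: keep the k largest-|value| entries.
import torch

def topk_absolutes(x: torch.Tensor, k: int) -> torch.Tensor:
    """Zero all but the k largest-magnitude entries of each row."""
    _, idx = x.abs().topk(k, dim=-1)
    mask = torch.zeros_like(x).scatter_(-1, idx, 1.0)
    return x * mask

x = torch.tensor([[0.3, -2.0, 0.1, 1.5, -0.2]])
print(topk_absolutes(x, k=2))   # keeps -2.0 and 1.5, zeros the rest
# In a SAN, the encoder convolves shared kernels with the input, applies a
# sparse activation like this one, and the decoder convolves the same kernels
# with the sparse map and sums the partial reconstructions.
```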
On Object Symmetries and 6D Pose Estimation from Images
Title | On Object Symmetries and 6D Pose Estimation from Images |
Authors | Giorgia Pitteri, Michaël Ramamonjisoa, Slobodan Ilic, Vincent Lepetit |
Abstract | Objects with symmetries are common in our daily life and in industrial contexts, but are often ignored in the recent literature on 6D pose estimation from images. In this paper, we study analytically the link between the symmetries of a 3D object and its appearance in images. We explain why symmetrical objects can be a challenge when training machine learning algorithms that aim at estimating their 6D pose from images. We propose an efficient and simple solution that relies on the normalization of the pose rotation. Our approach is general and can be used with any 6D pose estimation algorithm. Moreover, our method is also beneficial for objects that are ‘almost symmetrical’, i.e. objects for which only a detail breaks the symmetry. We validate our approach within a Faster R-CNN framework on a synthetic dataset made with objects from the T-Less dataset, which exhibit various types of symmetries, as well as on real sequences from T-Less. |
Tasks | 6D Pose Estimation, Pose Estimation |
Published | 2019-08-20 |
URL | https://arxiv.org/abs/1908.07640v1 |
https://arxiv.org/pdf/1908.07640v1.pdf | |
PWC | https://paperswithcode.com/paper/190807640 |
Repo | |
Framework | |
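The simplest instructive case of rotation normalization is an object with an n-fold symmetry about one axis: map every equivalent in-plane angle to a single representative in the fundamental interval $[0, 2\pi/n)$, so equivalent poses get a unique training target. A toy sketch (the paper's normalization is more general):

```python
# Canonicalize an in-plane rotation angle under n-fold symmetry.
import numpy as np

def normalize_symmetric_angle(theta: float, n: int) -> float:
    """Canonical representative of theta under n-fold rotational symmetry."""
    period = 2 * np.pi / n
    return theta % period

# A 4-fold symmetric object: these four poses look identical in the image,
# so all of them normalize to the same target angle.
for theta in [0.3, 0.3 + np.pi / 2, 0.3 + np.pi, 0.3 + 3 * np.pi / 2]:
    print(round(normalize_symmetric_angle(theta, n=4), 3))   # 0.3 each time
```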
Attention-Aware Answers of the Crowd
Title | Attention-Aware Answers of the Crowd |
Authors | Jingzheng Tu, Guoxian Yu, Jun Wang, Carlotta Domeniconi, Xiangliang Zhang |
Abstract | Crowdsourcing is a relatively economical and efficient solution for collecting annotations from the crowd through online platforms. Answers collected from workers with different expertise may be noisy and unreliable, so the quality of annotated data needs to be further maintained. Various solutions have been attempted to obtain high-quality annotations. However, they all assume that workers' label quality is stable over time (always at the same level whenever they conduct the tasks). In practice, workers' attention level changes over time, and ignoring this can affect the reliability of the annotations. In this paper, we focus on a novel and realistic crowdsourcing scenario involving attention-aware annotations. We propose a new probabilistic model that takes workers' attention into account to estimate label quality. Expectation propagation is adopted for efficient Bayesian inference of our model, and a generalized Expectation Maximization algorithm is derived to estimate both the ground truth of all tasks and the label quality of each individual crowd worker with attention. In addition, the number of tasks best suited for a worker is estimated according to changes in attention. Experiments against related methods on three real-world datasets and one semi-simulated dataset demonstrate that our method quantifies the relationship between workers' attention and label quality on the given tasks, and improves the aggregated labels. |
Tasks | Bayesian Inference |
Published | 2019-12-24 |
URL | https://arxiv.org/abs/1912.11238v2 |
https://arxiv.org/pdf/1912.11238v2.pdf | |
PWC | https://paperswithcode.com/paper/attention-aware-answers-of-the-crowd |
Repo | |
Framework | |
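For orientation, the underlying aggregation machinery resembles EM-based label aggregation with per-worker quality (a "one-coin" Dawid-Skene variant). The paper's contribution of attention-dependent, time-varying quality is omitted in this toy sketch, which is not the authors' model:

```python
# Toy EM: jointly estimate binary task truths and per-worker accuracies.
import numpy as np

def em_aggregate(answers: np.ndarray, n_iters: int = 20):
    """answers: (tasks, workers) binary labels. Returns P(truth=1), accuracies."""
    acc = np.full(answers.shape[1], 0.7)          # init worker accuracies
    for _ in range(n_iters):
        # E-step: posterior over each task's true label.
        like1 = np.prod(np.where(answers == 1, acc, 1 - acc), axis=1)
        like0 = np.prod(np.where(answers == 0, acc, 1 - acc), axis=1)
        p_true = like1 / (like1 + like0)
        # M-step: accuracy = expected agreement with the inferred truth.
        agree = answers * p_true[:, None] + (1 - answers) * (1 - p_true[:, None])
        acc = agree.mean(axis=0).clip(0.05, 0.95)
    return p_true, acc

answers = np.array([[1, 1, 0], [1, 1, 1], [0, 1, 0], [0, 0, 0]])
p, acc = em_aggregate(answers)
print(p.round(2), acc.round(2))
```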
Realization of spatial sparseness by deep ReLU nets with massive data
Title | Realization of spatial sparseness by deep ReLU nets with massive data |
Authors | Charles K. Chui, Shao-Bo Lin, Bo Zhang, Ding-Xuan Zhou |
Abstract | The great success of deep learning poses urgent challenges for understanding its working mechanism and rationality. Depth, structure, and the massive size of the data are recognized as three key ingredients of deep learning. Most recent theoretical studies of deep learning focus on the necessity and advantages of the depth and structure of neural networks. In this paper, we aim at a rigorous verification of the importance of massive data to the outperformance of deep learning. To approximate and learn spatially sparse and smooth functions, we establish a novel sampling theorem in learning theory that shows the necessity of massive data. We then prove that implementing classical empirical risk minimization on certain deep nets facilitates the realization of the optimal learning rates derived in the sampling theorem. This perhaps explains why deep learning performs so well in the era of big data. |
Tasks | |
Published | 2019-12-16 |
URL | https://arxiv.org/abs/1912.07464v1 |
https://arxiv.org/pdf/1912.07464v1.pdf | |
PWC | https://paperswithcode.com/paper/realization-of-spatial-sparseness-by-deep |
Repo | |
Framework | |
Sparse Learning for Variable Selection with Structures and Nonlinearities
Title | Sparse Learning for Variable Selection with Structures and Nonlinearities |
Authors | Magda Gregorova |
Abstract | In this thesis we discuss machine learning methods that perform automated variable selection for learning sparse predictive models. There are multiple reasons for promoting sparsity in predictive models. By relying on a limited set of input variables, such models naturally counteract the overfitting problem ubiquitous in learning from finite sets of training points. Sparse models are cheaper to use for prediction: they usually require fewer computational resources and, by relying on smaller sets of inputs, can reduce the costs of data collection and storage. Sparse models can also contribute to a better understanding of the investigated phenomena, as they are easier to interpret than full models. |
Tasks | Sparse Learning |
Published | 2019-03-26 |
URL | http://arxiv.org/abs/1903.10978v1 |
http://arxiv.org/pdf/1903.10978v1.pdf | |
PWC | https://paperswithcode.com/paper/sparse-learning-for-variable-selection-with |
Repo | |
Framework | |
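The canonical instance of what the thesis studies is L1-regularized (Lasso) regression, which performs automatic variable selection by driving irrelevant coefficients exactly to zero. A standard scikit-learn illustration, not taken from the thesis itself:

```python
# Lasso recovers a sparse model: only the truly relevant inputs get
# nonzero coefficients.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + 0.1 * rng.normal(size=200)  # 2 relevant inputs

model = Lasso(alpha=0.1).fit(X, y)
print(model.coef_.round(2))   # nonzero (approximately) only at positions 0 and 3
```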
Correctness Verification of Neural Networks
Title | Correctness Verification of Neural Networks |
Authors | Yichen Yang, Martin Rinard |
Abstract | We present the first verification that a neural network produces a correct output within a specified tolerance for every input of interest. We define correctness relative to a specification which identifies 1) a state space consisting of all relevant states of the world and 2) an observation process that produces neural network inputs from the states of the world. Tiling the state and input spaces with a finite number of tiles, obtaining ground truth bounds from the state tiles and network output bounds from the input tiles, then comparing the ground truth and network output bounds delivers an upper bound on the network output error for any input of interest. Results from a case study highlight the ability of our technique to deliver tight error bounds for all inputs of interest and show how the error bounds vary over the state and input spaces. |
Tasks | |
Published | 2019-06-03 |
URL | https://arxiv.org/abs/1906.01030v2 |
https://arxiv.org/pdf/1906.01030v2.pdf | |
PWC | https://paperswithcode.com/paper/correctness-verification-of-neural-networks |
Repo | |
Framework | |
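A 1-D toy rendition of the tiling argument, with hand-picked functions and Lipschitz constants as assumptions (the paper's technique handles real networks and observation processes):

```python
# Tile the input interval, bound the network-vs-ground-truth error per tile
# via Lipschitz constants, and report the worst tile as a certified bound.
import numpy as np

def ground_truth(x):                 # true quantity of interest (Lipschitz <= 2)
    return 2.0 * x

def network(x):                      # stand-in for the trained net (Lipschitz <= 2.5)
    return 2.0 * x + 0.05 * np.sin(10.0 * x)

def certified_error_bound(lo, hi, n_tiles=1000, lip_gt=2.0, lip_net=2.5):
    """Upper-bound |network - ground_truth| on [lo, hi] by tiling."""
    edges = np.linspace(lo, hi, n_tiles + 1)
    centers = (edges[:-1] + edges[1:]) / 2
    half_width = (hi - lo) / (2 * n_tiles)
    # Per tile: error at the center plus the most either function can drift.
    center_err = np.abs(network(centers) - ground_truth(centers))
    return float(np.max(center_err + (lip_gt + lip_net) * half_width))

print(f"certified bound on [0, 1]: {certified_error_bound(0.0, 1.0):.4f}")
# The true maximum error is 0.05 (the sine amplitude); the certified bound
# is slightly looser and tightens as n_tiles grows.
```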