October 17, 2019

3258 words 16 mins read

Paper Group ANR 787

Generative Temporal Models with Spatial Memory for Partially Observed Environments

Title Generative Temporal Models with Spatial Memory for Partially Observed Environments
Authors Marco Fraccaro, Danilo Jimenez Rezende, Yori Zwols, Alexander Pritzel, S. M. Ali Eslami, Fabio Viola
Abstract In model-based reinforcement learning, generative and temporal models of environments can be leveraged to boost agent performance, either by tuning the agent’s representations during training or via use as part of an explicit planning mechanism. However, their application in practice has been limited to simplistic environments, due to the difficulty of training such models in larger, potentially partially-observed and 3D environments. In this work we introduce a novel action-conditioned generative model of such challenging environments. The model features a non-parametric spatial memory system in which we store learned, disentangled representations of the environment. Low-dimensional spatial updates are computed using a state-space model that makes use of knowledge on the prior dynamics of the moving agent, and high-dimensional visual observations are modelled with a Variational Auto-Encoder. The result is a scalable architecture capable of performing coherent predictions over hundreds of time steps across a range of partially observed 2D and 3D environments.
Tasks
Published 2018-04-25
URL http://arxiv.org/abs/1804.09401v2
PDF http://arxiv.org/pdf/1804.09401v2.pdf
PWC https://paperswithcode.com/paper/generative-temporal-models-with-spatial
Repo
Framework
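To make the architecture above concrete, here is a toy sketch, not the authors' code, of the two ingredients the abstract combines: a low-dimensional state-space transition that exploits known agent dynamics, and a non-parametric spatial memory storing learned codes keyed by position. The 2D dynamics, noise scale, and nearest-neighbour read are illustrative assumptions.

```python
# A minimal sketch of the paper's core idea: a low-dimensional state-space
# model tracks the agent's pose, while a non-parametric memory stores
# learned frame codes indexed by position.
import numpy as np

class SpatialMemoryModel:
    def __init__(self):
        self.keys = []    # visited positions (the "where")
        self.values = []  # stored latent codes (the "what"), e.g. VAE codes

    def transition(self, state, action, noise=0.05):
        # State-space prior: known agent dynamics plus Gaussian noise.
        # Here `action` is simply a 2D displacement; the real model learns this.
        return state + np.asarray(action) + noise * np.random.randn(2)

    def write(self, state, code):
        self.keys.append(np.asarray(state))
        self.values.append(code)

    def read(self, state):
        # Retrieve the code stored nearest the predicted position; the
        # decoder (a VAE in the paper) would map it back to an observation.
        dists = [np.linalg.norm(state - k) for k in self.keys]
        return self.values[int(np.argmin(dists))]

model = SpatialMemoryModel()
state = np.zeros(2)
for t, action in enumerate([(1, 0), (0, 1), (-1, 0)]):
    state = model.transition(state, action)
    model.write(state, code=f"z_{t}")     # stand-in for a learned VAE code
print(model.read(np.array([0.0, 1.0])))   # predict by recalling nearby memory
```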

A Survey on Joint Object Detection and Pose Estimation using Monocular Vision

Title A Survey on Joint Object Detection and Pose Estimation using Monocular Vision
Authors Aniruddha V Patil, Pankaj Rabha
Abstract In this survey we present a complete landscape of joint object detection and pose estimation methods that use monocular vision. Descriptions of traditional approaches that involve descriptors or models and various estimation methods have been provided. These descriptors or models include chordiograms, shape-aware deformable parts model, bag of boundaries, distance transform templates, natural 3D markers and facet features whereas the estimation methods include iterative clustering estimation, probabilistic networks and iterative genetic matching. Hybrid approaches that use handcrafted feature extraction followed by estimation by deep learning methods have been outlined. We have investigated and compared, wherever possible, pure deep learning based approaches (single stage and multi stage) for this problem. Comprehensive details of the various accuracy measures and metrics have been illustrated. For the purpose of giving a clear overview, the characteristics of relevant datasets are discussed. The trends that prevailed from the infancy of this problem until now have also been highlighted.
Tasks Object Detection, Pose Estimation
Published 2018-11-26
URL http://arxiv.org/abs/1811.10216v1
PDF http://arxiv.org/pdf/1811.10216v1.pdf
PWC https://paperswithcode.com/paper/a-survey-on-joint-object-detection-and-pose
Repo
Framework

Network-based protein structural classification

Title Network-based protein structural classification
Authors Khalique Newaz, Mahboobeh Ghalehnovi, Arash Rahnama, Panos J. Antsaklis, Tijana Milenkovic
Abstract Experimental determination of protein function is resource-consuming. As an alternative, computational prediction of protein function has received attention. In this context, protein structural classification (PSC) can help, by determining the structural classes of currently unclassified proteins from their features and then relying on the fact that proteins with similar structures have similar functions. Existing PSC approaches rely on sequence-based or direct 3-dimensional (3D) structure-based protein features. In contrast, we first model 3D structures of proteins as protein structure networks (PSNs). Then, we use network-based features for PSC. We propose the use of graphlets, state-of-the-art features in many research areas of network science, in the task of PSC. Moreover, because graphlets can deal only with unweighted PSNs, and because accounting for edge weights when constructing PSNs could improve PSC accuracy, we also propose a deep learning framework that automatically learns network features from weighted PSNs. When evaluated on a large set of ~9,400 CATH and ~12,800 SCOP protein domains (spanning 36 PSN sets), our proposed approaches are superior to existing PSC approaches in terms of accuracy, with comparable running time.
Tasks
Published 2018-04-12
URL https://arxiv.org/abs/1804.04725v7
PDF https://arxiv.org/pdf/1804.04725v7.pdf
PWC https://paperswithcode.com/paper/network-based-protein-structural
Repo
Framework
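The pipeline the abstract describes is easy to sketch. Below is a minimal, hedged illustration: residues become nodes of a protein structure network whenever their coordinates fall within a contact threshold, and small-subgraph counts serve as stand-ins for the full graphlet features; the 8 Å cutoff and the three-feature vector are assumptions, not the paper's exact configuration.

```python
# Model a protein's 3D structure as a graph (residues as nodes, spatial
# contacts as edges), count small subgraph patterns, and classify on those
# counts. A sketch of the idea, not the paper's feature set.
import networkx as nx
import numpy as np

def contact_network(coords, threshold=8.0):
    # Nodes are residues; edges join residues whose C-alpha atoms are close.
    G = nx.Graph()
    G.add_nodes_from(range(len(coords)))
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if np.linalg.norm(coords[i] - coords[j]) < threshold:
                G.add_edge(i, j)
    return G

def graphlet_features(G):
    # Tiny feature vector: edge count, triangle count, and 2-path count.
    # (The paper uses full graphlet counts; this is the same idea in miniature.)
    n_edges = G.number_of_edges()
    n_triangles = sum(nx.triangles(G).values()) // 3
    n_paths2 = sum(d * (d - 1) // 2 for _, d in G.degree())
    return np.array([n_edges, n_triangles, n_paths2])

coords = np.random.rand(30, 3) * 20.0       # fake C-alpha coordinates
print(graphlet_features(contact_network(coords)))
```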

Privacy Amplification by Subsampling: Tight Analyses via Couplings and Divergences

Title Privacy Amplification by Subsampling: Tight Analyses via Couplings and Divergences
Authors Borja Balle, Gilles Barthe, Marco Gaboardi
Abstract Differential privacy comes equipped with multiple analytical tools for the design of private data analyses. One important tool is the so-called “privacy amplification by subsampling” principle, which ensures that a differentially private mechanism run on a random subsample of a population provides higher privacy guarantees than when run on the entire population. Several instances of this principle have been studied for different random subsampling methods, each with an ad-hoc analysis. In this paper we present a general method that recovers and improves prior analyses, yields lower bounds and derives new instances of privacy amplification by subsampling. Our method leverages a characterization of differential privacy as a divergence which emerged in the program verification community. Furthermore, it introduces new tools, including advanced joint convexity and privacy profiles, which might be of independent interest.
Tasks
Published 2018-07-04
URL http://arxiv.org/abs/1807.01647v2
PDF http://arxiv.org/pdf/1807.01647v2.pdf
PWC https://paperswithcode.com/paper/privacy-amplification-by-subsampling-tight
Repo
Framework
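As a numeric illustration of the principle (using the classic subsampling bound that this line of work recovers and tightens): an ε-DP mechanism run on a random γ-fraction subsample satisfies ε′ = log(1 + γ(e^ε − 1))-DP, so smaller sampling fractions buy stronger guarantees.

```python
# Illustration of privacy amplification by subsampling (the standard bound,
# not the paper's code): eps' = log(1 + gamma * (exp(eps) - 1)).
import math

def amplified_epsilon(eps, gamma):
    return math.log(1.0 + gamma * (math.exp(eps) - 1.0))

for gamma in (1.0, 0.1, 0.01):
    print(f"gamma={gamma:5.2f}  eps'={amplified_epsilon(1.0, gamma):.4f}")
# gamma=1.00 leaves eps unchanged; gamma=0.01 shrinks eps=1 to roughly 0.017.
```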

Local Probabilistic Model for Bayesian Classification: a Generalized Local Classification Model

Title Local Probabilistic Model for Bayesian Classification: a Generalized Local Classification Model
Authors Chengsheng Mao, Lijuan Lu, Bin Hu
Abstract In Bayesian classification, it is important to establish a probabilistic model for each class for likelihood estimation. Most previous methods modeled the probability distribution over the whole sample space. However, real-world problems are usually too complex to model over the whole sample space; some fundamental assumptions are required to simplify the global model, for example, the class conditional independence assumption for naive Bayesian classification. In this paper, with the insight that the distribution in a local sample space should be simpler than that in the whole sample space, a local probabilistic model established for a local region is expected to be much simpler and can relax the fundamental assumptions that may not hold over the whole sample space. Based on these advantages, we propose establishing local probabilistic models for Bayesian classification. In addition, a Bayesian classifier adopting a local probabilistic model can even be viewed as a generalized local classification model; by tuning the size of the local region and the corresponding local model assumption, a fitting model can be established for a particular classification problem. Experimental results on several real-world datasets demonstrate the effectiveness of local probabilistic models for Bayesian classification.
Tasks
Published 2018-12-13
URL http://arxiv.org/abs/1812.05221v1
PDF http://arxiv.org/pdf/1812.05221v1.pdf
PWC https://paperswithcode.com/paper/local-probabilistic-model-for-bayesian
Repo
Framework
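A minimal sketch of the local-model idea, under assumptions of ours rather than the paper's: fit a simple class-conditional model (here, a diagonal Gaussian naive Bayes) only on the k training points nearest the query, so the simplifying assumptions need to hold only locally.

```python
# Local Bayesian classification: restrict likelihood estimation to the
# k nearest training points instead of the whole sample space.
import numpy as np

def local_bayes_predict(X, y, x_query, k=20):
    # Restrict to the local region around the query point.
    idx = np.argsort(np.linalg.norm(X - x_query, axis=1))[:k]
    Xl, yl = X[idx], y[idx]
    best_class, best_score = None, -np.inf
    for c in np.unique(yl):
        Xc = Xl[yl == c]
        if len(Xc) < 2:
            continue
        mu, var = Xc.mean(axis=0), Xc.var(axis=0) + 1e-6
        # Log-likelihood under a per-class diagonal Gaussian, plus the
        # local class prior.
        loglik = -0.5 * np.sum((x_query - mu) ** 2 / var + np.log(var))
        score = loglik + np.log(len(Xc) / len(Xl))
        if score > best_score:
            best_class, best_score = c, score
    return best_class

X = np.random.randn(200, 2)
y = (X[:, 0] + np.sin(3 * X[:, 1]) > 0).astype(int)   # a nonlinear boundary
print(local_bayes_predict(X, y, np.array([0.5, -0.2])))
```

Shrinking k makes the local model more flexible but noisier; growing k recovers the global model. That trade-off is the tuning knob the abstract alludes to.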

DSCnet: Replicating Lidar Point Clouds with Deep Sensor Cloning

Title DSCnet: Replicating Lidar Point Clouds with Deep Sensor Cloning
Authors Paden Tomasello, Sammy Sidhu, Anting Shen, Matthew W. Moskewicz, Nobie Redmon, Gayatri Joshi, Romi Phadte, Paras Jain, Forrest Iandola
Abstract Convolutional neural networks (CNNs) have become increasingly popular for solving a variety of computer vision tasks, ranging from image classification to image segmentation. Recently, autonomous vehicles have created a demand for depth information, which is often obtained using hardware sensors such as Light Detection and Ranging (LIDAR). Although it can provide precise distance measurements, most LIDARs are still far too expensive to sell in mass-produced consumer vehicles, which has motivated methods to generate depth information from commodity automotive sensors like cameras. In this paper, we propose an approach called Deep Sensor Cloning (DSC). The idea is to use Convolutional Neural Networks in conjunction with inexpensive sensors to replicate the 3D point-clouds that are created by expensive LIDARs. To accomplish this, we develop a new dataset (DSDepth) and a new family of CNN architectures (DSCnets). While previous tasks such as KITTI depth prediction use interpolated RGB-D images as ground truth for training, we instead use DSCnets to directly predict LIDAR point-clouds. When we compare the output of our models to a $75,000 LIDAR, we find that our most accurate DSCnet achieves a relative error of 5.77% using a single camera and 4.69% using stereo cameras.
Tasks Autonomous Vehicles, Depth Estimation, Image Classification, Semantic Segmentation
Published 2018-11-17
URL http://arxiv.org/abs/1811.07070v2
PDF http://arxiv.org/pdf/1811.07070v2.pdf
PWC https://paperswithcode.com/paper/dscnet-replicating-lidar-point-clouds-with
Repo
Framework
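The sensor-cloning setup is straightforward to sketch in PyTorch: a CNN maps a camera frame to a LIDAR-like range image and is scored by the relative error the abstract reports. The tiny architecture, shapes, and loss below are placeholders, not the DSCnet design.

```python
# A minimal stand-in for sensor cloning: camera image in, LIDAR-style
# range grid out, trained against real LIDAR sweeps.
import torch
import torch.nn as nn

class TinySensorCloner(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 1),  # one channel: predicted range per cell
        )

    def forward(self, image):
        return self.net(image)

model = TinySensorCloner()
image = torch.rand(1, 3, 64, 256)            # fake camera frame
lidar = torch.rand(1, 1, 16, 64) * 50 + 1.0  # fake 16-beam range image
pred = model(image)
# Relative error, as in the paper's headline metric.
rel_err = ((pred - lidar).abs() / lidar).mean()
rel_err.backward()   # train by minimizing relative error (one plausible choice)
print(float(rel_err))
```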

A Novel Multi-clustering Method for Hierarchical Clusterings, Based on Boosting

Title A Novel Multi-clustering Method for Hierarchical Clusterings, Based on Boosting
Authors Elaheh Rashedi, Abdolreza Mirzaei
Abstract Bagging and boosting have proven to be among the best methods for building multiple classifiers in classification combination problems. In the area of “flat clustering” problems, it is also recognized that multi-clustering methods based on boosting provide clusterings of improved quality. In this paper, we introduce a novel multi-clustering method for “hierarchical clusterings” based on boosting theory, which creates a more stable hierarchical clustering of a dataset. The proposed algorithm includes a boosting iteration in which a bootstrap of samples is created by weighted random sampling of elements from the original dataset. A hierarchical clustering algorithm is then applied to the selected subsample to build a dendrogram that describes the hierarchy. Finally, the dissimilarity description matrices of the multiple dendrogram results are combined into a consensus matrix using a hierarchical-clustering-combination approach. Experiments on popular real-world datasets show that the boosted method provides superior-quality solutions compared to standard hierarchical clustering methods.
Tasks
Published 2018-05-29
URL http://arxiv.org/abs/1805.11712v1
PDF http://arxiv.org/pdf/1805.11712v1.pdf
PWC https://paperswithcode.com/paper/a-novel-multi-clustering-method-for
Repo
Framework
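A minimal sketch of the boosting loop described above, with simplifications: sampling weights are kept uniform (the paper reweights them across iterations), and the dissimilarity descriptions are the dendrograms' cophenetic matrices, averaged into a consensus from which the final hierarchy is built.

```python
# Boosted hierarchical clustering in miniature: subsample, cluster,
# combine cophenetic matrices, and build a consensus dendrogram.
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet
from scipy.spatial.distance import squareform

def boosted_hierarchical(X, n_rounds=10, frac=0.8, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X)
    consensus = np.zeros((n, n))
    counts = np.zeros((n, n))
    for _ in range(n_rounds):
        idx = rng.choice(n, size=int(frac * n), replace=False)
        Z = linkage(X[idx], method="average")
        coph = squareform(cophenet(Z))           # pairwise merge heights
        # Accumulate cophenetic distances back into the full matrix.
        consensus[np.ix_(idx, idx)] += coph
        counts[np.ix_(idx, idx)] += 1
    consensus /= np.maximum(counts, 1)
    # Final consensus dendrogram built from the combined dissimilarities.
    return linkage(squareform(consensus, checks=False), method="average")

X = np.random.rand(40, 5)
print(boosted_hierarchical(X).shape)  # (n-1, 4) linkage matrix
```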

Perceptual Conditional Generative Adversarial Networks for End-to-End Image Colourization

Title Perceptual Conditional Generative Adversarial Networks for End-to-End Image Colourization
Authors Shirsendu Sukanta Halder, Kanjar De, Partha Pratim Roy
Abstract Colours are everywhere. They embody a significant part of human visual perception. In this paper, we explore the paradigm of hallucinating colours from a given gray-scale image. The problem of colourization has been dealt with in previous literature, but mostly in a supervised manner involving user interference. With the emergence of Deep Learning methods, numerous tasks related to computer vision and pattern recognition have been automated and carried out in an end-to-end fashion due to the availability of large datasets and high-power computing systems. We investigate and build upon the recent success of Conditional Generative Adversarial Networks (cGANs) for Image-to-Image translations. In addition to using the training scheme in the basic cGAN, we propose an encoder-decoder generator network which utilizes the class-specific cross-entropy loss as well as the perceptual loss in addition to the original objective function of cGAN. We train our model on a large-scale dataset and present illustrative qualitative and quantitative analysis of our results. Our results vividly display the versatility and proficiency of our methods through life-like colourization outcomes.
Tasks
Published 2018-11-27
URL http://arxiv.org/abs/1811.10801v1
PDF http://arxiv.org/pdf/1811.10801v1.pdf
PWC https://paperswithcode.com/paper/perceptual-conditional-generative-adversarial
Repo
Framework
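A minimal PyTorch sketch of the composite generator objective the abstract lists: the cGAN adversarial term plus a perceptual (feature-space) loss and a class-specific cross-entropy. All networks below are stand-ins, and the weights lambda_p and lambda_c are assumed hyperparameters.

```python
# Composite generator loss: adversarial + perceptual + class cross-entropy.
import torch
import torch.nn as nn
import torch.nn.functional as F

def generator_loss(G, D, feat_net, classifier, gray, real_rgb, labels,
                   lambda_p=10.0, lambda_c=1.0):
    fake_rgb = G(gray)
    # Adversarial term: the discriminator judges (condition, output) pairs.
    logits = D(torch.cat([gray, fake_rgb], dim=1))
    adv = F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))
    # Perceptual term: match deep features of the fake and real colour images.
    perc = F.l1_loss(feat_net(fake_rgb), feat_net(real_rgb))
    # Class-specific cross-entropy on the colourized output.
    cls = F.cross_entropy(classifier(fake_rgb), labels)
    return adv + lambda_p * perc + lambda_c * cls

G = nn.Conv2d(1, 3, 3, padding=1)                  # stand-in generator
D = nn.Conv2d(4, 1, 3, padding=1)                  # stand-in discriminator
feat_net = nn.Conv2d(3, 8, 3, padding=1)           # stand-in for VGG features
classifier = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
gray, real_rgb = torch.rand(2, 1, 32, 32), torch.rand(2, 3, 32, 32)
print(float(generator_loss(G, D, feat_net, classifier, gray, real_rgb,
                           torch.tensor([3, 7]))))
```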

Unsupervised Person Re-identification by Deep Learning Tracklet Association

Title Unsupervised Person Re-identification by Deep Learning Tracklet Association
Authors Minxian Li, Xiatian Zhu, Shaogang Gong
Abstract Most existing person re-identification (re-id) methods rely on supervised model learning on per-camera-pair manually labelled pairwise training data. This leads to poor scalability in practical re-id deployment due to the lack of exhaustive identity labelling of image positive and negative pairs for every camera pair. In this work, we address this problem by proposing an unsupervised re-id deep learning approach capable of incrementally discovering and exploiting the underlying re-id discriminative information from automatically generated person tracklet data from videos in an end-to-end model optimisation. We formulate a Tracklet Association Unsupervised Deep Learning (TAUDL) framework characterised by jointly learning per-camera (within-camera) tracklet association (labelling) and cross-camera tracklet correlation by maximising the discovery of most likely tracklet relationships across camera views. Extensive experiments demonstrate the superiority of the proposed TAUDL model over the state-of-the-art unsupervised and domain adaptation re-id methods using six person re-id benchmarking datasets.
Tasks Domain Adaptation, Person Re-Identification, Unsupervised Person Re-Identification
Published 2018-09-08
URL http://arxiv.org/abs/1809.02874v1
PDF http://arxiv.org/pdf/1809.02874v1.pdf
PWC https://paperswithcode.com/paper/unsupervised-person-re-identification-by-deep-1
Repo
Framework
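The per-camera half of TAUDL can be sketched simply: within each camera, every automatically generated tracklet acts as its own pseudo-class, so a camera-specific classification head trains without manual identity labels. This toy version omits the cross-camera tracklet correlation term, and all sizes are invented.

```python
# Per-camera tracklet association: tracklet IDs serve as free pseudo-labels.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PerCameraTrackletHeads(nn.Module):
    def __init__(self, feat_dim, tracklets_per_camera):
        super().__init__()
        # One classification head per camera, sized by its tracklet count.
        self.heads = nn.ModuleList(
            nn.Linear(feat_dim, n) for n in tracklets_per_camera)

    def loss(self, features, camera_id, tracklet_id):
        logits = self.heads[camera_id](features)
        return F.cross_entropy(logits, tracklet_id)

backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 32, 128))
heads = PerCameraTrackletHeads(feat_dim=128, tracklets_per_camera=[50, 80])
images = torch.rand(4, 3, 64, 32)                # person crops from camera 1
tracklet_ids = torch.tensor([0, 3, 3, 12])       # auto-generated labels
print(float(heads.loss(backbone(images), camera_id=1,
                       tracklet_id=tracklet_ids)))
```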

Deep Probabilistic Programming Languages: A Qualitative Study

Title Deep Probabilistic Programming Languages: A Qualitative Study
Authors Guillaume Baudart, Martin Hirzel, Louis Mandel
Abstract Deep probabilistic programming languages try to combine the advantages of deep learning with those of probabilistic programming languages. If successful, this would be a big step forward in machine learning and programming languages. Unfortunately, as of now, this new crop of languages is hard to use and understand. This paper addresses this problem directly by explaining deep probabilistic programming languages and indirectly by characterizing their current strengths and weaknesses.
Tasks Probabilistic Programming
Published 2018-04-17
URL http://arxiv.org/abs/1804.06458v1
PDF http://arxiv.org/pdf/1804.06458v1.pdf
PWC https://paperswithcode.com/paper/deep-probabilistic-programming-languages-a
Repo
Framework
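For concreteness, here are a few lines of Pyro, one of the deep probabilistic programming languages such a study covers, showing the flavour: tensor computation and probabilistic sampling statements mix in ordinary Python. This snippet is our illustration, not an example drawn from the paper.

```python
# A Bayesian-regression-style model in Pyro: priors, a plate over data,
# and an observed likelihood, all as plain Python.
import torch
import pyro
import pyro.distributions as dist

def model(x, y=None):
    # Latent weight and bias with Gaussian priors.
    w = pyro.sample("w", dist.Normal(0.0, 1.0))
    b = pyro.sample("b", dist.Normal(0.0, 1.0))
    with pyro.plate("data", len(x)):
        # Likelihood: observed y values scatter around the regression line.
        pyro.sample("obs", dist.Normal(w * x + b, 0.1), obs=y)

x = torch.linspace(0, 1, 20)
trace = pyro.poutine.trace(model).get_trace(x)   # run the generative model
print(float(trace.nodes["w"]["value"]))
```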

User Information Augmented Semantic Frame Parsing using Coarse-to-Fine Neural Networks

Title User Information Augmented Semantic Frame Parsing using Coarse-to-Fine Neural Networks
Authors Yilin Shen, Xiangyu Zeng, Yu Wang, Hongxia Jin
Abstract Semantic frame parsing is a crucial component in spoken language understanding (SLU) for building spoken dialog systems. It has two main tasks: intent detection and slot filling. Although state-of-the-art approaches have shown good results, they require large annotated training data and long training times. In this paper, we aim to alleviate these drawbacks for semantic frame parsing by utilizing ubiquitous user information. We design a novel coarse-to-fine deep neural network model that incorporates prior knowledge of user information at an intermediate stage to train a semantic frame parser better and more quickly. Due to the lack of a benchmark dataset with real user information, we synthesize the simplest type of user information (location and time) on the ATIS benchmark data. The results show that our approach leverages such simple user information to outperform state-of-the-art approaches by 0.25% for intent detection and 0.31% for slot filling using standard training data. When using smaller training data, the performance improvement on intent detection and slot filling reaches up to 1.35% and 1.20% respectively. We also show that our approach can achieve performance similar to state-of-the-art approaches using less than 80% of the annotated training data. Moreover, the training time to reach that performance is reduced by over 60%.
Tasks Intent Detection, Slot Filling, Spoken Language Understanding
Published 2018-09-18
URL http://arxiv.org/abs/1809.06559v1
PDF http://arxiv.org/pdf/1809.06559v1.pdf
PWC https://paperswithcode.com/paper/user-information-augmented-semantic-frame
Repo
Framework
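A minimal PyTorch sketch of the joint task setup: an encoder reads the utterance, one head predicts the intent, a per-token head fills slots, and an embedded user-context vector (a location/time bucket) is injected into the encoding. The wiring and the sizes are illustrative; the paper's coarse-to-fine design is richer.

```python
# Joint intent detection + slot filling with injected user context.
import torch
import torch.nn as nn

class FrameParser(nn.Module):
    def __init__(self, vocab, n_intents, n_slots, n_contexts, dim=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.ctx = nn.Embedding(n_contexts, dim)   # user location/time bucket
        self.enc = nn.LSTM(dim, dim, batch_first=True)
        self.intent_head = nn.Linear(dim, n_intents)
        self.slot_head = nn.Linear(dim, n_slots)

    def forward(self, tokens, context_id):
        # Add the user-context embedding to every token embedding.
        h = self.emb(tokens) + self.ctx(context_id).unsqueeze(1)
        out, (hn, _) = self.enc(h)
        # Intent from the final state; slot label per token.
        return self.intent_head(hn[-1]), self.slot_head(out)

parser = FrameParser(vocab=1000, n_intents=18, n_slots=127, n_contexts=10)
tokens = torch.randint(0, 1000, (2, 12))       # a batch of two utterances
intent_logits, slot_logits = parser(tokens, torch.tensor([3, 7]))
print(intent_logits.shape, slot_logits.shape)  # (2, 18) and (2, 12, 127)
```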

Recurrent Calibration Network for Irregular Text Recognition

Title Recurrent Calibration Network for Irregular Text Recognition
Authors Yunze Gao, Yingying Chen, Jinqiao Wang, Zhen Lei, Xiao-Yu Zhang, Hanqing Lu
Abstract Scene text recognition has received increased attention in the research community. Text in the wild often possesses irregular arrangements, typically including perspective text, curved text, and oriented text. Most existing methods struggle with irregular text, especially severely distorted text. In this paper, we propose a novel Recurrent Calibration Network (RCN) for irregular scene text recognition. The RCN progressively calibrates the irregular text to boost recognition performance. By decomposing the calibration process into multiple steps, the irregular text can be calibrated to a normal one step by step. Besides, in order to avoid the accumulation of lost information caused by inaccurate transformation, we further design a fiducial-point refinement structure to keep the integrity of the text during the recurrent process. Instead of the calibrated images, the coordinates of the fiducial points are tracked and refined, which implicitly models the transformation information. Based on the refined fiducial points, we estimate the transformation parameters and sample from the original image at each step. In this way, the original character information is preserved until the final transformation. Such designs lead to optimal calibration results that boost the performance of the succeeding recognition. Extensive experiments on challenging datasets demonstrate the superiority of our method, especially on irregular benchmarks.
Tasks Calibration, Irregular Text Recognition, Scene Text Recognition
Published 2018-12-18
URL http://arxiv.org/abs/1812.07145v1
PDF http://arxiv.org/pdf/1812.07145v1.pdf
PWC https://paperswithcode.com/paper/recurrent-calibration-network-for-irregular
Repo
Framework
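The key design point, resampling at each step from the original image so losses never accumulate, can be sketched as below. For brevity, the fiducial points are collapsed into an affine transform and the point predictor is a stand-in; the paper tracks and refines actual fiducial coordinates and uses a richer transformation.

```python
# Recurrent calibration: refine the transform step by step, but always
# resample from the ORIGINAL image so pixel information is preserved.
import torch
import torch.nn as nn
import torch.nn.functional as F

def recurrent_calibrate(image, point_predictor, n_steps=3):
    batch = image.size(0)
    # Initialize with an identity affine transform (2x3 per image).
    theta = torch.eye(2, 3).unsqueeze(0).repeat(batch, 1, 1)
    for _ in range(n_steps):
        # Rectify from the original image using the CURRENT transform.
        grid = F.affine_grid(theta, image.shape, align_corners=False)
        rectified = F.grid_sample(image, grid, align_corners=False)
        # Predict a residual update to the transform from the rectified view.
        delta = point_predictor(rectified).view(batch, 2, 3)
        theta = theta + 0.1 * delta   # small refinement step
    return rectified

predictor = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 100, 6))
image = torch.rand(2, 3, 32, 100)   # a distorted text-line crop
print(recurrent_calibrate(image, predictor).shape)
```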

Source-Critical Reinforcement Learning for Transferring Spoken Language Understanding to a New Language

Title Source-Critical Reinforcement Learning for Transferring Spoken Language Understanding to a New Language
Authors He Bai, Yu Zhou, Jiajun Zhang, Liang Zhao, Mei-Yuh Hwang, Chengqing Zong
Abstract To deploy a spoken language understanding (SLU) model to a new language, language transferring is desired to avoid the trouble of acquiring and labeling a large new SLU corpus. Translating the original SLU corpus into the target language is an attractive strategy. However, SLU corpora contain plenty of semantic labels (slots), which general-purpose translators cannot handle well, not to mention additional cultural differences. This paper focuses on the language transferring task given a tiny in-domain parallel SLU corpus. The in-domain parallel corpus can be used as a first adaptation of the general translator. But more importantly, we show how to use reinforcement learning (RL) to further finetune the adapted translator, where translated sentences with more proper slot tags receive higher rewards. We evaluate our approach on Chinese-to-English language transferring for SLU systems. The experimental results show that the English SLU corpus generated via adaptation and reinforcement learning gives us over 97% slot F1 score and over 84% accuracy in domain classification, demonstrating the effectiveness of the proposed language transferring method. Compared with naive translation, our proposed method improves domain classification accuracy by a relative 22%, and the slot filling F1 score by a relative 71% or more.
Tasks Slot Filling, Spoken Language Understanding
Published 2018-08-19
URL http://arxiv.org/abs/1808.06167v2
PDF http://arxiv.org/pdf/1808.06167v2.pdf
PWC https://paperswithcode.com/paper/source-critical-reinforcement-learning-for
Repo
Framework
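A minimal sketch of the RL fine-tuning signal: sample a translation from the adapted translator, reward it by how well slot tags survive, and apply a REINFORCE-style update. The coverage-style reward and constant baseline below are our simplifications of the paper's source-critical scheme.

```python
# Reward slot-tag preservation in sampled translations; REINFORCE-style loss.
import torch

def slot_reward(translated_tokens, required_slots):
    # Fraction of source slot values that survive (tagged) in the output.
    kept = sum(1 for s in required_slots if s in translated_tokens)
    return kept / max(len(required_slots), 1)

def reinforce_loss(log_probs, reward, baseline=0.5):
    # log_probs: per-token log-probabilities of the sampled translation.
    # Higher-than-baseline rewards push the sample's probability up.
    return -(reward - baseline) * log_probs.sum()

sampled = ["book", "a", "flight", "to", "<city>boston</city>", "tomorrow"]
log_probs = torch.randn(len(sampled)).log_softmax(0)  # stand-in values
r = slot_reward(sampled, ["<city>boston</city>"])
print(float(reinforce_loss(log_probs, r)))
```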

Comparative evaluation of instrument segmentation and tracking methods in minimally invasive surgery

Title Comparative evaluation of instrument segmentation and tracking methods in minimally invasive surgery
Authors Sebastian Bodenstedt, Max Allan, Anthony Agustinos, Xiaofei Du, Luis Garcia-Peraza-Herrera, Hannes Kenngott, Thomas Kurmann, Beat Müller-Stich, Sebastien Ourselin, Daniil Pakhomov, Raphael Sznitman, Marvin Teichmann, Martin Thoma, Tom Vercauteren, Sandrine Voros, Martin Wagner, Pamela Wochner, Lena Maier-Hein, Danail Stoyanov, Stefanie Speidel
Abstract Intraoperative segmentation and tracking of minimally invasive instruments is a prerequisite for computer- and robotic-assisted surgery. Since additional hardware such as tracking systems or robot encoders is cumbersome and lacks accuracy, surgical vision is evolving as a promising technique for segmenting and tracking the instruments using only endoscopic images. However, what is missing so far are common image data sets for consistent evaluation and benchmarking of algorithms against each other. This paper presents a comparative validation study of different vision-based methods for instrument segmentation and tracking in the context of robotic as well as conventional laparoscopic surgery. The contribution of the paper is twofold: we introduce a comprehensive validation data set that was provided to the study participants and present the results of the comparative validation study. Based on the results of the validation study, we arrive at the conclusion that modern deep learning approaches outperform other methods in instrument segmentation tasks, but the results are still not perfect. Furthermore, we show that merging results from different methods significantly increases accuracy in comparison to the best stand-alone method. On the other hand, the results of the instrument tracking task show that this is still an open challenge, especially during challenging scenarios in conventional laparoscopic surgery.
Tasks
Published 2018-05-07
URL http://arxiv.org/abs/1805.02475v1
PDF http://arxiv.org/pdf/1805.02475v1.pdf
PWC https://paperswithcode.com/paper/comparative-evaluation-of-instrument
Repo
Framework
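The fusion finding is easy to reproduce in miniature: merge binary instrument masks from several methods by per-pixel majority vote. The random masks below are stand-ins for real method outputs, and majority vote is one plausible merging rule rather than the study's exact procedure.

```python
# Per-pixel majority-vote fusion of binary segmentation masks.
import numpy as np

def majority_vote(masks):
    # masks: list of HxW binary arrays, one per method.
    votes = np.sum(masks, axis=0)
    return (votes * 2 > len(masks)).astype(np.uint8)  # strict majority

masks = [np.random.rand(480, 640) > 0.5 for _ in range(3)]
fused = majority_vote(masks)
print(fused.shape, fused.dtype)
```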

Explicit Reasoning over End-to-End Neural Architectures for Visual Question Answering

Title Explicit Reasoning over End-to-End Neural Architectures for Visual Question Answering
Authors Somak Aditya, Yezhou Yang, Chitta Baral
Abstract Many vision and language tasks require commonsense reasoning beyond data-driven image and natural language processing. Here we adopt Visual Question Answering (VQA) as an example task, where a system is expected to answer a question in natural language about an image. Current state-of-the-art systems attempted to solve the task using deep neural architectures and achieved promising performance. However, the resulting systems are generally opaque and they struggle in understanding questions for which extra knowledge is required. In this paper, we present an explicit reasoning layer on top of a set of penultimate neural network based systems. The reasoning layer enables reasoning and answering questions where additional knowledge is required, and at the same time provides an interpretable interface to the end users. Specifically, the reasoning layer adopts a Probabilistic Soft Logic (PSL) based engine to reason over a basket of inputs: visual relations, the semantic parse of the question, and background ontological knowledge from word2vec and ConceptNet. Experimental analysis of the answers and the key evidential predicates generated on the VQA dataset validate our approach.
Tasks Question Answering, Visual Question Answering
Published 2018-03-23
URL http://arxiv.org/abs/1803.08896v1
PDF http://arxiv.org/pdf/1803.08896v1.pdf
PWC https://paperswithcode.com/paper/explicit-reasoning-over-end-to-end-neural
Repo
Framework
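For readers unfamiliar with PSL, here is the flavour of the reasoning layer in a few lines: truth values live in [0, 1] and logical connectives are Lukasiewicz relaxations, so rules can be scored against soft evidence from sources like word2vec and ConceptNet. The rule below is an invented toy, not one of the paper's actual VQA rules.

```python
# Lukasiewicz relaxations used by Probabilistic Soft Logic (PSL).
def l_and(a, b):      # soft conjunction: max(0, a + b - 1)
    return max(0.0, a + b - 1.0)

def l_or(a, b):       # soft disjunction: min(1, a + b)
    return min(1.0, a + b)

def implies(body, head):
    # Rule satisfaction: I(body -> head) = min(1, 1 - body + head).
    return min(1.0, 1.0 - body + head)

# Made-up soft evidence: a visual relation AND a question-term match...
holds = l_and(0.9, 0.8)
# ...softly implying a candidate answer with truth value 0.6.
rule = implies(holds, 0.6)
print(holds, rule)    # distance to satisfaction is 1 - rule
```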