October 21, 2019

Paper Group AWR 150

Deep Learning Estimation of Absorbed Dose for Nuclear Medicine Diagnostics. CFCM: Segmentation via Coarse to Fine Context Memory. Learning Gaussian Processes by Minimizing PAC-Bayesian Generalization Bounds. Ground Plane Polling for 6DoF Pose Estimation of Objects on the Road. Probabilistic Formulations of Regression with Mixed Guidance. Tensor Rob …

Deep Learning Estimation of Absorbed Dose for Nuclear Medicine Diagnostics

Title Deep Learning Estimation of Absorbed Dose for Nuclear Medicine Diagnostics
Authors Luciano Melodia
Abstract The distribution of energy dose from Lu$^{177}$ radiotherapy can be estimated by convolving an image of a time-integrated activity distribution with a dose voxel kernel (DVK) consisting of different types of tissues. This fast but inaccurate approximation is inappropriate for personalized dosimetry as it neglects tissue heterogeneity. The latter can be calculated using different imaging techniques such as CT and SPECT combined with a time-consuming Monte Carlo simulation. The aim of this study is, for the first time, an estimation of DVKs from CT-derived density kernels (DKs) via deep learning in convolutional neural networks (CNNs). The proposed CNN achieved, on the test set, a mean intersection over union (IoU) of $0.86$ after $308$ epochs and a corresponding mean squared error (MSE) of $1.24 \cdot 10^{-4}$. This generalization ability shows that the trained CNN can indeed learn the complex transfer function from DK to DVK. Future work will evaluate DVKs estimated by CNNs with full Monte Carlo simulations of a whole-body CT to predict patient-specific voxel dose maps.
Tasks
Published 2018-05-23
URL https://arxiv.org/abs/1805.09108v4
PDF https://arxiv.org/pdf/1805.09108v4.pdf
PWC https://paperswithcode.com/paper/deep-learning-estimation-of-absorbed-dose-for
Repo https://github.com/karhunenloeve/karhunenloeve.github.io
Framework none
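
The two metrics quoted in the abstract above, IoU and MSE, can be illustrated with a minimal sketch. The abstract does not say how the continuous kernels are binarized for the IoU, so the `threshold` parameter, the function names and the toy values below are assumptions for illustration, not the author's evaluation code.

```python
def mean_squared_error(pred, target):
    """MSE over two equally sized flat sequences of voxel values."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def intersection_over_union(pred, target, threshold=0.5):
    """IoU of the voxel sets whose value exceeds `threshold`.

    The thresholding step is an assumption; the abstract does not specify it.
    """
    pred_mask = [p > threshold for p in pred]
    target_mask = [t > threshold for t in target]
    intersection = sum(p and t for p, t in zip(pred_mask, target_mask))
    union = sum(p or t for p, t in zip(pred_mask, target_mask))
    # Two empty masks are treated as a perfect match.
    return intersection / union if union else 1.0


# Toy flattened kernels (hypothetical values).
predicted_kernel = [0.9, 0.7, 0.2, 0.1]
reference_kernel = [1.0, 0.4, 0.6, 0.0]

print(mean_squared_error(predicted_kernel, reference_kernel))
print(intersection_over_union(predicted_kernel, reference_kernel))
```

A mean IoU over a test set would average `intersection_over_union` across all kernel pairs.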

CFCM: Segmentation via Coarse to Fine Context Memory

Title CFCM: Segmentation via Coarse to Fine Context Memory
Authors Fausto Milletari, Nicola Rieke, Maximilian Baust, Marco Esposito, Nassir Navab
Abstract Recent neural-network-based architectures for image segmentation make extensive usage of feature forwarding mechanisms to integrate information from multiple scales. Although yielding good results, even deeper architectures and alternative methods for feature fusion at different resolutions have been scarcely investigated for medical applications. In this work we propose to implement segmentation via an encoder-decoder architecture which differs from any other previously published method since (i) it employs a very deep architecture based on residual learning and (ii) combines features via a convolutional Long Short Term Memory (LSTM), instead of concatenation or summation. The intuition is that the memory mechanism implemented by LSTMs can better integrate features from different scales through a coarse-to-fine strategy; hence the name Coarse-to-Fine Context Memory (CFCM). We demonstrate the remarkable advantages of this approach on two datasets: the Montgomery county lung segmentation dataset, and the EndoVis 2015 challenge dataset for surgical instrument segmentation.
Tasks Semantic Segmentation
Published 2018-06-04
URL http://arxiv.org/abs/1806.01413v1
PDF http://arxiv.org/pdf/1806.01413v1.pdf
PWC https://paperswithcode.com/paper/cfcm-segmentation-via-coarse-to-fine-context
Repo https://github.com/faustomilletari/CFCM-2D
Framework tf

Learning Gaussian Processes by Minimizing PAC-Bayesian Generalization Bounds

Title Learning Gaussian Processes by Minimizing PAC-Bayesian Generalization Bounds
Authors David Reeb, Andreas Doerr, Sebastian Gerwinn, Barbara Rakitsch
Abstract Gaussian Processes (GPs) are a generic modelling tool for supervised learning. While they have been successfully applied on large datasets, their use in safety-critical applications is hindered by the lack of good performance guarantees. To this end, we propose a method to learn GPs and their sparse approximations by directly optimizing a PAC-Bayesian bound on their generalization performance, instead of maximizing the marginal likelihood. Besides its theoretical appeal, we find in our evaluation that our learning method is robust and yields significantly better generalization guarantees than other common GP approaches on several regression benchmark datasets.
Tasks Gaussian Processes
Published 2018-10-29
URL http://arxiv.org/abs/1810.12263v2
PDF http://arxiv.org/pdf/1810.12263v2.pdf
PWC https://paperswithcode.com/paper/learning-gaussian-processes-by-minimizing-pac
Repo https://github.com/boschresearch/PAC_GP
Framework tf

Ground Plane Polling for 6DoF Pose Estimation of Objects on the Road

Title Ground Plane Polling for 6DoF Pose Estimation of Objects on the Road
Authors Akshay Rangesh, Mohan M. Trivedi
Abstract This paper introduces an approach to produce accurate 3D detection boxes for objects on the ground using single monocular images. We do so by merging 2D visual cues, 3D object dimensions, and ground plane constraints to produce boxes that are robust against small errors and incorrect predictions. First, we train a single-shot convolutional neural network (CNN) that produces multiple visual and geometric cues of interest: 2D bounding boxes, 2D keypoints of interest, coarse object orientations and object dimensions. Subsets of these cues are then used to poll probable ground planes from a pre-computed database of ground planes, to identify the “best fit” plane with highest consensus. Once identified, the “best fit” plane provides enough constraints to successfully construct the desired 3D detection box, without directly predicting the 6DoF pose of the object. The entire ground plane polling (GPP) procedure is constructed as a non-parametrized layer of the CNN that outputs the desired “best fit” plane and the corresponding 3D keypoints, which together define the final 3D bounding box. Doing so allows us to poll thousands of different ground plane configurations without adding considerable overhead, while also creating a single CNN that directly produces the desired output without the need for post processing. We evaluate our method on the 2D detection and orientation estimation benchmark from the challenging KITTI dataset, and provide additional comparisons for 3D metrics of importance. This single-stage, single-pass CNN results in superior localization and orientation estimation compared to more complex and computationally expensive monocular approaches.
Tasks Pose Estimation
Published 2018-11-16
URL https://arxiv.org/abs/1811.06666v4
PDF https://arxiv.org/pdf/1811.06666v4.pdf
PWC https://paperswithcode.com/paper/ground-plane-polling-for-6dof-pose-estimation
Repo https://github.com/arangesh/Ground-Plane-Polling
Framework tf

Probabilistic Formulations of Regression with Mixed Guidance

Title Probabilistic Formulations of Regression with Mixed Guidance
Authors Aubrey Gress, Ian Davidson
Abstract Regression problems assume every instance is annotated (labeled) with a real value, a form of annotation we call strong guidance. In order for these annotations to be accurate, they must be the result of a precise experiment or measurement. However, in some cases additional weak guidance might be given by imprecise measurements, a domain expert or even crowd sourcing. Current formulations of regression are unable to use both types of guidance. We propose a regression framework that can also incorporate weak guidance based on relative orderings, bounds, neighboring and similarity relations. Consider learning to predict ages from portrait images: these new types of guidance allow weaker forms of annotation, such as stating that a person is in their 20s or that two people are similar in age. These types of annotations can be easier to generate than strong guidance. We introduce a probabilistic formulation for these forms of weak guidance and show that the resulting optimization problems are convex. Our experimental results show the benefits of these formulations on several data sets.
Tasks
Published 2018-04-01
URL http://arxiv.org/abs/1804.01575v1
PDF http://arxiv.org/pdf/1804.01575v1.pdf
PWC https://paperswithcode.com/paper/probabilistic-formulations-of-regression-with
Repo https://github.com/adgress/ICDM2016
Framework none

Tensor Robust Principal Component Analysis with A New Tensor Nuclear Norm

Title Tensor Robust Principal Component Analysis with A New Tensor Nuclear Norm
Authors Canyi Lu, Jiashi Feng, Yudong Chen, Wei Liu, Zhouchen Lin, Shuicheng Yan
Abstract In this paper, we consider the Tensor Robust Principal Component Analysis (TRPCA) problem, which aims to exactly recover the low-rank and sparse components from their sum. Our model is based on the recently proposed tensor-tensor product (or t-product). Induced by the t-product, we first rigorously deduce the tensor spectral norm, tensor nuclear norm, and tensor average rank, and show that the tensor nuclear norm is the convex envelope of the tensor average rank within the unit ball of the tensor spectral norm. These definitions, their relationships and properties are consistent with matrix cases. Equipped with the new tensor nuclear norm, we then solve the TRPCA problem by solving a convex program and provide the theoretical guarantee for the exact recovery. Our TRPCA model and recovery guarantee include matrix RPCA as a special case. Numerical experiments verify our results, and the applications to image recovery and background modeling problems demonstrate the effectiveness of our method.
Tasks
Published 2018-04-10
URL http://arxiv.org/abs/1804.03728v2
PDF http://arxiv.org/pdf/1804.03728v2.pdf
PWC https://paperswithcode.com/paper/tensor-robust-principal-component-analysis
Repo https://github.com/zhaoxile/reproducible-tensor-completion-state-of-the-art
Framework none

FAIM – A ConvNet Method for Unsupervised 3D Medical Image Registration

Title FAIM – A ConvNet Method for Unsupervised 3D Medical Image Registration
Authors Dongyang Kuang, Tanya Schmah
Abstract We present a new unsupervised learning algorithm, “FAIM”, for 3D medical image registration. With a different architecture than the popular “U-net”, the network takes a pair of full image volumes and predicts the displacement fields needed to register source to target. Compared with “U-net” based registration networks such as VoxelMorph, FAIM has fewer trainable parameters but can achieve higher registration accuracy as judged by Dice score on region labels in the Mindboggle-101 dataset. Moreover, with the proposed penalty loss on negative Jacobian determinants, FAIM produces deformations with many fewer “foldings”, i.e. regions of non-invertibility where the surface folds over itself. In our experiment, we varied the strength of this penalty and investigated changes in registration accuracy and non-invertibility in terms of number of “folding” locations. We found that FAIM is able to maintain both the advantages of higher accuracy and fewer “folding” locations over VoxelMorph, over a range of hyper-parameters (with the same values used for both networks). Further, when trading off registration accuracy for better invertibility, FAIM required less sacrifice of registration accuracy. Codes for this paper will be released upon publication.
Tasks Image Registration, Medical Image Registration
Published 2018-11-22
URL https://arxiv.org/abs/1811.09243v2
PDF https://arxiv.org/pdf/1811.09243v2.pdf
PWC https://paperswithcode.com/paper/faim-a-convnet-method-for-unsupervised-3d
Repo https://github.com/dykuang/Medical-image-registration
Framework tf
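
The idea of penalizing negative Jacobian determinants, mentioned in the abstract above, can be sketched as follows. This is a hedged illustration only: the deformation is taken as identity plus a displacement field, derivatives use forward differences, and the penalty is the sum of `max(0, -det J)`. The exact loss used in FAIM is not stated in the abstract, and all function names here are hypothetical.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def jacobian_determinants(disp):
    """Determinants of J = I + grad(u) for a displacement field u.

    disp[z][y][x] is a tuple (u_z, u_y, u_x); the last voxel along each
    axis is skipped because forward differences need a neighbour.
    """
    depth, height, width = len(disp), len(disp[0]), len(disp[0][0])
    dets = []
    for z in range(depth - 1):
        for y in range(height - 1):
            for x in range(width - 1):
                here = disp[z][y][x]
                neighbours = (disp[z + 1][y][x], disp[z][y + 1][x], disp[z][y][x + 1])
                # J[i][j] = delta_ij + d(u_i)/d(axis_j), by forward differences.
                jac = [[(1.0 if i == j else 0.0) + neighbours[j][i] - here[i]
                        for j in range(3)] for i in range(3)]
                dets.append(det3(jac))
    return dets


def folding_penalty(disp):
    """Sum of max(0, -det J): zero when no voxel folds (assumed loss form)."""
    return sum(max(0.0, -d) for d in jacobian_determinants(disp))


def make_field(size, fn):
    """Build a cubic displacement field from fn(z, y, x) -> (u_z, u_y, u_x)."""
    return [[[fn(z, y, x) for x in range(size)] for y in range(size)] for z in range(size)]


identity = make_field(3, lambda z, y, x: (0.0, 0.0, 0.0))
# u_z = -2z maps z to -z, which flips the volume along z (det J = -1 everywhere).
flipped = make_field(3, lambda z, y, x: (-2.0 * z, 0.0, 0.0))

print(folding_penalty(identity), folding_penalty(flipped))
```

In training, such a term would be added to the similarity loss with a weight, which corresponds to the penalty strength the authors report varying.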

Attention-Gated Networks for Improving Ultrasound Scan Plane Detection

Title Attention-Gated Networks for Improving Ultrasound Scan Plane Detection
Authors Jo Schlemper, Ozan Oktay, Liang Chen, Jacqueline Matthew, Caroline Knight, Bernhard Kainz, Ben Glocker, Daniel Rueckert
Abstract In this work, we apply an attention-gated network to real-time automated scan plane detection for fetal ultrasound screening. Scan plane detection in fetal ultrasound is a challenging problem due to the poor image quality, resulting in low interpretability for both clinicians and automated algorithms. To solve this, we propose incorporating self-gated soft-attention mechanisms. A soft-attention mechanism generates a gating signal that is end-to-end trainable, which allows the network to contextualise local information useful for prediction. The proposed attention mechanism is generic and it can be easily incorporated into any existing classification architectures, while only requiring a few additional parameters. We show that, when the base network has a high capacity, the incorporated attention mechanism can provide efficient object localisation while improving the overall performance. When the base network has a low capacity, the method greatly outperforms the baseline approach and significantly reduces false positives. Lastly, the generated attention maps allow us to understand the model’s reasoning process, which can also be used for weakly supervised object localisation.
Tasks
Published 2018-04-15
URL http://arxiv.org/abs/1804.05338v1
PDF http://arxiv.org/pdf/1804.05338v1.pdf
PWC https://paperswithcode.com/paper/attention-gated-networks-for-improving
Repo https://github.com/srb-cv/AttentionClassification
Framework pytorch

Learning and Inference in Hilbert Space with Quantum Graphical Models

Title Learning and Inference in Hilbert Space with Quantum Graphical Models
Authors Siddarth Srinivasan, Carlton Downey, Byron Boots
Abstract Quantum Graphical Models (QGMs) generalize classical graphical models by adopting the formalism for reasoning about uncertainty from quantum mechanics. Unlike classical graphical models, QGMs represent uncertainty with density matrices in complex Hilbert spaces. Hilbert space embeddings (HSEs) also generalize Bayesian inference in Hilbert spaces. We investigate the link between QGMs and HSEs and show that the sum rule and Bayes rule for QGMs are equivalent to the kernel sum rule in HSEs and a special case of Nadaraya-Watson kernel regression, respectively. We show that these operations can be kernelized, and use these insights to propose a Hilbert Space Embedding of Hidden Quantum Markov Models (HSE-HQMM) to model dynamics. We present experimental results showing that HSE-HQMMs are competitive with state-of-the-art models like LSTMs and PSRNNs on several datasets, while also providing a nonparametric method for maintaining a probability distribution over continuous-valued features.
Tasks Bayesian Inference
Published 2018-10-29
URL http://arxiv.org/abs/1810.12369v1
PDF http://arxiv.org/pdf/1810.12369v1.pdf
PWC https://paperswithcode.com/paper/learning-and-inference-in-hilbert-space-with
Repo https://github.com/cmdowney/hsehqmm
Framework none
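
Nadaraya-Watson kernel regression, which the abstract above relates to the Bayes rule for QGMs, predicts a kernel-weighted average of the training targets: $\hat{y}(x) = \sum_i k(x, x_i)\, y_i / \sum_i k(x, x_i)$. The sketch below shows only this classical estimator with a Gaussian kernel; the kernel choice, the bandwidth and the function names are assumptions, and it is not the HSE-HQMM model itself.

```python
import math


def gaussian_kernel(x, xi, bandwidth):
    """Unnormalized Gaussian kernel between two scalars."""
    return math.exp(-((x - xi) ** 2) / (2.0 * bandwidth ** 2))


def nadaraya_watson(x, xs, ys, bandwidth=1.0):
    """Kernel-weighted average of the targets ys observed at inputs xs."""
    weights = [gaussian_kernel(x, xi, bandwidth) for xi in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; increase the bandwidth")
    return sum(w * y for w, y in zip(weights, ys)) / total


# A query halfway between two training points gets the average of their targets.
print(nadaraya_watson(1.0, [0.0, 2.0], [1.0, 3.0]))
```

A smaller bandwidth makes the estimate follow the nearest training points more closely, while a larger one smooths towards the global mean.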

An Attention Model for group-level emotion recognition

Title An Attention Model for group-level emotion recognition
Authors Aarush Gupta, Dakshit Agrawal, Hardik Chauhan, Jose Dolz, Marco Pedersoli
Abstract In this paper we propose a new approach for classifying the global emotion of images containing groups of people. To achieve this task, we consider two different and complementary sources of information: (i) a global representation of the entire image and (ii) a local representation where only faces are considered. While the global representation of the image is learned with a convolutional neural network (CNN), the local representation is obtained by merging face features through an attention mechanism. The two representations are first learned independently with two separate CNN branches and then fused through concatenation in order to obtain the final group-emotion classifier. For our submission to the EmotiW 2018 group-level emotion recognition challenge, we combine several variations of the proposed model into an ensemble, obtaining a final accuracy of 64.83% on the test set and ranking 4th among all challenge participants.
Tasks Emotion Recognition
Published 2018-07-09
URL http://arxiv.org/abs/1807.03380v1
PDF http://arxiv.org/pdf/1807.03380v1.pdf
PWC https://paperswithcode.com/paper/an-attention-model-for-group-level-emotion
Repo https://github.com/vlgiitr/Group-Level-Emotion-Recognition
Framework pytorch
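
The two steps described in the abstract above, merging face features through attention and fusing the result with the global representation by concatenation, can be sketched in a few lines. The scoring function here (a dot product with a hypothetical `query` vector followed by a softmax) is an assumption for illustration; the abstract does not specify how the attention scores are computed.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(face_features, query):
    """Merge per-face feature vectors into one vector via attention weights.

    `query` is a hypothetical learned vector; scores are dot products with it.
    """
    scores = [sum(f * q for f, q in zip(feat, query)) for feat in face_features]
    weights = softmax(scores)
    dim = len(face_features[0])
    pooled = [sum(w * feat[d] for w, feat in zip(weights, face_features))
              for d in range(dim)]
    return pooled, weights


def fuse(global_feature, local_feature):
    """Fuse the two branches by concatenation."""
    return list(global_feature) + list(local_feature)


faces = [[1.0, 0.0], [0.0, 1.0]]
pooled, weights = attention_pool(faces, query=[0.0, 0.0])
print(weights, fuse([0.2, 0.4, 0.6], pooled))
```

With an all-zero query every face receives the same weight, so the pooled vector reduces to the plain average of the face features.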

Community Member Retrieval on Social Media using Textual Information

Title Community Member Retrieval on Social Media using Textual Information
Authors Aaron Jaech, Shobhit Hathi, Mari Ostendorf
Abstract This paper addresses the problem of community membership detection using only text features in a scenario where a small number of positive labeled examples defines the community. The solution introduces an unsupervised proxy task for learning user embeddings: user re-identification. Experiments with 16 different communities show that the resulting embeddings are more effective for community membership identification than common unsupervised representations.
Tasks
Published 2018-04-16
URL http://arxiv.org/abs/1804.05499v1
PDF http://arxiv.org/pdf/1804.05499v1.pdf
PWC https://paperswithcode.com/paper/community-member-retrieval-on-social-media
Repo https://github.com/ajaech/twittercommunities
Framework tf

Incorporating Features Learned by an Enhanced Deep Knowledge Tracing Model for STEM/Non-STEM Job Prediction

Title Incorporating Features Learned by an Enhanced Deep Knowledge Tracing Model for STEM/Non-STEM Job Prediction
Authors Chun-kit Yeung, Zizheng Lin, Kai Yang, Dit-yan Yeung
Abstract The 2017 ASSISTments Data Mining competition aims to use data from a longitudinal study for predicting a brand-new outcome of students which had never been studied before by the educational data mining research community. Specifically, it facilitates research in developing predictive models that predict whether the first job of a student out of college belongs to a STEM (the acronym for science, technology, engineering, and mathematics) field. This is based on the student’s learning history on the ASSISTments blended learning platform in the form of extensive clickstream data gathered during the middle school years. To tackle this challenge, we first estimate the expected knowledge state of students with respect to different mathematical skills using a deep knowledge tracing (DKT) model and an enhanced DKT (DKT+) model. We then combine the features corresponding to the DKT/DKT+ expected knowledge state with other features extracted directly from the student profile in the dataset to train several machine learning models for the STEM/non-STEM job prediction. Our experiments show that models trained with the combined features generally perform better than the models trained with the student profile alone. Detailed analysis of the student’s knowledge state reveals that, when compared with non-STEM students, STEM students generally show a higher mastery level and a higher learning gain in mathematics.
Tasks Knowledge Tracing
Published 2018-06-06
URL http://arxiv.org/abs/1806.03256v1
PDF http://arxiv.org/pdf/1806.03256v1.pdf
PWC https://paperswithcode.com/paper/incorporating-features-learned-by-an-enhanced
Repo https://github.com/ckyeungac/ADM2017
Framework tf

Multimodal Grounding for Sequence-to-Sequence Speech Recognition

Title Multimodal Grounding for Sequence-to-Sequence Speech Recognition
Authors Ozan Caglayan, Ramon Sanabria, Shruti Palaskar, Loïc Barrault, Florian Metze
Abstract Humans are capable of processing speech by making use of multiple sensory modalities. For example, the environment where a conversation takes place generally provides semantic and/or acoustic context that helps us to resolve ambiguities or to recall named entities. Motivated by this, there have been many works studying the integration of visual information into the speech recognition pipeline. Specifically, in our previous work, we proposed a multistep visual adaptive training approach which improves the accuracy of an audio-based Automatic Speech Recognition (ASR) system. This approach, however, is not end-to-end as it requires fine-tuning the whole model with an adaptation layer. In this paper, we propose novel end-to-end multimodal ASR systems and compare them to the adaptive approach by using a range of visual representations obtained from state-of-the-art convolutional neural networks. We show that adaptive training is effective for S2S models, leading to an absolute improvement of 1.4% in word error rate (WER). As for the end-to-end systems, although they perform better than the baseline, the improvements are slightly smaller than with adaptive training: 0.8 absolute WER reduction in single-best models. Using ensemble decoding, end-to-end models reach a WER of 15%, which is the lowest score among all systems.
Tasks Sequence-To-Sequence Speech Recognition, Speech Recognition
Published 2018-11-09
URL http://arxiv.org/abs/1811.03865v2
PDF http://arxiv.org/pdf/1811.03865v2.pdf
PWC https://paperswithcode.com/paper/multimodal-grounding-for-sequence-to-sequence
Repo https://github.com/srvk/how2-dataset
Framework none

From Coarse to Fine: Robust Hierarchical Localization at Large Scale

Title From Coarse to Fine: Robust Hierarchical Localization at Large Scale
Authors Paul-Edouard Sarlin, Cesar Cadena, Roland Siegwart, Marcin Dymczyk
Abstract Robust and accurate visual localization is a fundamental capability for numerous applications, such as autonomous driving, mobile robotics, or augmented reality. It remains, however, a challenging task, particularly for large-scale environments and in presence of significant appearance changes. State-of-the-art methods not only struggle with such scenarios, but are often too resource intensive for certain real-time applications. In this paper we propose HF-Net, a hierarchical localization approach based on a monolithic CNN that simultaneously predicts local features and global descriptors for accurate 6-DoF localization. We exploit the coarse-to-fine localization paradigm: we first perform a global retrieval to obtain location hypotheses and only later match local features within those candidate places. This hierarchical approach incurs significant runtime savings and makes our system suitable for real-time operation. By leveraging learned descriptors, our method achieves remarkable localization robustness across large variations of appearance and sets a new state-of-the-art on two challenging benchmarks for large-scale localization.
Tasks Autonomous Driving, Visual Localization
Published 2018-12-09
URL http://arxiv.org/abs/1812.03506v2
PDF http://arxiv.org/pdf/1812.03506v2.pdf
PWC https://paperswithcode.com/paper/from-coarse-to-fine-robust-hierarchical
Repo https://github.com/ethz-asl/hfnet
Framework tf
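
The coarse-to-fine paradigm in the abstract above (global retrieval first, local feature matching only within the retrieved candidates) can be sketched as follows. This is a simplified illustration under stated assumptions: descriptors are plain lists, matching is a brute-force distance test, and the final 6-DoF pose estimation step is omitted. The data layout and function names are hypothetical and are not taken from the HF-Net code.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def retrieve_candidates(query_global, database, k):
    """Coarse step: indices of the k images closest in global-descriptor space."""
    ranked = sorted(range(len(database)),
                    key=lambda i: squared_distance(query_global, database[i]["global"]))
    return ranked[:k]


def count_local_matches(query_locals, db_locals, max_distance):
    """Fine step: number of query local descriptors with a close database neighbour."""
    return sum(1 for q in query_locals
               if any(squared_distance(q, d) <= max_distance for d in db_locals))


def localize(query, database, k=2, max_distance=0.1):
    """Return the candidate index with the most local matches (pose step omitted)."""
    candidates = retrieve_candidates(query["global"], database, k)
    return max(candidates,
               key=lambda i: count_local_matches(query["local"],
                                                 database[i]["local"], max_distance))


database = [
    {"global": [0.0, 0.0], "local": [[0.0, 0.0], [1.0, 1.0]]},
    {"global": [0.1, 0.0], "local": [[5.0, 5.0], [6.0, 6.0]]},
    {"global": [10.0, 10.0], "local": [[0.0, 0.0], [1.0, 1.0]]},  # never searched for k=2
]
query = {"global": [0.06, 0.0], "local": [[0.0, 0.0], [1.0, 1.1]]}

print(retrieve_candidates(query["global"], database, 2), localize(query, database))
```

Restricting local matching to `k` candidates is what bounds the cost of the fine step: the third image is never compared locally even though its local descriptors would match.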

Leveraging Deep Visual Descriptors for Hierarchical Efficient Localization

Title Leveraging Deep Visual Descriptors for Hierarchical Efficient Localization
Authors Paul-Edouard Sarlin, Frédéric Debraine, Marcin Dymczyk, Roland Siegwart, Cesar Cadena
Abstract Many robotics applications require precise pose estimates despite operating in large and changing environments. This can be addressed by visual localization, using a pre-computed 3D model of the surroundings. The pose estimation then amounts to finding correspondences between 2D keypoints in a query image and 3D points in the model using local descriptors. However, computational power is often limited on robotic platforms, making this task challenging in large-scale environments. Binary feature descriptors significantly speed up this 2D-3D matching and have become popular in the robotics community, but they also strongly impair the robustness to perceptual aliasing and changes in viewpoint, illumination and scene structure. In this work, we propose to leverage recent advances in deep learning to perform an efficient hierarchical localization. We first localize at the map level using learned image-wide global descriptors, and subsequently estimate a precise pose from 2D-3D matches computed in the candidate places only. This restricts the local search and thus allows us to efficiently exploit powerful non-binary descriptors usually dismissed on resource-constrained devices. Our approach results in state-of-the-art localization performance while running in real-time on a popular mobile platform, enabling new prospects for robotics research.
Tasks Pose Estimation, Visual Localization
Published 2018-09-04
URL http://arxiv.org/abs/1809.01019v2
PDF http://arxiv.org/pdf/1809.01019v2.pdf
PWC https://paperswithcode.com/paper/leveraging-deep-visual-descriptors-for
Repo https://github.com/ethz-asl/hierarchical_loc
Framework tf