October 21, 2019

Paper Group AWR 150

Deep Learning Estimation of Absorbed Dose for Nuclear Medicine Diagnostics. CFCM: Segmentation via Coarse to Fine Context Memory. Learning Gaussian Processes by Minimizing PAC-Bayesian Generalization Bounds. Ground Plane Polling for 6DoF Pose Estimation of Objects on the Road. Probabilistic Formulations of Regression with Mixed Guidance. Tensor Rob …

Deep Learning Estimation of Absorbed Dose for Nuclear Medicine Diagnostics


Title	Deep Learning Estimation of Absorbed Dose for Nuclear Medicine Diagnostics
Authors	Luciano Melodia
Abstract	The distribution of energy dose from $^{177}$Lu radiotherapy can be estimated by convolving an image of a time-integrated activity distribution with a dose voxel kernel (DVK) consisting of different types of tissues. This fast and inaccurate approximation is inappropriate for personalized dosimetry as it neglects tissue heterogeneity. The latter can be calculated using different imaging techniques such as CT and SPECT combined with a time-consuming Monte Carlo simulation. The aim of this study is, for the first time, an estimation of DVKs from CT-derived density kernels (DKs) via deep learning in convolutional neural networks (CNNs). The proposed CNN achieved, on the test set, a mean intersection over union (IoU) of $0.86$ after $308$ epochs and a corresponding mean squared error (MSE) of $1.24 \cdot 10^{-4}$. This generalization ability shows that the trained CNN can indeed learn the complex transfer function from DK to DVK. Future work will evaluate DVKs estimated by CNNs with full Monte Carlo simulations of a whole-body CT to predict patient-specific voxel dose maps.
Tasks
Published	2018-05-23
URL	https://arxiv.org/abs/1805.09108v4
PDF	https://arxiv.org/pdf/1805.09108v4.pdf
PWC	https://paperswithcode.com/paper/deep-learning-estimation-of-absorbed-dose-for
Repo	https://github.com/karhunenloeve/karhunenloeve.github.io
Framework	none
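
The convolution-based dose estimate mentioned in the abstract can be illustrated with a minimal sketch. It assumes a toy 3x3x3 kernel with made-up values and zero padding at the volume boundary; the function name `convolve3d_same` and all numbers are hypothetical and are not taken from the paper or its repository.

```python
def convolve3d_same(activity, kernel):
    """Zero-padded 'same' 3D convolution on nested lists.

    Each source voxel spreads its activity to its neighbours,
    weighted by the kernel entry for the source-to-target offset.
    """
    nz, ny, nx = len(activity), len(activity[0]), len(activity[0][0])
    kz, ky, kx = len(kernel), len(kernel[0]), len(kernel[0][0])
    cz, cy, cx = kz // 2, ky // 2, kx // 2
    dose = [[[0.0] * nx for _ in range(ny)] for _ in range(nz)]
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                a = activity[z][y][x]
                if a == 0.0:
                    continue  # empty voxels deposit no dose
                for dz in range(kz):
                    for dy in range(ky):
                        for dx in range(kx):
                            tz, ty, tx = z + dz - cz, y + dy - cy, x + dx - cx
                            if 0 <= tz < nz and 0 <= ty < ny and 0 <= tx < nx:
                                dose[tz][ty][tx] += a * kernel[dz][dy][dx]
    return dose


# Hypothetical kernel: most energy stays in the source voxel.
kernel = [[[0.01] * 3 for _ in range(3)] for _ in range(3)]
kernel[1][1][1] = 0.5

# Hypothetical activity map: a single point source in the centre.
activity = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
activity[1][1][1] = 2.0

dose = convolve3d_same(activity, kernel)
```

Because a single kernel is applied everywhere, this sketch treats the tissue as homogeneous, which is the limitation the abstract attributes to the approach.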

CFCM: Segmentation via Coarse to Fine Context Memory


Title	CFCM: Segmentation via Coarse to Fine Context Memory
Authors	Fausto Milletari, Nicola Rieke, Maximilian Baust, Marco Esposito, Nassir Navab
Abstract	Recent neural-network-based architectures for image segmentation make extensive use of feature forwarding mechanisms to integrate information from multiple scales. Although yielding good results, even deeper architectures and alternative methods for feature fusion at different resolutions have been scarcely investigated for medical applications. In this work we propose to implement segmentation via an encoder-decoder architecture which differs from any other previously published method since (i) it employs a very deep architecture based on residual learning and (ii) it combines features via a convolutional Long Short Term Memory (LSTM), instead of concatenation or summation. The intuition is that the memory mechanism implemented by LSTMs can better integrate features from different scales through a coarse-to-fine strategy; hence the name Coarse-to-Fine Context Memory (CFCM). We demonstrate the remarkable advantages of this approach on two datasets: the Montgomery County lung segmentation dataset, and the EndoVis 2015 challenge dataset for surgical instrument segmentation.
Tasks	Semantic Segmentation
Published	2018-06-04
URL	http://arxiv.org/abs/1806.01413v1
PDF	http://arxiv.org/pdf/1806.01413v1.pdf
PWC	https://paperswithcode.com/paper/cfcm-segmentation-via-coarse-to-fine-context
Repo	https://github.com/faustomilletari/CFCM-2D
Framework	tf

Learning Gaussian Processes by Minimizing PAC-Bayesian Generalization Bounds


Title	Learning Gaussian Processes by Minimizing PAC-Bayesian Generalization Bounds
Authors	David Reeb, Andreas Doerr, Sebastian Gerwinn, Barbara Rakitsch
Abstract	Gaussian Processes (GPs) are a generic modelling tool for supervised learning. While they have been successfully applied on large datasets, their use in safety-critical applications is hindered by the lack of good performance guarantees. To this end, we propose a method to learn GPs and their sparse approximations by directly optimizing a PAC-Bayesian bound on their generalization performance, instead of maximizing the marginal likelihood. Besides its theoretical appeal, we find in our evaluation that our learning method is robust and yields significantly better generalization guarantees than other common GP approaches on several regression benchmark datasets.
Tasks	Gaussian Processes
Published	2018-10-29
URL	http://arxiv.org/abs/1810.12263v2
PDF	http://arxiv.org/pdf/1810.12263v2.pdf
PWC	https://paperswithcode.com/paper/learning-gaussian-processes-by-minimizing-pac
Repo	https://github.com/boschresearch/PAC_GP
Framework	tf
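
To make the idea of a PAC-Bayesian guarantee concrete, here is a minimal sketch that evaluates one commonly stated bound of this family (the Seeger/Maurer "kl" form) for a loss bounded in [0, 1]. The abstract does not say which bound the authors optimize, so this form, the function names, and the example numbers are all assumptions made for illustration.

```python
import math


def binary_kl(q, p):
    """KL divergence between Bernoulli(q) and Bernoulli(p)."""
    eps = 1e-12  # clamp to avoid log(0)
    q = min(max(q, eps), 1 - eps)
    p = min(max(p, eps), 1 - eps)
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))


def kl_inverse_upper(q, budget, iters=60):
    """Largest p >= q with binary_kl(q, p) <= budget, found by bisection."""
    lo, hi = q, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if binary_kl(q, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo


def pac_bayes_bound(empirical_risk, kl_posterior_prior, n, delta=0.05):
    """Upper bound on the true risk holding with probability >= 1 - delta.

    Assumed form: kl(empirical || true) <= (KL(Q||P) + ln(2*sqrt(n)/delta)) / n.
    """
    budget = (kl_posterior_prior + math.log(2.0 * math.sqrt(n) / delta)) / n
    return kl_inverse_upper(empirical_risk, budget)


# Hypothetical numbers: same empirical risk, different model complexity.
bound_simple = pac_bayes_bound(0.1, kl_posterior_prior=5.0, n=1000)
bound_complex = pac_bayes_bound(0.1, kl_posterior_prior=50.0, n=1000)
```

The bound grows with the KL divergence between posterior and prior, so minimizing it trades data fit against complexity, in contrast to maximizing the marginal likelihood.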

Ground Plane Polling for 6DoF Pose Estimation of Objects on the Road


Title	Ground Plane Polling for 6DoF Pose Estimation of Objects on the Road
Authors	Akshay Rangesh, Mohan M. Trivedi
Abstract	This paper introduces an approach to produce accurate 3D detection boxes for objects on the ground using single monocular images. We do so by merging 2D visual cues, 3D object dimensions, and ground plane constraints to produce boxes that are robust against small errors and incorrect predictions. First, we train a single-shot convolutional neural network (CNN) that produces multiple visual and geometric cues of interest: 2D bounding boxes, 2D keypoints of interest, coarse object orientations and object dimensions. Subsets of these cues are then used to poll probable ground planes from a pre-computed database of ground planes, to identify the “best fit” plane with highest consensus. Once identified, the “best fit” plane provides enough constraints to successfully construct the desired 3D detection box, without directly predicting the 6DoF pose of the object. The entire ground plane polling (GPP) procedure is constructed as a non-parametrized layer of the CNN that outputs the desired “best fit” plane and the corresponding 3D keypoints, which together define the final 3D bounding box. Doing so allows us to poll thousands of different ground plane configurations without adding considerable overhead, while also creating a single CNN that directly produces the desired output without the need for post processing. We evaluate our method on the 2D detection and orientation estimation benchmark from the challenging KITTI dataset, and provide additional comparisons for 3D metrics of importance. This single-stage, single-pass CNN results in superior localization and orientation estimation compared to more complex and computationally expensive monocular approaches.
Tasks	Pose Estimation
Published	2018-11-16
URL	https://arxiv.org/abs/1811.06666v4
PDF	https://arxiv.org/pdf/1811.06666v4.pdf
PWC	https://paperswithcode.com/paper/ground-plane-polling-for-6dof-pose-estimation
Repo	https://github.com/arangesh/Ground-Plane-Polling
Framework	tf

Probabilistic Formulations of Regression with Mixed Guidance


Title	Probabilistic Formulations of Regression with Mixed Guidance
Authors	Aubrey Gress, Ian Davidson
Abstract	Regression problems assume every instance is annotated (labeled) with a real value, a form of annotation we call strong guidance. In order for these annotations to be accurate, they must be the result of a precise experiment or measurement. However, in some cases additional weak guidance might be given by imprecise measurements, a domain expert or even crowdsourcing. Current formulations of regression are unable to use both types of guidance. We propose a regression framework that can also incorporate weak guidance based on relative orderings, bounds, neighboring and similarity relations. Consider learning to predict ages from portrait images: these new types of guidance allow weaker annotations, such as stating that a person is in their 20s or that two people are similar in age. These types of annotations can be easier to generate than strong guidance. We introduce a probabilistic formulation for these forms of weak guidance and show that the resulting optimization problems are convex. Our experimental results show the benefits of these formulations on several data sets.
Tasks
Published	2018-04-01
URL	http://arxiv.org/abs/1804.01575v1
PDF	http://arxiv.org/pdf/1804.01575v1.pdf
PWC	https://paperswithcode.com/paper/probabilistic-formulations-of-regression-with
Repo	https://github.com/adgress/ICDM2016
Framework	none

Tensor Robust Principal Component Analysis with A New Tensor Nuclear Norm


Title	Tensor Robust Principal Component Analysis with A New Tensor Nuclear Norm
Authors	Canyi Lu, Jiashi Feng, Yudong Chen, Wei Liu, Zhouchen Lin, Shuicheng Yan
Abstract	In this paper, we consider the Tensor Robust Principal Component Analysis (TRPCA) problem, which aims to exactly recover the low-rank and sparse components from their sum. Our model is based on the recently proposed tensor-tensor product (or t-product). Induced by the t-product, we first rigorously deduce the tensor spectral norm, tensor nuclear norm, and tensor average rank, and show that the tensor nuclear norm is the convex envelope of the tensor average rank within the unit ball of the tensor spectral norm. These definitions, their relationships and properties are consistent with matrix cases. Equipped with the new tensor nuclear norm, we then solve the TRPCA problem by solving a convex program and provide the theoretical guarantee for the exact recovery. Our TRPCA model and recovery guarantee include matrix RPCA as a special case. Numerical experiments verify our results, and the applications to image recovery and background modeling problems demonstrate the effectiveness of our method.
Tasks
Published	2018-04-10
URL	http://arxiv.org/abs/1804.03728v2
PDF	http://arxiv.org/pdf/1804.03728v2.pdf
PWC	https://paperswithcode.com/paper/tensor-robust-principal-component-analysis
Repo	https://github.com/zhaoxile/reproducible-tensor-completion-state-of-the-art
Framework	none
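
The t-product that the model builds on can be sketched directly from its definition: entry tubes are combined by circular convolution along the third mode, in the same way that scalars are multiplied in an ordinary matrix product. The sketch below is a plain, unoptimized illustration with nested lists (practical implementations typically use the FFT); the helper names are hypothetical.

```python
def t_product(A, B):
    """t-product of A (n1 x n2 x n3) and B (n2 x n4 x n3).

    C[i][j] = sum over k of the circular convolution of tubes A[i][k] and B[k][j].
    """
    n1, n2, n3 = len(A), len(A[0]), len(A[0][0])
    n4 = len(B[0])
    assert len(B) == n2 and len(B[0][0]) == n3, "incompatible tensor shapes"
    C = [[[0.0] * n3 for _ in range(n4)] for _ in range(n1)]
    for i in range(n1):
        for j in range(n4):
            for k in range(n2):
                a, b = A[i][k], B[k][j]
                for t in range(n3):
                    for s in range(n3):
                        C[i][j][t] += a[s] * b[(t - s) % n3]
    return C


def identity_tensor(n, n3):
    """Identity for the t-product: first frontal slice is I, the rest are zero."""
    I = [[[0.0] * n3 for _ in range(n)] for _ in range(n)]
    for i in range(n):
        I[i][i][0] = 1.0
    return I


A = [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
     [[7.0, 8.0, 9.0], [10.0, 11.0, 12.0]]]
same_as_A = t_product(identity_tensor(2, 3), A)

# With n3 = 1 the t-product reduces to the ordinary matrix product.
M = t_product([[[1.0], [2.0]], [[3.0], [4.0]]],
              [[[5.0], [6.0]], [[7.0], [8.0]]])
```

The second call shows the matrix case as a special case, mirroring the abstract's remark that the tensor definitions are consistent with their matrix counterparts.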

FAIM – A ConvNet Method for Unsupervised 3D Medical Image Registration


Title	FAIM – A ConvNet Method for Unsupervised 3D Medical Image Registration
Authors	Dongyang Kuang, Tanya Schmah
Abstract	We present a new unsupervised learning algorithm, “FAIM”, for 3D medical image registration. With a different architecture than the popular “U-net”, the network takes a pair of full image volumes and predicts the displacement fields needed to register source to target. Compared with “U-net” based registration networks such as VoxelMorph, FAIM has fewer trainable parameters but can achieve higher registration accuracy as judged by Dice score on region labels in the Mindboggle-101 dataset. Moreover, with the proposed penalty loss on negative Jacobian determinants, FAIM produces deformations with many fewer “foldings”, i.e. regions of non-invertibility where the surface folds over itself. In our experiment, we varied the strength of this penalty and investigated changes in registration accuracy and non-invertibility in terms of number of “folding” locations. We found that FAIM is able to maintain both the advantages of higher accuracy and fewer “folding” locations over VoxelMorph, over a range of hyper-parameters (with the same values used for both networks). Further, when trading off registration accuracy for better invertibility, FAIM required less sacrifice of registration accuracy. Codes for this paper will be released upon publication.
Tasks	Image Registration, Medical Image Registration
Published	2018-11-22
URL	https://arxiv.org/abs/1811.09243v2
PDF	https://arxiv.org/pdf/1811.09243v2.pdf
PWC	https://paperswithcode.com/paper/faim-a-convnet-method-for-unsupervised-3d
Repo	https://github.com/dykuang/Medical-image-registration
Framework	tf
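
The notion of "folding" can be made concrete with the Jacobian determinant of the deformation x + u(x): wherever the determinant is not positive, the mapping is locally non-invertible. The sketch below is a 2D simplification using forward differences on a unit grid, and the penalty form (sum of the negative parts of the determinant) is an assumption for illustration, not necessarily the loss used in the paper.

```python
def jacobian_determinants(ux, uy):
    """det(I + grad u) per grid cell for a 2D displacement field (ux, uy).

    ux[i][j] and uy[i][j] are the x and y displacements at row i, column j.
    """
    h, w = len(ux), len(ux[0])
    dets = []
    for i in range(h - 1):
        row = []
        for j in range(w - 1):
            dux_dx = ux[i][j + 1] - ux[i][j]
            dux_dy = ux[i + 1][j] - ux[i][j]
            duy_dx = uy[i][j + 1] - uy[i][j]
            duy_dy = uy[i + 1][j] - uy[i][j]
            row.append((1 + dux_dx) * (1 + duy_dy) - dux_dy * duy_dx)
        dets.append(row)
    return dets


def folding_penalty(dets):
    """Assumed penalty: total magnitude of negative determinants."""
    return sum(max(0.0, -d) for row in dets for d in row)


def count_foldings(dets):
    """Number of cells where the deformation is not locally invertible."""
    return sum(1 for row in dets for d in row if d <= 0)


zeros = [[0.0] * 3 for _ in range(3)]

# No displacement: the identity map has determinant 1 everywhere.
identity_dets = jacobian_determinants(zeros, zeros)

# Displacement u_x = -2x mirrors the grid (x -> -x), so every cell folds.
mirror_ux = [[-2.0 * j for j in range(3)] for _ in range(3)]
mirror_dets = jacobian_determinants(mirror_ux, zeros)
```

Scaling such a penalty by a weight and adding it to the similarity loss gives the kind of accuracy versus invertibility trade-off the abstract describes.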

Attention-Gated Networks for Improving Ultrasound Scan Plane Detection


Title	Attention-Gated Networks for Improving Ultrasound Scan Plane Detection
Authors	Jo Schlemper, Ozan Oktay, Liang Chen, Jacqueline Matthew, Caroline Knight, Bernhard Kainz, Ben Glocker, Daniel Rueckert
Abstract	In this work, we apply an attention-gated network to real-time automated scan plane detection for fetal ultrasound screening. Scan plane detection in fetal ultrasound is a challenging problem due to the poor image quality, resulting in low interpretability for both clinicians and automated algorithms. To solve this, we propose incorporating self-gated soft-attention mechanisms. A soft-attention mechanism generates a gating signal that is end-to-end trainable, which allows the network to contextualise local information useful for prediction. The proposed attention mechanism is generic and it can be easily incorporated into any existing classification architectures, while only requiring a few additional parameters. We show that, when the base network has a high capacity, the incorporated attention mechanism can provide efficient object localisation while improving the overall performance. When the base network has a low capacity, the method greatly outperforms the baseline approach and significantly reduces false positives. Lastly, the generated attention maps allow us to understand the model’s reasoning process, which can also be used for weakly supervised object localisation.
Tasks
Published	2018-04-15
URL	http://arxiv.org/abs/1804.05338v1
PDF	http://arxiv.org/pdf/1804.05338v1.pdf
PWC	https://paperswithcode.com/paper/attention-gated-networks-for-improving
Repo	https://github.com/srb-cv/AttentionClassification
Framework	pytorch

Learning and Inference in Hilbert Space with Quantum Graphical Models


Title	Learning and Inference in Hilbert Space with Quantum Graphical Models
Authors	Siddarth Srinivasan, Carlton Downey, Byron Boots
Abstract	Quantum Graphical Models (QGMs) generalize classical graphical models by adopting the formalism for reasoning about uncertainty from quantum mechanics. Unlike classical graphical models, QGMs represent uncertainty with density matrices in complex Hilbert spaces. Hilbert space embeddings (HSEs) also generalize Bayesian inference in Hilbert spaces. We investigate the link between QGMs and HSEs and show that the sum rule and Bayes rule for QGMs are equivalent to the kernel sum rule in HSEs and a special case of Nadaraya-Watson kernel regression, respectively. We show that these operations can be kernelized, and use these insights to propose a Hilbert Space Embedding of Hidden Quantum Markov Models (HSE-HQMM) to model dynamics. We present experimental results showing that HSE-HQMMs are competitive with state-of-the-art models like LSTMs and PSRNNs on several datasets, while also providing a nonparametric method for maintaining a probability distribution over continuous-valued features.
Tasks	Bayesian Inference
Published	2018-10-29
URL	http://arxiv.org/abs/1810.12369v1
PDF	http://arxiv.org/pdf/1810.12369v1.pdf
PWC	https://paperswithcode.com/paper/learning-and-inference-in-hilbert-space-with
Repo	https://github.com/cmdowney/hsehqmm
Framework	none
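
For readers unfamiliar with the Nadaraya-Watson estimator that the abstract relates to Bayes rule in QGMs, here is a minimal sketch of the classical one-dimensional version: a prediction is a kernel-weighted average of the training targets. The Gaussian kernel, the bandwidth value, and the data are assumptions for illustration; the sketch does not implement QGMs or HSE-HQMMs.

```python
import math


def gaussian_kernel(x, xi, bandwidth):
    """Unnormalized Gaussian similarity between two scalars."""
    return math.exp(-0.5 * ((x - xi) / bandwidth) ** 2)


def nadaraya_watson(x, xs, ys, bandwidth=1.0):
    """Kernel-weighted average of targets ys observed at inputs xs."""
    weights = [gaussian_kernel(x, xi, bandwidth) for xi in xs]
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, ys)) / total


# A query halfway between two points weights both targets equally.
midpoint_prediction = nadaraya_watson(0.0, [-1.0, 1.0], [0.0, 2.0])

# A query close to one training input is dominated by that input's target.
near_prediction = nadaraya_watson(0.05, [0.0, 5.0], [1.0, 9.0], bandwidth=0.5)
```

The estimator needs no parametric model of the input-output relation, which is the sense in which the resulting method is nonparametric.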

An Attention Model for group-level emotion recognition


Title	An Attention Model for group-level emotion recognition
Authors	Aarush Gupta, Dakshit Agrawal, Hardik Chauhan, Jose Dolz, Marco Pedersoli
Abstract	In this paper we propose a new approach for classifying the global emotion of images containing groups of people. To achieve this task, we consider two different and complementary sources of information: (i) a global representation of the entire image and (ii) a local representation where only faces are considered. While the global representation of the image is learned with a convolutional neural network (CNN), the local representation is obtained by merging face features through an attention mechanism. The two representations are first learned independently with two separate CNN branches and then fused through concatenation in order to obtain the final group-emotion classifier. For our submission to the EmotiW 2018 group-level emotion recognition challenge, we combine several variations of the proposed model into an ensemble, obtaining a final accuracy of 64.83% on the test set and ranking 4th among all challenge participants.
Tasks	Emotion Recognition
Published	2018-07-09
URL	http://arxiv.org/abs/1807.03380v1
PDF	http://arxiv.org/pdf/1807.03380v1.pdf
PWC	https://paperswithcode.com/paper/an-attention-model-for-group-level-emotion
Repo	https://github.com/vlgiitr/Group-Level-Emotion-Recognition
Framework	pytorch
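
The merging of face features through attention, followed by fusion with the global representation, can be sketched as a softmax-weighted average and a concatenation. The dot-product scoring with a single vector is an assumed, simplified form of attention, and all names and numbers are hypothetical; the paper's actual scoring function is not given in the abstract.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(face_features, scoring_vector):
    """Merge per-face feature vectors into one vector via attention weights."""
    scores = [sum(f * w for f, w in zip(feat, scoring_vector))
              for feat in face_features]
    weights = softmax(scores)
    dim = len(face_features[0])
    pooled = [sum(wt * feat[d] for wt, feat in zip(weights, face_features))
              for d in range(dim)]
    return pooled, weights


def fuse(global_feature, local_feature):
    """Fuse the two branches by concatenation."""
    return list(global_feature) + list(local_feature)


faces = [[1.0, 0.0], [0.0, 1.0]]  # two hypothetical face embeddings

# A zero scoring vector gives uniform attention, i.e. plain averaging.
pooled_uniform, weights_uniform = attention_pool(faces, [0.0, 0.0])

# A scoring vector favouring the first dimension focuses on the first face.
pooled_focused, weights_focused = attention_pool(faces, [10.0, 0.0])

fused = fuse([0.2, 0.4, 0.6], pooled_focused)
```

In a trained model the scoring vector would be learned, so the network itself decides which faces matter most for the group-level label.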


Community Member Retrieval on Social Media using Textual Information


Title	Community Member Retrieval on Social Media using Textual Information
Authors	Aaron Jaech, Shobhit Hathi, Mari Ostendorf
Abstract	This paper addresses the problem of community membership detection using only text features in a scenario where a small number of positive labeled examples defines the community. The solution introduces an unsupervised proxy task for learning user embeddings: user re-identification. Experiments with 16 different communities show that the resulting embeddings are more effective for community membership identification than common unsupervised representations.
Tasks
Published	2018-04-16
URL	http://arxiv.org/abs/1804.05499v1
PDF	http://arxiv.org/pdf/1804.05499v1.pdf
PWC	https://paperswithcode.com/paper/community-member-retrieval-on-social-media
Repo	https://github.com/ajaech/twittercommunities
Framework	tf

Incorporating Features Learned by an Enhanced Deep Knowledge Tracing Model for STEM/Non-STEM Job Prediction


Title	Incorporating Features Learned by an Enhanced Deep Knowledge Tracing Model for STEM/Non-STEM Job Prediction
Authors	Chun-kit Yeung, Zizheng Lin, Kai Yang, Dit-yan Yeung
Abstract	The 2017 ASSISTments Data Mining competition aims to use data from a longitudinal study for predicting a brand-new outcome of students which had never been studied before by the educational data mining research community. Specifically, it facilitates research in developing predictive models that predict whether the first job of a student out of college belongs to a STEM (the acronym for science, technology, engineering, and mathematics) field. This is based on the student’s learning history on the ASSISTments blended learning platform in the form of extensive clickstream data gathered during the middle school years. To tackle this challenge, we first estimate the expected knowledge state of students with respect to different mathematical skills using a deep knowledge tracing (DKT) model and an enhanced DKT (DKT+) model. We then combine the features corresponding to the DKT/DKT+ expected knowledge state with other features extracted directly from the student profile in the dataset to train several machine learning models for the STEM/non-STEM job prediction. Our experiments show that models trained with the combined features generally perform better than the models trained with the student profile alone. Detailed analysis of the student’s knowledge state reveals that, when compared with non-STEM students, STEM students generally show a higher mastery level and a higher learning gain in mathematics.
Tasks	Knowledge Tracing
Published	2018-06-06
URL	http://arxiv.org/abs/1806.03256v1
PDF	http://arxiv.org/pdf/1806.03256v1.pdf
PWC	https://paperswithcode.com/paper/incorporating-features-learned-by-an-enhanced
Repo	https://github.com/ckyeungac/ADM2017
Framework	tf

Multimodal Grounding for Sequence-to-Sequence Speech Recognition


Title	Multimodal Grounding for Sequence-to-Sequence Speech Recognition
Authors	Ozan Caglayan, Ramon Sanabria, Shruti Palaskar, Loïc Barrault, Florian Metze
Abstract	Humans are capable of processing speech by making use of multiple sensory modalities. For example, the environment where a conversation takes place generally provides semantic and/or acoustic context that helps us to resolve ambiguities or to recall named entities. Motivated by this, there have been many works studying the integration of visual information into the speech recognition pipeline. Specifically, in our previous work, we proposed a multistep visual adaptive training approach which improves the accuracy of an audio-based Automatic Speech Recognition (ASR) system. This approach, however, is not end-to-end as it requires fine-tuning the whole model with an adaptation layer. In this paper, we propose novel end-to-end multimodal ASR systems and compare them to the adaptive approach by using a range of visual representations obtained from state-of-the-art convolutional neural networks. We show that adaptive training is effective for S2S models, leading to an absolute improvement of 1.4% in word error rate. As for the end-to-end systems, although they perform better than baseline, the improvements are slightly less than adaptive training, 0.8 absolute WER reduction in single-best models. Using ensemble decoding, end-to-end models reach a WER of 15% which is the lowest score among all systems.
Tasks	Sequence-To-Sequence Speech Recognition, Speech Recognition
Published	2018-11-09
URL	http://arxiv.org/abs/1811.03865v2
PDF	http://arxiv.org/pdf/1811.03865v2.pdf
PWC	https://paperswithcode.com/paper/multimodal-grounding-for-sequence-to-sequence
Repo	https://github.com/srvk/how2-dataset
Framework	none

From Coarse to Fine: Robust Hierarchical Localization at Large Scale


Title	From Coarse to Fine: Robust Hierarchical Localization at Large Scale
Authors	Paul-Edouard Sarlin, Cesar Cadena, Roland Siegwart, Marcin Dymczyk
Abstract	Robust and accurate visual localization is a fundamental capability for numerous applications, such as autonomous driving, mobile robotics, or augmented reality. It remains, however, a challenging task, particularly for large-scale environments and in the presence of significant appearance changes. State-of-the-art methods not only struggle with such scenarios, but are often too resource intensive for certain real-time applications. In this paper we propose HF-Net, a hierarchical localization approach based on a monolithic CNN that simultaneously predicts local features and global descriptors for accurate 6-DoF localization. We exploit the coarse-to-fine localization paradigm: we first perform a global retrieval to obtain location hypotheses and only later match local features within those candidate places. This hierarchical approach incurs significant runtime savings and makes our system suitable for real-time operation. By leveraging learned descriptors, our method achieves remarkable localization robustness across large variations of appearance and sets a new state-of-the-art on two challenging benchmarks for large-scale localization.
Tasks	Autonomous Driving, Visual Localization
Published	2018-12-09
URL	http://arxiv.org/abs/1812.03506v2
PDF	http://arxiv.org/pdf/1812.03506v2.pdf
PWC	https://paperswithcode.com/paper/from-coarse-to-fine-robust-hierarchical
Repo	https://github.com/ethz-asl/hfnet
Framework	tf
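
The coarse-to-fine paradigm from the abstract (global retrieval first, local matching only within the retrieved candidates) can be sketched with toy descriptors. Everything here is hypothetical: the dictionary layout, the function names, the squared Euclidean distance, and the match threshold. A real pipeline would go on to estimate a 6-DoF pose from the matches, which this sketch omits.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two descriptor vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def retrieve_candidates(query_global, database, k):
    """Coarse step: indices of the k images with the closest global descriptor."""
    ranked = sorted(
        range(len(database)),
        key=lambda i: squared_distance(query_global, database[i]["global"]),
    )
    return ranked[:k]


def match_local(query_locals, candidate_locals, max_distance):
    """Fine step: nearest-neighbour matching of local descriptors."""
    if not candidate_locals:
        return []
    matches = []
    for qi, q in enumerate(query_locals):
        best = min(range(len(candidate_locals)),
                   key=lambda j: squared_distance(q, candidate_locals[j]))
        if squared_distance(q, candidate_locals[best]) <= max_distance:
            matches.append((qi, best))
    return matches


def localize(query, database, k=2, max_distance=0.1):
    """Return the candidate image with the most local matches."""
    best_image, best_matches = None, []
    for idx in retrieve_candidates(query["global"], database, k):
        matches = match_local(query["locals"], database[idx]["locals"],
                              max_distance)
        if len(matches) > len(best_matches):
            best_image, best_matches = idx, matches
    return best_image, best_matches


database = [
    {"global": [0.0, 0.0], "locals": [[0.0, 0.0], [1.0, 1.0]]},
    {"global": [1.0, 0.0], "locals": [[0.5, 0.5], [0.2, 0.8]]},
    {"global": [5.0, 5.0], "locals": [[0.5, 0.5], [0.2, 0.8]]},
]
query = {"global": [0.9, 0.1], "locals": [[0.5, 0.5], [0.2, 0.8]]}

candidates = retrieve_candidates(query["global"], database, k=2)
image, matches = localize(query, database)
```

Image 2 has the same local descriptors as image 1 but is never matched, because the coarse step already excluded it; skipping such work is where the runtime savings come from.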

Leveraging Deep Visual Descriptors for Hierarchical Efficient Localization


Title	Leveraging Deep Visual Descriptors for Hierarchical Efficient Localization
Authors	Paul-Edouard Sarlin, Frédéric Debraine, Marcin Dymczyk, Roland Siegwart, Cesar Cadena
Abstract	Many robotics applications require precise pose estimates despite operating in large and changing environments. This can be addressed by visual localization, using a pre-computed 3D model of the surroundings. The pose estimation then amounts to finding correspondences between 2D keypoints in a query image and 3D points in the model using local descriptors. However, computational power is often limited on robotic platforms, making this task challenging in large-scale environments. Binary feature descriptors significantly speed up this 2D-3D matching, and have become popular in the robotics community, but also strongly impair the robustness to perceptual aliasing and changes in viewpoint, illumination and scene structure. In this work, we propose to leverage recent advances in deep learning to perform an efficient hierarchical localization. We first localize at the map level using learned image-wide global descriptors, and subsequently estimate a precise pose from 2D-3D matches computed in the candidate places only. This restricts the local search and thus allows us to efficiently exploit powerful non-binary descriptors usually dismissed on resource-constrained devices. Our approach results in state-of-the-art localization performance while running in real-time on a popular mobile platform, enabling new prospects for robotics research.
Tasks	Pose Estimation, Visual Localization
Published	2018-09-04
URL	http://arxiv.org/abs/1809.01019v2
PDF	http://arxiv.org/pdf/1809.01019v2.pdf
PWC	https://paperswithcode.com/paper/leveraging-deep-visual-descriptors-for
Repo	https://github.com/ethz-asl/hierarchical_loc
Framework	tf