Paper Group AWR 150
Deep Learning Estimation of Absorbed Dose for Nuclear Medicine Diagnostics
Title | Deep Learning Estimation of Absorbed Dose for Nuclear Medicine Diagnostics |
Authors | Luciano Melodia |
Abstract | The distribution of energy dose from $^{177}$Lu radiotherapy can be estimated by convolving an image of the time-integrated activity distribution with a dose voxel kernel (DVK) consisting of different types of tissues. This fast but inaccurate approximation is inappropriate for personalized dosimetry because it neglects tissue heterogeneity. The latter can be calculated using different imaging techniques such as CT and SPECT combined with a time-consuming Monte Carlo simulation. The aim of this study is, for the first time, to estimate DVKs from CT-derived density kernels (DKs) via deep learning with convolutional neural networks (CNNs). On the test set, the proposed CNN achieved a mean intersection over union (IoU) of $0.86$ after $308$ epochs and a corresponding mean squared error (MSE) of $1.24 \cdot 10^{-4}$. This generalization ability shows that the trained CNN can indeed learn the complex transfer function from DK to DVK. Future work will evaluate DVKs estimated by CNNs with full Monte Carlo simulations of a whole-body CT to predict patient-specific voxel dose maps. |
Tasks | |
Published | 2018-05-23 |
URL | https://arxiv.org/abs/1805.09108v4 |
https://arxiv.org/pdf/1805.09108v4.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-estimation-of-absorbed-dose-for |
Repo | https://github.com/karhunenloeve/karhunenloeve.github.io |
Framework | none |
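The convolution-based approximation described in the abstract above (time-integrated activity convolved with a dose voxel kernel) can be sketched as follows. This is a minimal illustration, not the author's code: the function names, the toy point source, and the Gaussian-shaped kernel are all assumptions made for the example, and a real DVK would come from Monte Carlo simulation or, as the paper proposes, from a trained CNN.

```python
import numpy as np


def convolve_same(volume, kernel):
    """Linear (zero-padded) FFT convolution, cropped to the shape of `volume`."""
    axes = tuple(range(volume.ndim))
    full_shape = [v + k - 1 for v, k in zip(volume.shape, kernel.shape)]
    spectrum = (np.fft.rfftn(volume, s=full_shape, axes=axes)
                * np.fft.rfftn(kernel, s=full_shape, axes=axes))
    full = np.fft.irfftn(spectrum, s=full_shape, axes=axes)
    # Crop the central region so the output is aligned with the input volume.
    start = [(k - 1) // 2 for k in kernel.shape]
    window = tuple(slice(s, s + v) for s, v in zip(start, volume.shape))
    return full[window]


def mean_squared_error(pred, target):
    """MSE between two arrays of equal shape."""
    return float(np.mean((pred - target) ** 2))


# Hypothetical toy data: one point source and a normalized Gaussian-shaped kernel.
activity = np.zeros((9, 9, 9))
activity[4, 4, 4] = 2.0
offsets = np.indices((5, 5, 5)) - 2
dvk = np.exp(-(offsets ** 2).sum(axis=0))
dvk /= dvk.sum()

dose = convolve_same(activity, dvk)
```

Because a single homogeneous kernel is applied everywhere, this approximation cannot reflect tissue heterogeneity, which is the limitation the abstract points out.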
CFCM: Segmentation via Coarse to Fine Context Memory
Title | CFCM: Segmentation via Coarse to Fine Context Memory |
Authors | Fausto Milletari, Nicola Rieke, Maximilian Baust, Marco Esposito, Nassir Navab |
Abstract | Recent neural-network-based architectures for image segmentation make extensive use of feature forwarding mechanisms to integrate information from multiple scales. Although these mechanisms yield good results, even deeper architectures and alternative methods for feature fusion at different resolutions have scarcely been investigated for medical applications. In this work we propose to implement segmentation via an encoder-decoder architecture that differs from previously published methods in that (i) it employs a very deep architecture based on residual learning and (ii) it combines features via a convolutional Long Short-Term Memory (LSTM) instead of concatenation or summation. The intuition is that the memory mechanism implemented by LSTMs can better integrate features from different scales through a coarse-to-fine strategy; hence the name Coarse-to-Fine Context Memory (CFCM). We demonstrate the remarkable advantages of this approach on two datasets: the Montgomery county lung segmentation dataset, and the EndoVis 2015 challenge dataset for surgical instrument segmentation. |
Tasks | Semantic Segmentation |
Published | 2018-06-04 |
URL | http://arxiv.org/abs/1806.01413v1 |
http://arxiv.org/pdf/1806.01413v1.pdf | |
PWC | https://paperswithcode.com/paper/cfcm-segmentation-via-coarse-to-fine-context |
Repo | https://github.com/faustomilletari/CFCM-2D |
Framework | tf |
Learning Gaussian Processes by Minimizing PAC-Bayesian Generalization Bounds
Title | Learning Gaussian Processes by Minimizing PAC-Bayesian Generalization Bounds |
Authors | David Reeb, Andreas Doerr, Sebastian Gerwinn, Barbara Rakitsch |
Abstract | Gaussian Processes (GPs) are a generic modelling tool for supervised learning. While they have been successfully applied on large datasets, their use in safety-critical applications is hindered by the lack of good performance guarantees. To this end, we propose a method to learn GPs and their sparse approximations by directly optimizing a PAC-Bayesian bound on their generalization performance, instead of maximizing the marginal likelihood. Besides its theoretical appeal, we find in our evaluation that our learning method is robust and yields significantly better generalization guarantees than other common GP approaches on several regression benchmark datasets. |
Tasks | Gaussian Processes |
Published | 2018-10-29 |
URL | http://arxiv.org/abs/1810.12263v2 |
http://arxiv.org/pdf/1810.12263v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-gaussian-processes-by-minimizing-pac |
Repo | https://github.com/boschresearch/PAC_GP |
Framework | tf |
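For orientation, a commonly cited form of a PAC-Bayesian generalization bound is written out below. This is a generic statement under stated assumptions (loss bounded in $[0,1]$, $N \ge 8$ i.i.d. training points, prior $P$ fixed before seeing the data); the exact bound optimized in the paper may differ in form and constants. Here $R_S(Q)$ is the empirical risk and $R(Q)$ the true risk of the posterior $Q$, and $\mathrm{kl}$ is the KL divergence between Bernoulli distributions.

```latex
% With probability at least 1 - \delta over the sample S, for all posteriors Q:
\mathrm{kl}\bigl(R_S(Q) \,\|\, R(Q)\bigr)
  \;\le\; \frac{\mathrm{KL}(Q \,\|\, P) + \ln\frac{2\sqrt{N}}{\delta}}{N}

% A looser but explicit consequence (via Pinsker's inequality):
R(Q) \;\le\; R_S(Q) + \sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln\frac{2\sqrt{N}}{\delta}}{2N}}
```

Minimizing the right-hand side over the parameters of $Q$ trades empirical fit against the complexity term $\mathrm{KL}(Q \,\|\, P)$, which is the general idea the abstract contrasts with maximizing the marginal likelihood.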
Ground Plane Polling for 6DoF Pose Estimation of Objects on the Road
Title | Ground Plane Polling for 6DoF Pose Estimation of Objects on the Road |
Authors | Akshay Rangesh, Mohan M. Trivedi |
Abstract | This paper introduces an approach to produce accurate 3D detection boxes for objects on the ground using single monocular images. We do so by merging 2D visual cues, 3D object dimensions, and ground plane constraints to produce boxes that are robust against small errors and incorrect predictions. First, we train a single-shot convolutional neural network (CNN) that produces multiple visual and geometric cues of interest: 2D bounding boxes, 2D keypoints of interest, coarse object orientations and object dimensions. Subsets of these cues are then used to poll probable ground planes from a pre-computed database of ground planes, to identify the “best fit” plane with highest consensus. Once identified, the “best fit” plane provides enough constraints to successfully construct the desired 3D detection box, without directly predicting the 6DoF pose of the object. The entire ground plane polling (GPP) procedure is constructed as a non-parametrized layer of the CNN that outputs the desired “best fit” plane and the corresponding 3D keypoints, which together define the final 3D bounding box. Doing so allows us to poll thousands of different ground plane configurations without adding considerable overhead, while also creating a single CNN that directly produces the desired output without the need for post processing. We evaluate our method on the 2D detection and orientation estimation benchmark from the challenging KITTI dataset, and provide additional comparisons for 3D metrics of importance. This single-stage, single-pass CNN results in superior localization and orientation estimation compared to more complex and computationally expensive monocular approaches. |
Tasks | Pose Estimation |
Published | 2018-11-16 |
URL | https://arxiv.org/abs/1811.06666v4 |
https://arxiv.org/pdf/1811.06666v4.pdf | |
PWC | https://paperswithcode.com/paper/ground-plane-polling-for-6dof-pose-estimation |
Repo | https://github.com/arangesh/Ground-Plane-Polling |
Framework | tf |
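The polling idea in the abstract above (testing many candidate ground planes and keeping the one with the highest consensus) can be illustrated with a deliberately simplified sketch. Everything here is an assumption made for illustration: the scoring rule (agreement between a back-projected keypoint distance and a predicted object length), the plane parameterization, and all names are hypothetical and are not taken from the authors' implementation.

```python
import numpy as np


def poll_ground_planes(rays, planes, expected_length):
    """Return (index, error) of the candidate plane most consistent with the cues.

    rays: (2, 3) viewing rays of two ground-contact keypoints (camera frame).
    planes: list of (normal, offset) pairs describing planes normal . X = offset.
    expected_length: predicted 3D distance between the two keypoints.
    """
    best_index, best_error = -1, np.inf
    for index, (normal, offset) in enumerate(planes):
        denom = rays @ normal
        if np.any(np.abs(denom) < 1e-9):
            continue  # a ray is parallel to this plane
        scale = offset / denom
        if np.any(scale <= 0):
            continue  # intersection would lie behind the camera
        points = rays * scale[:, None]  # ray-plane intersections in 3D
        error = abs(np.linalg.norm(points[0] - points[1]) - expected_length)
        if error < best_error:
            best_index, best_error = index, error
    return best_index, best_error


# Toy setup: camera at the origin, y pointing down, z pointing forward.
true_points = np.array([[-1.0, 1.5, 10.0], [-1.0, 1.5, 14.0]])
rays = true_points / true_points[:, 2:3]  # normalized image coordinates
down = np.array([0.0, 1.0, 0.0])
planes = [(down, height) for height in (1.0, 1.5, 2.0)]  # hypothetical plane database
best_index, best_error = poll_ground_planes(rays, planes, expected_length=4.0)
```

In this toy case the plane at camera height 1.5 reproduces the expected length exactly, so it wins the poll. The paper performs the polling inside the network over thousands of planes; this loop only conveys the selection principle.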
Probabilistic Formulations of Regression with Mixed Guidance
Title | Probabilistic Formulations of Regression with Mixed Guidance |
Authors | Aubrey Gress, Ian Davidson |
Abstract | Regression problems assume every instance is annotated (labeled) with a real value, a form of annotation we call \emph{strong guidance}. In order for these annotations to be accurate, they must be the result of a precise experiment or measurement. However, in some cases additional \emph{weak guidance} might be given by imprecise measurements, a domain expert or even crowd sourcing. Current formulations of regression are unable to use both types of guidance. We propose a regression framework that can also incorporate weak guidance based on relative orderings, bounds, neighboring and similarity relations. Consider learning to predict ages from portrait images: these new types of guidance allow weaker forms of annotation, such as stating that a person is in their 20s or that two people are similar in age. These types of annotations can be easier to generate than strong guidance. We introduce a probabilistic formulation for these forms of weak guidance and show that the resulting optimization problems are convex. Our experimental results show the benefits of these formulations on several data sets. |
Tasks | |
Published | 2018-04-01 |
URL | http://arxiv.org/abs/1804.01575v1 |
http://arxiv.org/pdf/1804.01575v1.pdf | |
PWC | https://paperswithcode.com/paper/probabilistic-formulations-of-regression-with |
Repo | https://github.com/adgress/ICDM2016 |
Framework | none |
Tensor Robust Principal Component Analysis with A New Tensor Nuclear Norm
Title | Tensor Robust Principal Component Analysis with A New Tensor Nuclear Norm |
Authors | Canyi Lu, Jiashi Feng, Yudong Chen, Wei Liu, Zhouchen Lin, Shuicheng Yan |
Abstract | In this paper, we consider the Tensor Robust Principal Component Analysis (TRPCA) problem, which aims to exactly recover the low-rank and sparse components from their sum. Our model is based on the recently proposed tensor-tensor product (or t-product). Induced by the t-product, we first rigorously deduce the tensor spectral norm, tensor nuclear norm, and tensor average rank, and show that the tensor nuclear norm is the convex envelope of the tensor average rank within the unit ball of the tensor spectral norm. These definitions, their relationships and properties are consistent with matrix cases. Equipped with the new tensor nuclear norm, we then solve the TRPCA problem by solving a convex program and provide the theoretical guarantee for the exact recovery. Our TRPCA model and recovery guarantee include matrix RPCA as a special case. Numerical experiments verify our results, and the applications to image recovery and background modeling problems demonstrate the effectiveness of our method. |
Tasks | |
Published | 2018-04-10 |
URL | http://arxiv.org/abs/1804.03728v2 |
http://arxiv.org/pdf/1804.03728v2.pdf | |
PWC | https://paperswithcode.com/paper/tensor-robust-principal-component-analysis |
Repo | https://github.com/zhaoxile/reproducible-tensor-completion-state-of-the-art |
Framework | none |
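The t-product and the tensor nuclear norm it induces, as mentioned in the abstract above, can be computed slice by slice in the Fourier domain. The sketch below assumes the usual convention of taking the FFT along the third mode and averaging the matrix nuclear norms of the frontal slices; the function names are illustrative and not from the listed repository.

```python
import numpy as np


def t_product(A, B):
    """t-product of A (n1 x n2 x n3) and B (n2 x n4 x n3).

    Computed as independent matrix products of frontal slices after an FFT
    along the third mode.
    """
    A_hat = np.fft.fft(A, axis=2)
    B_hat = np.fft.fft(B, axis=2)
    C_hat = np.einsum("ijk,jlk->ilk", A_hat, B_hat)
    return np.real(np.fft.ifft(C_hat, axis=2))


def tensor_nuclear_norm(A):
    """Average of the nuclear norms of the Fourier-domain frontal slices."""
    A_hat = np.fft.fft(A, axis=2)
    n3 = A.shape[2]
    total = sum(np.linalg.svd(A_hat[:, :, k], compute_uv=False).sum()
                for k in range(n3))
    return float(total / n3)


# Identity tensor: first frontal slice is the identity matrix, the rest are zero.
identity = np.zeros((3, 3, 4))
identity[:, :, 0] = np.eye(3)
B = np.arange(24, dtype=float).reshape(3, 2, 4)
```

With a single frontal slice (n3 = 1) both functions reduce to the ordinary matrix product and matrix nuclear norm, which mirrors the abstract's remark that the model includes matrix RPCA as a special case.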
FAIM – A ConvNet Method for Unsupervised 3D Medical Image Registration
Title | FAIM – A ConvNet Method for Unsupervised 3D Medical Image Registration |
Authors | Dongyang Kuang, Tanya Schmah |
Abstract | We present a new unsupervised learning algorithm, “FAIM”, for 3D medical image registration. With a different architecture than the popular “U-net”, the network takes a pair of full image volumes and predicts the displacement fields needed to register source to target. Compared with “U-net” based registration networks such as VoxelMorph, FAIM has fewer trainable parameters but can achieve higher registration accuracy as judged by Dice score on region labels in the Mindboggle-101 dataset. Moreover, with the proposed penalty loss on negative Jacobian determinants, FAIM produces deformations with many fewer “foldings”, i.e. regions of non-invertibility where the surface folds over itself. In our experiment, we varied the strength of this penalty and investigated changes in registration accuracy and non-invertibility in terms of number of “folding” locations. We found that FAIM is able to maintain both the advantages of higher accuracy and fewer “folding” locations over VoxelMorph, over a range of hyper-parameters (with the same values used for both networks). Further, when trading off registration accuracy for better invertibility, FAIM required less sacrifice of registration accuracy. Codes for this paper will be released upon publication. |
Tasks | Image Registration, Medical Image Registration |
Published | 2018-11-22 |
URL | https://arxiv.org/abs/1811.09243v2 |
https://arxiv.org/pdf/1811.09243v2.pdf | |
PWC | https://paperswithcode.com/paper/faim-a-convnet-method-for-unsupervised-3d |
Repo | https://github.com/dykuang/Medical-image-registration |
Framework | tf |
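The penalty on negative Jacobian determinants mentioned in the abstract above can be sketched with finite differences. This is one plausible formulation under stated assumptions (displacements in voxel units, transform phi(x) = x + u(x), penalty equal to the summed magnitude of negative determinants); it is not the authors' exact loss, and the names are hypothetical.

```python
import numpy as np


def jacobian_determinant(displacement):
    """Determinant of the Jacobian of phi(x) = x + u(x) at every voxel.

    displacement: array of shape (3, D, H, W) holding u in voxel units.
    """
    # grads[i, j] approximates d u_i / d x_j with finite differences.
    grads = np.stack(
        [np.stack(np.gradient(displacement[i]), axis=0) for i in range(3)],
        axis=0,
    )
    jacobian = grads + np.eye(3)[:, :, None, None, None]
    jacobian = np.moveaxis(jacobian, (0, 1), (-2, -1))  # -> (D, H, W, 3, 3)
    return np.linalg.det(jacobian)


def folding_penalty(displacement):
    """Sum of max(0, -det J): zero when the deformation has no foldings."""
    det = jacobian_determinant(displacement)
    return float(np.sum(np.maximum(0.0, -det)))


def count_foldings(displacement):
    """Number of voxels where the determinant is negative."""
    return int(np.sum(jacobian_determinant(displacement) < 0))


# Zero displacement is the identity map; reflecting one axis folds every voxel.
identity_field = np.zeros((3, 4, 4, 4))
reflected = np.zeros((3, 4, 4, 4))
reflected[0] = -2.0 * np.indices((4, 4, 4))[0]  # phi_0(x) = -x_0
```

Weighting such a term against the similarity loss is what the abstract refers to as varying the strength of the penalty.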
Attention-Gated Networks for Improving Ultrasound Scan Plane Detection
Title | Attention-Gated Networks for Improving Ultrasound Scan Plane Detection |
Authors | Jo Schlemper, Ozan Oktay, Liang Chen, Jacqueline Matthew, Caroline Knight, Bernhard Kainz, Ben Glocker, Daniel Rueckert |
Abstract | In this work, we apply an attention-gated network to real-time automated scan plane detection for fetal ultrasound screening. Scan plane detection in fetal ultrasound is a challenging problem due to the poor image quality, which results in low interpretability for both clinicians and automated algorithms. To solve this, we propose incorporating self-gated soft-attention mechanisms. A soft-attention mechanism generates a gating signal that is end-to-end trainable, which allows the network to contextualise local information useful for prediction. The proposed attention mechanism is generic and can be easily incorporated into any existing classification architecture, while requiring only a few additional parameters. We show that, when the base network has a high capacity, the incorporated attention mechanism can provide efficient object localisation while improving the overall performance. When the base network has a low capacity, the method greatly outperforms the baseline approach and significantly reduces false positives. Lastly, the generated attention maps allow us to understand the model’s reasoning process, which can also be used for weakly supervised object localisation. |
Tasks | |
Published | 2018-04-15 |
URL | http://arxiv.org/abs/1804.05338v1 |
http://arxiv.org/pdf/1804.05338v1.pdf | |
PWC | https://paperswithcode.com/paper/attention-gated-networks-for-improving |
Repo | https://github.com/srb-cv/AttentionClassification |
Framework | pytorch |
Learning and Inference in Hilbert Space with Quantum Graphical Models
Title | Learning and Inference in Hilbert Space with Quantum Graphical Models |
Authors | Siddarth Srinivasan, Carlton Downey, Byron Boots |
Abstract | Quantum Graphical Models (QGMs) generalize classical graphical models by adopting the formalism for reasoning about uncertainty from quantum mechanics. Unlike classical graphical models, QGMs represent uncertainty with density matrices in complex Hilbert spaces. Hilbert space embeddings (HSEs) also generalize Bayesian inference in Hilbert spaces. We investigate the link between QGMs and HSEs and show that the sum rule and Bayes rule for QGMs are equivalent to the kernel sum rule in HSEs and a special case of Nadaraya-Watson kernel regression, respectively. We show that these operations can be kernelized, and use these insights to propose a Hilbert Space Embedding of Hidden Quantum Markov Models (HSE-HQMM) to model dynamics. We present experimental results showing that HSE-HQMMs are competitive with state-of-the-art models like LSTMs and PSRNNs on several datasets, while also providing a nonparametric method for maintaining a probability distribution over continuous-valued features. |
Tasks | Bayesian Inference |
Published | 2018-10-29 |
URL | http://arxiv.org/abs/1810.12369v1 |
http://arxiv.org/pdf/1810.12369v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-and-inference-in-hilbert-space-with |
Repo | https://github.com/cmdowney/hsehqmm |
Framework | none |
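The abstract above relates Bayes rule in QGMs to Nadaraya-Watson kernel regression. For reference, the classical estimator predicts a kernel-weighted average of the training targets. The sketch below uses a Gaussian kernel and hypothetical names; it shows only the classical regression rule, not the paper's quantum formulation.

```python
import numpy as np


def nadaraya_watson(x_query, x_train, y_train, bandwidth=1.0):
    """Kernel-weighted average of y_train for each query point.

    x_query: (m, d) array, x_train: (n, d) array, y_train: (n,) array.
    """
    diffs = x_query[:, None, :] - x_train[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)
    weights = np.exp(-0.5 * sq_dists / bandwidth ** 2)  # Gaussian kernel
    return (weights @ y_train) / weights.sum(axis=1)


x_train = np.array([[0.0], [2.0]])
y_train = np.array([0.0, 2.0])
midpoint_prediction = nadaraya_watson(np.array([[1.0]]), x_train, y_train)
```

A query halfway between the two training points receives equal weights, so the prediction is the mean of the two targets.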
An Attention Model for group-level emotion recognition
Title | An Attention Model for group-level emotion recognition |
Authors | Aarush Gupta, Dakshit Agrawal, Hardik Chauhan, Jose Dolz, Marco Pedersoli |
Abstract | In this paper we propose a new approach for classifying the global emotion of images containing groups of people. To achieve this task, we consider two different and complementary sources of information: (i) a global representation of the entire image and (ii) a local representation where only faces are considered. While the global representation of the image is learned with a convolutional neural network (CNN), the local representation is obtained by merging face features through an attention mechanism. The two representations are first learned independently with two separate CNN branches and then fused through concatenation in order to obtain the final group-emotion classifier. For our submission to the EmotiW 2018 group-level emotion recognition challenge, we combine several variations of the proposed model into an ensemble, obtaining a final accuracy of 64.83% on the test set and ranking 4th among all challenge participants. |
Tasks | Emotion Recognition |
Published | 2018-07-09 |
URL | http://arxiv.org/abs/1807.03380v1 |
http://arxiv.org/pdf/1807.03380v1.pdf | |
PWC | https://paperswithcode.com/paper/an-attention-model-for-group-level-emotion |
Repo | https://github.com/vlgiitr/Group-Level-Emotion-Recognition |
Framework | pytorch |
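The fusion scheme described in the abstract above (face features merged by attention, then concatenated with a global image feature) can be sketched in a few lines. This is an assumed, simplified form: the dot-product scoring against a single learned vector and all names are illustrative, and the actual model learns these components inside CNN branches.

```python
import numpy as np


def attention_pool(face_features, attention_vector):
    """Softmax-weighted average of per-face feature vectors.

    face_features: (num_faces, dim), attention_vector: (dim,).
    """
    scores = face_features @ attention_vector
    scores = scores - scores.max()  # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights @ face_features, weights


def fuse(global_feature, face_features, attention_vector):
    """Concatenate the global feature with the attention-pooled face feature."""
    local_feature, _ = attention_pool(face_features, attention_vector)
    return np.concatenate([global_feature, local_feature])


faces = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
pooled, weights = attention_pool(faces, np.zeros(2))
fused = fuse(np.array([0.5, 0.5, 0.5]), faces, np.array([2.0, 0.0]))
```

With an all-zero attention vector every face gets the same weight, so the pooled feature is simply the mean of the face features; a trained vector would emphasize the faces most informative for the group emotion.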
Community Member Retrieval on Social Media using Textual Information
Title | Community Member Retrieval on Social Media using Textual Information |
Authors | Aaron Jaech, Shobhit Hathi, Mari Ostendorf |
Abstract | This paper addresses the problem of community membership detection using only text features in a scenario where a small number of positive labeled examples defines the community. The solution introduces an unsupervised proxy task for learning user embeddings: user re-identification. Experiments with 16 different communities show that the resulting embeddings are more effective for community membership identification than common unsupervised representations. |
Tasks | |
Published | 2018-04-16 |
URL | http://arxiv.org/abs/1804.05499v1 |
http://arxiv.org/pdf/1804.05499v1.pdf | |
PWC | https://paperswithcode.com/paper/community-member-retrieval-on-social-media |
Repo | https://github.com/ajaech/twittercommunities |
Framework | tf |
Incorporating Features Learned by an Enhanced Deep Knowledge Tracing Model for STEM/Non-STEM Job Prediction
Title | Incorporating Features Learned by an Enhanced Deep Knowledge Tracing Model for STEM/Non-STEM Job Prediction |
Authors | Chun-kit Yeung, Zizheng Lin, Kai Yang, Dit-yan Yeung |
Abstract | The 2017 ASSISTments Data Mining competition aims to use data from a longitudinal study for predicting a brand-new outcome of students which had never been studied before by the educational data mining research community. Specifically, it facilitates research in developing predictive models that predict whether the first job of a student out of college belongs to a STEM (the acronym for science, technology, engineering, and mathematics) field. This is based on the student’s learning history on the ASSISTments blended learning platform in the form of extensive clickstream data gathered during the middle school years. To tackle this challenge, we first estimate the expected knowledge state of students with respect to different mathematical skills using a deep knowledge tracing (DKT) model and an enhanced DKT (DKT+) model. We then combine the features corresponding to the DKT/DKT+ expected knowledge state with other features extracted directly from the student profile in the dataset to train several machine learning models for the STEM/non-STEM job prediction. Our experiments show that models trained with the combined features generally perform better than the models trained with the student profile alone. Detailed analysis of the student’s knowledge state reveals that, when compared with non-STEM students, STEM students generally show a higher mastery level and a higher learning gain in mathematics. |
Tasks | Knowledge Tracing |
Published | 2018-06-06 |
URL | http://arxiv.org/abs/1806.03256v1 |
http://arxiv.org/pdf/1806.03256v1.pdf | |
PWC | https://paperswithcode.com/paper/incorporating-features-learned-by-an-enhanced |
Repo | https://github.com/ckyeungac/ADM2017 |
Framework | tf |
Multimodal Grounding for Sequence-to-Sequence Speech Recognition
Title | Multimodal Grounding for Sequence-to-Sequence Speech Recognition |
Authors | Ozan Caglayan, Ramon Sanabria, Shruti Palaskar, Loïc Barrault, Florian Metze |
Abstract | Humans are capable of processing speech by making use of multiple sensory modalities. For example, the environment where a conversation takes place generally provides semantic and/or acoustic context that helps us to resolve ambiguities or to recall named entities. Motivated by this, there have been many works studying the integration of visual information into the speech recognition pipeline. Specifically, in our previous work, we proposed a multistep visual adaptive training approach which improves the accuracy of an audio-based Automatic Speech Recognition (ASR) system. This approach, however, is not end-to-end as it requires fine-tuning the whole model with an adaptation layer. In this paper, we propose novel end-to-end multimodal ASR systems and compare them to the adaptive approach by using a range of visual representations obtained from state-of-the-art convolutional neural networks. We show that adaptive training is effective for sequence-to-sequence (S2S) models, leading to an absolute improvement of 1.4% in word error rate (WER). As for the end-to-end systems, although they perform better than the baseline, the improvements are slightly smaller than with adaptive training: a 0.8 absolute WER reduction in single-best models. Using ensemble decoding, end-to-end models reach a WER of 15%, which is the lowest score among all systems. |
Tasks | Sequence-To-Sequence Speech Recognition, Speech Recognition |
Published | 2018-11-09 |
URL | http://arxiv.org/abs/1811.03865v2 |
http://arxiv.org/pdf/1811.03865v2.pdf | |
PWC | https://paperswithcode.com/paper/multimodal-grounding-for-sequence-to-sequence |
Repo | https://github.com/srvk/how2-dataset |
Framework | none |
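The results in the abstract above are reported as word error rate (WER). As a reminder of how that metric is defined, the sketch below computes WER as the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name and the whitespace tokenization are assumptions for illustration; evaluation toolkits typically apply additional text normalization.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


example_wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

In the example, one substitution ("sat" to "sit") and one deletion ("the") against six reference words give a WER of 2/6.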
From Coarse to Fine: Robust Hierarchical Localization at Large Scale
Title | From Coarse to Fine: Robust Hierarchical Localization at Large Scale |
Authors | Paul-Edouard Sarlin, Cesar Cadena, Roland Siegwart, Marcin Dymczyk |
Abstract | Robust and accurate visual localization is a fundamental capability for numerous applications, such as autonomous driving, mobile robotics, or augmented reality. It remains, however, a challenging task, particularly for large-scale environments and in the presence of significant appearance changes. State-of-the-art methods not only struggle with such scenarios, but are often too resource intensive for certain real-time applications. In this paper we propose HF-Net, a hierarchical localization approach based on a monolithic CNN that simultaneously predicts local features and global descriptors for accurate 6-DoF localization. We exploit the coarse-to-fine localization paradigm: we first perform a global retrieval to obtain location hypotheses and only later match local features within those candidate places. This hierarchical approach incurs significant runtime savings and makes our system suitable for real-time operation. By leveraging learned descriptors, our method achieves remarkable localization robustness across large variations of appearance and sets a new state of the art on two challenging benchmarks for large-scale localization. |
Tasks | Autonomous Driving, Visual Localization |
Published | 2018-12-09 |
URL | http://arxiv.org/abs/1812.03506v2 |
http://arxiv.org/pdf/1812.03506v2.pdf | |
PWC | https://paperswithcode.com/paper/from-coarse-to-fine-robust-hierarchical |
Repo | https://github.com/ethz-asl/hfnet |
Framework | tf |
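The coarse-to-fine paradigm in the abstract above (global retrieval first, local feature matching only within the retrieved candidates) can be sketched as follows. This is a toy illustration with hypothetical names and hand-made descriptors; it uses cosine similarity for retrieval and thresholded mutual nearest neighbours for matching, and it omits the pose estimation step that follows matching in a real pipeline.

```python
import numpy as np


def retrieve_candidates(query_global, db_globals, k):
    """Indices of the k database images most similar to the query (cosine)."""
    sims = (db_globals @ query_global) / (
        np.linalg.norm(db_globals, axis=1) * np.linalg.norm(query_global)
    )
    return np.argsort(-sims)[:k]


def count_matches(desc_a, desc_b, max_distance):
    """Number of mutual nearest-neighbour pairs closer than max_distance."""
    dists = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=-1)
    nn_ab = dists.argmin(axis=1)
    nn_ba = dists.argmin(axis=0)
    rows = np.arange(len(desc_a))
    mutual = nn_ba[nn_ab] == rows
    close = dists[rows, nn_ab] <= max_distance
    return int(np.sum(mutual & close))


def localize(query_global, query_local, db_globals, db_locals, k=2, max_distance=0.5):
    """Coarse retrieval, then local matching restricted to the candidates."""
    candidates = retrieve_candidates(query_global, db_globals, k)
    scores = [count_matches(query_local, db_locals[i], max_distance) for i in candidates]
    best = int(np.argmax(scores))
    return int(candidates[best]), scores[best]


# Toy database of three images with hand-made descriptors.
db_globals = np.eye(3)
db_locals = [np.eye(4) + i for i in range(3)]
best_image, num_matches = localize(
    np.array([0.1, 0.9, 0.2]), np.eye(4) + 1.0, db_globals, db_locals
)
```

Only the two retrieved candidates are ever matched locally, which is where the runtime savings of the hierarchical approach come from.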
Leveraging Deep Visual Descriptors for Hierarchical Efficient Localization
Title | Leveraging Deep Visual Descriptors for Hierarchical Efficient Localization |
Authors | Paul-Edouard Sarlin, Frédéric Debraine, Marcin Dymczyk, Roland Siegwart, Cesar Cadena |
Abstract | Many robotics applications require precise pose estimates despite operating in large and changing environments. This can be addressed by visual localization, using a pre-computed 3D model of the surroundings. The pose estimation then amounts to finding correspondences between 2D keypoints in a query image and 3D points in the model using local descriptors. However, computational power is often limited on robotic platforms, making this task challenging in large-scale environments. Binary feature descriptors significantly speed up this 2D-3D matching and have become popular in the robotics community, but they also strongly impair the robustness to perceptual aliasing and changes in viewpoint, illumination and scene structure. In this work, we propose to leverage recent advances in deep learning to perform an efficient hierarchical localization. We first localize at the map level using learned image-wide global descriptors, and subsequently estimate a precise pose from 2D-3D matches computed in the candidate places only. This restricts the local search and thus allows us to efficiently exploit powerful non-binary descriptors usually dismissed on resource-constrained devices. Our approach results in state-of-the-art localization performance while running in real time on a popular mobile platform, enabling new prospects for robotics research. |
Tasks | Pose Estimation, Visual Localization |
Published | 2018-09-04 |
URL | http://arxiv.org/abs/1809.01019v2 |
http://arxiv.org/pdf/1809.01019v2.pdf | |
PWC | https://paperswithcode.com/paper/leveraging-deep-visual-descriptors-for |
Repo | https://github.com/ethz-asl/hierarchical_loc |
Framework | tf |