Paper Group ANR 783
Self-Supervised Learning of Depth and Camera Motion from 360° Videos
Title | Self-Supervised Learning of Depth and Camera Motion from 360° Videos |
Authors | Fu-En Wang, Hou-Ning Hu, Hsien-Tzu Cheng, Juan-Ting Lin, Shang-Ta Yang, Meng-Li Shih, Hung-Kuo Chu, Min Sun |
Abstract | As 360° cameras become prevalent in many autonomous systems (e.g., self-driving cars and drones), efficient 360° perception becomes more and more important. We propose a novel self-supervised learning approach for predicting the omnidirectional depth and camera motion from a 360° video. In particular, starting from SfMLearner, which is designed for cameras with a normal field-of-view, we introduce three key features to process 360° images efficiently. First, we convert each image from equirectangular projection to cubic projection to avoid image distortion. In each network layer, we use Cube Padding (CP), which pads intermediate features from adjacent faces, to avoid image boundaries. Second, we propose a novel “spherical” photometric consistency constraint on the whole viewing sphere. In this way, no pixel will be projected outside the image boundary, which typically happens in images with a normal field-of-view. Finally, rather than naively estimating six independent camera motions (i.e., naively applying SfMLearner to each face of a cube), we propose a novel camera pose consistency loss to ensure the estimated camera motions reach consensus. To train and evaluate our approach, we collect a new PanoSUNCG dataset containing a large number of 360° videos with ground-truth depth and camera motion. Our approach achieves state-of-the-art depth prediction and camera motion estimation on PanoSUNCG with faster inference speed compared to the equirectangular representation. On real-world indoor videos, our approach also achieves qualitatively reasonable depth prediction using a model pre-trained on PanoSUNCG. |
Tasks | Depth And Camera Motion, Depth Estimation, Motion Estimation, Self-Driving Cars |
Published | 2018-11-13 |
URL | http://arxiv.org/abs/1811.05304v1 |
PDF | http://arxiv.org/pdf/1811.05304v1.pdf |
PWC | https://paperswithcode.com/paper/self-supervised-learning-of-depth-and-camera |
Repo | |
Framework | |
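The camera pose consistency loss described in the abstract lends itself to a compact illustration. Below is a minimal PyTorch sketch that penalizes the deviation of six per-face pose predictions from their mean; the paper's exact formulation may differ, and the 6-DoF parameterization, the L1 penalty, and the naive averaging of rotation parameters are all simplifying assumptions made here.

```python
import torch

def pose_consistency_loss(poses: torch.Tensor) -> torch.Tensor:
    """poses: (B, 6, 6) = batch, cube faces, 6-DoF motion (3 translation +
    3 rotation parameters). Pulls each per-face estimate toward the consensus.
    Averaging rotation parameters directly is a simplification; a full
    implementation would compose rotations properly."""
    consensus = poses.mean(dim=1, keepdim=True)   # (B, 1, 6) mean pose
    return (poses - consensus).abs().mean()       # L1 deviation from consensus

poses = torch.randn(2, 6, 6, requires_grad=True)  # six per-face pose predictions
pose_consistency_loss(poses).backward()           # differentiable, hence trainable
```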
Stochastic Normalizations as Bayesian Learning
Title | Stochastic Normalizations as Bayesian Learning |
Authors | Alexander Shekhovtsov, Boris Flach |
Abstract | In this work we investigate the reasons why Batch Normalization (BN) improves the generalization performance of deep networks. We argue that one major reason, distinguishing it from data-independent normalization methods, is the randomness of batch statistics. This randomness appears in the parameters rather than in the activations and admits an interpretation as a form of practical Bayesian learning. We apply this idea to other (deterministic) normalization techniques that are oblivious to the batch size. We show that their generalization performance can be improved significantly by Bayesian learning of the same form. We obtain test performance comparable to BN and, at the same time, better validation losses suitable for subsequent output uncertainty estimation through the approximate Bayesian posterior. |
Tasks | |
Published | 2018-11-01 |
URL | http://arxiv.org/abs/1811.00639v1 |
PDF | http://arxiv.org/pdf/1811.00639v1.pdf |
PWC | https://paperswithcode.com/paper/stochastic-normalizations-as-bayesian |
Repo | |
Framework | |
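As a rough illustration of the core idea, injecting noise into the statistics of an otherwise deterministic, batch-size-oblivious normalizer mimics the randomness of batch statistics. The module below is a hypothetical sketch, not the authors' method: the layer-norm-style statistics, the Gaussian noise model, and the noise scale are all assumptions.

```python
import torch
import torch.nn as nn

class NoisyNorm(nn.Module):
    """Deterministic (layer-norm-style) normalization with Gaussian noise
    injected into its statistics at training time: a toy stand-in for
    treating normalization randomness as approximate Bayesian learning."""
    def __init__(self, num_features: int, noise_std: float = 0.1):
        super().__init__()
        self.gamma = nn.Parameter(torch.ones(num_features))
        self.beta = nn.Parameter(torch.zeros(num_features))
        self.noise_std = noise_std

    def forward(self, x):                          # x: (B, num_features)
        mu = x.mean(dim=1, keepdim=True)
        sigma = x.std(dim=1, keepdim=True) + 1e-5
        if self.training:                          # stochastic only in training
            mu = mu + self.noise_std * torch.randn_like(mu)
            sigma = sigma * (1 + self.noise_std * torch.randn_like(sigma)).abs()
        return self.gamma * (x - mu) / sigma + self.beta

y = NoisyNorm(32)(torch.randn(8, 32))              # (8, 32) noisily normalized
```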
Abstractive and Extractive Text Summarization using Document Context Vector and Recurrent Neural Networks
Title | Abstractive and Extractive Text Summarization using Document Context Vector and Recurrent Neural Networks |
Authors | Chandra Khatri, Gyanit Singh, Nish Parikh |
Abstract | Sequence-to-sequence (Seq2Seq) learning has recently been used for abstractive and extractive summarization. In the current study, Seq2Seq models are used for eBay product description summarization. We propose novel document-context-based Seq2Seq models using RNNs for abstractive and extractive summarization. Intuitively, this is similar to humans reading the title, abstract, or any other contextual information before reading the document, which gives them a high-level idea of what the document is about. Building on this idea, we propose that Seq2Seq models should be fed contextual information at the first time-step of the input to obtain better summaries. In this manner, the output summaries are more document-centric than generic, overcoming one of the major hurdles of using generative models. We generate document context from user behavior and seller-provided information. We train and evaluate our models on human-extracted golden summaries. The document-contextual Seq2Seq models outperform standard Seq2Seq models. Moreover, since generating human-extracted summaries is prohibitively expensive to scale, we propose a semi-supervised technique for extracting approximate summaries and using them to train Seq2Seq models at scale. The semi-supervised models are evaluated against human-extracted summaries and are found to be of similar efficacy. We provide a side-by-side comparison of abstractive and extractive summarizers (contextual and non-contextual) on the same evaluation dataset. Overall, we provide methodologies to use and evaluate the proposed techniques for large-document summarization, and we find these techniques to be highly effective, which is not the case with existing techniques. |
Tasks | Document Summarization, Text Summarization |
Published | 2018-07-20 |
URL | http://arxiv.org/abs/1807.08000v2 |
PDF | http://arxiv.org/pdf/1807.08000v2.pdf |
PWC | https://paperswithcode.com/paper/abstractive-and-extractive-text-summarization |
Repo | |
Framework | |
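The central trick, starting the encoder with a document-context vector at the first time-step, is easy to picture in code. Below is a minimal PyTorch sketch; the GRU encoder, the linear projection of the context, and all dimensions are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class ContextSeq2SeqEncoder(nn.Module):
    """Prepends a projected document-context vector as the first time-step
    of the encoder input, so the summary is conditioned on the context."""
    def __init__(self, vocab_size, emb_dim, hid_dim, ctx_dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.ctx_proj = nn.Linear(ctx_dim, emb_dim)    # map context into token space
        self.rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)

    def forward(self, tokens, context):
        # tokens: (B, T) int64 ids, context: (B, ctx_dim) float vector
        tok_emb = self.embed(tokens)                   # (B, T, emb_dim)
        ctx_emb = self.ctx_proj(context).unsqueeze(1)  # (B, 1, emb_dim)
        return self.rnn(torch.cat([ctx_emb, tok_emb], dim=1))

enc = ContextSeq2SeqEncoder(vocab_size=1000, emb_dim=32, hid_dim=64, ctx_dim=10)
out, h = enc(torch.randint(0, 1000, (2, 5)), torch.randn(2, 10))
```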
Discovering Style Trends through Deep Visually Aware Latent Item Embeddings
Title | Discovering Style Trends through Deep Visually Aware Latent Item Embeddings |
Authors | Murium Iqbal, Adair Kovac, Kamelia Aryafar |
Abstract | In this paper, we explore Latent Dirichlet Allocation (LDA) and Polylingual Latent Dirichlet Allocation (PolyLDA) as a means to discover trending styles on Overstock from deep visual semantic features transferred from a pretrained convolutional neural network, together with text-based item attributes. To use deep visual semantic features in conjunction with LDA, we develop a method for creating a bag-of-words representation of unrolled image vectors. By viewing each channel within the convolutional layers of a ResNet-50 as representing a word, we can index the activations to create visual documents. We then train LDA over these documents to discover the latent style in the images. We also incorporate text-based data with PolyLDA, where each representation is viewed as an independent language attempting to describe the same style. The resulting topics are shown to be excellent indicators of visual style across our platform. |
Tasks | |
Published | 2018-04-23 |
URL | http://arxiv.org/abs/1804.08704v1 |
PDF | http://arxiv.org/pdf/1804.08704v1.pdf |
PWC | https://paperswithcode.com/paper/discovering-style-trends-through-deep |
Repo | |
Framework | |
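One plausible reading of the channels-as-words construction: pool each channel of a convolutional feature map, quantize the pooled activation into a pseudo word count, and train LDA on the resulting "visual documents". The pooling and quantization scheme below are assumptions for illustration; the authors' exact indexing may differ.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

def visual_bag_of_words(feature_map, levels=10):
    """feature_map: (C, H, W) activations from a conv layer. Each channel
    index is treated as a 'word'; its pooled activation becomes a count."""
    pooled = feature_map.mean(axis=(1, 2))             # (C,) per-channel strength
    pooled = pooled / (pooled.max() + 1e-8)
    return np.round(pooled * levels).astype(int)       # pseudo word counts

# Toy corpus of 100 visual documents from ResNet-50-sized feature maps.
docs = np.stack([visual_bag_of_words(np.random.rand(2048, 7, 7))
                 for _ in range(100)])
lda = LatentDirichletAllocation(n_components=8, random_state=0).fit(docs)
styles = lda.transform(docs)                           # per-image style mixtures
```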
Attention-based Graph Neural Network for Semi-supervised Learning
Title | Attention-based Graph Neural Network for Semi-supervised Learning |
Authors | Kiran K. Thekumparampil, Chong Wang, Sewoong Oh, Li-Jia Li |
Abstract | Recently popularized graph neural networks achieve state-of-the-art accuracy on a number of standard benchmark datasets for graph-based semi-supervised learning, improving significantly over existing approaches. These architectures alternate between a propagation layer that aggregates the hidden states of the local neighborhood and a fully-connected layer. Perhaps surprisingly, we show that a linear model that removes all the intermediate fully-connected layers is still able to achieve performance comparable to the state-of-the-art models. This significantly reduces the number of parameters, which is critical for semi-supervised learning, where the number of labeled examples is small. This in turn leaves room for designing more innovative propagation layers. Based on this insight, we propose a novel graph neural network that removes all the intermediate fully-connected layers and replaces the propagation layers with attention mechanisms that respect the structure of the graph. The attention mechanism allows us to learn a dynamic and adaptive local summary of the neighborhood to achieve more accurate predictions. In a number of experiments on benchmark citation network datasets, we demonstrate that our approach outperforms competing methods. By examining the attention weights among neighbors, we show that our model provides some interesting insights into how neighbors influence each other. |
Tasks | Graph Regression |
Published | 2018-03-10 |
URL | http://arxiv.org/abs/1803.03735v1 |
PDF | http://arxiv.org/pdf/1803.03735v1.pdf |
PWC | https://paperswithcode.com/paper/attention-based-graph-neural-network-for-semi |
Repo | |
Framework | |
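A propagation layer built from structure-respecting attention can be sketched in a few lines. The cosine-similarity attention below is one natural instantiation, close in spirit to the paper, but the temperature beta, the masking, and the aggregation details are assumptions of this sketch.

```python
import torch
import torch.nn.functional as F

def attention_propagate(H, adj, beta=1.0):
    """One attention-based propagation step over a graph: neighbors are
    weighted by (scaled) cosine similarity, softmax-normalized per node.
    H: (N, d) node states; adj: (N, N) {0,1} adjacency with self-loops,
    so every row has at least one finite score."""
    Hn = F.normalize(H, dim=1)
    scores = beta * (Hn @ Hn.t())                      # cosine similarities
    scores = scores.masked_fill(adj == 0, float('-inf'))
    attn = torch.softmax(scores, dim=1)                # rows sum to 1 over neighbors
    return attn @ H                                    # adaptive neighborhood summary

H = torch.randn(4, 8)
adj = torch.ones(4, 4)        # toy fully connected graph (self-loops included)
H_next = attention_propagate(H, adj)
```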
Fully Automatic Myocardial Segmentation of Contrast Echocardiography Sequence Using Random Forests Guided by Shape Model
Title | Fully Automatic Myocardial Segmentation of Contrast Echocardiography Sequence Using Random Forests Guided by Shape Model |
Authors | Yuanwei Li, Chin Pang Ho, Matthieu Toulemonde, Navtej Chahal, Roxy Senior, Meng-Xing Tang |
Abstract | Myocardial contrast echocardiography (MCE) is an imaging technique that assesses left ventricle function and myocardial perfusion for the detection of coronary artery diseases. Automatic MCE perfusion quantification is challenging and requires accurate segmentation of the myocardium from noisy and time-varying images. Random forests (RF) have been successfully applied to many medical image segmentation tasks. However, the pixel-wise RF classifier ignores contextual relationships between the label outputs of individual pixels. An RF that only utilizes local appearance features is also susceptible to data suffering from large intensity variations. In this paper, we demonstrate how to overcome the above limitations of the classic RF by presenting a fully automatic pipeline for myocardial segmentation in full-cycle 2D MCE data. Specifically, a statistical shape model is used to provide shape prior information that guides the RF segmentation in two ways. First, a novel shape model (SM) feature is incorporated into the RF framework to generate a more accurate RF probability map. Second, the shape model is fitted to the RF probability map to refine and constrain the final segmentation to plausible myocardial shapes. We further improve performance by introducing a bounding box detection algorithm as a preprocessing step in the segmentation pipeline. Our approach on 2D images is further extended to 2D+t sequences, which ensures temporal consistency in the resulting sequence segmentations. When evaluated on clinical MCE data, our proposed method achieves notable improvement in segmentation accuracy and outperforms other state-of-the-art methods, including the classic RF and its variants, the active shape model, and image registration. |
Tasks | Image Registration, Medical Image Segmentation, Semantic Segmentation |
Published | 2018-06-19 |
URL | http://arxiv.org/abs/1806.07497v1 |
PDF | http://arxiv.org/pdf/1806.07497v1.pdf |
PWC | https://paperswithcode.com/paper/fully-automatic-myocardial-segmentation-of |
Repo | |
Framework | |
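To make the SM-feature idea concrete, the sketch below appends a signed-distance shape prior to per-pixel appearance features before training a random forest, yielding the probability map that the shape model would then refine. The distance-transform feature, the random appearance features, and the toy labels are stand-ins, not the paper's exact pipeline.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt
from sklearn.ensemble import RandomForestClassifier

def shape_model_feature(mean_mask):
    """Signed-distance-like shape prior per pixel (positive inside the mean
    myocardial shape, negative outside): a simple stand-in SM feature."""
    return distance_transform_edt(mean_mask) - distance_transform_edt(1 - mean_mask)

H, W = 64, 64
mean_mask = np.zeros((H, W))
mean_mask[16:48, 16:48] = 1                         # toy mean shape
sm = shape_model_feature(mean_mask).ravel()

appearance = np.random.rand(H * W, 5)               # stand-in local intensity features
X = np.column_stack([appearance, sm])               # append the SM feature per pixel
y = mean_mask.ravel().astype(int)                   # toy labels for illustration
rf = RandomForestClassifier(n_estimators=50).fit(X, y)
prob_map = rf.predict_proba(X)[:, 1].reshape(H, W)  # RF probability map to refine
```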
Automated Scene Flow Data Generation for Training and Verification
Title | Automated Scene Flow Data Generation for Training and Verification |
Authors | Oliver Wasenmüller, René Schuster, Didier Stricker, Karl Leiss, Jürger Pfister, Oleksandra Ganus, Julian Tatsch, Artem Savkin, Nikolas Brasch |
Abstract | Scene flow describes the 3D position as well as the 3D motion of each pixel in an image. Such algorithms are the basis for many state-of-the-art autonomous or automated driving functions. For verification and training, large amounts of ground truth data are required, which are not available for real data. In this paper, we demonstrate a technology to create synthetic data with dense and precise scene flow ground truth. |
Tasks | |
Published | 2018-08-30 |
URL | http://arxiv.org/abs/1808.10232v2 |
PDF | http://arxiv.org/pdf/1808.10232v2.pdf |
PWC | https://paperswithcode.com/paper/automated-scene-flow-data-generation-for |
Repo | |
Framework | |
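In a synthetic renderer where depth and pixel correspondences are known exactly, dense scene flow ground truth reduces to differencing backprojected 3D positions. The sketch below illustrates that computation; the intrinsics and toy depths are invented, and a real pipeline would also account for camera and per-object motion fields.

```python
import numpy as np

def backproject(depth, fx, fy, cx, cy):
    """Pixel grid + depth -> 3D points in camera coordinates, shape (H, W, 3)."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    X = (u - cx) * depth / fx
    Y = (v - cy) * depth / fy
    return np.stack([X, Y, depth], axis=-1)

# With synthetic depth at t and t+1 and trivially known correspondences,
# ground-truth scene flow is the per-pixel difference of 3D positions.
depth_t  = np.full((4, 4), 2.0)
depth_t1 = np.full((4, 4), 2.1)                       # scene moved 0.1 m away
P_t  = backproject(depth_t,  fx=100, fy=100, cx=2, cy=2)
P_t1 = backproject(depth_t1, fx=100, fy=100, cx=2, cy=2)
scene_flow_gt = P_t1 - P_t                            # (H, W, 3) 3D motion per pixel
```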
Functional Object-Oriented Network: Construction & Expansion
Title | Functional Object-Oriented Network: Construction & Expansion |
Authors | David Paulius, Ahmad Babaeian Jelodar, Yu Sun |
Abstract | We build upon the functional object-oriented network (FOON), a structured knowledge representation constructed from observations of human activities and manipulations. A FOON can be used for representing object-motion affordances. Knowledge retrieval through graph search allows us to obtain novel manipulation sequences using knowledge spanning many video sources, hence the novelty of our approach. However, we are limited to the sources collected. To further improve the performance of knowledge retrieval, as a follow-up to our previous work, we discuss generalizing knowledge so that it can be applied to objects similar to those already in FOON, without manually annotating new sources of knowledge. We discuss two means of generalization: 1) expanding the network through the use of object similarity to create new functional units from those we already have, and 2) compressing the functional units by object categories rather than specific objects. We present experiments that compare the performance of our knowledge retrieval algorithm with both expansion and compression by categories. |
Tasks | |
Published | 2018-07-05 |
URL | http://arxiv.org/abs/1807.02189v1 |
PDF | http://arxiv.org/pdf/1807.02189v1.pdf |
PWC | https://paperswithcode.com/paper/functional-object-oriented-network |
Repo | |
Framework | |
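The expansion-by-similarity idea can be shown with a toy graph: new functional units are minted by substituting objects (or tools) with similar alternatives. Everything below, the unit representation and the similarity lists alike, is invented for illustration; FOON's actual functional units are far richer structures.

```python
# Toy FOON-style expansion: each (tool, object) pair maps to a motion.
functional_units = {("knife", "cucumber"): "slice"}
similar = {"cucumber": ["zucchini"], "knife": ["cleaver"]}  # invented similarities

def expand(units, similar):
    """Generalize units by swapping in similar objects and tools."""
    new_units = dict(units)
    for (tool, obj), motion in units.items():
        for alt in similar.get(obj, []):
            new_units[(tool, alt)] = motion     # e.g. (knife, zucchini) -> slice
        for alt in similar.get(tool, []):
            new_units[(alt, obj)] = motion      # e.g. (cleaver, cucumber) -> slice
    return new_units

print(expand(functional_units, similar))
```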
On the Effect of Inter-observer Variability for a Reliable Estimation of Uncertainty of Medical Image Segmentation
Title | On the Effect of Inter-observer Variability for a Reliable Estimation of Uncertainty of Medical Image Segmentation |
Authors | Alain Jungo, Raphael Meier, Ekin Ermis, Marcela Blatti-Moreno, Evelyn Herrmann, Roland Wiest, Mauricio Reyes |
Abstract | Uncertainty estimation methods are expected to improve the understanding and quality of computer-assisted methods used in medical applications (e.g., neurosurgical interventions, radiotherapy planning), where automated medical image segmentation is crucial. In supervised machine learning, a common practice for generating ground-truth label data is to merge observer annotations. However, since many medical image tasks show high inter-observer variability resulting from factors such as image quality, different levels of user expertise, and domain knowledge, little is known about how inter-observer variability and commonly used fusion methods affect the estimation of uncertainty in automated image segmentation. In this paper we analyze the effect of common image label fusion techniques on uncertainty estimation, and propose to learn the uncertainty among observers. The results highlight the negative effect that fusion methods applied in deep learning have on obtaining reliable estimates of segmentation uncertainty. Additionally, we show that the learned observers’ uncertainty can be combined with current standard Monte Carlo dropout Bayesian neural networks to characterize the uncertainty of the model’s parameters. |
Tasks | Medical Image Segmentation, Semantic Segmentation |
Published | 2018-06-07 |
URL | http://arxiv.org/abs/1806.02562v1 |
PDF | http://arxiv.org/pdf/1806.02562v1.pdf |
PWC | https://paperswithcode.com/paper/on-the-effect-of-inter-observer-variability |
Repo | |
Framework | |
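The "current standard" ingredient mentioned at the end of the abstract, Monte Carlo dropout, is worth pinning down, since the paper combines learned observer uncertainty with it. Below is the standard MC-dropout recipe for a segmentation network; the sigmoid output, the toy model, and T=20 samples are assumptions of this sketch.

```python
import torch
import torch.nn as nn

def mc_dropout_uncertainty(model, x, T=20):
    """Monte Carlo dropout: keep dropout active at test time, average T
    stochastic forward passes, and use the per-pixel variance as a
    model-uncertainty map."""
    model.train()                        # keeps nn.Dropout layers stochastic
    with torch.no_grad():
        probs = torch.stack([torch.sigmoid(model(x)) for _ in range(T)])
    return probs.mean(0), probs.var(0)   # segmentation estimate, uncertainty map

net = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.Dropout2d(0.5),
                    nn.Conv2d(8, 1, 3, padding=1))        # toy segmenter
mean_seg, uncert = mc_dropout_uncertainty(net, torch.randn(1, 1, 32, 32))
```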
Multimodal Registration of Retinal Images Using Domain-Specific Landmarks and Vessel Enhancement
Title | Multimodal Registration of Retinal Images Using Domain-Specific Landmarks and Vessel Enhancement |
Authors | Álvaro S. Hervella, José Rouco, Jorge Novo, Marcos Ortega |
Abstract | The analysis of different image modalities is frequently performed in ophthalmology as it provides complementary information for the diagnosis and follow-up of relevant diseases, like hypertension or diabetes. This work presents a hybrid method for the multimodal registration of color fundus retinography and fluorescein angiography. The proposed method combines a feature-based approach, using domain-specific landmarks, with an intensity-based approach that employs a domain-adapted similarity metric. The methodology is tested on a dataset of 59 image pairs containing both healthy and pathological cases. The results show a satisfactory performance of the proposed combined approach in this multimodal scenario, improving the registration accuracy achieved by the feature-based and the intensity-based approaches. |
Tasks | |
Published | 2018-03-02 |
URL | http://arxiv.org/abs/1803.00951v2 |
PDF | http://arxiv.org/pdf/1803.00951v2.pdf |
PWC | https://paperswithcode.com/paper/multimodal-registration-of-retinal-images |
Repo | |
Framework | |
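The feature-based half of such a hybrid pipeline often boils down to fitting a transform to matched landmarks and handing it to an intensity-based refinement. The least-squares affine fit below is a generic sketch under that assumption, not the authors' method; the landmark coordinates are synthetic.

```python
import numpy as np

def fit_affine(src_pts, dst_pts):
    """Least-squares affine from matched landmarks; in a hybrid pipeline this
    would initialize an intensity-based refinement with a similarity metric."""
    n = len(src_pts)
    A = np.hstack([src_pts, np.ones((n, 1))])           # (n, 3) homogeneous coords
    X, *_ = np.linalg.lstsq(A, dst_pts, rcond=None)     # (3, 2) affine parameters
    return X

src = np.array([[10., 10.], [40., 12.], [25., 30.], [8., 28.]])  # retinography
dst = src @ np.array([[1.02, 0.01], [-0.01, 0.99]]) + np.array([3., -2.])
affine = fit_affine(src, dst)
warped = np.hstack([src, np.ones((4, 1))]) @ affine     # mapped to angiography frame
```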
Phoneme-to-viseme mappings: the good, the bad, and the ugly
Title | Phoneme-to-viseme mappings: the good, the bad, and the ugly |
Authors | Helen L Bear, Richard Harvey |
Abstract | Visemes are the visual equivalent of phonemes. Although not precisely defined, a working definition of a viseme is “a set of phonemes which have identical appearance on the lips”. Therefore a phoneme falls into one viseme class, but a viseme may represent many phonemes: a many-to-one mapping. This mapping introduces ambiguity between phonemes when using viseme classifiers. Not only is this ambiguity damaging to the performance of audio-visual classifiers operating on real expressive speech, there is also considerable choice between possible mappings. In this paper we explore this choice of viseme-to-phoneme map. We show that there is a definite difference in performance between viseme-to-phoneme mappings and explore why some maps appear to work better than others. We also devise a new algorithm for constructing phoneme-to-viseme mappings from labeled speech data. These new visemes, ‘Bear’ visemes, are shown to perform better than previously known units. |
Tasks | |
Published | 2018-05-08 |
URL | http://arxiv.org/abs/1805.02934v1 |
PDF | http://arxiv.org/pdf/1805.02934v1.pdf |
PWC | https://paperswithcode.com/paper/phoneme-to-viseme-mappings-the-good-the-bad |
Repo | |
Framework | |
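A phoneme-to-viseme map is simply a many-to-one lookup, which is exactly where the classifier ambiguity described above comes from. The toy map below uses a textbook-style grouping by place of articulation; it is not the paper's learned 'Bear' visemes.

```python
# Toy many-to-one phoneme-to-viseme map (illustrative grouping only).
p2v = {"p": "V_bilabial", "b": "V_bilabial", "m": "V_bilabial",
       "f": "V_labiodental", "v": "V_labiodental",
       "t": "V_alveolar", "d": "V_alveolar"}

def to_visemes(phonemes):
    """Collapse a phoneme sequence into viseme classes; unseen phonemes fall
    into a catch-all class."""
    return [p2v.get(p, "V_other") for p in phonemes]

# 'b' becomes indistinguishable from 'p' and 'm' once mapped to visemes:
print(to_visemes(["b", "a", "t"]))   # ['V_bilabial', 'V_other', 'V_alveolar']
```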
NeuroTreeNet: A New Method to Explore Horizontal Expansion Network
Title | NeuroTreeNet: A New Method to Explore Horizontal Expansion Network |
Authors | Shenlong Lou, Yan Luo, Qiancong Fan, Feng Chen, Yiping Chen, Cheng Wang, Jonathan Li |
Abstract | It is widely recognized that deeper networks, or networks with more feature maps, have better performance. Existing studies mainly focus on extending network depth and increasing the number of feature maps. At the same time, horizontal expansion networks (e.g., the Inception model) as an alternative way to improve network performance have not been fully investigated. Accordingly, we propose NeuroTreeNet (NTN), a new horizontal extension network combining random forests and the Inception model. Based on the tree structure, in which each branch represents a network and the root node’s features are shared with the child nodes, network parameters are effectively reduced. By combining all features of the leaf nodes, even fewer feature maps achieve better performance. In addition, the relationship between the tree structure and the performance of NTN is investigated in depth. Compared to other networks (e.g., VDSR_5) with parameters of equal magnitude, our model shows preferable performance on the super-resolution reconstruction task. |
Tasks | Super-Resolution |
Published | 2018-11-22 |
URL | http://arxiv.org/abs/1811.09618v1 |
PDF | http://arxiv.org/pdf/1811.09618v1.pdf |
PWC | https://paperswithcode.com/paper/neurotreenet-a-new-method-to-explore |
Repo | |
Framework | |
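The tree structure, a shared root whose features feed several branch networks with the leaf features combined at the end, can be sketched directly. The module below is a hypothetical two-leaf instance; the depths, widths, and concatenation-based fusion are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class TinyTreeNet(nn.Module):
    """Illustrative tree-structured network: a shared root feeds two child
    branches, and leaf features are concatenated before the output head."""
    def __init__(self):
        super().__init__()
        self.root = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
        self.leaf_a = nn.Sequential(nn.Conv2d(16, 16, 3, padding=1), nn.ReLU())
        self.leaf_b = nn.Sequential(nn.Conv2d(16, 16, 3, padding=1), nn.ReLU())
        self.head = nn.Conv2d(32, 3, 3, padding=1)   # fuse leaves, e.g. for SR

    def forward(self, x):
        r = self.root(x)                             # root features shared by children
        return self.head(torch.cat([self.leaf_a(r), self.leaf_b(r)], dim=1))

out = TinyTreeNet()(torch.randn(1, 3, 32, 32))       # (1, 3, 32, 32)
```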
Wide and Deep Learning for Peer-to-Peer Lending
Title | Wide and Deep Learning for Peer-to-Peer Lending |
Authors | Kaveh Bastani, Elham Asgari, Hamed Namavari |
Abstract | This paper proposes a two-stage scoring approach to help lenders decide their fund allocations in the peer-to-peer (P2P) lending market. Existing scoring approaches focus on only one of two goals for identifying the best loans for investment: probability of default (PD) prediction, known as credit scoring, or profitability prediction, known as profit scoring. Credit scoring fails to deliver on lenders’ main need: how much profit they may obtain through their investment. Profit scoring, on the other hand, can satisfy that need by predicting investment profitability. However, profit scoring completely ignores the class imbalance problem, whereby most past loans are non-default, and ignoring this imbalance significantly affects the accuracy of profitability prediction. Our proposed two-stage scoring approach integrates credit scoring and profit scoring to address the above challenges. More specifically, stage 1 is designed as credit scoring to identify non-default loans, with the imbalanced nature of loan status taken into account in PD prediction. The loans identified as non-default are then moved to stage 2 for prediction of profitability, measured by internal rate of return. Wide and deep learning is used to build the predictive models in both stages to achieve both memorization and generalization. Extensive numerical studies based on real-world data verify the effectiveness of the proposed approach and indicate that our two-stage scoring approach outperforms existing credit scoring and profit scoring approaches. |
Tasks | |
Published | 2018-10-05 |
URL | http://arxiv.org/abs/1810.03466v2 |
PDF | http://arxiv.org/pdf/1810.03466v2.pdf |
PWC | https://paperswithcode.com/paper/wide-and-deep-learning-for-peer-to-peer |
Repo | |
Framework | |
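The two-stage flow is easy to prototype with off-the-shelf models standing in for the paper's wide-and-deep networks: stage 1 classifies default vs. non-default with the imbalance handled explicitly, and stage 2 regresses profitability on the predicted non-defaults. All data below is synthetic, and the random forests are stand-ins.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))                            # toy loan features
default = (rng.random(1000) < 0.15).astype(int)           # imbalanced: ~15% defaults
irr = rng.normal(0.08, 0.05, size=1000) * (1 - default)   # toy internal rate of return

# Stage 1: credit scoring with the class imbalance addressed via class weights.
clf = RandomForestClassifier(class_weight="balanced", random_state=0).fit(X, default)
keep = clf.predict(X) == 0                                # predicted non-defaults

# Stage 2: profit scoring (IRR regression) only on predicted non-defaults.
reg = RandomForestRegressor(random_state=0).fit(X[keep], irr[keep])
ranked = np.argsort(-reg.predict(X[keep]))                # most profitable loans first
```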
Learn the new, keep the old: Extending pretrained models with new anatomy and images
Title | Learn the new, keep the old: Extending pretrained models with new anatomy and images |
Authors | Firat Ozdemir, Philipp Fuernstahl, Orcun Goksel |
Abstract | Deep learning has been widely accepted as a promising solution for medical image segmentation, given a sufficiently large, representative dataset of images with corresponding annotations. With ever-increasing amounts of annotated medical data, it is infeasible to always train a learning method from scratch with all data; doing so is also doomed to hit computational limits, e.g., the memory or runtime feasible for training. Incremental learning can be a potential solution, where new information (images or anatomy) is introduced iteratively. Nevertheless, to preserve the collective information, it is essential to keep some “important” (i.e., representative) images and annotations from the past while adding new information. In this paper, we introduce a framework for applying incremental learning to segmentation and propose novel methods for selecting representative data therein. We comparatively evaluate our methods in different scenarios using MR images and validate the increased learning capacity obtained with our methods. |
Tasks | Medical Image Segmentation, Semantic Segmentation |
Published | 2018-06-01 |
URL | http://arxiv.org/abs/1806.00265v1 |
PDF | http://arxiv.org/pdf/1806.00265v1.pdf |
PWC | https://paperswithcode.com/paper/learn-the-new-keep-the-old-extending |
Repo | |
Framework | |
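Representative-data selection is the crux of such rehearsal-style incremental learning. One simple baseline, sketched below, keeps the exemplars closest to the feature mean of the old data; the paper's proposed selection methods are more involved, so treat this purely as a stand-in.

```python
import numpy as np

def select_representatives(features, k):
    """Pick the k exemplars closest to the feature mean: a simple stand-in
    for more sophisticated representative-selection strategies."""
    mean = features.mean(axis=0)
    dist = np.linalg.norm(features - mean, axis=1)
    return np.argsort(dist)[:k]

old_feats = np.random.rand(500, 64)                  # embeddings of past images
keep_idx = select_representatives(old_feats, k=50)   # rehearsal memory indices
# Next increment: train on the new data plus these 50 kept exemplars,
# so the collective information from the past is not forgotten.
```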
Deep Convolutional Neural Networks for Noise Detection in ECGs
Title | Deep Convolutional Neural Networks for Noise Detection in ECGs |
Authors | Jennifer N. John, Conner Galloway, Alexander Valys |
Abstract | Mobile electrocardiogram (ECG) recording technologies represent a promising tool to fight the ongoing epidemic of cardiovascular diseases, which are responsible for more deaths globally than any other cause. While the ability to monitor one’s heart activity at any time and in any place is a crucial advantage of such technologies, it is also the cause of a drawback: signal noise due to environmental factors can render the ECGs illegible. In this work, we develop convolutional neural networks (CNNs) to automatically label ECGs for noise, training them on a novel noise-annotated dataset. By reducing distraction from noisy intervals of signals, such networks have the potential to increase the accuracy of models for the detection of atrial fibrillation, long QT syndrome, and other cardiovascular conditions. Comparing several architectures, we find that a 16-layer CNN adapted from the VGG16 network, which generates one prediction per second on a 10-second input, performs exceptionally well on this task, with an AUC of 0.977. |
Tasks | |
Published | 2018-10-05 |
URL | http://arxiv.org/abs/1810.04122v1 |
PDF | http://arxiv.org/pdf/1810.04122v1.pdf |
PWC | https://paperswithcode.com/paper/deep-convolutional-neural-networks-for-noise |
Repo | |
Framework | |
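The one-prediction-per-second design is the distinctive detail here. The small 1-D CNN below reproduces that output shape by adaptively pooling into ten one-second bins; the sampling rate (250 Hz), depth, and widths are assumptions, as the paper adapts the much larger VGG16.

```python
import torch
import torch.nn as nn

class ECGNoiseNet(nn.Module):
    """Small 1-D CNN emitting one noise score per second of a 10-second strip."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, 7, padding=3), nn.ReLU(), nn.MaxPool1d(4),
            nn.Conv1d(16, 32, 7, padding=3), nn.ReLU(), nn.MaxPool1d(4),
        )
        self.pool = nn.AdaptiveAvgPool1d(10)          # 10 time bins = 10 seconds
        self.head = nn.Conv1d(32, 1, 1)               # per-bin noise logit

    def forward(self, x):                             # x: (B, 1, 2500) at 250 Hz
        return torch.sigmoid(self.head(self.pool(self.features(x)))).squeeze(1)

scores = ECGNoiseNet()(torch.randn(2, 1, 2500))       # (2, 10) per-second noise probs
```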