Paper Group ANR 783
Self-Supervised Learning of Depth and Camera Motion from 360° Videos
Title | Self-Supervised Learning of Depth and Camera Motion from 360° Videos |
Authors | Fu-En Wang, Hou-Ning Hu, Hsien-Tzu Cheng, Juan-Ting Lin, Shang-Ta Yang, Meng-Li Shih, Hung-Kuo Chu, Min Sun |
Abstract | As 360° cameras become prevalent in many autonomous systems (e.g., self-driving cars and drones), efficient 360° perception becomes more and more important. We propose a novel self-supervised learning approach for predicting the omnidirectional depth and camera motion from a 360° video. In particular, starting from SfMLearner, which is designed for cameras with a normal field-of-view, we introduce three key features to process 360° images efficiently. First, we convert each image from equirectangular projection to cubic projection to avoid image distortion. In each network layer, we use Cube Padding (CP), which pads intermediate features from adjacent faces, to avoid image boundaries. Second, we propose a novel “spherical” photometric consistency constraint on the whole viewing sphere. In this way, no pixel will be projected outside the image boundary, which typically happens in images with a normal field-of-view. Finally, rather than naively estimating six independent camera motions (i.e., naively applying SfMLearner to each face of a cube), we propose a novel camera pose consistency loss to ensure the estimated camera motions reach consensus. To train and evaluate our approach, we collect a new PanoSUNCG dataset containing a large number of 360° videos with ground-truth depth and camera motion. Our approach achieves state-of-the-art depth prediction and camera motion estimation on PanoSUNCG with faster inference speed compared to the equirectangular representation. On real-world indoor videos, our approach also achieves qualitatively reasonable depth prediction using a model pre-trained on PanoSUNCG. |
Tasks | Depth And Camera Motion, Depth Estimation, Motion Estimation, Self-Driving Cars |
Published | 2018-11-13 |
URL | http://arxiv.org/abs/1811.05304v1 |
PDF | http://arxiv.org/pdf/1811.05304v1.pdf |
PWC | https://paperswithcode.com/paper/self-supervised-learning-of-depth-and-camera |
Repo | |
Framework | |
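The camera pose consistency loss described in the abstract lends itself to a compact illustration. Below is a minimal PyTorch sketch that penalizes the deviation of six per-face pose predictions from their mean; the paper's exact formulation may differ, and the 6-DoF parameterization, the L1 penalty, and the naive averaging of rotation parameters are all simplifying assumptions made here.

```python
import torch

def pose_consistency_loss(poses: torch.Tensor) -> torch.Tensor:
    """poses: (B, 6, 6) = batch, cube faces, 6-DoF motion (3 translation +
    3 rotation parameters). Pulls each per-face estimate toward the consensus.
    Averaging rotation parameters directly is a simplification; a full
    implementation would compose rotations properly."""
    consensus = poses.mean(dim=1, keepdim=True)   # (B, 1, 6) mean pose
    return (poses - consensus).abs().mean()       # L1 deviation from consensus

poses = torch.randn(2, 6, 6, requires_grad=True)  # six per-face pose predictions
pose_consistency_loss(poses).backward()           # differentiable, hence trainable
```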
Stochastic Normalizations as Bayesian Learning
Title | Stochastic Normalizations as Bayesian Learning |
Authors | Alexander Shekhovtsov, Boris Flach |
Abstract | In this work we investigate the reasons why Batch Normalization (BN) improves the generalization performance of deep networks. We argue that one major reason, distinguishing it from data-independent normalization methods, is the randomness of batch statistics. This randomness appears in the parameters rather than in the activations and admits an interpretation as a form of practical Bayesian learning. We apply this idea to other (deterministic) normalization techniques that are oblivious to the batch size. We show that their generalization performance can be improved significantly by Bayesian learning of the same form. We obtain test performance comparable to BN and, at the same time, better validation losses suitable for subsequent output uncertainty estimation through the approximate Bayesian posterior. |
Tasks | |
Published | 2018-11-01 |
URL | http://arxiv.org/abs/1811.00639v1 |
PDF | http://arxiv.org/pdf/1811.00639v1.pdf |
PWC | https://paperswithcode.com/paper/stochastic-normalizations-as-bayesian |
Repo | |
Framework | |
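As a rough illustration of the core idea, injecting noise into the statistics of an otherwise deterministic, batch-size-oblivious normalizer mimics the randomness of batch statistics. The module below is a hypothetical sketch, not the authors' method: the layer-norm-style statistics, the Gaussian noise model, and the noise scale are all assumptions.

```python
import torch
import torch.nn as nn

class NoisyNorm(nn.Module):
    """Deterministic (layer-norm-style) normalization with Gaussian noise
    injected into its statistics at training time: a toy stand-in for
    treating normalization randomness as approximate Bayesian learning."""
    def __init__(self, num_features: int, noise_std: float = 0.1):
        super().__init__()
        self.gamma = nn.Parameter(torch.ones(num_features))
        self.beta = nn.Parameter(torch.zeros(num_features))
        self.noise_std = noise_std

    def forward(self, x):                          # x: (B, num_features)
        mu = x.mean(dim=1, keepdim=True)
        sigma = x.std(dim=1, keepdim=True) + 1e-5
        if self.training:                          # stochastic only in training
            mu = mu + self.noise_std * torch.randn_like(mu)
            sigma = sigma * (1 + self.noise_std * torch.randn_like(sigma)).abs()
        return self.gamma * (x - mu) / sigma + self.beta

y = NoisyNorm(32)(torch.randn(8, 32))              # (8, 32) noisily normalized
```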
Abstractive and Extractive Text Summarization using Document Context Vector and Recurrent Neural Networks
Title | Abstractive and Extractive Text Summarization using Document Context Vector and Recurrent Neural Networks |
Authors | Chandra Khatri, Gyanit Singh, Nish Parikh |
Abstract | Sequence-to-sequence (Seq2Seq) learning has recently been used for abstractive and extractive summarization. In the current study, Seq2Seq models are used for eBay product description summarization. We propose novel document-context-based Seq2Seq models using RNNs for abstractive and extractive summarization. Intuitively, this is similar to humans reading the title, abstract, or any other contextual information before reading the document, which gives them a high-level idea of what the document is about. Building on this idea, we propose that Seq2Seq models should be fed contextual information at the first time-step of the input to obtain better summaries. In this manner, the output summaries are more document-centric than generic, overcoming one of the major hurdles of using generative models. We generate document context from user behavior and seller-provided information. We train and evaluate our models on human-extracted golden summaries. The document-contextual Seq2Seq models outperform standard Seq2Seq models. Moreover, since generating human-extracted summaries is prohibitively expensive to scale, we propose a semi-supervised technique for extracting approximate summaries and using them to train Seq2Seq models at scale. The semi-supervised models are evaluated against human-extracted summaries and are found to be of similar efficacy. We provide a side-by-side comparison of abstractive and extractive summarizers (contextual and non-contextual) on the same evaluation dataset. Overall, we provide methodologies to use and evaluate the proposed techniques for large-document summarization, and we find these techniques to be highly effective, which is not the case with existing techniques. |
Tasks | Document Summarization, Text Summarization |
Published | 2018-07-20 |
URL | http://arxiv.org/abs/1807.08000v2 |
PDF | http://arxiv.org/pdf/1807.08000v2.pdf |
PWC | https://paperswithcode.com/paper/abstractive-and-extractive-text-summarization |
Repo | |
Framework | |
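The central trick, starting the encoder with a document-context vector at the first time-step, is easy to picture in code. Below is a minimal PyTorch sketch; the GRU encoder, the linear projection of the context, and all dimensions are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class ContextSeq2SeqEncoder(nn.Module):
    """Prepends a projected document-context vector as the first time-step
    of the encoder input, so the summary is conditioned on the context."""
    def __init__(self, vocab_size, emb_dim, hid_dim, ctx_dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.ctx_proj = nn.Linear(ctx_dim, emb_dim)    # map context into token space
        self.rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)

    def forward(self, tokens, context):
        # tokens: (B, T) int64 ids, context: (B, ctx_dim) float vector
        tok_emb = self.embed(tokens)                   # (B, T, emb_dim)
        ctx_emb = self.ctx_proj(context).unsqueeze(1)  # (B, 1, emb_dim)
        return self.rnn(torch.cat([ctx_emb, tok_emb], dim=1))

enc = ContextSeq2SeqEncoder(vocab_size=1000, emb_dim=32, hid_dim=64, ctx_dim=10)
out, h = enc(torch.randint(0, 1000, (2, 5)), torch.randn(2, 10))
```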
Discovering Style Trends through Deep Visually Aware Latent Item Embeddings
Title | Discovering Style Trends through Deep Visually Aware Latent Item Embeddings |
Authors | Murium Iqbal, Adair Kovac, Kamelia Aryafar |
Abstract | In this paper, we explore Latent Dirichlet Allocation (LDA) and Polylingual Latent Dirichlet Allocation (PolyLDA) as a means to discover trending styles on Overstock from deep visual semantic features transferred from a pretrained convolutional neural network, together with text-based item attributes. To use deep visual semantic features in conjunction with LDA, we develop a method for creating a bag-of-words representation of unrolled image vectors. By viewing each channel within the convolutional layers of a ResNet-50 as representing a word, we can index the activations to create visual documents. We then train LDA over these documents to discover the latent style in the images. We also incorporate text-based data with PolyLDA, where each representation is viewed as an independent language attempting to describe the same style. The resulting topics are shown to be excellent indicators of visual style across our platform. |
Tasks | |
Published | 2018-04-23 |
URL | http://arxiv.org/abs/1804.08704v1 |
PDF | http://arxiv.org/pdf/1804.08704v1.pdf |
PWC | https://paperswithcode.com/paper/discovering-style-trends-through-deep |
Repo | |
Framework | |
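One plausible reading of the channels-as-words construction: pool each channel of a convolutional feature map, quantize the pooled activation into a pseudo word count, and train LDA on the resulting "visual documents". The pooling and quantization scheme below are assumptions for illustration; the authors' exact indexing may differ.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

def visual_bag_of_words(feature_map, levels=10):
    """feature_map: (C, H, W) activations from a conv layer. Each channel
    index is treated as a 'word'; its pooled activation becomes a count."""
    pooled = feature_map.mean(axis=(1, 2))             # (C,) per-channel strength
    pooled = pooled / (pooled.max() + 1e-8)
    return np.round(pooled * levels).astype(int)       # pseudo word counts

# Toy corpus of 100 visual documents from ResNet-50-sized feature maps.
docs = np.stack([visual_bag_of_words(np.random.rand(2048, 7, 7))
                 for _ in range(100)])
lda = LatentDirichletAllocation(n_components=8, random_state=0).fit(docs)
styles = lda.transform(docs)                           # per-image style mixtures
```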
Attention-based Graph Neural Network for Semi-supervised Learning
Title | Attention-based Graph Neural Network for Semi-supervised Learning |
Authors | Kiran K. Thekumparampil, Chong Wang, Sewoong Oh, Li-Jia Li |
Abstract | Recently popularized graph neural networks achieve state-of-the-art accuracy on a number of standard benchmark datasets for graph-based semi-supervised learning, improving significantly over existing approaches. These architectures alternate between a propagation layer that aggregates the hidden states of the local neighborhood and a fully-connected layer. Perhaps surprisingly, we show that a linear model that removes all the intermediate fully-connected layers is still able to achieve performance comparable to the state-of-the-art models. This significantly reduces the number of parameters, which is critical for semi-supervised learning, where the number of labeled examples is small. This in turn leaves room for designing more innovative propagation layers. Based on this insight, we propose a novel graph neural network that removes all the intermediate fully-connected layers and replaces the propagation layers with attention mechanisms that respect the structure of the graph. The attention mechanism allows us to learn a dynamic and adaptive local summary of the neighborhood to achieve more accurate predictions. In a number of experiments on benchmark citation network datasets, we demonstrate that our approach outperforms competing methods. By examining the attention weights among neighbors, we show that our model provides some interesting insights into how neighbors influence each other. |
Tasks | Graph Regression |
Published | 2018-03-10 |
URL | http://arxiv.org/abs/1803.03735v1 |
PDF | http://arxiv.org/pdf/1803.03735v1.pdf |
PWC | https://paperswithcode.com/paper/attention-based-graph-neural-network-for-semi |
Repo | |
Framework | |
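A propagation layer built from structure-respecting attention can be sketched in a few lines. The cosine-similarity attention below is one natural instantiation, close in spirit to the paper, but the temperature beta, the masking, and the aggregation details are assumptions of this sketch.

```python
import torch
import torch.nn.functional as F

def attention_propagate(H, adj, beta=1.0):
    """One attention-based propagation step over a graph: neighbors are
    weighted by (scaled) cosine similarity, softmax-normalized per node.
    H: (N, d) node states; adj: (N, N) {0,1} adjacency with self-loops,
    so every row has at least one finite score."""
    Hn = F.normalize(H, dim=1)
    scores = beta * (Hn @ Hn.t())                      # cosine similarities
    scores = scores.masked_fill(adj == 0, float('-inf'))
    attn = torch.softmax(scores, dim=1)                # rows sum to 1 over neighbors
    return attn @ H                                    # adaptive neighborhood summary

H = torch.randn(4, 8)
adj = torch.ones(4, 4)        # toy fully connected graph (self-loops included)
H_next = attention_propagate(H, adj)
```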
Fully Automatic Myocardial Segmentation of Contrast Echocardiography Sequence Using Random Forests Guided by Shape Model
Title | Fully Automatic Myocardial Segmentation of Contrast Echocardiography Sequence Using Random Forests Guided by Shape Model |
Authors | Yuanwei Li, Chin Pang Ho, Matthieu Toulemonde, Navtej Chahal, Roxy Senior, Meng-Xing Tang |
Abstract | Myocardial contrast echocardiography (MCE) is an imaging technique that assesses left ventricle function and myocardial perfusion for the detection of coronary artery diseases. Automatic MCE perfusion quantification is challenging and requires accurate segmentation of the myocardium from noisy and time-varying images. Random forests (RF) have been successfully applied to many medical image segmentation tasks. However, the pixel-wise RF classifier ignores contextual relationships between the label outputs of individual pixels. An RF that only utilizes local appearance features is also susceptible to data suffering from large intensity variations. In this paper, we demonstrate how to overcome the above limitations of the classic RF by presenting a fully automatic pipeline for myocardial segmentation in full-cycle 2D MCE data. Specifically, a statistical shape model is used to provide shape prior information that guides the RF segmentation in two ways. First, a novel shape model (SM) feature is incorporated into the RF framework to generate a more accurate RF probability map. Second, the shape model is fitted to the RF probability map to refine and constrain the final segmentation to plausible myocardial shapes. We further improve performance by introducing a bounding box detection algorithm as a preprocessing step in the segmentation pipeline. Our approach on 2D images is further extended to 2D+t sequences, which ensures temporal consistency in the resulting sequence segmentations. When evaluated on clinical MCE data, our proposed method achieves notable improvement in segmentation accuracy and outperforms other state-of-the-art methods, including the classic RF and its variants, the active shape model, and image registration. |
Tasks | Image Registration, Medical Image Segmentation, Semantic Segmentation |
Published | 2018-06-19 |
URL | http://arxiv.org/abs/1806.07497v1 |
PDF | http://arxiv.org/pdf/1806.07497v1.pdf |
PWC | https://paperswithcode.com/paper/fully-automatic-myocardial-segmentation-of |
Repo | |
Framework | |
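To make the SM-feature idea concrete, the sketch below appends a signed-distance shape prior to per-pixel appearance features before training a random forest, yielding the probability map that the shape model would then refine. The distance-transform feature, the random appearance features, and the toy labels are stand-ins, not the paper's exact pipeline.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt
from sklearn.ensemble import RandomForestClassifier

def shape_model_feature(mean_mask):
    """Signed-distance-like shape prior per pixel (positive inside the mean
    myocardial shape, negative outside): a simple stand-in SM feature."""
    return distance_transform_edt(mean_mask) - distance_transform_edt(1 - mean_mask)

H, W = 64, 64
mean_mask = np.zeros((H, W))
mean_mask[16:48, 16:48] = 1                         # toy mean shape
sm = shape_model_feature(mean_mask).ravel()

appearance = np.random.rand(H * W, 5)               # stand-in local intensity features
X = np.column_stack([appearance, sm])               # append the SM feature per pixel
y = mean_mask.ravel().astype(int)                   # toy labels for illustration
rf = RandomForestClassifier(n_estimators=50).fit(X, y)
prob_map = rf.predict_proba(X)[:, 1].reshape(H, W)  # RF probability map to refine
```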
Automated Scene Flow Data Generation for Training and Verification
Title | Automated Scene Flow Data Generation for Training and Verification |
Authors | Oliver Wasenmüller, René Schuster, Didier Stricker, Karl Leiss, Jürger Pfister, Oleksandra Ganus, Julian Tatsch, Artem Savkin, Nikolas Brasch |
Abstract | Scene flow describes the 3D position as well as the 3D motion of each pixel in an image. Such algorithms are the basis for many state-of-the-art autonomous or automated driving functions. For verification and training, large amounts of ground truth data are required, which are not available for real data. In this paper, we demonstrate a technology to create synthetic data with dense and precise scene flow ground truth. |
Tasks | |
Published | 2018-08-30 |
URL | http://arxiv.org/abs/1808.10232v2 |
PDF | http://arxiv.org/pdf/1808.10232v2.pdf |
PWC | https://paperswithcode.com/paper/automated-scene-flow-data-generation-for |
Repo | |
Framework | |
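In a synthetic renderer where depth and pixel correspondences are known exactly, dense scene flow ground truth reduces to differencing backprojected 3D positions. The sketch below illustrates that computation; the intrinsics and toy depths are invented, and a real pipeline would also account for camera and per-object motion fields.

```python
import numpy as np

def backproject(depth, fx, fy, cx, cy):
    """Pixel grid + depth -> 3D points in camera coordinates, shape (H, W, 3)."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    X = (u - cx) * depth / fx
    Y = (v - cy) * depth / fy
    return np.stack([X, Y, depth], axis=-1)

# With synthetic depth at t and t+1 and trivially known correspondences,
# ground-truth scene flow is the per-pixel difference of 3D positions.
depth_t  = np.full((4, 4), 2.0)
depth_t1 = np.full((4, 4), 2.1)                       # scene moved 0.1 m away
P_t  = backproject(depth_t,  fx=100, fy=100, cx=2, cy=2)
P_t1 = backproject(depth_t1, fx=100, fy=100, cx=2, cy=2)
scene_flow_gt = P_t1 - P_t                            # (H, W, 3) 3D motion per pixel
```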
Functional Object-Oriented Network: Construction & Expansion
Title | Functional Object-Oriented Network: Construction & Expansion |
Authors | David Paulius, Ahmad Babaeian Jelodar, Yu Sun |
Abstract | We build upon the functional object-oriented network (FOON), a structured knowledge representation constructed from observations of human activities and manipulations. A FOON can be used for representing object-motion affordances. Knowledge retrieval through graph search allows us to obtain novel manipulation sequences using knowledge spanning many video sources, hence the novelty of our approach. However, we are limited to the sources collected. To further improve the performance of knowledge retrieval, as a follow-up to our previous work, we discuss generalizing knowledge so that it can be applied to objects similar to those already in FOON, without manually annotating new sources of knowledge. We discuss two means of generalization: 1) expanding the network through the use of object similarity to create new functional units from those we already have, and 2) compressing the functional units by object categories rather than specific objects. We present experiments that compare the performance of our knowledge retrieval algorithm with both expansion and compression by categories. |
Tasks | |
Published | 2018-07-05 |
URL | http://arxiv.org/abs/1807.02189v1 |
PDF | http://arxiv.org/pdf/1807.02189v1.pdf |
PWC | https://paperswithcode.com/paper/functional-object-oriented-network |
Repo | |
Framework | |
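The expansion-by-similarity idea can be shown with a toy graph: new functional units are minted by substituting objects (or tools) with similar alternatives. Everything below, the unit representation and the similarity lists alike, is invented for illustration; FOON's actual functional units are far richer structures.

```python
# Toy FOON-style expansion: each (tool, object) pair maps to a motion.
functional_units = {("knife", "cucumber"): "slice"}
similar = {"cucumber": ["zucchini"], "knife": ["cleaver"]}  # invented similarities

def expand(units, similar):
    """Generalize units by swapping in similar objects and tools."""
    new_units = dict(units)
    for (tool, obj), motion in units.items():
        for alt in similar.get(obj, []):
            new_units[(tool, alt)] = motion     # e.g. (knife, zucchini) -> slice
        for alt in similar.get(tool, []):
            new_units[(alt, obj)] = motion      # e.g. (cleaver, cucumber) -> slice
    return new_units

print(expand(functional_units, similar))
```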
On the Effect of Inter-observer Variability for a Reliable Estimation of Uncertainty of Medical Image Segmentation
Title | On the Effect of Inter-observer Variability for a Reliable Estimation of Uncertainty of Medical Image Segmentation |
Authors | Alain Jungo, Raphael Meier, Ekin Ermis, Marcela Blatti-Moreno, Evelyn Herrmann, Roland Wiest, Mauricio Reyes |
Abstract | Uncertainty estimation methods are expected to improve the understanding and quality of computer-assisted methods used in medical applications (e.g., neurosurgical interventions, radiotherapy planning), where automated medical image segmentation is crucial. In supervised machine learning, a common practice for generating ground-truth label data is to merge observer annotations. However, since many medical image tasks show high inter-observer variability resulting from factors such as image quality, different levels of user expertise, and domain knowledge, little is known about how inter-observer variability and commonly used fusion methods affect the estimation of uncertainty in automated image segmentation. In this paper we analyze the effect of common image label fusion techniques on uncertainty estimation, and propose to learn the uncertainty among observers. The results highlight the negative effect that fusion methods applied in deep learning have on obtaining reliable estimates of segmentation uncertainty. Additionally, we show that the learned observers’ uncertainty can be combined with current standard Monte Carlo dropout Bayesian neural networks to characterize the uncertainty of the model’s parameters. |
Tasks | Medical Image Segmentation, Semantic Segmentation |
Published | 2018-06-07 |
URL | http://arxiv.org/abs/1806.02562v1 |
PDF | http://arxiv.org/pdf/1806.02562v1.pdf |
PWC | https://paperswithcode.com/paper/on-the-effect-of-inter-observer-variability |
Repo | |
Framework | |
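The "current standard" ingredient mentioned at the end of the abstract, Monte Carlo dropout, is worth pinning down, since the paper combines learned observer uncertainty with it. Below is the standard MC-dropout recipe for a segmentation network; the sigmoid output, the toy model, and T=20 samples are assumptions of this sketch.

```python
import torch
import torch.nn as nn

def mc_dropout_uncertainty(model, x, T=20):
    """Monte Carlo dropout: keep dropout active at test time, average T
    stochastic forward passes, and use the per-pixel variance as a
    model-uncertainty map."""
    model.train()                        # keeps nn.Dropout layers stochastic
    with torch.no_grad():
        probs = torch.stack([torch.sigmoid(model(x)) for _ in range(T)])
    return probs.mean(0), probs.var(0)   # segmentation estimate, uncertainty map

net = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.Dropout2d(0.5),
                    nn.Conv2d(8, 1, 3, padding=1))        # toy segmenter
mean_seg, uncert = mc_dropout_uncertainty(net, torch.randn(1, 1, 32, 32))
```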
Multimodal Registration of Retinal Images Using Domain-Specific Landmarks and Vessel Enhancement
Title | Multimodal Registration of Retinal Images Using Domain-Specific Landmarks and Vessel Enhancement |
Authors | Álvaro S. Hervella, José Rouco, Jorge Novo, Marcos Ortega |
Abstract | The analysis of different image modalities is frequently performed in ophthalmology as it provides complementary information for the diagnosis and follow-up of relevant diseases, like hypertension or diabetes. This work presents a hybrid method for the multimodal registration of color fundus retinography and fluorescein angiography. The proposed method combines a feature-based approach, using domain-specific landmarks, with an intensity-based approach that employs a domain-adapted similarity metric. The methodology is tested on a dataset of 59 image pairs containing both healthy and pathological cases. The results show a satisfactory performance of the proposed combined approach in this multimodal scenario, improving the registration accuracy achieved by the feature-based and the intensity-based approaches. |
Tasks | |
Published | 2018-03-02 |
URL | http://arxiv.org/abs/1803.00951v2 |
PDF | http://arxiv.org/pdf/1803.00951v2.pdf |
PWC | https://paperswithcode.com/paper/multimodal-registration-of-retinal-images |
Repo | |
Framework | |
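The feature-based half of such a hybrid pipeline often boils down to fitting a transform to matched landmarks and handing it to an intensity-based refinement. The least-squares affine fit below is a generic sketch under that assumption, not the authors' method; the landmark coordinates are synthetic.

```python
import numpy as np

def fit_affine(src_pts, dst_pts):
    """Least-squares affine from matched landmarks; in a hybrid pipeline this
    would initialize an intensity-based refinement with a similarity metric."""
    n = len(src_pts)
    A = np.hstack([src_pts, np.ones((n, 1))])           # (n, 3) homogeneous coords
    X, *_ = np.linalg.lstsq(A, dst_pts, rcond=None)     # (3, 2) affine parameters
    return X

src = np.array([[10., 10.], [40., 12.], [25., 30.], [8., 28.]])  # retinography
dst = src @ np.array([[1.02, 0.01], [-0.01, 0.99]]) + np.array([3., -2.])
affine = fit_affine(src, dst)
warped = np.hstack([src, np.ones((4, 1))]) @ affine     # mapped to angiography frame
```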
Phoneme-to-viseme mappings: the good, the bad, and the ugly
Title | Phoneme-to-viseme mappings: the good, the bad, and the ugly |
Authors | Helen L Bear, Richard Harvey |
Abstract | Visemes are the visual equivalent of phonemes. Although not precisely defined, a working definition of a viseme is “a set of phonemes which have identical appearance on the lips”. Therefore a phoneme falls into one viseme class, but a viseme may represent many phonemes: a many-to-one mapping. This mapping introduces ambiguity between phonemes when using viseme classifiers. Not only is this ambiguity damaging to the performance of audio-visual classifiers operating on real expressive speech, there is also considerable choice between possible mappings. In this paper we explore this choice of viseme-to-phoneme map. We show that there is a definite difference in performance between viseme-to-phoneme mappings and explore why some maps appear to work better than others. We also devise a new algorithm for constructing phoneme-to-viseme mappings from labeled speech data. These new visemes, ‘Bear’ visemes, are shown to perform better than previously known units. |
Tasks | |
Published | 2018-05-08 |
URL | http://arxiv.org/abs/1805.02934v1 |
PDF | http://arxiv.org/pdf/1805.02934v1.pdf |
PWC | https://paperswithcode.com/paper/phoneme-to-viseme-mappings-the-good-the-bad |
Repo | |
Framework | |
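A phoneme-to-viseme map is simply a many-to-one lookup, which is exactly where the classifier ambiguity described above comes from. The toy map below uses a textbook-style grouping by place of articulation; it is not the paper's learned 'Bear' visemes.

```python
# Toy many-to-one phoneme-to-viseme map (illustrative grouping only).
p2v = {"p": "V_bilabial", "b": "V_bilabial", "m": "V_bilabial",
       "f": "V_labiodental", "v": "V_labiodental",
       "t": "V_alveolar", "d": "V_alveolar"}

def to_visemes(phonemes):
    """Collapse a phoneme sequence into viseme classes; unseen phonemes fall
    into a catch-all class."""
    return [p2v.get(p, "V_other") for p in phonemes]

# 'b' becomes indistinguishable from 'p' and 'm' once mapped to visemes:
print(to_visemes(["b", "a", "t"]))   # ['V_bilabial', 'V_other', 'V_alveolar']
```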
NeuroTreeNet: A New Method to Explore Horizontal Expansion Network
Title | NeuroTreeNet: A New Method to Explore Horizontal Expansion Network |
Authors | Shenlong Lou, Yan Luo, Qiancong Fan, Feng Chen, Yiping Chen, Cheng Wang, Jonathan Li |
Abstract | It is widely recognized that deeper networks, or networks with more feature maps, have better performance. Existing studies mainly focus on extending network depth and increasing the number of feature maps. At the same time, horizontal expansion networks (e.g., the Inception model) as an alternative way to improve network performance have not been fully investigated. Accordingly, we propose NeuroTreeNet (NTN), a new horizontal extension network combining random forests and the Inception model. Based on the tree structure, in which each branch represents a network and the root node’s features are shared with the child nodes, network parameters are effectively reduced. By combining all features of the leaf nodes, even fewer feature maps achieve better performance. In addition, the relationship between the tree structure and the performance of NTN is investigated in depth. Compared to other networks (e.g., VDSR_5) with parameters of equal magnitude, our model shows preferable performance on the super-resolution reconstruction task. |
Tasks | Super-Resolution |
Published | 2018-11-22 |
URL | http://arxiv.org/abs/1811.09618v1 |
PDF | http://arxiv.org/pdf/1811.09618v1.pdf |
PWC | https://paperswithcode.com/paper/neurotreenet-a-new-method-to-explore |
Repo | |
Framework | |
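The tree structure, a shared root whose features feed several branch networks with the leaf features combined at the end, can be sketched directly. The module below is a hypothetical two-leaf instance; the depths, widths, and concatenation-based fusion are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class TinyTreeNet(nn.Module):
    """Illustrative tree-structured network: a shared root feeds two child
    branches, and leaf features are concatenated before the output head."""
    def __init__(self):
        super().__init__()
        self.root = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
        self.leaf_a = nn.Sequential(nn.Conv2d(16, 16, 3, padding=1), nn.ReLU())
        self.leaf_b = nn.Sequential(nn.Conv2d(16, 16, 3, padding=1), nn.ReLU())
        self.head = nn.Conv2d(32, 3, 3, padding=1)   # fuse leaves, e.g. for SR

    def forward(self, x):
        r = self.root(x)                             # root features shared by children
        return self.head(torch.cat([self.leaf_a(r), self.leaf_b(r)], dim=1))

out = TinyTreeNet()(torch.randn(1, 3, 32, 32))       # (1, 3, 32, 32)
```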
Wide and Deep Learning for Peer-to-Peer Lending
Title | Wide and Deep Learning for Peer-to-Peer Lending |
Authors | Kaveh Bastani, Elham Asgari, Hamed Namavari |
Abstract | This paper proposes a two-stage scoring approach to help lenders decide their fund allocations in the peer-to-peer (P2P) lending market. Existing scoring approaches focus on only one of two goals for identifying the best loans for investment: probability of default (PD) prediction, known as credit scoring, or profitability prediction, known as profit scoring. Credit scoring fails to deliver on lenders’ main need: how much profit they may obtain through their investment. Profit scoring, on the other hand, can satisfy that need by predicting investment profitability. However, profit scoring completely ignores the class imbalance problem, whereby most past loans are non-default, and ignoring this imbalance significantly affects the accuracy of profitability prediction. Our proposed two-stage scoring approach integrates credit scoring and profit scoring to address the above challenges. More specifically, stage 1 is designed as credit scoring to identify non-default loans, with the imbalanced nature of loan status taken into account in PD prediction. The loans identified as non-default are then moved to stage 2 for prediction of profitability, measured by internal rate of return. Wide and deep learning is used to build the predictive models in both stages to achieve both memorization and generalization. Extensive numerical studies based on real-world data verify the effectiveness of the proposed approach and indicate that our two-stage scoring approach outperforms existing credit scoring and profit scoring approaches. |
Tasks | |
Published | 2018-10-05 |
URL | http://arxiv.org/abs/1810.03466v2 |
PDF | http://arxiv.org/pdf/1810.03466v2.pdf |
PWC | https://paperswithcode.com/paper/wide-and-deep-learning-for-peer-to-peer |
Repo | |
Framework | |
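The two-stage flow is easy to prototype with off-the-shelf models standing in for the paper's wide-and-deep networks: stage 1 classifies default vs. non-default with the imbalance handled explicitly, and stage 2 regresses profitability on the predicted non-defaults. All data below is synthetic, and the random forests are stand-ins.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))                            # toy loan features
default = (rng.random(1000) < 0.15).astype(int)           # imbalanced: ~15% defaults
irr = rng.normal(0.08, 0.05, size=1000) * (1 - default)   # toy internal rate of return

# Stage 1: credit scoring with the class imbalance addressed via class weights.
clf = RandomForestClassifier(class_weight="balanced", random_state=0).fit(X, default)
keep = clf.predict(X) == 0                                # predicted non-defaults

# Stage 2: profit scoring (IRR regression) only on predicted non-defaults.
reg = RandomForestRegressor(random_state=0).fit(X[keep], irr[keep])
ranked = np.argsort(-reg.predict(X[keep]))                # most profitable loans first
```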
Learn the new, keep the old: Extending pretrained models with new anatomy and images
Title | Learn the new, keep the old: Extending pretrained models with new anatomy and images |
Authors | Firat Ozdemir, Philipp Fuernstahl, Orcun Goksel |
Abstract | Deep learning has been widely accepted as a promising solution for medical image segmentation, given a sufficiently large, representative dataset of images with corresponding annotations. With ever-increasing amounts of annotated medical data, it is infeasible to always train a learning method from scratch with all data; doing so is also doomed to hit computational limits, e.g., the memory or runtime feasible for training. Incremental learning can be a potential solution, where new information (images or anatomy) is introduced iteratively. Nevertheless, to preserve the collective information, it is essential to keep some “important” (i.e., representative) images and annotations from the past while adding new information. In this paper, we introduce a framework for applying incremental learning to segmentation and propose novel methods for selecting representative data therein. We comparatively evaluate our methods in different scenarios using MR images and validate the increased learning capacity obtained with our methods. |
Tasks | Medical Image Segmentation, Semantic Segmentation |
Published | 2018-06-01 |
URL | http://arxiv.org/abs/1806.00265v1 |
PDF | http://arxiv.org/pdf/1806.00265v1.pdf |
PWC | https://paperswithcode.com/paper/learn-the-new-keep-the-old-extending |
Repo | |
Framework | |
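Representative-data selection is the crux of such rehearsal-style incremental learning. One simple baseline, sketched below, keeps the exemplars closest to the feature mean of the old data; the paper's proposed selection methods are more involved, so treat this purely as a stand-in.

```python
import numpy as np

def select_representatives(features, k):
    """Pick the k exemplars closest to the feature mean: a simple stand-in
    for more sophisticated representative-selection strategies."""
    mean = features.mean(axis=0)
    dist = np.linalg.norm(features - mean, axis=1)
    return np.argsort(dist)[:k]

old_feats = np.random.rand(500, 64)                  # embeddings of past images
keep_idx = select_representatives(old_feats, k=50)   # rehearsal memory indices
# Next increment: train on the new data plus these 50 kept exemplars,
# so the collective information from the past is not forgotten.
```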
Deep Convolutional Neural Networks for Noise Detection in ECGs
Title | Deep Convolutional Neural Networks for Noise Detection in ECGs |
Authors | Jennifer N. John, Conner Galloway, Alexander Valys |
Abstract | Mobile electrocardiogram (ECG) recording technologies represent a promising tool to fight the ongoing epidemic of cardiovascular diseases, which are responsible for more deaths globally than any other cause. While the ability to monitor one’s heart activity at any time and in any place is a crucial advantage of such technologies, it is also the cause of a drawback: signal noise due to environmental factors can render the ECGs illegible. In this work, we develop convolutional neural networks (CNNs) to automatically label ECGs for noise, training them on a novel noise-annotated dataset. By reducing distraction from noisy intervals of signals, such networks have the potential to increase the accuracy of models for the detection of atrial fibrillation, long QT syndrome, and other cardiovascular conditions. Comparing several architectures, we find that a 16-layer CNN adapted from the VGG16 network, which generates one prediction per second on a 10-second input, performs exceptionally well on this task, with an AUC of 0.977. |
Tasks | |
Published | 2018-10-05 |
URL | http://arxiv.org/abs/1810.04122v1 |
PDF | http://arxiv.org/pdf/1810.04122v1.pdf |
PWC | https://paperswithcode.com/paper/deep-convolutional-neural-networks-for-noise |
Repo | |
Framework | |
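The one-prediction-per-second design is the distinctive detail here. The small 1-D CNN below reproduces that output shape by adaptively pooling into ten one-second bins; the sampling rate (250 Hz), depth, and widths are assumptions, as the paper adapts the much larger VGG16.

```python
import torch
import torch.nn as nn

class ECGNoiseNet(nn.Module):
    """Small 1-D CNN emitting one noise score per second of a 10-second strip."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, 7, padding=3), nn.ReLU(), nn.MaxPool1d(4),
            nn.Conv1d(16, 32, 7, padding=3), nn.ReLU(), nn.MaxPool1d(4),
        )
        self.pool = nn.AdaptiveAvgPool1d(10)          # 10 time bins = 10 seconds
        self.head = nn.Conv1d(32, 1, 1)               # per-bin noise logit

    def forward(self, x):                             # x: (B, 1, 2500) at 250 Hz
        return torch.sigmoid(self.head(self.pool(self.features(x)))).squeeze(1)

scores = ECGNoiseNet()(torch.randn(2, 1, 2500))       # (2, 10) per-second noise probs
```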