July 27, 2019

Paper Group ANR 711

Deep Learning and Conditional Random Fields-based Depth Estimation and Topographical Reconstruction from Conventional Endoscopy


Title	Deep Learning and Conditional Random Fields-based Depth Estimation and Topographical Reconstruction from Conventional Endoscopy
Authors	Faisal Mahmood, Nicholas J. Durr
Abstract	Colorectal cancer is the fourth leading cause of cancer deaths worldwide and the second leading cause in the United States. The risk of colorectal cancer can be mitigated by the identification and removal of premalignant lesions through optical colonoscopy. Unfortunately, conventional colonoscopy misses more than 20% of the polyps that should be removed, due in part to poor contrast of lesion topography. Imaging tissue topography during a colonoscopy is difficult because of the size constraints of the endoscope and the deforming mucosa. Most existing methods make geometric assumptions or incorporate a priori information, which limits accuracy and sensitivity. In this paper, we present a method that avoids these restrictions, using a joint deep convolutional neural network-conditional random field (CNN-CRF) framework. Estimated depth is used to reconstruct the topography of the surface of the colon from a single image. We train the unary and pairwise potential functions of a CRF in a CNN on synthetic data, generated by developing an endoscope camera model and rendering over 100,000 images of an anatomically realistic colon. We validate our approach with real endoscopy images from a porcine colon, transferred to a synthetic-like domain, with ground truth from registered computed tomography measurements. The CNN-CRF approach estimates depths with a relative error of 0.152 for synthetic endoscopy images and 0.242 for real endoscopy images. We show that the estimated depth maps can be used for reconstructing the topography of the mucosa from conventional colonoscopy images. This approach can easily be integrated into existing endoscopy systems and provides a foundation for improving computer-aided algorithms for the detection, segmentation and classification of lesions.
Tasks	Depth Estimation
Published	2017-10-30
URL	http://arxiv.org/abs/1710.11216v3
PDF	http://arxiv.org/pdf/1710.11216v3.pdf
PWC	https://paperswithcode.com/paper/deep-learning-and-conditional-random-fields
Repo
Framework
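
The abstract reports a "relative error" of 0.152 (synthetic) and 0.242 (real) but does not define it here. A minimal sketch, assuming the common mean absolute relative error over pixels that have ground-truth depth; the function name mean_relative_error and the toy depth values are hypothetical, and the paper's exact masking or averaging may differ.

```python
import numpy as np

def mean_relative_error(pred, gt, eps=1e-8):
    """Mean absolute relative depth error over pixels that have ground truth.

    Assumed definition: mean(|d_pred - d_gt| / d_gt) over pixels with d_gt > eps.
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    valid = gt > eps  # skip pixels without a registered ground-truth depth
    return float(np.mean(np.abs(pred[valid] - gt[valid]) / gt[valid]))

# Toy 2x2 depth maps (arbitrary units); 0.0 marks a pixel with no ground truth.
gt = np.array([[10.0, 20.0], [40.0, 0.0]])
pred = np.array([[11.0, 18.0], [40.0, 5.0]])
print(mean_relative_error(pred, gt))  # (0.1 + 0.1 + 0.0) / 3
```

Because the error is normalised by the true depth, the same absolute mistake counts more on nearby mucosa than on distant tissue.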

Bayesian Paragraph Vectors


Title	Bayesian Paragraph Vectors
Authors	Geng Ji, Robert Bamler, Erik B. Sudderth, Stephan Mandt
Abstract	Word2vec (Mikolov et al., 2013) has proven to be successful in natural language processing by capturing the semantic relationships between different words. Built on top of single-word embeddings, paragraph vectors (Le and Mikolov, 2014) find fixed-length representations for pieces of text with arbitrary lengths, such as documents, paragraphs, and sentences. In this work, we propose a novel interpretation for neural-network-based paragraph vectors by developing an unsupervised generative model whose maximum likelihood solution corresponds to traditional paragraph vectors. This probabilistic formulation allows us to go beyond point estimates of parameters and to perform Bayesian posterior inference. We find that the entropy of paragraph vectors decreases with the length of documents, and that information about posterior uncertainty improves performance in supervised learning tasks such as sentiment analysis and paraphrase detection.
Tasks	Sentiment Analysis, Word Embeddings
Published	2017-11-10
URL	http://arxiv.org/abs/1711.03946v2
PDF	http://arxiv.org/pdf/1711.03946v2.pdf
PWC	https://paperswithcode.com/paper/bayesian-paragraph-vectors
Repo
Framework

Learning Fixation Point Strategy for Object Detection and Classification


Title	Learning Fixation Point Strategy for Object Detection and Classification
Authors	Jie Lyu, Zejian Yuan, Dapeng Chen
Abstract	We propose a novel recurrent attentional structure to localize and recognize objects jointly. Instead of using sliding windows or convolutions over the entire image, the network learns to extract a sequence of local observations with detailed appearance and rough context. These observations are then fused to complete the detection and classification tasks. For training, we present a hybrid loss function that learns the parameters of the multi-task network end-to-end. In particular, the combination of a stochastic and an object-awareness strategy, named SA, selects richer context and ensures that the last fixation is close to the object. In addition, we build a real-world dataset to verify the ability of our method to detect objects of interest, including small ones. Our method predicts a precise bounding box on an image and achieves high speed on large images without pooling operations. Experimental results indicate that the proposed method can mine effective context from several local observations. Moreover, precision and speed can easily be improved by changing the number of recurrent steps. Finally, we will release the source code of our proposed approach.
Tasks	Object Detection
Published	2017-12-19
URL	http://arxiv.org/abs/1712.06897v1
PDF	http://arxiv.org/pdf/1712.06897v1.pdf
PWC	https://paperswithcode.com/paper/learning-fixation-point-strategy-for-object
Repo
Framework

Robust Real-Time Multi-View Eye Tracking


Title	Robust Real-Time Multi-View Eye Tracking
Authors	Nuri Murat Arar, Jean-Philippe Thiran
Abstract	Despite significant advances in gaze tracking accuracy under controlled conditions, tracking robustness under real-world conditions, such as large head poses and movements, eyeglasses, and variations in illumination and eye type, remains a major challenge in eye tracking. In this paper, we revisit this challenge and introduce a real-time multi-camera eye tracking framework to improve tracking robustness. First, unlike previous work, we design a multi-view tracking setup that acquires multiple eye appearances simultaneously. Leveraging multi-view appearances makes it possible to detect gaze features more reliably under challenging conditions, particularly when they are obstructed in a conventional single-view appearance due to large head movements or eyewear effects. The features extracted from the various appearances are then used to estimate multiple gaze outputs. Second, we propose to combine the estimated gaze outputs through an adaptive fusion mechanism to compute the user’s overall point of regard. The proposed mechanism first determines the estimation reliability of each gaze output according to the user’s momentary head pose and predicted gazing behavior, and then performs a reliability-based weighted fusion. We demonstrate the efficacy of our framework with extensive simulations and user experiments on a collected dataset featuring 20 subjects. Our results show that, in comparison with state-of-the-art eye trackers, the proposed framework provides not only a significant enhancement in accuracy but also notable robustness. Our prototype system runs at 30 frames per second (fps) and achieves 1 degree accuracy under challenging experimental scenarios, which makes it suitable for applications demanding high accuracy and robustness.
Tasks	Eye Tracking
Published	2017-11-15
URL	http://arxiv.org/abs/1711.05444v2
PDF	http://arxiv.org/pdf/1711.05444v2.pdf
PWC	https://paperswithcode.com/paper/robust-real-time-multi-view-eye-tracking
Repo
Framework
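
The final step of the described mechanism is a reliability-based weighted fusion of per-view gaze estimates. A minimal sketch, assuming the reliabilities are already computed (the paper derives them from head pose and predicted gazing behavior, which is not modelled here) and that fusion is a normalised weighted mean of 2-D screen points; fuse_gaze and the sample numbers are hypothetical.

```python
import numpy as np

def fuse_gaze(points, reliabilities):
    """Reliability-weighted mean of per-view point-of-regard estimates.

    points: (n_views, 2) screen coordinates; reliabilities: (n_views,) non-negative scores.
    """
    points = np.asarray(points, dtype=float)
    w = np.clip(np.asarray(reliabilities, dtype=float), 0.0, None)
    if w.sum() == 0:
        raise ValueError("at least one view must have positive reliability")
    w = w / w.sum()      # normalise so the weights sum to 1
    return w @ points    # weighted mean, shape (2,)

estimates = [[100.0, 200.0], [110.0, 190.0], [300.0, 400.0]]
reliab = [0.6, 0.4, 0.0]  # third view judged unreliable, e.g. glare on eyeglasses
print(fuse_gaze(estimates, reliab))  # [104. 196.]
```

A zero weight removes an obstructed view entirely, which is how a multi-view setup can tolerate a view that fails under large head movements.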

From Multimodal to Unimodal Webpages for Developing Countries


Title	From Multimodal to Unimodal Webpages for Developing Countries
Authors	Vidyapu Sandeep, V Vijaya Saradhi, Samit Bhattacharya
Abstract	Multimodal web elements such as text and images carry inherent memory costs to store and transfer over the Internet. With limited network connectivity in developing countries, webpage rendering is delayed by high-memory elements such as images (relative to text). To overcome this limitation, we propose a Canonical Correlation Analysis (CCA) based computational approach to replace a high-cost modality with an equivalent low-cost modality. Our model learns a common subspace for low-cost and high-cost modalities that maximizes the correlation between their visual features. The obtained common subspace is used to determine the low-cost (text) element that replaces a given high-cost (image) element. We analyze the cost-saving performance of the proposed approach through an eye-tracking experiment conducted on real-world webpages. Our approach reduces the memory cost by at least 83.35% by replacing images with text.
Tasks	Eye Tracking
Published	2017-11-06
URL	http://arxiv.org/abs/1711.02068v1
PDF	http://arxiv.org/pdf/1711.02068v1.pdf
PWC	https://paperswithcode.com/paper/from-multimodal-to-unimodal-webpages-for
Repo
Framework
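
The common subspace described above comes from CCA. A minimal NumPy sketch of the first canonical pair, using the standard whiten-then-SVD formulation with a small ridge term; first_cca_pair, the ridge value and the synthetic "image"/"text" features are assumptions for illustration, not the authors' features or code.

```python
import numpy as np

def first_cca_pair(X, Y, reg=1e-6):
    """First canonical directions (a, b) and correlation rho for views X (n, p), Y (n, q)."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])  # ridge keeps Cholesky stable
    Cyy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / (n - 1)
    # Whiten each view: with Cxx = Lx Lx^T, Lx^{-1} X^T has identity covariance.
    Lx_inv = np.linalg.inv(np.linalg.cholesky(Cxx))
    Ly_inv = np.linalg.inv(np.linalg.cholesky(Cyy))
    U, s, Vt = np.linalg.svd(Lx_inv @ Cxy @ Ly_inv.T)
    a = Lx_inv.T @ U[:, 0]  # map the top singular vectors back to feature space
    b = Ly_inv.T @ Vt[0]
    return a, b, s[0]

rng = np.random.default_rng(0)
z = rng.normal(size=(500, 1))                     # latent signal shared by both views
X = np.hstack([z, rng.normal(size=(500, 2))])     # hypothetical "image" features
Y = np.hstack([rng.normal(size=(500, 1)), 2 * z + 0.1 * rng.normal(size=(500, 1))])
a, b, rho = first_cca_pair(X, Y)
print(round(float(rho), 3))
```

To pick a replacement, one would project an image element with `a` and candidate text elements with `b` and choose the nearest text projection; that matching rule is our assumption.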

Multiple Instance Curriculum Learning for Weakly Supervised Object Detection


Title	Multiple Instance Curriculum Learning for Weakly Supervised Object Detection
Authors	Siyang Li, Xiangxin Zhu, Qin Huang, Hao Xu, C. -C. Jay Kuo
Abstract	When supervising an object detector with weakly labeled data, most existing approaches are prone to getting trapped in discriminative object parts, e.g., finding the face of a cat instead of its full body, because they lack supervision on the full extent of objects. To address this challenge, we incorporate object segmentation into detector training, which guides the model to correctly localize full objects. We propose the multiple instance curriculum learning (MICL) method, which injects curriculum learning (CL) into the multiple instance learning (MIL) framework. The MICL method starts by automatically picking easy training examples, where the extent of the segmentation masks agrees with the detection bounding boxes. The training set is gradually expanded to include harder examples, training strong detectors that handle complex images. The proposed MICL method with segmentation in the loop outperforms state-of-the-art weakly supervised object detectors by a substantial margin on the PASCAL VOC datasets.
Tasks	Multiple Instance Learning, Object Detection, Semantic Segmentation, Weakly Supervised Object Detection
Published	2017-11-25
URL	http://arxiv.org/abs/1711.09191v1
PDF	http://arxiv.org/pdf/1711.09191v1.pdf
PWC	https://paperswithcode.com/paper/multiple-instance-curriculum-learning-for
Repo
Framework
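
MICL starts from examples where the segmentation mask's extent agrees with the detection box and then admits harder ones. A minimal sketch of such a schedule, assuming agreement is measured as IoU between the mask's tight bounding box and the detector box and that the threshold is lowered in fixed stages; mask_to_box, box_iou, curriculum_stages and the thresholds are hypothetical, not the paper's exact criterion.

```python
import numpy as np

def mask_to_box(mask):
    """Tight inclusive box (x0, y0, x1, y1) around a non-empty binary mask."""
    ys, xs = np.nonzero(mask)
    return xs.min(), ys.min(), xs.max(), ys.max()

def box_iou(a, b):
    """IoU of two inclusive integer-pixel boxes (x0, y0, x1, y1)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0 + 1) * max(0, iy1 - iy0 + 1)

    def area(r):
        return (r[2] - r[0] + 1) * (r[3] - r[1] + 1)

    return inter / (area(a) + area(b) - inter)

def curriculum_stages(masks, det_boxes, thresholds=(0.8, 0.6, 0.4)):
    """Yield (threshold, indices of 'easy' images) as the threshold is relaxed."""
    agreement = [box_iou(mask_to_box(m), b) for m, b in zip(masks, det_boxes)]
    for t in thresholds:
        yield t, [i for i, s in enumerate(agreement) if s >= t]

mask = np.zeros((10, 10), dtype=bool)
mask[2:6, 2:6] = True                           # segmented object spans rows/cols 2..5
boxes = [(2, 2, 5, 5), (2, 2, 7, 7), (0, 0, 9, 9)]  # IoU 1.0, 16/36, 16/100
for t, easy in curriculum_stages([mask] * 3, boxes):
    print(t, easy)
```

Each stage's training set contains the previous one, so the curriculum only grows.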

Clustering of Data with Missing Entries using Non-convex Fusion Penalties


Title	Clustering of Data with Missing Entries using Non-convex Fusion Penalties
Authors	Sunrita Poddar, Mathews Jacob
Abstract	The presence of missing entries in data often creates challenges for pattern recognition algorithms. Traditional algorithms for clustering data assume that all the feature values are known for every data point. We propose a method to cluster data in the presence of missing information. Unlike conventional clustering techniques where every feature is known for each point, our algorithm can handle cases where a few feature values are unknown for every point. For this more challenging problem, we provide theoretical guarantees for clustering using an optimization problem based on an $\ell_0$ fusion penalty. Furthermore, we propose an algorithm to solve a relaxation of this problem using saturating non-convex fusion penalties. We observe that this algorithm produces solutions that degrade gradually as the fraction of missing feature values increases. We demonstrate the utility of the proposed method using a simulated dataset, the Wine dataset and an under-sampled cardiac MRI dataset. The results show that the proposed method is a promising clustering technique for datasets with large fractions of missing entries.
Tasks
Published	2017-09-06
URL	http://arxiv.org/abs/1709.01870v1
PDF	http://arxiv.org/pdf/1709.01870v1.pdf
PWC	https://paperswithcode.com/paper/clustering-of-data-with-missing-entries-using
Repo
Framework
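
Fusion-penalty clustering gives every point its own centre and penalises pairwise centre differences, so points whose centres coincide form a cluster. A minimal sketch of such an objective, assuming a squared-error fit on observed entries only and a Gaussian-shaped saturating penalty standing in for the $\ell_0$ term; both choices, and the names saturating_penalty and fusion_objective, are hypothetical rather than the paper's exact formulation.

```python
import numpy as np

def saturating_penalty(t, sigma=1.0):
    """Hypothetical saturating penalty: ~t**2 / (2 sigma**2) near 0, tends to 1 for
    large t, so it roughly counts 'fused vs. not fused' pairs like l0 does."""
    return 1.0 - np.exp(-(t ** 2) / (2 * sigma ** 2))

def fusion_objective(X, M, U, lam=1.0, sigma=1.0):
    """Masked data fit plus pairwise fusion penalty on per-point centres U (n, d).

    X: (n, d) data (missing entries may hold NaN); M: (n, d) boolean mask of observed entries.
    """
    residual = np.where(M, X - U, 0.0)           # only observed entries contribute
    data_term = np.sum(residual ** 2)
    dists = np.linalg.norm(U[:, None, :] - U[None, :, :], axis=-1)
    iu = np.triu_indices(U.shape[0], k=1)        # each unordered pair i < j once
    return data_term + lam * np.sum(saturating_penalty(dists[iu], sigma))

X = np.array([[1.0, 2.0], [1.0, np.nan], [5.0, 5.0]])  # point 1 lacks feature 2
M = ~np.isnan(X)
U = np.array([[1.0, 2.0], [1.0, 2.0], [5.0, 5.0]])     # points 0 and 1 share a centre
print(fusion_objective(X, M, U))
```

Here point 1's missing feature is effectively imputed as 2.0 by the centre it shares with point 0, and the two far-apart pairs each contribute almost exactly 1, reflecting the saturation.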

Semi-parametric Network Structure Discovery Models


Title	Semi-parametric Network Structure Discovery Models
Authors	Amir Dezfouli, Edwin V. Bonilla, Richard Nock
Abstract	We propose a network structure discovery model for continuous observations that generalizes linear causal models by incorporating a Gaussian process (GP) prior on a network-independent component, and random sparsity and weight matrices as the network-dependent parameters. This approach provides flexible modeling of network-independent trends in the observations as well as uncertainty quantification around the discovered network structure. We establish a connection between our model and multi-task GPs and develop an efficient stochastic variational inference algorithm for it. Furthermore, we formally show that our approach is numerically stable and in fact numerically easy to carry out almost everywhere on the support of the random variables involved. Finally, we evaluate our model on three applications, showing that it outperforms previous approaches. We provide a qualitative and quantitative analysis of the structures discovered for domains such as the study of the full genome regulation of the yeast Saccharomyces cerevisiae.
Tasks
Published	2017-02-27
URL	http://arxiv.org/abs/1702.08530v1
PDF	http://arxiv.org/pdf/1702.08530v1.pdf
PWC	https://paperswithcode.com/paper/semi-parametric-network-structure-discovery
Repo
Framework

Segmentation of optic disc, fovea and retinal vasculature using a single convolutional neural network


Title	Segmentation of optic disc, fovea and retinal vasculature using a single convolutional neural network
Authors	Jen Hong Tan, U. Rajendra Acharya, Sulatha V. Bhandary, Kuang Chua Chua, Sobha Sivaprasad
Abstract	We have developed and trained a convolutional neural network to automatically and simultaneously segment the optic disc, fovea and blood vessels. Fundus images were normalised before segmentation to enforce consistency in background lighting and contrast. For every effective point in the fundus image, our algorithm extracted three channels of input from the neighbourhood of the point and forwarded the response through the 7-layer network. On average, our segmentation achieved an accuracy of 92.68 percent on the test set of the DRIVE database.
Tasks
Published	2017-02-02
URL	http://arxiv.org/abs/1702.00509v1
PDF	http://arxiv.org/pdf/1702.00509v1.pdf
PWC	https://paperswithcode.com/paper/segmentation-of-optic-disc-fovea-and-retinal
Repo
Framework

Remote Sensing Image Classification with Large Scale Gaussian Processes


Title	Remote Sensing Image Classification with Large Scale Gaussian Processes
Authors	Pablo Morales-Alvarez, Adrian Perez-Suay, Rafael Molina, Gustau Camps-Valls
Abstract	Current remote sensing image classification problems have to deal with an unprecedented amount of heterogeneous and complex data sources. Upcoming missions will soon provide large data streams that will make land cover/use classification difficult. Machine learning classifiers can help with this, and many methods are currently available. A popular kernel classifier is the Gaussian process classifier (GPC), since it approaches the classification problem with a solid probabilistic treatment, thus yielding confidence intervals for the predictions as well as results very competitive with state-of-the-art neural networks and support vector machines. However, its computational cost is prohibitive for large-scale applications and constitutes the main obstacle to its wide adoption. This paper tackles this problem by introducing two novel efficient methodologies for Gaussian process (GP) classification. We first include the standard random Fourier features approximation into GPC, which greatly decreases its computational cost and permits large-scale remote sensing image classification. In addition, we propose a model that avoids randomly sampling a number of Fourier frequencies and instead learns the optimal ones within a variational Bayes approach. The performance of the proposed methods is illustrated on complex problems of cloud detection from multispectral imagery and infrared sounding data. Excellent empirical results support the proposal in both computational cost and accuracy.
Tasks	Cloud Detection, Gaussian Processes, Image Classification, Remote Sensing Image Classification
Published	2017-10-02
URL	http://arxiv.org/abs/1710.00575v2
PDF	http://arxiv.org/pdf/1710.00575v2.pdf
PWC	https://paperswithcode.com/paper/remote-sensing-image-classification-with
Repo
Framework
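
The first method plugs the standard random Fourier features approximation into GPC. A minimal sketch of that feature map for the RBF kernel (Rahimi and Recht's construction), showing only that inner products of the features approximate the exact kernel; the GP classifier itself, and the learned-frequency variant, are not shown. Function names and parameter values are hypothetical.

```python
import numpy as np

def random_fourier_features(X, n_features=2000, lengthscale=1.0, seed=0):
    """Map X (n, d) to Z (n, D) with Z @ Z.T ~= exp(-||x - y||^2 / (2 * lengthscale^2))."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=1.0 / lengthscale, size=(X.shape[1], n_features))  # spectral samples
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)                      # random phases
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

def rbf_kernel(X, lengthscale=1.0):
    """Exact RBF Gram matrix, for comparison."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * lengthscale ** 2))

X = np.random.default_rng(1).normal(size=(5, 3))
Z = random_fourier_features(X, n_features=20000)
print(np.max(np.abs(Z @ Z.T - rbf_kernel(X))))  # small Monte Carlo approximation error
```

With D features, a model linear in Z costs roughly O(n D^2) instead of the O(n^3) of an exact GP, which is where the savings for large images come from.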

Plan Explanations as Model Reconciliation: Moving Beyond Explanation as Soliloquy


Title	Plan Explanations as Model Reconciliation: Moving Beyond Explanation as Soliloquy
Authors	Tathagata Chakraborti, Sarath Sreedharan, Yu Zhang, Subbarao Kambhampati
Abstract	When AI systems interact with humans in the loop, they are often called on to provide explanations for their plans and behavior. Past work on plan explanations primarily involved the AI system explaining the correctness of its plan and the rationale for its decision in terms of its own model. Such soliloquy is wholly inadequate in most realistic scenarios where the humans have domain and task models that differ significantly from that used by the AI system. We posit that explanations are best studied in light of these differing models. In particular, we show how explanation can be seen as a “model reconciliation problem” (MRP), where the AI system in effect suggests changes to the human’s model so that its plan becomes optimal with respect to the changed human model. We study the properties of such explanations, present algorithms for automatically computing them, and evaluate the performance of the algorithms.
Tasks
Published	2017-01-28
URL	http://arxiv.org/abs/1701.08317v5
PDF	http://arxiv.org/pdf/1701.08317v5.pdf
PWC	https://paperswithcode.com/paper/plan-explanations-as-model-reconciliation
Repo
Framework

Better than Real: Complex-valued Neural Nets for MRI Fingerprinting


Title	Better than Real: Complex-valued Neural Nets for MRI Fingerprinting
Authors	Patrick Virtue, Stella X. Yu, Michael Lustig
Abstract	The task of MRI fingerprinting is to identify tissue parameters from complex-valued MRI signals. The prevalent approach is dictionary based, where a test MRI signal is compared to stored MRI signals with known tissue parameters and the most similar signals and tissue parameters retrieved. Such an approach does not scale with the number of parameters and is rather slow when the tissue parameter space is large. Our first novel contribution is to use deep learning as an efficient nonlinear inverse mapping approach. We generate synthetic (tissue, MRI) data from an MRI simulator, and use them to train a deep net to map the MRI signal to the tissue parameters directly. Our second novel contribution is to develop a complex-valued neural network with new cardioid activation functions. Our results demonstrate that complex-valued neural nets could be much more accurate than real-valued neural nets at complex-valued MRI fingerprinting.
Tasks
Published	2017-07-01
URL	http://arxiv.org/abs/1707.00070v1
PDF	http://arxiv.org/pdf/1707.00070v1.pdf
PWC	https://paperswithcode.com/paper/better-than-real-complex-valued-neural-nets
Repo
Framework
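
The second contribution is a complex-valued network with cardioid activations. A minimal sketch, assuming the cardioid is f(z) = ½(1 + cos∠z)·z as we understand it from the paper: it keeps the phase of z and scales the magnitude by how closely z points along the positive real axis. This is an illustration in NumPy, not the authors' implementation.

```python
import numpy as np

def cardioid(z):
    """Cardioid activation: scale z by (1 + cos(arg z)) / 2, preserving its phase."""
    z = np.asarray(z, dtype=complex)
    return 0.5 * (1.0 + np.cos(np.angle(z))) * z

print(cardioid([2, -2, 1j]))  # positive real passes, negative real is zeroed, imaginary halved

# On the real axis the cardioid reduces to ReLU.
x = np.linspace(-3, 3, 7)
print(np.allclose(cardioid(x).real, np.maximum(x, 0.0)))
```

Unlike applying ReLU separately to real and imaginary parts, this activation never changes the phase, which matters when phase carries tissue information in the MRI signal.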

Tracking Persons-of-Interest via Unsupervised Representation Adaptation


Title	Tracking Persons-of-Interest via Unsupervised Representation Adaptation
Authors	Shun Zhang, Jia-Bin Huang, Jongwoo Lim, Yihong Gong, Jinjun Wang, Narendra Ahuja, Ming-Hsuan Yang
Abstract	Multi-face tracking in unconstrained videos is a challenging problem as faces of one person often appear drastically different in multiple shots due to significant variations in scale, pose, expression, illumination, and make-up. Existing multi-target tracking methods often use low-level features which are not sufficiently discriminative for identifying faces with such large appearance variations. In this paper, we tackle this problem by learning discriminative, video-specific face representations using convolutional neural networks (CNNs). Unlike existing CNN-based approaches which are only trained on large-scale face image datasets offline, we use the contextual constraints to generate a large number of training samples for a given video, and further adapt the pre-trained face CNN to specific videos using discovered training samples. Using these training samples, we optimize the embedding space so that the Euclidean distances correspond to a measure of semantic face similarity via minimizing a triplet loss function. With the learned discriminative features, we apply the hierarchical clustering algorithm to link tracklets across multiple shots to generate trajectories. We extensively evaluate the proposed algorithm on two sets of TV sitcoms and YouTube music videos, analyze the contribution of each component, and demonstrate significant performance improvement over existing techniques.
Tasks
Published	2017-10-05
URL	http://arxiv.org/abs/1710.02139v1
PDF	http://arxiv.org/pdf/1710.02139v1.pdf
PWC	https://paperswithcode.com/paper/tracking-persons-of-interest-via-unsupervised
Repo
Framework
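
The embedding is adapted by minimizing a triplet loss so that Euclidean distance reflects face similarity. A minimal sketch of a margin-based triplet loss on squared distances (the FaceNet-style form); the margin value, the squared distance and the name triplet_loss are assumptions, and how triplets are mined from contextual constraints (for example, same-tracklet positives) is not shown.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Mean of max(0, ||a - p||^2 - ||a - n||^2 + margin) over a batch of triplets."""
    d_ap = np.sum((anchor - positive) ** 2, axis=-1)  # anchor-positive squared distance
    d_an = np.sum((anchor - negative) ** 2, axis=-1)  # anchor-negative squared distance
    return np.mean(np.maximum(0.0, d_ap - d_an + margin))

anchor = np.array([[0.0, 0.0], [0.0, 0.0]])
positive = np.array([[0.0, 1.0], [0.0, 2.0]])
negative = np.array([[0.0, 3.0], [0.0, 1.0]])
print(triplet_loss(anchor, positive, negative))  # row 0 satisfied (0), row 1 violated (3.2)
```

Only triplets whose negative is not at least `margin` farther than the positive contribute, so well-separated identities stop producing gradient.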

Bayesian Boolean Matrix Factorisation


Title	Bayesian Boolean Matrix Factorisation
Authors	Tammo Rukat, Chris C. Holmes, Michalis K. Titsias, Christopher Yau
Abstract	Boolean matrix factorisation aims to decompose a binary data matrix into an approximate Boolean product of two low-rank binary matrices: one containing meaningful patterns, the other quantifying how the observations can be expressed as a combination of these patterns. We introduce the OrMachine, a probabilistic generative model for Boolean matrix factorisation, and derive a Metropolised Gibbs sampler that facilitates efficient parallel posterior inference. On real-world and simulated data, our method outperforms all currently existing approaches for Boolean matrix factorisation and completion. This is the first method to provide full posterior inference for Boolean matrix factorisation, which is relevant in applications, e.g. for controlling false positive rates in collaborative filtering, and, crucially, improves the interpretability of the inferred patterns. The proposed algorithm scales to large datasets, as we demonstrate by analysing single-cell gene expression data in 1.3 million mouse brain cells across 11 thousand genes on commodity hardware.
Tasks
Published	2017-02-20
URL	http://arxiv.org/abs/1702.06166v2
PDF	http://arxiv.org/pdf/1702.06166v2.pdf
PWC	https://paperswithcode.com/paper/bayesian-boolean-matrix-factorisation
Repo
Framework
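
The Boolean product at the heart of this model sets an entry to 1 if any latent pattern used by that observation switches it on: X[n, d] = OR over l of (Z[n, l] AND U[l, d]). A minimal sketch of that product only (the OrMachine's sampler is not shown); boolean_product and the toy matrices are illustrative.

```python
import numpy as np

def boolean_product(Z, U):
    """Boolean matrix product: X[n, d] = OR_l (Z[n, l] AND U[l, d])."""
    Z = np.asarray(Z, dtype=bool)
    U = np.asarray(U, dtype=bool)
    # Integer matmul counts how many active patterns cover entry (n, d);
    # any count above zero makes the OR true.
    return (Z.astype(np.int64) @ U.astype(np.int64)) > 0

Z = [[1, 0], [1, 1], [0, 0]]   # which patterns each observation uses
U = [[1, 1, 0], [0, 1, 1]]     # the patterns themselves
print(boolean_product(Z, U).astype(int))
```

Row 1 uses both patterns, and the overlapping column stays at 1 rather than 2, which is the key difference from an ordinary matrix product.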

Learning Compact Recurrent Neural Networks with Block-Term Tensor Decomposition


Title	Learning Compact Recurrent Neural Networks with Block-Term Tensor Decomposition
Authors	Jinmian Ye, Linnan Wang, Guangxi Li, Di Chen, Shandian Zhe, Xinqi Chu, Zenglin Xu
Abstract	Recurrent Neural Networks (RNNs) are powerful sequence modeling tools. However, when dealing with high-dimensional inputs, the training of RNNs becomes computationally expensive due to the large number of model parameters. This hinders RNNs from solving many important computer vision tasks, such as Action Recognition in Videos and Image Captioning. To overcome this problem, we propose a compact and flexible structure, namely Block-Term tensor decomposition, which greatly reduces the parameters of RNNs and improves their training efficiency. Compared with alternative low-rank approximations, such as tensor-train RNN (TT-RNN), our method, Block-Term RNN (BT-RNN), is not only more concise (when using the same rank) but also able to attain a better approximation to the original RNNs with far fewer parameters. On three challenging tasks, Action Recognition in Videos, Image Captioning and Image Generation, BT-RNN outperforms TT-RNN and the standard RNN in terms of both prediction accuracy and convergence rate. Specifically, BT-LSTM uses 17,388 times fewer parameters than the standard LSTM while improving accuracy by over 15.6% in the Action Recognition task on the UCF11 dataset.
Tasks	Action Recognition In Videos, Image Captioning, Image Generation, Temporal Action Localization
Published	2017-12-14
URL	http://arxiv.org/abs/1712.05134v2
PDF	http://arxiv.org/pdf/1712.05134v2.pdf
PWC	https://paperswithcode.com/paper/learning-compact-recurrent-neural-networks-1
Repo
Framework
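
The parameter savings come from replacing a dense input-to-hidden matrix with a sum of Tucker blocks over a tensorised shape. A rough sketch of the parameter count, assuming each of N blocks has one I_k × J_k × R factor per mode plus an R^d core; the factorisation 8×20×20×18 = 57,600 (a 160×120×3 frame) and 4×4×4×4 = 256 hidden units is a hypothetical choice, not the configuration behind the paper's 17,388× figure, which also covers the LSTM's gates.

```python
from math import prod

def full_params(in_dims, out_dims):
    """Parameters of a dense weight matrix of shape (prod(in_dims), prod(out_dims))."""
    return prod(in_dims) * prod(out_dims)

def bt_params(in_dims, out_dims, rank, n_blocks):
    """Parameters of a block-term (sum of Tucker blocks) factorisation of that matrix.

    Assumed layout: per block, one factor of size I_k x J_k x rank for each mode k,
    plus a core tensor with rank ** d entries, where d = len(in_dims).
    """
    assert len(in_dims) == len(out_dims)
    d = len(in_dims)
    per_block = sum(i * j * rank for i, j in zip(in_dims, out_dims)) + rank ** d
    return n_blocks * per_block

in_dims, out_dims = (8, 20, 20, 18), (4, 4, 4, 4)   # hypothetical tensorisation
dense = full_params(in_dims, out_dims)
compact = bt_params(in_dims, out_dims, rank=4, n_blocks=1)
print(dense, compact, round(dense / compact))
```

The dense count grows with the product of the dimensions, while the block-term count grows only with their sum (plus the core), which is why the savings are largest for high-dimensional video inputs.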