October 17, 2019

Paper Group ANR 903

Semi-Supervised Deep Metrics for Image Registration. Thermal to Visible Synthesis of Face Images using Multiple Regions. Online Decomposition of Compressive Streaming Data Using $n$-$\ell_1$ Cluster-Weighted Minimization. On the Ambiguity of Registration Uncertainty. Improving Nighttime Retrieval-Based Localization. Integrating Weakly Supervised Wo …

Semi-Supervised Deep Metrics for Image Registration

Title Semi-Supervised Deep Metrics for Image Registration
Authors Alireza Sedghi, Jie Luo, Alireza Mehrtash, Steve Pieper, Clare M. Tempany, Tina Kapur, Parvin Mousavi, William M. Wells III
Abstract Deep metrics have been shown to be effective as similarity measures in multi-modal image registration; however, the metrics are currently constructed from aligned image pairs in the training data. In this paper, we propose a strategy for learning such metrics from roughly aligned training data. Symmetrizing the data corrects bias in the metric that results from misalignment in the data (at the expense of increased variance), while random perturbations to the data, i.e. dithering, ensure that the metric has a single mode and is amenable to registration by optimization. Evaluation is performed on the task of registration on separate unseen test image pairs. The results demonstrate the feasibility of learning a useful deep metric from substantially misaligned training data; in some cases the results are significantly better than those from Mutual Information. Data augmentation via dithering is, therefore, an effective strategy for discharging the need for well-aligned training data; this brings deep metric registration from the realm of supervised to semi-supervised machine learning.
Tasks Data Augmentation, Image Registration
Published 2018-04-04
URL http://arxiv.org/abs/1804.01565v1
PDF http://arxiv.org/pdf/1804.01565v1.pdf
PWC https://paperswithcode.com/paper/semi-supervised-deep-metrics-for-image
Repo
Framework
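
The abstract above compares the learned metric against Mutual Information. As a point of reference, here is a minimal sketch of that baseline for two images given as flat sequences of already-quantized intensities. The function name and the flat-list input format are assumptions made for illustration; the sketch does not reproduce the paper's deep metric or its dithering scheme.

```python
import math
from collections import Counter


def mutual_information(a, b):
    """Mutual information (in nats) between two equal-length sequences
    of discrete intensity values, estimated from their joint histogram."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(a)
    joint = Counter(zip(a, b))   # joint histogram counts
    count_a = Counter(a)         # marginal counts for the first image
    count_b = Counter(b)         # marginal counts for the second image
    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        # p_xy / (p_x * p_y) simplifies to c * n / (count_a[x] * count_b[y])
        mi += p_xy * math.log(c * n / (count_a[x] * count_b[y]))
    return mi


# Inverted contrast still yields high MI, which is why MI suits
# multi-modal registration; an unrelated pattern yields zero.
fixed = [0, 0, 1, 1]
inverted = [1, 1, 0, 0]
unrelated = [0, 1, 0, 1]
mi_inverted = mutual_information(fixed, inverted)
mi_unrelated = mutual_information(fixed, unrelated)
```

A registration loop would evaluate such a score for each candidate transformation and keep the one with the highest value.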

Thermal to Visible Synthesis of Face Images using Multiple Regions

Title Thermal to Visible Synthesis of Face Images using Multiple Regions
Authors Benjamin S. Riggan, Nathaniel J. Short, Shuowen Hu
Abstract Synthesis of visible spectrum faces from thermal facial imagery is a promising approach for heterogeneous face recognition, enabling existing face recognition software trained on visible imagery to be leveraged and allowing human analysts to verify cross-spectrum matches more effectively. We propose a new synthesis method to enhance the discriminative quality of synthesized visible face imagery by leveraging both global (e.g., entire face) and local regions (e.g., eyes, nose, and mouth). Here, each region provides (1) an independent representation for the corresponding area, and (2) additional regularization terms, which impact the overall quality of synthesized images. We analyze the effects of using multiple regions to synthesize a visible face image from a thermal face. We demonstrate that our approach improves cross-spectrum verification rates over recently published synthesis approaches. Moreover, using our synthesized imagery, we report results on facial landmark detection (commonly used for image registration), which is a critical part of the face recognition process.
Tasks Face Recognition, Facial Landmark Detection, Heterogeneous Face Recognition, Image Registration
Published 2018-03-20
URL http://arxiv.org/abs/1803.07599v1
PDF http://arxiv.org/pdf/1803.07599v1.pdf
PWC https://paperswithcode.com/paper/thermal-to-visible-synthesis-of-face-images
Repo
Framework

Online Decomposition of Compressive Streaming Data Using $n$-$\ell_1$ Cluster-Weighted Minimization

Title Online Decomposition of Compressive Streaming Data Using $n$-$\ell_1$ Cluster-Weighted Minimization
Authors Huynh Van Luong, Nikos Deligiannis, Søren Forchhammer, André Kaup
Abstract We consider a decomposition method for compressive streaming data in the context of online compressive Robust Principal Component Analysis (RPCA). The proposed decomposition solves an $n$-$\ell_1$ cluster-weighted minimization to decompose a sequence of frames (or vectors) into sparse and low-rank components from compressive measurements. Our method processes one data vector of the stream per time instance from a small number of measurements, in contrast to conventional batch RPCA, which needs access to the full data. The $n$-$\ell_1$ cluster-weighted minimization leverages the sparse components along with their correlations with multiple previously recovered sparse vectors. Moreover, the proposed minimization can exploit the structures of sparse components via clustering and re-weighting iteratively. The method outperforms the existing methods for both numerical data and actual video data.
Tasks
Published 2018-02-08
URL http://arxiv.org/abs/1802.02885v1
PDF http://arxiv.org/pdf/1802.02885v1.pdf
PWC https://paperswithcode.com/paper/online-decomposition-of-compressive-streaming
Repo
Framework
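
The abstract does not state the $n$-$\ell_1$ objective itself, so the following is only a sketch of the generic building block that weighted $\ell_1$ minimization relies on: the proximal operator of $\lambda \sum_i w_i |x_i|$, i.e. element-wise soft thresholding with per-entry thresholds. The reweighting rule shown is the classic reweighted-$\ell_1$ heuristic and is an assumption here, not the paper's cluster-based rule; all function names are hypothetical.

```python
import math


def weighted_soft_threshold(v, weights, lam):
    """Proximal operator of lam * sum_i weights[i] * |x_i| evaluated at v.

    Each entry is shrunk toward zero by lam * weights[i]; entries whose
    magnitude falls below that threshold become zero.
    """
    if len(v) != len(weights):
        raise ValueError("v and weights must have the same length")
    return [
        math.copysign(max(abs(x) - lam * w, 0.0), x)
        for x, w in zip(v, weights)
    ]


def reweight(x, eps=1e-3):
    """Classic reweighted-l1 heuristic (assumed, not taken from the paper):
    large entries get small weights, so they are penalized less next time."""
    return [1.0 / (abs(xi) + eps) for xi in x]


# One shrink-then-reweight round on a toy vector.
estimate = weighted_soft_threshold([3.0, -0.5, -2.0], [1.0, 1.0, 0.5], 1.0)
next_weights = reweight(estimate)
```

Entries that survive thresholding receive smaller weights on the next pass, which is the general mechanism by which iterative re-weighting promotes sparsity.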

On the Ambiguity of Registration Uncertainty

Title On the Ambiguity of Registration Uncertainty
Authors Jie Luo, Sarah Frisken, Karteek Popuri, Dana Cobzas, Frank Preiswerk, Matt Toews, Miaomiao Zhang, Hongyi Ding, Polina Golland, Alexandra Golby, Masashi Sugiyama, William M. Wells III
Abstract Estimating the uncertainty in image registration is an area of current research that is aimed at providing information that will enable surgeons to assess the operative risk based on registered image data and the estimated registration uncertainty. If they receive inaccurately calculated registration uncertainty and misplace confidence in the alignment solutions, severe consequences may result. For probabilistic image registration (PIR), most research quantifies the registration uncertainty using summary statistics of the transformation distributions. In this paper, we study a rarely examined topic: whether those summary statistics of the transformation distribution truly represent the registration uncertainty. Using concrete examples, we show that there are two types of uncertainties: the transformation uncertainty, Ut, and label uncertainty Ul. Ut indicates the doubt concerning transformation parameters and can be estimated by conventional uncertainty measures, while Ul is strongly linked to the goal of registration. Further, we show that using Ut to quantify Ul is inappropriate and can be misleading. In addition, we present some potentially critical findings regarding PIR.
Tasks Image Registration
Published 2018-03-14
URL http://arxiv.org/abs/1803.05266v2
PDF http://arxiv.org/pdf/1803.05266v2.pdf
PWC https://paperswithcode.com/paper/on-the-ambiguity-of-registration-uncertainty
Repo
Framework
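
The distinction between Ut and Ul can be illustrated with a toy 1D example. This is a sketch under stated assumptions: Ut is measured as the variance of sampled displacements and Ul as the entropy of the labels those displacements land on. Both measures, the function names, and the toy label map are illustrative choices, not definitions taken from the paper.

```python
import math
from collections import Counter
from statistics import pvariance


def transformation_uncertainty(displacements):
    """Ut (assumed measure): variance of the sampled displacements."""
    return pvariance([float(d) for d in displacements])


def label_uncertainty(position, displacements, label_map):
    """Ul (assumed measure): entropy in nats of the labels found at the
    locations that the sampled displacements map `position` to."""
    labels = [label_map[position + d] for d in displacements]
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


# Toy 1D "image": ten tumor voxels followed by ten healthy voxels.
label_map = ["tumor"] * 10 + ["healthy"] * 10

# Voxel deep inside the tumor: displacements vary a lot, label never changes.
wide = [-4, -2, 0, 2, 4]
ut_inside = transformation_uncertainty(wide)
ul_inside = label_uncertainty(5, wide, label_map)

# Voxel at the boundary: displacements barely vary, label flips half the time.
narrow = [0, 1, 0, 1]
ut_boundary = transformation_uncertainty(narrow)
ul_boundary = label_uncertainty(9, narrow, label_map)
```

In this toy setup the interior voxel has the larger displacement variance but zero label entropy, while the boundary voxel has the smaller variance but the larger label entropy, which is the kind of mismatch the abstract warns about.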

Improving Nighttime Retrieval-Based Localization

Title Improving Nighttime Retrieval-Based Localization
Authors Hugo Germain, Guillaume Bourmaud, Vincent Lepetit
Abstract Outdoor visual localization is a crucial component of many computer vision systems. We propose an approach to localization from images that is designed to explicitly handle the strong variations in appearance between daytime and nighttime. As revealed by recent long-term localization benchmarks, both traditional feature-based and retrieval-based approaches still struggle to handle such changes. Our novel localization method combines a state-of-the-art image retrieval architecture with condition-specific sub-networks, allowing the computation of global image descriptors that are explicitly dependent on the capturing conditions. We show that our approach improves localization by a factor of almost 300% compared to the popular VLAD-based methods on nighttime localization.
Tasks Image Retrieval, Visual Localization
Published 2018-12-10
URL http://arxiv.org/abs/1812.03707v3
PDF http://arxiv.org/pdf/1812.03707v3.pdf
PWC https://paperswithcode.com/paper/efficient-condition-based-representations-for
Repo
Framework
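
Retrieval-based localization, as referred to in the abstract above, can be sketched as a nearest-neighbour search over global image descriptors: the query is assigned the pose of the most similar database image. The descriptors are assumed to be given (the paper's condition-specific sub-networks are not modelled here), and the function names and the `(descriptor, pose)` database layout are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def localize_by_retrieval(query_descriptor, database):
    """Return the pose stored with the database entry whose descriptor is
    most similar to the query. `database` is a list of (descriptor, pose)."""
    if not database:
        raise ValueError("database must not be empty")
    best_descriptor, best_pose = max(
        database, key=lambda entry: cosine_similarity(query_descriptor, entry[0])
    )
    return best_pose


# Toy database with two reference images and placeholder poses.
database = [([1.0, 0.0], "pose_a"), ([0.0, 1.0], "pose_b")]
estimated_pose = localize_by_retrieval([0.9, 0.1], database)
```

With this setup, the quality of the result depends entirely on how well the descriptors of a nighttime query match those of daytime reference images, which is the gap the paper targets.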

Integrating Weakly Supervised Word Sense Disambiguation into Neural Machine Translation

Title Integrating Weakly Supervised Word Sense Disambiguation into Neural Machine Translation
Authors Xiao Pu, Nikolaos Pappas, James Henderson, Andrei Popescu-Belis
Abstract This paper demonstrates that word sense disambiguation (WSD) can improve neural machine translation (NMT) by widening the source context considered when modeling the senses of potentially ambiguous words. We first introduce three adaptive clustering algorithms for WSD, based on k-means, Chinese restaurant processes, and random walks, which are then applied to large word contexts represented in a low-rank space and evaluated on SemEval shared-task data. We then learn word vectors jointly with sense vectors defined by our best WSD method, within a state-of-the-art NMT system. We show that the concatenation of these vectors, and the use of a sense selection mechanism based on the weighted average of sense vectors, outperforms several baselines including sense-aware ones. This is demonstrated by translation on five language pairs. The improvements are above one BLEU point over strong NMT baselines, +4% accuracy over all ambiguous nouns and verbs, or +20% when scored manually over several challenging words.
Tasks Machine Translation, Word Sense Disambiguation
Published 2018-10-05
URL http://arxiv.org/abs/1810.02614v1
PDF http://arxiv.org/pdf/1810.02614v1.pdf
PWC https://paperswithcode.com/paper/integrating-weakly-supervised-word-sense
Repo
Framework
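
The abstract mentions a sense selection mechanism based on a weighted average of sense vectors, concatenated with the word vector. A minimal sketch of that idea follows. The dot-product scoring against a context vector and the softmax normalization are assumptions made for illustration, since the abstract does not specify how the weights are computed; all names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sense_aware_embedding(word_vec, sense_vecs, context_vec):
    """Concatenate a word vector with a weighted average of its sense vectors.

    Weights come from a softmax over dot products between each sense vector
    and the context vector (an assumed scoring rule).
    """
    scores = [sum(s * c for s, c in zip(sense, context_vec)) for sense in sense_vecs]
    weights = softmax(scores)
    dim = len(sense_vecs[0])
    averaged = [
        sum(w * sense[i] for w, sense in zip(weights, sense_vecs))
        for i in range(dim)
    ]
    return list(word_vec) + averaged


# Two toy senses; the context points strongly toward the first one.
senses = [[1.0, 0.0], [0.0, 1.0]]
embedding = sense_aware_embedding([0.3, 0.7, 0.1], senses, [10.0, 0.0])
```

Using a soft average rather than a hard choice keeps the operation differentiable, so a model of this kind could be trained end to end.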

Localization: A Missing Link in the Pipeline of Object Matching and Registration

Title Localization: A Missing Link in the Pipeline of Object Matching and Registration
Authors Deepak Mishra, Rajeev Ranjan, Santanu Chaudhury, Mukul Sarkar, Arvinder Singh Soin
Abstract Image registration is the process of aligning two or more images of the same objects using a geometric transformation. Most of the existing approaches work on the assumption of location invariance. These approaches require object-centric images to perform matching. Further, in the absence of intensity-level symmetry between the corresponding points in two images, learning-based registration approaches rely on synthetic deformations, which often fail in real scenarios. To address these issues, a combination of convolutional neural networks (CNNs) is developed in this work to perform the desired registration. The complete objective is divided into three sub-objectives: object localization, segmentation and matching transformation. The object localization step establishes an initial correspondence between the images. A modified version of the single shot multi-box detector is used for this purpose. The detected region is cropped to make the images object-centric. Subsequently, the objects are segmented and matched using a spatial transformer network employing thin plate spline deformation. Initial experiments on the MNIST and Caltech-101 datasets show that the proposed model is able to produce accurate matching. Quantitative evaluation using the Dice coefficient (DC) and mean intersection over union (mIoU) shows that the proposed method achieves 79% and 66%, respectively, on the MNIST dataset and 94% and 90%, respectively, on the Caltech-101 dataset. The proposed framework is extended to the registration of CT and US images; it is free from any data-specific assumptions and has better generalization capability than the existing rule-based/classical approaches.
Tasks Image Registration, Object Localization
Published 2018-05-01
URL http://arxiv.org/abs/1805.00223v2
PDF http://arxiv.org/pdf/1805.00223v2.pdf
PWC https://paperswithcode.com/paper/localization-a-missing-link-in-the-pipeline
Repo
Framework
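
The two evaluation measures named in the abstract, the Dice coefficient and mean intersection over union, have standard definitions that can be sketched on flat masks. The flat-list input format, the function names, and the convention of scoring an empty-vs-empty comparison as 1.0 are assumptions for this illustration.

```python
def dice_coefficient(pred, target):
    """Dice = 2 * |A ∩ B| / (|A| + |B|) for two binary masks (flat sequences)."""
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    return 1.0 if total == 0 else 2.0 * intersection / total


def iou(pred, target):
    """IoU = |A ∩ B| / |A ∪ B| for two binary masks (flat sequences)."""
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    union = sum(1 for p, t in zip(pred, target) if p or t)
    return 1.0 if union == 0 else intersection / union


def mean_iou(pred_labels, target_labels, classes):
    """Average the per-class IoU over the given class ids."""
    scores = []
    for c in classes:
        pred_mask = [label == c for label in pred_labels]
        target_mask = [label == c for label in target_labels]
        scores.append(iou(pred_mask, target_mask))
    return sum(scores) / len(scores)


# Toy masks overlapping in exactly one of four pixels.
pred = [1, 1, 0, 0]
target = [1, 0, 1, 0]
dice_score = dice_coefficient(pred, target)
miou_score = mean_iou(pred, target, classes=[0, 1])
```

For a single pair of masks the two measures are related by Dice = 2·IoU / (1 + IoU), so Dice is never lower than IoU, consistent with the ordering of the values reported in the abstract.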

Proceedings of the 2018 ICML Workshop on Human Interpretability in Machine Learning (WHI 2018)

Title Proceedings of the 2018 ICML Workshop on Human Interpretability in Machine Learning (WHI 2018)
Authors Been Kim, Kush R. Varshney, Adrian Weller
Abstract This is the Proceedings of the 2018 ICML Workshop on Human Interpretability in Machine Learning (WHI 2018), which was held in Stockholm, Sweden, July 14, 2018. Invited speakers were Barbara Engelhardt, Cynthia Rudin, Fernanda Viégas, and Martin Wattenberg.
Tasks
Published 2018-07-03
URL http://arxiv.org/abs/1807.01308v1
PDF http://arxiv.org/pdf/1807.01308v1.pdf
PWC https://paperswithcode.com/paper/proceedings-of-the-2018-icml-workshop-on
Repo
Framework

Detecting unseen visual relations using analogies

Title Detecting unseen visual relations using analogies
Authors Julia Peyre, Ivan Laptev, Cordelia Schmid, Josef Sivic
Abstract We seek to detect visual relations in images in the form of triplets t = (subject, predicate, object), such as “person riding dog”, where training examples of the individual entities are available but their combinations are unseen at training. This is an important set-up due to the combinatorial nature of visual relations: collecting sufficient training data for all possible triplets would be very hard. The contributions of this work are three-fold. First, we learn a representation of visual relations that combines (i) individual embeddings for subject, object and predicate together with (ii) a visual phrase embedding that represents the relation triplet. Second, we learn how to transfer visual phrase embeddings from existing training triplets to unseen test triplets using analogies between relations that involve similar objects. Third, we demonstrate the benefits of our approach on three challenging datasets: on HICO-DET, our model achieves significant improvement over a strong baseline for both frequent and unseen triplets, and we observe similar improvement for the retrieval of unseen triplets with out-of-vocabulary predicates on the COCO-a dataset as well as the challenging unusual triplets in the UnRel dataset.
Tasks
Published 2018-12-13
URL https://arxiv.org/abs/1812.05736v3
PDF https://arxiv.org/pdf/1812.05736v3.pdf
PWC https://paperswithcode.com/paper/detecting-rare-visual-relations-using
Repo
Framework

Non-Vacuous Generalization Bounds at the ImageNet Scale: A PAC-Bayesian Compression Approach

Title Non-Vacuous Generalization Bounds at the ImageNet Scale: A PAC-Bayesian Compression Approach
Authors Wenda Zhou, Victor Veitch, Morgane Austern, Ryan P. Adams, Peter Orbanz
Abstract Modern neural networks are highly overparameterized, with capacity to substantially overfit to training data. Nevertheless, these networks often generalize well in practice. It has also been observed that trained networks can often be “compressed” to much smaller representations. The purpose of this paper is to connect these two empirical observations. Our main technical result is a generalization bound for compressed networks based on the compressed size. Combined with off-the-shelf compression algorithms, the bound leads to state of the art generalization guarantees; in particular, we provide the first non-vacuous generalization guarantees for realistic architectures applied to the ImageNet classification problem. As additional evidence connecting compression and generalization, we show that compressibility of models that tend to overfit is limited: We establish an absolute limit on expected compressibility as a function of expected generalization error, where the expectations are over the random choice of training examples. The bounds are complemented by empirical results that show an increase in overfitting implies an increase in the number of bits required to describe a trained network.
Tasks
Published 2018-04-16
URL http://arxiv.org/abs/1804.05862v3
PDF http://arxiv.org/pdf/1804.05862v3.pdf
PWC https://paperswithcode.com/paper/non-vacuous-generalization-bounds-at-the
Repo
Framework
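
To make the link between compressed size and generalization concrete, here is a sketch of the classical Occam (description-length) bound, which is simpler than the PAC-Bayesian bound the paper derives and is named here as a stand-in. For a model describable by a prefix-free code of `code_bits` bits and trained on `m` i.i.d. examples, with probability at least 1 − δ the true error is at most the training error plus sqrt((code_bits · ln 2 + ln(1/δ)) / (2m)). The function name and the numbers in the example are hypothetical.

```python
import math


def occam_bound(train_error, code_bits, m, delta=0.05):
    """Classical Occam bound on the true error of a model that can be
    described with `code_bits` bits, given `m` i.i.d. training examples.

    Holds with probability at least 1 - delta for 0/1 loss. This is the
    textbook bound, not the paper's PAC-Bayesian result.
    """
    if m <= 0 or not 0.0 < delta <= 1.0:
        raise ValueError("need m > 0 and 0 < delta <= 1")
    complexity = code_bits * math.log(2) + math.log(1.0 / delta)
    return train_error + math.sqrt(complexity / (2.0 * m))


# Hypothetical numbers: the same training error, two description lengths.
m = 1_000_000
bound_small = occam_bound(0.30, code_bits=200_000, m=m)
bound_large = occam_bound(0.30, code_bits=80_000_000, m=m)

# A bound is non-vacuous only if it is below 1 (the trivial error bound).
small_is_non_vacuous = bound_small < 1.0
large_is_non_vacuous = bound_large < 1.0
```

The example shows why compression matters: with the description length far above the number of training examples the bound exceeds 1 and says nothing, while a much shorter description of the same network gives a meaningful guarantee.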

Space-Time Extension of the MEM Approach for Electromagnetic Neuroimaging

Title Space-Time Extension of the MEM Approach for Electromagnetic Neuroimaging
Authors Marie-Christine Roubaud, Jean-Marc Lina, Julie Carrier, B Torrésani
Abstract The wavelet Maximum Entropy on the Mean (wMEM) approach to the MEG inverse problem is revisited and extended to infer brain activity from full space-time data. The resulting dimensionality increase is tackled using a collection of techniques that includes time and space dimension reduction (using wavelet-based and spatial-filter-based reductions, respectively), Kronecker product modeling for covariance matrices, and numerical manipulation of the free energy directly in matrix form. This leads to a smooth numerical optimization problem of reasonable dimension, solved using standard approaches. The method is applied to the MEG inverse problem. Results of a simulation study in the context of slow wave localization from sleep MEG data are presented and discussed. Index Terms: MEG inverse problem, maximum entropy on the mean, wavelet decomposition, spatial filters, Kronecker covariance factorization, sleep slow waves.
Tasks Dimensionality Reduction
Published 2018-07-24
URL http://arxiv.org/abs/1807.08959v1
PDF http://arxiv.org/pdf/1807.08959v1.pdf
PWC https://paperswithcode.com/paper/space-time-extension-of-the-mem-approach-for
Repo
Framework
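
Kronecker product modeling of a covariance matrix, mentioned in the abstract above, can be sketched as follows: a large space-time covariance is represented as the Kronecker product of a small temporal factor and a small spatial factor, which reduces the number of free parameters considerably. The factor ordering (temporal ⊗ spatial), the function names, and the toy matrices are assumptions for illustration and are not taken from the paper.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [
        [a_ij * b_kl for a_ij in row_a for b_kl in row_b]
        for row_a in a
        for row_b in b
    ]


def symmetric_param_count(n):
    """Number of free entries in an n x n symmetric matrix."""
    return n * (n + 1) // 2


# Toy factors: a 2 x 2 temporal covariance and a 2 x 2 spatial covariance.
sigma_time = [[1.0, 0.5], [0.5, 1.0]]
sigma_space = [[2.0, 0.0], [0.0, 3.0]]
sigma_full = kron(sigma_time, sigma_space)  # 4 x 4 space-time covariance

# Parameter counts for hypothetical sizes: 50 time coefficients, 200 sources.
n_time, n_space = 50, 200
full_params = symmetric_param_count(n_time * n_space)
factored_params = symmetric_param_count(n_time) + symmetric_param_count(n_space)
```

Working with the two factors separately also avoids ever forming the full matrix, which is presumably part of what makes the optimization problem tractable at realistic sizes.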

Large-Scale QA-SRL Parsing

Title Large-Scale QA-SRL Parsing
Authors Nicholas FitzGerald, Julian Michael, Luheng He, Luke Zettlemoyer
Abstract We present a new large-scale corpus of Question-Answer driven Semantic Role Labeling (QA-SRL) annotations, and the first high-quality QA-SRL parser. Our corpus, QA-SRL Bank 2.0, consists of over 250,000 question-answer pairs for over 64,000 sentences across 3 domains and was gathered with a new crowd-sourcing scheme that we show has high precision and good recall at modest cost. We also present neural models for two QA-SRL subtasks: detecting argument spans for a predicate and generating questions to label the semantic relationship. The best models achieve question accuracy of 82.6% and span-level accuracy of 77.6% (under human evaluation) on the full pipelined QA-SRL prediction task. They can also, as we show, be used to gather additional annotations at low cost.
Tasks Semantic Role Labeling
Published 2018-05-14
URL http://arxiv.org/abs/1805.05377v1
PDF http://arxiv.org/pdf/1805.05377v1.pdf
PWC https://paperswithcode.com/paper/large-scale-qa-srl-parsing
Repo
Framework

FPAN: Fine-grained and Progressive Attention Localization Network for Data Retrieval

Title FPAN: Fine-grained and Progressive Attention Localization Network for Data Retrieval
Authors Sijia Chen, Bin Song, Jie Guo, Xiaojiang Du, Mohsen Guizani
Abstract The localization of the target object for data retrieval is a key issue in Intelligent and Connected Transportation Systems (ICTS). However, due to the lack of intelligence in traditional transportation systems, it can take tremendous resources to manually retrieve and locate the queried objects among a large number of images. To solve this issue, we propose an effective method for query-based object localization that uses artificial intelligence techniques to automatically locate the queried object in a complex background. The presented method, termed the Fine-grained and Progressive Attention Localization Network (FPAN), uses an image and a queried object as input to accurately locate the target object in the image. Specifically, the fine-grained attention module is naturally embedded into each layer of the convolutional neural network (CNN), thereby gradually suppressing the regions that are irrelevant to the queried object and eventually shrinking attention to the target area. We further employ a top-down attention fusion algorithm, operated by a learnable cascade up-sampling structure, to establish the connection between the attention map and the exact location of the queried object in the original image. Furthermore, the FPAN is trained by multi-task learning with a box segmentation loss and a cosine loss. Finally, we conduct comprehensive experiments on both query-based digit localization and object tracking with synthetic and benchmark datasets, respectively. The experimental results show that our algorithm is far superior to other algorithms on the synthetic datasets and outperforms most existing trackers on the OTB and VOT datasets.
Tasks Multi-Task Learning, Object Localization, Object Tracking
Published 2018-04-05
URL http://arxiv.org/abs/1804.02056v1
PDF http://arxiv.org/pdf/1804.02056v1.pdf
PWC https://paperswithcode.com/paper/fpan-fine-grained-and-progressive-attention
Repo
Framework

Partial Policy-based Reinforcement Learning for Anatomical Landmark Localization in 3D Medical Images

Title Partial Policy-based Reinforcement Learning for Anatomical Landmark Localization in 3D Medical Images
Authors Walid Abdullah Al, Il Dong Yun
Abstract Deploying the idea of long-term cumulative return, reinforcement learning has shown remarkable performance in various fields. We propose a formulation of landmark localization in 3D medical images as a reinforcement learning problem. Whereas value-based methods have been widely used to solve similar problems, we adopt an actor-critic based direct policy search method framed in a temporal difference learning approach. Successful behavior learning is challenging in large state and/or action spaces, requiring many trials. We introduce partial policy-based reinforcement learning to enable solving the large problem of localization by learning the optimal policy on smaller partial domains. Independent actors efficiently learn the corresponding partial policies, each utilizing its own independent critic. The proposed policy reconstruction from the partial policies ensures a robust and efficient localization, utilizing the sub-agents that solve simple binary decision problems in their corresponding partial action spaces. The proposed reinforcement learning requires fewer trials to learn the optimal behavior than the original behavior learning scheme.
Tasks
Published 2018-07-09
URL http://arxiv.org/abs/1807.02908v2
PDF http://arxiv.org/pdf/1807.02908v2.pdf
PWC https://paperswithcode.com/paper/partial-policy-based-reinforcement-learning
Repo
Framework

Weakly Supervised Object Localization on grocery shelves using simple FCN and Synthetic Dataset

Title Weakly Supervised Object Localization on grocery shelves using simple FCN and Synthetic Dataset
Authors Srikrishna Varadarajan, Muktabh Mayank Srivastava
Abstract We propose a weakly supervised method using two algorithms to predict object bounding boxes given only an image classification dataset. The first algorithm is a simple Fully Convolutional Network (FCN) trained to classify object instances. We use the property of an FCN to return a mask for images larger than the training images to get a primary output segmentation mask at test time by passing an image pyramid to it. The second algorithm, a Convolutional Encoder-Decoder (ConvAE), refines the FCN output mask into the final output bounding boxes. The ConvAE is trained to localize objects on an artificially generated dataset of output segmentation masks. We demonstrate the effectiveness of this method in localizing objects on grocery shelves, where annotating data for object detection is hard due to the variety of objects. This method can be extended to any problem domain where collecting images of objects is easy and annotating their coordinates is hard.
Tasks Image Classification, Object Detection, Object Localization, Weakly-Supervised Object Localization
Published 2018-03-19
URL http://arxiv.org/abs/1803.06813v2
PDF http://arxiv.org/pdf/1803.06813v2.pdf
PWC https://paperswithcode.com/paper/weakly-supervised-object-localization-on
Repo
Framework