July 28, 2019

Paper Group ANR 165

Dense RGB-D semantic mapping with Pixel-Voxel neural network

Title Dense RGB-D semantic mapping with Pixel-Voxel neural network
Authors Cheng Zhao, Li Sun, Pulak Purkait, Rustam Stolkin
Abstract For intelligent robotics applications, extending 3D mapping to 3D semantic mapping enables robots not only to localize themselves with respect to the scene’s geometrical features but also to simultaneously understand the higher-level meaning of the scene contexts. Most previous methods focus on geometric 3D reconstruction and scene understanding independently, notwithstanding the fact that joint estimation can boost the accuracy of the semantic mapping. In this paper, a dense RGB-D semantic mapping system with a Pixel-Voxel network is proposed, which can perform dense 3D mapping while simultaneously recognizing and semantically labelling each point in the 3D map. The proposed Pixel-Voxel network obtains global context information by using PixelNet to exploit the RGB image and, meanwhile, preserves accurate local shape information by using VoxelNet to exploit the corresponding 3D point cloud. Unlike the existing architecture that fuses score maps from different models with equal weights, we propose a Softmax weighted fusion stack that adaptively learns the varying contributions of PixelNet and VoxelNet, and fuses the score maps of the two models according to their respective confidence levels. The proposed Pixel-Voxel network achieves state-of-the-art semantic segmentation performance on the SUN RGB-D benchmark dataset. The runtime of the proposed system can be boosted to 11-12Hz, enabling near real-time performance using an 8-core i7 PC with a Titan X GPU.
Tasks 3D Reconstruction, Scene Understanding, Semantic Segmentation
Published 2017-09-30
URL http://arxiv.org/abs/1710.00132v3
PDF http://arxiv.org/pdf/1710.00132v3.pdf
PWC https://paperswithcode.com/paper/dense-rgb-d-semantic-mapping-with-pixel-voxel
Repo
Framework
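
The abstract above contrasts equal-weight fusion of score maps with a softmax-weighted fusion. The sketch below illustrates that idea for a single pixel only; the abstract does not give the layout of the fusion stack, so the per-class `fusion_logits` parameters and the function names are assumptions made for illustration, not the authors' implementation.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a short list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_scores(pixel_scores, voxel_scores, fusion_logits):
    """Fuse per-class scores from two models.

    fusion_logits holds one hypothetical (pixel_logit, voxel_logit) pair per
    class; in a trained network these would be learned parameters.
    """
    fused = []
    for p, v, (lp, lv) in zip(pixel_scores, voxel_scores, fusion_logits):
        wp, wv = softmax([lp, lv])  # the two weights sum to 1
        fused.append(wp * p + wv * v)
    return fused

pixel_scores = [2.0, 0.5, 0.1]  # made-up class scores from the image branch
voxel_scores = [0.0, 1.5, 0.1]  # made-up class scores from the point-cloud branch

# Equal logits reproduce plain equal-weight averaging.
equal = fuse_scores(pixel_scores, voxel_scores, [(0.0, 0.0)] * 3)

# Unequal logits let one branch dominate for a given class.
learned = fuse_scores(pixel_scores, voxel_scores, [(2.0, 0.0), (0.0, 2.0), (0.0, 0.0)])
```

With equal logits the result is the plain average of the two score vectors; unequal logits shift each class toward the branch with the larger logit.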

A Locally Weighted Fixation Density-Based Metric for Assessing the Quality of Visual Saliency Predictions

Title A Locally Weighted Fixation Density-Based Metric for Assessing the Quality of Visual Saliency Predictions
Authors Milind S. Gide, Lina J. Karam
Abstract With the increased focus on visual attention (VA) in the last decade, a large number of computational visual saliency methods have been developed over the past few years. These models are traditionally evaluated by using performance evaluation metrics that quantify the match between predicted saliency and fixation data obtained from eye-tracking experiments on human observers. Though a considerable number of such metrics have been proposed in the literature, there are notable problems in them. In this work, we discuss shortcomings in existing metrics through illustrative examples and propose a new metric that uses local weights based on fixation density which overcomes these flaws. To compare the performance of our proposed metric at assessing the quality of saliency prediction with other existing metrics, we construct a ground-truth subjective database in which saliency maps obtained from 17 different VA models are evaluated by 16 human observers on a 5-point categorical scale in terms of their visual resemblance with corresponding ground-truth fixation density maps obtained from eye-tracking data. The metrics are evaluated by correlating metric scores with the human subjective ratings. The correlation results show that the proposed evaluation metric outperforms all other popular existing metrics. Additionally, the constructed database and corresponding subjective ratings provide an insight into which of the existing metrics and future metrics are better at estimating the quality of saliency prediction and can be used as a benchmark.
Tasks Eye Tracking, Saliency Prediction
Published 2017-08-01
URL http://arxiv.org/abs/1708.00169v1
PDF http://arxiv.org/pdf/1708.00169v1.pdf
PWC https://paperswithcode.com/paper/a-locally-weighted-fixation-density-based
Repo
Framework

Using KL-divergence to focus Deep Visual Explanation

Title Using KL-divergence to focus Deep Visual Explanation
Authors Housam Khalifa Bashier Babiker, Randy Goebel
Abstract We present a method for explaining the image classification predictions of deep convolutional neural networks by highlighting the pixels in the image which influence the final class prediction. Our method requires the identification of a heuristic method to select parameters hypothesized to be most relevant in this prediction, and here we use Kullback-Leibler divergence to provide this focus. Overall, our approach helps in understanding and interpreting deep network predictions and, we hope, contributes to a foundation for such understanding of deep learning networks. In this brief paper, our experiments evaluate the performance of two popular networks in this context of interpretability.
Tasks Image Classification
Published 2017-11-17
URL http://arxiv.org/abs/1711.06431v2
PDF http://arxiv.org/pdf/1711.06431v2.pdf
PWC https://paperswithcode.com/paper/using-kl-divergence-to-focus-deep-visual
Repo
Framework
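
For reference, the Kullback-Leibler divergence named in the abstract above can be computed for two discrete distributions as follows. This is a minimal sketch of the standard definition only; how the paper applies it to select parameters is not described in the abstract, and the function name and example distributions are illustrative assumptions.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as equal-length lists.

    Terms with p_i == 0 contribute nothing; q_i is assumed positive
    wherever p_i is positive.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

uniform = [0.5, 0.5]
peaked = [0.9, 0.1]

same = kl_divergence(uniform, uniform)   # identical distributions
diff = kl_divergence(uniform, peaked)    # mismatched distributions
```

The divergence is zero for identical distributions and grows as they diverge; it is not symmetric, so `kl_divergence(p, q)` and `kl_divergence(q, p)` generally differ.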

Learning Overcomplete HMMs

Title Learning Overcomplete HMMs
Authors Vatsal Sharan, Sham Kakade, Percy Liang, Gregory Valiant
Abstract We study the problem of learning overcomplete HMMs—those that have many hidden states but a small output alphabet. Despite having significant practical importance, such HMMs are poorly understood with no known positive or negative results for efficient learning. In this paper, we present several new results—both positive and negative—which help define the boundaries between the tractable and intractable settings. Specifically, we show positive results for a large subclass of HMMs whose transition matrices are sparse, well-conditioned, and have small probability mass on short cycles. On the other hand, we show that learning is impossible given only a polynomial number of samples for HMMs with a small output alphabet and whose transition matrices are random regular graphs with large degree. We also discuss these results in the context of learning HMMs which can capture long-term dependencies.
Tasks
Published 2017-11-07
URL http://arxiv.org/abs/1711.02309v2
PDF http://arxiv.org/pdf/1711.02309v2.pdf
PWC https://paperswithcode.com/paper/learning-overcomplete-hmms
Repo
Framework

Multi-Target Tracking in Multiple Non-Overlapping Cameras using Constrained Dominant Sets

Title Multi-Target Tracking in Multiple Non-Overlapping Cameras using Constrained Dominant Sets
Authors Yonatan Tariku Tesfaye, Eyasu Zemene, Andrea Prati, Marcello Pelillo, Mubarak Shah
Abstract In this paper, a unified three-layer hierarchical approach for solving tracking problems in multiple non-overlapping cameras is proposed. Given a video and a set of detections (obtained by any person detector), we first solve within-camera tracking employing the first two layers of our framework and, then, in the third layer, we solve across-camera tracking by merging tracks of the same person in all cameras in a simultaneous fashion. To best serve our purpose, a constrained dominant sets clustering (CDSC) technique, a parametrized version of standard quadratic optimization, is employed to solve both tracking tasks. The tracking problem is cast as finding constrained dominant sets from a graph. In addition to having a unified framework that simultaneously solves within- and across-camera tracking, the third layer helps link broken tracks of the same person occurring during within-camera tracking. In this work, we propose a fast algorithm, based on dynamics from evolutionary game theory, which is efficient and scalable to large-scale real-world applications.
Tasks
Published 2017-06-19
URL http://arxiv.org/abs/1706.06196v1
PDF http://arxiv.org/pdf/1706.06196v1.pdf
PWC https://paperswithcode.com/paper/multi-target-tracking-in-multiple-non
Repo
Framework

Towards reduction of autocorrelation in HMC by machine learning

Title Towards reduction of autocorrelation in HMC by machine learning
Authors Akinori Tanaka, Akio Tomiya
Abstract In this paper we propose a new algorithm to reduce autocorrelation in Markov chain Monte-Carlo algorithms for Euclidean field theories on the lattice. Our proposed algorithm is the Hybrid Monte-Carlo algorithm (HMC) with a restricted Boltzmann machine. We examine the validity of the algorithm by employing the phi-fourth theory in three dimensions. We observe a reduction of the autocorrelation in both the symmetric and broken phases. Our proposed algorithm provides central values of the expectation values of the action density and one-point Green’s function that are consistent, within the statistical error, with those from the original HMC in both the symmetric phase and the broken phase. On the other hand, two-point Green’s functions differ slightly between the HMC and our proposed algorithm in the symmetric phase. Furthermore, near criticality, the distribution of the one-point Green’s function differs from the one from HMC. We discuss the origin of the discrepancies and possible improvements.
Tasks
Published 2017-12-11
URL http://arxiv.org/abs/1712.03893v1
PDF http://arxiv.org/pdf/1712.03893v1.pdf
PWC https://paperswithcode.com/paper/towards-reduction-of-autocorrelation-in-hmc
Repo
Framework
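
The quantity the abstract above aims to reduce, the autocorrelation of a Markov chain, can be estimated from a series of measurements as sketched below. This is a generic estimator of the normalized autocorrelation at a given lag, not the paper's analysis code; the function name and the toy series are assumptions for illustration.

```python
def autocorrelation(series, lag):
    """Normalized autocorrelation of a measurement series at the given lag.

    Returns 1.0 at lag 0; values near 0 indicate nearly independent samples.
    """
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series) / n
    cov = sum(
        (series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag)
    ) / (n - lag)
    return cov / var

# Toy chain that flips sign at every step: perfectly anti-correlated at lag 1.
chain = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

rho0 = autocorrelation(chain, 0)
rho1 = autocorrelation(chain, 1)
rho2 = autocorrelation(chain, 2)
```

In practice one would evaluate this over many lags of an observable such as the action density and compare how quickly it decays for the two samplers.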

Joint Probabilistic Linear Discriminant Analysis

Title Joint Probabilistic Linear Discriminant Analysis
Authors Luciana Ferrer
Abstract Standard probabilistic linear discriminant analysis (PLDA) for speaker recognition assumes that the sample’s features (usually, i-vectors) are given by a sum of three terms: a term that depends on the speaker identity, a term that models the within-speaker variability and is assumed independent across samples, and a final term that models any remaining variability and is also independent across samples. In this work, we propose a generalization of this model where the within-speaker variability is not necessarily assumed independent across samples but dependent on another discrete variable. This variable, which we call the channel variable as in the standard PLDA approach, could be, for example, a discrete category for the channel characteristics, the language spoken by the speaker, the type of speech in the sample (conversational, monologue, read), etc. The value of this variable is assumed to be known during training but not during testing. Scoring is performed, as in standard PLDA, by computing a likelihood ratio between the null hypothesis that the two sides of a trial belong to the same speaker versus the alternative hypothesis that the two sides belong to different speakers. The two likelihoods are computed by marginalizing over two hypotheses about the channels on both sides of a trial: that they are the same and that they are different. This way, we expect that the new model will be better at coping with same-channel versus different-channel trials than standard PLDA, since knowledge about the channel (or language, or speech style) is used during training and implicitly considered during scoring.
Tasks Speaker Recognition
Published 2017-04-07
URL http://arxiv.org/abs/1704.02346v2
PDF http://arxiv.org/pdf/1704.02346v2.pdf
PWC https://paperswithcode.com/paper/joint-probabilistic-linear-discriminant
Repo
Framework

Recover Missing Sensor Data with Iterative Imputing Network

Title Recover Missing Sensor Data with Iterative Imputing Network
Authors Jingguang Zhou, Zili Huang
Abstract Sensor data has been playing an important role in machine learning tasks, complementary to the human-annotated data that is usually rather costly. However, due to systematic or accidental mis-operations, sensor data very often comes with a variety of missing values, resulting in considerable difficulties in the follow-up analysis and visualization. Previous work imputes the missing values by interpolating in the observational feature space, without consulting any latent (hidden) dynamics. In contrast, our model captures the latent complex temporal dynamics by summarizing each observation’s context with a novel Iterative Imputing Network, and thus significantly outperforms previous work on the benchmark Beijing air quality and meteorological dataset. Our model also yields consistent superiority over other methods in cases of different missing rates.
Tasks
Published 2017-11-20
URL http://arxiv.org/abs/1711.07878v1
PDF http://arxiv.org/pdf/1711.07878v1.pdf
PWC https://paperswithcode.com/paper/recover-missing-sensor-data-with-iterative
Repo
Framework

Text Summarization Techniques: A Brief Survey

Title Text Summarization Techniques: A Brief Survey
Authors Mehdi Allahyari, Seyedamin Pouriyeh, Mehdi Assefi, Saeid Safaei, Elizabeth D. Trippe, Juan B. Gutierrez, Krys Kochut
Abstract In recent years, there has been an explosion in the amount of text data from a variety of sources. This volume of text is an invaluable source of information and knowledge which needs to be effectively summarized to be useful. In this review, the main approaches to automatic text summarization are described. We review the different processes for summarization and describe the effectiveness and shortcomings of the different methods.
Tasks Text Summarization
Published 2017-07-07
URL http://arxiv.org/abs/1707.02268v3
PDF http://arxiv.org/pdf/1707.02268v3.pdf
PWC https://paperswithcode.com/paper/text-summarization-techniques-a-brief-survey
Repo
Framework

Deep Learning Diffuse Optical Tomography

Title Deep Learning Diffuse Optical Tomography
Authors Jaejun Yoo, Sohail Sabir, Duchang Heo, Kee Hyun Kim, Abdul Wahab, Yoonseok Choi, Seul-I Lee, Eun Young Chae, Hak Hee Kim, Young Min Bae, Young-wook Choi, Seungryong Cho, Jong Chul Ye
Abstract Diffuse optical tomography (DOT) has been investigated as an alternative imaging modality for breast cancer detection thanks to its excellent contrast to hemoglobin oxidization level. However, due to the complicated non-linear photon scattering physics and ill-posedness, the conventional reconstruction algorithms are sensitive to imaging parameters such as boundary conditions. To address this, here we propose a novel deep learning approach that learns non-linear photon scattering physics and obtains an accurate three-dimensional (3D) distribution of optical anomalies. In contrast to the traditional black-box deep learning approaches, our deep network is designed to invert the Lippmann-Schwinger integral equation using the recent mathematical theory of deep convolutional framelets. As an example of clinical relevance, we applied the method to our prototype DOT system. We show that our deep neural network, trained with only simulation data, can accurately recover the location of anomalies within biomimetic phantoms and live animals without the use of an exogenous contrast agent.
Tasks Breast Cancer Detection
Published 2017-12-04
URL https://arxiv.org/abs/1712.00912v2
PDF https://arxiv.org/pdf/1712.00912v2.pdf
PWC https://paperswithcode.com/paper/deep-learning-can-reverse-photon-migration
Repo
Framework

A Survey of Deep Learning Methods for Relation Extraction

Title A Survey of Deep Learning Methods for Relation Extraction
Authors Shantanu Kumar
Abstract Relation Extraction is an important sub-task of Information Extraction which has the potential of employing deep learning (DL) models with the creation of large datasets using distant supervision. In this review, we compare the contributions and pitfalls of the various DL models that have been used for the task, to help guide the path ahead.
Tasks Relation Extraction
Published 2017-05-10
URL http://arxiv.org/abs/1705.03645v1
PDF http://arxiv.org/pdf/1705.03645v1.pdf
PWC https://paperswithcode.com/paper/a-survey-of-deep-learning-methods-for
Repo
Framework

Accelerating Discrete Wavelet Transforms on GPUs

Title Accelerating Discrete Wavelet Transforms on GPUs
Authors David Barina, Michal Kula, Michal Matysek, Pavel Zemcik
Abstract The two-dimensional discrete wavelet transform has a huge number of applications in image-processing techniques. Until now, several papers have compared the performance of such transforms on graphics processing units (GPUs). However, all of them only dealt with lifting and convolution computation schemes. In this paper, we show that corresponding horizontal and vertical lifting parts of the lifting scheme can be merged into non-separable lifting units, which halves the number of steps. We also discuss an optimization strategy leading to a reduction in the number of arithmetic operations. The schemes were assessed using OpenCL and pixel shaders. The proposed non-separable lifting scheme outperforms the existing schemes in many cases, irrespective of its higher complexity.
Tasks
Published 2017-05-18
URL http://arxiv.org/abs/1705.08266v1
PDF http://arxiv.org/pdf/1705.08266v1.pdf
PWC https://paperswithcode.com/paper/accelerating-discrete-wavelet-transforms-on
Repo
Framework
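
As background for the lifting scheme mentioned in the abstract above, the sketch below shows a conventional one-dimensional predict/update lifting pair (the CDF 5/3 wavelet with symmetric boundary handling) and its inverse. It is a CPU illustration of ordinary separable lifting, not the paper's non-separable GPU units; the function names and the even-length restriction are assumptions made to keep the example short.

```python
def lift_forward(x):
    """One level of CDF 5/3 lifting on an even-length signal.

    Returns (approximation, detail) coefficient lists.
    """
    even, odd = x[0::2], x[1::2]
    m = len(even)
    # Predict step: each odd sample minus the mean of its even neighbours.
    d = [odd[i] - 0.5 * (even[i] + even[min(i + 1, m - 1)]) for i in range(m)]
    # Update step: each even sample plus a quarter of the neighbouring details.
    s = [even[i] + 0.25 * (d[max(i - 1, 0)] + d[i]) for i in range(m)]
    return s, d

def lift_inverse(s, d):
    """Undo lift_forward by running the two steps in reverse order."""
    m = len(s)
    even = [s[i] - 0.25 * (d[max(i - 1, 0)] + d[i]) for i in range(m)]
    odd = [d[i] + 0.5 * (even[i] + even[min(i + 1, m - 1)]) for i in range(m)]
    x = []
    for e, o in zip(even, odd):
        x.extend([e, o])
    return x

signal = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
approx, detail = lift_forward(signal)
restored = lift_inverse(approx, detail)
```

A linear ramp is predicted exactly away from the boundary, so the interior detail coefficients vanish, and the inverse recovers the input. A separable 2D transform applies such steps along rows and then columns; the abstract's contribution is merging the corresponding horizontal and vertical parts.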

On Classification of Distorted Images with Deep Convolutional Neural Networks

Title On Classification of Distorted Images with Deep Convolutional Neural Networks
Authors Yiren Zhou, Sibo Song, Ngai-Man Cheung
Abstract Image blur and image noise are common distortions during image acquisition. In this paper, we systematically study the effect of image distortions on deep neural network (DNN) image classifiers. First, we examine the DNN classifier performance under four types of distortions. Second, we propose two approaches to alleviate the effect of image distortion: re-training and fine-tuning with noisy images. Our results suggest that, under certain conditions, fine-tuning with noisy images can alleviate much of the effect of distorted inputs, and is more practical than re-training.
Tasks
Published 2017-01-08
URL http://arxiv.org/abs/1701.01924v1
PDF http://arxiv.org/pdf/1701.01924v1.pdf
PWC https://paperswithcode.com/paper/on-classification-of-distorted-images-with
Repo
Framework

Ensembles of Multiple Models and Architectures for Robust Brain Tumour Segmentation

Title Ensembles of Multiple Models and Architectures for Robust Brain Tumour Segmentation
Authors Konstantinos Kamnitsas, Wenjia Bai, Enzo Ferrante, Steven McDonagh, Matthew Sinclair, Nick Pawlowski, Martin Rajchl, Matthew Lee, Bernhard Kainz, Daniel Rueckert, Ben Glocker
Abstract Deep learning approaches such as convolutional neural nets have consistently outperformed previous methods on challenging tasks such as dense, semantic segmentation. However, the various proposed networks perform differently, with behaviour largely influenced by architectural choices and training settings. This paper explores Ensembles of Multiple Models and Architectures (EMMA) for robust performance through aggregation of predictions from a wide range of methods. The approach reduces the influence of the meta-parameters of individual models and the risk of overfitting the configuration to a particular database. EMMA can be seen as an unbiased, generic deep learning model which is shown to yield excellent performance, winning the first position in the BRATS 2017 competition among 50+ participating teams.
Tasks Semantic Segmentation
Published 2017-11-04
URL http://arxiv.org/abs/1711.01468v1
PDF http://arxiv.org/pdf/1711.01468v1.pdf
PWC https://paperswithcode.com/paper/ensembles-of-multiple-models-and
Repo
Framework
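
The aggregation of predictions described in the abstract above can be illustrated with simple probability averaging. The abstract does not state how EMMA combines its models, so plain averaging of per-class probabilities is an assumption here, and the function name and numbers are made up for illustration.

```python
def ensemble_predict(model_probs):
    """Average per-class probabilities across models and pick the top class.

    model_probs: one probability list per model, all over the same classes.
    Returns (predicted_label, mean_probabilities).
    """
    n_models = len(model_probs)
    n_classes = len(model_probs[0])
    mean = [sum(p[c] for p in model_probs) / n_models for c in range(n_classes)]
    label = max(range(n_classes), key=lambda c: mean[c])
    return label, mean

# Three hypothetical models voting on one voxel; the first disagrees.
probs = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.6, 0.3],
]
label, mean = ensemble_predict(probs)
```

Averaging dampens the influence of any single model's configuration: the first model alone would predict class 0, while the ensemble settles on class 1.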

Rank Persistence: Assessing the Temporal Performance of Real-World Person Re-Identification

Title Rank Persistence: Assessing the Temporal Performance of Real-World Person Re-Identification
Authors Srikrishna Karanam, Eric Lam, Richard J. Radke
Abstract Designing useful person re-identification systems for real-world applications requires attention to operational aspects not typically considered in academic research. Here, we focus on the temporal aspect of re-identification; that is, instead of finding a match to a probe person of interest in a fixed candidate gallery, we consider the more realistic scenario in which the gallery is continuously populated by new candidates over a long time period. A key question of interest for an operator of such a system is: how long is a correct match to a probe likely to remain in a rank-k shortlist of possible candidates? We propose to distill this information into a Rank Persistence Curve (RPC), which allows different algorithms’ temporal performance characteristics to be directly compared. We present examples to illustrate the RPC using a new long-term dataset with multiple candidate reappearances, and discuss considerations for future re-identification research that explicitly involves temporal aspects.
Tasks Person Re-Identification
Published 2017-06-02
URL http://arxiv.org/abs/1706.00553v2
PDF http://arxiv.org/pdf/1706.00553v2.pdf
PWC https://paperswithcode.com/paper/rank-persistence-assessing-the-temporal
Repo
Framework
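
One possible reading of the rank persistence idea in the abstract above is sketched below: for each time step, the fraction of probes whose correct match has stayed within the rank-k shortlist so far. The abstract does not define the Rank Persistence Curve precisely, so this definition, the function name, and the toy rank histories are all assumptions.

```python
def rank_persistence_curve(rank_histories, k):
    """Fraction of probes whose correct match stayed within rank k up to each step.

    rank_histories: one list per probe, giving the rank of its correct match
    at successive time steps as the gallery grows (1 is the best rank).
    """
    n_steps = len(rank_histories[0])
    curve = []
    for t in range(n_steps):
        still_in = sum(
            1 for ranks in rank_histories if all(r <= k for r in ranks[: t + 1])
        )
        curve.append(still_in / len(rank_histories))
    return curve

# Made-up rank histories for three probes over three time steps.
histories = [[1, 2, 5], [1, 1, 1], [3, 7, 9]]
curve = rank_persistence_curve(histories, k=3)
```

Plotting such curves for several algorithms on the same probes would show which one keeps correct matches in the shortlist longest.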