Paper Group ANR 165
Dense RGB-D semantic mapping with Pixel-Voxel neural network
Title | Dense RGB-D semantic mapping with Pixel-Voxel neural network |
Authors | Cheng Zhao, Li Sun, Pulak Purkait, Rustam Stolkin |
Abstract | For intelligent robotics applications, extending 3D mapping to 3D semantic mapping enables robots not only to localize themselves with respect to the scene’s geometrical features but also to simultaneously understand the higher-level meaning of the scene context. Most previous methods treat geometric 3D reconstruction and scene understanding independently, notwithstanding the fact that joint estimation can boost the accuracy of the semantic mapping. In this paper, a dense RGB-D semantic mapping system with a Pixel-Voxel network is proposed, which can perform dense 3D mapping while simultaneously recognizing and semantically labelling each point in the 3D map. The proposed Pixel-Voxel network obtains global context information by using PixelNet to exploit the RGB image, while preserving accurate local shape information by using VoxelNet to exploit the corresponding 3D point cloud. Unlike existing architectures that fuse score maps from different models with equal weights, we propose a Softmax weighted fusion stack that adaptively learns the varying contributions of PixelNet and VoxelNet and fuses the score maps of the two models according to their respective confidence levels. The proposed Pixel-Voxel network achieves state-of-the-art semantic segmentation performance on the SUN RGB-D benchmark dataset. The runtime of the proposed system can be boosted to 11-12 Hz, enabling near real-time performance on an 8-core i7 PC with a Titan X GPU. |
Tasks | 3D Reconstruction, Scene Understanding, Semantic Segmentation |
Published | 2017-09-30 |
URL | http://arxiv.org/abs/1710.00132v3 |
http://arxiv.org/pdf/1710.00132v3.pdf | |
PWC | https://paperswithcode.com/paper/dense-rgb-d-semantic-mapping-with-pixel-voxel |
Repo | |
Framework | |
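The Softmax weighted fusion described in the abstract can be sketched in a few lines. This is a hedged illustration, not the authors' implementation: the array shapes and the `fusion_logits` parameter are assumptions made for the example.

```python
import numpy as np

def softmax(x, axis=0):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def softmax_weighted_fusion(pixel_scores, voxel_scores, fusion_logits):
    """Fuse two per-class score maps with learned softmax weights.

    pixel_scores, voxel_scores: arrays of shape (num_classes, H, W).
    fusion_logits: learnable array of shape (2, num_classes) whose softmax
    over the first axis gives per-class weights for the two branches.
    """
    w = softmax(fusion_logits, axis=0)            # (2, num_classes)
    fused = (w[0][:, None, None] * pixel_scores
             + w[1][:, None, None] * voxel_scores)
    return fused
```

With all-zero logits the stack degenerates to the equal-weight fusion the paper argues against; training the logits lets each class lean on whichever branch is more confident.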
A Locally Weighted Fixation Density-Based Metric for Assessing the Quality of Visual Saliency Predictions
Title | A Locally Weighted Fixation Density-Based Metric for Assessing the Quality of Visual Saliency Predictions |
Authors | Milind S. Gide, Lina J. Karam |
Abstract | With the increased focus on visual attention (VA) in the last decade, a large number of computational visual saliency methods have been developed over the past few years. These models are traditionally evaluated using performance metrics that quantify the match between predicted saliency and fixation data obtained from eye-tracking experiments on human observers. Though a considerable number of such metrics have been proposed in the literature, they exhibit notable problems. In this work, we discuss the shortcomings of existing metrics through illustrative examples and propose a new metric that uses local weights based on fixation density, which overcomes these flaws. To compare the performance of our proposed metric at assessing the quality of saliency prediction with that of existing metrics, we construct a ground-truth subjective database in which saliency maps obtained from 17 different VA models are evaluated by 16 human observers on a 5-point categorical scale in terms of their visual resemblance with the corresponding ground-truth fixation density maps obtained from eye-tracking data. The metrics are evaluated by correlating metric scores with the human subjective ratings. The correlation results show that the proposed evaluation metric outperforms all other popular existing metrics. Additionally, the constructed database and corresponding subjective ratings provide insight into which existing and future metrics are better at estimating the quality of saliency prediction, and can be used as a benchmark. |
Tasks | Eye Tracking, Saliency Prediction |
Published | 2017-08-01 |
URL | http://arxiv.org/abs/1708.00169v1 |
http://arxiv.org/pdf/1708.00169v1.pdf | |
PWC | https://paperswithcode.com/paper/a-locally-weighted-fixation-density-based |
Repo | |
Framework | |
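To make the idea of density-based local weighting concrete, here is a toy score (not the authors' metric; the normalization and the exact weighting scheme are assumptions) that penalizes saliency-prediction errors more heavily in densely fixated regions:

```python
import numpy as np

def locally_weighted_score(pred, fix_density, eps=1e-8):
    """Toy locally weighted agreement score between a predicted saliency
    map and a ground-truth fixation density map (both 2-D, same shape).

    Local absolute errors are weighted by the normalized fixation density,
    so mistakes in densely fixated regions are penalized more heavily.
    Returns a value in [0, 1]; 1 means perfect agreement.
    """
    p = pred / (pred.max() + eps)
    f = fix_density / (fix_density.max() + eps)
    w = f / (f.sum() + eps)                  # density-based local weights
    weighted_err = (w * np.abs(p - f)).sum()
    return 1.0 - weighted_err
```

A uniform weight would score errors in rarely fixated background the same as errors at the main fixation locus, which is one of the flaws the paper's metric is designed to avoid.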
Using KL-divergence to focus Deep Visual Explanation
Title | Using KL-divergence to focus Deep Visual Explanation |
Authors | Housam Khalifa Bashier Babiker, Randy Goebel |
Abstract | We present a method for explaining the image classification predictions of deep convolutional neural networks by highlighting the pixels in the image which influence the final class prediction. Our method requires a heuristic for selecting the parameters hypothesized to be most relevant to this prediction, and here we use the Kullback-Leibler divergence to provide this focus. Overall, our approach helps in understanding and interpreting deep network predictions and, we hope, contributes to a foundation for such understanding of deep learning networks. In this brief paper, our experiments evaluate the performance of two popular networks in this context of interpretability. |
Tasks | Image Classification |
Published | 2017-11-17 |
URL | http://arxiv.org/abs/1711.06431v2 |
http://arxiv.org/pdf/1711.06431v2.pdf | |
PWC | https://paperswithcode.com/paper/using-kl-divergence-to-focus-deep-visual |
Repo | |
Framework | |
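The focusing idea can be illustrated with a small sketch: compare the network's class distribution before and after perturbing a region, and rank regions by the resulting KL divergence. The perturbation protocol and the `region_relevance` helper are assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete probability distributions."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum(); q /= q.sum()
    return float(np.sum(p * np.log(p / q)))

def region_relevance(base_probs, perturbed_probs_per_region):
    """Score each image region by how much perturbing it shifts the
    network's class distribution away from the unperturbed prediction."""
    return [kl_divergence(base_probs, q) for q in perturbed_probs_per_region]
```

Regions whose perturbation barely moves the distribution score near zero; regions that flip the prediction score highly, providing the "focus" the title refers to.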
Learning Overcomplete HMMs
Title | Learning Overcomplete HMMs |
Authors | Vatsal Sharan, Sham Kakade, Percy Liang, Gregory Valiant |
Abstract | We study the problem of learning overcomplete HMMs—those that have many hidden states but a small output alphabet. Despite having significant practical importance, such HMMs are poorly understood with no known positive or negative results for efficient learning. In this paper, we present several new results—both positive and negative—which help define the boundaries between the tractable and intractable settings. Specifically, we show positive results for a large subclass of HMMs whose transition matrices are sparse, well-conditioned, and have small probability mass on short cycles. On the other hand, we show that learning is impossible given only a polynomial number of samples for HMMs with a small output alphabet and whose transition matrices are random regular graphs with large degree. We also discuss these results in the context of learning HMMs which can capture long-term dependencies. |
Tasks | |
Published | 2017-11-07 |
URL | http://arxiv.org/abs/1711.02309v2 |
http://arxiv.org/pdf/1711.02309v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-overcomplete-hmms |
Repo | |
Framework | |
Multi-Target Tracking in Multiple Non-Overlapping Cameras using Constrained Dominant Sets
Title | Multi-Target Tracking in Multiple Non-Overlapping Cameras using Constrained Dominant Sets |
Authors | Yonatan Tariku Tesfaye, Eyasu Zemene, Andrea Prati, Marcello Pelillo, Mubarak Shah |
Abstract | In this paper, a unified three-layer hierarchical approach for solving tracking problems in multiple non-overlapping cameras is proposed. Given a video and a set of detections (obtained by any person detector), we first solve within-camera tracking employing the first two layers of our framework and then, in the third layer, we solve across-camera tracking by merging tracks of the same person across all cameras in a simultaneous fashion. To best serve our purpose, a constrained dominant sets clustering (CDSC) technique, a parametrized version of standard quadratic optimization, is employed to solve both tracking tasks. The tracking problem is cast as finding constrained dominant sets from a graph. In addition to having a unified framework that simultaneously solves within- and across-camera tracking, the third layer helps link broken tracks of the same person occurring during within-camera tracking. In this work, we propose a fast algorithm, based on dynamics from evolutionary game theory, which is efficient and scalable to large-scale real-world applications. |
Tasks | |
Published | 2017-06-19 |
URL | http://arxiv.org/abs/1706.06196v1 |
http://arxiv.org/pdf/1706.06196v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-target-tracking-in-multiple-non |
Repo | |
Framework | |
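Dominant sets are typically extracted with replicator dynamics, the evolutionary-game-theoretic update the abstract alludes to. Below is a minimal unconstrained sketch (the paper's constrained, parametrized variant differs):

```python
import numpy as np

def replicator_dynamics(A, iters=1000, tol=1e-8):
    """Discrete replicator dynamics on an affinity matrix A (symmetric,
    non-negative, zero diagonal). The iterate x converges toward the
    characteristic vector of a dominant set; the support of x (its
    significantly non-zero entries) is the extracted cluster."""
    n = A.shape[0]
    x = np.full(n, 1.0 / n)                  # start from the barycenter
    for _ in range(iters):
        Ax = A @ x
        denom = x @ Ax
        if denom < tol:
            break
        x_new = x * Ax / denom               # payoff-proportional update
        if np.abs(x_new - x).sum() < tol:
            x = x_new
            break
        x = x_new
    return x
```

On a graph with one tight clique and a weakly connected outlier, the mass concentrates on the clique and the outlier's weight decays toward zero, which is the clustering behaviour the tracker exploits.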
Towards reduction of autocorrelation in HMC by machine learning
Title | Towards reduction of autocorrelation in HMC by machine learning |
Authors | Akinori Tanaka, Akio Tomiya |
Abstract | In this paper we propose a new algorithm to reduce autocorrelation in Markov chain Monte-Carlo algorithms for Euclidean field theories on the lattice. The proposed algorithm is the Hybrid Monte-Carlo algorithm (HMC) combined with a restricted Boltzmann machine. We examine the validity of the algorithm using phi-fourth theory in three dimensions, and observe a reduction of the autocorrelation in both the symmetric and broken phases. The proposed algorithm yields central values of the action density and the one-point Green’s function that are consistent, within statistical error, with those from the original HMC in both the symmetric and broken phases. On the other hand, the two-point Green’s functions differ slightly between the HMC and the proposed algorithm in the symmetric phase. Furthermore, near criticality, the distribution of the one-point Green’s function differs from that obtained with the HMC. We discuss the origin of these discrepancies and possible improvements. |
Tasks | |
Published | 2017-12-11 |
URL | http://arxiv.org/abs/1712.03893v1 |
http://arxiv.org/pdf/1712.03893v1.pdf | |
PWC | https://paperswithcode.com/paper/towards-reduction-of-autocorrelation-in-hmc |
Repo | |
Framework | |
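For reference, a plain HMC update (leapfrog integration plus a Metropolis accept/reject on the Hamiltonian difference) looks as follows; the paper's contribution is combining such updates with a restricted Boltzmann machine, which this sketch does not include. The potential is passed in by the caller, so the quadratic example in the usage below is only an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def hmc_step(phi, grad_U, U, step=0.1, n_leap=10):
    """One Hybrid Monte-Carlo update for configuration phi under the
    Euclidean action U with gradient grad_U. Returns (phi, accepted)."""
    p = rng.standard_normal(phi.shape)        # refresh momenta
    phi_new, p_new = phi.copy(), p.copy()
    # leapfrog integration of Hamilton's equations
    p_new -= 0.5 * step * grad_U(phi_new)
    for _ in range(n_leap - 1):
        phi_new += step * p_new
        p_new -= step * grad_U(phi_new)
    phi_new += step * p_new
    p_new -= 0.5 * step * grad_U(phi_new)
    # Metropolis accept/reject on the energy violation of the integrator
    dH = (U(phi_new) + 0.5 * (p_new ** 2).sum()) \
         - (U(phi) + 0.5 * (p ** 2).sum())
    if dH <= 0 or rng.random() < np.exp(-dH):
        return phi_new, True
    return phi, False
```

Long autocorrelation times of chains built from such steps near criticality are exactly what the proposed RBM-assisted variant aims to reduce.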
Joint Probabilistic Linear Discriminant Analysis
Title | Joint Probabilistic Linear Discriminant Analysis |
Authors | Luciana Ferrer |
Abstract | Standard probabilistic linear discriminant analysis (PLDA) for speaker recognition assumes that the sample’s features (usually, i-vectors) are given by a sum of three terms: a term that depends on the speaker identity, a term that models the within-speaker variability and is assumed independent across samples, and a final term that models any remaining variability and is also independent across samples. In this work, we propose a generalization of this model where the within-speaker variability is not necessarily assumed independent across samples but dependent on another discrete variable. This variable, which we call the channel variable as in the standard PLDA approach, could be, for example, a discrete category for the channel characteristics, the language spoken by the speaker, the type of speech in the sample (conversational, monologue, read), etc. The value of this variable is assumed to be known during training but not during testing. Scoring is performed, as in standard PLDA, by computing a likelihood ratio between the null hypothesis that the two sides of a trial belong to the same speaker and the alternative hypothesis that they belong to different speakers. The two likelihoods are computed by marginalizing over two hypotheses about the channels on the two sides of a trial: that they are the same and that they are different. This way, we expect the new model to cope better with same-channel versus different-channel trials than standard PLDA, since knowledge about the channel (or language, or speech style) is used during training and implicitly considered during scoring. |
Tasks | Speaker Recognition |
Published | 2017-04-07 |
URL | http://arxiv.org/abs/1704.02346v2 |
http://arxiv.org/pdf/1704.02346v2.pdf | |
PWC | https://paperswithcode.com/paper/joint-probabilistic-linear-discriminant |
Repo | |
Framework | |
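The scoring step, marginalizing each speaker hypothesis over the unknown channel condition before forming the likelihood ratio, can be sketched with log-sum-exp arithmetic. The function below is a toy with assumed inputs (per-hypothesis log-likelihoods and a channel prior), not the full joint PLDA model:

```python
import numpy as np

def marginal_llr(llh_same_spk_same_ch, llh_same_spk_diff_ch,
                 llh_diff_spk_same_ch, llh_diff_spk_diff_ch,
                 log_prior_same_ch=np.log(0.5)):
    """Toy scoring step: marginalize the per-hypothesis log-likelihoods
    over the unknown channel condition (same vs. different channel on
    the two sides of a trial) before forming the speaker LLR."""
    log_prior_diff_ch = np.log1p(-np.exp(log_prior_same_ch))
    # numerator: same-speaker likelihood, channel marginalized out
    num = np.logaddexp(log_prior_same_ch + llh_same_spk_same_ch,
                       log_prior_diff_ch + llh_same_spk_diff_ch)
    # denominator: different-speaker likelihood, channel marginalized out
    den = np.logaddexp(log_prior_same_ch + llh_diff_spk_same_ch,
                       log_prior_diff_ch + llh_diff_spk_diff_ch)
    return num - den
```

Working in the log domain with `logaddexp` keeps the marginalization numerically stable even when the individual likelihoods are tiny.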
Recover Missing Sensor Data with Iterative Imputing Network
Title | Recover Missing Sensor Data with Iterative Imputing Network |
Authors | Jingguang Zhou, Zili Huang |
Abstract | Sensor data has been playing an important role in machine learning tasks, complementary to human-annotated data, which is usually rather costly. However, due to systematic or accidental mis-operations, sensor data very often comes with a variety of missing values, resulting in considerable difficulties in the follow-up analysis and visualization. Previous work imputes the missing values by interpolating in the observational feature space, without consulting any latent (hidden) dynamics. In contrast, our model captures the latent complex temporal dynamics by summarizing each observation’s context with a novel Iterative Imputing Network, and thus significantly outperforms previous work on the benchmark Beijing air quality and meteorological dataset. Our model also yields consistent superiority over other methods under different missing rates. |
Tasks | |
Published | 2017-11-20 |
URL | http://arxiv.org/abs/1711.07878v1 |
http://arxiv.org/pdf/1711.07878v1.pdf | |
PWC | https://paperswithcode.com/paper/recover-missing-sensor-data-with-iterative |
Repo | |
Framework | |
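As a stand-in for the learned Iterative Imputing Network, the iterate-until-stable refill loop can be illustrated with a crude neighbour-averaging model; everything beyond the general idea of repeatedly re-estimating missing entries from their context is an assumption here:

```python
import numpy as np

def iterative_impute(series, iters=50):
    """Iteratively refill the NaN entries of a 1-D series with the
    average of their immediate neighbours, repeating until the imputed
    values stabilize. A crude stand-in for the paper's learned network."""
    mask = np.isnan(series)
    x = series.copy()
    x[mask] = series[~mask].mean()            # crude initialization
    for _ in range(iters):
        padded = np.concatenate(([x[0]], x, [x[-1]]))
        smoothed = 0.5 * (padded[:-2] + padded[2:])   # neighbour average
        x_new = np.where(mask, smoothed, series)      # keep observed values
        if np.abs(x_new - x).max() < 1e-10:
            x = x_new
            break
        x = x_new
    return x
```

On a gap inside a linear trend this loop converges to linear interpolation; the point of the paper's network is to do better than that by modelling the latent temporal dynamics.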
Text Summarization Techniques: A Brief Survey
Title | Text Summarization Techniques: A Brief Survey |
Authors | Mehdi Allahyari, Seyedamin Pouriyeh, Mehdi Assefi, Saeid Safaei, Elizabeth D. Trippe, Juan B. Gutierrez, Krys Kochut |
Abstract | In recent years, there has been an explosion in the amount of text data from a variety of sources. This volume of text is an invaluable source of information and knowledge which needs to be effectively summarized to be useful. In this review, the main approaches to automatic text summarization are described. We review the different processes for summarization and describe the effectiveness and shortcomings of the different methods. |
Tasks | Text Summarization |
Published | 2017-07-07 |
URL | http://arxiv.org/abs/1707.02268v3 |
http://arxiv.org/pdf/1707.02268v3.pdf | |
PWC | https://paperswithcode.com/paper/text-summarization-techniques-a-brief-survey |
Repo | |
Framework | |
Deep Learning Diffuse Optical Tomography
Title | Deep Learning Diffuse Optical Tomography |
Authors | Jaejun Yoo, Sohail Sabir, Duchang Heo, Kee Hyun Kim, Abdul Wahab, Yoonseok Choi, Seul-I Lee, Eun Young Chae, Hak Hee Kim, Young Min Bae, Young-wook Choi, Seungryong Cho, Jong Chul Ye |
Abstract | Diffuse optical tomography (DOT) has been investigated as an alternative imaging modality for breast cancer detection thanks to its excellent sensitivity to hemoglobin oxidization levels. However, due to the complicated non-linear photon scattering physics and ill-posedness, the conventional reconstruction algorithms are sensitive to imaging parameters such as boundary conditions. To address this, here we propose a novel deep learning approach that learns non-linear photon scattering physics and obtains an accurate three-dimensional (3D) distribution of optical anomalies. In contrast to the traditional black-box deep learning approaches, our deep network is designed to invert the Lippmann-Schwinger integral equation using the recent mathematical theory of deep convolutional framelets. As an example of clinical relevance, we applied the method to our prototype DOT system. We show that our deep neural network, trained with only simulation data, can accurately recover the location of anomalies within biomimetic phantoms and live animals without the use of an exogenous contrast agent. |
Tasks | Breast Cancer Detection |
Published | 2017-12-04 |
URL | https://arxiv.org/abs/1712.00912v2 |
https://arxiv.org/pdf/1712.00912v2.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-can-reverse-photon-migration |
Repo | |
Framework | |
A Survey of Deep Learning Methods for Relation Extraction
Title | A Survey of Deep Learning Methods for Relation Extraction |
Authors | Shantanu Kumar |
Abstract | Relation Extraction is an important sub-task of Information Extraction which has the potential of employing deep learning (DL) models with the creation of large datasets using distant supervision. In this review, we compare the contributions and pitfalls of the various DL models that have been used for the task, to help guide the path ahead. |
Tasks | Relation Extraction |
Published | 2017-05-10 |
URL | http://arxiv.org/abs/1705.03645v1 |
http://arxiv.org/pdf/1705.03645v1.pdf | |
PWC | https://paperswithcode.com/paper/a-survey-of-deep-learning-methods-for |
Repo | |
Framework | |
Accelerating Discrete Wavelet Transforms on GPUs
Title | Accelerating Discrete Wavelet Transforms on GPUs |
Authors | David Barina, Michal Kula, Michal Matysek, Pavel Zemcik |
Abstract | The two-dimensional discrete wavelet transform has a huge number of applications in image-processing techniques. Until now, several papers have compared the performance of such transforms on graphics processing units (GPUs). However, all of them only dealt with lifting and convolution computation schemes. In this paper, we show that the corresponding horizontal and vertical lifting parts of the lifting scheme can be merged into non-separable lifting units, which halves the number of steps. We also discuss an optimization strategy leading to a reduction in the number of arithmetic operations. The schemes were assessed using OpenCL and pixel shaders. The proposed non-separable lifting scheme outperforms the existing schemes in many cases, despite its higher complexity. |
Tasks | |
Published | 2017-05-18 |
URL | http://arxiv.org/abs/1705.08266v1 |
http://arxiv.org/pdf/1705.08266v1.pdf | |
PWC | https://paperswithcode.com/paper/accelerating-discrete-wavelet-transforms-on |
Repo | |
Framework | |
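For background, the separable lifting steps that the paper merges into non-separable units look like this in one dimension. This is a CDF 5/3 sketch with simple periodic-style boundary handling via `np.roll`; the paper's GPU kernels are 2-D and considerably more involved:

```python
import numpy as np

def cdf53_lifting_forward(x):
    """One level of the CDF 5/3 discrete wavelet transform via lifting:
    a predict step followed by an update step on even/odd samples.
    Input length must be even; returns (approximation, detail)."""
    even, odd = x[0::2].astype(float), x[1::2].astype(float)
    # predict: detail = odd sample minus average of neighbouring evens
    d = odd - 0.5 * (even + np.roll(even, -1))
    # update: approximation = even sample plus quarter of neighbouring details
    a = even + 0.25 * (d + np.roll(d, 1))
    return a, d

def cdf53_lifting_inverse(a, d):
    """Invert the lifting steps exactly, in reverse order."""
    even = a - 0.25 * (d + np.roll(d, 1))
    odd = d + 0.5 * (even + np.roll(even, -1))
    x = np.empty(even.size + odd.size)
    x[0::2], x[1::2] = even, odd
    return x
```

Because each lifting step is undone exactly by its mirror image, the scheme reconstructs perfectly regardless of signal content, which is what makes merging steps (as the paper does) attractive: fewer passes, identical output.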
On Classification of Distorted Images with Deep Convolutional Neural Networks
Title | On Classification of Distorted Images with Deep Convolutional Neural Networks |
Authors | Yiren Zhou, Sibo Song, Ngai-Man Cheung |
Abstract | Image blur and image noise are common distortions during image acquisition. In this paper, we systematically study the effect of image distortions on deep neural network (DNN) image classifiers. First, we examine the DNN classifier performance under four types of distortions. Second, we propose two approaches to alleviate the effect of image distortion: re-training and fine-tuning with noisy images. Our results suggest that, under certain conditions, fine-tuning with noisy images can alleviate much of the effect of distorted inputs, and is more practical than re-training. |
Tasks | |
Published | 2017-01-08 |
URL | http://arxiv.org/abs/1701.01924v1 |
http://arxiv.org/pdf/1701.01924v1.pdf | |
PWC | https://paperswithcode.com/paper/on-classification-of-distorted-images-with |
Repo | |
Framework | |
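The fine-tuning recipe hinges on generating distorted training copies. Below is a minimal sketch of such an augmentation step, assuming images normalized to [0, 1] and using Gaussian noise plus a crude box blur as the two distortion types (the paper studies four; the exact parameters here are assumptions):

```python
import numpy as np

rng = np.random.default_rng(42)

def distort_batch(images, noise_std=0.1, blur=False):
    """Apply additive Gaussian noise and, optionally, a 3x3 box blur to a
    batch of images in [0, 1] with shape (N, H, W), producing the noisy
    copies used for fine-tuning."""
    out = images + rng.normal(0.0, noise_std, images.shape)
    if blur:
        padded = np.pad(out, ((0, 0), (1, 1), (1, 1)), mode="edge")
        out = sum(padded[:, i:i + out.shape[1], j:j + out.shape[2]]
                  for i in (0, 1, 2) for j in (0, 1, 2)) / 9.0
    return np.clip(out, 0.0, 1.0)
```

Fine-tuning then simply continues training the pretrained classifier on `distort_batch(x)` instead of `x`, which is the lighter-weight alternative to full re-training that the paper recommends.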
Ensembles of Multiple Models and Architectures for Robust Brain Tumour Segmentation
Title | Ensembles of Multiple Models and Architectures for Robust Brain Tumour Segmentation |
Authors | Konstantinos Kamnitsas, Wenjia Bai, Enzo Ferrante, Steven McDonagh, Matthew Sinclair, Nick Pawlowski, Martin Rajchl, Matthew Lee, Bernhard Kainz, Daniel Rueckert, Ben Glocker |
Abstract | Deep learning approaches such as convolutional neural nets have consistently outperformed previous methods on challenging tasks such as dense, semantic segmentation. However, the various proposed networks perform differently, with behaviour largely influenced by architectural choices and training settings. This paper explores Ensembles of Multiple Models and Architectures (EMMA) for robust performance through aggregation of predictions from a wide range of methods. The approach reduces the influence of the meta-parameters of individual models and the risk of overfitting the configuration to a particular database. EMMA can be seen as an unbiased, generic deep learning model which is shown to yield excellent performance, winning the first position in the BRATS 2017 competition among 50+ participating teams. |
Tasks | Semantic Segmentation |
Published | 2017-11-04 |
URL | http://arxiv.org/abs/1711.01468v1 |
http://arxiv.org/pdf/1711.01468v1.pdf | |
PWC | https://paperswithcode.com/paper/ensembles-of-multiple-models-and |
Repo | |
Framework | |
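The aggregation at the heart of EMMA can be sketched as a plain average of per-class confidence maps followed by an argmax. The abstract does not spell out the exact weighting used in the competition entry, so equal weights are an assumption here:

```python
import numpy as np

def ensemble_segmentations(prob_maps):
    """EMMA-style aggregation sketch: average the per-class confidence
    maps of several independently trained models, then take the argmax
    to obtain the final segmentation.

    prob_maps: list of arrays, each of shape (num_classes, H, W), with
    class probabilities summing to 1 at each pixel.
    Returns (label_map, averaged_confidences).
    """
    avg = np.mean(prob_maps, axis=0)        # average over models
    return avg.argmax(axis=0), avg
```

Averaging over heterogeneous architectures washes out each model's configuration-specific biases, which is the robustness argument the abstract makes.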
Rank Persistence: Assessing the Temporal Performance of Real-World Person Re-Identification
Title | Rank Persistence: Assessing the Temporal Performance of Real-World Person Re-Identification |
Authors | Srikrishna Karanam, Eric Lam, Richard J. Radke |
Abstract | Designing useful person re-identification systems for real-world applications requires attention to operational aspects not typically considered in academic research. Here, we focus on the temporal aspect of re-identification; that is, instead of finding a match to a probe person of interest in a fixed candidate gallery, we consider the more realistic scenario in which the gallery is continuously populated by new candidates over a long time period. A key question of interest for an operator of such a system is: how long is a correct match to a probe likely to remain in a rank-k shortlist of possible candidates? We propose to distill this information into a Rank Persistence Curve (RPC), which allows different algorithms’ temporal performance characteristics to be directly compared. We present examples to illustrate the RPC using a new long-term dataset with multiple candidate reappearances, and discuss considerations for future re-identification research that explicitly involves temporal aspects. |
Tasks | Person Re-Identification |
Published | 2017-06-02 |
URL | http://arxiv.org/abs/1706.00553v2 |
http://arxiv.org/pdf/1706.00553v2.pdf | |
PWC | https://paperswithcode.com/paper/rank-persistence-assessing-the-temporal |
Repo | |
Framework | |
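As described in the abstract, an RPC can be computed from the rank of each probe's correct match at successive times after it first appears. The sketch below assumes a match is considered lost once it first leaves the top-k shortlist; the authors' exact definition may differ:

```python
import numpy as np

def rank_persistence_curve(rank_traces, k):
    """Sketch of a Rank Persistence Curve: given, for each probe, the
    rank of its correct match at successive time steps, return the
    fraction of probes whose match is still within the top-k shortlist
    at each time step.

    rank_traces: array-like of shape (num_probes, num_time_steps).
    """
    within = np.asarray(rank_traces) <= k
    # once the match falls out of the top-k it is considered lost for good
    persisted = np.logical_and.accumulate(within, axis=1)
    return persisted.mean(axis=0)
```

Plotting this curve for two re-identification algorithms directly compares how long each keeps the correct candidate visible to an operator as the gallery grows.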