Paper Group ANR 518
Image Identification Using SIFT Algorithm: Performance Analysis against Different Image Deformations
Title | Image Identification Using SIFT Algorithm: Performance Analysis against Different Image Deformations |
Authors | Ebrahim Karami, Mohamed Shehata, Andrew Smith |
Abstract | Image identification is one of the most challenging tasks in different areas of computer vision. The scale-invariant feature transform (SIFT) is an algorithm that detects and describes local features in images for further use as an image matching criterion. In this paper, the performance of the SIFT matching algorithm against various image distortions such as rotation, scaling, fisheye and motion distortion is evaluated, and false and true positive rates for a large number of image pairs are calculated and presented. We also evaluate the distribution of the matched keypoint orientation difference for each image deformation. |
Tasks | |
Published | 2017-10-07 |
URL | http://arxiv.org/abs/1710.02728v2 |
http://arxiv.org/pdf/1710.02728v2.pdf | |
PWC | https://paperswithcode.com/paper/image-identification-using-sift-algorithm |
Repo | |
Framework | |
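The matching evaluated in the abstract typically relies on Lowe's ratio test: a keypoint correspondence is kept only when its nearest descriptor in the other image is clearly closer than the second-nearest. A minimal numpy sketch, not the authors' implementation; the toy descriptor arrays and the 0.75 threshold are illustrative assumptions:

```python
import numpy as np

def ratio_test_matches(desc_a, desc_b, ratio=0.75):
    """Match descriptors from image A to image B with Lowe's ratio test.

    For each descriptor in A, find its two nearest neighbours in B
    (Euclidean distance) and keep the match only if the closest one is
    clearly better than the runner-up. Returns (index_a, index_b) pairs."""
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        nn = np.argsort(dists)[:2]
        if dists[nn[0]] < ratio * dists[nn[1]]:
            matches.append((i, int(nn[0])))
    return matches

# Toy descriptors: rows 0 and 1 of A have clear counterparts in B;
# row 2 is ambiguous (two near-identical candidates) and is rejected.
A = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
B = np.array([[1.0, 0.1], [0.1, 1.0], [0.5, 0.45], [0.5, 0.55]])
good = ratio_test_matches(A, B)
```

The rejected third row is exactly the case the ratio test exists for: under repetitive texture or heavy distortion, the nearest neighbour is no longer distinctive.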
A fully end-to-end deep learning approach for real-time simultaneous 3D reconstruction and material recognition
Title | A fully end-to-end deep learning approach for real-time simultaneous 3D reconstruction and material recognition |
Authors | Cheng Zhao, Li Sun, Rustam Stolkin |
Abstract | This paper addresses the problem of simultaneous 3D reconstruction and material recognition and segmentation. Enabling robots to recognise different materials (concrete, metal etc.) in a scene is important for many tasks, e.g. robotic interventions in nuclear decommissioning. Previous work on 3D semantic reconstruction has predominantly focused on recognition of everyday domestic objects (tables, chairs etc.), whereas previous work on material recognition has largely been confined to single 2D images without any 3D reconstruction. Meanwhile, most 3D semantic reconstruction methods rely on computationally expensive post-processing, using Fully-Connected Conditional Random Fields (CRFs), to achieve consistent segmentations. In contrast, we propose a deep learning method which performs 3D reconstruction while simultaneously recognising different types of materials and labelling them at the pixel level. Unlike previous methods, we propose a fully end-to-end approach, which does not require hand-crafted features or CRF post-processing. Instead, we use only learned features, and the CRF segmentation constraints are incorporated inside the fully end-to-end learned system. We present the results of experiments, in which we trained our system to perform real-time 3D semantic reconstruction for 23 different materials in a real-world application. The run-time performance of the system can be boosted to around 10 Hz, using a conventional GPU, which is enough to achieve real-time semantic reconstruction using a 30 fps RGB-D camera. To the best of our knowledge, this work is the first real-time end-to-end system for simultaneous 3D reconstruction and material recognition. |
Tasks | 3D Reconstruction, Material Recognition |
Published | 2017-03-14 |
URL | http://arxiv.org/abs/1703.04699v1 |
http://arxiv.org/pdf/1703.04699v1.pdf | |
PWC | https://paperswithcode.com/paper/a-fully-end-to-end-deep-learning-approach-for |
Repo | |
Framework | |
Joint Learning from Earth Observation and OpenStreetMap Data to Get Faster Better Semantic Maps
Title | Joint Learning from Earth Observation and OpenStreetMap Data to Get Faster Better Semantic Maps |
Authors | Nicolas Audebert, Bertrand Le Saux, Sébastien Lefèvre |
Abstract | In this work, we investigate the use of OpenStreetMap data for semantic labeling of Earth Observation images. Deep neural networks have been used in the past for remote sensing data classification from various sensors, including multispectral, hyperspectral, SAR and LiDAR data. While OpenStreetMap has already been used as ground truth data for training such networks, this abundant data source remains rarely exploited as an input information layer. In this paper, we study different use cases and deep network architectures to leverage OpenStreetMap data for semantic labeling of aerial and satellite images. In particular, we look into fusion-based architectures and coarse-to-fine segmentation to include the OpenStreetMap layer into multispectral-based deep fully convolutional networks. We illustrate how these methods can be successfully used on two public datasets: ISPRS Potsdam and DFC2017. We show that OpenStreetMap data can efficiently be integrated into the vision-based deep learning models and that it significantly improves both the accuracy and the convergence speed of the networks. |
Tasks | |
Published | 2017-05-17 |
URL | http://arxiv.org/abs/1705.06057v1 |
http://arxiv.org/pdf/1705.06057v1.pdf | |
PWC | https://paperswithcode.com/paper/joint-learning-from-earth-observation-and |
Repo | |
Framework | |
Beyond Log-concavity: Provable Guarantees for Sampling Multi-modal Distributions using Simulated Tempering Langevin Monte Carlo
Title | Beyond Log-concavity: Provable Guarantees for Sampling Multi-modal Distributions using Simulated Tempering Langevin Monte Carlo |
Authors | Rong Ge, Holden Lee, Andrej Risteski |
Abstract | A key task in Bayesian statistics is sampling from distributions that are only specified up to a partition function (i.e., constant of proportionality). However, without any assumptions, sampling (even approximately) can be #P-hard, and few works have provided “beyond worst-case” guarantees for such settings. For log-concave distributions, classical results going back to Bakry and Émery (1985) show that natural continuous-time Markov chains called Langevin diffusions mix in polynomial time. The most salient feature of log-concavity violated in practice is uni-modality: commonly, the distributions we wish to sample from are multi-modal. In the presence of multiple deep and well-separated modes, Langevin diffusion suffers from torpid mixing. We address this problem by combining Langevin diffusion with simulated tempering. The result is a Markov chain that mixes more rapidly by transitioning between different temperatures of the distribution. We analyze this Markov chain for the canonical multi-modal distribution: a mixture of Gaussians (of equal variance). The algorithm based on our Markov chain provably samples from distributions that are close to mixtures of Gaussians, given access to the gradient of the log-pdf. For the analysis, we use a spectral decomposition theorem for graphs (Gharan and Trevisan, 2014) and a Markov chain decomposition technique (Madras and Randall, 2002). |
Tasks | |
Published | 2017-10-07 |
URL | http://arxiv.org/abs/1710.02736v2 |
http://arxiv.org/pdf/1710.02736v2.pdf | |
PWC | https://paperswithcode.com/paper/beyond-log-concavity-provable-guarantees-for |
Repo | |
Framework | |
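The chain the abstract analyzes can be sketched as discretized Langevin steps at the current inverse temperature, with occasional Metropolis-accepted moves along a temperature ladder. The following 1-D toy is my own illustrative sketch (the mixture, ladder, step size and swap schedule are assumptions, not the paper's analyzed settings), targeting an equal-weight mixture of two unit-variance Gaussians:

```python
import numpy as np

rng = np.random.default_rng(0)

def log_p(x):
    """Unnormalised log-density of 0.5*N(-4, 1) + 0.5*N(+4, 1)."""
    return np.logaddexp(-(x + 4) ** 2 / 2, -(x - 4) ** 2 / 2)

def grad_log_p(x):
    """Gradient of log_p: responsibility-weighted pull towards each mode."""
    w = 0.5 * (1.0 + np.tanh(4.0 * x))   # posterior weight of the +4 mode
    return (1 - w) * (-4 - x) + w * (4 - x)

def tempered_langevin(n_steps=20000, h=0.05, betas=(1.0, 0.3, 0.1)):
    """Langevin steps at the current inverse temperature beta, with a
    Metropolis-accepted proposal every 10 steps to move one level along the
    temperature ladder. Only level-0 (beta = 1) states are recorded as
    samples from the target."""
    x, level = 0.0, 0
    samples = []
    for t in range(n_steps):
        beta = betas[level]
        x += 0.5 * h * beta * grad_log_p(x) + np.sqrt(h) * rng.standard_normal()
        if t % 10 == 0:
            new = level + rng.choice([-1, 1])
            if 0 <= new < len(betas):
                if np.log(rng.random()) < (betas[new] - betas[level]) * log_p(x):
                    level = new
        if level == 0:
            samples.append(x)
    return np.array(samples)

s = tempered_langevin()
```

At high temperature the flattened density lets the walker drift between the two basins; the swap moves then carry those crossings back down to the target temperature, which is exactly the torpid-mixing fix the abstract describes.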
Multidimensional Data Tensor Sensing for RF Tomographic Imaging
Title | Multidimensional Data Tensor Sensing for RF Tomographic Imaging |
Authors | Tao Deng, Xiao-Yang Liu, Feng Qian, Anwar Walid |
Abstract | Radio-frequency (RF) tomographic imaging is a promising technique for inferring multi-dimensional physical space by processing RF signals traversed across a region of interest. However, conventional RF tomography schemes are generally based on vector compressed sensing, which ignores the geometric structures of the target spaces and leads to low recovery precision. The recently proposed transform-based tensor model is more appropriate for sensory data processing, as it helps exploit the geometric structures of the three-dimensional target and improve the recovery precision. In this paper, we propose a novel tensor sensing approach that achieves highly accurate estimation for real-world three-dimensional spaces. First, we use the transform-based tensor model to formulate a tensor sensing problem, and propose a fast alternating minimization algorithm called Alt-Min. Second, we derive an algorithm that is optimized to reduce memory and computation requirements. Finally, we present an evaluation of our Alt-Min approach using IKEA 3D data and demonstrate significant improvement in recovery error and convergence speed compared to prior tensor-based compressed sensing. |
Tasks | |
Published | 2017-12-13 |
URL | http://arxiv.org/abs/1712.04919v2 |
http://arxiv.org/pdf/1712.04919v2.pdf | |
PWC | https://paperswithcode.com/paper/multidimensional-data-tensor-sensing-for-rf |
Repo | |
Framework | |
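The paper's Alt-Min operates on the transform-based tensor model; as a simplified matrix analogue of the same idea, alternating ridge-regularised least squares over two low-rank factors conveys why the alternation is fast. Everything below (sizes, rank, sampling rate, the regulariser) is an illustrative assumption, not the paper's algorithm:

```python
import numpy as np

rng = np.random.default_rng(1)

# Ground truth: a rank-2 matrix, observed on a random ~60% mask.
U_true = rng.standard_normal((20, 2))
V_true = rng.standard_normal((15, 2))
M = U_true @ V_true.T
mask = rng.random(M.shape) < 0.6

def alt_min(M, mask, rank=2, n_iters=50, lam=1e-3):
    """Alternating minimisation: with one factor frozen, each row of the
    other factor solves a small ridge-regularised least-squares problem
    over that row's observed entries."""
    m, n = M.shape
    U = rng.standard_normal((m, rank))
    V = rng.standard_normal((n, rank))
    reg = lam * np.eye(rank)
    for _ in range(n_iters):
        for i in range(m):                      # update row factors
            A = V[mask[i]]
            U[i] = np.linalg.solve(A.T @ A + reg, A.T @ M[i, mask[i]])
        for j in range(n):                      # update column factors
            A = U[mask[:, j]]
            V[j] = np.linalg.solve(A.T @ A + reg, A.T @ M[mask[:, j], j])
    return U @ V.T

M_hat = alt_min(M, mask)
rel_err = np.linalg.norm(M_hat - M) / np.linalg.norm(M)
```

Each inner solve is only rank x rank, which is where the memory and computation savings the abstract mentions come from relative to vectorised compressed sensing.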
Towards Automated Cadastral Boundary Delineation from UAV Data
Title | Towards Automated Cadastral Boundary Delineation from UAV Data |
Authors | Sophie Crommelinck, Michael Ying Yang, Mila Koeva, Markus Gerke, Rohan Bennett, George Vosselman |
Abstract | Unmanned aerial vehicles (UAV) are evolving as an alternative tool to acquire land tenure data. UAVs can capture geospatial data at high quality and resolution in a cost-effective, transparent and flexible manner, from which visible land parcel boundaries, i.e., cadastral boundaries, are delineable. This delineation is currently not automated at all, even though physical objects automatically retrievable through image analysis methods mark a large portion of cadastral boundaries. This study proposes (i) a workflow that automatically extracts candidate cadastral boundaries from UAV orthoimages and (ii) a tool for their semi-automatic processing to delineate final cadastral boundaries. The workflow consists of two state-of-the-art computer vision methods, namely gPb contour detection and SLIC superpixels, that are transferred to remote sensing in this study. The tool combines the two methods, allows a semi-automatic final delineation and is implemented as a publicly available QGIS plugin. The approach does not yet aim to provide a comparable alternative to manual cadastral mapping procedures; however, this paper develops the methodology of the tool towards that goal. A study with 13 volunteers investigates the design and implementation of the approach and gathers initial qualitative as well as quantitative results. The study revealed points for improvement, which are prioritized based on the study results and will be addressed in future work. |
Tasks | Contour Detection |
Published | 2017-09-06 |
URL | http://arxiv.org/abs/1709.01813v1 |
http://arxiv.org/pdf/1709.01813v1.pdf | |
PWC | https://paperswithcode.com/paper/towards-automated-cadastral-boundary |
Repo | |
Framework | |
$k$-means as a variational EM approximation of Gaussian mixture models
Title | $k$-means as a variational EM approximation of Gaussian mixture models |
Authors | Jörg Lücke, Dennis Forster |
Abstract | We show that $k$-means (Lloyd’s algorithm) is obtained as a special case when truncated variational EM approximations are applied to Gaussian Mixture Models (GMM) with isotropic Gaussians. In contrast to the standard way to relate $k$-means and GMMs, the provided derivation shows that it is not required to consider Gaussians with small variances or the limit case of zero variances. There are a number of consequences that directly follow from our approach: (A) $k$-means can be shown to increase a free energy associated with truncated distributions and this free energy can directly be reformulated in terms of the $k$-means objective; (B) $k$-means generalizations can directly be derived by considering the 2nd closest, 3rd closest etc. cluster in addition to just the closest one; and (C) the embedding of $k$-means into a free energy framework allows for theoretical interpretations of other $k$-means generalizations in the literature. In general, truncated variational EM provides a natural and rigorous quantitative link between $k$-means-like clustering and GMM clustering algorithms which may be very relevant for future theoretical and empirical studies. |
Tasks | |
Published | 2017-04-16 |
URL | https://arxiv.org/abs/1704.04812v5 |
https://arxiv.org/pdf/1704.04812v5.pdf | |
PWC | https://paperswithcode.com/paper/k-means-as-a-variational-em-approximation-of |
Repo | |
Framework | |
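The correspondence the abstract describes can be made concrete in a few lines: if the E-step posterior of an isotropic GMM is truncated to a delta on the single closest cluster, the truncated E-step is exactly the k-means assignment and the M-step for the means is the centroid update, with no zero-variance limit needed. A minimal sketch (cluster weights and the shared variance are omitted from the updates for brevity; the data and initialisation are illustrative):

```python
import numpy as np

def kmeans_as_truncated_em(X, mu_init, n_iters=20):
    """EM for an isotropic GMM whose E-step posterior is truncated to a
    delta on the single closest cluster. The truncated E-step is then the
    k-means assignment step, and the M-step for the means is the k-means
    centroid update."""
    mu = mu_init.astype(float).copy()
    for _ in range(n_iters):
        # Truncated E-step: all responsibility goes to the nearest mean.
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)
        z = d2.argmin(1)
        # M-step for the means (weights/variance updates omitted).
        for k in range(len(mu)):
            if (z == k).any():
                mu[k] = X[z == k].mean(0)
    return mu, z

# Two pairs of points, initialised with one mean in each pair.
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
mu, z = kmeans_as_truncated_em(X, X[[0, 2]])
```

On this toy input the loop converges in one iteration to the centroids (0, 0.5) and (10, 0.5), i.e. Lloyd's algorithm recovered as a special case of truncated variational EM.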
Image Registration for the Alignment of Digitized Historical Documents
Title | Image Registration for the Alignment of Digitized Historical Documents |
Authors | AmirAbbas Davari, Tobias Lindenberger, Armin Häberle, Vincent Christlein, Andreas Maier, Christian Riess |
Abstract | In this work, we conducted a survey of different registration algorithms and investigated their suitability for hyperspectral historical image registration applications. After evaluating the different algorithms, we chose an intensity-based registration algorithm with a curved transformation model. For the transformation model, we selected cubic B-splines, since they should be capable of coping with all non-rigid deformations in our hyperspectral images. From a number of similarity measures, we found that residual complexity and localized mutual information are well suited for the task at hand. In our evaluation, both measures show an acceptable performance in handling all difficulties, e.g., capture range, non-stationary and spatially varying intensity distortions, or multi-modality, that occur in our application. |
Tasks | Image Registration |
Published | 2017-12-12 |
URL | http://arxiv.org/abs/1712.04482v1 |
http://arxiv.org/pdf/1712.04482v1.pdf | |
PWC | https://paperswithcode.com/paper/image-registration-for-the-alignment-of |
Repo | |
Framework | |
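Of the similarity measures the survey weighs, mutual information is the one suited to multi-modality: it only requires that intensities in the two images be statistically dependent, not linearly related. A global-MI sketch from a joint histogram (the paper uses a localized variant; the bin count and test images below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)

def mutual_information(a, b, bins=32):
    """Mutual information between the intensity distributions of two
    images, estimated from their joint histogram. High when one image's
    intensities predict the other's, even through a non-linear mapping."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    p = joint / joint.sum()
    outer = p.sum(1, keepdims=True) @ p.sum(0, keepdims=True)
    nz = p > 0                       # avoid log(0) on empty histogram cells
    return float((p[nz] * np.log(p[nz] / outer[nz])).sum())

img = rng.random((64, 64))
remapped = np.sqrt(img)                                      # non-linear but dependent
shuffled = rng.permutation(img.ravel()).reshape(img.shape)   # independent
```

The `remapped` image keeps a high MI score despite the non-linear intensity change, which is exactly why such measures cope with the spatially varying distortions listed in the abstract, whereas a correlation-based measure would not.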
Visually Aligned Word Embeddings for Improving Zero-shot Learning
Title | Visually Aligned Word Embeddings for Improving Zero-shot Learning |
Authors | Ruizhi Qiao, Lingqiao Liu, Chunhua Shen, Anton van den Hengel |
Abstract | Zero-shot learning (ZSL) highly depends on a good semantic embedding to connect the seen and unseen classes. Recently, distributed word embeddings (DWE) pre-trained on large text corpora have become a popular choice to draw such a connection. Compared with human-defined attributes, DWEs are more scalable and easier to obtain. However, they are designed to reflect semantic similarity rather than visual similarity, and thus using them in ZSL often leads to inferior performance. To overcome this visual-semantic discrepancy, this work proposes an objective function to re-align the distributed word embeddings with visual information by learning a neural network that maps them into a new representation called the visually aligned word embedding (VAWE), so that the neighbourhood structure of VAWEs becomes similar to that in the visual domain. Note that in this work we do not design a ZSL method that projects the visual features and semantic embeddings onto a shared space, but merely impose a requirement on the structure of the mapped word embeddings. This strategy allows the learned VAWEs to generalize to various ZSL methods and visual features. As evaluated via four state-of-the-art ZSL methods on four benchmark datasets, the VAWEs exhibit consistent performance improvement. |
Tasks | Semantic Similarity, Semantic Textual Similarity, Word Embeddings, Zero-Shot Learning |
Published | 2017-07-18 |
URL | http://arxiv.org/abs/1707.05427v1 |
http://arxiv.org/pdf/1707.05427v1.pdf | |
PWC | https://paperswithcode.com/paper/visually-aligned-word-embeddings-for |
Repo | |
Framework | |
You Are How You Walk: Uncooperative MoCap Gait Identification for Video Surveillance with Incomplete and Noisy Data
Title | You Are How You Walk: Uncooperative MoCap Gait Identification for Video Surveillance with Incomplete and Noisy Data |
Authors | Michal Balazia, Petr Sojka |
Abstract | This work offers a design of a video surveillance system based on a soft biometric – gait identification from MoCap data. The main focus is on two substantial issues of the video surveillance scenario: (1) the walkers do not cooperate in providing learning data to establish their identities and (2) the data are often noisy or incomplete. We show that only a few examples of human gait cycles are required to learn a projection of raw MoCap data onto a low-dimensional sub-space where the identities are well separable. Latent features learned by Maximum Margin Criterion (MMC) method discriminate better than any collection of geometric features. The MMC method is also highly robust to noisy data and works properly even with only a fraction of joints tracked. The overall workflow of the design is directly applicable for a day-to-day operation based on the available MoCap technology and algorithms for gait analysis. In the concept we introduce, a walker’s identity is represented by a cluster of gait data collected at their incidents within the surveillance system: They are how they walk. |
Tasks | Gait Identification |
Published | 2017-06-28 |
URL | http://arxiv.org/abs/1706.09443v2 |
http://arxiv.org/pdf/1706.09443v2.pdf | |
PWC | https://paperswithcode.com/paper/you-are-how-you-walk-uncooperative-mocap-gait |
Repo | |
Framework | |
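The Maximum Margin Criterion features mentioned in the abstract come from maximizing tr(W^T (S_b - S_w) W), which is solved by the top eigenvectors of the between-class minus within-class scatter; unlike LDA, no inversion of S_w is required, so the method tolerates singular scatter from sparse, noisy data. A sketch on synthetic data (the two "walker" blobs and dimensions below are illustrative stand-ins for real gait features):

```python
import numpy as np

rng = np.random.default_rng(3)

def mmc_projection(X, y, out_dim):
    """Maximum Margin Criterion: project onto the top eigenvectors of
    (S_b - S_w). No inversion of S_w is needed, so the projection stays
    well-defined even when S_w is singular (few samples, many dims)."""
    mean = X.mean(0)
    d = X.shape[1]
    Sb = np.zeros((d, d))
    Sw = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        diff = (Xc.mean(0) - mean)[:, None]
        Sb += len(Xc) * (diff @ diff.T)              # between-class scatter
        Sw += (Xc - Xc.mean(0)).T @ (Xc - Xc.mean(0))  # within-class scatter
    vals, vecs = np.linalg.eigh(Sb - Sw)   # eigenvalues in ascending order
    return vecs[:, ::-1][:, :out_dim]      # keep the largest ones

# Two synthetic "walkers", separated along dimension 0, noise elsewhere.
X = np.vstack([rng.standard_normal((40, 5)) + [6, 0, 0, 0, 0],
               rng.standard_normal((40, 5)) - [6, 0, 0, 0, 0]])
y = np.array([0] * 40 + [1] * 40)
W = mmc_projection(X, y, out_dim=1)
proj = X @ W
```

The learned one-dimensional projection keeps the two identities well separated, which is the "low-dimensional sub-space where the identities are well separable" of the abstract.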
Pose-Normalized Image Generation for Person Re-identification
Title | Pose-Normalized Image Generation for Person Re-identification |
Authors | Xuelin Qian, Yanwei Fu, Tao Xiang, Wenxuan Wang, Jie Qiu, Yang Wu, Yu-Gang Jiang, Xiangyang Xue |
Abstract | Person Re-identification (re-id) faces two major challenges: the lack of cross-view paired training data and learning discriminative identity-sensitive and view-invariant features in the presence of large pose variations. In this work, we address both problems by proposing a novel deep person image generation model for synthesizing realistic person images conditional on the pose. The model is based on a generative adversarial network (GAN) designed specifically for pose normalization in re-id, thus termed pose-normalization GAN (PN-GAN). With the synthesized images, we can learn a new type of deep re-id feature free of the influence of pose variations. We show that this feature is strong on its own and complementary to features learned with the original images. Importantly, under the transfer learning setting, we show that our model generalizes well to any new re-id dataset without the need for collecting any training data for model fine-tuning. The model thus has the potential to make re-id models truly scalable. |
Tasks | Image Generation, Person Re-Identification, Transfer Learning |
Published | 2017-12-06 |
URL | http://arxiv.org/abs/1712.02225v6 |
http://arxiv.org/pdf/1712.02225v6.pdf | |
PWC | https://paperswithcode.com/paper/pose-normalized-image-generation-for-person |
Repo | |
Framework | |
Automatic Extraction of Commonsense LocatedNear Knowledge
Title | Automatic Extraction of Commonsense LocatedNear Knowledge |
Authors | Frank F. Xu, Bill Yuchen Lin, Kenny Q. Zhu |
Abstract | The LocatedNear relation is a kind of commonsense knowledge describing two physical objects that are typically found near each other in real life. In this paper, we study how to automatically extract this relationship through a sentence-level relation classifier and by aggregating the scores of entity pairs from a large corpus. We also release two benchmark datasets for evaluation and future research. |
Tasks | |
Published | 2017-11-11 |
URL | http://arxiv.org/abs/1711.04204v3 |
http://arxiv.org/pdf/1711.04204v3.pdf | |
PWC | https://paperswithcode.com/paper/automatic-extraction-of-commonsense |
Repo | |
Framework | |
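The second stage the abstract describes, turning per-sentence classifier scores into corpus-level pair scores, can be sketched as a simple accumulation. The summation rule, the symmetric pair key, and the minimum-mention threshold below are illustrative assumptions, not necessarily the paper's exact scoring function:

```python
from collections import defaultdict

def aggregate_pair_scores(scored_sentences, min_mentions=2):
    """Aggregate sentence-level classifier confidences into a corpus-level
    LocatedNear score per object pair: sum the per-sentence scores, keeping
    only pairs mentioned often enough to be reliable.

    `scored_sentences` yields ((obj_a, obj_b), confidence) items, where
    confidence is the sentence classifier's probability that the sentence
    expresses a LocatedNear relation."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for pair, conf in scored_sentences:
        key = tuple(sorted(pair))          # LocatedNear is symmetric
        totals[key] += conf
        counts[key] += 1
    return {k: totals[k] for k in totals if counts[k] >= min_mentions}

# Toy corpus evidence: two sentences support (sofa, table), one a spurious pair.
evidence = [(("sofa", "table"), 0.9), (("table", "sofa"), 0.7),
            (("sofa", "rocket"), 0.8)]
scores = aggregate_pair_scores(evidence)
```

Aggregating over many sentences is what lets a noisy sentence-level classifier still yield clean pair-level knowledge: isolated high-confidence errors are filtered out by the mention threshold.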
Multiplex model of mental lexicon reveals explosive learning in humans
Title | Multiplex model of mental lexicon reveals explosive learning in humans |
Authors | Massimo Stella, Nicole M. Beckage, Markus Brede, Manlio De Domenico |
Abstract | Word similarities affect language acquisition and use in a multi-relational way barely accounted for in the literature. We propose a multiplex network representation of this mental lexicon of word similarities as a natural framework for investigating large-scale cognitive patterns. Our representation accounts for semantic, taxonomic, and phonological interactions and it identifies a cluster of words which are used with greater frequency, are identified, memorised, and learned more easily, and have more meanings than expected at random. This cluster emerges around age 7 through an explosive transition not reproduced by null models. We relate this explosive emergence to polysemy – redundancy in word meanings. Results indicate that the word cluster acts as a core for the lexicon, increasing both lexical navigability and robustness to linguistic degradation. Our findings provide quantitative confirmation of existing conjectures about core structure in the mental lexicon and the importance of integrating multi-relational word-word interactions in psycholinguistic frameworks. |
Tasks | Language Acquisition |
Published | 2017-05-26 |
URL | http://arxiv.org/abs/1705.09731v3 |
http://arxiv.org/pdf/1705.09731v3.pdf | |
PWC | https://paperswithcode.com/paper/multiplex-model-of-mental-lexicon-reveals |
Repo | |
Framework | |
A Pig, an Angel and a Cactus Walk Into a Blender: A Descriptive Approach to Visual Blending
Title | A Pig, an Angel and a Cactus Walk Into a Blender: A Descriptive Approach to Visual Blending |
Authors | João M. Cunha, João Gonçalves, Pedro Martins, Penousal Machado, Amílcar Cardoso |
Abstract | A descriptive approach for automatic generation of visual blends is presented. The implemented system, the Blender, is composed of two components: the Mapper and the Visual Blender. The approach uses structured visual representations along with sets of visual relations which describe how the elements (in which the visual representation can be decomposed) relate among each other. Our system is a hybrid blender, as the blending process starts at the Mapper (conceptual level) and ends at the Visual Blender (visual representation level). The experimental results show that the Blender is able to create analogies from input mental spaces and produce well-composed blends, which follow the rules imposed by its base-analogy and its relations. The resulting blends are visually interesting and some can be considered as unexpected. |
Tasks | |
Published | 2017-06-27 |
URL | http://arxiv.org/abs/1706.09076v3 |
http://arxiv.org/pdf/1706.09076v3.pdf | |
PWC | https://paperswithcode.com/paper/a-pig-an-angel-and-a-cactus-walk-into-a |
Repo | |
Framework | |
Brief Notes on Hard Takeoff, Value Alignment, and Coherent Extrapolated Volition
Title | Brief Notes on Hard Takeoff, Value Alignment, and Coherent Extrapolated Volition |
Authors | Gopal P. Sarma |
Abstract | I make some basic observations about hard takeoff, value alignment, and coherent extrapolated volition, concepts which have been central in analyses of superintelligent AI systems. |
Tasks | |
Published | 2017-04-03 |
URL | http://arxiv.org/abs/1704.00783v2 |
http://arxiv.org/pdf/1704.00783v2.pdf | |
PWC | https://paperswithcode.com/paper/brief-notes-on-hard-takeoff-value-alignment |
Repo | |
Framework | |