Paper Group ANR 303
Graph-Based Classification of Omnidirectional Images
Title | Graph-Based Classification of Omnidirectional Images |
Authors | Renata Khasanova, Pascal Frossard |
Abstract | Omnidirectional cameras are widely used in areas such as robotics and virtual reality because they provide a wide field of view. Their images are often processed with classical methods, which can unfortunately lead to non-optimal solutions, as these methods are designed for planar images that have different geometrical properties than omnidirectional ones. In this paper we study the image classification task by taking into account the specific geometry of omnidirectional cameras with graph-based representations. In particular, we extend deep learning architectures to data on graphs; we propose a principled way of graph construction such that convolutional filters respond similarly for the same pattern on different positions of the image regardless of lens distortions. Our experiments show that the proposed method outperforms current techniques for the omnidirectional image classification problem. |
Tasks | graph construction, Image Classification |
Published | 2017-07-26 |
URL | http://arxiv.org/abs/1707.08301v1 |
http://arxiv.org/pdf/1707.08301v1.pdf | |
PWC | https://paperswithcode.com/paper/graph-based-classification-of-omnidirectional |
Repo | |
Framework | |
Classification of Aerial Photogrammetric 3D Point Clouds
Title | Classification of Aerial Photogrammetric 3D Point Clouds |
Authors | Carlos Becker, Nicolai Häni, Elena Rosinskaya, Emmanuel d’Angelo, Christoph Strecha |
Abstract | We present a powerful method to extract per-point semantic class labels from aerial photogrammetry data. Labeling this kind of data is important for tasks such as environmental modelling, object classification and scene understanding. Unlike previous point cloud classification methods that rely exclusively on geometric features, we show that incorporating color information yields a significant increase in accuracy in detecting semantic classes. We test our classification method on three real-world photogrammetry datasets that were generated with Pix4Dmapper Pro, and with varying point densities. We show that off-the-shelf machine learning techniques coupled with our new features allow us to train highly accurate classifiers that generalize well to unseen data, processing point clouds containing 10 million points in less than 3 minutes on a desktop computer. |
Tasks | Object Classification, Scene Understanding |
Published | 2017-05-23 |
URL | http://arxiv.org/abs/1705.08374v1 |
http://arxiv.org/pdf/1705.08374v1.pdf | |
PWC | https://paperswithcode.com/paper/classification-of-aerial-photogrammetric-3d |
Repo | |
Framework | |
How hard can it be? Estimating the difficulty of visual search in an image
Title | How hard can it be? Estimating the difficulty of visual search in an image |
Authors | Radu Tudor Ionescu, Bogdan Alexe, Marius Leordeanu, Marius Popescu, Dim P. Papadopoulos, Vittorio Ferrari |
Abstract | We address the problem of estimating image difficulty, defined as the human response time for solving a visual search task. We collect human annotations of image difficulty for the PASCAL VOC 2012 data set through a crowd-sourcing platform. We then analyze which human-interpretable image properties can have an impact on visual search difficulty, and how accurate those properties are for predicting difficulty. Next, we build a regression model based on deep features learned with state-of-the-art convolutional neural networks and show better results for predicting the ground-truth visual search difficulty scores produced by human annotators. Our model is able to correctly rank about 75% of image pairs according to their difficulty score. We also show that our difficulty predictor generalizes well to new classes not seen during training. Finally, we demonstrate that our predicted difficulty scores are useful for weakly supervised object localization (8% improvement) and semi-supervised object classification (1% improvement). |
Tasks | Object Classification, Object Localization, Weakly-Supervised Object Localization |
Published | 2017-05-23 |
URL | http://arxiv.org/abs/1705.08280v1 |
http://arxiv.org/pdf/1705.08280v1.pdf | |
PWC | https://paperswithcode.com/paper/how-hard-can-it-be-estimating-the-difficulty |
Repo | |
Framework | |
Improving content marketing processes with the approaches by artificial intelligence
Title | Improving content marketing processes with the approaches by artificial intelligence |
Authors | Utku Kose, Selcuk Sert |
Abstract | Content marketing is one of today's most remarkable approaches in the marketing processes of companies. The value of this kind of marketing has grown over time, thanks to the latest developments in computer and communication technologies. Nowadays, social media platforms in particular play an important role in enabling companies to design multimedia-oriented, interactive content. On the other hand, there is still more to do to improve content marketing approaches. In this context, the objective of this study is to focus on intelligent content marketing, which can be done by using artificial intelligence. Artificial intelligence is one of today's most remarkable research fields, and it can readily be applied in a multidisciplinary manner. This study therefore aims to discuss its potential for improving content marketing. In detail, the study enables readers to improve their awareness of the intersection of content marketing and artificial intelligence. Furthermore, the authors introduce some example models of intelligent content marketing, which can be achieved by using current Web technologies and artificial intelligence techniques. |
Tasks | |
Published | 2017-04-07 |
URL | http://arxiv.org/abs/1704.02114v1 |
http://arxiv.org/pdf/1704.02114v1.pdf | |
PWC | https://paperswithcode.com/paper/improving-content-marketing-processes-with |
Repo | |
Framework | |
Convolutional Spike Timing Dependent Plasticity based Feature Learning in Spiking Neural Networks
Title | Convolutional Spike Timing Dependent Plasticity based Feature Learning in Spiking Neural Networks |
Authors | Priyadarshini Panda, Gopalakrishnan Srinivasan, Kaushik Roy |
Abstract | Brain-inspired learning models attempt to mimic the cortical architecture and computations performed in the neurons and synapses constituting the human brain to achieve its efficiency in cognitive tasks. In this work, we present convolutional spike timing dependent plasticity based feature learning with biologically plausible leaky-integrate-and-fire neurons in Spiking Neural Networks (SNNs). We use shared weight kernels that are trained to encode representative features underlying the input patterns thereby improving the sparsity as well as the robustness of the learning model. We demonstrate that the proposed unsupervised learning methodology learns several visual categories for object recognition with fewer number of examples and outperforms traditional fully-connected SNN architectures while yielding competitive accuracy. Additionally, we observe that the learning model performs out-of-set generalization further making the proposed biologically plausible framework a viable and efficient architecture for future neuromorphic applications. |
Tasks | Object Recognition |
Published | 2017-03-10 |
URL | http://arxiv.org/abs/1703.03854v2 |
http://arxiv.org/pdf/1703.03854v2.pdf | |
PWC | https://paperswithcode.com/paper/convolutional-spike-timing-dependent |
Repo | |
Framework | |
Revisiting Graph Construction for Fast Image Segmentation
Title | Revisiting Graph Construction for Fast Image Segmentation |
Authors | Zizhao Zhang, Fuyong Xing, Hanzi Wang, Yan Yan, Ying Huang, Xiaoshuang Shi, Lin Yang |
Abstract | In this paper, we propose a simple but effective method for fast image segmentation. We re-examine the locality-preserving character of spectral clustering by constructing a graph over image regions with both global and local connections. Our novel approach to build graph connections relies on two key observations: 1) local region pairs that co-occur frequently will have a high probability to reside on a common object; 2) spatially distant regions in a common object often exhibit similar visual saliency, which implies their neighborship in a manifold. We present a novel energy function to efficiently conduct graph partitioning. Based on multiple high quality partitions, we show that the generated eigenvector histogram based representation can automatically drive effective unary potentials for a hierarchical random field model to produce multi-class segmentation. Sufficient experiments, on the BSDS500 benchmark, large-scale PASCAL VOC and COCO datasets, demonstrate the competitive segmentation accuracy and significantly improved efficiency of our proposed method compared with other state of the arts. |
Tasks | graph construction, graph partitioning, Semantic Segmentation |
Published | 2017-02-18 |
URL | http://arxiv.org/abs/1702.05650v2 |
http://arxiv.org/pdf/1702.05650v2.pdf | |
PWC | https://paperswithcode.com/paper/revisiting-graph-construction-for-fast-image |
Repo | |
Framework | |
A Family of Metrics for Clustering Algorithms
Title | A Family of Metrics for Clustering Algorithms |
Authors | Clark Alexander, Sofya Akhmametyeva |
Abstract | We give the motivation for scoring clustering algorithms and a metric $M : A \rightarrow \mathbb{N}$ from the set of clustering algorithms to the natural numbers which we realize as \begin{equation} M(A) = \sum_i \alpha_i f_i - \beta_i^{w_i} \end{equation} where $\alpha_i,\beta_i,w_i$ are parameters used for scoring the feature $f_i$, which is computed empirically. We give a method by which one can score features such as stability, noise sensitivity, etc., and derive the necessary parameters. We conclude by giving a sample set of scores. |
Tasks | |
Published | 2017-07-27 |
URL | http://arxiv.org/abs/1707.08912v1 |
http://arxiv.org/pdf/1707.08912v1.pdf | |
PWC | https://paperswithcode.com/paper/a-family-of-metrics-for-clustering-algorithms |
Repo | |
Framework | |
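The scoring formula in the abstract above can be illustrated with a minimal Python sketch. It assumes that the sum ranges over both terms, i.e. $M(A) = \sum_i (\alpha_i f_i - \beta_i^{w_i})$, which the abstract does not state explicitly. The function name `clustering_score` and the sample numbers are hypothetical and are not taken from the paper; the sketch also does not map the result to a natural number.

```python
from typing import Sequence


def clustering_score(
    features: Sequence[float],
    alphas: Sequence[float],
    betas: Sequence[float],
    weights: Sequence[float],
) -> float:
    """Evaluate sum_i (alpha_i * f_i - beta_i ** w_i).

    Assumption: the summation covers both terms of the formula.
    """
    if not (len(features) == len(alphas) == len(betas) == len(weights)):
        raise ValueError("all parameter lists must have the same length")
    return sum(
        a * f - b ** w
        for f, a, b, w in zip(features, alphas, betas, weights)
    )


# Hypothetical example: two empirically measured features
# (e.g. stability and noise sensitivity) with made-up parameters.
score = clustering_score(
    features=[0.9, 0.5],
    alphas=[10.0, 4.0],
    betas=[2.0, 1.0],
    weights=[1.0, 3.0],
)
```

In this reading, each feature contributes a reward term scaled by $\alpha_i$ and a penalty term $\beta_i^{w_i}$ that does not depend on the measured feature value.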
Deep Generalized Canonical Correlation Analysis
Title | Deep Generalized Canonical Correlation Analysis |
Authors | Adrian Benton, Huda Khayrallah, Biman Gujral, Dee Ann Reisinger, Sheng Zhang, Raman Arora |
Abstract | We present Deep Generalized Canonical Correlation Analysis (DGCCA) – a method for learning nonlinear transformations of arbitrarily many views of data, such that the resulting transformations are maximally informative of each other. While methods for nonlinear two-view representation learning (Deep CCA, (Andrew et al., 2013)) and linear many-view representation learning (Generalized CCA (Horst, 1961)) exist, DGCCA is the first CCA-style multiview representation learning technique that combines the flexibility of nonlinear (deep) representation learning with the statistical power of incorporating information from many independent sources, or views. We present the DGCCA formulation as well as an efficient stochastic optimization algorithm for solving it. We learn DGCCA representations on two distinct datasets for three downstream tasks: phonetic transcription from acoustic and articulatory measurements, and recommending hashtags and friends on a dataset of Twitter users. We find that DGCCA representations soundly beat existing methods at phonetic transcription and hashtag recommendation, and in general perform no worse than standard linear many-view techniques. |
Tasks | Representation Learning, Stochastic Optimization |
Published | 2017-02-08 |
URL | http://arxiv.org/abs/1702.02519v2 |
http://arxiv.org/pdf/1702.02519v2.pdf | |
PWC | https://paperswithcode.com/paper/deep-generalized-canonical-correlation |
Repo | |
Framework | |
Learning-based Ensemble Average Propagator Estimation
Title | Learning-based Ensemble Average Propagator Estimation |
Authors | Chuyang Ye |
Abstract | By capturing the anisotropic water diffusion in tissue, diffusion magnetic resonance imaging (dMRI) provides a unique tool for noninvasively probing the tissue microstructure and orientation in the human brain. The diffusion profile can be described by the ensemble average propagator (EAP), which is inferred from observed diffusion signals. However, accurate EAP estimation using the number of diffusion gradients that is clinically practical can be challenging. In this work, we propose a deep learning algorithm for EAP estimation, which is named learning-based ensemble average propagator estimation (LEAPE). The EAP is commonly represented by a basis and its associated coefficients, and here we choose the SHORE basis and design a deep network to estimate the coefficients. The network comprises two cascaded components. The first component is a multiple layer perceptron (MLP) that simultaneously predicts the unknown coefficients. However, typical training loss functions, such as mean squared errors, may not properly represent the geometry of the possibly non-Euclidean space of the coefficients, which in particular causes problems for the extraction of directional information from the EAP. Therefore, to regularize the training, in the second component we compute an auxiliary output of approximated fiber orientation (FO) errors with the aid of a second MLP that is trained separately. We performed experiments using dMRI data that resemble clinically achievable $q$-space sampling, and observed promising results compared with the conventional EAP estimation method. |
Tasks | |
Published | 2017-06-20 |
URL | http://arxiv.org/abs/1706.06258v1 |
http://arxiv.org/pdf/1706.06258v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-based-ensemble-average-propagator |
Repo | |
Framework | |
A Spacetime Approach to Generalized Cognitive Reasoning in Multi-scale Learning
Title | A Spacetime Approach to Generalized Cognitive Reasoning in Multi-scale Learning |
Authors | Mark Burgess |
Abstract | In modern machine learning, pattern recognition replaces realtime semantic reasoning. The mapping from input to output is learned with fixed semantics by training outcomes deliberately. This is an expensive and static approach which depends heavily on the availability of a very particular kind of prior training data to make inferences in a single step. Conventional semantic network approaches, on the other hand, base multi-step reasoning on modal logics and handcrafted ontologies, which are ad hoc, expensive to construct, and fragile to inconsistency. Both approaches may be enhanced by a hybrid approach, which completely separates reasoning from pattern recognition. In this report, a quasi-linguistic approach to knowledge representation is discussed, motivated by spacetime structure. Tokenized patterns from diverse sources are integrated to build a lightly constrained and approximately scale-free network. This is then parsed with very simple recursive algorithms to generate 'brainstorming' sets of reasoned knowledge. |
Tasks | |
Published | 2017-02-12 |
URL | http://arxiv.org/abs/1702.04638v2 |
http://arxiv.org/pdf/1702.04638v2.pdf | |
PWC | https://paperswithcode.com/paper/a-spacetime-approach-to-generalized-cognitive |
Repo | |
Framework | |
GPLAC: Generalizing Vision-Based Robotic Skills using Weakly Labeled Images
Title | GPLAC: Generalizing Vision-Based Robotic Skills using Weakly Labeled Images |
Authors | Avi Singh, Larry Yang, Sergey Levine |
Abstract | We tackle the problem of learning robotic sensorimotor control policies that can generalize to visually diverse and unseen environments. Achieving broad generalization typically requires large datasets, which are difficult to obtain for task-specific interactive processes such as reinforcement learning or learning from demonstration. However, much of the visual diversity in the world can be captured through passively collected datasets of images or videos. In our method, which we refer to as GPLAC (Generalized Policy Learning with Attentional Classifier), we use both interaction data and weakly labeled image data to augment the generalization capacity of sensorimotor policies. Our method combines multitask learning on action selection and an auxiliary binary classification objective, together with a convolutional neural network architecture that uses an attentional mechanism to avoid distractors. We show that pairing interaction data from just a single environment with a diverse dataset of weakly labeled data results in greatly improved generalization to unseen environments, and show that this generalization depends on both the auxiliary objective and the attentional architecture that we propose. We demonstrate our results in both simulation and on a real robotic manipulator, and demonstrate substantial improvement over standard convolutional architectures and domain adaptation methods. |
Tasks | Domain Adaptation |
Published | 2017-08-07 |
URL | http://arxiv.org/abs/1708.02313v1 |
http://arxiv.org/pdf/1708.02313v1.pdf | |
PWC | https://paperswithcode.com/paper/gplac-generalizing-vision-based-robotic |
Repo | |
Framework | |
Concept Drift and Anomaly Detection in Graph Streams
Title | Concept Drift and Anomaly Detection in Graph Streams |
Authors | Daniele Zambon, Cesare Alippi, Lorenzo Livi |
Abstract | Graph representations offer powerful and intuitive ways to describe data in a multitude of application domains. Here, we consider stochastic processes generating graphs and propose a methodology for detecting changes in stationarity of such processes. The methodology is general and considers a process generating attributed graphs with a variable number of vertices/edges, without the need to assume one-to-one correspondence between vertices at different time steps. The methodology acts by embedding every graph of the stream into a vector domain, where a conventional multivariate change detection procedure can be easily applied. We ground the soundness of our proposal by proving several theoretical results. In addition, we provide a specific implementation of the methodology and evaluate its effectiveness on several detection problems involving attributed graphs representing biological molecules and drawings. Experimental results are contrasted with respect to suitable baseline methods, demonstrating the effectiveness of our approach. |
Tasks | Anomaly Detection |
Published | 2017-06-21 |
URL | http://arxiv.org/abs/1706.06941v3 |
http://arxiv.org/pdf/1706.06941v3.pdf | |
PWC | https://paperswithcode.com/paper/concept-drift-and-anomaly-detection-in-graph |
Repo | |
Framework | |
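The pipeline described in the abstract above (embed every graph of the stream into a vector, then run a conventional change detection procedure on the vectors) can be sketched as follows. This is only an illustration under stated assumptions: the embedding `embed` (vertex and edge counts) and the windowed z-test in `detect_change` are simple stand-ins chosen for this sketch, not the embedding or the statistical test used by the authors.

```python
from statistics import mean, pstdev


def embed(graph):
    """Hypothetical embedding: map a graph to (number of vertices, number of edges)."""
    return (float(len(graph["nodes"])), float(len(graph["edges"])))


def detect_change(stream, n_train, window, threshold=3.0):
    """Return the index of the first window whose mean embedding deviates
    from the training statistics by more than `threshold` standard errors,
    or None if no such window is found."""
    vectors = [embed(g) for g in stream]
    train = vectors[:n_train]
    dims = range(len(train[0]))
    mu = [mean(v[d] for v in train) for d in dims]
    # Fall back to 1.0 when a coordinate is constant in the training set.
    sigma = [pstdev([v[d] for v in train]) or 1.0 for d in dims]
    for start in range(n_train, len(vectors) - window + 1, window):
        batch = vectors[start:start + window]
        for d in dims:
            batch_mean = mean(v[d] for v in batch)
            z = (batch_mean - mu[d]) / (sigma[d] / window ** 0.5)
            if abs(z) > threshold:
                return start
    return None


def make_graph(n_nodes, n_edges):
    """Build a toy graph with placeholder edges (illustration only)."""
    return {"nodes": list(range(n_nodes)), "edges": [(0, 1)] * n_edges}


# Stationary regime: 3 vertices with 2 or 3 edges; then a drift to larger graphs.
stationary = [make_graph(3, 2 + (i % 2)) for i in range(20)]
drifted = [make_graph(6, 8) for _ in range(10)]
change_at = detect_change(stationary + drifted, n_train=20, window=5)
```

Because the embedding only needs to accept one graph at a time, graphs with different numbers of vertices and edges can be handled without matching vertices across time steps, which is the property the abstract emphasizes.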
Multi-frame image super-resolution with fast upscaling technique
Title | Multi-frame image super-resolution with fast upscaling technique |
Authors | Longguang Wang, Zaiping Lin, Xinpu Deng, Wei An |
Abstract | Multi-frame image super-resolution (MISR) aims to fuse information in a low-resolution (LR) image sequence to compose a high-resolution (HR) one, and has recently been applied extensively in many areas. Unlike single image super-resolution (SISR), sub-pixel transitions between multiple frames introduce additional information, attaching more significance to the fusion operator to alleviate the ill-posedness of MISR. For reconstruction-based approaches, the inevitable projection of reconstruction errors from LR space to HR space is commonly tackled by an interpolation operator; however, crude interpolation may not fit the natural image and can generate annoying blurring artifacts, especially after the fusion operator. In this paper, we propose an end-to-end fast upscaling technique to replace the interpolation operator: we design upscaling filters in LR space for periodic sub-locations respectively and shuffle the filter results to derive the final reconstruction errors in HR space. The proposed fast upscaling technique not only reduces the computational complexity of the upscaling operation by utilizing the shuffling operation to avoid complex operations in HR space, but also achieves superior performance with fewer blurring artifacts. Extensive experimental results demonstrate the effectiveness and efficiency of the proposed technique; moreover, when the proposed technique is combined with bilateral total variation (BTV) regularization, the MISR approach outperforms state-of-the-art methods. |
Tasks | Image Super-Resolution, Super-Resolution |
Published | 2017-06-20 |
URL | http://arxiv.org/abs/1706.06266v2 |
http://arxiv.org/pdf/1706.06266v2.pdf | |
PWC | https://paperswithcode.com/paper/multi-frame-image-super-resolution-with-fast |
Repo | |
Framework | |
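The shuffling step mentioned in the abstract above (filter results computed in LR space for each periodic sub-location are interleaved into one HR grid) can be sketched in plain Python. This sketch covers only the rearrangement; the upscaling filters themselves are not specified in the abstract. The function name `shuffle_to_hr` and the channel ordering (channel `dy * r + dx` fills sub-location `(dy, dx)`) are assumptions made for illustration.

```python
def shuffle_to_hr(filter_outputs, r):
    """Interleave r*r LR planes of size h x w into one (h*r) x (w*r) HR plane.

    Assumed layout: filter_outputs[dy * r + dx][y][x] is written to
    HR position (y * r + dy, x * r + dx).
    """
    if len(filter_outputs) != r * r:
        raise ValueError("expected r*r planes, one per periodic sub-location")
    h = len(filter_outputs[0])
    w = len(filter_outputs[0][0])
    hr = [[0.0] * (w * r) for _ in range(h * r)]
    for dy in range(r):
        for dx in range(r):
            plane = filter_outputs[dy * r + dx]
            for y in range(h):
                for x in range(w):
                    hr[y * r + dy][x * r + dx] = plane[y][x]
    return hr


# Four constant 2x2 planes (upscaling factor r = 2) make the interleaving visible.
planes = [[[float(k)] * 2 for _ in range(2)] for k in range(4)]
hr_plane = shuffle_to_hr(planes, r=2)
```

All filtering happens on the small LR planes, and the only HR-space operation is this copy, which is the source of the computational saving the abstract describes.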
Topology Reduction in Deep Convolutional Feature Extraction Networks
Title | Topology Reduction in Deep Convolutional Feature Extraction Networks |
Authors | Thomas Wiatowski, Philipp Grohs, Helmut Bölcskei |
Abstract | Deep convolutional neural networks (CNNs) used in practice employ potentially hundreds of layers and $10{,}000$s of nodes. Such network sizes entail significant computational complexity due to the large number of convolutions that need to be carried out; in addition, a large number of parameters needs to be learned and stored. Very deep and wide CNNs may therefore not be well suited to applications operating under severe resource constraints as is the case, e.g., in low-power embedded and mobile platforms. This paper aims at understanding the impact of CNN topology, specifically depth and width, on the network’s feature extraction capabilities. We address this question for the class of scattering networks that employ either Weyl-Heisenberg filters or wavelets, the modulus non-linearity, and no pooling. The exponential feature map energy decay results in Wiatowski et al., 2017, are generalized to $\mathcal{O}(a^{-N})$, where an arbitrary decay factor $a>1$ can be realized through suitable choice of the Weyl-Heisenberg prototype function or the mother wavelet. We then show how networks of fixed (possibly small) depth $N$ can be designed to guarantee that $((1-\varepsilon)\cdot 100)\%$ of the input signal’s energy are contained in the feature vector. Based on the notion of operationally significant nodes, we characterize, partly rigorously and partly heuristically, the topology-reducing effects of (effectively) band-limited input signals, band-limited filters, and feature map symmetries. Finally, for networks based on Weyl-Heisenberg filters, we determine the prototype function bandwidth that minimizes, for fixed network depth $N$, the average number of operationally significant nodes per layer. |
Tasks | |
Published | 2017-07-10 |
URL | http://arxiv.org/abs/1707.02711v2 |
http://arxiv.org/pdf/1707.02711v2.pdf | |
PWC | https://paperswithcode.com/paper/topology-reduction-in-deep-convolutional |
Repo | |
Framework | |
Beyond Low-Rank Representations: Orthogonal Clustering Basis Reconstruction with Optimized Graph Structure for Multi-view Spectral Clustering
Title | Beyond Low-Rank Representations: Orthogonal Clustering Basis Reconstruction with Optimized Graph Structure for Multi-view Spectral Clustering |
Authors | Yang Wang, Lin Wu |
Abstract | Low-Rank Representation (LRR) is arguably one of the most powerful paradigms for multi-view spectral clustering, which elegantly encodes the multi-view local graph/manifold structures into an intrinsic low-rank self-expressive data similarity embedded in high-dimensional space, to yield a better graph partition than their single-view counterparts. In this paper we revisit it from a fundamentally different perspective by discovering LRR as essentially a latent clustered orthogonal projection based representation winged with an optimized local graph structure for spectral clustering; each column of the representation is fundamentally a cluster basis orthogonal to others to indicate its members, which intuitively projects the view-specific feature representation to be the one spanned by all orthogonal bases to characterize the cluster structures. Upon this finding, we propose our technique with the following steps: (1) We decompose LRR into latent clustered orthogonal representation via low-rank matrix factorization, to encode more flexible cluster structures than LRR over primal data objects; (2) We convert the problem of LRR into that of simultaneously learning orthogonal clustered representation and optimized local graph structure for each view; (3) The learned orthogonal clustered representations and local graph structures enjoy the same magnitude for multi-view, so that the ideal multi-view consensus can be readily achieved. The experiments over multi-view datasets validate its superiority. |
Tasks | |
Published | 2017-08-04 |
URL | http://arxiv.org/abs/1708.02288v4 |
http://arxiv.org/pdf/1708.02288v4.pdf | |
PWC | https://paperswithcode.com/paper/beyond-low-rank-representations-orthogonal |
Repo | |
Framework | |