Paper Group ANR 746
Learning Spatial-Aware Regressions for Visual Tracking. Impulsive noise removal from color images with morphological filtering. Image-Image Domain Adaptation with Preserved Self-Similarity and Domain-Dissimilarity for Person Re-identification. An Improved Video Analysis using Context based Extension of LSH. Dense-Captioning Events in Videos. Learning Steerable Filters for Rotation Equivariant CNNs. Word-Alignment-Based Segment-Level Machine Translation Evaluation using Word Embeddings. Solar Power Plant Detection on Multi-Spectral Satellite Imagery using Weakly-Supervised CNN with Feedback Features and m-PCNN Fusion. Traffic Prediction Based on Random Connectivity in Deep Learning with Long Short-Term Memory. Target Oriented High Resolution SAR Image Formation via Semantic Information Guided Regularizations. Simulating Patho-realistic Ultrasound Images using Deep Generative Networks with Adversarial Learning. A Generic Regression Framework for Pose Recognition on Color and Depth Images. Accelerating Innovation Through Analogy Mining. Sample and Computationally Efficient Learning Algorithms under S-Concave Distributions. Joint Hierarchical Category Structure Learning and Large-Scale Image Classification.
Learning Spatial-Aware Regressions for Visual Tracking
Title | Learning Spatial-Aware Regressions for Visual Tracking |
Authors | Chong Sun, Dong Wang, Huchuan Lu, Ming-Hsuan Yang |
Abstract | In this paper, we analyze the spatial information of deep features, and propose two complementary regressions for robust visual tracking. First, we propose a kernelized ridge regression model wherein the kernel value is defined as the weighted sum of similarity scores of all pairs of patches between two samples. We show that this model can be formulated as a neural network and thus can be efficiently solved. Second, we propose a fully convolutional neural network with spatially regularized kernels, through which the filter kernel corresponding to each output channel is forced to focus on a specific region of the target. Distance transform pooling is further exploited to determine the effectiveness of each output channel of the convolution layer. The outputs from the kernelized ridge regression model and the fully convolutional neural network are combined to obtain the ultimate response. Experimental results on two benchmark datasets validate the effectiveness of the proposed method. |
Tasks | Visual Object Tracking, Visual Tracking |
Published | 2017-06-22 |
URL | http://arxiv.org/abs/1706.07457v2 |
http://arxiv.org/pdf/1706.07457v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-spatial-aware-regressions-for-visual |
Repo | |
Framework | |
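A minimal numpy sketch of the kernelized ridge regression component described in this entry: the kernel between two samples is a weighted sum of similarity scores over all pairs of patches. The linear patch similarity and the weight matrix here are illustrative assumptions, not the paper's learned quantities.

```python
import numpy as np

def patch_kernel(X, Z, weights):
    """Kernel between sample sets X and Z as a weighted sum of
    similarity scores over all pairs of patches (linear similarity
    here is an illustrative choice)."""
    # X: (n, P, d), Z: (m, P, d) -- P patch descriptors per sample
    n_patches = X.shape[1]
    K = np.zeros((X.shape[0], Z.shape[0]))
    for i in range(n_patches):
        for j in range(n_patches):
            K += weights[i, j] * (X[:, i, :] @ Z[:, j, :].T)
    return K

def krr_train(X, y, weights, lam=1e-2):
    """Solve (K + lam*I) alpha = y for the dual coefficients."""
    K = patch_kernel(X, X, weights)
    return np.linalg.solve(K + lam * np.eye(len(y)), y)

def krr_respond(X_train, alpha, X_test, weights):
    """Response map: kernel between test and train samples times alpha."""
    return patch_kernel(X_test, X_train, weights) @ alpha
```

The paper formulates this kernel as a neural network for efficiency; the double loop above is the direct, unoptimized reading of the same definition.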
Impulsive noise removal from color images with morphological filtering
Title | Impulsive noise removal from color images with morphological filtering |
Authors | Alexey Ruchay, Vitaly Kober |
Abstract | This paper deals with impulse noise removal from color images. The proposed noise removal algorithm employs a novel approach with morphological filtering for color image denoising; that is, detection of corrupted pixels and removal of the detected noise by means of morphological filtering. With the help of computer simulation we show that the proposed algorithm can effectively remove impulse noise. The performance of the proposed algorithm is compared in terms of image restoration metrics and processing speed with that of common successful algorithms. |
Tasks | Denoising, Image Denoising, Image Restoration |
Published | 2017-07-11 |
URL | http://arxiv.org/abs/1707.03126v1 |
http://arxiv.org/pdf/1707.03126v1.pdf | |
PWC | https://paperswithcode.com/paper/impulsive-noise-removal-from-color-images |
Repo | |
Framework | |
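A small scipy sketch of the two-step scheme in this entry: detect corrupted pixels, then repair only those by morphological filtering. The median-deviation detector, threshold, and open/close average are illustrative choices, not the paper's exact detector.

```python
import numpy as np
from scipy.ndimage import median_filter, grey_opening, grey_closing

def remove_impulse_noise(channel, thresh=40):
    """Detect likely impulse pixels (large deviation from the local
    median), then replace only those with a morphological estimate.
    Threshold and structuring-element size are illustrative."""
    med = median_filter(channel, size=3)
    noisy = np.abs(channel.astype(int) - med.astype(int)) > thresh
    # average of morphological opening and closing as the repair value
    morph = (grey_opening(channel, size=3).astype(int)
             + grey_closing(channel, size=3).astype(int)) // 2
    out = channel.copy()
    out[noisy] = morph[noisy].astype(channel.dtype)
    return out

# for a color image, apply channel-wise:
# denoised = np.dstack([remove_impulse_noise(img[..., c]) for c in range(3)])
```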
Image-Image Domain Adaptation with Preserved Self-Similarity and Domain-Dissimilarity for Person Re-identification
Title | Image-Image Domain Adaptation with Preserved Self-Similarity and Domain-Dissimilarity for Person Re-identification |
Authors | Weijian Deng, Liang Zheng, Qixiang Ye, Guoliang Kang, Yi Yang, Jianbin Jiao |
Abstract | Person re-identification (re-ID) models trained on one domain often fail to generalize well to another. In this paper, we present a “learning via translation” framework. In the baseline, we translate the labeled images from the source to the target domain in an unsupervised manner. We then train re-ID models with the translated images by supervised methods. Yet, being an essential part of this framework, unsupervised image-image translation suffers from the loss of source-domain label information during translation. Our motivation is two-fold. First, for each image, the discriminative cues contained in its ID label should be maintained after translation. Second, given that the two domains contain entirely different persons, a translated image should be dissimilar to any of the target IDs. To this end, we propose to preserve two types of unsupervised similarities: 1) self-similarity of an image before and after translation, and 2) domain-dissimilarity of a translated source image and a target image. Both constraints are implemented in the similarity preserving generative adversarial network (SPGAN), which consists of a Siamese network and a CycleGAN. Through domain adaptation experiments, we show that images generated by SPGAN are more suitable for domain adaptation and yield consistent and competitive re-ID accuracy on two large-scale datasets. |
Tasks | Domain Adaptation, Person Re-Identification |
Published | 2017-11-19 |
URL | http://arxiv.org/abs/1711.07027v3 |
http://arxiv.org/pdf/1711.07027v3.pdf | |
PWC | https://paperswithcode.com/paper/image-image-domain-adaptation-with-preserved |
Repo | |
Framework | |
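A hedged PyTorch sketch of the two SPGAN constraints as a Siamese contrastive loss: positive pairs enforce self-similarity of an image before and after translation, negative pairs enforce domain-dissimilarity between a translated source image and a target image. The margin and the combined loss in the comment are illustrative.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(f1, f2, positive, margin=2.0):
    """Siamese contrastive loss expressing the two constraints:
    positive pairs (an image and its translation) are pulled together,
    negative pairs (a translated source image and a target image) are
    pushed beyond a margin. The margin value is illustrative."""
    d = F.pairwise_distance(f1, f2)
    if positive:   # self-similarity term
        return (d ** 2).mean()
    return (torch.clamp(margin - d, min=0) ** 2).mean()  # domain-dissimilarity

# with siamese(.) a feature extractor and G the source->target generator:
# loss = contrastive_loss(siamese(x_s), siamese(G(x_s)), True) \
#      + contrastive_loss(siamese(G(x_s)), siamese(x_t), False)
```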
An Improved Video Analysis using Context based Extension of LSH
Title | An Improved Video Analysis using Context based Extension of LSH |
Authors | Angana Chakraborty, Sanghamitra Bandyopadhyay |
Abstract | Locality Sensitive Hashing (LSH) based algorithms have already shown their promise in finding approximate nearest neighbors in high dimensional data space. However, there are certain scenarios, as in sequential data, where the proximity of a pair of points cannot be captured without considering their surroundings or context. In videos, for example, a particular frame is meaningful only when it is seen in the context of its preceding and following frames. LSH has no mechanism to handle the contexts of the data points. In this article, a novel scheme of Context based Locality Sensitive Hashing (conLSH) has been introduced, in which points are hashed together not only based on their closeness, but also because of similar context. The contribution made in this article is threefold. First, conLSH is integrated with a recently proposed fast optimal global sequence alignment algorithm (FOGSAA) using a layered approach. The resultant method is applied to video retrieval for extracting similar sequences. The proposed algorithm yields more than 80% accuracy on average across different datasets. It has been found to save 36.3% of the total time consumed by an exhaustive search. conLSH reduces the search space to approximately 42% of the entire dataset when compared with an exhaustive search by the aforementioned FOGSAA, the Bag of Words method, and standard LSH implementations. Secondly, the effectiveness of conLSH is demonstrated in action recognition of video clips, which yields an average gain of 12.83% in classification accuracy over state-of-the-art methods using STIP descriptors. Last, and of great significance, this article provides a way of automatically annotating long and composite real-life videos. The source code of conLSH is made available at http://www.isical.ac.in/~bioinfo_miu/conLSH/conLSH.html |
Tasks | Temporal Action Localization, Video Retrieval |
Published | 2017-05-10 |
URL | http://arxiv.org/abs/1705.03933v2 |
http://arxiv.org/pdf/1705.03933v2.pdf | |
PWC | https://paperswithcode.com/paper/an-improved-video-analysis-using-context |
Repo | |
Framework | |
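An illustrative numpy sketch of the conLSH idea: a frame is hashed together with a summary of its surrounding frames, so two frames collide only when both the points and their contexts are close. The mean-pooled context and signed random projections are assumptions for illustration; the paper's actual hash family may differ.

```python
import numpy as np

def conlsh_hash(seq, i, context=1, n_planes=8, seed=0):
    """Hash frame i of a sequence together with a summary of its
    surrounding frames, so frames collide only when both the point
    and its context are close. Mean-pooled context and signed random
    projections are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    lo, hi = max(0, i - context), min(len(seq), i + context + 1)
    x = np.concatenate([seq[i], seq[lo:hi].mean(axis=0)])  # point + context
    planes = rng.standard_normal((n_planes, x.shape[0]))
    return tuple((planes @ x > 0).astype(int))
```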
Dense-Captioning Events in Videos
Title | Dense-Captioning Events in Videos |
Authors | Ranjay Krishna, Kenji Hata, Frederic Ren, Li Fei-Fei, Juan Carlos Niebles |
Abstract | Most natural videos contain numerous events. For example, in a video of a “man playing a piano”, the video might also contain “another man dancing” or “a crowd clapping”. We introduce the task of dense-captioning events, which involves both detecting and describing events in a video. We propose a new model that is able to identify all events in a single pass of the video while simultaneously describing the detected events with natural language. Our model introduces a variant of an existing proposal module that is designed to capture both short events and long events that span minutes. To capture the dependencies between the events in a video, our model introduces a new captioning module that uses contextual information from past and future events to jointly describe all events. We also introduce ActivityNet Captions, a large-scale benchmark for dense-captioning events. ActivityNet Captions contains 20k videos amounting to 849 video hours with 100k total descriptions, each with its unique start and end time. Finally, we report the performance of our model for dense-captioning events, video retrieval and localization. |
Tasks | Video Retrieval |
Published | 2017-05-02 |
URL | http://arxiv.org/abs/1705.00754v1 |
http://arxiv.org/pdf/1705.00754v1.pdf | |
PWC | https://paperswithcode.com/paper/dense-captioning-events-in-videos |
Repo | |
Framework | |
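Two small helpers sketching the mechanics this entry relies on: temporal IoU for matching event proposals, and a captioning input for one event that concatenates pooled features of past and future events with its own. Mean pooling is an illustrative stand-in for the paper's context fusion.

```python
import numpy as np

def temporal_iou(a, b):
    """Temporal overlap of two events given as (start, end) in seconds."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = max(a[1], b[1]) - min(a[0], b[0])
    return inter / union if union > 0 else 0.0

def caption_input(events, feats, k):
    """Captioning input for event k: its own feature concatenated with
    pooled features of past and future events (mean pooling is an
    illustrative stand-in for the paper's context fusion)."""
    past = [f for e, f in zip(events, feats) if e[1] <= events[k][0]]
    future = [f for e, f in zip(events, feats) if e[0] >= events[k][1]]
    pool = lambda fs: np.mean(fs, axis=0) if fs else np.zeros_like(feats[k])
    return np.concatenate([pool(past), feats[k], pool(future)])
```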
Learning Steerable Filters for Rotation Equivariant CNNs
Title | Learning Steerable Filters for Rotation Equivariant CNNs |
Authors | Maurice Weiler, Fred A. Hamprecht, Martin Storath |
Abstract | In many machine learning tasks it is desirable that a model’s prediction transforms in an equivariant way under transformations of its input. Convolutional neural networks (CNNs) implement translational equivariance by construction; for other transformations, however, they are compelled to learn the proper mapping. In this work, we develop Steerable Filter CNNs (SFCNNs) which achieve joint equivariance under translations and rotations by design. The proposed architecture employs steerable filters to efficiently compute orientation-dependent responses for many orientations without suffering interpolation artifacts from filter rotation. We utilize group convolutions which guarantee an equivariant mapping. In addition, we generalize He’s weight initialization scheme to filters which are defined as a linear combination of a system of atomic filters. Numerical experiments show a substantial improvement in sample complexity with a growing number of sampled filter orientations and confirm that the network generalizes learned patterns over orientations. The proposed approach achieves state-of-the-art results on the rotated MNIST benchmark and on the ISBI 2012 2D EM segmentation challenge. |
Tasks | |
Published | 2017-11-20 |
URL | http://arxiv.org/abs/1711.07289v3 |
http://arxiv.org/pdf/1711.07289v3.pdf | |
PWC | https://paperswithcode.com/paper/learning-steerable-filters-for-rotation |
Repo | |
Framework | |
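A numpy sketch of the steerable-filter construction: learned filters are linear combinations of circular-harmonic atoms, and rotating the combined filter reduces to multiplying each atom's weight by a phase, which avoids interpolating rotated filter copies. The Gaussian radial envelope is an illustrative choice.

```python
import numpy as np

def atomic_filters(size=9, max_freq=3):
    """Circular-harmonic atoms on a square grid; a learned filter is a
    linear combination of these (Gaussian envelope is illustrative)."""
    r = np.linspace(-1, 1, size)
    xx, yy = np.meshgrid(r, r)
    rad, ang = np.hypot(xx, yy), np.arctan2(yy, xx)
    env = np.exp(-(rad ** 2) / 0.5)
    return [env * np.exp(1j * k * ang) for k in range(max_freq + 1)]

def steer(weights, atoms, angle):
    """Rotate the combined filter by `angle`: each atom of angular
    frequency k just picks up a phase exp(-i*k*angle), so no filter
    interpolation is needed."""
    return sum(w * np.exp(-1j * k * angle) * a
               for k, (w, a) in enumerate(zip(weights, atoms)))
```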
Word-Alignment-Based Segment-Level Machine Translation Evaluation using Word Embeddings
Title | Word-Alignment-Based Segment-Level Machine Translation Evaluation using Word Embeddings |
Authors | Junki Matsuo, Mamoru Komachi, Katsuhito Sudoh |
Abstract | One of the most important problems in machine translation (MT) evaluation is to evaluate the similarity between translation hypotheses with different surface forms from the reference, especially at the segment level. We propose to use word embeddings to perform word alignment for segment-level MT evaluation. We performed experiments with three types of alignment methods using word embeddings. We evaluated our proposed methods with various translation datasets. Experimental results show that our proposed methods outperform previous word embeddings-based methods. |
Tasks | Machine Translation, Word Alignment, Word Embeddings |
Published | 2017-04-02 |
URL | http://arxiv.org/abs/1704.00380v1 |
http://arxiv.org/pdf/1704.00380v1.pdf | |
PWC | https://paperswithcode.com/paper/word-alignment-based-segment-level-machine |
Repo | |
Framework | |
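A minimal sketch of one plausible alignment strategy of the kind this entry evaluates: align each hypothesis word to its most similar reference word by embedding cosine similarity and average the matched similarities. This greedy best-match rule is an assumption; the paper compares three alignment methods.

```python
import numpy as np

def embedding_align_score(hyp_vecs, ref_vecs):
    """Align each hypothesis word to its most similar reference word
    by cosine similarity and average the matched similarities."""
    def unit(v):
        n = np.linalg.norm(v)
        return v / n if n > 0 else v
    H = np.array([unit(v) for v in hyp_vecs])
    R = np.array([unit(v) for v in ref_vecs])
    sims = H @ R.T                   # cosine similarity matrix
    return sims.max(axis=1).mean()   # best reference match per hypothesis word
```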
Solar Power Plant Detection on Multi-Spectral Satellite Imagery using Weakly-Supervised CNN with Feedback Features and m-PCNN Fusion
Title | Solar Power Plant Detection on Multi-Spectral Satellite Imagery using Weakly-Supervised CNN with Feedback Features and m-PCNN Fusion |
Authors | Nevrez Imamoglu, Motoki Kimura, Hiroki Miyamoto, Aito Fujita, Ryosuke Nakamura |
Abstract | Most traditional convolutional neural networks (CNNs) implement a bottom-up (feed-forward) approach to image classification. However, many scientific studies demonstrate that visual perception in primates relies on both bottom-up and top-down connections. Therefore, in this work, we propose a CNN with a feedback structure for solar power plant detection on medium-resolution satellite images. To express the strength of the top-down connections, we introduce a feedback CNN (FB-Net) built on a baseline CNN model used for solar power plant classification on multi-spectral satellite data. Moreover, we adapt class activation mapping (CAM) to our FB-Net, taking advantage of a multi-channel pulse coupled neural network (m-PCNN) for weakly-supervised localization of solar power plants from the features of the proposed FB-Net. Experimental results demonstrate that the proposed FB-Net CAM with m-PCNN performs promisingly on both solar power plant image classification and detection tasks. |
Tasks | Image Classification |
Published | 2017-04-21 |
URL | http://arxiv.org/abs/1704.06410v2 |
http://arxiv.org/pdf/1704.06410v2.pdf | |
PWC | https://paperswithcode.com/paper/solar-power-plant-detection-on-multi-spectral |
Repo | |
Framework | |
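A sketch of standard class activation mapping, the starting point this entry improves on: the final convolutional feature maps are weighted by the classifier weights of the target class. The m-PCNN refinement stage is not reproduced here.

```python
import numpy as np

def class_activation_map(feature_maps, fc_weights, class_idx):
    """Standard CAM: weight the final conv feature maps (C, H, W) by
    the classifier weights (n_classes, C) of the target class."""
    cam = np.tensordot(fc_weights[class_idx], feature_maps, axes=1)  # (H, W)
    cam = np.maximum(cam, 0)
    return cam / cam.max() if cam.max() > 0 else cam
```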
Traffic Prediction Based on Random Connectivity in Deep Learning with Long Short-Term Memory
Title | Traffic Prediction Based on Random Connectivity in Deep Learning with Long Short-Term Memory |
Authors | Yuxiu Hua, Zhifeng Zhao, Rongpeng Li, Xianfu Chen, Zhiming Liu, Honggang Zhang |
Abstract | Traffic prediction plays an important role in evaluating the performance of telecommunication networks and attracts intense research interest. A significant number of algorithms and models have been put forward to analyse traffic data and make predictions. In the recent big data era, deep learning has been exploited to mine the profound information hidden in the data. In particular, Long Short-Term Memory (LSTM), one kind of Recurrent Neural Network (RNN), has attracted a lot of attention due to its capability of processing the long-range dependencies embedded in sequential traffic data. However, LSTM has considerable computational cost, which cannot be tolerated in tasks with stringent latency requirements. In this paper, we propose a deep learning model based on LSTM, called Random Connectivity LSTM (RCLSTM). Compared to the conventional LSTM, RCLSTM forms its neural network in a notably different way: neurons are connected in a stochastic manner rather than fully connected. The RCLSTM therefore has intrinsic sparsity, with many neural connections absent, which reduces the number of parameters to be trained and the computational cost. We apply the RCLSTM to predict traffic and validate that the RCLSTM with only 35% neural connectivity still shows satisfactory performance. As we gradually add training samples, the performance of RCLSTM becomes increasingly close to that of the baseline LSTM. Moreover, for input traffic sequences of sufficient length, the RCLSTM exhibits even better prediction accuracy than the baseline LSTM. |
Tasks | Traffic Prediction |
Published | 2017-11-08 |
URL | http://arxiv.org/abs/1711.02833v2 |
http://arxiv.org/pdf/1711.02833v2.pdf | |
PWC | https://paperswithcode.com/paper/traffic-prediction-based-on-random |
Repo | |
Framework | |
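A PyTorch sketch of the random-connectivity idea: the LSTM cell's weight matrices are masked by a fixed random binary pattern at a given density (35% matches the figure quoted in the abstract), and gradients are masked so pruned connections stay at zero. The masking mechanics are an illustrative implementation, not the authors' code.

```python
import torch
import torch.nn as nn

class RCLSTM(nn.Module):
    """LSTM cell whose weight matrices are masked by a fixed random
    binary pattern, so only a fraction `density` of connections exist."""
    def __init__(self, input_size, hidden_size, density=0.35):
        super().__init__()
        self.cell = nn.LSTMCell(input_size, hidden_size)
        for w in (self.cell.weight_ih, self.cell.weight_hh):
            mask = (torch.rand_like(w) < density).float()
            with torch.no_grad():
                w *= mask                      # drop the absent connections
            w.register_hook(lambda g, m=mask: g * m)  # keep them at zero

    def forward(self, x):  # x: (seq_len, batch, input_size)
        h = c = x.new_zeros(x.size(1), self.cell.hidden_size)
        for t in range(x.size(0)):
            h, c = self.cell(x[t], (h, c))
        return h  # last hidden state, e.g. fed to a linear predictor
```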
Target Oriented High Resolution SAR Image Formation via Semantic Information Guided Regularizations
Title | Target Oriented High Resolution SAR Image Formation via Semantic Information Guided Regularizations |
Authors | Biao Hou, Zaidao Wen, Licheng Jiao, Qian Wu |
Abstract | The sparsity-regularized synthetic aperture radar (SAR) imaging framework has shown remarkable performance in generating feature-enhanced high-resolution images, in which a sparsity-inducing regularizer is involved by exploiting the sparsity priors of some visual features in the underlying image. However, since simple priors on low-level features are insufficient to describe the different semantic contents of the image, this type of regularizer is incapable of distinguishing between the target of interest and irrelevant background clutter. As a consequence, features belonging to the target and to clutter are affected simultaneously in the generated image, without regard to their underlying semantic labels. To address this problem, we propose a novel semantic-information-guided framework for target-oriented SAR image formation, which aims at enhancing the target scatterers of interest while suppressing the background clutter. First, we develop a new semantics-specific regularizer for image formation by exploiting the statistical properties of different semantic categories in a target-scene SAR image. In order to infer the semantic label of each pixel in an unsupervised way, we moreover introduce a novel high-level prior-driven regularizer and some semantic causal rules derived from prior knowledge. Finally, our regularized framework for image formation is derived as a simple iteratively reweighted $\ell_1$ minimization problem which can be conveniently solved by many off-the-shelf solvers. Experimental results demonstrate the effectiveness and superiority of our framework for SAR image formation in terms of target enhancement and clutter suppression, compared with the state of the art. Additionally, the proposed framework opens a new direction of devoting machine learning strategies to image formation, which can benefit subsequent decision-making tasks. |
Tasks | Decision Making |
Published | 2017-04-24 |
URL | http://arxiv.org/abs/1704.07082v1 |
http://arxiv.org/pdf/1704.07082v1.pdf | |
PWC | https://paperswithcode.com/paper/target-oriented-high-resolution-sar-image |
Repo | |
Framework | |
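The framework reduces to an iteratively reweighted $\ell_1$ problem; below is a generic IRL1 sketch using ISTA-style reweighted soft-thresholding. The semantics-driven weights that distinguish target from clutter are the paper's contribution and are replaced here by the standard $1/(|x|+\epsilon)$ reweighting.

```python
import numpy as np

def irl1(A, y, lam=0.1, iters=20, eps=1e-3):
    """Generic iteratively reweighted l1: alternate an ISTA-style
    reweighted soft-thresholding step with a weight update. The
    paper's semantics-driven weights are replaced by 1/(|x|+eps)."""
    x = np.zeros(A.shape[1])
    w = np.ones_like(x)
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1 / Lipschitz constant
    for _ in range(iters):
        g = x - step * (A.T @ (A @ x - y))   # gradient step on the data term
        x = np.sign(g) * np.maximum(np.abs(g) - step * lam * w, 0)
        w = 1.0 / (np.abs(x) + eps)          # reweighting
    return x
```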
Simulating Patho-realistic Ultrasound Images using Deep Generative Networks with Adversarial Learning
Title | Simulating Patho-realistic Ultrasound Images using Deep Generative Networks with Adversarial Learning |
Authors | Francis Tom, Debdoot Sheet |
Abstract | Ultrasound imaging makes use of the backscattering of waves during their interaction with scatterers present in biological tissues. Simulation of synthetic ultrasound images is a challenging problem on account of the inability to accurately model various factors, including intra-/inter-scanline interference, transducer-to-surface coupling, artifacts on transducer elements, inhomogeneous shadowing, and nonlinear attenuation. Current approaches typically solve wave-space equations, making them computationally expensive and slow to operate. We propose a generative adversarial network (GAN) inspired approach for fast simulation of patho-realistic ultrasound images. We apply the framework to intravascular ultrasound (IVUS) simulation. A stage 0 simulation performed using a pseudo B-mode ultrasound image simulator yields a speckle mapping of a digitally defined phantom. The stage I GAN subsequently refines it to preserve tissue-specific speckle intensities. The stage II GAN further refines it to generate high-resolution images with patho-realistic speckle profiles. We evaluate the patho-realism of the simulated images with a visual Turing test, which indicates an equivocal confusion in discriminating simulated from real images. We also quantify the shift in tissue-specific intensity distributions of the real and simulated images to establish their similarity. |
Tasks | |
Published | 2017-12-21 |
URL | http://arxiv.org/abs/1712.07881v2 |
http://arxiv.org/pdf/1712.07881v2.pdf | |
PWC | https://paperswithcode.com/paper/simulating-patho-realistic-ultrasound-images |
Repo | |
Framework | |
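A hedged PyTorch sketch of one adversarial refinement step in the spirit of the staged pipeline above: a generator maps a stage-0 speckle image toward realism while a discriminator separates refined from real images. The loss weights and the L1 fidelity-to-speckle term are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def refinement_step(G, D, opt_g, opt_d, speckle, real):
    """One adversarial refinement step: G maps a stage-0 speckle image
    toward realism, D separates refined from real. Loss weights and
    the L1 fidelity term are illustrative."""
    fake = G(speckle)
    # -- discriminator update: real -> 1, refined -> 0 --
    opt_d.zero_grad()
    lr, lf = D(real), D(fake.detach())
    d_loss = (F.binary_cross_entropy_with_logits(lr, torch.ones_like(lr))
              + F.binary_cross_entropy_with_logits(lf, torch.zeros_like(lf)))
    d_loss.backward()
    opt_d.step()
    # -- generator update: fool D, stay close to the speckle map --
    opt_g.zero_grad()
    lg = D(fake)
    g_loss = (F.binary_cross_entropy_with_logits(lg, torch.ones_like(lg))
              + 0.1 * F.l1_loss(fake, speckle))
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()
```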
A Generic Regression Framework for Pose Recognition on Color and Depth Images
Title | A Generic Regression Framework for Pose Recognition on Color and Depth Images |
Authors | Wenye He |
Abstract | The cascaded regression method is a fast and accurate approach to finding the 2D pose of objects in RGB images. It finds an accurate object pose by applying a large number of corrections to a good initial guess. This paper explains the algorithm and shows the results of two experiments carried out by the researchers. We present a new method to quickly and accurately predict the 3D positions of body joints from a single depth image, using no temporal information. We take an object recognition approach, designing an intermediate body parts representation that maps the difficult pose estimation problem into a simpler per-pixel classification problem. Our large and highly varied training dataset allows the classifier to estimate body parts invariant to pose, body shape, and clothing. Finally, we generate confidence-scored 3D proposals of several body parts by re-projecting the classification result and finding local modes. |
Tasks | Object Recognition, Pose Estimation |
Published | 2017-09-23 |
URL | http://arxiv.org/abs/1709.08068v1 |
http://arxiv.org/pdf/1709.08068v1.pdf | |
PWC | https://paperswithcode.com/paper/a-generic-regression-framework-for-pose |
Repo | |
Framework | |
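A sketch of the depth-difference feature underlying the per-pixel body-part classifier this entry describes: two probe offsets are normalized by the depth at the pixel, making the feature invariant to how far the body is from the camera. The offset values and the background constant are illustrative, and a positive depth at the query pixel is assumed.

```python
import numpy as np

def depth_feature(depth, x, y, u, v, background=1e6):
    """Depth-difference feature: probe offsets u, v are scaled by the
    depth at (x, y), so the feature is depth-invariant. Assumes
    depth[y, x] > 0; offsets and background constant are illustrative."""
    d = depth[y, x]
    def probe(off):
        px, py = int(x + off[0] / d), int(y + off[1] / d)
        if 0 <= py < depth.shape[0] and 0 <= px < depth.shape[1]:
            return depth[py, px]
        return background  # out-of-image probes read as far background
    return probe(u) - probe(v)
```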
Accelerating Innovation Through Analogy Mining
Title | Accelerating Innovation Through Analogy Mining |
Authors | Tom Hope, Joel Chan, Aniket Kittur, Dafna Shahaf |
Abstract | The availability of large idea repositories (e.g., the U.S. patent database) could significantly accelerate innovation and discovery by providing people with inspiration from solutions to analogous problems. However, finding useful analogies in these large, messy, real-world repositories remains a persistent challenge for either human or automated methods. Previous approaches include costly hand-created databases that have high relational structure (e.g., predicate calculus representations) but are very sparse. Simpler machine-learning/information-retrieval similarity metrics can scale to large, natural-language datasets, but struggle to account for structural similarity, which is central to analogy. In this paper we explore the viability and value of learning simpler structural representations, specifically, “problem schemas”, which specify the purpose of a product and the mechanisms by which it achieves that purpose. Our approach combines crowdsourcing and recurrent neural networks to extract purpose and mechanism vector representations from product descriptions. We demonstrate that these learned vectors allow us to find analogies with higher precision and recall than traditional information-retrieval methods. In an ideation experiment, analogies retrieved by our models significantly increased people’s likelihood of generating creative ideas compared to analogies retrieved by traditional methods. Our results suggest a promising approach to enabling computational analogy at scale is to learn and leverage weaker structural representations. |
Tasks | Information Retrieval |
Published | 2017-06-17 |
URL | http://arxiv.org/abs/1706.05585v1 |
http://arxiv.org/pdf/1706.05585v1.pdf | |
PWC | https://paperswithcode.com/paper/accelerating-innovation-through-analogy |
Repo | |
Framework | |
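An illustrative scoring rule for retrieving analogies with the learned purpose and mechanism vectors: favor candidates whose purpose is similar to the query but whose mechanism differs. The subtraction and its weight are assumptions for illustration, not the paper's retrieval objective.

```python
import numpy as np

def analogy_score(query_p, query_m, cand_p, cand_m, beta=0.5):
    """Favor candidates whose purpose matches the query but whose
    mechanism differs; beta trades off the two terms (illustrative)."""
    cos = lambda a, b: a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)
    return cos(query_p, cand_p) - beta * cos(query_m, cand_m)
```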
Sample and Computationally Efficient Learning Algorithms under S-Concave Distributions
Title | Sample and Computationally Efficient Learning Algorithms under S-Concave Distributions |
Authors | Maria-Florina Balcan, Hongyang Zhang |
Abstract | We provide new results for noise-tolerant and sample-efficient learning algorithms under $s$-concave distributions. The new class of $s$-concave distributions is a broad and natural generalization of log-concavity, and includes many important additional distributions, e.g., the Pareto distribution and $t$-distribution. This class has been studied in the context of efficient sampling, integration, and optimization, but much remains unknown about the geometry of this class of distributions and their applications in the context of learning. The challenge is that unlike the commonly used distributions in learning (uniform or more generally log-concave distributions), this broader class is not closed under the marginalization operator and many such distributions are fat-tailed. In this work, we introduce new convex geometry tools to study the properties of $s$-concave distributions and use these properties to provide bounds on quantities of interest to learning including the probability of disagreement between two halfspaces, disagreement outside a band, and the disagreement coefficient. We use these results to significantly generalize prior results for margin-based active learning, disagreement-based active learning, and passive learning of intersections of halfspaces. Our analysis of geometric properties of $s$-concave distributions might be of independent interest to optimization more broadly. |
Tasks | Active Learning |
Published | 2017-03-22 |
URL | http://arxiv.org/abs/1703.07758v2 |
http://arxiv.org/pdf/1703.07758v2.pdf | |
PWC | https://paperswithcode.com/paper/sample-and-computationally-efficient-learning |
Repo | |
Framework | |
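The paper generalizes guarantees for margin-based active learning to $s$-concave distributions; below is a generic sketch of that algorithm family, where each round queries labels only within a shrinking band around the current halfspace. The least-squares learner, band schedule, and batch size are illustrative stand-ins; the analysis, not the loop, is the paper's contribution.

```python
import numpy as np

def margin_based_active_learning(X, oracle, rounds=5, batch=20,
                                 band=0.5, shrink=0.5, seed=0):
    """Each round queries labels only inside a band around the current
    halfspace, then shrinks the band. Learner and schedule are
    illustrative stand-ins."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(X.shape[1])
    w /= np.linalg.norm(w)
    for _ in range(rounds):
        in_band = np.where(np.abs(X @ w) <= band)[0]
        if len(in_band) == 0:
            break
        q = rng.choice(in_band, size=min(batch, len(in_band)), replace=False)
        y = np.array([oracle(X[i]) for i in q])   # labels in {-1, +1}
        w_new, *_ = np.linalg.lstsq(X[q], y, rcond=None)
        w = w_new / (np.linalg.norm(w_new) + 1e-12)
        band *= shrink
    return w
```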
Joint Hierarchical Category Structure Learning and Large-Scale Image Classification
Title | Joint Hierarchical Category Structure Learning and Large-Scale Image Classification |
Authors | Yanyun Qu, Li Lin, Fumin Shen, Chang Lu, Yang Wu, Yuan Xie, Dacheng Tao |
Abstract | We investigate the scalable image classification problem with a large number of categories. Hierarchical visual data structures are helpful for improving the efficiency and performance of large-scale multi-class classification. We propose a novel image classification method based on learning hierarchical inter-class structures. Specifically, we first design a fast algorithm to compute the similarity metric between categories, based on which a visual tree is constructed by hierarchical spectral clustering. Using the learned visual tree, a test sample label is efficiently predicted by searching for the best path over the entire tree. The proposed method is extensively evaluated on the ILSVRC2010 and Caltech 256 benchmark datasets. Experimental results show that our method obtains significantly better category hierarchies than other state-of-the-art visual tree-based methods and, therefore, much more accurate classification. |
Tasks | Image Classification |
Published | 2017-09-15 |
URL | http://arxiv.org/abs/1709.05072v1 |
http://arxiv.org/pdf/1709.05072v1.pdf | |
PWC | https://paperswithcode.com/paper/joint-hierarchical-category-structure |
Repo | |
Framework | |
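A recursive sketch of the hierarchy construction in this entry: the category set is split by spectral clustering on an inter-class similarity matrix, and a test label is predicted by descending the resulting tree. scikit-learn's SpectralClustering with a precomputed affinity is an assumed stand-in for the paper's hierarchical spectral clustering, and S is assumed to be a symmetric, non-negative similarity matrix.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def build_visual_tree(S, labels, branch=2, min_size=2):
    """Recursively split the category set by spectral clustering on
    the inter-class similarity matrix S (symmetric, non-negative);
    test-time prediction descends the resulting tree."""
    if len(labels) <= min_size:
        return {"leaf": list(labels)}
    idx = SpectralClustering(n_clusters=branch, affinity="precomputed",
                             random_state=0).fit_predict(S)
    children = []
    for c in range(branch):
        sel = np.where(idx == c)[0]
        children.append(build_visual_tree(S[np.ix_(sel, sel)],
                                          [labels[i] for i in sel],
                                          branch, min_size))
    return {"children": children}
```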