Paper Group AWR 79
Learning Visual Servoing with Deep Features and Fitted Q-Iteration
Title | Learning Visual Servoing with Deep Features and Fitted Q-Iteration |
Authors | Alex X. Lee, Sergey Levine, Pieter Abbeel |
Abstract | Visual servoing involves choosing actions that move a robot in response to observations from a camera, in order to reach a goal configuration in the world. Standard visual servoing approaches typically rely on manually designed features and analytical dynamics models, which limits their generalization capability and often requires extensive application-specific feature and model engineering. In this work, we study how learned visual features, learned predictive dynamics models, and reinforcement learning can be combined to learn visual servoing mechanisms. We focus on target following, with the goal of designing algorithms that can learn a visual servo using low amounts of data of the target in question, to enable quick adaptation to new targets. Our approach is based on servoing the camera in the space of learned visual features, rather than image pixels or manually-designed keypoints. We demonstrate that standard deep features, in our case taken from a model trained for object classification, can be used together with a bilinear predictive model to learn an effective visual servo that is robust to visual variation, changes in viewing angle and appearance, and occlusions. A key component of our approach is to use a sample-efficient fitted Q-iteration algorithm to learn which features are best suited for the task at hand. We show that we can learn an effective visual servo on a complex synthetic car following benchmark using just 20 training trajectory samples for reinforcement learning. We demonstrate substantial improvement over a conventional approach based on image pixels or hand-designed keypoints, and we show an improvement in sample-efficiency of more than two orders of magnitude over standard model-free deep reinforcement learning algorithms. Videos are available at http://rll.berkeley.edu/visual_servoing . |
Tasks | Object Classification |
Published | 2017-03-31 |
URL | http://arxiv.org/abs/1703.11000v2 |
http://arxiv.org/pdf/1703.11000v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-visual-servoing-with-deep-features |
Repo | https://github.com/alexlee-gk/visual_dynamics |
Framework | none |
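The core servoing idea above couples learned visual features with a bilinear predictive model of how those features change under camera motion. The sketch below is a minimal toy illustration of that coupling, not the authors' implementation: the dynamics parameters `A` and `B`, the feature dimension, and the least-squares control rule are illustrative stand-ins (in the paper, the features come from a pretrained classification network and the per-feature weighting is learned with fitted Q-iteration).

```python
# A minimal, illustrative sketch (not the authors' code) of a bilinear
# feature-dynamics model: the predicted next features are a linear function of
# the current features plus a term that is bilinear in features and control.
import numpy as np

rng = np.random.default_rng(0)
d, k = 8, 2                      # feature dimension, control dimension (toy sizes)

# Hypothetical learned dynamics parameters (random placeholders here).
A = np.eye(d) + 0.01 * rng.standard_normal((d, d))   # feature transition
B = 0.1 * rng.standard_normal((d, d, k))             # bilinear feature-control tensor

def predict_features(y, u):
    """One-step bilinear prediction: y_next = A y + (B contracted with y) u."""
    return A @ y + np.einsum('ijk,j,k->i', B, y, u)

def servo_control(y, y_goal):
    """Choose the control that moves the predicted features toward the goal
    by solving a small linear least-squares problem."""
    J = np.einsum('ijk,j->ik', B, y)          # d x k sensitivity of prediction w.r.t. u
    residual = y_goal - A @ y
    u, *_ = np.linalg.lstsq(J, residual, rcond=None)
    return u

y = rng.standard_normal(d)        # current deep-feature observation (toy stand-in)
y_goal = rng.standard_normal(d)   # features of the target configuration
u = servo_control(y, y_goal)
print("control:", u, "predicted error:", np.linalg.norm(predict_features(y, u) - y_goal))
```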
Attributes2Classname: A discriminative model for attribute-based unsupervised zero-shot learning
Title | Attributes2Classname: A discriminative model for attribute-based unsupervised zero-shot learning |
Authors | Berkan Demirel, Ramazan Gokberk Cinbis, Nazli Ikizler-Cinbis |
Abstract | We propose a novel approach for unsupervised zero-shot learning (ZSL) of classes based on their names. Most existing unsupervised ZSL methods aim to learn a model for directly comparing image features and class names. However, this proves to be a difficult task due to the dominance of non-visual semantics in the underlying vector-space embeddings of class names. To address this issue, we discriminatively learn a word representation such that the similarities between class names and combinations of attribute names fall in line with visual similarity. Contrary to traditional zero-shot learning approaches that are built upon attribute presence, our approach bypasses the laborious attribute-class relation annotations for unseen classes. In addition, our proposed approach renders text-only training possible, hence training can be augmented without the need to collect additional image data. The experimental results show that our method yields state-of-the-art results for unsupervised ZSL on three benchmark datasets. |
Tasks | Zero-Shot Learning |
Published | 2017-05-04 |
URL | http://arxiv.org/abs/1705.01734v2 |
http://arxiv.org/pdf/1705.01734v2.pdf | |
PWC | https://paperswithcode.com/paper/attributes2classname-a-discriminative-model |
Repo | https://github.com/berkandemirel/attributes2classname |
Framework | tf |
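As a rough intuition for the abstract above: once class names and attribute names live in a shared word-embedding space, an unseen class can be scored by comparing its name embedding with the combined embedding of the attributes predicted for an image. The snippet below is a hedged sketch of that scoring step only; the embeddings are random placeholders, and the discriminative training that aligns them with visual similarity is omitted.

```python
# Illustrative sketch (assumptions, not the paper's implementation): score unseen
# classes by the cosine similarity between each class-name embedding and the
# average embedding of the attribute names predicted for an image.
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

rng = np.random.default_rng(1)
dim = 50
# Hypothetical word embeddings for unseen class names and for attribute names.
class_vecs = {c: rng.standard_normal(dim) for c in ["zebra", "whale", "bobcat"]}
attribute_vecs = {a: rng.standard_normal(dim) for a in ["stripes", "hooves", "ocean", "claws"]}

def predict_class(predicted_attributes):
    """Rank unseen classes against the mean embedding of the predicted attributes."""
    query = np.mean([attribute_vecs[a] for a in predicted_attributes], axis=0)
    scores = {c: cosine(v, query) for c, v in class_vecs.items()}
    return max(scores, key=scores.get), scores

label, scores = predict_class(["stripes", "hooves"])
print(label, scores)
```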
Good Features to Correlate for Visual Tracking
Title | Good Features to Correlate for Visual Tracking |
Authors | Erhan Gundogdu, A. Aydin Alatan |
Abstract | In recent years, correlation filters have shown dominant and spectacular results for visual object tracking. The types of features employed in this family of trackers significantly affect tracking performance. The ultimate goal is to utilize features that are robust and invariant to any kind of appearance change of the object, while predicting the object location as accurately as in the case of no appearance change. As deep learning based methods have emerged, the study of learning features for specific tasks has accelerated. For instance, discriminative visual tracking methods based on deep architectures have been studied with promising performance. Nevertheless, correlation filter based (CFB) trackers confine themselves to pre-trained networks trained for the object classification problem. To this end, this manuscript formulates the problem of learning deep fully convolutional features for CFB visual tracking. In order to learn the proposed model, a novel and efficient backpropagation algorithm is presented based on the loss function of the network. The proposed learning framework enables the network model to be flexible for a custom design. Moreover, it alleviates the dependency on the network trained for classification. Extensive performance analysis shows the efficacy of the proposed custom design in the CFB tracking framework. By fine-tuning the convolutional parts of a state-of-the-art network and integrating this model into a CFB tracker, the top-performing one of VOT2016, an 18% increase in expected average overlap is achieved and tracking failures are decreased by 25%, while maintaining superiority over state-of-the-art methods on the OTB-2013 and OTB-2015 tracking datasets. |
Tasks | Object Classification, Object Tracking, Visual Object Tracking, Visual Tracking |
Published | 2017-04-20 |
URL | http://arxiv.org/abs/1704.06326v2 |
http://arxiv.org/pdf/1704.06326v2.pdf | |
PWC | https://paperswithcode.com/paper/good-features-to-correlate-for-visual |
Repo | https://github.com/egundogdu/CFCF |
Framework | tf |
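For context on what "features to correlate" feed into, the following is a compact single-channel correlation-filter sketch in the MOSSE/ridge-regression style. It is a generic baseline formulation, not the paper's fine-tuned deep-feature pipeline: the feature patch is a random placeholder where the learned convolutional features would go.

```python
# Single-channel correlation filter trained and applied in the Fourier domain.
import numpy as np

def train_filter(feature_patch, target_response, lam=1e-2):
    """Solve the ridge-regression correlation filter in the Fourier domain."""
    F = np.fft.fft2(feature_patch)
    G = np.fft.fft2(target_response)
    return (G * np.conj(F)) / (F * np.conj(F) + lam)

def detect(filter_hat, feature_patch):
    """Correlate the filter with a new patch and return the response map."""
    F = np.fft.fft2(feature_patch)
    return np.real(np.fft.ifft2(filter_hat * F))

# Toy example: a Gaussian-shaped desired response centred on the object.
size = 64
ys, xs = np.mgrid[:size, :size]
gauss = np.exp(-((xs - size // 2) ** 2 + (ys - size // 2) ** 2) / (2 * 3.0 ** 2))

rng = np.random.default_rng(2)
patch = rng.standard_normal((size, size))           # stand-in for a feature channel
H = train_filter(patch, gauss)
response = detect(H, patch)
print("peak at:", np.unravel_index(response.argmax(), response.shape))
```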
Skip RNN: Learning to Skip State Updates in Recurrent Neural Networks
Title | Skip RNN: Learning to Skip State Updates in Recurrent Neural Networks |
Authors | Victor Campos, Brendan Jou, Xavier Giro-i-Nieto, Jordi Torres, Shih-Fu Chang |
Abstract | Recurrent Neural Networks (RNNs) continue to show outstanding performance in sequence modeling tasks. However, training RNNs on long sequences often faces challenges like slow inference, vanishing gradients and difficulty in capturing long-term dependencies. In backpropagation through time settings, these issues are tightly coupled with the large, sequential computational graph resulting from unfolding the RNN in time. We introduce the Skip RNN model, which extends existing RNN models by learning to skip state updates and thus shortens the effective size of the computational graph. This model can also be encouraged to perform fewer state updates through a budget constraint. We evaluate the proposed model on various tasks and show how it can reduce the number of required RNN updates while preserving, and sometimes even improving, the performance of the baseline RNN models. Source code is publicly available at https://imatge-upc.github.io/skiprnn-2017-telecombcn/ . |
Tasks | |
Published | 2017-08-22 |
URL | http://arxiv.org/abs/1708.06834v3 |
http://arxiv.org/pdf/1708.06834v3.pdf | |
PWC | https://paperswithcode.com/paper/skip-rnn-learning-to-skip-state-updates-in |
Repo | https://github.com/gitabcworld/skiprnn_pytorch |
Framework | pytorch |
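A minimal sketch of the gating mechanism described in the abstract: a binary gate decides at each time step whether to run the RNN cell (a state update) or to copy the previous state, and the update probability accumulates while steps are skipped. Everything here (a plain tanh cell, random weights, a hard threshold standing in for the straight-through estimator used during training) is an illustrative assumption, not the released PyTorch code.

```python
# Minimal numpy sketch of the Skip RNN update/skip mechanism (illustrative only).
import numpy as np

rng = np.random.default_rng(3)
input_dim, hidden_dim, steps = 4, 8, 12

# Hypothetical (randomly initialised) parameters of a vanilla tanh RNN cell
# and of the linear layer that emits the state-update probability.
Wx = rng.standard_normal((hidden_dim, input_dim)) * 0.3
Wh = rng.standard_normal((hidden_dim, hidden_dim)) * 0.3
w_gate = rng.standard_normal(hidden_dim) * 0.3

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

h = np.zeros(hidden_dim)
u_tilde = 1.0                      # cumulative update probability; start by updating
updates = 0
for t in range(steps):
    x_t = rng.standard_normal(input_dim)
    u_t = float(u_tilde >= 0.5)    # binarised gate (straight-through in training)
    if u_t == 1.0:
        h = np.tanh(Wx @ x_t + Wh @ h)                     # run the cell: state update
        u_tilde = sigmoid(w_gate @ h)                      # probability of updating next
        updates += 1
    else:
        u_tilde = min(1.0, u_tilde + sigmoid(w_gate @ h))  # accumulate while skipping
print(f"performed {updates} updates out of {steps} steps")
```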
Deep Sketch Hashing: Fast Free-hand Sketch-Based Image Retrieval
Title | Deep Sketch Hashing: Fast Free-hand Sketch-Based Image Retrieval |
Authors | Li Liu, Fumin Shen, Yuming Shen, Xianglong Liu, Ling Shao |
Abstract | Free-hand sketch-based image retrieval (SBIR) is a specific cross-view retrieval task, in which queries are abstract and ambiguous sketches while the retrieval database is formed with natural images. Work in this area mainly focuses on extracting representative and shared features for sketches and natural images. However, these can neither cope well with the geometric distortion between sketches and images nor be feasible for large-scale SBIR due to the heavy continuous-valued distance computation. In this paper, we speed up SBIR by introducing a novel binary coding method, named Deep Sketch Hashing (DSH), where a semi-heterogeneous deep architecture is proposed and incorporated into an end-to-end binary coding framework. Specifically, three convolutional neural networks are utilized to encode free-hand sketches, natural images and, especially, the auxiliary sketch-tokens which are adopted as bridges to mitigate the sketch-image geometric distortion. The learned DSH codes can effectively capture the cross-view similarities as well as the intrinsic semantic correlations between different categories. To the best of our knowledge, DSH is the first hashing work specifically designed for category-level SBIR with an end-to-end deep architecture. The proposed DSH is comprehensively evaluated on two large-scale datasets, TU-Berlin Extension and Sketchy, and the experiments consistently show DSH's superior SBIR accuracy over several state-of-the-art methods, while achieving significantly reduced retrieval time and memory footprint. |
Tasks | Image Retrieval, Sketch-Based Image Retrieval |
Published | 2017-03-16 |
URL | http://arxiv.org/abs/1703.05605v1 |
http://arxiv.org/pdf/1703.05605v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-sketch-hashing-fast-free-hand-sketch |
Repo | https://github.com/ymcidence/DeepSketchHashing |
Framework | none |
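Once the semi-heterogeneous networks have produced binary codes, retrieval reduces to Hamming-distance ranking, which is where the speed and memory savings come from. The sketch below shows only that ranking step with random placeholder codes; the DSH encoders themselves are omitted.

```python
# Hamming-distance retrieval over {-1, +1} binary codes (illustrative sketch).
import numpy as np

rng = np.random.default_rng(4)
m, n_gallery = 64, 1000                      # code length, database size (toy)

gallery_codes = rng.choice([-1, 1], size=(n_gallery, m))   # image codes (placeholder)
query_code = rng.choice([-1, 1], size=m)                   # sketch code (placeholder)

# For +/-1 codes, Hamming distance = (m - <q, g>) / 2, computable with one matmul.
hamming = (m - gallery_codes @ query_code) // 2
top10 = np.argsort(hamming)[:10]
print("top-10 gallery indices:", top10, "distances:", hamming[top10])
```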
Adversarial Patch
Title | Adversarial Patch |
Authors | Tom B. Brown, Dandelion Mané, Aurko Roy, Martín Abadi, Justin Gilmer |
Abstract | We present a method to create universal, robust, targeted adversarial image patches in the real world. The patches are universal because they can be used to attack any scene, robust because they work under a wide variety of transformations, and targeted because they can cause a classifier to output any target class. These adversarial patches can be printed, added to any scene, photographed, and presented to image classifiers; even when the patches are small, they cause the classifiers to ignore the other items in the scene and report a chosen target class. To reproduce the results from the paper, our code is available at https://github.com/tensorflow/cleverhans/tree/master/examples/adversarial_patch |
Tasks | |
Published | 2017-12-27 |
URL | http://arxiv.org/abs/1712.09665v2 |
http://arxiv.org/pdf/1712.09665v2.pdf | |
PWC | https://paperswithcode.com/paper/adversarial-patch |
Repo | https://github.com/Fyndir/PythonDeepLearn |
Framework | tf |
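The robustness of the patch comes from optimising it in expectation over random placements and transformations in the scene. The sketch below illustrates only the compositing step of that expectation (pasting a circular patch at a random location into each image); the classifier and the gradient-based optimisation of the patch are assumed and omitted, and the image and patch sizes are arbitrary.

```python
# Illustrative patch-application step (not the full attack).
import numpy as np

rng = np.random.default_rng(5)

def apply_patch(image, patch):
    """Paste a circular patch at a uniformly random location of `image` (H, W, 3)."""
    h, w, _ = image.shape
    p = patch.shape[0]
    top = rng.integers(0, h - p + 1)
    left = rng.integers(0, w - p + 1)
    ys, xs = np.mgrid[:p, :p]
    mask = ((ys - p / 2) ** 2 + (xs - p / 2) ** 2) <= (p / 2) ** 2   # circular mask
    out = image.copy()
    region = out[top:top + p, left:left + p]
    region[mask] = patch[mask]
    return out

images = rng.random((8, 224, 224, 3))        # stand-in batch of images
patch = rng.random((50, 50, 3))              # the (to-be-optimised) patch
patched = np.stack([apply_patch(img, patch) for img in images])
print(patched.shape)                         # (8, 224, 224, 3)
```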
A Public Image Database for Benchmark of Plant Seedling Classification Algorithms
Title | A Public Image Database for Benchmark of Plant Seedling Classification Algorithms |
Authors | Thomas Mosgaard Giselsson, Rasmus Nyholm Jørgensen, Peter Kryger Jensen, Mads Dyrmann, Henrik Skov Midtiby |
Abstract | A database of images of approximately 960 unique plants belonging to 12 species at several growth stages is made publicly available. It comprises annotated RGB images with a physical resolution of roughly 10 pixels per mm. To standardise the evaluation of classification results obtained with the database, a benchmark based on $f_{1}$ scores is proposed. The dataset is available at https://vision.eng.au.dk/plant-seedlings-dataset |
Tasks | |
Published | 2017-11-15 |
URL | http://arxiv.org/abs/1711.05458v1 |
http://arxiv.org/pdf/1711.05458v1.pdf | |
PWC | https://paperswithcode.com/paper/a-public-image-database-for-benchmark-of |
Repo | https://github.com/WuZhuoran/Plant_Seedlings_Classification |
Framework | pytorch |
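The proposed benchmark scores classifiers with F1 scores. As a reference point, a plain per-class F1 computation might look like the following (my own sketch, not the authors' evaluation script; the toy labels use 4 classes even though the dataset contains 12 species).

```python
# Per-class F1 and macro-averaged F1 from integer label arrays.
import numpy as np

def f1_per_class(y_true, y_pred, n_classes):
    scores = []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * precision * recall / (precision + recall) if precision + recall else 0.0)
    return np.array(scores)

y_true = np.array([0, 0, 1, 1, 2, 2, 2, 3])
y_pred = np.array([0, 1, 1, 1, 2, 0, 2, 3])
per_class = f1_per_class(y_true, y_pred, n_classes=4)
print("per-class F1:", np.round(per_class, 3), "macro F1:", round(per_class.mean(), 3))
```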
Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm
Title | Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm |
Authors | Bjarke Felbo, Alan Mislove, Anders Søgaard, Iyad Rahwan, Sune Lehmann |
Abstract | NLP tasks are often limited by scarcity of manually annotated data. In social media sentiment analysis and related tasks, researchers have therefore used binarized emoticons and specific hashtags as forms of distant supervision. Our paper shows that by extending the distant supervision to a more diverse set of noisy labels, the models can learn richer representations. Through emoji prediction on a dataset of 1246 million tweets containing one of 64 common emojis, we obtain state-of-the-art performance on 8 benchmark datasets within sentiment, emotion and sarcasm detection using a single pretrained model. Our analyses confirm that the diversity of our emotional labels yields a performance improvement over previous distant supervision approaches. |
Tasks | Sarcasm Detection, Sentiment Analysis |
Published | 2017-08-01 |
URL | http://arxiv.org/abs/1708.00524v2 |
http://arxiv.org/pdf/1708.00524v2.pdf | |
PWC | https://paperswithcode.com/paper/using-millions-of-emoji-occurrences-to-learn |
Repo | https://github.com/bfelbo/deepmoji |
Framework | tf |
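The distant-supervision setup above turns raw tweets into (text, emoji) training pairs: any tweet containing one of the chosen emojis becomes an example whose label is that emoji. The sketch below shows one simplified way to build such pairs; the 4-emoji vocabulary stands in for the paper's 64 emojis, and keeping only tweets with a single emoji type is a simplifying assumption rather than the paper's exact filtering.

```python
# Build distant-supervision pairs (text with the emoji removed, emoji id).
EMOJI_VOCAB = ["😂", "😍", "😭", "🙄"]          # the paper uses 64 common emojis

def make_pairs(tweets):
    pairs = []
    for text in tweets:
        present = [e for e in EMOJI_VOCAB if e in text]
        if len(present) == 1:                   # keep unambiguous examples only
            emoji = present[0]
            pairs.append((text.replace(emoji, "").strip(), EMOJI_VOCAB.index(emoji)))
    return pairs

tweets = ["best day ever 😍", "this traffic again 🙄", "no emoji here", "😂 😭 mixed feelings"]
print(make_pairs(tweets))
```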
Continuous-Time Relationship Prediction in Dynamic Heterogeneous Information Networks
Title | Continuous-Time Relationship Prediction in Dynamic Heterogeneous Information Networks |
Authors | Sina Sajadmanesh, Sogol Bazargani, Jiawei Zhang, Hamid R. Rabiee |
Abstract | Online social networks, the World Wide Web, media and technological networks, and other types of so-called information networks are ubiquitous nowadays. These information networks are inherently heterogeneous and dynamic. They are heterogeneous as they consist of multi-typed objects and relations, and they are dynamic as they are constantly evolving over time. One of the challenging issues in such heterogeneous and dynamic environments is to forecast those relationships in the network that will appear in the future. In this paper, we try to solve the problem of continuous-time relationship prediction in dynamic and heterogeneous information networks. This implies predicting the time it takes for a relationship to appear in the future, given its features that have been extracted by considering both heterogeneity and temporal dynamics of the underlying network. To this end, we first introduce a feature extraction framework that combines the power of meta-path-based modeling and recurrent neural networks to effectively extract features suitable for relationship prediction regarding heterogeneity and dynamicity of the networks. Next, we propose a supervised non-parametric approach, called Non-Parametric Generalized Linear Model (NP-GLM), which infers the hidden underlying probability distribution of the relationship building time given its features. We then present a learning algorithm to train NP-GLM and an inference method to answer time-related queries. Extensive experiments conducted on synthetic data and three real-world datasets, namely Delicious, MovieLens, and DBLP, demonstrate the effectiveness of NP-GLM in solving the continuous-time relationship prediction problem vis-a-vis competitive baselines. |
Tasks | |
Published | 2017-09-30 |
URL | https://arxiv.org/abs/1710.00818v4 |
https://arxiv.org/pdf/1710.00818v4.pdf | |
PWC | https://paperswithcode.com/paper/continuous-time-relationship-prediction-in |
Repo | https://github.com/sisaman/npglm |
Framework | none |
Bayesian Policy Gradients via Alpha Divergence Dropout Inference
Title | Bayesian Policy Gradients via Alpha Divergence Dropout Inference |
Authors | Peter Henderson, Thang Doan, Riashat Islam, David Meger |
Abstract | Policy gradient methods have had great success in solving continuous control tasks, yet the stochastic nature of such problems makes deterministic value estimation difficult. We propose an approach which instead estimates a distribution by fitting the value function with a Bayesian Neural Network. We optimize an $\alpha$-divergence objective with Bayesian dropout approximation to learn and estimate this distribution. We show that using the Monte Carlo posterior mean of the Bayesian value function distribution, rather than a deterministic network, improves stability and performance of policy gradient methods in continuous control MuJoCo simulations. |
Tasks | Continuous Control, Policy Gradient Methods |
Published | 2017-12-06 |
URL | http://arxiv.org/abs/1712.02037v1 |
http://arxiv.org/pdf/1712.02037v1.pdf | |
PWC | https://paperswithcode.com/paper/bayesian-policy-gradients-via-alpha |
Repo | https://github.com/Breakend/BayesianPolicyGradients |
Framework | tf |
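The key estimator in the abstract is the Monte Carlo posterior mean of a dropout value network: the same network is run several times with independent dropout masks and the outputs are averaged (their spread also gives an uncertainty estimate). The numpy sketch below illustrates that estimator with a random two-layer network; the actual architecture, the alpha-divergence objective, and the surrounding policy-gradient loop are omitted.

```python
# Monte Carlo dropout estimate of a value function V(s) (illustrative sketch).
import numpy as np

rng = np.random.default_rng(7)
state_dim, hidden = 10, 64
W1 = rng.standard_normal((hidden, state_dim)) * 0.1   # placeholder network weights
w2 = rng.standard_normal(hidden) * 0.1

def value_mc_dropout(state, n_samples=50, keep_prob=0.9):
    """Posterior mean (and spread) of V(s) under dropout on the hidden layer."""
    samples = []
    for _ in range(n_samples):
        h = np.maximum(0.0, W1 @ state)                  # ReLU hidden layer
        mask = rng.random(hidden) < keep_prob            # Bernoulli dropout mask
        samples.append(w2 @ (h * mask) / keep_prob)      # inverted-dropout scaling
    samples = np.array(samples)
    return samples.mean(), samples.std()

state = rng.standard_normal(state_dim)
mean_v, std_v = value_mc_dropout(state)
print(f"V(s) posterior mean {mean_v:.3f}, std {std_v:.3f}")
```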
Automatic Discovery and Geotagging of Objects from Street View Imagery
Title | Automatic Discovery and Geotagging of Objects from Street View Imagery |
Authors | Vladimir A. Krylov, Eamonn Kenny, Rozenn Dahyot |
Abstract | Many applications such as autonomous navigation, urban planning and asset monitoring, rely on the availability of accurate information about objects and their geolocations. In this paper we propose to automatically detect and compute the GPS coordinates of recurring stationary objects of interest using street view imagery. Our processing pipeline relies on two fully convolutional neural networks: the first segments objects in the images while the second estimates their distance from the camera. To geolocate all the detected objects coherently we propose a novel custom Markov Random Field model to perform object triangulation. The novelty of the resulting pipeline is the combined use of monocular depth estimation and triangulation to enable automatic mapping of complex scenes with multiple visually similar objects of interest. We validate experimentally the effectiveness of our approach on two object classes: traffic lights and telegraph poles. The experiments report high object recall rates and GPS accuracy within 2 meters, which is comparable with the precision of single-frequency GPS receivers. |
Tasks | Autonomous Navigation, Depth Estimation, Monocular Depth Estimation |
Published | 2017-08-28 |
URL | http://arxiv.org/abs/1708.08417v2 |
http://arxiv.org/pdf/1708.08417v2.pdf | |
PWC | https://paperswithcode.com/paper/automatic-discovery-and-geotagging-of-objects |
Repo | https://github.com/sasha-kap/CV-to-Maps |
Framework | tf |
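After segmentation and monocular depth estimation give a bearing (and rough range) to each detected object from each camera position, geolocation reduces to intersecting rays from multiple views. The sketch below shows only that triangulation step for two views in a local 2D frame; the networks and the MRF that matches multiple similar objects across views are assumed and omitted.

```python
# Least-squares intersection of bearing rays from known camera positions (2D).
import numpy as np

def triangulate(cam_positions, bearings_deg):
    """Intersect rays p_i + t_i * d_i in the least-squares sense."""
    A, b = [], []
    for p, theta in zip(cam_positions, np.radians(bearings_deg)):
        d = np.array([np.cos(theta), np.sin(theta)])     # unit ray direction
        n = np.array([-d[1], d[0]])                      # normal to the ray
        A.append(n)                                      # constraint: n . x = n . p
        b.append(n @ p)
    x, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    return x

cams = [np.array([0.0, 0.0]), np.array([10.0, 0.0])]     # two camera positions (metres)
bearings = [45.0, 135.0]                                  # object bearing seen from each camera
print("estimated object position:", triangulate(cams, bearings))  # ~ (5, 5)
```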
A time series distance measure for efficient clustering of input output signals by their underlying dynamics
Title | A time series distance measure for efficient clustering of input output signals by their underlying dynamics |
Authors | Oliver Lauwers, Bart De Moor |
Abstract | Starting from a dataset with input/output time series generated by multiple deterministic linear dynamical systems, this paper tackles the problem of automatically clustering these time series. We propose an extension to the so-called Martin cepstral distance that allows these time series to be clustered efficiently, and apply it to simulated electrical circuit data. Traditionally, two ways of handling the problem are used. The first class of methods employs a distance measure on time series (e.g. Euclidean, Dynamic Time Warping) and a clustering technique (e.g. k-means, k-medoids, hierarchical clustering) to find natural groups in the dataset. It is, however, often not clear whether these distance measures effectively take into account the specific temporal correlations in these time series. The second class of methods uses the input/output data to identify a dynamic system using an identification scheme, and then applies a model norm-based distance (e.g. H2, H-infinity) to find out which systems are similar. This, however, can be very time consuming for large amounts of long time series data. We show that the new distance measure presented in this paper performs as well as when every input/output pair is modelled explicitly, but remains computationally much less complex. The complexity of calculating this distance between two time series of length N is O(N log N). |
Tasks | Time Series |
Published | 2017-03-06 |
URL | http://arxiv.org/abs/1703.01923v1 |
http://arxiv.org/pdf/1703.01923v1.pdf | |
PWC | https://paperswithcode.com/paper/a-time-series-distance-measure-for-efficient |
Repo | https://github.com/Olauwers/Extended-Cepstral-Distance |
Framework | none |
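The distance proposed in the paper is cepstral: it compares time series through (weighted) cepstrum coefficients, which can be computed with FFTs in O(N log N). The sketch below is a toy scalar version in that spirit, using the classical k-weighted power-cepstrum difference of the Martin distance; the paper's actual extension handles input/output pairs and differs in the details of the weighting.

```python
# Toy cepstrum-based distance between two scalar time series.
import numpy as np

def power_cepstrum(x, n_coeffs=30):
    spectrum = np.abs(np.fft.rfft(x)) ** 2
    ceps = np.fft.irfft(np.log(spectrum + 1e-12))
    return ceps[1:n_coeffs + 1]                      # drop the zeroth coefficient

def cepstral_distance(x, y, n_coeffs=30):
    k = np.arange(1, n_coeffs + 1)
    diff = power_cepstrum(x, n_coeffs) - power_cepstrum(y, n_coeffs)
    return np.sqrt(np.sum(k * diff ** 2))            # k-weighted squared differences

rng = np.random.default_rng(8)
t = np.arange(2048)
a = np.sin(0.05 * t) + 0.1 * rng.standard_normal(t.size)   # two signals with similar dynamics
b = np.sin(0.05 * t + 1.0) + 0.1 * rng.standard_normal(t.size)
c = np.sin(0.30 * t) + 0.1 * rng.standard_normal(t.size)   # different underlying dynamics
print("d(a, b) =", round(cepstral_distance(a, b), 3), " d(a, c) =", round(cepstral_distance(a, c), 3))
```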
Learning Cross-Modal Deep Representations for Robust Pedestrian Detection
Title | Learning Cross-Modal Deep Representations for Robust Pedestrian Detection |
Authors | Dan Xu, Wanli Ouyang, Elisa Ricci, Xiaogang Wang, Nicu Sebe |
Abstract | This paper presents a novel method for detecting pedestrians under adverse illumination conditions. Our approach relies on a novel cross-modality learning framework and it is based on two main phases. First, given a multimodal dataset, a deep convolutional network is employed to learn a non-linear mapping, modeling the relations between RGB and thermal data. Then, the learned feature representations are transferred to a second deep network, which receives as input an RGB image and outputs the detection results. In this way, features which are both discriminative and robust to bad illumination conditions are learned. Importantly, at test time, only the second pipeline is considered and no thermal data are required. Our extensive evaluation demonstrates that the proposed approach outperforms the state-of-the-art on the challenging KAIST multispectral pedestrian dataset and it is competitive with previous methods on the popular Caltech dataset. |
Tasks | Pedestrian Detection |
Published | 2017-04-08 |
URL | http://arxiv.org/abs/1704.02431v2 |
http://arxiv.org/pdf/1704.02431v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-cross-modal-deep-representations-for |
Repo | https://github.com/SoonminHwang/rgbt-ped-detection |
Framework | none |
One-Sided Unsupervised Domain Mapping
Title | One-Sided Unsupervised Domain Mapping |
Authors | Sagie Benaim, Lior Wolf |
Abstract | In unsupervised domain mapping, the learner is given two unmatched datasets $A$ and $B$. The goal is to learn a mapping $G_{AB}$ that translates a sample in $A$ to the analog sample in $B$. Recent approaches have shown that when learning both $G_{AB}$ and the inverse mapping $G_{BA}$ simultaneously, convincing mappings are obtained. In this work, we present a method of learning $G_{AB}$ without learning $G_{BA}$. This is done by learning a mapping that maintains the distance between a pair of samples. Moreover, good mappings are obtained even by maintaining the distance between different parts of the same sample before and after mapping. We present experimental results showing that the new method not only allows for one-sided mapping learning, but also leads to preferable numerical results over the existing circularity-based constraint. Our entire code is made publicly available at https://github.com/sagiebenaim/DistanceGAN . |
Tasks | Style Transfer, Unsupervised Image-To-Image Translation |
Published | 2017-06-02 |
URL | http://arxiv.org/abs/1706.00826v2 |
http://arxiv.org/pdf/1706.00826v2.pdf | |
PWC | https://paperswithcode.com/paper/one-sided-unsupervised-domain-mapping |
Repo | https://github.com/sagiebenaim/DistanceGAN |
Framework | pytorch |
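The one-sided training signal described above is a distance-preservation term: for pairs of samples, the (normalized) pairwise distance before mapping should match the pairwise distance after mapping, which removes the need to learn the inverse map. The sketch below computes that term with a random linear map standing in for the generator; normalizing each domain's distances by their mean and standard deviation follows the spirit of the method, but the exact constants are estimated from data in the paper.

```python
# Distance-preservation loss over a batch, with a placeholder generator G.
import numpy as np

rng = np.random.default_rng(9)
G = rng.standard_normal((16, 16)) * 0.2 + np.eye(16)   # stand-in for the trained generator

def pairwise_dists(X):
    diffs = X[:, None, :] - X[None, :, :]
    d = np.linalg.norm(diffs, axis=-1)
    iu = np.triu_indices(len(X), k=1)
    return d[iu]

def distance_loss(X):
    d_a = pairwise_dists(X)                 # distances in domain A
    d_b = pairwise_dists(X @ G.T)           # distances after mapping toward domain B
    # normalise each set of distances before comparing
    d_a = (d_a - d_a.mean()) / (d_a.std() + 1e-8)
    d_b = (d_b - d_b.mean()) / (d_b.std() + 1e-8)
    return np.mean(np.abs(d_a - d_b))

batch = rng.standard_normal((8, 16))        # a batch of flattened samples from A
print("distance-preservation loss:", round(distance_loss(batch), 4))
```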
Personalized Saliency and its Prediction
Title | Personalized Saliency and its Prediction |
Authors | Yanyu Xu, Shenghua Gao, Junru Wu, Nianyi Li, Jingyi Yu |
Abstract | Nearly all existing visual saliency models have so far focused on predicting a universal saliency map across all observers. Yet psychology studies suggest that visual attention of different observers can vary significantly under specific circumstances, especially when a scene is composed of multiple salient objects. To study such heterogeneous visual attention patterns across observers, we first construct a personalized saliency dataset and explore correlations between visual attention, personal preferences, and image contents. Specifically, we propose to decompose a personalized saliency map (referred to as PSM) into a universal saliency map (referred to as USM) predictable by existing saliency detection models and a new discrepancy map across users that characterizes personalized saliency. We then present two solutions towards predicting such discrepancy maps, i.e., a multi-task convolutional neural network (CNN) framework and an extended CNN with Person-specific Information Encoded Filters (CNN-PIEF). Extensive experimental results demonstrate the effectiveness of our models for PSM prediction as well as their generalization capability for unseen observers. |
Tasks | Saliency Detection |
Published | 2017-10-09 |
URL | http://arxiv.org/abs/1710.03011v2 |
http://arxiv.org/pdf/1710.03011v2.pdf | |
PWC | https://paperswithcode.com/paper/personalized-saliency-and-its-prediction |
Repo | https://github.com/xuyanyu-shh/Personalized-Saliency |
Framework | none |