Paper Group AWR 79
Learning Visual Servoing with Deep Features and Fitted Q-Iteration
Title | Learning Visual Servoing with Deep Features and Fitted Q-Iteration |
Authors | Alex X. Lee, Sergey Levine, Pieter Abbeel |
Abstract | Visual servoing involves choosing actions that move a robot in response to observations from a camera, in order to reach a goal configuration in the world. Standard visual servoing approaches typically rely on manually designed features and analytical dynamics models, which limits their generalization capability and often requires extensive application-specific feature and model engineering. In this work, we study how learned visual features, learned predictive dynamics models, and reinforcement learning can be combined to learn visual servoing mechanisms. We focus on target following, with the goal of designing algorithms that can learn a visual servo using low amounts of data of the target in question, to enable quick adaptation to new targets. Our approach is based on servoing the camera in the space of learned visual features, rather than image pixels or manually-designed keypoints. We demonstrate that standard deep features, in our case taken from a model trained for object classification, can be used together with a bilinear predictive model to learn an effective visual servo that is robust to visual variation, changes in viewing angle and appearance, and occlusions. A key component of our approach is to use a sample-efficient fitted Q-iteration algorithm to learn which features are best suited for the task at hand. We show that we can learn an effective visual servo on a complex synthetic car following benchmark using just 20 training trajectory samples for reinforcement learning. We demonstrate substantial improvement over a conventional approach based on image pixels or hand-designed keypoints, and we show an improvement in sample-efficiency of more than two orders of magnitude over standard model-free deep reinforcement learning algorithms. Videos are available at http://rll.berkeley.edu/visual_servoing . |
Tasks | Object Classification |
Published | 2017-03-31 |
URL | http://arxiv.org/abs/1703.11000v2 |
http://arxiv.org/pdf/1703.11000v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-visual-servoing-with-deep-features |
Repo | https://github.com/alexlee-gk/visual_dynamics |
Framework | none |
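The core servoing idea above couples learned visual features with a bilinear predictive model of how those features change under camera motion. The sketch below is a minimal toy illustration of that coupling, not the authors' implementation: the dynamics parameters `A` and `B`, the feature dimension, and the least-squares control rule are illustrative stand-ins (in the paper, the features come from a pretrained classification network and the per-feature weighting is learned with fitted Q-iteration).

```python
# A minimal, illustrative sketch (not the authors' code) of a bilinear
# feature-dynamics model: the predicted next features are a linear function of
# the current features plus a term that is bilinear in features and control.
import numpy as np

rng = np.random.default_rng(0)
d, k = 8, 2                      # feature dimension, control dimension (toy sizes)

# Hypothetical learned dynamics parameters (random placeholders here).
A = np.eye(d) + 0.01 * rng.standard_normal((d, d))   # feature transition
B = 0.1 * rng.standard_normal((d, d, k))             # bilinear feature-control tensor

def predict_features(y, u):
    """One-step bilinear prediction: y_next = A y + (B contracted with y) u."""
    return A @ y + np.einsum('ijk,j,k->i', B, y, u)

def servo_control(y, y_goal):
    """Choose the control that moves the predicted features toward the goal
    by solving a small linear least-squares problem."""
    J = np.einsum('ijk,j->ik', B, y)          # d x k sensitivity of prediction w.r.t. u
    residual = y_goal - A @ y
    u, *_ = np.linalg.lstsq(J, residual, rcond=None)
    return u

y = rng.standard_normal(d)        # current deep-feature observation (toy stand-in)
y_goal = rng.standard_normal(d)   # features of the target configuration
u = servo_control(y, y_goal)
print("control:", u, "predicted error:", np.linalg.norm(predict_features(y, u) - y_goal))
```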
Attributes2Classname: A discriminative model for attribute-based unsupervised zero-shot learning
Title | Attributes2Classname: A discriminative model for attribute-based unsupervised zero-shot learning |
Authors | Berkan Demirel, Ramazan Gokberk Cinbis, Nazli Ikizler-Cinbis |
Abstract | We propose a novel approach for unsupervised zero-shot learning (ZSL) of classes based on their names. Most existing unsupervised ZSL methods aim to learn a model for directly comparing image features and class names. However, this proves to be a difficult task due to the dominance of non-visual semantics in the underlying vector-space embeddings of class names. To address this issue, we discriminatively learn a word representation such that the similarities between class names and combinations of attribute names fall in line with visual similarity. Contrary to traditional zero-shot learning approaches that are built upon attribute presence, our approach bypasses the laborious attribute-class relation annotations for unseen classes. In addition, our proposed approach renders text-only training possible, hence training can be augmented without the need to collect additional image data. The experimental results show that our method yields state-of-the-art results for unsupervised ZSL on three benchmark datasets. |
Tasks | Zero-Shot Learning |
Published | 2017-05-04 |
URL | http://arxiv.org/abs/1705.01734v2 |
http://arxiv.org/pdf/1705.01734v2.pdf | |
PWC | https://paperswithcode.com/paper/attributes2classname-a-discriminative-model |
Repo | https://github.com/berkandemirel/attributes2classname |
Framework | tf |
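As a rough intuition for the abstract above: once class names and attribute names live in a shared word-embedding space, an unseen class can be scored by comparing its name embedding with the combined embedding of the attributes predicted for an image. The snippet below is a hedged sketch of that scoring step only; the embeddings are random placeholders, and the discriminative training that aligns them with visual similarity is omitted.

```python
# Illustrative sketch (assumptions, not the paper's implementation): score unseen
# classes by the cosine similarity between each class-name embedding and the
# average embedding of the attribute names predicted for an image.
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

rng = np.random.default_rng(1)
dim = 50
# Hypothetical word embeddings for unseen class names and for attribute names.
class_vecs = {c: rng.standard_normal(dim) for c in ["zebra", "whale", "bobcat"]}
attribute_vecs = {a: rng.standard_normal(dim) for a in ["stripes", "hooves", "ocean", "claws"]}

def predict_class(predicted_attributes):
    """Rank unseen classes against the mean embedding of the predicted attributes."""
    query = np.mean([attribute_vecs[a] for a in predicted_attributes], axis=0)
    scores = {c: cosine(v, query) for c, v in class_vecs.items()}
    return max(scores, key=scores.get), scores

label, scores = predict_class(["stripes", "hooves"])
print(label, scores)
```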
Good Features to Correlate for Visual Tracking
Title | Good Features to Correlate for Visual Tracking |
Authors | Erhan Gundogdu, A. Aydin Alatan |
Abstract | In recent years, correlation filters have shown dominant and spectacular results for visual object tracking. The types of features employed in this family of trackers significantly affect tracking performance. The ultimate goal is to utilize features that are robust and invariant to any kind of appearance change of the object, while predicting the object location as accurately as in the case of no appearance change. As deep learning based methods have emerged, the study of learning features for specific tasks has accelerated. For instance, discriminative visual tracking methods based on deep architectures have been studied with promising performance. Nevertheless, correlation filter based (CFB) trackers confine themselves to pre-trained networks trained for the object classification problem. To this end, this manuscript formulates the problem of learning deep fully convolutional features for CFB visual tracking. In order to learn the proposed model, a novel and efficient backpropagation algorithm is presented based on the loss function of the network. The proposed learning framework enables the network model to be flexible for a custom design. Moreover, it alleviates the dependency on the network trained for classification. Extensive performance analysis shows the efficacy of the proposed custom design in the CFB tracking framework. By fine-tuning the convolutional parts of a state-of-the-art network and integrating this model into a CFB tracker, the top-performing one of VOT2016, an 18% increase in expected average overlap is achieved and tracking failures are decreased by 25%, while maintaining superiority over state-of-the-art methods on the OTB-2013 and OTB-2015 tracking datasets. |
Tasks | Object Classification, Object Tracking, Visual Object Tracking, Visual Tracking |
Published | 2017-04-20 |
URL | http://arxiv.org/abs/1704.06326v2 |
http://arxiv.org/pdf/1704.06326v2.pdf | |
PWC | https://paperswithcode.com/paper/good-features-to-correlate-for-visual |
Repo | https://github.com/egundogdu/CFCF |
Framework | tf |
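For context on what "features to correlate" feed into, the following is a compact single-channel correlation-filter sketch in the MOSSE/ridge-regression style. It is a generic baseline formulation, not the paper's fine-tuned deep-feature pipeline: the feature patch is a random placeholder where the learned convolutional features would go.

```python
# Single-channel correlation filter trained and applied in the Fourier domain.
import numpy as np

def train_filter(feature_patch, target_response, lam=1e-2):
    """Solve the ridge-regression correlation filter in the Fourier domain."""
    F = np.fft.fft2(feature_patch)
    G = np.fft.fft2(target_response)
    return (G * np.conj(F)) / (F * np.conj(F) + lam)

def detect(filter_hat, feature_patch):
    """Correlate the filter with a new patch and return the response map."""
    F = np.fft.fft2(feature_patch)
    return np.real(np.fft.ifft2(filter_hat * F))

# Toy example: a Gaussian-shaped desired response centred on the object.
size = 64
ys, xs = np.mgrid[:size, :size]
gauss = np.exp(-((xs - size // 2) ** 2 + (ys - size // 2) ** 2) / (2 * 3.0 ** 2))

rng = np.random.default_rng(2)
patch = rng.standard_normal((size, size))           # stand-in for a feature channel
H = train_filter(patch, gauss)
response = detect(H, patch)
print("peak at:", np.unravel_index(response.argmax(), response.shape))
```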
Skip RNN: Learning to Skip State Updates in Recurrent Neural Networks
Title | Skip RNN: Learning to Skip State Updates in Recurrent Neural Networks |
Authors | Victor Campos, Brendan Jou, Xavier Giro-i-Nieto, Jordi Torres, Shih-Fu Chang |
Abstract | Recurrent Neural Networks (RNNs) continue to show outstanding performance in sequence modeling tasks. However, training RNNs on long sequences often faces challenges like slow inference, vanishing gradients and difficulty in capturing long-term dependencies. In backpropagation through time settings, these issues are tightly coupled with the large, sequential computational graph resulting from unfolding the RNN in time. We introduce the Skip RNN model, which extends existing RNN models by learning to skip state updates and thus shortens the effective size of the computational graph. This model can also be encouraged to perform fewer state updates through a budget constraint. We evaluate the proposed model on various tasks and show how it can reduce the number of required RNN updates while preserving, and sometimes even improving, the performance of the baseline RNN models. Source code is publicly available at https://imatge-upc.github.io/skiprnn-2017-telecombcn/ . |
Tasks | |
Published | 2017-08-22 |
URL | http://arxiv.org/abs/1708.06834v3 |
http://arxiv.org/pdf/1708.06834v3.pdf | |
PWC | https://paperswithcode.com/paper/skip-rnn-learning-to-skip-state-updates-in |
Repo | https://github.com/gitabcworld/skiprnn_pytorch |
Framework | pytorch |
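A minimal sketch of the gating mechanism described in the abstract: a binary gate decides at each time step whether to run the RNN cell (a state update) or to copy the previous state, and the update probability accumulates while steps are skipped. Everything here (a plain tanh cell, random weights, a hard threshold standing in for the straight-through estimator used during training) is an illustrative assumption, not the released PyTorch code.

```python
# Minimal numpy sketch of the Skip RNN update/skip mechanism (illustrative only).
import numpy as np

rng = np.random.default_rng(3)
input_dim, hidden_dim, steps = 4, 8, 12

# Hypothetical (randomly initialised) parameters of a vanilla tanh RNN cell
# and of the linear layer that emits the state-update probability.
Wx = rng.standard_normal((hidden_dim, input_dim)) * 0.3
Wh = rng.standard_normal((hidden_dim, hidden_dim)) * 0.3
w_gate = rng.standard_normal(hidden_dim) * 0.3

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

h = np.zeros(hidden_dim)
u_tilde = 1.0                      # cumulative update probability; start by updating
updates = 0
for t in range(steps):
    x_t = rng.standard_normal(input_dim)
    u_t = float(u_tilde >= 0.5)    # binarised gate (straight-through in training)
    if u_t == 1.0:
        h = np.tanh(Wx @ x_t + Wh @ h)                     # run the cell: state update
        u_tilde = sigmoid(w_gate @ h)                      # probability of updating next
        updates += 1
    else:
        u_tilde = min(1.0, u_tilde + sigmoid(w_gate @ h))  # accumulate while skipping
print(f"performed {updates} updates out of {steps} steps")
```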
Deep Sketch Hashing: Fast Free-hand Sketch-Based Image Retrieval
Title | Deep Sketch Hashing: Fast Free-hand Sketch-Based Image Retrieval |
Authors | Li Liu, Fumin Shen, Yuming Shen, Xianglong Liu, Ling Shao |
Abstract | Free-hand sketch-based image retrieval (SBIR) is a specific cross-view retrieval task, in which queries are abstract and ambiguous sketches while the retrieval database is formed with natural images. Work in this area mainly focuses on extracting representative and shared features for sketches and natural images. However, these can neither cope well with the geometric distortion between sketches and images nor be feasible for large-scale SBIR due to the heavy continuous-valued distance computation. In this paper, we speed up SBIR by introducing a novel binary coding method, named Deep Sketch Hashing (DSH), where a semi-heterogeneous deep architecture is proposed and incorporated into an end-to-end binary coding framework. Specifically, three convolutional neural networks are utilized to encode free-hand sketches, natural images and, especially, the auxiliary sketch-tokens which are adopted as bridges to mitigate the sketch-image geometric distortion. The learned DSH codes can effectively capture the cross-view similarities as well as the intrinsic semantic correlations between different categories. To the best of our knowledge, DSH is the first hashing work specifically designed for category-level SBIR with an end-to-end deep architecture. The proposed DSH is comprehensively evaluated on two large-scale datasets, TU-Berlin Extension and Sketchy, and the experiments consistently show DSH's superior SBIR accuracy over several state-of-the-art methods, while achieving significantly reduced retrieval time and memory footprint. |
Tasks | Image Retrieval, Sketch-Based Image Retrieval |
Published | 2017-03-16 |
URL | http://arxiv.org/abs/1703.05605v1 |
http://arxiv.org/pdf/1703.05605v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-sketch-hashing-fast-free-hand-sketch |
Repo | https://github.com/ymcidence/DeepSketchHashing |
Framework | none |
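Once the semi-heterogeneous networks have produced binary codes, retrieval reduces to Hamming-distance ranking, which is where the speed and memory savings come from. The sketch below shows only that ranking step with random placeholder codes; the DSH encoders themselves are omitted.

```python
# Hamming-distance retrieval over {-1, +1} binary codes (illustrative sketch).
import numpy as np

rng = np.random.default_rng(4)
m, n_gallery = 64, 1000                      # code length, database size (toy)

gallery_codes = rng.choice([-1, 1], size=(n_gallery, m))   # image codes (placeholder)
query_code = rng.choice([-1, 1], size=m)                   # sketch code (placeholder)

# For +/-1 codes, Hamming distance = (m - <q, g>) / 2, computable with one matmul.
hamming = (m - gallery_codes @ query_code) // 2
top10 = np.argsort(hamming)[:10]
print("top-10 gallery indices:", top10, "distances:", hamming[top10])
```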
Adversarial Patch
Title | Adversarial Patch |
Authors | Tom B. Brown, Dandelion Mané, Aurko Roy, Martín Abadi, Justin Gilmer |
Abstract | We present a method to create universal, robust, targeted adversarial image patches in the real world. The patches are universal because they can be used to attack any scene, robust because they work under a wide variety of transformations, and targeted because they can cause a classifier to output any target class. These adversarial patches can be printed, added to any scene, photographed, and presented to image classifiers; even when the patches are small, they cause the classifiers to ignore the other items in the scene and report a chosen target class. To reproduce the results from the paper, our code is available at https://github.com/tensorflow/cleverhans/tree/master/examples/adversarial_patch |
Tasks | |
Published | 2017-12-27 |
URL | http://arxiv.org/abs/1712.09665v2 |
http://arxiv.org/pdf/1712.09665v2.pdf | |
PWC | https://paperswithcode.com/paper/adversarial-patch |
Repo | https://github.com/Fyndir/PythonDeepLearn |
Framework | tf |
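The robustness of the patch comes from optimising it in expectation over random placements and transformations in the scene. The sketch below illustrates only the compositing step of that expectation (pasting a circular patch at a random location into each image); the classifier and the gradient-based optimisation of the patch are assumed and omitted, and the image and patch sizes are arbitrary.

```python
# Illustrative patch-application step (not the full attack).
import numpy as np

rng = np.random.default_rng(5)

def apply_patch(image, patch):
    """Paste a circular patch at a uniformly random location of `image` (H, W, 3)."""
    h, w, _ = image.shape
    p = patch.shape[0]
    top = rng.integers(0, h - p + 1)
    left = rng.integers(0, w - p + 1)
    ys, xs = np.mgrid[:p, :p]
    mask = ((ys - p / 2) ** 2 + (xs - p / 2) ** 2) <= (p / 2) ** 2   # circular mask
    out = image.copy()
    region = out[top:top + p, left:left + p]
    region[mask] = patch[mask]
    return out

images = rng.random((8, 224, 224, 3))        # stand-in batch of images
patch = rng.random((50, 50, 3))              # the (to-be-optimised) patch
patched = np.stack([apply_patch(img, patch) for img in images])
print(patched.shape)                         # (8, 224, 224, 3)
```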
A Public Image Database for Benchmark of Plant Seedling Classification Algorithms
Title | A Public Image Database for Benchmark of Plant Seedling Classification Algorithms |
Authors | Thomas Mosgaard Giselsson, Rasmus Nyholm Jørgensen, Peter Kryger Jensen, Mads Dyrmann, Henrik Skov Midtiby |
Abstract | A database of images of approximately 960 unique plants belonging to 12 species at several growth stages is made publicly available. It comprises annotated RGB images with a physical resolution of roughly 10 pixels per mm. To standardise the evaluation of classification results obtained with the database, a benchmark based on $f_{1}$ scores is proposed. The dataset is available at https://vision.eng.au.dk/plant-seedlings-dataset |
Tasks | |
Published | 2017-11-15 |
URL | http://arxiv.org/abs/1711.05458v1 |
http://arxiv.org/pdf/1711.05458v1.pdf | |
PWC | https://paperswithcode.com/paper/a-public-image-database-for-benchmark-of |
Repo | https://github.com/WuZhuoran/Plant_Seedlings_Classification |
Framework | pytorch |
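The proposed benchmark scores classifiers with F1 scores. As a reference point, a plain per-class F1 computation might look like the following (my own sketch, not the authors' evaluation script; the toy labels use 4 classes even though the dataset contains 12 species).

```python
# Per-class F1 and macro-averaged F1 from integer label arrays.
import numpy as np

def f1_per_class(y_true, y_pred, n_classes):
    scores = []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * precision * recall / (precision + recall) if precision + recall else 0.0)
    return np.array(scores)

y_true = np.array([0, 0, 1, 1, 2, 2, 2, 3])
y_pred = np.array([0, 1, 1, 1, 2, 0, 2, 3])
per_class = f1_per_class(y_true, y_pred, n_classes=4)
print("per-class F1:", np.round(per_class, 3), "macro F1:", round(per_class.mean(), 3))
```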
Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm
Title | Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm |
Authors | Bjarke Felbo, Alan Mislove, Anders Søgaard, Iyad Rahwan, Sune Lehmann |
Abstract | NLP tasks are often limited by scarcity of manually annotated data. In social media sentiment analysis and related tasks, researchers have therefore used binarized emoticons and specific hashtags as forms of distant supervision. Our paper shows that by extending the distant supervision to a more diverse set of noisy labels, the models can learn richer representations. Through emoji prediction on a dataset of 1246 million tweets containing one of 64 common emojis, we obtain state-of-the-art performance on 8 benchmark datasets within sentiment, emotion and sarcasm detection using a single pretrained model. Our analyses confirm that the diversity of our emotional labels yields a performance improvement over previous distant supervision approaches. |
Tasks | Sarcasm Detection, Sentiment Analysis |
Published | 2017-08-01 |
URL | http://arxiv.org/abs/1708.00524v2 |
http://arxiv.org/pdf/1708.00524v2.pdf | |
PWC | https://paperswithcode.com/paper/using-millions-of-emoji-occurrences-to-learn |
Repo | https://github.com/bfelbo/deepmoji |
Framework | tf |
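The distant-supervision setup above turns raw tweets into (text, emoji) training pairs: any tweet containing one of the chosen emojis becomes an example whose label is that emoji. The sketch below shows one simplified way to build such pairs; the 4-emoji vocabulary stands in for the paper's 64 emojis, and keeping only tweets with a single emoji type is a simplifying assumption rather than the paper's exact filtering.

```python
# Build distant-supervision pairs (text with the emoji removed, emoji id).
EMOJI_VOCAB = ["😂", "😍", "😭", "🙄"]          # the paper uses 64 common emojis

def make_pairs(tweets):
    pairs = []
    for text in tweets:
        present = [e for e in EMOJI_VOCAB if e in text]
        if len(present) == 1:                   # keep unambiguous examples only
            emoji = present[0]
            pairs.append((text.replace(emoji, "").strip(), EMOJI_VOCAB.index(emoji)))
    return pairs

tweets = ["best day ever 😍", "this traffic again 🙄", "no emoji here", "😂 😭 mixed feelings"]
print(make_pairs(tweets))
```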
Continuous-Time Relationship Prediction in Dynamic Heterogeneous Information Networks
Title | Continuous-Time Relationship Prediction in Dynamic Heterogeneous Information Networks |
Authors | Sina Sajadmanesh, Sogol Bazargani, Jiawei Zhang, Hamid R. Rabiee |
Abstract | Online social networks, the World Wide Web, media and technological networks, and other types of so-called information networks are ubiquitous nowadays. These information networks are inherently heterogeneous and dynamic. They are heterogeneous as they consist of multi-typed objects and relations, and they are dynamic as they are constantly evolving over time. One of the challenging issues in such heterogeneous and dynamic environments is to forecast those relationships in the network that will appear in the future. In this paper, we try to solve the problem of continuous-time relationship prediction in dynamic and heterogeneous information networks. This implies predicting the time it takes for a relationship to appear in the future, given its features that have been extracted by considering both heterogeneity and temporal dynamics of the underlying network. To this end, we first introduce a feature extraction framework that combines the power of meta-path-based modeling and recurrent neural networks to effectively extract features suitable for relationship prediction regarding heterogeneity and dynamicity of the networks. Next, we propose a supervised non-parametric approach, called Non-Parametric Generalized Linear Model (NP-GLM), which infers the hidden underlying probability distribution of the relationship building time given its features. We then present a learning algorithm to train NP-GLM and an inference method to answer time-related queries. Extensive experiments conducted on synthetic data and three real-world datasets, namely Delicious, MovieLens, and DBLP, demonstrate the effectiveness of NP-GLM in solving the continuous-time relationship prediction problem vis-a-vis competitive baselines. |
Tasks | |
Published | 2017-09-30 |
URL | https://arxiv.org/abs/1710.00818v4 |
https://arxiv.org/pdf/1710.00818v4.pdf | |
PWC | https://paperswithcode.com/paper/continuous-time-relationship-prediction-in |
Repo | https://github.com/sisaman/npglm |
Framework | none |
Bayesian Policy Gradients via Alpha Divergence Dropout Inference
Title | Bayesian Policy Gradients via Alpha Divergence Dropout Inference |
Authors | Peter Henderson, Thang Doan, Riashat Islam, David Meger |
Abstract | Policy gradient methods have had great success in solving continuous control tasks, yet the stochastic nature of such problems makes deterministic value estimation difficult. We propose an approach which instead estimates a distribution by fitting the value function with a Bayesian Neural Network. We optimize an $\alpha$-divergence objective with Bayesian dropout approximation to learn and estimate this distribution. We show that using the Monte Carlo posterior mean of the Bayesian value function distribution, rather than a deterministic network, improves stability and performance of policy gradient methods in continuous control MuJoCo simulations. |
Tasks | Continuous Control, Policy Gradient Methods |
Published | 2017-12-06 |
URL | http://arxiv.org/abs/1712.02037v1 |
http://arxiv.org/pdf/1712.02037v1.pdf | |
PWC | https://paperswithcode.com/paper/bayesian-policy-gradients-via-alpha |
Repo | https://github.com/Breakend/BayesianPolicyGradients |
Framework | tf |
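The key estimator in the abstract is the Monte Carlo posterior mean of a dropout value network: the same network is run several times with independent dropout masks and the outputs are averaged (their spread also gives an uncertainty estimate). The numpy sketch below illustrates that estimator with a random two-layer network; the actual architecture, the alpha-divergence objective, and the surrounding policy-gradient loop are omitted.

```python
# Monte Carlo dropout estimate of a value function V(s) (illustrative sketch).
import numpy as np

rng = np.random.default_rng(7)
state_dim, hidden = 10, 64
W1 = rng.standard_normal((hidden, state_dim)) * 0.1   # placeholder network weights
w2 = rng.standard_normal(hidden) * 0.1

def value_mc_dropout(state, n_samples=50, keep_prob=0.9):
    """Posterior mean (and spread) of V(s) under dropout on the hidden layer."""
    samples = []
    for _ in range(n_samples):
        h = np.maximum(0.0, W1 @ state)                  # ReLU hidden layer
        mask = rng.random(hidden) < keep_prob            # Bernoulli dropout mask
        samples.append(w2 @ (h * mask) / keep_prob)      # inverted-dropout scaling
    samples = np.array(samples)
    return samples.mean(), samples.std()

state = rng.standard_normal(state_dim)
mean_v, std_v = value_mc_dropout(state)
print(f"V(s) posterior mean {mean_v:.3f}, std {std_v:.3f}")
```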
Automatic Discovery and Geotagging of Objects from Street View Imagery
Title | Automatic Discovery and Geotagging of Objects from Street View Imagery |
Authors | Vladimir A. Krylov, Eamonn Kenny, Rozenn Dahyot |
Abstract | Many applications such as autonomous navigation, urban planning and asset monitoring, rely on the availability of accurate information about objects and their geolocations. In this paper we propose to automatically detect and compute the GPS coordinates of recurring stationary objects of interest using street view imagery. Our processing pipeline relies on two fully convolutional neural networks: the first segments objects in the images while the second estimates their distance from the camera. To geolocate all the detected objects coherently we propose a novel custom Markov Random Field model to perform object triangulation. The novelty of the resulting pipeline is the combined use of monocular depth estimation and triangulation to enable automatic mapping of complex scenes with multiple visually similar objects of interest. We validate experimentally the effectiveness of our approach on two object classes: traffic lights and telegraph poles. The experiments report high object recall rates and GPS accuracy within 2 meters, which is comparable with the precision of single-frequency GPS receivers. |
Tasks | Autonomous Navigation, Depth Estimation, Monocular Depth Estimation |
Published | 2017-08-28 |
URL | http://arxiv.org/abs/1708.08417v2 |
http://arxiv.org/pdf/1708.08417v2.pdf | |
PWC | https://paperswithcode.com/paper/automatic-discovery-and-geotagging-of-objects |
Repo | https://github.com/sasha-kap/CV-to-Maps |
Framework | tf |
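After segmentation and monocular depth estimation give a bearing (and rough range) to each detected object from each camera position, geolocation reduces to intersecting rays from multiple views. The sketch below shows only that triangulation step for two views in a local 2D frame; the networks and the MRF that matches multiple similar objects across views are assumed and omitted.

```python
# Least-squares intersection of bearing rays from known camera positions (2D).
import numpy as np

def triangulate(cam_positions, bearings_deg):
    """Intersect rays p_i + t_i * d_i in the least-squares sense."""
    A, b = [], []
    for p, theta in zip(cam_positions, np.radians(bearings_deg)):
        d = np.array([np.cos(theta), np.sin(theta)])     # unit ray direction
        n = np.array([-d[1], d[0]])                      # normal to the ray
        A.append(n)                                      # constraint: n . x = n . p
        b.append(n @ p)
    x, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    return x

cams = [np.array([0.0, 0.0]), np.array([10.0, 0.0])]     # two camera positions (metres)
bearings = [45.0, 135.0]                                  # object bearing seen from each camera
print("estimated object position:", triangulate(cams, bearings))  # ~ (5, 5)
```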
A time series distance measure for efficient clustering of input output signals by their underlying dynamics
Title | A time series distance measure for efficient clustering of input output signals by their underlying dynamics |
Authors | Oliver Lauwers, Bart De Moor |
Abstract | Starting from a dataset with input/output time series generated by multiple deterministic linear dynamical systems, this paper tackles the problem of automatically clustering these time series. We propose an extension to the so-called Martin cepstral distance that allows these time series to be clustered efficiently, and apply it to simulated electrical circuit data. Traditionally, two ways of handling the problem are used. The first class of methods employs a distance measure on time series (e.g. Euclidean, Dynamic Time Warping) and a clustering technique (e.g. k-means, k-medoids, hierarchical clustering) to find natural groups in the dataset. It is, however, often not clear whether these distance measures effectively take into account the specific temporal correlations in these time series. The second class of methods uses the input/output data to identify a dynamic system using an identification scheme, and then applies a model norm-based distance (e.g. H2, H-infinity) to find out which systems are similar. This, however, can be very time consuming for large amounts of long time series data. We show that the new distance measure presented in this paper performs as well as when every input/output pair is modelled explicitly, but remains computationally much less complex. The complexity of calculating this distance between two time series of length N is O(N log N). |
Tasks | Time Series |
Published | 2017-03-06 |
URL | http://arxiv.org/abs/1703.01923v1 |
http://arxiv.org/pdf/1703.01923v1.pdf | |
PWC | https://paperswithcode.com/paper/a-time-series-distance-measure-for-efficient |
Repo | https://github.com/Olauwers/Extended-Cepstral-Distance |
Framework | none |
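The distance proposed in the paper is cepstral: it compares time series through (weighted) cepstrum coefficients, which can be computed with FFTs in O(N log N). The sketch below is a toy scalar version in that spirit, using the classical k-weighted power-cepstrum difference of the Martin distance; the paper's actual extension handles input/output pairs and differs in the details of the weighting.

```python
# Toy cepstrum-based distance between two scalar time series.
import numpy as np

def power_cepstrum(x, n_coeffs=30):
    spectrum = np.abs(np.fft.rfft(x)) ** 2
    ceps = np.fft.irfft(np.log(spectrum + 1e-12))
    return ceps[1:n_coeffs + 1]                      # drop the zeroth coefficient

def cepstral_distance(x, y, n_coeffs=30):
    k = np.arange(1, n_coeffs + 1)
    diff = power_cepstrum(x, n_coeffs) - power_cepstrum(y, n_coeffs)
    return np.sqrt(np.sum(k * diff ** 2))            # k-weighted squared differences

rng = np.random.default_rng(8)
t = np.arange(2048)
a = np.sin(0.05 * t) + 0.1 * rng.standard_normal(t.size)   # two signals with similar dynamics
b = np.sin(0.05 * t + 1.0) + 0.1 * rng.standard_normal(t.size)
c = np.sin(0.30 * t) + 0.1 * rng.standard_normal(t.size)   # different underlying dynamics
print("d(a, b) =", round(cepstral_distance(a, b), 3), " d(a, c) =", round(cepstral_distance(a, c), 3))
```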
Learning Cross-Modal Deep Representations for Robust Pedestrian Detection
Title | Learning Cross-Modal Deep Representations for Robust Pedestrian Detection |
Authors | Dan Xu, Wanli Ouyang, Elisa Ricci, Xiaogang Wang, Nicu Sebe |
Abstract | This paper presents a novel method for detecting pedestrians under adverse illumination conditions. Our approach relies on a novel cross-modality learning framework and it is based on two main phases. First, given a multimodal dataset, a deep convolutional network is employed to learn a non-linear mapping, modeling the relations between RGB and thermal data. Then, the learned feature representations are transferred to a second deep network, which receives as input an RGB image and outputs the detection results. In this way, features which are both discriminative and robust to bad illumination conditions are learned. Importantly, at test time, only the second pipeline is considered and no thermal data are required. Our extensive evaluation demonstrates that the proposed approach outperforms the state-of-the-art on the challenging KAIST multispectral pedestrian dataset and it is competitive with previous methods on the popular Caltech dataset. |
Tasks | Pedestrian Detection |
Published | 2017-04-08 |
URL | http://arxiv.org/abs/1704.02431v2 |
http://arxiv.org/pdf/1704.02431v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-cross-modal-deep-representations-for |
Repo | https://github.com/SoonminHwang/rgbt-ped-detection |
Framework | none |
One-Sided Unsupervised Domain Mapping
Title | One-Sided Unsupervised Domain Mapping |
Authors | Sagie Benaim, Lior Wolf |
Abstract | In unsupervised domain mapping, the learner is given two unmatched datasets $A$ and $B$. The goal is to learn a mapping $G_{AB}$ that translates a sample in $A$ to the analog sample in $B$. Recent approaches have shown that when learning both $G_{AB}$ and the inverse mapping $G_{BA}$ simultaneously, convincing mappings are obtained. In this work, we present a method of learning $G_{AB}$ without learning $G_{BA}$. This is done by learning a mapping that maintains the distance between a pair of samples. Moreover, good mappings are obtained even by maintaining the distance between different parts of the same sample before and after mapping. We present experimental results showing that the new method not only allows for one-sided mapping learning, but also leads to preferable numerical results over the existing circularity-based constraint. Our entire code is made publicly available at https://github.com/sagiebenaim/DistanceGAN . |
Tasks | Style Transfer, Unsupervised Image-To-Image Translation |
Published | 2017-06-02 |
URL | http://arxiv.org/abs/1706.00826v2 |
http://arxiv.org/pdf/1706.00826v2.pdf | |
PWC | https://paperswithcode.com/paper/one-sided-unsupervised-domain-mapping |
Repo | https://github.com/sagiebenaim/DistanceGAN |
Framework | pytorch |
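The one-sided training signal described above is a distance-preservation term: for pairs of samples, the (normalized) pairwise distance before mapping should match the pairwise distance after mapping, which removes the need to learn the inverse map. The sketch below computes that term with a random linear map standing in for the generator; normalizing each domain's distances by their mean and standard deviation follows the spirit of the method, but the exact constants are estimated from data in the paper.

```python
# Distance-preservation loss over a batch, with a placeholder generator G.
import numpy as np

rng = np.random.default_rng(9)
G = rng.standard_normal((16, 16)) * 0.2 + np.eye(16)   # stand-in for the trained generator

def pairwise_dists(X):
    diffs = X[:, None, :] - X[None, :, :]
    d = np.linalg.norm(diffs, axis=-1)
    iu = np.triu_indices(len(X), k=1)
    return d[iu]

def distance_loss(X):
    d_a = pairwise_dists(X)                 # distances in domain A
    d_b = pairwise_dists(X @ G.T)           # distances after mapping toward domain B
    # normalise each set of distances before comparing
    d_a = (d_a - d_a.mean()) / (d_a.std() + 1e-8)
    d_b = (d_b - d_b.mean()) / (d_b.std() + 1e-8)
    return np.mean(np.abs(d_a - d_b))

batch = rng.standard_normal((8, 16))        # a batch of flattened samples from A
print("distance-preservation loss:", round(distance_loss(batch), 4))
```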
Personalized Saliency and its Prediction
Title | Personalized Saliency and its Prediction |
Authors | Yanyu Xu, Shenghua Gao, Junru Wu, Nianyi Li, Jingyi Yu |
Abstract | Nearly all existing visual saliency models have so far focused on predicting a universal saliency map across all observers. Yet psychology studies suggest that visual attention of different observers can vary significantly under specific circumstances, especially when a scene is composed of multiple salient objects. To study such heterogeneous visual attention patterns across observers, we first construct a personalized saliency dataset and explore correlations between visual attention, personal preferences, and image contents. Specifically, we propose to decompose a personalized saliency map (referred to as PSM) into a universal saliency map (referred to as USM) predictable by existing saliency detection models and a new discrepancy map across users that characterizes personalized saliency. We then present two solutions towards predicting such discrepancy maps, i.e., a multi-task convolutional neural network (CNN) framework and an extended CNN with Person-specific Information Encoded Filters (CNN-PIEF). Extensive experimental results demonstrate the effectiveness of our models for PSM prediction as well as their generalization capability for unseen observers. |
Tasks | Saliency Detection |
Published | 2017-10-09 |
URL | http://arxiv.org/abs/1710.03011v2 |
http://arxiv.org/pdf/1710.03011v2.pdf | |
PWC | https://paperswithcode.com/paper/personalized-saliency-and-its-prediction |
Repo | https://github.com/xuyanyu-shh/Personalized-Saliency |
Framework | none |