Paper Group ANR 1138
Reinforcement Learning to Rank in E-Commerce Search Engine: Formalization, Analysis, and Application
Title | Reinforcement Learning to Rank in E-Commerce Search Engine: Formalization, Analysis, and Application |
Authors | Yujing Hu, Qing Da, Anxiang Zeng, Yang Yu, Yinghui Xu |
Abstract | In e-commerce platforms such as Amazon and TaoBao, ranking items in a search session is a typical multi-step decision-making problem. Learning to rank (LTR) methods have been widely applied to ranking problems. However, such methods often consider different ranking steps in a session to be independent, when in fact they may be highly correlated with each other. To better utilize the correlation between different ranking steps, in this paper we propose using reinforcement learning (RL) to learn an optimal ranking policy that maximizes the expected accumulative rewards in a search session. Firstly, we formally define the concept of a search session Markov decision process (SSMDP) to formulate the multi-step ranking problem. Secondly, we analyze the properties of SSMDPs and theoretically prove the necessity of maximizing accumulative rewards. Lastly, we propose a novel policy gradient algorithm for learning an optimal ranking policy, which is able to deal with the high reward variance and unbalanced reward distribution of an SSMDP. Experiments are conducted both in simulation and in the TaoBao search engine. The results demonstrate that our algorithm performs significantly better than online LTR methods, with more than 40% and 30% growth in total transaction amount in the simulation and the real application, respectively. |
Tasks | Decision Making, Learning-To-Rank |
Published | 2018-03-02 |
URL | http://arxiv.org/abs/1803.00710v3 |
http://arxiv.org/pdf/1803.00710v3.pdf | |
PWC | https://paperswithcode.com/paper/reinforcement-learning-to-rank-in-e-commerce |
Repo | |
Framework | |
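The multi-step ranking formulation above lends itself to a standard policy-gradient treatment. Below is a minimal REINFORCE-style sketch in PyTorch; the session simulator (`env_step`), the state and action sizes, and the return normalization used to tame the high reward variance are all illustrative assumptions, not the paper's SSMDP-specific algorithm.

```python
import torch
import torch.nn as nn

class RankingPolicy(nn.Module):
    """Maps a session state to a distribution over candidate ranking actions."""
    def __init__(self, state_dim=32, n_actions=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))

    def forward(self, state):
        return torch.distributions.Categorical(logits=self.net(state))

def run_episode(policy, env_step, state, horizon=5):
    """Roll out one search session; env_step is an assumed session simulator."""
    log_probs, rewards = [], []
    for _ in range(horizon):
        dist = policy(state)
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        state, reward, done = env_step(state, action)
        rewards.append(reward)
        if done:
            break
    return log_probs, rewards

def reinforce_loss(log_probs, rewards, gamma=1.0):
    returns, g = [], 0.0
    for r in reversed(rewards):          # accumulate discounted returns backwards
        g = r + gamma * g
        returns.insert(0, g)
    returns = torch.tensor(returns)
    # Normalizing returns is one common remedy for the high reward
    # variance the abstract points to.
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)
    return -(torch.stack(log_probs) * returns).sum()
```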
Learning via social awareness: Improving a deep generative sketching model with facial feedback
Title | Learning via social awareness: Improving a deep generative sketching model with facial feedback |
Authors | Natasha Jaques, Jennifer McCleary, Jesse Engel, David Ha, Fred Bertsch, Rosalind Picard, Douglas Eck |
Abstract | In the quest towards general artificial intelligence (AI), researchers have explored developing loss functions that act as intrinsic motivators in the absence of external rewards. This paper argues that such research has overlooked an important and useful intrinsic motivator: social interaction. We posit that making an AI agent aware of implicit social feedback from humans can allow for faster learning of more generalizable and useful representations, and could potentially impact AI safety. We collect social feedback in the form of facial expression reactions to samples from Sketch RNN, an LSTM-based variational autoencoder (VAE) designed to produce sketch drawings. We use a Latent Constraints GAN (LC-GAN) to learn from the facial feedback of a small group of viewers, by optimizing the model to produce sketches that it predicts will lead to more positive facial expressions. We show in multiple independent evaluations that the model trained with facial feedback produces sketches that are more highly rated and that induce significantly more positive facial expressions. Thus, we establish that implicit social feedback can improve the output of a deep learning model. |
Tasks | |
Published | 2018-02-13 |
URL | http://arxiv.org/abs/1802.04877v2 |
http://arxiv.org/pdf/1802.04877v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-via-social-awareness-improving-a |
Repo | |
Framework | |
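The core mechanism here, optimizing a generative model so its samples score well under a learned facial-feedback predictor, can be illustrated with a simple latent-space ascent. The `decoder` and `reward_model` below stand in for Sketch RNN and the LC-GAN critic; the gradient-ascent form is an assumption for illustration, not the paper's exact training procedure.

```python
import torch

def optimize_latent(z0, decoder, reward_model, steps=50, lr=0.05):
    """Ascend the predicted positive-expression score in latent space.
    decoder and reward_model are assumed differentiable stand-ins."""
    z = z0.clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        sketch = decoder(z)                  # decode latent to a sketch
        score = reward_model(sketch).mean()  # predicted positive reaction
        (-score).backward()                  # gradient ascent on the score
        opt.step()
        opt.zero_grad()
    return z.detach()
```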
Learning to guide task and motion planning using score-space representation
Title | Learning to guide task and motion planning using score-space representation |
Authors | Beomjoon Kim, Zi Wang, Leslie Pack Kaelbling, Tomas Lozano-Perez |
Abstract | In this paper, we propose a learning algorithm that speeds up the search in task and motion planning problems. Our algorithm proposes solutions to three different challenges that arise in learning to improve planning efficiency: what to predict, how to represent a planning problem instance, and how to transfer knowledge from one problem instance to another. We propose a method that predicts constraints on the search space based on a generic representation of a planning problem instance, called score-space, where we represent a problem instance in terms of the performance of a set of solutions attempted so far. Using this representation, we transfer knowledge, in the form of constraints, from previous problems based on the similarity in score space. We design a sequential algorithm that efficiently predicts these constraints, and evaluate it in three different challenging task and motion planning problems. Results indicate that our approach performs orders of magnitude faster than an unguided planner. |
Tasks | Motion Planning |
Published | 2018-07-26 |
URL | http://arxiv.org/abs/1807.09962v1 |
http://arxiv.org/pdf/1807.09962v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-guide-task-and-motion-planning |
Repo | |
Framework | |
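To make the score-space idea concrete, here is a hedged sketch: each past problem is represented by the score vector of a shared set of candidate solutions, and constraints transfer from the most similar past problem on the solutions evaluated so far. The scoring function and the constraint payloads are assumptions for illustration, not the paper's sequential prediction algorithm.

```python
import numpy as np

def most_similar_problem(partial_scores, library):
    """library: list of (full_score_vector, constraints) from past problems.
    partial_scores: dict {solution_index: score} observed on the new problem."""
    idx = list(partial_scores)
    observed = np.array([partial_scores[i] for i in idx])
    best, best_dist = None, np.inf
    for full_scores, constraints in library:
        # Compare only on the subset of solutions attempted so far.
        dist = np.linalg.norm(full_scores[idx] - observed)
        if dist < best_dist:
            best, best_dist = constraints, dist
    return best
```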
VH-HFCN based Parking Slot and Lane Markings Segmentation on Panoramic Surround View
Title | VH-HFCN based Parking Slot and Lane Markings Segmentation on Panoramic Surround View |
Authors | Yan Wu, Tao Yang, Junqiao Zhao, Linting Guan, Wei Jiang |
Abstract | Automatic parking is being developed intensively by car manufacturers and suppliers. To date, two problems have held it back. First, there are no openly available segmentation labels for parking slots on a panoramic surround view (PSV) dataset. Second, it remains difficult to detect parking slots and road structure robustly. Therefore, in this paper, we build a public PSV dataset and propose a highly fused convolutional network (HFCN) based segmentation method for parking slots and lane markings on it. A surround-view image is composed of four calibrated images captured by four fisheye cameras. We collect and label more than 4,200 surround-view images for this task, covering various illumination conditions and different types of parking slots. We propose a VH-HFCN network, which adopts an HFCN as its base and adds an efficient VH-stage for better segmenting various markings. The VH-stage consists of two independent linear convolution paths with vertical and horizontal convolution kernels, respectively. This modification enables the network to extract linear features robustly and precisely. We evaluated our model on the PSV dataset, and the results show outstanding performance in ground-marking segmentation. Based on the segmented markings, parking slots and lanes are obtained by skeletonization, the Hough line transform, and line arrangement. |
Tasks | |
Published | 2018-04-19 |
URL | http://arxiv.org/abs/1804.07027v2 |
http://arxiv.org/pdf/1804.07027v2.pdf | |
PWC | https://paperswithcode.com/paper/vh-hfcn-based-parking-slot-and-lane-markings |
Repo | |
Framework | |
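The VH-stage is described concretely enough to sketch. A minimal PyTorch version with two independent linear-convolution paths follows; the channel count, kernel length, and additive fusion are assumptions.

```python
import torch
import torch.nn as nn

class VHStage(nn.Module):
    """Two parallel linear-convolution paths: vertical (k x 1) and
    horizontal (1 x k) kernels for thin, line-like markings."""
    def __init__(self, channels=64, k=9):
        super().__init__()
        self.vertical = nn.Conv2d(channels, channels, kernel_size=(k, 1),
                                  padding=(k // 2, 0))
        self.horizontal = nn.Conv2d(channels, channels, kernel_size=(1, k),
                                    padding=(0, k // 2))

    def forward(self, x):
        # Each path responds strongly to lines of its own orientation.
        return torch.relu(self.vertical(x) + self.horizontal(x))

# feats = torch.randn(1, 64, 128, 128); out = VHStage()(feats)
```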
HeadOn: Real-time Reenactment of Human Portrait Videos
Title | HeadOn: Real-time Reenactment of Human Portrait Videos |
Authors | Justus Thies, Michael Zollhöfer, Christian Theobalt, Marc Stamminger, Matthias Nießner |
Abstract | We propose HeadOn, the first real-time source-to-target reenactment approach for complete human portrait videos that enables transfer of torso and head motion, face expression, and eye gaze. Given a short RGB-D video of the target actor, we automatically construct a personalized geometry proxy that embeds a parametric head, eye, and kinematic torso model. A novel real-time reenactment algorithm employs this proxy to photo-realistically map the captured motion from the source actor to the target actor. On top of the coarse geometric proxy, we propose a video-based rendering technique that composites the modified target portrait video via view- and pose-dependent texturing, and creates photo-realistic imagery of the target actor under novel torso and head poses, facial expressions, and gaze directions. To this end, we propose a robust tracking of the face and torso of the source actor. We extensively evaluate our approach and show significant improvements in enabling much greater flexibility in creating realistic reenacted output videos. |
Tasks | |
Published | 2018-05-29 |
URL | http://arxiv.org/abs/1805.11729v1 |
http://arxiv.org/pdf/1805.11729v1.pdf | |
PWC | https://paperswithcode.com/paper/headon-real-time-reenactment-of-human |
Repo | |
Framework | |
Theory IIIb: Generalization in Deep Networks
Title | Theory IIIb: Generalization in Deep Networks |
Authors | Tomaso Poggio, Qianli Liao, Brando Miranda, Andrzej Banburski, Xavier Boix, Jack Hidary |
Abstract | A main puzzle of deep neural networks (DNNs) revolves around the apparent absence of “overfitting”, defined in this paper as follows: the expected error does not get worse when increasing the number of neurons or of iterations of gradient descent. This is surprising because of the large capacity demonstrated by DNNs to fit randomly labeled data and the absence of explicit regularization. Recent results by Srebro et al. provide a satisfying solution to the puzzle for linear networks used in binary classification. They prove that minimization of loss functions such as the logistic, the cross-entropy and the exp-loss yields asymptotic, “slow” convergence to the maximum margin solution for linearly separable datasets, independently of the initial conditions. Here we prove a similar result for nonlinear multilayer DNNs near zero minima of the empirical loss. The result holds for exponential-type losses but not for the square loss. In particular, we prove that the weight matrix at each layer of a deep network converges to a minimum norm solution up to a scale factor (in the separable case). Our analysis of the dynamical system corresponding to gradient descent of a multilayer network suggests a simple criterion for ranking the generalization performance of different zero minimizers of the empirical loss. |
Tasks | |
Published | 2018-06-29 |
URL | http://arxiv.org/abs/1806.11379v1 |
http://arxiv.org/pdf/1806.11379v1.pdf | |
PWC | https://paperswithcode.com/paper/theory-iiib-generalization-in-deep-networks |
Repo | |
Framework | |
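For reference, the separable linear case credited to Srebro et al. can be stated compactly (standard notation assumed for illustration; the paper's multilayer result is the layer-wise analogue, up to a scale factor):

```latex
% Gradient descent on an exponential-type loss over a linearly separable
% dataset \{(x_i, y_i)\}_{i=1}^n with y_i \in \{-1, +1\}:
L(w) = \sum_{i=1}^{n} e^{-y_i \, w^\top x_i},
\qquad
\frac{w(t)}{\lVert w(t) \rVert} \;\longrightarrow\;
\frac{w^\ast}{\lVert w^\ast \rVert}
\quad \text{as } t \to \infty,
```

where $w^\ast = \arg\min_w \lVert w \rVert^2$ subject to $y_i \, w^\top x_i \ge 1$ for all $i$, i.e. the maximum-margin solution.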
MagicVO: End-to-End Monocular Visual Odometry through Deep Bi-directional Recurrent Convolutional Neural Network
Title | MagicVO: End-to-End Monocular Visual Odometry through Deep Bi-directional Recurrent Convolutional Neural Network |
Authors | Jian Jiao, Jichao Jiao, Yaokai Mo, Weilun Liu, Zhongliang Deng |
Abstract | This paper proposes a new framework, called MagicVO, to solve the problem of monocular visual odometry. Based on a Convolutional Neural Network (CNN) and a bidirectional LSTM (Bi-LSTM), MagicVO outputs a 6-DoF absolute-scale pose at each camera position, taking a sequence of continuous monocular images as input. It not only leverages the outstanding performance of CNNs in image feature processing to fully extract rich features from image frames, but also learns the geometric relationships across preceding and succeeding frames through the Bi-LSTM to obtain more accurate predictions. A pipeline of MagicVO is shown in Fig. 1 of the paper. The MagicVO system is end-to-end, and experiments on the KITTI dataset and the ETH-asl cla dataset show that MagicVO outperforms traditional visual odometry (VO) systems in both pose accuracy and generalization ability. |
Tasks | Monocular Visual Odometry, Visual Odometry |
Published | 2018-11-27 |
URL | http://arxiv.org/abs/1811.10964v2 |
http://arxiv.org/pdf/1811.10964v2.pdf | |
PWC | https://paperswithcode.com/paper/magicvo-end-to-end-monocular-visual-odometry |
Repo | |
Framework | |
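A minimal PyTorch skeleton of the described CNN + Bi-LSTM pipeline follows; layer sizes and the pooling scheme are illustrative assumptions rather than the paper's architecture.

```python
import torch
import torch.nn as nn

class CnnBiLstmVO(nn.Module):
    def __init__(self, feat_dim=256, hidden=128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 7, stride=2, padding=3), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(32 * 16, feat_dim))
        self.bilstm = nn.LSTM(feat_dim, hidden, batch_first=True,
                              bidirectional=True)
        self.pose_head = nn.Linear(2 * hidden, 6)  # translation + rotation

    def forward(self, frames):            # frames: (B, T, 3, H, W)
        b, t = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1)).view(b, t, -1)
        seq, _ = self.bilstm(feats)       # forward + backward context
        return self.pose_head(seq)        # (B, T, 6) per-step pose
```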
Learnable: Theory vs Applications
Title | Learnable: Theory vs Applications |
Authors | Marina Sapir |
Abstract | Two different views on the machine learning problem, applied learning (machine learning with business applications) and agnostic PAC learning, are formalized and compared here. I show that, under some conditions, the theory of PAC learnability provides a way to solve the applied learning problem. However, the theory requires training sets so large that they would make learning practically useless. I suggest shedding some theoretical misconceptions about learning to make the theory more aligned with the needs and experience of practitioners. |
Tasks | |
Published | 2018-07-27 |
URL | http://arxiv.org/abs/1807.10681v1 |
http://arxiv.org/pdf/1807.10681v1.pdf | |
PWC | https://paperswithcode.com/paper/learnable-theory-vs-applications |
Repo | |
Framework | |
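The scale of training set the author objects to can be seen in a textbook bound. One standard agnostic-PAC sample-complexity bound for a finite hypothesis class $H$ (not the paper's specific conditions) is:

```latex
% To guarantee, with probability at least 1 - \delta, that every h in H has
% empirical risk within \epsilon of its true risk (Hoeffding + union bound),
% it suffices that
m \;\ge\; \frac{1}{2\epsilon^{2}} \ln\frac{2\lvert H \rvert}{\delta}.
```

For small $\epsilon$ the $1/\epsilon^{2}$ factor dominates, which is the practical blow-up in required training-set size that the abstract objects to.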
Guided Feature Selection for Deep Visual Odometry
Title | Guided Feature Selection for Deep Visual Odometry |
Authors | Fei Xue, Qiuyuan Wang, Xin Wang, Wei Dong, Junqiu Wang, Hongbin Zha |
Abstract | We present a novel end-to-end visual odometry architecture with guided feature selection based on deep convolutional recurrent neural networks. Unlike current monocular visual odometry methods, our approach is built on the intuition that features contribute discriminatively to different motion patterns. Specifically, we propose a dual-branch recurrent network to learn rotation and translation separately, leveraging a Convolutional Neural Network (CNN) for feature representation and a Recurrent Neural Network (RNN) for image-sequence reasoning. To enhance feature selection, we further introduce an effective context-aware guidance mechanism that forces each branch to explicitly distill the information relevant to its specific motion pattern. Experiments demonstrate that on the prevalent KITTI and ICL_NUIM benchmarks, our method outperforms current state-of-the-art model-based and learning-based methods for both decoupled and joint camera pose recovery. |
Tasks | Feature Selection, Monocular Visual Odometry, Visual Odometry |
Published | 2018-11-25 |
URL | http://arxiv.org/abs/1811.09935v1 |
http://arxiv.org/pdf/1811.09935v1.pdf | |
PWC | https://paperswithcode.com/paper/guided-feature-selection-for-deep-visual |
Repo | |
Framework | |
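The context-aware guidance mechanism suggests a simple gating sketch: each branch re-weights shared CNN features before its own recurrent network, so rotation and translation can attend to different cues. The sigmoid gate below is an assumed form, not the paper's exact mechanism.

```python
import torch
import torch.nn as nn

class GuidedBranch(nn.Module):
    def __init__(self, feat_dim=256, hidden=128, out_dim=3):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.Sigmoid())
        self.rnn = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, out_dim)

    def forward(self, feats):                # feats: (B, T, feat_dim)
        selected = feats * self.gate(feats)  # per-branch feature selection
        seq, _ = self.rnn(selected)
        return self.head(seq)

# rotation = GuidedBranch(out_dim=3); translation = GuidedBranch(out_dim=3)
```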
IMMIGRATE: A Margin-based Feature Selection Method with Interaction Terms
Title | IMMIGRATE: A Margin-based Feature Selection Method with Interaction Terms |
Authors | Ruzhang Zhao, Pengyu Hong, Jun S Liu |
Abstract | Relief-based algorithms have often been claimed to uncover feature interactions. However, it remains unclear whether and how interaction terms can be differentiated from marginal effects. In this paper, we propose the IMMIGRATE algorithm, which includes and trains weights for interaction terms. Besides applying the large-margin principle, we focus on the robustness of the contributors to the margin and consider local and global information simultaneously. Moreover, IMMIGRATE enjoys attractive properties, such as robustness and compatibility with boosting. We evaluate the proposed method on several tasks, on which it achieves state-of-the-art results. |
Tasks | Feature Selection |
Published | 2018-10-05 |
URL | https://arxiv.org/abs/1810.02658v3 |
https://arxiv.org/pdf/1810.02658v3.pdf | |
PWC | https://paperswithcode.com/paper/immigrate-a-margin-based-feature-selection |
Repo | |
Framework | |
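A hedged sketch of a margin with interaction terms: using a full symmetric weight matrix in the distance lets off-diagonal entries capture pairwise feature interactions, while the diagonal recovers Relief-style marginal weights. The quadratic-form distance and the near-hit/near-miss margin below are assumptions about the formulation, for illustration only.

```python
import numpy as np

def quad_dist(a, b, W):
    """Quadratic-form distance; W is a symmetric PSD weight matrix whose
    off-diagonal entries weight pairwise feature interactions."""
    d = a - b
    return d @ W @ d

def hypothesis_margin(x, near_hit, near_miss, W):
    # Larger margin: x sits closer to its near-hit (same class) than
    # to its near-miss (other class) under the learned weights.
    return quad_dist(x, near_miss, W) - quad_dist(x, near_hit, W)
```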
Unveiling the Power of Deep Tracking
Title | Unveiling the Power of Deep Tracking |
Authors | Goutam Bhat, Joakim Johnander, Martin Danelljan, Fahad Shahbaz Khan, Michael Felsberg |
Abstract | In the field of generic object tracking numerous attempts have been made to exploit deep features. Despite all expectations, deep trackers are yet to reach an outstanding level of performance compared to methods solely based on handcrafted features. In this paper, we investigate this key issue and propose an approach to unlock the true potential of deep features for tracking. We systematically study the characteristics of both deep and shallow features, and their relation to tracking accuracy and robustness. We identify the limited data and low spatial resolution as the main challenges, and propose strategies to counter these issues when integrating deep features for tracking. Furthermore, we propose a novel adaptive fusion approach that leverages the complementary properties of deep and shallow features to improve both robustness and accuracy. Extensive experiments are performed on four challenging datasets. On VOT2017, our approach significantly outperforms the top performing tracker from the challenge with a relative gain of 17% in EAO. |
Tasks | Object Tracking |
Published | 2018-04-18 |
URL | http://arxiv.org/abs/1804.06833v1 |
http://arxiv.org/pdf/1804.06833v1.pdf | |
PWC | https://paperswithcode.com/paper/unveiling-the-power-of-deep-tracking |
Repo | |
Framework | |
How Do Classifiers Induce Agents To Invest Effort Strategically?
Title | How Do Classifiers Induce Agents To Invest Effort Strategically? |
Authors | Jon Kleinberg, Manish Raghavan |
Abstract | Algorithms are often used to produce decision-making rules that classify or evaluate individuals. When these individuals have incentives to be classified a certain way, they may behave strategically to influence their outcomes. We develop a model for how strategic agents can invest effort in order to change the outcomes they receive, and we give a tight characterization of when such agents can be incentivized to invest specified forms of effort into improving their outcomes as opposed to “gaming” the classifier. We show that whenever any “reasonable” mechanism can do so, a simple linear mechanism suffices. |
Tasks | Decision Making |
Published | 2018-07-13 |
URL | https://arxiv.org/abs/1807.05307v5 |
https://arxiv.org/pdf/1807.05307v5.pdf | |
PWC | https://paperswithcode.com/paper/how-do-classifiers-induce-agents-to-invest |
Repo | |
Framework | |
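The model's flavor can be shown with a toy best-response computation: effort moves features through a conversion matrix, a linear mechanism scores the result, and the agent pays a quadratic cost. All specifics below are illustrative assumptions, not the paper's characterization.

```python
import numpy as np

def best_response(w, A, cost=1.0):
    """w: (d,) linear mechanism; A: (d, k) effect of action j on feature i.
    With quadratic cost c(e) = cost/2 * ||e||^2, maximizing
    w @ (A @ e) - c(e) gives the closed form e* = (1/cost) * A.T @ w,
    projected onto non-negative effort."""
    e_star = (A.T @ w) / cost
    return np.maximum(e_star, 0.0)  # effort cannot be negative

w = np.array([1.0, 0.5])            # mechanism's feature weights
A = np.array([[1.0, 0.2],           # action 1 mostly "improvement",
              [0.1, 1.0]])          # action 2 mostly "gaming"
print(best_response(w, A))          # where the agent invests effort
```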
Deep Learning based Retinal OCT Segmentation
Title | Deep Learning based Retinal OCT Segmentation |
Authors | Mike Pekala, Neil Joshi, David E. Freund, Neil M. Bressler, Delia Cabrera DeBuc, Philippe M Burlina |
Abstract | Our objective is to evaluate the efficacy of methods that use deep learning (DL) for the automatic fine-grained segmentation of optical coherence tomography (OCT) images of the retina. OCT images from 10 patients with mild non-proliferative diabetic retinopathy were used from a public (U. of Miami) dataset. For each patient, five images were available: one image of the fovea center, two images of the perifovea, and two images of the parafovea. For each image, two expert graders each manually annotated five retinal surfaces (i.e. boundaries between pairs of retinal layers). The first grader’s annotations were used as ground truth and the second grader’s annotations to compute inter-operator agreement. The proposed automated approach segments images using fully convolutional networks (FCNs) together with Gaussian process (GP)-based regression as a post-processing step to improve the quality of the estimates. Using 10-fold cross validation, the performance of the algorithms is determined by computing the per-pixel unsigned error (distance) between the automated estimates and the ground truth annotations generated by the first manual grader. We compare the proposed method against five state of the art automatic segmentation techniques. The results show that the proposed methods compare favorably with state of the art techniques, resulting in the smallest mean unsigned error values and associated standard deviations, and performance is comparable with human annotation of retinal layers from OCT when there is only mild retinopathy. The results suggest that semantic segmentation using FCNs, coupled with regression-based post-processing, can effectively solve the OCT segmentation problem on par with human capabilities with mild retinopathy. |
Tasks | Semantic Segmentation |
Published | 2018-01-29 |
URL | http://arxiv.org/abs/1801.09749v1 |
http://arxiv.org/pdf/1801.09749v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-based-retinal-oct-segmentation |
Repo | |
Framework | |
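The GP-based post-processing step admits a short sketch with scikit-learn: smooth each retinal boundary's noisy per-column estimate along the image width. Kernel choice and noise level are assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def smooth_boundary(columns, raw_depths):
    """columns: (N,) image-column indices; raw_depths: (N,) noisy per-column
    boundary rows taken from the FCN's segmentation output."""
    kernel = RBF(length_scale=20.0) + WhiteKernel(noise_level=1.0)
    gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
    gp.fit(columns.reshape(-1, 1), raw_depths)
    return gp.predict(columns.reshape(-1, 1))  # smoothed boundary curve

# Synthetic demo: a gently curving boundary with per-column noise.
cols = np.arange(0, 512, dtype=float)
noisy = 100 + 5 * np.sin(cols / 50) + np.random.randn(512)
smooth = smooth_boundary(cols, noisy)
```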
An Enhanced BPSO based Approach for Service Placement in Hybrid Cloud
Title | An Enhanced BPSO based Approach for Service Placement in Hybrid Cloud |
Authors | Wissem Abbes, Zied Kechaou, Adel M. Alimi |
Abstract | Due to the challenges of competition and a rapidly evolving market, companies need to be innovative and agile, particularly with regard to the web applications used by their customers. Hybrid cloud now stands as an attractive solution, as organizations tend to use a combination of private and public cloud implementations according to their needs, applying the available resources and speed of execution profitably. In such a case, deploying a new application entails placing some components in the private cloud while reserving others for the public cloud. In this respect, our primary goal in this paper is to minimize the extra costs incurred by the public cloud options, along with the costs of maintaining communication between the private and public clouds. Our second objective is to reduce the execution time of the decision process for selecting the optimal service-placement solution. For this purpose, we propose a novel Binary Particle Swarm Optimization (BPSO) based approach for effective service-placement optimization within a hybrid cloud. Using a real benchmark, the experimental results reveal that our proposed approach outperforms those documented in the state of the art in terms of both cost and time. |
Tasks | |
Published | 2018-06-10 |
URL | http://arxiv.org/abs/1806.05971v1 |
http://arxiv.org/pdf/1806.05971v1.pdf | |
PWC | https://paperswithcode.com/paper/an-enhanced-bpso-based-approach-for-service |
Repo | |
Framework | |
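Standard binary PSO, the core of the proposed approach, is compact enough to sketch: velocities update as in continuous PSO, and a sigmoid of the velocity gives each bit's flip probability (the classic Kennedy-Eberhart rule). Each bit decides whether a component goes to the public (1) or private (0) cloud; the cost function is a placeholder for the paper's cost model.

```python
import numpy as np

def bpso(cost_fn, n_bits, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5):
    rng = np.random.default_rng(0)
    x = rng.integers(0, 2, (n_particles, n_bits)).astype(float)
    v = rng.uniform(-1, 1, (n_particles, n_bits))
    pbest, pbest_cost = x.copy(), np.array([cost_fn(p) for p in x])
    gbest = pbest[pbest_cost.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random(v.shape), rng.random(v.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        # Sigmoid of velocity -> probability that each bit is set to 1.
        x = (rng.random(v.shape) < 1 / (1 + np.exp(-v))).astype(float)
        costs = np.array([cost_fn(p) for p in x])
        better = costs < pbest_cost
        pbest[better], pbest_cost[better] = x[better], costs[better]
        gbest = pbest[pbest_cost.argmin()].copy()
    return gbest, pbest_cost.min()

# Placeholder cost: e.g. a public-cloud fee per component plus a
# communication penalty whenever linked components are split across clouds.
```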
Force Estimation from OCT Volumes using 3D CNNs
Title | Force Estimation from OCT Volumes using 3D CNNs |
Authors | Nils Gessert, Jens Beringhoff, Christoph Otte, Alexander Schlaefer |
Abstract | Purpose: Estimating the interaction forces of instruments and tissue is of interest, particularly for providing haptic feedback during robot-assisted minimally invasive interventions. Different approaches based on external and integrated force sensors have been proposed; these are hampered by friction, sensor size, and sterilizability. We investigate a novel approach to estimate the force vector directly from optical coherence tomography image volumes. Methods: We introduce a novel Siamese 3D CNN architecture. The network takes an undeformed reference volume and a deformed sample volume as input and outputs the three components of the force vector. We employ a deep residual architecture with bottlenecks for increased efficiency. We compare the Siamese approach to methods using difference volumes and two-dimensional projections. Data were generated using a robotic setup to obtain ground-truth force vectors for silicone tissue phantoms as well as porcine tissue. Results: Our method achieves a mean average error of 7.7 ± 4.3 mN when estimating the force vector. Our novel Siamese 3D CNN architecture outperforms single-path methods, which achieve a mean average error of 11.59 ± 6.7 mN. Moreover, the use of volume data leads to significantly higher performance compared to processing only surface information, which achieves a mean average error of 24.38 ± 22.0 mN. On the tissue dataset, our method shows good generalization between different subjects. Conclusions: We propose a novel image-based force estimation method using optical coherence tomography. We illustrate that capturing the deformation of subsurface structures substantially improves force estimation. Our approach can provide accurate force estimates in surgical setups when using intraoperative optical coherence tomography. |
Tasks | |
Published | 2018-04-26 |
URL | http://arxiv.org/abs/1804.10002v1 |
http://arxiv.org/pdf/1804.10002v1.pdf | |
PWC | https://paperswithcode.com/paper/force-estimation-from-oct-volumes-using-3d |
Repo | |
Framework | |
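The Siamese structure is concrete enough for a skeleton: one shared 3D CNN encodes both the undeformed reference volume and the deformed sample volume, and a small head regresses the three force components. Layer sizes below omit the paper's residual bottlenecks and are assumptions.

```python
import torch
import torch.nn as nn

class SiameseForceNet(nn.Module):
    def __init__(self, emb=128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv3d(1, 8, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(8, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(2), nn.Flatten(),
            nn.Linear(16 * 8, emb))
        self.head = nn.Linear(2 * emb, 3)  # (fx, fy, fz)

    def forward(self, reference, deformed):  # (B, 1, D, H, W) each
        z_ref = self.encoder(reference)       # shared weights: Siamese
        z_def = self.encoder(deformed)
        return self.head(torch.cat([z_ref, z_def], dim=1))
```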