April 2, 2020

Paper Group ANR 252

Indoor Layout Estimation by 2D LiDAR and Camera Fusion. Off-policy Maximum Entropy Reinforcement Learning : Soft Actor-Critic with Advantage Weighted Mixture Policy(SAC-AWMP). MetNet: A Neural Weather Model for Precipitation Forecasting. Difference Attention Based Error Correction LSTM Model for Time Series Prediction. AirRL: A Reinforcement Learni …

Indoor Layout Estimation by 2D LiDAR and Camera Fusion


Title	Indoor Layout Estimation by 2D LiDAR and Camera Fusion
Authors	Jieyu Li, Robert L Stevenson
Abstract	This paper presents an algorithm for indoor layout estimation and reconstruction through the fusion of a sequence of captured images and LiDAR data sets. In the proposed system, a movable platform collects both intensity images and 2D LiDAR information. Pose estimation and semantic segmentation are computed jointly by aligning the LiDAR points to line segments from the images. For indoor scenes with walls orthogonal to the floor, the alignment problem is decoupled into a top-down view projection and a 2D similarity transformation estimation, and is solved by the recursive random sample consensus (R-RANSAC) algorithm. Hypotheses can be generated, evaluated and optimized by integrating new scans as the platform moves throughout the environment. The proposed method avoids the need for extensive prior training or a cuboid layout assumption, which makes it more effective and practical than most previous indoor layout estimation methods. Multi-sensor fusion provides both accurate depth estimation and high-resolution visual information.
Tasks	Depth Estimation, Pose Estimation, Semantic Segmentation, Sensor Fusion
Published	2020-01-15
URL	https://arxiv.org/abs/2001.05422v1
PDF	https://arxiv.org/pdf/2001.05422v1.pdf
PWC	https://paperswithcode.com/paper/indoor-layout-estimation-by-2d-lidar-and
Repo
Framework
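
The 2D similarity estimation step mentioned in the abstract can be illustrated with a small sketch. This is plain RANSAC, not the recursive R-RANSAC variant the authors use, and it assumes point-to-point correspondences are already known, whereas the paper aligns LiDAR points to image line segments. The names `fit_similarity` and `ransac_similarity` and the parameters `iters`, `tol` and `seed` are hypothetical. Points are stored as complex numbers, so a similarity transform (scale, rotation, translation) takes the form q = a * p + b.

```python
import random


def fit_similarity(p1, p2, q1, q2):
    """Solve q = a * p + b exactly from two correspondences (complex numbers)."""
    a = (q2 - q1) / (p2 - p1)  # encodes scale and rotation
    b = q1 - a * p1            # encodes translation
    return a, b


def ransac_similarity(src, dst, iters=200, tol=0.05, seed=0):
    """Plain RANSAC (illustrative only): return (a, b, inlier_count)."""
    rng = random.Random(seed)
    best = (None, None, -1)
    for _ in range(iters):
        # A minimal sample for a 2D similarity transform is two point pairs.
        i, j = rng.sample(range(len(src)), 2)
        if src[i] == src[j]:
            continue  # degenerate sample, cannot determine the scale
        a, b = fit_similarity(src[i], src[j], dst[i], dst[j])
        inliers = sum(1 for p, q in zip(src, dst) if abs(a * p + b - q) < tol)
        if inliers > best[2]:
            best = (a, b, inliers)
    return best
```

Calling `ransac_similarity(src, dst)` with two equally long lists of complex points returns the hypothesis supported by the most correspondences; `abs(a)` is then the scale and the phase of `a` is the rotation angle.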

Off-policy Maximum Entropy Reinforcement Learning : Soft Actor-Critic with Advantage Weighted Mixture Policy(SAC-AWMP)


Title	Off-policy Maximum Entropy Reinforcement Learning : Soft Actor-Critic with Advantage Weighted Mixture Policy(SAC-AWMP)
Authors	Zhimin Hou, Kuangen Zhang, Yi Wan, Dongyu Li, Chenglong Fu, Haoyong Yu
Abstract	The optimal policy of a reinforcement learning problem is often discontinuous and non-smooth. That is, for two states with similar representations, the optimal policies can be significantly different. In this case, representing the entire policy with a function approximator (FA) with shared parameters for all states may not be desirable, as the generalization ability of parameter sharing makes representing discontinuous, non-smooth policies difficult. A common way to solve this problem, known as Mixture-of-Experts, is to represent the policy as the weighted sum of multiple components, where different components perform well on different parts of the state space. Following this idea and inspired by a recent work called advantage-weighted information maximization, we propose to learn for each state the weights of these components, so that they entail the information of the state itself and also the preferred action learned so far for the state. The action preference is characterized via the advantage function. In this case, the weight of each component would only be large for certain groups of states whose representations are similar and whose preferred action representations are also similar. Therefore each component is easy to represent. We call a policy parameterized in this way an Advantage Weighted Mixture Policy (AWMP) and apply this idea to improve soft actor-critic (SAC), one of the most competitive continuous control algorithms. Experimental results demonstrate that SAC with AWMP clearly outperforms SAC in four commonly used continuous control tasks and achieves stable performance across different random seeds.
Tasks	Continuous Control
Published	2020-02-07
URL	https://arxiv.org/abs/2002.02829v1
PDF	https://arxiv.org/pdf/2002.02829v1.pdf
PWC	https://paperswithcode.com/paper/off-policy-maximum-entropy-reinforcement
Repo
Framework
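
The mixture-of-experts form described in the abstract, a policy written as a state-dependent weighted sum of components, can be sketched as follows. This is only an illustration of the mixture itself: the gate here is a hand-written function, whereas the paper learns the weights using the advantage function. The names `softmax`, `mixture_action`, `experts` and `gate` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_action(state, experts, gate):
    """Return the weighted sum of the expert actions and the weights used.

    experts: list of callables mapping a state to a scalar action.
    gate:    callable mapping a state to one logit per expert (hand-written
             in this sketch; learned from the advantage in the paper).
    """
    weights = softmax(gate(state))
    actions = [expert(state) for expert in experts]
    return sum(w * a for w, a in zip(weights, actions)), weights
```

With two constant experts returning -1.0 and +1.0 and a sharp gate such as `lambda s: [-20 * s, 20 * s]`, the mixture switches abruptly near `s = 0`, a near-discontinuous policy that neither smooth component has to represent on its own.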

MetNet: A Neural Weather Model for Precipitation Forecasting


Title	MetNet: A Neural Weather Model for Precipitation Forecasting
Authors	Casper Kaae Sønderby, Lasse Espeholt, Jonathan Heek, Mostafa Dehghani, Avital Oliver, Tim Salimans, Shreya Agrawal, Jason Hickey, Nal Kalchbrenner
Abstract	Weather forecasting is a long-standing scientific challenge with direct social and economic impact. The task is suitable for deep neural networks due to vast amounts of continuously collected data and a rich spatial and temporal structure that presents long-range dependencies. We introduce MetNet, a neural network that forecasts precipitation up to 8 hours into the future at the high spatial resolution of 1 km$^2$ and at the temporal resolution of 2 minutes, with a latency on the order of seconds. MetNet takes as input radar and satellite data and the forecast lead time, and produces a probabilistic precipitation map. The architecture uses axial self-attention to aggregate the global context from a large input patch corresponding to a million square kilometers. We evaluate the performance of MetNet at various precipitation thresholds and find that MetNet outperforms Numerical Weather Prediction at forecasts of up to 7 to 8 hours on the scale of the continental United States.
Tasks	Weather Forecasting
Published	2020-03-24
URL	https://arxiv.org/abs/2003.12140v2
PDF	https://arxiv.org/pdf/2003.12140v2.pdf
PWC	https://paperswithcode.com/paper/metnet-a-neural-weather-model-for
Repo
Framework
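
Axial self-attention, which the abstract names as the mechanism for aggregating global context, can be sketched in a few lines. This is a minimal single-head illustration, not MetNet's actual architecture: it assumes identity projections (queries, keys and values are all the input features) and omits heads, positional encodings and residual connections. The names `attend` and `axial_attention` are hypothetical.

```python
import math


def attend(seq):
    """Single-head self-attention over a list of vectors, with Q = K = V = seq."""
    d = len(seq[0])
    out = []
    for q in seq:
        # Scaled dot-product scores between this query and every key.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in seq]
        m = max(scores)
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        # Softmax-weighted average of the value vectors.
        out.append([sum(wi * v[c] for wi, v in zip(w, seq)) / z for c in range(d)])
    return out


def axial_attention(grid):
    """grid[r][c] is a feature vector; attend along rows, then along columns."""
    rows = [attend(row) for row in grid]
    height, width = len(rows), len(rows[0])
    cols = [attend([rows[r][c] for r in range(height)]) for c in range(width)]
    # cols is indexed [column][row]; transpose back to [row][column].
    return [[cols[c][r] for c in range(width)] for r in range(height)]
```

After the row pass and the column pass, every cell has received information from every other cell, at a cost of roughly H·W·(H+W) score computations for an H×W grid instead of (H·W)² for full self-attention over all cells.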

Difference Attention Based Error Correction LSTM Model for Time Series Prediction


Title	Difference Attention Based Error Correction LSTM Model for Time Series Prediction
Authors	Yuxuan Liu, Jiangyong Duan, Juan Meng
Abstract	In this paper, we propose a novel model for time series prediction in which a difference-attention LSTM model and an error-correction LSTM model are combined in a cascade. The difference-attention LSTM model introduces a difference feature to perform attention in a traditional LSTM, so that it focuses on the obvious changes in the time series, while the error-correction LSTM model refines the prediction error of the difference-attention LSTM model to further improve prediction accuracy. Finally, we design a training strategy to train both models jointly. With the additional difference features and the new principle learning framework, our model can improve prediction accuracy for time series. Experiments on various time series are conducted to demonstrate the effectiveness of our method.
Tasks	Time Series, Time Series Prediction
Published	2020-03-30
URL	https://arxiv.org/abs/2003.13616v1
PDF	https://arxiv.org/pdf/2003.13616v1.pdf
PWC	https://paperswithcode.com/paper/difference-attention-based-error-correction
Repo
Framework

AirRL: A Reinforcement Learning Approach to Urban Air Quality Inference


Title	AirRL: A Reinforcement Learning Approach to Urban Air Quality Inference
Authors	Huiqiang Zhong, Cunxiang Yin, Xiaohui Wu, Jinchang Luo, JiaWei He
Abstract	Urban air pollution has become a major environmental problem that threatens public health. It has become increasingly important to infer fine-grained urban air quality based on existing monitoring stations. One of the challenges is how to effectively select relevant stations for air quality inference. In this paper, we propose a novel model based on reinforcement learning for urban air quality inference. The model consists of two modules: a station selector and an air quality regressor. The station selector dynamically selects the most relevant monitoring stations when inferring air quality. The air quality regressor takes in the selected stations and infers air quality with a deep neural network. We conduct experiments on a real-world air quality dataset, where our approach achieves the highest performance compared with several popular solutions, and the experiments show the significant effectiveness of the proposed model in tackling air quality inference problems.
Tasks	Air Quality Inference
Published	2020-03-27
URL	https://arxiv.org/abs/2003.12205v1
PDF	https://arxiv.org/pdf/2003.12205v1.pdf
PWC	https://paperswithcode.com/paper/airrl-a-reinforcement-learning-approach-to
Repo
Framework

Identifying Individual Dogs in Social Media Images


Title	Identifying Individual Dogs in Social Media Images
Authors	Djordje Batic, Dubravko Culibrk
Abstract	We present the results of an initial study focused on developing a visual AI solution able to recognize individual dogs in unconstrained (wild) images occurring on social media. The work described here is part of a joint project with Pet2Net, a social network focused on pets and their owners. In order to detect and recognize individual dogs, we combine transfer learning and object detection approaches on Inception v3 and SSD Inception v2 architectures respectively, and evaluate the proposed pipeline using a new data set containing real data that users uploaded to the Pet2Net platform. We show that it can achieve 94.59% accuracy in identifying individual dogs. Our approach has been designed with simplicity in mind and with the goal of easy deployment on all the images uploaded to the Pet2Net platform. A purely visual approach to identifying dogs in images will enhance Pet2Net features aimed at finding lost dogs, as well as form the basis of future work focused on identifying social relationships between dogs, which cannot be inferred from other data collected by the platform.
Tasks	Object Detection, Transfer Learning
Published	2020-03-14
URL	https://arxiv.org/abs/2003.06705v1
PDF	https://arxiv.org/pdf/2003.06705v1.pdf
PWC	https://paperswithcode.com/paper/identifying-individual-dogs-in-social-media
Repo
Framework

Financial Time Series Representation Learning


Title	Financial Time Series Representation Learning
Authors	Philippe Chatigny, Jean-Marc Patenaude, Shengrui Wang
Abstract	This paper addresses the difficulty of forecasting multiple financial time series (TS) conjointly using deep neural networks (DNN). We investigate whether DNN-based models could forecast these TS more efficiently by learning their representation directly. To this end, we make use of the dynamic factor graph (DFG), which we enhance by proposing a novel variable-length attention-based mechanism to render it memory-augmented. Using this mechanism, we propose an unsupervised DNN architecture for multivariate TS forecasting that makes it possible to learn and take advantage of the relationships between these TS. We test our model on two datasets covering 19 years of investment fund activities. Our experimental results show that our proposed approach significantly outperforms typical DNN-based and statistical models at forecasting their 21-day price trajectory.
Tasks	Representation Learning, Time Series
Published	2020-03-27
URL	https://arxiv.org/abs/2003.12194v1
PDF	https://arxiv.org/pdf/2003.12194v1.pdf
PWC	https://paperswithcode.com/paper/financial-time-series-representation-learning
Repo
Framework

SaccadeNet: A Fast and Accurate Object Detector


Title	SaccadeNet: A Fast and Accurate Object Detector
Authors	Shiyi Lan, Zhou Ren, Yi Wu, Larry S. Davis, Gang Hua
Abstract	Object detection is an essential step towards holistic scene understanding. Most existing object detection algorithms attend to certain object areas once and then predict the object locations. However, neuroscientists have revealed that humans do not look at a scene with a fixed, steady gaze. Instead, human eyes move around, locating informative parts to understand the object location. This active perceiving movement process is called saccade. Inspired by this mechanism, we propose a fast and accurate object detector called SaccadeNet. It contains four main modules, the Center Attentive Module, the Corner Attentive Module, the Attention Transitive Module, and the Aggregation Attentive Module, which allow it to attend to different informative object keypoints and predict object locations from coarse to fine. The Corner Attentive Module is used only during training to extract more informative corner features, which brings a free-lunch performance boost. On the MS COCO dataset, we achieve 40.4% mAP at 28 FPS and 30.5% mAP at 118 FPS. Among all the real-time object detectors, our SaccadeNet achieves the best detection performance, which demonstrates the effectiveness of the proposed detection mechanism.
Tasks	Object Detection, Scene Understanding
Published	2020-03-26
URL	https://arxiv.org/abs/2003.12125v1
PDF	https://arxiv.org/pdf/2003.12125v1.pdf
PWC	https://paperswithcode.com/paper/saccadenet-a-fast-and-accurate-object
Repo
Framework

Toward Improving the Evaluation of Visual Attention Models: a Crowdsourcing Approach


Title	Toward Improving the Evaluation of Visual Attention Models: a Crowdsourcing Approach
Authors	Dario Zanca, Stefano Melacci, Marco Gori
Abstract	Human visual attention is a complex phenomenon. A computational model of this phenomenon must take into account where people look in order to evaluate which are the salient locations (spatial distribution of the fixations), when they look in those locations to understand the temporal development of the exploration (temporal order of the fixations), and how they move from one location to another with respect to the dynamics of the scene and the mechanics of the eyes (dynamics). State-of-the-art models focus on learning saliency maps from human data, a process that only takes into account the spatial component of the phenomenon and ignores its temporal and dynamical counterparts. In this work we focus on the evaluation methodology of models of human visual attention. We underline the limits of the current metrics for saliency prediction and scanpath similarity, and we introduce a statistical measure for the evaluation of the dynamics of the simulated eye movements. While deep learning models achieve astonishing performance in saliency prediction, our analysis shows their limitations in capturing the dynamics of the process. We find that unsupervised gravitational models, despite their simplicity, outperform all competitors. Finally, exploiting a crowd-sourcing platform, we present a study aimed at evaluating how plausible the scanpaths generated with the unsupervised gravitational models appear to naive and expert human observers.
Tasks	Saliency Prediction
Published	2020-02-11
URL	https://arxiv.org/abs/2002.04407v1
PDF	https://arxiv.org/pdf/2002.04407v1.pdf
PWC	https://paperswithcode.com/paper/toward-improving-the-evaluation-of-visual
Repo
Framework

BigEarthNet Dataset with A New Class-Nomenclature for Remote Sensing Image Understanding


Title	BigEarthNet Dataset with A New Class-Nomenclature for Remote Sensing Image Understanding
Authors	Gencer Sumbul, Jian Kang, Tristan Kreuziger, Filipe Marcelino, Hugo Costa, Pedro Benevides, Mario Caetano, Begüm Demir
Abstract	This paper presents BigEarthNet, a large-scale Sentinel-2 multispectral image dataset with a new class nomenclature to advance deep learning (DL) studies in remote sensing (RS). BigEarthNet is made up of 590,326 image patches annotated with multi-labels provided by the CORINE Land Cover (CLC) map of 2018, based on its thematically most detailed Level-3 class nomenclature. Initial research demonstrates that some CLC classes are challenging to describe accurately by considering only Sentinel-2 images. To increase the effectiveness of BigEarthNet, in this paper we introduce an alternative class-nomenclature that allows DL models to better learn and describe the complex spatial and spectral information content of the Sentinel-2 images. This is achieved by interpreting and arranging the CLC Level-3 nomenclature based on the properties of Sentinel-2 images in a new nomenclature of 19 classes. Then, the new class-nomenclature of BigEarthNet is used within state-of-the-art DL models in the context of multi-label classification. Results show that the models trained from scratch on BigEarthNet outperform those pre-trained on ImageNet, especially in relation to some complex classes including agriculture, other vegetated and natural environments. All DL models are made publicly available at http://bigearth.net/#downloads, offering an important resource to guide future progress on RS image analysis.
Tasks	Content-Based Image Retrieval, Image Retrieval, Multi-Label Classification, Scene Classification
Published	2020-01-17
URL	https://arxiv.org/abs/2001.06372v2
PDF	https://arxiv.org/pdf/2001.06372v2.pdf
PWC	https://paperswithcode.com/paper/bigearthnet-deep-learning-models-with-a-new
Repo
Framework

Efficient Rollout Strategies for Bayesian Optimization


Title	Efficient Rollout Strategies for Bayesian Optimization
Authors	Eric Hans Lee, David Eriksson, Bolong Cheng, Michael McCourt, David Bindel
Abstract	Bayesian optimization (BO) is a class of sample-efficient global optimization methods, where a probabilistic model conditioned on previous observations is used to determine future evaluations via the optimization of an acquisition function. Most acquisition functions are myopic, meaning that they only consider the impact of the next function evaluation. Non-myopic acquisition functions consider the impact of the next $h$ function evaluations and are typically computed through rollout, in which $h$ steps of BO are simulated. These rollout acquisition functions are defined as $h$-dimensional integrals, and are expensive to compute and optimize. We show that a combination of quasi-Monte Carlo, common random numbers, and control variates significantly reduces the computational burden of rollout. We then formulate a policy-search based approach that removes the need to optimize the rollout acquisition function. Finally, we discuss the qualitative behavior of rollout policies in the setting of multi-modal objectives and model error.
Tasks
Published	2020-02-24
URL	https://arxiv.org/abs/2002.10539v2
PDF	https://arxiv.org/pdf/2002.10539v2.pdf
PWC	https://paperswithcode.com/paper/efficient-rollout-strategies-for-bayesian
Repo
Framework
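
Of the three variance-reduction techniques named in the abstract, control variates are the easiest to show in isolation. The sketch below is a toy one-dimensional Monte Carlo integral, not the rollout acquisition function from the paper: it estimates E[f(X)] for X uniform on [0, 1] and uses X itself, whose mean 0.5 is known exactly, as the control variate. The function name `control_variate_estimate` and the parameters `n` and `seed` are hypothetical.

```python
import random


def control_variate_estimate(f, n=2000, seed=0):
    """Estimate E[f(X)], X ~ Uniform(0, 1), with and without a control variate.

    Returns (plain_estimate, control_variate_estimate). The control variate
    is X itself, whose exact mean is 0.5.
    """
    rng = random.Random(seed)  # a fixed seed makes the draws reproducible
    xs = [rng.random() for _ in range(n)]
    fs = [f(x) for x in xs]
    mean_f = sum(fs) / n
    mean_x = sum(xs) / n
    # Variance-minimizing coefficient: c = Cov(f, X) / Var(X), from the sample.
    cov = sum((fv - mean_f) * (x - mean_x) for fv, x in zip(fs, xs)) / (n - 1)
    var_x = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    c = cov / var_x
    # Subtract the part of the noise explained by X's deviation from its mean.
    adjusted = [fv - c * (x - 0.5) for fv, x in zip(fs, xs)]
    return mean_f, sum(adjusted) / n
```

For `f = math.exp` the true value is e - 1, and the adjusted estimate is typically much closer to it than the plain sample mean for the same number of draws. Reusing the same seed when evaluating two different integrands also gives a simple form of common random numbers, since both estimates then share the same underlying samples.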

Sorting Big Data by Revealed Preference with Application to College Ranking


Title	Sorting Big Data by Revealed Preference with Application to College Ranking
Authors	Xingwei Hu
Abstract	When ranking big data observations such as colleges in the United States, diverse consumers reveal heterogeneous preferences. The objective of this paper is to sort out a linear ordering for these observations and to recommend strategies to improve their relative positions in the ranking. A properly sorted solution could help consumers make the right choices, and governments make wise policy decisions. Previous researchers have applied exogenous weighting or multivariate regression approaches to sort big data objects, ignoring their variety and variability. By recognizing the diversity and heterogeneity among both the observations and the consumers, we instead apply endogenous weighting to these contradictory revealed preferences. The outcome is a consistent steady-state solution to the counterbalance equilibrium within these contradictions. The solution takes into consideration the spillover effects of multiple-step interactions among the observations. When information from data is efficiently revealed in preferences, the revealed preferences greatly reduce the volume of the required data in the sorting process. The employed approach can be applied in many other areas, such as sports team ranking, academic journal ranking, voting, and real effective exchange rates.
Tasks
Published	2020-03-27
URL	https://arxiv.org/abs/2003.12198v1
PDF	https://arxiv.org/pdf/2003.12198v1.pdf
PWC	https://paperswithcode.com/paper/sorting-big-data-by-revealed-preference-with
Repo
Framework

HERS: Homomorphically Encrypted Representation Search


Title	HERS: Homomorphically Encrypted Representation Search
Authors	Joshua J. Engelsma, Anil K. Jain, Vishnu Naresh Boddeti
Abstract	We present a method to search for a probe (or query) image representation against a large gallery in the encrypted domain. We require that the probe and gallery images be represented in terms of a fixed-length representation, which is typical for representations obtained from learned networks. Our encryption scheme is agnostic to how the fixed-length representation is obtained and can, therefore, be applied to any fixed-length representation in any application domain. Our method, dubbed HERS (Homomorphically Encrypted Representation Search), operates by (i) compressing the representation towards its estimated intrinsic dimensionality, (ii) encrypting the compressed representation using the proposed fully homomorphic encryption scheme, and (iii) searching against a gallery of encrypted representations directly in the encrypted domain, without decrypting them, and with minimal loss of accuracy. Numerical results on large galleries of face, fingerprint, and object datasets such as ImageNet show that, for the first time, accurate and fast image search within the encrypted domain is feasible at scale (296 seconds; 46x speedup over state-of-the-art for face search against a background of 1 million).
Tasks	Image Retrieval
Published	2020-03-27
URL	https://arxiv.org/abs/2003.12197v1
PDF	https://arxiv.org/pdf/2003.12197v1.pdf
PWC	https://paperswithcode.com/paper/hers-homomorphically-encrypted-representation
Repo
Framework

Local Facial Makeup Transfer via Disentangled Representation


Title	Local Facial Makeup Transfer via Disentangled Representation
Authors	Zhaoyang Sun, Wenxuan Liu, Feng Liu, Ryan Wen Liu, Shengwu Xiong
Abstract	Facial makeup transfer aims to render a non-makeup face image in an arbitrarily given makeup style while preserving face identity. The most advanced method separates makeup style information from face images to realize makeup transfer. However, makeup style includes several semantically clear local styles which are still entangled together. In this paper, we propose a novel unified adversarial disentangling network to further decompose face images into four independent components, i.e., personal identity, lips makeup style, eyes makeup style and face makeup style. Owing to the further disentangling of makeup style, our method can not only control the degree of global makeup style, but also flexibly regulate the degree of local makeup styles, which no other approach can do. For makeup removal, unlike other methods that regard makeup removal as the reverse process of makeup, we integrate makeup transfer and makeup removal into one uniform framework and obtain multiple makeup removal results. Extensive experiments have demonstrated that our approach can produce more realistic and accurate makeup transfer results compared to the state-of-the-art methods.
Tasks	Facial Makeup Transfer
Published	2020-03-27
URL	https://arxiv.org/abs/2003.12065v1
PDF	https://arxiv.org/pdf/2003.12065v1.pdf
PWC	https://paperswithcode.com/paper/local-facial-makeup-transfer-via-disentangled
Repo
Framework

Refined Plane Segmentation for Cuboid-Shaped Objects by Leveraging Edge Detection


Title	Refined Plane Segmentation for Cuboid-Shaped Objects by Leveraging Edge Detection
Authors	Alexander Naumann, Laura Dörr, Niels Ole Salscheider, Kai Furmans
Abstract	Recent advances in the area of plane segmentation from single RGB images show strong accuracy improvements and now allow a reliable segmentation of indoor scenes into planes. Nonetheless, fine-grained details of these segmentation masks are still lacking accuracy, thus restricting the usability of such techniques on a larger scale in numerous applications, such as inpainting for Augmented Reality use cases. We propose a post-processing algorithm to align the segmented plane masks with edges detected in the image. This allows us to increase the accuracy of state-of-the-art approaches, while limiting ourselves to cuboid-shaped objects. Our approach is motivated by logistics, where this assumption is valid and refined planes can be used to perform robust object detection without the need for supervised learning. Results for two baselines and our approach are reported on our own dataset, which we made publicly available. The results show a consistent improvement over the state-of-the-art. The influence of the prior segmentation and the edge detection is investigated and finally, areas for future research are proposed.
Tasks	Edge Detection, Object Detection, Robust Object Detection
Published	2020-03-28
URL	https://arxiv.org/abs/2003.12870v1
PDF	https://arxiv.org/pdf/2003.12870v1.pdf
PWC	https://paperswithcode.com/paper/refined-plane-segmentation-for-cuboid-shaped
Repo
Framework