January 27, 2020

Paper Group ANR 1215

Nonparametric Curve Alignment. Privacy Protection in Street-View Panoramas using Depth and Multi-View Imagery. Learning to Discretize: Solving 1D Scalar Conservation Laws via Deep Reinforcement Learning. Gradient Boosted Decision Tree Neural Network. Frustum VoxNet for 3D object detection from RGB-D or Depth images. Quantitative Depth Quality Asses …

Nonparametric Curve Alignment


Title	Nonparametric Curve Alignment
Authors	Marwan Mattar, Michael Ross, Erik Learned-Miller
Abstract	Congealing is a flexible nonparametric data-driven framework for the joint alignment of data. It has been successfully applied to the joint alignment of binary images of digits, binary images of object silhouettes, grayscale MRI images, color images of cars and faces, and 3D brain volumes. This research enhances congealing to practically and effectively apply it to curve data. We develop a parameterized set of nonlinear transformations that allow us to apply congealing to this type of data. We present positive results on aligning synthetic and real curve data sets and conclude with a discussion on extending this work to simultaneous alignment and clustering.
Tasks
Published	2019-02-02
URL	http://arxiv.org/abs/1902.00626v1
PDF	http://arxiv.org/pdf/1902.00626v1.pdf
PWC	https://paperswithcode.com/paper/nonparametric-curve-alignment
Repo
Framework

Privacy Protection in Street-View Panoramas using Depth and Multi-View Imagery


Title	Privacy Protection in Street-View Panoramas using Depth and Multi-View Imagery
Authors	Ries Uittenbogaard, Clint Sebastian, Julien Vijverberg, Bas Boom, Dariu M. Gavrila, Peter H. N. de With
Abstract	The current paradigm in privacy protection in street-view images is to detect and blur sensitive information. In this paper, we propose a framework that is an alternative to blurring, which automatically removes and inpaints moving objects (e.g. pedestrians, vehicles) in street-view imagery. We propose a novel moving object segmentation algorithm exploiting consistencies in depth across multiple street-view images that are later combined with the results of a segmentation network. The detected moving objects are removed and inpainted with information from other views, to obtain a realistic output image such that the moving object is not visible anymore. We evaluate our results on a dataset of 1000 images to obtain a peak signal-to-noise ratio (PSNR) and L1 loss of 27.2 dB and 2.5%, respectively. To assess overall quality, we also report the results of a survey conducted on 35 professionals, who were asked to visually inspect the images and judge whether object removal and inpainting had taken place. The inpainting dataset will be made publicly available for scientific benchmarking purposes at https://research.cyclomedia.com
Tasks	Semantic Segmentation
Published	2019-03-27
URL	http://arxiv.org/abs/1903.11532v1
PDF	http://arxiv.org/pdf/1903.11532v1.pdf
PWC	https://paperswithcode.com/paper/privacy-protection-in-street-view-panoramas
Repo
Framework
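
The abstract above reports PSNR and L1 loss as its two image-quality numbers. The following is a minimal sketch of how such metrics are commonly computed, not the authors' evaluation code. It assumes 8-bit pixel values flattened into plain Python sequences, and it assumes the L1 figure is the mean absolute error expressed as a percentage of the maximum pixel value; the function names `psnr` and `l1_percent` are hypothetical.

```python
import math


def psnr(reference, candidate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences."""
    if len(reference) != len(candidate):
        raise ValueError("inputs must have the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((r - c) ** 2 for r, c in zip(reference, candidate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def l1_percent(reference, candidate, max_value=255.0):
    """Mean absolute error as a percentage of max_value (assumed convention)."""
    if len(reference) != len(candidate):
        raise ValueError("inputs must have the same number of pixels")
    mae = sum(abs(r - c) for r, c in zip(reference, candidate)) / len(reference)
    return 100.0 * mae / max_value
```

Higher PSNR and lower L1 both indicate that the inpainted image is closer to the reference.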

Learning to Discretize: Solving 1D Scalar Conservation Laws via Deep Reinforcement Learning


Title	Learning to Discretize: Solving 1D Scalar Conservation Laws via Deep Reinforcement Learning
Authors	Yufei Wang, Ziju Shen, Zichao Long, Bin Dong
Abstract	Conservation laws are considered to be fundamental laws of nature. They have broad applications in many fields including physics, chemistry, biology, geology, and engineering. Solving the differential equations associated with conservation laws is a major branch in computational mathematics. Recent success of machine learning, especially deep learning, in areas such as computer vision and natural language processing, has attracted a lot of attention from the community of computational mathematics and inspired many intriguing works in combining machine learning with traditional methods. In this paper, we are the first to explore the possibility and benefit of solving nonlinear conservation laws using deep reinforcement learning. As a proof of concept, we focus on 1-dimensional scalar conservation laws. We deploy the machinery of deep reinforcement learning to train a policy network that can decide on how the numerical solutions should be approximated in a sequential and spatial-temporal adaptive manner. We will show that the problem of solving conservation laws can be naturally viewed as a sequential decision making process and the numerical schemes learned in such a way can easily enforce long-term accuracy. Furthermore, the learned policy network can determine a good local discrete approximation based on the current state of the solution, which essentially makes the proposed method a meta-learning approach. In other words, the proposed method is capable of learning how to discretize for a given situation mimicking human experts. Finally, we will provide details on how the policy network is trained, how well it performs compared with some state-of-the-art numerical solvers such as WENO schemes, and how well it generalizes.
Tasks	Decision Making, Meta-Learning
Published	2019-05-27
URL	https://arxiv.org/abs/1905.11079v2
PDF	https://arxiv.org/pdf/1905.11079v2.pdf
PWC	https://paperswithcode.com/paper/learning-to-discretize-solving-1d-scalar
Repo
Framework

Gradient Boosted Decision Tree Neural Network


Title	Gradient Boosted Decision Tree Neural Network
Authors	Mohammad Saberian, Pablo Delgado, Yves Raimond
Abstract	In this paper we propose a method to build a neural network that is similar to an ensemble of decision trees. We first illustrate how to convert a learned ensemble of decision trees to a single neural network with one hidden layer and an input transformation. We then relax some properties of this network such as thresholds and activation functions to train an approximately equivalent decision tree ensemble. The final model, Hammock, is surprisingly simple: a fully connected two layers neural network where the input is quantized and one-hot encoded. Experiments on large and small datasets show this simple method can achieve performance similar to that of Gradient Boosted Decision Trees.
Tasks
Published	2019-10-17
URL	https://arxiv.org/abs/1910.09340v2
PDF	https://arxiv.org/pdf/1910.09340v2.pdf
PWC	https://paperswithcode.com/paper/gradient-boosted-decision-tree-neural-network
Repo
Framework
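
The abstract above describes a network whose input is quantized and one-hot encoded. As a rough illustration of that input transformation only (the abstract does not say how the bin boundaries are chosen, so the `thresholds` argument and the function name `quantize_one_hot` are assumptions), each feature can be mapped to a bin index and then expanded into a one-hot vector:

```python
from bisect import bisect_right


def quantize_one_hot(values, thresholds):
    """Quantize each feature against its sorted cut points and one-hot encode the bin.

    values:     list of raw feature values, one per feature
    thresholds: list of sorted cut-point lists, one per feature (hypothetical input)
    """
    if len(values) != len(thresholds):
        raise ValueError("need one threshold list per feature")
    encoded = []
    for value, cuts in zip(values, thresholds):
        # k cut points define k + 1 bins; bisect_right returns the bin index.
        bin_index = bisect_right(cuts, value)
        one_hot = [0] * (len(cuts) + 1)
        one_hot[bin_index] = 1
        encoded.extend(one_hot)
    return encoded


# Two features: the first has cut points at 1.0 and 2.0, the second at 2.5.
example = quantize_one_hot([0.5, 3.0], [[1.0, 2.0], [2.5]])
```

The concatenated one-hot vector would then be fed to the fully connected layers; those layers are not sketched here.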

Frustum VoxNet for 3D object detection from RGB-D or Depth images


Title	Frustum VoxNet for 3D object detection from RGB-D or Depth images
Authors	Xiaoke Shen, Ioannis Stamos
Abstract	Recently, there have been a plethora of classification and detection systems from RGB as well as 3D images. In this work, we describe a new 3D object detection system from an RGB-D or depth-only point cloud. Our system first detects objects in 2D (either RGB or pseudo-RGB constructed from depth). The next step is to detect 3D objects within the 3D frustums these 2D detections define. This is achieved by voxelizing parts of the frustums (since frustums can be really large), instead of using the whole frustums as done in earlier work. The main novelty of our system has to do with determining which parts (3D proposals) of the frustums to voxelize, thus allowing us to provide high resolution representations around the objects of interest. It also allows our system to have reduced memory requirements. These 3D proposals are fed to an efficient ResNet-based 3D Fully Convolutional Network (FCN). Our 3D detection system is fast and can be integrated into a robotics platform. With respect to systems that do not perform voxelization (such as PointNet), our methods can operate without the requirement of subsampling of the datasets. We have also introduced a pipelining approach that further improves the efficiency of our system. Results on SUN RGB-D dataset show that our system, which is based on a small network, can process 20 frames per second with comparable detection results to the state-of-the-art, achieving a 2 times speedup.
Tasks	3D Object Detection, Object Detection
Published	2019-10-12
URL	https://arxiv.org/abs/1910.05483v2
PDF	https://arxiv.org/pdf/1910.05483v2.pdf
PWC	https://paperswithcode.com/paper/frustum-voxnet-for-3d-object-detection-from
Repo
Framework

Quantitative Depth Quality Assessment of RGBD Cameras At Close Range Using 3D Printed Fixtures


Title	Quantitative Depth Quality Assessment of RGBD Cameras At Close Range Using 3D Printed Fixtures
Authors	Michele Pratusevich, Jason Chrisos, Shreyas Aditya
Abstract	Mobile robots that manipulate their environments require high-accuracy scene understanding at close range. Typically this understanding is achieved with RGBD cameras, but the evaluation process for selecting an appropriate RGBD camera for the application is minimally quantitative. Limited manufacturer-published metrics do not translate to observed quality in real-world cluttered environments, since quality is application-specific. To bridge the gap, we present a method for quantitatively measuring depth quality using a set of extendable 3D printed fixtures that approximate real-world conditions. By framing depth quality as point cloud density and root mean square error (RMSE) from a known geometry, we present a method that is extendable by other system integrators for custom environments. We show a comparison of 3 cameras and present a case study for camera selection, provide reference meshes and analysis code, and discuss further extensions.
Tasks	Scene Understanding
Published	2019-03-21
URL	http://arxiv.org/abs/1903.09169v1
PDF	http://arxiv.org/pdf/1903.09169v1.pdf
PWC	https://paperswithcode.com/paper/quantitative-depth-quality-assessment-of-rgbd
Repo
Framework
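
The abstract above frames depth quality as point cloud density and RMSE from a known geometry. A minimal sketch of those two quantities follows, under the simplifying assumption that the known geometry is a flat plane at a fixed depth facing the camera; the paper's actual fixtures and analysis code are not reproduced here, and `plane_rmse` and `point_density` are hypothetical names.

```python
import math


def plane_rmse(points, plane_z):
    """Root mean square error of measured depths against a plane at z = plane_z.

    points: iterable of (x, y, z) tuples in the camera frame (assumed layout)
    """
    points = list(points)
    if not points:
        raise ValueError("point cloud is empty")
    squared_error = sum((z - plane_z) ** 2 for _, _, z in points)
    return math.sqrt(squared_error / len(points))


def point_density(points, area):
    """Number of returned points per unit of fixture surface area."""
    if area <= 0:
        raise ValueError("area must be positive")
    return len(list(points)) / area
```

A camera that returns many points close to the reference surface scores well on both measures; one that returns few points, or points far from the surface, does not.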

Integrate Image Representation to Text Model on Sentence Level: a Semi-supervised Framework


Title	Integrate Image Representation to Text Model on Sentence Level: a Semi-supervised Framework
Authors	Lisai Zhang, Qingcai Chen, Dongfang Li, Buzhou Tang
Abstract	Integrating visual features has been proven useful in language representation learning. Nevertheless, in most existing multi-modality models, alignment of visual and textual data is a prerequisite. In this paper, we propose a novel semi-supervised visual integration framework for sentence-level language representation. Its unique features include: 1) the integration is conducted via a semi-supervised approach, which can bring images to textual NLU tasks by pre-training a visualization network, 2) visual representations are dynamically integrated in both the training and prediction stages. To verify the efficacy of the proposed framework, we conduct experiments on SemEval 2018 Task 11 and reach a new state of the art on this reading comprehension task. Considering that the visual integration framework only requires an image database, and no extra alignment is required for training and prediction, it provides an efficient and feasible method for multi-modality language learning.
Tasks	Reading Comprehension, Representation Learning
Published	2019-12-01
URL	https://arxiv.org/abs/1912.00336v1
PDF	https://arxiv.org/pdf/1912.00336v1.pdf
PWC	https://paperswithcode.com/paper/integrate-image-representation-to-text-model
Repo
Framework

Real time backbone for semantic segmentation


Title	Real time backbone for semantic segmentation
Authors	Zhengeng Yang, Hongshan Yu, Qiang Fu, Wei Sun, Wenyan Jia, Mingui Sun, Zhi-Hong Mao
Abstract	The rapid development of autonomous driving in recent years presents many challenges for scene understanding. As an essential step towards scene understanding, semantic segmentation has thus received a lot of attention in the past few years. Although deep learning based state-of-the-art methods have achieved great success in improving segmentation accuracy, most of them suffer from an inefficiency problem and can hardly be applied to practical applications. In this paper, we systematically analyze the computation cost of Convolutional Neural Networks (CNNs) and find that the inefficiency of CNNs is mainly caused by their wide structure rather than their deep structure. In addition, the success of pruning based model compression methods proves that there are many redundant channels in CNNs. Thus, we designed a very narrow yet deep backbone network to improve the efficiency of semantic segmentation. By casting our network to the FCN32 segmentation architecture, the basic structure of most segmentation methods, we achieved 60.6% mIoU on the Cityscapes val dataset at 54 frames per second (FPS) on $1024\times2048$ inputs, which already outperforms one of the earliest real-time deep learning based segmentation methods: ENet.
Tasks	Autonomous Driving, Model Compression, Scene Understanding, Semantic Segmentation
Published	2019-03-16
URL	http://arxiv.org/abs/1903.06922v1
PDF	http://arxiv.org/pdf/1903.06922v1.pdf
PWC	https://paperswithcode.com/paper/real-time-backbone-for-semantic-segmentation
Repo
Framework
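
The abstract above reports accuracy as mIoU (mean intersection over union). For readers unfamiliar with the metric, here is a minimal sketch over flattened label maps. It assumes integer class labels and skips classes that appear in neither the prediction nor the ground truth; benchmark toolkits may handle ignored labels and per-dataset accumulation differently, and the function name `mean_iou` is hypothetical.

```python
def mean_iou(target, prediction, num_classes):
    """Mean intersection over union across classes for two flat label sequences."""
    if len(target) != len(prediction):
        raise ValueError("label maps must have the same number of pixels")
    ious = []
    for cls in range(num_classes):
        # Pixels where both maps agree on this class.
        intersection = sum(1 for t, p in zip(target, prediction) if t == cls and p == cls)
        # Pixels where either map assigns this class.
        union = sum(1 for t, p in zip(target, prediction) if t == cls or p == cls)
        if union == 0:
            continue  # class absent from both maps: skipped (assumed convention)
        ious.append(intersection / union)
    if not ious:
        raise ValueError("no class is present in either label map")
    return sum(ious) / len(ious)
```

For example, with ground truth `[0, 0, 1, 1]` and prediction `[0, 1, 1, 1]`, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.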

Dense Depth Posterior (DDP) from Single Image and Sparse Range


Title	Dense Depth Posterior (DDP) from Single Image and Sparse Range
Authors	Yanchao Yang, Alex Wong, Stefano Soatto
Abstract	We present a deep learning system to infer the posterior distribution of a dense depth map associated with an image, by exploiting sparse range measurements, for instance from a lidar. While the lidar may provide a depth value for a small percentage of the pixels, we exploit regularities reflected in the training set to complete the map so as to have a probability over depth for each pixel in the image. We exploit a Conditional Prior Network, that allows associating a probability to each depth value given an image, and combine it with a likelihood term that uses the sparse measurements. Optionally we can also exploit the availability of stereo during training, but in any case only require a single image and a sparse point cloud at run-time. We test our approach on both unsupervised and supervised depth completion using the KITTI benchmark, and improve the state-of-the-art in both.
Tasks	Depth Completion
Published	2019-01-28
URL	http://arxiv.org/abs/1901.10034v2
PDF	http://arxiv.org/pdf/1901.10034v2.pdf
PWC	https://paperswithcode.com/paper/dense-depth-posterior-ddp-from-single-image
Repo
Framework

Kernelized Multiview Subspace Analysis by Self-weighted Learning


Title	Kernelized Multiview Subspace Analysis by Self-weighted Learning
Authors	Huibing Wang, Yang Wang, Zhao Zhang, Xianping Fu, Meng Wang
Abstract	With the popularity of multimedia technology, information is always represented or transmitted from multiple views. Most of the existing algorithms are graph-based ones that learn the complex structures within multiview data but overlook the information within data representations. Furthermore, many existing works treat multiple views discriminatively by introducing some hyperparameters, which is undesirable in practice. To this end, abundant multiview based methods have been proposed for dimension reduction. However, there is still no research that brings the existing work into a unified framework. To address this issue, in this paper, we propose a general framework for multiview data dimension reduction, named Kernelized Multiview Subspace Analysis (KMSA). It directly handles the multi-view feature representation in the kernel space, which provides a feasible channel for direct manipulations on multiview data with different dimensions. Meanwhile, compared with those graph-based methods, KMSA can fully exploit information from multiview data with nothing to lose. Furthermore, since different views have different influences on KMSA, we propose a self-weighted strategy to treat different views discriminatively according to their contributions. A co-regularized term is proposed to promote the mutual learning from multi-views. KMSA combines self-weighted learning with the co-regularized term to learn appropriate weights for all views. We also discuss the influence of the parameters in KMSA regarding the weights of multi-views. We evaluate our proposed framework on 6 multiview datasets for classification and image retrieval. The experimental results validate the advantages of our proposed method.
Tasks	Dimensionality Reduction, Image Retrieval
Published	2019-11-23
URL	https://arxiv.org/abs/1911.10357v1
PDF	https://arxiv.org/pdf/1911.10357v1.pdf
PWC	https://paperswithcode.com/paper/kernelized-multiview-subspace-analysis-by
Repo
Framework

Learning to Navigate from Simulation via Spatial and Semantic Information Synthesis with Noise Model Embedding


Title	Learning to Navigate from Simulation via Spatial and Semantic Information Synthesis with Noise Model Embedding
Authors	Gang Chen, Hongzhe Yu, Wei Dong, Xinjun Sheng, Xiangyang Zhu, Han Ding
Abstract	While training an end-to-end navigation network in the real world is usually costly, simulation provides a safe and cheap environment for this training stage. However, training neural network models in simulation raises the problem of how to effectively transfer the model from simulation to the real world (sim-to-real). In this work, we regard the environment representation as a crucial element in this transfer process and propose a visual information pyramid (VIP) model to systematically investigate a practical environment representation. A novel representation composed of spatial and semantic information synthesis is then established accordingly, where noise model embedding is particularly considered. To explore the effectiveness of this representation, we compared its performance with representations popularly used in the literature in both simulated and real-world scenarios. Results suggest that our environment representation stands out. Furthermore, an analysis of the feature map is carried out to investigate the effectiveness through inner reaction, which could be illuminating for future research on end-to-end navigation.
Tasks
Published	2019-10-13
URL	https://arxiv.org/abs/1910.05758v3
PDF	https://arxiv.org/pdf/1910.05758v3.pdf
PWC	https://paperswithcode.com/paper/learning-to-navigate-from-simulation-via
Repo
Framework

Instance- and Category-level 6D Object Pose Estimation


Title	Instance- and Category-level 6D Object Pose Estimation
Authors	Caner Sahin, Guillermo Garcia-Hernando, Juil Sock, Tae-Kyun Kim
Abstract	6D object pose estimation is an important task that determines the 3D position and 3D rotation of an object in camera-centred coordinates. By utilizing such a task, one can propose promising solutions for various problems related to scene understanding, augmented reality, control and navigation of robotics. Recent developments in visual depth sensors and the low-cost availability of depth data significantly facilitate object pose estimation. Using depth information from RGB-D sensors, substantial progress has been made in the last decade by methods addressing challenges such as viewpoint variability, occlusion and clutter, and similar looking distractors. In particular, with the recent advent of convolutional neural networks, RGB-only based solutions have been presented. However, improved results have only been reported for recovering the pose of known instances, i.e., for the instance-level object pose estimation tasks. More recently, state-of-the-art approaches aim to solve the object pose estimation problem at the level of categories, recovering the 6D pose of unknown instances. To this end, they address the challenges of the category-level tasks such as distribution shift among source and target domains, high intra-class variations, and shape discrepancies between objects.
Tasks	6D Pose Estimation using RGB, Pose Estimation, Scene Understanding
Published	2019-03-11
URL	http://arxiv.org/abs/1903.04229v1
PDF	http://arxiv.org/pdf/1903.04229v1.pdf
PWC	https://paperswithcode.com/paper/instance-and-category-level-6d-object-pose
Repo
Framework

Hierarchy Denoising Recursive Autoencoders for 3D Scene Layout Prediction


Title	Hierarchy Denoising Recursive Autoencoders for 3D Scene Layout Prediction
Authors	Yifei Shi, Angel Xuan Chang, Zhelun Wu, Manolis Savva, Kai Xu
Abstract	Indoor scenes exhibit rich hierarchical structure in 3D object layouts. Many tasks in 3D scene understanding can benefit from reasoning jointly about the hierarchical context of a scene, and the identities of objects. We present a variational denoising recursive autoencoder (VDRAE) that generates and iteratively refines a hierarchical representation of 3D object layouts, interleaving bottom-up encoding for context aggregation and top-down decoding for propagation. We train our VDRAE on large-scale 3D scene datasets to predict both instance-level segmentations and 3D object detections from an over-segmentation of an input point cloud. We show that our VDRAE improves object detection performance on real-world 3D point cloud datasets compared to baselines from prior work.
Tasks	Denoising, Object Detection, Scene Understanding
Published	2019-03-09
URL	http://arxiv.org/abs/1903.03757v2
PDF	http://arxiv.org/pdf/1903.03757v2.pdf
PWC	https://paperswithcode.com/paper/hierarchy-denoising-recursive-autoencoders
Repo
Framework

Gradient-based Sparse Principal Component Analysis with Extensions to Online Learning


Title	Gradient-based Sparse Principal Component Analysis with Extensions to Online Learning
Authors	Yixuan Qiu, Jing Lei, Kathryn Roeder
Abstract	Sparse principal component analysis (PCA) is an important technique for dimensionality reduction of high-dimensional data. However, most existing sparse PCA algorithms are based on non-convex optimization, which provides little guarantee on the global convergence. Sparse PCA algorithms based on a convex formulation, for example the Fantope projection and selection (FPS), overcome this difficulty, but are computationally expensive. In this work we study sparse PCA based on the convex FPS formulation, and propose a new algorithm that is computationally efficient and applicable to large and high-dimensional data sets. Nonasymptotic and explicit bounds are derived for both the optimization error and the statistical accuracy, which can be used for testing and inference problems. We also extend our algorithm to online learning problems, where data are obtained in a streaming fashion. The proposed algorithm is applied to high-dimensional gene expression data for the detection of functional gene groups.
Tasks	Dimensionality Reduction
Published	2019-11-19
URL	https://arxiv.org/abs/1911.08048v1
PDF	https://arxiv.org/pdf/1911.08048v1.pdf
PWC	https://paperswithcode.com/paper/gradient-based-sparse-principal-component
Repo
Framework

The Implicit Bias of AdaGrad on Separable Data


Title	The Implicit Bias of AdaGrad on Separable Data
Authors	Qian Qian, Xiaoyuan Qian
Abstract	We study the implicit bias of AdaGrad on separable linear classification problems. We show that AdaGrad converges to a direction that can be characterized as the solution of a quadratic optimization problem with the same feasible set as the hard SVM problem. We also discuss how different choices of the hyperparameters of AdaGrad might impact this direction. This provides a deeper understanding of why adaptive methods do not seem to generalize as well as gradient descent does in practice.
Tasks
Published	2019-06-09
URL	https://arxiv.org/abs/1906.03559v1
PDF	https://arxiv.org/pdf/1906.03559v1.pdf
PWC	https://paperswithcode.com/paper/the-implicit-bias-of-adagrad-on-separable
Repo
Framework
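
The abstract above concerns the direction that AdaGrad converges to on separable linear classification problems. The sketch below is not taken from the paper; it is a small illustration of the standard diagonal AdaGrad update applied to the logistic loss of a bias-free linear classifier, so that the resulting direction can be inspected on toy data. The loss choice, hyperparameter values, and function names are all assumptions.

```python
import math


def adagrad_logistic(features, labels, steps=200, lr=0.5, eps=1e-8):
    """Run diagonal AdaGrad on the mean logistic loss of a linear classifier (no bias)."""
    dim = len(features[0])
    w = [0.0] * dim
    grad_sq_sum = [0.0] * dim  # running sum of squared gradients, per coordinate
    for _ in range(steps):
        grad = [0.0] * dim
        for x, y in zip(features, labels):
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            # Derivative of log(1 + exp(-margin)) with respect to the margin.
            coeff = -1.0 / (1.0 + math.exp(margin))
            for j in range(dim):
                grad[j] += coeff * y * x[j] / len(features)
        for j in range(dim):
            grad_sq_sum[j] += grad[j] ** 2
            # Per-coordinate step size shrinks as squared gradients accumulate.
            w[j] -= lr * grad[j] / (math.sqrt(grad_sq_sum[j]) + eps)
    return w


def direction(w):
    """Unit vector pointing along w."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return [wi / norm for wi in w]


# Linearly separable toy data with labels in {+1, -1}.
X = [(1.0, 2.0), (2.0, 1.0), (-1.0, -2.0), (-2.0, -1.0)]
y = [1, 1, -1, -1]
w_hat = adagrad_logistic(X, y)
w_dir = direction(w_hat)
```

Changing `lr` or `eps` and comparing `w_dir` across runs is one simple way to probe how hyperparameters might affect the limiting direction on a given data set.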