Paper Group ANR 1215
Nonparametric Curve Alignment. Privacy Protection in Street-View Panoramas using Depth and Multi-View Imagery. Learning to Discretize: Solving 1D Scalar Conservation Laws via Deep Reinforcement Learning. Gradient Boosted Decision Tree Neural Network. Frustum VoxNet for 3D object detection from RGB-D or Depth images. Quantitative Depth Quality Assessment of RGBD Cameras At Close Range Using 3D Printed Fixtures. Integrate Image Representation to Text Model on Sentence Level: a Semi-supervised Framework. Real time backbone for semantic segmentation. Dense Depth Posterior (DDP) from Single Image and Sparse Range. Kernelized Multiview Subspace Analysis by Self-weighted Learning. Learning to Navigate from Simulation via Spatial and Semantic Information Synthesis with Noise Model Embedding. Instance- and Category-level 6D Object Pose Estimation. Hierarchy Denoising Recursive Autoencoders for 3D Scene Layout Prediction. Gradient-based Sparse Principal Component Analysis with Extensions to Online Learning. The Implicit Bias of AdaGrad on Separable Data.
Nonparametric Curve Alignment
Title | Nonparametric Curve Alignment |
Authors | Marwan Mattar, Michael Ross, Erik Learned-Miller |
Abstract | Congealing is a flexible nonparametric data-driven framework for the joint alignment of data. It has been successfully applied to the joint alignment of binary images of digits, binary images of object silhouettes, grayscale MRI images, color images of cars and faces, and 3D brain volumes. This research enhances congealing to practically and effectively apply it to curve data. We develop a parameterized set of nonlinear transformations that allow us to apply congealing to this type of data. We present positive results on aligning synthetic and real curve data sets and conclude with a discussion on extending this work to simultaneous alignment and clustering. |
Tasks | |
Published | 2019-02-02 |
URL | http://arxiv.org/abs/1902.00626v1 |
PDF | http://arxiv.org/pdf/1902.00626v1.pdf |
PWC | https://paperswithcode.com/paper/nonparametric-curve-alignment |
Repo | |
Framework | |
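Congealing is easiest to grasp in code. Below is a minimal sketch, assuming curves sampled on a shared grid: each curve gets its own transformation (here a simple affine time warp, standing in for the paper's parameterized nonlinear transformations), and all parameters are optimized jointly to reduce a dispersion measure (pointwise variance, a common stand-in for an entropy criterion) across the set.

```python
# Toy congealing for 1D curves: jointly optimize per-curve warps to
# minimize pointwise variance across the set. Illustrative sketch only.
import numpy as np

def congeal_curves(curves, n_iters=100, step=1e-3):
    """curves: (n, t) array of 1D curves sampled on a common grid."""
    n, t = curves.shape
    grid = np.linspace(0.0, 1.0, t)
    params = np.zeros((n, 2))  # per-curve (scale perturbation, shift)

    def warp(curve, p):
        # Affine reparameterization of the time axis, resampled by
        # linear interpolation; clipped to stay inside the domain.
        new_grid = np.clip((1.0 + p[0]) * grid + p[1], 0.0, 1.0)
        return np.interp(new_grid, grid, curve)

    for _ in range(n_iters):
        warped = np.stack([warp(c, p) for c, p in zip(curves, params)])
        base_cost = warped.var(axis=0).mean()
        for i in range(n):  # coordinate descent with numeric gradients
            for j in range(2):
                trial = params[i].copy()
                trial[j] += 1e-4
                w = warped.copy()
                w[i] = warp(curves[i], trial)
                grad = (w.var(axis=0).mean() - base_cost) / 1e-4
                params[i, j] -= step * grad
    return np.stack([warp(c, p) for c, p in zip(curves, params)])
```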
Privacy Protection in Street-View Panoramas using Depth and Multi-View Imagery
Title | Privacy Protection in Street-View Panoramas using Depth and Multi-View Imagery |
Authors | Ries Uittenbogaard, Clint Sebastian, Julien Vijverberg, Bas Boom, Dariu M. Gavrila, Peter H. N. de With |
Abstract | The current paradigm for privacy protection in street-view images is to detect and blur sensitive information. In this paper, we propose a framework that is an alternative to blurring, which automatically removes and inpaints moving objects (e.g. pedestrians, vehicles) in street-view imagery. We propose a novel moving object segmentation algorithm that exploits consistencies in depth across multiple street-view images and is later combined with the results of a segmentation network. The detected moving objects are removed and inpainted with information from other views, to obtain a realistic output image in which the moving object is no longer visible. We evaluate our results on a dataset of 1000 images, obtaining a peak signal-to-noise ratio (PSNR) of 27.2 dB and an L1 loss of 2.5%. To assess overall quality, we also report the results of a survey conducted with 35 professionals, who were asked to visually inspect the images and judge whether object removal and inpainting had taken place. The inpainting dataset will be made publicly available for scientific benchmarking purposes at https://research.cyclomedia.com |
Tasks | Semantic Segmentation |
Published | 2019-03-27 |
URL | http://arxiv.org/abs/1903.11532v1 |
PDF | http://arxiv.org/pdf/1903.11532v1.pdf |
PWC | https://paperswithcode.com/paper/privacy-protection-in-street-view-panoramas |
Repo | |
Framework | |
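For reference, the two quantitative metrics reported above are straightforward to compute. A minimal sketch for 8-bit images (the 27.2 dB and 2.5% figures are the paper's reported results, not something this snippet reproduces):

```python
# PSNR and relative L1 error between a ground-truth image and an
# inpainted result, both given as uint8 arrays of equal shape.
import numpy as np

def psnr(reference, result, max_val=255.0):
    diff = reference.astype(np.float64) - result.astype(np.float64)
    mse = np.mean(diff ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)

def l1_percent(reference, result, max_val=255.0):
    diff = reference.astype(np.float64) - result.astype(np.float64)
    return 100.0 * np.mean(np.abs(diff)) / max_val
```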
Learning to Discretize: Solving 1D Scalar Conservation Laws via Deep Reinforcement Learning
Title | Learning to Discretize: Solving 1D Scalar Conservation Laws via Deep Reinforcement Learning |
Authors | Yufei Wang, Ziju Shen, Zichao Long, Bin Dong |
Abstract | Conservation laws are considered to be fundamental laws of nature. They have broad applications in many fields, including physics, chemistry, biology, geology, and engineering. Solving the differential equations associated with conservation laws is a major branch of computational mathematics. The recent success of machine learning, especially deep learning, in areas such as computer vision and natural language processing, has attracted a lot of attention from the computational mathematics community and inspired many intriguing works combining machine learning with traditional methods. In this paper, we are the first to explore the possibility and benefit of solving nonlinear conservation laws using deep reinforcement learning. As a proof of concept, we focus on 1-dimensional scalar conservation laws. We deploy the machinery of deep reinforcement learning to train a policy network that decides how the numerical solution should be approximated in a sequential, spatio-temporally adaptive manner. We show that the problem of solving conservation laws can be naturally viewed as a sequential decision-making process, and that numerical schemes learned in this way can easily enforce long-term accuracy. Furthermore, the learned policy network can determine a good local discrete approximation based on the current state of the solution, which essentially makes the proposed method a meta-learning approach. In other words, the proposed method is capable of learning how to discretize for a given situation, mimicking human experts. Finally, we provide details on how the policy network is trained, how well it performs compared with state-of-the-art numerical solvers such as WENO schemes, and how well it generalizes. |
Tasks | Decision Making, Meta-Learning |
Published | 2019-05-27 |
URL | https://arxiv.org/abs/1905.11079v2 |
PDF | https://arxiv.org/pdf/1905.11079v2.pdf |
PWC | https://paperswithcode.com/paper/learning-to-discretize-solving-1d-scalar |
Repo | |
Framework | |
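The core idea, treating the choice of local discretization as a sequential decision at each cell interface, can be sketched without any RL machinery. Below, a toy hand-written policy (a stand-in for the paper's trained policy network) picks between two candidate numerical fluxes for 1D Burgers' equation with periodic boundaries:

```python
# One finite-volume step for 1D Burgers' equation where a "policy"
# chooses the flux approximation per interface. Illustrative sketch.
import numpy as np

def flux(u):                       # Burgers' flux f(u) = u^2 / 2
    return 0.5 * u * u

def step(u, dx, dt, policy):
    """Advance u one step; `policy` maps interface states to actions."""
    uL, uR = u, np.roll(u, -1)     # left/right states at each interface
    lax = (0.5 * (flux(uL) + flux(uR))
           - 0.5 * np.maximum(np.abs(uL), np.abs(uR)) * (uR - uL))
    upwind = np.where(uL + uR > 0, flux(uL), flux(uR))
    actions = policy(uL, uR)       # 0 or 1 per interface
    f_hat = np.where(actions == 0, lax, upwind)
    return u - dt / dx * (f_hat - np.roll(f_hat, 1))

# Example policy: upwind near steep gradients, Lax-Friedrichs elsewhere.
smoothness_policy = lambda uL, uR: (np.abs(uR - uL) > 0.1).astype(int)
```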
Gradient Boosted Decision Tree Neural Network
Title | Gradient Boosted Decision Tree Neural Network |
Authors | Mohammad Saberian, Pablo Delgado, Yves Raimond |
Abstract | In this paper we propose a method to build a neural network that is similar to an ensemble of decision trees. We first illustrate how to convert a learned ensemble of decision trees into a single neural network with one hidden layer and an input transformation. We then relax some properties of this network, such as thresholds and activation functions, to train an approximately equivalent decision tree ensemble. The final model, Hammock, is surprisingly simple: a fully connected two-layer neural network whose input is quantized and one-hot encoded. Experiments on large and small datasets show that this simple method can achieve performance similar to that of Gradient Boosted Decision Trees. |
Tasks | |
Published | 2019-10-17 |
URL | https://arxiv.org/abs/1910.09340v2 |
PDF | https://arxiv.org/pdf/1910.09340v2.pdf |
PWC | https://paperswithcode.com/paper/gradient-boosted-decision-tree-neural-network |
Repo | |
Framework | |
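The final Hammock architecture is simple enough to write down directly: quantize each input feature into bins, one-hot encode, and feed the result through a two-layer fully connected network. A minimal PyTorch sketch, where the bin count, hidden width, and uniform bin edges are illustrative assumptions (in practice the bins would come from tree thresholds or data quantiles):

```python
# Hammock-style model: quantized, one-hot-encoded input into a
# two-layer fully connected network. Sizes are illustrative.
import torch
import torch.nn as nn

class Hammock(nn.Module):
    def __init__(self, n_features, n_bins=32, hidden=512, n_out=1):
        super().__init__()
        self.n_bins = n_bins
        self.net = nn.Sequential(
            nn.Linear(n_features * n_bins, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_out),
        )
        # Per-feature bin edges; uniform here for simplicity.
        self.register_buffer(
            "edges", torch.linspace(0, 1, n_bins - 1).repeat(n_features, 1))

    def forward(self, x):                    # x: (batch, n_features) in [0, 1]
        bins = torch.searchsorted(self.edges, x.T.contiguous()).T  # quantize
        onehot = nn.functional.one_hot(bins, self.n_bins).float()
        return self.net(onehot.flatten(1))
```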
Frustum VoxNet for 3D object detection from RGB-D or Depth images
Title | Frustum VoxNet for 3D object detection from RGB-D or Depth images |
Authors | Xiaoke Shen, Ioannis Stamos |
Abstract | Recently, there has been a plethora of classification and detection systems from RGB as well as 3D images. In this work, we describe a new 3D object detection system that operates on an RGB-D or depth-only point cloud. Our system first detects objects in 2D (either RGB or pseudo-RGB constructed from depth). The next step is to detect 3D objects within the 3D frustums these 2D detections define. This is achieved by voxelizing parts of the frustums (since frustums can be quite large), instead of using the whole frustums as done in earlier work. The main novelty of our system lies in determining which parts (3D proposals) of the frustums to voxelize, which allows us to provide high-resolution representations around the objects of interest and reduces our system's memory requirements. These 3D proposals are fed to an efficient ResNet-based 3D Fully Convolutional Network (FCN). Our 3D detection system is fast and can be integrated into a robotics platform. In contrast to systems that do not perform voxelization (such as PointNet), our method can operate without subsampling the datasets. We have also introduced a pipelining approach that further improves the efficiency of our system. Results on the SUN RGB-D dataset show that our system, which is based on a small network, can process 20 frames per second with detection results comparable to the state of the art, achieving a 2x speedup. |
Tasks | 3D Object Detection, Object Detection |
Published | 2019-10-12 |
URL | https://arxiv.org/abs/1910.05483v2 |
PDF | https://arxiv.org/pdf/1910.05483v2.pdf |
PWC | https://paperswithcode.com/paper/frustum-voxnet-for-3d-object-detection-from |
Repo | |
Framework | |
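The step that distinguishes the system, voxelizing only a 3D proposal within the frustum rather than the whole frustum, reduces to a small routine. A minimal sketch, with the grid resolution as an illustrative assumption:

```python
# Occupancy-grid voxelization of the points falling inside a 3D
# proposal box. Illustrative sketch of the idea, not the paper's code.
import numpy as np

def voxelize_proposal(points, box_min, box_max, resolution=(32, 32, 32)):
    """points: (n, 3) array; box_min/box_max: 3D proposal bounds."""
    res = np.asarray(resolution)
    inside = np.all((points >= box_min) & (points < box_max), axis=1)
    pts = points[inside]
    # Map points into integer voxel indices within the proposal box.
    idx = ((pts - box_min) / (box_max - box_min) * res).astype(int)
    idx = np.clip(idx, 0, res - 1)
    grid = np.zeros(resolution, dtype=np.float32)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0   # binary occupancy
    return grid
```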
Quantitative Depth Quality Assessment of RGBD Cameras At Close Range Using 3D Printed Fixtures
Title | Quantitative Depth Quality Assessment of RGBD Cameras At Close Range Using 3D Printed Fixtures |
Authors | Michele Pratusevich, Jason Chrisos, Shreyas Aditya |
Abstract | Mobile robots that manipulate their environments require high-accuracy scene understanding at close range. Typically this understanding is achieved with RGBD cameras, but the evaluation process for selecting an appropriate RGBD camera for the application is minimally quantitative. Limited manufacturer-published metrics do not translate to observed quality in real-world cluttered environments, since quality is application-specific. To bridge the gap, we present a method for quantitatively measuring depth quality using a set of extendable 3D printed fixtures that approximate real-world conditions. By framing depth quality as point cloud density and root mean square error (RMSE) from a known geometry, we present a method that is extendable by other system integrators for custom environments. We show a comparison of 3 cameras and present a case study for camera selection, provide reference meshes and analysis code, and discuss further extensions. |
Tasks | Scene Understanding |
Published | 2019-03-21 |
URL | http://arxiv.org/abs/1903.09169v1 |
PDF | http://arxiv.org/pdf/1903.09169v1.pdf |
PWC | https://paperswithcode.com/paper/quantitative-depth-quality-assessment-of-rgbd |
Repo | |
Framework | |
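The paper's two quality measures, point cloud density and RMSE from a known geometry, reduce to a few lines for the simplest reference shape. A minimal sketch against a reference plane (the paper compares against meshes of 3D printed fixtures):

```python
# Depth-quality metrics: RMSE of observed points from a known
# reference plane, and point density over the fixture's visible area.
import numpy as np

def plane_rmse(points, plane_point, plane_normal):
    """RMSE of points (n, 3) from the plane through plane_point."""
    n = plane_normal / np.linalg.norm(plane_normal)
    distances = (points - plane_point) @ n   # signed point-plane distance
    return np.sqrt(np.mean(distances ** 2))

def point_density(points, area_m2):
    """Points per square meter over the reference surface."""
    return len(points) / area_m2
```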
Integrate Image Representation to Text Model on Sentence Level: a Semi-supervised Framework
Title | Integrate Image Representation to Text Model on Sentence Level: a Semi-supervised Framework |
Authors | Lisai Zhang, Qingcai Chen, Dongfang Li, Buzhou Tang |
Abstract | Integrating visual features has proven useful in language representation learning. Nevertheless, most existing multi-modality models require aligned visual and textual data. In this paper, we propose a novel semi-supervised visual integration framework for sentence-level language representation. Its unique features are: 1) the integration is conducted via a semi-supervised approach, which brings images to textual NLU tasks by pre-training a visualization network; 2) visual representations are dynamically integrated in both the training and prediction stages. To verify the efficacy of the proposed framework, we conduct experiments on SemEval 2018 Task 11 and reach a new state of the art on this reading comprehension task. Since the visual integration framework only requires an image database, and no extra alignment is needed for training and prediction, it provides an efficient and feasible method for multi-modality language learning. |
Tasks | Reading Comprehension, Representation Learning |
Published | 2019-12-01 |
URL | https://arxiv.org/abs/1912.00336v1 |
PDF | https://arxiv.org/pdf/1912.00336v1.pdf |
PWC | https://paperswithcode.com/paper/integrate-image-representation-to-text-model |
Repo | |
Framework | |
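The key architectural move is that a visualization network predicts image features from text, so no aligned image is needed at prediction time. A minimal sketch of the integration step, with all dimensions and layer choices as illustrative assumptions rather than the paper's architecture:

```python
# Fuse a sentence embedding with a predicted visual embedding from a
# pre-trained "visualization" head. Illustrative sketch only.
import torch
import torch.nn as nn

class VisuallyGroundedEncoder(nn.Module):
    def __init__(self, text_dim=768, visual_dim=512):
        super().__init__()
        # Trained separately to regress image features from text, so
        # no aligned image is required at prediction time.
        self.visualize = nn.Linear(text_dim, visual_dim)
        self.fuse = nn.Linear(text_dim + visual_dim, text_dim)

    def forward(self, sentence_emb):            # (batch, text_dim)
        visual = self.visualize(sentence_emb)   # predicted image features
        return self.fuse(torch.cat([sentence_emb, visual], dim=-1))
```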
Real time backbone for semantic segmentation
Title | Real time backbone for semantic segmentation |
Authors | Zhengeng Yang, Hongshan Yu, Qiang Fu, Wei Sun, Wenyan Jia, Mingui Sun, Zhi-Hong Mao |
Abstract | The rapid development of autonomous driving in recent years presents many challenges for scene understanding. As an essential step toward scene understanding, semantic segmentation has received considerable attention in the past few years. Although deep learning based state-of-the-art methods have achieved great success in improving segmentation accuracy, most of them suffer from inefficiency and can hardly be applied in practical applications. In this paper, we systematically analyze the computational cost of Convolutional Neural Networks (CNNs) and find that their inefficiency is mainly caused by their wide structure rather than their depth. In addition, the success of pruning-based model compression methods shows that there are many redundant channels in CNNs. We therefore design a very narrow yet deep backbone network to improve the efficiency of semantic segmentation. By casting our network into the FCN32 segmentation architecture, the basic structure of most segmentation methods, we achieve 60.6% mIoU on the Cityscapes val set at 54 frames per second (FPS) on $1024\times2048$ inputs, which already outperforms one of the earliest real-time deep learning based segmentation methods: ENet. |
Tasks | Autonomous Driving, Model Compression, Scene Understanding, Semantic Segmentation |
Published | 2019-03-16 |
URL | http://arxiv.org/abs/1903.06922v1 |
PDF | http://arxiv.org/pdf/1903.06922v1.pdf |
PWC | https://paperswithcode.com/paper/real-time-backbone-for-semantic-segmentation |
Repo | |
Framework | |
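The design argument, keep the network deep but narrow and attach an FCN32 head, is easy to sketch. Channel widths and per-stage depths below are illustrative assumptions, not the paper's exact configuration:

```python
# Narrow-but-deep backbone under an FCN32 head: five stride-2 stages
# (output stride 32), capped channel width, 1x1 classifier, upsample.
import torch.nn as nn

def narrow_block(c_in, c_out, stride=1):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, stride=stride, padding=1, bias=False),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
    )

class NarrowDeepFCN32(nn.Module):
    def __init__(self, n_classes=19, width=32, depth_per_stage=6):
        super().__init__()
        stages, c = [], 3
        for _ in range(5):                      # 5 stride-2 stages -> 1/32
            stages.append(narrow_block(c, width, stride=2))
            c = width
            for _ in range(depth_per_stage - 1):
                stages.append(narrow_block(c, c))
            width = min(width * 2, 128)         # keep the network narrow
        self.backbone = nn.Sequential(*stages)
        self.classifier = nn.Conv2d(c, n_classes, 1)
        self.up = nn.Upsample(scale_factor=32, mode="bilinear",
                              align_corners=False)

    def forward(self, x):                       # x: (batch, 3, H, W)
        return self.up(self.classifier(self.backbone(x)))
```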
Dense Depth Posterior (DDP) from Single Image and Sparse Range
Title | Dense Depth Posterior (DDP) from Single Image and Sparse Range |
Authors | Yanchao Yang, Alex Wong, Stefano Soatto |
Abstract | We present a deep learning system to infer the posterior distribution of a dense depth map associated with an image, by exploiting sparse range measurements, for instance from a lidar. While the lidar may provide a depth value for a small percentage of the pixels, we exploit regularities reflected in the training set to complete the map so as to have a probability over depth for each pixel in the image. We exploit a Conditional Prior Network, that allows associating a probability to each depth value given an image, and combine it with a likelihood term that uses the sparse measurements. Optionally we can also exploit the availability of stereo during training, but in any case only require a single image and a sparse point cloud at run-time. We test our approach on both unsupervised and supervised depth completion using the KITTI benchmark, and improve the state-of-the-art in both. |
Tasks | Depth Completion |
Published | 2019-01-28 |
URL | http://arxiv.org/abs/1901.10034v2 |
PDF | http://arxiv.org/pdf/1901.10034v2.pdf |
PWC | https://paperswithcode.com/paper/dense-depth-posterior-ddp-from-single-image |
Repo | |
Framework | |
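The probabilistic fusion at the heart of the method can be sketched over a discretized depth axis: multiply a learned per-pixel prior by a measurement likelihood at pixels with lidar returns. The prior below is a placeholder for the Conditional Prior Network's output, and the Gaussian likelihood is an illustrative assumption:

```python
# Fuse a per-pixel depth prior with sparse range measurements to get a
# per-pixel posterior over discretized depth. Illustrative sketch.
import numpy as np

def fuse_depth_posterior(prior, depth_bins, sparse_depth, sparse_mask,
                         sigma=0.1):
    """prior: (h, w, k) distribution over k depth bins per pixel.
    sparse_depth: (h, w) lidar depths, valid where sparse_mask is True."""
    log_post = np.log(prior + 1e-12)
    # Gaussian log-likelihood of each bin given the measurement,
    # applied only at pixels that actually have a lidar return.
    resid = depth_bins[None, None, :] - sparse_depth[..., None]
    log_like = -0.5 * (resid / sigma) ** 2
    log_post = np.where(sparse_mask[..., None], log_post + log_like, log_post)
    post = np.exp(log_post - log_post.max(axis=-1, keepdims=True))
    post /= post.sum(axis=-1, keepdims=True)
    return post  # argmax or expectation over bins yields a depth map
```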
Kernelized Multiview Subspace Analysis by Self-weighted Learning
Title | Kernelized Multiview Subspace Analysis by Self-weighted Learning |
Authors | Huibing Wang, Yang Wang, Zhao Zhang, Xianping Fu, Meng Wang |
Abstract | With the popularity of multimedia technology, information is often represented and transmitted from multiple views. Most existing algorithms are graph-based methods that learn the complex structures within multiview data but overlook the information within the data representations themselves. Furthermore, many existing works treat multiple views discriminatively by introducing hyperparameters, which is undesirable in practice. Although abundant multiview-based methods have been proposed for dimension reduction, there is still no research that leverages the existing work into a unified framework. To address this issue, in this paper we propose a general framework for multiview data dimension reduction, named Kernelized Multiview Subspace Analysis (KMSA). It directly handles multiview feature representations in kernel space, which provides a feasible channel for direct manipulation of multiview data with different dimensions. Meanwhile, compared with graph-based methods, KMSA can fully exploit the information in multiview data without loss. Furthermore, since different views have different influences on KMSA, we propose a self-weighted strategy that treats views according to their contributions, together with a co-regularized term that promotes mutual learning between views. KMSA combines self-weighted learning with the co-regularized term to learn appropriate weights for all views. We also discuss the influence of the parameters in KMSA on the weights of the views. We evaluate our proposed framework on 6 multiview datasets for classification and image retrieval. The experimental results validate the advantages of our proposed method. |
Tasks | Dimensionality Reduction, Image Retrieval |
Published | 2019-11-23 |
URL | https://arxiv.org/abs/1911.10357v1 |
PDF | https://arxiv.org/pdf/1911.10357v1.pdf |
PWC | https://paperswithcode.com/paper/kernelized-multiview-subspace-analysis-by |
Repo | |
Framework | |
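Self-weighted schemes of this kind typically assign each view a weight that decreases with its objective value. A generic sketch of such an update, a common rule in the self-weighted multiview literature and not necessarily KMSA's exact one:

```python
# Generic self-weighted view update: weights inversely related to each
# view's cost, with r > 1 controlling the smoothness of the weighting.
import numpy as np

def self_weighted_update(view_costs, r=2.0):
    """view_costs: per-view objective values (lower is better)."""
    w = np.power(1.0 / (np.asarray(view_costs) + 1e-12), 1.0 / (r - 1.0))
    return w / w.sum()   # normalized weights, one per view
```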
Learning to Navigate from Simulation via Spatial and Semantic Information Synthesis with Noise Model Embedding
Title | Learning to Navigate from Simulation via Spatial and Semantic Information Synthesis with Noise Model Embedding |
Authors | Gang Chen, Hongzhe Yu, Wei Dong, Xinjun Sheng, Xiangyang Zhu, Han Ding |
Abstract | While training an end-to-end navigation network in the real world is usually costly, simulation provides a safe and cheap environment for this training stage. However, training neural network models in simulation raises the problem of how to transfer the model effectively from simulation to the real world (sim-to-real). In this work, we regard the environment representation as a crucial element in this transfer process and propose a visual information pyramid (VIP) model to systematically investigate a practical environment representation. A novel representation composed of spatial and semantic information synthesis is then established accordingly, in which noise model embedding is particularly considered. To explore the effectiveness of this representation, we compare its performance with representations popularly used in the literature in both simulated and real-world scenarios. Results suggest that our environment representation stands out. Furthermore, we analyze the feature maps to investigate this effectiveness through the network's inner reactions, which could be instructive for future research on end-to-end navigation. |
Tasks | |
Published | 2019-10-13 |
URL | https://arxiv.org/abs/1910.05758v3 |
PDF | https://arxiv.org/pdf/1910.05758v3.pdf |
PWC | https://paperswithcode.com/paper/learning-to-navigate-from-simulation-via |
Repo | |
Framework | |
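Noise model embedding means the simulated observations are corrupted to look like real sensor output before training. A minimal sketch for a depth sensor, using Gaussian noise plus random dropout as an illustrative assumption rather than the paper's specific model:

```python
# Corrupt a clean simulated depth image so it resembles real sensor
# output: multiplicative Gaussian noise plus random missing returns.
import numpy as np

def add_depth_noise(depth, rng, sigma_rel=0.01, dropout_p=0.02):
    """depth: (h, w) simulated depth in meters; rng: np.random.Generator."""
    noisy = depth * (1.0 + sigma_rel * rng.standard_normal(depth.shape))
    holes = rng.random(depth.shape) < dropout_p   # simulate dropped returns
    noisy[holes] = 0.0
    return noisy

# Usage: add_depth_noise(sim_depth, np.random.default_rng(0))
```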
Instance- and Category-level 6D Object Pose Estimation
Title | Instance- and Category-level 6D Object Pose Estimation |
Authors | Caner Sahin, Guillermo Garcia-Hernando, Juil Sock, Tae-Kyun Kim |
Abstract | 6D object pose estimation is an important task that determines the 3D position and 3D rotation of an object in camera-centred coordinates. Solutions to this task enable promising applications in scene understanding, augmented reality, and the control and navigation of robots. Recent developments in visual depth sensors and the low-cost availability of depth data have significantly facilitated object pose estimation. Using depth information from RGB-D sensors, substantial progress has been made in the last decade by methods addressing challenges such as viewpoint variability, occlusion and clutter, and similar-looking distractors. In particular, with the recent advent of convolutional neural networks, RGB-only solutions have been presented. However, improved results have only been reported for recovering the pose of known instances, i.e., for instance-level object pose estimation tasks. More recently, state-of-the-art approaches aim to solve the object pose estimation problem at the level of categories, recovering the 6D pose of unknown instances. To this end, they address the challenges of category-level tasks such as distribution shift between source and target domains, high intra-class variation, and shape discrepancies between objects. |
Tasks | 6D Pose Estimation using RGB, Pose Estimation, Scene Understanding |
Published | 2019-03-11 |
URL | http://arxiv.org/abs/1903.04229v1 |
PDF | http://arxiv.org/pdf/1903.04229v1.pdf |
PWC | https://paperswithcode.com/paper/instance-and-category-level-6d-object-pose |
Repo | |
Framework | |
Hierarchy Denoising Recursive Autoencoders for 3D Scene Layout Prediction
Title | Hierarchy Denoising Recursive Autoencoders for 3D Scene Layout Prediction |
Authors | Yifei Shi, Angel Xuan Chang, Zhelun Wu, Manolis Savva, Kai Xu |
Abstract | Indoor scenes exhibit rich hierarchical structure in their 3D object layouts. Many tasks in 3D scene understanding can benefit from reasoning jointly about the hierarchical context of a scene and the identities of objects. We present a variational denoising recursive autoencoder (VDRAE) that generates and iteratively refines a hierarchical representation of 3D object layouts, interleaving bottom-up encoding for context aggregation and top-down decoding for propagation. We train our VDRAE on large-scale 3D scene datasets to predict both instance-level segmentations and 3D object detections from an over-segmentation of an input point cloud. We show that our VDRAE improves object detection performance on real-world 3D point cloud datasets compared to baselines from prior work. |
Tasks | Denoising, Object Detection, Scene Understanding |
Published | 2019-03-09 |
URL | http://arxiv.org/abs/1903.03757v2 |
PDF | http://arxiv.org/pdf/1903.03757v2.pdf |
PWC | https://paperswithcode.com/paper/hierarchy-denoising-recursive-autoencoders |
Repo | |
Framework | |
Gradient-based Sparse Principal Component Analysis with Extensions to Online Learning
Title | Gradient-based Sparse Principal Component Analysis with Extensions to Online Learning |
Authors | Yixuan Qiu, Jing Lei, Kathryn Roeder |
Abstract | Sparse principal component analysis (PCA) is an important technique for dimensionality reduction of high-dimensional data. However, most existing sparse PCA algorithms are based on non-convex optimization, which provides little guarantee of global convergence. Sparse PCA algorithms based on a convex formulation, for example the Fantope projection and selection (FPS), overcome this difficulty but are computationally expensive. In this work we study sparse PCA based on the convex FPS formulation, and propose a new algorithm that is computationally efficient and applicable to large, high-dimensional data sets. Nonasymptotic and explicit bounds are derived for both the optimization error and the statistical accuracy, which can be used for testing and inference problems. We also extend our algorithm to online learning problems, where data are obtained in a streaming fashion. The proposed algorithm is applied to high-dimensional gene expression data for the detection of functional gene groups. |
Tasks | Dimensionality Reduction |
Published | 2019-11-19 |
URL | https://arxiv.org/abs/1911.08048v1 |
PDF | https://arxiv.org/pdf/1911.08048v1.pdf |
PWC | https://paperswithcode.com/paper/gradient-based-sparse-principal-component |
Repo | |
Framework | |
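The convex FPS formulation constrains the iterate to the Fantope {X : 0 ⪯ X ⪯ I, tr(X) = d}, and the projection onto it reduces to clipping eigenvalues with a scalar shift found by bisection. A minimal sketch of that projection step (the full algorithm around it, gradient steps on the covariance plus l1 soft-thresholding, is omitted):

```python
# Euclidean projection of a symmetric matrix onto the rank-d Fantope:
# eigendecompose, then clip shifted eigenvalues to [0, 1] with the
# shift chosen by bisection so the clipped values sum to d.
import numpy as np

def fantope_projection(X, d):
    lam, V = np.linalg.eigh((X + X.T) / 2.0)   # symmetrize for safety
    lo, hi = lam.min() - 1.0, lam.max()        # bracket the shift
    for _ in range(60):
        theta = (lo + hi) / 2.0
        if np.clip(lam - theta, 0.0, 1.0).sum() > d:
            lo = theta                          # sum too big: shift up
        else:
            hi = theta                          # sum too small: shift down
    gamma = np.clip(lam - (lo + hi) / 2.0, 0.0, 1.0)
    return (V * gamma) @ V.T                    # V diag(gamma) V^T
```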
The Implicit Bias of AdaGrad on Separable Data
Title | The Implicit Bias of AdaGrad on Separable Data |
Authors | Qian Qian, Xiaoyuan Qian |
Abstract | We study the implicit bias of AdaGrad on separable linear classification problems. We show that AdaGrad converges to a direction that can be characterized as the solution of a quadratic optimization problem with the same feasible set as the hard SVM problem. We also discuss how different choices of AdaGrad's hyperparameters might impact this direction. This provides a deeper understanding of why adaptive methods do not seem to generalize as well as gradient descent in practice. |
Tasks | |
Published | 2019-06-09 |
URL | https://arxiv.org/abs/1906.03559v1 |
PDF | https://arxiv.org/pdf/1906.03559v1.pdf |
PWC | https://paperswithcode.com/paper/the-implicit-bias-of-adagrad-on-separable |
Repo | |
Framework | |
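The paper's claim is easy to probe empirically: run diagonal AdaGrad on a separable logistic-regression problem and watch the direction w/||w|| settle, which the paper characterizes as the solution of a quadratic problem over the hard-SVM feasible set. A small sketch, with the data, step size, and iteration count as illustrative choices:

```python
# Diagonal AdaGrad on a separable logistic-regression problem; the
# normalized iterate approaches the limit direction studied in the paper.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2)) + np.array([4.0, 0.0])
X = np.vstack([X, -X])                       # separable, symmetric data
y = np.hstack([np.ones(200), -np.ones(200)])

w, G, lr = np.zeros(2), np.zeros(2), 0.5
for t in range(20000):
    margins = y * (X @ w)
    # Gradient of the mean logistic loss log(1 + exp(-y w.x)).
    grad = -(X * (y / (1.0 + np.exp(margins)))[:, None]).mean(axis=0)
    G += grad ** 2                           # per-coordinate accumulator
    w -= lr * grad / (np.sqrt(G) + 1e-12)    # AdaGrad update

print("limit direction:", w / np.linalg.norm(w))
```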