January 28, 2020


Paper Group ANR 976



The Wang-Landau Algorithm as Stochastic Optimization and Its Acceleration

Title The Wang-Landau Algorithm as Stochastic Optimization and Its Acceleration
Authors Chenguang Dai, Jun S. Liu
Abstract We show that the Wang-Landau algorithm can be formulated as a stochastic gradient descent algorithm minimizing a smooth and convex objective function, whose gradient is estimated using Markov chain Monte Carlo iterations. The optimization formulation provides a new way to establish the convergence rate of the Wang-Landau algorithm, by exploiting the fact that, almost surely, the density estimates (on the logarithmic scale) remain in a compact set, upon which the objective function is strongly convex. The optimization viewpoint motivates us to improve the efficiency of the Wang-Landau algorithm using popular tools including the momentum method and the adaptive learning rate method. We demonstrate the accelerated Wang-Landau algorithm on a two-dimensional Ising model and a two-dimensional ten-state Potts model.
Tasks Stochastic Optimization
Published 2019-07-27
URL https://arxiv.org/abs/1907.11985v2
PDF https://arxiv.org/pdf/1907.11985v2.pdf
PWC https://paperswithcode.com/paper/the-wang-landau-algorithm-as-stochastic
Repo
Framework
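The SGD view of Wang-Landau described in the abstract can be sketched on a toy problem. The sketch below is illustrative only (not the authors' code): theta[E] tracks the log density of states, the stochastic gradient is the flat-histogram target 1/K minus the visited-bin indicator, so momentum=0 recovers plain constant-step Wang-Landau and momentum > 0 is one of the accelerations the optimization viewpoint suggests.

```python
import math
import random

def wang_landau(energies, n_iters=150_000, lr=0.01, momentum=0.0, seed=0):
    """Wang-Landau viewed as constant-step SGD on theta[E] ~ log g(E).

    Each iteration draws a state from a Metropolis chain targeting
    weights 1/g(E), then applies the stochastic gradient
        grad_k = 1/K - 1{E(x) in bin k},
    whose expectation vanishes exactly when the energy histogram is flat.
    """
    rng = random.Random(seed)
    levels = sorted(set(energies))
    idx = {e: i for i, e in enumerate(levels)}
    K, n = len(levels), len(energies)
    theta = [0.0] * K   # running log density-of-states estimates
    vel = [0.0] * K     # momentum buffer
    x = rng.randrange(n)
    for _ in range(n_iters):
        y = rng.randrange(n)  # uniform proposal over microstates
        # accept with prob min(1, g(E_x)/g(E_y)): stationary law ~ 1/g(E)
        if rng.random() < math.exp(theta[idx[energies[x]]]
                                   - theta[idx[energies[y]]]):
            x = y
        visited = idx[energies[x]]
        for k in range(K):
            grad = 1.0 / K - (1.0 if k == visited else 0.0)
            vel[k] = momentum * vel[k] + grad
            theta[k] -= lr * vel[k]
    return theta, idx
```

On the sum of two fair dice, the true log density-of-states gap between sum 7 (six microstates) and sum 2 (one microstate) is log 6 ≈ 1.79, and the estimate lands near it; only differences of theta are meaningful.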

Accurate Monocular Object Detection via Color-Embedded 3D Reconstruction for Autonomous Driving

Title Accurate Monocular Object Detection via Color-Embedded 3D Reconstruction for Autonomous Driving
Authors Xinzhu Ma, Zhihui Wang, Haojie Li, Pengbo Zhang, Xin Fan, Wanli Ouyang
Abstract In this paper, we propose a monocular 3D object detection framework for autonomous driving. Unlike previous image-based methods that focus on RGB features extracted from 2D images, our method solves this problem in the reconstructed 3D space in order to exploit 3D contexts explicitly. To this end, we first leverage a stand-alone module to transform the input data from the 2D image plane to 3D point-cloud space for a better input representation, then we perform 3D detection with a PointNet backbone network to obtain the objects' 3D locations, dimensions, and orientations. To enhance the discriminative capability of the point clouds, we propose a multi-modal feature fusion module to embed the complementary RGB cue into the generated point-cloud representation. We argue that it is more effective to infer 3D bounding boxes from the generated 3D scene space (i.e., X, Y, Z space) than from the image plane (i.e., the R, G, B image plane). Evaluation on the challenging KITTI dataset shows that our approach boosts the performance of the state-of-the-art monocular approach by a large margin.
Tasks 3D Object Detection, 3D Reconstruction, Autonomous Driving, Object Detection
Published 2019-03-27
URL https://arxiv.org/abs/1903.11444v3
PDF https://arxiv.org/pdf/1903.11444v3.pdf
PWC https://paperswithcode.com/paper/accurate-monocular-3d-object-detection-via
Repo
Framework
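The 2D-to-3D transform the abstract describes, back-projecting pixels into a point cloud and attaching the RGB cue, can be sketched with a pinhole camera model. The intrinsics (fx, fy, cx, cy) and depth source below are hypothetical; the paper's module operates on an estimated depth map.

```python
def depth_to_colored_points(depth, rgb, fx, fy, cx, cy):
    """Back-project a dense depth map into a colored 3D point cloud.

    depth: H x W list of depths in metres (<= 0 means invalid);
    rgb:   H x W list of (r, g, b) tuples, same grid as depth.
    Returns (x, y, z, r, g, b) tuples: the XYZ point cloud with the
    complementary RGB cue appended per point.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:  # skip pixels with no valid depth
                continue
            x = (u - cx) * z / fx   # pinhole back-projection
            y = (v - cy) * z / fy
            points.append((x, y, z) + tuple(rgb[v][u]))
    return points
```

The resulting 6-channel points are the kind of input a PointNet-style backbone can consume directly.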

A Unified Framework of Robust Submodular Optimization

Title A Unified Framework of Robust Submodular Optimization
Authors Rishabh Iyer
Abstract In this paper, we study a unified framework of robust submodular optimization. We study this problem from both a minimization and a maximization perspective (previous work has only focused on variants of robust submodular maximization). We do this under a broad range of combinatorial constraints, including cardinality, knapsack, and matroid constraints, as well as graph-based constraints such as cuts, paths, matchings, and trees. Furthermore, we also study robust submodular minimization and maximization under multiple submodular upper and lower bound constraints. We show that all these problems are motivated by important machine learning applications, including robust data subset selection, robust co-operative cuts, and robust co-operative matchings. In each case, we provide scalable approximation algorithms and also study hardness bounds. Finally, we empirically demonstrate the utility of our algorithms on real-world applications.
Tasks
Published 2019-06-14
URL https://arxiv.org/abs/1906.06393v1
PDF https://arxiv.org/pdf/1906.06393v1.pdf
PWC https://paperswithcode.com/paper/a-unified-framework-of-robust-submodular
Repo
Framework
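To make the robust objective max_{|S| <= k} min_i f_i(S) concrete, here is a simple greedy heuristic for the cardinality-constrained case. This is an illustration of the problem shape only, not the paper's algorithms (which come with approximation guarantees and handle many more constraint classes); the coverage functions in the usage are hypothetical.

```python
def robust_greedy(ground, fns, k):
    """Greedy heuristic for max over |S| <= k of min_i f_i(S).

    At each step, add the element that maximizes the *minimum* function
    value after insertion, i.e. greedily improve the worst case.
    ground: ordered list of elements; fns: monotone submodular set
    functions taking a set; k: cardinality budget.
    """
    S = set()
    for _ in range(k):
        rest = [e for e in ground if e not in S]
        best = max(rest, key=lambda e: min(f(S | {e}) for f in fns))
        S.add(best)
    return S
```

With two disjoint coverage objectives, the robust solution must hedge across both, picking elements from each ground region.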

Matrix Product State Based Quantum Classifier

Title Matrix Product State Based Quantum Classifier
Authors Amandeep Singh Bhatia, Mandeep Kaur Saggi, Ajay Kumar, Sushma Jain
Abstract In recent years, interest in extending the success of neural networks to quantum computing has increased significantly. Tensor network theory has become increasingly popular and is widely used to simulate strongly entangled correlated systems. The matrix product state (MPS) is a well-studied class of tensor network states that plays an important role in quantum information processing. In this paper, we show that a matrix product state, as a one-dimensional array of tensors, can be used to classify classical and quantum data. We perform binary classification of the classical machine learning dataset Iris encoded in a quantum state. Further, we investigate the performance under different parameters on the ibmqx4 quantum computer and show that MPS circuits can be used to attain better accuracy. The learning ability of the MPS quantum classifier is then tested on the classification of evapotranspiration ($ET_{o}$) for the Patiala meteorological station located in Northern Punjab (India), using a three-year historical dataset (Agri). Furthermore, we use different classification performance metrics to measure its capability. Finally, the results are plotted, and the degree of correspondence among the values of each sample is shown.
Tasks
Published 2019-05-04
URL https://arxiv.org/abs/1905.01426v1
PDF https://arxiv.org/pdf/1905.01426v1.pdf
PWC https://paperswithcode.com/paper/matrix-product-state-based-quantum-classifier
Repo
Framework
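The core computation, contracting a one-dimensional array of tensors against per-site feature vectors to get class scores, can be sketched in a few lines. This is a minimal illustration, not the authors' circuit: the local feature map and the placement of the label index on the final tensor follow the common Stoudenmire-Schwab convention and are assumptions here.

```python
import math
import random

def feature_map(v):
    """Local feature map for a pixel value v in [0, 1]."""
    return [math.cos(math.pi * v / 2), math.sin(math.pi * v / 2)]

def contract_site(m, A, x):
    """Absorb one site: sum_{i,j} m_i * A[i][j][k] * x_j  ->  out_k."""
    out = [0.0] * len(A[0][0])
    for i, Ai in enumerate(A):
        for j, Aij in enumerate(Ai):
            w = m[i] * x[j]
            for k, a in enumerate(Aij):
                out[k] += w * a
    return out

def mps_classify(tensors, label_tensor, features):
    """Sweep left to right through the MPS chain.

    tensors: site tensors of shape (D_left, d, D_right); label_tensor:
    shape (D, d, n_classes); features: one length-d vector per site
    (len(tensors) + 1 in total). Returns one score per class.
    """
    m = [1.0] * len(tensors[0]) if tensors else [1.0] * len(label_tensor)
    for A, x in zip(tensors, features[:-1]):
        m = contract_site(m, A, x)
    return contract_site(m, label_tensor, features[-1])
```

Training would optimize the tensor entries; here random tensors just demonstrate the contraction.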

GS3D: An Efficient 3D Object Detection Framework for Autonomous Driving

Title GS3D: An Efficient 3D Object Detection Framework for Autonomous Driving
Authors Buyu Li, Wanli Ouyang, Lu Sheng, Xingyu Zeng, Xiaogang Wang
Abstract We present an efficient 3D object detection framework based on a single RGB image in the scenario of autonomous driving. We focus on extracting the underlying 3D information in a 2D image and determining the accurate 3D bounding box of the object without point cloud or stereo data. Leveraging an off-the-shelf 2D object detector, we propose an artful approach to efficiently obtain a coarse cuboid for each predicted 2D box. The coarse cuboid has enough accuracy to guide us in determining the 3D box of the object by refinement. In contrast to previous state-of-the-art methods that only use the features extracted from the 2D bounding box for box refinement, we explore the 3D structure information of the object by employing the visual features of visible surfaces. The new surface features are utilized to eliminate the representation ambiguity brought by using only a 2D bounding box. Moreover, we investigate different methods of 3D box refinement and discover that a classification formulation with a quality-aware loss has much better performance than regression. Evaluated on the KITTI benchmark, our approach outperforms current state-of-the-art methods for single-RGB-image-based 3D object detection.
Tasks 3D Object Detection, Autonomous Driving, Object Detection
Published 2019-03-26
URL http://arxiv.org/abs/1903.10955v2
PDF http://arxiv.org/pdf/1903.10955v2.pdf
PWC https://paperswithcode.com/paper/gs3d-an-efficient-3d-object-detection
Repo
Framework

Re-route Package Pickup and Delivery Planning with Random Demands

Title Re-route Package Pickup and Delivery Planning with Random Demands
Authors Suttinee Sawadsitang, Dusit Niyato, Kongrath Suankaewmanee, Puay Siew Tan
Abstract Recently, higher competition in the logistics business has introduced new challenges to the vehicle routing problem (VRP). Re-route planning, also known as dynamic VRP, is one of the important challenges. Re-route planning has to be performed when new customers request deliveries while the delivery vehicles, i.e., trucks, are serving other customers. While re-route planning has been studied in the literature, most existing works do not consider different uncertainties. Therefore, in this paper, we propose two systems: (i) an offline package pickup and delivery planning with stochastic demands (PDPSD) system and (ii) a re-route package pickup and delivery planning with stochastic demands (Re-route PDPSD) system. Accordingly, we formulate the PDPSD system as a two-stage stochastic optimization. We then extend the PDPSD system to the Re-route PDPSD system with a re-route algorithm. Furthermore, we evaluate the performance of the proposed systems using the dataset from the Solomon benchmark suite and real data from a Singapore logistics company. The results show that the PDPSD system achieves a lower cost than the baseline model. In addition, the Re-route PDPSD system helps the supplier efficiently serve more customers while the trucks are already on the road.
Tasks Stochastic Optimization
Published 2019-07-24
URL https://arxiv.org/abs/1908.07827v1
PDF https://arxiv.org/pdf/1908.07827v1.pdf
PWC https://paperswithcode.com/paper/re-route-package-pickup-and-delivery-planning
Repo
Framework
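The two-stage structure, commit resources before demand is known, then pay recourse costs per realized scenario, can be shown on a toy instance. This is a deliberately tiny sketch (exhaustive search over truck commitments, outsourcing as the only recourse), not the paper's PDPSD formulation; all costs and scenarios below are hypothetical.

```python
def plan_trucks(truck_costs, capacity, scenarios, outsource_cost):
    """Toy two-stage stochastic program.

    Stage 1: choose which trucks to commit (fixed cost each, identical
    capacity). Stage 2: for each demand scenario (demand, probability),
    outsource any demand beyond committed capacity at outsource_cost
    per unit. Minimizes fixed cost + expected recourse cost by
    enumerating all commitments.
    """
    best_total, best_mask = float('inf'), 0
    for mask in range(1 << len(truck_costs)):
        fixed = sum(c for i, c in enumerate(truck_costs) if mask >> i & 1)
        cap = capacity * bin(mask).count('1')
        recourse = sum(p * max(0, d - cap) * outsource_cost
                       for d, p in scenarios)
        if fixed + recourse < best_total:
            best_total, best_mask = fixed + recourse, mask
    return best_total, best_mask
```

With two trucks of cost 10 and capacity 5, demand 4 or 9 with equal probability, and outsourcing at 3 per unit, committing exactly one truck (cost 10 + expected recourse 6 = 16) beats both zero trucks (19.5) and two trucks (20).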

Generating Diverse Story Continuations with Controllable Semantics

Title Generating Diverse Story Continuations with Controllable Semantics
Authors Lifu Tu, Xiaoan Ding, Dong Yu, Kevin Gimpel
Abstract We propose a simple and effective modeling framework for controlled generation of multiple, diverse outputs. We focus on the setting of generating the next sentence of a story given its context. As controllable dimensions, we consider several sentence attributes, including sentiment, length, predicates, frames, and automatically-induced clusters. Our empirical results demonstrate: (1) our framework is accurate in terms of generating outputs that match the target control values; (2) our model yields increased maximum metric scores compared to standard n-best list generation via beam search; (3) controlling generation with semantic frames leads to a stronger combination of diversity and quality than other control variables as measured by automatic metrics. We also conduct a human evaluation to assess the utility of providing multiple suggestions for creative writing, demonstrating promising results for the potential of controllable, diverse generation in a collaborative writing system.
Tasks
Published 2019-09-30
URL https://arxiv.org/abs/1909.13434v1
PDF https://arxiv.org/pdf/1909.13434v1.pdf
PWC https://paperswithcode.com/paper/generating-diverse-story-continuations-with
Repo
Framework

FVNet: 3D Front-View Proposal Generation for Real-Time Object Detection from Point Clouds

Title FVNet: 3D Front-View Proposal Generation for Real-Time Object Detection from Point Clouds
Authors Jie Zhou, Xin Tan, Zhiwei Shao, Lizhuang Ma
Abstract 3D object detection from raw and sparse point clouds has received far less attention to date than its 2D counterpart. In this paper, we propose a novel framework called FVNet for 3D front-view proposal generation and object detection from point clouds. It consists of two stages: generation of front-view proposals and estimation of 3D bounding box parameters. Instead of generating proposals from camera images or bird's-eye-view maps, we first project point clouds onto a cylindrical surface to generate front-view feature maps which retain rich information. We then introduce a proposal generation network to predict 3D region proposals from the generated maps and further extrude objects of interest from the whole point cloud. Finally, we present another network to extract point-wise features from the extruded object points and regress the final 3D bounding box parameters in canonical coordinates. Our framework achieves real-time performance at 12 ms per point cloud sample. Extensive experiments on the 3D detection benchmark KITTI show that the proposed architecture outperforms state-of-the-art techniques which take either camera images or point clouds as input, in terms of accuracy and inference time.
Tasks 3D Object Detection, Object Detection, Real-Time Object Detection
Published 2019-03-26
URL https://arxiv.org/abs/1903.10750v3
PDF https://arxiv.org/pdf/1903.10750v3.pdf
PWC https://paperswithcode.com/paper/fvnet-3d-front-view-proposal-generation-for
Repo
Framework
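The cylindrical projection that turns a point cloud into a front-view map amounts to binning each point by azimuth and elevation angle. A minimal sketch, with hypothetical angular resolutions (roughly those of a 64-beam scanner; the paper's exact grid may differ):

```python
import math

def front_view_index(x, y, z, h_res=0.0032, v_res=0.0087):
    """Map one LiDAR point (x forward, y left, z up) to a (row, col)
    cell on a cylindrical front-view image.

    h_res / v_res are the horizontal and vertical angular resolutions
    in radians per cell.
    """
    theta = math.atan2(y, x)                  # azimuth around the cylinder
    phi = math.atan2(z, math.hypot(x, y))     # elevation above the ground plane
    return int(phi / v_res), int(theta / h_res)
```

A full feature map would accumulate per-cell channels (e.g. range and intensity) over all points; the indexing above is the essential step.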

2D Wasserstein Loss for Robust Facial Landmark Detection

Title 2D Wasserstein Loss for Robust Facial Landmark Detection
Authors Yongzhe Yan, Stefan Duffner, Priyanka Phutane, Anthony Berthelier, Christophe Blanc, Christophe Garcia, Thierry Chateau
Abstract Facial landmark detection is an important preprocessing task for most applications related to face analysis. In recent years, the performance of facial landmark detection has been significantly improved by using deep Convolutional Neural Networks (CNNs), especially Heatmap Regression Models (HRMs). Although their performance on common benchmark datasets has reached a high level, the robustness of these models remains a challenging problem in practical use under the noisier conditions of realistic environments. Contrary to most existing work focusing on the design of new models, we argue that improving robustness requires rethinking many other aspects, including the use of datasets, the format of landmark annotation, the evaluation metric, as well as the training and detection algorithm itself. In this paper, we propose a novel method for robust facial landmark detection using a loss function based on the 2D Wasserstein distance, combined with a new landmark coordinate sampling relying on the barycenter of the individual probability distributions. The most intriguing aspect of our method is that it is plug-and-play on most state-of-the-art HRMs, with neither additional complexity nor structural modifications of the models. Further, with the large performance increase of state-of-the-art deep CNN models, we found that current evaluation metrics can no longer fully reflect the robustness of these models. Therefore, we propose several improvements to the standard evaluation protocol. Extensive experimental results on both traditional evaluation metrics and our evaluation metrics demonstrate that our approach significantly improves the robustness of state-of-the-art facial landmark detection models.
Tasks Facial Landmark Detection
Published 2019-11-24
URL https://arxiv.org/abs/1911.10572v1
PDF https://arxiv.org/pdf/1911.10572v1.pdf
PWC https://paperswithcode.com/paper/2d-wasserstein-loss-for-robust-facial
Repo
Framework
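The barycenter coordinate sampling mentioned in the abstract reads the landmark position off a heatmap as the probability-weighted mean of grid coordinates (a soft-argmax), rather than the argmax. A minimal sketch of that step alone (the Wasserstein loss itself is separate):

```python
def heatmap_barycenter(heatmap):
    """Landmark coordinate as the barycenter of a (non-negative) heatmap.

    Normalizes the heatmap to a probability distribution, then returns
    the expected (x, y) position: x over columns, y over rows.
    """
    total = sum(sum(row) for row in heatmap)
    x = sum(v * u for row in heatmap for u, v in enumerate(row)) / total
    y = sum(v * r for r, row in enumerate(heatmap) for v in row) / total
    return x, y
```

Unlike argmax, the barycenter is sub-pixel accurate and, for a bimodal map, lands between the modes rather than snapping to one of them.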

Neurally-Guided Structure Inference

Title Neurally-Guided Structure Inference
Authors Sidi Lu, Jiayuan Mao, Joshua B. Tenenbaum, Jiajun Wu
Abstract Most structure inference methods either rely on exhaustive search or are purely data-driven. Exhaustive search robustly infers the structure of arbitrarily complex data, but it is slow. Data-driven methods allow efficient inference, but do not generalize when test data have more complex structures than training data. In this paper, we propose a hybrid inference algorithm, the Neurally-Guided Structure Inference (NG-SI), keeping the advantages of both search-based and data-driven methods. The key idea of NG-SI is to use a neural network to guide the hierarchical, layer-wise search over the compositional space of structures. We evaluate our algorithm on two representative structure inference tasks: probabilistic matrix decomposition and symbolic program parsing. It outperforms data-driven and search-based alternatives on both tasks.
Tasks
Published 2019-06-17
URL https://arxiv.org/abs/1906.07304v2
PDF https://arxiv.org/pdf/1906.07304v2.pdf
PWC https://paperswithcode.com/paper/neurally-guided-structure-inference
Repo
Framework

Real-time 3D Traffic Cone Detection for Autonomous Driving

Title Real-time 3D Traffic Cone Detection for Autonomous Driving
Authors Ankit Dhall, Dengxin Dai, Luc Van Gool
Abstract Considerable progress has been made in semantic scene understanding of road scenes with monocular cameras. It is, however, mainly related to certain classes such as cars and pedestrians. This work investigates traffic cones, an object class crucial for traffic control in the context of autonomous vehicles. 3D object detection using images from a monocular camera is intrinsically an ill-posed problem. In this work, we leverage the unique structure of traffic cones and propose a pipelined approach to the problem. Specifically, we first detect cones in images with a tailored 2D object detector; then, the spatial arrangement of keypoints on a traffic cone is detected by our deep structural regression network, where the fact that the cross-ratio is projection invariant is leveraged for network regularization; finally, the 3D position of the cones is recovered by the classical Perspective-n-Point algorithm. Extensive experiments show that our approach can accurately detect traffic cones and estimate their position in the 3D world in real time. The proposed method is also deployed on a real-time, safety-critical system. It runs efficiently on the low-power Jetson TX2, providing accurate 3D position estimates, allowing a race-car to map and drive autonomously on an unseen track indicated by traffic cones. With the help of robust and accurate perception, our race-car won both Formula Student Competitions held in Italy and Germany in 2018, cruising at a top speed of 54 km/h. A visualization of the complete pipeline, mapping, and navigation can be found on our project page.
Tasks 3D Object Detection, Autonomous Driving, Autonomous Vehicles, Object Detection, Scene Understanding
Published 2019-02-06
URL https://arxiv.org/abs/1902.02394v2
PDF https://arxiv.org/pdf/1902.02394v2.pdf
PWC https://paperswithcode.com/paper/real-time-3d-traffic-cone-detection-for
Repo
Framework
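The projection-invariance property the network regularization relies on is easy to verify numerically: the cross-ratio of four collinear points is unchanged by any projective map. A quick sketch (1-D coordinates along the line; the map parameters are arbitrary):

```python
def cross_ratio(a, b, c, d):
    """Cross-ratio of four collinear points, here 1-D coordinates:
    ((c - a)(d - b)) / ((c - b)(d - a))."""
    return ((c - a) * (d - b)) / ((c - b) * (d - a))

def project(x, p=2.0, q=1.0, r=1.0, s=3.0):
    """A 1-D projective map x -> (p*x + q) / (r*x + s)."""
    return (p * x + q) / (r * x + s)
```

Because keypoints along a cone's edge are collinear in 3D, their image-plane cross-ratio must match the known 3D one, which gives the network a geometry-based consistency signal.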

Hyper Vision Net: Kidney Tumor Segmentation Using Coordinate Convolutional Layer and Attention Unit

Title Hyper Vision Net: Kidney Tumor Segmentation Using Coordinate Convolutional Layer and Attention Unit
Authors D. Sabarinathan, M. Parisa Beham, S. M. Md. Mansoor Roomi
Abstract The KiTS19 challenge paves the way to hastening the improvement of solid kidney tumor semantic segmentation methodologies. Accurate segmentation of kidney tumors in computed tomography (CT) images is a challenging task due to non-uniform motion, similar appearance, and varied shapes. Inspired by this fact, in this manuscript, we present a novel kidney tumor segmentation method using a deep learning network termed the Hyper Vision Net model. Most existing methods use a modified version of U-Net to segment the kidney tumor region. In the proposed architecture, we introduce supervision layers in the decoder part, which refine even minimal regions in the output. A dataset consisting of real arterial-phase abdominal CT scans of 300 patients (45,964 images) was provided by KiTS19 for training and validation of the proposed model. Compared with state-of-the-art segmentation methods, the results demonstrate the superiority of our approach, with training Dice scores of 0.9552 and 0.9633 for the tumor and kidney regions, respectively.
Tasks Semantic Segmentation
Published 2019-08-09
URL https://arxiv.org/abs/1908.03339v1
PDF https://arxiv.org/pdf/1908.03339v1.pdf
PWC https://paperswithcode.com/paper/hyper-vision-net-kidney-tumor-segmentation
Repo
Framework
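For reference, the Dice score reported above measures the overlap between a predicted and a ground-truth binary mask. A minimal implementation on flat 0/1 masks:

```python
def dice(pred, target):
    """Dice coefficient between two binary masks (flat 0/1 sequences):
    2 * |P ∩ T| / (|P| + |T|). Two empty masks count as a perfect match."""
    inter = sum(p * t for p, t in zip(pred, target))
    denom = sum(pred) + sum(target)
    return 2.0 * inter / denom if denom else 1.0
```

A score of 1.0 means perfect overlap, so the 0.9552 / 0.9633 figures indicate near-complete agreement with the annotated tumor and kidney regions on the training set.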

Animating Face using Disentangled Audio Representations

Title Animating Face using Disentangled Audio Representations
Authors Gaurav Mittal, Baoyuan Wang
Abstract All previous methods for audio-driven talking head generation assume the input audio to be clean with a neutral tone. As we show empirically, one can easily break these systems by simply adding certain background noise to the utterance or changing its emotional tone (e.g., to sad). To make talking head generation robust to such variations, we propose an explicit audio representation learning framework that disentangles audio sequences into various factors such as phonetic content, emotional tone, background noise, and others. We conduct experiments to validate that, conditioned on the disentangled content representation, the mouth movement generated by our model is significantly more accurate than that of previous approaches (without disentangled learning) in the presence of noise and emotional variations. We further demonstrate that our framework is compatible with current state-of-the-art approaches by replacing their original audio learning component with ours. To the best of our knowledge, this is the first work to improve the performance of talking head generation from the disentangled audio representation perspective, which is important for many real-world applications.
Tasks Representation Learning, Talking Head Generation
Published 2019-10-02
URL https://arxiv.org/abs/1910.00726v1
PDF https://arxiv.org/pdf/1910.00726v1.pdf
PWC https://paperswithcode.com/paper/animating-face-using-disentangled-audio
Repo
Framework

Human Languages in Source Code: Auto-Translation for Localized Instruction

Title Human Languages in Source Code: Auto-Translation for Localized Instruction
Authors Chris Piech, Sami Abu-El-Haija
Abstract Computer science education has promised open access around the world, but access is largely determined by what human language you speak. As younger students learn computer science, it is less appropriate to assume that they should learn English beforehand. To that end we present CodeInternational, the first tool to translate code between human languages. To develop a theory of non-English code, and inform our translation decisions, we conduct a study of public code repositories on GitHub. The study is, to the best of our knowledge, the first on human language in code, and covers 2.9 million Java repositories. To demonstrate CodeInternational's educational utility, we build an interactive version of the popular English-language Karel reader and translate it into 100 spoken languages. Our translations have already been used in classrooms around the world, and represent a first step in an important open CS-education problem.
Tasks
Published 2019-09-10
URL https://arxiv.org/abs/1909.04556v1
PDF https://arxiv.org/pdf/1909.04556v1.pdf
PWC https://paperswithcode.com/paper/human-languages-in-source-code-auto
Repo
Framework
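One ingredient of code translation between human languages is renaming identifiers via a dictionary while leaving the rest of the source intact. The sketch below is an illustrative fragment only, not CodeInternational's implementation (the real tool must also handle comments, string literals, and grammatical agreement); the example identifiers are hypothetical.

```python
import re

def translate_identifiers(source, mapping):
    """Rename identifiers in source code via a human-language dictionary.

    Only whole-word matches are replaced (\b anchors), so substrings
    inside longer identifiers are left alone.
    """
    pattern = re.compile(r'\b(' + '|'.join(map(re.escape, mapping)) + r')\b')
    return pattern.sub(lambda m: mapping[m.group(0)], source)
```

Note how `suma` is translated but the distinct identifier `sumaTotal` is preserved, which is exactly the behavior a naive string replace would get wrong.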

Real-time tracker with fast recovery from target loss

Title Real-time tracker with fast recovery from target loss
Authors Alessandro Bay, Panagiotis Sidiropoulos, Eduard Vazquez, Michele Sasdelli
Abstract In this paper, we introduce a variation of a state-of-the-art real-time tracker (CFNet), which adds to the original algorithm robustness to target loss without a significant computational overhead. The new method is based on the assumption that the feature map can be used to estimate the tracking confidence more accurately. When the confidence is low, we avoid updating the object's position through the feature map; instead, the tracker passes to a single-frame failure mode, during which the patch's low-level visual content is used to swiftly update the object's position, before recovering from the target loss in the next frame. The experimental evidence provided by evaluating the method on several tracking datasets validates both the theoretical assumption that the feature map is associated with tracking confidence, and that the proposed implementation can achieve target recovery in multiple scenarios, without compromising the real-time performance.
Tasks
Published 2019-02-12
URL http://arxiv.org/abs/1902.04570v1
PDF http://arxiv.org/pdf/1902.04570v1.pdf
PWC https://paperswithcode.com/paper/real-time-tracker-with-fast-recovery-from
Repo
Framework
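A common way to turn a correlation-filter response map into a tracking confidence, in the spirit of (but not identical to) the confidence test described above, is the peak-to-sidelobe ratio: a sharp, isolated peak means a trustworthy localization, while a flat or multi-modal map signals likely target loss. A minimal sketch:

```python
def peak_to_sidelobe_ratio(response, exclude=2):
    """Confidence of a correlation response map: peak height measured in
    sidelobe standard deviations, with a small window around the peak
    excluded from the sidelobe statistics.

    response: H x W list of floats. Higher PSR = sharper peak.
    """
    h, w = len(response), len(response[0])
    pr, pc = max(((r, c) for r in range(h) for c in range(w)),
                 key=lambda rc: response[rc[0]][rc[1]])
    peak = response[pr][pc]
    side = [response[r][c] for r in range(h) for c in range(w)
            if abs(r - pr) > exclude or abs(c - pc) > exclude]
    mean = sum(side) / len(side)
    var = sum((v - mean) ** 2 for v in side) / len(side)
    return (peak - mean) / (var ** 0.5 + 1e-12)
```

A tracker can threshold this value: above the threshold, trust the feature-map localization; below it, switch to a failure mode such as the single-frame recovery described in the abstract.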