January 28, 2020


Paper Group ANR 976



The Wang-Landau Algorithm as Stochastic Optimization and Its Acceleration

Title The Wang-Landau Algorithm as Stochastic Optimization and Its Acceleration
Authors Chenguang Dai, Jun S. Liu
Abstract We show that the Wang-Landau algorithm can be formulated as a stochastic gradient descent algorithm minimizing a smooth and convex objective function, whose gradient is estimated using Markov chain Monte Carlo iterations. The optimization formulation provides a new way to establish the convergence rate of the Wang-Landau algorithm, by exploiting the fact that, almost surely, the density estimates (on the logarithmic scale) remain in a compact set, upon which the objective function is strongly convex. The optimization viewpoint motivates us to improve the efficiency of the Wang-Landau algorithm using popular tools including the momentum method and the adaptive learning rate method. We demonstrate the accelerated Wang-Landau algorithm on a two-dimensional Ising model and a two-dimensional ten-state Potts model.
Tasks Stochastic Optimization
Published 2019-07-27
URL https://arxiv.org/abs/1907.11985v2
PDF https://arxiv.org/pdf/1907.11985v2.pdf
PWC https://paperswithcode.com/paper/the-wang-landau-algorithm-as-stochastic
Repo
Framework
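The SGD view of Wang-Landau described in the abstract can be sketched on a toy problem. The sketch below is illustrative only (not the authors' code): theta[E] tracks the log density of states, the stochastic gradient is the flat-histogram target 1/K minus the visited-bin indicator, so momentum=0 recovers plain constant-step Wang-Landau and momentum > 0 is one of the accelerations the optimization viewpoint suggests.

```python
import math
import random

def wang_landau(energies, n_iters=150_000, lr=0.01, momentum=0.0, seed=0):
    """Wang-Landau viewed as constant-step SGD on theta[E] ~ log g(E).

    Each iteration draws a state from a Metropolis chain targeting
    weights 1/g(E), then applies the stochastic gradient
        grad_k = 1/K - 1{E(x) in bin k},
    whose expectation vanishes exactly when the energy histogram is flat.
    """
    rng = random.Random(seed)
    levels = sorted(set(energies))
    idx = {e: i for i, e in enumerate(levels)}
    K, n = len(levels), len(energies)
    theta = [0.0] * K   # running log density-of-states estimates
    vel = [0.0] * K     # momentum buffer
    x = rng.randrange(n)
    for _ in range(n_iters):
        y = rng.randrange(n)  # uniform proposal over microstates
        # accept with prob min(1, g(E_x)/g(E_y)): stationary law ~ 1/g(E)
        if rng.random() < math.exp(theta[idx[energies[x]]]
                                   - theta[idx[energies[y]]]):
            x = y
        visited = idx[energies[x]]
        for k in range(K):
            grad = 1.0 / K - (1.0 if k == visited else 0.0)
            vel[k] = momentum * vel[k] + grad
            theta[k] -= lr * vel[k]
    return theta, idx
```

On the sum of two fair dice, the true log density-of-states gap between sum 7 (six microstates) and sum 2 (one microstate) is log 6 ≈ 1.79, and the estimate lands near it; only differences of theta are meaningful.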

Accurate Monocular Object Detection via Color-Embedded 3D Reconstruction for Autonomous Driving

Title Accurate Monocular Object Detection via Color-Embedded 3D Reconstruction for Autonomous Driving
Authors Xinzhu Ma, Zhihui Wang, Haojie Li, Pengbo Zhang, Xin Fan, Wanli Ouyang
Abstract In this paper, we propose a monocular 3D object detection framework for autonomous driving. Unlike previous image-based methods that focus on RGB features extracted from 2D images, our method solves this problem in the reconstructed 3D space in order to exploit 3D contexts explicitly. To this end, we first leverage a stand-alone module to transform the input data from the 2D image plane to 3D point-cloud space for a better input representation, then we perform 3D detection with a PointNet backbone network to obtain the objects' 3D locations, dimensions, and orientations. To enhance the discriminative capability of the point clouds, we propose a multi-modal feature fusion module to embed the complementary RGB cue into the generated point-cloud representation. We argue that it is more effective to infer 3D bounding boxes from the generated 3D scene space (i.e., X, Y, Z space) than from the image plane (i.e., the R, G, B image plane). Evaluation on the challenging KITTI dataset shows that our approach boosts the performance of the state-of-the-art monocular approach by a large margin.
Tasks 3D Object Detection, 3D Reconstruction, Autonomous Driving, Object Detection
Published 2019-03-27
URL https://arxiv.org/abs/1903.11444v3
PDF https://arxiv.org/pdf/1903.11444v3.pdf
PWC https://paperswithcode.com/paper/accurate-monocular-3d-object-detection-via
Repo
Framework
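The 2D-to-3D transform the abstract describes, back-projecting pixels into a point cloud and attaching the RGB cue, can be sketched with a pinhole camera model. The intrinsics (fx, fy, cx, cy) and depth source below are hypothetical; the paper's module operates on an estimated depth map.

```python
def depth_to_colored_points(depth, rgb, fx, fy, cx, cy):
    """Back-project a dense depth map into a colored 3D point cloud.

    depth: H x W list of depths in metres (<= 0 means invalid);
    rgb:   H x W list of (r, g, b) tuples, same grid as depth.
    Returns (x, y, z, r, g, b) tuples: the XYZ point cloud with the
    complementary RGB cue appended per point.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:  # skip pixels with no valid depth
                continue
            x = (u - cx) * z / fx   # pinhole back-projection
            y = (v - cy) * z / fy
            points.append((x, y, z) + tuple(rgb[v][u]))
    return points
```

The resulting 6-channel points are the kind of input a PointNet-style backbone can consume directly.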

A Unified Framework of Robust Submodular Optimization

Title A Unified Framework of Robust Submodular Optimization
Authors Rishabh Iyer
Abstract In this paper, we study a unified framework of robust submodular optimization. We study this problem from both a minimization and a maximization perspective (previous work has only focused on variants of robust submodular maximization). We do this under a broad range of combinatorial constraints, including cardinality, knapsack, and matroid constraints, as well as graph-based constraints such as cuts, paths, matchings, and trees. Furthermore, we also study robust submodular minimization and maximization under multiple submodular upper and lower bound constraints. We show that all these problems are motivated by important machine learning applications, including robust data subset selection, robust co-operative cuts, and robust co-operative matchings. In each case, we provide scalable approximation algorithms and also study hardness bounds. Finally, we empirically demonstrate the utility of our algorithms on real-world applications.
Tasks
Published 2019-06-14
URL https://arxiv.org/abs/1906.06393v1
PDF https://arxiv.org/pdf/1906.06393v1.pdf
PWC https://paperswithcode.com/paper/a-unified-framework-of-robust-submodular
Repo
Framework
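To make the robust objective max_{|S| <= k} min_i f_i(S) concrete, here is a simple greedy heuristic for the cardinality-constrained case. This is an illustration of the problem shape only, not the paper's algorithms (which come with approximation guarantees and handle many more constraint classes); the coverage functions in the usage are hypothetical.

```python
def robust_greedy(ground, fns, k):
    """Greedy heuristic for max over |S| <= k of min_i f_i(S).

    At each step, add the element that maximizes the *minimum* function
    value after insertion, i.e. greedily improve the worst case.
    ground: ordered list of elements; fns: monotone submodular set
    functions taking a set; k: cardinality budget.
    """
    S = set()
    for _ in range(k):
        rest = [e for e in ground if e not in S]
        best = max(rest, key=lambda e: min(f(S | {e}) for f in fns))
        S.add(best)
    return S
```

With two disjoint coverage objectives, the robust solution must hedge across both, picking elements from each ground region.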

Matrix Product State Based Quantum Classifier

Title Matrix Product State Based Quantum Classifier
Authors Amandeep Singh Bhatia, Mandeep Kaur Saggi, Ajay Kumar, Sushma Jain
Abstract In recent years, interest in extending the success of neural networks to quantum computing has increased significantly. Tensor network theory has become increasingly popular and is widely used to simulate strongly entangled correlated systems. The matrix product state (MPS) is a well-studied class of tensor network states that plays an important role in quantum information processing. In this paper, we show that a matrix product state, as a one-dimensional array of tensors, can be used to classify classical and quantum data. We perform binary classification of the classical machine learning dataset Iris encoded in a quantum state. Further, we investigate the performance under different parameters on the ibmqx4 quantum computer and show that MPS circuits can be used to attain better accuracy. The learning ability of the MPS quantum classifier is then tested on the classification of evapotranspiration ($ET_{o}$) for the Patiala meteorological station located in Northern Punjab (India), using a three-year historical dataset (Agri). Furthermore, we use different classification performance metrics to measure its capability. Finally, the results are plotted, and the degree of correspondence among the values of each sample is shown.
Tasks
Published 2019-05-04
URL https://arxiv.org/abs/1905.01426v1
PDF https://arxiv.org/pdf/1905.01426v1.pdf
PWC https://paperswithcode.com/paper/matrix-product-state-based-quantum-classifier
Repo
Framework
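The core computation, contracting a one-dimensional array of tensors against per-site feature vectors to get class scores, can be sketched in a few lines. This is a minimal illustration, not the authors' circuit: the local feature map and the placement of the label index on the final tensor follow the common Stoudenmire-Schwab convention and are assumptions here.

```python
import math
import random

def feature_map(v):
    """Local feature map for a pixel value v in [0, 1]."""
    return [math.cos(math.pi * v / 2), math.sin(math.pi * v / 2)]

def contract_site(m, A, x):
    """Absorb one site: sum_{i,j} m_i * A[i][j][k] * x_j  ->  out_k."""
    out = [0.0] * len(A[0][0])
    for i, Ai in enumerate(A):
        for j, Aij in enumerate(Ai):
            w = m[i] * x[j]
            for k, a in enumerate(Aij):
                out[k] += w * a
    return out

def mps_classify(tensors, label_tensor, features):
    """Sweep left to right through the MPS chain.

    tensors: site tensors of shape (D_left, d, D_right); label_tensor:
    shape (D, d, n_classes); features: one length-d vector per site
    (len(tensors) + 1 in total). Returns one score per class.
    """
    m = [1.0] * len(tensors[0]) if tensors else [1.0] * len(label_tensor)
    for A, x in zip(tensors, features[:-1]):
        m = contract_site(m, A, x)
    return contract_site(m, label_tensor, features[-1])
```

Training would optimize the tensor entries; here random tensors just demonstrate the contraction.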

GS3D: An Efficient 3D Object Detection Framework for Autonomous Driving

Title GS3D: An Efficient 3D Object Detection Framework for Autonomous Driving
Authors Buyu Li, Wanli Ouyang, Lu Sheng, Xingyu Zeng, Xiaogang Wang
Abstract We present an efficient 3D object detection framework based on a single RGB image in the scenario of autonomous driving. We focus on extracting the underlying 3D information in a 2D image and determining the accurate 3D bounding box of the object without point cloud or stereo data. Leveraging an off-the-shelf 2D object detector, we propose an artful approach to efficiently obtain a coarse cuboid for each predicted 2D box. The coarse cuboid has enough accuracy to guide us in determining the 3D box of the object by refinement. In contrast to previous state-of-the-art methods that only use the features extracted from the 2D bounding box for box refinement, we explore the 3D structure information of the object by employing the visual features of visible surfaces. The new surface features are utilized to eliminate the representation ambiguity brought by using only a 2D bounding box. Moreover, we investigate different methods of 3D box refinement and discover that a classification formulation with a quality-aware loss has much better performance than regression. Evaluated on the KITTI benchmark, our approach outperforms current state-of-the-art methods for single-RGB-image-based 3D object detection.
Tasks 3D Object Detection, Autonomous Driving, Object Detection
Published 2019-03-26
URL http://arxiv.org/abs/1903.10955v2
PDF http://arxiv.org/pdf/1903.10955v2.pdf
PWC https://paperswithcode.com/paper/gs3d-an-efficient-3d-object-detection
Repo
Framework

Re-route Package Pickup and Delivery Planning with Random Demands

Title Re-route Package Pickup and Delivery Planning with Random Demands
Authors Suttinee Sawadsitang, Dusit Niyato, Kongrath Suankaewmanee, Puay Siew Tan
Abstract Recently, higher competition in the logistics business has introduced new challenges to the vehicle routing problem (VRP). Re-route planning, also known as dynamic VRP, is one of the important challenges. Re-route planning has to be performed when new customers request deliveries while the delivery vehicles, i.e., trucks, are serving other customers. While re-route planning has been studied in the literature, most existing works do not consider different uncertainties. Therefore, in this paper, we propose two systems: (i) an offline package pickup and delivery planning with stochastic demands (PDPSD) system and (ii) a re-route package pickup and delivery planning with stochastic demands (Re-route PDPSD) system. Accordingly, we formulate the PDPSD system as a two-stage stochastic optimization. We then extend the PDPSD system to the Re-route PDPSD system with a re-route algorithm. Furthermore, we evaluate the performance of the proposed systems using the dataset from the Solomon benchmark suite and real data from a Singapore logistics company. The results show that the PDPSD system achieves a lower cost than the baseline model. In addition, the Re-route PDPSD system helps the supplier efficiently serve more customers while the trucks are already on the road.
Tasks Stochastic Optimization
Published 2019-07-24
URL https://arxiv.org/abs/1908.07827v1
PDF https://arxiv.org/pdf/1908.07827v1.pdf
PWC https://paperswithcode.com/paper/re-route-package-pickup-and-delivery-planning
Repo
Framework
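The two-stage structure, commit resources before demand is known, then pay recourse costs per realized scenario, can be shown on a toy instance. This is a deliberately tiny sketch (exhaustive search over truck commitments, outsourcing as the only recourse), not the paper's PDPSD formulation; all costs and scenarios below are hypothetical.

```python
def plan_trucks(truck_costs, capacity, scenarios, outsource_cost):
    """Toy two-stage stochastic program.

    Stage 1: choose which trucks to commit (fixed cost each, identical
    capacity). Stage 2: for each demand scenario (demand, probability),
    outsource any demand beyond committed capacity at outsource_cost
    per unit. Minimizes fixed cost + expected recourse cost by
    enumerating all commitments.
    """
    best_total, best_mask = float('inf'), 0
    for mask in range(1 << len(truck_costs)):
        fixed = sum(c for i, c in enumerate(truck_costs) if mask >> i & 1)
        cap = capacity * bin(mask).count('1')
        recourse = sum(p * max(0, d - cap) * outsource_cost
                       for d, p in scenarios)
        if fixed + recourse < best_total:
            best_total, best_mask = fixed + recourse, mask
    return best_total, best_mask
```

With two trucks of cost 10 and capacity 5, demand 4 or 9 with equal probability, and outsourcing at 3 per unit, committing exactly one truck (cost 10 + expected recourse 6 = 16) beats both zero trucks (19.5) and two trucks (20).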

Generating Diverse Story Continuations with Controllable Semantics

Title Generating Diverse Story Continuations with Controllable Semantics
Authors Lifu Tu, Xiaoan Ding, Dong Yu, Kevin Gimpel
Abstract We propose a simple and effective modeling framework for controlled generation of multiple, diverse outputs. We focus on the setting of generating the next sentence of a story given its context. As controllable dimensions, we consider several sentence attributes, including sentiment, length, predicates, frames, and automatically-induced clusters. Our empirical results demonstrate: (1) our framework is accurate in terms of generating outputs that match the target control values; (2) our model yields increased maximum metric scores compared to standard n-best list generation via beam search; (3) controlling generation with semantic frames leads to a stronger combination of diversity and quality than other control variables as measured by automatic metrics. We also conduct a human evaluation to assess the utility of providing multiple suggestions for creative writing, demonstrating promising results for the potential of controllable, diverse generation in a collaborative writing system.
Tasks
Published 2019-09-30
URL https://arxiv.org/abs/1909.13434v1
PDF https://arxiv.org/pdf/1909.13434v1.pdf
PWC https://paperswithcode.com/paper/generating-diverse-story-continuations-with
Repo
Framework

FVNet: 3D Front-View Proposal Generation for Real-Time Object Detection from Point Clouds

Title FVNet: 3D Front-View Proposal Generation for Real-Time Object Detection from Point Clouds
Authors Jie Zhou, Xin Tan, Zhiwei Shao, Lizhuang Ma
Abstract 3D object detection from raw and sparse point clouds has received far less attention to date than its 2D counterpart. In this paper, we propose a novel framework called FVNet for 3D front-view proposal generation and object detection from point clouds. It consists of two stages: generation of front-view proposals and estimation of 3D bounding box parameters. Instead of generating proposals from camera images or bird's-eye-view maps, we first project point clouds onto a cylindrical surface to generate front-view feature maps which retain rich information. We then introduce a proposal generation network to predict 3D region proposals from the generated maps and further extrude objects of interest from the whole point cloud. Finally, we present another network to extract point-wise features from the extruded object points and regress the final 3D bounding box parameters in canonical coordinates. Our framework achieves real-time performance at 12 ms per point cloud sample. Extensive experiments on the 3D detection benchmark KITTI show that the proposed architecture outperforms state-of-the-art techniques which take either camera images or point clouds as input, in terms of accuracy and inference time.
Tasks 3D Object Detection, Object Detection, Real-Time Object Detection
Published 2019-03-26
URL https://arxiv.org/abs/1903.10750v3
PDF https://arxiv.org/pdf/1903.10750v3.pdf
PWC https://paperswithcode.com/paper/fvnet-3d-front-view-proposal-generation-for
Repo
Framework
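The cylindrical projection that turns a point cloud into a front-view map amounts to binning each point by azimuth and elevation angle. A minimal sketch, with hypothetical angular resolutions (roughly those of a 64-beam scanner; the paper's exact grid may differ):

```python
import math

def front_view_index(x, y, z, h_res=0.0032, v_res=0.0087):
    """Map one LiDAR point (x forward, y left, z up) to a (row, col)
    cell on a cylindrical front-view image.

    h_res / v_res are the horizontal and vertical angular resolutions
    in radians per cell.
    """
    theta = math.atan2(y, x)                  # azimuth around the cylinder
    phi = math.atan2(z, math.hypot(x, y))     # elevation above the ground plane
    return int(phi / v_res), int(theta / h_res)
```

A full feature map would accumulate per-cell channels (e.g. range and intensity) over all points; the indexing above is the essential step.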

2D Wasserstein Loss for Robust Facial Landmark Detection

Title 2D Wasserstein Loss for Robust Facial Landmark Detection
Authors Yongzhe Yan, Stefan Duffner, Priyanka Phutane, Anthony Berthelier, Christophe Blanc, Christophe Garcia, Thierry Chateau
Abstract Facial landmark detection is an important preprocessing task for most applications related to face analysis. In recent years, the performance of facial landmark detection has been significantly improved by using deep Convolutional Neural Networks (CNNs), especially Heatmap Regression Models (HRMs). Although their performance on common benchmark datasets has reached a high level, the robustness of these models remains a challenging problem in practical use under the noisier conditions of realistic environments. Contrary to most existing work focusing on the design of new models, we argue that improving robustness requires rethinking many other aspects, including the use of datasets, the format of landmark annotation, the evaluation metric, as well as the training and detection algorithm itself. In this paper, we propose a novel method for robust facial landmark detection using a loss function based on the 2D Wasserstein distance, combined with a new landmark coordinate sampling relying on the barycenter of the individual probability distributions. The most intriguing aspect of our method is that it is plug-and-play on most state-of-the-art HRMs, with neither additional complexity nor structural modifications of the models. Further, with the large performance increase of state-of-the-art deep CNN models, we found that current evaluation metrics can no longer fully reflect the robustness of these models. Therefore, we propose several improvements to the standard evaluation protocol. Extensive experimental results on both traditional evaluation metrics and our evaluation metrics demonstrate that our approach significantly improves the robustness of state-of-the-art facial landmark detection models.
Tasks Facial Landmark Detection
Published 2019-11-24
URL https://arxiv.org/abs/1911.10572v1
PDF https://arxiv.org/pdf/1911.10572v1.pdf
PWC https://paperswithcode.com/paper/2d-wasserstein-loss-for-robust-facial
Repo
Framework
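The barycenter coordinate sampling mentioned in the abstract reads the landmark position off a heatmap as the probability-weighted mean of grid coordinates (a soft-argmax), rather than the argmax. A minimal sketch of that step alone (the Wasserstein loss itself is separate):

```python
def heatmap_barycenter(heatmap):
    """Landmark coordinate as the barycenter of a (non-negative) heatmap.

    Normalizes the heatmap to a probability distribution, then returns
    the expected (x, y) position: x over columns, y over rows.
    """
    total = sum(sum(row) for row in heatmap)
    x = sum(v * u for row in heatmap for u, v in enumerate(row)) / total
    y = sum(v * r for r, row in enumerate(heatmap) for v in row) / total
    return x, y
```

Unlike argmax, the barycenter is sub-pixel accurate and, for a bimodal map, lands between the modes rather than snapping to one of them.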

Neurally-Guided Structure Inference

Title Neurally-Guided Structure Inference
Authors Sidi Lu, Jiayuan Mao, Joshua B. Tenenbaum, Jiajun Wu
Abstract Most structure inference methods either rely on exhaustive search or are purely data-driven. Exhaustive search robustly infers the structure of arbitrarily complex data, but it is slow. Data-driven methods allow efficient inference, but do not generalize when test data have more complex structures than training data. In this paper, we propose a hybrid inference algorithm, the Neurally-Guided Structure Inference (NG-SI), keeping the advantages of both search-based and data-driven methods. The key idea of NG-SI is to use a neural network to guide the hierarchical, layer-wise search over the compositional space of structures. We evaluate our algorithm on two representative structure inference tasks: probabilistic matrix decomposition and symbolic program parsing. It outperforms data-driven and search-based alternatives on both tasks.
Tasks
Published 2019-06-17
URL https://arxiv.org/abs/1906.07304v2
PDF https://arxiv.org/pdf/1906.07304v2.pdf
PWC https://paperswithcode.com/paper/neurally-guided-structure-inference
Repo
Framework

Real-time 3D Traffic Cone Detection for Autonomous Driving

Title Real-time 3D Traffic Cone Detection for Autonomous Driving
Authors Ankit Dhall, Dengxin Dai, Luc Van Gool
Abstract Considerable progress has been made in semantic scene understanding of road scenes with monocular cameras. It is, however, mainly related to certain classes such as cars and pedestrians. This work investigates traffic cones, an object class crucial for traffic control in the context of autonomous vehicles. 3D object detection using images from a monocular camera is intrinsically an ill-posed problem. In this work, we leverage the unique structure of traffic cones and propose a pipelined approach to the problem. Specifically, we first detect cones in images with a tailored 2D object detector; then, the spatial arrangement of keypoints on a traffic cone is detected by our deep structural regression network, where the fact that the cross-ratio is projection invariant is leveraged for network regularization; finally, the 3D position of the cones is recovered by the classical Perspective-n-Point algorithm. Extensive experiments show that our approach can accurately detect traffic cones and estimate their position in the 3D world in real time. The proposed method is also deployed on a real-time, safety-critical system. It runs efficiently on the low-power Jetson TX2, providing accurate 3D position estimates, allowing a race-car to map and drive autonomously on an unseen track indicated by traffic cones. With the help of robust and accurate perception, our race-car won both Formula Student Competitions held in Italy and Germany in 2018, cruising at a top speed of 54 km/h. A visualization of the complete pipeline, mapping, and navigation can be found on our project page.
Tasks 3D Object Detection, Autonomous Driving, Autonomous Vehicles, Object Detection, Scene Understanding
Published 2019-02-06
URL https://arxiv.org/abs/1902.02394v2
PDF https://arxiv.org/pdf/1902.02394v2.pdf
PWC https://paperswithcode.com/paper/real-time-3d-traffic-cone-detection-for
Repo
Framework
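The projection-invariance property the network regularization relies on is easy to verify numerically: the cross-ratio of four collinear points is unchanged by any projective map. A quick sketch (1-D coordinates along the line; the map parameters are arbitrary):

```python
def cross_ratio(a, b, c, d):
    """Cross-ratio of four collinear points, here 1-D coordinates:
    ((c - a)(d - b)) / ((c - b)(d - a))."""
    return ((c - a) * (d - b)) / ((c - b) * (d - a))

def project(x, p=2.0, q=1.0, r=1.0, s=3.0):
    """A 1-D projective map x -> (p*x + q) / (r*x + s)."""
    return (p * x + q) / (r * x + s)
```

Because keypoints along a cone's edge are collinear in 3D, their image-plane cross-ratio must match the known 3D one, which gives the network a geometry-based consistency signal.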

Hyper Vision Net: Kidney Tumor Segmentation Using Coordinate Convolutional Layer and Attention Unit

Title Hyper Vision Net: Kidney Tumor Segmentation Using Coordinate Convolutional Layer and Attention Unit
Authors D. Sabarinathan, M. Parisa Beham, S. M. Md. Mansoor Roomi
Abstract The KiTS19 challenge paves the way to hastening the improvement of solid kidney tumor semantic segmentation methodologies. Accurate segmentation of kidney tumors in computed tomography (CT) images is a challenging task due to non-uniform motion, similar appearance, and varied shapes. Inspired by this fact, in this manuscript, we present a novel kidney tumor segmentation method using a deep learning network termed the Hyper Vision Net model. Most existing methods use a modified version of U-Net to segment the kidney tumor region. In the proposed architecture, we introduce supervision layers in the decoder part, which refine even minimal regions in the output. A dataset consisting of real arterial-phase abdominal CT scans of 300 patients (45,964 images) was provided by KiTS19 for training and validation of the proposed model. Compared with state-of-the-art segmentation methods, the results demonstrate the superiority of our approach, with training Dice scores of 0.9552 and 0.9633 for the tumor and kidney regions, respectively.
Tasks Semantic Segmentation
Published 2019-08-09
URL https://arxiv.org/abs/1908.03339v1
PDF https://arxiv.org/pdf/1908.03339v1.pdf
PWC https://paperswithcode.com/paper/hyper-vision-net-kidney-tumor-segmentation
Repo
Framework
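For reference, the Dice score reported above measures the overlap between a predicted and a ground-truth binary mask. A minimal implementation on flat 0/1 masks:

```python
def dice(pred, target):
    """Dice coefficient between two binary masks (flat 0/1 sequences):
    2 * |P ∩ T| / (|P| + |T|). Two empty masks count as a perfect match."""
    inter = sum(p * t for p, t in zip(pred, target))
    denom = sum(pred) + sum(target)
    return 2.0 * inter / denom if denom else 1.0
```

A score of 1.0 means perfect overlap, so the 0.9552 / 0.9633 figures indicate near-complete agreement with the annotated tumor and kidney regions on the training set.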

Animating Face using Disentangled Audio Representations

Title Animating Face using Disentangled Audio Representations
Authors Gaurav Mittal, Baoyuan Wang
Abstract All previous methods for audio-driven talking head generation assume the input audio to be clean with a neutral tone. As we show empirically, one can easily break these systems by simply adding certain background noise to the utterance or changing its emotional tone (e.g., to sad). To make talking head generation robust to such variations, we propose an explicit audio representation learning framework that disentangles audio sequences into various factors such as phonetic content, emotional tone, background noise, and others. We conduct experiments to validate that, conditioned on the disentangled content representation, the mouth movement generated by our model is significantly more accurate than that of previous approaches (without disentangled learning) in the presence of noise and emotional variations. We further demonstrate that our framework is compatible with current state-of-the-art approaches by replacing their original audio learning component with ours. To the best of our knowledge, this is the first work to improve the performance of talking head generation from the disentangled audio representation perspective, which is important for many real-world applications.
Tasks Representation Learning, Talking Head Generation
Published 2019-10-02
URL https://arxiv.org/abs/1910.00726v1
PDF https://arxiv.org/pdf/1910.00726v1.pdf
PWC https://paperswithcode.com/paper/animating-face-using-disentangled-audio
Repo
Framework

Human Languages in Source Code: Auto-Translation for Localized Instruction

Title Human Languages in Source Code: Auto-Translation for Localized Instruction
Authors Chris Piech, Sami Abu-El-Haija
Abstract Computer science education has promised open access around the world, but access is largely determined by what human language you speak. As younger students learn computer science, it is less appropriate to assume that they should learn English beforehand. To that end we present CodeInternational, the first tool to translate code between human languages. To develop a theory of non-English code, and inform our translation decisions, we conduct a study of public code repositories on GitHub. The study is, to the best of our knowledge, the first on human language in code, and covers 2.9 million Java repositories. To demonstrate CodeInternational's educational utility, we build an interactive version of the popular English-language Karel reader and translate it into 100 spoken languages. Our translations have already been used in classrooms around the world, and represent a first step in an important open CS-education problem.
Tasks
Published 2019-09-10
URL https://arxiv.org/abs/1909.04556v1
PDF https://arxiv.org/pdf/1909.04556v1.pdf
PWC https://paperswithcode.com/paper/human-languages-in-source-code-auto
Repo
Framework
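One ingredient of code translation between human languages is renaming identifiers via a dictionary while leaving the rest of the source intact. The sketch below is an illustrative fragment only, not CodeInternational's implementation (the real tool must also handle comments, string literals, and grammatical agreement); the example identifiers are hypothetical.

```python
import re

def translate_identifiers(source, mapping):
    """Rename identifiers in source code via a human-language dictionary.

    Only whole-word matches are replaced (\b anchors), so substrings
    inside longer identifiers are left alone.
    """
    pattern = re.compile(r'\b(' + '|'.join(map(re.escape, mapping)) + r')\b')
    return pattern.sub(lambda m: mapping[m.group(0)], source)
```

Note how `suma` is translated but the distinct identifier `sumaTotal` is preserved, which is exactly the behavior a naive string replace would get wrong.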

Real-time tracker with fast recovery from target loss

Title Real-time tracker with fast recovery from target loss
Authors Alessandro Bay, Panagiotis Sidiropoulos, Eduard Vazquez, Michele Sasdelli
Abstract In this paper, we introduce a variation of a state-of-the-art real-time tracker (CFNet), which adds to the original algorithm robustness to target loss without a significant computational overhead. The new method is based on the assumption that the feature map can be used to estimate the tracking confidence more accurately. When the confidence is low, we avoid updating the object's position through the feature map; instead, the tracker passes to a single-frame failure mode, during which the patch's low-level visual content is used to swiftly update the object's position, before recovering from the target loss in the next frame. The experimental evidence provided by evaluating the method on several tracking datasets validates both the theoretical assumption that the feature map is associated with tracking confidence, and that the proposed implementation can achieve target recovery in multiple scenarios, without compromising the real-time performance.
Tasks
Published 2019-02-12
URL http://arxiv.org/abs/1902.04570v1
PDF http://arxiv.org/pdf/1902.04570v1.pdf
PWC https://paperswithcode.com/paper/real-time-tracker-with-fast-recovery-from
Repo
Framework
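A common way to turn a correlation-filter response map into a tracking confidence, in the spirit of (but not identical to) the confidence test described above, is the peak-to-sidelobe ratio: a sharp, isolated peak means a trustworthy localization, while a flat or multi-modal map signals likely target loss. A minimal sketch:

```python
def peak_to_sidelobe_ratio(response, exclude=2):
    """Confidence of a correlation response map: peak height measured in
    sidelobe standard deviations, with a small window around the peak
    excluded from the sidelobe statistics.

    response: H x W list of floats. Higher PSR = sharper peak.
    """
    h, w = len(response), len(response[0])
    pr, pc = max(((r, c) for r in range(h) for c in range(w)),
                 key=lambda rc: response[rc[0]][rc[1]])
    peak = response[pr][pc]
    side = [response[r][c] for r in range(h) for c in range(w)
            if abs(r - pr) > exclude or abs(c - pc) > exclude]
    mean = sum(side) / len(side)
    var = sum((v - mean) ** 2 for v in side) / len(side)
    return (peak - mean) / (var ** 0.5 + 1e-12)
```

A tracker can threshold this value: above the threshold, trust the feature-map localization; below it, switch to a failure mode such as the single-frame recovery described in the abstract.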