Paper Group AWR 341
Learning to Solve NP-Complete Problems - A Graph Neural Network for Decision TSP. S-RL Toolbox: Environments, Datasets and Evaluation Metrics for State Representation Learning. Scalable Micro-planned Generation of Discourse from Structured Data. The Double Sphere Camera Model. Hyperparameters and Tuning Strategies for Random Forest. SqueezeSegV2: Improved Model Structure and Unsupervised Domain Adaptation for Road-Object Segmentation from a LiDAR Point Cloud. 3D Human Pose Estimation with 2D Marginal Heatmaps. Multimodal One-Shot Learning of Speech and Images. MaskFusion: Real-Time Recognition, Tracking and Reconstruction of Multiple Moving Objects. Classification from Positive, Unlabeled and Biased Negative Data. Dist-GAN: An Improved GAN using Distance Constraints. Improved training of end-to-end attention models for speech recognition. Online Temporal Calibration for Monocular Visual-Inertial Systems. Autonomous Driving in Reality with Reinforcement Learning and Image Translation. BanditSum: Extractive Summarization as a Contextual Bandit.
Learning to Solve NP-Complete Problems - A Graph Neural Network for Decision TSP
Title | Learning to Solve NP-Complete Problems - A Graph Neural Network for Decision TSP |
Authors | Marcelo O. R. Prates, Pedro H. C. Avelar, Henrique Lemos, Luis Lamb, Moshe Vardi |
Abstract | Graph Neural Networks (GNN) are a promising technique for bridging differential programming and combinatorial domains. GNNs employ trainable modules which can be assembled in different configurations that reflect the relational structure of each problem instance. In this paper, we show that GNNs can learn to solve, with very little supervision, the decision variant of the Traveling Salesperson Problem (TSP), a highly relevant $\mathcal{NP}$-Complete problem. Our model is trained to function as an effective message-passing algorithm in which edges (embedded with their weights) communicate with vertices for a number of iterations after which the model is asked to decide whether a route with cost $<C$ exists. We show that such a network can be trained with sets of dual examples: given the optimal tour cost $C^{*}$, we produce one decision instance with target cost $x\%$ smaller and one with target cost $x\%$ larger than $C^{*}$. We were able to obtain $80\%$ accuracy training with $-2\%,+2\%$ deviations, and the same trained model can generalize for more relaxed deviations with increasing performance. We also show that the model is capable of generalizing for larger problem sizes. Finally, we provide a method for predicting the optimal route cost within $2\%$ deviation from the ground truth. In summary, our work shows that Graph Neural Networks are powerful enough to solve $\mathcal{NP}$-Complete problems which combine symbolic and numeric data. |
Tasks | |
Published | 2018-09-08 |
URL | http://arxiv.org/abs/1809.02721v3 |
http://arxiv.org/pdf/1809.02721v3.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-solve-np-complete-problems-a |
Repo | https://github.com/machine-reasoning-ufrgs/TSP-GNN |
Framework | tf |
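The dual-example construction described in the abstract (one decision instance with target cost slightly below the optimal tour cost $C^{*}$ and one slightly above) is easy to reproduce. The sketch below is a minimal Python illustration, assuming the optimal tour cost has already been computed by an exact solver; the function and field names are ours, not the authors'.

```python
def make_decision_pair(edge_weights, optimal_cost, deviation=0.02):
    """Build one negative and one positive decision-TSP instance from a
    solved instance, following the +/- x% scheme described in the abstract.

    edge_weights : dict mapping (i, j) -> edge weight of the graph
    optimal_cost : cost C* of the optimal tour, e.g. from an exact solver
    deviation    : fraction x, e.g. 0.02 for the -2% / +2% setting
    """
    # Target cost below C*: no tour of that cost exists -> label 0 (UNSAT).
    negative = {"edges": edge_weights,
                "target_cost": (1.0 - deviation) * optimal_cost,
                "label": 0}
    # Target cost above C*: the optimal tour already satisfies it -> label 1 (SAT).
    positive = {"edges": edge_weights,
                "target_cost": (1.0 + deviation) * optimal_cost,
                "label": 1}
    return negative, positive
```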
S-RL Toolbox: Environments, Datasets and Evaluation Metrics for State Representation Learning
Title | S-RL Toolbox: Environments, Datasets and Evaluation Metrics for State Representation Learning |
Authors | Antonin Raffin, Ashley Hill, René Traoré, Timothée Lesort, Natalia Díaz-Rodríguez, David Filliat |
Abstract | State representation learning aims at learning compact representations from raw observations in robotics and control applications. Approaches used for this objective are auto-encoders, learning forward models, inverse dynamics or learning using generic priors on the state characteristics. However, the diversity in applications and methods makes the field lack standard evaluation datasets, metrics and tasks. This paper provides a set of environments, data generators, robotic control tasks, metrics and tools to facilitate iterative state representation learning and evaluation in reinforcement learning settings. |
Tasks | Representation Learning |
Published | 2018-09-25 |
URL | http://arxiv.org/abs/1809.09369v2 |
http://arxiv.org/pdf/1809.09369v2.pdf | |
PWC | https://paperswithcode.com/paper/s-rl-toolbox-environments-datasets-and |
Repo | https://github.com/araffin/robotics-rl-srl |
Framework | none |
Scalable Micro-planned Generation of Discourse from Structured Data
Title | Scalable Micro-planned Generation of Discourse from Structured Data |
Authors | Anirban Laha, Parag Jain, Abhijit Mishra, Karthik Sankaranarayanan |
Abstract | We present a framework for generating natural language description from structured data such as tables; the problem comes under the category of data-to-text natural language generation (NLG). Modern data-to-text NLG systems typically employ end-to-end statistical and neural architectures that learn from a limited amount of task-specific labeled data, and therefore, exhibit limited scalability, domain-adaptability, and interpretability. Unlike these systems, ours is a modular, pipeline-based approach, and does not require task-specific parallel data. It rather relies on monolingual corpora and basic off-the-shelf NLP tools. This makes our system more scalable and easily adaptable to newer domains. Our system employs a 3-staged pipeline that: (i) converts entries in the structured data to canonical form, (ii) generates simple sentences for each atomic entry in the canonicalized representation, and (iii) combines the sentences to produce a coherent, fluent and adequate paragraph description through sentence compounding and co-reference replacement modules. Experiments on a benchmark mixed-domain dataset curated for paragraph description from tables reveal the superiority of our system over existing data-to-text approaches. We also demonstrate the robustness of our system in accepting other popular datasets covering diverse data types such as Knowledge Graphs and Key-Value maps. |
Tasks | Knowledge Graphs, Text Generation |
Published | 2018-10-05 |
URL | https://arxiv.org/abs/1810.02889v3 |
https://arxiv.org/pdf/1810.02889v3.pdf | |
PWC | https://paperswithcode.com/paper/scalable-micro-planned-generation-of |
Repo | https://github.com/parajain/structscribe |
Framework | pytorch |
The Double Sphere Camera Model
Title | The Double Sphere Camera Model |
Authors | Vladyslav Usenko, Nikolaus Demmel, Daniel Cremers |
Abstract | Vision-based motion estimation and 3D reconstruction, which have numerous applications (e.g., autonomous driving, navigation systems for airborne devices and augmented reality) are receiving significant research attention. To increase the accuracy and robustness, several researchers have recently demonstrated the benefit of using large field-of-view cameras for such applications. In this paper, we provide an extensive review of existing models for large field-of-view cameras. For each model we provide projection and unprojection functions and the subspace of points that result in valid projection. Then, we propose the Double Sphere camera model that well fits with large field-of-view lenses, is computationally inexpensive and has a closed-form inverse. We evaluate the model using a calibration dataset with several different lenses and compare the models using the metrics that are relevant for Visual Odometry, i.e., reprojection error, as well as computation time for projection and unprojection functions and their Jacobians. We also provide qualitative results and discuss the performance of all models. |
Tasks | 3D Reconstruction, Autonomous Driving, Calibration, Motion Estimation, Visual Odometry |
Published | 2018-07-24 |
URL | http://arxiv.org/abs/1807.08957v2 |
http://arxiv.org/pdf/1807.08957v2.pdf | |
PWC | https://paperswithcode.com/paper/the-double-sphere-camera-model |
Repo | https://github.com/VladyslavUsenko/basalt-mirror |
Framework | none |
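The double sphere projection itself is compact enough to write down directly. The sketch below follows our reading of the projection function described in the paper (intrinsics fx, fy, cx, cy plus the two model parameters xi and alpha); it omits the validity-region check and is meant as an illustration, not a substitute for the authors' calibrated implementation.

```python
import numpy as np

def double_sphere_project(point, fx, fy, cx, cy, xi, alpha):
    """Project a 3D point with the Double Sphere camera model.

    point : array-like (x, y, z) in the camera frame
    xi    : offset between the two unit spheres
    alpha : blending between the second sphere and the image plane
    """
    x, y, z = point
    d1 = np.sqrt(x * x + y * y + z * z)      # distance to the first sphere centre
    zs = xi * d1 + z                         # z shifted by the first sphere offset
    d2 = np.sqrt(x * x + y * y + zs * zs)    # distance to the second sphere centre
    denom = alpha * d2 + (1.0 - alpha) * zs  # generalised projection denominator
    u = fx * x / denom + cx
    v = fy * y / denom + cy
    return np.array([u, v])
```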
Hyperparameters and Tuning Strategies for Random Forest
Title | Hyperparameters and Tuning Strategies for Random Forest |
Authors | Philipp Probst, Marvin Wright, Anne-Laure Boulesteix |
Abstract | The random forest algorithm (RF) has several hyperparameters that have to be set by the user, e.g., the number of observations drawn randomly for each tree and whether they are drawn with or without replacement, the number of variables drawn randomly for each split, the splitting rule, the minimum number of samples that a node must contain and the number of trees. In this paper, we first provide a literature review on the parameters’ influence on the prediction performance and on variable importance measures. It is well known that in most cases RF works reasonably well with the default values of the hyperparameters specified in software packages. Nevertheless, tuning the hyperparameters can improve the performance of RF. In the second part of this paper, after a brief overview of tuning strategies we demonstrate the application of one of the most established tuning strategies, model-based optimization (MBO). To make it easier to use, we provide the tuneRanger R package that tunes RF with MBO automatically. In a benchmark study on several datasets, we compare the prediction performance and runtime of tuneRanger with other tuning implementations in R and RF with default hyperparameters. |
Tasks | |
Published | 2018-04-10 |
URL | http://arxiv.org/abs/1804.03515v2 |
http://arxiv.org/pdf/1804.03515v2.pdf | |
PWC | https://paperswithcode.com/paper/hyperparameters-and-tuning-strategies-for |
Repo | https://github.com/PhilippPro/tuneRanger |
Framework | none |
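The tuneRanger package mentioned in the abstract is an R tool built on model-based optimization. As a rough Python analogue of the same idea (searching over the hyperparameters the abstract lists: number of trees, variables per split, minimum node size, and sampling with or without replacement), one could use scikit-learn's random forest with a randomized search. This is only an illustrative sketch; the randomized search stands in for MBO and is not the authors' procedure.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True)

# Hyperparameters discussed in the paper: trees, mtry, node size, sampling.
param_distributions = {
    "n_estimators": [200, 500, 1000],        # number of trees
    "max_features": ["sqrt", "log2", 0.5],   # variables drawn per split (mtry)
    "min_samples_leaf": [1, 5, 10, 20],      # minimum node size
    "bootstrap": [True, False],              # sampling with or without replacement
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions,
    n_iter=20,
    cv=5,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```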
SqueezeSegV2: Improved Model Structure and Unsupervised Domain Adaptation for Road-Object Segmentation from a LiDAR Point Cloud
Title | SqueezeSegV2: Improved Model Structure and Unsupervised Domain Adaptation for Road-Object Segmentation from a LiDAR Point Cloud |
Authors | Bichen Wu, Xuanyu Zhou, Sicheng Zhao, Xiangyu Yue, Kurt Keutzer |
Abstract | Earlier work demonstrates the promise of deep-learning-based approaches for point cloud segmentation; however, these approaches need to be improved to be practically useful. To this end, we introduce a new model SqueezeSegV2 that is more robust to dropout noise in LiDAR point clouds. With improved model structure, training loss, batch normalization and additional input channel, SqueezeSegV2 achieves significant accuracy improvement when trained on real data. Training models for point cloud segmentation requires large amounts of labeled point-cloud data, which is expensive to obtain. To sidestep the cost of collection and annotation, simulators such as GTA-V can be used to create unlimited amounts of labeled, synthetic data. However, due to domain shift, models trained on synthetic data often do not generalize well to the real world. We address this problem with a domain-adaptation training pipeline consisting of three major components: 1) learned intensity rendering, 2) geodesic correlation alignment, and 3) progressive domain calibration. When trained on real data, our new model exhibits segmentation accuracy improvements of 6.0-8.6% over the original SqueezeSeg. When training our new model on synthetic data using the proposed domain adaptation pipeline, we nearly double test accuracy on real-world data, from 29.0% to 57.4%. Our source code and synthetic dataset will be open-sourced. |
Tasks | Calibration, Domain Adaptation, Semantic Segmentation, Unsupervised Domain Adaptation |
Published | 2018-09-22 |
URL | http://arxiv.org/abs/1809.08495v1 |
http://arxiv.org/pdf/1809.08495v1.pdf | |
PWC | https://paperswithcode.com/paper/squeezesegv2-improved-model-structure-and |
Repo | https://github.com/xuanyuzhou98/SqueezeSegV2 |
Framework | tf |
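Of the three domain-adaptation components listed in the abstract, geodesic correlation alignment is the one most easily reduced to a formula: it aligns the second-order statistics of source (synthetic) and target (real) feature batches. The sketch below shows the plain Euclidean correlation-alignment (CORAL) loss as a simplified stand-in; the paper uses a geodesic variant, so treat this only as an approximation of the idea.

```python
import torch

def coral_loss(source_feats, target_feats):
    """Simplified (Euclidean) correlation alignment between two feature batches.

    source_feats, target_feats : tensors of shape (n, d)
    Penalises the difference between the two feature covariance matrices.
    """
    def covariance(f):
        f = f - f.mean(dim=0, keepdim=True)
        return f.t() @ f / (f.shape[0] - 1)

    d = source_feats.shape[1]
    diff = covariance(source_feats) - covariance(target_feats)
    return (diff ** 2).sum() / (4 * d * d)
```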
3D Human Pose Estimation with 2D Marginal Heatmaps
Title | 3D Human Pose Estimation with 2D Marginal Heatmaps |
Authors | Aiden Nibali, Zhen He, Stuart Morgan, Luke Prendergast |
Abstract | Automatically determining three-dimensional human pose from monocular RGB image data is a challenging problem. The two-dimensional nature of the input results in intrinsic ambiguities which make inferring depth particularly difficult. Recently, researchers have demonstrated that the flexible statistical modelling capabilities of deep neural networks are sufficient to make such inferences with reasonable accuracy. However, many of these models use coordinate output techniques which are memory-intensive, not differentiable, and/or do not spatially generalise well. We propose improvements to 3D coordinate prediction which avoid the aforementioned undesirable traits by predicting 2D marginal heatmaps under an augmented soft-argmax scheme. Our resulting model, MargiPose, produces visually coherent heatmaps whilst maintaining differentiability. We are also able to achieve state-of-the-art accuracy on publicly available 3D human pose estimation data. |
Tasks | 3D Human Pose Estimation, Pose Estimation |
Published | 2018-06-05 |
URL | http://arxiv.org/abs/1806.01484v2 |
http://arxiv.org/pdf/1806.01484v2.pdf | |
PWC | https://paperswithcode.com/paper/3d-human-pose-estimation-with-2d-marginal |
Repo | https://github.com/anibali/margipose |
Framework | pytorch |
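The key ingredient the abstract describes, predicting coordinates through a differentiable soft-argmax over heatmaps rather than a hard argmax, can be sketched compactly. Below is a minimal PyTorch version of a 2D soft-argmax; it illustrates the general mechanism and is our simplification, not the exact augmented scheme or the marginal-heatmap decomposition used in MargiPose.

```python
import torch
import torch.nn.functional as F

def soft_argmax_2d(heatmaps):
    """Differentiable expected (x, y) coordinates from unnormalised heatmaps.

    heatmaps : tensor of shape (batch, joints, H, W)
    returns  : tensor of shape (batch, joints, 2) with coordinates in [0, 1]
    """
    b, j, h, w = heatmaps.shape
    probs = F.softmax(heatmaps.view(b, j, -1), dim=-1).view(b, j, h, w)
    ys = torch.linspace(0.0, 1.0, h, device=heatmaps.device)
    xs = torch.linspace(0.0, 1.0, w, device=heatmaps.device)
    # Expected coordinate = sum over the grid of probability * coordinate.
    exp_x = (probs.sum(dim=2) * xs).sum(dim=-1)  # marginalise over rows, expect over x
    exp_y = (probs.sum(dim=3) * ys).sum(dim=-1)  # marginalise over columns, expect over y
    return torch.stack([exp_x, exp_y], dim=-1)
```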
Multimodal One-Shot Learning of Speech and Images
Title | Multimodal One-Shot Learning of Speech and Images |
Authors | Ryan Eloff, Herman A. Engelbrecht, Herman Kamper |
Abstract | Imagine a robot is shown new concepts visually together with spoken tags, e.g. “milk”, “eggs”, “butter”. After seeing one paired audio-visual example per class, it is shown a new set of unseen instances of these objects, and asked to pick the “milk”. Without receiving any hard labels, could it learn to match the new continuous speech input to the correct visual instance? Although unimodal one-shot learning has been studied, where one labelled example in a single modality is given per class, this example motivates multimodal one-shot learning. Our main contribution is to formally define this task, and to propose several baseline and advanced models. We use a dataset of paired spoken and visual digits to specifically investigate recent advances in Siamese convolutional neural networks. Our best Siamese model achieves twice the accuracy of a nearest neighbour model using pixel-distance over images and dynamic time warping over speech in 11-way cross-modal matching. |
Tasks | One-Shot Learning |
Published | 2018-11-09 |
URL | http://arxiv.org/abs/1811.03875v2 |
http://arxiv.org/pdf/1811.03875v2.pdf | |
PWC | https://paperswithcode.com/paper/multimodal-one-shot-learning-of-speech-and |
Repo | https://github.com/rpeloff/multimodal_one_shot_learning |
Framework | tf |
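The 11-way cross-modal matching task described in the abstract can be carried out by routing a spoken query through the one-shot support set: find the closest support speech example, take its paired image, then find the closest test image. The sketch below assumes precomputed, comparable vectors for each modality (pixel features, DTW-aligned speech features, or Siamese embeddings); it is one natural pipeline consistent with the task description, not necessarily the authors' exact procedure.

```python
import numpy as np

def one_shot_cross_modal_match(query_speech, support_speech, support_images, test_images):
    """Match a spoken query to a visual test instance via the one-shot support set.

    query_speech   : (d_s,) representation of the spoken query, e.g. "milk"
    support_speech : (n_classes, d_s) one spoken example per class
    support_images : (n_classes, d_i) the paired image representation per class
    test_images    : (n_test, d_i) representations of the unseen test instances
    """
    # 1) Closest support speech example to the query (within-modality comparison).
    cls = np.argmin(np.linalg.norm(support_speech - query_speech, axis=1))
    # 2) Its paired support image serves as the visual anchor for that class.
    anchor = support_images[cls]
    # 3) Return the index of the test image closest to the anchor.
    return int(np.argmin(np.linalg.norm(test_images - anchor, axis=1)))
```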
MaskFusion: Real-Time Recognition, Tracking and Reconstruction of Multiple Moving Objects
Title | MaskFusion: Real-Time Recognition, Tracking and Reconstruction of Multiple Moving Objects |
Authors | Martin Rünz, Maud Buffier, Lourdes Agapito |
Abstract | We present MaskFusion, a real-time, object-aware, semantic and dynamic RGB-D SLAM system that goes beyond traditional systems which output a purely geometric map of a static scene. MaskFusion recognizes, segments and assigns semantic class labels to different objects in the scene, while tracking and reconstructing them even when they move independently from the camera. As an RGB-D camera scans a cluttered scene, image-based instance-level semantic segmentation creates semantic object masks that enable real-time object recognition and the creation of an object-level representation for the world map. Unlike previous recognition-based SLAM systems, MaskFusion does not require known models of the objects it can recognize, and can deal with multiple independent motions. MaskFusion takes full advantage of using instance-level semantic segmentation to enable semantic labels to be fused into an object-aware map, unlike recent semantics enabled SLAM systems that perform voxel-level semantic segmentation. We show augmented-reality applications that demonstrate the unique features of the map output by MaskFusion: instance-aware, semantic and dynamic. |
Tasks | Object Recognition, Semantic Segmentation |
Published | 2018-04-24 |
URL | http://arxiv.org/abs/1804.09194v2 |
http://arxiv.org/pdf/1804.09194v2.pdf | |
PWC | https://paperswithcode.com/paper/maskfusion-real-time-recognition-tracking-and |
Repo | https://github.com/martinruenz/maskfusion |
Framework | tf |
Classification from Positive, Unlabeled and Biased Negative Data
Title | Classification from Positive, Unlabeled and Biased Negative Data |
Authors | Yu-Guan Hsieh, Gang Niu, Masashi Sugiyama |
Abstract | In binary classification, there are situations where negative (N) data are too diverse to be fully labeled and we often resort to positive-unlabeled (PU) learning in these scenarios. However, collecting a non-representative N set that contains only a small portion of all possible N data can often be much easier in practice. This paper studies a novel classification framework which incorporates such biased N (bN) data in PU learning. We provide a method based on empirical risk minimization to address this PUbN classification problem. Our approach can be regarded as a novel example-weighting algorithm, with the weight of each example computed through a preliminary step that draws inspiration from PU learning. We also derive an estimation error bound for the proposed method. Experimental results demonstrate the effectiveness of our algorithm in not only PUbN learning scenarios but also ordinary PU learning scenarios on several benchmark datasets. |
Tasks | |
Published | 2018-10-01 |
URL | https://arxiv.org/abs/1810.00846v2 |
https://arxiv.org/pdf/1810.00846v2.pdf | |
PWC | https://paperswithcode.com/paper/classification-from-positive-unlabeled-and |
Repo | https://github.com/ZaydH/covariate_shift_risk_estimation |
Framework | pytorch |
Dist-GAN: An Improved GAN using Distance Constraints
Title | Dist-GAN: An Improved GAN using Distance Constraints |
Authors | Ngoc-Trung Tran, Tuan-Anh Bui, Ngai-Man Cheung |
Abstract | We introduce effective training algorithms for Generative Adversarial Networks (GAN) to alleviate mode collapse and gradient vanishing. In our system, we constrain the generator by an Autoencoder (AE). We propose a formulation to consider the reconstructed samples from AE as “real” samples for the discriminator. This couples the convergence of the AE with that of the discriminator, effectively slowing down the convergence of discriminator and reducing gradient vanishing. Importantly, we propose two novel distance constraints to improve the generator. First, we propose a latent-data distance constraint to enforce compatibility between the latent sample distances and the corresponding data sample distances. We use this constraint to explicitly prevent the generator from mode collapse. Second, we propose a discriminator-score distance constraint to align the distribution of the generated samples with that of the real samples through the discriminator score. We use this constraint to guide the generator to synthesize samples that resemble the real ones. Our proposed GAN using these distance constraints, namely Dist-GAN, can achieve better results than state-of-the-art methods across benchmark datasets: synthetic, MNIST, MNIST-1K, CelebA, CIFAR-10 and STL-10 datasets. Our code is published here (https://github.com/tntrung/gan) for research. |
Tasks | Image Generation |
Published | 2018-03-23 |
URL | http://arxiv.org/abs/1803.08887v3 |
http://arxiv.org/pdf/1803.08887v3.pdf | |
PWC | https://paperswithcode.com/paper/dist-gan-an-improved-gan-using-distance |
Repo | https://github.com/tntrung/gan |
Framework | tf |
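The first of the two constraints in the abstract, the latent-data distance constraint, enforces compatibility between distances among latent codes and distances among the corresponding generated samples, so that many latent codes cannot collapse onto the same output. The sketch below is a generic distance-matching penalty in that spirit; the exact formulation in the paper differs, so this is an assumption-laden illustration only.

```python
import torch

def latent_data_distance_penalty(z, x_fake, eps=1e-8):
    """Simplified latent-data distance penalty.

    z      : latent codes, shape (batch, latent_dim)
    x_fake : generator outputs for z, flattened to shape (batch, data_dim)
    Encourages pairwise distances between generated samples to track pairwise
    distances between their latent codes (a guard against mode collapse).
    """
    dz = torch.cdist(z, z) + eps              # pairwise latent distances
    dx = torch.cdist(x_fake, x_fake) + eps    # pairwise data distances
    # Match the two distance matrices up to their mean scale.
    return ((dx / dx.mean() - dz / dz.mean()) ** 2).mean()
```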
Improved training of end-to-end attention models for speech recognition
Title | Improved training of end-to-end attention models for speech recognition |
Authors | Albert Zeyer, Kazuki Irie, Ralf Schlüter, Hermann Ney |
Abstract | Sequence-to-sequence attention-based models on subword units allow simple open-vocabulary end-to-end speech recognition. In this work, we show that such models can achieve competitive results on the Switchboard 300h and LibriSpeech 1000h tasks. In particular, we report the state-of-the-art word error rates (WER) of 3.54% on the dev-clean and 3.82% on the test-clean evaluation subsets of LibriSpeech. We introduce a new pretraining scheme by starting with a high time reduction factor and lowering it during training, which is crucial both for convergence and final performance. In some experiments, we also use an auxiliary CTC loss function to help the convergence. In addition, we train long short-term memory (LSTM) language models on subword units. By shallow fusion, we report up to 27% relative improvements in WER over the attention baseline without a language model. |
Tasks | End-To-End Speech Recognition, Language Modelling, Speech Recognition |
Published | 2018-05-08 |
URL | http://arxiv.org/abs/1805.03294v1 |
http://arxiv.org/pdf/1805.03294v1.pdf | |
PWC | https://paperswithcode.com/paper/improved-training-of-end-to-end-attention |
Repo | https://github.com/pvsimoes/our_espnet |
Framework | pytorch |
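Shallow fusion, which the abstract credits with up to 27% relative WER improvement, simply adds a weighted language-model log-probability to the attention decoder's log-probability at each decoding step. The minimal sketch below shows one such step; the lm_weight value is illustrative, not the weight used in the paper.

```python
import torch

def shallow_fusion_step(am_log_probs, lm_log_probs, lm_weight=0.3):
    """One decoding step with shallow fusion of an attention model and an LM.

    am_log_probs : (vocab,) log-probabilities from the attention decoder
    lm_log_probs : (vocab,) log-probabilities from the subword LSTM LM
    lm_weight    : scalar interpolation weight for the LM
    Returns the fused scores and the greedily chosen next subword id.
    """
    fused = am_log_probs + lm_weight * lm_log_probs
    return fused, int(torch.argmax(fused))
```

In beam-search decoding, the same fused score would be accumulated along each hypothesis instead of taking a single greedy argmax.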
Online Temporal Calibration for Monocular Visual-Inertial Systems
Title | Online Temporal Calibration for Monocular Visual-Inertial Systems |
Authors | Tong Qin, Shaojie Shen |
Abstract | Accurate state estimation is a fundamental module for various intelligent applications, such as robot navigation, autonomous driving, virtual and augmented reality. Visual and inertial fusion is a popular technology for 6-DOF state estimation in recent years. Time instants at which different sensors’ measurements are recorded are of crucial importance to the system’s robustness and accuracy. In practice, timestamps of each sensor typically suffer from triggering and transmission delays, leading to temporal misalignment (time offsets) among different sensors. Such temporal offset dramatically influences the performance of sensor fusion. To this end, we propose an online approach for calibrating temporal offset between visual and inertial measurements. Our approach achieves temporal offset calibration by jointly optimizing time offset, camera and IMU states, as well as feature locations in a SLAM system. Furthermore, the approach is a general model, which can be easily employed in several feature-based optimization frameworks. Simulation and experimental results demonstrate the high accuracy of our calibration approach even compared with other state-of-the-art offline tools. The VIO comparison against other methods proves that the online temporal calibration significantly benefits visual-inertial systems. The source code of temporal calibration is integrated into our public project, VINS-Mono. |
Tasks | Autonomous Driving, Calibration, Robot Navigation, Sensor Fusion, Time Offset Calibration |
Published | 2018-08-02 |
URL | http://arxiv.org/abs/1808.00692v1 |
http://arxiv.org/pdf/1808.00692v1.pdf | |
PWC | https://paperswithcode.com/paper/online-temporal-calibration-for-monocular |
Repo | https://github.com/HKUST-Aerial-Robotics/VINS-Mono |
Framework | tf |
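The core trick for making the time offset observable is to model the feature observation as moving along its image-plane velocity by the unknown offset, so the offset becomes just another variable in the reprojection residual. The sketch below illustrates that residual under our reading of the abstract; variable names are ours, and the actual optimization in VINS-Mono is considerably more involved.

```python
import numpy as np

def reprojection_residual_with_offset(predicted_uv, observed_uv, feature_velocity, t_d):
    """Reprojection residual with an online-estimated camera-IMU time offset t_d.

    predicted_uv     : (2,) projection of the landmark using the IMU-time pose
    observed_uv      : (2,) measured feature location in the image
    feature_velocity : (2,) feature velocity on the image plane (pixels / second)
    t_d              : scalar time offset between camera and IMU (seconds)
    """
    # Shift the measurement along its image-plane velocity by the time offset,
    # so t_d can be optimized jointly with camera/IMU states and feature depths.
    compensated = np.asarray(observed_uv) + t_d * np.asarray(feature_velocity)
    return np.asarray(predicted_uv) - compensated
```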
Autonomous Driving in Reality with Reinforcement Learning and Image Translation
Title | Autonomous Driving in Reality with Reinforcement Learning and Image Translation |
Authors | Nayun Xu, Bowen Tan, Bingyu Kong |
Abstract | Supervised learning is widely used to train autonomous driving vehicles, but it requires large amounts of labeled data. Reinforcement learning can be trained without abundant labeled data, yet we cannot train it directly in the real world because doing so would involve many unpredictable accidents. Nevertheless, training an agent to perform well in a virtual environment is comparatively easy, so the key challenge is bridging the gap between the virtual and the real. In this paper, we propose a novel framework that combines reinforcement learning with an image semantic segmentation network to make the whole model adaptable to reality. The agent is trained in TORCS, a car racing simulator. |
Tasks | Autonomous Driving, Car Racing, Semantic Segmentation |
Published | 2018-01-13 |
URL | http://arxiv.org/abs/1801.05299v2 |
http://arxiv.org/pdf/1801.05299v2.pdf | |
PWC | https://paperswithcode.com/paper/autonomous-driving-in-reality-with |
Repo | https://github.com/SullyChen/Autopilot-TensorFlow |
Framework | tf |
BanditSum: Extractive Summarization as a Contextual Bandit
Title | BanditSum: Extractive Summarization as a Contextual Bandit |
Authors | Yue Dong, Yikang Shen, Eric Crawford, Herke van Hoof, Jackie Chi Kit Cheung |
Abstract | In this work, we propose a novel method for training neural networks to perform single-document extractive summarization without heuristically-generated extractive labels. We call our approach BanditSum as it treats extractive summarization as a contextual bandit (CB) problem, where the model receives a document to summarize (the context), and chooses a sequence of sentences to include in the summary (the action). A policy gradient reinforcement learning algorithm is used to train the model to select sequences of sentences that maximize ROUGE score. We perform a series of experiments demonstrating that BanditSum is able to achieve ROUGE scores that are better than or comparable to the state-of-the-art for extractive summarization, and converges using significantly fewer update steps than competing approaches. In addition, we show empirically that BanditSum performs significantly better than competing approaches when good summary sentences appear late in the source document. |
Tasks | |
Published | 2018-09-25 |
URL | https://arxiv.org/abs/1809.09672v3 |
https://arxiv.org/pdf/1809.09672v3.pdf | |
PWC | https://paperswithcode.com/paper/banditsum-extractive-summarization-as-a |
Repo | https://github.com/yuedongP/BanditSum |
Framework | pytorch |
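The contextual-bandit formulation in the abstract reduces, in its simplest form, to sampling a set of sentence indices from the model's affinity scores, scoring the resulting summary with ROUGE, and applying a policy-gradient (REINFORCE) update with a baseline. The sketch below illustrates that loop with a placeholder reward function; it is a simplified illustration of the training objective, not the authors' implementation.

```python
import torch

def bandit_policy_loss(sentence_scores, reward_fn, n_select=3, n_samples=4):
    """REINFORCE-style loss for extractive summarization as a contextual bandit.

    sentence_scores : tensor (num_sentences,) of unnormalised affinities
    reward_fn       : callable mapping a list of sentence indices to a scalar
                      reward (e.g. ROUGE against the reference summary)
    """
    probs = torch.softmax(sentence_scores, dim=0)
    losses, rewards = [], []
    for _ in range(n_samples):
        # Sample which sentences to include in the summary (the "action").
        idx = torch.multinomial(probs, n_select, replacement=False)
        log_prob = torch.log(probs[idx] + 1e-8).sum()
        losses.append(-log_prob)
        rewards.append(reward_fn(idx.tolist()))
    rewards = torch.tensor(rewards)
    baseline = rewards.mean()  # simple variance-reduction baseline
    # Weight each negative log-probability by its advantage.
    return torch.stack(losses).mul(rewards - baseline).mean()
```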