October 20, 2019

3127 words 15 mins read

Paper Group AWR 341

Learning to Solve NP-Complete Problems - A Graph Neural Network for Decision TSP. S-RL Toolbox: Environments, Datasets and Evaluation Metrics for State Representation Learning. Scalable Micro-planned Generation of Discourse from Structured Data. The Double Sphere Camera Model. Hyperparameters and Tuning Strategies for Random Forest. SqueezeSegV2: I …

Learning to Solve NP-Complete Problems - A Graph Neural Network for Decision TSP

Title Learning to Solve NP-Complete Problems - A Graph Neural Network for Decision TSP
Authors Marcelo O. R. Prates, Pedro H. C. Avelar, Henrique Lemos, Luis Lamb, Moshe Vardi
Abstract Graph Neural Networks (GNN) are a promising technique for bridging differential programming and combinatorial domains. GNNs employ trainable modules which can be assembled in different configurations that reflect the relational structure of each problem instance. In this paper, we show that GNNs can learn to solve, with very little supervision, the decision variant of the Traveling Salesperson Problem (TSP), a highly relevant $\mathcal{NP}$-Complete problem. Our model is trained to function as an effective message-passing algorithm in which edges (embedded with their weights) communicate with vertices for a number of iterations after which the model is asked to decide whether a route with cost $<C$ exists. We show that such a network can be trained with sets of dual examples: given the optimal tour cost $C^{*}$, we produce one decision instance with target cost $x\%$ smaller and one with target cost $x\%$ larger than $C^{*}$. We were able to obtain $80\%$ accuracy training with $-2\%,+2\%$ deviations, and the same trained model can generalize for more relaxed deviations with increasing performance. We also show that the model is capable of generalizing for larger problem sizes. Finally, we provide a method for predicting the optimal route cost within $2\%$ deviation from the ground truth. In summary, our work shows that Graph Neural Networks are powerful enough to solve $\mathcal{NP}$-Complete problems which combine symbolic and numeric data.
Tasks
Published 2018-09-08
URL http://arxiv.org/abs/1809.02721v3
PDF http://arxiv.org/pdf/1809.02721v3.pdf
PWC https://paperswithcode.com/paper/learning-to-solve-np-complete-problems-a
Repo https://github.com/machine-reasoning-ufrgs/TSP-GNN
Framework tf
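
The dual-example construction described in the abstract is simple to reproduce. The sketch below is a minimal illustration (not the authors' code): given a graph and its optimal tour cost $C^{*}$ from an exact solver, it emits one satisfiable and one unsatisfiable decision instance at a ±2% deviation. The `edges` format and function name are assumptions made for illustration.

```python
def dual_decision_instances(edges, optimal_cost, deviation=0.02):
    """Build a pair of decision-TSP training examples from one solved graph.

    edges: list of (u, v, weight) tuples describing the instance.
    optimal_cost: cost C* of the optimal tour, obtained from an exact solver.
    Returns two (edges, target_cost, label) triples: a YES instance whose
    cost budget sits above C*, and a NO instance whose budget sits below it.
    """
    yes_instance = (edges, optimal_cost * (1 + deviation), 1)  # a route under this budget exists
    no_instance = (edges, optimal_cost * (1 - deviation), 0)   # no route fits this tighter budget
    return yes_instance, no_instance
```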

S-RL Toolbox: Environments, Datasets and Evaluation Metrics for State Representation Learning

Title S-RL Toolbox: Environments, Datasets and Evaluation Metrics for State Representation Learning
Authors Antonin Raffin, Ashley Hill, René Traoré, Timothée Lesort, Natalia Díaz-Rodríguez, David Filliat
Abstract State representation learning aims at learning compact representations from raw observations in robotics and control applications. Approaches used for this objective are auto-encoders, learning forward models, inverse dynamics or learning using generic priors on the state characteristics. However, the diversity in applications and methods makes the field lack standard evaluation datasets, metrics and tasks. This paper provides a set of environments, data generators, robotic control tasks, metrics and tools to facilitate iterative state representation learning and evaluation in reinforcement learning settings.
Tasks Representation Learning
Published 2018-09-25
URL http://arxiv.org/abs/1809.09369v2
PDF http://arxiv.org/pdf/1809.09369v2.pdf
PWC https://paperswithcode.com/paper/s-rl-toolbox-environments-datasets-and
Repo https://github.com/araffin/robotics-rl-srl
Framework none
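
To make the objectives listed in the abstract concrete, here is a minimal PyTorch sketch of one of them: an inverse-dynamics state representation learner that encodes two consecutive observations and predicts the action taken between them. Layer sizes and class names are illustrative assumptions, not the toolbox's API.

```python
import torch
import torch.nn as nn

class InverseDynamicsSRL(nn.Module):
    """Learn a compact state by predicting the (discrete) action that links
    two consecutive observations, one of the SRL objectives surveyed."""
    def __init__(self, obs_dim, state_dim, n_actions):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(),
                                     nn.Linear(256, state_dim))
        self.inverse = nn.Sequential(nn.Linear(2 * state_dim, 128), nn.ReLU(),
                                     nn.Linear(128, n_actions))

    def forward(self, obs_t, obs_next):
        s_t, s_next = self.encoder(obs_t), self.encoder(obs_next)
        return self.inverse(torch.cat([s_t, s_next], dim=-1))  # action logits

# Training would minimize cross-entropy between these logits and the executed actions.
```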

Scalable Micro-planned Generation of Discourse from Structured Data

Title Scalable Micro-planned Generation of Discourse from Structured Data
Authors Anirban Laha, Parag Jain, Abhijit Mishra, Karthik Sankaranarayanan
Abstract We present a framework for generating natural language description from structured data such as tables; the problem comes under the category of data-to-text natural language generation (NLG). Modern data-to-text NLG systems typically employ end-to-end statistical and neural architectures that learn from a limited amount of task-specific labeled data, and therefore, exhibit limited scalability, domain-adaptability, and interpretability. Unlike these systems, ours is a modular, pipeline-based approach, and does not require task-specific parallel data. It rather relies on monolingual corpora and basic off-the-shelf NLP tools. This makes our system more scalable and easily adaptable to newer domains. Our system employs a 3-staged pipeline that: (i) converts entries in the structured data to canonical form, (ii) generates simple sentences for each atomic entry in the canonicalized representation, and (iii) combines the sentences to produce a coherent, fluent and adequate paragraph description through sentence compounding and co-reference replacement modules. Experiments on a benchmark mixed-domain dataset curated for paragraph description from tables reveal the superiority of our system over existing data-to-text approaches. We also demonstrate the robustness of our system in accepting other popular datasets covering diverse data types such as Knowledge Graphs and Key-Value maps.
Tasks Knowledge Graphs, Text Generation
Published 2018-10-05
URL https://arxiv.org/abs/1810.02889v3
PDF https://arxiv.org/pdf/1810.02889v3.pdf
PWC https://paperswithcode.com/paper/scalable-micro-planned-generation-of
Repo https://github.com/parajain/structscribe
Framework pytorch
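
The three-stage pipeline in the abstract can be illustrated with a toy, template-based stand-in. The helper functions below are deliberately naive placeholders rather than the released system's modules, but they follow the same canonicalize / realize / compound structure.

```python
def canonicalize(record):
    # (i) flatten a table row into (entity, attribute, value) tuples
    entity = record["name"]
    return [(entity, attr, val) for attr, val in record.items() if attr != "name"]

def realize(triple):
    # (ii) template-based simple sentence for one atomic tuple
    entity, attr, val = triple
    return f"{entity}'s {attr} is {val}."

def compound(sentences, entity):
    # (iii) naive sentence compounding plus pronoun replacement for repeated mentions
    rest = [s.replace(f"{entity}'s", "Its") for s in sentences[1:]]
    return " ".join([sentences[0]] + rest)

row = {"name": "Mount Everest", "height": "8848 m", "location": "Nepal"}
print(compound([realize(t) for t in canonicalize(row)], row["name"]))
# -> "Mount Everest's height is 8848 m. Its location is Nepal."
```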

The Double Sphere Camera Model

Title The Double Sphere Camera Model
Authors Vladyslav Usenko, Nikolaus Demmel, Daniel Cremers
Abstract Vision-based motion estimation and 3D reconstruction, which have numerous applications (e.g., autonomous driving, navigation systems for airborne devices and augmented reality) are receiving significant research attention. To increase the accuracy and robustness, several researchers have recently demonstrated the benefit of using large field-of-view cameras for such applications. In this paper, we provide an extensive review of existing models for large field-of-view cameras. For each model we provide projection and unprojection functions and the subspace of points that result in valid projection. Then, we propose the Double Sphere camera model that well fits with large field-of-view lenses, is computationally inexpensive and has a closed-form inverse. We evaluate the model using a calibration dataset with several different lenses and compare the models using the metrics that are relevant for Visual Odometry, i.e., reprojection error, as well as computation time for projection and unprojection functions and their Jacobians. We also provide qualitative results and discuss the performance of all models.
Tasks 3D Reconstruction, Autonomous Driving, Calibration, Motion Estimation, Visual Odometry
Published 2018-07-24
URL http://arxiv.org/abs/1807.08957v2
PDF http://arxiv.org/pdf/1807.08957v2.pdf
PWC https://paperswithcode.com/paper/the-double-sphere-camera-model
Repo https://github.com/VladyslavUsenko/basalt-mirror
Framework none
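
For reference, the Double Sphere projection itself is compact enough to sketch in a few lines. The function below follows the projection described in the paper (a point is projected onto a first unit sphere, shifted by $\xi$ along the optical axis, then mapped through a blended second projection controlled by $\alpha$); the validity check for points behind the model is omitted, and the parameter names mirror the paper's symbols.

```python
import math

def double_sphere_project(point, fx, fy, cx, cy, xi, alpha):
    """Project a 3D point (x, y, z) to pixel coordinates with the Double
    Sphere model. xi shifts the second projection centre along the z-axis;
    alpha blends between the second sphere and the image plane."""
    x, y, z = point
    d1 = math.sqrt(x * x + y * y + z * z)
    zs = xi * d1 + z                       # z after the shift to the second sphere
    d2 = math.sqrt(x * x + y * y + zs * zs)
    denom = alpha * d2 + (1.0 - alpha) * zs
    return fx * x / denom + cx, fy * y / denom + cy
```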

Hyperparameters and Tuning Strategies for Random Forest

Title Hyperparameters and Tuning Strategies for Random Forest
Authors Philipp Probst, Marvin Wright, Anne-Laure Boulesteix
Abstract The random forest algorithm (RF) has several hyperparameters that have to be set by the user, e.g., the number of observations drawn randomly for each tree and whether they are drawn with or without replacement, the number of variables drawn randomly for each split, the splitting rule, the minimum number of samples that a node must contain and the number of trees. In this paper, we first provide a literature review on the parameters’ influence on the prediction performance and on variable importance measures. It is well known that in most cases RF works reasonably well with the default values of the hyperparameters specified in software packages. Nevertheless, tuning the hyperparameters can improve the performance of RF. In the second part of this paper, after a brief overview of tuning strategies we demonstrate the application of one of the most established tuning strategies, model-based optimization (MBO). To make it easier to use, we provide the tuneRanger R package that tunes RF with MBO automatically. In a benchmark study on several datasets, we compare the prediction performance and runtime of tuneRanger with other tuning implementations in R and RF with default hyperparameters.
Tasks
Published 2018-04-10
URL http://arxiv.org/abs/1804.03515v2
PDF http://arxiv.org/pdf/1804.03515v2.pdf
PWC https://paperswithcode.com/paper/hyperparameters-and-tuning-strategies-for
Repo https://github.com/PhilippPro/tuneRanger
Framework none
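
The paper tunes random forests with model-based optimization via the tuneRanger R package; as a language-neutral illustration of which hyperparameters are in play, the sketch below substitutes plain random search in Python/scikit-learn over the same quantities (variables per split, minimum node size, sample fraction, number of trees). It is not equivalent to MBO, just a cheap stand-in.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = load_breast_cancer(return_X_y=True)

# The hyperparameters named in the abstract, in scikit-learn terms:
# variables drawn per split -> max_features, minimum node size -> min_samples_leaf,
# sample fraction / replacement -> max_samples / bootstrap, number of trees -> n_estimators.
param_space = {
    "max_features": [0.1, 0.3, 0.5, 0.7, 1.0],
    "min_samples_leaf": [1, 5, 10, 20],
    "max_samples": [0.5, 0.7, 0.9, None],
    "n_estimators": [200, 500, 1000],
}
search = RandomizedSearchCV(
    RandomForestClassifier(bootstrap=True, random_state=0),
    param_space, n_iter=20, cv=5, scoring="roc_auc", random_state=0)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```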

SqueezeSegV2: Improved Model Structure and Unsupervised Domain Adaptation for Road-Object Segmentation from a LiDAR Point Cloud

Title SqueezeSegV2: Improved Model Structure and Unsupervised Domain Adaptation for Road-Object Segmentation from a LiDAR Point Cloud
Authors Bichen Wu, Xuanyu Zhou, Sicheng Zhao, Xiangyu Yue, Kurt Keutzer
Abstract Earlier work demonstrates the promise of deep-learning-based approaches for point cloud segmentation; however, these approaches need to be improved to be practically useful. To this end, we introduce a new model SqueezeSegV2 that is more robust to dropout noise in LiDAR point clouds. With improved model structure, training loss, batch normalization and additional input channel, SqueezeSegV2 achieves significant accuracy improvement when trained on real data. Training models for point cloud segmentation requires large amounts of labeled point-cloud data, which is expensive to obtain. To sidestep the cost of collection and annotation, simulators such as GTA-V can be used to create unlimited amounts of labeled, synthetic data. However, due to domain shift, models trained on synthetic data often do not generalize well to the real world. We address this problem with a domain-adaptation training pipeline consisting of three major components: 1) learned intensity rendering, 2) geodesic correlation alignment, and 3) progressive domain calibration. When trained on real data, our new model exhibits segmentation accuracy improvements of 6.0-8.6% over the original SqueezeSeg. When training our new model on synthetic data using the proposed domain adaptation pipeline, we nearly double test accuracy on real-world data, from 29.0% to 57.4%. Our source code and synthetic dataset will be open-sourced.
Tasks Calibration, Domain Adaptation, Semantic Segmentation, Unsupervised Domain Adaptation
Published 2018-09-22
URL http://arxiv.org/abs/1809.08495v1
PDF http://arxiv.org/pdf/1809.08495v1.pdf
PWC https://paperswithcode.com/paper/squeezesegv2-improved-model-structure-and
Repo https://github.com/xuanyuzhou98/SqueezeSegV2
Framework tf
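
Of the three domain-adaptation components, correlation alignment is the easiest to sketch. The loss below is a simplified Euclidean (CORAL-style) version rather than the geodesic variant used in the paper, aligning second-order statistics of synthetic and real feature batches from a shared encoder.

```python
import torch

def correlation_alignment_loss(feat_syn, feat_real):
    """Penalize the gap between feature covariances of synthetic and real
    batches. feat_*: (batch, channels) features from the shared encoder."""
    def covariance(f):
        f = f - f.mean(dim=0, keepdim=True)
        return f.t() @ f / (f.shape[0] - 1)
    c_s, c_r = covariance(feat_syn), covariance(feat_real)
    d = feat_syn.shape[1]
    return ((c_s - c_r) ** 2).sum() / (4 * d * d)
```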

3D Human Pose Estimation with 2D Marginal Heatmaps

Title 3D Human Pose Estimation with 2D Marginal Heatmaps
Authors Aiden Nibali, Zhen He, Stuart Morgan, Luke Prendergast
Abstract Automatically determining three-dimensional human pose from monocular RGB image data is a challenging problem. The two-dimensional nature of the input results in intrinsic ambiguities which make inferring depth particularly difficult. Recently, researchers have demonstrated that the flexible statistical modelling capabilities of deep neural networks are sufficient to make such inferences with reasonable accuracy. However, many of these models use coordinate output techniques which are memory-intensive, not differentiable, and/or do not spatially generalise well. We propose improvements to 3D coordinate prediction which avoid the aforementioned undesirable traits by predicting 2D marginal heatmaps under an augmented soft-argmax scheme. Our resulting model, MargiPose, produces visually coherent heatmaps whilst maintaining differentiability. We are also able to achieve state-of-the-art accuracy on publicly available 3D human pose estimation data.
Tasks 3D Human Pose Estimation, Pose Estimation
Published 2018-06-05
URL http://arxiv.org/abs/1806.01484v2
PDF http://arxiv.org/pdf/1806.01484v2.pdf
PWC https://paperswithcode.com/paper/3d-human-pose-estimation-with-2d-marginal
Repo https://github.com/anibali/margipose
Framework pytorch
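
The differentiable coordinate readout that marginal-heatmap methods rely on is a soft-argmax: normalize the heatmap, then take the expected coordinate. The sketch below shows the generic 2D building block (including the marginalization step), not MargiPose's exact augmented scheme.

```python
import torch
import torch.nn.functional as F

def soft_argmax_2d(heatmap):
    """Differentiable (x, y) readout from unnormalized heatmaps.
    heatmap: (batch, joints, H, W). Returns (batch, joints, 2) in [0, 1]."""
    b, j, h, w = heatmap.shape
    probs = F.softmax(heatmap.view(b, j, -1), dim=-1).view(b, j, h, w)
    xs = torch.linspace(0, 1, w, device=heatmap.device)
    ys = torch.linspace(0, 1, h, device=heatmap.device)
    x = (probs.sum(dim=2) * xs).sum(dim=-1)  # marginalize over rows, expectation over columns
    y = (probs.sum(dim=3) * ys).sum(dim=-1)  # marginalize over columns, expectation over rows
    return torch.stack([x, y], dim=-1)
```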

Multimodal One-Shot Learning of Speech and Images

Title Multimodal One-Shot Learning of Speech and Images
Authors Ryan Eloff, Herman A. Engelbrecht, Herman Kamper
Abstract Imagine a robot is shown new concepts visually together with spoken tags, e.g. “milk”, “eggs”, “butter”. After seeing one paired audio-visual example per class, it is shown a new set of unseen instances of these objects, and asked to pick the “milk”. Without receiving any hard labels, could it learn to match the new continuous speech input to the correct visual instance? Although unimodal one-shot learning has been studied, where one labelled example in a single modality is given per class, this example motivates multimodal one-shot learning. Our main contribution is to formally define this task, and to propose several baseline and advanced models. We use a dataset of paired spoken and visual digits to specifically investigate recent advances in Siamese convolutional neural networks. Our best Siamese model achieves twice the accuracy of a nearest neighbour model using pixel-distance over images and dynamic time warping over speech in 11-way cross-modal matching.
Tasks One-Shot Learning
Published 2018-11-09
URL http://arxiv.org/abs/1811.03875v2
PDF http://arxiv.org/pdf/1811.03875v2.pdf
PWC https://paperswithcode.com/paper/multimodal-one-shot-learning-of-speech-and
Repo https://github.com/rpeloff/multimodal_one_shot_learning
Framework tf
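
The nearest-neighbour baseline mentioned in the abstract compares images by pixel distance and speech by dynamic time warping. Here is a minimal sketch of both distance functions (the exact DTW normalization and features used in the paper may differ).

```python
import numpy as np

def dtw_distance(a, b):
    """DTW distance between two feature sequences of shape (frames, dims)."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m] / (n + m)  # path-length normalization (one common convention)

def pixel_distance(img_a, img_b):
    # Euclidean distance between flattened images, the visual counterpart.
    return np.linalg.norm(img_a.ravel().astype(float) - img_b.ravel().astype(float))
```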

MaskFusion: Real-Time Recognition, Tracking and Reconstruction of Multiple Moving Objects

Title MaskFusion: Real-Time Recognition, Tracking and Reconstruction of Multiple Moving Objects
Authors Martin Rünz, Maud Buffier, Lourdes Agapito
Abstract We present MaskFusion, a real-time, object-aware, semantic and dynamic RGB-D SLAM system that goes beyond traditional systems which output a purely geometric map of a static scene. MaskFusion recognizes, segments and assigns semantic class labels to different objects in the scene, while tracking and reconstructing them even when they move independently from the camera. As an RGB-D camera scans a cluttered scene, image-based instance-level semantic segmentation creates semantic object masks that enable real-time object recognition and the creation of an object-level representation for the world map. Unlike previous recognition-based SLAM systems, MaskFusion does not require known models of the objects it can recognize, and can deal with multiple independent motions. MaskFusion takes full advantage of using instance-level semantic segmentation to enable semantic labels to be fused into an object-aware map, unlike recent semantics enabled SLAM systems that perform voxel-level semantic segmentation. We show augmented-reality applications that demonstrate the unique features of the map output by MaskFusion: instance-aware, semantic and dynamic.
Tasks Object Recognition, Semantic Segmentation
Published 2018-04-24
URL http://arxiv.org/abs/1804.09194v2
PDF http://arxiv.org/pdf/1804.09194v2.pdf
PWC https://paperswithcode.com/paper/maskfusion-real-time-recognition-tracking-and
Repo https://github.com/martinruenz/maskfusion
Framework tf

Classification from Positive, Unlabeled and Biased Negative Data

Title Classification from Positive, Unlabeled and Biased Negative Data
Authors Yu-Guan Hsieh, Gang Niu, Masashi Sugiyama
Abstract In binary classification, there are situations where negative (N) data are too diverse to be fully labeled and we often resort to positive-unlabeled (PU) learning in these scenarios. However, collecting a non-representative N set that contains only a small portion of all possible N data can often be much easier in practice. This paper studies a novel classification framework which incorporates such biased N (bN) data in PU learning. We provide a method based on empirical risk minimization to address this PUbN classification problem. Our approach can be regarded as a novel example-weighting algorithm, with the weight of each example computed through a preliminary step that draws inspiration from PU learning. We also derive an estimation error bound for the proposed method. Experimental results demonstrate the effectiveness of our algorithm in not only PUbN learning scenarios but also ordinary PU learning scenarios on several benchmark datasets.
Tasks
Published 2018-10-01
URL https://arxiv.org/abs/1810.00846v2
PDF https://arxiv.org/pdf/1810.00846v2.pdf
PWC https://paperswithcode.com/paper/classification-from-positive-unlabeled-and
Repo https://github.com/ZaydH/covariate_shift_risk_estimation
Framework pytorch
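
As background for the example-weighting idea, the sketch below gives the non-negative PU risk estimator that PU-learning approaches (including this paper's preliminary step) build on; the full PUbN estimator additionally incorporates a weighted term for the biased-negative set, which is not shown here.

```python
import torch
import torch.nn.functional as F

def nn_pu_risk(scores_p, scores_u, prior):
    """Non-negative PU risk with the logistic surrogate loss.
    scores_*: raw classifier outputs on positive / unlabeled batches.
    prior: class prior pi_p = P(y = +1), assumed known or estimated."""
    loss_pos = lambda z: F.softplus(-z)   # surrogate loss for predicting +1
    loss_neg = lambda z: F.softplus(z)    # surrogate loss for predicting -1
    risk_p_pos = prior * loss_pos(scores_p).mean()
    risk_p_neg = prior * loss_neg(scores_p).mean()
    risk_u_neg = loss_neg(scores_u).mean()
    # Clamp the negative-class risk at zero to avoid overfitting (the "non-negative" correction).
    return risk_p_pos + torch.clamp(risk_u_neg - risk_p_neg, min=0.0)
```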

Dist-GAN: An Improved GAN using Distance Constraints

Title Dist-GAN: An Improved GAN using Distance Constraints
Authors Ngoc-Trung Tran, Tuan-Anh Bui, Ngai-Man Cheung
Abstract We introduce effective training algorithms for Generative Adversarial Networks (GAN) to alleviate mode collapse and gradient vanishing. In our system, we constrain the generator by an Autoencoder (AE). We propose a formulation to consider the reconstructed samples from AE as “real” samples for the discriminator. This couples the convergence of the AE with that of the discriminator, effectively slowing down the convergence of discriminator and reducing gradient vanishing. Importantly, we propose two novel distance constraints to improve the generator. First, we propose a latent-data distance constraint to enforce compatibility between the latent sample distances and the corresponding data sample distances. We use this constraint to explicitly prevent the generator from mode collapse. Second, we propose a discriminator-score distance constraint to align the distribution of the generated samples with that of the real samples through the discriminator score. We use this constraint to guide the generator to synthesize samples that resemble the real ones. Our proposed GAN using these distance constraints, namely Dist-GAN, can achieve better results than state-of-the-art methods across benchmark datasets: synthetic, MNIST, MNIST-1K, CelebA, CIFAR-10 and STL-10 datasets. Our code is published here (https://github.com/tntrung/gan) for research.
Tasks Image Generation
Published 2018-03-23
URL http://arxiv.org/abs/1803.08887v3
PDF http://arxiv.org/pdf/1803.08887v3.pdf
PWC https://paperswithcode.com/paper/dist-gan-an-improved-gan-using-distance
Repo https://github.com/tntrung/gan
Framework tf
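
The latent-data distance constraint can be illustrated with a simple pairwise-distance matching regularizer: if two latent codes are far apart, their generated samples should not collapse onto the same point. The form below is an illustrative simplification, not the paper's exact formulation.

```python
import torch

def latent_data_distance_loss(z, g_z):
    """Encourage compatibility between latent-space and data-space pairwise
    distances, discouraging mode collapse.
    z:   (batch, latent_dim) latent samples.
    g_z: (batch, data_dim) generator outputs, flattened."""
    dz = torch.cdist(z, z)      # latent pairwise distances
    dx = torch.cdist(g_z, g_z)  # data-space pairwise distances
    # Normalize each set of distances by its mean so the two scales are comparable.
    return (dx / (dx.mean() + 1e-8) - dz / (dz.mean() + 1e-8)).abs().mean()
```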

Improved training of end-to-end attention models for speech recognition

Title Improved training of end-to-end attention models for speech recognition
Authors Albert Zeyer, Kazuki Irie, Ralf Schlüter, Hermann Ney
Abstract Sequence-to-sequence attention-based models on subword units allow simple open-vocabulary end-to-end speech recognition. In this work, we show that such models can achieve competitive results on the Switchboard 300h and LibriSpeech 1000h tasks. In particular, we report the state-of-the-art word error rates (WER) of 3.54% on the dev-clean and 3.82% on the test-clean evaluation subsets of LibriSpeech. We introduce a new pretraining scheme by starting with a high time reduction factor and lowering it during training, which is crucial both for convergence and final performance. In some experiments, we also use an auxiliary CTC loss function to help the convergence. In addition, we train long short-term memory (LSTM) language models on subword units. By shallow fusion, we report up to 27% relative improvements in WER over the attention baseline without a language model.
Tasks End-To-End Speech Recognition, Language Modelling, Speech Recognition
Published 2018-05-08
URL http://arxiv.org/abs/1805.03294v1
PDF http://arxiv.org/pdf/1805.03294v1.pdf
PWC https://paperswithcode.com/paper/improved-training-of-end-to-end-attention
Repo https://github.com/pvsimoes/our_espnet
Framework pytorch
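
The pretraining trick (start with a high time-reduction factor, then lower it) can be mimicked with any encoder whose temporal subsampling is a parameter. The toy module below is only meant to make the schedule concrete; it is not the authors' implementation.

```python
import torch
import torch.nn as nn

class TimePooledEncoder(nn.Module):
    """BLSTM encoder whose temporal downsampling factor can be relaxed
    between pretraining stages."""
    def __init__(self, in_dim, hidden, reduction_factor=32):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden, batch_first=True, bidirectional=True)
        self.reduction_factor = reduction_factor

    def forward(self, feats):                    # feats: (batch, time, in_dim)
        out, _ = self.lstm(feats)
        return out[:, :: self.reduction_factor]  # keep every k-th frame

enc = TimePooledEncoder(80, 512, reduction_factor=32)
# ... pretrain for a few epochs, then relax the reduction and continue training:
enc.reduction_factor = 8
```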

Online Temporal Calibration for Monocular Visual-Inertial Systems

Title Online Temporal Calibration for Monocular Visual-Inertial Systems
Authors Tong Qin, Shaojie Shen
Abstract Accurate state estimation is a fundamental module for various intelligent applications, such as robot navigation, autonomous driving, virtual and augmented reality. Visual and inertial fusion is a popular technology for 6-DOF state estimation in recent years. Time instants at which different sensors’ measurements are recorded are of crucial importance to the system’s robustness and accuracy. In practice, timestamps of each sensor typically suffer from triggering and transmission delays, leading to temporal misalignment (time offsets) among different sensors. Such temporal offset dramatically influences the performance of sensor fusion. To this end, we propose an online approach for calibrating temporal offset between visual and inertial measurements. Our approach achieves temporal offset calibration by jointly optimizing time offset, camera and IMU states, as well as feature locations in a SLAM system. Furthermore, the approach is a general model, which can be easily employed in several feature-based optimization frameworks. Simulation and experimental results demonstrate the high accuracy of our calibration approach even compared with other state-of-the-art offline tools. The VIO comparison against other methods proves that the online temporal calibration significantly benefits visual-inertial systems. The source code of temporal calibration is integrated into our public project, VINS-Mono.
Tasks Autonomous Driving, Calibration, Robot Navigation, Sensor Fusion, Time Offset Calibration
Published 2018-08-02
URL http://arxiv.org/abs/1808.00692v1
PDF http://arxiv.org/pdf/1808.00692v1.pdf
PWC https://paperswithcode.com/paper/online-temporal-calibration-for-monocular
Repo https://github.com/HKUST-Aerial-Robotics/VINS-Mono
Framework tf
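
The core of the online calibration is to make the time offset an optimization variable inside the vision residual: each feature observation is shifted along its (approximately constant) image-plane velocity by the current offset estimate. A schematic residual is sketched below; it is illustrative, not the VINS-Mono source.

```python
import numpy as np

def time_shifted_residual(proj_uv, obs_uv, feat_velocity, t_d):
    """Reprojection residual with the camera-IMU time offset t_d as a variable.
    proj_uv:       predicted pixel location from the current camera/IMU states.
    obs_uv:        measured feature location in the image.
    feat_velocity: estimated image-plane velocity of the feature (pixels/s).
    t_d:           current time-offset estimate (s), optimized jointly."""
    obs_shifted = obs_uv + t_d * feat_velocity  # move the observation to the IMU time frame
    return proj_uv - obs_shifted
```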

Autonomous Driving in Reality with Reinforcement Learning and Image Translation

Title Autonomous Driving in Reality with Reinforcement Learning and Image Translation
Authors Nayun Xu, Bowen Tan, Bingyu Kong
Abstract Supervised learning is widely used to train autonomous driving vehicles, but it requires large amounts of labeled data. Reinforcement learning can be trained without abundant labeled data, yet it cannot be trained directly in the real world because doing so would involve many unpredictable accidents. Nevertheless, training an agent with good performance in a virtual environment is comparatively easy, so the challenge becomes bridging the large gap between the virtual and the real. In this paper, we propose a novel framework that combines reinforcement learning with an image semantic segmentation network to make the whole model adaptable to reality. The agent is trained in TORCS, a car racing simulator.
Tasks Autonomous Driving, Car Racing, Semantic Segmentation
Published 2018-01-13
URL http://arxiv.org/abs/1801.05299v2
PDF http://arxiv.org/pdf/1801.05299v2.pdf
PWC https://paperswithcode.com/paper/autonomous-driving-in-reality-with
Repo https://github.com/SullyChen/Autopilot-TensorFlow
Framework tf

BanditSum: Extractive Summarization as a Contextual Bandit

Title BanditSum: Extractive Summarization as a Contextual Bandit
Authors Yue Dong, Yikang Shen, Eric Crawford, Herke van Hoof, Jackie Chi Kit Cheung
Abstract In this work, we propose a novel method for training neural networks to perform single-document extractive summarization without heuristically-generated extractive labels. We call our approach BanditSum as it treats extractive summarization as a contextual bandit (CB) problem, where the model receives a document to summarize (the context), and chooses a sequence of sentences to include in the summary (the action). A policy gradient reinforcement learning algorithm is used to train the model to select sequences of sentences that maximize ROUGE score. We perform a series of experiments demonstrating that BanditSum is able to achieve ROUGE scores that are better than or comparable to the state-of-the-art for extractive summarization, and converges using significantly fewer update steps than competing approaches. In addition, we show empirically that BanditSum performs significantly better than competing approaches when good summary sentences appear late in the source document.
Tasks
Published 2018-09-25
URL https://arxiv.org/abs/1809.09672v3
PDF https://arxiv.org/pdf/1809.09672v3.pdf
PWC https://paperswithcode.com/paper/banditsum-extractive-summarization-as-a
Repo https://github.com/yuedongP/BanditSum
Framework pytorch
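
The contextual-bandit training loop reduces to a REINFORCE-style update: sample sentence indices from the model's affinity scores, score the extracted summary with ROUGE, and reinforce the sampled action. The sketch below simplifies the sampling (with replacement, no baseline) and is not the released BanditSum code.

```python
import torch

def banditsum_style_update(affinities, sample_k, rouge_reward):
    """One policy-gradient step for extractive summarization as a bandit.
    affinities:   (num_sentences,) raw scores from the document encoder.
    sample_k:     number of sentences to extract for the summary.
    rouge_reward: callable mapping a list of sentence indices to a ROUGE score."""
    probs = torch.softmax(affinities, dim=0)
    dist = torch.distributions.Categorical(probs)
    picks = dist.sample((sample_k,))               # sampled action (with replacement, for brevity)
    reward = rouge_reward(sorted(set(picks.tolist())))
    log_prob = dist.log_prob(picks).sum()
    return -reward * log_prob                      # REINFORCE loss (no baseline shown)
```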