Paper Group AWR 341
Learning to Solve NP-Complete Problems - A Graph Neural Network for Decision TSP. S-RL Toolbox: Environments, Datasets and Evaluation Metrics for State Representation Learning. Scalable Micro-planned Generation of Discourse from Structured Data. The Double Sphere Camera Model. Hyperparameters and Tuning Strategies for Random Forest. SqueezeSegV2: Improved Model Structure and Unsupervised Domain Adaptation for Road-Object Segmentation from a LiDAR Point Cloud. 3D Human Pose Estimation with 2D Marginal Heatmaps. Multimodal One-Shot Learning of Speech and Images. MaskFusion: Real-Time Recognition, Tracking and Reconstruction of Multiple Moving Objects. Classification from Positive, Unlabeled and Biased Negative Data. Dist-GAN: An Improved GAN using Distance Constraints. Improved training of end-to-end attention models for speech recognition. Online Temporal Calibration for Monocular Visual-Inertial Systems. Autonomous Driving in Reality with Reinforcement Learning and Image Translation. BanditSum: Extractive Summarization as a Contextual Bandit.
Learning to Solve NP-Complete Problems - A Graph Neural Network for Decision TSP
Title | Learning to Solve NP-Complete Problems - A Graph Neural Network for Decision TSP |
Authors | Marcelo O. R. Prates, Pedro H. C. Avelar, Henrique Lemos, Luis Lamb, Moshe Vardi |
Abstract | Graph Neural Networks (GNN) are a promising technique for bridging differential programming and combinatorial domains. GNNs employ trainable modules which can be assembled in different configurations that reflect the relational structure of each problem instance. In this paper, we show that GNNs can learn to solve, with very little supervision, the decision variant of the Traveling Salesperson Problem (TSP), a highly relevant $\mathcal{NP}$-Complete problem. Our model is trained to function as an effective message-passing algorithm in which edges (embedded with their weights) communicate with vertices for a number of iterations after which the model is asked to decide whether a route with cost $<C$ exists. We show that such a network can be trained with sets of dual examples: given the optimal tour cost $C^{*}$, we produce one decision instance with target cost $x\%$ smaller and one with target cost $x\%$ larger than $C^{*}$. We were able to obtain $80\%$ accuracy training with $-2\%,+2\%$ deviations, and the same trained model can generalize for more relaxed deviations with increasing performance. We also show that the model is capable of generalizing for larger problem sizes. Finally, we provide a method for predicting the optimal route cost within $2\%$ deviation from the ground truth. In summary, our work shows that Graph Neural Networks are powerful enough to solve $\mathcal{NP}$-Complete problems which combine symbolic and numeric data. |
Tasks | |
Published | 2018-09-08 |
URL | http://arxiv.org/abs/1809.02721v3 |
http://arxiv.org/pdf/1809.02721v3.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-solve-np-complete-problems-a |
Repo | https://github.com/machine-reasoning-ufrgs/TSP-GNN |
Framework | tf |
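The dual-example construction described in the abstract (one decision instance with target cost slightly below the optimal tour cost $C^{*}$ and one slightly above) is easy to reproduce. The sketch below is a minimal Python illustration, assuming the optimal tour cost has already been computed by an exact solver; the function and field names are ours, not the authors'.

```python
def make_decision_pair(edge_weights, optimal_cost, deviation=0.02):
    """Build one negative and one positive decision-TSP instance from a
    solved instance, following the +/- x% scheme described in the abstract.

    edge_weights : dict mapping (i, j) -> edge weight of the graph
    optimal_cost : cost C* of the optimal tour, e.g. from an exact solver
    deviation    : fraction x, e.g. 0.02 for the -2% / +2% setting
    """
    # Target cost below C*: no tour of that cost exists -> label 0 (UNSAT).
    negative = {"edges": edge_weights,
                "target_cost": (1.0 - deviation) * optimal_cost,
                "label": 0}
    # Target cost above C*: the optimal tour already satisfies it -> label 1 (SAT).
    positive = {"edges": edge_weights,
                "target_cost": (1.0 + deviation) * optimal_cost,
                "label": 1}
    return negative, positive
```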
S-RL Toolbox: Environments, Datasets and Evaluation Metrics for State Representation Learning
Title | S-RL Toolbox: Environments, Datasets and Evaluation Metrics for State Representation Learning |
Authors | Antonin Raffin, Ashley Hill, René Traoré, Timothée Lesort, Natalia Díaz-Rodríguez, David Filliat |
Abstract | State representation learning aims at learning compact representations from raw observations in robotics and control applications. Approaches used for this objective are auto-encoders, learning forward models, inverse dynamics or learning using generic priors on the state characteristics. However, the diversity in applications and methods makes the field lack standard evaluation datasets, metrics and tasks. This paper provides a set of environments, data generators, robotic control tasks, metrics and tools to facilitate iterative state representation learning and evaluation in reinforcement learning settings. |
Tasks | Representation Learning |
Published | 2018-09-25 |
URL | http://arxiv.org/abs/1809.09369v2 |
http://arxiv.org/pdf/1809.09369v2.pdf | |
PWC | https://paperswithcode.com/paper/s-rl-toolbox-environments-datasets-and |
Repo | https://github.com/araffin/robotics-rl-srl |
Framework | none |
Scalable Micro-planned Generation of Discourse from Structured Data
Title | Scalable Micro-planned Generation of Discourse from Structured Data |
Authors | Anirban Laha, Parag Jain, Abhijit Mishra, Karthik Sankaranarayanan |
Abstract | We present a framework for generating natural language description from structured data such as tables; the problem comes under the category of data-to-text natural language generation (NLG). Modern data-to-text NLG systems typically employ end-to-end statistical and neural architectures that learn from a limited amount of task-specific labeled data, and therefore, exhibit limited scalability, domain-adaptability, and interpretability. Unlike these systems, ours is a modular, pipeline-based approach, and does not require task-specific parallel data. It rather relies on monolingual corpora and basic off-the-shelf NLP tools. This makes our system more scalable and easily adaptable to newer domains. Our system employs a 3-staged pipeline that: (i) converts entries in the structured data to canonical form, (ii) generates simple sentences for each atomic entry in the canonicalized representation, and (iii) combines the sentences to produce a coherent, fluent and adequate paragraph description through sentence compounding and co-reference replacement modules. Experiments on a benchmark mixed-domain dataset curated for paragraph description from tables reveal the superiority of our system over existing data-to-text approaches. We also demonstrate the robustness of our system in accepting other popular datasets covering diverse data types such as Knowledge Graphs and Key-Value maps. |
Tasks | Knowledge Graphs, Text Generation |
Published | 2018-10-05 |
URL | https://arxiv.org/abs/1810.02889v3 |
https://arxiv.org/pdf/1810.02889v3.pdf | |
PWC | https://paperswithcode.com/paper/scalable-micro-planned-generation-of |
Repo | https://github.com/parajain/structscribe |
Framework | pytorch |
The Double Sphere Camera Model
Title | The Double Sphere Camera Model |
Authors | Vladyslav Usenko, Nikolaus Demmel, Daniel Cremers |
Abstract | Vision-based motion estimation and 3D reconstruction, which have numerous applications (e.g., autonomous driving, navigation systems for airborne devices and augmented reality) are receiving significant research attention. To increase the accuracy and robustness, several researchers have recently demonstrated the benefit of using large field-of-view cameras for such applications. In this paper, we provide an extensive review of existing models for large field-of-view cameras. For each model we provide projection and unprojection functions and the subspace of points that result in valid projection. Then, we propose the Double Sphere camera model that well fits with large field-of-view lenses, is computationally inexpensive and has a closed-form inverse. We evaluate the model using a calibration dataset with several different lenses and compare the models using the metrics that are relevant for Visual Odometry, i.e., reprojection error, as well as computation time for projection and unprojection functions and their Jacobians. We also provide qualitative results and discuss the performance of all models. |
Tasks | 3D Reconstruction, Autonomous Driving, Calibration, Motion Estimation, Visual Odometry |
Published | 2018-07-24 |
URL | http://arxiv.org/abs/1807.08957v2 |
http://arxiv.org/pdf/1807.08957v2.pdf | |
PWC | https://paperswithcode.com/paper/the-double-sphere-camera-model |
Repo | https://github.com/VladyslavUsenko/basalt-mirror |
Framework | none |
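The double sphere projection itself is compact enough to write down directly. The sketch below follows our reading of the projection function described in the paper (intrinsics fx, fy, cx, cy plus the two model parameters xi and alpha); it omits the validity-region check and is meant as an illustration, not a substitute for the authors' calibrated implementation.

```python
import numpy as np

def double_sphere_project(point, fx, fy, cx, cy, xi, alpha):
    """Project a 3D point with the Double Sphere camera model.

    point : array-like (x, y, z) in the camera frame
    xi    : offset between the two unit spheres
    alpha : blending between the second sphere and the image plane
    """
    x, y, z = point
    d1 = np.sqrt(x * x + y * y + z * z)      # distance to the first sphere centre
    zs = xi * d1 + z                         # z shifted by the first sphere offset
    d2 = np.sqrt(x * x + y * y + zs * zs)    # distance to the second sphere centre
    denom = alpha * d2 + (1.0 - alpha) * zs  # generalised projection denominator
    u = fx * x / denom + cx
    v = fy * y / denom + cy
    return np.array([u, v])
```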
Hyperparameters and Tuning Strategies for Random Forest
Title | Hyperparameters and Tuning Strategies for Random Forest |
Authors | Philipp Probst, Marvin Wright, Anne-Laure Boulesteix |
Abstract | The random forest algorithm (RF) has several hyperparameters that have to be set by the user, e.g., the number of observations drawn randomly for each tree and whether they are drawn with or without replacement, the number of variables drawn randomly for each split, the splitting rule, the minimum number of samples that a node must contain and the number of trees. In this paper, we first provide a literature review on the parameters’ influence on the prediction performance and on variable importance measures. It is well known that in most cases RF works reasonably well with the default values of the hyperparameters specified in software packages. Nevertheless, tuning the hyperparameters can improve the performance of RF. In the second part of this paper, after a brief overview of tuning strategies we demonstrate the application of one of the most established tuning strategies, model-based optimization (MBO). To make it easier to use, we provide the tuneRanger R package that tunes RF with MBO automatically. In a benchmark study on several datasets, we compare the prediction performance and runtime of tuneRanger with other tuning implementations in R and RF with default hyperparameters. |
Tasks | |
Published | 2018-04-10 |
URL | http://arxiv.org/abs/1804.03515v2 |
http://arxiv.org/pdf/1804.03515v2.pdf | |
PWC | https://paperswithcode.com/paper/hyperparameters-and-tuning-strategies-for |
Repo | https://github.com/PhilippPro/tuneRanger |
Framework | none |
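The tuneRanger package mentioned in the abstract is an R tool built on model-based optimization. As a rough Python analogue of the same idea (searching over the hyperparameters the abstract lists: number of trees, variables per split, minimum node size, and sampling with or without replacement), one could use scikit-learn's random forest with a randomized search. This is only an illustrative sketch; the randomized search stands in for MBO and is not the authors' procedure.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True)

# Hyperparameters discussed in the paper: trees, mtry, node size, sampling.
param_distributions = {
    "n_estimators": [200, 500, 1000],        # number of trees
    "max_features": ["sqrt", "log2", 0.5],   # variables drawn per split (mtry)
    "min_samples_leaf": [1, 5, 10, 20],      # minimum node size
    "bootstrap": [True, False],              # sampling with or without replacement
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions,
    n_iter=20,
    cv=5,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```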
SqueezeSegV2: Improved Model Structure and Unsupervised Domain Adaptation for Road-Object Segmentation from a LiDAR Point Cloud
Title | SqueezeSegV2: Improved Model Structure and Unsupervised Domain Adaptation for Road-Object Segmentation from a LiDAR Point Cloud |
Authors | Bichen Wu, Xuanyu Zhou, Sicheng Zhao, Xiangyu Yue, Kurt Keutzer |
Abstract | Earlier work demonstrates the promise of deep-learning-based approaches for point cloud segmentation; however, these approaches need to be improved to be practically useful. To this end, we introduce a new model SqueezeSegV2 that is more robust to dropout noise in LiDAR point clouds. With improved model structure, training loss, batch normalization and additional input channel, SqueezeSegV2 achieves significant accuracy improvement when trained on real data. Training models for point cloud segmentation requires large amounts of labeled point-cloud data, which is expensive to obtain. To sidestep the cost of collection and annotation, simulators such as GTA-V can be used to create unlimited amounts of labeled, synthetic data. However, due to domain shift, models trained on synthetic data often do not generalize well to the real world. We address this problem with a domain-adaptation training pipeline consisting of three major components: 1) learned intensity rendering, 2) geodesic correlation alignment, and 3) progressive domain calibration. When trained on real data, our new model exhibits segmentation accuracy improvements of 6.0-8.6% over the original SqueezeSeg. When training our new model on synthetic data using the proposed domain adaptation pipeline, we nearly double test accuracy on real-world data, from 29.0% to 57.4%. Our source code and synthetic dataset will be open-sourced. |
Tasks | Calibration, Domain Adaptation, Semantic Segmentation, Unsupervised Domain Adaptation |
Published | 2018-09-22 |
URL | http://arxiv.org/abs/1809.08495v1 |
http://arxiv.org/pdf/1809.08495v1.pdf | |
PWC | https://paperswithcode.com/paper/squeezesegv2-improved-model-structure-and |
Repo | https://github.com/xuanyuzhou98/SqueezeSegV2 |
Framework | tf |
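Of the three domain-adaptation components listed in the abstract, geodesic correlation alignment is the one most easily reduced to a formula: it aligns the second-order statistics of source (synthetic) and target (real) feature batches. The sketch below shows the plain Euclidean correlation-alignment (CORAL) loss as a simplified stand-in; the paper uses a geodesic variant, so treat this only as an approximation of the idea.

```python
import torch

def coral_loss(source_feats, target_feats):
    """Simplified (Euclidean) correlation alignment between two feature batches.

    source_feats, target_feats : tensors of shape (n, d)
    Penalises the difference between the two feature covariance matrices.
    """
    def covariance(f):
        f = f - f.mean(dim=0, keepdim=True)
        return f.t() @ f / (f.shape[0] - 1)

    d = source_feats.shape[1]
    diff = covariance(source_feats) - covariance(target_feats)
    return (diff ** 2).sum() / (4 * d * d)
```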
3D Human Pose Estimation with 2D Marginal Heatmaps
Title | 3D Human Pose Estimation with 2D Marginal Heatmaps |
Authors | Aiden Nibali, Zhen He, Stuart Morgan, Luke Prendergast |
Abstract | Automatically determining three-dimensional human pose from monocular RGB image data is a challenging problem. The two-dimensional nature of the input results in intrinsic ambiguities which make inferring depth particularly difficult. Recently, researchers have demonstrated that the flexible statistical modelling capabilities of deep neural networks are sufficient to make such inferences with reasonable accuracy. However, many of these models use coordinate output techniques which are memory-intensive, not differentiable, and/or do not spatially generalise well. We propose improvements to 3D coordinate prediction which avoid the aforementioned undesirable traits by predicting 2D marginal heatmaps under an augmented soft-argmax scheme. Our resulting model, MargiPose, produces visually coherent heatmaps whilst maintaining differentiability. We are also able to achieve state-of-the-art accuracy on publicly available 3D human pose estimation data. |
Tasks | 3D Human Pose Estimation, Pose Estimation |
Published | 2018-06-05 |
URL | http://arxiv.org/abs/1806.01484v2 |
http://arxiv.org/pdf/1806.01484v2.pdf | |
PWC | https://paperswithcode.com/paper/3d-human-pose-estimation-with-2d-marginal |
Repo | https://github.com/anibali/margipose |
Framework | pytorch |
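The key ingredient the abstract describes, predicting coordinates through a differentiable soft-argmax over heatmaps rather than a hard argmax, can be sketched compactly. Below is a minimal PyTorch version of a 2D soft-argmax; it illustrates the general mechanism and is our simplification, not the exact augmented scheme or the marginal-heatmap decomposition used in MargiPose.

```python
import torch
import torch.nn.functional as F

def soft_argmax_2d(heatmaps):
    """Differentiable expected (x, y) coordinates from unnormalised heatmaps.

    heatmaps : tensor of shape (batch, joints, H, W)
    returns  : tensor of shape (batch, joints, 2) with coordinates in [0, 1]
    """
    b, j, h, w = heatmaps.shape
    probs = F.softmax(heatmaps.view(b, j, -1), dim=-1).view(b, j, h, w)
    ys = torch.linspace(0.0, 1.0, h, device=heatmaps.device)
    xs = torch.linspace(0.0, 1.0, w, device=heatmaps.device)
    # Expected coordinate = sum over the grid of probability * coordinate.
    exp_x = (probs.sum(dim=2) * xs).sum(dim=-1)  # marginalise over rows, expect over x
    exp_y = (probs.sum(dim=3) * ys).sum(dim=-1)  # marginalise over columns, expect over y
    return torch.stack([exp_x, exp_y], dim=-1)
```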
Multimodal One-Shot Learning of Speech and Images
Title | Multimodal One-Shot Learning of Speech and Images |
Authors | Ryan Eloff, Herman A. Engelbrecht, Herman Kamper |
Abstract | Imagine a robot is shown new concepts visually together with spoken tags, e.g. “milk”, “eggs”, “butter”. After seeing one paired audio-visual example per class, it is shown a new set of unseen instances of these objects, and asked to pick the “milk”. Without receiving any hard labels, could it learn to match the new continuous speech input to the correct visual instance? Although unimodal one-shot learning has been studied, where one labelled example in a single modality is given per class, this example motivates multimodal one-shot learning. Our main contribution is to formally define this task, and to propose several baseline and advanced models. We use a dataset of paired spoken and visual digits to specifically investigate recent advances in Siamese convolutional neural networks. Our best Siamese model achieves twice the accuracy of a nearest neighbour model using pixel-distance over images and dynamic time warping over speech in 11-way cross-modal matching. |
Tasks | One-Shot Learning |
Published | 2018-11-09 |
URL | http://arxiv.org/abs/1811.03875v2 |
http://arxiv.org/pdf/1811.03875v2.pdf | |
PWC | https://paperswithcode.com/paper/multimodal-one-shot-learning-of-speech-and |
Repo | https://github.com/rpeloff/multimodal_one_shot_learning |
Framework | tf |
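The 11-way cross-modal matching task described in the abstract can be carried out by routing a spoken query through the one-shot support set: find the closest support speech example, take its paired image, then find the closest test image. The sketch below assumes precomputed, comparable vectors for each modality (pixel features, DTW-aligned speech features, or Siamese embeddings); it is one natural pipeline consistent with the task description, not necessarily the authors' exact procedure.

```python
import numpy as np

def one_shot_cross_modal_match(query_speech, support_speech, support_images, test_images):
    """Match a spoken query to a visual test instance via the one-shot support set.

    query_speech   : (d_s,) representation of the spoken query, e.g. "milk"
    support_speech : (n_classes, d_s) one spoken example per class
    support_images : (n_classes, d_i) the paired image representation per class
    test_images    : (n_test, d_i) representations of the unseen test instances
    """
    # 1) Closest support speech example to the query (within-modality comparison).
    cls = np.argmin(np.linalg.norm(support_speech - query_speech, axis=1))
    # 2) Its paired support image serves as the visual anchor for that class.
    anchor = support_images[cls]
    # 3) Return the index of the test image closest to the anchor.
    return int(np.argmin(np.linalg.norm(test_images - anchor, axis=1)))
```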
MaskFusion: Real-Time Recognition, Tracking and Reconstruction of Multiple Moving Objects
Title | MaskFusion: Real-Time Recognition, Tracking and Reconstruction of Multiple Moving Objects |
Authors | Martin Rünz, Maud Buffier, Lourdes Agapito |
Abstract | We present MaskFusion, a real-time, object-aware, semantic and dynamic RGB-D SLAM system that goes beyond traditional systems which output a purely geometric map of a static scene. MaskFusion recognizes, segments and assigns semantic class labels to different objects in the scene, while tracking and reconstructing them even when they move independently from the camera. As an RGB-D camera scans a cluttered scene, image-based instance-level semantic segmentation creates semantic object masks that enable real-time object recognition and the creation of an object-level representation for the world map. Unlike previous recognition-based SLAM systems, MaskFusion does not require known models of the objects it can recognize, and can deal with multiple independent motions. MaskFusion takes full advantage of using instance-level semantic segmentation to enable semantic labels to be fused into an object-aware map, unlike recent semantics enabled SLAM systems that perform voxel-level semantic segmentation. We show augmented-reality applications that demonstrate the unique features of the map output by MaskFusion: instance-aware, semantic and dynamic. |
Tasks | Object Recognition, Semantic Segmentation |
Published | 2018-04-24 |
URL | http://arxiv.org/abs/1804.09194v2 |
http://arxiv.org/pdf/1804.09194v2.pdf | |
PWC | https://paperswithcode.com/paper/maskfusion-real-time-recognition-tracking-and |
Repo | https://github.com/martinruenz/maskfusion |
Framework | tf |
Classification from Positive, Unlabeled and Biased Negative Data
Title | Classification from Positive, Unlabeled and Biased Negative Data |
Authors | Yu-Guan Hsieh, Gang Niu, Masashi Sugiyama |
Abstract | In binary classification, there are situations where negative (N) data are too diverse to be fully labeled and we often resort to positive-unlabeled (PU) learning in these scenarios. However, collecting a non-representative N set that contains only a small portion of all possible N data can often be much easier in practice. This paper studies a novel classification framework which incorporates such biased N (bN) data in PU learning. We provide a method based on empirical risk minimization to address this PUbN classification problem. Our approach can be regarded as a novel example-weighting algorithm, with the weight of each example computed through a preliminary step that draws inspiration from PU learning. We also derive an estimation error bound for the proposed method. Experimental results demonstrate the effectiveness of our algorithm in not only PUbN learning scenarios but also ordinary PU learning scenarios on several benchmark datasets. |
Tasks | |
Published | 2018-10-01 |
URL | https://arxiv.org/abs/1810.00846v2 |
https://arxiv.org/pdf/1810.00846v2.pdf | |
PWC | https://paperswithcode.com/paper/classification-from-positive-unlabeled-and |
Repo | https://github.com/ZaydH/covariate_shift_risk_estimation |
Framework | pytorch |
Dist-GAN: An Improved GAN using Distance Constraints
Title | Dist-GAN: An Improved GAN using Distance Constraints |
Authors | Ngoc-Trung Tran, Tuan-Anh Bui, Ngai-Man Cheung |
Abstract | We introduce effective training algorithms for Generative Adversarial Networks (GAN) to alleviate mode collapse and gradient vanishing. In our system, we constrain the generator by an Autoencoder (AE). We propose a formulation to consider the reconstructed samples from AE as “real” samples for the discriminator. This couples the convergence of the AE with that of the discriminator, effectively slowing down the convergence of discriminator and reducing gradient vanishing. Importantly, we propose two novel distance constraints to improve the generator. First, we propose a latent-data distance constraint to enforce compatibility between the latent sample distances and the corresponding data sample distances. We use this constraint to explicitly prevent the generator from mode collapse. Second, we propose a discriminator-score distance constraint to align the distribution of the generated samples with that of the real samples through the discriminator score. We use this constraint to guide the generator to synthesize samples that resemble the real ones. Our proposed GAN using these distance constraints, namely Dist-GAN, can achieve better results than state-of-the-art methods across benchmark datasets: synthetic, MNIST, MNIST-1K, CelebA, CIFAR-10 and STL-10 datasets. Our code is published here (https://github.com/tntrung/gan) for research. |
Tasks | Image Generation |
Published | 2018-03-23 |
URL | http://arxiv.org/abs/1803.08887v3 |
http://arxiv.org/pdf/1803.08887v3.pdf | |
PWC | https://paperswithcode.com/paper/dist-gan-an-improved-gan-using-distance |
Repo | https://github.com/tntrung/gan |
Framework | tf |
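The first of the two constraints in the abstract, the latent-data distance constraint, enforces compatibility between distances among latent codes and distances among the corresponding generated samples, so that many latent codes cannot collapse onto the same output. The sketch below is a generic distance-matching penalty in that spirit; the exact formulation in the paper differs, so this is an assumption-laden illustration only.

```python
import torch

def latent_data_distance_penalty(z, x_fake, eps=1e-8):
    """Simplified latent-data distance penalty.

    z      : latent codes, shape (batch, latent_dim)
    x_fake : generator outputs for z, flattened to shape (batch, data_dim)
    Encourages pairwise distances between generated samples to track pairwise
    distances between their latent codes (a guard against mode collapse).
    """
    dz = torch.cdist(z, z) + eps              # pairwise latent distances
    dx = torch.cdist(x_fake, x_fake) + eps    # pairwise data distances
    # Match the two distance matrices up to their mean scale.
    return ((dx / dx.mean() - dz / dz.mean()) ** 2).mean()
```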
Improved training of end-to-end attention models for speech recognition
Title | Improved training of end-to-end attention models for speech recognition |
Authors | Albert Zeyer, Kazuki Irie, Ralf Schlüter, Hermann Ney |
Abstract | Sequence-to-sequence attention-based models on subword units allow simple open-vocabulary end-to-end speech recognition. In this work, we show that such models can achieve competitive results on the Switchboard 300h and LibriSpeech 1000h tasks. In particular, we report the state-of-the-art word error rates (WER) of 3.54% on the dev-clean and 3.82% on the test-clean evaluation subsets of LibriSpeech. We introduce a new pretraining scheme by starting with a high time reduction factor and lowering it during training, which is crucial both for convergence and final performance. In some experiments, we also use an auxiliary CTC loss function to help the convergence. In addition, we train long short-term memory (LSTM) language models on subword units. By shallow fusion, we report up to 27% relative improvements in WER over the attention baseline without a language model. |
Tasks | End-To-End Speech Recognition, Language Modelling, Speech Recognition |
Published | 2018-05-08 |
URL | http://arxiv.org/abs/1805.03294v1 |
http://arxiv.org/pdf/1805.03294v1.pdf | |
PWC | https://paperswithcode.com/paper/improved-training-of-end-to-end-attention |
Repo | https://github.com/pvsimoes/our_espnet |
Framework | pytorch |
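Shallow fusion, which the abstract credits with up to 27% relative WER improvement, simply adds a weighted language-model log-probability to the attention decoder's log-probability at each decoding step. The minimal sketch below shows one such step; the lm_weight value is illustrative, not the weight used in the paper.

```python
import torch

def shallow_fusion_step(am_log_probs, lm_log_probs, lm_weight=0.3):
    """One decoding step with shallow fusion of an attention model and an LM.

    am_log_probs : (vocab,) log-probabilities from the attention decoder
    lm_log_probs : (vocab,) log-probabilities from the subword LSTM LM
    lm_weight    : scalar interpolation weight for the LM
    Returns the fused scores and the greedily chosen next subword id.
    """
    fused = am_log_probs + lm_weight * lm_log_probs
    return fused, int(torch.argmax(fused))
```

In beam-search decoding, the same fused score would be accumulated along each hypothesis instead of taking a single greedy argmax.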
Online Temporal Calibration for Monocular Visual-Inertial Systems
Title | Online Temporal Calibration for Monocular Visual-Inertial Systems |
Authors | Tong Qin, Shaojie Shen |
Abstract | Accurate state estimation is a fundamental module for various intelligent applications, such as robot navigation, autonomous driving, virtual and augmented reality. Visual and inertial fusion is a popular technology for 6-DOF state estimation in recent years. Time instants at which different sensors’ measurements are recorded are of crucial importance to the system’s robustness and accuracy. In practice, timestamps of each sensor typically suffer from triggering and transmission delays, leading to temporal misalignment (time offsets) among different sensors. Such temporal offset dramatically influences the performance of sensor fusion. To this end, we propose an online approach for calibrating temporal offset between visual and inertial measurements. Our approach achieves temporal offset calibration by jointly optimizing time offset, camera and IMU states, as well as feature locations in a SLAM system. Furthermore, the approach is a general model, which can be easily employed in several feature-based optimization frameworks. Simulation and experimental results demonstrate the high accuracy of our calibration approach even compared with other state-of-the-art offline tools. The VIO comparison against other methods proves that the online temporal calibration significantly benefits visual-inertial systems. The source code of temporal calibration is integrated into our public project, VINS-Mono. |
Tasks | Autonomous Driving, Calibration, Robot Navigation, Sensor Fusion, Time Offset Calibration |
Published | 2018-08-02 |
URL | http://arxiv.org/abs/1808.00692v1 |
http://arxiv.org/pdf/1808.00692v1.pdf | |
PWC | https://paperswithcode.com/paper/online-temporal-calibration-for-monocular |
Repo | https://github.com/HKUST-Aerial-Robotics/VINS-Mono |
Framework | tf |
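The core trick for making the time offset observable is to model the feature observation as moving along its image-plane velocity by the unknown offset, so the offset becomes just another variable in the reprojection residual. The sketch below illustrates that residual under our reading of the abstract; variable names are ours, and the actual optimization in VINS-Mono is considerably more involved.

```python
import numpy as np

def reprojection_residual_with_offset(predicted_uv, observed_uv, feature_velocity, t_d):
    """Reprojection residual with an online-estimated camera-IMU time offset t_d.

    predicted_uv     : (2,) projection of the landmark using the IMU-time pose
    observed_uv      : (2,) measured feature location in the image
    feature_velocity : (2,) feature velocity on the image plane (pixels / second)
    t_d              : scalar time offset between camera and IMU (seconds)
    """
    # Shift the measurement along its image-plane velocity by the time offset,
    # so t_d can be optimized jointly with camera/IMU states and feature depths.
    compensated = np.asarray(observed_uv) + t_d * np.asarray(feature_velocity)
    return np.asarray(predicted_uv) - compensated
```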
Autonomous Driving in Reality with Reinforcement Learning and Image Translation
Title | Autonomous Driving in Reality with Reinforcement Learning and Image Translation |
Authors | Nayun Xu, Bowen Tan, Bingyu Kong |
Abstract | Supervised learning is widely used to train autonomous driving vehicles, but it requires large amounts of labeled data. Reinforcement learning can be trained without abundant labeled data, yet we cannot train it directly in the real world because doing so would involve many unpredictable accidents. Nevertheless, training an agent to perform well in a virtual environment is comparatively easy, so the key challenge is bridging the gap between the virtual and the real. In this paper, we propose a novel framework that combines reinforcement learning with an image semantic segmentation network to make the whole model adaptable to reality. The agent is trained in TORCS, a car racing simulator. |
Tasks | Autonomous Driving, Car Racing, Semantic Segmentation |
Published | 2018-01-13 |
URL | http://arxiv.org/abs/1801.05299v2 |
http://arxiv.org/pdf/1801.05299v2.pdf | |
PWC | https://paperswithcode.com/paper/autonomous-driving-in-reality-with |
Repo | https://github.com/SullyChen/Autopilot-TensorFlow |
Framework | tf |
BanditSum: Extractive Summarization as a Contextual Bandit
Title | BanditSum: Extractive Summarization as a Contextual Bandit |
Authors | Yue Dong, Yikang Shen, Eric Crawford, Herke van Hoof, Jackie Chi Kit Cheung |
Abstract | In this work, we propose a novel method for training neural networks to perform single-document extractive summarization without heuristically-generated extractive labels. We call our approach BanditSum as it treats extractive summarization as a contextual bandit (CB) problem, where the model receives a document to summarize (the context), and chooses a sequence of sentences to include in the summary (the action). A policy gradient reinforcement learning algorithm is used to train the model to select sequences of sentences that maximize ROUGE score. We perform a series of experiments demonstrating that BanditSum is able to achieve ROUGE scores that are better than or comparable to the state-of-the-art for extractive summarization, and converges using significantly fewer update steps than competing approaches. In addition, we show empirically that BanditSum performs significantly better than competing approaches when good summary sentences appear late in the source document. |
Tasks | |
Published | 2018-09-25 |
URL | https://arxiv.org/abs/1809.09672v3 |
https://arxiv.org/pdf/1809.09672v3.pdf | |
PWC | https://paperswithcode.com/paper/banditsum-extractive-summarization-as-a |
Repo | https://github.com/yuedongP/BanditSum |
Framework | pytorch |
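The contextual-bandit formulation in the abstract reduces, in its simplest form, to sampling a set of sentence indices from the model's affinity scores, scoring the resulting summary with ROUGE, and applying a policy-gradient (REINFORCE) update with a baseline. The sketch below illustrates that loop with a placeholder reward function; it is a simplified illustration of the training objective, not the authors' implementation.

```python
import torch

def bandit_policy_loss(sentence_scores, reward_fn, n_select=3, n_samples=4):
    """REINFORCE-style loss for extractive summarization as a contextual bandit.

    sentence_scores : tensor (num_sentences,) of unnormalised affinities
    reward_fn       : callable mapping a list of sentence indices to a scalar
                      reward (e.g. ROUGE against the reference summary)
    """
    probs = torch.softmax(sentence_scores, dim=0)
    losses, rewards = [], []
    for _ in range(n_samples):
        # Sample which sentences to include in the summary (the "action").
        idx = torch.multinomial(probs, n_select, replacement=False)
        log_prob = torch.log(probs[idx] + 1e-8).sum()
        losses.append(-log_prob)
        rewards.append(reward_fn(idx.tolist()))
    rewards = torch.tensor(rewards)
    baseline = rewards.mean()  # simple variance-reduction baseline
    # Weight each negative log-probability by its advantage.
    return torch.stack(losses).mul(rewards - baseline).mean()
```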