April 2, 2020

Paper Group ANR 182


Characterizing and Avoiding Problematic Global Optima of Variational Autoencoders

Title Characterizing and Avoiding Problematic Global Optima of Variational Autoencoders
Authors Yaniv Yacoby, Weiwei Pan, Finale Doshi-Velez
Abstract Variational Auto-encoders (VAEs) are deep generative latent variable models consisting of two components: a generative model that captures a data distribution p(x) by transforming a distribution p(z) over latent space, and an inference model that infers likely latent codes for each data point (Kingma and Welling, 2013). Recent work shows that traditional training methods tend to yield solutions that violate modeling desiderata: (1) the learned generative model captures the observed data distribution but does so while ignoring the latent codes, resulting in codes that do not represent the data (e.g., van den Oord et al. (2017); Kim et al. (2018)); (2) the aggregate of the learned latent codes does not match the prior p(z). This mismatch means that the learned generative model will be unable to generate realistic data with samples from p(z) (e.g., Makhzani et al. (2015); Tomczak and Welling (2017)). In this paper, we demonstrate that both issues stem from the fact that the global optima of the VAE training objective often correspond to undesirable solutions. Our analysis builds on two observations: (1) the generative model is unidentifiable: there exist many generative models that explain the data equally well, each with different (and potentially unwanted) properties; and (2) the VAE objective is biased: it may prefer generative models that explain the data poorly but have posteriors that are easy to approximate. We present a novel inference method, LiBI, that mitigates the problems identified in our analysis. On synthetic datasets, we show that LiBI can learn generative models that capture the data distribution and inference models that better satisfy modeling assumptions when traditional methods struggle to do so.
Tasks Latent Variable Models
Published 2020-03-17
URL https://arxiv.org/abs/2003.07756v1
PDF https://arxiv.org/pdf/2003.07756v1.pdf
PWC https://paperswithcode.com/paper/characterizing-and-avoiding-problematic
Repo
Framework
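
The bias mentioned in the abstract above can be read off the standard decomposition of the VAE objective (the evidence lower bound, ELBO). The notation below (generative parameters \theta, inference parameters \phi) is an assumption for illustration and is not taken from the paper:

```latex
\begin{aligned}
\mathrm{ELBO}(\theta, \phi; x)
  &= \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big]
     - \mathrm{KL}\big(q_\phi(z \mid x) \,\|\, p(z)\big) \\
  &= \log p_\theta(x)
     - \mathrm{KL}\big(q_\phi(z \mid x) \,\|\, p_\theta(z \mid x)\big)
\end{aligned}
```

Read this way, maximizing the ELBO rewards a high data likelihood but also penalizes generative models whose true posterior is hard for the inference model to match, which is one way to understand why a model that explains the data less well can still score higher.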

Maximal Causes for Exponential Family Observables

Title Maximal Causes for Exponential Family Observables
Authors S. Hamid Mousavi, Jakob Drefs, Florian Hirschberger, Jörg Lücke
Abstract The data model of standard sparse coding assumes a weighted linear summation of latents to determine the mean of Gaussian observation noise. However, such a linear summation of latents is often at odds with non-Gaussian observables (e.g., means of the Bernoulli distribution have to lie in the unit interval), and also in the Gaussian case it can be difficult to justify for many types of data. Alternative superposition models (i.e., links between latents and observables) have therefore been investigated repeatedly. Here we show that using the maximum instead of a linear sum to link latents to observables allows for the derivation of very general and concise parameter update equations. Concretely, we derive a set of update equations that has the same functional form for all distributions of the exponential family (given that derivatives w.r.t. their parameters can be taken). Our results consequently allow for the development of latent variable models for commonly as well as for unusually distributed data. We numerically verify our analytical result assuming standard Gaussian, Gamma, Poisson, Bernoulli and Exponential distributions and point to some potential applications.
Tasks Latent Variable Models
Published 2020-03-04
URL https://arxiv.org/abs/2003.02214v1
PDF https://arxiv.org/pdf/2003.02214v1.pdf
PWC https://paperswithcode.com/paper/maximal-causes-for-exponential-family
Repo
Framework
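
To make the contrast between the two superposition models concrete, here is a minimal Python sketch. The function names, the weight matrix and the binary latents are hypothetical and chosen only for illustration; the sketch shows the link between latents and observable means, not the paper's update equations.

```python
def linear_mean(weights, latents):
    """Standard sparse coding link: mean_d = sum_h W[d][h] * s[h]."""
    return [sum(w * s for w, s in zip(row, latents)) for row in weights]


def max_mean(weights, latents):
    """Maximum link: mean_d = max_h W[d][h] * s[h]."""
    return [max(w * s for w, s in zip(row, latents)) for row in weights]


# Hypothetical weights for two observables and two active binary latents.
W = [[0.2, 0.9],
     [0.7, 0.6]]
s = [1, 1]

lin = linear_mean(W, s)  # can leave the unit interval
mx = max_mean(W, s)      # stays within the range of the weights
```

With these values the linear sum exceeds 1 for both observables, so it cannot serve as a Bernoulli mean, whereas the maximum stays inside the unit interval whenever the weights do.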

Experimental Comparison of Global Motion Planning Algorithms for Wheeled Mobile Robots

Title Experimental Comparison of Global Motion Planning Algorithms for Wheeled Mobile Robots
Authors Eric Heiden, Luigi Palmieri, Kai O. Arras, Gaurav S. Sukhatme, Sven Koenig
Abstract Planning smooth and energy-efficient motions for wheeled mobile robots is a central task for applications ranging from autonomous driving to service and intralogistic robotics. Over the past decades, a wide variety of motion planners, steer functions and path-improvement techniques have been proposed for such non-holonomic systems. With the objective of comparing this large assortment of state-of-the-art motion-planning techniques, we introduce a novel open-source motion-planning benchmark for wheeled mobile robots, whose scenarios resemble real-world applications (such as navigating warehouses, moving in cluttered cities or parking), and propose metrics for planning efficiency and path quality. Our benchmark is easy to use and extend, and thus allows practitioners and researchers to evaluate new motion-planning algorithms, scenarios and metrics easily. We use our benchmark to highlight the strengths and weaknesses of several common state-of-the-art motion planners and provide recommendations on when they should be used.
Tasks Autonomous Driving, Motion Planning
Published 2020-03-07
URL https://arxiv.org/abs/2003.03543v1
PDF https://arxiv.org/pdf/2003.03543v1.pdf
PWC https://paperswithcode.com/paper/experimental-comparison-of-global-motion
Repo
Framework

Sub-Goal Trees – a Framework for Goal-Based Reinforcement Learning

Title Sub-Goal Trees – a Framework for Goal-Based Reinforcement Learning
Authors Tom Jurgenson, Or Avner, Edward Groshev, Aviv Tamar
Abstract Many AI problems, in robotics and other domains, are goal-based, essentially seeking trajectories leading to various goal states. Reinforcement learning (RL), building on Bellman’s optimality equation, naturally optimizes for a single goal, yet can be made multi-goal by augmenting the state with the goal. Instead, we propose a new RL framework, derived from a dynamic programming equation for the all pairs shortest path (APSP) problem, which naturally solves multi-goal queries. We show that this approach has computational benefits for both standard and approximate dynamic programming. Interestingly, our formulation prescribes a novel protocol for computing a trajectory: instead of predicting the next state given its predecessor, as in standard RL, a goal-conditioned trajectory is constructed by first predicting an intermediate state between start and goal, partitioning the trajectory into two. Intermediate points are then predicted recursively on each sub-segment until a complete trajectory is obtained. We call this trajectory structure a sub-goal tree. Building on it, we additionally extend the policy gradient methodology to recursively predict sub-goals, resulting in novel goal-based algorithms. Finally, we apply our method to neural motion planning, where we demonstrate significant improvements compared to standard RL on navigating a 7-DoF robot arm between obstacles.
Tasks Motion Planning
Published 2020-02-27
URL https://arxiv.org/abs/2002.12361v1
PDF https://arxiv.org/pdf/2002.12361v1.pdf
PWC https://paperswithcode.com/paper/sub-goal-trees-a-framework-for-goal-based
Repo
Framework
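
The recursive trajectory construction described in the abstract above can be sketched in a few lines of Python. This is only an illustration of the divide-and-conquer structure: `midpoint_predictor` is a hypothetical stand-in for a learned sub-goal predictor and simply returns the geometric midpoint.

```python
def midpoint_predictor(start, goal):
    """Hypothetical sub-goal predictor: geometric midpoint of start and goal."""
    return tuple((s + g) / 2.0 for s, g in zip(start, goal))


def sub_goal_tree(start, goal, depth, predict=midpoint_predictor):
    """Build a trajectory from start to goal by recursive bisection.

    At each level an intermediate state is predicted between the two
    endpoints, and both halves are refined until depth reaches 0.
    """
    if depth == 0:
        return [tuple(start), tuple(goal)]
    mid = predict(start, goal)
    left = sub_goal_tree(start, mid, depth - 1, predict)
    right = sub_goal_tree(mid, goal, depth - 1, predict)
    return left + right[1:]  # drop the duplicated intermediate state


trajectory = sub_goal_tree((0.0, 0.0), (4.0, 8.0), depth=2)
```

A tree of depth k yields 2^k segments, and the two recursive calls at each level are independent of each other, unlike the strictly sequential next-state prediction of standard RL.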

STAViS: Spatio-Temporal AudioVisual Saliency Network

Title STAViS: Spatio-Temporal AudioVisual Saliency Network
Authors Antigoni Tsiami, Petros Koutras, Petros Maragos
Abstract We introduce STAViS, a spatio-temporal audiovisual saliency network that combines spatio-temporal visual and auditory information in order to efficiently address the problem of saliency estimation in videos. Our approach employs a single network that combines visual saliency and auditory features and learns to appropriately localize sound sources and to fuse the two saliencies in order to obtain a final saliency map. The network has been designed, trained end-to-end, and evaluated on six different databases that contain audiovisual eye-tracking data of a large variety of videos. We compare our method against 8 different state-of-the-art visual saliency models. Evaluation results across databases indicate that our STAViS model outperforms our visual only variant as well as the other state-of-the-art models in the majority of cases. Also, the consistently good performance it achieves for all databases indicates that it is appropriate for estimating saliency “in-the-wild”.
Tasks Eye Tracking, Saliency Prediction
Published 2020-01-09
URL https://arxiv.org/abs/2001.03063v1
PDF https://arxiv.org/pdf/2001.03063v1.pdf
PWC https://paperswithcode.com/paper/stavis-spatio-temporal-audiovisual-saliency
Repo
Framework

Multi-Label Class Balancing Algorithm for Action Unit Detection

Title Multi-Label Class Balancing Algorithm for Action Unit Detection
Authors Jaspar Pahl, Ines Rieger, Dominik Seuss
Abstract Isolated facial movements, so-called Action Units, can describe combined emotions or physical states such as pain. As datasets are limited and mostly imbalanced, we present an approach incorporating a multi-label class balancing algorithm. This work is a submission to the Action Unit detection task of the Affective Behavior Analysis in-the-wild (ABAW) challenge at the IEEE Conference on Face and Gesture Recognition 2020.
Tasks Action Unit Detection, Gesture Recognition
Published 2020-02-08
URL https://arxiv.org/abs/2002.03238v1
PDF https://arxiv.org/pdf/2002.03238v1.pdf
PWC https://paperswithcode.com/paper/multi-label-class-balancing-algorithm-for
Repo
Framework

Vehicle Ego-Lane Estimation with Sensor Failure Modeling

Title Vehicle Ego-Lane Estimation with Sensor Failure Modeling
Authors Augusto Luis Ballardini, Daniele Cattaneo, Rubén Izquierdo, Ignacio Parra Alonso, Andrea Piazzoni, Miguel Ángel Sotelo, Domenico Giorgio Sorrenti
Abstract We present a probabilistic ego-lane estimation algorithm for highway-like scenarios that is designed to increase the accuracy of the ego-lane estimate, which can be obtained relying only on a noisy line detector and tracker. The contribution relies on a Hidden Markov Model (HMM) with a transient failure model. The proposed algorithm exploits the OpenStreetMap (or other cartographic services) road property lane number as the expected number of lanes and leverages consecutive, possibly incomplete, observations. The effectiveness of the algorithm is demonstrated by employing different line detectors and showing that we could achieve much more usable, i.e. stable and reliable, ego-lane estimates over more than 100 km of highway scenarios, recorded both in Italy and Spain. Moreover, as we could not find a suitable dataset for a quantitative comparison with other approaches, we collected datasets and manually annotated the ground truth for the vehicle ego-lane. These datasets are made publicly available for use by the scientific community.
Tasks
Published 2020-02-05
URL https://arxiv.org/abs/2002.01913v2
PDF https://arxiv.org/pdf/2002.01913v2.pdf
PWC https://paperswithcode.com/paper/ego-lane-estimation-by-modelling-lanes-and
Repo
Framework
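
As a rough illustration of how an HMM can absorb unreliable lane observations, the sketch below runs a forward filter over a fixed number of lanes. The transition model, the accuracy and failure probabilities, and the use of `None` for a missing detection are all assumptions made for this example; they are not the model from the paper.

```python
def transition_matrix(n_lanes, p_stay=0.9):
    """Assumed lane-change model: stay, or move to an adjacent lane."""
    T = [[0.0] * n_lanes for _ in range(n_lanes)]
    for i in range(n_lanes):
        neighbours = [j for j in (i - 1, i + 1) if 0 <= j < n_lanes]
        if not neighbours:
            T[i][i] = 1.0
            continue
        T[i][i] = p_stay
        for j in neighbours:
            T[i][j] = (1.0 - p_stay) / len(neighbours)
    return T


def emission(obs, lane, n_lanes, accuracy=0.8, p_fail=0.2):
    """Likelihood of a detected lane index given the true lane.

    With probability p_fail the detector is in a failure state and its
    output is uniform over lanes; a missing observation (None) carries
    no information.
    """
    if obs is None or n_lanes == 1:
        return 1.0
    working = accuracy if obs == lane else (1.0 - accuracy) / (n_lanes - 1)
    return (1.0 - p_fail) * working + p_fail / n_lanes


def forward_filter(observations, n_lanes):
    """Return the posterior over lanes after all observations."""
    T = transition_matrix(n_lanes)
    belief = [1.0 / n_lanes] * n_lanes
    for obs in observations:
        predicted = [sum(belief[i] * T[i][j] for i in range(n_lanes))
                     for j in range(n_lanes)]
        weighted = [predicted[j] * emission(obs, j, n_lanes)
                    for j in range(n_lanes)]
        total = sum(weighted)
        belief = [w / total for w in weighted]
    return belief


# n_lanes would come from the map (e.g. a lane-count road property).
belief = forward_filter([1, None, 1, 1, 0, 1], n_lanes=3)
```

In this example the single outlying detection of lane 0 and the missing observation do not overturn the estimate, because the failure term keeps any one observation from dominating the belief.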

A Technology-aided Multi-modal Training Approach to Assist Abdominal Palpation Training and its Assessment in Medical Education

Title A Technology-aided Multi-modal Training Approach to Assist Abdominal Palpation Training and its Assessment in Medical Education
Authors A. Asadipour, K. Debattista, V. Patel, A. Chalmers
Abstract Computer-assisted multimodal training is an effective way of learning complex motor skills in various applications. In particular disciplines (e.g., healthcare), incompetency in performing dexterous hands-on examinations (clinical palpation) may result in misdiagnosis of symptoms, serious injuries or even death. Furthermore, a high-quality clinical examination can help to exclude significant pathology, and reduce the time and cost of diagnosis by eliminating the need for unnecessary medical imaging. Medical palpation is used regularly as an effective preliminary diagnosis method all around the world, but years of training are currently required to achieve competency. This paper focuses on a multimodal palpation training system to teach and improve clinical examination skills in relation to the abdomen. It is our aim to significantly shorten the palpation training duration by increasing the frequency of rehearsals as well as providing essential augmented feedback on how to perform various abdominal palpation techniques, which have been captured and modelled from medical experts. Twenty-three first-year medical students divided into a control group (n=8), a semi-visually trained group (n=8), and a fully visually trained group (n=7) were invited to perform three palpation tasks (superficial, deep and liver). The medical students' performances were assessed using both computer-based and human-based methods, where a positive correlation was shown between the generated scores, r=.62, p(one-tailed)<.05. The visually trained group, which was shown an abstract visualisation of the applied forces and their palmar locations during each palpation examination, significantly outperformed the control group (p<.05). Moreover, a positive trend was observed between groups when visual feedback was presented, J=132, z=2.62, r=0.55.
Tasks
Published 2020-01-16
URL https://arxiv.org/abs/2001.05745v1
PDF https://arxiv.org/pdf/2001.05745v1.pdf
PWC https://paperswithcode.com/paper/a-technology-aided-multi-modal-training
Repo
Framework

Ensemble Noise Simulation to Handle Uncertainty about Gradient-based Adversarial Attacks

Title Ensemble Noise Simulation to Handle Uncertainty about Gradient-based Adversarial Attacks
Authors Rehana Mahfuz, Rajeev Sahay, Aly El Gamal
Abstract Gradient-based adversarial attacks on neural networks can be crafted in a variety of ways by varying either how the attack algorithm relies on the gradient, the network architecture used for crafting the attack, or both. Most recent work has focused on defending classifiers in a case where there is no uncertainty about the attacker’s behavior (i.e., the attacker is expected to generate a specific attack using a specific network architecture). However, if the attacker is not guaranteed to behave in a certain way, the literature lacks methods for devising a strategic defense. We fill this gap by simulating the attacker’s noisy perturbation using a variety of attack algorithms based on gradients of various classifiers. We perform our analysis using a pre-processing Denoising Autoencoder (DAE) defense that is trained with the simulated noise. We demonstrate significant improvements in post-attack accuracy, using our proposed ensemble-trained defense, compared to a situation where no effort is made to handle uncertainty.
Tasks Denoising
Published 2020-01-26
URL https://arxiv.org/abs/2001.09486v1
PDF https://arxiv.org/pdf/2001.09486v1.pdf
PWC https://paperswithcode.com/paper/ensemble-noise-simulation-to-handle
Repo
Framework
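
One way to picture the simulation of attacker noise from the gradients of several classifiers is the sketch below. It uses the fast gradient sign method (FGSM) on small logistic-regression surrogates; the choice of FGSM, the surrogate weights and all function names are assumptions for illustration and do not reproduce the paper's experimental setup.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def input_gradient(w, b, x, y):
    """Gradient of the cross-entropy loss w.r.t. the input x
    for a logistic-regression classifier with weights w and bias b."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return [(p - y) * wi for wi in w]


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(w, b, x, y, eps):
    """FGSM: step of size eps along the sign of the input gradient."""
    grad = input_gradient(w, b, x, y)
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]


def simulate_ensemble_noise(surrogates, x, y, eps):
    """Craft one adversarial example per (hypothetical) surrogate classifier."""
    return [fgsm(w, b, x, y, eps) for w, b in surrogates]


surrogates = [([2.0, -1.0], 0.0), ([-1.0, 3.0], 0.5)]
adversarial = simulate_ensemble_noise(surrogates, [1.0, 1.0], 1, eps=0.5)
```

Pairs of clean and perturbed inputs collected this way, across several surrogates and attack algorithms, could then serve as training data for a denoising defense of the kind the abstract describes.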

AriEL: volume coding for sentence generation

Title AriEL: volume coding for sentence generation
Authors Luca Celotti, Simon Brodeur, Jean Rouat
Abstract Mapping sequences of discrete data to a point in a continuous space makes it difficult to retrieve those sequences via random sampling. Mapping the input to a volume would make it easier to retrieve at test time, and that is the strategy followed by the family of approaches based on the Variational Autoencoder. However, the fact that they optimize for prediction and for smoothness of representation at the same time forces them to trade off between the two. We improve on the performance of some of the standard deep learning methods for generating sentences by uniformly sampling a continuous space. We do so by proposing AriEL, which constructs volumes in a continuous space without the need to encourage the creation of volumes through the loss function. We first benchmark on a toy grammar, which makes it possible to automatically evaluate the language learned and generated by the models. Then, we benchmark on a real dataset of human dialogues. Our results indicate that random access to the stored information is dramatically improved, and that our method AriEL is able to generate a wider variety of correct language by randomly sampling the latent space. VAE follows in performance for the toy dataset, while AE and Transformer follow for the real dataset. This partially supports the hypothesis that encoding information into volumes instead of points can lead to improved retrieval of learned information with random sampling. This can lead to better generators, and we also discuss potential disadvantages.
Tasks
Published 2020-03-30
URL https://arxiv.org/abs/2003.13600v1
PDF https://arxiv.org/pdf/2003.13600v1.pdf
PWC https://paperswithcode.com/paper/ariel-volume-coding-for-sentence-generation
Repo
Framework

Wind Speed Prediction using Deep Ensemble Learning with a Jet-like Architecture

Title Wind Speed Prediction using Deep Ensemble Learning with a Jet-like Architecture
Authors Aqsa Saeed Qureshi, Asifullah Khan, Muhammad Waleed Khan
Abstract Wind is an increasingly used renewable energy resource. Accurate and reliable forecasting of wind speed is necessary for efficient power production; however, it is not an easy task because it depends upon meteorological features of the surrounding region. Deep learning is extensively used these days for performing feature extraction. It has also been observed that the integration of several learning models, known as ensemble learning, generally gives better performance compared to a single model. The design of the wings, tail, and nose of a jet improves the aerodynamics, resulting in a smooth and controlled flight of the jet against the variations of the air currents. Inspired by the shape and working of a jet, a novel Deep Ensemble Learning using Jet-like Architecture (DEL-Jet) technique is proposed to enhance the diversity and robustness of a learning system against the variations in the input space. The diverse feature spaces of the base-regressors are exploited using the jet-like ensemble architecture. Two Convolutional Neural Networks (as jet wings) and one deep Auto-Encoder (as jet tail) are used to extract the diverse feature spaces from the input data. After that, nonlinear PCA (as jet main body) is employed to reduce the dimensionality of the extracted feature space. Finally, both the reduced and the original feature spaces are exploited to train the meta-regressor (as jet nose) for forecasting the wind speed. The performance of the proposed DEL-Jet technique is evaluated over ten independent runs and shows that the deep and jet-like architecture helps in improving the robustness and generalization of the learning system.
Tasks Time Series
Published 2020-02-28
URL https://arxiv.org/abs/2002.12592v2
PDF https://arxiv.org/pdf/2002.12592v2.pdf
PWC https://paperswithcode.com/paper/wind-speed-prediction-using-deep-ensemble
Repo
Framework

MVLoc: Multimodal Variational Geometry-Aware Learning for Visual Localization

Title MVLoc: Multimodal Variational Geometry-Aware Learning for Visual Localization
Authors Rui Zhou, Changhao Chen, Bing Wang, Andrew Markham, Niki Trigoni
Abstract Recent learning-based research has achieved impressive results in the field of single-shot camera relocalization. However, how best to fuse multiple modalities, for example, image and depth, and how to deal with degraded or missing input are less well studied. In particular, we note that previous approaches towards deep fusion do not perform significantly better than models employing a single modality. We conjecture that this is because of the naive approaches to feature space fusion through summation or concatenation which do not take into account the different strengths of each modality, specifically appearance for images and structure for depth. To address this, we propose an end-to-end framework to fuse different sensor inputs through a variational Product-of-Experts (PoE) joint encoder followed by attention-based fusion. Unlike prior work which draws a single sample from the joint encoder, we show how accuracy can be increased through importance sampling and reparameterization of the latent space. Our model is extensively evaluated on RGB-D datasets, outperforming existing baselines by a large margin.
Tasks Camera Relocalization, Visual Localization
Published 2020-03-12
URL https://arxiv.org/abs/2003.07289v1
PDF https://arxiv.org/pdf/2003.07289v1.pdf
PWC https://paperswithcode.com/paper/mvloc-multimodal-variational-geometry-aware
Repo
Framework
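
For Gaussian experts, a Product-of-Experts fusion such as the one named in the abstract above has a closed form: precisions add, and the fused mean is the precision-weighted average of the expert means. The sketch below shows this for a single latent dimension. The optional unit-Gaussian prior expert and the use of `None` for a missing modality are assumptions for illustration, not details taken from the paper.

```python
def product_of_gaussian_experts(experts, include_prior=True):
    """Fuse 1-D Gaussian experts given as (mean, variance) pairs.

    Missing modalities are passed as None and simply skipped.
    Returns the (mean, variance) of the normalized product.
    """
    # Assumed prior expert N(0, 1): precision 1, contributes 0 to the mean.
    precision = 1.0 if include_prior else 0.0
    weighted_mean = 0.0
    for expert in experts:
        if expert is None:
            continue
        mean, variance = expert
        precision += 1.0 / variance
        weighted_mean += mean / variance
    if precision == 0.0:
        raise ValueError("at least one expert is required")
    return weighted_mean / precision, 1.0 / precision


# Hypothetical image and depth experts for one latent dimension.
fused = product_of_gaussian_experts([(0.0, 1.0), (2.0, 1.0)], include_prior=False)
# Depth input missing: fusion falls back to the remaining expert.
image_only = product_of_gaussian_experts([(2.0, 0.5), None], include_prior=False)
```

Because a confident expert (small variance) dominates the product, this kind of fusion weights each modality by its own certainty, in contrast to plain summation or concatenation of features.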

Graph Inference Learning for Semi-supervised Classification

Title Graph Inference Learning for Semi-supervised Classification
Authors Chunyan Xu, Zhen Cui, Xiaobin Hong, Tong Zhang, Jian Yang, Wei Liu
Abstract In this work, we address semi-supervised classification of graph data, where the categories of those unlabeled nodes are inferred from labeled nodes as well as graph structures. Recent works often solve this problem via advanced graph convolution in a conventionally supervised manner, but the performance could degrade significantly when labeled data is scarce. To this end, we propose a Graph Inference Learning (GIL) framework to boost the performance of semi-supervised node classification by learning the inference of node labels on graph topology. To bridge the connection between two nodes, we formally define a structure relation by encapsulating node attributes, between-node paths, and local topological structures together, which can make the inference conveniently deduced from one node to another node. For learning the inference process, we further introduce meta-optimization on structure relations from training nodes to validation nodes, such that the learnt graph inference capability can be better self-adapted to testing nodes. Comprehensive evaluations on four benchmark datasets (including Cora, Citeseer, Pubmed, and NELL) demonstrate the superiority of our proposed GIL when compared against state-of-the-art methods on the semi-supervised node classification task.
Tasks Node Classification
Published 2020-01-17
URL https://arxiv.org/abs/2001.06137v1
PDF https://arxiv.org/pdf/2001.06137v1.pdf
PWC https://paperswithcode.com/paper/graph-inference-learning-for-semi-supervised-1
Repo
Framework

Prediction of adverse events in Afghanistan: regression analysis of time series data grouped not by geographic dependencies

Title Prediction of adverse events in Afghanistan: regression analysis of time series data grouped not by geographic dependencies
Authors Krzysztof Fiok, Waldemar Karwowski, Maciej Wilamowski
Abstract The aim of this study was to approach a difficult regression task on highly unbalanced data regarding the active theater of war in Afghanistan. Our focus was on predicting the number of negative events, without distinguishing their precise nature, given historical data on investment and negative events for each of 400 predefined districts of Afghanistan. In contrast with previous research on the matter, we propose an approach to the analysis of time series data that benefits from non-conventional aggregation of these territorial entities. By carrying out initial exploratory data analysis, we demonstrate that dividing data according to our proposal makes it possible to identify strong trend and seasonal components in the selected target variable. Using this approach, we also tried to estimate which investment data is most important for prediction performance. Based on our exploratory analysis and previous research, we prepared 5 sets of independent variables that were fed to 3 machine learning regression models. The results, expressed as mean absolute and mean squared errors, indicate that leveraging historical data on the target variable allows for reasonable performance; however, the other proposed independent variables do not seem to improve prediction quality.
Tasks Time Series
Published 2020-02-27
URL https://arxiv.org/abs/2002.12211v1
PDF https://arxiv.org/pdf/2002.12211v1.pdf
PWC https://paperswithcode.com/paper/prediction-of-adverse-events-in-afghanistan
Repo
Framework

Multivariate time-series modeling with generative neural networks

Title Multivariate time-series modeling with generative neural networks
Authors Marius Hofert, Avinash Prasad, Mu Zhu
Abstract Generative moment matching networks (GMMNs) are introduced as dependence models for the joint innovation distribution of multivariate time series (MTS). Following the popular copula-GARCH approach for modeling dependent MTS data, a framework allowing us to take an alternative GMMN-GARCH approach is presented. First, ARMA-GARCH models are utilized to capture the serial dependence within each univariate marginal time series. Second, if the number of marginal time series is large, principal component analysis (PCA) is used as a dimension-reduction step. Last, the remaining cross-sectional dependence is modeled via a GMMN, our main contribution. GMMNs are highly flexible and easy to simulate from, which is a major advantage over the copula-GARCH approach. Applications involving yield curve modeling and the analysis of foreign exchange rate returns are presented to demonstrate the utility of our approach, especially in terms of producing better empirical predictive distributions and making better probabilistic forecasts. All results are reproducible with the demo GMMN_MTS_paper of the R package gnn.
Tasks Dimensionality Reduction, Time Series
Published 2020-02-25
URL https://arxiv.org/abs/2002.10645v1
PDF https://arxiv.org/pdf/2002.10645v1.pdf
PWC https://paperswithcode.com/paper/multivariate-time-series-modeling-with
Repo
Framework
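
Generative moment matching networks are trained by minimizing the maximum mean discrepancy (MMD) between generated and observed samples. As a minimal sketch of that criterion, the code below computes the biased (V-statistic) estimate of the squared MMD with a Gaussian kernel in plain Python. The fixed bandwidth and the toy samples are assumptions for illustration; this is not the interface of the R package gnn mentioned in the abstract.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two points given as sequences."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of the squared MMD between samples xs and ys."""
    def mean_kernel(a, b):
        total = sum(gaussian_kernel(u, v, bandwidth) for u in a for v in b)
        return total / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


# Toy two-dimensional samples standing in for observed and generated data.
observed = [(0.0, 0.0), (1.0, 0.0)]
generated = [(3.0, 3.0), (4.0, 3.0)]

far = mmd_squared(observed, generated)
same = mmd_squared(observed, observed)
```

The estimate is close to zero when the two samples coincide and grows as they move apart, which is what makes it usable as a training loss for a sample generator.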