April 2, 2020

3255 words 16 mins read

Paper Group ANR 326

Attention-aware fusion RGB-D face recognition

Title Attention-aware fusion RGB-D face recognition
Authors Hardik Uppal, Alireza Sepas-Moghaddam, Michael Greenspan, Ali Etemad
Abstract A novel attention-aware method is proposed to fuse two image modalities, RGB and depth, for enhanced RGB-D facial recognition. The proposed method uses two attention layers, the first focused on the fused feature maps generated by convolution layers, and the second focused on the spatial features of those maps. The training database is preprocessed and augmented through a set of geometric transformations, and the learning process is further aided using transfer learning from a pure 2D RGB image training process. Comparative evaluations demonstrate that the proposed method outperforms other state-of-the-art approaches, including both traditional and deep neural network-based methods, on the challenging CurtinFaces and IIIT-D RGB-D benchmark databases, achieving classification accuracies of over 98.2% and 99.3%, respectively.
Tasks Face Recognition, Transfer Learning
Published 2020-02-29
URL https://arxiv.org/abs/2003.00168v1
PDF https://arxiv.org/pdf/2003.00168v1.pdf
PWC https://paperswithcode.com/paper/attention-aware-fusion-rgb-d-face-recognition
Repo
Framework
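
The abstract names the two attention stages but not their exact configuration, so the following is only a minimal PyTorch sketch of the general pattern: concatenate RGB and depth feature maps, gate the fused channels with a first attention stage, then gate spatial locations with a second. All layer sizes and module choices here are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Toy two-stage attention fusion for RGB + depth feature maps:
    stage 1 re-weights the fused channels, stage 2 re-weights locations."""
    def __init__(self, channels=64):
        super().__init__()
        self.channel_gate = nn.Sequential(          # attention over feature maps
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, 2 * channels, kernel_size=1),
            nn.Sigmoid(),
        )
        self.spatial_gate = nn.Sequential(          # attention over locations
            nn.Conv2d(2 * channels, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, rgb_feat, depth_feat):
        fused = torch.cat([rgb_feat, depth_feat], dim=1)  # (B, 2C, H, W)
        fused = fused * self.channel_gate(fused)          # first attention layer
        return fused * self.spatial_gate(fused)           # second attention layer

# Fuse hypothetical 64-channel feature maps from the two backbone streams.
rgb, depth = torch.randn(2, 64, 28, 28), torch.randn(2, 64, 28, 28)
out = AttentionFusion(64)(rgb, depth)                     # (2, 128, 28, 28)
```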

The Mertens Unrolled Network (MU-Net): A High Dynamic Range Fusion Neural Network for Through the Windshield Driver Recognition

Title The Mertens Unrolled Network (MU-Net): A High Dynamic Range Fusion Neural Network for Through the Windshield Driver Recognition
Authors Max Ruby, David S. Bolme, Joel Brogan, David Cornett III, Baldemar Delgado, Gavin Jager, Christi Johnson, Jose Martinez-Mendoza, Hector Santos-Villalobos, Nisha Srinivas
Abstract Face recognition of vehicle occupants through windshields in unconstrained environments poses a number of unique challenges, including glare, poor illumination, driver pose, and motion blur. In this paper, we further develop the hardware and software components of a custom vehicle imaging system to better overcome these challenges. After building out a physical prototype system that performs High Dynamic Range (HDR) imaging, we collect a small dataset of through-windshield image captures of known drivers. We then re-formulate the classical Mertens-Kautz-Van Reeth HDR fusion algorithm as a pre-initialized neural network, which we name the Mertens Unrolled Network (MU-Net), for the purpose of fine-tuning the HDR output of through-windshield images. Reconstructed faces from this novel HDR method are then evaluated and compared against other traditional and experimental HDR methods in a pre-trained state-of-the-art (SOTA) facial recognition pipeline, verifying the efficacy of our approach.
Tasks Face Recognition
Published 2020-02-27
URL https://arxiv.org/abs/2002.12257v1
PDF https://arxiv.org/pdf/2002.12257v1.pdf
PWC https://paperswithcode.com/paper/the-mertens-unrolled-network-mu-net-a-high
Repo
Framework
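
MU-Net unrolls the classical Mertens-Kautz-Van Reeth exposure fusion into network layers. The pre-initialization and fine-tuning are not reproduced here, but the classical weight maps being unrolled are standard and sketched below: per-pixel contrast, saturation, and well-exposedness scores, multiplied and normalized across the exposure stack. The single-scale blend is a simplification; the original algorithm blends with a Laplacian pyramid.

```python
import numpy as np

def mertens_weights(img, sigma=0.2):
    """Per-pixel quality weights from classical Mertens exposure fusion:
    contrast * saturation * well-exposedness.
    `img` is float RGB in [0, 1], shape (H, W, 3)."""
    gray = img.mean(axis=2)
    # Contrast: magnitude of a discrete Laplacian response.
    lap = np.abs(
        -4 * gray
        + np.roll(gray, 1, 0) + np.roll(gray, -1, 0)
        + np.roll(gray, 1, 1) + np.roll(gray, -1, 1)
    )
    saturation = img.std(axis=2)
    well_exposed = np.exp(-((img - 0.5) ** 2) / (2 * sigma**2)).prod(axis=2)
    return lap * saturation * well_exposed + 1e-12

def fuse_exposures(stack):
    """Naive single-scale fusion of an exposure stack, shape (N, H, W, 3).
    Mertens et al. blend with a Laplacian pyramid instead; MU-Net unrolls
    that pipeline into network layers and fine-tunes the weights."""
    weights = np.stack([mertens_weights(im) for im in stack])
    weights /= weights.sum(axis=0, keepdims=True)
    return (weights[..., None] * stack).sum(axis=0)
```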

Optimized Directed Roadmap Graph for Multi-Agent Path Finding Using Stochastic Gradient Descent

Title Optimized Directed Roadmap Graph for Multi-Agent Path Finding Using Stochastic Gradient Descent
Authors Christian Henkel, Marc Toussaint
Abstract We present a novel approach called Optimized Directed Roadmap Graph (ODRM). It is a method to build a directed roadmap graph that allows for collision avoidance in multi-robot navigation. This is a highly relevant problem, for example for industrial autonomous guided vehicles. The core idea of ODRM is that a directed roadmap can encode inherent properties of the environment which are useful when agents have to avoid each other in that same environment. Like Probabilistic Roadmaps (PRMs), ODRM's first step is generating samples from C-space. In a second step, ODRM optimizes vertex positions and edge directions by Stochastic Gradient Descent (SGD). This leads to emergent properties like edges parallel to walls and patterns similar to two-lane streets or roundabouts. Agents can then navigate on this graph by searching their paths independently and resolving agent-agent collisions at run-time. On the graphs generated by ODRM, significantly fewer agent-agent collisions occur than on a non-optimized graph. We evaluate our roadmap with both centralized and decentralized planners. Our experiments show that with ODRM even a simple centralized planner can solve problems with high numbers of agents that other multi-agent planners cannot solve. Additionally, we use simulated robots with decentralized planners and online collision avoidance to show that agents navigate much faster on our roadmap than on standard grid maps.
Tasks Multi-Agent Path Finding, Robot Navigation
Published 2020-03-29
URL https://arxiv.org/abs/2003.12924v1
PDF https://arxiv.org/pdf/2003.12924v1.pdf
PWC https://paperswithcode.com/paper/optimized-directed-roadmap-graph-for-multi
Repo
Framework
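
The abstract describes the two-step recipe (sample C-space, then run SGD on vertex positions and edge directions) without giving the objective, so the loop below is purely illustrative: it nudges the vertex serving a random start-goal query toward the straight-line connection, which is one way emergent "lane" structure can arise. Collision checking, edge orientation, and the paper's actual loss are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1: sample roadmap vertices from C-space (here: the unit square,
# with collision checking omitted).
V = rng.uniform(0.0, 1.0, size=(200, 2))

# Step 2: SGD over random start/goal queries. The vertex serving a query
# is pulled toward the straight start-goal line, so frequently used
# routes straighten out over time.
lr = 0.05
for _ in range(2000):
    start, goal = rng.uniform(0.0, 1.0, size=(2, 2))
    mid = (start + goal) / 2.0
    v = int(np.argmin(np.linalg.norm(V - mid, axis=1)))  # serving vertex
    d = goal - start
    d /= np.linalg.norm(d) + 1e-9
    proj = start + d * ((V[v] - start) @ d)   # closest point on the line
    V[v] -= lr * (V[v] - proj)                # stochastic gradient step
```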

Visual Navigation Among Humans with Optimal Control as a Supervisor

Title Visual Navigation Among Humans with Optimal Control as a Supervisor
Authors Varun Tolani, Somil Bansal, Aleksandra Faust, Claire Tomlin
Abstract Real-world navigation requires robots to operate in unfamiliar, dynamic environments, sharing spaces with humans. Navigating around humans is especially difficult because it requires predicting their future motion, which can be quite challenging. We propose a novel framework for navigation around humans which combines learning-based perception with model-based optimal control. Specifically, we train a Convolutional Neural Network (CNN)-based perception module which maps the robot's visual inputs to a waypoint, or next desired state. This waypoint is then input into planning and control modules which convey the robot safely and efficiently to the goal. To train the CNN, we contribute a photo-realistic benchmarking dataset for autonomous robot navigation in the presence of humans. The CNN is trained using supervised learning on images rendered from our photo-realistic dataset. The proposed framework learns to anticipate and react to people's motion based only on a monocular RGB image, without explicitly predicting future human motion. Our method generalizes well to unseen buildings and humans in both simulation and real-world environments. Furthermore, our experiments demonstrate that combining model-based control and learning leads to better and more data-efficient navigational behaviors than a purely learning-based approach. Videos describing our approach and experiments are available on the project website.
Tasks Robot Navigation, Visual Navigation
Published 2020-03-20
URL https://arxiv.org/abs/2003.09354v1
PDF https://arxiv.org/pdf/2003.09354v1.pdf
PWC https://paperswithcode.com/paper/visual-navigation-among-humans-with-optimal
Repo
Framework
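
A minimal sketch of the modular pipeline the abstract describes: a CNN maps a monocular RGB image to a waypoint, and a model-based layer then drives the robot toward it. The backbone below is an arbitrary stand-in, and the proportional controller is a placeholder for the paper's planning and control modules.

```python
import math
import torch
import torch.nn as nn

class WaypointCNN(nn.Module):
    """Toy perception module: monocular RGB -> waypoint (x, y, theta).
    The architecture is an arbitrary stand-in for the paper's CNN, which
    is supervised by a model-based optimal controller."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(64, 3)  # waypoint in the robot frame

    def forward(self, rgb):
        return self.head(self.backbone(rgb))

def control_step(state, waypoint, k_v=0.5, k_w=1.0):
    """Placeholder for the model-based planning/control modules: a
    proportional controller steering toward the predicted waypoint."""
    dx, dy = waypoint[0] - state[0], waypoint[1] - state[1]
    v = k_v * math.hypot(dx, dy)                 # forward velocity
    w = k_w * (math.atan2(dy, dx) - state[2])    # angular velocity
    return v, w
```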

Multiplicative Controller Fusion: A Hybrid Navigation Strategy For Deployment in Unknown Environments

Title Multiplicative Controller Fusion: A Hybrid Navigation Strategy For Deployment in Unknown Environments
Authors Krishan Rana, Vibhavari Dasagi, Ben Talbot, Michael Milford, Niko Sünderhauf
Abstract Learning-based approaches often outperform hand-coded algorithmic solutions for many problems in robotics. However, learning long-horizon tasks on real robot hardware can be intractable, and transferring a learned policy from simulation to reality is still extremely challenging. We present a novel approach to model-free reinforcement learning that can leverage existing sub-optimal solutions as an algorithmic prior during training and deployment. During training, our gated fusion approach enables the prior to guide the initial stages of exploration, increasing sample efficiency and enabling learning from sparse long-horizon reward signals. Importantly, the policy can learn to improve beyond the performance of the sub-optimal prior, since the prior's influence is annealed gradually. During deployment, the policy's uncertainty provides a reliable strategy for transferring a simulation-trained policy to the real world by falling back to the prior controller in uncertain states. We show the efficacy of our Multiplicative Controller Fusion approach on the task of robot navigation and demonstrate safe transfer from simulation to the real world without any fine-tuning. The code for this project is made publicly available at https://sites.google.com/view/mcf-nav/home.
Tasks Robot Navigation
Published 2020-03-11
URL https://arxiv.org/abs/2003.05117v2
PDF https://arxiv.org/pdf/2003.05117v2.pdf
PWC https://paperswithcode.com/paper/multiplicative-controller-fusion-a-hybrid
Repo
Framework
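
The fusion at the heart of the method multiplies the prior controller's action distribution with the learned policy's. For Gaussian distributions this product has a closed form, which is enough to see the deployment behaviour the abstract describes: when the policy is uncertain (high variance), the fused action falls back toward the prior. A 1-D sketch, with the annealing schedule and the full RL training loop omitted:

```python
def fuse_gaussians(mu_prior, var_prior, mu_policy, var_policy):
    """Multiplicative fusion of two Gaussian action distributions.
    The product of two Gaussian densities is (up to normalization) a
    Gaussian whose precision is the sum of the input precisions."""
    prec = 1.0 / var_prior + 1.0 / var_policy
    var = 1.0 / prec
    mu = var * (mu_prior / var_prior + mu_policy / var_policy)
    return mu, var

# A confident prior and an uncertain learned policy: the fused action
# stays close to the prior, the safe fallback behaviour in uncertain states.
mu, var = fuse_gaussians(0.0, 0.01, 0.8, 1.0)
print(mu, var)  # mu ~ 0.008: dominated by the low-variance prior
```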

Learning View and Target Invariant Visual Servoing for Navigation

Title Learning View and Target Invariant Visual Servoing for Navigation
Authors Yimeng Li, Jana Kosecka
Abstract The advances in deep reinforcement learning have recently revived interest in data-driven, learning-based approaches to navigation. In this paper we propose to learn viewpoint-invariant and target-invariant visual servoing for local mobile robot navigation; given an initial view and the goal view or an image of a target, we train a deep convolutional network controller to reach the desired goal. We present a new architecture for this task which rests on the ability to establish correspondences between the initial and goal views, and on a novel reward structure motivated by the traditional feedback-control error. The advantage of the proposed model is that it requires neither calibration nor depth information and achieves robust visual servoing across a variety of environments and targets without any parameter fine-tuning. We present a comprehensive evaluation of the approach and a comparison with other deep learning architectures, as well as classical visual servoing methods, in a visually realistic simulation environment. The presented model overcomes the brittleness of classical visual servoing methods and achieves significantly higher generalization capability than previous learning approaches.
Tasks Calibration, Robot Navigation
Published 2020-03-04
URL https://arxiv.org/abs/2003.02327v1
PDF https://arxiv.org/pdf/2003.02327v1.pdf
PWC https://paperswithcode.com/paper/learning-view-and-target-invariant-visual
Repo
Framework
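
The reward structure is described as motivated by the traditional feedback-control error. A hedged one-function sketch of that idea: the reward is the negative alignment error between corresponding points in the current and goal views (how the paper establishes and weights correspondences inside the network is not specified in the abstract).

```python
import numpy as np

def servo_reward(curr_kp, goal_kp):
    """Negative mean pixel error between matched keypoints in the current
    and goal views; rises toward 0 as the views align. Both arguments are
    (N, 2) arrays of corresponding image points."""
    return -float(np.linalg.norm(curr_kp - goal_kp, axis=1).mean())

# servo_reward(curr, goal) == 0 exactly when every correspondence is aligned.
```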

DISCO: Double Likelihood-free Inference Stochastic Control

Title DISCO: Double Likelihood-free Inference Stochastic Control
Authors Lucas Barcelos, Rafael Oliveira, Rafael Possas, Lionel Ott, Fabio Ramos
Abstract Accurate simulation of complex physical systems enables the development, testing, and certification of control strategies before they are deployed into the real systems. As simulators become more advanced, the analytical tractability of the differential equations and associated numerical solvers incorporated in the simulations diminishes, making them difficult to analyse. A potential solution is the use of probabilistic inference to assess the uncertainty of the simulation parameters given real observations of the system. Unfortunately, the likelihood function required for inference is generally expensive to compute or totally intractable. In this paper we propose to leverage the power of modern simulators and recent techniques in Bayesian statistics for likelihood-free inference to design a control framework that is efficient and robust with respect to the uncertainty over simulation parameters. The posterior distribution over simulation parameters is propagated through a potentially non-analytical model of the system with the unscented transform and a variant of information-theoretic model predictive control. This approach provides a more efficient way to evaluate trajectory rollouts than Monte Carlo sampling, reducing the online computational burden. Experiments show that the proposed controller attains superior performance and robustness on classical control and robotics tasks when compared to models that do not account for uncertainty over model parameters.
Tasks
Published 2020-02-18
URL https://arxiv.org/abs/2002.07379v2
PDF https://arxiv.org/pdf/2002.07379v2.pdf
PWC https://paperswithcode.com/paper/disco-double-likelihood-free-inference
Repo
Framework
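
The unscented transform that propagates the parameter posterior is standard, and it is the key to the efficiency claim: the expected rollout cost is evaluated at 2n+1 deterministic sigma points rather than with Monte Carlo sampling. A sketch, with a hypothetical stand-in for the MPPI rollout cost:

```python
import numpy as np

def unscented_points(mu, cov, kappa=1.0):
    """Standard sigma points for the unscented transform of N(mu, cov)."""
    n = mu.shape[0]
    S = np.linalg.cholesky((n + kappa) * cov)
    pts = [mu] + [mu + S[:, i] for i in range(n)] + [mu - S[:, i] for i in range(n)]
    w0 = kappa / (n + kappa)
    wi = 1.0 / (2 * (n + kappa))
    return np.array(pts), np.array([w0] + [wi] * (2 * n))

def rollout_cost(theta):
    """Hypothetical cost model standing in for an MPPI rollout evaluation
    under simulation parameters theta (e.g. a friction coefficient)."""
    return float((theta**2).sum())

# Expected cost under the parameter posterior from 3 evaluations instead
# of thousands of Monte Carlo samples, the efficiency gain the abstract cites.
pts, w = unscented_points(np.array([0.5]), np.array([[0.04]]))
expected_cost = sum(wi * rollout_cost(p) for wi, p in zip(w, pts))
```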

Deep Representation Learning on Long-tailed Data: A Learnable Embedding Augmentation Perspective

Title Deep Representation Learning on Long-tailed Data: A Learnable Embedding Augmentation Perspective
Authors Jialun Liu, Yifan Sun, Chuchu Han, Zhaopeng Dou, Wenhui Li
Abstract This paper considers learning deep features from long-tailed data. We observe that in the deep feature space, the head classes and the tail classes present different distribution patterns. The head classes have a relatively large spatial span, while the tail classes have a significantly smaller spatial span, due to the lack of intra-class diversity. This uneven distribution between head and tail classes distorts the overall feature space, which compromises the discriminative ability of the learned features. Intuitively, we seek to expand the distribution of the tail classes by transferring from the head classes, so as to alleviate the distortion of the feature space. To this end, we propose to construct each feature into a “feature cloud”. If a sample belongs to a tail class, the corresponding feature cloud will have a relatively large distribution range, compensating for its lack of diversity. This allows each tail sample to push samples from other classes far away, recovering the intra-class diversity of the tail classes. Extensive experimental evaluations on person re-identification and face recognition tasks confirm the effectiveness of our method.
Tasks Face Recognition, Person Re-Identification, Representation Learning
Published 2020-02-25
URL https://arxiv.org/abs/2002.10826v2
PDF https://arxiv.org/pdf/2002.10826v2.pdf
PWC https://paperswithcode.com/paper/deep-representation-learning-on-long-tailed
Repo
Framework
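
A minimal sketch of the "feature cloud" idea: perturb each feature with noise whose scale grows as its class gets rarer, so tail classes occupy a wider region of feature space. The count-based scale below is an assumption; the paper instead transfers the distribution range estimated from the head classes.

```python
import torch

def feature_cloud(features, labels, class_counts, alpha=0.5):
    """Toy 'feature cloud' augmentation: Gaussian-perturb each feature
    with a scale that grows as its class gets rarer, widening the
    region tail-class samples occupy in feature space."""
    counts = class_counts[labels].float()      # per-sample class size
    scale = alpha / counts.sqrt()              # rarer class => larger cloud
    return features + torch.randn_like(features) * scale.unsqueeze(1)

# Usage: features (B, D), labels (B,), class_counts (num_classes,)
f = torch.randn(4, 128)
y = torch.tensor([0, 0, 1, 2])
n = torch.tensor([5000, 40, 12])               # head vs. tail class sizes
f_aug = feature_cloud(f, y, n)                 # tail samples get wider clouds
```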

Deep Learning for Content-based Personalized Viewport Prediction of 360-Degree VR Videos

Title Deep Learning for Content-based Personalized Viewport Prediction of 360-Degree VR Videos
Authors Xinwei Chen, Ali Taleb Zadeh Kasgari, Walid Saad
Abstract In this paper, the problem of head movement prediction for virtual reality videos is studied. In the considered model, a deep learning network is introduced to leverage position data as well as video frame content to predict future head movement. To optimize the data input to this neural network, the effects of data sample rate, input data reduction, and long-period prediction lengths are also explored. Simulation results show that the proposed approach yields a 16.1% improvement in prediction accuracy over a baseline approach that relies only on position data.
Tasks
Published 2020-03-01
URL https://arxiv.org/abs/2003.00429v1
PDF https://arxiv.org/pdf/2003.00429v1.pdf
PWC https://paperswithcode.com/paper/deep-learning-for-content-based-personalized
Repo
Framework
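
A toy two-branch model in the spirit of the abstract: a recurrent branch over past head positions and a convolutional branch over the current frame content, fused to regress the next head orientation. Branch sizes and the fusion scheme are assumptions; the abstract states only that position data and frame content are combined.

```python
import torch
import torch.nn as nn

class ViewportPredictor(nn.Module):
    """Toy content-plus-position viewport predictor: LSTM over past head
    positions, small CNN over the current frame, concatenated features."""
    def __init__(self):
        super().__init__()
        self.pos_rnn = nn.LSTM(input_size=3, hidden_size=32, batch_first=True)
        self.frame_cnn = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(32 + 16, 3)  # next (yaw, pitch, roll)

    def forward(self, positions, frame):
        _, (h, _) = self.pos_rnn(positions)      # positions: (B, T, 3)
        f = self.frame_cnn(frame)                # frame: (B, 3, H, W)
        return self.head(torch.cat([h[-1], f], dim=1))
```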

Submodular Maximization Through Barrier Functions

Title Submodular Maximization Through Barrier Functions
Authors Ashwinkumar Badanidiyuru, Amin Karbasi, Ehsan Kazemi, Jan Vondrak
Abstract In this paper, we introduce a novel technique for constrained submodular maximization, inspired by barrier functions in continuous optimization. This connection not only improves the running time for constrained submodular maximization but also provides a state-of-the-art guarantee. More precisely, for maximizing a monotone submodular function subject to the combination of a $k$-matchoid and $\ell$-knapsack constraints (for $\ell\leq k$), we propose a potential function that can be approximately minimized. Once we minimize the potential function up to an $\epsilon$ error, it is guaranteed that we have found a feasible set with a $2(k+1+\epsilon)$-approximation factor, which can indeed be further improved to $(k+1+\epsilon)$ by an enumeration technique. We extensively evaluate the performance of our proposed algorithm over several real-world applications, including a movie recommendation system, summarization tasks for YouTube videos, Twitter feeds and Yelp business locations, and a set cover problem.
Tasks
Published 2020-02-10
URL https://arxiv.org/abs/2002.03523v1
PDF https://arxiv.org/pdf/2002.03523v1.pdf
PWC https://paperswithcode.com/paper/submodular-maximization-through-barrier
Repo
Framework
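
For intuition only, here is a toy greedy selector for a single knapsack constraint where a log barrier on the remaining capacity penalizes near-infeasible picks; the paper's actual potential function over a k-matchoid plus multiple knapsacks, and its approximation guarantees, are considerably more involved.

```python
import numpy as np

def barrier_greedy(gain, cost, budget, mu=0.1):
    """Illustrative toy, not the paper's algorithm: greedy selection for a
    monotone submodular objective under one knapsack, with a log barrier
    on the remaining capacity discouraging near-infeasible picks.
    `gain(S, e)` returns the marginal gain of element e given set S."""
    n, S, used = len(cost), [], 0.0
    while True:
        best, best_val = None, -np.inf
        for e in range(n):
            if e in S or used + cost[e] > budget:
                continue  # infeasible: the barrier would be minus infinity
            slack = budget - used - cost[e]
            val = gain(S, e) + mu * np.log(slack + 1e-9)  # barrier term
            if val > best_val:
                best, best_val = e, val
        if best is None or gain(S, best) <= 0:
            return S
        S.append(best)
        used += cost[best]

# Example: marginal coverage gain of candidate sets under a budget of 3.
sets, cost = [{1, 2}, {2, 3}, {3, 4, 5}], [1.0, 1.0, 2.0]
gain = lambda S, e: len(sets[e] - set().union(*(sets[i] for i in S)))
print(barrier_greedy(gain, cost, budget=3.0))  # -> [2, 0]
```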

UGRWO-Sampling: A modified random walk under-sampling approach based on graphs to imbalanced data classification

Title UGRWO-Sampling: A modified random walk under-sampling approach based on graphs to imbalanced data classification
Authors Saeideh Roshanfekr, Shahriar Esmaeili, Hassan Ataeian, Ali Amiri
Abstract In this paper, we propose a new RWO-Sampling (Random Walk Over-Sampling) method based on graphs for imbalanced datasets. In this method, two graphs, supporting under-sampling and over-sampling respectively, are introduced to preserve proximity information, which is robust to noise and outliers. After the first graph is constructed on the minority class, RWO-Sampling is applied to selected samples, while the rest remain unchanged. The second graph is constructed on the majority class, and samples in low-density areas (outliers) are removed; that is, majority-class examples in high-density areas are retained and the rest are eliminated. Furthermore, with RWO-Sampling the boundary of the minority class is expanded without amplifying outliers. The method is tested, and several evaluation measures are compared against previous methods on nine continuous-attribute datasets with different over-sampling rates. The experimental results indicate the high efficiency and flexibility of the proposed method for the classification of imbalanced data.
Tasks
Published 2020-02-10
URL https://arxiv.org/abs/2002.03521v2
PDF https://arxiv.org/pdf/2002.03521v2.pdf
PWC https://paperswithcode.com/paper/ugrwo-sampling-a-modified-random-walk-under
Repo
Framework
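
A simplified sketch of the two graph-based steps: random-walk over-sampling on a kNN graph of the minority class, and removal of low-density (outlier) majority samples. The interpolation step standing in for the RWO generation rule, and the density criterion, are simplifications of the paper's construction.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def rwo_oversample(X_min, n_new, k=5, rng=None):
    """Toy random-walk over-sampling on a kNN graph of the minority class:
    step from a random minority point to one of its graph neighbours and
    emit an interpolated point along the way."""
    rng = rng or np.random.default_rng(0)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)            # idx[:, 0] is the point itself
    out = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        j = idx[i, rng.integers(1, k + 1)]   # random walk step to a neighbour
        t = rng.random()
        out.append(X_min[i] + t * (X_min[j] - X_min[i]))
    return np.asarray(out)

def prune_low_density(X_maj, k=5, quantile=0.1):
    """Drop majority samples in low-density areas (distance to the k-th
    neighbour above the given quantile), mirroring the under-sampling graph."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_maj)
    dist, _ = nn.kneighbors(X_maj)
    keep = dist[:, -1] <= np.quantile(dist[:, -1], 1 - quantile)
    return X_maj[keep]
```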

Uncertainty-based Modulation for Lifelong Learning

Title Uncertainty-based Modulation for Lifelong Learning
Authors Andrew Brna, Ryan Brown, Patrick Connolly, Stephen Simons, Renee Shimizu, Mario Aguilar-Simon
Abstract The creation of machine learning algorithms for intelligent agents capable of continuous, lifelong learning is a critical objective for algorithms being deployed on real-life systems in dynamic environments. Here we present an algorithm inspired by neuromodulatory mechanisms in the human brain that integrates and expands upon Stephen Grossberg's ground-breaking Adaptive Resonance Theory proposals. Specifically, it builds on the concept of uncertainty, and employs a series of neuromodulatory mechanisms to enable continuous learning, including self-supervised and one-shot learning. Algorithm components were evaluated in a series of benchmark experiments that demonstrate stable learning without catastrophic forgetting. We also demonstrate the critical role of developing these systems in a closed-loop manner, where the environment and the agent's behaviors constrain and guide the learning process. To this end, we integrated the algorithm into an embodied simulated drone agent. The experiments show that the algorithm is capable of continuously learning new tasks, and learning under changed conditions, with high classification accuracy (greater than 94 percent) in a virtual environment, without catastrophic forgetting. The algorithm accepts high-dimensional inputs from any state-of-the-art detection and feature extraction algorithm, making it a flexible addition to existing systems. We also describe future development efforts focused on imbuing the algorithm with mechanisms to seek out new knowledge as well as employ a broader range of neuromodulatory processes.
Tasks One-Shot Learning
Published 2020-01-27
URL https://arxiv.org/abs/2001.09822v1
PDF https://arxiv.org/pdf/2001.09822v1.pdf
PWC https://paperswithcode.com/paper/uncertainty-based-modulation-for-lifelong
Repo
Framework
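
A heavily reduced sketch of the ART-style mechanism that underlies the stability claims: an input either resonates with (and refines) an existing category prototype or creates a new one, so previously learned categories are never overwritten. The uncertainty-based neuromodulation is reduced here to raising vigilance when the top two matches are close; the similarity measure and all thresholds are assumptions.

```python
import numpy as np

class SimpleART:
    """Minimal ART-style learner: an input resonates with the closest
    prototype if similarity exceeds a vigilance threshold, otherwise a
    new category is created, so old categories are never overwritten."""
    def __init__(self, vigilance=0.75, lr=0.5):
        self.protos, self.rho, self.lr = [], vigilance, lr

    def learn(self, x):
        if not self.protos:
            self.protos.append(x.copy())
            return 0
        sims = [1.0 - np.linalg.norm(x - p) / (np.linalg.norm(x) + 1e-9)
                for p in self.protos]
        order = np.argsort(sims)[::-1]
        rho = self.rho
        if len(sims) > 1 and sims[order[0]] - sims[order[1]] < 0.05:
            rho += 0.1                       # uncertain input: raise vigilance
        best = int(order[0])
        if sims[best] >= rho:                # resonance: refine the prototype
            self.protos[best] += self.lr * (x - self.protos[best])
            return best
        self.protos.append(x.copy())         # mismatch: one-shot new category
        return len(self.protos) - 1
```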

HyperSched: Dynamic Resource Reallocation for Model Development on a Deadline

Title HyperSched: Dynamic Resource Reallocation for Model Development on a Deadline
Authors Richard Liaw, Romil Bhardwaj, Lisa Dunlap, Yitian Zou, Joseph Gonzalez, Ion Stoica, Alexey Tumanov
Abstract Prior research in resource scheduling for machine learning training workloads has largely focused on minimizing job completion times. Commonly, these model training workloads collectively search over a large number of parameter values that control the learning process in a hyperparameter search. It is preferable to identify and maximally provision the best-performing hyperparameter configuration (trial) to achieve the highest accuracy result as soon as possible. To optimally trade off evaluating multiple configurations and training the most promising ones by a fixed deadline, we design and build HyperSched, a dynamic application-level resource scheduler that tracks, identifies, and preferentially allocates resources to the best-performing trials to maximize accuracy by the deadline. HyperSched leverages three properties of a hyperparameter search workload overlooked in prior work (trial disposability, progressively identifiable rankings among different configurations, and space-time constraints) to outperform standard hyperparameter search algorithms across a variety of benchmarks.
Tasks
Published 2020-01-08
URL https://arxiv.org/abs/2001.02338v1
PDF https://arxiv.org/pdf/2001.02338v1.pdf
PWC https://paperswithcode.com/paper/hypersched-dynamic-resource-reallocation-for
Repo
Framework
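
A toy allocation policy in the spirit of the scheduler: spread resources early, then concentrate them on the current leaders as the deadline nears, exploiting trial disposability and progressively identifiable rankings. The real scheduler's policy is more sophisticated; all names and numbers here are illustrative.

```python
def allocate(trials, total_gpus, time_left, horizon):
    """Deadline-aware toy allocation: the fraction of the horizon that
    remains decides how many trials survive; survivors split the GPUs.
    `trials` maps trial id -> current validation score."""
    ranked = sorted(trials, key=trials.get, reverse=True)
    n_keep = max(1, round(len(ranked) * time_left / horizon))
    survivors = ranked[:n_keep]               # disposable trials are dropped
    base, extra = divmod(total_gpus, len(survivors))
    return {t: base + (1 if i < extra else 0) for i, t in enumerate(survivors)}

# Example: 8 GPUs, 4 trials, 25% of the time remaining -> back the leader.
print(allocate({"a": 0.91, "b": 0.88, "c": 0.70, "d": 0.65}, 8, 1.0, 4.0))
```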

Architecture Disentanglement for Deep Neural Networks

Title Architecture Disentanglement for Deep Neural Networks
Authors Jie Hu, Rongrong Ji, Qixiang Ye, Tong Tong, ShengChuan Zhang, Ke Li, Feiyue Huang, Ling Shao
Abstract Deep Neural Networks (DNNs) are central to deep learning, and understanding their internal working mechanism is crucial if they are to be used for emerging applications in medical and industrial AI. To this end, the current line of research typically involves linking semantic concepts to a DNN’s units or layers. However, this fails to capture the hierarchical inference procedure throughout the network. To address this issue, we introduce the novel concept of Neural Architecture Disentanglement (NAD) in this paper. Specifically, we disentangle a pre-trained network into hierarchical paths corresponding to specific concepts, forming the concept feature paths, i.e., the concept flows from the bottom to top layers of a DNN. Such paths further enable us to quantify the interpretability of DNNs according to the learned diversity of human concepts. We select four types of representative architectures ranging from handcrafted to autoML-based, and conduct extensive experiments on object-based and scene-based datasets. Our NAD sheds important light on the information flow of semantic concepts in DNNs, and provides a fundamental metric that will facilitate the design of interpretable network architectures. Code will be available at: https://github.com/hujiecpp/NAD.
Tasks AutoML
Published 2020-03-30
URL https://arxiv.org/abs/2003.13268v1
PDF https://arxiv.org/pdf/2003.13268v1.pdf
PWC https://paperswithcode.com/paper/architecture-disentanglement-for-deep-neural
Repo
Framework
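
A loose sketch of extracting one "concept feature path": run images of a single concept through the layers and keep the most active units at each depth. The paper selects paths by optimization rather than this activation-ranking shortcut, so treat the code as illustrative only.

```python
import torch

def concept_path(model_layers, concept_batch, units_per_layer=5):
    """Toy 'concept feature path' extraction: per layer, keep the channels
    with the highest mean activation over a batch of one concept's images.
    Assumes each layer outputs a 4-D (B, C, H, W) feature map."""
    path, x = [], concept_batch
    with torch.no_grad():
        for layer in model_layers:
            x = layer(x)
            score = x.mean(dim=(0, 2, 3))   # mean activation per channel
            path.append(torch.topk(score, units_per_layer).indices.tolist())
    return path  # one list of unit indices per layer, bottom to top
```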

A MEMS-based Foveating LIDAR to enable Real-time Adaptive Depth Sensing

Title A MEMS-based Foveating LIDAR to enable Real-time Adaptive Depth Sensing
Authors Francesco Pittaluga, Zaid Tasneem, Justin Folden, Brevin Tilmon, Ayan Chakrabarti, Sanjeev J. Koppal
Abstract Most active depth sensors sample their visual field using a fixed pattern, decided by accuracy, speed and cost trade-offs, rather than scene content. However, a number of recent works have demonstrated that adapting measurement patterns to scene content can offer significantly better trade-offs. We propose a hardware LIDAR design that allows flexible real-time measurements according to dynamically specified measurement patterns. Our flexible depth sensor design consists of a controllable scanning LIDAR that can foveate, or increase resolution in regions of interest, and that can fully leverage the power of adaptive depth sensing. We describe our optical setup and calibration, which enables fast sparse depth measurements using a scanning MEMS (micro-electro-mechanical) mirror. We validate the efficacy of our prototype LIDAR design by testing on over 75 static and dynamic scenes spanning a range of environments. We also show CNN-based depth-map completion of sparse measurements obtained by our sensor. Our experiments show that our sensor can realize adaptive depth sensing systems.
Tasks Calibration
Published 2020-03-21
URL https://arxiv.org/abs/2003.09545v1
PDF https://arxiv.org/pdf/2003.09545v1.pdf
PWC https://paperswithcode.com/paper/a-mems-based-foveating-lidar-to-enable-real
Repo
Framework
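
The sampling logic behind foveation can be sketched independently of the MEMS hardware: draw scan directions with probability proportional to a region-of-interest mask, plus a uniform floor so the periphery stays sparsely covered. The calibration mapping pixels to mirror angles (a core contribution of the paper) is omitted.

```python
import numpy as np

def foveated_scan_pattern(roi_mask, n_samples, base_weight=0.1, rng=None):
    """Toy foveation: sample scan locations with probability proportional
    to a region-of-interest mask plus a uniform floor, so resolution
    increases inside the ROI while the periphery stays sparsely covered."""
    rng = rng or np.random.default_rng(0)
    h, w = roi_mask.shape
    p = roi_mask.astype(float) + base_weight
    p = (p / p.sum()).ravel()
    flat = rng.choice(h * w, size=n_samples, replace=False, p=p)
    return np.stack(np.unravel_index(flat, (h, w)), axis=1)  # (n, 2) pixels

# Example: densify measurements inside a detected region of interest.
mask = np.zeros((60, 80))
mask[20:40, 30:50] = 1.0
pts = foveated_scan_pattern(mask, 500)
```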