Paper Group ANR 326
Papers in this group:

- Attention-aware fusion RGB-D face recognition
- The Mertens Unrolled Network (MU-Net): A High Dynamic Range Fusion Neural Network for Through the Windshield Driver Recognition
- Optimized Directed Roadmap Graph for Multi-Agent Path Finding Using Stochastic Gradient Descent
- Visual Navigation Among Humans with Optimal Control as a Supervisor
- Multiplicative Controller Fusion: A Hybrid Navigation Strategy For Deployment in Unknown Environments
- Learning View and Target Invariant Visual Servoing for Navigation
- DISCO: Double Likelihood-free Inference Stochastic Control
- Deep Representation Learning on Long-tailed Data: A Learnable Embedding Augmentation Perspective
- Deep Learning for Content-based Personalized Viewport Prediction of 360-Degree VR Videos
- Submodular Maximization Through Barrier Functions
- UGRWO-Sampling: A modified random walk under-sampling approach based on graphs to imbalanced data classification
- Uncertainty-based Modulation for Lifelong Learning
- HyperSched: Dynamic Resource Reallocation for Model Development on a Deadline
- Architecture Disentanglement for Deep Neural Networks
- A MEMS-based Foveating LIDAR to enable Real-time Adaptive Depth Sensing
Attention-aware fusion RGB-D face recognition
Title | Attention-aware fusion RGB-D face recognition |
Authors | Hardik Uppal, Alireza Sepas-Moghaddam, Michael Greenspan, Ali Etemad |
Abstract | A novel attention-aware method is proposed to fuse two image modalities, RGB and depth, for enhanced RGB-D facial recognition. The proposed method uses two attention layers, the first focused on the fused feature maps generated by convolution layers, and the second focused on the spatial features of those maps. The training database is preprocessed and augmented through a set of geometric transformations, and the learning process is further aided using transfer learning from a pure 2D RGB image training process. Comparative evaluations demonstrate that the proposed method outperforms other state-of-the-art approaches, including both traditional and deep neural network-based methods, on the challenging CurtinFaces and IIIT-D RGB-D benchmark databases, achieving classification accuracies over 98.2% and 99.3% respectively. |
Tasks | Face Recognition, Transfer Learning |
Published | 2020-02-29 |
URL | https://arxiv.org/abs/2003.00168v1 |
PDF | https://arxiv.org/pdf/2003.00168v1.pdf |
PWC | https://paperswithcode.com/paper/attention-aware-fusion-rgb-d-face-recognition |
Repo | |
Framework | |
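For intuition, here is a minimal PyTorch sketch of the two-stage attention fusion described in the abstract above: channel attention over the fused convolutional feature maps, followed by spatial attention. The module name and all layer sizes are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Toy two-stage attention fusion for RGB-D feature maps:
    stage 1 weights the fused maps per channel, stage 2 per pixel."""
    def __init__(self, channels: int = 128):
        super().__init__()
        # Channel (feature-map) attention: squeeze -> excite
        self.channel_att = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // 8, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 8, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Spatial attention: collapse channels -> one weight per pixel
        self.spatial_att = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, rgb_feat, depth_feat):
        fused = torch.cat([rgb_feat, depth_feat], dim=1)  # (B, C, H, W)
        fused = fused * self.channel_att(fused)           # feature-map attention
        fused = fused * self.spatial_att(fused)           # spatial attention
        return fused

# Example: fuse 64-channel RGB and depth feature maps
fusion = AttentionFusion(channels=128)
out = fusion(torch.randn(2, 64, 28, 28), torch.randn(2, 64, 28, 28))
print(out.shape)  # torch.Size([2, 128, 28, 28])
```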
The Mertens Unrolled Network (MU-Net): A High Dynamic Range Fusion Neural Network for Through the Windshield Driver Recognition
Title | The Mertens Unrolled Network (MU-Net): A High Dynamic Range Fusion Neural Network for Through the Windshield Driver Recognition |
Authors | Max Ruby, David S. Bolme, Joel Brogan, David Cornett III, Baldemar Delgado, Gavin Jager, Christi Johnson, Jose Martinez-Mendoza, Hector Santos-Villalobos, Nisha Srinivas |
Abstract | Face recognition of vehicle occupants through windshields in unconstrained environments poses a number of unique challenges, ranging from glare and poor illumination to driver pose and motion blur. In this paper, we further develop the hardware and software components of a custom vehicle imaging system to better overcome these challenges. After the build-out of a physical prototype system that performs High Dynamic Range (HDR) imaging, we collect a small dataset of through-windshield image captures of known drivers. We then re-formulate the classical Mertens-Kautz-Van Reeth HDR fusion algorithm as a pre-initialized neural network, which we name the Mertens Unrolled Network (MU-Net), for the purpose of fine-tuning the HDR output of through-windshield images. Reconstructed faces from this novel HDR method are then evaluated and compared against other traditional and experimental HDR methods in a pre-trained state-of-the-art (SOTA) facial recognition pipeline, verifying the efficacy of our approach. |
Tasks | Face Recognition |
Published | 2020-02-27 |
URL | https://arxiv.org/abs/2002.12257v1 |
PDF | https://arxiv.org/pdf/2002.12257v1.pdf |
PWC | https://paperswithcode.com/paper/the-mertens-unrolled-network-mu-net-a-high |
Repo | |
Framework | |
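The classical Mertens-Kautz-Van Reeth per-pixel weights that MU-Net unrolls can be sketched in a few lines of NumPy/SciPy. The flat weighted blend below is only the zeroth-order version — the full algorithm blends with Laplacian pyramids, and the paper further fine-tunes the unrolled network; both are omitted here.

```python
import numpy as np
from scipy.ndimage import laplace

def mertens_weights(img, sigma=0.2):
    """Per-pixel quality weights from classical exposure fusion:
    contrast * saturation * well-exposedness.
    `img` is float RGB in [0, 1] with shape (H, W, 3)."""
    gray = img.mean(axis=2)
    contrast = np.abs(laplace(gray))
    saturation = img.std(axis=2)
    well_exposed = np.exp(-((img - 0.5) ** 2) / (2 * sigma ** 2)).prod(axis=2)
    return contrast * saturation * well_exposed + 1e-12

def naive_fusion(exposures):
    """Weighted average across an exposure stack (list of HxWx3 arrays).
    The full algorithm blends with Laplacian pyramids instead."""
    weights = np.stack([mertens_weights(e) for e in exposures])
    weights /= weights.sum(axis=0, keepdims=True)
    return sum(w[..., None] * e for w, e in zip(weights, exposures))

stack = [np.random.rand(64, 64, 3) * s for s in (0.3, 0.6, 1.0)]
hdr = naive_fusion(stack)
print(hdr.shape)  # (64, 64, 3)
```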
Optimized Directed Roadmap Graph for Multi-Agent Path Finding Using Stochastic Gradient Descent
Title | Optimized Directed Roadmap Graph for Multi-Agent Path Finding Using Stochastic Gradient Descent |
Authors | Christian Henkel, Marc Toussaint |
Abstract | We present a novel approach called Optimized Directed Roadmap Graph (ODRM). It is a method to build a directed roadmap graph that allows for collision avoidance in multi-robot navigation. This is a highly relevant problem, for example for industrial autonomous guided vehicles. The core idea of ODRM is that a directed roadmap can encode inherent properties of the environment which are useful when agents have to avoid each other in that same environment. Like Probabilistic Roadmaps (PRMs), ODRM’s first step is generating samples from C-space. In a second step, ODRM optimizes vertex positions and edge directions by Stochastic Gradient Descent (SGD). This leads to emergent properties like edges parallel to walls and patterns similar to two-lane streets or roundabouts. Agents can then navigate on this graph by planning their paths independently and resolving agent-agent collisions as they occur at run-time. On graphs generated by ODRM, significantly fewer agent-agent collisions happen than on a non-optimized graph. We evaluate our roadmap with both centralized and decentralized planners. Our experiments show that with ODRM even a simple centralized planner can solve problems with high numbers of agents that other multi-agent planners cannot solve. Additionally, we use simulated robots with decentralized planners and online collision avoidance to show that agents travel considerably faster on our roadmap than on standard grid maps. |
Tasks | Multi-Agent Path Finding, Robot Navigation |
Published | 2020-03-29 |
URL | https://arxiv.org/abs/2003.12924v1 |
PDF | https://arxiv.org/pdf/2003.12924v1.pdf |
PWC | https://paperswithcode.com/paper/optimized-directed-roadmap-graph-for-multi |
Repo | |
Framework | |
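As a toy illustration of the second step (not the paper's actual objective), the sketch below nudges roadmap vertices with SGD-style updates so that frequently used shortest paths straighten out. The roadmap construction, update rule, and learning rate are all assumptions, and the optimization of edge directions is omitted.

```python
import numpy as np
import networkx as nx

rng = np.random.default_rng(0)
pos = rng.uniform(0, 1, size=(200, 2))      # C-space samples (step 1)
G = nx.Graph()
for i, p in enumerate(pos):
    dists = np.linalg.norm(pos - p, axis=1)
    for j in np.argsort(dists)[1:6]:        # 5-nearest-neighbor roadmap
        G.add_edge(i, int(j))

lr = 0.05
for _ in range(500):                        # SGD over random queries (step 2)
    s, g = rng.integers(0, len(pos), size=2)
    if s == g:
        continue
    for u, v in G.edges:                    # refresh edge lengths
        G[u][v]["w"] = float(np.linalg.norm(pos[u] - pos[v]))
    try:
        path = nx.shortest_path(G, s, g, weight="w")
    except nx.NetworkXNoPath:
        continue
    a, b = pos[path[0]], pos[path[-1]]
    for k, node in enumerate(path[1:-1], start=1):
        t = k / (len(path) - 1)
        target = (1 - t) * a + t * b        # point on the straight chord
        pos[node] += lr * (target - pos[node])
```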
Visual Navigation Among Humans with Optimal Control as a Supervisor
Title | Visual Navigation Among Humans with Optimal Control as a Supervisor |
Authors | Varun Tolani, Somil Bansal, Aleksandra Faust, Claire Tomlin |
Abstract | Real-world navigation requires robots to operate in unfamiliar, dynamic environments, sharing spaces with humans. Navigating around humans is especially difficult because it requires predicting their future motion, which can be quite challenging. We propose a novel framework for navigation around humans which combines learning-based perception with model-based optimal control. Specifically, we train a Convolutional Neural Network (CNN)-based perception module which maps the robot’s visual inputs to a waypoint, or next desired state. This waypoint is then input into planning and control modules which convey the robot safely and efficiently to the goal. To train the CNN, we contribute a photo-realistic benchmarking dataset for autonomous robot navigation in the presence of humans. The CNN is trained using supervised learning on images rendered from our photo-realistic dataset. The proposed framework learns to anticipate and react to people’s motion based only on a monocular RGB image, without explicitly predicting future human motion. Our method generalizes well to unseen buildings and humans in both simulation and real-world environments. Furthermore, our experiments demonstrate that combining model-based control and learning leads to better and more data-efficient navigational behaviors as compared to a purely learning-based approach. Videos describing our approach and experiments are available on the project website. |
Tasks | Robot Navigation, Visual Navigation |
Published | 2020-03-20 |
URL | https://arxiv.org/abs/2003.09354v1 |
PDF | https://arxiv.org/pdf/2003.09354v1.pdf |
PWC | https://paperswithcode.com/paper/visual-navigation-among-humans-with-optimal |
Repo | |
Framework | |
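A minimal stand-in for the perception module: a CNN that maps a monocular RGB image plus a goal description to the next waypoint, trained with supervised learning against optimal-control labels. The backbone, goal encoding, and waypoint parameterization here are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class WaypointCNN(nn.Module):
    """Hypothetical perception module: monocular RGB -> next waypoint
    (x, y, heading) in the robot frame. The model-based planner and
    controller that consume the waypoint live outside this network."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(128 + 3, 3)   # image features + goal (x, y, theta)

    def forward(self, image, goal):
        return self.head(torch.cat([self.encoder(image), goal], dim=1))

net = WaypointCNN()
waypoint = net(torch.randn(1, 3, 224, 224), torch.tensor([[2.0, 0.5, 0.0]]))
print(waypoint.shape)  # torch.Size([1, 3]); supervised against optimal-control labels
```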
Multiplicative Controller Fusion: A Hybrid Navigation Strategy For Deployment in Unknown Environments
Title | Multiplicative Controller Fusion: A Hybrid Navigation Strategy For Deployment in Unknown Environments |
Authors | Krishan Rana, Vibhavari Dasagi, Ben Talbot, Michael Milford, Niko Sünderhauf |
Abstract | Learning-based approaches often outperform hand-coded algorithmic solutions for many problems in robotics. However, learning long-horizon tasks on real robot hardware can be intractable, and transferring a learned policy from simulation to reality is still extremely challenging. We present a novel approach to model-free reinforcement learning that can leverage existing sub-optimal solutions as an algorithmic prior during training and deployment. During training, our gated fusion approach enables the prior to guide the initial stages of exploration, increasing sample efficiency and enabling learning from sparse long-horizon reward signals. Importantly, the policy can learn to improve beyond the performance of the sub-optimal prior, since the prior’s influence is annealed gradually. During deployment, the policy’s uncertainty provides a reliable strategy for transferring a simulation-trained policy to the real world by falling back to the prior controller in uncertain states. We show the efficacy of our Multiplicative Controller Fusion approach on the task of robot navigation and demonstrate safe transfer from simulation to the real world without any fine-tuning. The code for this project is made publicly available at https://sites.google.com/view/mcf-nav/home. |
Tasks | Robot Navigation |
Published | 2020-03-11 |
URL | https://arxiv.org/abs/2003.05117v2 |
PDF | https://arxiv.org/pdf/2003.05117v2.pdf |
PWC | https://paperswithcode.com/paper/multiplicative-controller-fusion-a-hybrid |
Repo | |
Framework | |
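The multiplicative fusion itself is easy to sketch if one assumes Gaussian action distributions: the fused policy is the normalized product of the learned policy and the prior controller, so an uncertain policy automatically defers to the prior. The training-time gated annealing is omitted.

```python
def multiplicative_fusion(mu_pi, var_pi, mu_prior, var_prior):
    """Fuse a stochastic policy N(mu_pi, var_pi) with a prior
    controller N(mu_prior, var_prior) by normalized multiplication.
    When the policy is uncertain (large var_pi), the fused action
    falls back toward the prior -- the deployment behavior described
    in the abstract above."""
    precision = 1.0 / var_pi + 1.0 / var_prior
    var = 1.0 / precision
    mu = var * (mu_pi / var_pi + mu_prior / var_prior)
    return mu, var

# Confident policy dominates:
print(multiplicative_fusion(mu_pi=0.8, var_pi=0.01, mu_prior=0.2, var_prior=0.5))
# Uncertain policy defers to the prior:
print(multiplicative_fusion(mu_pi=0.8, var_pi=2.0, mu_prior=0.2, var_prior=0.1))
```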
Learning View and Target Invariant Visual Servoing for Navigation
Title | Learning View and Target Invariant Visual Servoing for Navigation |
Authors | Yimeng Li, Jana Kosecka |
Abstract | The advances in deep reinforcement learning have recently revived interest in data-driven, learning-based approaches to navigation. In this paper we propose to learn viewpoint-invariant and target-invariant visual servoing for local mobile robot navigation; given an initial view and the goal view or an image of a target, we train a deep convolutional network controller to reach the desired goal. We present a new architecture for this task which rests on the ability to establish correspondences between the initial and goal views, and on a novel reward structure motivated by the traditional feedback-control error. The advantage of the proposed model is that it does not require calibration or depth information and achieves robust visual servoing in a variety of environments and targets without any parameter fine-tuning. We present a comprehensive evaluation of the approach, comparing it with other deep learning architectures as well as classical visual servoing methods in a visually realistic simulation environment. The presented model overcomes the brittleness of classical visual servoing methods and achieves significantly higher generalization capability compared to previous learning approaches. |
Tasks | Calibration, Robot Navigation |
Published | 2020-03-04 |
URL | https://arxiv.org/abs/2003.02327v1 |
PDF | https://arxiv.org/pdf/2003.02327v1.pdf |
PWC | https://paperswithcode.com/paper/learning-view-and-target-invariant-visual |
Repo | |
Framework | |
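A hypothetical reward in the spirit of the feedback-control error mentioned in the abstract: reward the *reduction* of the feature-correspondence error between the current and goal views. The function name and error definition are assumptions, not the paper's exact reward.

```python
import numpy as np

def servo_reward(curr_pts, goal_pts, prev_error):
    """Reward the decrease of the mean correspondence error between
    matched keypoints in the current and goal views.
    curr_pts/goal_pts: (N, 2) matched keypoint pixel coordinates."""
    error = np.linalg.norm(curr_pts - goal_pts, axis=1).mean()
    return prev_error - error, error  # positive if we moved closer

goal = np.array([[100.0, 120.0], [180.0, 60.0]])
curr = goal + np.array([[8.0, -5.0], [6.0, 4.0]])
r, e = servo_reward(curr, goal, prev_error=12.0)
print(round(r, 2), round(e, 2))
```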
DISCO: Double Likelihood-free Inference Stochastic Control
Title | DISCO: Double Likelihood-free Inference Stochastic Control |
Authors | Lucas Barcelos, Rafael Oliveira, Rafael Possas, Lionel Ott, Fabio Ramos |
Abstract | Accurate simulation of complex physical systems enables the development, testing, and certification of control strategies before they are deployed into the real systems. As simulators become more advanced, the analytical tractability of the differential equations and associated numerical solvers incorporated in the simulations diminishes, making them difficult to analyse. A potential solution is the use of probabilistic inference to assess the uncertainty of the simulation parameters given real observations of the system. Unfortunately, the likelihood function required for inference is generally expensive to compute or totally intractable. In this paper we propose to leverage the power of modern simulators and recent techniques in Bayesian statistics for likelihood-free inference to design a control framework that is efficient and robust with respect to the uncertainty over simulation parameters. The posterior distribution over simulation parameters is propagated through a potentially non-analytical model of the system using the unscented transform, combined with a variant of information-theoretic model predictive control. This approach provides a more efficient way to evaluate trajectory rollouts than Monte Carlo sampling, reducing the online computation burden. Experiments show that the proposed controller attains superior performance and robustness on classical control and robotics tasks when compared to models not accounting for the uncertainty over model parameters. |
Tasks | |
Published | 2020-02-18 |
URL | https://arxiv.org/abs/2002.07379v2 |
PDF | https://arxiv.org/pdf/2002.07379v2.pdf |
PWC | https://paperswithcode.com/paper/disco-double-likelihood-free-inference |
Repo | |
Framework | |
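The unscented-transform step is standard and can be sketched directly: propagate sigma points of the parameter posterior through a black-box model and recover the transformed mean and covariance. The MPPI-style rollout weighting that DISCO combines this with is not shown.

```python
import numpy as np

def unscented_transform(mu, cov, f, kappa=1.0):
    """Propagate a Gaussian belief over simulation parameters through
    a (possibly non-analytical) model f via 2n+1 sigma points, as in
    the standard unscented transform. Returns transformed mean/cov."""
    n = mu.shape[0]
    L = np.linalg.cholesky((n + kappa) * cov)
    sigma = np.vstack([mu, mu + L.T, mu - L.T])          # 2n + 1 points
    w = np.full(2 * n + 1, 1.0 / (2 * (n + kappa)))
    w[0] = kappa / (n + kappa)
    y = np.array([f(s) for s in sigma])
    mean = w @ y
    diff = y - mean
    return mean, (w[:, None] * diff).T @ diff

# Toy nonlinear "simulator": parameters -> squared response
mu, cov = np.array([1.0, 0.5]), np.diag([0.04, 0.01])
m, c = unscented_transform(mu, cov, lambda p: p ** 2)
print(m, c)
```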
Deep Representation Learning on Long-tailed Data: A Learnable Embedding Augmentation Perspective
Title | Deep Representation Learning on Long-tailed Data: A Learnable Embedding Augmentation Perspective |
Authors | Jialun Liu, Yifan Sun, Chuchu Han, Zhaopeng Dou, Wenhui Li |
Abstract | This paper considers learning deep features from long-tailed data. We observe that in the deep feature space, the head classes and the tail classes present different distribution patterns. The head classes have a relatively large spatial span, while the tail classes have a significantly smaller spatial span, due to the lack of intra-class diversity. This uneven distribution between head and tail classes distorts the overall feature space, which compromises the discriminative ability of the learned features. Intuitively, we seek to expand the distribution of the tail classes by transferring from the head classes, so as to alleviate the distortion of the feature space. To this end, we propose to construct each feature into a “feature cloud”. If a sample belongs to a tail class, the corresponding feature cloud will have a relatively large distribution range, compensating for its lack of diversity. This allows each tail sample to push samples from other classes far away, recovering the intra-class diversity of tail classes. Extensive experimental evaluations on person re-identification and face recognition tasks confirm the effectiveness of our method. |
Tasks | Face Recognition, Person Re-Identification, Representation Learning |
Published | 2020-02-25 |
URL | https://arxiv.org/abs/2002.10826v2 |
PDF | https://arxiv.org/pdf/2002.10826v2.pdf |
PWC | https://paperswithcode.com/paper/deep-representation-learning-on-long-tailed |
Repo | |
Framework | |
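A loose sketch of the "feature cloud" idea: during training, tail-class embeddings are perturbed with Gaussian noise whose variance is borrowed from the head classes, enlarging their intra-class span. The function and the way variance is transferred are assumptions, not the paper's exact formulation.

```python
import torch

def feature_cloud(features, labels, class_var, tail_classes, scale=1.0):
    """Hypothetical 'feature cloud' construction: perturb each
    tail-class feature with Gaussian noise whose variance is
    borrowed from the head classes, enlarging the tail classes'
    intra-class span in the embedding space.
    features: (B, D) embeddings; class_var: (D,) head-class variance."""
    noise = torch.randn_like(features) * class_var.sqrt() * scale
    is_tail = torch.tensor([int(l) in tail_classes for l in labels])
    return torch.where(is_tail[:, None], features + noise, features)

feats = torch.randn(4, 8)
head_var = torch.full((8,), 0.5)  # variance transferred from head classes
out = feature_cloud(feats, labels=[0, 3, 3, 1], class_var=head_var, tail_classes={3})
print(out.shape)  # torch.Size([4, 8]); only rows with label 3 are perturbed
```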
Deep Learning for Content-based Personalized Viewport Prediction of 360-Degree VR Videos
Title | Deep Learning for Content-based Personalized Viewport Prediction of 360-Degree VR Videos |
Authors | Xinwei Chen, Ali Taleb Zadeh Kasgari, Walid Saad |
Abstract | In this paper, the problem of head movement prediction for virtual reality videos is studied. In the considered model, a deep learning network is introduced to leverage position data as well as video frame content to predict future head movement. To optimize the data input to this neural network, the data sample rate, reduced data, and long-period prediction lengths are also explored for this model. Simulation results show that the proposed approach yields a 16.1% improvement in prediction accuracy compared to a baseline approach that relies only on position data. |
Tasks | |
Published | 2020-03-01 |
URL | https://arxiv.org/abs/2003.00429v1 |
PDF | https://arxiv.org/pdf/2003.00429v1.pdf |
PWC | https://paperswithcode.com/paper/deep-learning-for-content-based-personalized |
Repo | |
Framework | |
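One plausible shape for such a content-plus-position model is sketched below: a GRU over past head orientations fused with CNN features of the current frame. All layer sizes and the output parameterization are assumptions, not the paper's network.

```python
import torch
import torch.nn as nn

class ViewportPredictor(nn.Module):
    """Illustrative fusion model: a GRU over past head orientations
    plus CNN features of the current video frame, jointly regressing
    the next head orientation."""
    def __init__(self):
        super().__init__()
        self.pos_rnn = nn.GRU(input_size=3, hidden_size=64, batch_first=True)
        self.frame_cnn = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.out = nn.Linear(64 + 32, 3)    # next (yaw, pitch, roll)

    def forward(self, pos_history, frame):
        _, h = self.pos_rnn(pos_history)    # h: (1, B, 64)
        return self.out(torch.cat([h[-1], self.frame_cnn(frame)], dim=1))

model = ViewportPredictor()
pred = model(torch.randn(2, 30, 3), torch.randn(2, 3, 128, 128))
print(pred.shape)  # torch.Size([2, 3])
```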
Submodular Maximization Through Barrier Functions
Title | Submodular Maximization Through Barrier Functions |
Authors | Ashwinkumar Badanidiyuru, Amin Karbasi, Ehsan Kazemi, Jan Vondrak |
Abstract | In this paper, we introduce a novel technique for constrained submodular maximization, inspired by barrier functions in continuous optimization. This connection not only improves the running time for constrained submodular maximization but also provides a state-of-the-art guarantee. More precisely, for maximizing a monotone submodular function subject to the combination of a $k$-matchoid and $\ell$-knapsack constraint (for $\ell\leq k$), we propose a potential function that can be approximately minimized. Once we minimize the potential function up to an $\epsilon$ error, it is guaranteed that we have found a feasible set with a $2(k+1+\epsilon)$-approximation factor, which can indeed be further improved to $(k+1+\epsilon)$ by an enumeration technique. We extensively evaluate the performance of our proposed algorithm over several real-world applications, including a movie recommendation system, summarization tasks for YouTube videos, Twitter feeds and Yelp business locations, and a set cover problem. |
Tasks | |
Published | 2020-02-10 |
URL | https://arxiv.org/abs/2002.03523v1 |
PDF | https://arxiv.org/pdf/2002.03523v1.pdf |
PWC | https://paperswithcode.com/paper/submodular-maximization-through-barrier |
Repo | |
Framework | |
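To make the barrier idea concrete (this is only a toy, not the paper's potential function or guarantee): a greedy selection for a single knapsack constraint where each candidate's marginal gain is discounted by the increase of a log-barrier on the remaining budget, so picks that nearly exhaust the budget are penalized.

```python
import math

def barrier_greedy(elements, gain, cost, budget, lam=0.1):
    """Toy log-barrier-penalized greedy for one knapsack constraint.
    Illustrates the barrier idea only; the paper's potential function
    and approximation guarantees are different."""
    chosen, spent = [], 0.0
    while True:
        best, best_score = None, 0.0
        for e in elements:
            if e in chosen or spent + cost[e] >= budget:
                continue
            # Increase in -log(remaining budget) caused by picking e
            barrier = math.log(budget - spent) - math.log(budget - spent - cost[e])
            score = gain(chosen, e) - lam * barrier
            if score > best_score:
                best, best_score = e, score
        if best is None:
            return chosen
        chosen.append(best)
        spent += cost[best]

cover = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"d"}}

def gain(S, e):
    covered = set().union(*(cover[s] for s in S)) if S else set()
    return len(cover[e] - covered)  # marginal coverage gain

print(barrier_greedy([1, 2, 3], gain, cost={1: 1.0, 2: 1.0, 3: 1.5}, budget=3.0))
```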
UGRWO-Sampling: A modified random walk under-sampling approach based on graphs to imbalanced data classification
Title | UGRWO-Sampling: A modified random walk under-sampling approach based on graphs to imbalanced data classification |
Authors | Saeideh Roshanfekr, Shahriar Esmaeili, Hassan Ataeian, Ali Amiri |
Abstract | In this paper, we propose a new RWO-Sampling (Random Walk Over-Sampling) method based on graphs for imbalanced datasets. In this method, two graphs, based on under-sampling and over-sampling, are introduced to preserve proximity information, making the approach robust to noise and outliers. After the construction of the first graph on the minority class, RWO-Sampling is applied to the selected samples, while the rest remain unchanged. The second graph is constructed for the majority class: samples in low-density areas (outliers) are removed, and only examples of the majority class in high-density areas are kept. Furthermore, by utilizing RWO-Sampling, the boundary of the minority class is expanded without amplifying outliers. The method is tested on nine continuous-attribute datasets with different over-sampling rates, and a number of evaluation measures are compared against previous methods. The experimental results indicate the high efficiency and flexibility of the proposed method for the classification of imbalanced data. |
Tasks | |
Published | 2020-02-10 |
URL | https://arxiv.org/abs/2002.03521v2 |
PDF | https://arxiv.org/pdf/2002.03521v2.pdf |
PWC | https://paperswithcode.com/paper/ugrwo-sampling-a-modified-random-walk-under |
Repo | |
Framework | |
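A loose sketch of the over-sampling half: build a k-NN graph over the minority class, keep samples in dense regions, and synthesize new points with an RWO-style noisy step around them. The density criterion and step size are assumptions, and the majority-class under-sampling graph is omitted.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def graph_rwo_oversample(X_min, n_new, k=5, seed=0):
    """Sketch of graph-based random-walk over-sampling: estimate local
    density from k-NN distances, seed from dense-region minority
    samples, and add a random-walk perturbation (the RWO idea)."""
    rng = np.random.default_rng(seed)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    dists, _ = nn.kneighbors(X_min)
    density = 1.0 / (dists[:, 1:].mean(axis=1) + 1e-12)
    keep = density >= np.median(density)           # dense-region samples
    seeds = X_min[keep]
    std = X_min.std(axis=0) / np.sqrt(len(X_min))  # RWO step size
    idx = rng.integers(0, len(seeds), size=n_new)
    return seeds[idx] + rng.standard_normal((n_new, X_min.shape[1])) * std

X_min = np.random.default_rng(1).normal(size=(40, 4))
synthetic = graph_rwo_oversample(X_min, n_new=20)
print(synthetic.shape)  # (20, 4)
```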
Uncertainty-based Modulation for Lifelong Learning
Title | Uncertainty-based Modulation for Lifelong Learning |
Authors | Andrew Brna, Ryan Brown, Patrick Connolly, Stephen Simons, Renee Shimizu, Mario Aguilar-Simon |
Abstract | The creation of machine learning algorithms for intelligent agents capable of continuous, lifelong learning is a critical objective for algorithms being deployed on real-life systems in dynamic environments. Here we present an algorithm inspired by neuromodulatory mechanisms in the human brain that integrates and expands upon Stephen Grossberg's ground-breaking Adaptive Resonance Theory proposals. Specifically, it builds on the concept of uncertainty, and employs a series of neuromodulatory mechanisms to enable continuous learning, including self-supervised and one-shot learning. Algorithm components were evaluated in a series of benchmark experiments that demonstrate stable learning without catastrophic forgetting. We also demonstrate the critical role of developing these systems in a closed-loop manner where the environment and the agent's behaviors constrain and guide the learning process. To this end, we integrated the algorithm into an embodied simulated drone agent. The experiments show that the algorithm is capable of continuously learning new tasks and adapting to changed conditions, with high classification accuracy (greater than 94 percent) in a virtual environment and without catastrophic forgetting. The algorithm accepts high dimensional inputs from any state-of-the-art detection and feature extraction algorithms, making it a flexible addition to existing systems. We also describe future development efforts focused on imbuing the algorithm with mechanisms to seek out new knowledge as well as employ a broader range of neuromodulatory processes. |
Tasks | One-Shot Learning |
Published | 2020-01-27 |
URL | https://arxiv.org/abs/2001.09822v1 |
PDF | https://arxiv.org/pdf/2001.09822v1.pdf |
PWC | https://paperswithcode.com/paper/uncertainty-based-modulation-for-lifelong |
Repo | |
Framework | |
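The ART mechanism the algorithm builds on can be sketched as a single fuzzy-ART choice/resonance step: the best-matching category learns if it passes the vigilance test, otherwise a new category is recruited. Complement coding and the paper's uncertainty-driven neuromodulatory extensions are omitted here.

```python
import numpy as np

def fuzzy_art_step(x, weights, rho=0.75, alpha=0.001, beta=1.0):
    """One fuzzy-ART category-choice/resonance step.
    x in [0,1]^d; `weights` is a mutable list of category templates.
    Returns the index of the category that learned x."""
    # Try categories in descending order of the choice function
    for j in np.argsort([-np.minimum(x, w).sum() / (alpha + w.sum())
                         for w in weights]):
        w = weights[j]
        match = np.minimum(x, w).sum() / x.sum()
        if match >= rho:                      # resonance: learn
            weights[j] = beta * np.minimum(x, w) + (1 - beta) * w
            return int(j)
    weights.append(x.copy())                  # mismatch everywhere: new category
    return len(weights) - 1

weights = [np.array([0.9, 0.1]), np.array([0.2, 0.8])]
print(fuzzy_art_step(np.array([0.85, 0.15]), weights))  # matches category 0
```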
HyperSched: Dynamic Resource Reallocation for Model Development on a Deadline
Title | HyperSched: Dynamic Resource Reallocation for Model Development on a Deadline |
Authors | Richard Liaw, Romil Bhardwaj, Lisa Dunlap, Yitian Zou, Joseph Gonzalez, Ion Stoica, Alexey Tumanov |
Abstract | Prior research in resource scheduling for machine learning training workloads has largely focused on minimizing job completion times. Commonly, these model training workloads collectively search over a large number of parameter values that control the learning process in a hyperparameter search. It is preferable to identify and maximally provision the best-performing hyperparameter configuration (trial) to achieve the highest accuracy result as soon as possible. To optimally trade off evaluating multiple configurations and training the most promising ones by a fixed deadline, we design and build HyperSched – a dynamic application-level resource scheduler to track, identify, and preferentially allocate resources to the best-performing trials to maximize accuracy by the deadline. HyperSched leverages three properties of a hyperparameter search workload overlooked in prior work – trial disposability, progressively identifiable rankings among different configurations, and space-time constraints – to outperform standard hyperparameter search algorithms across a variety of benchmarks. |
Tasks | |
Published | 2020-01-08 |
URL | https://arxiv.org/abs/2001.02338v1 |
PDF | https://arxiv.org/pdf/2001.02338v1.pdf |
PWC | https://paperswithcode.com/paper/hypersched-dynamic-resource-reallocation-for |
Repo | |
Framework | |
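A toy rendition of the deadline-driven allocation policy (not HyperSched's actual algorithm): as the deadline nears, low-ranked trials are dropped and workers are concentrated on the leaders. All quantities below are fake and only illustrate the three leveraged properties.

```python
import random

def deadline_scheduler(trials, total_workers, deadline, step=1):
    """Toy deadline-aware allocation: drop the bottom half of trials
    once half the time budget is spent (trial disposability,
    progressively identifiable rankings) and split workers evenly
    among survivors (space-time constraints)."""
    t, alive = 0, dict(trials)
    while t < deadline and alive:
        if t >= deadline // 2 and len(alive) > 1:
            cutoff = sorted(alive.values())[len(alive) // 2]
            alive = {k: v for k, v in alive.items() if v >= cutoff}
        share = total_workers // len(alive)
        for k in alive:                        # fake training progress
            alive[k] += 0.01 * share * random.random()
        t += step
    return max(alive, key=alive.get)           # best trial at the deadline

random.seed(0)
trials = {f"trial-{i}": random.random() * 0.5 for i in range(8)}
print(deadline_scheduler(trials, total_workers=16, deadline=10))
```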
Architecture Disentanglement for Deep Neural Networks
Title | Architecture Disentanglement for Deep Neural Networks |
Authors | Jie Hu, Rongrong Ji, Qixiang Ye, Tong Tong, ShengChuan Zhang, Ke Li, Feiyue Huang, Ling Shao |
Abstract | Deep Neural Networks (DNNs) are central to deep learning, and understanding their internal working mechanism is crucial if they are to be used for emerging applications in medical and industrial AI. To this end, the current line of research typically involves linking semantic concepts to a DNN’s units or layers. However, this fails to capture the hierarchical inference procedure throughout the network. To address this issue, we introduce the novel concept of Neural Architecture Disentanglement (NAD) in this paper. Specifically, we disentangle a pre-trained network into hierarchical paths corresponding to specific concepts, forming the concept feature paths, i.e., the concept flows from the bottom to top layers of a DNN. Such paths further enable us to quantify the interpretability of DNNs according to the learned diversity of human concepts. We select four types of representative architectures ranging from handcrafted to autoML-based, and conduct extensive experiments on object-based and scene-based datasets. Our NAD sheds important light on the information flow of semantic concepts in DNNs, and provides a fundamental metric that will facilitate the design of interpretable network architectures. Code will be available at: https://github.com/hujiecpp/NAD. |
Tasks | AutoML |
Published | 2020-03-30 |
URL | https://arxiv.org/abs/2003.13268v1 |
PDF | https://arxiv.org/pdf/2003.13268v1.pdf |
PWC | https://paperswithcode.com/paper/architecture-disentanglement-for-deep-neural |
Repo | |
Framework | |
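A rough stand-in for tracing a "concept path" from bottom to top layers: record which channel responds most strongly in every conv layer for one input, yielding a path of unit indices. NAD itself *learns* the routes; this merely traces argmax activations in an untrained toy network.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
)

path = []
def record(module, inputs, output):
    # strongest channel by mean activation in this layer
    path.append(int(output.mean(dim=(0, 2, 3)).argmax()))

hooks = [m.register_forward_hook(record)
         for m in model if isinstance(m, nn.Conv2d)]
model(torch.randn(1, 3, 32, 32))
for h in hooks:
    h.remove()
print(path)  # e.g. [5, 17, 42] -- one unit index per conv layer
```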
A MEMS-based Foveating LIDAR to enable Real-time Adaptive Depth Sensing
Title | A MEMS-based Foveating LIDAR to enable Real-time Adaptive Depth Sensing |
Authors | Francesco Pittaluga, Zaid Tasneem, Justin Folden, Brevin Tilmon, Ayan Chakrabarti, Sanjeev J. Koppal |
Abstract | Most active depth sensors sample their visual field using a fixed pattern, decided by accuracy, speed and cost trade-offs, rather than scene content. However, a number of recent works have demonstrated that adapting measurement patterns to scene content can offer significantly better trade-offs. We propose a hardware LIDAR design that allows flexible real-time measurements according to dynamically specified measurement patterns. Our flexible depth sensor design consists of a controllable scanning LIDAR that can foveate, or increase resolution in regions of interest, and that can fully leverage the power of adaptive depth sensing. We describe our optical setup and calibration, which enables fast sparse depth measurements using a scanning MEMS (micro-electro mechanical) mirror. We validate the efficacy of our prototype LIDAR design by testing on over 75 static and dynamic scenes spanning a range of environments. We also show CNN-based depth-map completion of sparse measurements obtained by our sensor. Our experiments show that our sensor can realize adaptive depth sensing systems. |
Tasks | Calibration |
Published | 2020-03-21 |
URL | https://arxiv.org/abs/2003.09545v1 |
PDF | https://arxiv.org/pdf/2003.09545v1.pdf |
PWC | https://paperswithcode.com/paper/a-mems-based-foveating-lidar-to-enable-real |
Repo | |
Framework | |
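A hypothetical foveated measurement pattern makes the trade-off concrete: concentrate a fraction of the scan samples in a region of interest and spread the rest uniformly. Real MEMS mirror trajectories are continuous; this only illustrates the sampling-density idea, not the paper's optics or calibration.

```python
import numpy as np

def foveated_scan(n_points, roi_center, roi_radius, roi_fraction=0.7, seed=0):
    """Generate a foveated sampling pattern over a [0,1]^2 field of
    view: `roi_fraction` of the samples fall uniformly inside a disk
    around `roi_center`; the remainder cover the full field."""
    rng = np.random.default_rng(seed)
    n_roi = int(n_points * roi_fraction)
    angles = rng.uniform(0, 2 * np.pi, n_roi)
    radii = roi_radius * np.sqrt(rng.uniform(0, 1, n_roi))  # uniform in disk
    roi = np.c_[np.cos(angles), np.sin(angles)] * radii[:, None] + roi_center
    background = rng.uniform(0, 1, size=(n_points - n_roi, 2))
    return np.clip(np.vstack([roi, background]), 0, 1)

pattern = foveated_scan(2000, roi_center=np.array([0.6, 0.4]), roi_radius=0.15)
print(pattern.shape)  # (2000, 2) scan directions, dense around the ROI
```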