January 26, 2020


Paper Group ANR 1596

Online Learning over Dynamic Graphs via Distributed Proximal Gradient Algorithm

Title Online Learning over Dynamic Graphs via Distributed Proximal Gradient Algorithm
Authors Rishabh Dixit, Amrit Singh Bedi, Ketan Rajawat
Abstract We consider the problem of tracking the minimum of a time-varying convex optimization problem over a dynamic graph. Motivated by target tracking and parameter estimation problems in intermittently connected robotic and sensor networks, the goal is to design a distributed algorithm capable of handling non-differentiable regularization penalties. The proposed proximal online gradient descent algorithm is built to run in a fully decentralized manner and utilizes consensus updates over possibly disconnected graphs. The performance of the proposed algorithm is analyzed by developing bounds on its dynamic regret in terms of the cumulative path length of the time-varying optimum. It is shown that as compared to the centralized case, the dynamic regret incurred by the proposed algorithm over $T$ time slots is worse by a factor of $\log(T)$ only, despite the disconnected and time-varying network topology. The empirical performance of the proposed algorithm is tested on the distributed dynamic sparse recovery problem, where it is shown to incur a dynamic regret that is close to that of the centralized algorithm.
Tasks
Published 2019-05-16
URL https://arxiv.org/abs/1905.07018v1
PDF https://arxiv.org/pdf/1905.07018v1.pdf
PWC https://paperswithcode.com/paper/online-learning-over-dynamic-graphs-via
Repo
Framework
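
The update in the abstract combines a consensus step over the current graph snapshot with a proximal gradient step at each node. Below is a minimal NumPy sketch of one such round, assuming an $\ell_1$ regularizer (so the proximal operator is soft-thresholding) and a doubly stochastic mixing matrix; the function names and shapes are illustrative, not the authors' code.

```python
import numpy as np

def soft_threshold(x, tau):
    # Proximal operator of the l1 penalty tau * ||x||_1.
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def prox_ogd_round(X, W, grads, step, tau):
    """One round of decentralized proximal online gradient descent.

    X     : (n_nodes, d) current iterates, one row per node
    W     : (n_nodes, n_nodes) doubly stochastic mixing matrix for the
            current (possibly disconnected) graph snapshot
    grads : (n_nodes, d) gradients of each node's local loss at its iterate
    """
    X = W @ X                 # consensus: average with current neighbors
    X = X - step * grads      # online gradient step on the local losses
    return soft_threshold(X, step * tau)  # prox step for the l1 regularizer
```

On a disconnected snapshot, W is block-diagonal and only connected components mix, which is exactly the regime the regret analysis has to cover.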

How far are we from quantifying visual attention in mobile HCI?

Title How far are we from quantifying visual attention in mobile HCI?
Authors Mihai Bâce, Sander Staal, Andreas Bulling
Abstract With an ever-increasing number of mobile devices competing for our attention, quantifying when, how often, or for how long users visually attend to their devices has emerged as a core challenge in mobile human-computer interaction. Encouraged by recent advances in automatic eye contact detection using machine learning and device-integrated cameras, we provide a fundamental investigation into the feasibility of quantifying visual attention during everyday mobile interactions. We identify core challenges and sources of errors associated with sensing attention on mobile devices in the wild, including the impact of face and eye visibility, the importance of robust head pose estimation, and the need for accurate gaze estimation. Based on this analysis, we propose future research directions and discuss how eye contact detection represents the foundation for exciting new applications towards next-generation pervasive attentive user interfaces.
Tasks Gaze Estimation, Head Pose Estimation, Pose Estimation
Published 2019-07-25
URL https://arxiv.org/abs/1907.11106v1
PDF https://arxiv.org/pdf/1907.11106v1.pdf
PWC https://paperswithcode.com/paper/how-far-are-we-from-quantifying-visual
Repo
Framework

Personalization of End-to-end Speech Recognition On Mobile Devices For Named Entities

Title Personalization of End-to-end Speech Recognition On Mobile Devices For Named Entities
Authors Khe Chai Sim, Françoise Beaufays, Arnaud Benard, Dhruv Guliani, Andreas Kabel, Nikhil Khare, Tamar Lucassen, Petr Zadrazil, Harry Zhang, Leif Johnson, Giovanni Motta, Lillian Zhou
Abstract We study the effectiveness of several techniques to personalize end-to-end speech models and improve the recognition of proper names relevant to the user. These techniques differ in the amounts of user effort required to provide supervision, and are evaluated on how they impact speech recognition performance. We propose using keyword-dependent precision and recall metrics to measure vocabulary acquisition performance. We evaluate the algorithms on a dataset that we designed to contain names of persons that are difficult to recognize. Therefore, the baseline recall rate for proper names in this dataset is very low: 2.4%. A data synthesis approach we developed brings it to 48.6%, with no need for speech input from the user. With speech input, if the user corrects only the names, the name recall rate improves to 64.4%. If the user corrects all the recognition errors, we achieve the best recall of 73.5%. To eliminate the need to upload user data and store personalized models on a server, we focus on performing the entire personalization workflow on a mobile device.
Tasks End-To-End Speech Recognition, Speech Recognition
Published 2019-12-14
URL https://arxiv.org/abs/1912.09251v1
PDF https://arxiv.org/pdf/1912.09251v1.pdf
PWC https://paperswithcode.com/paper/personalization-of-end-to-end-speech
Repo
Framework
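
The keyword-dependent precision and recall proposed in the abstract can be illustrated with a rough, alignment-free sketch; counting name occurrences per utterance is a simplification of whatever alignment the authors actually use.

```python
def name_precision_recall(hyps, refs, names):
    """Keyword-dependent precision/recall over a set of target names.

    hyps, refs : parallel lists of hypothesis / reference transcripts,
                 each a list of tokens
    names      : set of proper names whose recognition we measure
    """
    tp = fp = fn = 0
    for hyp, ref in zip(hyps, refs):
        hyp_hits = sum(1 for w in hyp if w in names)
        ref_hits = sum(1 for w in ref if w in names)
        matched = min(hyp_hits, ref_hits)  # crude, alignment-free matching
        tp += matched
        fp += hyp_hits - matched
        fn += ref_hits - matched
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```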

Prediction of Physical Load Level by Machine Learning Analysis of Heart Activity after Exercises

Title Prediction of Physical Load Level by Machine Learning Analysis of Heart Activity after Exercises
Authors Peng Gang, Wei Zeng, Yuri Gordienko, Oleksandr Rokovyi, Oleg Alienin, Sergii Stirenko
Abstract The assessment of energy expenditure in real life is of great importance for monitoring the current physical state of people, especially in work, sport, elderly care, health care, and even everyday life. This work reports on the application of several machine learning methods (linear regression, linear discriminant analysis, k-nearest neighbors, decision tree, random forest, Gaussian naive Bayes, support-vector machine) for monitoring energy expenditure in athletes. The classification problem was to predict the known level of in-exercise load (three categories by calories) from heart rate features measured during a short period of time (1 minute only) after training, i.e., from features of the post-exercise load. The results obtained show that the post-exercise heart activity features preserve information about the in-exercise training loads and allow us to predict their actual in-exercise levels. The best performance was obtained by the random forest classifier with all 8 heart rate features (micro-averaged area under the curve AUCmicro = 0.87 and macro-averaged AUCmacro = 0.88) and the k-nearest neighbors classifier with the 4 most important heart rate features (AUCmicro = 0.91 and AUCmacro = 0.89). The limitations and perspectives of the ML methods used are outlined, and practical advice is offered on their improvement and implementation for better prediction of in-exercise energy expenditure.
Tasks
Published 2019-12-20
URL https://arxiv.org/abs/1912.09848v1
PDF https://arxiv.org/pdf/1912.09848v1.pdf
PWC https://paperswithcode.com/paper/prediction-of-physical-load-level-by-machine
Repo
Framework
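
A hedged scikit-learn sketch of the evaluation setup described above, using synthetic stand-in data (the real heart-rate features are not reproduced here): a random forest on 8 features, scored with micro- and macro-averaged AUC.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

# Synthetic stand-in: 8 post-exercise heart-rate features, 3 load classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))
y = rng.integers(0, 3, size=300)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)

auc_macro = roc_auc_score(y_te, proba, multi_class="ovr", average="macro")
Y_bin = label_binarize(y_te, classes=[0, 1, 2])   # micro-average by pooling
auc_micro = roc_auc_score(Y_bin.ravel(), proba.ravel())
print(f"AUC_micro={auc_micro:.2f}  AUC_macro={auc_macro:.2f}")
```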

On the Utility of Model Learning in HRI

Title On the Utility of Model Learning in HRI
Authors Rohan Choudhury*, Gokul Swamy*, Dylan Hadfield-Menell, Anca Dragan
Abstract Fundamental to robotics is the debate between model-based and model-free learning: should the robot build an explicit model of the world, or learn a policy directly? In the context of HRI, part of the world to be modeled is the human. One option is for the robot to treat the human as a black box and learn a policy for how they act directly. But it can also model the human as an agent, and rely on a “theory of mind” to guide or bias the learning (grey box). We contribute a characterization of the performance of these methods under the optimistic case of having an ideal theory of mind, as well as under different scenarios in which the assumptions behind the robot’s theory of mind for the human are wrong, as they inevitably will be in practice. We find that there is a significant sample complexity advantage to theory of mind methods and that they are more robust to covariate shift, but that when enough interaction data is available, black box approaches eventually dominate.
Tasks
Published 2019-01-04
URL http://arxiv.org/abs/1901.01291v1
PDF http://arxiv.org/pdf/1901.01291v1.pdf
PWC https://paperswithcode.com/paper/on-the-utility-of-model-learning-in-hri
Repo
Framework

Enhancing Object Detection in Adverse Conditions using Thermal Imaging

Title Enhancing Object Detection in Adverse Conditions using Thermal Imaging
Authors Kshitij Agrawal, Anbumani Subramanian
Abstract Autonomous driving relies on deriving an understanding of objects and scenes through images. These images are often captured by sensors in the visible spectrum. For improved detection capabilities, we propose the use of thermal sensors to augment the vision capabilities of an autonomous vehicle. In this paper, we present our investigations on the fusion of visible and thermal spectrum images using a publicly available dataset, and use it to analyze the performance of object recognition on other known driving datasets. We present a comparison of object detection in night-time imagery and qualitatively demonstrate that thermal images significantly improve detection accuracy.
Tasks Autonomous Driving, Object Detection, Object Recognition
Published 2019-09-30
URL https://arxiv.org/abs/1909.13551v1
PDF https://arxiv.org/pdf/1909.13551v1.pdf
PWC https://paperswithcode.com/paper/enhancing-object-detection-in-adverse
Repo
Framework
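
The abstract does not spell out the fusion mechanism; one common baseline is early fusion, stacking a registered thermal channel onto the RGB image before detection. A minimal, purely illustrative sketch:

```python
import numpy as np

def early_fuse(rgb, thermal):
    """Naive early fusion: stack a registered thermal channel onto RGB.

    rgb     : (H, W, 3) visible-spectrum image
    thermal : (H, W) thermal image already aligned to the RGB frame
    Returns a (H, W, 4) array that a detector's first conv layer can take.
    """
    thermal = thermal.astype(rgb.dtype)[..., None]
    return np.concatenate([rgb, thermal], axis=-1)
```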

Instantiation-Net: 3D Mesh Reconstruction from Single 2D Image for Right Ventricle

Title Instantiation-Net: 3D Mesh Reconstruction from Single 2D Image for Right Ventricle
Authors Zhao-Yang Wang, Xiao-Yun Zhou, Peichao Li, Celia Riga, Guang-Zhong Yang
Abstract 3D shape instantiation, which reconstructs the 3D shape of a target from limited 2D images or projections, is an emerging technique for surgical intervention. It improves the currently less-informative and insufficient 2D navigation schemes for robot-assisted Minimally Invasive Surgery (MIS) to 3D navigation. Previously, a general and registration-free framework was proposed for 3D shape instantiation based on Kernel Partial Least Square Regression (KPLSR), requiring manually segmented anatomical structures as a prerequisite. Two hyper-parameters, the Gaussian width and the component number, also need to be carefully adjusted. A Deep Convolutional Neural Network (DCNN)-based framework has also been proposed to reconstruct a 3D point cloud from a single 2D image, with end-to-end and fully automatic learning. In this paper, an Instantiation-Net is proposed to reconstruct the 3D mesh of a target from a single 2D image, using a DCNN to extract features from the 2D image, a Graph Convolutional Network (GCN) to reconstruct the 3D mesh, and Fully Connected (FC) layers to connect the DCNN to the GCN. Detailed validation was performed to demonstrate the practical strength of the method and its potential clinical use.
Tasks
Published 2019-09-16
URL https://arxiv.org/abs/1909.08986v1
PDF https://arxiv.org/pdf/1909.08986v1.pdf
PWC https://paperswithcode.com/paper/instantiation-net-3d-mesh-reconstruction-from
Repo
Framework
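
A toy PyTorch sketch of the DCNN → FC → GCN pipeline named in the abstract. The vertex count, layer widths, and the dense graph convolution (a normalised-adjacency matrix multiply) are placeholder assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class InstantiationNetSketch(nn.Module):
    """Minimal DCNN -> FC -> GCN sketch; all sizes are hypothetical."""

    def __init__(self, adj, n_vertices=642, feat_dim=64):
        super().__init__()
        self.n_vertices, self.feat_dim = n_vertices, feat_dim
        self.cnn = nn.Sequential(                       # DCNN image encoder
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.fc = nn.Linear(32, n_vertices * feat_dim)  # FC bridge to vertices
        self.register_buffer("adj", adj)                # (V, V) normalised mesh adjacency
        self.gcn1 = nn.Linear(feat_dim, feat_dim)
        self.gcn2 = nn.Linear(feat_dim, 3)              # x, y, z per mesh vertex

    def forward(self, img):                             # img: (B, 1, H, W)
        h = self.fc(self.cnn(img)).view(-1, self.n_vertices, self.feat_dim)
        h = torch.relu(self.adj @ self.gcn1(h))         # one graph-convolution step
        return self.adj @ self.gcn2(h)                  # (B, V, 3) mesh vertices
```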

S-TRIGGER: Continual State Representation Learning via Self-Triggered Generative Replay

Title S-TRIGGER: Continual State Representation Learning via Self-Triggered Generative Replay
Authors Hugo Caselles-Dupré, Michael Garcia-Ortiz, David Filliat
Abstract We consider the problem of building a state representation model for control, in a continual learning setting. As the environment changes, the aim is to efficiently compress the sensory state’s information without losing past knowledge, and then use Reinforcement Learning on the resulting features for efficient policy learning. To this end, we propose S-TRIGGER, a general method for Continual State Representation Learning applicable to Variational Auto-Encoders and their many variants. The method is based on Generative Replay, i.e. the use of generated samples to maintain past knowledge. It comes with a statistically sound method for environment change detection, which self-triggers the Generative Replay. Our experiments on VAEs show that S-TRIGGER learns state representations that allow fast and high-performing Reinforcement Learning, while avoiding catastrophic forgetting. The resulting system is capable of autonomously learning new information without using past data and with a bounded system size. Code for our experiments is attached in the Appendix.
Tasks Continual Learning, Representation Learning
Published 2019-02-25
URL http://arxiv.org/abs/1902.09434v1
PDF http://arxiv.org/pdf/1902.09434v1.pdf
PWC https://paperswithcode.com/paper/s-trigger-continual-state-representation
Repo
Framework
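
One way to picture the self-triggered generative replay is the loop below. The change statistic (a z-score on the VAE's negative ELBO against running statistics) and the `vae` interface (`elbo`, `sample`, `fit`, `error_mean`, `error_std`) are assumptions made for illustration, not the paper's exact procedure.

```python
import numpy as np

def strigger_round(vae, new_obs, replay_ratio=0.5, z_threshold=3.0):
    """One hypothetical S-TRIGGER round: detect an environment change from
    the VAE's reconstruction quality, then train on new data mixed with
    generated replay so past knowledge is preserved."""
    errors = np.array([-vae.elbo(x) for x in new_obs])
    z = (errors.mean() - vae.error_mean) / (vae.error_std + 1e-8)
    if z > z_threshold:                        # environment change: self-trigger
        replay = vae.sample(int(replay_ratio * len(new_obs)))
        vae.fit(list(new_obs) + list(replay))  # generative replay of old envs
    else:
        vae.fit(list(new_obs))                 # no change detected: train as usual
```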

Semi Few-Shot Attribute Translation

Title Semi Few-Shot Attribute Translation
Authors Ricard Durall, Franz-Josef Pfreundt, Janis Keuper
Abstract Recent studies have shown remarkable success in image-to-image translation for attribute transfer applications. However, most existing approaches are based on deep learning and require an abundant amount of labeled data to produce good results, thereby limiting their applicability. In the same vein, recent advances in meta-learning have led to successful implementations with limited available data, enabling so-called few-shot learning. In this paper, we address this limitation of supervised methods by proposing a novel approach based on GANs. These are trained in a meta-training manner, which allows them to perform image-to-image translations using just a few labeled samples from a new target class. This work empirically demonstrates the potential of training a GAN for few-shot image-to-image translation on hair color attribute synthesis tasks, opening the door to further research on generative transfer learning.
Tasks Few-Shot Learning, Image-to-Image Translation, Meta-Learning, Transfer Learning
Published 2019-10-08
URL https://arxiv.org/abs/1910.03240v2
PDF https://arxiv.org/pdf/1910.03240v2.pdf
PWC https://paperswithcode.com/paper/semi-few-shot-attribute-translation
Repo
Framework

Swarm Behaviour Evolution via Rule Sharing and Novelty Search

Title Swarm Behaviour Evolution via Rule Sharing and Novelty Search
Authors Phillip Smith, Robert Hunjet, Aldeida Aleti, Asad Khan
Abstract We present in this paper an extension of our previous work, increasing the robustness and coverage of the evolution search via hybridisation with a state-of-the-art novelty search and accelerating the individual agent behaviour searches via a novel behaviour-component sharing technique. Via these improvements, we present Swarm Learning Classifier System 2.0 (SLCS2), a behaviour-evolving algorithm which is robust to complex environments and is seen to outperform a human behaviour designer in challenging cases of the data-transfer task across a range of environmental conditions. Additionally, we examine the impact of tailoring the SLCS2 rule generator for specific environmental conditions. We find this leads to over-fitting, as might be expected, and thus conclude that for the greatest environment flexibility a general rule generator should be utilised.
Tasks
Published 2019-10-28
URL https://arxiv.org/abs/1910.12412v1
PDF https://arxiv.org/pdf/1910.12412v1.pdf
PWC https://paperswithcode.com/paper/swarm-behaviour-evolution-via-rule-sharing
Repo
Framework
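
The novelty-search component follows the standard Lehman-Stanley formulation: a behaviour's novelty is its mean distance to the k nearest behaviour descriptors in an archive. A sketch, with the descriptor choice and k as problem-specific assumptions:

```python
import numpy as np

def novelty_score(behaviour, archive, k=15):
    """Novelty of a behaviour descriptor: mean Euclidean distance to its
    k nearest neighbours in the archive of past descriptors."""
    if not archive:
        return float("inf")           # first behaviour is maximally novel
    dists = np.sort(np.linalg.norm(np.asarray(archive) - behaviour, axis=1))
    return float(dists[:k].mean())    # uses all entries if fewer than k
```

Selecting for high novelty rather than raw fitness is what gives the hybrid search its coverage of the behaviour space.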

Third-Person Visual Imitation Learning via Decoupled Hierarchical Controller

Title Third-Person Visual Imitation Learning via Decoupled Hierarchical Controller
Authors Pratyusha Sharma, Deepak Pathak, Abhinav Gupta
Abstract We study a generalized setup for learning from demonstration to build an agent that can manipulate novel objects in unseen scenarios by looking at only a single video of human demonstration from a third-person perspective. To accomplish this goal, our agent should not only learn to understand the intent of the demonstrated third-person video in its context but also perform the intended task in its environment configuration. Our central insight is to enforce this structure explicitly during learning by decoupling what to achieve (intended task) from how to perform it (controller). We propose a hierarchical setup where a high-level module learns to generate a series of first-person sub-goals conditioned on the third-person video demonstration, and a low-level controller predicts the actions to achieve those sub-goals. Our agent acts from raw image observations without any access to the full state information. We show results on a real robotic platform using Baxter for the manipulation tasks of pouring and placing objects in a box. Project video and code are at https://pathak22.github.io/hierarchical-imitation/
Tasks Imitation Learning
Published 2019-11-21
URL https://arxiv.org/abs/1911.09676v1
PDF https://arxiv.org/pdf/1911.09676v1.pdf
PWC https://paperswithcode.com/paper/third-person-visual-imitation-learning-via-1
Repo
Framework
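
At inference time the decoupling described above reduces to a two-stage loop: the high-level module proposes the next first-person sub-goal from the third-person demonstration, and the low-level controller acts to reach it. A schematic sketch; both modules' interfaces are assumptions, not the authors' code.

```python
def hierarchical_act(high_level, low_level, demo_video, obs):
    """One step of the decoupled hierarchical controller.

    high_level : maps (third-person demo, current image) -> sub-goal image
    low_level  : maps (current image, sub-goal image) -> robot action
    """
    subgoal = high_level.predict_subgoal(demo_video, obs)  # what to achieve
    action = low_level.predict_action(obs, subgoal)        # how to perform it
    return action, subgoal
```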

Approximated Orthonormal Normalisation in Training Neural Networks

Title Approximated Orthonormal Normalisation in Training Neural Networks
Authors Guoqiang Zhang, Kenta Niwa, W. B. Kleijn
Abstract Generalisation of a deep neural network (DNN) is one major concern when employing the deep learning approach for solving practical problems. In this paper we propose a new technique, named approximated orthonormal normalisation (AON), to improve the generalisation capacity of a DNN model. Considering a weight matrix $W$ from a particular neural layer in the model, our objective is to design a function $h(W)$ such that its row vectors are approximately orthogonal to each other while allowing the DNN model to fit the training data sufficiently accurately. By doing so, co-adaptation among neurons of the same layer is avoided, which improves the network's generalisation capacity. Specifically, at each iteration, we first approximate $(WW^T)^{-1/2}$ using its Taylor expansion before multiplying it by the matrix $W$. After that, the matrix product is normalised by applying the spectral normalisation (SN) technique to obtain $h(W)$. Conceptually speaking, AON is designed to turn orthonormal regularisation into orthonormal normalisation, avoiding the manual balancing of the original and penalty functions. Experimental results show that AON yields promising validation performance compared to orthonormal regularisation.
Tasks
Published 2019-11-21
URL https://arxiv.org/abs/1911.09445v2
PDF https://arxiv.org/pdf/1911.09445v2.pdf
PWC https://paperswithcode.com/paper/approximated-orthonormal-normalisation-in
Repo
Framework
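
The AON recipe is concrete enough to sketch: form $WW^T$, approximate its inverse square root with a truncated Taylor series, multiply onto $W$, then spectrally normalise. The series order and the rescaling used to keep the series convergent are implementation assumptions.

```python
import torch

def aon(W, n_terms=3, eps=1e-12):
    """Approximated orthonormal normalisation h(W) of a weight matrix W."""
    A = W @ W.t()
    A = A / torch.linalg.matrix_norm(A, ord=2)  # eigenvalues in [0, 1]
    I = torch.eye(A.shape[0], dtype=W.dtype, device=W.device)
    R = I - A
    approx, term, coeff = I.clone(), I.clone(), 1.0
    for k in range(1, n_terms + 1):             # (1 - x)^{-1/2} series:
        coeff *= (2 * k - 1) / (2.0 * k)        # 1 + x/2 + 3x^2/8 + ...
        term = term @ R
        approx = approx + coeff * term          # approx ~= A^{-1/2}
    H = approx @ W                              # rows ~ orthonormal up to scale
    return H / (torch.linalg.matrix_norm(H, ord=2) + eps)  # spectral norm
```

Rescaling $A$ by its spectral norm changes $A^{-1/2}$ only by a scalar, which the final spectral normalisation removes anyway; that is what makes the cheap rescaling safe here.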

Edge Dithering for Robust Adaptive Graph Convolutional Networks

Title Edge Dithering for Robust Adaptive Graph Convolutional Networks
Authors Vassilis N. Ioannidis, Georgios B. Giannakis
Abstract Graph convolutional networks (GCNs) are vulnerable to perturbations of the graph structure that are either random or adversarially designed. The perturbed links modify the graph neighborhoods, which critically affects the performance of GCNs in semi-supervised learning (SSL) tasks. Aiming to robustify GCNs conditioned on the perturbed graph, the present paper generates multiple auxiliary graphs, each having its binary 0-1 edge weights flipped with probabilities designed to enhance robustness. The resultant edge-dithered auxiliary graphs are leveraged by an adaptive (A)GCN that performs SSL. Robustness is enabled through learnable graph-combining weights along with suitable regularizers. Relative to the GCN, the novel AGCN achieves markedly improved performance in tests with noisy inputs, graph perturbations, and state-of-the-art adversarial attacks. Further experiments with protein interaction networks showcase the competitive performance of AGCN for SSL over multiple graphs.
Tasks
Published 2019-10-21
URL https://arxiv.org/abs/1910.09590v1
PDF https://arxiv.org/pdf/1910.09590v1.pdf
PWC https://paperswithcode.com/paper/edge-dithering-for-robust-adaptive-graph
Repo
Framework
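
Edge dithering itself is easy to sketch: each auxiliary graph keeps an existing edge with one probability and adds an absent edge with another. The specific probabilities below are placeholders; the paper designs them to enhance robustness.

```python
import numpy as np

def dither_edges(A, p_keep=0.9, p_flip=0.05, n_graphs=10, rng=None):
    """Generate edge-dithered auxiliary graphs from a binary adjacency A.

    Each existing edge survives with prob p_keep; each absent edge is
    added with prob p_flip. Coin flips are symmetrised so the outputs
    stay valid undirected adjacency matrices.
    """
    rng = rng or np.random.default_rng()
    graphs = []
    for _ in range(n_graphs):
        U = rng.random(A.shape)
        U = np.triu(U, 1)
        U = U + U.T                              # one coin per undirected pair
        keep = (A == 1) & (U < p_keep)
        add = (A == 0) & (U < p_flip)
        G = (keep | add).astype(A.dtype)
        np.fill_diagonal(G, 0)                   # no self-loops
        graphs.append(G)
    return graphs
```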

Disentangled Cumulants Help Successor Representations Transfer to New Tasks

Title Disentangled Cumulants Help Successor Representations Transfer to New Tasks
Authors Christopher Grimm, Irina Higgins, Andre Barreto, Denis Teplyashin, Markus Wulfmeier, Tim Hertweck, Raia Hadsell, Satinder Singh
Abstract Biological intelligence can learn to solve many diverse tasks in a data-efficient manner by re-using basic knowledge and skills from one task to another. Furthermore, many such skills are acquired without explicit supervision in an intrinsically driven fashion. This is in contrast to state-of-the-art reinforcement learning agents, which typically start learning each new task from scratch and struggle with knowledge transfer. In this paper we propose a principled way to learn a basis set of policies which, when recombined through generalised policy improvement, come with guarantees on the coverage of the final task space. In particular, we concentrate on solving goal-based downstream tasks where the execution order of actions is not important. We demonstrate both theoretically and empirically that a small number of policies that reach intrinsically specified goal regions in a disentangled latent space can be re-used to quickly achieve a high level of performance on an exponentially larger number of externally specified, often significantly more complex, downstream tasks. Our learning pipeline consists of two stages. First, the agent learns to perform intrinsically generated, goal-based tasks in the total absence of environmental rewards. Second, the agent leverages this experience to quickly achieve a high level of performance on numerous diverse externally specified tasks.
Tasks Transfer Learning
Published 2019-11-25
URL https://arxiv.org/abs/1911.10866v1
PDF https://arxiv.org/pdf/1911.10866v1.pdf
PWC https://paperswithcode.com/paper/disentangled-cumulants-help-successor-1
Repo
Framework
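
The recombination step, generalised policy improvement with successor features, is compact: evaluate every basis policy's action values as $\psi_i(s,a) \cdot w$ for task weights $w$, then act greedily with respect to the maximum over policies. A NumPy sketch:

```python
import numpy as np

def gpi_action(psis, w):
    """Generalised policy improvement over a basis set of policies.

    psis : (n_policies, n_actions, d) successor features psi_i(s, a)
           at the current state, one slice per basis policy
    w    : (d,) task-defining weights over the (disentangled) cumulants
    """
    q = psis @ w                       # (n_policies, n_actions) action values
    return int(q.max(axis=0).argmax()) # greedy w.r.t. the best policy per action
```

Because the cumulants are disentangled, a new downstream task is just a new $w$; the same `psis` are re-used unchanged.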

Dynamic survival prediction in intensive care units from heterogeneous time series without the need for variable selection or pre-processing

Title Dynamic survival prediction in intensive care units from heterogeneous time series without the need for variable selection or pre-processing
Authors Jacob Deasy, Pietro Liò, Ari Ercole
Abstract We present a machine learning pipeline and model that uses the entire uncurated EHR for prediction of in-hospital mortality at arbitrary time intervals, using all available chart, lab and output events, without the need for pre-processing or feature engineering. Data for more than 45,000 American ICU patients from the MIMIC-III database were used to develop an ICU mortality prediction model. All chart, lab and output events were treated by the model in the same manner inspired by Natural Language Processing (NLP). Patient events were discretized by percentile and mapped to learnt embeddings before being passed to a Recurrent Neural Network (RNN) to provide early prediction of in-patient mortality risk. We compared mortality predictions with the Simplified Acute Physiology Score II (SAPS II) and the Oxford Acute Severity of Illness Score (OASIS). Data were split into an independent test set (10%) and a ten-fold cross-validation was carried out during training to avoid overfitting. 13,233 distinct variables with heterogeneous data types were included without manual selection or pre-processing. Recordings in the first few hours of a patient’s stay were found to be strongly predictive of mortality, outperforming models using SAPS II and OASIS scores within just 2 hours and achieving a state of the art Area Under the Receiver Operating Characteristic (AUROC) value of 0.80 (95% CI 0.79-0.80) at 12 hours vs 0.70 and 0.66 for SAPS II and OASIS at 24 hours respectively. Our model achieves a very strong performance of AUROC 0.86 (95% CI 0.85-0.86) for in-patient mortality prediction after 48 hours on the MIMIC-III dataset. Predictive performance increases over the first 48 hours of the ICU stay, but suffers from diminishing returns, providing rationale for time-limited trials of critical care and suggesting that the timing of decision making can be optimised and individualised.
Tasks Decision Making, Feature Engineering, Mortality Prediction, Time Series
Published 2019-09-13
URL https://arxiv.org/abs/1909.07214v2
PDF https://arxiv.org/pdf/1909.07214v2.pdf
PWC https://paperswithcode.com/paper/dynamic-survival-prediction-in-intensive-care
Repo
Framework
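
The pipeline's two distinctive ingredients, percentile discretisation of raw events and an embedding-plus-RNN risk model, can be sketched as follows; the bin count, embedding size, and the GRU are assumptions, not the authors' exact configuration.

```python
import numpy as np
import torch
import torch.nn as nn

def percentile_token(value, train_values, var_id, n_bins=10):
    """Map one raw chart/lab/output value to a discrete token: the
    variable's id offset plus its percentile bin over the training
    distribution (bin count here is an assumption)."""
    pct = np.searchsorted(np.sort(train_values), value) / len(train_values)
    return var_id * n_bins + min(int(pct * n_bins), n_bins - 1)

class MortalityRNNSketch(nn.Module):
    """Event tokens -> learnt embeddings -> GRU -> per-step mortality risk."""

    def __init__(self, vocab_size, emb_dim=32, hidden=64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, tokens):               # tokens: (B, T) event-id sequences
        h, _ = self.rnn(self.emb(tokens))
        return torch.sigmoid(self.head(h))   # (B, T, 1): risk at every event
```

Treating every variable the same way, as a token stream, is what lets the model take all 13,233 variables without manual selection or feature engineering.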