January 29, 2020


Paper Group ANR 639


Influence of Pointing on Learning to Count: A Neuro-Robotics Model


Title	Influence of Pointing on Learning to Count: A Neuro-Robotics Model
Authors	Leszek Pecyna, Angelo Cangelosi
Abstract	In this paper, a neuro-robotics model capable of counting using gestures is introduced. The contribution of gestures to learning to count is tested under various model and training conditions. Two studies are presented in this article: in the first, we combine different modalities of the robot’s neural network; in the second, we propose a novel training procedure for it. The model is trained with pointing data from an iCub robot simulator. The behaviour of the model is in line with that of human children in terms of how performance changes depending on gesture production.
Tasks
Published	2019-07-09
URL	https://arxiv.org/abs/1907.05269v1
PDF	https://arxiv.org/pdf/1907.05269v1.pdf
PWC	https://paperswithcode.com/paper/influence-of-pointing-on-learning-to-count-a
Repo
Framework

A Linearly Constrained Nonparametric Framework for Imitation Learning


Title	A Linearly Constrained Nonparametric Framework for Imitation Learning
Authors	Yanlong Huang, Darwin G. Caldwell
Abstract	In recent years, a myriad of advanced results have been reported in the imitation learning community, ranging from parametric to non-parametric, probabilistic to non-probabilistic, and Bayesian to frequentist approaches. Meanwhile, numerous applications (e.g., grasping tasks and human-robot collaboration) further demonstrate the applicability of imitation learning across a wide range of domains. While much of the literature is dedicated to learning human skills in unconstrained environments, the problem of learning constrained motor skills has not yet received equal attention. In fact, constrained skills are widespread in robotic systems. For instance, when a robot is required to write letters on a board, its end-effector trajectory must comply with the plane constraint imposed by the board. In this paper, we aim to tackle the problem of imitation learning with linear constraints. Specifically, we propose to exploit the probabilistic properties of multiple demonstrations and subsequently incorporate them into a linearly constrained optimization problem, which leads to a non-parametric solution. In addition, a connection between our framework and classical model predictive control is provided. Several examples, including simulated writing and locomotion tasks, are presented to show the effectiveness of our framework.
Tasks	Imitation Learning
Published	2019-09-15
URL	https://arxiv.org/abs/1909.07374v1
PDF	https://arxiv.org/pdf/1909.07374v1.pdf
PWC	https://paperswithcode.com/paper/a-linearly-constrained-nonparametric
Repo
Framework
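The idea of adapting a demonstrated trajectory distribution to a linear constraint can be illustrated with a much simpler Gaussian analogue. The sketch below is not the authors' kernelized, non-parametric formulation: it assumes demonstrations are stacked as fixed-length vectors, fits their mean mu and covariance Sigma, and finds the trajectory closest to the mean in Mahalanobis distance subject to A x = b. Setting the Lagrangian gradient to zero gives x* = mu - Sigma A^T (A Sigma A^T)^(-1) (A mu - b). The helper names and the example data are hypothetical.

```python
import numpy as np


def fit_demonstrations(demos: np.ndarray, reg: float = 1e-6):
    """Mean and regularized covariance of demonstrations stacked as rows.

    demos has shape (n_demos, n_points); each row is one demonstrated trajectory.
    """
    mu = demos.mean(axis=0)
    sigma = np.cov(demos, rowvar=False) + reg * np.eye(demos.shape[1])
    return mu, sigma


def constrained_trajectory(mu, sigma, A, b):
    """Minimize (x - mu)^T Sigma^{-1} (x - mu) subject to A x = b (closed form)."""
    violation = A @ mu - b
    correction = sigma @ A.T @ np.linalg.solve(A @ sigma @ A.T, violation)
    return mu - correction


# Hypothetical example: ten noisy 1-D demonstrations of eight points each,
# constrained to start at 0.0 and end at 2.0 (e.g. touching a surface).
rng = np.random.default_rng(0)
demos = np.cumsum(rng.normal(0.25, 0.1, size=(10, 8)), axis=1)
mu, sigma = fit_demonstrations(demos)
A = np.zeros((2, 8))
A[0, 0] = 1.0
A[1, -1] = 1.0
b = np.array([0.0, 2.0])
x_star = constrained_trajectory(mu, sigma, A, b)
print(np.round(x_star, 3))
```

Because the correction is shaped by Sigma, points that co-vary strongly with the constrained ones in the demonstrations move the most, which is why demonstration statistics give a different answer than a plain projection onto the constraint.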

Driver Distraction Identification with an Ensemble of Convolutional Neural Networks


Title	Driver Distraction Identification with an Ensemble of Convolutional Neural Networks
Authors	Hesham M. Eraqi, Yehya Abouelnaga, Mohamed H. Saad, Mohamed N. Moustafa
Abstract	The World Health Organization (WHO) reported 1.25 million deaths yearly due to road traffic accidents worldwide, and the number has been continuously increasing over the last few years. Nearly a fifth of these accidents are caused by distracted drivers. Existing work on distracted driver detection is concerned with a small set of distractions (mostly cell phone usage), and unreliable ad-hoc methods are often used. In this paper, we present the first publicly available dataset for driver distraction identification with more distraction postures than existing alternatives. In addition, we propose a reliable deep learning-based solution that achieves 90% accuracy. The system consists of a genetically weighted ensemble of convolutional neural networks; we show that weighting an ensemble of classifiers with a genetic algorithm yields better classification confidence. We also study the effect of different visual elements on distraction detection by means of face and hand localization and skin segmentation. Finally, we present a thinned version of our ensemble that achieves 84.64% classification accuracy and can operate in real time.
Tasks
Published	2019-01-22
URL	http://arxiv.org/abs/1901.09097v1
PDF	http://arxiv.org/pdf/1901.09097v1.pdf
PWC	https://paperswithcode.com/paper/driver-distraction-identification-with-an
Repo
Framework
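A genetically weighted ensemble can be sketched as a search over one weight per CNN, where each candidate combines the models' class probabilities by weighted sum. The sketch below is a minimal elitist genetic algorithm under stated assumptions: fitness is plain accuracy on a held-out set (the paper's exact fitness, operators, and hyperparameters are not given in the abstract), and the softmax outputs are synthetic. Seeding the population with each single model and keeping the best half every generation guarantees the result is never worse than the best individual model on the data used for fitness.

```python
import numpy as np


def ensemble_accuracy(weights, probs, labels):
    """Accuracy of the weighted sum of per-model class probabilities.

    probs: (n_models, n_samples, n_classes); weights: (n_models,); labels: (n_samples,).
    """
    combined = np.tensordot(weights, probs, axes=1)  # -> (n_samples, n_classes)
    return float((combined.argmax(axis=1) == labels).mean())


def genetic_ensemble_weights(probs, labels, pop_size=20, generations=30,
                             mutation_scale=0.1, seed=0):
    """Search ensemble weights with a small elitist genetic algorithm (fitness = accuracy)."""
    rng = np.random.default_rng(seed)
    n_models = probs.shape[0]
    if pop_size <= n_models:
        raise ValueError("pop_size must exceed the number of models")
    population = rng.random((pop_size, n_models))
    population[:n_models] = np.eye(n_models)  # seed with each single model
    n_elite = pop_size // 2
    n_children = pop_size - n_elite
    for _ in range(generations):
        fitness = np.array([ensemble_accuracy(w, probs, labels) for w in population])
        elite = population[np.argsort(fitness)[::-1][:n_elite]]  # best half survives
        # Uniform crossover between randomly paired elite parents.
        parents = elite[rng.integers(0, n_elite, size=(n_children, 2))]
        take_first = rng.random((n_children, n_models)) < 0.5
        children = np.where(take_first, parents[:, 0], parents[:, 1])
        # Gaussian mutation, clipped so every weight stays strictly positive.
        children = np.clip(children + rng.normal(0.0, mutation_scale, children.shape), 1e-6, None)
        population = np.vstack([elite, children])
    fitness = np.array([ensemble_accuracy(w, probs, labels) for w in population])
    best = population[fitness.argmax()]
    return best / best.sum()  # positive rescaling leaves the argmax unchanged


# Synthetic stand-in for validation-set softmax outputs of three CNNs.
rng = np.random.default_rng(1)
labels = rng.integers(0, 4, size=200)
probs = rng.random((3, 200, 4))
for m, bonus in enumerate([0.3, 0.5, 0.7]):
    probs[m, np.arange(200), labels] += bonus
weights = genetic_ensemble_weights(probs, labels)
print(np.round(weights, 3), ensemble_accuracy(weights, probs, labels))
```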

Improving Human Annotation in Single Object Tracking


Title	Improving Human Annotation in Single Object Tracking
Authors	Yu Pang, Xinyi Li, Lin Yuan, Haibin Ling
Abstract	Human annotation is generally treated as ground truth in video object tracking tasks and is used for both training and evaluation. Ensuring its quality is therefore important both for the success of trackers and for fair evaluations between them. In this paper, we give a qualitative and quantitative analysis of existing human annotations. We show that human annotation tends to be non-smooth and is prone to errors under partial visibility and deformation. We propose a trajectory smoothing strategy that can handle moving scenes. We use a two-step adaptive image alignment algorithm to find the canonical view of the video sequence, and then apply different techniques to smooth the trajectories to a certain degree. Once the trajectories are converted back to the original image coordinates, they can be compared with the human annotation. Experimental results show that our method yields more consistent trajectories. To a certain degree, it can also slightly improve the trained model; beyond a certain threshold, however, the smoothing error starts to outweigh the benefit. Overall, our method can help extrapolate missing annotation frames, identify and correct human annotation outliers, and improve the quality of training data.
Tasks	Object Tracking, Video Object Tracking
Published	2019-11-07
URL	https://arxiv.org/abs/1911.02807v1
PDF	https://arxiv.org/pdf/1911.02807v1.pdf
PWC	https://paperswithcode.com/paper/improving-human-annotation-in-single-object
Repo
Framework
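The smoothing and outlier-flagging steps can be sketched with a centered moving average over the box coordinates. This assumes the boxes have already been mapped into the canonical (stabilized) view, so the two-step image alignment is omitted; the moving average is one possible smoothing technique, not necessarily one the authors used, and the outlier threshold is a hypothetical parameter.

```python
import numpy as np


def smooth_boxes(boxes: np.ndarray, window: int = 5) -> np.ndarray:
    """Centered moving average over a (T, 4) array of [x, y, w, h] boxes.

    Edges are padded by repeating the first/last box so the output keeps length T.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    pad = window // 2
    padded = np.pad(boxes.astype(float), ((pad, pad), (0, 0)), mode="edge")
    kernel = np.ones(window) / window
    return np.stack(
        [np.convolve(padded[:, k], kernel, mode="valid") for k in range(boxes.shape[1])],
        axis=1,
    )


def flag_outliers(boxes: np.ndarray, smoothed: np.ndarray, threshold: float) -> np.ndarray:
    """Indices of frames whose annotation deviates from the smoothed track by > threshold pixels."""
    return np.flatnonzero(np.abs(boxes - smoothed).max(axis=1) > threshold)


# Hypothetical track: a box drifting right at 2 px/frame, with one bad annotation at frame 5.
boxes = np.array([[10 + 2 * t, 20, 30, 40] for t in range(12)], dtype=float)
boxes[5, 0] += 15
smoothed = smooth_boxes(boxes)
print(flag_outliers(boxes, smoothed, threshold=8.0))
```

The window size plays the role of the "certain degree" of smoothing in the abstract: wider windows suppress more jitter but also bias the track on genuinely fast motion.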

Correlation Priors for Reinforcement Learning


Title	Correlation Priors for Reinforcement Learning
Authors	Bastian Alt, Adrian Šošić, Heinz Koeppl
Abstract	Many decision-making problems naturally exhibit pronounced structures inherited from the characteristics of the underlying environment. In a Markov decision process model, for example, two distinct states can have inherently related semantics or encode similar physical state configurations. This often implies locally correlated transition dynamics among the states. In order to complete a certain task in such environments, the operating agent usually needs to execute a series of temporally and spatially correlated actions. Although a variety of approaches exists to capture these correlations in continuous state-action domains, a principled solution for discrete environments is missing. In this work, we present a Bayesian learning framework based on Pólya-Gamma augmentation that enables analogous reasoning in such cases. We demonstrate the framework on a number of common decision-making related problems, such as imitation learning, subgoal extraction, system identification and Bayesian reinforcement learning. By explicitly modeling the underlying correlation structures of these problems, the proposed approach yields superior predictive performance compared to correlation-agnostic models, even when trained on data sets that are an order of magnitude smaller.
Tasks	Decision Making, Imitation Learning
Published	2019-09-11
URL	https://arxiv.org/abs/1909.05106v2
PDF	https://arxiv.org/pdf/1909.05106v2.pdf
PWC	https://paperswithcode.com/paper/correlation-priors-for-reinforcement-learning
Repo
Framework
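Pólya-Gamma augmentation rests on the fact that, conditioned on an auxiliary variable omega ~ PG(b, c), a logistic likelihood becomes Gaussian in its argument, which restores conjugacy with Gaussian (for example, correlated) priors. The abstract does not detail the model, so the sketch below shows only the basic building block: approximate PG draws from the truncated sum-of-gammas representation, checked against the known mean b/(2c) tanh(c/2). Truncating at a fixed number of terms is a simplification chosen for clarity; exact samplers exist.

```python
import numpy as np


def sample_polya_gamma(b: float, c: float, size: int, n_terms: int = 200, rng=None) -> np.ndarray:
    """Approximate draws from PG(b, c) by truncating its infinite-sum representation.

    omega = 1 / (2 pi^2) * sum_k g_k / ((k - 1/2)^2 + c^2 / (4 pi^2)),  g_k ~ Gamma(b, 1).
    """
    rng = np.random.default_rng() if rng is None else rng
    k = np.arange(1, n_terms + 1)
    denom = (k - 0.5) ** 2 + c ** 2 / (4 * np.pi ** 2)
    g = rng.gamma(shape=b, scale=1.0, size=(size, n_terms))
    return (g / denom).sum(axis=1) / (2 * np.pi ** 2)


def polya_gamma_mean(b: float, c: float) -> float:
    """Exact mean E[omega] = b / (2c) * tanh(c / 2), with limit b / 4 at c = 0."""
    return b / 4 if c == 0 else b / (2 * c) * float(np.tanh(c / 2))


rng = np.random.default_rng(0)
draws = sample_polya_gamma(1.0, 2.0, size=20_000, rng=rng)
print(draws.mean(), polya_gamma_mean(1.0, 2.0))
```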

Modeling German Verb Argument Structures: LSTMs vs. Humans


Title	Modeling German Verb Argument Structures: LSTMs vs. Humans
Authors	Charlotte Rochereau, Benoît Sagot, Emmanuel Dupoux
Abstract	LSTMs have proven very successful at language modeling. However, it remains unclear to what extent they are able to capture complex morphosyntactic structures. In this paper, we examine whether LSTMs are sensitive to verb argument structures. We introduce a German grammaticality dataset in which ungrammatical sentences are constructed by manipulating case assignments (e.g., replacing nominative with accusative or dative). We find that LSTMs are better than chance at detecting incorrect argument structures and slightly worse than humans tested on the same dataset. Surprisingly, LSTMs are contaminated by heuristics not found in humans, such as a preference for nominative noun phrases. In other respects, they show human-like results, such as biases toward particular orders of case assignments.
Tasks	Language Modelling
Published	2019-11-30
URL	https://arxiv.org/abs/1912.00239v1
PDF	https://arxiv.org/pdf/1912.00239v1.pdf
PWC	https://paperswithcode.com/paper/modeling-german-verb-argument-structures
Repo
Framework
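A common way to probe a language model's grammaticality judgements is to compare the probabilities it assigns to the two members of a minimal pair; the exact protocol used in the paper may differ. The paper studies LSTMs; the sketch below swaps in a toy add-one smoothed bigram model only so the scoring loop is self-contained. The two-sentence corpus and the pair (which replaces an accusative object with a nominative one) are illustrative and not taken from the dataset.

```python
import math
from collections import Counter


def train_bigram_lm(corpus):
    """Add-one smoothed bigram model; returns a sentence log-probability function."""
    unigrams, bigrams = Counter(), Counter()
    vocab = {"</s>"}
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens[1:])
        unigrams.update(tokens[:-1])
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    v = len(vocab)

    def log_prob(sentence):
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        return sum(
            math.log((bigrams[(a, b)] + 1) / (unigrams[a] + v))
            for a, b in zip(tokens[:-1], tokens[1:])
        )

    return log_prob


def minimal_pair_accuracy(score, pairs):
    """Fraction of (grammatical, ungrammatical) pairs where the grammatical one scores higher."""
    return sum(score(good) > score(bad) for good, bad in pairs) / len(pairs)


corpus = ["der hund sieht den mann", "der mann sieht den hund"]
pairs = [("der hund sieht den mann", "der hund sieht der mann")]
lm = train_bigram_lm(corpus)
print(minimal_pair_accuracy(lm, pairs))
```

With a neural model, only `score` changes: any function mapping a sentence to a log-probability can be passed to `minimal_pair_accuracy`.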

Minimum Bayes Risk Training of RNN-Transducer for End-to-End Speech Recognition


Title	Minimum Bayes Risk Training of RNN-Transducer for End-to-End Speech Recognition
Authors	Chao Weng, Chengzhu Yu, Jia Cui, Chunlei Zhang, Dong Yu
Abstract	In this work, we propose minimum Bayes risk (MBR) training of RNN-Transducer (RNN-T) for end-to-end speech recognition. Specifically, starting from a model trained with the RNN-T loss, MBR training is conducted by minimizing the expected edit distance between the reference label sequence and N-best hypotheses generated on the fly. We also introduce a heuristic to incorporate an external neural network language model (NNLM) in RNN-T beam search decoding and explore MBR training with the external NNLM. Experimental results demonstrate that an MBR-trained model substantially outperforms an RNN-T-trained model, and that further improvements can be achieved by training with an external NNLM. Our best MBR-trained system achieves absolute character error rate (CER) reductions of 1.2% and 0.5% on read and spontaneous Mandarin speech, respectively, over a strong convolution- and transformer-based RNN-T baseline trained on ~21,000 hours of speech.
Tasks	End-To-End Speech Recognition, Language Modelling, Speech Recognition
Published	2019-11-28
URL	https://arxiv.org/abs/1911.12487v1
PDF	https://arxiv.org/pdf/1911.12487v1.pdf
PWC	https://paperswithcode.com/paper/minimum-bayes-risk-training-of-rnn-transducer
Repo
Framework
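The MBR objective above can be made concrete as the expected edit distance over an N-best list. The sketch assumes the hypothesis posteriors are obtained by renormalizing the model's log scores over the N-best list, a common approximation in sequence-level training; the abstract does not specify this detail. It computes only the risk value: actual training would differentiate it with respect to the network's log scores, which plain Python floats cannot do. The example hypotheses and scores are made up.

```python
import math


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (characters for CER, tokens for WER)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def expected_risk(reference, nbest, log_scores):
    """MBR risk: expected edit distance under scores renormalized over the N-best list."""
    top = max(log_scores)
    unnorm = [math.exp(s - top) for s in log_scores]  # shift for numerical stability
    total = sum(unnorm)
    return sum(u / total * edit_distance(reference, hyp) for u, hyp in zip(unnorm, nbest))


# Hypothetical 3-best list for a Mandarin utterance, scored at the character level.
print(expected_risk("今天天气很好", ["今天天气很好", "今天天汽很好", "今天天气很"], [-1.2, -2.0, -2.5]))
```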

Variational Inference with Latent Space Quantization for Adversarial Resilience


Title	Variational Inference with Latent Space Quantization for Adversarial Resilience
Authors	Vinay Kyatham, Mayank Mishra, Tarun Kumar Yadav, Deepak Mishra, Prathosh AP
Abstract	Despite their tremendous success in modelling high-dimensional data manifolds, deep neural networks suffer from the threat of adversarial attacks: perceptually valid, input-like samples obtained through careful perturbation that degrade the performance of the underlying model. Major concerns with existing defense mechanisms include poor generalization across different attacks and models, and long inference times. In this paper, we propose a generalized defense mechanism capitalizing on the expressive power of regularized latent-space-based generative models. We design an adversarial filter that requires no access to the classifier or the adversary, which makes it usable in tandem with any classifier. The basic idea is to learn a Lipschitz-constrained mapping from the data manifold, incorporating adversarial perturbations, to a quantized latent space, and to re-map it to the true data manifold. Specifically, we simultaneously auto-encode the data manifold and, implicitly, its perturbations through the perturbations of the regularized and quantized generative latent space, realized using variational inference. We demonstrate the efficacy of the proposed formulation in providing resilience against multiple attack types (black- and white-box) and methods, while running in almost real time. Our experiments show that the proposed method surpasses state-of-the-art techniques in several cases.
Tasks	Quantization
Published	2019-03-24
URL	https://arxiv.org/abs/1903.09940v2
PDF	https://arxiv.org/pdf/1903.09940v2.pdf
PWC	https://paperswithcode.com/paper/variational-inference-with-latent-space
Repo
Framework
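Why a Lipschitz constraint pairs well with a quantized latent space can be shown in a few lines. In the sketch below, a spectrally normalized linear map stands in for the learned variational encoder (an assumption for illustration only), and quantization snaps each latent coordinate to a uniform grid. If an encoder has Lipschitz constant L and a clean latent sits on a grid point, any input perturbation with L2 norm below step/(2L) moves every coordinate by less than step/2, so the quantized code, and hence the reconstruction, is unchanged.

```python
import numpy as np


def spectral_normalize(W: np.ndarray, lipschitz: float = 1.0) -> np.ndarray:
    """Rescale W so its largest singular value (its L2 Lipschitz constant) equals `lipschitz`."""
    return W * (lipschitz / np.linalg.norm(W, 2))


def quantize(z: np.ndarray, step: float) -> np.ndarray:
    """Snap each latent coordinate to the nearest point of a uniform grid."""
    return step * np.round(z / step)


rng = np.random.default_rng(0)
W = spectral_normalize(rng.normal(size=(3, 5)), lipschitz=1.0)  # stand-in linear encoder
step = 0.5
code = np.array([0.5, -1.0, 1.5])             # a grid point in latent space
x = np.linalg.pinv(W) @ code                  # an input that encodes exactly to `code`
delta = rng.normal(size=5)
delta *= 0.2 * step / np.linalg.norm(delta)   # ||delta||_2 = 0.1, below step / 2
print(quantize(W @ x, step), quantize(W @ (x + delta), step))
```

Real latents do not sit exactly on grid points, so in practice the guarantee holds only for perturbations smaller than the latent's distance to the nearest quantization boundary, divided by L.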

Digital Twin approach to Clinical DSS with Explainable AI


Title	Digital Twin approach to Clinical DSS with Explainable AI
Authors	Dattaraj Jagdish Rao, Shraddha Mane
Abstract	We propose a digital twin approach to improve healthcare decision support systems (DSS) with a combination of domain knowledge and data. Domain knowledge helps build decision thresholds that doctors can use to determine a risk or recommend a treatment or test based on the specific patient condition. However, these assessments tend to be highly subjective and differ from doctor to doctor and from patient to patient. We propose a system that collates this subjective risk by compiling data from different doctors treating different patients and builds a machine learning model that learns from this knowledge. We then use state-of-the-art explainability techniques to derive explanations from this model. These explanations summarize the domain knowledge that different doctors applied in different cases, giving a more generic perspective. At the same time, the explanations are specific to a particular patient and customized for their condition. This constitutes a form of digital twin for the patient that can be used to refine the decision boundaries of previously defined decision tables that help in diagnosis. We show an example of running this analysis for liver disease risk diagnosis.
Tasks
Published	2019-10-22
URL	https://arxiv.org/abs/1910.13520v1
PDF	https://arxiv.org/pdf/1910.13520v1.pdf
PWC	https://paperswithcode.com/paper/digital-twin-approach-to-clinical-dss-with
Repo
Framework
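The abstract does not name the model or the explainability method. As a minimal illustration of a patient-specific explanation, the sketch uses a hypothetical linear risk score and attributes a patient's deviation from the reference cohort to individual features as w_j (x_j - mean_j). For a linear model with independent features these contributions coincide with SHAP values, and they sum exactly to the patient's score minus the cohort's average score. Feature names, weights, and the cohort are made up.

```python
import numpy as np


def linear_risk_score(weights, bias, x):
    """Hypothetical linear risk model f(x) = w . x + b (works on one row or a matrix)."""
    return x @ weights + bias


def per_patient_contributions(weights, x_patient, background):
    """Feature contributions w_j * (x_j - mean_j) relative to a reference cohort."""
    return weights * (x_patient - background.mean(axis=0))


rng = np.random.default_rng(0)
feature_names = ["lab_a", "lab_b", "lab_c"]        # hypothetical lab measurements
weights, bias = np.array([0.8, -0.5, 0.3]), -0.2   # hypothetical fitted model
background = rng.normal(size=(500, 3))             # hypothetical reference cohort
patient = np.array([1.5, -0.2, 0.4])

contributions = per_patient_contributions(weights, patient, background)
for name, value in sorted(zip(feature_names, contributions), key=lambda p: -abs(p[1])):
    print(f"{name}: {value:+.3f}")
```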

On the Interaction Between Deep Detectors and Siamese Trackers in Video Surveillance


Title	On the Interaction Between Deep Detectors and Siamese Trackers in Video Surveillance
Authors	Madhu Kiran, Vivek Tiwari, Le Thanh Nguyen-Meidine, Eric Granger
Abstract	Visual object tracking (VOT) is an important function in many real-time video surveillance applications, such as localization and spatio-temporal recognition of persons. In real-world applications, an object detector and tracker must interact periodically to discover new objects and thereby initiate tracks. Periodic interactions with the detector can also allow the tracker to validate and/or update its object template with new bounding boxes. However, bounding boxes provided by a state-of-the-art detector are noisy due to changes in appearance, background and occlusion, which can cause the tracker to drift. Moreover, CNN-based detectors can provide a high level of accuracy at the expense of computational complexity, so interactions should be minimized for real-time applications. In this paper, a new approach is proposed to manage detector-tracker interactions for trackers from the Siamese-FC family. By integrating a change detection mechanism into a deep Siamese-FC tracker, its template can be adapted in response to changes in a target’s appearance that lead to drift during tracking. An abrupt change detection triggers an update of the tracker template using the bounding box produced by the detector, while in the case of a gradual change, the detector is used to update an evolving set of templates for robust matching. Experiments were performed using state-of-the-art Siamese-FC trackers and the YOLOv3 detector on a subset of videos from the OTB-100 dataset that mimic video surveillance scenarios. Results highlight the importance of accurate detectors for reliable VOT. They also indicate that our adaptive Siamese trackers are robust to noisy object detections and can significantly improve the performance of Siamese-FC tracking.
Tasks	Object Tracking, Visual Object Tracking
Published	2019-10-31
URL	https://arxiv.org/abs/1910.14552v1
PDF	https://arxiv.org/pdf/1910.14552v1.pdf
PWC	https://paperswithcode.com/paper/on-the-interaction-between-deep-detectors-and
Repo
Framework
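The abrupt/gradual update policy described in the abstract can be sketched as a small decision rule. The sketch assumes the change signal is a drop in the Siamese matching score relative to a reference score; the thresholds and template-set size are hypothetical, and bounding-box tuples stand in for the image templates a real Siamese-FC tracker would crop from them.

```python
def classify_change(reference_score: float, current_score: float,
                    abrupt_drop: float = 0.3, gradual_drop: float = 0.1) -> str:
    """Label the drop in template-matching score as 'abrupt', 'gradual', or 'none'."""
    drop = reference_score - current_score
    if drop >= abrupt_drop:
        return "abrupt"
    if drop >= gradual_drop:
        return "gradual"
    return "none"


def update_templates(templates, change, detector_box, max_templates=5):
    """Abrupt change: restart from the detector box. Gradual: add it to the evolving set."""
    if change == "abrupt":
        return [detector_box]
    if change == "gradual":
        return (templates + [detector_box])[-max_templates:]
    return templates


templates = [(40, 30, 20, 50)]  # hypothetical initial box (x, y, w, h)
for score, det_box in [(0.88, (41, 30, 20, 50)), (0.76, (43, 31, 21, 50)), (0.45, (60, 35, 22, 52))]:
    change = classify_change(0.9, score)
    templates = update_templates(templates, change, det_box)
    print(change, templates)
```

Only the "abrupt" and "gradual" branches consult the detector, which reflects the abstract's goal of keeping costly detector interactions to a minimum.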

SimpleBooks: Long-term dependency book dataset with simplified English vocabulary for word-level language modeling


Title	SimpleBooks: Long-term dependency book dataset with simplified English vocabulary for word-level language modeling
Authors	Huyen Nguyen
Abstract	With language modeling becoming the popular base task for unsupervised representation learning in Natural Language Processing, it is important to develop new architectures and techniques for faster and better training of language models. However, due to a peculiarity of languages (the larger the dataset, the higher the average number of times a word appears in that dataset), datasets of different sizes have very different properties. Architectures performing well on small datasets might not perform well on larger ones. For example, LSTM models perform well on WikiText-2 but poorly on WikiText-103, while Transformer models perform well on WikiText-103 but not on WikiText-2. For setups like architecture search, this is a challenge: it is prohibitively costly to run a search on the full dataset, yet experiments on smaller ones are not indicative. In this paper, we introduce SimpleBooks, a small dataset whose average word frequency is as high as that of much larger ones. Created from 1,573 Gutenberg books with the highest ratio of word-level book length to vocabulary size, SimpleBooks contains 92M word-level tokens, on par with WikiText-103 (103M tokens), but has a vocabulary of 98K, about a third of WikiText-103’s. SimpleBooks can be downloaded from https://dldata-public.s3.us-east-2.amazonaws.com/simplebooks.zip.
Tasks	Language Modelling, Representation Learning, Unsupervised Representation Learning
Published	2019-11-27
URL	https://arxiv.org/abs/1911.12391v1
PDF	https://arxiv.org/pdf/1911.12391v1.pdf
PWC	https://paperswithcode.com/paper/simplebooks-long-term-dependency-book-dataset
Repo
Framework
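The selection criterion in the abstract (word-level book length divided by vocabulary size, i.e. the average number of occurrences per word type) is easy to state in code. The sketch below assumes plain whitespace tokenization and uses two made-up "books"; the paper's actual tokenization and preprocessing are not described in the abstract.

```python
def average_word_frequency(tokens):
    """Word-level length divided by vocabulary size, i.e. mean occurrences per word type."""
    return len(tokens) / len(set(tokens))


def select_books(books, k):
    """Keep the k books with the highest length-to-vocabulary ratio.

    books maps a (hypothetical) title to its raw text; tokens come from whitespace splitting.
    """
    ranked = sorted(books, key=lambda title: average_word_frequency(books[title].split()),
                    reverse=True)
    return ranked[:k]


books = {
    "repetitive_book": "the cat sat . the cat sat . the cat sat .",
    "varied_book": "a quick brown fox jumps over one lazy dog",
}
print(select_books(books, 1))
```

Applied to the corpus-level figures in the abstract, the same ratio gives roughly 92M / 98K ≈ 939 occurrences per vocabulary word for SimpleBooks.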

Data assimilation in a nonlinear time-delayed dynamical system


Title	Data assimilation in a nonlinear time-delayed dynamical system
Authors	Tullio Traverso, Luca Magri
Abstract	When the heat released by a flame is sufficiently in phase with the acoustic pressure, a self-excited thermoacoustic oscillation can arise. These nonlinear oscillations are one of the biggest challenges faced in the design of safe and reliable gas turbines and rocket motors. In the worst-case scenario, uncontrolled thermoacoustic oscillations can shake an engine apart. Reduced-order thermoacoustic models, which are nonlinear and time-delayed, can only qualitatively predict thermoacoustic oscillations. To make reduced-order models quantitatively predictive, we develop a data assimilation framework for state estimation. We numerically estimate the most likely nonlinear state of a Galerkin-discretized time-delayed model of a horizontal Rijke tube, which is a prototypical combustor. Data assimilation optimally blends observations with previous state estimates (the background) to produce optimal initial conditions. A cost functional is defined to measure the statistical distance between the model output and the experimental measurements, and between the initial conditions and the background knowledge. Its minimum corresponds to the optimal state, which is computed by Lagrangian optimization with the aid of adjoint equations. We study the influence of the number of Galerkin modes (the natural acoustic modes of the duct) used to discretize the model. We show that decomposing the measured pressure signal into a finite number of modes is an effective way to enhance state estimation, especially when nonlinear modal interactions occur during the assimilation window. This work represents the first application of data assimilation to nonlinear thermoacoustics, which opens up new possibilities for real-time calibration of reduced-order models with experimental measurements.
Tasks	Calibration
Published	2019-04-09
URL	http://arxiv.org/abs/1904.05163v1
PDF	http://arxiv.org/pdf/1904.05163v1.pdf
PWC	https://paperswithcode.com/paper/data-assimilation-in-a-nonlinear-time-delayed
Repo
Framework
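The two distances in the cost functional described above can be written in the standard strong-constraint variational (4D-Var) form. The notation is an assumption, not taken from the paper: x_0 is the initial state (for a time-delayed model it must in general include the history over the delay interval), x_b the background, B and R the background and observation error covariances, y_k the measurements at time t_k, M_{0→k} the model propagating x_0 to t_k, and H the observation operator mapping the Galerkin state to pressure.

```latex
\mathcal{J}(\mathbf{x}_0) =
  \frac{1}{2}\,\big\|\mathbf{x}_0 - \mathbf{x}_b\big\|^2_{\mathbf{B}^{-1}}
  + \frac{1}{2}\sum_{k=1}^{N}\Big\|\mathbf{y}_k - \mathcal{H}\big(\mathcal{M}_{0\to k}(\mathbf{x}_0)\big)\Big\|^2_{\mathbf{R}^{-1}},
\qquad
\|\mathbf{v}\|^2_{\mathbf{A}} \equiv \mathbf{v}^{\top}\mathbf{A}\,\mathbf{v}
```

The first term penalizes departure from the background and the second the mismatch with the measurements over the assimilation window; the adjoint equations supply the gradient of J with respect to x_0 that the optimizer needs.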

On the Resilience of Deep Learning for Reduced-voltage FPGAs


Title	On the Resilience of Deep Learning for Reduced-voltage FPGAs
Authors	Kamyar Givaki, Behzad Salami, Reza Hojabr, S. M. Reza Tayaranian, Ahmad Khonsari, Dara Rahmati, Saeid Gorgin, Adrian Cristal, Osman S. Unsal
Abstract	Deep Neural Networks (DNNs) are inherently computation-intensive and power-hungry. Hardware accelerators such as Field Programmable Gate Arrays (FPGAs) are a promising solution that can satisfy these requirements for both embedded and High-Performance Computing (HPC) systems. In FPGAs, as well as CPUs and GPUs, aggressive voltage scaling below the nominal level is an effective technique for minimizing power dissipation. Unfortunately, bit-flip faults start to appear due to timing issues as the voltage is scaled down closer to the transistor threshold, thus creating a resilience issue. This paper experimentally evaluates the resilience of the training phase of DNNs in the presence of voltage-underscaling-related faults in FPGAs, especially in on-chip memories. Toward this goal, we have experimentally evaluated the resilience of LeNet-5 and of a specially designed network for the CIFAR-10 dataset, each with Rectified Linear Unit (ReLU) and Hyperbolic Tangent (Tanh) activation functions. We have found that modern FPGAs are robust enough at extremely low voltage levels and that low-voltage-related faults can be automatically masked within the training iterations, so there is no need for costly software- or hardware-oriented fault mitigation techniques like ECC. Approximately 10% more training iterations are needed to close the accuracy gap. This observation is the result of the relatively low rate of undervolting faults (<0.1%) measured on real FPGA fabrics. We have also significantly increased the fault rate for the LeNet-5 network through randomly generated fault injection campaigns and observed that the training accuracy starts to degrade. As the fault rate increases, the network with the Tanh activation function outperforms the one with ReLU in terms of accuracy; e.g., when the fault rate is 30%, the accuracy difference is 4.92%.
Tasks
Published	2019-12-26
URL	https://arxiv.org/abs/2001.00053v1
PDF	https://arxiv.org/pdf/2001.00053v1.pdf
PWC	https://paperswithcode.com/paper/on-the-resilience-of-deep-learning-for
Repo
Framework
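A software fault-injection campaign like the one described above can be sketched by flipping random bits in the stored weights. The sketch assumes weights are stored as IEEE-754 float32 (the FPGA design may use a different number format) and models faults as independent, uniformly placed bit flips at a given rate; the abstract does not specify the exact fault model. The original array is left untouched.

```python
import numpy as np


def inject_bit_flips(weights: np.ndarray, fault_rate: float, rng: np.random.Generator) -> np.ndarray:
    """Return a float32 copy of `weights` with each stored bit flipped with probability fault_rate."""
    faulty = np.ascontiguousarray(weights, dtype=np.float32).copy()
    bits = faulty.view(np.uint32).reshape(-1)          # shares memory with `faulty`
    n_bits = bits.size * 32
    n_faults = rng.binomial(n_bits, fault_rate)        # how many bits fail this campaign
    positions = rng.choice(n_bits, size=n_faults, replace=False)
    masks = np.left_shift(np.uint32(1), (positions % 32).astype(np.uint32))
    np.bitwise_xor.at(bits, positions // 32, masks)    # handles several flips in one word
    return faulty


rng = np.random.default_rng(0)
layer = rng.normal(size=(64, 64)).astype(np.float32)   # hypothetical weight matrix
corrupted = inject_bit_flips(layer, 1e-3, rng)
print("corrupted weights:", int((corrupted.view(np.uint32) != layer.view(np.uint32)).sum()))
```

Flips in exponent bits can turn a weight into a huge value or NaN, while mantissa flips are usually benign, which is one reason low fault rates can be absorbed by further training iterations.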

Censored Quantile Regression Forests


Title	Censored Quantile Regression Forests
Authors	Alexander Hanbo Li, Jelena Bradic
Abstract	Random forests are a powerful non-parametric regression method, but they are severely limited in the presence of randomly censored observations and, when naively applied, can exhibit poor predictive performance due to the incurred biases. Based on a local adaptive representation of random forests, we develop a regression adjustment for randomly censored regression quantile models. The regression adjustment is based on new estimating equations that adapt to censoring and reduce to the quantile score whenever the data do not exhibit censoring. The proposed procedure, named censored quantile regression forest, allows us to estimate quantiles of time-to-event without any parametric modeling assumption. We establish its consistency under mild model specifications. Numerical studies showcase a clear advantage of the proposed procedure.
Tasks
Published	2019-02-08
URL	http://arxiv.org/abs/1902.03327v1
PDF	http://arxiv.org/pdf/1902.03327v1.pdf
PWC	https://paperswithcode.com/paper/censored-quantile-regression-forests
Repo
Framework
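The "local adaptive representation" views a forest prediction as a weighted average of training responses, with weights derived from shared leaf membership. The sketch shows only that representation and the uncensored weighted quantile it implies (in the style of quantile regression forests), which is the case the abstract says the method reduces to without censoring; the censoring adjustment via the paper's estimating equations is not reproduced. Leaf indices are assumed to come from an already-fitted forest, and the query is assumed to land in non-empty leaves.

```python
import numpy as np


def forest_weights(train_leaves: np.ndarray, test_leaves: np.ndarray) -> np.ndarray:
    """Random-forest weights w_i(x) for one query point.

    train_leaves: (n_trees, n_train) leaf index of each training point in each tree.
    test_leaves:  (n_trees,) leaf index of the query point in each tree.
    Each tree spreads weight 1 uniformly over training points sharing the query's leaf.
    """
    same_leaf = train_leaves == test_leaves[:, None]
    per_tree = same_leaf / same_leaf.sum(axis=1, keepdims=True)
    return per_tree.mean(axis=0)


def weighted_quantile(y: np.ndarray, weights: np.ndarray, tau: float) -> float:
    """Smallest y whose weighted empirical CDF reaches tau."""
    order = np.argsort(y)
    cdf = np.cumsum(weights[order])
    idx = min(int(np.searchsorted(cdf, tau)), len(y) - 1)  # guard against rounding at tau = 1
    return float(y[order][idx])


# Hypothetical two-tree forest over four training points.
train_leaves = np.array([[0, 0, 1, 1],
                         [0, 1, 1, 1]])
test_leaves = np.array([0, 1])
y_train = np.array([1.0, 2.0, 3.0, 4.0])
w = forest_weights(train_leaves, test_leaves)
print(w, weighted_quantile(y_train, w, 0.5))
```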

A low-power end-to-end hybrid neuromorphic framework for surveillance applications


Title	A low-power end-to-end hybrid neuromorphic framework for surveillance applications
Authors	Andres Ussa, Luca Della Vedova, Vandana Reddy Padala, Deepak Singla, Jyotibdha Acharya, Charles Zhang Lei, Garrick Orchard, Arindam Basu, Bharath Ramesh
Abstract	With the success of deep learning, object recognition systems that can be deployed for real-world applications are becoming commonplace. However, inference that needs to take place largely on the "edge" (not processed on servers) is a computationally and memory-intensive workload, making it intractable for low-power mobile nodes and remote security applications. To address this challenge, this paper proposes a low-power (5 W) end-to-end neuromorphic framework for object tracking and classification using event-based cameras, which possess desirable properties such as low power consumption (5-14 mW) and high dynamic range (120 dB). In contrast to traditional approaches that use event-by-event processing, this work uses a mixed frame and event approach to achieve energy savings with high performance. Using a frame-based region proposal method based on the density of foreground events, hardware-friendly object tracking is implemented using the apparent object velocity while tackling occlusion scenarios. For low-power classification of the tracked objects, the event camera is interfaced to IBM TrueNorth, which is time-multiplexed to handle up to eight instances for a traffic monitoring application. The frame-based object track input is converted back to spikes for TrueNorth classification via the energy-efficient deep network (EEDN) pipeline. Using originally collected datasets, we train the TrueNorth model on the hardware track outputs, instead of on ground-truth object locations as is commonly done, and demonstrate the efficacy of our system in handling practical surveillance scenarios. Finally, we compare the proposed methodologies to state-of-the-art event-based systems for object tracking and classification, and demonstrate the use case of our neuromorphic approach for low-power applications without sacrificing performance.
Tasks	Object Recognition, Object Tracking
Published	2019-10-22
URL	https://arxiv.org/abs/1910.09806v3
PDF	https://arxiv.org/pdf/1910.09806v3.pdf
PWC	https://paperswithcode.com/paper/a-low-power-end-to-end-hybrid-neuromorphic
Repo
Framework
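The region proposal step based on the density of foreground events can be sketched as follows. The sketch assumes events arrive as pixel-coordinate arrays collected over a fixed time window, and that "density" means a per-pixel event count thresholded at a hypothetical `min_events`. It accumulates the events into a frame, thresholds it, and returns one bounding box per 4-connected blob. Real pipelines typically downsample the frame first; that step and the velocity-based tracking are omitted.

```python
from collections import deque

import numpy as np


def event_frame(xs: np.ndarray, ys: np.ndarray, width: int, height: int) -> np.ndarray:
    """Count events per pixel over one time window."""
    frame = np.zeros((height, width), dtype=np.int32)
    np.add.at(frame, (ys, xs), 1)  # unbuffered add handles repeated pixels
    return frame


def propose_regions(frame: np.ndarray, min_events: int = 2):
    """Bounding boxes (x_min, y_min, x_max, y_max) of 4-connected dense-event blobs."""
    mask = frame >= min_events
    visited = np.zeros_like(mask)
    height, width = mask.shape
    boxes = []
    for r in range(height):
        for c in range(width):
            if not mask[r, c] or visited[r, c]:
                continue
            queue = deque([(r, c)])
            visited[r, c] = True
            r_min = r_max = r
            c_min = c_max = c
            while queue:  # breadth-first flood fill of one blob
                y, x = queue.popleft()
                r_min, r_max = min(r_min, y), max(r_max, y)
                c_min, c_max = min(c_min, x), max(c_max, x)
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < height and 0 <= nx < width and mask[ny, nx] and not visited[ny, nx]:
                        visited[ny, nx] = True
                        queue.append((ny, nx))
            boxes.append((c_min, r_min, c_max, r_max))
    return boxes


# Hypothetical window: a small object firing twice per pixel, plus one isolated noise event.
xs = np.array([2, 2, 3, 3, 8])
ys = np.array([3, 3, 3, 3, 8])
frame = event_frame(xs, ys, width=10, height=10)
print(propose_regions(frame, min_events=2))
```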