Paper Group ANR 639
Influence of Pointing on Learning to Count: A Neuro-Robotics Model
Title | Influence of Pointing on Learning to Count: A Neuro-Robotics Model |
Authors | Leszek Pecyna, Angelo Cangelosi |
Abstract | In this paper, a neuro-robotics model capable of counting using gestures is introduced. The contribution of gestures to learning to count is tested with various model and training conditions. Two studies are presented in this article. In the first, we combine different modalities of the robot’s neural network; in the second, a novel training procedure for it is proposed. The model is trained with pointing data from an iCub robot simulator. The behaviour of the model is in line with that of human children in terms of performance change depending on gesture production. |
Tasks | |
Published | 2019-07-09 |
URL | https://arxiv.org/abs/1907.05269v1 |
https://arxiv.org/pdf/1907.05269v1.pdf | |
PWC | https://paperswithcode.com/paper/influence-of-pointing-on-learning-to-count-a |
Repo | |
Framework | |
A Linearly Constrained Nonparametric Framework for Imitation Learning
Title | A Linearly Constrained Nonparametric Framework for Imitation Learning |
Authors | Yanlong Huang, Darwin G. Caldwell |
Abstract | In recent years, a myriad of advanced results have been reported in the community of imitation learning, ranging from parametric to non-parametric, probabilistic to non-probabilistic and Bayesian to frequentist approaches. Meanwhile, ample applications (e.g., grasping tasks and human-robot collaborations) further show the applicability of imitation learning in a wide range of domains. While a large body of literature is dedicated to the learning of human skills in unconstrained environments, the problem of learning constrained motor skills has not yet received equal attention. In fact, constrained skills exist widely in robotic systems. For instance, when a robot is required to write letters on a board, its end-effector trajectory must comply with the plane constraint from the board. In this paper, we aim to tackle the problem of imitation learning with linear constraints. Specifically, we propose to exploit the probabilistic properties of multiple demonstrations, and subsequently incorporate them into a linearly constrained optimization problem, which finally leads to a non-parametric solution. In addition, a connection between our framework and the classical model predictive control is provided. Several examples including simulated writing and locomotion tasks are presented to show the effectiveness of our framework. |
Tasks | Imitation Learning |
Published | 2019-09-15 |
URL | https://arxiv.org/abs/1909.07374v1 |
https://arxiv.org/pdf/1909.07374v1.pdf | |
PWC | https://paperswithcode.com/paper/a-linearly-constrained-nonparametric |
Repo | |
Framework | |
Driver Distraction Identification with an Ensemble of Convolutional Neural Networks
Title | Driver Distraction Identification with an Ensemble of Convolutional Neural Networks |
Authors | Hesham M. Eraqi, Yehya Abouelnaga, Mohamed H. Saad, Mohamed N. Moustafa |
Abstract | The World Health Organization (WHO) reported 1.25 million deaths yearly due to road traffic accidents worldwide and the number has been continuously increasing over the last few years. Nearly a fifth of these accidents are caused by distracted drivers. Existing work on distracted driver detection is concerned with a small set of distractions (mostly, cell phone usage). Unreliable ad-hoc methods are often used. In this paper, we present the first publicly available dataset for driver distraction identification with more distraction postures than existing alternatives. In addition, we propose a reliable deep learning-based solution that achieves a 90% accuracy. The system consists of a genetically-weighted ensemble of convolutional neural networks; we show that a weighted ensemble of classifiers using a genetic algorithm yields a better classification confidence. We also study the effect of different visual elements in distraction detection by means of face and hand localizations, and skin segmentation. Finally, we present a thinned version of our ensemble that could achieve 84.64% classification accuracy and operate in a real-time environment. |
Tasks | |
Published | 2019-01-22 |
URL | http://arxiv.org/abs/1901.09097v1 |
http://arxiv.org/pdf/1901.09097v1.pdf | |
PWC | https://paperswithcode.com/paper/driver-distraction-identification-with-an |
Repo | |
Framework | |
Improving Human Annotation in Single Object Tracking
Title | Improving Human Annotation in Single Object Tracking |
Authors | Yu Pang, Xinyi Li, Lin Yuan, Haibin Ling |
Abstract | Human annotation is always considered as ground truth in video object tracking tasks. It is used for both training and evaluation purposes. Thus, ensuring its high quality is an important task for the success of trackers and evaluations between them. In this paper, we give a qualitative and quantitative analysis of the existing human annotations. We show that human annotation tends to be non-smooth and is prone to partial visibility and deformation. We propose a smoothing trajectory strategy with the ability to handle moving scenes. We use a two-step adaptive image alignment algorithm to find the canonical view of the video sequence. We then use different techniques to smooth the trajectories to a certain degree. Once we convert back to the original image coordinates, we can compare with the human annotation. With the experimental results, we can get more consistent trajectories. To a certain degree, it can also slightly improve the trained model. If the smoothing goes beyond a certain threshold, the smoothing error will start eating up the benefit. Overall, our method could help extrapolate the missing annotation frames or identify and correct human annotation outliers as well as help improve the training data quality. |
Tasks | Object Tracking, Video Object Tracking |
Published | 2019-11-07 |
URL | https://arxiv.org/abs/1911.02807v1 |
https://arxiv.org/pdf/1911.02807v1.pdf | |
PWC | https://paperswithcode.com/paper/improving-human-annotation-in-single-object |
Repo | |
Framework | |
Correlation Priors for Reinforcement Learning
Title | Correlation Priors for Reinforcement Learning |
Authors | Bastian Alt, Adrian Šošić, Heinz Koeppl |
Abstract | Many decision-making problems naturally exhibit pronounced structures inherited from the characteristics of the underlying environment. In a Markov decision process model, for example, two distinct states can have inherently related semantics or encode resembling physical state configurations. This often implies locally correlated transition dynamics among the states. In order to complete a certain task in such environments, the operating agent usually needs to execute a series of temporally and spatially correlated actions. Though there exists a variety of approaches to capture these correlations in continuous state-action domains, a principled solution for discrete environments is missing. In this work, we present a Bayesian learning framework based on Pólya-Gamma augmentation that enables an analogous reasoning in such cases. We demonstrate the framework on a number of common decision-making related problems, such as imitation learning, subgoal extraction, system identification and Bayesian reinforcement learning. By explicitly modeling the underlying correlation structures of these problems, the proposed approach yields superior predictive performance compared to correlation-agnostic models, even when trained on data sets that are an order of magnitude smaller in size. |
Tasks | Decision Making, Imitation Learning |
Published | 2019-09-11 |
URL | https://arxiv.org/abs/1909.05106v2 |
https://arxiv.org/pdf/1909.05106v2.pdf | |
PWC | https://paperswithcode.com/paper/correlation-priors-for-reinforcement-learning |
Repo | |
Framework | |
Modeling German Verb Argument Structures: LSTMs vs. Humans
Title | Modeling German Verb Argument Structures: LSTMs vs. Humans |
Authors | Charlotte Rochereau, Benoît Sagot, Emmanuel Dupoux |
Abstract | LSTMs have proven very successful at language modeling. However, it remains unclear to what extent they are able to capture complex morphosyntactic structures. In this paper, we examine whether LSTMs are sensitive to verb argument structures. We introduce a German grammaticality dataset in which ungrammatical sentences are constructed by manipulating case assignments (e.g., substituting nominative by accusative or dative). We find that LSTMs are better than chance in detecting incorrect argument structures and slightly worse than humans tested on the same dataset. Surprisingly, LSTMs are contaminated by heuristics not found in humans, like a preference toward nominative noun phrases. In other respects they show human-similar results, like biases for particular orders of case assignments. |
Tasks | Language Modelling |
Published | 2019-11-30 |
URL | https://arxiv.org/abs/1912.00239v1 |
https://arxiv.org/pdf/1912.00239v1.pdf | |
PWC | https://paperswithcode.com/paper/modeling-german-verb-argument-structures |
Repo | |
Framework | |
Minimum Bayes Risk Training of RNN-Transducer for End-to-End Speech Recognition
Title | Minimum Bayes Risk Training of RNN-Transducer for End-to-End Speech Recognition |
Authors | Chao Weng, Chengzhu Yu, Jia Cui, Chunlei Zhang, Dong Yu |
Abstract | In this work, we propose minimum Bayes risk (MBR) training of RNN-Transducer (RNN-T) for end-to-end speech recognition. Specifically, initialized with an RNN-T trained model, MBR training is conducted via minimizing the expected edit distance between the reference label sequence and on-the-fly generated N-best hypotheses. We also introduce a heuristic to incorporate an external neural network language model (NNLM) in RNN-T beam search decoding and explore MBR training with the external NNLM. Experimental results demonstrate that an MBR trained model outperforms an RNN-T trained model substantially and that further improvements can be achieved if trained with an external NNLM. Our best MBR trained system achieves absolute character error rate (CER) reductions of 1.2% and 0.5% on read and spontaneous Mandarin speech respectively over a strong convolution and transformer based RNN-T baseline trained on ~21,000 hours of speech. |
Tasks | End-To-End Speech Recognition, Language Modelling, Speech Recognition |
Published | 2019-11-28 |
URL | https://arxiv.org/abs/1911.12487v1 |
https://arxiv.org/pdf/1911.12487v1.pdf | |
PWC | https://paperswithcode.com/paper/minimum-bayes-risk-training-of-rnn-transducer |
Repo | |
Framework | |
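The abstract above describes the MBR objective as the expected edit distance between the reference and an N-best list. A minimal sketch of that quantity is given below, assuming hypothesis probabilities are renormalized over the N-best list; the function names and the renormalization choice are illustrative assumptions, not the authors' implementation, and no gradient computation or RNN-T decoding is shown.

```python
import math


def edit_distance(ref, hyp):
    """Levenshtein distance between two label sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def expected_edit_distance(ref, nbest):
    """Expected edit distance over an N-best list.

    nbest is a list of (hypothesis, log_prob) pairs. Assumption: the
    probabilities are renormalized so they sum to one over the list.
    """
    max_lp = max(lp for _, lp in nbest)
    weights = [math.exp(lp - max_lp) for _, lp in nbest]  # stable softmax
    total = sum(weights)
    return sum((w / total) * edit_distance(ref, hyp)
               for (hyp, _), w in zip(nbest, weights))
```

In training, this scalar would serve as the loss to minimize with respect to the model parameters that produce the hypothesis log-probabilities.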
Variational Inference with Latent Space Quantization for Adversarial Resilience
Title | Variational Inference with Latent Space Quantization for Adversarial Resilience |
Authors | Vinay Kyatham, Mayank Mishra, Tarun Kumar Yadav, Deepak Mishra, Prathosh AP |
Abstract | Despite their tremendous success in modelling high-dimensional data manifolds, deep neural networks suffer from the threat of adversarial attacks: the existence of perceptually valid input-like samples, obtained through careful perturbation, that lead to degradation in the performance of the underlying model. Major concerns with existing defense mechanisms include non-generalizability across different attacks and models, and large inference time. In this paper, we propose a generalized defense mechanism capitalizing on the expressive power of regularized latent space based generative models. We design an adversarial filter, devoid of access to classifier and adversaries, which makes it usable in tandem with any classifier. The basic idea is to learn a Lipschitz constrained mapping from the data manifold, incorporating adversarial perturbations, to a quantized latent space and re-map it to the true data manifold. Specifically, we simultaneously auto-encode the data manifold and its perturbations implicitly through the perturbations of the regularized and quantized generative latent space, realized using variational inference. We demonstrate the efficacy of the proposed formulation in providing resilience against multiple attack types (black and white box) and methods, while being almost real-time. Our experiments show that the proposed method surpasses the state-of-the-art techniques in several cases. |
Tasks | Quantization |
Published | 2019-03-24 |
URL | https://arxiv.org/abs/1903.09940v2 |
https://arxiv.org/pdf/1903.09940v2.pdf | |
PWC | https://paperswithcode.com/paper/variational-inference-with-latent-space |
Repo | |
Framework | |
Digital Twin approach to Clinical DSS with Explainable AI
Title | Digital Twin approach to Clinical DSS with Explainable AI |
Authors | Dattaraj Jagdish Rao, Shraddha Mane |
Abstract | We propose a digital twin approach to improve healthcare decision support systems with a combination of domain knowledge and data. Domain knowledge helps build decision thresholds that doctors can use to determine a risk or recommend a treatment or test based on the specific patient condition. However, these assessments tend to be highly subjective and differ from doctor to doctor and from patient to patient. We propose a system where we collate this subjective risk by compiling data from different doctors treating different patients and build a machine learning model that learns from this knowledge. Then using state-of-the-art explainability concepts we derive explanations from this model. These explanations give us a summary of different doctor domain knowledge applied in different cases to give a more generic perspective. Also these explanations are specific to a particular patient and are customized for their condition. This is a form of a digital twin for the patient that can now be used to enhance decision boundaries for earlier defined decision tables that help in diagnosis. We will show an example of running this analysis for a liver disease risk diagnosis. |
Tasks | |
Published | 2019-10-22 |
URL | https://arxiv.org/abs/1910.13520v1 |
https://arxiv.org/pdf/1910.13520v1.pdf | |
PWC | https://paperswithcode.com/paper/digital-twin-approach-to-clinical-dss-with |
Repo | |
Framework | |
On the Interaction Between Deep Detectors and Siamese Trackers in Video Surveillance
Title | On the Interaction Between Deep Detectors and Siamese Trackers in Video Surveillance |
Authors | Madhu Kiran, Vivek Tiwari, Le Thanh Nguyen-Meidine, Eric Granger |
Abstract | Visual object tracking is an important function in many real-time video surveillance applications, such as localization and spatio-temporal recognition of persons. In real-world applications, an object detector and tracker must interact on a periodic basis to discover new objects, and thereby to initiate tracks. Periodic interactions with the detector can also allow the tracker to validate and/or update its object template with new bounding boxes. However, bounding boxes provided by a state-of-the-art detector are noisy, due to changes in appearance, background and occlusion, which can cause the tracker to drift. Moreover, CNN-based detectors can provide a high level of accuracy at the expense of computational complexity, so interactions should be minimized for real-time applications. In this paper, a new approach is proposed to manage detector-tracker interactions for trackers from the Siamese-FC family. By integrating a change detection mechanism into a deep Siamese-FC tracker, its template can be adapted in response to changes in a target’s appearance that lead to drifts during tracking. An abrupt change detection triggers an update of tracker template using the bounding box produced by the detector, while in the case of a gradual change, the detector is used to update an evolving set of templates for robust matching. Experiments were performed using state-of-the-art Siamese-FC trackers and the YOLOv3 detector on a subset of videos from the OTB-100 dataset that mimic video surveillance scenarios. Results highlight the importance for reliable VOT of using accurate detectors. They also indicate that our adaptive Siamese trackers are robust to noisy object detections, and can significantly improve the performance of Siamese-FC tracking. |
Tasks | Object Tracking, Visual Object Tracking |
Published | 2019-10-31 |
URL | https://arxiv.org/abs/1910.14552v1 |
https://arxiv.org/pdf/1910.14552v1.pdf | |
PWC | https://paperswithcode.com/paper/on-the-interaction-between-deep-detectors-and |
Repo | |
Framework | |
SimpleBooks: Long-term dependency book dataset with simplified English vocabulary for word-level language modeling
Title | SimpleBooks: Long-term dependency book dataset with simplified English vocabulary for word-level language modeling |
Authors | Huyen Nguyen |
Abstract | With language modeling becoming the popular base task for unsupervised representation learning in Natural Language Processing, it is important to come up with new architectures and techniques for faster and better training of language models. However, due to a peculiarity of languages – the larger the dataset, the higher the average number of times a word appears in that dataset – datasets of different sizes have very different properties. Architectures performing well on small datasets might not perform well on larger ones. For example, LSTM models perform well on WikiText-2 but poorly on WikiText-103, while Transformer models perform well on WikiText-103 but not on WikiText-2. For setups like architectural search, this is a challenge since it is prohibitively costly to run a search on the full dataset but it is not indicative to experiment on smaller ones. In this paper, we introduce SimpleBooks, a small dataset with the average word frequency as high as that of much larger ones. Created from 1,573 Gutenberg books with the highest ratio of word-level book length to vocabulary size, SimpleBooks contains 92M word-level tokens, on par with WikiText-103 (103M tokens), but has the vocabulary of 98K, a third of WikiText-103’s. SimpleBooks can be downloaded from https://dldata-public.s3.us-east-2.amazonaws.com/simplebooks.zip. |
Tasks | Language Modelling, Representation Learning, Unsupervised Representation Learning |
Published | 2019-11-27 |
URL | https://arxiv.org/abs/1911.12391v1 |
https://arxiv.org/pdf/1911.12391v1.pdf | |
PWC | https://paperswithcode.com/paper/simplebooks-long-term-dependency-book-dataset |
Repo | |
Framework | |
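The two statistics the abstract relies on can be sketched in a few lines: the average word frequency of a dataset (tokens divided by vocabulary size) and the per-book ratio of word-level length to vocabulary size used to select books. The token and vocabulary figures below are the rounded values quoted in the abstract; the WikiText-103 vocabulary is taken as roughly three times 98K only because the abstract says so, and the function names are hypothetical.

```python
from collections import Counter


def average_word_frequency(num_tokens, vocab_size):
    """Average number of times each vocabulary word appears."""
    return num_tokens / vocab_size


def length_to_vocab_ratio(tokens):
    """Ratio of word-level length to vocabulary size for one book."""
    if not tokens:
        raise ValueError("tokens must be non-empty")
    return len(tokens) / len(Counter(tokens))


# Rounded figures from the abstract (approximate, for illustration only).
simplebooks_freq = average_word_frequency(92_000_000, 98_000)
# Assumption: WikiText-103 vocabulary is about 3 x 98K, per the abstract.
wikitext103_freq = average_word_frequency(103_000_000, 3 * 98_000)

print(f"SimpleBooks:  {simplebooks_freq:.0f} occurrences per word")
print(f"WikiText-103: {wikitext103_freq:.0f} occurrences per word")
```

Ranking candidate books by `length_to_vocab_ratio` and keeping the top ones would mirror the selection criterion described above, though the tokenization used for the actual dataset is not specified here.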
Data assimilation in a nonlinear time-delayed dynamical system
Title | Data assimilation in a nonlinear time-delayed dynamical system |
Authors | Tullio Traverso, Luca Magri |
Abstract | When the heat released by a flame is sufficiently in phase with the acoustic pressure, a self-excited thermoacoustic oscillation can arise. These nonlinear oscillations are one of the biggest challenges faced in the design of safe and reliable gas turbines and rocket motors. In the worst-case scenario, uncontrolled thermoacoustic oscillations can shake an engine apart. Reduced-order thermoacoustic models, which are nonlinear and time-delayed, can only qualitatively predict thermoacoustic oscillations. To make reduced-order models quantitatively predictive, we develop a data assimilation framework for state estimation. We numerically estimate the most likely nonlinear state of a Galerkin-discretized time delayed model of a horizontal Rijke tube, which is a prototypical combustor. Data assimilation is an optimal blending of observations with previous state estimates (background) to produce optimal initial conditions. A cost functional is defined to measure the statistical distance between the model output and the measurements from experiments; and the distance between the initial conditions and the background knowledge. Its minimum corresponds to the optimal state, which is computed by Lagrangian optimization with the aid of adjoint equations. We study the influence of the number of Galerkin modes, which are the natural acoustic modes of the duct, with which the model is discretized. We show that decomposing the measured pressure signal in a finite number of modes is an effective way to enhance state estimation, especially when nonlinear modal interactions occur during the assimilation window. This work represents the first application of data assimilation to nonlinear thermoacoustics, which opens up new possibilities for real-time calibration of reduced-order models with experimental measurements. |
Tasks | Calibration |
Published | 2019-04-09 |
URL | http://arxiv.org/abs/1904.05163v1 |
http://arxiv.org/pdf/1904.05163v1.pdf | |
PWC | https://paperswithcode.com/paper/data-assimilation-in-a-nonlinear-time-delayed |
Repo | |
Framework | |
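The cost functional described in the abstract has two parts: a misfit between model output and measurements, and a misfit between the initial condition and the background. A common way to write such a functional in variational data assimilation is sketched below. All symbols are assumptions introduced for illustration (initial state $x_0$, background $x_b$, background and observation error covariances $B$ and $R$, observation operator $H$, measurements $y_i$ at times $t_i$); the paper's own notation and weighting may differ.

```latex
J(x_0) = \frac{1}{2}\,(x_0 - x_b)^{\mathsf T} B^{-1} (x_0 - x_b)
       + \frac{1}{2}\sum_{i=1}^{N}
         \bigl(H(x(t_i)) - y_i\bigr)^{\mathsf T} R^{-1}
         \bigl(H(x(t_i)) - y_i\bigr),
\qquad
x_0^{\star} = \arg\min_{x_0} J(x_0)
```

Here $x(t_i)$ is the model state obtained by integrating the (time-delayed) model forward from $x_0$, which is why the gradient of $J$ with respect to $x_0$ is typically obtained from adjoint equations, as the abstract states.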
On the Resilience of Deep Learning for Reduced-voltage FPGAs
Title | On the Resilience of Deep Learning for Reduced-voltage FPGAs |
Authors | Kamyar Givaki, Behzad Salami, Reza Hojabr, S. M. Reza Tayaranian, Ahmad Khonsari, Dara Rahmati, Saeid Gorgin, Adrian Cristal, Osman S. Unsal |
Abstract | Deep Neural Networks (DNNs) are inherently computation-intensive and also power-hungry. Hardware accelerators such as Field Programmable Gate Arrays (FPGAs) are a promising solution that can satisfy these requirements for both embedded and High-Performance Computing (HPC) systems. In FPGAs, as well as CPUs and GPUs, aggressive voltage scaling below the nominal level is an effective technique for power dissipation minimization. Unfortunately, bit-flip faults start to appear as the voltage is scaled down closer to the transistor threshold due to timing issues, thus creating a resilience issue. This paper experimentally evaluates the resilience of the training phase of DNNs in the presence of voltage underscaling related faults of FPGAs, especially in on-chip memories. Toward this goal, we have experimentally evaluated the resilience of LeNet-5 and also a specially designed network for the CIFAR-10 dataset with two different activation functions, Rectified Linear Unit (Relu) and Hyperbolic Tangent (Tanh). We have found that modern FPGAs are robust enough at extremely low voltage levels and that low-voltage related faults can be automatically masked within the training iterations, so there is no need for costly software- or hardware-oriented fault mitigation techniques like ECC. Approximately 10% more training iterations are needed to fill the gap in the accuracy. This observation is the result of the relatively low rate of undervolting faults, i.e., <0.1%, measured on real FPGA fabrics. We have also increased the fault rate significantly for the LeNet-5 network by randomly generated fault injection campaigns and observed that the training accuracy starts to degrade. When the fault rate increases, the network with Tanh activation function outperforms the one with Relu in terms of accuracy, e.g., when the fault rate is 30% the accuracy difference is 4.92%. |
Tasks | |
Published | 2019-12-26 |
URL | https://arxiv.org/abs/2001.00053v1 |
https://arxiv.org/pdf/2001.00053v1.pdf | |
PWC | https://paperswithcode.com/paper/on-the-resilience-of-deep-learning-for |
Repo | |
Framework | |
Censored Quantile Regression Forests
Title | Censored Quantile Regression Forests |
Authors | Alexander Hanbo Li, Jelena Bradic |
Abstract | Random forests are a powerful non-parametric regression method but are severely limited in their usage in the presence of randomly censored observations, and, when naively applied, can exhibit poor predictive performance due to the incurred biases. Based on a local adaptive representation of random forests, we develop its regression adjustment for randomly censored regression quantile models. Regression adjustment is based on new estimating equations that adapt to censoring and lead to the quantile score whenever the data do not exhibit censoring. The proposed procedure, named censored quantile regression forest, allows us to estimate quantiles of time-to-event without any parametric modeling assumption. We establish its consistency under mild model specifications. Numerical studies showcase a clear advantage of the proposed procedure. |
Tasks | |
Published | 2019-02-08 |
URL | http://arxiv.org/abs/1902.03327v1 |
http://arxiv.org/pdf/1902.03327v1.pdf | |
PWC | https://paperswithcode.com/paper/censored-quantile-regression-forests |
Repo | |
Framework | |
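The abstract notes that the estimating equations reduce to the quantile score when there is no censoring. As a reminder of what that uncensored case looks like, the sketch below implements the standard quantile (pinball) loss and checks numerically that a sample quantile minimizes it. This is only the textbook uncensored loss, not the censoring-adjusted estimator of the paper; the function names and the toy data are illustrative assumptions.

```python
def pinball_loss(y, q, tau):
    """Quantile (pinball) loss of predicting q for outcome y at level tau."""
    u = y - q
    return u * (tau - (1.0 if u < 0 else 0.0))


def mean_pinball_loss(ys, q, tau):
    """Average pinball loss of a single prediction q over a sample."""
    return sum(pinball_loss(y, q, tau) for y in ys) / len(ys)


# Toy uncensored sample: the median (tau = 0.5) should minimize the loss.
sample = [1.0, 2.0, 3.0, 4.0, 100.0]
candidates = [1.0, 2.0, 3.0, 4.0, 100.0]
best = min(candidates, key=lambda q: mean_pinball_loss(sample, q, 0.5))
print(best)
```

With censored observations the plain loss above is biased because the true event times are only partially observed, which is the problem the abstract's regression adjustment targets.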
A low-power end-to-end hybrid neuromorphic framework for surveillance applications
Title | A low-power end-to-end hybrid neuromorphic framework for surveillance applications |
Authors | Andres Ussa, Luca Della Vedova, Vandana Reddy Padala, Deepak Singla, Jyotibdha Acharya, Charles Zhang Lei, Garrick Orchard, Arindam Basu, Bharath Ramesh |
Abstract | With the success of deep learning, object recognition systems that can be deployed for real-world applications are becoming commonplace. However, inference that needs to largely take place on the ‘edge’ (not processed on servers), is a highly computational and memory intensive workload, making it intractable for low-power mobile nodes and remote security applications. To address this challenge, this paper proposes a low-power (5W) end-to-end neuromorphic framework for object tracking and classification using event-based cameras that possess desirable properties such as low power consumption (5-14 mW) and high dynamic range (120 dB). Nonetheless, unlike traditional approaches of using event-by-event processing, this work uses a mixed frame and event approach to get energy savings with high performance. Using a frame-based region proposal method based on the density of foreground events, a hardware-friendly object tracking is implemented using the apparent object velocity while tackling occlusion scenarios. For low-power classification of the tracked objects, the event camera is interfaced to IBM TrueNorth, which is time-multiplexed to tackle up to eight instances for a traffic monitoring application. The frame-based object track input is converted back to spikes for TrueNorth classification via the energy efficient deep network (EEDN) pipeline. Using originally collected datasets, we train the TrueNorth model on the hardware track outputs, instead of using ground truth object locations as commonly done, and demonstrate the efficacy of our system to handle practical surveillance scenarios. Finally, we compare the proposed methodologies to state-of-the-art event-based systems for object tracking and classification, and demonstrate the use case of our neuromorphic approach for low-power applications without sacrificing on performance. |
Tasks | Object Recognition, Object Tracking |
Published | 2019-10-22 |
URL | https://arxiv.org/abs/1910.09806v3 |
https://arxiv.org/pdf/1910.09806v3.pdf | |
PWC | https://paperswithcode.com/paper/a-low-power-end-to-end-hybrid-neuromorphic |
Repo | |
Framework | |