October 20, 2019


Paper Group ANR 38



Human activity recognition based on time series analysis using U-Net

Title Human activity recognition based on time series analysis using U-Net
Authors Yong Zhang, Yu Zhang, Zhao Zhang, Jie Bao, Yunpeng Song
Abstract Traditional human activity recognition (HAR) from time series relies on sliding-window analysis. This method suffers from the multi-class window problem: sampling points belonging to different classes within one window are mistakenly assigned a single label. In this paper, a HAR algorithm based on U-Net is proposed that labels and predicts activity at every sampling point. Triaxial accelerometer data are mapped into a single-pixel-column, multi-channel image, which is fed into the U-Net network for training and recognition. Our proposal thus performs pixel-level gesture recognition. The method requires no manual feature extraction and can effectively identify short-term behaviors within long-term activity sequences. We collected the Sanitation dataset and tested the proposed scheme on four open datasets. The experimental results show that, compared with Support Vector Machine (SVM), k-Nearest Neighbor (kNN), Decision Tree (DT), Quadratic Discriminant Analysis (QDA), Convolutional Neural Network (CNN), and Fully Convolutional Network (FCN) methods, our proposal achieves the highest accuracy and F1-score on each dataset, with stable performance and high robustness. Moreover, once the U-Net has finished training, recognition is fast enough for practical use.
Tasks Activity Recognition, Gesture Recognition, Human Activity Recognition, Time Series, Time Series Analysis
Published 2018-09-20
URL http://arxiv.org/abs/1809.08113v1
PDF http://arxiv.org/pdf/1809.08113v1.pdf
PWC https://paperswithcode.com/paper/human-activity-recognition-based-on-time
Repo
Framework
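
To make the data mapping concrete, here is a minimal sketch of the single-pixel-column image encoding the abstract describes, assuming a PyTorch-style pipeline; the window length, class count, and tensor layout are illustrative assumptions rather than the authors' exact configuration.

```python
import numpy as np
import torch

# Hypothetical window of triaxial accelerometer data: T samples x 3 axes.
T = 224
window = np.random.randn(T, 3).astype(np.float32)

# Map to a "single-pixel-column, multi-channel image": channels = 3 axes,
# height = T samples, width = 1 pixel.
image = torch.from_numpy(np.ascontiguousarray(window.T))  # (3, T)
image = image.reshape(1, 3, T, 1)  # (batch, channels, height, width)

# A trained U-Net (not shown) would emit per-pixel class scores of shape
# (batch, num_classes, T, 1); per-sample labels are the channel argmax.
num_classes = 5
logits = torch.randn(1, num_classes, T, 1)          # stand-in for unet(image)
per_sample_labels = logits.argmax(dim=1).squeeze()  # one label per sample
print(per_sample_labels.shape)  # torch.Size([224])
```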

Detecting Intentions of Vulnerable Road Users Based on Collective Intelligence

Title Detecting Intentions of Vulnerable Road Users Based on Collective Intelligence
Authors Maarten Bieshaar, Günther Reitberger, Stefan Zernetsch, Bernhard Sick, Erich Fuchs, Konrad Doll
Abstract Vulnerable road users (VRUs, i.e. cyclists and pedestrians) will play an important role in future traffic. To avoid accidents and achieve a highly efficient traffic flow, it is important to detect VRUs and to predict their intentions. In this article, a holistic approach for detecting intentions of VRUs by cooperative methods is presented. Intention detection consists of predicting basic movement primitives, e.g. standing, moving, turning, and forecasting the future trajectory. Vehicles equipped with sensors, data processing systems, and communication abilities, referred to as intelligent vehicles, acquire and maintain a local model of their surrounding traffic environment, e.g. crossing cyclists. Heterogeneous, open sets of agents (cooperating and interacting vehicles, infrastructure such as cameras and laser scanners, and VRUs equipped with smart devices and body-worn sensors) exchange information, forming a multi-modal sensor system whose goal is to reliably and robustly detect VRUs and their intentions while accounting for real-time requirements and uncertainties. The resulting model allows each agent to extend its perceptual horizon beyond its own sensory capabilities, enabling a longer forecast horizon. Concealments, implausibilities, and inconsistencies are resolved by the collective intelligence of the cooperating agents. Novel techniques of signal processing and modelling, in combination with analytical and learning-based approaches to pattern and activity recognition, are used for the detection and intention prediction of VRUs. Cooperation, by means of probabilistic sensor and knowledge fusion, takes place at the level of perception and intention recognition. Based on the communication requirements of the cooperative approach, a new ad hoc network strategy is proposed.
Tasks Activity Recognition, Intent Detection
Published 2018-09-11
URL http://arxiv.org/abs/1809.03916v1
PDF http://arxiv.org/pdf/1809.03916v1.pdf
PWC https://paperswithcode.com/paper/detecting-intentions-of-vulnerable-road-users
Repo
Framework
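
The "probabilistic sensor and knowledge fusion" mentioned above can be illustrated with the textbook case of fusing two independent Gaussian estimates of the same quantity by inverse-variance weighting; the sensor names and numbers below are invented for illustration, not taken from the article.

```python
import numpy as np

def fuse_gaussian(mu1, var1, mu2, var2):
    """Inverse-variance (Bayesian) fusion of two independent Gaussian
    estimates of the same quantity, e.g. a cyclist's lateral position
    seen by an infrastructure camera and a vehicle's laser scanner."""
    var = 1.0 / (1.0 / var1 + 1.0 / var2)
    mu = var * (mu1 / var1 + mu2 / var2)
    return mu, var

# Invented example: camera says 2.0 m +/- 0.5 m, laser scanner 2.4 m +/- 0.2 m.
mu, var = fuse_gaussian(2.0, 0.5**2, 2.4, 0.2**2)
print(f"fused estimate: {mu:.2f} m, std {np.sqrt(var):.2f} m")
# The fused variance is smaller than either input: the collective
# estimate is more confident than any single agent's.
```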

Adaptation to Easy Data in Prediction with Limited Advice

Title Adaptation to Easy Data in Prediction with Limited Advice
Authors Tobias Sommer Thune, Yevgeny Seldin
Abstract We derive an online learning algorithm with improved regret guarantees for 'easy' loss sequences. We consider two types of 'easiness': (a) stochastic loss sequences and (b) adversarial loss sequences with small effective range of the losses. While a number of algorithms have been proposed for exploiting small effective range in the full information setting, Gerchinovitz and Lattimore [2016] have shown the impossibility of regret scaling with the effective range of the losses in the bandit setting. We show that just one additional observation per round is sufficient to circumvent the impossibility result. The proposed Second Order Difference Adjustments (SODA) algorithm requires no prior knowledge of the effective range of the losses, $\varepsilon$, and achieves an $O(\varepsilon \sqrt{KT \ln K}) + \tilde{O}(\varepsilon K \sqrt[4]{T})$ expected regret guarantee, where $T$ is the time horizon and $K$ is the number of actions. The scaling with the effective loss range is achieved under significantly weaker assumptions than those made by Cesa-Bianchi and Shamir [2018] in an earlier attempt to circumvent the impossibility result. We also provide a regret lower bound of $\Omega(\varepsilon\sqrt{T K})$, which almost matches the upper bound. In addition, we show that in the stochastic setting SODA achieves an $O\left(\sum_{a:\Delta_a>0} \frac{K^3 \varepsilon^2}{\Delta_a}\right)$ pseudo-regret bound that holds simultaneously with the adversarial regret guarantee. In other words, SODA is safe against an unrestricted oblivious adversary and provides improved regret guarantees for at least two different types of 'easiness' simultaneously.
Tasks
Published 2018-07-02
URL https://arxiv.org/abs/1807.00636v3
PDF https://arxiv.org/pdf/1807.00636v3.pdf
PWC https://paperswithcode.com/paper/adaptation-to-easy-data-in-prediction-with
Repo
Framework
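
As a rough illustration of the feedback model (one additional observation per round), here is a hedged sketch of an exponential-weights learner with importance-weighted loss estimates; this is not SODA's actual second-order update, and the loss sequence is a random stand-in.

```python
import numpy as np

rng = np.random.default_rng(0)
K, T, eta = 5, 10_000, 0.01
weights = np.ones(K)

for t in range(T):
    probs = weights / weights.sum()
    played = rng.choice(K, p=probs)            # action we incur loss for
    extra = rng.integers(K)                    # one additional free observation
    losses = rng.uniform(0.4, 0.6, size=K)     # stand-in 'easy' loss sequence

    # Importance-weighted loss estimates using both observed entries.
    est = np.zeros(K)
    for a in {played, extra}:
        # action a is observed if played (prob probs[a]) or drawn as the
        # extra observation (prob 1/K), independently of each other
        obs_prob = probs[a] + 1.0 / K - probs[a] / K
        est[a] = losses[a] / obs_prob
    weights *= np.exp(-eta * est)
    weights /= weights.max()                   # keep weights numerically safe
```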

Random Feature Stein Discrepancies

Title Random Feature Stein Discrepancies
Authors Jonathan H. Huggins, Lester Mackey
Abstract Computable Stein discrepancies have been deployed for a variety of applications, ranging from sampler selection in posterior inference to approximate Bayesian inference to goodness-of-fit testing. Existing convergence-determining Stein discrepancies admit strong theoretical guarantees but suffer from a computational cost that grows quadratically in the sample size. While linear-time Stein discrepancies have been proposed for goodness-of-fit testing, they exhibit avoidable degradations in testing power—even when power is explicitly optimized. To address these shortcomings, we introduce feature Stein discrepancies ($\Phi$SDs), a new family of quality measures that can be cheaply approximated using importance sampling. We show how to construct $\Phi$SDs that provably determine the convergence of a sample to its target and develop high-accuracy approximations—random $\Phi$SDs (R$\Phi$SDs)—which are computable in near-linear time. In our experiments with sampler selection for approximate posterior inference and goodness-of-fit testing, R$\Phi$SDs perform as well or better than quadratic-time KSDs while being orders of magnitude faster to compute.
Tasks Bayesian Inference
Published 2018-06-20
URL http://arxiv.org/abs/1806.07788v4
PDF http://arxiv.org/pdf/1806.07788v4.pdf
PWC https://paperswithcode.com/paper/random-feature-stein-discrepancies
Repo
Framework
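
The "cheaply approximated using importance sampling" idea can be illustrated in isolation with a generic self-normalized importance-sampling estimator; the target, proposal, and integrand below are invented and unrelated to the paper's actual R$\Phi$SD construction.

```python
import numpy as np

rng = np.random.default_rng(1)

# Generic importance sampling: estimate E_{w ~ p}[f(w)] with draws from a
# proposal q. All three functions below are invented for illustration.
def p(w):                        # unnormalized standard normal density
    return np.exp(-0.5 * w**2)

def q_sample(n):                 # heavier-tailed proposal: Student-t, 3 dof
    return rng.standard_t(3, size=n)

def q(w):                        # unnormalized t(3) density
    return (1 + w**2 / 3) ** -2

f = lambda w: np.cos(w)          # integrand playing the role of a feature

w = q_sample(100_000)
r = p(w) / q(w)                             # unnormalized importance ratios
estimate = np.sum(r * f(w)) / np.sum(r)     # self-normalized estimator
print(estimate)  # close to E[cos(W)] = exp(-1/2) ~ 0.6065 for W ~ N(0,1)
```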

On the Role of Event Boundaries in Egocentric Activity Recognition from Photostreams

Title On the Role of Event Boundaries in Egocentric Activity Recognition from Photostreams
Authors Alejandro Cartas, Estefania Talavera, Petia Radeva, Mariella Dimiccoli
Abstract Event boundaries play a crucial role as a pre-processing step for the detection, localization, and recognition of human activities in videos. Typically, despite their intrinsic subjectivity, temporal bounds are provided manually as input for training action recognition algorithms. However, their role in activity recognition from egocentric photostreams has so far been neglected. In this paper, we provide insights into how automatically computed boundaries impact activity recognition results in the emerging domain of egocentric photostreams. Furthermore, we collected a new annotated dataset acquired by 15 people with a wearable photo-camera, and we use it to show the generalization capabilities of several deep-learning-based architectures to unseen users.
Tasks Activity Recognition, Egocentric Activity Recognition, Temporal Action Localization
Published 2018-09-02
URL http://arxiv.org/abs/1809.00402v2
PDF http://arxiv.org/pdf/1809.00402v2.pdf
PWC https://paperswithcode.com/paper/on-the-role-of-event-boundaries-in-egocentric
Repo
Framework
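
As a toy illustration of what an automatically computed event boundary is, the sketch below marks a boundary wherever consecutive frame embeddings change abruptly; real temporal segmentation methods, including those studied in the paper, are more elaborate.

```python
import numpy as np

def event_boundaries(features, threshold=0.5):
    """Toy boundary detector for a photostream: mark a boundary wherever
    consecutive frame embeddings differ by more than a threshold. This
    only illustrates how boundaries pre-segment a stream into events
    before per-event activity recognition."""
    d = np.linalg.norm(np.diff(features, axis=0), axis=1)
    return np.flatnonzero(d > threshold) + 1   # indices where events start

feats = np.vstack([np.zeros((5, 8)), np.ones((5, 8))])  # two fake "events"
print(event_boundaries(feats))  # [5]
```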

Activity Recognition on a Large Scale in Short Videos - Moments in Time Dataset

Title Activity Recognition on a Large Scale in Short Videos - Moments in Time Dataset
Authors Ankit Shah, Harini Kesavamoorthy, Poorva Rane, Pramati Kalwad, Alexander Hauptmann, Florian Metze
Abstract Moments capture a huge part of our lives. Accurately recognizing these moments is challenging due to their diverse and complex interpretation. Action recognition refers to classifying the action or activity present in a given video. In this work, we perform experiments on the Moments in Time dataset to accurately recognize activities occurring in 3-second clips. We use state-of-the-art techniques for visual, auditory, and spatio-temporal localization and develop a method to accurately classify the activity in the Moments in Time dataset. Our novel approach of using Visual Based Textual features and fusion techniques performs well, providing an overall 89.23% top-5 accuracy on the 20 classes - a significant improvement over the baseline TRN model.
Tasks Activity Recognition, Temporal Action Localization, Temporal Localization
Published 2018-09-01
URL http://arxiv.org/abs/1809.00241v2
PDF http://arxiv.org/pdf/1809.00241v2.pdf
PWC https://paperswithcode.com/paper/activity-recognition-on-a-large-scale-in
Repo
Framework
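
A minimal sketch of late score fusion and the top-5 accuracy metric reported above; the modality weights and random scores are illustrative assumptions, not the authors' fusion scheme.

```python
import numpy as np

rng = np.random.default_rng(2)
num_clips, num_classes = 8, 20

# Stand-ins for per-clip class scores from separate visual, audio and
# text-based models (the paper fuses such streams; these are random).
visual = rng.random((num_clips, num_classes))
audio = rng.random((num_clips, num_classes))
textual = rng.random((num_clips, num_classes))

# Simple weighted late fusion; the weights are illustrative assumptions.
fused = 0.5 * visual + 0.2 * audio + 0.3 * textual

labels = rng.integers(num_classes, size=num_clips)
top5 = np.argsort(fused, axis=1)[:, -5:]   # 5 highest-scoring classes per clip
top5_acc = np.mean([labels[i] in top5[i] for i in range(num_clips)])
print(f"top-5 accuracy: {top5_acc:.2%}")
```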

Deep Neural Networks based Modrec: Some Results with Inter-Symbol Interference and Adversarial Examples

Title Deep Neural Networks based Modrec: Some Results with Inter-Symbol Interference and Adversarial Examples
Authors S. Asim Ahmed, Subhashish Chakravarty, Michael Newhouse
Abstract Recent successes and advances of Deep Neural Networks (DNNs) in machine vision and Natural Language Processing (NLP) have motivated their use in traditional signal processing and communications systems. In this paper, we present results of such an application to the problem of automatic modulation recognition. Variations in wireless communication channels are represented by statistical channel models, whose parameterization will grow more complex with the advent of 5G. We report the effect of a simple two-path channel model on our naive deep-neural-network-based implementation, and we also report the impact of adversarial perturbations of the input signal.
Tasks
Published 2018-11-14
URL http://arxiv.org/abs/1811.06103v1
PDF http://arxiv.org/pdf/1811.06103v1.pdf
PWC https://paperswithcode.com/paper/deep-neural-networks-based-modrec-some
Repo
Framework
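
The "simple two-path channel model" can be sketched directly: a direct ray plus one delayed, attenuated echo and additive noise. The QPSK stand-in signal and all parameters below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

# QPSK baseband symbols as a stand-in modulated signal.
bits = rng.integers(2, size=(1024, 2))
x = ((2 * bits[:, 0] - 1) + 1j * (2 * bits[:, 1] - 1)) / np.sqrt(2)

def two_path_channel(x, delay=3, gain=0.5, snr_db=20):
    """Simple two-path model: direct ray plus one delayed, attenuated
    echo, then additive white Gaussian noise. Parameters are invented."""
    echo = np.concatenate([np.zeros(delay, dtype=complex), x[:-delay]])
    y = x + gain * echo
    noise_power = 10 ** (-snr_db / 10)
    noise = np.sqrt(noise_power / 2) * (rng.standard_normal(len(x))
                                        + 1j * rng.standard_normal(len(x)))
    return y + noise

y = two_path_channel(x)
# An adversarial example would add a small crafted perturbation delta with
# ||delta|| << ||x|| chosen to flip the classifier's modulation decision.
```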

Online Human Activity Recognition using Low-Power Wearable Devices

Title Online Human Activity Recognition using Low-Power Wearable Devices
Authors Ganapati Bhat, Ranadeep Deb, Vatika Vardhan Chaurasia, Holly Shill, Umit Y. Ogras
Abstract Human activity recognition (HAR) has attracted significant research interest due to its applications in health monitoring and patient rehabilitation. Recent research on HAR focuses on using smartphones due to their widespread use. However, this leads to inconvenient use, limited choice of sensors, and inefficient use of resources, since smartphones are not designed for HAR. This paper presents the first HAR framework that can perform both online training and inference. The proposed framework starts with a novel technique that generates features using the fast Fourier and discrete wavelet transforms of a textile-based stretch sensor and accelerometer. Using these features, we design an artificial neural network classifier which is trained online using the policy gradient algorithm. Experiments on a low-power IoT device (TI-CC2650 MCU) with nine users show 97.7% accuracy in identifying six activities and their transitions with less than 12.5 mW power consumption.
Tasks Activity Recognition, Human Activity Recognition
Published 2018-08-26
URL http://arxiv.org/abs/1808.08615v2
PDF http://arxiv.org/pdf/1808.08615v2.pdf
PWC https://paperswithcode.com/paper/online-human-activity-recognition-using-low
Repo
Framework
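
A minimal sketch of FFT- and DWT-based feature generation in the spirit of the framework above, assuming the PyWavelets package is available; the specific bins, wavelet, and sub-band energies are illustrative choices, not necessarily the paper's.

```python
import numpy as np
import pywt  # PyWavelets, for the discrete wavelet transform

def har_features(segment):
    """FFT and DWT features of one sensor segment (stretch sensor or one
    accelerometer axis); the feature choices are illustrative assumptions."""
    spectrum = np.abs(np.fft.rfft(segment))
    fft_feats = spectrum[:8]                        # leading frequency bins

    coeffs = pywt.wavedec(segment, "db1", level=3)  # Haar DWT, 3 levels
    dwt_feats = np.array([np.sum(c**2) for c in coeffs])  # sub-band energies

    return np.concatenate([fft_feats, dwt_feats])

segment = np.sin(np.linspace(0, 8 * np.pi, 128)) + 0.1 * np.random.randn(128)
print(har_features(segment).shape)  # (12,) -> 8 FFT bins + 4 sub-band energies
```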

When Regression Meets Manifold Learning for Object Recognition and Pose Estimation

Title When Regression Meets Manifold Learning for Object Recognition and Pose Estimation
Authors Mai Bui, Sergey Zakharov, Shadi Albarqouni, Slobodan Ilic, Nassir Navab
Abstract In this work, we propose a method for object recognition and pose estimation from depth images using convolutional neural networks. Previous methods addressing this problem rely on manifold learning to learn low-dimensional viewpoint descriptors and employ them in a nearest-neighbor search over an estimated descriptor space. In comparison, we create an efficient multi-task learning framework combining manifold descriptor learning and pose regression. By combining the strengths of manifold learning using a triplet loss and of pose regression, we can either estimate the pose directly, reducing the complexity compared to a NN search, or use the learned descriptors for NN descriptor matching. In an in-depth experimental evaluation of the novel loss function, we observed that the view descriptors learned by the network are much more discriminative, resulting in an almost 30% increase in relative pose accuracy compared to related works. For directly regressed poses, we likewise obtained a significant improvement over simple pose regression. By leveraging the advantages of both manifold learning and regression, we improve on the current state of the art for object recognition and pose retrieval, as we demonstrate through in-depth experimental evaluation.
Tasks Multi-Task Learning, Object Recognition, Pose Estimation
Published 2018-05-16
URL http://arxiv.org/abs/1805.06400v1
PDF http://arxiv.org/pdf/1805.06400v1.pdf
PWC https://paperswithcode.com/paper/when-regression-meets-manifold-learning-for
Repo
Framework
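
A hedged sketch of such a combined objective: a triplet loss shaping the descriptor manifold plus a pose-regression loss, trained jointly; the margin, weighting, and quaternion pose parameterization are assumptions for illustration, not the paper's exact loss.

```python
import torch
import torch.nn as nn

# Triplet loss pulls same-object/similar-view descriptors together and
# pushes others apart; MSE regresses the pose from the same network.
triplet = nn.TripletMarginLoss(margin=0.2)
mse = nn.MSELoss()

def multi_task_loss(anchor_desc, pos_desc, neg_desc,
                    pred_pose, true_pose, alpha=1.0):
    manifold_term = triplet(anchor_desc, pos_desc, neg_desc)
    pose_term = mse(pred_pose, true_pose)
    return manifold_term + alpha * pose_term   # alpha balances the tasks

# Stand-in network outputs: 32-d descriptors and 4-d quaternion poses.
a, p, n = (torch.randn(16, 32) for _ in range(3))
pred, true = torch.randn(16, 4), torch.randn(16, 4)
loss = multi_task_loss(a, p, n, pred, true)
print(loss.item())  # backward() on this would train both tasks jointly
```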

Extraction of Airways using Graph Neural Networks

Title Extraction of Airways using Graph Neural Networks
Authors Raghavendra Selvan, Thomas Kipf, Max Welling, Jesper H. Pedersen, Jens Petersen, Marleen de Bruijne
Abstract We present extraction of tree structures, such as airways, from image data as a graph refinement task. To this end, we propose a graph auto-encoder model that uses an encoder based on graph neural networks (GNNs) to learn embeddings from input node features and a decoder to predict connections between nodes. Performance of the GNN model is compared with mean-field networks in their ability to extract airways from 3D chest CT scans.
Tasks
Published 2018-04-12
URL http://arxiv.org/abs/1804.04436v1
PDF http://arxiv.org/pdf/1804.04436v1.pdf
PWC https://paperswithcode.com/paper/extraction-of-airways-using-graph-neural
Repo
Framework
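
A minimal sketch of a graph auto-encoder for this kind of graph refinement: a GNN encoder producing node embeddings and a decoder predicting connections from embedding inner products. The single averaging layer and the toy graph are generic illustrations, not the paper's exact model.

```python
import torch

def gnn_encoder(A_hat, X, W):
    """One round of normalized neighborhood averaging followed by a
    linear map: a minimal GNN layer. A_hat is the normalized adjacency
    with self-loops, X the node features, W a learned weight matrix."""
    return torch.relu(A_hat @ X @ W)

def inner_product_decoder(Z):
    """Predicted connection probability for every node pair:
    sigmoid of the embedding inner products."""
    return torch.sigmoid(Z @ Z.T)

# Toy graph of 4 candidate airway-branch nodes with 8-d input features.
n, d_in, d_emb = 4, 8, 2
A = torch.tensor([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0.]])
I = torch.eye(n)
deg = (A + I).sum(1)
A_hat = torch.diag(deg.rsqrt()) @ (A + I) @ torch.diag(deg.rsqrt())

X = torch.randn(n, d_in)
W = torch.randn(d_in, d_emb)           # would be learned in practice
Z = gnn_encoder(A_hat, X, W)
edge_probs = inner_product_decoder(Z)  # trained against the reference tree
```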

Multimodal Language Analysis with Recurrent Multistage Fusion

Title Multimodal Language Analysis with Recurrent Multistage Fusion
Authors Paul Pu Liang, Ziyin Liu, Amir Zadeh, Louis-Philippe Morency
Abstract Computational modeling of human multimodal language is an emerging research area in natural language processing spanning the language, visual and acoustic modalities. Comprehending multimodal language requires modeling not only the interactions within each modality (intra-modal interactions) but more importantly the interactions between modalities (cross-modal interactions). In this paper, we propose the Recurrent Multistage Fusion Network (RMFN) which decomposes the fusion problem into multiple stages, each of them focused on a subset of multimodal signals for specialized, effective fusion. Cross-modal interactions are modeled using this multistage fusion approach which builds upon intermediate representations of previous stages. Temporal and intra-modal interactions are modeled by integrating our proposed fusion approach with a system of recurrent neural networks. The RMFN displays state-of-the-art performance in modeling human multimodal language across three public datasets relating to multimodal sentiment analysis, emotion recognition, and speaker traits recognition. We provide visualizations to show that each stage of fusion focuses on a different subset of multimodal signals, learning increasingly discriminative multimodal representations.
Tasks Emotion Recognition, Multimodal Sentiment Analysis, Sentiment Analysis
Published 2018-08-12
URL http://arxiv.org/abs/1808.03920v1
PDF http://arxiv.org/pdf/1808.03920v1.pdf
PWC https://paperswithcode.com/paper/multimodal-language-analysis-with-recurrent
Repo
Framework
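
A hedged sketch of the stage-wise idea: each stage attends to a weighted subset of the modality signals and refines the fused code built by earlier stages. The dimensions, gating, and stage count are illustrative assumptions, not the exact RMFN architecture.

```python
import torch
import torch.nn as nn

class MultistageFusion(nn.Module):
    """Sketch: at each stage, a gate softly selects which modality signals
    to focus on, and the fused representation is refined from the mix."""
    def __init__(self, dim=32, stages=3):
        super().__init__()
        self.stages = nn.ModuleList(
            nn.Linear(2 * dim, dim) for _ in range(stages))
        self.gates = nn.ModuleList(
            nn.Linear(4 * dim, 3) for _ in range(stages))

    def forward(self, language, visual, acoustic):
        fused = torch.zeros_like(language)
        for stage, gate in zip(self.stages, self.gates):
            inp = torch.cat([fused, language, visual, acoustic], dim=-1)
            w = torch.softmax(gate(inp), dim=-1)   # which signals this stage
            mix = (w[:, :1] * language             # focuses on
                   + w[:, 1:2] * visual
                   + w[:, 2:] * acoustic)
            fused = torch.tanh(stage(torch.cat([fused, mix], dim=-1)))
        return fused

m = MultistageFusion()
out = m(*(torch.randn(8, 32) for _ in range(3)))  # (8, 32) fused vectors
```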

Video Jigsaw: Unsupervised Learning of Spatiotemporal Context for Video Action Recognition

Title Video Jigsaw: Unsupervised Learning of Spatiotemporal Context for Video Action Recognition
Authors Unaiza Ahsan, Rishi Madhok, Irfan Essa
Abstract We propose a self-supervised learning method to jointly reason about spatial and temporal context for video recognition. Recent self-supervised approaches have used spatial context [9, 34] as well as temporal coherency [32] but a combination of the two requires extensive preprocessing such as tracking objects through millions of video frames [59] or computing optical flow to determine frame regions with high motion [30]. We propose to combine spatial and temporal context in one self-supervised framework without any heavy preprocessing. We divide multiple video frames into grids of patches and train a network to solve jigsaw puzzles on these patches from multiple frames. So the network is trained to correctly identify the position of a patch within a video frame as well as the position of a patch over time. We also propose a novel permutation strategy that outperforms random permutations while significantly reducing computational and memory constraints. We use our trained network for transfer learning tasks such as video activity recognition and demonstrate the strength of our approach on two benchmark video action recognition datasets without using a single frame from these datasets for unsupervised pretraining of our proposed video jigsaw network.
Tasks Activity Recognition, Optical Flow Estimation, Temporal Action Localization, Transfer Learning, Video Recognition
Published 2018-08-22
URL http://arxiv.org/abs/1808.07507v1
PDF http://arxiv.org/pdf/1808.07507v1.pdf
PWC https://paperswithcode.com/paper/video-jigsaw-unsupervised-learning-of
Repo
Framework
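
A sketch of the pretext task: cut patches from several frames, shuffle them with one of a fixed set of permutations, and train a classifier to recover the permutation index. The grid size and the random permutation dictionary are illustrative; the paper proposes a smarter permutation strategy than pure random sampling.

```python
import numpy as np

rng = np.random.default_rng(4)

num_frames, grid, patch = 3, 2, 16
num_patches = num_frames * grid * grid        # 12 patches per puzzle

# Small permutation dictionary (stand-in for the paper's strategy).
perm_set = [rng.permutation(num_patches) for _ in range(8)]

# Cut each frame into a grid of patches, then flatten across frames.
frames = rng.random((num_frames, grid * patch, grid * patch, 3))
patches = frames.reshape(num_frames, grid, patch, grid, patch, 3)
patches = patches.transpose(0, 1, 3, 2, 4, 5).reshape(
    num_patches, patch, patch, 3)

label = rng.integers(len(perm_set))           # classification target
shuffled = patches[perm_set[label]]           # network input
# A CNN would embed each patch and be trained to map `shuffled` -> `label`,
# learning spatiotemporal context without any manual annotation.
```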

Holistic Multi-modal Memory Network for Movie Question Answering

Title Holistic Multi-modal Memory Network for Movie Question Answering
Authors Anran Wang, Anh Tuan Luu, Chuan-Sheng Foo, Hongyuan Zhu, Yi Tay, Vijay Chandrasekhar
Abstract Answering questions according to multi-modal context is a challenging problem, as it requires a deep integration of different data sources. Existing approaches only employ partial interactions among data sources in one attention hop. In this paper, we present the Holistic Multi-modal Memory Network (HMMN) framework, which fully considers the interactions between the different input sources (multi-modal context, question) in each hop. In addition, it takes answer choices into consideration during the context retrieval stage. Therefore, the proposed framework effectively integrates multi-modal context, question, and answer information, which leads to more informative context being retrieved for question answering. Our HMMN framework achieves state-of-the-art accuracy on the MovieQA dataset. Extensive ablation studies show the importance of holistic reasoning and the contributions of different attention strategies.
Tasks Question Answering, Video Question Answering
Published 2018-11-12
URL http://arxiv.org/abs/1811.04595v1
PDF http://arxiv.org/pdf/1811.04595v1.pdf
PWC https://paperswithcode.com/paper/holistic-multi-modal-memory-network-for-movie
Repo
Framework
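
A hedged sketch of answer-aware context retrieval: each candidate answer joins the question in forming the attention query, so the answer choices influence which context is retrieved, as the abstract describes. The additive query mixing and single hop are illustrative simplifications, not the HMMN architecture.

```python
import torch

def attend(query, memory):
    """Single attention hop: softmax similarity over memory slots."""
    scores = torch.softmax(memory @ query, dim=0)   # (slots,)
    return scores @ memory                          # attended summary

# Stand-in encodings: multimodal context slots, a question, answer choices.
context = torch.randn(12, 64)      # e.g. 12 fused video+subtitle slots
question = torch.randn(64)
answers = torch.randn(5, 64)

# Score each candidate answer by retrieving context with an answer-aware
# query, then matching the retrieved summary against that answer.
scores = []
for ans in answers:
    query = question + ans                 # answer choice shapes retrieval
    summary = attend(query, context)
    scores.append(torch.dot(summary, ans))
pred = int(torch.stack(scores).argmax())   # index of the chosen answer
```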

Foundations for Restraining Bolts: Reinforcement Learning with LTLf/LDLf restraining specifications

Title Foundations for Restraining Bolts: Reinforcement Learning with LTLf/LDLf restraining specifications
Authors Giuseppe De Giacomo, Luca Iocchi, Marco Favorito, Fabio Patrizi
Abstract In this work we investigate the concept of a “restraining bolt”, as envisioned in science fiction. Specifically, we introduce a novel problem in AI. We have two distinct sets of features extracted from the world: one by the agent and one by the authority imposing restraining specifications (the “restraining bolt”). The two sets are apparently unrelated, since they are of interest to independent parties; however, they both account for (aspects of) the same world. We consider the case in which the agent is a reinforcement learning agent over the first set of features, while the restraining bolt is specified logically, using linear time logic on finite traces (LTLf/LDLf), over the second set of features. We show formally, and illustrate with examples, that under general circumstances the agent can learn while shaping its goals to conform (as much as possible) to the restraining-bolt specifications.
Tasks
Published 2018-07-17
URL https://arxiv.org/abs/1807.06333v2
PDF https://arxiv.org/pdf/1807.06333v2.pdf
PWC https://paperswithcode.com/paper/reinforcement-learning-for-ltlfldlf-goals
Repo
Framework
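
A toy sketch of the mechanism: an LTLf/LDLf specification compiles to a finite automaton that tracks the bolt's feature stream, and the agent's reward is shaped by the automaton's progress. The specification ("never enter a forbidden cell twice in a row"), the automaton, and the rewards below are all invented for illustration.

```python
# DFA states: 0 = safe, 1 = just visited forbidden, 2 = violated (sink).
DFA = {
    (0, "ok"): 0, (0, "forbidden"): 1,
    (1, "ok"): 0, (1, "forbidden"): 2,
    (2, "ok"): 2, (2, "forbidden"): 2,
}

def bolt_reward(dfa_state, bolt_feature):
    """Advance the automaton on the bolt's feature stream and emit an
    extra reward term (here, a penalty on violating the spec)."""
    next_state = DFA[(dfa_state, bolt_feature)]
    reward = -10.0 if next_state == 2 else 0.0
    return next_state, reward

# In the learning loop, the agent's state is augmented with dfa_state and
# the shaped reward is env_reward + bolt_r, so the policy learns to
# conform to the restraining specification as much as possible.
dfa_state, total = 0, 0.0
for feature in ["ok", "forbidden", "ok", "forbidden", "forbidden"]:
    dfa_state, bolt_r = bolt_reward(dfa_state, feature)
    total += bolt_r
print(dfa_state, total)  # 2 -10.0 -> the last two steps violated the spec
```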

Deep learning for in vitro prediction of pharmaceutical formulations

Title Deep learning for in vitro prediction of pharmaceutical formulations
Authors Yilong Yang, Zhuyifan Ye, Yan Su, Qianqian Zhao, Xiaoshan Li, Defang Ouyang
Abstract Current pharmaceutical formulation development still relies strongly on the traditional trial-and-error approach driven by the individual experience of pharmaceutical scientists, which is laborious, time-consuming, and costly. Recently, deep learning has been widely applied in many challenging domains because of its capability for automatic feature extraction. The aim of this research is to use deep learning to predict pharmaceutical formulations. In this paper, two different types of dosage forms were chosen as model systems. Evaluation criteria suitable for pharmaceutics were applied to assess the performance of the models. Moreover, an automatic dataset selection algorithm was developed for selecting representative data as the validation and test datasets. Six machine learning methods were compared with deep learning. The results show that the accuracies of both deep neural networks were above 80% and higher than those of the other machine learning models, demonstrating good predictive performance on pharmaceutical formulations. In summary, deep learning, combined with the automatic data-splitting algorithm and evaluation criteria suitable for pharmaceutical formulation data, was developed here for the first time for the prediction of pharmaceutical formulations. The cross-disciplinary integration of pharmaceutics and artificial intelligence may shift the paradigm of pharmaceutical research from experience-dependent studies to data-driven methodologies.
Tasks
Published 2018-09-06
URL http://arxiv.org/abs/1809.02069v1
PDF http://arxiv.org/pdf/1809.02069v1.pdf
PWC https://paperswithcode.com/paper/deep-learning-for-in-vitro-prediction-of
Repo
Framework
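
The abstract does not spell out the automatic dataset selection algorithm; as one plausible reading, here is a Kennard-Stone-style greedy maximin selection of a representative held-out set, labeled explicitly as an assumption rather than the paper's procedure.

```python
import numpy as np

def maximin_select(X, k, rng):
    """Pick k mutually distant samples as a representative held-out set:
    greedy maximin on feature distances (Kennard-Stone style). This is
    one plausible reading of 'automatic dataset selection', not the
    paper's actual algorithm."""
    chosen = [rng.integers(len(X))]
    dists = np.linalg.norm(X - X[chosen[0]], axis=1)
    while len(chosen) < k:
        nxt = int(np.argmax(dists))    # farthest point from the chosen set
        chosen.append(nxt)
        dists = np.minimum(dists, np.linalg.norm(X - X[nxt], axis=1))
    return np.array(chosen)

rng = np.random.default_rng(5)
X = rng.random((200, 10))              # stand-in formulation descriptors
test_idx = maximin_select(X, 20, rng)
train_idx = np.setdiff1d(np.arange(len(X)), test_idx)
print(len(train_idx), len(test_idx))   # 180 20
```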