July 28, 2019

Paper Group ANR 180

VidLoc: A Deep Spatio-Temporal Model for 6-DoF Video-Clip Relocalization

Title VidLoc: A Deep Spatio-Temporal Model for 6-DoF Video-Clip Relocalization
Authors Ronald Clark, Sen Wang, Andrew Markham, Niki Trigoni, Hongkai Wen
Abstract Machine learning techniques, namely convolutional neural networks (CNN) and regression forests, have recently shown great promise in performing 6-DoF localization of monocular images. However, in most cases image sequences, rather than only single images, are readily available. Despite this, none of the proposed learning-based approaches exploits the valuable constraint of temporal smoothness, often leading to situations where the per-frame error is larger than the camera motion. In this paper we propose a recurrent model for performing 6-DoF localization of video clips. We find that, even by considering only short sequences (20 frames), the pose estimates are smoothed and the localization error can be drastically reduced. Finally, we consider means of obtaining probabilistic pose estimates from our model. We evaluate our method on openly available real-world autonomous driving and indoor localization datasets.
Tasks Autonomous Driving
Published 2017-02-21
URL http://arxiv.org/abs/1702.06521v2
PDF http://arxiv.org/pdf/1702.06521v2.pdf
PWC https://paperswithcode.com/paper/vidloc-a-deep-spatio-temporal-model-for-6-dof
Repo
Framework
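VidLoc's central observation is that per-frame pose estimates improve when temporal smoothness across a short clip is exploited. The paper does this with a recurrent model; the sketch below swaps in a much simpler technique, a causal moving average over the positional part of per-frame estimates using the 20-frame window mentioned in the abstract, only to show why pooling neighboring frames suppresses per-frame jitter. The function name and the restriction to 3D positions are assumptions; orientations would need a rotation-aware average rather than a component-wise mean.

```python
from typing import List, Tuple

Position = Tuple[float, float, float]


def smooth_positions(positions: List[Position], window: int = 20) -> List[Position]:
    """Causal moving average over the last `window` per-frame position estimates.

    This is a simple stand-in for temporal smoothing, not VidLoc's recurrent model.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    smoothed: List[Position] = []
    for t in range(len(positions)):
        start = max(0, t - window + 1)          # frames available so far, up to `window`
        chunk = positions[start:t + 1]
        n = len(chunk)
        # Component-wise mean of x, y, z over the window.
        smoothed.append(tuple(sum(p[i] for p in chunk) / n for i in range(3)))
    return smoothed
```

For example, the noisy track `[(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (1.0, 0.0, 0.0)]` with `window=3` becomes `(0.0, 0.0, 0.0)`, `(1.0, 0.0, 0.0)`, `(1.0, 0.0, 0.0)`: the jump at the second frame is damped. A learned recurrent model can do better because it can weight frames adaptively instead of uniformly.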

Handwritten digit string recognition by combination of residual network and RNN-CTC

Title Handwritten digit string recognition by combination of residual network and RNN-CTC
Authors Hongjian Zhan, Qingqing Wang, Yue Lu
Abstract Recurrent neural networks (RNNs) and connectionist temporal classification (CTC) have shown success in many sequence labeling tasks because of their strong ability to handle problems where the alignment between the inputs and the target labels is unknown. The residual network is a relatively new convolutional neural network architecture that works well in various computer vision tasks. In this paper, we take advantage of these architectures to create a new network for handwritten digit string recognition. First, we design a residual network to extract features from input images; then we employ an RNN to model the contextual information within feature sequences and predict recognition results. At the top of this network, a standard CTC layer is applied to calculate the loss and yield the final results. These three parts compose an end-to-end trainable network. The proposed architecture achieves the highest performance on ORAND-CAR-A and ORAND-CAR-B, with recognition rates of 89.75% and 91.14%, respectively. In addition, experiments on a generated CAPTCHA dataset with much longer strings show the potential of the proposed network to handle long strings.
Tasks
Published 2017-10-09
URL http://arxiv.org/abs/1710.03112v1
PDF http://arxiv.org/pdf/1710.03112v1.pdf
PWC https://paperswithcode.com/paper/handwritten-digit-string-recognition-by
Repo
Framework
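The CTC layer at the top of this network turns per-frame class scores into a label string. A common way to read off a result is best-path (greedy) decoding: take the most likely class per frame, merge consecutive repeats, then drop blanks. The sketch below assumes the blank label has index 0; that convention, and the function name, are assumptions rather than details from the paper, which may use a different decoder.

```python
from typing import List, Optional, Sequence

BLANK = 0  # assumption: class index 0 is the CTC blank label


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]], blank: int = BLANK) -> List[int]:
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    # Most likely class for each frame.
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    decoded: List[int] = []
    prev: Optional[int] = None
    for label in best_path:
        # A label is emitted only when it differs from the previous frame's label,
        # so a blank between two identical labels lets the label repeat (e.g. "11").
        if label != prev and label != blank:
            decoded.append(label)
        prev = label
    return decoded
```

With a best path of `[1, 1, 0, 1, 2, 2, 0]` the decoder returns `[1, 1, 2]`: the first two frames collapse into one `1`, and the blank separates it from the second `1`. This is why CTC can represent repeated digits such as "11" in a digit string.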

Evaluating Social Networks Using Task-Focused Network Inference

Title Evaluating Social Networks Using Task-Focused Network Inference
Authors Ivan Brugere, Chris Kanich, Tanya Y. Berger-Wolf
Abstract Networks are representations of complex underlying social processes. However, a given network may be more suitable for modeling one behavior of individuals than another. In many cases, aggregate population models may be more effective than modeling on the network. We present a general framework for evaluating the suitability of given networks for a set of predictive tasks of interest, compared against alternative networks inferred from data. We present several interpretable network models and measures for our comparison. We apply this general framework to a case study on collective classification of music preferences in a newly available dataset of the Last.fm social network.
Tasks
Published 2017-07-08
URL http://arxiv.org/abs/1707.02385v1
PDF http://arxiv.org/pdf/1707.02385v1.pdf
PWC https://paperswithcode.com/paper/evaluating-social-networks-using-task-focused
Repo
Framework
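The framework judges a network by how well it supports a predictive task. One simple way to make that concrete is to run the same collective-classification baseline on each candidate network (the given one and the inferred alternatives) and compare accuracy. The sketch below uses a majority vote over labeled neighbors; this relational-neighbor baseline, the dictionary-based graph format, and the function name are assumptions for illustration, not necessarily the models used in the paper.

```python
from collections import Counter
from typing import Dict, Hashable, List, Optional


def neighbor_vote(adjacency: Dict[Hashable, List[Hashable]],
                  labels: Dict[Hashable, str]) -> Dict[Hashable, Optional[str]]:
    """Predict each unlabeled node's label as the majority label of its labeled neighbors.

    Nodes with no labeled neighbors get None. Ties go to the label seen first.
    """
    predictions: Dict[Hashable, Optional[str]] = {}
    for node, neighbors in adjacency.items():
        if node in labels:
            continue  # only predict for unlabeled nodes
        votes = Counter(labels[n] for n in neighbors if n in labels)
        predictions[node] = votes.most_common(1)[0][0] if votes else None
    return predictions
```

Running this once per candidate network on the same held-out users gives a directly comparable score for each network; a network whose edges do not reflect shared music taste will score no better than a population-level baseline such as always predicting the most common label.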

Learning to Attend, Copy, and Generate for Session-Based Query Suggestion

Title Learning to Attend, Copy, and Generate for Session-Based Query Suggestion
Authors Mostafa Dehghani, Sascha Rothe, Enrique Alfonseca, Pascal Fleury
Abstract Users try to articulate their complex information needs during search sessions by reformulating their queries. To make this process more effective, search engines provide related queries to help users specify their information need during the search process. In this paper, we propose a customized sequence-to-sequence model for session-based query suggestion. In our model, we employ a query-aware attention mechanism to capture the structure of the session context. This enables us to control the scope of the session from which we infer the suggested next query, which helps not only to handle noisy data but also to automatically detect session boundaries. Furthermore, we observe that, based on the user query reformulation behavior, within a single session a large portion of query terms is retained from the previously submitted queries and consists of mostly infrequent or unseen terms that are usually not included in the vocabulary. We therefore empower the decoder of our model to access the source words from the session context during decoding by incorporating a copy mechanism. Moreover, we propose evaluation metrics to assess the quality of generative models for query suggestion. We conduct an extensive set of experiments and analyses. The results suggest that our model outperforms the baselines in terms of both generating queries and scoring candidate queries for the task of query suggestion.
Tasks
Published 2017-08-11
URL http://arxiv.org/abs/1708.03418v4
PDF http://arxiv.org/pdf/1708.03418v4.pdf
PWC https://paperswithcode.com/paper/learning-to-attend-copy-and-generate-for
Repo
Framework
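The copy mechanism lets the decoder emit session words that are missing from its vocabulary. A widely used formulation mixes two distributions with a gate `p_gen`: a generation distribution over the vocabulary and a copy distribution given by attention weights over source positions. The sketch below implements that generic mixture with plain dictionaries; the exact gating and parameterization in the paper may differ, and all names here are illustrative.

```python
from typing import Dict, List


def copy_mixture(p_vocab: Dict[str, float], attention: List[float],
                 source_tokens: List[str], p_gen: float) -> Dict[str, float]:
    """Mix a generation distribution with a copy distribution over source tokens.

    final(w) = p_gen * p_vocab(w) + (1 - p_gen) * sum of attention on positions holding w
    """
    if len(attention) != len(source_tokens):
        raise ValueError("attention and source_tokens must align")
    final = {w: p_gen * p for w, p in p_vocab.items()}
    for weight, token in zip(attention, source_tokens):
        # Out-of-vocabulary source tokens receive probability only through copying.
        final[token] = final.get(token, 0.0) + (1.0 - p_gen) * weight
    return final
```

With `p_vocab = {"weather": 0.5, "news": 0.5}`, source tokens `["zurich", "weather"]`, attention `[0.75, 0.25]` and `p_gen = 0.5`, the rare term "zurich" ends up with probability 0.375 even though it is not in the vocabulary, which is exactly the retained-rare-term behavior the abstract describes.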

Enriched Deep Recurrent Visual Attention Model for Multiple Object Recognition

Title Enriched Deep Recurrent Visual Attention Model for Multiple Object Recognition
Authors Artsiom Ablavatski, Shijian Lu, Jianfei Cai
Abstract We design an Enriched Deep Recurrent Visual Attention Model (EDRAM), an improved attention-based architecture for multiple object recognition. The proposed model is a fully differentiable unit that can be optimized end-to-end using Stochastic Gradient Descent (SGD). The Spatial Transformer (ST) is employed as the visual attention mechanism, which allows the model to learn the geometric transformation of objects within images. By combining the Spatial Transformer with a powerful recurrent architecture, the proposed EDRAM can localize and recognize objects simultaneously. EDRAM has been evaluated on two publicly available datasets: MNIST Cluttered (with 70K cluttered digits) and SVHN (with up to 250K real-world images of house numbers). Experiments show that it achieves superior performance compared with state-of-the-art models.
Tasks Object Recognition
Published 2017-06-12
URL http://arxiv.org/abs/1706.03581v1
PDF http://arxiv.org/pdf/1706.03581v1.pdf
PWC https://paperswithcode.com/paper/enriched-deep-recurrent-visual-attention
Repo
Framework
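EDRAM's attention comes from the Spatial Transformer, whose sampler maps a regular output grid through a 2x3 affine matrix into the input image and reads pixel values by bilinear interpolation. The pure-Python sketch below shows only that forward sampling step, assuming normalized coordinates in [-1, 1] with grid corners aligned to pixel centers and zero padding outside the image; in a real model the same operation is written in an autodiff framework so that gradients flow into the affine parameters.

```python
import math
from typing import List, Sequence


def affine_sample(image: Sequence[Sequence[float]], theta: Sequence[Sequence[float]],
                  out_h: int, out_w: int) -> List[List[float]]:
    """Sample an out_h x out_w glimpse from `image` with a 2x3 affine matrix `theta`.

    Coordinates are normalized to [-1, 1]; pixels outside the image contribute zero.
    """
    in_h, in_w = len(image), len(image[0])

    def pixel(y: int, x: int) -> float:
        return image[y][x] if 0 <= y < in_h and 0 <= x < in_w else 0.0

    out: List[List[float]] = []
    for i in range(out_h):
        yt = -1.0 + 2.0 * i / (out_h - 1) if out_h > 1 else 0.0
        row: List[float] = []
        for j in range(out_w):
            xt = -1.0 + 2.0 * j / (out_w - 1) if out_w > 1 else 0.0
            # Target grid point -> normalized source coordinates.
            xs = theta[0][0] * xt + theta[0][1] * yt + theta[0][2]
            ys = theta[1][0] * xt + theta[1][1] * yt + theta[1][2]
            # Normalized -> pixel coordinates.
            x = (xs + 1.0) * (in_w - 1) / 2.0
            y = (ys + 1.0) * (in_h - 1) / 2.0
            x0, y0 = math.floor(x), math.floor(y)
            dx, dy = x - x0, y - y0
            # Bilinear interpolation of the four neighboring pixels.
            value = (pixel(y0, x0) * (1 - dx) * (1 - dy)
                     + pixel(y0, x0 + 1) * dx * (1 - dy)
                     + pixel(y0 + 1, x0) * (1 - dx) * dy
                     + pixel(y0 + 1, x0 + 1) * dx * dy)
            row.append(value)
        out.append(row)
    return out
```

With the identity matrix `[[1, 0, 0], [0, 1, 0]]` and the input size as output size, the image is returned unchanged; scaling the diagonal to 0.5 zooms into the center, and changing the last column shifts the glimpse, which is how the attention mechanism can localize an object.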

Compare, Compress and Propagate: Enhancing Neural Architectures with Alignment Factorization for Natural Language Inference

Title Compare, Compress and Propagate: Enhancing Neural Architectures with Alignment Factorization for Natural Language Inference
Authors Yi Tay, Luu Anh Tuan, Siu Cheung Hui
Abstract This paper presents a new deep learning architecture for Natural Language Inference (NLI). First, we introduce a new architecture in which alignment pairs are compared, compressed and then propagated to upper layers for enhanced representation learning. Second, we adopt factorization layers for efficient and expressive compression of alignment vectors into scalar features, which are then used to augment the base word representations. Our approach is designed to be conceptually simple and compact, yet powerful. We conduct experiments on three popular benchmarks, SNLI, MultiNLI and SciTail, achieving competitive performance on all of them. A lightweight parameterization of our model also achieves an approximately threefold ($\approx 3\times$) reduction in parameter size compared to existing state-of-the-art models such as ESIM and DIIN, while maintaining competitive performance. Additionally, visual analysis shows that our propagated features are highly interpretable.
Tasks Natural Language Inference, Representation Learning
Published 2017-12-30
URL http://arxiv.org/abs/1801.00102v2
PDF http://arxiv.org/pdf/1801.00102v2.pdf
PWC https://paperswithcode.com/paper/compare-compress-and-propagate-enhancing
Repo
Framework
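The factorization layers compress an alignment vector into a single scalar feature. The standard factorization machine does this with a bias, a linear term, and pairwise interactions whose weights are inner products of learned latent factors; the pairwise sum can be computed in O(kn) time instead of O(kn²). The sketch below implements that standard equation; how the paper wires it into the network (shapes, number of factors, where the scalar is concatenated) is not shown and the names are illustrative.

```python
from typing import Sequence


def fm_compress(x: Sequence[float], w0: float, w: Sequence[float],
                v: Sequence[Sequence[float]]) -> float:
    """Factorization-machine score: compress vector x into one scalar.

    v[i] is the k-dimensional latent factor of feature i. The pairwise term
    sum_{i<j} <v_i, v_j> x_i x_j is computed with the O(k*n) identity
    0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2].
    """
    n, k = len(x), len(v[0])
    linear = w0 + sum(w[i] * x[i] for i in range(n))
    pairwise = 0.0
    for f in range(k):
        s = sum(v[i][f] * x[i] for i in range(n))
        s_sq = sum((v[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - s_sq)
    return linear + pairwise
```

The low-rank factors are what keep this compact: a vector of dimension n needs only n·k interaction parameters instead of n² separate pairwise weights.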

Unsupervised state representation learning with robotic priors: a robustness benchmark

Title Unsupervised state representation learning with robotic priors: a robustness benchmark
Authors Timothée Lesort, Mathieu Seurin, Xinrui Li, Natalia Díaz Rodríguez, David Filliat
Abstract Our understanding of the world depends heavily on our capacity to produce intuitive and simplified representations which can be easily used to solve problems. We reproduce this simplification process using a neural network to build a low-dimensional state representation of the world from images acquired by a robot. As in Jonschkowski et al. 2015, we learn in an unsupervised way, using prior knowledge about the world as loss functions called robotic priors, and extend this approach to higher-dimensional, richer images to learn a 3D representation of the hand position of a robot from RGB images. We propose a quantitative evaluation of the learned representation using nearest neighbors in the state space, which allows us to assess its quality and show both the potential and the limitations of robotic priors in realistic environments. We increase the image size, add distractors and apply domain randomization, all crucial components for achieving transfer learning to real robots. Finally, we also contribute a new prior to improve the robustness of the representation. The applications of such low-dimensional state representations range from easing reinforcement learning (RL) and knowledge transfer across tasks to facilitating learning from raw data with more efficient and compact high-level representations. The results show that the robotic prior approach is able to extract a high-level representation, such as the 3D position of an arm, and organize it into a compact and coherent space of states in a challenging dataset.
Tasks Representation Learning, Transfer Learning
Published 2017-09-15
URL http://arxiv.org/abs/1709.05185v1
PDF http://arxiv.org/pdf/1709.05185v1.pdf
PWC https://paperswithcode.com/paper/unsupervised-state-representation-learning
Repo
Framework
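Robotic priors turn prior knowledge into loss terms on the learned states. One of the priors in the Jonschkowski et al. line of work is temporal coherence: the task-relevant state should change slowly, so consecutive states should be close. The sketch below computes that single term on a sequence of state vectors; the paper combines several priors (and adds its own), and the function name and plain-list representation are assumptions.

```python
from typing import Sequence


def temporal_coherence_loss(states: Sequence[Sequence[float]]) -> float:
    """Mean squared change between consecutive learned states s_t and s_{t+1}.

    Encodes the prior that the relevant state of the world changes slowly.
    """
    if len(states) < 2:
        return 0.0
    total = 0.0
    for prev, nxt in zip(states, states[1:]):
        # Squared Euclidean distance between consecutive states.
        total += sum((b - a) ** 2 for a, b in zip(prev, nxt))
    return total / (len(states) - 1)
```

On its own this loss is minimized by mapping every image to the same constant state, which is why it is only useful in combination with other priors (for example ones tied to actions and their effects) that force the representation to vary.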

Beat by Beat: Classifying Cardiac Arrhythmias with Recurrent Neural Networks

Title Beat by Beat: Classifying Cardiac Arrhythmias with Recurrent Neural Networks
Authors Patrick Schwab, Gaetano Scebba, Jia Zhang, Marco Delai, Walter Karlen
Abstract With tens of thousands of electrocardiogram (ECG) records processed by mobile cardiac event recorders every day, heart rhythm classification algorithms are an important tool for the continuous monitoring of patients at risk. We utilise an annotated dataset of 12,186 single-lead ECG recordings to build a diverse ensemble of recurrent neural networks (RNNs) that is able to distinguish between normal sinus rhythms, atrial fibrillation, other types of arrhythmia and signals that are too noisy to interpret. In order to ease learning over the temporal dimension, we introduce a novel task formulation that harnesses the natural segmentation of ECG signals into heartbeats to drastically reduce the number of time steps per sequence. Additionally, we extend our RNNs with an attention mechanism that enables us to reason about which heartbeats our RNNs focus on to make their decisions. Through the use of attention, our model maintains a high degree of interpretability, while also achieving state-of-the-art classification performance with an average F1 score of 0.79 on an unseen test set (n=3,658).
Tasks
Published 2017-10-17
URL http://arxiv.org/abs/1710.06319v2
PDF http://arxiv.org/pdf/1710.06319v2.pdf
PWC https://paperswithcode.com/paper/beat-by-beat-classifying-cardiac-arrhythmias
Repo
Framework
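The task formulation replaces one time step per ECG sample with one time step per heartbeat. Assuming R-peak locations have already been detected (the detector is not shown and is not specified here), the sketch below cuts the signal between consecutive R-peaks so that each segment becomes a single step for the RNN. The exact beat boundaries and per-beat features used in the paper may differ; the function name is illustrative.

```python
from typing import List, Sequence


def split_into_beats(signal: Sequence[float], r_peaks: Sequence[int]) -> List[List[float]]:
    """Split an ECG signal into beat segments bounded by consecutive R-peak indices.

    The resulting sequence has one step per heartbeat instead of one per sample.
    Peaks outside the signal are ignored; samples before the first and after the
    last peak are dropped in this simple version.
    """
    peaks = sorted(p for p in r_peaks if 0 <= p < len(signal))
    return [list(signal[a:b]) for a, b in zip(peaks, peaks[1:])]
```

To see the reduction in sequence length: a 30-second recording at a hypothetical 300 Hz sampling rate has 9,000 samples, but at 70 beats per minute only about 35 beats, so the RNN unrolls over roughly 35 steps instead of 9,000.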

A Big Data Analysis Framework Using Apache Spark and Deep Learning

Title A Big Data Analysis Framework Using Apache Spark and Deep Learning
Authors Anand Gupta, Hardeo Thakur, Ritvik Shrivastava, Pulkit Kumar, Sreyashi Nag
Abstract With the growing prevalence of Big Data, many advances have recently been made in this field. Frameworks such as Apache Hadoop and Apache Spark have gained a lot of traction over the past decade and have become massively popular, especially in industry. It is becoming increasingly evident that effective big data analysis is key to solving artificial intelligence problems. Accordingly, a multi-algorithm machine learning library, MLlib, was implemented in the Spark framework. While this library supports multiple machine learning algorithms, there is still scope to use the Spark setup efficiently for highly time-intensive and computationally expensive procedures such as deep learning. In this paper, we propose a novel framework that combines the distributed computing capabilities of Apache Spark with the advanced machine learning architecture of a deep multi-layer perceptron (MLP), using the popular concept of Cascade Learning. We conduct an empirical analysis of our framework on two real-world datasets. The results are encouraging and support our proposed framework, showing that it improves on traditional big data analysis methods that use either Spark or deep learning as individual elements.
Tasks
Published 2017-11-25
URL http://arxiv.org/abs/1711.09279v1
PDF http://arxiv.org/pdf/1711.09279v1.pdf
PWC https://paperswithcode.com/paper/a-big-data-analysis-framework-using-apache
Repo
Framework
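The abstract names Cascade Learning but does not spell out how the Spark and MLP stages are connected. One common reading of a cascade is that a first-stage model's output is appended to each example as an extra feature for the second stage. The minimal sketch below shows only that feature-augmentation step under this assumption; `stage_one` is a hypothetical placeholder for whatever first-stage model is used, and the paper's actual pipeline may be different. In PySpark, the same per-row function could be applied in parallel with `rdd.map`.

```python
from typing import Callable, List, Sequence


def cascade_features(rows: Sequence[Sequence[float]],
                     stage_one: Callable[[Sequence[float]], float]) -> List[List[float]]:
    """Append a first-stage model's output to each row as an extra feature for stage two.

    `stage_one` is a hypothetical stand-in for the first model in the cascade.
    """
    return [list(row) + [stage_one(row)] for row in rows]
```

For instance, with a toy first stage `lambda r: sum(r)`, the rows `[[1.0, 2.0], [3.0, 4.0]]` become `[[1.0, 2.0, 3.0], [3.0, 4.0, 7.0]]`; the second-stage MLP would then be trained on these augmented rows.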

RoboCupSimData: A RoboCup soccer research dataset

Title RoboCupSimData: A RoboCup soccer research dataset
Authors Olivia Michael, Oliver Obst, Falk Schmidsberger, Frieder Stolzenburg
Abstract RoboCup is an international scientific robot competition in which teams of multiple robots compete against each other. Its different leagues provide many sources of robotics data, which can be used for further analysis and for applying machine learning. This paper describes a large dataset from games of some of the top teams (from 2016 and 2017) in the RoboCup Soccer Simulation League (2D), where teams of 11 robots (agents) compete against each other. Overall, we had 10 different teams play each other, resulting in 45 unique pairings. For each pairing, we ran 25 matches (of 10 minutes each), leading to 1125 matches or more than 180 hours of game play. The generated CSV files amount to 17 GB of data zipped, or 229 GB unzipped. The dataset is unique in the sense that it contains both the ground truth data (global, complete, noise-free information on all objects on the field) and the noisy, local and incomplete percepts of each robot. These data are made available as CSV files, as well as in the original soccer simulator formats.
Tasks
Published 2017-11-06
URL http://arxiv.org/abs/1711.01703v1
PDF http://arxiv.org/pdf/1711.01703v1.pdf
PWC https://paperswithcode.com/paper/robocupsimdata-a-robocup-soccer-research
Repo
Framework
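The dataset size figures follow directly from the tournament design described in the abstract: every pair of the 10 teams forms one pairing, each pairing plays 25 matches, and each match lasts 10 minutes. The short check below reproduces those numbers (it uses `math.comb`, available from Python 3.8).

```python
from math import comb

teams = 10
pairings = comb(teams, 2)        # unordered pairs of distinct teams
matches = pairings * 25          # 25 matches per pairing
hours = matches * 10 / 60        # 10 minutes per match, converted to hours
print(pairings, matches, hours)  # 45 1125 187.5
```

The 187.5 hours of play is consistent with the abstract's "more than 180 hours".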

Bayesian Nonparametric Feature and Policy Learning for Decision-Making

Title Bayesian Nonparametric Feature and Policy Learning for Decision-Making
Authors Jürgen Hahn, Abdelhak M. Zoubir
Abstract Learning from demonstrations has gained increasing interest in recent years, enabling an agent to learn how to make decisions by observing an experienced teacher. While many approaches have been proposed to solve this problem, there is little work that focuses on reasoning about the observed behavior. We assume that, in many practical problems, an agent makes its decisions based on latent features, each indicating a certain action. Therefore, we propose a generative model for the states and actions. Inference reveals the number of features, the features, and the policies, allowing us to learn and analyze the underlying structure of the observed behavior. Further, our approach enables prediction of actions for new states. Simulations are used to assess the performance of the algorithm based on this model. Moreover, the problem of learning a driver’s behavior is investigated, demonstrating the performance of the proposed model in a real-world scenario.
Tasks Decision Making
Published 2017-02-26
URL http://arxiv.org/abs/1702.08001v1
PDF http://arxiv.org/pdf/1702.08001v1.pdf
PWC https://paperswithcode.com/paper/bayesian-nonparametric-feature-and-policy
Repo
Framework
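Because inference "reveals the number of features", the model needs a prior that allows an unbounded number of latent features. The abstract does not name the prior; a standard Bayesian nonparametric choice for binary latent feature matrices is the Indian buffet process (IBP), so the sketch below samples from an IBP purely as an illustration of that assumption. It shows the generative prior only, not the paper's inference procedure or policy model, and the function names are hypothetical.

```python
import math
import random
from typing import List


def poisson(lam: float, rng: random.Random) -> int:
    """Knuth's Poisson sampler (adequate for small rates)."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_ibp(num_rows: int, alpha: float, seed: int = 0) -> List[List[int]]:
    """Draw a binary feature matrix Z (rows = observations) from the IBP prior."""
    rng = random.Random(seed)
    counts: List[int] = []  # counts[k] = number of rows so far that have feature k
    rows: List[List[int]] = []
    for n in range(1, num_rows + 1):
        # Existing feature k is taken with probability counts[k] / n.
        row = [1 if rng.random() < m / n else 0 for m in counts]
        for k, z in enumerate(row):
            counts[k] += z
        # Poisson(alpha / n) brand-new features for this row.
        new = poisson(alpha / n, rng)
        row.extend([1] * new)
        counts.extend([1] * new)
        rows.append(row)
    width = len(counts)
    # Pad earlier rows with zeros for features introduced later.
    return [r + [0] * (width - len(r)) for r in rows]
```

The number of columns is not fixed in advance: it grows with the data at a rate controlled by `alpha`, which is the property that lets posterior inference decide how many latent features explain the demonstrated behavior.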

Compact Descriptors for Video Analysis: the Emerging MPEG Standard

Title Compact Descriptors for Video Analysis: the Emerging MPEG Standard
Authors Ling-Yu Duan, Vijay Chandrasekhar, Shiqi Wang, Yihang Lou, Jie Lin, Yan Bai, Tiejun Huang, Alex Chichung Kot, Wen Gao
Abstract This paper provides an overview of the ongoing Compact Descriptors for Video Analysis (CDVA) standard from the ISO/IEC Moving Picture Experts Group (MPEG). MPEG-CDVA aims to define a standardized bitstream syntax that enables interoperability in the context of video analysis applications. During the development of MPEG-CDVA, a series of techniques aimed at reducing the descriptor size and improving the video representation ability have been proposed. This article describes the new standard that is being developed and reports the performance of these key technical contributions.
Tasks
Published 2017-04-26
URL http://arxiv.org/abs/1704.08141v1
PDF http://arxiv.org/pdf/1704.08141v1.pdf
PWC https://paperswithcode.com/paper/compact-descriptors-for-video-analysis-the
Repo
Framework
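A recurring theme in compact descriptors is trading a little accuracy for a much smaller footprint. As a generic illustration (not the CDVA bitstream, whose actual descriptor coding is defined by the standard), the sketch below keeps only the sign of each dimension of a real-valued descriptor, packs the signs into an integer, and compares two codes by Hamming distance. A 512-dimensional float32 descriptor occupies 2,048 bytes, whereas its sign code fits in 64 bytes. The function names are illustrative.

```python
from typing import Sequence


def binarize(descriptor: Sequence[float]) -> int:
    """Pack the sign of each dimension into one bit of an integer (bit i = dim i > 0)."""
    code = 0
    for i, value in enumerate(descriptor):
        if value > 0:
            code |= 1 << i
    return code


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two packed codes."""
    return bin(a ^ b).count("1")
```

For example, `binarize([0.3, -1.2, 0.0, 2.5])` sets bits 0 and 3 and returns 9, and its Hamming distance to the code of `[0.1, 0.4, -0.2, 1.0]` is 1. Matching then reduces to cheap XOR and bit-count operations.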

Emotion Controlled Spectrum Mobility Scheme for Efficient Syntactic Interoperability In Cognitive Radio Based Internet of Vehicles

Title Emotion Controlled Spectrum Mobility Scheme for Efficient Syntactic Interoperability In Cognitive Radio Based Internet of Vehicles
Authors Faisal Riaz, Muaz A. Niazi
Abstract Blind spots are one of the causes of road accidents in both hilly and flat areas. These blind-spot accidents can be reduced by establishing an Internet of Vehicles (IoV) using Vehicle-2-Vehicle (V2V) and Vehicle-2-Infrastructure (V2I) communication systems. The problem with these IoV systems, however, is that most of them use DSRC or a single Radio Access Technology (RAT) as the wireless technology, which has been shown to fail at providing efficient communication between vehicles. Recently, Cognitive Radio (CR) based IoV systems have been proven to be the best wireless communication systems for vehicular networks. However, spectrum mobility is a challenging task for keeping CR-based vehicular networks interoperable and has not been addressed sufficiently in existing research. In our previous work, the Cognitive Radio Site (CR-Site) was proposed as an in-vehicle CR device that can be used to establish efficient IoV systems. In this paper, we introduce an Emotions Inspired Cognitive Agent (EIC_Agent) based spectrum mobility mechanism in CR-Site and propose a novel emotions-controlled spectrum mobility scheme for efficient syntactic interoperability between vehicles. For this purpose, a probabilistic deterministic finite automaton using a fear factor is proposed to perform efficient spectrum mobility using fuzzy logic. In addition, the quantitative computation of different fear intensity levels is performed with the help of fuzzy logic. The system has been tested using active data from different GSM service providers on the Mangla-Mirpur road. This is supplemented by extensive simulation experiments which validate the proposed scheme for CR-based high-speed vehicular networks. A qualitative comparison with the existing state of the art demonstrates the superiority of the proposed emotions-controlled, syntactically interoperable spectrum mobility scheme within cognitive radio based IoV systems.
Tasks
Published 2017-08-06
URL http://arxiv.org/abs/1708.01927v1
PDF http://arxiv.org/pdf/1708.01927v1.pdf
PWC https://paperswithcode.com/paper/emotion-controlled-spectrum-mobility-scheme
Repo
Framework
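The fear intensity levels are computed with fuzzy logic, which typically starts by mapping a crisp input to degrees of membership in overlapping linguistic sets. The sketch below uses triangular membership functions for three hypothetical fear levels over a normalized input in [0, 1]; the input variable, the number of levels, and the breakpoints are all assumptions for illustration, since the abstract does not give the paper's actual membership functions or rules.

```python
from typing import Dict


def triangular(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership: 0 at a, rising to 1 at b, back to 0 at c (requires a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def fear_levels(x: float) -> Dict[str, float]:
    """Hypothetical fear levels for a normalized input x in [0, 1] (e.g. a risk score)."""
    return {
        "low": triangular(x, -0.5, 0.0, 0.5),
        "medium": triangular(x, 0.0, 0.5, 1.0),
        "high": triangular(x, 0.5, 1.0, 1.5),
    }
```

An input of 0.25 is simultaneously "low" and "medium" to degree 0.5 each and "high" to degree 0.0; a rule base would then combine these degrees to decide, for example, whether to trigger a spectrum handoff.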

Recurrent 3D Pose Sequence Machines

Title Recurrent 3D Pose Sequence Machines
Authors Mude Lin, Liang Lin, Xiaodan Liang, Keze Wang, Hui Cheng
Abstract Recovering 3D articulated human pose from monocular image sequences is very challenging due to diverse appearances, viewpoints and occlusions, and because 3D human pose is inherently ambiguous in monocular imagery. It is thus critical to exploit rich spatial and temporal long-range dependencies among body joints for accurate 3D pose sequence prediction. Existing approaches usually rely on manually designed prior terms and human body kinematic constraints to capture structure, which are often insufficient to exploit all intrinsic structures and do not scale to all scenarios. In contrast, this paper presents a Recurrent 3D Pose Sequence Machine (RPSM) that automatically learns the image-dependent structural constraint and the sequence-dependent temporal context through multi-stage sequential refinement. At each stage, our RPSM is composed of three modules that predict the 3D pose sequences based on the previously learned 2D pose representations and 3D poses: (i) a 2D pose module extracting the image-dependent pose representations, (ii) a 3D pose recurrent module regressing 3D poses, and (iii) a feature adaption module serving as a bridge between modules (i) and (ii) to transform representations from the 2D to the 3D domain. These three modules are then assembled into a sequential prediction framework to refine the predicted poses over multiple recurrent stages. Extensive evaluations on the Human3.6M and HumanEva-I datasets show that our RPSM outperforms all state-of-the-art approaches for 3D pose estimation.
Tasks 3D Pose Estimation, Pose Estimation
Published 2017-07-31
URL http://arxiv.org/abs/1707.09695v1
PDF http://arxiv.org/pdf/1707.09695v1.pdf
PWC https://paperswithcode.com/paper/recurrent-3d-pose-sequence-machines
Repo
Framework
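The abstract describes how the three modules are chained within each stage and how stages refine one another. The sketch below captures only that data flow: `pose_2d`, `adapt` and `recurrent_3d` are hypothetical callables standing in for the 2D pose module, the feature adaption module and the 3D pose recurrent module (in the paper these are learned networks, and the recurrent module carries hidden state across frames). Feeding each stage the previous stage's 2D representations and 3D poses follows the abstract; everything else is an assumption.

```python
from typing import Any, Callable, List, Sequence


def rpsm_style_refine(frames: Sequence[Any],
                      init_reprs: Sequence[Any],
                      init_poses: Sequence[Any],
                      pose_2d: Callable[[Any, Any], Any],
                      adapt: Callable[[Any], Any],
                      recurrent_3d: Callable[[List[Any], List[Any]], List[Any]],
                      num_stages: int = 3) -> List[Any]:
    """Multi-stage refinement of a 3D pose sequence (data flow only)."""
    reprs, poses = list(init_reprs), list(init_poses)
    for _ in range(num_stages):
        # (i) 2D pose module: per-frame representation, refined from the previous stage.
        reprs = [pose_2d(f, r) for f, r in zip(frames, reprs)]
        # (iii) feature adaption module: map 2D representations toward the 3D domain.
        features = [adapt(r) for r in reprs]
        # (ii) 3D pose recurrent module: regress poses for the whole sequence,
        # conditioned on the previous stage's 3D poses.
        poses = list(recurrent_3d(features, poses))
    return poses
```

Usage with toy stand-ins, just to exercise the wiring: with `frames=[1.0, 2.0]`, zero initial representations and poses, `pose_2d=lambda f, r: r + f`, `adapt=lambda r: r`, and a `recurrent_3d` that adds each feature to the previous pose, three stages produce `[[6.0], [12.0]]`.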

A quantum dynamic belief decision making model

Title A quantum dynamic belief decision making model
Authors Zichang He, Wen Jiang
Abstract The sure-thing principle and the law of total probability are basic laws of classical probability theory. A disjunction fallacy leads to the violation of these two classical probability laws. In this paper, a new quantum dynamic belief decision-making model based on quantum dynamic modelling and Dempster-Shafer (D-S) evidence theory is proposed to address this issue and model the real human decision-making process. Some mathematical techniques are borrowed from quantum mathematics. Generally, belief and action are the two parts of a decision-making process. The uncertainty in the belief part is represented by a superposition of certain states. The uncertainty in actions is represented as an extra uncertainty state. The interference effect is produced by the entanglement between beliefs and actions. The basic probability assignment (BPA) of decisions is generated by quantum dynamic modelling. Then the BPA of the extra uncertain state and an entanglement degree defined by an entropy function named Deng entropy are used to measure the interference effect. Compared with existing models, our model has fewer free parameters. Finally, a classical categorization decision-making experiment is used to illustrate the effectiveness of our model.
Tasks Decision Making
Published 2017-03-06
URL http://arxiv.org/abs/1703.02386v1
PDF http://arxiv.org/pdf/1703.02386v1.pdf
PWC https://paperswithcode.com/paper/a-quantum-dynamic-belief-decision-making
Repo
Framework
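The entanglement degree in this model is defined through Deng entropy, which generalizes Shannon entropy to a basic probability assignment over sets of hypotheses: $E_d(m) = -\sum_{A} m(A)\log_2 \frac{m(A)}{2^{|A|}-1}$. The sketch below computes Deng entropy itself; how the paper turns it into an entanglement degree is not reproduced here, and representing focal elements as `frozenset`s is an implementation choice.

```python
import math
from typing import Dict, FrozenSet


def deng_entropy(bpa: Dict[FrozenSet[str], float]) -> float:
    """Deng entropy of a basic probability assignment mapping focal sets to masses."""
    total = 0.0
    for focal, mass in bpa.items():
        if mass <= 0:
            continue  # zero-mass sets contribute nothing (0 * log 0 is taken as 0)
        if not focal:
            raise ValueError("the empty set cannot carry mass in a normalized BPA")
        # 2^|A| - 1 counts the non-empty subsets of A.
        total -= mass * math.log2(mass / (2 ** len(focal) - 1))
    return total
```

When all mass sits on singletons the formula reduces to Shannon entropy, so `{a}: 0.5, {b}: 0.5` gives 1 bit. Putting all mass on the two-element set `{a, b}` gives $\log_2 3 \approx 1.585$ bits, reflecting the extra uncertainty of not committing to either hypothesis.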