Paper Group ANR 180
VidLoc: A Deep Spatio-Temporal Model for 6-DoF Video-Clip Relocalization
Title | VidLoc: A Deep Spatio-Temporal Model for 6-DoF Video-Clip Relocalization |
Authors | Ronald Clark, Sen Wang, Andrew Markham, Niki Trigoni, Hongkai Wen |
Abstract | Machine learning techniques, namely convolutional neural networks (CNN) and regression forests, have recently shown great promise in performing 6-DoF localization of monocular images. However, in most cases image sequences, rather than only single images, are readily available. Despite this, none of the proposed learning-based approaches exploit the valuable constraint of temporal smoothness, often leading to situations where the per-frame error is larger than the camera motion. In this paper we propose a recurrent model for performing 6-DoF localization of video clips. We find that, even by considering only short sequences (20 frames), the pose estimates are smoothed and the localization error can be drastically reduced. Finally, we consider means of obtaining probabilistic pose estimates from our model. We evaluate our method on openly available real-world autonomous driving and indoor localization datasets. |
Tasks | Autonomous Driving |
Published | 2017-02-21 |
URL | http://arxiv.org/abs/1702.06521v2 |
http://arxiv.org/pdf/1702.06521v2.pdf | |
PWC | https://paperswithcode.com/paper/vidloc-a-deep-spatio-temporal-model-for-6-dof |
Repo | |
Framework | |
Handwritten digit string recognition by combination of residual network and RNN-CTC
Title | Handwritten digit string recognition by combination of residual network and RNN-CTC |
Authors | Hongjian Zhan, Qingqing Wang, Yue Lu |
Abstract | Recurrent neural networks (RNN) and connectionist temporal classification (CTC) have shown success in many sequence labeling tasks, owing to their strong ability to handle problems where the alignment between the inputs and the target labels is unknown. The residual network is a new structure of convolutional neural network and works well in various computer vision tasks. In this paper, we take advantage of the architectures mentioned above to create a new network for handwritten digit string recognition. First we design a residual network to extract features from input images, then we employ an RNN to model the contextual information within feature sequences and predict recognition results. At the top of this network, a standard CTC is applied to calculate the loss and yield the final results. These three parts compose an end-to-end trainable network. The proposed new architecture achieves the highest performance on ORAND-CAR-A and ORAND-CAR-B with recognition rates of 89.75% and 91.14%, respectively. In addition, the experiments on a generated captcha dataset, which has much longer string lengths, show the potential of the proposed network to handle long strings. |
Tasks | |
Published | 2017-10-09 |
URL | http://arxiv.org/abs/1710.03112v1 |
http://arxiv.org/pdf/1710.03112v1.pdf | |
PWC | https://paperswithcode.com/paper/handwritten-digit-string-recognition-by |
Repo | |
Framework | |
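The abstract above says a standard CTC layer yields the final results when the alignment between inputs and labels is unknown. A minimal sketch of the usual greedy (best-path) CTC decoding step follows: collapse repeated labels, then drop blanks. The function name, the choice of index 0 for the blank, and the toy input are assumptions for illustration; the listing does not say which decoding variant the authors use.

```python
def ctc_greedy_decode(frame_labels, blank=0):
    """Collapse a per-frame label sequence into an output string.

    frame_labels: the most likely label index for each time step
    (hypothetical input; in practice the argmax of the RNN outputs).
    blank: index reserved for the CTC blank symbol (assumed to be 0).
    """
    decoded = []
    previous = None
    for label in frame_labels:
        # Emit a label only when it differs from the previous frame
        # and is not the blank symbol.
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded


# A blank between two identical labels keeps both of them.
print(ctc_greedy_decode([0, 1, 1, 0, 1, 2, 2, 0]))  # [1, 1, 2]
```

The blank symbol is what lets the decoder output a repeated digit such as "11": without a blank frame in between, consecutive identical labels are merged into one.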
Evaluating Social Networks Using Task-Focused Network Inference
Title | Evaluating Social Networks Using Task-Focused Network Inference |
Authors | Ivan Brugere, Chris Kanich, Tanya Y. Berger-Wolf |
Abstract | Networks are representations of complex underlying social processes. However, the same given network may be more suitable to model one behavior of individuals than another. In many cases, aggregate population models may be more effective than modeling on the network. We present a general framework for evaluating the suitability of given networks for a set of predictive tasks of interest, compared against alternative networks inferred from data. We present several interpretable network models and measures for our comparison. We apply this general framework to a case study on collective classification of music preferences in a newly available dataset of the Last.fm social network. |
Tasks | |
Published | 2017-07-08 |
URL | http://arxiv.org/abs/1707.02385v1 |
http://arxiv.org/pdf/1707.02385v1.pdf | |
PWC | https://paperswithcode.com/paper/evaluating-social-networks-using-task-focused |
Repo | |
Framework | |
Learning to Attend, Copy, and Generate for Session-Based Query Suggestion
Title | Learning to Attend, Copy, and Generate for Session-Based Query Suggestion |
Authors | Mostafa Dehghani, Sascha Rothe, Enrique Alfonseca, Pascal Fleury |
Abstract | Users try to articulate their complex information needs during search sessions by reformulating their queries. To make this process more effective, search engines provide related queries to help users in specifying the information need in their search process. In this paper, we propose a customized sequence-to-sequence model for session-based query suggestion. In our model, we employ a query-aware attention mechanism to capture the structure of the session context. This enables us to control the scope of the session from which we infer the suggested next query, which helps not only handle the noisy data but also automatically detect session boundaries. Furthermore, we observe that, based on the user query reformulation behavior, within a single session a large portion of query terms is retained from the previously submitted queries and consists of mostly infrequent or unseen terms that are usually not included in the vocabulary. We therefore empower the decoder of our model to access the source words from the session context during decoding by incorporating a copy mechanism. Moreover, we propose evaluation metrics to assess the quality of the generative models for query suggestion. We conduct an extensive set of experiments and analysis. The results suggest that our model outperforms the baselines both in terms of the generating queries and scoring candidate queries for the task of query suggestion. |
Tasks | |
Published | 2017-08-11 |
URL | http://arxiv.org/abs/1708.03418v4 |
http://arxiv.org/pdf/1708.03418v4.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-attend-copy-and-generate-for |
Repo | |
Framework | |
Enriched Deep Recurrent Visual Attention Model for Multiple Object Recognition
Title | Enriched Deep Recurrent Visual Attention Model for Multiple Object Recognition |
Authors | Artsiom Ablavatski, Shijian Lu, Jianfei Cai |
Abstract | We design an Enriched Deep Recurrent Visual Attention Model (EDRAM) - an improved attention-based architecture for multiple object recognition. The proposed model is a fully differentiable unit that can be optimized end-to-end by using Stochastic Gradient Descent (SGD). The Spatial Transformer (ST) was employed as visual attention mechanism which allows to learn the geometric transformation of objects within images. With the combination of the Spatial Transformer and the powerful recurrent architecture, the proposed EDRAM can localize and recognize objects simultaneously. EDRAM has been evaluated on two publicly available datasets including MNIST Cluttered (with 70K cluttered digits) and SVHN (with up to 250k real world images of house numbers). Experiments show that it obtains superior performance as compared with the state-of-the-art models. |
Tasks | Object Recognition |
Published | 2017-06-12 |
URL | http://arxiv.org/abs/1706.03581v1 |
http://arxiv.org/pdf/1706.03581v1.pdf | |
PWC | https://paperswithcode.com/paper/enriched-deep-recurrent-visual-attention |
Repo | |
Framework | |
Compare, Compress and Propagate: Enhancing Neural Architectures with Alignment Factorization for Natural Language Inference
Title | Compare, Compress and Propagate: Enhancing Neural Architectures with Alignment Factorization for Natural Language Inference |
Authors | Yi Tay, Luu Anh Tuan, Siu Cheung Hui |
Abstract | This paper presents a new deep learning architecture for Natural Language Inference (NLI). Firstly, we introduce a new architecture where alignment pairs are compared, compressed and then propagated to upper layers for enhanced representation learning. Secondly, we adopt factorization layers for efficient and expressive compression of alignment vectors into scalar features, which are then used to augment the base word representations. The design of our approach is aimed to be conceptually simple, compact and yet powerful. We conduct experiments on three popular benchmarks, SNLI, MultiNLI and SciTail, achieving competitive performance on all. A lightweight parameterization of our model also enjoys a $\approx 3$ times reduction in parameter size compared to the existing state-of-the-art models, e.g., ESIM and DIIN, while maintaining competitive performance. Additionally, visual analysis shows that our propagated features are highly interpretable. |
Tasks | Natural Language Inference, Representation Learning |
Published | 2017-12-30 |
URL | http://arxiv.org/abs/1801.00102v2 |
http://arxiv.org/pdf/1801.00102v2.pdf | |
PWC | https://paperswithcode.com/paper/compare-compress-and-propagate-enhancing |
Repo | |
Framework | |
Unsupervised state representation learning with robotic priors: a robustness benchmark
Title | Unsupervised state representation learning with robotic priors: a robustness benchmark |
Authors | Timothée Lesort, Mathieu Seurin, Xinrui Li, Natalia Díaz Rodríguez, David Filliat |
Abstract | Our understanding of the world depends highly on our capacity to produce intuitive and simplified representations which can be easily used to solve problems. We reproduce this simplification process using a neural network to build a low dimensional state representation of the world from images acquired by a robot. As in Jonschkowski et al. 2015, we learn in an unsupervised way using prior knowledge about the world as loss functions called robotic priors and extend this approach to high dimension richer images to learn a 3D representation of the hand position of a robot from RGB images. We propose a quantitative evaluation of the learned representation using nearest neighbors in the state space that allows to assess its quality and show both the potential and limitations of robotic priors in realistic environments. We augment image size, add distractors and domain randomization, all crucial components to achieve transfer learning to real robots. Finally, we also contribute a new prior to improve the robustness of the representation. The applications of such low dimensional state representation range from easing reinforcement learning (RL) and knowledge transfer across tasks, to facilitating learning from raw data with more efficient and compact high level representations. The results show that the robotic prior approach is able to extract high level representation as the 3D position of an arm and organize it into a compact and coherent space of states in a challenging dataset. |
Tasks | Representation Learning, Transfer Learning |
Published | 2017-09-15 |
URL | http://arxiv.org/abs/1709.05185v1 |
http://arxiv.org/pdf/1709.05185v1.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-state-representation-learning |
Repo | |
Framework | |
Beat by Beat: Classifying Cardiac Arrhythmias with Recurrent Neural Networks
Title | Beat by Beat: Classifying Cardiac Arrhythmias with Recurrent Neural Networks |
Authors | Patrick Schwab, Gaetano Scebba, Jia Zhang, Marco Delai, Walter Karlen |
Abstract | With tens of thousands of electrocardiogram (ECG) records processed by mobile cardiac event recorders every day, heart rhythm classification algorithms are an important tool for the continuous monitoring of patients at risk. We utilise an annotated dataset of 12,186 single-lead ECG recordings to build a diverse ensemble of recurrent neural networks (RNNs) that is able to distinguish between normal sinus rhythms, atrial fibrillation, other types of arrhythmia and signals that are too noisy to interpret. In order to ease learning over the temporal dimension, we introduce a novel task formulation that harnesses the natural segmentation of ECG signals into heartbeats to drastically reduce the number of time steps per sequence. Additionally, we extend our RNNs with an attention mechanism that enables us to reason about which heartbeats our RNNs focus on to make their decisions. Through the use of attention, our model maintains a high degree of interpretability, while also achieving state-of-the-art classification performance with an average F1 score of 0.79 on an unseen test set (n=3,658). |
Tasks | |
Published | 2017-10-17 |
URL | http://arxiv.org/abs/1710.06319v2 |
http://arxiv.org/pdf/1710.06319v2.pdf | |
PWC | https://paperswithcode.com/paper/beat-by-beat-classifying-cardiac-arrhythmias |
Repo | |
Framework | |
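The abstract above reports an average F1 score over four rhythm categories. As a rough sketch of how such a figure can be computed, the snippet below takes per-class F1 scores and averages them without weighting. The class codes ("N", "A", "O", "~"), the toy labels, and the unweighted mean are all assumptions for illustration; the listing does not state the exact averaging the authors used.

```python
def f1_per_class(y_true, y_pred, classes):
    """Return a dict mapping each class to its F1 score."""
    scores = {}
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)  # true positives
        fp = sum(1 for t, p in pairs if t != c and p == c)  # false positives
        fn = sum(1 for t, p in pairs if t == c and p != c)  # false negatives
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


def average_f1(y_true, y_pred, classes):
    """Unweighted mean of the per-class F1 scores (an assumed convention)."""
    scores = f1_per_class(y_true, y_pred, classes)
    return sum(scores.values()) / len(scores)


# Hypothetical labels: normal, atrial fibrillation, other arrhythmia, noisy.
classes = ["N", "A", "O", "~"]
y_true = ["N", "N", "A", "O", "~", "A"]
y_pred = ["N", "A", "A", "O", "~", "O"]
print(round(average_f1(y_true, y_pred, classes), 4))  # 0.7083
```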
A Big Data Analysis Framework Using Apache Spark and Deep Learning
Title | A Big Data Analysis Framework Using Apache Spark and Deep Learning |
Authors | Anand Gupta, Hardeo Thakur, Ritvik Shrivastava, Pulkit Kumar, Sreyashi Nag |
Abstract | With the spreading prevalence of Big Data, many advances have recently been made in this field. Frameworks such as Apache Hadoop and Apache Spark have gained a lot of traction over the past decades and have become massively popular, especially in industries. It is becoming increasingly evident that effective big data analysis is key to solving artificial intelligence problems. Thus, a multi-algorithm library was implemented in the Spark framework, called MLlib. While this library supports multiple machine learning algorithms, there is still scope to use the Spark setup efficiently for highly time-intensive and computationally expensive procedures like deep learning. In this paper, we propose a novel framework that combines the distributive computational abilities of Apache Spark and the advanced machine learning architecture of a deep multi-layer perceptron (MLP), using the popular concept of Cascade Learning. We conduct empirical analysis of our framework on two real world datasets. The results are encouraging and corroborate our proposed framework, in turn proving that it is an improvement over traditional big data analysis methods that use either Spark or Deep learning as individual elements. |
Tasks | |
Published | 2017-11-25 |
URL | http://arxiv.org/abs/1711.09279v1 |
http://arxiv.org/pdf/1711.09279v1.pdf | |
PWC | https://paperswithcode.com/paper/a-big-data-analysis-framework-using-apache |
Repo | |
Framework | |
RoboCupSimData: A RoboCup soccer research dataset
Title | RoboCupSimData: A RoboCup soccer research dataset |
Authors | Olivia Michael, Oliver Obst, Falk Schmidsberger, Frieder Stolzenburg |
Abstract | RoboCup is an international scientific robot competition in which teams of multiple robots compete against each other. Its different leagues provide many sources of robotics data that can be used for further analysis and application of machine learning. This paper describes a large dataset from games of some of the top teams (from 2016 and 2017) in RoboCup Soccer Simulation League (2D), where teams of 11 robots (agents) compete against each other. Overall, we used 10 different teams to play each other, resulting in 45 unique pairings. For each pairing, we ran 25 matches (of 10 minutes each), leading to 1125 matches or more than 180 hours of game play. The generated CSV files are 17 GB of data (zipped), or 229 GB (unzipped). The dataset is unique in the sense that it contains both the ground truth data (global, complete, noise-free information of all objects on the field), as well as the noisy, local and incomplete percepts of each robot. These data are made available as CSV files, as well as in the original soccer simulator formats. |
Tasks | |
Published | 2017-11-06 |
URL | http://arxiv.org/abs/1711.01703v1 |
http://arxiv.org/pdf/1711.01703v1.pdf | |
PWC | https://paperswithcode.com/paper/robocupsimdata-a-robocup-soccer-research |
Repo | |
Framework | |
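The dataset size figures in the abstract above follow from simple counting, which can be checked with a short script. The team names are placeholders; only the counts (10 teams, 25 matches per pairing, 10 minutes per match) come from the abstract.

```python
from itertools import combinations

# Placeholder team names; the abstract only states that 10 teams were used.
teams = [f"team_{i}" for i in range(10)]

# Every unordered pair of distinct teams is one pairing: C(10, 2).
pairings = list(combinations(teams, 2))

matches = len(pairings) * 25   # 25 matches per pairing
hours = matches * 10 / 60      # each match lasts 10 minutes

print(len(pairings), matches, hours)  # 45 1125 187.5
```

The 187.5 hours computed here is consistent with the "more than 180 hours" stated in the abstract.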
Bayesian Nonparametric Feature and Policy Learning for Decision-Making
Title | Bayesian Nonparametric Feature and Policy Learning for Decision-Making |
Authors | Jürgen Hahn, Abdelhak M. Zoubir |
Abstract | Learning from demonstrations has gained increasing interest in the recent past, enabling an agent to learn how to make decisions by observing an experienced teacher. While many approaches have been proposed to solve this problem, there is only little work that focuses on reasoning about the observed behavior. We assume that, in many practical problems, an agent makes its decision based on latent features, indicating a certain action. Therefore, we propose a generative model for the states and actions. Inference reveals the number of features, the features, and the policies, allowing us to learn and to analyze the underlying structure of the observed behavior. Further, our approach enables prediction of actions for new states. Simulations are used to assess the performance of the algorithm based upon this model. Moreover, the problem of learning a driver’s behavior is investigated, demonstrating the performance of the proposed model in a real-world scenario. |
Tasks | Decision Making |
Published | 2017-02-26 |
URL | http://arxiv.org/abs/1702.08001v1 |
http://arxiv.org/pdf/1702.08001v1.pdf | |
PWC | https://paperswithcode.com/paper/bayesian-nonparametric-feature-and-policy |
Repo | |
Framework | |
Compact Descriptors for Video Analysis: the Emerging MPEG Standard
Title | Compact Descriptors for Video Analysis: the Emerging MPEG Standard |
Authors | Ling-Yu Duan, Vijay Chandrasekhar, Shiqi Wang, Yihang Lou, Jie Lin, Yan Bai, Tiejun Huang, Alex Chichung Kot, Wen Gao |
Abstract | This paper provides an overview of the ongoing compact descriptors for video analysis (CDVA) standard from the ISO/IEC Moving Picture Experts Group (MPEG). MPEG-CDVA aims to define a standardized bitstream syntax to enable interoperability in the context of video analysis applications. During the development of MPEG-CDVA, a series of techniques aiming to reduce the descriptor size and improve the video representation ability have been proposed. This article describes the new standard that is being developed and reports the performance of these key technical contributions. |
Tasks | |
Published | 2017-04-26 |
URL | http://arxiv.org/abs/1704.08141v1 |
http://arxiv.org/pdf/1704.08141v1.pdf | |
PWC | https://paperswithcode.com/paper/compact-descriptors-for-video-analysis-the |
Repo | |
Framework | |
Emotion Controlled Spectrum Mobility Scheme for Efficient Syntactic Interoperability In Cognitive Radio Based Internet of Vehicles
Title | Emotion Controlled Spectrum Mobility Scheme for Efficient Syntactic Interoperability In Cognitive Radio Based Internet of Vehicles |
Authors | Faisal Riaz, Muaz A. Niazi |
Abstract | Blind spots are one of the causes of road accidents in hilly and flat areas. These blind spot accidents can be decreased by establishing an Internet of Vehicles (IoV) using Vehicle-2-Vehicle (V2V) and Vehicle-2-Infrastructure (V2I) communication systems. The problem with these IoV is that most of them use DSRC or a single Radio Access Technology (RAT) as the wireless technology, which has proven inadequate for efficient communication between vehicles. Recently, Cognitive Radio (CR) based IoV have proven to be the best wireless communication systems for vehicular networks. However, spectrum mobility is a challenging task in keeping CR based vehicular networks interoperable and has not been addressed sufficiently in existing research. In our previous research work, the Cognitive Radio Site (CR-Site) was proposed as an in-vehicle CR device, which can be utilized to establish efficient IoV systems. In this paper, we introduce the Emotions Inspired Cognitive Agent (EIC_Agent) based spectrum mobility mechanism in CR-Site and propose a novel emotions controlled spectrum mobility scheme for efficient syntactic interoperability between vehicles. For this purpose, a probabilistic deterministic finite automaton using a fear factor is proposed to perform efficient spectrum mobility using fuzzy logic. In addition, the quantitative computation of different fear intensity levels has been performed with the help of fuzzy logic. The system has been tested using active data from different GSM service providers on the Mangla-Mirpur road. This is supplemented by extensive simulation experiments which validate the proposed scheme for CR based high-speed vehicular networks. The qualitative comparison with the existing state of the art has proven the superiority of the proposed emotions controlled syntactic interoperable spectrum mobility scheme within cognitive radio based IoV systems. |
Tasks | |
Published | 2017-08-06 |
URL | http://arxiv.org/abs/1708.01927v1 |
http://arxiv.org/pdf/1708.01927v1.pdf | |
PWC | https://paperswithcode.com/paper/emotion-controlled-spectrum-mobility-scheme |
Repo | |
Framework | |
Recurrent 3D Pose Sequence Machines
Title | Recurrent 3D Pose Sequence Machines |
Authors | Mude Lin, Liang Lin, Xiaodan Liang, Keze Wang, Hui Cheng |
Abstract | 3D human articulated pose recovery from monocular image sequences is very challenging due to the diverse appearances, viewpoints, and occlusions, and because the human 3D pose is inherently ambiguous from monocular imagery. It is thus critical to exploit rich spatial and temporal long-range dependencies among body joints for accurate 3D pose sequence prediction. Existing approaches usually manually design some elaborate prior terms and human body kinematic constraints for capturing structures, which are often insufficient to exploit all intrinsic structures and not scalable for all scenarios. In contrast, this paper presents a Recurrent 3D Pose Sequence Machine (RPSM) to automatically learn the image-dependent structural constraint and sequence-dependent temporal context by using a multi-stage sequential refinement. At each stage, our RPSM is composed of three modules to predict the 3D pose sequences based on the previously learned 2D pose representations and 3D poses: (i) a 2D pose module extracting the image-dependent pose representations, (ii) a 3D pose recurrent module regressing 3D poses and (iii) a feature adaption module serving as a bridge between modules (i) and (ii) to enable the representation transformation from the 2D to the 3D domain. These three modules are then assembled into a sequential prediction framework to refine the predicted poses with multiple recurrent stages. Extensive evaluations on the Human3.6M dataset and HumanEva-I dataset show that our RPSM outperforms all state-of-the-art approaches for 3D pose estimation. |
Tasks | 3D Pose Estimation, Pose Estimation |
Published | 2017-07-31 |
URL | http://arxiv.org/abs/1707.09695v1 |
http://arxiv.org/pdf/1707.09695v1.pdf | |
PWC | https://paperswithcode.com/paper/recurrent-3d-pose-sequence-machines |
Repo | |
Framework | |
A quantum dynamic belief decision making model
Title | A quantum dynamic belief decision making model |
Authors | Zichang He, Wen Jiang |
Abstract | The sure thing principle and the law of total probability are basic laws in classic probability theory. A disjunction fallacy leads to the violation of these two classical probability laws. In this paper, a new quantum dynamic belief decision making model based on quantum dynamic modelling and Dempster-Shafer (D-S) evidence theory is proposed to address this issue and model the real human decision-making process. Some mathematical techniques are borrowed from quantum mathematics. Generally, belief and action are two parts of a decision making process. The uncertainty in the belief part is represented by a superposition of certain states. The uncertainty in actions is represented as an extra uncertainty state. The interference effect is produced due to the entanglement between beliefs and actions. The basic probability assignment (BPA) of decisions is generated by quantum dynamic modelling. Then the BPA of the extra uncertain state and an entanglement degree defined by an entropy function named Deng entropy are used to measure the interference effect. Compared with the existing model, our model has fewer free parameters. Finally, a classical categorization decision-making experiment is illustrated to show the effectiveness of our model. |
Tasks | Decision Making |
Published | 2017-03-06 |
URL | http://arxiv.org/abs/1703.02386v1 |
http://arxiv.org/pdf/1703.02386v1.pdf | |
PWC | https://paperswithcode.com/paper/a-quantum-dynamic-belief-decision-making |
Repo | |
Framework | |