October 20, 2019


Paper Group ANR 90

Efficient Pose Tracking from Natural Features in Standard Web Browsers. Im2Struct: Recovering 3D Shape Structure from a Single RGB Image. Designing communication systems via iterative improvement: error correction coding with Bayes decoder and codebook optimized for source symbol error. Adaptive Structural Learning of Deep Belief Network for Medica …

Efficient Pose Tracking from Natural Features in Standard Web Browsers


Title	Efficient Pose Tracking from Natural Features in Standard Web Browsers
Authors	Fabian Göttl, Philipp Gagel, Jens Grubert
Abstract	Computer Vision-based natural feature tracking is at the core of modern Augmented Reality applications. Still, Web-based Augmented Reality typically relies on location-based sensing (using GPS and orientation sensors) or marker-based approaches to solve the pose estimation problem. We present an implementation and evaluation of an efficient natural feature tracking pipeline for standard Web browsers using HTML5 and WebAssembly. Our system can track image targets at real-time frame rates on tablet PCs (up to 60 Hz) and smartphones (up to 25 Hz).
Tasks	Pose Estimation, Pose Tracking
Published	2018-04-23
URL	http://arxiv.org/abs/1804.08424v1
PDF	http://arxiv.org/pdf/1804.08424v1.pdf
PWC	https://paperswithcode.com/paper/efficient-pose-tracking-from-natural-features
Repo
Framework

Im2Struct: Recovering 3D Shape Structure from a Single RGB Image


Title	Im2Struct: Recovering 3D Shape Structure from a Single RGB Image
Authors	Chengjie Niu, Jun Li, Kai Xu
Abstract	We propose to recover 3D shape structures from single RGB images, where structure refers to shape parts represented by cuboids and part relations encompassing connectivity and symmetry. Given a single 2D image depicting an object, our goal is to automatically recover a cuboid structure of the object parts as well as their mutual relations. We develop a convolutional-recursive auto-encoder consisting of structure parsing of a 2D image followed by structure recovery of a cuboid hierarchy. The encoder is a multi-scale convolutional network trained on the task of shape contour estimation, thereby learning to discern object structures in various forms and scales. The decoder fuses the features of the structure parsing network and the original image, and recursively decodes a hierarchy of cuboids. Since the decoder network is trained to recover part relations, including connectivity and symmetry, explicitly, the plausibility and generality of part structure recovery can be ensured. The two networks are jointly trained using training data consisting of contour-mask and cuboid structure pairs. Such pairs are generated by rendering stock 3D CAD models that come with part segmentations. Our method achieves unprecedentedly faithful and detailed recovery of diverse 3D part structures from single-view 2D images. We demonstrate two applications of our method: structure-guided completion of 3D volumes reconstructed from single-view images and structure-aware interactive editing of 2D images.
Tasks
Published	2018-04-16
URL	http://arxiv.org/abs/1804.05469v1
PDF	http://arxiv.org/pdf/1804.05469v1.pdf
PWC	https://paperswithcode.com/paper/im2struct-recovering-3d-shape-structure-from
Repo
Framework

Designing communication systems via iterative improvement: error correction coding with Bayes decoder and codebook optimized for source symbol error


Title	Designing communication systems via iterative improvement: error correction coding with Bayes decoder and codebook optimized for source symbol error
Authors	Chai Wah Wu
Abstract	In most error correction coding (ECC) frameworks, the typical error metric is the bit error rate (BER), which measures the number of bit errors. For this metric, the positions of the bits are not relevant to the decoding, and in many noise models, not relevant to the BER either. In many applications this is unsatisfactory, as typically not all bits are equal: they have different significance. We look at ECC from a Bayesian perspective and introduce Bayes estimators with general loss functions to take bit significance into account. We propose ECC schemes that optimize this error metric. As the problem is highly nonlinear, traditional ECC construction techniques are not applicable. Exhaustive search is cost-prohibitive, and thus we use iterative improvement search techniques to find good codebooks. We optimize both general codebooks and linear codes. We provide numerical experiments showing that these schemes can be superior to classical linear block codes such as Hamming codes and to decoding methods such as minimum distance decoding.
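The Bayes decoding idea above can be illustrated with a small toy. This is a minimal sketch under stated assumptions, not the paper's codebook or its iterative search. The 3-bit even-parity codebook, the binary symmetric channel with flip probability 0.1, the uniform prior, and the MSB/LSB weights (2, 1) are all hypothetical choices. For a received word, the decoder picks the source symbol that minimizes expected weighted bit loss under the posterior. A classical minimum-distance decoder is included for comparison.

```python
from itertools import product

# Hypothetical toy setup: 2-bit source symbols, a 3-bit even-parity codebook,
# a binary symmetric channel, and a loss that weights the most significant
# source bit twice as heavily as the least significant one.
CODEBOOK = {0: (0, 0, 0), 1: (0, 1, 1), 2: (1, 0, 1), 3: (1, 1, 0)}
FLIP_PROB = 0.1
BIT_WEIGHTS = (2, 1)  # (MSB weight, LSB weight), assumed values


def weighted_bit_loss(estimate: int, truth: int) -> int:
    """Sum of the weights of the source bits that differ."""
    msb_diff = ((estimate >> 1) & 1) != ((truth >> 1) & 1)
    lsb_diff = (estimate & 1) != (truth & 1)
    return BIT_WEIGHTS[0] * msb_diff + BIT_WEIGHTS[1] * lsb_diff


def likelihood(received, codeword) -> float:
    """P(received | codeword) on a binary symmetric channel."""
    flips = sum(r != c for r, c in zip(received, codeword))
    return FLIP_PROB ** flips * (1 - FLIP_PROB) ** (len(codeword) - flips)


def bayes_decode(received) -> int:
    """Return the source symbol with minimum expected loss.

    With a uniform prior the posterior is proportional to the likelihood,
    and normalisation does not change which estimate minimises the loss.
    """
    posterior = {s: likelihood(received, cw) for s, cw in CODEBOOK.items()}

    def expected_loss(estimate: int) -> float:
        return sum(p * weighted_bit_loss(estimate, s) for s, p in posterior.items())

    return min(CODEBOOK, key=expected_loss)


def min_distance_decode(received) -> int:
    """Classical minimum Hamming distance decoding (ties go to the smallest symbol)."""
    return min(CODEBOOK, key=lambda s: sum(r != c for r, c in zip(received, CODEBOOK[s])))


if __name__ == "__main__":
    # Compare both decoders on every possible received word.
    for received in product((0, 1), repeat=3):
        print(received, bayes_decode(received), min_distance_decode(received))
```

The loss function is the only thing that separates this decoder from a MAP decoder. Swapping `BIT_WEIGHTS` for other values changes which estimate is chosen on ambiguous received words.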
Tasks
Published	2018-05-18
URL	http://arxiv.org/abs/1805.07429v4
PDF	http://arxiv.org/pdf/1805.07429v4.pdf
PWC	https://paperswithcode.com/paper/designing-communication-systems-via-iterative
Repo
Framework

Adaptive Structural Learning of Deep Belief Network for Medical Examination Data and Its Knowledge Extraction by using C4.5


Title	Adaptive Structural Learning of Deep Belief Network for Medical Examination Data and Its Knowledge Extraction by using C4.5
Authors	Shin Kamada, Takumi Ichimura, Toshihide Harada
Abstract	Deep Learning uses a hierarchical network architecture to represent the complicated features of input patterns. An adaptive structural learning method for Deep Belief Networks (DBN) has been developed. The method can discover an optimal number of hidden neurons for given input data in a Restricted Boltzmann Machine (RBM) by a neuron generation-annihilation algorithm, and can generate a new hidden layer in the DBN by an extension of that algorithm. In this paper, the proposed adaptive structural learning of DBN was applied to comprehensive medical examination data for cancer prediction. The prediction system shows higher classification accuracy (99.8% for training and 95.5% for test) than the traditional DBN. Moreover, explicit knowledge about the relation between input and output patterns was extracted from the trained DBN by C4.5. Some characteristics, extracted in the form of IF-THEN rules, for detecting cancer at an early stage are reported in this paper.
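C4.5 chooses its IF-THEN splits using the gain ratio criterion. The sketch below computes that criterion for one categorical attribute. It does not cover C4.5's handling of continuous attributes, missing values, or pruning, and it is not the authors' extraction pipeline. The attribute values and labels in the test are hypothetical.

```python
import math
from collections import Counter


def entropy(values) -> float:
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())


def gain_ratio(attribute_values, labels) -> float:
    """C4.5-style gain ratio of a categorical attribute.

    The result is the information gain divided by the split information,
    which is the entropy of the attribute's own value distribution.
    """
    total = len(labels)
    groups = {}
    for value, label in zip(attribute_values, labels):
        groups.setdefault(value, []).append(label)
    # Expected entropy of the labels after splitting on the attribute.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    split_info = entropy(attribute_values)
    return info_gain / split_info if split_info > 0 else 0.0
```

The division by split information penalizes attributes with many distinct values, which raw information gain would otherwise favor.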
Tasks
Published	2018-08-27
URL	http://arxiv.org/abs/1808.08777v1
PDF	http://arxiv.org/pdf/1808.08777v1.pdf
PWC	https://paperswithcode.com/paper/adaptive-structural-learning-of-deep-belief
Repo
Framework

The History Began from AlexNet: A Comprehensive Survey on Deep Learning Approaches


Title	The History Began from AlexNet: A Comprehensive Survey on Deep Learning Approaches
Authors	Md Zahangir Alom, Tarek M. Taha, Christopher Yakopcic, Stefan Westberg, Paheding Sidike, Mst Shamima Nasrin, Brian C Van Esesn, Abdul A S. Awwal, Vijayan K. Asari
Abstract	Deep learning has demonstrated tremendous success in a variety of application domains in the past few years. This new field of machine learning has been growing rapidly and has been applied to most application domains, along with some new modalities of application, which helps to open new opportunities. Different methods have been proposed for different categories of learning approaches, including supervised, semi-supervised, and unsupervised learning. The experimental results show state-of-the-art performance of deep learning over traditional machine learning approaches in the fields of Image Processing, Computer Vision, Speech Recognition, Machine Translation, Art, Medical imaging, Medical information processing, Robotics and control, Bio-informatics, Natural Language Processing (NLP), Cyber security, and many more. This report presents a brief survey on the development of DL approaches, including Deep Neural Network (DNN), Convolutional Neural Network (CNN), Recurrent Neural Network (RNN) including Long Short Term Memory (LSTM) and Gated Recurrent Units (GRU), Auto-Encoder (AE), Deep Belief Network (DBN), Generative Adversarial Network (GAN), and Deep Reinforcement Learning (DRL). In addition, we have included recent developments in advanced variant DL techniques based on the mentioned DL approaches. Furthermore, DL approaches that have been explored and evaluated in different application domains are also included in this survey. We have also included recently developed frameworks, SDKs, and benchmark datasets that are used for implementing and evaluating deep learning approaches. Some surveys have been published on Deep Learning in Neural Networks [1, 38], and a survey has been published on RL [234]. However, those papers have not discussed the individual advanced techniques for training large-scale deep learning models or the recently developed methods of generative models [1].
Tasks	Machine Translation, Speech Recognition
Published	2018-03-03
URL	http://arxiv.org/abs/1803.01164v2
PDF	http://arxiv.org/pdf/1803.01164v2.pdf
PWC	https://paperswithcode.com/paper/the-history-began-from-alexnet-a
Repo
Framework

Neural Models for Reasoning over Multiple Mentions using Coreference


Title	Neural Models for Reasoning over Multiple Mentions using Coreference
Authors	Bhuwan Dhingra, Qiao Jin, Zhilin Yang, William W. Cohen, Ruslan Salakhutdinov
Abstract	Many problems in NLP require aggregating information from multiple mentions of the same entity, which may be far apart in the text. Existing Recurrent Neural Network (RNN) layers are biased towards short-term dependencies and hence are not suited to such tasks. We present a recurrent layer that is instead biased towards coreferent dependencies. The layer uses coreference annotations extracted from an external system to connect entity mentions belonging to the same cluster. Incorporating this layer into a state-of-the-art reading comprehension model improves performance on three datasets (Wikihop, LAMBADA and the bAbI AI tasks), with large gains when training data is scarce.
Tasks	Reading Comprehension
Published	2018-04-16
URL	http://arxiv.org/abs/1804.05922v1
PDF	http://arxiv.org/pdf/1804.05922v1.pdf
PWC	https://paperswithcode.com/paper/neural-models-for-reasoning-over-multiple
Repo
Framework


Cross-Modal Retrieval with Implicit Concept Association


Title	Cross-Modal Retrieval with Implicit Concept Association
Authors	Yale Song, Mohammad Soleymani
Abstract	Traditional cross-modal retrieval assumes explicit association of concepts across modalities, where there is no ambiguity in how the concepts are linked to each other; e.g., when we search for images with the query “dogs”, we expect to see dog images. In this paper, we consider a different setting for cross-modal retrieval, in which data from different modalities are implicitly linked via concepts that must be inferred by high-level reasoning; we call this setting implicit concept association. To foster future research in this setting, we present a new dataset containing 47K pairs of animated GIFs and sentences crawled from the web, in which the GIFs depict physical or emotional reactions to the scenarios described in the text (called “reaction GIFs”). We report on a user study showing that, despite the presence of implicit concept association, humans are able to identify video-sentence pairs with matching concepts, suggesting the feasibility of our task. Furthermore, we propose a novel visual-semantic embedding network based on multiple instance learning. Unlike traditional approaches, we compute multiple embeddings from each modality, each representing different concepts, and measure their similarity by considering all possible combinations of visual-semantic embeddings in the framework of multiple instance learning. We evaluate our approach on two video-sentence datasets with explicit and implicit concept association and report competitive results compared to existing approaches on cross-modal retrieval.
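The idea of comparing multiple embeddings per modality can be sketched as scoring every visual/text embedding pair. This is a minimal sketch, not the authors' network. It assumes the pair score is the maximum cosine similarity over all combinations, which is one common multiple-instance pooling choice. The paper may aggregate the combinations differently, and the embedding vectors in the test are hypothetical.

```python
import math


def cosine(u, v) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def mil_similarity(visual_embeddings, text_embeddings) -> float:
    """Score a GIF-sentence pair by its best-matching embedding combination.

    All K_v x K_t visual/text combinations are compared, and the maximum is
    kept (assumed max pooling, a common multiple-instance choice).
    """
    return max(cosine(v, t) for v in visual_embeddings for t in text_embeddings)
```

Max pooling lets a pair match if any single concept aligns, which suits implicit associations where most concepts in a GIF are unrelated to the sentence.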
Tasks	Cross-Modal Retrieval, Image Retrieval, Multiple Instance Learning
Published	2018-04-12
URL	http://arxiv.org/abs/1804.04318v2
PDF	http://arxiv.org/pdf/1804.04318v2.pdf
PWC	https://paperswithcode.com/paper/cross-modal-retrieval-with-implicit-concept
Repo
Framework

Open Subtitles Paraphrase Corpus for Six Languages


Title	Open Subtitles Paraphrase Corpus for Six Languages
Authors	Mathias Creutz
Abstract	This paper accompanies the release of Opusparcus, a new paraphrase corpus for six European languages: German, English, Finnish, French, Russian, and Swedish. The corpus consists of paraphrases, that is, pairs of sentences in the same language that mean approximately the same thing. The paraphrases are extracted from the OpenSubtitles2016 corpus, which contains subtitles from movies and TV shows. The informal and colloquial genre that occurs in subtitles makes such data a very interesting language resource, for instance, from the perspective of computer-assisted language learning. For each target language, the Opusparcus data have been partitioned into three types of data sets: training, development and test sets. The training sets are large, consisting of millions of sentence pairs, and have been compiled automatically with the help of probabilistic ranking functions. The development and test sets consist of sentence pairs that have been checked manually; each set contains approximately 1000 sentence pairs that have been verified to be acceptable paraphrases by two annotators.
Tasks
Published	2018-09-17
URL	http://arxiv.org/abs/1809.06142v1
PDF	http://arxiv.org/pdf/1809.06142v1.pdf
PWC	https://paperswithcode.com/paper/open-subtitles-paraphrase-corpus-for-six
Repo
Framework

Dynamic Oracles for Top-Down and In-Order Shift-Reduce Constituent Parsing


Title	Dynamic Oracles for Top-Down and In-Order Shift-Reduce Constituent Parsing
Authors	Daniel Fernández-González, Carlos Gómez-Rodríguez
Abstract	We introduce novel dynamic oracles for training two of the most accurate known shift-reduce algorithms for constituent parsing: the top-down and in-order transition-based parsers. In both cases, the dynamic oracles manage to notably increase their accuracy, in comparison to that obtained by performing classic static training. In addition, by improving the performance of the state-of-the-art in-order shift-reduce parser, we achieve the best accuracy to date (92.0 F1) obtained by a fully-supervised single-model greedy shift-reduce constituent parser on the WSJ benchmark.
Tasks
Published	2018-10-25
URL	http://arxiv.org/abs/1810.10882v1
PDF	http://arxiv.org/pdf/1810.10882v1.pdf
PWC	https://paperswithcode.com/paper/dynamic-oracles-for-top-down-and-in-order
Repo
Framework

Predictive and Semantic Layout Estimation for Robotic Applications in Manhattan Worlds


Title	Predictive and Semantic Layout Estimation for Robotic Applications in Manhattan Worlds
Authors	Armon Shariati, Bernd Pfrommer, Camillo J. Taylor
Abstract	This paper describes an approach to automatically extracting floor plans from the kinds of incomplete measurements that could be acquired by an autonomous mobile robot. The approach proceeds by reasoning about extended structural layout surfaces, which are automatically extracted from the available data. The scheme can be run in an online manner to build watertight representations of the environment. The system effectively speculates about room boundaries and free space regions, which provides useful guidance to subsequent motion planning systems. Experimental results are presented on multiple data sets.
Tasks	Motion Planning
Published	2018-11-19
URL	http://arxiv.org/abs/1811.07442v1
PDF	http://arxiv.org/pdf/1811.07442v1.pdf
PWC	https://paperswithcode.com/paper/predictive-and-semantic-layout-estimation-for
Repo
Framework

A Continuous Information Gain Measure to Find the Most Discriminatory Problems for AI Benchmarking


Title	A Continuous Information Gain Measure to Find the Most Discriminatory Problems for AI Benchmarking
Authors	Matthew Stephenson, Damien Anderson, Ahmed Khalifa, John Levine, Jochen Renz, Julian Togelius, Christoph Salge
Abstract	This paper introduces an information-theoretic method for selecting a small subset of problems that gives us the most information about a group of problem-solving algorithms. This method was tested on the games in the General Video Game AI (GVGAI) framework, allowing us to identify a smaller set of games that still gives a large amount of information about the game-playing agents. This approach can be used to make agent testing more efficient in the future. We can achieve almost as good discriminatory accuracy when testing on only a handful of games as when testing on more than a hundred games, the latter often being computationally infeasible. Furthermore, this method can be extended to study the dimensions of effective variance in game design between these games, allowing us to identify which games differentiate between agents in the most complementary ways. As a side effect of this investigation, we provide an up-to-date comparison of agent performance for all GVGAI games, and an analysis of correlations between scores and win-rates across both games and agents.
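The sketch below illustrates the general idea of ranking games by how much they tell us about agents. It is a simplified discrete stand-in and not the paper's continuous information gain measure. Each game's agent scores are placed into equal-width bins, and the game is rated by the entropy of the resulting distribution. A game on which every agent scores the same carries 0 bits. Unlike the paper, it ranks games independently and ignores complementarity between games. The bin count and the game names are hypothetical.

```python
import math
from collections import Counter


def discrete_information(scores, num_bins=3) -> float:
    """Entropy (bits) of how agents spread across equal-width score bins.

    Games on which all agents score alike yield 0 bits (no discrimination).
    """
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return 0.0
    width = (hi - lo) / num_bins
    # Clamp so the maximum score falls in the last bin.
    bins = [min(int((s - lo) / width), num_bins - 1) for s in scores]
    total = len(bins)
    return -sum(n / total * math.log2(n / total) for n in Counter(bins).values())


def most_discriminatory(game_scores, k):
    """Return the k game names whose agent scores carry the most information."""
    ranked = sorted(game_scores,
                    key=lambda g: discrete_information(game_scores[g]),
                    reverse=True)
    return ranked[:k]
```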
Tasks
Published	2018-09-09
URL	http://arxiv.org/abs/1809.02904v2
PDF	http://arxiv.org/pdf/1809.02904v2.pdf
PWC	https://paperswithcode.com/paper/a-continuous-information-gain-measure-to-find
Repo
Framework

A Proposal-Based Solution to Spatio-Temporal Action Detection in Untrimmed Videos


Title	A Proposal-Based Solution to Spatio-Temporal Action Detection in Untrimmed Videos
Authors	Joshua Gleason, Rajeev Ranjan, Steven Schwarcz, Carlos D. Castillo, Jun-Chen Cheng, Rama Chellappa
Abstract	Existing approaches for spatio-temporal action detection in videos are limited by the spatial extent and temporal duration of the actions. In this paper, we present a modular system for spatio-temporal action detection in untrimmed security videos. We propose a two-stage approach. The first stage generates dense spatio-temporal proposals using hierarchical clustering and temporal jittering techniques on frame-wise object detections. The second stage is a Temporal Refinement I3D (TRI-3D) network that performs action classification and temporal refinement on the generated proposals. The object detection-based proposal generation step helps in detecting actions occurring in a small spatial region of a video frame, while temporal jittering and refinement help in detecting actions of variable lengths. Experimental results on the DIVA spatio-temporal action detection dataset show the effectiveness of our system. For comparison, the performance of our system is also evaluated on the THUMOS14 temporal action detection dataset.
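Temporal jittering, as described above, produces variants of each proposal so that actions of different lengths are still covered. The sketch below shows one way to do it and is not the authors' implementation. It rescales the proposal length and shifts its centre, then clips each window to the video and drops duplicates. The default `scales` and `shifts` values are hypothetical, since the abstract does not give the jitter parameters.

```python
def temporal_jitter(start, end, video_length,
                    scales=(0.75, 1.0, 1.25), shifts=(-0.25, 0.0, 0.25)):
    """Generate jittered (start, end) windows around one temporal proposal.

    Each window rescales the proposal length by `scale` and moves its centre
    by `shift` times the original length. Windows are clipped to
    [0, video_length], and empty or duplicate windows are dropped.
    """
    length = end - start
    centre = (start + end) / 2
    windows = []
    for scale in scales:
        for shift in shifts:
            c = centre + shift * length
            half = scale * length / 2
            window = (max(0.0, c - half), min(float(video_length), c + half))
            if window[1] > window[0] and window not in windows:
                windows.append(window)
    return windows
```

For example, `temporal_jitter(10, 20, 100)` yields nine windows, including the unchanged `(10.0, 20.0)`. A refinement stage such as TRI-3D would then classify and adjust each window.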
Tasks	Action Classification, Action Detection, Object Detection
Published	2018-11-20
URL	http://arxiv.org/abs/1811.08496v2
PDF	http://arxiv.org/pdf/1811.08496v2.pdf
PWC	https://paperswithcode.com/paper/a-proposal-based-solution-to-spatio-temporal
Repo
Framework

Word Tagging with Foundational Ontology Classes: Extending the WordNet-DOLCE Mapping to Verbs


Title	Word Tagging with Foundational Ontology Classes: Extending the WordNet-DOLCE Mapping to Verbs
Authors	Vivian S. Silva, André Freitas, Siegfried Handschuh
Abstract	Semantic annotation is fundamental for dealing with large-scale lexical information, as it maps the information to an enumerable set of categories over which rules and algorithms can be applied, and foundational ontology classes can be used as a formal set of categories for such tasks. A previous alignment between WordNet noun synsets and DOLCE provided a starting point for ontology-based annotation, but in NLP tasks verbs are also of substantial importance. This work presents an extension to the WordNet-DOLCE noun mapping that aligns verbs according to their links to nouns denoting perdurants, transferring to each verb the DOLCE class assigned to the noun that best represents that verb’s occurrence. To evaluate the usefulness of this resource, we implemented a foundational ontology-based semantic annotation framework that assigns a high-level foundational category to each word or phrase in a text, and compared it to a similar annotation tool, obtaining an increase of 9.05% in accuracy.
Tasks
Published	2018-06-20
URL	http://arxiv.org/abs/1806.07699v1
PDF	http://arxiv.org/pdf/1806.07699v1.pdf
PWC	https://paperswithcode.com/paper/word-tagging-with-foundational-ontology
Repo
Framework

Getting the subtext without the text: Scalable multimodal sentiment classification from visual and acoustic modalities


Title	Getting the subtext without the text: Scalable multimodal sentiment classification from visual and acoustic modalities
Authors	Nathaniel Blanchard, Daniel Moreira, Aparna Bharati, Walter J. Scheirer
Abstract	In the last decade, video blogs (vlogs) have become an extremely popular method through which people express sentiment. The ubiquity of these videos has increased the importance of multimodal fusion models, which incorporate video and audio features with traditional text features for automatic sentiment detection. Multimodal fusion offers a unique opportunity to build models that learn from the full depth of expression available to human viewers. In the detection of sentiment in these videos, acoustic and video features provide clarity to otherwise ambiguous transcripts. In this paper, we present a multimodal fusion model that exclusively uses high-level video and audio features to analyze spoken sentences for sentiment. We discard traditional transcription features in order to minimize human intervention and to maximize the deployability of our model on at-scale real-world data. We select high-level features for our model that have been successful in non-affect domains in order to test their generalizability in the sentiment detection domain. We train and test our model on the newly released CMU Multimodal Opinion Sentiment and Emotion Intensity (CMU-MOSEI) dataset, obtaining an F1 score of 0.8049 on the validation set and an F1 score of 0.6325 on the held-out challenge test set.
Tasks	Sentiment Analysis
Published	2018-07-03
URL	http://arxiv.org/abs/1807.01122v1
PDF	http://arxiv.org/pdf/1807.01122v1.pdf
PWC	https://paperswithcode.com/paper/getting-the-subtext-without-the-text-scalable
Repo
Framework

A Collaborative Computer Aided Diagnosis (C-CAD) System with Eye-Tracking, Sparse Attentional Model, and Deep Learning


Title	A Collaborative Computer Aided Diagnosis (C-CAD) System with Eye-Tracking, Sparse Attentional Model, and Deep Learning
Authors	Naji Khosravan, Haydar Celik, Baris Turkbey, Elizabeth Jones, Bradford Wood, Ulas Bagci
Abstract	There are at least two categories of errors in radiology screening that can lead to suboptimal diagnostic decisions and interventions: (i) human fallibility and (ii) complexity of visual search. Computer aided diagnostic (CAD) tools are developed to help radiologists compensate for some of these errors. However, despite their significant improvements over conventional screening strategies, most CAD systems do not go beyond their use as second opinion tools, because they produce a high number of false positives that human interpreters need to correct. In parallel with efforts in computerized analysis of radiology scans, several researchers have examined the behavior of radiologists while screening medical images to better understand how and why they miss tumors, how they interact with the information in an image, and how they search for unknown pathology in the images. Eye-tracking tools have been instrumental in exploring answers to these fundamental questions. In this paper, we aim to develop a paradigm-shifting CAD system, called collaborative CAD (C-CAD), that unifies both of the above-mentioned research lines: CAD and eye-tracking. We design an eye-tracking interface providing radiologists with a real radiology reading room experience. Then, we propose a novel algorithm that unifies eye-tracking data and a CAD system. Specifically, we present a new graph-based clustering and sparsification algorithm to transform eye-tracking data (gaze) into a signal model to interpret gaze patterns quantitatively and qualitatively. The proposed C-CAD collaborates with radiologists via eye-tracking technology and helps them to improve diagnostic decisions. The C-CAD learns radiologists’ search efficiency by processing their gaze patterns. To do this, the C-CAD uses a deep learning algorithm in a newly designed multi-task learning platform to segment and diagnose cancers simultaneously.
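The graph-based clustering step can be pictured with a deliberately simplified sketch. It is not the C-CAD algorithm and it omits the sparsification stage entirely. Gaze points closer than a radius are joined by an edge, and each connected component, found here with union-find, becomes one cluster. The `radius` threshold and the sample points are hypothetical.

```python
import math


def cluster_gaze(points, radius):
    """Group gaze points into connected components of a radius graph.

    Two points are joined by an edge when their Euclidean distance is below
    `radius`. Each connected component (found with union-find) becomes one
    cluster of point indices.
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) < radius:
                parent[find(i)] = find(j)  # merge the two components

    clusters = {}
    for i in range(len(points)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())
```

The pairwise loop is quadratic in the number of gaze samples. Real eye-tracking streams would likely need a spatial index or a temporal window.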
Tasks	Eye Tracking, Multi-Task Learning
Published	2018-02-17
URL	http://arxiv.org/abs/1802.06260v2
PDF	http://arxiv.org/pdf/1802.06260v2.pdf
PWC	https://paperswithcode.com/paper/a-collaborative-computer-aided-diagnosis-c
Repo
Framework