July 28, 2019

Paper Group ANR 268

Zone-based Keyword Spotting in Bangla and Devanagari Documents. Authorship Analysis of Xenophon’s Cyropaedia. Answer Set Programming for Non-Stationary Markov Decision Processes. Organic Computing in the Spotlight. On the Global-Local Dichotomy in Sparsity Modeling. Minimum Word Error Rate Training for Attention-based Sequence-to-Sequence Models. T …

Zone-based Keyword Spotting in Bangla and Devanagari Documents

Title Zone-based Keyword Spotting in Bangla and Devanagari Documents
Authors Ayan Kumar Bhunia, Partha Pratim Roy, Umapada Pal
Abstract In this paper we present a word spotting system in text lines for offline Indic scripts such as Bangla (Bengali) and Devanagari. Recently, it was shown that a zone-wise recognition method improves word recognition performance over a conventional full-word recognition system in Indic scripts. Inspired by this idea, we consider the zone segmentation approach and use middle zone information to improve the traditional word spotting performance. To avoid the problems of heuristic zone segmentation, we propose here an HMM-based approach to segment the upper and lower zone components from the text line images. The candidate keywords are searched from a line without segmenting characters or words. Also, we propose a novel feature combining foreground and background information of text line images for keyword spotting by character filler models. Using both foreground and background information yields a significant improvement in performance over using either one alone. The Pyramid Histogram of Oriented Gradient (PHOG) feature has been used in our word spotting framework. Experiments show that the proposed zone-segmentation-based system outperforms traditional approaches to word spotting.
Tasks Keyword Spotting
Published 2017-12-05
URL http://arxiv.org/abs/1712.01434v1
PDF http://arxiv.org/pdf/1712.01434v1.pdf
PWC https://paperswithcode.com/paper/zone-based-keyword-spotting-in-bangla-and
Repo
Framework

Authorship Analysis of Xenophon’s Cyropaedia

Title Authorship Analysis of Xenophon’s Cyropaedia
Authors Anjalie Field
Abstract In the past several decades, many authorship attribution studies have used computational methods to determine the authors of disputed texts. Disputed authorship is a common problem in Classics, since little information about ancient documents has survived the centuries. Many scholars have questioned the authenticity of the final chapter of Xenophon’s Cyropaedia, a 4th century B.C. historical text. In this study, we use N-gram frequency vectors with a cosine similarity function and word frequency vectors with Naive Bayes Classifiers (NBC) and Support Vector Machines (SVM) to analyze the authorship of the Cyropaedia. Although the N-gram analysis shows that the epilogue of the Cyropaedia differs slightly from the rest of the work, comparing the analysis of Xenophon with analyses of Aristotle and Plato suggests that this difference is not significant. Both NBC and SVM analyses of word frequencies show that the final chapter of the Cyropaedia is closely related to the other chapters of the Cyropaedia. Therefore, this analysis suggests that the disputed chapter was written by Xenophon. This information can help scholars better understand the Cyropaedia and also demonstrates the usefulness of applying modern authorship analysis techniques to classical literature.
Tasks
Published 2017-11-06
URL http://arxiv.org/abs/1711.01684v1
PDF http://arxiv.org/pdf/1711.01684v1.pdf
PWC https://paperswithcode.com/paper/authorship-analysis-of-xenophons-cyropaedia
Repo
Framework
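
The abstract above compares texts through N-gram frequency vectors and a cosine similarity function. A minimal sketch of that comparison follows. It assumes character-level N-grams with n = 3, which the abstract does not specify, and the function names `ngram_counts` and `cosine_similarity` are illustrative rather than taken from the paper.

```python
from collections import Counter
import math


def ngram_counts(text, n=3):
    """Count overlapping character n-grams in a string (assumed granularity)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse n-gram count vectors."""
    # Counter returns 0 for missing keys, so only shared n-grams contribute.
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for empty inputs
    return dot / (norm_a * norm_b)
```

In an authorship setting one would compute `ngram_counts` for the disputed chapter and for the rest of the work, then compare the resulting similarity with similarities measured between chapters of known authorship.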

Answer Set Programming for Non-Stationary Markov Decision Processes

Title Answer Set Programming for Non-Stationary Markov Decision Processes
Authors Leonardo A. Ferreira, Reinaldo A. C. Bianchi, Paulo E. Santos, Ramon Lopez de Mantaras
Abstract Non-stationary domains, where unforeseen changes happen, present a challenge for agents to find an optimal policy for a sequential decision making problem. This work investigates a solution to this problem that combines Markov Decision Processes (MDP) and Reinforcement Learning (RL) with Answer Set Programming (ASP) in a method we call ASP(RL). In this method, Answer Set Programming is used to find the possible trajectories of an MDP, from where Reinforcement Learning is applied to learn the optimal policy of the problem. Results show that ASP(RL) is capable of efficiently finding the optimal solution of an MDP representing non-stationary domains.
Tasks Decision Making
Published 2017-05-03
URL http://arxiv.org/abs/1705.01399v1
PDF http://arxiv.org/pdf/1705.01399v1.pdf
PWC https://paperswithcode.com/paper/answer-set-programming-for-non-stationary
Repo
Framework

Organic Computing in the Spotlight

Title Organic Computing in the Spotlight
Authors Sven Tomforde, Bernhard Sick, Christian Müller-Schloer
Abstract Organic Computing is an initiative in the field of systems engineering that proposed to make use of concepts such as self-adaptation and self-organisation to increase the robustness of technical systems. Based on the observation that traditional design and operation concepts reach their limits, transferring more autonomy to the systems themselves should result in a reduction of complexity for users, administrators, and developers. However, there seems to be a need for an updated definition of the term “Organic Computing”, of desired properties of technical, organic systems, and the objectives of the Organic Computing initiative. With this article, we will address these points.
Tasks
Published 2017-01-27
URL http://arxiv.org/abs/1701.08125v1
PDF http://arxiv.org/pdf/1701.08125v1.pdf
PWC https://paperswithcode.com/paper/organic-computing-in-the-spotlight
Repo
Framework

On the Global-Local Dichotomy in Sparsity Modeling

Title On the Global-Local Dichotomy in Sparsity Modeling
Authors Dmitry Batenkov, Yaniv Romano, Michael Elad
Abstract The traditional sparse modeling approach, when applied to inverse problems with large data such as images, essentially assumes a sparse model for small overlapping data patches. While producing state-of-the-art results, this methodology is suboptimal, as it does not attempt to model the entire global signal in any meaningful way - a nontrivial task by itself. In this paper we propose a way to bridge this theoretical gap by constructing a global model from the bottom up. Given local sparsity assumptions in a dictionary, we show that the global signal representation must satisfy a constrained underdetermined system of linear equations, which can be solved efficiently by modern optimization methods such as Alternating Direction Method of Multipliers (ADMM). We investigate conditions for unique and stable recovery, and provide numerical evidence corroborating the theory.
Tasks
Published 2017-02-11
URL http://arxiv.org/abs/1702.03446v1
PDF http://arxiv.org/pdf/1702.03446v1.pdf
PWC https://paperswithcode.com/paper/on-the-global-local-dichotomy-in-sparsity
Repo
Framework

Minimum Word Error Rate Training for Attention-based Sequence-to-Sequence Models

Title Minimum Word Error Rate Training for Attention-based Sequence-to-Sequence Models
Authors Rohit Prabhavalkar, Tara N. Sainath, Yonghui Wu, Patrick Nguyen, Zhifeng Chen, Chung-Cheng Chiu, Anjuli Kannan
Abstract Sequence-to-sequence models, such as attention-based models in automatic speech recognition (ASR), are typically trained to optimize the cross-entropy criterion which corresponds to improving the log-likelihood of the data. However, system performance is usually measured in terms of word error rate (WER), not log-likelihood. Traditional ASR systems benefit from discriminative sequence training which optimizes criteria such as the state-level minimum Bayes risk (sMBR) which are more closely related to WER. In the present work, we explore techniques to train attention-based models to directly minimize expected word error rate. We consider two loss functions which approximate the expected number of word errors: either by sampling from the model, or by using N-best lists of decoded hypotheses, which we find to be more effective than the sampling-based method. In experimental evaluations, we find that the proposed training procedure improves performance by up to 8.2% relative to the baseline system. This allows us to train grapheme-based, uni-directional attention-based models which match the performance of a traditional, state-of-the-art, discriminative sequence-trained system on a mobile voice-search task.
Tasks Speech Recognition
Published 2017-12-05
URL http://arxiv.org/abs/1712.01818v1
PDF http://arxiv.org/pdf/1712.01818v1.pdf
PWC https://paperswithcode.com/paper/minimum-word-error-rate-training-for
Repo
Framework
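
The N-best approximation described in the abstract above can be illustrated with a small sketch: the expected number of word errors is estimated by weighting each hypothesis's word errors by its probability, renormalized over the N-best list. The names `word_errors` and `expected_word_errors` are hypothetical, and the sketch only evaluates the quantity; it does not show the gradient-based training or any other detail of the authors' system.

```python
import math


def word_errors(hyp, ref):
    """Word-level edit distance between a hypothesis and a reference."""
    h, r = hyp.split(), ref.split()
    prev = list(range(len(r) + 1))
    for i, hw in enumerate(h, start=1):
        cur = [i]
        for j, rw in enumerate(r, start=1):
            cur.append(min(
                prev[j] + 1,               # skip a hypothesis word
                cur[j - 1] + 1,            # skip a reference word
                prev[j - 1] + (hw != rw),  # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]


def expected_word_errors(nbest, ref):
    """nbest is a list of (hypothesis, log_probability) pairs."""
    # Renormalize probabilities over the N-best list (softmax of log-probs).
    max_lp = max(lp for _, lp in nbest)
    weights = [math.exp(lp - max_lp) for _, lp in nbest]
    total = sum(weights)
    return sum(w / total * word_errors(h, ref)
               for (h, _), w in zip(nbest, weights))
```

For example, two equally likely hypotheses with 0 and 1 word errors give an expected value of 0.5.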

Thompson Sampling Guided Stochastic Searching on the Line for Deceptive Environments with Applications to Root-Finding Problems

Title Thompson Sampling Guided Stochastic Searching on the Line for Deceptive Environments with Applications to Root-Finding Problems
Authors Sondre Glimsdal, Ole-Christoffer Granmo
Abstract The multi-armed bandit problem forms the foundation for solving a wide range of on-line stochastic optimization problems through a simple, yet effective mechanism. One simply casts the problem as a gambler that repeatedly pulls one out of N slot machine arms, eliciting random rewards. Learning of reward probabilities is then combined with reward maximization, by carefully balancing reward exploration against reward exploitation. In this paper, we address a particularly intriguing variant of the multi-armed bandit problem, referred to as the Stochastic Point Location (SPL) Problem. The gambler is here only told whether the optimal arm (point) lies to the “left” or to the “right” of the arm pulled, with the feedback being erroneous with probability $1-\pi$. This formulation thus captures optimization in continuous action spaces with both informative and deceptive feedback. To tackle this class of problems, we formulate a compact and scalable Bayesian representation of the solution space that simultaneously captures both the location of the optimal arm as well as the probability of receiving correct feedback. We further introduce the accompanying Thompson Sampling guided Stochastic Point Location (TS-SPL) scheme for balancing exploration against exploitation. By learning $\pi$, TS-SPL also supports deceptive environments that are lying about the direction of the optimal arm. This, in turn, allows us to solve the fundamental Stochastic Root Finding (SRF) Problem. Empirical results demonstrate that our scheme deals with both deceptive and informative environments, significantly outperforming competing algorithms both for SRF and SPL.
Tasks Stochastic Optimization
Published 2017-08-05
URL http://arxiv.org/abs/1708.01791v1
PDF http://arxiv.org/pdf/1708.01791v1.pdf
PWC https://paperswithcode.com/paper/thompson-sampling-guided-stochastic-searching
Repo
Framework
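
To make the left/right feedback model in the abstract above concrete, here is a rough sketch of Thompson sampling over a discretized joint posterior on the optimal point and the feedback accuracy $\pi$. This is not the authors' TS-SPL representation, which the abstract describes only as compact and scalable. The grid, the candidate values of $\pi$, the tie rule `lam >= x`, and the names `bayes_update` and `thompson_search` are all assumptions made for illustration.

```python
import random


def bayes_update(posterior, hypotheses, x, says_right):
    """Reweight each (location, pi) hypothesis by the feedback likelihood."""
    updated = []
    for (lam, pi), p in zip(hypotheses, posterior):
        truth_is_right = lam >= x  # where the optimum lies under this hypothesis
        agrees = truth_is_right == says_right
        # Feedback is correct with probability pi, erroneous with 1 - pi.
        updated.append(p * (pi if agrees else 1.0 - pi))
    total = sum(updated)
    return [u / total for u in updated]


def thompson_search(feedback, n_points=11, pis=(0.1, 0.9), steps=100, seed=0):
    """feedback(x) returns True if the environment says the optimum is right of x."""
    rng = random.Random(seed)
    grid = [i / (n_points - 1) for i in range(n_points)]
    hypotheses = [(lam, pi) for lam in grid for pi in pis]
    posterior = [1.0 / len(hypotheses)] * len(hypotheses)
    for _ in range(steps):
        # Thompson step: sample one hypothesis, then query its location.
        x, _ = rng.choices(hypotheses, weights=posterior)[0]
        posterior = bayes_update(posterior, hypotheses, x, feedback(x))
    best = max(range(len(hypotheses)), key=posterior.__getitem__)
    return hypotheses[best], posterior
```

Including values of $\pi$ below 0.5 among the hypotheses is what lets the posterior account for a deceptive environment: under such a hypothesis, a "right" answer counts as evidence that the optimum lies to the left.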

Few-Shot Image Recognition by Predicting Parameters from Activations

Title Few-Shot Image Recognition by Predicting Parameters from Activations
Authors Siyuan Qiao, Chenxi Liu, Wei Shen, Alan Yuille
Abstract In this paper, we are interested in the few-shot learning problem. In particular, we focus on a challenging scenario where the number of categories is large and the number of examples per novel category is very limited, e.g. 1, 2, or 3. Motivated by the close relationship between the parameters and the activations in a neural network associated with the same category, we propose a novel method that can adapt a pre-trained neural network to novel categories by directly predicting the parameters from the activations. Zero training is required in adaptation to novel categories, and fast inference is realized by a single forward pass. We evaluate our method by doing few-shot image recognition on the ImageNet dataset, which achieves the state-of-the-art classification accuracy on novel categories by a significant margin while keeping comparable performance on the large-scale categories. We also test our method on the MiniImageNet dataset and it strongly outperforms the previous state-of-the-art methods.
Tasks Few-Shot Image Classification, Few-Shot Learning
Published 2017-06-12
URL http://arxiv.org/abs/1706.03466v3
PDF http://arxiv.org/pdf/1706.03466v3.pdf
PWC https://paperswithcode.com/paper/few-shot-image-recognition-by-predicting
Repo
Framework

Combination of Hidden Markov Random Field and Conjugate Gradient for Brain Image Segmentation

Title Combination of Hidden Markov Random Field and Conjugate Gradient for Brain Image Segmentation
Authors EL-Hachemi Guerrout, Samy Ait-Aoudia, Dominique Michelucci, Ramdane Mahiou
Abstract Image segmentation is the process of partitioning an image into significant regions that are easier to analyze. Nowadays, segmentation has become a necessity in many practical medical imaging tasks, such as locating tumors and diseases. The Hidden Markov Random Field model is one of several techniques used in image segmentation. It provides an elegant way to model the segmentation process. This modeling leads to the minimization of an objective function. The Conjugate Gradient (CG) algorithm is one of the best-known optimization techniques. This paper proposes the use of the CG algorithm for image segmentation, based on the Hidden Markov Random Field. Since derivatives are not available for this expression, finite differences are used in the CG algorithm to approximate the first derivative. The approach is evaluated using a number of publicly available images, where ground truth is known. The Dice Coefficient is used as an objective criterion to measure the quality of segmentation. The results show that the proposed CG approach compares favorably with other variants of Hidden Markov Random Field segmentation algorithms.
Tasks Brain Image Segmentation, Semantic Segmentation
Published 2017-05-13
URL http://arxiv.org/abs/1705.04823v4
PDF http://arxiv.org/pdf/1705.04823v4.pdf
PWC https://paperswithcode.com/paper/combination-of-hidden-markov-random-field-and
Repo
Framework
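
The Dice Coefficient named in the abstract above measures the overlap between a predicted region and the ground-truth region: twice the size of their intersection divided by the sum of their sizes. A minimal sketch follows. It represents each region as a set of pixel indices and returns 1.0 when both regions are empty; both choices are assumptions made here, not details from the paper.

```python
def dice_coefficient(pred, truth):
    """Dice = 2 * |A & B| / (|A| + |B|) for two collections of pixel indices."""
    pred, truth = set(pred), set(truth)
    if not pred and not truth:
        return 1.0  # convention chosen here: two empty regions agree perfectly
    return 2 * len(pred & truth) / (len(pred) + len(truth))
```

A value of 1.0 means the two regions coincide and 0.0 means they do not overlap at all. For a multi-class segmentation, the score would typically be computed per class.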

The Parallel Meaning Bank: Towards a Multilingual Corpus of Translations Annotated with Compositional Meaning Representations

Title The Parallel Meaning Bank: Towards a Multilingual Corpus of Translations Annotated with Compositional Meaning Representations
Authors Lasha Abzianidze, Johannes Bjerva, Kilian Evang, Hessel Haagsma, Rik van Noord, Pierre Ludmann, Duc-Duy Nguyen, Johan Bos
Abstract The Parallel Meaning Bank is a corpus of translations annotated with shared, formal meaning representations comprising over 11 million words divided over four languages (English, German, Italian, and Dutch). Our approach is based on cross-lingual projection: automatically produced (and manually corrected) semantic annotations for English sentences are mapped onto their word-aligned translations, assuming that the translations are meaning-preserving. The semantic annotation consists of five main steps: (i) segmentation of the text in sentences and lexical items; (ii) syntactic parsing with Combinatory Categorial Grammar; (iii) universal semantic tagging; (iv) symbolization; and (v) compositional semantic analysis based on Discourse Representation Theory. These steps are performed using statistical models trained in a semi-supervised manner. The employed annotation models are all language-neutral. Our first results are promising.
Tasks
Published 2017-02-13
URL http://arxiv.org/abs/1702.03964v1
PDF http://arxiv.org/pdf/1702.03964v1.pdf
PWC https://paperswithcode.com/paper/the-parallel-meaning-bank-towards-a
Repo
Framework

Training Convolutional Neural Networks with Limited Training Data for Ear Recognition in the Wild

Title Training Convolutional Neural Networks with Limited Training Data for Ear Recognition in the Wild
Authors Žiga Emeršič, Dejan Štepec, Vitomir Štruc, Peter Peer
Abstract Identity recognition from ear images is an active field of research within the biometric community. The ability to capture ear images from a distance and in a covert manner makes ear recognition technology an appealing choice for surveillance and security applications as well as related application domains. In contrast to other biometric modalities, where large datasets captured in uncontrolled settings are readily available, datasets of ear images are still limited in size and mostly of laboratory-like quality. As a consequence, ear recognition technology has not yet benefited from advances in deep learning and convolutional neural networks (CNNs) and is still lagging behind other modalities that experienced significant performance gains owing to deep recognition technology. In this paper we address this problem and aim at building a CNN-based ear recognition model. We explore different strategies towards model training with limited amounts of training data and show that by selecting an appropriate model architecture, using aggressive data augmentation and selective learning on existing (pre-trained) models, we are able to learn an effective CNN-based model using a little more than 1300 training images. The result of our work is the first CNN-based approach to ear recognition that is also made publicly available to the research community. With our model we are able to improve on the rank one recognition rate of the previous state-of-the-art by more than 25% on a challenging dataset of ear images captured from the web (a.k.a. in the wild).
Tasks Data Augmentation
Published 2017-11-27
URL http://arxiv.org/abs/1711.09952v2
PDF http://arxiv.org/pdf/1711.09952v2.pdf
PWC https://paperswithcode.com/paper/training-convolutional-neural-networks-with-1
Repo
Framework

Multimodal Storytelling via Generative Adversarial Imitation Learning

Title Multimodal Storytelling via Generative Adversarial Imitation Learning
Authors Zhiqian Chen, Xuchao Zhang, Arnold P. Boedihardjo, Jing Dai, Chang-Tien Lu
Abstract Deriving event storylines is an effective summarization method to succinctly organize extensive information, which can significantly alleviate the pain of information overload. The critical challenge is the lack of a widely recognized definition of a storyline metric. Prior studies have developed various approaches based on different assumptions about users’ interests. These works can extract interesting patterns, but their assumptions do not guarantee that the derived patterns will match users’ preferences. Moreover, their exclusive reliance on a single-modality source misses cross-modality information. This paper proposes a method, multimodal imitation learning via generative adversarial networks (MIL-GAN), to directly model users’ interests as reflected by various data. In particular, the proposed model addresses the critical challenge by imitating users’ demonstrated storylines. Our proposed model is designed to learn the reward patterns given user-provided storylines and then apply the learned policy to unseen data. In a user study, the proposed approach is shown to be capable of acquiring the user’s implicit intent and outperforming competing methods by a substantial margin.
Tasks Imitation Learning
Published 2017-12-05
URL http://arxiv.org/abs/1712.01455v1
PDF http://arxiv.org/pdf/1712.01455v1.pdf
PWC https://paperswithcode.com/paper/multimodal-storytelling-via-generative
Repo
Framework

Ensemble of Part Detectors for Simultaneous Classification and Localization

Title Ensemble of Part Detectors for Simultaneous Classification and Localization
Authors Xiaopeng Zhang, Hongkai Xiong, Weiyao Lin, Qi Tian
Abstract Part-based representation has been proven to be effective for a variety of visual applications. However, automatic discovery of discriminative parts without object/part-level annotations is challenging. This paper proposes a discriminative mid-level representation paradigm based on the responses of a collection of part detectors, which only requires image-level labels. Towards this goal, we first develop a detector-based spectral clustering method to mine the representative and discriminative mid-level patterns for detector initialization. The advantage of the proposed pattern mining technique is that the distance metric based on detectors only focuses on discriminative details, and a set of such grouped detectors offers an effective way for consistent pattern mining. Relying on the discovered patterns, we further formulate the detector learning process as a confidence-loss sparse Multiple Instance Learning (cls-MIL) task, which considers the diversity of the positive samples while avoiding drift away from the well-localized ones by assigning a confidence value to each positive sample. The responses of the learned detectors can form an effective mid-level image representation for both image classification and object localization. Experiments conducted on benchmark datasets demonstrate the superiority of our method over existing approaches.
Tasks Image Classification, Multiple Instance Learning, Object Localization
Published 2017-05-29
URL http://arxiv.org/abs/1705.10034v1
PDF http://arxiv.org/pdf/1705.10034v1.pdf
PWC https://paperswithcode.com/paper/ensemble-of-part-detectors-for-simultaneous
Repo
Framework

Semi-Automated Nasal PAP Mask Sizing using Facial Photographs

Title Semi-Automated Nasal PAP Mask Sizing using Facial Photographs
Authors Benjamin Johnston, Alistair McEwan, Philip de Chazal
Abstract We present a semi-automated system for sizing nasal Positive Airway Pressure (PAP) masks based upon a neural network model that was trained with facial photographs of both PAP mask users and non-users. It demonstrated an accuracy of 72% in correctly sizing a mask and 96% accuracy sizing to within 1 mask size group. The semi-automated system performed comparably to sizing from manual measurements taken from the same images which produced 89% and 100% accuracy respectively.
Tasks
Published 2017-09-21
URL http://arxiv.org/abs/1709.07166v1
PDF http://arxiv.org/pdf/1709.07166v1.pdf
PWC https://paperswithcode.com/paper/semi-automated-nasal-pap-mask-sizing-using
Repo
Framework

ToolNet: Holistically-Nested Real-Time Segmentation of Robotic Surgical Tools

Title ToolNet: Holistically-Nested Real-Time Segmentation of Robotic Surgical Tools
Authors Luis C. Garcia-Peraza-Herrera, Wenqi Li, Lucas Fidon, Caspar Gruijthuijsen, Alain Devreker, George Attilakos, Jan Deprest, Emmanuel Vander Poorten, Danail Stoyanov, Tom Vercauteren, Sebastien Ourselin
Abstract Real-time tool segmentation from endoscopic videos is an essential part of many computer-assisted robotic surgical systems and of critical importance in robotic surgical data science. We propose two novel deep learning architectures for automatic segmentation of non-rigid surgical instruments. Both methods take advantage of automated deep-learning-based multi-scale feature extraction while trying to maintain an accurate segmentation quality at all resolutions. The two proposed methods encode the multi-scale constraint inside the network architecture. The first proposed architecture enforces it by cascaded aggregation of predictions and the second proposed network does it by means of a holistically-nested architecture where the loss at each scale is taken into account for the optimization process. As the proposed methods are for real-time semantic labeling, both present a reduced number of parameters. We propose the use of parametric rectified linear units for semantic labeling in these small architectures to increase the regularization ability of the design and maintain the segmentation accuracy without overfitting the training sets. We compare the proposed architectures against state-of-the-art fully convolutional networks. We validate our methods using existing benchmark datasets, including ex vivo cases with phantom tissue and different robotic surgical instruments present in the scene. Our results show a statistically significant improved Dice Similarity Coefficient over previous instrument segmentation methods. We analyze our design choices and discuss the key drivers for improving accuracy.
Tasks
Published 2017-06-25
URL http://arxiv.org/abs/1706.08126v2
PDF http://arxiv.org/pdf/1706.08126v2.pdf
PWC https://paperswithcode.com/paper/toolnet-holistically-nested-real-time
Repo
Framework