October 20, 2019

3112 words 15 mins read

Paper Group ANR 40


Automated Bridge Component Recognition using Video Data. On the Generation of Medical Question-Answer Pairs. Deep learning for word-level handwritten Indic script identification. Gaussian Process Landmarking on Manifolds. Person Search via A Mask-Guided Two-Stream CNN Model. From Thumbnails to Summaries - A single Deep Neural Network to Rule Them A …

Automated Bridge Component Recognition using Video Data

Title Automated Bridge Component Recognition using Video Data
Authors Yasutaka Narazaki, Vedhus Hoskere, Tu A. Hoang, Billie F. Spencer Jr
Abstract This paper investigates the automated recognition of structural bridge components using video data. Although understanding video data for structural inspections is straightforward for human inspectors, the implementation of the same task using machine learning methods has not been fully realized. In particular, single-frame image processing techniques, such as convolutional neural networks (CNNs), are not expected to identify structural components accurately when the image is a close-up view, lacking contextual information regarding where on the structure the image originates. Inspired by the significant progress in video processing techniques, this study investigates automated bridge component recognition using video data, where the information from the past frames is used to augment the understanding of the current frame. A new simulated video dataset is created to train the machine learning algorithms. Then, CNNs with recurrent architectures are designed and applied to implement the automated bridge component recognition task. Results are presented for simulated video data, as well as video collected in the field.
Tasks
Published 2018-06-18
URL http://arxiv.org/abs/1806.06820v2
PDF http://arxiv.org/pdf/1806.06820v2.pdf
PWC https://paperswithcode.com/paper/automated-bridge-component-recognition-using
Repo
Framework
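
As a rough illustration of the idea described in this abstract (a per-frame CNN whose features feed a recurrent layer so that past frames give context to the current frame), here is a minimal PyTorch sketch. The layer sizes, the GRU choice, and the 5-class label set are illustrative assumptions, not the authors' configuration.

```python
# Hedged sketch, not the paper's network: per-frame CNN encoder + GRU over time.
import torch
import torch.nn as nn

class RecurrentFrameClassifier(nn.Module):
    def __init__(self, num_classes=5, hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(            # small CNN applied to each frame
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.rnn = nn.GRU(32, hidden, batch_first=True)   # temporal context from past frames
        self.head = nn.Linear(hidden, num_classes)        # per-frame component labels

    def forward(self, video):                    # video: (B, T, 3, H, W)
        b, t = video.shape[:2]
        feats = self.encoder(video.flatten(0, 1)).view(b, t, -1)
        out, _ = self.rnn(feats)
        return self.head(out)                    # logits: (B, T, num_classes)

logits = RecurrentFrameClassifier()(torch.randn(2, 8, 3, 64, 64))
print(logits.shape)                              # torch.Size([2, 8, 5])
```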

On the Generation of Medical Question-Answer Pairs

Title On the Generation of Medical Question-Answer Pairs
Authors Sheng Shen, Yaliang Li, Nan Du, Xian Wu, Yusheng Xie, Shen Ge, Tao Yang, Kai Wang, Xingzheng Liang, Wei Fan
Abstract Question answering (QA) has achieved promising progress recently. However, answering a question in real-world scenarios like the medical domain is still challenging, due to the requirement of external knowledge and the insufficient quantity of high-quality training data. In the light of these challenges, we study the task of generating medical QA pairs in this paper. With the insight that each medical question can be considered as a sample from the latent distribution of questions given answers, we propose an automated medical QA pair generation framework, consisting of an unsupervised key phrase detector that explores unstructured material for validity, and a generator that involves a multi-pass decoder to integrate structural knowledge for diversity. A series of experiments have been conducted on a real-world dataset collected from the National Medical Licensing Examination of China. Both automatic evaluation and human annotation demonstrate the effectiveness of the proposed method. Further investigation shows that, by incorporating the generated QA pairs for training, significant improvement in terms of accuracy can be achieved for the examination QA system.
Tasks Question Answering
Published 2018-11-01
URL https://arxiv.org/abs/1811.00681v2
PDF https://arxiv.org/pdf/1811.00681v2.pdf
PWC https://paperswithcode.com/paper/on-the-generation-of-medical-question-answer
Repo
Framework

Deep learning for word-level handwritten Indic script identification

Title Deep learning for word-level handwritten Indic script identification
Authors Soumya Ukil, Swarnendu Ghosh, Sk Md Obaidullah, K. C. Santosh, Kaushik Roy, Nibaran Das
Abstract We propose a novel method that uses convolutional neural networks (CNNs) for feature extraction. Not limited to the conventional spatial-domain representation, we use a multilevel 2D discrete Haar wavelet transform, where image representations are scaled to a variety of different sizes. These are then used to train different CNNs to select features. To be precise, we use 10 different CNNs that select a set of 10,240 features, i.e. 1,024 per CNN. With this, 11 different handwritten scripts are identified, where 1K words per script are used. In our tests, we achieved a maximum script identification rate of 94.73% using a multi-layer perceptron (MLP). Our results outperform the state-of-the-art techniques.
Tasks
Published 2018-01-05
URL http://arxiv.org/abs/1801.01627v1
PDF http://arxiv.org/pdf/1801.01627v1.pdf
PWC https://paperswithcode.com/paper/deep-learning-for-word-level-handwritten
Repo
Framework
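
To make the preprocessing step concrete, here is a small sketch of a multilevel 2D Haar decomposition that produces rescaled representations of a word image; in the paper's pipeline each scale would feed its own CNN and the resulting features would be concatenated. The image size, level count, and plain-NumPy implementation are assumptions for illustration only.

```python
# Hedged sketch: one possible multilevel 2D Haar decomposition of a word image.
import numpy as np

def haar2d(img):
    """One level of the 2D Haar transform: approximation + three detail bands."""
    a = img[0::2, 0::2]; b = img[0::2, 1::2]
    c = img[1::2, 0::2]; d = img[1::2, 1::2]
    ll = (a + b + c + d) / 2.0        # approximation (downscaled image)
    lh = (a + b - c - d) / 2.0        # horizontal detail
    hl = (a - b + c - d) / 2.0        # vertical detail
    hh = (a - b - c + d) / 2.0        # diagonal detail
    return ll, lh, hl, hh

def multilevel_haar(img, levels=3):
    """Return the approximation image at each scale (inputs for per-scale CNNs)."""
    scales = []
    cur = img
    for _ in range(levels):
        cur, *_ = haar2d(cur)
        scales.append(cur)
    return scales

word = np.random.rand(128, 128)        # toy grayscale word image
for s in multilevel_haar(word):
    print(s.shape)                     # (64, 64), (32, 32), (16, 16)
```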

Gaussian Process Landmarking on Manifolds

Title Gaussian Process Landmarking on Manifolds
Authors Tingran Gao, Shahar Z. Kovalsky, Ingrid Daubechies
Abstract As a means of improving analysis of biological shapes, we propose an algorithm for sampling a Riemannian manifold by sequentially selecting points with maximum uncertainty under a Gaussian process model. This greedy strategy is known to be near-optimal in the experimental design literature, and appears to outperform the use of user-placed landmarks in representing the geometry of biological objects in our application. In the noiseless regime, we establish an upper bound for the mean squared prediction error (MSPE) in terms of the number of samples and geometric quantities of the manifold, demonstrating that the MSPE for our proposed sequential design decays at a rate comparable to the oracle rate achievable by any sequential or non-sequential optimal design; to our knowledge this is the first result of this type for sequential experimental design. The key is to link the greedy algorithm to reduced basis methods in the context of model reduction for partial differential equations. We expect this approach will find additional applications in other fields of research.
Tasks
Published 2018-02-09
URL http://arxiv.org/abs/1802.03479v4
PDF http://arxiv.org/pdf/1802.03479v4.pdf
PWC https://paperswithcode.com/paper/gaussian-process-landmarking-on-manifolds
Repo
Framework
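
The greedy rule in this abstract (repeatedly pick the point with the largest Gaussian-process posterior variance) is easy to sketch. The RBF kernel on ambient coordinates and the noiseless update below are illustrative assumptions; the paper builds its covariance from heat-kernel constructions on the manifold.

```python
# Hedged sketch of greedy maximum-uncertainty landmark selection under a GP prior.
import numpy as np

def greedy_gp_landmarks(X, k, lengthscale=0.5):
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-sq / (2 * lengthscale ** 2))          # prior covariance (assumed RBF)
    var = np.diag(K).copy()                           # current posterior variance
    chosen = []
    for _ in range(k):
        i = int(np.argmax(var))                       # maximum-uncertainty point
        chosen.append(i)
        S = K[np.ix_(chosen, chosen)] + 1e-10 * np.eye(len(chosen))
        Ks = K[:, chosen]
        # posterior variance after conditioning on the chosen landmarks
        var = np.diag(K) - np.einsum('ij,jk,ik->i', Ks, np.linalg.inv(S), Ks)
    return chosen

pts = np.random.rand(200, 3)                          # points sampled from a surface
print(greedy_gp_landmarks(pts, 5))
```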

Person Search via A Mask-Guided Two-Stream CNN Model

Title Person Search via A Mask-Guided Two-Stream CNN Model
Authors Di Chen, Shanshan Zhang, Wanli Ouyang, Jian Yang, Ying Tai
Abstract In this work, we tackle the problem of person search, a challenging task consisting of pedestrian detection and person re-identification (re-ID). Instead of sharing representations in a single joint model, we find that separating detector and re-ID feature extraction yields better performance. In order to extract more representative features for each identity, we segment out the foreground person from the original image patch. We propose a simple yet effective re-ID method, which models foreground person and original image patches individually, and obtains enriched representations from two separate CNN streams. In experiments on two standard person search benchmarks, CUHK-SYSU and PRW, we achieve mAP of 83.0% and 32.6% respectively, surpassing the state of the art by a large margin (more than 5 percentage points).
Tasks Pedestrian Detection, Person Re-Identification, Person Search
Published 2018-07-21
URL http://arxiv.org/abs/1807.08107v1
PDF http://arxiv.org/pdf/1807.08107v1.pdf
PWC https://paperswithcode.com/paper/person-search-via-a-mask-guided-two-stream
Repo
Framework
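
A minimal sketch of the two-stream re-ID idea: one CNN sees the original person patch, a second sees the foreground-masked patch, and the two embeddings are combined. The tiny backbone, embedding size, and fusion layer below are assumptions; the paper uses standard detection/segmentation backbones.

```python
# Hedged sketch: mask-guided two-stream embedding, not the paper's exact model.
import torch
import torch.nn as nn

def small_cnn(out_dim=64):
    return nn.Sequential(
        nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(16, out_dim, 3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten())

class TwoStreamReID(nn.Module):
    def __init__(self, emb=64):
        super().__init__()
        self.full_stream = small_cnn(emb)      # original image patch
        self.fg_stream = small_cnn(emb)        # foreground-only patch
        self.proj = nn.Linear(2 * emb, emb)

    def forward(self, patch, mask):            # mask: (B, 1, H, W) in {0, 1}
        f_full = self.full_stream(patch)
        f_fg = self.fg_stream(patch * mask)    # segment out the foreground person
        return self.proj(torch.cat([f_full, f_fg], dim=1))

x = torch.randn(2, 3, 128, 64)
m = (torch.rand(2, 1, 128, 64) > 0.5).float()
print(TwoStreamReID()(x, m).shape)             # torch.Size([2, 64])
```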

From Thumbnails to Summaries - A single Deep Neural Network to Rule Them All

Title From Thumbnails to Summaries - A single Deep Neural Network to Rule Them All
Authors Hongxiang Gu, Viswanathan Swaminathan
Abstract Video summaries come in many forms, from traditional single-image thumbnails, animated thumbnails, and storyboards to trailer-like video summaries. Content creators use summaries to display the most attractive portion of their videos; users use them to quickly evaluate whether a video is worth watching. All forms of summaries are essential to video viewers, content creators, and advertisers. Video content management systems often have to generate multiple versions of summaries that vary in duration and presentational form. We present ReconstSum, a framework that uses an LSTM-based autoencoder architecture to extract and select a sparse subset of video frames or keyshots that optimally represent the input video in an unsupervised manner. The encoder selects a subset from the input video while the decoder seeks to reconstruct the video from the selection. The goal is to minimize the difference between the original input video and the reconstructed video. Our method is easily extended to a variety of applications including static video thumbnails, animated thumbnails, storyboards and “trailer-like” highlights. We specifically study and evaluate the two most popular use cases: thumbnail generation and storyboard generation. We demonstrate that our methods generate better results than the state-of-the-art techniques in both use cases.
Tasks
Published 2018-08-01
URL http://arxiv.org/abs/1808.00184v1
PDF http://arxiv.org/pdf/1808.00184v1.pdf
PWC https://paperswithcode.com/paper/from-thumbnails-to-summaries-a-single-deep
Repo
Framework
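
The select-then-reconstruct loop can be sketched as follows: an LSTM scores frames, a sparse subset is kept, and an LSTM decoder tries to reconstruct all frame features from that selection, with the reconstruction error driving learning. The feature size, the hard top-k selection, and the masking scheme are simplifying assumptions rather than the paper's formulation.

```python
# Hedged sketch of an LSTM autoencoder that selects keyshots and reconstructs the video.
import torch
import torch.nn as nn

class ReconstSumSketch(nn.Module):
    def __init__(self, feat=128, hidden=128, k=5):
        super().__init__()
        self.k = k
        self.encoder = nn.LSTM(feat, hidden, batch_first=True)
        self.score = nn.Linear(hidden, 1)          # per-frame importance
        self.decoder = nn.LSTM(feat, hidden, batch_first=True)
        self.out = nn.Linear(hidden, feat)

    def forward(self, frames):                     # frames: (B, T, feat)
        h, _ = self.encoder(frames)
        scores = self.score(h).squeeze(-1)         # (B, T)
        idx = scores.topk(self.k, dim=1).indices   # selected keyshots
        keep = torch.zeros_like(scores).scatter_(1, idx, 1.0).unsqueeze(-1)
        dec, _ = self.decoder(frames * keep)       # decode from the sparse selection
        recon = self.out(dec)
        loss = ((recon - frames) ** 2).mean()      # reconstruction objective
        return idx, loss

frames = torch.randn(2, 30, 128)                   # precomputed frame features
idx, loss = ReconstSumSketch()(frames)
print(idx.shape, loss.item())
```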

Scrutinizing and De-Biasing Intuitive Physics with Neural Stethoscopes

Title Scrutinizing and De-Biasing Intuitive Physics with Neural Stethoscopes
Authors Fabian B. Fuchs, Oliver Groth, Adam R. Kosiorek, Alex Bewley, Markus Wulfmeier, Andrea Vedaldi, Ingmar Posner
Abstract Visually predicting the stability of block towers is a popular task in the domain of intuitive physics. While previous work focuses on prediction accuracy, a one-dimensional performance measure, we provide a broader analysis of the learned physical understanding of the final model and how the learning process can be guided. To this end, we introduce neural stethoscopes as a general-purpose framework for quantifying the degree of importance of specific factors of influence in deep neural networks, as well as for actively promoting and suppressing information as appropriate. In doing so, we unify concepts from multitask learning as well as training with auxiliary and adversarial losses. We apply neural stethoscopes to analyse the state-of-the-art neural network for stability prediction. We show that the baseline model is susceptible to being misled by incorrect visual cues, which leads to a performance breakdown to the level of random guessing when training on scenarios where visual cues are inversely correlated with stability. Using stethoscopes to promote meaningful feature extraction increases performance from 51% to 90% prediction accuracy. Conversely, when trained on an easy dataset where visual cues are positively correlated with stability, the baseline model learns a bias leading to poor performance on a harder dataset. Using an adversarial stethoscope, the network is successfully de-biased, leading to a performance increase from 66% to 88%.
Tasks
Published 2018-06-14
URL https://arxiv.org/abs/1806.05502v5
PDF https://arxiv.org/pdf/1806.05502v5.pdf
PWC https://paperswithcode.com/paper/neural-stethoscopes-unifying-analytic
Repo
Framework
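
A stethoscope in this sense is a small head attached to an intermediate representation and trained to predict a supplementary factor; scaling the gradient that flows back into the main network either promotes (positive weight) or suppresses (negative weight, i.e. adversarial) that information. The gradient-reversal mechanism, shapes, and two-layer head below are assumptions used to illustrate the idea, not necessarily the paper's exact formulation.

```python
# Hedged sketch: auxiliary/adversarial "stethoscope" head via a scaled-gradient trick.
import torch
import torch.nn as nn

class ScaleGrad(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.clone()
    @staticmethod
    def backward(ctx, grad):
        return ctx.lam * grad, None       # lam < 0 reverses the gradient (adversarial)

def stethoscope_loss(features, target, head, lam):
    z = ScaleGrad.apply(features, lam)    # lam only affects the main network's gradient
    return nn.functional.cross_entropy(head(z), target)

head = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 4))
feats = torch.randn(8, 64, requires_grad=True)       # intermediate activations
loss = stethoscope_loss(feats, torch.randint(0, 4, (8,)), head, lam=-1.0)
loss.backward()                                       # adversarial: suppress the cue
print(feats.grad.shape)
```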

Sí o no, què penses? Catalonian Independence and Linguistic Identity on Social Media

Title Sí o no, què penses? Catalonian Independence and Linguistic Identity on Social Media
Authors Ian Stewart, Yuval Pinter, Jacob Eisenstein
Abstract Political identity is often manifested in language variation, but the relationship between the two is still relatively unexplored from a quantitative perspective. This study examines the use of Catalan, a language local to the semi-autonomous region of Catalonia in Spain, on Twitter in discourse related to the 2017 independence referendum. We corroborate prior findings that pro-independence tweets are more likely to include the local language than anti-independence tweets. We also find that Catalan is used more often in referendum-related discourse than in other contexts, contrary to prior findings on language variation. This suggests a strong role for the Catalan language in the expression of Catalonian political identity.
Tasks
Published 2018-04-13
URL http://arxiv.org/abs/1804.05088v1
PDF http://arxiv.org/pdf/1804.05088v1.pdf
PWC https://paperswithcode.com/paper/si-o-no-que-penses-catalonian-independence-1
Repo
Framework

SCH-GAN: Semi-supervised Cross-modal Hashing by Generative Adversarial Network

Title SCH-GAN: Semi-supervised Cross-modal Hashing by Generative Adversarial Network
Authors Jian Zhang, Yuxin Peng, Mingkuan Yuan
Abstract Cross-modal hashing aims to map heterogeneous multimedia data into a common Hamming space, which enables fast and flexible retrieval across different modalities. Supervised cross-modal hashing methods have achieved considerable progress by incorporating semantic side information. However, they mainly have two limitations: (1) they rely heavily on large-scale labeled cross-modal training data, which are labor intensive and hard to obtain; and (2) they ignore the rich information contained in the large amount of unlabeled data across different modalities, especially the margin examples that are easily retrieved incorrectly, which can help to model the correlations. To address these problems, in this paper we propose a novel Semi-supervised Cross-Modal Hashing approach by Generative Adversarial Network (SCH-GAN). We aim to take advantage of the GAN’s ability to model data distributions to promote cross-modal hashing learning in an adversarial way. The main contributions can be summarized as follows: (1) We propose a novel generative adversarial network for cross-modal hashing. In our proposed SCH-GAN, the generative model tries to select margin examples of one modality from unlabeled data given a query of another modality, while the discriminative model tries to distinguish the selected examples from true positive examples of the query. These two models play a minimax game so that the generative model can promote the hashing performance of the discriminative model. (2) We propose a reinforcement learning based algorithm to drive the training of the proposed SCH-GAN. The generative model takes the correlation score predicted by the discriminative model as a reward, and tries to select examples close to the margin to promote the discriminative model by maximizing the margin between positive and negative data. Experiments on 3 widely-used datasets verify the effectiveness of our proposed approach.
Tasks
Published 2018-02-07
URL http://arxiv.org/abs/1802.02488v1
PDF http://arxiv.org/pdf/1802.02488v1.pdf
PWC https://paperswithcode.com/paper/sch-gan-semi-supervised-cross-modal-hashing
Repo
Framework
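
The adversarial selection loop can be sketched at a high level: a generator scores unlabeled items of the other modality for a query and samples hard candidates, the discriminator is trained to rank true positives above them, and the discriminator's score is fed back to the generator as a REINFORCE-style reward. Hash-code learning and the real scoring networks are omitted; the bilinear scorers, hinge loss, and all sizes below are assumptions.

```python
# Hedged sketch of the generator/discriminator interaction, not the SCH-GAN model itself.
import torch
import torch.nn as nn

d_query, d_item = 32, 32
gen = nn.Bilinear(d_query, d_item, 1)     # generator relevance score (assumed form)
dis = nn.Bilinear(d_query, d_item, 1)     # discriminator relevance score (assumed form)

def adversarial_step(query, pos_item, unlabeled, opt_g, opt_d):
    q = query.expand(unlabeled.size(0), -1)
    probs = torch.softmax(gen(q, unlabeled).squeeze(-1), dim=0)
    idx = torch.multinomial(probs, 1)                   # sample a hard candidate
    neg = unlabeled[idx]

    # discriminator: rank the true positive above the selected example
    loss_d = torch.relu(1.0 - dis(query, pos_item) + dis(query, neg)).mean()
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # generator: REINFORCE, using the discriminator's score as the reward
    reward = dis(query, neg).detach()
    loss_g = -(torch.log(probs[idx] + 1e-8) * reward).mean()
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

opt_g = torch.optim.SGD(gen.parameters(), lr=0.01)
opt_d = torch.optim.SGD(dis.parameters(), lr=0.01)
adversarial_step(torch.randn(1, d_query), torch.randn(1, d_item),
                 torch.randn(100, d_item), opt_g, opt_d)
```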

Harmonic analysis on directed graphs and applications: from Fourier analysis to wavelets

Title Harmonic analysis on directed graphs and applications: from Fourier analysis to wavelets
Authors Harry Sevi, Gabriel Rilling, Pierre Borgnat
Abstract We introduce a novel harmonic analysis for functions defined on the vertices of a strongly connected directed graph, of which the random walk operator is the cornerstone. As a first step, we consider the set of eigenvectors of the random walk operator as a non-orthogonal Fourier-type basis for functions over directed graphs. We obtain a frequency interpretation by linking the variation of the eigenvectors of the random walk operator, measured by their Dirichlet energy, to the real part of their associated eigenvalues. From this Fourier basis, we proceed further and build multi-scale analyses on directed graphs. We propose both a redundant wavelet transform and a decimated wavelet transform by extending the diffusion wavelets framework of Coifman and Maggioni to directed graphs. We then apply this harmonic analysis to semi-supervised learning and signal modeling problems on directed graphs, highlighting the efficiency of our framework.
Tasks
Published 2018-11-28
URL http://arxiv.org/abs/1811.11636v2
PDF http://arxiv.org/pdf/1811.11636v2.pdf
PWC https://paperswithcode.com/paper/harmonic-analysis-on-directed-graphs-and
Repo
Framework
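
The Fourier-type basis in this abstract is concrete enough to sketch: take the eigenvectors of the random walk operator P = D^{-1} A of a strongly connected directed graph, and order them by the real part of the eigenvalues as a frequency proxy (eigenvalue 1 corresponds to the constant, lowest-frequency mode). The toy directed cycle with a chord is an illustrative graph, not one from the paper.

```python
# Hedged sketch: random-walk eigenvectors as a non-orthogonal graph Fourier basis.
import numpy as np

n = 6
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = 1.0               # directed cycle
A[0, 3] = 1.0                             # extra chord to make the example less symmetric

P = A / A.sum(axis=1, keepdims=True)      # random walk operator
evals, evecs = np.linalg.eig(P)
order = np.argsort(-evals.real)           # low "frequency" (large real part) first
basis = evecs[:, order]                   # non-orthogonal Fourier-type basis

signal = np.random.rand(n)
coeffs = np.linalg.lstsq(basis, signal.astype(complex), rcond=None)[0]
print(np.allclose(basis @ coeffs, signal))   # perfect reconstruction from the basis
```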

Improved Runtime Bounds for the Univariate Marginal Distribution Algorithm via Anti-Concentration

Title Improved Runtime Bounds for the Univariate Marginal Distribution Algorithm via Anti-Concentration
Authors Per Kristian Lehre, Phan Trung Hai Nguyen
Abstract Unlike traditional evolutionary algorithms which produce offspring via genetic operators, Estimation of Distribution Algorithms (EDAs) sample solutions from probabilistic models which are learned from selected individuals. It is hoped that EDAs may improve optimisation performance on epistatic fitness landscapes by learning variable interactions. However, hardly any rigorous results are available to support claims about the performance of EDAs, even for fitness functions without epistasis. The expected runtime of the Univariate Marginal Distribution Algorithm (UMDA) on OneMax was recently shown to be in $\mathcal{O}\left(n\lambda\log \lambda\right)$ by Dang and Lehre (GECCO 2015). Later, Krejca and Witt (FOGA 2017) proved the lower bound $\Omega\left(\lambda\sqrt{n}+n\log n\right)$ via an involved drift analysis. We prove a $\mathcal{O}\left(n\lambda\right)$ bound, given some restrictions on the population size. This implies the tight bound $\Theta\left(n\log n\right)$ when $\lambda=\mathcal{O}\left(\log n\right)$, matching the runtime of classical EAs. Our analysis uses the level-based theorem and anti-concentration properties of the Poisson-Binomial distribution. We expect that these generic methods will facilitate further analysis of EDAs.
Tasks
Published 2018-02-02
URL http://arxiv.org/abs/1802.00721v1
PDF http://arxiv.org/pdf/1802.00721v1.pdf
PWC https://paperswithcode.com/paper/improved-runtime-bounds-for-the-univariate
Repo
Framework
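
For readers unfamiliar with the algorithm being analysed, here is a minimal UMDA run on OneMax: sample a population from a product distribution, keep the best mu individuals, and update the marginals with the usual 1/n and 1-1/n borders. Parameter values are illustrative and unrelated to the bounds proved in the paper.

```python
# Hedged sketch of the Univariate Marginal Distribution Algorithm on OneMax.
import numpy as np

def umda_onemax(n=50, lam=100, mu=50, generations=200, seed=0):
    rng = np.random.default_rng(seed)
    p = np.full(n, 0.5)                         # marginal probabilities
    for _ in range(generations):
        pop = (rng.random((lam, n)) < p).astype(int)
        fitness = pop.sum(axis=1)               # OneMax: number of ones
        best = pop[np.argsort(-fitness)[:mu]]   # truncation selection
        p = best.mean(axis=0)                   # learn new marginals
        p = np.clip(p, 1.0 / n, 1.0 - 1.0 / n)  # borders against premature fixation
        if fitness.max() == n:
            break
    return fitness.max()

print(umda_onemax())                            # best fitness found
```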

Baseline Detection in Historical Documents using Convolutional U-Nets

Title Baseline Detection in Historical Documents using Convolutional U-Nets
Authors Michael Fink, Thomas Layer, Georg Mackenbrock, Michael Sprinzl
Abstract Baseline detection is still a challenging task for heterogeneous collections of historical documents. We present a novel approach to baseline extraction in such settings, which turned out to be the winning entry to the ICDAR 2017 Competition on Baseline Detection (cBAD). It utilizes deep convolutional nets (CNNs) both for the actual extraction of baselines and for a simple form of layout analysis in a pre-processing step. To the best of our knowledge, it is the first CNN-based system for baseline extraction applying a U-Net architecture and sliding window detection, profiting from the high local accuracy of the extracted candidate lines. Final baseline post-processing complements our approach, compensating for inaccuracies mainly due to missing context information during sliding window detection. We experimentally evaluate the components of our system individually on the cBAD dataset. Moreover, we investigate how it generalizes to different data by means of the dataset used for the baseline extraction task of the ICDAR 2017 Competition on Layout Analysis for Challenging Medieval Manuscripts (HisDoc). A comparison with the results reported for HisDoc shows that it also outperforms the contestants of the latter.
Tasks Window Detection
Published 2018-10-22
URL http://arxiv.org/abs/1810.09343v1
PDF http://arxiv.org/pdf/1810.09343v1.pdf
PWC https://paperswithcode.com/paper/baseline-detection-in-historical-documents
Repo
Framework
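
As a rough illustration of the pixel-labelling backbone, here is a tiny U-Net that maps a document-image window to a per-pixel baseline probability map. Depth and channel counts are illustrative; the paper's actual network, sliding-window scheme, and post-processing are not reproduced.

```python
# Hedged sketch: a two-level U-Net producing a baseline probability map.
import torch
import torch.nn as nn

def block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(cout, cout, 3, padding=1), nn.ReLU())

class TinyUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.down1 = block(1, 16)
        self.down2 = block(16, 32)
        self.pool = nn.MaxPool2d(2)
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)
        self.dec = block(32, 16)                    # takes skip + upsampled features
        self.out = nn.Conv2d(16, 1, 1)

    def forward(self, x):                           # x: (B, 1, H, W) grayscale window
        d1 = self.down1(x)
        d2 = self.down2(self.pool(d1))
        u = self.up(d2)
        u = self.dec(torch.cat([u, d1], dim=1))     # skip connection
        return torch.sigmoid(self.out(u))           # per-pixel baseline probability

probs = TinyUNet()(torch.randn(1, 1, 256, 256))
print(probs.shape)                                  # torch.Size([1, 1, 256, 256])
```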

Introducing Curvature to the Label Space

Title Introducing Curvature to the Label Space
Authors Conor Sheehan, Ben Day, Pietro Liò
Abstract One-hot encoding is a labelling system that embeds classes as standard basis vectors in a label space. Despite seeing near-universal use in supervised categorical classification tasks, the scheme is problematic in its geometric implication that, as all classes are equally distant, all classes are equally different. This is inconsistent with most, if not all, real-world tasks due to the prevalence of ancestral and convergent relationships generating a varying degree of morphological similarity across classes. We address this issue by introducing curvature to the label-space using a metric tensor as a self-regulating method that better represents these relationships as a bolt-on, learning-algorithm agnostic solution. We propose both general constraints and specific statistical parameterizations of the metric and identify a direction for future research using autoencoder-based parameterizations.
Tasks
Published 2018-10-22
URL http://arxiv.org/abs/1810.09549v1
PDF http://arxiv.org/pdf/1810.09549v1.pdf
PWC https://paperswithcode.com/paper/introducing-curvature-to-the-label-space
Repo
Framework
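
The core idea can be written down directly: replace the implicit Euclidean (one-hot) distance between prediction and label with a quadratic form d(y, t) = (y - t)^T G (y - t), where the metric G encodes inter-class similarity. The fixed example metric below is an assumption; the paper constrains or parameterizes the metric rather than hand-picking it.

```python
# Hedged sketch: a metric-tensor loss on the label space instead of plain squared error.
import torch

def curved_label_loss(pred, target_onehot, G):
    diff = pred - target_onehot                  # (B, C)
    return torch.einsum('bi,ij,bj->b', diff, G, diff).mean()

C = 3
G = torch.eye(C)
G[0, 1] = G[1, 0] = 0.4      # positive off-diagonal: confusing classes 0 and 1 costs less
pred = torch.softmax(torch.randn(4, C), dim=1)
target = torch.eye(C)[torch.tensor([0, 1, 2, 0])]
print(curved_label_loss(pred, target, G))
```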

Temporal Bilinear Networks for Video Action Recognition

Title Temporal Bilinear Networks for Video Action Recognition
Authors Yanghao Li, Sijie Song, Yuqi Li, Jiaying Liu
Abstract Temporal modeling in videos is a fundamental yet challenging problem in computer vision. In this paper, we propose a novel Temporal Bilinear (TB) model to capture the temporal pairwise feature interactions between adjacent frames. Compared with some existing temporal methods which are limited to linear transformations, our TB model considers explicit quadratic bilinear transformations in the temporal domain for motion evolution and sequential relation modeling. We further leverage the factorized bilinear model in linear complexity and a bottleneck network design to build our TB blocks, which also constrains the parameters and computation cost. We consider two schemes in terms of the incorporation of TB blocks and the original 2D spatial convolutions, namely wide and deep Temporal Bilinear Networks (TBN). Finally, we perform experiments on several widely adopted datasets including Kinetics, UCF101 and HMDB51. The effectiveness of our TBNs is validated by comprehensive ablation analyses and comparisons with various state-of-the-art methods.
Tasks Temporal Action Localization
Published 2018-11-25
URL http://arxiv.org/abs/1811.09974v1
PDF http://arxiv.org/pdf/1811.09974v1.pdf
PWC https://paperswithcode.com/paper/temporal-bilinear-networks-for-video-action
Repo
Framework
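
A factorized temporal bilinear interaction can be sketched as follows: features of adjacent frames are projected into a shared factor space, multiplied element-wise, and projected back, approximating a quadratic interaction in linear complexity. The dimensions are illustrative, and the paper's full block and bottleneck design is not reproduced.

```python
# Hedged sketch: factorized bilinear interaction between adjacent frame features.
import torch
import torch.nn as nn

class FactorizedTemporalBilinear(nn.Module):
    def __init__(self, dim=64, factors=32):
        super().__init__()
        self.U = nn.Linear(dim, factors, bias=False)
        self.V = nn.Linear(dim, factors, bias=False)
        self.out = nn.Linear(factors, dim)

    def forward(self, x):                       # x: (B, T, dim) frame features
        cur, nxt = x[:, :-1], x[:, 1:]          # adjacent frame pairs
        inter = self.U(cur) * self.V(nxt)       # factorized bilinear term
        return self.out(inter)                  # (B, T-1, dim)

y = FactorizedTemporalBilinear()(torch.randn(2, 8, 64))
print(y.shape)                                  # torch.Size([2, 7, 64])
```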

Hybrid Temporal Situation Calculus

Title Hybrid Temporal Situation Calculus
Authors Vitaliy Batusov, Giuseppe De Giacomo, Mikhail Soutchanski
Abstract The ability to model continuous change in Reiter’s temporal situation calculus action theories has attracted a lot of interest. In this paper, we propose a new development of his approach, which is directly inspired by hybrid systems in control theory. Specifically, while keeping the foundations of Reiter’s axiomatization, we propose an elegant extension of his approach by adding a time argument to all fluents that represent continuous change. Thereby, we ensure that change can happen not only because of actions, but also due to the passage of time. We present a systematic methodology to derive, from simple premises, a new group of axioms which specify how continuous fluents change over time within a situation. We study regression for our new temporal basic action theories and demonstrate what reasoning problems can be solved. Finally, we formally show that our temporal basic action theories indeed capture hybrid automata.
Tasks
Published 2018-07-12
URL http://arxiv.org/abs/1807.04861v1
PDF http://arxiv.org/pdf/1807.04861v1.pdf
PWC https://paperswithcode.com/paper/hybrid-temporal-situation-calculus
Repo
Framework