Paper Group ANR 40
Automated Bridge Component Recognition using Video Data
Title | Automated Bridge Component Recognition using Video Data |
Authors | Yasutaka Narazaki, Vedhus Hoskere, Tu A. Hoang, Billie F. Spencer Jr |
Abstract | This paper investigates the automated recognition of structural bridge components using video data. Although understanding video data for structural inspections is straightforward for human inspectors, the implementation of the same task using machine learning methods has not been fully realized. In particular, single-frame image processing techniques, such as convolutional neural networks (CNNs), are not expected to identify structural components accurately when the image is a close-up view, lacking contextual information regarding where on the structure the image originates. Inspired by the significant progress in video processing techniques, this study investigates automated bridge component recognition using video data, where the information from the past frames is used to augment the understanding of the current frame. A new simulated video dataset is created to train the machine learning algorithms. Then, CNNs with recurrent architectures are designed and applied to implement the automated bridge component recognition task. Results are presented for simulated video data, as well as video collected in the field. |
Tasks | |
Published | 2018-06-18 |
URL | http://arxiv.org/abs/1806.06820v2 |
http://arxiv.org/pdf/1806.06820v2.pdf | |
PWC | https://paperswithcode.com/paper/automated-bridge-component-recognition-using |
Repo | |
Framework | |
On the Generation of Medical Question-Answer Pairs
Title | On the Generation of Medical Question-Answer Pairs |
Authors | Sheng Shen, Yaliang Li, Nan Du, Xian Wu, Yusheng Xie, Shen Ge, Tao Yang, Kai Wang, Xingzheng Liang, Wei Fan |
Abstract | Question answering (QA) has achieved promising progress recently. However, answering a question in real-world scenarios like the medical domain is still challenging, due to the requirement of external knowledge and the insufficient quantity of high-quality training data. In the light of these challenges, we study the task of generating medical QA pairs in this paper. With the insight that each medical question can be considered as a sample from the latent distribution of questions given answers, we propose an automated medical QA pair generation framework, consisting of an unsupervised key phrase detector that explores unstructured material for validity, and a generator that involves a multi-pass decoder to integrate structural knowledge for diversity. A series of experiments have been conducted on a real-world dataset collected from the National Medical Licensing Examination of China. Both automatic evaluation and human annotation demonstrate the effectiveness of the proposed method. Further investigation shows that, by incorporating the generated QA pairs for training, significant improvement in terms of accuracy can be achieved for the examination QA system. |
Tasks | Question Answering |
Published | 2018-11-01 |
URL | https://arxiv.org/abs/1811.00681v2 |
https://arxiv.org/pdf/1811.00681v2.pdf | |
PWC | https://paperswithcode.com/paper/on-the-generation-of-medical-question-answer |
Repo | |
Framework | |
Deep learning for word-level handwritten Indic script identification
Title | Deep learning for word-level handwritten Indic script identification |
Authors | Soumya Ukil, Swarnendu Ghosh, Sk Md Obaidullah, K. C. Santosh, Kaushik Roy, Nibaran Das |
Abstract | We propose a novel method that uses convolutional neural networks (CNNs) for feature extraction. Not limited to the conventional spatial-domain representation, we also use a multilevel 2D discrete Haar wavelet transform, where image representations are scaled to a variety of different sizes. These are then used to train different CNNs to select features. To be precise, we use 10 different CNNs that select a set of 10240 features, i.e. 1024 per CNN. With this, 11 different handwritten scripts are identified, where 1K words per script are used. In our test, we achieve a maximum script identification rate of 94.73% using a multi-layer perceptron (MLP). Our results outperform the state-of-the-art techniques. |
Tasks | |
Published | 2018-01-05 |
URL | http://arxiv.org/abs/1801.01627v1 |
http://arxiv.org/pdf/1801.01627v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-for-word-level-handwritten |
Repo | |
Framework | |
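The multilevel 2D Haar transform mentioned in the abstract above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names `haar_dwt2` and `multilevel_haar` are hypothetical, the input is assumed to have height and width divisible by 2 at every level, and which sub-bands are actually fed to the CNNs is not specified in the abstract.

```python
import numpy as np

def haar_dwt2(image):
    """One level of the orthonormal 2D Haar transform (even height and width assumed)."""
    a = image[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = image[0::2, 1::2]  # top-right
    c = image[1::2, 0::2]  # bottom-left
    d = image[1::2, 1::2]  # bottom-right
    approx = (a + b + c + d) / 2.0          # local average (half-resolution image)
    top_minus_bottom = (a + b - c - d) / 2.0  # change across rows
    left_minus_right = (a - b + c - d) / 2.0  # change across columns
    diagonal = (a - b - c + d) / 2.0          # diagonal change
    return approx, (top_minus_bottom, left_minus_right, diagonal)

def multilevel_haar(image, levels):
    """Return the approximation sub-band at each level, finest first."""
    approximations = []
    current = np.asarray(image, dtype=float)
    for _ in range(levels):
        current, _details = haar_dwt2(current)
        approximations.append(current)
    return approximations
```

Each level halves both image dimensions, which is one way to obtain the "variety of different sizes" that the abstract describes.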
Gaussian Process Landmarking on Manifolds
Title | Gaussian Process Landmarking on Manifolds |
Authors | Tingran Gao, Shahar Z. Kovalsky, Ingrid Daubechies |
Abstract | As a means of improving analysis of biological shapes, we propose an algorithm for sampling a Riemannian manifold by sequentially selecting points with maximum uncertainty under a Gaussian process model. This greedy strategy is known to be near-optimal in the experimental design literature, and appears to outperform the use of user-placed landmarks in representing the geometry of biological objects in our application. In the noiseless regime, we establish an upper bound for the mean squared prediction error (MSPE) in terms of the number of samples and geometric quantities of the manifold, demonstrating that the MSPE for our proposed sequential design decays at a rate comparable to the oracle rate achievable by any sequential or non-sequential optimal design; to our knowledge this is the first result of this type for sequential experimental design. The key is to link the greedy algorithm to reduced basis methods in the context of model reduction for partial differential equations. We expect this approach will find additional applications in other fields of research. |
Tasks | |
Published | 2018-02-09 |
URL | http://arxiv.org/abs/1802.03479v4 |
http://arxiv.org/pdf/1802.03479v4.pdf | |
PWC | https://paperswithcode.com/paper/gaussian-process-landmarking-on-manifolds |
Repo | |
Framework | |
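The greedy rule described in the abstract above (repeatedly pick the point whose Gaussian process posterior variance is largest) can be sketched on a finite candidate set. This is an illustrative sketch only: the function name `greedy_max_variance`, the `jitter` parameter, and the choice of kernel matrix are assumptions, and the paper works with kernels on a Riemannian manifold rather than an arbitrary matrix `K`.

```python
import numpy as np

def greedy_max_variance(K, n_landmarks, jitter=1e-10):
    """Sequentially select indices with maximal GP posterior variance.

    K is the (n, n) prior covariance matrix over candidate points.
    Observations are assumed noiseless; jitter only stabilises the solve.
    """
    K = np.asarray(K, dtype=float)
    prior_variance = np.diag(K).copy()
    variance = prior_variance.copy()
    selected = []
    for _ in range(n_landmarks):
        idx = int(np.argmax(variance))
        selected.append(idx)
        K_ss = K[np.ix_(selected, selected)] + jitter * np.eye(len(selected))
        K_xs = K[:, selected]
        # Posterior variance: k(x, x) - k(x, S) K_SS^{-1} k(S, x)
        solved = np.linalg.solve(K_ss, K_xs.T)
        variance = prior_variance - np.einsum("ij,ji->i", K_xs, solved)
        variance[selected] = -np.inf  # never pick the same point twice
    return selected
```

With a stationary kernel on an interval, this rule tends to pick the endpoints first and then fill in the gaps, which matches the intuition that uncertainty is largest far from existing landmarks.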
Person Search via A Mask-Guided Two-Stream CNN Model
Title | Person Search via A Mask-Guided Two-Stream CNN Model |
Authors | Di Chen, Shanshan Zhang, Wanli Ouyang, Jian Yang, Ying Tai |
Abstract | In this work, we tackle the problem of person search, which is a challenging task consisting of pedestrian detection and person re-identification (re-ID). Instead of sharing representations in a single joint model, we find that separating detector and re-ID feature extraction yields better performance. In order to extract more representative features for each identity, we segment out the foreground person from the original image patch. We propose a simple yet effective re-ID method, which models foreground person and original image patches individually, and obtains enriched representations from two separate CNN streams. From the experiments on two standard person search benchmarks, CUHK-SYSU and PRW, we achieve mAP of 83.0% and 32.6%, respectively, surpassing the state of the art by a large margin (more than 5pp). |
Tasks | Pedestrian Detection, Person Re-Identification, Person Search |
Published | 2018-07-21 |
URL | http://arxiv.org/abs/1807.08107v1 |
http://arxiv.org/pdf/1807.08107v1.pdf | |
PWC | https://paperswithcode.com/paper/person-search-via-a-mask-guided-two-stream |
Repo | |
Framework | |
From Thumbnails to Summaries - A single Deep Neural Network to Rule Them All
Title | From Thumbnails to Summaries - A single Deep Neural Network to Rule Them All |
Authors | Hongxiang Gu, Viswanathan Swaminathan |
Abstract | Video summaries come in many forms, from traditional single-image thumbnails, animated thumbnails, storyboards, to trailer-like video summaries. Content creators use the summaries to display the most attractive portion of their videos; the users use them to quickly evaluate if a video is worth watching. All forms of summaries are essential to video viewers, content creators, and advertisers. Often video content management systems have to generate multiple versions of summaries that vary in duration and presentational forms. We present a framework, ReconstSum, that utilizes an LSTM-based autoencoder architecture to extract and select a sparse subset of video frames or keyshots that optimally represent the input video in an unsupervised manner. The encoder selects a subset from the input video while the decoder seeks to reconstruct the video from the selection. The goal is to minimize the difference between the original input video and the reconstructed video. Our method is easily extendable to a variety of applications including static video thumbnails, animated thumbnails, storyboards and “trailer-like” highlights. We specifically study and evaluate the two most popular use cases: thumbnail generation and storyboard generation. We demonstrate that our methods generate better results than the state-of-the-art techniques in both use cases. |
Tasks | |
Published | 2018-08-01 |
URL | http://arxiv.org/abs/1808.00184v1 |
http://arxiv.org/pdf/1808.00184v1.pdf | |
PWC | https://paperswithcode.com/paper/from-thumbnails-to-summaries-a-single-deep |
Repo | |
Framework | |
Scrutinizing and De-Biasing Intuitive Physics with Neural Stethoscopes
Title | Scrutinizing and De-Biasing Intuitive Physics with Neural Stethoscopes |
Authors | Fabian B. Fuchs, Oliver Groth, Adam R. Kosiorek, Alex Bewley, Markus Wulfmeier, Andrea Vedaldi, Ingmar Posner |
Abstract | Visually predicting the stability of block towers is a popular task in the domain of intuitive physics. While previous work focusses on prediction accuracy, a one-dimensional performance measure, we provide a broader analysis of the learned physical understanding of the final model and how the learning process can be guided. To this end, we introduce neural stethoscopes as a general purpose framework for quantifying the degree of importance of specific factors of influence in deep neural networks as well as for actively promoting and suppressing information as appropriate. In doing so, we unify concepts from multitask learning as well as training with auxiliary and adversarial losses. We apply neural stethoscopes to analyse the state-of-the-art neural network for stability prediction. We show that the baseline model is susceptible to being misled by incorrect visual cues. This leads to a performance breakdown to the level of random guessing when training on scenarios where visual cues are inversely correlated with stability. Using stethoscopes to promote meaningful feature extraction increases performance from 51% to 90% prediction accuracy. Conversely, training on an easy dataset where visual cues are positively correlated with stability, the baseline model learns a bias leading to poor performance on a harder dataset. Using an adversarial stethoscope, the network is successfully de-biased, leading to a performance increase from 66% to 88%. |
Tasks | |
Published | 2018-06-14 |
URL | https://arxiv.org/abs/1806.05502v5 |
https://arxiv.org/pdf/1806.05502v5.pdf | |
PWC | https://paperswithcode.com/paper/neural-stethoscopes-unifying-analytic |
Repo | |
Framework | |
Sí o no, què penses? Catalonian Independence and Linguistic Identity on Social Media
Title | Sí o no, què penses? Catalonian Independence and Linguistic Identity on Social Media |
Authors | Ian Stewart, Yuval Pinter, Jacob Eisenstein |
Abstract | Political identity is often manifested in language variation, but the relationship between the two is still relatively unexplored from a quantitative perspective. This study examines the use of Catalan, a language local to the semi-autonomous region of Catalonia in Spain, on Twitter in discourse related to the 2017 independence referendum. We corroborate prior findings that pro-independence tweets are more likely to include the local language than anti-independence tweets. We also find that Catalan is used more often in referendum-related discourse than in other contexts, contrary to prior findings on language variation. This suggests a strong role for the Catalan language in the expression of Catalonian political identity. |
Tasks | |
Published | 2018-04-13 |
URL | http://arxiv.org/abs/1804.05088v1 |
http://arxiv.org/pdf/1804.05088v1.pdf | |
PWC | https://paperswithcode.com/paper/si-o-no-que-penses-catalonian-independence-1 |
Repo | |
Framework | |
SCH-GAN: Semi-supervised Cross-modal Hashing by Generative Adversarial Network
Title | SCH-GAN: Semi-supervised Cross-modal Hashing by Generative Adversarial Network |
Authors | Jian Zhang, Yuxin Peng, Mingkuan Yuan |
Abstract | Cross-modal hashing aims to map heterogeneous multimedia data into a common Hamming space, which can realize fast and flexible retrieval across different modalities. Supervised cross-modal hashing methods have achieved considerable progress by incorporating semantic side information. However, they have two main limitations: (1) They rely heavily on large-scale labeled cross-modal training data, which are labor intensive and hard to obtain. (2) They ignore the rich information contained in the large amount of unlabeled data across different modalities, especially the margin examples that are easily retrieved incorrectly, which can help to model the correlations. To address these problems, in this paper we propose a novel Semi-supervised Cross-Modal Hashing approach by Generative Adversarial Network (SCH-GAN). We aim to take advantage of GAN’s ability to model data distributions to promote cross-modal hashing learning in an adversarial way. The main contributions can be summarized as follows: (1) We propose a novel generative adversarial network for cross-modal hashing. In our proposed SCH-GAN, the generative model tries to select margin examples of one modality from unlabeled data when given a query of another modality, while the discriminative model tries to distinguish the selected examples from true positive examples of the query. These two models play a minimax game so that the generative model can promote the hashing performance of the discriminative model. (2) We propose a reinforcement learning based algorithm to drive the training of the proposed SCH-GAN. The generative model takes the correlation score predicted by the discriminative model as a reward, and tries to select the examples close to the margin to promote the discriminative model by maximizing the margin between positive and negative data. Experiments on 3 widely-used datasets verify the effectiveness of our proposed approach. |
Tasks | |
Published | 2018-02-07 |
URL | http://arxiv.org/abs/1802.02488v1 |
http://arxiv.org/pdf/1802.02488v1.pdf | |
PWC | https://paperswithcode.com/paper/sch-gan-semi-supervised-cross-modal-hashing |
Repo | |
Framework | |
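The retrieval step that a common Hamming space enables, as described in the abstract above, can be illustrated independently of how the codes are learned. The sketch below assumes binary codes are already available as 0/1 arrays. The function name `hamming_rank` is hypothetical, and nothing here reflects the SCH-GAN training procedure itself.

```python
import numpy as np

def hamming_rank(query_code, database_codes):
    """Rank database items by Hamming distance to a query code.

    query_code: 1D array of 0/1 bits, e.g. the hash of a text query.
    database_codes: 2D array of 0/1 bits, one row per item of the other modality.
    Returns (indices sorted by distance, the sorted distances).
    """
    query = np.asarray(query_code, dtype=np.uint8)
    database = np.asarray(database_codes, dtype=np.uint8)
    distances = np.count_nonzero(database != query, axis=1)
    order = np.argsort(distances, kind="stable")  # ties keep database order
    return order, distances[order]
```

Because the comparison is a bitwise mismatch count, ranking cost grows linearly with the number of stored codes and the code length, which is the source of the "fast retrieval" claim for hashing methods in general.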
Harmonic analysis on directed graphs and applications: from Fourier analysis to wavelets
Title | Harmonic analysis on directed graphs and applications: from Fourier analysis to wavelets |
Authors | Harry Sevi, Gabriel Rilling, Pierre Borgnat |
Abstract | We introduce a novel harmonic analysis for functions defined on the vertices of a strongly connected directed graph, with the random walk operator as its cornerstone. As a first step, we consider the set of eigenvectors of the random walk operator as a non-orthogonal Fourier-type basis for functions over directed graphs. We obtain a frequency interpretation by linking the variation of the eigenvectors of the random walk operator, measured by their Dirichlet energy, to the real part of their associated eigenvalues. From this Fourier basis, we can proceed further and build multi-scale analyses on directed graphs. We propose both a redundant wavelet transform and a decimated wavelet transform by extending the diffusion wavelets framework of Coifman and Maggioni to directed graphs. The development of our harmonic analysis on directed graphs thus leads us to consider both semi-supervised learning problems and signal modeling problems on directed graphs, highlighting the efficiency of our framework. |
Tasks | |
Published | 2018-11-28 |
URL | http://arxiv.org/abs/1811.11636v2 |
http://arxiv.org/pdf/1811.11636v2.pdf | |
PWC | https://paperswithcode.com/paper/harmonic-analysis-on-directed-graphs-and |
Repo | |
Framework | |
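The first step in the abstract above, using eigenvectors of the random walk operator as a non-orthogonal Fourier-type basis, can be sketched with NumPy. This is a minimal illustration under assumptions: the function names are hypothetical, the operator is assumed to be diagonalizable, and ordering the basis by decreasing real part of the eigenvalue is a choice made here for convenience, not necessarily the paper's convention.

```python
import numpy as np

def random_walk_operator(adjacency):
    """Row-normalise an adjacency matrix: P[i, j] = A[i, j] / out_degree(i)."""
    A = np.asarray(adjacency, dtype=float)
    return A / A.sum(axis=1, keepdims=True)

def fourier_basis(P):
    """Eigen-decomposition of P, ordered by decreasing real part of the eigenvalue."""
    eigenvalues, eigenvectors = np.linalg.eig(P)
    order = np.argsort(-eigenvalues.real)
    return eigenvalues[order], eigenvectors[:, order]

def analyse(signal, basis):
    """Coefficients of a graph signal in the (generally non-orthogonal) basis."""
    # The basis is not orthogonal, so solve a linear system instead of projecting.
    return np.linalg.solve(basis, np.asarray(signal, dtype=complex))

def synthesise(coefficients, basis):
    """Rebuild the graph signal from its coefficients."""
    return basis @ coefficients
```

On a strongly connected graph the eigenvalue 1 belongs to the constant vector, so the first basis element under this ordering has no variation across edges. Eigenvalues and coefficients are complex in general because the operator is not symmetric.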
Improved Runtime Bounds for the Univariate Marginal Distribution Algorithm via Anti-Concentration
Title | Improved Runtime Bounds for the Univariate Marginal Distribution Algorithm via Anti-Concentration |
Authors | Per Kristian Lehre, Phan Trung Hai Nguyen |
Abstract | Unlike traditional evolutionary algorithms which produce offspring via genetic operators, Estimation of Distribution Algorithms (EDAs) sample solutions from probabilistic models which are learned from selected individuals. It is hoped that EDAs may improve optimisation performance on epistatic fitness landscapes by learning variable interactions. However, hardly any rigorous results are available to support claims about the performance of EDAs, even for fitness functions without epistasis. The expected runtime of the Univariate Marginal Distribution Algorithm (UMDA) on OneMax was recently shown to be in $\mathcal{O}\left(n\lambda\log \lambda\right)$ by Dang and Lehre (GECCO 2015). Later, Krejca and Witt (FOGA 2017) proved the lower bound $\Omega\left(\lambda\sqrt{n}+n\log n\right)$ via an involved drift analysis. We prove a $\mathcal{O}\left(n\lambda\right)$ bound, given some restrictions on the population size. This implies the tight bound $\Theta\left(n\log n\right)$ when $\lambda=\mathcal{O}\left(\log n\right)$, matching the runtime of classical EAs. Our analysis uses the level-based theorem and anti-concentration properties of the Poisson-Binomial distribution. We expect that these generic methods will facilitate further analysis of EDAs. |
Tasks | |
Published | 2018-02-02 |
URL | http://arxiv.org/abs/1802.00721v1 |
http://arxiv.org/pdf/1802.00721v1.pdf | |
PWC | https://paperswithcode.com/paper/improved-runtime-bounds-for-the-univariate |
Repo | |
Framework | |
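For readers unfamiliar with the algorithm analysed in the abstract above, the following is a minimal sketch of UMDA on OneMax. It is not the authors' code. The function name `umda_onemax`, the default parameters, and the use of margins `1/n` and `1 - 1/n` on the marginal probabilities are assumptions made for illustration, and the sketch returns the number of fitness evaluations used.

```python
import numpy as np

def umda_onemax(n, lam, mu, max_generations=1000, seed=0):
    """Run UMDA on OneMax and return the fitness evaluations used, or None.

    n: bit-string length, lam: offspring per generation, mu: selected parents.
    """
    rng = np.random.default_rng(seed)
    p = np.full(n, 0.5)                    # one marginal probability per bit
    lower, upper = 1.0 / n, 1.0 - 1.0 / n  # margins keep every bit value reachable
    for generation in range(1, max_generations + 1):
        # Sample lam individuals independently from the product distribution.
        population = (rng.random((lam, n)) < p).astype(int)
        fitness = population.sum(axis=1)   # OneMax counts the one-bits
        if fitness.max() == n:
            return generation * lam
        # Truncation selection: keep the mu fittest individuals.
        best = population[np.argsort(-fitness)[:mu]]
        # Re-estimate marginals from the selected individuals.
        p = np.clip(best.mean(axis=0), lower, upper)
    return None
```

The model stores only `n` independent marginals, which is why the algorithm is called univariate: it cannot represent interactions between bit positions.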
Baseline Detection in Historical Documents using Convolutional U-Nets
Title | Baseline Detection in Historical Documents using Convolutional U-Nets |
Authors | Michael Fink, Thomas Layer, Georg Mackenbrock, Michael Sprinzl |
Abstract | Baseline detection is still a challenging task for heterogeneous collections of historical documents. We present a novel approach to baseline extraction in such settings, which turned out to be the winning entry of the ICDAR 2017 Competition on Baseline detection (cBAD). It utilizes deep convolutional nets (CNNs) both for the actual extraction of baselines and for a simple form of layout analysis in a pre-processing step. To the best of our knowledge it is the first CNN-based system for baseline extraction applying a U-net architecture and sliding window detection, profiting from a high local accuracy of the candidate lines extracted. Final baseline post-processing complements our approach, compensating for inaccuracies mainly due to missing context information during sliding window detection. We experimentally evaluate the components of our system individually on the cBAD dataset. Moreover, we investigate how it generalizes to different data by means of the dataset used for the baseline extraction task of the ICDAR 2017 Competition on Layout Analysis for Challenging Medieval Manuscripts (HisDoc). A comparison with the results reported for HisDoc shows that it also outperforms the contestants of the latter. |
Tasks | Window Detection |
Published | 2018-10-22 |
URL | http://arxiv.org/abs/1810.09343v1 |
http://arxiv.org/pdf/1810.09343v1.pdf | |
PWC | https://paperswithcode.com/paper/baseline-detection-in-historical-documents |
Repo | |
Framework | |
Introducing Curvature to the Label Space
Title | Introducing Curvature to the Label Space |
Authors | Conor Sheehan, Ben Day, Pietro Liò |
Abstract | One-hot encoding is a labelling system that embeds classes as standard basis vectors in a label space. Despite seeing near-universal use in supervised categorical classification tasks, the scheme is problematic in its geometric implication that, as all classes are equally distant, all classes are equally different. This is inconsistent with most, if not all, real-world tasks due to the prevalence of ancestral and convergent relationships generating a varying degree of morphological similarity across classes. We address this issue by introducing curvature to the label-space using a metric tensor as a self-regulating method that better represents these relationships as a bolt-on, learning-algorithm agnostic solution. We propose both general constraints and specific statistical parameterizations of the metric and identify a direction for future research using autoencoder-based parameterizations. |
Tasks | |
Published | 2018-10-22 |
URL | http://arxiv.org/abs/1810.09549v1 |
http://arxiv.org/pdf/1810.09549v1.pdf | |
PWC | https://paperswithcode.com/paper/introducing-curvature-to-the-label-space |
Repo | |
Framework | |
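The geometric point in the abstract above, that one-hot labels make all classes equally distant, can be checked numerically, along with the effect of measuring distance under a non-identity metric. This is a sketch under assumptions: the function name `pairwise_distances` and the constant matrix `G` are hypothetical illustrations, not the parameterizations proposed in the paper.

```python
import numpy as np

def pairwise_distances(labels, metric=None):
    """Distances d(u, v) = sqrt((u - v)^T G (u - v)) between label vectors.

    metric is a symmetric positive-definite matrix G; identity if omitted.
    """
    labels = np.asarray(labels, dtype=float)
    G = np.eye(labels.shape[1]) if metric is None else np.asarray(metric, dtype=float)
    diff = labels[:, None, :] - labels[None, :, :]
    squared = np.einsum("ijk,kl,ijl->ij", diff, G, diff)
    return np.sqrt(np.maximum(squared, 0.0))

one_hot = np.eye(3)
euclidean = pairwise_distances(one_hot)  # every off-diagonal entry is sqrt(2)

# Hypothetical metric stating that classes 0 and 1 are similar to each other.
G = np.array([[1.0, 0.8, 0.0],
              [0.8, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
curved = pairwise_distances(one_hot, G)  # d(0, 1) = sqrt(0.4), d(0, 2) = sqrt(2)
```

Under the identity metric no pair of classes is closer than any other, whereas the off-diagonal entry in `G` pulls classes 0 and 1 together while leaving class 2 equally far from both.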
Temporal Bilinear Networks for Video Action Recognition
Title | Temporal Bilinear Networks for Video Action Recognition |
Authors | Yanghao Li, Sijie Song, Yuqi Li, Jiaying Liu |
Abstract | Temporal modeling in videos is a fundamental yet challenging problem in computer vision. In this paper, we propose a novel Temporal Bilinear (TB) model to capture the temporal pairwise feature interactions between adjacent frames. Compared with some existing temporal methods, which are limited to linear transformations, our TB model considers explicit quadratic bilinear transformations in the temporal domain for motion evolution and sequential relation modeling. We further leverage the factorized bilinear model in linear complexity and a bottleneck network design to build our TB blocks, which also constrains the parameters and computation cost. We consider two schemes for combining TB blocks with the original 2D spatial convolutions, namely wide and deep Temporal Bilinear Networks (TBN). Finally, we perform experiments on several widely adopted datasets including Kinetics, UCF101 and HMDB51. The effectiveness of our TBNs is validated by comprehensive ablation analyses and comparisons with various state-of-the-art methods. |
Tasks | Temporal Action Localization |
Published | 2018-11-25 |
URL | http://arxiv.org/abs/1811.09974v1 |
http://arxiv.org/pdf/1811.09974v1.pdf | |
PWC | https://paperswithcode.com/paper/temporal-bilinear-networks-for-video-action |
Repo | |
Framework | |
Hybrid Temporal Situation Calculus
Title | Hybrid Temporal Situation Calculus |
Authors | Vitaliy Batusov, Giuseppe De Giacomo, Mikhail Soutchanski |
Abstract | The ability to model continuous change in Reiter’s temporal situation calculus action theories has attracted a lot of interest. In this paper, we propose a new development of his approach, which is directly inspired by hybrid systems in control theory. Specifically, while keeping the foundations of Reiter’s axiomatization, we propose an elegant extension of his approach by adding a time argument to all fluents that represent continuous change. Thereby, we ensure that change can happen not only because of actions, but also due to the passage of time. We present a systematic methodology to derive, from simple premises, a new group of axioms which specify how continuous fluents change over time within a situation. We study regression for our new temporal basic action theories and demonstrate what reasoning problems can be solved. Finally, we formally show that our temporal basic action theories indeed capture hybrid automata. |
Tasks | |
Published | 2018-07-12 |
URL | http://arxiv.org/abs/1807.04861v1 |
http://arxiv.org/pdf/1807.04861v1.pdf | |
PWC | https://paperswithcode.com/paper/hybrid-temporal-situation-calculus |
Repo | |
Framework | |