Paper Group ANR 507
An Attention-Based Speaker Naming Method for Online Adaptation in Non-Fixed Scenarios. Estimating Feature-Label Dependence Using Gini Distance Statistics. A Graph-based Ranking Approach to Extract Key-frames for Static Video Summarization. Comprehensive Video Understanding: Video summarization with content-based video recommender design. AFP-Net: Realtime Anchor-Free Polyp Detection in Colonoscopy. An Efficient and Layout-Independent Automatic License Plate Recognition System Based on the YOLO detector. License Plate Recognition with Compressive Sensing Based Feature Extraction. TruNet: Short Videos Generation from Long Videos via Story-Preserving Truncation. Video Summarization using Keyframe Extraction and Video Skimming. Spectrogram Feature Losses for Music Source Separation. Bayes metaclassifier and Soft-confusion-matrix classifier in the task of multi-label classification. Infusing domain knowledge in AI-based "black box" models for better explainability with application in bankruptcy prediction. SuperCaptioning: Image Captioning Using Two-dimensional Word Embedding. Patient-specific Conditional Joint Models of Shape, Image Features and Clinical Indicators. Toybox: A Suite of Environments for Experimental Evaluation of Deep Reinforcement Learning.
An Attention-Based Speaker Naming Method for Online Adaptation in Non-Fixed Scenarios
Title | An Attention-Based Speaker Naming Method for Online Adaptation in Non-Fixed Scenarios |
Authors | Jungwoo Pyo, Joohyun Lee, Youngjune Park, Tien-Cuong Bui, Sang Kyun Cha |
Abstract | A speaker naming task, which finds and identifies the active speaker in a given movie or drama scene, is crucial for high-level video analysis applications such as automatic subtitle labeling and video summarization. Modern approaches have usually exploited biometric features with a gradient-based method instead of rule-based algorithms. In certain situations, however, a naive gradient-based method does not work efficiently. For example, when new characters are added to the target identification list, the neural network needs to be retrained frequently to identify the new people, which causes delays in model preparation. In this paper, we present an attention-based method which reduces the model setup time by incorporating newly added data via online adaptation without a gradient update process. We comparatively analyzed the attention-based method and existing gradient-based methods using three evaluation metrics (accuracy, memory usage, setup time) under various controlled settings of speaker naming. We also applied existing speaker naming models and the attention-based model to real video to show that our approach achieves accuracy comparable to existing state-of-the-art models, and even higher accuracy in some cases. |
Tasks | Video Summarization |
Published | 2019-12-02 |
URL | https://arxiv.org/abs/1912.00649v1 |
PDF | https://arxiv.org/pdf/1912.00649v1.pdf |
PWC | https://paperswithcode.com/paper/an-attention-based-speaker-naming-method-for |
Repo | |
Framework | |
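
The abstract above describes replacing gradient-based retraining with an attention lookup over stored reference data, so new characters can be added by appending entries rather than retraining. The sketch below is a minimal, hypothetical illustration of that idea only; the paper's actual feature extractor and attention formulation are not specified in the abstract. Softmax attention over enrolled speaker embeddings acts as a nearest-reference classifier.

```python
import numpy as np

class AttentionSpeakerNamer:
    """Hypothetical sketch: name speakers by attention over enrolled embeddings.

    Adding a new character only appends (embedding, name) pairs -- no gradient
    update is needed, which is the motivation described in the abstract.
    """

    def __init__(self, temperature: float = 0.1):
        self.keys = []    # enrolled speaker embeddings (unit-normalized)
        self.names = []   # corresponding speaker names
        self.temperature = temperature

    def enroll(self, embedding: np.ndarray, name: str) -> None:
        self.keys.append(embedding / np.linalg.norm(embedding))
        self.names.append(name)

    def identify(self, query: np.ndarray) -> str:
        q = query / np.linalg.norm(query)
        keys = np.stack(self.keys)                    # (N, d)
        scores = keys @ q / self.temperature          # scaled cosine similarities
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()                      # softmax attention weights
        # Aggregate attention mass per name and return the most attended speaker.
        mass = {}
        for w, name in zip(weights, self.names):
            mass[name] = mass.get(name, 0.0) + w
        return max(mass, key=mass.get)

# Usage: enroll face/voice embeddings for known characters, then query a scene.
namer = AttentionSpeakerNamer()
namer.enroll(np.random.randn(128), "Alice")
namer.enroll(np.random.randn(128), "Bob")
print(namer.identify(np.random.randn(128)))
```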
Estimating Feature-Label Dependence Using Gini Distance Statistics
Title | Estimating Feature-Label Dependence Using Gini Distance Statistics |
Authors | Silu Zhang, Xin Dang, Dao Nguyen, Dawn Wilkins, Yixin Chen |
Abstract | Identifying statistical dependence between the features and the label is a fundamental problem in supervised learning. This paper presents a framework for estimating dependence between numerical features and a categorical label using the generalized Gini distance, an energy distance in reproducing kernel Hilbert spaces (RKHS). Two Gini distance based dependence measures are explored: Gini distance covariance and Gini distance correlation. Unlike Pearson covariance and correlation, which do not characterize independence, the above Gini distance based measures define dependence as well as independence of random variables. The test statistics are simple to calculate and do not require probability density estimation. Uniform convergence bounds and asymptotic bounds are derived for the test statistics. Comparisons with distance covariance statistics are provided. It is shown that Gini distance statistics converge faster than distance covariance statistics in the uniform convergence bounds, and hence yield tighter upper bounds on both Type I and Type II errors. Moreover, the probability of the Gini distance covariance statistic under-performing the distance covariance statistic in Type II error decreases to 0 exponentially as the sample size increases. Extensive experimental results are presented to demonstrate the performance of the proposed method. |
Tasks | Density Estimation |
Published | 2019-06-05 |
URL | https://arxiv.org/abs/1906.02171v1 |
PDF | https://arxiv.org/pdf/1906.02171v1.pdf |
PWC | https://paperswithcode.com/paper/estimating-feature-label-dependence-using |
Repo | |
Framework | |
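
As a rough companion to the abstract above, the snippet below computes a plug-in estimate of a Gini-distance-style dependence measure between numerical features and a categorical label: the class-probability-weighted gap between the overall mean pairwise distance and the within-class mean pairwise distances, normalized by the overall mean distance. This assumed form and its Euclidean (non-kernelized) setting are illustrative; the paper's exact estimator and its RKHS generalization may differ.

```python
import numpy as np
from scipy.spatial.distance import pdist

def gini_distance_stats(X: np.ndarray, y: np.ndarray):
    """Plug-in estimates of Gini-distance covariance/correlation style measures.

    delta   : mean pairwise Euclidean distance over all samples
    delta_k : mean pairwise distance within class k
    gcov    : sum_k p_k * (delta - delta_k)   (assumed form; see lead-in)
    gcor    : gcov / delta
    """
    delta = pdist(X).mean()
    gcov, n = 0.0, len(y)
    for k in np.unique(y):
        Xk = X[y == k]
        p_k = len(Xk) / n
        delta_k = pdist(Xk).mean() if len(Xk) > 1 else 0.0
        gcov += p_k * (delta - delta_k)
    return gcov, gcov / delta

# Example: a feature that separates the two classes yields a larger statistic.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 100)
X = rng.normal(loc=y[:, None] * 3.0, size=(200, 2))
print(gini_distance_stats(X, y))
```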
A Graph-based Ranking Approach to Extract Key-frames for Static Video Summarization
Title | A Graph-based Ranking Approach to Extract Key-frames for Static Video Summarization |
Authors | Saikat Chakraborty |
Abstract | Video abstraction has become one of the efficient approaches to grasp the content of a video without watching it entirely. Key frame-based static video summarization falls under this category. In this paper, we propose a graph-based approach which summarizes the video with the best user satisfaction. We treated each video frame as a node of the graph and assigned a rank to each node using our proposed VidRank algorithm. We developed three different models of the VidRank algorithm and performed a comparative study on those models. A comprehensive evaluation on 50 videos from the Open Video database using objective and semi-objective measures indicates the superiority of our static video summary generation method. |
Tasks | Video Summarization |
Published | 2019-11-29 |
URL | https://arxiv.org/abs/1911.13279v1 |
PDF | https://arxiv.org/pdf/1911.13279v1.pdf |
PWC | https://paperswithcode.com/paper/a-graph-based-ranking-approach-to-extract-key |
Repo | |
Framework | |
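
The entry above treats frames as graph nodes and ranks them with the proposed VidRank algorithm, whose details are not given in the abstract. A generic stand-in for that idea is PageRank over a frame-similarity graph, sketched below under that assumption: frames that are central in the similarity graph become keyframe candidates.

```python
import numpy as np
import networkx as nx

def rank_keyframes(features: np.ndarray, top_k: int = 5, sim_threshold: float = 0.5):
    """Generic graph-based keyframe ranking (a PageRank stand-in, not VidRank)."""
    # Cosine similarity between frame feature vectors.
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = normed @ normed.T
    g = nx.Graph()
    n = len(features)
    g.add_nodes_from(range(n))
    for i in range(n):
        for j in range(i + 1, n):
            if sim[i, j] > sim_threshold:
                g.add_edge(i, j, weight=float(sim[i, j]))
    scores = nx.pagerank(g, weight="weight")
    # Highest-ranked frame indices form the static summary.
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

frames = np.random.rand(50, 256)        # stand-in for per-frame CNN features
print(rank_keyframes(frames))
```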
Comprehensive Video Understanding: Video summarization with content-based video recommender design
Title | Comprehensive Video Understanding: Video summarization with content-based video recommender design |
Authors | Yudong Jiang, Kaixu Cui, Bo Peng, Changliang Xu |
Abstract | Video summarization aims to extract keyframes/shots from a long video. Previous methods mainly take the diversity and representativeness of the generated summaries as prior knowledge in algorithm design. In this paper, we formulate video summarization as a content-based recommender problem, which should distill the most useful content from a long video for users who suffer from information overload. A scalable deep neural network is proposed to predict whether a video segment is useful for users by explicitly modelling both the segment and the whole video. Moreover, we perform scene and action recognition in untrimmed videos in order to find more correlations among different aspects of video understanding tasks. We also discuss the effect of audio and visual features in the summarization task, and extend our work with data augmentation and multi-task learning to prevent the model from early-stage overfitting. Our final model won first place in the ICCV 2019 CoView Workshop Challenge Track. |
Tasks | Data Augmentation, Multi-Task Learning, Video Summarization, Video Understanding |
Published | 2019-10-30 |
URL | https://arxiv.org/abs/1910.13888v1 |
PDF | https://arxiv.org/pdf/1910.13888v1.pdf |
PWC | https://paperswithcode.com/paper/comprehensive-video-understanding-video |
Repo | |
Framework | |
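
The recommender formulation above scores each segment's usefulness by modelling both the segment and the full video. A minimal PyTorch sketch of such a scorer is shown below; the architecture, feature dimensions, and fusion scheme are assumptions for illustration, not the authors' network.

```python
import torch
import torch.nn as nn

class SegmentUsefulnessScorer(nn.Module):
    """Hypothetical sketch: score a segment given segment + whole-video features."""

    def __init__(self, seg_dim: int = 512, vid_dim: int = 512, hidden: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(seg_dim + vid_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, seg_feat: torch.Tensor, vid_feat: torch.Tensor) -> torch.Tensor:
        # Explicitly condition the segment score on a global video representation.
        return torch.sigmoid(self.mlp(torch.cat([seg_feat, vid_feat], dim=-1)))

scorer = SegmentUsefulnessScorer()
seg = torch.randn(8, 512)                      # 8 candidate segments
vid = torch.randn(1, 512).expand(8, 512)       # shared video-level feature
print(scorer(seg, vid).squeeze(-1))            # usefulness score in [0, 1] per segment
```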
AFP-Net: Realtime Anchor-Free Polyp Detection in Colonoscopy
Title | AFP-Net: Realtime Anchor-Free Polyp Detection in Colonoscopy |
Authors | Dechun Wang, Ning Zhang, Xinzi Sun, Pengfei Zhang, Chenxi Zhang, Yu Cao, Benyuan Liu |
Abstract | Colorectal cancer (CRC) is a common and lethal disease. Globally, CRC is the third most commonly diagnosed cancer in males and the second in females. For colorectal cancer, the best available screening test is the colonoscopy. During a colonoscopic procedure, a tiny camera at the tip of the endoscope generates a video of the internal mucosa of the colon. The video data are displayed on a monitor for the physician to examine the lining of the entire colon and check for colorectal polyps. Detection and removal of colorectal polyps are associated with a reduction in mortality from colorectal cancer. However, the miss rate of polyp detection during a colonoscopy procedure is often high, even for very experienced physicians. The reason lies in the high variation of polyps in terms of shape, size, texture, color and illumination. Though challenging, with the great advances in object detection techniques, automated polyp detection shows great potential in reducing the false negative rate while maintaining high precision. In this paper, we propose a novel anchor-free polyp detector that can localize polyps without using predefined anchor boxes. To further strengthen the model, we leverage a Context Enhancement Module and Cosine Ground truth Projection. Our approach can respond in real time while achieving state-of-the-art performance with 99.36% precision and 96.44% recall. |
Tasks | Object Detection |
Published | 2019-09-05 |
URL | https://arxiv.org/abs/1909.02477v3 |
PDF | https://arxiv.org/pdf/1909.02477v3.pdf |
PWC | https://paperswithcode.com/paper/afp-net-realtime-anchor-free-polyp-detection |
Repo | |
Framework | |
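
The abstract describes an anchor-free detector that localizes polyps without predefined anchor boxes; the Context Enhancement Module and Cosine Ground truth Projection are specific to the paper and not reproduced here. The snippet below only illustrates the generic anchor-free idea of regressing, at each feature-map location, the distances to the four box sides instead of anchor offsets, and decoding them back to image-space boxes.

```python
import torch

def decode_anchor_free(ltrb: torch.Tensor, stride: int = 8) -> torch.Tensor:
    """Decode per-location (left, top, right, bottom) distances into boxes.

    ltrb: (H, W, 4) non-negative distances predicted at each feature-map cell.
    Returns (H*W, 4) boxes in image coordinates -- a generic anchor-free
    decoding step, not AFP-Net's exact head.
    """
    h, w, _ = ltrb.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    # Centers of feature-map cells mapped back to image coordinates.
    cx = (xs.float() + 0.5) * stride
    cy = (ys.float() + 0.5) * stride
    boxes = torch.stack(
        [cx - ltrb[..., 0], cy - ltrb[..., 1], cx + ltrb[..., 2], cy + ltrb[..., 3]],
        dim=-1,
    )
    return boxes.reshape(-1, 4)

print(decode_anchor_free(torch.rand(4, 4, 4) * 32).shape)  # torch.Size([16, 4])
```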
An Efficient and Layout-Independent Automatic License Plate Recognition System Based on the YOLO detector
Title | An Efficient and Layout-Independent Automatic License Plate Recognition System Based on the YOLO detector |
Authors | Rayson Laroca, Luiz A. Zanlorensi, Gabriel R. Gonçalves, Eduardo Todt, William Robson Schwartz, David Menotti |
Abstract | In this paper, we present an efficient and layout-independent Automatic License Plate Recognition (ALPR) system based on the state-of-the-art YOLO object detector that contains a unified approach for license plate (LP) detection and layout classification to improve the recognition results using post-processing rules. The system is conceived by evaluating and optimizing different models with various modifications, aiming at achieving the best speed/accuracy trade-off at each stage. The networks are trained using images from several datasets, with the addition of various data augmentation techniques, so that they are robust under different conditions. The proposed system achieved an average end-to-end recognition rate of 96.8% across eight public datasets (from five different regions) used in the experiments, outperforming both previous works and commercial systems in the ChineseLP, OpenALPR-EU, SSIG-SegPlate and UFPR-ALPR datasets. In the other datasets, the proposed approach achieved competitive results to those attained by the baselines. Our system also achieved impressive frames per second (FPS) rates on a high-end GPU, being able to perform in real time even when there are four vehicles in the scene. An additional contribution is that we manually labeled 38,334 bounding boxes on 6,237 images from public datasets and made the annotations publicly available to the research community. |
Tasks | Data Augmentation, License Plate Recognition |
Published | 2019-09-04 |
URL | https://arxiv.org/abs/1909.01754v2 |
PDF | https://arxiv.org/pdf/1909.01754v2.pdf |
PWC | https://paperswithcode.com/paper/an-efficient-and-layout-independent-automatic |
Repo | |
Framework | |
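
One concrete element of the pipeline above is that the predicted license-plate layout class drives post-processing rules applied to the recognized character string. The sketch below illustrates that idea with hypothetical layouts and swap rules (fixing easily confused characters based on the expected letter/digit positions); the actual layout classes and rules used by the authors are not taken from the paper.

```python
# Hypothetical layout-driven post-processing: swap commonly confused characters
# depending on whether a position is expected to hold a letter or a digit.
TO_DIGIT = {"O": "0", "I": "1", "B": "8", "S": "5"}
TO_LETTER = {v: k for k, v in TO_DIGIT.items()}

# Assumed position patterns per layout class ('L' = letter, 'D' = digit).
LAYOUT_PATTERNS = {
    "brazilian": "LLLDDDD",
    "european":  "LLDDLLL",
}

def postprocess_plate(chars: str, layout: str) -> str:
    pattern = LAYOUT_PATTERNS.get(layout)
    if pattern is None or len(chars) != len(pattern):
        return chars  # unknown layout or length mismatch: leave OCR output untouched
    fixed = []
    for c, slot in zip(chars, pattern):
        if slot == "D" and not c.isdigit():
            c = TO_DIGIT.get(c, c)
        elif slot == "L" and c.isdigit():
            c = TO_LETTER.get(c, c)
        fixed.append(c)
    return "".join(fixed)

print(postprocess_plate("AB1234S", "brazilian"))  # -> 'ABI2345'
```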
License Plate Recognition with Compressive Sensing Based Feature Extraction
Title | License Plate Recognition with Compressive Sensing Based Feature Extraction |
Authors | Andrej Jokic, Nikola Vukovic |
Abstract | License plate recognition is a key component of many automatic traffic control systems. It enables the automatic identification of vehicles in many applications. Such systems must be able to identify vehicles from images taken in various conditions, including low light, rain, snow, etc. In order to reduce the complexity and cost of the hardware required for such devices, the algorithm should be as efficient as possible. This paper proposes a license plate recognition system which uses a new approach based on compressive sensing techniques for dimensionality reduction and feature extraction. Dimensionality reduction enables precise classification with less training data while demanding less computational power. Based on the extracted features, character recognition and classification is done by a Support Vector Machine classifier. |
Tasks | Compressive Sensing, Dimensionality Reduction, License Plate Recognition |
Published | 2019-02-07 |
URL | http://arxiv.org/abs/1902.05386v1 |
PDF | http://arxiv.org/pdf/1902.05386v1.pdf |
PWC | https://paperswithcode.com/paper/license-plate-recognition-with-compressive |
Repo | |
Framework | |
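
The pipeline described above, compressive-sensing-style projections for dimensionality reduction followed by an SVM for character classification, maps directly onto standard tooling. The sketch below uses a Gaussian random projection as the measurement matrix and scikit-learn's digits dataset as a stand-in for character crops; the projection size and SVM settings are illustrative assumptions.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.random_projection import GaussianRandomProjection
from sklearn.svm import SVC

# Stand-in character images (8x8 digits) for license-plate character crops.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The random projection plays the role of the compressive-sensing measurement
# matrix, reducing dimensionality before the SVM classifier.
model = make_pipeline(GaussianRandomProjection(n_components=20, random_state=0),
                      SVC(kernel="rbf", C=10.0))
model.fit(X_train, y_train)
print("accuracy:", model.score(X_test, y_test))
```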
TruNet: Short Videos Generation from Long Videos via Story-Preserving Truncation
Title | TruNet: Short Videos Generation from Long Videos via Story-Preserving Truncation |
Authors | Fan Yang, Xiao Liu, Dongliang He, Chuang Gan, Jian Wang, Chao Li, Fu Li, Shilei Wen |
Abstract | In this work, we introduce a new problem, named *story-preserving long video truncation*, which requires an algorithm to automatically truncate a long-duration video into multiple short and attractive sub-videos, each containing an unbroken story. This differs from traditional video highlight detection or video summarization problems in that each sub-video is required to maintain a coherent and integral story, which is becoming particularly important for resource-production video sharing platforms such as YouTube, Facebook, TikTok, Kwai, etc. To address the problem, we collect and annotate a new large video truncation dataset, named TruNet, which contains 1470 videos with on average 11 short stories per video. With the new dataset, we further develop and train a neural architecture for video truncation that consists of two components: a Boundary Aware Network (BAN) and a Fast-Forward Long Short-Term Memory (FF-LSTM). We first use the BAN to generate high-quality temporal proposals by jointly considering frame-level attractiveness and boundaryness. We then apply the FF-LSTM, which tends to capture high-order dependencies among a sequence of frames, to decide whether a temporal proposal is a coherent and integral story. We show that our proposed framework outperforms existing approaches for the story-preserving long video truncation problem in both quantitative measures and user studies. The dataset is available for public academic research usage at https://ai.baidu.com/broad/download. |
Tasks | Video Summarization |
Published | 2019-10-14 |
URL | https://arxiv.org/abs/1910.05899v1 |
PDF | https://arxiv.org/pdf/1910.05899v1.pdf |
PWC | https://paperswithcode.com/paper/trunet-short-videos-generation-from-long |
Repo | |
Framework | |
Video Summarization using Keyframe Extraction and Video Skimming
Title | Video Summarization using Keyframe Extraction and Video Skimming |
Authors | Shruti Jadon, Mahmood Jasim |
Abstract | Video is one of the most robust sources of information, and the consumption of online and offline videos has reached an unprecedented level in the last few years. A fundamental challenge of extracting information from videos is that a viewer has to go through the complete video to understand the context, as opposed to an image, where the viewer can extract information from a single frame. In this work, we employ different algorithmic methodologies, including local features and deep neural networks, along with multiple clustering methods, to find an effective way of summarizing a video by extracting interesting keyframes. |
Tasks | Video Summarization |
Published | 2019-10-10 |
URL | https://arxiv.org/abs/1910.04792v1 |
PDF | https://arxiv.org/pdf/1910.04792v1.pdf |
PWC | https://paperswithcode.com/paper/video-summarization-using-keyframe-extraction |
Repo | |
Framework | |
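
A common version of the clustering route mentioned above is to cluster per-frame features and keep the frame closest to each cluster centre as a keyframe. The sketch below assumes precomputed frame descriptors (e.g., histograms or CNN activations); the specific features and clustering variants explored in the paper are not reproduced.

```python
import numpy as np
from sklearn.cluster import KMeans

def extract_keyframes(frame_features: np.ndarray, num_keyframes: int = 5):
    """Cluster frame features; return the index of the frame nearest each centroid."""
    km = KMeans(n_clusters=num_keyframes, n_init=10, random_state=0)
    labels = km.fit_predict(frame_features)
    keyframes = []
    for k in range(num_keyframes):
        members = np.where(labels == k)[0]
        dists = np.linalg.norm(frame_features[members] - km.cluster_centers_[k], axis=1)
        keyframes.append(int(members[np.argmin(dists)]))
    return sorted(keyframes)

features = np.random.rand(300, 128)   # stand-in for per-frame descriptors
print(extract_keyframes(features))
```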
Spectrogram Feature Losses for Music Source Separation
Title | Spectrogram Feature Losses for Music Source Separation |
Authors | Abhimanyu Sahai, Romann Weber, Brian McWilliams |
Abstract | In this paper we study deep learning-based music source separation and explore using an alternative loss to the standard spectrogram pixel-level L2 loss for model training. Our main contribution is demonstrating that adding a high-level feature loss term, extracted from the spectrograms using a VGG net, can improve separation quality vis-a-vis a pure pixel-level loss. We show this improvement in the context of the MMDenseNet, a state-of-the-art deep learning model for this task, for the extraction of drums and vocal sounds from songs in the musdb18 database, covering a broad range of western music genres. We believe that this finding can be generalized and applied to broader machine learning-based systems in the audio domain. |
Tasks | Music Source Separation |
Published | 2019-01-15 |
URL | https://arxiv.org/abs/1901.05061v3 |
PDF | https://arxiv.org/pdf/1901.05061v3.pdf |
PWC | https://paperswithcode.com/paper/spectrogram-feature-losses-for-music-source |
Repo | |
Framework | |
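
The core idea above is to add a high-level feature loss, computed by passing spectrograms through a VGG net, to the usual pixel-level L2 loss. A minimal PyTorch sketch is given below; the VGG layer cut-off, the loss weighting, and replicating the single-channel spectrogram to three channels are assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16

class SpectrogramFeatureLoss(nn.Module):
    """Pixel-level L2 loss plus a VGG feature loss on (single-channel) spectrograms."""

    def __init__(self, feature_weight: float = 0.1, layer: int = 16):
        super().__init__()
        self.features = vgg16(weights="DEFAULT").features[:layer].eval()
        for p in self.features.parameters():
            p.requires_grad_(False)          # VGG is a fixed feature extractor
        self.feature_weight = feature_weight
        self.mse = nn.MSELoss()

    def forward(self, pred_spec: torch.Tensor, target_spec: torch.Tensor) -> torch.Tensor:
        # pred_spec, target_spec: (batch, 1, freq, time) magnitude spectrograms.
        pixel_loss = self.mse(pred_spec, target_spec)
        pred_rgb = pred_spec.repeat(1, 3, 1, 1)      # VGG expects 3 input channels
        target_rgb = target_spec.repeat(1, 3, 1, 1)
        feat_loss = self.mse(self.features(pred_rgb), self.features(target_rgb))
        return pixel_loss + self.feature_weight * feat_loss

loss_fn = SpectrogramFeatureLoss()
pred, target = torch.rand(2, 1, 128, 256), torch.rand(2, 1, 128, 256)
print(loss_fn(pred, target))
```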
Bayes metaclassifier and Soft-confusion-matrix classifier in the task of multi-label classification
Title | Bayes metaclassifier and Soft-confusion-matrix classifier in the task of multi-label classification |
Authors | Pawel Trajdos, Marcin Majak |
Abstract | The aim of this paper is to compare the soft confusion matrix approach and the Bayes metaclassifier under the multi-label classification framework. Although both methods have been successfully applied under the multi-label classification framework, they have not been compared directly thus far. Such a comparison is of vital importance because the methods are quite similar, as both are based on the concept of a randomized reference classifier. Since both algorithms were designed to deal with single-label problems, they are combined with a problem-transformation approach to multi-label classification. The present study included 29 benchmark datasets and four different base classifiers. The algorithms were compared in terms of 11 quality criteria, and the results were subjected to statistical analysis. |
Tasks | Multi-Label Classification |
Published | 2019-01-25 |
URL | http://arxiv.org/abs/1901.08827v1 |
PDF | http://arxiv.org/pdf/1901.08827v1.pdf |
PWC | https://paperswithcode.com/paper/bayes-metaclassifier-and-soft-confusion |
Repo | |
Framework | |
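
Both methods above are single-label classifiers combined with a problem-transformation approach to handle multi-label data. The simplest such transformation, binary relevance (one binary classifier per label), is sketched below with scikit-learn; it illustrates the transformation step only, not the soft confusion matrix or Bayes metaclassifier corrections themselves, and the base classifier is an arbitrary choice.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import hamming_loss
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

X, Y = make_multilabel_classification(n_samples=500, n_features=20,
                                      n_classes=5, random_state=0)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, random_state=0)

# Binary relevance: transform the multi-label task into one binary problem per
# label, each solved by a single-label base classifier.
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
clf.fit(X_train, Y_train)
print("Hamming loss:", hamming_loss(Y_test, clf.predict(X_test)))
```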
Infusing domain knowledge in AI-based “black box” models for better explainability with application in bankruptcy prediction
Title | Infusing domain knowledge in AI-based “black box” models for better explainability with application in bankruptcy prediction |
Authors | Sheikh Rabiul Islam, William Eberle, Sid Bundy, Sheikh Khaled Ghafoor |
Abstract | Although “black box” models such as Artificial Neural Networks, Support Vector Machines, and ensemble approaches continue to show superior performance in many disciplines, their adoption in sensitive disciplines (e.g., finance, healthcare) is questionable due to the lack of interpretability and explainability of the model. In fact, future adoption of “black box” models is difficult because of the European Union's recent “right to explanation” rule, under which a user can ask for an explanation behind an algorithmic decision, and the newly proposed US bill, the “Algorithmic Accountability Act”, which would require companies to assess their machine learning systems for bias and discrimination and take corrective measures. Top bankruptcy prediction models are AI-based and are in need of better explainability, i.e., the extent to which the internal working mechanisms of an AI system can be explained in human terms. Although explainable artificial intelligence is an emerging field of research, infusing domain knowledge for better explainability might be a possible solution. In this work, we demonstrate a way to collect and infuse domain knowledge into a “black box” model for bankruptcy prediction. Our experiments reveal that the infused domain knowledge makes the output from the black box model more interpretable and explainable. |
Tasks | |
Published | 2019-05-27 |
URL | https://arxiv.org/abs/1905.11474v2 |
PDF | https://arxiv.org/pdf/1905.11474v2.pdf |
PWC | https://paperswithcode.com/paper/infusing-domain-knowledge-in-ai-based-black |
Repo | |
Framework | |
SuperCaptioning: Image Captioning Using Two-dimensional Word Embedding
Title | SuperCaptioning: Image Captioning Using Two-dimensional Word Embedding |
Authors | Baohua Sun, Lin Yang, Michael Lin, Charles Young, Patrick Dong, Wenhan Zhang, Jason Dong |
Abstract | Language and vision are processed as two different modalities in current work on image captioning. However, recent work on the Super Characters method shows the effectiveness of two-dimensional word embedding, which converts a text classification problem into an image classification problem. In this paper, we propose the SuperCaptioning method, which borrows the idea of two-dimensional word embedding from the Super Characters method and processes language and vision information together in one single CNN model. The experimental results on the Flickr30k data show that the proposed method gives high-quality image captions. An interactive demo is ready to be shown at the workshop. |
Tasks | Image Captioning, Image Classification, Text Classification |
Published | 2019-05-25 |
URL | https://arxiv.org/abs/1905.10515v2 |
PDF | https://arxiv.org/pdf/1905.10515v2.pdf |
PWC | https://paperswithcode.com/paper/supercaptioning-image-captioning-using-two |
Repo | |
Framework | |
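
The two-dimensional word embedding borrowed from the Super Characters method renders the text itself onto an image so that a CNN can consume language and vision in the same modality. The sketch below is a rough illustration of that rendering step with PIL; the grid layout, font, and image size are assumptions, and the full captioning model is not reproduced.

```python
from PIL import Image, ImageDraw

def render_text_as_image(text: str, size: int = 224, grid: int = 8) -> Image.Image:
    """Draw characters onto a blank image in a grid -- a 2D 'word embedding'."""
    img = Image.new("L", (size, size), color=255)
    draw = ImageDraw.Draw(img)
    cell = size // grid
    for i, ch in enumerate(text[: grid * grid]):
        row, col = divmod(i, grid)
        # Default PIL font; the original method uses larger glyphs filling each cell.
        draw.text((col * cell + 4, row * cell + 4), ch, fill=0)
    return img

img = render_text_as_image("a dog runs on the grass")
img.save("supercaption_input.png")   # feed this image (plus the photo) to a CNN
```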
Patient-specific Conditional Joint Models of Shape, Image Features and Clinical Indicators
Title | Patient-specific Conditional Joint Models of Shape, Image Features and Clinical Indicators |
Authors | Bernhard Egger, Markus D. Schirmer, Florian Dubost, Marco J. Nardin, Natalia S. Rost, Polina Golland |
Abstract | We propose and demonstrate a joint model of anatomical shapes, image features and clinical indicators for statistical shape modeling and medical image analysis. The key idea is to employ a copula model to separate the joint dependency structure from the marginal distributions of the variables of interest. This separation provides flexibility in the assumptions made during the modeling process. The proposed method can handle binary, discrete, ordinal and continuous variables. We demonstrate a simple and efficient way to include binary, discrete and ordinal variables in the modeling. Given partially observed clinical indicators, image features or shape, we build Bayesian conditional models using Gaussian processes to capture the dependency structure. We apply the proposed method to a stroke dataset to jointly model the shape of the lateral ventricles, the spatial distribution of the white matter hyperintensity associated with periventricular white matter disease, and clinical indicators. The proposed method yields interpretable joint models for data exploration and patient-specific statistical shape models for medical image analysis. |
Tasks | Gaussian Processes |
Published | 2019-07-17 |
URL | https://arxiv.org/abs/1907.07783v1 |
PDF | https://arxiv.org/pdf/1907.07783v1.pdf |
PWC | https://paperswithcode.com/paper/patient-specific-conditional-joint-models-of |
Repo | |
Framework | |
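
The key modelling idea above is a copula that separates the joint dependency structure from the marginals. A compact way to illustrate this is a Gaussian copula: map each variable to normal scores through its (empirical) marginal CDF, fit a correlation matrix, and condition in the Gaussian space. The sketch below does exactly that for continuous variables only; handling binary/ordinal margins and the Gaussian-process conditionals from the paper is out of scope here.

```python
import numpy as np
from scipy.stats import norm, rankdata

def to_normal_scores(X: np.ndarray) -> np.ndarray:
    """Map each column to N(0,1) scores via its empirical marginal CDF."""
    n = X.shape[0]
    u = rankdata(X, axis=0) / (n + 1)       # empirical CDF values in (0, 1)
    return norm.ppf(u)

def conditional_mean(Z: np.ndarray, obs_idx, target_idx, z_obs):
    """E[z_target | z_obs] under a fitted Gaussian copula (zero-mean Gaussian)."""
    R = np.corrcoef(Z, rowvar=False)
    S_oo = R[np.ix_(obs_idx, obs_idx)]
    S_to = R[np.ix_(target_idx, obs_idx)]
    return S_to @ np.linalg.solve(S_oo, z_obs)

# Toy example: predict the latent score of variable 2 given variables 0 and 1.
rng = np.random.default_rng(0)
X = rng.multivariate_normal([0, 0, 0], [[1, .7, .5], [.7, 1, .4], [.5, .4, 1]], 500)
Z = to_normal_scores(X)
print(conditional_mean(Z, obs_idx=[0, 1], target_idx=[2], z_obs=Z[0, [0, 1]]))
```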
Toybox: A Suite of Environments for Experimental Evaluation of Deep Reinforcement Learning
Title | Toybox: A Suite of Environments for Experimental Evaluation of Deep Reinforcement Learning |
Authors | Emma Tosch, Kaleigh Clary, John Foley, David Jensen |
Abstract | Evaluation of deep reinforcement learning (RL) is inherently challenging. In particular, learned policies are largely opaque, and hypotheses about the behavior of deep RL agents are difficult to test in black-box environments. Considerable effort has gone into addressing opacity, but almost no effort has been devoted to producing high quality environments for experimental evaluation of agent behavior. We present TOYBOX, a new high-performance, open-source* subset of Atari environments re-designed for the experimental evaluation of deep RL. We show that TOYBOX enables a wide range of experiments and analyses that are impossible in other environments. *https://kdl-umass.github.io/Toybox/ |
Tasks | |
Published | 2019-05-07 |
URL | https://arxiv.org/abs/1905.02825v1 |
PDF | https://arxiv.org/pdf/1905.02825v1.pdf |
PWC | https://paperswithcode.com/paper/toybox-a-suite-of-environments-for |
Repo | |
Framework | |