Paper Group ANR 507
An Attention-Based Speaker Naming Method for Online Adaptation in Non-Fixed Scenarios. Estimating Feature-Label Dependence Using Gini Distance Statistics. A Graph-based Ranking Approach to Extract Key-frames for Static Video Summarization. Comprehensive Video Understanding: Video summarization with content-based video recommender design. AFP-Net: Realtime Anchor-Free Polyp Detection in Colonoscopy. An Efficient and Layout-Independent Automatic License Plate Recognition System Based on the YOLO detector. License Plate Recognition with Compressive Sensing Based Feature Extraction. TruNet: Short Videos Generation from Long Videos via Story-Preserving Truncation. Video Summarization using Keyframe Extraction and Video Skimming. Spectrogram Feature Losses for Music Source Separation. Bayes metaclassifier and Soft-confusion-matrix classifier in the task of multi-label classification. Infusing domain knowledge in AI-based "black box" models for better explainability with application in bankruptcy prediction. SuperCaptioning: Image Captioning Using Two-dimensional Word Embedding. Patient-specific Conditional Joint Models of Shape, Image Features and Clinical Indicators. Toybox: A Suite of Environments for Experimental Evaluation of Deep Reinforcement Learning.
An Attention-Based Speaker Naming Method for Online Adaptation in Non-Fixed Scenarios
Title | An Attention-Based Speaker Naming Method for Online Adaptation in Non-Fixed Scenarios |
Authors | Jungwoo Pyo, Joohyun Lee, Youngjune Park, Tien-Cuong Bui, Sang Kyun Cha |
Abstract | A speaker naming task, which finds and identifies the active speaker in a given movie or drama scene, is crucial for high-level video analysis applications such as automatic subtitle labeling and video summarization. Modern approaches have usually exploited biometric features with a gradient-based method instead of rule-based algorithms. In certain situations, however, a naive gradient-based method does not work efficiently. For example, when new characters are added to the target identification list, the neural network needs to be retrained frequently to identify the new people, which causes delays in model preparation. In this paper, we present an attention-based method which reduces the model setup time by incorporating newly added data via online adaptation without a gradient update process. We comparatively analyzed the attention-based method and existing gradient-based methods using three evaluation metrics (accuracy, memory usage, setup time) under various controlled settings of speaker naming. We also applied existing speaker naming models and the attention-based model to real video to show that our approach achieves accuracy comparable to existing state-of-the-art models, and even higher accuracy in some cases. |
Tasks | Video Summarization |
Published | 2019-12-02 |
URL | https://arxiv.org/abs/1912.00649v1 |
PDF | https://arxiv.org/pdf/1912.00649v1.pdf |
PWC | https://paperswithcode.com/paper/an-attention-based-speaker-naming-method-for |
Repo | |
Framework | |
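
The abstract above describes replacing gradient-based retraining with an attention lookup over stored reference data, so new characters can be added by appending entries rather than retraining. The sketch below is a minimal, hypothetical illustration of that idea only; the paper's actual feature extractor and attention formulation are not specified in the abstract. Softmax attention over enrolled speaker embeddings acts as a nearest-reference classifier.

```python
import numpy as np

class AttentionSpeakerNamer:
    """Hypothetical sketch: name speakers by attention over enrolled embeddings.

    Adding a new character only appends (embedding, name) pairs -- no gradient
    update is needed, which is the motivation described in the abstract.
    """

    def __init__(self, temperature: float = 0.1):
        self.keys = []    # enrolled speaker embeddings (unit-normalized)
        self.names = []   # corresponding speaker names
        self.temperature = temperature

    def enroll(self, embedding: np.ndarray, name: str) -> None:
        self.keys.append(embedding / np.linalg.norm(embedding))
        self.names.append(name)

    def identify(self, query: np.ndarray) -> str:
        q = query / np.linalg.norm(query)
        keys = np.stack(self.keys)                    # (N, d)
        scores = keys @ q / self.temperature          # scaled cosine similarities
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()                      # softmax attention weights
        # Aggregate attention mass per name and return the most attended speaker.
        mass = {}
        for w, name in zip(weights, self.names):
            mass[name] = mass.get(name, 0.0) + w
        return max(mass, key=mass.get)

# Usage: enroll face/voice embeddings for known characters, then query a scene.
namer = AttentionSpeakerNamer()
namer.enroll(np.random.randn(128), "Alice")
namer.enroll(np.random.randn(128), "Bob")
print(namer.identify(np.random.randn(128)))
```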
Estimating Feature-Label Dependence Using Gini Distance Statistics
Title | Estimating Feature-Label Dependence Using Gini Distance Statistics |
Authors | Silu Zhang, Xin Dang, Dao Nguyen, Dawn Wilkins, Yixin Chen |
Abstract | Identifying statistical dependence between the features and the label is a fundamental problem in supervised learning. This paper presents a framework for estimating dependence between numerical features and a categorical label using the generalized Gini distance, an energy distance in reproducing kernel Hilbert spaces (RKHS). Two Gini distance based dependence measures are explored: Gini distance covariance and Gini distance correlation. Unlike Pearson covariance and correlation, which do not characterize independence, the above Gini distance based measures define dependence as well as independence of random variables. The test statistics are simple to calculate and do not require probability density estimation. Uniform convergence bounds and asymptotic bounds are derived for the test statistics. Comparisons with distance covariance statistics are provided. It is shown that Gini distance statistics converge faster than distance covariance statistics in the uniform convergence bounds, and hence yield tighter upper bounds on both Type I and Type II errors. Moreover, the probability of the Gini distance covariance statistic under-performing the distance covariance statistic in Type II error decreases to 0 exponentially as the sample size increases. Extensive experimental results are presented to demonstrate the performance of the proposed method. |
Tasks | Density Estimation |
Published | 2019-06-05 |
URL | https://arxiv.org/abs/1906.02171v1 |
PDF | https://arxiv.org/pdf/1906.02171v1.pdf |
PWC | https://paperswithcode.com/paper/estimating-feature-label-dependence-using |
Repo | |
Framework | |
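
As a rough companion to the abstract above, the snippet below computes a plug-in estimate of a Gini-distance-style dependence measure between numerical features and a categorical label: the class-probability-weighted gap between the overall mean pairwise distance and the within-class mean pairwise distances, normalized by the overall mean distance. This assumed form and its Euclidean (non-kernelized) setting are illustrative; the paper's exact estimator and its RKHS generalization may differ.

```python
import numpy as np
from scipy.spatial.distance import pdist

def gini_distance_stats(X: np.ndarray, y: np.ndarray):
    """Plug-in estimates of Gini-distance covariance/correlation style measures.

    delta   : mean pairwise Euclidean distance over all samples
    delta_k : mean pairwise distance within class k
    gcov    : sum_k p_k * (delta - delta_k)   (assumed form; see lead-in)
    gcor    : gcov / delta
    """
    delta = pdist(X).mean()
    gcov, n = 0.0, len(y)
    for k in np.unique(y):
        Xk = X[y == k]
        p_k = len(Xk) / n
        delta_k = pdist(Xk).mean() if len(Xk) > 1 else 0.0
        gcov += p_k * (delta - delta_k)
    return gcov, gcov / delta

# Example: a feature that separates the two classes yields a larger statistic.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 100)
X = rng.normal(loc=y[:, None] * 3.0, size=(200, 2))
print(gini_distance_stats(X, y))
```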
A Graph-based Ranking Approach to Extract Key-frames for Static Video Summarization
Title | A Graph-based Ranking Approach to Extract Key-frames for Static Video Summarization |
Authors | Saikat Chakraborty |
Abstract | Video abstraction has become one of the efficient approaches to grasp the content of a video without watching it entirely. Key frame-based static video summarization falls under this category. In this paper, we propose a graph-based approach which summarizes the video with the best user satisfaction. We treated each video frame as a node of the graph and assigned a rank to each node using our proposed VidRank algorithm. We developed three different models of the VidRank algorithm and performed a comparative study on those models. A comprehensive evaluation on 50 videos from the Open Video database using objective and semi-objective measures indicates the superiority of our static video summary generation method. |
Tasks | Video Summarization |
Published | 2019-11-29 |
URL | https://arxiv.org/abs/1911.13279v1 |
PDF | https://arxiv.org/pdf/1911.13279v1.pdf |
PWC | https://paperswithcode.com/paper/a-graph-based-ranking-approach-to-extract-key |
Repo | |
Framework | |
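
The entry above treats frames as graph nodes and ranks them with the proposed VidRank algorithm, whose details are not given in the abstract. A generic stand-in for that idea is PageRank over a frame-similarity graph, sketched below under that assumption: frames that are central in the similarity graph become keyframe candidates.

```python
import numpy as np
import networkx as nx

def rank_keyframes(features: np.ndarray, top_k: int = 5, sim_threshold: float = 0.5):
    """Generic graph-based keyframe ranking (a PageRank stand-in, not VidRank)."""
    # Cosine similarity between frame feature vectors.
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = normed @ normed.T
    g = nx.Graph()
    n = len(features)
    g.add_nodes_from(range(n))
    for i in range(n):
        for j in range(i + 1, n):
            if sim[i, j] > sim_threshold:
                g.add_edge(i, j, weight=float(sim[i, j]))
    scores = nx.pagerank(g, weight="weight")
    # Highest-ranked frame indices form the static summary.
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

frames = np.random.rand(50, 256)        # stand-in for per-frame CNN features
print(rank_keyframes(frames))
```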
Comprehensive Video Understanding: Video summarization with content-based video recommender design
Title | Comprehensive Video Understanding: Video summarization with content-based video recommender design |
Authors | Yudong Jiang, Kaixu Cui, Bo Peng, Changliang Xu |
Abstract | Video summarization aims to extract keyframes/shots from a long video. Previous methods mainly take the diversity and representativeness of the generated summaries as prior knowledge in algorithm design. In this paper, we formulate video summarization as a content-based recommender problem, which should distill the most useful content from a long video for users who suffer from information overload. A scalable deep neural network is proposed to predict whether a video segment is useful for users by explicitly modelling both the segment and the whole video. Moreover, we perform scene and action recognition in untrimmed videos in order to find more correlations among different aspects of video understanding tasks. We also discuss the effect of audio and visual features in the summarization task, and extend our work with data augmentation and multi-task learning to prevent the model from early-stage overfitting. Our final model won first place in the ICCV 2019 CoView Workshop Challenge Track. |
Tasks | Data Augmentation, Multi-Task Learning, Video Summarization, Video Understanding |
Published | 2019-10-30 |
URL | https://arxiv.org/abs/1910.13888v1 |
PDF | https://arxiv.org/pdf/1910.13888v1.pdf |
PWC | https://paperswithcode.com/paper/comprehensive-video-understanding-video |
Repo | |
Framework | |
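
The recommender formulation above scores each segment's usefulness by modelling both the segment and the full video. A minimal PyTorch sketch of such a scorer is shown below; the architecture, feature dimensions, and fusion scheme are assumptions for illustration, not the authors' network.

```python
import torch
import torch.nn as nn

class SegmentUsefulnessScorer(nn.Module):
    """Hypothetical sketch: score a segment given segment + whole-video features."""

    def __init__(self, seg_dim: int = 512, vid_dim: int = 512, hidden: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(seg_dim + vid_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, seg_feat: torch.Tensor, vid_feat: torch.Tensor) -> torch.Tensor:
        # Explicitly condition the segment score on a global video representation.
        return torch.sigmoid(self.mlp(torch.cat([seg_feat, vid_feat], dim=-1)))

scorer = SegmentUsefulnessScorer()
seg = torch.randn(8, 512)                      # 8 candidate segments
vid = torch.randn(1, 512).expand(8, 512)       # shared video-level feature
print(scorer(seg, vid).squeeze(-1))            # usefulness score in [0, 1] per segment
```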
AFP-Net: Realtime Anchor-Free Polyp Detection in Colonoscopy
Title | AFP-Net: Realtime Anchor-Free Polyp Detection in Colonoscopy |
Authors | Dechun Wang, Ning Zhang, Xinzi Sun, Pengfei Zhang, Chenxi Zhang, Yu Cao, Benyuan Liu |
Abstract | Colorectal cancer (CRC) is a common and lethal disease. Globally, CRC is the third most commonly diagnosed cancer in males and the second in females. For colorectal cancer, the best available screening test is the colonoscopy. During a colonoscopic procedure, a tiny camera at the tip of the endoscope generates a video of the internal mucosa of the colon. The video data are displayed on a monitor for the physician to examine the lining of the entire colon and check for colorectal polyps. Detection and removal of colorectal polyps are associated with a reduction in mortality from colorectal cancer. However, the miss rate of polyp detection during a colonoscopy procedure is often high, even for very experienced physicians. The reason lies in the high variation of polyps in terms of shape, size, texture, color and illumination. Though challenging, with the great advances in object detection techniques, automated polyp detection shows great potential in reducing the false negative rate while maintaining high precision. In this paper, we propose a novel anchor-free polyp detector that can localize polyps without using predefined anchor boxes. To further strengthen the model, we leverage a Context Enhancement Module and Cosine Ground truth Projection. Our approach can respond in real time while achieving state-of-the-art performance with 99.36% precision and 96.44% recall. |
Tasks | Object Detection |
Published | 2019-09-05 |
URL | https://arxiv.org/abs/1909.02477v3 |
PDF | https://arxiv.org/pdf/1909.02477v3.pdf |
PWC | https://paperswithcode.com/paper/afp-net-realtime-anchor-free-polyp-detection |
Repo | |
Framework | |
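
The abstract describes an anchor-free detector that localizes polyps without predefined anchor boxes; the Context Enhancement Module and Cosine Ground truth Projection are specific to the paper and not reproduced here. The snippet below only illustrates the generic anchor-free idea of regressing, at each feature-map location, the distances to the four box sides instead of anchor offsets, and decoding them back to image-space boxes.

```python
import torch

def decode_anchor_free(ltrb: torch.Tensor, stride: int = 8) -> torch.Tensor:
    """Decode per-location (left, top, right, bottom) distances into boxes.

    ltrb: (H, W, 4) non-negative distances predicted at each feature-map cell.
    Returns (H*W, 4) boxes in image coordinates -- a generic anchor-free
    decoding step, not AFP-Net's exact head.
    """
    h, w, _ = ltrb.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    # Centers of feature-map cells mapped back to image coordinates.
    cx = (xs.float() + 0.5) * stride
    cy = (ys.float() + 0.5) * stride
    boxes = torch.stack(
        [cx - ltrb[..., 0], cy - ltrb[..., 1], cx + ltrb[..., 2], cy + ltrb[..., 3]],
        dim=-1,
    )
    return boxes.reshape(-1, 4)

print(decode_anchor_free(torch.rand(4, 4, 4) * 32).shape)  # torch.Size([16, 4])
```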
An Efficient and Layout-Independent Automatic License Plate Recognition System Based on the YOLO detector
Title | An Efficient and Layout-Independent Automatic License Plate Recognition System Based on the YOLO detector |
Authors | Rayson Laroca, Luiz A. Zanlorensi, Gabriel R. Gonçalves, Eduardo Todt, William Robson Schwartz, David Menotti |
Abstract | In this paper, we present an efficient and layout-independent Automatic License Plate Recognition (ALPR) system based on the state-of-the-art YOLO object detector that contains a unified approach for license plate (LP) detection and layout classification to improve the recognition results using post-processing rules. The system is conceived by evaluating and optimizing different models with various modifications, aiming at achieving the best speed/accuracy trade-off at each stage. The networks are trained using images from several datasets, with the addition of various data augmentation techniques, so that they are robust under different conditions. The proposed system achieved an average end-to-end recognition rate of 96.8% across eight public datasets (from five different regions) used in the experiments, outperforming both previous works and commercial systems in the ChineseLP, OpenALPR-EU, SSIG-SegPlate and UFPR-ALPR datasets. In the other datasets, the proposed approach achieved competitive results to those attained by the baselines. Our system also achieved impressive frames per second (FPS) rates on a high-end GPU, being able to perform in real time even when there are four vehicles in the scene. An additional contribution is that we manually labeled 38,334 bounding boxes on 6,237 images from public datasets and made the annotations publicly available to the research community. |
Tasks | Data Augmentation, License Plate Recognition |
Published | 2019-09-04 |
URL | https://arxiv.org/abs/1909.01754v2 |
PDF | https://arxiv.org/pdf/1909.01754v2.pdf |
PWC | https://paperswithcode.com/paper/an-efficient-and-layout-independent-automatic |
Repo | |
Framework | |
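
One concrete element of the pipeline above is that the predicted license-plate layout class drives post-processing rules applied to the recognized character string. The sketch below illustrates that idea with hypothetical layouts and swap rules (fixing easily confused characters based on the expected letter/digit positions); the actual layout classes and rules used by the authors are not taken from the paper.

```python
# Hypothetical layout-driven post-processing: swap commonly confused characters
# depending on whether a position is expected to hold a letter or a digit.
TO_DIGIT = {"O": "0", "I": "1", "B": "8", "S": "5"}
TO_LETTER = {v: k for k, v in TO_DIGIT.items()}

# Assumed position patterns per layout class ('L' = letter, 'D' = digit).
LAYOUT_PATTERNS = {
    "brazilian": "LLLDDDD",
    "european":  "LLDDLLL",
}

def postprocess_plate(chars: str, layout: str) -> str:
    pattern = LAYOUT_PATTERNS.get(layout)
    if pattern is None or len(chars) != len(pattern):
        return chars  # unknown layout or length mismatch: leave OCR output untouched
    fixed = []
    for c, slot in zip(chars, pattern):
        if slot == "D" and not c.isdigit():
            c = TO_DIGIT.get(c, c)
        elif slot == "L" and c.isdigit():
            c = TO_LETTER.get(c, c)
        fixed.append(c)
    return "".join(fixed)

print(postprocess_plate("AB1234S", "brazilian"))  # -> 'ABI2345'
```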
License Plate Recognition with Compressive Sensing Based Feature Extraction
Title | License Plate Recognition with Compressive Sensing Based Feature Extraction |
Authors | Andrej Jokic, Nikola Vukovic |
Abstract | License plate recognition is a key component of many automatic traffic control systems. It enables the automatic identification of vehicles in many applications. Such systems must be able to identify vehicles from images taken in various conditions, including low light, rain, snow, etc. In order to reduce the complexity and cost of the hardware required for such devices, the algorithm should be as efficient as possible. This paper proposes a license plate recognition system which uses a new approach based on compressive sensing techniques for dimensionality reduction and feature extraction. Dimensionality reduction enables precise classification with less training data while demanding less computational power. Based on the extracted features, character recognition and classification is done by a Support Vector Machine classifier. |
Tasks | Compressive Sensing, Dimensionality Reduction, License Plate Recognition |
Published | 2019-02-07 |
URL | http://arxiv.org/abs/1902.05386v1 |
PDF | http://arxiv.org/pdf/1902.05386v1.pdf |
PWC | https://paperswithcode.com/paper/license-plate-recognition-with-compressive |
Repo | |
Framework | |
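
The pipeline described above, compressive-sensing-style projections for dimensionality reduction followed by an SVM for character classification, maps directly onto standard tooling. The sketch below uses a Gaussian random projection as the measurement matrix and scikit-learn's digits dataset as a stand-in for character crops; the projection size and SVM settings are illustrative assumptions.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.random_projection import GaussianRandomProjection
from sklearn.svm import SVC

# Stand-in character images (8x8 digits) for license-plate character crops.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The random projection plays the role of the compressive-sensing measurement
# matrix, reducing dimensionality before the SVM classifier.
model = make_pipeline(GaussianRandomProjection(n_components=20, random_state=0),
                      SVC(kernel="rbf", C=10.0))
model.fit(X_train, y_train)
print("accuracy:", model.score(X_test, y_test))
```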
TruNet: Short Videos Generation from Long Videos via Story-Preserving Truncation
Title | TruNet: Short Videos Generation from Long Videos via Story-Preserving Truncation |
Authors | Fan Yang, Xiao Liu, Dongliang He, Chuang Gan, Jian Wang, Chao Li, Fu Li, Shilei Wen |
Abstract | In this work, we introduce a new problem, named *story-preserving long video truncation*, which requires an algorithm to automatically truncate a long-duration video into multiple short and attractive sub-videos, each containing an unbroken story. This differs from traditional video highlight detection or video summarization problems in that each sub-video is required to maintain a coherent and integral story, which is becoming particularly important for resource-production video sharing platforms such as YouTube, Facebook, TikTok, Kwai, etc. To address the problem, we collect and annotate a new large video truncation dataset, named TruNet, which contains 1470 videos with on average 11 short stories per video. With the new dataset, we further develop and train a neural architecture for video truncation that consists of two components: a Boundary Aware Network (BAN) and a Fast-Forward Long Short-Term Memory (FF-LSTM). We first use the BAN to generate high-quality temporal proposals by jointly considering frame-level attractiveness and boundaryness. We then apply the FF-LSTM, which tends to capture high-order dependencies among a sequence of frames, to decide whether a temporal proposal is a coherent and integral story. We show that our proposed framework outperforms existing approaches for the story-preserving long video truncation problem in both quantitative measures and user studies. The dataset is available for public academic research usage at https://ai.baidu.com/broad/download. |
Tasks | Video Summarization |
Published | 2019-10-14 |
URL | https://arxiv.org/abs/1910.05899v1 |
PDF | https://arxiv.org/pdf/1910.05899v1.pdf |
PWC | https://paperswithcode.com/paper/trunet-short-videos-generation-from-long |
Repo | |
Framework | |
Video Summarization using Keyframe Extraction and Video Skimming
Title | Video Summarization using Keyframe Extraction and Video Skimming |
Authors | Shruti Jadon, Mahmood Jasim |
Abstract | Video is one of the most robust sources of information, and the consumption of online and offline videos has reached an unprecedented level in the last few years. A fundamental challenge of extracting information from videos is that a viewer has to go through the complete video to understand the context, as opposed to an image, where the viewer can extract information from a single frame. In this work, we employ different algorithmic methodologies, including local features and deep neural networks, along with multiple clustering methods, to find an effective way of summarizing a video by extracting interesting keyframes. |
Tasks | Video Summarization |
Published | 2019-10-10 |
URL | https://arxiv.org/abs/1910.04792v1 |
PDF | https://arxiv.org/pdf/1910.04792v1.pdf |
PWC | https://paperswithcode.com/paper/video-summarization-using-keyframe-extraction |
Repo | |
Framework | |
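
A common version of the clustering route mentioned above is to cluster per-frame features and keep the frame closest to each cluster centre as a keyframe. The sketch below assumes precomputed frame descriptors (e.g., histograms or CNN activations); the specific features and clustering variants explored in the paper are not reproduced.

```python
import numpy as np
from sklearn.cluster import KMeans

def extract_keyframes(frame_features: np.ndarray, num_keyframes: int = 5):
    """Cluster frame features; return the index of the frame nearest each centroid."""
    km = KMeans(n_clusters=num_keyframes, n_init=10, random_state=0)
    labels = km.fit_predict(frame_features)
    keyframes = []
    for k in range(num_keyframes):
        members = np.where(labels == k)[0]
        dists = np.linalg.norm(frame_features[members] - km.cluster_centers_[k], axis=1)
        keyframes.append(int(members[np.argmin(dists)]))
    return sorted(keyframes)

features = np.random.rand(300, 128)   # stand-in for per-frame descriptors
print(extract_keyframes(features))
```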
Spectrogram Feature Losses for Music Source Separation
Title | Spectrogram Feature Losses for Music Source Separation |
Authors | Abhimanyu Sahai, Romann Weber, Brian McWilliams |
Abstract | In this paper we study deep learning-based music source separation and explore using an alternative loss to the standard spectrogram pixel-level L2 loss for model training. Our main contribution is demonstrating that adding a high-level feature loss term, extracted from the spectrograms using a VGG net, can improve separation quality vis-a-vis a pure pixel-level loss. We show this improvement in the context of the MMDenseNet, a state-of-the-art deep learning model for this task, for the extraction of drums and vocal sounds from songs in the musdb18 database, covering a broad range of western music genres. We believe that this finding can be generalized and applied to broader machine learning-based systems in the audio domain. |
Tasks | Music Source Separation |
Published | 2019-01-15 |
URL | https://arxiv.org/abs/1901.05061v3 |
PDF | https://arxiv.org/pdf/1901.05061v3.pdf |
PWC | https://paperswithcode.com/paper/spectrogram-feature-losses-for-music-source |
Repo | |
Framework | |
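
The core idea above is to add a high-level feature loss, computed by passing spectrograms through a VGG net, to the usual pixel-level L2 loss. A minimal PyTorch sketch is given below; the VGG layer cut-off, the loss weighting, and replicating the single-channel spectrogram to three channels are assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16

class SpectrogramFeatureLoss(nn.Module):
    """Pixel-level L2 loss plus a VGG feature loss on (single-channel) spectrograms."""

    def __init__(self, feature_weight: float = 0.1, layer: int = 16):
        super().__init__()
        self.features = vgg16(weights="DEFAULT").features[:layer].eval()
        for p in self.features.parameters():
            p.requires_grad_(False)          # VGG is a fixed feature extractor
        self.feature_weight = feature_weight
        self.mse = nn.MSELoss()

    def forward(self, pred_spec: torch.Tensor, target_spec: torch.Tensor) -> torch.Tensor:
        # pred_spec, target_spec: (batch, 1, freq, time) magnitude spectrograms.
        pixel_loss = self.mse(pred_spec, target_spec)
        pred_rgb = pred_spec.repeat(1, 3, 1, 1)      # VGG expects 3 input channels
        target_rgb = target_spec.repeat(1, 3, 1, 1)
        feat_loss = self.mse(self.features(pred_rgb), self.features(target_rgb))
        return pixel_loss + self.feature_weight * feat_loss

loss_fn = SpectrogramFeatureLoss()
pred, target = torch.rand(2, 1, 128, 256), torch.rand(2, 1, 128, 256)
print(loss_fn(pred, target))
```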
Bayes metaclassifier and Soft-confusion-matrix classifier in the task of multi-label classification
Title | Bayes metaclassifier and Soft-confusion-matrix classifier in the task of multi-label classification |
Authors | Pawel Trajdos, Marcin Majak |
Abstract | The aim of this paper is to compare the soft confusion matrix approach and the Bayes metaclassifier under the multi-label classification framework. Although both methods have been successfully applied under the multi-label classification framework, they have not been compared directly thus far. Such a comparison is of vital importance because the methods are quite similar, as both are based on the concept of a randomized reference classifier. Since both algorithms were designed to deal with single-label problems, they are combined with a problem-transformation approach to multi-label classification. The present study included 29 benchmark datasets and four different base classifiers. The algorithms were compared in terms of 11 quality criteria, and the results were subjected to statistical analysis. |
Tasks | Multi-Label Classification |
Published | 2019-01-25 |
URL | http://arxiv.org/abs/1901.08827v1 |
PDF | http://arxiv.org/pdf/1901.08827v1.pdf |
PWC | https://paperswithcode.com/paper/bayes-metaclassifier-and-soft-confusion |
Repo | |
Framework | |
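
Both methods above are single-label classifiers combined with a problem-transformation approach to handle multi-label data. The simplest such transformation, binary relevance (one binary classifier per label), is sketched below with scikit-learn; it illustrates the transformation step only, not the soft confusion matrix or Bayes metaclassifier corrections themselves, and the base classifier is an arbitrary choice.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import hamming_loss
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

X, Y = make_multilabel_classification(n_samples=500, n_features=20,
                                      n_classes=5, random_state=0)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, random_state=0)

# Binary relevance: transform the multi-label task into one binary problem per
# label, each solved by a single-label base classifier.
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
clf.fit(X_train, Y_train)
print("Hamming loss:", hamming_loss(Y_test, clf.predict(X_test)))
```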
Infusing domain knowledge in AI-based “black box” models for better explainability with application in bankruptcy prediction
Title | Infusing domain knowledge in AI-based “black box” models for better explainability with application in bankruptcy prediction |
Authors | Sheikh Rabiul Islam, William Eberle, Sid Bundy, Sheikh Khaled Ghafoor |
Abstract | Although “black box” models such as Artificial Neural Networks, Support Vector Machines, and ensemble approaches continue to show superior performance in many disciplines, their adoption in sensitive disciplines (e.g., finance, healthcare) is questionable due to the lack of interpretability and explainability of the model. In fact, future adoption of “black box” models is difficult because of the European Union's recent “right to explanation” rule, under which a user can ask for an explanation behind an algorithmic decision, and the newly proposed US bill, the “Algorithmic Accountability Act”, which would require companies to assess their machine learning systems for bias and discrimination and take corrective measures. Top bankruptcy prediction models are AI-based and are in need of better explainability, i.e., the extent to which the internal working mechanisms of an AI system can be explained in human terms. Although explainable artificial intelligence is an emerging field of research, infusing domain knowledge for better explainability might be a possible solution. In this work, we demonstrate a way to collect and infuse domain knowledge into a “black box” model for bankruptcy prediction. Our experiments reveal that the infused domain knowledge makes the output from the black box model more interpretable and explainable. |
Tasks | |
Published | 2019-05-27 |
URL | https://arxiv.org/abs/1905.11474v2 |
PDF | https://arxiv.org/pdf/1905.11474v2.pdf |
PWC | https://paperswithcode.com/paper/infusing-domain-knowledge-in-ai-based-black |
Repo | |
Framework | |
SuperCaptioning: Image Captioning Using Two-dimensional Word Embedding
Title | SuperCaptioning: Image Captioning Using Two-dimensional Word Embedding |
Authors | Baohua Sun, Lin Yang, Michael Lin, Charles Young, Patrick Dong, Wenhan Zhang, Jason Dong |
Abstract | Language and vision are processed as two different modalities in current work on image captioning. However, recent work on the Super Characters method shows the effectiveness of two-dimensional word embedding, which converts a text classification problem into an image classification problem. In this paper, we propose the SuperCaptioning method, which borrows the idea of two-dimensional word embedding from the Super Characters method and processes language and vision information together in one single CNN model. The experimental results on the Flickr30k data show that the proposed method gives high-quality image captions. An interactive demo is ready to be shown at the workshop. |
Tasks | Image Captioning, Image Classification, Text Classification |
Published | 2019-05-25 |
URL | https://arxiv.org/abs/1905.10515v2 |
PDF | https://arxiv.org/pdf/1905.10515v2.pdf |
PWC | https://paperswithcode.com/paper/supercaptioning-image-captioning-using-two |
Repo | |
Framework | |
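
The two-dimensional word embedding borrowed from the Super Characters method renders the text itself onto an image so that a CNN can consume language and vision in the same modality. The sketch below is a rough illustration of that rendering step with PIL; the grid layout, font, and image size are assumptions, and the full captioning model is not reproduced.

```python
from PIL import Image, ImageDraw

def render_text_as_image(text: str, size: int = 224, grid: int = 8) -> Image.Image:
    """Draw characters onto a blank image in a grid -- a 2D 'word embedding'."""
    img = Image.new("L", (size, size), color=255)
    draw = ImageDraw.Draw(img)
    cell = size // grid
    for i, ch in enumerate(text[: grid * grid]):
        row, col = divmod(i, grid)
        # Default PIL font; the original method uses larger glyphs filling each cell.
        draw.text((col * cell + 4, row * cell + 4), ch, fill=0)
    return img

img = render_text_as_image("a dog runs on the grass")
img.save("supercaption_input.png")   # feed this image (plus the photo) to a CNN
```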
Patient-specific Conditional Joint Models of Shape, Image Features and Clinical Indicators
Title | Patient-specific Conditional Joint Models of Shape, Image Features and Clinical Indicators |
Authors | Bernhard Egger, Markus D. Schirmer, Florian Dubost, Marco J. Nardin, Natalia S. Rost, Polina Golland |
Abstract | We propose and demonstrate a joint model of anatomical shapes, image features and clinical indicators for statistical shape modeling and medical image analysis. The key idea is to employ a copula model to separate the joint dependency structure from the marginal distributions of the variables of interest. This separation provides flexibility in the assumptions made during the modeling process. The proposed method can handle binary, discrete, ordinal and continuous variables. We demonstrate a simple and efficient way to include binary, discrete and ordinal variables in the modeling. Given partially observed clinical indicators, image features or shape, we build Bayesian conditional models using Gaussian processes to capture the dependency structure. We apply the proposed method to a stroke dataset to jointly model the shape of the lateral ventricles, the spatial distribution of the white matter hyperintensity associated with periventricular white matter disease, and clinical indicators. The proposed method yields interpretable joint models for data exploration and patient-specific statistical shape models for medical image analysis. |
Tasks | Gaussian Processes |
Published | 2019-07-17 |
URL | https://arxiv.org/abs/1907.07783v1 |
PDF | https://arxiv.org/pdf/1907.07783v1.pdf |
PWC | https://paperswithcode.com/paper/patient-specific-conditional-joint-models-of |
Repo | |
Framework | |
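
The key modelling idea above is a copula that separates the joint dependency structure from the marginals. A compact way to illustrate this is a Gaussian copula: map each variable to normal scores through its (empirical) marginal CDF, fit a correlation matrix, and condition in the Gaussian space. The sketch below does exactly that for continuous variables only; handling binary/ordinal margins and the Gaussian-process conditionals from the paper is out of scope here.

```python
import numpy as np
from scipy.stats import norm, rankdata

def to_normal_scores(X: np.ndarray) -> np.ndarray:
    """Map each column to N(0,1) scores via its empirical marginal CDF."""
    n = X.shape[0]
    u = rankdata(X, axis=0) / (n + 1)       # empirical CDF values in (0, 1)
    return norm.ppf(u)

def conditional_mean(Z: np.ndarray, obs_idx, target_idx, z_obs):
    """E[z_target | z_obs] under a fitted Gaussian copula (zero-mean Gaussian)."""
    R = np.corrcoef(Z, rowvar=False)
    S_oo = R[np.ix_(obs_idx, obs_idx)]
    S_to = R[np.ix_(target_idx, obs_idx)]
    return S_to @ np.linalg.solve(S_oo, z_obs)

# Toy example: predict the latent score of variable 2 given variables 0 and 1.
rng = np.random.default_rng(0)
X = rng.multivariate_normal([0, 0, 0], [[1, .7, .5], [.7, 1, .4], [.5, .4, 1]], 500)
Z = to_normal_scores(X)
print(conditional_mean(Z, obs_idx=[0, 1], target_idx=[2], z_obs=Z[0, [0, 1]]))
```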
Toybox: A Suite of Environments for Experimental Evaluation of Deep Reinforcement Learning
Title | Toybox: A Suite of Environments for Experimental Evaluation of Deep Reinforcement Learning |
Authors | Emma Tosch, Kaleigh Clary, John Foley, David Jensen |
Abstract | Evaluation of deep reinforcement learning (RL) is inherently challenging. In particular, learned policies are largely opaque, and hypotheses about the behavior of deep RL agents are difficult to test in black-box environments. Considerable effort has gone into addressing opacity, but almost no effort has been devoted to producing high quality environments for experimental evaluation of agent behavior. We present TOYBOX, a new high-performance, open-source* subset of Atari environments re-designed for the experimental evaluation of deep RL. We show that TOYBOX enables a wide range of experiments and analyses that are impossible in other environments. *https://kdl-umass.github.io/Toybox/ |
Tasks | |
Published | 2019-05-07 |
URL | https://arxiv.org/abs/1905.02825v1 |
PDF | https://arxiv.org/pdf/1905.02825v1.pdf |
PWC | https://paperswithcode.com/paper/toybox-a-suite-of-environments-for |
Repo | |
Framework | |