July 27, 2019

3041 words 15 mins read

Paper Group ANR 486

A Taught-Obesrve-Ask (TOA) Method for Object Detection with Critical Supervision. DCTM: Discrete-Continuous Transformation Matching for Semantic Flow. Highly Efficient Human Action Recognition with Quantum Genetic Algorithm Optimized Support Vector Machine. Human perception in computer vision. Airway segmentation from 3D chest CT volumes based on v …

A Taught-Obesrve-Ask (TOA) Method for Object Detection with Critical Supervision

Title A Taught-Obesrve-Ask (TOA) Method for Object Detection with Critical Supervision
Authors Chi-Hao Wu, Qin Huang, Siyang Li, C. -C. Jay Kuo
Abstract Inspired by a child’s learning experience, which consists of being taught first, followed by observation and questioning, we investigate a critically supervised learning methodology for object detection in this work. Specifically, we propose a taught-observe-ask (TOA) method that consists of several novel components such as negative object proposal, critical example mining, and machine-guided question-answer (QA) labeling. To consider labeling time and performance jointly, new evaluation methods are developed to compare the performance of the TOA method with fully and weakly supervised learning methods. Extensive experiments are conducted on the PASCAL VOC and the Caltech benchmark datasets. The TOA method significantly improves on the performance of weak supervision yet demands only about 3-6% of the labeling time of full supervision. The effectiveness of each novel component is also analyzed.
Tasks Object Detection
Published 2017-11-03
URL http://arxiv.org/abs/1711.01043v1
PDF http://arxiv.org/pdf/1711.01043v1.pdf
PWC https://paperswithcode.com/paper/a-taught-obesrve-ask-toa-method-for-object
Repo
Framework
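The machine-guided QA labeling loop described above can be sketched as a simple routing rule: proposals the detector already scores confidently are auto-labeled, and only ambiguous proposals cost the annotator a yes/no question. A minimal sketch in Python, where the thresholds and the `answer_fn` oracle are hypothetical rather than the paper's actual criteria:

```python
def qa_label(proposals, answer_fn, lo=0.2, hi=0.8):
    """Route object proposals to cheap yes/no human questions.

    proposals: list of (box_id, confidence) pairs from a detector.
    answer_fn: human oracle answering "is this box a true object?".
    Returns (labels, number of questions asked).
    """
    labels = {}
    questions_asked = 0
    for box_id, conf in proposals:
        if conf >= hi:
            labels[box_id] = 1                      # confident positive: auto-accept
        elif conf <= lo:
            labels[box_id] = 0                      # confident negative: auto-reject
        else:
            labels[box_id] = int(answer_fn(box_id)) # ambiguous: ask the human
            questions_asked += 1
    return labels, questions_asked
```

Only the middle confidence band consumes annotator time, which is how such a scheme can trade a small QA budget for near-full-supervision labels.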

DCTM: Discrete-Continuous Transformation Matching for Semantic Flow

Title DCTM: Discrete-Continuous Transformation Matching for Semantic Flow
Authors Seungryong Kim, Dongbo Min, Stephen Lin, Kwanghoon Sohn
Abstract Techniques for dense semantic correspondence have provided limited ability to deal with the geometric variations that commonly exist between semantically similar images. While variations due to scale and rotation have been examined, practical solutions are lacking for more complex deformations such as affine transformations because of the tremendous size of the associated solution space. To address this problem, we present a discrete-continuous transformation matching (DCTM) framework where dense affine transformation fields are inferred through a discrete label optimization in which the labels are iteratively updated via continuous regularization. In this way, our approach draws solutions from the continuous space of affine transformations in a manner that can be computed efficiently through constant-time edge-aware filtering and a proposed affine-varying CNN-based descriptor. Experimental results show that this model outperforms the state-of-the-art methods for dense semantic correspondence on various benchmarks.
Tasks
Published 2017-07-18
URL http://arxiv.org/abs/1707.05471v1
PDF http://arxiv.org/pdf/1707.05471v1.pdf
PWC https://paperswithcode.com/paper/dctm-discrete-continuous-transformation
Repo
Framework
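The discrete-continuous alternation can be illustrated with a much-simplified 1-D toy: scalar integer displacements stand in for affine fields, and neighbor averaging stands in for the paper's edge-aware continuous regularization. None of the specifics below are DCTM's actual energy; the sketch only shows how a discrete label choice and a continuous smoothing step can alternate:

```python
def dc_matching(costs, lam=1.0, n_iter=10):
    """Toy discrete-continuous matching on a 1-D chain of sites.

    costs[i][l] is the data cost of assigning discrete label l
    (an integer displacement) to site i. Each sweep smooths the
    current field (continuous regularization) and then re-picks,
    per site, the discrete label minimizing data cost plus
    deviation from the smoothed field.
    """
    n = len(costs)
    # initialize each site at its individually best label
    field = [float(min(range(len(c)), key=lambda l: c[l])) for c in costs]
    for _ in range(n_iter):
        # continuous step: average each site with its neighbors
        smoothed = [
            (field[max(i - 1, 0)] + field[i] + field[min(i + 1, n - 1)]) / 3
            for i in range(n)
        ]
        # discrete step: re-pick the label nearest the regularized field
        field = [
            float(min(range(len(costs[i])),
                      key=lambda l: costs[i][l]
                      + lam * (l - smoothed[i]) ** 2))
            for i in range(n)
        ]
    return [int(l) for l in field]
```

With the regularizer on (`lam > 0`), an outlier site surrounded by consistent neighbors gets pulled onto the smooth solution; with `lam = 0` the per-site minima survive.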

Highly Efficient Human Action Recognition with Quantum Genetic Algorithm Optimized Support Vector Machine

Title Highly Efficient Human Action Recognition with Quantum Genetic Algorithm Optimized Support Vector Machine
Authors Yafeng Liu, Shimin Feng, Zhikai Zhao, Enjie Ding
Abstract In this paper we propose the use of a quantum genetic algorithm to optimize the support vector machine (SVM) for human action recognition. The Microsoft Kinect sensor can be used for skeleton tracking, which provides the joints’ position data. However, how to extract motion features that represent the dynamics of a human skeleton is still a challenge due to the complexity of human motion. We present a highly efficient feature extraction method for action classification: we use the joint angles to represent a human skeleton and calculate the variance of each angle during an action time window. Using the proposed representation, we compared the human action classification accuracy of two approaches: the optimized SVM based on the quantum genetic algorithm and the conventional SVM with grid search. Experimental results on the MSR-12 dataset show that the conventional SVM achieved an accuracy of 93.85%. The proposed approach outperforms the conventional method with an accuracy of 96.15%.
Tasks Action Classification, Temporal Action Localization
Published 2017-11-27
URL http://arxiv.org/abs/1711.09511v2
PDF http://arxiv.org/pdf/1711.09511v2.pdf
PWC https://paperswithcode.com/paper/highly-efficient-human-action-recognition
Repo
Framework
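The variance-of-joint-angles representation is simple enough to sketch directly; the (frames x joints) layout of the input matrix is an assumption for illustration:

```python
import numpy as np

def angle_variance_features(angles):
    """Per-angle variance features for one action window.

    angles: (T, J) array of J joint angles over T frames.
    Returns a length-J feature vector: the variance of each
    joint angle across the window, summarizing its dynamics.
    """
    angles = np.asarray(angles, dtype=float)
    return angles.var(axis=0)
```

The resulting fixed-length vector is what a classifier such as an SVM would then consume, regardless of how its hyperparameters are tuned.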

Human perception in computer vision

Title Human perception in computer vision
Authors Ron Dekel
Abstract Computer vision has made remarkable progress in recent years. Deep neural network (DNN) models optimized to identify objects in images exhibit unprecedented task-trained accuracy and, remarkably, some generalization ability: new visual problems can now be solved more easily based on previous learning. Biological vision (learned in life and through evolution) is also accurate and general-purpose. Is it possible that these different learning regimes converge to similar problem-dependent optimal computations? We therefore asked whether the human system-level computation of visual perception has DNN correlates and considered several anecdotal test cases. We found that perceptual sensitivity to image changes has DNN mid-computation correlates, while sensitivity to segmentation, crowding and shape has DNN end-computation correlates. Our results quantify the applicability of using DNN computation to estimate perceptual loss, and are consistent with the fascinating theoretical view that properties of human perception are a consequence of architecture-independent visual learning.
Tasks
Published 2017-01-17
URL http://arxiv.org/abs/1701.04674v1
PDF http://arxiv.org/pdf/1701.04674v1.pdf
PWC https://paperswithcode.com/paper/human-perception-in-computer-vision
Repo
Framework
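The idea of using DNN computation to estimate perceptual loss can be sketched generically: compare two images by the distance between their mid-layer activations rather than their raw pixels. The `layers` callables below are stand-ins for a trained network's intermediate computations, not any specific model:

```python
import numpy as np

def perceptual_distance(img_a, img_b, layers):
    """Generic DNN-style perceptual distance between two images.

    layers: list of callables, each mapping an image array to a
    feature array (hypothetical stand-ins for a trained network's
    intermediate layers). The distance is the summed L2 distance
    of the corresponding activations.
    """
    return sum(
        float(np.linalg.norm(layer(img_a) - layer(img_b)))
        for layer in layers
    )
```

With the identity as the only "layer" this degenerates to pixel distance; richer layers are what make the measure track perceptual sensitivity.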

Airway segmentation from 3D chest CT volumes based on volume of interest using gradient vector flow

Title Airway segmentation from 3D chest CT volumes based on volume of interest using gradient vector flow
Authors Qier Meng, Takayuki Kitasaka, Masahiro Oda, Junji Ueno, Kensaku Mori
Abstract Some lung diseases are related to bronchial airway structures and morphology. Although airway segmentation from chest CT volumes is an important task in computer-aided diagnosis and surgery assistance systems for the chest, complete 3-D airway structure segmentation is quite challenging due to the airway’s complex tree-like structure. In this paper, we propose a new airway segmentation method from 3D chest CT volumes based on volumes of interest (VOI) using gradient vector flow (GVF). This method segments the bronchial regions by applying the cavity enhancement filter (CEF) to trace the bronchial tree structure from the trachea. It uses the CEF in each VOI to segment each branch, and a tube-likeness function based on GVF together with the GVF magnitude map in each VOI is utilized to assist in predicting the positions and directions of child branches. By calculating the tube-likeness function based on GVF and the GVF magnitude map, the airway-like candidate structures are identified and their centrelines are extracted. Based on the extracted centrelines, we can detect the branch points of the bifurcations and the directions of the airway branches at the next level. At the same time, leakage detection is performed to avoid leakage by analysing the pixel information and the shape information of the airway candidate regions extracted in each VOI. Finally, we unify all of the extracted bronchial regions to form an integrated airway tree. Preliminary experiments using four cases of chest CT volumes demonstrated that the proposed method can extract more bronchial branches in comparison with other methods.
Tasks
Published 2017-04-26
URL http://arxiv.org/abs/1704.08030v1
PDF http://arxiv.org/pdf/1704.08030v1.pdf
PWC https://paperswithcode.com/paper/airway-segmentation-from-3d-chest-ct-volumes
Repo
Framework
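A toy version of region growing with leakage detection conveys the flavor of the per-VOI safeguard: growth proceeds voxel by voxel, and a sudden explosion of the wavefront is flagged as leakage. The intensity threshold and wavefront-size criterion below are illustrative stand-ins for the paper's pixel- and shape-based analysis:

```python
from collections import deque

def grow_with_leak_check(volume, seed, threshold, max_growth=10):
    """Breadth-first region growing with a crude leakage check.

    volume: nested lists indexed [z][y][x] of intensities.
    Voxels below `threshold` are airway-like. If one wavefront
    step adds more than `max_growth` voxels, growth aborts and
    leakage is reported. Returns (region, leaked).
    """
    region = {seed}
    frontier = deque([seed])
    while frontier:
        next_frontier = []
        for _ in range(len(frontier)):
            z, y, x = frontier.popleft()
            for dz, dy, dx in [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
                               (0, -1, 0), (0, 0, 1), (0, 0, -1)]:
                nz, ny, nx = z + dz, y + dy, x + dx
                if (0 <= nz < len(volume) and 0 <= ny < len(volume[0])
                        and 0 <= nx < len(volume[0][0])
                        and (nz, ny, nx) not in region
                        and volume[nz][ny][nx] < threshold):
                    region.add((nz, ny, nx))
                    next_frontier.append((nz, ny, nx))
        if len(next_frontier) > max_growth:
            return region, True   # wavefront exploded: leakage detected
        frontier.extend(next_frontier)
    return region, False
```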

SurfCut: Surfaces of Minimal Paths From Topological Structures

Title SurfCut: Surfaces of Minimal Paths From Topological Structures
Authors Marei Algarni, Ganesh Sundaramoorthi
Abstract We present SurfCut, an algorithm for extracting a smooth, simple surface with an unknown 3D curve boundary from a noisy 3D image and a seed point. Our method is built on the novel observation that certain ridge curves of a function defined on a front propagated using the Fast Marching algorithm lie on the surface. Our method extracts and cuts these ridges to form the surface boundary. Our surface extraction algorithm is built on the novel observation that the surface lies in a valley of the distance from Fast Marching. We show that the resulting surface is a collection of minimal paths. Using the framework of cubical complexes and Morse theory, we design algorithms to extract these critical structures robustly. Experiments on three 3D datasets show the robustness of our method, and that it achieves higher accuracy with lower computational cost than state-of-the-art.
Tasks
Published 2017-04-30
URL http://arxiv.org/abs/1705.00301v1
PDF http://arxiv.org/pdf/1705.00301v1.pdf
PWC https://paperswithcode.com/paper/surfcut-surfaces-of-minimal-paths-from
Repo
Framework
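The minimal-path machinery underlying Fast Marching can be sketched on a discrete grid with Dijkstra's algorithm: each cell's value is the cheapest accumulated weight from a seed, a grid-graph simplification of the continuous Eikonal solver the paper uses:

```python
import heapq

def minimal_path_cost(weights, start):
    """Dijkstra on a 2-D grid: cheapest accumulated weight from `start`.

    weights: 2-D list of nonnegative cell weights; start: (row, col).
    Returns a dict mapping each reachable cell to its minimal path
    cost (the discrete analogue of a Fast Marching distance map).
    """
    rows, cols = len(weights), len(weights[0])
    dist = {start: weights[start[0]][start[1]]}
    heap = [(dist[start], start)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if d > dist.get((r, c), float("inf")):
            continue  # stale heap entry
        for dr, dc in [(1, 0), (-1, 0), (0, 1), (0, -1)]:
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + weights[nr][nc]
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    heapq.heappush(heap, (nd, (nr, nc)))
    return dist
```

Ridges and valleys of such a distance map are exactly the kind of critical structures the SurfCut pipeline extracts.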

A State-Space Approach to Dynamic Nonnegative Matrix Factorization

Title A State-Space Approach to Dynamic Nonnegative Matrix Factorization
Authors Nasser Mohammadiha, Paris Smaragdis, Ghazaleh Panahandeh, Simon Doclo
Abstract Nonnegative matrix factorization (NMF) has been actively investigated and used in a wide range of problems in the past decade. A significant amount of attention has been given to develop NMF algorithms that are suitable to model time series with strong temporal dependencies. In this paper, we propose a novel state-space approach to perform dynamic NMF (D-NMF). In the proposed probabilistic framework, the NMF coefficients act as the state variables and their dynamics are modeled using a multi-lag nonnegative vector autoregressive (N-VAR) model within the process equation. We use expectation maximization and propose a maximum-likelihood estimation framework to estimate the basis matrix and the N-VAR model parameters. Interestingly, the N-VAR model parameters are obtained by simply applying NMF. Moreover, we derive a maximum a posteriori estimate of the state variables (i.e., the NMF coefficients) that is based on a prediction step and an update step, similarly to the Kalman filter. We illustrate the benefits of the proposed approach using different numerical simulations where D-NMF significantly outperforms its static counterpart. Experimental results for three different applications show that the proposed approach outperforms two state-of-the-art NMF approaches that exploit temporal dependencies, namely a nonnegative hidden Markov model and a frame stacking approach, while it requires less memory and computational power.
Tasks Time Series
Published 2017-08-31
URL http://arxiv.org/abs/1709.00025v1
PDF http://arxiv.org/pdf/1709.00025v1.pdf
PWC https://paperswithcode.com/paper/a-state-space-approach-to-dynamic-nonnegative
Repo
Framework
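The prediction/update structure of D-NMF can be sketched for a single frame: a lag-1 N-VAR prediction of the coefficients followed by a multiplicative update that balances data fit against the prediction. This is a hypothetical simplification (one lag, a quadratic prior with weight `lam`), not the paper's full probabilistic derivation:

```python
import numpy as np

def dnmf_step(W, v, h_prev, A, lam=0.1, n_iter=50):
    """One prediction/update cycle for a frame's NMF coefficients.

    W: (F, K) nonnegative basis; v: (F,) observed frame;
    h_prev: (K,) previous coefficients; A: (K, K) nonnegative
    lag-1 N-VAR transition matrix. Minimizes a data-fit term plus
    lam * ||h - h_pred||^2 via multiplicative updates.
    """
    h_pred = A @ h_prev                      # prediction step (N-VAR model)
    h = h_pred.copy() + 1e-12                # keep strictly positive
    for _ in range(n_iter):                  # update step (multiplicative)
        num = W.T @ v + lam * h_pred
        den = W.T @ (W @ h) + lam * h + 1e-12
        h *= num / den
    return h
```

The two-step shape mirrors the Kalman-filter analogy in the abstract: predict from dynamics, then correct toward the observation.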

Speeding Up Latent Variable Gaussian Graphical Model Estimation via Nonconvex Optimizations

Title Speeding Up Latent Variable Gaussian Graphical Model Estimation via Nonconvex Optimizations
Authors Pan Xu, Jian Ma, Quanquan Gu
Abstract We study the estimation of the latent variable Gaussian graphical model (LVGGM), where the precision matrix is the superposition of a sparse matrix and a low-rank matrix. In order to speed up the estimation of the sparse plus low-rank components, we propose a sparsity constrained maximum likelihood estimator based on matrix factorization, and an efficient alternating gradient descent algorithm with hard thresholding to solve it. Our algorithm is orders of magnitude faster than the convex relaxation based methods for LVGGM. In addition, we prove that our algorithm is guaranteed to linearly converge to the unknown sparse and low-rank components up to the optimal statistical precision. Experiments on both synthetic and genomic data demonstrate the superiority of our algorithm over the state-of-the-art algorithms and corroborate our theory.
Tasks
Published 2017-02-28
URL http://arxiv.org/abs/1702.08651v1
PDF http://arxiv.org/pdf/1702.08651v1.pdf
PWC https://paperswithcode.com/paper/speeding-up-latent-variable-gaussian
Repo
Framework
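The sparse-plus-low-rank idea can be sketched with generic alternating minimization: a truncated-SVD step for the low-rank part and a top-k hard-thresholding step for the sparse part. This is an illustrative decomposition of a plain matrix, not the paper's likelihood-based LVGGM estimator or its convergence-guaranteed algorithm:

```python
import numpy as np

def sparse_plus_lowrank(M, rank, sparsity, n_iter=30):
    """Split M into a rank-`rank` part L and a `sparsity`-sparse part S.

    Alternates two exact block updates: L is the best rank-`rank`
    fit of M - S (truncated SVD), and S keeps the `sparsity`
    largest-magnitude entries of M - L (hard thresholding), so the
    objective ||M - L - S||_F is non-increasing.
    """
    S = np.zeros_like(M)
    for _ in range(n_iter):
        # low-rank step: truncated SVD of the de-sparsified matrix
        U, d, Vt = np.linalg.svd(M - S, full_matrices=False)
        L = (U[:, :rank] * d[:rank]) @ Vt[:rank]
        # sparse step: keep the top-k residual entries by magnitude
        R = M - L
        idx = np.argsort(np.abs(R), axis=None)[-sparsity:]
        S = np.zeros_like(M)
        S.flat[idx] = R.flat[idx]
    return L, S
```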

Interleaver Design for Deep Neural Networks

Title Interleaver Design for Deep Neural Networks
Authors Sourya Dey, Peter A. Beerel, Keith M. Chugg
Abstract We propose a class of interleavers for a novel deep neural network (DNN) architecture that uses algorithmically pre-determined, structured sparsity to significantly lower memory and computational requirements, and speed up training. The interleavers guarantee clash-free memory accesses to eliminate idle operational cycles, optimize spread and dispersion to improve network performance, and are designed to ease the complexity of memory address computations in hardware. We present a design algorithm with mathematical proofs for these properties. We also explore interleaver variations and analyze the behavior of neural networks as a function of interleaver metrics.
Tasks Mathematical Proofs
Published 2017-11-18
URL http://arxiv.org/abs/1711.06935v3
PDF http://arxiv.org/pdf/1711.06935v3.pdf
PWC https://paperswithcode.com/paper/interleaver-design-for-deep-neural-networks
Repo
Framework
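A classic row-in/column-out block interleaver shows the kind of structured, hardware-friendly permutation being discussed; it is a standard textbook construction, not the paper's clash-free design:

```python
def block_interleave(seq, rows, cols):
    """Row-in, column-out block interleaver.

    Conceptually writes `seq` row-wise into a rows x cols array and
    reads it back column-wise, spreading adjacent elements apart by
    `rows` positions in the output.
    """
    assert len(seq) == rows * cols, "sequence must fill the array exactly"
    return [seq[r * cols + c] for c in range(cols) for r in range(rows)]
```

Because the permutation is computed from indices alone, the memory address pattern is fully predetermined, which is the property the paper's interleavers exploit in hardware.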

Ridesourcing Car Detection by Transfer Learning

Title Ridesourcing Car Detection by Transfer Learning
Authors Leye Wang, Xu Geng, Jintao Ke, Chen Peng, Xiaojuan Ma, Daqing Zhang, Qiang Yang
Abstract Ridesourcing platforms like Uber and Didi are becoming more and more popular around the world. However, unauthorized ridesourcing activities taking advantage of the sharing economy can greatly impair the healthy development of this emerging industry. As the first step to regulate on-demand ride services and eliminate the black market, we design a method to detect ridesourcing cars from a pool of cars based on their trajectories. Since licensed ridesourcing car traces are not openly available and may be completely missing in some cities due to legal issues, we turn to transferring knowledge from public transport open data, i.e., taxis and buses, to ridesourcing detection among ordinary vehicles. We propose a two-stage transfer learning framework. In Stage 1, we take taxi and bus data as input to learn a random forest (RF) classifier using trajectory features shared by taxis/buses and ridesourcing/other cars. Then, we use the RF to label all the candidate cars. In Stage 2, leveraging the subset of high-confidence labels from the previous stage as input, we further learn a convolutional neural network (CNN) classifier for ridesourcing detection, and iteratively refine the RF and CNN, as well as the feature set, via a co-training process. Finally, we use the resulting ensemble of the RF and CNN to identify the ridesourcing cars in the candidate pool. Experiments on real car, taxi and bus traces show that our transfer learning framework, with no need for a pre-labeled ridesourcing dataset, can achieve accuracy similar to that of supervised learning methods.
Tasks Transfer Learning
Published 2017-05-23
URL http://arxiv.org/abs/1705.08409v1
PDF http://arxiv.org/pdf/1705.08409v1.pdf
PWC https://paperswithcode.com/paper/ridesourcing-car-detection-by-transfer
Repo
Framework
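Stage 1 relies on trajectory features that taxis/buses and ridesourcing cars share. A hypothetical feature extractor over timestamped GPS samples might look like this (the specific features and units are assumptions for illustration, not the paper's feature set):

```python
import math

def trajectory_features(points):
    """Simple shared trajectory features for one car's trace.

    points: list of (t_seconds, x_m, y_m) samples in planar meters.
    Returns (mean speed, max speed, stop ratio), where a segment
    slower than 0.5 m/s counts as a stop.
    """
    speeds = []
    for (t0, x0, y0), (t1, x1, y1) in zip(points, points[1:]):
        dt = t1 - t0
        if dt > 0:
            speeds.append(math.hypot(x1 - x0, y1 - y0) / dt)
    if not speeds:
        return 0.0, 0.0, 0.0
    stop_ratio = sum(s < 0.5 for s in speeds) / len(speeds)
    return sum(speeds) / len(speeds), max(speeds), stop_ratio
```

Feature vectors of this kind, computed identically for labeled taxi/bus traces and unlabeled candidate cars, are what make the cross-domain transfer in Stage 1 possible.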

Real Time Analytics: Algorithms and Systems

Title Real Time Analytics: Algorithms and Systems
Authors Arun Kejariwal, Sanjeev Kulkarni, Karthik Ramasamy
Abstract Velocity is one of the 4 Vs commonly used to characterize Big Data. In this regard, Forrester remarked the following in Q3 2014: “The high velocity, white-water flow of data from innumerable real-time data sources such as market data, Internet of Things, mobile, sensors, click-stream, and even transactions remain largely unnavigated by most firms. The opportunity to leverage streaming analytics has never been greater.” Example use cases of streaming analytics include, but are not limited to: (a) visualization of business metrics in real-time, (b) facilitating highly personalized experiences, and (c) expediting response during emergencies. Streaming analytics is extensively used in a wide variety of domains such as healthcare, e-commerce, financial services, telecommunications, energy and utilities, manufacturing, government and transportation. In this tutorial, we shall present an in-depth overview of the streaming analytics landscape - applications, algorithms and platforms. We shall walk through how the field has evolved over the last decade and then discuss the current challenges - the impact of the other three Vs, viz., Volume, Variety and Veracity, on Big Data streaming analytics. The tutorial is intended for both researchers and practitioners in the industry. We shall also present the state of affairs of streaming analytics at Twitter.
Tasks
Published 2017-08-07
URL http://arxiv.org/abs/1708.02621v1
PDF http://arxiv.org/pdf/1708.02621v1.pdf
PWC https://paperswithcode.com/paper/real-time-analytics-algorithms-and-systems
Repo
Framework
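Reservoir sampling is a staple of the streaming-algorithm toolbox such a tutorial surveys: it maintains a uniform random sample of k items from a stream of unknown length in one pass and O(k) memory:

```python
import random

def reservoir_sample(stream, k, seed=0):
    """Uniform random sample of k items from a stream of unknown length.

    The first k items fill the reservoir; the i-th item thereafter
    replaces a random slot with probability k/(i+1), which yields a
    uniform sample over everything seen so far.
    """
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)   # fill phase
        else:
            j = rng.randint(0, i)    # replacement happens with prob k/(i+1)
            if j < k:
                reservoir[j] = item
    return reservoir
```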

Grammar Induction for Minimalist Grammars using Variational Bayesian Inference: A Technical Report

Title Grammar Induction for Minimalist Grammars using Variational Bayesian Inference: A Technical Report
Authors Eva Portelance, Amelia Bruno, Daniel Harasim, Leon Bergen, Timothy J. O’Donnell
Abstract The following technical report presents a formal approach to probabilistic minimalist grammar parameter estimation. We describe a formalization of a minimalist grammar. We then present an algorithm for the application of variational Bayesian inference to this formalization.
Tasks Bayesian Inference
Published 2017-10-31
URL https://arxiv.org/abs/1710.11350v3
PDF https://arxiv.org/pdf/1710.11350v3.pdf
PWC https://paperswithcode.com/paper/grammar-induction-for-mildly-context
Repo
Framework

Detailed Surface Geometry and Albedo Recovery from RGB-D Video Under Natural Illumination

Title Detailed Surface Geometry and Albedo Recovery from RGB-D Video Under Natural Illumination
Authors Xinxin Zuo, Sen Wang, Jiangbin Zheng, Ruigang Yang
Abstract In this paper we present a novel approach for depth map enhancement from an RGB-D video sequence. The basic idea is to exploit the shading information in the color image. Instead of making assumptions about surface albedo or controlled object motion and lighting, we use the lighting variations introduced by casual object movement. We are effectively calculating photometric stereo from a moving object under natural illumination. The key technical challenge is to establish correspondences over the entire image set. We therefore develop a lighting-insensitive robust pixel matching technique that outperforms optical flow methods in the presence of lighting variations. In addition, we present an expectation-maximization framework to recover the surface normal and albedo simultaneously, without any regularization term. We have validated our method on both synthetic and real datasets to show its superior performance on both surface detail recovery and intrinsic decomposition.
Tasks Optical Flow Estimation
Published 2017-02-06
URL https://arxiv.org/abs/1702.01486v4
PDF https://arxiv.org/pdf/1702.01486v4.pdf
PWC https://paperswithcode.com/paper/detailed-surface-geometry-and-albedo-recovery
Repo
Framework
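The Lambertian relation at the heart of photometric stereo can be sketched per pixel: with known lighting directions, the albedo-scaled normal is a least-squares solve. This shows the core relation the paper's EM framework builds on, not its full pipeline:

```python
import numpy as np

def lambertian_normal(L, I):
    """Least-squares photometric stereo for one pixel.

    Lambertian model: I = L @ (albedo * n), with L an (M, 3) matrix
    of lighting directions over M frames and I an (M,) vector of
    observed intensities. Returns (albedo, unit normal).
    """
    g, *_ = np.linalg.lstsq(L, I, rcond=None)  # albedo-scaled normal
    albedo = np.linalg.norm(g)
    n = g / albedo if albedo > 0 else g
    return albedo, n
```

In the paper's setting the "frames" come from casual object motion under natural light, so establishing which pixels correspond across frames is the hard part this solve presupposes.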

CHAM: action recognition using convolutional hierarchical attention model

Title CHAM: action recognition using convolutional hierarchical attention model
Authors Shiyang Yan, Jeremy S. Smith, Wenjin Lu, Bailing Zhang
Abstract Recently, the soft attention mechanism, which was originally proposed in language processing, has been applied to computer vision tasks like image captioning. This paper presents improvements to the soft attention model by combining a convolutional LSTM with a hierarchical system architecture to recognize action categories in videos. We call this model the Convolutional Hierarchical Attention Model (CHAM). The model applies a convolutional operation inside the LSTM cell and an attention map generation process to recognize actions. The hierarchical architecture of this model is able to reason explicitly over multiple granularities of action categories. The proposed architecture achieved improved results on three publicly available datasets: the UCF sports dataset, the Olympic sports dataset and the HMDB51 dataset.
Tasks Image Captioning, Temporal Action Localization
Published 2017-05-09
URL http://arxiv.org/abs/1705.03146v2
PDF http://arxiv.org/pdf/1705.03146v2.pdf
PWC https://paperswithcode.com/paper/cham-action-recognition-using-convolutional
Repo
Framework
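The attention-map idea can be sketched independently of the ConvLSTM cell: score each spatial location of a feature map against a query vector, softmax-normalize into an attention map, and pool the attended feature. The shapes and the dot-product scoring are illustrative assumptions, not CHAM's exact architecture:

```python
import numpy as np

def spatial_attention(features, query):
    """Soft spatial attention over a feature map.

    features: (H, W, C) feature map; query: (C,) vector.
    Returns the attended (C,) feature and the (H, W) attention map,
    which is a softmax over per-location dot-product scores.
    """
    H, W, C = features.shape
    scores = features.reshape(-1, C) @ query      # (H*W,) location scores
    scores -= scores.max()                        # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum()                      # softmax over locations
    attended = weights @ features.reshape(-1, C)  # weighted feature pool
    return attended, weights.reshape(H, W)
```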

Automatic Viseme Vocabulary Construction to Enhance Continuous Lip-reading

Title Automatic Viseme Vocabulary Construction to Enhance Continuous Lip-reading
Authors Adriana Fernandez-Lopez, Federico M. Sukno
Abstract Speech is the most common communication method between humans and involves the perception of both auditory and visual channels. Automatic speech recognition focuses on interpreting the audio signals, but it has been demonstrated that video can provide information that is complementary to the audio. Thus, the study of automatic lip-reading is important and is still an open problem. One of the key challenges is the definition of the visual elementary units (the visemes) and their vocabulary. Many researchers have analyzed the importance of the phoneme to viseme mapping and have proposed viseme vocabularies with lengths between 11 and 15 visemes. These viseme vocabularies have usually been manually defined by their linguistic properties and in some cases using decision trees or clustering techniques. In this work, we focus on the automatic construction of an optimal viseme vocabulary based on the association of phonemes with similar appearance. To this end, we construct an automatic system that uses local appearance descriptors to extract the main characteristics of the mouth region and HMMs to model the statistical relations of both viseme and phoneme sequences. To compare the performance of the system, different descriptors (PCA, DCT and SIFT) are analyzed. We test our system on a Spanish corpus of continuous speech. Our results indicate that we are able to recognize approximately 58% of the visemes, 47% of the phonemes and 23% of the words in a continuous speech scenario, and that the optimal viseme vocabulary for Spanish is composed of 20 visemes.
Tasks Speech Recognition
Published 2017-04-26
URL http://arxiv.org/abs/1704.08035v1
PDF http://arxiv.org/pdf/1704.08035v1.pdf
PWC https://paperswithcode.com/paper/automatic-viseme-vocabulary-construction-to
Repo
Framework
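The automatic vocabulary construction groups phonemes with similar visual appearance. A greedy agglomerative sketch (Euclidean distance between mean appearance features, merging until the target vocabulary size) stands in for the paper's HMM-based procedure; the feature values are hypothetical:

```python
def cluster_visemes(phoneme_feats, n_visemes):
    """Greedy agglomerative grouping of phonemes into visemes.

    phoneme_feats: dict mapping phoneme -> appearance feature vector.
    Repeatedly merges the two clusters whose mean features are
    closest (squared Euclidean distance) until n_visemes remain.
    Returns the phoneme groups, each sorted alphabetically.
    """
    clusters = [([p], list(f)) for p, f in phoneme_feats.items()]
    while len(clusters) > n_visemes:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = sum((a - b) ** 2
                        for a, b in zip(clusters[i][1], clusters[j][1]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        pi, fi = clusters[i]
        pj, fj = clusters.pop(j)
        ni, nj = len(pi), len(pj)
        # merged cluster keeps the size-weighted mean feature
        merged = [(a * ni + b * nj) / (ni + nj) for a, b in zip(fi, fj)]
        clusters[i] = (pi + pj, merged)
    return [sorted(p) for p, _ in clusters]
```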