Paper Group ANR 486
A Taught-Obesrve-Ask (TOA) Method for Object Detection with Critical Supervision. DCTM: Discrete-Continuous Transformation Matching for Semantic Flow. Highly Efficient Human Action Recognition with Quantum Genetic Algorithm Optimized Support Vector Machine. Human perception in computer vision. Airway segmentation from 3D chest CT volumes based on v …
A Taught-Obesrve-Ask (TOA) Method for Object Detection with Critical Supervision
Title | A Taught-Obesrve-Ask (TOA) Method for Object Detection with Critical Supervision |
Authors | Chi-Hao Wu, Qin Huang, Siyang Li, C. -C. Jay Kuo |
Abstract | Inspired by a child’s learning experience - taught first, then followed by observation and questioning - we investigate a critically supervised learning methodology for object detection in this work. Specifically, we propose a taught-observe-ask (TOA) method that consists of several novel components such as negative object proposal, critical example mining, and machine-guided question-answer (QA) labeling. To consider labeling time and performance jointly, new evaluation methods are developed to compare the performance of the TOA method with fully and weakly supervised learning methods. Extensive experiments are conducted on the PASCAL VOC and the Caltech benchmark datasets. The TOA method significantly improves on weakly supervised performance yet demands only about 3-6% of the labeling time of full supervision. The effectiveness of each novel component is also analyzed. |
Tasks | Object Detection |
Published | 2017-11-03 |
URL | http://arxiv.org/abs/1711.01043v1 |
http://arxiv.org/pdf/1711.01043v1.pdf | |
PWC | https://paperswithcode.com/paper/a-taught-obesrve-ask-toa-method-for-object |
Repo | |
Framework | |
DCTM: Discrete-Continuous Transformation Matching for Semantic Flow
Title | DCTM: Discrete-Continuous Transformation Matching for Semantic Flow |
Authors | Seungryong Kim, Dongbo Min, Stephen Lin, Kwanghoon Sohn |
Abstract | Techniques for dense semantic correspondence have provided limited ability to deal with the geometric variations that commonly exist between semantically similar images. While variations due to scale and rotation have been examined, practical solutions are lacking for more complex deformations such as affine transformations because of the tremendous size of the associated solution space. To address this problem, we present a discrete-continuous transformation matching (DCTM) framework where dense affine transformation fields are inferred through a discrete label optimization in which the labels are iteratively updated via continuous regularization. In this way, our approach draws solutions from the continuous space of affine transformations in a manner that can be computed efficiently through constant-time edge-aware filtering and a proposed affine-varying CNN-based descriptor. Experimental results show that this model outperforms the state-of-the-art methods for dense semantic correspondence on various benchmarks. |
Tasks | |
Published | 2017-07-18 |
URL | http://arxiv.org/abs/1707.05471v1 |
http://arxiv.org/pdf/1707.05471v1.pdf | |
PWC | https://paperswithcode.com/paper/dctm-discrete-continuous-transformation |
Repo | |
Framework | |
Highly Efficient Human Action Recognition with Quantum Genetic Algorithm Optimized Support Vector Machine
Title | Highly Efficient Human Action Recognition with Quantum Genetic Algorithm Optimized Support Vector Machine |
Authors | Yafeng Liu, Shimin Feng, Zhikai Zhao, Enjie Ding |
Abstract | In this paper we propose the use of a quantum genetic algorithm to optimize the support vector machine (SVM) for human action recognition. The Microsoft Kinect sensor can be used for skeleton tracking, which provides the joints’ position data. However, how to extract the motion features for representing the dynamics of a human skeleton is still a challenge due to the complexity of human motion. We present a highly efficient feature extraction method for action classification, that is, using the joint angles to represent a human skeleton and calculating the variance of each angle during an action time window. Using the proposed representation, we compared the human action classification accuracy of two approaches: the optimized SVM based on the quantum genetic algorithm and the conventional SVM with grid search. Experimental results on the MSR-12 dataset show that the conventional SVM achieved an accuracy of 93.85%. The proposed approach outperforms the conventional method with an accuracy of 96.15%. |
Tasks | Action Classification, Temporal Action Localization |
Published | 2017-11-27 |
URL | http://arxiv.org/abs/1711.09511v2 |
http://arxiv.org/pdf/1711.09511v2.pdf | |
PWC | https://paperswithcode.com/paper/highly-efficient-human-action-recognition |
Repo | |
Framework | |
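The paper does not include code, but the feature it describes - one variance value per joint angle over an action time window - is simple enough to sketch. The point format, joint triples, and function names below are illustrative assumptions, not the authors' implementation:

```python
import math

def joint_angle(a, b, c):
    """Angle (radians) at joint b formed by 3D points a-b-c."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(v1, v2))
    n1 = math.sqrt(sum(x * x for x in v1))
    n2 = math.sqrt(sum(x * x for x in v2))
    # Clamp to guard against floating-point drift outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, dot / (n1 * n2))))

def angle_variance_features(frames, triples):
    """One variance value per joint-angle triple over the action window.

    frames  : list of skeleton frames, each a list of 3D joint positions
    triples : (i, j, k) joint indices, angle measured at joint j
    """
    feats = []
    for (i, j, k) in triples:
        angles = [joint_angle(f[i], f[j], f[k]) for f in frames]
        mean = sum(angles) / len(angles)
        feats.append(sum((a - mean) ** 2 for a in angles) / len(angles))
    return feats
```

A static pose yields zero variance for every angle, while a moving limb yields a positive value; the resulting fixed-length vector is what would be fed to the SVM.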
Human perception in computer vision
Title | Human perception in computer vision |
Authors | Ron Dekel |
Abstract | Computer vision has made remarkable progress in recent years. Deep neural network (DNN) models optimized to identify objects in images exhibit unprecedented task-trained accuracy and, remarkably, some generalization ability: new visual problems can now be solved more easily based on previous learning. Biological vision (learned in life and through evolution) is also accurate and general-purpose. Is it possible that these different learning regimes converge to similar problem-dependent optimal computations? We therefore asked whether the human system-level computation of visual perception has DNN correlates and considered several anecdotal test cases. We found that perceptual sensitivity to image changes has DNN mid-computation correlates, while sensitivity to segmentation, crowding and shape has DNN end-computation correlates. Our results quantify the applicability of using DNN computation to estimate perceptual loss, and are consistent with the fascinating theoretical view that properties of human perception are a consequence of architecture-independent visual learning. |
Tasks | |
Published | 2017-01-17 |
URL | http://arxiv.org/abs/1701.04674v1 |
http://arxiv.org/pdf/1701.04674v1.pdf | |
PWC | https://paperswithcode.com/paper/human-perception-in-computer-vision |
Repo | |
Framework | |
Airway segmentation from 3D chest CT volumes based on volume of interest using gradient vector flow
Title | Airway segmentation from 3D chest CT volumes based on volume of interest using gradient vector flow |
Authors | Qier Meng, Takayuki Kitasaka, Masahiro Oda, Junji Ueno, Kensaku Mori |
Abstract | Some lung diseases are related to bronchial airway structures and morphology. Although airway segmentation from chest CT volumes is an important task in computer-aided diagnosis and surgery assistance systems for the chest, complete 3-D airway structure segmentation is a quite challenging task due to its complex tree-like structure. In this paper, we propose a new airway segmentation method from 3D chest CT volumes based on volumes of interest (VOI) using gradient vector flow (GVF). This method segments the bronchial regions by applying the cavity enhancement filter (CEF) to trace the bronchial tree structure from the trachea. It uses the CEF in each VOI to segment each branch, and a tube-likeness function based on GVF and the GVF magnitude map in each VOI are utilized to assist in predicting the positions and directions of child branches. By calculating the tube-likeness function based on GVF and the GVF magnitude map, the airway-like candidate structures are identified and their centrelines are extracted. Based on the extracted centrelines, we can detect the branch points of the bifurcations and directions of the airway branches in the next level. At the same time, leakage detection is performed to avoid leakage by analysing the pixel information and the shape information of airway candidate regions extracted in the VOI. Finally, we unify all of the extracted bronchial regions to form an integrated airway tree. Preliminary experiments using four cases of chest CT volumes demonstrated that the proposed method can extract more bronchial branches in comparison with other methods. |
Tasks | |
Published | 2017-04-26 |
URL | http://arxiv.org/abs/1704.08030v1 |
http://arxiv.org/pdf/1704.08030v1.pdf | |
PWC | https://paperswithcode.com/paper/airway-segmentation-from-3d-chest-ct-volumes |
Repo | |
Framework | |
SurfCut: Surfaces of Minimal Paths From Topological Structures
Title | SurfCut: Surfaces of Minimal Paths From Topological Structures |
Authors | Marei Algarni, Ganesh Sundaramoorthi |
Abstract | We present SurfCut, an algorithm for extracting a smooth, simple surface with an unknown 3D curve boundary from a noisy 3D image and a seed point. Our method is built on the novel observation that certain ridge curves of a function defined on a front propagated using the Fast Marching algorithm lie on the surface. Our method extracts and cuts these ridges to form the surface boundary. Our surface extraction algorithm is built on the novel observation that the surface lies in a valley of the distance from Fast Marching. We show that the resulting surface is a collection of minimal paths. Using the framework of cubical complexes and Morse theory, we design algorithms to extract these critical structures robustly. Experiments on three 3D datasets show the robustness of our method, and that it achieves higher accuracy with lower computational cost than state-of-the-art. |
Tasks | |
Published | 2017-04-30 |
URL | http://arxiv.org/abs/1705.00301v1 |
http://arxiv.org/pdf/1705.00301v1.pdf | |
PWC | https://paperswithcode.com/paper/surfcut-surfaces-of-minimal-paths-from |
Repo | |
Framework | |
A State-Space Approach to Dynamic Nonnegative Matrix Factorization
Title | A State-Space Approach to Dynamic Nonnegative Matrix Factorization |
Authors | Nasser Mohammadiha, Paris Smaragdis, Ghazaleh Panahandeh, Simon Doclo |
Abstract | Nonnegative matrix factorization (NMF) has been actively investigated and used in a wide range of problems in the past decade. A significant amount of attention has been given to develop NMF algorithms that are suitable to model time series with strong temporal dependencies. In this paper, we propose a novel state-space approach to perform dynamic NMF (D-NMF). In the proposed probabilistic framework, the NMF coefficients act as the state variables and their dynamics are modeled using a multi-lag nonnegative vector autoregressive (N-VAR) model within the process equation. We use expectation maximization and propose a maximum-likelihood estimation framework to estimate the basis matrix and the N-VAR model parameters. Interestingly, the N-VAR model parameters are obtained by simply applying NMF. Moreover, we derive a maximum a posteriori estimate of the state variables (i.e., the NMF coefficients) that is based on a prediction step and an update step, similarly to the Kalman filter. We illustrate the benefits of the proposed approach using different numerical simulations where D-NMF significantly outperforms its static counterpart. Experimental results for three different applications show that the proposed approach outperforms two state-of-the-art NMF approaches that exploit temporal dependencies, namely a nonnegative hidden Markov model and a frame stacking approach, while it requires less memory and computational power. |
Tasks | Time Series |
Published | 2017-08-31 |
URL | http://arxiv.org/abs/1709.00025v1 |
http://arxiv.org/pdf/1709.00025v1.pdf | |
PWC | https://paperswithcode.com/paper/a-state-space-approach-to-dynamic-nonnegative |
Repo | |
Framework | |
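The filtering structure the abstract describes - an N-VAR prediction step followed by an update step, similar to a Kalman filter - can be sketched minimally. A Euclidean multiplicative NMF update and a simple convex blend stand in for the paper's MAP update; the function names, the `beta` weight, and the initialization from the prediction are all assumptions for illustration:

```python
def nmf_update(W, x, h, n_iter=100):
    """Multiplicative updates for min ||x - W h||^2 with h >= 0.
    W: m x r basis (list of rows), x: length-m observation, h: length-r coeffs."""
    m, r = len(W), len(W[0])
    for _ in range(n_iter):
        Wtx = [sum(W[i][k] * x[i] for i in range(m)) for k in range(r)]
        Wh = [sum(W[i][k] * h[k] for k in range(r)) for i in range(m)]
        WtWh = [sum(W[i][k] * Wh[i] for i in range(m)) for k in range(r)]
        h = [h[k] * Wtx[k] / (WtWh[k] + 1e-12) for k in range(r)]
    return h

def dnmf_filter(W, A, h_lags, x, beta=0.5):
    """One D-NMF step: predict coefficients with a multi-lag nonnegative VAR,
    then update them against the new observation, Kalman-style.
    A: list of r x r lag matrices, h_lags: matching list of past coefficients."""
    r = len(h_lags[0])
    h_pred = [sum(sum(A[j][k], 0) * 0 + sum(A[j][k][l] * h_lags[j][l] for l in range(r))
              for j in range(len(A))) for k in range(r)]
    # Update step starts from the prediction (small offset keeps entries positive).
    h_obs = nmf_update(W, x, [v + 1e-6 for v in h_pred])
    return [(1 - beta) * o + beta * p for o, p in zip(h_obs, h_pred)]
```

With `beta = 0` this reduces to static NMF on the current frame; larger `beta` trusts the temporal prediction more, which is the qualitative trade-off the state-space formulation makes precise.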
Speeding Up Latent Variable Gaussian Graphical Model Estimation via Nonconvex Optimizations
Title | Speeding Up Latent Variable Gaussian Graphical Model Estimation via Nonconvex Optimizations |
Authors | Pan Xu, Jian Ma, Quanquan Gu |
Abstract | We study the estimation of the latent variable Gaussian graphical model (LVGGM), where the precision matrix is the superposition of a sparse matrix and a low-rank matrix. In order to speed up the estimation of the sparse plus low-rank components, we propose a sparsity constrained maximum likelihood estimator based on matrix factorization, and an efficient alternating gradient descent algorithm with hard thresholding to solve it. Our algorithm is orders of magnitude faster than the convex relaxation based methods for LVGGM. In addition, we prove that our algorithm is guaranteed to linearly converge to the unknown sparse and low-rank components up to the optimal statistical precision. Experiments on both synthetic and genomic data demonstrate the superiority of our algorithm over the state-of-the-art algorithms and corroborate our theory. |
Tasks | |
Published | 2017-02-28 |
URL | http://arxiv.org/abs/1702.08651v1 |
http://arxiv.org/pdf/1702.08651v1.pdf | |
PWC | https://paperswithcode.com/paper/speeding-up-latent-variable-gaussian |
Repo | |
Framework | |
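The hard-thresholding step used to keep the sparse component sparse between gradient updates is a standard projection; a minimal sketch (the dense-list representation is an assumption made for readability):

```python
def hard_threshold(M, s):
    """Project a matrix onto the set of matrices with at most s nonzeros:
    keep the s largest-magnitude entries, zero out the rest."""
    flat = sorted((abs(v) for row in M for v in row), reverse=True)
    cut = flat[s - 1] if s <= len(flat) else 0.0
    kept = 0
    out = []
    for row in M:
        new_row = []
        for v in row:
            if abs(v) >= cut and kept < s:
                new_row.append(v)
                kept += 1
            else:
                new_row.append(0.0)
        out.append(new_row)
    return out
```

In the alternating scheme the abstract describes, a step like this would follow each gradient update of the sparse factor, while the low-rank part is handled through its factorized form.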
Interleaver Design for Deep Neural Networks
Title | Interleaver Design for Deep Neural Networks |
Authors | Sourya Dey, Peter A. Beerel, Keith M. Chugg |
Abstract | We propose a class of interleavers for a novel deep neural network (DNN) architecture that uses algorithmically pre-determined, structured sparsity to significantly lower memory and computational requirements, and speed up training. The interleavers guarantee clash-free memory accesses to eliminate idle operational cycles, optimize spread and dispersion to improve network performance, and are designed to ease the complexity of memory address computations in hardware. We present a design algorithm with mathematical proofs for these properties. We also explore interleaver variations and analyze the behavior of neural networks as a function of interleaver metrics. |
Tasks | Mathematical Proofs |
Published | 2017-11-18 |
URL | http://arxiv.org/abs/1711.06935v3 |
http://arxiv.org/pdf/1711.06935v3.pdf | |
PWC | https://paperswithcode.com/paper/interleaver-design-for-deep-neural-networks |
Repo | |
Framework | |
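The clash-free property - consecutive interleaved reads landing in distinct memory banks so no cycle is wasted - can be illustrated with a simple stride interleaver. The stride construction and the `address % banks` bank mapping are illustrative assumptions, not the paper's design algorithm:

```python
from math import gcd

def stride_interleaver(n, stride):
    """Permutation i -> (stride * i) mod n; a valid permutation iff
    gcd(stride, n) == 1."""
    assert gcd(stride, n) == 1
    return [(stride * i) % n for i in range(n)]

def is_clash_free(perm, banks):
    """Check that every group of `banks` consecutive interleaved addresses
    hits `banks` distinct memory banks (bank of address a taken as a % banks)."""
    n = len(perm)
    for start in range(0, n, banks):
        window = [perm[i] % banks for i in range(start, min(start + banks, n))]
        if len(set(window)) != len(window):
            return False
    return True
```

A clash-free permutation lets the hardware service one whole window per cycle with no bank conflicts, which is the "no idle operational cycles" guarantee the abstract refers to.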
Ridesourcing Car Detection by Transfer Learning
Title | Ridesourcing Car Detection by Transfer Learning |
Authors | Leye Wang, Xu Geng, Jintao Ke, Chen Peng, Xiaojuan Ma, Daqing Zhang, Qiang Yang |
Abstract | Ridesourcing platforms like Uber and Didi are getting more and more popular around the world. However, unauthorized ridesourcing activities taking advantage of the sharing economy can greatly impair the healthy development of this emerging industry. As the first step to regulate on-demand ride services and eliminate the black market, we design a method to detect ridesourcing cars from a pool of cars based on their trajectories. Since licensed ridesourcing car traces are not openly available and may be completely missing in some cities due to legal issues, we turn to transferring knowledge from public transport open data, i.e., taxis and buses, to ridesourcing detection among ordinary vehicles. We propose a two-stage transfer learning framework. In Stage 1, we take taxi and bus data as input to learn a random forest (RF) classifier using trajectory features shared by taxis/buses and ridesourcing/other cars. Then, we use the RF to label all the candidate cars. In Stage 2, leveraging the subset of high-confidence labels from the previous stage as input, we further learn a convolutional neural network (CNN) classifier for ridesourcing detection, and iteratively refine RF and CNN, as well as the feature set, via a co-training process. Finally, we use the resulting ensemble of RF and CNN to identify the ridesourcing cars in the candidate pool. Experiments on real car, taxi and bus traces show that our transfer learning framework, with no need of a pre-labeled ridesourcing dataset, can achieve accuracy similar to that of supervised learning methods. |
Tasks | Transfer Learning |
Published | 2017-05-23 |
URL | http://arxiv.org/abs/1705.08409v1 |
http://arxiv.org/pdf/1705.08409v1.pdf | |
PWC | https://paperswithcode.com/paper/ridesourcing-car-detection-by-transfer |
Repo | |
Framework | |
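The bridge between Stage 1 and Stage 2 is confidence filtering: only candidate cars the RF scores confidently become training labels for the CNN. A sketch of that step, with the `score` interface and threshold as assumptions (the paper's classifiers are an RF and a CNN, not shown here):

```python
def pseudo_label(score, unlabeled, threshold=0.9):
    """Turn confident stage-1 scores into hard labels for stage 2.

    score     : callable mapping an example to an estimated probability of
                being a ridesourcing car (stand-in for RF predict_proba)
    unlabeled : candidate examples
    Returns (example, label) pairs; uncertain examples are dropped.
    """
    labeled = []
    for x in unlabeled:
        p = score(x)
        if p >= threshold:
            labeled.append((x, 1))
        elif p <= 1.0 - threshold:
            labeled.append((x, 0))
    return labeled
```

In the full co-training loop this filtering would run in both directions each round - RF labeling data for the CNN and vice versa - so each model trains on the examples its partner is most sure about.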
Real Time Analytics: Algorithms and Systems
Title | Real Time Analytics: Algorithms and Systems |
Authors | Arun Kejariwal, Sanjeev Kulkarni, Karthik Ramasamy |
Abstract | Velocity is one of the 4 Vs commonly used to characterize Big Data. In this regard, Forrester remarked the following in Q3 2014: “The high velocity, white-water flow of data from innumerable real-time data sources such as market data, Internet of Things, mobile, sensors, click-stream, and even transactions remain largely unnavigated by most firms. The opportunity to leverage streaming analytics has never been greater.” Example use cases of streaming analytics include, but are not limited to: (a) visualization of business metrics in real-time (b) facilitating highly personalized experiences (c) expediting response during emergencies. Streaming analytics is extensively used in a wide variety of domains such as healthcare, e-commerce, financial services, telecommunications, energy and utilities, manufacturing, government and transportation. In this tutorial, we shall present an in-depth overview of the streaming analytics landscape - applications, algorithms and platforms. We shall walk through how the field has evolved over the last decade and then discuss the current challenges - the impact of the other three Vs, viz., Volume, Variety and Veracity, on Big Data streaming analytics. The tutorial is intended for both researchers and practitioners in the industry. We shall also present the state of affairs of streaming analytics at Twitter. |
Tasks | |
Published | 2017-08-07 |
URL | http://arxiv.org/abs/1708.02621v1 |
http://arxiv.org/pdf/1708.02621v1.pdf | |
PWC | https://paperswithcode.com/paper/real-time-analytics-algorithms-and-systems |
Repo | |
Framework | |
Grammar Induction for Minimalist Grammars using Variational Bayesian Inference : A Technical Report
Title | Grammar Induction for Minimalist Grammars using Variational Bayesian Inference : A Technical Report |
Authors | Eva Portelance, Amelia Bruno, Daniel Harasim, Leon Bergen, Timothy J. O’Donnell |
Abstract | The following technical report presents a formal approach to probabilistic minimalist grammar parameter estimation. We describe a formalization of a minimalist grammar. We then present an algorithm for the application of variational Bayesian inference to this formalization. |
Tasks | Bayesian Inference |
Published | 2017-10-31 |
URL | https://arxiv.org/abs/1710.11350v3 |
https://arxiv.org/pdf/1710.11350v3.pdf | |
PWC | https://paperswithcode.com/paper/grammar-induction-for-mildly-context |
Repo | |
Framework | |
Detailed Surface Geometry and Albedo Recovery from RGB-D Video Under Natural Illumination
Title | Detailed Surface Geometry and Albedo Recovery from RGB-D Video Under Natural Illumination |
Authors | Xinxin Zuo, Sen Wang, Jiangbin Zheng, Ruigang Yang |
Abstract | In this paper we present a novel approach for depth map enhancement from an RGB-D video sequence. The basic idea is to exploit the shading information in the color image. Instead of making assumptions about surface albedo or controlled object motion and lighting, we use the lighting variations introduced by casual object movement. We are effectively calculating photometric stereo from a moving object under natural illumination. The key technical challenge is to establish correspondences over the entire image set. We therefore develop a lighting-insensitive robust pixel matching technique that outperforms optical flow methods in the presence of lighting variations. In addition we present an expectation-maximization framework to recover the surface normal and albedo simultaneously, without any regularization term. We have validated our method on both synthetic and real datasets to show its superior performance on both surface detail recovery and intrinsic decomposition. |
Tasks | Optical Flow Estimation |
Published | 2017-02-06 |
URL | https://arxiv.org/abs/1702.01486v4 |
https://arxiv.org/pdf/1702.01486v4.pdf | |
PWC | https://paperswithcode.com/paper/detailed-surface-geometry-and-albedo-recovery |
Repo | |
Framework | |
CHAM: action recognition using convolutional hierarchical attention model
Title | CHAM: action recognition using convolutional hierarchical attention model |
Authors | Shiyang Yan, Jeremy S. Smith, Wenjin Lu, Bailing Zhang |
Abstract | Recently, the soft attention mechanism, which was originally proposed in language processing, has been applied in computer vision tasks like image captioning. This paper presents improvements to the soft attention model by combining a convolutional LSTM with a hierarchical system architecture to recognize action categories in videos. We call this model the Convolutional Hierarchical Attention Model (CHAM). The model applies a convolutional operation inside the LSTM cell and an attention map generation process to recognize actions. The hierarchical architecture of this model is able to explicitly reason on multi-granularities of action categories. The proposed architecture achieved improved results on three publicly available datasets: the UCF sports dataset, the Olympic sports dataset and the HMDB51 dataset. |
Tasks | Image Captioning, Temporal Action Localization |
Published | 2017-05-09 |
URL | http://arxiv.org/abs/1705.03146v2 |
http://arxiv.org/pdf/1705.03146v2.pdf | |
PWC | https://paperswithcode.com/paper/cham-action-recognition-using-convolutional |
Repo | |
Framework | |
Automatic Viseme Vocabulary Construction to Enhance Continuous Lip-reading
Title | Automatic Viseme Vocabulary Construction to Enhance Continuous Lip-reading |
Authors | Adriana Fernandez-Lopez, Federico M. Sukno |
Abstract | Speech is the most common communication method between humans and involves the perception of both auditory and visual channels. Automatic speech recognition focuses on interpreting the audio signals, but it has been demonstrated that video can provide information that is complementary to the audio. Thus, the study of automatic lip-reading is important and is still an open problem. One of the key challenges is the definition of the visual elementary units (the visemes) and their vocabulary. Many researchers have analyzed the importance of the phoneme-to-viseme mapping and have proposed viseme vocabularies with lengths between 11 and 15 visemes. These viseme vocabularies have usually been manually defined by their linguistic properties and in some cases using decision trees or clustering techniques. In this work, we focus on the automatic construction of an optimal viseme vocabulary based on the association of phonemes with similar appearance. To this end, we construct an automatic system that uses local appearance descriptors to extract the main characteristics of the mouth region and HMMs to model the statistical relations of both viseme and phoneme sequences. To compare the performance of the system, different descriptors (PCA, DCT and SIFT) are analyzed. We test our system on a Spanish corpus of continuous speech. Our results indicate that we are able to recognize approximately 58% of the visemes, 47% of the phonemes and 23% of the words in a continuous speech scenario and that the optimal viseme vocabulary for Spanish is composed of 20 visemes. |
Tasks | Speech Recognition |
Published | 2017-04-26 |
URL | http://arxiv.org/abs/1704.08035v1 |
http://arxiv.org/pdf/1704.08035v1.pdf | |
PWC | https://paperswithcode.com/paper/automatic-viseme-vocabulary-construction-to |
Repo | |
Framework | |
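One plausible reading of "association of phonemes with similar appearance" is agglomerative grouping under an appearance-distance matrix. The average-linkage sketch below is an assumption for illustration - the authors' system is built on appearance descriptors and HMMs, which are not reproduced here:

```python
def build_viseme_vocabulary(distance, phonemes, target_size):
    """Greedy agglomerative grouping of phonemes into visemes: repeatedly
    merge the two clusters with the smallest average appearance distance
    until target_size clusters remain.

    distance : dict-of-dicts, distance[p][q] = appearance distance (symmetric)
    """
    clusters = [[p] for p in phonemes]

    def cluster_dist(c1, c2):
        # Average linkage over all cross-cluster phoneme pairs.
        return sum(distance[a][b] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > target_size:
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: cluster_dist(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```

Sweeping `target_size` and scoring each vocabulary with the recognition system is how one would locate an optimum like the 20-viseme vocabulary the paper reports for Spanish.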