May 5, 2019

Paper Group ANR 540

Community Detection Algorithm Combining Stochastic Block Model and Attribute Data Clustering

Title Community Detection Algorithm Combining Stochastic Block Model and Attribute Data Clustering
Authors Shun Kataoka, Takuto Kobayashi, Muneki Yasuda, Kazuyuki Tanaka
Abstract We propose a new algorithm to detect the community structure in a network that utilizes both the network structure and vertex attribute data. Suppose we have the network structure together with the vertex attribute data, that is, the information assigned to each vertex associated with the community to which it belongs. The problem addressed in this paper is the detection of the community structure from the information of both the network structure and the vertex attribute data. Our method takes a Bayesian approach that models the posterior probability distribution of the community labels. The detection of the community structure in our method is achieved by using belief propagation and an EM algorithm. We numerically verified the performance of our method using computer-generated networks and real-world networks.
Tasks Community Detection
Published 2016-07-21
URL http://arxiv.org/abs/1608.00920v1
PDF http://arxiv.org/pdf/1608.00920v1.pdf
PWC https://paperswithcode.com/paper/community-detection-algorithm-combining
Repo
Framework

Apparent Age Estimation Using Ensemble of Deep Learning Models

Title Apparent Age Estimation Using Ensemble of Deep Learning Models
Authors Refik Can Malli, Mehmet Aygun, Hazim Kemal Ekenel
Abstract In this paper, we address the problem of apparent age estimation. Unlike real age estimation, in which each face image has a single age label, in this problem face images have multiple age labels, corresponding to the ages perceived by the annotators when they look at these images. This provides an intriguing computer vision problem, since in generic image or object classification tasks, it is typical to have a single ground truth label per class. To account for multiple labels per image, instead of using the average age of the annotated face image as the class label, we have grouped the face images that are within a specified age range. Using these age groups and their age-shifted groupings, we have trained an ensemble of deep learning models. Before feeding an input face image to a deep learning model, five facial landmark points are detected and used for 2-D alignment. We have employed and fine-tuned convolutional neural networks (CNNs) that are based on the VGG-16 [24] architecture and pretrained on the IMDB-WIKI dataset [22]. The outputs of these deep learning models are then combined to produce the final estimation. The proposed method achieves an error of 0.3668 on the final ChaLearn LAP 2016 challenge test set [5].
Tasks Age Estimation, Object Classification
Published 2016-06-09
URL http://arxiv.org/abs/1606.02909v1
PDF http://arxiv.org/pdf/1606.02909v1.pdf
PWC https://paperswithcode.com/paper/apparent-age-estimation-using-ensemble-of
Repo
Framework

Distributed Sequence Memory of Multidimensional Inputs in Recurrent Networks

Title Distributed Sequence Memory of Multidimensional Inputs in Recurrent Networks
Authors Adam Charles, Dong Yin, Christopher Rozell
Abstract Recurrent neural networks (RNNs) have drawn interest from machine learning researchers because of their effectiveness at preserving past inputs for time-varying data processing tasks. To understand the success and limitations of RNNs, it is critical that we advance our analysis of their fundamental memory properties. We focus on echo state networks (ESNs), which are RNNs with simple memoryless nodes and random connectivity. In most existing analyses, the short-term memory (STM) capacity results conclude that the ESN network size must scale linearly with the input size for unstructured inputs. The main contribution of this paper is to provide general results characterizing the STM capacity for linear ESNs with multidimensional input streams when the inputs have common low-dimensional structure: sparsity in a basis or significant statistical dependence between inputs. In both cases, we show that the number of nodes in the network must scale linearly with the information rate and poly-logarithmically with the ambient input dimension. The analysis relies on advanced applications of random matrix theory and results in explicit non-asymptotic bounds on the recovery error. Taken together, this analysis provides a significant step forward in our understanding of the STM properties in RNNs.
Tasks
Published 2016-05-26
URL http://arxiv.org/abs/1605.08346v3
PDF http://arxiv.org/pdf/1605.08346v3.pdf
PWC https://paperswithcode.com/paper/distributed-sequence-memory-of
Repo
Framework
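
The linear echo state network described in the abstract above can be illustrated with a minimal sketch. It assumes the common state update x[t] = W x[t-1] + Z s[t] with random W and Z; the function names, the Gaussian weights, and the row-sum rescaling are illustrative assumptions and are not taken from the paper.

```python
import random


def make_linear_esn(n_nodes, n_inputs, scale=0.9, seed=0):
    """Draw random recurrent weights W and input weights Z (hypothetical helper)."""
    rng = random.Random(seed)
    W = [[rng.gauss(0.0, 1.0) for _ in range(n_nodes)] for _ in range(n_nodes)]
    # Rescale W so its largest absolute row sum equals `scale` (< 1). This bounds
    # the infinity norm of W, so the influence of old inputs fades over time.
    max_row_sum = max(sum(abs(w) for w in row) for row in W)
    W = [[scale * w / max_row_sum for w in row] for row in W]
    Z = [[rng.gauss(0.0, 1.0) for _ in range(n_inputs)] for _ in range(n_nodes)]
    return W, Z


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def run_linear_esn(W, Z, inputs):
    """Feed a sequence of input vectors through x[t] = W x[t-1] + Z s[t]."""
    state = [0.0] * len(W)
    for s in inputs:
        recurrent = matvec(W, state)
        drive = matvec(Z, s)
        state = [r + d for r, d in zip(recurrent, drive)]
    return state


if __name__ == "__main__":
    W, Z = make_linear_esn(n_nodes=5, n_inputs=2)
    print(run_linear_esn(W, Z, [[1.0, 0.0], [0.0, 2.0]]))
```

The final state is a linear function of the whole input history, which is why short-term memory capacity can be studied as a question of recovering past inputs from that state.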

Neighborhood Preserved Sparse Representation for Robust Classification on Symmetric Positive Definite Matrices

Title Neighborhood Preserved Sparse Representation for Robust Classification on Symmetric Positive Definite Matrices
Authors Ming Yin, Shengli Xie, Yi Guo, Junbin Gao, Yun Zhang
Abstract Due to its promising classification performance, the sparse representation based classification (SRC) algorithm has attracted great attention in the past few years. However, the existing SRC-type methods apply only to vector data in Euclidean space. As such, there is still no satisfactory approach to classifying symmetric positive definite (SPD) matrices, which are very useful in computer vision. To address this problem, in this paper, a neighborhood preserved kernel SRC method is proposed on SPD manifolds. Specifically, by embedding the SPD matrices into a Reproducing Kernel Hilbert Space (RKHS), the proposed method can perform classification on SPD manifolds through an appropriate Log-Euclidean kernel. By exploiting the geodesic distance between SPD matrices, our method can effectively characterize the intrinsic local Riemannian geometry within the data and thereby unravel the underlying sub-manifold structure. Despite its simplicity, experimental results on several well-known databases demonstrate that the proposed method achieves better classification results than the state-of-the-art approaches.
Tasks Sparse Representation-based Classification
Published 2016-01-27
URL http://arxiv.org/abs/1601.07336v1
PDF http://arxiv.org/pdf/1601.07336v1.pdf
PWC https://paperswithcode.com/paper/neighborhood-preserved-sparse-representation
Repo
Framework

Lexical bundles in computational linguistics academic literature

Title Lexical bundles in computational linguistics academic literature
Authors Adel Rahimi
Abstract In this study we analyzed a corpus of 8 million words of academic literature from computational linguistics. The lexical bundles from this corpus are categorized based on structures and functions.
Tasks
Published 2016-03-09
URL http://arxiv.org/abs/1603.02905v2
PDF http://arxiv.org/pdf/1603.02905v2.pdf
PWC https://paperswithcode.com/paper/lexical-bundles-in-computational-linguistics
Repo
Framework

An Approach for Self-Training Audio Event Detectors Using Web Data

Title An Approach for Self-Training Audio Event Detectors Using Web Data
Authors Benjamin Elizalde, Ankit Shah, Siddharth Dalmia, Min Hun Lee, Rohan Badlani, Anurag Kumar, Bhiksha Raj, Ian Lane
Abstract Audio Event Detection (AED) aims to recognize sounds within audio and video recordings. AED employs machine learning algorithms commonly trained and tested on annotated datasets. However, available datasets are limited in number of samples and hence it is difficult to model acoustic diversity. Therefore, we propose combining labeled audio from a dataset and unlabeled audio from the web to improve the sound models. The audio event detectors are trained on the labeled audio and run on the unlabeled audio downloaded from YouTube. Whenever the detectors recognized any of the known sounds with high confidence, the unlabeled audio was used to re-train the detectors. The performance of the re-trained detectors is compared to that of the original detectors using the annotated test set. Results showed an improvement of the AED, and uncovered challenges of using web audio from videos.
Tasks
Published 2016-09-20
URL http://arxiv.org/abs/1609.06026v3
PDF http://arxiv.org/pdf/1609.06026v3.pdf
PWC https://paperswithcode.com/paper/an-approach-for-self-training-audio-event
Repo
Framework
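
The self-training loop described in the abstract above (train on labeled data, run on unlabeled data, keep high-confidence detections, re-train) can be sketched with a toy example. The nearest-centroid classifier on 1-D features, the confidence formula, and the 0.9 threshold are all illustrative assumptions, not the detectors or settings used in the paper.

```python
def train(examples):
    """Return the centroid of the 1-D feature for each label.

    `examples` is a list of (feature, label) pairs.
    """
    sums, counts = {}, {}
    for x, label in examples:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(centroids, x):
    """Return (label, confidence) for feature x.

    Confidence is 1 minus the nearest centroid's share of the total distance,
    a toy stand-in for a real detector score.
    """
    dists = {label: abs(x - c) for label, c in centroids.items()}
    best = min(dists, key=dists.get)
    total = sum(dists.values())
    confidence = 1.0 if total == 0 else 1.0 - dists[best] / total
    return best, confidence


def self_train(labeled, unlabeled, threshold=0.9):
    """One round of self-training: pseudo-label confident samples, then re-train."""
    model = train(labeled)
    confident = []
    for x in unlabeled:
        label, confidence = predict(model, x)
        if confidence >= threshold:
            confident.append((x, label))
    return train(labeled + confident), confident


if __name__ == "__main__":
    labeled = [(0.0, "dog"), (1.0, "dog"), (9.0, "siren"), (10.0, "siren")]
    unlabeled = [0.2, 5.0, 9.8]
    model, added = self_train(labeled, unlabeled)
    print(added)  # the ambiguous sample 5.0 is not pseudo-labeled
    print(model)
```

The threshold controls the trade-off between adding more web data and admitting wrongly labeled samples into the training set.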

Real-time analysis of cataract surgery videos using statistical models

Title Real-time analysis of cataract surgery videos using statistical models
Authors Katia Charrière, Gwenolé Quellec, Mathieu Lamard, David Martiano, Guy Cazuguel, Gouenou Coatrieux, Béatrice Cochener
Abstract The automatic analysis of the surgical process, from videos recorded during surgeries, could be very useful to surgeons, both for training and for acquiring new techniques. The training process could be optimized by automatically providing some targeted recommendations or warnings, similar to the expert surgeon’s guidance. In this paper, we propose to reuse videos recorded and stored during cataract surgeries to perform the analysis. The proposed system automatically recognizes, in real time, what the surgeon is doing: what surgical phase or, more precisely, what surgical step he or she is performing. This recognition relies on the inference of a multilevel statistical model which uses 1) the conditional relations between levels of description (steps and phases) and 2) the temporal relations among steps and among phases. The model accepts two types of inputs: 1) the presence of surgical tools, manually provided by the surgeons, or 2) motion in videos, automatically analyzed through the Content Based Video Retrieval (CBVR) paradigm. Different data-driven statistical models are evaluated in this paper. For this project, a dataset of 30 cataract surgery videos was collected at Brest University hospital. The system was evaluated in terms of area under the ROC curve. Promising results were obtained using either the presence of surgical tools ($A_z$ = 0.983) or motion analysis ($A_z$ = 0.759). The generality of the method allows it to be adapted to any kind of surgery. The proposed solution could be used in a computer-assisted surgery tool to support surgeons during the surgery.
Tasks Video Retrieval
Published 2016-10-18
URL http://arxiv.org/abs/1610.05465v1
PDF http://arxiv.org/pdf/1610.05465v1.pdf
PWC https://paperswithcode.com/paper/real-time-analysis-of-cataract-surgery-videos
Repo
Framework
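
The evaluation metric named in the abstract above, the area under the ROC curve ($A_z$), can be computed directly from scores and binary labels. The following is a minimal sketch using the pairwise (Mann-Whitney) formulation; the function name and the toy data are illustrative and not from the paper.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve for binary labels (1 = positive, 0 = negative).

    Equals the probability that a random positive is scored above a random
    negative, counting ties as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which gives context for the reported 0.983 and 0.759.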

A Multi-Batch L-BFGS Method for Machine Learning

Title A Multi-Batch L-BFGS Method for Machine Learning
Authors Albert S. Berahas, Jorge Nocedal, Martin Takáč
Abstract The question of how to parallelize the stochastic gradient descent (SGD) method has received much attention in the literature. In this paper, we focus instead on batch methods that use a sizeable fraction of the training set at each iteration to facilitate parallelism, and that employ second-order information. In order to improve the learning process, we follow a multi-batch approach in which the batch changes at each iteration. This can cause difficulties because L-BFGS employs gradient differences to update the Hessian approximations, and when these gradients are computed using different data points the process can be unstable. This paper shows how to perform stable quasi-Newton updating in the multi-batch setting, illustrates the behavior of the algorithm in a distributed computing platform, and studies its convergence properties for both the convex and nonconvex cases.
Tasks
Published 2016-05-19
URL http://arxiv.org/abs/1605.06049v2
PDF http://arxiv.org/pdf/1605.06049v2.pdf
PWC https://paperswithcode.com/paper/a-multi-batch-l-bfgs-method-for-machine
Repo
Framework

RGBD Salient Object Detection via Deep Fusion

Title RGBD Salient Object Detection via Deep Fusion
Authors Liangqiong Qu, Shengfeng He, Jiawei Zhang, Jiandong Tian, Yandong Tang, Qingxiong Yang
Abstract Numerous efforts have been made to design different low level saliency cues for RGBD saliency detection, such as color or depth contrast features, background and color compactness priors. However, how these saliency cues interact with each other and how to incorporate these low level saliency cues effectively to generate a master saliency map remains a challenging problem. In this paper, we design a new convolutional neural network (CNN) to fuse different low level saliency cues into hierarchical features for automatically detecting salient objects in RGBD images. In contrast to the existing works that directly feed raw image pixels to the CNN, the proposed method takes advantage of the knowledge in traditional saliency detection by adopting various meaningful and well-designed saliency feature vectors as input. This can guide the training of the CNN towards detecting salient objects more effectively due to the reduced learning ambiguity. We then integrate a Laplacian propagation framework with the learned CNN to extract a spatially consistent saliency map by exploiting the intrinsic structure of the input image. Extensive quantitative and qualitative experimental evaluations on three datasets demonstrate that the proposed method consistently outperforms state-of-the-art methods.
Tasks Object Detection, Saliency Detection, Salient Object Detection
Published 2016-07-12
URL http://arxiv.org/abs/1607.03333v1
PDF http://arxiv.org/pdf/1607.03333v1.pdf
PWC https://paperswithcode.com/paper/rgbd-salient-object-detection-via-deep-fusion
Repo
Framework

Graph Regularized Low Rank Representation for Aerosol Optical Depth Retrieval

Title Graph Regularized Low Rank Representation for Aerosol Optical Depth Retrieval
Authors Yubao Sun, Renlong Hang, Qingshan Liu, Fuping Zhu, Hucheng Pei
Abstract In this paper, we propose a novel data-driven regression model for aerosol optical depth (AOD) retrieval. First, we adopt a low rank representation (LRR) model to learn a powerful representation of the spectral response. Then, graph regularization is incorporated into the LRR model to capture the local structure information and the nonlinear property of the remote-sensing data. Since it is easy to acquire the rich satellite-retrieval results, we use them as a baseline to construct the graph. Finally, the learned feature representation is fed into a support vector machine (SVM) to retrieve AOD. Experiments are conducted on two widely used data sets acquired by different sensors, and the experimental results show that the proposed method can achieve superior performance compared to the physical models and other state-of-the-art empirical models.
Tasks
Published 2016-02-22
URL http://arxiv.org/abs/1602.06818v2
PDF http://arxiv.org/pdf/1602.06818v2.pdf
PWC https://paperswithcode.com/paper/graph-regularized-low-rank-representation-for
Repo
Framework

Rain Removal via Shrinkage-Based Sparse Coding and Learned Rain Dictionary

Title Rain Removal via Shrinkage-Based Sparse Coding and Learned Rain Dictionary
Authors Chang-Hwan Son, Xiao-Ping Zhang
Abstract This paper introduces a new rain removal model based on the shrinkage of the sparse codes for a single image. Recently, dictionary learning and sparse coding have been widely used for image restoration problems. These methods can also be applied to the rain removal by learning two types of rain and non-rain dictionaries and forcing the sparse codes of the rain dictionary to be zero vectors. However, this approach can generate unwanted edge artifacts and detail loss in the non-rain regions. Based on this observation, a new approach for shrinking the sparse codes is presented in this paper. To effectively shrink the sparse codes in the rain and non-rain regions, an error map between the input rain image and the reconstructed rain image is generated by using the learned rain dictionary. Based on this error map, both the sparse codes of rain and non-rain dictionaries are used jointly to represent the image structures of objects and avoid the edge artifacts in the non-rain regions. In the rain regions, the correlation matrix between the rain and non-rain dictionaries is calculated. Then, the sparse codes corresponding to the highly correlated signal-atoms in the rain and non-rain dictionaries are shrunk jointly to improve the removal of the rain structures. The experimental results show that the proposed shrinkage-based sparse coding can preserve image structures and avoid the edge artifacts in the non-rain regions, and it can remove the rain structures in the rain regions. Also, visual quality evaluation confirms that the proposed method outperforms the conventional texture and rain removal methods.
Tasks Dictionary Learning, Image Restoration, Rain Removal
Published 2016-10-03
URL http://arxiv.org/abs/1610.00386v1
PDF http://arxiv.org/pdf/1610.00386v1.pdf
PWC https://paperswithcode.com/paper/rain-removal-via-shrinkage-based-sparse
Repo
Framework
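
The abstract above does not specify the exact shrinkage rule applied to the sparse codes. As a generic illustration only, the sketch below shows the standard soft-thresholding operator, one common way to shrink coefficients toward zero; the function name and threshold value are assumptions, and this should not be read as the paper's method.

```python
def soft_threshold(codes, threshold):
    """Shrink each coefficient toward zero by `threshold`.

    Coefficients whose magnitude is below the threshold become exactly zero;
    larger ones keep their sign and lose `threshold` in magnitude.
    """
    shrunk = []
    for c in codes:
        if c > threshold:
            shrunk.append(c - threshold)
        elif c < -threshold:
            shrunk.append(c + threshold)
        else:
            shrunk.append(0.0)
    return shrunk


if __name__ == "__main__":
    print(soft_threshold([3.0, -0.5, -2.0], 1.0))  # [2.0, 0.0, -1.0]
```

Compared with forcing all rain-dictionary codes to zero, a shrinkage operator of this kind reduces coefficients gradually, which is the general motivation the abstract gives for avoiding edge artifacts and detail loss.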

A Framework for Human Pose Estimation in Videos

Title A Framework for Human Pose Estimation in Videos
Authors Dong Zhang, Mubarak Shah
Abstract In this paper, we present a method to estimate a sequence of human poses in unconstrained videos. We aim to demonstrate that by using temporal information, the human pose estimation results can be improved over image based pose estimation methods. In contrast to the commonly employed graph optimization formulation, which is NP-hard and needs approximate solutions, we formulate this problem into a unified two stage tree-based optimization problem for which an efficient and exact solution exists. Although the proposed method finds an exact solution, it does not sacrifice the ability to model the spatial and temporal constraints between body parts in the frames; in fact it models the symmetric parts better than the existing methods. The proposed method is based on two main ideas: 'Abstraction' and 'Association' to enforce the intra- and inter-frame body part constraints without inducing extra computational complexity to the polynomial time solution. Using the idea of 'Abstraction', a new concept of 'abstract body part' is introduced to conceptually combine the symmetric body parts and model them in the tree based body part structure. Using the idea of 'Association', the optimal tracklets are generated for each abstract body part, in order to enforce the spatiotemporal constraints between body parts in adjacent frames. A sequence of the best poses is inferred from the abstract body part tracklets through the tree-based optimization. Finally, the poses are refined by limb alignment and refinement schemes. We evaluated the proposed method on three publicly available video based human pose estimation datasets, and obtained dramatically improved performance compared to the state-of-the-art methods.
Tasks Pose Estimation
Published 2016-04-26
URL http://arxiv.org/abs/1604.07788v1
PDF http://arxiv.org/pdf/1604.07788v1.pdf
PWC https://paperswithcode.com/paper/a-framework-for-human-pose-estimation-in
Repo
Framework

One-to-Many Network for Visually Pleasing Compression Artifacts Reduction

Title One-to-Many Network for Visually Pleasing Compression Artifacts Reduction
Authors Jun Guo, Hongyang Chao
Abstract We consider the compression artifacts reduction problem, where a compressed image is transformed into an artifact-free image. Recent approaches for this problem typically train a one-to-one mapping using a per-pixel $L_2$ loss between the outputs and the ground-truths. We point out that these approaches tend to produce overly smooth results, and PSNR doesn’t reflect their real performance. In this paper, we propose a one-to-many network, which measures output quality using a perceptual loss, a naturalness loss, and a JPEG loss. We also avoid grid-like artifacts during deconvolution using a “shift-and-average” strategy. Extensive experimental results demonstrate the dramatic visual improvement of our approach over the state of the art.
Tasks
Published 2016-11-15
URL http://arxiv.org/abs/1611.04994v2
PDF http://arxiv.org/pdf/1611.04994v2.pdf
PWC https://paperswithcode.com/paper/one-to-many-network-for-visually-pleasing
Repo
Framework

Learning Language-Visual Embedding for Movie Understanding with Natural-Language

Title Learning Language-Visual Embedding for Movie Understanding with Natural-Language
Authors Atousa Torabi, Niket Tandon, Leonid Sigal
Abstract Learning a joint language-visual embedding has a number of very appealing properties and can result in a variety of practical applications, including natural language image/video annotation and search. In this work, we study three different joint language-visual neural network model architectures. We evaluate our models on the large-scale LSMDC16 movie dataset for two tasks: 1) standard ranking for video annotation and retrieval, and 2) our proposed movie multiple-choice test. This test facilitates automatic evaluation of visual-language models for natural language video annotation based on human activities. In addition to the original Audio Description (AD) captions, provided as part of LSMDC16, we collected and will make available a) manually generated re-phrasings of those captions obtained using Amazon MTurk and b) automatically generated human activity elements in “Predicate + Object” (PO) phrases based on “Knowlywood”, an activity knowledge mining model. Our best model achieves Recall@10 of 19.2% on annotation and 18.9% on video retrieval tasks for a subset of 1000 samples. For the multiple-choice test, our best model achieves an accuracy of 58.11% over the whole LSMDC16 public test set.
Tasks Video Retrieval
Published 2016-09-26
URL http://arxiv.org/abs/1609.08124v1
PDF http://arxiv.org/pdf/1609.08124v1.pdf
PWC https://paperswithcode.com/paper/learning-language-visual-embedding-for-movie
Repo
Framework
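
The Recall@10 figures quoted in the abstract above measure how often the correct item appears among the top 10 ranked results. A minimal sketch of Recall@K follows, assuming each query has exactly one relevant item; the function name and toy data are illustrative and not from the paper.

```python
def recall_at_k(rankings, relevant_ids, k):
    """Fraction of queries whose relevant item appears in the top-k results.

    `rankings[i]` is the ranked list of candidate ids for query i and
    `relevant_ids[i]` is the single correct id for that query.
    """
    if not rankings:
        raise ValueError("need at least one query")
    hits = sum(
        1 for ranking, relevant in zip(rankings, relevant_ids)
        if relevant in ranking[:k]
    )
    return hits / len(rankings)


if __name__ == "__main__":
    rankings = [["v3", "v1", "v2"], ["v2", "v3", "v1"], ["v1", "v2", "v3"]]
    truth = ["v1", "v1", "v1"]
    print(recall_at_k(rankings, truth, 2))  # 2 of 3 queries hit in the top 2
```

With 1000 candidates, as in the subset mentioned in the abstract, a random ranking would give Recall@10 of about 1%, which puts the reported 19.2% and 18.9% in context.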

Spatial Concept Acquisition for a Mobile Robot that Integrates Self-Localization and Unsupervised Word Discovery from Spoken Sentences

Title Spatial Concept Acquisition for a Mobile Robot that Integrates Self-Localization and Unsupervised Word Discovery from Spoken Sentences
Authors Akira Taniguchi, Tadahiro Taniguchi, Tetsunari Inamura
Abstract In this paper, we propose a novel unsupervised learning method for the lexical acquisition of words related to places visited by robots, from human continuous speech signals. We address the problem of learning novel words by a robot that has no prior knowledge of these words except for a primitive acoustic model. Further, we propose a method that allows a robot to effectively use the learned words and their meanings for self-localization tasks. The proposed method is a nonparametric Bayesian spatial concept acquisition method (SpCoA) that integrates the generative model for self-localization and the unsupervised word segmentation in uttered sentences via latent variables related to the spatial concept. We implemented the proposed method SpCoA on SIGVerse, which is a simulation environment, and TurtleBot2, which is a mobile robot in a real environment. Further, we conducted experiments to evaluate the performance of SpCoA. The experimental results showed that SpCoA enabled the robot to acquire the names of places from speech sentences. They also revealed that the robot could effectively utilize the acquired spatial concepts and reduce the uncertainty in self-localization.
Tasks
Published 2016-02-03
URL http://arxiv.org/abs/1602.01208v3
PDF http://arxiv.org/pdf/1602.01208v3.pdf
PWC https://paperswithcode.com/paper/spatial-concept-acquisition-for-a-mobile
Repo
Framework