May 7, 2019

Paper Group ANR 97

Understanding Image and Text Simultaneously: a Dual Vision-Language Machine Comprehension Task. Low Rank Approximation with Entrywise $\ell_1$-Norm Error. Semantic-Aware Depth Super-Resolution in Outdoor Scenes. LCrowdV: Generating Labeled Videos for Simulation-based Crowd Behavior Learning. Hierarchical Gaussian Mixture Model with Objects Attached …

Understanding Image and Text Simultaneously: a Dual Vision-Language Machine Comprehension Task

Title Understanding Image and Text Simultaneously: a Dual Vision-Language Machine Comprehension Task
Authors Nan Ding, Sebastian Goodman, Fei Sha, Radu Soricut
Abstract We introduce a new multi-modal task for computer systems, posed as a combined vision-language comprehension challenge: identifying the most suitable text describing a scene, given several similar options. Accomplishing the task entails demonstrating comprehension beyond just recognizing “keywords” (or key-phrases) and their corresponding visual concepts. Instead, it requires an alignment between the representations of the two modalities that achieves a visually-grounded “understanding” of various linguistic elements and their dependencies. This new task also admits an easy-to-compute and well-studied metric: the accuracy in detecting the true target among the decoys. The paper makes several contributions: an effective and extensible mechanism for generating decoys from (human-created) image captions; an instance of applying this mechanism, yielding a large-scale machine comprehension dataset (based on the COCO images and captions) that we make publicly available; human evaluation results on this dataset, informing a performance upper-bound; and several baseline and competitive learning approaches that illustrate the utility of the proposed task and dataset in advancing both image and language comprehension. We also show that, in a multi-task learning setting, the performance on the proposed task is positively correlated with the end-to-end task of image captioning.
Tasks Image Captioning, Multi-Task Learning, Reading Comprehension
Published 2016-12-22
URL http://arxiv.org/abs/1612.07833v1
PDF http://arxiv.org/pdf/1612.07833v1.pdf
PWC https://paperswithcode.com/paper/understanding-image-and-text-simultaneously-a
Repo
Framework

Low Rank Approximation with Entrywise $\ell_1$-Norm Error

Title Low Rank Approximation with Entrywise $\ell_1$-Norm Error
Authors Zhao Song, David P. Woodruff, Peilin Zhong
Abstract We study the $\ell_1$-low rank approximation problem, where for a given $n \times d$ matrix $A$ and approximation factor $\alpha \geq 1$, the goal is to output a rank-$k$ matrix $\widehat{A}$ for which $$\|A-\widehat{A}\|_1 \leq \alpha \cdot \min_{\textrm{rank-}k\textrm{ matrices}~A'}\|A-A'\|_1,$$ where for an $n \times d$ matrix $C$, we let $\|C\|_1 = \sum_{i=1}^n \sum_{j=1}^d |C_{i,j}|$. This error measure is known to be more robust than the Frobenius norm in the presence of outliers and is indicated in models where Gaussian assumptions on the noise may not apply. The problem was shown to be NP-hard by Gillis and Vavasis and a number of heuristics have been proposed. It was asked in multiple places if there are any approximation algorithms. We give the first provable approximation algorithms for $\ell_1$-low rank approximation, showing that it is possible to achieve approximation factor $\alpha = (\log d) \cdot \mathrm{poly}(k)$ in $\mathrm{nnz}(A) + (n+d) \mathrm{poly}(k)$ time, where $\mathrm{nnz}(A)$ denotes the number of non-zero entries of $A$. If $k$ is constant, we further improve the approximation ratio to $O(1)$ with a $\mathrm{poly}(nd)$-time algorithm. Under the Exponential Time Hypothesis, we show there is no $\mathrm{poly}(nd)$-time algorithm achieving a $(1+\frac{1}{\log^{1+\gamma}(nd)})$-approximation, for $\gamma > 0$ an arbitrarily small constant, even when $k = 1$. We give a number of additional results for $\ell_1$-low rank approximation: nearly tight upper and lower bounds for column subset selection, CUR decompositions, extensions to low rank approximation with respect to $\ell_p$-norms for $1 \leq p < 2$ and earthmover distance, low-communication distributed protocols and low-memory streaming algorithms, algorithms with limited randomness, and bicriteria algorithms. We also give a preliminary empirical evaluation.
Tasks
Published 2016-11-03
URL http://arxiv.org/abs/1611.00898v1
PDF http://arxiv.org/pdf/1611.00898v1.pdf
PWC https://paperswithcode.com/paper/low-rank-approximation-with-entrywise-ell_1
Repo
Framework
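
As a rough illustration of the error measure in this abstract, the sketch below computes the entrywise $\ell_1$ error (the sum of absolute values of all entries of the residual) for a rank-$k$ candidate. It is not the paper's algorithm: the candidate comes from a truncated SVD, which is optimal for the Frobenius norm rather than for $\ell_1$, and the helper names `entrywise_l1_norm` and `truncated_svd` as well as the synthetic matrix are assumptions made for this example.

```python
import numpy as np

def entrywise_l1_norm(C):
    """Entrywise l1 norm: sum of absolute values of all entries of C."""
    return float(np.abs(C).sum())

def truncated_svd(A, k):
    """Rank-k approximation that is optimal in Frobenius norm (NOT in l1)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

rng = np.random.default_rng(0)

# Hypothetical test matrix: an exactly rank-2 matrix plus one large outlier.
low_rank = rng.normal(size=(20, 2)) @ rng.normal(size=(2, 10))
A = low_rank.copy()
A[3, 4] += 100.0

A_hat = truncated_svd(A, k=2)

# l1 error of the SVD candidate, and of the outlier-free rank-2 matrix.
err_svd = entrywise_l1_norm(A - A_hat)
err_clean = entrywise_l1_norm(A - low_rank)  # equals the outlier size, 100

print(f"SVD candidate l1 error:   {err_svd:.2f}")
print(f"clean rank-2 l1 error:    {err_clean:.2f}")
```

Because `low_rank` is itself a rank-2 matrix, `err_clean` is an upper bound on the optimal rank-2 $\ell_1$ error for this `A`, so the ratio `err_svd / err_clean` gives a lower bound on the approximation factor $\alpha$ that the SVD candidate achieves here.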

Semantic-Aware Depth Super-Resolution in Outdoor Scenes

Title Semantic-Aware Depth Super-Resolution in Outdoor Scenes
Authors Miaomiao Liu, Mathieu Salzmann, Xuming He
Abstract While depth sensors are becoming increasingly popular, their spatial resolution often remains limited. Depth super-resolution therefore emerged as a solution to this problem. Despite much progress, state-of-the-art techniques suffer from two drawbacks: (i) they rely on the assumption that intensity edges coincide with depth discontinuities, which, unfortunately, is only true in controlled environments; and (ii) they typically exploit the availability of high-resolution training depth maps, which can often not be acquired in practice due to the sensors’ limitations. By contrast, here, we introduce an approach to performing depth super-resolution in more challenging conditions, such as in outdoor scenes. To this end, we first propose to exploit semantic information to better constrain the super-resolution process. In particular, we design a co-sparse analysis model that learns filters from joint intensity, depth and semantic information. Furthermore, we show how low-resolution training depth maps can be employed in our learning strategy. We demonstrate the benefits of our approach over state-of-the-art depth super-resolution methods on two outdoor scene datasets.
Tasks Super-Resolution
Published 2016-05-31
URL http://arxiv.org/abs/1605.09546v1
PDF http://arxiv.org/pdf/1605.09546v1.pdf
PWC https://paperswithcode.com/paper/semantic-aware-depth-super-resolution-in
Repo
Framework

LCrowdV: Generating Labeled Videos for Simulation-based Crowd Behavior Learning

Title LCrowdV: Generating Labeled Videos for Simulation-based Crowd Behavior Learning
Authors Ernest Cheung, Tsan Kwong Wong, Aniket Bera, Xiaogang Wang, Dinesh Manocha
Abstract We present a novel procedural framework to generate an arbitrary number of labeled crowd videos (LCrowdV). The resulting crowd video datasets are used to design accurate algorithms or training models for crowded scene understanding. Our overall approach is composed of two components: a procedural simulation framework for generating crowd movements and behaviors, and a procedural rendering framework to generate different videos or images. Each video or image is automatically labeled based on the environment, number of pedestrians, density, behavior, flow, lighting conditions, viewpoint, noise, etc. Furthermore, we can increase the realism by combining synthetically-generated behaviors with real-world background videos. We demonstrate the benefits of LCrowdV over prior labeled crowd datasets by improving the accuracy of pedestrian detection and crowd behavior classification algorithms. LCrowdV would be released on the WWW.
Tasks Pedestrian Detection, Scene Understanding
Published 2016-06-29
URL http://arxiv.org/abs/1606.08998v2
PDF http://arxiv.org/pdf/1606.08998v2.pdf
PWC https://paperswithcode.com/paper/lcrowdv-generating-labeled-videos-for
Repo
Framework

Hierarchical Gaussian Mixture Model with Objects Attached to Terminal and Non-terminal Dendrogram Nodes

Title Hierarchical Gaussian Mixture Model with Objects Attached to Terminal and Non-terminal Dendrogram Nodes
Authors Łukasz P. Olech, Mariusz Paradowski
Abstract A hierarchical clustering algorithm based on a Gaussian mixture model is presented. The key difference to regular hierarchical mixture models is the ability to store objects in both terminal and nonterminal nodes. Upper levels of the hierarchy contain sparsely distributed objects, while lower levels contain densely represented ones. As shown by experiments, this ability helps in noise detection (modelling). Furthermore, compared to a regular hierarchical mixture model, the presented method generates more compact dendrograms with higher quality measured by adopted F-measure.
Tasks
Published 2016-03-28
URL http://arxiv.org/abs/1603.08342v1
PDF http://arxiv.org/pdf/1603.08342v1.pdf
PWC https://paperswithcode.com/paper/hierarchical-gaussian-mixture-model-with
Repo
Framework

Generalized Linear Models for Aggregated Data

Title Generalized Linear Models for Aggregated Data
Authors Avradeep Bhowmik, Joydeep Ghosh, Oluwasanmi Koyejo
Abstract Databases in domains such as healthcare are routinely released to the public in aggregated form. Unfortunately, naive modeling with aggregated data may significantly diminish the accuracy of inferences at the individual level. This paper addresses the scenario where features are provided at the individual level, but the target variables are only available as histogram aggregates or order statistics. We consider a limiting case of generalized linear modeling when the target variables are only known up to permutation, and explore how this relates to permutation testing, a standard technique for assessing statistical dependency. Based on this relationship, we propose a simple algorithm to estimate the model parameters and individual level inferences via alternating imputation and standard generalized linear model fitting. Our results suggest the effectiveness of the proposed approach when, in the original data, permutation testing accurately ascertains the veracity of the linear relationship. The framework is extended to general histogram data with larger bins, with order statistics such as the median as a limiting case. Our experimental results on simulated data and aggregated healthcare data suggest a diminishing returns property with respect to the granularity of the histogram: when a linear relationship holds in the original data, the targets can be predicted accurately given relatively coarse histograms.
Tasks Imputation
Published 2016-05-14
URL http://arxiv.org/abs/1605.04466v1
PDF http://arxiv.org/pdf/1605.04466v1.pdf
PWC https://paperswithcode.com/paper/generalized-linear-models-for-aggregated-data
Repo
Framework
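
The "alternating imputation and model fitting" idea in this abstract can be sketched for the limiting case where the targets are known only up to permutation. The code below is a minimal illustration under stated assumptions, not the authors' implementation: it uses ordinary least squares (a GLM with identity link), it imputes by giving the i-th smallest target value to the i-th smallest prediction, and the function name `fit_permuted_targets` and the synthetic data are hypothetical.

```python
import numpy as np

def fit_permuted_targets(X, y_values, n_iter=50, seed=0):
    """Alternate between least-squares fitting and re-assigning targets.

    X        : (n, d) individual-level features
    y_values : (n,) target values whose assignment to rows is unknown
    """
    rng = np.random.default_rng(seed)
    y = rng.permutation(y_values)      # arbitrary initial assignment
    y_sorted = np.sort(y_values)
    losses = []
    for _ in range(n_iter):
        # Fitting step: least squares on the currently imputed targets.
        w, *_ = np.linalg.lstsq(X, y, rcond=None)
        pred = X @ w
        # Imputation step (assumed rule): match targets to predictions by
        # rank, which minimises squared error over all permutations.
        ranks = np.argsort(np.argsort(pred))
        y = y_sorted[ranks]
        losses.append(float(np.sum((y - pred) ** 2)))
    return w, y, losses

# Hypothetical synthetic data with a linear relationship.
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])
true_w = np.array([0.5, 2.0, -1.0])
y_true = X @ true_w + 0.1 * rng.normal(size=200)

# Only a shuffled copy of the targets is handed to the procedure.
shuffled = rng.permutation(y_true)
w_hat, y_imputed, losses = fit_permuted_targets(X, shuffled)
```

Each step can only lower the squared error, so `losses` is non-increasing, but the procedure may stop at a local optimum and is not guaranteed to recover `true_w`.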

Redefining Binarization and the Visual Archetype

Title Redefining Binarization and the Visual Archetype
Authors Anguelos Nicolaou, Liwicki Marcus
Abstract Although binarization is considered passe, it still remains a highly popular research topic. In this paper we propose a rethinking of what binarization is. We introduce the notion of the visual archetype as the ideal form of any one document. Binarization can be defined as the restoration of the visual archetype for a class of images. This definition broadens the scope of what binarization means but also suggests ground-truth should focus on the foreground.
Tasks Document Binarization
Published 2016-09-29
URL http://arxiv.org/abs/1609.09451v1
PDF http://arxiv.org/pdf/1609.09451v1.pdf
PWC https://paperswithcode.com/paper/redefining-binarization-and-the-visual
Repo
Framework

NIST: An Image Classification Network to Image Semantic Retrieval

Title NIST: An Image Classification Network to Image Semantic Retrieval
Authors Le Dong, Xiuyuan Chen, Mengdie Mao, Qianni Zhang
Abstract This paper proposes a classification network to image semantic retrieval (NIST) framework to counter the image retrieval challenge. Our approach leverages the successful classification network GoogleNet based on Convolutional Neural Networks to obtain the semantic feature matrix which contains the serial number of classes and corresponding probabilities. Compared with traditional image retrieval using feature matching to compute the similarity between two images, NIST leverages the semantic information to construct a semantic feature matrix and uses the semantic distance algorithm to compute the similarity. Besides, the fusion strategy can significantly reduce storage and time consumption due to fewer classes participating in the last semantic distance computation. Experiments demonstrate that our NIST framework produces state-of-the-art results in retrieval experiments on the MIRFLICKR-25K dataset.
Tasks Image Classification, Image Retrieval
Published 2016-07-02
URL http://arxiv.org/abs/1607.00464v1
PDF http://arxiv.org/pdf/1607.00464v1.pdf
PWC https://paperswithcode.com/paper/nist-an-image-classification-network-to-image
Repo
Framework

The Predictive Context Tree: Predicting Contexts and Interactions

Title The Predictive Context Tree: Predicting Contexts and Interactions
Authors Alasdair Thomason, Nathan Griffiths, Victor Sanchez
Abstract With a large proportion of people carrying location-aware smartphones, we have an unprecedented platform from which to understand individuals and predict their future actions. This work builds upon the Context Tree data structure that summarises the historical contexts of individuals from augmented geospatial trajectories, and constructs a predictive model for their likely future contexts. The Predictive Context Tree (PCT) is constructed as a hierarchical classifier, capable of predicting both the future locations that a user will visit and the contexts that a user will be immersed within. The PCT is evaluated over real-world geospatial trajectories, and compared against existing location extraction and prediction techniques, as well as a proposed hybrid approach that uses identified land usage elements in combination with machine learning to predict future interactions. Our results demonstrate that higher predictive accuracies can be achieved using this hybrid approach over traditional extracted location datasets, and the PCT itself matches the performance of the hybrid approach at predicting future interactions, while adding utility in the form of context predictions. Such a prediction system is capable of understanding not only where a user will visit, but also their context, in terms of what they are likely to be doing.
Tasks
Published 2016-10-05
URL http://arxiv.org/abs/1610.01381v1
PDF http://arxiv.org/pdf/1610.01381v1.pdf
PWC https://paperswithcode.com/paper/the-predictive-context-tree-predicting
Repo
Framework

Orthographic Syllable as basic unit for SMT between Related Languages

Title Orthographic Syllable as basic unit for SMT between Related Languages
Authors Anoop Kunchukuttan, Pushpak Bhattacharyya
Abstract We explore the use of the orthographic syllable, a variable-length consonant-vowel sequence, as a basic unit of translation between related languages which use abugida or alphabetic scripts. We show that orthographic syllable level translation significantly outperforms models trained over other basic units (word, morpheme and character) when training over small parallel corpora.
Tasks
Published 2016-10-03
URL http://arxiv.org/abs/1610.00634v1
PDF http://arxiv.org/pdf/1610.00634v1.pdf
PWC https://paperswithcode.com/paper/orthographic-syllable-as-basic-unit-for-smt
Repo
Framework

Exploiting Spatio-Temporal Structure with Recurrent Winner-Take-All Networks

Title Exploiting Spatio-Temporal Structure with Recurrent Winner-Take-All Networks
Authors Eder Santana, Matthew Emigh, Pablo Zegers, Jose C Principe
Abstract We propose a convolutional recurrent neural network, with Winner-Take-All dropout for high dimensional unsupervised feature learning in multi-dimensional time series. We apply the proposed method for object recognition with temporal context in videos and obtain better results than comparable methods in the literature, including the Deep Predictive Coding Networks previously proposed by Chalasani and Principe. Our contributions can be summarized as a scalable reinterpretation of the Deep Predictive Coding Networks trained end-to-end with backpropagation through time, an extension of the previously proposed Winner-Take-All Autoencoders to sequences in time, and a new technique for initializing and regularizing convolutional-recurrent neural networks.
Tasks Object Recognition, Time Series
Published 2016-10-31
URL http://arxiv.org/abs/1611.00050v2
PDF http://arxiv.org/pdf/1611.00050v2.pdf
PWC https://paperswithcode.com/paper/exploiting-spatio-temporal-structure-with
Repo
Framework

Nonparametric Bayesian Negative Binomial Factor Analysis

Title Nonparametric Bayesian Negative Binomial Factor Analysis
Authors Mingyuan Zhou
Abstract A common approach to analyze a covariate-sample count matrix, an element of which represents how many times a covariate appears in a sample, is to factorize it under the Poisson likelihood. We show its limitation in capturing the tendency for a covariate present in a sample to both repeat itself and excite related ones. To address this limitation, we construct negative binomial factor analysis (NBFA) to factorize the matrix under the negative binomial likelihood, and relate it to a Dirichlet-multinomial distribution based mixed-membership model. To support countably infinite factors, we propose the hierarchical gamma-negative binomial process. By exploiting newly proved connections between discrete distributions, we construct two blocked and a collapsed Gibbs sampler that all adaptively truncate their number of factors, and demonstrate that the blocked Gibbs sampler developed under a compound Poisson representation converges fast and has low computational complexity. Example results show that NBFA has a distinct mechanism in adjusting its number of inferred factors according to the sample lengths, and provides clear advantages in parsimonious representation, predictive power, and computational complexity over previously proposed discrete latent variable models, which either completely ignore burstiness, or model only the burstiness of the covariates but not that of the factors.
Tasks Latent Variable Models
Published 2016-04-25
URL http://arxiv.org/abs/1604.07464v2
PDF http://arxiv.org/pdf/1604.07464v2.pdf
PWC https://paperswithcode.com/paper/nonparametric-bayesian-negative-binomial
Repo
Framework
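
For context on why a Poisson likelihood can be limiting for bursty counts, the standard gamma-Poisson mixture identity shows that the negative binomial distribution has variance strictly larger than its mean. This is a textbook fact, not a result specific to this paper, and the parameter names $r$ (shape) and $p$ (probability) are notation assumed here.

```latex
% Gamma-Poisson mixture: Gamma(shape r, scale p/(1-p)) mixed over a Poisson rate.
\begin{aligned}
\lambda &\sim \mathrm{Gamma}\!\left(r,\ \tfrac{p}{1-p}\right), \qquad
x \mid \lambda \sim \mathrm{Poisson}(\lambda)
\;\Longrightarrow\; x \sim \mathrm{NB}(r, p), \\
\mathbb{E}[x] &= \mu = \frac{r p}{1-p}, \qquad
\mathrm{Var}[x] = \mathbb{E}[\lambda] + \mathrm{Var}[\lambda]
= \frac{r p}{(1-p)^2} = \mu + \frac{\mu^2}{r} > \mu .
\end{aligned}
```

A Poisson model forces $\mathrm{Var}[x] = \mu$, so the extra $\mu^2 / r$ term is what lets a negative binomial likelihood absorb overdispersed counts.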

Blind Source Separation: Fundamentals and Recent Advances (A Tutorial Overview Presented at SBrT-2001)

Title Blind Source Separation: Fundamentals and Recent Advances (A Tutorial Overview Presented at SBrT-2001)
Authors Eleftherios Kofidis
Abstract Blind source separation (BSS), i.e., the decoupling of unknown signals that have been mixed in an unknown way, has been a topic of great interest in the signal processing community for the last decade, covering a wide range of applications in such diverse fields as digital communications, pattern recognition, biomedical engineering, and financial data analysis, among others. This course aims at an introduction to the BSS problem via an exposition of well-known and established as well as some more recent approaches to its solution. A unified way is followed in presenting the various results so as to more easily bring out their similarities/differences and emphasize their relative advantages/disadvantages. Only a representative sample of the existing knowledge on BSS will be included in this course. The interested readers are encouraged to consult the list of bibliographical references for more details on this exciting and always active research topic.
Tasks
Published 2016-03-09
URL http://arxiv.org/abs/1603.03089v1
PDF http://arxiv.org/pdf/1603.03089v1.pdf
PWC https://paperswithcode.com/paper/blind-source-separation-fundamentals-and
Repo
Framework

Real-time Human Pose Estimation from Video with Convolutional Neural Networks

Title Real-time Human Pose Estimation from Video with Convolutional Neural Networks
Authors Marko Linna, Juho Kannala, Esa Rahtu
Abstract In this paper, we present a method for real-time multi-person human pose estimation from video by utilizing convolutional neural networks. Our method is aimed for use case specific applications, where good accuracy is essential and variation of the background and poses is limited. This enables us to use a generic network architecture, which is both accurate and fast. We divide the problem into two phases: (1) pre-training and (2) finetuning. In pre-training, the network is learned with highly diverse input data from publicly available datasets, while in finetuning we train with application specific data, which we record with Kinect. Our method differs from most of the state-of-the-art methods in that we consider the whole system, including person detector, pose estimator and an automatic way to record application specific training material for finetuning. Our method is considerably faster than many of the state-of-the-art methods. Our method can be thought of as a replacement for Kinect, and it can be used for higher level tasks, such as gesture control, games, person tracking, action recognition and action tracking. We achieved accuracy of 96.8% (PCK@0.2) with application specific data.
Tasks Pose Estimation, Temporal Action Localization
Published 2016-09-23
URL http://arxiv.org/abs/1609.07420v1
PDF http://arxiv.org/pdf/1609.07420v1.pdf
PWC https://paperswithcode.com/paper/real-time-human-pose-estimation-from-video
Repo
Framework
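
The accuracy figure in this abstract is reported as PCK@0.2 (percentage of correct keypoints). A minimal sketch of how such a score is commonly computed follows. The choice of reference size (torso length, head size, or bounding-box size) differs between benchmarks and is not stated in the abstract, so `ref_size` is an assumption here, as are the function name `pck` and the toy coordinates.

```python
import numpy as np

def pck(pred, gt, ref_size, alpha=0.2):
    """Fraction of keypoints within alpha * ref_size of the ground truth.

    pred, gt : (n_samples, n_keypoints, 2) arrays of (x, y) coordinates
    ref_size : (n_samples,) per-sample reference length (assumed, e.g. torso)
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    ref_size = np.asarray(ref_size, dtype=float)
    # Euclidean distance per keypoint, shape (n_samples, n_keypoints).
    dist = np.linalg.norm(pred - gt, axis=-1)
    correct = dist <= alpha * ref_size[:, None]
    return float(correct.mean())

# Toy example: one person, two keypoints, reference size of 10 pixels.
gt = np.array([[[10.0, 10.0], [20.0, 20.0]]])
pred = np.array([[[11.0, 10.0], [20.0, 30.0]]])  # errors of 1 px and 10 px
score = pck(pred, gt, ref_size=[10.0], alpha=0.2)
print(score)  # threshold is 2 px, so one of two keypoints is correct: 0.5
```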

Minimalist Regression Network with Reinforced Gradients and Weighted Estimates: a Case Study on Parameters Estimation in Automated Welding

Title Minimalist Regression Network with Reinforced Gradients and Weighted Estimates: a Case Study on Parameters Estimation in Automated Welding
Authors Soheil Keshmiri
Abstract This paper presents a minimalist neural regression network as an aggregate of independent identical regression blocks that are trained simultaneously. Moreover, it introduces a new multiplicative parameter, shared by all the neural units of a given layer, to maintain the quality of its gradients. Furthermore, it increases its estimation accuracy via learning a weight factor whose quantity captures the redundancy between the estimated and actual values at each training iteration. We choose the estimation of the direct weld parameters of different welding techniques to show a significant improvement in calculation of these parameters by our model in contrast to state-of-the-art techniques in the literature. Furthermore, we demonstrate the ability of our model to retain its performance when presented with combined data of different welding techniques. This is a nontrivial result in attaining a scalable model whose quality of estimation is independent of adopted welding techniques.
Tasks
Published 2016-07-05
URL http://arxiv.org/abs/1607.01136v1
PDF http://arxiv.org/pdf/1607.01136v1.pdf
PWC https://paperswithcode.com/paper/minimalist-regression-network-with-reinforced
Repo
Framework