May 7, 2019

Paper Group ANR 97

Understanding Image and Text Simultaneously: a Dual Vision-Language Machine Comprehension Task. Low Rank Approximation with Entrywise $\ell_1$-Norm Error. Semantic-Aware Depth Super-Resolution in Outdoor Scenes. LCrowdV: Generating Labeled Videos for Simulation-based Crowd Behavior Learning. Hierarchical Gaussian Mixture Model with Objects Attached …

Understanding Image and Text Simultaneously: a Dual Vision-Language Machine Comprehension Task


Title	Understanding Image and Text Simultaneously: a Dual Vision-Language Machine Comprehension Task
Authors	Nan Ding, Sebastian Goodman, Fei Sha, Radu Soricut
Abstract	We introduce a new multi-modal task for computer systems, posed as a combined vision-language comprehension challenge: identifying the most suitable text describing a scene, given several similar options. Accomplishing the task entails demonstrating comprehension beyond just recognizing “keywords” (or key-phrases) and their corresponding visual concepts. Instead, it requires an alignment between the representations of the two modalities that achieves a visually-grounded “understanding” of various linguistic elements and their dependencies. This new task also admits an easy-to-compute and well-studied metric: the accuracy in detecting the true target among the decoys. The paper makes several contributions: an effective and extensible mechanism for generating decoys from (human-created) image captions; an instance of applying this mechanism, yielding a large-scale machine comprehension dataset (based on the COCO images and captions) that we make publicly available; human evaluation results on this dataset, informing a performance upper-bound; and several baseline and competitive learning approaches that illustrate the utility of the proposed task and dataset in advancing both image and language comprehension. We also show that, in a multi-task learning setting, the performance on the proposed task is positively correlated with the end-to-end task of image captioning.
Tasks	Image Captioning, Multi-Task Learning, Reading Comprehension
Published	2016-12-22
URL	http://arxiv.org/abs/1612.07833v1
PDF	http://arxiv.org/pdf/1612.07833v1.pdf
PWC	https://paperswithcode.com/paper/understanding-image-and-text-simultaneously-a
Repo
Framework
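
The evaluation metric described above (accuracy in detecting the true caption among decoys) reduces to an argmax comparison. The minimal sketch below assumes a model already outputs one compatibility score per candidate caption. The function name and the five-option layout in the example are illustrative assumptions, not the paper's dataset format or code.

```python
from typing import Sequence


def decoy_accuracy(scores: Sequence[Sequence[float]], targets: Sequence[int]) -> float:
    """Fraction of items whose highest-scoring option is the true caption.

    scores[k][m] is a (hypothetical) model score for image k and candidate m;
    targets[k] is the index of the human-written caption among the options.
    Ties go to the first maximum.
    """
    if len(scores) != len(targets):
        raise ValueError("one target index per item is required")
    correct = 0
    for option_scores, target in zip(scores, targets):
        predicted = max(range(len(option_scores)), key=option_scores.__getitem__)
        correct += int(predicted == target)
    return correct / len(targets)


# Two items, five options each (one true caption plus four decoys).
print(decoy_accuracy([[0.1, 0.7, 0.05, 0.1, 0.05], [0.3, 0.2, 0.1, 0.25, 0.15]], [1, 3]))  # 0.5
```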

Low Rank Approximation with Entrywise $\ell_1$-Norm Error


Title	Low Rank Approximation with Entrywise $\ell_1$-Norm Error
Authors	Zhao Song, David P. Woodruff, Peilin Zhong
Abstract	We study the $\ell_1$-low rank approximation problem, where for a given $n \times d$ matrix $A$ and approximation factor $\alpha \geq 1$, the goal is to output a rank-$k$ matrix $\widehat{A}$ for which $$\|A-\widehat{A}\|_1 \leq \alpha \cdot \min_{\textrm{rank-}k\textrm{ matrices}~A'}\|A-A'\|_1,$$ where for an $n \times d$ matrix $C$, we let $\|C\|_1 = \sum_{i=1}^n \sum_{j=1}^d |C_{i,j}|$. This error measure is known to be more robust than the Frobenius norm in the presence of outliers and is indicated in models where Gaussian assumptions on the noise may not apply. The problem was shown to be NP-hard by Gillis and Vavasis and a number of heuristics have been proposed. It was asked in multiple places if there are any approximation algorithms. We give the first provable approximation algorithms for $\ell_1$-low rank approximation, showing that it is possible to achieve approximation factor $\alpha = (\log d) \cdot \mathrm{poly}(k)$ in $\mathrm{nnz}(A) + (n+d) \mathrm{poly}(k)$ time, where $\mathrm{nnz}(A)$ denotes the number of non-zero entries of $A$. If $k$ is constant, we further improve the approximation ratio to $O(1)$ with a $\mathrm{poly}(nd)$-time algorithm. Under the Exponential Time Hypothesis, we show there is no $\mathrm{poly}(nd)$-time algorithm achieving a $(1+\frac{1}{\log^{1+\gamma}(nd)})$-approximation, for $\gamma > 0$ an arbitrarily small constant, even when $k = 1$. We give a number of additional results for $\ell_1$-low rank approximation: nearly tight upper and lower bounds for column subset selection, CUR decompositions, extensions to low rank approximation with respect to $\ell_p$-norms for $1 \leq p < 2$ and earthmover distance, low-communication distributed protocols and low-memory streaming algorithms, algorithms with limited randomness, and bicriteria algorithms. We also give a preliminary empirical evaluation.
Tasks
Published	2016-11-03
URL	http://arxiv.org/abs/1611.00898v1
PDF	http://arxiv.org/pdf/1611.00898v1.pdf
PWC	https://paperswithcode.com/paper/low-rank-approximation-with-entrywise-ell_1
Repo
Framework
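
To make the objective $\|A-\widehat{A}\|_1$ concrete, the sketch below computes the entrywise $\ell_1$ error and runs a simple alternating heuristic for the rank-1 case ($k = 1$). It is not the paper's provable algorithm. It is an assumed illustration of the kind of heuristic the abstract alludes to: with $u$ fixed, each $v_j$ minimises $\sum_i |A_{ij} - u_i v_j|$ exactly via a weighted median (and symmetrically for $u$), so the error never increases, but the heuristic can stall in poor local minima. The function names and the initialisation rule are hypothetical.

```python
def weighted_median(values, weights):
    """Return x minimising sum_t weights[t] * |x - values[t]| (all weights > 0)."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    acc = 0.0
    for value, weight in pairs:
        acc += weight
        if acc >= half:
            return value
    return pairs[-1][0]


def l1_scalar_fit(coeffs, targets):
    """Exactly solve min_x sum_t |targets[t] - x * coeffs[t]|.

    Terms with coeffs[t] == 0 do not depend on x; every other term equals
    |coeffs[t]| * |targets[t] / coeffs[t] - x|, so a weighted median is optimal.
    """
    kept = [(t / c, abs(c)) for c, t in zip(coeffs, targets) if c != 0]
    if not kept:
        return 0.0
    values, weights = zip(*kept)
    return weighted_median(values, weights)


def l1_error(A, u, v):
    """Entrywise l1 error ||A - u v^T||_1 = sum_ij |A_ij - u_i v_j|."""
    return sum(abs(a_ij - u_i * v_j) for row, u_i in zip(A, u) for a_ij, v_j in zip(row, v))


def rank1_l1(A, iters=20):
    """Alternating-minimisation heuristic for rank-1 l1 approximation (no guarantee)."""
    n, d = len(A), len(A[0])
    # Hypothetical initialisation: the column with the largest l1 norm.
    start = max(range(d), key=lambda j: sum(abs(A[i][j]) for i in range(n)))
    u = [float(A[i][start]) for i in range(n)]
    v = [0.0] * d
    history = []
    for _ in range(iters):
        v = [l1_scalar_fit(u, [A[i][j] for i in range(n)]) for j in range(d)]
        u = [l1_scalar_fit(v, A[i]) for i in range(n)]
        history.append(l1_error(A, u, v))
    return u, v, history


A = [[1, 1, 2], [2, 2, 4], [3, 3, 6]]  # exactly rank 1
u, v, history = rank1_l1(A)
print(history[-1])  # 0.0 for this exactly rank-1 matrix
```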

Semantic-Aware Depth Super-Resolution in Outdoor Scenes


Title	Semantic-Aware Depth Super-Resolution in Outdoor Scenes
Authors	Miaomiao Liu, Mathieu Salzmann, Xuming He
Abstract	While depth sensors are becoming increasingly popular, their spatial resolution often remains limited. Depth super-resolution therefore emerged as a solution to this problem. Despite much progress, state-of-the-art techniques suffer from two drawbacks: (i) they rely on the assumption that intensity edges coincide with depth discontinuities, which, unfortunately, is only true in controlled environments; and (ii) they typically exploit the availability of high-resolution training depth maps, which often cannot be acquired in practice due to the sensors’ limitations. By contrast, here, we introduce an approach to performing depth super-resolution in more challenging conditions, such as in outdoor scenes. To this end, we first propose to exploit semantic information to better constrain the super-resolution process. In particular, we design a co-sparse analysis model that learns filters from joint intensity, depth and semantic information. Furthermore, we show how low-resolution training depth maps can be employed in our learning strategy. We demonstrate the benefits of our approach over state-of-the-art depth super-resolution methods on two outdoor scene datasets.
Tasks	Super-Resolution
Published	2016-05-31
URL	http://arxiv.org/abs/1605.09546v1
PDF	http://arxiv.org/pdf/1605.09546v1.pdf
PWC	https://paperswithcode.com/paper/semantic-aware-depth-super-resolution-in
Repo
Framework

LCrowdV: Generating Labeled Videos for Simulation-based Crowd Behavior Learning


Title	LCrowdV: Generating Labeled Videos for Simulation-based Crowd Behavior Learning
Authors	Ernest Cheung, Tsan Kwong Wong, Aniket Bera, Xiaogang Wang, Dinesh Manocha
Abstract	We present a novel procedural framework to generate an arbitrary number of labeled crowd videos (LCrowdV). The resulting crowd video datasets are used to design accurate algorithms or training models for crowded scene understanding. Our overall approach is composed of two components: a procedural simulation framework for generating crowd movements and behaviors, and a procedural rendering framework to generate different videos or images. Each video or image is automatically labeled based on the environment, number of pedestrians, density, behavior, flow, lighting conditions, viewpoint, noise, etc. Furthermore, we can increase the realism by combining synthetically-generated behaviors with real-world background videos. We demonstrate the benefits of LCrowdV over prior labeled crowd datasets by improving the accuracy of pedestrian detection and crowd behavior classification algorithms. LCrowdV will be released on the web.
Tasks	Pedestrian Detection, Scene Understanding
Published	2016-06-29
URL	http://arxiv.org/abs/1606.08998v2
PDF	http://arxiv.org/pdf/1606.08998v2.pdf
PWC	https://paperswithcode.com/paper/lcrowdv-generating-labeled-videos-for
Repo
Framework

Hierarchical Gaussian Mixture Model with Objects Attached to Terminal and Non-terminal Dendrogram Nodes


Title	Hierarchical Gaussian Mixture Model with Objects Attached to Terminal and Non-terminal Dendrogram Nodes
Authors	Łukasz P. Olech, Mariusz Paradowski
Abstract	A hierarchical clustering algorithm based on a Gaussian mixture model is presented. The key difference from regular hierarchical mixture models is the ability to store objects in both terminal and non-terminal nodes. Upper levels of the hierarchy contain sparsely distributed objects, while lower levels contain densely represented ones. As experiments show, this ability helps in noise detection (modelling). Furthermore, compared to a regular hierarchical mixture model, the presented method generates more compact dendrograms of higher quality, as measured by the adopted F-measure.
Tasks
Published	2016-03-28
URL	http://arxiv.org/abs/1603.08342v1
PDF	http://arxiv.org/pdf/1603.08342v1.pdf
PWC	https://paperswithcode.com/paper/hierarchical-gaussian-mixture-model-with
Repo
Framework

Generalized Linear Models for Aggregated Data


Title	Generalized Linear Models for Aggregated Data
Authors	Avradeep Bhowmik, Joydeep Ghosh, Oluwasanmi Koyejo
Abstract	Databases in domains such as healthcare are routinely released to the public in aggregated form. Unfortunately, naive modeling with aggregated data may significantly diminish the accuracy of inferences at the individual level. This paper addresses the scenario where features are provided at the individual level, but the target variables are only available as histogram aggregates or order statistics. We consider a limiting case of generalized linear modeling when the target variables are only known up to permutation, and explore how this relates to permutation testing, a standard technique for assessing statistical dependency. Based on this relationship, we propose a simple algorithm to estimate the model parameters and individual-level inferences via alternating imputation and standard generalized linear model fitting. Our results suggest that the proposed approach is effective when, in the original data, permutation testing accurately ascertains the veracity of the linear relationship. The framework is extended to general histogram data with larger bins, with order statistics such as the median as a limiting case. Our experimental results on simulated data and aggregated healthcare data suggest a diminishing-returns property with respect to the granularity of the histogram: when a linear relationship holds in the original data, the targets can be predicted accurately given relatively coarse histograms.
Tasks	Imputation
Published	2016-05-14
URL	http://arxiv.org/abs/1605.04466v1
PDF	http://arxiv.org/pdf/1605.04466v1.pdf
PWC	https://paperswithcode.com/paper/generalized-linear-models-for-aggregated-data
Repo
Framework
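
The alternating scheme in the abstract (impute targets, then refit the GLM) can be sketched for the limiting case where the targets are known only up to permutation. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses the identity-link Gaussian GLM (ordinary least squares with an intercept), re-imputes by assigning sorted targets to sorted predictions (the optimal permutation for squared error, by the rearrangement inequality), and initialises by rank-matching against the first feature, which is a hypothetical choice. Both steps are exact block minimisers, so the squared error is non-increasing, but recovery of the true coefficients is not guaranteed.

```python
def solve_linear(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(M, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def least_squares(X, y):
    """Ordinary least squares with an intercept (Gaussian GLM, identity link)."""
    rows = [[1.0] + list(x) for x in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    return solve_linear(xtx, xty)


def predict(beta, X):
    return [beta[0] + sum(b * x for b, x in zip(beta[1:], row)) for row in X]


def rank_match(predictions, targets):
    """Assign sorted targets to sorted predictions (optimal permutation for squared error)."""
    order = sorted(range(len(predictions)), key=predictions.__getitem__)
    imputed = [0.0] * len(targets)
    for idx, target in zip(order, sorted(targets)):
        imputed[idx] = target
    return imputed


def fit_with_permuted_targets(X, bag, iters=25):
    # Hypothetical initialisation: rank-match the bag of targets to the first feature.
    y = rank_match([row[0] for row in X], bag)
    history = []
    for _ in range(iters):
        beta = least_squares(X, y)       # GLM fit on the current imputation
        preds = predict(beta, X)
        y = rank_match(preds, bag)       # re-impute individual targets
        history.append(sum((t - f) ** 2 for t, f in zip(y, preds)))
    return beta, y, history


X = [[0, 1], [1, 0], [2, 3], [3, 1], [4, 4], [5, 2], [6, 5], [7, 0]]
bag = sorted(1 + 2 * a + 0.5 * b for a, b in X)  # targets with their order discarded
beta, imputed, history = fit_with_permuted_targets(X, bag)
print(beta, history[-1])
```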

Redefining Binarization and the Visual Archetype


Title	Redefining Binarization and the Visual Archetype
Authors	Anguelos Nicolaou, Liwicki Marcus
Abstract	Although binarization is considered passé, it still remains a highly popular research topic. In this paper we propose a rethinking of what binarization is. We introduce the notion of the visual archetype as the ideal form of any one document. Binarization can be defined as the restoration of the visual archetype for a class of images. This definition broadens the scope of what binarization means but also suggests that ground truth should focus on the foreground.
Tasks	Document Binarization
Published	2016-09-29
URL	http://arxiv.org/abs/1609.09451v1
PDF	http://arxiv.org/pdf/1609.09451v1.pdf
PWC	https://paperswithcode.com/paper/redefining-binarization-and-the-visual
Repo
Framework

NIST: An Image Classification Network to Image Semantic Retrieval


Title	NIST: An Image Classification Network to Image Semantic Retrieval
Authors	Le Dong, Xiuyuan Chen, Mengdie Mao, Qianni Zhang
Abstract	This paper proposes a classification network to image semantic retrieval (NIST) framework to address the image retrieval challenge. Our approach leverages the successful classification network GoogLeNet, based on Convolutional Neural Networks, to obtain a semantic feature matrix that contains the serial numbers of classes and their corresponding probabilities. Whereas traditional image retrieval uses feature matching to compute the similarity between two images, NIST leverages semantic information to construct the semantic feature matrix and uses a semantic distance algorithm to compute the similarity. In addition, the fusion strategy can significantly reduce storage and time consumption because fewer classes participate in the final semantic distance computation. Experiments demonstrate that our NIST framework produces state-of-the-art results in retrieval experiments on the MIRFLICKR-25K dataset.
Tasks	Image Classification, Image Retrieval
Published	2016-07-02
URL	http://arxiv.org/abs/1607.00464v1
PDF	http://arxiv.org/pdf/1607.00464v1.pdf
PWC	https://paperswithcode.com/paper/nist-an-image-classification-network-to-image
Repo
Framework
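
The idea of retrieving by class probabilities rather than by low-level feature matching can be illustrated with a toy sketch. The abstract does not specify the semantic distance or the fusion rule, so everything below is an assumption: each image keeps only its top-k (class index, probability) pairs, which is where the storage and time savings would come from, and the distance is a hypothetical $\ell_1$ gap over the union of kept classes. The probability vectors stand in for classifier outputs and are made up.

```python
def semantic_feature(class_probs, k):
    """Keep the k most probable (class_index, probability) pairs as a sparse dict."""
    ranked = sorted(enumerate(class_probs), key=lambda pair: pair[1], reverse=True)
    return dict(ranked[:k])


def semantic_distance(a, b):
    """Hypothetical distance: l1 gap over the union of kept classes (missing = 0)."""
    return sum(abs(a.get(c, 0.0) - b.get(c, 0.0)) for c in set(a) | set(b))


def retrieve(query_probs, database, k=3):
    """Rank database images by semantic distance to the query (closest first)."""
    query = semantic_feature(query_probs, k)
    feats = {name: semantic_feature(probs, k) for name, probs in database.items()}
    return sorted(feats, key=lambda name: semantic_distance(query, feats[name]))


# Toy class-probability vectors over five classes (made-up numbers, not from the paper).
database = {
    "cat_1": [0.70, 0.20, 0.10, 0.00, 0.00],
    "car_1": [0.00, 0.10, 0.00, 0.80, 0.10],
    "cat_2": [0.60, 0.30, 0.07, 0.03, 0.00],
}
print(retrieve([0.65, 0.25, 0.10, 0.00, 0.00], database))  # ['cat_1', 'cat_2', 'car_1']
```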

The Predictive Context Tree: Predicting Contexts and Interactions


Title	The Predictive Context Tree: Predicting Contexts and Interactions
Authors	Alasdair Thomason, Nathan Griffiths, Victor Sanchez
Abstract	With a large proportion of people carrying location-aware smartphones, we have an unprecedented platform from which to understand individuals and predict their future actions. This work builds upon the Context Tree data structure that summarises the historical contexts of individuals from augmented geospatial trajectories, and constructs a predictive model for their likely future contexts. The Predictive Context Tree (PCT) is constructed as a hierarchical classifier, capable of predicting both the future locations that a user will visit and the contexts that a user will be immersed within. The PCT is evaluated over real-world geospatial trajectories, and compared against existing location extraction and prediction techniques, as well as a proposed hybrid approach that uses identified land usage elements in combination with machine learning to predict future interactions. Our results demonstrate that higher predictive accuracies can be achieved using this hybrid approach over traditional extracted location datasets, and the PCT itself matches the performance of the hybrid approach at predicting future interactions, while adding utility in the form of context predictions. Such a prediction system is capable of understanding not only where a user will visit, but also their context, in terms of what they are likely to be doing.
Tasks
Published	2016-10-05
URL	http://arxiv.org/abs/1610.01381v1
PDF	http://arxiv.org/pdf/1610.01381v1.pdf
PWC	https://paperswithcode.com/paper/the-predictive-context-tree-predicting
Repo
Framework

Orthographic Syllable as basic unit for SMT between Related Languages


Title	Orthographic Syllable as basic unit for SMT between Related Languages
Authors	Anoop Kunchukuttan, Pushpak Bhattacharyya
Abstract	We explore the use of the orthographic syllable, a variable-length consonant-vowel sequence, as a basic unit of translation between related languages which use abugida or alphabetic scripts. We show that orthographic syllable level translation significantly outperforms models trained over other basic units (word, morpheme and character) when training over small parallel corpora.
Tasks
Published	2016-10-03
URL	http://arxiv.org/abs/1610.00634v1
PDF	http://arxiv.org/pdf/1610.00634v1.pdf
PWC	https://paperswithcode.com/paper/orthographic-syllable-as-basic-unit-for-smt
Repo
Framework
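
An orthographic syllable is described above as a variable-length consonant-vowel sequence. A minimal sketch of the segmentation follows, under a loud simplification: it works on a Latin transliteration with the five letters a, e, i, o, u treated as vowels, whereas the paper targets abugida and alphabetic scripts, where Unicode consonants, vowel signs and viramas would need script-specific handling. Attaching a word-final consonant run as its own unit is likewise an assumption of this sketch.

```python
import re

VOWELS = "aeiou"  # assumption: Latin transliteration, five vowel letters
# Any consonant run followed by one or more vowels; a word-final consonant run stands alone.
O_SYLLABLE = re.compile(f"[^{VOWELS}]*[{VOWELS}]+|[^{VOWELS}]+$")


def orthographic_syllables(word):
    """Split a romanised word into consonant*-vowel+ units (simplified)."""
    return O_SYLLABLE.findall(word.lower())


print(orthographic_syllables("namaste"))  # ['na', 'ma', 'ste']
print(orthographic_syllables("bharat"))   # ['bha', 'ra', 't']
```

Using these units as translation tokens gives vocabularies larger than characters but smaller than words, which is one plausible reason they suit small parallel corpora.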

Exploiting Spatio-Temporal Structure with Recurrent Winner-Take-All Networks


Title	Exploiting Spatio-Temporal Structure with Recurrent Winner-Take-All Networks
Authors	Eder Santana, Matthew Emigh, Pablo Zegers, Jose C Principe
Abstract	We propose a convolutional recurrent neural network, with Winner-Take-All dropout for high dimensional unsupervised feature learning in multi-dimensional time series. We apply the proposed method for object recognition with temporal context in videos and obtain better results than comparable methods in the literature, including the Deep Predictive Coding Networks previously proposed by Chalasani and Principe. Our contributions can be summarized as a scalable reinterpretation of the Deep Predictive Coding Networks trained end-to-end with backpropagation through time, an extension of the previously proposed Winner-Take-All Autoencoders to sequences in time, and a new technique for initializing and regularizing convolutional-recurrent neural networks.
Tasks	Object Recognition, Time Series
Published	2016-10-31
URL	http://arxiv.org/abs/1611.00050v2
PDF	http://arxiv.org/pdf/1611.00050v2.pdf
PWC	https://paperswithcode.com/paper/exploiting-spatio-temporal-structure-with
Repo
Framework
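
The abstract does not spell out how Winner-Take-All dropout operates. The sketch below assumes it follows the spatial-sparsity rule of the earlier Winner-Take-All Autoencoders (Makhzani and Frey), where each convolutional feature map keeps only its single largest activation and every other unit is zeroed, so that only the winners pass gradient during training. The function name and the nested-list layout are illustrative assumptions; a real implementation would operate on framework tensors.

```python
def spatial_wta(feature_maps):
    """Spatial winner-take-all: in each 2-D map keep only the single largest activation.

    feature_maps is a list of maps, each a list of equal-length rows.
    Ties are broken in favour of the first maximum in row-major order.
    """
    sparse_maps = []
    for fmap in feature_maps:
        width = len(fmap[0])
        flat = [value for row in fmap for value in row]
        win_row, win_col = divmod(max(range(len(flat)), key=flat.__getitem__), width)
        sparse_maps.append([
            [value if (r, c) == (win_row, win_col) else 0.0 for c, value in enumerate(row)]
            for r, row in enumerate(fmap)
        ])
    return sparse_maps


maps = [[[1.0, 3.0], [2.0, 0.5]], [[-1.0, -2.0], [-3.0, -4.0]]]
print(spatial_wta(maps))
# [[[0.0, 3.0], [0.0, 0.0]], [[-1.0, 0.0], [0.0, 0.0]]]
```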

Nonparametric Bayesian Negative Binomial Factor Analysis


Title	Nonparametric Bayesian Negative Binomial Factor Analysis
Authors	Mingyuan Zhou
Abstract	A common approach to analyzing a covariate-sample count matrix, each element of which represents how many times a covariate appears in a sample, is to factorize it under the Poisson likelihood. We show its limitation in capturing the tendency for a covariate present in a sample to both repeat itself and excite related ones. To address this limitation, we construct negative binomial factor analysis (NBFA) to factorize the matrix under the negative binomial likelihood, and relate it to a Dirichlet-multinomial distribution based mixed-membership model. To support countably infinite factors, we propose the hierarchical gamma-negative binomial process. By exploiting newly proved connections between discrete distributions, we construct two blocked Gibbs samplers and a collapsed Gibbs sampler that all adaptively truncate their number of factors, and demonstrate that the blocked Gibbs sampler developed under a compound Poisson representation converges fast and has low computational complexity. Example results show that NBFA has a distinct mechanism in adjusting its number of inferred factors according to the sample lengths, and provides clear advantages in parsimonious representation, predictive power, and computational complexity over previously proposed discrete latent variable models, which either completely ignore burstiness, or model only the burstiness of the covariates but not that of the factors.
Tasks	Latent Variable Models
Published	2016-04-25
URL	http://arxiv.org/abs/1604.07464v2
PDF	http://arxiv.org/pdf/1604.07464v2.pdf
PWC	https://paperswithcode.com/paper/nonparametric-bayesian-negative-binomial
Repo
Framework
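
The limitation of the Poisson likelihood described above comes down to its variance being tied to its mean, while bursty counts are overdispersed. The sketch below illustrates only that likelihood difference, not NBFA or its Gibbs samplers: it draws negative binomial counts through the standard gamma-Poisson mixture ($\lambda \sim \mathrm{Gamma}(r, p/(1-p))$, then a Poisson draw with rate $\lambda$), giving mean $rp/(1-p)$ and variance $rp/(1-p)^2$, and compares them with plain Poisson draws of the same mean. The parameter values are arbitrary choices for the demonstration.

```python
import math
import random
from statistics import mean, pvariance


def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for small rates."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def sample_negative_binomial(r, p, rng):
    """NB(r, p) drawn as a gamma-Poisson mixture: lambda ~ Gamma(shape r, scale p / (1 - p))."""
    lam = rng.gammavariate(r, p / (1.0 - p))
    return sample_poisson(lam, rng)


rng = random.Random(0)
nb = [sample_negative_binomial(2.0, 0.5, rng) for _ in range(20000)]
po = [sample_poisson(2.0, rng) for _ in range(20000)]
print(f"NB:      mean={mean(nb):.2f} var={pvariance(nb):.2f}")  # theory: mean 2, variance 4
print(f"Poisson: mean={mean(po):.2f} var={pvariance(po):.2f}")  # theory: mean 2, variance 2
```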

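Blind Source Separation: Fundamentals and Recent Advances (A Tutorial Overview Presented at SBrT-2001)


Title	Blind Source Separation: Fundamentals and Recent Advances (A Tutorial Overview Presented at SBrT-2001)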
Authors	Eleftherios Kofidis
Abstract	Blind source separation (BSS), i.e., the decoupling of unknown signals that have been mixed in an unknown way, has been a topic of great interest in the signal processing community for the last decade, covering a wide range of applications in such diverse fields as digital communications, pattern recognition, biomedical engineering, and financial data analysis, among others. This course aims to introduce the BSS problem through an exposition of well-known and established, as well as some more recent, approaches to its solution. A unified approach is followed in presenting the various results, so as to bring out their similarities and differences more easily and to emphasize their relative advantages and disadvantages. Only a representative sample of the existing knowledge on BSS will be included in this course. Interested readers are encouraged to consult the list of bibliographical references for more details on this exciting and always active research topic.
Tasks
Published	2016-03-09
URL	http://arxiv.org/abs/1603.03089v1
PDF	http://arxiv.org/pdf/1603.03089v1.pdf
PWC	https://paperswithcode.com/paper/blind-source-separation-fundamentals-and
Repo
Framework

Real-time Human Pose Estimation from Video with Convolutional Neural Networks


Title	Real-time Human Pose Estimation from Video with Convolutional Neural Networks
Authors	Marko Linna, Juho Kannala, Esa Rahtu
Abstract	In this paper, we present a method for real-time multi-person human pose estimation from video by utilizing convolutional neural networks. Our method is aimed at use-case-specific applications, where good accuracy is essential and variation of the background and poses is limited. This enables us to use a generic network architecture, which is both accurate and fast. We divide the problem into two phases: (1) pre-training and (2) finetuning. In pre-training, the network is trained with highly diverse input data from publicly available datasets, while in finetuning we train with application-specific data, which we record with Kinect. Our method differs from most of the state-of-the-art methods in that we consider the whole system, including person detector, pose estimator and an automatic way to record application-specific training material for finetuning. Our method is considerably faster than many of the state-of-the-art methods. Our method can be thought of as a replacement for Kinect, and it can be used for higher level tasks, such as gesture control, games, person tracking, action recognition and action tracking. We achieved accuracy of 96.8% (PCK@0.2) with application-specific data.
Tasks	Pose Estimation, Temporal Action Localization
Published	2016-09-23
URL	http://arxiv.org/abs/1609.07420v1
PDF	http://arxiv.org/pdf/1609.07420v1.pdf
PWC	https://paperswithcode.com/paper/real-time-human-pose-estimation-from-video
Repo
Framework
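
The reported figure uses PCK@0.2 (Percentage of Correct Keypoints): a predicted joint counts as correct when its distance to the ground-truth joint is at most 0.2 times a per-person reference size. Benchmarks differ on that reference (torso diameter, bounding-box size, head segment), and the abstract does not say which one is used, so `ref_size` below is an assumption supplied by the caller. The sketch pools joints over all frames.

```python
import math


def pck(samples, alpha=0.2):
    """Percentage of Correct Keypoints over (pred, gt, ref_size) triples.

    pred and gt are equal-length lists of (x, y) joint positions; a joint counts
    as correct when its Euclidean error is at most alpha * ref_size.
    """
    correct = total = 0
    for pred, gt, ref_size in samples:
        for (px, py), (gx, gy) in zip(pred, gt):
            correct += math.hypot(px - gx, py - gy) <= alpha * ref_size
            total += 1
    return correct / total


# One frame, three joints, reference size 100 px, so the threshold is 20 px.
frame = ([(0, 0), (10, 0), (0, 25)], [(0, 0), (0, 0), (0, 0)], 100.0)
print(pck([frame]))  # errors of 0, 10 and 25 px: 2 of 3 joints are correct
```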

Minimalist Regression Network with Reinforced Gradients and Weighted Estimates: a Case Study on Parameters Estimation in Automated Welding


Title	Minimalist Regression Network with Reinforced Gradients and Weighted Estimates: a Case Study on Parameters Estimation in Automated Welding
Authors	Soheil Keshmiri
Abstract	This paper presents a minimalist neural regression network as an aggregate of independent identical regression blocks that are trained simultaneously. Moreover, it introduces a new multiplicative parameter, shared by all the neural units of a given layer, to maintain the quality of its gradients. Furthermore, it increases its estimation accuracy via learning a weight factor whose quantity captures the redundancy between the estimated and actual values at each training iteration. We choose the estimation of the direct weld parameters of different welding techniques to show a significant improvement in calculation of these parameters by our model in contrast to state-of-the-art techniques in the literature. Furthermore, we demonstrate the ability of our model to retain its performance when presented with combined data of different welding techniques. This is a nontrivial result in attaining a scalable model whose quality of estimation is independent of the adopted welding technique.
Tasks
Published	2016-07-05
URL	http://arxiv.org/abs/1607.01136v1
PDF	http://arxiv.org/pdf/1607.01136v1.pdf
PWC	https://paperswithcode.com/paper/minimalist-regression-network-with-reinforced
Repo
Framework