Paper Group ANR 97
Understanding Image and Text Simultaneously: a Dual Vision-Language Machine Comprehension Task
Title | Understanding Image and Text Simultaneously: a Dual Vision-Language Machine Comprehension Task |
Authors | Nan Ding, Sebastian Goodman, Fei Sha, Radu Soricut |
Abstract | We introduce a new multi-modal task for computer systems, posed as a combined vision-language comprehension challenge: identifying the most suitable text describing a scene, given several similar options. Accomplishing the task entails demonstrating comprehension beyond just recognizing “keywords” (or key-phrases) and their corresponding visual concepts. Instead, it requires an alignment between the representations of the two modalities that achieves a visually-grounded “understanding” of various linguistic elements and their dependencies. This new task also admits an easy-to-compute and well-studied metric: the accuracy in detecting the true target among the decoys. The paper makes several contributions: an effective and extensible mechanism for generating decoys from (human-created) image captions; an instance of applying this mechanism, yielding a large-scale machine comprehension dataset (based on the COCO images and captions) that we make publicly available; human evaluation results on this dataset, informing a performance upper-bound; and several baseline and competitive learning approaches that illustrate the utility of the proposed task and dataset in advancing both image and language comprehension. We also show that, in a multi-task learning setting, the performance on the proposed task is positively correlated with the end-to-end task of image captioning. |
Tasks | Image Captioning, Multi-Task Learning, Reading Comprehension |
Published | 2016-12-22 |
URL | http://arxiv.org/abs/1612.07833v1 |
http://arxiv.org/pdf/1612.07833v1.pdf | |
PWC | https://paperswithcode.com/paper/understanding-image-and-text-simultaneously-a |
Repo | |
Framework | |
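The accuracy metric mentioned in the abstract above can be illustrated with a minimal sketch. It assumes each example comes with one score per candidate caption (higher meaning a better match to the image) and the index of the true caption. The function name `decoy_accuracy` and the data layout are hypothetical and not taken from the paper.

```python
from typing import Sequence


def decoy_accuracy(scores: Sequence[Sequence[float]], targets: Sequence[int]) -> float:
    """Fraction of examples where the top-scoring candidate is the true caption."""
    if len(scores) != len(targets):
        raise ValueError("scores and targets must have the same length")
    if not scores:
        raise ValueError("need at least one example")
    correct = 0
    for candidate_scores, target in zip(scores, targets):
        # Index of the highest-scoring candidate (the first one wins on ties).
        predicted = max(range(len(candidate_scores)), key=candidate_scores.__getitem__)
        correct += int(predicted == target)
    return correct / len(scores)


# Two examples with three candidates each: one true caption and two decoys.
example_scores = [[0.1, 0.9, 0.0], [0.7, 0.2, 0.1]]
example_targets = [1, 2]
print(decoy_accuracy(example_scores, example_targets))  # 0.5
```

With uniformly random scores and N candidates per example, this measure would sit near 1/N, which gives a simple chance baseline to compare against.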
Low Rank Approximation with Entrywise $\ell_1$-Norm Error
Title | Low Rank Approximation with Entrywise $\ell_1$-Norm Error |
Authors | Zhao Song, David P. Woodruff, Peilin Zhong |
Abstract | We study the $\ell_1$-low rank approximation problem, where for a given $n \times d$ matrix $A$ and approximation factor $\alpha \geq 1$, the goal is to output a rank-$k$ matrix $\widehat{A}$ for which $$\lVert A-\widehat{A}\rVert_1 \leq \alpha \cdot \min_{\textrm{rank-}k\textrm{ matrices}~A'}\lVert A-A'\rVert_1,$$ where for an $n \times d$ matrix $C$, we let $\lVert C\rVert_1 = \sum_{i=1}^n \sum_{j=1}^d \lvert C_{i,j}\rvert$. This error measure is known to be more robust than the Frobenius norm in the presence of outliers and is indicated in models where Gaussian assumptions on the noise may not apply. The problem was shown to be NP-hard by Gillis and Vavasis and a number of heuristics have been proposed. It was asked in multiple places if there are any approximation algorithms. We give the first provable approximation algorithms for $\ell_1$-low rank approximation, showing that it is possible to achieve approximation factor $\alpha = (\log d) \cdot \mathrm{poly}(k)$ in $\mathrm{nnz}(A) + (n+d) \mathrm{poly}(k)$ time, where $\mathrm{nnz}(A)$ denotes the number of non-zero entries of $A$. If $k$ is constant, we further improve the approximation ratio to $O(1)$ with a $\mathrm{poly}(nd)$-time algorithm. Under the Exponential Time Hypothesis, we show there is no $\mathrm{poly}(nd)$-time algorithm achieving a $(1+\frac{1}{\log^{1+\gamma}(nd)})$-approximation, for $\gamma > 0$ an arbitrarily small constant, even when $k = 1$. We give a number of additional results for $\ell_1$-low rank approximation: nearly tight upper and lower bounds for column subset selection, CUR decompositions, extensions to low rank approximation with respect to $\ell_p$-norms for $1 \leq p < 2$ and earthmover distance, low-communication distributed protocols and low-memory streaming algorithms, algorithms with limited randomness, and bicriteria algorithms. We also give a preliminary empirical evaluation. |
Tasks | |
Published | 2016-11-03 |
URL | http://arxiv.org/abs/1611.00898v1 |
http://arxiv.org/pdf/1611.00898v1.pdf | |
PWC | https://paperswithcode.com/paper/low-rank-approximation-with-entrywise-ell_1 |
Repo | |
Framework | |
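To make the entrywise $\ell_1$ objective from the abstract above concrete, the following minimal sketch evaluates it for a hand-picked rank-1 candidate. It only computes the error measure. It is not the paper's approximation algorithm, and the helper names `outer` and `entrywise_l1_error` as well as the example matrix are made up for illustration.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def outer(u: Sequence[float], v: Sequence[float]) -> Matrix:
    """Rank-1 matrix u v^T stored as a list of rows."""
    return [[ui * vj for vj in v] for ui in u]


def entrywise_l1_error(a: Matrix, b: Matrix) -> float:
    """Sum of absolute entrywise differences between two same-shaped matrices."""
    return sum(
        abs(x - y)
        for row_a, row_b in zip(a, b)
        for x, y in zip(row_a, row_b)
    )


# A 3 x 2 matrix that is rank 1 except for a single outlier entry (100.0).
A = [[1.0, 2.0], [2.0, 4.0], [3.0, 100.0]]
candidate = outer([1.0, 2.0, 3.0], [1.0, 2.0])  # [[1, 2], [2, 4], [3, 6]]
print(entrywise_l1_error(A, candidate))  # 94.0
```

The outlier contributes |100 - 6| = 94 to the $\ell_1$ error, whereas a squared (Frobenius-type) error would charge 94² for the same entry. That difference is the robustness property the abstract refers to.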
Semantic-Aware Depth Super-Resolution in Outdoor Scenes
Title | Semantic-Aware Depth Super-Resolution in Outdoor Scenes |
Authors | Miaomiao Liu, Mathieu Salzmann, Xuming He |
Abstract | While depth sensors are becoming increasingly popular, their spatial resolution often remains limited. Depth super-resolution therefore emerged as a solution to this problem. Despite much progress, state-of-the-art techniques suffer from two drawbacks: (i) they rely on the assumption that intensity edges coincide with depth discontinuities, which, unfortunately, is only true in controlled environments; and (ii) they typically exploit the availability of high-resolution training depth maps, which can often not be acquired in practice due to the sensors’ limitations. By contrast, here, we introduce an approach to performing depth super-resolution in more challenging conditions, such as in outdoor scenes. To this end, we first propose to exploit semantic information to better constrain the super-resolution process. In particular, we design a co-sparse analysis model that learns filters from joint intensity, depth and semantic information. Furthermore, we show how low-resolution training depth maps can be employed in our learning strategy. We demonstrate the benefits of our approach over state-of-the-art depth super-resolution methods on two outdoor scene datasets. |
Tasks | Super-Resolution |
Published | 2016-05-31 |
URL | http://arxiv.org/abs/1605.09546v1 |
http://arxiv.org/pdf/1605.09546v1.pdf | |
PWC | https://paperswithcode.com/paper/semantic-aware-depth-super-resolution-in |
Repo | |
Framework | |
LCrowdV: Generating Labeled Videos for Simulation-based Crowd Behavior Learning
Title | LCrowdV: Generating Labeled Videos for Simulation-based Crowd Behavior Learning |
Authors | Ernest Cheung, Tsan Kwong Wong, Aniket Bera, Xiaogang Wang, Dinesh Manocha |
Abstract | We present a novel procedural framework to generate an arbitrary number of labeled crowd videos (LCrowdV). The resulting crowd video datasets are used to design accurate algorithms or training models for crowded scene understanding. Our overall approach is composed of two components: a procedural simulation framework for generating crowd movements and behaviors, and a procedural rendering framework to generate different videos or images. Each video or image is automatically labeled based on the environment, number of pedestrians, density, behavior, flow, lighting conditions, viewpoint, noise, etc. Furthermore, we can increase the realism by combining synthetically-generated behaviors with real-world background videos. We demonstrate the benefits of LCrowdV over prior labeled crowd datasets by improving the accuracy of pedestrian detection and crowd behavior classification algorithms. LCrowdV would be released on the WWW. |
Tasks | Pedestrian Detection, Scene Understanding |
Published | 2016-06-29 |
URL | http://arxiv.org/abs/1606.08998v2 |
http://arxiv.org/pdf/1606.08998v2.pdf | |
PWC | https://paperswithcode.com/paper/lcrowdv-generating-labeled-videos-for |
Repo | |
Framework | |
Hierarchical Gaussian Mixture Model with Objects Attached to Terminal and Non-terminal Dendrogram Nodes
Title | Hierarchical Gaussian Mixture Model with Objects Attached to Terminal and Non-terminal Dendrogram Nodes |
Authors | Łukasz P. Olech, Mariusz Paradowski |
Abstract | A hierarchical clustering algorithm based on a Gaussian mixture model is presented. The key difference from regular hierarchical mixture models is the ability to store objects in both terminal and nonterminal nodes. Upper levels of the hierarchy contain sparsely distributed objects, while lower levels contain densely represented ones. As shown by experiments, this ability helps in noise detection (modelling). Furthermore, compared to a regular hierarchical mixture model, the presented method generates more compact dendrograms with higher quality, as measured by the adopted F-measure. |
Tasks | |
Published | 2016-03-28 |
URL | http://arxiv.org/abs/1603.08342v1 |
http://arxiv.org/pdf/1603.08342v1.pdf | |
PWC | https://paperswithcode.com/paper/hierarchical-gaussian-mixture-model-with |
Repo | |
Framework | |
Generalized Linear Models for Aggregated Data
Title | Generalized Linear Models for Aggregated Data |
Authors | Avradeep Bhowmik, Joydeep Ghosh, Oluwasanmi Koyejo |
Abstract | Databases in domains such as healthcare are routinely released to the public in aggregated form. Unfortunately, naive modeling with aggregated data may significantly diminish the accuracy of inferences at the individual level. This paper addresses the scenario where features are provided at the individual level, but the target variables are only available as histogram aggregates or order statistics. We consider a limiting case of generalized linear modeling when the target variables are only known up to permutation, and explore how this relates to permutation testing; a standard technique for assessing statistical dependency. Based on this relationship, we propose a simple algorithm to estimate the model parameters and individual level inferences via alternating imputation and standard generalized linear model fitting. Our results suggest the effectiveness of the proposed approach when, in the original data, permutation testing accurately ascertains the veracity of the linear relationship. The framework is extended to general histogram data with larger bins - with order statistics such as the median as a limiting case. Our experimental results on simulated data and aggregated healthcare data suggest a diminishing returns property with respect to the granularity of the histogram - when a linear relationship holds in the original data, the targets can be predicted accurately given relatively coarse histograms. |
Tasks | Imputation |
Published | 2016-05-14 |
URL | http://arxiv.org/abs/1605.04466v1 |
http://arxiv.org/pdf/1605.04466v1.pdf | |
PWC | https://paperswithcode.com/paper/generalized-linear-models-for-aggregated-data |
Repo | |
Framework | |
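The alternating scheme described in the abstract above (impute targets, then refit) can be sketched for the simplest case: one feature, an identity link (ordinary least squares), and targets known only up to permutation. This is one plausible reading of the abstract, not the authors' implementation. The rank-matching imputation step, the initialization from the shuffled order, and all function names are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y ~ slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    if var_x == 0:
        raise ValueError("x must contain at least two distinct values")
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = cov / var_x
    return slope, mean_y - slope * mean_x


def impute_by_rank(predictions: Sequence[float], targets: Sequence[float]) -> List[float]:
    """Give the k-th smallest target to the sample with the k-th smallest prediction."""
    order = sorted(range(len(predictions)), key=predictions.__getitem__)
    sorted_targets = sorted(targets)
    imputed = [0.0] * len(predictions)
    for rank, index in enumerate(order):
        imputed[index] = sorted_targets[rank]
    return imputed


def fit_permuted_targets(
    x: Sequence[float], shuffled_y: Sequence[float], max_iterations: int = 20
) -> Tuple[float, float]:
    """Alternate between rank-based imputation and least-squares refitting."""
    y = list(shuffled_y)  # assumed initialization: use the shuffled order as given
    slope, intercept = fit_line(x, y)
    for _ in range(max_iterations):
        predictions = [slope * xi + intercept for xi in x]
        new_y = impute_by_rank(predictions, shuffled_y)
        if new_y == y:  # the assignment stopped changing
            break
        y = new_y
        slope, intercept = fit_line(x, y)
    return slope, intercept


# The true relationship is y = 2x + 1, but the targets arrive in shuffled order.
features = [0.0, 1.0, 2.0, 3.0, 4.0]
shuffled_targets = [5.0, 1.0, 9.0, 3.0, 7.0]
slope, intercept = fit_permuted_targets(features, shuffled_targets)
```

Matching sorted targets to sorted predictions minimizes the squared error over all permutations for a fixed model, which is why it is used as the imputation step here. The procedure can still settle on a poor local solution (for example a slope with the wrong sign) if the initial fit is misleading.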
Redefining Binarization and the Visual Archetype
Title | Redefining Binarization and the Visual Archetype |
Authors | Anguelos Nicolaou, Marcus Liwicki |
Abstract | Although binarization is considered passé, it still remains a highly popular research topic. In this paper we propose a rethinking of what binarization is. We introduce the notion of the visual archetype as the ideal form of any one document. Binarization can be defined as the restoration of the visual archetype for a class of images. This definition broadens the scope of what binarization means but also suggests ground-truth should focus on the foreground. |
Tasks | Document Binarization |
Published | 2016-09-29 |
URL | http://arxiv.org/abs/1609.09451v1 |
http://arxiv.org/pdf/1609.09451v1.pdf | |
PWC | https://paperswithcode.com/paper/redefining-binarization-and-the-visual |
Repo | |
Framework | |
NIST: An Image Classification Network to Image Semantic Retrieval
Title | NIST: An Image Classification Network to Image Semantic Retrieval |
Authors | Le Dong, Xiuyuan Chen, Mengdie Mao, Qianni Zhang |
Abstract | This paper proposes a classification network to image semantic retrieval (NIST) framework to counter the image retrieval challenge. Our approach leverages the successful classification network GoogleNet, based on Convolutional Neural Networks, to obtain the semantic feature matrix, which contains the serial numbers of classes and the corresponding probabilities. Compared with traditional image retrieval, which uses feature matching to compute the similarity between two images, NIST leverages the semantic information to construct a semantic feature matrix and uses the semantic distance algorithm to compute the similarity. In addition, the fusion strategy can significantly reduce storage and time consumption because fewer classes participate in the final semantic distance computation. Experiments demonstrate that our NIST framework produces state-of-the-art results in retrieval experiments on the MIRFLICKR-25K dataset. |
Tasks | Image Classification, Image Retrieval |
Published | 2016-07-02 |
URL | http://arxiv.org/abs/1607.00464v1 |
http://arxiv.org/pdf/1607.00464v1.pdf | |
PWC | https://paperswithcode.com/paper/nist-an-image-classification-network-to-image |
Repo | |
Framework | |
The Predictive Context Tree: Predicting Contexts and Interactions
Title | The Predictive Context Tree: Predicting Contexts and Interactions |
Authors | Alasdair Thomason, Nathan Griffiths, Victor Sanchez |
Abstract | With a large proportion of people carrying location-aware smartphones, we have an unprecedented platform from which to understand individuals and predict their future actions. This work builds upon the Context Tree data structure that summarises the historical contexts of individuals from augmented geospatial trajectories, and constructs a predictive model for their likely future contexts. The Predictive Context Tree (PCT) is constructed as a hierarchical classifier, capable of predicting both the future locations that a user will visit and the contexts that a user will be immersed within. The PCT is evaluated over real-world geospatial trajectories, and compared against existing location extraction and prediction techniques, as well as a proposed hybrid approach that uses identified land usage elements in combination with machine learning to predict future interactions. Our results demonstrate that higher predictive accuracies can be achieved using this hybrid approach over traditional extracted location datasets, and the PCT itself matches the performance of the hybrid approach at predicting future interactions, while adding utility in the form of context predictions. Such a prediction system is capable of understanding not only where a user will visit, but also their context, in terms of what they are likely to be doing. |
Tasks | |
Published | 2016-10-05 |
URL | http://arxiv.org/abs/1610.01381v1 |
http://arxiv.org/pdf/1610.01381v1.pdf | |
PWC | https://paperswithcode.com/paper/the-predictive-context-tree-predicting |
Repo | |
Framework | |
Orthographic Syllable as basic unit for SMT between Related Languages
Title | Orthographic Syllable as basic unit for SMT between Related Languages |
Authors | Anoop Kunchukuttan, Pushpak Bhattacharyya |
Abstract | We explore the use of the orthographic syllable, a variable-length consonant-vowel sequence, as a basic unit of translation between related languages which use abugida or alphabetic scripts. We show that orthographic syllable level translation significantly outperforms models trained over other basic units (word, morpheme and character) when training over small parallel corpora. |
Tasks | |
Published | 2016-10-03 |
URL | http://arxiv.org/abs/1610.00634v1 |
http://arxiv.org/pdf/1610.00634v1.pdf | |
PWC | https://paperswithcode.com/paper/orthographic-syllable-as-basic-unit-for-smt |
Repo | |
Framework | |
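As a rough illustration of a variable-length consonant-vowel unit like the one described in the abstract above, the sketch below segments a lowercase Latin-script word into runs of consonants followed by vowels. It is a toy approximation only: the vowel set `aeiou`, the rule that attaches trailing consonants to the last unit, and the function name are assumptions, and the abugida scripts the paper targets would need script-specific character classes.

```python
import re
from typing import List

# Zero or more non-vowel characters followed by one or more vowels (assumed vowel set).
_UNIT = re.compile(r"[^aeiou]*[aeiou]+")


def orthographic_syllables(word: str) -> List[str]:
    """Split a word into consonant-vowel units; leftover consonants join the last unit."""
    word = word.lower()
    units = _UNIT.findall(word)
    consumed = sum(len(unit) for unit in units)
    tail = word[consumed:]  # trailing consonants with no vowel after them
    if tail:
        if units:
            units[-1] += tail
        else:
            units.append(tail)  # word without any vowel stays whole
    return units


print(orthographic_syllables("translation"))  # ['tra', 'nsla', 'tion']
```

Units of this kind sit between characters and words in length, which keeps the vocabulary small while still giving the translation model more context per token than single characters.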
Exploiting Spatio-Temporal Structure with Recurrent Winner-Take-All Networks
Title | Exploiting Spatio-Temporal Structure with Recurrent Winner-Take-All Networks |
Authors | Eder Santana, Matthew Emigh, Pablo Zegers, Jose C Principe |
Abstract | We propose a convolutional recurrent neural network, with Winner-Take-All dropout for high dimensional unsupervised feature learning in multi-dimensional time series. We apply the proposed method for object recognition with temporal context in videos and obtain better results than comparable methods in the literature, including the Deep Predictive Coding Networks previously proposed by Chalasani and Principe. Our contributions can be summarized as a scalable reinterpretation of the Deep Predictive Coding Networks trained end-to-end with backpropagation through time, an extension of the previously proposed Winner-Take-All Autoencoders to sequences in time, and a new technique for initializing and regularizing convolutional-recurrent neural networks. |
Tasks | Object Recognition, Time Series |
Published | 2016-10-31 |
URL | http://arxiv.org/abs/1611.00050v2 |
http://arxiv.org/pdf/1611.00050v2.pdf | |
PWC | https://paperswithcode.com/paper/exploiting-spatio-temporal-structure-with |
Repo | |
Framework | |
Nonparametric Bayesian Negative Binomial Factor Analysis
Title | Nonparametric Bayesian Negative Binomial Factor Analysis |
Authors | Mingyuan Zhou |
Abstract | A common approach to analyze a covariate-sample count matrix, an element of which represents how many times a covariate appears in a sample, is to factorize it under the Poisson likelihood. We show its limitation in capturing the tendency for a covariate present in a sample to both repeat itself and excite related ones. To address this limitation, we construct negative binomial factor analysis (NBFA) to factorize the matrix under the negative binomial likelihood, and relate it to a Dirichlet-multinomial distribution based mixed-membership model. To support countably infinite factors, we propose the hierarchical gamma-negative binomial process. By exploiting newly proved connections between discrete distributions, we construct two blocked and a collapsed Gibbs sampler that all adaptively truncate their number of factors, and demonstrate that the blocked Gibbs sampler developed under a compound Poisson representation converges fast and has low computational complexity. Example results show that NBFA has a distinct mechanism in adjusting its number of inferred factors according to the sample lengths, and provides clear advantages in parsimonious representation, predictive power, and computational complexity over previously proposed discrete latent variable models, which either completely ignore burstiness, or model only the burstiness of the covariates but not that of the factors. |
Tasks | Latent Variable Models |
Published | 2016-04-25 |
URL | http://arxiv.org/abs/1604.07464v2 |
http://arxiv.org/pdf/1604.07464v2.pdf | |
PWC | https://paperswithcode.com/paper/nonparametric-bayesian-negative-binomial |
Repo | |
Framework | |
Blind Source Separation: Fundamentals and Recent Advances (A Tutorial Overview Presented at SBrT-2001)
Title | Blind Source Separation: Fundamentals and Recent Advances (A Tutorial Overview Presented at SBrT-2001) |
Authors | Eleftherios Kofidis |
Abstract | Blind source separation (BSS), i.e., the decoupling of unknown signals that have been mixed in an unknown way, has been a topic of great interest in the signal processing community for the last decade, covering a wide range of applications in such diverse fields as digital communications, pattern recognition, biomedical engineering, and financial data analysis, among others. This course aims at an introduction to the BSS problem via an exposition of well-known and established as well as some more recent approaches to its solution. A unified way is followed in presenting the various results so as to more easily bring out their similarities/differences and emphasize their relative advantages/disadvantages. Only a representative sample of the existing knowledge on BSS will be included in this course. The interested readers are encouraged to consult the list of bibliographical references for more details on this exciting and always active research topic. |
Tasks | |
Published | 2016-03-09 |
URL | http://arxiv.org/abs/1603.03089v1 |
http://arxiv.org/pdf/1603.03089v1.pdf | |
PWC | https://paperswithcode.com/paper/blind-source-separation-fundamentals-and |
Repo | |
Framework | |
Real-time Human Pose Estimation from Video with Convolutional Neural Networks
Title | Real-time Human Pose Estimation from Video with Convolutional Neural Networks |
Authors | Marko Linna, Juho Kannala, Esa Rahtu |
Abstract | In this paper, we present a method for real-time multi-person human pose estimation from video by utilizing convolutional neural networks. Our method is aimed at use-case-specific applications, where good accuracy is essential and variation of the background and poses is limited. This enables us to use a generic network architecture, which is both accurate and fast. We divide the problem into two phases: (1) pre-training and (2) finetuning. In pre-training, the network is learned with highly diverse input data from publicly available datasets, while in finetuning we train with application-specific data, which we record with Kinect. Our method differs from most of the state-of-the-art methods in that we consider the whole system, including person detector, pose estimator and an automatic way to record application-specific training material for finetuning. Our method is considerably faster than many of the state-of-the-art methods. Our method can be thought of as a replacement for Kinect, and it can be used for higher level tasks, such as gesture control, games, person tracking, action recognition and action tracking. We achieved an accuracy of 96.8% (PCK@0.2) with application-specific data. |
Tasks | Pose Estimation, Temporal Action Localization |
Published | 2016-09-23 |
URL | http://arxiv.org/abs/1609.07420v1 |
http://arxiv.org/pdf/1609.07420v1.pdf | |
PWC | https://paperswithcode.com/paper/real-time-human-pose-estimation-from-video |
Repo | |
Framework | |
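The PCK@0.2 figure quoted in the abstract above refers to the Percentage of Correct Keypoints at a threshold of 0.2 times a reference length. The sketch below shows the general form of that metric. Which reference length the authors used (for example torso size or bounding-box size) is not stated in the abstract, so `reference_length` is left as an explicit parameter, and the function name and sample coordinates are illustrative only.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def pck(
    predicted: Sequence[Point],
    ground_truth: Sequence[Point],
    reference_length: float,
    alpha: float = 0.2,
) -> float:
    """Share of keypoints whose prediction lies within alpha * reference_length of the truth."""
    if len(predicted) != len(ground_truth):
        raise ValueError("predicted and ground_truth must have the same length")
    if not predicted:
        raise ValueError("need at least one keypoint")
    threshold = alpha * reference_length
    hits = sum(
        1
        for (px, py), (gx, gy) in zip(predicted, ground_truth)
        if math.hypot(px - gx, py - gy) <= threshold
    )
    return hits / len(predicted)


# Three keypoints with a reference length of 100 pixels, so the threshold is 20 pixels.
predicted_points = [(10.0, 10.0), (50.0, 50.0), (90.0, 90.0)]
true_points = [(12.0, 10.0), (50.0, 80.0), (90.0, 91.0)]
score = pck(predicted_points, true_points, reference_length=100.0)
```

In this example the second keypoint is 30 pixels off and therefore misses the 20-pixel threshold, so two of the three keypoints count as correct.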
Minimalist Regression Network with Reinforced Gradients and Weighted Estimates: a Case Study on Parameters Estimation in Automated Welding
Title | Minimalist Regression Network with Reinforced Gradients and Weighted Estimates: a Case Study on Parameters Estimation in Automated Welding |
Authors | Soheil Keshmiri |
Abstract | This paper presents a minimalist neural regression network as an aggregate of independent identical regression blocks that are trained simultaneously. Moreover, it introduces a new multiplicative parameter, shared by all the neural units of a given layer, to maintain the quality of its gradients. Furthermore, it increases its estimation accuracy via learning a weight factor whose quantity captures the redundancy between the estimated and actual values at each training iteration. We choose the estimation of the direct weld parameters of different welding techniques to show a significant improvement in calculation of these parameters by our model in contrast to state-of-the-art techniques in the literature. Furthermore, we demonstrate the ability of our model to retain its performance when presented with combined data of different welding techniques. This is a nontrivial result in attaining a scalable model whose quality of estimation is independent of adopted welding techniques. |
Tasks | |
Published | 2016-07-05 |
URL | http://arxiv.org/abs/1607.01136v1 |
http://arxiv.org/pdf/1607.01136v1.pdf | |
PWC | https://paperswithcode.com/paper/minimalist-regression-network-with-reinforced |
Repo | |
Framework | |