July 28, 2019

Paper Group ANR 340

Spatiotemporal Modeling for Crowd Counting in Videos


Title	Spatiotemporal Modeling for Crowd Counting in Videos
Authors	Feng Xiong, Xingjian Shi, Dit-Yan Yeung
Abstract	Region of Interest (ROI) crowd counting can be formulated as a regression problem of learning a mapping from an image or a video frame to a crowd density map. Recently, convolutional neural network (CNN) models have achieved promising results for crowd counting. However, even when dealing with video data, CNN-based methods still consider each video frame independently, ignoring the strong temporal correlation between neighboring frames. To exploit the otherwise very useful temporal information in video sequences, we propose a variant of a recent deep learning model called convolutional LSTM (ConvLSTM) for crowd counting. Unlike the previous CNN-based methods, our method fully captures both spatial and temporal dependencies. Furthermore, we extend the ConvLSTM model to a bidirectional ConvLSTM model which can access long-range information in both directions. Extensive experiments using four publicly available datasets demonstrate the reliability of our approach and the effectiveness of incorporating temporal information to boost the accuracy of crowd counting. In addition, we also conduct some transfer learning experiments to show that once our model is trained on one dataset, its learning experience can be transferred easily to a new dataset which consists of only very few video frames for model adaptation.
Tasks	Crowd Counting, Transfer Learning
Published	2017-07-25
URL	http://arxiv.org/abs/1707.07890v1
PDF	http://arxiv.org/pdf/1707.07890v1.pdf
PWC	https://paperswithcode.com/paper/spatiotemporal-modeling-for-crowd-counting-in
Repo
Framework
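
The abstract above frames ROI crowd counting as regression from a frame to a crowd density map. As a minimal sketch of that formulation only (not the authors' ConvLSTM model), the snippet below builds a ground-truth density map from head annotations and recovers the count by summing the map inside a region of interest. The function names, the Gaussian width `sigma`, and the annotation coordinates are all hypothetical choices made for illustration.

```python
import numpy as np

def density_map(points, shape, sigma=2.0):
    """Place one unit-mass Gaussian per annotated head position (row, col)."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    dmap = np.zeros(shape, dtype=np.float64)
    for py, px in points:
        g = np.exp(-((ys - py) ** 2 + (xs - px) ** 2) / (2.0 * sigma ** 2))
        dmap += g / g.sum()  # normalise so each person contributes a total mass of 1
    return dmap

def roi_count(dmap, roi_mask):
    """Estimated count: the density summed over the region of interest."""
    return float((dmap * roi_mask).sum())

# Hypothetical annotations on a 64x64 frame.
heads = [(10, 12), (30, 40), (31, 44)]
dmap = density_map(heads, (64, 64))

full_mask = np.ones((64, 64))
left_mask = np.zeros((64, 64))
left_mask[:, :32] = 1.0  # ROI covering only the left half of the frame

total = roi_count(dmap, full_mask)
left = roi_count(dmap, left_mask)
```

A counting network trained under this formulation would predict `dmap` from pixels; the count then follows from the same summation.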

Sparse Kernel Canonical Correlation Analysis via $\ell_1$-regularization


Title	Sparse Kernel Canonical Correlation Analysis via $\ell_1$-regularization
Authors	Xiaowei Zhang, Delin Chu, Li-Zhi Liao, Michael K. Ng
Abstract	Canonical correlation analysis (CCA) is a multivariate statistical technique for finding the linear relationship between two sets of variables. The kernel generalization of CCA, named kernel CCA, has been proposed to find nonlinear relations between datasets. Despite their wide usage, both share one common limitation: the lack of sparsity in their solutions. In this paper, we consider sparse kernel CCA and propose a novel sparse kernel CCA algorithm (SKCCA). Our algorithm is based on a relationship between kernel CCA and least squares. Sparsity of the dual transformations is introduced by penalizing the $\ell_{1}$-norm of dual vectors. Experiments demonstrate that our algorithm not only performs well in computing sparse dual transformations but also can alleviate the over-fitting problem of kernel CCA.
Tasks
Published	2017-01-16
URL	http://arxiv.org/abs/1701.04207v1
PDF	http://arxiv.org/pdf/1701.04207v1.pdf
PWC	https://paperswithcode.com/paper/sparse-kernel-canonical-correlation-analysis
Repo
Framework
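
The abstract above attributes sparsity to an $\ell_1$ penalty on the dual vectors of a least-squares problem. The sketch below is not the SKCCA algorithm; it is a generic proximal-gradient (ISTA) solver for an $\ell_1$-regularized least-squares problem on a Gaussian kernel matrix, included only to show how the penalty turns into a soft-thresholding step that drives coefficients toward zero. The data, kernel width, and `lam` value are assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1: shrink each entry toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def objective(K, y, a, lam):
    """0.5 * ||K a - y||^2 + lam * ||a||_1."""
    return 0.5 * np.sum((K @ a - y) ** 2) + lam * np.sum(np.abs(a))

def ista(K, y, lam, n_iter=500):
    """Minimise the objective above by proximal gradient descent (ISTA)."""
    L = np.linalg.norm(K, 2) ** 2  # Lipschitz constant of the smooth term's gradient
    a = np.zeros(K.shape[1])
    for _ in range(n_iter):
        grad = K.T @ (K @ a - y)
        a = soft_threshold(a - grad / L, lam / L)
    return a

# Synthetic data and a Gaussian kernel matrix (illustrative assumptions).
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq_dists / 2.0)
y = rng.normal(size=40)

lam = 0.5
a_sparse = ista(K, y, lam)
```

Larger values of `lam` shrink more coefficients of the dual vector to exactly zero, which is the kind of sparsity the abstract refers to.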

Connectivity Learning in Multi-Branch Networks


Title	Connectivity Learning in Multi-Branch Networks
Authors	Karim Ahmed, Lorenzo Torresani
Abstract	While much of the work in the design of convolutional networks over the last five years has revolved around the empirical investigation of the importance of depth, filter sizes, and number of feature channels, recent studies have shown that branching, i.e., splitting the computation along parallel but distinct threads and then aggregating their outputs, represents a new promising dimension for significant improvements in performance. To combat the complexity of design choices in multi-branch architectures, prior work has adopted simple strategies, such as a fixed branching factor, the same input being fed to all parallel branches, and an additive combination of the outputs produced by all branches at aggregation points. In this work we remove these predefined choices and propose an algorithm to learn the connections between branches in the network. Instead of being chosen a priori by the human designer, the multi-branch connectivity is learned simultaneously with the weights of the network by optimizing a single loss function defined with respect to the end task. We demonstrate our approach on the problem of multi-class image classification using three different datasets where it yields consistently higher accuracy compared to the state-of-the-art “ResNeXt” multi-branch network given the same learning capacity.
Tasks	Image Classification
Published	2017-09-27
URL	http://arxiv.org/abs/1709.09582v2
PDF	http://arxiv.org/pdf/1709.09582v2.pdf
PWC	https://paperswithcode.com/paper/connectivity-learning-in-multi-branch
Repo
Framework

Integrating Scene Text and Visual Appearance for Fine-Grained Image Classification


Title	Integrating Scene Text and Visual Appearance for Fine-Grained Image Classification
Authors	Xiang Bai, Mingkun Yang, Pengyuan Lyu, Yongchao Xu, Jiebo Luo
Abstract	Text in natural images contains rich semantics that are often highly relevant to the objects or scene. In this paper, we focus on the problem of fully exploiting scene text for visual understanding. The main idea is to combine word representations and deep visual features in a globally trainable deep convolutional neural network. First, the recognized words are obtained by a scene text reading system. Then, we combine the word embeddings of the recognized words and the deep visual features into a single representation, which is optimized by a convolutional neural network for fine-grained image classification. In our framework, an attention mechanism is adopted to reveal the relevance between each recognized word and the given image, which further enhances the recognition performance. We have performed experiments on two datasets, the Con-Text dataset and the Drink Bottle dataset, which were proposed for fine-grained classification of business places and drink bottles, respectively. The experimental results consistently demonstrate that the proposed method combining textual and visual cues significantly outperforms classification with only visual representations. Moreover, we have shown that the learned representation improves the retrieval performance on the drink bottle images by a large margin, making it potentially useful in product search.
Tasks	Fine-Grained Image Classification, Image Classification
Published	2017-04-15
URL	http://arxiv.org/abs/1704.04613v2
PDF	http://arxiv.org/pdf/1704.04613v2.pdf
PWC	https://paperswithcode.com/paper/integrating-scene-text-and-visual-appearance
Repo
Framework

SfSNet: Learning Shape, Reflectance and Illuminance of Faces in the Wild


Title	SfSNet: Learning Shape, Reflectance and Illuminance of Faces in the Wild
Authors	Soumyadip Sengupta, Angjoo Kanazawa, Carlos D. Castillo, David Jacobs
Abstract	We present SfSNet, an end-to-end learning framework for producing an accurate decomposition of an unconstrained human face image into shape, reflectance and illuminance. SfSNet is designed to reflect a physical Lambertian rendering model. SfSNet learns from a mixture of labeled synthetic and unlabeled real-world images. This allows the network to capture low-frequency variations from synthetic images and high-frequency details from real images through the photometric reconstruction loss. SfSNet consists of a new decomposition architecture with residual blocks that learns a complete separation of albedo and normal. This is used along with the original image to predict lighting. SfSNet produces significantly better quantitative and qualitative results than state-of-the-art methods for inverse rendering and independent normal and illumination estimation.
Tasks
Published	2017-12-02
URL	http://arxiv.org/abs/1712.01261v2
PDF	http://arxiv.org/pdf/1712.01261v2.pdf
PWC	https://paperswithcode.com/paper/sfsnet-learning-shape-reflectance-and-1
Repo
Framework

An Automatic Approach for Document-level Topic Model Evaluation


Title	An Automatic Approach for Document-level Topic Model Evaluation
Authors	Shraey Bhatia, Jey Han Lau, Timothy Baldwin
Abstract	Topic models jointly learn topics and document-level topic distribution. Extrinsic evaluation of topic models tends to focus exclusively on topic-level evaluation, e.g. by assessing the coherence of topics. We demonstrate that there can be large discrepancies between topic- and document-level model quality, and that basing model evaluation on topic-level analysis can be highly misleading. We propose a method for automatically predicting topic model quality based on analysis of document-level topic allocations, and provide empirical evidence for its robustness.
Tasks	Topic Models
Published	2017-06-16
URL	http://arxiv.org/abs/1706.05140v1
PDF	http://arxiv.org/pdf/1706.05140v1.pdf
PWC	https://paperswithcode.com/paper/an-automatic-approach-for-document-level
Repo
Framework

WACSF - Weighted Atom-Centered Symmetry Functions as Descriptors in Machine Learning Potentials


Title	WACSF - Weighted Atom-Centered Symmetry Functions as Descriptors in Machine Learning Potentials
Authors	Michael Gastegger, Ludwig Schwiedrzik, Marius Bittermann, Florian Berzsenyi, Philipp Marquetand
Abstract	We introduce weighted atom-centered symmetry functions (wACSFs) as descriptors of a chemical system’s geometry for use in the prediction of chemical properties such as enthalpies or potential energies via machine learning. The wACSFs are based on conventional atom-centered symmetry functions (ACSFs) but overcome the undesirable scaling of the latter with increasing number of different elements in a chemical system. The performance of these two descriptors is compared using them as inputs in high-dimensional neural network potentials (HDNNPs), employing the molecular structures and associated enthalpies of the 133855 molecules containing up to five different elements reported in the QM9 database as reference data. A substantially smaller number of wACSFs than ACSFs is needed to obtain a comparable spatial resolution of the molecular structures. At the same time, this smaller set of wACSFs leads to significantly better generalization performance in the machine learning potential than the large set of conventional ACSFs. Furthermore, we show that the intrinsic parameters of the descriptors can in principle be optimized with a genetic algorithm in a highly automated manner. For the wACSFs employed here, we find however that using a simple empirical parametrization scheme is sufficient in order to obtain HDNNPs with high accuracy.
Tasks
Published	2017-12-15
URL	http://arxiv.org/abs/1712.05861v1
PDF	http://arxiv.org/pdf/1712.05861v1.pdf
PWC	https://paperswithcode.com/paper/wacsf-weighted-atom-centered-symmetry
Repo
Framework
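
The abstract above says wACSFs avoid the growth in descriptor size that conventional ACSFs show as the number of elements increases. The sketch below illustrates that idea under stated assumptions: a radial function of the form Gaussian times cosine cutoff, with each neighbour weighted by its atomic number. This functional form, the parameter values, and the function names are assumptions that should be checked against the paper; the abstract itself does not give the formula.

```python
import numpy as np

def cutoff(r, r_c):
    """Cosine cutoff: decays smoothly to zero at r_c and is zero beyond it."""
    return np.where(r < r_c, 0.5 * (np.cos(np.pi * r / r_c) + 1.0), 0.0)

def weighted_radial(positions, numbers, i, eta, mus, r_c):
    """Element-weighted radial descriptor for atom i (assumed form, see lead-in).

    Returns one value per Gaussian centre in `mus`, regardless of how many
    different elements occur among the neighbours.
    """
    others = np.arange(len(numbers)) != i
    r = np.linalg.norm(positions[others] - positions[i], axis=1)
    weights = numbers[others].astype(float)  # assumed weighting: atomic number
    gauss = np.exp(-eta * (r[:, None] - mus[None, :]) ** 2)
    return (weights[:, None] * gauss * cutoff(r, r_c)[:, None]).sum(axis=0)

# Toy water-like geometry (hypothetical coordinates).
positions = np.array([[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]])
numbers = np.array([8, 1, 1])
mus = np.linspace(0.5, 3.0, 6)

desc_O = weighted_radial(positions, numbers, 0, eta=4.0, mus=mus, r_c=4.0)

# Swapping the two hydrogens must not change the descriptor.
perm = [0, 2, 1]
desc_O_perm = weighted_radial(positions[perm], numbers[perm], 0, eta=4.0, mus=mus, r_c=4.0)
```

The descriptor length equals `len(mus)` whatever the composition, whereas a conventional element-resolved scheme would need a separate set of functions per element.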

Neural Text Generation: A Practical Guide


Title	Neural Text Generation: A Practical Guide
Authors	Ziang Xie
Abstract	Deep learning methods have recently achieved great empirical success on machine translation, dialogue response generation, summarization, and other text generation tasks. At a high level, the technique has been to train end-to-end neural network models consisting of an encoder model to produce a hidden representation of the source text, followed by a decoder model to generate the target. While such models have significantly fewer pieces than earlier systems, significant tuning is still required to achieve good performance. For text generation models in particular, the decoder can behave in undesired ways, such as by generating truncated or repetitive outputs, outputting bland and generic responses, or in some cases producing ungrammatical gibberish. This paper is intended as a practical guide for resolving such undesired behavior in text generation models, with the aim of helping enable real-world applications.
Tasks	Machine Translation, Text Generation
Published	2017-11-27
URL	http://arxiv.org/abs/1711.09534v1
PDF	http://arxiv.org/pdf/1711.09534v1.pdf
PWC	https://paperswithcode.com/paper/neural-text-generation-a-practical-guide
Repo
Framework

Content-Based Weak Supervision for Ad-Hoc Re-Ranking


Title	Content-Based Weak Supervision for Ad-Hoc Re-Ranking
Authors	Sean MacAvaney, Andrew Yates, Kai Hui, Ophir Frieder
Abstract	One challenge with neural ranking is the need for a large amount of manually-labeled relevance judgments for training. In contrast with prior work, we examine the use of weak supervision sources for training that yield pseudo query-document pairs that already exhibit relevance (e.g., newswire headline-content pairs and encyclopedic heading-paragraph pairs). We also propose two filtering techniques to eliminate training samples that are too far out of domain: a heuristic-based approach and a novel supervised filter that re-purposes a neural ranker. Using several leading neural ranking architectures and multiple weak supervision datasets, we show that these sources of training pairs are effective on their own (outperforming prior weak supervision techniques), and that filtering can further improve performance.
Tasks	Information Retrieval
Published	2017-07-01
URL	https://arxiv.org/abs/1707.00189v3
PDF	https://arxiv.org/pdf/1707.00189v3.pdf
PWC	https://paperswithcode.com/paper/an-approach-for-weakly-supervised-deep
Repo
Framework

Revisiting the Effectiveness of Off-the-shelf Temporal Modeling Approaches for Large-scale Video Classification


Title	Revisiting the Effectiveness of Off-the-shelf Temporal Modeling Approaches for Large-scale Video Classification
Authors	Yunlong Bian, Chuang Gan, Xiao Liu, Fu Li, Xiang Long, Yandong Li, Heng Qi, Jie Zhou, Shilei Wen, Yuanqing Lin
Abstract	This paper describes our solution for the video recognition task of the ActivityNet Kinetics challenge, which ranked first. Most existing state-of-the-art video recognition approaches favor an end-to-end pipeline. One exception is the DevNet framework. The merit of DevNet is that it first uses the video data to learn a network (i.e., fine-tuning or training from scratch). Instead of directly using the end-to-end classification scores (e.g., softmax scores), it extracts features from the learned network and then feeds them into off-the-shelf machine learning models to conduct video classification. However, the effectiveness of this line of work has long been ignored and underestimated. In this submission, we use this strategy extensively. In particular, we investigate four temporal modeling approaches using the learned features: Multi-group Shifting Attention Network, Temporal Xception Network, Multi-stream Sequence Model and Fast-Forward Sequence Model. Experimental results on the challenging Kinetics dataset demonstrate that our proposed temporal modeling approaches can significantly improve on existing approaches in large-scale video recognition tasks. Most remarkably, our best single Multi-group Shifting Attention Network achieves 77.7% top-1 accuracy and 93.2% top-5 accuracy on the validation set.
Tasks	Video Classification, Video Recognition
Published	2017-08-12
URL	http://arxiv.org/abs/1708.03805v1
PDF	http://arxiv.org/pdf/1708.03805v1.pdf
PWC	https://paperswithcode.com/paper/revisiting-the-effectiveness-of-off-the-shelf
Repo
Framework
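
The abstract above reports top-1 and top-5 accuracy. As a minimal sketch of how such metrics are commonly computed from per-class scores, the snippet below uses a small made-up score matrix; the function name and the data are hypothetical and unrelated to the Kinetics results.

```python
import numpy as np

def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    topk = np.argsort(-scores, axis=1)[:, :k]          # indices of the k best classes
    hits = (topk == labels[:, None]).any(axis=1)       # is the true label among them?
    return float(hits.mean())

# Four samples, three classes (illustrative scores).
scores = np.array([[0.1, 0.7, 0.2],
                   [0.5, 0.3, 0.2],
                   [0.2, 0.3, 0.5],
                   [0.6, 0.3, 0.1]])
labels = np.array([1, 1, 2, 2])

top1 = top_k_accuracy(scores, labels, k=1)  # 0.5
top2 = top_k_accuracy(scores, labels, k=2)  # 0.75
```

Top-k accuracy never decreases as `k` grows, which is why a top-5 figure is always at least as high as the corresponding top-1 figure.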

DeepTransport: Learning Spatial-Temporal Dependency for Traffic Condition Forecasting


Title	DeepTransport: Learning Spatial-Temporal Dependency for Traffic Condition Forecasting
Authors	Xingyi Cheng, Ruiqing Zhang, Jie Zhou, Wei Xu
Abstract	Predicting traffic conditions has recently been explored as a way to relieve traffic congestion. Several pioneering approaches have been proposed based on traffic observations of the target location as well as its adjacent regions, but they obtain somewhat limited accuracy because they do not mine the road topology. To address the effect attenuation problem, we propose to take into account the traffic of surrounding locations (wider than the adjacent range). We propose an end-to-end framework called DeepTransport, in which Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN) are utilized to obtain spatial-temporal traffic information within a transport network topology. In addition, an attention mechanism is introduced to align spatial and temporal information. Moreover, we constructed and released a large real-world traffic condition dataset with 5-minute resolution. Our experiments on this dataset demonstrate that our method captures the complex relationships in the temporal and spatial domains. It significantly outperforms traditional statistical methods and a state-of-the-art deep learning method.
Tasks
Published	2017-09-27
URL	https://arxiv.org/abs/1709.09585v2
PDF	https://arxiv.org/pdf/1709.09585v2.pdf
PWC	https://paperswithcode.com/paper/deeptransport-learning-spatial-temporal
Repo
Framework

Negative-Unlabeled Tensor Factorization for Location Category Inference from Highly Inaccurate Mobility Data


Title	Negative-Unlabeled Tensor Factorization for Location Category Inference from Highly Inaccurate Mobility Data
Authors	Jinfeng Yi, Qi Lei, Wesley Gifford, Ji Liu, Junchi Yan
Abstract	Identifying significant location categories visited by mobile users is the key to a variety of applications. This is an extremely challenging task due to the possible deviation between the estimated location coordinate and the actual location, which could be on the order of kilometers. To estimate the actual location category more precisely, we propose a novel tensor factorization framework, through several key observations including the intrinsic correlations between users, to infer the most likely location categories within the location uncertainty circle. In addition, the proposed algorithm can also predict where users are even in the absence of location information. In order to efficiently solve the proposed framework, we propose a parameter-free and scalable optimization algorithm by effectively exploring the sparse and low-rank structure of the tensor. Our empirical studies show that the proposed algorithm is both efficient and effective: it can solve problems with millions of users and billions of location updates, and also provides superior prediction accuracies on real-world location updates and check-in data sets.
Tasks
Published	2017-02-21
URL	http://arxiv.org/abs/1702.06362v3
PDF	http://arxiv.org/pdf/1702.06362v3.pdf
PWC	https://paperswithcode.com/paper/negative-unlabeled-tensor-factorization-for
Repo
Framework

Attention Transfer from Web Images for Video Recognition


Title	Attention Transfer from Web Images for Video Recognition
Authors	Junnan Li, Yongkang Wong, Qi Zhao, Mohan Kankanhalli
Abstract	Training deep learning based video classifiers for action recognition requires a large amount of labeled videos. The labeling process is labor-intensive and time-consuming. On the other hand, large numbers of weakly-labeled images are uploaded to the Internet by users every day. To harness the rich and highly diverse set of Web images, a scalable approach is to crawl these images to train deep learning based classifiers, such as Convolutional Neural Networks (CNN). However, due to the domain shift problem, the performance of deep classifiers trained on Web images tends to degrade when they are directly deployed on videos. One way to address this problem is to fine-tune the trained models on videos, but a sufficient number of annotated videos is still required. In this work, we propose a novel approach to transfer knowledge from the image domain to the video domain. The proposed method can adapt to the target domain (i.e., video data) with a limited amount of training data. Our method maps the video frames into a low-dimensional feature space using the class-discriminative spatial attention map for CNNs. We design a novel Siamese EnergyNet structure to learn energy functions on the attention maps by jointly optimizing two loss functions, such that the attention map corresponding to a ground truth concept would have higher energy. We conduct extensive experiments on two challenging video recognition datasets (i.e., TVHI and UCF101), and demonstrate the efficacy of our proposed method.
Tasks	Temporal Action Localization, Video Recognition
Published	2017-08-03
URL	http://arxiv.org/abs/1708.00973v1
PDF	http://arxiv.org/pdf/1708.00973v1.pdf
PWC	https://paperswithcode.com/paper/attention-transfer-from-web-images-for-video
Repo
Framework

Large-scale Video Classification guided by Batch Normalized LSTM Translator


Title	Large-scale Video Classification guided by Batch Normalized LSTM Translator
Authors	Jae Hyeon Yoo
Abstract	The YouTube-8M dataset advances the development of large-scale video recognition technology, much as the ImageNet dataset has encouraged image classification, recognition and detection in the field of artificial intelligence. For this large video dataset, classifying a huge number of multi-labels is a challenging task. Changing perspective, we propose a novel method that regards labels as words. In detail, we describe online learning approaches to multi-label video classification that are guided by deep recurrent neural networks for a video-to-sentence translator. We designed the translator based on LSTMs and found that a stochastic gating before the input of each LSTM cell can help us to design the structural details. In addition, we adopted batch normalization in our models to improve our LSTM models. Since our models are feature extractors, they can be used with other classifiers. Finally, we report improved validation results of our models on the large-scale YouTube-8M dataset and discuss further improvements.
Tasks	Image Classification, Video Classification, Video Recognition
Published	2017-07-13
URL	http://arxiv.org/abs/1707.04045v1
PDF	http://arxiv.org/pdf/1707.04045v1.pdf
PWC	https://paperswithcode.com/paper/large-scale-video-classification-guided-by
Repo
Framework

Comparative Study Of Data Mining Query Languages


Title	Comparative Study Of Data Mining Query Languages
Authors	Mohamed Anis Bach Tobji
Abstract	Since the formulation of the Inductive Database (IDB) problem, several Data Mining (DM) languages have been proposed, confirming that the KDD process could be supported via inductive query (IQ) answering. This paper reviews the existing DM languages. We present important primitives of a DM language and classify the languages according to which primitives they satisfy. In addition, we present the languages’ syntaxes and apply each one to a sample database to test a set of KDD operations. This study allows us to highlight the languages’ capabilities and limits, which is very useful for future work and perspectives.
Tasks
Published	2017-01-27
URL	http://arxiv.org/abs/1701.08190v1
PDF	http://arxiv.org/pdf/1701.08190v1.pdf
PWC	https://paperswithcode.com/paper/comparative-study-of-data-mining-query
Repo
Framework