July 28, 2019

Paper Group ANR 438

Multi-Sensor Data Pattern Recognition for Multi-Target Localization: A Machine Learning Approach. Detekcja upadku i wybranych akcji na sekwencjach obrazów cyfrowych. Spatio-temporal Person Retrieval via Natural Language Queries. Interspecies Knowledge Transfer for Facial Keypoint Detection. A Closed-Form Model for Image-Based Distant Lighting. Abno …

Multi-Sensor Data Pattern Recognition for Multi-Target Localization: A Machine Learning Approach


Title	Multi-Sensor Data Pattern Recognition for Multi-Target Localization: A Machine Learning Approach
Authors	Kasthurirengan Suresh, Samuel Silva, Johnathan Votion, Yongcan Cao
Abstract	Data-target pairing is an important step towards multi-target localization for the intelligent operation of unmanned systems. Target localization plays a crucial role in numerous applications, such as search and rescue missions, traffic management, and surveillance. The objective of this paper is to present an innovative target location learning approach, where several machine learning methods, including K-means clustering and support vector machines (SVM), are used to learn the data pattern across a list of spatially distributed sensors. To enable accurate data association across sensors, appropriate data pre-processing is essential; it is followed by the application of different machine learning algorithms that group data from different sensors for the accurate localization of multiple targets. Through simulation examples, the performance of these machine learning algorithms is quantified and compared.
Tasks
Published	2017-02-28
URL	http://arxiv.org/abs/1703.00084v1
PDF	http://arxiv.org/pdf/1703.00084v1.pdf
PWC	https://paperswithcode.com/paper/multi-sensor-data-pattern-recognition-for
Repo
Framework
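
The grouping step mentioned in the abstract can be illustrated with plain K-means. This is a minimal sketch, not the authors' pipeline: it assumes each sensor report has already been pre-processed into a 2-D position estimate, and the function name `kmeans` and the choice of initial centroids are hypothetical.

```python
import math


def kmeans(points, init_centroids, n_iter=50):
    """Plain Lloyd's k-means on tuples of floats.

    Returns (centroids, labels), where labels[i] is the index of the
    centroid that points[i] was assigned to.
    """
    centroids = [tuple(c) for c in init_centroids]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: pair each measurement with its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    return centroids, labels


# Hypothetical position estimates reported by several sensors for two targets.
reports = [(0.1, -0.2), (9.8, 10.1), (-0.1, 0.3), (10.2, 9.9), (0.0, 0.1), (10.0, 10.3)]
centroids, labels = kmeans(reports, [reports[0], reports[1]])
```

Each label pairs a sensor report with a target, and each centroid serves as that target's location estimate. K-means needs the number of targets in advance, which is an assumption of this sketch.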

Detekcja upadku i wybranych akcji na sekwencjach obrazów cyfrowych (Detection of Falls and Selected Actions in Digital Image Sequences)


Title	Detekcja upadku i wybranych akcji na sekwencjach obrazów cyfrowych
Authors	Michal Kepski
Abstract	In recent years, interest in action recognition has grown, including the detection of falls among the elderly. However, despite many efforts, the existing technology is not widely used by the elderly, mainly because of flaws such as low precision, a large number of false alarms, and inadequate privacy preservation during data acquisition and processing. This research work addresses these needs. The work is empirical and is situated in the field of computer vision systems, mainly in the area of action and behavior recognition. Efficient algorithms for fall detection were developed, tested and implemented using image sequences and a wireless inertial sensor worn by the monitored person. A set of descriptors for depth maps has been elaborated to permit classification of the pose as well as the action of a person. Experimental research was carried out on a prepared data repository consisting of synchronized depth and accelerometric data. The study was carried out in a scenario with a static camera facing the scene and an active camera observing the scene from above. The experimental results showed that the developed algorithms for fall detection have high sensitivity and specificity. The algorithms were designed with regard to low computational demands and the possibility of running on ARM platforms. Several experiments, including person detection, tracking and fall detection in real time, were carried out to show the efficiency and reliability of the proposed solutions.
Tasks	Human Detection, Temporal Action Localization
Published	2017-06-25
URL	http://arxiv.org/abs/1706.08107v1
PDF	http://arxiv.org/pdf/1706.08107v1.pdf
PWC	https://paperswithcode.com/paper/detekcja-upadku-i-wybranych-akcji-na
Repo
Framework

Spatio-temporal Person Retrieval via Natural Language Queries


Title	Spatio-temporal Person Retrieval via Natural Language Queries
Authors	Masataka Yamaguchi, Kuniaki Saito, Yoshitaka Ushiku, Tatsuya Harada
Abstract	In this paper, we address the problem of spatio-temporal person retrieval from multiple videos using a natural language query, in which we output a tube (i.e., a sequence of bounding boxes) which encloses the person described by the query. For this problem, we introduce a novel dataset consisting of videos containing people annotated with bounding boxes for each second and with five natural language descriptions. To retrieve the tube of the person described by a given natural language query, we design a model that combines methods for spatio-temporal human detection and multimodal retrieval. We conduct comprehensive experiments to compare a variety of tube and text representations and multimodal retrieval methods, and present a strong baseline in this task as well as demonstrate the efficacy of our tube representation and multimodal feature embedding technique. Finally, we demonstrate the versatility of our model by applying it to two other important tasks.
Tasks	Human Detection, Person Retrieval
Published	2017-04-26
URL	http://arxiv.org/abs/1704.07945v2
PDF	http://arxiv.org/pdf/1704.07945v2.pdf
PWC	https://paperswithcode.com/paper/spatio-temporal-person-retrieval-via-natural
Repo
Framework

Interspecies Knowledge Transfer for Facial Keypoint Detection


Title	Interspecies Knowledge Transfer for Facial Keypoint Detection
Authors	Maheen Rashid, Xiuye Gu, Yong Jae Lee
Abstract	We present a method for localizing facial keypoints on animals by transferring knowledge gained from human faces. Instead of directly finetuning a network trained to detect keypoints on human faces to animal faces (which is sub-optimal since human and animal faces can look quite different), we propose to first adapt the animal images to the pre-trained human detection network by correcting for the differences in animal and human face shape. We first find the nearest human neighbors for each animal image using an unsupervised shape matching method. We use these matches to train a thin plate spline warping network to warp each animal face to look more human-like. The warping network is then jointly finetuned with a pre-trained human facial keypoint detection network using an animal dataset. We demonstrate state-of-the-art results on both horse and sheep facial keypoint detection, and significant improvement over simple finetuning, especially when training data is scarce. Additionally, we present a new dataset with 3717 images with horse face and facial keypoint annotations.
Tasks	Human Detection, Keypoint Detection, Transfer Learning
Published	2017-04-13
URL	http://arxiv.org/abs/1704.04023v1
PDF	http://arxiv.org/pdf/1704.04023v1.pdf
PWC	https://paperswithcode.com/paper/interspecies-knowledge-transfer-for-facial
Repo
Framework

A Closed-Form Model for Image-Based Distant Lighting


Title	A Closed-Form Model for Image-Based Distant Lighting
Authors	Mais Alnasser, Hassan Foroosh
Abstract	In this paper, we present a new mathematical foundation for image-based lighting. Using a simple manipulation of the local coordinate system, we derive a closed-form solution to the light integral equation under distant environment illumination. We derive our solution for different BRDFs, such as Lambertian and Phong-like. The method is free of noise, and provides the possibility of using the full spectrum of frequencies captured by images taken from the environment. This allows for the color of the rendered object to be toned according to the color of the light in the environment. Experimental results also show that one can gain an order of magnitude or higher in rendering time compared to Monte Carlo quadrature methods and spherical harmonics.
Tasks
Published	2017-05-14
URL	http://arxiv.org/abs/1705.04927v1
PDF	http://arxiv.org/pdf/1705.04927v1.pdf
PWC	https://paperswithcode.com/paper/a-closed-form-model-for-image-based-distant
Repo
Framework

Abnormal Event Detection in Videos using Generative Adversarial Nets


Title	Abnormal Event Detection in Videos using Generative Adversarial Nets
Authors	Mahdyar Ravanbakhsh, Moin Nabi, Enver Sangineto, Lucio Marcenaro, Carlo Regazzoni, Nicu Sebe
Abstract	In this paper we address the abnormality detection problem in crowded scenes. We propose to use Generative Adversarial Nets (GANs), which are trained using normal frames and corresponding optical-flow images in order to learn an internal representation of the scene normality. Since our GANs are trained with only normal data, they are not able to generate abnormal events. At testing time the real data are compared with both the appearance and the motion representations reconstructed by our GANs and abnormal areas are detected by computing local differences. Experimental results on challenging abnormality detection datasets show the superiority of the proposed method compared to the state of the art in both frame-level and pixel-level abnormality detection tasks.
Tasks	Abnormal Event Detection In Video, Anomaly Detection, Optical Flow Estimation
Published	2017-08-31
URL	http://arxiv.org/abs/1708.09644v1
PDF	http://arxiv.org/pdf/1708.09644v1.pdf
PWC	https://paperswithcode.com/paper/abnormal-event-detection-in-videos-using-1
Repo
Framework

Interpretable Vector AutoRegressions with Exogenous Time Series


Title	Interpretable Vector AutoRegressions with Exogenous Time Series
Authors	Ines Wilms, Sumanta Basu, Jacob Bien, David S. Matteson
Abstract	The Vector AutoRegressive (VAR) model is fundamental to the study of multivariate time series. Although VAR models are intensively investigated by many researchers, practitioners often show more interest in analyzing VARX models that incorporate the impact of unmodeled exogenous variables (X) into the VAR. However, since the parameter space grows quadratically with the number of time series, estimation quickly becomes challenging. While several proposals have been made to sparsely estimate large VAR models, the estimation of large VARX models is under-explored. Moreover, typically these sparse proposals involve a lasso-type penalty and do not incorporate lag selection into the estimation procedure. As a consequence, the resulting models may be difficult to interpret. In this paper, we propose a lag-based hierarchically sparse estimator, called “HVARX”, for large VARX models. We illustrate the usefulness of HVARX on a cross-category management marketing application. Our results show how it provides a highly interpretable model, and improves out-of-sample forecast accuracy compared to a lasso-type approach.
Tasks	Time Series
Published	2017-11-09
URL	http://arxiv.org/abs/1711.03623v1
PDF	http://arxiv.org/pdf/1711.03623v1.pdf
PWC	https://paperswithcode.com/paper/interpretable-vector-autoregressions-with
Repo
Framework
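
To make the parameter-growth remark concrete, here is a small sketch of a generic VARX one-step forecast and its coefficient count. It is not the HVARX estimator from the paper; the function names, the omission of an intercept, and the example dimensions are assumptions made for illustration.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def varx_forecast(phis, betas, y_lags, x_lags):
    """One-step VARX forecast without an intercept.

    phis[l]  : k x k coefficient matrix for endogenous lag l + 1
    betas[j] : k x m coefficient matrix for exogenous lag j + 1
    y_lags[l], x_lags[j] : the matching lagged observation vectors
    """
    forecast = [0.0] * len(y_lags[0])
    for phi, y in zip(phis, y_lags):
        forecast = [f + c for f, c in zip(forecast, matvec(phi, y))]
    for beta, x in zip(betas, x_lags):
        forecast = [f + c for f, c in zip(forecast, matvec(beta, x))]
    return forecast


def varx_num_coefficients(k, p, m, s):
    """Coefficients in a VARX with k series, p lags, m exogenous series, s lags."""
    return k * k * p + k * m * s


# Going from 10 to 100 series with 4 lags each multiplies the VAR part by 100.
small = varx_num_coefficients(k=10, p=4, m=5, s=4)    # 600
large = varx_num_coefficients(k=100, p=4, m=5, s=4)   # 42000
```

The `k * k * p` term is what makes unpenalized estimation impractical for large systems and motivates sparse estimators.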

Towards Accurate Multi-person Pose Estimation in the Wild


Title	Towards Accurate Multi-person Pose Estimation in the Wild
Authors	George Papandreou, Tyler Zhu, Nori Kanazawa, Alexander Toshev, Jonathan Tompson, Chris Bregler, Kevin Murphy
Abstract	We propose a method for multi-person detection and 2-D pose estimation that achieves state-of-the-art results on the challenging COCO keypoints task. It is a simple, yet powerful, top-down approach consisting of two stages. In the first stage, we predict the location and scale of boxes which are likely to contain people; for this we use the Faster RCNN detector. In the second stage, we estimate the keypoints of the person potentially contained in each proposed bounding box. For each keypoint type we predict dense heatmaps and offsets using a fully convolutional ResNet. To combine these outputs we introduce a novel aggregation procedure to obtain highly localized keypoint predictions. We also use a novel form of keypoint-based Non-Maximum-Suppression (NMS), instead of the cruder box-level NMS, and a novel form of keypoint-based confidence score estimation, instead of box-level scoring. Trained on COCO data alone, our final system achieves average precision of 0.649 on the COCO test-dev set and 0.643 on the test-standard set, outperforming the winner of the 2016 COCO keypoints challenge and other recent state-of-the-art methods. Further, by using additional in-house labeled data we obtain an even higher average precision of 0.685 on the test-dev set and 0.673 on the test-standard set, more than 5% absolute improvement compared to the previous best performing method on the same dataset.
Tasks	Human Detection, Multi-Person Pose Estimation, Pose Estimation
Published	2017-01-06
URL	http://arxiv.org/abs/1701.01779v2
PDF	http://arxiv.org/pdf/1701.01779v2.pdf
PWC	https://paperswithcode.com/paper/towards-accurate-multi-person-pose-estimation
Repo
Framework
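
The idea of suppressing duplicate detections by comparing keypoints rather than boxes can be sketched as a greedy loop. This is an illustration only: the Gaussian similarity, the `scale` and `threshold` values, and the function names are assumptions, not the exact formulation used in the paper.

```python
import math


def keypoint_similarity(kps_a, kps_b, scale):
    """Mean Gaussian similarity between corresponding keypoints (1.0 = identical)."""
    total = 0.0
    for (xa, ya), (xb, yb) in zip(kps_a, kps_b):
        d2 = (xa - xb) ** 2 + (ya - yb) ** 2
        total += math.exp(-d2 / (2.0 * scale ** 2))
    return total / len(kps_a)


def keypoint_nms(detections, scale=10.0, threshold=0.5):
    """Greedy NMS over (score, keypoints) pairs.

    Detections are visited from highest to lowest score; one is kept only if
    its pose is dissimilar to every pose that has already been kept.
    """
    kept = []
    for det in sorted(detections, key=lambda d: d[0], reverse=True):
        if all(keypoint_similarity(det[1], k[1], scale) < threshold for k in kept):
            kept.append(det)
    return kept


# Two near-duplicate poses and one distinct pose (hypothetical coordinates).
pose_a = (0.9, [(10, 10), (20, 20)])
pose_b = (0.8, [(11, 10), (20, 21)])
pose_c = (0.7, [(100, 100), (110, 110)])
survivors = keypoint_nms([pose_b, pose_c, pose_a])
```

In this example `pose_b` is suppressed by the higher-scoring `pose_a`, while `pose_c` survives because its keypoints are far from those already kept.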

Variance-Reduced Stochastic Learning by Networked Agents under Random Reshuffling


Title	Variance-Reduced Stochastic Learning by Networked Agents under Random Reshuffling
Authors	Kun Yuan, Bicheng Ying, Jiageng Liu, Ali H. Sayed
Abstract	A new amortized variance-reduced gradient (AVRG) algorithm was developed in earlier work (Ying et al., 2017), which has constant storage requirement in comparison to SAGA and balanced gradient computations in comparison to SVRG. One key advantage of the AVRG strategy is its amenability to decentralized implementations. In this work, we show how AVRG can be extended to the network case where multiple learning agents are assumed to be connected by a graph topology. In this scenario, each agent observes data that is spatially distributed and all agents are only allowed to communicate with direct neighbors. Moreover, the amount of data observed by the individual agents may differ drastically. For such situations, the balanced gradient computation property of AVRG becomes a real advantage in reducing idle time caused by unbalanced local data storage requirements, which is characteristic of other reduced-variance gradient algorithms. The resulting diffusion-AVRG algorithm is shown to have linear convergence to the exact solution, and is much more memory efficient than other alternative algorithms. In addition, we propose a mini-batch strategy to balance the communication and computation efficiency for diffusion-AVRG. When a proper batch size is employed, it is observed in simulations that diffusion-AVRG is more computationally efficient than exact diffusion or EXTRA while maintaining almost the same communication efficiency.
Tasks
Published	2017-08-04
URL	http://arxiv.org/abs/1708.01384v3
PDF	http://arxiv.org/pdf/1708.01384v3.pdf
PWC	https://paperswithcode.com/paper/variance-reduced-stochastic-learning-by
Repo
Framework

On architectural choices in deep learning: From network structure to gradient convergence and parameter estimation


Title	On architectural choices in deep learning: From network structure to gradient convergence and parameter estimation
Authors	Vamsi K Ithapu, Sathya N Ravi, Vikas Singh
Abstract	We study mechanisms to characterize how the asymptotic convergence of backpropagation in deep architectures, in general, is related to the network structure, and how it may be influenced by other design choices including activation type, denoising and dropout rate. We seek to analyze whether network architecture and input data statistics may guide the choices of learning parameters and vice versa. Given the broad applicability of deep architectures, this issue is interesting from both a theoretical and a practical standpoint. Using properties of general nonconvex objectives (with first-order information), we first build the association between structural, distributional and learnability aspects of the network vis-à-vis their interaction with parameter convergence rates. We identify a nice relationship between feature denoising and dropout, and construct families of networks that achieve the same level of convergence. We then derive a workflow that provides systematic guidance regarding the choice of network sizes and learning parameters often mediated by input statistics. Our technical results are corroborated by an extensive set of evaluations, presented in this paper as well as independent empirical observations reported by other groups. We also perform experiments showing the practical implications of our framework for choosing the best fully-connected design for a given problem.
Tasks	Denoising
Published	2017-02-28
URL	http://arxiv.org/abs/1702.08670v1
PDF	http://arxiv.org/pdf/1702.08670v1.pdf
PWC	https://paperswithcode.com/paper/on-architectural-choices-in-deep-learning
Repo
Framework

Optimization by gradient boosting


Title	Optimization by gradient boosting
Authors	Gérard Biau, Benoît Cadre
Abstract	Gradient boosting is a state-of-the-art prediction technique that sequentially produces a model in the form of linear combinations of simple predictors—typically decision trees—by solving an infinite-dimensional convex optimization problem. We provide in the present paper a thorough analysis of two widespread versions of gradient boosting, and introduce a general framework for studying these algorithms from the point of view of functional optimization. We prove their convergence as the number of iterations tends to infinity and highlight the importance of having a strongly convex risk functional to minimize. We also present a reasonable statistical context ensuring consistency properties of the boosting predictors as the sample size grows. In our approach, the optimization procedures are run forever (that is, without resorting to an early stopping strategy), and statistical regularization is basically achieved via an appropriate $L^2$ penalization of the loss and strong convexity arguments.
Tasks
Published	2017-07-17
URL	http://arxiv.org/abs/1707.05023v1
PDF	http://arxiv.org/pdf/1707.05023v1.pdf
PWC	https://paperswithcode.com/paper/optimization-by-gradient-boosting
Repo
Framework
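
The view of gradient boosting as functional gradient descent can be sketched in a few lines. The example below uses regression stumps on 1-D inputs with plain squared loss, so each round fits the residuals (the negative gradient). It is a generic illustration and omits the $L^2$ penalization studied in the paper; all names and the toy data are assumptions.

```python
def fit_stump(xs, residuals):
    """Least-squares regression stump: returns (threshold, left_value, right_value)."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lv = sum(left) / len(left)
        rv = sum(right) / len(right) if right else lv
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def gradient_boost(xs, ys, n_rounds=50, step=0.5):
    """Return a predictor built as a step-weighted sum of stumps."""
    stumps = []
    preds = [0.0] * len(xs)
    for _ in range(n_rounds):
        # For the loss 0.5 * (y - F(x))^2 the negative gradient is the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + step * predict_stump(stump, x) for p, x in zip(preds, xs)]
    return lambda x: sum(step * predict_stump(s, x) for s in stumps)


# Toy step-function data (hypothetical).
model = gradient_boost([0, 1, 2, 3, 4, 5], [0, 0, 0, 1, 1, 1])
```

With `step=0.5` each round halves the remaining residual on this toy data, so the training error shrinks geometrically as the number of rounds grows.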

Stacked Kernel Network


Title	Stacked Kernel Network
Authors	Shuai Zhang, Jianxin Li, Pengtao Xie, Yingchun Zhang, Minglai Shao, Haoyi Zhou, Mengyi Yan
Abstract	Kernel methods are powerful tools to capture nonlinear patterns behind data. They implicitly learn high (even infinite) dimensional nonlinear features in the Reproducing Kernel Hilbert Space (RKHS) while making the computation tractable by leveraging the kernel trick. Classic kernel methods learn a single layer of nonlinear features, whose representational power may be limited. Motivated by the recent success of deep neural networks (DNNs) that learn multi-layer hierarchical representations, we propose a Stacked Kernel Network (SKN) that learns a hierarchy of RKHS-based nonlinear features. SKN interleaves several layers of nonlinear transformations (from a linear space to an RKHS) and linear transformations (from an RKHS to a linear space). Similar to DNNs, an SKN is composed of multiple layers of hidden units, but each is parameterized by an RKHS function rather than a finite-dimensional vector. We propose three ways to represent the RKHS functions in SKN: (1) nonparametric representation, (2) parametric representation and (3) random Fourier feature representation. Furthermore, we extend SKN to a CNN architecture called Stacked Kernel Convolutional Network (SKCN). SKCN learns a hierarchy of RKHS-based nonlinear features through convolution, with each filter parameterized by an RKHS function rather than by a finite-dimensional matrix as in a CNN, which makes it suitable for image inputs. Experiments on various datasets demonstrate the effectiveness of SKN and SKCN, which outperform the competitive methods.
Tasks
Published	2017-11-25
URL	http://arxiv.org/abs/1711.09219v1
PDF	http://arxiv.org/pdf/1711.09219v1.pdf
PWC	https://paperswithcode.com/paper/stacked-kernel-network
Repo
Framework
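
Of the three representations listed, random Fourier features are the easiest to show in isolation. The sketch below approximates a Gaussian (RBF) kernel with an explicit finite-dimensional feature map; it illustrates the general technique, not the SKN layers themselves, and the function names, bandwidth, and feature count are assumptions.

```python
import math
import random


def make_rff(dim, n_features, sigma, seed=0):
    """Build a random Fourier feature map for the RBF kernel with bandwidth sigma."""
    rng = random.Random(seed)
    # Frequencies are drawn from N(0, 1 / sigma^2), phases uniformly from [0, 2*pi).
    weights = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)] for _ in range(n_features)]
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)

    def features(x):
        return [
            scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
            for w, b in zip(weights, offsets)
        ]

    return features


def rbf_kernel(x, y, sigma):
    """Exact Gaussian kernel exp(-||x - y||^2 / (2 * sigma^2))."""
    return math.exp(-math.dist(x, y) ** 2 / (2.0 * sigma ** 2))


features = make_rff(dim=2, n_features=2000, sigma=1.0)
x, y = (0.3, -0.5), (1.0, 0.2)
approx = sum(a * b for a, b in zip(features(x), features(y)))
exact = rbf_kernel(x, y, sigma=1.0)
```

The inner product of the two feature vectors approximates the kernel value, and the approximation error shrinks roughly with the square root of the number of features.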

Flexible and Creative Chinese Poetry Generation Using Neural Memory


Title	Flexible and Creative Chinese Poetry Generation Using Neural Memory
Authors	Jiyuan Zhang, Yang Feng, Dong Wang, Yang Wang, Andrew Abel, Shiyue Zhang, Andi Zhang
Abstract	It has been shown that Chinese poems can be successfully generated by sequence-to-sequence neural models, particularly with the attention mechanism. A potential problem of this approach, however, is that neural models can only learn abstract rules, while poem generation is a highly creative process that involves not only rules but also innovations for which pure statistical models are not appropriate in principle. This work proposes a memory-augmented neural model for Chinese poem generation, where the neural model and the augmented memory work together to balance the requirements of linguistic accordance and aesthetic innovation, leading to innovative generations that are still rule-compliant. In addition, it is found that the memory mechanism provides interesting flexibility that can be used to generate poems with different styles.
Tasks
Published	2017-05-10
URL	http://arxiv.org/abs/1705.03773v1
PDF	http://arxiv.org/pdf/1705.03773v1.pdf
PWC	https://paperswithcode.com/paper/flexible-and-creative-chinese-poetry
Repo
Framework

Are crossing dependencies really scarce?


Title	Are crossing dependencies really scarce?
Authors	Ramon Ferrer-i-Cancho, Carlos Gomez-Rodriguez, J. L. Esteban
Abstract	The syntactic structure of a sentence can be modelled as a tree, where vertices correspond to words and edges indicate syntactic dependencies. It has been claimed recurrently that the number of edge crossings in real sentences is small. However, a baseline or null hypothesis has been lacking. Here we quantify the number of crossings in real sentences and compare it to the predictions of a series of baselines. We conclude that crossings are really scarce in real sentences. Their scarcity is unexpected given the hubiness of the trees. Indeed, real sentences are close to linear trees, where the potential number of crossings is maximized.
Tasks
Published	2017-03-24
URL	http://arxiv.org/abs/1703.08324v2
PDF	http://arxiv.org/pdf/1703.08324v2.pdf
PWC	https://paperswithcode.com/paper/are-crossing-dependencies-really-scarce
Repo
Framework
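
Counting crossings for a given sentence is straightforward once each word is identified by its position. The following sketch assumes edges are given as pairs of word positions; the function name and the example trees are hypothetical.

```python
from itertools import combinations


def count_crossings(edges):
    """Count pairs of dependency edges that cross.

    Each edge is a pair of word positions. Two edges (a, b) and (c, d), with
    a < b and c < d, cross when exactly one endpoint of one edge lies strictly
    between the endpoints of the other. Edges sharing a word never cross.
    """
    spans = [tuple(sorted(edge)) for edge in edges]
    crossings = 0
    for (a, b), (c, d) in combinations(spans, 2):
        if a < c < b < d or c < a < d < b:
            crossings += 1
    return crossings


# A star tree (one hub word) has no crossings in any word order.
star = [(1, 2), (1, 3), (1, 4)]
# A linear tree (path) arranged as 1-3, 3-2, 2-4 has one crossing.
path = [(1, 3), (3, 2), (2, 4)]
```

Here `count_crossings(star)` is 0 and `count_crossings(path)` is 1. The pairwise check is quadratic in the number of edges, which is adequate for sentence-length inputs.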

Distributed Bundle Adjustment


Title	Distributed Bundle Adjustment
Authors	Karthikeyan Natesan Ramamurthy, Chung-Ching Lin, Aleksandr Aravkin, Sharath Pankanti, Raphael Viguier
Abstract	Most methods for Bundle Adjustment (BA) in computer vision are either centralized or operate incrementally. This leads to poor scaling and affects the quality of the solution as the number of images grows in large-scale structure from motion (SfM). Furthermore, they cannot be used in scenarios where image acquisition and processing must be distributed. We address this problem with a new distributed BA algorithm. Our distributed formulation uses the alternating direction method of multipliers (ADMM), and, since each processor sees only a small portion of the data, we show that robust formulations improve performance. We analyze convergence of the proposed algorithm, and illustrate numerical performance, accuracy of the parameter estimates, and scalability of the distributed implementation in the context of synthetic 3D datasets with known camera position and orientation ground truth. The results are comparable to an alternate state-of-the-art centralized bundle adjustment algorithm on synthetic and real 3D reconstruction problems. The runtime of our implementation scales linearly with the number of observed points.
Tasks	3D Reconstruction
Published	2017-08-26
URL	http://arxiv.org/abs/1708.07954v1
PDF	http://arxiv.org/pdf/1708.07954v1.pdf
PWC	https://paperswithcode.com/paper/distributed-bundle-adjustment
Repo
Framework
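
The way ADMM lets workers with partial data agree on shared parameters can be seen in a toy consensus problem. This sketch is far simpler than bundle adjustment: each worker holds one scalar target `a_i` and a quadratic cost, and the names, cost function, and `rho` value are assumptions chosen for illustration.

```python
def consensus_admm(local_targets, rho=1.0, n_iter=100):
    """Scaled-form consensus ADMM for minimizing sum_i 0.5 * (x - a_i)^2.

    Each worker i keeps a local copy x_i and a scaled dual variable u_i;
    z is the shared consensus variable. Returns (z, x).
    """
    n = len(local_targets)
    z = 0.0
    x = [0.0] * n
    u = [0.0] * n
    for _ in range(n_iter):
        # Local step, solved in closed form by each worker independently:
        # argmin_x 0.5 * (x - a_i)^2 + (rho / 2) * (x - z + u_i)^2
        x = [(a + rho * (z - u_i)) / (1.0 + rho) for a, u_i in zip(local_targets, u)]
        # Consensus step: average the local copies plus their duals.
        z = sum(x_i + u_i for x_i, u_i in zip(x, u)) / n
        # Dual step: accumulate each worker's disagreement with the consensus.
        u = [u_i + x_i - z for u_i, x_i in zip(u, x)]
    return z, x


z, x = consensus_admm([1.0, 2.0, 6.0])
```

For this cost the consensus value converges to the mean of the local targets, here 3.0, even though no worker ever sees another worker's data directly.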