July 27, 2019

3254 words 16 mins read

Paper Group ANR 540

Coresets for Dependency Networks. Sum-Product Networks for Hybrid Domains. Learning to Fly by Crashing. Jaccard analysis and LASSO-based feature selection for location fingerprinting with limited computational complexity. Facial Dynamics Interpreter Network: What are the Important Relations between Local Dynamics for Facial Trait Estimation?. Struc …

Coresets for Dependency Networks

Title Coresets for Dependency Networks
Authors Alejandro Molina, Alexander Munteanu, Kristian Kersting
Abstract Many applications infer the structure of a probabilistic graphical model from data to elucidate the relationships between variables. But how can we train graphical models on a massive data set? In this paper, we show how to construct coresets (compressed data sets which can be used as a proxy for the original data and have provably bounded worst-case error) for Gaussian dependency networks (DNs), i.e., cyclic directed graphical models over Gaussians, where the parents of each variable are its Markov blanket. Specifically, we prove that Gaussian DNs admit coresets of size independent of the size of the data set. Unfortunately, this does not extend to DNs over members of the exponential family in general. As we will prove, Poisson DNs do not admit small coresets. Despite this worst-case result, we provide an argument why our coreset construction for DNs can still work well in practice on count data. To corroborate our theoretical results, we empirically evaluated the resulting Core DNs on real data sets.
Tasks
Published 2017-10-09
URL http://arxiv.org/abs/1710.03285v2
PDF http://arxiv.org/pdf/1710.03285v2.pdf
PWC https://paperswithcode.com/paper/coresets-for-dependency-networks
Repo
Framework
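
Loosely speaking, fitting a Gaussian DN amounts to a linear regression of each variable on its Markov blanket, so a coreset for least-squares regression is the natural building block. The sketch below is a minimal illustration (not the paper's exact construction): an importance-sampling coreset based on leverage scores for a single regression problem; the function name and sampling scheme are my own.

```python
import numpy as np

def leverage_score_coreset(X, y, m, rng=np.random.default_rng(0)):
    """Sample a weighted coreset of m rows for least-squares regression.

    Rows are drawn with probability proportional to their statistical
    leverage, a standard importance-sampling scheme for l2 regression.
    """
    n = X.shape[0]
    Q, _ = np.linalg.qr(X)                    # orthonormal basis of the column space
    lev = np.sum(Q ** 2, axis=1)              # leverage score of each row
    p = lev / lev.sum()
    idx = rng.choice(n, size=m, replace=True, p=p)
    w = 1.0 / (m * p[idx])                    # importance weights keep the estimate unbiased
    return X[idx], y[idx], w

# Toy check: coreset solution vs. full least-squares solution.
rng = np.random.default_rng(1)
X = rng.normal(size=(100_000, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.normal(size=100_000)

Xc, yc, w = leverage_score_coreset(X, y, m=500)
beta_core = np.linalg.lstsq(Xc * np.sqrt(w)[:, None], yc * np.sqrt(w), rcond=None)[0]
beta_full = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.round(beta_core, 2), np.round(beta_full, 2))
```

The point of the toy check is that 500 weighted rows recover essentially the same coefficients as the full 100,000-row fit, which is the practical promise of a coreset.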

Sum-Product Networks for Hybrid Domains

Title Sum-Product Networks for Hybrid Domains
Authors Alejandro Molina, Antonio Vergari, Nicola Di Mauro, Sriraam Natarajan, Floriana Esposito, Kristian Kersting
Abstract While all kinds of mixed data (from personal data, through panel and scientific data, to public and commercial data) are collected and stored, building probabilistic graphical models for these hybrid domains becomes more difficult. Users spend significant amounts of time identifying the parametric form of the random variables (Gaussian, Poisson, Logit, etc.) involved and learning the mixed models. To make this difficult task easier, we propose the first trainable probabilistic deep architecture for hybrid domains that features tractable queries. It is based on Sum-Product Networks (SPNs) with piecewise polynomial leaf distributions together with novel nonparametric decomposition and conditioning steps using the Hirschfeld-Gebelein-Rényi Maximum Correlation Coefficient. This relieves the user from deciding a priori the parametric form of the random variables but is still expressive enough to effectively approximate any continuous distribution and permits efficient learning and inference. Our empirical evidence shows that the architecture, called Mixed SPNs, can indeed capture complex distributions across a wide range of hybrid domains.
Tasks
Published 2017-10-09
URL http://arxiv.org/abs/1710.03297v3
PDF http://arxiv.org/pdf/1710.03297v3.pdf
PWC https://paperswithcode.com/paper/sum-product-networks-for-hybrid-domains
Repo
Framework
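
To make the tractability claim concrete, here is a minimal, hand-built SPN over two continuous variables with piecewise-linear leaf densities: a single bottom-up pass evaluates the joint density exactly. The structure, helper names, and breakpoints are illustrative placeholders, not the output of the paper's nonparametric learner.

```python
import numpy as np

def pwl_leaf(var, xs, ys):
    """Piecewise-linear density over variable `var`, normalized to area 1."""
    xs, ys = np.asarray(xs, float), np.asarray(ys, float)
    area = np.sum((ys[1:] + ys[:-1]) / 2 * np.diff(xs))   # trapezoid rule
    ys = ys / area
    return lambda x: np.interp(x[var], xs, ys, left=0.0, right=0.0)

def product_node(*children):
    return lambda x: np.prod([c(x) for c in children])

def sum_node(weights, *children):
    w = np.asarray(weights, float) / np.sum(weights)
    return lambda x: np.dot(w, [c(x) for c in children])

# Mixture of two independent factorizations p(x0) * p(x1).
spn = sum_node(
    [0.6, 0.4],
    product_node(pwl_leaf(0, [0, 1, 2], [0, 2, 0]), pwl_leaf(1, [0, 1], [1, 1])),
    product_node(pwl_leaf(0, [1, 2, 3], [0, 2, 0]), pwl_leaf(1, [1, 2], [1, 1])),
)

print(spn(np.array([1.0, 0.5])))   # one bottom-up pass gives the density value
```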

Learning to Fly by Crashing

Title Learning to Fly by Crashing
Authors Dhiraj Gandhi, Lerrel Pinto, Abhinav Gupta
Abstract How do you learn to navigate an Unmanned Aerial Vehicle (UAV) and avoid obstacles? One approach is to use a small dataset collected by human experts; however, high-capacity learning algorithms tend to overfit when trained with little data. An alternative is to use simulation. But the gap between simulation and the real world remains large, especially for perception problems. The reason most research avoids using large-scale real data is the fear of crashes! In this paper, we propose to bite the bullet and collect a dataset of crashes itself! We build a drone whose sole purpose is to crash into objects: it samples naive trajectories and crashes into random objects. We crash our drone 11,500 times to create one of the largest UAV crash datasets. This dataset captures the different ways in which a UAV can crash. We use all this negative flying data in conjunction with positive data sampled from the same trajectories to learn a simple yet powerful policy for UAV navigation. We show that this simple self-supervised model is quite effective in navigating the UAV even in extremely cluttered environments with dynamic obstacles, including humans. For supplementary video see: https://youtu.be/u151hJaGKUo
Tasks
Published 2017-04-19
URL http://arxiv.org/abs/1704.05588v2
PDF http://arxiv.org/pdf/1704.05588v2.pdf
PWC https://paperswithcode.com/paper/learning-to-fly-by-crashing
Repo
Framework
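
A rough sketch of how such self-supervised data could be turned into labels and a reactive policy is given below; the margin around the crash, the left/straight/right cropping, and the dummy classifier are assumptions for illustration, not the paper's exact setup.

```python
import numpy as np

def label_trajectory(frames, crash_index, margin=30):
    """Return (frame, label) pairs: 1 = flyable, 0 = about to crash."""
    return [(f, 1 if i < crash_index - margin else 0)
            for i, f in enumerate(frames[:crash_index])]

def choose_action(frame, flyable_prob):
    """Simple reactive policy: crop left/center/right strips and steer
    toward the strip the classifier considers most flyable."""
    h, w = frame.shape[:2]
    strips = {"left": frame[:, : w // 3],
              "straight": frame[:, w // 3 : 2 * w // 3],
              "right": frame[:, 2 * w // 3 :]}
    scores = {a: flyable_prob(s) for a, s in strips.items()}
    return max(scores, key=scores.get)

# Dummy usage with a fake classifier that prefers brighter (emptier) regions.
frame = np.random.rand(120, 160)
pairs = label_trajectory([frame] * 40, crash_index=40)
print(sum(lbl for _, lbl in pairs), "of", len(pairs), "frames labeled flyable")
print(choose_action(frame, flyable_prob=lambda s: float(s.mean())))
```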

Jaccard analysis and LASSO-based feature selection for location fingerprinting with limited computational complexity

Title Jaccard analysis and LASSO-based feature selection for location fingerprinting with limited computational complexity
Authors Caifa Zhou, Andreas Wieser
Abstract We propose an approach to reduce both computational complexity and data storage requirements for the online positioning stage of a fingerprinting-based indoor positioning system (FIPS) by introducing segmentation of the region of interest (RoI) into sub-regions, sub-region selection using a modified Jaccard index, and feature selection based on the randomized least absolute shrinkage and selection operator (LASSO). We implement these steps in a Bayesian framework of position estimation using the maximum a posteriori (MAP) principle. An additional benefit of these steps is that the time for estimating the position and the required data storage are virtually independent of the size of the RoI and of the total number of available features within the RoI. Thus, the proposed steps facilitate the application of FIPS to large areas. Results of an experimental analysis using real data collected in an office building with a Nexus 6P smartphone as the user device and a total station for position ground truth corroborate the expected performance of the proposed approach. The positioning accuracy obtained by processing only 10 automatically identified features instead of all available ones and limiting position estimation to 10 automatically identified sub-regions instead of the entire RoI is equivalent to processing all available data. In the chosen example, 50% of the errors are less than 1.8 m and 90% are less than 5 m. However, the computation time using the automatically identified subset of data is only about 1% of that required for processing the entire data set.
Tasks Feature Selection
Published 2017-11-21
URL http://arxiv.org/abs/1711.07812v1
PDF http://arxiv.org/pdf/1711.07812v1.pdf
PWC https://paperswithcode.com/paper/jaccard-analysis-and-lasso-based-feature
Repo
Framework
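
The sketch below illustrates the two selection steps with a plain Jaccard index and a standard scikit-learn LASSO; the paper's modified Jaccard index, the randomized LASSO variant, and the MAP position estimator are simplified away, and all names and thresholds are illustrative.

```python
import numpy as np
from sklearn.linear_model import Lasso

def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def select_subregions(online_aps, subregion_aps, k=10):
    """Rank sub-regions by overlap between online and training access points."""
    scores = {r: jaccard(online_aps, aps) for r, aps in subregion_aps.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

def select_features(X, y, n_features=10, alpha=0.1):
    """X: fingerprints (RSS per access point), y: one position coordinate."""
    coef = Lasso(alpha=alpha).fit(X, y).coef_
    return np.argsort(np.abs(coef))[::-1][:n_features]

# Toy usage.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
y = X[:, 3] - 2 * X[:, 17] + 0.1 * rng.normal(size=200)
print(select_features(X, y))
print(select_subregions({"ap1", "ap2"}, {"A": {"ap1", "ap2", "ap3"}, "B": {"ap4"}}, k=1))
```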

Facial Dynamics Interpreter Network: What are the Important Relations between Local Dynamics for Facial Trait Estimation?

Title Facial Dynamics Interpreter Network: What are the Important Relations between Local Dynamics for Facial Trait Estimation?
Authors Seong Tae Kim, Yong Man Ro
Abstract Human face analysis is an important task in computer vision. According to cognitive-psychological studies, facial dynamics can provide crucial cues for face analysis. The motion of a facial local region during a facial expression is related to the motion of other facial local regions. In this paper, a novel deep learning approach, named the facial dynamics interpreter network, is proposed to interpret the important relations between local dynamics for estimating facial traits from an expression sequence. The facial dynamics interpreter network is designed to encode a relational importance, which is used for interpreting the relation between facial local dynamics and for estimating facial traits. The effectiveness of the proposed method has been verified by comparative experiments. The important relations between facial local dynamics are investigated by the proposed facial dynamics interpreter network in gender classification and age estimation. Moreover, experimental results show that the proposed method outperforms the state-of-the-art methods in gender classification and age estimation.
Tasks Age Estimation
Published 2017-11-29
URL http://arxiv.org/abs/1711.10688v2
PDF http://arxiv.org/pdf/1711.10688v2.pdf
PWC https://paperswithcode.com/paper/facial-dynamics-interpreter-network-what-are
Repo
Framework
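
The core idea of relational importance can be sketched as an attention-style weighting over pairs of local-dynamics features, as below; the random scoring projection stands in for the learned interpreter network and is purely a placeholder.

```python
import numpy as np

rng = np.random.default_rng(0)
num_regions, feat_dim = 6, 16
local_dynamics = rng.normal(size=(num_regions, feat_dim))   # one vector per facial region

# Every pair of regions forms a relation feature.
pairs = [(i, j) for i in range(num_regions) for j in range(i + 1, num_regions)]
relation_feats = np.array([np.concatenate([local_dynamics[i], local_dynamics[j]])
                           for i, j in pairs])              # (n_pairs, 2 * feat_dim)

score_w = rng.normal(size=2 * feat_dim)                     # placeholder for a learned scorer
scores = relation_feats @ score_w
importance = np.exp(scores - scores.max())
importance /= importance.sum()                              # softmax over region pairs

aggregated = importance @ relation_feats                    # importance-weighted relation summary
print("most important region pair:", pairs[int(np.argmax(importance))])
```

In the actual network the importance weights are both used for prediction and inspected afterwards, which is what makes the relations interpretable.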

Structural Damage Identification Using Artificial Neural Network and Synthetic data

Title Structural Damage Identification Using Artificial Neural Network and Synthetic data
Authors Divya Shyam Singh, G. B. L. Chowdary, D. Roy Mahapatra
Abstract This paper presents a real-time, vibration-based damage identification technique using measured frequency response functions (FRFs) under random vibration loading. Artificial Neural Networks (ANNs) are trained to map damage fingerprints to damage characteristic parameters. Principal component analysis (PCA) is used to tackle the high dimensionality and high noise of the data, which is common for industrial structures. The present study considers cracks, rivet hole expansion, and redundant uniform mass as damage types on the structure. Frequency response function data, after being reduced in size using PCA, are fed to individual neural networks to localize the damage and predict its severity. The system of ANNs is trained with both numerical and experimental model data to make it reliable and robust. The methodology is applied to a numerical model of a stiffened panel structure, where damage is confined close to the stiffener. The results show that, in all cases considered, it is possible to localize the damage and predict its severity with very good accuracy and reliability.
Tasks
Published 2017-03-27
URL http://arxiv.org/abs/1703.09651v1
PDF http://arxiv.org/pdf/1703.09651v1.pdf
PWC https://paperswithcode.com/paper/structural-damage-identification-using
Repo
Framework
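
A minimal sketch of the PCA-plus-ANN chain using scikit-learn is shown below; the synthetic FRF data, the number of principal components, and the network size are placeholders rather than values from the paper.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
n_samples, n_freq_bins = 500, 2000

# Synthetic stand-in for measured FRFs: a few latent damage parameters
# (e.g. crack size, location) drive the high-dimensional, noisy response.
latent = rng.normal(size=(n_samples, 5))
frf = latent @ rng.normal(size=(5, n_freq_bins)) + 0.05 * rng.normal(size=(n_samples, n_freq_bins))
severity = latent[:, 0]                                   # damage severity to be predicted

model = make_pipeline(
    PCA(n_components=20),                                 # tackle dimensionality and noise
    MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=2000, random_state=0),
)
model.fit(frf[:400], severity[:400])
print("held-out R^2:", round(model.score(frf[400:], severity[400:]), 3))
```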

Human Detection for Night Surveillance using Adaptive Background Subtracted Image

Title Human Detection for Night Surveillance using Adaptive Background Subtracted Image
Authors Yash Khandhediya, Karishma Sav, Vandit Gajjar
Abstract Surveillance based on computer vision has become a major necessity in the current era. Most surveillance systems operate on visible-light imaging, but their performance is limited by factors such as variation in light intensity during the daytime. The matter of concern lies in the need for processing images in low light, such as in nighttime surveillance. In this paper, we propose a novel approach for human detection using a FLIR (Forward Looking Infrared) camera. Since the sensing principle is based on thermal radiation in the near-IR region, it is possible to detect humans in an image captured with a FLIR camera even in low light. The proposed method for human detection involves processing thermal images with the HOG (Histogram of Oriented Gradients) feature extraction technique along with some enhancements. The core of the proposed technique is an adaptive background subtraction algorithm, which works in association with the HOG technique. By means of this method, we are able to reduce execution time while improving precision and other performance measures, which results in an improvement of the overall accuracy of the human detection system.
Tasks Human Detection
Published 2017-09-27
URL http://arxiv.org/abs/1709.09389v1
PDF http://arxiv.org/pdf/1709.09389v1.pdf
PWC https://paperswithcode.com/paper/human-detection-for-night-surveillance-using
Repo
Framework
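
A minimal OpenCV pipeline in this spirit, combining an adaptive background subtractor with the stock HOG pedestrian detector, might look as follows; the input file name, parameters, and masking strategy are assumptions, not the paper's tuned method.

```python
import cv2

cap = cv2.VideoCapture("flir_night.mp4")        # hypothetical thermal (FLIR) video file
bg = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16, detectShadows=False)
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = bg.apply(frame)                      # adaptive background subtraction
    fg = cv2.bitwise_and(frame, frame, mask=mask)
    boxes, weights = hog.detectMultiScale(fg, winStride=(8, 8), scale=1.05)
    for (x, y, w, h) in boxes:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("detections", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```

Running HOG only on the background-subtracted foreground is what keeps the per-frame cost low, which matches the execution-time argument in the abstract.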

Human Detection and Tracking for Video Surveillance A Cognitive Science Approach

Title Human Detection and Tracking for Video Surveillance A Cognitive Science Approach
Authors Vandit Gajjar, Ayesha Gurnani, Yash Khandhediya
Abstract With crimes on the rise all around the world, video surveillance is becoming more important day by day. Due to the lack of human resources to monitor this increasing number of cameras manually, new computer vision algorithms to perform lower- and higher-level tasks are being developed. We have developed a new method incorporating the widely acclaimed Histograms of Oriented Gradients (HOG), the theory of visual saliency, and the saliency prediction model Deep Multi-Level Network to detect human beings in video sequences. Furthermore, we implemented the k-means algorithm to cluster the HOG feature vectors of the positively detected windows and determined the path followed by a person in the video. We achieved a detection precision of 83.11% and a recall of 41.27%. We obtained these results 76.866 times faster than classification on normal images.
Tasks Human Detection, Saliency Prediction
Published 2017-09-03
URL http://arxiv.org/abs/1709.00726v1
PDF http://arxiv.org/pdf/1709.00726v1.pdf
PWC https://paperswithcode.com/paper/human-detection-and-tracking-for-video
Repo
Framework
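
The clustering step can be sketched as follows: HOG vectors of positive detections are grouped with k-means, and each cluster, ordered by frame index, gives one person's path. The synthetic detections below are placeholders for the output of an actual detector.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
num_frames, hog_dim, num_people = 100, 36, 2

# Fake "detections": (frame_idx, cx, cy, hog_vector) for two slowly moving people.
detections = []
for t in range(num_frames):
    for p in range(num_people):
        centre = np.array([50 + 100 * p + t, 60 + 5 * p])
        detections.append(np.concatenate([[t], centre, rng.normal(p, 0.1, hog_dim)]))
X = np.array(detections)

labels = KMeans(n_clusters=num_people, n_init=10, random_state=0).fit_predict(X[:, 3:])
for person in range(num_people):
    track = X[labels == person][:, :3]
    track = track[np.argsort(track[:, 0])]           # order detections by frame index
    print(f"person {person}: start {track[0, 1:]} -> end {track[-1, 1:]}")
```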

Joint Multi-Person Pose Estimation and Semantic Part Segmentation

Title Joint Multi-Person Pose Estimation and Semantic Part Segmentation
Authors Fangting Xia, Peng Wang, Xianjie Chen, Alan Yuille
Abstract Human pose estimation and semantic part segmentation are two complementary tasks in computer vision. In this paper, we propose to solve the two tasks jointly for natural multi-person images, in which the estimated pose provides an object-level shape prior to regularize part segments while the part-level segments constrain the variation of pose locations. Specifically, we first train two fully convolutional neural networks (FCNs), namely Pose FCN and Part FCN, to provide initial estimates of the pose joint potential and the semantic part potential. Then, to refine the pose joint locations, the two types of potentials are fused with a fully-connected conditional random field (FCRF), where a novel segment-joint smoothness term is used to encourage semantic and spatial consistency between parts and joints. To refine the part segments, the refined pose and the original part potential are integrated through a Part FCN, where the skeleton feature from the pose serves as additional regularization cues for part segments. Finally, to reduce the complexity of the FCRF, we induce human detection boxes and infer the graph inside each box, making the inference forty times faster. Since there is no dataset that contains both part segments and pose labels, we extend the PASCAL VOC part dataset with human pose joints and perform extensive experiments to compare our method against several of the most recent strategies. We show that on this dataset our algorithm surpasses competing methods by a large margin in both tasks.
Tasks Human Detection, Multi-Person Pose Estimation, Pose Estimation
Published 2017-08-10
URL http://arxiv.org/abs/1708.03383v1
PDF http://arxiv.org/pdf/1708.03383v1.pdf
PWC https://paperswithcode.com/paper/joint-multi-person-pose-estimation-and
Repo
Framework
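
A toy illustration of the fusion intuition (joint locations re-scored by compatible part segments) is given below; the real model does this inside a fully connected CRF with a learned segment-joint smoothness term, so the additive re-scoring here is only conceptual.

```python
import numpy as np

H, W = 64, 48
rng = np.random.default_rng(0)
joint_potential = rng.random((H, W))        # score map for one joint (e.g. left wrist)
part_potential = np.zeros((H, W))
part_potential[20:40, 10:30] = 1.0          # "arm" segment predicted by a Part FCN

smoothness_weight = 0.5
refined = joint_potential + smoothness_weight * part_potential  # reward joints on compatible parts

y0, x0 = np.unravel_index(joint_potential.argmax(), joint_potential.shape)
y1, x1 = np.unravel_index(refined.argmax(), refined.shape)
print("joint location before fusion:", (y0, x0), "after fusion:", (y1, x1))
```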

Vision-Based Fallen Person Detection for the Elderly

Title Vision-Based Fallen Person Detection for the Elderly
Authors Markus D. Solbach, John K. Tsotsos
Abstract Falls are serious and costly for elderly people. The US Centers for Disease Control and Prevention report that millions of people aged 65 and older fall at least once each year. Serious injuries, such as hip fractures, broken bones, or head injuries, are caused by 20% of these falls. The time it takes to respond to and treat a fallen person is crucial. In this paper, we present a new, non-invasive system for fallen person detection. Our approach uses only stereo camera data for passively sensing the environment. The key novelty is a human fall detector that uses a CNN-based human pose estimator in combination with stereo data to reconstruct the human pose and estimate the ground plane in 3D. Furthermore, our system consists of a reasoning module which formulates a number of measures to reason whether a person has fallen. We have tested our approach in different scenarios covering most activities elderly people might encounter living at home. Based on our extensive evaluations, our system shows high accuracy and almost no misclassification. To reproduce our results, the implementation is publicly available to the scientific community.
Tasks Human Detection
Published 2017-07-24
URL http://arxiv.org/abs/1707.07608v2
PDF http://arxiv.org/pdf/1707.07608v2.pdf
PWC https://paperswithcode.com/paper/vision-based-fallen-person-detection-for-the
Repo
Framework
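
One plausible form of the reasoning module's fall test, the distance of upper-body joints to the estimated ground plane, is sketched below; the joint set and the 0.4 m threshold are my assumptions, not the paper's rules.

```python
import numpy as np

def point_plane_distance(points, plane):
    """plane = (a, b, c, d) with ax + by + cz + d = 0; points: (N, 3)."""
    n, d = np.asarray(plane[:3], float), float(plane[3])
    return np.abs(points @ n + d) / np.linalg.norm(n)

def is_fallen(joints_3d, ground_plane, max_height=0.4):
    """joints_3d: dict of joint name -> xyz in metres (from a 3D pose estimate)."""
    upper_body = np.array([joints_3d[j] for j in ("head", "neck", "hip")])
    return bool(np.all(point_plane_distance(upper_body, ground_plane) < max_height))

# Ground plane y = 0 (normal pointing up); person lying down vs. standing.
plane = (0.0, 1.0, 0.0, 0.0)
lying = {"head": (0.0, 0.2, 2.0), "neck": (0.2, 0.2, 2.0), "hip": (0.6, 0.25, 2.0)}
standing = {"head": (0.0, 1.7, 2.0), "neck": (0.0, 1.5, 2.0), "hip": (0.0, 1.0, 2.0)}
print(is_fallen(lying, plane), is_fallen(standing, plane))   # True False
```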

End-to-end Flow Correlation Tracking with Spatial-temporal Attention

Title End-to-end Flow Correlation Tracking with Spatial-temporal Attention
Authors Zheng Zhu, Wei Wu, Wei Zou, Junjie Yan
Abstract Discriminative correlation filters (DCF) with deep convolutional features have achieved favorable performance in recent tracking benchmarks. However, most existing DCF trackers only consider appearance features of the current frame and hardly benefit from motion and inter-frame information. The lack of temporal information degrades tracking performance during challenges such as partial occlusion and deformation. In this work, we focus on making use of the rich flow information in consecutive frames to improve the feature representation and the tracking accuracy. Firstly, individual components, including optical flow estimation, feature extraction, aggregation, and correlation filter tracking, are formulated as special layers in the network. To the best of our knowledge, this is the first work to jointly train the flow and tracking tasks in a deep learning framework. Then, historical feature maps at predefined intervals are warped and aggregated with the current ones under the guidance of flow. For adaptive aggregation, we propose a novel spatial-temporal attention mechanism. Extensive experiments are performed on four challenging tracking datasets: OTB2013, OTB2015, VOT2015 and VOT2016, and the proposed method achieves superior results on these benchmarks.
Tasks Optical Flow Estimation
Published 2017-11-03
URL http://arxiv.org/abs/1711.01124v4
PDF http://arxiv.org/pdf/1711.01124v4.pdf
PWC https://paperswithcode.com/paper/end-to-end-flow-correlation-tracking-with
Repo
Framework
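
The warp-and-aggregate step can be sketched with a single-channel feature map as below; the bilinear warp, the similarity-based softmax weights, and the toy flow field stand in for the learned multi-channel network components.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def warp(feature, flow):
    """feature: (H, W); flow: (H, W, 2) giving per-pixel (dy, dx) offsets."""
    H, W = feature.shape
    yy, xx = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    coords = np.stack([yy + flow[..., 0], xx + flow[..., 1]])
    return map_coordinates(feature, coords, order=1, mode="nearest")

rng = np.random.default_rng(0)
previous = rng.random((32, 32))
current = np.roll(previous, shift=2, axis=1)            # content moved 2 px to the right
flow = np.zeros((32, 32, 2)); flow[..., 1] = -2         # offsets back to the previous frame

warped_prev = warp(previous, flow)
candidates = np.stack([current, warped_prev])

# Simplified spatial-temporal attention: weight each candidate map by its
# similarity to the current features, normalized with a softmax.
sims = np.array([np.sum(c * current) for c in candidates])
weights = np.exp(sims - sims.max()); weights /= weights.sum()
aggregated = np.tensordot(weights, candidates, axes=1)
print("alignment error after warping:", float(np.abs(warped_prev - current).mean()))
```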

Comparing Apples and Oranges: Off-Road Pedestrian Detection on the NREC Agricultural Person-Detection Dataset

Title Comparing Apples and Oranges: Off-Road Pedestrian Detection on the NREC Agricultural Person-Detection Dataset
Authors Zachary Pezzementi, Trenton Tabor, Peiyun Hu, Jonathan K. Chang, Deva Ramanan, Carl Wellington, Benzun P. Wisely Babu, Herman Herman
Abstract Person detection from vehicles has made rapid progress recently with the advent of multiple high-quality datasets of urban and highway driving, yet no large-scale benchmark is available for the same problem in off-road or agricultural environments. Here we present the NREC Agricultural Person-Detection Dataset to spur research in these environments. It consists of labeled stereo video of people in orange and apple orchards taken from two perception platforms (a tractor and a pickup truck), along with vehicle position data from RTK GPS. We define a benchmark on part of the dataset that combines a total of 76k labeled person images and 19k sampled person-free images. The dataset highlights several key challenges of the domain, including varying environments, substantial occlusion by vegetation, people in motion and in non-standard poses, and people seen from a variety of distances; metadata are included to allow targeted evaluation of each of these effects. Finally, we present baseline detection performance results for three leading approaches from urban pedestrian detection and our own convolutional neural network approach that benefits from the incorporation of additional image context. We show that the success of existing approaches on urban data does not transfer directly to this domain.
Tasks Human Detection, Pedestrian Detection
Published 2017-07-22
URL http://arxiv.org/abs/1707.07169v2
PDF http://arxiv.org/pdf/1707.07169v2.pdf
PWC https://paperswithcode.com/paper/comparing-apples-and-oranges-off-road
Repo
Framework

Face Aging With Conditional Generative Adversarial Networks

Title Face Aging With Conditional Generative Adversarial Networks
Authors Grigory Antipov, Moez Baccouche, Jean-Luc Dugelay
Abstract It has been recently shown that Generative Adversarial Networks (GANs) can produce synthetic images of exceptional visual fidelity. In this work, we propose a GAN-based method for automatic face aging. Contrary to previous works employing GANs for altering facial attributes, we place particular emphasis on preserving the original person’s identity in the aged version of his/her face. To this end, we introduce a novel approach for “Identity-Preserving” optimization of GAN’s latent vectors. The objective evaluation of the resulting aged and rejuvenated face images by state-of-the-art face recognition and age estimation solutions demonstrates the high potential of the proposed method.
Tasks Age Estimation, Face Recognition
Published 2017-02-07
URL http://arxiv.org/abs/1702.01983v2
PDF http://arxiv.org/pdf/1702.01983v2.pdf
PWC https://paperswithcode.com/paper/face-aging-with-conditional-generative
Repo
Framework
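
A conceptual sketch of identity-preserving latent optimization is given below, with tiny random networks standing in for the trained conditional generator and the face-recognition embedder; only the optimization pattern (freeze the networks, optimize the latent code against an identity loss) reflects the idea above.

```python
import torch

torch.manual_seed(0)
latent_dim, embed_dim = 64, 128
G = torch.nn.Sequential(torch.nn.Linear(latent_dim + 1, 256), torch.nn.ReLU(),
                        torch.nn.Linear(256, 3 * 32 * 32))        # placeholder generator G(z, age)
E = torch.nn.Sequential(torch.nn.Linear(3 * 32 * 32, embed_dim))  # placeholder face embedder
for p in list(G.parameters()) + list(E.parameters()):
    p.requires_grad_(False)                                       # both networks stay frozen

target_embedding = E(torch.randn(1, 3 * 32 * 32))                 # embedding of the input face
age = torch.tensor([[60.0]])                                      # desired age condition

z = torch.randn(1, latent_dim, requires_grad=True)
opt = torch.optim.Adam([z], lr=0.05)
for step in range(200):
    opt.zero_grad()
    aged_face = G(torch.cat([z, age], dim=1))
    loss = torch.nn.functional.mse_loss(E(aged_face), target_embedding)  # identity loss
    loss.backward()
    opt.step()
print("identity loss after optimization:", float(loss))
```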

Take it in your stride: Do we need striding in CNNs?

Title Take it in your stride: Do we need striding in CNNs?
Authors Chen Kong, Simon Lucey
Abstract Since their inception, CNNs have utilized some type of striding operator to reduce the overlap of receptive fields and the spatial dimensions. Although it has clear heuristic motivations (i.e., lowering the number of parameters to learn), the mathematical role of striding within CNN learning remains unclear. This paper offers a novel and mathematically rigorous perspective on the role of the striding operator within modern CNNs. Specifically, we demonstrate theoretically that one can always represent a CNN that incorporates striding with an equivalent non-striding CNN that has more filters of smaller size. Through this equivalence we are then able to characterize striding as an additional mechanism for parameter sharing among channels, thus reducing training complexity. Finally, the framework presented in this paper offers a new mathematical perspective on the role of striding, which we hope will facilitate and simplify future theoretical analysis of CNNs.
Tasks
Published 2017-12-07
URL http://arxiv.org/abs/1712.02502v1
PDF http://arxiv.org/pdf/1712.02502v1.pdf
PWC https://paperswithcode.com/paper/take-it-in-your-stride-do-we-need-striding-in
Repo
Framework
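
Below is a quick numerical check of a related, weaker identity: a stride-2 convolution equals the stride-1 convolution subsampled at every second location, which hints at why striding adds no representational power on its own. The paper's stronger claim (an equivalent non-striding CNN with more, smaller filters) is not reproduced here.

```python
import numpy as np
from scipy.signal import correlate2d

rng = np.random.default_rng(0)
image = rng.normal(size=(8, 8))
kernel = rng.normal(size=(3, 3))

full = correlate2d(image, kernel, mode="valid")     # stride-1 output, shape (6, 6)
strided = full[::2, ::2]                            # keep every second row/column

# Direct stride-2 computation for comparison.
direct = np.array([[np.sum(image[i:i + 3, j:j + 3] * kernel)
                    for j in range(0, 6, 2)] for i in range(0, 6, 2)])
print(np.allclose(strided, direct))                 # True
```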

The shortest way to visit all metro lines in a city

Title The shortest way to visit all metro lines in a city
Authors Florian Sikora
Abstract What if {a tourist, a train addict, Dr. Sheldon Cooper, somebody who likes to waste time} wants to visit all metro lines or carriages in a given network in a minimum number of steps? We study this problem with an application to the metro networks of Paris and Tokyo, proposing optimal solutions thanks to mathematical programming tools. Quite surprisingly, it appears that you can visit all 16 Parisian metro lines in only 26 steps (we denote by a step the act of taking the metro from one station to an adjacent one). Perhaps even more surprisingly, adding the 5 RER lines to these 16 lines does not increase the size of the best solution. It is also possible to visit the 13 lines of (the dense network of) Tokyo with only 15 steps.
Tasks
Published 2017-09-13
URL http://arxiv.org/abs/1709.05948v3
PDF http://arxiv.org/pdf/1709.05948v3.pdf
PWC https://paperswithcode.com/paper/the-shortest-way-to-visit-all-metro-lines-in
Repo
Framework
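
For a toy network, the underlying problem (shortest walk that uses at least one edge of every line) can be solved by breadth-first search over states of the form (current station, set of lines already ridden), as below; the paper tackles the real Paris and Tokyo instances with mathematical programming, and the mini-network here is made up.

```python
from collections import deque

# Hypothetical mini-network: edges are (station_a, station_b, line).
edges = [("A", "B", 1), ("B", "C", 1), ("B", "D", 2), ("D", "E", 2), ("C", "E", 3)]
adj = {}
for a, b, line in edges:
    adj.setdefault(a, []).append((b, line))
    adj.setdefault(b, []).append((a, line))
all_lines = frozenset(line for _, _, line in edges)

def min_steps_to_visit_all_lines():
    """BFS over (station, visited lines); any station may be the start."""
    start_states = {(s, frozenset()) for s in adj}
    queue = deque((state, 0) for state in start_states)
    seen = set(start_states)
    while queue:
        (station, visited), steps = queue.popleft()
        if visited == all_lines:
            return steps
        for nxt, line in adj[station]:
            state = (nxt, visited | {line})
            if state not in seen:
                seen.add(state)
                queue.append((state, steps + 1))
    return None

print(min_steps_to_visit_all_lines())   # shortest walk covering lines 1, 2 and 3
```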