January 29, 2020


Paper Group ANR 703

VICSOM: VIsual Clues from SOcial Media for psychological assessment. A Context-and-Spatial Aware Network for Multi-Person Pose Estimation. Are We Ready for Service Robots? The OpenLORIS-Scene Datasets for Lifelong SLAM. Exactly Sparse Gaussian Variational Inference with Application to Derivative-Free Batch Nonlinear State Estimation. KarNet: An Eff …


Title	VICSOM: VIsual Clues from SOcial Media for psychological assessment
Authors	Mohammad Mahdi Dehshibi, Gerard Pons, Bita Baiani, David Masip
Abstract	Sharing multimodal information (typically images, videos or text) on Social Network Sites (SNS) takes up a significant part of our time. The particular way in which users present themselves on SNS can provide useful information to infer human behaviors. This paper proposes to use multimodal data gathered from Instagram accounts to predict the perceived prototypical needs described in Glasser’s choice theory. The contribution is two-fold: (i) we provide a large multimodal database from Instagram public profiles (more than 30,000 images and text captions) annotated by expert psychologists for each perceived behavior according to Glasser’s theory, and (ii) we propose to automate the recognition of the (unconsciously) perceived needs of the users. In particular, we propose a baseline using three different feature sets: visual descriptors based on pixel images (SURF and Visual Bag of Words), a high-level descriptor based on automated scene description using Convolutional Neural Networks, and a text-based descriptor (Word2vec) obtained from processing the captions provided by the users. Finally, we propose a multimodal fusion of these descriptors, obtaining promising results in the multi-label classification problem.
Tasks	Multi-Label Classification
Published	2019-05-15
URL	https://arxiv.org/abs/1905.06203v1
PDF	https://arxiv.org/pdf/1905.06203v1.pdf
PWC	https://paperswithcode.com/paper/vicsom-visual-clues-from-social-media-for
Repo
Framework

A Context-and-Spatial Aware Network for Multi-Person Pose Estimation


Title	A Context-and-Spatial Aware Network for Multi-Person Pose Estimation
Authors	Dongdong Yu, Kai Su, Xin Geng, Changhu Wang
Abstract	Multi-person pose estimation is a fundamental yet challenging task in computer vision. Both rich context information and spatial information are required to precisely locate the keypoints for all persons in an image. In this paper, a novel Context-and-Spatial Aware Network (CSANet), which integrates both a Context Aware Path and Spatial Aware Path, is proposed to obtain effective features involving both context information and spatial information. Specifically, we design a Context Aware Path with structure supervision strategy and spatial pyramid pooling strategy to enhance the context information. Meanwhile, a Spatial Aware Path is proposed to preserve the spatial information, which also shortens the information propagation path from low-level features to high-level features. On top of these two paths, we employ a Heavy Head Path to further combine and enhance the features effectively. Experimentally, our proposed network outperforms state-of-the-art methods on the COCO keypoint benchmark, which verifies the effectiveness of our method and further corroborates the above proposition.
Tasks	Multi-Person Pose Estimation, Pose Estimation
Published	2019-05-14
URL	https://arxiv.org/abs/1905.05355v1
PDF	https://arxiv.org/pdf/1905.05355v1.pdf
PWC	https://paperswithcode.com/paper/a-context-and-spatial-aware-network-for-multi
Repo
Framework

Are We Ready for Service Robots? The OpenLORIS-Scene Datasets for Lifelong SLAM


Title	Are We Ready for Service Robots? The OpenLORIS-Scene Datasets for Lifelong SLAM
Authors	Xuesong Shi, Dongjiang Li, Pengpeng Zhao, Qinbin Tian, Yuxin Tian, Qiwei Long, Chunhao Zhu, Jingwei Song, Fei Qiao, Le Song, Yangquan Guo, Zhigang Wang, Yimin Zhang, Baoxing Qin, Wei Yang, Fangshi Wang, Rosa H. M. Chan, Qi She
Abstract	Service robots should be able to operate autonomously in dynamic and daily changing environments over an extended period of time. While Simultaneous Localization And Mapping (SLAM) is one of the most fundamental problems for robotic autonomy, most existing SLAM works are evaluated with data sequences that are recorded in a short period of time. In real-world deployment, there can be out-of-sight scene changes caused by both natural factors and human activities. For example, in home scenarios, most objects may be movable, replaceable or deformable, and the visual features of the same place may differ significantly on successive days. Such out-of-sight dynamics pose great challenges to the robustness of pose estimation, and hence to a robot’s long-term deployment and operation. To differentiate the aforementioned problem from conventional work, which is usually evaluated in a static setting in a single run, the term lifelong SLAM is used here to address SLAM problems in an ever-changing environment over a long period of time. To accelerate lifelong SLAM research, we release the OpenLORIS-Scene datasets. The data are collected in real-world indoor scenes, multiple times in each place, to include scene changes in real life. We also design benchmarking metrics for lifelong SLAM, with which the robustness and accuracy of pose estimation are evaluated separately. The datasets and benchmark are available online at https://lifelong-robotic-vision.github.io/dataset/scene.
Tasks	Pose Estimation, Simultaneous Localization and Mapping
Published	2019-11-13
URL	https://arxiv.org/abs/1911.05603v2
PDF	https://arxiv.org/pdf/1911.05603v2.pdf
PWC	https://paperswithcode.com/paper/are-we-ready-for-service-robots-the-openloris
Repo
Framework

Exactly Sparse Gaussian Variational Inference with Application to Derivative-Free Batch Nonlinear State Estimation


Title	Exactly Sparse Gaussian Variational Inference with Application to Derivative-Free Batch Nonlinear State Estimation
Authors	Timothy D. Barfoot, James R. Forbes, David Yoon
Abstract	We present a Gaussian Variational Inference (GVI) technique that can be applied to large-scale nonlinear batch state estimation problems. The main contribution is to show how to fit the best Gaussian to the posterior efficiently by exploiting factorization of the joint likelihood of the state and data, as is common in practical problems. The proposed Exactly Sparse Gaussian Variational Inference (ESGVI) technique stores the inverse covariance matrix, which is typically very sparse (e.g., block-tridiagonal for classic state estimation). We show that the only blocks of the (dense) covariance matrix that are required during the calculations correspond to the non-zero blocks of the inverse covariance matrix, and further show how to calculate these blocks efficiently in the general GVI problem. ESGVI operates iteratively, and while we can use analytical derivatives at each iteration, Gaussian cubature can be substituted, thereby producing an efficient derivative-free batch formulation. ESGVI simplifies to precisely the Rauch-Tung-Striebel (RTS) smoother in the batch linear estimation case, but goes beyond the ‘extended’ RTS smoother in the nonlinear case since it finds the best-fit Gaussian, not the Maximum A Posteriori (MAP) point solution. We demonstrate the technique on controlled simulation problems and a batch nonlinear Simultaneous Localization and Mapping (SLAM) problem with an experimental dataset.
Tasks	Simultaneous Localization and Mapping
Published	2019-11-09
URL	https://arxiv.org/abs/1911.08333v1
PDF	https://arxiv.org/pdf/1911.08333v1.pdf
PWC	https://paperswithcode.com/paper/exactly-sparse-gaussian-variational-inference
Repo
Framework

KarNet: An Efficient Boolean Function Simplifier


Title	KarNet: An Efficient Boolean Function Simplifier
Authors	Shanka Subhra Mondal, Abhilash Nandy, Ritesh Agrawal, Debashis Sen
Abstract	Many approaches such as Quine-McCluskey algorithm, Karnaugh map solving, Petrick’s method and McBoole’s method have been devised to simplify Boolean expressions in order to optimize hardware implementation of digital circuits. However, the algorithmic implementations of these methods are hard-coded and also their computation time is proportional to the number of minterms involved in the expression. In this paper, we propose KarNet, where the ability of Convolutional Neural Networks to model relationships between various cell locations and values by capturing spatial dependencies is exploited to solve Karnaugh maps. In order to do so, a Karnaugh map is represented as an image signal, where each cell is considered as a pixel. Experimental results show that the computation time of KarNet is independent of the number of minterms and is of the order of one-hundredth to one-tenth that of the rule-based methods. KarNet being a learned system is found to achieve nearly a hundred percent accuracy, precision, and recall. We train KarNet to solve four variable Karnaugh maps and also show that a similar method can be applied on Karnaugh maps with more variables. Finally, we show a way to build a fully accurate and computationally fast system using KarNet.
Tasks
Published	2019-06-04
URL	https://arxiv.org/abs/1906.01363v1
PDF	https://arxiv.org/pdf/1906.01363v1.pdf
PWC	https://paperswithcode.com/paper/karnet-an-efficient-boolean-function
Repo
Framework
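
The abstract describes representing a Karnaugh map as an image in which each cell is a pixel. The following is a minimal sketch of one such encoding for four variables; the row/column assignment (A, B on rows and C, D on columns), the Gray-code ordering, and the function name `kmap_from_minterms` are illustrative assumptions, not the authors' stated layout.

```python
from typing import List

# Gray-code order for two variables: neighbouring rows/columns differ in one bit.
GRAY = [0b00, 0b01, 0b11, 0b10]


def kmap_from_minterms(minterms: List[int]) -> List[List[int]]:
    """Build a 4x4 binary grid (rows: A,B; columns: C,D) from minterm numbers.

    Hypothetical helper: the layout is an assumption made for illustration.
    """
    on = set(minterms)
    grid = []
    for ab in GRAY:
        row = []
        for cd in GRAY:
            index = (ab << 2) | cd  # minterm number, A is the most significant bit
            row.append(1 if index in on else 0)
        grid.append(row)
    return grid


# Each cell acts as one pixel of a 4x4 single-channel "image".
kmap = kmap_from_minterms([0, 1, 2, 3, 12])
for row in kmap:
    print(row)
```

A grid like this could be fed to a convolutional model as a small single-channel input; how KarNet itself encodes cells and outputs is not specified in the abstract.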

Community-based 3-SAT Formulas with a Predefined Solution


Title	Community-based 3-SAT Formulas with a Predefined Solution
Authors	Yamin Hu, Wenjian Luo, Junteng Wang
Abstract	It is crucial to generate crafted SAT formulas with predefined solutions for the testing and development of SAT solvers since many SAT formulas from real-world applications have solutions. Although some generating algorithms have been proposed to generate SAT formulas with predefined solutions, community structures of SAT formulas are not considered. We propose a 3-SAT formula generating algorithm that not only guarantees the existence of a predefined solution, but also simultaneously considers community structures and clause distributions. The proposed 3-SAT formula generating algorithm controls the quality of community structures through controlling (1) the number of clauses whose variables have a common community, which we call intra-community clauses, and (2) the number of variables that only belong to one community, which we call intra-community variables. To study the combined effect of community structures and clause distributions on the hardness of SAT formulas, we measure solving runtimes of two solvers, gluHack (a leading CDCL solver) and CPSparrow (a leading SLS solver), on the generated SAT formulas under different groups of parameter settings. Through extensive experiments, we obtain some noteworthy observations on the SAT formulas generated by the proposed algorithm: (1) The community structure has little or no effect on the hardness of SAT formulas with regard to CPSparrow but a strong effect with regard to gluHack. (2) SAT formulas are hard to solve with regard to gluHack only when the proportion of true literals in a SAT formula in terms of the predefined solution is 0.5; when this proportion is below 0.5, SAT formulas are hard to solve with regard to CPSparrow. (3) When the ratio of the number of clauses to that of variables is around 4.25, the SAT formulas are hard to solve with regard to both gluHack and CPSparrow.
Tasks
Published	2019-02-26
URL	http://arxiv.org/abs/1902.09706v1
PDF	http://arxiv.org/pdf/1902.09706v1.pdf
PWC	https://paperswithcode.com/paper/community-based-3-sat-formulas-with-a
Repo
Framework
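
To make the idea of a formula with a predefined solution concrete, here is a minimal sketch that plants a solution by rejection sampling: random 3-literal clauses are kept only if the chosen assignment satisfies them. This is a swapped-in, simplified technique; it does not model the community structures or clause distributions that the paper's algorithm controls, and the names `planted_3sat` and `satisfies` are hypothetical.

```python
import random
from typing import Dict, List, Tuple

Clause = Tuple[int, ...]  # signed variable indices, DIMACS-style (e.g. -3 means NOT x3)


def satisfies(clause: Clause, solution: Dict[int, bool]) -> bool:
    """True if at least one literal of the clause is true under the assignment."""
    return any(solution[abs(lit)] == (lit > 0) for lit in clause)


def planted_3sat(num_vars: int, num_clauses: int,
                 solution: Dict[int, bool], rng: random.Random) -> List[Clause]:
    """Sample random 3-literal clauses, keeping only those the solution satisfies."""
    clauses: List[Clause] = []
    while len(clauses) < num_clauses:
        variables = rng.sample(range(1, num_vars + 1), 3)  # three distinct variables
        clause = tuple(v if rng.random() < 0.5 else -v for v in variables)
        if satisfies(clause, solution):  # reject clauses that would break the solution
            clauses.append(clause)
    return clauses


rng = random.Random(0)
solution = {v: rng.random() < 0.5 for v in range(1, 21)}
# 85 clauses over 20 variables gives the clause-to-variable ratio of 4.25
# mentioned in the abstract.
formula = planted_3sat(20, 85, solution, rng)
print(len(formula), all(satisfies(c, solution) for c in formula))
```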

Measuring robustness of Visual SLAM


Title	Measuring robustness of Visual SLAM
Authors	David Prokhorov, Dmitry Zhukov, Olga Barinova, Anna Vorontsova, Anton Konushin
Abstract	Simultaneous localization and mapping (SLAM) is an essential component of robotic systems. In this work we perform a feasibility study of RGB-D SLAM for the task of indoor robot navigation. Recent visual SLAM methods, e.g. ORBSLAM2 (Mur-Artal and Tardós, 2017), demonstrate impressive accuracy, but the experiments in the papers are usually conducted on just a few sequences, which makes it difficult to reason about the robustness of the methods. Another problem is that all available RGB-D datasets contain trajectories with very complex camera motions. In this work we extensively evaluate ORBSLAM2 to better understand the state of the art. First, we conduct experiments on the popular publicly available datasets for RGB-D SLAM across the conventional metrics. We perform statistical analysis of the results and find correlations between the metrics and the attributes of the trajectories. Then, we introduce a new large and diverse HomeRobot dataset where we model the motions of a simple home robot. Our dataset is created using physically-based rendering with realistic lighting and contains scenes composed by human designers. It includes thousands of sequences, which is two orders of magnitude more than in previous works. We find that while in many cases the accuracy of SLAM is very good, the robustness is still an issue.
Tasks	Robot Navigation, Simultaneous Localization and Mapping
Published	2019-10-10
URL	https://arxiv.org/abs/1910.04755v1
PDF	https://arxiv.org/pdf/1910.04755v1.pdf
PWC	https://paperswithcode.com/paper/measuring-robustness-of-visual-slam
Repo
Framework

Active collaboration in relative observation for Multi-agent visual SLAM based on Deep Q Network


Title	Active collaboration in relative observation for Multi-agent visual SLAM based on Deep Q Network
Authors	Zhaoyi Pei, Piaosong Hao, Meixiang Quan, Muhammad Zuhair Qadir, Guo Li
Abstract	This paper proposes a unique active relative localization mechanism for multi-agent Simultaneous Localization and Mapping (SLAM), in which an agent to be observed is treated as a task that other agents perform by assisting that agent through relative observation. A task allocation algorithm based on deep reinforcement learning is proposed for this mechanism. Each agent can choose, on its own initiative, whether to localize other agents or to continue independent SLAM. In this way, each agent’s SLAM process is influenced by the collaboration. First, based on the characteristics of ORBSLAM, a unique observation function that models the whole multi-agent system (MAS) is obtained. Second, a novel type of Deep Q Network (DQN) called MAS-DQN is deployed to learn the correspondence between Q values and state-action pairs; abstract representations of the agents in the MAS are learned in the process of collaboration among agents. Finally, each agent must act with a certain degree of freedom according to MAS-DQN. The simulation results of comparative experiments show that this mechanism improves the efficiency of cooperation in multi-agent SLAM.
Tasks	Simultaneous Localization and Mapping
Published	2019-09-23
URL	https://arxiv.org/abs/1909.10157v1
PDF	https://arxiv.org/pdf/1909.10157v1.pdf
PWC	https://paperswithcode.com/paper/190910157
Repo
Framework

Blurring the Line Between Structure and Learning to Optimize and Adapt Receptive Fields


Title	Blurring the Line Between Structure and Learning to Optimize and Adapt Receptive Fields
Authors	Evan Shelhamer, Dequan Wang, Trevor Darrell
Abstract	The visual world is vast and varied, but its variations divide into structured and unstructured factors. We compose free-form filters and structured Gaussian filters, optimized end-to-end, to factorize deep representations and learn both local features and their degree of locality. Our semi-structured composition is strictly more expressive than free-form filtering, and changes in its structured parameters would require changes in free-form architecture. In effect this optimizes over receptive field size and shape, tuning locality to the data and task. Dynamic inference, in which the Gaussian structure varies with the input, adapts receptive field size to compensate for local scale variation. Optimizing receptive field size improves semantic segmentation accuracy on Cityscapes by 1-2 points for strong dilated and skip architectures and by up to 10 points for suboptimal designs. Adapting receptive fields by dynamic Gaussian structure further improves results, equaling the accuracy of free-form deformation while improving efficiency.
Tasks	Semantic Segmentation
Published	2019-04-25
URL	http://arxiv.org/abs/1904.11487v1
PDF	http://arxiv.org/pdf/1904.11487v1.pdf
PWC	https://paperswithcode.com/paper/190411487
Repo
Framework

High Accuracy Classification of White Blood Cells using TSLDA Classifier and Covariance Features


Title	High Accuracy Classification of White Blood Cells using TSLDA Classifier and Covariance Features
Authors	Hamed Talebi, Amin Ranjbar, Alireza Davoudi, Hamed Gholami, Mohammad Bagher Menhaj
Abstract	Creating automated processes in different areas of medical science with the application of engineering tools has been a rapidly growing field over recent decades. In this context, many researchers in medical image processing and analysis use artificial intelligence methods, which can reduce the human effort required while increasing the accuracy of results. Among various medical images, blood microscopic images play a vital role in heart failure diagnosis, e.g., blood cancers. The prominent component in blood cancer diagnosis is white blood cells (WBCs), whose general characteristics in microscopic images, such as non-uniform colors/illuminances and different shapes, sizes, and textures, sometimes make recognition and classification difficult. Moreover, overlapping WBCs in bone marrow images and WBCs neighboring red blood cells are identified as reasons for errors in the classification task. In this paper, we segment the various parts of medical images with a Naïve Bayes clustering method and, in the next stage, classify them with a TSLDA classifier supplied with features acquired from a covariance descriptor, which results in an accuracy of 98.02%. This result appears promising for WBC recognition.
Tasks
Published	2019-06-12
URL	https://arxiv.org/abs/1906.05131v2
PDF	https://arxiv.org/pdf/1906.05131v2.pdf
PWC	https://paperswithcode.com/paper/high-accuracy-classification-of-white-blood
Repo
Framework

Size Independent Neural Transfer for RDDL Planning


Title	Size Independent Neural Transfer for RDDL Planning
Authors	Sankalp Garg, Aniket Bajpai, Mausam
Abstract	Neural planners for RDDL MDPs produce deep reactive policies in an offline fashion. These scale well with large domains, but are sample inefficient and time-consuming to train from scratch for each new problem. To mitigate this, recent work has studied neural transfer learning, so that a generic planner trained on other problems of the same domain can rapidly transfer to a new problem. However, this approach only transfers across problems of the same size. We present the first method for neural transfer of RDDL MDPs that can transfer across problems of different sizes. Our architecture has two key innovations to achieve size independence: (1) a state encoder, which outputs a fixed length state embedding by max pooling over varying number of object embeddings, (2) a single parameter-tied action decoder that projects object embeddings into action probabilities for the final policy. On the two challenging RDDL domains of SysAdmin and Game Of Life, our approach powerfully transfers across problem sizes and has superior learning curves over training from scratch.
Tasks	Transfer Learning
Published	2019-02-08
URL	http://arxiv.org/abs/1902.03081v2
PDF	http://arxiv.org/pdf/1902.03081v2.pdf
PWC	https://paperswithcode.com/paper/size-independent-neural-transfer-for-rddl
Repo
Framework
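
The abstract attributes size independence partly to a state encoder that max-pools over a varying number of object embeddings. The sketch below illustrates only that pooling step with plain Python lists; the embedding values and the function name `max_pool_state` are hypothetical, and the real encoder and action decoder are learned networks not reproduced here.

```python
from typing import List


def max_pool_state(object_embeddings: List[List[float]]) -> List[float]:
    """Element-wise max over any number of equal-length object embeddings.

    The output length depends only on the embedding width, not on how
    many objects the problem instance contains.
    """
    if not object_embeddings:
        raise ValueError("need at least one object embedding")
    return [max(column) for column in zip(*object_embeddings)]


# Two problem instances of different sizes (2 objects vs. 4 objects).
small = max_pool_state([[0.1, 0.9], [0.4, 0.2]])
large = max_pool_state([[0.1, 0.9], [0.4, 0.2], [0.3, 0.5], [0.0, 1.0]])
print(small, large)  # both have length 2
```

Because the pooled vector has a fixed length, the same downstream layers can in principle be reused across problem sizes, which is the property the abstract highlights.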

Deep Gaussian networks for function approximation on data defined manifolds


Title	Deep Gaussian networks for function approximation on data defined manifolds
Authors	Hrushikesh Mhaskar
Abstract	In much of the literature on function approximation by deep networks, the function is assumed to be defined on some known domain, such as a cube or a sphere. In practice, the data might not be dense on these domains, and therefore, the approximation theory results are observed to be too conservative. In manifold learning, one assumes instead that the data is sampled from an unknown manifold; i.e., the manifold is defined by the data itself. Function approximation on this unknown manifold is then a two stage procedure: first, one approximates the Laplace-Beltrami operator (and its eigen-decomposition) on this manifold using a graph Laplacian, and next, approximates the target function using the eigen-functions. Alternatively, one estimates first some atlas on the manifold and then uses local approximation techniques based on the local coordinate charts. In this paper, we propose a more direct approach to function approximation on unknown, data defined manifolds without computing the eigen-decomposition of some operator or an atlas for the manifold, and estimate the degree of approximation. Our constructions are universal; i.e., do not require the knowledge of any prior on the target function other than continuity on the manifold. For smooth functions, the estimates do not suffer from the so-called saturation phenomenon. We demonstrate via a property called good propagation of errors how the results can be lifted for function approximation using deep networks where each channel evaluates a Gaussian network on a possibly unknown manifold.
Tasks
Published	2019-08-01
URL	https://arxiv.org/abs/1908.00156v2
PDF	https://arxiv.org/pdf/1908.00156v2.pdf
PWC	https://paperswithcode.com/paper/deep-gaussian-networks-for-function
Repo
Framework

Geo-Aware Networks for Fine-Grained Recognition


Title	Geo-Aware Networks for Fine-Grained Recognition
Authors	Grace Chu, Brian Potetz, Weijun Wang, Andrew Howard, Yang Song, Fernando Brucher, Thomas Leung, Hartwig Adam
Abstract	Fine-grained recognition distinguishes among categories with subtle visual differences. In order to differentiate between these challenging visual categories, it is helpful to leverage additional information. Geolocation is a rich source of additional information that can be used to improve fine-grained classification accuracy, but has been understudied. Our contributions to this field are twofold. First, to the best of our knowledge, this is the first paper which systematically examined various ways of incorporating geolocation information into fine-grained image classification through the use of geolocation priors, post-processing or feature modulation. Secondly, to overcome the situation where no fine-grained dataset has complete geolocation information, we release two fine-grained datasets with geolocation by providing complementary information to existing popular datasets - iNaturalist and YFCC100M. By leveraging geolocation information we improve top-1 accuracy in iNaturalist from 70.1% to 79.0% for a strong baseline image-only model. Comparing several models, we found that best performance was achieved by a post-processing model that consumed the output of the image-only baseline alongside geolocation. However, for a resource-constrained model (MobileNetV2), performance was better with a feature modulation model that trains jointly over pixels and geolocation: accuracy increased from 59.6% to 72.2%. Our work makes a strong case for incorporating geolocation information in fine-grained recognition models for both server and on-device.
Tasks	Fine-Grained Image Classification, Image Classification
Published	2019-06-04
URL	https://arxiv.org/abs/1906.01737v2
PDF	https://arxiv.org/pdf/1906.01737v2.pdf
PWC	https://paperswithcode.com/paper/geo-aware-networks-for-fine-grained
Repo
Framework

Joint segmentation and classification of retinal arteries/veins from fundus images


Title	Joint segmentation and classification of retinal arteries/veins from fundus images
Authors	Fantin Girard, Conrad Kavalec, Farida Cheriet
Abstract	Objective: Automatic artery/vein (A/V) segmentation from fundus images is required to track blood vessel changes occurring with many pathologies including retinopathy and cardiovascular pathologies. One of the clinical measures that quantifies vessel changes is the arterio-venous ratio (AVR), which represents the ratio between artery and vein diameters. This measure significantly depends on the accuracy of vessel segmentation and classification into arteries and veins. This paper proposes a fast, novel method for semantic A/V segmentation combining deep learning and graph propagation. Methods: A convolutional neural network (CNN) is proposed to jointly segment and classify vessels into arteries and veins. The initial CNN labeling is propagated through a graph representation of the retinal vasculature, whose nodes are defined as the vessel branches and whose edges are weighted by the cost of linking pairs of branches. To efficiently propagate the labels, the graph is simplified into its minimum spanning tree. Results: The method achieves an accuracy of 94.8% for vessel segmentation. The A/V classification achieves a specificity of 92.9% with a sensitivity of 93.7% on the CT-DRIVE database, compared to the state-of-the-art specificity and sensitivity, both of 91.7%. Conclusion: The results show that our method outperforms the leading previous works on a public dataset for A/V classification and is by far the fastest. Significance: The proposed global AVR calculated on the whole fundus image using our automatic A/V segmentation method can better track vessel changes associated with diabetic retinopathy than the standard local AVR calculated only around the optic disc.
Tasks
Published	2019-03-04
URL	http://arxiv.org/abs/1903.01330v1
PDF	http://arxiv.org/pdf/1903.01330v1.pdf
PWC	https://paperswithcode.com/paper/joint-segmentation-and-classification-of
Repo
Framework
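
The Methods part of the abstract simplifies the branch graph into its minimum spanning tree before propagating labels. As a minimal sketch of that reduction step, the code below applies Kruskal's algorithm with a union-find structure to a toy graph; the choice of Kruskal's algorithm, the edge costs, and the name `minimum_spanning_tree` are assumptions for illustration, since the abstract does not say how the tree is computed.

```python
from typing import List, Tuple

Edge = Tuple[float, int, int]  # (linking cost, branch u, branch v)


def minimum_spanning_tree(num_nodes: int, edges: List[Edge]) -> List[Edge]:
    """Kruskal's algorithm: add edges in order of cost, skipping any that form a cycle."""
    parent = list(range(num_nodes))

    def find(x: int) -> int:
        # Follow parent links to the component root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree: List[Edge] = []
    for cost, u, v in sorted(edges):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # the edge joins two separate components
            parent[root_u] = root_v
            tree.append((cost, u, v))
    return tree


# Toy graph: 4 vessel branches with made-up linking costs.
edges = [(1.0, 0, 1), (4.0, 0, 2), (2.0, 1, 2), (3.0, 2, 3), (5.0, 1, 3)]
tree = minimum_spanning_tree(4, edges)
print(tree)
```

In a tree, every pair of branches is connected by exactly one path, which is one reason label propagation over it can be cheaper than over the full graph.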

PSNet: Parametric Sigmoid Norm Based CNN for Face Recognition


Title	PSNet: Parametric Sigmoid Norm Based CNN for Face Recognition
Authors	Yash Srivastava, Vaishnav Murali, Shiv Ram Dubey
Abstract	Convolutional Neural Networks (CNNs) have recently become very popular due to their outstanding performance in various computer vision applications. They are also applied to the widely studied face recognition problem. However, the existing layers of CNNs are unable to cope with hard examples, which generally produce lower class scores. Thus, the existing methods become biased towards the easy examples. In this paper, we resolve this problem by incorporating a Parametric Sigmoid Norm (PSN) layer just before the final fully-connected layer. We propose a PSNet CNN model that uses the PSN layer. The PSN layer facilitates higher gradient flow for harder examples than for easy examples. Thus, it forces the network to learn the visual characteristics of hard examples. We conduct face recognition experiments to test the performance of the PSN layer. The suitability of the PSN layer with different loss functions is also examined. The widely used Labeled Faces in the Wild (LFW) and YouTube Faces (YTF) datasets are used in the experiments. The experimental results confirm the relevance of the proposed PSN layer.
Tasks	Face Recognition
Published	2019-12-05
URL	https://arxiv.org/abs/1912.10946v1
PDF	https://arxiv.org/pdf/1912.10946v1.pdf
PWC	https://paperswithcode.com/paper/psnet-parametric-sigmoid-norm-based-cnn-for
Repo
Framework