Paper Group ANR 703
VICSOM: VIsual Clues from SOcial Media for psychological assessment
Title | VICSOM: VIsual Clues from SOcial Media for psychological assessment |
Authors | Mohammad Mahdi Dehshibi, Gerard Pons, Bita Baiani, David Masip |
Abstract | Sharing multimodal information (typically images, videos or text) in Social Network Sites (SNS) occupies a relevant part of our time. The particular way users present themselves in SNS can provide useful information to infer human behaviors. This paper proposes to use multimodal data gathered from Instagram accounts to predict the perceived prototypical needs described in Glasser’s choice theory. The contribution is two-fold: (i) we provide a large multimodal database from Instagram public profiles (more than 30,000 images and text captions) annotated by expert psychologists for each perceived behavior according to Glasser’s theory, and (ii) we propose to automate the recognition of the needs (unconsciously) perceived by the users. In particular, we propose a baseline using three different feature sets: visual descriptors based on pixel images (SURF and Visual Bag of Words), a high-level descriptor based on automated scene description using Convolutional Neural Networks, and a text-based descriptor (Word2vec) obtained by processing the captions provided by the users. Finally, we propose a multimodal fusion of these descriptors, obtaining promising results on the multi-label classification problem. |
Tasks | Multi-Label Classification |
Published | 2019-05-15 |
URL | https://arxiv.org/abs/1905.06203v1 |
PDF | https://arxiv.org/pdf/1905.06203v1.pdf |
PWC | https://paperswithcode.com/paper/vicsom-visual-clues-from-social-media-for |
Repo | |
Framework | |
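The fusion baseline in the VICSOM abstract above (concatenating visual, scene, and caption descriptors, then scoring each need independently) can be sketched as follows. The dimensions, weights, and linear per-label scorers here are illustrative stand-ins, not the paper's trained models.

```python
def fuse(features):
    """Late fusion: concatenate modality descriptors
    (visual bag-of-words, CNN scene vector, Word2vec caption vector)."""
    fused = []
    for f in features:
        fused.extend(f)
    return fused

def multilabel_predict(x, weights, biases, threshold=0.0):
    """Binary relevance for multi-label classification:
    one independent linear scorer per label."""
    labels = []
    for w, b in zip(weights, biases):
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        labels.append(1 if score > threshold else 0)
    return labels

bow = [0.2, 0.0, 0.8]        # toy visual bag-of-words histogram
scene = [0.5, 0.1]           # toy CNN scene descriptor
w2v = [0.3, -0.2]            # toy averaged Word2vec caption embedding
x = fuse([bow, scene, w2v])  # 7-dim fused vector
```

In practice each per-label scorer would be learned (e.g. one-vs-rest logistic regression) on the fused vectors.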
A Context-and-Spatial Aware Network for Multi-Person Pose Estimation
Title | A Context-and-Spatial Aware Network for Multi-Person Pose Estimation |
Authors | Dongdong Yu, Kai Su, Xin Geng, Changhu Wang |
Abstract | Multi-person pose estimation is a fundamental yet challenging task in computer vision. Both rich context information and spatial information are required to precisely locate the keypoints for all persons in an image. In this paper, a novel Context-and-Spatial Aware Network (CSANet), which integrates both a Context Aware Path and Spatial Aware Path, is proposed to obtain effective features involving both context information and spatial information. Specifically, we design a Context Aware Path with structure supervision strategy and spatial pyramid pooling strategy to enhance the context information. Meanwhile, a Spatial Aware Path is proposed to preserve the spatial information, which also shortens the information propagation path from low-level features to high-level features. On top of these two paths, we employ a Heavy Head Path to further combine and enhance the features effectively. Experimentally, our proposed network outperforms state-of-the-art methods on the COCO keypoint benchmark, which verifies the effectiveness of our method and further corroborates the above proposition. |
Tasks | Multi-Person Pose Estimation, Pose Estimation |
Published | 2019-05-14 |
URL | https://arxiv.org/abs/1905.05355v1 |
PDF | https://arxiv.org/pdf/1905.05355v1.pdf |
PWC | https://paperswithcode.com/paper/a-context-and-spatial-aware-network-for-multi |
Repo | |
Framework | |
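The spatial pyramid pooling strategy mentioned in the CSANet abstract above can be illustrated in its common generic form: max-pooling a feature map at several grid resolutions and concatenating the results, which yields a fixed-length, multi-scale context summary. The exact pooling levels used in the paper may differ.

```python
def spatial_pyramid_pool(fmap, levels=(1, 2)):
    """Max-pool an H x W feature map (list of lists) into an L x L grid
    for each pyramid level L, and concatenate all pooled values."""
    H, W = len(fmap), len(fmap[0])
    out = []
    for L in levels:
        for i in range(L):
            for j in range(L):
                r0, r1 = i * H // L, (i + 1) * H // L
                c0, c1 = j * W // L, (j + 1) * W // L
                out.append(max(fmap[r][c]
                               for r in range(r0, r1)
                               for c in range(c0, c1)))
    return out
```

For levels (1, 2) the output has 1 + 4 = 5 values regardless of the input resolution.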
Are We Ready for Service Robots? The OpenLORIS-Scene Datasets for Lifelong SLAM
Title | Are We Ready for Service Robots? The OpenLORIS-Scene Datasets for Lifelong SLAM |
Authors | Xuesong Shi, Dongjiang Li, Pengpeng Zhao, Qinbin Tian, Yuxin Tian, Qiwei Long, Chunhao Zhu, Jingwei Song, Fei Qiao, Le Song, Yangquan Guo, Zhigang Wang, Yimin Zhang, Baoxing Qin, Wei Yang, Fangshi Wang, Rosa H. M. Chan, Qi She |
Abstract | Service robots should be able to operate autonomously in dynamic, daily-changing environments over an extended period of time. While Simultaneous Localization And Mapping (SLAM) is one of the most fundamental problems for robotic autonomy, most existing SLAM works are evaluated with data sequences recorded over a short period of time. In real-world deployment, there can be out-of-sight scene changes caused by both natural factors and human activities. For example, in home scenarios, most objects may be movable, replaceable or deformable, and the visual features of the same place may differ significantly on successive days. Such out-of-sight dynamics pose great challenges to the robustness of pose estimation, and hence to a robot’s long-term deployment and operation. To differentiate the aforementioned problem from conventional works, which are usually evaluated in a static setting in a single run, the term “lifelong SLAM” is used here for SLAM problems in an ever-changing environment over a long period of time. To accelerate lifelong SLAM research, we release the OpenLORIS-Scene datasets. The data are collected in real-world indoor scenes, multiple times in each place, to include scene changes occurring in real life. We also design benchmarking metrics for lifelong SLAM, with which the robustness and accuracy of pose estimation are evaluated separately. The datasets and benchmark are available online at https://lifelong-robotic-vision.github.io/dataset/scene. |
Tasks | Pose Estimation, Simultaneous Localization and Mapping |
Published | 2019-11-13 |
URL | https://arxiv.org/abs/1911.05603v2 |
PDF | https://arxiv.org/pdf/1911.05603v2.pdf |
PWC | https://paperswithcode.com/paper/are-we-ready-for-service-robots-the-openloris |
Repo | |
Framework | |
Exactly Sparse Gaussian Variational Inference with Application to Derivative-Free Batch Nonlinear State Estimation
Title | Exactly Sparse Gaussian Variational Inference with Application to Derivative-Free Batch Nonlinear State Estimation |
Authors | Timothy D. Barfoot, James R. Forbes, David Yoon |
Abstract | We present a Gaussian Variational Inference (GVI) technique that can be applied to large-scale nonlinear batch state estimation problems. The main contribution is to show how to fit the best Gaussian to the posterior efficiently by exploiting factorization of the joint likelihood of the state and data, as is common in practical problems. The proposed Exactly Sparse Gaussian Variational Inference (ESGVI) technique stores the inverse covariance matrix, which is typically very sparse (e.g., block-tridiagonal for classic state estimation). We show that the only blocks of the (dense) covariance matrix that are required during the calculations correspond to the non-zero blocks of the inverse covariance matrix, and further show how to calculate these blocks efficiently in the general GVI problem. ESGVI operates iteratively, and while we can use analytical derivatives at each iteration, Gaussian cubature can be substituted, thereby producing an efficient derivative-free batch formulation. ESGVI simplifies to precisely the Rauch-Tung-Striebel (RTS) smoother in the batch linear estimation case, but goes beyond the ‘extended’ RTS smoother in the nonlinear case since it finds the best-fit Gaussian, not the Maximum A Posteriori (MAP) point solution. We demonstrate the technique on controlled simulation problems and a batch nonlinear Simultaneous Localization and Mapping (SLAM) problem with an experimental dataset. |
Tasks | Simultaneous Localization and Mapping |
Published | 2019-11-09 |
URL | https://arxiv.org/abs/1911.08333v1 |
PDF | https://arxiv.org/pdf/1911.08333v1.pdf |
PWC | https://paperswithcode.com/paper/exactly-sparse-gaussian-variational-inference |
Repo | |
Framework | |
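The derivative-free ingredient in the ESGVI abstract above is Gaussian cubature: expectations under a Gaussian are approximated from function evaluations at deterministic sigma points, with no analytical derivatives. A minimal sketch of the standard third-degree spherical-radial cubature rule (diagonal covariance for simplicity; the paper works with full block-sparse covariances):

```python
import math

def cubature_expectation(f, mu, var_diag):
    """Approximate E[f(x)] for x ~ N(mu, diag(var_diag)) with the
    third-degree spherical-radial cubature rule: 2n equally weighted
    sigma points at mu +/- sqrt(n * var_i) along each axis."""
    n = len(mu)
    pts = []
    for i in range(n):
        step = math.sqrt(n * var_diag[i])
        for s in (+1.0, -1.0):
            p = list(mu)
            p[i] += s * step
            pts.append(p)
    return sum(f(p) for p in pts) / (2 * n)
```

The rule is exact for polynomials up to third degree, which is why it can substitute for analytical Jacobians inside an iterative Gaussian fit.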
KarNet: An Efficient Boolean Function Simplifier
Title | KarNet: An Efficient Boolean Function Simplifier |
Authors | Shanka Subhra Mondal, Abhilash Nandy, Ritesh Agrawal, Debashis Sen |
Abstract | Many approaches such as Quine-McCluskey algorithm, Karnaugh map solving, Petrick’s method and McBoole’s method have been devised to simplify Boolean expressions in order to optimize hardware implementation of digital circuits. However, the algorithmic implementations of these methods are hard-coded and also their computation time is proportional to the number of minterms involved in the expression. In this paper, we propose KarNet, where the ability of Convolutional Neural Networks to model relationships between various cell locations and values by capturing spatial dependencies is exploited to solve Karnaugh maps. In order to do so, a Karnaugh map is represented as an image signal, where each cell is considered as a pixel. Experimental results show that the computation time of KarNet is independent of the number of minterms and is of the order of one-hundredth to one-tenth that of the rule-based methods. KarNet being a learned system is found to achieve nearly a hundred percent accuracy, precision, and recall. We train KarNet to solve four variable Karnaugh maps and also show that a similar method can be applied on Karnaugh maps with more variables. Finally, we show a way to build a fully accurate and computationally fast system using KarNet. |
Tasks | |
Published | 2019-06-04 |
URL | https://arxiv.org/abs/1906.01363v1 |
PDF | https://arxiv.org/pdf/1906.01363v1.pdf |
PWC | https://paperswithcode.com/paper/karnet-an-efficient-boolean-function |
Repo | |
Framework | |
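The input representation described in the KarNet abstract above, a Karnaugh map treated as an image with one cell per pixel, can be sketched as follows. The Gray-code cell ordering is the standard K-map convention; the specific variable-to-axis assignment here is illustrative.

```python
GRAY = [0, 1, 3, 2]  # Gray-code order of the 2-bit row/column indices

def kmap_image(truth_table):
    """Encode a 4-variable Boolean function (dict minterm -> 0/1) as the
    4x4 grid a CNN like KarNet would consume. Rows index variables AB,
    columns index CD, both in Gray-code order so adjacent cells differ
    in exactly one variable."""
    grid = [[0] * 4 for _ in range(4)]
    for r in range(4):
        for c in range(4):
            minterm = GRAY[r] * 4 + GRAY[c]
            grid[r][c] = truth_table.get(minterm, 0)
    return grid
```

For example, f = A·B·C·D (only minterm 15 true) lights a single cell of the grid.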
Community-based 3-SAT Formulas with a Predefined Solution
Title | Community-based 3-SAT Formulas with a Predefined Solution |
Authors | Yamin Hu, Wenjian Luo, Junteng Wang |
Abstract | It is crucial to generate crafted SAT formulas with predefined solutions for the testing and development of SAT solvers since many SAT formulas from real-world applications have solutions. Although some generating algorithms have been proposed to generate SAT formulas with predefined solutions, community structures of SAT formulas are not considered. We propose a 3-SAT formula generating algorithm that not only guarantees the existence of a predefined solution, but also simultaneously considers community structures and clause distributions. The proposed 3-SAT formula generating algorithm controls the quality of community structures through controlling (1) the number of clauses whose variables have a common community, which we call intra-community clauses, and (2) the number of variables that only belong to one community, which we call intra-community variables. To study the combined effect of community structures and clause distributions on the hardness of SAT formulas, we measure solving runtimes of two solvers, gluHack (a leading CDCL solver) and CPSparrow (a leading SLS solver), on the generated SAT formulas under different groups of parameter settings. Through extensive experiments, we obtain some noteworthy observations on the SAT formulas generated by the proposed algorithm: (1) The community structure has little or no effects on the hardness of SAT formulas with regard to CPSparrow but a strong effect with regard to gluHack. (2) Only when the proportion of true literals in a SAT formula in terms of the predefined solution is 0.5, SAT formulas are hard-to-solve with regard to gluHack; when this proportion is below 0.5, SAT formulas are hard-to-solve with regard to CPSparrow. (3) When the ratio of the number of clauses to that of variables is around 4.25, the SAT formulas are hard-to-solve with regard to both gluHack and CPSparrow. |
Tasks | |
Published | 2019-02-26 |
URL | http://arxiv.org/abs/1902.09706v1 |
PDF | http://arxiv.org/pdf/1902.09706v1.pdf |
PWC | https://paperswithcode.com/paper/community-based-3-sat-formulas-with-a |
Repo | |
Framework | |
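The core invariant behind generating SAT formulas with a predefined solution, as in the abstract above, is that every clause must contain at least one literal made true by that solution. A minimal sketch (the paper's additional controls over community structure and clause distribution are omitted); the `true_lits` parameter corresponds to the proportion of true literals the paper varies:

```python
import random

def gen_clause(assignment, rng, true_lits=1):
    """Sample a 3-clause satisfied by `assignment` (list of 0/1 per variable):
    pick 3 distinct variables, make `true_lits` of their literals agree with
    the assignment and negate the rest."""
    n = len(assignment)
    vars_ = rng.sample(range(n), 3)
    agree = set(rng.sample(vars_, true_lits))
    clause = []
    for v in vars_:
        lit_true = v in agree
        # the literal is positive iff its sign matches the assigned value
        positive = (assignment[v] == 1) == lit_true
        clause.append(v + 1 if positive else -(v + 1))
    return clause

def satisfied(clause, assignment):
    """A clause holds if any literal evaluates true under the assignment."""
    return any((assignment[abs(l) - 1] == 1) == (l > 0) for l in clause)
```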
Measuring robustness of Visual SLAM
Title | Measuring robustness of Visual SLAM |
Authors | David Prokhorov, Dmitry Zhukov, Olga Barinova, Anna Vorontsova, Anton Konushin |
Abstract | Simultaneous localization and mapping (SLAM) is an essential component of robotic systems. In this work we perform a feasibility study of RGB-D SLAM for the task of indoor robot navigation. Recent visual SLAM methods, e.g. ORBSLAM2 \cite{mur2017orb}, demonstrate impressive accuracy, but the experiments in the papers are usually conducted on just a few sequences, which makes it difficult to reason about the robustness of the methods. Another problem is that all available RGB-D datasets contain trajectories with very complex camera motions. In this work we extensively evaluate ORBSLAM2 to better understand the state of the art. First, we conduct experiments on popular publicly available datasets for RGB-D SLAM using the conventional metrics. We perform statistical analysis of the results and find correlations between the metrics and the attributes of the trajectories. Then, we introduce a new large and diverse HomeRobot dataset where we model the motions of a simple home robot. Our dataset is created using physically-based rendering with realistic lighting and contains scenes composed by human designers. It includes thousands of sequences, which is two orders of magnitude more than in previous works. We find that while in many cases the accuracy of SLAM is very good, robustness is still an issue. |
Tasks | Robot Navigation, Simultaneous Localization and Mapping |
Published | 2019-10-10 |
URL | https://arxiv.org/abs/1910.04755v1 |
PDF | https://arxiv.org/pdf/1910.04755v1.pdf |
PWC | https://paperswithcode.com/paper/measuring-robustness-of-visual-slam |
Repo | |
Framework | |
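Among the "conventional metrics" the abstract above refers to, the most common for trajectory accuracy is the absolute trajectory error (ATE). A minimal 2-D sketch, assuming the trajectories have already been aligned (e.g. by a Umeyama similarity transform) and time-associated:

```python
import math

def ate_rmse(gt, est):
    """Root-mean-square absolute trajectory error between aligned
    ground-truth and estimated positions, given as (x, y) pairs."""
    assert len(gt) == len(est) and gt, "trajectories must match in length"
    se = sum((gx - ex) ** 2 + (gy - ey) ** 2
             for (gx, gy), (ex, ey) in zip(gt, est))
    return math.sqrt(se / len(gt))
```

Robustness, as the abstract notes, needs separate treatment (e.g. the fraction of sequences where tracking does not fail), since ATE is only defined on successfully tracked runs.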
Active collaboration in relative observation for Multi-agent visual SLAM based on Deep Q Network
Title | Active collaboration in relative observation for Multi-agent visual SLAM based on Deep Q Network |
Authors | Zhaoyi Pei, Piaosong Hao, Meixiang Quan, Muhammad Zuhair Qadir, Guo Li |
Abstract | This paper proposes a unique active relative localization mechanism for multi-agent Simultaneous Localization and Mapping (SLAM), in which an agent to be observed is considered a task, performed by other agents that assist it through relative observation. A task allocation algorithm based on deep reinforcement learning is proposed for this mechanism. Each agent can choose, on its own initiative, whether to localize other agents or to continue its independent SLAM. In this way, each agent’s SLAM process is shaped by the collaboration. Firstly, based on the characteristics of ORBSLAM, a unique observation function that models the whole multi-agent system (MAS) is obtained. Secondly, a novel type of Deep Q Network (DQN), called MAS-DQN, is deployed to learn the correspondence between Q values and state-action pairs, and abstract representations of the agents in the MAS are learned in the process of collaboration among agents. Finally, each agent acts with a certain degree of freedom according to MAS-DQN. The simulation results of comparative experiments show that this mechanism improves the efficiency of cooperation in multi-agent SLAM. |
Tasks | Simultaneous Localization and Mapping |
Published | 2019-09-23 |
URL | https://arxiv.org/abs/1909.10157v1 |
PDF | https://arxiv.org/pdf/1909.10157v1.pdf |
PWC | https://paperswithcode.com/paper/190910157 |
Repo | |
Framework | |
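The per-agent decision in the abstract above, continue independent SLAM or observe a teammate, is the action set of a DQN policy. A minimal epsilon-greedy selection sketch over that action set; the Q-values would come from the trained MAS-DQN, and this action encoding is an assumption for illustration:

```python
import random

def choose_action(q_values, epsilon, rng):
    """Epsilon-greedy selection over a MAS-DQN-style action set:
    index 0 = continue independent SLAM, index j > 0 = observe agent j.
    With probability epsilon explore uniformly, else pick the argmax Q."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)
```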
Blurring the Line Between Structure and Learning to Optimize and Adapt Receptive Fields
Title | Blurring the Line Between Structure and Learning to Optimize and Adapt Receptive Fields |
Authors | Evan Shelhamer, Dequan Wang, Trevor Darrell |
Abstract | The visual world is vast and varied, but its variations divide into structured and unstructured factors. We compose free-form filters and structured Gaussian filters, optimized end-to-end, to factorize deep representations and learn both local features and their degree of locality. Our semi-structured composition is strictly more expressive than free-form filtering, and changes in its structured parameters would require changes in free-form architecture. In effect this optimizes over receptive field size and shape, tuning locality to the data and task. Dynamic inference, in which the Gaussian structure varies with the input, adapts receptive field size to compensate for local scale variation. Optimizing receptive field size improves semantic segmentation accuracy on Cityscapes by 1-2 points for strong dilated and skip architectures and by up to 10 points for suboptimal designs. Adapting receptive fields by dynamic Gaussian structure further improves results, equaling the accuracy of free-form deformation while improving efficiency. |
Tasks | Semantic Segmentation |
Published | 2019-04-25 |
URL | http://arxiv.org/abs/1904.11487v1 |
PDF | http://arxiv.org/pdf/1904.11487v1.pdf |
PWC | https://paperswithcode.com/paper/190411487 |
Repo | |
Framework | |
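The structured half of the semi-structured composition in the abstract above is a Gaussian filter whose size/shape parameters are optimized. A 1-D sketch of such a filter parameterized by a single sigma (the paper learns full 2-D covariance structure end-to-end; this scalar version only illustrates how one parameter controls the receptive field):

```python
import math

def gaussian_kernel(sigma, radius=2):
    """Normalized 1-D Gaussian filter of half-width `radius`, parameterized
    by a learnable sigma; larger sigma spreads weight outward, effectively
    enlarging the receptive field of any free-form filter composed with it."""
    vals = [math.exp(-(i * i) / (2.0 * sigma * sigma))
            for i in range(-radius, radius + 1)]
    s = sum(vals)
    return [v / s for v in vals]
```

Because the kernel is differentiable in sigma, receptive field size becomes just another parameter for gradient descent.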
High Accuracy Classification of White Blood Cells using TSLDA Classifier and Covariance Features
Title | High Accuracy Classification of White Blood Cells using TSLDA Classifier and Covariance Features |
Authors | Hamed Talebi, Amin Ranjbar, Alireza Davoudi, Hamed Gholami, Mohammad Bagher Menhaj |
Abstract | Creating automated processes in different areas of medical science with the application of engineering tools has been a rapidly growing field over recent decades. In this context, many researchers in medical image processing and analysis use worthwhile methods from artificial intelligence, which can reduce the necessary human effort while increasing the accuracy of results. Among various medical images, blood microscopic images play a vital role in diagnosing blood diseases, e.g., blood cancers. The prominent component in blood cancer diagnosis is white blood cells (WBCs), whose general characteristics in microscopic images sometimes cause difficulties in recognition and classification tasks, such as non-uniform colors/illumination and varying shapes, sizes, and textures. Moreover, WBCs that overlap in bone marrow images or neighbor red blood cells are identified as sources of error in the classification task. In this paper, we segment the various parts of medical images via a Naïve Bayes clustering method and, in the next stage, classify them with a TSLDA classifier supplied with features acquired from covariance descriptors, resulting in an accuracy of 98.02%. This result appears promising for WBC recognition. |
Tasks | |
Published | 2019-06-12 |
URL | https://arxiv.org/abs/1906.05131v2 |
PDF | https://arxiv.org/pdf/1906.05131v2.pdf |
PWC | https://paperswithcode.com/paper/high-accuracy-classification-of-white-blood |
Repo | |
Framework | |
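The covariance features mentioned in the abstract above follow the standard region covariance descriptor: a region is summarized by the d x d covariance of its per-pixel feature vectors. A minimal sketch, with the choice of per-pixel features (intensity, gradients, etc.) left to the application:

```python
def covariance_descriptor(features):
    """Region covariance descriptor: the sample covariance (d x d) of the
    d-dimensional per-pixel feature vectors in `features`, one per pixel."""
    n = len(features)
    d = len(features[0])
    mean = [sum(f[k] for f in features) / n for k in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for f in features:
        for i in range(d):
            for j in range(d):
                cov[i][j] += (f[i] - mean[i]) * (f[j] - mean[j])
    return [[cov[i][j] / (n - 1) for j in range(d)] for i in range(d)]
```

Covariance matrices live on a Riemannian manifold, which is why a tangent-space classifier such as TSLDA is a natural downstream choice.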
Size Independent Neural Transfer for RDDL Planning
Title | Size Independent Neural Transfer for RDDL Planning |
Authors | Sankalp Garg, Aniket Bajpai, Mausam |
Abstract | Neural planners for RDDL MDPs produce deep reactive policies in an offline fashion. These scale well with large domains, but are sample-inefficient and time-consuming to train from scratch for each new problem. To mitigate this, recent work has studied neural transfer learning, so that a generic planner trained on other problems of the same domain can rapidly transfer to a new problem. However, this approach only transfers across problems of the same size. We present the first method for neural transfer of RDDL MDPs that can transfer across problems of different sizes. Our architecture has two key innovations to achieve size independence: (1) a state encoder, which outputs a fixed-length state embedding by max pooling over a varying number of object embeddings, and (2) a single parameter-tied action decoder that projects object embeddings into action probabilities for the final policy. On the two challenging RDDL domains of SysAdmin and Game Of Life, our approach powerfully transfers across problem sizes and has superior learning curves over training from scratch. |
Tasks | Transfer Learning |
Published | 2019-02-08 |
URL | http://arxiv.org/abs/1902.03081v2 |
PDF | http://arxiv.org/pdf/1902.03081v2.pdf |
PWC | https://paperswithcode.com/paper/size-independent-neural-transfer-for-rddl |
Repo | |
Framework | |
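The size-independence trick in innovation (1) of the abstract above is element-wise max pooling over a variable number of object embeddings, which yields the same fixed output dimension for any problem size. A minimal sketch with toy embeddings:

```python
def encode_state(object_embeddings):
    """Fixed-length state embedding: element-wise max over however many
    object embeddings the problem instance happens to have."""
    dim = len(object_embeddings[0])
    return [max(e[k] for e in object_embeddings) for k in range(dim)]
```

Because the output dimension depends only on the embedding width, a policy head trained on small instances accepts states from larger ones unchanged.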
Deep Gaussian networks for function approximation on data defined manifolds
Title | Deep Gaussian networks for function approximation on data defined manifolds |
Authors | Hrushikesh Mhaskar |
Abstract | In much of the literature on function approximation by deep networks, the function is assumed to be defined on some known domain, such as a cube or a sphere. In practice, the data might not be dense on these domains, and therefore, the approximation theory results are observed to be too conservative. In manifold learning, one assumes instead that the data is sampled from an unknown manifold; i.e., the manifold is defined by the data itself. Function approximation on this unknown manifold is then a two stage procedure: first, one approximates the Laplace-Beltrami operator (and its eigen-decomposition) on this manifold using a graph Laplacian, and next, approximates the target function using the eigen-functions. Alternatively, one estimates first some atlas on the manifold and then uses local approximation techniques based on the local coordinate charts. In this paper, we propose a more direct approach to function approximation on unknown, data defined manifolds without computing the eigen-decomposition of some operator or an atlas for the manifold, and estimate the degree of approximation. Our constructions are universal; i.e., do not require the knowledge of any prior on the target function other than continuity on the manifold. For smooth functions, the estimates do not suffer from the so-called saturation phenomenon. We demonstrate via a property called good propagation of errors how the results can be lifted for function approximation using deep networks where each channel evaluates a Gaussian network on a possibly unknown manifold. |
Tasks | |
Published | 2019-08-01 |
URL | https://arxiv.org/abs/1908.00156v2 |
PDF | https://arxiv.org/pdf/1908.00156v2.pdf |
PWC | https://paperswithcode.com/paper/deep-gaussian-networks-for-function |
Repo | |
Framework | |
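The Gaussian network each channel evaluates, per the abstract above, is in its basic form a weighted sum of Gaussians centered at (data) points. A toy evaluation sketch, with the width parameter and centers purely illustrative:

```python
import math

def gaussian_network(x, centers, coeffs, width=1.0):
    """Evaluate a Gaussian network at point x: a linear combination of
    Gaussian bumps exp(-width * ||x - c||^2) placed at the given centers."""
    return sum(a * math.exp(-width * sum((xi - ci) ** 2
                                         for xi, ci in zip(x, c)))
               for a, c in zip(coeffs, centers))
```

The paper's contribution concerns how to choose centers and coefficients directly from manifold-sampled data, without an eigen-decomposition or atlas; this sketch only shows the function class.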
Geo-Aware Networks for Fine-Grained Recognition
Title | Geo-Aware Networks for Fine-Grained Recognition |
Authors | Grace Chu, Brian Potetz, Weijun Wang, Andrew Howard, Yang Song, Fernando Brucher, Thomas Leung, Hartwig Adam |
Abstract | Fine-grained recognition distinguishes among categories with subtle visual differences. In order to differentiate between these challenging visual categories, it is helpful to leverage additional information. Geolocation is a rich source of additional information that can improve fine-grained classification accuracy, but it has been understudied. Our contributions to this field are twofold. First, to the best of our knowledge, this is the first paper to systematically examine various ways of incorporating geolocation information into fine-grained image classification, through geolocation priors, post-processing, or feature modulation. Second, since no existing fine-grained dataset has complete geolocation information, we release two fine-grained datasets with geolocation by providing complementary information for existing popular datasets - iNaturalist and YFCC100M. By leveraging geolocation information we improve top-1 accuracy on iNaturalist from 70.1% to 79.0% for a strong baseline image-only model. Comparing several models, we found that the best performance was achieved by a post-processing model that consumed the output of the image-only baseline alongside geolocation. However, for a resource-constrained model (MobileNetV2), performance was better with a feature modulation model that trains jointly over pixels and geolocation: accuracy increased from 59.6% to 72.2%. Our work makes a strong case for incorporating geolocation information in fine-grained recognition models for both server-side and on-device deployment. |
Tasks | Fine-Grained Image Classification, Image Classification |
Published | 2019-06-04 |
URL | https://arxiv.org/abs/1906.01737v2 |
PDF | https://arxiv.org/pdf/1906.01737v2.pdf |
PWC | https://paperswithcode.com/paper/geo-aware-networks-for-fine-grained |
Repo | |
Framework | |
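One way to realize the feature modulation option named in the abstract above is FiLM-style conditioning: the geolocation embedding produces a per-channel scale and shift applied to the image features. This specific parameterization is an assumption for illustration, the paper may modulate differently, and the weight matrices below are placeholders:

```python
def modulate(features, geo_embedding, W_gamma, W_beta):
    """FiLM-style feature modulation sketch: a geolocation embedding is
    linearly mapped to per-channel scale (gamma) and shift (beta), then
    applied to the image feature vector."""
    def matvec(W, v):
        return [sum(w * x for w, x in zip(row, v)) for row in W]
    gamma = matvec(W_gamma, geo_embedding)
    beta = matvec(W_beta, geo_embedding)
    return [g * f + b for f, g, b in zip(features, gamma, beta)]
```

Training the modulation weights jointly with the image backbone is what distinguishes this option from the post-processing model in the abstract.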
Joint segmentation and classification of retinal arteries/veins from fundus images
Title | Joint segmentation and classification of retinal arteries/veins from fundus images |
Authors | Fantin Girard, Conrad Kavalec, Farida Cheriet |
Abstract | Objective: Automatic artery/vein (A/V) segmentation from fundus images is required to track blood vessel changes occurring with many pathologies, including retinopathy and cardiovascular pathologies. One of the clinical measures that quantifies vessel changes is the arterio-venous ratio (AVR), which represents the ratio between artery and vein diameters. This measure significantly depends on the accuracy of vessel segmentation and classification into arteries and veins. This paper proposes a fast, novel method for semantic A/V segmentation combining deep learning and graph propagation. Methods: A convolutional neural network (CNN) is proposed to jointly segment and classify vessels into arteries and veins. The initial CNN labeling is propagated through a graph representation of the retinal vasculature, whose nodes are defined as the vessel branches and whose edges are weighted by the cost of linking pairs of branches. To efficiently propagate the labels, the graph is simplified into its minimum spanning tree. Results: The method achieves an accuracy of 94.8% for vessel segmentation. The A/V classification achieves a specificity of 92.9% with a sensitivity of 93.7% on the CT-DRIVE database, compared to the state-of-the-art specificity and sensitivity, both of 91.7%. Conclusion: The results show that our method outperforms the leading previous works on a public dataset for A/V classification and is by far the fastest. Significance: The proposed global AVR, calculated on the whole fundus image using our automatic A/V segmentation method, can better track vessel changes associated with diabetic retinopathy than the standard local AVR calculated only around the optic disc. |
Tasks | |
Published | 2019-03-04 |
URL | http://arxiv.org/abs/1903.01330v1 |
PDF | http://arxiv.org/pdf/1903.01330v1.pdf |
PWC | https://paperswithcode.com/paper/joint-segmentation-and-classification-of |
Repo | |
Framework | |
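The graph-propagation step described in the abstract above, spreading initial CNN labels along the minimum spanning tree of the branch graph, can be sketched as follows. The tie-breaking (cheapest edges propagate first) and the A/V label encoding are illustrative choices, not necessarily the paper's.

```python
def propagate_labels(mst_edges, labels):
    """Propagate initial artery/vein labels along minimum-spanning-tree
    edges: unlabeled branches inherit the label of a labeled tree neighbor,
    with cheaper edges taking precedence. `mst_edges` holds (cost, u, v)
    tuples; `labels[i]` is 'A', 'V', or None."""
    labels = list(labels)
    changed = True
    while changed:
        changed = False
        for _, u, v in sorted(mst_edges):
            for a, b in ((u, v), (v, u)):
                if labels[a] is None and labels[b] is not None:
                    labels[a] = labels[b]
                    changed = True
    return labels
```

Restricting propagation to the MST keeps each branch connected to the rest by exactly one cheapest path, so labels cannot conflict along cycles.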
PSNet: Parametric Sigmoid Norm Based CNN for Face Recognition
Title | PSNet: Parametric Sigmoid Norm Based CNN for Face Recognition |
Authors | Yash Srivastava, Vaishnav Murali, Shiv Ram Dubey |
Abstract | Convolutional Neural Networks (CNNs) have become very popular recently due to their outstanding performance in various computer vision applications, including the widely studied face recognition problem. However, the existing layers of CNNs are unable to cope with hard examples, which generally produce lower class scores; thus, existing methods become biased towards the easy examples. In this paper, we resolve this problem by incorporating a Parametric Sigmoid Norm (PSN) layer just before the final fully-connected layer, yielding the proposed PSNet CNN model. The PSN layer facilitates higher gradient flow for hard examples than for easy examples, forcing the network to learn the visual characteristics of hard examples. We conduct face recognition experiments to test the performance of the PSN layer, and also examine its suitability with different loss functions. The widely used Labeled Faces in the Wild (LFW) and YouTube Faces (YTF) datasets are used in the experiments. The experimental results confirm the relevance of the proposed PSN layer. |
Tasks | Face Recognition |
Published | 2019-12-05 |
URL | https://arxiv.org/abs/1912.10946v1 |
PDF | https://arxiv.org/pdf/1912.10946v1.pdf |
PWC | https://paperswithcode.com/paper/psnet-parametric-sigmoid-norm-based-cnn-for |
Repo | |
Framework | |
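The abstract above does not spell out the PSN formula, but one plausible reading of a "parametric sigmoid norm" is a sigmoid with learnable slope and shift applied element-wise before the final fully-connected layer. This parameterization is a hypothetical illustration, the paper's exact form may differ; it shows how a saturating squash keeps gradients alive for low-scoring (hard) examples while flattening already-confident easy ones.

```python
import math

def psn(x, alpha=1.0, beta=0.0):
    """Hypothetical parametric sigmoid norm: sigmoid(alpha * v + beta)
    applied element-wise, with alpha (slope) and beta (shift) treated
    as learnable parameters."""
    return [1.0 / (1.0 + math.exp(-(alpha * v + beta))) for v in x]
```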