Paper Group ANR 792
CreditPrint: Credit Investigation via Geographic Footprints by Deep Learning
Title | CreditPrint: Credit Investigation via Geographic Footprints by Deep Learning |
Authors | Xiao Han, Ruiqing Ding, Leye Wang, Hailiang Huang |
Abstract | Credit investigation is critical for financial services. However, traditional methods are often restricted, as the employed data hardly provide sufficient, timely and reliable information. With the prevalence of smart mobile devices, people’s geographic footprints can be automatically and constantly collected nowadays, which provides an unprecedented opportunity for credit investigations. Inspired by the observation that locations are somehow related to people’s credit level, this research aims to enhance credit investigation with users’ geographic footprints. To this end, a two-stage credit investigation framework is designed, namely CreditPrint. In the first stage, CreditPrint explores regions’ credit characteristics and learns a credit-aware embedding for each region by considering both each region’s individual characteristics and cross-region relationships with graph convolutional networks. In the second stage, a hierarchical attention-based credit assessment network is proposed to aggregate the credit indications from a user’s multiple trajectories covering diverse regions. The results on real-life user mobility datasets show that CreditPrint can increase the credit investigation accuracy by up to 10% compared to baseline methods. |
Tasks | |
Published | 2019-10-19 |
URL | https://arxiv.org/abs/1910.08734v1 |
https://arxiv.org/pdf/1910.08734v1.pdf | |
PWC | https://paperswithcode.com/paper/creditprint-credit-investigation-via |
Repo | |
Framework | |
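A rough illustration of the second-stage idea described in the abstract above: hierarchical attention over the regions within a trajectory, then over a user's trajectories. This is a minimal PyTorch sketch, not the authors' code; all dimensions, module names and the two-class output are illustrative assumptions.

```python
import torch
import torch.nn as nn

class AttentionPool(nn.Module):
    """Weighted average of a sequence, with learned attention scores."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, x):                        # x: (batch, seq, dim)
        w = torch.softmax(self.score(x), dim=1)  # (batch, seq, 1)
        return (w * x).sum(dim=1)                # (batch, dim)

class HierarchicalCreditNet(nn.Module):
    def __init__(self, region_dim=64, n_classes=2):
        super().__init__()
        self.region_pool = AttentionPool(region_dim)  # regions -> trajectory
        self.traj_pool = AttentionPool(region_dim)    # trajectories -> user
        self.classifier = nn.Linear(region_dim, n_classes)

    def forward(self, regions):                  # (users, trajs, regions, dim)
        u, t, r, d = regions.shape
        traj = self.region_pool(regions.view(u * t, r, d)).view(u, t, d)
        user = self.traj_pool(traj)
        return self.classifier(user)

# Example: 8 users, 5 trajectories each, 20 credit-aware region embeddings per trajectory.
logits = HierarchicalCreditNet()(torch.randn(8, 5, 20, 64))
```

In the paper, the region embeddings fed into this stage would come from the first-stage graph convolutional network; here they are random tensors.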
Towards Learning to Detect and Predict Contact Events on Vision-based Tactile Sensors
Title | Towards Learning to Detect and Predict Contact Events on Vision-based Tactile Sensors |
Authors | Yazhan Zhang, Weihao Yuan, Zicheng Kan, Michael Yu Wang |
Abstract | In essence, successful grasping boils down to correct responses to multiple contact events between fingertips and objects. In most scenarios, tactile sensing is adequate to distinguish contact events. Due to the high dimensionality of tactile information, classifying spatiotemporal tactile signals using conventional model-based methods is difficult. In this work, we propose to predict and classify tactile signals using deep learning methods, seeking to enhance the adaptability of the robotic grasp system to external event changes that may lead to grasping failure. We develop a deep learning framework and collect 6650 tactile image sequences with a vision-based tactile sensor, and the neural network is integrated into a contact-event-based robotic grasping system. In grasping experiments, we achieved a 52% increase in object lifting success rate with contact detection, and significantly higher robustness under unexpected loads with slip prediction compared with open-loop grasps, demonstrating that integrating the proposed framework into a robotic grasping system substantially improves the picking success rate and the capability to withstand external disturbances. |
Tasks | Robotic Grasping |
Published | 2019-10-09 |
URL | https://arxiv.org/abs/1910.03973v1 |
https://arxiv.org/pdf/1910.03973v1.pdf | |
PWC | https://paperswithcode.com/paper/towards-learning-to-detect-and-predict |
Repo | |
Framework | |
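The abstract describes classifying spatiotemporal tactile image sequences with a deep network. A minimal sketch of one plausible architecture (per-frame CNN features fed to an LSTM) follows; it is an assumption-laden illustration, not the paper's network, and the event count, layer sizes and input resolution are invented for the example.

```python
import torch
import torch.nn as nn

class ContactEventNet(nn.Module):
    def __init__(self, n_events=4):
        super().__init__()
        # Per-frame encoder: tactile image -> 32-d feature vector.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        # Temporal model over the frame features.
        self.temporal = nn.LSTM(32, 64, batch_first=True)
        self.head = nn.Linear(64, n_events)

    def forward(self, seq):                  # seq: (batch, time, 3, H, W)
        b, t = seq.shape[:2]
        feats = self.encoder(seq.flatten(0, 1)).view(b, t, -1)
        out, _ = self.temporal(feats)
        return self.head(out[:, -1])         # event logits for the sequence

# Example: 2 sequences of 8 tactile frames at 64x64.
logits = ContactEventNet()(torch.randn(2, 8, 3, 64, 64))
```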
Optical Flow Techniques for Facial Expression Analysis: Performance Evaluation and Improvements
Title | Optical Flow Techniques for Facial Expression Analysis: Performance Evaluation and Improvements |
Authors | Benjamin Allaert, Isaac Ronald Ward, Ioan Marius Bilasco, Chaabane Djeraba, Mohammed Bennamoun |
Abstract | Optical flow techniques are becoming increasingly performant and robust when estimating motion in a scene, but their performance has yet to be proven in the area of facial expression recognition. In this work, a variety of optical flow approaches are evaluated across multiple facial expression datasets, so as to provide a consistent performance evaluation. Additionally, the strengths of multiple optical flow approaches are combined in a novel data augmentation scheme. Under this scheme, increases in average accuracy of up to 6% (depending on the choice of optical flow approaches and dataset) have been achieved. |
Tasks | Data Augmentation, Facial Expression Recognition, Optical Flow Estimation |
Published | 2019-04-25 |
URL | http://arxiv.org/abs/1904.11592v1 |
http://arxiv.org/pdf/1904.11592v1.pdf | |
PWC | https://paperswithcode.com/paper/optical-flow-techniques-for-facial-expression |
Repo | |
Framework | |
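For readers unfamiliar with the underlying primitive, the snippet below computes a dense optical flow field for a frame pair with OpenCV's Farneback estimator; under the paper's augmentation scheme, flow fields produced by several different estimators would be used interchangeably as training inputs. The frames here are synthetic placeholders.

```python
import cv2
import numpy as np

def flow_features(prev_gray, next_gray):
    """Dense per-pixel flow (dx, dy) via Farneback's algorithm."""
    return cv2.calcOpticalFlowFarneback(
        prev_gray, next_gray, None, 0.5, 3, 15, 3, 5, 1.2, 0)

# Two synthetic grayscale 'face' frames; the second is shifted to mimic motion.
prev = np.random.randint(0, 255, (128, 128), dtype=np.uint8)
nxt = np.roll(prev, shift=2, axis=0)

flow = flow_features(prev, nxt)               # shape (128, 128, 2)
magnitude = np.hypot(flow[..., 0], flow[..., 1])
angle = np.arctan2(flow[..., 1], flow[..., 0])
# magnitude/angle maps from different flow estimators can then serve as
# interchangeable input representations for an expression classifier.
```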
SegMap: Segment-based mapping and localization using data-driven descriptors
Title | SegMap: Segment-based mapping and localization using data-driven descriptors |
Authors | Renaud Dubé, Andrei Cramariuc, Daniel Dugas, Hannes Sommer, Marcin Dymczyk, Juan Nieto, Roland Siegwart, Cesar Cadena |
Abstract | Precisely estimating a robot’s pose in a prior, global map is a fundamental capability for mobile robotics, e.g. autonomous driving or exploration in disaster zones. This task, however, remains challenging in unstructured, dynamic environments, where local features are not discriminative enough and global scene descriptors only provide coarse information. We therefore present SegMap: a map representation solution for localization and mapping based on the extraction of segments in 3D point clouds. Working at the level of segments offers increased invariance to viewpoint and local structural changes, and facilitates real-time processing of large-scale 3D data. SegMap exploits a single compact data-driven descriptor for performing multiple tasks: global localization, 3D dense map reconstruction, and semantic information extraction. The performance of SegMap is evaluated in multiple urban driving and search-and-rescue experiments. We show that the learned SegMap descriptor has superior segment retrieval capabilities compared to state-of-the-art handcrafted descriptors. In consequence, we achieve a higher localization accuracy and a 6% increase in recall over the state of the art. These segment-based localizations allow us to reduce the open-loop odometry drift by up to 50%. SegMap is available open-source, along with easy-to-run demonstrations. |
Tasks | Autonomous Driving |
Published | 2019-09-27 |
URL | https://arxiv.org/abs/1909.12837v1 |
https://arxiv.org/pdf/1909.12837v1.pdf | |
PWC | https://paperswithcode.com/paper/segmap-segment-based-mapping-and-localization |
Repo | |
Framework | |
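A minimal sketch of the descriptor-based retrieval step that underlies global localization in such a pipeline: match the descriptors of segments seen in the current scan against the map's descriptors with a k-nearest-neighbor search. The descriptors below are random placeholders standing in for learned SegMap descriptors.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
map_descriptors = rng.normal(size=(1000, 64))   # one 64-d descriptor per map segment
query_descriptors = rng.normal(size=(12, 64))   # segments extracted from the current scan

index = NearestNeighbors(n_neighbors=5).fit(map_descriptors)
dist, candidate_ids = index.kneighbors(query_descriptors)
# candidate_ids[i] lists the map segments whose descriptors are closest to
# query segment i; a geometric-consistency check over these candidate matches
# would then vote for a single 6-DoF localization.
```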
A Comparative Study of Neural Network Compression
Title | A Comparative Study of Neural Network Compression |
Authors | Hossein Baktash, Emanuele Natale, Laurent Viennot |
Abstract | There has recently been an increasing desire to evaluate neural networks locally on computationally-limited devices in order to exploit their recent effectiveness for several applications; this effectiveness has nevertheless come with a considerable increase in the size of modern neural networks, which constitutes a major downside in several of the aforementioned computationally-limited settings. There has thus been a demand for compression techniques for neural networks. Several proposals in this direction have been made, which famously include hashing-based methods and pruning-based ones. However, the evaluation of the efficacy of these techniques has so far been heterogeneous, with no clear evidence in favor of any of them over the others. The goal of this work is to address this latter issue by providing a comparative study. While most previous studies test the capability of a technique to reduce the number of parameters of state-of-the-art networks, we follow [CWT+15] in evaluating their performance on basic architectures on the MNIST dataset and variants of it, which allows for a clearer analysis of some aspects of their behavior. To the best of our knowledge, we are the first to directly compare famous approaches such as HashedNet, Optimal Brain Damage (OBD), and magnitude-based pruning with L1 and L2 regularization, both among themselves and against equivalent-size feed-forward neural networks, for both simple (fully-connected) and structural (convolutional) architectures. Rather surprisingly, our experiments show that (iterative) pruning-based methods are substantially better than the HashedNet architecture, whose compression does not appear advantageous compared to a carefully chosen convolutional network. We also show that, when the compression level is high, the famous OBD pruning heuristic deteriorates to the point of being less efficient than simple magnitude-based techniques. |
Tasks | L2 Regularization, Neural Network Compression |
Published | 2019-10-24 |
URL | https://arxiv.org/abs/1910.11144v1 |
https://arxiv.org/pdf/1910.11144v1.pdf | |
PWC | https://paperswithcode.com/paper/a-comparative-study-of-neural-network |
Repo | |
Framework | |
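Of the compared techniques, magnitude-based pruning is simple enough to sketch in a few lines: zero out the smallest-magnitude weights, then (in the iterative variant) retrain and repeat. The sparsity level and layer shape below are illustrative.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Return a copy of `weights` with the smallest-|w| fraction set to zero."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

w = np.random.randn(784, 300)                 # e.g. an MNIST fully-connected layer
w_pruned = magnitude_prune(w, sparsity=0.9)
print(f"nonzero fraction: {np.count_nonzero(w_pruned) / w.size:.3f}")
```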
Sparse (group) learning with Lipschitz loss functions: a unified analysis
Title | Sparse (group) learning with Lipschitz loss functions: a unified analysis |
Authors | Antoine Dedieu |
Abstract | We study a family of sparse estimators defined as minimizers of some empirical Lipschitz loss function—examples include the hinge, logistic and quantile regression losses—with a convex, sparse or group-sparse regularization. In particular, we consider the L1-norm on the coefficients, its sorted Slope version, and the Group L1-L2 extension. First, we propose a theoretical framework which simultaneously derives new L2 estimation upper bounds for all three regularization schemes. For L1 and Slope regularizations, our bounds scale as $(k^*/n) \log(p/k^*)$—$n\times p$ is the size of the design matrix and $k^*$ the dimension of the theoretical loss minimizer $\beta^*$—matching the optimal minimax rate achieved for the least-squares case. For Group L1-L2 regularization, our bounds scale as $(s^*/n) \log\left( G / s^* \right) + m^* / n$—$G$ is the total number of groups and $m^*$ the number of coefficients in the $s^*$ groups which contain $\beta^*$—and improve over the least-squares case. We additionally show that when the signal is strongly group-sparse, Group L1-L2 is superior to L1 and Slope. Our bounds are achieved both in probability and in expectation, under common assumptions in the literature. Second, we propose an accelerated proximal algorithm which computes the convex estimators studied when the number of variables is of the order of $100,000$. We additionally compare the statistical performance of our estimators against standard baselines for settings where the signal is either sparse or group-sparse. Our experimental findings reveal (i) the good empirical performance of L1 and Slope regularizations for sparse binary classification problems, (ii) the superiority of Group L1-L2 regularization for group-sparse classification problems and (iii) the appealing properties of sparse quantile regression estimators for sparse regression problems with heteroscedastic noise. |
Tasks | L2 Regularization |
Published | 2019-10-20 |
URL | https://arxiv.org/abs/1910.08880v6 |
https://arxiv.org/pdf/1910.08880v6.pdf | |
PWC | https://paperswithcode.com/paper/sparse-group-learning-with-lipschitz-loss |
Repo | |
Framework | |
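The accelerated proximal algorithm mentioned in the abstract relies on closed-form proximal maps of the penalties. A small sketch of two of them, soft-thresholding for L1 and block soft-thresholding for Group L1-L2, follows; the Slope prox (which additionally requires sorting the coefficients) is omitted.

```python
import numpy as np

def prox_l1(beta, t):
    """Soft-thresholding: argmin_x 0.5*||x - beta||^2 + t*||x||_1."""
    return np.sign(beta) * np.maximum(np.abs(beta) - t, 0.0)

def prox_group_l1l2(beta, t, groups):
    """Block soft-thresholding: shrink each group's L2 norm by t."""
    out = np.zeros_like(beta)
    for g in groups:
        norm = np.linalg.norm(beta[g])
        if norm > t:
            out[g] = (1.0 - t / norm) * beta[g]
    return out

beta = np.array([3.0, -0.5, 0.2, -2.0, 1.0, 0.1])
print(prox_l1(beta, 0.5))                                     # entrywise shrinkage
print(prox_group_l1l2(beta, 1.0, groups=[[0, 1, 2], [3, 4, 5]]))  # groupwise shrinkage
```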
Multiview 2D/3D Rigid Registration via a Point-Of-Interest Network for Tracking and Triangulation (POINT^2)
Title | Multiview 2D/3D Rigid Registration via a Point-Of-Interest Network for Tracking and Triangulation (POINT^2) |
Authors | Haofu Liao, Wei-An Lin, Jiarui Zhang, Jingdan Zhang, Jiebo Luo, S. Kevin Zhou |
Abstract | We propose to tackle the problem of multiview 2D/3D rigid registration for intervention via a Point-Of-Interest Network for Tracking and Triangulation (POINT^2). POINT^2 learns to establish 2D point-to-point correspondences between the pre- and intra-intervention images by tracking a set of random POIs. The 3D pose of the pre-intervention volume is then estimated through a triangulation layer. In POINT^2, the unified framework of the POI tracker and the triangulation layer enables learning informative 2D features and estimating 3D pose jointly. In contrast to existing approaches, POINT^2 only requires a single forward pass to achieve a reliable 2D/3D registration. As the POI tracker is shift-invariant, POINT^2 is more robust to the initial pose of the 3D pre-intervention image. Extensive experiments on a large-scale clinical cone-beam CT (CBCT) dataset show that the proposed POINT^2 method outperforms the existing learning-based method in terms of accuracy, robustness and running time. Furthermore, when used as an initial pose estimator, our method also improves the robustness and speed of state-of-the-art optimization-based approaches tenfold. |
Tasks | |
Published | 2019-03-10 |
URL | https://arxiv.org/abs/1903.03896v3 |
https://arxiv.org/pdf/1903.03896v3.pdf | |
PWC | https://paperswithcode.com/paper/multiview-2d3d-rigid-registration-via-a-point |
Repo | |
Framework | |
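The triangulation layer builds on classical two-view geometry. The sketch below shows plain linear (DLT) triangulation of one tracked point from two projection matrices, as a NumPy reference; the paper's triangulation layer presumably implements a differentiable analogue of this operation inside the network.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """DLT triangulation. P1, P2: 3x4 projection matrices; x1, x2: (u, v) pixels."""
    A = np.stack([
        x1[0] * P1[2] - P1[0],   # u1 * (P1 row3) - P1 row1 = 0
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]                   # null vector of A, homogeneous 3D point
    return X[:3] / X[3]

# Two synthetic views of the point (1, 2, 10): identity camera and one shifted in x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([1.0, 2.0, 10.0, 1.0])
x1 = (P1 @ X_true)[:2] / (P1 @ X_true)[2]
x2 = (P2 @ X_true)[:2] / (P2 @ X_true)[2]
print(triangulate(P1, P2, x1, x2))   # ~= [1, 2, 10]
```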
Single Image BRDF Parameter Estimation with a Conditional Adversarial Network
Title | Single Image BRDF Parameter Estimation with a Conditional Adversarial Network |
Authors | Mark Boss, Hendrik P. A. Lensch |
Abstract | Creating plausible surfaces is an essential component in achieving a high degree of realism in rendering. To relieve artists, who create these surfaces in a time-consuming, manual process, automated retrieval of the spatially-varying Bidirectional Reflectance Distribution Function (SVBRDF) from a single mobile phone image is desirable. By leveraging a deep neural network, this casual capturing method becomes feasible. The trained network can estimate per-pixel normal, base color, metallic and roughness parameters from the Disney BRDF. The input image is taken with a mobile phone lit by the camera flash. The network is trained to compensate for environment lighting and thus learns to reduce artifacts introduced by other light sources. The training losses comprise a multi-scale discriminator with an additional perceptual loss, a rendering loss using a differentiable renderer, and a parameter loss. Besides the local precision, this loss formulation generates material texture maps which are globally more consistent. The network is set up as a generator network trained in an adversarial fashion to ensure that only plausible maps are produced. The estimated parameters not only reproduce the material faithfully in rendering but, owing to the more global loss terms, also capture the style of hand-authored materials without requiring the additional post-processing of previous works. Both the resolution and the quality are improved. |
Tasks | |
Published | 2019-10-11 |
URL | https://arxiv.org/abs/1910.05148v1 |
https://arxiv.org/pdf/1910.05148v1.pdf | |
PWC | https://paperswithcode.com/paper/single-image-brdf-parameter-estimation-with-a |
Repo | |
Framework | |
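A hedged sketch of a multi-term generator loss of the kind the abstract enumerates (parameter, rendering and adversarial terms; the perceptual term is omitted for brevity). The weights and tensor shapes are illustrative, not the paper's.

```python
import torch
import torch.nn.functional as F

def generator_loss(pred_maps, true_maps, pred_render, true_render,
                   disc_logits_fake, w_adv=0.1, w_render=1.0, w_param=1.0):
    # Parameter loss: direct supervision of normal/base-color/metallic/roughness maps.
    param = F.l1_loss(pred_maps, true_maps)
    # Rendering loss: compare images produced by a differentiable renderer.
    render = F.l1_loss(pred_render, true_render)
    # Adversarial loss: the generator tries to fool the discriminator.
    adv = F.binary_cross_entropy_with_logits(
        disc_logits_fake, torch.ones_like(disc_logits_fake))
    return w_param * param + w_render * render + w_adv * adv

# Stand-in tensors: 10 parameter channels, 3-channel renderings, 1 discriminator logit.
loss = generator_loss(torch.rand(2, 10, 64, 64), torch.rand(2, 10, 64, 64),
                      torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64),
                      torch.randn(2, 1))
```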
Unmasking Clever Hans Predictors and Assessing What Machines Really Learn
Title | Unmasking Clever Hans Predictors and Assessing What Machines Really Learn |
Authors | Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, Klaus-Robert Müller |
Abstract | Current learning machines have successfully solved hard application problems, reaching high accuracy and displaying seemingly “intelligent” behavior. Here we apply recent techniques for explaining decisions of state-of-the-art learning machines and analyze various tasks from computer vision and arcade games. This showcases a spectrum of problem-solving behaviors ranging from naive and short-sighted to well-informed and strategic. We observe that standard performance evaluation metrics can be oblivious to distinguishing these diverse problem-solving behaviors. Furthermore, we propose our semi-automated Spectral Relevance Analysis, which provides a practically effective way of characterizing and validating the behavior of nonlinear learning machines. This helps to assess whether a learned model indeed delivers reliably for the problem that it was conceived for. Finally, our work intends to add a voice of caution to the ongoing excitement about machine intelligence and pledges to evaluate and judge some of these recent successes in a more nuanced manner. |
Tasks | |
Published | 2019-02-26 |
URL | http://arxiv.org/abs/1902.10178v1 |
http://arxiv.org/pdf/1902.10178v1.pdf | |
PWC | https://paperswithcode.com/paper/unmasking-clever-hans-predictors-and |
Repo | |
Framework | |
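The Spectral Relevance Analysis recipe can be outlined in a few lines: compute a relevance heatmap per sample (via an attribution method such as layer-wise relevance propagation), then cluster the heatmaps spectrally so that anomalous decision strategies surface as distinct clusters. The heatmaps below are synthetic placeholders.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(1)
heatmaps = rng.normal(size=(200, 32 * 32))   # flattened relevance maps, one per sample
heatmaps[:20] += 3.0                         # a 'Clever Hans' subgroup, e.g. samples
                                             # whose relevance sits on a watermark

labels = SpectralClustering(n_clusters=2, affinity="nearest_neighbors",
                            random_state=0).fit_predict(heatmaps)
print(np.bincount(labels))
# Inspecting the heatmaps of the smaller cluster would reveal the spurious strategy.
```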
Convolutional Neural Networks on Randomized Data
Title | Convolutional Neural Networks on Randomized Data |
Authors | Cristian Ivan |
Abstract | Convolutional Neural Networks (CNNs) are built specifically for computer vision tasks, for which it is known that the input data has a hierarchical structure based on locally correlated elements. The question that naturally arises is what happens to the performance of CNNs if one of the basic properties of the data is removed, e.g. what happens if the image pixels are randomly permuted? Intuitively one expects that the convolutional network performs poorly in these circumstances, in contrast to multilayer perceptrons (MLPs), whose classification accuracy should not be affected by the pixel randomization. This work shows that by randomizing image pixels the hierarchical structure of the data is destroyed and long-range correlations are introduced, which standard CNNs are not able to capture. We show that their classification accuracy is heavily dependent on the class similarities as well as the pixel randomization process. We also indicate that dilated convolutions are able to recover some of the pixel correlations and improve the performance. |
Tasks | |
Published | 2019-07-25 |
URL | https://arxiv.org/abs/1907.10935v1 |
https://arxiv.org/pdf/1907.10935v1.pdf | |
PWC | https://paperswithcode.com/paper/convolutional-neural-networks-on-randomized |
Repo | |
Framework | |
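The experimental manipulation itself is easy to reproduce: draw one random pixel permutation and apply it to every image, which destroys local structure while leaving an MLP's task unchanged. A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(42)
perm = rng.permutation(28 * 28)        # one fixed permutation, shared by all images

def permute_pixels(images):            # images: (n, 28, 28)
    flat = images.reshape(len(images), -1)
    return flat[:, perm].reshape(images.shape)

images = np.random.rand(16, 28, 28)    # stand-in for MNIST digits
shuffled = permute_pixels(images)
# An MLP sees an equivalent problem (inputs are permuted consistently);
# a CNN loses the local correlations its convolutions rely on.
```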
An Annotation Scheme of A Large-scale Multi-party Dialogues Dataset for Discourse Parsing and Machine Comprehension
Title | An Annotation Scheme of A Large-scale Multi-party Dialogues Dataset for Discourse Parsing and Machine Comprehension |
Authors | Jiaqi Li, Ming Liu, Bing Qin, Zihao Zheng, Ting Liu |
Abstract | In this paper, we propose a scheme for annotating large-scale multi-party chat dialogues for discourse parsing and machine comprehension. The main goal of this project is to help understand multi-party dialogues. Our dataset is based on the Ubuntu Chat Corpus. For each multi-party dialogue, we annotate the discourse structure and question-answer pairs. To the best of our knowledge, this is the first large-scale corpus for discourse parsing of multi-party dialogues, and we are the first to propose the task of machine reading comprehension for multi-party dialogues. |
Tasks | Machine Reading Comprehension, Reading Comprehension |
Published | 2019-11-08 |
URL | https://arxiv.org/abs/1911.03514v1 |
https://arxiv.org/pdf/1911.03514v1.pdf | |
PWC | https://paperswithcode.com/paper/an-annotation-scheme-of-a-large-scale-multi |
Repo | |
Framework | |
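Purely as an illustration of the two annotation layers the abstract names (discourse links between utterances and question-answer pairs), here is a hypothetical record shape; it is not the authors' released schema, and all field names and relation labels are invented.

```python
# Hypothetical annotation record for one multi-party dialogue (illustrative only).
dialogue = {
    "utterances": [
        {"id": 0, "speaker": "userA", "text": "Anyone know why apt fails?"},
        {"id": 1, "speaker": "userB", "text": "Paste the error message."},
        {"id": 2, "speaker": "userA", "text": "404 on the security mirror."},
    ],
    "discourse": [                      # directed links between utterances
        {"head": 0, "dep": 1, "relation": "Clarification_question"},
        {"head": 1, "dep": 2, "relation": "QAP"},
    ],
    "qa_pairs": [                       # machine-comprehension annotations
        {"question": "What error does userA get?", "answer_utterances": [2]},
    ],
}
```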
A deep learning system for differential diagnosis of skin diseases
Title | A deep learning system for differential diagnosis of skin diseases |
Authors | Yuan Liu, Ayush Jain, Clara Eng, David H. Way, Kang Lee, Peggy Bui, Kimberly Kanada, Guilherme de Oliveira Marinho, Jessica Gallegos, Sara Gabriele, Vishakha Gupta, Nalini Singh, Vivek Natarajan, Rainer Hofmann-Wellenhof, Greg S. Corrado, Lily H. Peng, Dale R. Webster, Dennis Ai, Susan Huang, Yun Liu, R. Carter Dunn, David Coz |
Abstract | Skin conditions affect an estimated 1.9 billion people worldwide. A shortage of dermatologists causes long wait times and leads patients to seek dermatologic care from general practitioners. However, the diagnostic accuracy of general practitioners has been reported to be only 0.24-0.70 (compared to 0.77-0.96 for dermatologists), resulting in referral errors, delays in care, and errors in diagnosis and treatment. In this paper, we developed a deep learning system (DLS) to provide a differential diagnosis of skin conditions for clinical cases (skin photographs and associated medical histories). The DLS distinguishes between 26 skin conditions that represent roughly 80% of the volume of skin conditions seen in primary care. The DLS was developed and validated using de-identified cases from a teledermatology practice serving 17 clinical sites via a temporal split: the first 14,021 cases for development and the last 3,756 cases for validation. On the validation set, where a panel of three board-certified dermatologists defined the reference standard for every case, the DLS achieved 0.71 and 0.93 top-1 and top-3 accuracies respectively. For a random subset of the validation set (n=963 cases), 18 clinicians reviewed the cases for comparison. On this subset, the DLS achieved a 0.67 top-1 accuracy, non-inferior to board-certified dermatologists (0.63, p<0.001), and higher than primary care physicians (PCPs, 0.45) and nurse practitioners (NPs, 0.41). The top-3 accuracy showed a similar trend: 0.90 DLS, 0.75 dermatologists, 0.60 PCPs, and 0.55 NPs. These results highlight the potential of the DLS to augment general practitioners to accurately diagnose skin conditions by suggesting differential diagnoses that may not have been considered. Future work will be needed to prospectively assess the clinical impact of using this tool in actual clinical workflows. |
Tasks | |
Published | 2019-09-11 |
URL | https://arxiv.org/abs/1909.05382v1 |
https://arxiv.org/pdf/1909.05382v1.pdf | |
PWC | https://paperswithcode.com/paper/a-deep-learning-system-for-differential |
Repo | |
Framework | |
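The headline numbers are top-k accuracies of the differential diagnosis: whether the reference condition appears among the model's k highest-ranked conditions. A small sketch of the metric, with random stand-in predictions over the paper's 26 conditions:

```python
import numpy as np

def top_k_accuracy(probs, labels, k):
    """probs: (n, n_conditions) predicted scores; labels: (n,) reference indices."""
    topk = np.argsort(probs, axis=1)[:, -k:]            # k highest-ranked conditions
    return float(np.mean([labels[i] in topk[i] for i in range(len(labels))]))

probs = np.random.rand(100, 26)                         # 26 conditions, as in the paper
labels = np.random.randint(0, 26, size=100)
print(top_k_accuracy(probs, labels, 1), top_k_accuracy(probs, labels, 3))
```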
Matching on What Matters: A Pseudo-Metric Learning Approach to Matching Estimation in High Dimensions
Title | Matching on What Matters: A Pseudo-Metric Learning Approach to Matching Estimation in High Dimensions |
Authors | Gentry Johnson, Brian Quistorff, Matt Goldman |
Abstract | When pre-processing observational data via matching, we seek to approximate each unit with maximally similar peers that had an alternative treatment status, essentially replicating a randomized block design. However, as one considers a growing number of continuous features, a curse of dimensionality applies, making asymptotically valid inference impossible (Abadie and Imbens, 2006). The alternative of ignoring plausibly relevant features is certainly no better, and the resulting trade-off substantially limits the application of matching methods to “wide” datasets. Instead, Li and Fu (2017) recast the problem of matching in a metric learning framework that maps features to a low-dimensional space that facilitates “closer matches” while still capturing important aspects of unit-level heterogeneity. However, that method lacks key theoretical guarantees and can produce inconsistent estimates in cases of heterogeneous treatment effects. Motivated by a straightforward extension of existing results in the matching literature, we present alternative techniques that learn latent matching features through either MLPs or siamese neural networks trained on a carefully selected loss function. We benchmark the resulting methods in simulations as well as against two experimental data sets, including the canonical NSW worker training program data set, and find superior performance of the neural-net-based methods. |
Tasks | Metric Learning |
Published | 2019-05-28 |
URL | https://arxiv.org/abs/1905.12020v1 |
https://arxiv.org/pdf/1905.12020v1.pdf | |
PWC | https://paperswithcode.com/paper/matching-on-what-matters-a-pseudo-metric |
Repo | |
Framework | |
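A hedged sketch of the siamese idea: embed unit features into a low-dimensional space and match each treated unit to its nearest control there. The embedding below is untrained and its sizes are invented; in the paper's setting it would first be trained on a carefully selected loss so that distances reflect outcome-relevant heterogeneity.

```python
import torch
import torch.nn as nn

# Placeholder embedding network (50 raw features -> 8 latent matching features).
embed = nn.Sequential(nn.Linear(50, 32), nn.ReLU(), nn.Linear(32, 8))

def match_controls(treated_x, control_x):
    """Index of the nearest control for each treated unit, in embedding space."""
    with torch.no_grad():
        zt, zc = embed(treated_x), embed(control_x)
    d = torch.cdist(zt, zc)             # pairwise distances in the latent space
    return d.argmin(dim=1)

treated = torch.randn(20, 50)           # 20 treated units, 50 features each
controls = torch.randn(200, 50)
pairs = match_controls(treated, controls)   # matched control index per treated unit
```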
Traffic Sign Detection and Recognition for Autonomous Driving in Virtual Simulation Environment
Title | Traffic Sign Detection and Recognition for Autonomous Driving in Virtual Simulation Environment |
Authors | Meixin Zhu, Jingyun Hu, Ziyuan Pu, Zhiyong Cui, Liangwu Yan, Yinhai Wang |
Abstract | This study developed a traffic sign detection and recognition algorithm based on RetinaNet. Two main aspects were revised to improve the detection of traffic signs: image cropping, to address the issue of large images and small traffic signs; and using more anchors with various scales, to detect traffic signs of different sizes and shapes. The proposed algorithm was trained and tested on a series of autonomous driving front-view images in a virtual simulation environment. Results show that the algorithm performed extremely well under good illumination and weather conditions. Its drawbacks are that it sometimes failed to detect objects under bad weather conditions like snow, and failed to distinguish speed limit signs with different limit values. |
Tasks | Autonomous Driving, Image Cropping |
Published | 2019-10-27 |
URL | https://arxiv.org/abs/1911.05626v1 |
https://arxiv.org/pdf/1911.05626v1.pdf | |
PWC | https://paperswithcode.com/paper/traffic-sign-detection-and-recognition-for |
Repo | |
Framework | |
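The first revision the abstract mentions, image cropping, amounts to tiling a large frame into overlapping windows so that small signs occupy more of the detector's input. A minimal sketch, with illustrative crop size and overlap:

```python
import numpy as np

def tile_image(image, crop=512, overlap=64):
    """Yield (x0, y0, crop) windows covering the image with the given overlap."""
    h, w = image.shape[:2]
    step = crop - overlap
    for y0 in range(0, max(h - overlap, 1), step):
        for x0 in range(0, max(w - overlap, 1), step):
            y1, x1 = min(y0 + crop, h), min(x0 + crop, w)
            yield x0, y0, image[y0:y1, x0:x1]

frame = np.zeros((1080, 1920, 3), dtype=np.uint8)   # a front-view frame
crops = list(tile_image(frame))
# Each crop is fed to the detector; detections are shifted by (x0, y0) back
# into frame coordinates and merged (e.g. with non-maximum suppression).
```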
An End-to-End Neural Network for Image Cropping by Learning Composition from Aesthetic Photos
Title | An End-to-End Neural Network for Image Cropping by Learning Composition from Aesthetic Photos |
Authors | Peng Lu, Hao Zhang, Xujun Peng, Xiaofu Jin |
Abstract | As one of the fundamental techniques for image editing, image cropping discards irrelevant content and retains the pleasing portions of the image to enhance the overall composition and achieve better visual/aesthetic perception. In this paper, we primarily focus on improving the accuracy of automatic image cropping, and on further exploring its potential on public datasets with high efficiency. To this end, we propose a deep learning based framework to learn object composition from photos with high aesthetic quality, where an anchor region is detected through a convolutional neural network (CNN) with a Gaussian kernel to maintain the integrity of the objects of interest. This initial detected anchor area is then fed into a lightweight regression network to obtain the final cropping result. Unlike conventional methods, in which multiple candidates are proposed and evaluated iteratively, only a single anchor region is produced in our model, which is mapped to the final output directly. Thus, low computational resources are required for the proposed approach. Experimental results on public datasets show that both cropping accuracy and efficiency achieve state-of-the-art performance. |
Tasks | Image Cropping |
Published | 2019-07-02 |
URL | https://arxiv.org/abs/1907.01432v3 |
https://arxiv.org/pdf/1907.01432v3.pdf | |
PWC | https://paperswithcode.com/paper/an-end-to-end-neural-network-for-image |
Repo | |
Framework | |
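A hedged sketch of the single-anchor pipeline the abstract outlines: a CNN backbone proposes one anchor region, and a lightweight regressor maps it to the final crop box. Both networks below are placeholders, not the paper's architecture, and all sizes are illustrative.

```python
import torch
import torch.nn as nn

# Placeholder backbone: image -> 32-d global feature.
backbone = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten())

anchor_head = nn.Linear(32, 4)       # one anchor box (x, y, w, h), normalized
refine_head = nn.Sequential(nn.Linear(32 + 4, 16), nn.ReLU(), nn.Linear(16, 4))

image = torch.rand(1, 3, 224, 224)
feat = backbone(image)
anchor = torch.sigmoid(anchor_head(feat))             # single anchor, no candidate set
crop_box = torch.sigmoid(refine_head(torch.cat([feat, anchor], dim=1)))
# One forward pass produces the final crop directly, mirroring the paper's
# contrast with iterative candidate-evaluation approaches.
```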