January 30, 2020

3783 words 18 mins read

Paper Group ANR 305

Multimodal Subspace Support Vector Data Description. RAD: On-line Anomaly Detection for Highly Unreliable Data. TSRNet: Scalable 3D Surface Reconstruction Network for Point Clouds using Tangent Convolution. PZnet: Efficient 3D ConvNet Inference on Manycore CPUs. Theory of Minds: Understanding Behavior in Groups Through Inverse Planning. Free-riders …

Multimodal Subspace Support Vector Data Description

Title Multimodal Subspace Support Vector Data Description
Authors Fahad Sohrab, Jenni Raitoharju, Alexandros Iosifidis, Moncef Gabbouj
Abstract In this paper, we propose a novel method for projecting data from multiple modalities to a new subspace optimized for one-class classification. The proposed method iteratively transforms the data from the original feature space of each modality to a new common feature space while finding a joint compact description of data coming from all the modalities. For data in each modality, we define a separate transformation to map the data from the corresponding feature space to the new optimized subspace by exploiting the available information from the class of interest only. The data description in the new subspace is obtained by Support Vector Data Description. We also propose different regularization strategies for the proposed method and provide both linear and non-linear formulations. We conduct experiments on two multimodal datasets and compare the proposed approach with baseline and recently proposed one-class classification methods, both combined with early fusion and applied to each modality separately. We show that the proposed Multimodal Subspace Support Vector Data Description outperforms all the methods using data from a single modality and performs better than, or as well as, the methods fusing data from all modalities.
Tasks
Published 2019-04-16
URL http://arxiv.org/abs/1904.07698v1
PDF http://arxiv.org/pdf/1904.07698v1.pdf
PWC https://paperswithcode.com/paper/multimodal-subspace-support-vector-data
Repo
Framework
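The pipeline shape the abstract describes, per-modality projections into a common subspace followed by a Support Vector Data Description, can be sketched with off-the-shelf parts. This is not the paper's joint optimization: below, fixed PCA maps stand in for the learned transformations and scikit-learn's OneClassSVM stands in for SVDD, so treat it as an illustration of the pipeline only.

```python
# Minimal sketch of the MS-SVDD pipeline shape (assumptions: PCA replaces
# the learned projections; OneClassSVM replaces the SVDD optimization).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_mod1 = rng.normal(size=(200, 40))   # modality 1 features (target class only)
X_mod2 = rng.normal(size=(200, 25))   # modality 2 features (target class only)
d = 10                                # dimension of the shared subspace

# One transformation per modality into the common subspace.
proj1 = PCA(n_components=d).fit(X_mod1)
proj2 = PCA(n_components=d).fit(X_mod2)
Z = np.hstack([proj1.transform(X_mod1), proj2.transform(X_mod2)])

# Compact description of the target class in the new space.
svdd = OneClassSVM(kernel="rbf", nu=0.1).fit(Z)

# At test time, project each modality the same way and score.
z_test = np.hstack([proj1.transform(X_mod1[:5]), proj2.transform(X_mod2[:5])])
print(svdd.decision_function(z_test))  # positive = inside the description
```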

RAD: On-line Anomaly Detection for Highly Unreliable Data

Title RAD: On-line Anomaly Detection for Highly Unreliable Data
Authors Zilong Zhao, Robert Birke, Rui Han, Bogdan Robu, Sara Bouchenak, Sonia Ben Mokhtar, Lydia Y. Chen
Abstract Classification algorithms have been widely adopted to detect anomalies in various systems, e.g., IoT, cloud, and face recognition, under the common assumption that the data source is clean, i.e., features and labels are correctly set. However, data collected in the wild can be unreliable due to careless annotations or malicious data transformations intended to mislead anomaly detection. In this paper, we present a two-layer on-line learning framework for robust anomaly detection (RAD) in the presence of unreliable anomaly labels, where the first layer filters out suspicious data and the second layer detects anomaly patterns from the remaining data. To adapt to the on-line nature of anomaly detection, we extend RAD with additional features: repetitive cleaning, conflicting opinions of classifiers, and oracle knowledge. We learn on-line from the incoming data streams and continuously cleanse the data, so as to exploit the increasing learning capacity of the growing accumulated data set. Moreover, we explore the concept of oracle learning, which provides additional true labels for difficult data points. We specifically focus on three use cases: (i) detecting 10 classes of IoT attacks, (ii) predicting 4 classes of task failures of big data jobs, and (iii) recognising the faces of 20 celebrities. Our evaluation results show that RAD robustly improves the accuracy of anomaly detection, reaching up to 98% for IoT device attacks (i.e., +11%), up to 84% for cloud task failures (i.e., +20%) under 40% noise, and up to 74% for face recognition (i.e., +28%) under 30% noisy labels. The proposed RAD is general and can be applied to different anomaly detection algorithms.
Tasks Anomaly Detection, Face Recognition
Published 2019-11-11
URL https://arxiv.org/abs/1911.04383v1
PDF https://arxiv.org/pdf/1911.04383v1.pdf
PWC https://paperswithcode.com/paper/rad-on-line-anomaly-detection-for-highly
Repo
Framework
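As a rough illustration of the two-layer split, the sketch below filters suspicious labels with out-of-fold predictions before training the detector. The models, batch sizes, and 0.8 confidence cut are invented for the toy; the real framework operates on-line over accumulating streams and adds the conflicting-opinions and oracle extensions.

```python
# Hypothetical sketch of RAD's two layers on a single toy batch:
# layer 1 flags records whose (possibly corrupted) labels look suspicious,
# layer 2 learns anomaly patterns from the remaining, cleansed data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict

X, y = make_classification(n_samples=1000, n_informative=8, random_state=0)
y_noisy = y.copy()
flip = np.random.default_rng(0).random(len(y)) < 0.3   # 30% label noise
y_noisy[flip] = 1 - y_noisy[flip]

# Layer 1: out-of-fold probabilities expose label/record disagreement.
filter_clf = RandomForestClassifier(random_state=0)
proba = cross_val_predict(filter_clf, X, y_noisy, cv=5, method="predict_proba")
conf_disagree = proba[np.arange(len(y)), 1 - y_noisy]
keep = conf_disagree < 0.8            # drop records the filter strongly disputes

# Layer 2: learn the patterns from the cleansed subset only.
detector = RandomForestClassifier(random_state=0).fit(X[keep], y_noisy[keep])
print("kept", keep.sum(), "of", len(y), "records;",
      "accuracy vs. true labels:", detector.score(X, y))
```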

TSRNet: Scalable 3D Surface Reconstruction Network for Point Clouds using Tangent Convolution

Title TSRNet: Scalable 3D Surface Reconstruction Network for Point Clouds using Tangent Convolution
Authors Zhenxing Mi, Yiming Luo, Wenbing Tao
Abstract Existing learning-based surface reconstruction methods from point clouds still face challenges in scalability and in preserving details on large-scale point clouds. In this paper, we propose TSRNet, a novel scalable learning-based method for surface reconstruction. It first takes a point cloud and its related octree vertices as input and learns to classify whether the octree vertices are in front of or behind the implicit surface. Marching Cubes (MC) is then applied to extract a surface from the binary-labeled octree. In our method, we design a scalable learning-based pipeline for surface reconstruction. Rather than considering the whole input at once, it divides the point cloud and octree vertices and processes the parts in parallel. Our network captures local geometry details by constructing local geometry-aware features for octree vertices. These features greatly improve the accuracy of predicting the relative position between the vertices and the implicit surface, and they also boost the generalization capability of our network. Our method is able to reconstruct local geometry details from point clouds of different scales, especially point clouds with millions of points. More importantly, the time consumption on such point clouds is acceptable and competitive. Experiments show that our method achieves a significant breakthrough in scalability and quality compared with state-of-the-art learning-based methods.
Tasks
Published 2019-11-18
URL https://arxiv.org/abs/1911.07401v1
PDF https://arxiv.org/pdf/1911.07401v1.pdf
PWC https://paperswithcode.com/paper/tsrnet-scalable-3d-surface-reconstruction
Repo
Framework
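To see the extraction step TSRNet feeds into, here is a hypothetical stand-alone version: a signed field over grid vertices (a sphere's signed distance stands in for the network's front/behind predictions) is passed to Marching Cubes to produce a mesh.

```python
# Illustration only: in TSRNet the sign at each octree vertex comes from a
# learned classifier; here an analytic sphere plays that role.
import numpy as np
from skimage.measure import marching_cubes  # pip install scikit-image

n = 64
ax = np.linspace(-1, 1, n)
x, y, z = np.meshgrid(ax, ax, ax, indexing="ij")
signed = np.sqrt(x**2 + y**2 + z**2) - 0.5   # <0 inside, >0 outside

# Marching Cubes turns the labeled grid into a triangle mesh at the
# zero crossing between "in front" and "behind".
verts, faces, normals, values = marching_cubes(signed, level=0.0)
print(verts.shape, faces.shape)   # (V, 3) vertices, (F, 3) triangles
```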

PZnet: Efficient 3D ConvNet Inference on Manycore CPUs

Title PZnet: Efficient 3D ConvNet Inference on Manycore CPUs
Authors Sergiy Popovych, Davit Buniatyan, Aleksandar Zlateski, Kai Li, H. Sebastian Seung
Abstract Convolutional nets have been shown to achieve state-of-the-art accuracy in many biomedical image analysis tasks. Many tasks within the biomedical analysis domain involve analyzing volumetric (3D) data acquired by CT, MRI, and microscopy. To deploy convolutional nets in practical working systems, it is important to solve the efficient inference problem: namely, one should be able to apply an already-trained convolutional network to many large images using limited computational resources. In this paper we present PZnet, a CPU-only engine that can be used to perform inference for a variety of 3D convolutional net architectures. PZnet outperforms MKL-based CPU implementations of PyTorch and TensorFlow by more than 3.5x for the popular U-Net architecture. Moreover, for 3D convolutions with small numbers of feature maps, cloud CPU inference with PZnet outperforms cloud GPU inference in terms of cost efficiency.
Tasks
Published 2019-03-18
URL http://arxiv.org/abs/1903.07525v1
PDF http://arxiv.org/pdf/1903.07525v1.pdf
PWC https://paperswithcode.com/paper/pznet-efficient-3d-convnet-inference-on
Repo
Framework
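This is not PZnet itself (which emits specialized CPU code); the snippet below is only a hypothetical baseline showing the kind of workload it optimizes, dense 3D convolution inference on CPU with a small feature-map count, timed with stock PyTorch.

```python
# Baseline timing sketch for the workload PZnet targets (assumptions:
# channel counts and volume size chosen arbitrarily for illustration).
import time
import torch

conv = torch.nn.Conv3d(in_channels=4, out_channels=8, kernel_size=3, padding=1)
conv.eval()
x = torch.randn(1, 4, 64, 64, 64)     # one volumetric patch

with torch.no_grad():
    conv(x)                            # warm-up
    t0 = time.perf_counter()
    for _ in range(10):
        conv(x)
    print(f"{(time.perf_counter() - t0) / 10 * 1e3:.1f} ms per forward pass")
```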

Theory of Minds: Understanding Behavior in Groups Through Inverse Planning

Title Theory of Minds: Understanding Behavior in Groups Through Inverse Planning
Authors Michael Shum, Max Kleiman-Weiner, Michael L. Littman, Joshua B. Tenenbaum
Abstract Human social behavior is structured by relationships. We form teams, groups, tribes, and alliances at all scales of human life. These structures guide multi-agent cooperation and competition, but when we observe others, the underlying relationships are typically unobservable and must therefore be inferred. Humans make these inferences intuitively and flexibly, often making rapid generalizations about the latent relationships that underlie behavior from just sparse and noisy observations. Rapid and accurate inferences are important for determining whom to cooperate with, whom to compete with, and how to cooperate in order to compete. Towards the goal of building machine-learning algorithms with human-like social intelligence, we develop a generative model of multi-agent action understanding based on a novel representation for these latent relationships called Composable Team Hierarchies (CTH). This representation is grounded in the formalism of stochastic games and multi-agent reinforcement learning. We use CTH as a target for Bayesian inference, yielding a new algorithm for understanding behavior in groups that can both infer hidden relationships and predict future actions for multiple agents interacting together. Our algorithm rapidly recovers an underlying causal model of how agents relate in spatial stochastic games from just a few observations. The patterns of inference made by this algorithm closely correspond with human judgments, and the algorithm makes the same rapid generalizations that people do.
Tasks Bayesian Inference, Multi-agent Reinforcement Learning
Published 2019-01-18
URL http://arxiv.org/abs/1901.06085v1
PDF http://arxiv.org/pdf/1901.06085v1.pdf
PWC https://paperswithcode.com/paper/theory-of-minds-understanding-behavior-in
Repo
Framework
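A toy version of the inference loop helps make the abstract concrete: maintain a posterior over candidate latent team structures and update it from observed actions. CTH itself composes planners hierarchically to produce per-hypothesis action likelihoods; in the sketch those likelihoods are a hand-written table, so everything below the hypothesis names is an assumption.

```python
# Hypothetical Bayesian update over latent team structures.
import numpy as np

hypotheses = ["A with B vs C", "A with C vs B", "everyone alone"]
prior = np.full(3, 1 / 3)

# P(observed joint action | hypothesis); rows are observations.
# In the paper these come from inverse planning over CTH, not a table.
likelihoods = np.array([
    [0.7, 0.2, 0.1],   # observation 1 strongly fits "A with B vs C"
    [0.6, 0.2, 0.2],   # observation 2 likewise
])

posterior = prior.copy()
for lik in likelihoods:
    posterior = posterior * lik
    posterior /= posterior.sum()      # Bayes' rule, renormalized each step

for h, p in zip(hypotheses, posterior):
    print(f"P({h!r} | data) = {p:.3f}")
```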

Free-riders in Federated Learning: Attacks and Defenses

Title Free-riders in Federated Learning: Attacks and Defenses
Authors Jierui Lin, Min Du, Jian Liu
Abstract Federated learning is a recently proposed paradigm that enables multiple clients to collaboratively train a joint model. It allows clients to train models locally and leverages a parameter server to generate a global model by aggregating the locally submitted gradient updates at each round. Although the incentive model for federated learning has not been fully developed, it is assumed that participants receive rewards or the privilege to use the final global model as compensation for the effort of training it. Therefore, a client who has no local data has an incentive to fabricate local gradient updates in order to claim such rewards. In this paper, we are the first to propose the notion of free-rider attacks, exploring possible ways an attacker may construct gradient updates without any local training data. Furthermore, we explore possible defenses that could detect the proposed attacks, and propose a new high-dimensional detection method called STD-DAGMM, which works particularly well for anomaly detection of model parameters. We extend the attacks and defenses to consider more free-riders as well as differential privacy, which sheds light on and calls for future research in this field.
Tasks Anomaly Detection
Published 2019-11-28
URL https://arxiv.org/abs/1911.12560v1
PDF https://arxiv.org/pdf/1911.12560v1.pdf
PWC https://paperswithcode.com/paper/free-riders-in-federated-learning-attacks-and
Repo
Framework
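A minimal sketch of the threat model, under stated assumptions: an honest client's update comes from real local training, while a free-rider with no data fabricates one (here, small Gaussian noise mimicking a near-converged update). The STD-DAGMM defense augments each update with the standard deviation of its entries; the sketch only computes that per-client feature, not the full DAGMM network.

```python
# Hypothetical free-rider update vs. honest update, plus the STD feature.
import numpy as np

rng = np.random.default_rng(0)
dim = 10_000                          # flattened model size (toy)

honest_update = rng.normal(0.0, 0.05, dim)      # stands in for real training
free_rider_update = rng.normal(0.0, 1e-4, dim)  # fabricated, data-free update

# Per-client scalar used alongside the raw update for anomaly detection.
for name, upd in [("honest", honest_update), ("free-rider", free_rider_update)]:
    print(f"{name:11s} std of update entries: {upd.std():.5f}")
```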

xBD: A Dataset for Assessing Building Damage from Satellite Imagery

Title xBD: A Dataset for Assessing Building Damage from Satellite Imagery
Authors Ritwik Gupta, Richard Hosfelt, Sandra Sajeev, Nirav Patel, Bryce Goodman, Jigar Doshi, Eric Heim, Howie Choset, Matthew Gaston
Abstract We present xBD, a new, large-scale dataset for the advancement of change detection and building damage assessment for humanitarian assistance and disaster recovery research. Natural disaster response requires an accurate understanding of damaged buildings in an affected region. Current response strategies require in-person damage assessments within 24-48 hours of a disaster. Massive potential exists for using aerial imagery combined with computer vision algorithms to assess damage and reduce the potential danger to human life. In collaboration with multiple disaster response agencies, xBD provides pre- and post-event satellite imagery across a variety of disaster events with building polygons, ordinal labels of damage level, and corresponding satellite metadata. Furthermore, the dataset contains bounding boxes and labels for environmental factors such as fire, water, and smoke. xBD is the largest building damage assessment dataset to date, containing 850,736 building annotations across 45,362 km² of imagery.
Tasks
Published 2019-11-21
URL https://arxiv.org/abs/1911.09296v1
PDF https://arxiv.org/pdf/1911.09296v1.pdf
PWC https://paperswithcode.com/paper/xbd-a-dataset-for-assessing-building-damage
Repo
Framework

An Approach for Adaptive Automatic Threat Recognition Within 3D Computed Tomography Images for Baggage Security Screening

Title An Approach for Adaptive Automatic Threat Recognition Within 3D Computed Tomography Images for Baggage Security Screening
Authors Qian Wang, Khalid N. Ismail, Toby P. Breckon
Abstract The screening of baggage using X-ray scanners is now routine in aviation security, with automatic threat detection approaches based on 3D X-ray computed tomography (CT) images known as Automatic Threat Recognition (ATR) within the aviation security industry. Current strategies rely on pre-defined threat material signatures and therefore cannot adapt to new and emerging threat signatures. To address this issue, the concept of adaptive automatic threat recognition (AATR) was proposed in previous work. In this paper, we present a solution to AATR based on such X-ray CT baggage scan imagery, aiming to address rapidly evolving threat signatures within the screening requirements. Ideally, the detection algorithms deployed within security scanners should be readily adaptable to different situations with varying requirements of threat characteristics (e.g., threat material, physical properties of objects). We tackle this issue using a novel adaptive machine learning methodology: our solution consists of a multi-scale 3D CT image segmentation algorithm, a multi-class support vector machine (SVM) classifier for object material recognition, and a strategy to enable the adaptability of our approach. Experiments are conducted on both open and sequestered 3D CT baggage image datasets specifically collected for the AATR study. Our proposed approach performs well on both recognition and adaptation, achieving a probability of detection of around 90% with a probability of false alarm below 20%. Our AATR adapts to varying types of materials, including unknown materials not available in the training data, to varying required probabilities of detection, and to varying scales of the threat object.
Tasks Computed Tomography (CT), Material Recognition, Semantic Segmentation
Published 2019-03-25
URL https://arxiv.org/abs/1903.10604v2
PDF https://arxiv.org/pdf/1903.10604v2.pdf
PWC https://paperswithcode.com/paper/an-approach-for-adaptive-automatic-threat
Repo
Framework
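One ingredient of the adaptability story can be sketched generically: once an SVM scores segmented objects, the operating threshold on its decision scores can be moved to meet a required probability of detection, trading off false alarms without retraining. This is an assumption-laden toy, not the paper's adaptation strategy for materials or object scale.

```python
# Hypothetical threshold adaptation on SVM decision scores.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=600, n_informative=6, random_state=1)
clf = SVC().fit(X[:400], y[:400])
scores = clf.decision_function(X[400:])
y_test = y[400:]

required_pd = 0.90                     # required probability of detection
# Choose the score threshold whose true-positive rate meets the requirement.
thresh = np.quantile(scores[y_test == 1], 1 - required_pd)
detected = scores >= thresh
pd = (detected & (y_test == 1)).sum() / (y_test == 1).sum()
pfa = (detected & (y_test == 0)).sum() / (y_test == 0).sum()
print(f"threshold={thresh:.3f}  Pd={pd:.2f}  Pfa={pfa:.2f}")
```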

An End-to-end Framework For Integrated Pulmonary Nodule Detection and False Positive Reduction

Title An End-to-end Framework For Integrated Pulmonary Nodule Detection and False Positive Reduction
Authors Hao Tang, Xingwei Liu, Xiaohui Xie
Abstract Pulmonary nodule detection using low-dose Computed Tomography (CT) is often the first step in lung disease screening and diagnosis. Recently, algorithms based on deep convolutional neural nets have shown great promise for automated nodule detection. Most existing deep learning nodule detection systems are constructed in two steps: a) nodule candidate screening and b) false positive reduction, using two different models trained separately. Although commonly adopted, the two-step approach not only imposes significant resource overhead for training two independent deep learning models, but is also sub-optimal because it prevents cross-talk between the two. In this work, we present an end-to-end framework for nodule detection that integrates nodule candidate screening and false positive reduction into one jointly trained model. We demonstrate that the end-to-end system improves performance by 3.88% over the two-step approach, while at the same time reducing model complexity by one third and cutting inference time 3.6-fold. Code will be made publicly available.
Tasks Computed Tomography (CT)
Published 2019-03-23
URL http://arxiv.org/abs/1903.09880v1
PDF http://arxiv.org/pdf/1903.09880v1.pdf
PWC https://paperswithcode.com/paper/an-end-to-end-framework-for-integrated
Repo
Framework
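The structural idea, one backbone feeding both heads under a single combined loss so the two stages share gradients, can be shown in a few lines. The architecture, loss weighting, and data below are placeholders, not the paper's network.

```python
# Hypothetical shape of the joint objective for integrated detection.
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Conv3d(1, 16, 3, padding=1), nn.ReLU(),
                         nn.AdaptiveAvgPool3d(1), nn.Flatten())
screen_head = nn.Linear(16, 1)        # candidate screening (nodule vs. not)
fp_head = nn.Linear(16, 1)            # false-positive reduction

x = torch.randn(8, 1, 32, 32, 32)     # toy CT patches
y_cand = torch.randint(0, 2, (8, 1)).float()
y_fp = torch.randint(0, 2, (8, 1)).float()

feat = backbone(x)
bce = nn.BCEWithLogitsLoss()
loss = bce(screen_head(feat), y_cand) + 0.5 * bce(fp_head(feat), y_fp)
loss.backward()                        # both heads update the shared backbone
print(float(loss))
```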

Visual recognition in the wild by sampling deep similarity functions

Title Visual recognition in the wild by sampling deep similarity functions
Authors Mikhail Usvyatsov, Konrad Schindler
Abstract Recognising relevant objects or object states in its environment is a basic capability of an autonomous robot. The dominant approach to object recognition in images and range images is classification by supervised machine learning, nowadays mostly with deep convolutional neural networks (CNNs). This works well for target classes whose variability can be completely covered with training examples. However, a robot moving in the wild, i.e., in an environment that is not known at the time the recognition system is trained, will often face domain shift: the training data cannot be assumed to exhaustively cover all the within-class variability that will be encountered in the test data. In that situation, learning is in principle possible, since the training set does capture the defining properties of, and dissimilarities between, the target classes. But directly training a CNN to predict class probabilities is prone to overfitting to irrelevant correlations between the class labels and the specific subset of the target class that is represented in the training set. We explore the idea of instead learning a Siamese CNN that acts as a similarity function between pairs of training examples. Class predictions are then obtained by measuring the similarities between a new test instance and the training samples. We show that the CNN embedding correctly recovers the relative similarities to arbitrary class exemplars in the training set, and that therefore a few randomly picked training exemplars are sufficient to achieve good predictions, making the procedure efficient.
Tasks Object Recognition
Published 2019-03-15
URL http://arxiv.org/abs/1903.06837v1
PDF http://arxiv.org/pdf/1903.06837v1.pdf
PWC https://paperswithcode.com/paper/visual-recognition-in-the-wild-by-sampling
Repo
Framework
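The prediction rule is simple once the embedding exists: embed a query and a few randomly picked exemplars per class, and assign the class of the most similar exemplar. In the sketch below, random vectors stand in for CNN embeddings and negative Euclidean distance stands in for the learned similarity, so it illustrates the rule, not the paper's network.

```python
# Hypothetical exemplar-based prediction in an embedding space.
import numpy as np

rng = np.random.default_rng(0)
classes = ["mug", "book", "plant"]
# A few exemplar embeddings per class (in practice: CNN(exemplar image)).
exemplars = {c: rng.normal(i, 0.3, size=(5, 64)) for i, c in enumerate(classes)}

def similarity(a, b):
    return -np.linalg.norm(a - b)      # stand-in for the learned similarity

query = rng.normal(1.0, 0.3, size=64)  # an embedding near the "book" cluster
scores = {c: max(similarity(query, e) for e in embs)
          for c, embs in exemplars.items()}
print("predicted class:", max(scores, key=scores.get))
```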

Real-time 3D Face-Eye Performance Capture of a Person Wearing VR Headset

Title Real-time 3D Face-Eye Performance Capture of a Person Wearing VR Headset
Authors Guoxian Song, Jianfei Cai, Tat-Jen Cham, Jianmin Zheng, Juyong Zhang, Henry Fuchs
Abstract Teleconferencing or telepresence based on virtual reality (VR) head-mounted display (HMD) devices is a promising application, since HMDs can provide immersive experiences for users. However, in order to facilitate face-to-face communication for HMD users, real-time 3D facial performance capture of a person wearing an HMD is needed, which is very challenging due to the large occlusion caused by the HMD. The few existing solutions are complex in either setup or approach, and lack capture of 3D eye-gaze movement. In this paper, we propose a convolutional neural network (CNN) based solution for real-time 3D face-eye performance capture of HMD users without complex modification to devices. To address the lack of training data, we generate a massive dataset of HMD face-label pairs by data synthesis and collect a VR-IR eye dataset from multiple subjects. We then train a dense-fitting network for the facial region and an eye gaze network to regress 3D eye model parameters. Extensive experimental results demonstrate that our system can efficiently and effectively produce, in real time, a vivid personalized 3D avatar with the correct identity, pose, expression, and eye motion corresponding to the HMD user.
Tasks
Published 2019-01-21
URL http://arxiv.org/abs/1901.06765v1
PDF http://arxiv.org/pdf/1901.06765v1.pdf
PWC https://paperswithcode.com/paper/real-time-3d-face-eye-performance-capture-of
Repo
Framework

Model-Driven Deep Learning for Joint MIMO Channel Estimation and Signal Detection

Title Model-Driven Deep Learning for Joint MIMO Channel Estimation and Signal Detection
Authors Hengtao He, Chao-Kai Wen, Shi Jin, Geoffrey Ye Li
Abstract In this paper, we investigate model-driven deep learning (DL) for joint MIMO channel estimation and signal detection (JCESD), where signal detection accounts for channel estimation error and channel statistics, while channel estimation is refined by the detected data and takes the signal detection error into consideration. In particular, the MIMO signal detector is designed by unfolding an iterative algorithm and adding some trainable parameters. Since the number of trainable parameters is much smaller than in a data-driven DL-based signal detector, the model-driven DL-based MIMO signal detector can be rapidly trained with a much smaller data set. Furthermore, the proposed signal detector can easily be extended to soft-input soft-output detection. Numerical results show that the model-driven DL-based JCESD scheme significantly improves the performance of the corresponding traditional iterative detector, and the signal detector exhibits superior robustness to signal-to-noise ratio (SNR) and channel correlation mismatches.
Tasks
Published 2019-07-22
URL https://arxiv.org/abs/1907.09439v1
PDF https://arxiv.org/pdf/1907.09439v1.pdf
PWC https://paperswithcode.com/paper/model-driven-deep-learning-for-joint-mimo
Repo
Framework
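"Unfolding an iterative algorithm and adding some trainable parameters" has a compact generic form: fix the number of iterations, write each iteration as a network layer, and make a few scalars per layer learnable. The sketch below unfolds plain gradient-descent detection of x from y = Hx + n with one trainable step size per layer; the paper's detector unfolds a more sophisticated iteration, so this shows only the pattern.

```python
# Minimal unfolded-iteration detector (assumption: gradient descent on
# ||y - Hx||^2 stands in for the paper's iterative algorithm).
import torch
import torch.nn as nn

class UnfoldedDetector(nn.Module):
    def __init__(self, n_layers=5):
        super().__init__()
        # The only trainable parameters: one step size per unfolded layer.
        self.steps = nn.Parameter(torch.full((n_layers,), 0.05))

    def forward(self, y, H):
        x = torch.zeros(H.shape[1])
        for a in self.steps:                  # unfolded iterations
            x = x - a * H.T @ (H @ x - y)     # one gradient step per layer
        return x

torch.manual_seed(0)
H = torch.randn(8, 4)                         # toy channel matrix
x_true = torch.randn(4)
y = H @ x_true + 0.01 * torch.randn(8)

det = UnfoldedDetector()
x_hat = det(y, H)
print("residual:", float(torch.norm(x_hat - x_true)))
# Training would fit `steps` on simulated (y, H, x_true) triples.
```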

The cost-free nature of optimally tuning Tikhonov regularizers and other ordered smoothers

Title The cost-free nature of optimally tuning Tikhonov regularizers and other ordered smoothers
Authors Pierre C Bellec, Dana Yang
Abstract We consider the problem of selecting the best estimator among a family of Tikhonov regularized estimators, or, alternatively, of selecting a linear combination of these regularizers that is as good as the best regularizer in the family. Our theory reveals that if the Tikhonov regularizers share the same penalty matrix with different tuning parameters, a convex procedure based on $Q$-aggregation achieves the mean square error of the best estimator, up to a small error term no larger than $C\sigma^2$, where $\sigma^2$ is the noise level and $C>0$ is an absolute constant. Remarkably, the error term does not depend on the penalty matrix or the number of estimators, as long as they share the same penalty matrix; i.e., it applies to any grid of tuning parameters, no matter how large the cardinality of the grid. This reveals the surprising “cost-free” nature of optimally tuning Tikhonov regularizers, in striking contrast with the existing literature on aggregation of estimators, where one typically has to pay a cost of $\sigma^2\log(M)$, where $M$ is the number of estimators in the family. The result holds, more generally, for any family of ordered linear smoothers; this encompasses ridge regression as well as principal component regression. The result is extended to the problem of tuning Tikhonov regularizers with different penalty matrices.
Tasks
Published 2019-05-29
URL https://arxiv.org/abs/1905.12517v1
PDF https://arxiv.org/pdf/1905.12517v1.pdf
PWC https://paperswithcode.com/paper/the-cost-free-nature-of-optimally-tuning
Repo
Framework
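Written out, the oracle inequality the abstract paraphrases looks as follows; the notation here is assumed for illustration rather than copied from the paper.

```latex
% For Tikhonov estimators \hat\mu_1,\dots,\hat\mu_M sharing one penalty
% matrix (different tuning parameters), the Q-aggregation estimate
% \hat\mu satisfies, for an absolute constant C > 0,
\[
  \mathbb{E}\,\|\hat{\mu} - \mu\|^2
  \;\le\; \min_{j=1,\dots,M} \mathbb{E}\,\|\hat{\mu}_j - \mu\|^2 \;+\; C\sigma^2 ,
\]
% with no \log M term -- the "cost-free" contrast with the usual
% \sigma^2 \log(M) price of aggregating M estimators.
```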

Neural Population Coding for Effective Temporal Classification

Title Neural Population Coding for Effective Temporal Classification
Authors Zihan Pan, Jibin Wu, Yansong Chua, Malu Zhang, Haizhou Li
Abstract Neural encoding plays an important role in faithfully describing temporally rich patterns, such as human speech and environmental sounds. For tasks that involve classifying such spatio-temporal patterns with Spiking Neural Networks (SNNs), how these patterns are encoded directly influences the difficulty of the task. In this paper, we compare several existing temporal and population coding schemes and evaluate them on both speech (TIDIGITS) and sound (RWCP) datasets. We show that, with population neural codings, the encoded patterns are linearly separable using a Support Vector Machine (SVM). We note that population neural codings effectively project the temporal information onto the spatial domain, thus improving linear separability in the spatial dimension and achieving accuracies of 95% and 100% for the TIDIGITS and RWCP datasets, respectively, when classified with the SVM. This observation suggests that an effective neural coding scheme greatly simplifies the classification problem, such that a simple linear classifier suffices. The datasets are then classified using the Tempotron, an SNN-based classifier. The SNN classification results agree with the SVM findings that population neural codings help improve classification accuracy. Hence, for an SNN designed to recognize spatio-temporal patterns, effective neural encoding is just as important as the learning algorithm. It is an often neglected but powerful abstraction that deserves further study.
Tasks
Published 2019-09-12
URL https://arxiv.org/abs/1909.08018v2
PDF https://arxiv.org/pdf/1909.08018v2.pdf
PWC https://paperswithcode.com/paper/neural-population-coding-for-effective
Repo
Framework
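The core observation, that a population code spreads a stimulus across many tuned neurons and makes the pattern linearly separable, survives even in a one-dimensional toy. Below, class 0 sits at both ends of the range and class 1 in the middle, so a single scalar is not linearly separable, while the Gaussian-receptive-field population code is. The encoding parameters are invented; the paper evaluates richer schemes on real speech and sound data.

```python
# Hypothetical 1-D population-coding demo with a linear SVM.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.2, 0.04, 100), rng.normal(0.8, 0.04, 100),
                    rng.normal(0.5, 0.04, 200)])
y = np.array([0] * 200 + [1] * 200)    # class 0 at the ends, class 1 in the middle

centers = np.linspace(0, 1, 20)        # 20 "neurons" tiling the input range
sigma = 0.08
pop_code = np.exp(-((x[:, None] - centers) ** 2) / (2 * sigma**2))

for name, feats in [("raw scalar", x[:, None]), ("population code", pop_code)]:
    X_tr, X_te, y_tr, y_te = train_test_split(feats, y, random_state=0)
    acc = LinearSVC().fit(X_tr, y_tr).score(X_te, y_te)
    print(f"{name:15s} linear accuracy: {acc:.2f}")
```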

Image Privacy Prediction Using Deep Neural Networks

Title Image Privacy Prediction Using Deep Neural Networks
Authors Ashwini Tonge, Cornelia Caragea
Abstract Images today are increasingly shared online on social networking sites such as Facebook, Flickr, Foursquare, and Instagram. Although current social networking sites allow users to change their privacy preferences, this is often a cumbersome task for the vast majority of users on the Web, who face difficulties in assigning and managing privacy settings. Thus, automatically predicting an image's privacy, to warn users about private or sensitive content before they upload it to social networking sites, has become a necessity in our current interconnected world. In this paper, we explore learning models that automatically predict an image's privacy as private or public using carefully identified image-specific features. We study deep visual semantic features derived from various layers of Convolutional Neural Networks (CNNs) as well as textual features such as user tags and deep tags generated from deep CNNs. In particular, we extract deep (visual and tag) features from four pre-trained CNN architectures for object recognition, i.e., AlexNet, GoogLeNet, VGG-16, and ResNet, and compare their performance for image privacy prediction. Results of our experiments on a Flickr dataset of over thirty thousand images show that learning models trained on features extracted from ResNet outperform the state-of-the-art models for image privacy prediction. We further investigate the combination of user tags and deep tags derived from CNN architectures using two settings: (1) an SVM on bag-of-tags features; and (2) a text-based CNN. Our results show that even though the models trained on visual features perform better than those trained on tag features, the combination of deep visual features with image tags improves performance over the individual feature sets.
Tasks Object Recognition
Published 2019-03-08
URL http://arxiv.org/abs/1903.03695v1
PDF http://arxiv.org/pdf/1903.03695v1.pdf
PWC https://paperswithcode.com/paper/image-privacy-prediction-using-deep-neural
Repo
Framework
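The strongest pipeline the abstract reports, frozen pre-trained CNN features plus a classifier, can be sketched in a few lines: expose a ResNet's penultimate activations and train an SVM on them. The random image tensors and toy labels below are stand-ins; real use would load and preprocess the Flickr photos.

```python
# Hypothetical ResNet-features-plus-SVM privacy classifier.
import torch
from torchvision.models import resnet50, ResNet50_Weights
from sklearn.svm import SVC

model = resnet50(weights=ResNet50_Weights.DEFAULT)  # downloads weights once
model.fc = torch.nn.Identity()         # expose the 2048-d penultimate layer
model.eval()

images = torch.randn(16, 3, 224, 224)  # stand-in for preprocessed photos
labels = [0] * 8 + [1] * 8             # 0 = public, 1 = private (toy)

with torch.no_grad():
    feats = model(images).numpy()      # (16, 2048) deep visual features

clf = SVC().fit(feats, labels)
print("train accuracy (toy):", clf.score(feats, labels))
```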