April 3, 2020

3321 words 16 mins read

Paper Group ANR 50



MANet: Multimodal Attention Network based Point- View fusion for 3D Shape Recognition

Title MANet: Multimodal Attention Network based Point- View fusion for 3D Shape Recognition
Authors Yaxin Zhao, Jichao Jiao, Tangkun Zhang
Abstract 3D shape recognition has attracted more and more attention as a task of 3D vision research. The proliferation of 3D data encourages various deep learning methods based on 3D data. Many deep learning models now operate on point-cloud data or multi-view data alone; however, in the era of big data, integrating data of the two different modalities to obtain a unified 3D shape descriptor is bound to improve the recognition accuracy. Therefore, this paper proposes a fusion network based on a multimodal attention mechanism for 3D shape recognition. Considering the limitations of multi-view data, we introduce a soft attention scheme that uses the global point-cloud features to filter the multi-view features, and thereby realizes an effective fusion of the two. More specifically, we obtain enhanced multi-view features by mining the contribution of each multi-view image to the overall shape recognition, and then fuse the point-cloud features and the enhanced multi-view features to obtain a more discriminative 3D shape descriptor. We have performed experiments on the ModelNet40 dataset, and the experimental results verify the effectiveness of our method.
Tasks 3D Shape Recognition
Published 2020-02-28
URL https://arxiv.org/abs/2002.12573v1
PDF https://arxiv.org/pdf/2002.12573v1.pdf
PWC https://paperswithcode.com/paper/manet-multimodal-attention-network-based
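
The abstract's soft attention scheme can be sketched in a few lines: score each view feature against the global point-cloud feature, softmax the scores into per-view attention weights, and fuse. This is a minimal numpy reading, not the paper's implementation; the bilinear scoring matrix `W` and all dimensions are illustrative assumptions.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over the last axis."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def fuse_point_view(global_pc, view_feats, W):
    """Score each view against the global point-cloud feature, softmax the
    scores into attention weights, and concatenate the weighted view
    summary with the point-cloud feature into one descriptor."""
    scores = view_feats @ W @ global_pc          # (n_views,)
    weights = softmax(scores)                    # attention over views
    enhanced = weights @ view_feats              # weighted view summary
    return np.concatenate([global_pc, enhanced]), weights

rng = np.random.default_rng(0)
d_pc, d_view, n_views = 8, 6, 12
global_pc = rng.normal(size=d_pc)
view_feats = rng.normal(size=(n_views, d_view))
W = rng.normal(size=(d_view, d_pc)) * 0.1

descriptor, weights = fuse_point_view(global_pc, view_feats, W)
```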

Curvature Regularized Surface Reconstruction from Point Cloud

Title Curvature Regularized Surface Reconstruction from Point Cloud
Authors Yuchen He, Sung Ha Kang, Hao Liu
Abstract We propose a variational functional and fast algorithms to reconstruct an implicit surface from point cloud data with a curvature constraint. The minimizing functional balances the distance function from the point cloud and the mean curvature term. Only the point locations are used, without any local normal or curvature estimation at each point. With the added curvature constraint, the computation becomes particularly challenging. To enhance the computational efficiency, we solve the problem by a novel operator splitting scheme. It replaces the original high-order PDEs by a decoupled PDE system, which is solved by a semi-implicit method. We also discuss an approach using an augmented Lagrangian method. The proposed method shows robustness against noise, and recovers concave features and sharp corners better than models without the curvature constraint. Numerical experiments on two- and three-dimensional data sets, including noisy and sparse data, are presented to validate the model.
Published 2020-01-22
URL https://arxiv.org/abs/2001.07884v1
PDF https://arxiv.org/pdf/2001.07884v1.pdf
PWC https://paperswithcode.com/paper/curvature-regularized-surface-reconstruction
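
One plausible way to write the functional described in the abstract (the notation is ours, not necessarily the paper's: $d$ is the unsigned distance to the point cloud, $\phi$ the level-set function with surface $\{\phi=0\}$, $H$ the mean curvature, and $\lambda$ a balance weight):

```latex
% One plausible reading: fit the zero level set of \phi to the data
% while penalizing mean curvature; notation is illustrative.
\min_{\phi}\; E(\phi) =
  \int_{\Omega} d(\mathbf{x})\,\delta(\phi)\,|\nabla\phi|\,d\mathbf{x}
  \;+\; \lambda \int_{\Omega} |H(\phi)|\,\delta(\phi)\,|\nabla\phi|\,d\mathbf{x},
\qquad
H(\phi) = \nabla\cdot\!\left(\frac{\nabla\phi}{|\nabla\phi|}\right).
```

The curvature term is what makes the Euler-Lagrange equation high-order, motivating the operator splitting the abstract describes.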

Highly Efficient Salient Object Detection with 100K Parameters

Title Highly Efficient Salient Object Detection with 100K Parameters
Authors Shang-Hua Gao, Yong-Qiang Tan, Ming-Ming Cheng, Chengze Lu, Yunpeng Chen, Shuicheng Yan
Abstract Salient object detection models often demand a considerable amount of computation to make precise predictions for each pixel, making them hardly applicable to low-power devices. In this paper, we aim to relieve the contradiction between computation cost and model performance by improving the network efficiency to a higher degree. We propose a flexible convolutional module, namely generalized OctConv (gOctConv), to efficiently utilize both in-stage and cross-stage multi-scale features, while reducing the representation redundancy by a novel dynamic weight decay scheme. The dynamic weight decay scheme stably boosts the sparsity of parameters during training and supports a learnable number of channels for each scale in gOctConv, allowing an 80% reduction in parameters with negligible performance drop. Utilizing gOctConv, we build an extremely lightweight model, namely CSNet, which achieves performance comparable to large models with only about 0.2% of their parameters (100k) on popular salient object detection benchmarks.
Tasks Object Detection, Salient Object Detection
Published 2020-03-12
URL https://arxiv.org/abs/2003.05643v1
PDF https://arxiv.org/pdf/2003.05643v1.pdf
PWC https://paperswithcode.com/paper/highly-efficient-salient-object-detection
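
The abstract does not spell out the dynamic weight decay scheme; a common route to a "learnable number of channels" is to sparsify per-channel scaling factors during training and prune channels whose factors fall below a threshold (as in network slimming). A hypothetical numpy sketch of only the pruning step, with all names and shapes illustrative:

```python
import numpy as np

def prune_channels(weights, scales, thresh=1e-2):
    """Keep only output channels whose learned scale exceeds thresh.
    weights: (out_ch, in_ch, k, k) conv kernel; scales: (out_ch,)."""
    keep = np.abs(scales) > thresh
    return weights[keep], scales[keep], int(keep.sum())

rng = np.random.default_rng(1)
w = rng.normal(size=(32, 16, 3, 3))
s = rng.normal(size=32) * 0.02        # mostly small, as after sparsity training
w2, s2, n_kept = prune_channels(w, s)
```

After pruning, the surviving channel count per scale is what the layer "learned", rather than a hand-picked hyperparameter.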

Overview of the CCKS 2019 Knowledge Graph Evaluation Track: Entity, Relation, Event and QA

Title Overview of the CCKS 2019 Knowledge Graph Evaluation Track: Entity, Relation, Event and QA
Authors Xianpei Han, Zhichun Wang, Jiangtao Zhang, Qinghua Wen, Wenqi Li, Buzhou Tang, Qi Wang, Zhifan Feng, Yang Zhang, Yajuan Lu, Haitao Wang, Wenliang Chen, Hao Shao, Yubo Chen, Kang Liu, Jun Zhao, Taifeng Wang, Kezun Zhang, Meng Wang, Yinlin Jiang, Guilin Qi, Lei Zou, Sen Hu, Minhao Zhang, Yinnian Lin
Abstract A knowledge graph models world knowledge as concepts, entities, and the relationships between them, and has been widely used in many real-world tasks. CCKS 2019 held an evaluation track with 6 tasks that attracted more than 1,600 teams. In this paper, we give an overview of the knowledge graph evaluation track at CCKS 2019. By reviewing the task definitions, successful methods, useful resources, good strategies, and research challenges associated with each task, this paper can serve as a helpful reference for developing knowledge graph applications and conducting future knowledge graph research.
Published 2020-03-09
URL https://arxiv.org/abs/2003.03875v1
PDF https://arxiv.org/pdf/2003.03875v1.pdf
PWC https://paperswithcode.com/paper/overview-of-the-ccks-2019-knowledge-graph

Preparation of ordered states in ultra-cold gases using Bayesian optimization

Title Preparation of ordered states in ultra-cold gases using Bayesian optimization
Authors Rick Mukherjee, Frederic Sauvage, Harry Xie, Robert Löw, Florian Mintert
Abstract Ultra-cold atomic gases are unique in terms of the degree of controllability, both for internal and external degrees of freedom. This makes it possible to use them for the study of complex quantum many-body phenomena. However, in many scenarios the prerequisite of faithfully preparing a desired quantum state despite decoherence and system imperfections is not adequately met. To pave the way to a specific target state, we explore a quantum optimal control framework based on Bayesian optimization. The probabilistic modeling and broad exploration aspects of Bayesian optimization are particularly suitable for quantum experiments where data acquisition can be expensive. Using numerical simulations of the superfluid to Mott-insulator transition for bosons in a lattice, as well as of the formation of Rydberg crystals, as explicit examples, we demonstrate that Bayesian optimization is capable of finding better control solutions from finite and noisy data compared to existing methods of optimal control.
Published 2020-01-10
URL https://arxiv.org/abs/2001.03520v2
PDF https://arxiv.org/pdf/2001.03520v2.pdf
PWC https://paperswithcode.com/paper/preparation-of-ordered-states-in-ultra-cold
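
The core loop of Bayesian optimization, evaluate, fit a Gaussian-process surrogate, pick the next control setting by an acquisition rule, can be sketched with plain numpy. This toy uses a lower-confidence-bound acquisition and a stand-in 1-D objective; the paper's actual control parametrization and acquisition function may differ.

```python
import numpy as np

def rbf(a, b, ls=0.3):
    """Squared-exponential kernel on 1-D inputs."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

def gp_posterior(X, y, Xs, noise=1e-4):
    """Exact GP posterior mean and variance at query points Xs."""
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xs)
    Kinv = np.linalg.inv(K)
    mu = Ks.T @ Kinv @ y
    var = 1.0 - np.sum(Ks * (Kinv @ Ks), axis=0)   # prior var is 1
    return mu, np.maximum(var, 1e-12)

def objective(x):
    """Toy stand-in for a noisy state-preparation infidelity."""
    return np.sin(3 * x) + 0.5 * x ** 2

grid = np.linspace(-2, 2, 401)
X = np.array([-1.5, 0.0, 1.5])            # initial control settings
y = objective(X)
for _ in range(10):                        # GP-LCB loop: explore, then exploit
    mu, var = gp_posterior(X, y, grid)
    acq = mu - 2.0 * np.sqrt(var)          # lower confidence bound
    xn = grid[np.argmin(acq)]
    X, y = np.append(X, xn), np.append(y, objective(xn))
best = X[np.argmin(y)]
```

The appeal for expensive quantum experiments is that each new measurement is chosen where the surrogate is either promising or uncertain, rather than on a fixed grid.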

Training-free Monocular 3D Event Detection System for Traffic Surveillance

Title Training-free Monocular 3D Event Detection System for Traffic Surveillance
Authors Lijun Yu, Peng Chen, Wenhe Liu, Guoliang Kang, Alexander G. Hauptmann
Abstract We focus on the problem of detecting traffic events in a surveillance scenario, including the detection of both vehicle actions and traffic collisions. Existing event detection systems are mostly learning-based and have achieved convincing performance when a large amount of training data is available. However, in real-world scenarios, collecting sufficient labeled training data is expensive and sometimes impossible (e.g. for traffic collision detection). Moreover, the conventional 2D representation of surveillance views is easily affected by occlusions and differing camera views. To deal with the aforementioned problems, in this paper, we propose a training-free monocular 3D event detection system for traffic surveillance. Our system first projects the vehicles into 3D Euclidean space and estimates their kinematic states. Then we develop multiple simple yet effective ways to identify the events based on the kinematic patterns, which need no further training. Consequently, our system is robust to occlusions and viewpoint changes. Extensive experiments report the superior results of our method on large-scale real-world surveillance datasets, validating the effectiveness of our proposed system.
Published 2020-02-01
URL https://arxiv.org/abs/2002.00137v1
PDF https://arxiv.org/pdf/2002.00137v1.pdf
PWC https://paperswithcode.com/paper/training-free-monocular-3d-event-detection
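
Rule-based event detection over estimated 3D kinematic states might look like the following; the thresholds, sampling interval, and the collision heuristic (proximity plus hard braking) are illustrative assumptions, not the paper's exact rules.

```python
import numpy as np

def speeds(track, dt=0.1):
    """Per-step speed (m/s) from an (n, 3) array of 3D positions."""
    return np.linalg.norm(np.diff(track, axis=0), axis=1) / dt

def detect_stop(track, dt=0.1, v_stop=0.5):
    """Vehicle 'stopping' event: speed falls below v_stop at some step."""
    return bool((speeds(track, dt) < v_stop).any())

def detect_collision(t1, t2, dt=0.1, d_min=2.0, decel=5.0):
    """Collision heuristic: the vehicles come within d_min metres while
    at least one decelerates faster than decel (m/s^2)."""
    close = np.linalg.norm(t1 - t2, axis=1) < d_min
    a1 = np.diff(speeds(t1, dt)) / dt
    a2 = np.diff(speeds(t2, dt)) / dt
    hard_brake = np.minimum(a1, a2) < -decel
    return bool((close[2:] & hard_brake).any())   # align accel with position

# A car driving at 10 m/s into a parked car at the origin, then stopping.
t_hit = np.array([[-5, 0, 0], [-4, 0, 0], [-3, 0, 0], [-2, 0, 0],
                  [-1, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]], float)
t_parked = np.zeros((8, 3))
collided = detect_collision(t_hit, t_parked)
```

No training data is involved; the rules fire directly on the recovered 3D trajectories, which is what makes the approach robust to camera viewpoint.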

Efficient Crowd Counting via Structured Knowledge Transfer

Title Efficient Crowd Counting via Structured Knowledge Transfer
Authors Lingbo Liu, Jiaqi Chen, Hefeng Wu, Tianshui Chen, Guanbin Li, Liang Lin
Abstract Crowd counting is an application-oriented task and its inference efficiency is crucial for real-world applications. However, most previous works relied on heavy backbone networks and required prohibitive runtimes, which would seriously restrict their deployment scopes and cause poor scalability. To liberate these crowd counting models, we propose a novel Structured Knowledge Transfer (SKT) framework integrating two complementary transfer modules, which can generate a lightweight but still highly effective student network by fully exploiting the structured knowledge of a well-trained teacher network. Specifically, an Intra-Layer Pattern Transfer sequentially distills the knowledge embedded in single-layer features of the teacher network to guide feature learning of the student network. Simultaneously, an Inter-Layer Relation Transfer densely distills the cross-layer correlation knowledge of the teacher to regularize the student’s feature evolution. In this way, our student network can learn compact and knowledgeable features, yielding high efficiency and competitive performance. Extensive evaluations on three benchmarks well demonstrate the knowledge transfer effectiveness of our SKT for extensive crowd counting models. In particular, with only one-sixteenth of the parameters and computation cost of the original models, our distilled VGG-based models obtain at least a 6.5$\times$ speed-up on an Nvidia 1080 GPU and even achieve state-of-the-art performance.
Tasks Crowd Counting, Transfer Learning
Published 2020-03-23
URL https://arxiv.org/abs/2003.10120v1
PDF https://arxiv.org/pdf/2003.10120v1.pdf
PWC https://paperswithcode.com/paper/efficient-crowd-counting-via-structured
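
The two transfer modules can be caricatured as two losses: a per-layer feature match (intra-layer) and a match of the cross-layer similarity structure (inter-layer). A numpy sketch under the simplifying assumption, ours, not the paper's, that teacher and student features are already projected to a common shape:

```python
import numpy as np

def intra_layer_loss(f_t, f_s):
    """Match student features to teacher features at one layer (MSE)."""
    return float(np.mean((f_t - f_s) ** 2))

def inter_layer_loss(feats_t, feats_s):
    """Match cross-layer correlation structure: build a layer-by-layer
    cosine-similarity matrix for teacher and student, then MSE them."""
    def corr(feats):
        F = np.stack([f.ravel() / np.linalg.norm(f) for f in feats])
        return F @ F.T
    return float(np.mean((corr(feats_t) - corr(feats_s)) ** 2))

rng = np.random.default_rng(0)
ft = [rng.normal(size=(8, 4)) for _ in range(3)]   # 3 teacher layers
fs = [rng.normal(size=(8, 4)) for _ in range(3)]   # 3 student layers
li = intra_layer_loss(ft[0], fs[0])
lr = inter_layer_loss(ft, fs)
```

The inter-layer term is what lets the student inherit how the teacher's representation evolves from layer to layer, not just what each layer looks like.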

Distilling Knowledge from Graph Convolutional Networks

Title Distilling Knowledge from Graph Convolutional Networks
Authors Yiding Yang, Jiayan Qiu, Mingli Song, Dacheng Tao, Xinchao Wang
Abstract Existing knowledge distillation methods focus on convolutional neural networks (CNNs), where the input samples like images lie in a grid domain, and have largely overlooked graph convolutional networks (GCNs) that handle non-grid data. In this paper, we propose, to the best of our knowledge, the first dedicated approach to distilling knowledge from a pre-trained GCN model. To enable the knowledge transfer from the teacher GCN to the student, we propose a local structure preserving module that explicitly accounts for the topological semantics of the teacher. In this module, the local structure information from both the teacher and the student is extracted as distributions, and hence minimizing the distance between these distributions enables topology-aware knowledge transfer from the teacher, yielding a compact yet high-performance student model. Moreover, the proposed approach is readily extendable to dynamic graph models, where the input graphs for the teacher and the student may differ. We evaluate the proposed method on two different datasets using GCN models of different architectures, and demonstrate that our method achieves the state-of-the-art knowledge distillation performance for GCN models.
Tasks Transfer Learning
Published 2020-03-23
URL https://arxiv.org/abs/2003.10477v3
PDF https://arxiv.org/pdf/2003.10477v3.pdf
PWC https://paperswithcode.com/paper/distillating-knowledge-from-graph
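
The local structure preserving idea, describe each node by a distribution over its neighbours and make the student match the teacher's distributions, can be sketched as follows; the softmax-of-negative-distance form and the KL direction are our assumptions about one reasonable instantiation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def local_structure(emb, adj, i):
    """Distribution over node i's neighbours: softmax of negative
    embedding distances, a soft description of the local topology."""
    nbrs = np.flatnonzero(adj[i])
    d = np.linalg.norm(emb[nbrs] - emb[i], axis=1)
    return nbrs, softmax(-d)

def lsp_loss(emb_t, emb_s, adj):
    """KL(teacher || student) of the local structure, summed over nodes."""
    loss = 0.0
    for i in range(len(adj)):
        _, p = local_structure(emb_t, adj, i)
        _, q = local_structure(emb_s, adj, i)
        loss += np.sum(p * np.log(p / q))
    return float(loss)

# A 4-node ring graph with random teacher and student embeddings.
adj = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]])
rng = np.random.default_rng(0)
emb_t = rng.normal(size=(4, 5))
emb_s = rng.normal(size=(4, 5))
loss = lsp_loss(emb_t, emb_s, adj)
```

Because the loss only needs neighbour lists, not grid coordinates, it carries over to graphs where the teacher's and student's inputs differ.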

A Robotic 3D Perception System for Operating Room Environment Awareness

Title A Robotic 3D Perception System for Operating Room Environment Awareness
Authors Zhaoshuo Li, Amirreza Shaban, Jean-Gabriel Simard, Dinesh Rabindran, Simon DiMaio, Omid Mohareri
Abstract Purpose: We describe a 3D multi-view perception system for the da Vinci surgical system to enable operating room (OR) scene understanding and context awareness. Methods: Our proposed system is comprised of four Time-of-Flight (ToF) cameras rigidly attached to strategic locations on the da Vinci Xi patient side cart (PSC). The cameras are registered to the robot’s kinematic chain by performing a one-time calibration routine; therefore, information from all cameras can be fused and represented in one common coordinate frame. Based on this architecture, a multi-view 3D scene semantic segmentation algorithm is created to enable recognition of common and salient objects/equipment and surgical activities in a da Vinci OR. Our proposed 3D semantic segmentation method has been trained and validated on a novel densely annotated dataset captured from clinical scenarios. Results: The results show that our proposed architecture has acceptable registration error ($3.3\%\pm1.4\%$ of object-camera distance) and can robustly improve scene segmentation performance (mean Intersection over Union, mIoU) for less frequently appearing classes ($\ge 0.013$) compared to a single-view method. Conclusion: We present the first dynamic multi-view perception system with a novel segmentation architecture, which can be used as a building block technology for applications such as surgical workflow analysis, automation of surgical sub-tasks and advanced guidance systems.
Tasks 3D Semantic Segmentation, Calibration, Scene Segmentation, Scene Understanding, Semantic Segmentation
Published 2020-03-20
URL https://arxiv.org/abs/2003.09487v2
PDF https://arxiv.org/pdf/2003.09487v2.pdf
PWC https://paperswithcode.com/paper/a-robotic-3d-perception-system-for-operating
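
Fusing the four ToF cameras into one coordinate frame is a standard rigid-transform operation once the one-time calibration has produced a 4x4 homogeneous transform per camera; a minimal numpy sketch (transforms and clouds here are toy values):

```python
import numpy as np

def to_base_frame(points_cam, T_base_cam):
    """Map (n, 3) points from a camera frame into the robot base frame
    using a 4x4 homogeneous transform from calibration."""
    homo = np.hstack([points_cam, np.ones((len(points_cam), 1))])
    return (T_base_cam @ homo.T).T[:, :3]

def fuse(clouds, transforms):
    """Fuse per-camera clouds into one cloud in the common base frame."""
    return np.vstack([to_base_frame(c, T) for c, T in zip(clouds, transforms)])

# Translation-only toy calibration: a camera 1, 2, 3 metres from the base.
T = np.eye(4)
T[:3, 3] = [1.0, 2.0, 3.0]
cloud_cam = np.zeros((2, 3))            # two points at that camera's origin
cloud_base = to_base_frame(cloud_cam, T)
```

Everything downstream (the multi-view segmentation) can then treat the fused cloud as if it came from a single sensor.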

Multilayer Dense Connections for Hierarchical Concept Classification

Title Multilayer Dense Connections for Hierarchical Concept Classification
Authors Toufiq Parag, Hongcheng Wang
Abstract Classification is a pivotal function for many computer vision tasks such as object classification, detection, and scene segmentation. Multinomial logistic regression with a single final layer of dense connections has become the ubiquitous technique for CNN-based classification. While these classifiers learn a mapping between the input and a set of output category classes, they do not typically learn comprehensive knowledge about the category. In particular, when a CNN-based image classifier correctly identifies the image of a chimpanzee, it does not know that it is a member of the Primate, Mammal, and Chordate families, and a living thing. We propose a multilayer dense connectivity for a CNN to simultaneously predict the category and its conceptual superclasses in hierarchical order. We experimentally demonstrate that our proposed dense connections, in conjunction with popular convolutional feature layers, can learn to predict the conceptual classes with minimal increase in network size while maintaining the categorical classification accuracy.
Tasks Object Classification, Scene Segmentation
Published 2020-03-19
URL https://arxiv.org/abs/2003.09015v1
PDF https://arxiv.org/pdf/2003.09015v1.pdf
PWC https://paperswithcode.com/paper/multilayer-dense-connections-for-hierarchical
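
One way to read the proposed multilayer dense connections is as one dense head per hierarchy level on a shared feature (e.g. a category head plus a superclass head), trained with a summed cross-entropy. The sketch below is that generic multi-head reading, not the paper's exact connectivity; all sizes are illustrative.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def predict_levels(feat, heads):
    """One dense head per hierarchy level; each returns a distribution
    over that level's classes (e.g. species, then family)."""
    return [softmax(W @ feat + b) for W, b in heads]

def joint_nll(probs, targets):
    """Sum of cross-entropies over levels: category plus superclasses."""
    return float(-sum(np.log(p[t]) for p, t in zip(probs, targets)))

rng = np.random.default_rng(0)
feat = rng.normal(size=16)
heads = [(rng.normal(size=(5, 16)), rng.normal(size=5)),   # 5 fine categories
         (rng.normal(size=(3, 16)), rng.normal(size=3))]   # 3 superclasses
probs = predict_levels(feat, heads)
loss = joint_nll(probs, [2, 1])
```

Training all heads jointly is what makes the network "know" that the chimpanzee is also a primate, a mammal, and so on.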

Intelligent multiscale simulation based on process-guided composite database

Title Intelligent multiscale simulation based on process-guided composite database
Authors Zeliang Liu, Haoyan Wei, Tianyu Huang, C. T. Wu
Abstract In the paper, we present an integrated data-driven modeling framework based on process modeling, material homogenization, mechanistic machine learning, and concurrent multiscale simulation. We are interested in injection-molded short fiber reinforced composites, which have been identified as key material systems in the automotive, aerospace, and electronics industries. The molding process induces spatially varying microstructures across various length scales, while the resulting strongly anisotropic and nonlinear material properties are still challenging to capture with conventional modeling approaches. To prepare the linear elastic training data for our machine learning tasks, Representative Volume Elements (RVEs) with different fiber orientations and volume fractions are generated through stochastic reconstruction. More importantly, we utilize the recently proposed Deep Material Network (DMN) to learn the hidden microscale morphologies from data. With essential physics embedded in its building blocks, this data-driven material model can be extrapolated to predict nonlinear material behaviors efficiently and accurately. Through the transfer learning of DMN, we create a unified process-guided material database that covers a full range of geometric descriptors for short fiber reinforced composites. Finally, this unified DMN database is implemented and coupled with a macroscale finite element model to enable concurrent multiscale simulations. From our perspective, the proposed framework is also promising in many other emergent multiscale engineering systems, such as additive manufacturing and compression molding.
Tasks Transfer Learning
Published 2020-03-20
URL https://arxiv.org/abs/2003.09491v1
PDF https://arxiv.org/pdf/2003.09491v1.pdf
PWC https://paperswithcode.com/paper/intelligent-multiscale-simulation-based-on

Microvasculature Segmentation and Inter-capillary Area Quantification of the Deep Vascular Complex using Transfer Learning

Title Microvasculature Segmentation and Inter-capillary Area Quantification of the Deep Vascular Complex using Transfer Learning
Authors Julian Lo, Morgan Heisler, Vinicius Vanzan, Sonja Karst, Ivana Zadro Matovinovic, Sven Loncaric, Eduardo V. Navajas, Mirza Faisal Beg, Marinko V. Sarunic
Abstract Purpose: Optical Coherence Tomography Angiography (OCT-A) permits visualization of the changes to the retinal circulation due to diabetic retinopathy (DR), a microvascular complication of diabetes. We demonstrate accurate segmentation of the vascular morphology for the superficial capillary plexus and deep vascular complex (SCP and DVC) using a convolutional neural network (CNN) for quantitative analysis. Methods: Retinal OCT-A images with a 6x6 mm field of view (FOV) were acquired using a Zeiss PlexElite. Multiple-volume acquisition and averaging enhanced the vessel network contrast used for training the CNN. We used transfer learning from a CNN trained on 76 images from smaller FOVs of the SCP acquired using different OCT systems. Quantitative analysis of perfusion was performed on the automated vessel segmentations in representative patients with DR. Results: The automated segmentations of the OCT-A images maintained the hierarchical branching and lobular morphologies of the SCP and DVC, respectively. The network segmented the SCP with an accuracy of 0.8599 and a Dice index of 0.8618. For the DVC, the accuracy was 0.7986 and the Dice index was 0.8139. The inter-rater comparisons for the SCP had an accuracy and Dice index of 0.8300 and 0.6700, respectively, and 0.6874 and 0.7416 for the DVC. Conclusions: Transfer learning reduces the number of manually annotated images required, while producing high-quality automatic segmentations of the SCP and DVC. Using high-quality training data preserves the characteristic appearance of the capillary networks in each layer. Translational Relevance: Accurate retinal microvasculature segmentation with the CNN results in improved perfusion analysis in diabetic retinopathy.
Tasks Transfer Learning
Published 2020-03-19
URL https://arxiv.org/abs/2003.09033v1
PDF https://arxiv.org/pdf/2003.09033v1.pdf
PWC https://paperswithcode.com/paper/microvasculature-segmentation-and-inter

Diagnosis of Breast Cancer using Hybrid Transfer Learning

Title Diagnosis of Breast Cancer using Hybrid Transfer Learning
Authors Subrato Bharati, Prajoy Podder
Abstract Breast cancer is a common cancer among women, and its early detection can considerably increase the survival rate. This paper focuses on a transfer learning process to detect breast cancer. Modified VGG (MVGG), residual network, and MobileNet architectures are proposed and implemented, using the DDSM dataset. Experimental results show that our proposed hybrid transfer learning model (a fusion of MVGG16 and ImageNet) provides an accuracy of 88.3% with 15 epochs. By contrast, the modified VGG 16 architecture (MVGG 16) alone provides an accuracy of 80.8%, and MobileNet provides an accuracy of 77.2%. The proposed hybrid pre-trained network thus clearly outperforms the single architectures. It can serve as an effective tool for radiologists to reduce the false negative and false positive rates, thereby improving the efficiency of mammography analysis.
Tasks Transfer Learning
Published 2020-03-23
URL https://arxiv.org/abs/2003.13503v1
PDF https://arxiv.org/pdf/2003.13503v1.pdf
PWC https://paperswithcode.com/paper/diagnosis-of-breast-cancer-using-hybrid
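
The transfer-learning recipe underlying such models, freeze a pretrained feature extractor and train only a new classification head, can be illustrated end to end with a toy numpy stand-in for the backbone (the real work uses MVGG16/MobileNet features on mammograms, not a random projection on synthetic data):

```python
import numpy as np

rng = np.random.default_rng(0)

def backbone(x, W_frozen):
    """Stand-in for a pretrained, frozen feature extractor."""
    return np.tanh(x @ W_frozen)

# Toy task: the label depends only on the sign of the first input feature.
W_frozen = rng.normal(size=(10, 16)) * 0.3
X = rng.normal(size=(200, 10))
y = (X[:, 0] > 0).astype(float)
F = backbone(X, W_frozen)              # features computed once, never updated

# Train only a logistic-regression head on the frozen features.
w, b = np.zeros(16), 0.0
for _ in range(300):
    p = 1.0 / (1.0 + np.exp(-(F @ w + b)))
    g = p - y                          # gradient of the log-loss
    w -= 0.1 * F.T @ g / len(y)
    b -= 0.1 * g.mean()
acc = float((((F @ w + b) > 0) == (y > 0.5)).mean())
```

Because only the head is trained, far fewer labeled mammograms are needed than for training a full network from scratch, which is the point of the transfer-learning setup.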

ENSEI: Efficient Secure Inference via Frequency-Domain Homomorphic Convolution for Privacy-Preserving Visual Recognition

Title ENSEI: Efficient Secure Inference via Frequency-Domain Homomorphic Convolution for Privacy-Preserving Visual Recognition
Authors Song Bian, Tianchen Wang, Masayuki Hiromoto, Yiyu Shi, Takashi Sato
Abstract In this work, we propose ENSEI, a secure inference (SI) framework based on the frequency-domain secure convolution (FDSC) protocol for the efficient execution of privacy-preserving visual recognition. Our observation is that, under the combination of homomorphic encryption and secret sharing, homomorphic convolution can be obliviously carried out in the frequency domain, significantly simplifying the related computations. We provide protocol designs and parameter derivations for number-theoretic transform (NTT) based FDSC. In the experiment, we thoroughly study the accuracy-efficiency trade-offs between time- and frequency-domain homomorphic convolution. With ENSEI, compared to the best known works, we achieve a 5–11x online time reduction, up to a 33x setup time reduction, and up to a 10x reduction in the overall inference time. A further 33% bandwidth reduction can be obtained on binary neural networks with only 1% accuracy degradation on the CIFAR-10 dataset.
Published 2020-03-11
URL https://arxiv.org/abs/2003.05328v1
PDF https://arxiv.org/pdf/2003.05328v1.pdf
PWC https://paperswithcode.com/paper/ensei-efficient-secure-inference-via
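
The enabling observation, convolution becomes pointwise multiplication in the frequency domain, is easy to verify in the clear with a complex FFT; ENSEI itself uses a number-theoretic transform over encrypted data, which this plaintext numpy sketch does not attempt:

```python
import numpy as np

def circ_conv_direct(x, k):
    """Circular convolution by direct summation: O(n^2) multiplies."""
    n = len(x)
    return np.array([sum(x[(i - j) % n] * k[j] for j in range(n))
                     for i in range(n)])

def circ_conv_fft(x, k):
    """Same convolution via the convolution theorem: transform both
    signals, multiply pointwise, transform back."""
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(k)))

rng = np.random.default_rng(0)
x, k = rng.normal(size=64), rng.normal(size=64)
```

Under homomorphic encryption, replacing the many multiplications of direct convolution with one pointwise product per frequency is what drives the reported online-time savings.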

A U-Net Based Discriminator for Generative Adversarial Networks

Title A U-Net Based Discriminator for Generative Adversarial Networks
Authors Edgar Schönfeld, Bernt Schiele, Anna Khoreva
Abstract Among the major remaining challenges for generative adversarial networks (GANs) is the capacity to synthesize globally and locally coherent images with object shapes and textures indistinguishable from real images. To target this issue we propose an alternative U-Net based discriminator architecture, borrowing insights from the segmentation literature. The proposed U-Net based architecture allows the discriminator to provide detailed per-pixel feedback to the generator while maintaining the global coherence of synthesized images by also providing global image feedback. Empowered by the per-pixel response of the discriminator, we further propose a per-pixel consistency regularization technique based on the CutMix data augmentation, encouraging the U-Net discriminator to focus more on semantic and structural changes between real and fake images. This improves the U-Net discriminator training, further enhancing the quality of generated samples. The novel discriminator improves over the state of the art in terms of standard distribution and image quality metrics, enabling the generator to synthesize images with varying structure, appearance and levels of detail while maintaining global and local realism. Compared to the BigGAN baseline, we achieve an average improvement of 2.7 FID points across FFHQ, CelebA, and the newly introduced COCO-Animals dataset.
Tasks Data Augmentation
Published 2020-02-28
URL https://arxiv.org/abs/2002.12655v1
PDF https://arxiv.org/pdf/2002.12655v1.pdf
PWC https://paperswithcode.com/paper/a-u-net-based-discriminator-for-generative
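
The CutMix-based consistency regularization can be sketched as: mix two images under a box mask, and require the discriminator's per-pixel map on the mix to equal the maskwise blend of its maps on the originals. All shapes and the MSE form below are illustrative assumptions.

```python
import numpy as np

def cutmix(img_a, img_b, box):
    """Paste a rectangular region of img_b into img_a; return the mixed
    image and a per-pixel mask (1 where img_b was pasted)."""
    y0, y1, x0, x1 = box
    mixed = img_a.copy()
    mask = np.zeros(img_a.shape[:2])
    mixed[y0:y1, x0:x1] = img_b[y0:y1, x0:x1]
    mask[y0:y1, x0:x1] = 1.0
    return mixed, mask

def consistency_loss(d_mixed, d_a, d_b, mask):
    """Per-pixel regularizer: on the mixed image, the U-Net
    discriminator's map should equal the blend of its maps on the
    two original images."""
    target = mask * d_b + (1 - mask) * d_a
    return float(np.mean((d_mixed - target) ** 2))

rng = np.random.default_rng(0)
img_a, img_b = rng.random((32, 32, 3)), rng.random((32, 32, 3))
mixed, mask = cutmix(img_a, img_b, box=(8, 16, 8, 16))
```

The loss is zero exactly when the discriminator's decision is pixelwise consistent with where real and fake content actually are, which is the behavior the paper wants to encourage.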