April 2, 2020

3088 words 15 mins read

Paper Group ANR 272

Synthesis and Edition of Ultrasound Images via Sketch Guided Progressive Growing GANs. SPARE3D: A Dataset for SPAtial REasoning on Three-View Line Drawings. Video Anomaly Detection for Smart Surveillance. Conversational Search for Learning Technologies. ARA : Aggregated RAPPOR and Analysis for Centralized Differential Privacy. TITAN: Future Forecas …

Synthesis and Edition of Ultrasound Images via Sketch Guided Progressive Growing GANs

Title Synthesis and Edition of Ultrasound Images via Sketch Guided Progressive Growing GANs
Authors Jiamin Liang, Xin Yang, Haoming Li, Yi Wang, Manh The Van, Haoran Dou, Chaoyu Chen, Jinghui Fang, Xiaowen Liang, Zixin Mai, Guowen Zhu, Zhiyi Chen, Dong Ni
Abstract Ultrasound (US) is widely used in the clinic for inspecting anatomical structures. However, lacking resources to practice US scanning, novices often struggle to learn the required operating skills. In addition, in the deep learning era, automated US image analysis is limited by the scarcity of annotated samples. Efficiently synthesizing realistic, editable, high-resolution US images would address both problems. The task is challenging, and previous methods could only partially complete it. In this paper, we devise a new framework for US image synthesis. In particular, we first adopt a sketch generative adversarial network (Sgan) that introduces a background sketch on top of an object mask within a conditional generative adversarial network. With the enriched sketch cues, Sgan can generate realistic US images with editable, fine-grained structural details. Although effective, Sgan struggles to generate high-resolution US images. To achieve this, we further embed Sgan in a progressive growing scheme (PGSgan). By smoothly growing both the generator and the discriminator, PGSgan gradually synthesizes US images from low to high resolution. Through the synthesis of ovary and follicle US images, our extensive perceptual evaluation, user study, and segmentation results demonstrate the efficacy and efficiency of the proposed PGSgan.
Tasks Image Generation
Published 2020-04-01
URL https://arxiv.org/abs/2004.00226v1
PDF https://arxiv.org/pdf/2004.00226v1.pdf
PWC https://paperswithcode.com/paper/synthesis-and-edition-of-ultrasound-images
Repo
Framework
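
No code is linked above, but the progressive-growing mechanism the paper builds on is well documented. Below is a minimal PyTorch sketch of the fade-in step that blends the previous low-resolution stage with the newly grown one, plus one simple realization of the sketch-plus-mask conditioning; the channel concatenation is our assumption, not necessarily the authors' exact design.

```python
import torch
import torch.nn.functional as F

def fade_in(prev_rgb, new_rgb, alpha):
    """Progressive-growing fade-in: while a new resolution stage is being
    trained, its output is blended with the upsampled output of the
    previous stage; alpha ramps from 0 to 1 over training."""
    up = F.interpolate(prev_rgb, scale_factor=2, mode="nearest")
    return (1.0 - alpha) * up + alpha * new_rgb

def condition_input(mask, sketch):
    """One simple way to enrich an object mask with background sketch cues
    (an assumption, not the paper's exact scheme): stack them as input
    channels of the conditional generator."""
    return torch.cat([mask, sketch], dim=1)  # (B, 2, H, W) for 1-channel inputs
```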

SPARE3D: A Dataset for SPAtial REasoning on Three-View Line Drawings

Title SPARE3D: A Dataset for SPAtial REasoning on Three-View Line Drawings
Authors Wenyu Han, Siyuan Xiang, Chenhui Liu, Ruoyu Wang, Chen Feng
Abstract Spatial reasoning is an important component of human intelligence. We can imagine the shapes of 3D objects and reason about their spatial relations by merely looking at their three-view line drawings in 2D, with different levels of competence. Can deep networks be trained to perform spatial reasoning tasks? How can we measure their “spatial intelligence”? To answer these questions, we present the SPARE3D dataset. Based on cognitive science and psychometrics, SPARE3D contains three types of 2D-3D reasoning tasks on view consistency, camera pose, and shape generation, with increasing difficulty. We then design a method to automatically generate a large number of challenging questions with ground-truth answers for each task, which are used to supervise the training of our baseline models built on state-of-the-art architectures such as ResNet. Our experiments show that although convolutional networks have achieved superhuman performance in many visual learning tasks, their spatial reasoning performance on SPARE3D is close to random guessing. We hope SPARE3D can stimulate new problem formulations and network designs for spatial reasoning, empowering intelligent robots to operate effectively in the 3D world via 2D sensors. The dataset and code are available at https://ai4ce.github.io/SPARE3D.
Tasks
Published 2020-03-31
URL https://arxiv.org/abs/2003.14034v1
PDF https://arxiv.org/pdf/2003.14034v1.pdf
PWC https://paperswithcode.com/paper/spare3d-a-dataset-for-spatial-reasoning-on
Repo
Framework
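
As a concrete picture of what a baseline for such multiple-choice 2D-3D reasoning tasks can look like, here is a hedged PyTorch sketch: a shared ResNet encodes the three given views and each candidate answer, and the candidate most similar to the pooled context is selected. The class name, mean pooling, and dot-product scoring are illustrative assumptions, not the paper's exact baseline.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class ViewConsistencyBaseline(nn.Module):
    """Score K candidate drawings against three given views with a
    shared CNN encoder; the highest-scoring candidate is the answer."""
    def __init__(self, dim=128):
        super().__init__()
        backbone = resnet18(weights=None)
        backbone.fc = nn.Linear(backbone.fc.in_features, dim)
        self.encoder = backbone

    def forward(self, views, candidates):
        # views: (B, 3, C, H, W); candidates: (B, K, C, H, W)
        B, V = views.shape[:2]
        ctx = self.encoder(views.flatten(0, 1)).view(B, V, -1).mean(1)
        K = candidates.shape[1]
        cand = self.encoder(candidates.flatten(0, 1)).view(B, K, -1)
        return (cand * ctx.unsqueeze(1)).sum(-1)  # (B, K) similarity logits
```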

Video Anomaly Detection for Smart Surveillance

Title Video Anomaly Detection for Smart Surveillance
Authors Sijie Zhu, Chen Chen, Waqas Sultani
Abstract In modern intelligent video surveillance systems, automatic anomaly detection through computer vision analytics plays a pivotal role: it not only significantly increases monitoring efficiency but also reduces the burden of live monitoring. Anomalies in videos are broadly defined as events or activities that are unusual and signify irregular behavior. The goal of anomaly detection is to temporally or spatially localize anomalous events in video sequences. Temporal localization (i.e., indicating the start and end frames of the anomalous event in a video) is referred to as frame-level detection. Spatial localization, which is more challenging, refers to identifying the pixels within each anomalous frame that correspond to the anomalous event; this setting is usually referred to as pixel-level detection. In this paper, we provide a brief overview of recent research progress on video anomaly detection and highlight a few future research directions.
Tasks Anomaly Detection, Temporal Localization
Published 2020-04-01
URL https://arxiv.org/abs/2004.00222v1
PDF https://arxiv.org/pdf/2004.00222v1.pdf
PWC https://paperswithcode.com/paper/video-anomaly-detection-for-smart
Repo
Framework
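
The survey distinguishes frame-level from pixel-level detection. As a minimal illustration of the frame-level setting only (a generic reconstruction-error recipe, not a method from this paper), one can score frames with a model trained on normal data and threshold the scores into anomalous intervals:

```python
import numpy as np

def frame_scores(frames, reconstruct):
    """Score each frame by reconstruction error (high error suggests an
    anomaly for a model trained only on normal data), normalized per video."""
    s = np.array([np.mean((f - reconstruct(f)) ** 2) for f in frames])
    return (s - s.min()) / (np.ptp(s) + 1e-8)

def anomalous_intervals(scores, threshold=0.5):
    """Frame-level localization: (start, end) frame indices of the runs
    where the score exceeds the threshold."""
    above = scores > threshold
    edges = np.flatnonzero(np.diff(above.astype(int)))
    bounds = np.r_[0, edges + 1, len(scores)]
    return [(a, b - 1) for a, b in zip(bounds[:-1], bounds[1:]) if above[a]]
```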

Conversational Search for Learning Technologies

Title Conversational Search for Learning Technologies
Authors Sharon Oviatt, Laure Soulier
Abstract Conversational search is based on user-system cooperation aimed at solving an information-seeking task. In this report, we discuss the implications of such cooperation from a learning perspective, on both the user and the system side. We also focus on how learning is stimulated by a key component of conversational search, namely multimodal communication, and discuss its implications for information retrieval. We end with a research road map describing promising research directions and perspectives.
Tasks Information Retrieval
Published 2020-01-09
URL https://arxiv.org/abs/2001.02912v1
PDF https://arxiv.org/pdf/2001.02912v1.pdf
PWC https://paperswithcode.com/paper/conversational-search-for-learning
Repo
Framework

ARA : Aggregated RAPPOR and Analysis for Centralized Differential Privacy

Title ARA : Aggregated RAPPOR and Analysis for Centralized Differential Privacy
Authors Sudipta Paul, Subhankar Mishra
Abstract Differential privacy (DP) has become a standard for the analysis of sensitive statistical data. The two main approaches to DP are local and central; they differ clearly in data storage, the amount of data that can be analyzed, the analysis itself, and speed, with the local approach winning on speed. We tested RAPPOR, the state-of-the-art local approach, and confirmed this gap; our work focuses on that part as well. Here, we propose a model that first collects RAPPOR reports from multiple clients and then pushes them to a Tf-Idf estimation model, which estimates the reports based on the occurrence of the “on bit” at a particular position and its contribution to that position. The model thereby produces a centralized differential-privacy analysis from multiple clients, and in our experiments it successfully and efficiently recovered the majority truth value every time.
Tasks
Published 2020-01-06
URL https://arxiv.org/abs/2001.01618v1
PDF https://arxiv.org/pdf/2001.01618v1.pdf
PWC https://paperswithcode.com/paper/ara-aggregated-rappor-and-analysis-for
Repo
Framework
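
As a toy illustration of the pipeline's first two steps (a simplified one-hot randomized response and de-biased aggregation; real RAPPOR uses Bloom filters with permanent and instantaneous randomization, and the paper's Tf-Idf estimator is not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(0)

def rappor_report(true_index, k=16, f=0.5):
    """Simplified RAPPOR-style report over k buckets: each bit is set to 1
    with probability f/2, to 0 with probability f/2, and kept truthful
    with probability 1 - f."""
    bits = np.zeros(k, dtype=int)
    bits[true_index] = 1
    u = rng.random(k)
    return np.where(u < f / 2, 1, np.where(u < f, 0, bits))

def estimate_counts(reports, f=0.5):
    """De-bias the aggregated 'on bit' counts by inverting
    E[on_j] = (1 - f) * t_j + (f / 2) * n for the true counts t_j."""
    reports = np.asarray(reports)
    n, on = len(reports), reports.sum(axis=0)
    return (on - (f / 2) * n) / (1 - f)

# the bucket with the largest estimated count is the majority truth value
```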

TITAN: Future Forecast using Action Priors

Title TITAN: Future Forecast using Action Priors
Authors Srikanth Malla, Behzad Dariush, Chiho Choi
Abstract We consider the problem of predicting the future trajectories of scene agents from egocentric views obtained from a moving platform. This problem is important in a variety of domains, particularly for autonomous systems making reactive or strategic decisions in navigation. To address it, we introduce TITAN (Trajectory Inference using Targeted Action priors Network), a new model that incorporates prior positions, actions, and context to forecast the future trajectories of agents and future ego-motion. In the absence of an appropriate dataset for this task, we created the TITAN dataset, which consists of 700 labeled video clips (with odometry) captured from a moving vehicle in highly interactive urban traffic scenes in Tokyo. Our dataset includes 50 labels covering vehicle states and actions, pedestrian age groups, and targeted pedestrian action attributes organized hierarchically into atomic, simple/complex-contextual, transportive, and communicative actions. To evaluate our model, we conducted extensive experiments on the TITAN dataset, revealing significant performance improvements over baselines and state-of-the-art algorithms. We also report promising results from our Agent Importance Mechanism (AIM), a module that provides insight into the assessment of perceived risk by calculating the relative influence of each agent on the future ego-trajectory. The dataset is available at https://usa.honda-ri.com/titan
Tasks
Published 2020-03-31
URL https://arxiv.org/abs/2003.13886v2
PDF https://arxiv.org/pdf/2003.13886v2.pdf
PWC https://paperswithcode.com/paper/titan-future-forecast-using-action-priors
Repo
Framework
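
The Agent Importance Mechanism is described only at a high level in the abstract. As a hedged sketch of the general idea (learned attention weights read off as each agent's relative influence on the ego-trajectory; apart from the module name, every detail below is an assumption):

```python
import torch
import torch.nn as nn

class AgentImportance(nn.Module):
    """Score each agent's influence on the future ego-trajectory with a
    learned attention weight over per-agent features; the weights are the
    'relative influence' signal, the weighted sum is the fused context."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, agent_feats, ego_feat):
        # agent_feats: (B, N, D); ego_feat: (B, D)
        logits = self.score(agent_feats + ego_feat.unsqueeze(1)).squeeze(-1)
        w = torch.softmax(logits, dim=-1)              # relative influence per agent
        context = (w.unsqueeze(-1) * agent_feats).sum(1)
        return context, w
```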

ImVoteNet: Boosting 3D Object Detection in Point Clouds with Image Votes

Title ImVoteNet: Boosting 3D Object Detection in Point Clouds with Image Votes
Authors Charles R. Qi, Xinlei Chen, Or Litany, Leonidas J. Guibas
Abstract 3D object detection has seen quick progress thanks to advances in deep learning on point clouds. A few recent works have even shown state-of-the-art performance with point cloud input alone (e.g., VoteNet). However, point cloud data have inherent limitations: they are sparse, lack color information, and often suffer from sensor noise. Images, on the other hand, have high resolution and rich texture, so they can complement the 3D geometry provided by point clouds. Yet how to effectively use image information to assist point-cloud-based detection is still an open question. In this work, we build on top of VoteNet and propose a 3D detection architecture called ImVoteNet, specialized for RGB-D scenes. ImVoteNet is based on fusing 2D votes in images with 3D votes in point clouds. Compared to prior work on multi-modal detection, we explicitly extract both geometric and semantic features from the 2D images and leverage camera parameters to lift these features to 3D. To improve the synergy of 2D-3D feature fusion, we also propose a multi-tower training scheme. We validate our model on the challenging SUN RGB-D dataset, advancing state-of-the-art results by 5.7 mAP. We also provide rich ablation studies to analyze the contribution of each design choice.
Tasks 3D Object Detection, Object Detection
Published 2020-01-29
URL https://arxiv.org/abs/2001.10692v1
PDF https://arxiv.org/pdf/2001.10692v1.pdf
PWC https://paperswithcode.com/paper/imvotenet-boosting-3d-object-detection-in
Repo
Framework
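
The abstract's key mechanism is lifting 2D image votes into 3D using camera parameters. A simplified numpy sketch of that geometric step follows; it is a small-displacement approximation at the seed's depth, whereas the paper derives the exact pseudo-3D vote from the camera-ray constraint.

```python
import numpy as np

def lift_2d_vote(seed_xyz, vote_uv, K):
    """Approximately lift a 2D pixel vote (du, dv) to a 3D displacement
    at the depth of the 3D seed point, using the 3x3 intrinsics K:
    a pixel shift du at depth z spans roughly du * z / fx meters."""
    fx, fy = K[0, 0], K[1, 1]
    z = seed_xyz[2]                         # depth of the 3D seed point
    dx = vote_uv[0] * z / fx
    dy = vote_uv[1] * z / fy
    return seed_xyz + np.array([dx, dy, 0.0])
```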

Self-attention-based BiGRU and capsule network for named entity recognition

Title Self-attention-based BiGRU and capsule network for named entity recognition
Authors Jianfeng Deng, Lianglun Cheng, Zhuowei Wang
Abstract Named entity recognition (NER) is one of the core tasks of natural language processing (NLP). To address the weak representational power of traditional character embeddings and the inability of standard neural methods to capture important sequence information, we propose a self-attention-based bidirectional gated recurrent unit (BiGRU) and capsule network (CapsNet) for NER. The model generates character vectors with a pre-trained BERT (bidirectional encoder representations from transformers) model. A BiGRU captures sequence context features, and a self-attention mechanism assigns different focus to the information captured by the BiGRU's hidden layers. Finally, a CapsNet performs the entity recognition. We evaluated the model's recognition performance on two datasets; experimental results show that it performs better without relying on external dictionary information.
Tasks Named Entity Recognition
Published 2020-01-30
URL https://arxiv.org/abs/2002.00735v1
PDF https://arxiv.org/pdf/2002.00735v1.pdf
PWC https://paperswithcode.com/paper/self-attention-based-bigru-and-capsule
Repo
Framework
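
A minimal PyTorch sketch of the encoder path described in the abstract, with the CapsNet head omitted; the hidden size and number of attention heads are assumptions:

```python
import torch
import torch.nn as nn

class BiGRUAttention(nn.Module):
    """BERT character vectors in, self-attended sequence features out:
    a BiGRU captures sequence context, then self-attention reweights
    the information from the BiGRU hidden states."""
    def __init__(self, in_dim=768, hidden=256):
        super().__init__()
        self.bigru = nn.GRU(in_dim, hidden, bidirectional=True, batch_first=True)
        self.attn = nn.MultiheadAttention(2 * hidden, num_heads=4, batch_first=True)

    def forward(self, bert_vectors):           # (B, T, 768)
        h, _ = self.bigru(bert_vectors)         # (B, T, 2 * hidden)
        out, _ = self.attn(h, h, h)             # per-position focus over the sequence
        return out                              # fed to a CapsNet head in the paper
```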

Spectrum Translation for Cross-Spectral Ocular Matching

Title Spectrum Translation for Cross-Spectral Ocular Matching
Authors Kevin Hernandez Diaz, Fernando Alonso-Fernandez, Josef Bigun
Abstract Cross-spectral verification remains a major challenge in biometrics, especially for the ocular area, because the features reflected in the images differ depending on the region and spectrum used. In this paper, we investigate the use of conditional adversarial networks for spectrum translation between near-infrared and visible-light images for ocular biometrics. We analyze the transformation based on the overall visual quality of the translated images and on the accuracy drop of the identification system when trained with data from the opposite spectrum. We use the PolyU database and propose two different systems for biometric verification: the first based on Siamese networks trained with softmax and cross-entropy loss, and the second a triplet loss network. We achieve an EER of 1% when using a triplet loss network trained on NIR and computing the Euclidean distance between real NIR images and fake ones translated from the visible spectrum. We also outperform previous results obtained with baseline algorithms.
Tasks
Published 2020-02-14
URL https://arxiv.org/abs/2002.06228v1
PDF https://arxiv.org/pdf/2002.06228v1.pdf
PWC https://paperswithcode.com/paper/spectrum-translation-for-cross-spectral
Repo
Framework
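
A hedged sketch of the verification protocol described at the end of the abstract; the embedding network `embed` is assumed to have been trained with a triplet objective on NIR data, and the L2 normalization is our assumption rather than a stated detail:

```python
import torch
import torch.nn.functional as F

def verification_distance(embed, nir_real, nir_from_vis):
    """Embed real NIR images and 'fake' NIR images translated from the
    visible spectrum, then compare by Euclidean distance; a smaller
    distance indicates the same identity."""
    a = F.normalize(embed(nir_real), dim=-1)
    b = F.normalize(embed(nir_from_vis), dim=-1)
    return torch.norm(a - b, dim=-1)

# training such an embedding typically uses the standard triplet objective:
triplet_loss = torch.nn.TripletMarginLoss(margin=0.2)
```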

Shared Cross-Modal Trajectory Prediction for Autonomous Driving

Title Shared Cross-Modal Trajectory Prediction for Autonomous Driving
Authors Chiho Choi
Abstract We propose a framework for predicting the future trajectories of traffic agents in highly interactive environments. Since autonomous driving vehicles are equipped with various types of sensors (e.g., LiDAR scanner, RGB camera, etc.), our work aims to benefit from multiple input modalities that complement each other. The proposed approach is composed of two stages: (i) feature encoding, where we discover the motion behavior of the target agent with respect to other directly and indirectly observable influences, extracting such behaviors from multiple perspectives such as the top-down and frontal views; and (ii) cross-modal embedding, where we embed a set of learned behavior representations into a single cross-modal latent space. We construct a generative model and formulate the objective functions with an additional regularizer specifically designed for future prediction. An extensive evaluation on two benchmark driving datasets shows the efficacy of the proposed framework.
Tasks Autonomous Driving, Future prediction, Trajectory Prediction
Published 2020-04-01
URL https://arxiv.org/abs/2004.00202v1
PDF https://arxiv.org/pdf/2004.00202v1.pdf
PWC https://paperswithcode.com/paper/shared-cross-modal-trajectory-prediction-for
Repo
Framework
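
Stage (ii) embeds per-modality behavior representations in a single shared latent space. A hedged sketch of that idea (the linear heads and dimensions are assumptions; the paper's generative model and future-prediction regularizer are not reproduced):

```python
import torch
import torch.nn as nn

class CrossModalEmbedding(nn.Module):
    """Map each modality's behavior representation into one shared latent
    space; training would pull embeddings of the same scene together
    across modalities."""
    def __init__(self, dims, latent=128):
        super().__init__()
        self.heads = nn.ModuleDict({m: nn.Linear(d, latent) for m, d in dims.items()})

    def forward(self, feats):   # e.g. {"lidar": (B, 512), "camera": (B, 512)}
        return {m: self.heads[m](x) for m, x in feats.items()}

model = CrossModalEmbedding({"lidar": 512, "camera": 512})
```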

Continuous Geodesic Convolutions for Learning on 3D Shapes

Title Continuous Geodesic Convolutions for Learning on 3D Shapes
Authors Zhangsihao Yang, Or Litany, Tolga Birdal, Srinath Sridhar, Leonidas Guibas
Abstract The majority of descriptor-based methods for geometric processing of non-rigid shapes rely on hand-crafted descriptors. Recently, learning-based techniques have been shown effective, achieving state-of-the-art results in a variety of tasks. Yet, even though these methods can in principle work directly on raw data, most still rely on hand-crafted descriptors at the input layer. In this work, we challenge this practice and use a neural network to learn descriptors directly from the raw mesh. To this end, we introduce two modules into our neural architecture: the first is a local reference frame (LRF) used to explicitly make the features invariant to rigid transformations; the second is continuous convolution kernels that provide robustness to sampling. We show the efficacy of the proposed network in learning on raw meshes using two cornerstone tasks: shape matching and human body part segmentation. Our method yields superior results over baselines that use hand-crafted descriptors.
Tasks
Published 2020-02-06
URL https://arxiv.org/abs/2002.02506v1
PDF https://arxiv.org/pdf/2002.02506v1.pdf
PWC https://paperswithcode.com/paper/continuous-geodesic-convolutions-for-learning
Repo
Framework
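
The abstract names two modules: a local reference frame and continuous convolution kernels. One common realization of the latter, offered here only as a hedged sketch, parameterizes the kernel as an MLP over LRF-aligned neighbor offsets, which is what makes the layer robust to irregular sampling:

```python
import torch
import torch.nn as nn

class ContinuousConv(nn.Module):
    """Continuous convolution: the kernel is an MLP evaluated at each
    neighbor's relative coordinates (expressed in the point's local
    reference frame), so the operator does not depend on a fixed grid."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.kernel = nn.Sequential(
            nn.Linear(3, 32), nn.ReLU(), nn.Linear(32, in_ch * out_ch))
        self.in_ch, self.out_ch = in_ch, out_ch

    def forward(self, rel_xyz, nbr_feats):
        # rel_xyz: (N, K, 3) neighbor offsets in each point's local frame
        # nbr_feats: (N, K, in_ch) neighbor features
        W = self.kernel(rel_xyz).view(*rel_xyz.shape[:2], self.in_ch, self.out_ch)
        return torch.einsum("nki,nkio->no", nbr_feats, W) / rel_xyz.shape[1]
```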

Estimation of the spatial weighting matrix for regular lattice data – An adaptive lasso approach with cross-sectional resampling

Title Estimation of the spatial weighting matrix for regular lattice data – An adaptive lasso approach with cross-sectional resampling
Authors Miryam S. Merk, Philipp Otto
Abstract Spatial econometric research typically relies on the assumption that the spatial dependence structure is known in advance and is represented by a deterministic spatial weights matrix. In contrast to classical approaches, we investigate the estimation of sparse spatial dependence structures for regular lattice data. In particular, an adaptive least absolute shrinkage and selection operator (lasso) is used to select and estimate the individual connections of the spatial weights matrix. To recover the spatial dependence structure, we propose cross-sectional resampling, assuming that the random process is exchangeable. The estimation procedure is based on a two-step approach to circumvent the simultaneity issues that typically arise from endogenous spatial autoregressive dependencies. The two-step adaptive lasso approach with cross-sectional resampling is verified using Monte Carlo simulations. Finally, we apply the procedure to model nitrogen dioxide ($\mathrm{NO_2}$) concentrations and show that estimating the spatial dependence structure, rather than using prespecified weights matrices, improves prediction accuracy considerably.
Tasks
Published 2020-01-06
URL https://arxiv.org/abs/2001.01532v1
PDF https://arxiv.org/pdf/2001.01532v1.pdf
PWC https://paperswithcode.com/paper/estimation-of-the-spatial-weighting-matrix
Repo
Framework
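
The two-step estimator with cross-sectional resampling is specific to the paper, but the adaptive-lasso core is standard. A generic scikit-learn sketch of that core (first-stage OLS coefficients define per-feature penalty weights; the reweighting-by-rescaling trick recovers the adaptive solution from a plain lasso):

```python
import numpy as np
from sklearn.linear_model import Lasso

def adaptive_lasso(X, y, gamma=1.0, alpha=0.1):
    """Adaptive lasso: penalize each coefficient by a weight inversely
    proportional to a first-stage estimate, so weakly supported
    connections are shrunk more aggressively toward zero."""
    beta_init = np.linalg.lstsq(X, y, rcond=None)[0]   # first stage (OLS)
    w = 1.0 / (np.abs(beta_init) ** gamma + 1e-8)      # adaptive weights
    model = Lasso(alpha=alpha, fit_intercept=False)
    model.fit(X / w, y)         # column rescaling = per-feature penalty
    return model.coef_ / w      # undo the rescaling
```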

Efficient Algorithms for Generating Provably Near-Optimal Cluster Descriptors for Explainability

Title Efficient Algorithms for Generating Provably Near-Optimal Cluster Descriptors for Explainability
Authors Prathyush Sambaturu, Aparna Gupta, Ian Davidson, S. S. Ravi, Anil Vullikanti, Andrew Warren
Abstract Improving the explainability of the results from machine learning methods has become an important research goal. Here, we study the problem of making clusters more interpretable by extending a recent approach of [Davidson et al., NeurIPS 2018] for constructing succinct representations for clusters. Given a set of objects $S$, a partition $\pi$ of $S$ (into clusters), and a universe $T$ of tags such that each element in $S$ is associated with a subset of tags, the goal is to find a representative set of tags for each cluster such that those sets are pairwise-disjoint and the total size of all the representatives is minimized. Since this problem is NP-hard in general, we develop approximation algorithms with provable performance guarantees for the problem. We also show applications to explain clusters from datasets, including clusters of genomic sequences that represent different threat levels.
Tasks
Published 2020-02-06
URL https://arxiv.org/abs/2002.02487v1
PDF https://arxiv.org/pdf/2002.02487v1.pdf
PWC https://paperswithcode.com/paper/efficient-algorithms-for-generating-provably
Repo
Framework
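
The paper's approximation algorithms carry provable guarantees; as a simple illustration of the problem itself (pairwise-disjoint tag sets, each covering its cluster), here is a greedy set-cover heuristic with no such guarantees:

```python
def greedy_descriptor(members, tags_of, used):
    """Greedily build one cluster's descriptor: repeatedly add the
    not-yet-used tag that covers the most still-uncovered members.
    `used` collects tags claimed by earlier clusters, enforcing the
    pairwise-disjointness constraint."""
    uncovered, chosen = set(members), set()
    while uncovered:
        candidates = {t for m in uncovered for t in tags_of[m]} - used
        if not candidates:
            break  # remaining members cannot be covered disjointly
        best = max(candidates, key=lambda t: sum(t in tags_of[m] for m in uncovered))
        chosen.add(best)
        used.add(best)
        uncovered = {m for m in uncovered if best not in tags_of[m]}
    return chosen

# per-cluster usage, sharing one `used` set so descriptors stay disjoint:
# used = set()
# descriptors = {c: greedy_descriptor(ms, tags_of, used) for c, ms in clusters.items()}
```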

Semi-Supervised Cervical Dysplasia Classification With Learnable Graph Convolutional Network

Title Semi-Supervised Cervical Dysplasia Classification With Learnable Graph Convolutional Network
Authors Yanglan Ou, Yuan Xue, Ye Yuan, Tao Xu, Vincent Pisztora, Jia Li, Xiaolei Huang
Abstract Cervical cancer is the second most prevalent cancer affecting women today. As the early detection of cervical carcinoma relies heavily on screening and pre-clinical testing, digital cervicography has great potential as a primary or auxiliary screening tool, especially in low-resource regions, owing to its low cost and easy access. Although an automated cervical dysplasia detection system is desirable, traditional fully supervised training of such systems requires large amounts of annotated data that are labor-intensive to collect. To reduce the need for manual annotation, we propose a novel graph convolutional network (GCN) based semi-supervised classification model that can be trained with fewer annotations. In existing GCNs, graphs are constructed with fixed features and cannot be updated during the learning process, which limits their ability to exploit new features learned during graph convolution. In this paper, we propose a more flexible GCN model with a feature encoder that adaptively updates the adjacency matrix during learning, and we demonstrate that this design leads to improved performance. Our experimental results on a cervical dysplasia classification dataset show that the proposed framework outperforms previous methods in a semi-supervised setting, especially when labeled samples are scarce.
Tasks
Published 2020-04-01
URL https://arxiv.org/abs/2004.00191v1
PDF https://arxiv.org/pdf/2004.00191v1.pdf
PWC https://paperswithcode.com/paper/semi-supervised-cervical-dysplasia
Repo
Framework
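
The key idea in the abstract is that the adjacency matrix is rebuilt from learned features during training rather than fixed up front. A hedged one-layer sketch of that idea, with softmax-normalized feature similarity standing in for the paper's exact encoder and update rule:

```python
import torch
import torch.nn as nn

class LearnableAdjGCNLayer(nn.Module):
    """A GCN layer whose adjacency is recomputed from the current node
    features on every forward pass, so the graph adapts as features
    improve instead of staying fixed."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, x):                      # x: (N, in_dim) node features
        sim = x @ x.t()                        # pairwise feature similarity
        adj = torch.softmax(sim, dim=-1)       # row-normalized, feature-driven graph
        return torch.relu(adj @ self.proj(x))  # standard GCN propagation
```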

Improving Perceptual Quality of Drum Transcription with the Expanded Groove MIDI Dataset

Title Improving Perceptual Quality of Drum Transcription with the Expanded Groove MIDI Dataset
Authors Lee Callender, Curtis Hawthorne, Jesse Engel
Abstract Classifier metrics, such as accuracy and F-measure score, often serve as proxies for performance in downstream tasks. For the case of generative systems that use predicted labels as inputs, accuracy is a good proxy only if it aligns with the perceptual quality of generated outputs. Here, we demonstrate this effect using the example of automatic drum transcription (ADT). We optimize classifiers for downstream generation by predicting expressive dynamics (velocity) and show with listening tests that they produce outputs with improved perceptual quality, despite achieving similar results on classification metrics. To train expressive ADT models, we introduce the Expanded Groove MIDI dataset (E-GMD), a large dataset of human drum performances, with audio recordings annotated in MIDI. E-GMD contains 444 hours of audio from 43 drum kits and is an order of magnitude larger than similar datasets. It is also the first human-performed drum dataset with annotations of velocity. We make this new dataset available under a Creative Commons license along with open source code for training and a pre-trained model for inference.
Tasks Drum Transcription
Published 2020-04-01
URL https://arxiv.org/abs/2004.00188v1
PDF https://arxiv.org/pdf/2004.00188v1.pdf
PWC https://paperswithcode.com/paper/improving-perceptual-quality-of-drum
Repo
Framework
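
As a hedged sketch of what "predicting expressive dynamics (velocity)" adds to a standard ADT training target (the array layout and normalization below are assumptions, not E-GMD's released code):

```python
import numpy as np

def velocity_targets(onset_frames, pitch_ids, velocities, n_frames, n_pitches):
    """Build paired targets for expressive drum transcription: a binary
    onset grid for classification plus a per-(frame, drum) MIDI-velocity
    grid for regression, so generated performances keep their dynamics."""
    onsets = np.zeros((n_frames, n_pitches), dtype=np.float32)
    vel = np.zeros((n_frames, n_pitches), dtype=np.float32)
    for f, p, v in zip(onset_frames, pitch_ids, velocities):
        onsets[f, p] = 1.0
        vel[f, p] = v / 127.0   # normalize MIDI velocity to [0, 1]
    return onsets, vel
```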