April 3, 2020

3580 words 17 mins read

Paper Group AWR 79

Expected Information Maximization: Using the I-Projection for Mixture Density Estimation. Evaluating Salient Object Detection in Natural Images with Multiple Objects having Multi-level Saliency. COVID-Net: A Tailored Deep Convolutional Neural Network Design for Detection of COVID-19 Cases from Chest Radiography Images. Exploring Categorical Regular …

Expected Information Maximization: Using the I-Projection for Mixture Density Estimation

Title Expected Information Maximization: Using the I-Projection for Mixture Density Estimation
Authors Philipp Becker, Oleg Arenz, Gerhard Neumann
Abstract Modelling highly multi-modal data is a challenging problem in machine learning. Most algorithms are based on maximizing the likelihood, which corresponds to the M(oment)-projection of the data distribution to the model distribution. The M-projection forces the model to average over modes it cannot represent. In contrast, the I(information)-projection ignores such modes in the data and concentrates on the modes the model can represent. Such behavior is appealing whenever we deal with highly multi-modal data where modelling single modes correctly is more important than covering all the modes. Despite this advantage, the I-projection is rarely used in practice due to the lack of algorithms that can efficiently optimize it based on data. In this work, we present a new algorithm called Expected Information Maximization (EIM) for computing the I-projection solely based on samples for general latent variable models, where we focus on Gaussian mixture models and Gaussian mixtures of experts. Our approach applies a variational upper bound to the I-projection objective which decomposes the original objective into single objectives for each mixture component as well as for the coefficients, allowing an efficient optimization. Similar to GANs, our approach employs discriminators but uses a more stable optimization procedure, using a tight upper bound. We show that our algorithm is much more effective in computing the I-projection than recent GAN approaches and we illustrate the effectiveness of our approach for modelling multi-modal behavior on two pedestrian and traffic prediction datasets.
Tasks Density Estimation, Latent Variable Models, Traffic Prediction
Published 2020-01-23
URL https://arxiv.org/abs/2001.08682v1
PDF https://arxiv.org/pdf/2001.08682v1.pdf
PWC https://paperswithcode.com/paper/expected-information-maximization-using-the-i-1
Repo https://github.com/pbecker93/ExpectedInformationMaximization
Framework tf
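
The I-projection objective needs the (log) density ratio between the data distribution p and the model q even though only samples from p are available, and EIM obtains it with a discriminator. Below is a minimal, hedged sketch of that density-ratio step only; the function name and the use of scikit-learn are illustrative assumptions, not the authors' implementation (see the linked repo).

```python
# Sketch: estimate log p(x)/q(x) from samples with a probabilistic classifier.
# This is only the density-ratio ingredient of EIM, not the full algorithm.
import numpy as np
from sklearn.linear_model import LogisticRegression

def log_density_ratio(data_samples, model_samples):
    """With balanced classes, the classifier's logit approximates log p(x)/q(x)."""
    X = np.vstack([data_samples, model_samples])
    y = np.concatenate([np.ones(len(data_samples)), np.zeros(len(model_samples))])
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    return clf.decision_function  # callable: points -> approx. log p(x)/q(x)

# Usage: ratio_fn = log_density_ratio(p_samples, q_samples); r = ratio_fn(x_new)
```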

Evaluating Salient Object Detection in Natural Images with Multiple Objects having Multi-level Saliency

Title Evaluating Salient Object Detection in Natural Images with Multiple Objects having Multi-level Saliency
Authors Gökhan Yildirim, Debashis Sen, Mohan Kankanhalli, Sabine Süsstrunk
Abstract Salient object detection is evaluated using binary ground truth with the labels being salient object class and background. In this paper, we corroborate based on three subjective experiments on a novel image dataset that objects in natural images are inherently perceived to have varying levels of importance. Our dataset, named SalMoN (saliency in multi-object natural images), has 588 images containing multiple objects. The subjective experiments performed record spontaneous attention and perception through eye fixation duration, point clicking and rectangle drawing. As object saliency in a multi-object image is inherently multi-level, we propose that salient object detection must be evaluated for the capability to detect all multi-level salient objects apart from the salient object class detection capability. For this purpose, we generate multi-level maps as ground truth corresponding to all the dataset images using the results of the subjective experiments, with the labels being multi-level salient objects and background. We then propose the use of mean absolute error, Kendall’s rank correlation and average area under precision-recall curve to evaluate existing salient object detection methods on our multi-level saliency ground truth dataset. Approaches that represent saliency detection on images as local-global hierarchical processing of a graph perform well in our dataset.
Tasks Object Detection, Saliency Detection, Salient Object Detection
Published 2020-03-19
URL https://arxiv.org/abs/2003.08514v1
PDF https://arxiv.org/pdf/2003.08514v1.pdf
PWC https://paperswithcode.com/paper/evaluating-salient-object-detection-in
Repo https://github.com/gokyildirim/salmon_dataset
Framework none
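
As a rough illustration of the proposed evaluation protocol, the sketch below scores a predicted saliency map against a multi-level ground-truth map with mean absolute error and Kendall's rank correlation. The interface is a simplification; the paper additionally reports the average area under the precision-recall curve.

```python
# Hedged sketch of multi-level saliency evaluation; not the authors' code.
import numpy as np
from scipy.stats import kendalltau

def evaluate_saliency(pred, gt_multilevel):
    """pred and gt_multilevel: 2-D arrays with values scaled to [0, 1]."""
    mae = float(np.mean(np.abs(pred - gt_multilevel)))
    tau, _ = kendalltau(pred.ravel(), gt_multilevel.ravel())
    return {"MAE": mae, "KendallTau": tau}

scores = evaluate_saliency(np.random.rand(64, 64), np.random.rand(64, 64))
```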

COVID-Net: A Tailored Deep Convolutional Neural Network Design for Detection of COVID-19 Cases from Chest Radiography Images

Title COVID-Net: A Tailored Deep Convolutional Neural Network Design for Detection of COVID-19 Cases from Chest Radiography Images
Authors Linda Wang, Alexander Wong
Abstract The COVID-19 pandemic continues to have a devastating effect on the health and well-being of the global population. A critical step in the fight against COVID-19 is effective screening of infected patients, with one of the key screening approaches being radiological imaging using chest radiography. Motivated by this, a number of artificial intelligence (AI) systems based on deep learning have been proposed and results have been shown to be quite promising in terms of accuracy in detecting patients infected with COVID-19 using chest radiography images. However, to the best of the authors’ knowledge, these developed AI systems have been closed source and unavailable to the research community for deeper understanding and extension, and unavailable for public access and use. Therefore, in this study we introduce COVID-Net, a deep convolutional neural network design tailored for the detection of COVID-19 cases from chest radiography images that is open source and available to the general public. We also describe the chest radiography dataset leveraged to train COVID-Net, which we will refer to as COVIDx and which comprises 16,756 chest radiography images across 13,645 patient cases from two open access data repositories. Furthermore, we investigate how COVID-Net makes predictions using an explainability method in an attempt to gain deeper insights into critical factors associated with COVID cases, which can aid clinicians in improved screening. COVID-Net is by no means a production-ready solution, but the hope is that the open access COVID-Net, along with the description of constructing the open source COVIDx dataset, will be leveraged and built upon by both researchers and citizen data scientists alike to accelerate the development of highly accurate yet practical deep learning solutions for detecting COVID-19 cases and accelerate treatment of those who need it the most.
Tasks COVID-19 Detection
Published 2020-03-22
URL https://arxiv.org/abs/2003.09871v2
PDF https://arxiv.org/pdf/2003.09871v2.pdf
PWC https://paperswithcode.com/paper/covid-net-a-tailored-deep-convolutional
Repo https://github.com/IliasPap/COVIDNet
Framework pytorch
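
For orientation only, here is a generic three-class chest-radiography classifier (normal / pneumonia / COVID-19) in PyTorch. It is explicitly not the COVID-Net architecture, which uses a custom machine-designed backbone; the backbone choice and class set below are assumptions for illustration.

```python
# Illustrative baseline classifier, not COVID-Net itself (see the linked repo).
import torch
import torch.nn as nn
from torchvision import models

class ChestXrayClassifier(nn.Module):
    def __init__(self, num_classes=3):
        super().__init__()
        self.net = models.resnet18(pretrained=False)          # pretrained weights optional
        self.net.fc = nn.Linear(self.net.fc.in_features, num_classes)

    def forward(self, x):          # x: (B, 3, H, W) chest radiographs
        return self.net(x)         # logits for normal / pneumonia / COVID-19

logits = ChestXrayClassifier()(torch.randn(2, 3, 224, 224))
```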

Exploring Categorical Regularization for Domain Adaptive Object Detection

Title Exploring Categorical Regularization for Domain Adaptive Object Detection
Authors Chang-Dong Xu, Xing-Ran Zhao, Xin Jin, Xiu-Shen Wei
Abstract In this paper, we tackle the domain adaptive object detection problem, where the main challenge lies in significant domain gaps between source and target domains. Previous works seek to plainly align image-level and instance-level shifts to eventually minimize the domain discrepancy. However, they still fail to match crucial image regions and important instances across domains, which strongly affects domain-shift mitigation. In this work, we propose a simple but effective categorical regularization framework for alleviating this issue. It can be applied as a plug-and-play component on a series of Domain Adaptive Faster R-CNN methods which are prominent for dealing with domain adaptive detection. Specifically, by integrating an image-level multi-label classifier upon the detection backbone, we can obtain the sparse but crucial image regions corresponding to categorical information, thanks to the weak localization ability of image-level classification. Meanwhile, at the instance level, we leverage the categorical consistency between image-level predictions (by the classifier) and instance-level predictions (by the detection head) as a regularization factor to automatically hunt for hard aligned instances in the target domain. Extensive experiments on various domain shift scenarios show that our method obtains a significant performance gain over the original Domain Adaptive Faster R-CNN detectors. Furthermore, qualitative visualizations and analyses demonstrate our method’s ability to attend to key regions and instances relevant to domain adaptation. Our code is open-source and available at \url{https://github.com/Megvii-Nanjing/CR-DA-DET}.
Tasks Domain Adaptation, Object Detection
Published 2020-03-20
URL https://arxiv.org/abs/2003.09152v1
PDF https://arxiv.org/pdf/2003.09152v1.pdf
PWC https://paperswithcode.com/paper/exploring-categorical-regularization-for
Repo https://github.com/Megvii-Nanjing/CR-DA-DET
Framework pytorch
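
The categorical-consistency regularizer described above compares image-level multi-label predictions with instance-level predictions to find hard target-domain instances. The sketch below illustrates one simple way such a consistency weight could be computed; the names and exact weighting scheme are assumptions, not the CR-DA-DET implementation.

```python
# Hedged sketch: up-weight instances whose class confidence disagrees with the
# image-level multi-label classifier (treated as "hard aligned" instances).
import torch

def consistency_weights(image_probs, instance_probs, instance_labels):
    """image_probs: (C,) sigmoid outputs of the image-level classifier.
    instance_probs: (N,) confidence of each detected instance for its class.
    instance_labels: (N,) predicted class index of each instance."""
    img_conf = image_probs[instance_labels]            # image-level belief per instance
    disagreement = (instance_probs - img_conf).abs()   # categorical inconsistency
    return 1.0 + disagreement                          # larger weight for inconsistent instances

w = consistency_weights(torch.rand(9), torch.rand(5), torch.randint(0, 9, (5,)))
```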

MotionNet: Joint Perception and Motion Prediction for Autonomous Driving Based on Bird’s Eye View Maps

Title MotionNet: Joint Perception and Motion Prediction for Autonomous Driving Based on Bird’s Eye View Maps
Authors Pengxiang Wu, Siheng Chen, Dimitris Metaxas
Abstract The ability to reliably perceive the environmental states, particularly the existence of objects and their motion behavior, is crucial for autonomous driving. In this work, we propose an efficient deep model, called MotionNet, to jointly perform perception and motion prediction from 3D point clouds. MotionNet takes a sequence of LiDAR sweeps as input and outputs a bird’s eye view (BEV) map, which encodes the object category and motion information in each grid cell. The backbone of MotionNet is a novel spatio-temporal pyramid network, which extracts deep spatial and temporal features in a hierarchical fashion. To enforce the smoothness of predictions over both space and time, the training of MotionNet is further regularized with novel spatial and temporal consistency losses. Extensive experiments show that the proposed method overall outperforms the state of the art, including the latest scene-flow- and 3D-object-detection-based methods. This indicates the potential value of the proposed method serving as a backup to the bounding-box-based system, and providing complementary information to the motion planner in autonomous driving. Code is available at https://github.com/pxiangwu/MotionNet.
Tasks 3D Object Detection, Autonomous Driving, motion prediction, Object Detection
Published 2020-03-15
URL https://arxiv.org/abs/2003.06754v1
PDF https://arxiv.org/pdf/2003.06754v1.pdf
PWC https://paperswithcode.com/paper/motionnet-joint-perception-and-motion
Repo https://github.com/pxiangwu/MotionNet
Framework pytorch
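
To make the spatial and temporal consistency losses concrete, here is a minimal sketch of smoothness regularizers on a predicted BEV motion field; the tensor layout and the plain first-order differences are assumptions, not the released MotionNet losses.

```python
# Hedged sketch: smoothness penalties over a BEV motion field of shape (B, T, H, W, 2).
import torch

def spatial_consistency(motion):
    dh = (motion[:, :, 1:] - motion[:, :, :-1]).abs().mean()        # neighbouring rows
    dw = (motion[:, :, :, 1:] - motion[:, :, :, :-1]).abs().mean()  # neighbouring columns
    return dh + dw

def temporal_consistency(motion):
    return (motion[:, 1:] - motion[:, :-1]).abs().mean()            # consecutive sweeps

m = torch.randn(2, 5, 256, 256, 2)
loss = spatial_consistency(m) + temporal_consistency(m)
```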

Arbitrary-Oriented Object Detection with Circular Smooth Label

Title Arbitrary-Oriented Object Detection with Circular Smooth Label
Authors Xue Yang, Junchi Yan
Abstract Arbitrary-oriented object detection has recently attracted increasing attention in vision due to its importance in aerial imagery, scene text, face detection, etc. In this paper, we show that existing regression-based rotation detectors suffer from the problem of discontinuous boundaries, which is directly caused by angular periodicity or corner ordering. By a careful study, we find the root cause is that the ideal predictions are beyond the defined range. We design a new rotation detection baseline to address the boundary problem by transforming angular prediction from a regression problem into a classification task with little accuracy loss, devising high-precision angle classification in contrast to previous works that use coarse granularity in rotation detection. We also propose a circular smooth label (CSL) technique to handle the periodicity of the angle and increase the error tolerance to adjacent angles. We further introduce four window functions in CSL and explore the effect of different window radius sizes on detection performance. Extensive experiments and visual analysis on two large-scale public aerial image datasets, i.e., DOTA and HRSC2016, as well as the scene text datasets ICDAR2015 and MLT, show the effectiveness of our approach. The code will be released at https://github.com/Thinklab-SJTU/CSL_RetinaNet_Tensorflow.
Tasks Object Detection
Published 2020-03-12
URL https://arxiv.org/abs/2003.05597v1
PDF https://arxiv.org/pdf/2003.05597v1.pdf
PWC https://paperswithcode.com/paper/arbitrary-oriented-object-detection-with
Repo https://github.com/Thinklab-SJTU/CSL_RetinaNet_Tensorflow
Framework tf
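
The circular smooth label itself is easy to write down. The sketch below builds a CSL target with a Gaussian window; the bin count and window radius are example values, and the repo explores several other window functions.

```python
# Sketch of a circular smooth label (CSL) target with a Gaussian window.
import numpy as np

def circular_smooth_label(angle_deg, num_bins=180, radius=6):
    """Soft classification target for an angle, wrapping around the period."""
    bins = np.arange(num_bins)
    center = int(round(angle_deg)) % num_bins
    d = np.abs(bins - center)
    d = np.minimum(d, num_bins - d)                 # circular distance handles periodicity
    label = np.exp(-(d ** 2) / (2.0 * radius ** 2))
    label[d > radius] = 0.0                         # zero outside the window radius
    return label

target = circular_smooth_label(177.3)               # mass near bins ...176, 177, ..., 0, 1
```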

Deep Soft Procrustes for Markerless Volumetric Sensor Alignment

Title Deep Soft Procrustes for Markerless Volumetric Sensor Alignment
Authors Vladimiros Sterzentsenko, Alexandros Doumanoglou, Spyridon Thermos, Nikolaos Zioulis, Dimitrios Zarpalas, Petros Daras
Abstract With the advent of consumer grade depth sensors, low-cost volumetric capture systems are easier to deploy. Their wider adoption though depends on their usability and by extension on the practicality of spatially aligning multiple sensors. Most existing alignment approaches employ visual patterns, e.g. checkerboards, or markers and require high user involvement and technical knowledge. More user-friendly and easier-to-use approaches rely on markerless methods that exploit geometric patterns of a physical structure. However, current state-of-the-art approaches are bounded by restrictions on the placement and the number of sensors. In this work, we improve markerless data-driven correspondence estimation to achieve more robust and flexible multi-sensor spatial alignment. In particular, we incorporate geometric constraints in an end-to-end manner into a typical segmentation based model and bridge the intermediate dense classification task with the targeted pose estimation one. This is accomplished by a soft, differentiable Procrustes analysis that regularizes the segmentation and achieves higher extrinsic calibration performance in expanded sensor placement configurations, while being unrestricted by the number of sensors of the volumetric capture system. Our model is experimentally shown to achieve similar results to marker-based methods and outperform the markerless ones, while also being robust to the pose variations of the calibration structure. Code and pretrained models are available at https://vcl3d.github.io/StructureNet/.
Tasks Calibration, Pose Estimation
Published 2020-03-23
URL https://arxiv.org/abs/2003.10176v1
PDF https://arxiv.org/pdf/2003.10176v1.pdf
PWC https://paperswithcode.com/paper/deep-soft-procrustes-for-markerless
Repo https://github.com/VCL3D/VolumetricCapture
Framework none
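
At the core of the method is a soft, differentiable Procrustes fit. The sketch below shows a weighted Kabsch alignment in PyTorch, the standard building block such a regularizer rests on; it is not the StructureNet training code, and the weighting interface is an assumption.

```python
# Hedged sketch: weighted, differentiable Procrustes (Kabsch) alignment.
import torch

def procrustes_align(src, dst, weights):
    """src, dst: (N, 3) corresponding points; weights: (N,) soft correspondence weights."""
    w = (weights / weights.sum())[:, None]
    mu_s, mu_d = (w * src).sum(0), (w * dst).sum(0)
    cov = (w * (src - mu_s)).T @ (dst - mu_d)       # 3x3 weighted covariance
    U, _, Vt = torch.linalg.svd(cov)
    D = torch.eye(3)
    D[2, 2] = torch.sign(torch.det(Vt.T @ U.T))     # keep a proper rotation (det = +1)
    R = Vt.T @ D @ U.T
    t = mu_d - R @ mu_s
    return R, t                                     # maps src onto dst: R @ x + t

R, t = procrustes_align(torch.randn(100, 3), torch.randn(100, 3), torch.rand(100))
```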

Minimal Solvers for Indoor UAV Positioning

Title Minimal Solvers for Indoor UAV Positioning
Authors Marcus Valtonen Örnhag, Patrik Persson, Mårten Wadenbäck, Kalle Åström, Anders Heyden
Abstract In this paper we consider a collection of relative pose problems which arise naturally in applications for visual indoor UAV navigation. We focus on cases where additional information from an onboard IMU is available and thus provides a partial extrinsic calibration through the gravitational vector. The solvers are designed for a partially calibrated camera, for a variety of realistic indoor scenarios, which makes it possible to navigate using images of the ground floor. Current state-of-the-art solvers use more general assumptions, such as using arbitrary planar structures; however, these solvers do not yield adequate reconstructions for real scenes, nor do they perform fast enough to be incorporated in real-time systems. We show that the proposed solvers enjoy better numerical stability, are faster, and require fewer point correspondences, compared to state-of-the-art solvers. These properties are vital components for robust navigation in real-time systems, and we demonstrate on both synthetic and real data that our method outperforms other methods, and yields superior motion estimation.
Tasks Calibration, Motion Estimation
Published 2020-03-16
URL https://arxiv.org/abs/2003.07111v1
PDF https://arxiv.org/pdf/2003.07111v1.pdf
PWC https://paperswithcode.com/paper/minimal-solvers-for-indoor-uav-positioning
Repo https://github.com/marcusvaltonen/minimal_indoor_uav
Framework none

A context based deep learning approach for unbalanced medical image segmentation

Title A context based deep learning approach for unbalanced medical image segmentation
Authors Balamurali Murugesan, Kaushik Sarveswaran, Vijaya Raghavan S, Sharath M Shankaranarayana, Keerthi Ram, Mohanasankar Sivaprakasam
Abstract Automated medical image segmentation is an important step in many medical procedures. Recently, deep learning networks have been widely used for various medical image segmentation tasks, with U-Net and generative adversarial nets (GANs) being some of the commonly used ones. Foreground-background class imbalance is a common occurrence in medical images, and U-Net has difficulty in handling class imbalance because of its cross entropy (CE) objective function. Similarly, GAN also suffers from class imbalance because the discriminator looks at the entire image to classify it as real or fake. Since the discriminator is essentially a deep learning classifier, it is incapable of correctly identifying minor changes in small structures. To address these issues, we propose a novel context based CE loss function for U-Net, and a novel architecture Seg-GLGAN. The context based CE is a linear combination of CE obtained over the entire image and its region of interest (ROI). In Seg-GLGAN, we introduce a novel context discriminator to which the entire image and its ROI are fed as input, thus enforcing local context. We conduct extensive experiments using two challenging unbalanced datasets: PROMISE12 and ACDC. We observe that segmentation results obtained from our methods give better segmentation metrics as compared to various baseline methods.
Tasks Medical Image Segmentation, Semantic Segmentation
Published 2020-01-08
URL https://arxiv.org/abs/2001.02387v1
PDF https://arxiv.org/pdf/2001.02387v1.pdf
PWC https://paperswithcode.com/paper/a-context-based-deep-learning-approach-for
Repo https://github.com/Bala93/Context-aware-segmentation
Framework pytorch
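
The context-based cross-entropy described above is a linear combination of CE over the whole image and CE over a region of interest. A minimal sketch follows; the mixing weight and the ROI-mask interface are assumptions, and the repo contains the exact formulation.

```python
# Hedged sketch of a context-based cross-entropy loss for segmentation.
import torch
import torch.nn.functional as F

def context_ce_loss(logits, target, roi_mask, alpha=0.5):
    """logits: (B, C, H, W); target: (B, H, W) class indices; roi_mask: (B, H, W) bool."""
    ce_map = F.cross_entropy(logits, target, reduction="none")   # per-pixel CE, (B, H, W)
    ce_full = ce_map.mean()                                      # CE over the entire image
    ce_roi = ce_map[roi_mask].mean()                             # CE restricted to the ROI
    return alpha * ce_full + (1.0 - alpha) * ce_roi

loss = context_ce_loss(torch.randn(2, 4, 64, 64),
                       torch.randint(0, 4, (2, 64, 64)),
                       torch.rand(2, 64, 64) > 0.5)
```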

PCGRL: Procedural Content Generation via Reinforcement Learning

Title PCGRL: Procedural Content Generation via Reinforcement Learning
Authors Ahmed Khalifa, Philip Bontrager, Sam Earle, Julian Togelius
Abstract We investigate how reinforcement learning can be used to train level-designing agents. This represents a new approach to procedural content generation in games, where level design is framed as a game, and the content generator itself is learned. By seeing the design problem as a sequential task, we can use reinforcement learning to learn how to take the next action so that the expected final level quality is maximized. This approach can be used when few or no examples exist to train from, and the trained generator is very fast. We investigate three different ways of transforming two-dimensional level design problems into Markov decision processes and apply these to three game environments.
Tasks
Published 2020-01-24
URL https://arxiv.org/abs/2001.09212v2
PDF https://arxiv.org/pdf/2001.09212v2.pdf
PWC https://paperswithcode.com/paper/pcgrl-procedural-content-generation-via
Repo https://github.com/amidos2006/gym-pcgrl
Framework tf
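
Framing level design as an MDP means the content generator is simply an agent acting in a gym environment. The loop below is a hedged sketch of that interaction; the environment id follows the gym-pcgrl README examples and should be treated as an assumption for your install, and a random policy stands in for a trained one.

```python
# Hedged sketch: interacting with a PCGRL level-design environment.
import gym
import gym_pcgrl  # registers the PCGRL environments (assumed id below)

env = gym.make("binary-narrow-v0")      # problem "binary", representation "narrow"
obs = env.reset()
done = False
while not done:
    action = env.action_space.sample()              # a trained policy would act here
    obs, reward, done, info = env.step(action)      # reward reflects level quality
```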

Bayesian Sparsification Methods for Deep Complex-valued Networks

Title Bayesian Sparsification Methods for Deep Complex-valued Networks
Authors Ivan Nazarov, Evgeny Burnaev
Abstract With continual miniaturization, ever more applications of deep learning can be found in embedded systems, where it is common to encounter data with a natural complex domain representation. To this end, we extend Sparse Variational Dropout to complex-valued neural networks and verify the proposed Bayesian technique by conducting a large numerical study of the performance-compression trade-off of C-valued networks on two tasks: image recognition on MNIST-like and CIFAR10 datasets and music transcription on MusicNet. We replicate the state-of-the-art result by Trabelsi et al. [2018] on MusicNet with a complex-valued network compressed by 50-100x at a small performance penalty.
Tasks
Published 2020-03-25
URL https://arxiv.org/abs/2003.11413v1
PDF https://arxiv.org/pdf/2003.11413v1.pdf
PWC https://paperswithcode.com/paper/bayesian-sparsification-methods-for-deep
Repo https://github.com/ivannz/cplxmodule
Framework pytorch

Vision-based Fight Detection from Surveillance Cameras

Title Vision-based Fight Detection from Surveillance Cameras
Authors Şeymanur Aktı, Gözde Ayşe Tataroğlu, Hazım Kemal Ekenel
Abstract Vision-based action recognition is one of the most challenging research topics of computer vision and pattern recognition. A specific application of it, namely detecting fights from surveillance cameras in public areas, prisons, etc., is desirable for quickly bringing such violent incidents under control. This paper addresses this research problem and explores LSTM-based approaches to solve it. Moreover, an attention layer is also utilized. In addition, a new dataset is collected, which consists of fight scenes from surveillance camera videos available on YouTube. This dataset is made publicly available. From the extensive experiments conducted on Hockey Fight, Peliculas, and the newly collected fight datasets, it is observed that the proposed approach, which integrates an Xception model, Bi-LSTM, and attention, improves the state-of-the-art accuracy for fight scene classification.
Tasks Scene Classification
Published 2020-02-11
URL https://arxiv.org/abs/2002.04355v1
PDF https://arxiv.org/pdf/2002.04355v1.pdf
PWC https://paperswithcode.com/paper/vision-based-fight-detection-from
Repo https://github.com/sayibet/fight-detection-surv-dataset
Framework none
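
The proposed pipeline combines a CNN feature extractor, a Bi-LSTM, and attention. A hedged sketch of that combination on pre-extracted per-frame features is shown below; the dimensions, the two-class head, and the simple attention pooling are illustrative assumptions rather than the authors' exact model.

```python
# Illustrative CNN-features + Bi-LSTM + attention classifier for fight detection.
import torch
import torch.nn as nn

class FightClassifier(nn.Module):
    def __init__(self, feat_dim=2048, hidden=256):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)
        self.head = nn.Linear(2 * hidden, 2)        # fight / no-fight

    def forward(self, frame_feats):                 # (B, T, feat_dim) from a CNN backbone
        h, _ = self.lstm(frame_feats)               # (B, T, 2*hidden)
        a = torch.softmax(self.attn(h), dim=1)      # temporal attention weights
        return self.head((a * h).sum(dim=1))        # attention-weighted video summary

logits = FightClassifier()(torch.randn(4, 16, 2048))
```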

Pretraining Image Encoders without Reconstruction via Feature Prediction Loss

Title Pretraining Image Encoders without Reconstruction via Feature Prediction Loss
Authors Gustav Grund Pihlgren, Fredrik Sandin, Marcus Liwicki
Abstract This work investigates three different loss functions for autoencoder-based pretraining of image encoders: the commonly used reconstruction loss, the more recently introduced perceptual similarity loss, and a feature prediction loss proposed here; the latter turns out to be the most efficient choice. Prior work shows that predictions based on embeddings generated by image autoencoders can be improved by training with perceptual loss. So far, autoencoders trained with perceptual loss networks have implemented an explicit comparison of the original and reconstructed images using the loss network. However, given such a loss network we show that there is no need for the time-consuming task of decoding the entire image. Instead, we propose to decode the features of the loss network, hence the name “feature prediction loss”. To evaluate this method we compare six different procedures for training image encoders based on pixel-wise, perceptual similarity, and feature prediction loss. The embedding-based prediction results show that encoders trained with feature prediction loss are as good as or better than those trained with the other two losses. Additionally, the encoder is significantly faster to train using feature prediction loss in comparison to the other losses. The method implementation used in this work is available online: https://github.com/guspih/Perceptual-Autoencoders
Tasks
Published 2020-03-16
URL https://arxiv.org/abs/2003.07441v1
PDF https://arxiv.org/pdf/2003.07441v1.pdf
PWC https://paperswithcode.com/paper/pretraining-image-encoders-without
Repo https://github.com/guspih/Perceptual-Autoencoders
Framework pytorch
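
The core idea is to skip decoding entirely and train the embedding to predict the loss network's features of the input. The sketch below shows that training signal in a heavily simplified form; the encoder, the predictor, and pooling the loss-network features to a vector are assumptions for brevity, and the repo contains the actual setup.

```python
# Hedged sketch of a feature prediction loss: predict frozen loss-network features
# from the embedding instead of reconstructing and re-encoding the image.
import torch
import torch.nn as nn
from torchvision import models

loss_net = models.vgg16(pretrained=False).features[:16].eval()  # pretrained weights would normally be loaded
for p in loss_net.parameters():
    p.requires_grad = False

encoder = nn.Sequential(nn.Conv2d(3, 64, 4, 2, 1), nn.ReLU(),
                        nn.Conv2d(64, 128, 4, 2, 1), nn.ReLU(),
                        nn.Flatten(), nn.LazyLinear(256))
predictor = nn.Sequential(nn.Linear(256, 512), nn.ReLU(), nn.Linear(512, 256))

x = torch.randn(8, 3, 64, 64)
with torch.no_grad():
    target = loss_net(x).mean(dim=(2, 3))        # pooled loss-network features, (B, 256)
pred = predictor(encoder(x))                     # features predicted from the embedding
loss = nn.functional.mse_loss(pred, target)      # the feature prediction loss
```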

SAUNet: Shape Attentive U-Net for Interpretable Medical Image Segmentation

Title SAUNet: Shape Attentive U-Net for Interpretable Medical Image Segmentation
Authors Jesse Sun, Fatemeh Darbehani, Mark Zaidi, Bo Wang
Abstract Medical image segmentation is a difficult but important task for many clinical operations such as cardiac bi-ventricular volume estimation. More recently, there has been a shift to utilizing deep learning and fully convolutional neural networks (CNNs) to perform image segmentation that has yielded state-of-the-art results in many public benchmark datasets. Despite the progress of deep learning in medical image segmentation, standard CNNs are still not fully adopted in clinical settings as they lack robustness and interpretability. Shapes are generally more meaningful features than the image textures that regular CNNs tend to learn, and this reliance on texture contributes to the lack of robustness. Likewise, previous works surrounding model interpretability have focused on post hoc gradient-based saliency methods. However, gradient-based saliency methods typically require additional computations post hoc and have been shown to be unreliable for interpretability. Thus, we present a new architecture called Shape Attentive U-Net (SAUNet) which focuses on model interpretability and robustness. The proposed architecture attempts to address these limitations by the use of a secondary shape stream that captures rich shape-dependent information in parallel with the regular texture stream. Furthermore, we suggest multi-resolution saliency maps can be learned using our dual-attention decoder module which allows for multi-level interpretability and mitigates the need for additional computations post hoc. Our method also achieves state-of-the-art results on the two large public cardiac MRI image segmentation datasets of SUN09 and AC17.
Tasks Medical Image Segmentation, Semantic Segmentation
Published 2020-01-21
URL https://arxiv.org/abs/2001.07645v3
PDF https://arxiv.org/pdf/2001.07645v3.pdf
PWC https://paperswithcode.com/paper/saunet-shape-attentive-u-net-for
Repo https://github.com/bowang-lab/shape-attentive-unet
Framework pytorch

See More, Know More: Unsupervised Video Object Segmentation with Co-Attention Siamese Networks

Title See More, Know More: Unsupervised Video Object Segmentation with Co-Attention Siamese Networks
Authors Xiankai Lu, Wenguan Wang, Chao Ma, Jianbing Shen, Ling Shao, Fatih Porikli
Abstract We introduce a novel network, called CO-attention Siamese Network (COSNet), to address the unsupervised video object segmentation task from a holistic view. We emphasize the importance of inherent correlation among video frames and incorporate a global co-attention mechanism to improve further the state-of-the-art deep learning based solutions that primarily focus on learning discriminative foreground representations over appearance and motion in short-term temporal segments. The co-attention layers in our network provide efficient and competent stages for capturing global correlations and scene context by jointly computing and appending co-attention responses into a joint feature space. We train COSNet with pairs of video frames, which naturally augments training data and allows increased learning capacity. During the segmentation stage, the co-attention model encodes useful information by processing multiple reference frames together, which is leveraged to infer the frequently reappearing and salient foreground objects better. We propose a unified and end-to-end trainable framework where different co-attention variants can be derived for mining the rich context within videos. Our extensive experiments over three large benchmarks manifest that COSNet outperforms the current alternatives by a large margin.
Tasks Semantic Segmentation, Unsupervised Video Object Segmentation, Video Object Segmentation, Video Semantic Segmentation
Published 2020-01-19
URL https://arxiv.org/abs/2001.06810v1
PDF https://arxiv.org/pdf/2001.06810v1.pdf
PWC https://paperswithcode.com/paper/see-more-know-more-unsupervised-video-object-1
Repo https://github.com/carrierlxk/COSNet
Framework pytorch
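
The co-attention operation at the heart of COSNet computes an affinity between all spatial positions of two frames' feature maps and uses it to exchange information between them. The block below is a hedged sketch of one vanilla variant; the dimensions and the single learned weight matrix are illustrative, and the paper derives several further variants.

```python
# Hedged sketch of a vanilla co-attention block between two frame feature maps.
import torch
import torch.nn as nn

class CoAttention(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.W = nn.Linear(channels, channels, bias=False)

    def forward(self, fa, fb):                      # fa, fb: (B, C, H, W)
        B, C, H, W = fa.shape
        a = fa.flatten(2).transpose(1, 2)           # (B, HW, C)
        b = fb.flatten(2).transpose(1, 2)           # (B, HW, C)
        affinity = self.W(a) @ b.transpose(1, 2)    # (B, HW, HW) pairwise similarities
        att_a = torch.softmax(affinity, dim=2) @ b  # frame-b context for each fa position
        att_b = torch.softmax(affinity, dim=1).transpose(1, 2) @ a
        return (att_a.transpose(1, 2).reshape(B, C, H, W),
                att_b.transpose(1, 2).reshape(B, C, H, W))

out_a, out_b = CoAttention(64)(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))
```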