January 27, 2020

3413 words 17 mins read

Paper Group ANR 1307

Simulating Execution Time of Tensor Programs using Graph Neural Networks. Towards Robust Image Classification Using Sequential Attention Models. Pay Attention: Leveraging Sequence Models to Predict the Useful Life of Batteries. A Noise-Robust Fast Sparse Bayesian Learning Model. Interpreting chest X-rays via CNNs that exploit disease dependencies and uncertainty labels …

Simulating Execution Time of Tensor Programs using Graph Neural Networks

Title Simulating Execution Time of Tensor Programs using Graph Neural Networks
Authors Jakub M. Tomczak, Romain Lepert, Auke Wiggers
Abstract Optimizing the execution time of a tensor program, e.g., a convolution, involves finding its optimal configuration. Searching the configuration space exhaustively is typically infeasible in practice. In line with recent research using TVM, we propose to learn a surrogate model to overcome this issue. The model is trained on an acyclic graph called an abstract syntax tree, and utilizes a graph convolutional network to exploit structure in the graph. We claim that learnable graph-based data processing is a strong competitor to heuristic-based feature extraction. We present a new dataset of graphs corresponding to configurations and their execution time for various tensor programs. We provide baselines for a runtime prediction task.
Tasks
Published 2019-04-26
URL https://arxiv.org/abs/1904.11876v3
PDF https://arxiv.org/pdf/1904.11876v3.pdf
PWC https://paperswithcode.com/paper/simulating-execution-time-of-tensor-programs
Repo
Framework
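
At its core, the surrogate described above regresses a scalar runtime from the AST with a graph convolutional network. A minimal sketch of that idea follows, with a toy graph and random features and weights; the paper's actual architecture, features, and training setup are not reproduced here:

```python
import numpy as np

def normalize_adjacency(A):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2, as in Kipf & Welling."""
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

def gcn_runtime_prediction(A, X, W1, W2, w_out):
    """Two GCN layers + mean readout + linear head -> scalar runtime estimate."""
    A_norm = normalize_adjacency(A)
    H = np.maximum(A_norm @ X @ W1, 0.0)      # graph convolution + ReLU
    H = np.maximum(A_norm @ H @ W2, 0.0)      # second layer
    g = H.mean(axis=0)                        # graph-level readout
    return float(g @ w_out)                   # predicted execution time

# Toy AST with 4 nodes and random features/weights.
rng = np.random.default_rng(0)
A = np.array([[0,1,0,0],[1,0,1,1],[0,1,0,0],[0,1,0,0]], dtype=float)
X = rng.normal(size=(4, 8))
print(gcn_runtime_prediction(A, X, rng.normal(size=(8, 16)),
                             rng.normal(size=(16, 16)), rng.normal(size=16)))
```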

Towards Robust Image Classification Using Sequential Attention Models

Title Towards Robust Image Classification Using Sequential Attention Models
Authors Daniel Zoran, Mike Chrzanowski, Po-Sen Huang, Sven Gowal, Alex Mott, Pushmeet Kohli
Abstract In this paper we propose to augment a modern neural-network architecture with an attention model inspired by human perception. Specifically, we adversarially train and analyze a neural model incorporating a human-inspired visual attention component that is guided by a recurrent top-down sequential process. Our experimental evaluation uncovers several notable findings about the robustness and behavior of this new model. First, introducing attention to the model significantly improves adversarial robustness, resulting in state-of-the-art ImageNet accuracies under a wide range of random targeted attack strengths. Second, we show that by varying the number of attention steps (glances/fixations) for which the model is unrolled, we are able to make its defense capabilities stronger, even in light of stronger attacks, resulting in a “computational race” between the attacker and the defender. Finally, we show that some of the adversarial examples generated by attacking our model are quite different from conventional adversarial examples: they contain global, salient and spatially coherent structures coming from the target class that would be recognizable even to a human, and work by distracting the attention of the model away from the main object in the original image.
Tasks Image Classification
Published 2019-12-04
URL https://arxiv.org/abs/1912.02184v1
PDF https://arxiv.org/pdf/1912.02184v1.pdf
PWC https://paperswithcode.com/paper/towards-robust-image-classification-using
Repo
Framework
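
A hedged sketch of the recurrent top-down attention loop the abstract describes, unrolled over a fixed number of glances. All shapes and update rules here are illustrative stand-ins, not the paper's architecture:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def sequential_glances(features, W_rec, w_cls, n_steps=4):
    """features: (L, d) flattened CNN feature map; a recurrent state issues
    a top-down query at each step and attends to part of the image."""
    state = np.zeros(features.shape[1])
    for _ in range(n_steps):                 # one iteration per glance/fixation
        attn = softmax(features @ state)     # top-down attention over locations
        context = attn @ features            # attended feature summary
        state = np.tanh(W_rec @ context)     # recurrent state update
    return w_cls @ state                     # class logits after all glances

rng = np.random.default_rng(0)
feats = rng.normal(size=(49, 32))            # e.g. a 7x7 feature map, d=32
logits = sequential_glances(feats, rng.normal(size=(32, 32)),
                            rng.normal(size=(10, 32)))
print(logits.shape)                          # (10,) class scores
```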

Pay Attention: Leveraging Sequence Models to Predict the Useful Life of Batteries

Title Pay Attention: Leveraging Sequence Models to Predict the Useful Life of Batteries
Authors Samuel Paradis, Michael Whitmeyer
Abstract We use data on 124 batteries released by Stanford University to first try to solve the binary classification problem of determining if a battery is “good” or “bad” given only the first 5 cycles of data (i.e., will it last longer than a certain threshold of cycles), as well as the prediction problem of determining the exact number of cycles a battery will last given the first 100 cycles of data. We approach the problem from a purely data-driven standpoint, hoping to use deep learning to learn the patterns in the sequences of data that the Stanford team engineered by hand. For both problems, we used a similar deep network design that included an optional 1-D convolution and LSTMs, followed by an optional attention layer and fully connected layers to produce the output. For the classification task, we achieved very competitive results, with validation accuracies above 90% and a test accuracy of 95%, compared to the 97.5% test accuracy of the current leading model. For the prediction task, we also achieved competitive results, with a test MAPE of 12.5%, compared with the 9.1% MAPE achieved by the current leading model (Severson et al. 2019).
Tasks
Published 2019-10-03
URL https://arxiv.org/abs/1910.01347v2
PDF https://arxiv.org/pdf/1910.01347v2.pdf
PWC https://paperswithcode.com/paper/pay-attention-leveraging-sequence-models-to
Repo
Framework
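
The described stack (optional 1-D convolution, LSTM, optional attention, fully connected head) is easy to sketch. The following PyTorch module is an illustrative reconstruction with made-up layer sizes and feature counts, not the authors' exact network:

```python
import torch
import torch.nn as nn

class BatteryLifeNet(nn.Module):
    """Hedged sketch: 1-D conv -> LSTM -> attention over time -> FC head."""
    def __init__(self, n_features=6, hidden=64, n_out=1):
        super().__init__()
        self.conv = nn.Conv1d(n_features, n_features, kernel_size=3, padding=1)
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.attn = nn.Linear(hidden, 1)       # additive attention scores
        self.head = nn.Sequential(nn.Linear(hidden, 32), nn.ReLU(),
                                  nn.Linear(32, n_out))

    def forward(self, x):                      # x: (batch, time, features)
        x = self.conv(x.transpose(1, 2)).transpose(1, 2)
        h, _ = self.lstm(x)                    # (batch, time, hidden)
        w = torch.softmax(self.attn(h), dim=1) # attention over cycles
        ctx = (w * h).sum(dim=1)               # weighted summary of the sequence
        return self.head(ctx)                  # cycle-life estimate (or logit)

model = BatteryLifeNet()
cycles = torch.randn(8, 100, 6)  # 8 batteries, 100 cycles, 6 per-cycle features
print(model(cycles).shape)       # torch.Size([8, 1])
```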

A Noise-Robust Fast Sparse Bayesian Learning Model

Title A Noise-Robust Fast Sparse Bayesian Learning Model
Authors Ingvild M. Helgøy, Yushu Li
Abstract This paper utilizes the hierarchical model structure from the Bayesian Lasso in the Sparse Bayesian Learning process to develop a new type of probabilistic supervised learning approach. This approach has several performance advantages: it is fast, sparse, and especially robust to the variance of random noise. The hierarchical model structure in this Bayesian framework is designed in such a way that the priors not only penalize the unnecessary complexity of the model but also depend on the variance of the random noise in the data. The hyperparameters in the model are estimated by the Fast Marginal Likelihood Maximization algorithm, which achieves low computational cost and a fast learning process. We compare our methodology with two other popular Sparse Bayesian Learning models: the Relevance Vector Machine and a sparse Bayesian model that has been used for signal reconstruction in compressive sensing. We show that our method generally provides sparser solutions and is more flexible and stable when the data are polluted by high-variance noise.
Tasks Compressive Sensing
Published 2019-08-20
URL https://arxiv.org/abs/1908.07220v1
PDF https://arxiv.org/pdf/1908.07220v1.pdf
PWC https://paperswithcode.com/paper/a-noise-robust-fast-sparse-bayesian-learning
Repo
Framework
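
For reference, the standard Bayesian Lasso hierarchy (Park and Casella, 2008) that the abstract builds on couples the weight priors to the noise variance, which is what makes the priors noise-dependent; the paper's exact priors may differ from this sketch:

```latex
y \mid w, \sigma^2 \sim \mathcal{N}(\Phi w, \sigma^2 I), \qquad
w_j \mid \sigma^2, \tau_j^2 \sim \mathcal{N}(0, \sigma^2 \tau_j^2), \qquad
\tau_j^2 \sim \mathrm{Exp}(\lambda^2 / 2)
```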

Interpreting chest X-rays via CNNs that exploit disease dependencies and uncertainty labels

Title Interpreting chest X-rays via CNNs that exploit disease dependencies and uncertainty labels
Authors Hieu H. Pham, Tung T. Le, Dat Q. Tran, Dat T. Ngo, Ha Q. Nguyen
Abstract Chest radiography is one of the most common types of diagnostic radiology exams and is critical for the screening and diagnosis of many thoracic diseases. Specialized algorithms have been developed to detect specific pathologies such as lung nodules or lung cancer. However, accurately detecting the presence of multiple diseases from chest X-rays (CXRs) is still a challenging task. This paper presents a supervised multi-label classification framework based on deep convolutional neural networks (CNNs) for predicting the risk of 14 common thoracic diseases. We tackle this problem by training state-of-the-art CNNs that exploit dependencies among abnormality labels. We also propose to use the label smoothing technique for better handling of uncertain samples, which occupy a significant portion of almost every CXR dataset. Our model is trained on over 200,000 CXRs of the recently released CheXpert dataset and achieves a mean area under the curve (AUC) of 0.940 in predicting 5 selected pathologies from the validation set, the highest AUC score reported to date. The proposed method is also evaluated on the independent test set of the CheXpert competition, which is composed of 500 CXR studies annotated by a panel of 5 experienced radiologists. With a mean AUC of 0.930, its performance is on average better than that of 2.6 out of 3 other individual radiologists, ranking first on the CheXpert leaderboard at the time of writing.
Tasks Multi-Label Classification
Published 2019-11-15
URL https://arxiv.org/abs/1911.06475v2
PDF https://arxiv.org/pdf/1911.06475v2.pdf
PWC https://paperswithcode.com/paper/interpreting-chest-x-rays-via-cnns-that
Repo
Framework
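
One way to read the label smoothing idea for uncertain samples: CheXpert marks some labels as uncertain (-1), and these can be mapped to soft targets instead of being discarded. A minimal sketch, where the smoothing interval is an illustrative assumption rather than the paper's calibrated choice:

```python
import torch
import torch.nn.functional as F

def smooth_uncertain_labels(labels, lo=0.55, hi=0.85, uncertain=-1.0):
    """Map CheXpert-style labels {0, 1, -1 (uncertain)} to soft targets.
    Certain labels stay hard; uncertain ones get a smoothed positive value
    drawn uniformly from [lo, hi] (interval values are illustrative)."""
    targets = labels.clone().float()
    mask = labels == uncertain
    targets[mask] = lo + (hi - lo) * torch.rand(int(mask.sum()))
    return targets

logits = torch.randn(4, 14)                     # 14 thoracic disease scores
labels = torch.tensor([[1., 0., -1.] + [0.] * 11] * 4)
loss = F.binary_cross_entropy_with_logits(logits, smooth_uncertain_labels(labels))
print(loss.item())
```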

Don’t Say That! Making Inconsistent Dialogue Unlikely with Unlikelihood Training

Title Don’t Say That! Making Inconsistent Dialogue Unlikely with Unlikelihood Training
Authors Margaret Li, Stephen Roller, Ilia Kulikov, Sean Welleck, Y-Lan Boureau, Kyunghyun Cho, Jason Weston
Abstract Generative dialogue models currently suffer from a number of problems which standard maximum likelihood training does not address. They tend to produce generations that (i) rely too much on copying from the context, (ii) contain repetitions within utterances, (iii) overuse frequent words, and (iv) at a deeper level, contain logical flaws. In this work we show how all of these problems can be addressed by extending the recently introduced unlikelihood loss (Welleck et al., 2019) to these cases. We show that appropriate loss functions which regularize generated outputs to match human distributions are effective for the first three issues. For the last, more general issue, we show that applying unlikelihood to collected data of what a model should not do is effective for improving logical consistency, potentially paving the way to generative models with greater reasoning ability. We demonstrate the efficacy of our approach across several dialogue tasks.
Tasks
Published 2019-11-10
URL https://arxiv.org/abs/1911.03860v1
PDF https://arxiv.org/pdf/1911.03860v1.pdf
PWC https://paperswithcode.com/paper/dont-say-that-making-inconsistent-dialogue
Repo
Framework
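
The unlikelihood loss itself is compact: for tokens a model should not produce, it penalizes their probability mass. A minimal sketch with a toy vocabulary; how the negative tokens are collected per task is the paper's contribution and is not reproduced here:

```python
import torch

def unlikelihood_loss(logprobs, negative_tokens):
    """Penalize probability mass on tokens the model should NOT produce:
    sum over negatives of -log(1 - p(token)) (Welleck et al., 2019)."""
    p = logprobs.exp()
    p_neg = p.gather(-1, negative_tokens)          # probs of forbidden tokens
    return -torch.log1p(-p_neg.clamp(max=1 - 1e-6)).sum()

# Toy example: vocabulary of 10, one time step, tokens 3 and 7 are negatives.
logprobs = torch.log_softmax(torch.randn(1, 10), dim=-1)
print(unlikelihood_loss(logprobs, torch.tensor([[3, 7]])))
```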

Adversarial training with cycle consistency for unsupervised super-resolution in endomicroscopy

Title Adversarial training with cycle consistency for unsupervised super-resolution in endomicroscopy
Authors Daniele Ravì, Agnieszka Barbara Szczotka, Stephen P Pereira, Tom Vercauteren
Abstract In recent years, endomicroscopy has become increasingly used for diagnostic purposes and interventional guidance. It can provide intraoperative aids for real-time tissue characterization and can help perform visual investigations aimed, for example, at discovering epithelial cancers. Due to physical constraints on the acquisition process, endomicroscopy images still have a low number of informative pixels, which hampers their quality. Post-processing techniques, such as Super-Resolution (SR), are a potential solution to increase the quality of these images. SR techniques are often supervised, requiring aligned pairs of low-resolution (LR) and high-resolution (HR) image patches to train a model. However, in our domain, the lack of HR images hinders the collection of such pairs and makes supervised training unsuitable. For this reason, we propose an unsupervised SR framework based on an adversarial deep neural network with a physically-inspired cycle consistency, designed to impose some acquisition properties on the super-resolved images. Our framework can exploit HR images, regardless of the domain they come from, to transfer the quality of the HR images to the initial LR images. This property can be particularly useful in all situations where LR/HR pairs are not available during training. Our quantitative analysis, validated using a database of 238 endomicroscopy video sequences from 143 patients, shows the ability of the pipeline to produce convincing super-resolved images. A Mean Opinion Score (MOS) study also confirms this quantitative image quality assessment.
Tasks Image Quality Assessment, Super-Resolution
Published 2019-01-21
URL http://arxiv.org/abs/1901.06988v2
PDF http://arxiv.org/pdf/1901.06988v2.pdf
PWC https://paperswithcode.com/paper/adversarial-training-with-cycle-consistency
Repo
Framework
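
The cycle-consistency idea can be sketched as: re-degrading the super-resolved output should reproduce the LR input. The sketch below substitutes plain average pooling for the paper's physically-inspired acquisition model and omits the adversarial term:

```python
import torch
import torch.nn.functional as F

def cycle_consistency_loss(lr, sr, scale=2):
    """Re-degrade the super-resolved image (average pooling as a stand-in
    for the true acquisition model) and compare with the original LR input."""
    lr_cycle = F.avg_pool2d(sr, kernel_size=scale)
    return F.l1_loss(lr_cycle, lr)

lr = torch.rand(1, 1, 64, 64)                 # low-res endomicroscopy patch
sr = F.interpolate(lr, scale_factor=2, mode='bilinear', align_corners=False)
print(cycle_consistency_loss(lr, sr).item())  # a full loss would add a GAN term
```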

No reference image quality assessment metric based on regional mutual information among images

Title No reference image quality assessment metric based on regional mutual information among images
Authors Vinay Kumar, Vivek Singh Bawa
Abstract With the inclusion of cameras in daily life, an automatic no-reference image quality evaluation index is required for the automatic classification of images. The present manuscript proposes a new no-reference technique based on regional mutual information for evaluating the quality of an image. We use regional mutual information on subsets of the complete image. The proposed technique is tested on four benchmark natural image databases and one benchmark synthetic database. A comparative analysis with classical and state-of-the-art methods indicates the superiority of the present technique for high-quality images and comparable performance for other images of the respective databases.
Tasks Image Quality Assessment, No-Reference Image Quality Assessment
Published 2019-01-17
URL http://arxiv.org/abs/1901.05811v1
PDF http://arxiv.org/pdf/1901.05811v1.pdf
PWC https://paperswithcode.com/paper/no-reference-image-quality-assessment-metric
Repo
Framework
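
A plausible building block, hedged: mutual information between image regions computed from a joint histogram. The pooling of region scores into a single quality index below is illustrative, not the paper's exact metric:

```python
import numpy as np

def mutual_information(a, b, bins=32):
    """MI between two equally-sized image regions via a joint histogram."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / (px[:, None] * py[None, :])[nz])).sum())

# Toy quality index: average MI between each region and a shifted copy of
# itself, over non-overlapping regions (the paper's exact pooling differs).
rng = np.random.default_rng(0)
img = rng.random((64, 64))
regions = [img[i:i+32, j:j+32] for i in (0, 32) for j in (0, 32)]
score = np.mean([mutual_information(r, np.roll(r, 1, axis=0)) for r in regions])
print(score)
```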

Automated Lesion Detection by Regressing Intensity-Based Distance with a Neural Network

Title Automated Lesion Detection by Regressing Intensity-Based Distance with a Neural Network
Authors Kimberlin M. H. van Wijnen, Florian Dubost, Pinar Yilmaz, M. Arfan Ikram, Wiro J. Niessen, Hieab Adams, Meike W. Vernooij, Marleen de Bruijne
Abstract Localization of focal vascular lesions on brain MRI is an important component of research on the etiology of neurological disorders. However, manual annotation of lesions can be challenging, time-consuming and subject to observer bias. Automated detection methods often need voxel-wise annotations for training. We propose a novel approach for automated lesion detection that can be trained on scans only annotated with a dot per lesion instead of a full segmentation. From the dot annotations and their corresponding intensity images we compute various distance maps (DMs), indicating the distance to a lesion based on spatial distance, intensity distance, or both. We train a fully convolutional neural network (FCN) to predict these DMs for unseen intensity images. The local optima in the predicted DMs are expected to correspond to lesion locations. We show the potential of this approach to detect enlarged perivascular spaces in white matter on a large brain MRI dataset with an independent test set of 1000 scans. Our method matches the intra-rater performance of the expert rater that was computed on an independent set. We compare the different types of distance maps, showing that incorporating intensity information in the distance maps used to train an FCN greatly improves performance.
Tasks
Published 2019-07-29
URL https://arxiv.org/abs/1907.12452v1
PDF https://arxiv.org/pdf/1907.12452v1.pdf
PWC https://paperswithcode.com/paper/automated-lesion-detection-by-regressing
Repo
Framework
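
The distance-map construction is straightforward to sketch: from dot annotations, compute a spatial distance transform and optionally add an intensity term so voxels that look unlike the annotated lesions count as "farther". The weighting below is an assumption for illustration:

```python
import numpy as np
from scipy import ndimage

def combined_distance_map(dots, intensity, alpha=1.0):
    """Distance-to-lesion map from dot annotations: Euclidean distance to
    the nearest dot, plus an intensity-difference term (alpha is illustrative)."""
    spatial = ndimage.distance_transform_edt(~dots)      # 0 at each dot
    lesion_mean = intensity[dots].mean()
    intensity_dist = np.abs(intensity - lesion_mean)
    return spatial + alpha * intensity_dist

dots = np.zeros((64, 64), dtype=bool)
dots[20, 30] = True                                      # one dot annotation
intensity = np.random.default_rng(0).random((64, 64))
dm = combined_distance_map(dots, intensity)
print(dm.shape, dm.min())   # an FCN would be trained to regress this map
```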

Detecting Pathogenic Social Media Accounts without Content or Network Structure

Title Detecting Pathogenic Social Media Accounts without Content or Network Structure
Authors Elham Shaabani, Ruocheng Guo, Paulo Shakarian
Abstract The spread of harmful misinformation on social media is a pressing problem. We refer to accounts that have the capability of spreading such information to viral proportions as “Pathogenic Social Media” accounts. These accounts include terrorist supporters, water armies, and fake news writers. We introduce an unsupervised causality-based framework that also leverages label propagation. This approach identifies these users without using network structure, cascade path information, content, or user information. We show that our approach obtains higher precision (0.75) in identifying Pathogenic Social Media accounts in comparison with random selection (precision of 0.11) and existing bot detection (precision of 0.16) methods.
Tasks
Published 2019-05-04
URL https://arxiv.org/abs/1905.01556v1
PDF https://arxiv.org/pdf/1905.01556v1.pdf
PWC https://paperswithcode.com/paper/detecting-pathogenic-social-media-accounts
Repo
Framework

ViLiVO: Virtual LiDAR-Visual Odometry for an Autonomous Vehicle with a Multi-Camera System

Title ViLiVO: Virtual LiDAR-Visual Odometry for an Autonomous Vehicle with a Multi-Camera System
Authors Zhenzhen Xiang, Jingrui Yu, Jie Li, Jianbo Su
Abstract In this paper, we present a multi-camera visual odometry (VO) system for an autonomous vehicle. Our system mainly consists of a virtual LiDAR and a pose tracker. We use a perspective transformation method to synthesize a surround-view image from undistorted fisheye camera images. With a semantic segmentation model, the free space can be extracted. The scans of the virtual LiDAR are generated by discretizing the contours of the free space. As for the pose tracker, we propose a visual odometry system fusing both the feature matching and the virtual LiDAR scan matching results. Only those feature points located in the free space area are utilized to ensure the 2D-2D matching for pose estimation. Furthermore, bundle adjustment (BA) is performed to minimize the feature point reprojection error and scan matching error. We apply our system to an autonomous vehicle equipped with four fisheye cameras. The testing scenarios include an outdoor parking lot as well as an indoor garage. Experimental results demonstrate that our system achieves more robust and accurate performance compared with a fisheye-camera-based monocular visual odometry system.
Tasks Monocular Visual Odometry, Pose Estimation, Semantic Segmentation, Visual Odometry
Published 2019-09-30
URL https://arxiv.org/abs/1909.12947v1
PDF https://arxiv.org/pdf/1909.12947v1.pdf
PWC https://paperswithcode.com/paper/vilivo-virtual-lidar-visual-odometry-for-an
Repo
Framework
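
The virtual LiDAR step can be sketched as ray casting over the free-space mask extracted by the segmentation model: each beam reports the distance to the first non-free cell. Grid size, beam count, and vehicle center below are illustrative:

```python
import numpy as np

def virtual_lidar_scan(free_space, center, n_beams=360, max_range=200):
    """Cast rays from the vehicle center over a top-down free-space mask;
    each beam's range is the distance to the first non-free cell."""
    h, w = free_space.shape
    scan = np.full(n_beams, float(max_range))
    for i, theta in enumerate(np.linspace(0, 2 * np.pi, n_beams, endpoint=False)):
        for r in range(1, max_range):
            y = int(center[0] + r * np.sin(theta))
            x = int(center[1] + r * np.cos(theta))
            if not (0 <= y < h and 0 <= x < w) or not free_space[y, x]:
                scan[i] = r                  # hit the free-space contour
                break
    return scan

free = np.zeros((100, 100), dtype=bool)
free[30:70, 30:70] = True                    # free space from segmentation
print(virtual_lidar_scan(free, center=(50, 50))[:8])
```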

DeepTIO: A Deep Thermal-Inertial Odometry with Visual Hallucination

Title DeepTIO: A Deep Thermal-Inertial Odometry with Visual Hallucination
Authors Muhamad Risqi U. Saputra, Pedro P. B. de Gusmao, Chris Xiaoxuan Lu, Yasin Almalioglu, Stefano Rosa, Changhao Chen, Johan Wahlström, Wei Wang, Andrew Markham, Niki Trigoni
Abstract Visual odometry shows excellent performance in a wide range of environments. However, in visually-denied scenarios (e.g. heavy smoke or darkness), pose estimates degrade or even fail. Thermal cameras are commonly used for perception and inspection when the environment has low visibility. However, their use in odometry estimation is hampered by the lack of robust visual features. In part, this is a result of the sensor measuring the ambient temperature profile rather than scene appearance and geometry. To overcome this issue, we propose a Deep Neural Network model for thermal-inertial odometry (DeepTIO) that incorporates a visual hallucination network to provide the thermal network with complementary information. The hallucination network is taught to predict fake visual features from thermal images using a Huber loss. We also employ selective fusion to attentively fuse the features from three different modalities, i.e., thermal, hallucination, and inertial features. Extensive experiments are performed on hand-held and mobile robot data in benign and smoke-filled environments, showing the efficacy of the proposed model.
Tasks Visual Odometry
Published 2019-09-16
URL https://arxiv.org/abs/1909.07231v2
PDF https://arxiv.org/pdf/1909.07231v2.pdf
PWC https://paperswithcode.com/paper/deeptio-a-deep-thermal-inertial-odometry-with
Repo
Framework
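
The hallucination branch reduces to regressing visual features from thermal features under a Huber loss. A minimal stand-in, with a single linear layer in place of the paper's hallucination network and random tensors in place of real encoder outputs:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

halluc = nn.Linear(128, 256)                    # stand-in hallucination network
thermal_feat = torch.randn(8, 128)              # features from a thermal encoder
visual_feat = torch.randn(8, 256)               # target features from an RGB encoder
fake_visual = halluc(thermal_feat)              # "hallucinated" visual features
loss = F.huber_loss(fake_visual, visual_feat)   # robust to outlier feature errors
loss.backward()
print(loss.item())
```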

Machine Vision for Natural Gas Methane Emissions Detection Using an Infrared Camera

Title Machine Vision for Natural Gas Methane Emissions Detection Using an Infrared Camera
Authors Jingfan Wang, Lyne P. Tchapmi, Arvind P. Ravikumara, Mike McGuire, Clay S. Bell, Daniel Zimmerle, Silvio Savarese, Adam R. Brandt
Abstract It is crucial to reduce natural gas methane emissions, which can potentially offset the climate benefits of replacing coal with gas. Optical gas imaging (OGI) is a widely-used method to detect methane leaks, but it is labor-intensive and cannot provide leak detection results without operators’ judgment. In this paper, we develop a computer vision approach to OGI-based leak detection using convolutional neural networks (CNNs) trained on methane leak images to enable automatic detection. First, we collect ~1 M frames of labeled video of methane leaks from different leaking equipment to build the CNN model, covering a wide range of leak sizes (5.3-2051.6 gCH4/h) and imaging distances (4.6-15.6 m). Second, we examine different background subtraction methods to extract the methane plume in the foreground. Third, we test three CNN model variants, collectively called GasNet, to detect plumes in videos taken at other pieces of leaking equipment. We assess the ability of GasNet to perform leak detection by comparing it to a baseline method that uses an optical-flow-based change detection algorithm. We explore the sensitivity of results to the CNN structure, with a moderate-complexity variant performing best across distances. We find that detection accuracy can reach as high as 99%, and overall detection accuracy can exceed 95% across all leak sizes and imaging distances. Binary detection accuracy exceeds 97% for large leaks (~710 gCH4/h) imaged closely (~5-7 m). At closer imaging distances (~5-10 m), CNN-based models have greater than 94% accuracy across all leak sizes. At the farthest distances (~13-16 m), performance degrades rapidly, but accuracy remains above 95% for large leaks (>950 gCH4/h). The GasNet-based computer vision approach could be deployed in OGI surveys to enable automatic, high-accuracy methane leak detection in the real world.
Tasks Optical Flow Estimation
Published 2019-04-01
URL http://arxiv.org/abs/1904.08500v1
PDF http://arxiv.org/pdf/1904.08500v1.pdf
PWC https://paperswithcode.com/paper/190408500
Repo
Framework
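
A hedged sketch of the background-subtraction stage: a running-average background model over IR frames, whose foreground mask (the candidate plume) would then be fed to a GasNet-style CNN. The specific subtraction methods the paper compares are not reproduced here:

```python
import numpy as np

def subtract_background(frames, alpha=0.05, thresh=0.1):
    """Running-average background model; the foreground mask is the
    candidate methane plume a CNN would classify (parameters illustrative)."""
    bg = frames[0].astype(float)
    masks = []
    for f in frames[1:]:
        masks.append(np.abs(f - bg) > thresh)   # pixels that changed = plume
        bg = (1 - alpha) * bg + alpha * f       # slowly adapt the background
    return masks

rng = np.random.default_rng(0)
frames = rng.random((10, 48, 48)) * 0.05        # static scene + sensor noise
frames[5:, 10:20, 10:20] += 0.5                 # synthetic plume appears
print(subtract_background(frames)[6].sum())     # plume pixels flagged
```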

Multi-Task Deep Learning with Dynamic Programming for Embryo Early Development Stage Classification from Time-Lapse Videos

Title Multi-Task Deep Learning with Dynamic Programming for Embryo Early Development Stage Classification from Time-Lapse Videos
Authors Zihan Liu, Bo Huang, Yuqi Cui, Yifan Xu, Bo Zhang, Lixia Zhu, Yang Wang, Lei Jin, Dongrui Wu
Abstract Time-lapse is a technology used to record the development of embryos during in-vitro fertilization (IVF). Accurate classification of embryo early development stages can provide embryologists with valuable information for assessing embryo quality, and hence is critical to the success of IVF. This paper proposes a multi-task deep learning with dynamic programming (MTDL-DP) approach for this purpose. It first uses MTDL to pre-classify each frame in the time-lapse video into an embryo development stage, and then uses DP to optimize the stage sequence so that the stage number is monotonically non-decreasing, which usually holds in practice. Different MTDL frameworks, e.g., one-to-many, many-to-one, and many-to-many, are investigated. It is shown that the one-to-many MTDL framework achieves the best compromise between performance and computational cost. To our knowledge, this is the first study that applies MTDL to embryo early development stage classification from time-lapse videos.
Tasks
Published 2019-08-22
URL https://arxiv.org/abs/1908.09637v1
PDF https://arxiv.org/pdf/1908.09637v1.pdf
PWC https://paperswithcode.com/paper/multi-task-deep-learning-with-dynamic
Repo
Framework
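
The DP step is the most self-contained part: given per-frame stage probabilities from the MTDL model, find the highest-scoring monotonically non-decreasing stage sequence. A sketch, with illustrative frame and stage counts:

```python
import numpy as np

def monotonic_stage_decode(log_probs):
    """Given per-frame stage log-probabilities (T frames x S stages), find
    the best stage sequence that is monotonically non-decreasing over time."""
    T, S = log_probs.shape
    dp = np.full((T, S), -np.inf)
    back = np.zeros((T, S), dtype=int)
    dp[0] = log_probs[0]
    for t in range(1, T):
        for s in range(S):
            prev = int(np.argmax(dp[t - 1, :s + 1]))  # best stage <= s at t-1
            dp[t, s] = dp[t - 1, prev] + log_probs[t, s]
            back[t, s] = prev
    path = [int(np.argmax(dp[-1]))]
    for t in range(T - 1, 0, -1):                     # backtrack the sequence
        path.append(int(back[t, path[-1]]))
    return path[::-1]

rng = np.random.default_rng(0)
probs = np.log(rng.dirichlet(np.ones(4), size=12))    # 12 frames, 4 stages
print(monotonic_stage_decode(probs))                  # non-decreasing stages
```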

GIF2Video: Color Dequantization and Temporal Interpolation of GIF images

Title GIF2Video: Color Dequantization and Temporal Interpolation of GIF images
Authors Yang Wang, Haibin Huang, Chuan Wang, Tong He, Jue Wang, Minh Hoai
Abstract Graphics Interchange Format (GIF) is a highly portable graphics format that is ubiquitous on the Internet. Despite their small sizes, GIF images often contain undesirable visual artifacts such as flat color regions, false contours, color shift, and dotted patterns. In this paper, we propose GIF2Video, the first learning-based method for enhancing the visual quality of GIFs in the wild. We focus on the challenging task of GIF restoration by recovering information lost in the three steps of GIF creation: frame sampling, color quantization, and color dithering. We first propose a novel CNN architecture for color dequantization. It is built upon a compositional architecture for multi-step color correction, with a comprehensive loss function designed to handle large quantization errors. We then adapt the SuperSlomo network for temporal interpolation of GIF frames. We introduce two large datasets, namely GIF-Faces and GIF-Moments, for both training and evaluation. Experimental results show that our method can significantly improve the visual quality of GIFs, and outperforms direct baseline and state-of-the-art approaches.
Tasks Quantization
Published 2019-01-09
URL http://arxiv.org/abs/1901.02840v2
PDF http://arxiv.org/pdf/1901.02840v2.pdf
PWC https://paperswithcode.com/paper/gif2video-color-dequantization-and-temporal
Repo
Framework
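
For the dequantization task, training pairs can be produced by simulating GIF color quantization. The sketch below uses a uniform palette as a stand-in for true 256-color GIF palettes and omits dithering:

```python
import numpy as np

def quantize_colors(img, n_levels=8):
    """Simulate GIF-style color quantization (uniform palette as a stand-in
    for a true 256-color palette) to create (GIF, original) training pairs."""
    step = 256 // n_levels
    return (img // step) * step + step // 2   # snap each channel to a level

rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
gif_like = quantize_colors(frame)             # input to a dequantization CNN
print(np.abs(gif_like.astype(int) - frame.astype(int)).mean())  # quant. error
```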