October 17, 2019

3391 words 16 mins read

Paper Group ANR 880

Chest X-rays Classification: A Multi-Label and Fine-Grained Problem. Offline Object Extraction from Dynamic Occupancy Grid Map Sequences. Adjustable Real-time Style Transfer. Unsupervised Deep Domain Adaptation for Pedestrian Detection. Cross-modal Hallucination for Few-shot Fine-grained Recognition. Near Real-Time Data Labeling Using a Depth Senso …

Chest X-rays Classification: A Multi-Label and Fine-Grained Problem

Title Chest X-rays Classification: A Multi-Label and Fine-Grained Problem
Authors Zongyuan Ge, Dwarikanath Mahapatra, Suman Sedai, Rahil Garnavi, Rajib Chakravorty
Abstract The widely used ChestX-ray14 dataset addresses an important medical image classification problem and has the following caveats: 1) many lung pathologies are visually similar; 2) a variety of diseases, including lung cancer, tuberculosis, and pneumonia, can be present in a single scan, i.e., multiple labels; and 3) the incidence of healthy images is much larger than that of diseased samples, creating imbalanced data. These properties are common in the medical domain. Existing literature uses state-of-the-art DenseNet/ResNet models with transfer learning, where the output neurons of the networks are trained for individual diseases to cater for multiple disease labels in each image. However, most of these approaches do not consider the relationship between multiple classes. In this work we propose a novel error function, Multi-label Softmax Loss (MSML), to specifically address the properties of multiple labels and imbalanced data. Moreover, we design a deep network architecture based on the fine-grained classification concept that incorporates MSML. We evaluate our proposed method on various network backbones and show consistent performance improvements in AUC-ROC scores on the ChestX-ray14 dataset. The proposed error function provides a new method to gain improved performance across wider medical datasets.
Tasks Image Classification
Published 2018-07-19
URL http://arxiv.org/abs/1807.07247v3
PDF http://arxiv.org/pdf/1807.07247v3.pdf
PWC https://paperswithcode.com/paper/chest-x-rays-classification-a-multi-label-and
Repo
Framework
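
To make the baseline setup above concrete, here is a minimal, hypothetical sketch of the transfer-learning arrangement the abstract contrasts itself against: a DenseNet-121 backbone with one output neuron per ChestX-ray14 pathology. The MSML loss itself is not described in the abstract, so standard multi-label binary cross-entropy stands in for it; batch size and input size are illustrative.

```python
# Hypothetical sketch: DenseNet backbone with one output neuron per disease, as in
# the baseline setup the abstract describes. The paper's MSML loss is not specified
# here, so multi-label BCE is used as a stand-in.
import torch
import torch.nn as nn
from torchvision import models

NUM_DISEASES = 14  # ChestX-ray14 defines 14 pathology labels

backbone = models.densenet121()  # in practice, ImageNet-pretrained weights would be loaded
backbone.classifier = nn.Linear(backbone.classifier.in_features, NUM_DISEASES)

criterion = nn.BCEWithLogitsLoss()  # placeholder; the paper replaces this with MSML

images = torch.randn(8, 3, 224, 224)                        # a batch of chest X-rays
targets = torch.randint(0, 2, (8, NUM_DISEASES)).float()    # multi-hot disease labels

logits = backbone(images)
loss = criterion(logits, targets)
loss.backward()
```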

Offline Object Extraction from Dynamic Occupancy Grid Map Sequences

Title Offline Object Extraction from Dynamic Occupancy Grid Map Sequences
Authors Daniel Stumper, Fabian Gies, Stefan Hoermann, Klaus Dietmayer
Abstract A dynamic occupancy grid map (DOGMa) allows a fast, robust, and complete environment representation for automated vehicles. Dynamic objects in a DOGMa, however, are commonly represented as independent cells, while modeled objects with shape and pose are favorable. The evaluation of algorithms for object extraction, or the training and validation of learning algorithms, relies on labeled ground truth data. Manually annotating objects in a DOGMa to obtain ground truth data is a time-consuming and expensive process. Additionally, the quality of labeled data depends strongly on the variation of the filtered input data. The presented work introduces an automatic labeling process, where a full sequence is used to extract the best possible object pose and shape in terms of temporal consistency. A two-direction temporal search is executed to trace single objects over a sequence, where the best estimate of each object's extent and pose is refined in every time step. Furthermore, the presented algorithm only uses statistical constraints of the cell clusters for the object extraction instead of fixed heuristic parameters. Experimental results show a well-performing automatic labeling algorithm with real sensor data, even in challenging scenarios.
Tasks
Published 2018-04-11
URL http://arxiv.org/abs/1804.03933v1
PDF http://arxiv.org/pdf/1804.03933v1.pdf
PWC https://paperswithcode.com/paper/offline-object-extraction-from-dynamic
Repo
Framework

Adjustable Real-time Style Transfer

Title Adjustable Real-time Style Transfer
Authors Mohammad Babaeizadeh, Golnaz Ghiasi
Abstract Artistic style transfer is the problem of synthesizing an image with content similar to a given image and style similar to another. Although recent feed-forward neural networks can generate stylized images in real-time, these models produce a single stylization given a pair of style/content images, and the user doesn’t have control over the synthesized output. Moreover, the style transfer depends on the hyper-parameters of the model with varying “optimum” for different input images. Therefore, if the stylized output is not appealing to the user, she/he has to try multiple models or retrain one with different hyper-parameters to get a favorite stylization. In this paper, we address these issues by proposing a novel method which allows adjustment of crucial hyper-parameters, after the training and in real-time, through a set of manually adjustable parameters. These parameters enable the user to modify the synthesized outputs from the same pair of style/content images, in search of a favorite stylized image. Our quantitative and qualitative experiments indicate how adjusting these parameters is comparable to retraining the model with different hyper-parameters. We also demonstrate how these parameters can be randomized to generate results which are diverse but still very similar in style and content.
Tasks Style Transfer
Published 2018-11-21
URL http://arxiv.org/abs/1811.08560v1
PDF http://arxiv.org/pdf/1811.08560v1.pdf
PWC https://paperswithcode.com/paper/adjustable-real-time-style-transfer
Repo
Framework

Unsupervised Deep Domain Adaptation for Pedestrian Detection

Title Unsupervised Deep Domain Adaptation for Pedestrian Detection
Authors Lihang Liu, Weiyao Lin, Lisheng Wu, Yong Yu, Michael Ying Yang
Abstract This paper addresses the problem of unsupervised domain adaptation on the task of pedestrian detection in crowded scenes. First, we utilize an iterative algorithm to select and auto-annotate positive pedestrian samples with high confidence as the training samples for the target domain. Meanwhile, we also reuse negative samples from the source domain to compensate for the imbalance between the amount of positive samples and negative samples. Second, based on the deep network, we also design an unsupervised regularizer to mitigate the influence of data noise. More specifically, we transform the last fully connected layer into two sub-layers, an element-wise multiply layer and a sum layer, and add the unsupervised regularizer to further improve the domain adaptation accuracy. In experiments on pedestrian detection, the proposed method boosts the recall value by nearly 30% while the precision stays almost the same. Furthermore, we evaluate our method on standard domain adaptation benchmarks in both supervised and unsupervised settings and also achieve state-of-the-art results.
Tasks Domain Adaptation, Pedestrian Detection, Unsupervised Domain Adaptation
Published 2018-02-09
URL http://arxiv.org/abs/1802.03269v1
PDF http://arxiv.org/pdf/1802.03269v1.pdf
PWC https://paperswithcode.com/paper/unsupervised-deep-domain-adaptation-for
Repo
Framework
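
The sub-layer decomposition described above can be illustrated with a short sketch: a bias-free fully connected layer is rewritten as an element-wise multiply layer followed by a sum layer, exposing the intermediate product that a regularizer could act on. Layer sizes are illustrative, and the paper's actual regularizer is not reproduced.

```python
# Minimal sketch of splitting a fully connected layer y_j = sum_i w_ji * x_i into
# an element-wise multiply sub-layer followed by a sum sub-layer.
import torch
import torch.nn as nn

class MultiplySumFC(nn.Module):
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        # one weight vector per output neuron, used by the element-wise multiply layer
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)

    def forward(self, x):
        # element-wise multiply layer: (batch, 1, in) * (out, in) -> (batch, out, in)
        product = x.unsqueeze(1) * self.weight
        # sum layer: reduce over the input dimension -> (batch, out)
        return product.sum(dim=-1), product  # the product is what a regularizer could act on

x = torch.randn(4, 256)
fc = MultiplySumFC(256, 2)           # 2 outputs: pedestrian vs background (illustrative)
scores, intermediate = fc(x)
# sanity check: identical to a bias-free nn.Linear with the same weights
assert torch.allclose(scores, x @ fc.weight.t(), atol=1e-5)
```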

Cross-modal Hallucination for Few-shot Fine-grained Recognition

Title Cross-modal Hallucination for Few-shot Fine-grained Recognition
Authors Frederik Pahde, Patrick Jähnichen, Tassilo Klein, Moin Nabi
Abstract State-of-the-art deep learning algorithms generally require large amounts of data for model training. Lack thereof can severely deteriorate performance, particularly in scenarios with fine-grained boundaries between categories. To this end, we propose a multimodal approach that facilitates bridging the information gap by means of meaningful joint embeddings. Specifically, we present a benchmark that is multimodal during training (i.e., images and texts) and single-modal at test time (i.e., images), with the associated task of utilizing multimodal data in the base classes (with many samples) to learn explicit visual classifiers for novel classes (with few samples). Next, we propose a framework built upon the idea of cross-modal data hallucination. In this regard, we introduce a discriminative text-conditional GAN for sample generation with a simple self-paced strategy for sample selection. We show results of our proposed discriminative hallucination method for 1-, 2-, and 5-shot learning on the CUB dataset, where the accuracy is improved by employing multimodal data.
Tasks
Published 2018-06-13
URL http://arxiv.org/abs/1806.05147v2
PDF http://arxiv.org/pdf/1806.05147v2.pdf
PWC https://paperswithcode.com/paper/cross-modal-hallucination-for-few-shot-fine
Repo
Framework

Near Real-Time Data Labeling Using a Depth Sensor for EMG Based Prosthetic Arms

Title Near Real-Time Data Labeling Using a Depth Sensor for EMG Based Prosthetic Arms
Authors Geesara Prathap, Titus Nanda Kumara, Roshan Ragel
Abstract Recognizing sEMG (surface electromyography) signals belonging to a particular action (e.g., lateral arm raise) automatically is a challenging task, as EMG signals themselves show a lot of variation even for the same action due to several factors. To overcome this issue, there should be a proper separation that indicates similar, repeated patterns for a particular action in the raw signals. A repetitive pattern is not always matched because the same action can be carried out with different time durations. Thus, a depth sensor (Kinect) was used for pattern identification, where three joint angles were recorded continuously; these are clearly separable for a particular action while recording sEMG signals. To segment out a repetitive pattern in the angle data, an MDTW (Moving Dynamic Time Warping) approach is introduced. This technique allows suspected motions of interest to be retrieved from raw signals. MDTW is based on the DTW algorithm, but it moves through the whole dataset in a pre-defined manner, picking up almost all the suspected segments inside a given dataset in an optimal way. Elevated bicep curl and lateral arm raise movements are taken as motions of interest to show how the proposed technique can be employed to achieve automatic identification and labelling. The full implementation is available at https://github.com/GPrathap/OpenBCIPython
Tasks
Published 2018-11-10
URL http://arxiv.org/abs/1811.04239v1
PDF http://arxiv.org/pdf/1811.04239v1.pdf
PWC https://paperswithcode.com/paper/near-real-time-data-labeling-using-a-depth
Repo
Framework
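
A minimal sketch (not the authors' implementation) of the moving-DTW idea: slide a window over a long joint-angle stream and score each window against a single-repetition template with plain DTW, keeping windows whose cost falls below a threshold. The template shape, step size, and threshold are made up for the toy example.

```python
# Illustrative moving-DTW segment search over a 1-D joint-angle stream.
import numpy as np

def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Classic O(len(a)*len(b)) dynamic time warping distance between 1-D sequences."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def moving_dtw(stream: np.ndarray, template: np.ndarray, step: int = 5, threshold: float = 300.0):
    """Return start indices of windows whose DTW cost to the template is below threshold.
    In practice, nearby hits covering the same repetition would be merged."""
    w = len(template)
    hits = []
    for start in range(0, len(stream) - w + 1, step):
        if dtw_distance(stream[start:start + w], template) < threshold:
            hits.append(start)
    return hits

# toy usage: a noisy stream containing two repetitions of a bell-shaped angle profile
template = np.sin(np.linspace(0, np.pi, 50)) * 90          # one idealized repetition (degrees)
stream = np.concatenate([np.zeros(30), template, np.zeros(40), template, np.zeros(30)])
stream += np.random.normal(0, 2.0, size=stream.shape)
print(moving_dtw(stream, template))
```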

CURE-OR: Challenging Unreal and Real Environments for Object Recognition

Title CURE-OR: Challenging Unreal and Real Environments for Object Recognition
Authors Dogancan Temel, Jinsol Lee, Ghassan AlRegib
Abstract In this paper, we introduce a large-scale, controlled, and multi-platform object recognition dataset denoted as Challenging Unreal and Real Environments for Object Recognition (CURE-OR). In this dataset, there are 1,000,000 images of 100 objects with varying size, color, and texture that are positioned in five different orientations and captured using five devices including a webcam, a DSLR, and three smartphone cameras in real-world (real) and studio (unreal) environments. The controlled challenging conditions include underexposure, overexposure, blur, contrast, dirty lens, image noise, resizing, and loss of color information. We utilize the CURE-OR dataset to test recognition APIs (Amazon Rekognition and Microsoft Azure Computer Vision) and show that their performance significantly degrades under challenging conditions. Moreover, we investigate the relationship between object recognition and image quality and show that objective quality algorithms can estimate recognition performance under certain photometric challenging conditions. The dataset is publicly available at https://ghassanalregib.com/cure-or/.
Tasks Object Recognition
Published 2018-10-18
URL http://arxiv.org/abs/1810.08293v2
PDF http://arxiv.org/pdf/1810.08293v2.pdf
PWC https://paperswithcode.com/paper/cure-or-challenging-unreal-and-real
Repo
Framework
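
For intuition, here are simplified, hypothetical versions of a few of the photometric challenge conditions named above (under/overexposure, image noise, blur). The parameters are illustrative and do not reproduce the dataset's actual challenge levels.

```python
# Toy implementations of a few challenge conditions applied to an image array.
import numpy as np
from scipy.ndimage import gaussian_filter

def underexpose(img: np.ndarray, factor: float = 0.4) -> np.ndarray:
    return np.clip(img.astype(np.float64) * factor, 0, 255).astype(np.uint8)

def overexpose(img: np.ndarray, factor: float = 1.8) -> np.ndarray:
    return np.clip(img.astype(np.float64) * factor, 0, 255).astype(np.uint8)

def add_noise(img: np.ndarray, sigma: float = 15.0) -> np.ndarray:
    noisy = img.astype(np.float64) + np.random.normal(0, sigma, img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def blur(img: np.ndarray, sigma: float = 2.0) -> np.ndarray:
    # blur the spatial dimensions only, leaving the color channels untouched
    smoothed = gaussian_filter(img.astype(np.float64), sigma=(sigma, sigma, 0))
    return np.clip(smoothed, 0, 255).astype(np.uint8)

img = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)  # stand-in for a CURE-OR image
challenged = [underexpose(img), overexpose(img), add_noise(img), blur(img)]
```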

Chest X-Ray Analysis of Tuberculosis by Deep Learning with Segmentation and Augmentation

Title Chest X-Ray Analysis of Tuberculosis by Deep Learning with Segmentation and Augmentation
Authors Sergii Stirenko, Yuriy Kochura, Oleg Alienin, Oleksandr Rokovyi, Peng Gang, Wei Zeng, Yuri Gordienko
Abstract The results of chest X-ray (CXR) analysis of 2D images to obtain statistically reliable predictions (presence of tuberculosis) by computer-aided diagnosis (CADx) on the basis of deep learning are presented. They demonstrate the efficiency of lung segmentation and of lossless and lossy data augmentation for CADx of tuberculosis by a deep convolutional neural network (CNN), even when applied to a small and not well-balanced dataset. The CNN demonstrates the ability to train (despite overfitting) on the pre-processed dataset obtained after lung segmentation, in contrast to the original, non-segmented dataset. Lossless data augmentation of the segmented dataset leads to the lowest validation loss (without overfitting) and nearly the same accuracy (within the limits of standard deviation) in comparison to the original and other pre-processed datasets after lossy data augmentation. Additional limited lossy data augmentation results in lower validation loss, but with a decrease of the validation accuracy. In conclusion, besides more complex deep CNNs and bigger datasets, better progress of CADx even for small and not well-balanced datasets could be obtained by better segmentation, data augmentation, dataset stratification, and exclusion of non-evident outliers.
Tasks Data Augmentation
Published 2018-03-03
URL http://arxiv.org/abs/1803.01199v1
PDF http://arxiv.org/pdf/1803.01199v1.pdf
PWC https://paperswithcode.com/paper/chest-x-ray-analysis-of-tuberculosis-by-deep
Repo
Framework
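
A hedged sketch of the lossless-versus-lossy augmentation split discussed above. The abstract does not list the exact transforms, so integer-pixel flips and shifts stand in for lossless augmentation (no interpolation) and small rotations and zooms for lossy augmentation.

```python
# Assumed split: "lossless" = transforms that only rearrange pixels,
# "lossy" = transforms that interpolate pixel values.
import numpy as np
from scipy.ndimage import rotate, zoom

def lossless_augment(img: np.ndarray) -> np.ndarray:
    out = np.fliplr(img) if np.random.rand() < 0.5 else img
    shift = np.random.randint(-10, 11)
    return np.roll(out, shift, axis=1)            # integer shift, no interpolation

def lossy_augment(img: np.ndarray) -> np.ndarray:
    angle = np.random.uniform(-7, 7)
    out = rotate(img, angle, reshape=False, mode="nearest")   # interpolates pixel values
    factor = np.random.uniform(0.95, 1.05)
    out = zoom(out, factor, mode="nearest")
    # crop or pad back to the original shape
    h, w = img.shape
    out = out[:h, :w]
    pad_h, pad_w = h - out.shape[0], w - out.shape[1]
    return np.pad(out, ((0, max(pad_h, 0)), (0, max(pad_w, 0))), mode="edge")

cxr = np.random.rand(256, 256)    # stand-in for a segmented chest X-ray
augmented = [lossless_augment(cxr), lossy_augment(cxr)]
```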

Aggregated Channels Network for Real-Time Pedestrian Detection

Title Aggregated Channels Network for Real-Time Pedestrian Detection
Authors Farzin Ghorban, Javier Marín, Yu Su, Alessandro Colombo, Anton Kummert
Abstract Convolutional neural networks (CNNs) have demonstrated their superiority in numerous computer vision tasks, yet their computational cost is prohibitive for many real-time applications such as pedestrian detection, which is usually performed on low-consumption hardware. In order to alleviate this drawback, most strategies focus on using a two-stage cascade approach. Essentially, in the first stage a fast method generates a significant but reduced number of high-quality proposals that later, in the second stage, are evaluated by the CNN. In this work, we propose a novel detection pipeline that further benefits from the two-stage cascade strategy. More concretely, the enriched and subsequently compressed features used in the first stage are reused as the CNN input. As a consequence, a simpler network architecture, adapted for such small input sizes, allows us to achieve real-time performance and obtain results close to the state-of-the-art while running significantly faster without the use of a GPU. In particular, considering that the proposed pipeline runs at frame rate, the achieved performance is highly competitive. We furthermore demonstrate that the proposed pipeline on its own can serve as an effective proposal generator.
Tasks Pedestrian Detection
Published 2018-01-01
URL http://arxiv.org/abs/1801.00476v1
PDF http://arxiv.org/pdf/1801.00476v1.pdf
PWC https://paperswithcode.com/paper/aggregated-channels-network-for-real-time
Repo
Framework
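
A hypothetical sketch of the second-stage classifier described above: a small CNN that consumes the aggregated channel features of each first-stage proposal instead of RGB crops. The channel count and window size are assumptions in the spirit of ACF-style detectors, not the paper's exact values.

```python
# Toy second-stage network for small multi-channel proposal windows.
import torch
import torch.nn as nn

class TinyProposalCNN(nn.Module):
    def __init__(self, in_channels: int = 10):   # assumed ACF-style channel count
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(64 * 16 * 8, 2)   # pedestrian vs background

    def forward(self, x):                             # x: (batch, 10, 64, 32)
        return self.classifier(self.features(x).flatten(1))

proposals = torch.randn(16, 10, 64, 32)   # channel features of 16 first-stage proposals
scores = TinyProposalCNN()(proposals)
```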

Accurate reconstruction of image stimuli from human fMRI based on the decoding model with capsule network architecture

Title Accurate reconstruction of image stimuli from human fMRI based on the decoding model with capsule network architecture
Authors Kai Qiao, Chi Zhang, Linyuan Wang, Bin Yan, Jian Chen, Lei Zeng, Li Tong
Abstract In neuroscience, many computational models have been designed to answer the open question of how sensory stimuli are encoded by neurons and, conversely, how sensory stimuli can be decoded from neuronal activities. In particular, functional Magnetic Resonance Imaging (fMRI) studies have made great achievements with the rapid development of deep network computation. However, compared with the goal of decoding orientation, position, and object category from activities in the visual cortex, accurate reconstruction of image stimuli from human fMRI is still a challenging task. In this paper, a capsule network (CapsNet) architecture based visual reconstruction (CNAVR) method is developed to reconstruct image stimuli. A capsule contains a group of neurons to achieve better organization of feature structure and representation, inspired by the structure of cortical mini-columns comprising several hundred neurons in primates. The high-level capsule features in the CapsNet include diverse attributes of image stimuli such as semantic class, orientation, and location. We use these features to bridge between human fMRI and image stimuli. We first employ the CapsNet to train the nonlinear mapping from image stimuli to high-level capsule features, and from high-level capsule features back to image stimuli, in an end-to-end manner. After estimating the serviceability of each voxel by its encoding performance in order to select voxels, we then train the nonlinear mapping from the dimension-reduced fMRI data to high-level capsule features. Finally, we can predict the high-level capsule features from fMRI data and reconstruct image stimuli with the CapsNet. We evaluated the proposed CNAVR method on a dataset of handwritten digit images and exceeded the accuracy of all existing state-of-the-art methods by about 10% on the structural similarity index (SSIM).
Tasks
Published 2018-01-02
URL http://arxiv.org/abs/1801.00602v1
PDF http://arxiv.org/pdf/1801.00602v1.pdf
PWC https://paperswithcode.com/paper/accurate-reconstruction-of-image-stimuli-from
Repo
Framework
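
As a simplified stand-in for the second stage described above, the sketch below learns a nonlinear mapping from selected fMRI voxel responses to high-level capsule features. A scikit-learn MLP replaces the paper's mapping, and the capsule features are random placeholders rather than outputs of a trained CapsNet.

```python
# Placeholder second-stage mapping: selected voxels -> capsule feature vector.
import numpy as np
from sklearn.neural_network import MLPRegressor

n_trials, n_voxels, n_capsule_feats = 900, 500, 160
fmri = np.random.randn(n_trials, n_voxels)                   # selected voxel responses
capsule_feats = np.random.randn(n_trials, n_capsule_feats)   # placeholder CapsNet features

mapper = MLPRegressor(hidden_layer_sizes=(256,), max_iter=500)
mapper.fit(fmri[:800], capsule_feats[:800])

# at test time, predicted capsule features would be fed to the CapsNet decoder
predicted = mapper.predict(fmri[800:])
print(predicted.shape)   # (100, 160)
```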

Discovering Process Maps from Event Streams

Title Discovering Process Maps from Event Streams
Authors Volodymyr Leno, Abel Armas-Cervantes, Marlon Dumas, Marcello La Rosa, Fabrizio M. Maggi
Abstract Automated process discovery is a class of process mining methods that allow analysts to extract business process models from event logs. Traditional process discovery methods extract process models from a snapshot of an event log stored in its entirety. In some scenarios, however, events keep coming with a high arrival rate to the extent that it is impractical to store the entire event log and to continuously re-discover a process model from scratch. Such scenarios require online process discovery approaches. Given an event stream produced by the execution of a business process, the goal of an online process discovery method is to maintain a continuously updated model of the process with a bounded amount of memory while at the same time achieving similar accuracy as offline methods. However, existing online discovery approaches require relatively large amounts of memory to achieve levels of accuracy comparable to that of offline methods. Therefore, this paper proposes an approach that addresses this limitation by mapping the problem of online process discovery to that of cache memory management, and applying well-known cache replacement policies to the problem of online process discovery. The approach has been implemented in .NET, experimentally integrated with the Minit process mining tool and comparatively evaluated against an existing baseline using real-life datasets.
Tasks
Published 2018-04-08
URL http://arxiv.org/abs/1804.02704v1
PDF http://arxiv.org/pdf/1804.02704v1.pdf
PWC https://paperswithcode.com/paper/discovering-process-maps-from-event-streams
Repo
Framework
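
The cache-management analogy can be sketched in a few lines: keep directly-follows counts for the discovered model in a fixed-size store and evict with a well-known policy (LRU here) when the memory budget is exceeded. The event format and budget are illustrative, not the paper's.

```python
# Toy LRU-bounded directly-follows counting over an event stream.
from collections import OrderedDict

class LRUDirectlyFollows:
    def __init__(self, budget: int = 1000):
        self.budget = budget
        self.counts = OrderedDict()    # (activity_a, activity_b) -> frequency
        self.last_activity = {}        # case id -> last observed activity

    def observe(self, case_id: str, activity: str) -> None:
        prev = self.last_activity.get(case_id)
        self.last_activity[case_id] = activity
        if prev is None:
            return
        pair = (prev, activity)
        self.counts[pair] = self.counts.get(pair, 0) + 1
        self.counts.move_to_end(pair)              # mark relation as most recently used
        if len(self.counts) > self.budget:
            self.counts.popitem(last=False)        # evict least recently used relation

miner = LRUDirectlyFollows(budget=3)
for case, act in [("c1", "register"), ("c1", "check"), ("c2", "register"),
                  ("c1", "pay"), ("c2", "check"), ("c2", "pay")]:
    miner.observe(case, act)
print(miner.counts)
```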

Optimal link prediction with matrix logistic regression

Title Optimal link prediction with matrix logistic regression
Authors Nicolai Baldin, Quentin Berthet
Abstract We consider the problem of link prediction, based on partial observation of a large network, and on side information associated to its vertices. The generative model is formulated as a matrix logistic regression. The performance of the model is analysed in a high-dimensional regime under a structural assumption. The minimax rate for the Frobenius-norm risk is established and a combinatorial estimator based on the penalised maximum likelihood approach is shown to achieve it. Furthermore, it is shown that this rate cannot be attained by any (randomised) algorithm computable in polynomial time under a computational complexity assumption.
Tasks Link Prediction
Published 2018-03-19
URL http://arxiv.org/abs/1803.07054v1
PDF http://arxiv.org/pdf/1803.07054v1.pdf
PWC https://paperswithcode.com/paper/optimal-link-prediction-with-matrix-logistic
Repo
Framework
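
One natural reading of the matrix logistic regression model, sketched under stated assumptions: with side information $x_i$ for each vertex, an edge $(i,j)$ appears with probability $\sigma(x_i^\top \Theta x_j)$ for an unknown, structured parameter matrix $\Theta$. The low-rank structure and dimensions below are illustrative.

```python
# Toy generative model: edge probability sigmoid(x_i^T Theta x_j) with low-rank Theta.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n, d, r = 200, 10, 2
X = np.random.randn(n, d)                    # side information for the n vertices
U = np.random.randn(d, r)
Theta = U @ U.T                              # rank-r structural assumption (illustrative)

logits = X @ Theta @ X.T                     # pairwise scores
P = sigmoid(logits)                          # edge probabilities
A = (np.random.rand(n, n) < P).astype(int)   # sample an adjacency matrix
np.fill_diagonal(A, 0)                       # no self-loops in this toy graph
```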

Variance-Aware Regret Bounds for Undiscounted Reinforcement Learning in MDPs

Title Variance-Aware Regret Bounds for Undiscounted Reinforcement Learning in MDPs
Authors Mohammad Sadegh Talebi, Odalric-Ambrym Maillard
Abstract The problem of reinforcement learning in an unknown and discrete Markov Decision Process (MDP) under the average-reward criterion is considered, when the learner interacts with the system in a single stream of observations, starting from an initial state and without any reset. We revisit the minimax lower bound for that problem so that the local variance of the bias function appears in place of the diameter of the MDP. Furthermore, we provide a novel analysis of the KL-UCRL algorithm establishing a high-probability regret bound scaling as $\widetilde {\mathcal O}\Bigl({\textstyle \sqrt{S\sum_{s,a}{\bf V}^\star_{s,a}T}}\Bigr)$ for this algorithm for ergodic MDPs, where $S$ denotes the number of states and where ${\bf V}^\star_{s,a}$ is the variance of the bias function with respect to the next-state distribution following action $a$ in state $s$. The resulting bound improves upon the best previously known regret bound $\widetilde {\mathcal O}(DS\sqrt{AT})$ for that algorithm, where $A$ and $D$ respectively denote the maximum number of actions (per state) and the diameter of the MDP. We finally compare the leading terms of the two bounds in some benchmark MDPs, indicating that the derived bound can provide an order of magnitude improvement in some cases. Our analysis leverages novel variations of the transportation lemma combined with Kullback-Leibler concentration inequalities, which we believe to be of independent interest.
Tasks
Published 2018-03-05
URL http://arxiv.org/abs/1803.01626v1
PDF http://arxiv.org/pdf/1803.01626v1.pdf
PWC https://paperswithcode.com/paper/variance-aware-regret-bounds-for-undiscounted
Repo
Framework
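
As a hedged back-of-the-envelope check (not taken from the paper): with rewards in $[0,1]$ the bias function has span at most $D$, so each variance term satisfies ${\bf V}^\star_{s,a} \le D^2/4$, which gives

```latex
\[
\sqrt{S \sum_{s,a} \mathbf{V}^\star_{s,a}\, T}
\;\le\;
\sqrt{S \cdot SA \cdot \tfrac{D^2}{4} \cdot T}
\;=\;
\tfrac{1}{2}\, D S \sqrt{A T},
\]
```

so, up to constants, the variance-aware bound is never worse than $\widetilde{\mathcal O}(DS\sqrt{AT})$ and can be much smaller when the local variances are small.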

Intent Detection and Slots Prompt in a Closed-Domain Chatbot

Title Intent Detection and Slots Prompt in a Closed-Domain Chatbot
Authors Amber Nigam, Prashik Sahare, Kushagra Pandya
Abstract In this paper, we introduce a methodology for predicting the intent and slots of a query for a chatbot that answers career-related queries. We take a multi-staged approach where both processes (intent classification and slot tagging) inform each other's decision-making in different stages. The model breaks down the problem into stages, solving one problem at a time and passing on relevant results of the current stage to the next, thereby reducing the search space for subsequent stages and eventually making classification and tagging more viable after each stage. We also observe that relaxing the rules for fuzzy entity matching in slot tagging after each stage (by maintaining a separate named entity tagger per stage) helps improve performance, although at a slight cost of false positives. Our model achieves state-of-the-art performance with an F1-score of 77.63% for intent classification and 82.24% for slot tagging on our dataset, which we will release publicly along with the paper.
Tasks Chatbot, Decision Making, Intent Classification, Intent Detection
Published 2018-12-27
URL http://arxiv.org/abs/1812.10628v2
PDF http://arxiv.org/pdf/1812.10628v2.pdf
PWC https://paperswithcode.com/paper/intent-detection-and-slots-prompt-in-a-closed
Repo
Framework
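
The stage-wise relaxation of fuzzy entity matching can be illustrated with a small sketch (not the authors' implementation): later stages accept weaker string matches because earlier stages have already narrowed the intent and hence the candidate entity list. The cutoffs, entity list, and query token are made up.

```python
# Toy stage-wise fuzzy entity matching with progressively relaxed cutoffs.
import difflib

STAGE_CUTOFFS = [0.9, 0.75, 0.6]   # stricter first, progressively relaxed

def match_entity(token, candidates, stage):
    cutoff = STAGE_CUTOFFS[min(stage, len(STAGE_CUTOFFS) - 1)]
    return difflib.get_close_matches(token, candidates, n=1, cutoff=cutoff)

skills = ["machine learning", "data engineering", "product management"]
query_token = "machne lerning"          # misspelled slot value from a user query

for stage in range(3):
    print(stage, match_entity(query_token, skills, stage))
```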

Reversible Image Watermarking for Health Informatics Systems Using Distortion Compensation in Wavelet Domain

Title Reversible Image Watermarking for Health Informatics Systems Using Distortion Compensation in Wavelet Domain
Authors Hamidreza Zarrabi, Mohsen Hajabdollahi, S. M. Reza Soroushmehr, Nader Karimi, Shadrokh Samavi, Kayvan Najarian
Abstract Reversible image watermarking guarantees restoration of both the original cover and the watermark logo from the watermarked image. Capacity and distortion of the image under reversible watermarking are two important parameters. In this study, reversible watermarking is investigated with a focus on increasing the embedding capacity and reducing the distortion in medical images. An integer wavelet transform is used for embedding, where in each iteration one watermark bit is embedded in one transform coefficient. We devise a novel approach in which, when a coefficient is modified in an iteration, the produced distortion is compensated in the next iteration. This distortion compensation method results in a low distortion rate. The proposed method is tested on four types of medical images, including MRI of the brain, cardiac MRI, MRI of the breast, and intestinal polyp images. Using a one-level wavelet transform, a maximum capacity of 1.5 BPP is obtained. Experimental results demonstrate that the proposed method is superior to state-of-the-art works in terms of capacity and distortion.
Tasks
Published 2018-02-21
URL http://arxiv.org/abs/1802.07786v1
PDF http://arxiv.org/pdf/1802.07786v1.pdf
PWC https://paperswithcode.com/paper/reversible-image-watermarking-for-health
Repo
Framework
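
A toy sketch of the reversible building block assumed here: a 1-D integer Haar (S-transform) lifting step, which is exactly invertible on integers, followed by a naive, uncompensated embedding of one bit into a detail coefficient. The paper's contribution, compensating the induced distortion in the next iteration, is not reproduced.

```python
# Integer Haar (S-transform) lifting: exactly reversible on integer pixel data.
import numpy as np

def int_haar_forward(x: np.ndarray):
    a, b = x[0::2].astype(np.int64), x[1::2].astype(np.int64)
    low = (a + b) // 2           # floor((a + b) / 2)
    high = a - b
    return low, high

def int_haar_inverse(low: np.ndarray, high: np.ndarray) -> np.ndarray:
    a = low + (high + 1) // 2
    b = a - high
    out = np.empty(low.size * 2, dtype=np.int64)
    out[0::2], out[1::2] = a, b
    return out

row = np.array([52, 55, 61, 59, 79, 61, 76, 61], dtype=np.int64)  # toy pixel row
low, high = int_haar_forward(row)
assert np.array_equal(int_haar_inverse(low, high), row)           # perfect reconstruction

# naive (uncompensated) embedding of one bit by expanding a detail coefficient;
# the paper would offset the resulting distortion in the next iteration
bit = 1
high_marked = high.copy()
high_marked[0] = 2 * high[0] + bit
watermarked_row = int_haar_inverse(low, high_marked)
```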