Paper Group ANR 880
Chest X-rays Classification: A Multi-Label and Fine-Grained Problem
Title | Chest X-rays Classification: A Multi-Label and Fine-Grained Problem |
Authors | Zongyuan Ge, Dwarikanath Mahapatra, Suman Sedai, Rahil Garnavi, Rajib Chakravorty |
Abstract | The widely used ChestX-ray14 dataset addresses an important medical image classification problem and has the following caveats: 1) many lung pathologies are visually similar, 2) a variety of diseases, including lung cancer, tuberculosis, and pneumonia, can be present in a single scan, i.e. multiple labels, and 3) the incidence of healthy images is much larger than that of diseased samples, creating imbalanced data. These properties are common in the medical domain. Existing literature uses state-of-the-art DenseNet/ResNet models with transfer learning, where the output neurons of the networks are trained for individual diseases to cater for multiple disease labels in each image. However, most of them do not consider the relationship between multiple classes. In this work we propose a novel error function, Multi-label Softmax Loss (MSML), to specifically address the properties of multiple labels and imbalanced data. Moreover, we design a deep network architecture based on the fine-grained classification concept that incorporates MSML. We evaluate our proposed method on various network backbones and show consistent performance improvements in AUC-ROC scores on the ChestX-ray14 dataset. The proposed error function provides a new method to gain improved performance across wider medical datasets. |
Tasks | Image Classification |
Published | 2018-07-19 |
URL | http://arxiv.org/abs/1807.07247v3 |
PDF | http://arxiv.org/pdf/1807.07247v3.pdf |
PWC | https://paperswithcode.com/paper/chest-x-rays-classification-a-multi-label-and |
Repo | |
Framework | |
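The baseline setup mentioned in the abstract above, one output neuron trained per disease, can be sketched as a per-label sigmoid cross-entropy. This is not the MSML loss, whose definition the abstract does not give. The `pos_weights` argument is a hypothetical way of up-weighting rare positive labels, added here only to illustrate the class-imbalance issue.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def weighted_multilabel_bce(logits, targets, pos_weights):
    """Mean per-label binary cross-entropy for a single image.

    logits      : raw network outputs, one per disease label
    targets     : 0/1 ground-truth flags, one per disease label
    pos_weights : hypothetical per-label weights for the positive term
                  (an assumption for illustration, not from the paper)
    """
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for z, y, w in zip(logits, targets, pos_weights):
        p = sigmoid(z)
        total -= w * y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(logits)
```

Because each label gets its own independent term, a loss of this form cannot express relationships between classes, which is the gap the abstract says MSML is meant to address.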
Offline Object Extraction from Dynamic Occupancy Grid Map Sequences
Title | Offline Object Extraction from Dynamic Occupancy Grid Map Sequences |
Authors | Daniel Stumper, Fabian Gies, Stefan Hoermann, Klaus Dietmayer |
Abstract | A dynamic occupancy grid map (DOGMa) allows a fast, robust, and complete environment representation for automated vehicles. Dynamic objects in a DOGMa, however, are commonly represented as independent cells, while modeled objects with shape and pose are preferable. The evaluation of algorithms for object extraction and the training and validation of learning algorithms rely on labeled ground truth data. Manually annotating objects in a DOGMa to obtain ground truth data is a time-consuming and expensive process. Additionally, the quality of labeled data depends strongly on the variation of filtered input data. The presented work introduces an automatic labeling process, where a full sequence is used to extract the best possible object pose and shape in terms of temporal consistency. A two-direction temporal search is executed to trace single objects over a sequence, where the best estimate of each object's extent and pose is refined in every time step. Furthermore, the presented algorithm uses only statistical constraints of the cell clusters for the object extraction instead of fixed heuristic parameters. Experimental results show a well-performing automatic labeling algorithm with real sensor data even in challenging scenarios. |
Tasks | |
Published | 2018-04-11 |
URL | http://arxiv.org/abs/1804.03933v1 |
PDF | http://arxiv.org/pdf/1804.03933v1.pdf |
PWC | https://paperswithcode.com/paper/offline-object-extraction-from-dynamic |
Repo | |
Framework | |
Adjustable Real-time Style Transfer
Title | Adjustable Real-time Style Transfer |
Authors | Mohammad Babaeizadeh, Golnaz Ghiasi |
Abstract | Artistic style transfer is the problem of synthesizing an image with content similar to a given image and style similar to another. Although recent feed-forward neural networks can generate stylized images in real-time, these models produce a single stylization given a pair of style/content images, and the user doesn’t have control over the synthesized output. Moreover, the style transfer depends on the hyper-parameters of the model with varying “optimum” for different input images. Therefore, if the stylized output is not appealing to the user, she/he has to try multiple models or retrain one with different hyper-parameters to get a favorite stylization. In this paper, we address these issues by proposing a novel method which allows adjustment of crucial hyper-parameters, after the training and in real-time, through a set of manually adjustable parameters. These parameters enable the user to modify the synthesized outputs from the same pair of style/content images, in search of a favorite stylized image. Our quantitative and qualitative experiments indicate how adjusting these parameters is comparable to retraining the model with different hyper-parameters. We also demonstrate how these parameters can be randomized to generate results which are diverse but still very similar in style and content. |
Tasks | Style Transfer |
Published | 2018-11-21 |
URL | http://arxiv.org/abs/1811.08560v1 |
PDF | http://arxiv.org/pdf/1811.08560v1.pdf |
PWC | https://paperswithcode.com/paper/adjustable-real-time-style-transfer |
Repo | |
Framework | |
Unsupervised Deep Domain Adaptation for Pedestrian Detection
Title | Unsupervised Deep Domain Adaptation for Pedestrian Detection |
Authors | Lihang Liu, Weiyao Lin, Lisheng Wu, Yong Yu, Michael Ying Yang |
Abstract | This paper addresses the problem of unsupervised domain adaptation on the task of pedestrian detection in crowded scenes. First, we use an iterative algorithm to select and auto-annotate positive pedestrian samples with high confidence as the training samples for the target domain. Meanwhile, we also reuse negative samples from the source domain to compensate for the imbalance between the numbers of positive and negative samples. Second, based on the deep network, we also design an unsupervised regularizer to mitigate the influence of data noise. More specifically, we transform the last fully connected layer into two sub-layers, an element-wise multiply layer and a sum layer, and add the unsupervised regularizer to further improve the domain adaptation accuracy. In experiments for pedestrian detection, the proposed method boosts the recall value by nearly 30% while the precision stays almost the same. Furthermore, we evaluate our method on standard domain adaptation benchmarks in both supervised and unsupervised settings and also achieve state-of-the-art results. |
Tasks | Domain Adaptation, Pedestrian Detection, Unsupervised Domain Adaptation |
Published | 2018-02-09 |
URL | http://arxiv.org/abs/1802.03269v1 |
PDF | http://arxiv.org/pdf/1802.03269v1.pdf |
PWC | https://paperswithcode.com/paper/unsupervised-deep-domain-adaptation-for |
Repo | |
Framework | |
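The layer transformation described in the abstract above, splitting the last fully connected layer into an element-wise multiply layer followed by a sum layer, can be illustrated with a minimal sketch for a single output neuron. The function names are hypothetical. The abstract does not say where the unsupervised regularizer attaches, so the sketch shows only that the two-step form computes the same output while exposing the intermediate per-feature products.

```python
def fully_connected(x, w, b):
    """Ordinary single-output fully connected layer: w . x + b."""
    return sum(xi * wi for xi, wi in zip(x, w)) + b


def multiply_layer(x, w):
    """Element-wise multiply sub-layer: keeps one product per input feature."""
    return [xi * wi for xi, wi in zip(x, w)]


def sum_layer(products, b):
    """Sum sub-layer: collapses the per-feature products into the final score."""
    return sum(products) + b


# The intermediate vector is what a regularizer could act on
# (assumption: the abstract does not state this explicitly).
features = [1.0, 2.0, 3.0]
weights = [0.5, -1.0, 2.0]
bias = 0.1
intermediate = multiply_layer(features, weights)
score = sum_layer(intermediate, bias)
```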
Cross-modal Hallucination for Few-shot Fine-grained Recognition
Title | Cross-modal Hallucination for Few-shot Fine-grained Recognition |
Authors | Frederik Pahde, Patrick Jähnichen, Tassilo Klein, Moin Nabi |
Abstract | State-of-the-art deep learning algorithms generally require large amounts of data for model training. A lack thereof can severely deteriorate performance, particularly in scenarios with fine-grained boundaries between categories. To this end, we propose a multimodal approach that facilitates bridging the information gap by means of meaningful joint embeddings. Specifically, we present a benchmark that is multimodal during training (i.e. images and texts) and single-modal at test time (i.e. images), with the associated task of using multimodal data in base classes (with many samples) to learn explicit visual classifiers for novel classes (with few samples). Next, we propose a framework built upon the idea of cross-modal data hallucination. In this regard, we introduce a discriminative text-conditional GAN for sample generation with a simple self-paced strategy for sample selection. We show the results of our proposed discriminative hallucinated method for 1-, 2-, and 5-shot learning on the CUB dataset, where the accuracy is improved by employing multimodal data. |
Tasks | |
Published | 2018-06-13 |
URL | http://arxiv.org/abs/1806.05147v2 |
PDF | http://arxiv.org/pdf/1806.05147v2.pdf |
PWC | https://paperswithcode.com/paper/cross-modal-hallucination-for-few-shot-fine |
Repo | |
Framework | |
Near Real-Time Data Labeling Using a Depth Sensor for EMG Based Prosthetic Arms
Title | Near Real-Time Data Labeling Using a Depth Sensor for EMG Based Prosthetic Arms |
Authors | Geesara Prathap, Titus Nanda Kumara, Roshan Ragel |
Abstract | Automatically recognizing sEMG (Surface Electromyography) signals belonging to a particular action (e.g., lateral arm raise) is a challenging task, as EMG signals vary considerably even for the same action due to several factors. To overcome this issue, there should be a proper separation that indicates similar patterns repeating for a particular action in the raw signals. A repetitive pattern is not always matched because the same action can be carried out over different time durations. Thus, a depth sensor (Kinect) was used for pattern identification: three joint angles, which are clearly separable for a particular action, were recorded continuously while recording sEMG signals. To segment out a repetitive pattern in the angle data, an MDTW (Moving Dynamic Time Warping) approach is introduced. This technique makes it possible to retrieve suspected motions of interest from raw signals. MDTW is based on the DTW algorithm, but it moves through the whole dataset in a pre-defined manner and is capable of picking up almost all the suspected segments inside a given dataset in an optimal way. Elevated bicep curl and lateral arm raise movements are taken as motions of interest to show how the proposed technique can be employed to achieve automatic identification and labelling. The full implementation is available at https://github.com/GPrathap/OpenBCIPython |
Tasks | |
Published | 2018-11-10 |
URL | http://arxiv.org/abs/1811.04239v1 |
PDF | http://arxiv.org/pdf/1811.04239v1.pdf |
PWC | https://paperswithcode.com/paper/near-real-time-data-labeling-using-a-depth |
Repo | |
Framework | |
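The idea of moving a DTW comparison through a whole recording, as the abstract above describes for MDTW, can be sketched roughly as follows. This is not the authors' implementation (see the repository linked in the abstract for that). The fixed `window`, `step`, and `threshold` parameters and both function names are assumptions made for illustration.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance with |x - y| as the local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping moves.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def sliding_dtw_matches(signal, template, window, step, threshold):
    """Slide a fixed-size window over `signal` and keep the windows whose
    DTW distance to `template` is at most `threshold`.

    Returns a list of (start_index, distance) pairs.
    """
    matches = []
    for start in range(0, len(signal) - window + 1, step):
        dist = dtw_distance(signal[start:start + window], template)
        if dist <= threshold:
            matches.append((start, dist))
    return matches
```

Because DTW tolerates stretching along the time axis, a template of a joint-angle curve can match repetitions of the same motion performed at different speeds, which is the difficulty the abstract highlights.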
CURE-OR: Challenging Unreal and Real Environments for Object Recognition
Title | CURE-OR: Challenging Unreal and Real Environments for Object Recognition |
Authors | Dogancan Temel, Jinsol Lee, Ghassan AlRegib |
Abstract | In this paper, we introduce a large-scale, controlled, and multi-platform object recognition dataset denoted as Challenging Unreal and Real Environments for Object Recognition (CURE-OR). In this dataset, there are 1,000,000 images of 100 objects with varying size, color, and texture that are positioned in five different orientations and captured using five devices including a webcam, a DSLR, and three smartphone cameras in real-world (real) and studio (unreal) environments. The controlled challenging conditions include underexposure, overexposure, blur, contrast, dirty lens, image noise, resizing, and loss of color information. We use the CURE-OR dataset to test recognition APIs (Amazon Rekognition and Microsoft Azure Computer Vision) and show that their performance degrades significantly under challenging conditions. Moreover, we investigate the relationship between object recognition and image quality and show that objective quality algorithms can estimate recognition performance under certain photometric challenging conditions. The dataset is publicly available at https://ghassanalregib.com/cure-or/. |
Tasks | Object Recognition |
Published | 2018-10-18 |
URL | http://arxiv.org/abs/1810.08293v2 |
PDF | http://arxiv.org/pdf/1810.08293v2.pdf |
PWC | https://paperswithcode.com/paper/cure-or-challenging-unreal-and-real |
Repo | |
Framework | |
Chest X-Ray Analysis of Tuberculosis by Deep Learning with Segmentation and Augmentation
Title | Chest X-Ray Analysis of Tuberculosis by Deep Learning with Segmentation and Augmentation |
Authors | Sergii Stirenko, Yuriy Kochura, Oleg Alienin, Oleksandr Rokovyi, Peng Gang, Wei Zeng, Yuri Gordienko |
Abstract | The results of chest X-ray (CXR) analysis of 2D images to obtain statistically reliable predictions (presence of tuberculosis) by computer-aided diagnosis (CADx) on the basis of deep learning are presented. They demonstrate the efficiency of lung segmentation and of lossless and lossy data augmentation for CADx of tuberculosis by a deep convolutional neural network (CNN), even when applied to a small and not well-balanced dataset. The CNN demonstrates the ability to train (despite overfitting) on the pre-processed dataset obtained after lung segmentation, in contrast to the original non-segmented dataset. Lossless data augmentation of the segmented dataset leads to the lowest validation loss (without overfitting) and nearly the same accuracy (within the limits of standard deviation) in comparison to the original and other pre-processed datasets after lossy data augmentation. The additional limited lossy data augmentation results in lower validation loss, but with a decrease in validation accuracy. In conclusion, besides more complex deep CNNs and bigger datasets, better progress of CADx even for small and not well-balanced datasets could be obtained through better segmentation, data augmentation, dataset stratification, and exclusion of non-evident outliers. |
Tasks | Data Augmentation |
Published | 2018-03-03 |
URL | http://arxiv.org/abs/1803.01199v1 |
PDF | http://arxiv.org/pdf/1803.01199v1.pdf |
PWC | https://paperswithcode.com/paper/chest-x-ray-analysis-of-tuberculosis-by-deep |
Repo | |
Framework | |
Aggregated Channels Network for Real-Time Pedestrian Detection
Title | Aggregated Channels Network for Real-Time Pedestrian Detection |
Authors | Farzin Ghorban, Javier Marín, Yu Su, Alessandro Colombo, Anton Kummert |
Abstract | Convolutional neural networks (CNNs) have demonstrated their superiority in numerous computer vision tasks, yet their computational cost is prohibitive for many real-time applications such as pedestrian detection, which is usually performed on low-consumption hardware. To alleviate this drawback, most strategies use a two-stage cascade approach. Essentially, in the first stage a fast method generates a significant but reduced number of high-quality proposals that later, in the second stage, are evaluated by the CNN. In this work, we propose a novel detection pipeline that further benefits from the two-stage cascade strategy. More concretely, the enriched and subsequently compressed features used in the first stage are reused as the CNN input. As a consequence, a simpler network architecture, adapted for such small input sizes, makes it possible to achieve real-time performance and obtain results close to the state of the art while running significantly faster without the use of a GPU. In particular, considering that the proposed pipeline runs at frame rate, the achieved performance is highly competitive. We furthermore demonstrate that the proposed pipeline by itself can serve as an effective proposal generator. |
Tasks | Pedestrian Detection |
Published | 2018-01-01 |
URL | http://arxiv.org/abs/1801.00476v1 |
PDF | http://arxiv.org/pdf/1801.00476v1.pdf |
PWC | https://paperswithcode.com/paper/aggregated-channels-network-for-real-time |
Repo | |
Framework | |
Accurate reconstruction of image stimuli from human fMRI based on the decoding model with capsule network architecture
Title | Accurate reconstruction of image stimuli from human fMRI based on the decoding model with capsule network architecture |
Authors | Kai Qiao, Chi Zhang, Linyuan Wang, Bin Yan, Jian Chen, Lei Zeng, Li Tong |
Abstract | In neuroscience, many kinds of computational models have been designed to answer the open question of how sensory stimuli are encoded by neurons and, conversely, how sensory stimuli can be decoded from neuronal activities. In particular, functional Magnetic Resonance Imaging (fMRI) studies have made great achievements with the rapid development of deep network computation. However, compared with the goal of decoding orientation, position, and object category from activities in the visual cortex, accurate reconstruction of image stimuli from human fMRI is still challenging. In this paper, the capsule network (CapsNet) architecture based visual reconstruction (CNAVR) method is developed to reconstruct image stimuli. A capsule contains a group of neurons to achieve a better organization of feature structure and representation, inspired by the structure of the cortical mini-column, which includes several hundred neurons in primates. The high-level capsule features in the CapsNet include diverse features of image stimuli such as semantic class, orientation, location, and so on. We used these features as a bridge between human fMRI and image stimuli. First, we employed the CapsNet to train the nonlinear mapping from image stimuli to high-level capsule features, and from high-level capsule features back to image stimuli, in an end-to-end manner. Second, after selecting voxels by estimating the serviceability of each voxel from its encoding performance, we trained the nonlinear mapping from the dimension-reduced fMRI data to high-level capsule features. Finally, we can predict the high-level capsule features from fMRI data and reconstruct image stimuli with the CapsNet. We evaluated the proposed CNAVR method on a dataset of handwritten digit images and exceeded the accuracy of all existing state-of-the-art methods by about 10% on the structural similarity index (SSIM). |
Tasks | |
Published | 2018-01-02 |
URL | http://arxiv.org/abs/1801.00602v1 |
PDF | http://arxiv.org/pdf/1801.00602v1.pdf |
PWC | https://paperswithcode.com/paper/accurate-reconstruction-of-image-stimuli-from |
Repo | |
Framework | |
Discovering Process Maps from Event Streams
Title | Discovering Process Maps from Event Streams |
Authors | Volodymyr Leno, Abel Armas-Cervantes, Marlon Dumas, Marcello La Rosa, Fabrizio M. Maggi |
Abstract | Automated process discovery is a class of process mining methods that allow analysts to extract business process models from event logs. Traditional process discovery methods extract process models from a snapshot of an event log stored in its entirety. In some scenarios, however, events keep coming with a high arrival rate to the extent that it is impractical to store the entire event log and to continuously re-discover a process model from scratch. Such scenarios require online process discovery approaches. Given an event stream produced by the execution of a business process, the goal of an online process discovery method is to maintain a continuously updated model of the process with a bounded amount of memory while at the same time achieving similar accuracy as offline methods. However, existing online discovery approaches require relatively large amounts of memory to achieve levels of accuracy comparable to that of offline methods. Therefore, this paper proposes an approach that addresses this limitation by mapping the problem of online process discovery to that of cache memory management, and applying well-known cache replacement policies to the problem of online process discovery. The approach has been implemented in .NET, experimentally integrated with the Minit process mining tool and comparatively evaluated against an existing baseline using real-life datasets. |
Tasks | |
Published | 2018-04-08 |
URL | http://arxiv.org/abs/1804.02704v1 |
PDF | http://arxiv.org/pdf/1804.02704v1.pdf |
PWC | https://paperswithcode.com/paper/discovering-process-maps-from-event-streams |
Repo | |
Framework | |
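The mapping from online process discovery to cache management described in the abstract above can be illustrated with a small sketch. It assumes that the process map is a directly-follows graph, that events arrive as `(case_id, activity)` pairs, and that LRU is the replacement policy. The abstract confirms none of these specifics, and the class names are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Bounded key-value store that evicts the least recently used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key, default=None):
        if key in self.data:
            self.data.move_to_end(key)  # mark as most recently used
            return self.data[key]
        return default

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # drop the least recently used entry


class StreamingDFG:
    """Directly-follows counts maintained from an event stream in bounded memory."""

    def __init__(self, max_cases, max_edges):
        self.last_activity = LRUCache(max_cases)  # case_id -> last seen activity
        self.edges = LRUCache(max_edges)          # (from, to) -> frequency

    def observe(self, case_id, activity):
        previous = self.last_activity.get(case_id)
        if previous is not None:
            edge = (previous, activity)
            self.edges.put(edge, self.edges.get(edge, 0) + 1)
        self.last_activity.put(case_id, activity)
```

Both caches have fixed capacities, so memory stays bounded however long the stream runs. The trade-off is that evicting a case or an edge loses information, which is why the choice of replacement policy affects accuracy.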
Optimal link prediction with matrix logistic regression
Title | Optimal link prediction with matrix logistic regression |
Authors | Nicolai Baldin, Quentin Berthet |
Abstract | We consider the problem of link prediction, based on partial observation of a large network and on side information associated with its vertices. The generative model is formulated as a matrix logistic regression. The performance of the model is analysed in a high-dimensional regime under a structural assumption. The minimax rate for the Frobenius-norm risk is established and a combinatorial estimator based on the penalised maximum likelihood approach is shown to achieve it. Furthermore, it is shown that this rate cannot be attained by any (randomised) algorithm computable in polynomial time under a computational complexity assumption. |
Tasks | Link Prediction |
Published | 2018-03-19 |
URL | http://arxiv.org/abs/1803.07054v1 |
PDF | http://arxiv.org/pdf/1803.07054v1.pdf |
PWC | https://paperswithcode.com/paper/optimal-link-prediction-with-matrix-logistic |
Repo | |
Framework | |
Variance-Aware Regret Bounds for Undiscounted Reinforcement Learning in MDPs
Title | Variance-Aware Regret Bounds for Undiscounted Reinforcement Learning in MDPs |
Authors | Mohammad Sadegh Talebi, Odalric-Ambrym Maillard |
Abstract | The problem of reinforcement learning in an unknown and discrete Markov Decision Process (MDP) under the average-reward criterion is considered, when the learner interacts with the system in a single stream of observations, starting from an initial state without any reset. We revisit the minimax lower bound for that problem by showing that the local variance of the bias function appears in place of the diameter of the MDP. Furthermore, we provide a novel analysis of the KL-UCRL algorithm establishing a high-probability regret bound scaling as $\widetilde{\mathcal O}\Bigl(\sqrt{S\sum_{s,a}{\bf V}^\star_{s,a}T}\Bigr)$ for this algorithm for ergodic MDPs, where $S$ denotes the number of states and where ${\bf V}^\star_{s,a}$ is the variance of the bias function with respect to the next-state distribution following action $a$ in state $s$. The resulting bound improves upon the best previously known regret bound $\widetilde{\mathcal O}(DS\sqrt{AT})$ for that algorithm, where $A$ and $D$ respectively denote the maximum number of actions (per state) and the diameter of the MDP. We finally compare the leading terms of the two bounds in some benchmark MDPs, indicating that the derived bound can provide an order of magnitude improvement in some cases. Our analysis leverages novel variations of the transportation lemma combined with Kullback-Leibler concentration inequalities, which we believe to be of independent interest. |
Tasks | |
Published | 2018-03-05 |
URL | http://arxiv.org/abs/1803.01626v1 |
PDF | http://arxiv.org/pdf/1803.01626v1.pdf |
PWC | https://paperswithcode.com/paper/variance-aware-regret-bounds-for-undiscounted |
Repo | |
Framework | |
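A short worked comparison may help show why the variance-aware bound in the abstract above is never worse than the diameter-based one, up to constants and logarithmic factors. The sketch relies on an assumption not stated in the abstract: rewards lie in $[0,1]$, so that the span of the optimal bias function $b^\star$ is at most $D$. The symbol $b^\star$ is introduced here for illustration only.

```latex
% Assumption: rewards in [0,1], hence span(b^\star) \le D.
% Popoviciu's inequality bounds the variance of a bounded quantity:
\[
  {\bf V}^\star_{s,a} \;\le\; \frac{\operatorname{span}(b^\star)^2}{4} \;\le\; \frac{D^2}{4}
  \qquad \text{for every pair } (s,a).
\]
% There are at most SA state-action pairs, so
\[
  \sqrt{S \sum_{s,a} {\bf V}^\star_{s,a}\, T}
  \;\le\; \sqrt{S \cdot SA \cdot \frac{D^2}{4} \cdot T}
  \;=\; \frac{DS}{2}\sqrt{AT}.
\]
```

Under that assumption the new leading term is at most half of $DS\sqrt{AT}$, and it can be much smaller when the next-state distributions put little variance on the bias function.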
Intent Detection and Slots Prompt in a Closed-Domain Chatbot
Title | Intent Detection and Slots Prompt in a Closed-Domain Chatbot |
Authors | Amber Nigam, Prashik Sahare, Kushagra Pandya |
Abstract | In this paper, we introduce a methodology for predicting the intent and slots of a query for a chatbot that answers career-related queries. We take a multi-staged approach where both processes (intent classification and slot tagging) inform each other’s decision-making in different stages. The model breaks down the problem into stages, solving one problem at a time and passing on relevant results of the current stage to the next, thereby reducing the search space for subsequent stages and making classification and tagging more viable after each stage. We also observe that relaxing the rules for fuzzy entity matching in slot tagging after each stage (by maintaining a separate Named Entity Tagger per stage) helps us improve performance, although at the slight cost of more false positives. Our model has achieved state-of-the-art performance with an F1-score of 77.63% for intent classification and 82.24% for slot tagging on our dataset, which we will release publicly along with the paper. |
Tasks | Chatbot, Decision Making, Intent Classification, Intent Detection |
Published | 2018-12-27 |
URL | http://arxiv.org/abs/1812.10628v2 |
PDF | http://arxiv.org/pdf/1812.10628v2.pdf |
PWC | https://paperswithcode.com/paper/intent-detection-and-slots-prompt-in-a-closed |
Repo | |
Framework | |
Reversible Image Watermarking for Health Informatics Systems Using Distortion Compensation in Wavelet Domain
Title | Reversible Image Watermarking for Health Informatics Systems Using Distortion Compensation in Wavelet Domain |
Authors | Hamidreza Zarrabi, Mohsen Hajabdollahi, S. M. Reza Soroushmehr, Nader Karimi, Shadrokh Samavi, Kayvan Najarian |
Abstract | Reversible image watermarking guarantees restoration of both the original cover and the watermark logo from the watermarked image. Capacity and distortion of the image under reversible watermarking are two important parameters. In this study, a reversible watermarking scheme is investigated with a focus on increasing the embedding capacity and reducing the distortion in medical images. An integer wavelet transform is used for embedding, where in each iteration one watermark bit is embedded in one transform coefficient. We devise a novel approach in which, when a coefficient is modified in one iteration, the resulting distortion is compensated in the next iteration. This distortion compensation method results in a low distortion rate. The proposed method is tested on four types of medical images: MRI of the brain, cardiac MRI, MRI of the breast, and intestinal polyp images. Using a one-level wavelet transform, a maximum capacity of 1.5 BPP is obtained. Experimental results demonstrate that the proposed method is superior to state-of-the-art works in terms of capacity and distortion. |
Tasks | |
Published | 2018-02-21 |
URL | http://arxiv.org/abs/1802.07786v1 |
PDF | http://arxiv.org/pdf/1802.07786v1.pdf |
PWC | https://paperswithcode.com/paper/reversible-image-watermarking-for-health |
Repo | |
Framework | |
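Reversibility in the scheme described above depends on the wavelet transform mapping integers to integers without rounding loss. The abstract does not name the wavelet, so the sketch below assumes a one-level integer Haar transform (the S-transform) on a 1D row of pixel values. It shows only the lossless round trip, not the bit embedding or the distortion compensation step, and the function names are hypothetical.

```python
def forward_s_transform(a, b):
    """Integer Haar step: floor average (low-pass) and difference (high-pass)."""
    low = (a + b) // 2
    high = a - b
    return low, high


def inverse_s_transform(low, high):
    """Exact inverse of forward_s_transform, using only integer arithmetic."""
    a = low + (high + 1) // 2
    b = a - high
    return a, b


def forward_row(pixels):
    """One-level transform of an even-length row: returns (lows, highs)."""
    pairs = [forward_s_transform(pixels[i], pixels[i + 1]) for i in range(0, len(pixels), 2)]
    return [p[0] for p in pairs], [p[1] for p in pairs]


def inverse_row(lows, highs):
    """Rebuild the original row from its low-pass and high-pass coefficients."""
    pixels = []
    for low, high in zip(lows, highs):
        a, b = inverse_s_transform(low, high)
        pixels.extend([a, b])
    return pixels
```

Because the inverse recovers every pixel pair exactly, any change made to a coefficient for embedding can in principle be undone once the embedded bit has been read back, which is the property a reversible scheme needs.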