October 18, 2019

3252 words 16 mins read

Paper Group ANR 442

Monocular Semantic Occupancy Grid Mapping with Convolutional Variational Encoder-Decoder Networks. Image Captioning using Deep Neural Architectures. Deep supervision with additional labels for retinal vessel segmentation task. SoPa: Bridging CNNs, RNNs, and Weighted Finite-State Machines. One-Class Feature Learning Using Intra-Class Splitting. A Mu …

Monocular Semantic Occupancy Grid Mapping with Convolutional Variational Encoder-Decoder Networks

Title Monocular Semantic Occupancy Grid Mapping with Convolutional Variational Encoder-Decoder Networks
Authors Chenyang Lu, Marinus Jacobus Gerardus van de Molengraft, Gijs Dubbelman
Abstract In this work, we research and evaluate end-to-end learning of monocular semantic-metric occupancy grid mapping from weak binocular ground truth. The network learns to predict four classes, as well as a camera to bird’s eye view mapping. At the core, it utilizes a variational encoder-decoder network that encodes the front-view visual information of the driving scene and subsequently decodes it into a 2-D top-view Cartesian coordinate system. The evaluations on Cityscapes show that the end-to-end learning of semantic-metric occupancy grids outperforms the deterministic mapping approach with flat-plane assumption by more than 12% mean IoU. Furthermore, we show that the variational sampling with a relatively small embedding vector brings robustness against vehicle dynamic perturbations, and generalizability for unseen KITTI data. Our network achieves real-time inference rates of approx. 35 Hz for an input image with a resolution of 256x512 pixels and an output map with 64x64 occupancy grid cells using a Titan V GPU.
Tasks
Published 2018-04-06
URL http://arxiv.org/abs/1804.02176v3
PDF http://arxiv.org/pdf/1804.02176v3.pdf
PWC https://paperswithcode.com/paper/monocular-semantic-occupancy-grid-mapping
Repo
Framework
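The abstract above describes a variational encoder-decoder that maps a 256x512 front-view image through a small latent embedding to a 64x64, four-class top-view occupancy grid. The PyTorch sketch below is a minimal, hypothetical layout of that idea only; the layer sizes, latent dimension, and the per-cell cross-entropy plus KL training objective mentioned in the comments are assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class VariationalGridMapper(nn.Module):
    """Minimal sketch: front-view RGB image -> latent z -> 64x64 grid with 4 classes."""
    def __init__(self, latent_dim=128, n_classes=4):
        super().__init__()
        # Encoder: 3x256x512 -> 256x8x16 via strided convolutions (sizes are assumptions).
        chans = [3, 32, 64, 128, 256, 256]
        self.encoder = nn.Sequential(*[
            nn.Sequential(nn.Conv2d(cin, cout, 4, stride=2, padding=1), nn.ReLU())
            for cin, cout in zip(chans[:-1], chans[1:])
        ])
        self.fc_mu = nn.Linear(256 * 8 * 16, latent_dim)
        self.fc_logvar = nn.Linear(256 * 8 * 16, latent_dim)
        # Decoder: latent vector -> 4x64x64 top-view logits.
        self.fc_dec = nn.Linear(latent_dim, 256 * 4 * 4)
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1), nn.ReLU(),  # 8x8
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),   # 16x16
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),    # 32x32
            nn.ConvTranspose2d(32, n_classes, 4, stride=2, padding=1),        # 64x64
        )

    def forward(self, img):
        h = self.encoder(img).flatten(1)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        # Reparameterisation: variational sampling of the small embedding vector.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        grid_logits = self.decoder(self.fc_dec(z).view(-1, 256, 4, 4))
        return grid_logits, mu, logvar

# Usage: per-cell cross-entropy against the (weak) ground-truth grid plus a KL term.
model = VariationalGridMapper()
logits, mu, logvar = model(torch.randn(2, 3, 256, 512))
print(logits.shape)  # torch.Size([2, 4, 64, 64])
```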

Image Captioning using Deep Neural Architectures

Title Image Captioning using Deep Neural Architectures
Authors Parth Shah, Vishvajit Bakarola, Supriya Pati
Abstract Automatically creating a description of an image in a natural language such as English is a very challenging task. It requires expertise in both image processing and natural language processing. This paper discusses the different models available for the image captioning task. We also discuss how advances in object recognition and machine translation have greatly improved the performance of image captioning models in recent years. In addition, we describe how such a model can be implemented. Finally, we evaluate the performance of the model using standard evaluation metrics.
Tasks Image Captioning, Machine Translation, Object Recognition
Published 2018-01-17
URL http://arxiv.org/abs/1801.05568v1
PDF http://arxiv.org/pdf/1801.05568v1.pdf
PWC https://paperswithcode.com/paper/image-captioning-using-deep-neural
Repo
Framework
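Since the abstract surveys encoder-decoder captioning models built on advances in object recognition and machine translation, here is a minimal, hypothetical sketch of that common architecture family: a CNN encodes the image and an LSTM decodes a caption. The vocabulary size, embedding size, and the ResNet-18 backbone (torchvision >= 0.13 API) are assumptions for illustration, not a specific model from the paper.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class CaptionNet(nn.Module):
    """Sketch of the generic CNN-encoder / RNN-decoder captioning architecture."""
    def __init__(self, vocab_size=10000, embed_dim=256, hidden_dim=512):
        super().__init__()
        backbone = resnet18(weights=None)          # assumed image encoder
        backbone.fc = nn.Linear(backbone.fc.in_features, embed_dim)
        self.encoder = backbone
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, images, captions):
        # The image feature acts as the first "token" fed to the decoder.
        img_feat = self.encoder(images).unsqueeze(1)           # (B, 1, E)
        tokens = self.embed(captions)                          # (B, T, E)
        inputs = torch.cat([img_feat, tokens[:, :-1]], dim=1)  # teacher forcing
        hidden, _ = self.lstm(inputs)
        return self.out(hidden)                                # (B, T, vocab)

model = CaptionNet()
logits = model(torch.randn(2, 3, 224, 224), torch.randint(0, 10000, (2, 12)))
print(logits.shape)  # torch.Size([2, 12, 10000])
```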

Deep supervision with additional labels for retinal vessel segmentation task

Title Deep supervision with additional labels for retinal vessel segmentation task
Authors Yishuo Zhang, Albert C. S. Chung
Abstract Automatic analysis of retinal images is of vital importance for diagnosing retinopathy. Segmenting vessels accurately is a fundamental step in analysing retinal images, but it is usually difficult due to varying imaging conditions, low image contrast, and the appearance of pathologies such as micro-aneurysms. In this paper, we propose a novel method based on deep neural networks to solve this problem. We use a U-net with residual connections to detect vessels. To achieve better accuracy, we introduce an edge-aware mechanism that converts the original task into a multi-class task by adding additional labels on boundary areas. In this way, the network pays more attention to the boundary areas of vessels and achieves better performance, especially in detecting tiny vessels. In addition, side-output layers are applied to provide deep supervision and thereby help convergence. We train and evaluate our model on three databases: DRIVE, STARE, and CHASEDB1. Experimental results show that our method achieves performance comparable to the state of the art, with an AUC of 97.99% on DRIVE, and an efficient running time.
Tasks Retinal Vessel Segmentation
Published 2018-06-06
URL http://arxiv.org/abs/1806.02132v3
PDF http://arxiv.org/pdf/1806.02132v3.pdf
PWC https://paperswithcode.com/paper/deep-supervision-with-additional-labels-for
Repo
Framework
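The edge-aware mechanism described above turns binary vessel masks into a multi-class target by labelling boundary areas separately. One simple way to build such labels, using morphological operations, is sketched below; this is my assumption of how the extra class could be generated, not the authors' exact procedure.

```python
import numpy as np
from scipy.ndimage import binary_dilation, binary_erosion

def add_boundary_labels(vessel_mask, width=1):
    """Convert a binary vessel mask into a 3-class map:
    0 = background, 1 = vessel interior, 2 = vessel boundary."""
    mask = vessel_mask.astype(bool)
    structure = np.ones((3, 3), dtype=bool)
    dilated = binary_dilation(mask, structure, iterations=width)
    eroded = binary_erosion(mask, structure, iterations=width)
    boundary = dilated & ~eroded               # band around the vessel edge
    labels = np.zeros(mask.shape, dtype=np.uint8)
    labels[mask] = 1                           # vessel pixels
    labels[boundary & mask] = 2                # boundary pixels inside the mask
    return labels

# Example on a toy 9x9 mask: interior pixels stay class 1, edge pixels become class 2.
toy = np.zeros((9, 9), dtype=np.uint8)
toy[2:7, 1:8] = 1
print(add_boundary_labels(toy, width=1))
```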

SoPa: Bridging CNNs, RNNs, and Weighted Finite-State Machines

Title SoPa: Bridging CNNs, RNNs, and Weighted Finite-State Machines
Authors Roy Schwartz, Sam Thomson, Noah A. Smith
Abstract Recurrent and convolutional neural networks comprise two distinct families of models that have proven to be useful for encoding natural language utterances. In this paper we present SoPa, a new model that aims to bridge these two approaches. SoPa combines neural representation learning with weighted finite-state automata (WFSAs) to learn a soft version of traditional surface patterns. We show that SoPa is an extension of a one-layer CNN, and that such CNNs are equivalent to a restricted version of SoPa, and accordingly to a restricted form of WFSA. Empirically, on three text classification tasks, SoPa performs comparably to or better than both a BiLSTM (RNN) baseline and a CNN baseline, and is particularly useful in small-data settings.
Tasks Representation Learning, Text Classification
Published 2018-05-15
URL http://arxiv.org/abs/1805.06061v1
PDF http://arxiv.org/pdf/1805.06061v1.pdf
PWC https://paperswithcode.com/paper/sopa-bridging-cnns-rnns-and-weighted-finite
Repo
Framework
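The abstract notes that a one-layer CNN is equivalent to a restricted form of SoPa (and hence to a restricted WFSA): each filter scores fixed-length n-grams and max-pooling keeps the best match in the sentence. The PyTorch sketch below shows only that restricted case, not the full soft-pattern model; the embedding size, pattern count, and pattern length are assumptions.

```python
import torch
import torch.nn as nn

class NGramPatternMatcher(nn.Module):
    """One-layer CNN over token embeddings: each filter is a rigid 'pattern'
    of length n, and max-pooling over time returns its best match score."""
    def __init__(self, vocab_size=5000, embed_dim=100, n_patterns=50, n=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.conv = nn.Conv1d(embed_dim, n_patterns, kernel_size=n)
        self.classifier = nn.Linear(n_patterns, 2)   # e.g. binary text classification

    def forward(self, token_ids):
        x = self.embed(token_ids).transpose(1, 2)    # (B, E, T)
        scores = torch.relu(self.conv(x))            # pattern score at each position
        best = scores.max(dim=2).values              # max-pool over time
        return self.classifier(best)

model = NGramPatternMatcher()
logits = model(torch.randint(0, 5000, (4, 20)))
print(logits.shape)  # torch.Size([4, 2])
```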

One-Class Feature Learning Using Intra-Class Splitting

Title One-Class Feature Learning Using Intra-Class Splitting
Authors Patrick Schlachter, Yiwen Liao, Bin Yang
Abstract This paper proposes a novel, generic one-class feature learning method based on intra-class splitting. In one-class classification, feature learning is challenging because only samples of one class are available during training. Hence, state-of-the-art methods require reference multi-class datasets to pretrain feature extractors. In contrast, the proposed method realizes feature learning by splitting the given normal class into typical and atypical normal samples. By introducing a closeness loss and a dispersion loss, an intra-class joint training procedure on the two subsets obtained after splitting enables the extraction of valuable features for one-class classification. Various experiments on three well-known image classification datasets demonstrate the effectiveness of our method, which outperforms the other baseline models on average.
Tasks Image Classification
Published 2018-12-20
URL https://arxiv.org/abs/1812.08468v5
PDF https://arxiv.org/pdf/1812.08468v5.pdf
PWC https://paperswithcode.com/paper/one-class-feature-learning-using-intra-class
Repo
Framework
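Intra-class splitting, as described above, divides the single training class into typical and atypical subsets before the joint training with closeness and dispersion losses. A minimal numpy sketch of one plausible splitting rule follows: ranking samples by distance to the class centroid in some feature space. The ratio and the distance measure are assumptions, not the paper's criterion.

```python
import numpy as np

def intra_class_split(features, atypical_ratio=0.1):
    """Split one-class training features into 'typical' and 'atypical' subsets.
    Samples farthest from the class centroid are treated as atypical."""
    centroid = features.mean(axis=0)
    dists = np.linalg.norm(features - centroid, axis=1)
    n_atypical = max(1, int(atypical_ratio * len(features)))
    atypical_idx = np.argsort(dists)[-n_atypical:]    # largest distances
    mask = np.zeros(len(features), dtype=bool)
    mask[atypical_idx] = True
    return features[~mask], features[mask]            # typical, atypical

rng = np.random.default_rng(0)
typical, atypical = intra_class_split(rng.normal(size=(100, 32)))
print(typical.shape, atypical.shape)                  # (90, 32) (10, 32)
```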

A Multi-Scheme Ensemble Using Coopetitive Soft-Gating With Application to Power Forecasting for Renewable Energy Generation

Title A Multi-Scheme Ensemble Using Coopetitive Soft-Gating With Application to Power Forecasting for Renewable Energy Generation
Authors André Gensler, Bernhard Sick
Abstract In this article, we propose a novel ensemble technique with multi-scheme weighting based on a technique called coopetitive soft gating. This technique combines ensemble member competition and cooperation in order to maximize the overall forecasting accuracy of the ensemble. The proposed algorithm combines the ideas of multiple ensemble paradigms (power forecasting model ensemble, weather forecasting model ensemble, and lagged ensemble) in a hierarchical structure. The technique is designed to be used flexibly with single or multiple weather forecasting models and for a variety of lead times. We compare the technique to other power forecasting models and ensemble techniques using a flexible number of weather forecasting models, which can have the same or varying forecasting horizons. We show that the model is able to outperform those models on a number of publicly available data sets. The article closes with a discussion of the properties of the proposed model that are relevant to its application.
Tasks Weather Forecasting
Published 2018-03-16
URL http://arxiv.org/abs/1803.06344v1
PDF http://arxiv.org/pdf/1803.06344v1.pdf
PWC https://paperswithcode.com/paper/a-multi-scheme-ensemble-using-coopetitive
Repo
Framework
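To make the idea of combining competition and cooperation concrete, the numpy sketch below weights ensemble members by their inverse historical error raised to a "coopetition" exponent: a large exponent approaches winner-takes-all competition, a small one approaches equal-weight cooperation. The weighting formula, parameter names, and example numbers are assumptions for illustration, not the authors' definition of coopetitive soft gating.

```python
import numpy as np

def coopetitive_soft_gating(member_forecasts, historical_errors, eta=2.0):
    """Blend ensemble member forecasts with weights derived from past errors.

    member_forecasts : (n_members, horizon) array of power forecasts
    historical_errors: (n_members,) recent error scores (e.g. RMSE)
    eta              : > 0; large eta -> competition (best member dominates),
                       small eta -> cooperation (near-equal weights)
    """
    inv_err = 1.0 / (np.asarray(historical_errors) + 1e-12)
    weights = inv_err ** eta
    weights /= weights.sum()
    return weights @ member_forecasts, weights

forecasts = np.array([[0.40, 0.55, 0.70],    # power forecasting model
                      [0.35, 0.50, 0.65],    # weather-model-driven forecast
                      [0.42, 0.58, 0.72]])   # lagged ensemble member
errors = np.array([0.08, 0.12, 0.10])
combined, w = coopetitive_soft_gating(forecasts, errors, eta=2.0)
print(np.round(w, 3), np.round(combined, 3))
```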

Retinal Vessel Segmentation Based on Conditional Deep Convolutional Generative Adversarial Networks

Title Retinal Vessel Segmentation Based on Conditional Deep Convolutional Generative Adversarial Networks
Authors Yun Jiang, Ning Tan
Abstract The segmentation of retinal vessels is significant for helping doctors diagnose fundus diseases. However, existing methods have various problems, such as insufficient segmentation of retinal vessels, weak resistance to noise, and sensitivity to lesions. To address these shortcomings, this paper proposes the use of conditional deep convolutional generative adversarial networks to segment retinal vessels. We mainly improve the network structure of the generator. Residual modules are introduced at the convolutional layers for residual learning, which makes the network structure sensitive to changes in the output and thus better able to adjust the weights of the generator. To reduce the number of parameters and computations, a small convolution is used to halve the number of channels of the input feature map before a large convolution kernel is applied. Skip connections link the output of the convolutional layers with the output of the deconvolution layers to avoid the loss of low-level information. Evaluated on the DRIVE and STARE datasets, the method achieves segmentation accuracies of 96.08% and 97.71%, sensitivities of 82.74% and 85.34%, and F-measures of 82.08% and 85.02%, respectively. The sensitivity is 4.82% and 2.4% higher than that of R2U-Net.
Tasks Retinal Vessel Segmentation
Published 2018-05-11
URL http://arxiv.org/abs/1805.04224v1
PDF http://arxiv.org/pdf/1805.04224v1.pdf
PWC https://paperswithcode.com/paper/retinal-vessel-segmentation-based-on
Repo
Framework
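The generator changes described above (a residual module, a small 1x1 convolution that halves the channels before the larger kernel, and skip connections between encoder and decoder features) can be sketched as a single PyTorch building block. The channel counts and kernel sizes here are assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class BottleneckResidualBlock(nn.Module):
    """A 1x1 conv halves the channels before the expensive 3x3 conv; the input is
    added back (residual learning) so the block learns a correction."""
    def __init__(self, channels=64):
        super().__init__()
        half = channels // 2
        self.body = nn.Sequential(
            nn.Conv2d(channels, half, kernel_size=1), nn.BatchNorm2d(half), nn.ReLU(),
            nn.Conv2d(half, half, kernel_size=3, padding=1), nn.BatchNorm2d(half), nn.ReLU(),
            nn.Conv2d(half, channels, kernel_size=1), nn.BatchNorm2d(channels),
        )
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(x + self.body(x))   # residual connection

# In an encoder-decoder generator, an encoder feature map would additionally be
# concatenated with the matching decoder output (a skip connection), e.g.:
enc_feat = torch.randn(1, 64, 64, 64)
dec_feat = torch.randn(1, 64, 64, 64)
fused = torch.cat([BottleneckResidualBlock(64)(enc_feat), dec_feat], dim=1)
print(fused.shape)  # torch.Size([1, 128, 64, 64])
```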

Learning Unsupervised Visual Grounding Through Semantic Self-Supervision

Title Learning Unsupervised Visual Grounding Through Semantic Self-Supervision
Authors Syed Ashar Javed, Shreyas Saxena, Vineet Gandhi
Abstract Localizing natural language phrases in images is a challenging problem that requires joint understanding of both the textual and visual modalities. In the unsupervised setting, the lack of supervisory signals exacerbates this difficulty. In this paper, we propose a novel framework for unsupervised visual grounding which uses concept learning as a proxy task to obtain self-supervision. The simple intuition behind this idea is to encourage the model to localize regions which can explain some semantic property in the data; in our case, the property is the presence of a concept in a set of images. We present thorough quantitative and qualitative experiments to demonstrate the efficacy of our approach and show a 5.6% improvement over the current state of the art on the Visual Genome dataset, a 5.8% improvement on the ReferItGame dataset, and performance comparable to the state of the art on the Flickr30k dataset.
Tasks
Published 2018-03-17
URL http://arxiv.org/abs/1803.06506v3
PDF http://arxiv.org/pdf/1803.06506v3.pdf
PWC https://paperswithcode.com/paper/learning-unsupervised-visual-grounding
Repo
Framework

Efficient Progressive Neural Architecture Search

Title Efficient Progressive Neural Architecture Search
Authors Juan-Manuel Perez-Rua, Moez Baccouche, Stephane Pateux
Abstract This paper addresses the difficult problem of finding an optimal neural architecture design for a given image classification task. We propose a method that aggregates two main results of the previous state of the art in neural architecture search: the strong sampling efficiency of a search scheme based on sequential model-based optimization (SMBO), and the increased training efficiency gained by sharing weights among sampled architectures. Sequential search has previously demonstrated its ability to find state-of-the-art neural architectures for image classification. However, its computational cost remains high, and even prohibitive under modest computational settings. Equipping SMBO with weight sharing alleviates this problem. On the other hand, progressive search with SMBO is inherently greedy, as it leverages a learned surrogate function to predict the validation error of neural architectures, and this prediction is used directly to rank the sampled architectures. We propose to attenuate the greediness of the original SMBO method by relaxing the role of the surrogate function so that it predicts an architecture sampling probability instead. We demonstrate with experiments on the CIFAR-10 dataset that our method, termed Efficient Progressive Neural Architecture Search (EPNAS), leads to increased search efficiency while retaining the competitiveness of the found architectures.
Tasks Image Classification, Neural Architecture Search
Published 2018-08-01
URL http://arxiv.org/abs/1808.00391v1
PDF http://arxiv.org/pdf/1808.00391v1.pdf
PWC https://paperswithcode.com/paper/efficient-progressive-neural-architecture
Repo
Framework
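The key change described above is to use the surrogate not to greedily rank architectures but to define a sampling probability over them. A small numpy sketch of that relaxation is given below; the softmax form, the temperature, and the example scores are assumptions for illustration only.

```python
import numpy as np

def sample_architectures(surrogate_scores, n_samples=4, temperature=1.0, rng=None):
    """Sample candidate architectures in proportion to softmax(surrogate score)
    instead of greedily expanding the top-ranked ones."""
    rng = rng or np.random.default_rng()
    scores = np.asarray(surrogate_scores, dtype=float)
    probs = np.exp(scores / temperature)
    probs /= probs.sum()
    return rng.choice(len(scores), size=n_samples, replace=False, p=probs)

# A higher predicted score makes an architecture more likely to be expanded,
# but unlike the greedy ranking it is not guaranteed to be selected.
predicted = [0.2, 1.5, 0.9, 0.1, 1.1, 0.4]
print(sample_architectures(predicted, n_samples=3, rng=np.random.default_rng(0)))
```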

Voiceprint recognition of Parkinson patients based on deep learning

Title Voiceprint recognition of Parkinson patients based on deep learning
Authors Zhijing Xu, Juan Wang, Ying Zhang, Xiangjian He
Abstract More than 90% of Parkinson’s Disease (PD) patients suffer from vocal disorders, and speech impairment is already an indicator of PD. This study focuses on PD diagnosis through voiceprint features. In this paper, a method based on Deep Neural Network (DNN) recognition and classification, combined with Mini-Batch Gradient Descent (MBGD), is proposed to distinguish PD patients from healthy people using voiceprint features. To extract the voiceprint features from patients, Weighted Mel Frequency Cepstrum Coefficients (WMFCC) are applied. The proposed method is tested on experimental data obtained from voice recordings of three sustained vowels, /a/, /o/ and /u/, from 48 PD patients and 20 healthy participants. The results show that the proposed method diagnoses PD patients from healthy people with higher accuracy than conventional methods such as Support Vector Machines (SVM) and the other methods discussed in this paper, reaching an accuracy of 89.5%. The WMFCC approach addresses the problem that high-order cepstrum coefficients are small and therefore represent the audio only weakly, MBGD reduces the computational load of the loss function and increases the training speed of the system, and the DNN classifier enhances the classification ability of the voiceprint features. Together, these approaches provide a solid solution for the quick auxiliary diagnosis of early-stage PD.
Tasks
Published 2018-12-17
URL http://arxiv.org/abs/1812.06613v1
PDF http://arxiv.org/pdf/1812.06613v1.pdf
PWC https://paperswithcode.com/paper/voiceprint-recognition-of-parkinson-patients
Repo
Framework
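As an illustration of the voiceprint pipeline above, the sketch below extracts MFCCs from a sustained-vowel signal with librosa and applies a per-coefficient weighting before classification. The specific weighting (rescaling each coefficient by its inverse standard deviation so that small high-order coefficients are not drowned out) is an assumption standing in for WMFCC, whose exact definition is not given in the abstract, and the synthetic tone is a placeholder for a real recording.

```python
import numpy as np
import librosa

def weighted_mfcc(y, sr, n_mfcc=20):
    """Return one weighted-MFCC feature vector per recording.
    NOTE: the weighting below is an illustrative stand-in for WMFCC."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)    # (n_mfcc, n_frames)
    # Rescale each coefficient so high-order (small-magnitude) ones contribute too.
    weights = 1.0 / (mfcc.std(axis=1, keepdims=True) + 1e-8)
    return (weights * mfcc).mean(axis=1)

# Synthetic stand-in for a sustained /a/ recording (1 second, 16 kHz).
sr = 16000
y = 0.1 * np.sin(2 * np.pi * 220 * np.arange(sr) / sr).astype(np.float32)
features = weighted_mfcc(y, sr)
print(features.shape)   # (20,) -- a fixed-length vector, ready for a DNN classifier
```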

An Improved Generic Bet-and-Run Strategy for Speeding Up Stochastic Local Search

Title An Improved Generic Bet-and-Run Strategy for Speeding Up Stochastic Local Search
Authors Thomas Weise, Zijun Wu, Markus Wagner
Abstract A commonly used strategy for improving optimization algorithms is to restart the algorithm when it is believed to be trapped in an inferior part of the search space. Building on the recent success of Bet-and-Run approaches for restarted local search solvers, we introduce an improved generic Bet-and-Run strategy. The goal is to obtain the best possible results within a given time budget t using a given black-box optimization algorithm. If no prior knowledge about problem features and algorithm behavior is available, the question arises of how to use the time budget most efficiently. We propose to first start k>=1 independent runs of the algorithm during an initialization budget t1<t, pause these runs, then apply a decision maker D to choose 1<=m<=k of them (consuming t2>=0 time units in doing so), and finally continue the chosen runs for the remaining t3=t-t1-t2 time units. In previous Bet-and-Run strategies, the decision maker D=currentBest would simply select the run with the best-so-far results at negligible cost. We propose using more advanced methods to discriminate between “good” and “bad” sample runs, with the goal of increasing the correlation of the chosen run with the a-posteriori best one. We test several different approaches, including neural networks trained on, or polynomials fitted to, the current trace of the algorithm to predict which run may yield the best results if granted the remaining budget. We show with extensive experiments that this approach can yield better results than the previous methods, but we also find that the currentBest method is a very reliable and robust baseline approach.
Tasks
Published 2018-06-23
URL http://arxiv.org/abs/1806.08984v1
PDF http://arxiv.org/pdf/1806.08984v1.pdf
PWC https://paperswithcode.com/paper/an-improved-generic-bet-and-run-strategy-for
Repo
Framework
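The generic Bet-and-Run procedure described above (spend t1 on k independent runs, let a decision maker D pick m of them, then spend the remaining budget only on the picked runs) is sketched below for a toy random-search optimiser. The currentBest decision maker from the abstract is used; the toy objective, the budgets, and measuring budget in steps rather than wall-clock time are assumptions for illustration.

```python
import random

def random_search_step(state, objective):
    """One step of a toy stochastic local search: perturb and keep if better."""
    x, fx = state
    cand = x + random.gauss(0.0, 0.5)
    fc = objective(cand)
    return (cand, fc) if fc < fx else (x, fx)

def bet_and_run(objective, k=8, m=2, t1_steps=50, t3_steps=400):
    # Phase 1: start k independent runs for a small initialization budget t1.
    runs = []
    for _ in range(k):
        x0 = random.uniform(-10, 10)
        state = (x0, objective(x0))
        for _ in range(t1_steps):
            state = random_search_step(state, objective)
        runs.append(state)
    # Phase 2: decision maker D = currentBest picks the m best-so-far runs.
    runs.sort(key=lambda s: s[1])
    chosen = runs[:m]
    # Phase 3: spend the remaining budget t3 only on the chosen runs.
    per_run = t3_steps // m
    for i, state in enumerate(chosen):
        for _ in range(per_run):
            state = random_search_step(state, objective)
        chosen[i] = state
    return min(chosen, key=lambda s: s[1])

random.seed(1)
best_x, best_f = bet_and_run(lambda x: (x - 3.0) ** 2)
print(round(best_x, 3), round(best_f, 6))
```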

Efficient Embedding of MPI Collectives in MXNET DAGs for scaling Deep Learning

Title Efficient Embedding of MPI Collectives in MXNET DAGs for scaling Deep Learning
Authors Amith R Mamidala
Abstract The availability of high-performance computing infrastructures such as clusters of GPUs and CPUs has fueled the growth of distributed learning systems. Deep learning frameworks express neural networks as DAGs and execute these DAGs on computational resources such as GPUs. In this paper, we propose efficient designs for embedding MPI collective operations into data-parallel DAGs; incorrect designs can easily lead to deadlocks or program crashes. In particular, we demonstrate three designs for using MPI collectives with DAGs: Funneled, Concurrent communication, and Dependency chaining. These designs automatically enable overlap of computation with communication by allowing concurrent execution with other tasks. We implement these designs directly in the KVStore API of MXNet, which allows us to leverage the rest of its infrastructure. Using the ImageNet and CIFAR datasets, we show the potential of our designs. In particular, they scale to 256 GPUs with epoch times as low as 50 seconds for the ImageNet-1K dataset.
Tasks
Published 2018-02-20
URL http://arxiv.org/abs/1802.06949v1
PDF http://arxiv.org/pdf/1802.06949v1.pdf
PWC https://paperswithcode.com/paper/efficient-embedding-of-mpi-collectives-in
Repo
Framework
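The designs above embed MPI collectives into a data-parallel training DAG inside MXNet's KVStore; that integration is framework-specific, so the sketch below only illustrates the underlying building block: an MPI allreduce that averages gradients across workers with mpi4py. The launch command, gradient shape, and random stand-in gradients are assumptions.

```python
# Run with e.g.:  mpirun -np 4 python allreduce_sketch.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# Each worker computes a local gradient for its data shard (random stand-in here).
local_grad = np.random.default_rng(rank).normal(size=1024).astype(np.float32)

# Sum gradients across all workers, then divide to get the data-parallel average.
global_grad = np.empty_like(local_grad)
comm.Allreduce(local_grad, global_grad, op=MPI.SUM)
global_grad /= size

if rank == 0:
    print("averaged gradient norm:", float(np.linalg.norm(global_grad)))
```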

Extracting Keywords from Open-Ended Business Survey Questions

Title Extracting Keywords from Open-Ended Business Survey Questions
Authors Barbara McGillivray, Gard Jenset, Dominik Heil
Abstract Open-ended survey data constitute an important basis for research as well as for making business decisions. Collecting and manually analysing free-text survey data is generally more costly than collecting and analysing survey data consisting of answers to multiple-choice questions. Yet free-text data allow new content to be expressed beyond predefined categories and are a very valuable source of new insights into people's opinions. At the same time, surveys always make ontological assumptions about the nature of the entities being researched, and this has vital ethical consequences. Human interpretations and opinions can only be properly ascertained in their richness using textual data sources; if these sources are analysed appropriately, the essential linguistic nature of humans and social entities is safeguarded. Natural Language Processing (NLP) offers possibilities for meeting this ethical business challenge by automating the analysis of natural language and thus allowing for insightful investigations of human judgements. We present a computational pipeline for analysing large numbers of responses to open-ended survey questions and extracting keywords that appropriately represent people's opinions. This pipeline addresses the need to perform such tasks outside the scope of both commercial software and bespoke analysis, exceeds the performance of state-of-the-art systems, and performs the task in a transparent way that allows for scrutinising and exposing potential biases in the analysis. Following the principles of Open Data Science, our code is open-source and generalizable to other datasets.
Tasks
Published 2018-08-31
URL http://arxiv.org/abs/1808.10685v2
PDF http://arxiv.org/pdf/1808.10685v2.pdf
PWC https://paperswithcode.com/paper/extracting-keywords-from-open-ended-business
Repo
Framework
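The authors' pipeline is not spelled out in the abstract, so the sketch below shows only a transparent baseline for the same task: extracting keywords from open-ended survey responses with TF-IDF weights in scikit-learn. The example responses and the cut-off of five keywords per response are assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

responses = [
    "Customer service was slow but the staff were friendly and helpful.",
    "Pricing is not transparent; hidden fees appeared on my invoice.",
    "The online portal is easy to use and the delivery was fast.",
]

# TF-IDF weights terms that are frequent in one response but rare across responses.
vectorizer = TfidfVectorizer(stop_words="english", ngram_range=(1, 2))
tfidf = vectorizer.fit_transform(responses)
terms = vectorizer.get_feature_names_out()

for i, row in enumerate(tfidf.toarray()):
    top = row.argsort()[::-1][:5]                    # five highest-weighted terms
    print(f"response {i}:", [terms[j] for j in top if row[j] > 0])
```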

Trajectory PHD and CPHD filters

Title Trajectory PHD and CPHD filters
Authors Ángel F. García-Fernández, Lennart Svensson
Abstract This paper presents the probability hypothesis density (PHD) filter and the cardinality PHD (CPHD) filter for sets of trajectories, referred to as the trajectory PHD (TPHD) and trajectory CPHD (TCPHD) filters. Contrary to the PHD/CPHD filters, the TPHD/TCPHD filters are able to produce trajectory estimates from first principles. The TPHD filter is derived by recursively obtaining the best Poisson multitrajectory density approximation to the posterior density over the alive trajectories by minimising the Kullback-Leibler divergence. The TCPHD filter is derived in the same way, but propagates an independent and identically distributed (IID) cluster multitrajectory density approximation. We also propose Gaussian mixture implementations of the TPHD and TCPHD recursions, the Gaussian mixture TPHD (GMTPHD) and the Gaussian mixture TCPHD (GMTCPHD), as well as computationally efficient L-scan implementations, which only update the density of the trajectory states of the last L time steps.
Tasks
Published 2018-11-21
URL https://arxiv.org/abs/1811.08820v3
PDF https://arxiv.org/pdf/1811.08820v3.pdf
PWC https://paperswithcode.com/paper/trajectory-phd-and-cphd-filters
Repo
Framework
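For reference, the classical PHD filter corrector that the trajectory PHD filter generalises to sets of trajectories can be written as below, with the usual notation (p_D detection probability, g_k measurement likelihood, kappa_k clutter intensity, Z_k the measurement set); the trajectory-domain recursions themselves are given in the paper.

```latex
D_{k|k}(x) = \bigl(1 - p_D(x)\bigr)\, D_{k|k-1}(x)
  + \sum_{z \in Z_k}
    \frac{p_D(x)\, g_k(z \mid x)\, D_{k|k-1}(x)}
         {\kappa_k(z) + \int p_D(\xi)\, g_k(z \mid \xi)\, D_{k|k-1}(\xi)\, d\xi}
```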

Context-Aware Systems for Sequential Item Recommendation

Title Context-Aware Systems for Sequential Item Recommendation
Authors Moin Nadeem, Dustin Stansbury, Shane Mooney
Abstract Quizlet is the most popular online learning tool in the United States, used by over two-thirds of high school students and half of college students. With more than 95% of Quizlet users reporting improved grades as a result, the platform has become the de-facto tool used in millions of classrooms. In this paper, we explore the task of recommending suitable content for a student to study, given their prior interests as well as what their peers are studying. We propose a novel approach, the Neural Educational Recommendation Engine (NERE), to recommend educational content by leveraging student behaviors rather than ratings. We have found that this approach better captures social factors that are more aligned with learning. NERE is based on a recurrent neural network that includes collaborative and content-based approaches for recommendation, and takes into account a particular student’s speed, mastery, and experience to recommend the appropriate task. We train NERE by jointly learning the user embeddings and content embeddings, and attempt to predict the content embedding for the final timestamp. We also develop a confidence estimator for our neural network, which is a crucial requirement for productionizing this model. We apply NERE to Quizlet’s proprietary dataset and present our results. We achieved an R^2 score of 0.81 in the content embedding space and a recall score of 54% on the 100 nearest neighbors. This vastly exceeds the recall@100 score of 12% that a standard matrix-factorization approach provides. We conclude with a discussion of how NERE will be deployed, and position our work as one of the first educational recommender systems for the K-12 space.
Tasks Recommendation Systems
Published 2018-09-21
URL http://arxiv.org/abs/1809.08922v2
PDF http://arxiv.org/pdf/1809.08922v2.pdf
PWC https://paperswithcode.com/paper/neural-educational-recommendation-engine-nere
Repo
Framework
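The recall figure quoted above is computed in the content-embedding space: a prediction counts as a hit if the true item appears among the 100 nearest catalog items. A small, hypothetical numpy/scikit-learn sketch of that metric follows; the embedding dimensionality and the random data are placeholders, not Quizlet's data.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def recall_at_k(predicted_embs, true_item_ids, catalog_embs, k=100):
    """Fraction of predictions whose true item appears among the k nearest
    catalog items (by Euclidean distance in the embedding space)."""
    nn = NearestNeighbors(n_neighbors=k).fit(catalog_embs)
    _, neighbors = nn.kneighbors(predicted_embs)       # (n_preds, k) item indices
    hits = [true_id in row for true_id, row in zip(true_item_ids, neighbors)]
    return float(np.mean(hits))

rng = np.random.default_rng(0)
catalog = rng.normal(size=(5000, 64))                  # content embeddings
truth = rng.integers(0, 5000, size=200)                # held-out target items
preds = catalog[truth] + rng.normal(scale=0.1, size=(200, 64))  # noisy predictions
print(recall_at_k(preds, truth, catalog, k=100))
```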