January 28, 2020

3166 words 15 mins read

Paper Group ANR 1017

Closed-Form Full Map Posteriors for Robot Localization with Lidar Sensors. FPGA-based Accelerators of Deep Learning Networks for Learning and Classification: A Review. HDI-Forest: Highest Density Interval Regression Forest. Multimodal Representation Model based on Graph-Based Rank Fusion. SANet:Superpixel Attention Network for Skin Lesion Attribute …

Closed-Form Full Map Posteriors for Robot Localization with Lidar Sensors

Title Closed-Form Full Map Posteriors for Robot Localization with Lidar Sensors
Authors Lukas Luft, Alexander Schaefer, Tobias Schubert, Wolfram Burgard
Abstract A popular class of lidar-based grid mapping algorithms computes for each map cell the probability that it reflects an incident laser beam. These algorithms typically determine the map as the set of reflection probabilities that maximizes the likelihood of the underlying laser data and do not compute the full posterior distribution over all possible maps. Thereby, they discard crucial information about the confidence of the estimate. The approach presented in this paper preserves this information by determining the full map posterior. In general, this problem is hard because distributions over real-valued quantities can possess infinitely many dimensions. However, for two state-of-the-art beam-based lidar models, our approach yields closed-form map posteriors that possess only two parameters per cell. Even better, these posteriors come for free, in the sense that they use the same parameters as the traditional approaches, without the need for additional computations. An important use case for grid maps is robot localization, which we formulate as Bayesian filtering based on the closed-form map posterior rather than based on a single map. The resulting measurement likelihoods can also be expressed in closed form. In simulations and extensive real-world experiments, we show that leveraging the full map posterior improves the localization accuracy compared to approaches that use the most likely map.
Tasks
Published 2019-10-23
URL https://arxiv.org/abs/1910.10493v1
PDF https://arxiv.org/pdf/1910.10493v1.pdf
PWC https://paperswithcode.com/paper/closed-form-full-map-posteriors-for-robot
Repo
Framework
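
The abstract notes that the posterior has only two parameters per cell and reuses the quantities that maximum-likelihood mapping already maintains. A natural reading is a per-cell Beta posterior over the reflection probability, parameterized by hit and miss counts. The sketch below illustrates that interpretation only; the counts and the uniform prior are illustrative assumptions, not the authors' code.

```python
import numpy as np

# Hedged sketch: per-cell Beta posterior over the reflection probability,
# built from the same hit/miss counts a maximum-likelihood reflection map stores.
hits = np.array([12, 0, 3])      # beams reflected by each cell (illustrative counts)
misses = np.array([4, 9, 3])     # beams that passed through each cell

# Maximum-likelihood map: a single point estimate per cell.
ml_map = hits / np.maximum(hits + misses, 1)

# Full posterior per cell: Beta(hits + 1, misses + 1) under an assumed uniform prior.
alpha, beta = hits + 1.0, misses + 1.0
posterior_mean = alpha / (alpha + beta)
posterior_var = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1.0))

print(ml_map, posterior_mean, posterior_var)
```

The variance term is exactly the confidence information that a single most-likely map discards.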

FPGA-based Accelerators of Deep Learning Networks for Learning and Classification: A Review

Title FPGA-based Accelerators of Deep Learning Networks for Learning and Classification: A Review
Authors Ahmad Shawahna, Sadiq M. Sait, Aiman El-Maleh
Abstract Due to recent advances in digital technologies, and availability of credible data, an area of artificial intelligence, deep learning, has emerged, and has demonstrated its ability and effectiveness in solving complex learning problems not possible before. In particular, convolutional neural networks (CNNs) have demonstrated their effectiveness in image detection and recognition applications. However, they require intensive CPU operations and memory bandwidth that make general CPUs fail to achieve desired performance levels. Consequently, hardware accelerators that use application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), and graphics processing units (GPUs) have been employed to improve the throughput of CNNs. More precisely, FPGAs have recently been adopted for accelerating the implementation of deep learning networks due to their ability to maximize parallelism as well as their energy efficiency. In this paper, we review recent existing techniques for accelerating deep learning networks on FPGAs. We highlight the key features employed by the various techniques for improving the acceleration performance. In addition, we provide recommendations for enhancing the utilization of FPGAs for CNN acceleration. The techniques investigated in this paper represent the recent trends in FPGA-based accelerators of deep learning networks. Thus, this review is expected to direct future advances on efficient hardware accelerators and to be useful for deep learning researchers.
Tasks
Published 2019-01-01
URL http://arxiv.org/abs/1901.00121v1
PDF http://arxiv.org/pdf/1901.00121v1.pdf
PWC https://paperswithcode.com/paper/fpga-based-accelerators-of-deep-learning
Repo
Framework

HDI-Forest: Highest Density Interval Regression Forest

Title HDI-Forest: Highest Density Interval Regression Forest
Authors Lin Zhu, Jiaxing Lu, Yihong Chen
Abstract By seeking the narrowest prediction intervals (PIs) that satisfy the specified coverage probability requirements, the recently proposed quality-based PI learning principle can extract high-quality PIs that better summarize the predictive certainty in regression tasks, and has been widely applied to solve many practical problems. Currently, the state-of-the-art quality-based PI estimation methods are based on deep neural networks or linear models. In this paper, we propose the Highest Density Interval Regression Forest (HDI-Forest), a novel quality-based PI estimation method that is instead based on Random Forest. HDI-Forest does not require additional model training, and directly reuses the trees learned in a standard Random Forest model. By utilizing the special properties of Random Forest, HDI-Forest can efficiently and more directly optimize the PI quality metrics. Extensive experiments on benchmark datasets show that HDI-Forest significantly outperforms previous approaches, reducing the average PI width by over 20% while achieving the same or better coverage probability.
Tasks
Published 2019-05-24
URL https://arxiv.org/abs/1905.10101v2
PDF https://arxiv.org/pdf/1905.10101v2.pdf
PWC https://paperswithcode.com/paper/hdi-forest-highest-density-interval
Repo
Framework
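
As a rough illustration of reusing a standard Random Forest's trees to get prediction intervals, the sketch below collects per-tree predictions for a query point and returns the narrowest window covering the requested fraction of them. This is an assumed simplification, not the paper's exact algorithm.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Hedged sketch: no extra training, just reuse the per-tree predictions of a
# standard Random Forest and take the narrowest interval that covers enough of them.
X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=0)
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

def narrowest_interval(per_tree_preds, coverage=0.9):
    preds = np.sort(per_tree_preds)
    k = int(np.ceil(coverage * len(preds)))            # points the interval must cover
    widths = preds[k - 1:] - preds[: len(preds) - k + 1]
    i = int(np.argmin(widths))                          # narrowest covering window
    return preds[i], preds[i + k - 1]

x_query = X[:1]
per_tree = np.array([tree.predict(x_query)[0] for tree in rf.estimators_])
print(narrowest_interval(per_tree, coverage=0.9))
```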

Multimodal Representation Model based on Graph-Based Rank Fusion

Title Multimodal Representation Model based on Graph-Based Rank Fusion
Authors Icaro Cavalcante Dourado, Salvatore Tabbone, Ricardo da Silva Torres
Abstract This paper proposes an unsupervised representation model, based on rank-fusion graphs, for general applicability in multimodal tasks, covering both unsupervised and supervised prediction. Rank-fusion graphs encode information from multiple descriptors and retrieval models, and are thus able to capture underlying relationships between modalities, samples, and the collection itself. By doing so, our method is able to promote a fusion model better than either early-fusion or late-fusion alternatives. The solution is based on encoding multiple ranks of a query, defined according to different criteria, into a graph. Later, we embed the generated graph into a feature space, creating fusion vectors. Those embeddings are employed to build an estimator that infers whether an input (even multimodal) object refers to a class (or event) or not. Experiments performed in the context of multiple multimodal and visual datasets, evaluated over several descriptors and retrieval models, demonstrate that our representation model is highly effective for different detection scenarios involving visual, textual, and multimodal features, yielding better detection results than state-of-the-art methods.
Tasks
Published 2019-12-21
URL https://arxiv.org/abs/1912.10314v3
PDF https://arxiv.org/pdf/1912.10314v3.pdf
PWC https://paperswithcode.com/paper/multimodal-representation-model-based-on
Repo
Framework
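
The sketch below is a loose, assumed illustration of turning several ranked lists for one query into a graph and flattening it into a fusion vector; the reciprocal-rank edge weighting and the item names are placeholders, not the paper's formulation.

```python
from collections import defaultdict
from itertools import combinations

# Illustrative only: fuse two ranked lists for one query into an edge-weighted
# graph, then flatten the graph into a fixed-order fusion vector.
ranks = {
    "visual": ["doc3", "doc1", "doc7"],
    "textual": ["doc1", "doc3", "doc5"],
}

edges = defaultdict(float)
for ranked in ranks.values():
    for (i, a), (j, b) in combinations(enumerate(ranked), 2):
        # Assumed reciprocal-rank weight: larger when both items sit near the top.
        edges[tuple(sorted((a, b)))] += 1.0 / ((i + 1) * (j + 1))

vocabulary = sorted({item for ranked in ranks.values() for item in ranked})
index = {pair: k for k, pair in enumerate(combinations(vocabulary, 2))}
fusion_vector = [0.0] * len(index)
for pair, weight in edges.items():
    fusion_vector[index[pair]] = weight
print(fusion_vector)
```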

SANet: Superpixel Attention Network for Skin Lesion Attributes Detection

Title SANet: Superpixel Attention Network for Skin Lesion Attributes Detection
Authors Xinzi He, Baiying Lei, Tianfu Wang
Abstract The accurate detection of lesion attributes is meaningful for both computer-aided diagnosis systems and dermatologists' decisions. However, unlike lesion segmentation and melanoma classification, there are few deep learning methods and little literature focusing on this task. Currently, lesion attribute detection still remains challenging due to the extremely unbalanced class distribution and insufficient samples, as well as large intra-class and low inter-class variations. To solve these problems, we propose a deep learning framework named the superpixel attention network (SANet). Firstly, we segment input images into small regions and shuffle the obtained regions via a random shuffle mechanism (RSM). Secondly, we apply the SANet to capture discriminative features and reconstruct the input images. Specifically, SANet contains two sub-modules: superpixel average pooling and a superpixel attention module (SAM). We introduce superpixel average pooling to reformulate the superpixel classification problem as a superpixel segmentation problem, and the SAM is utilized to focus on discriminative superpixel regions and feature channels. Finally, we design a novel but effective loss, namely a global balancing loss, to address the serious data imbalance in the ISIC 2018 Task 2 lesion attribute detection dataset. The proposed method achieves quite good performance on the ISIC 2018 Task 2 challenge.
Tasks Lesion Segmentation
Published 2019-10-20
URL https://arxiv.org/abs/1910.08995v1
PDF https://arxiv.org/pdf/1910.08995v1.pdf
PWC https://paperswithcode.com/paper/sanetsuperpixel-attention-network-for-skin
Repo
Framework
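
Superpixel average pooling, as named in the abstract, can be pictured as averaging the CNN features that fall inside each superpixel. The sketch below is only that picture, with random features and labels standing in for a real network and segmentation.

```python
import numpy as np

# Hedged sketch of superpixel average pooling: one mean feature vector per superpixel,
# which recasts dense per-pixel prediction as per-superpixel classification.
features = np.random.rand(8, 32, 32)               # C x H x W feature map (placeholder)
superpixels = np.random.randint(0, 50, (32, 32))   # H x W superpixel labels (placeholder)

pooled = np.zeros((superpixels.max() + 1, features.shape[0]))
for label in np.unique(superpixels):
    mask = superpixels == label
    pooled[label] = features[:, mask].mean(axis=1)  # average features inside this superpixel

print(pooled.shape)   # (num_superpixels, C)
```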

Domain Adaptive Inference for Neural Machine Translation

Title Domain Adaptive Inference for Neural Machine Translation
Authors Danielle Saunders, Felix Stahlberg, Adria de Gispert, Bill Byrne
Abstract We investigate adaptive ensemble weighting for Neural Machine Translation, addressing the case of improving performance on a new and potentially unknown domain without sacrificing performance on the original domain. We adapt sequentially across two Spanish-English and three English-German tasks, comparing unregularized fine-tuning, L2 and Elastic Weight Consolidation. We then report a novel scheme for adaptive NMT ensemble decoding by extending Bayesian Interpolation with source information, and show strong improvements across test domains without access to the domain label.
Tasks Machine Translation
Published 2019-06-02
URL https://arxiv.org/abs/1906.00408v1
PDF https://arxiv.org/pdf/1906.00408v1.pdf
PWC https://paperswithcode.com/paper/190600408
Repo
Framework
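
Elastic Weight Consolidation, one of the regularizers compared in the paper, penalizes fine-tuned parameters for drifting away from the original-domain parameters in proportion to an estimate of their Fisher information. The numbers below are placeholders; only the penalty form is standard.

```python
import numpy as np

# Standard EWC penalty (sketch): 0.5 * lambda * sum_i F_i * (theta_i - theta_old_i)^2
theta_old = np.array([0.5, -1.2, 0.3])   # parameters after original-domain training (placeholder)
fisher = np.array([2.0, 0.1, 0.7])       # per-parameter Fisher information estimate (placeholder)
lam = 1.0                                 # regularization strength

def ewc_penalty(theta):
    return 0.5 * lam * np.sum(fisher * (theta - theta_old) ** 2)

# Added to the new-domain training loss during fine-tuning.
print(ewc_penalty(np.array([0.4, -1.0, 0.9])))
```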

Implicit Discriminator in Variational Autoencoder

Title Implicit Discriminator in Variational Autoencoder
Authors Prateek Munjal, Akanksha Paul, Narayanan C. Krishnan
Abstract Recently, generative models have focused on combining the advantages of variational autoencoders (VAEs) and generative adversarial networks (GANs) for good reconstruction and generative abilities. In this work we introduce a novel hybrid architecture, Implicit Discriminator in Variational Autoencoder (IDVAE), that combines a VAE and a GAN and does not need an explicit discriminator network. The fundamental premise of the IDVAE architecture is that the encoder of a VAE and the discriminator of a GAN utilize common features and therefore can be trained as a shared network, while the decoder of the VAE and the generator of the GAN can be combined to learn a single network. This results in a simple two-tier architecture that has the properties of both a VAE and a GAN. Qualitative and quantitative experiments on real-world benchmark datasets demonstrate that IDVAE performs better than state-of-the-art hybrid approaches. We experimentally validate that IDVAE can be easily extended to work in a conditional setting and demonstrate its performance on complex datasets.
Tasks
Published 2019-09-28
URL https://arxiv.org/abs/1909.13062v1
PDF https://arxiv.org/pdf/1909.13062v1.pdf
PWC https://paperswithcode.com/paper/implicit-discriminator-in-variational
Repo
Framework
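
A toy reading of the IDVAE premise is a shared trunk with separate heads for the VAE latent parameters and the GAN real/fake score, plus a single decoder-generator. All layer sizes below are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

# Sketch: the VAE encoder and the GAN discriminator share one trunk; the decoder
# doubles as the generator.
class SharedEncoderDiscriminator(nn.Module):
    def __init__(self, x_dim=784, h_dim=256, z_dim=32):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU())
        self.mu = nn.Linear(h_dim, z_dim)        # VAE head: latent mean
        self.logvar = nn.Linear(h_dim, z_dim)    # VAE head: latent log-variance
        self.real_fake = nn.Linear(h_dim, 1)     # GAN head: discriminator score

    def forward(self, x):
        h = self.trunk(x)
        return self.mu(h), self.logvar(h), torch.sigmoid(self.real_fake(h))

decoder_generator = nn.Sequential(nn.Linear(32, 256), nn.ReLU(), nn.Linear(256, 784))

enc_disc = SharedEncoderDiscriminator()
mu, logvar, score = enc_disc(torch.randn(4, 784))
recon = decoder_generator(mu)   # the same network serves as VAE decoder and GAN generator
print(mu.shape, score.shape, recon.shape)
```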

To lemmatize or not to lemmatize: how word normalisation affects ELMo performance in word sense disambiguation

Title To lemmatize or not to lemmatize: how word normalisation affects ELMo performance in word sense disambiguation
Authors Andrey Kutuzov, Elizaveta Kuzmenko
Abstract We critically evaluate the widespread assumption that deep learning NLP models do not require lemmatized input. To test this, we trained versions of contextualised word embedding ELMo models on raw tokenized corpora and on the corpora with word tokens replaced by their lemmas. Then, these models were evaluated on the word sense disambiguation task. This was done for the English and Russian languages. The experiments showed that while lemmatization is indeed not necessary for English, the situation is different for Russian. It seems that for rich-morphology languages, using lemmatized training and testing data yields small but consistent improvements: at least for word sense disambiguation. This means that the decisions about text pre-processing before training ELMo should consider the linguistic nature of the language in question.
Tasks Lemmatization, Word Sense Disambiguation
Published 2019-09-06
URL https://arxiv.org/abs/1909.03135v1
PDF https://arxiv.org/pdf/1909.03135v1.pdf
PWC https://paperswithcode.com/paper/to-lemmatize-or-not-to-lemmatize-how-word
Repo
Framework
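
The preprocessing choice under study amounts to either feeding raw tokens or replacing each token with its lemma before training. The toy snippet below shows that contrast for English with NLTK's WordNet lemmatizer; it is not the paper's pipeline, which trains full ELMo models on large corpora.

```python
import nltk
from nltk.stem import WordNetLemmatizer

# Toy contrast between raw and lemmatized input (English only; illustrative tokens).
nltk.download("wordnet", quiet=True)
nltk.download("omw-1.4", quiet=True)
lemmatizer = WordNetLemmatizer()

raw_tokens = ["banks", "were", "closing", "their", "branches"]
lemmatized = [lemmatizer.lemmatize(token) for token in raw_tokens]
print(raw_tokens)
print(lemmatized)   # noun-default lemmas, e.g. 'banks' -> 'bank', 'branches' -> 'branch'
```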

DeepFloat: Resource-Efficient Dynamic Management of Vehicular Floating Content

Title DeepFloat: Resource-Efficient Dynamic Management of Vehicular Floating Content
Authors Gaetano Manzo, Sebastian Otalora, Marco Ajmone Marsan, Torsten Braun, Hung Nguyen, Gianluca Rizzo
Abstract Opportunistic communications are expected to play a crucial role in enabling context-aware vehicular services. A widely investigated opportunistic communication paradigm for storing a piece of content probabilistically in a geographical area is Floating Content (FC). A key issue in the practical deployment of FC is how to tune content replication and caching in a way that achieves a target performance (in terms of the mean fraction of users possessing the content in a given region of space) while minimizing the use of bandwidth and host memory. Fully distributed, distance-based approaches prove highly inefficient and may not meet the performance target, while centralized, model-based approaches do not perform well in realistic, inhomogeneous settings. In this work, we present a data-driven centralized approach to resource-efficient, QoS-aware dynamic management of FC. We propose a deep learning strategy, which employs a Convolutional Neural Network (CNN) to capture the relationships between patterns of user mobility, of content diffusion and replication, and FC performance in terms of resource utilization and of content availability within a given area. Numerical evaluations show the effectiveness of our approach in deriving strategies which efficiently modulate the FC operation in space and effectively adapt to mobility pattern changes over time.
Tasks
Published 2019-06-11
URL https://arxiv.org/abs/1906.07098v1
PDF https://arxiv.org/pdf/1906.07098v1.pdf
PWC https://paperswithcode.com/paper/deepfloat-resource-efficient-dynamic
Repo
Framework
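
The model described is, at heart, a CNN that maps gridded inputs (mobility, content replication) to predicted FC performance. The sketch below only shows that shape of computation; the channel counts, grid size, and output metrics are assumptions.

```python
import torch
import torch.nn as nn

# Rough sketch: a CNN regressor from spatial maps to performance metrics.
model = nn.Sequential(
    nn.Conv2d(2, 16, kernel_size=3, padding=1), nn.ReLU(),   # 2 assumed input maps
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 2),   # e.g. predicted content availability and resource usage (assumed)
)

maps = torch.randn(8, 2, 64, 64)   # batch of mobility + replication grids (placeholder)
print(model(maps).shape)           # torch.Size([8, 2])
```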

Technical report of “Empirical Study on Human Evaluation of Complex Argumentation Frameworks”

Title Technical report of “Empirical Study on Human Evaluation of Complex Argumentation Frameworks”
Authors Marcos Cramer, Mathieu Guillaume
Abstract In abstract argumentation, multiple argumentation semantics have been proposed that make it possible to select sets of jointly acceptable arguments from a given argumentation framework, i.e. based only on the attack relation between arguments. The existence of multiple argumentation semantics raises the question which of these semantics predicts best how humans evaluate arguments. Previous empirical cognitive studies that have tested how humans evaluate sets of arguments depending on the attack relation between them have been limited to a small set of very simple argumentation frameworks, so that some semantics studied in the literature could not be meaningfully distinguished by these studies. In this paper we report on an empirical cognitive study that overcomes these limitations by taking into consideration twelve argumentation frameworks of three to eight arguments each. These argumentation frameworks were mostly more complex than the argumentation frameworks considered in previous studies. All twelve argumentation frameworks were systematically instantiated with natural language arguments based on a certain fictional scenario, and participants were shown both the natural language arguments and a graphical depiction of the attack relation between them. Our data shows that grounded and CF2 semantics were the best predictors of human argument evaluation. A detailed analysis revealed that part of the participants chose a cognitively simpler strategy that is predicted very well by grounded semantics, while another part of the participants chose a cognitively more demanding strategy that is mostly predicted well by CF2 semantics.
Tasks Abstract Argumentation
Published 2019-02-27
URL http://arxiv.org/abs/1902.10552v1
PDF http://arxiv.org/pdf/1902.10552v1.pdf
PWC https://paperswithcode.com/paper/technical-report-of-empirical-study-on-human
Repo
Framework
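
Grounded semantics, one of the two best predictors found in the study, has a simple operational definition: the grounded extension is the least fixed point of the characteristic function F(S) = {a | every attacker of a is attacked by some member of S}. The sketch below computes it for a tiny assumed framework.

```python
# Grounded extension as the least fixed point of the characteristic function.
def grounded_extension(arguments, attacks):
    attackers = {a: {x for x, y in attacks if y == a} for a in arguments}
    extension = set()
    while True:
        defended = {
            a for a in arguments
            if all(attackers[b] & extension for b in attackers[a])
        }
        if defended == extension:
            return extension
        extension = defended

# Example framework: a attacks b, b attacks c; the grounded extension is {a, c}.
print(grounded_extension({"a", "b", "c"}, {("a", "b"), ("b", "c")}))
```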

ResUNet++: An Advanced Architecture for Medical Image Segmentation

Title ResUNet++: An Advanced Architecture for Medical Image Segmentation
Authors Debesh Jha, Pia H. Smedsrud, Michael A. Riegler, Dag Johansen, Thomas de Lange, Pal Halvorsen, Havard D. Johansen
Abstract Accurate computer-aided polyp detection and segmentation during colonoscopy examinations can help endoscopists resect abnormal tissue and thereby decrease the chances of polyps growing into cancer. Towards developing a fully automated model for pixel-wise polyp segmentation, we propose ResUNet++, an improved ResUNet architecture for colonoscopic image segmentation. Our experimental evaluations show that the suggested architecture produces good segmentation results on publicly available datasets. Furthermore, ResUNet++ significantly outperforms U-Net and ResUNet, two key state-of-the-art deep learning architectures, achieving a dice coefficient of 81.33% and a mean Intersection over Union (mIoU) of 79.27% on the Kvasir-SEG dataset, and a dice coefficient of 79.55% and an mIoU of 79.62% on the CVC-612 dataset.
Tasks Medical Image Segmentation, Semantic Segmentation
Published 2019-11-16
URL https://arxiv.org/abs/1911.07067v1
PDF https://arxiv.org/pdf/1911.07067v1.pdf
PWC https://paperswithcode.com/paper/resunet-an-advanced-architecture-for-medical
Repo
Framework
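
For reference, the two reported metrics can be computed from binary masks as follows; the arrays are illustrative, and the paper's mIoU is averaged over a whole dataset rather than a single mask.

```python
import numpy as np

# Dice coefficient and IoU on one pair of binary masks (illustrative values).
pred = np.array([[1, 1, 0], [0, 1, 0]], dtype=bool)
truth = np.array([[1, 0, 0], [0, 1, 1]], dtype=bool)

intersection = np.logical_and(pred, truth).sum()
dice = 2 * intersection / (pred.sum() + truth.sum())
iou = intersection / np.logical_or(pred, truth).sum()
print(f"dice={dice:.3f}, iou={iou:.3f}")   # dice=0.667, iou=0.500
```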

EdgeNet: A novel approach for Arabic numeral classification

Title EdgeNet: A novel approach for Arabic numeral classification
Authors S. M. A. Sharif, Ghulam Mujtaba, S. M. Nadim Uddin
Abstract Despite the importance of handwritten numeral classification, a robust and effective method for a widely used language like Arabic is still lacking. This study focuses on overcoming two major limitations of existing works: data diversity and effective learning methods. Hence, the existing Arabic numeral datasets have been merged into a single dataset and augmented to introduce data diversity. Moreover, a novel deep model has been proposed to exploit the diverse data samples of the unified dataset. The proposed deep model utilizes low-level edge features by propagating them through a residual connection. To make a fair comparison with the proposed model, the existing works have been studied on the unified dataset. The comparison experiments illustrate that the unified dataset accelerates the performance of the existing works. Moreover, the proposed model outperforms the existing state-of-the-art Arabic handwritten numeral classification methods and obtains an accuracy of 99.59% in the validation phase. Apart from that, different state-of-the-art classification models have been studied with the same dataset to reveal their feasibility for Arabic numeral classification. Code is available at http://github.com/sharif-apu/EdgeNet.
Tasks
Published 2019-07-30
URL https://arxiv.org/abs/1908.02254v1
PDF https://arxiv.org/pdf/1908.02254v1.pdf
PWC https://paperswithcode.com/paper/edgenet-a-novel-approach-for-arabic-numeral
Repo
Framework
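
The abstract's key architectural idea, propagating low-level edge features through a residual connection, can be pictured with the toy block below; the layer sizes and structure are assumptions, not the published EdgeNet.

```python
import torch
import torch.nn as nn

# Sketch: early edge-sensitive features are carried forward by a residual connection.
class EdgeResidualBlock(nn.Module):
    def __init__(self, channels=16):
        super().__init__()
        self.edge = nn.Conv2d(1, channels, kernel_size=3, padding=1)   # low-level edge features
        self.deep = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        edge_features = self.edge(x)
        return torch.relu(self.deep(edge_features) + edge_features)    # residual shortcut

print(EdgeResidualBlock()(torch.randn(1, 1, 28, 28)).shape)
```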

Task-Relevant Adversarial Imitation Learning

Title Task-Relevant Adversarial Imitation Learning
Authors Konrad Zolna, Scott Reed, Alexander Novikov, Sergio Gomez Colmenarej, David Budden, Serkan Cabi, Misha Denil, Nando de Freitas, Ziyu Wang
Abstract We show that a critical problem in adversarial imitation from high-dimensional sensory data is the tendency of discriminator networks to distinguish agent and expert behaviour using task-irrelevant features beyond the control of the agent. We analyze this problem in detail and propose a solution as well as several baselines that outperform standard Generative Adversarial Imitation Learning (GAIL). Our proposed solution, Task-Relevant Adversarial Imitation Learning (TRAIL), uses a constrained optimization objective to overcome task-irrelevant features. Comprehensive experiments show that TRAIL can solve challenging manipulation tasks from pixels by imitating human operators, where other agents such as behaviour cloning (BC), standard GAIL, improved GAIL variants including our newly proposed baselines, and Deterministic Policy Gradients from Demonstrations (DPGfD) fail to find solutions, even when the other agents have access to task reward.
Tasks Imitation Learning
Published 2019-10-02
URL https://arxiv.org/abs/1910.01077v1
PDF https://arxiv.org/pdf/1910.01077v1.pdf
PWC https://paperswithcode.com/paper/task-relevant-adversarial-imitation-learning
Repo
Framework
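
The starting point the abstract criticizes is the standard GAIL discriminator, trained to tell expert from agent observations; that is where task-irrelevant features can leak in. The sketch below shows only that baseline objective with placeholder data, not TRAIL's constrained variant.

```python
import torch
import torch.nn as nn

# Standard GAIL discriminator objective (sketch): binary classification of
# expert vs. agent observations.
disc = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1))
bce = nn.BCEWithLogitsLoss()

expert_obs = torch.randn(32, 16)   # placeholder expert observations
agent_obs = torch.randn(32, 16)    # placeholder agent observations

loss = bce(disc(expert_obs), torch.ones(32, 1)) + bce(disc(agent_obs), torch.zeros(32, 1))
loss.backward()
print(float(loss))
```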

Reweighted Proximal Pruning for Large-Scale Language Representation

Title Reweighted Proximal Pruning for Large-Scale Language Representation
Authors Fu-Ming Guo, Sijia Liu, Finlay S. Mungall, Xue Lin, Yanzhi Wang
Abstract Recently, pre-trained language representation flourishes as the mainstay of the natural language understanding community, e.g., BERT. These pre-trained language representations can create state-of-the-art results on a wide range of downstream tasks. Along with continuous significant performance improvement, the size and complexity of these pre-trained neural models continue to increase rapidly. Is it possible to compress these large-scale language representation models? How will the pruned language representation affect the downstream multi-task transfer learning objectives? In this paper, we propose Reweighted Proximal Pruning (RPP), a new pruning method specifically designed for large-scale language representation models. Through experiments on SQuAD and the GLUE benchmark suite, we show that proximally pruned BERT keeps high accuracy for both the pre-training task and the downstream multiple fine-tuning tasks at high prune ratios. RPP provides a new perspective to help us analyze what large-scale language representations might learn. Additionally, RPP makes it possible to deploy a large state-of-the-art language representation model such as BERT on a series of distinct devices (e.g., online servers, mobile phones, and edge devices).
Tasks Transfer Learning
Published 2019-09-27
URL https://arxiv.org/abs/1909.12486v2
PDF https://arxiv.org/pdf/1909.12486v2.pdf
PWC https://paperswithcode.com/paper/reweighted-proximal-pruning-for-large-scale-1
Repo
Framework
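
A reweighted L1 proximal step, which is in the spirit of the method's name, amounts to soft-thresholding each weight with a threshold that grows as the weight shrinks. The sketch below shows that operation on placeholder values; it is not the paper's exact algorithm or hyperparameters.

```python
import numpy as np

# Sketch of a reweighted-L1 proximal (soft-thresholding) step on a weight vector.
weights = np.array([0.80, -0.03, 0.002, -0.50])   # placeholder weights
eps, lam = 1e-2, 0.01                              # assumed reweighting and strength constants

reweights = 1.0 / (np.abs(weights) + eps)          # smaller weights get larger penalties
threshold = lam * reweights

pruned = np.sign(weights) * np.maximum(np.abs(weights) - threshold, 0.0)
print(pruned)   # small weights are driven exactly to zero, i.e. pruned
```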

Learning Singing From Speech

Title Learning Singing From Speech
Authors Liqiang Zhang, Chengzhu Yu, Heng Lu, Chao Weng, Yusong Wu, Xiang Xie, Zijin Li, Dong Yu
Abstract We propose an algorithm that is capable of synthesizing a high-quality target speaker's singing voice given only their normal speech samples. The proposed algorithm first integrates speech and singing synthesis into a unified framework, and learns universal speaker embeddings that are shareable between speech and singing synthesis tasks. Specifically, the speaker embeddings learned from normal speech via the speech synthesis objective are shared with those learned from singing samples via the singing synthesis objective in the unified training framework. This makes the learned speaker embedding a transferable representation for both speaking and singing. We evaluate the proposed algorithm on a singing voice conversion task where the content of the original singing is covered with the timbre of another speaker's voice learned purely from their normal speech samples. Our experiments indicate that the proposed algorithm generates high-quality singing voices that sound highly similar to the target speaker's voice given only his or her normal speech samples. We believe that the proposed algorithm will open up new opportunities for singing synthesis and conversion for broader users and applications.
Tasks Speech Synthesis, Voice Conversion
Published 2019-12-20
URL https://arxiv.org/abs/1912.10128v1
PDF https://arxiv.org/pdf/1912.10128v1.pdf
PWC https://paperswithcode.com/paper/learning-singing-from-speech
Repo
Framework
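
The core transfer mechanism described, a speaker-embedding table shared by the speech and singing synthesis objectives, can be pictured with the toy snippet below; every dimension and module is an assumption standing in for the real synthesis networks.

```python
import torch
import torch.nn as nn

# Sketch: one shared speaker-embedding table feeds both synthesis decoders.
speaker_embedding = nn.Embedding(num_embeddings=10, embedding_dim=64)
speech_decoder = nn.Linear(64 + 128, 80)    # speaker embedding + linguistic features -> mel frame
singing_decoder = nn.Linear(64 + 160, 80)   # speaker embedding + score/pitch features -> mel frame

spk = speaker_embedding(torch.tensor([3]))  # speaker identity learned from normal speech
speech_frame = speech_decoder(torch.cat([spk, torch.randn(1, 128)], dim=-1))
singing_frame = singing_decoder(torch.cat([spk, torch.randn(1, 160)], dim=-1))
print(speech_frame.shape, singing_frame.shape)
```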