October 18, 2019

3118 words 15 mins read

Paper Group ANR 435

Automated Process Incorporating Machine Learning Segmentation and Correlation of Oral Diseases with Systemic Health

Title Automated Process Incorporating Machine Learning Segmentation and Correlation of Oral Diseases with Systemic Health
Authors Gregory Yauney, Aman Rana, Lawrence C. Wong, Perikumar Javia, Ali Muftu, Pratik Shah
Abstract Imaging fluorescent disease biomarkers in tissues and skin is a non-invasive method to screen for health conditions. We report an automated process that combines intraoral fluorescent porphyrin biomarker imaging, clinical examinations and machine learning for correlation of systemic health conditions with periodontal disease. 1215 intraoral fluorescent images, from 284 consenting adults aged 18-90, were analyzed using a machine learning classifier that can segment periodontal inflammation. The classifier achieved an AUC of 0.677 with precision and recall of 0.271 and 0.429, respectively, indicating a learned association between disease signatures in collected images. Periodontal diseases were more prevalent among males (p=0.0012) and older subjects (p=0.0224) in the screened population. Physicians independently examined the collected images, assigning localized modified gingival indices (MGIs). MGIs and periodontal disease were then cross-correlated with responses to a medical history questionnaire, blood pressure and body mass index measurements, and optic nerve, tympanic membrane, neurological, and cardiac rhythm imaging examinations. Gingivitis and early periodontal disease were associated with subjects diagnosed with optic nerve abnormalities (p <0.0001) in their retinal scans. We also report significant co-occurrences of periodontal disease in subjects reporting swollen joints (p=0.0422) and a family history of eye disease (p=0.0337). These results indicate cross-correlation of poor periodontal health with systemic health outcomes and stress the importance of oral health screenings at the primary care level. Our screening process and analysis method, using images and machine learning, can be generalized for automated diagnoses and systemic health screenings for other diseases.
Tasks
Published 2018-10-25
URL http://arxiv.org/abs/1810.10664v1
PDF http://arxiv.org/pdf/1810.10664v1.pdf
PWC https://paperswithcode.com/paper/automated-process-incorporating-machine
Repo
Framework
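
For context, a minimal sketch of how segmentation-level AUC, precision, and recall figures like those quoted in the abstract can be computed with scikit-learn. The label arrays and the 0.5 decision threshold are illustrative placeholders, not the study's data or pipeline.

```python
# Illustrative only: AUC, precision, and recall for a binary segmentation task,
# the kind of metrics reported above. The arrays and threshold are made-up
# examples, not the paper's data.
import numpy as np
from sklearn.metrics import roc_auc_score, precision_score, recall_score

# Hypothetical per-pixel ground truth and predicted inflammation probabilities.
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
y_prob = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.6, 0.3, 0.55])

auc = roc_auc_score(y_true, y_prob)           # threshold-free ranking quality
y_pred = (y_prob >= 0.5).astype(int)          # assumed 0.5 decision threshold
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
print(f"AUC={auc:.3f} precision={precision:.3f} recall={recall:.3f}")
```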

A High GOPs/Slice Time Series Classifier for Portable and Embedded Biomedical Applications

Title A High GOPs/Slice Time Series Classifier for Portable and Embedded Biomedical Applications
Authors Hamid Soleimani, Aliasghar Makhlooghpour, Wilten Nicola, Claudia Clopath, Emmanuel M. Drakakis
Abstract Nowadays a diverse range of physiological data can be captured continuously for various applications, in particular wellbeing and healthcare. Such data require efficient methods for classification and analysis. Deep learning algorithms have shown remarkable potential regarding such analyses; however, the use of these algorithms on low-power wearable devices is challenged by resource constraints such as area and power consumption. Most of the available on-chip deep learning processors contain complex and dense hardware architectures in order to achieve the highest possible throughput. Such a trend in hardware design may not be efficient in applications where on-node computation is required and the focus is more on area and power efficiency, as in the case of portable and embedded biomedical devices. This paper presents an efficient time-series classifier capable of automatically detecting effective features and classifying the input signals in real-time. In the proposed classifier, throughput is traded off against hardware complexity and cost using resource sharing techniques. A Convolutional Neural Network (CNN) is employed to extract input features, and then a Long Short-Term Memory (LSTM) architecture with ternary weight precision classifies the input signals according to the extracted features. Hardware implementation on a Xilinx FPGA confirms that the proposed hardware can accurately classify multiple complex biomedical time series data with low area and power consumption and outperforms all previously presented state-of-the-art records. Most notably, our classifier reaches 1.3$\times$ higher GOPs/Slice than similar state-of-the-art FPGA-based accelerators.
Tasks Time Series
Published 2018-02-27
URL http://arxiv.org/abs/1802.10458v2
PDF http://arxiv.org/pdf/1802.10458v2.pdf
PWC https://paperswithcode.com/paper/a-high-gopsslice-time-series-classifier-for
Repo
Framework
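
A rough PyTorch sketch of the CNN-feature-extractor feeding a ternary-weight LSTM, the combination described in the abstract. The layer sizes, the ternary threshold, and the use of a stock nn.LSTM are assumptions for illustration and do not reflect the paper's FPGA architecture or resource-sharing scheme.

```python
# Sketch of a CNN feature extractor followed by an LSTM whose weights are
# ternarized to {-1, 0, +1}. Sizes and the threshold delta are assumptions.
import torch
import torch.nn as nn

def ternarize(w: torch.Tensor, delta: float = 0.05) -> torch.Tensor:
    """Map weights to {-1, 0, +1}: zero out small weights, keep signs of the rest."""
    return torch.sign(w) * (w.abs() > delta).float()

class CnnLstmClassifier(nn.Module):
    def __init__(self, n_classes: int = 4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 8, kernel_size=7, stride=2), nn.ReLU(),
            nn.Conv1d(8, 16, kernel_size=5, stride=2), nn.ReLU(),
        )
        self.lstm = nn.LSTM(input_size=16, hidden_size=32, batch_first=True)
        self.head = nn.Linear(32, n_classes)

    def forward(self, x):                      # x: (batch, 1, time)
        f = self.features(x).transpose(1, 2)   # -> (batch, time', 16)
        _, (h, _) = self.lstm(f)
        return self.head(h[-1])

model = CnnLstmClassifier()
with torch.no_grad():                          # illustrate ternarizing the LSTM weights
    for name, p in model.lstm.named_parameters():
        if "weight" in name:
            p.copy_(ternarize(p))
logits = model(torch.randn(2, 1, 256))         # two dummy time series
```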

PPFNet: Global Context Aware Local Features for Robust 3D Point Matching

Title PPFNet: Global Context Aware Local Features for Robust 3D Point Matching
Authors Haowen Deng, Tolga Birdal, Slobodan Ilic
Abstract We present PPFNet - Point Pair Feature NETwork for deeply learning a globally informed 3D local feature descriptor to find correspondences in unorganized point clouds. PPFNet learns local descriptors on pure geometry and is highly aware of the global context, an important cue in deep learning. Our 3D representation is computed as a collection of point-pair-features combined with the points and normals within a local vicinity. Our permutation invariant network design is inspired by PointNet and sets PPFNet to be ordering-free. As opposed to voxelization, our method is able to consume raw point clouds to exploit the full sparsity. PPFNet uses a novel $\textit{N-tuple}$ loss and architecture injecting the global information naturally into the local descriptor. We show that context awareness also boosts the local feature representation. Qualitative and quantitative evaluations of our network suggest increased recall, improved robustness and invariance, as well as a vital step forward in 3D descriptor extraction performance.
Tasks
Published 2018-02-07
URL http://arxiv.org/abs/1802.02669v2
PDF http://arxiv.org/pdf/1802.02669v2.pdf
PWC https://paperswithcode.com/paper/ppfnet-global-context-aware-local-features
Repo
Framework
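
The descriptor above is built on point pair features. Here is a small NumPy sketch of the classic four-dimensional PPF for two oriented points (a distance plus three angles between the normals and the connecting vector), following the standard Drost-style definition; it is shown for illustration, not as PPFNet's network input pipeline.

```python
# Point pair feature of two oriented points (p1, n1) and (p2, n2).
import numpy as np

def angle(a, b):
    """Angle between two 3D vectors, clipped for numerical robustness."""
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    return np.arccos(np.clip(cos, -1.0, 1.0))

def point_pair_feature(p1, n1, p2, n2):
    d = p2 - p1
    return np.array([
        np.linalg.norm(d),   # ||d||: distance between the points
        angle(n1, d),        # angle between normal 1 and the connecting vector
        angle(n2, d),        # angle between normal 2 and the connecting vector
        angle(n1, n2),       # angle between the two normals
    ])

f = point_pair_feature(np.array([0., 0., 0.]), np.array([0., 0., 1.]),
                       np.array([1., 0., 0.]), np.array([0., 1., 0.]))
```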

OpenSeqSLAM2.0: An Open Source Toolbox for Visual Place Recognition Under Changing Conditions

Title OpenSeqSLAM2.0: An Open Source Toolbox for Visual Place Recognition Under Changing Conditions
Authors Ben Talbot, Sourav Garg, Michael Milford
Abstract Visually recognising a traversed route - regardless of whether seen during the day or night, in clear or inclement conditions, or in summer or winter - is an important capability for navigating robots. Since SeqSLAM was introduced in 2012, a large body of work has followed exploring how robotic systems can use the algorithm to meet the challenges posed by navigation in changing environmental conditions. The following paper describes OpenSeqSLAM2.0, a fully open source toolbox for visual place recognition under changing conditions. Beyond the benefits of open access to the source code, OpenSeqSLAM2.0 provides a number of tools to facilitate exploration of the visual place recognition problem and interactive parameter tuning. Using the new open source platform, it is shown for the first time how comprehensive parameter characterisations provide new insights into many of the system components previously presented in ad hoc ways and provide users with a guide to what system component options should be used under what circumstances and why.
Tasks Visual Place Recognition
Published 2018-04-06
URL http://arxiv.org/abs/1804.02156v2
PDF http://arxiv.org/pdf/1804.02156v2.pdf
PWC https://paperswithcode.com/paper/openseqslam20-an-open-source-toolbox-for
Repo
Framework
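
A bare-bones NumPy sketch of the core SeqSLAM matching step the toolbox builds on: given a matrix of image-to-image differences between a query and a reference traverse, score each candidate reference location by accumulating differences along a short constant-velocity sequence. The parameter names and the simple velocity sweep are assumptions, not the toolbox's exact implementation.

```python
# Sequence matching over a query-vs-reference difference matrix.
import numpy as np

def seq_match_scores(diff, seq_len=10, velocities=(0.8, 1.0, 1.25)):
    """diff[q, r]: difference between query image q and reference image r."""
    n_q, n_r = diff.shape
    scores = np.full(n_r, np.inf)
    q0 = n_q - seq_len                       # align the sequence to the most recent queries
    for r0 in range(n_r):
        for v in velocities:                 # try a few assumed traversal velocities
            qs = np.arange(seq_len)
            rs = np.round(r0 + v * qs).astype(int)
            if rs[-1] >= n_r:
                continue
            scores[r0] = min(scores[r0], diff[q0 + qs, rs].mean())
    return scores                            # lowest score = best matching place

scores = seq_match_scores(np.random.rand(50, 200))
best_place = int(np.argmin(scores))
```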

Practical Shape Analysis and Segmentation Methods for Point Cloud Models

Title Practical Shape Analysis and Segmentation Methods for Point Cloud Models
Authors Reed M. Williams, Horea T. Ilieş
Abstract Current point cloud processing algorithms do not have the capability to automatically extract semantic information from the observed scenes, except in very specialized cases. Furthermore, existing mesh analysis paradigms cannot be directly employed to automatically perform typical shape analysis tasks directly on point cloud models. We present a potent framework for shape analysis, similarity, and segmentation of noisy point cloud models for real objects of engineering interest, models that may be incomplete. The proposed framework relies on spectral methods and the heat diffusion kernel to construct compact shape signatures, and we show that the framework supports a variety of clustering techniques that have traditionally been applied only on mesh models. We developed and implemented one practical and convergent estimate of the Laplace-Beltrami operator for point clouds as well as a number of clustering techniques adapted to work directly on point clouds to produce geometric features of engineering interest. The key advantage of this framework is that it supports practical shape analysis capabilities that operate directly on point cloud models of objects without requiring surface reconstruction or global meshing. We show that the proposed technique is robust against typical noise present in possibly incomplete point clouds, and that it can segment point clouds scanned by depth cameras (e.g., Kinect) into semantically meaningful sub-shapes.
Tasks
Published 2018-10-25
URL http://arxiv.org/abs/1810.10933v1
PDF http://arxiv.org/pdf/1810.10933v1.pdf
PWC https://paperswithcode.com/paper/practical-shape-analysis-and-segmentation
Repo
Framework
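
A compact sketch of the heat-kernel-based signature such spectral frameworks typically rely on: given eigenpairs of a discrete Laplace-Beltrami estimate, the heat kernel signature at point $i$ and time $t$ is $\sum_k e^{-\lambda_k t}\,\phi_k(i)^2$. The random eigenpairs below are placeholders for a real point-cloud Laplacian estimate, so this is illustrative rather than the paper's specific operator.

```python
# Heat kernel signature from Laplace-Beltrami eigenvalues/eigenvectors.
import numpy as np

def heat_kernel_signature(eigenvalues, eigenvectors, times):
    """eigenvectors: (n_points, k); returns HKS of shape (n_points, len(times))."""
    decay = np.exp(-np.outer(eigenvalues, times))      # (k, n_times)
    return (eigenvectors ** 2) @ decay                 # (n_points, n_times)

evals = np.sort(np.abs(np.random.rand(20)))            # placeholder spectrum
evecs = np.random.randn(500, 20)                       # placeholder eigenvectors
hks = heat_kernel_signature(evals, evecs, times=np.logspace(-2, 1, 8))
```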

Self-Attention Linguistic-Acoustic Decoder

Title Self-Attention Linguistic-Acoustic Decoder
Authors Santiago Pascual, Antonio Bonafonte, Joan Serrà
Abstract The conversion from text to speech relies on the accurate mapping from linguistic to acoustic symbol sequences, for which current practice employs recurrent statistical models such as recurrent neural networks. Despite the good performance of such models (in terms of low distortion in the generated speech), their recursive structure tends to make them slow to train and to sample from. In this work, we try to overcome the limitations of recursive structure by using a module based on the transformer decoder network, designed without recurrent connections but emulating them with attention and positioning codes. Our results show that the proposed decoder network is competitive in terms of distortion when compared to a recurrent baseline, whilst being significantly faster in terms of CPU inference time. On average, it increases Mel cepstral distortion by between 0.1 and 0.3 dB, but it is over an order of magnitude faster. Fast inference is important for the deployment of speech synthesis systems on devices with restricted resources, like mobile phones or embedded systems, where speaking virtual assistants are gaining importance.
Tasks Speech Synthesis
Published 2018-08-31
URL http://arxiv.org/abs/1808.10678v2
PDF http://arxiv.org/pdf/1808.10678v2.pdf
PWC https://paperswithcode.com/paper/self-attention-linguistic-acoustic-decoder
Repo
Framework
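
Since the decoder replaces recurrence with attention plus positional codes, here is a short sketch of the standard sinusoidal positional encoding: each position gets a fixed pattern of sines and cosines at different frequencies. This is the classic Transformer formulation, shown for illustration; the paper's exact positioning scheme may differ.

```python
# Standard sinusoidal positional encoding.
import numpy as np

def positional_encoding(n_positions: int, d_model: int) -> np.ndarray:
    pos = np.arange(n_positions)[:, None]                       # (n, 1)
    i = np.arange(d_model)[None, :]                             # (1, d)
    angles = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    enc = np.zeros((n_positions, d_model))
    enc[:, 0::2] = np.sin(angles[:, 0::2])                      # even dims: sine
    enc[:, 1::2] = np.cos(angles[:, 1::2])                      # odd dims: cosine
    return enc

codes = positional_encoding(n_positions=100, d_model=64)        # added to frame embeddings
```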

Crack Detection Using Enhanced Thresholding on UAV based Collected Images

Title Crack Detection Using Enhanced Thresholding on UAV based Collected Images
Authors Q. Zhu, T. H. Dinh, V. T. Hoang, M. D. Phung, Q. P. Ha
Abstract This paper proposes a thresholding approach for crack detection in an unmanned aerial vehicle (UAV) based infrastructure inspection system. The proposed algorithm operates recursively on the intensity histogram of UAV-taken images to exploit the fact that crack pixels appear in the low-intensity interval. A quantified criterion of interclass contrast is proposed and employed as an objective cost and stop condition for the recursive process. Experiments on different datasets show that our algorithm outperforms other segmentation approaches in accurately extracting crack features from commercial buildings.
Tasks
Published 2018-12-19
URL http://arxiv.org/abs/1812.07868v1
PDF http://arxiv.org/pdf/1812.07868v1.pdf
PWC https://paperswithcode.com/paper/crack-detection-using-enhanced-thresholding
Repo
Framework
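
A simplified sketch of recursive histogram thresholding in the spirit of the abstract: split the histogram at the threshold maximizing a contrast criterion, then keep recursing into the darker interval until the contrast gain falls below a stop value. Otsu's between-class variance is used here as an assumed stand-in for the paper's interclass-contrast criterion, and the stop value is arbitrary.

```python
# Recursive thresholding toward the low-intensity (crack) interval.
import numpy as np

def otsu_threshold(hist):
    """Return (threshold, between-class variance) for a 256-bin histogram."""
    p = hist / max(hist.sum(), 1)
    bins = np.arange(256)
    best_t, best_var = 0, 0.0
    for t in range(1, 255):
        w0, w1 = p[:t].sum(), p[t:].sum()
        if w0 == 0 or w1 == 0:
            continue
        m0 = (p[:t] * bins[:t]).sum() / w0
        m1 = (p[t:] * bins[t:]).sum() / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_t, best_var = t, var
    return best_t, best_var

def recursive_dark_threshold(hist, min_contrast=50.0):
    t, var = otsu_threshold(hist)
    if var < min_contrast or t <= 1:
        return t
    low = hist.copy()
    low[t:] = 0                                  # keep only the darker interval
    deeper = recursive_dark_threshold(low, min_contrast)
    return deeper if deeper > 0 else t

hist = np.bincount(np.random.randint(0, 256, 10000), minlength=256)
crack_threshold = recursive_dark_threshold(hist)
```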

Semi-Supervised Training for Improving Data Efficiency in End-to-End Speech Synthesis

Title Semi-Supervised Training for Improving Data Efficiency in End-to-End Speech Synthesis
Authors Yu-An Chung, Yuxuan Wang, Wei-Ning Hsu, Yu Zhang, RJ Skerry-Ryan
Abstract Although end-to-end text-to-speech (TTS) models such as Tacotron have shown excellent results, they typically require a sizable set of high-quality <text, audio> pairs for training, which are expensive to collect. In this paper, we propose a semi-supervised training framework to improve the data efficiency of Tacotron. The idea is to allow Tacotron to utilize textual and acoustic knowledge contained in large, publicly-available text and speech corpora. Importantly, these external data are unpaired and potentially noisy. Specifically, first we embed each word in the input text into word vectors and condition the Tacotron encoder on them. We then use an unpaired speech corpus to pre-train the Tacotron decoder in the acoustic domain. Finally, we fine-tune the model using available paired data. We demonstrate that the proposed framework enables Tacotron to generate intelligible speech using less than half an hour of paired training data.
Tasks Speech Synthesis
Published 2018-08-30
URL http://arxiv.org/abs/1808.10128v1
PDF http://arxiv.org/pdf/1808.10128v1.pdf
PWC https://paperswithcode.com/paper/semi-supervised-training-for-improving-data
Repo
Framework
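
A loose PyTorch sketch of one way to condition a character-level encoder on external word vectors, broadly in the spirit of the first step described in the abstract. The dimensions, the per-character broadcasting of word vectors, and the GRU encoder are assumptions for illustration, not the paper's exact Tacotron modification.

```python
# Encoder whose character embeddings are concatenated with projected word vectors.
import torch
import torch.nn as nn

class ConditionedEncoder(nn.Module):
    def __init__(self, n_chars=40, char_dim=128, word_dim=300, proj_dim=64):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim)
        self.word_proj = nn.Linear(word_dim, proj_dim)   # project external word vectors
        self.rnn = nn.GRU(char_dim + proj_dim, 256, batch_first=True)

    def forward(self, char_ids, word_vecs_per_char):
        # char_ids: (batch, T); word_vecs_per_char: (batch, T, word_dim),
        # i.e. each character carries the vector of the word it belongs to.
        x = torch.cat([self.char_emb(char_ids),
                       self.word_proj(word_vecs_per_char)], dim=-1)
        states, _ = self.rnn(x)
        return states                                    # attended to by the decoder

enc = ConditionedEncoder()
states = enc(torch.randint(0, 40, (2, 30)), torch.randn(2, 30, 300))
```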

An Online Prediction Algorithm for Reinforcement Learning with Linear Function Approximation using Cross Entropy Method

Title An Online Prediction Algorithm for Reinforcement Learning with Linear Function Approximation using Cross Entropy Method
Authors Ajin George Joseph, Shalabh Bhatnagar
Abstract In this paper, we provide two new stable online algorithms for the problem of prediction in reinforcement learning, \emph{i.e.}, estimating the value function of a model-free Markov reward process using the linear function approximation architecture, with memory and computation costs scaling quadratically in the size of the feature set. The algorithms employ the multi-timescale stochastic approximation variant of the very popular cross entropy (CE) optimization method, which is a model-based search method for finding the global optimum of a real-valued function. A proof of convergence of the algorithms using the ODE method is provided. We supplement our theoretical results with experimental comparisons. The algorithms achieve good performance fairly consistently on many RL benchmark problems with regard to computational efficiency, accuracy and stability.
Tasks
Published 2018-06-15
URL http://arxiv.org/abs/1806.06720v1
PDF http://arxiv.org/pdf/1806.06720v1.pdf
PWC https://paperswithcode.com/paper/an-online-prediction-algorithm-for
Repo
Framework
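
An illustrative NumPy sketch of cross-entropy optimization applied to fitting a linear value function $v(s) \approx w^\top \phi(s)$: sample candidate weight vectors from a Gaussian, score them with an empirical TD-style error on observed transitions, and refit the Gaussian to the elite candidates. The batch objective and hyperparameters are assumptions; the paper's algorithms are online and multi-timescale, which this sketch does not capture.

```python
# Cross-entropy search over linear value-function weights.
import numpy as np

def ce_linear_prediction(phi, phi_next, rewards, gamma=0.95,
                         n_iters=30, n_samples=200, elite_frac=0.1):
    d = phi.shape[1]
    mean, std = np.zeros(d), np.ones(d)
    n_elite = max(1, int(elite_frac * n_samples))
    for _ in range(n_iters):
        candidates = mean + std * np.random.randn(n_samples, d)
        # Mean-squared TD error of each candidate weight vector.
        td = rewards + gamma * candidates @ phi_next.T - candidates @ phi.T
        losses = (td ** 2).mean(axis=1)
        elites = candidates[np.argsort(losses)[:n_elite]]
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-3
    return mean

phi = np.random.randn(500, 8)          # features of visited states
phi_next = np.random.randn(500, 8)     # features of successor states
rewards = np.random.randn(500)
w = ce_linear_prediction(phi, phi_next, rewards)
```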

Multimodal speech synthesis architecture for unsupervised speaker adaptation

Title Multimodal speech synthesis architecture for unsupervised speaker adaptation
Authors Hieu-Thi Luong, Junichi Yamagishi
Abstract This paper proposes a new architecture for speaker adaptation of multi-speaker neural-network speech synthesis systems, in which an unseen speaker’s voice can be built using a relatively small amount of speech data without transcriptions. This is sometimes called “unsupervised speaker adaptation”. More specifically, we concatenate the layers to the audio inputs when performing unsupervised speaker adaptation, while we concatenate them to the text inputs when synthesizing speech from text. Two new training schemes for the new architecture are also proposed in this paper. These training schemes are not limited to speech synthesis; other applications are also suggested. Experimental results show that the proposed model not only enables adaptation to unseen speakers using untranscribed speech but also improves the performance of multi-speaker modeling and speaker adaptation using transcribed audio files.
Tasks Speech Synthesis
Published 2018-08-20
URL http://arxiv.org/abs/1808.06288v1
PDF http://arxiv.org/pdf/1808.06288v1.pdf
PWC https://paperswithcode.com/paper/multimodal-speech-synthesis-architecture-for
Repo
Framework
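
A loose PyTorch sketch of the multimodal idea: a shared acoustic model whose conditioning can come either from a text branch (when synthesizing) or from an audio branch (when adapting to an unseen speaker from untranscribed speech). All module sizes and the feed-forward stand-ins are assumptions, not the paper's architecture.

```python
# Shared acoustic model with switchable text/audio conditioning branches.
import torch
import torch.nn as nn

class MultimodalTTS(nn.Module):
    def __init__(self, text_dim=128, audio_dim=80, latent_dim=64, out_dim=80):
        super().__init__()
        self.text_branch = nn.Linear(text_dim, latent_dim)    # used when synthesizing from text
        self.audio_branch = nn.Linear(audio_dim, latent_dim)  # used for untranscribed adaptation
        self.shared = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                    nn.Linear(256, out_dim))

    def forward(self, features, mode="text"):
        branch = self.text_branch if mode == "text" else self.audio_branch
        return self.shared(branch(features))

model = MultimodalTTS()
mel_from_text = model(torch.randn(4, 100, 128), mode="text")
mel_from_audio = model(torch.randn(4, 100, 80), mode="audio")   # adaptation path
```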

Fast Spectrogram Inversion using Multi-head Convolutional Neural Networks

Title Fast Spectrogram Inversion using Multi-head Convolutional Neural Networks
Authors Sercan O. Arik, Heewoo Jun, Gregory Diamos
Abstract We propose the multi-head convolutional neural network (MCNN) architecture for waveform synthesis from spectrograms. Nonlinear interpolation in MCNN is employed with transposed convolution layers in parallel heads. MCNN achieves more than an order of magnitude higher compute intensity than commonly-used iterative algorithms like Griffin-Lim, yielding efficient utilization for modern multi-core processors, and very fast (more than 300x real-time) waveform synthesis. For training of MCNN, we use a large-scale speech recognition dataset and losses defined on waveforms that are related to perceptual audio quality. We demonstrate that MCNN constitutes a very promising approach for high-quality speech synthesis, without any iterative algorithms or autoregression in computations.
Tasks Speech Recognition, Speech Synthesis
Published 2018-08-20
URL http://arxiv.org/abs/1808.06719v2
PDF http://arxiv.org/pdf/1808.06719v2.pdf
PWC https://paperswithcode.com/paper/fast-spectrogram-inversion-using-multi-head
Repo
Framework
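
A minimal PyTorch sketch of a multi-head transposed-convolution generator in the spirit of MCNN: parallel stacks of ConvTranspose1d upsample a spectrogram to waveform length and their outputs are summed. Channel counts, strides, and the two-head setup are illustrative guesses, not the published configuration.

```python
# Parallel transposed-convolution heads mapping a spectrogram to a waveform.
import torch
import torch.nn as nn

def make_head(n_mels=80, up_factors=(4, 4, 4, 4)):
    layers, ch = [], n_mels
    for f in up_factors:
        layers += [nn.ConvTranspose1d(ch, max(ch // 2, 1), kernel_size=2 * f,
                                      stride=f, padding=f // 2), nn.ELU()]
        ch = max(ch // 2, 1)
    layers += [nn.Conv1d(ch, 1, kernel_size=1)]            # collapse to one waveform channel
    return nn.Sequential(*layers)

class MultiHeadSpectrogramInverter(nn.Module):
    def __init__(self, n_heads=2):
        super().__init__()
        self.heads = nn.ModuleList([make_head() for _ in range(n_heads)])

    def forward(self, spec):                                 # spec: (batch, n_mels, frames)
        return torch.stack([h(spec) for h in self.heads]).sum(dim=0)

wave = MultiHeadSpectrogramInverter()(torch.randn(1, 80, 100))  # -> (1, 1, 100 * 256)
```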

Geometry-aware Deep Network for Single-Image Novel View Synthesis

Title Geometry-aware Deep Network for Single-Image Novel View Synthesis
Authors Miaomiao Liu, Xuming He, Mathieu Salzmann
Abstract This paper tackles the problem of novel view synthesis from a single image. In particular, we target real-world scenes with rich geometric structure, a challenging task due to the large appearance variations of such scenes and the lack of simple 3D models to represent them. Modern, learning-based approaches mostly focus on appearance to synthesize novel views and thus tend to generate predictions that are inconsistent with the underlying scene structure. By contrast, in this paper, we propose to exploit the 3D geometry of the scene to synthesize a novel view. Specifically, we approximate a real-world scene by a fixed number of planes, and learn to predict a set of homographies and their corresponding region masks to transform the input image into a novel view. To this end, we develop a new region-aware geometric transform network that performs these multiple tasks in a common framework. Our results on the outdoor KITTI and the indoor ScanNet datasets demonstrate the effectiveness of our network in generating high quality synthetic views that respect the scene geometry, thus outperforming the state-of-the-art methods.
Tasks Novel View Synthesis
Published 2018-04-17
URL http://arxiv.org/abs/1804.06008v1
PDF http://arxiv.org/pdf/1804.06008v1.pdf
PWC https://paperswithcode.com/paper/geometry-aware-deep-network-for-single-image
Repo
Framework
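
An OpenCV/NumPy sketch of the compositing step the abstract describes: warp the input image with one homography per scene plane and blend the warps with soft region masks. The homographies and masks below are placeholders; in the paper both are predicted by the network.

```python
# Compose a novel view from per-plane homography warps and soft masks.
import numpy as np
import cv2

def compose_novel_view(image, homographies, masks):
    """image: (H, W, 3); homographies: list of 3x3; masks: (n_planes, H, W) soft weights."""
    h, w = image.shape[:2]
    warped = np.stack([cv2.warpPerspective(image, H, (w, h)) for H in homographies])
    weights = masks / (masks.sum(axis=0, keepdims=True) + 1e-8)   # normalize per pixel
    return (warped * weights[..., None]).sum(axis=0).astype(image.dtype)

img = np.zeros((120, 160, 3), dtype=np.uint8)
Hs = [np.eye(3), np.array([[1, 0, 5], [0, 1, 0], [0, 0, 1]], dtype=float)]
masks = np.ones((2, 120, 160)) * 0.5
novel = compose_novel_view(img, Hs, masks)
```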

Accelerating Convolutional Neural Networks via Activation Map Compression

Title Accelerating Convolutional Neural Networks via Activation Map Compression
Authors Georgios Georgiadis
Abstract The deep learning revolution brought us an extensive array of neural network architectures that achieve state-of-the-art performance in a wide variety of Computer Vision tasks including, among others, classification, detection and segmentation. In parallel, we have also been observing an unprecedented demand in computational and memory requirements, rendering the efficient use of neural networks in low-powered devices virtually unattainable. Towards this end, we propose a three-stage compression and acceleration pipeline that sparsifies, quantizes and entropy encodes activation maps of Convolutional Neural Networks. Sparsification increases the representational power of activation maps leading to both acceleration of inference and higher model accuracy. Inception-V3 and MobileNet-V1 can be accelerated by as much as $1.6\times$ with an increase in accuracy of 0.38% and 0.54% on the ImageNet and CIFAR-10 datasets respectively. Quantizing and entropy coding the sparser activation maps lead to higher compression over the baseline, reducing the memory cost of the network execution. Inception-V3 and MobileNet-V1 activation maps, quantized to 16 bits, are compressed by as much as $6\times$ with an increase in accuracy of 0.36% and 0.55% respectively.
Tasks
Published 2018-12-10
URL http://arxiv.org/abs/1812.04056v2
PDF http://arxiv.org/pdf/1812.04056v2.pdf
PWC https://paperswithcode.com/paper/accelerating-convolutional-neural-networks
Repo
Framework
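
A small NumPy sketch of the measurement side of such a pipeline: ReLU activation maps are already sparse, and quantizing then entropy-coding them bounds the stored size by the empirical entropy. The uniform 16-bit quantizer and the entropy estimate are illustrative, not the paper's exact coder.

```python
# Quantize an activation map and estimate its sparsity and entropy-coded size.
import numpy as np

def quantize(x, n_bits=16):
    """Uniform quantization of non-negative activations to 2**n_bits - 1 levels."""
    levels = 2 ** n_bits - 1
    scale = x.max() / levels if x.max() > 0 else 1.0
    return np.round(x / scale).astype(np.int64), scale

def entropy_bits_per_value(q):
    """Empirical entropy (bits/value): a lower bound on entropy-coded size."""
    _, counts = np.unique(q, return_counts=True)
    p = counts / q.size
    return float(-(p * np.log2(p)).sum())

acts = np.maximum(np.random.randn(64, 56, 56), 0)        # ReLU-style activation map
q, scale = quantize(acts)
sparsity = float((q == 0).mean())
bits = entropy_bits_per_value(q)
print(f"sparsity={sparsity:.2f}, ~{bits:.2f} bits/value vs 16-bit raw")
```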

BOP: Benchmark for 6D Object Pose Estimation

Title BOP: Benchmark for 6D Object Pose Estimation
Authors Tomas Hodan, Frank Michel, Eric Brachmann, Wadim Kehl, Anders Glent Buch, Dirk Kraft, Bertram Drost, Joel Vidal, Stephan Ihrke, Xenophon Zabulis, Caner Sahin, Fabian Manhardt, Federico Tombari, Tae-Kyun Kim, Jiri Matas, Carsten Rother
Abstract We propose a benchmark for 6D pose estimation of a rigid object from a single RGB-D input image. The training data consists of a texture-mapped 3D object model or images of the object in known 6D poses. The benchmark comprises: i) eight datasets in a unified format that cover different practical scenarios, including two new datasets focusing on varying lighting conditions, ii) an evaluation methodology with a pose-error function that deals with pose ambiguities, iii) a comprehensive evaluation of 15 diverse recent methods that captures the status quo of the field, and iv) an online evaluation system that is open for continuous submission of new results. The evaluation shows that methods based on point-pair features currently perform best, outperforming template matching methods, learning-based methods and methods based on 3D local features. The project website is available at bop.felk.cvut.cz.
Tasks 6D Pose Estimation, 6D Pose Estimation using RGBD, Pose Estimation
Published 2018-08-24
URL http://arxiv.org/abs/1808.08319v1
PDF http://arxiv.org/pdf/1808.08319v1.pdf
PWC https://paperswithcode.com/paper/bop-benchmark-for-6d-object-pose-estimation
Repo
Framework
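
For context, a sketch of the widely used ADD pose error: the mean distance between model points transformed by the estimated and ground-truth 6D poses. This is a generic metric shown for illustration; the benchmark itself evaluates with a visible-surface-discrepancy error designed to handle pose ambiguities.

```python
# ADD pose error between an estimated and a ground-truth rigid pose.
import numpy as np

def add_error(model_points, R_est, t_est, R_gt, t_gt):
    """model_points: (N, 3); R: 3x3 rotations; t: 3-vectors. Returns mean point distance."""
    est = model_points @ R_est.T + t_est
    gt = model_points @ R_gt.T + t_gt
    return float(np.linalg.norm(est - gt, axis=1).mean())

pts = np.random.rand(1000, 3) * 0.1                       # ~10 cm synthetic object
err = add_error(pts, np.eye(3), np.array([0., 0., 0.005]),
                np.eye(3), np.zeros(3))                   # 5 mm translation error
```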

Unsupervised Representation Learning with Laplacian Pyramid Auto-encoders

Title Unsupervised Representation Learning with Laplacian Pyramid Auto-encoders
Authors Qilu Zhao, Zongmin Li
Abstract Scale-space representation has been popular in the computer vision community due to its theoretical foundation. The motivation for generating a scale-space representation of a given data set originates from the basic observation that real-world objects are composed of different structures at different scales. Hence, it’s reasonable to consider learning features with image pyramids generated by smoothing and down-sampling operations. In this paper we propose Laplacian pyramid auto-encoders, a straightforward modification of the deep convolutional auto-encoder architecture, for unsupervised representation learning. The method uses multiple encoding-decoding sub-networks within a Laplacian pyramid framework to reconstruct the original image and the low-pass filtered images. The last layer of each encoding sub-network also connects to an encoding layer of the sub-network in the next level, which aims to reverse the process of Laplacian pyramid generation. Experimental results showed that the Laplacian pyramid benefited the classification and reconstruction performance of deep auto-encoder approaches, and that batch normalization is critical for getting deep auto-encoder approaches to begin learning.
Tasks Representation Learning, Unsupervised Representation Learning
Published 2018-01-16
URL http://arxiv.org/abs/1801.05278v2
PDF http://arxiv.org/pdf/1801.05278v2.pdf
PWC https://paperswithcode.com/paper/unsupervised-representation-learning-with
Repo
Framework
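
A compact OpenCV sketch of the Laplacian pyramid the auto-encoder is built around: each level stores the difference between an image and the upsampled version of its blurred, downsampled copy, so the original can be reconstructed exactly from the pyramid. The three-level depth is arbitrary and chosen only for illustration.

```python
# Build a Laplacian pyramid and verify exact reconstruction.
import numpy as np
import cv2

def laplacian_pyramid(img, levels=3):
    pyramid, current = [], img.astype(np.float32)
    for _ in range(levels):
        down = cv2.pyrDown(current)
        up = cv2.pyrUp(down, dstsize=(current.shape[1], current.shape[0]))
        pyramid.append(current - up)      # band-pass residual at this scale
        current = down
    pyramid.append(current)               # low-pass residual at the coarsest scale
    return pyramid

img = np.random.rand(64, 64).astype(np.float32)
pyr = laplacian_pyramid(img)

# Reconstruction: walk back up the pyramid, adding the stored residuals.
recon = pyr[-1]
for lap in reversed(pyr[:-1]):
    recon = cv2.pyrUp(recon, dstsize=(lap.shape[1], lap.shape[0])) + lap
assert np.allclose(recon, img, atol=1e-4)
```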