Paper Group ANR 435
Automated Process Incorporating Machine Learning Segmentation and Correlation of Oral Diseases with Systemic Health. A High GOPs/Slice Time Series Classifier for Portable and Embedded Biomedical Applications. PPFNet: Global Context Aware Local Features for Robust 3D Point Matching. OpenSeqSLAM2.0: An Open Source Toolbox for Visual Place Recognition …
Automated Process Incorporating Machine Learning Segmentation and Correlation of Oral Diseases with Systemic Health
Title | Automated Process Incorporating Machine Learning Segmentation and Correlation of Oral Diseases with Systemic Health |
Authors | Gregory Yauney, Aman Rana, Lawrence C. Wong, Perikumar Javia, Ali Muftu, Pratik Shah |
Abstract | Imaging fluorescent disease biomarkers in tissues and skin is a non-invasive method to screen for health conditions. We report an automated process that combines intraoral fluorescent porphyrin biomarker imaging, clinical examinations and machine learning for correlation of systemic health conditions with periodontal disease. 1215 intraoral fluorescent images, from 284 consenting adults aged 18-90, were analyzed using a machine learning classifier that can segment periodontal inflammation. The classifier achieved an AUC of 0.677 with precision and recall of 0.271 and 0.429, respectively, indicating a learned association between disease signatures in collected images. Periodontal diseases were more prevalent among males (p=0.0012) and older subjects (p=0.0224) in the screened population. Physicians independently examined the collected images, assigning localized modified gingival indices (MGIs). MGIs and periodontal disease were then cross-correlated with responses to a medical history questionnaire, blood pressure and body mass index measurements, and optic nerve, tympanic membrane, neurological, and cardiac rhythm imaging examinations. Gingivitis and early periodontal disease were associated with subjects diagnosed with optic nerve abnormalities (p <0.0001) in their retinal scans. We also report significant co-occurrences of periodontal disease in subjects reporting swollen joints (p=0.0422) and a family history of eye disease (p=0.0337). These results indicate cross-correlation of poor periodontal health with systemic health outcomes and stress the importance of oral health screenings at the primary care level. Our screening process and analysis method, using images and machine learning, can be generalized for automated diagnoses and systemic health screenings for other diseases. |
Tasks | |
Published | 2018-10-25 |
URL | http://arxiv.org/abs/1810.10664v1 |
http://arxiv.org/pdf/1810.10664v1.pdf | |
PWC | https://paperswithcode.com/paper/automated-process-incorporating-machine |
Repo | |
Framework | |
A High GOPs/Slice Time Series Classifier for Portable and Embedded Biomedical Applications
Title | A High GOPs/Slice Time Series Classifier for Portable and Embedded Biomedical Applications |
Authors | Hamid Soleimani, Aliasghar Makhlooghpour, Wilten Nicola, Claudia Clopath, Emmanuel M. Drakakis |
Abstract | Nowadays a diverse range of physiological data can be captured continuously for various applications in particular wellbeing and healthcare. Such data require efficient methods for classification and analysis. Deep learning algorithms have shown remarkable potential regarding such analyses, however, the use of these algorithms on low-power wearable devices is challenged by resource constraints such as area and power consumption. Most of the available on-chip deep learning processors contain complex and dense hardware architectures in order to achieve the highest possible throughput. Such a trend in hardware design may not be efficient in applications where on-node computation is required and the focus is more on the area and power efficiency as in the case of portable and embedded biomedical devices. This paper presents an efficient time-series classifier capable of automatically detecting effective features and classifying the input signals in real-time. In the proposed classifier, throughput is traded off with hardware complexity and cost using resource sharing techniques. A Convolutional Neural Network (CNN) is employed to extract input features and then a Long Short-Term Memory (LSTM) architecture with ternary weight precision classifies the input signals according to the extracted features. Hardware implementation on a Xilinx FPGA confirms that the proposed hardware can accurately classify multiple complex biomedical time series data with low area and power consumption and outperforms all previously presented state-of-the-art records. Most notably, our classifier reaches 1.3$\times$ higher GOPs/Slice than similar state-of-the-art FPGA-based accelerators. |
Tasks | Time Series |
Published | 2018-02-27 |
URL | http://arxiv.org/abs/1802.10458v2 |
http://arxiv.org/pdf/1802.10458v2.pdf | |
PWC | https://paperswithcode.com/paper/a-high-gopsslice-time-series-classifier-for |
Repo | |
Framework | |
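The ternary weight precision mentioned in the abstract above constrains each LSTM weight to {-1, 0, +1}, which reduces multiplications to sign flips in the FPGA datapath. A minimal sketch of such a quantizer (the threshold value is an illustrative assumption, not taken from the paper):

```python
import numpy as np

def ternarize(weights, threshold=0.05):
    """Quantize real-valued weights to {-1, 0, +1}.

    Weights with magnitude below `threshold` become 0; the rest keep
    only their sign, so a multiply becomes a sign flip (or a skip).
    """
    w = np.asarray(weights, dtype=float)
    q = np.sign(w)
    q[np.abs(w) < threshold] = 0.0
    return q

w = np.array([0.8, -0.02, -0.6, 0.03])
print(ternarize(w))  # [ 1.  0. -1.  0.]
```

In hardware, the zeroed weights also let the accelerator skip operations entirely, which is where much of the area and power saving comes from.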
PPFNet: Global Context Aware Local Features for Robust 3D Point Matching
Title | PPFNet: Global Context Aware Local Features for Robust 3D Point Matching |
Authors | Haowen Deng, Tolga Birdal, Slobodan Ilic |
Abstract | We present PPFNet - Point Pair Feature NETwork for deeply learning a globally informed 3D local feature descriptor to find correspondences in unorganized point clouds. PPFNet learns local descriptors on pure geometry and is highly aware of the global context, an important cue in deep learning. Our 3D representation is computed as a collection of point-pair-features combined with the points and normals within a local vicinity. Our permutation invariant network design is inspired by PointNet and sets PPFNet to be ordering-free. As opposed to voxelization, our method is able to consume raw point clouds to exploit the full sparsity. PPFNet uses a novel $\textit{N-tuple}$ loss and architecture injecting the global information naturally into the local descriptor. This shows that context awareness also boosts the local feature representation. Qualitative and quantitative evaluations of our network suggest increased recall, improved robustness and invariance, as well as a vital step forward in 3D descriptor extraction performance. |
Tasks | |
Published | 2018-02-07 |
URL | http://arxiv.org/abs/1802.02669v2 |
http://arxiv.org/pdf/1802.02669v2.pdf | |
PWC | https://paperswithcode.com/paper/ppfnet-global-context-aware-local-features |
Repo | |
Framework | |
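The point pair features that give PPFNet its name encode, for two oriented points, the pair's distance and the angles among the normals and the difference vector. A small sketch of one such 4D feature (a common PPF formulation; the paper's exact parameterization may differ):

```python
import numpy as np

def angle(u, v):
    """Angle in radians between vectors u and v."""
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.arccos(np.clip(cos, -1.0, 1.0))

def point_pair_feature(p1, n1, p2, n2):
    """4D feature (||d||, angle(n1,d), angle(n2,d), angle(n1,n2))
    for points p1, p2 with normals n1, n2, where d = p2 - p1."""
    d = p2 - p1
    return np.array([np.linalg.norm(d),
                     angle(n1, d),
                     angle(n2, d),
                     angle(n1, n2)])

p1, n1 = np.zeros(3), np.array([0.0, 0.0, 1.0])
p2, n2 = np.array([1.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0])
print(point_pair_feature(p1, n1, p2, n2))  # [1.  1.5708  1.5708  0.]
```

Because the feature is built only from distances and angles, it is invariant to rigid motion, which is what makes it a useful raw input for learning a pose-robust descriptor.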
OpenSeqSLAM2.0: An Open Source Toolbox for Visual Place Recognition Under Changing Conditions
Title | OpenSeqSLAM2.0: An Open Source Toolbox for Visual Place Recognition Under Changing Conditions |
Authors | Ben Talbot, Sourav Garg, Michael Milford |
Abstract | Visually recognising a traversed route - regardless of whether seen during the day or night, in clear or inclement conditions, or in summer or winter - is an important capability for navigating robots. Since SeqSLAM was introduced in 2012, a large body of work has followed exploring how robotic systems can use the algorithm to meet the challenges posed by navigation in changing environmental conditions. The following paper describes OpenSeqSLAM2.0, a fully open source toolbox for visual place recognition under changing conditions. Beyond the benefits of open access to the source code, OpenSeqSLAM2.0 provides a number of tools to facilitate exploration of the visual place recognition problem and interactive parameter tuning. Using the new open source platform, it is shown for the first time how comprehensive parameter characterisations provide new insights into many of the system components previously presented in ad hoc ways and provide users with a guide to what system component options should be used under what circumstances and why. |
Tasks | Visual Place Recognition |
Published | 2018-04-06 |
URL | http://arxiv.org/abs/1804.02156v2 |
http://arxiv.org/pdf/1804.02156v2.pdf | |
PWC | https://paperswithcode.com/paper/openseqslam20-an-open-source-toolbox-for |
Repo | |
Framework | |
Practical Shape Analysis and Segmentation Methods for Point Cloud Models
Title | Practical Shape Analysis and Segmentation Methods for Point Cloud Models |
Authors | Reed M. Williams, Horea T. Ilieş |
Abstract | Current point cloud processing algorithms do not have the capability to automatically extract semantic information from the observed scenes, except in very specialized cases. Furthermore, existing mesh analysis paradigms cannot be directly employed to automatically perform typical shape analysis tasks directly on point cloud models. We present a potent framework for shape analysis, similarity, and segmentation of noisy point cloud models for real objects of engineering interest, models that may be incomplete. The proposed framework relies on spectral methods and the heat diffusion kernel to construct compact shape signatures, and we show that the framework supports a variety of clustering techniques that have traditionally been applied only on mesh models. We developed and implemented one practical and convergent estimate of the Laplace-Beltrami operator for point clouds as well as a number of clustering techniques adapted to work directly on point clouds to produce geometric features of engineering interest. The key advantage of this framework is that it supports practical shape analysis capabilities that operate directly on point cloud models of objects without requiring surface reconstruction or global meshing. We show that the proposed technique is robust against typical noise present in possibly incomplete point clouds, and segment point clouds scanned by depth cameras (e.g. Kinect) into semantically-meaningful sub-shapes. |
Tasks | |
Published | 2018-10-25 |
URL | http://arxiv.org/abs/1810.10933v1 |
http://arxiv.org/pdf/1810.10933v1.pdf | |
PWC | https://paperswithcode.com/paper/practical-shape-analysis-and-segmentation |
Repo | |
Framework | |
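The heat-diffusion-kernel signatures described above can be computed from the eigendecomposition of the estimated Laplace-Beltrami operator. A sketch of the heat kernel signature given precomputed eigenpairs (the operator estimate for point clouds is the paper's contribution and is not reproduced here):

```python
import numpy as np

def heat_kernel_signature(eigvals, eigvecs, times):
    """HKS(x, t) = sum_i exp(-lambda_i * t) * phi_i(x)^2.

    eigvals: (k,) Laplace-Beltrami eigenvalues
    eigvecs: (n, k) eigenfunctions sampled at the n points
    times:   (m,) diffusion times
    Returns an (n, m) array of per-point signatures.
    """
    return (eigvecs ** 2) @ np.exp(-np.outer(eigvals, times))
```

Small diffusion times capture fine local geometry and large times capture coarse global shape, so rows of the result serve as multi-scale per-point descriptors that can be fed to the clustering techniques the paper adapts.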
Self-Attention Linguistic-Acoustic Decoder
Title | Self-Attention Linguistic-Acoustic Decoder |
Authors | Santiago Pascual, Antonio Bonafonte, Joan Serrà |
Abstract | The conversion from text to speech relies on the accurate mapping from linguistic to acoustic symbol sequences, for which current practice employs recurrent statistical models like recurrent neural networks. Despite the good performance of such models (in terms of low distortion in the generated speech), their recursive structure tends to make them slow to train and to sample from. In this work, we try to overcome the limitations of recursive structure by using a module based on the transformer decoder network, designed without recurrent connections but emulating them with attention and positioning codes. Our results show that the proposed decoder network is competitive in terms of distortion when compared to a recurrent baseline, whilst being significantly faster in terms of CPU inference time. On average, it increases Mel cepstral distortion by between 0.1 and 0.3 dB, but it is over an order of magnitude faster on average. Fast inference is important for the deployment of speech synthesis systems on devices with restricted resources, like mobile phones or embedded systems, where speaking virtual assistants are gaining importance. |
Tasks | Speech Synthesis |
Published | 2018-08-31 |
URL | http://arxiv.org/abs/1808.10678v2 |
http://arxiv.org/pdf/1808.10678v2.pdf | |
PWC | https://paperswithcode.com/paper/self-attention-linguistic-acoustic-decoder |
Repo | |
Framework | |
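The positioning codes that replace recurrence in the decoder are typically the sinusoidal encodings of the original transformer; a sketch assuming that standard formulation (which may differ in detail from the paper's):

```python
import numpy as np

def positional_encoding(length, d_model):
    """Sinusoidal positioning codes ('Attention Is All You Need'):
    PE[pos, 2i]   = sin(pos / 10000**(2i / d_model))
    PE[pos, 2i+1] = cos(pos / 10000**(2i / d_model))
    """
    pos = np.arange(length)[:, None]          # (length, 1)
    i = np.arange(d_model)[None, :]           # (1, d_model)
    angles = pos / np.power(10000, (2 * (i // 2)) / d_model)
    # even columns get sin, odd columns get cos
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))
```

Because each position gets a distinct, deterministic code, the attention layers can recover ordering information without any recurrent connections, which is what permits the parallel (and hence faster) inference the abstract reports.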
Crack Detection Using Enhanced Thresholding on UAV based Collected Images
Title | Crack Detection Using Enhanced Thresholding on UAV based Collected Images |
Authors | Q. Zhu, T. H. Dinh, V. T. Hoang, M. D. Phung, Q. P. Ha |
Abstract | This paper proposes a thresholding approach for crack detection in an unmanned aerial vehicle (UAV) based infrastructure inspection system. The proposed algorithm performs recursively on the intensity histogram of UAV-taken images to exploit crack pixels, which appear in the low-intensity interval. A quantified criterion of interclass contrast is proposed and employed as an objective cost and stopping condition for the recursive process. Experiments on different datasets show that our algorithm outperforms different segmentation approaches in accurately extracting crack features of some commercial buildings. |
Tasks | |
Published | 2018-12-19 |
URL | http://arxiv.org/abs/1812.07868v1 |
http://arxiv.org/pdf/1812.07868v1.pdf | |
PWC | https://paperswithcode.com/paper/crack-detection-using-enhanced-thresholding |
Repo | |
Framework | |
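The abstract does not define the interclass contrast criterion precisely; a closely related quantity is Otsu's between-class variance over the intensity histogram, which peaks at a threshold separating the dark crack pixels from the brighter background. A sketch under that assumption:

```python
import numpy as np

def between_class_variance(hist, t):
    """Between-class variance of splitting histogram `hist` at bin t
    (Otsu's criterion; a stand-in for the paper's interclass contrast)."""
    p = np.asarray(hist, dtype=float)
    p = p / p.sum()
    w0, w1 = p[:t].sum(), p[t:].sum()
    if w0 == 0 or w1 == 0:
        return 0.0
    bins = np.arange(len(p))
    mu0 = (bins[:t] * p[:t]).sum() / w0   # mean of the dark class
    mu1 = (bins[t:] * p[t:]).sum() / w1   # mean of the bright class
    return w0 * w1 * (mu0 - mu1) ** 2

hist = [8, 2, 0, 2, 8]  # two well-separated intensity modes
best_t = max(range(1, len(hist)), key=lambda t: between_class_variance(hist, t))
```

A recursive variant, as described above, would re-apply such a criterion to the low-intensity sub-histogram until the contrast-based stopping condition is met.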
Semi-Supervised Training for Improving Data Efficiency in End-to-End Speech Synthesis
Title | Semi-Supervised Training for Improving Data Efficiency in End-to-End Speech Synthesis |
Authors | Yu-An Chung, Yuxuan Wang, Wei-Ning Hsu, Yu Zhang, RJ Skerry-Ryan |
Abstract | Although end-to-end text-to-speech (TTS) models such as Tacotron have shown excellent results, they typically require a sizable set of high-quality <text, audio> pairs for training, which are expensive to collect. In this paper, we propose a semi-supervised training framework to improve the data efficiency of Tacotron. The idea is to allow Tacotron to utilize textual and acoustic knowledge contained in large, publicly-available text and speech corpora. Importantly, these external data are unpaired and potentially noisy. Specifically, first we embed each word in the input text into word vectors and condition the Tacotron encoder on them. We then use an unpaired speech corpus to pre-train the Tacotron decoder in the acoustic domain. Finally, we fine-tune the model using available paired data. We demonstrate that the proposed framework enables Tacotron to generate intelligible speech using less than half an hour of paired training data. |
Tasks | Speech Synthesis |
Published | 2018-08-30 |
URL | http://arxiv.org/abs/1808.10128v1 |
http://arxiv.org/pdf/1808.10128v1.pdf | |
PWC | https://paperswithcode.com/paper/semi-supervised-training-for-improving-data |
Repo | |
Framework | |
An Online Prediction Algorithm for Reinforcement Learning with Linear Function Approximation using Cross Entropy Method
Title | An Online Prediction Algorithm for Reinforcement Learning with Linear Function Approximation using Cross Entropy Method |
Authors | Ajin George Joseph, Shalabh Bhatnagar |
Abstract | In this paper, we provide two new stable online algorithms for the problem of prediction in reinforcement learning, i.e., estimating the value function of a model-free Markov reward process using the linear function approximation architecture and with memory and computation costs scaling quadratically in the size of the feature set. The algorithms employ the multi-timescale stochastic approximation variant of the very popular cross entropy (CE) optimization method, a model-based search method for finding the global optimum of a real-valued function. A proof of convergence of the algorithms using the ODE method is provided. We supplement our theoretical results with experimental comparisons. The algorithms achieve good performance fairly consistently on many RL benchmark problems with regard to computational efficiency, accuracy and stability. |
Tasks | |
Published | 2018-06-15 |
URL | http://arxiv.org/abs/1806.06720v1 |
http://arxiv.org/pdf/1806.06720v1.pdf | |
PWC | https://paperswithcode.com/paper/an-online-prediction-algorithm-for |
Repo | |
Framework | |
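The cross entropy method at the core of these algorithms maintains a parametric sampling distribution and repeatedly refits it to the elite samples. A generic CEM sketch for maximizing a real-valued function (the paper's multi-timescale stochastic approximation variant is more involved; the hyperparameters here are illustrative):

```python
import numpy as np

def cross_entropy_maximize(f, dim, iters=50, samples=100, elite_frac=0.2, seed=0):
    """Cross-entropy method: sample from a Gaussian, keep the elite
    (highest-scoring) samples, refit the Gaussian to them, repeat."""
    rng = np.random.default_rng(seed)
    mu, sigma = np.zeros(dim), np.ones(dim)
    n_elite = int(samples * elite_frac)
    for _ in range(iters):
        x = rng.normal(mu, sigma, size=(samples, dim))
        elite = x[np.argsort([f(xi) for xi in x])[-n_elite:]]
        # small noise floor keeps the search from collapsing prematurely
        mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 0.05
    return mu

# Maximize -(x - 3)^2; the global optimum is x = 3.
opt = cross_entropy_maximize(lambda x: -(x[0] - 3.0) ** 2, dim=1)
```

In the prediction setting of the paper, the objective being searched is a function of the value-function parameters, and the batch sampling above is replaced by incremental, online stochastic-approximation updates of the distribution parameters.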
Multimodal speech synthesis architecture for unsupervised speaker adaptation
Title | Multimodal speech synthesis architecture for unsupervised speaker adaptation |
Authors | Hieu-Thi Luong, Junichi Yamagishi |
Abstract | This paper proposes a new architecture for speaker adaptation of multi-speaker neural-network speech synthesis systems, in which an unseen speaker’s voice can be built using a relatively small amount of speech data without transcriptions. This is sometimes called “unsupervised speaker adaptation”. More specifically, we concatenate the layers to the audio inputs when performing unsupervised speaker adaptation while we concatenate them to the text inputs when synthesizing speech from text. Two new training schemes for the new architecture are also proposed in this paper. These training schemes are not limited to speech synthesis; other applications are also suggested. Experimental results show that the proposed model not only enables adaptation to unseen speakers using untranscribed speech but it also improves the performance of multi-speaker modeling and speaker adaptation using transcribed audio files. |
Tasks | Speech Synthesis |
Published | 2018-08-20 |
URL | http://arxiv.org/abs/1808.06288v1 |
http://arxiv.org/pdf/1808.06288v1.pdf | |
PWC | https://paperswithcode.com/paper/multimodal-speech-synthesis-architecture-for |
Repo | |
Framework | |
Fast Spectrogram Inversion using Multi-head Convolutional Neural Networks
Title | Fast Spectrogram Inversion using Multi-head Convolutional Neural Networks |
Authors | Sercan O. Arik, Heewoo Jun, Gregory Diamos |
Abstract | We propose the multi-head convolutional neural network (MCNN) architecture for waveform synthesis from spectrograms. Nonlinear interpolation in MCNN is employed with transposed convolution layers in parallel heads. MCNN achieves more than an order of magnitude higher compute intensity than commonly-used iterative algorithms like Griffin-Lim, yielding efficient utilization for modern multi-core processors, and very fast (more than 300x real-time) waveform synthesis. For training of MCNN, we use a large-scale speech recognition dataset and losses defined on waveforms that are related to perceptual audio quality. We demonstrate that MCNN constitutes a very promising approach for high-quality speech synthesis, without any iterative algorithms or autoregression in computations. |
Tasks | Speech Recognition, Speech Synthesis |
Published | 2018-08-20 |
URL | http://arxiv.org/abs/1808.06719v2 |
http://arxiv.org/pdf/1808.06719v2.pdf | |
PWC | https://paperswithcode.com/paper/fast-spectrogram-inversion-using-multi-head |
Repo | |
Framework | |
Geometry-aware Deep Network for Single-Image Novel View Synthesis
Title | Geometry-aware Deep Network for Single-Image Novel View Synthesis |
Authors | Miaomiao Liu, Xuming He, Mathieu Salzmann |
Abstract | This paper tackles the problem of novel view synthesis from a single image. In particular, we target real-world scenes with rich geometric structure, a challenging task due to the large appearance variations of such scenes and the lack of simple 3D models to represent them. Modern, learning-based approaches mostly focus on appearance to synthesize novel views and thus tend to generate predictions that are inconsistent with the underlying scene structure. By contrast, in this paper, we propose to exploit the 3D geometry of the scene to synthesize a novel view. Specifically, we approximate a real-world scene by a fixed number of planes, and learn to predict a set of homographies and their corresponding region masks to transform the input image into a novel view. To this end, we develop a new region-aware geometric transform network that performs these multiple tasks in a common framework. Our results on the outdoor KITTI and the indoor ScanNet datasets demonstrate the effectiveness of our network in generating high quality synthetic views that respect the scene geometry, thus outperforming the state-of-the-art methods. |
Tasks | Novel View Synthesis |
Published | 2018-04-17 |
URL | http://arxiv.org/abs/1804.06008v1 |
http://arxiv.org/pdf/1804.06008v1.pdf | |
PWC | https://paperswithcode.com/paper/geometry-aware-deep-network-for-single-image |
Repo | |
Framework | |
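Each predicted plane in the scene is transformed by its own homography; applying a 3x3 homography to pixel coordinates is a standard projective warp, independent of the paper's network:

```python
import numpy as np

def warp_points(H, pts):
    """Apply a 3x3 homography H to Nx2 pixel coordinates.
    This is the per-plane transform the network predicts, with the
    region masks selecting which pixels each homography moves."""
    pts = np.asarray(pts, dtype=float)
    ones = np.ones((pts.shape[0], 1))
    homog = np.hstack([pts, ones]) @ H.T      # to homogeneous coords
    return homog[:, :2] / homog[:, 2:3]       # perspective divide

# A pure translation homography shifts every pixel by (tx, ty) = (5, -2).
H = np.array([[1, 0, 5], [0, 1, -2], [0, 0, 1]], dtype=float)
print(warp_points(H, [[0, 0], [10, 10]]))  # [[ 5. -2.] [15.  8.]]
```

Approximating the scene with a fixed number of planes works because, for a planar region, the image-to-image mapping under a camera motion is exactly a homography, so the masked, warped regions compose into a geometrically consistent novel view.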
Accelerating Convolutional Neural Networks via Activation Map Compression
Title | Accelerating Convolutional Neural Networks via Activation Map Compression |
Authors | Georgios Georgiadis |
Abstract | The deep learning revolution brought us an extensive array of neural network architectures that achieve state-of-the-art performance in a wide variety of Computer Vision tasks including, among others, classification, detection and segmentation. In parallel, we have also been observing an unprecedented demand in computational and memory requirements, rendering the efficient use of neural networks in low-powered devices virtually unattainable. Towards this end, we propose a three-stage compression and acceleration pipeline that sparsifies, quantizes and entropy encodes activation maps of Convolutional Neural Networks. Sparsification increases the representational power of activation maps leading to both acceleration of inference and higher model accuracy. Inception-V3 and MobileNet-V1 can be accelerated by as much as $1.6\times$ with an increase in accuracy of $0.38\%$ and $0.54\%$ on the ImageNet and CIFAR-10 datasets respectively. Quantizing and entropy coding the sparser activation maps lead to higher compression over the baseline, reducing the memory cost of the network execution. Inception-V3 and MobileNet-V1 activation maps, quantized to $16$ bits, are compressed by as much as $6\times$ with an increase in accuracy of $0.36\%$ and $0.55\%$ respectively. |
Tasks | |
Published | 2018-12-10 |
URL | http://arxiv.org/abs/1812.04056v2 |
http://arxiv.org/pdf/1812.04056v2.pdf | |
PWC | https://paperswithcode.com/paper/accelerating-convolutional-neural-networks |
Repo | |
Framework | |
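The three-stage pipeline (sparsify, quantize, entropy encode) can be sketched on a single activation map; the thresholding sparsifier and the entropy estimate below are illustrative stand-ins, not the paper's exact methods:

```python
import numpy as np

def compress_stats(act, sparsity_thresh=0.1, bits=16):
    """Sparsify an activation map, quantize it to `bits` bits, and
    estimate the entropy-coded cost in bits per symbol."""
    a = np.where(np.abs(act) < sparsity_thresh, 0.0, act)   # 1) sparsify
    scale = (2 ** bits - 1) / max(np.abs(a).max(), 1e-12)
    q = np.round(a * scale).astype(np.int64)                # 2) quantize
    vals, counts = np.unique(q, return_counts=True)
    p = counts / counts.sum()
    entropy = -(p * np.log2(p)).sum()                       # 3) bits/symbol
    return q, entropy
```

The sparser the map, the more probability mass concentrates on the zero symbol, so the entropy (and hence the coded size) drops well below the nominal 16 bits per value.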
BOP: Benchmark for 6D Object Pose Estimation
Title | BOP: Benchmark for 6D Object Pose Estimation |
Authors | Tomas Hodan, Frank Michel, Eric Brachmann, Wadim Kehl, Anders Glent Buch, Dirk Kraft, Bertram Drost, Joel Vidal, Stephan Ihrke, Xenophon Zabulis, Caner Sahin, Fabian Manhardt, Federico Tombari, Tae-Kyun Kim, Jiri Matas, Carsten Rother |
Abstract | We propose a benchmark for 6D pose estimation of a rigid object from a single RGB-D input image. The training data consists of a texture-mapped 3D object model or images of the object in known 6D poses. The benchmark comprises: i) eight datasets in a unified format that cover different practical scenarios, including two new datasets focusing on varying lighting conditions, ii) an evaluation methodology with a pose-error function that deals with pose ambiguities, iii) a comprehensive evaluation of 15 diverse recent methods that captures the status quo of the field, and iv) an online evaluation system that is open for continuous submission of new results. The evaluation shows that methods based on point-pair features currently perform best, outperforming template matching methods, learning-based methods and methods based on 3D local features. The project website is available at bop.felk.cvut.cz. |
Tasks | 6D Pose Estimation, 6D Pose Estimation using RGBD, Pose Estimation |
Published | 2018-08-24 |
URL | http://arxiv.org/abs/1808.08319v1 |
http://arxiv.org/pdf/1808.08319v1.pdf | |
PWC | https://paperswithcode.com/paper/bop-benchmark-for-6d-object-pose-estimation |
Repo | |
Framework | |
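One widely used pose-error function in 6D pose evaluation is ADD, the mean distance between model points under the estimated and ground-truth poses (BOP's own pose-error function additionally handles pose ambiguities, which this sketch does not):

```python
import numpy as np

def add_error(R_est, t_est, R_gt, t_gt, model_pts):
    """ADD pose error: mean Euclidean distance between the model points
    transformed by the estimated pose and by the ground-truth pose."""
    est = model_pts @ R_est.T + t_est   # points under estimated pose
    gt = model_pts @ R_gt.T + t_gt      # points under ground-truth pose
    return np.linalg.norm(est - gt, axis=1).mean()

# A pose off by a 1-unit translation yields an ADD error of exactly 1.
pts = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
print(add_error(np.eye(3), np.array([1.0, 0.0, 0.0]),
                np.eye(3), np.zeros(3), pts))  # 1.0
```

A pose is then typically counted as correct when this error falls below a fraction of the object's diameter, which is how per-method recall scores are aggregated in such benchmarks.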
Unsupervised Representation Learning with Laplacian Pyramid Auto-encoders
Title | Unsupervised Representation Learning with Laplacian Pyramid Auto-encoders |
Authors | Qilu Zhao, Zongmin Li |
Abstract | Scale-space representation has been popular in the computer vision community due to its theoretical foundation. The motivation for generating a scale-space representation of a given data set originates from the basic observation that real-world objects are composed of different structures at different scales. Hence, it’s reasonable to consider learning features with image pyramids generated by smoothing and down-sampling operations. In this paper we propose Laplacian pyramid auto-encoders, a straightforward modification of the deep convolutional auto-encoder architecture, for unsupervised representation learning. The method uses multiple encoding-decoding sub-networks within a Laplacian pyramid framework to reconstruct the original image and the low pass filtered images. The last layer of each encoding sub-network also connects to an encoding layer of the sub-network in the next level, which aims to reverse the process of Laplacian pyramid generation. Experimental results showed that the Laplacian pyramid benefited the classification and reconstruction performance of deep auto-encoder approaches, and that batch normalization is critical for getting deep auto-encoders to begin learning. |
Tasks | Representation Learning, Unsupervised Representation Learning |
Published | 2018-01-16 |
URL | http://arxiv.org/abs/1801.05278v2 |
http://arxiv.org/pdf/1801.05278v2.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-representation-learning-with |
Repo | |
Framework | |
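A Laplacian pyramid stores, at each level, the detail lost by down- and up-sampling, plus a final low-pass residual; the encoding sub-networks above reconstruct exactly these bands. A minimal sketch (using plain averaging in place of Gaussian filtering, for brevity):

```python
import numpy as np

def downsample(img):
    """2x down-sampling by block averaging (a Gaussian blur would
    normally precede this step in a classical pyramid)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(img):
    """Nearest-neighbour 2x up-sampling."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

def laplacian_pyramid(img, levels):
    """Each level stores the detail lost by one down/up round trip;
    the last entry is the low-pass residual."""
    pyramid = []
    for _ in range(levels):
        small = downsample(img)
        pyramid.append(img - upsample(small))  # detail (band-pass) layer
        img = small
    pyramid.append(img)                        # low-pass residual
    return pyramid

# The pyramid is exactly invertible: up-sample and add back each band.
img = np.arange(16.0).reshape(4, 4)
pyr = laplacian_pyramid(img, 2)
recon = pyr[-1]
for lap in reversed(pyr[:-1]):
    recon = upsample(recon) + lap
```

Because the decomposition is exactly invertible, reconstruction losses on the individual bands (as in the paper's per-level sub-networks) jointly constrain a reconstruction of the full image.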