October 19, 2019

Paper Group ANR 148

Deep Air Quality Forecasting Using Hybrid Deep Learning Framework


Title	Deep Air Quality Forecasting Using Hybrid Deep Learning Framework
Authors	Shengdong Du, Tianrui Li, Yan Yang, Shi-Jinn Horng
Abstract	Air quality forecasting has been regarded as the key problem of air pollution early warning and control management. In this paper, we propose a novel deep learning model for air quality (mainly PM2.5) forecasting, which learns the spatial-temporal correlation features and interdependence of multivariate air quality related time series data using a hybrid deep learning architecture. Due to the nonlinear and dynamic characteristics of multivariate air quality time series data, the base modules of our model include one-dimensional Convolutional Neural Networks (1D-CNNs) and Bi-directional Long Short-term Memory networks (Bi-LSTM). The former extracts the local trend features and spatial correlation features, and the latter learns spatial-temporal dependencies. We then design a joint hybrid deep learning framework based on one-dimensional CNNs and Bi-LSTM for shared representation feature learning of multivariate air quality related time series data. We conduct extensive experimental evaluations using two real-world datasets, and the results show that our model is capable of dealing with PM2.5 air pollution forecasting with satisfactory accuracy.
Tasks	Time Series
Published	2018-12-12
URL	https://arxiv.org/abs/1812.04783v3
PDF	https://arxiv.org/pdf/1812.04783v3.pdf
PWC	https://paperswithcode.com/paper/deep-air-quality-forecasting-using-hybrid
Repo
Framework

Disjoint Mapping Network for Cross-modal Matching of Voices and Faces


Title	Disjoint Mapping Network for Cross-modal Matching of Voices and Faces
Authors	Yandong Wen, Mahmoud Al Ismail, Weiyang Liu, Bhiksha Raj, Rita Singh
Abstract	We propose a novel framework, called Disjoint Mapping Network (DIMNet), for cross-modal biometric matching, in particular of voices and faces. Different from the existing methods, DIMNet does not explicitly learn the joint relationship between the modalities. Instead, DIMNet learns a shared representation for different modalities by mapping them individually to their common covariates. These shared representations can then be used to find the correspondences between the modalities. We show empirically that DIMNet is able to achieve better performance than other current methods, with the additional benefits of being conceptually simpler and less data-intensive.
Tasks
Published	2018-07-12
URL	http://arxiv.org/abs/1807.04836v2
PDF	http://arxiv.org/pdf/1807.04836v2.pdf
PWC	https://paperswithcode.com/paper/disjoint-mapping-network-for-cross-modal
Repo
Framework

Leveraged volume sampling for linear regression


Title	Leveraged volume sampling for linear regression
Authors	Michał Dereziński, Manfred K. Warmuth, Daniel Hsu
Abstract	Suppose an $n \times d$ design matrix in a linear regression problem is given, but the response for each point is hidden unless explicitly requested. The goal is to sample only a small number $k \ll n$ of the responses, and then produce a weight vector whose sum of squares loss over all points is at most $1+\epsilon$ times the minimum. When $k$ is very small (e.g., $k=d$), jointly sampling diverse subsets of points is crucial. One such method called volume sampling has a unique and desirable property that the weight vector it produces is an unbiased estimate of the optimum. It is therefore natural to ask if this method offers the optimal unbiased estimate in terms of the number of responses $k$ needed to achieve a $1+\epsilon$ loss approximation. Surprisingly we show that volume sampling can have poor behavior when we require a very accurate approximation – indeed worse than some i.i.d. sampling techniques whose estimates are biased, such as leverage score sampling. We then develop a new rescaled variant of volume sampling that produces an unbiased estimate which avoids this bad behavior and has at least as good a tail bound as leverage score sampling: sample size $k=O(d\log d + d/\epsilon)$ suffices to guarantee total loss at most $1+\epsilon$ times the minimum with high probability. Thus, we improve on the best previously known sample size for an unbiased estimator, $k=O(d^2/\epsilon)$. Our rescaling procedure leads to a new efficient algorithm for volume sampling which is based on a determinantal rejection sampling technique with potentially broader applications to determinantal point processes. Other contributions include introducing the combinatorics needed for rescaled volume sampling and developing tail bounds for sums of dependent random matrices which arise in the process.
Tasks	Point Processes
Published	2018-02-19
URL	http://arxiv.org/abs/1802.06749v3
PDF	http://arxiv.org/pdf/1802.06749v3.pdf
PWC	https://paperswithcode.com/paper/leveraged-volume-sampling-for-linear
Repo
Framework
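
The abstract compares volume sampling against leverage score sampling, an i.i.d. baseline. The sketch below illustrates only that baseline, not the rescaled volume sampling the paper proposes. The function names, the sampling with replacement, and the `1/sqrt(k * p_i)` rescaling are assumptions chosen for illustration. Note that only the responses `y[idx]` for the sampled rows are ever read, matching the setting where responses are hidden unless requested.

```python
import numpy as np

def leverage_scores(X):
    """Leverage score of each row of X (assumes X has full column rank)."""
    # Thin QR: the columns of Q are an orthonormal basis of the column space of X.
    Q, _ = np.linalg.qr(X)
    # The i-th leverage score is the squared norm of the i-th row of Q.
    return np.sum(Q ** 2, axis=1)

def leverage_sample_least_squares(X, y, k, rng):
    """Solve least squares on k rows drawn i.i.d. proportionally to leverage."""
    n, _ = X.shape
    scores = leverage_scores(X)
    p = scores / scores.sum()          # sampling distribution over rows
    idx = rng.choice(n, size=k, replace=True, p=p)
    # Rescale sampled rows so the subsampled squared loss is an unbiased
    # estimate of the full squared loss.
    scale = 1.0 / np.sqrt(k * p[idx])
    Xs = X[idx] * scale[:, None]
    ys = y[idx] * scale                # only these k responses are accessed
    w, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
    return w, idx
```

For a full-column-rank $n \times d$ matrix the scores sum to $d$, so dividing by their sum gives a valid probability vector.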

Cursive Scene Text Analysis by Deep Convolutional Linear Pyramids


Title	Cursive Scene Text Analysis by Deep Convolutional Linear Pyramids
Authors	Saad Bin Ahmed, Saeeda Naz, Muhammad Imran Razzak, Rubiyah Yusof
Abstract	Camera-captured images offer various aspects to investigate. Generally, the emphasis of research depends on the regions of interest. Sometimes the focus is on color segmentation, object detection or scene text analysis. Image analysis, visibility and layout analysis are tasks that are easy for humans, as suggested by human behavioral traits, but the same tasks are challenging when they are to be performed by machines. Learning machines always learn from the properties associated with the provided samples. Numerous approaches have been designed in recent years for scene text extraction and recognition, and efforts are underway to improve the accuracy. The convolutional approach has provided reasonable results on non-cursive text analysis in natural images. The work presented in this manuscript exploits the strength of linear pyramids by considering each pyramid as a feature of the provided sample. Each pyramid image is processed through various empirically selected kernels. The performance was investigated by considering Arabic text on each image pyramid of the EASTR-42k dataset. An error rate of 0.17% was reported on Arabic scene text recognition.
Tasks	Object Detection, Scene Text Recognition
Published	2018-09-27
URL	http://arxiv.org/abs/1809.10792v1
PDF	http://arxiv.org/pdf/1809.10792v1.pdf
PWC	https://paperswithcode.com/paper/cursive-scene-text-analysis-by-deep
Repo
Framework

How To Make the Gradients Small Stochastically: Even Faster Convex and Nonconvex SGD


Title	How To Make the Gradients Small Stochastically: Even Faster Convex and Nonconvex SGD
Authors	Zeyuan Allen-Zhu
Abstract	Stochastic gradient descent (SGD) gives an optimal convergence rate when minimizing convex stochastic objectives $f(x)$. However, in terms of making the gradients small, the original SGD does not give an optimal rate, even when $f(x)$ is convex. If $f(x)$ is convex, to find a point with gradient norm $\varepsilon$, we design an algorithm SGD3 with a near-optimal rate $\tilde{O}(\varepsilon^{-2})$, improving the best known rate $O(\varepsilon^{-8/3})$ of [17]. If $f(x)$ is nonconvex, to find its $\varepsilon$-approximate local minimum, we design an algorithm SGD5 with rate $\tilde{O}(\varepsilon^{-3.5})$, where previously SGD variants only achieve $\tilde{O}(\varepsilon^{-4})$ [6, 15, 32]. This is no slower than the best known stochastic version of Newton’s method in all parameter regimes [29].
Tasks
Published	2018-01-08
URL	http://arxiv.org/abs/1801.02982v2
PDF	http://arxiv.org/pdf/1801.02982v2.pdf
PWC	https://paperswithcode.com/paper/how-to-make-the-gradients-small
Repo
Framework

Two-stage quality adaptive fingerprint image enhancement using Fuzzy c-means clustering based fingerprint quality analysis


Title	Two-stage quality adaptive fingerprint image enhancement using Fuzzy c-means clustering based fingerprint quality analysis
Authors	Ram Prakash Sharma, Somnath Dey
Abstract	Fingerprint recognition techniques are immensely dependent on the quality of the fingerprint images. To improve the performance of a recognition algorithm on poor quality images, an efficient enhancement algorithm should be designed. The performance improvement of the recognition algorithm will be greater if the enhancement process is adaptive to the fingerprint quality (wet, dry or normal). In this paper, a quality adaptive fingerprint enhancement algorithm is proposed. The proposed fingerprint quality assessment algorithm clusters the fingerprint images into the appropriate quality class of dry, wet, normal dry, normal wet and good quality using the fuzzy c-means technique. It takes seven features into account, namely mean, moisture, variance, uniformity, contrast, ridge valley area uniformity and ridge valley uniformity, for clustering the fingerprint images into the appropriate quality class. Fingerprint images of each quality class undergo a two-stage fingerprint quality enhancement process. A quality adaptive preprocessing method is used as a front-end before enhancing the fingerprint images with Gabor, short term Fourier transform and oriented diffusion filtering based enhancement techniques. Experimental results show improvement in the verification results for the FVC2004 datasets. Significant improvement in equal error rate is observed while using quality adaptive preprocessing based approaches in comparison to the current state-of-the-art enhancement techniques.
Tasks	Image Enhancement
Published	2018-05-19
URL	http://arxiv.org/abs/1805.07527v1
PDF	http://arxiv.org/pdf/1805.07527v1.pdf
PWC	https://paperswithcode.com/paper/two-stage-quality-adaptive-fingerprint-image
Repo
Framework
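
The abstract relies on fuzzy c-means to assign fingerprints to quality classes. As a rough illustration of that clustering step only, here is a minimal NumPy sketch of the standard fuzzy c-means iteration. The function name, the fuzzifier `m=2.0`, the random initialisation and the stopping rule are assumptions, and the paper's seven quality features are not reproduced; any feature matrix `X` with one row per image would stand in for them.

```python
import numpy as np

def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means; returns (centers, membership matrix U)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Random initial memberships; each row sums to 1.
    U = rng.random((n, n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # Centers are means of the data weighted by memberships raised to m.
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Distance from every sample to every center (floored to avoid 0).
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.maximum(dist, 1e-12)
        # Membership update: u_ij is proportional to d_ij^(-2/(m-1)).
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U
```

Unlike hard k-means, every sample keeps a graded membership in each cluster; a hard class label can be read off with `U.argmax(axis=1)` when one is needed.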

Aff-Wild2: Extending the Aff-Wild Database for Affect Recognition


Title	Aff-Wild2: Extending the Aff-Wild Database for Affect Recognition
Authors	Dimitrios Kollias, Stefanos Zafeiriou
Abstract	Automatic understanding of human affect using visual signals is a problem that has attracted significant interest over the past 20 years. However, human emotional states are quite complex. To appraise such states displayed in real-world settings, we need expressive emotional descriptors that are capable of capturing and describing this complexity. The circumplex model of affect, which is described in terms of valence (i.e., how positive or negative an emotion is) and arousal (i.e., the power of the activation of the emotion), can be used for this purpose. Recent progress in the emotion recognition domain has been achieved through the development of deep neural architectures and the availability of very large training databases. To this end, Aff-Wild has been the first large-scale “in-the-wild” database, containing around 1,200,000 frames. In this paper, we build upon this database, extending it with 260 more subjects and 1,413,000 new video frames. We call the union of Aff-Wild with the additional data Aff-Wild2. The videos are downloaded from YouTube and have large variations in pose, age, illumination conditions, ethnicity and profession. Both database-specific and cross-database experiments are performed in this paper, utilizing Aff-Wild2 along with the RECOLA database. The developed deep neural architectures are based on the joint training of state-of-the-art convolutional and recurrent neural networks with an attention mechanism, thus exploiting the invariant properties of convolutional features while modeling the temporal dynamics that arise in human behaviour via the recurrent layers. The obtained results show promise for the utilization of the extended Aff-Wild, as well as of the developed deep neural architectures, for visual analysis of human behaviour in terms of continuous emotion dimensions.
Tasks	Emotion Recognition
Published	2018-11-11
URL	https://arxiv.org/abs/1811.07770v2
PDF	https://arxiv.org/pdf/1811.07770v2.pdf
PWC	https://paperswithcode.com/paper/aff-wild2-extending-the-aff-wild-database-for
Repo
Framework

Simple and practical algorithms for $\ell_p$-norm low-rank approximation


Title	Simple and practical algorithms for $\ell_p$-norm low-rank approximation
Authors	Anastasios Kyrillidis
Abstract	We propose practical algorithms for entrywise $\ell_p$-norm low-rank approximation, for $p = 1$ or $p = \infty$. The proposed framework, which is non-convex and gradient-based, is easy to implement and typically attains better approximations, faster, than the state of the art. From a theoretical standpoint, we show that the proposed scheme can attain $(1 + \varepsilon)$-OPT approximations. Our algorithms are not hyperparameter-free: they achieve the desiderata only assuming the algorithm’s hyperparameters are known a priori, or are at least approximable. That is, our theory indicates which problem quantities need to be known in order to get a good solution within polynomial time, and it does not contradict recent inapproximability results, as in [46].
Tasks
Published	2018-05-24
URL	http://arxiv.org/abs/1805.09464v1
PDF	http://arxiv.org/pdf/1805.09464v1.pdf
PWC	https://paperswithcode.com/paper/simple-and-practical-algorithms-for-ell_p
Repo
Framework

Structure-preserving Guided Retinal Image Filtering and Its Application for Optic Disc Analysis


Title	Structure-preserving Guided Retinal Image Filtering and Its Application for Optic Disc Analysis
Authors	Jun Cheng, Zhengguo Li, Zaiwang Gu, Huazhu Fu, Damon Wing Kee Wong, Jiang Liu
Abstract	Retinal fundus photographs have been used in the diagnosis of many ocular diseases such as glaucoma, pathological myopia, age-related macular degeneration and diabetic retinopathy. With the development of computer science, computer aided diagnosis has been developed to process and analyse the retinal images automatically. One of the challenges in the analysis is that the quality of the retinal image is often degraded. For example, a cataract in the human lens will attenuate the retinal image, just as a cloudy camera lens reduces the quality of a photograph. It often obscures the details in the retinal images and poses challenges for retinal image processing and analysis tasks. In this paper, we approximate the degradation of the retinal images as a combination of human-lens attenuation and scattering. A novel structure-preserving guided retinal image filtering (SGRIF) is then proposed to restore images based on the attenuation and scattering model. The proposed SGRIF consists of a step of global structure transferring and a step of global edge-preserving smoothing. Our results show that the proposed SGRIF method is able to improve the contrast of retinal images, measured by histogram flatness measure, histogram spread and variability of local luminosity. In addition, we further explored the benefits of SGRIF for subsequent retinal image processing and analysis tasks. In the two applications of deep learning based optic cup segmentation and sparse learning based cup-to-disc ratio (CDR) computation, our results show that we are able to achieve more accurate optic cup segmentation and CDR measurements from images processed by SGRIF.
Tasks	Sparse Learning
Published	2018-05-17
URL	http://arxiv.org/abs/1805.06625v2
PDF	http://arxiv.org/pdf/1805.06625v2.pdf
PWC	https://paperswithcode.com/paper/structure-preserving-guided-retinal-image
Repo
Framework

Training Deep Neural Networks with 8-bit Floating Point Numbers


Title	Training Deep Neural Networks with 8-bit Floating Point Numbers
Authors	Naigang Wang, Jungwook Choi, Daniel Brand, Chia-Yu Chen, Kailash Gopalakrishnan
Abstract	The state-of-the-art hardware platforms for training Deep Neural Networks (DNNs) are moving from traditional single precision (32-bit) computations towards 16 bits of precision – in large part due to the high energy efficiency and smaller bit storage associated with using reduced-precision representations. However, unlike inference, training with numbers represented with less than 16 bits has been challenging due to the need to maintain fidelity of the gradient computations during back-propagation. Here we demonstrate, for the first time, the successful training of DNNs using 8-bit floating point numbers while fully maintaining the accuracy on a spectrum of Deep Learning models and datasets. In addition to reducing the data and computation precision to 8 bits, we also successfully reduce the arithmetic precision for additions (used in partial product accumulation and weight updates) from 32 bits to 16 bits through the introduction of a number of key ideas including chunk-based accumulation and floating point stochastic rounding. The use of these novel techniques lays the foundation for a new generation of hardware training platforms with the potential for 2-4x improved throughput over today’s systems.
Tasks
Published	2018-12-19
URL	http://arxiv.org/abs/1812.08011v1
PDF	http://arxiv.org/pdf/1812.08011v1.pdf
PWC	https://paperswithcode.com/paper/training-deep-neural-networks-with-8-bit
Repo
Framework
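
The abstract names chunk-based accumulation as one of the ideas that makes 16-bit additions workable. The sketch below illustrates the general effect in NumPy `float16`; it is not the paper's hardware scheme. The function names and the chunk size of 64 are assumptions, and floating point stochastic rounding is not shown.

```python
import numpy as np

def naive_sum_fp16(values):
    """Accumulate sequentially, rounding to float16 after every addition."""
    acc = np.float16(0.0)
    for v in values:
        acc = np.float16(acc + np.float16(v))
    return acc

def chunked_sum_fp16(values, chunk_size=64):
    """Accumulate within small chunks first, then add the chunk sums."""
    total = np.float16(0.0)
    for start in range(0, len(values), chunk_size):
        partial = np.float16(0.0)
        for v in values[start:start + chunk_size]:
            partial = np.float16(partial + np.float16(v))
        # Each partial sum is comparable in magnitude to the running total,
        # so fewer low-order bits are lost in this addition.
        total = np.float16(total + partial)
    return total
```

With 4096 ones as input, the naive loop stalls at 2048, because `2048 + 1` is not representable in `float16` and rounds back to 2048, while the chunked version returns 4096. Small addends are swamped once the accumulator grows large, and chunking keeps the operands of each addition closer in magnitude.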

Projection image-to-image translation in hybrid X-ray/MR imaging


Title	Projection image-to-image translation in hybrid X-ray/MR imaging
Authors	Bernhard Stimpel, Christopher Syben, Tobias Würfl, Katharina Breininger, Katrin Mentl, Jonathan M. Lommen, Arnd Dörfler, Andreas Maier
Abstract	The potential benefit of hybrid X-ray and MR imaging in the interventional environment is large due to the combination of fast imaging with high contrast variety. However, a vast amount of existing image enhancement methods requires the image information of both modalities to be present in the same domain. To unlock this potential, we present a solution to image-to-image translation from MR projections to corresponding X-ray projection images. The approach is based on a state-of-the-art image generator network that is modified to fit the specific application. Furthermore, we propose the inclusion of a gradient map in the loss function to allow the network to emphasize high-frequency details in image generation. Our approach is capable of creating X-ray projection images with natural appearance. Additionally, our extensions show clear improvement compared to the baseline method.
Tasks	Image Enhancement, Image Generation, Image-to-Image Translation
Published	2018-04-11
URL	https://arxiv.org/abs/1804.03955v2
PDF	https://arxiv.org/pdf/1804.03955v2.pdf
PWC	https://paperswithcode.com/paper/projection-image-to-image-translation-in
Repo
Framework
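
The abstract says a gradient map is included in the loss so that the network emphasizes high-frequency details, but it does not give the formula. The sketch below is one plausible reading, stated as an assumption: an L1 loss whose per-pixel weight grows with the gradient magnitude of the target image. The function names, the `alpha` parameter, and the use of `np.gradient` are illustrative choices, not the authors' definition.

```python
import numpy as np

def gradient_map(img):
    """Per-pixel gradient magnitude from finite differences."""
    gy, gx = np.gradient(img.astype(np.float64))
    return np.sqrt(gx ** 2 + gy ** 2)

def gradient_weighted_l1(pred, target, alpha=1.0):
    """L1 loss with extra weight on pixels near edges of the target.

    Hypothetical formulation: weight = 1 + alpha * normalised gradient map.
    """
    g = gradient_map(target)
    weights = 1.0 + alpha * g / (g.max() + 1e-12)
    return float(np.mean(weights * np.abs(pred - target)))
```

Under this formulation an error of the same size costs more when it falls on an edge of the target than when it falls in a flat region, which is the kind of emphasis on high-frequency detail the abstract describes.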

Fast Hyperparameter Optimization of Deep Neural Networks via Ensembling Multiple Surrogates


Title	Fast Hyperparameter Optimization of Deep Neural Networks via Ensembling Multiple Surrogates
Authors	Yang Li, Jiawei Jiang, Yingxia Shao, Bin Cui
Abstract	The performance of deep neural networks crucially depends on good hyperparameter configurations. Bayesian optimization is a powerful framework for optimizing the hyperparameters of DNNs. These methods need sufficient evaluation data to approximate and minimize the validation error function of hyperparameters. However, the expensive evaluation cost of DNNs yields very little evaluation data within a limited time, which greatly reduces the efficiency of Bayesian optimization. Moreover, previous research focuses on using the complete evaluation data to conduct Bayesian optimization and ignores the intermediate evaluation data generated by early stopping methods. To alleviate the insufficient evaluation data problem, we propose a fast hyperparameter optimization method, HOIST, that utilizes both the complete and intermediate evaluation data to accelerate the hyperparameter optimization of DNNs. Specifically, we train multiple basic surrogates to gather information from the mixed evaluation data, and then combine all basic surrogates using weighted bagging to provide an accurate ensemble surrogate. Our empirical studies show that HOIST outperforms the state-of-the-art approaches on a wide range of DNNs, including feed forward neural networks, convolutional neural networks, recurrent neural networks, and variational autoencoders.
Tasks	Hyperparameter Optimization
Published	2018-11-06
URL	http://arxiv.org/abs/1811.02319v2
PDF	http://arxiv.org/pdf/1811.02319v2.pdf
PWC	https://paperswithcode.com/paper/fast-hyperparameter-optimization-of-deep
Repo
Framework

A Retinal Image Enhancement Technique for Blood Vessel Segmentation Algorithm


Title	A Retinal Image Enhancement Technique for Blood Vessel Segmentation Algorithm
Authors	A. M. R. R. Bandara, P. W. G. R. M. P. B. Giragama
Abstract	The morphology of blood vessels in retinal fundus images is an important indicator of diseases like glaucoma, hypertension and diabetic retinopathy. The accuracy of retinal blood vessels segmentation affects the quality of retinal image analysis which is used in diagnosis methods in modern ophthalmology. Contrast enhancement is one of the crucial steps in any of retinal blood vessel segmentation approaches. The reliability of the segmentation depends on the consistency of the contrast over the image. This paper presents an assessment of the suitability of a recently invented spatially adaptive contrast enhancement technique for enhancing retinal fundus images for blood vessel segmentation. The enhancement technique was integrated with a variant of Tyler Coye algorithm, which has been improved with Hough line transformation based vessel reconstruction method. The proposed approach was evaluated on two public datasets STARE and DRIVE. The assessment was done by comparing the segmentation performance with five widely used contrast enhancement techniques based on wavelet transform, contrast limited histogram equalization, local normalization, linear un-sharp masking and contourlet transform. The results revealed that the assessed enhancement technique is well suited for the application and also outperforms all compared techniques.
Tasks	Image Enhancement
Published	2018-02-28
URL	http://arxiv.org/abs/1803.00036v1
PDF	http://arxiv.org/pdf/1803.00036v1.pdf
PWC	https://paperswithcode.com/paper/a-retinal-image-enhancement-technique-for
Repo
Framework

Predicting Rapid Fire Growth (Flashover) Using Conditional Generative Adversarial Networks


Title	Predicting Rapid Fire Growth (Flashover) Using Conditional Generative Adversarial Networks
Authors	Kyongsik Yun, Jessi Bustos, Thomas Lu
Abstract	A flashover occurs when a fire spreads very rapidly through crevices due to intense heat. Flashovers present one of the most frightening and challenging fire phenomena to those who regularly encounter them: firefighters. Firefighters’ safety and lives often depend on their ability to predict flashovers before they occur. Typical pre-flashover fire characteristics include dark smoke, high heat, and rollover (“angel fingers”) and can be quantified by color, size, and shape. Using a color video stream from a firefighter’s body camera, we applied generative adversarial neural networks for image enhancement. The neural networks were trained to enhance very dark fire and smoke patterns in videos and monitor dynamic changes in smoke and fire areas. Preliminary tests with limited flashover training videos showed that we predicted a flashover as early as 55 seconds before it occurred.
Tasks	Image Enhancement
Published	2018-01-30
URL	http://arxiv.org/abs/1801.09804v1
PDF	http://arxiv.org/pdf/1801.09804v1.pdf
PWC	https://paperswithcode.com/paper/predicting-rapid-fire-growth-flashover-using
Repo
Framework

Multi-Task Zipping via Layer-wise Neuron Sharing


Title	Multi-Task Zipping via Layer-wise Neuron Sharing
Authors	Xiaoxi He, Zimu Zhou, Lothar Thiele
Abstract	Future mobile devices are anticipated to perceive, understand and react to the world on their own by running multiple correlated deep neural networks on-device. Yet the complexity of these neural networks needs to be trimmed down both within-model and cross-model to fit in mobile storage and memory. Previous studies focus on squeezing the redundancy within a single neural network. In this work, we aim to reduce the redundancy across multiple models. We propose Multi-Task Zipping (MTZ), a framework to automatically merge correlated, pre-trained deep neural networks for cross-model compression. Central in MTZ is a layer-wise neuron sharing and incoming weight updating scheme that induces a minimal change in the error function. MTZ inherits information from each model and demands light retraining to re-boost the accuracy of individual tasks. Evaluations show that MTZ is able to fully merge the hidden layers of two VGG-16 networks with a 3.18% increase in the test error averaged on ImageNet and CelebA, or share 39.61% parameters between the two networks with <0.5% increase in the test errors for both tasks. The number of iterations to retrain the combined network is at least 17.8 times lower than that of training a single VGG-16 network. Moreover, experiments show that MTZ is also able to effectively merge multiple residual networks.
Tasks	Model Compression
Published	2018-05-24
URL	http://arxiv.org/abs/1805.09791v2
PDF	http://arxiv.org/pdf/1805.09791v2.pdf
PWC	https://paperswithcode.com/paper/multi-task-zipping-via-layer-wise-neuron
Repo
Framework