October 21, 2019

Paper Group AWR 96

Self-supervised Knowledge Distillation Using Singular Value Decomposition. Human Motion Prediction via Spatio-Temporal Inpainting. Real-world Noisy Image Denoising: A New Benchmark. Bi-Directional Lattice Recurrent Neural Networks for Confidence Estimation. A Survey on Compiler Autotuning using Machine Learning. Stochastic Adaptive Neural Architect …

Self-supervised Knowledge Distillation Using Singular Value Decomposition

Title Self-supervised Knowledge Distillation Using Singular Value Decomposition
Authors Seung Hyun Lee, Dae Ha Kim, Byung Cheol Song
Abstract To address the huge training datasets and high computational cost that deep neural networks (DNNs) require, the so-called teacher-student (T-S) DNN, which transfers the knowledge of a T-DNN to an S-DNN, has been proposed. However, the existing T-S-DNN has a limited range of use, and the knowledge of the T-DNN is insufficiently transferred to the S-DNN. To improve the quality of the transferred knowledge from the T-DNN, we propose a new knowledge distillation using singular value decomposition (SVD). In addition, we define a knowledge transfer as a self-supervised task and suggest a way to continuously receive information from the T-DNN. Simulation results show that an S-DNN with a computational cost of 1/5 of the T-DNN can be up to 1.1% better than the T-DNN in terms of classification accuracy. Also, assuming the same computational cost, our S-DNN outperforms the S-DNN driven by the state-of-the-art distillation with a performance advantage of 1.79%. Code is available at https://github.com/sseung0703/SSKD_SVD.
Tasks Transfer Learning
Published 2018-07-18
URL http://arxiv.org/abs/1807.06819v1
PDF http://arxiv.org/pdf/1807.06819v1.pdf
PWC https://paperswithcode.com/paper/self-supervised-knowledge-distillation-using
Repo https://github.com/wnma3mz/KD_Notes
Framework tf

Human Motion Prediction via Spatio-Temporal Inpainting

Title Human Motion Prediction via Spatio-Temporal Inpainting
Authors Alejandro Hernandez Ruiz, Juergen Gall, Francesc Moreno-Noguer
Abstract We propose a Generative Adversarial Network (GAN) to forecast 3D human motion given a sequence of past 3D skeleton poses. While recent GANs have shown promising results, they can only forecast plausible motion over relatively short periods of time (a few hundred milliseconds) and typically ignore the absolute position of the skeleton w.r.t. the camera. Our scheme provides long-term predictions (two seconds or more) for both the body pose and its absolute position. Our approach builds upon three main contributions. First, we represent the data using a spatio-temporal tensor of 3D skeleton coordinates which allows formulating the prediction problem as an inpainting one, for which GANs work particularly well. Secondly, we design an architecture to learn the joint distribution of body poses and global motion, capable of hypothesizing large chunks of the input 3D tensor with missing data. And finally, we argue that the L2 metric, considered so far by most approaches, fails to capture the actual distribution of long-term human motion. We propose two alternative metrics, based on the distribution of frequencies, that are able to capture more realistic motion patterns. Extensive experiments demonstrate that our approach significantly improves the state of the art, while also handling situations in which past observations are corrupted by occlusions, noise and missing frames.
Tasks Motion Forecasting, motion prediction
Published 2018-12-13
URL https://arxiv.org/abs/1812.05478v2
PDF https://arxiv.org/pdf/1812.05478v2.pdf
PWC https://paperswithcode.com/paper/human-motion-prediction-via-spatio-temporal
Repo https://github.com/magnux/MotionGAN
Framework tf

Real-world Noisy Image Denoising: A New Benchmark

Title Real-world Noisy Image Denoising: A New Benchmark
Authors Jun Xu, Hui Li, Zhetong Liang, David Zhang, Lei Zhang
Abstract Most previous image denoising methods focus on additive white Gaussian noise (AWGN). However, the real-world noisy image denoising problem has emerged with the advance of computer vision techniques. In order to promote the study on this problem while implementing the concurrent real-world image denoising datasets, we construct a new benchmark dataset which contains comprehensive real-world noisy images of different natural scenes. These images are captured by different cameras under different camera settings. We evaluate the different denoising methods on our new dataset as well as previous datasets. Extensive experimental results demonstrate that the recently proposed methods designed specifically for realistic noise removal based on sparse or low rank theories achieve better denoising performance and are more robust than other competing methods, and the newly proposed dataset is more challenging. The constructed dataset of real photographs is publicly available at https://github.com/csjunxu/PolyUDataset for researchers to investigate new real-world image denoising methods. We will add more analysis on the noise statistics in the real photographs of our new dataset in the next version of this article.
Tasks Denoising, Image Denoising
Published 2018-04-07
URL http://arxiv.org/abs/1804.02603v1
PDF http://arxiv.org/pdf/1804.02603v1.pdf
PWC https://paperswithcode.com/paper/real-world-noisy-image-denoising-a-new
Repo https://github.com/csjunxu/PolyU-Real-World-Noisy-Images-Dataset
Framework none

Bi-Directional Lattice Recurrent Neural Networks for Confidence Estimation

Title Bi-Directional Lattice Recurrent Neural Networks for Confidence Estimation
Authors Qiujia Li, Preben Ness, Anton Ragni, Mark Gales
Abstract The standard approach to mitigate errors made by an automatic speech recognition system is to use confidence scores associated with each predicted word. In the simplest case, these scores are word posterior probabilities whilst more complex schemes utilise bi-directional recurrent neural network (BiRNN) models. A number of upstream and downstream applications, however, rely on confidence scores assigned not only to 1-best hypotheses but to all words found in confusion networks or lattices. These include but are not limited to speaker adaptation, semi-supervised training and information retrieval. Although word posteriors could be used in those applications as confidence scores, they are known to have reliability issues. To make improved confidence scores more generally available, this paper shows how BiRNNs can be extended from 1-best sequences to confusion network and lattice structures. Experiments are conducted using one of the Cambridge University submissions to the IARPA OpenKWS 2016 competition. The results show that confusion network and lattice-based BiRNNs can provide a significant improvement in confidence estimation.
Tasks Information Retrieval, Speech Recognition
Published 2018-10-30
URL http://arxiv.org/abs/1810.13024v2
PDF http://arxiv.org/pdf/1810.13024v2.pdf
PWC https://paperswithcode.com/paper/bi-directional-lattice-recurrent-neural
Repo https://github.com/alecokas/lattice_rnn
Framework pytorch

A Survey on Compiler Autotuning using Machine Learning

Title A Survey on Compiler Autotuning using Machine Learning
Authors Amir H. Ashouri, William Killian, John Cavazos, Gianluca Palermo, Cristina Silvano
Abstract Since the mid-1990s, researchers have been trying to use machine-learning based approaches to solve a number of different compiler optimization problems. These techniques primarily enhance the quality of the obtained results and, more importantly, make it feasible to tackle two main compiler optimization problems: optimization selection (choosing which optimizations to apply) and phase-ordering (choosing the order of applying optimizations). The compiler optimization space continues to grow due to the advancement of applications, increasing number of compiler optimizations, and new target architectures. Generic optimization passes in compilers cannot fully leverage newly introduced optimizations and, therefore, cannot keep up with the pace of increasing options. This survey summarizes and classifies the recent advances in using machine learning for the compiler optimization field, particularly on the two major problems of (1) selecting the best optimizations and (2) the phase-ordering of optimizations. The survey highlights the approaches taken so far, the obtained results, the fine-grain classification among different approaches and finally, the influential papers of the field.
Tasks
Published 2018-01-13
URL http://arxiv.org/abs/1801.04405v5
PDF http://arxiv.org/pdf/1801.04405v5.pdf
PWC https://paperswithcode.com/paper/a-survey-on-compiler-autotuning-using-machine
Repo https://github.com/quepas/ReadingPublications
Framework none

Stochastic Adaptive Neural Architecture Search for Keyword Spotting

Title Stochastic Adaptive Neural Architecture Search for Keyword Spotting
Authors Tom Véniat, Olivier Schwander, Ludovic Denoyer
Abstract The problem of keyword spotting, i.e. identifying keywords in a real-time audio stream, is mainly solved by applying a neural network over successive sliding windows. Due to the difficulty of the task, baseline models are usually large, resulting in a high computational cost and energy consumption level. We propose a new method called SANAS (Stochastic Adaptive Neural Architecture Search) which is able to adapt the architecture of the neural network on-the-fly at inference time such that small architectures will be used when the stream is easy to process (silence, low noise, …) and bigger networks will be used when the task becomes more difficult. We show that this adaptive model can be learned end-to-end by optimizing a trade-off between the prediction performance and the average computational cost per unit of time. Experiments on the Speech Commands dataset show that this approach leads to a high recognition level while being much faster (and/or energy saving) than classical approaches where the network architecture is static.
Tasks Keyword Spotting, Neural Architecture Search
Published 2018-11-16
URL http://arxiv.org/abs/1811.06753v1
PDF http://arxiv.org/pdf/1811.06753v1.pdf
PWC https://paperswithcode.com/paper/stochastic-adaptive-neural-architecture
Repo https://github.com/TomVeniat/SANAS
Framework pytorch
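
The trade-off described in the abstract can be pictured as a single scalar objective: average prediction loss plus a weighted average computational cost per time step. The sketch below is only an illustration of that idea and is not taken from the SANAS repository; the function name `tradeoff_objective`, the weight `cost_weight`, and all numbers are hypothetical.

```python
def tradeoff_objective(prediction_losses, step_costs, cost_weight):
    """Mean prediction loss plus a weighted mean computational cost per time step."""
    if len(prediction_losses) != len(step_costs) or not prediction_losses:
        raise ValueError("need one loss and one cost for each time step")
    n = len(prediction_losses)
    mean_loss = sum(prediction_losses) / n
    mean_cost = sum(step_costs) / n
    # cost_weight (hypothetical) sets how strongly computation is penalised.
    return mean_loss + cost_weight * mean_cost


# Hypothetical stream of four windows: cheap architectures on the easy
# windows, one expensive architecture on the hard third window.
losses = [0.10, 0.05, 0.40, 0.05]
costs = [1.0, 1.0, 8.0, 1.0]
objective = tradeoff_objective(losses, costs, cost_weight=0.01)
```

Raising `cost_weight` in this toy objective favours smaller architectures; lowering it favours accuracy.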

Rule Induction Partitioning Estimator

Title Rule Induction Partitioning Estimator
Authors Vincent Margot, Jean-Patrick Baudry, Frederic Guilloux, Olivier Wintenberger
Abstract RIPE is a novel deterministic and easily understandable prediction algorithm developed for continuous and discrete ordered data. It infers a model, from a sample, to predict and to explain a real variable $Y$ given an input variable $X \in \mathcal X$ (features). The algorithm extracts a sparse set of hyperrectangles $\mathbf r \subset \mathcal X$, which can be thought of as rules of the form If-Then. This set is then turned into a partition of the feature space $\mathcal X$ of which each cell is explained as a list of rules whose If conditions are satisfied. The process of RIPE is illustrated on simulated datasets and its efficiency compared with that of other usual algorithms.
Tasks
Published 2018-07-12
URL http://arxiv.org/abs/1807.04602v1
PDF http://arxiv.org/pdf/1807.04602v1.pdf
PWC https://paperswithcode.com/paper/rule-induction-partitioning-estimator
Repo https://github.com/VMargot/RIPE
Framework none
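
To make the "hyperrectangle as If-Then rule" reading concrete, here is a minimal sketch. It is not the RIPE algorithm (which also selects the rules and builds a partition); the dictionary representation, the helper names, and the choice to predict the mean target of the covered samples are assumptions made for illustration.

```python
def satisfies(x, rule):
    """True when every constrained coordinate of x lies inside the rule's interval."""
    return all(low <= x[i] <= high for i, (low, high) in rule.items())


def rule_prediction(rule, samples, targets):
    """Mean target over the samples that activate the rule (None if none do)."""
    covered = [y for x, y in zip(samples, targets) if satisfies(x, rule)]
    return sum(covered) / len(covered) if covered else None


# Hypothetical rule: IF 0 <= x[0] <= 1 AND 2 <= x[1] <= 5 THEN predict
# the mean target of the samples inside that hyperrectangle.
rule = {0: (0.0, 1.0), 1: (2.0, 5.0)}
samples = [(0.5, 3.0), (0.2, 4.0), (1.5, 3.0), (0.5, 6.0)]
targets = [10.0, 14.0, 100.0, -50.0]
prediction = rule_prediction(rule, samples, targets)
```

Only the first two samples fall inside the hyperrectangle, so the outliers in `targets` do not affect `prediction`.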

Disc-aware Ensemble Network for Glaucoma Screening from Fundus Image

Title Disc-aware Ensemble Network for Glaucoma Screening from Fundus Image
Authors Huazhu Fu, Jun Cheng, Yanwu Xu, Changqing Zhang, Damon Wing Kee Wong, Jiang Liu, Xiaochun Cao
Abstract Glaucoma is a chronic eye disease that leads to irreversible vision loss. Most of the existing automatic screening methods firstly segment the main structure, and subsequently calculate the clinical measurement for detection and screening of glaucoma. However, these measurement-based methods rely heavily on the segmentation accuracy, and ignore various visual features. In this paper, we introduce a deep learning technique to gain additional image-relevant information, and screen glaucoma from the fundus image directly. Specifically, a novel Disc-aware Ensemble Network (DENet) for automatic glaucoma screening is proposed, which integrates the deep hierarchical context of the global fundus image and the local optic disc region. Four deep streams on different levels and modules are respectively considered as global image stream, segmentation-guided network, local disc region stream, and disc polar transformation stream. Finally, the output probabilities of different streams are fused as the final screening result. The experiments on two glaucoma datasets (SCES and new SINDI datasets) show our method outperforms other state-of-the-art algorithms.
Tasks
Published 2018-05-19
URL http://arxiv.org/abs/1805.07549v1
PDF http://arxiv.org/pdf/1805.07549v1.pdf
PWC https://paperswithcode.com/paper/disc-aware-ensemble-network-for-glaucoma
Repo https://github.com/HzFu/DENet_GlaucomaScreen
Framework tf
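
The final fusion step mentioned in the abstract can be sketched as a weighted average of per-stream probabilities. The abstract does not state the fusion rule, so the averaging, the function name `fuse_probabilities`, and the example values are assumptions rather than the DENet implementation.

```python
def fuse_probabilities(stream_probs, weights=None):
    """Weighted average of per-stream probabilities (uniform weights by default)."""
    if not stream_probs:
        raise ValueError("at least one stream probability is required")
    if weights is None:
        weights = [1.0] * len(stream_probs)
    if len(weights) != len(stream_probs):
        raise ValueError("need exactly one weight for each stream")
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, stream_probs)) / total


# Hypothetical outputs of four streams for a single fundus image.
stream_probs = [0.9, 0.7, 0.8, 0.6]
fused = fuse_probabilities(stream_probs)
```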

Improving MMD-GAN Training with Repulsive Loss Function

Title Improving MMD-GAN Training with Repulsive Loss Function
Authors Wei Wang, Yuan Sun, Saman Halgamuge
Abstract Generative adversarial nets (GANs) are widely used to learn the data sampling process and their performance may heavily depend on the loss functions, given a limited computational budget. This study revisits MMD-GAN that uses the maximum mean discrepancy (MMD) as the loss function for GAN and makes two contributions. First, we argue that the existing MMD loss function may discourage the learning of fine details in data as it attempts to contract the discriminator outputs of real data. To address this issue, we propose a repulsive loss function to actively learn the difference among the real data by simply rearranging the terms in MMD. Second, inspired by the hinge loss, we propose a bounded Gaussian kernel to stabilize the training of MMD-GAN with the repulsive loss function. The proposed methods are applied to the unsupervised image generation tasks on CIFAR-10, STL-10, CelebA, and LSUN bedroom datasets. Results show that the repulsive loss function significantly improves over the MMD loss at no additional computational cost and outperforms other representative loss functions. The proposed methods achieve an FID score of 16.21 on the CIFAR-10 dataset using a single DCGAN network and spectral normalization.
Tasks Image Generation
Published 2018-12-24
URL http://arxiv.org/abs/1812.09916v4
PDF http://arxiv.org/pdf/1812.09916v4.pdf
PWC https://paperswithcode.com/paper/improving-mmd-gan-training-with-repulsive
Repo https://github.com/richardwth/MMD-GAN
Framework tf
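
For readers unfamiliar with MMD, the sketch below computes the standard biased estimate of squared MMD with a Gaussian kernel, so that the three terms the abstract refers to are visible. It does not reproduce the repulsive loss or the bounded kernel from the paper; scalar samples stand in for discriminator outputs, and the function names and `sigma` value are assumptions.

```python
import math


def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian (RBF) kernel between two scalars."""
    return math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2))


def mean_kernel(xs, ys, sigma=1.0):
    """Average kernel value over all pairs (x, y)."""
    total = sum(gaussian_kernel(x, y, sigma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs, ys, sigma=1.0):
    """Biased estimate: E[k(x,x')] + E[k(y,y')] - 2 E[k(x,y)]."""
    return (mean_kernel(xs, xs, sigma)
            + mean_kernel(ys, ys, sigma)
            - 2.0 * mean_kernel(xs, ys, sigma))


# Hypothetical scalar samples: "close" overlaps with "real", "far" does not.
real = [0.0, 0.1, -0.1, 0.2]
close = [0.05, 0.0, 0.15, -0.05]
far = [3.0, 3.1, 2.9, 3.2]
mmd_close = mmd_squared(real, close)
mmd_far = mmd_squared(real, far)
```

The estimate is near zero for overlapping samples and grows as the two sample sets separate.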

A Probabilistic Disease Progression Model for Predicting Future Clinical Outcome

Title A Probabilistic Disease Progression Model for Predicting Future Clinical Outcome
Authors Yingying Zhu, Mert R. Sabuncu
Abstract In this work, we consider the problem of predicting the course of a progressive disease, such as cancer or Alzheimer’s. Progressive diseases often start with mild symptoms that might precede a diagnosis, and each patient follows their own trajectory. Patient trajectories exhibit wild variability, which can be associated with many factors such as genotype, age, or sex. An additional layer of complexity is that, in real life, the amount and type of data available for each patient can differ significantly. For example, for one patient we might have no prior history, whereas for another patient we might have detailed clinical assessments obtained at multiple prior time-points. This paper presents a probabilistic model that can handle multiple modalities (including images and clinical assessments) and variable patient histories with irregular timings and missing entries, to predict clinical scores at future time-points. We use a sigmoidal function to model latent disease progression, which gives rise to clinical observations in our generative model. We implemented an approximate Bayesian inference strategy on the proposed model to estimate the parameters on data from a large population of subjects. Furthermore, the Bayesian framework enables the model to automatically fine-tune its predictions based on historical observations that might be available on the test subject. We applied our method to a longitudinal Alzheimer’s disease dataset with more than 3000 subjects [23] and present a detailed empirical analysis of prediction performance under different scenarios, with comparisons against several benchmarks. We also demonstrate how the proposed model can be interrogated to glean insights about temporal dynamics in Alzheimer’s disease.
Tasks Bayesian Inference
Published 2018-03-13
URL http://arxiv.org/abs/1803.05011v1
PDF http://arxiv.org/pdf/1803.05011v1.pdf
PWC https://paperswithcode.com/paper/a-probabilistic-disease-progression-model-for
Repo https://github.com/zyy123jy/kdd
Framework tf
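
The sigmoidal latent progression mentioned in the abstract can be illustrated with a generic logistic curve. This is only a sketch of the curve shape, not the paper's generative model or its inference procedure; the parameter names `onset`, `rate`, `lower`, `upper` and the example values are hypothetical.

```python
import math


def sigmoid_progression(t, onset, rate, lower=0.0, upper=1.0):
    """Latent score at time t: rises from `lower` to `upper`, steepest at t = onset."""
    return lower + (upper - lower) / (1.0 + math.exp(-rate * (t - onset)))


# Hypothetical subject whose latent score changes fastest around age 75.
ages = [60, 70, 80, 90]
trajectory = [sigmoid_progression(a, onset=75.0, rate=0.3) for a in ages]
```

A subject-specific `onset` shifts the curve in time, while `rate` controls how quickly the score moves between its two plateaus.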

Efficient Gaussian Process Classification Using Polya-Gamma Data Augmentation

Title Efficient Gaussian Process Classification Using Polya-Gamma Data Augmentation
Authors Florian Wenzel, Theo Galy-Fajou, Christian Donner, Marius Kloft, Manfred Opper
Abstract We propose a scalable stochastic variational approach to GP classification building on Polya-Gamma data augmentation and inducing points. Unlike former approaches, we obtain closed-form updates based on natural gradients that lead to efficient optimization. We evaluate the algorithm on real-world datasets containing up to 11 million data points and demonstrate that it is up to two orders of magnitude faster than the state-of-the-art while being competitive in terms of prediction performance.
Tasks Data Augmentation
Published 2018-02-18
URL http://arxiv.org/abs/1802.06383v2
PDF http://arxiv.org/pdf/1802.06383v2.pdf
PWC https://paperswithcode.com/paper/efficient-gaussian-process-classification
Repo https://github.com/UnofficialJuliaMirrorSnapshots/AugmentedGaussianProcesses.jl-38eea1fd-7d7d-5162-9d08-f89d0f2e271e
Framework none

State of the Art Optical Character Recognition of 19th Century Fraktur Scripts using Open Source Engines

Title State of the Art Optical Character Recognition of 19th Century Fraktur Scripts using Open Source Engines
Authors Christian Reul, Uwe Springmann, Christoph Wick, Frank Puppe
Abstract In this paper we evaluate Optical Character Recognition (OCR) of 19th century Fraktur scripts without book-specific training using mixed models, i.e. models trained to recognize a variety of fonts and typesets from previously unseen sources. We describe the training process leading to strong mixed OCR models and compare them to freely available models of the popular open source engines OCRopus and Tesseract as well as the commercial state-of-the-art system ABBYY. For evaluation, we use a varied collection of unseen data from books, journals, and a dictionary from the 19th century. The experiments show that training mixed models with real data is superior to training with synthetic data and that the novel OCR engine Calamari outperforms the other engines considerably, on average reducing ABBYY's character error rate (CER) by over 70%, resulting in an average CER below 1%.
Tasks Optical Character Recognition
Published 2018-10-08
URL http://arxiv.org/abs/1810.03436v1
PDF http://arxiv.org/pdf/1810.03436v1.pdf
PWC https://paperswithcode.com/paper/state-of-the-art-optical-character
Repo https://github.com/chreul/19th-century-fraktur-OCR
Framework none
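
Character error rate is commonly computed as the Levenshtein edit distance between the recognized text and the ground truth, divided by the ground-truth length. The sketch below follows that common definition; it is not the evaluation script used in the paper, and the function names and example strings are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(ref, hyp):
    """Edit distance normalised by the length of the reference text."""
    if not ref:
        raise ValueError("reference text must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical OCR output with a single substituted character.
reference = "Fraktur"
hypothesis = "Frattur"
cer = character_error_rate(reference, hypothesis)
```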

Ground Truth for training OCR engines on historical documents in German Fraktur and Early Modern Latin

Title Ground Truth for training OCR engines on historical documents in German Fraktur and Early Modern Latin
Authors Uwe Springmann, Christian Reul, Stefanie Dipper, Johannes Baiter
Abstract In this paper we describe a dataset of German and Latin ground truth (GT) for historical OCR in the form of printed text line images paired with their transcription. This dataset, called GT4HistOCR, consists of 313,173 line pairs covering a wide period of printing dates from incunabula from the 15th century to 19th century books printed in Fraktur types and is openly available under a CC-BY 4.0 license. The special form of GT as line image/transcription pairs makes it directly usable to train state-of-the-art recognition models for OCR software employing recurrent neural networks in LSTM architecture such as Tesseract 4 or OCRopus. We also provide some pretrained OCRopus models for subcorpora of our dataset yielding between 95% (early printings) and 98% (19th century Fraktur printings) character accuracy rates on unseen test cases, a Perl script to harmonize GT produced by different transcription rules, and give hints on how to construct GT for OCR purposes which has requirements that may differ from linguistically motivated transcriptions.
Tasks Optical Character Recognition
Published 2018-09-14
URL http://arxiv.org/abs/1809.05501v1
PDF http://arxiv.org/pdf/1809.05501v1.pdf
PWC https://paperswithcode.com/paper/ground-truth-for-training-ocr-engines-on
Repo https://github.com/chreul/19th-century-fraktur-OCR
Framework none

Holistic 3D Scene Parsing and Reconstruction from a Single RGB Image

Title Holistic 3D Scene Parsing and Reconstruction from a Single RGB Image
Authors Siyuan Huang, Siyuan Qi, Yixin Zhu, Yinxue Xiao, Yuanlu Xu, Song-Chun Zhu
Abstract We propose a computational framework to jointly parse a single RGB image and reconstruct a holistic 3D configuration composed of a set of CAD models using a stochastic grammar model. Specifically, we introduce a Holistic Scene Grammar (HSG) to represent the 3D scene structure, which characterizes a joint distribution over the functional and geometric space of indoor scenes. The proposed HSG captures three essential and often latent dimensions of the indoor scenes: i) latent human context, describing the affordance and the functionality of a room arrangement, ii) geometric constraints over the scene configurations, and iii) physical constraints that guarantee physically plausible parsing and reconstruction. We solve this joint parsing and reconstruction problem in an analysis-by-synthesis fashion, seeking to minimize the differences between the input image and the rendered images generated by our 3D representation, over the space of depth, surface normal, and object segmentation map. The optimal configuration, represented by a parse graph, is inferred using Markov chain Monte Carlo (MCMC), which efficiently traverses through the non-differentiable solution space, jointly optimizing object localization, 3D layout, and hidden human context. Experimental results demonstrate that the proposed algorithm improves the generalization ability and significantly outperforms prior methods on 3D layout estimation, 3D object detection, and holistic scene understanding.
Tasks 3D Object Detection, Object Detection, Object Localization, Scene Parsing, Scene Understanding, Semantic Segmentation
Published 2018-08-07
URL http://arxiv.org/abs/1808.02201v1
PDF http://arxiv.org/pdf/1808.02201v1.pdf
PWC https://paperswithcode.com/paper/holistic-3d-scene-parsing-and-reconstruction
Repo https://github.com/thusiyuan/holistic_scene_parsing
Framework none

Tract orientation mapping for bundle-specific tractography

Title Tract orientation mapping for bundle-specific tractography
Authors Jakob Wasserthal, Peter F. Neher, Klaus H. Maier-Hein
Abstract While the major white matter tracts are of great interest to numerous studies in neuroscience and medicine, their manual dissection in larger cohorts from diffusion MRI tractograms is time-consuming, requires expert knowledge and is hard to reproduce. Tract orientation mapping (TOM) is a novel concept that facilitates bundle-specific tractography based on a learned mapping from the original fiber orientation distribution function (fODF) peaks to a list of tract orientation maps (also abbr. TOM). Each TOM represents one of the known tracts with each voxel containing no more than one orientation vector. TOMs can act as a prior or even as direct input for tractography. We use an encoder-decoder fully-convolutional neural network architecture to learn the required mapping. In comparison to previous concepts for the reconstruction of specific bundles, the presented one avoids various cumbersome processing steps like whole brain tractography, atlas registration or clustering. We compare it to four state of the art bundle recognition methods on 20 different bundles in a total of 105 subjects from the Human Connectome Project. Results are anatomically convincing even for difficult tracts, while reaching low angular errors, unprecedented runtimes and top accuracy values (Dice). Our code and our data are openly available.
Tasks
Published 2018-06-14
URL http://arxiv.org/abs/1806.05580v1
PDF http://arxiv.org/pdf/1806.05580v1.pdf
PWC https://paperswithcode.com/paper/tract-orientation-mapping-for-bundle-specific
Repo https://github.com/MIC-DKFZ/TractSeg
Framework pytorch
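
The Dice score cited in the abstract is conventionally defined as 2|A ∩ B| / (|A| + |B|) for two binary masks. The sketch below implements that standard formula on flat 0/1 sequences; it is not code from TractSeg, and returning 1.0 when both masks are empty is an assumed convention.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice overlap of two equally sized binary masks given as flat 0/1 sequences."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    size_sum = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if size_sum == 0 else 2.0 * intersection / size_sum


# Hypothetical six-voxel masks for one bundle.
predicted = [1, 1, 0, 0, 1, 0]
reference = [1, 0, 0, 0, 1, 1]
score = dice_coefficient(predicted, reference)
```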