October 21, 2019

Paper Group AWR 96

Self-supervised Knowledge Distillation Using Singular Value Decomposition. Human Motion Prediction via Spatio-Temporal Inpainting. Real-world Noisy Image Denoising: A New Benchmark. Bi-Directional Lattice Recurrent Neural Networks for Confidence Estimation. A Survey on Compiler Autotuning using Machine Learning. Stochastic Adaptive Neural Architect …

Self-supervised Knowledge Distillation Using Singular Value Decomposition


Title	Self-supervised Knowledge Distillation Using Singular Value Decomposition
Authors	Seung Hyun Lee, Dae Ha Kim, Byung Cheol Song
Abstract	To address the huge training datasets and high computational cost of deep neural networks (DNNs), the so-called teacher-student (T-S) DNN, which transfers the knowledge of a T-DNN to an S-DNN, has been proposed. However, the existing T-S-DNN has a limited range of use, and the knowledge of the T-DNN is insufficiently transferred to the S-DNN. To improve the quality of the knowledge transferred from the T-DNN, we propose a new knowledge distillation method using singular value decomposition (SVD). In addition, we define knowledge transfer as a self-supervised task and suggest a way to continuously receive information from the T-DNN. Simulation results show that an S-DNN with 1/5 of the computational cost of the T-DNN can be up to 1.1% better than the T-DNN in terms of classification accuracy. Also, assuming the same computational cost, our S-DNN outperforms the S-DNN driven by the state-of-the-art distillation method by 1.79%. Code is available at https://github.com/sseung0703/SSKD_SVD.
Tasks	Transfer Learning
Published	2018-07-18
URL	http://arxiv.org/abs/1807.06819v1
PDF	http://arxiv.org/pdf/1807.06819v1.pdf
PWC	https://paperswithcode.com/paper/self-supervised-knowledge-distillation-using
Repo	https://github.com/wnma3mz/KD_Notes
Framework	tf
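
The abstract names SVD as the tool for compressing teacher knowledge. The following is a minimal sketch of that one step only, assuming a feature map of shape (H, W, C) that is flattened to an (H*W, C) matrix and truncated to its top-k singular components. The function name `truncated_svd_features` and the random input are illustrative assumptions, not the authors' implementation, and the self-supervised transfer step is not shown.

```python
import numpy as np


def truncated_svd_features(feature_map, k):
    """Return the top-k singular values and right singular vectors.

    feature_map: array of shape (H, W, C), flattened here to (H*W, C).
    """
    h, w, c = feature_map.shape
    matrix = feature_map.reshape(h * w, c)
    # full_matrices=False requests the economy-size decomposition.
    _, singular_values, vt = np.linalg.svd(matrix, full_matrices=False)
    return singular_values[:k], vt[:k]


# Hypothetical stand-in for a teacher feature map.
rng = np.random.default_rng(0)
fmap = rng.standard_normal((8, 8, 16))
sigma, v = truncated_svd_features(fmap, k=4)
print(sigma.shape, v.shape)  # (4,) (4, 16)
```

Keeping only k components reduces the spatial dimension H*W to a small fixed number of vectors, which is one plausible reason a decomposition like this suits distillation between networks of different sizes.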

Human Motion Prediction via Spatio-Temporal Inpainting


Title	Human Motion Prediction via Spatio-Temporal Inpainting
Authors	Alejandro Hernandez Ruiz, Juergen Gall, Francesc Moreno-Noguer
Abstract	We propose a Generative Adversarial Network (GAN) to forecast 3D human motion given a sequence of past 3D skeleton poses. While recent GANs have shown promising results, they can only forecast plausible motion over relatively short periods of time (a few hundred milliseconds) and typically ignore the absolute position of the skeleton w.r.t. the camera. Our scheme provides long-term predictions (two seconds or more) for both the body pose and its absolute position. Our approach builds upon three main contributions. First, we represent the data using a spatio-temporal tensor of 3D skeleton coordinates, which allows formulating the prediction problem as an inpainting one, for which GANs work particularly well. Second, we design an architecture to learn the joint distribution of body poses and global motion, capable of hypothesizing large chunks of the input 3D tensor with missing data. Finally, we argue that the L2 metric, considered so far by most approaches, fails to capture the actual distribution of long-term human motion. We propose two alternative metrics, based on the distribution of frequencies, that are able to capture more realistic motion patterns. Extensive experiments demonstrate that our approach significantly improves the state of the art, while also handling situations in which past observations are corrupted by occlusions, noise and missing frames.
Tasks	Motion Forecasting, motion prediction
Published	2018-12-13
URL	https://arxiv.org/abs/1812.05478v2
PDF	https://arxiv.org/pdf/1812.05478v2.pdf
PWC	https://paperswithcode.com/paper/human-motion-prediction-via-spatio-temporal
Repo	https://github.com/magnux/MotionGAN
Framework	tf

Real-world Noisy Image Denoising: A New Benchmark


Title	Real-world Noisy Image Denoising: A New Benchmark
Authors	Jun Xu, Hui Li, Zhetong Liang, David Zhang, Lei Zhang
Abstract	Most previous image denoising methods focus on additive white Gaussian noise (AWGN). However, the real-world noisy image denoising problem has grown in importance with the advance of computer vision techniques. In order to promote the study of this problem while complementing the current real-world image denoising datasets, we construct a new benchmark dataset which contains comprehensive real-world noisy images of different natural scenes. These images are captured by different cameras under different camera settings. We evaluate different denoising methods on our new dataset as well as on previous datasets. Extensive experimental results demonstrate that the recently proposed methods designed specifically for realistic noise removal based on sparse or low-rank theories achieve better denoising performance and are more robust than other competing methods, and that the newly proposed dataset is more challenging. The constructed dataset of real photographs is publicly available at https://github.com/csjunxu/PolyUDataset for researchers to investigate new real-world image denoising methods. We will add more analysis on the noise statistics in the real photographs of our new dataset in the next version of this article.
Tasks	Denoising, Image Denoising
Published	2018-04-07
URL	http://arxiv.org/abs/1804.02603v1
PDF	http://arxiv.org/pdf/1804.02603v1.pdf
PWC	https://paperswithcode.com/paper/real-world-noisy-image-denoising-a-new
Repo	https://github.com/csjunxu/PolyU-Real-World-Noisy-Images-Dataset
Framework	none

Bi-Directional Lattice Recurrent Neural Networks for Confidence Estimation


Title	Bi-Directional Lattice Recurrent Neural Networks for Confidence Estimation
Authors	Qiujia Li, Preben Ness, Anton Ragni, Mark Gales
Abstract	The standard approach to mitigate errors made by an automatic speech recognition system is to use confidence scores associated with each predicted word. In the simplest case, these scores are word posterior probabilities whilst more complex schemes utilise bi-directional recurrent neural network (BiRNN) models. A number of upstream and downstream applications, however, rely on confidence scores assigned not only to 1-best hypotheses but to all words found in confusion networks or lattices. These include but are not limited to speaker adaptation, semi-supervised training and information retrieval. Although word posteriors could be used in those applications as confidence scores, they are known to have reliability issues. To make improved confidence scores more generally available, this paper shows how BiRNNs can be extended from 1-best sequences to confusion network and lattice structures. Experiments are conducted using one of the Cambridge University submissions to the IARPA OpenKWS 2016 competition. The results show that confusion network and lattice-based BiRNNs can provide a significant improvement in confidence estimation.
Tasks	Information Retrieval, Speech Recognition
Published	2018-10-30
URL	http://arxiv.org/abs/1810.13024v2
PDF	http://arxiv.org/pdf/1810.13024v2.pdf
PWC	https://paperswithcode.com/paper/bi-directional-lattice-recurrent-neural
Repo	https://github.com/alecokas/lattice_rnn
Framework	pytorch

A Survey on Compiler Autotuning using Machine Learning


Title	A Survey on Compiler Autotuning using Machine Learning
Authors	Amir H. Ashouri, William Killian, John Cavazos, Gianluca Palermo, Cristina Silvano
Abstract	Since the mid-1990s, researchers have been trying to use machine-learning based approaches to solve a number of different compiler optimization problems. These techniques primarily enhance the quality of the obtained results and, more importantly, make it feasible to tackle two main compiler optimization problems: optimization selection (choosing which optimizations to apply) and phase-ordering (choosing the order of applying optimizations). The compiler optimization space continues to grow due to the advancement of applications, increasing number of compiler optimizations, and new target architectures. Generic optimization passes in compilers cannot fully leverage newly introduced optimizations and, therefore, cannot keep up with the pace of increasing options. This survey summarizes and classifies the recent advances in using machine learning for the compiler optimization field, particularly on the two major problems of (1) selecting the best optimizations and (2) the phase-ordering of optimizations. The survey highlights the approaches taken so far, the obtained results, the fine-grain classification among different approaches and finally, the influential papers of the field.
Tasks
Published	2018-01-13
URL	http://arxiv.org/abs/1801.04405v5
PDF	http://arxiv.org/pdf/1801.04405v5.pdf
PWC	https://paperswithcode.com/paper/a-survey-on-compiler-autotuning-using-machine
Repo	https://github.com/quepas/ReadingPublications
Framework	none
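
The two problems the survey distinguishes, optimization selection and phase-ordering, differ in the shape of their search spaces. The sketch below enumerates both for a toy list of passes. The pass names are hypothetical placeholders, and the ordering space assumes each pass is applied exactly once, which real compilers do not require.

```python
from itertools import permutations

# Hypothetical pass names, used only for illustration.
PASSES = ["inline", "unroll", "vectorize"]


def selection_space(passes):
    """Optimization selection: every on/off subset of the passes."""
    n = len(passes)
    return [
        [p for bit, p in enumerate(passes) if (mask >> bit) & 1]
        for mask in range(2 ** n)
    ]


def ordering_space(passes):
    """Phase-ordering: every order of a fixed set, each pass used once."""
    return [list(order) for order in permutations(passes)]


print(len(selection_space(PASSES)))  # 8 subsets, i.e. 2**3
print(len(ordering_space(PASSES)))   # 6 orderings, i.e. 3!
```

Even under these simplifying assumptions the spaces grow as 2^n and n!, so exhaustive search quickly becomes impractical as the number of passes increases.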

Stochastic Adaptive Neural Architecture Search for Keyword Spotting


Title	Stochastic Adaptive Neural Architecture Search for Keyword Spotting
Authors	Tom Véniat, Olivier Schwander, Ludovic Denoyer
Abstract	The problem of keyword spotting, i.e., identifying keywords in a real-time audio stream, is mainly solved by applying a neural network over successive sliding windows. Due to the difficulty of the task, baseline models are usually large, resulting in a high computational cost and energy consumption level. We propose a new method called SANAS (Stochastic Adaptive Neural Architecture Search) which is able to adapt the architecture of the neural network on-the-fly at inference time such that small architectures will be used when the stream is easy to process (silence, low noise, …) and bigger networks will be used when the task becomes more difficult. We show that this adaptive model can be learned end-to-end by optimizing a trade-off between the prediction performance and the average computational cost per unit of time. Experiments on the Speech Commands dataset show that this approach leads to a high recognition level while being much faster (and/or energy saving) than classical approaches where the network architecture is static.
Tasks	Keyword Spotting, Neural Architecture Search
Published	2018-11-16
URL	http://arxiv.org/abs/1811.06753v1
PDF	http://arxiv.org/pdf/1811.06753v1.pdf
PWC	https://paperswithcode.com/paper/stochastic-adaptive-neural-architecture
Repo	https://github.com/TomVeniat/SANAS
Framework	pytorch
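
The trade-off between prediction performance and average computational cost mentioned in the abstract can be written as a single scalar objective. This is a sketch under assumptions: the additive form, the name `tradeoff_objective`, and the weight `cost_weight` are illustrative and are not taken from the paper.

```python
def tradeoff_objective(prediction_losses, compute_costs, cost_weight):
    """Mean prediction loss plus a weighted mean compute cost per time step."""
    if len(prediction_losses) != len(compute_costs):
        raise ValueError("one loss and one cost are expected per time step")
    if not prediction_losses:
        raise ValueError("at least one time step is required")
    steps = len(prediction_losses)
    mean_loss = sum(prediction_losses) / steps
    mean_cost = sum(compute_costs) / steps
    return mean_loss + cost_weight * mean_cost


# Two time steps: a cheap architecture on an easy frame, a larger one on a hard frame.
objective = tradeoff_objective([0.2, 0.4], [1.0, 3.0], cost_weight=0.1)
```

A larger `cost_weight` pushes the optimum toward small architectures, while a weight of zero recovers plain loss minimization.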

Rule Induction Partitioning Estimator


Title	Rule Induction Partitioning Estimator
Authors	Vincent Margot, Jean-Patrick Baudry, Frederic Guilloux, Olivier Wintenberger
Abstract	RIPE is a novel deterministic and easily understandable prediction algorithm developed for continuous and discrete ordered data. It infers a model, from a sample, to predict and to explain a real variable $Y$ given an input variable $X \in \mathcal X$ (features). The algorithm extracts a sparse set of hyperrectangles $\mathbf r \subset \mathcal X$, which can be thought of as rules of the form If-Then. This set is then turned into a partition of the feature space $\mathcal X$ in which each cell is explained as a list of rules whose If conditions are satisfied. The process of RIPE is illustrated on simulated datasets and its efficiency is compared with that of other usual algorithms.
Tasks
Published	2018-07-12
URL	http://arxiv.org/abs/1807.04602v1
PDF	http://arxiv.org/pdf/1807.04602v1.pdf
PWC	https://paperswithcode.com/paper/rule-induction-partitioning-estimator
Repo	https://github.com/VMargot/RIPE
Framework	none
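
A hyperrectangle rule of the If-Then form described in the abstract can be sketched as a set of per-feature intervals. The representation below (a dict from feature index to an interval) and the choice of the mean of covered targets as the Then part are assumptions made for illustration; they do not reproduce the RIPE selection or partitioning procedure.

```python
def rule_covers(rule, x):
    """True if point x satisfies the If part, i.e. lies in the hyperrectangle.

    rule: dict mapping a feature index to a closed interval (low, high).
    """
    return all(low <= x[i] <= high for i, (low, high) in rule.items())


def rule_prediction(rule, xs, ys):
    """Then part: mean target over the covered sample points, or None."""
    covered = [y for x, y in zip(xs, ys) if rule_covers(rule, x)]
    return sum(covered) / len(covered) if covered else None


# Toy sample with two features per point.
xs = [(0.1, 5.0), (0.4, 7.0), (0.9, 6.0)]
ys = [1.0, 3.0, 10.0]
rule = {0: (0.0, 0.5)}  # "If feature 0 is in [0, 0.5]"
```

With this toy sample, `rule_prediction(rule, xs, ys)` averages the targets of the first two points, since only they fall inside the interval.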

Disc-aware Ensemble Network for Glaucoma Screening from Fundus Image


Title	Disc-aware Ensemble Network for Glaucoma Screening from Fundus Image
Authors	Huazhu Fu, Jun Cheng, Yanwu Xu, Changqing Zhang, Damon Wing Kee Wong, Jiang Liu, Xiaochun Cao
Abstract	Glaucoma is a chronic eye disease that leads to irreversible vision loss. Most of the existing automatic screening methods firstly segment the main structure, and subsequently calculate the clinical measurement for detection and screening of glaucoma. However, these measurement-based methods rely heavily on the segmentation accuracy, and ignore various visual features. In this paper, we introduce a deep learning technique to gain additional image-relevant information, and screen glaucoma from the fundus image directly. Specifically, a novel Disc-aware Ensemble Network (DENet) for automatic glaucoma screening is proposed, which integrates the deep hierarchical context of the global fundus image and the local optic disc region. Four deep streams on different levels and modules are respectively considered as global image stream, segmentation-guided network, local disc region stream, and disc polar transformation stream. Finally, the output probabilities of different streams are fused as the final screening result. The experiments on two glaucoma datasets (SCES and new SINDI datasets) show our method outperforms other state-of-the-art algorithms.
Tasks
Published	2018-05-19
URL	http://arxiv.org/abs/1805.07549v1
PDF	http://arxiv.org/pdf/1805.07549v1.pdf
PWC	https://paperswithcode.com/paper/disc-aware-ensemble-network-for-glaucoma
Repo	https://github.com/HzFu/DENet_GlaucomaScreen
Framework	tf

Improving MMD-GAN Training with Repulsive Loss Function


Title	Improving MMD-GAN Training with Repulsive Loss Function
Authors	Wei Wang, Yuan Sun, Saman Halgamuge
Abstract	Generative adversarial nets (GANs) are widely used to learn the data sampling process and their performance may heavily depend on the loss functions, given a limited computational budget. This study revisits MMD-GAN that uses the maximum mean discrepancy (MMD) as the loss function for GAN and makes two contributions. First, we argue that the existing MMD loss function may discourage the learning of fine details in data as it attempts to contract the discriminator outputs of real data. To address this issue, we propose a repulsive loss function to actively learn the difference among the real data by simply rearranging the terms in MMD. Second, inspired by the hinge loss, we propose a bounded Gaussian kernel to stabilize the training of MMD-GAN with the repulsive loss function. The proposed methods are applied to the unsupervised image generation tasks on CIFAR-10, STL-10, CelebA, and LSUN bedroom datasets. Results show that the repulsive loss function significantly improves over the MMD loss at no additional computational cost and outperforms other representative loss functions. The proposed methods achieve an FID score of 16.21 on the CIFAR-10 dataset using a single DCGAN network and spectral normalization.
Tasks	Image Generation
Published	2018-12-24
URL	http://arxiv.org/abs/1812.09916v4
PDF	http://arxiv.org/pdf/1812.09916v4.pdf
PWC	https://paperswithcode.com/paper/improving-mmd-gan-training-with-repulsive
Repo	https://github.com/richardwth/MMD-GAN
Framework	tf
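
To make the idea of rearranging the terms in MMD concrete, here is a small pure-Python sketch. It assumes a plain Gaussian kernel and a biased (V-statistic) estimate of squared MMD. The specific rearrangement in `repulsive_term` (real-real similarity minus generated-generated similarity) is one reading of the abstract and should be checked against the paper; the bounded Gaussian kernel is not shown.

```python
import math


def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    squared_distance = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def mean_kernel(xs, ys, sigma=1.0):
    """Average kernel value over all pairs (x, y)."""
    total = sum(gaussian_kernel(x, y, sigma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(real, fake, sigma=1.0):
    """Biased estimate of squared MMD between two samples."""
    return (
        mean_kernel(real, real, sigma)
        - 2.0 * mean_kernel(real, fake, sigma)
        + mean_kernel(fake, fake, sigma)
    )


def repulsive_term(real, fake, sigma=1.0):
    """Assumed rearrangement: real-real minus fake-fake similarity."""
    return mean_kernel(real, real, sigma) - mean_kernel(fake, fake, sigma)


# Toy discriminator outputs for real and generated samples.
real = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1)]
fake = [(2.0, 2.0), (2.1, 1.9), (1.8, 2.2)]
```

Both functions reuse the same pairwise kernel averages, which is consistent with the abstract's claim that the repulsive loss adds no computational cost over the MMD loss.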

A Probabilistic Disease Progression Model for Predicting Future Clinical Outcome


Title	A Probabilistic Disease Progression Model for Predicting Future Clinical Outcome
Authors	Yingying Zhu, Mert R. Sabuncu
Abstract	In this work, we consider the problem of predicting the course of a progressive disease, such as cancer or Alzheimer’s. Progressive diseases often start with mild symptoms that might precede a diagnosis, and each patient follows their own trajectory. Patient trajectories exhibit wild variability, which can be associated with many factors such as genotype, age, or sex. An additional layer of complexity is that, in real life, the amount and type of data available for each patient can differ significantly. For example, for one patient we might have no prior history, whereas for another patient we might have detailed clinical assessments obtained at multiple prior time-points. This paper presents a probabilistic model that can handle multiple modalities (including images and clinical assessments) and variable patient histories with irregular timings and missing entries, to predict clinical scores at future time-points. We use a sigmoidal function to model latent disease progression, which gives rise to clinical observations in our generative model. We implemented an approximate Bayesian inference strategy on the proposed model to estimate the parameters on data from a large population of subjects. Furthermore, the Bayesian framework enables the model to automatically fine-tune its predictions based on historical observations that might be available on the test subject. We applied our method to a longitudinal Alzheimer’s disease dataset with more than 3000 subjects [23] and present a detailed empirical analysis of prediction performance under different scenarios, with comparisons against several benchmarks. We also demonstrate how the proposed model can be interrogated to glean insights about temporal dynamics in Alzheimer’s disease.
Tasks	Bayesian Inference
Published	2018-03-13
URL	http://arxiv.org/abs/1803.05011v1
PDF	http://arxiv.org/pdf/1803.05011v1.pdf
PWC	https://paperswithcode.com/paper/a-probabilistic-disease-progression-model-for
Repo	https://github.com/zyy123jy/kdd
Framework	tf
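
The abstract states that latent disease progression is modeled with a sigmoidal function. A minimal sketch of such a curve follows; the parameter names (`onset`, `rate`, `lower`, `upper`) and this particular parameterization are assumptions for illustration, not the model's actual form.

```python
import math


def sigmoid_progression(t, onset, rate, lower=0.0, upper=1.0):
    """Latent severity at time t: flat early, steepest at onset, saturating late."""
    return lower + (upper - lower) / (1.0 + math.exp(-rate * (t - onset)))


# Hypothetical subject: midpoint of decline at 70 years, moderate progression rate.
trajectory = [sigmoid_progression(age, onset=70.0, rate=0.3) for age in range(60, 81, 5)]
```

A curve of this shape captures the slow early phase mentioned in the abstract, where mild symptoms may precede a diagnosis, followed by faster change and eventual saturation.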

Efficient Gaussian Process Classification Using Polya-Gamma Data Augmentation


Title	Efficient Gaussian Process Classification Using Polya-Gamma Data Augmentation
Authors	Florian Wenzel, Theo Galy-Fajou, Christian Donner, Marius Kloft, Manfred Opper
Abstract	We propose a scalable stochastic variational approach to GP classification building on Polya-Gamma data augmentation and inducing points. Unlike former approaches, we obtain closed-form updates based on natural gradients that lead to efficient optimization. We evaluate the algorithm on real-world datasets containing up to 11 million data points and demonstrate that it is up to two orders of magnitude faster than the state-of-the-art while being competitive in terms of prediction performance.
Tasks	Data Augmentation
Published	2018-02-18
URL	http://arxiv.org/abs/1802.06383v2
PDF	http://arxiv.org/pdf/1802.06383v2.pdf
PWC	https://paperswithcode.com/paper/efficient-gaussian-process-classification
Repo	https://github.com/UnofficialJuliaMirrorSnapshots/AugmentedGaussianProcesses.jl-38eea1fd-7d7d-5162-9d08-f89d0f2e271e
Framework	none

State of the Art Optical Character Recognition of 19th Century Fraktur Scripts using Open Source Engines


Title	State of the Art Optical Character Recognition of 19th Century Fraktur Scripts using Open Source Engines
Authors	Christian Reul, Uwe Springmann, Christoph Wick, Frank Puppe
Abstract	In this paper we evaluate Optical Character Recognition (OCR) of 19th century Fraktur scripts without book-specific training using mixed models, i.e. models trained to recognize a variety of fonts and typesets from previously unseen sources. We describe the training process leading to strong mixed OCR models and compare them to freely available models of the popular open source engines OCRopus and Tesseract as well as the commercial state-of-the-art system ABBYY. For evaluation, we use a varied collection of unseen data from books, journals, and a dictionary from the 19th century. The experiments show that training mixed models with real data is superior to training with synthetic data and that the novel OCR engine Calamari outperforms the other engines considerably, on average reducing ABBYY's character error rate (CER) by over 70%, resulting in an average CER below 1%.
Tasks	Optical Character Recognition
Published	2018-10-08
URL	http://arxiv.org/abs/1810.03436v1
PDF	http://arxiv.org/pdf/1810.03436v1.pdf
PWC	https://paperswithcode.com/paper/state-of-the-art-optical-character
Repo	https://github.com/chreul/19th-century-fraktur-OCR
Framework	none
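
The character error rate (CER) quoted in the abstract is commonly computed as the Levenshtein edit distance between the OCR output and the ground truth, divided by the ground-truth length. The sketch below follows that common definition; whether the paper normalizes in exactly this way is an assumption.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def character_error_rate(reference, hypothesis):
    """Edit distance normalized by the length of the reference transcription."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# One wrong character in a seven-character line gives a CER of 1/7.
cer = character_error_rate("Fraktur", "Fraktvr")
```

Because insertions count as errors, a CER computed this way can exceed 1.0 when the OCR output is much longer than the reference.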

Ground Truth for training OCR engines on historical documents in German Fraktur and Early Modern Latin


Title	Ground Truth for training OCR engines on historical documents in German Fraktur and Early Modern Latin
Authors	Uwe Springmann, Christian Reul, Stefanie Dipper, Johannes Baiter
Abstract	In this paper we describe a dataset of German and Latin ground truth (GT) for historical OCR in the form of printed text line images paired with their transcription. This dataset, called GT4HistOCR, consists of 313,173 line pairs covering a wide period of printing dates, from incunabula from the 15th century to 19th century books printed in Fraktur types, and is openly available under a CC-BY 4.0 license. The special form of GT as line image/transcription pairs makes it directly usable to train state-of-the-art recognition models for OCR software employing recurrent neural networks with an LSTM architecture, such as Tesseract 4 or OCRopus. We also provide some pretrained OCRopus models for subcorpora of our dataset yielding between 95% (early printings) and 98% (19th century Fraktur printings) character accuracy rates on unseen test cases, a Perl script to harmonize GT produced by different transcription rules, and give hints on how to construct GT for OCR purposes, which has requirements that may differ from linguistically motivated transcriptions.
Tasks	Optical Character Recognition
Published	2018-09-14
URL	http://arxiv.org/abs/1809.05501v1
PDF	http://arxiv.org/pdf/1809.05501v1.pdf
PWC	https://paperswithcode.com/paper/ground-truth-for-training-ocr-engines-on
Repo	https://github.com/chreul/19th-century-fraktur-OCR
Framework	none

Holistic 3D Scene Parsing and Reconstruction from a Single RGB Image


Title	Holistic 3D Scene Parsing and Reconstruction from a Single RGB Image
Authors	Siyuan Huang, Siyuan Qi, Yixin Zhu, Yinxue Xiao, Yuanlu Xu, Song-Chun Zhu
Abstract	We propose a computational framework to jointly parse a single RGB image and reconstruct a holistic 3D configuration composed of a set of CAD models using a stochastic grammar model. Specifically, we introduce a Holistic Scene Grammar (HSG) to represent the 3D scene structure, which characterizes a joint distribution over the functional and geometric space of indoor scenes. The proposed HSG captures three essential and often latent dimensions of indoor scenes: i) latent human context, describing the affordance and the functionality of a room arrangement, ii) geometric constraints over the scene configurations, and iii) physical constraints that guarantee physically plausible parsing and reconstruction. We solve this joint parsing and reconstruction problem in an analysis-by-synthesis fashion, seeking to minimize the differences between the input image and the rendered images generated by our 3D representation, over the space of depth, surface normal, and object segmentation map. The optimal configuration, represented by a parse graph, is inferred using Markov chain Monte Carlo (MCMC), which efficiently traverses the non-differentiable solution space, jointly optimizing object localization, 3D layout, and hidden human context. Experimental results demonstrate that the proposed algorithm improves the generalization ability and significantly outperforms prior methods on 3D layout estimation, 3D object detection, and holistic scene understanding.
Tasks	3D Object Detection, Object Detection, Object Localization, Scene Parsing, Scene Understanding, Semantic Segmentation
Published	2018-08-07
URL	http://arxiv.org/abs/1808.02201v1
PDF	http://arxiv.org/pdf/1808.02201v1.pdf
PWC	https://paperswithcode.com/paper/holistic-3d-scene-parsing-and-reconstruction
Repo	https://github.com/thusiyuan/holistic_scene_parsing
Framework	none

Tract orientation mapping for bundle-specific tractography


Title	Tract orientation mapping for bundle-specific tractography
Authors	Jakob Wasserthal, Peter F. Neher, Klaus H. Maier-Hein
Abstract	While the major white matter tracts are of great interest to numerous studies in neuroscience and medicine, their manual dissection in larger cohorts from diffusion MRI tractograms is time-consuming, requires expert knowledge and is hard to reproduce. Tract orientation mapping (TOM) is a novel concept that facilitates bundle-specific tractography based on a learned mapping from the original fiber orientation distribution function (fODF) peaks to a list of tract orientation maps (also abbr. TOM). Each TOM represents one of the known tracts with each voxel containing no more than one orientation vector. TOMs can act as a prior or even as direct input for tractography. We use an encoder-decoder fully-convolutional neural network architecture to learn the required mapping. In comparison to previous concepts for the reconstruction of specific bundles, the presented one avoids various cumbersome processing steps like whole brain tractography, atlas registration or clustering. We compare it to four state of the art bundle recognition methods on 20 different bundles in a total of 105 subjects from the Human Connectome Project. Results are anatomically convincing even for difficult tracts, while reaching low angular errors, unprecedented runtimes and top accuracy values (Dice). Our code and our data are openly available.
Tasks
Published	2018-06-14
URL	http://arxiv.org/abs/1806.05580v1
PDF	http://arxiv.org/pdf/1806.05580v1.pdf
PWC	https://paperswithcode.com/paper/tract-orientation-mapping-for-bundle-specific
Repo	https://github.com/MIC-DKFZ/TractSeg
Framework	pytorch
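
The accuracy values in the abstract are reported as Dice scores. A minimal sketch of the Dice coefficient between two binary masks follows, assuming each mask is given as a collection of voxel index tuples. The convention of returning 1.0 for two empty masks is a choice made here, not something stated in the source.

```python
def dice_score(predicted_voxels, reference_voxels):
    """Dice coefficient: 2 * |A ∩ B| / (|A| + |B|) for two voxel sets."""
    predicted = set(predicted_voxels)
    reference = set(reference_voxels)
    if not predicted and not reference:
        return 1.0  # both masks empty: treated as perfect agreement
    overlap = len(predicted & reference)
    return 2.0 * overlap / (len(predicted) + len(reference))


# Two toy bundle masks sharing one of their two voxels each.
predicted_mask = {(0, 0, 0), (0, 0, 1)}
reference_mask = {(0, 0, 1), (0, 1, 1)}
score = dice_score(predicted_mask, reference_mask)
```

The score ranges from 0.0 for disjoint masks to 1.0 for identical ones, so it rewards overlap while penalizing both missed and spurious voxels.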