Paper Group AWR 96
Self-supervised Knowledge Distillation Using Singular Value Decomposition
Title | Self-supervised Knowledge Distillation Using Singular Value Decomposition |
Authors | Seung Hyun Lee, Dae Ha Kim, Byung Cheol Song |
Abstract | To address the huge training datasets and high computational cost of deep neural networks (DNNs), the so-called teacher-student (T-S) DNN, which transfers the knowledge of a T-DNN to an S-DNN, has been proposed. However, existing T-S DNNs have a limited range of use, and the knowledge of the T-DNN is insufficiently transferred to the S-DNN. To improve the quality of the knowledge transferred from the T-DNN, we propose a new knowledge distillation method using singular value decomposition (SVD). In addition, we define knowledge transfer as a self-supervised task and suggest a way to continuously receive information from the T-DNN. Simulation results show that an S-DNN with a computational cost of 1/5 of the T-DNN can be up to 1.1% better than the T-DNN in terms of classification accuracy. Also, assuming the same computational cost, our S-DNN outperforms the S-DNN driven by the state-of-the-art distillation with a performance advantage of 1.79%. Code is available at https://github.com/sseung0703/SSKD_SVD. |
Tasks | Transfer Learning |
Published | 2018-07-18 |
URL | http://arxiv.org/abs/1807.06819v1 |
http://arxiv.org/pdf/1807.06819v1.pdf | |
PWC | https://paperswithcode.com/paper/self-supervised-knowledge-distillation-using |
Repo | https://github.com/wnma3mz/KD_Notes |
Framework | tf |
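As a rough illustration of the SVD step mentioned in the abstract above, the sketch below compresses a feature map to its leading singular components with NumPy. It is not the authors' implementation: the function name `truncated_svd_features`, the `(H, W, C)` layout, and the choice of `k` are assumptions made for this example only.

```python
import numpy as np

def truncated_svd_features(feature_map, k):
    """Return the k leading singular components of an (H, W, C) feature map.

    Hypothetical helper: the layout and the flattening scheme are assumptions.
    """
    h, w, c = feature_map.shape
    flat = feature_map.reshape(h * w, c)  # rows = spatial positions, cols = channels
    u, s, vt = np.linalg.svd(flat, full_matrices=False)
    return u[:, :k], s[:k], vt[:k, :]

rng = np.random.default_rng(0)
fmap = rng.standard_normal((8, 8, 16))   # stand-in for a teacher feature map
u, s, vt = truncated_svd_features(fmap, k=4)

# Rank-4 reconstruction of the flattened map and its relative error.
flat = fmap.reshape(64, 16)
approx = (u * s) @ vt
rel_err = np.linalg.norm(flat - approx) / np.linalg.norm(flat)
```

The compressed triple `(u, s, vt)` is much smaller than the full map, which is the general reason a low-rank summary is attractive as a distillation target.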
Human Motion Prediction via Spatio-Temporal Inpainting
Title | Human Motion Prediction via Spatio-Temporal Inpainting |
Authors | Alejandro Hernandez Ruiz, Juergen Gall, Francesc Moreno-Noguer |
Abstract | We propose a Generative Adversarial Network (GAN) to forecast 3D human motion given a sequence of past 3D skeleton poses. While recent GANs have shown promising results, they can only forecast plausible motion over relatively short periods of time (a few hundred milliseconds) and typically ignore the absolute position of the skeleton w.r.t. the camera. Our scheme provides long-term predictions (two seconds or more) for both the body pose and its absolute position. Our approach builds upon three main contributions. First, we represent the data using a spatio-temporal tensor of 3D skeleton coordinates which allows formulating the prediction problem as an inpainting one, for which GANs work particularly well. Secondly, we design an architecture to learn the joint distribution of body poses and global motion, capable of hypothesizing large chunks of the input 3D tensor with missing data. And finally, we argue that the L2 metric, considered so far by most approaches, fails to capture the actual distribution of long-term human motion. We propose two alternative metrics, based on the distribution of frequencies, that are able to capture more realistic motion patterns. Extensive experiments demonstrate that our approach significantly improves the state of the art, while also handling situations in which past observations are corrupted by occlusions, noise and missing frames. |
Tasks | Motion Forecasting, motion prediction |
Published | 2018-12-13 |
URL | https://arxiv.org/abs/1812.05478v2 |
https://arxiv.org/pdf/1812.05478v2.pdf | |
PWC | https://paperswithcode.com/paper/human-motion-prediction-via-spatio-temporal |
Repo | https://github.com/magnux/MotionGAN |
Framework | tf |
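The inpainting formulation in the abstract above can be pictured as masking the future part of a spatio-temporal tensor and asking a model to fill it in. The sketch below only builds such a masked input; the `(frames, joints, 3)` layout and the name `make_inpainting_input` are assumptions for illustration, not details taken from the paper or its repository.

```python
import numpy as np

def make_inpainting_input(sequence, n_observed):
    """Zero out every frame after the first n_observed ones.

    sequence: array of shape (frames, joints, 3) holding 3D joint coordinates
    (an assumed layout). Returns the masked sequence and the binary mask.
    """
    mask = np.zeros(sequence.shape, dtype=sequence.dtype)
    mask[:n_observed] = 1.0          # 1 = observed past, 0 = frames to predict
    return sequence * mask, mask

rng = np.random.default_rng(0)
poses = rng.standard_normal((50, 17, 3))   # 50 frames, 17 joints (hypothetical)
masked, mask = make_inpainting_input(poses, n_observed=20)
```

Occlusions or dropped frames in the past could be expressed with the same mask by zeroing additional entries, which is one reason the inpainting view is convenient.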
Real-world Noisy Image Denoising: A New Benchmark
Title | Real-world Noisy Image Denoising: A New Benchmark |
Authors | Jun Xu, Hui Li, Zhetong Liang, David Zhang, Lei Zhang |
Abstract | Most previous image denoising methods focus on additive white Gaussian noise (AWGN). However, the real-world noisy image denoising problem has become more prominent with the advance of computer vision techniques. In order to promote the study on this problem while implementing the concurrent real-world image denoising datasets, we construct a new benchmark dataset which contains comprehensive real-world noisy images of different natural scenes. These images are captured by different cameras under different camera settings. We evaluate the different denoising methods on our new dataset as well as previous datasets. Extensive experimental results demonstrate that the recently proposed methods designed specifically for realistic noise removal based on sparse or low rank theories achieve better denoising performance and are more robust than other competing methods, and the newly proposed dataset is more challenging. The constructed dataset of real photographs is publicly available at https://github.com/csjunxu/PolyUDataset for researchers to investigate new real-world image denoising methods. We will add more analysis on the noise statistics in the real photographs of our new dataset in the next version of this article. |
Tasks | Denoising, Image Denoising |
Published | 2018-04-07 |
URL | http://arxiv.org/abs/1804.02603v1 |
http://arxiv.org/pdf/1804.02603v1.pdf | |
PWC | https://paperswithcode.com/paper/real-world-noisy-image-denoising-a-new |
Repo | https://github.com/csjunxu/PolyU-Real-World-Noisy-Images-Dataset |
Framework | none |
Bi-Directional Lattice Recurrent Neural Networks for Confidence Estimation
Title | Bi-Directional Lattice Recurrent Neural Networks for Confidence Estimation |
Authors | Qiujia Li, Preben Ness, Anton Ragni, Mark Gales |
Abstract | The standard approach to mitigate errors made by an automatic speech recognition system is to use confidence scores associated with each predicted word. In the simplest case, these scores are word posterior probabilities whilst more complex schemes utilise bi-directional recurrent neural network (BiRNN) models. A number of upstream and downstream applications, however, rely on confidence scores assigned not only to 1-best hypotheses but to all words found in confusion networks or lattices. These include but are not limited to speaker adaptation, semi-supervised training and information retrieval. Although word posteriors could be used in those applications as confidence scores, they are known to have reliability issues. To make improved confidence scores more generally available, this paper shows how BiRNNs can be extended from 1-best sequences to confusion network and lattice structures. Experiments are conducted using one of the Cambridge University submissions to the IARPA OpenKWS 2016 competition. The results show that confusion network and lattice-based BiRNNs can provide a significant improvement in confidence estimation. |
Tasks | Information Retrieval, Speech Recognition |
Published | 2018-10-30 |
URL | http://arxiv.org/abs/1810.13024v2 |
http://arxiv.org/pdf/1810.13024v2.pdf | |
PWC | https://paperswithcode.com/paper/bi-directional-lattice-recurrent-neural |
Repo | https://github.com/alecokas/lattice_rnn |
Framework | pytorch |
A Survey on Compiler Autotuning using Machine Learning
Title | A Survey on Compiler Autotuning using Machine Learning |
Authors | Amir H. Ashouri, William Killian, John Cavazos, Gianluca Palermo, Cristina Silvano |
Abstract | Since the mid-1990s, researchers have been trying to use machine-learning based approaches to solve a number of different compiler optimization problems. These techniques primarily enhance the quality of the obtained results and, more importantly, make it feasible to tackle two main compiler optimization problems: optimization selection (choosing which optimizations to apply) and phase-ordering (choosing the order of applying optimizations). The compiler optimization space continues to grow due to the advancement of applications, increasing number of compiler optimizations, and new target architectures. Generic optimization passes in compilers cannot fully leverage newly introduced optimizations and, therefore, cannot keep up with the pace of increasing options. This survey summarizes and classifies the recent advances in using machine learning for the compiler optimization field, particularly on the two major problems of (1) selecting the best optimizations and (2) the phase-ordering of optimizations. The survey highlights the approaches taken so far, the obtained results, the fine-grain classification among different approaches and finally, the influential papers of the field. |
Tasks | |
Published | 2018-01-13 |
URL | http://arxiv.org/abs/1801.04405v5 |
http://arxiv.org/pdf/1801.04405v5.pdf | |
PWC | https://paperswithcode.com/paper/a-survey-on-compiler-autotuning-using-machine |
Repo | https://github.com/quepas/ReadingPublications |
Framework | none |
Stochastic Adaptive Neural Architecture Search for Keyword Spotting
Title | Stochastic Adaptive Neural Architecture Search for Keyword Spotting |
Authors | Tom Véniat, Olivier Schwander, Ludovic Denoyer |
Abstract | The problem of keyword spotting i.e. identifying keywords in a real-time audio stream is mainly solved by applying a neural network over successive sliding windows. Due to the difficulty of the task, baseline models are usually large, resulting in a high computational cost and energy consumption level. We propose a new method called SANAS (Stochastic Adaptive Neural Architecture Search) which is able to adapt the architecture of the neural network on-the-fly at inference time such that small architectures will be used when the stream is easy to process (silence, low noise, …) and bigger networks will be used when the task becomes more difficult. We show that this adaptive model can be learned end-to-end by optimizing a trade-off between the prediction performance and the average computational cost per unit of time. Experiments on the Speech Commands dataset show that this approach leads to a high recognition level while being much faster (and/or energy saving) than classical approaches where the network architecture is static. |
Tasks | Keyword Spotting, Neural Architecture Search |
Published | 2018-11-16 |
URL | http://arxiv.org/abs/1811.06753v1 |
http://arxiv.org/pdf/1811.06753v1.pdf | |
PWC | https://paperswithcode.com/paper/stochastic-adaptive-neural-architecture |
Repo | https://github.com/TomVeniat/SANAS |
Framework | pytorch |
Rule Induction Partitioning Estimator
Title | Rule Induction Partitioning Estimator |
Authors | Vincent Margot, Jean-Patrick Baudry, Frederic Guilloux, Olivier Wintenberger |
Abstract | RIPE is a novel deterministic and easily understandable prediction algorithm developed for continuous and discrete ordered data. It infers a model, from a sample, to predict and to explain a real variable $Y$ given an input variable $X \in \mathcal X$ (features). The algorithm extracts a sparse set of hyperrectangles $\mathbf r \subset \mathcal X$, which can be thought of as rules of the form If-Then. This set is then turned into a partition of the feature space $\mathcal X$ in which each cell is explained by the list of rules whose If conditions are satisfied. The process of RIPE is illustrated on simulated datasets and its efficiency compared with that of other usual algorithms. |
Tasks | |
Published | 2018-07-12 |
URL | http://arxiv.org/abs/1807.04602v1 |
http://arxiv.org/pdf/1807.04602v1.pdf | |
PWC | https://paperswithcode.com/paper/rule-induction-partitioning-estimator |
Repo | https://github.com/VMargot/RIPE |
Framework | none |
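To make the idea of hyperrectangles as If-Then rules concrete, the sketch below checks which rules a point satisfies. This is only a toy reading of the abstract above: the rule names, the bounds, and the helper `satisfied_rules` are hypothetical and do not come from the RIPE package.

```python
def satisfied_rules(x, rules):
    """Return the names of all rules whose hyperrectangle contains x.

    rules maps a name to (lower_bounds, upper_bounds); a rule fires when every
    coordinate of x lies within its bounds (the "If" condition).
    """
    return [
        name
        for name, (lower, upper) in rules.items()
        if all(lo <= xi <= hi for xi, lo, hi in zip(x, lower, upper))
    ]

# Two overlapping hypothetical rules on a 2D feature space.
rules = {
    "r1": ((0.0, 0.0), (1.0, 1.0)),
    "r2": ((0.5, 0.5), (2.0, 2.0)),
}

both = satisfied_rules((0.75, 0.75), rules)
only_first = satisfied_rules((0.25, 0.25), rules)
none = satisfied_rules((3.0, 3.0), rules)
```

Points that activate the same list of rules fall into the same cell, which is how a set of overlapping rectangles induces a partition of the feature space.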
Disc-aware Ensemble Network for Glaucoma Screening from Fundus Image
Title | Disc-aware Ensemble Network for Glaucoma Screening from Fundus Image |
Authors | Huazhu Fu, Jun Cheng, Yanwu Xu, Changqing Zhang, Damon Wing Kee Wong, Jiang Liu, Xiaochun Cao |
Abstract | Glaucoma is a chronic eye disease that leads to irreversible vision loss. Most of the existing automatic screening methods firstly segment the main structure, and subsequently calculate the clinical measurement for detection and screening of glaucoma. However, these measurement-based methods rely heavily on the segmentation accuracy, and ignore various visual features. In this paper, we introduce a deep learning technique to gain additional image-relevant information, and screen glaucoma from the fundus image directly. Specifically, a novel Disc-aware Ensemble Network (DENet) for automatic glaucoma screening is proposed, which integrates the deep hierarchical context of the global fundus image and the local optic disc region. Four deep streams on different levels and modules are respectively considered as global image stream, segmentation-guided network, local disc region stream, and disc polar transformation stream. Finally, the output probabilities of different streams are fused as the final screening result. The experiments on two glaucoma datasets (SCES and new SINDI datasets) show our method outperforms other state-of-the-art algorithms. |
Tasks | |
Published | 2018-05-19 |
URL | http://arxiv.org/abs/1805.07549v1 |
http://arxiv.org/pdf/1805.07549v1.pdf | |
PWC | https://paperswithcode.com/paper/disc-aware-ensemble-network-for-glaucoma |
Repo | https://github.com/HzFu/DENet_GlaucomaScreen |
Framework | tf |
Improving MMD-GAN Training with Repulsive Loss Function
Title | Improving MMD-GAN Training with Repulsive Loss Function |
Authors | Wei Wang, Yuan Sun, Saman Halgamuge |
Abstract | Generative adversarial nets (GANs) are widely used to learn the data sampling process and their performance may heavily depend on the loss functions, given a limited computational budget. This study revisits MMD-GAN that uses the maximum mean discrepancy (MMD) as the loss function for GAN and makes two contributions. First, we argue that the existing MMD loss function may discourage the learning of fine details in data as it attempts to contract the discriminator outputs of real data. To address this issue, we propose a repulsive loss function to actively learn the difference among the real data by simply rearranging the terms in MMD. Second, inspired by the hinge loss, we propose a bounded Gaussian kernel to stabilize the training of MMD-GAN with the repulsive loss function. The proposed methods are applied to the unsupervised image generation tasks on CIFAR-10, STL-10, CelebA, and LSUN bedroom datasets. Results show that the repulsive loss function significantly improves over the MMD loss at no additional computational cost and outperforms other representative loss functions. The proposed methods achieve an FID score of 16.21 on the CIFAR-10 dataset using a single DCGAN network and spectral normalization. |
Tasks | Image Generation |
Published | 2018-12-24 |
URL | http://arxiv.org/abs/1812.09916v4 |
http://arxiv.org/pdf/1812.09916v4.pdf | |
PWC | https://paperswithcode.com/paper/improving-mmd-gan-training-with-repulsive |
Repo | https://github.com/richardwth/MMD-GAN |
Framework | tf |
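For readers unfamiliar with the maximum mean discrepancy used in the abstract above, the following sketch computes a biased estimate of squared MMD with a plain Gaussian kernel and exposes its three terms separately. The abstract states that the repulsive loss rearranges these terms; the exact rearrangement and the bounded kernel are defined in the paper and are not reproduced here. Function names and the bandwidth `sigma` are assumptions for this example.

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    """Pairwise Gaussian kernel matrix between rows of a and rows of b."""
    sq_dist = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def mmd_terms(x, y, sigma=1.0):
    """Mean kernel values within x, within y, and across x and y."""
    k_xx = gaussian_kernel(x, x, sigma).mean()
    k_yy = gaussian_kernel(y, y, sigma).mean()
    k_xy = gaussian_kernel(x, y, sigma).mean()
    return k_xx, k_yy, k_xy

def mmd2(x, y, sigma=1.0):
    """Biased (V-statistic) estimate of squared MMD."""
    k_xx, k_yy, k_xy = mmd_terms(x, y, sigma)
    return k_xx + k_yy - 2.0 * k_xy

rng = np.random.default_rng(0)
real = rng.standard_normal((32, 4))   # stand-in for discriminator outputs on real data
fake = real + 3.0                     # a clearly shifted "generated" sample
same = mmd2(real, real)
shifted = mmd2(real, fake)
```

The estimate is zero when both samples coincide and grows as the two samples separate.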
A Probabilistic Disease Progression Model for Predicting Future Clinical Outcome
Title | A Probabilistic Disease Progression Model for Predicting Future Clinical Outcome |
Authors | Yingying Zhu, Mert R. Sabuncu |
Abstract | In this work, we consider the problem of predicting the course of a progressive disease, such as cancer or Alzheimer’s. Progressive diseases often start with mild symptoms that might precede a diagnosis, and each patient follows their own trajectory. Patient trajectories exhibit wild variability, which can be associated with many factors such as genotype, age, or sex. An additional layer of complexity is that, in real life, the amount and type of data available for each patient can differ significantly. For example, for one patient we might have no prior history, whereas for another patient we might have detailed clinical assessments obtained at multiple prior time-points. This paper presents a probabilistic model that can handle multiple modalities (including images and clinical assessments) and variable patient histories with irregular timings and missing entries, to predict clinical scores at future time-points. We use a sigmoidal function to model latent disease progression, which gives rise to clinical observations in our generative model. We implemented an approximate Bayesian inference strategy on the proposed model to estimate the parameters on data from a large population of subjects. Furthermore, the Bayesian framework enables the model to automatically fine-tune its predictions based on historical observations that might be available on the test subject. We applied our method to a longitudinal Alzheimer’s disease dataset with more than 3000 subjects [23] and present a detailed empirical analysis of prediction performance under different scenarios, with comparisons against several benchmarks. We also demonstrate how the proposed model can be interrogated to glean insights about temporal dynamics in Alzheimer’s disease. |
Tasks | Bayesian Inference |
Published | 2018-03-13 |
URL | http://arxiv.org/abs/1803.05011v1 |
http://arxiv.org/pdf/1803.05011v1.pdf | |
PWC | https://paperswithcode.com/paper/a-probabilistic-disease-progression-model-for |
Repo | https://github.com/zyy123jy/kdd |
Framework | tf |
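The abstract above mentions a sigmoidal function for latent disease progression. A minimal sketch of such a curve is shown below; the parameter names `slope` and `onset` and their values are assumptions for illustration, and the paper's actual parameterization may differ.

```python
import math

def progression(t, slope=1.0, onset=0.0):
    """Sigmoidal latent progression score in (0, 1) at time t.

    slope controls how fast the disease progresses; onset is the time at
    which the score reaches 0.5. Both are hypothetical parameters.
    """
    return 1.0 / (1.0 + math.exp(-slope * (t - onset)))

# A hypothetical subject whose midpoint of progression is at year 5.
scores = [progression(t, slope=0.8, onset=5.0) for t in range(0, 11)]
```

Subject-specific values of such parameters are one simple way to express that each patient follows their own trajectory along a shared curve shape.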
Efficient Gaussian Process Classification Using Polya-Gamma Data Augmentation
Title | Efficient Gaussian Process Classification Using Polya-Gamma Data Augmentation |
Authors | Florian Wenzel, Theo Galy-Fajou, Christian Donner, Marius Kloft, Manfred Opper |
Abstract | We propose a scalable stochastic variational approach to GP classification building on Polya-Gamma data augmentation and inducing points. Unlike former approaches, we obtain closed-form updates based on natural gradients that lead to efficient optimization. We evaluate the algorithm on real-world datasets containing up to 11 million data points and demonstrate that it is up to two orders of magnitude faster than the state-of-the-art while being competitive in terms of prediction performance. |
Tasks | Data Augmentation |
Published | 2018-02-18 |
URL | http://arxiv.org/abs/1802.06383v2 |
http://arxiv.org/pdf/1802.06383v2.pdf | |
PWC | https://paperswithcode.com/paper/efficient-gaussian-process-classification |
Repo | https://github.com/UnofficialJuliaMirrorSnapshots/AugmentedGaussianProcesses.jl-38eea1fd-7d7d-5162-9d08-f89d0f2e271e |
Framework | none |
State of the Art Optical Character Recognition of 19th Century Fraktur Scripts using Open Source Engines
Title | State of the Art Optical Character Recognition of 19th Century Fraktur Scripts using Open Source Engines |
Authors | Christian Reul, Uwe Springmann, Christoph Wick, Frank Puppe |
Abstract | In this paper we evaluate Optical Character Recognition (OCR) of 19th century Fraktur scripts without book-specific training using mixed models, i.e. models trained to recognize a variety of fonts and typesets from previously unseen sources. We describe the training process leading to strong mixed OCR models and compare them to freely available models of the popular open source engines OCRopus and Tesseract as well as the commercial state-of-the-art system ABBYY. For evaluation, we use a varied collection of unseen data from books, journals, and a dictionary from the 19th century. The experiments show that training mixed models with real data is superior to training with synthetic data and that the novel OCR engine Calamari outperforms the other engines considerably, on average reducing ABBYY's character error rate (CER) by over 70%, resulting in an average CER below 1%. |
Tasks | Optical Character Recognition |
Published | 2018-10-08 |
URL | http://arxiv.org/abs/1810.03436v1 |
http://arxiv.org/pdf/1810.03436v1.pdf | |
PWC | https://paperswithcode.com/paper/state-of-the-art-optical-character |
Repo | https://github.com/chreul/19th-century-fraktur-OCR |
Framework | none |
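The character error rate (CER) reported in the abstract above is commonly computed as the edit distance between the OCR output and the ground truth, divided by the ground-truth length. The sketch below follows that common definition; it is not taken from the paper's evaluation scripts, and the example strings are made up.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = curr
    return prev[-1]

def cer(ref, hyp):
    """Character error rate of hyp against a non-empty reference."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)

# One wrong character in a seven-character reference.
example = cer("Fraktur", "Fraktvr")
```

Under this definition, reducing a baseline CER by 70% means the new CER is 0.3 times the baseline value.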
Ground Truth for training OCR engines on historical documents in German Fraktur and Early Modern Latin
Title | Ground Truth for training OCR engines on historical documents in German Fraktur and Early Modern Latin |
Authors | Uwe Springmann, Christian Reul, Stefanie Dipper, Johannes Baiter |
Abstract | In this paper we describe a dataset of German and Latin ground truth (GT) for historical OCR in the form of printed text line images paired with their transcription. This dataset, called GT4HistOCR, consists of 313,173 line pairs covering a wide period of printing dates from incunabula from the 15th century to 19th century books printed in Fraktur types and is openly available under a CC-BY 4.0 license. The special form of GT as line image/transcription pairs makes it directly usable to train state-of-the-art recognition models for OCR software employing recurrent neural networks with an LSTM architecture such as Tesseract 4 or OCRopus. We also provide some pretrained OCRopus models for subcorpora of our dataset yielding between 95% (early printings) and 98% (19th century Fraktur printings) character accuracy rates on unseen test cases, a Perl script to harmonize GT produced by different transcription rules, and give hints on how to construct GT for OCR purposes which has requirements that may differ from linguistically motivated transcriptions. |
Tasks | Optical Character Recognition |
Published | 2018-09-14 |
URL | http://arxiv.org/abs/1809.05501v1 |
http://arxiv.org/pdf/1809.05501v1.pdf | |
PWC | https://paperswithcode.com/paper/ground-truth-for-training-ocr-engines-on |
Repo | https://github.com/chreul/19th-century-fraktur-OCR |
Framework | none |
Holistic 3D Scene Parsing and Reconstruction from a Single RGB Image
Title | Holistic 3D Scene Parsing and Reconstruction from a Single RGB Image |
Authors | Siyuan Huang, Siyuan Qi, Yixin Zhu, Yinxue Xiao, Yuanlu Xu, Song-Chun Zhu |
Abstract | We propose a computational framework to jointly parse a single RGB image and reconstruct a holistic 3D configuration composed of a set of CAD models using a stochastic grammar model. Specifically, we introduce a Holistic Scene Grammar (HSG) to represent the 3D scene structure, which characterizes a joint distribution over the functional and geometric space of indoor scenes. The proposed HSG captures three essential and often latent dimensions of the indoor scenes: i) latent human context, describing the affordance and the functionality of a room arrangement, ii) geometric constraints over the scene configurations, and iii) physical constraints that guarantee physically plausible parsing and reconstruction. We solve this joint parsing and reconstruction problem in an analysis-by-synthesis fashion, seeking to minimize the differences between the input image and the rendered images generated by our 3D representation, over the space of depth, surface normal, and object segmentation map. The optimal configuration, represented by a parse graph, is inferred using Markov chain Monte Carlo (MCMC), which efficiently traverses through the non-differentiable solution space, jointly optimizing object localization, 3D layout, and hidden human context. Experimental results demonstrate that the proposed algorithm improves the generalization ability and significantly outperforms prior methods on 3D layout estimation, 3D object detection, and holistic scene understanding. |
Tasks | 3D Object Detection, Object Detection, Object Localization, Scene Parsing, Scene Understanding, Semantic Segmentation |
Published | 2018-08-07 |
URL | http://arxiv.org/abs/1808.02201v1 |
http://arxiv.org/pdf/1808.02201v1.pdf | |
PWC | https://paperswithcode.com/paper/holistic-3d-scene-parsing-and-reconstruction |
Repo | https://github.com/thusiyuan/holistic_scene_parsing |
Framework | none |
Tract orientation mapping for bundle-specific tractography
Title | Tract orientation mapping for bundle-specific tractography |
Authors | Jakob Wasserthal, Peter F. Neher, Klaus H. Maier-Hein |
Abstract | While the major white matter tracts are of great interest to numerous studies in neuroscience and medicine, their manual dissection in larger cohorts from diffusion MRI tractograms is time-consuming, requires expert knowledge and is hard to reproduce. Tract orientation mapping (TOM) is a novel concept that facilitates bundle-specific tractography based on a learned mapping from the original fiber orientation distribution function (fODF) peaks to a list of tract orientation maps (also abbr. TOM). Each TOM represents one of the known tracts with each voxel containing no more than one orientation vector. TOMs can act as a prior or even as direct input for tractography. We use an encoder-decoder fully-convolutional neural network architecture to learn the required mapping. In comparison to previous concepts for the reconstruction of specific bundles, the presented one avoids various cumbersome processing steps like whole brain tractography, atlas registration or clustering. We compare it to four state of the art bundle recognition methods on 20 different bundles in a total of 105 subjects from the Human Connectome Project. Results are anatomically convincing even for difficult tracts, while reaching low angular errors, unprecedented runtimes and top accuracy values (Dice). Our code and our data are openly available. |
Tasks | |
Published | 2018-06-14 |
URL | http://arxiv.org/abs/1806.05580v1 |
http://arxiv.org/pdf/1806.05580v1.pdf | |
PWC | https://paperswithcode.com/paper/tract-orientation-mapping-for-bundle-specific |
Repo | https://github.com/MIC-DKFZ/TractSeg |
Framework | pytorch |
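The Dice accuracy values mentioned in the abstract above refer to the standard overlap score between a predicted mask and a reference mask. The sketch below computes it for flat binary masks; the convention of returning 1.0 when both masks are empty is an assumption of this example, not something stated in the paper.

```python
def dice(pred, target):
    """Dice coefficient 2|A n B| / (|A| + |B|) for flat binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / total

# Toy voxel masks for one bundle: the prediction misses one reference voxel.
reference = [1, 1, 0, 0]
predicted = [1, 0, 0, 0]
score = dice(predicted, reference)
```

A score of 1.0 means the two masks are identical and 0.0 means they share no voxels.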