Paper Group ANR 409
Analysis Of Momentum Methods. A Short Note on Concentration Inequalities for Random Vectors with SubGaussian Norm. Toward Extremely Low Bit and Lossless Accuracy in DNNs with Progressive ADMM. High-dimensional semi-supervised learning: in search for optimal inference of the mean. Quality analysis of DCGAN-generated mammography lesions. Learning Structured Twin-Incoherent Twin-Projective Latent Dictionary Pairs for Classification. Willump: A Statistically-Aware End-to-end Optimizer for Machine Learning Inference. Biologic and Prognostic Feature Scores from Whole-Slide Histology Images Using Deep Learning. A Constructive Prediction of the Generalization Error Across Scales. FPGA-based Binocular Image Feature Extraction and Matching System. Nearest Neighbor Sampling of Point Sets using Random Rays. SenseNet: Deep Learning based Wideband spectrum sensing and modulation classification network. Adaptive Noise Injection: A Structure-Expanding Regularization for RNN. Catastrophic forgetting: still a problem for DNNs. Predicting assisted ventilation in Amyotrophic Lateral Sclerosis using a mixture of experts and conformal predictors.
Analysis Of Momentum Methods
Title | Analysis Of Momentum Methods |
Authors | Nikola B. Kovachki, Andrew M. Stuart |
Abstract | Gradient descent-based optimization methods underpin the parameter training which results in the impressive results now found when testing neural networks. Introducing stochasticity is key to their success in practical problems, and there is some understanding of the role of stochastic gradient descent in this context. Momentum modifications of gradient descent, such as Polyak’s Heavy Ball method (HB) and Nesterov’s method of accelerated gradients (NAG), are widely adopted. In this work, our focus is on understanding the role of momentum in the training of neural networks, concentrating on the common situation in which the momentum contribution is fixed at each step of the algorithm; to expose the ideas simply, we work in the deterministic setting. We show that, contrary to popular belief, standard implementations of fixed-momentum methods do no more than act to rescale the learning rate. We achieve this by showing that the momentum method converges to a gradient flow, with a momentum-dependent time rescaling, using the method of modified equations from numerical analysis. Further, we show that the momentum method admits an exponentially attractive invariant manifold on which the dynamics reduce to a gradient flow with respect to a modified loss function, equal to the original one plus a small perturbation. |
Tasks | |
Published | 2019-06-10 |
URL | https://arxiv.org/abs/1906.04285v1 |
PDF | https://arxiv.org/pdf/1906.04285v1.pdf |
PWC | https://paperswithcode.com/paper/analysis-of-momentum-methods |
Repo | |
Framework | |
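To make the abstract's central claim concrete, here is a minimal numerical sketch (my own illustration, not the paper's analysis): on a toy quadratic loss, heavy-ball iterates with fixed momentum lam track plain gradient descent run with the learning rate rescaled by 1/(1 - lam). The loss, step size, and iteration count are assumptions chosen for the demo.

```python
# Heavy ball vs. rescaled gradient descent on a toy quadratic (illustrative only).
import numpy as np

A = np.diag([1.0, 10.0])                        # f(x) = 0.5 * x^T A x
grad = lambda x: A @ x

h, lam, steps = 1e-4, 0.9, 20_000
x_hb, x_gd, v = np.array([1.0, 1.0]), np.array([1.0, 1.0]), np.zeros(2)

for _ in range(steps):
    v = lam * v - h * grad(x_hb)                # heavy-ball momentum buffer
    x_hb = x_hb + v
    x_gd = x_gd - (h / (1 - lam)) * grad(x_gd)  # plain GD with rescaled step

print(np.linalg.norm(x_hb - x_gd))              # small: trajectories track each other
```

Shrinking h tightens the agreement, consistent with the modified-equation argument being a small-step-size statement.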
A Short Note on Concentration Inequalities for Random Vectors with SubGaussian Norm
Title | A Short Note on Concentration Inequalities for Random Vectors with SubGaussian Norm |
Authors | Chi Jin, Praneeth Netrapalli, Rong Ge, Sham M. Kakade, Michael I. Jordan |
Abstract | In this note, we derive concentration inequalities for random vectors with subGaussian norm (a generalization of both subGaussian random vectors and norm-bounded random vectors), which are tight up to logarithmic factors. |
Tasks | |
Published | 2019-02-11 |
URL | http://arxiv.org/abs/1902.03736v1 |
PDF | http://arxiv.org/pdf/1902.03736v1.pdf |
PWC | https://paperswithcode.com/paper/a-short-note-on-concentration-inequalities |
Repo | |
Framework | |
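As a quick illustration of the kind of statement at play, here is a Monte Carlo check of a standard Gaussian fact (my own demo, not a result copied from the note): for X ~ N(0, I_d), the norm ||X|| concentrates around sqrt(d), and the empirical tail stays below the sub-Gaussian bound exp(-t^2/2).

```python
# Empirical tail of ||X|| for X ~ N(0, I_d) versus the sub-Gaussian bound.
import numpy as np

rng = np.random.default_rng(0)
d, n = 100, 200_000
norms = np.linalg.norm(rng.standard_normal((n, d)), axis=1)

for t in [1.0, 2.0, 3.0]:
    empirical = np.mean(norms >= np.sqrt(d) + t)   # P(||X|| >= sqrt(d) + t)
    bound = np.exp(-t**2 / 2)
    print(f"t={t}: empirical {empirical:.2e} <= bound {bound:.2e}")
```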
Toward Extremely Low Bit and Lossless Accuracy in DNNs with Progressive ADMM
Title | Toward Extremely Low Bit and Lossless Accuracy in DNNs with Progressive ADMM |
Authors | Sheng Lin, Xiaolong Ma, Shaokai Ye, Geng Yuan, Kaisheng Ma, Yanzhi Wang |
Abstract | Weight quantization is one of the most important techniques for Deep Neural Network (DNN) model compression. A recent work, using a systematic framework for DNN weight quantization with the advanced optimization algorithm ADMM (Alternating Direction Method of Multipliers), achieves state-of-the-art results in weight quantization. In this work, we first extend this ADMM-based framework to guarantee solution feasibility, and we further develop a multi-step, progressive DNN weight quantization framework, with the dual benefits of (i) achieving further weight quantization thanks to the special property of ADMM regularization, and (ii) reducing the search space within each step. Extensive experimental results demonstrate superior performance compared with prior work. Some highlights: we derive the first lossless and fully binarized (for all layers) LeNet-5 for MNIST; and we derive the first fully binarized (for all layers) VGG-16 for CIFAR-10 and ResNet for ImageNet with reasonable accuracy loss. |
Tasks | Model Compression, Quantization |
Published | 2019-05-02 |
URL | https://arxiv.org/abs/1905.00789v1 |
PDF | https://arxiv.org/pdf/1905.00789v1.pdf |
PWC | https://paperswithcode.com/paper/toward-extremely-low-bit-and-lossless |
Repo | |
Framework | |
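The ADMM regularization the abstract refers to can be sketched on a toy problem. Below is a minimal, hypothetical version (the least-squares loss, ternary set {-a, 0, +a}, rho, and learning rate are my assumptions, not the paper's setup): alternate a gradient step on the augmented loss, a projection onto the quantized set, and a dual update.

```python
# Minimal ADMM-style weight quantization on a toy least-squares "loss".
import numpy as np

rng = np.random.default_rng(1)
X, w_true = rng.standard_normal((200, 20)), rng.standard_normal(20)
y = X @ w_true

def loss_grad(w):                   # gradient of 0.5 * ||Xw - y||^2
    return X.T @ (X @ w - y)

def project(w, a=0.5):              # nearest point in {-a, 0, +a}^d
    q = a * np.sign(w)
    q[np.abs(w) < a / 2] = 0.0
    return q

w, z, u = np.zeros(20), np.zeros(20), np.zeros(20)
rho, lr = 1.0, 1e-3
for _ in range(2000):
    w -= lr * (loss_grad(w) + rho * (w - z + u))  # W-step on the augmented loss
    z = project(w + u)                            # Z-step: projection onto Q
    u += w - z                                    # dual update
print("constraint gap:", np.linalg.norm(w - z))   # shrinks as ADMM proceeds
```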
High-dimensional semi-supervised learning: in search for optimal inference of the mean
Title | High-dimensional semi-supervised learning: in search for optimal inference of the mean |
Authors | Yuqian Zhang, Jelena Bradic |
Abstract | We provide a high-dimensional semi-supervised inference framework focused on the mean and variance of the response. Our data comprise an extensive set of observations of the covariate vectors and a much smaller set of labeled observations where we observe both the response and the covariates. We allow the dimension of the covariates to be much larger than the sample size and impose weak conditions on the statistical form of the data. We provide new estimators of the mean and variance of the response that extend some of the recent results presented in low-dimensional models. In particular, we do not always require consistent estimation of the functional form of the data. Together with estimation of the population mean and variance, we provide their asymptotic distributions and confidence intervals, where we showcase gains in efficiency compared to the sample mean and variance. With minor modifications, our procedure also makes important contributions to inference about average treatment effects. We further investigate the robustness of estimation and coverage, and showcase the widespread applicability and generality of the proposed method. |
Tasks | |
Published | 2019-02-02 |
URL | http://arxiv.org/abs/1902.00772v1 |
PDF | http://arxiv.org/pdf/1902.00772v1.pdf |
PWC | https://paperswithcode.com/paper/high-dimensional-semi-supervised-learning-in |
Repo | |
Framework | |
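A toy version of the semi-supervised idea (my construction, stripped of the paper's high-dimensional machinery): use the large unlabeled covariate sample to correct the labeled-sample mean of the response through a regression adjustment.

```python
# Semi-supervised mean estimation: labeled mean plus a covariate-shift correction.
import numpy as np

rng = np.random.default_rng(2)
n, N, d = 100, 10_000, 5                      # small labeled set, large covariate set
beta = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
X_all = rng.standard_normal((N, d))
X_lab = X_all[:n]
y = X_lab @ beta + rng.standard_normal(n)     # responses observed only on X_lab

b_hat, *_ = np.linalg.lstsq(X_lab, y, rcond=None)           # fitted working model
mu_naive = y.mean()                                         # labeled sample mean
mu_ss = y.mean() + (X_all.mean(0) - X_lab.mean(0)) @ b_hat  # adjusted estimator
print(mu_naive, mu_ss)   # both near the true mean 0; mu_ss has smaller variance
```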
Quality analysis of DCGAN-generated mammography lesions
Title | Quality analysis of DCGAN-generated mammography lesions |
Authors | Basel Alyafi, Oliver Diaz, Joan C Vilanova, Javier del Riego, Robert Marti |
Abstract | Medical image synthesis has gained great attention recently, especially after the introduction of Generative Adversarial Networks (GANs). GANs have been used widely to provide anatomically plausible and diverse samples for augmentation and other applications, including segmentation and super resolution. In our previous work, Deep Convolutional GANs were used to generate synthetic mammogram lesions, mainly masses, that could enhance classification performance on imbalanced datasets. In this new work, a deeper investigation was carried out to explore other aspects of evaluating the generated images, i.e., realism, feature-space distribution, and observer studies. t-distributed Stochastic Neighbor Embedding (t-SNE) was used to reduce the dimensionality of real and fake images to enable 2D visualisations. Additionally, two expert radiologists performed a realism-evaluation study. Visualisations showed that the generated images have a feature distribution similar to that of the real ones, avoiding outliers. Moreover, the Receiver Operating Characteristic (ROC) curve showed that the radiologists could not, in many cases, distinguish between synthetic and real lesions, giving accuracies of 48% and 61% on a balanced sample set. |
Tasks | Image Generation, Super-Resolution |
Published | 2019-11-28 |
URL | https://arxiv.org/abs/1911.12850v2 |
PDF | https://arxiv.org/pdf/1911.12850v2.pdf |
PWC | https://paperswithcode.com/paper/quality-analysis-of-dcgan-generated |
Repo | |
Framework | |
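The t-SNE comparison described above can be sketched as follows; random arrays stand in for the real and DCGAN-generated lesion patches, since data loading is outside the scope of this note.

```python
# 2D t-SNE visualisation of real vs. generated feature distributions (toy data).
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
real = rng.standard_normal((200, 64 * 64))        # placeholder "real" patches
fake = 1.1 * rng.standard_normal((200, 64 * 64))  # placeholder "generated" patches

emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(
    np.vstack([real, fake]))
plt.scatter(*emb[:200].T, s=8, label="real")
plt.scatter(*emb[200:].T, s=8, label="generated")
plt.legend(); plt.title("t-SNE of real vs. generated lesions"); plt.show()
```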
Learning Structured Twin-Incoherent Twin-Projective Latent Dictionary Pairs for Classification
Title | Learning Structured Twin-Incoherent Twin-Projective Latent Dictionary Pairs for Classification |
Authors | Zhao Zhang, Yulin Sun, Zheng Zhang, Yang Wang, Guangcan Liu, Meng Wang |
Abstract | In this paper, we extend the popular dictionary pair learning (DPL) method to the scenario of twin-projective latent flexible DPL under a structured twin-incoherence. Technically, a novel framework called Twin-Projective Latent Flexible DPL (TP-DPL) is proposed, which minimizes the twin-incoherence constrained, flexibly-relaxed reconstruction error to avoid possible over-fitting and produce accurate reconstruction. In this setting, our TP-DPL integrates the twin-incoherence based latent flexible DPL and the joint embedding of codes as well as salient features by twin-projection into a unified model in an adaptive neighborhood-preserving manner. As a result, TP-DPL unifies salient feature extraction, representation, and classification. The twin-incoherence constraint on codes and features can explicitly ensure high intra-class compactness and inter-class separation over them. TP-DPL also integrates adaptive weighting to explicitly preserve the local neighborhood of the coefficients and salient features within each class. For efficiency, TP-DPL uses the Frobenius norm and abandons the costly l0/l1-norm for group sparse representation. Another byproduct is that TP-DPL can directly apply the class-specific twin-projective reconstruction residual to compute the label of data. Extensive results on public databases show that TP-DPL can deliver state-of-the-art performance. |
Tasks | |
Published | 2019-08-21 |
URL | https://arxiv.org/abs/1908.07878v1 |
PDF | https://arxiv.org/pdf/1908.07878v1.pdf |
PWC | https://paperswithcode.com/paper/190807878 |
Repo | |
Framework | |
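The class-specific reconstruction-residual rule mentioned at the end of the abstract can be illustrated with a toy sketch (the dictionaries here are random and the analysis operator is a crude pseudo-inverse stand-in; TP-DPL learns both jointly): assign a sample to the class whose dictionary pair reconstructs it best.

```python
# Classification by class-specific reconstruction residual (toy dictionary pairs).
import numpy as np

rng = np.random.default_rng(4)
d, atoms, n_classes = 20, 5, 3
D = [rng.standard_normal((d, atoms)) for _ in range(n_classes)]  # synthesis dicts
P = [np.linalg.pinv(Dk) for Dk in D]          # crude analysis-operator stand-in

def classify(x):
    residuals = [np.linalg.norm(x - D[k] @ (P[k] @ x)) for k in range(n_classes)]
    return int(np.argmin(residuals))

x = D[1] @ rng.standard_normal(atoms)         # a sample in the class-1 subspace
print(classify(x))                            # -> 1
```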
Willump: A Statistically-Aware End-to-end Optimizer for Machine Learning Inference
Title | Willump: A Statistically-Aware End-to-end Optimizer for Machine Learning Inference |
Authors | Peter Kraft, Daniel Kang, Deepak Narayanan, Shoumik Palkar, Peter Bailis, Matei Zaharia |
Abstract | Systems for ML inference are widely deployed today, but they typically optimize ML inference workloads using techniques designed for conventional data serving workloads and miss critical opportunities to leverage the statistical nature of ML. In this paper, we present Willump, an optimizer for ML inference that introduces two statistically-motivated optimizations targeting ML applications whose performance bottleneck is feature computation. First, Willump automatically cascades feature computation for classification queries: Willump classifies most data inputs using only high-value, low-cost features selected through empirical observations of ML model performance, improving query performance by up to 5x without statistically significant accuracy loss. Second, Willump accurately approximates ML top-K queries, discarding low-scoring inputs with an automatically constructed approximate model and then ranking the remainder with a more powerful model, improving query performance by up to 10x with minimal accuracy loss. Willump automatically tunes these optimizations’ parameters to maximize query performance while meeting an accuracy target. Moreover, Willump complements these statistical optimizations with compiler optimizations to automatically generate fast inference code for ML applications. We show that Willump improves the end-to-end performance of real-world ML inference pipelines curated from major data science competitions by up to 16x without statistically significant loss of accuracy. |
Tasks | |
Published | 2019-06-03 |
URL | https://arxiv.org/abs/1906.01974v3 |
PDF | https://arxiv.org/pdf/1906.01974v3.pdf |
PWC | https://paperswithcode.com/paper/willump-a-statistically-aware-end-to-end |
Repo | |
Framework | |
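The cascade optimization can be sketched by hand (an illustration of the idea, not Willump's actual API): answer most queries with a cheap model over low-cost features, and compute the full feature set only when the cheap model is unsure. The stand-in models and the confidence threshold are placeholders.

```python
# A model cascade: cheap model first, expensive model only on low-confidence inputs.
import numpy as np

def cheap_model(x_cheap):           # stand-in: returns (label, confidence)
    p = 1 / (1 + np.exp(-x_cheap.sum()))
    return int(p > 0.5), max(p, 1 - p)

def full_model(x_all):              # stand-in for the expensive model
    return int(x_all.sum() > 0)

def cascade_predict(x_all, cheap_idx, conf_threshold=0.9):
    label, conf = cheap_model(x_all[cheap_idx])
    if conf >= conf_threshold:      # confident: skip expensive feature computation
        return label
    return full_model(x_all)        # fall back to all features

rng = np.random.default_rng(5)
print(cascade_predict(rng.standard_normal(50), cheap_idx=np.arange(5)))
```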
Biologic and Prognostic Feature Scores from Whole-Slide Histology Images Using Deep Learning
Title | Biologic and Prognostic Feature Scores from Whole-Slide Histology Images Using Deep Learning |
Authors | Okyaz Eminaga, Mahmood Abbas, Yuri Tolkach, Rosalie Nolley, Christian Kunder, Axel Semjonow, Martin Boegemann, Andreas Loening, James Brook, Daniel Rubin |
Abstract | Histopathology is a reflection of molecular changes and provides prognostic phenotypes representing disease progression. In this study, we introduced feature scores generated from hematoxylin-and-eosin histology images based on deep learning (DL) models developed for prostate pathology. We demonstrated that these feature scores were significantly prognostic for time-to-event endpoints (biochemical recurrence and cancer-specific survival) and simultaneously had molecular biologic associations with relevant genomic alterations and molecular subtypes, using already-trained DL models that were not previously exposed to the datasets of the current study. Further, we discussed the potential of such feature scores to improve the current tumor grading system, as well as the challenges associated with tumor heterogeneity and the development of prognostic models from histology images. Our findings uncover the potential of feature scores from histology images as digital biomarkers in precision medicine and as an expanding utility for digital pathology. |
Tasks | |
Published | 2019-10-21 |
URL | https://arxiv.org/abs/1910.09100v3 |
PDF | https://arxiv.org/pdf/1910.09100v3.pdf |
PWC | https://paperswithcode.com/paper/biologic-and-prognostic-feature-scores-from |
Repo | |
Framework | |
A Constructive Prediction of the Generalization Error Across Scales
Title | A Constructive Prediction of the Generalization Error Across Scales |
Authors | Jonathan S. Rosenfeld, Amir Rosenfeld, Yonatan Belinkov, Nir Shavit |
Abstract | The dependency of the generalization error of neural networks on model and dataset size is of critical importance both in practice and for understanding the theory of neural networks. Nevertheless, the functional form of this dependency remains elusive. In this work, we present a functional form which approximates well the generalization error in practice. Capitalizing on the successful concept of model scaling (e.g., width, depth), we are able to simultaneously construct such a form and specify the exact models which can attain it across model/data scales. Our construction follows insights obtained from observations conducted over a range of model/data scales, in various model types and datasets, in vision and language tasks. We show that the form both fits the observations well across scales, and provides accurate predictions from small- to large-scale models and data. |
Tasks | |
Published | 2019-09-27 |
URL | https://arxiv.org/abs/1909.12673v2 |
PDF | https://arxiv.org/pdf/1909.12673v2.pdf |
PWC | https://paperswithcode.com/paper/a-constructive-prediction-of-the |
Repo | |
Framework | |
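As a hedged sketch of what constructing and fitting such a functional form can look like in code, the following fits a simplified power-law-plus-floor form err(n, m) = a*n^-alpha + b*m^-beta + c_inf to synthetic error measurements. This simplified form and every constant in it are my assumptions; the form constructed in the paper is more refined.

```python
# Fitting a joint data/model scaling form to (synthetic) generalization errors.
import numpy as np
from scipy.optimize import curve_fit

def err_form(nm, a, alpha, b, beta, c_inf):
    n, m = nm                                    # data size, model size
    return a * n**-alpha + b * m**-beta + c_inf

rng = np.random.default_rng(6)
n = np.array([1e3, 1e4, 1e5] * 3)                # a small grid of scales
m = np.repeat([1e5, 1e6, 1e7], 3)
errs = err_form((n, m), 2.0, 0.35, 5.0, 0.3, 0.05) + 0.001 * rng.standard_normal(9)

params, _ = curve_fit(err_form, (n, m), errs,
                      p0=[1, 0.5, 1, 0.5, 0.0], maxfev=20_000)
print(params)    # recovered (a, alpha, b, beta, c_inf); usable for extrapolation
```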
FPGA-based Binocular Image Feature Extraction and Matching System
Title | FPGA-based Binocular Image Feature Extraction and Matching System |
Authors | Qi Ni, Fei Wang, Ziwei Zhao, Peng Gao |
Abstract | Image feature extraction and matching is a fundamental but computation-intensive task in machine vision. This paper proposes a novel FPGA-based embedded system to accelerate feature extraction and matching. It implements SURF feature-point detection and BRIEF feature-descriptor construction and matching. For binocular stereo vision, feature matching includes both tracking matching and stereo matching, which simultaneously provide feature-point correspondences and parallax information. Our system is evaluated on a ZYNQ XC7Z045 FPGA. The result demonstrates that it can process binocular video data at a high frame rate (640$\times$480 @ 162fps). Moreover, extensive tests show that our system is robust to image compression, blurring, and illumination changes. |
Tasks | Image Compression, Stereo Matching |
Published | 2019-05-13 |
URL | https://arxiv.org/abs/1905.04890v2 |
PDF | https://arxiv.org/pdf/1905.04890v2.pdf |
PWC | https://paperswithcode.com/paper/fpga-based-binocular-image-feature-extraction |
Repo | |
Framework | |
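The descriptor-matching stage can be sketched in software (the paper implements it in FPGA logic): BRIEF descriptors are bit strings, so matching reduces to a nearest-neighbor search under Hamming distance, i.e., XOR followed by a popcount.

```python
# Brute-force BRIEF matching: Hamming distance via XOR + bit counting.
import numpy as np

rng = np.random.default_rng(7)
desc_left = rng.integers(0, 256, size=(300, 32), dtype=np.uint8)   # 256-bit BRIEF
desc_right = rng.integers(0, 256, size=(300, 32), dtype=np.uint8)

def match(d_left, d_right):
    xor = d_left[:, None, :] ^ d_right[None, :, :]  # pairwise XOR of packed bytes
    dist = np.unpackbits(xor, axis=2).sum(axis=2)   # Hamming distance matrix
    return dist.argmin(axis=1)                      # best right match per left feature

print(match(desc_left, desc_right)[:10])
```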
Nearest Neighbor Sampling of Point Sets using Random Rays
Title | Nearest Neighbor Sampling of Point Sets using Random Rays |
Authors | Liangchen Liu, Louis Ly, Colin Macdonald, Yen-Hsi Richard Tsai |
Abstract | We propose a new framework for the sampling, compression, and analysis of distributions of point sets and other geometric objects embedded in Euclidean spaces. A set of randomly selected rays is projected onto their closest points in the data set, forming the ray signature. From the signature, statistical information about the data set, as well as certain geometrical information, can be extracted, independent of the ray set. We present promising results from “RayNN”, a neural network for the classification of point clouds based on ray signatures. |
Tasks | |
Published | 2019-11-25 |
URL | https://arxiv.org/abs/1911.10737v2 |
PDF | https://arxiv.org/pdf/1911.10737v2.pdf |
PWC | https://paperswithcode.com/paper/nearest-neighbor-sampling-of-point-sets-using |
Repo | |
Framework | |
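A small sketch of the ray-signature construction as the abstract describes it (the ray distribution and the exact contents of the signature are my assumptions): sample random rays and, for each ray, record the data point closest to it.

```python
# Ray signatures: each random ray is "projected" to its nearest point in the set.
import numpy as np

rng = np.random.default_rng(8)
points = rng.standard_normal((500, 3))            # the point set to summarize

def point_to_ray_dist(p, origin, direction):
    t = np.clip((p - origin) @ direction, 0, None)  # closest ray parameter, t >= 0
    return np.linalg.norm(p - (origin + t[:, None] * direction), axis=1)

def ray_signature(points, n_rays=64):
    sig = []
    for _ in range(n_rays):
        o = 5 * rng.standard_normal(3)            # ray origin away from the cloud
        u = rng.standard_normal(3)
        u /= np.linalg.norm(u)                    # unit direction
        sig.append(points[point_to_ray_dist(points, o, u).argmin()])
    return np.array(sig)                          # (n_rays, 3) signature

print(ray_signature(points).shape)
```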
SenseNet: Deep Learning based Wideband spectrum sensing and modulation classification network
Title | SenseNet: Deep Learning based Wideband spectrum sensing and modulation classification network |
Authors | Shivam Chandhok, Himani Joshi, A V Subramanyam, Sumit J. Darak |
Abstract | Next-generation networks are expected to operate in licensed, shared, and unlicensed spectrum to support the spectrum demands of a wide variety of services. Due to the shortage of radio spectrum, the need for communication systems (like cognitive radio) that can sense wideband spectrum and locate desired spectrum resources in real time has increased. An automatic modulation classifier (AMC) is an important part of wideband spectrum sensing (WSS), as it enables identification of incumbent users transmitting in the adjacent vacant spectrum. Most of the proposed AMCs work on Nyquist samples, which need to be further processed before they can be fed to the classifier. Working with Nyquist-sampled signals demands a high-rate ADC and results in high power consumption and high sensing time, which is unacceptable for next-generation communication systems. To overcome this drawback, we propose to use sub-Nyquist-sample-based WSS and modulation classification. In this paper, we propose a novel architecture called SenseNet which combines the tasks of spectrum sensing and modulation classification into a single unified pipeline. The proposed method is endowed with the capability to perform blind WSS and modulation classification directly on raw sub-Nyquist samples, which reduces complexity and sensing time since no prior estimation of sparsity is required. We extensively compare the performance of our proposed method on WSS as well as modulation classification tasks for a wide range of modulation schemes, input datasets, and channel conditions. A significant drawback of using sub-Nyquist samples is reduced performance compared to systems that employ Nyquist-sampled signals. However, we show that for the proposed method, the classification accuracy approaches that of Nyquist-sampling-based deep learning AMC as the signal-to-noise ratio increases. |
Tasks | |
Published | 2019-12-11 |
URL | https://arxiv.org/abs/1912.05255v1 |
PDF | https://arxiv.org/pdf/1912.05255v1.pdf |
PWC | https://paperswithcode.com/paper/sensenet-deep-learning-based-wideband |
Repo | |
Framework | |
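A speculative PyTorch sketch of a unified pipeline in the spirit of SenseNet (the layer sizes, heads, and input format are my guesses; the paper's architecture will differ): a shared 1-D convolutional backbone over raw sub-Nyquist I/Q samples feeding one head for per-band occupancy and one for modulation class.

```python
# A two-head network: spectrum sensing + modulation classification (hypothetical).
import torch
import torch.nn as nn

class SenseNetSketch(nn.Module):
    def __init__(self, n_bands=8, n_mods=6):
        super().__init__()
        self.backbone = nn.Sequential(                 # shared feature extractor
            nn.Conv1d(2, 32, 7, padding=3), nn.ReLU(),
            nn.Conv1d(32, 64, 5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(16), nn.Flatten())
        self.sense_head = nn.Linear(64 * 16, n_bands)  # per-band occupancy logits
        self.mod_head = nn.Linear(64 * 16, n_mods)     # modulation logits

    def forward(self, x):                              # x: (batch, 2, samples), I/Q
        z = self.backbone(x)
        return self.sense_head(z), self.mod_head(z)

occ, mod = SenseNetSketch()(torch.randn(4, 2, 1024))
print(occ.shape, mod.shape)                            # (4, 8), (4, 6)
```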
Adaptive Noise Injection: A Structure-Expanding Regularization for RNN
Title | Adaptive Noise Injection: A Structure-Expanding Regularization for RNN |
Authors | Rui Li, Kai Shuang, Mengyu Gu, Sen Su |
Abstract | The vanilla LSTM has become one of the most promising architectures for word-level language modeling, but, like other recurrent neural networks, overfitting is always a key barrier to its effectiveness. Existing noise-injection regularizations introduce random noise of fixed intensity, which inhibits the learning of the RNN throughout the training process. In this paper, we propose a new structure-expanding regularization method called Adaptive Noise Injection (ANI), which treats the output of an extra RNN branch as a kind of adaptive noise and injects it into the main-branch RNN output. Because the adaptive noise can improve as training proceeds, its negative effects can be weakened and even transformed into a positive effect that further improves the expressiveness of the main-branch RNN. As a result, ANI can regularize the RNN in the early stage of training and further promote its training performance in the later stage. We conduct experiments on three widely used corpora: PTB, WT2, and WT103, whose results verify both the regularization effect of ANI and its ability to promote training performance. Furthermore, we design a series of simulation experiments to explore the reasons that may lead to the regularization effect of ANI, and we find that, during training, robustness against parameter-update errors is strengthened when the LSTM is equipped with ANI. |
Tasks | Language Modelling |
Published | 2019-07-25 |
URL | https://arxiv.org/abs/1907.10885v1 |
PDF | https://arxiv.org/pdf/1907.10885v1.pdf |
PWC | https://paperswithcode.com/paper/adaptive-noise-injection-a-structure |
Repo | |
Framework | |
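A rough PyTorch sketch of the idea as described above (the branch sizes and the additive mixing are my assumptions): an auxiliary RNN branch produces an input-dependent "noise" that is added to the main LSTM's output during training only.

```python
# Adaptive noise injection: an extra RNN branch perturbs the main LSTM's output.
import torch
import torch.nn as nn

class ANISketch(nn.Module):
    def __init__(self, vocab=10_000, emb=256, hid=512):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.main = nn.LSTM(emb, hid, batch_first=True)   # main branch
        self.noise = nn.RNN(emb, hid, batch_first=True)   # structure-expanding branch
        self.decode = nn.Linear(hid, vocab)

    def forward(self, tokens):
        x = self.embed(tokens)
        h_main, _ = self.main(x)
        h_noise, _ = self.noise(x)
        h = h_main + h_noise if self.training else h_main  # inject in training only
        return self.decode(h)

logits = ANISketch()(torch.randint(0, 10_000, (4, 35)))
print(logits.shape)                                        # (4, 35, 10000)
```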
Catastrophic forgetting: still a problem for DNNs
Title | Catastrophic forgetting: still a problem for DNNs |
Authors | B. Pfülb, A. Gepperth, S. Abdullah, A. Kilian |
Abstract | We investigate the performance of DNNs when trained on class-incremental visual problems consisting of initial training followed by retraining with added visual classes. Catastrophic forgetting (CF) behavior is measured using a new evaluation procedure that aims at an application-oriented view of incremental learning. In particular, it imposes that model selection must be performed on the initial dataset alone, and demands that retraining control be performed using only the retraining dataset, as the initial dataset is usually too large to be kept. Experiments are conducted on class-incremental problems derived from MNIST, using a variety of different DNN models, some of them recently proposed to avoid catastrophic forgetting. When comparing our new evaluation procedure to previous approaches for assessing CF, we find that their findings are completely negated and that none of the tested methods can avoid CF in all experiments. This stresses the importance of a realistic empirical measurement procedure for catastrophic forgetting, and the need for further research in incremental learning for DNNs. |
Tasks | Model Selection |
Published | 2019-05-20 |
URL | https://arxiv.org/abs/1905.08077v1 |
PDF | https://arxiv.org/pdf/1905.08077v1.pdf |
PWC | https://paperswithcode.com/paper/catastrophic-forgetting-still-a-problem-for |
Repo | |
Framework | |
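A runnable toy version of the protocol's punchline (my construction, with a linear model standing in for a DNN): train on an initial class set, retrain on new classes without revisiting the old data, and watch accuracy on the old classes collapse.

```python
# Class-incremental retraining without the initial data -> catastrophic forgetting.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(9)
def make_task(classes, n=2000):
    y = rng.choice(classes, n)
    X = rng.standard_normal((n, 20)) + 3 * np.eye(20)[y]   # class-dependent means
    return X, y

X_a, y_a = make_task([0, 1, 2, 3, 4])          # initial task D1
X_b, y_b = make_task([5, 6, 7, 8, 9])          # retraining task D2

clf = SGDClassifier(loss="log_loss", random_state=0)
clf.partial_fit(X_a, y_a, classes=np.arange(10))
print("on D1 before retraining:", clf.score(X_a, y_a))

for _ in range(20):                            # retrain on D2 only; D1 is discarded
    clf.partial_fit(X_b, y_b)
print("on D1 after retraining: ", clf.score(X_a, y_a))   # typically collapses
```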
Predicting assisted ventilation in Amyotrophic Lateral Sclerosis using a mixture of experts and conformal predictors
Title | Predicting assisted ventilation in Amyotrophic Lateral Sclerosis using a mixture of experts and conformal predictors |
Authors | Telma Pereira, Sofia Pires, Marta Gromicho, Susana Pinto, Mamede de Carvalho, Sara C. Madeira |
Abstract | Amyotrophic Lateral Sclerosis (ALS) is a neurodegenerative disease characterized by rapid motor decline, leading to respiratory failure and subsequently to death. In this context, researchers have sought models to automatically predict disease progression to assisted ventilation in ALS patients. However, the clinical translation of such models is limited by the lack of insight 1) on the risk of error for predictions at the patient level, and 2) on the most adequate time to administer non-invasive ventilation. To address these issues, we combine Conformal Prediction (a machine learning framework that complements predictions with confidence measures) and a mixture of experts into a prognostic model which predicts not only whether an ALS patient will suffer from respiratory insufficiency but also the most likely time window of occurrence, at a given reliability level. Promising results were obtained, with nearly 80% of predictions correctly identified. |
Tasks | |
Published | 2019-07-30 |
URL | https://arxiv.org/abs/1907.13070v1 |
PDF | https://arxiv.org/pdf/1907.13070v1.pdf |
PWC | https://paperswithcode.com/paper/predicting-assisted-ventilation-in |
Repo | |
Framework | |
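A minimal inductive conformal classification sketch (the generic recipe, not the paper's mixture-of-experts model): calibrate nonconformity scores on held-out data, then emit for each test point the set of labels whose conformal p-value exceeds the chosen significance level.

```python
# Inductive conformal prediction around an arbitrary probabilistic classifier.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(10)
X = rng.standard_normal((600, 8))
y = (X[:, 0] + 0.5 * rng.standard_normal(600) > 0).astype(int)
X_tr, y_tr = X[:400], y[:400]
X_cal, y_cal, X_te = X[400:580], y[400:580], X[580:]

clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
# Nonconformity score: one minus the probability assigned to the true label.
cal_scores = 1 - clf.predict_proba(X_cal)[np.arange(len(y_cal)), y_cal]

def prediction_set(x, alpha=0.2):
    probs = clf.predict_proba(x.reshape(1, -1))[0]
    pvals = [(np.sum(cal_scores >= 1 - probs[c]) + 1) / (len(cal_scores) + 1)
             for c in range(2)]
    return [c for c in range(2) if pvals[c] > alpha]   # labels kept at level alpha

print([prediction_set(x) for x in X_te[:5]])
```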