Paper Group ANR 744
On the Bias-Variance Tradeoff: Textbooks Need an Update
Title | On the Bias-Variance Tradeoff: Textbooks Need an Update |
Authors | Brady Neal |
Abstract | The main goal of this thesis is to point out that the bias-variance tradeoff does not always hold (e.g., in neural networks). We advocate for this lack of universality to be acknowledged in textbooks and taught in introductory courses that cover the tradeoff. We first review the history of the bias-variance tradeoff, its prevalence in textbooks, and some of the main claims made about it. Through extensive experiments and analysis, we show a lack of a bias-variance tradeoff in neural networks when increasing network width. Our findings appear to contradict the claims of the landmark work by Geman et al. (1992). Motivated by this contradiction, we revisit the experimental measurements in Geman et al. (1992) and argue that there was never strong evidence for a tradeoff in neural networks when varying the number of parameters. We observe a similar phenomenon beyond supervised learning, in a set of deep reinforcement learning experiments. We argue that textbook and lecture revisions are in order to convey this nuanced modern understanding of the bias-variance tradeoff. |
Tasks | |
Published | 2019-12-17 |
URL | https://arxiv.org/abs/1912.08286v1 |
PDF | https://arxiv.org/pdf/1912.08286v1.pdf |
PWC | https://paperswithcode.com/paper/on-the-bias-variance-tradeoff-textbooks-need |
Repo | |
Framework | |
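The tradeoff in question is the one implied by the classical decomposition of expected squared error; stating it here (a standard identity, not taken from the thesis itself) makes the claim above concrete:

```latex
\mathbb{E}_{D,\varepsilon}\!\left[\bigl(y - \hat{f}_D(x)\bigr)^2\right]
= \underbrace{\bigl(\mathbb{E}_D[\hat{f}_D(x)] - f(x)\bigr)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}_D\!\left[\bigl(\hat{f}_D(x) - \mathbb{E}_D[\hat{f}_D(x)]\bigr)^2\right]}_{\text{variance}}
+ \underbrace{\sigma^2}_{\text{irreducible noise}}
```

where $y = f(x) + \varepsilon$ with $\operatorname{Var}(\varepsilon) = \sigma^2$ and $D$ is the training set. The textbook claim is that richer model classes drive the bias term down while the variance term grows; the thesis reports that the variance term need not grow as network width increases.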
Understanding the Effects of Pre-Training for Object Detectors via Eigenspectrum
Title | Understanding the Effects of Pre-Training for Object Detectors via Eigenspectrum |
Authors | Yosuke Shinya, Edgar Simo-Serra, Taiji Suzuki |
Abstract | ImageNet pre-training has long been regarded as essential for training accurate object detectors. Recently, it has been shown that object detectors trained from randomly initialized weights can be on par with those fine-tuned from ImageNet pre-trained models. However, the effects of pre-training and the differences it causes are still not fully understood. In this paper, we analyze the eigenspectrum dynamics of the covariance matrix of each feature map in object detectors. Based on our analysis of ResNet-50, Faster R-CNN with FPN, and Mask R-CNN, we show that object detectors trained from ImageNet pre-trained models and those trained from scratch behave differently even when both reach similar accuracy. Furthermore, we propose a method for automatically determining the widths (the numbers of channels) of object detectors based on the eigenspectrum. We train Faster R-CNN with FPN from randomly initialized weights, and show that our method can remove ~27% of the parameters of ResNet-50 without increasing Multiply-Accumulate operations or losing accuracy. Our results indicate that we should develop more appropriate methods for transferring knowledge from image classification to object detection (or other tasks). |
Tasks | Image Classification, Object Detection |
Published | 2019-09-09 |
URL | https://arxiv.org/abs/1909.04021v1 |
PDF | https://arxiv.org/pdf/1909.04021v1.pdf |
PWC | https://paperswithcode.com/paper/understanding-the-effects-of-pre-training-for |
Repo | |
Framework | |
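A minimal NumPy sketch of the kind of analysis described above: compute the eigenspectrum of a feature map's channel covariance and count how many directions carry most of the energy. The 99% energy threshold and the synthetic low-rank feature map are illustrative assumptions, not the paper's actual width-selection rule.

```python
import numpy as np

def channel_eigenspectrum(feature_map):
    """Eigenvalues (descending) of the channel covariance of an (N, C, H, W) map."""
    n, c, h, w = feature_map.shape
    x = feature_map.transpose(0, 2, 3, 1).reshape(-1, c)   # rows = spatial samples
    x = x - x.mean(axis=0, keepdims=True)
    cov = x.T @ x / x.shape[0]
    return np.linalg.eigvalsh(cov)[::-1]

def suggested_width(eig, energy=0.99):
    """Number of eigen-directions needed to retain `energy` of the total variance."""
    cum = np.cumsum(eig) / eig.sum()
    return int(np.searchsorted(cum, energy) + 1)

# Synthetic feature map with 256 channels but only 64 effective directions.
rng = np.random.default_rng(0)
fm = rng.normal(size=(8, 14, 14, 64)) @ rng.normal(size=(64, 256))
fm = fm.transpose(0, 3, 1, 2)                              # to (N, C, H, W)

print("suggested width:", suggested_width(channel_eigenspectrum(fm)))  # at most 64, far below 256
```

Run per layer on real detector feature maps, the same count would suggest per-layer widths; the paper's actual criterion may differ.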
Pipelined Training with Stale Weights of Deep Convolutional Neural Networks
Title | Pipelined Training with Stale Weights of Deep Convolutional Neural Networks |
Authors | Lifu Zhang, Tarek S. Abdelrahman |
Abstract | The growth in the complexity of Convolutional Neural Networks (CNNs) is increasing interest in partitioning a network across multiple accelerators during training and pipelining the backpropagation computations over the accelerators. Existing approaches avoid or limit the use of stale weights through techniques such as micro-batching or weight stashing. These techniques either underutilize accelerators or increase the memory footprint. We explore the impact of stale weights on statistical efficiency and performance in a pipelined backpropagation scheme that maximizes accelerator utilization and keeps memory overhead modest. We use four CNNs (LeNet-5, AlexNet, VGG, and ResNet) and show that when pipelining is limited to early layers in a network, training with stale weights converges and results in models with inference accuracies comparable to those of non-pipelined training on the MNIST and CIFAR-10 datasets, with drops in accuracy of 0.4%, 4%, 0.83%, and 1.45% for the four networks, respectively. However, when pipelining extends deeper into the network, inference accuracies drop significantly. We propose combining pipelined and non-pipelined training in a hybrid scheme to address this drop. We demonstrate the implementation and performance of our pipelined backpropagation in PyTorch on 2 GPUs using ResNet, achieving speedups of up to 1.8X over a 1-GPU baseline, with a small drop in inference accuracy. |
Tasks | |
Published | 2019-12-29 |
URL | https://arxiv.org/abs/1912.12675v1 |
PDF | https://arxiv.org/pdf/1912.12675v1.pdf |
PWC | https://paperswithcode.com/paper/pipelined-training-with-stale-weights-of-deep-1 |
Repo | |
Framework | |
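The effect of stale weights on convergence can be illustrated without any pipeline at all. Below is a toy NumPy simulation (not the authors' PyTorch implementation) where each gradient is evaluated at weights that are a fixed number of steps old:

```python
import numpy as np

# Toy illustration: SGD on least squares where each gradient is evaluated at
# weights that are `staleness` steps old, mimicking the delayed updates that
# arise in pipelined backpropagation.
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 10))
w_true = rng.normal(size=10)
y = X @ w_true + 0.01 * rng.normal(size=256)

def stale_sgd(staleness, lr=0.02, steps=800, batch=32):
    w = np.zeros(10)
    history = [w.copy()]
    for t in range(steps):
        w_old = history[max(0, t - staleness)]      # weights used for the gradient
        idx = rng.integers(0, X.shape[0], size=batch)
        grad = X[idx].T @ (X[idx] @ w_old - y[idx]) / batch
        w = w - lr * grad                           # applied to the current weights
        history.append(w.copy())
    return np.mean((X @ w - y) ** 2)

for s in (0, 4, 16):
    print(f"staleness={s:2d}  final MSE={stale_sgd(s):.5f}")
```

With this modest step size all three runs converge; pushing the learning rate or the staleness high enough eventually destabilizes training, which mirrors the accuracy drop the authors observe when pipelining extends deeper into the network.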
Acceleration of the NVT-flash calculation for multicomponent mixtures using deep neural network models
Title | Acceleration of the NVT-flash calculation for multicomponent mixtures using deep neural network models |
Authors | Yiteng Li, Tao Zhang, Shuyu Sun |
Abstract | Phase equilibrium calculation, also known as flash calculation, has been extensively applied in petroleum engineering, not only as a standalone application for separation processes but also as an integral component of compositional reservoir simulation. It is of vital importance to accelerate flash calculation without much compromise in accuracy and reliability, which has made it an active research topic over the last two decades. In this study, we establish a deep neural network model to approximate the iterative NVT-flash calculation. A dynamic model designed for NVT flash problems is solved iteratively to produce data for training the neural network. To test the model’s capacity to handle complex fluid mixtures, three real reservoir fluids are investigated, including one Bakken oil and two Eagle Ford oils. Compared to previous studies that follow the conventional flash framework, in which stability testing precedes the phase-splitting calculation, we combine the stability test and the phase-split calculation and accomplish both steps with a single deep learning model. The trained model is able to identify the single-vapor, single-liquid and vapor-liquid states in the subcritical region of the investigated fluids. A number of examples are presented to show the accuracy and efficiency of the proposed deep neural network. The trained model makes predictions up to 244 times faster than the iterative flash calculation in the given cases. Even though training a multi-level network model takes a large amount of time, comparable to the computational time of the flash calculations, the one-time offline training process gives the deep learning model great potential to speed up compositional reservoir simulation. |
Tasks | |
Published | 2019-01-27 |
URL | http://arxiv.org/abs/1901.09380v1 |
PDF | http://arxiv.org/pdf/1901.09380v1.pdf |
PWC | https://paperswithcode.com/paper/acceleration-of-the-nvt-flash-calculation-for |
Repo | |
Framework | |
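How such a surrogate might be wired up, as a hedged PyTorch sketch. The input/output layout (overall composition plus temperature and molar volume in, phase-state logits and vapor-phase composition out) and all layer sizes are illustrative assumptions, not the architecture from the paper.

```python
import torch
import torch.nn as nn

n_comp = 8  # hypothetical number of components in the mixture

class FlashSurrogate(nn.Module):
    """MLP surrogate: (mole fractions, T, V) -> phase state + vapor composition."""
    def __init__(self, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(n_comp + 2, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.phase_head = nn.Linear(hidden, 3)       # vapor / liquid / vapor-liquid logits
        self.split_head = nn.Linear(hidden, n_comp)  # vapor-phase mole fractions

    def forward(self, x):
        h = self.trunk(x)
        return self.phase_head(h), torch.softmax(self.split_head(h), dim=-1)

model = FlashSurrogate()
x = torch.rand(4, n_comp + 2)                # batch of 4 hypothetical mixtures
phase_logits, vapor_frac = model(x)
print(phase_logits.shape, vapor_frac.shape)  # (4, 3), (4, 8)
```

Combining a classification head with a regression head in one network reflects the abstract's point that stability testing and phase splitting are handled by a single model.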
Parameters Estimation for the Cosmic Microwave Background with Bayesian Neural Networks
Title | Parameters Estimation for the Cosmic Microwave Background with Bayesian Neural Networks |
Authors | Hector J. Hortua, Riccardo Volpi, Dimitri Marinelli, Luigi Malagò |
Abstract | In this paper, we present the first study that compares different models of Bayesian Neural Networks (BNNs) to predict the posterior distribution of the cosmological parameters directly from the Cosmic Microwave Background temperature and polarization maps. We focus our analysis on four different methods to sample the weights of the network during training: Dropout, DropConnect, Reparameterization Trick (RT), and Flipout. We find that Flipout outperforms all other methods regardless of the architecture used, and provides tighter constraints on the cosmological parameters. Additionally, we describe existing strategies for calibrating the networks and propose new ones. We show that, by tuning the regularization parameter for the scale of the approximate posterior on the weights in Flipout and RT, we can produce unbiased and reliable uncertainty estimates; i.e., the regularizer acts as a hyperparameter analogous to the dropout rate in Dropout. The best performance is nevertheless achieved with a more convenient method, in which the network parameters are left free during training to reach the best uncalibrated performance, and the confidence intervals are then calibrated in a subsequent phase. Furthermore, we claim that the correct calibration of these networks does not change the behavior of the aleatoric and epistemic uncertainties provided by BNNs when the size of the training dataset changes. The results reported in the paper can be extended to other cosmological datasets in order to capture features that can be extracted directly from the raw data, such as non-Gaussianity or foreground emissions. |
Tasks | Calibration |
Published | 2019-11-19 |
URL | https://arxiv.org/abs/1911.08508v2 |
PDF | https://arxiv.org/pdf/1911.08508v2.pdf |
PWC | https://paperswithcode.com/paper/parameters-estimation-for-the-cosmic |
Repo | |
Framework | |
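Of the four weight-sampling schemes compared above, Monte Carlo dropout is the easiest to sketch. The toy regression network below (the layer sizes and six-parameter output are assumptions, not the paper's architecture) keeps dropout active at prediction time and summarizes repeated forward passes:

```python
import torch
import torch.nn as nn

# Monte Carlo dropout: sample the predictive distribution by keeping dropout
# stochastic at inference time and averaging repeated forward passes.
net = nn.Sequential(
    nn.Linear(16, 64), nn.ReLU(), nn.Dropout(p=0.2),
    nn.Linear(64, 64), nn.ReLU(), nn.Dropout(p=0.2),
    nn.Linear(64, 6),            # e.g. six cosmological parameters
)

def mc_predict(x, samples=100):
    net.train()                  # keep dropout active during prediction
    with torch.no_grad():
        draws = torch.stack([net(x) for _ in range(samples)])
    return draws.mean(0), draws.std(0)   # predictive mean and spread

mean, std = mc_predict(torch.randn(8, 16))
print(mean.shape, std.shape)     # (8, 6), (8, 6)
```

Flipout, DropConnect, and the Reparameterization Trick replace the dropout masks with explicit weight perturbations, but the prediction loop (many stochastic forward passes, then summary statistics) has the same shape.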
Privacy Leakage Avoidance with Switching Ensembles
Title | Privacy Leakage Avoidance with Switching Ensembles |
Authors | Rauf Izmailov, Peter Lin, Chris Mesterharm, Samyadeep Basu |
Abstract | We consider membership inference attacks, one of the main privacy issues in machine learning. These recently developed attacks have proven successful in determining, with confidence better than a random guess, whether a given sample belongs to the dataset on which the attacked machine learning model was trained. Several approaches have been developed to mitigate this privacy leakage, but the performance implications of these defensive mechanisms (i.e., the accuracy and utility of the defended machine learning model) are not yet well studied. We propose a novel approach, privacy leakage avoidance with switching ensembles (PASE), which protects against current membership inference attacks with only a very small accuracy penalty, while requiring an acceptable increase in training and inference time. We test our PASE method, along with the current state-of-the-art PATE approach, on three calibration image datasets and analyze their tradeoffs. |
Tasks | Calibration |
Published | 2019-11-18 |
URL | https://arxiv.org/abs/1911.07921v1 |
PDF | https://arxiv.org/pdf/1911.07921v1.pdf |
PWC | https://paperswithcode.com/paper/privacy-leakage-avoidance-with-switching |
Repo | |
Framework | |
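The abstract does not describe the switching mechanism itself, so the snippet below is purely a hypothetical illustration of the general idea: route each query deterministically to one member of an ensemble trained on disjoint data shards, so no single model answers queries about all of the training data. Every name here is an assumption, not PASE's actual rule.

```python
import hashlib
import numpy as np

# Hypothetical sketch only: PASE's actual switching rule is not given in the
# abstract. Each query is routed, by a deterministic hash of its raw bytes,
# to one of several models trained on disjoint data shards.
class SwitchingEnsemble:
    def __init__(self, models):
        self.models = models              # callables trained on disjoint shards

    def _route(self, x: np.ndarray) -> int:
        digest = hashlib.sha256(x.tobytes()).digest()
        return int.from_bytes(digest[:4], "big") % len(self.models)

    def predict(self, x: np.ndarray):
        return self.models[self._route(x)](x)

# Usage with trivial stand-in "models":
ensemble = SwitchingEnsemble([lambda x, k=k: f"model {k}" for k in range(3)])
print(ensemble.predict(np.arange(10.0)))
```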
UltraSuite: A Repository of Ultrasound and Acoustic Data from Child Speech Therapy Sessions
Title | UltraSuite: A Repository of Ultrasound and Acoustic Data from Child Speech Therapy Sessions |
Authors | Aciel Eshky, Manuel Sam Ribeiro, Joanne Cleland, Korin Richmond, Zoe Roxburgh, James Scobbie, Alan Wrench |
Abstract | We introduce UltraSuite, a curated repository of ultrasound and acoustic data, collected from recordings of child speech therapy sessions. This release includes three data collections, one from typically developing children and two from children with speech sound disorders. In addition, it includes a set of annotations, some manual and some automatically produced, and software tools to process, transform and visualise the data. |
Tasks | |
Published | 2019-07-01 |
URL | https://arxiv.org/abs/1907.00835v1 |
PDF | https://arxiv.org/pdf/1907.00835v1.pdf |
PWC | https://paperswithcode.com/paper/ultrasuite-a-repository-of-ultrasound-and |
Repo | |
Framework | |
Mitigating Gender Bias in Natural Language Processing: Literature Review
Title | Mitigating Gender Bias in Natural Language Processing: Literature Review |
Authors | Tony Sun, Andrew Gaut, Shirlyn Tang, Yuxin Huang, Mai ElSherief, Jieyu Zhao, Diba Mirza, Elizabeth Belding, Kai-Wei Chang, William Yang Wang |
Abstract | As Natural Language Processing (NLP) and Machine Learning (ML) tools rise in popularity, it becomes increasingly vital to recognize the role they play in shaping societal biases and stereotypes. Although NLP models have shown success in modeling various applications, they propagate and may even amplify gender bias found in text corpora. While the study of bias in artificial intelligence is not new, methods to mitigate gender bias in NLP are relatively nascent. In this paper, we review contemporary studies on recognizing and mitigating gender bias in NLP. We discuss gender bias based on four forms of representation bias and analyze methods recognizing gender bias. Furthermore, we discuss the advantages and drawbacks of existing gender debiasing methods. Finally, we discuss future studies for recognizing and mitigating gender bias in NLP. |
Tasks | |
Published | 2019-06-21 |
URL | https://arxiv.org/abs/1906.08976v1 |
PDF | https://arxiv.org/pdf/1906.08976v1.pdf |
PWC | https://paperswithcode.com/paper/mitigating-gender-bias-in-natural-language |
Repo | |
Framework | |
Knowledge distillation for optimization of quantized deep neural networks
Title | Knowledge distillation for optimization of quantized deep neural networks |
Authors | Sungho Shin, Yoonho Boo, Wonyong Sung |
Abstract | Knowledge distillation (KD) is a very popular method for model size reduction. Recently, the technique has been exploited for training quantized deep neural networks (QDNNs) as a way to restore the performance sacrificed by word-length reduction. KD, however, employs additional hyper-parameters for QDNN training, such as the temperature, the coefficient, and the size of the teacher network. We analyze the effect of these hyper-parameters on QDNN optimization with KD. We find that these hyper-parameters are inter-related, and we also introduce a simple and effective technique that reduces the *coefficient* during training. With KD employing the proposed hyper-parameters, we achieve test accuracies of 92.7% and 67.0% with ResNet20 using 2-bit ternary weights on the CIFAR-10 and CIFAR-100 datasets, respectively. |
Tasks | |
Published | 2019-09-04 |
URL | https://arxiv.org/abs/1909.01688v3 |
PDF | https://arxiv.org/pdf/1909.01688v3.pdf |
PWC | https://paperswithcode.com/paper/empirical-analysis-of-knowledge-distillation |
Repo | |
Framework | |
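The loss the abstract refers to, written out as a hedged PyTorch sketch. The temperature-scaled KL term and mixing coefficient follow the standard KD formulation; the linear decay is only one simple way to "reduce the coefficient during training", since the abstract does not give the exact schedule.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T, coeff):
    """Cross-entropy mixed with a temperature-scaled KL term to the teacher."""
    ce = F.cross_entropy(student_logits, labels)
    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    return (1.0 - coeff) * ce + coeff * kl

def coeff_at(step, total_steps, start=0.9, end=0.0):
    """One simple way to reduce the coefficient over training: linear decay."""
    frac = min(step / total_steps, 1.0)
    return start + (end - start) * frac

# Example call with random logits for a batch of 8 over 100 classes.
s, t = torch.randn(8, 100), torch.randn(8, 100)
labels = torch.randint(0, 100, (8,))
print(kd_loss(s, t, labels, T=4.0, coeff=coeff_at(step=100, total_steps=1000)))
```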
Fairness through Equality of Effort
Title | Fairness through Equality of Effort |
Authors | Wen Huang, Yongkai Wu, Lu Zhang, Xintao Wu |
Abstract | Fair machine learning is receiving increasing attention in the machine learning community. Researchers in fair learning have developed correlation- or association-based measures such as demographic disparity, mistreatment disparity, and calibration; causal-based measures such as total effect, direct and indirect discrimination, and counterfactual fairness; and fairness notions such as equality of opportunity and equalized odds that consider both the decisions in the training data and the decisions made by predictive models. In this paper, we develop a new causal-based fairness notion, called equality of effort. Different from existing fairness notions, which mainly focus on discovering the disparity of decisions between two groups of individuals, the proposed equality of effort notion helps answer questions like to what extent a legitimate variable should change to make a particular individual achieve a certain outcome level, and addresses whether the efforts required to achieve the same outcome level by individuals from the protected group and from the unprotected group are different. We develop algorithms for determining whether an individual or a group of individuals is discriminated against in terms of equality of effort. We also develop an optimization-based method for removing discriminatory effects from the data if discrimination is detected. We conduct empirical evaluations to compare equality of effort with existing fairness notions and show the effectiveness of our proposed algorithms. |
Tasks | Calibration |
Published | 2019-11-11 |
URL | https://arxiv.org/abs/1911.08292v1 |
PDF | https://arxiv.org/pdf/1911.08292v1.pdf |
PWC | https://paperswithcode.com/paper/fairness-through-equality-of-effort |
Repo | |
Framework | |
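One way to read the verbal definition above in symbols (a paraphrase, not necessarily the paper's formal definition): the minimal effort for an individual with covariates $x$ is the smallest change in the legitimate variable $T$ that pushes the expected outcome past a target level $\tau$, and equality of effort asks that this quantity match across groups.

```latex
\delta^{*}(x) \;=\; \min\bigl\{\delta \ge 0 \;:\;
  \mathbb{E}\bigl[\hat{Y} \mid do(T = t_x + \delta),\, X = x\bigr] \ge \tau\bigr\},
\qquad
\mathbb{E}\bigl[\delta^{*}(X) \mid G = g_{\text{protected}}\bigr]
  \;=\; \mathbb{E}\bigl[\delta^{*}(X) \mid G = g_{\text{unprotected}}\bigr]
```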
A Provably Correct and Robust Algorithm for Convolutive Nonnegative Matrix Factorization
Title | A Provably Correct and Robust Algorithm for Convolutive Nonnegative Matrix Factorization |
Authors | Anthony Degleris, Nicolas Gillis |
Abstract | In this paper, we propose a provably correct algorithm for convolutive nonnegative matrix factorization (CNMF) under separability assumptions. CNMF is a convolutive variant of nonnegative matrix factorization (NMF), which functions as an NMF with additional sequential structure. This model is useful in a number of applications, such as audio source separation and neural sequence identification. While a number of heuristic algorithms have been proposed to solve CNMF, to the best of our knowledge no provably correct algorithms have been developed. We present an algorithm that takes advantage of the NMF model underlying CNMF and exploits existing algorithms for separable NMF to provably find a solution under certain conditions. Our approach guarantees the solution in low noise settings, and runs in polynomial time. We illustrate its effectiveness on synthetic datasets, and on a singing bird audio sequence. |
Tasks | |
Published | 2019-06-17 |
URL | https://arxiv.org/abs/1906.06899v4 |
PDF | https://arxiv.org/pdf/1906.06899v4.pdf |
PWC | https://paperswithcode.com/paper/a-provably-correct-and-robust-algorithm-for |
Repo | |
Framework | |
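For reference, the convolutive NMF model being factorized, in its standard formulation (the shift notation is spelled out below the equation):

```latex
X \;\approx\; \sum_{t=0}^{T-1} W_t \,\overset{t\rightarrow}{H},
\qquad W_t \in \mathbb{R}_{\ge 0}^{m \times r},\quad H \in \mathbb{R}_{\ge 0}^{r \times n}
```

where $\overset{t\rightarrow}{H}$ shifts the columns of $H$ to the right by $t$ positions (zero-filling on the left), and $T = 1$ recovers ordinary NMF, which is the structure the algorithm exploits via separable NMF.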
Average-case Analysis of the Assignment Problem with Independent Preferences
Title | Average-case Analysis of the Assignment Problem with Independent Preferences |
Authors | Yansong Gao, Jie Zhang |
Abstract | The fundamental assignment problem is in search of welfare-maximizing mechanisms to allocate items to agents when the private preferences over indivisible items are provided by self-interested agents. The mainstream mechanism *Random Priority* is asymptotically the best mechanism for this purpose when comparing its welfare to the optimal social welfare using the canonical *worst-case approximation ratio*. Despite its popularity, the efficiency loss indicated by the worst-case ratio does not have a constant bound. Recently, [Deng, Gao, Zhang 2017] showed that when the agents’ preferences are drawn from a uniform distribution, its *average-case approximation ratio* is upper bounded by 3.718. They left open the question of whether a constant ratio holds for general scenarios. In this paper, we offer an affirmative answer by showing that the ratio is bounded by $1/\mu$ when the preference values are independent and identically distributed random variables, where $\mu$ is the expectation of the value distribution. This also improves the upper bound of 3.718 in [Deng, Gao, Zhang 2017] for the uniform distribution. Moreover, under mild conditions, the ratio has a *constant* bound for any independent random values. En route to these results, we develop powerful tools to show that in most instances the efficiency loss is small. |
Tasks | |
Published | 2019-06-01 |
URL | https://arxiv.org/abs/1906.00182v1 |
PDF | https://arxiv.org/pdf/1906.00182v1.pdf |
PWC | https://paperswithcode.com/paper/190600182 |
Repo | |
Framework | |
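A toy Monte Carlo check of the kind of quantity the paper bounds (not the paper's analysis): Random Priority lets agents pick their favorite remaining item in a uniformly random order, and the empirical ratio of optimal to achieved welfare for i.i.d. Uniform(0,1) values can be compared against the $1/\mu = 2$ bound stated above. SciPy is used only to compute the optimal assignment; the instance size is arbitrary.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(1)

def random_priority_welfare(values):
    """Agents pick their favorite remaining item in a uniformly random order."""
    n = values.shape[0]
    taken = np.zeros(n, dtype=bool)
    welfare = 0.0
    for agent in rng.permutation(n):
        item = int(np.argmax(np.where(~taken, values[agent], -np.inf)))
        taken[item] = True
        welfare += values[agent, item]
    return welfare

ratios = []
for _ in range(2000):
    V = rng.random((8, 8))                        # i.i.d. Uniform(0,1) values, mu = 0.5
    rows, cols = linear_sum_assignment(-V)        # welfare-maximizing assignment
    ratios.append(V[rows, cols].sum() / random_priority_welfare(V))

print(f"empirical average ratio: {np.mean(ratios):.3f}  (paper's bound 1/mu = 2)")
```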
Probabilistic hypergraph grammars for efficient molecular optimization
Title | Probabilistic hypergraph grammars for efficient molecular optimization |
Authors | Egor Kraev, Mark Harley |
Abstract | We present an approach to make molecular optimization more efficient. We infer a hypergraph replacement grammar from the ChEMBL database, count the frequencies with which particular rules are used to expand particular nonterminals in other rules, and use these as conditional priors for the policy model. Simulating random molecules from the resulting probabilistic grammar, we show that conditional priors result in a molecular distribution closer to the training set than using equal rule probabilities or unconditional priors. We then treat molecular optimization as a reinforcement learning problem, using a novel modification of the policy gradient algorithm, batch-advantage: weighting the log-probability loss by individual rewards minus the batch-average reward. The reinforcement learning agent is tasked with building molecules using this grammar, with the goal of maximizing benchmark scores available from the literature. To do so, the agent has policies both to choose the next node in the graph to expand and to select the next grammar rule to apply. The policies are implemented using the Transformer architecture with the partially expanded graph as the input at each step. We show that using the empirical priors as the starting point for a policy eliminates the need for pre-training and allows us to reach optima faster. We achieve competitive performance on common benchmarks from the literature, such as penalized logP and QED, with only hundreds of training steps on a budget GPU instance. |
Tasks | |
Published | 2019-06-05 |
URL | https://arxiv.org/abs/1906.01845v1 |
PDF | https://arxiv.org/pdf/1906.01845v1.pdf |
PWC | https://paperswithcode.com/paper/probabilistic-hypergraph-grammars-for |
Repo | |
Framework | |
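The batch-advantage modification described above is simple enough to state directly. A minimal PyTorch sketch, assuming `log_probs` holds each sampled molecule's total log-probability under the policy and `rewards` its benchmark score (both names are placeholders, not the authors' code):

```python
import torch

def batch_advantage_loss(log_probs, rewards):
    """Policy-gradient loss weighted by each reward minus the batch-average reward."""
    advantage = rewards - rewards.mean()
    return -(advantage * log_probs).mean()

# Four sampled molecules: episode log-probabilities under the policy and rewards.
log_probs = torch.tensor([-12.3, -9.8, -15.1, -11.0], requires_grad=True)
rewards = torch.tensor([0.61, 0.74, 0.32, 0.55])
batch_advantage_loss(log_probs, rewards).backward()
print(log_probs.grad)
```

Centering the rewards within the batch plays the role of a baseline, so molecules scoring above the batch average are reinforced and the rest are suppressed.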
Mini Lesions Detection on Diabetic Retinopathy Images via Large Scale CNN Features
Title | Mini Lesions Detection on Diabetic Retinopathy Images via Large Scale CNN Features |
Authors | Qilei Chen, Xinzi Sun, Ning Zhang, Yu Cao, Benyuan Liu |
Abstract | Diabetic retinopathy (DR) is a diabetes complication that affects the eyes. DR is a primary cause of blindness in working-age people, and it is estimated that 3 to 4 million people with diabetes are blinded by DR every year worldwide. Early diagnosis has been considered an effective way to mitigate this problem. The ultimate goal of our research is to develop novel machine learning techniques to analyze the DR images generated by the fundus camera for automatic DR diagnosis. In this paper, we focus on identifying small lesions on DR fundus images. The results of our analysis, which include the lesion categories and their exact locations in the image, can be used to facilitate the determination of DR severity (indicated by DR stages). Different from traditional object detection for natural images, lesion detection for fundus images has unique challenges. Specifically, the size of a lesion instance is usually very small compared with the original resolution of the fundus images, making it difficult to detect. We analyze the lesion-vs-image scale carefully and propose a large-size feature pyramid network (LFPN) to preserve more image details for mini lesion instance detection. Our method includes an effective region proposal strategy to increase the sensitivity. The experimental results show that our proposed method is superior to the original feature pyramid network (FPN) method and Faster R-CNN. |
Tasks | Object Detection |
Published | 2019-11-19 |
URL | https://arxiv.org/abs/1911.08588v1 |
PDF | https://arxiv.org/pdf/1911.08588v1.pdf |
PWC | https://paperswithcode.com/paper/mini-lesions-detection-on-diabetic |
Repo | |
Framework | |
Self-Driving Car Steering Angle Prediction Based on Image Recognition
Title | Self-Driving Car Steering Angle Prediction Based on Image Recognition |
Authors | Shuyang Du, Haoli Guo, Andrew Simpson |
Abstract | Self-driving vehicle research has expanded dramatically over the last few years. Udacity has released a dataset containing, among other data, a set of images with the steering angle captured during driving. The Udacity challenge aimed to predict the steering angle based only on the provided images. We explore two different models to perform high-quality prediction of steering angles from images using different deep learning techniques, including Transfer Learning, 3D CNNs, LSTMs, and ResNet. If the Udacity challenge were still ongoing, both of our models would have placed in the top ten of all entries. |
Tasks | Transfer Learning |
Published | 2019-12-11 |
URL | https://arxiv.org/abs/1912.05440v1 |
PDF | https://arxiv.org/pdf/1912.05440v1.pdf |
PWC | https://paperswithcode.com/paper/self-driving-car-steering-angle-prediction |
Repo | |
Framework | |
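A minimal sketch of an image-to-steering-angle regressor in the spirit of the models described above; the layer sizes and the 66x200 input resolution are illustrative assumptions, not the authors' architectures (which also include 3D CNN, LSTM, and ResNet variants):

```python
import torch
import torch.nn as nn

class SteeringRegressor(nn.Module):
    """Small CNN mapping a dashboard-camera frame to a single steering angle."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 24, 5, stride=2), nn.ReLU(),
            nn.Conv2d(24, 36, 5, stride=2), nn.ReLU(),
            nn.Conv2d(36, 48, 5, stride=2), nn.ReLU(),
            nn.Conv2d(48, 64, 3), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Sequential(
            nn.Flatten(), nn.Linear(64, 50), nn.ReLU(), nn.Linear(50, 1)
        )

    def forward(self, x):
        return self.head(self.features(x)).squeeze(-1)

model = SteeringRegressor()
angles = model(torch.randn(2, 3, 66, 200))   # batch of two frames
print(angles.shape)                          # torch.Size([2])
```

Training such a model against the recorded angles with a mean-squared-error loss is the baseline setup; the paper's stronger variants add temporal context (3D convolutions, LSTMs) and transfer learning on top of this idea.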