Paper Group ANR 373
A Deep Learning Approach to Diagnosing Multiple Sclerosis from Smartphone Data. Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning. Mix-n-Match: Ensemble and Compositional Methods for Uncertainty Calibration in Deep Learning. Distance-Based Regularisation of Deep Networks for Fine-Tuning. Imperialist Competitive Algorithm …
A Deep Learning Approach to Diagnosing Multiple Sclerosis from Smartphone Data
Title | A Deep Learning Approach to Diagnosing Multiple Sclerosis from Smartphone Data |
Authors | Patrick Schwab, Walter Karlen |
Abstract | Multiple sclerosis (MS) affects the central nervous system with a wide range of symptoms. MS can, for example, cause pain, changes in mood and fatigue, and may impair a person’s movement, speech and visual functions. Diagnosis of MS typically involves a combination of complex clinical assessments and tests to rule out other diseases with similar symptoms. New technologies, such as smartphone monitoring in free-living conditions, could potentially aid in objectively assessing the symptoms of MS by quantifying symptom presence and intensity over long periods of time. Here, we present a deep-learning approach to diagnosing MS from smartphone-derived digital biomarkers that uses a novel combination of a multilayer perceptron with neural soft attention to improve learning of patterns in long-term smartphone monitoring data. Using data from a cohort of 774 participants, we demonstrate that our deep-learning models are able to distinguish between people with and without MS with an area under the receiver operating characteristic curve of 0.88 (95% CI: 0.70, 0.88). Our experimental results indicate that digital biomarkers derived from smartphone data could in the future be used as additional diagnostic criteria for MS. |
Tasks | |
Published | 2020-01-02 |
URL | https://arxiv.org/abs/2001.09748v1 |
https://arxiv.org/pdf/2001.09748v1.pdf | |
PWC | https://paperswithcode.com/paper/a-deep-learning-approach-to-diagnosing |
Repo | |
Framework | |
Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning
Title | Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning |
Authors | Mitchell A. Gordon, Kevin Duh, Nicholas Andrews |
Abstract | Universal feature extractors, such as BERT for natural language processing and VGG for computer vision, have become effective methods for improving deep learning models without requiring more labeled data. A common paradigm is to pre-train a feature extractor on large amounts of data then fine-tune it as part of a deep learning model on some downstream task (i.e. transfer learning). While effective, feature extractors like BERT may be prohibitively large for some deployment scenarios. We explore weight pruning for BERT and ask: how does compression during pre-training affect transfer learning? We find that pruning affects transfer learning in three broad regimes. Low levels of pruning (30-40%) do not affect pre-training loss or transfer to downstream tasks at all. Medium levels of pruning increase the pre-training loss and prevent useful pre-training information from being transferred to downstream tasks. High levels of pruning additionally prevent models from fitting downstream datasets, leading to further degradation. Finally, we observe that fine-tuning BERT on a specific task does not improve its prunability. We conclude that BERT can be pruned once during pre-training rather than separately for each task without affecting performance. |
Tasks | Transfer Learning |
Published | 2020-02-19 |
URL | https://arxiv.org/abs/2002.08307v1 |
https://arxiv.org/pdf/2002.08307v1.pdf | |
PWC | https://paperswithcode.com/paper/compressing-bert-studying-the-effects-of-1 |
Repo | |
Framework | |
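A minimal sketch of the kind of weight pruning the abstract discusses, assuming magnitude-based pruning (the abstract does not name the pruning criterion). The function name `magnitude_prune` and the flat list of weights are hypothetical, chosen only for illustration.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude.

    Assumption: pruning is magnitude-based and applied to a flat list of weights.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


weights = [0.5, -0.1, 0.03, -0.8, 0.2, 0.05, -0.4, 0.9, -0.02, 0.3]
pruned = magnitude_prune(weights, 0.4)  # prune 40% of the weights
```

In a real model the same idea would typically be applied per weight matrix with a binary mask, so that pruned weights stay at zero during further training.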
Mix-n-Match: Ensemble and Compositional Methods for Uncertainty Calibration in Deep Learning
Title | Mix-n-Match: Ensemble and Compositional Methods for Uncertainty Calibration in Deep Learning |
Authors | Jize Zhang, Bhavya Kailkhura, T. Yong-Jin Han |
Abstract | This paper studies the problem of post-hoc calibration of machine learning classifiers. We introduce the following desiderata for uncertainty calibration: (a) accuracy-preserving, (b) data-efficient, and (c) high expressive power. We show that none of the existing methods satisfy all three requirements, and demonstrate how Mix-n-Match calibration strategies (i.e., ensemble and composition) can help achieve remarkably better data-efficiency and expressive power while provably preserving the classification accuracy of the original classifier. We also show that existing calibration error estimators (e.g., histogram-based ECE) are unreliable, especially in the small-data regime. Therefore, we propose an alternative data-efficient kernel density-based estimator for a reliable evaluation of the calibration performance and prove its asymptotic unbiasedness and consistency. |
Tasks | Calibration |
Published | 2020-03-16 |
URL | https://arxiv.org/abs/2003.07329v1 |
https://arxiv.org/pdf/2003.07329v1.pdf | |
PWC | https://paperswithcode.com/paper/mix-n-match-ensemble-and-compositional |
Repo | |
Framework | |
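For reference, the histogram-based ECE that the abstract calls unreliable in small-data settings can be sketched as follows. This is the common equal-width binning form, not the kernel density-based estimator the paper proposes; the function name and the default of 10 bins are assumptions.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Histogram-based ECE with equal-width confidence bins.

    confidences: predicted probability of the chosen class, each in [0, 1].
    correct: booleans, True where the prediction was right.
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 falls into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        # Weight each bin's confidence/accuracy gap by its share of samples.
        ece += len(members) / n * abs(avg_conf - accuracy)
    return ece


overconfident = expected_calibration_error(
    [0.95, 0.95, 0.95, 0.95], [True, True, False, False]
)
calibrated = expected_calibration_error([1.0, 1.0], [True, True])
```

With few samples most bins are empty or hold a handful of points, which is one intuitive reason the estimate becomes noisy.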
Distance-Based Regularisation of Deep Networks for Fine-Tuning
Title | Distance-Based Regularisation of Deep Networks for Fine-Tuning |
Authors | Henry Gouk, Timothy M. Hospedales, Massimiliano Pontil |
Abstract | We investigate approaches to regularisation during fine-tuning of deep neural networks. First we provide a neural network generalisation bound based on Rademacher complexity that uses the distance the weights have moved from their initial values. This bound has no direct dependence on the number of weights and compares favourably to other bounds when applied to convolutional networks. Our bound is highly relevant for fine-tuning, because providing a network with a good initialisation based on transfer learning means that learning can modify the weights less, and hence achieve tighter generalisation. Inspired by this, we develop a simple yet effective fine-tuning algorithm that constrains the hypothesis class to a small sphere centred on the initial pre-trained weights, thus obtaining provably better generalisation performance than conventional transfer learning. Empirical evaluation shows that our algorithm works well, corroborating our theoretical results. It outperforms both state of the art fine-tuning competitors, and penalty-based alternatives that we show do not directly constrain the radius of the search space. |
Tasks | Transfer Learning |
Published | 2020-02-19 |
URL | https://arxiv.org/abs/2002.08253v1 |
https://arxiv.org/pdf/2002.08253v1.pdf | |
PWC | https://paperswithcode.com/paper/distance-based-regularisation-of-deep |
Repo | |
Framework | |
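The constraint described in the abstract, keeping fine-tuned weights within a small sphere centred on the pre-trained weights, can be illustrated with a projection step applied after each update. This sketch assumes a Euclidean norm and a flat weight vector; the norm actually used in the paper may differ, and `project_to_ball` is a hypothetical name.

```python
import math


def project_to_ball(weights, pretrained, radius):
    """Project `weights` onto the Euclidean ball of `radius` around `pretrained`.

    Assumption: Euclidean distance over a flat weight vector.
    """
    delta = [w - p for w, p in zip(weights, pretrained)]
    dist = math.sqrt(sum(d * d for d in delta))
    if dist <= radius:
        # Already inside the ball: leave the weights unchanged.
        return list(weights)
    scale = radius / dist
    # Move back along the line towards the pre-trained weights.
    return [p + scale * d for p, d in zip(pretrained, delta)]


pretrained = [0.0, 0.0]
outside = project_to_ball([3.0, 4.0], pretrained, radius=1.0)
inside = project_to_ball([0.1, 0.0], pretrained, radius=1.0)
```

Unlike a penalty term added to the loss, a projection of this kind enforces the radius as a hard constraint, which matches the distinction the abstract draws.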
Imperialist Competitive Algorithm with Independence and Constrained Assimilation for Solving 0-1 Multidimensional Knapsack Problem
Title | Imperialist Competitive Algorithm with Independence and Constrained Assimilation for Solving 0-1 Multidimensional Knapsack Problem |
Authors | Ivars Dzalbs, Tatiana Kalganova, Ian Dear |
Abstract | The multidimensional knapsack problem is a well-known constrained optimization problem with many real-world engineering applications. In order to solve this NP-hard problem, a new modified Imperialist Competitive Algorithm with Constrained Assimilation (ICAwICA) is presented. The proposed algorithm introduces the concept of colony independence: a colony is free to choose between classical ICA assimilation to its own empire's imperialist and assimilation to any other imperialist in the population. Furthermore, a constrained assimilation process has been implemented that combines classical ICA assimilation and revolution operators, while maintaining population diversity. This work investigates the performance of the proposed algorithm across 101 Multidimensional Knapsack Problem (MKP) benchmark instances. Experimental results show that the algorithm is able to obtain an optimal solution in all small instances and presents very competitive results for large MKP instances. |
Tasks | |
Published | 2020-03-14 |
URL | https://arxiv.org/abs/2003.06617v1 |
https://arxiv.org/pdf/2003.06617v1.pdf | |
PWC | https://paperswithcode.com/paper/imperialist-competitive-algorithm-with |
Repo | |
Framework | |
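To make the problem setting concrete, the following sketch evaluates a candidate solution of a 0-1 multidimensional knapsack instance: total profit plus a feasibility check against every capacity constraint. It illustrates only the problem definition, not the ICAwICA algorithm; the function name and the toy instance are assumptions.

```python
def evaluate_mkp(selection, profits, weights, capacities):
    """Return (total_profit, feasible) for a 0-1 selection vector.

    weights[d][j] is the consumption of resource d by item j;
    capacities[d] is the limit for resource d.
    """
    total_profit = sum(p for p, x in zip(profits, selection) if x)
    # Every resource dimension must stay within its capacity.
    feasible = all(
        sum(w for w, x in zip(row, selection) if x) <= cap
        for row, cap in zip(weights, capacities)
    )
    return total_profit, feasible


profits = [10, 7, 4]
weights = [[3, 2, 1], [4, 1, 3]]  # two resource dimensions, three items
capacities = [5, 5]

two_items = evaluate_mkp([1, 1, 0], profits, weights, capacities)
all_items = evaluate_mkp([1, 1, 1], profits, weights, capacities)
```

A population-based metaheuristic would call an evaluation of this kind on each candidate, typically repairing or penalising infeasible selections.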
BoostTree and BoostForest for Ensemble Learning
Title | BoostTree and BoostForest for Ensemble Learning |
Authors | Changming Zhao, Dongrui Wu, Jian Huang, Ye Yuan, Hai-Tao Zhang |
Abstract | Bootstrap aggregation (Bagging) and boosting are two popular ensemble learning approaches, which combine multiple base learners to generate a composite learner. This article proposes BoostForest, which is an ensemble learning approach using BoostTree as base learners and can be used for both classification and regression. BoostTree constructs a tree by gradient boosting, which trains a linear or nonlinear model at each node. When a new sample comes in, BoostTree first sorts it down to a leaf, then computes the final prediction by summing up the outputs of all models along the path from the root node to that leaf. BoostTree achieves high randomness (diversity) by sampling its parameters randomly from a parameter pool, and selecting a subset of features randomly at node splitting. BoostForest further increases the randomness by bootstrapping the training data in constructing different BoostTrees. BoostForest is compared with four classical ensemble learning approaches on 30 classification and regression datasets, demonstrating that it can generate more accurate and more robust composite learners. |
Tasks | |
Published | 2020-03-21 |
URL | https://arxiv.org/abs/2003.09737v1 |
https://arxiv.org/pdf/2003.09737v1.pdf | |
PWC | https://paperswithcode.com/paper/boosttree-and-boostforest-for-ensemble |
Repo | |
Framework | |
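The prediction rule described in the abstract, sorting a sample down to a leaf and summing the outputs of all models along the path, can be sketched as below. The `Node` structure, the axis-aligned threshold split, and the toy models are assumptions made for illustration; training by gradient boosting is not shown.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Node:
    # Hypothetical node layout: each node carries its own model.
    model: Callable[[Sequence[float]], float]
    feature: int = 0
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def predict(root, x):
    """Sum the outputs of every node model on the root-to-leaf path of x."""
    total = 0.0
    node = root
    while node is not None:
        total += node.model(x)
        if node.left is None and node.right is None:
            break  # reached a leaf
        node = node.left if x[node.feature] <= node.threshold else node.right
    return total


root = Node(
    model=lambda x: 1.0,
    feature=0,
    threshold=0.5,
    left=Node(model=lambda x: 2.0 * x[0]),
    right=Node(model=lambda x: -1.0),
)

left_prediction = predict(root, [0.25])   # root model + left leaf model
right_prediction = predict(root, [2.0])   # root model + right leaf model
```

A forest built on such trees would then combine the per-tree predictions, for example by averaging, across trees trained on bootstrapped data.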
A deep belief network-based method to identify proteomic risk markers for Alzheimer disease
Title | A deep belief network-based method to identify proteomic risk markers for Alzheimer disease |
Authors | Ning An, Liuqi Jin, Huitong Ding, Jiaoyun Yang, Jing Yuan |
Abstract | While a large body of research has formally identified apolipoprotein E (APOE) as a major genetic risk marker for Alzheimer disease, accumulating evidence supports the notion that other risk markers may exist. The traditional Alzheimer-specific signature analysis methods, however, have not been able to make full use of rich protein expression data, especially the interaction between attributes. This paper develops a novel feature selection method to identify pathogenic factors of Alzheimer disease using the proteomic and clinical data. This approach takes the weights of network nodes as the importance order of signaling protein expression values. After generating and evaluating the candidate subset, the method helps to select an optimal subset of proteins that achieved an accuracy greater than 90%, which is superior to traditional machine learning methods for clinical Alzheimer disease diagnosis. Besides identifying a proteomic risk marker and further reinforcing the link between metabolic risk factors and Alzheimer disease, this paper also suggests that adiponectin-linked pathways are a possible therapeutic drug target. |
Tasks | Feature Selection |
Published | 2020-03-11 |
URL | https://arxiv.org/abs/2003.05776v1 |
https://arxiv.org/pdf/2003.05776v1.pdf | |
PWC | https://paperswithcode.com/paper/a-deep-belief-network-based-method-to |
Repo | |
Framework | |
Corruption-Tolerant Gaussian Process Bandit Optimization
Title | Corruption-Tolerant Gaussian Process Bandit Optimization |
Authors | Ilija Bogunovic, Andreas Krause, Jonathan Scarlett |
Abstract | We consider the problem of optimizing an unknown (typically non-convex) function with a bounded norm in some Reproducing Kernel Hilbert Space (RKHS), based on noisy bandit feedback. We consider a novel variant of this problem in which the point evaluations are not only corrupted by random noise, but also adversarial corruptions. We introduce an algorithm Fast-Slow GP-UCB based on Gaussian process methods, randomized selection between two instances labeled “fast” (but non-robust) and “slow” (but robust), enlarged confidence bounds, and the principle of optimism under uncertainty. We present a novel theoretical analysis upper bounding the cumulative regret in terms of the corruption level, the time horizon, and the underlying kernel, and we argue that certain dependencies cannot be improved. We observe that distinct algorithmic ideas are required depending on whether one is required to perform well in both the corrupted and non-corrupted settings, and whether the corruption level is known or not. |
Tasks | |
Published | 2020-03-04 |
URL | https://arxiv.org/abs/2003.01971v1 |
https://arxiv.org/pdf/2003.01971v1.pdf | |
PWC | https://paperswithcode.com/paper/corruption-tolerant-gaussian-process-bandit |
Repo | |
Framework | |
Improving Semantic Analysis on Point Clouds via Auxiliary Supervision of Local Geometric Priors
Title | Improving Semantic Analysis on Point Clouds via Auxiliary Supervision of Local Geometric Priors |
Authors | Lulu Tang, Ke Chen, Chaozheng Wu, Yu Hong, Kui Jia, Zhixin Yang |
Abstract | Existing deep learning algorithms for point cloud analysis mainly concern discovering semantic patterns from global configuration of local geometries in a supervised learning manner. However, very few explore geometric properties revealing local surface manifolds embedded in 3D Euclidean space to discriminate semantic classes or object parts as additional supervision signals. This paper is the first attempt to propose a unique multi-task geometric learning network to improve semantic analysis by auxiliary geometric learning with local shape properties, which can be either generated via physical computation from point clouds themselves as self-supervision signals or provided as privileged information. Owing to explicitly encoding local shape manifolds in favor of semantic analysis, the proposed geometric self-supervised and privileged learning algorithms can achieve superior performance to their backbone baselines and other state-of-the-art methods, which are verified in the experiments on the popular benchmarks. |
Tasks | |
Published | 2020-01-14 |
URL | https://arxiv.org/abs/2001.04803v1 |
https://arxiv.org/pdf/2001.04803v1.pdf | |
PWC | https://paperswithcode.com/paper/improving-semantic-analysis-on-point-clouds |
Repo | |
Framework | |
Statistical Adaptive Stochastic Gradient Methods
Title | Statistical Adaptive Stochastic Gradient Methods |
Authors | Pengchuan Zhang, Hunter Lang, Qiang Liu, Lin Xiao |
Abstract | We propose a statistical adaptive procedure called SALSA for automatically scheduling the learning rate (step size) in stochastic gradient methods. SALSA first uses a smoothed stochastic line-search procedure to gradually increase the learning rate, then automatically switches to a statistical method to decrease the learning rate. The line search procedure “warms up” the optimization process, reducing the need for expensive trial and error in setting an initial learning rate. The method for decreasing the learning rate is based on a new statistical test for detecting stationarity when using a constant step size. Unlike in prior work, our test applies to a broad class of stochastic gradient algorithms without modification. The combined method is highly robust and autonomous, and it matches the performance of the best hand-tuned learning rate schedules in our experiments on several deep learning tasks. |
Tasks | |
Published | 2020-02-25 |
URL | https://arxiv.org/abs/2002.10597v1 |
https://arxiv.org/pdf/2002.10597v1.pdf | |
PWC | https://paperswithcode.com/paper/statistical-adaptive-stochastic-gradient |
Repo | |
Framework | |
Neural arbitrary style transfer for portrait images using the attention mechanism
Title | Neural arbitrary style transfer for portrait images using the attention mechanism |
Authors | S. A. Berezin, V. M. Volkova |
Abstract | Arbitrary style transfer is the task of synthesizing an image that has never been seen before from two given images: a content image and a style image. The content image forms the structure, the basic geometric lines and shapes of the resulting image, while the style image sets the color and texture of the result. The word “arbitrary” in this context means the absence of any one pre-learned style. So, for example, convolutional neural networks capable of transferring a new style only after training or retraining on a new amount of data are not considered to solve such a problem, while networks based on the attention mechanism that are capable of performing such a transformation without retraining are. An original image can be, for example, a photograph, and a style image can be a painting by a famous artist. The resulting image in this case will be the scene depicted in the original photograph, rendered in the style of this painting. Recent arbitrary style transfer algorithms make it possible to achieve good results in this task; however, in processing portrait images of people, the result of such algorithms is either unacceptable due to excessive distortion of facial features, or weakly expressed, not bearing the characteristic features of the style image. In this paper, we consider an approach to solving this problem using a combined architecture of deep neural networks with an attention mechanism that transfers style based on the contents of a particular image segment: with a clear predominance of style over the form for the background part of the image, and with the prevalence of content over the form in the image part containing directly the image of a person. |
Tasks | Style Transfer |
Published | 2020-02-17 |
URL | https://arxiv.org/abs/2002.07643v1 |
https://arxiv.org/pdf/2002.07643v1.pdf | |
PWC | https://paperswithcode.com/paper/neural-arbitrary-style-transfer-for-portrait |
Repo | |
Framework | |
Multiclass classification by sparse multinomial logistic regression
Title | Multiclass classification by sparse multinomial logistic regression |
Authors | Felix Abramovich, Vadim Grinshtein, Tomer Levy |
Abstract | In this paper we consider high-dimensional multiclass classification by sparse multinomial logistic regression. We propose a feature selection procedure based on penalized maximum likelihood with a complexity penalty on the model size and derive nonasymptotic bounds for the misclassification excess risk of the resulting classifier. We also establish their tightness by deriving the corresponding minimax lower bounds. In particular, we show that there exist two regimes corresponding to small and large numbers of classes. The bounds can be reduced under an additional low-noise condition. Implementation of any complexity-penalty-based procedure, however, requires a combinatorial search over all possible models. To find a feature selection procedure that is computationally feasible for high-dimensional data, we propose multinomial logistic group Lasso and Slope classifiers and show that they also achieve the optimal order in the minimax sense. |
Tasks | Feature Selection |
Published | 2020-03-04 |
URL | https://arxiv.org/abs/2003.01951v2 |
https://arxiv.org/pdf/2003.01951v2.pdf | |
PWC | https://paperswithcode.com/paper/multiclass-classification-by-sparse |
Repo | |
Framework | |
Automated classification of stems and leaves of potted plants based on point cloud data
Title | Automated classification of stems and leaves of potted plants based on point cloud data |
Authors | Zichu Liu, Qing Zhang, Pei Wang, Zhen Li, Huiru Wang |
Abstract | The accurate classification of plant organs is a key step in monitoring the growing status and physiology of plants. A classification method was proposed to classify the leaves and stems of potted plants automatically based on the point cloud data of the plants, which is a nondestructive acquisition. The leaf point training samples were automatically extracted by using the three-dimensional convex hull algorithm, while stem point training samples were extracted by using the point density of a two-dimensional projection. The two training sets were used to classify all the points into leaf points and stem points by utilizing the support vector machine (SVM) algorithm. The proposed method was tested by using the point cloud data of three potted plants and compared with two other methods, which showed that the proposed method can classify leaf and stem points accurately and efficiently. |
Tasks | |
Published | 2020-02-28 |
URL | https://arxiv.org/abs/2002.12536v1 |
https://arxiv.org/pdf/2002.12536v1.pdf | |
PWC | https://paperswithcode.com/paper/automated-classification-of-stems-and-leaves |
Repo | |
Framework | |
Sperm Detection and Tracking in Phase-Contrast Microscopy Image Sequences using Deep Learning and Modified CSR-DCF
Title | Sperm Detection and Tracking in Phase-Contrast Microscopy Image Sequences using Deep Learning and Modified CSR-DCF |
Authors | Mohammad reza Mohammadi, Mohammad Rahimzadeh, Abolfazl Attar |
Abstract | Nowadays, computer-aided sperm analysis (CASA) systems have made a big leap in extracting the characteristics of spermatozoa for studies or for measuring human fertility. The first step in sperm characteristics analysis is sperm detection in the frames of the video sample. In this article, we use a deep fully convolutional network as the object detector. Sperms are small objects with few attributes, which makes detection more difficult in high-density samples, especially when the semen contains other particles that resemble sperm heads. One of the main attributes of sperms is their movement, but this attribute cannot be extracted when only one frame is fed to the network. To improve the performance of the sperm detection network, we concatenate several consecutive frames to use as the input of the network. With this method, the motility attribute is also extracted, and with the help of deep convolutional layers we achieve high accuracy in sperm detection. In the tracking phase, we modify the CSR-DCF algorithm. This method also shows excellent results in sperm tracking, even in high-density sperm samples, under occlusions and sperm collisions, and when sperms exit a frame and re-enter in later frames. The average precision of the detection phase is 99.1%, and the F1 score of the tracking method evaluation is 96.46%. These results can be a great help in studies investigating sperm behavior and analyzing fertility potential. |
Tasks | |
Published | 2020-02-11 |
URL | https://arxiv.org/abs/2002.04034v3 |
https://arxiv.org/pdf/2002.04034v3.pdf | |
PWC | https://paperswithcode.com/paper/sperm-detection-and-tracking-in-phase |
Repo | |
Framework | |
Optimization with Momentum: Dynamical, Control-Theoretic, and Symplectic Perspectives
Title | Optimization with Momentum: Dynamical, Control-Theoretic, and Symplectic Perspectives |
Authors | Michael Muehlebach, Michael I. Jordan |
Abstract | We analyze the convergence rate of various momentum-based optimization algorithms from a dynamical systems point of view. Our analysis exploits fundamental topological properties, such as the continuous dependence of iterates on their initial conditions, to provide a simple characterization of convergence rates. In many cases, closed-form expressions are obtained that relate algorithm parameters to the convergence rate. The analysis encompasses discrete time and continuous time, as well as time-invariant and time-variant formulations, and is not limited to a convex or Euclidean setting. In addition, the article rigorously establishes why symplectic discretization schemes are important for momentum-based optimization algorithms, and provides a characterization of algorithms that exhibit accelerated convergence. |
Tasks | |
Published | 2020-02-28 |
URL | https://arxiv.org/abs/2002.12493v1 |
https://arxiv.org/pdf/2002.12493v1.pdf | |
PWC | https://paperswithcode.com/paper/optimization-with-momentum-dynamical-control |
Repo | |
Framework | |