April 2, 2020

3099 words 15 mins read

Paper Group ANR 373

A Deep Learning Approach to Diagnosing Multiple Sclerosis from Smartphone Data. Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning. Mix-n-Match: Ensemble and Compositional Methods for Uncertainty Calibration in Deep Learning. Distance-Based Regularisation of Deep Networks for Fine-Tuning. Imperialist Competitive Algorithm …

A Deep Learning Approach to Diagnosing Multiple Sclerosis from Smartphone Data

Title A Deep Learning Approach to Diagnosing Multiple Sclerosis from Smartphone Data
Authors Patrick Schwab, Walter Karlen
Abstract Multiple sclerosis (MS) affects the central nervous system with a wide range of symptoms. MS can, for example, cause pain, changes in mood and fatigue, and may impair a person’s movement, speech and visual functions. Diagnosis of MS typically involves a combination of complex clinical assessments and tests to rule out other diseases with similar symptoms. New technologies, such as smartphone monitoring in free-living conditions, could potentially aid in objectively assessing the symptoms of MS by quantifying symptom presence and intensity over long periods of time. Here, we present a deep-learning approach to diagnosing MS from smartphone-derived digital biomarkers that uses a novel combination of a multilayer perceptron with neural soft attention to improve learning of patterns in long-term smartphone monitoring data. Using data from a cohort of 774 participants, we demonstrate that our deep-learning models are able to distinguish between people with and without MS with an area under the receiver operating characteristic curve of 0.88 (95% CI: 0.70, 0.88). Our experimental results indicate that digital biomarkers derived from smartphone data could in the future be used as additional diagnostic criteria for MS.
Tasks
Published 2020-01-02
URL https://arxiv.org/abs/2001.09748v1
PDF https://arxiv.org/pdf/2001.09748v1.pdf
PWC https://paperswithcode.com/paper/a-deep-learning-approach-to-diagnosing
Repo
Framework
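
The paper's core modelling idea, an MLP combined with neural soft attention over long monitoring sequences, can be sketched in a few lines of numpy. This is an illustrative toy, not the authors' implementation; the shapes (90 daily windows, 16 features) and the single random scoring vector `w_att` are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: 90 daily feature vectors, 16 features each.
T, d = 90, 16
x = rng.normal(size=(T, d))          # smartphone-derived features per day
w_att = rng.normal(size=d)           # attention scoring vector (learned in practice)

# Neural soft attention: score each time step, softmax-normalise,
# and pool the sequence into a single weighted summary vector.
scores = x @ w_att                   # (T,)
alpha = np.exp(scores - scores.max())
alpha = alpha / alpha.sum()          # attention weights, sum to 1
summary = alpha @ x                  # (d,) weighted average of time steps

# An MLP head would then map `summary` to a diagnosis probability.
print(summary.shape, float(alpha.sum()))
```

The attention weights make the pooling interpretable: days that score highly contribute more to the summary the classifier sees.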

Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning

Title Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning
Authors Mitchell A. Gordon, Kevin Duh, Nicholas Andrews
Abstract Universal feature extractors, such as BERT for natural language processing and VGG for computer vision, have become effective methods for improving deep learning models without requiring more labeled data. A common paradigm is to pre-train a feature extractor on large amounts of data then fine-tune it as part of a deep learning model on some downstream task (i.e. transfer learning). While effective, feature extractors like BERT may be prohibitively large for some deployment scenarios. We explore weight pruning for BERT and ask: how does compression during pre-training affect transfer learning? We find that pruning affects transfer learning in three broad regimes. Low levels of pruning (30-40%) do not affect pre-training loss or transfer to downstream tasks at all. Medium levels of pruning increase the pre-training loss and prevent useful pre-training information from being transferred to downstream tasks. High levels of pruning additionally prevent models from fitting downstream datasets, leading to further degradation. Finally, we observe that fine-tuning BERT on a specific task does not improve its prunability. We conclude that BERT can be pruned once during pre-training rather than separately for each task without affecting performance.
Tasks Transfer Learning
Published 2020-02-19
URL https://arxiv.org/abs/2002.08307v1
PDF https://arxiv.org/pdf/2002.08307v1.pdf
PWC https://paperswithcode.com/paper/compressing-bert-studying-the-effects-of-1
Repo
Framework
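
The pruning regimes described above come from magnitude pruning, i.e. zeroing the smallest weights. A minimal numpy sketch of global magnitude pruning (not tied to the paper's BERT setup; the weight matrix here is random):

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights (global pruning)."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]   # k-th smallest magnitude
    mask = np.abs(weights) > threshold
    return weights * mask

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64))
w30 = magnitude_prune(w, 0.30)   # "low" regime in the paper
w90 = magnitude_prune(w, 0.90)   # "high" regime

print((w30 == 0).mean(), (w90 == 0).mean())
```

In practice this is applied to each (or all) of BERT's weight matrices during pre-training, then the surviving weights are fine-tuned.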

Mix-n-Match: Ensemble and Compositional Methods for Uncertainty Calibration in Deep Learning

Title Mix-n-Match: Ensemble and Compositional Methods for Uncertainty Calibration in Deep Learning
Authors Jize Zhang, Bhavya Kailkhura, T. Yong-Jin Han
Abstract This paper studies the problem of post-hoc calibration of machine learning classifiers. We introduce the following desiderata for uncertainty calibration: (a) accuracy-preserving, (b) data-efficient, and (c) high expressive power. We show that none of the existing methods satisfy all three requirements, and demonstrate how Mix-n-Match calibration strategies (i.e., ensemble and composition) can help achieve remarkably better data-efficiency and expressive power while provably preserving the classification accuracy of the original classifier. We also show that existing calibration error estimators (e.g., histogram-based ECE) are unreliable, especially in the small-data regime. Therefore, we propose an alternative data-efficient kernel density-based estimator for a reliable evaluation of calibration performance and prove its asymptotic unbiasedness and consistency.
Tasks Calibration
Published 2020-03-16
URL https://arxiv.org/abs/2003.07329v1
PDF https://arxiv.org/pdf/2003.07329v1.pdf
PWC https://paperswithcode.com/paper/mix-n-match-ensemble-and-compositional
Repo
Framework
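
Temperature scaling is the classic example of an accuracy-preserving post-hoc calibrator of the kind the desiderata refer to: it rescales logits by a single scalar, which never changes the argmax. A toy numpy sketch (synthetic logits and noisy labels; this is not the paper's Mix-n-Match ensembles):

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nll(logits, labels, T):
    """Negative log-likelihood of the labels under temperature-scaled probabilities."""
    p = softmax(logits, T)
    return -np.log(p[np.arange(len(labels)), labels]).mean()

rng = np.random.default_rng(0)
logits = rng.normal(scale=3.0, size=(500, 10))   # overconfident synthetic logits
labels = logits.argmax(axis=1)                   # start from the model's own predictions...
flip = rng.random(500) < 0.3                     # ...then corrupt 30% to mimic miscalibration
labels[flip] = rng.integers(0, 10, size=int(flip.sum()))

# Fit T by grid search on held-out NLL.
Ts = np.linspace(0.5, 5.0, 46)
best_T = min(Ts, key=lambda T: nll(logits, labels, T))

# Dividing logits by a single positive temperature never changes the argmax,
# so the method is accuracy-preserving by construction.
preserved = (softmax(logits, best_T).argmax(axis=1) == logits.argmax(axis=1)).all()
print(best_T, preserved)
```

Mix-n-Match-style strategies go further by ensembling and composing such base calibrators to gain expressive power without giving up this accuracy-preservation property.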

Distance-Based Regularisation of Deep Networks for Fine-Tuning

Title Distance-Based Regularisation of Deep Networks for Fine-Tuning
Authors Henry Gouk, Timothy M. Hospedales, Massimiliano Pontil
Abstract We investigate approaches to regularisation during fine-tuning of deep neural networks. First we provide a neural network generalisation bound based on Rademacher complexity that uses the distance the weights have moved from their initial values. This bound has no direct dependence on the number of weights and compares favourably to other bounds when applied to convolutional networks. Our bound is highly relevant for fine-tuning, because providing a network with a good initialisation based on transfer learning means that learning can modify the weights less, and hence achieve tighter generalisation. Inspired by this, we develop a simple yet effective fine-tuning algorithm that constrains the hypothesis class to a small sphere centred on the initial pre-trained weights, thus obtaining provably better generalisation performance than conventional transfer learning. Empirical evaluation shows that our algorithm works well, corroborating our theoretical results. It outperforms both state-of-the-art fine-tuning competitors and penalty-based alternatives that, as we show, do not directly constrain the radius of the search space.
Tasks Transfer Learning
Published 2020-02-19
URL https://arxiv.org/abs/2002.08253v1
PDF https://arxiv.org/pdf/2002.08253v1.pdf
PWC https://paperswithcode.com/paper/distance-based-regularisation-of-deep
Repo
Framework
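
Constraining the hypothesis class to a sphere around the pre-trained weights can be implemented as a projection step after each gradient update (projected gradient descent). A minimal sketch; the radius and dimensions are illustrative:

```python
import numpy as np

def project_to_sphere(w, w0, radius):
    """Project weights w onto the L2 ball of given radius centred at w0."""
    delta = w - w0
    norm = np.linalg.norm(delta)
    if norm <= radius:
        return w
    return w0 + delta * (radius / norm)

rng = np.random.default_rng(0)
w0 = rng.normal(size=100)          # pre-trained initialisation
w = w0 + rng.normal(size=100)      # weights after some unconstrained fine-tuning steps

w_proj = project_to_sphere(w, w0, radius=0.5)
print(np.linalg.norm(w_proj - w0))
```

Unlike an L2 penalty toward `w0`, this projection enforces a hard bound on the distance moved, which is exactly the quantity the generalisation bound depends on.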

Imperialist Competitive Algorithm with Independence and Constrained Assimilation for Solving 0-1 Multidimensional Knapsack Problem

Title Imperialist Competitive Algorithm with Independence and Constrained Assimilation for Solving 0-1 Multidimensional Knapsack Problem
Authors Ivars Dzalbs, Tatiana Kalganova, Ian Dear
Abstract The multidimensional knapsack problem is a well-known constrained optimization problem with many real-world engineering applications. In order to solve this NP-hard problem, a new modified Imperialist Competitive Algorithm with Constrained Assimilation (ICAwICA) is presented. The proposed algorithm introduces the concept of colony independence: the free will to choose between classical ICA assimilation towards the colony's own imperialist and assimilation towards any other imperialist in the population. Furthermore, a constrained assimilation process has been implemented that combines the classical ICA assimilation and revolution operators while maintaining population diversity. This work investigates the performance of the proposed algorithm across 101 Multidimensional Knapsack Problem (MKP) benchmark instances. Experimental results show that the algorithm is able to obtain the optimal solution on all small instances and presents very competitive results on large MKP instances.
Tasks
Published 2020-03-14
URL https://arxiv.org/abs/2003.06617v1
PDF https://arxiv.org/pdf/2003.06617v1.pdf
PWC https://paperswithcode.com/paper/imperialist-competitive-algorithm-with
Repo
Framework
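
For reference, the 0-1 multidimensional knapsack problem asks for a binary item-selection vector maximising total profit subject to several capacity constraints at once. A tiny fitness/feasibility check of the kind any ICA variant would evaluate internally (the instance below is made up, not one of the 101 benchmarks):

```python
import numpy as np

def mkp_value(x, profits, weights, capacities):
    """Value of a 0-1 MKP solution x, or -inf if any capacity is violated."""
    x = np.asarray(x)
    if np.any(weights @ x > capacities):
        return float("-inf")
    return float(profits @ x)

profits = np.array([10, 13, 7, 8])
weights = np.array([[2, 3, 1, 2],    # resource 1 usage per item
                    [4, 1, 3, 2]])   # resource 2 usage per item
capacities = np.array([5, 7])

print(mkp_value([1, 1, 0, 0], profits, weights, capacities))  # feasible selection
print(mkp_value([1, 1, 1, 1], profits, weights, capacities))  # violates both capacities
```

A population-based method like ICAwICA evolves candidate binary vectors and uses a fitness of this shape (often with repair or penalty instead of hard rejection) to rank colonies and empires.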

BoostTree and BoostForest for Ensemble Learning

Title BoostTree and BoostForest for Ensemble Learning
Authors Changming Zhao, Dongrui Wu, Jian Huang, Ye Yuan, Hai-Tao Zhang
Abstract Bootstrap aggregation (Bagging) and boosting are two popular ensemble learning approaches, which combine multiple base learners to generate a composite learner. This article proposes BoostForest, which is an ensemble learning approach using BoostTree as base learners and can be used for both classification and regression. BoostTree constructs a tree by gradient boosting, which trains a linear or nonlinear model at each node. When a new sample comes in, BoostTree first sorts it down to a leaf, then computes the final prediction by summing up the outputs of all models along the path from the root node to that leaf. BoostTree achieves high randomness (diversity) by sampling its parameters randomly from a parameter pool, and selecting a subset of features randomly at node splitting. BoostForest further increases the randomness by bootstrapping the training data in constructing different BoostTrees. BoostForest is compared with four classical ensemble learning approaches on 30 classification and regression datasets, demonstrating that it can generate more accurate and more robust composite learners.
Tasks
Published 2020-03-21
URL https://arxiv.org/abs/2003.09737v1
PDF https://arxiv.org/pdf/2003.09737v1.pdf
PWC https://paperswithcode.com/paper/boosttree-and-boostforest-for-ensemble
Repo
Framework
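
BoostTree's prediction rule, summing the outputs of node-level models along the root-to-leaf path, can be sketched with a hypothetical two-level tree of linear node models. This is illustrative only: the real BoostTree fits these models by gradient boosting and samples its hyperparameters from a pool.

```python
import numpy as np

class Node:
    """A tree node holding its own linear model; internal nodes also hold a split."""
    def __init__(self, w, b, split=None):
        self.w, self.b = w, b        # node-level linear model
        self.split = split           # (feature_index, threshold, left, right) or None

    def predict(self, x):
        out = float(self.w @ x + self.b)
        if self.split is None:       # leaf: path ends here
            return out
        f, t, left, right = self.split
        child = left if x[f] <= t else right
        return out + child.predict(x)   # sum node outputs along the path

leaf_l = Node(np.array([0.1, 0.0]), -0.2)
leaf_r = Node(np.array([0.0, 0.3]), 0.1)
root = Node(np.array([0.5, -0.5]), 1.0, split=(0, 0.0, leaf_l, leaf_r))

x = np.array([1.0, 2.0])
print(root.predict(x))   # root output plus right-leaf output
```

A BoostForest would bootstrap the training data, build many such trees with randomly sampled parameters, and average their predictions.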

A deep belief network-based method to identify proteomic risk markers for Alzheimer disease

Title A deep belief network-based method to identify proteomic risk markers for Alzheimer disease
Authors Ning An, Liuqi Jin, Huitong Ding, Jiaoyun Yang, Jing Yuan
Abstract While a large body of research has formally identified apolipoprotein E (APOE) as a major genetic risk marker for Alzheimer disease, accumulating evidence supports the notion that other risk markers may exist. Traditional Alzheimer-specific signature analysis methods, however, have not been able to make full use of rich protein expression data, especially the interactions between attributes. This paper develops a novel feature selection method to identify pathogenic factors of Alzheimer disease using proteomic and clinical data. The approach takes the weights of network nodes as the importance ordering of the signaling protein expression values. After generating and evaluating candidate subsets, the method selects an optimal subset of proteins that achieved an accuracy greater than 90%, superior to traditional machine learning methods for clinical Alzheimer disease diagnosis. Besides identifying a proteomic risk marker and further reinforcing the link between metabolic risk factors and Alzheimer disease, this paper also suggests that adiponectin-linked pathways are a possible therapeutic drug target.
Tasks Feature Selection
Published 2020-03-11
URL https://arxiv.org/abs/2003.05776v1
PDF https://arxiv.org/pdf/2003.05776v1.pdf
PWC https://paperswithcode.com/paper/a-deep-belief-network-based-method-to
Repo
Framework
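
The idea of using network node weights as an importance ordering can be illustrated by ranking input features by the aggregate magnitude of their outgoing first-layer weights. A toy sketch, with random weights standing in for a trained deep belief network:

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_hidden = 20, 8
W1 = rng.normal(size=(n_hidden, n_features))   # first-layer weights (trained, in practice)

# Rank input features (proteins) by the total magnitude of their outgoing
# weights; take the top-k as the candidate subset to evaluate downstream.
importance = np.abs(W1).sum(axis=0)
top_k = np.argsort(importance)[::-1][:5]
print(top_k)
```

The paper's method then evaluates such candidate subsets with a classifier and keeps the subset that maximises diagnostic accuracy.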

Corruption-Tolerant Gaussian Process Bandit Optimization

Title Corruption-Tolerant Gaussian Process Bandit Optimization
Authors Ilija Bogunovic, Andreas Krause, Jonathan Scarlett
Abstract We consider the problem of optimizing an unknown (typically non-convex) function with a bounded norm in some Reproducing Kernel Hilbert Space (RKHS), based on noisy bandit feedback. We consider a novel variant of this problem in which the point evaluations are corrupted not only by random noise but also by adversarial corruptions. We introduce an algorithm, Fast-Slow GP-UCB, based on Gaussian process methods, randomized selection between two instances labeled “fast” (but non-robust) and “slow” (but robust), enlarged confidence bounds, and the principle of optimism under uncertainty. We present a novel theoretical analysis upper bounding the cumulative regret in terms of the corruption level, the time horizon, and the underlying kernel, and we argue that certain dependencies cannot be improved. We observe that distinct algorithmic ideas are required depending on whether one must perform well in both the corrupted and non-corrupted settings, and on whether the corruption level is known or not.
Tasks
Published 2020-03-04
URL https://arxiv.org/abs/2003.01971v1
PDF https://arxiv.org/pdf/2003.01971v1.pdf
PWC https://paperswithcode.com/paper/corruption-tolerant-gaussian-process-bandit
Repo
Framework
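
The ingredients, a GP posterior, an optimistic upper confidence bound, and a corruption-dependent enlargement of that bound, can be sketched on a 1-D toy problem. This is a single-instance illustration, not the actual Fast-Slow GP-UCB algorithm (which randomizes between a fast and a slow instance); `beta` and the corruption budget `C` are made-up values.

```python
import numpy as np

def rbf(a, b, ell=0.2):
    """Squared-exponential kernel between two 1-D point sets."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, 8)                       # past query points
y = np.sin(6 * X) + 0.1 * rng.normal(size=8)   # noisy observations
grid = np.linspace(0, 1, 200)                  # candidate points

# Standard GP posterior mean and variance on the grid.
noise = 0.1 ** 2
K = rbf(X, X) + noise * np.eye(8)
k_star = rbf(grid, X)
mu = k_star @ np.linalg.solve(K, y)
var = 1.0 - np.einsum("ij,ji->i", k_star, np.linalg.solve(K, k_star.T))

# GP-UCB picks the maximiser of mean + width * std; a corruption-robust
# variant enlarges the width by a term tied to the corruption budget C.
beta, C = 2.0, 0.5
ucb = mu + (beta + C) * np.sqrt(np.maximum(var, 0))
x_next = grid[ucb.argmax()]
print(x_next)
```

Enlarging the confidence width keeps the true function inside the bound even when an adversary has shifted some observations, at the price of more exploration.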

Improving Semantic Analysis on Point Clouds via Auxiliary Supervision of Local Geometric Priors

Title Improving Semantic Analysis on Point Clouds via Auxiliary Supervision of Local Geometric Priors
Authors Lulu Tang, Ke Chen, Chaozheng Wu, Yu Hong, Kui Jia, Zhixin Yang
Abstract Existing deep learning algorithms for point cloud analysis mainly concern discovering semantic patterns from the global configuration of local geometries in a supervised learning manner. However, very few explore geometric properties revealing local surface manifolds embedded in 3D Euclidean space to discriminate semantic classes or object parts as additional supervision signals. This paper is the first attempt to propose a unique multi-task geometric learning network to improve semantic analysis by auxiliary geometric learning with local shape properties, which can either be generated via physical computation from the point clouds themselves as self-supervision signals or be provided as privileged information. By explicitly encoding local shape manifolds in favor of semantic analysis, the proposed geometric self-supervised and privileged learning algorithms achieve superior performance to their backbone baselines and other state-of-the-art methods, as verified in experiments on popular benchmarks.
Tasks
Published 2020-01-14
URL https://arxiv.org/abs/2001.04803v1
PDF https://arxiv.org/pdf/2001.04803v1.pdf
PWC https://paperswithcode.com/paper/improving-semantic-analysis-on-point-clouds
Repo
Framework
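
The auxiliary supervision described above boils down to adding a geometric term to the semantic loss. A minimal numpy sketch with per-point surface normals as the auxiliary target; the cosine-error form and the weight `lam` are assumptions, not the paper's exact loss:

```python
import numpy as np

def multitask_loss(sem_logits, sem_labels, pred_normals, true_normals, lam=0.5):
    """Joint loss: per-point semantic cross-entropy plus an auxiliary
    geometric term (cosine error on unit surface normals)."""
    Z = sem_logits - sem_logits.max(axis=1, keepdims=True)
    logp = Z - np.log(np.exp(Z).sum(axis=1, keepdims=True))
    ce = -logp[np.arange(len(sem_labels)), sem_labels].mean()
    cos = (pred_normals * true_normals).sum(axis=1)   # 1 when normals agree
    geo = (1.0 - cos).mean()
    return ce + lam * geo

rng = np.random.default_rng(0)
n = 100
logits = rng.normal(size=(n, 4))                      # 4 semantic classes
labels = rng.integers(0, 4, n)
normals = rng.normal(size=(n, 3))
normals /= np.linalg.norm(normals, axis=1, keepdims=True)

print(multitask_loss(logits, labels, normals, normals))   # geometric term is zero here
```

In the self-supervised variant the "true" normals are computed from the point cloud itself; in the privileged variant they are given only at training time.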

Statistical Adaptive Stochastic Gradient Methods

Title Statistical Adaptive Stochastic Gradient Methods
Authors Pengchuan Zhang, Hunter Lang, Qiang Liu, Lin Xiao
Abstract We propose a statistical adaptive procedure called SALSA for automatically scheduling the learning rate (step size) in stochastic gradient methods. SALSA first uses a smoothed stochastic line-search procedure to gradually increase the learning rate, then automatically switches to a statistical method to decrease the learning rate. The line search procedure “warms up” the optimization process, reducing the need for expensive trial and error in setting an initial learning rate. The method for decreasing the learning rate is based on a new statistical test for detecting stationarity when using a constant step size. Unlike in prior work, our test applies to a broad class of stochastic gradient algorithms without modification. The combined method is highly robust and autonomous, and it matches the performance of the best hand-tuned learning rate schedules in our experiments on several deep learning tasks.
Tasks
Published 2020-02-25
URL https://arxiv.org/abs/2002.10597v1
PDF https://arxiv.org/pdf/2002.10597v1.pdf
PWC https://paperswithcode.com/paper/statistical-adaptive-stochastic-gradient
Repo
Framework
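
The switching criterion, decreasing the learning rate once the loss sequence looks statistically stationary under a constant step size, can be caricatured with a simple test on successive loss differences. This is a deliberately crude stand-in for the paper's statistical test:

```python
import numpy as np

def stationarity_detected(losses, window=20, z=2.0):
    """Crude stationarity check: is the recent loss trend statistically flat?

    Compares the mean of successive loss differences to its standard error;
    a simplified stand-in for SALSA's actual test.
    """
    recent = np.asarray(losses[-window:])
    diffs = np.diff(recent)
    se = diffs.std(ddof=1) / np.sqrt(len(diffs))
    return bool(abs(diffs.mean()) < z * se)

rng = np.random.default_rng(0)
decreasing = list(1.0 / np.arange(1, 60))           # loss still improving
flat = list(0.1 + 0.01 * rng.normal(size=60))       # loss plateaued

print(stationarity_detected(decreasing), stationarity_detected(flat))
```

A scheduler built on this would keep the step size constant while the test reports a trend and cut it (e.g. by a fixed factor) once stationarity is detected.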

Neural arbitrary style transfer for portrait images using the attention mechanism

Title Neural arbitrary style transfer for portrait images using the attention mechanism
Authors S. A. Berezin, V. M. Volkova
Abstract Arbitrary style transfer is the task of synthesising an image that has never been seen before from two given images: a content image and a style image. The content image provides the structure, the basic geometric lines and shapes of the resulting image, while the style image sets the color and texture of the result. The word “arbitrary” in this context means the absence of any single pre-learned style: convolutional neural networks that can transfer a new style only after training or retraining on new data are not considered solutions to this problem, whereas networks based on the attention mechanism, which can perform such a transformation without retraining, are. The original image can be, for example, a photograph, and the style image a painting by a famous artist; the resulting image then depicts the scene from the original photograph rendered in the style of that painting. Recent arbitrary style transfer algorithms achieve good results on this task in general; however, on portrait images of people their output is either unacceptable due to excessive distortion of facial features, or weakly expressed, lacking the characteristic features of the style image. In this paper, we consider an approach to this problem using a combined deep neural network architecture with an attention mechanism that transfers style based on the contents of a particular image segment: with a clear predominance of style over form for the background part of the image, and with the prevalence of content over form in the part of the image directly containing the person.
Tasks Style Transfer
Published 2020-02-17
URL https://arxiv.org/abs/2002.07643v1
PDF https://arxiv.org/pdf/2002.07643v1.pdf
PWC https://paperswithcode.com/paper/neural-arbitrary-style-transfer-for-portrait
Repo
Framework
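
The segment-dependent trade-off described above (style dominating in the background, content dominating on the person) can be illustrated with a mask-weighted blend of a content image and a stylized image. This toy skips the attention network entirely; the mask and the per-region style strengths are made up:

```python
import numpy as np

rng = np.random.default_rng(0)
H, W = 32, 32
content = rng.random((H, W, 3))    # stand-in for the portrait photograph
stylized = rng.random((H, W, 3))   # stand-in for a fully stylized rendering

# Hypothetical hard person mask (1 = person region, 0 = background),
# which a segmentation model would produce in practice.
mask = np.zeros((H, W, 1))
mask[8:24, 8:24] = 1.0

# Segment-aware blend: weak style on the person, strong style on the background.
alpha_person, alpha_bg = 0.2, 0.9
alpha = np.where(mask == 1.0, alpha_person, alpha_bg)
result = alpha * stylized + (1 - alpha) * content
print(result.shape)
```

The paper's attention mechanism effectively learns a soft, content-dependent version of `alpha` rather than a fixed per-region constant.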

Multiclass classification by sparse multinomial logistic regression

Title Multiclass classification by sparse multinomial logistic regression
Authors Felix Abramovich, Vadim Grinshtein, Tomer Levy
Abstract In this paper we consider high-dimensional multiclass classification by sparse multinomial logistic regression. We propose a feature selection procedure based on penalized maximum likelihood with a complexity penalty on the model size and derive nonasymptotic bounds for the misclassification excess risk of the resulting classifier. We also establish their tightness by deriving the corresponding minimax lower bounds. In particular, we show that there exist two regimes corresponding to a small and a large number of classes. The bounds can be reduced under an additional low-noise condition. Implementation of any complexity-penalty-based procedure, however, requires a combinatorial search over all possible models. To find a feature selection procedure computationally feasible for high-dimensional data, we propose multinomial logistic group Lasso and Slope classifiers and show that they also achieve the optimal order in the minimax sense.
Tasks Feature Selection
Published 2020-03-04
URL https://arxiv.org/abs/2003.01951v2
PDF https://arxiv.org/pdf/2003.01951v2.pdf
PWC https://paperswithcode.com/paper/multiclass-classification-by-sparse
Repo
Framework
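
A Lasso-type relative of the proposed group-Lasso/Slope classifiers can be sketched with proximal gradient descent on the multinomial likelihood plus an L1 penalty, soft-thresholding the weights after every gradient step. The data below is synthetic with only three informative features, and the hyperparameters are illustrative:

```python
import numpy as np

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def sparse_multinomial_fit(X, y, lam=0.1, lr=0.5, steps=300):
    """Multinomial logistic regression with an L1 penalty, fitted by
    proximal gradient descent (soft-thresholding after each step)."""
    n, d = X.shape
    K = int(y.max()) + 1
    Y = np.eye(K)[y]                      # one-hot labels
    W = np.zeros((d, K))
    for _ in range(steps):
        grad = X.T @ (softmax(X @ W) - Y) / n
        W = W - lr * grad
        W = np.sign(W) * np.maximum(np.abs(W) - lr * lam, 0.0)  # prox of L1
    return W

rng = np.random.default_rng(0)
n, d, K = 300, 20, 3
X = rng.normal(size=(n, d))
w_true = np.zeros((d, K))
w_true[:3] = rng.normal(scale=3.0, size=(3, K))   # only 3 informative features
y = (X @ w_true).argmax(axis=1)

W = sparse_multinomial_fit(X, y)
print((np.abs(W) > 0).mean())   # fraction of nonzero coefficients
```

The group-Lasso variant in the paper instead thresholds whole rows of `W` at once, so a feature is kept or dropped jointly across all classes.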

Automated classification of stems and leaves of potted plants based on point cloud data

Title Automated classification of stems and leaves of potted plants based on point cloud data
Authors Zichu Liu, Qing Zhang, Pei Wang, Zhen Li, Huiru Wang
Abstract The accurate classification of plant organs is a key step in monitoring the growing status and physiology of plants. A classification method was proposed to classify the leaves and stems of potted plants automatically based on the point cloud data of the plants, which is a nondestructive acquisition. The leaf point training samples were automatically extracted by using the three-dimensional convex hull algorithm, while stem point training samples were extracted by using the point density of a two-dimensional projection. The two training sets were used to classify all the points into leaf points and stem points by utilizing the support vector machine (SVM) algorithm. The proposed method was tested by using the point cloud data of three potted plants and compared with two other methods, which showed that the proposed method can classify leaf and stem points accurately and efficiently.
Tasks
Published 2020-02-28
URL https://arxiv.org/abs/2002.12536v1
PDF https://arxiv.org/pdf/2002.12536v1.pdf
PWC https://paperswithcode.com/paper/automated-classification-of-stems-and-leaves
Repo
Framework
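
The stem-extraction heuristic, high point density in a two-dimensional projection, can be illustrated on a synthetic cloud: a thin vertical stem projects onto a few dense XY bins, while a leaf blob spreads out. The geometry, bin count, and density threshold below are all made up:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy point cloud: a thin vertical "stem" plus a diffuse "leaf" blob.
stem = np.column_stack([0.02 * rng.normal(size=200),
                        0.02 * rng.normal(size=200),
                        rng.uniform(0, 1, size=200)])
leaf = rng.normal(loc=[0.3, 0.3, 0.8], scale=0.1, size=(150, 3))
cloud = np.vstack([stem, leaf])

# Stem candidates via 2-D projection density: project onto the XY plane,
# bin the points, and flag points that fall in high-count bins.
bins = 20
xy = cloud[:, :2]
span = np.ptp(xy, axis=0) + 1e-9
ix = np.clip(((xy - xy.min(axis=0)) / span * bins).astype(int), 0, bins - 1)
counts = np.zeros((bins, bins))
np.add.at(counts, (ix[:, 0], ix[:, 1]), 1)     # unbuffered histogram update
density = counts[ix[:, 0], ix[:, 1]]           # per-point bin count
stem_candidates = density > 10
print(stem_candidates[:200].mean(), stem_candidates[200:].mean())
```

Points flagged this way would serve as stem training samples for the SVM, with the convex-hull extremes supplying leaf samples.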

Sperm Detection and Tracking in Phase-Contrast Microscopy Image Sequences using Deep Learning and Modified CSR-DCF

Title Sperm Detection and Tracking in Phase-Contrast Microscopy Image Sequences using Deep Learning and Modified CSR-DCF
Authors Mohammad reza Mohammadi, Mohammad Rahimzadeh, Abolfazl Attar
Abstract Nowadays, computer-aided sperm analysis (CASA) systems have made a big leap in extracting the characteristics of spermatozoa for studies or for measuring human fertility. The first step in sperm characteristics analysis is sperm detection in the frames of the video sample. In this article, we used a deep fully convolutional network as the object detector. Sperm are small objects with few attributes, which makes detection more difficult in high-density samples, especially when other particles in the semen can resemble sperm heads. One of the main attributes of sperm is their movement, but this attribute cannot be extracted when only a single frame is fed to the network. To improve the performance of the sperm detection network, we concatenated several consecutive frames to use as the input of the network. With this method the motility attribute is also captured, and with the help of deep convolutional layers we achieve high accuracy in sperm detection. In the tracking phase, we modify the CSR-DCF algorithm. This method shows excellent results in sperm tracking even in high-density sperm samples, with occlusions, sperm collisions, and sperm that exit the frame and re-enter in later frames. The average precision of the detection phase is 99.1%, and the F1 score of the tracking method evaluation is 96.46%. These results can be a great help in studies investigating sperm behavior and analyzing fertility possibility.
Tasks
Published 2020-02-11
URL https://arxiv.org/abs/2002.04034v3
PDF https://arxiv.org/pdf/2002.04034v3.pdf
PWC https://paperswithcode.com/paper/sperm-detection-and-tracking-in-phase
Repo
Framework
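
Concatenating consecutive frames so the detector can see motion is essentially a stacking operation. A sketch with a hypothetical grayscale clip; the real pipeline feeds such stacks to a fully convolutional detector:

```python
import numpy as np

def stack_frames(video, k=3):
    """Concatenate k consecutive grayscale frames along a channel axis,
    so a detector sees motion rather than a single static snapshot."""
    T, H, W = video.shape
    return np.stack([video[t:t + k] for t in range(T - k + 1)])  # (T-k+1, k, H, W)

video = np.zeros((10, 64, 64), dtype=np.float32)   # hypothetical 10-frame clip
clips = stack_frames(video, k=3)
print(clips.shape)
```

Each output item plays the role of a k-channel "image", letting ordinary 2-D convolutions pick up the motility cue across adjacent frames.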

Optimization with Momentum: Dynamical, Control-Theoretic, and Symplectic Perspectives

Title Optimization with Momentum: Dynamical, Control-Theoretic, and Symplectic Perspectives
Authors Michael Muehlebach, Michael I. Jordan
Abstract We analyze the convergence rate of various momentum-based optimization algorithms from a dynamical systems point of view. Our analysis exploits fundamental topological properties, such as the continuous dependence of iterates on their initial conditions, to provide a simple characterization of convergence rates. In many cases, closed-form expressions are obtained that relate algorithm parameters to the convergence rate. The analysis encompasses discrete time and continuous time, as well as time-invariant and time-variant formulations, and is not limited to a convex or Euclidean setting. In addition, the article rigorously establishes why symplectic discretization schemes are important for momentum-based optimization algorithms, and provides a characterization of algorithms that exhibit accelerated convergence.
Tasks
Published 2020-02-28
URL https://arxiv.org/abs/2002.12493v1
PDF https://arxiv.org/pdf/2002.12493v1.pdf
PWC https://paperswithcode.com/paper/optimization-with-momentum-dynamical-control
Repo
Framework
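
The prototypical momentum method analysed in this line of work is Polyak's heavy-ball iteration, x_{k+1} = x_k - lr·∇f(x_k) + β(x_k - x_{k-1}). A minimal sketch on a quadratic, with step size and momentum chosen inside the stable region (the values are illustrative):

```python
import numpy as np

def heavy_ball(grad, x0, lr=0.1, beta=0.9, steps=200):
    """Polyak heavy-ball: x_{k+1} = x_k - lr*grad(x_k) + beta*(x_k - x_{k-1})."""
    x_prev, x = x0.copy(), x0.copy()
    for _ in range(steps):
        x_next = x - lr * grad(x) + beta * (x - x_prev)
        x_prev, x = x, x_next
    return x

# Minimise the ill-conditioned quadratic f(x) = 0.5 * x^T A x (gradient A x).
A = np.diag([1.0, 10.0])
x_star = heavy_ball(lambda x: A @ x, np.array([5.0, 5.0]))
print(np.linalg.norm(x_star))
```

Viewed as a discrete dynamical system, the convergence rate follows from the spectral radius of the two-step update map; the continuous-time and symplectic-discretization perspectives in the paper make this connection precise.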