Paper Group ANR 211
Deep Transfer Learning for Physiological Signals
Title | Deep Transfer Learning for Physiological Signals |
Authors | Hugh Chen, Scott Lundberg, Gabe Erion, Jerry H. Kim, Su-In Lee |
Abstract | Deep learning is increasingly common in healthcare, yet transfer learning for physiological signals (e.g., temperature, heart rate, etc.) is under-explored. Here, we present a straightforward, yet performant framework for transferring knowledge about physiological signals. Our framework is called PHASE (PHysiologicAl Signal Embeddings). It i) learns deep embeddings of physiological signals and ii) predicts adverse outcomes based on the embeddings. PHASE is the first instance of deep transfer learning in a cross-hospital, cross-department setting for physiological signals. We show that PHASE’s per-signal (one for each signal) LSTM embedding functions confer a number of benefits including improved performance, successful transference between hospitals, and lower computational cost. |
Tasks | Transfer Learning |
Published | 2020-02-12 |
URL | https://arxiv.org/abs/2002.04770v1 |
https://arxiv.org/pdf/2002.04770v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-transfer-learning-for-physiological |
Repo | |
Framework | |
Benign overfitting in the large deviation regime
Title | Benign overfitting in the large deviation regime |
Authors | Geoffrey Chinot, Matthieu Lerasle |
Abstract | We investigate the benign overfitting phenomenon in the large deviation regime where the bounds on the prediction risk hold with probability $1-e^{-\zeta n}$, for some absolute constant $\zeta$. We prove that these bounds can converge to $0$ for the quadratic loss. We obtain this result by a new analysis of the interpolating estimator with minimal Euclidean norm, relying on a preliminary localization of this estimator with respect to the Euclidean norm. This new analysis complements and strengthens particular cases obtained in previous works for the square loss and is extended to other loss functions. To illustrate this, we also provide excess risk bounds for the Huber and absolute losses, two widely spread losses in robust statistics. |
Tasks | |
Published | 2020-03-12 |
URL | https://arxiv.org/abs/2003.05838v1 |
https://arxiv.org/pdf/2003.05838v1.pdf | |
PWC | https://paperswithcode.com/paper/benign-overfitting-in-the-large-deviation |
Repo | |
Framework | |
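The abstract above centers on the interpolating estimator with minimal Euclidean norm. The following is a minimal sketch of that estimator for a linear model, assuming synthetic Gaussian data with more features than samples; the function name, dimensions, and noise level are illustrative assumptions and do not come from the paper.

```python
import numpy as np

def min_norm_interpolator(X, y):
    """Return the minimum-Euclidean-norm solution of X @ beta = y.

    When X has full row rank and more columns than rows, the
    pseudoinverse picks, among all interpolating solutions, the one
    with the smallest Euclidean norm.
    """
    return np.linalg.pinv(X) @ y

# Hypothetical overparameterized setup: n samples, d >> n features.
rng = np.random.default_rng(0)
n, d = 20, 200
X = rng.standard_normal((n, d))
beta_true = np.zeros(d)
beta_true[:5] = 1.0
y = X @ beta_true + 0.1 * rng.standard_normal(n)

beta_hat = min_norm_interpolator(X, y)

# The estimator fits the noisy training data exactly (up to rounding).
train_residual = np.linalg.norm(X @ beta_hat - y)
```

Any other interpolating solution differs from `beta_hat` by a vector in the null space of `X`, which can only increase the norm.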
Interpretable Crowd Flow Prediction with Spatial-Temporal Self-Attention
Title | Interpretable Crowd Flow Prediction with Spatial-Temporal Self-Attention |
Authors | Haoxing Lin, Weijia Jia, Yongjian You, Yiping Sun |
Abstract | Crowd flow prediction has been increasingly investigated in the field of intelligent urban computing as a fundamental component of urban management systems. The most challenging part of predicting crowd flow is measuring the complicated spatial-temporal dependencies. A prevalent solution in current methods is to divide and conquer the spatial and temporal information with different architectures (e.g., CNN/GCN, LSTM). However, this strategy has two disadvantages: (1) the sophisticated dependencies are also divided and therefore partially isolated; (2) the spatial-temporal features are transformed into latent representations when passing through different architectures, making it hard to interpret the predicted crowd flow. To address these issues, we propose a Spatial-Temporal Self-Attention Network (STSAN) with an ST encoding gate that calculates the entire spatial-temporal representation with positional and time encodings and therefore avoids dividing the dependencies. Furthermore, we develop a Multi-aspect attention mechanism that applies scaled dot-product attention over spatial-temporal information and measures the attention weights that explicitly indicate the dependencies. Experimental results on traffic and mobile data demonstrate that the proposed method reduces inflow and outflow RMSE by 16% and 8% on the Taxi-NYC dataset compared to the SOTA baselines. |
Tasks | |
Published | 2020-02-22 |
URL | https://arxiv.org/abs/2002.09693v1 |
https://arxiv.org/pdf/2002.09693v1.pdf | |
PWC | https://paperswithcode.com/paper/interpretable-crowd-flow-prediction-with |
Repo | |
Framework | |
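The abstract above mentions scaled dot-product attention, whose weights are what make the dependencies readable. The following is a minimal sketch of the generic operation only, not of STSAN's Multi-aspect attention mechanism; the array shapes and random inputs are illustrative assumptions.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Generic scaled dot-product attention.

    Q: (n_queries, d_k), K: (n_keys, d_k), V: (n_keys, d_v).
    Returns the attended values and the attention weights.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    # Subtract the row maximum before exponentiating for numerical stability.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

# Hypothetical inputs: 4 query positions attending over 6 key positions.
rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))
K = rng.standard_normal((6, 8))
V = rng.standard_normal((6, 3))

out, weights = scaled_dot_product_attention(Q, K, V)
```

Each row of `weights` is a probability distribution over the key positions, so it can be inspected directly to see which positions a given query relies on.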
Leveraging Affect Transfer Learning for Behavior Prediction in an Intelligent Tutoring System
Title | Leveraging Affect Transfer Learning for Behavior Prediction in an Intelligent Tutoring System |
Authors | Nataniel Ruiz, Mona Jalal, Vitaly Ablavsky, Danielle Allessio, John Magee, Jacob Whitehill, Ivon Arroyo, Beverly Woolf, Stan Sclaroff, Margrit Betke |
Abstract | In the context of building an intelligent tutoring system (ITS), which improves student learning outcomes by intervention, we set out to improve prediction of student problem outcome. In essence, we want to predict the outcome of a student answering a problem in an ITS from a video feed by analyzing their face and gestures. For this, we present a novel transfer learning facial affect representation and a user-personalized training scheme that unlocks the potential of this representation. We model the temporal structure of video sequences of students solving math problems using a recurrent neural network architecture. Additionally, we extend the largest dataset of student interactions with an intelligent online math tutor by a factor of two. Our final model, coined ATL-BP (Affect Transfer Learning for Behavior Prediction) achieves an increase in mean F-score over state-of-the-art of 45% on this new dataset in the general case and 50% in a more challenging leave-users-out experimental setting when we use a user-personalized training scheme. |
Tasks | Transfer Learning |
Published | 2020-02-12 |
URL | https://arxiv.org/abs/2002.05242v1 |
https://arxiv.org/pdf/2002.05242v1.pdf | |
PWC | https://paperswithcode.com/paper/leveraging-affect-transfer-learning-for |
Repo | |
Framework | |
Residual Block-based Multi-Label Classification and Localization Network with Integral Regression for Vertebrae Labeling
Title | Residual Block-based Multi-Label Classification and Localization Network with Integral Regression for Vertebrae Labeling |
Authors | Chunli Qin, Demin Yao, Han Zhuang, Hui Wang, Yonghong Shi, Zhijian Song |
Abstract | Accurate identification and localization of the vertebrae in CT scans is a critical and standard preprocessing step for clinical spinal diagnosis and treatment. Existing methods are mainly based on the integration of multiple neural networks, and most of them use a Gaussian heat map to locate each vertebra’s centroid. However, the process of obtaining the centroid coordinates from heat maps is non-differentiable, so it is impossible to train the network to label the vertebrae directly. Therefore, for end-to-end differentiable training of vertebra coordinates on CT scans, a robust and accurate automatic vertebral labeling algorithm is proposed in this study. Firstly, a novel residual-based multi-label classification and localization network is developed, which can not only capture multi-scale features but also utilize the residual module and skip connections to fuse the multi-level features. Secondly, to solve the problem that the process of finding coordinates is non-differentiable and the spatial structure is not destructible, an integral regression module is used in the localization network. It combines the advantages of heat map representation and direct coordinate regression to achieve end-to-end training, and is compatible with any heat map-based key point detection method for medical images. Finally, multi-label classification of vertebrae is carried out, which uses bidirectional long short-term memory (Bi-LSTM) to enhance the learning of long contextual information and improve classification performance. The proposed method is evaluated on a challenging dataset and the results are significantly better than the state-of-the-art methods (mean localization error <3mm). |
Tasks | Multi-Label Classification |
Published | 2020-01-01 |
URL | https://arxiv.org/abs/2001.00170v1 |
https://arxiv.org/pdf/2001.00170v1.pdf | |
PWC | https://paperswithcode.com/paper/residual-block-based-multi-label |
Repo | |
Framework | |
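The abstract above replaces the non-differentiable step of reading a centroid off a heat map with integral regression. The following is a minimal sketch of the general idea (a softmax-weighted expectation over voxel coordinates, often called soft-argmax); the function name, grid size, and peak value are illustrative assumptions, and the paper's actual module may differ.

```python
import numpy as np

def integral_regression(heatmap):
    """Return the expected coordinate under a softmax-normalized heat map.

    Unlike argmax, this expectation is a smooth function of the heat map
    values, so gradients can flow from the coordinates back to the map.
    """
    h = heatmap - heatmap.max()          # stabilize the exponential
    p = np.exp(h)
    p /= p.sum()                         # probability over all grid cells
    # coords has shape (ndim, n_cells); column k is the index of cell k.
    coords = np.indices(heatmap.shape).reshape(heatmap.ndim, -1)
    return coords @ p.ravel()

# Hypothetical 3D heat map with a sharp peak at voxel (2, 5, 7).
heatmap = np.zeros((8, 10, 12))
heatmap[2, 5, 7] = 20.0
center = integral_regression(heatmap)
```

With a sharp peak the expectation lands close to the peak voxel; with a flat map it falls back to the center of the grid.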
I-SPEC: An End-to-End Framework for Learning Transportable, Shift-Stable Models
Title | I-SPEC: An End-to-End Framework for Learning Transportable, Shift-Stable Models |
Authors | Adarsh Subbaswamy, Suchi Saria |
Abstract | Shifts in environment between development and deployment cause classical supervised learning to produce models that fail to generalize well to new target distributions. Recently, many solutions which find invariant predictive distributions have been developed. Among these, graph-based approaches do not require data from the target environment and can capture more stable information than alternative methods which find stable feature sets. However, these approaches assume that the data generating process is known in the form of a full causal graph, which is generally not the case. In this paper, we propose I-SPEC, an end-to-end framework that addresses this shortcoming by using data to learn a partial ancestral graph (PAG). Using the PAG we develop an algorithm that determines an interventional distribution that is stable to the declared shifts; this subsumes existing approaches which find stable feature sets that are less accurate. We apply I-SPEC to a mortality prediction problem to show it can learn a model that is robust to shifts without needing upfront knowledge of the full causal DAG. |
Tasks | Mortality Prediction |
Published | 2020-02-20 |
URL | https://arxiv.org/abs/2002.08948v1 |
https://arxiv.org/pdf/2002.08948v1.pdf | |
PWC | https://paperswithcode.com/paper/i-spec-an-end-to-end-framework-for-learning |
Repo | |
Framework | |
Interpretable Machine Learning Model for Early Prediction of Mortality in Elderly Patients with Multiple Organ Dysfunction Syndrome (MODS): a Multicenter Retrospective Study and Cross Validation
Title | Interpretable Machine Learning Model for Early Prediction of Mortality in Elderly Patients with Multiple Organ Dysfunction Syndrome (MODS): a Multicenter Retrospective Study and Cross Validation |
Authors | Xiaoli Liu, Pan Hu, Zhi Mao, Po-Chih Kuo, Peiyao Li, Chao Liu, Jie Hu, Deyu Li, Desen Cao, Roger G. Mark, Leo Anthony Celi, Zhengbo Zhang, Feihu Zhou |
Abstract | Background: Elderly patients with MODS have high risk of death and poor prognosis. The performance of current scoring systems assessing the severity of MODS and its mortality remains unsatisfactory. This study aims to develop an interpretable and generalizable model for early mortality prediction in elderly patients with MODS. Methods: The MIMIC-III, eICU-CRD and PLAGH-S databases were employed for model generation and evaluation. We used the eXtreme Gradient Boosting model with the SHapley Additive exPlanations method to conduct early and interpretable predictions of patients’ hospital outcome. Three types of data source combinations and five typical evaluation indexes were adopted to develop a generalizable model. Findings: The interpretable model, with optimal performance developed by using MIMIC-III and eICU-CRD datasets, was separately validated in MIMIC-III, eICU-CRD and PLAGH-S datasets (no overlapping with training set). The performances of the model in predicting hospital mortality as validated by the three datasets were: AUC of 0.858, sensitivity of 0.834 and specificity of 0.705; AUC of 0.849, sensitivity of 0.763 and specificity of 0.784; and AUC of 0.838, sensitivity of 0.882 and specificity of 0.691, respectively. Comparisons of AUC between this model and baseline models with MIMIC-III dataset validation showed superior performances of this model; In addition, comparisons in AUC between this model and commonly used clinical scores showed significantly better performance of this model. Interpretation: The interpretable machine learning model developed in this study using fused datasets with large sample sizes was robust and generalizable. This model outperformed the baseline models and several clinical scores for early prediction of mortality in elderly ICU patients. The interpretative nature of this model provided clinicians with the ranking of mortality risk features. |
Tasks | Interpretable Machine Learning, Mortality Prediction |
Published | 2020-01-28 |
URL | https://arxiv.org/abs/2001.10977v1 |
https://arxiv.org/pdf/2001.10977v1.pdf | |
PWC | https://paperswithcode.com/paper/interpretable-machine-learning-model-for |
Repo | |
Framework | |
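The abstract above reports AUC, sensitivity, and specificity for each validation set. The following is a minimal sketch of how these three quantities are computed from labels and predicted risk scores; the toy labels, scores, and the 0.5 threshold are illustrative assumptions and are unrelated to the study's data.

```python
def sensitivity_specificity(y_true, y_score, threshold=0.5):
    """Sensitivity (true positive rate) and specificity (true negative rate)."""
    tp = fn = tn = fp = 0
    for t, s in zip(y_true, y_score):
        pred = 1 if s >= threshold else 0
        if t == 1 and pred == 1:
            tp += 1
        elif t == 1 and pred == 0:
            fn += 1
        elif t == 0 and pred == 0:
            tn += 1
        else:
            fp += 1
    return tp / (tp + fn), tn / (tn + fp)

def auc(y_true, y_score):
    """Probability that a random positive outscores a random negative.

    Ties count as one half. This pairwise form is quadratic in the
    sample size, which is fine for a small illustration.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels (1 = died in hospital) and predicted risks.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_score = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7]

sens, spec = sensitivity_specificity(y_true, y_score)
area = auc(y_true, y_score)
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds.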
Regular Intersection Emptiness of Graph Problems: Finding a Needle in a Haystack of Graphs with the Help of Automata
Title | Regular Intersection Emptiness of Graph Problems: Finding a Needle in a Haystack of Graphs with the Help of Automata |
Authors | Petra Wolf, Henning Fernau |
Abstract | The Int_reg-problem of a combinatorial problem P asks, given a nondeterministic automaton M as input, whether the language L(M) accepted by M contains any positive instance of the problem P. We consider the Int_reg-problem for a number of different graph problems and give general criteria that give decision procedures for these Int_reg-problems. To achieve this goal, we consider a natural graph encoding so that the language of all graph encodings is regular. Then, we draw the connection between classical pumping- and interchange-arguments from the field of formal language theory with the graph operations induced on the encoded graph. Our techniques apply among others to the Int_reg-problem of well-known graph problems like Vertex Cover and Independent Set, as well as to subgraph problems, graph-edit problems and graph-partitioning problems, including coloring problems. |
Tasks | graph partitioning |
Published | 2020-03-12 |
URL | https://arxiv.org/abs/2003.05826v1 |
https://arxiv.org/pdf/2003.05826v1.pdf | |
PWC | https://paperswithcode.com/paper/regular-intersection-emptiness-of-graph |
Repo | |
Framework | |
Steepest Descent Neural Architecture Optimization: Escaping Local Optimum with Signed Neural Splitting
Title | Steepest Descent Neural Architecture Optimization: Escaping Local Optimum with Signed Neural Splitting |
Authors | Lemeng Wu, Mao Ye, Qi Lei, Jason D. Lee, Qiang Liu |
Abstract | We propose signed splitting steepest descent (S3D), which progressively grows neural architectures by splitting critical neurons into multiple copies, following a theoretically-derived optimal scheme. Our algorithm is a generalization of the splitting steepest descent (S2D) of Liu et al. (2019b), but significantly improves over it by incorporating a rich set of new splitting schemes that allow negative output weights. By doing so, we can escape local optima that the original S2D can not escape. Theoretically, we show that our method provably learns neural networks with much smaller sizes than these needed for standard gradient descent in overparameterized regimes. Empirically, our method outperforms S2D and prior arts on various challenging benchmarks, including CIFAR-100, ImageNet and ModelNet40. |
Tasks | |
Published | 2020-03-23 |
URL | https://arxiv.org/abs/2003.10392v1 |
https://arxiv.org/pdf/2003.10392v1.pdf | |
PWC | https://paperswithcode.com/paper/steepest-descent-neural-architecture |
Repo | |
Framework | |
Missing Data Imputation using Optimal Transport
Title | Missing Data Imputation using Optimal Transport |
Authors | Boris Muzellec, Julie Josse, Claire Boyer, Marco Cuturi |
Abstract | Missing data is a crucial issue when applying machine learning algorithms to real-world datasets. Starting from the simple assumption that two batches extracted randomly from the same dataset should share the same distribution, we leverage optimal transport distances to quantify that criterion and turn it into a loss function to impute missing data values. We propose practical methods to minimize these losses using end-to-end learning, which may or may not exploit parametric assumptions on the underlying distributions of values. We evaluate our methods on datasets from the UCI repository, in MCAR, MAR and MNAR settings. These experiments show that OT-based methods match or out-perform state-of-the-art imputation methods, even for high percentages of missing values. |
Tasks | Imputation |
Published | 2020-02-10 |
URL | https://arxiv.org/abs/2002.03860v2 |
https://arxiv.org/pdf/2002.03860v2.pdf | |
PWC | https://paperswithcode.com/paper/missing-data-imputation-using-optimal |
Repo | |
Framework | |
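The abstract above turns "two random batches should share the same distribution" into an optimal transport loss. The following is a minimal sketch that only evaluates such a loss between two batches, using entropy-regularized OT solved by Sinkhorn iterations; minimizing it over imputed entries is not shown. The debiased form, the regularization strength `eps`, the iteration count, and the synthetic batches are all assumptions made for illustration.

```python
import numpy as np

def _logsumexp(a, axis):
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))).squeeze(axis)

def sinkhorn_cost(X, Y, eps=0.5, n_iter=200):
    """Transport cost of the entropic OT plan between two uniform batches."""
    n, m = len(X), len(Y)
    C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)   # squared Euclidean cost
    log_a = np.full(n, -np.log(n))
    log_b = np.full(m, -np.log(m))
    f, g = np.zeros(n), np.zeros(m)
    for _ in range(n_iter):
        # Log-domain Sinkhorn updates of the dual potentials.
        f = -eps * _logsumexp((g[None, :] - C) / eps + log_b[None, :], axis=1)
        g = -eps * _logsumexp((f[:, None] - C) / eps + log_a[:, None], axis=0)
    log_P = (f[:, None] + g[None, :] - C) / eps + log_a[:, None] + log_b[None, :]
    return float((np.exp(log_P) * C).sum())

def batch_discrepancy(X, Y, eps=0.5):
    """Debiased discrepancy: zero when the two batches are identical."""
    return sinkhorn_cost(X, Y, eps) - 0.5 * (
        sinkhorn_cost(X, X, eps) + sinkhorn_cost(Y, Y, eps))

# Hypothetical batches: two from the same distribution, one shifted.
rng = np.random.default_rng(0)
batch_a = rng.standard_normal((30, 2))
batch_b = rng.standard_normal((30, 2))
batch_shifted = batch_b + 3.0

same = batch_discrepancy(batch_a, batch_b)
shifted = batch_discrepancy(batch_a, batch_shifted)
```

A poorly imputed batch would behave like `batch_shifted` here: its discrepancy to other batches is large, which is the signal an imputation procedure of this kind would try to reduce.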
Vehicle Tracking in Wireless Sensor Networks via Deep Reinforcement Learning
Title | Vehicle Tracking in Wireless Sensor Networks via Deep Reinforcement Learning |
Authors | Jun Li, Zhichao Xing, Weibin Zhang, Yan Lin, Feng Shu |
Abstract | Vehicle tracking has become one of the key applications of wireless sensor networks (WSNs) in the fields of rescue, surveillance, traffic monitoring, etc. However, the increased tracking accuracy requires more energy consumption. In this letter, a decentralized vehicle tracking strategy is conceived for improving both tracking accuracy and energy saving, which is based on adjusting the intersection area between the fixed sensing area and the dynamic activation area. Then, two deep reinforcement learning (DRL) aided solutions are proposed relying on the dynamic selection of the activation area radius. Finally, simulation results show the superiority of our DRL aided design. |
Tasks | |
Published | 2020-02-22 |
URL | https://arxiv.org/abs/2002.09671v1 |
https://arxiv.org/pdf/2002.09671v1.pdf | |
PWC | https://paperswithcode.com/paper/vehicle-tracking-in-wireless-sensor-networks |
Repo | |
Framework | |
The Implicit Regularization of Stochastic Gradient Flow for Least Squares
Title | The Implicit Regularization of Stochastic Gradient Flow for Least Squares |
Authors | Alnur Ali, Edgar Dobriban, Ryan J. Tibshirani |
Abstract | We study the implicit regularization of mini-batch stochastic gradient descent, when applied to the fundamental problem of least squares regression. We leverage a continuous-time stochastic differential equation having the same moments as stochastic gradient descent, which we call stochastic gradient flow. We give a bound on the excess risk of stochastic gradient flow at time $t$, over ridge regression with tuning parameter $\lambda = 1/t$. The bound may be computed from explicit constants (e.g., the mini-batch size, step size, number of iterations), revealing precisely how these quantities drive the excess risk. Numerical examples show the bound can be small, indicating a tight relationship between the two estimators. We give a similar result relating the coefficients of stochastic gradient flow and ridge. These results hold under no conditions on the data matrix $X$, and across the entire optimization path (not just at convergence). |
Tasks | |
Published | 2020-03-17 |
URL | https://arxiv.org/abs/2003.07802v1 |
https://arxiv.org/pdf/2003.07802v1.pdf | |
PWC | https://paperswithcode.com/paper/the-implicit-regularization-of-stochastic |
Repo | |
Framework | |
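The abstract above compares stochastic gradient flow at time $t$ with ridge regression at $\lambda = 1/t$. The following is a minimal sketch of that pairing for the deterministic (noise-free) gradient flow only; the diffusion term of stochastic gradient flow is not simulated. It assumes the loss scalings $\frac{1}{2n}\|y - X\beta\|^2$ for the flow and $\frac{1}{n}\|y - X\beta\|^2 + \lambda\|\beta\|^2$ for ridge, a full-rank design, and synthetic data; these choices are assumptions and are not taken from the abstract.

```python
import numpy as np

def gradient_flow(X, y, t):
    """Gradient flow on (1/2n)||y - X b||^2 started at b = 0, at time t.

    Assumes all singular values of X are positive.
    """
    n = X.shape[0]
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    shrink = (1.0 - np.exp(-t * s**2 / n)) / s
    return Vt.T @ (shrink * (U.T @ y))

def ridge(X, y, lam):
    """Minimizer of (1/n)||y - X b||^2 + lam * ||b||^2."""
    n = X.shape[0]
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    shrink = s / (s**2 + n * lam)
    return Vt.T @ (shrink * (U.T @ y))

# Hypothetical well-conditioned regression problem.
rng = np.random.default_rng(0)
n, d = 50, 5
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d) + 0.5 * rng.standard_normal(n)

# Distance between the two coefficient paths at matched tuning values.
gaps = {t: np.linalg.norm(gradient_flow(X, y, t) - ridge(X, y, 1.0 / t))
        for t in (0.1, 1.0, 10.0, 100.0)}
```

Both estimators shrink each singular direction of `X`: the flow by `1 - exp(-t s^2 / n)` and ridge by `s^2 / (s^2 + n / t)`, which is why the two paths agree at both ends (zero for small `t`, least squares for large `t`) and stay close in between.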
Nonmyopic Gaussian Process Optimization with Macro-Actions
Title | Nonmyopic Gaussian Process Optimization with Macro-Actions |
Authors | Dmitrii Kharkovskii, Chun Kai Ling, Kian Hsiang Low |
Abstract | This paper presents a multi-staged approach to nonmyopic adaptive Gaussian process optimization (GPO) for Bayesian optimization (BO) of unknown, highly complex objective functions that, in contrast to existing nonmyopic adaptive BO algorithms, exploits the notion of macro-actions for scaling up to a further lookahead to match up to a larger available budget. To achieve this, we generalize GP upper confidence bound to a new acquisition function defined w.r.t. a nonmyopic adaptive macro-action policy, which is intractable to be optimized exactly due to an uncountable set of candidate outputs. The contribution of our work here is thus to derive a nonmyopic adaptive epsilon-Bayes-optimal macro-action GPO (epsilon-Macro-GPO) policy. To perform nonmyopic adaptive BO in real time, we then propose an asymptotically optimal anytime variant of our epsilon-Macro-GPO policy with a performance guarantee. We empirically evaluate the performance of our epsilon-Macro-GPO policy and its anytime variant in BO with synthetic and real-world datasets. |
Tasks | |
Published | 2020-02-22 |
URL | https://arxiv.org/abs/2002.09670v1 |
https://arxiv.org/pdf/2002.09670v1.pdf | |
PWC | https://paperswithcode.com/paper/nonmyopic-gaussian-process-optimization-with |
Repo | |
Framework | |
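The abstract above generalizes the GP upper confidence bound to a nonmyopic macro-action policy. The following is a minimal sketch of the standard one-step (myopic) GP-UCB acquisition that serves as the starting point; it does not implement macro-actions or the epsilon-Macro-GPO policy. The RBF kernel, noise level, `beta` value, and toy objective are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0):
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq / length_scale**2)

def gp_posterior(X_train, y_train, X_cand, noise=1e-3):
    """Posterior mean and standard deviation of a zero-mean GP."""
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    K_s = rbf_kernel(X_train, X_cand)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = 1.0 - (v**2).sum(axis=0)       # RBF prior variance is 1
    return mean, np.sqrt(np.maximum(var, 0.0))

def gp_ucb(mean, std, beta=4.0):
    """Upper confidence bound: favors high mean and high uncertainty."""
    return mean + np.sqrt(beta) * std

# Hypothetical 1D problem with three observations of sin(x).
X_train = np.array([[0.0], [1.0], [2.0]])
y_train = np.sin(X_train).ravel()
X_cand = np.linspace(0.0, 5.0, 51)[:, None]

mean, std = gp_posterior(X_train, y_train, X_cand)
next_x = X_cand[np.argmax(gp_ucb(mean, std))]
```

This rule picks a single next input by looking one step ahead; the approach described in the abstract instead plans over sequences of inputs to make use of a larger budget.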
GenNet : Reading Comprehension with Multiple Choice Questions using Generation and Selection model
Title | GenNet : Reading Comprehension with Multiple Choice Questions using Generation and Selection model |
Authors | Vaishali Ingale, Pushpender Singh |
Abstract | Multiple-choice machine reading comprehension is a difficult task, as it requires machines to select the correct option from a set of candidate options using the given passage and question. The task requires a human (or machine) to read a given passage-question pair and select the best option from n given options. There are two ways to select the correct answer: either by selecting the best-matching answer or by eliminating the worst-matching answers. Here we propose the GenNet model, a neural network-based model. In this model, we first generate the answer to the question from the passage and then match the generated answer against the given options; the best-matching option is our answer. For answer generation we use the S-net model (Tan et al., 2017) trained on SQuAD, and to evaluate our model we use the large-scale RACE dataset (ReAding Comprehension Dataset From Examinations) (Lai et al., 2017). |
Tasks | Machine Reading Comprehension, Reading Comprehension |
Published | 2020-03-03 |
URL | https://arxiv.org/abs/2003.04360v2 |
https://arxiv.org/pdf/2003.04360v2.pdf | |
PWC | https://paperswithcode.com/paper/gennet-reading-comprehension-with-multiple |
Repo | |
Framework | |
Data Set Description: Identifying the Physics Behind an Electric Motor – Data-Driven Learning of the Electrical Behavior (Part I)
Title | Data Set Description: Identifying the Physics Behind an Electric Motor – Data-Driven Learning of the Electrical Behavior (Part I) |
Authors | Sören Hanke, Oliver Wallscheid, Joachim Böcker |
Abstract | Two of the most important aspects of electric vehicles are their efficiency and achievable range. In order to achieve high efficiency and thus a long range, it is essential to avoid over-dimensioning the drive train. Therefore, the drive train has to be kept as lightweight as possible while at the same time being utilized to the best possible extent. This can only be achieved if the dynamic behavior of the drive train is accurately known by the controller. The task of the controller is to achieve a desired torque at the wheels of the car by controlling the currents of the electric motor. With machine learning modeling techniques, accurate models describing the behavior can be extracted from measurement data and then used by the controller. For the comparison of the different modeling approaches, a data set consisting of about 40 million data points was recorded at a test bench for electric drive trains. The data set is published on Kaggle, an online community of data scientists. |
Tasks | |
Published | 2020-03-16 |
URL | https://arxiv.org/abs/2003.07273v3 |
https://arxiv.org/pdf/2003.07273v3.pdf | |
PWC | https://paperswithcode.com/paper/data-set-description-identifying-the-physics-1 |
Repo | |
Framework | |