Predicting risk of cardiovascular death in the high-dimensional cohort follow-up data in the presence of competing events: a guide for building a modeling pipeline

Publisher
Taylor & Francis
Copyright
© 2022 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group
ISSN
2470-9379
eISSN
2470-9360
DOI
10.1080/24709360.2022.2084704

Abstract

Predictive models driven by time-to-event data are commonly used in survival analysis. Owing to the availability of high-dimensional epidemiological cohorts, there is a need for models and learning algorithms capable of utilizing hundreds or even thousands of predictors. Advanced machine learning tools with embedded variable selection are being adapted for use with time-to-event data in the presence of competing risks and censoring. In this study, random survival forests were compared with the cause-specific and sub-distribution hazard models widely applied in survival analysis with competing risks. Using extensive cohort data on 2,682 subjects and 950 predictors collected within the Kuopio Ischemic Heart Disease Risk Factor Study (1984–1989), we built models to predict cardiovascular and non-cardiovascular deaths over a 30-year prediction horizon in two scenarios: first, on a subset of manually selected risk factors and, second, on hundreds of more freely chosen predictors from those available. The experimental results show that, without manual pre-selection of predictors, the random survival forest reaches the highest area under the curve (AUC). In addition, this study presents a modeling pipeline, including a non-parametric analysis, missing data imputation, model validation, and result assessment.
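The pipeline above starts with a non-parametric analysis. With competing risks, the usual non-parametric estimate of the probability of dying from one cause by time t is the cumulative incidence function (Aalen–Johansen estimator), not one minus the Kaplan–Meier curve. The latter treats competing deaths as censored and overstates the risk. Below is a minimal pure-Python sketch of both estimators for comparison. It is not the authors' code. The event coding (0 = censored, 1 = cardiovascular death, 2 = non-cardiovascular death) and the toy data are illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def _tied_groups(times: Sequence[float], events: Sequence[int]):
    """Yield (time, events_at_time, n_at_risk) in increasing time order."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    i = 0
    while i < len(data):
        t = data[i][0]
        j = i
        while j < len(data) and data[j][0] == t:
            j += 1
        yield t, [e for _, e in data[i:j]], n_at_risk
        # Everyone observed at t (event or censoring) leaves the risk set.
        n_at_risk -= j - i
        i = j


def aalen_johansen_cif(
    times: Sequence[float], events: Sequence[int], cause: int
) -> List[Tuple[float, float]]:
    """Cumulative incidence of `cause`, accounting for competing events.

    events: 0 = censored, any positive integer = cause of the event.
    Returns (time, CIF) pairs at each distinct event time (any cause).
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    surv = 1.0  # overall event-free survival S(t-), Kaplan-Meier over all causes
    cif = 0.0
    curve: List[Tuple[float, float]] = []
    for t, tied, n in _tied_groups(times, events):
        d_any = sum(1 for e in tied if e != 0)
        if d_any == 0:
            continue  # only censorings at t: no jump in the CIF
        d_cause = sum(1 for e in tied if e == cause)
        # Cause-specific hazard increment weighted by survival just before t.
        cif += surv * d_cause / n
        surv *= 1.0 - d_any / n
        curve.append((t, cif))
    return curve


def one_minus_km(
    times: Sequence[float], events: Sequence[int], cause: int
) -> float:
    """Naive 1 - Kaplan-Meier at the last time, censoring competing events."""
    surv = 1.0
    for _, tied, n in _tied_groups(times, events):
        d_cause = sum(1 for e in tied if e == cause)
        if d_cause:
            surv *= 1.0 - d_cause / n
    return 1.0 - surv


if __name__ == "__main__":
    # Toy data (not from the study): 1 = CVD death, 2 = non-CVD death, 0 = censored.
    times = [1, 2, 3, 4, 5]
    events = [1, 2, 0, 1, 2]
    cif_cvd = aalen_johansen_cif(times, events, cause=1)
    print(round(cif_cvd[-1][1], 3))                    # Aalen-Johansen: 0.5
    print(round(one_minus_km(times, events, 1), 3))    # naive 1 - KM: 0.6
```

On this toy sample, the cumulative incidences of the two causes sum to one minus the overall survival (0.5 + 0.5 = 1.0 once everyone has either died or been censored), while the naive estimate for cause 1 alone is already 0.6. This gap is why competing-risk-aware estimators are used for the descriptive step.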

Journal

Biostatistics & Epidemiology, Taylor & Francis

Published: Jun 14, 2022

Keywords: Data-driven models; competing risks; cardiovascular death; proportional hazard models; random survival forest
