Development and Validation of Machine Learning Models
for Prediction of 1-Year Mortality Utilizing Electronic Medical Record
Data Available at the End of Hospitalization in Multicondition
Patients: a Proof-of-Concept Study
Nishant Sahni, MD, MS, Gyorgy Simon, PhD, and Rashi Arora, MD
Division of General Internal Medicine, University of Minnesota, Minneapolis, MN, USA;
Institute of Health Informatics, University of Minnesota,
Minneapolis, MN, USA.
BACKGROUND: Predicting death in a cohort of clinically
diverse, multicondition hospitalized patients is difficult.
Prognostic models that use electronic medical record
(EMR) data to determine 1-year death risk can improve
end-of-life planning and risk adjustment for research.
OBJECTIVE: Determine if the final set of demographic,
vital sign, and laboratory data from a hospitalization can
be used to accurately quantify 1-year mortality risk.
DESIGN: A retrospective study using electronic medical
record data linked with the state death registry.
PARTICIPANTS: A total of 59,848 hospitalized patients
within a six-hospital network over a 4-year period.
MAIN MEASURES: The last set of vital signs, complete
blood count, basic and complete metabolic panel, demographic
information, and ICD codes. The outcome of interest
was death within 1 year.
KEY RESULTS: Model performance was measured on the
validation data set. Random forests (RF) outperformed
logistic regression (LR) models in discriminative ability.
An RF model that used the final set of demographic, vitals,
and laboratory data from the final 48 h of hospitalization
had an AUC of 0.86 (0.85–0.87) for predicting death within
a year. Age, blood urea nitrogen, platelet count, hemoglobin,
and creatinine were the most important variables
in the RF model. Models that used comorbidity variables
alone had the lowest AUC. In groups of patients with a
high probability of death, RF models underestimated the
probability by less than 10%.
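The two performance notions reported above, discrimination (AUC) and calibration in high-risk groups, can be illustrated with a minimal sketch. This is not the study's code: the function names `auc` and `calibration_gap`, the 0.7 risk threshold, and the eight-patient data set are hypothetical and chosen only for illustration. The AUC here is computed as the fraction of (died, survived) patient pairs in which the patient who died received the higher predicted risk.

```python
from typing import Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve by pairwise comparison (ties count as 0.5)."""
    died = [s for y, s in zip(labels, scores) if y == 1]
    survived = [s for y, s in zip(labels, scores) if y == 0]
    if not died or not survived:
        raise ValueError("need at least one death and one survivor")
    wins = 0.0
    for d in died:
        for s in survived:
            if d > s:
                wins += 1.0
            elif d == s:
                wins += 0.5
    return wins / (len(died) * len(survived))


def calibration_gap(
    labels: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Mean predicted risk and observed death rate among patients
    whose predicted risk is at or above `threshold`."""
    group = [(y, s) for y, s in zip(labels, scores) if s >= threshold]
    if not group:
        raise ValueError("no patients at or above the threshold")
    mean_predicted = sum(s for _, s in group) / len(group)
    observed_rate = sum(y for y, _ in group) / len(group)
    return mean_predicted, observed_rate


# Hypothetical toy data: 1 = died within 1 year, 0 = survived.
died_within_year = [1, 1, 1, 0, 1, 0, 0, 0]
# Hypothetical predicted risks from two models (not real model output).
risk_model_a = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
risk_model_b = [0.9, 0.3, 0.7, 0.8, 0.4, 0.6, 0.2, 0.1]

auc_a = auc(died_within_year, risk_model_a)  # 15 of 16 pairs ranked correctly
auc_b = auc(died_within_year, risk_model_b)  # 11 of 16 pairs ranked correctly
predicted, observed = calibration_gap(died_within_year, risk_model_a, 0.7)
print(f"AUC A={auc_a:.4f}, AUC B={auc_b:.4f}")
print(f"high-risk group: predicted={predicted:.2f}, observed={observed:.2f}")
```

In this toy example, model A discriminates better than model B, yet in its high-risk group the mean predicted risk falls below the observed death rate. That is the same kind of underestimation the abstract describes, although the magnitudes here are arbitrary.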
CONCLUSION: The last set of EMR data from a hospitalization
can be used to accurately estimate the risk of 1-year
mortality within a cohort of multicondition hospitalized
patients.
KEY WORDS: machine learning; hospital outcomes; predictive models;
Aspartate amino transferase
Alanine amino transferase
Agency for Health Care Research and Quality
End of life
End of life planning
Electronic medical record
Mean corpuscular volume
White blood cell count
Complete metabolic panel
Complete blood count
Basic metabolic panel
Area under the curve
Out of bag
International Classification of Diseases
International Classification of Diseases 10
Mean decrease in Gini Index
J Gen Intern Med 33(6):921–8
© Society of General Internal Medicine 2018
During hospitalizations, seriously ill patients are frequently
exposed to unwanted interventions. Surveys of seriously ill
hospitalized patients have found that communication about
end-of-life planning (EOLp) is an area of potential improvement
for hospitals. An informed EOLp discussion is based on
an accurate estimation of a patient’s likelihood of death.
Estimates of the probability of patient survival are also needed to
adjust for the risk of death as a confounder, which is useful in
comparing outcomes between hospitals. Commonly
used prognostic models are applicable to specific diseases
and subpopulations (e.g., the MELD-Na score in end-stage
liver disease). Some studies have used Electronic Health Record
(EHR) data to estimate inpatient mortality risk.
However, there is a lack of prognostic models in diverse,
multicondition, hospitalized patients for estimating longer term
outcomes, such as mortality at 1 year.
Electronic supplementary material The online version of this article
(https://doi.org/10.1007/s11606-018-4316-y) contains supplementary
material, which is available to authorized users.
Received August 28, 2017
Revised December 7, 2017
Accepted January 9, 2018
Published online January 30, 2018