TY - JOUR AU - Najarian, Kayvan AB - Background Timely referral for advanced therapies (i.e., heart transplantation, left ventricular assist device) is critical for ensuring optimal outcomes for heart failure patients. Using electronic health records, our goal was to use data from a single hospitalization to develop an interpretable clinical decision-making system for predicting the need for advanced therapies at the subsequent hospitalization. Methods Michigan Medicine heart failure patients from 2013–2021 with a left ventricular ejection fraction ≤ 35% and at least two heart failure hospitalizations within one year were used to train an interpretable machine learning model constructed using fuzzy logic and tropical geometry. Clinical knowledge was used to initialize the model. The performance and robustness of the model were evaluated with the mean and standard deviation of the area under the receiver operating curve (AUC), the area under the precision-recall curve (AUPRC), and the F1 score of the ensemble. We inferred membership functions from the model for continuous clinical variables, extracted decision rules, and then evaluated their relative importance. Results The model was trained and validated using data from 557 heart failure hospitalizations from 300 patients, of whom 193 received advanced therapies. The mean (standard deviation) of AUC, AUPRC, and F1 scores of the proposed model initialized with clinical knowledge was 0.747 (0.080), 0.642 (0.080), and 0.569 (0.067), respectively, showing superior predictive performance or increased interpretability over other machine learning methods. The model learned critical risk factors predicting the need for advanced therapies in the subsequent hospitalization. Furthermore, our model displayed transparent rule sets composed of these critical concepts to justify the prediction. 
Conclusion These results demonstrate the ability to successfully predict the need for advanced heart failure therapies by generating transparent and accessible clinical rules although further research is needed to prospectively validate the risk factors identified by the model. Introduction Heart failure (HF) affected 6.5 million adults in the United States in 2017 and is expected to impact 8 million by 2030 [1,2]. While medical and device therapies have improved outcomes over the past four decades, 5-year mortality for heart failure remains high at nearly 50% [1,3]. Amongst patients with HF, approximately 5% annually progress to an advanced disease state also known as Stage D HF [4]. For these patients, heart transplantation (HT) and left ventricular assist devices (LVADs) [5–8] offer the best opportunities for long-term survival with improved quality of life. HF advanced therapies, however, carry risks for adverse events and there exists a supply-demand mismatch for donor hearts, necessitating careful patient selection [9]. Therefore, whether and when to refer a HF patient for advanced therapies relies on timely clinical judgment requiring an effective and efficient method of identifying potential candidates amongst all HF patients. With the high prevalence of HF in the United States, most patients are cared for by general cardiologists or primary care clinicians who may lack specialized training in HF advanced therapies [10]. However, the capacity of advanced HF specialists to evaluate patients for advanced therapies is finite. There is thus a need for decision support systems that can identify patients with HF in need of advanced therapies to ensure that they are referred to a HF cardiologist at the appropriate time. 
While several risk models have been developed to assist in risk stratification of patients with advanced HF, their focus has been on predicting HF hospitalizations and mortality, typically at pre-specified time points such as 1 or 5 years, rather than assisting providers at the bedside with the timing of advanced therapy delivery. In addition, a subset incorporates data types not collected in routine practice [11–13]. Machine learning has gained significant popularity in recent years and has found applications across various domains, such as biology, medicine, and healthcare, as evident from numerous studies [14–18]. Notably, recent advancements in machine learning have led to successful models for identifying high-risk patients [19–21]. However, these models come with certain limitations, primarily the challenge of interpretability and transparency in model recommendations. This lack of interpretability, often referred to as the "black box" aspect of traditional machine learning models, may pose challenges in gaining acceptance from healthcare providers and in their subsequent integration into healthcare practices. Although some machine learning methods provide a way to interpret the importance of every feature (interpretability), most of the models are not transparent to present the decision logic in a rule format (transparency). Consequently, there is an urgent need for more transparent risk prediction models that can be broadly implemented within electronic health records (EHRs) that use routinely collected data to predict need for HF advanced therapies. This can be used to ensure that HF patients are referred to advanced HF cardiologists at the appropriate time or to prompt HF cardiologists to initiate a timely advanced therapies evaluation. Recently, an interpretable algorithm based on a tropical geometry-based fuzzy neural network (TGFNN) was developed [22]. 
Unlike traditional machine learning methods, this model incorporates existing clinical knowledge and produces a set of criteria by which to explain the rationale for its recommendations [22,23]. We extend that early work herein from classification to risk prediction, predicting the future need for HF advanced therapies using routinely collected clinical variables from a single hospitalization. Methods The study utilized a TGFNN to identify patients who would require advanced therapies for heart failure during a subsequent hospitalization. The proposed system’s flow diagram is depicted in Fig 1A. Prior to analysis, the study obtained approval from the University of Michigan Institutional Review Board (HUM00184418) which waived the need for informed consent, and the EHR data used in the study were completely deidentified prior to analysis. The data was accessed from Michigan Medicine on May 18th, 2021. Fig 1. Flowchart of the clinical decision support system and the interpretable tropical geometry-based fuzzy neural network algorithm. Fig 1A describes the system and training strategy: The EHR dataset is collected from Michigan Medicine and then the patient selection and outcome definition were performed. The data is split into training and test sets. Five-fold cross-validation is performed on the training dataset. Rules extracted from five-folds are ensembled. The model is then retrained on the whole training dataset with ensembled rule initialization. The trained model is later validated on the test set. Fig 1B depicts the structure of tropical geometry-based fuzzy neural network: The encoding layer encodes the input features into ‘low’, ‘medium’ and ‘high’ fuzzy concepts. The rule layer combines different concepts to generate several rules and decisions are made at the final inference layer by leveraging all rules. The edges between modules are trainable parameters to optimize the model. 
$x_i$: continuous variables; A: attention matrix; M: connection matrix; W: inference matrix; the remaining node labels denote the low, medium, and high concept membership functions. https://doi.org/10.1371/journal.pone.0295016.g001 Study cohort Eligible patients were 18–80 years of age at the time of a hospitalization at Michigan Medicine for acute-on-chronic HF, as derived from billing codes, during which they received at least one dose of an intravenous diuretic. Qualifying hospitalizations were between January 1, 2013, and January 30, 2021. Patients were included if their most recent ejection fraction recorded in the EHR at the time of admission was less than or equal to 35% and excluded if they had a body mass index (BMI) > 50 kg/m². Patients were required to have at least two eligible HF hospitalizations in one year to support the study design. To predict the next-visit state, eligible pairs of HF hospitalizations were then labeled as positive if the patient received an urgent HT or LVAD during their second hospitalization, with urgent HTs defined as those transplanted at Status 1A or 1B prior to October 17, 2018, or at statuses 1–4 thereafter. The remaining hospitalization pairs were labeled as negative and included those too well for HF advanced therapies, which we defined as patients who survived at least two years after their first HF hospitalization without the need for HT or LVAD implantation. The resulting cohort consisted of 557 HF hospitalization pairs (samples) from 300 patients, 193 of whom received advanced therapy at their second hospitalization. Within our study cohort, 157 patients had two encounters, 66 had three encounters, and the remaining patients had more than three encounters. The time interval between two consecutive visits had a first quartile of 27 days, a median of 61 days, and a third quartile of 189 days. 
Clinical features and preprocessing Our model incorporated continuous and categorical clinical variables from each hospitalization identified by HF cardiologists as being of clinical value in the setting of HF, including laboratory values, vital signs, and comorbidities as determined by Elixhauser [24] (S1 and S2 Tables in S1 File). For most continuous variables, we only utilized the first measurement obtained during a given hospitalization. For brain natriuretic peptide (BNP) and creatinine, the relative change over each hospitalization (first to last measurement) was expressed as a percent change. Furthermore, the first measurements of systolic and diastolic blood pressure (BP) were used to calculate mean arterial pressure (MAP) and pulse pressure. All continuous features were standardized after the data were partitioned, while one-hot encoding was employed for all categorical features to transform them into binary or ordinal representations. Data imbalance was not addressed during this phase; instead, a weighted loss was used throughout training, assigning higher weights to positive samples. For missing values, carry-forward imputation was applied between encounters, and any remaining missing values were imputed using multiple imputation [25]. Although echocardiographic features had an approximately 40% missing rate, they were retained due to their importance in HF decision-making. Any other features with a missing rate greater than 60% were removed, and any patient with more than ten missing values was excluded from the analysis. Tropical geometry-based interpretable machine learning method We used the TGFNN to predict the future need for advanced therapies. While the TGFNN model has demonstrated successful application in the classification task using the REVIVAL and INTERMACS registries, its potential in the prediction task with EHR data remains unexplored. 
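The derived features named under "Clinical features and preprocessing" can be sketched as follows. This is a minimal illustration, not the study's code: the text does not state which MAP formula was used, so the common approximation MAP = DBP + (SBP − DBP)/3 is an assumption, and all function names are hypothetical. The standardizer is fitted on the training split only, matching the statement that standardization followed data partition.

```python
from statistics import mean, pstdev


def mean_arterial_pressure(sbp, dbp):
    """Assumed approximation: MAP = DBP + (SBP - DBP) / 3 (not stated in the text)."""
    return dbp + (sbp - dbp) / 3.0


def pulse_pressure(sbp, dbp):
    """Pulse pressure is the systolic minus the diastolic pressure."""
    return sbp - dbp


def percent_change(first, last):
    """Relative change from the first to the last measurement, in percent."""
    return (last - first) / first * 100.0


def fit_standardizer(train_values):
    """Estimate mean and SD from the training split only, to avoid leakage."""
    return mean(train_values), pstdev(train_values)


def standardize(values, mu, sd):
    """Apply a previously fitted mean/SD to any split."""
    return [(v - mu) / sd for v in values]


if __name__ == "__main__":
    # Illustrative numbers only, not patient data.
    print(mean_arterial_pressure(90, 60))   # MAP in mmHg
    print(pulse_pressure(90, 60))           # pulse pressure in mmHg
    print(percent_change(400.0, 500.0))     # e.g. BNP first -> last
    mu, sd = fit_standardizer([90.0, 100.0, 110.0])
    print(standardize([100.0, 120.0], mu, sd))
```

Fitting the mean and SD on the training split and reusing them for validation and test data keeps information from held-out samples out of the model inputs.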
In this proof-of-concept study for the prediction task, we utilize EHR data from previous hospitalizations as input. The overall architecture of the algorithm is depicted in Fig 1B. The TGFNN algorithm consists of three modules: encoding, rule extraction, and inference. In the encoding module, an indicator function assigns each comorbidity a value of 1 if it exists in the patient’s medical history and 0 otherwise. At the same time, continuous variables are encoded into three fuzzy concepts: ‘low’, ‘medium’ and ‘high’. Fuzzy concepts address the uncertainty associated with choosing cutoff points in clinical practice: each continuous variable is encoded into three categories, which accommodates the inherent ambiguity in defining thresholds. The three fuzzy concepts correspond to three membership functions (low, medium, and high), which are defined as follows: (1) (2) (3) (4) where $a_{i,j}$ denotes the cutoff parameters for the concepts and $\varepsilon$ controls the smoothness of the membership functions. The membership function defines to what degree a measurement belongs to each concept, rather than trichotomizing variables that exist along a continuum. Therefore, the uncertainty that fuzzy concepts introduce can make the model more flexible in terms of interpretability. In addition, the ‘cut-off points’ for the three fuzzy concepts are learned from the study cohort by the algorithm rather than pre-defined. The rule module consists of two layers. The first layer weights the three concepts of variable $i$ in relation to rule $k$ using an attention tensor $A \in \mathbb{R}^{N \times 3 \times K}$. Each entry $A_{i,j,k}$ shows the importance of $x_i$ being of concept $j$ to rule $k$. The message passing formula for this layer is given as . Every entry is normalized to [0, 1] and trainable. A high value in $A$ indicates higher importance in the decision system. 
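To make the encoding module concrete, the sketch below shows one generic way to build smooth ‘low’, ‘medium’ and ‘high’ memberships from two cutoffs and a smoothness parameter. It is an assumption-laden illustration and not the authors' Eqs (1)–(4): the sigmoid form, the function name `memberships`, and the example cutoffs are all hypothetical. It reproduces the qualitative behaviour described above: degrees of membership in [0, 1], cutoffs that could be learned, and sharper transitions as the smoothness parameter shrinks.

```python
import math


def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def memberships(x, a_low, a_high, eps):
    """Soft low/medium/high memberships for a (standardized) value x.

    a_low < a_high are hypothetical cutoffs; eps > 0 controls smoothness.
    The three memberships are non-negative and sum to 1.
    """
    if not a_low < a_high:
        raise ValueError("a_low must be smaller than a_high")
    if eps <= 0:
        raise ValueError("eps must be positive")
    low = _sigmoid((a_low - x) / eps)
    high = _sigmoid((x - a_high) / eps)
    medium = 1.0 - low - high  # non-negative because a_low < a_high
    return {"low": low, "medium": medium, "high": high}


if __name__ == "__main__":
    # Illustrative standardized values with cutoffs at -1 and +1.
    for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
        print(x, memberships(x, a_low=-1.0, a_high=1.0, eps=0.1))
```

At a cutoff the two adjacent concepts meet, which is the sense in which the intersection of two membership functions marks a transition between concepts.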
The second layer measures the importance of the $i$-th variable to the $k$-th rule using a connection matrix $M$ of size $N \times K$, whose entries are also normalized to [0, 1] and trainable. The second half of the message-passing formula of the rule module is given by . $r_k$ measures the rule firing strength using a parameterized T-norm. The parameterized T-norm with two inputs is defined by (5) (6) As $\varepsilon_2$ approaches 1, the parameterized T-norm approaches the multiplication operation; as $\varepsilon_2$ approaches 0, it takes the minimum of the inputs. The higher the value in $M$, the higher the contribution to the firing strength. The N-input T-norm is defined as: (7) In the inference module, an inference matrix $W$ of size $K \times C$ is learned to estimate the rule firing strengths, where $C$ denotes the number of outcome classes. Each entry $W_{k,c}$ of the inference matrix represents the contribution of rule $k$ to the final prediction of class $c$, which is calculated as . The parameterized T-conorm with two inputs can be defined as . As $\varepsilon_3 \to 1$, the parameterized T-conorm approaches the addition operation; as $\varepsilon_3 \to 0$, it takes the maximum of the inputs. The two-input T-conorm can be generalized to a K-input T-conorm (8) The three smoothness parameters $\varepsilon_1$, $\varepsilon_2$, $\varepsilon_3$ are trainable in general, subject to the constraint $0 < \varepsilon_1, \varepsilon_2, \varepsilon_3 < 1$, and control (1) the sharpness of the membership functions and (2) the behavior of the T-norm and T-conorm functions. In our current study, we used the same $\varepsilon$ for all three for simplicity. All parameters were trained using the Adam optimizer except for $\varepsilon$, which was initialized at 0.99 and then decreased at every training step using the scheduling formula $\varepsilon = \max(\varepsilon_{\min}, \varepsilon \cdot \gamma^{\text{training\_step}})$, where $\gamma$ is the decay rate. Furthermore, one-hot encoded categorical features do not require membership functions, but their category levels behave like the three concepts of continuous variables. 
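The limiting behaviour described for the parameterized T-norm and T-conorm can be illustrated with two standard families that share those limits. This sketch is an assumption, not a reproduction of Eqs (5)–(8): the T-norm below is the Aczél–Alsina family with exponent 1/ε (product as ε → 1, minimum as ε → 0), and the T-conorm is a p-norm with p = 1/ε (sum as ε → 1, maximum as ε → 0). The schedule function reads the formula as an initial value of 0.99 multiplied by γ raised to the step count; the defaults for `gamma` and `eps_min` are hypothetical.

```python
import math


def _p_norm(values, p):
    """Stable p-norm of non-negative values, factoring out the maximum."""
    m = max(values)
    if m == 0.0:
        return 0.0
    return m * sum((v / m) ** p for v in values) ** (1.0 / p)


def _check_eps(eps):
    if not 0.0 < eps <= 1.0:
        raise ValueError("eps must lie in (0, 1]")


def t_conorm(values, eps):
    """p-norm with p = 1/eps: sum at eps = 1, maximum as eps -> 0."""
    _check_eps(eps)
    return _p_norm(values, 1.0 / eps)


def t_norm(values, eps):
    """Aczel-Alsina T-norm on (0, 1]: product at eps = 1, minimum as eps -> 0."""
    _check_eps(eps)
    if any(v == 0.0 for v in values):
        return 0.0  # any zero input forces a zero firing strength
    neg_logs = [-math.log(v) for v in values]
    return math.exp(-_p_norm(neg_logs, 1.0 / eps))


def epsilon_at(step, eps0=0.99, gamma=0.999, eps_min=0.01):
    """Exponentially decayed smoothness, floored at eps_min (assumed reading)."""
    return max(eps_min, eps0 * gamma ** step)


if __name__ == "__main__":
    for eps in (1.0, 0.5, 0.01):
        print(eps, t_norm([0.5, 0.4], eps), t_conorm([0.3, 0.4], eps))
    print([round(epsilon_at(s), 4) for s in (0, 1000, 5000, 10000)])
```

Starting near ε = 1 keeps the operations smooth (product and sum), and the decay moves them toward min and max, which are easier to read as crisp AND/OR logic.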
Therefore, the weighting process in the rule and inference modules can be applied to the categorical variables as well if we adapt the number of concepts from 3 to the number of category levels. The algorithm was trained through backpropagation with a weighted cross-entropy loss alongside two regularization terms. The first regularization term favors feature sparsity and is defined as: (9) The second regularization term adds a penalty for highly correlated rules and is formulated as: (10) where $S$ is constructed from the entries of the attention matrix $A$ and the connection matrix $M$ as follows: $S_{i,d,k} = A_{i,d,k} \times M_{i,k}$, where $i \in \{1, \ldots, N\}$, $d \in \{1, 2, 3\}$, $k \in \{1, \ldots, K\}$. The contribution matrix $S$ represents the contributions of individual concepts and variables to each rule. The total loss function can therefore be written as: (11) where vec(∙) denotes matrix vectorization. Through the attention matrix and connection matrix, the algorithm operates with complete transparency, enabling the extraction and display of the underlying rules, thus revealing the overall decision-making logic. TGFNN rule initialization, ensemble, and extraction To enhance the TGFNN model with clinical knowledge, we gathered four simplified rules from HF cardiologists and from the medical literature [1,2,26]. Rules used for network initialization are provided below:
IF left ventricular ejection fraction (LVEF) is low AND systolic BP (SBP) is low, THEN refer to HT/LVAD
IF LVEF is low AND mitral regurgitation is high (severe), THEN refer to HT/LVAD
IF LVEF is low AND BNP change is elevated (positive delta from first to last measurements during the initial hospitalization), THEN refer to HT/LVAD
IF LVEF is low AND serum sodium is low, THEN refer to HT/LVAD
These rules were used to initialize the network for every fold prior to training. 
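The contribution tensor defined above, $S_{i,d,k} = A_{i,d,k} \times M_{i,k}$, is what makes the rules readable. The sketch below computes S from nested lists and turns one rule column into an IF/THEN sentence. The weights are made-up toy values, and the helper names and the 0.5 threshold are hypothetical; the study's actual filtering criteria are described in its supporting information.

```python
CONCEPTS = ("low", "medium", "high")


def contribution_tensor(A, M):
    """S[i][d][k] = A[i][d][k] * M[i][k] for variable i, concept d, rule k."""
    return [
        [[A[i][d][k] * M[i][k] for k in range(len(M[i]))] for d in range(len(A[i]))]
        for i in range(len(A))
    ]


def describe_rule(S, names, k, threshold=0.5):
    """Render rule k as text, keeping each variable's strongest concept
    only if its contribution reaches the (hypothetical) threshold."""
    parts = []
    for i, name in enumerate(names):
        d_best = max(range(len(CONCEPTS)), key=lambda d: S[i][d][k])
        if S[i][d_best][k] >= threshold:
            parts.append(f"{name} is {CONCEPTS[d_best]}")
    if not parts:
        return ""
    return "IF " + " AND ".join(parts) + " THEN refer for HT/LVAD"


if __name__ == "__main__":
    # Toy weights for N = 3 variables, 3 concepts, K = 1 rule (not learned values).
    names = ["LVEF", "SBP", "BMI"]
    A = [
        [[0.9], [0.1], [0.0]],  # LVEF: mostly 'low'
        [[0.8], [0.2], [0.0]],  # SBP: mostly 'low'
        [[0.3], [0.4], [0.3]],  # BMI: no dominant concept
    ]
    M = [[1.0], [0.9], [0.2]]   # BMI barely connects to this rule
    S = contribution_tensor(A, M)
    print(describe_rule(S, names, k=0))
```

Because a variable must score highly in both A (which concept matters) and M (whether the variable matters to the rule), weakly connected variables such as BMI in this toy case drop out of the rendered rule.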
The resulting learned rules were filtered based on their firing strength and correlations with one another, while the corresponding features were selected by their contribution to each individual rule. Thus, the highest-weighted and least-correlated rules with important variables were retained and ensembled for network re-initialization. Details are illustrated in the supporting information. Experimental design This study compared the TGFNN to the following classical machine learning models on the same dataset: Random Forest, XGBoost, Logistic Regression, Support Vector Machines, Naive Bayes, and Decision Trees. We performed patient-wise five-fold cross-validation on the training dataset to evaluate model performance and robustness. A random search algorithm for hyperparameter tuning was employed during the training stage. The models were then evaluated on the validation and test datasets, which were kept separate from the training dataset. In our experimental setting, the validation sets were utilized to assess the model’s performance and robustness as an unseen dataset with the same data distribution as the training dataset. The test set served as an external dataset for overall model evaluation. The details of the data split are further described in supplementary materials. Model evaluation Models were evaluated using the mean (standard deviation [SD]) of the following metrics: accuracy ((true positives + true negatives) / (all positives + all negatives)), recall (true positives / (true positives + false negatives)), specificity (true negatives / (true negatives + false positives)), precision (true positives / (true positives + false positives)), F1 score (harmonic mean of precision and recall), area under the receiver operating characteristic curve (AUC), area under the precision-recall curve (AUPRC), and Matthews correlation coefficient (MCC) [27]. 
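The threshold-based metrics listed under "Model evaluation" follow directly from the four confusion-matrix counts. The sketch below is a plain-Python illustration with a hypothetical function name; the counts in the usage example are invented for demonstration and are not results from the study. AUC and AUPRC are omitted because they require ranked scores rather than counts.

```python
import math


def _safe_div(num, den):
    """Return 0.0 when a denominator is zero instead of raising."""
    return num / den if den else 0.0


def classification_metrics(tp, fp, tn, fn):
    """Accuracy, recall, specificity, precision, F1 and MCC from counts."""
    precision = _safe_div(tp, tp + fp)
    recall = _safe_div(tp, tp + fn)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": _safe_div(tp + tn, tp + fp + tn + fn),
        "recall": recall,
        "specificity": _safe_div(tn, tn + fp),
        "precision": precision,
        "f1": _safe_div(2 * precision * recall, precision + recall),
        "mcc": _safe_div(tp * tn - fp * fn, mcc_den),
    }


if __name__ == "__main__":
    # Invented counts for illustration only.
    for name, value in classification_metrics(tp=40, fp=10, tn=45, fn=5).items():
        print(f"{name}: {value:.3f}")
```

MCC uses all four counts, so it remains informative when classes are imbalanced, which is one reason to report it alongside accuracy and F1.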
Each of the machine learning models was also assessed concerning (1) whether the model could evaluate variable importance (interpretability) and (2) whether model prediction can be explained by clinical rules (transparency). 
Results Prediction performance Patient characteristics are shown in Table 1. Table 1. Demographic characteristics of patients requiring HT/LVAD evaluation (“Positive”) and those too well for HF advanced therapies (“Negative”). Displayed are mean (standard deviation) for continuous variables or N (%) for categorical variables. https://doi.org/10.1371/journal.pone.0295016.t001 The cross-validation results for both the TGFNN and other standard machine-learning models are summarized in Table 2. When initialized with clinical knowledge, the model achieved an F1 score of 0.569, an AUC of 0.747, and an AUPRC of 0.642. Our TGFNN with clinical initialization outperformed all standard machine learning models with respect to F1 score, AUC, and AUPRC except for XGBoost and Random Forest, which are ensemble models that lack explicit rules and operate through complex combinations of decision boundaries, making their inner workings challenging to interpret. These results indicate that our model offers transparency and interpretability while remaining competitive in performance with traditional ensemble approaches. Table 2. Performance of machine learning models on HF dataset using 5-fold cross validation. Models are referred to as transparent if they can explain their recommendations in a way understood by humans. The column ‘Interpretability’ indicates whether the feature importance can be provided with the model. Although Random Forest, XGBoost and SVM are listed as interpretable, these models can only be interpreted using an external approach such as SHAP (SHapley Additive exPlanations). 
The column “Rules” refers to whether the model provides a set of clinical rules by which to explain its prediction. https://doi.org/10.1371/journal.pone.0295016.t002 We also assessed model performance on the holdout test dataset as shown in Table 3. The re-initialized TGFNN model with ensembled rules extracted from 5 folds improved the F1 score from 0.577 to 0.656. In addition, the re-initialized TGFNN model achieved the highest AUC (0.855) and AUPRC (0.833) of all machine learning models. In clinical practice, there is a preference for models that can provide their underlying logic, as it enables a better understanding of the reasoning behind their predictions for healthcare professionals. Table 3. Performance of machine learning models on the HF test dataset. https://doi.org/10.1371/journal.pone.0295016.t003 TGFNN rules The rules extracted from the re-initialized TGFNN model are presented in Fig 2, showing how the network makes decisions. Among the seven rules, rules 2–5 were learned from the data apart from the initially injected rules. For example, patients with low systolic blood pressure, low BMI, and low LVEF were more likely to be recommended for heart transplantation. Fig 2. Clinical rules extracted from the network: In the heatmap, each column represents a rule, while each row represents one concept of a clinical feature. The number beneath every rule measures the contribution of the rule. The color shades on the heatmap indicate the importance of individual concepts for each rule. Rule 1 can be written as: IF Systolic Blood Pressure is low AND Left Ventricular Ejection Fraction is low, THEN refer for heart transplantation/ LVAD. 
KEY: BMI = body mass index; BNP = brain natriuretic peptide; CREAT = creatinine; HGB = hemoglobin; LVEF = left ventricular ejection fraction; MAP = mean arterial pressure; SBP = systolic blood pressure; SOD = sodium. https://doi.org/10.1371/journal.pone.0295016.g002 Based on the learned rules, low SBP, low MAP, and low LVEF on hospital admission are the most important indicators of needing HF advanced therapies. Other factors, such as a relative increase in creatinine or hyponatremia, were also important indicators of the need for HF advanced therapies. Demonstrative rules for identifying patients in need of HF advanced therapies are presented below. Rule: IF MAP is low AND Creatinine increases AND LVEF is low, THEN refer for HT/LVAD. Rule: IF SBP is low AND Hemoglobin is low AND LVEF is low AND No Diabetes, THEN refer for HT/LVAD. Range inference Membership functions for several important continuous clinical variables drawn from the model are shown in Fig 3, depicting how concepts (low, medium, high) are assigned to clinical variables to generate the fuzzy sets used in the rules. Furthermore, our model allows the transition between the ‘low’ and ‘medium’ concepts for SBP to be smooth while the change between the ‘medium’ and ‘high’ concepts is sharp. This asymmetric smoothness is valuable because it lets the model express a different degree of uncertainty at each boundary, which aids interpretation and suggests that disease progression differs at the boundaries between adjacent fuzzy concepts. Fig 3. Membership function visualization. Continuous clinical features are encoded into three concepts: ‘low’, ‘medium’ and ‘high’. The x-axis of each membership function represents the range of possible values, while the y-axis represents the degree of membership of each value in the corresponding fuzzy set, ranging from 0 to 1.
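As a rough illustration of the membership functions and rules described here, the sketch below encodes one variable into ‘low’, ‘medium’ and ‘high’ concepts, locates the point where two adjacent concepts cross, and evaluates a rule of the form "IF SBP is low AND LVEF is low". It is a minimal sketch under stated assumptions, not the TGFNN itself: the sigmoid shapes, the transition points and slopes, and the use of the minimum as the fuzzy AND are all illustrative choices, and the names `make_concepts`, `crossover` and `rule_1` are hypothetical. None of the numbers are the learned values reported in Table 4.

```python
import math
from typing import Callable, Dict, Sequence

Membership = Callable[[float], float]


def sigmoid(z: float) -> float:
    """Logistic function, mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def make_concepts(low_mid: float, mid_high: float,
                  slope_low: float, slope_high: float) -> Dict[str, Membership]:
    """Build 'low', 'medium' and 'high' membership functions for one variable.

    low_mid and mid_high are assumed transition points; a larger slope
    gives a sharper transition between the two adjacent concepts.
    """
    def low(x: float) -> float:
        return sigmoid(-slope_low * (x - low_mid))

    def high(x: float) -> float:
        return sigmoid(slope_high * (x - mid_high))

    def medium(x: float) -> float:
        # Whatever membership is left over, clipped so it stays in [0, 1].
        return max(0.0, 1.0 - low(x) - high(x))

    return {"low": low, "medium": medium, "high": high}


def crossover(f: Membership, g: Membership, lo: float, hi: float,
              tol: float = 1e-9) -> float:
    """Bisection for the x in [lo, hi] where f(x) equals g(x).

    Assumes f - g changes sign exactly once on the interval.
    """
    diff_lo = f(lo) - g(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        diff_mid = f(mid) - g(mid)
        if (diff_mid > 0) == (diff_lo > 0):
            lo, diff_lo = mid, diff_mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def rule_activation(memberships: Sequence[float]) -> float:
    """Fuzzy AND over the antecedents, using the minimum (an assumed choice)."""
    return min(memberships)


# Hypothetical parameters: a smooth low/medium and a sharp medium/high
# transition for SBP (mmHg), and symmetric transitions for LVEF (%).
SBP = make_concepts(low_mid=100.0, mid_high=140.0, slope_low=0.2, slope_high=1.0)
LVEF = make_concepts(low_mid=30.0, mid_high=50.0, slope_low=0.5, slope_high=0.5)


def rule_1(sbp: float, lvef: float) -> float:
    """Activation in [0, 1] of: IF SBP is low AND LVEF is low THEN refer."""
    return rule_activation([SBP["low"](sbp), LVEF["low"](lvef)])
```

Calling `crossover(SBP["low"], SBP["medium"], 60.0, 120.0)` returns the SBP value at which ‘low’ hands over to ‘medium’, which is the kind of boundary the figure reads off the curves. A call such as `rule_1(85.0, 20.0)` gives a high activation, while `rule_1(130.0, 20.0)` gives one close to zero because the SBP antecedent is barely satisfied.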
The x-coordinate of the intersection of two membership functions indicates where the transition from one concept to another occurs. KEY: SBP = systolic blood pressure; MAP = mean arterial pressure; SOD = sodium; HGB = hemoglobin; BMI = body mass index; CREAT = creatinine. https://doi.org/10.1371/journal.pone.0295016.g003 Using these membership function boundaries, we can infer the possible ranges that lead to a decision. The learned boundaries of the three concepts for each clinical feature are shown in Table 4. Comparing the learned boundaries with the reference ranges that HF cardiologists employ in clinical practice, we observe that they are consistent. Table 4. Critical values of membership functions learned from the study cohort by the algorithm. These critical values indicate the potential threshold where adjacent concepts transition. https://doi.org/10.1371/journal.pone.0295016.t004 Individual performance In addition to population-level rule extraction and predictive performance, rules can also be applied at the individual level. When a patient’s EHR profile is fed into the model, it not only generates a prediction but also highlights the specific rules triggered for that case. We illustrate this with one patient in our test set who was referred for advanced therapies (Fig 4). The TGFNN with ensemble initialization successfully predicted the patient’s need for HF advanced therapies at his subsequent hospitalization. Rules 1, 2, 4, 5 and 6 in Fig 2 activated for this patient, leading to the recommendation for advanced therapies. Fig 4. Profile for a patient in the test dataset, showing the composite rules that fired. https://doi.org/10.1371/journal.pone.0295016.g004 Discussion Herein we describe a transparent and interpretable machine learning model capable of using EHR data to predict whether HF patients will require advanced therapies at a future HF hospitalization. Our model outperformed all but two of the standard machine learning models to which it was compared and was the only model to be both transparent, providing the rationale for its recommendations, and interpretable.
Importantly, unlike prior machine learning models, the current model uses routinely collected EHR data from a single hospitalization to predict the need for HF advanced therapies during a subsequent hospitalization. Such an approach allows critical resources to be mobilized so that patients can undergo a comprehensive advanced therapies evaluation in an anticipatory rather than a reactive manner, the latter placing patients at risk of clinical deterioration that precludes advanced therapies. In light of the burgeoning numbers of HF patients, there has been growing interest in developing clinical decision-support systems capable of identifying patients with advanced HF [28]. These models have differentiated themselves from many of the historical regression-based models, whose limitations have included a focus on mortality and hospitalizations at pre-specified time points, reliance on data not routinely collected in practice, restriction to a relatively small number of clinical variables, and inability to account for non-linear relationships amongst variables. Recent machine learning models have overcome many of the limitations of these traditional models. These include an augmented intelligence-enabled workflow for identifying outpatients with Stage D HF warranting clinical review to determine the need for referral to a HF cardiologist [19] and an ensemble deep learning model trained to predict all-cause death, listing for HT, or extracorporeal membrane oxygenation (ECMO)/VAD within 1 year [20]. Our model distinguishes itself from these previous methods in a number of ways. First, the transparent structure of the TGFNN method allows treatment recommendations to be justified at both the population and individual levels through fuzzy rules. These rules enable the evaluation of feature importance and feature interaction and can be quickly verified by clinicians and tested for applicability in other clinical settings.
Second, the model defined abnormal ranges for continuous variables, aiding model interpretability and allowing these ranges to be used when caring for patients in settings where clinical decision support may be unavailable. Finally, our model predicted the future need for advanced HF therapies using routinely collected data from a single hospitalization, thereby moving from classification to prediction and avoiding the risk of missing the optimal advanced therapies window. This study should be interpreted within the context of its limitations. First, the data were obtained from a single medical center, limiting the generalizability of our study findings. Thus, our work will need to be validated in additional settings with a larger sample. Second, the model requires prospective validation using the EHR with subsequent clinician review of model recommendations. Such an approach, when implemented elsewhere, led to an increase in clinical referrals to HF cardiologists as well as an increase in advanced therapies evaluations [19]. Third, the current algorithm only uses the previous visit to predict whether the patient will subsequently require advanced therapies. Future enhancements of the model will incorporate more extensive longitudinal data, potentially improving model performance. Finally, our analysis only incorporated a subset of the clinical variables with known associations with advanced HF. A larger and more diverse set of variables will be added in future work, potentially improving model performance and allowing the generation of additional clinical rules. In conclusion, this study applied a TGFNN, an interpretable and transparent machine learning method, to predict the future need for HF advanced therapies using data routinely collected in the EHR. The results show that this method’s performance exceeds that of existing traditional machine learning methods while extracting clinical rules that are easily interpretable and verifiable.
Future research is needed, however, to incorporate longitudinal data and a broader sample of HF patients for long-term prediction. Supporting information S1 File. Contains the training details, clinical characteristics of patient encounters from Michigan Medicine and the data split information. https://doi.org/10.1371/journal.pone.0295016.s001 (DOCX) TI - Predicting need for heart failure advanced therapies using an interpretable tropical geometry-based fuzzy neural network JF - PLoS ONE DO - 10.1371/journal.pone.0295016 DA - 2023-11-28 UR - https://www.deepdyve.com/lp/public-library-of-science-plos-journal/predicting-need-for-heart-failure-advanced-therapies-using-an-8oC5SFQjWW SP - e0295016 VL - 18 IS - 11 DP - DeepDyve ER -