Background: Heart failure (HF) is a major driver of global morbidity and mortality. Early identification of patients at risk remains challenging due to complex, multivariate clinical relationships. Machine learning (ML) methods offer promise for more accurate prognostication.
Objective: We evaluated the predictive value of electrocardiogram (ECG)-derived features and developed an ML model to stratify HF risk.
Methods: We analyzed a public cohort of 1061 patients, of whom 589 (55.5%) developed HF. Records were randomly divided into training (70%, n = 742) and test (30%, n = 319) sets. After preprocessing, we trained a random forest (RF) classifier. Performance on the test set was assessed via accuracy, sensitivity, specificity, F1 score, and area under the receiver operating characteristic curve (AUC). Feature selection employed Gini importance and the Boruta algorithm, while SHapley Additive exPlanations (SHAP) values provided model interpretability.
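As a rough illustration of two steps named in the Methods, the sketch below shows a seeded random 70/30 split and a rank-based AUC calculation using only the Python standard library. The function names (`split_indices`, `roc_auc`), the seed, and the toy scores are assumptions made for this example; the abstract does not describe the authors' actual tooling, and flooring the training count is simply one convention that reproduces the reported 742/319 split.

```python
import random


def split_indices(n_records, train_fraction=0.7, seed=0):
    """Shuffle record indices and cut them into train and test lists."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    # Floor the training size (assumed convention): 0.7 * 1061 -> 742
    n_train = int(train_fraction * n_records)
    return indices[:n_train], indices[n_train:]


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    A tie between a positive and a negative score counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Cohort size from the abstract; the seed is arbitrary.
train_idx, test_idx = split_indices(1061)
print(len(train_idx), len(test_idx))  # 742 319

# Toy labels and scores (hypothetical, not study data).
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
print(roc_auc(toy_labels, toy_scores))  # 0.75
```

The pairwise AUC loop is quadratic in the number of cases, which is fine for a test set of a few hundred records; a sort-based version would be preferable for much larger data.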
Results: On the test set, the RF model achieved an AUC of 0.969, with 91.8% accuracy, 93.8% sensitivity, 89.4% specificity, and an F1 score of 92.7%. The top predictors were ST depression (Oldpeak), maximum heart rate (MaxHR), ST-segment slope, and serum cholesterol. Confusion matrix analysis confirmed robust discrimination between HF and non-HF cases. SHAP interpretation reinforced the dominant influence of ECG-related indices and cholesterol on individual risk estimates.
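To make the relationship between the reported metrics concrete, the following sketch derives accuracy, sensitivity, specificity, and F1 score from confusion-matrix counts. The counts used here are hypothetical: they were chosen only because they sum to the 319 test records and are arithmetically consistent with the reported percentages. The abstract does not give the actual confusion matrix, so these values should not be read as the study's results.

```python
def classification_metrics(tp, fn, tn, fp):
    """Derive standard binary-classification metrics from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # true positive rate (recall)
        "specificity": tn / (tn + fp),   # true negative rate
        "precision": tp / (tp + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


# Hypothetical counts (NOT taken from the paper): 177 HF and 142 non-HF
# cases, 319 in total, chosen to be consistent with the reported figures.
metrics = classification_metrics(tp=166, fn=11, tn=127, fp=15)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

With these assumed counts, accuracy, sensitivity, specificity, and F1 round to 91.8%, 93.8%, 89.4%, and 92.7%, matching the reported values. Other count combinations could round to the same percentages.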
Conclusion: An RF model leveraging ECG features demonstrated excellent performance for HF risk prediction and highlighted key physiologic markers. Future work should integrate comorbidity profiles and detailed biochemical data to further enhance clinical applicability.
Keywords: SHAP interpretation; electrocardiographic monitoring; heart failure; machine learning; predictive modeling; random forests.
© 2025 The Author(s). Annals of Noninvasive Electrocardiology published by Wiley Periodicals LLC.