Supervised machine learning algorithms to predict the duration and risk of long-term hospitalization in HIV-infected individuals: a retrospective study

Front Public Health. 2024 Jan 5:11:1282324. doi: 10.3389/fpubh.2023.1282324. eCollection 2023.

Abstract

Objective: This study aimed to use supervised machine learning models to predict the length of hospital stay and the risk of prolonged hospitalization in people living with HIV (PLWH), so that physicians can intervene in a timely manner and avoid wasting health resources.

Methods: Regression models based on random forest (RF), k-nearest neighbors (KNN), support vector machine (SVM), and extreme gradient boosting (XGB) were built to predict the length of hospital stay and were evaluated using RMSE, MAE, MAPE, and R2. Classification models based on RF, KNN, SVM, neural network (NN), and XGB were built to predict the risk of prolonged hospital stay and were evaluated using accuracy, PPV, NPV, specificity, sensitivity, and kappa. For internal validation, all models were also evaluated visually using AUROC, AUPRC, calibration curves, and decision curves.
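The four regression metrics named above can be illustrated with a minimal sketch. This is not the authors' code: the function name `regression_metrics` and the toy lengths of stay are hypothetical, and MAPE is assumed here to be expressed as a fraction rather than a percentage.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE, MAPE (as a fraction) and R2 for paired values.

    Hypothetical helper for illustration; assumes no true value is zero
    and that the true values are not all identical.
    """
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        "mape": sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n,
        "r2": 1.0 - ss_res / ss_tot,
    }

# Toy lengths of stay in days (made-up values, not study data).
observed = [10, 20, 30, 40]
predicted = [12, 18, 33, 37]
regression_example = regression_metrics(observed, predicted)
print(regression_example)
```

Lower RMSE, MAE, and MAPE indicate smaller prediction errors, while R2 closer to 1 indicates that the model explains more of the variance in length of stay.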

Results: Among the regression models, the XGB model performed best in internal validation for predicting the length of hospital stay (RMSE = 16.81, MAE = 10.39, MAPE = 0.98, R2 = 0.47). Among the classification models, the NN model showed good fit and stable features and performed best on the testing sets, with an accuracy of 0.7623, PPV of 0.7853, NPV of 0.7092, sensitivity of 0.8754, specificity of 0.5882, and kappa of 0.4672. Visual evaluation further showed that the NN model had the largest AUROC (0.9779) and AUPRC (0.773), as well as well-performing calibration and decision curves, in internal validation.
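For readers less familiar with the classification metrics reported above, the sketch below shows how each one follows from the four cells of a binary confusion matrix. It is an illustration only: the function name `classification_metrics` and the counts are hypothetical and are not derived from the study's data.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, PPV, NPV, sensitivity, specificity and Cohen's kappa.

    Hypothetical helper; tp/fp/fn/tn are confusion-matrix counts, with
    "positive" standing for prolonged hospitalization.
    """
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    # Agreement expected by chance, from the row and column marginals.
    p_expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    return {
        "accuracy": accuracy,
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "kappa": (accuracy - p_expected) / (1.0 - p_expected),
    }

# Made-up counts for 100 hypothetical patients (not study data).
classification_example = classification_metrics(tp=40, fp=10, fn=5, tn=45)
print(classification_example)
```

Kappa corrects accuracy for chance agreement, which is why it can be much lower than accuracy when one class dominates.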

Conclusion: This study showed that the XGB model was effective in predicting the length of hospital stay, while the NN model was effective in predicting the risk of prolonged hospitalization in PLWH. Based on these predictive models, an intelligent medical prediction system could be developed to predict the length of stay and the risk of prolonged hospitalization of HIV patients from their medical records, which may help reduce the waste of healthcare resources.

Keywords: AIDS; HIV; calibration curves; length of stay; machine learning; risk factors.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • HIV Infections* / epidemiology
  • Humans
  • Length of Stay
  • Retrospective Studies
  • Supervised Machine Learning

Grants and funding

The author(s) declare financial support was received for the research, authorship, and/or publication of this article. Support for this work was provided by: (1) Beijing Municipal Administration of Hospitals' Ascent Plan (DFL20191802); (2) Beijing Municipal Administration of Hospitals Clinical Medicine Development of Special Funding Support (ZYLX202126); and (3) Capital's Funds for Health Improvement and Research (2020-2-2174). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.