Integrating Nonindividual Patient Features in Machine Learning Models of Hospital-Onset Bacteremia

JAMA Netw Open. 2025 Jul 1;8(7):e2518815. doi: 10.1001/jamanetworkopen.2025.18815.

Abstract

Importance: Hospital-onset bacteremia and fungemia (HOB) are common and potentially preventable complications of hospital care.

Objective: To assess whether nonindividual patient features, which summarize interactions with other patients and health care workers (HCWs), can contribute to predictive and causal machine learning models for HOB.

Design, setting, and participants: This prognostic study included adult patients admitted to Barnes-Jewish Hospital, an academic hospital in St Louis, Missouri, in 2021. Analyses were developed between October 2023 and August 2024 and in April 2025.

Exposure: Individual patient features were extracted from electronic health records and used to engineer nonpatient features, including interactions with HCWs and direct or indirect (consecutive room occupancy) patient contact.
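As an illustration of the indirect-contact idea (consecutive room occupancy), the following minimal sketch links each room stay to the patient who occupied the same room immediately beforehand. The `RoomStay` record, its field names, and the `prior_occupants` helper are hypothetical and are not taken from the study; the sketch also assumes single-occupancy rooms with non-overlapping stays.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass(frozen=True)
class RoomStay:
    """Hypothetical record of one patient's stay in one room."""
    patient_id: str
    room_id: str
    start: datetime
    end: datetime


def prior_occupants(stays: List[RoomStay]) -> Dict[int, Optional[str]]:
    """Map each stay's index to the patient in the immediately preceding
    stay in the same room, or None if there is no such patient."""
    # Group stay indices by room.
    by_room: Dict[str, List[int]] = {}
    for idx, stay in enumerate(stays):
        by_room.setdefault(stay.room_id, []).append(idx)

    result: Dict[int, Optional[str]] = {}
    for indices in by_room.values():
        # Order stays within a room by start time.
        indices.sort(key=lambda i: stays[i].start)
        previous: Optional[int] = None
        for i in indices:
            prior = None
            # A patient is not counted as their own prior occupant.
            if previous is not None and stays[previous].patient_id != stays[i].patient_id:
                prior = stays[previous].patient_id
            result[i] = prior
            previous = i
    return result
```

Features of the prior occupant (for example, whether that patient received a given antibiotic class) could then be joined onto the current patient's record through this mapping.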

Main outcomes and measures: HOB was defined as a positive blood culture after the third day of hospitalization. Patients who were hospitalized for more than 3 days were considered at risk for the outcome. We developed 3 gradient boosting models: 2 predictive models of HOB occurrence (one using patient features only and one using both patient and nonpatient features) and 1 causal model to test the association between nonpatient features and HOB. Predictive performance is reported using area under the receiver operating characteristic curve (AUROC) and area under the precision-recall curve (AUPRC), and the results of the causal model are reported as difference in average effects. Sensitivity analyses separated intensive care unit-onset and ward-onset HOB and included a methicillin-resistant Staphylococcus aureus-specific model to adjust for colonization pressure.
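For readers less familiar with the two reported metrics, the sketch below computes them from binary labels and predicted scores using only the Python standard library. It is illustrative only: the abstract does not state which estimator of AUPRC the authors used, so average precision is assumed here as one common choice, and the function names are hypothetical.

```python
from typing import Sequence


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive case is scored
    higher than a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision: mean of the precision values at the rank of each
    positive case, with cases sorted by descending score. Tied scores are
    ordered arbitrarily in this simplified version."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    true_positives = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            true_positives += 1
            precision_sum += true_positives / rank
    if true_positives == 0:
        raise ValueError("need at least one positive label")
    return precision_sum / true_positives


# Toy example with 2 positive and 2 negative cases.
labels = [0, 0, 1, 1]
predicted = [0.1, 0.4, 0.35, 0.8]
print(auroc(labels, predicted))              # 0.75
print(average_precision(labels, predicted))  # about 0.833
```

AUPRC is the more informative of the two when the outcome is rare, because precision depends directly on how many negative cases are ranked above the positives.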

Results: Among the 52 442 patients, 34 855 (66.5%) had admissions longer than 72 hours and were included for analysis; of these, 556 (1.6%) developed HOB. The median age of the included patients was 60 (IQR, 44-70) years, 50.5% were female, and obesity was the most frequent comorbidity (25.0%). Adding nonpatient features, such as a prior occupant of the same room receiving antipseudomonal beta-lactams and the mean number of HCWs per day for the 7 days preceding HOB, improved model performance (AUROC, 0.88 [95% CI, 0.88-0.89]; AUPRC, 0.20 [95% CI, 0.20-0.22]) compared with the patient-only model (AUROC, 0.85 [95% CI, 0.85-0.86]; AUPRC, 0.13 [95% CI, 0.12-0.14]) (P < .001). These 2 features were also associated with a higher likelihood of HOB in the causal gradient boosting model.
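The reported percentages can be reproduced from the counts, and the AUPRC values are easier to read against the outcome prevalence, since a classifier with no discriminative ability has an expected AUPRC roughly equal to the prevalence. The short calculation below is a reader's check, not part of the study; it assumes the prevalence in the evaluation data matches the overall cohort prevalence.

```python
# Counts and metrics as reported in the Results.
total_patients = 52442
included = 34855
hob_cases = 556
auprc_full = 0.20          # patient + nonpatient features
auprc_patient_only = 0.13  # patient features only

included_pct = 100 * included / total_patients
hob_pct = 100 * hob_cases / included

# Approximate AUPRC of a no-skill classifier (assumption: equals prevalence).
baseline_auprc = hob_cases / included

lift_full = auprc_full / baseline_auprc
lift_patient_only = auprc_patient_only / baseline_auprc

print(f"included: {included_pct:.1f}%")   # 66.5%
print(f"HOB among included: {hob_pct:.1f}%")  # 1.6%
print(f"AUPRC relative to prevalence: {lift_full:.1f}x vs {lift_patient_only:.1f}x")
```

Under that assumption, both models perform well above the no-skill baseline of about 0.016 despite AUPRC values that look small in absolute terms.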

Conclusions and relevance: These findings suggest that nonindividual patient features may contribute to a comprehensive analysis of HOB when integrated with individual patient features in a machine learning model.

MeSH terms

  • Adult
  • Aged
  • Bacteremia* / diagnosis
  • Bacteremia* / epidemiology
  • Cross Infection* / epidemiology
  • Female
  • Fungemia* / epidemiology
  • Humans
  • Machine Learning*
  • Male
  • Middle Aged
  • Missouri / epidemiology
  • Risk Factors