AI-Enabled Diagnostic Prediction within Electronic Health Records to Enhance Biosurveillance and Early Outbreak Detection

Andre R Goncalves; Jose Cadena Pico; Yeping Hu; David Schlessinger; John Greene; Liam O'suilleabhain; Heather Clancy; Michael Vollmer; Vincent Liu; Tom Bates; Priyadip Ray

doi:10.1101/2025.05.14.25327606


medRxiv [Preprint]. 2025 May 16:2025.05.14.25327606. doi: 10.1101/2025.05.14.25327606.

Affiliations

¹ Computational Engineering Division, Engineering Directorate, Lawrence Livermore National Laboratory.
² Division of Research, Kaiser Permanente Northern California.

Abstract

Detecting infectious disease outbreaks promptly is crucial for effective public health responses, for minimizing transmission, and for enabling critical interventions. This study introduces a method that integrates machine learning (ML)-based diagnostic predictions with traditional epidemiological surveillance to enhance biosurveillance systems. ML models were trained on 4.5 million patient records from 2010 to 2022 to predict, within 24-hour intervals, the likelihood that a patient will be diagnosed with an infectious or unspecified gastrointestinal, respiratory, or neurological disease. High-confidence predictions were combined with final diagnoses and analyzed using spatiotemporal outbreak detection techniques. Among diseases with five or more outbreaks between 2014 and 2022, 33.3% of outbreaks (41 of 123) were detected earlier, with lead times of 1 to 24 days, while the approach produced an average of 1.33 false-positive outbreak detections per year. This approach demonstrates the potential of integrating ML with conventional methods for faster outbreak detection, provided that adequate disease-specific training data are available.
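The abstract describes the pipeline only at a high level. The following is a minimal sketch, under stated assumptions, of how high-confidence predictions might be merged with final diagnoses so that a patient enters the daily case count on the earlier of the two dates, and how that can surface a spike sooner. It does not reproduce the study's method: the spatiotemporal detector is replaced by a simple per-region Poisson threshold test, and the `Encounter` record, the 0.9 probability threshold, the 28-day baseline and the 0.01 alert level are all hypothetical choices, not the authors' parameters.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class Encounter:
    """One patient observation on a given day (hypothetical schema)."""
    patient_id: str
    region: str
    day: int                         # integer day index
    predicted_prob: Optional[float]  # model probability of the syndrome, if scored
    final_dx: bool                   # True once the final diagnosis confirms the syndrome


def count_cases(encounters: Iterable[Encounter],
                threshold: float = 0.9) -> Dict[Tuple[str, int], int]:
    """Count each patient once, on the earliest day they were either finally
    diagnosed or predicted with probability >= threshold (hypothetical rule)."""
    first_event: Dict[str, Tuple[int, str]] = {}
    for e in encounters:
        high_conf = e.predicted_prob is not None and e.predicted_prob >= threshold
        if e.final_dx or high_conf:
            prev = first_event.get(e.patient_id)
            if prev is None or e.day < prev[0]:
                first_event[e.patient_id] = (e.day, e.region)
    counts: Dict[Tuple[str, int], int] = defaultdict(int)
    for day, region in first_event.values():
        counts[(region, day)] += 1
    return dict(counts)


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)
    cdf = term
    for i in range(1, k):
        term *= lam / i
        cdf += term
    return max(0.0, 1.0 - cdf)


def detect_alerts(counts: Dict[Tuple[str, int], int], regions: List[str], day: int,
                  baseline_days: int = 28, alpha: float = 0.01) -> List[Tuple[str, int, int]]:
    """Simplified stand-in for a spatiotemporal detector: flag regions whose
    count on `day` is improbably high under a Poisson baseline estimated from
    the preceding `baseline_days` days."""
    alerts = []
    for region in regions:
        history = [counts.get((region, d), 0) for d in range(day - baseline_days, day)]
        lam = max(sum(history) / baseline_days, 0.1)  # floor avoids a zero baseline
        observed = counts.get((region, day), 0)
        if poisson_sf(observed, lam) < alpha:
            alerts.append((region, day, observed))
    return alerts


# Region "A": one confirmed case per day for 28 days, then a spike on day 28
# in which only 2 of 5 patients have a final diagnosis so far.
enc = [Encounter(f"p{d}", "A", d, None, True) for d in range(28)]
enc += [Encounter(f"q{i}", "A", 28, None, True) for i in range(2)]
enc += [Encounter(f"r{i}", "A", 28, 0.95, False) for i in range(3)]
with_ml = detect_alerts(count_cases(enc), ["A"], day=28)
dx_only = detect_alerts(count_cases(enc, threshold=1.1), ["A"], day=28)  # no prediction qualifies
print(with_ml, dx_only)  # [('A', 28, 5)] []
```

In this toy example, the three high-confidence predictions push the day-28 count from 2 to 5, which crosses the alert level, whereas final diagnoses alone do not. The trade-off the abstract reports (earlier detection against some false-positive outbreaks) corresponds here to the choice of `threshold` and `alpha`.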

Publication types

Preprint

Grants and funding

R35 GM128672/GM/NIGMS NIH HHS/United States