Detection and prediction of real-world severe asthma phenotypes by application of machine learning to electronic health records

Mehmet Furkan Bağcı; Toan Do; Samantha R Spierling Bagsic; Rahul F Gomez; Judy H Jun; Anna L Ritko; Sally E Wenzel; Truong Nguyen; Yusuf Öztürk; Brian D Modena

doi:10.1016/j.jacig.2025.100473


J Allergy Clin Immunol Glob. 2025 Apr 17;4(3):100473. doi: 10.1016/j.jacig.2025.100473. eCollection 2025 Aug.

Affiliations

¹ Department of Electrical and Computer Engineering, University of California San Diego, La Jolla, Calif.
² Department of Electrical and Computer Engineering, San Diego State University, San Diego, Calif.
³ Department of Allergy & Immunology, University of California San Diego School of Medicine, San Diego, Calif.
⁴ Department of Research Development, Scripps Health, San Diego, Calif.
⁵ Department of Knowledge Management, Scripps Health, San Diego, Calif.
⁶ Department of Internal Medicine, Scripps Health, San Diego, Calif.
⁷ University of Pittsburgh, Pittsburgh, Pa.
⁸ Modena Allergy + Asthma, La Jolla, Calif.

Abstract

Background: Asthma is a heterogeneous disease with a diverse array of phenotypes that differ in inflammatory characteristics and severity. Identifying and classifying phenotypes in the real world could provide a foundation to improve and personalize asthma management. Applying machine learning to electronic health records (EHRs) offers an opportunity to identify real-world asthma phenotypes.

Objective: We applied machine-learning techniques to EHRs to detect and predict real-world severe asthma (SA) phenotypes and to improve the precision of asthma severity diagnoses.

Methods: Data from 31,795 asthma patients were extracted from a health care system's EHR, and 1,112 patients met the inclusion criteria for analysis. Principal component analysis (PCA) followed by a Gaussian mixture model classified patients into subject clusters (SCs). Asthma severity was assessed using 2 predictive models: one based on the American Thoracic Society (ATS) definition of SA, and a supervised model trained on 50 randomly selected patients whose disease severity had been determined in advance by 2 independent physicians.
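The two-stage pipeline named in the Methods (reduce the EHR variables to a few principal components, then fit a Gaussian mixture to the component scores) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses only NumPy, assumes a numeric patient-by-feature matrix with no missing values, z-scores each feature, and fits a diagonal-covariance mixture by EM with a deterministic farthest-point initialization. The covariance structure, initialization, and the function names `standardize`, `pca`, `gmm_fit`, and `phenotype_clusters` are all hypothetical choices for this sketch; only the defaults of 3 components and 5 clusters mirror the abstract.

```python
import numpy as np

def standardize(X):
    """Z-score each column so no single unit (e.g., % predicted vs. cells/uL) dominates PCA."""
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0
    return (X - X.mean(axis=0)) / sd

def pca(X, n_components):
    """Project centered data onto its top principal components via SVD."""
    Xc = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]               # (k, n_features) loadings
    scores = Xc @ components.T                   # (n_samples, k) PC scores
    explained = (S ** 2) / (S ** 2).sum()        # variance fraction per PC
    return scores, components, explained[:n_components]

def gmm_fit(Z, n_clusters, n_iter=200, tol=1e-6, seed=0):
    """Fit a diagonal-covariance Gaussian mixture to PC scores with EM (assumed simplification)."""
    rng = np.random.default_rng(seed)
    n, d = Z.shape
    # Farthest-point initialization: start from one random patient, then
    # repeatedly add the patient farthest from all means chosen so far.
    idx = [int(rng.integers(n))]
    for _ in range(1, n_clusters):
        d2 = ((Z[:, None, :] - Z[idx][None]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(d2.argmax()))
    means = Z[idx].copy()
    variances = np.tile(Z.var(axis=0) + 1e-6, (n_clusters, 1))
    weights = np.full(n_clusters, 1.0 / n_clusters)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: log weight_k + log N(z | mean_k, diag(var_k)) per patient and cluster.
        diff = Z[:, None, :] - means[None]                       # (n, k, d)
        log_pdf = -0.5 * (np.log(2 * np.pi * variances)[None]
                          + diff ** 2 / variances[None]).sum(axis=2)
        log_joint = log_pdf + np.log(weights)[None]
        m = log_joint.max(axis=1, keepdims=True)                 # log-sum-exp shift
        log_norm = m + np.log(np.exp(log_joint - m).sum(axis=1, keepdims=True))
        resp = np.exp(log_joint - log_norm)                      # responsibilities
        ll = float(log_norm.sum())
        # M-step: re-estimate mixture weights, means, and per-cluster variances.
        Nk = resp.sum(axis=0) + 1e-12
        weights = Nk / n
        means = (resp.T @ Z) / Nk[:, None]
        diff = Z[:, None, :] - means[None]
        variances = (resp[:, :, None] * diff ** 2).sum(axis=0) / Nk[:, None] + 1e-6
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return resp.argmax(axis=1), means, weights

def phenotype_clusters(X, n_components=3, n_clusters=5):
    """Hypothetical end-to-end helper: standardize -> PCA -> GMM cluster labels."""
    scores, _, explained = pca(standardize(X), n_components)
    labels, _, _ = gmm_fit(scores, n_clusters)
    return labels, scores, explained
```

Each returned label plays the role of a subject cluster; in practice the number of clusters would be chosen by a model-selection criterion (such as BIC) rather than fixed, and the PC loadings would be inspected to name each component (lung function, blood inflammatory markers, systemic corticosteroid receipt).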

Results: Three principal components (PCs) emerged, reflecting lung function (PC1), blood inflammatory markers (PC2), and systemic corticosteroid receipt (PC3). Clustering of patients on these components identified 5 distinct asthma phenotypes (SCs) with significant clinical, physiologic, and inflammatory differences. The supervised model predicted SA with 92% precision and 85% accuracy. SC3 was classified as an inflammatory SA phenotype, making its patients strong candidates for biologic therapy.
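Precision and accuracy measure different things, which is why the two figures reported for the supervised model differ. Precision is TP / (TP + FP), the share of patients predicted to have SA who truly have it by the physician reference; accuracy is the share of all patients classified correctly, so missed SA cases (false negatives) lower accuracy but not precision. The helper below is a hypothetical illustration of these definitions using string labels `"SA"` and `"non-SA"` (assumed names); the toy labels in the usage note are invented and are not study data.

```python
def precision_accuracy(y_true, y_pred, positive="SA"):
    """Return (precision, accuracy) for binary labels; precision is NaN if nothing is predicted positive."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # correctly flagged SA
    fp = sum(t != positive and p == positive for t, p in pairs)  # non-SA flagged as SA
    correct = sum(t == p for t, p in pairs)                      # agreement with reference
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    accuracy = correct / len(pairs)
    return precision, accuracy
```

For example, with reference labels `["SA", "SA", "SA", "non-SA", "non-SA"]` and predictions `["SA", "SA", "non-SA", "non-SA", "non-SA"]`, the one missed SA case gives a precision of 1.0 but an accuracy of 0.8, the same pattern (precision above accuracy) seen in the reported results.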

Conclusion: Integrating machine learning with EHRs successfully classified and identified real-world asthma phenotypes, demonstrating the potential of this approach to identify SA for appropriate management and/or clinical studies.

Keywords: Asthma; electronic health records; machine learning; predictive modeling.