Background: Asthma is a heterogeneous disease with a diverse array of phenotypes that differ in inflammatory characteristics and severity. Identifying and classifying phenotypes in the real world could provide a foundation to improve and personalize asthma management. Applying machine learning to electronic health records (EHRs) offers an opportunity to identify real-world asthma phenotypes.
Objective: We applied machine-learning techniques to EHRs to detect and predict real-world severe asthma (SA) phenotypes and to improve the precision of asthma severity diagnoses.
Methods: Data from 31,795 asthma patients were extracted from a health care system's EHR, with 1,112 patients meeting inclusion criteria for analysis. Principal component analysis (PCA) and a Gaussian mixture model classified patients into subject clusters (SCs). Asthma severity was assessed using two predictive models, one based on the American Thoracic Society (ATS) definition and the other a supervised model trained on 50 randomly selected patients whose disease severity was predetermined by 2 independent physicians.
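A minimal sketch of the dimensionality-reduction and clustering step described in Methods, assuming scikit-learn and a purely synthetic feature matrix. The feature count, the standardization step, the covariance type, and the random seed are illustrative assumptions, not details reported by the study; the choices of 3 components and 5 clusters simply mirror the numbers given in Results.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for an EHR-derived feature matrix:
# rows are patients, columns are hypothetical clinical variables.
n_patients, n_features = 300, 8
X = rng.normal(size=(n_patients, n_features))

# Standardize so that no single variable dominates the components
# (an assumed preprocessing step).
X_std = StandardScaler().fit_transform(X)

# Project each patient onto the first 3 principal components.
pca = PCA(n_components=3)
scores = pca.fit_transform(X_std)

# Fit a Gaussian mixture model on the component scores and assign
# each patient to one of 5 subject clusters.
gmm = GaussianMixture(n_components=5, covariance_type="full", random_state=0)
labels = gmm.fit_predict(scores)

print("Explained variance ratio:", pca.explained_variance_ratio_)
print("Patients per cluster:", np.bincount(labels, minlength=5))
```

In practice the number of mixture components is often chosen by comparing an information criterion such as `gmm.bic(scores)` across candidate values; the abstract does not say how the cluster count was selected.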
Results: Three principal components (PCs) emerged, reflecting lung function (PC1), blood inflammatory markers (PC2), and systemic corticosteroid receipt (PC3). PCA followed by Gaussian mixture modeling identified 5 distinct asthma phenotypes (SCs) with significant clinical, physiologic, and inflammatory differences. The supervised model, trained on 50 randomly selected patients, predicted SA with 92% precision and 85% accuracy. SC3 was classified as an inflammatory SA phenotype, making it highly suitable for biologic therapy.
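Precision and accuracy answer different questions: precision is the share of patients predicted to have SA who truly have it, whereas accuracy is the share of all patients classified correctly. The sketch below shows the arithmetic using hypothetical confusion-matrix counts chosen only for illustration; they are not the study's data, and the function names are likewise illustrative.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of predicted-positive cases that are truly positive."""
    return tp / (tp + fp)


def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Fraction of all cases that are classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


# Hypothetical counts for a binary severe-asthma classifier
# (illustrative only, not taken from the study).
tp, fp, tn, fn = 8, 2, 7, 3

print(f"precision = {precision(tp, fp):.2f}")         # 8 / 10  = 0.80
print(f"accuracy  = {accuracy(tp, fp, tn, fn):.2f}")  # 15 / 20 = 0.75
```

A model can show higher precision than accuracy, as in this example, when it raises few false alarms but misses some true cases.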
Conclusion: Integrating machine learning with EHRs successfully classified and identified real-world asthma phenotypes, demonstrating the potential of this approach to identify SA for appropriate management and/or clinical studies.
Keywords: Asthma; electronic health records; machine learning; predictive modeling.
© 2025 The Author(s).