In-depth analysis of the risk factors for persistent severe acute respiratory syndrome coronavirus 2 infection and construction of predictive models: an exploratory research study

BMC Infect Dis. 2025 May 14;25(1):699. doi: 10.1186/s12879-025-11083-2.

Abstract

Background: Persistent severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection differs from long coronavirus disease 2019 (COVID-19), which is characterized by symptoms lasting ≥ 12 weeks post-clearance. The Omicron BA.5 variant has a shorter median clearance time (10-14 days) than the Delta variant, suggesting that the traditional 20-day diagnostic threshold may delay interventions in high-risk populations. This study integrated multi-threshold analysis (14/20/30 days), whole-genome sequencing, and machine learning to investigate diagnostic thresholds for persistent SARS-CoV-2 infection and to develop a generalizable risk prediction model.

Methods: This retrospective study analyzed data from 1,216 patients with COVID-19 hospitalized at Aerospace Center Hospital between January 2021 and October 2024. We used whole-genome sequencing to genotype all COVID-19 cases and to identify major variants (such as Omicron BA.5 and Delta). The outcome, "persistent SARS-CoV-2 infection," was defined as viral nucleic acid positivity for ≥ 14 days. Risk factors associated with persistent infection were identified through subgroup analysis with multiple logistic regression (adjusted for age, comorbidities, vaccination status, and virus strain) and machine learning models (70% training and 30% testing datasets).
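
To illustrate how the choice of diagnostic threshold changes who counts as a persistent case, the following minimal sketch applies the 14/20/30-day cutoffs to a list of positivity durations. The function name `persistent_rate_by_threshold` and the durations in `example_days` are hypothetical and are not taken from the study data; the sketch covers only the outcome definition, not the authors' regression or machine learning pipeline.

```python
from typing import Dict, List, Sequence


def persistent_rate_by_threshold(
    days_positive: List[int],
    thresholds: Sequence[int] = (14, 20, 30),
) -> Dict[int, float]:
    """Return, for each threshold, the fraction of patients whose
    nucleic acid positivity lasted at least that many days."""
    n = len(days_positive)
    if n == 0:
        raise ValueError("days_positive must not be empty")
    return {t: sum(d >= t for d in days_positive) / n for t in thresholds}


# Hypothetical durations of nucleic acid positivity (days); NOT study data.
example_days = [5, 9, 12, 14, 16, 21, 30, 35, 8, 10]

rates = persistent_rate_by_threshold(example_days)
for threshold, rate in rates.items():
    print(f">= {threshold} days: {rate:.0%} classified as persistent")
```

In this toy cohort, lowering the cutoff from 20 to 14 days reclassifies the two patients with 14 and 16 days of positivity as persistent, which is the kind of shift the multi-threshold analysis is designed to expose.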

Results: Persistent SARS-CoV-2 infection was identified in 15.5% (188/1,216) of hospitalized COVID-19 patients. Key predictors included comorbidities (hypertension, diabetes, and active malignancy) and immune dysfunction, marked by reduced B-cell and CD4+ T-cell counts. Unvaccinated patients exhibited an 82% higher risk of persistent infection. Elevated inflammatory markers (C-reactive protein and interleukin-6) and bilateral lung infiltrates on computed tomography further distinguished persistent cases. The predictive model demonstrated strong discrimination, with an area under the curve (AUC) of 0.847 (95% confidence interval: 0.815-0.879) and an AUC of 0.81 in external validation, underscoring its clinical utility for risk stratification.
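
For readers unfamiliar with the discrimination metric reported above, the sketch below shows one common way to compute an AUC and an approximate 95% confidence interval. It uses the Mann-Whitney pairwise formulation of the AUC and the Hanley-McNeil standard error; the abstract does not state which method the authors used, so this is an assumption made for illustration. The scores and labels are hypothetical, and the function names are not from the study.

```python
import math
from typing import List, Tuple


def auc_mann_whitney(scores: List[float], labels: List[int]) -> float:
    """AUC as the probability that a randomly chosen positive case scores
    higher than a randomly chosen negative case (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def hanley_mcneil_ci(
    auc: float, n_pos: int, n_neg: int, z: float = 1.96
) -> Tuple[float, float]:
    """Approximate confidence interval from the Hanley-McNeil standard error,
    clipped to the valid AUC range [0, 1]."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    variance = (
        auc * (1.0 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    se = math.sqrt(max(variance, 0.0))
    return max(0.0, auc - z * se), min(1.0, auc + z * se)


# Hypothetical predicted risks and outcomes (1 = persistent infection); NOT study data.
scores = [0.9, 0.8, 0.35, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0, 0, 0]

auc = auc_mann_whitney(scores, labels)
low, high = hanley_mcneil_ci(auc, n_pos=sum(labels), n_neg=len(labels) - sum(labels))
print(f"AUC = {auc:.3f}, approximate 95% CI = ({low:.3f}, {high:.3f})")
```

With only eight hypothetical patients the interval is very wide; the much narrower interval reported in the abstract reflects the far larger cohort.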

Conclusions: Hypertension, diabetes, malignancy, immunosuppression (low B-cell and CD4+ T-cell counts), and non-vaccination are independent risk factors for persistent SARS-CoV-2 infection. Integrating these factors into clinical risk stratification may optimize management of high-risk populations.

Keywords: Clinical manifestations; Persistent infection; Predictive model construction; Risk factors; SARS-CoV-2.

MeSH terms

  • Adult
  • Aged
  • COVID-19* / diagnosis
  • COVID-19* / epidemiology
  • COVID-19* / virology
  • Female
  • Humans
  • Machine Learning
  • Male
  • Middle Aged
  • Retrospective Studies
  • Risk Factors
  • SARS-CoV-2* / genetics
  • Whole Genome Sequencing