Comparative performance of twelve machine learning models in predicting COVID-19 mortality risk in children: a population-based retrospective cohort study in Brazil

Adriano Lages Dos Santos; Maria Christina L Oliveira; Enrico A Colosimo; Robert H Mak; Clara C Pinhati; Stella C Gallante; Hercílio Martelli-Júnior; Ana Cristina Simões E Silva; Eduardo A Oliveira

doi:10.7717/peerj-cs.2916


PeerJ Comput Sci. 2025 May 28:11:e2916. doi: 10.7717/peerj-cs.2916. eCollection 2025.

Authors

Adriano Lages Dos Santos¹ ², Maria Christina L Oliveira², Enrico A Colosimo³, Robert H Mak⁴, Clara C Pinhati², Stella C Gallante², Hercílio Martelli-Júnior⁵, Ana Cristina Simões E Silva², Eduardo A Oliveira²

Affiliations

¹ Engineering and Informatics, Federal Institute of Science and Technology of Minas Gerais, Belo Horizonte, Minas Gerais, Brazil.
² Department of Pediatrics, School of Medicine, Federal University of Minas Gerais, Belo Horizonte, Minas Gerais, Brazil.
³ Department of Statistics, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil.
⁴ Division of Pediatric Nephrology, Rady Children's Hospital, University of California, San Diego, San Diego, California, United States.
⁵ Department of Health Sciences, School of Odontology, Montes Claros State University, Montes Claros, Minas Gerais, Brazil.

Abstract

The COVID-19 pandemic has catalyzed the application of advanced digital technologies such as artificial intelligence (AI) to predict mortality in adult patients. However, the development of machine learning (ML) models for predicting outcomes in children and adolescents with COVID-19 remains limited. This study aimed to evaluate the performance of multiple ML models in forecasting mortality among hospitalized pediatric COVID-19 patients. In this cohort study, we used the SIVEP-Gripe dataset, a public resource that the Ministry of Health maintains to track severe acute respiratory syndrome (SARS) in Brazil. To create subsets for training and testing the ML models, we divided the primary dataset into three parts. Using these subsets, we developed and trained 12 ML algorithms to predict the outcomes. We assessed the performance of these models using various metrics such as accuracy, precision, sensitivity, recall, and area under the receiver operating characteristic curve (AUC). Among the 37 variables examined, 24 were found to be potential indicators of mortality, as determined by the chi-square test of independence. The Logistic Regression (LR) algorithm achieved the highest performance on the optimized dataset, with an accuracy of 92.5% and an AUC of 80.1%. Gradient boosting classifier (GBC) and AdaBoost (ADA) closely followed the LR algorithm, producing similar results. Our study also revealed that reduced oxygen saturation at baseline, presence of comorbidities, and older age were the most relevant factors in predicting mortality in children and adolescents hospitalized with SARS-CoV-2 infection. The use of ML models can be an asset in making clinical decisions and implementing evidence-based patient management strategies, which can enhance patient outcomes and overall quality of medical care. The LR, GBC, and ADA models predicted mortality in pediatric COVID-19 patients accurately and efficiently.
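The statistical steps named in the abstract (a chi-square test of independence to screen candidate variables, then accuracy, precision, sensitivity, and AUC to score a classifier) can be illustrated with a minimal Python sketch. This is not the authors' code. All counts and scores are made up for illustration, the function names are hypothetical, and the test is restricted to a 2x2 table (one binary variable versus death/survival) without continuity correction so that only the standard library is needed.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table.

    `table` is [[a, b], [c, d]]: rows are the levels of a binary
    variable, columns are the outcome (died, survived).
    No continuity correction is applied.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, the chi-square survival function
    # reduces to erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


def classification_metrics(tp, fp, tn, fn):
    """Metrics from confusion-matrix counts (sensitivity is the same quantity as recall)."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),
    }


def auc(scores_pos, scores_neg):
    """AUC as the probability that a positive case outranks a negative one (ties count 0.5)."""
    wins = 0.0
    for p in scores_pos:
        for q in scores_neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical counts: low oxygen saturation (rows) vs. outcome (columns).
stat, p_value = chi_square_2x2([[30, 70], [20, 180]])
print(f"chi-square = {stat:.2f}, p = {p_value:.2g}")

# Hypothetical confusion matrix and predicted risks for a fitted model.
metrics = classification_metrics(tp=8, fp=2, tn=85, fn=5)
print(metrics)
print(auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2]))
```

In a screening step of this kind, a variable would be kept as a candidate predictor when its p-value falls below a chosen threshold; the threshold used in the study is not stated in the abstract.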

Keywords: Artificial intelligence; COVID-19; Children; Death prediction; Healthcare; Machine learning; Mortality; Risk.