Background/aims: Post-hepatectomy liver failure (PHLF) is a significant complication, with a reported incidence between 8% and 12%. Machine learning (ML) can analyze large datasets to uncover patterns not apparent through traditional methods, enhancing PHLF prediction and potentially mitigating complications.
Methods: Using the National Surgical Quality Improvement Program (NSQIP) database, patients who underwent hepatectomy were randomly split into training and testing sets. ML algorithms, including LightGBM, Random Forest, XGBoost, and Deep Neural Networks, were evaluated against logistic regression. Performance metrics included receiver operating characteristic area under the curve (ROC AUC) and Brier score loss. Shapley Additive exPlanations (SHAP) was used to identify the relevance of individual variables.
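The two evaluation metrics named above can be illustrated with a minimal plain-Python sketch on toy data. This is not the study's pipeline (which presumably used standard library implementations); the patient probabilities below are hypothetical, for illustration only.

```python
def brier_score_loss(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome;
    lower is better (0 = perfectly calibrated, confident predictions)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

def roc_auc(y_true, y_prob):
    """Probability that a randomly chosen positive case is ranked above a
    randomly chosen negative case (ties count 0.5) -- the Mann-Whitney U
    formulation of ROC AUC."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum((pp > pn) + 0.5 * (pp == pn) for pp in pos for pn in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical PHLF probabilities for six patients (1 = PHLF occurred).
y_true = [0, 0, 0, 1, 0, 1]
y_prob = [0.05, 0.10, 0.20, 0.70, 0.30, 0.90]

print(roc_auc(y_true, y_prob))                    # 1.0: every PHLF case
                                                  # ranked above every non-case
print(round(brier_score_loss(y_true, y_prob), 4))
```

A model can achieve perfect discrimination (ROC AUC = 1.0) while still being poorly calibrated, which is why the study reports both metrics.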
Results: 28,192 patients who underwent hepatectomy from 2013 to 2021 were included; PHLF occurred in 1,305 patients (4.6%). Preoperative and intraoperative factors contributed most to PHLF. The leading preoperative factors were international normalized ratio > 1.0, sodium < 139 mEq/L, albumin < 3.9 g/dL, American Society of Anesthesiologists score > 2, and total bilirubin > 0.65 mg/dL. Intraoperative risks included transfusion requirement, trisectionectomy, operative time > 266.5 minutes, and open surgical approach. The LightGBM model performed best, with an ROC AUC of 0.8349 and a Brier score loss of 0.0834.
Conclusions: The role of ML models in surgical risk stratification is still evolving. This study demonstrates the potential of ML algorithms to identify subtle subclinical changes that could affect surgical outcomes. The thresholds explored should not be taken as clinical cutoffs but as a proof of concept of how ML models could provide clinicians with additional information. Such integration could lead to improved clinical outcomes and greater efficiency in patient care.
Keywords: Hepatectomy; Liver failure; Machine learning; National Surgical Quality Improvement Program; Risk.