A distributional reinforcement learning model for optimal glucose control after cardiac surgery

NPJ Digit Med. 2025 May 27;8(1):313. doi: 10.1038/s41746-025-01709-9.

Abstract

This study introduces Glucose Level Understanding and Control Optimized for Safety and Efficacy (GLUCOSE), a distributional offline reinforcement learning algorithm for optimizing insulin dosing after cardiac surgery. GLUCOSE was trained on 5228 patients, tested on 920, and externally validated on 649. It achieved a mean estimated reward of 0.0 [-0.07, 0.06] in internal testing and -0.63 [-0.74, -0.52] in external validation, outperforming clinician returns of -1.29 [-1.37, -1.20] and -1.02 [-1.16, -0.89], respectively. In the first phase of a multi-phase human validation, GLUCOSE showed a significantly lower mean absolute error (MAE) in insulin dosing than clinicians: 0.9 versus 1.97 units (p < 0.001) in internal testing and 1.90 versus 2.24 units (p = 0.003) in external validation. In the second and third phases, GLUCOSE's performance was comparable to or exceeded that of senior clinicians in MAE, safety, effectiveness, and acceptability. These findings suggest that GLUCOSE is a robust tool for improving postoperative glucose management.
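As an illustration of the dosing metric reported above, the following minimal Python sketch computes an MAE between recommended and reference insulin doses, along with a percentile bootstrap interval for a mean. It is not the authors' evaluation code. The abstract does not say how the reference dose was defined or how the bracketed intervals were obtained, so the function names, the bootstrap procedure, and the toy dose values are all assumptions made for illustration only.

```python
import random
from statistics import mean


def mean_absolute_error(recommended, reference):
    """Average absolute difference, in insulin units, between two dose lists."""
    if len(recommended) != len(reference):
        raise ValueError("dose lists must have the same length")
    if not recommended:
        raise ValueError("dose lists must not be empty")
    return mean(abs(r - t) for r, t in zip(recommended, reference))


def bootstrap_mean_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean (an assumed procedure)."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(mean(rng.choices(values, k=n)) for _ in range(n_resamples))
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Toy doses in units; hypothetical values, not data from the study.
reference_doses = [2.0, 4.0, 1.0, 3.0]
model_doses = [2.5, 3.5, 1.0, 4.0]
clinician_doses = [3.0, 6.0, 0.0, 3.0]

model_mae = mean_absolute_error(model_doses, reference_doses)
clinician_mae = mean_absolute_error(clinician_doses, reference_doses)
print(model_mae, clinician_mae)  # 0.5 1.0

# Interval for the mean absolute error of the model on the toy data.
model_errors = [abs(m - r) for m, r in zip(model_doses, reference_doses)]
print(bootstrap_mean_ci(model_errors))
```

A lower MAE means the recommended doses sit closer to the reference doses on average; on its own it does not show that the doses are safer or more effective, which is why the later validation phases assess those criteria separately.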