A distributional reinforcement learning model for optimal glucose control after cardiac surgery

Jacob M Desman; Zhang-Wei Hong; Moein Sabounchi; Ashwin S Sawant; Jaskirat Gill; Ana C Costa; Gagan Kumar; Rajeev Sharma; Arpeta Gupta; Paul McCarthy; Veena Nandwani; Doug Powell; Alexandra Carideo; Donnie Goodwin; Sanam Ahmed; Umesh Gidwani; Matthew A Levin; Robin Varghese; Farzan Filsoufi; Robert Freeman; Avniel Shetreat-Klein; Alexander W Charney; Ira Hofer; Lili Chan; David Reich; Patricia Kovatch; Roopa Kohli-Seth; Monica Kraft; Pulkit Agrawal; John A Kellum; Girish N Nadkarni; Ankit Sakhuja

doi:10.1038/s41746-025-01709-9

NPJ Digit Med. 2025 May 27;8(1):313. doi: 10.1038/s41746-025-01709-9.

Authors

Jacob M Desman^{1,2}, Zhang-Wei Hong³, Moein Sabounchi^{1,2}, Ashwin S Sawant^{1,2,4}, Jaskirat Gill⁵, Ana C Costa⁶, Gagan Kumar⁷, Rajeev Sharma⁸, Arpeta Gupta⁹, Paul McCarthy¹⁰, Veena Nandwani¹⁰, Doug Powell¹⁰, Alexandra Carideo⁵, Donnie Goodwin¹⁰, Sanam Ahmed⁵, Umesh Gidwani⁵, Matthew A Levin¹¹, Robin Varghese^{5,6}, Farzan Filsoufi⁶, Robert Freeman¹, Avniel Shetreat-Klein¹², Alexander W Charney¹, Ira Hofer^{1,2,11}, Lili Chan^{1,2,13}, David Reich¹¹, Patricia Kovatch¹⁴, Roopa Kohli-Seth⁵, Monica Kraft¹⁵, Pulkit Agrawal³, John A Kellum¹⁶, Girish N Nadkarni^#^{1,2,13}, Ankit Sakhuja^#^{17,18,19}

Affiliations

¹ The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
² Division of Data-Driven and Digital Medicine, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
³ Improbable AI Lab, Massachusetts Institute of Technology, Cambridge, MA, USA.
⁴ Division of Hospital Medicine, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
⁵ Institute for Critical Care Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
⁶ Department of Cardiothoracic Surgery, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
⁷ Department of Pulmonary and Critical Care Medicine, Northeast Georgia Medical Center, Gainesville, GA, USA.
⁸ Division of Endocrinology, Hackensack University Medical Center, Hackensack, NJ, USA.
⁹ Division of Endocrinology, Millennium Physician Group, Jacksonville, FL, USA.
¹⁰ Section of Cardiovascular Critical Care, Department of Cardiovascular and Thoracic Surgery, West Virginia University, Morgantown, WV, USA.
¹¹ Department of Anesthesiology, Perioperative, and Pain Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
¹² Department of Rehabilitation and Physical Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
¹³ Division of Nephrology, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
¹⁴ Scientific Computing, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
¹⁵ Samuel Bronfman Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
¹⁶ Department of Critical Care Medicine, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA.
¹⁷ The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA. ankit.sakhuja@mssm.edu.
¹⁸ Division of Data-Driven and Digital Medicine, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA. ankit.sakhuja@mssm.edu.
¹⁹ Institute for Critical Care Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA. ankit.sakhuja@mssm.edu.

^# Contributed equally.

Abstract

This study introduces Glucose Level Understanding and Control Optimized for Safety and Efficacy (GLUCOSE), a distributional offline reinforcement learning algorithm for optimizing insulin dosing after cardiac surgery. Trained on 5228 patients, tested on 920, and externally validated on 649, GLUCOSE achieved a mean estimated reward of 0.0 [-0.07, 0.06] in internal testing and -0.63 [-0.74, -0.52] in external validation, outperforming clinician returns of -1.29 [-1.37, -1.20] and -1.02 [-1.16, -0.89], respectively. In the first phase of a multi-phase human validation, GLUCOSE showed a significantly lower mean absolute error (MAE) in insulin dosing than clinicians: 0.9 versus 1.97 units (p < 0.001) in internal testing and 1.90 versus 2.24 units (p = 0.003) in external validation. In the second and third phases, GLUCOSE's performance was comparable to or exceeded that of senior clinicians in MAE, safety, effectiveness, and acceptability. These findings suggest that GLUCOSE is a robust tool for improving postoperative glucose management.
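For readers unfamiliar with the MAE metric reported above, the following is a minimal sketch of how a mean absolute error in insulin units might be computed. The function name, the choice of reference doses, and all numbers are hypothetical illustrations; the abstract does not specify the reference standard, and none of these values come from the study.

```python
from statistics import mean


def mean_absolute_error(recommended, reference):
    """Average absolute difference between two equal-length dose lists (units)."""
    if len(recommended) != len(reference):
        raise ValueError("dose lists must have the same length")
    if not recommended:
        raise ValueError("dose lists must not be empty")
    return mean(abs(r - t) for r, t in zip(recommended, reference))


# Hypothetical doses in insulin units (made up for illustration only).
model_doses = [2.0, 4.0, 0.0, 6.0]      # assumed model recommendations
reference_doses = [3.0, 4.0, 1.0, 4.0]  # assumed reference standard

mae = mean_absolute_error(model_doses, reference_doses)
print(f"MAE: {mae:.2f} units")
```

A lower MAE means the recommended doses sit closer, on average, to the reference doses; it does not by itself indicate whether errors are over- or under-dosing.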
