Development and validation of machine learning-augmented algorithm for insulin sensitivity assessment in the community and primary care settings: a population-based study in China

Hao Zhang; Tianshu Zeng; Jiaoyue Zhang; Juan Zheng; Jie Min; Miaomiao Peng; Geng Liu; Xueyu Zhong; Ying Wang; Kangli Qiu; Shenghua Tian; Xiaohuan Liu; Hantao Huang; Marina Surmach; Ping Wang; Xiang Hu; Lulu Chen

doi:10.3389/fendo.2024.1292346

Front Endocrinol (Lausanne). 2024 Jan 25:15:1292346. doi: 10.3389/fendo.2024.1292346. eCollection 2024.

Authors

Hao Zhang¹ ², Tianshu Zeng¹ ², Jiaoyue Zhang¹ ², Juan Zheng¹ ², Jie Min¹ ², Miaomiao Peng¹ ², Geng Liu¹ ², Xueyu Zhong¹ ², Ying Wang¹ ², Kangli Qiu¹ ², Shenghua Tian¹ ², Xiaohuan Liu¹ ², Hantao Huang³, Marina Surmach⁴, Ping Wang⁵, Xiang Hu¹ ², Lulu Chen¹ ²

Affiliations

¹ Department of Endocrinology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China.
² Hubei Provincial Clinical Research Center for Diabetes and Metabolic Disorders, Wuhan, China.
³ Department of Emergency Medicine, Yichang Yiling Hospital, Yichang, China.
⁴ Department of Public Health and Health Services, Grodno State Medical University, Grodno, Belarus.
⁵ Precision Health Program, Department of Radiology, College of Human Medicine, Michigan State University, East Lansing, MI, United States.

Abstract

Objective: Insulin plays a central role in regulating energy and glucose homeostasis, and insulin resistance (IR) is widely regarded as the "common soil" of a cluster of cardiometabolic disorders. Assessing insulin sensitivity is therefore important for preventing and treating IR-related disease. This study aims to develop and validate machine learning (ML)-augmented algorithms for insulin sensitivity assessment in community and primary care settings.

Methods: We analyzed data from 9358 participants over 40 years old in the population-based cohort of the Hubei center of the REACTION study (Risk Evaluation of Cancers in Chinese Diabetic Individuals). Three non-ensemble algorithms and four ensemble algorithms were used to develop models, with 70 non-laboratory variables for the community setting and 87 variables (70 non-laboratory and 17 laboratory) for the primary care setting, in order to identify the best-performing classifier. The best-performing models were then streamlined using the top-ranked 5, 8, 10, 13, 15, and 20 features. Model performance was evaluated using the area under the receiver operating characteristic curve (AUROC), the area under the precision-recall curve (AUPR), and the Brier score. Shapley additive explanation (SHAP) analysis was used to evaluate feature importance and interpret the models.
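The three evaluation metrics named above can be illustrated with a minimal Python sketch. This is not the study's code: the function names and the four-sample toy data are hypothetical, AUPR is approximated here by step-wise average precision (the abstract does not say how the area was computed), and tied scores are not given special handling in that function.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auroc(y_true, y_prob):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, y_prob):
    """Step-wise area under the precision-recall curve (assumed stand-in for AUPR)."""
    ranked = sorted(zip(y_prob, y_true), key=lambda pair: -pair[0])
    n_pos = sum(y_true)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at each recalled positive
    return total / n_pos


# Toy labels and predicted probabilities (illustrative only, not study data).
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]

print(auroc(y_true, y_prob))
print(average_precision(y_true, y_prob))
print(brier_score(y_true, y_prob))
```

Higher AUROC and AUPR indicate better discrimination, while a lower Brier score indicates better-calibrated probabilities, which is why the three are reported together.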

Results: The LightGBM models developed for the community (AUROC 0.794, AUPR 0.575, Brier score 0.145) and primary care settings (AUROC 0.867, AUPR 0.705, Brier score 0.119) outperformed the models built with the other six algorithms. The streamlined LightGBM models using the 20 top-ranked variables performed nearly as well in both the community (AUROC 0.791, AUPR 0.563, Brier score 0.146) and primary care settings (AUROC 0.863, AUPR 0.692, Brier score 0.124). SHAP analysis indicated that the top-ranked features included fasting plasma glucose (FPG), waist circumference (WC), body mass index (BMI), triglycerides (TG), gender, waist-to-height ratio (WHtR), the number of daughters born, and resting pulse rate (RPR), among others.
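One common way to obtain a "top-ranked" feature list from SHAP output is to rank features by their mean absolute SHAP value across participants. The sketch below assumes that ranking rule, which the abstract does not spell out; the helper name `top_k_features` and the SHAP matrix are hypothetical toy values, not results from the study.

```python
def top_k_features(shap_values, feature_names, k):
    """Return the k feature names with the largest mean absolute SHAP value.

    shap_values: list of rows, one per participant, one column per feature.
    """
    n_rows = len(shap_values)
    mean_abs = [
        sum(abs(row[j]) for row in shap_values) / n_rows
        for j in range(len(feature_names))
    ]
    # Sort feature indices from most to least important.
    order = sorted(range(len(feature_names)), key=lambda j: -mean_abs[j])
    return [feature_names[j] for j in order[:k]]


# Toy SHAP matrix for three participants and four features (made-up numbers).
feature_names = ["FPG", "WC", "BMI", "RPR"]
shap_values = [
    [0.30, -0.10, 0.05, 0.01],
    [-0.20, 0.15, -0.05, 0.02],
    [0.25, -0.20, 0.10, -0.01],
]

print(top_k_features(shap_values, feature_names, 2))
```

A streamlined model would then be retrained on only the selected columns and re-evaluated with the same metrics, which is how a 20-feature model can be compared against the full-feature one.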

Conclusion: ML models built with the LightGBM algorithm predict insulin sensitivity accurately in community and primary care settings and may become an efficient, practical tool for insulin sensitivity assessment in these settings.

Keywords: community settings; insulin sensitivity assessment; machine learning; primary care settings; risk factors.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Adult
Algorithms
China / epidemiology
Humans
Insulin
Insulin Resistance*
Machine Learning
Primary Health Care

Substances

Insulin

Grants and funding

The author(s) declare that financial support was received for the research, authorship, and/or publication of this article. This research was supported by grants from the National Natural Science Foundation of China (82170822, 82173517, and 81900734) and the Ministry of Science and Technology of the People’s Republic of China (2016YFC0901200 and 2016YFC0901203).