Background: Using a registrational database, we analyzed variables reported during routine clinical practice to identify risk factors for depression in people with type 2 diabetes mellitus.
Methods: A Patient Health Questionnaire-9 (PHQ-9) score of 15 was selected as the cut-off for clinically meaningful depression. Missing data were handled in one of three ways: imputation with the median value, imputation with the k-nearest neighbors method, or removal of the entire variable. Logistic regression, random forest, and decision tree machine learning models were used to identify the factors most relevant to depression. The accuracy of each algorithm was evaluated with a testing set.
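The workflow described in the Methods can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the study's code or data: the use of scikit-learn, the sample size, the number of predictors, the missingness rate, the 75/25 split, and all hyperparameters are assumptions made for the example, and the cut-off is read here as a PHQ-9 score of 15 or higher. The third strategy (removing a variable entirely) is omitted because the abstract does not state the criterion for removal.

```python
import numpy as np
from sklearn.base import clone
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import KNNImputer, SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT the study data): 400 patients, 5 predictors.
n_patients, n_features = 400, 5
X = rng.normal(size=(n_patients, n_features))

# Hypothetical PHQ-9 totals on the 0-27 scale, loosely tied to predictor 0.
noise = rng.normal(scale=3.0, size=n_patients)
phq9 = np.clip(np.round(10 + 5 * X[:, 0] + noise), 0, 27)

# Assumption: "cut-off of 15" is interpreted as a score of 15 or higher.
y = (phq9 >= 15).astype(int)

# Remove roughly 10% of predictor values at random to mimic missing data.
X_missing = X.copy()
X_missing[rng.random(X.shape) < 0.10] = np.nan

# Hold out a testing set (assumed 25%) for the accuracy estimate.
X_train, X_test, y_train, y_test = train_test_split(
    X_missing, y, test_size=0.25, random_state=0, stratify=y
)

# Two of the three missing-data strategies named in the Methods.
imputers = {
    "median": SimpleImputer(strategy="median"),
    "knn": KNNImputer(n_neighbors=5),
}

# The three model families named in the Methods (hyperparameters assumed).
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "decision_tree": DecisionTreeClassifier(max_depth=4, random_state=0),
}

# Fit each imputer/model pair on the training set, score on the testing set.
# Putting the imputer inside the pipeline keeps test data out of the fit.
results = {}
for imputer_name, imputer in imputers.items():
    for model_name, model in models.items():
        pipeline = make_pipeline(clone(imputer), clone(model))
        pipeline.fit(X_train, y_train)
        predictions = pipeline.predict(X_test)
        results[(imputer_name, model_name)] = accuracy_score(y_test, predictions)

for (imputer_name, model_name), accuracy in results.items():
    print(f"{imputer_name:>6} + {model_name:<20} accuracy = {accuracy:.3f}")
```

Fitting the imputer only on the training portion is a design choice of this sketch; the abstract does not say whether imputation was performed before or after the split.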
Results: When all variables were included in the logistic regression model, the area under the receiver operating characteristic curve was 0.81. In the random forest model, the most important factor was quality of life (QoL). After QoL-related variables were removed, bloating and autoimmune disease became the greatest contributing factors. Model accuracy was 83.1%. In the decision tree model, QoL was also the most decisive factor. After QoL variables were removed, bloating was the first node. Model accuracy was 82.5%.
Conclusion: Quality of life, bloating, and autoimmune disease were the most important factors associated with depression in patients with type 2 diabetes mellitus.
Keywords: Depression; Risk factor; Taiwan; Type 2 diabetes mellitus.
Copyright © 2025, the Chinese Medical Association.