MDD-LLM: Towards accuracy large language models for major depressive disorder diagnosis

Yuyang Sha; Hongxin Pan; Wei Xu; Weiyu Meng; Gang Luo; Xinyu Du; Xiaobing Zhai; Henry H Y Tong; Caijuan Shi; Kefeng Li

doi:10.1016/j.jad.2025.119774

J Affect Disord. 2025 Jun 26:119774. doi: 10.1016/j.jad.2025.119774. Online ahead of print.

Authors

Yuyang Sha¹, Hongxin Pan¹, Wei Xu¹, Weiyu Meng¹, Gang Luo¹, Xinyu Du², Xiaobing Zhai¹, Henry H Y Tong¹, Caijuan Shi³, Kefeng Li⁴

Affiliations

¹ Center for Artificial Intelligence Driven Drug Discovery, Faculty of Applied Sciences, Macao Polytechnic University, 999708, Macao.
² Department of Dentistry, Botou Hospital, Cangzhou 062150, Hebei, China.
³ College of Artificial Intelligence, North China University of Science and Technology, Tangshan 063210, Hebei, China.
⁴ Center for Artificial Intelligence Driven Drug Discovery, Faculty of Applied Sciences, Macao Polytechnic University, 999708, Macao. Electronic address: kefengl@mpu.edu.mo.

PMID: 40581100

Abstract

Background: Major depressive disorder (MDD) affects more than 300 million people worldwide, making it a significant public health issue. However, the uneven distribution of medical resources and the complexity of diagnostic methods have left the disorder inadequately addressed in many countries and regions.

Methods: This paper introduces a high-performance MDD diagnosis tool named MDD-LLM, an AI-driven framework that utilizes fine-tuned large language models (LLMs) and extensive real-world samples to tackle challenges in MDD diagnosis. Specifically, we select 274,348 individual records from the UK Biobank cohort and design three tabular data transformation methods to create a large corpus for training and evaluating the proposed method. To illustrate the advantages of MDD-LLM, we perform comprehensive experiments and provide several comparative analyses against existing model-based solutions across multiple evaluation metrics.
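The abstract does not describe the three tabular data transformation methods, so the following is only a generic sketch of the underlying idea: turning one tabular record into a text prompt that an LLM can be fine-tuned on. The function name `serialize_record`, the two styles (`"list"` and `"sentence"`), the field names, and the question wording are all hypothetical and are not taken from the paper.

```python
from typing import Mapping


def serialize_record(record: Mapping[str, object], style: str = "list") -> str:
    """Turn one tabular row into a text prompt (illustrative, not the paper's method)."""
    # Drop missing values so the prompt does not contain "None" noise.
    items = [(key, value) for key, value in record.items() if value is not None]

    if style == "list":
        # One "- feature: value" line per column.
        body = "\n".join(f"- {key}: {value}" for key, value in items)
    elif style == "sentence":
        # One short natural-language sentence per column.
        body = " ".join(f"The {key} is {value}." for key, value in items)
    else:
        raise ValueError(f"unknown style: {style}")

    # Hypothetical question; a fine-tuning target would pair this with yes/no.
    question = "Does this participant have major depressive disorder? Answer yes or no."
    return body + "\n" + question


# Hypothetical record; these are not actual UK Biobank field names.
example = {"age": 54, "sex": "female", "sleep duration (hours)": 6, "BMI": None}
print(serialize_record(example, style="sentence"))
```

Different serialization styles produce different token sequences for the same record, which is one plausible reason a study would compare several transformation methods.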

Results: Experimental results show that MDD-LLM (70B) achieves an accuracy of 0.8378 and an AUC of 0.8919 (95% CI: 0.8799-0.9040), significantly outperforming existing machine learning and deep learning frameworks for MDD diagnosis. Given the limited exploration of LLMs in MDD diagnosis, we examine several factors that may influence the performance of the proposed method, including tabular data transformation techniques and different fine-tuning strategies. We also analyze the model's interpretability by requiring MDD-LLM to explain its predictions and give the corresponding reasons.
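For readers unfamiliar with how an AUC and its confidence interval can be obtained, the sketch below computes AUC as the probability that a positive case is scored above a negative one, and estimates an interval with a percentile bootstrap. The abstract does not say how the reported 95% CI was computed, so the bootstrap is an assumption, and the function names `auc` and `bootstrap_ci` and the toy data are hypothetical.

```python
import random


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for AUC (an assumed method, not the paper's)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        # Keep only resamples that contain both classes, otherwise AUC is undefined.
        if 0 < sum(ys) < n:
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy data, not results from the paper.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
print(auc(y_true, y_score))
print(bootstrap_ci(y_true, y_score))
```

The pairwise AUC here is quadratic in the number of samples; on a cohort of the size described in the abstract, a rank-based implementation would be the practical choice.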

Conclusion: This paper investigates the application of LLMs and large-scale training samples for diagnosing MDD. The findings indicate that LLM-driven schemes offer significant potential for accuracy, robustness, and interpretability in MDD diagnosis compared to traditional model-based solutions.

Keywords: Artificial intelligence; Large language models; Major depressive disorder; Medical data processing; Supervised fine-tuning.