An Interpretable Artificial Intelligence System for Crohn's Disease Ulcer Identification and Grading on Double-Balloon Enteroscopy Images

United European Gastroenterol J. 2025 Jul 3. doi: 10.1002/ueg2.70068. Online ahead of print.

Abstract

Background: Crohn's disease (CD) is an incurable inflammatory bowel disease that can lead to a variety of complications and requires lifelong treatment. However, misdiagnosis and missed diagnoses are common, and diagnosis and management vary considerably, particularly in primary care facilities and among novice endoscopists. Therefore, we established an interpretable artificial intelligence (AI) system using double-balloon enteroscopy to facilitate Crohn's disease ulcer identification and grading.

Objective: To develop an interpretable AI system for the identification and grading of Crohn's disease ulcer images, offering bounding box localization for visual interpretability and factor-specific grading explanations for each ulcer to improve assessment performance.

Methods: We built a model based on the YOLO-v5 algorithm to localize and grade individual ulcers. The predictions for all ulcers in an image were then analyzed together to produce a clinical interpretation for screening and assessing Crohn's disease ulcer images. To evaluate the system, we prepared training and validation datasets (17,036 double-balloon enteroscopy images, 558 patients) and further collected a test cohort (2,018 images, 70 patients) and an external validation set. A reader study with nine endoscopists was then conducted on the internal test set to evaluate how effectively the explainable system assists readers.
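
The step from per-ulcer predictions to an image-level interpretation can be illustrated with a minimal Python sketch. The abstract does not state the aggregation rule, so everything here is an assumption made for illustration: the `UlcerDetection` structure, the 0.5 confidence threshold, integer grades per factor, and the "take the most severe grade across ulcers" rule are all hypothetical and are not the authors' method.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# The three grading factors named in the abstract.
FACTORS = ("size", "ulcerated_surface", "depth")


@dataclass
class UlcerDetection:
    """One predicted ulcer (hypothetical structure, not from the paper)."""
    box: Tuple[float, float, float, float]  # x1, y1, x2, y2 in pixels
    confidence: float                       # detector confidence in [0, 1]
    grades: Dict[str, int]                  # assumed integer grade per factor


def summarize_image(detections: List[UlcerDetection],
                    conf_threshold: float = 0.5) -> dict:
    """Aggregate per-ulcer predictions into one image-level summary.

    Assumed rule: the image is positive if any detection passes the
    threshold, the image score is the highest confidence among the kept
    detections, and each factor takes the most severe grade among them.
    """
    kept = [d for d in detections if d.confidence >= conf_threshold]
    if not kept:
        return {
            "has_ulcer": False,
            "score": 0.0,
            "grades": {f: 0 for f in FACTORS},
            "boxes": [],
        }
    return {
        "has_ulcer": True,
        "score": max(d.confidence for d in kept),
        "grades": {f: max(d.grades[f] for d in kept) for f in FACTORS},
        "boxes": [d.box for d in kept],  # boxes kept for visual explanation
    }
```

Returning the kept boxes alongside the per-factor grades mirrors the idea of pairing localization with factor-specific explanations, but the actual output format of the published system is not described in the abstract.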

Results: The Crohn's disease ulcer image detection sensitivity and area under the curve (AUC) were 91.8% and 0.949, respectively. The accuracies in assessing the severity of Crohn's disease ulcer images on three factors (size/ulcerated surface/depth) were 94.1%/92.5%/93.0%, respectively. With the system's support of visualized and analyzable predictions, junior endoscopists improved their Crohn's disease ulcer image recognition sensitivity by 12.7% and their accuracy and consistency of severity assessment by 26% and 27.4%, respectively.
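
For readers unfamiliar with the two detection metrics reported above, the following Python sketch shows how sensitivity and AUC are conventionally computed from image-level labels and scores. It is a generic illustration, not the authors' evaluation code; the function names and the toy inputs in the usage note are hypothetical. The AUC is computed with the standard pairwise (Mann-Whitney) formulation, in which ties count as half.

```python
from typing import Sequence


def sensitivity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """True positive rate: TP / (TP + FN), with 1 = ulcer image."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp + fn == 0:
        raise ValueError("no positive samples in y_true")
    return tp / (tp + fn)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive image scores
    higher than a randomly chosen negative image (ties count as 0.5).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

As a usage example with made-up data, labels `[1, 1, 1, 0, 0]` and scores `[0.9, 0.8, 0.3, 0.4, 0.1]` give an AUC of 5/6, and thresholding those scores at 0.5 gives a sensitivity of 2/3. The pairwise loop is quadratic in the number of images, which is fine for a sketch but would normally be replaced by a rank-based computation on a large test set.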

Conclusion: The AI system outperformed general endoscopists and approached expert-level proficiency in Crohn's disease ulcer identification and assessment. Its transparency in decision-making facilitated integration into clinical workflows, enhancing trust and consistency among endoscopists.

Keywords: artificial intelligence; Crohn's disease; double-balloon enteroscopy; inflammatory bowel diseases; interpretable system; severity grading; training; ulcer identification.