Enhancing diagnostic accuracy in rare and common fundus diseases with a knowledge-rich vision-language model

Nat Commun. 2025 Jul 1;16(1):5528. doi: 10.1038/s41467-025-60577-9.

Abstract

Previous foundation models for fundus images were pre-trained on a limited set of disease categories and a limited knowledge base. Here we introduce RetiZero, a vision-language model that incorporates knowledge from over 400 fundus diseases. The model is pre-trained on 341,896 fundus images with accompanying text descriptions gathered from diverse sources across multiple ethnicities and countries. RetiZero demonstrates exceptional performance across various downstream tasks, including zero-shot disease recognition, image-to-image retrieval, clinical diagnosis assistance, few-shot fine-tuning, and cross-domain disease identification. In zero-shot scenarios, it achieves Top-5 accuracies of 0.843 for 15 diseases and 0.756 for 52 diseases, while for image-to-image retrieval it scores 0.950 and 0.886, respectively. Notably, RetiZero's Top-3 zero-shot performance exceeds the average diagnostic accuracy of 19 ophthalmologists from Singapore, China, and the United States. The model particularly enhances clinicians' ability to diagnose rare fundus conditions, highlighting its potential value for integration into clinical settings where diverse eye diseases are encountered.
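The Top-5 and Top-3 figures quoted in the abstract are top-k accuracies: a prediction counts as correct when the true disease is among the k highest-ranked candidates. The sketch below illustrates how such a score is commonly computed in a zero-shot setting, by ranking class text embeddings by cosine similarity to each image embedding. It is an illustration only, not the RetiZero implementation; the function name `zero_shot_topk_accuracy`, the embedding arrays, and the use of cosine similarity are all assumptions made for the example.

```python
import numpy as np


def zero_shot_topk_accuracy(image_emb, text_emb, labels, k=5):
    """Fraction of images whose true class is among the k most similar class prompts.

    image_emb: (n_images, d) array of image embeddings (hypothetical encoder output)
    text_emb:  (n_classes, d) array, one embedding per disease description
    labels:    length-n_images sequence of true class indices
    """
    # L2-normalize so that the dot product equals cosine similarity.
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)

    sim = image_emb @ text_emb.T  # shape: (n_images, n_classes)

    # Indices of the k highest-scoring classes per image.
    # The order inside the top-k set does not matter for accuracy.
    topk = np.argpartition(-sim, k - 1, axis=1)[:, :k]

    hits = (topk == np.asarray(labels)[:, None]).any(axis=1)
    return float(hits.mean())
```

With this definition, raising k can only keep the score the same or increase it, which is why Top-5 results on a larger label set (52 diseases) are not directly comparable to Top-3 or Top-1 results on a smaller one.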

MeSH terms

  • China
  • Fundus Oculi*
  • Humans
  • Language
  • Rare Diseases* / diagnosis
  • Rare Diseases* / diagnostic imaging
  • Retinal Diseases* / diagnosis
  • Retinal Diseases* / diagnostic imaging
  • Singapore