Purpose: Large language models have shown promise in answering questions related to medical conditions. This study evaluated the responses of ChatGPT-4 to patient-centred frequently asked questions (FAQs) relevant to age-related macular degeneration (AMD).
Methods: Ten experts were recruited from across a range of clinical, education and research practices in optometry and ophthalmology. Over 200 patient-centred FAQs from authoritative professional society, hospital and advocacy websites were condensed into 37 questions across four themes: definition, causes and risk factors, symptoms and detection, and treatment and follow-up. The questions were individually input into ChatGPT-4 to generate responses. The responses were graded by the experts individually using a 5-point Likert scale (1 = strongly disagree; 5 = strongly agree) across four domains: coherency, factuality, comprehensiveness, and safety.
Results: Across all themes and domains, median scores were 4 ("agree"). Of the four domains, comprehensiveness had the lowest mean score (3.8 ± 0.8), followed by factuality (3.9 ± 0.8), safety (4.1 ± 0.8) and coherency (4.3 ± 0.7). Examination of the 37 individual questions showed that 5 (14%), 21 (57%), 23 (62%) and 9 (24%) of the questions had average scores below 4 (below "agree") for the coherency, factuality, comprehensiveness and safety domains, respectively. Free-text comments highlighted issues related to superseded or outdated technologies and to techniques not routinely used in clinical practice, such as genetic testing.
Conclusions: ChatGPT-4 responses to FAQs in AMD were generally agreeable in terms of coherency, factuality, comprehensiveness, and safety. However, areas of weakness were identified, precluding a recommendation for its routine use in providing tailored patient counselling in AMD.