First steps toward building natural history of diseases computationally: Lessons learned from the Noonan syndrome use case

Am J Hum Genet. 2025 May 1;112(5):1158-1172. doi: 10.1016/j.ajhg.2025.03.014. Epub 2025 Apr 16.

Abstract

Rare diseases (RDs) are conditions that affect fewer than 1 in 2,000 people. More than 7,000 have been identified; they are primarily genetic in nature, and more than half affect children. Although each RD affects a small population, collectively between 3.5% and 5.9% of the global population, or 262.9 to 446.2 million people, live with an RD. Most RDs lack established treatment protocols, highlighting the need for proper care pathways that address prognosis, diagnosis, and management. Advances in generative AI and large language models (LLMs) offer new opportunities to document the temporal progression of phenotypic features, addressing gaps in current knowledge bases. This study proposes an LLM-based framework to capture the natural history of diseases, focusing specifically on Noonan syndrome. The framework aims to document phenotypic trajectories, validate them against RD knowledge bases, and integrate the resulting insights into care coordination using electronic health record (EHR) data from the Undiagnosed Diseases Program Singapore.
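The prevalence figures above can be sanity-checked with simple arithmetic. The sketch below is illustrative only: the abstract does not state which world population estimate underlies the counts, so the code merely derives the population implied by each percentage and count pair. The variable and function names are hypothetical.

```python
# Back-of-the-envelope check of the prevalence figures quoted in the abstract.
# Assumption: the counts are the stated shares applied to a single world
# population estimate, which the abstract does not give explicitly.

LOW_SHARE, HIGH_SHARE = 0.035, 0.059          # 3.5% and 5.9% of the population
LOW_COUNT_M, HIGH_COUNT_M = 262.9, 446.2      # affected people, in millions


def implied_population_millions(count_millions: float, share: float) -> float:
    """Return the population (in millions) implied by a count and a share."""
    return count_millions / share


implied_low = implied_population_millions(LOW_COUNT_M, LOW_SHARE)
implied_high = implied_population_millions(HIGH_COUNT_M, HIGH_SHARE)

# Both bounds should point to roughly the same base population.
print(f"Implied population (low bound):  {implied_low:,.0f} million")
print(f"Implied population (high bound): {implied_high:,.0f} million")
```

Both bounds imply a base population of roughly 7.5 billion, so the two ends of the range are internally consistent.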

Keywords: Human Phenotype Ontology; Large Language Models; Noonan syndrome; generative AI; natural history of disease; rare diseases.

MeSH terms

  • Child
  • Electronic Health Records
  • Humans
  • Knowledge Bases
  • Noonan Syndrome* / diagnosis
  • Noonan Syndrome* / epidemiology
  • Noonan Syndrome* / genetics
  • Noonan Syndrome* / pathology
  • Phenotype
  • Rare Diseases* / epidemiology
  • Rare Diseases* / genetics
  • Singapore / epidemiology