Classifying Stereotactic Radiosurgery Patients by Primary Diagnosis Using Natural Language Processing of Clinical Notes

JCO Clin Cancer Inform. 2025 Jun:9:e2400268. doi: 10.1200/CCI-24-00268. Epub 2025 Jun 13.

Abstract

Purpose: Accurately identifying, from electronic health records, the primary tumor diagnosis of patients who have undergone stereotactic radiosurgery (SRS) is a critical but challenging task. Traditional methods that rely on International Classification of Diseases (ICD)-9 and ICD-10-CM codes to identify the primary tumor histology often fall short in granularity and completeness, particularly for patients with metastatic cancer.

Methods: In this study, we propose an approach that uses natural language processing (NLP) algorithms to extract primary tumor histology more accurately from patients' electronic health records.

Results: Through manual annotation of patient data and subsequent algorithm training, we improved the accuracy and efficiency of classifying primary tumor type and of finding histology subtypes not available in ICD-10-CM.

Conclusion: Our findings underscore the value of NLP in refining research processes, identifying patient cohorts, and improving efficiency, with the goal of potentially improving patient outcomes in SRS treatment.

MeSH terms

  • Algorithms
  • Electronic Health Records*
  • Humans
  • Natural Language Processing*
  • Neoplasms* / diagnosis
  • Neoplasms* / surgery
  • Radiosurgery* / methods