Prediction of Lymph Node Metastasis in Non-Small Cell Lung Carcinoma Using Primary Tumor Somatic Mutation Data

JCO Clin Cancer Inform. 2025 Jun:9:e2400303. doi: 10.1200/CCI-24-00303. Epub 2025 May 30.

Abstract

Purpose: Lymph node metastasis (LNM) significantly affects prognosis and treatment strategies in non-small cell lung cancer (NSCLC). Current diagnostic methods, including imaging and histopathology, have limited sensitivity and specificity. This study aims to develop and evaluate machine learning (ML) models that predict LNM in NSCLC using single-nucleotide polymorphism (SNP) data from The Cancer Genome Atlas.

Methods: A cohort of 542 patients with NSCLC and comprehensive SNP data was analyzed. After preprocessing, feature selection was performed using chi-square tests to identify SNPs significantly associated with LNM. Twelve ML models, including Logistic Regression, Naive Bayes, and Support Vector Machines, were trained and evaluated using bootstrapped data sets. Model performance was assessed using accuracy, area under the receiver operating characteristic curve (AUC), and F1 score. SHapley Additive exPlanations (SHAP) values were used for feature interpretability, and survival analysis was conducted to assess clinical outcomes.
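
As an illustration of two steps named above, the following minimal sketch shows a chi-square test of association between one binary mutation feature and LNM status, and a bootstrapped median AUC for a set of model scores. It is not the authors' pipeline: the function names, the 2x2 coding (mutation present or absent versus LNM positive or negative), the absence of a continuity correction, and the choice to resample only the evaluation set (rather than retrain on each bootstrap sample) are all assumptions made for the example.

```python
import math
import random
from statistics import median


def chi_square_2x2(feature, label):
    """Pearson chi-square test (1 degree of freedom, no continuity
    correction) between two binary lists coded as 0/1."""
    # Observed counts: rows = mutation absent/present, cols = LNM negative/positive.
    obs = [[0, 0], [0, 0]]
    for f, y in zip(feature, label):
        obs[f][y] += 1
    n = len(feature)
    row = [sum(obs[0]), sum(obs[1])]
    col = [obs[0][0] + obs[1][0], obs[0][1] + obs[1][1]]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            if expected == 0:
                return 0.0, 1.0  # degenerate table: no evidence of association
            stat += (obs[i][j] - expected) ** 2 / expected
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def auc(scores, labels):
    """AUC as the probability that a positive case outscores a negative one
    (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_median_auc(scores, labels, n_boot=200, seed=0):
    """Median AUC over bootstrap resamples of the scored cases.
    Assumption: only the evaluation set is resampled; no model is retrained."""
    rng = random.Random(seed)
    n = len(scores)
    aucs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples containing a single class
            aucs.append(auc([scores[i] for i in idx], ys))
    return median(aucs)
```

In a screening setting, `chi_square_2x2` would be applied to each candidate feature in turn and features retained below a chosen P value threshold; the threshold and any multiple-testing adjustment are not stated in the abstract.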

Results: Naive Bayes and Logistic Regression models achieved the highest predictive performance, with median AUCs of 0.93 and 0.91, respectively. Key SNPs, including mutations in TANC2, KCNT2, and CENPF, were consistently identified as predictive features. Survival analysis demonstrated significant differences in outcomes on the basis of model-predicted LNM status (log-rank P = .0268). Feature selection improved model accuracy and robustness, highlighting the biological relevance of selected SNPs.
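
The log-rank comparison reported above can be illustrated with a minimal two-group sketch. This is a generic textbook form of the test written for this example, not the authors' code; the function name, argument layout, and the 0/1 event coding are assumptions, and the sketch makes no attempt to reproduce the reported P value.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test.
    events are 1 (event observed) or 0 (censored).
    Returns (chi-square statistic, P value) on 1 degree of freedom."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e == 1})

    obs_minus_exp = 0.0  # observed minus expected events in group A
    variance = 0.0
    for t in event_times:
        # Numbers still at risk just before time t in each group.
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)
        n_b = sum(1 for u, _, g in data if u >= t and g == 1)
        # Events occurring exactly at time t.
        d_a = sum(1 for u, e, g in data if u == t and e == 1 and g == 0)
        d_b = sum(1 for u, e, g in data if u == t and e == 1 and g == 1)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of the group A event count.
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0
    stat = obs_minus_exp ** 2 / variance
    return stat, math.erfc(math.sqrt(stat / 2))
```

Here the two groups would correspond to patients whose model-predicted LNM status is positive versus negative, with follow-up time and event indicator taken from the clinical records.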

Conclusion: ML models leveraging primary tumor SNP data can enhance LNM prediction in NSCLC, outperforming traditional diagnostic methods. These findings underscore the potential of integrating genomics and ML to develop noninvasive biomarkers, enabling precise risk stratification and personalized treatment strategies in oncology.

MeSH terms

  • Aged
  • Bayes Theorem
  • Biomarkers, Tumor* / genetics
  • Carcinoma, Non-Small-Cell Lung* / genetics
  • Carcinoma, Non-Small-Cell Lung* / mortality
  • Carcinoma, Non-Small-Cell Lung* / pathology
  • Female
  • Humans
  • Lung Neoplasms* / genetics
  • Lung Neoplasms* / mortality
  • Lung Neoplasms* / pathology
  • Lymphatic Metastasis
  • Machine Learning
  • Male
  • Middle Aged
  • Mutation*
  • Polymorphism, Single Nucleotide
  • Prognosis
  • ROC Curve

Substances

  • Biomarkers, Tumor