cfMethylPre: deep transfer learning enhances cancer detection based on circulating cell-free DNA methylation profiling

Brief Bioinform. 2025 May 1;26(3):bbaf303. doi: 10.1093/bib/bbaf303.

Abstract

Cancer remains a significant global health burden, underscoring the need for innovative diagnostic tools to enable early detection and improve patient outcomes. While circulating cell-free DNA (cfDNA) methylation has emerged as a promising biomarker for noninvasive cancer diagnostics, existing methods are often limited by the high dimensionality of methylation data, small sample sizes, and a lack of biological interpretability. To address these challenges, we propose cfMethylPre, a novel deep transfer learning framework tailored for cancer detection using cfDNA methylation data. cfMethylPre takes embeddings of DNA sequence information from a pretrained large language model and integrates them with methylation profiles to enhance feature representation. The deep transfer learning process involves pretraining on bulk DNA methylation data encompassing 2801 samples across 82 cancer types and normal controls, followed by fine-tuning with cfDNA methylation data. This approach ensures robust adaptation to cfDNA's unique characteristics while improving predictive accuracy. Our model achieved superior predictive accuracy compared with state-of-the-art methods, with a weighted Matthews Correlation Coefficient of 0.926 and a weighted F1-score of 0.942. Through model interpretation and biological experimental validation, we identified three novel breast cancer genes (PCDHA10, PRICKLE2, and PRTG) and demonstrated their inhibitory effects on cell proliferation and migration in breast cancer cell lines. These findings establish cfMethylPre as a powerful and interpretable tool for cancer diagnostics and biological discovery, paving the way for its application in precision oncology.
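
The abstract reports a weighted Matthews Correlation Coefficient and a weighted F1-score but does not say how the weighting is done. One common reading is a per-class, one-vs-rest score averaged by class support (the number of true samples in each class). The sketch below assumes that reading; the function names and the toy labels are illustrative and are not taken from the paper.

```python
from collections import Counter
from math import sqrt


def one_vs_rest_counts(y_true, y_pred, label):
    """Confusion counts (tp, fp, fn, tn) treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    return tp, fp, fn, tn


def f1(tp, fp, fn, tn):
    """Binary F1; defined as 0.0 when the class is never true or predicted."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, fp, fn, tn):
    """Binary Matthews correlation; defined as 0.0 when a marginal is empty."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def support_weighted(metric, y_true, y_pred):
    """Average a one-vs-rest metric over classes, weighted by true-class frequency.

    Assumption: this is one plausible meaning of "weighted"; the source
    does not specify its definition.
    """
    support = Counter(y_true)
    total = len(y_true)
    return sum(
        (n / total) * metric(*one_vs_rest_counts(y_true, y_pred, label))
        for label, n in support.items()
    )


# Toy labels for illustration only (not data from the study).
toy_true = ["normal", "normal", "breast", "breast"]
toy_pred = ["normal", "breast", "breast", "breast"]
weighted_f1 = support_weighted(f1, toy_true, toy_pred)
weighted_mcc = support_weighted(mcc, toy_true, toy_pred)
```

Weighting by support keeps rare cancer types from dominating the summary, at the cost of hiding poor performance on those same rare classes; reporting the per-class values alongside the weighted average avoids that.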

Keywords: cell-free DNA methylation; deep learning; large language model; transfer learning.

MeSH terms

  • Biomarkers, Tumor* / blood
  • Biomarkers, Tumor* / genetics
  • Cell-Free Nucleic Acids* / blood
  • Cell-Free Nucleic Acids* / genetics
  • DNA Methylation*
  • Deep Learning*
  • Humans
  • Neoplasms* / blood
  • Neoplasms* / diagnosis
  • Neoplasms* / genetics

Substances

  • Cell-Free Nucleic Acids
  • Biomarkers, Tumor