Transformer-based deep learning for accurate detection of multiple base modifications using single molecule real-time sequencing

Commun Biol. 2025 Apr 14;8(1):606. doi: 10.1038/s42003-025-08009-8.

Abstract

We previously reported a convolutional neural network (CNN)-based approach, called the holistic kinetic model (HK model 1), for detecting 5-methylcytosine (5mC) by single molecule real-time sequencing (Pacific Biosciences). In this study, we construct a hybrid model combining CNN and transformer layers, named HK model 2, which improves the area under the receiver operating characteristic curve (AUC) for 5mC detection from 0.91 (HK model 1) to 0.99. We further demonstrate that HK model 2 can detect other types of base modifications, such as 5-hydroxymethylcytosine (5hmC) and N6-methyladenine (6mA). Using HK model 2 to analyze 5mC patterns of cell-free DNA (cfDNA) molecules, we demonstrate enhanced detection of patients with hepatocellular carcinoma, with an AUC of 0.97. Moreover, HK model 2-based detection of 6mA enables the detection of jagged ends of cfDNA and the delineation of cellular chromatin structures. HK model 2 is thus a versatile tool that expands the applications of single molecule real-time sequencing in liquid biopsies.
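The AUC values quoted above summarize how well a classifier's scores separate positive from negative examples (for instance, methylated versus unmethylated sites). As a minimal illustrative sketch, and not the authors' evaluation code, the AUC can be computed as the fraction of positive/negative pairs in which the positive example receives the higher score, with ties counted as half. The function name `roc_auc` and the toy labels and scores below are assumptions made for illustration only.

```python
def roc_auc(labels, scores):
    """Rank-based AUC: the probability that a randomly chosen positive
    example scores higher than a randomly chosen negative one.
    Tied scores contribute 0.5. Labels are 1 (positive) or 0 (negative)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data: 1 = modified site, 0 = unmodified site.
    toy_labels = [1, 0, 1, 0]
    toy_scores = [0.9, 0.8, 0.3, 0.1]
    print(roc_auc(toy_labels, toy_scores))
```

The pairwise loop is quadratic in the number of examples, which is acceptable for a small illustration; for large datasets a sort-based computation or an established library routine would be the more practical choice.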

MeSH terms

  • 5-Methylcytosine / analogs & derivatives
  • 5-Methylcytosine / analysis
  • Adenine / analogs & derivatives
  • Carcinoma, Hepatocellular* / diagnosis
  • Carcinoma, Hepatocellular* / genetics
  • Cell-Free Nucleic Acids
  • DNA Methylation*
  • Deep Learning*
  • Humans
  • Liver Neoplasms* / diagnosis
  • Liver Neoplasms* / genetics
  • Neural Networks, Computer
  • Sequence Analysis, DNA* / methods

Substances

  • 5-Methylcytosine
  • 5-hydroxymethylcytosine
  • Cell-Free Nucleic Acids
  • Adenine