Accelerating the Discovery of Anticancer Peptides through Deep Forest Architecture with Deep Graphical Representation

Int J Mol Sci. 2023 Feb 21;24(5):4328. doi: 10.3390/ijms24054328.

Abstract

Cancer is one of the leading diseases threatening human life and health worldwide. Peptide-based therapies have attracted much attention in recent years. Therefore, the precise prediction of anticancer peptides (ACPs) is crucial for discovering and designing novel cancer treatments. In this study, we proposed a novel machine learning framework (GRDF) that incorporates deep graphical representation and deep forest architecture for identifying ACPs. Specifically, GRDF extracts graphical features based on the physicochemical properties of peptides and integrates their evolutionary information along with binary profiles for constructing models. Moreover, we employ the deep forest algorithm, which adopts a layer-by-layer cascade architecture similar to deep neural networks, enabling excellent performance on small datasets but without complicated tuning of hyperparameters. The experiment shows GRDF exhibits state-of-the-art performance on two elaborate datasets (Set 1 and Set 2), achieving 77.12% accuracy and 77.54% F1-score on Set 1, as well as 94.10% accuracy and 94.15% F1-score on Set 2, exceeding existing ACP prediction methods. Our models exhibit greater robustness than the baseline algorithms commonly used for other sequence analysis tasks. In addition, GRDF is well-interpretable, enabling researchers to better understand the features of peptide sequences. The promising results demonstrate that GRDF is remarkably effective in identifying ACPs. Therefore, the framework presented in this study could assist researchers in facilitating the discovery of anticancer peptides and contribute to developing novel cancer treatments.

Keywords: anticancer peptide; deep forest; drug discovery; feature extraction; sequence analysis.

MeSH terms

  • Algorithms
  • Amino Acid Sequence
  • Humans
  • Neoplasms*
  • Neural Networks, Computer
  • Peptides* / chemistry

Substances

  • Peptides

Grants and funding

This work was supported by the Guangdong Province Basic and Applied Basic Research Fund (2021A1515012447), National Natural Science Foundation of China (32070659), Shenzhen-Hong Kong Cooperation Zone for Technology and Innovation (HZQB-KCZYB-2020056), and the Kobilka Institute of Innovative Drug Discovery, The Chinese University of Hong Kong, Shenzhen, China. This work was also supported by the College of Biological Science and Technology, National Yang Ming Chiao Tung University, Taiwan.