Cytotoxic T lymphocytes (CTLs) play a key role in the defense against cancer and infectious diseases. CTLs are activated mainly through their T cell receptors (TCRs), which recognize peptides bound to class I major histocompatibility complex molecules (pMHC); activated CTLs then kill virus-infected cells and tumor cells. Identifying antigen-specific CTLs and their TCRs is therefore a promising basis for T-cell-based interventions. Experimental identification and validation of antigen-specific CTLs is currently widely used but extremely resource-intensive. Machine learning methods for TCR-pMHC binding prediction are attracting growing interest, particularly with advances in single-cell technologies. This review clarifies the key biological processes involved in TCR-pMHC binding. After comprehensively comparing the advantages and disadvantages of several state-of-the-art machine learning algorithms for TCR-pMHC prediction, we highlight discrepancies among these methods under specific disease conditions. Finally, we propose a roadmap for TCR-pMHC prediction. This roadmap could enable more accurate TCR-pMHC binding prediction through improvements in data quality, encoding and embedding methods, model training, and application context. This review may facilitate the development of T-cell-based vaccines and therapies.
Keywords: TCR-pMHC; data quality; deep learning; encoding; prediction.
© The Author(s) 2025. Published by Oxford University Press.