Biologically relevant integration of transcriptomics profiles from cancer cell lines, patient-derived xenografts, and clinical tumors using deep learning

Sci Adv. 2025 Jan 17;11(3):eadn5596. doi: 10.1126/sciadv.adn5596. Epub 2025 Jan 17.

Abstract

Cell lines and patient-derived xenografts are essential to cancer research; however, results derived from such models often lack clinical translatability because the models do not fully recapitulate the complexity of cancer biology. Identifying preclinical models that sufficiently resemble the biological characteristics of clinical tumors across different cancers is therefore critically important. Here, we developed MOBER (Multi-Origin Batch Effect Remover), a method that simultaneously extracts biologically meaningful embeddings and removes confounder information. Applying MOBER to 932 cancer cell lines, 434 patient-derived tumor xenografts, and 11,159 clinical tumors, we identified the preclinical models with the greatest transcriptional fidelity to clinical tumors, as well as models that are transcriptionally unrepresentative of their respective clinical tumors. MOBER can transform the transcriptional profiles of preclinical models to resemble those of clinical tumors and can therefore be used to improve the clinical translation of insights gained from preclinical models. MOBER is a versatile batch effect removal method that is applicable to diverse transcriptomic datasets and can integrate multiple datasets simultaneously.
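The abstract does not describe MOBER's architecture. To make the idea of "transforming" a preclinical profile so that it resembles clinical tumors more concrete, here is a minimal sketch that uses a much simpler linear stand-in. It removes the per-gene mean and scale of the source dataset (for example, cell lines) and then applies those of the target dataset (for example, clinical tumors). This is not MOBER, which is a deep learning method. The functions `source_stats` and `project_to_source` are hypothetical, and the sketch assumes that all datasets are already normalized and gene-matched, with every sample listing the same genes in the same order.

```python
from statistics import mean, pstdev


def source_stats(profiles):
    """Per-gene mean and population standard deviation for one data source.

    profiles: list of samples, each a list of expression values (one per gene,
    same gene order in every sample). Hypothetical helper for illustration.
    """
    genes = list(zip(*profiles))  # transpose: one tuple of values per gene
    means = [mean(g) for g in genes]
    # Replace zero spread with 1.0 so the division below never fails
    sds = [pstdev(g) or 1.0 for g in genes]
    return means, sds


def project_to_source(profiles, source_from, source_to):
    """Linear location/scale stand-in for 'projecting' samples between sources.

    Standardizes each gene using source_from's statistics, then rescales it
    with source_to's statistics. A simple baseline, NOT the MOBER model.
    """
    m_from, s_from = source_stats(source_from)
    m_to, s_to = source_stats(source_to)
    return [
        [(x - mf) / sf * st + mt
         for x, mf, sf, mt, st in zip(sample, m_from, s_from, m_to, s_to)]
        for sample in profiles
    ]
```

For example, with two toy "cell line" samples `[[1.0, 10.0], [3.0, 14.0]]` and two toy "tumor" samples `[[5.0, 0.0], [9.0, 2.0]]`, projecting the cell lines onto the tumor source gives `[[5.0, 0.0], [9.0, 2.0]]`. A per-gene shift like this cannot capture nonlinear or sample-specific differences between sources. That limitation is one reason to use a learned model that separates biological signal from source-specific confounders.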

MeSH terms

  • Animals
  • Cell Line, Tumor
  • Deep Learning*
  • Gene Expression Profiling*
  • Gene Expression Regulation, Neoplastic
  • Heterografts
  • Humans
  • Mice
  • Neoplasms* / genetics
  • Neoplasms* / pathology
  • Transcriptome*
  • Xenograft Model Antitumor Assays