Background: Single-cell RNA-sequencing (scRNA-seq) has emerged as a powerful tool for cancer research, enabling in-depth characterization of tumor heterogeneity at the single-cell level. Recently, several scRNA-seq copy number variation (scCNV) inference methods have been developed, expanding the application of scRNA-seq to study genetic heterogeneity in cancer using transcriptomic data. However, the fidelity of these methods has not been investigated systematically.
Methods: We benchmarked five commonly used scCNV inference methods: HoneyBADGER, CopyKAT, CaSpER, inferCNV, and sciCNV. We evaluated their performance across four different scRNA-seq platforms using data from our previous multicenter study. We evaluated scCNV performance further using scRNA-seq datasets derived from mixed samples consisting of five human lung adenocarcinoma cell lines and also sequenced tissues from a small cell lung cancer patient and used the data to validate our findings with a clinical scRNA-seq dataset.
Results: We found that the sensitivity and specificity of the five scCNV inference methods varied, depending on the selection of reference data, sequencing depth, and read length. CopyKAT and CaSpER outperformed other methods overall, while inferCNV, sciCNV, and CopyKAT performed better than other methods in subclone identification. We found that batch effects significantly affected the performance of subclone identification in mixed datasets in most methods we tested.
Conclusion: Our benchmarking study revealed the strengths and weaknesses of each of these scCNV inference methods and provided guidance for selecting the optimal CNV inference method using scRNA-seq data.
Keywords: RNA-seq; benchmarking; copy number variation (CNV) inference; scRNA-seq; scRNA-seq CNV methods.
© The Author(s) 2025. Published by Oxford University Press on behalf of the West China School of Medicine & West China Hospital of Sichuan University.