A benchmarking study of copy number variation inference methods using single-cell RNA-sequencing data

Xin Chen; Li Tai Fang; Zhong Chen; Wanqiu Chen; Hongjin Wu; Bin Zhu; Malcolm Moos Jr; Andrew Farmer; Xiaowen Zhang; Wei Xiong; Shusheng Gong; Wendell Jones; Christopher E Mason; Shixiu Wu; Chunlin Xiao; Charles Wang

doi:10.1093/pcmedi/pbaf011

Precis Clin Med. 2025 Jun 4;8(2):pbaf011. doi: 10.1093/pcmedi/pbaf011. eCollection 2025 Jun.

Authors

Xin Chen¹,²,³, Li Tai Fang⁴, Zhong Chen¹,², Wanqiu Chen¹,², Hongjin Wu¹, Bin Zhu⁵, Malcolm Moos Jr⁶, Andrew Farmer⁷, Xiaowen Zhang⁸, Wei Xiong⁹, Shusheng Gong⁹, Wendell Jones¹⁰, Christopher E Mason¹¹, Shixiu Wu¹², Chunlin Xiao¹³, Charles Wang¹,²

Affiliations

¹ Center for Genomics, School of Medicine, Loma Linda University, Loma Linda, CA 92350, USA.
² Department of Basic Sciences, School of Medicine, Loma Linda University, Loma Linda, CA 92350, USA.
³ Discovery and Exploratory Statistics, AbbVie Bioresearch Center, Worcester, MA 01605, USA.
⁴ Bioinformatics Research Engineering, Freenome Holdings Inc., South San Francisco, CA 94080, USA.
⁵ Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, 9609 Medical Center Drive, Bethesda, Maryland 20892, USA.
⁶ Center for Biologics Evaluation and Research, Office of Cellular Therapies and Human Tissues, U.S. Food and Drug Administration, Silver Spring, Maryland 20993, USA.
⁷ Takara Bio USA, Inc., San Jose, CA 95131, USA.
⁸ Department of Otolaryngology, the First Affiliated Hospital of Guangzhou Medical University, Guangzhou 510182, China.
⁹ Department of Otolaryngology, Beijing Friendship Hospital, Capital Medical University, Beijing 100050, China.
¹⁰ IQVIA Laboratories Genomics, Durham, NC 27703, USA.
¹¹ Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY 10065, USA.
¹² Quzhou Hospital, Wenzhou Medical University, Quzhou 324000, China.
¹³ National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.

Abstract

Background: Single-cell RNA-sequencing (scRNA-seq) has emerged as a powerful tool for cancer research, enabling in-depth characterization of tumor heterogeneity at the single-cell level. Recently, several scRNA-seq copy number variation (scCNV) inference methods have been developed, extending scRNA-seq to the study of genetic heterogeneity in cancer using transcriptomic data. However, the fidelity of these methods has not been investigated systematically.

Methods: We benchmarked five commonly used scCNV inference methods: HoneyBADGER, CopyKAT, CaSpER, inferCNV, and sciCNV. We evaluated their performance across four different scRNA-seq platforms using data from our previous multicenter study. We further evaluated scCNV performance using scRNA-seq datasets derived from mixed samples of five human lung adenocarcinoma cell lines. We also sequenced tissues from a small cell lung cancer patient and used these data to validate our findings in a clinical scRNA-seq dataset.

Results: We found that the sensitivity and specificity of the five scCNV inference methods varied depending on the choice of reference data, sequencing depth, and read length. CopyKAT and CaSpER outperformed the other methods overall, while inferCNV, sciCNV, and CopyKAT performed best in subclone identification. Batch effects significantly affected subclone identification in mixed datasets for most of the methods tested.
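The abstract reports sensitivity and specificity without stating how they are computed. As a minimal sketch only, assume that both the inferred CNV profile and a ground-truth profile have been discretized into per-bin labels ("gain", "loss", "neutral"); sensitivity and specificity for one CNV state then follow from the usual confusion-matrix counts. The function name, the label strings, and the per-bin framing are hypothetical illustrations, not the study's evaluation code.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    truth: Sequence[str], called: Sequence[str], state: str
) -> Tuple[float, float]:
    """Per-bin sensitivity and specificity for one CNV state (e.g. "gain").

    Hypothetical helper: a bin is a positive when its ground-truth label
    equals `state`; every other label is treated as a negative.
    """
    if len(truth) != len(called):
        raise ValueError("truth and called must cover the same bins")
    tp = fn = tn = fp = 0
    for t, c in zip(truth, called):
        actual = t == state
        predicted = c == state
        if actual and predicted:
            tp += 1  # true event detected
        elif actual:
            fn += 1  # true event missed
        elif predicted:
            fp += 1  # spurious call
        else:
            tn += 1  # correctly left uncalled
    # Return NaN when a denominator is empty rather than dividing by zero.
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


# Toy labels for six genomic bins (illustrative only, not study data).
truth = ["gain", "gain", "neutral", "loss", "neutral", "gain"]
called = ["gain", "neutral", "neutral", "loss", "gain", "gain"]
print(sensitivity_specificity(truth, called, "gain"))  # (0.6666666666666666, 0.6666666666666666)
```

Scoring each CNV state separately keeps gains and losses from masking each other; a real benchmark would also have to settle how bins are defined and how ground truth is obtained, which this sketch leaves open.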

Conclusion: Our benchmarking study revealed the strengths and weaknesses of each of these scCNV inference methods and provided guidance for selecting the optimal CNV inference method using scRNA-seq data.

Keywords: RNA-seq; benchmarking; copy number variation (CNV) inference; scRNA-seq; scRNA-seq CNV methods.