Cell-type identification is a crucial step in single-cell RNA-seq (scRNA-seq) data analysis, for which supervised methods are preferred due to their accuracy and efficiency. The performance of these methods depends heavily on the quality of the reference data, yet there is no method for selecting and constructing reference data. We develop Target-Oriented Reference Construction (TORC), a widely applicable strategy for constructing reference data for a given target dataset in supervised scRNA-seq cell-type identification. TORC alleviates the differences in data distribution and cell-type composition between reference and target. Extensive benchmarks on simulated and real data demonstrate that TORC consistently improves cell-type identification.
Keywords: Cell-type identification; Reference construction; scRNA-seq; Supervised learning.
© 2025. The Author(s).