dbscATAC: a resource of single-cell super-enhancers/enhancers and gene markers derived from scATAC-seq data

Bioinformatics. 2025 Jun 23:btaf364. doi: 10.1093/bioinformatics/btaf364. Online ahead of print.

Abstract

Motivation: scATAC-seq enables high-resolution mapping of cis-regulatory elements. It has been widely applied to uncover cell-type-specific regulatory networks and to complement scRNA-seq analysis in numerous studies. However, many scATAC-seq datasets remain underutilized because super-enhancers, typical enhancers, and gene markers have been only sparsely explored in them. A comprehensive resource enabling cell-type-specific annotation of cis-regulatory elements and their dynamic enhancer-gene linkages remains an urgent unmet need for scATAC-seq.

Results: We present dbscATAC, a specialized single-cell database for annotating super-enhancers, gene markers, and enhancer-gene interactions derived from scATAC-seq data. Using improved machine learning algorithms, we identified 213,835 super-enhancers across 520 tissue/cell types from 3 species, as well as 347,484 gene markers, 13,470,526 enhancers, and 10,402,346 enhancer-gene interactions derived from 1,668,076 single cells spanning 1,028 tissue/cell types in 13 species. An easy-to-use online platform with multiple analytic modules and hierarchical query options was developed for searching, browsing, and visualizing single-cell super-enhancers, enhancers, and gene markers. dbscATAC provides a comprehensive resource to facilitate the exploration of enhancer landscapes, gene regulation, and cell-type-specific characteristics in single-cell epigenomics.

Availability and implementation: The database with all the super-enhancer/enhancer annotation data is available at http://singlecelldb.com/dbscATAC/index.php. The source code of dbscATAC for predicting super-enhancers (SEs), enhancers, and gene markers is available at https://github.com/EvansGao/dbscATAC. The source code, tissue/cell type descriptions, and data summary can be downloaded at DOI: 10.6084/m9.figshare.28706414.

Supplementary information: Supplementary data are available at Bioinformatics online.