The regulatory sequences of vertebrate genomes remain incompletely understood. To address this gap, we developed an ultra-throughput, ultra-sensitive single-nucleus assay for transposase-accessible chromatin using sequencing (UUATAC-seq), a protocol that enables the construction of chromatin accessibility landscapes for one species in a 1-day experiment. Using UUATAC-seq, we mapped candidate cis-regulatory elements (cCREs) across five representative vertebrate species. Our analysis revealed that genome size differences across species influence the number but not the size of cCREs. We introduced Nvwa cis-regulatory element (NvwaCE), a mega-task deep-learning model designed to interpret cis-regulatory grammar and predict cCRE landscapes directly from genomic sequences with high precision. NvwaCE demonstrated that regulatory grammar is more conserved than nucleotide sequences and that this grammar organizes cCREs into distinct functional modules. Moreover, NvwaCE accurately predicted the effects of synthetic mutations on lineage-specific cCRE function, aligning with causal quantitative trait loci (QTLs) and genome editing results. Together, our study provides a valuable resource for decoding the vertebrate regulatory language.
Keywords: NvwaCE; UUATAC-seq; cCRE; chromatin accessibility landscape; deep learning; genome editing; genomics; mutation effect; regulatory sequence; snATAC-seq.
Copyright © 2025 The Authors. Published by Elsevier Inc. All rights reserved.