The domain Archaea has attracted significant interest for its ecological and biotechnological potential and for its role in helping us understand the evolutionary history of Eukaryotes. Compared with the bacterial domain, the number of adequately described members of Archaea is relatively low, with fewer than 1000 species described. It is not clear whether this is solely due to the difficulty of cultivating its members or whether the domain is indeed characterized by evolutionary constraints that keep the number of species relatively low. Based on molecular evidence that bypasses the difficulties of formal cultivation and characterization, several novel clades have been proposed, enabling insights into their metabolism and physiology. Given the extent of global sampling and sequencing efforts, it is now possible and meaningful to estimate the magnitude of global archaeal diversity based on molecular evidence. To do so, we extracted all sequences classified as Archaea from 500 thousand amplicon samples available in public repositories. After processing these sequences through our highly conservative pipeline, we obtained a comprehensive resource, which we named the 'Global Archaea Diversity' (GAD). It encompasses nearly 3 million molecular species clusters at 97% similarity, organized into over 500 thousand genera and nearly 100 thousand families. Saline environments contributed the most to the novel taxa of this previously unseen diversity. The majority of these 16S rRNA gene sequence fragments were verified by matches in metagenomic datasets from IMG/M. These findings reveal a vast and previously overlooked diversity within the Archaea, offering insights into their ecological roles and evolutionary importance while establishing a foundation for the future study and characterization of this intriguing domain of life.
Keywords: Archaea; Asgardarchaeota; IMNGS; genetic diversity; microbial dark matter.