Long-read transcriptomics of a diverse human cohort reveals widespread ancestry bias in gene annotation

Pau Clavell-Revelles; Fairlie Reese; Sílvia Carbonell-Sala; Fabien Degalez; Winona Oliveros; Carme Arnan; Roderic Guigó; Marta Melé

doi:10.1101/2025.03.14.643250

bioRxiv [Preprint]. 2025 Mar 17:2025.03.14.643250. doi: 10.1101/2025.03.14.643250.

Authors

Pau Clavell-Revelles^{1,2,3}, Fairlie Reese¹, Sílvia Carbonell-Sala², Fabien Degalez², Winona Oliveros^{1,3}, Carme Arnan², Roderic Guigó^{2,4}, Marta Melé¹

Affiliations

¹ Life Sciences Department, Barcelona Supercomputing Center (BSC), Barcelona, Catalonia.
² Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Catalonia.
³ Universitat de Barcelona (UB), Barcelona, Catalonia.
⁴ Universitat Pompeu Fabra (UPF), Barcelona, Catalonia.

Abstract

Accurate gene annotations are fundamental for interpreting genetic variation, cellular function, and disease mechanisms. However, current human gene annotations are largely derived from transcriptomic data of individuals with European ancestry, introducing potential biases that remain uncharacterized. Here, we generate over 800 million full-length reads with long-read RNA-seq in 43 lymphoblastoid cell line samples from eight genetically diverse human populations and build a cross-ancestry gene annotation. We show that transcripts from non-European samples are underrepresented in reference gene annotations, leading to systematic biases in allele-specific transcript usage analyses. Furthermore, we show that personal genome assemblies enhance transcript discovery compared to the generic GRCh38 reference assembly, even though genomic regions unique to each individual are heavily depleted of genes. These findings underscore the urgent need for a more inclusive gene annotation framework that accurately represents global transcriptome diversity.

Publication types

Preprint
