Genome-wide profiling of highly similar paralogous genes using HiFi sequencing

Nat Commun. 2025 Mar 8;16(1):2340. doi: 10.1038/s41467-025-57505-2.

Abstract

Variant calling is hindered in segmental duplications by sequence homology. We developed Paraphase, a HiFi-based informatics method that resolves highly similar genes by phasing all haplotypes of paralogous genes together. We applied Paraphase to 160 long (>10 kb) segmental duplication regions across the human genome with high (>99%) sequence similarity, encoding 316 genes. Analysis across five ancestral populations revealed highly variable copy numbers of these regions. We identified 23 paralog groups with exceptionally low within-group diversity, where extensive gene conversion and unequal crossing over contribute to highly similar gene copies. Furthermore, our analysis of 36 trios identified 7 de novo SNVs and 4 de novo gene conversion events, 2 of which are non-allelic. Finally, we summarized extensive genetic diversity in 9 medically relevant genes previously considered challenging to genotype. Paraphase provides a framework for resolving gene paralogs, enabling accurate testing in medically relevant genes and population-wide studies of previously inaccessible genes.
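
The idea of phasing all haplotypes of a paralog group together can be illustrated with a toy sketch. This is not Paraphase's actual implementation. It assumes that reads from every gene copy have been pooled, that each read spans all chosen variant sites (plausible for long HiFi reads), and that each read has already been reduced to its alleles at those sites. The function name `phase_pooled_reads`, the `min_support` threshold, and the read data are hypothetical.

```python
from collections import Counter


def phase_pooled_reads(reads, min_support=2):
    """Group pooled reads by their allele pattern at the variant sites.

    reads: list of strings, one character per variant site (toy input).
    Returns {haplotype: supporting read count} for every pattern seen
    in at least `min_support` reads; rarer patterns are treated as noise.
    """
    counts = Counter(reads)
    return {hap: n for hap, n in counts.items() if n >= min_support}


# Toy pooled reads from a two-gene paralog group in a diploid sample.
reads = (
    ["ACGT"] * 3    # haplotype 1
    + ["ACGA"] * 3  # haplotype 2
    + ["TCGA"] * 2  # haplotype 3
    + ["TTGA"] * 3  # haplotype 4
    + ["GGGG"]      # single read, below min_support
)

haplotypes = phase_pooled_reads(reads)

# Each distinct haplotype is counted as one gene copy in this toy model.
total_copies = len(haplotypes)

print(haplotypes)
print("estimated total copy number:", total_copies)
```

In this toy model the number of distinct haplotypes stands in for the total copy number of the paralog group. Two identical copies would collapse into a single haplotype here, so the count is only a lower bound, and a real caller would need additional evidence such as read depth to separate them.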

MeSH terms

  • DNA Copy Number Variations
  • Gene Conversion
  • Genetic Variation
  • Genome, Human* / genetics
  • Haplotypes
  • Humans
  • Polymorphism, Single Nucleotide
  • Segmental Duplications, Genomic / genetics
  • Sequence Analysis, DNA / methods