Intrinsic DNA topology as a prioritization metric in genomic fine-mapping studies

Nucleic Acids Res. 2020 Nov 18;48(20):11304-11321. doi: 10.1093/nar/gkaa877.

Abstract

In genomic fine-mapping studies, some approaches leverage annotation data to prioritize likely functional polymorphisms. However, existing annotation resources present challenges: many lack information for novel variants and/or are uninformative for non-coding regions. We propose a novel annotation source, sequence-dependent DNA topology, as a prioritization metric for fine-mapping. DNA topology and function are closely intertwined, and because topology is an intrinsic property of DNA, it is readily applicable to any genomic region. Here, we constructed and applied Minor Groove Width (MGW) as a prioritization metric. Using an established MGW-prediction method, we generated an MGW census for 199 038 197 SNPs across the human genome. When a SNP's change in MGW (ΔMGW) was summarized as a Euclidean distance, ΔMGW exhibited a strongly right-skewed distribution, highlighting the infrequency of SNPs that generate dissimilar shape profiles. We hypothesized that phenotypically associated SNPs can be prioritized by ΔMGW. We tested this hypothesis in 116 regions analyzed by a Massively Parallel Reporter Assay and observed enrichment of large ΔMGW for functional polymorphisms (P = 0.0007). To illustrate application in fine-mapping studies, we applied our MGW-prioritization approach to three non-coding regions associated with systemic lupus erythematosus. Together, this study presents the first use of sequence-dependent DNA topology as a prioritization metric in genomic association studies.
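
As an illustration of the ΔMGW summary described above, the following is a minimal sketch rather than the authors' pipeline. It assumes that each allele of a SNP already has a per-position MGW profile (in ångströms) produced by some MGW-prediction method, and it computes ΔMGW as the Euclidean distance between the reference and alternate profiles. The function names `delta_mgw` and `rank_by_delta_mgw`, the SNP identifiers, the window length, and all numeric values are hypothetical and chosen only for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

Profile = Sequence[float]


def delta_mgw(ref_profile: Profile, alt_profile: Profile) -> float:
    """Euclidean distance between two equal-length MGW profiles.

    Each profile is assumed to hold predicted minor groove widths for the
    positions in a window around the SNP, one profile per allele.
    """
    if len(ref_profile) != len(alt_profile):
        raise ValueError("MGW profiles must cover the same positions")
    return math.sqrt(sum((r - a) ** 2 for r, a in zip(ref_profile, alt_profile)))


def rank_by_delta_mgw(
    snps: Dict[str, Tuple[Profile, Profile]]
) -> List[Tuple[str, float]]:
    """Return (snp_id, delta_mgw) pairs sorted from largest to smallest change."""
    scores = {snp_id: delta_mgw(ref, alt) for snp_id, (ref, alt) in snps.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: (reference profile, alternate profile) per SNP.
# These numbers are made up and do not come from the study.
TOY_SNPS: Dict[str, Tuple[Profile, Profile]] = {
    "snp_A": ([5.1, 4.8, 4.2, 4.9, 5.3], [5.1, 5.6, 5.9, 5.4, 5.3]),
    "snp_B": ([4.6, 4.4, 4.1, 4.3, 4.7], [4.6, 4.5, 4.2, 4.3, 4.7]),
}

if __name__ == "__main__":
    for snp_id, score in rank_by_delta_mgw(TOY_SNPS):
        print(f"{snp_id}\t{score:.3f}")
```

Under these assumptions, SNPs whose alleles produce dissimilar shape profiles receive larger scores and sort to the top of the list, which mirrors the idea of prioritizing candidates by ΔMGW within a fine-mapped region.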

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Base Sequence
  • Bayes Theorem
  • Black People / genetics
  • Chromosome Mapping / methods*
  • DNA / chemistry*
  • DNA-Binding Proteins / genetics
  • Databases, Genetic
  • Genome, Human*
  • Genome-Wide Association Study / methods*
  • Genomics / methods*
  • Hispanic or Latino / genetics
  • Humans
  • Lupus Erythematosus, Systemic / genetics
  • Molecular Sequence Annotation / methods
  • Polymorphism, Single Nucleotide
  • Proteins / genetics
  • Quantitative Trait Loci
  • STAT4 Transcription Factor / genetics
  • White People / genetics
  • src-Family Kinases / genetics

Substances

  • DNA-Binding Proteins
  • FAM167A protein, human
  • Proteins
  • STAT4 Transcription Factor
  • STAT4 protein, human
  • TNIP1 protein, human
  • DNA
  • BLK protein, human
  • src-Family Kinases