A comparative analysis of relative occurrence of transcription factor binding sites in vertebrate genomes and gene promoter areas

Bioinformatics. 2005 May 1;21(9):1789-96. doi: 10.1093/bioinformatics/bti307. Epub 2005 Feb 4.

Abstract

Motivation: The detection of transcription factor binding sites (TFBS) in genomic sequences is a basic task for elucidating the transcriptional aspects of gene regulation. Procedures for evaluating TFBS prediction outputs need improvement. Predicted TFBS located outside transcription-associated regions are often neglected from both the functional and the evolutionary points of view and therefore deserve a systematic overview.

Results: We calculated theoretical occurrences of 184 TFBS according to their position weight matrices and the dinucleotide statistics of the completed vertebrate genomes, and then predicted TFBS in the corresponding complete genomic sequences and in their repeat-free, repetitive and regulatory fractions. Repeat-free fractions of the closely related mammalian genomes were characterized by strong similarities in TFBS occurrences. A significant over-representation of multiple TFBS was found in both repetitive and non-repetitive genome fractions.
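
The abstract does not spell out how the theoretical occurrences were computed, so the following is only a minimal sketch of one plausible approach, not the authors' procedure. It assumes a first-order Markov background estimated from dinucleotide counts, a fixed score threshold on the position weight matrix, and a single-strand scan. All function names, the toy matrix and the threshold are hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"

def markov_model(seq):
    """Estimate base frequencies and first-order transition
    probabilities P(b | a) from mono- and dinucleotide counts."""
    seq = seq.upper()
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = sum(mono[b] for b in BASES)
    freq = {b: mono[b] / total for b in BASES}
    trans = {}
    for a in BASES:
        row = sum(di[a + b] for b in BASES)
        for b in BASES:
            trans[a + b] = di[a + b] / row if row else 0.0
    return freq, trans

def kmer_probability(kmer, freq, trans):
    """Probability of one k-mer under the first-order Markov model."""
    p = freq[kmer[0]]
    for a, b in zip(kmer, kmer[1:]):
        p *= trans[a + b]
    return p

def pwm_score(kmer, pwm):
    """Additive PWM score; pwm is a list of {base: weight} dicts."""
    return sum(pwm[i][b] for i, b in enumerate(kmer))

def expected_occurrences(pwm, threshold, freq, trans, seq_len):
    """Expected number of positions whose k-mer scores >= threshold.
    Enumerates all 4**k k-mers, so it only suits short motifs."""
    k = len(pwm)
    p_site = sum(
        kmer_probability("".join(w), freq, trans)
        for w in product(BASES, repeat=k)
        if pwm_score(w, pwm) >= threshold
    )
    return (seq_len - k + 1) * p_site

# Toy usage: a hypothetical 2-column matrix that accepts only "AC".
genome = "ACGT" * 25
freq, trans = markov_model(genome)
toy_pwm = [{"A": 1, "C": 0, "G": 0, "T": 0},
           {"A": 0, "C": 1, "G": 0, "T": 0}]
expected_ac = expected_occurrences(toy_pwm, 2, freq, trans, len(genome))
```

Comparing such an expected count with the number of sites actually predicted in a genome fraction gives a simple measure of over- or under-representation. Whether this matches the F-values mentioned under Availability is not stated in the abstract.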

Availability: F-values and real TFBS occurrences calculated for human, chimp, mouse, rat, zebrafish and fugu genomes are available for free download at http://www.gmu.edu/departments/mmb/baranova/pages/bioinformatics

MeSH terms

  • Animals
  • Binding Sites / genetics*
  • Chromosome Mapping / methods*
  • Evolution, Molecular
  • Promoter Regions, Genetic / genetics*
  • Protein Binding
  • Sequence Alignment / methods*
  • Sequence Analysis, DNA / methods*
  • Sequence Homology, Nucleic Acid
  • Species Specificity
  • Transcription Factors / genetics*
  • Transcriptional Activation / genetics*
  • Vertebrates

Substances

  • Transcription Factors