Motivation: Detecting transcription factor binding sites (TFBS) in genomic sequences is a basic step toward elucidating the transcriptional aspects of gene regulation. Procedures for evaluating the output of TFBS prediction still need improvement. Predicted TFBS located outside transcription-associated regions are often dismissed from both the functional and the evolutionary points of view, and therefore warrant a systematic survey.
Results: We calculated the theoretical (expected) occurrences of 184 TFBS from their position weight matrices and the dinucleotide statistics of the completely sequenced vertebrate genomes, and then predicted TFBS in the corresponding complete genomic sequences and in their repeat-free, repetitive and regulatory fractions. The repeat-free fractions of closely related mammalian genomes showed strongly similar TFBS occurrences. Multiple TFBS were significantly over-represented in both the repetitive and the non-repetitive genome fractions.
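The expected-occurrence calculation pairs a position weight matrix with a dinucleotide background. A minimal sketch of that idea follows, under stated assumptions: the background is a first-order Markov model estimated from dinucleotide counts, a site "hits" when its additive PWM score reaches a fixed threshold, and only one strand is scanned. All function names (`dinucleotide_model`, `expected_hits`, `observed_hits`, etc.) are hypothetical, this is not the authors' procedure, and the observed/expected ratio it suggests is only one plausible over-representation measure, not necessarily the F-value in the published tables.

```python
# Hypothetical sketch: expected vs. observed PWM hits under a dinucleotide
# (first-order Markov) background. Not the authors' published implementation.
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def dinucleotide_model(seq):
    """Estimate start and transition probabilities from mono-/dinucleotide counts."""
    seq = seq.upper()
    mono = Counter(c for c in seq if c in ALPHABET)
    di = Counter(
        seq[i:i + 2]
        for i in range(len(seq) - 1)
        if seq[i] in ALPHABET and seq[i + 1] in ALPHABET
    )
    total = sum(mono.values())
    start = {a: mono[a] / total for a in ALPHABET}
    trans = {}
    for a in ALPHABET:
        row = sum(di[a + b] for b in ALPHABET)
        for b in ALPHABET:
            # Fall back to uniform if 'a' is never followed by a valid base.
            trans[a, b] = di[a + b] / row if row else 0.25
    return start, trans


def word_probability(word, start, trans):
    """P(word) under the first-order Markov background."""
    p = start[word[0]]
    for prev, cur in zip(word, word[1:]):
        p *= trans[prev, cur]
    return p


def pwm_score(word, pwm):
    """Additive score; pwm is a list of {base: weight} dicts, one per position."""
    return sum(col[base] for col, base in zip(pwm, word))


def expected_hits(pwm, threshold, seq_len, start, trans):
    """Expected number of windows scoring >= threshold on one strand.

    Brute-force enumeration of all 4**L words: only practical for short matrices.
    """
    length = len(pwm)
    p_hit = sum(
        word_probability(w, start, trans)
        for w in map("".join, product(ALPHABET, repeat=length))
        if pwm_score(w, pwm) >= threshold
    )
    return (seq_len - length + 1) * p_hit


def observed_hits(seq, pwm, threshold):
    """Count windows of seq (one strand, ACGT only) scoring >= threshold."""
    seq = seq.upper()
    length = len(pwm)
    return sum(
        1
        for i in range(len(seq) - length + 1)
        if set(seq[i:i + length]) <= set(ALPHABET)
        and pwm_score(seq[i:i + length], pwm) >= threshold
    )
```

For example, with `seq = "ACGT" * 25` and a one-hot matrix for `ACG` (weight 1.0 for the consensus base, 0.0 otherwise) at threshold 3.0, `expected_hits` returns 24.5 and `observed_hits` returns 25. Enumerating every word keeps the sketch easy to check but grows as 4**L; for realistic matrix lengths a dynamic-programming computation of the score distribution would be the more practical choice.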
Availability: F-values and observed TFBS occurrences calculated for the human, chimpanzee, mouse, rat, zebrafish and fugu genomes are freely available for download at http://www.gmu.edu/departments/mmb/baranova/pages/bioinformatics