Identification and classification of protein fold families

C A Orengo; T P Flores; W R Taylor; J M Thornton

doi:10.1093/protein/6.5.485

Protein Eng. 1993 Jul;6(5):485-500. doi: 10.1093/protein/6.5.485.

Authors

C A Orengo¹, T P Flores, W R Taylor, J M Thornton

Affiliation

¹ Department of Biochemistry, University College, London, UK.

PMID: 8415576

Abstract

We have developed a method for identifying fold families in the protein structure data bank. Pairwise sequence alignments are first performed to extract families of homologous proteins having 35% or more sequence identity. Representatives are selected with the best resolution and R-factor to give a nonhomologous data set. Subsequent structure comparisons between all members of this set detect homologous folds with low sequence identity but highly conserved structures. By softening the requirement on structural similarity, families of analogous proteins are obtained that have related folds but more diverse structures. Representatives are selected to give a non-analogous data set. Starting with 1410 chains from the Brookhaven Data Bank, we generate a set of 150 nonhomologous folds and a set of 112 non-analogous folds. Analysis of sequence and structure conservation within the larger families shows the globins to be the most highly conserved family and the TIM barrels the most weakly conserved.
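The first stage described in the abstract (grouping chains at 35% or more sequence identity, then keeping the entry with the best resolution and R-factor from each group) can be illustrated with a minimal sketch. This is not the authors' implementation. The single-linkage grouping, the use of R-factor only as a tie-break after resolution, and all names (Chain, cluster_by_identity, pick_representative, nonredundant_set) are assumptions made for illustration. The pairwise identities are taken as precomputed input, and the later structure-comparison stages are not covered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chain:
    name: str          # hypothetical chain identifier
    resolution: float  # in angstroms; lower is better
    r_factor: float    # lower is better


def cluster_by_identity(chains, identity, threshold=35.0):
    """Group chains into families by single linkage (an assumption):
    any pair with identity >= threshold ends up in the same family.

    `identity` maps (name_a, name_b) pairs to percent sequence identity.
    """
    parent = {c.name: c.name for c in chains}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), pct in identity.items():
        if pct >= threshold:
            parent[find(a)] = find(b)

    families = {}
    for c in chains:
        families.setdefault(find(c.name), []).append(c)
    return list(families.values())


def pick_representative(family):
    """Choose the chain with the lowest resolution value,
    breaking ties on the lowest R-factor (assumed ordering)."""
    return min(family, key=lambda c: (c.resolution, c.r_factor))


def nonredundant_set(chains, identity, threshold=35.0):
    """Return the sorted names of one representative per family."""
    families = cluster_by_identity(chains, identity, threshold)
    return sorted(pick_representative(f).name for f in families)
```

With single linkage, two chains below the threshold can still share a family if a third chain links them. A stricter grouping rule would give smaller families, and the abstract does not say which rule was used.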

Publication types

Comparative Study
Research Support, Non-U.S. Gov't

MeSH terms

Classification / methods
Databases, Factual / trends
Mathematical Computing
Models, Molecular
Protein Conformation*
Protein Structure, Secondary
Protein Structure, Tertiary
Proteins / chemistry*
Proteins / classification*
Sequence Alignment / methods*
Sequence Homology, Amino Acid

Substances

Proteins