We have developed a method for identifying fold families in the protein structure data bank. Pairwise sequence alignments are first performed to extract families of homologous proteins having 35% or more sequence identity. Representatives are selected with the best resolution and R-factor to give a nonhomologous data set. Subsequent structure comparisons between all members of this set detect homologous folds with low sequence identity but highly conserved structures. By softening the requirement on structural similarity, families of analogous proteins are obtained that have related folds but more diverse structures. Representatives are selected to give a non-analogous data set. Starting with 1410 chains from the Brookhaven Data Bank, we generate a set of 150 nonhomologous folds and a set of 112 non-analogous folds. Analysis of sequence and structure conservation within the larger families shows the globins to be the most highly conserved family and the TIM barrels the most weakly conserved.