Comparative and structural genomics to explore the evolution of protein function

Christine A. Orengo

Department of Biochemistry and Molecular Biology, University College London, UK

Corin Yeats, Gabrielle Reeves, Oliver Redfern, Juan Ranea and Christine Orengo

How can structural genomics initiatives target relatives from protein families in a manner that increases our understanding of the evolution of protein structures and functions within families? There are now nearly 100,000 domain structures in the CATH database, which can be classified into approximately 2000 evolutionary superfamilies. Using HMM-based methods to predict structural relatives in completed genomes, we observe that more than half of the domain sequences can be assigned to known structural families in CATH [1]. This structural mapping allows us to probe more deeply into the evolutionary history of these families and their differential expansion in the genomes. Although there are about 140 structural families that are common to all kingdoms of life, a small proportion of these (<20) are highly recurrent, accounting for nearly 50% of domain structure annotations in the genomes. Furthermore, many of these very large families are observed to be highly structurally and functionally divergent, though functional divergence is generally limited to changes within a COG major functional class rather than a complete change of functional class. Structural analyses of the most divergent enzyme families reveal a mechanism whereby small accretions of secondary structural elements along the polypeptide chain during evolution are amplified in their impact on the structure through co-localisation in 3D. These secondary structure embellishments often modify the geometry of the active site or the structural characteristics of the protein surface, promoting different protein-protein interactions [2]. Whilst local structure comparison methods and 3D templates based on functional sites have difficulty in distinguishing functional subgroups within a structural superfamily, template methods based on global structural comparison show increased specificity and selectivity, which reflects the ability of these approaches to capture a broader range of surface characteristics.
Sequence-based methods for predicting functional subgroups within superfamilies identify functionally distinct subfamilies with no close structural relatives available for homology modelling. These can be targeted by the structural genomics initiatives to improve our understanding of structure-function space.

[1] Comprehensive genome analysis of 203 genomes provides structural genomics with new insights into protein family space. Marsden RL, Lee D, Maibaum M, Yeats C, Orengo CA. (2006) Nucleic Acids Res 34, 1066-1080.
[2] Structural Diversity of Domain Superfamilies in the CATH Database. Reeves GA, Dallman TJ, Redfern OC, Akpor A, Orengo CA. (2006) J Mol Biol 360, 725-741.
Key words: domain families, genome analysis
