Structural genomics of membrane proteins: fold space, target selection, and structure prediction

Dmitrij Frishman

Technische Universität München, Freising, Germany

Recent progress in structure determination techniques has led to significant growth in the number of known membrane protein structures, and the first structural genomics projects focusing on membrane proteins have been initiated. This warrants an investigation of bioinformatics strategies for optimal structural target selection for these molecules. What determines a membrane protein fold? How many membrane protein structures need to be solved to provide sufficient structural coverage of the membrane protein sequence space? We developed the CAMPS database (Computational Analysis of the Membrane Protein Space), which contains almost 45,000 proteins with three or more predicted transmembrane helices (TMH) from 120 bacterial species. CAMPS sequences are hierarchically organized in clusters that reflect three main levels of interest for structural genomics: fold, function, and modeling distance. Given that 24 out of 266 clusters corresponding to membrane folds already have associated known structures, we estimate that 242 additional structures, one for each remaining cluster, would provide structural coverage at the fold level of roughly 70% of prokaryotic membrane proteins belonging to the currently most populated families. In the second part of my talk I will review bioinformatics methods for predicting experimentally tractable membrane proteins. Integral membrane proteins are challenging targets for structure determination because of the substantial experimental difficulties involved in their sample preparation. Based on the target status information available in the TargetDB repository, we conducted the first large-scale analysis of the experimental behavior of membrane proteins. Using information on recalcitrant and propagating targets as negative and positive sets, respectively, we developed naive Bayes classifiers capable of predicting, from sequence alone, those proteins that are more amenable to cloning, expression, and solubilization studies.
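To make the idea of a sequence-only naive Bayes classifier concrete, the following is a minimal sketch and not the classifiers described above. It assumes a multinomial naive Bayes model over raw amino acid counts with Laplace smoothing; the abstract does not state which sequence features were actually used. The function names `train_naive_bayes` and `predict` and the four training sequences are hypothetical, made up purely for illustration, and carry no biological meaning.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def train_naive_bayes(labeled_sequences, alpha=1.0):
    """Fit a multinomial naive Bayes model on residue counts.

    labeled_sequences: iterable of (sequence, label) pairs.
    alpha: Laplace smoothing constant (assumed value, not from the source).
    Returns {label: (log_prior, {residue: log_likelihood})}.
    """
    class_counts = Counter()
    residue_counts = {}
    for seq, label in labeled_sequences:
        class_counts[label] += 1
        counts = residue_counts.setdefault(label, Counter())
        counts.update(r for r in seq.upper() if r in AMINO_ACIDS)

    total = sum(class_counts.values())
    model = {}
    for label, n in class_counts.items():
        counts = residue_counts[label]
        denom = sum(counts.values()) + alpha * len(AMINO_ACIDS)
        log_likelihood = {
            r: math.log((counts[r] + alpha) / denom) for r in AMINO_ACIDS
        }
        model[label] = (math.log(n / total), log_likelihood)
    return model


def predict(model, seq):
    """Return the label with the highest posterior log score for seq."""
    scores = {}
    for label, (log_prior, log_likelihood) in model.items():
        scores[label] = log_prior + sum(
            log_likelihood[r] for r in seq.upper() if r in AMINO_ACIDS
        )
    return max(scores, key=scores.get)


# Hypothetical toy data: invented sequences with arbitrary labels.
training = [
    ("MKKLLEEDKRRA", "propagating"),
    ("MEEKRKDDLKEA", "propagating"),
    ("MLLVVIIFFWLG", "recalcitrant"),
    ("MFFLLIVVWWAG", "recalcitrant"),
]
model = train_naive_bayes(training)
```

Calling `predict(model, some_sequence)` returns whichever label's residue distribution best explains the query. A realistic version would train on TargetDB status records and would likely use richer sequence-derived features than plain composition.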
Based on the CAMPS database and the predicted experimental behavior of membrane proteins, we developed a highly customizable target selection protocol for structural genomics of membrane proteins. Finally, I will present our latest results on predicting interacting helices in membrane proteins.
