Predicting Transcription Factor Affinities to DNA from a Biophysical Model

Helge Roider

Max Planck Institute for Molecular Genetics, Computational Biology, Berlin, Germany

Theoretical efforts to understand the regulation of gene expression have traditionally centered on identifying transcription factor binding sites at specific DNA positions. More recently, these efforts have been supplemented by large-scale experimental data (ChIP-chip) on the relative binding strength of proteins to longer intergenic sequences. The question arises to what extent these two approaches converge. So far, a direct comparison has been hampered by the arbitrary cutoff that is commonly imposed both on in vivo data and on in silico binding site predictions.

Here we adopt the physical binding model of Berg and von Hippel to predict the binding probabilities and relative binding strengths of a given transcription factor to any sequence region. In contrast to the traditional search for binding sites, we do not impose any threshold, but integrate the contributions from strong and weak binding sites to calculate the overall binding strength to a given region. This approach pertains directly to the experimental situation of ChIP-chip data, and we draw upon a large-scale data set from S. cerevisiae to calibrate the parameters of the model. After calibration, our transcription factor affinity prediction (TRAP) tool is suitable for predicting the relative binding strength of transcription factors even in the absence of large-scale experimental binding data.
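
As an illustration only, the threshold-free summation described above can be sketched in a few lines of Python. This is a simplified sketch under stated assumptions, not the TRAP implementation: the names `lam` and `r0` for the two model parameters, the logarithmic mismatch energy derived from a count matrix, the pseudocount, and the restriction to the forward strand are all assumptions made for the example, and the default parameter values are arbitrary placeholders rather than calibrated values.

```python
import math

def mismatch_energies(counts, pseudocount=1.0):
    """Turn a count matrix (list of dicts, base -> count) into per-position
    mismatch energies. Assumed form: ln(best_count / count), so the most
    frequent base at each position has energy 0."""
    energies = []
    for column in counts:
        smoothed = {base: c + pseudocount for base, c in column.items()}
        best = max(smoothed.values())
        energies.append({base: math.log(best / c) for base, c in smoothed.items()})
    return energies

def site_probability(site, energies, lam, r0):
    """Binding probability of a single site of motif width.
    lam scales the mismatch energy, r0 sets the weight of a perfect match
    (both are hypothetical names for the two model parameters)."""
    energy = sum(col[base] for col, base in zip(energies, site))
    weight = r0 * math.exp(-energy / lam)
    return weight / (1.0 + weight)

def region_affinity(sequence, energies, lam=1.0, r0=1.0):
    """Sum the binding probabilities of every window in the region.
    No cutoff is applied, so weak sites contribute alongside strong ones.
    Expects an uppercase A/C/G/T sequence; forward strand only."""
    width = len(energies)
    return sum(
        site_probability(sequence[i:i + width], energies, lam, r0)
        for i in range(len(sequence) - width + 1)
    )

# Toy motif of width 3 with consensus "AAA" (illustrative counts, not real data).
toy_counts = [{"A": 10, "C": 0, "G": 0, "T": 0}] * 3
toy_energies = mismatch_energies(toy_counts)
affinity = region_affinity("ACAAAGT", toy_energies)
```

Because every window contributes, a region with several weak sites can reach an affinity comparable to a region with a single strong site, which is the behavior a hit-based scan with a cutoff cannot capture.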

We demonstrate that, within this probabilistic framework, a significant fraction of experimental low- and high-affinity binding data can be rationalized in terms of only two universal parameters. Our method can assign high affinities to sequences where hit-based methods fail to report any "match", and it also accounts for differences in the binding strength of sites that are traditionally reported only as hits. We compare our method to a number of traditional approaches and find that it has higher predictive power with respect to experimental binding ratios than any of the hit-based methods. Finally, we illustrate the applicability of our approach to promoter regions of higher eukaryotes.
