Theoretical efforts to understand the regulation of gene expression
are traditionally centered around the identification of
transcription factor binding sites at specific DNA positions.
More recently these efforts have been supplemented by large-scale
experimental data (ChIP-chip) for the relative binding strength of
proteins to longer intergenic sequences. The question arises to what
extent these two approaches converge. So far, a direct comparison
has been made difficult by the presence of an arbitrary cutoff, which
is commonly imposed on both in vivo data and on in silico
binding site predictions.
Here we adopt the physical binding model of Berg and von Hippel to predict the binding probabilities and relative binding strengths of a given transcription factor to any sequence region. In contrast to the traditional search for binding sites, we do not impose any threshold, but integrate the contributions from strong and weak binding sites to calculate the overall binding strength to a given region. This approach pertains directly to the experimental situation of ChIP-chip data, and we draw upon a large-scale data set from S. cerevisiae to calibrate the parameters of the model. After calibration, our transcription factor affinity prediction (TRAP) tool is suitable for predicting the relative binding strength of transcription factors even in the absence of large-scale experimental binding data. We demonstrate that, within this probabilistic framework, a significant fraction of experimental low- and high-affinity binding data can be rationalized in terms of only two universal parameters. Our method can assign high affinities to sequences where hit-based methods fail to report any "match", and it also accounts for differences in the binding strength of sites that are traditionally reported only as hits. We compare our predictions to a number of traditional approaches and find that TRAP has higher predictive power with respect to experimental binding ratios than any of the hit-based methods. Finally, we illustrate the applicability of our approach to promoter regions of higher eukaryotes.
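The threshold-free integration over strong and weak sites described above can be illustrated with a minimal Python sketch. It is not the TRAP implementation. It assumes a common reading of the Berg and von Hippel model: mismatch energies are taken as log-ratios of count-matrix entries to the most frequent base at each position, and a site's binding probability is a logistic function of its total mismatch energy. The parameter names `lam` (energy scale) and `r0` (an effective concentration-like factor) are assumptions, since the abstract does not name its two parameters; the function names and the pseudocount are likewise hypothetical. Only the forward strand is scanned, and sequences are assumed to contain only uppercase A, C, G, T.

```python
import math

BASES = "ACGT"


def mismatch_energies(counts, pseudocount=1.0):
    """Convert a count matrix into per-position mismatch energies.

    counts: list of dicts mapping base -> observed count, one per motif position.
    The most frequent base at each position gets energy 0 (assumed form).
    """
    energies = []
    for column in counts:
        smoothed = {b: column.get(b, 0) + pseudocount for b in BASES}
        best = max(smoothed.values())
        energies.append({b: math.log(best / n) for b, n in smoothed.items()})
    return energies


def site_probability(site, energies, lam, r0):
    """Binding probability of one site of motif width (assumed logistic form)."""
    energy = sum(column[base] for column, base in zip(energies, site))
    weight = r0 * math.exp(-energy / lam)
    return weight / (1.0 + weight)


def region_affinity(sequence, energies, lam, r0):
    """Sum the binding probabilities of all forward-strand sites in a region.

    No cutoff is applied: weak sites contribute small but nonzero terms.
    """
    width = len(energies)
    return sum(
        site_probability(sequence[i:i + width], energies, lam, r0)
        for i in range(len(sequence) - width + 1)
    )


# Hypothetical two-position motif with consensus "AC".
example_counts = [
    {"A": 9, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 9, "G": 0, "T": 0},
]
example_energies = mismatch_energies(example_counts)
# Illustrative parameter values, not calibrated ones.
print(region_affinity("ACAC", example_energies, lam=1.0, r0=1.0))
print(region_affinity("GGGG", example_energies, lam=1.0, r0=1.0))
```

In this sketch a region without any consensus match still receives a small positive affinity, whereas a hit-based scan with a cutoff would report nothing for it.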