Abstract
The composition of a genome with respect to all possible short DNA motifs impacts the ability of DNA binding proteins to locate and bind their target sites. Since nonfunctional DNA binding can be detrimental to cellular functions and ultimately to organismal fitness, organisms could benefit from reducing the number of nonfunctional DNA binding sites genome wide. Using in vitro measurements of binding affinities for a large collection of DNA binding proteins, in multiple species, we detect a significant global avoidance of weak binding sites in genomes. We demonstrate that the underlying evolutionary process leaves a distinct genomic hallmark in that similar words have correlated frequencies, a signal that we detect in all species across domains of life. We consider the possibility that natural selection against weak binding sites contributes to this process, and using an evolutionary model we show that the strength of selection needed to maintain global word compositions is on the order of point mutation rates. Likewise, we show that evolutionary mechanisms based on interference of protein-DNA binding with replication and mutational repair processes could yield similar results and operate with similar rates. On the basis of these modeling and bioinformatic results, we conclude that genome-wide word compositions have been molded by DNA binding proteins acting through tiny evolutionary steps over time scales spanning millions of generations.
Received 24 February 2016
DOI:https://doi.org/10.1103/PhysRevX.6.041009
Published by the American Physical Society under the terms of the Creative Commons Attribution 3.0 License. Further distribution of this work must maintain attribution to the author(s) and the published article’s title, journal citation, and DOI.
Focus
Evolution Thins Out Distracting DNA
Published 14 October 2016
Proteins sometimes bind to the wrong stretch of DNA, but these "imposter" DNA sequences are statistically rare in many genomes, suggesting that evolution works against them.
Popular Summary
Nonfunctional DNA binding by proteins can disrupt basic cellular processes such as transcription, gene regulation, replication, and mutational repair. By evolving their global composition of short DNA “words,” genomes could potentially reduce the frequency of nonfunctional binding. Here, we show that such an evolutionary process imposes global constraints in all genomes and operates via tiny steps over time scales spanning millions of generations.
In this study, we analyze DNA-binding proteins in 947 bacterial or archaeal genomes and the genomes of 75 eukaryotic species. We determine the global constraints set by the distinct complement of DNA-binding proteins present in each genome, constraints that are responsible for preserving ancient phylogenetic signals. Using a mathematical model, we show that a distinctive genomic signature of such constraints is preserved across divergent species, for example, between D. melanogaster (the common fruit fly) and M. musculus (the house mouse), which diverged over 600 million years ago. Our analysis demonstrates that weak binding sites in genomes are preferentially avoided, a result that holds true across the domains of life. Put another way, we show that the global word composition of each genome has been molded by its DNA-binding proteins over the course of evolution.
The outcomes of our study, which reveal that a large number of small effects act collectively to maintain genomic binding landscapes over long evolutionary time scales, pave the way for investigations of how this general evolutionary mechanism impacts a wide range of cellular processes.
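The statement that similar words have correlated frequencies can be made concrete with a small sketch. The Python code below is an illustration only, not the method used in the article. It assumes that a "word" is a k-mer over the alphabet ACGT, that "similar" means differing at exactly one position, and that word frequency is measured as the log ratio of the observed count to the count expected from single-base composition. The function names, the pseudocount, and the choice of k are all hypothetical.

```python
from collections import Counter
from itertools import product
import math
import random

ALPHABET = "ACGT"


def kmer_counts(seq, k):
    """Count overlapping k-mers, skipping windows that contain non-ACGT characters."""
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if all(c in ALPHABET for c in word):
            counts[word] += 1
    return counts


def log_enrichment(seq, k, pseudocount=1.0):
    """Return log(observed / expected) for every k-mer.

    The expected count assumes independent bases drawn with the
    sequence's own base frequencies (an assumption of this sketch).
    """
    counts = kmer_counts(seq, k)
    base = Counter(c for c in seq if c in ALPHABET)
    n_bases = sum(base.values())
    n_words = sum(counts.values())
    scores = {}
    for letters in product(ALPHABET, repeat=k):
        word = "".join(letters)
        expected = n_words * math.prod(base[c] / n_bases for c in word)
        scores[word] = math.log((counts[word] + pseudocount) / (expected + pseudocount))
    return scores


def neighbor_pairs(k):
    """Yield every ordered pair of k-mers that differ at exactly one position."""
    for letters in product(ALPHABET, repeat=k):
        word = "".join(letters)
        for pos in range(k):
            for c in ALPHABET:
                if c != word[pos]:
                    yield word, word[:pos] + c + word[pos + 1:]


def pearson(xs, ys):
    """Plain Pearson correlation coefficient; returns NaN if either variance is zero."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / math.sqrt(vx * vy)


def neighbor_correlation(seq, k):
    """Correlate the enrichment of each word with that of its one-mismatch neighbors."""
    scores = log_enrichment(seq, k)
    pairs = list(neighbor_pairs(k))
    return pearson([scores[a] for a, _ in pairs], [scores[b] for _, b in pairs])


if __name__ == "__main__":
    # Toy input: a random sequence, for which no strong neighbor correlation is expected.
    rng = random.Random(0)
    toy = "".join(rng.choice(ALPHABET) for _ in range(20000))
    print(neighbor_correlation(toy, 4))
```

Under these assumptions, a sequence with independent bases should give a neighbor correlation near zero, whereas a clearly positive value in a real genome would be the kind of signal the abstract describes. A real analysis would also need to handle reverse complements and longer-range composition, which this sketch ignores.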