The Tension on dsDNA Bound to ssDNA/RecA Filaments May Play an Important Role in Driving Efficient and Accurate Homology Recognition and Strand Exchange

It is well known that during homology recognition and strand exchange the double stranded DNA (dsDNA) in DNA/RecA filaments is highly extended, but the functional role of the extension has been unclear. We present an analytical model that calculates the distribution of tension in the extended dsDNA during strand exchange. The model suggests that the binding of additional dsDNA base pairs to the DNA/RecA filament alters the tension in dsDNA that was already bound to the filament, resulting in a non-linear increase in the mechanical energy as a function of the number of bound base pairs. This collective mechanical response may promote homology stringency and underlie unexplained experimental results.


I. INTRODUCTION
Sexual reproduction and DNA damage repair often include homologous recombination facilitated by RecA family proteins [1,2]. In homologous recombination, a single stranded DNA molecule (ssDNA) locates and pairs with a sequence matched double-stranded DNA molecule (dsDNA). In the first step of the process, the incoming ssDNA binds to site I in RecA monomers, resulting in a helical ssDNA-RecA filament with 3 base pairs/monomer and ∼ 6 monomers/helical turn [3]. This helical ssDNA-RecA filament then searches dsDNA molecules for homologous sequences by rapidly binding and unbinding dsDNA to site II in RecA [3]. Thus, the sequence of the ssDNA in the searching filament is fixed, and the system then searches through the available dsDNA to find a sequence match for that ssDNA. The binding of dsDNA to site II is very unstable, so if the dsDNA is not homologous to the ssDNA bound to site I, the dsDNA rapidly unbinds from the ssDNA-RecA filament. If the dsDNA is homologous, strand exchange should occur, probably via base-flipping that transfers the Watson-Crick pairing of the complementary strand from the outgoing DNA strand bound to site II to the incoming DNA strand bound to site I [4]. This strand exchange reduces the unbinding rate for the dsDNA [5]. RecA is an ATPase, but in vitro homology recognition and strand exchange can occur without ATP hydrolysis [6][7][8]. Thus, each step in the homology search/strand exchange process is fully reversible.
During the homology search and strand exchange process, dsDNA bound to RecA is extended significantly beyond the B-form length [9]. Recent theoretical work proposed that the free energy penalty associated with extension may promote rapid unbinding of nonhomologous sequences, but the free energy penalty was assumed to be a linear function of the number of bound triplets and the kinetic trapping due to near homologs was not considered [10]. Earlier work had also suggested that the dsDNA extension promotes base-flipping [11] and reduces kinetic trapping: the lattice mismatch between extended dsDNA and B-form dsDNA presents a steric barrier to interactions between unbound dsDNA and bound ssDNA, which implies that the dsDNA must bind to the filament in order to interact with the ssDNA [12]. These studies assumed the dsDNA in the DNA/RecA filament is uniformly extended; however, the X-ray crystal structures of the dsDNA in the final post-strand exchange state and the ssDNA in the homology searching state both consist of base pair triplets in a nearly B-form conformation separated by large rises, as illustrated in Fig. 1a. The rises occur at the interfaces between adjoining RecA monomers [3], as illustrated in Fig. 2. The functional role of the non-uniform extension has been unclear.
In this paper we present a simple model that calculates the extension of each base pair triplet in a dsDNA. Using this model, we calculate the free energy changes associated with progression through the homology recognition/strand exchange process. The results of that calculation suggest a resolution to the long standing question of why strand exchange is free energetically favorable even though the Watson-Crick pairing in the initial and final states is the same and the DNA/protein contacts in the ssDNA-RecA filament and final post-strand exchange state are nearly the same [3]. The model also makes several significant qualitative predictions, the most significant being the suggestion that the collective behavior of the triplets due to their attachment to the phosphate backbones leads to a free energy that is a non-linear function of the number of consecutive bound triplets. As a result of this non-linearity, the total binding energy has a minimum as a function of the number of bound triplets in a given conformation. After that minimum is reached, adding more triplets in that given conformation becomes free energetically unfavorable.
Such a change in sign in the binding energy as a function of the number of bound triplets can never occur in a theory where the energy is a purely linear function of the number of bound triplets since the binding of any base pair anywhere in the system is equally likely regardless of the state of any of the other triplets in the system. In a system with a binding energy that is a linear function of the number of correctly paired bound triplets, if homologous triplets can initially bind to the system, then additional homologous triplets will always continue to bind. Thus, binding will readily progress across a non-homologous triplet. As we will discuss in detail in this work, in a system with a linear energy and more than ∼ 4 binding sites, either homologs will be too unstable or near homologs will be too stable.
In contrast, the non-linearity may provide more rapid and accurate homology recognition than is available for systems using linear energies because the non-linearity requires that dsDNA binding to the ssDNA-RecA proceed iteratively triplet by triplet through a series of checkpoints which inhibit the progression of strand exchange past a non-homologous triplet.
At each checkpoint, the progression of strand exchange to more stable binding conformations is only free energetically favorable if a sufficient number of contiguous homologous base pair triplets are bound to the ssDNA/RecA filament in the appropriate conformations. The general qualitative features of the homology recognition based on the non-linear energy follow from basic properties of the simple model and are insensitive to the parameters chosen.
These features include the following: 1. that homology recognition will proceed iteratively through consecutive triplets; 2. that strand exchange reversal is much more favorable at the ends of the filament than at the center; 3. that there will be two checkpoints that cannot be passed unless the bound dsDNA contains a sufficient number of contiguous homologous base pairs in the appropriate conformations. Though the general features of the model are very robust, the exact number of contiguous homologous bp required to progress past a particular checkpoint depends strongly on the choice of model parameters. Analytical modeling and numerical simulations suggest that there is a small range of parameters that allows the free energies predicted by the model presented in this paper to provide homology recognition which is both fast and accurate [13,14].
Simulation results suggest that though the initial binding of ∼ 9 base pairs (bp) is free energetically accessible, adding more bound triplets is not favorable, and adding more than 15 bp is enormously unlikely unless the first checkpoint is passed [14]. The first major checkpoint requires that ∼ 9 of the ∼ 15 bp that initially bind to the filament are contiguous and homologous. If the initial ∼ 15 bp do not contain ∼ 9 contiguous homologous base pairs, the dsDNA cannot make a transition to the more stably bound intermediate state; therefore, the weakly bound dsDNA will almost immediately unbind from the filament. This checkpoint rapidly rejects all but ∼ 50 of the ∼ 10,000,000 possible binding positions in a bacterial genome. If the initial ∼ 15 bp do include ∼ 9 contiguous homologous base pairs, the system can make a transition to a metastable intermediate state, which allows more base pairs to be added to the filament. The second major checkpoint occurs when ∼ 18 contiguous bp are bound to the filament in the metastable intermediate state. If all of the bp are contiguous and homologous, the system can make a transition to the final post-strand exchange state. Otherwise, the long regions of accidental homology will slowly reverse strand exchange and unbind. Given the statistics of bacterial genomes, passing the homology requirement for the second checkpoint would guarantee that the correct match had been found. These predictions are in good agreement with known experimental results that measure the stability of strand exchange products as a function of the number of contiguous bound base pairs [15].
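The genome statistics behind the two checkpoints can be checked with a rough estimate. The sketch below assumes, purely for illustration, that the four bases are independent and equally likely, and it ignores the several registers in which 9 contiguous bp can sit inside a 15 bp window; the function name is hypothetical. Under those assumptions a 9 bp accidental match is expected a few tens of times in 10,000,000 positions, the same order as the ∼ 50 quoted above, whereas an 18 bp accidental match is expected far less than once.

```python
# Expected number of purely accidental sequence matches in a genome.
# Assumption (not from the paper): bases are independent and equiprobable.
def expected_chance_matches(match_length_bp: int, genome_positions: int) -> float:
    """Return the expected count of accidental matches of a given length."""
    p_match = 0.25 ** match_length_bp  # probability that every base matches
    return p_match * genome_positions

GENOME_POSITIONS = 10_000_000  # number of binding positions quoted in the text

first_checkpoint = expected_chance_matches(9, GENOME_POSITIONS)    # ~9 contiguous bp
second_checkpoint = expected_chance_matches(18, GENOME_POSITIONS)  # ~18 contiguous bp

print(f"9 bp:  {first_checkpoint:.1f} expected accidental matches")
print(f"18 bp: {second_checkpoint:.2e} expected accidental matches")
```

Real genomes are neither random nor uniform in base composition, so these numbers are order-of-magnitude guides only.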
In this work, we will not attempt to optimize the parameters of the model in order to provide rapid and accurate homology recognition. Rather, we will consider why homology recognition systems in which the energies are a linear function of the number of bound base pairs can either provide rapid unbinding of non-homologs or stable binding of complete homologs, but not both if the number of binding sites in the system is ≳ 4. We will then discuss how qualitative features of the non-linear free energy predicted by the model allow strand exchange to avoid kinetic trapping in near mismatches while also permitting homologs to progress completely to strand exchange in systems where the number of binding sites is > 4.
A. General Issues in Self-Assembly Based on the Pairing of Arrays of Matching Binding Sites
In efficient self-assembly/recognition systems that create correct assembly by matching linear arrays of binding sites, correctly paired arrays of binding sites must remain stably bound, whereas incorrectly paired arrays must rapidly unbind even if the incorrect pairing contains only one single mismatched binding site. Thus, the requirements for rapid and accurate recognition can only be met if N = 1. The requirements become more stringent if the specificity ratio is more strict than 1/20. As we will discuss below, accurate homology recognition in a bacterial genome requires accurate recognition over a length of more than 12 bp, which implies N > 4 since 12 base pairs is 4 triplets.
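One way to see the difficulty with linear energies is a simple Boltzmann bookkeeping exercise. The sketch below is an illustration under stated assumptions, not the derivation used in this paper: every correctly paired site is assumed to contribute the same energy, a mismatched site is assumed to contribute nothing, the specificity ratio is read as the equilibrium occupancy ratio between a single-mismatch pairing and the perfect match, and dwell times are taken to scale as the exponential of the well depth. The function name is hypothetical.

```python
import math

def linear_model_summary(n_sites: int, specificity: float = 1 / 20):
    """Boltzmann bookkeeping for a linear binding energy (illustrative only).

    Assumes each correctly paired site contributes -eps (in kT) and a
    mismatched site contributes zero.
    """
    eps = math.log(1 / specificity)            # per-site energy that gives the specificity ratio
    homolog_depth = n_sites * eps              # well depth of the perfect match, in kT
    near_homolog_depth = (n_sites - 1) * eps   # well depth with one mismatched site, in kT
    # Arrhenius-like dwell time of the near homolog, in units of the attempt time
    near_homolog_dwell = math.exp(near_homolog_depth)
    return eps, homolog_depth, near_homolog_depth, near_homolog_dwell

for n in (1, 2, 4, 6):
    eps, full, near, dwell = linear_model_summary(n)
    print(f"N={n}: eps={eps:.2f} kT, homolog={full:.1f} kT, "
          f"one mismatch={near:.1f} kT, relative dwell={dwell:.3g}")
```

In this toy picture only N = 1 leaves the single-mismatch pairing unbound; for larger N the near homolog is held by (N − 1) correct sites and its dwell time grows geometrically with N.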
Some of the problems with realizing accurate recognition in systems at thermodynamic equilibrium were recognized by John Hopfield in the 1970s, inducing him to propose a kinetic proofreading system that requires an irreversible process [16]. In such systems, the energy of the bound state can be very deep without making the energy of the searching state deep because of the irreversible step that transfers the system from the searching state to the bound state. Even in Hopfield's system there is a tradeoff between search speed and accuracy since greater sequence discrimination requires greater unbinding probabilities for homologs. The increased unbinding probability for homologs increases the searching time because the correct binding site must be revisited many times before the homolog makes the irreversible transition to the bound state. Earlier work had proposed that RecA based homology recognition could proceed via kinetic proofreading [5], but homology recognition in vitro is known to proceed without an irreversible step [6][7][8]. In this work, we will consider ...
... loops provide significant mechanical support [3,22]. The model utilizes work presented by deGennes that calculated the force required to shear dsDNA [23]. We extend deGennes' model to the triplet structures in the initial, intermediate, and final dsDNA conformations. In the model the actual three dimensional helical structure is converted into a one dimensional system. In the simple one dimensional model, $L_R$ is the rise between the triplets in the incoming and outgoing strands and a single variable, γ, characterizes the equilibrium spacing between phosphates when the complementary strand is in a particular state. Thus, for a particular state, the difference between the equilibrium spacings is given by $(1 - \gamma)L_R$.

D. Predicted Extension and Energy
As shown in Fig. 3, the extensions of the rises between base pair triplets in a strand bound directly to the protein are given by $v_{N,i}$ and, for a system with $N$ triplets bound to RecA, ... The $u_{N,i}$ specify the extensions of the rises in the complementary strand. At equilibrium, the net force on each $u_{N,i}$ must be zero; therefore, for $j = 2$ to $N$ ... where $R$ and $Q$ are the spring constants for the base pairs and the backbones, respectively.
These values for $R$ and $Q$ may be substantially different from those for individual dsDNA base pairs when the dsDNA is not bound to RecA because of the interactions between the charged phosphates and the protein and because the dsDNA is grouped in triplets where the stacking between the triplets is strongly disrupted; however, it is still likely that $R \ll Q$, as it is in naked dsDNA, since the interactions between the bases on opposite strands are significantly weaker than the interactions between the phosphates in the backbone of the same strand.
The boundary condition on the last triplet $u_{N,1}$ requires that ... The values of $u_{N,i}$ can be found using Equations ??-??. The angle between base pairs and the DNA helical axis is shown as $\theta_{bp}$ in Fig. 3.
In the continuous limit where the discrete subscript $i$ is replaced by a continuous variable ... where $\chi = \sqrt{R/(2Q)}$ is the deGennes length for RecA bound dsDNA and the constant $A$ is found by using the boundary conditions for the ends, yielding ... In the limit where $1/(2\chi) \gg 1$ ...
These assumptions and features lead to a nuanced picture of the distribution of tension during strand exchange. The lattice mismatch between the complementary strand and its pairing partners is largest at the ends of the filament as shown in Fig. 4; therefore, the base pair tension is largest at the ends of the filament. Furthermore, the lattice mismatch at the ends increases significantly with the number of bound triplets as shown in Fig. 4.
Given the values of $u_{N,i}$, the mechanical energy of the system, which is of the form $\frac{1}{2}k(x - x_0)^2$, may be calculated from: ... When $R \ll Q$ this expression simplifies to: ... Using the continuous limit allows us to generate scaling laws for the extension as a function of the total number of bound base pairs. In the continuous limit, the non-linear contribution to the free energy is given by ... Thus, when $\chi \ll 1$ the non-linear energy term has the following scaling ... In the limit where the number of base pairs bound is much less than the deGennes length, so that $\chi N \ll 1$, the non-linear energy scaling is ... Thus, when $N$ is small the energy increases as the square of the number of bound triplets, consistent with the exact results for the discrete case given in equations ??,?? and ??. In contrast, when $N$ is larger than the deGennes length, the non-linear energy term approaches zero and the mechanical energy increases linearly with increasing $N$. In this limit, the base pairs at the center of the filament are no longer under tension. Thus, adding a triplet to the end of the filament effectively adds another triplet to the unstressed center rather than increasing the stress on all of the bound triplets.
The total energy of the system includes the mechanical energy calculated above and the non-mechanical binding energy per RecA monomer, $E_{bind}$. Assuming the free energy of an unbound dsDNA is zero and the free energy gain upon binding a triplet is independent of $N$ and $i$, the non-mechanical contribution to the binding energy for $N$ triplets is $N E_{bind}$.
When the first RecA monomer binds, $E_{total}[1] = E_{bind}$, which is a constant negative value.
In contrast, when $N > 1$, the stress on the molecule yields a total energy of ... which changes in sign and magnitude depending on $L_R$, $R$, $Q$ and γ.
... [21]. The crystal structure shows that the incoming strand is located near the center of the helical DNA/protein structure whereas the residues associated with the binding of the outgoing strand are much farther away from the center, as illustrated in Fig. 1; however, if strand exchange occurs via the base flipping of base pair triplets, the spacing within triplets must be approximately the same for all three strands. Thus, given that the total extension of the outgoing strand backbone is much larger than the total extension of the incoming strand backbone, the rises in the outgoing strand must be much larger than the rises in the incoming strand, as illustrated in Fig. 2.
In the one dimensional model considered here where γ is the only parameter characterizing each dsDNA conformation, γ is smallest for the initial searching state, where the complementary strand is paired with the very highly extended outgoing strand, consistent with experimental results that the dsDNA in the initial bound state has a large differential extension between the outgoing and complementary strands that prevents more base pairs from binding to the filament unless the dsDNA undergoes strand exchange. Experimental results suggest that strand exchange is only marginally stable when ∼ 9 contiguous homologous bp undergo strand exchange [15], suggesting that rapid unbinding will occur for all tested sequences except for a fraction $1/4^{9} \approx 4 \times 10^{-6}$, which represents only ∼ 50 possible positions for a bacterial genome with a length of 10,000,000 bp. All other sequences will rapidly unbind from the RecA filament because the binding energy for the sequence independent searching state is very weak when only ∼ 9 bp are bound and adding more base pairs to the searching state would increase rather than decrease the binding energy.
... for states with all of the triplets in the same conformation rarely represent the free energy of the system; however, it is clear that reductions in the mechanical energy of the bound dsDNA can drive strand exchange for homologs, as we discuss below.

Strand exchange
While it is energetically favorable to add base pairs to the initial bound state for small numbers of base pairs, the quadratic term of $E_{total}$ from Equation ?? rapidly increases as a function of increasing number of base pairs, making the binding of a large number of base pairs in the initial state unfavorable. This is because of the significant tension due to the lattice mismatch between the complementary strand and the outgoing strand; consequently, for the parameters based on simulation, no more than ∼ 15 bp can bind to the filament in the initial searching state because the energy required greatly exceeds $kT$. Thus, once ∼ 15 bp are bound to the filament, the system is in a highly free energetically unfavorable state which will force it to choose between the following: 1. unbinding from the filament; 2. strand exchange, which is unfavorable for non-homologs. Again, the prediction that there will be a checkpoint in the progression of strand exchange from the initial bound state to the intermediate state is insensitive to the model parameters. The parameters only determine whether more than 6 contiguous homologous base pairs are required to progress past the checkpoint.
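The competition between a favorable linear binding term and an unfavorable quadratic mechanical term can be illustrated with a toy calculation. The sketch below assumes the small-N regime in which the mechanical energy grows as the square of the number of bound triplets, and it uses hypothetical values for the binding energy per triplet and for the mechanical coefficient (here named `E_BIND` and `C_MECH`); these are chosen only so that the minimum falls near 3 triplets and are not the parameters of the model.

```python
# Toy total energy: linear binding gain plus quadratic mechanical penalty.
# E_BIND and C_MECH are hypothetical illustration values, in units of kT.
def total_energy(n_triplets: int, e_bind: float, c_mech: float) -> float:
    """Return n * e_bind + c_mech * n**2 for n bound triplets."""
    return n_triplets * e_bind + c_mech * n_triplets ** 2

E_BIND = -3.0  # favorable (negative) non-mechanical energy per bound triplet
C_MECH = 0.5   # coefficient of the quadratic mechanical term

energies = {n: total_energy(n, E_BIND, C_MECH) for n in range(1, 9)}

n_min = min(energies, key=energies.get)                        # most stable number of triplets
n_unfavorable = next(n for n, e in energies.items() if e > 0)  # first N with positive total energy

for n, e in energies.items():
    print(f"N={n} triplets ({3 * n} bp): E_total = {e:+.1f} kT")
print(f"minimum at N={n_min}, binding becomes net unfavorable at N={n_unfavorable}")
```

With a purely linear energy (`C_MECH = 0`) the total energy decreases without bound as triplets are added, so no such minimum or sign change appears.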
Homologs can rapidly progress to complete strand exchange if the weak initial binding holds long enough for ∼ 9 homologous contiguous base pairs to undergo strand exchange, which stabilizes the binding for those homologous bp and allows strand exchange to progress [13,14]. The non-linearity in the free energy makes the strand exchange of consecutive homologous triplets increasingly favorable as long as the number of bound base pairs is ≲ 30; consequently, the non-linearity in the free energy makes strand exchange reversal more improbable as more contiguous homologous base pairs are strand exchanged. Furthermore, the non-linearity makes strand exchange reversal at the center of the filament increasingly unfavorable as the number of bound triplets increases as shown in Fig. 8, while still allowing strand exchange reversal to remain possible at the ends of the filament. Again, these qualitative features are basic properties of the model that are highly insensitive to the model parameters. These qualitative features allow true homologs to progress to complete strand exchange even though non-homologs readily unbind. In contrast, for a system with a linear free energy, the probability that strand exchange will be reversed for a given triplet is independent of the number of other triplets bound and of the position of the particular triplet in the filament. As a result, such systems either suffer rapid unbinding of homologs or strong kinetic trapping in near homologs, as discussed above for the general case of a system with a linear binding energy and greater than 4 binding sites.
It has previously been assumed that the free energy penalty for strand exchange of a triplet is approximately equal to the loss of Watson-Crick pairing for that triplet, with a possible additional factor due to the effect of the mismatch on the pairing of the two neighboring bases which ranges from ∼ 1.5 to ∼ 4 kT [24]. In contrast, for a system with the non-linearity considered here, if the initially bound base pairs contain a single mismatch, then strand exchange may be significantly more unfavorable because the unfavorable free energy contribution due to this mismatch must include not only the Watson-Crick pairing energy for that base pair and its neighbors, but also the increased mechanical stress on the two matched base pairs. This stress not only makes a direct contribution to the free energy penalty, but it can also increase the stacking penalty by distorting the bonds between the two homologous base pairs, which lowers their Watson-Crick pairing energy.
A detailed structural calculation would be required to correctly assess all of these factors.
In what follows, we will assume that the free energy penalty for the strand exchange of a mismatched base is approximately equal to the Watson-Crick pairing loss as long as fewer than 18 bp are bound to the filament.

G. dsDNA Tension Inhibits Progression of Strand Exchange Past a Mismatch
In a system with a linear free energy as a function of the number of bound homologous triplets, adding more homologous triplets is always favorable even if the last triplet added were non-homologous, resulting in enormous kinetic trapping. In contrast, the non-linear energy inhibits binding of additional base pairs after a non-homologous triplet has bound, as illustrated in Fig. 7. The dashed black line shows the curve for a perfect homolog adding a triplet to the initial bound state if all of the other bound triplets have undergone strand exchange. For up to ∼ 18 bp thermal energy is sufficient to bind additional base pairs.
The dashed gray line shows the free energy penalty for adding a homologous triplet if the last triplet added was non-homologous. The free energy penalty is only slightly larger than the penalty for a homolog; however, the solid gray line shows that the penalty for adding a second triplet is very large, even though both triplets added after the non-homolog were in fact homologous. For comparison, the solid black line shows the energetic favorability of strand exchange of a homologous triplet from the initial binding state. This graph shows that the non-linearity makes adding additional triplets to the initial state unfavorable once a mismatched triplet has bound, even when the subsequent base pairs are homologous.

H. Possible Explanations of Biological Results
We have already discussed the proposal that the energetic non-linearity explains why strand exchange is free energetically favorable even though the sequences of the incoming and outgoing strands are the same and the protein contacts in the initial searching ssDNA-RecA filament are similar to those in the final post-strand exchange state.
In addition, experimental results have shown that a rapid initial interaction incorporating ∼ 15 base pairs is followed by a slower progression of strand exchange that occurs in triplets [19]. Figure 6 suggests that the binding of dsDNA to site II is favorable for fewer than 9 bases and requires only a few kT of energy for fewer than 15 bases, whereas for more bases the binding is highly free energetically unfavorable.
Furthermore, FRET based studies indicate that homology recognition may be accurate for short sequences, but inaccurate for longer sequences [17]. A separate study showed that strand exchange pauses at sequence mismatches [25,26], and we argue that such pauses lead to the unbinding of shorter non-homologous sequences because the binding of the dsDNA to the filament occurs sequentially. In the model presented here the pause in strand exchange at a mismatch results from the free energy cost of transferring the non-homologous triplet to the intermediate state as well as the cost of progressing past a mismatched triplet. Spontaneous unbinding of the entire strand exchange product becomes unlikely as the sequence lengthens because so many free energetically unfavorable transitions are required. If the strand exchange product becomes too long, the unbinding time exceeds the recognition time available to the organism; however, as discussed above, regions of accidental homology that extend beyond 18 bp rarely exist in vivo. In vivo, strand exchange does progress through regions of non-homology once a sufficiently long strand exchange product is formed, but ATP hydrolysis is required [27,28].
Finally, it is also well known that in the presence of ATP hydrolysis the size of the strand exchange product increases monotonically until it reaches a limit of $M \sim 80$ bp [22], where $M$ is the number of bound dsDNA base pairs. Strand exchange then continues to progress, but $M$ remains constant because the heteroduplex dsDNA unbinds from the lagging edge of the filament at the same average rate that new dsDNA binds to site II [22,29]. Since the dsDNA can freely unbind from the filament, free energy minimization implies $M$ will remain $\sim M_{\mathrm{freemin}}$. Additional effects associated with dynamics may explain why the strand exchange window moves along the dsDNA with $M \sim M_{\mathrm{freemin}}$ rather than remaining stationary [30].

I. Additional Features in Three Dimensions
In the real RecA system steric factors are associated with the mismatch between the 150 bp persistence length of dsDNA and the strong bending of dsDNA in the 18 bp/turn helical RecA filament. The local rigidity of the dsDNA may play a role in limiting the initial binding length to ∼ 9 bp since that many base pairs can interact with the ssDNA-RecA without significant bending. After some dsDNA triplets are bound, the rigidity may also play a role in preventing non-contiguous triplets from being added to the filament.
The nearest unbound triplet is already very near to the filament because it is attached to the bound triplets by the phosphate backbones which cannot extend much more than 0.5 nm/bp. Thus, the phosphates are in a position to interact strongly with positively charged residues on the protein which can provide sufficient free energy for the required bending. In contrast, the second neighboring triplet will be separated by a larger distance which reduces the interaction with the protein and requires more bending. A detailed structural calculation would be required to correctly evaluate these effects, but both effects would further support rapid and accurate homology recognition. In the simple one dimensional model discussed here, the free energy effects of the bending can be included in the γ for the initial bound state, but the additional degrees of freedom would alter the coupling between the initial bound state γ and the γ for the intermediate state.
In addition, in the final post-strand exchange state interactions with the L1 and L2 loops may be more favorable for homologous triplets than non-homologous triplets due to steric factors. Thus, the final state could have a sequence dependent linear contribution to the free energy that was not considered in this model, but may provide additional homology stringency.
... increases. This non-linearity is important because in systems with more than ∼ 4 binding sites, neither thermodynamic equilibrium nor kinetic proofreading can combine accurate and efficient homology recognition when the energy is a linear function of the number of correct pairings. In contrast, an unfavorable non-linear energy combined with a favorable linear energy due to DNA/protein contacts can promote rapid and accurate homology recognition by making initial sequence independent binding interactions favorable for up to ∼ 9 base pairs, while preventing any additional base pairs from binding unless the bound base pairs include 9 contiguous homologous base pairs. If the initially bound base pairs do not contain 9 contiguous homologous base pairs, adding more base pairs to the filament is highly improbable regardless of whether or not the additional base pairs are homologous. This effect combined with the statistics of the sequence distribution of bacterial genomes implies that all but ∼ 50 out of 10,000,000 possible pairings will rapidly unbind.
In addition, the non-linearity forces the addition of triplets to the filament to proceed sequentially from the initial binding, where adding more than two base pair triplets af-
These features provide much more rapid and accurate homology recognition than systems using linear energies: in systems with linear energies addition and strand exchange of a homologous triplet is always favorable; therefore, in systems with linear energies even short regions of accidental homology can produce substantial trapping times, as demonstrated by both analytical modeling and numerical simulations [14]. Qualitative features of the