Statistical properties of metastable intermediates in DNA unzipping

We unzip DNA molecules using optical tweezers and determine the sizes of the cooperatively unzipping and zipping regions separating consecutive metastable intermediates along the unzipping pathway. The sizes are found to follow a power-law distribution, ranging from one base pair up to more than a hundred base pairs. We find that a large fraction of unzipping regions smaller than 10 bp go undetected because of the high compliance of the released single-stranded DNA. We show how the compliance of a single nucleotide sets a limiting value of about 0.1 N/m for the stiffness of any local force probe aiming to discriminate one base pair at a time in DNA unzipping experiments.

The mechanical response of biomolecules to externally applied forces allows us to investigate molecular free energy landscapes with unprecedented accuracy. Single-molecule experiments with optical tweezers, the atomic force microscope (AFM), and magnetic tweezers can measure forces in the pN range and energies as small as tenths of a kcal/mol. An experiment that nicely illustrates the potential of single-molecule manipulation is molecular unzipping [1][2][3][4][5]. By applying mechanical force to the ends of biopolymers such as DNA, RNA, and proteins, it is possible to break the bonds that hold the native structure together and to measure free energies and kinetic rates. In unzipping experiments, a DNA double helix is split into two single strands by pulling them apart while the force vs. distance curve (FDC) is measured. A typical FDC shows a force plateau around 15 pN with a characteristic sawtooth pattern corresponding to the progressive separation of the two strands. Mechanical unzipping is also mimicked by motor proteins that unwind the double helix. In fact, anticorrelations between unzipping forces and unwinding rates have been found in DNA helicases, suggesting that these enzymes unzip DNA by exerting local stress [6].
DNA unzipping experiments have several applications, such as identifying the specific locations at which proteins and enzymes bind to DNA [5]. Moreover, the strong dependence of the shape of the sawtooth pattern on the sequence might be used for DNA sequencing [7], i.e., to infer the DNA sequence from the unzipping data. A limiting factor in these applications is the accuracy with which base pair (bp) locations along the DNA can be resolved. This accuracy is mainly determined by the combined stiffness of the force probe and the large compliance of the released single-stranded DNA (ssDNA) [1,8]. The unzipping process, even if carried out reversibly (i.e., infinitely slowly), shows a progression of cooperative unzipping-zipping transitions that involve groups of bps of different sizes. These cooperatively unzipping-zipping regions (CUR) breathe in an all-or-none fashion, hiding the details of the individual bps that participate in each transition. Unzipping experiments thus pose challenging questions to both experimentalists and theorists. What is the typical size of the CUR? What is the smallest CUR size that can be detected with single-molecule techniques? Under what experimental conditions might it be possible to resolve large CUR into individual bps? Several DNA unzipping studies have been carried out at controlled force using magnetic tweezers. Because the unzipping transition is abrupt at constant force, that setup is not suitable for answering such questions [3,9].
We carried out DNA unzipping experiments with optical tweezers [10,11] and determined the distribution of CUR sizes in DNA fragments a few kbp long. For the experiments, a 2.2 kbp DNA molecular construct was synthesized [11]. In a typical unzipping experiment, one bead is held fixed at the tip of a micropipette while the other bead is optically trapped, and the force exerted on the molecule is measured. By moving the center of the optical trap at a very low speed (10 nm/s), double-stranded DNA (dsDNA) is progressively and quasi-reversibly converted into ssDNA through a succession of intermediate states corresponding to the successive opening of CUR (Fig. 1a). The measured FDC shows a sawtooth-like pattern (Fig. 1b) that alternates force rips and gentle slopes. The slopes correspond to the elastic response of the molecular setup, i.e., the combined elastic response of the optical trap and the released ssDNA, while the force rips correspond to the release of CUR. In principle, the size of a CUR can be inferred from the difference between the slopes that precede and follow a given force rip. In practice, identifying CUR sizes this way is not straightforward, because the noise in the experimental FDC often prevents the slopes from being isolated. Here we extract the sizes of the CUR that separate contiguous intermediate states along the unzipping pathway using a Bayesian approach: for each experimental data point (distance, force) we determine the most probable intermediate state to which the data point belongs.
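The slope argument can be made quantitative with a minimal sketch. It assumes (a simplification of ours, not the analysis used here) that the trap, the handles and the released ssDNA act as linear springs in series around the current force, and that the ssDNA compliance grows linearly with the number of released bases, as in the FJC model used in [11]. The inverse slope of a branch with n open bps is then 1/k_trap + 1/k_handles + 2n/k_1, where k_1 is the stiffness of one ssDNA base, so a rip changes the inverse slope by 2Δn/k_1. The function names, the handle stiffness and the other numbers are illustrative assumptions.

```python
def branch_slope(n_open, k_trap, k_handles, k_base):
    """Slope df/dx_tot (pN/nm) of the FDC branch with n_open bps released.

    Trap, handles and the 2 * n_open released bases are treated as springs in series
    (an assumed, linearized description)."""
    compliance = 1.0 / k_trap + 1.0 / k_handles + 2.0 * n_open / k_base
    return 1.0 / compliance


def cur_size_from_slopes(slope_before, slope_after, k_base):
    """Number of bps released in a rip, from the slopes that precede and follow it."""
    delta_compliance = 1.0 / slope_after - 1.0 / slope_before
    return delta_compliance * k_base / 2.0


# Illustrative values: 0.06 pN/nm trap (60 pN/um), hypothetical handle stiffness of
# 5 pN/nm, and k_base ~ 113 pN/nm for one ssDNA base near 15 pN.
K_TRAP, K_HANDLES, K_BASE = 0.06, 5.0, 113.0
before = branch_slope(1000, K_TRAP, K_HANDLES, K_BASE)
after = branch_slope(1010, K_TRAP, K_HANDLES, K_BASE)
print(f"slope before: {before:.5f} pN/nm, after: {after:.5f} pN/nm")
print(f"relative change: {100 * (before - after) / before:.2f}%")
print(f"inferred CUR size: {cur_size_from_slopes(before, after, K_BASE):.1f} bp")
```

With these numbers a 10 bp rip after 1000 open bps changes the slope by only about 0.5%, which shows why slope differences alone are hard to exploit on noisy data.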
To this end we consider the molecular system as composed of different elements: the optical trap, the dsDNA handles, the released ssDNA, and the hairpin at the intermediate state $I_n$, where $n$ bases are open. We express the total distance between trap and pipette, $x_{tot}$, at a given force $f$ as the sum of the extensions of each element at that force:

$$x_{tot}(n,f) = x_b(f) + x_h(f) + x_s(n,f) + \phi_b \qquad (1)$$

where $x_b$ is the position of the bead with respect to the center of the optical trap, $x_h$ is the extension of the flanking dsDNA handles, $x_s$ is the extension of the released ssDNA, and $\phi_b$ is the diameter of the bead. The extension of the ssDNA depends on the number of open bases at the intermediate state $I_n$. The different contributions to Eq. (1) are calculated using well-known elastic models for biopolymers [11]. For each experimental data point $(x, f)$ of the FDC, the intermediate state $I_{n^*}$ whose curve passes closest to that point at fixed force $f$ is determined by

$$n^* = \arg\min_n \left| x - x_{tot}(n,f) \right| \qquad (2)$$

In this way each experimental data point $(x, f)$ is associated with a value of $n^*$ (red or gray curve in Fig. 1b). The histogram built from all values of $n^*$ shows a series of sharp peaks that can be identified with the intermediate states $I_n$ (Fig. 1c). The histogram also contains information about the stability of the intermediate states: the higher the peak, the more stable the state and the larger the GC content of that part of the sequence (data not shown). The histogram can be fit to a sum of Gaussians, each characterized by its mean, variance, and statistical weight (Figs. 1d and 2a). Finally, the size of each CUR is obtained as the difference of the means (in bps) between consecutive Gaussians. The experimental distribution of CUR sizes is shown in Fig. 2b. Sizes range from a few bps up to 90 bp, with most detected CUR having sizes between 20 and 50 bp.
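The closest-intermediate rule above (assigning each point to the $n^*$ that minimizes $|x - x_{tot}(n,f)|$ at fixed force) can be sketched as follows. This is a minimal illustration under stated assumptions: the ssDNA term uses the FJC model with the supplement's b = 1.2 nm and d = 0.59 nm, while the handles are replaced by a hypothetical fixed length and the bead diameter is a hypothetical constant. Peak picking on the histogram stands in for the Gaussian fit used in the text, and the data are synthetic.

```python
import math
import random
from collections import Counter

KT = 4.11      # pN nm, thermal energy near room temperature
B_KUHN = 1.2   # nm, ssDNA Kuhn length (value quoted in the supplement)
D_BASE = 0.59  # nm, interphosphate distance (value quoted in the supplement)


def x_base(f):
    """FJC extension (nm) of a single ssDNA base at force f (pN)."""
    y = B_KUHN * f / KT
    return D_BASE * (1.0 / math.tanh(y) - 1.0 / y)


def x_tot_model(n, f, k_trap=0.06, handle_len=50.0, bead_diam=3000.0):
    """Eq. (1) with hypothetical fixed-length handles and bead diameter (nm)."""
    return f / k_trap + handle_len + 2 * n * x_base(f) + bead_diam


def closest_state(x, f, n_max):
    """Integer n in [0, n_max] minimizing |x - x_tot(n, f)|.

    At fixed f the model is linear in n, so rounding gives the exact minimizer."""
    offset = x_tot_model(0, f)
    n = round((x - offset) / (2 * x_base(f)))
    return min(max(n, 0), n_max)


def peak_positions(hist, min_count):
    """Crude stand-in for the Gaussian fit: local maxima of the n* histogram."""
    return [n for n in sorted(hist)
            if hist[n] >= min_count
            and hist[n] >= hist.get(n - 1, 0) and hist[n] > hist.get(n + 1, 0)]


# Synthetic data scattered around three hypothetical intermediates.
rng = random.Random(1)
data = []
for n_true in (100, 128, 171):
    for _ in range(60):
        f = rng.uniform(14.5, 15.5)
        data.append((x_tot_model(n_true, f) + rng.uniform(-0.3, 0.3), f))

hist = Counter(closest_state(x, f, n_max=2000) for x, f in data)
peaks = peak_positions(hist, min_count=10)
print("peaks:", peaks, "CUR sizes:", [b - a for a, b in zip(peaks, peaks[1:])])
```

On real, noisier data the peaks broaden and overlap, which is why a Gaussian fit of the histogram is the more robust way to locate the intermediates.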
To better understand the distribution of CUR sizes, we computed the sequence-dependent free energy profile using a mesoscopic model for DNA based on nearest-neighbor bp interactions that includes the different elements of the experimental setup [12,13]. The model is defined by the total free energy of the system, $G(x_{tot})$, which receives contributions from the partial free energies $G(x_{tot}, n)$ of the intermediates $I_n$:

$$G(x_{tot}) = -k_B T \ln \sum_n \exp\left(-\frac{G(x_{tot},n)}{k_B T}\right) \qquad (3)$$

from which we determine the theoretical FDC through the relation $f(x_{tot}) = \partial G(x_{tot})/\partial x_{tot}$. We used this model to compute the partial free energies $G(x_{tot}, n)$ of all intermediates $I_n$. For a given value of $x_{tot}$ we identify the most stable intermediate $I_{n^*}$, corresponding to the value $n^*$ for which $G(x_{tot}, n)$ is the absolute minimum [i.e., $G(x_{tot}, n^*) \le G(x_{tot}, n)\ \forall n$]. As $x_{tot}$ is varied continuously, the integer $n^*$ changes in a stepwise manner according to the scheme

$$\cdots \rightarrow n^*_a \rightarrow n^*_b \rightarrow n^*_c \rightarrow \cdots$$

where $n^*_a$, $n^*_b$, $n^*_c$ indicate the number of open bps of consecutive intermediates. The differences between consecutive values of $n^*$ give the sizes of the CUR. The resulting size distributions are shown in Fig. 2b. The good agreement between the experimental and theoretical size distributions shows that our method of analysis is capable of discriminating the metastable intermediates during unzipping. Two features of Fig. 2b are remarkable. First, the mesoscopic model predicts a large fraction of CUR smaller than 10 bp that are not observed experimentally. Second, the size distributions are not smooth but rough, in agreement with the prediction of the mesoscopic model. To check the generality of these results, we repeated the same analysis by unzipping a different and longer molecular construct of 6.8 kbp (Figs. 2c, 2d). The agreement between experiment and theory remains good. Again, a large fraction of the predicted CUR smaller than 10 bp is not detected [11].
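The steps described above (combine the partial free energies into a total free energy, differentiate it to get the FDC, take the minimum-energy intermediate at each $x_{tot}$, and read CUR sizes off the jumps of $n^*$) can be sketched as below. It assumes the usual Boltzmann-sum form $G = -k_BT \ln \sum_n e^{-G(x_{tot},n)/k_BT}$, evaluated with the log-sum-exp trick for numerical stability. The partial free energies in the example (a harmonic trap plus made-up cumulative base-pair energies) are hypothetical and are not the nearest-neighbor model of the text.

```python
import math


def total_free_energy(partial, kT=4.11):
    """G = -kT ln sum_n exp(-G_n / kT), via the log-sum-exp trick (pN nm)."""
    g_min = min(partial)
    s = sum(math.exp(-(g - g_min) / kT) for g in partial)
    return g_min - kT * math.log(s)


def force_curve(G_of_x, xs, h=1e-3):
    """f(x_tot) = dG/dx_tot, estimated by central finite differences."""
    return [(G_of_x(x + h) - G_of_x(x - h)) / (2 * h) for x in xs]


def most_stable_states(partials_by_x):
    """n* = argmin_n G(x_tot, n) for each x_tot of the sweep."""
    return [min(range(len(g)), key=g.__getitem__) for g in partials_by_x]


def cur_sizes(n_star):
    """Sizes of the jumps between consecutive distinct values of n*."""
    sizes, last = [], n_star[0]
    for n in n_star[1:]:
        if n != last:
            sizes.append(n - last)
            last = n
    return sizes


# Hypothetical partial free energies: harmonic trap (k in pN/nm, d in nm) plus
# made-up cumulative base-pair free energies G_n in pN nm.
K, D = 0.06, 0.59
G_BP = [0.0, 12.0, 5.0, 20.0, 18.0, 30.0]


def partial(x):
    return [0.5 * K * (x - 2 * D * n) ** 2 + g for n, g in enumerate(G_BP)]


xs = [0.5 * i for i in range(400)]
n_star = most_stable_states([partial(x) for x in xs])
forces = force_curve(lambda x: total_free_energy(partial(x)), xs)
print("CUR sizes:", cur_sizes(n_star))  # -> [2, 2, 1]
print("max force: %.2f pN" % max(forces))
```

In this toy input the states with one and three open bps are never the most stable ones, so the sweep jumps 0 → 2 → 4 → 5 and the two-bp CUR appear.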
However, the CUR size distributions are now smoother, suggesting that a monotonically decreasing continuous distribution could describe the CUR sizes in the thermodynamic (infinite DNA length) limit. The fact that CUR sizes follow a long-tailed distribution indicates that large sizes occur with finite probability. However, large CUR hide their internal DNA sequence, limiting the possibility of sequencing DNA by mechanical unzipping. Under what experimental conditions is it possible to break up large CUR into individual bps?
To answer this question we developed a toy model that helps elucidate the mathematical form of the CUR size distribution. Similar distributions have been investigated in the context of DNA thermal denaturation [14,15] and DNA unzipping experiments in the constant-force ensemble [9]. Our model contains only two elements: the bead in the optical trap and the DNA construct to be unzipped. The latter is composed of the DNA duplex and the released ssDNA (Fig. 3a). The optical trap is modeled as a harmonic spring with energy

$$E_b = \frac{1}{2} k x_b^2$$

The DNA duplex is modeled as a one-dimensional random model with bp free energies $\epsilon_i$ along the sequence [16]. The free energy of a given intermediate $I_n$ is

$$E_{DNA}(n) = -\sum_{i=1}^{n} \epsilon_i$$

where the $\epsilon_i$ are random variables with mean $\mu$ ($<0$) and standard deviation $\sigma$ (other, more realistic energy distributions give similar results). The released ssDNA is taken as inextensible: its extension is $x_m = 2dn$, where $d$ is the interphosphate distance, $n$ is the number of open bps, and the factor 2 accounts for the two strands of ssDNA. Using the relation $x_b = x_{tot} - 2dn$ (Fig. 3a), the total energy of the system can be written as

$$E(x_{tot}, n) = \frac{1}{2} k (x_{tot} - 2dn)^2 - \sum_{i=1}^{n} \epsilon_i \qquad (4)$$

At fixed $x_{tot}$, the system occupies the state $n^*$ that minimizes the total energy, i.e., $E(x_{tot}, n^*) \le E(x_{tot}, n)\ \forall n$. The function $n^*(x_{tot})$ determines the energy at the minimum, $E_m(x_{tot}) = E(x_{tot}, n^*)$, and the FDC, $f(x_{tot}) = \partial E_m(x_{tot})/\partial x_{tot}$. The FDC obtained from this model reproduces the experimentally observed sawtooth pattern (Fig. 3b). Equation (4) can be approximated by neglecting the disorder and taking $\epsilon_i = \mu\ \forall i$, which gives

$$E(x_{tot}, n) \approx \frac{1}{2} k (x_{tot} - 2dn)^2 - \mu n \qquad (5)$$

Minimizing Eq. (5) with respect to $n$ (treated as a continuous variable) immediately gives

$$n^*(x_{tot}) = \frac{x_{tot}}{2d} + \frac{\mu}{4kd^2}, \qquad E_m(x_{tot}) = -\frac{\mu x_{tot}}{2d} - \frac{\mu^2}{8kd^2}, \qquad f = -\frac{\mu}{2d}$$

These expressions capture the dependence of the average number of open bps, the energy, and the force on the parameters $\mu$, $k$, and $d$ [11]. Finally, we numerically computed the CUR size distribution and found that it depends mostly on $\sigma$ and $k$.
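A minimal sketch of the toy model, assuming Gaussian bp energies (the text leaves the distribution open) and the supplement's values d = 0.59 nm, μ = −1.6 kcal/mol and σ = 3.2 kcal/mol. Instead of sweeping $x_{tot}$ on a grid, it uses an exact shortcut (our choice, not necessarily the authors' numerics): expanding Eq. (4) gives $E = \tfrac12 k x_{tot}^2 + h_n - 2dk\,x_{tot}\,n$ with $h_n = 2kd^2n^2 + G_n$, so the states visited over the full range of $x_{tot}$ are exactly the vertices of the lower convex hull of the points $(n, h_n)$, and CUR sizes are the gaps between consecutive vertices.

```python
import random
from collections import Counter

KCAL = 6.9477  # pN nm per kcal/mol (4184 J / Avogadro's number)


def toy_free_energies(n_bp, mu, sigma, seed=0):
    """G_n = -sum_{i<=n} eps_i, eps_i ~ Gaussian(mu, sigma) (Gaussian is an assumption)."""
    rng = random.Random(seed)
    g = [0.0]
    for _ in range(n_bp):
        g.append(g[-1] - rng.gauss(mu, sigma))
    return g


def visited_states(g, k, d):
    """States n* visited as x_tot sweeps, for E = k/2 (x_tot - 2dn)^2 + G_n.

    E = k x_tot^2 / 2 + h_n - (2 d k x_tot) n with h_n = 2 k d^2 n^2 + G_n, so the
    minimizers are the vertices of the lower convex hull of (n, h_n)."""
    hull = []
    for n, gn in enumerate(g):
        hn = 2.0 * k * d * d * n * n + gn
        while len(hull) >= 2:
            (n1, h1), (n2, h2) = hull[-2], hull[-1]
            # Drop the middle point if it lies on or above the chord (n1,h1)-(n,hn).
            if (h2 - h1) * (n - n1) >= (hn - h1) * (n2 - n1):
                hull.pop()
            else:
                break
        hull.append((n, hn))
    return [n for n, _ in hull]


def cur_size_distribution(n_bp, mu, sigma, k, d, seed=0):
    """Histogram of CUR sizes (gaps between consecutive visited states)."""
    states = visited_states(toy_free_energies(n_bp, mu, sigma, seed), k, d)
    return Counter(b - a for a, b in zip(states, states[1:]))


for k in (0.06, 1.0, 10.0, 100.0):  # pN/nm; 0.06 pN/nm = 60 pN/um
    dist = cur_size_distribution(10_000, -1.6 * KCAL, 3.2 * KCAL, k, 0.59)
    total = sum(dist.values())
    print(f"k = {k:6.2f} pN/nm: {total} CUR, largest {max(dist)} bp, "
          f"single-bp fraction {dist[1] / total:.2f}")
```

Two properties follow directly from the hull construction: without disorder (σ = 0) every state is visited, so all CUR have one bp, and for a fixed sequence increasing k can only split CUR, never merge them.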
For several combinations of $\sigma$ and $k$ we simulated $10^4$ realizations (i.e., sequences) of $10^4$ bp each, keeping $d$ and $\mu$ constant. The size distributions are excellently fit by a power law with a superexponential cutoff [11]:

$$P(n) = A\, n^{-B} \exp\left[-\left(\frac{n}{n_c}\right)^{C}\right] \qquad (6)$$

where $P(n)$ is the probability of observing a CUR of size $n$, and $A$, $B$, $C$, and $n_c$ are fit parameters.

What is the limiting factor in detecting small CUR? Figs. 1c, 2a, and 2c, as well as Figs. S14 and S15 in [11], show that the histograms become smoother as the molecule is progressively unzipped. The increasing compliance of the molecular setup as ssDNA is released markedly decreases the resolution with which intermediates can be discriminated. In fact, for the 6.8 kbp construct we found that along the first 1500 bp of the hairpin only 30% of all CUR smaller than 10 bp are detected, whereas beyond that point no CUR smaller than that size is discriminated. If we define the threshold size $n_{thr}$ as the CUR size above which 50% of the predicted CUR are experimentally detected, we find that $n_{thr}$ increases linearly with the number of open bps, which sets a limit of about 10 bp on the smallest CUR size that we can detect (Fig. 2d, inset). What is the limiting factor in resolving large CUR into single bps? Only by applying a local force at the opening fork (thereby avoiding the large compliance of the molecular setup) and by increasing the stiffness of the probe might it be possible to shrink the CUR size distributions down to a single bp [8]. Figures 3c and 3d show how the CUR size distributions shrink and the largest CUR size decreases as the stiffness increases. The stiffness should be around 50-100 pN/nm for all CUR sizes to collapse into a single bp. Remarkably, this number is close to the stiffness expected for an individual DNA nucleotide stretched at the unzipping force [11]. Any probe more rigid than that will not do better.
Similarly to the problem of atomic friction between AFM tips and surfaces, we can define a parameter $\eta$ (the ratio between the rigidities of the substrate and the cantilever) that controls the transition from stick-slip to continuous motion [17]. For DNA unzipping we have $\eta = |\mu|/(k d^2)$, where $\mu$ is the average free energy of formation of a single bp, $k$ is the probe stiffness, and $d$ is the interphosphate distance. The value $\eta = 1$ marks the boundary below which ($\eta < 1$) all CUR have a size of one bp. In our experiments $\eta \approx 500$, and reaching the boundary $\eta = 1$ would require $k \sim 100$ pN/nm, consistent with Figs. 3c and 3d. It is remarkable that the elastic properties of ssDNA lie just at the boundary that allows one-bp discrimination. This suggests that molecular motors that mechanically unwind DNA can locally access the genetic information one bp at a time [11].
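The parameter $\eta = |\mu|/(kd^2)$ is easy to evaluate numerically. The sketch below assumes the toy-model values of the supplement (μ = −1.6 kcal/mol, d = 0.59 nm) and a trap stiffness of 60 pN/µm; it converts kcal/mol to pN nm with the exact SI values of the calorie and of Avogadro's number.

```python
AVOGADRO = 6.02214076e23
KCAL_TO_PN_NM = 4184.0 / AVOGADRO * 1e21  # 1 kcal/mol is about 6.95 pN nm


def eta(mu_kcal, k, d):
    """eta = |mu| / (k d^2); mu in kcal/mol, k in pN/nm, d in nm."""
    return abs(mu_kcal) * KCAL_TO_PN_NM / (k * d * d)


def stiffness_for_eta(target, mu_kcal, d):
    """Probe stiffness (pN/nm) at which eta equals `target`."""
    return abs(mu_kcal) * KCAL_TO_PN_NM / (target * d * d)


mu, d = -1.6, 0.59  # kcal/mol and nm, toy-model values from the supplement
print(f"eta for a 60 pN/um trap: {eta(mu, 0.06, d):.0f}")
print(f"stiffness at eta = 1: {stiffness_for_eta(1.0, mu, d):.0f} pN/nm")
```

With these inputs η comes out at roughly 530, in line with the η ≈ 500 quoted above, and η = 1 is reached at a few tens of pN/nm, the same order of magnitude as the ∼100 pN/nm estimate.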
In summary, we have measured the distribution of sizes of the cooperatively unzipping regions of DNA. A toy model reproduces the experimental results and can be used to infer the experimental conditions under which unzipping proceeds one bp at a time. This is achieved when the stiffness of the probe is higher than 100 pN/nm, which coincides with the stiffness of one base of ssDNA at the unzipping force.

Another oligonucleotide that forms a tetraloop (5'-acta-3') is also annealed at the other end of the 2.2 kb (6.8 kb) fragment. The handles were labeled with biotin and digoxigenin, which bind specifically to coated polystyrene beads. Figure S1 shows the resulting sequences.

Measurement calibration and data acquisition
The instrument has a force resolution below 1 pN, which represents about 6% uncertainty at the mean unzipping force (

Statistics and reproducibility of measurements
Six different molecules were analyzed for the 2.2 kb and the 6.8 kb DNA sequences.
Fig. S2 shows force-distance curves measured for three different molecules of the 2.2 kb and 6.8 kb sequences. As can be seen, our measurements are reproducible.
Slight differences between different traces are due to the variability of the molecular setup and instrumental drift effects.

About using force-distance curves (FDCs) instead of force-extension curves (FECs)
We define the distance $x_{tot}$ as the length between the bead in the micropipette and the center of the optical trap (see Fig. 1a for an illustration of how $x_{tot}$ is defined and Eq. (1) for a mathematical expression). This quantity is measured directly by the instrument as the optical trap is moved up and down along the fluidics chamber. It is the control parameter of the experiment, i.e., the variable that does not fluctuate and that determines the statistical ensemble (what we call the mixed ensemble). Since we know the trap stiffness ($k$) and measure the total distance ($x_{tot}$) and the force ($f$), it is straightforward to convert the force-distance curve into a force-extension curve using the relation $x_m = x_{tot} - f/k$, where $x_m$ is the molecular extension. As shown in Fig. S4, we do not observe any significant difference when computing the histogram from an FEC or from an FDC.

curves for many DNA sequences is beyond our capabilities. However, to estimate that error we can use the toy model to determine the expected standard deviation of the CUR size distributions (see Fig. S7).
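The FDC-to-FEC conversion $x_m = x_{tot} - f/k$ from the preceding section amounts to subtracting the bead displacement $f/k$ point by point. A minimal sketch, assuming distances in nm, forces in pN and the trap stiffness in pN/nm; the readings below are hypothetical.

```python
def fdc_to_fec(x_tot, forces, k_trap):
    """Convert an FDC into an FEC via x_m = x_tot - f / k_trap.

    Distances in nm, forces in pN, k_trap in pN/nm. Any constant offset, such as
    the bead diameter, is assumed to be already handled by how x_tot is defined."""
    if k_trap <= 0:
        raise ValueError("trap stiffness must be positive")
    return [x - f / k_trap for x, f in zip(x_tot, forces)]


# Hypothetical readings with a 60 pN/um (0.06 pN/nm) trap.
x_tot = [1200.0, 1210.0, 1221.0]
forces = [15.0, 14.4, 15.3]
print(fdc_to_fec(x_tot, forces, 0.06))
```

Because the trap is soft, the bead displacement (about 250 nm at 15 pN here) is a large part of $x_{tot}$, which is why force noise translates into sizeable extension noise after the conversion.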

Discrepancies between experimental and theoretical CUR size distributions
Discrepancies between the experimental results and the mesoscopic model are attributed to two factors: 1) small CUR are missed because of the limited instrumental resolution, as described in the paper; 2) medium and large CUR sizes are prone to large errors because CUR smaller than 10 bp are seldom detected. Indeed, the power law describing the CUR size distributions indicates that most CUR are small. However, if a small CUR is missed, medium or large CUR will be overcounted, since they should have been split into smaller pieces

the discontinuous opening of base pairs (i.e., the CUR). In contrast, equation (5) in the main text is an approximation that ignores the sequence dependence. Its solutions are smooth expressions that capture the average behavior of the system over an ensemble of sequences (i.e., realizations of the disorder). Figure S8 shows the approximate solution superimposed on one disorder realization.

parameters. By varying the parameters of the model over a wide range we observe how the shape of the CUR size distribution changes.
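The comparison between the smooth approximation of Eq. (5) and a single disorder realization can be reproduced with a short brute-force sketch. It assumes Gaussian bp energies and the supplement's parameter values; the smooth solution is $n^* = x_{tot}/(2d) + \mu/(4kd^2)$, and the exact minimizer of Eq. (4) is found by scanning all $n$. The seed and sequence length are arbitrary illustrative choices.

```python
import random

KCAL = 6.9477  # pN nm per kcal/mol


def toy_free_energies(n_bp, mu, sigma, seed=0):
    """G_n = -sum_{i<=n} eps_i with eps_i ~ Gaussian(mu, sigma) (assumed distribution)."""
    rng = random.Random(seed)
    g = [0.0]
    for _ in range(n_bp):
        g.append(g[-1] - rng.gauss(mu, sigma))
    return g


def exact_nstar(x_tot, g, k, d):
    """Brute-force minimizer of Eq. (4) over n = 0 .. len(g) - 1."""
    return min(range(len(g)), key=lambda n: 0.5 * k * (x_tot - 2 * d * n) ** 2 + g[n])


def mean_field_nstar(x_tot, mu, k, d):
    """Smooth solution of Eq. (5): n* = x_tot / (2d) + mu / (4 k d^2)."""
    return x_tot / (2 * d) + mu / (4 * k * d * d)


mu, sigma, k, d = -1.6 * KCAL, 3.2 * KCAL, 0.06, 0.59  # pN nm, pN nm, pN/nm, nm
g = toy_free_energies(600, mu, sigma, seed=7)
for x in range(200, 701, 100):
    print(f"x_tot = {x} nm: exact n* = {exact_nstar(x, g, k, d):4d}, "
          f"mean field = {mean_field_nstar(x, mu, k, d):7.1f}")
```

The exact $n^*$ advances in jumps (the CUR) and scatters around the smooth line, whereas with σ = 0 it always stays within half a bp of it.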
In all our simulations we kept $d = 0.59$ nm and $\mu = -1.6$ kcal/mol constant, since the distribution of CUR sizes depends only weakly on them. Therefore we only varied $\sigma$ (the standard deviation of the random distribution of energies) and $k$ (the stiffness of the optical trap). We simulated sequences of $10^4$ base pairs and generated $10^4$ realizations for each value of $\sigma$ and $k$.

Dependence on σ
We fixed the trap stiffness at $k = 60$ pN/µm. The distribution of CUR obtained for each value of $\sigma$ is shown in Fig. S9. The data were fit to Eq. (6) in the main text, yielding a set of four parameters ($A$, $B$, $C$, $n_c$) for each value of $\sigma$. Fig. S10 shows the dependence of these parameters on $\sigma$.

Dependence on k
We fixed the amount of disorder at $\sigma = 3.20$ kcal/mol. Figure S11 shows the distribution of CUR for some values of $k$ and their fits to Eq. (6) in the main text. Note that in the low-$k$ range the CUR size distributions are wide and have good enough statistics to extract the values of $A$, $B$, $C$, $n_c$. However, for $k > 5$ the CUR size distributions are too narrow to be reliably fit to Eq. (6). Figure S12 shows the dependence of the four parameters ($A$, $B$, $C$, $n_c$) on the trap stiffness.
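One way to extract the four parameters is sketched below. It assumes that Eq. (6) has the form $P(n) = A\,n^{-B}\exp[-(n/n_c)^C]$ (a power law with a superexponential cutoff, consistent with the four parameters named here) and fits $\ln P$ by least squares. For fixed $(C, n_c)$ the model is linear in $(\ln A, B)$, so that part is solved in closed form and only $(C, n_c)$ is scanned on a grid; this profile approach is our illustrative choice, not necessarily the fitting procedure used for Figs. S10 and S12, and the test parameters are hypothetical.

```python
import math


def log_model(n, a, b, c, n_c):
    """ln P(n) for P(n) = A n^-B exp[-(n / n_c)^C] (assumed form of Eq. (6))."""
    return math.log(a) - b * math.log(n) - (n / n_c) ** c


def fit_power_law_cutoff(sizes, probs, c_grid, nc_grid):
    """Least-squares fit of ln P(n); returns (A, B, C, n_c).

    For fixed (C, n_c), ln P + (n/n_c)^C = ln A - B ln n is a straight line in ln n,
    solved by ordinary linear regression. Empty bins (P = 0) must be removed first."""
    xs = [math.log(n) for n in sizes]
    m = len(xs)
    mx = sum(xs) / m
    sxx = sum((x - mx) ** 2 for x in xs)
    best = None
    for c in c_grid:
        for n_c in nc_grid:
            ys = [math.log(p) + (n / n_c) ** c for n, p in zip(sizes, probs)]
            my = sum(ys) / m
            slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
            log_a = my - slope * mx
            rss = sum((y - log_a - slope * x) ** 2 for x, y in zip(xs, ys))
            if best is None or rss < best[0]:
                best = (rss, math.exp(log_a), -slope, c, n_c)
    return best[1:]


# Synthetic check with hypothetical parameters (not values fitted in the paper).
true = (0.3, 1.2, 2.0, 25.0)
sizes = list(range(1, 61))
probs = [math.exp(log_model(n, *true)) for n in sizes]
grid_c = [1.0 + 0.25 * i for i in range(9)]  # 1.0 ... 3.0
grid_nc = [5.0 * i for i in range(1, 13)]    # 5 ... 60
print(fit_power_law_cutoff(sizes, probs, grid_c, grid_nc))
```

For narrow distributions (large k) only a few bins are populated, and the cutoff exponent and the power-law exponent become nearly degenerate in this fit, which matches the remark above about unreliable fits at high stiffness.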

STIFFNESS OF ONE NUCLEOTIDE
Here we calculate the expected stiffness of one nucleotide of ssDNA. The numerical value is obtained from the elastic response of the Freely Jointed Chain (FJC) model, whose extension vs. force curve is

$$x_s(f) = L_0 \left[\coth\left(\frac{b f}{k_B T}\right) - \frac{k_B T}{b f}\right]$$

where $x_s$ is the extension, $f$ is the force applied at the ends of the polymer, $L_0$ is the contour length, $b$ is the Kuhn length, $k_B$ is the Boltzmann constant, and $T$ is the temperature. For a polymer, the contour length can be written as the number of monomers ($n$) times the length of one monomer ($d$):

$$L_0 = n d$$

For ssDNA, $n$ is the number of bases and $d$ is the interphosphate distance of one nucleotide. The FJC model assumes that the elastic response of the polymer scales with the number of bases; the extension vs. force expression is therefore a homogeneous function of the number of bases (the extension is proportional to $n$). The stiffness of the polymer at each stretching force is the derivative of the force with respect to the extension, $k_s(f) = df/dx_s = (dx_s/df)^{-1}$. For the FJC model this gives

$$k_s(f) = \left[ n d \left( \frac{k_B T}{b f^2} - \frac{b}{k_B T}\, \mathrm{csch}^2\!\left(\frac{b f}{k_B T}\right) \right) \right]^{-1}$$

Using the parameters from section 2.1 ($b = 1.2$ nm, $d = 0.59$ nm) for one nucleotide ($n = 1$) we get a stiffness of $k_s = 113$ pN/nm at $f = 15$ pN and $k_s = 127$ pN/nm at $f = 16$ pN (see
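The single-nucleotide stiffness can be checked numerically from the FJC expressions above. This sketch assumes T = 298.15 K (the temperature is not stated in this section) and uses the quoted b = 1.2 nm and d = 0.59 nm; energies are in pN nm so that stiffnesses come out in pN/nm.

```python
import math

KB = 1.380649e-23   # J/K, Boltzmann constant
T = 298.15          # K, assumed room temperature
KT = KB * T * 1e21  # thermal energy in pN nm (1 pN nm = 1e-21 J)


def fjc_extension(f, n=1, b=1.2, d=0.59):
    """FJC extension x_s (nm) of n bases at force f (pN): n d [coth(bf/kT) - kT/(bf)]."""
    y = b * f / KT
    return n * d * (1.0 / math.tanh(y) - 1.0 / y)


def fjc_stiffness(f, n=1, b=1.2, d=0.59):
    """k_s = (dx_s/df)^-1 in pN/nm, using d/dy coth(y) = -1/sinh(y)^2."""
    y = b * f / KT
    compliance = n * d * (KT / (b * f * f) - (b / KT) / math.sinh(y) ** 2)
    return 1.0 / compliance


for f in (15.0, 16.0):
    print(f"f = {f:.0f} pN: k_s = {fjc_stiffness(f):.0f} pN/nm")
```

Under these assumptions the script gives about 113 and 127 pN/nm; since 1 pN/nm = 10⁻³ N/m, this is the ≈0.1 N/m limit quoted in the abstract. Because $k_s$ scales as $1/n$, two bases are already only half as stiff.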

PROTEIN-DNA INTERACTION
In the cell, the function of helicases is to unzip DNA during the replication process.
Although their mechanochemistry is not clear [5], we assume that helicases pull directly on the ssDNA. In this simplified view of the process, we visualize the helicase as a clamp that slides along one strand of the DNA and applies a local force at the unzipping fork. More generally, the helicase applies force on the DNA through an effective stiffness $k_{eff}^{-1} = k_h^{-1} + k_s^{-1}$, where $k_h$ is the stiffness of the helicase and $k_s$ is the stiffness of one base of ssDNA. From the conclusions of our work, we know that $k_s$ is high enough to locally unzip DNA one bp at a time. Therefore, unzipping will proceed one bp at a time as long as $k_h$ is higher than $k_s$. Indeed, when the helicase pulls directly on DNA, its stiffness can be assumed to be very large (proteins are very rigid objects) compared to the stiffness of a single ssDNA base ($k_h^{-1} \ll k_s^{-1}$), and the effective stiffness between the helicase and the DNA is approximately equal to the stiffness of ssDNA ($k_{eff} \sim k_s$).
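The series-spring relation $k_{eff}^{-1} = k_h^{-1} + k_s^{-1}$ can be made concrete with a few numbers. The sketch takes $k_s \approx 113$ pN/nm (the FJC estimate for one base at 15 pN) and hypothetical helicase stiffnesses expressed as multiples of $k_s$.

```python
def effective_stiffness(k_h, k_s):
    """Springs in series: 1/k_eff = 1/k_h + 1/k_s (all in pN/nm)."""
    return 1.0 / (1.0 / k_h + 1.0 / k_s)


K_S = 113.0  # pN/nm, FJC estimate for one ssDNA base at 15 pN
for ratio in (1, 10, 100):  # hypothetical helicase-to-base stiffness ratios
    k_eff = effective_stiffness(ratio * K_S, K_S)
    print(f"k_h = {ratio:3d} k_s -> k_eff = {k_eff:.1f} pN/nm ({k_eff / K_S:.0%} of k_s)")
```

Equal stiffnesses halve the effective value, while $k_h$ roughly ten times larger than $k_s$ already brings $k_{eff}$ to about 90% of $k_s$, the regime in which $k_{eff} \sim k_s$ holds.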
The previous explanation can be extended to proteins that interact with DNA. If a protein increases the stiffness of one base of ssDNA, local unzipping could still proceed one bp at a time. On the other hand, if a protein decreases the ssDNA stiffness below the boundary $k_s \sim 100$ pN/nm, local unzipping would show CUR larger than one bp. As far as we know, no highly compliant protein is bound to the ssDNA between the helicase and the unzipping fork once the replication complex (helicase, polymerase, etc.) is assembled. However, the full picture of what might happen in different biological systems under varied conditions remains to be seen.

CUR AND GENES
As additional information for the reader, here we show the positions of the genes that are

Figure S15 shows the location of these genes superimposed on the histogram of the number of unzipped base pairs. It can be clearly seen that most of these genes span several rips.
We should not expect correlations between the CUR and the genes because the CUR depend on the trap stiffness used in the experimental setup. In other words, a different trap stiffness produces a different distribution of CUR on the same DNA molecule.