Self-Assembly of Informational Polymers by Templated Ligation

The emergence of evermore complex entities from prebiotic building blocks is a key aspect of origins of life research. The RNA-world hypothesis posits that RNA oligomers known as ribozymes acted as the first self-replicating entities. However, the mechanisms governing the self-assembly of complex informational polymers from the shortest prebiotic building blocks were unclear. One open issue concerns the relation between concentration and oligonucleotide length, usually assumed to be exponentially decreasing. Here, we show that a competition of timescales in the self-assembly of informational polymers by templated ligation generically leads to nonmonotonic strand-length distributions with two distinct length scales. The first length scale characterizes the onset of a strongly nonequilibrium regime and is visible as a local minimum. Dynamically, this regime is governed by extension cascades, where the elongation of a “ primer ” with a short building block is more likely than its dehybridization. The second length scale appears as a local concentration maximum and reflects a balance between degradation and dehybridization of completely hybridized double strands in a heterocatalytic extension-reassembly process. Analytical arguments and extensive numerical simulations within a sequence-independent model allowed us to predict and control these emergent length scales. Nonmonotonic strand-length distributions confirming our theory were obtained in thermocycler experiments using random DNA sequences from a binary alphabet. Our work emphasizes the role of structure-forming processes already for the earliest stages of prebiotic evolution. The accumulation of strands with a typical length reveals a possible starting point for higher-order self-organization events that ultimately lead to a self-replicating, evolving system. DOI:


I. INTRODUCTION
A key question in research on the origins of life is how structure and biochemical complexity could emerge from unstructured conditions on early Earth.One of the most well-known hypotheses in this context is that of an "RNA world" [1][2][3][4][5][6].In this scenario, RNA oligomers acted as both the carrier of information and "ribozymes," i.e., catalytic molecules allowing for the replication of this information and other metabolic functions.Yet, the RNA-world hypothesis does not address the question of how RNA strands that are complex enough to act as functional ribozymes came into being [7][8][9][10][11].In the light of evolutionary principles, a multistep scenario of self-organization seems plausible, cf.Fig. 1.However, the intermediate steps on the way toward functional polynucleotides are still not well understood.
The smallest ribozymes known today are 50-100 nucleotides (nt) long [12][13][14].A common view is that the reliable self-assembly of replicating RNA molecules required specific sequences of 20-30 nt in length [15][16][17][18].It has been shown that selective (Watson-Crick) base pairing can lead to a vast reduction of complexity in sequence space, a phenomenon called cooperative ligation [19,20].Moreover, a recent hypothesis suggests that a replicating catalytic network would emerge as a "virtual circular genome," which self-assembles from an initial distribution of short oligonucleotides [16].
The efficiency and viability of such catalytic networks strongly depend on the relative concentrations of oligonucleotides of different lengths.Commonly exponential length distributions are assumed [16,21].While a decay in length is natural, the exact shape of length distributions emerging from short building blocks is not known [16].Models and experiments have reported different observations [11,19,22].
Importantly, for low concentrations, the concatenation of oligonucleotides is dominated by a process known as templated ligation [11,[23][24][25].Unlike random ligation, where two oligonucleotides directly combine into a longer strand [26,27], templated ligation involves a third strand, cf.Fig. 1.The third strand, also called the "template," enables the covalent bonding of two other strands adjacently hybridized on the template.Thus, the self-assembly of oligonucleotides cannot be captured by standard polymerization theory, where exponential length distributions are well understood [28,29].
An alternative for providing the energy for bond formation is using a ligase, which can be either an RNA-based ribozyme or a modern protein.The latter is not prebiotically plausible.However, these enzymes drastically increase yields and reaction speeds.While ligases require oligomers of lengths between six and ten nucleotides, enzymatic systems can still serve as conceptual models to explore the principles of selfassembly and ligation-based early replication [8,19].
In order to study the self-assembly from smallest building blocks, we employed a computational and analytical approach based on a minimal "bottom-up" model.A transition between two dynamical regimes featuring differently decaying distributions has been reported recently [22].These results were obtained within a coarse-grained, deterministic model, which does not capture the entire complexity associated with templated ligation.
A general yet simple theory identifying the generic properties of the self-assembly from shortest oligomer building blocks has been missing.The goal of this work is to close this gap.To this end, we investigated a model that captures the elementary mechanism of self-assembly: the hybridization of strands to form arbitrary complexes on which templated ligation can occur.To focus on the assembly process alone, the dependence on oligonucleotide sequences was neglected: The binding energy of a hybridization site is proportional to its length and characterized by a single parameter γ, reflecting a typical binding energy per nucleotide, which emerges naturally in mean-field descriptions like the "random sequence approximation" [22].
Our main result is that the competition of timescales between (length-dependent) dehybridization, extension, and a degradation or observation timescale generically leads to a nonmonotonic strand-length distribution.We show that different dynamic processes govern different regions in the space of strand lengths.The boundaries between these regions are given by a local minimum at a length L min and a local maximum at L max > L min , which can be approximated by two analytical length scales L Ã ∼ L min and L † ∼ L max .This accumulation of strands at the typical length scale L max constitutes a novel structureforming process.Many of the microscopic details only enter the theory via a single parameter that characterizes the effective rate of extension.This allowed us to apply our theory to experiments, where a nonmonotonic length distribution emerges from the enzymatic ligation of a random pool of DNA sequences in a thermocycler.

A. State of the art
Previous theoretical work largely studied templated ligation by effective models.The description of the state space had been reduced to strand lengths, without taking into account the hybridization complexes explicitly [20,22,24,[42][43][44][45][46][47][48][49].In such a coarse-grained picture, (de) hybridization and templated ligation are not elementary reactions but are combined into an effective extension reaction.To specify the corresponding rate, the intricacies of the assembly process are neglected and a priori assumptions regarding the relevant configurations are made [20,24,[42][43][44][45].Many models neglect the dependence of the binding energy on the number of paired nucleotides [20,24,[42][43][44][45]47,48].Others consider a length-dependent dehybridization rate only up to some cutoff length such that the timescale of ligation is always much larger than the timescale of the dehybridization kinetics [22].A study addressing the full complexity of the assembly was limited by small system sizes [50].Strands and complexes.-Thebasic element of our dynamics is a directed oligomer called a strand, which consists of covalently linked nucleotides.All linear conformations that can arise from a set of strands are allowed; see Fig. 2(b).Only self-folding and branched hybridization structures are excluded.While these effects might become important for longer strands, they can be neglected when dealing with the self-assembly from short strands.The overlapping region between two strands is referred to as a hybridization site.Single strands are called m-mers.We explicitly refer to monomers, dimers, trimers, and tetramers for m ¼ 1, 2, 3, 4.
Elementary processes and parameters.-Theinternal elementary reactions are hybridization, dehybridization, and templated ligation; see Fig. 2(c).Hybridization and dehybridization are assumed to be elementary and reversible reactions with rates r on and r off .Thermodynamic consistency [51,52] connects r on and r off to the free energy ΔG ∘ b of a hybridization site: where β ¼ ðk B TÞ −1 , k B is Boltzmann's constant and T denotes the absolute temperature, N A is the Avogadro constant and c ∘ ¼ 1 mol=l is the standard concentration.When two strands of length L 1 and L 2 are hybridized adjacently on a third strand, they may ligate to a new strand of length L 1 þ L 2 .The ligation rate r lig is assumed to be independent of L 1 , L 2 , the directionality of the strands, and microscopic details.The uniform ligation rate can be interpreted as an effective average.A more detailed model could reflect that short oligomers predominately ligate to 3 0 ends [16,53] due to the underlying chemistry [54,55] or include stalling effects [39,56,57].Since template-free ligation is a much slower process than templated ligation [23], it is neglected.Moreover, two external reactions connect the system to its environment, cf.Fig. 2(d).(i) A coupling to a reservoir fixes the concentrations c m of m-mers with m ∈ R. (ii) Each complex exits the system at a constant rate r out .
Thermodynamics and kinetics of hybridization.-Thebinding energy ΔG ∘ b of a hybridization is assumed to be directly proportional to the length of the binding site l, see Fig. 2 where γ < 0 is a parameter that gives the (negative) binding energy per unit length in units of k B T. Equation (1) thermodynamically constrains the ratio of r on and r off .An additional kinetic parameter is needed for a full parametrization of the rates.Here, we use a constant rate of collision between two complexes r coll ¼ ðVN A c ∘ t 0 Þ −1 , where t 0 ¼ ðr 0 Þ −1 is a microscopic, intensive, collision timescale; see Sec.S1 of the Supplemental Material (SM) [58].All times are measured in units of t 0 .In general, two colliding complexes can form multiple configurations via Θ distinct hybridization channels; see Fig. 2(e).The probability of choosing each of these channels is assumed to be equal: Hence, the hybridization rate via a given channel is For the dehybridization rate we obtain from Eq. (1) In reality, the collision rate depends on the properties of the colliding complexes, the solvent, and temperature.
A parametrization where the binding energy γl is attributed to the dehybridization rate r off is a common kinetic assumption, and has been confirmed experimentally [59][60][61].The kinetic assumptions Eqs.(4) and ( 5) reduce the computational complexity, while still maintaining the sampling of all configurations thermodynamically consistent; see Sec.S1 in the SM [58].
In addition to this standard model, we also consider a "bounded" variant, where the dehybridization rate cannot become smaller than a minimal rate r cut , such that r off ¼ r cut if e γl =Θ < r cut .The bounded model can be thought of as an effective implementation of a system subjected to an external mechanism causing dehybridization of all complexes with a timescale of τ ∼ ðr cut Þ −1 .Such a situation can be realized by the thermal cycling in a "thermal trap" situated in a hydrothermal vent or be the consequence of other naturally occurring cycles [22,[62][63][64][65][66].
Standard parameters.-Ourprimary focus is a scenario where the building blocks entering from the reservoir are dimers only.If not indicated otherwise, c 2 ¼ 2 mM.The volume V is chosen such that 10 4 single-stranded dimers are present.This dimer-only scenario is the simplest model allowing for templated ligation and makes analytical considerations easier.If not otherwise stated, γ ¼ −0.5, r lig ¼ e −6 , and r out ¼ 5 × 10 −9 .In the bounded model r cut is a further parameter.

III. SIMULATION RESULTS AND ANALYSIS
The main observable in this work is the length distribution of strands ρðLÞ.It expresses the concentration of strands of length L, irrespective of the complexes they belong to.
A. Self-enhancing catalysis leads to long-tailed distributions Self-assembly via templated ligation is a self-enhancing mode of growth, where long strands facilitate their own formation.This process competes with degradation.For large outflux rates, strands remain inside the reaction volume only for short times and participate in few or even no templated ligations.The resulting stationary length distribution is therefore expected to be short tailed.
In contrast, for a small outflux rate, strands spend more time inside the system and thus have a higher chance to serve as a template or to get ligated, leading to the formation of longer strands.These longer strands again allow for larger hybridization sites and are better templates.Consequently, we expect the existence of a crossover value for the outflux rate r out ¼ r c out , where the formation of longer strands is dominantly self-enhancing.In the Appendix A, we derive the value of the crossover rate, under the assumption that (i) short-tailed distributions are dominated by the smallest building blocks and (ii) timescales of the dehybridization of these building blocks are small compared to the timescale of ligation.We probed the stationary distribution for various values of the outflux rate r out .Simulation results for the standard model are shown in Fig. 3(a).Figure 3(b) gives the analogous results for the bounded model.
Since the derivation of Eq. ( 6) does not rely on the dynamics of long strands affected by the cutoff, we expect the same transition from short-to long-tailed distributions in both scenarios.For sufficiently large outflux rates the resulting short-tailed length distributions look quantitatively similar.The curves for the crossover outflux rate r c out ¼ 3.24 × 10 −7 obtained from Eq. ( 6) are indicated as dashed (orange) lines.The long-tailed distributions for small outflux rates differ significantly: In the standard model, Fig. 3(a), a local minimum and maximum emerge.
In contrast, the long-tailed distributions in the bounded model, Fig. 3(b), decay monotonically.This behavior is rationalized in the right-hand column of Fig. 3, where we sketch the dependence of the (effective) rates of the processes affecting the strand length.The crucial effective growth process is the extension reaction, i.e., hybridization of a third strand followed by ligation.The effective rate is denoted by r ext .In the unbounded model, the dehybridization rate r off intersects the horizontal lines corresponding to constant extension and outflux rates at two distinct length scales L Ã and L † .This already hints at the two emergent length scales L min and L max in the length distribution.This intersection does not occur for the bounded model.
An analogous argument to Eq. ( 6) for the transition from long to short tails was made in Ref. [22].There, the authors studied assembly in a model, where strands break by cleavage.The crucial difference between their work and our unbound model is that in their model ligation is always the slowest process.

B. Competition of timescales enables extension cascades and persisting complexes
We now focus on the standard model without effective thermal cycling and with a sufficiently low outflux rate.It already became clear that the nonmonotonic behavior stems from complexes for which dehybridization is not necessarily the fastest process.If the binding energy of a duplex is close to zero, it dehybridizes quickly.In contrast, if the binding energy has a large absolute value, the duplex is stable and the extension with a third strand becomes probable.The extended complex is even more stable and another extension becomes even more probable.We call this phenomenon an extension cascade.
Disregarding dehybridization and outflux for now, an extension cascade only stops when no further extension is possible.In our model this is only the case for a fully hybridized duplex consisting of two maximally overlapping strands of the same length.These duplexes persist for long times.The fate of such a long-lived complex is determined by either dehybridization or outflux.
Structure of complexes.-Wepartition complexes into different classes by distinguishing between single strands, duplexes, and higher-order complexes, cf.Fig. 4(a).We further subdivide duplexes according to "parity": Fully hybridized duplexes have zero parity.In contrast, duplexes with even or odd overhangs have even or odd parity.Note that in the dimer-only model, mixed parities are excluded, because all strand lengths are even.
Extension cascades only reach a terminal fully hybridized duplex when they start from even duplexes.Duplexes with odd parity will undergo quasi-infinite extension cascades.Figures 4(a) and 4(b) show the partitioned length distribution.Short strands are mostly single stranded.In contrast, the concentration peak is dominated by fully hybridized duplexes.The effect of quasi-infinite extension cascades is visible in the tail.Higher-order complexes are  ) for all lengths.In both models, the length distributions develop long tails when decreasing the outflux rate r out .The orange (dashed) curves correspond to a system where the outflux rate takes the crossover value r out ¼ 3.24 × 10 −7 , cf.Eq. ( 6).For outflux rates below the transition value, the unbounded model exhibits a nonmonotonic length distribution with a local minimum and maximum at L min and L max .less abundant and do not contribute significantly to the shape of the distribution.
The minimum at L ¼ L min is due to the increase of the concentration of fully hybridized duplexes at a characteristic length scale L Ã ≲ L min , which we derive below.Figure 4(c) shows that L Ã is the typical length scale on which duplexes become stable enough for extension cascades to start.
Kinetics of duplexes.-Sincethe dehybridization rate depends on the length of the hybridization site, it connects timescales to length scales.As such, the characteristic scales L Ã and L † also divide the length distribution into different dynamical regimes.Since the length distribution is dominated by single strands and duplexes, we now consider the kinetics of duplexes in detail.
A duplex consisting of strands S 1 and S 2 with lengths L 1 and L 2 is fully characterized by the 3-tuple D ≔ ðL 1 ; L 2 ; o 1 Þ.The number o 1 is the (positive or negative) overhang of strand S 1 on its 3 0 end; see Fig. 2(f).When the two strands collide, they can form Applying this to Eq. ( 5), the dehybridization rate becomes First, we formally derive the onset of extension cascades at L Ã : Hybridization of a short m-mer can occur on one of the two nonzero overhangs o i , i ∈ ð1; 2Þ, of the duplex D and results in a triplex T i .If the m-mer is subsequently ligated to its neighboring strand, we call the combined process an extension.In that case, the length of the hybridization site of the m-mer with the duplex is z i ¼ minðjo i j; mÞ.To calculate an effective rate for this process, we assume that the dynamics of the m-mer hybridization is fast compared to the ligation rate and to the dehybridization rate of the duplex D. Consequently, the concentrations of the duplex D and the triplex T i can be assumed to be at a binding equilibrium and we obtain With this, we define the effective extension rate with an m-mer as the ratio of the rate of ligations from that triplex and the duplex concentration, i.e., r ext;m ¼ r lig c T =c D .Using Eq. ( 8) and taking into account that there are generally two ligation sites (i ¼ 1, 2), the extension rate reads Hence, the extension rate with a short m-mer is The ratio of r ext and r off gives the condition for the onset of extension cascades for the duplex D, 1 < r ext ðDÞ=r dupl off ðDÞ.As dimers are the most abundant species, we approximate r ext ðDÞ ≳ r ext;2 ðDÞ, yielding the lower bound: To be more systematic, we now consider a system containing only strand lengths smaller or equal to some fixed value L 0 .We then determine the minimal L 0 such that duplexes appear which can undergo extension cascades.Using Eqs. ( 7) and ( 9) we write the ratio in Eq. (11) as This ratio is largest for symmetric duplexes with The two duplex configurations maximizing the ratio are thus the odd duplex D AE1 ¼ ðL 0 ; L 0 ; AE1Þ and the even duplex D AE2 ¼ ðL 0 ; L 0 ; AE2Þ.The smallest L 0 , for which extension cascades are possible, defined as L Ã , are found by solving which yields L Ã ≈ 16.2.As the shortest building blocks are dimers, L Ã • ¼ ⌈L Ã ⌉ is calculated by ceiling L Ã to the next even integer, i.e., L Ã • ¼ 18.For strong binding, i.e., γ < −1, the subexpontial length dependence which enters via the channel number Θ can be neglected.To leading order one then has where we made the dependence of the microscopic kinetic parameter r 0 explicit.The distinct peak in the strand-length distribution is caused by fully hybridized duplexes ðL; L; 0Þ being end points of extension cascades.These duplexes persist until they dehybridize or leave the system.This gives rise to two different fates depending on their length.For L smaller than some critical value L † , r dupl off ðL; L; 0Þ > r out , duplex production in the stationary state is mostly balanced by dehybridization.For long duplexes with L > L † , we have r dupl off ðL; L; 0Þ < r out , and hence the stationary concentration is mostly determined by a balance of their production with the outflux.The outflux rate r out is independent of L, whereas r off decreases exponentially with L. We thus expect the existence of two different regimes where the stationary concentration of the fully hybridized duplexes exhibits a different dependence on L. We can find the length where the dehybridization rate becomes smaller than the outflux rate by Solving this equation numerically for the standard parameters, we obtain L † ¼ 30.07.Ceiling to the next even integer yields L † ▴ ¼ ⌈L † ⌉ ¼ 32.As above, we may ignore the logarithmic kinetic dependence on the length for strong binding and obtain From Fig. 4(a) we see that L † coincides with the position of the maximum L max , whereas L Ã serves as a proxy for the position of the minimum L min .
In Appendix B we perform an extensive screening of the parameter space demonstrating that the analytical estimates Eqs. ( 13) and ( 15) are generally valid.Moreover, the transient behavior of the length distribution in a closed system is discussed in Appendix C. There, the global transient observation time τ obs limits the maximal lifetime of any complex and thus plays the same role as the global degradation timescale r −1 off in an open system.

C. Monomer-dimer mixtures
So far, we have studied systems using dimers as initial building blocks.While this made our analytical considerations easier, only strands of even length appeared in the system.These strands enabled infinite extension cascades starting from duplexes with odd parity.As a result, the length distribution had a heavy tail; see Fig. 4(a).
Figure 5 shows the length distribution for a reservoir containing monomers and dimers at a total building block concentration of c tot ¼ c 1 þ c 2 ¼ 2 mM.The fraction of monomers f m ≔ c 1 =c tot is set to 70%.Now, infinite extension cascades are suppressed and the long tail collapses.The partitioning of complexes into various substructures shows that fully hybridized duplexes again dominate the tail of the distribution.As above, duplexes with finite overlap are distinguished by the parity of their overhangs, with the addition of mixed parity duplexes, having different parities at the different sites.
The general understanding of the characteristic features of the length distribution developed above remains valid.Repeating the calculations leading to Eq. ( 13) for the onset of extension cascades with the combined extension rate for both monomers and dimers leads to the same equation, with the dimer concentration c 2 replaced by the total concentration c tot ; see Appendix D. The position of the maximum does not depend on the building block concentration, cf.Eq. (15).Hence, the peak in Fig. 5 is roughly at the same position as in the dimer-only system.We confirm the validity of this result by probing different monomer fractions f m in Appendix D.

D. Growth of complexes
The length scale L † relates the dehybridization to the outflux timescale.However, its role in the dynamics is not straightforward.We now show that L † is a typical scale where self-enhancing processes leading to the growth of strands and complexes break down.
Trajectories of stable duplexes.
-In what follows, we investigate the trajectories of extension cascades starting from stable duplexes until they leave the system in a fully hybridized configuration.We sample trajectories from the steady state of the monomer-dimer system with a monomer fraction of f m ¼ 70%; see Fig. 5. Our sampling algorithm is consistent with the events that occur in a steady state and is explained in Sec.S2 of the SM [58].
An initial stable duplex consists of a long strand of size L long and a short strand of size L short ≤ L long .It has an overlap l initial and a length C initial ¼ L long þ L short − l initial , cf.Fig. 6(a).These stable duplexes are the starting point for extension cascades and eventually become fully hybridized duplexes of length C final ≥ C initial .If the length of the initial complex is the same as that of the final complex, C final ¼ C initial , no extension occurs beyond the length of the original duplex and we speak of pure primer extension.In contrast, if C final > C initial , processes occurred that extended the length of the initial duplex and we speak of duplex extension.A detailed look at extension cascades involving duplex extension can be found in Appendix E.
Figure 6(b) shows the joint probability distribution pðC initial ; l initial Þ.It is maximal for C initial ∼ L † and l initial ∼ L Ã .The accumulation of probability at this point characterizes a typical initial configuration, but does not determine the length of the individual strands.
Figure 6(c) shows pðL long ; L short Þ.We see that it is restricted to the lower triangle defined by L Ã ≲ L ≲ L † .The boundaries of this region reflect our analysis above: Strands shorter than L Ã typically do not bind strongly enough to start extension cascades.In contrast, strands longer than L † are mostly double stranded and thus not available to form the FIG. 5. Partitioned length distribution for a system coupled to a reservoir containing monomers and dimers at a monomer fraction of f m ¼ 0.7.In contrast to Fig. 4, virtually all strands with L > L Ã belong to a fully hybridized duplex.initial duplexes.The approximately uniform behavior of the distribution in that region indicates that no particular combination of strand lengths is preferred.
Next, we consider the length C final of the fully hybridized duplex that marks the end of an extension cascade.Figure 6 (d) shows the joint probability distribution pðC final ; C initial Þ.The diagonal line C initial ¼ C final indicates pure primer extension and has a total weight of ∼17%.The point Finally, Fig. 6(e) shows pðC final ; L long Þ.Purely autocatalytic processes, where the long strand facilitates the formation of itself, are contained on the diagonal line L long ¼ C final .These autocatalytic trajectories constitute a fraction of about 2.5% of all extension cascades.
Autocatalytic trajectories (L long ¼ C final ) are a subset of trajectories with pure primer extension (C initial ¼ C final ).Most extension cascades, however, lead to fully hybridized duplexes that are longer than either of the two strands of the initial complex.In order to emphasize the cooperative effects, we refer to this more general process as heterocatalytic growth.
Catalytic growth and reassembly.-Autocatalytic and heterocatalytic cycles are formed by combining extension cascades with a dehybridization of the final duplex and eventual reassembly.Figure 7(a) illustrates an autocatalytic cycle, while Fig. 7(b) shows heterocatalytic growth.Note that in general heterocatalytic processes involve duplex extensions.
In the following we investigate how such catalytic cycles shape the strongly nonequilibrium regime L Ã ≤ L ≤ L † of the strand-length distribution: After a fully hybridized duplex is reached at the end of an extension cascade it will dehybridize or leave the system.If it dehybridizes, it may hybridize to another single strand and create a new stable duplex with a new overhang.By this reassembly, long strands catalyze the formation of other long strands.The reassembly probability p ra is mostly determined by the competition between outflux and dehybridization.A sigmoidal dependence on the length C final follows: The reassembly probablity p ra decays exponentially with the C final .Thus, the production rate of longer strands C final ∼ L † is drastically reduced.In summary, the strongly nonequilibrium catalytic strand growth is constrained to strand length between L Ã and L † .It is this self-enhancing dynamical behavior which leads to the increased production of strands with lengths L Ã < L < L † .This effect directly yields a region in the strand length distribution, where concentration increases with length.To the right of the peak at L ∼ L † , catalytic cycles producing longer strands are too slow in order to compete with the outflux rate r out .Similarly, in the analogous transient situation, the observation time τ obs is too short to allow the catalytic production of strands beyond a certain length.

IV. EXPERIMENTAL SYSTEM
To test our theory experimentally, we used DNA strands of length L bb ¼ 12 as basic building blocks in a closed volume.The strands have random sequences drawn from a binary alphabet of A (adenine) and T (thymine).As discussed above and in Appendix C, in closed systems the global transient observation time τ obs plays the same role as r −1 off in an open system.Enzyme-free templated ligation is slow and not compatible with experimental timescales [19].Ligases speed up the assembly process, but require the formation of complexes involving at least three strands with L⪆12 and the ligase.The probability of finding such complexes decreases with temperature.Further, the ligase activity itself is temperature dependent, resulting in a nontrivial temperature dependence of the effective extension rate.In isothermal systems, one may encounter a stalemate situation.For high temperatures, the extension rate is small since the formation of the required complexes is thermally suppressed.In contrast, for low temperatures, the dehybridization rate is small and the system is effectively frozen.This stalemate can be resolved using temperature cycles [8,19]: During the cold phase, the extension rate is initially high until virtually all possible ligations in existing complexes have occurred.Hence, the hot phase is required to create new ligatable complexes.However, temperatures in the hot phase must still be such that the binding energy γ remains negative.Only then is the binding energy still proportional to the overlap length, such that the competition of timescales gives rise to a nonmonotonic length distribution.

A. Experimental method and results
Our experiment was performed using a Taq DNA ligase from New England BioLabs and a ThermoFisher ProFlex PCR system to generate the temperature profile shown in Fig. 8(a).This setup is similar to the setup used in Ref. [8].The analysis of the length distributions is done by running the sample in a polyacrylamide gel electrophoresis, poststaining the DNA with intercalating SYBR gold dye, and taking fluorescent images of the gel in a BioRad ChemiDoc MP.Concentration quantification is performed with a custom software extracting the lane intensity from gel CCD images (see Sec. S3 D in the SM [58]).The bands visible at lengths of 16 and 24 nt for all lanes in Fig. 8(b) are artifacts from the ligation buffer and DNA synthesis, respectively.
We analyzed the length distribution for various observation times τ obs for different isothermal conditions and cycling scenarios, where temperature alternated between T cold ¼ 33 °C at variable temperature T hot , cf.Fig. 8(a).Isothermal experiments resulted in no product formation within 60 and 116.5 h, cf.Sec.S3 of the SM [58].For low temperatures, even short duplexes with strands of length L bb cannot separate.For high temperatures, the extension is suppressed because no stable ligatable complexes are formed.
Cyclic conditions led to different product distributions, shown in Fig. 8(b).The length distribution decayed quickly for T hot ¼ 50 °C, while it decayed slowly for T hot ¼ 58 °C.All length distributions showed a nonmonotonic behavior exhibiting a local minimum L min between 24 and 48 nt and a maximum L max between 36 and 72 nt.For higher dissociation temperatures the peak was found to be flatter and wider.The shape of the distribution changed significantly in a limited range for T hot .

B. Effective theory
In order to understand the behavior of the experimental system when varying the temperature T hot in the hot phase, we consider the thermodynamics of the standard Gibbs free energy ΔG ∘ .It enters our theory as the central temperature-dependent binding parameter ΔG ∘ =ðk B TÞ.To leading order in T, the Gibbs energy can be written as ΔG ∘ ¼ ΔH ∘ − TΔS ∘ , where the standard enthalpy ΔH ∘ and standard entropy ΔS ∘ are temperature-independent microscopic parameters [70,71].
The most significant effects occur when the binding energy changes sign at the critical temperature T c ¼ ΔH ∘ =ΔS ∘ .Assuming that ΔH ∘ and ΔS ∘ scale linearly with the length of the hybridization site, T c is independent of length.For our experiment, we estimated the expected value of T c between 60°C and 75°C (see Sec. S3 A in the SM [58]).At positive binding energy, i.e., for T hot > T c , strands of all lengths dissociate quickly in the hot phase.We are then in a situation akin to the bounded model discussed in Sec.III A. In order to observe a nonmonotonic distribution, the dehybridization (and thus the reassembly) rate in the hot phase must decay exponentially with strand length, which requires T hot < T c .For our effective theory we employ a linear expansion of the binding parameter γ below the critical temperature T c : In this formula, ΔG °1 is a typical binding energy per nucleotide.The parameter ξ has units of temperature and characterizes the (inverse) slope of γðTÞ around T c .From ΔG °1 ¼ ΔH °1 − TΔS °1 it follows that ξ ¼ −k B T 2 c =ΔH °1 ≈ 30 K for typical enthalpies and entropies (see Sec. S3 A in the SM [58]).
Unlike T c , the parameter ξ is inversely proportional to ΔH °1 and thus depends on strand length.Our simple model is based on effective binding energies of (self-complementary) nucleotides.It is therefore questionable whether the value of ξ ≈ 30 K obtained from standard libraries for matching nucleotides is appropriate here.Since hybridization sites also contain mismatches, the correct value of ξ describing the experimental behavior is likely smaller.Under the assumption that a nucleotide has a probability of 1=2 to encounter its complement, an adjusted value of ξ ≈ 15 K seems reasonable.Finally, for a full parametrization of the dehybridization rate r off ðL; T hot Þ ∼ r 0 exp½γðT hot ÞL, we need to specify the collision rate r 0 .While the exact value of this rate depends on microscopic details, experimental evidence suggest that a value of r 0 ¼ 10 6 s −1 is reasonable [72][73][74].
Figure 9 shows the length dependence of the dehybridization rate r off for various values of T hot below T c ¼ 62 °C and ξ ¼ 13 K.One should recall that the extension rate r ext is the effective rate at which a duplex binds to a third strand and subsequently ligates.As explained initially, extensions are likely to happen only in the cold phase.Because of frustrated dehybridization, we expect a single extension per duplex per cycle.Consequently, the extension rate determining L Ã is given by r ext ∼ τ −1 cyc .In transient systems without outflux, the inverse observation time τ −1 obs ¼ N cyc τ cyc replaces r out in determining L † .
For τ cyc ¼ 180 s and N cyc ¼ 1000, we obtain the two horizontal lines in Fig. 9.The intersections with the lengthdependent dehybridization rate determines the scales L Ã and L † as a function of T hot .The big dots and triangles denote the values L Ã • and L L ▴ obtained by ceiling to the next integer multiple of L bb .
We observe that the scales L Ã and L † shown in Fig. 9 agree well with the experimental observations for L min and L max shown in Fig. 8.The exact values of L Ã and L † depend on the exact values for the parameters r 0 , ξ, and T c , for which we used reasonable estimates.As such, they should not be confused with rigorous predictions.Nonetheless, both the order of magnitude and the qualitative dependence of experimental results on T hot are fully captured by our effective theory.
However, we expect the effective theory to break down close to the critical temperature T c .For small γ, the contributions to the dehybridization rate due to microscopic details play a more prominent role.Further, when approaching T c , the characteristic features of the distribution shift to larger lengths.Then, both the experimental timescales and the overall oligonucleotide mass become limiting and the depletion of building blocks starts to play a role.Moreover, the quantitative evaluation of the gel plots is more difficult for long strands, cf.Fig. 8(a).For this case, other effects, like self-folding of strands, may be an important mechanism that is absent from the theory, cf.Ref. [8].

V. SUMMARY AND DISCUSSION
Since major transitions in evolution appear to have occurred when smaller entities were coming together to FIG. 9. Temperature dependence of the emerging length scales in the effective theory.Solid lines: dehybridization rate r off ¼ r 0 e γðT hot ÞL for various values of T hot .The inset shows the linear function γðTÞ, Eq. ( 18), with T c ¼ 62 °C and ξ ¼ 13 K. Horizontal dashed lines denote the effective extension rate τ −1 cyc and the inverse observation time τ −1 obs .As in Fig. 3(a), the competition of timescales determines the scales L Ã and L † as the intersection between the solid and dashed lines.They are mapped to the observable length scales L min (circles) and L max (triangles) by ceiling the intersection point to the next multiple of L bb ¼ 12.By approaching T c , the binding energy and thus the slope become smaller in magnitude, and the intersection points move to larger lengths.
form larger ones [75], a multistep scenario toward increased complexity also seems natural in a prebiotic context.While the importance of templated ligation in this scenario is clear [76], the assembly dynamics emerging from this interaction of short building blocks were not fully understood previously.
Using a minimal bottom-up model, we showed that a nonmonotonic length distribution arises from the competition of three timescales or, equivalently, the corresponding rates.
(1) The dehybridization rate r off which decreases exponentially upon increasing strand length L, with the decay determined by the binding parameter per nucleotide γ.
(2) An effective extension rate r ext of a duplex, which is determined by the ligation rate r lig , γ, and the concentrations of building blocks.(3) A global timescale determined by either the outflux rate r out or an observational time τ obs .The competition between r ext and r out determines whether we see long-tailed distributions at all: If r out is larger than r ext , ligations are unlikely.The competition between r off and r ext leads to the emergence of extension cascades at a typical length scale L Ã : As soon as strands in a hybridization complex have a length such that r ext > r off , they undergo extension cascades resulting in persistent configurations, which cannot extend further.The fate of such a configuration is determined by the competition between r off and r out : Fully hybridized duplexes shorter than L † dehybridize before leaving the system.The single strands released in this way subsequently act as templates in newly formed primer-template complexes and thus catalyze further strand growth.
The combination of extension cascades and reassembly represents (auto or hetero)catalytic cycles producing longer strands from shorter building blocks.In the strand-length distribution, this strongly nonequilibrium regime is visible as an increase in concentration with length.Extension cascades are fast.Therefore, the dehybridization time of the fully hybridized duplex at the end of the cascade determines the completion-time scale of these cycles.For strand lengths where this timescale becomes comparable to transient or global degradation times, these catalytic cycles have no time to complete, and the length distribution decays.
The validity of this scenario was revealed using a stateof-the-art simulation.To our knowledge, no comparable simulation is available to this date.As our experiments demonstrate, the emerging length scales can be tuned by changing environmental parameters such as the melting temperature T hot without changing the chemistry.On early Earth, strands of a characteristic scale L † emerging from the self-assembly could act as building blocks of a higher level of organization.Moreover, length-dependent accumulation of such strands might trigger novel effects like phase transitions [64,65,77,78].
Advantages and shortcomings of our model.-Ourminimal model allowed us to reveal universal features of the self-assembly process and to derive analytical expressions for the emerging length scales.Yet, there are several aspects that our model does not capture.It does not allow for secondary structures like hairpins and other "nonproductive" configurations [8,16].While these (potentially functional) structures will likely be important in later stages of evolution, in the current scenario they probably have the same role as fully hybridized duplexes.
The strongest simplification in this work is the negligence of any explicit sequence dependence for the binding energy, i.e., the use of self-complementary nucleotides.However, an effective self-complementary description of hybridization arises naturally in a mean-field random sequence approximation [22].It assumes that differences in binding energies between the (complementary and noncomplementary) nucleotide pairs are small with respect to the average binding energy γ per nucleotide.
While this scenario constitutes the extreme of vanishing sequence selectivity, a comparable situation arises in the other limit of perfect selectivity.There, only fully complementary strands bind, and a similar description is achieved using a combinatorial factor for each nucleotide, which can be incorporated by a reduction of γ or using effective concentrations (see also Sec.S1 H of the SM [58]).During extension cascades, the incorporation of mismatches is suppressed, since building blocks matching the template bind stronger and thus lead to higher extension rates.Additional stalling effects for nonmatching short building blocks likely enhance this effect [39,56,57].Moreover, since ligation reactions are irreversible in our model, it corresponds to the high dissipation limit of sequence copying, which generally increases fidelity, cf.Ref. [79].Consequently, we expect that in a fully sequence-dependent model, where mismatches are penalized, the maximum in the length distribution is still present and caused by fully hybridized duplexes with comparatively few errors. Outlook.
-Our work provides a first step toward understanding the emergence of structure in a kinetically and thermodynamically consistent bottom-up approach.Extending our algorithm to include sequence-dependent parameters can be a starting point for future studies.
In any explicitly sequence-dependent model, the timescales involved in the extension-reassembly process would include the sequence of the strand in addition to its length [19].In particular, such a model extension would allow for a more direct study of evolutionary processes in sequence space.As discussed above, sequence selectivity and thus replication may arise during extension cascades.The combined heterocatalytic and autocatalytic nature of the assembly process emphasizes the importance of cooperation, cf.Sec.III D. Therefore, model extensions could also provide a testing ground for abstract frameworks involving catalytic networks [20,[80][81][82]

APPENDIX A: ESTIMATION OF THE OUTFLUX RATE AT THE TRANSITION FROM SHORT-TO LONG-TAILED DISTRIBUTIONS
The transition from short-tailed to long-tailed distributions occurs when the direct production of long strands from reservoir strands is balanced by the production involving long strands as templates, cf.Fig. 10.In the following, the corresponding crossover value of the outflux rate r out in the dimer-only model, Eq. ( 6), is derived.
Consider the total concentration ρ > of strands with a length larger than two, i.e., strands not provided by the reservoir.In a steady state we have where ϕ is the concentration flux indicating processes by which ρ > grows, namely the formation of tetramers from dimers.Notice that the formation of strands with L ≥ 4 does not change ρ > .In general, this templated ligation can happen in all triplex configurations with two dimers that are adjacently hybridized.Ignoring higher-order complexes, we assume that the dominant contribution to the production of longer strands arises from a ligation reaction happening at triplexes consisting of two dimers and a templating strand of length L ≥ 2; see Fig. 10.
As the hybridization dynamics of dimers are fast, we assume a biding equilibrium.This means that the ratio of the concentration of a triplex and its constituents is determined by its binding energy.
With the elementary rates for hybridization and dehybridization defined in Sec.II B, the binding energy of a complex C is given by where we sum over all hybridization sites.The term σ lnð2Þ is a "symmetry penalty" that occurs if the complex is rotationally symmetric (σ ¼ 1) and is zero (σ ¼ 0) otherwise (see Sec. S1 G in the SM for details [58]).Using Eq. (A2), the ligation flux for triplexes consisting of dimers only is ϕ 2 ¼ ðc 2 Þ 3 e −2γ r lig ; see Fig. 10(a).In contrast, the ligation flux corresponding to templates of length m > 2 is where we took into account the different configurations of the relevant triplexes; see Fig. 10(b).We separate the ligation flux into two components, ϕ ¼ ϕ 2 þ ϕ > .The first term, ϕ 2 , only involves the building blocks provided by the reservoir.In contrast, the second term, ϕ > ≔ P m>2 ϕ m , involves longer strands.The transition occurs when the latter dominates the former.
Assuming that the length distribution is dominated by single strands, we approximate ρ > ≈ P m>2 c m , to obtain an expression (lower bound) for ϕ > as In the stationary situation the balance equation (A1) is which is solved by the crossover value ρ > ¼ ρ c > .In this approximation, autocatalysis starts to dominate the production of longer strands from the background when In terms of the outflux rate this means that autocatalysis dominates if To verify that our results of Sec.III B are indeed generic, we performed an extensive parameter sweep.In each row of Fig. 11, a single parameter is varied while all the other parameters are fixed at their standard values.The left-hand column in Fig. 11 shows simulated stationary length distributions.The right-hand column presents the analytical expressions for the (ceiled) values of L Ã and L † with the characteristic lengths L min and L max from the simulation result.A colored curve in the lefthand panel corresponds to the accordingly colored marker in the right-hand panel.The tails of the distributions are smoothed using a standard running-average smoothing algorithm.
Figure 11(a) shows the result for a variation of the outflux rate r out .The transition from a short-to a long-tailed length distribution was already discussed in Sec.III A. As the outflux rate should not influence the onset of extension cascades, we expect the minimum to remain constant, which the simulation confirms.Increasing the outflux rate shifts L max to lower lengths in a logarithmic way in accordance with Eq. ( 16).
In Fig. 11(b) we vary the binding energy γ.We observe that increasing the binding energy displaces the characteristic peak toward shorter strands.The behavior of both curves is roughly inversely proportional: L ∝ −γ −1 .
Next, we vary the bare ligation rate r lig ; see Fig. 11(c).The position of the maximum remains unchanged, since the transition determining the fate of a fully hybridized state is not affected by the ligation rate; see Eq. (15).In accordance with Eq. ( 13), decreasing r lig logarithmically shifts the onset of extension cascade and the position of the minimum to larger lengths.For the smallest ligation rate plotted, we cross the transition toward short-tailed distributions described in Sec.III A, and the characteristic peak in the length distribution disappears.
Figure 11(d) shows the effect of varying the dimer concentration c 2 .Since reducing c 2 logarithmically reduces the effective rate of extension with a dimer, higher concentrations enable extension cascades already for duplexes consisting of shorter strands, shifting the minimum to the left.Again, the position of the peak remains constant.For the smallest concentration shown, we cross the transition toward a short-tailed distribution.
In summary, the phenomenological positions of the minimum L min and the peak L max are well described by the expressions for L Ã and L † , Eqs. ( 13) and (15).

APPENDIX C: TRANSIENT BEHAVIOR IN CLOSED SYSTEMS
Next, we investigate a closed system without influx or outflux.We prescribed the concentration of initial building blocks and let the system evolve transiently.Because of the irreversibility of the ligation reaction, closed systems are not ergodic: Short building blocks will deplete and the final configuration contains only two very long strands.However, this stationary state will never be reached on practical timescales.
Thus we consider a transient state at intermediate times.We focus on the situation where long strands have already formed, and extension cascades are possible, with still a sufficient amount of short building blocks available.Then, the system behaves similar to the steady state in an open system with small outflux rates.
As in the stationary case, we observe a minimum and maximum in the length distribution.Figure 12(a) shows the length distribution for the standard choice of parameters for various values of the transient observation time t ¼ τ obs .Figure 12(b) shows that the position of the maximum increases logarithmically with the observation time.
In order to get an intuition for this behavior, we again use an argument involving the competition of timescales.As in the open systems, strands longer than L Ã will dominantly occur in fully hybridized configurations.In contrast, the second timescale is not determined by a global outflux rate and fully hybridized duplexes eventually dehybridize with a length-dependent rate r off ðLÞ.
Yet, dehybridization of duplexes of length L only plays a role for observation times longer than τ obs ∼ r off ðLÞ −1 .
We thus expect the global transient observation time τ obs to play the same role as the timescale r −1 off in an open system.The length scale L ¼ L † that determines the peak in a closed system can then be obtained by replacing r off with τ −1 obs in Eq. ( 15) or (16).In that case, the position of the peak should increase logarithmically with time, consistent with the results shown in Fig. 12(b).

APPENDIX D: EXPLORING MONOMER-DIMER MIXTURES
Figure 13(a) shows the length distribution for a reservoir where the total initial building block concentration c tot ¼ c 1 þ c 2 ¼ 2 mM is constant.We then vary the monomer fraction f m ≔ ðc 1 =c tot Þ from zero to 90%.The orange curve is the dimer-only system at standard parameters, showing the long tail caused by the infinite extension cascades.For any finite monomer concentration, infinite extension cascades are suppressed and the long tail collapses.
The length distributions for finite monomer fractions look qualitatively similar.The larger f m , the less nucleotide mass is added by the influx, and the lower the concentrations.The lower the monomer concentration, the more of the bias toward strands of even lengths is retained.The bias dominates for short strands, leading to the zigzag pattern visible in Fig. 5(a).For long strands, the bias vanishes.In accordance with Eq. ( 15), the position of the maximum is unchanged, as it does not depend on the building block concentration.
The minimum position, i.e., the typical length L Ã for the onset of extension cascades, is derived analogously to the dimer-only model using the condition 1 < r ext ðDÞ=r dupl off ðDÞ, cf.Sec.III B. Instead of considering the extension with a dimer only, one needs to include the extension with a monomer.The extension rate is ðc 2 e −γ½minðjo i j;2Þ þ c 1 e −γ½minðjo i j;1Þ Þ: ðD1Þ The criterion for extension cascades then reads ðc 2 e −γ½lþminðjo i j;2Þ þ c 1 e −γ½lþminðjo i j;1Þ Þ: ðD2Þ The right-hand side of the Eq.(D2) is maximal for the odd duplex configuration D AE1 ¼ ðL 0 ; L 0 ; AE1Þ, for which l þ minðjo i j; 2Þ ¼ l þ minðjo i j; 1Þ ¼ L 0 , which leads to Consequently, L Ã for monomer-dimer mixtures obeys where c tot ¼ c 1 þ c 2 is the total concentration of building blocks.Equation (D4) is the same formula as for the dimer-only system, except that the dimer concentration c 2 is substituted by the total concentration of building blocks c tot .In accordance with formula Eq. (D4), we observe that the position of the minimum is constant L min ¼ 19 under variation of the monomer fraction while keeping the total concentration fixed at c tot ¼ 2 mM; see Fig.

APPENDIX E: BEYOND PURE PRIMER EXTENSION
Figure 6(a) already depicted an example of an extension step leading to the growth of a complex beyond its initial length.In the following, we take a more detailed look at this phenomenon.
Growth happens essentially independently at each end of a duplex.It thus makes sense to take the perspective of a single end, since it allows us to distinguish the two strands by their roles: We call the strand whose end is overhanging the template, whereas the other strand is called the primer.Moreover, we refer to the length of the overhang at the start of a trajectory as its initial copy site length l cs .
The obvious mechanism that leads to duplex extension is depicted in Fig. 14(b).It occurs when the original primer is extended with a strand that is longer than the (remaining) length of the copy site.After this extension, the roles of primer and template are reversed and a new copy site is created.We thus denote this process as primer-template switching.
A complex undergoing an extension cascade is not always a simple duplex.Ligation reactions can also occur away from the stable hybridization site.We say that template extension occurs, if another strand facilitates the extension of the template strand; see Fig. 14(c).From the perspective of the stable hybridization site, the length of its associated copy site l cs has increased.Figure 14(d) shows the number of extension events along a trajectory as a function of the total single-stranded length that is covered during the trajectory, C final − l initial .The standard primer-extension steps (p, red curve) are most common.In contrast, template extension (t, blue curve) is rare.For large C final − l initial , the number of events behaves strictly linear and primer-template switching (s, black curve) is approximately 3 times less likely than primer extension.For small values of C final − l initial , the relative fraction of primer-template switching increases, since a short available overhang increases the chance of primertemplate switching.The monomer fraction f m is varied between zero and 90% at a total concentration c tot ¼ 2 mM.For low f m the concentration between even and odd strands oscillates heavily for short strands.The long tail that is present for f m ¼ 0 (orange curve, only even strand lengths shown) collapses even for very small f m .(b) L Ã in the monomer-dimer system is calculated via Eq.(D4), which is the same as the formula for the dimer-only system upon substituting the dimer concentration with the total concentration c tot .

FIG. 1 .
FIG.1.Prebiotic evolution is a multistep process that creates new entities exhibiting emergent mechanisms of interaction.Our work outlines the emergence of structured oligonucleotides from the smallest building blocks.
FIG. 2. (a)Short strands entering the reaction vessel from the reservoir are the initial building blocks of the system.Inside the vessel, strands form various complexes via hybridization and dehybridization.Subsequent ligation leads to longer strands eventually leaving the system.(b) Examples of higher-order complexes with multiple hybridiziation sites.(c) The internal elementary processes are hybridization, dehybridization, and templated ligation with corresponding rates r on , r off , and r lig .(d) The external elementary processes couple the system to its environment.Short strands of length L ¼ μ for μ ∈ R are chemostated via the coupling to an external reservoir of initial building blocks at fixed concentrations c μ .All complexes leave the system at a constant rate r out .(e) When two complexes collide, they can form Θ different hybridization complexes.(f) Duplexes D ≔ ðL 1 ; L 2 ; o 1 Þ are uniquely characterized by the strand lengths L 1 ; L 2 ∈ N and one of the overhangs o i ∈ Z of strand S i , i ∈ f1; 2g, at its 3 0 end.Overhangs o i can be negative, as for the case of o 2 in the right-hand example.

FIG. 4 .
FIG. 4. (a)Partitioning the contributions of the different subgroups to the strand-length distribution reveals the dominant configurations: Short strands are mostly single stranded.Strands with lengths around the peaks are in the persistent fully hybridized zero-parity configuration.In the dimer-only model, odd duplexes never reach a fully hybridized state and cause the long tail of the distribution.(b) The probability of different complex types conditioned on strand length.(c) The probability that a duplex with nonzero parity is stable conditioned on strand length.Around L ¼ L Ã [cf.Eq. (13)] this probability increases rapidly.

FIG. 3 .
FIG.3.Stationary length distributions (left) and competition of timescales (right) for the standard model (a) and its bounded variant (b) for different values of the outflux rate r out .In the bounded model, dehybridization cannot become smaller than r cut ¼ 0.05.Dehybridization is thus faster than ligation (r lig ¼ 2.5 × 10 −3 ) for all lengths.In both models, the length distributions develop long tails when decreasing the outflux rate r out .The orange (dashed) curves correspond to a system where the outflux rate takes the crossover value r out ¼ 3.24 × 10 −7 , cf.Eq. (6).For outflux rates below the transition value, the unbounded model exhibits a nonmonotonic length distribution with a local minimum and maximum at L min and L max .

FIG. 6 .
FIG. 6.(a) Trajectories start with an initial stable duplex characterized by its strand lengths L long and L short together with the initial overlap l initial and complex length C initial .Trajectories not creating new single-stranded regions beyond the initial complex are referred to as "pure primer extension."In contrast, duplex extension leads to a final complex with C final > C initial .(b)-(e) Trajectory statistics can be understood from various joint probability distributions, where L Ã ∼ 17 and L † ∼ 31.See main text for details.

FIG. 7 .
FIG.7.Heterocatalytic (a) and autocatalytic (b) processes for the growth of strands.In the strongly nonequilibrium regime, extension cascades cover the available overhang of stable duplex and form longer fully hybridized strands.These long strands can then dehybridize and reassemble, thus creating new overhangs to be covered by extension cascades.The reassembly probability p ra is determined by the balance between dehybridization and outflux and decays to zero fast for L ≳ L † .

FIG. 8 .
FIG. 8. Product concentration analysis for a 12 nt random sequence AT-only pool.(a) Experimental temperature profile.Ligation occurs for 120 s at 33 °C after which the sample is heated to the variable hot reassembly temperature T hot for 20 s.(b) Image of a polyacrylamide gel electrophoresis.The first lane on the left shows the "baseline" sample, which is similar to the other lanes but was not subjected to temperature cycling.The other lanes have the same ligation conditions but different temperatures for dissociation.(c) Quantitative results for the length distribution.Nonmonotonic length distributions with a maximum and minimum were observed.

FIG. 10 .
FIG. 10.(a) Formation of a tetramer from the dimer background.A total overlap of two leads to a total binding energy of βΔG ∘ ¼ 2γ.(b) Templated ligation of dimers on an m-mer.There are two overhanging configurations with βΔG ∘ ¼ 3γ and m − 3 configurations with βΔG ∘ ¼ 4γ.

FIG. 11 .
FIG. 11.Probing the parameter space of the dimer-only model.Left-hand column: stationary length distributions.Right-hand column: comparison of the observed values L min and L max and the predictions for L Ã and L † via Eqs.(13) and (15).Variable parameters are (a) the outflux rate r out , (b) the dimensionless binding energy per nucleotide γ, (c) the bare ligation rate r lig , and (d) the concentration of chemostated single-stranded dimers c 2 . 13(b).

FIG. 12 .
FIG. 12. Transient strand distributions.Left: temporal development of the length distribution in a closed system.Over time the concentration of short strands decreases and the minimum develops into depleted region.Right: the position of the maximum L max shifts logarithmically with time toward longer lengths.

FIG. 13
FIG. 13.(a) Length distributions for monomer-dimer mixtures.The monomer fraction f m is varied between zero and 90% at a total concentration c tot ¼ 2 mM.For low f m the concentration between even and odd strands oscillates heavily for short strands.The long tail that is present for f m ¼ 0 (orange curve, only even strand lengths shown) collapses even for very small f m .(b) L Ã in the monomer-dimer system is calculated via Eq.(D4), which is the same as the formula for the dimer-only system upon substituting the dimer concentration with the total concentration c tot .

FIG. 14 .
FIG.14.Pure primer extension (a) versus duplex extension (b), (c).The overhang at the beginning of a (partial) trajectory is called a copy site (blue) with length l cs .(b) In primer-template switching events a building block extends the primer beyond the original copy site.The original copy site is fully covered and a new copy site is formed.The roles of primer and template are exchanged.(c) Copy sites can grow independently of the original primer by template extension with the help of a helper strand.(d) The number of extension events occurring during the covering of the total copy site C final − l initial split according to different types of extensions. .