Simple model for the Darwinian transition in early evolution

It has been hypothesized that in the era just before the last universal common ancestor emerged, life on earth was fundamentally collective. Ancient life forms shared their genetic material freely through massive horizontal gene transfer (HGT). At a certain point, however, life made a transition to the modern era of individuality and vertical descent. Here we present a minimal model for this hypothesized "Darwinian transition." The model suggests that HGT-dominated dynamics may have been intermittently interrupted by selection-driven processes during which genotypes became fitter and decreased their inclination toward HGT. Stochastic switching in the population dynamics with three-point (hypernetwork) interactions may have destabilized the HGT-dominated collective state and led to the emergence of vertical descent and the first well-defined species in early evolution. A nonlinear analysis of stochastic model dynamics covering key features of evolutionary processes (such as selection, mutation, drift, and HGT) supports this view. Our findings thus suggest a viable route from early collective evolution to the start of individuality and vertical Darwinian evolution, enabling the emergence of the first species.


I. INTRODUCTION
A. The last universal common ancestor
In the final chapter of "On the Origin of Species", Charles Darwin speculated that all life on earth may have descended from a common ancestor. As he observed, "all living things have much in common, in their chemical composition, their germinal vesicles, their cellular structure, and their laws of growth and reproduction. . . . Therefore I should infer from analogy that probably all the organic beings which have ever lived on this earth have descended from some one primordial form, into which life was first breathed" [1].
A century after Darwin, molecular biology provided new lines of circumstantial evidence for a universal common ancestor. All organisms were found to use the same molecule (DNA) for their genetic material, as well as a canonical look-up table (the genetic code) for translating nucleotide sequences into amino-acid sequences [2][3][4]. Further clues came from cross-species comparisons of the molecules involved in the most fundamental processes of life, such as protein synthesis, core metabolism, and the storage and handling of the genetic material. The first such analysis [5], based on snippets of ribosomal RNA, provoked a revolution in our understanding of life's family tree [6][7][8]. It indicated that life is divided into three different domains: the Archaea, the Bacteria and the Eucarya [5,6,8]. Later studies using other molecular sequences placed the root of the tree, corresponding to the last universal common ancestor, somewhere between the Bacteria and Archaea [9][10][11][12][13], roughly 3.5 to 3.8 billion years ago. The nature of the last universal common ancestor, however, remains unresolved: Was it prokaryotic or eukaryotic? Did it thrive in extreme or moderate temperatures? Was its genome based on RNA or DNA? For a review, see Ref. [14] and references therein.

B. The era of collective evolution
Our work in this paper was inspired by a conjecture proposed by Woese and his colleagues [15][16][17][18]. According to this conjecture, the last universal common ancestor was a community, not a single creature. It marked a turning point in the history of life: before it, evolution was collective and dominated by horizontal gene transfer; after it, evolution was Darwinian and dominated by vertical gene transfer.
In Woese's scenario, life in the epoch leading up to the universal ancestor was intensely communal. It was organized into loose-knit consortia of protocells far simpler than the bacteria or archaea we know today. Woese and Fox [15] called those hypothetical ancient life forms "progenotes." The term signifies that the coupling between genotype and phenotype had not yet fully evolved, mainly because the process for translating genes into proteins had not yet fully evolved either. A rudimentary form of translation existed, but it was ambiguous and hence had a statistical character. Instead of producing a single protein, early translation produced a cloud of similar proteins. This ambiguity in protein synthesis in turn limited the specificity of all the progenote's interactions. For example, lacking the large, complex proteins necessary for accurate copying and repair of the genetic material, the progenote's genome was tiny and subject to high mutation rates.
Progenotes were not well-defined organisms as such, because they had no individuality and no long-term genetic pedigree. Their genes and component parts could come and go, being swapped in or out with other members of the community via horizontal transfer. But because biochemical innovations produced by any member of the community were available to all, evolution at this time was rapid, probably more rapid than at any time since. Selection acted on whole communities, not on individual progenotes. Those communities that were better at sharing their biochemical breakthroughs flourished. Out of this cauldron of evolutionary innovation, the universal genetic code and its translational machinery coevolved, in response to the selective pressures favoring efficient sharing and interoperability.
Vetsigian, Woese, and Goldenfeld [19] confronted and constrained these speculations with mathematical analysis and computer simulations. Going beyond Woese's conjectures, they probed early evolution scientifically by interpreting the available data on the genetic code. Their dynamical model for the evolution of the genetic code [19] showed that a collective state of life is required to obtain the observed [20,21] statistical properties of the code, in particular, its simultaneous universality and optimality. A later study by Goldenfeld and colleagues [22] provided further evidence that only a collective state of life could have created the highly optimized code used by all life today [21].

C. The Darwinian transition
How did the era of collective evolution come to an end? Woese speculated that as the translation process began to improve, and as progenotic subsystems became increasingly complex and specialized, it would have become harder to find foreign parts compatible with them. Thus, horizontal gene transfer would have become increasingly difficult. The only possible modifications at this point would have come from within the progenote's lineage itself, through mutation and gene duplication. It was in this way that the progenotes would have made the Darwinian transition [17,18] to become "genotes," i.e., life forms with a tight coupling between their nucleic acid genotypes and their protein phenotypes, and that could therefore evolve through the familiar Darwinian process of vertical descent.
The model considered below is an attempt to explore, in mathematical terms, how the Darwinian transition from the collective state to the modern era of individuality might have taken place. Our approach shares with Ref. [19] the outlook that a dynamical systems calculation should be devised to support or refute the hypotheses considered. Our results lend support to the proposed collective state of life [15][16][17][18][19] by providing a potential mechanism for the exit from that state.

D. Horizontal gene transfer
Over the last decades more and more evidence has accumulated that, besides selection, mutation, and drift [23], another process drives evolution: horizontal gene transfer. Here we briefly review the main points about horizontal gene transfer relevant to the mathematical model developed below.
While reproduction implies a vertical transfer of genes from one entity to the next in the phylogenetic tree, there are also processes in which possibly unrelated individuals exchange genetic material during their lifetimes, i.e., horizontally in the sense of the tree. This transfer of genes within one generation is consequently termed horizontal gene transfer (HGT) [24][25][26][27] or lateral gene transfer. It is now widely accepted that HGT is a fundamental driving force of evolution [25,28–33], and that its existence raises profound theoretical problems for evolutionary biology. For example, the longstanding problem of defining bacterial species [30,34–38] is due, in part, to the promiscuous use of HGT by bacteria. A recent primer on horizontal gene transfer and its potential for evolutionary processes in general is given in [39].
As discussed above, if HGT was rampant in the early stages of evolution, the last universal common ancestor was a community, not a single organism [16,17,19,40]. In this collective state, individuals could not yet be distinguished, as each progenote's genes were frequently exchanged through HGT. In terms of the model to be developed below, the total pool of genotypes in the collective state would be spread out and thus broadly distributed in the state space of all theoretically possible genotypes. Conversely, a genotype distribution that is highly localized in state space, being concentrated on just one or a few genotypes, would be the model's version of a well-defined species.
Woese postulated that as the collective state of the progenote population slowly evolved toward higher complexity, its rate of HGT slowly decreased [17]. At some point the system crossed the "Darwinian threshold" [17]. Then natural selection instead of HGT started to dominate the dynamics. The fitter individuals were selected for, and the first species emerged from the distributed state. In the colorful language of Dyson [41]: "But then, one evil day, a cell resembling a primitive bacterium happened to find itself one jump ahead of its neighbors in efficiency. That cell, anticipating Bill Gates by three billion years, separated itself from the community and refused to share. Its offspring became the first species of bacteria, and the first species of any kind, reserving their intellectual property for their own private use. With their superior efficiency, the bacteria continued to prosper and to evolve separately, while the rest of the community continued its communal life. Some millions of years later, another cell separated itself from the community and became the ancestor of the archaea. Some time after that, a third cell separated itself and became the ancestor of the eukaryotes."
After making a Darwinian transition, evolution proceeds in the familiar vertical manner, being driven by selection, mutation, and drift [23], with HGT playing only a minor role. Such Darwinian dynamics have, of course, been studied extensively in both experimental and model settings [23,42–46]. Compared with the dynamics of HGT, their properties are relatively well understood. Recently, potential influences of HGT on such evolutionary dynamics have been investigated [16,19,47–51]. Some mathematical models of HGT have focused on how it can increase a population's fitness in Darwinian evolution [52].
Keep in mind, however, that the hypothesized HGT associated with progenotes and the Darwinian transition, being associated with ribosomal genes and the rest of the core machinery of the cell, would have been of far greater evolutionary significance than the HGT of, say, antibiotic resistance genes seen in bacterial communities today. In Woese's scenario, the ancient form of HGT was rampant, pervasive, and tremendously disruptive and innovative. It was the prime mover in shaping the fabric of the cell [18].
We would like to understand what such a Darwinian transition would look like, mathematically. The model described in the next section is deliberately minimal. It leaves out all the biology of ribosomes, proteins, genetic codes, and the like. What remains is an attempt to capture the essence of Woese's speculations. In place of a community of progenotes, we consider a community of abstract genomes, represented by binary sequences. They interact via HGT, and are subject to mutation, selection, and drift on a fitness landscape. Our work suggests that HGT-dominated dynamics may have been intermittently interrupted by selection-driven processes during which genotypes became fitter and decreased their inclination toward HGT. Such stochastic switching in the nonlinear population dynamics may have destabilized the HGT-dominated state and thus led to a Darwinian transition and the emergence of the first species in early evolution.
On a side note, an interesting mathematical aspect of the model is that it necessarily involves three-point interactions, since HGT transforms one genotype into a second by importing pieces of a third. Thus the model provides a natural biological example of a complex hypernetwork. Until now, most models in evolutionary dynamics and population biology did not need to go beyond ordinary network structure, with two-point interactions between nodes connected by links.

II. EVOLUTIONARY MODEL
To explore the consequences of HGT for evolution, we consider a model community of N progenotes evolving on a fitness landscape [23,53] in the presence of selection, mutation, drift, and HGT. Each progenote carries a genome of length l composed of a sequence of the bases 0 and 1. The genome of progenote i determines its fitness $f_i$. The progenotes reproduce by the Moran process [23,54], i.e., each progenote reproduces at random times, with a reproduction rate given by its fitness $f_i$. Whenever a progenote of genotype i reproduces, an offspring is added to the population; with probability $\mu_{ij}$ the offspring is a mutant of genotype j, and otherwise it is identical to its parent. Immediately after such a reproduction event, one progenote in the population is chosen at random to die and is removed from the population, so that the population size N stays constant. We assume that a mutation event affects only one base of the genome, so that $\mu_{ij}$ is nonzero only if the Hamming distance between genotypes i and j is 1.
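A single reproduction event of this Moran process can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' simulation code: genotypes are encoded as integers whose l bits are the bases, every one-base mutant j of the parent is reached with the same probability `mu` (standing in for a uniform $\mu_{ij}$), and the function name `moran_step` is hypothetical. Because each progenote reproduces at a rate equal to its fitness, the next parent belongs to genotype i with probability $f_i k_i / \sum_j f_j k_j$, where $k_i$ is the number of progenotes of genotype i.

```python
import random

def moran_step(counts, fitness, mu, l, rng=random):
    """Apply one reproduction-death event of a Moran process in place.

    counts  -- dict mapping genotype (int with l bits) to its number k_i
    fitness -- dict mapping genotype to fitness f_i
    mu      -- probability of mutating to each one-bit neighbour (needs l*mu <= 1)
    l       -- genome length
    """
    alive = [g for g, k in counts.items() if k > 0]
    # Parent is genotype i with probability f_i * k_i / sum_j f_j * k_j.
    parent = rng.choices(alive, weights=[fitness[g] * counts[g] for g in alive])[0]

    # With probability l*mu the offspring is a one-bit mutant (each of the l
    # neighbours equally likely); otherwise it is identical to the parent.
    child = parent
    u = rng.random()
    if u < l * mu:
        child = parent ^ (1 << min(int(u / mu), l - 1))
    counts[child] = counts.get(child, 0) + 1

    # Immediately afterwards one progenote, chosen uniformly (newborn included),
    # dies, so the population size N stays constant.
    alive = [g for g, k in counts.items() if k > 0]
    victim = rng.choices(alive, weights=[counts[g] for g in alive])[0]
    counts[victim] -= 1
    return parent, child

rng = random.Random(1)
counts = {0b000: 10}
fitness = {g: 1.0 for g in range(8)}
for _ in range(1000):
    moran_step(counts, fitness, mu=0.01, l=3, rng=rng)
print(sum(counts.values()))  # prints 10
```

Keeping zero-count genotypes in `counts` is harmless here; they are simply skipped when parents and victims are drawn.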
Hence, our fitness landscape may be represented by a network in which the different genotypes are the nodes and the possible mutations form the links. Because each base takes one of two values, 0 or 1, and each mutation changes a single base, the resulting network is an l-dimensional hypercube with $2^l$ nodes (Fig. 1).
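The hypercube structure follows directly from single-base flips. In the sketch below, genotypes are encoded as integers whose bits are the bases (an assumption of this illustration, and the helper names are hypothetical): the mutation neighbours of a genotype are obtained by XOR with each power of two, and every link of the hypercube is listed once.

```python
def hamming_neighbors(g, l):
    """Genotypes one point mutation away from g: flip each of the l bits once."""
    return [g ^ (1 << b) for b in range(l)]

def hypercube_links(l):
    """All mutation links of the l-dimensional hypercube, each pair listed once."""
    return [(g, h) for g in range(2 ** l) for h in hamming_neighbors(g, l) if g < h]

# For l = 7: 2**7 = 128 genotypes and 7 * 2**6 = 448 mutation links.
print(len(hypercube_links(7)))  # prints 448
```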
The fitness landscape underlying our model is assumed to be a Mount Fuji landscape [23]: The highest fitness is assigned to one single genotype, the peak. Other genotypes are assigned lower fitness: the farther away from the peak in genotype space, the lower the fitness. Thus, a single-peaked mountain landscape is created on genotype space, and a population evolving purely through the processes of selection and mutation should converge to this peak. Note that the Moran process described above is a random process. It thereby constitutes a minimal model intrinsically including the effects of selection, mutation and genetic drift [23]. The latter is induced by the stochastic selection in the combined process of reproduction, mutation and death and has the effect of randomly walking the population around in genotype space even if no fitness differences were present.
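The text does not fix how fitness declines away from the peak. The sketch below assumes, purely for illustration, a linear decrease with Hamming distance from a peak genotype, with fitness values ranging from 0.9 to 1.1 as in the simulation parameters quoted in the figure captions; the peak at genotype 0, the integer encoding of genotypes, and the function name are hypothetical choices.

```python
def mount_fuji_fitness(l, peak=0, f_min=0.9, f_max=1.1):
    """Single-peaked landscape on the l-cube (the linear decline is an assumption).

    Fitness falls linearly with the Hamming distance d from the peak genotype,
    from f_max at d = 0 to f_min at the antipodal genotype (d = l).
    """
    fitness = {}
    for g in range(2 ** l):
        d = bin(g ^ peak).count("1")  # Hamming distance to the peak
        fitness[g] = f_max - (f_max - f_min) * d / l
    return fitness

f = mount_fuji_fitness(7)
print(max(f, key=f.get))  # prints 0, the peak genotype
```

Any landscape that decreases monotonically with distance from the peak would serve the same qualitative purpose; the linear form is just the simplest choice.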
To reveal the potential impact of HGT we incorporate its basic features into the stochastic evolution model. Two progenotes A and B may meet and a subsequence s of progenote B's genome may be inserted into A's genome. As a result of this horizontal gene transfer event, the genotype of progenote A will transform into another genotype C, determined by its original genotype and the subsequence s. This process is illustrated in Figure 1.
To model this process, we add HGT-hyperlinks to the hypercube network representing the fitness landscape. Each such hyperlink represents a three-genotype interaction and is defined through the following procedure. We choose two genotypes A and B at random, as well as a random subsequence of genome B of length x, with 2 ≤ x ≤ l − 2 bases. This subsequence is inserted at a random position of genome A, and the last x bases of the lengthened sequence are cut off, keeping the sequence length of A constant. The new sequence determines a genotype C, which genotype A becomes on interacting with B via this HGT-link, denoted $\overrightarrow{(A,B,C)}$. If the resulting genotype C is identical to A, this HGT-link would not alter the population dynamics and would thus be irrelevant; we therefore neglect such self-projecting HGT-links. We repeat the above procedure until a predefined number m of new HGT-links has been added to the system. An HGT-link thus defines one type of HGT-event, in which part of genotype A is replaced by part of genotype B, transforming A into genotype C. We consider these events to occur independently of each other. Let $k_X$ denote the number of progenotes of genotype X in the population. Then the HGT-events above occur at a rate proportional to the competence and to the abundances of both partners,
$$r_{\overrightarrow{(A,B,C)}} \propto c\,k_A k_B, \qquad (1)$$
where the effective competence for HGT is modeled as a constant c ≥ 0 that captures both the rate at which the progenotes meet and their actual preference for initiating an HGT event, given that they meet. Note that interactions of the form (1), independent of any model details, imply collective dynamics on a complex hypernetwork, due to their intrinsic three-genotype coupling involving A, B, and C. The dynamics of horizontal gene transfer in biological systems depend on a multitude of factors, including the mode of HGT (e.g., natural transformation or conjugative transfer) [27], and may vary with the fitness of the donor and recipient [47,55] and with other factors such as environmental conditions [56].
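The HGT-link construction can be sketched as follows. Details the text leaves open are filled with labeled assumptions: genotypes are tuples of bits drawn uniformly at random, the transferred piece is a contiguous run of B, the insertion point in A is uniform over the l positions in front of a base, and a link that was already drawn is not counted again (reading "new HGT-links" as distinct links). The function names are hypothetical, and the construction needs l ≥ 4 so that the range 2 ≤ x ≤ l − 2 is non-empty.

```python
import random

def random_hgt_link(l, rng):
    """Draw one candidate HGT-link (A, B, C) on genomes of length l (l >= 4)."""
    A = tuple(rng.randint(0, 1) for _ in range(l))
    B = tuple(rng.randint(0, 1) for _ in range(l))
    x = rng.randint(2, l - 2)          # length of the transferred piece of B
    start = rng.randint(0, l - x)      # the piece is a contiguous run of B
    piece = B[start:start + x]
    pos = rng.randint(0, l - 1)        # insertion point in A (assumed uniform)
    # Insert the piece, then cut the last x bases so that C again has length l.
    C = (A[:pos] + piece + A[pos:])[:l]
    return A, B, C

def build_hgt_links(l, m, seed=0):
    """Collect m distinct links, discarding self-projecting ones (C == A)."""
    rng = random.Random(seed)
    links = set()
    while len(links) < m:
        A, B, C = random_hgt_link(l, rng)
        if C != A:
            links.add((A, B, C))
    return links

links = build_hgt_links(l=7, m=3000)
print(len(links))  # prints 3000
```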
To focus on qualitative mechanisms, we here consider the simplest setting, in which c is a nonnegative constant. We note that, via the factors $k_A$ and $k_B$ and the presence or absence of HGT-links $\overrightarrow{(A,B,C)}$, the actual rate of all HGT events in the population still depends on how the population is distributed in genotype space.
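Given a set of HGT-links, the total reproduction and HGT rates of a population, and the choice of which HGT event fires next, can be sketched as below. The text states only that the rate of each HGT-link depends on the competence c and on the abundances $k_A$ and $k_B$; the sketch assumes the mass-action form $c\,k_A k_B / N$, where the $1/N$ normalization is our assumption. Function names are hypothetical, and genotypes may be any hashable labels.

```python
import random

def hgt_link_rates(counts, links, c):
    """Rate of every HGT-link (A, B, C), assumed to be c * k_A * k_B / N."""
    N = sum(counts.values())
    return {link: c * counts.get(link[0], 0) * counts.get(link[1], 0) / N
            for link in links}

def total_rates(counts, fitness, links, c):
    """Total reproduction rate sum_i f_i k_i and total HGT rate r_HGT."""
    r_repr = sum(fitness[g] * k for g, k in counts.items())
    r_hgt = sum(hgt_link_rates(counts, links, c).values())
    return r_repr, r_hgt

def fire_hgt_event(counts, links, c, rng):
    """Pick one HGT-link with probability proportional to its rate and apply A -> C."""
    rates = hgt_link_rates(counts, links, c)
    active = [lk for lk, r in rates.items() if r > 0]
    if not active:
        return None  # no donor/recipient pair is present
    A, B, C = rng.choices(active, weights=[rates[lk] for lk in active])[0]
    counts[A] -= 1                    # the recipient changes genotype,
    counts[C] = counts.get(C, 0) + 1  # so the population size N is unchanged
    return A, B, C

links = [(0, 1, 3), (1, 0, 2)]
fitness = {g: 1.0 for g in range(4)}
print(total_rates({0: 10}, fitness, links, c=2.0))        # prints (10.0, 0.0)
print(total_rates({0: 5, 1: 5}, fitness, links, c=2.0))  # prints (10.0, 10.0)
```

With this link set, a population concentrated on a single genotype has no HGT at all, whereas the same number of progenotes split between donor and recipient genotypes does, which is the distribution dependence noted above.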

III. QUANTIFYING STOCHASTIC SWITCHING
To see how HGT influences the evolutionary dynamics, we study how the collective model dynamics depend on the competence c. The population sizes $k_i(t)$ of progenotes of the different genotypes i fully describe the state of the system at time t. To quantify how broadly the population is distributed in genotype space, we introduce the population entropy
$$S(t) = -\sum_i \frac{k_i(t)}{N}\,\log\frac{k_i(t)}{N}, \qquad (2)$$
where the sum runs over all genotypes present in the population. Populations consisting of only one genotype have population entropy zero. If the population is spread uniformly over all $2^l$ genotypes, the population entropy takes its maximal value $S_{\max} = l \log 2$. Direct simulations of the stochastic dynamics reveal that for large competences c, the collective dynamics converge to a state of high population entropy in which the population is highly spread out in genotype space (Figure 2a). It may only transiently switch to a state localized in genotype space, i.e., with relatively little spread in genetic material. In this high entropy state the total HGT rate in the population is orders of magnitude higher than in a speciated state (see below). The population does not adapt to the underlying fitness landscape; in that sense, HGT is the main driving process in this large-c scenario. We identify this state of high population entropy with a pre-Darwinian collective state, as in this state no distinct species can be distinguished and HGT is the dominant force driving the evolutionary dynamics.

Figure 2. … The low entropy state is rendered globally stable for low c, so that the population entropy fluctuates slightly above zero for all initial conditions. In the low entropy state the population dynamics are driven by selection, in the high entropy state by HGT. Panels (i) show the entropy dynamics, (ii) the average fitness ⟨f⟩ of the population corresponding to the entropy dynamics, and (iii) the corresponding HGT rates $r_{\mathrm{HGT}}$ that the population exhibits at time t. For low population entropies the fitness is high and the HGT rate small, and vice versa for high population entropies. The mutation probability was set to $\mu_{ij} = 0.0001$. Into the resulting Fujiyama fitness landscape [23] with fitness values between $f_{\min} = 0.9$ and $f_{\max} = 1.1$ we inserted m = 2000 HGT-links.
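The population entropy can be computed directly from the genotype counts. The sketch below assumes the standard Shannon form $S = -\sum_i (k_i/N)\log(k_i/N)$ with the natural logarithm, which is consistent with the limiting values quoted above (zero for a single genotype, $l \log 2$ for a uniform spread over all $2^l$ genotypes). It writes each term as $(k_i/N)\log(N/k_i)$ so that every term is nonnegative; the function name is hypothetical.

```python
import math

def population_entropy(counts):
    """Population entropy S = -sum_i (k_i/N) log(k_i/N), natural logarithm.

    counts maps each genotype to its number of progenotes k_i; genotypes with
    k_i = 0 contribute nothing. log(N/k) is used so every term is nonnegative.
    """
    N = sum(counts.values())
    return sum((k / N) * math.log(N / k) for k in counts.values() if k > 0)

l = 7
print(population_entropy({0: 1000}))                       # prints 0.0
print(population_entropy({g: 8 for g in range(2 ** l)}))  # l*log(2) up to rounding
print(l * math.log(2))                                     # S_max for comparison
```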
In contrast, if the progenotes' competence for HGT is low, we observe a population dynamics which converges toward a state of low population entropy (Figure 2c). This confirms the observation that selection, mutation and drift will drive a population to adapt to a fitness landscape if the mutation rate is not too high [23,44]. The population is thus concentrated around the fittest genotype with only rare mutations and genetic drift causing some spread of the population. As a consequence, large parts of the population exhibit the same or similar genotypes such that it is in a speciated state.
While the system spends almost all its time close to the speciated state for low competence, the dynamics switch stochastically between the speciated and the distributed state if the competence is not small enough. Figures 2a–c show that the higher the progenotes' competence for HGT, the longer the system stays in the distributed state.
We conclude that both the speciated state and the distributed state are dynamically accessible metastable states (for high enough competence). The dynamics only switch from one of these states to the other due to rare events in the stochastic dynamics. This is supported by the fact that the dynamics switch between these states on much shorter time scales than the time they remain in them (see also Figure 3 below).
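Residence times, and the fraction of time spent in the distributed state (the quantity plotted in Figure 3), can be extracted from a recorded entropy time series. The sketch below classifies each stretch of time as distributed or speciated by comparing S with a threshold; the threshold value and the function names are hypothetical, since the text does not state how the two states were delimited.

```python
def time_fraction_above(times, entropies, threshold):
    """Fraction of the observation window spent with S above threshold.

    entropies[i] is the value of S from times[i] until times[i + 1].
    """
    total = times[-1] - times[0]
    above = sum(t1 - t0 for t0, t1, s in zip(times, times[1:], entropies)
                if s > threshold)
    return above / total

def residence_times(times, entropies, threshold):
    """Durations of uninterrupted stays above (True) and below (False) threshold."""
    stays = {True: [], False: []}
    start, state = times[0], entropies[0] > threshold
    for t, s in zip(times[1:], entropies[1:]):
        if (s > threshold) != state:
            stays[state].append(t - start)
            start, state = t, s > threshold
    stays[state].append(times[-1] - start)  # final stay is cut off by the window
    return stays

times = [0.0, 1.0, 3.0, 4.0, 10.0]
entropies = [0.1, 4.0, 4.2, 0.2, 0.2]
print(time_fraction_above(times, entropies, threshold=2.0))  # prints 0.3
print(residence_times(times, entropies, threshold=2.0))
# prints {True: [3.0], False: [1.0, 6.0]}
```

Comparing the mean stay in each state with the duration of the switches themselves is one way to check the separation of time scales described above.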
How does the distributed state disappear for low competences? To answer this question, we developed a method based on the population entropy defined in (2) to study the forces induced on the dynamics by reproduction and HGT. The evolution dynamics are event-driven, and the population entropy S may only change at these event times. At each event there is a population entropy $S^-$ directly before the event and a population entropy $S^+$ directly after the event. The change of population entropy induced by a single event will in general depend on the type of event (reproduction or HGT) and on the actual distribution of the population over genotype space. If the population is in a state with population entropy S, one event will thus induce a mean change $\Delta S(S)$, obtained by averaging $S^+ - S^-$ over all events occurring at population entropy $S^- = S$. The rate r(S) at which these events occur depends on the state of the system as well. Multiplying the mean change induced by the single events by the rate at which the events occur, we obtain the average rate of change $\dot S(S) = r(S)\,\Delta S(S)$ induced on the dynamics. The reproduction and HGT events in our model occur independently of each other, so that their contributions separate additively according to
$$\dot S(S) = r_{\mathrm{Repr}}(S)\,\Delta S_{\mathrm{Repr}}(S) + r_{\mathrm{HGT}}(S)\,\Delta S_{\mathrm{HGT}}(S). \qquad (6)$$

Figure 3. The high entropy state is dynamically stable also for vanishing mutation probabilities. Shown is the measured percentage of time a population stayed in the distributed state for a system with mutation probabilities $\mu_{ij} = 10^{-3}$ (blue), $\mu_{ij} = 10^{-4}$, $\mu_{ij} = 10^{-5}$ (orange), $\mu_{ij} = 10^{-6}$ (green) and $\mu_{ij} = 0$ (gray). Qualitatively, the results are similar; only for higher mutation probabilities does the critical transition occur at a lower value $c_{\mathrm{cr}}$. System parameters were l = 7, N = 1000, and m = 3000 HGT-links were introduced into a Fujiyama fitness landscape with fitness values between $f_{\min} = 0.9$ and $f_{\max} = 1.1$. Each data point was obtained in a simulation of length $T = 10^6$ with the initial condition $S(0) = S_{\max}$.

We measured these functional dependencies in simulations of the dynamics (for more details see Supplementary Information), thereby obtaining the forces induced by reproduction and HGT that drive the population entropy dynamics (Figure 4). The bistability of the dynamics emerges because the impact of HGT increases with the diversity of the population. Thus, if the population entropy is high, HGT will drive it toward even higher population entropy, and hence toward the distributed state. However, if the competence drops below a critical value, the impact of HGT on the population's dynamics is always smaller than that of selection, independent of the diversity of the population. Thus, the distributed state disappears in a saddle-node bifurcation and the population converges to a speciated state. Furthermore, our analysis reveals that HGT alone can drive a population into a distributed state, even in the complete absence of mutations (Figure 3).

Figure 4. … (7) we define a potential $V(S)$ for the dynamics, which is shown in (d) for the competences c = 1 (blue), c = 3 (red) and c = 5 (orange), and additionally for c = 0.5 (gray), c = 2 (green) and c = 4 (black). The potential valley at high population entropies emerges between c = 1 and c = 2, so the critical competence must lie between these two values. Each dataset was obtained in simulations measuring the dynamics for a time $T = 10^7$.
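The measurement procedure described above can be sketched as follows. The text defers the details to the Supplementary Information, so the binning in S, the event record format `(t, kind, S_minus, S_plus)`, and the function name are assumptions of this illustration. Within an entropy bin, the event rate is the number of events divided by the time spent in the bin, so rate times mean jump reduces to the summed jumps divided by the time in the bin; doing this separately for reproduction and HGT events yields the two contributions whose sum is $\dot S(S)$.

```python
from collections import defaultdict

def estimate_entropy_drift(events, t0, bin_width):
    """Estimate r(S) * mean dS(S) separately for reproduction and HGT events.

    events    -- time-ordered list of (t, kind, S_minus, S_plus), kind "repr" or "hgt"
    t0        -- start of the observation window
    bin_width -- width of the entropy bins
    Returns {bin centre: {"repr": ..., "hgt": ...}}; the two entries sum to S_dot(S).
    """
    time_in_bin = defaultdict(float)                   # time spent at entropy S
    dS_sum = defaultdict(lambda: defaultdict(float))   # summed entropy jumps
    t_prev = t0
    for t, kind, s_minus, s_plus in events:
        b = int(s_minus // bin_width)
        time_in_bin[b] += t - t_prev  # the waiting time was spent at S_minus
        dS_sum[b][kind] += s_plus - s_minus
        t_prev = t
    drift = {}
    for b, T in sorted(time_in_bin.items()):
        if T > 0:
            # (events per unit time) * (mean jump) = (summed jumps) / (time in bin)
            drift[(b + 0.5) * bin_width] = {k: dS_sum[b][k] / T for k in ("repr", "hgt")}
    return drift

events = [(1.0, "repr", 1.0, 0.8), (2.0, "hgt", 0.8, 1.2), (4.0, "hgt", 1.2, 1.5)]
drift = estimate_entropy_drift(events, t0=0.0, bin_width=1.0)
for S, parts in drift.items():
    print(S, parts["repr"] + parts["hgt"])  # total rate of change in each bin
```

In a real measurement the bins would need many events each; the three-event record here only demonstrates the bookkeeping.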
Why do the dynamics almost always remain in the high entropy state for high competence? Using the average rate of change $\dot S(S)$, we define a potential $V(S)$ through
$$\dot S(S) = -\frac{dV(S)}{dS}, \qquad (7)$$
in which the dynamics move under additional stochastic forcing. This potential is shown in Figure 4d. According to reaction rate theory [57], the depths of the two stable states' potential wells determine the average time the dynamics stay close to each of the stable states. As the potential well of the distributed state becomes ever deeper with increasing competence, the dynamics stay ever longer in this state. Thus, our results suggest that when progenotes had high competence for HGT in early evolution, a distributed state …

Figure 5. A possible scenario for the evolution of distinct species from a pre-Darwinian distributed state. The three time series sketched here are not simulation data, but encapsulate the speculations in the text, showing how the average competence, the average fitness, and the population entropy may evolve in the transition from a distributed state to the first distinct species. In the initial state (marked in blue) the competence is high, so that HGT drives the dynamics; the population exhibits a high population entropy and low average fitness. Through stochastic switching, the dynamics reach a state of low population entropy in which the fitness is higher as the population adapts to the fitness landscape. Here the population could evolve slowly toward lower competence. Thus, the dynamics switch back and forth between the low and the high entropy state, remaining longer and longer in the low entropy state as the competence decreases (marked in red). When the competence goes below a critical value (marked by the dashed line in the top panel), the high entropy state disappears (marked in green), the dynamics remain in the low entropy state, the population's average fitness increases, and the first species may robustly evolve.
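Assuming the usual one-dimensional relation $\dot S(S) = -dV/dS$, a potential can be obtained from an estimated $\dot S(S)$ by numerical integration, and its wells located and compared. The sketch below uses the trapezoidal rule and a hypothetical bistable drift $\dot S = -(S-1)(S-2)(S-3)$ in place of measured data; the function names are hypothetical. For weak noise, reaction rate theory predicts mean residence times that grow roughly exponentially with well depth, but turning depths into times would require a noise strength that is not estimated here.

```python
def potential_from_drift(S_grid, S_dot):
    """V(S) with dV/dS = -S_dot(S), trapezoidal rule, V(S_grid[0]) = 0."""
    V = [0.0]
    for i in range(1, len(S_grid)):
        h = S_grid[i] - S_grid[i - 1]
        V.append(V[-1] - 0.5 * h * (S_dot[i] + S_dot[i - 1]))
    return V

def local_minima(V):
    """Indices of interior grid points that are strict local minima of V."""
    return [i for i in range(1, len(V) - 1) if V[i] < V[i - 1] and V[i] < V[i + 1]]

def well_depths(V, i_left, i_right):
    """Depths of two wells, measured from the highest point of V between them."""
    barrier = max(V[i_left:i_right + 1])
    return barrier - V[i_left], barrier - V[i_right]

# Hypothetical bistable drift with stable states at S = 1 and S = 3.
S_grid = [0.01 * i for i in range(401)]
S_dot = [-(s - 1) * (s - 2) * (s - 3) for s in S_grid]
V = potential_from_drift(S_grid, S_dot)
minima = local_minima(V)
print([round(S_grid[i], 2) for i in minima])  # prints [1.0, 3.0]
print(well_depths(V, *minima))                # both close to 0.25
```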
A decrease in competence may arise from a mechanism that combines the stochastic switching uncovered above with the suggestion that fitter populations may be less prone to HGT events, as schematically illustrated in Figure 5. Because the speciated state is always stable, even if the population's competence is high, the population dynamics will repeatedly switch to this state for relatively short times. In the selection-dominated state (i.e., at low S) the population's fitness increases. A fitter population that might be less prone to HGT events, as suggested recently [50], has a decreased overall competence (lower c in our simplified model setting). Smaller c in turn increases the stochastic residence times the population spends in the selection-dominated state. This combination of two mutually amplifying contributions (decreasing HGT rate and increasing fitness in the population) may yield decreasing competence in the long term, such that after sufficiently many switches to the low-S state, the competence may drop below the critical value at which the distributed state disappears. The population then stays localized around the fittest genotypes, marking the transition to Darwinian evolution. At this time, the first species can robustly emerge. The scenario shown schematically in Figure 5 illustrates one potential course of such repeated switching dynamics, with temporary phases of higher fitness and decreasing HGT competence on long time scales.

IV. CONCLUSION
Our results provide a first glimpse of the possible dynamics that may have led to the emergence of the first species from a distributed state dominated by HGT. We demonstrated that a high competence for HGT in a population may suffice to drive the population into a distributed state (Figure 2). In this state HGT dominates the dynamics, in the sense that it inhibits the population's ability to adapt to the underlying fitness landscape and thus prevents it from crystallizing into distinct species. Our analysis revealed that, independently of the mutation rate exhibited by the population, HGT can drive the population dynamics into a state where the population is widely spread out in genotype space (Figure 3). We identify this state with a pre-Darwinian collective state envisioned by Woese [15][16][17][18].
Similarly, a state where no distinct species exist can emerge if the mutation rate in the population is too large [44,58,59]. Above a critical mutation rate (the error threshold) the population cannot adapt to the underlying fitness landscape and will always evolve toward a quasispecies state [44,58,59] similar to the distributed state induced by HGT shown above. However, there is a fundamental difference in the dynamics induced by HGT and that induced by mutations: While a mutation rate above an error threshold will always lead to a quasispecies state [44], high rates of HGT as studied above induce a bistability of the dynamics where the distributed state coexists with a localized "speciated" state of low S. This coexistence may be essential for the evolution toward lower competence in a population and thus for the emergence of the first species; the coexistence is what enables a population originally in a distributed state to repeatedly switch to a low-S state. As selection plays a major role in such a low-S state, progenotes with lower competence would be selected for. Thus, with time, the entire population would evolve toward lower competence until the distributed state disappears as selection effects dominate the dynamics and the first species emerge.
For the breakdown of the distributed state it is essential that the population evolves toward a lower competence. That the latter may in principle be possible was already suggested by Vogan and Higgs [50]. Our results on an idealized model now demonstrate how stochastic switching and fitness-dependent competence may combine to create a transition from a bistable state to a speciated-only state. In particular, they also suggest that HGT may be present at similar competence levels before and after the emergence of the first species. From a complementary perspective, whereas one or a few species may already have existed, other parts of the population may still have been mixed without any clear species. So the very first species may only have marked the beginning of the decline of genuinely non-specific life, with other Darwinian transitions to follow.