Memory-cognizant generalization to Simon's random-copying neutral model

Simon's classical random-copying model, introduced in 1955, has garnered much attention for its ability, in spite of an apparent simplicity, to produce characteristics similar to those observed across the spectrum of complex systems. Through a discrete-time mechanism in which items are added to a sequence based upon rich-gets-richer dynamics, Simon demonstrated that the resulting size distributions of such sequences exhibit power-law tails. The simplicity of this model arises from the approach by which copying occurs uniformly over all previous elements in the sequence. Here we propose a generalization of this model which moves away from this uniform assumption, instead incorporating memory effects that allow the copying event to occur via an arbitrary age-dependent kernel. Through this approach we first demonstrate the potential to determine further information regarding the structure of sequences from the classical model before illustrating, via analytical study and numeric simulation, the flexibility offered by the arbitrary choice of memory. Furthermore we demonstrate how previously proposed memory-dependent models can be further studied as specific cases of the proposed framework.


I. INTRODUCTION
Within natural systems from an assortment of domains there are underlying properties which are found to consistently appear. These occurrences result in researchers aiming to develop general theories which can capture aspects of these phenomena independent of the domain under consideration [1,2]. One such instance is the distribution underlying the abundance of a system's constituents, which is frequently described by a highly right-skewed distribution [3][4][5]. This apparently ubiquitous phenomena has been observed in a variety of domains, including baby-name popularity [6,7], citations to scientific literature [8][9][10][11], user-generated passwords [12], and the market share of different cryptocurrencies [13].
Motivated by this apparently ubiquitous property, researchers have investigated mechanisms which may reproduce such heavy-tailed size distributions. These models tend to create a population of elements, each of which has a certain identity or variant, who reproduce over time through a given process, the abundances of said variants are generally the quantities of interest. Typically these mechanisms also offer the possibility of mutation or the creation of a new type of variant in a given reproduction event. The most common of these frameworks are known as neutral models due to the assumption that each variant in the population has equal fitness [14,15]. This neutrality implies that there are no intrinsic advantage offered to any type of variant within the model. For a detailed overview of the most common of these models, alongside their limitations, we refer the interested reader to Ref. [16].
In 1955 Herbert Simon, building upon the earlier work of G. U. Yule [17], famously introduced such a neutral model based upon random-copying (reproduction was viewed as a copying event) with mutation (which he referred to as an innovation event) framework that could reproduce the power-law distributions observed within empirical systems [4]. The fundamental property of this model is that the likelihood of the next element being of a certain variant is dependent upon the number of previous occurrences of this variant. Notwithstanding the model's apparent simplicity, it has been shown to accurately describe the distribution of abundances within a number of complex systems, including the citation dynamics of scientific literature [18,19], family-names [20], and the growth of both the world wide web [21] and open-source software developments [22].
Similar models have appeared throughout the literature since Simon's original article -in particular those placing emphasis upon the cumulative advantage or richget-richer mechanisms. Arguably the most famous such approach has been in the development of network growth models within the field of complex networks initiated by Barabási and Albert in 1999 [23] (although the first such framework was produced through Price's model of citation growth [8,24]). In spite of the misleading use of the word 'advantage' within these frameworks it is important to highlight, as mentioned in [16], that these models are in fact neutral models as the nodes have no inherent fitness: their likelihood of selection is determined only by the number of times they have previously been selected, although there have been a number of extensions proposed that move beyond from this neutral assumption [25,26].
As demonstrated through the preceding commentary, Simon's model has proven extremely adaptable in describing behaviors within a wide spectrum of domains. Remarkably, in spite of the model's supposed straightforwardness, the research community is still providing new insights into its behaviors. For example, analytical quantification of the first-mover advantage offered towards the initial variant in Simon's model was demonstrated in [19], while the effect of averaging across multiple realizations of a closely related model upon the distribution of larger abundances, denoted as peloton dynamics, have also been considered [27]. There have also been a number of studies which have incorporated memory-effects within Simon's model such that the likelihood of a variant being copied is no longer uniform across all previous elements but rather is dependent upon how long ago the variant was last used. Such a framework requires the incorporation of a probability function φ(τ, t) describing the likelihood of copying at time t an element which appeared at time τ . For example, in Refs. [28,29] the authors introduce a fat-tailed memory distribution to the model which could capture long-range correlations in times between copying events. Furthermore, the authors in Ref. [30] consider a finite-sized memory kernel such that if a certain time has passed since a variant has last been copied it becomes extinct. Reference [31] studies the scenario whereby the likelihood of copying is dependent on a power function of its abundance, with the asymptotic behavior of said abundance being analyzed via a branching process interpretation. Lastly, a specific age-dependent model described by an additional parameter is again studied in terms of its branching behavior in Ref. [32]. Questions remain however as to the effect of incorporating a more general memory function, depending on the time since an element appeared, to Simon's model.
With these questions in mind, in this article we consider a branching process approach [33,34] towards describing a generalized Simon's model. Such techniques have previously proven extremely useful in representing similar random-copying phenomena in online diffusion scenarios [35][36][37][38][39][40]. Initially, we take this approach in describing the classical Simon's model and demonstrate how the interpretation allows numerous quantities describing the distribution of abundances within the process to be obtained in an analytically tractable manner. Moreover, we obtain results regarding the temporal evolution of a given variant as a consequence of the model incorporating the time at which a given element first appears. We proceed to use a similar framework to study a general memory function which depends upon the time t − τ elapsed since an element appeared to produce a generalized Simon's model (GSM) and demonstrate how statistical properties of abundances arising through such a process may be obtained for an arbitrary choice of said memory function. We also highlight how the previous literature focusing on specific choices of memory function [28][29][30], which are viewed as particular scenarios of our proposed framework, may be further understood using the branching process interpretation proposed here.
The remainder of the paper is laid out as follows. In Section II we introduce Simon's original model in detail before demonstrating a branching process approach towards describing it from which classic, alongside new, results are derived. Having demonstrated the usefulness of our framework we proceed in Section III to introduce a generalization of Simon's model incorporating arbitrary age-dependent memory effects and present a thorough analysis of its features. In Section IV we proceed to validate the GSM framework through extensive numerical simulations. We demonstrate how previous extensions to Simon's model may be viewed further studied using the framework presented here resulting in additional information regarding said extensions now being attainable in Section V before drawing our conclusions in Section VI.

II. RESULTS FROM SIMON'S CLASSICAL MODEL -NEW AND OLD
The model devised by Simon [4], like most neutral models, is remarkable for both its apparent simplicity and the variety of behaviors it exhibits. The original representation describes an author creating a body of text that is the population of interest, with every word used representing an element:each unique word corresponds to a variant in the neutral model framework. In the analysis to follow we shall use the neutral model descriptors (e.g., population, element, variant). The system initializes at time t = 1 with a single element and then at each subsequent discrete time-step of length ∆t another element is added. How the variant of this element is chosen occurs through a probabilistic framework whereby there may be a mutation (innovation event in the original model) with probability (w.p.) µ such that this new element is a new variant in the population. Alternatively, occurring w.p. 1 − µ, the new element is from a reproduction event implying a random replication from one of the previous elements. Our main focus in this article is on considering how the new element chooses a previous element to copy from; initially however, we study the case considered by Simon whereby this choice is made uniformly over all previously used elements.
We now focus on the main quantity of interest which is the number of times n that a variant which first appeared at time τ has been used by time Ω. Of course, due to the stochastic nature of the model, this quantity is a random variable and so we define q n (τ, Ω) to describe the corresponding probability of it taking the value n. More specifically, we shall interact with this quantity through its corresponding probability generating function (PGF) [41] given by with final condition H(Ω, Ω; x) = x, i.e., a variant which is created through a mutation event just as the process concludes would have an abundance of one. We now examine the functional form of the PGF that results from the mechanisms of Simon's model (see Fig. 1). In particular, we consider how the distribution of a variant's abundance may change over a short time interval (τ − ∆t, τ ). x)] 2 to the PGF, where we have used the fact that both elements are now identical due to the variant abundance being the only factor influencing future copying events alongside the property that the PGF of the sum of two random variables is simply the product of their corresponding PGFs [41].
2. Second, the new element may again be a result of reproduction w.p. 1 − µ but with the source of the reproduction (copying) being another element rather than the one under consideration. Such an copying event occurs w.p. (τ − ∆t)/τ , i.e., the fraction of time the other elements were present for. In this scenario a further branch is not created and as such the contribution to the PGF of interest is simply H(τ, Ω; x), i.e., a single branch that may result in future copying events.
3. Finally, the new element may be as a result of a mutation, w.p. µ. This of course results in a different variant to that under consideration and as such again contributes H(τ, Ω; x) to the PGF.
Taking these terms and their corresponding probabilities together we produce the following difference equation describing the time evolution of the PGF and if we proceed to consider the scenario in which ∆t τ we may move to a continuum time limit that proves more tractable for mathematical analysis: This is an ordinary differential equation under the assumption of fixed Ω, where H = H(τ, Ω; x). This equation is in fact of the Ricatti form which, when combined with the aforementioned final condition, may be solved to obtain an exact representation of the PGF's functional form by With this solution for the PGF at hand we may proceed to conduct analysis regarding the underlying distribution of variant abundance.

A. Analytically obtained statistical properties
The advantage of having obtained a functional representation regarding the dynamics of Simon's model is that it provides information regarding the entire distribution of abundances and is also extremely amenable to quantitative analysis. We now proceed to highlight some, although far from all, of these possible analyses regarding the statistical behavior of the variants' abundance.

Distribution of variant abundance for given seed time
First of all we consider the probability distribution of a variant's abundance for a given seed-time. Specifically we focus on the previously defined probability q n (τ, Ω) which may be accessed through the PGF H via the wellknown property such that knowledge of the n-th derivative of Eq. (4) is required. Generally this distribution is completed numerically or, to avoid the inaccuracies caused from numerical differentiation, via transformation to the complex plane from which Cauchy's theorem may be used [38,42]. Fortunately, in this specific case, it can be seen that the PGF in fact describes a geometric distribution (with an additional power of x) such that the probability mass function can be written explicitly as An array of analysis is possible with the analytical distribution at hand, for example we may consider the likelihood that a variant which first appeared at time τ does not appear again by time Ω which is given by Furthermore, we may readily calculate the moments of abundance size directly from the distribution, however, with the analysis to come later in this article in mind, we will now focus on obtaining such quantities directly from the PGF.

Moments of variant abundance distribution
We first consider the average number of times m(τ, Ω) that a variant that first appeared at time τ has appeared across all elements by time Ω. Taking advantage of the fact that the distribution's moments are readily attainable through repeated differentiation of Eq. (4) w.r.t. x, the average or first moment is obtained through the following calculation Interestingly, we note that this quantity is the inverse of the probability that the variant had only appeared once in the population given by Eq. (7). This expression highlights a number of facets within Simon's model, most obviously the 'early-mover' advantage suggesting that variants with earlier seed times τ will have, on average, appeared more times in a given realization of the process.
We specifically highlight that the mean abundance of a variant in fact exhibits a power-law relationship with the seed time.
One final point we comment on regarding this moment is the average number of appearances for the variant describing the introductory element seeded at time τ = 1, which is given by (1/Ω) −(1−µ) . This expression is in general agreement with the analysis found in [19] (assuming that µ 1 such that 1/Γ(2−µ) ≈ 1) demonstrating the intrinsic advantage offered to the first variant to appear in a realization of Simon's model.
Of course we may obtain further moments of the distribution through similar calculations to those shown above. For example, the variance of variant abundance v(τ, Ω) may be found by observing that which allows analysis regarding the dispersal of the abundance size to be readily calculated.

Distribution of variant abundances across all seed times
Simon's model originally found fame for its ability to generate power-law distributed popularity values when all variants within a given sequence are considered, regardless of their ages. This distributionq n (Ω) thus focuses on all seed times rather than the distribution of variants with a given seed time as our preceding analysis has considered. The classic result is however directly obtainable from our analysis by taking the distribution for a given seed time from Eq. (6) and averaging over all seed times, which are uniformly distributed over the interval [0, Ω], to obtaiñ where β represent the beta distribution and by consequently making use of the approximation β(a, b) ∼ b −a for large b, we obtaiñ the power-law distribution with exponent larger than two governing the entire population sequence as originally demonstrated by Simon.

B. Numerical simulations
We now provide results from numerical simulations of the classical Simon's model with the aim of validating the branching process approach described in the preceding section. Specifically, we compute an ensemble of 10 6 realizations of the process with length Ω before considering the quantities derived above. In general, when considering a given seed time τ we force a variant to be created at this time but otherwise the simulations are exactly as described by Simon in his original work. We first consider the overall distribution of variant abundance originally shown to result in a power-law distribution for larger abundances and as described through Eq. (11). The corresponding distribution is shown in Fig. 2(a) where simulations have been performed with Ω = 5 × 10 5 and µ = 0.01.
We now consider the first original result obtained as a consequence of our approach. Specifically, the distribution of abundances for a variant seeded at time τ given by Eq. (6) is considered for a number of different seed times τ = 5 × 10 2 , 10 3 , 5 × 10 3 , and 10 4 . We then observe the distribution of abundances observed after 2.5 × 10 4 time units in each scenario such that each seed time is considered with the same age at observation time Ω = τ + 2.5 × 10 4 . The results of said simulations are shown in Fig. 2(b) where we observe the 'early mover' advantage offered by Simon's model which has been a motivating factor in similar preferential attachment based models. Figure 2(c) considers a similar framework but instead focuses on how the probability of a variant having abundance n varies as a function of seed time, where in each case Ω = 10 5 . We again observe the early-mover advantage here with those large n values being more likely for the early seed times, we also show the probability of the variant not being used again after creation given by Eq. (7). Lastly, Fig. 2(d) shows the average popularity of a variant seeded at time τ and observed at time Ω = 10 5 along with the theoretical calculation given by Eq. (8), demonstrating the power-law relationship between the two quantities. We conclude by commenting on the generally excellent agreement between simulations and the theoretical results offered by the branching process interpretation of Simon's classic model.

III. A GENERALIZED SIMON'S MODEL
One of the enticing properties of Simon's original model is the simple manner in which previously used elements are chosen from when a reproduction event occurs. Specifically, the choice is made over all previously used elements such that the likelihood of choosing a certain variant is proportional to the number of times that variant has appeared prior to the choice, resulting in a preferential attachment form of dynamics. While such properties have resulted in an extensive array of literature in more recent times the concept of temporal memory -more formally known as the burstiness [43][44][45] -underlying how one makes decisions when copying has become increasingly important from a modeling perspective.
Accordingly, here we focus on such aspects by proposing a generalized Simon's model. In this model, when a new element from a reproduction event is choosing their variant type from the previously used elements the likelihood of copying from an element which was present t time units prior is proportional to a given function φ(t). This probability may be viewed as a memory function through assuming the property ∞ 0 φ(w) dw = 1. The quantity of interest is, as before, the probability that a variant which first appeared at time τ has been used n times by time Ω, denoted by q n (τ, Ω). Incorporation of the temporal element of memory requires some further considerations when performing calculations for this quantity. Specifically, we start by considering the PGF H(τ, a, Ω; x) of the random variable describing the number of times that an element, seeded at time τ due to a reproduction event from an element which had age a at the time (i.e., first appeared itself at time τ −a), has caused further elements to have the same variant type. Now, as in the classical case of Sec. II, we consider how this PGF changes over the time interval (τ − ∆t, τ ). If ∆t is sufficiently small such that at most one event occurs in said time interval, there are three possible events: 2. There may again be a reproduction event w.p. 1 − µ but in this instance the new element copies its variant from an alternative element w.p. 1 − φ(a)∆t which does not affect the abundance resulting from the element of interest, thus the contribution to its PGF is simply H(τ, a, Ω; x).
3. Finally, there may be a mutation event rather than a reproduction w.p. µ which again does not change the abundance size resulting from the element under consideration here thus contributes H(τ, a, Ω; x) to the PGF.
Taking these terms alongside their corresponding probabilities leads to the following difference equation describing Taking, as in Sec. II, the case where ∆t τ , and using a two-dimensional Taylor approximation gives the following partial differential equation which may be solved via the method of characteristics (we refer the interested reader to [46] for a similar calculation) to obtainH whereH(Ω − τ, a; x) = H(τ, a, Ω; x), i.e., the functional form of this PGF now depends only upon the time interval t = Ω − τ between seeding and observation.
This PGF describes the distribution of abundance as a result of a new element's variant being chosen by copying from a previous element of age a. We are generally interested in the first such seeding i.e., by an element with age a = 0, to obtain information regarding the entire tree size, and as such we may drop the explicit use of a and obtaiñ We note the similarity between this distribution and Eq. (8) of [46], indicating that the GSM may be viewed as a specific version of the self-exciting component within a Hawkes process [47] for a given variant. As such, previous results regarding the predictability of future popularity size of said processes are readily applicable to abundance size in the case of GSM.This apparent equivalence is rather intriguing and demonstrates a deep underlying relationship between age-dependent branching processes and a range of random-copying continuous processes. We do note however that this equivalence arises from the branching process assumption, which describes the Hawkes process exactly, but in the case of Simon's model is an approximation as during each time step only one event should occur. This violates the independence assumption of the branching process, although in Appendix B we demonstrate that the errors arising from this approximation in the classical model are minor.

A. Analysis
Having obtained the functional form of the PGF through Eq. (15) for the GSM we may now proceed to consider some possible analytical quantities obtainable regarding the process.

Mean Popularity
As in Sec. II A 2, the mean abundance of a given variant after t time units have passed since it first appeared may be obtained directly from Eq. (15) through the following calculation which does not, in general, have an analytical solution. We may, however, conduct analysis in the Laplace space to determine properties of this quantity as followŝ where hatted quantities represent their Laplace transformφ(s) = ∞ 0 φ(t)e −st ds. We proceed to solve this equation for the Laplace-transformed average popularity to obtainm (s) = 1 From this expression we may immediately calculate, by means of the final-value theorem, the average abundance in the infinite time limit as follows More generally (aside from specific distributions, e.g., the exponential, which may be handled analytically) the finite-time behavior of this quantity is obtained via numerical inversion of the Laplace transform given by Eq. (16). This is readily performed using the efficient Talbot algorithm [39,48,49]. One final consideration we describe here is the relationship between the mean memory-time T = ∞ 0 t φ(t) dt and the corresponding large-time mean abundance. This is studied by considering the small-s behavior of Eq. (18) in particular noting that where we have used the fact thatφ(0) = ∞ 0 φ(t) dt = 1 and dφ(0) ds = ∞ 0 −tφ(t) dt = − T . Using this approximation in Eq. (18) allows the mean abundance of a given variant t time units after first appearing to be written explicitly as Furthermore, in the case of an exponential memory kernel with mean time β i.e., φ(t) = 1 β e −t/β , we may exactly determine the mean abundance through inversion of Eq. (18) to obtain Indeed, this expression for the exponential memory kernel may be expanded to first order in the case where t β/µ giving i.e., a linear function of time with an inverse dependence upon the mean memory-time of the distribution.

Infinite-age distribution
Analytically obtaining the probability distribution associated with the PGF given by Eq. (15), unlike that for the classical model, generally proves infeasible as a consequence of its convoluted nature. We may, however, conduct asymptotic analysis regarding the distribution in the case of those variants with large age [38]. Specifically we proceed to assume thatH(t; x) →H ∞ (x), independent of t, as t → ∞ and make use of the fact that ∞ 0 φ(w) dw = 1 in Eq. (15) to obtaiñ The large-n behavior of the underlying distribution q n (∞) may be studied by letting x = 1 − w and H ∞ (x) = 1 − γ(w) before proceeding to make use of the small-w and small-γ asymptotics of Eq. (24). Through this analysis (detailed in Appendix C) it may be shown that the large-n abundance distribution is described by a power-law with exponential cutoff q n (∞) ∼ A n −3/2 e −n/κ , as n → ∞, where the prefactor A is given by and the cutoff scale κ is We note that this heavy-tailed behavior is distinct from that found in Simon's original analysis, as in the original case Simon considered an abundance distribution across all variants within the system whereas we focus on a single variant. The present result however suggests that the incorporation of a decaying memory within the model results in a heavier tail describing the large-n behavior of the abundance distribution even in the case of a single variant. A similar result has been shown in [30] for a specific memory function (see Sec. V) which also resulted in a power law with an exponent of -3/2 and exponential cutoff, the results presented here therefore demonstrate the generalization of this behavior to the case of an arbitrary memory kernel.

IV. SIMULATIONS
We now proceed to consider different forms of the GSM via numerical simulations alongside the analytical estimates of their behavior obtained in the preceding section. In each case we generate sequences of elements similar to those in Sec. II B but now, in the case of a replication event, the time from which to choose a target element for copying is a random variable drawn from the specified memory distribution. Due to the generally infinite support of the GSM kernel there are times in which the copying time may occur prior to the start time of the system; we consider such cases as mutation events due to elements from prior to the system's start time not being of the same variant as that in which we are interested.
First, we simulate processes of length 10 4 with an exponentially distributed memory kernel with mean time β and innovation probability µ = 0.01. The average number of times the variant has been copied by the time it has age t, i.e., when there have been t further elements created, is shown in Fig. 3(a) for β = 100, 300, 500, along with the theoretical prediction given by Eq. (22). We point out the apparent inverse relationship between mean memory time and average abundance of a given variant, suggesting that shorter memory times favor the likelihood of a variant having larger popularity. We proceed to consider the distribution of abundances across these realizations in the case β = 300 at a number of different ages t = 100, 500, 1000, 5000 in Fig. 3(b), where the complementary cumulative distribution function is shown along with the corresponding theoretical estimates obtained from inversion of Eq. (15). Lastly we consider, as in Fig. 2(b), two variants seeded at different times τ = 500, 5000 and observe the abundance of the variant at Ω = τ + 50000 i.e., when they both have the same age. Now, unlike in the classical Simon's model, the distributions are equivalent such that there is no longer a 'early-mover' advantage for variants.
We now proceed to consider an alternative memory kernel to demonstrate the robustness of our GSM formalism, specifically we consider a gamma-distributed memory given by where k is called the shape parameter and θ the scale parameter. This distribution offers a wide-range of behaviors, the effect of which upon the GSM may be observed. In particular, the average memory-time is given by T = kθ while the structure of the distribution can vary widely from exponential (when k = 1) to more symmetrically distributed about the mean when k > θ. Figure 4 again shows the average abundance of a variant simulated with this process over 10 6 realizations. Two features are immediately noted, firstly we see that the general trend of the abundance is dependent upon the mean memorytime whereby the cases where k = 10, 100, 1000 and θ = 100, 10, 1 all have a equivalent general trend while the process with larger mean memory-time (k = θ = 50) demonstrates greater average abundance growth. The second interesting aspect here is the effect of the shape of the memory distribution upon the dynamics of the average abundance. In particular, the more skewed scenario (k = 10, θ = 100) results in linear-like growth as observed in the exponentially distributed case. On the other hand, when the memory distribution is more sym- (c) CCDF of abundance for two variants which have been seeded at different times (τ = 500, 5000) but have been observed after the same amount of time has passed after seeding, resulting in identical distributions. Also note the power-law along with exponential cut-off behavior as predicted by Eq. (25).

V. SPECIAL CASES
In the preceding section we proposed the incorporation of a general memory kernel within a Simon's model framework. Prior literature has considered specific kernels, including Refs. [28,29] where a hyperbolic memory was studied, and Ref. [30] which, like the original Simon's model, utilizes a uniform memory that is, however, bounded such that elements are not considered after a certain amount of time has passed. Here we demonstrate how our proposed framework can describe these generalizations (particularly in the case of the latter) in continuous time while also allowing the determination of additional information regarding the statistical properties of the abundance distribution.

A. Schaigorodsky et al.
In [30] Schaigorodsky et al. propose an extension to Simon's model which they coin bounded memory prefer-ential growth (BMPG) whereby their choice of memory kernel is given by a step function This implies that the variant of a new element is uniformly chosen from the previous κ elements (with ∆t = 1). We may proceed, as before, to consider the average popularity of a variant with age t by calculating the Laplace transform of Eq. (29) given bŷ before substituting in Eq. (18) to obtain which gives the Laplace transform of the mean abundance size at a time t after the variant first appears in the BMPG model. While this quantity itself does not have an analytical inverse transform, it is still possible to invert it numerically as described in Sec. III A 1. Furthermore, we may perform approximate analysis regarding the mean abundance, as per Eq. (21), which is a valid approximation here in the regime where t κ. This calculation depends on the mean-memory time which in the case of BMPG is given by T = κ/2 and the corresponding expected abundance for a variant with age t is given by If we then consider the Taylor series of this function about t = 0, particularly focusing on the linear term, which is valid when t Furthermore, we readily obtain the entire distribution of abundances for variants of any age by substituting Eq. (29) into Eq. (15) and inverting numerically as before, from which moments of the distribution may also be calculated.
In order to validate the applicability of our GSM to this scenario we proceed to conduct numerical simulations of the process, with results shown in Fig. 5. Each simulation is averaged over 10 6 realizations with the dots representing simulated values and the curves their theoretical equivalents. Figure 5(a) shows the average abundance of a variant for various κ values where, as per [30], in each case the innovation probability µ is the reciprocal of the memory length κ. The theoretical results are determined via numerical inversion of Eq. (31) and generally offer excellent agreement with simulation. We also observe that in this case the linear approximation offered by Eq.(33) holds within the time regime κ t κ 2 /2, which supports the linear behavior observed in the mean abundance and furthermore gives an estimate for the slope of these functions to be approximately 2/κ. The corresponding probability distribution functions of abundance size at a number of different ages are shown in Fig. 5(b) with κ = 100. We also observe the 'peloton dynamics' reported in [27] whereby there are bulges within the distribution as it converges towards its large-time limit.

B. Cattuto et al.
A further study which analyses a memory-dependent Simon's model may be found in the works [28,29] by Cattuto et al. in which the memory kernel is defined as where γ represents a timescale characteristic to the system under consideration and C is simply a normalization factor determined by the condition tmax 0 φ(w) dw = 1, where t max is the current total time of the process, i.e., the memory distribution is varying at each step. This specific time-dependent memory kernel is beyond the scope of our proposed framework in that we assume a fixed kernel defined such that ∞ 0 φ(w) dw = 1. We may, however, consider a similar constant kernel for which the support is bounded by some upper time limit t * whereby C is determined to ensure t 0 φ(w) dw = 1. Numerical simulations (omitted from the present article) of the GSM with this memory kernel, like those studied above, offer perfect agreement with theoretical estimates of the abundance size distribution obtained from inversion of Eq. (15). Exact agreement with the original model proposed in [28,29] however would require the incorporation of a time-varying memory function within our framework which is beyond the scope of the present article.

VI. CONCLUSIONS
In this paper we have proposed a branching process framework to describe Simon's renowned randomcopying model [4] and generalizations thereof. This mapping is determined by identifying the replication events of the original model as equivalent birth events in the branching process. Through this description we obtain a detailed understanding of the temporal evolution of a new variant's abundance via a knowledge of its probability distribution under the branching process assumption. Using this analysis we reproduce classical results while also demonstrating the potential to determine further statistical properties. These findings are validated via extensive numerical simulations which ratify this approach in describing the model. Interestingly, in spite of the model's discrete nature, we find that the continuous time description suggested by our framework is extremely useful in further understanding properties of the original model.
Motivated by the branching process framework accurately describing Simon's original model (see Appendix B for a discussion on the accuracy of this interpretation) we proceeded to extend beyond the classical uniform random-copying probability to any arbitrary agedependent memory kernel resulting in our GSM framework. Such considerations have previously been studied in the case of specific kernels [28][29][30][31] and we now extend this to the general scenario. Using these ideas we demonstrate the potential for rich behaviors in such models that are in strong agreement with numerical simulations. Indeed, as per the classical model, it is possible to determine statistical properties of the process via analytical (or semi-analytical) calculation rather than through extensive Monte Carlo simulations. Lastly, we highlight the fact that power-law distributions of abundance size, as originally highlighted in Simon's model, occur for a different reason in the GSM for a variant with large age, albeit now with a smaller exponent (3/2) and an exponential tail.
Given the ubiquitous use of Simon's model as a basis for a range of different frameworks, including growing networks [23,27] and in describing the abundance distributions in numerous empirical systems [20][21][22], we believe that the theory introduced in this article offers the potential for much further exciting research. An important example is the incorporation of memory effects [39,45] within models for dynamical systems describing the behaviors of human activity. Lastly, while Simon's model is arguably the most notable neutral model [16], there are many other similar models which we anticipate to benefit from our framework in further understanding the temporal dynamics of variant abundance size.
Appendix B: Errors from the branching process approximation Throughout Sec. II of the main text we describe the classical Simon's model via a branching process approximation. It is important to note that this approach is, as stated, an approximation and in particular we consider the regime in which the time-step is smaller than the observation time, i.e., ∆t Ω, enabling a continuous time analysis to be conducted. Simon's model is, however, a discrete process by nature. In spite of this we demonstrate, via numerical simulations, that the approach appears to work extremely well in terms of capturing the behavior of the abundance of a given variant.
Here we consider the branching process approximation but in a discrete-time setting describing variants which appear near the observation time, for which the exact probabilities in both the BP setting and the model itself may be calculated and compared to determine the error associated with the approximation.
We define G τ (Ω) = G(τ, Ω; x) to be the PGF for the probability mass function describing the abundance of a variant which appeared at time-step τ observed at time Ω. Using similar arguments to those found in the main text we obtain the following difference equation to describe the dynamics of the process (B.1) where the squared term arises from the fact that a successful copying event can be construed as two separate trees beginning due to each element having the same probability of being copied in the classical model. Furthermore, we are aware of the final condition of this equation namely G Ω (Ω) = x, i.e., a variant seeded at the observation time must have only appeared once. This equation, while due to its non-linear nature being analytically troublesome, may be comfortably studied symbolically.
The approach taken above may seem extremely close to that underlying Simon's original model however the branching process description is not exact due to the fact that each tree created as a consequence of a successful copying event must be treated independently of one another from the branching process assumption. In the model itself, on the other hand, these trees are not independent as only one event may happen in a time step, i.e., if one element is successfully copied the other can not be.
In order to study the effect of this assumption in the branching process model we consider a variant which appeared at time τ = Ω − 2 and we are concerned with the probability mass function of its abundance at time Ω, i.e., after three potential copying events. Exact calculation of the quantities in the classical model, q exact n (Ω − 2, Ω) requires one to consider the possible sample paths for a variant to be copied n − 1 times, for example we now demonstrate how one may calculate q exact 2 (Ω − 2, Ω): Clearly, this approach itself becomes computationally expensive as we move the seeding event further backwards in time. Similarly, we iterate Eq. (B.1) to obtain the equivalent probabilities for the branching process interpretation, q BP n (Ω − 2, Ω), which are the coefficients of the resulting polynomial in x. Table. I demonstrates the probability values obtained from both approaches alongside the error resulting from the branching process approximation. Importantly, the error scales inversely with the observation time to the power of the number of time steps -three in the above example -which, in the case of large observation time, explains the good agreement observed between the branching process approximation and numerical simulations. I: Analysis of errors associated with the branching process approximation of Simon's classical model. The distribution of variant abundance from a mutation event at time Ω − 2 observed at time Ω is determined. The probabilities are calculated via both the branching process model from Eq. B.1 and the exact values via the sample paths of the true process, alongside the error calculated as the difference between the two. Abundance n q exact n (Ω − 2, Ω) q BP n (Ω − 2, Ω) Error Eq. (24) about w = 0 and keeping terms of order w, ψ and ψ 2 but ignoring terms of order ψw we obtain which has solution where we have taken the positive root as the PGF should be bounded above by one. Note that a branch point α exists in the complex x plane at x = 1 + µ 2 2(1−µ) 3 . Now, using the Cauchy theorem we may express the discrete probabilities in terms of the underlying generating function as where C is a closed contour in the complex plane which does not enclose any poles ofH. We deform C such that it does not enclose the branch point α into the contour C R ∪ l 2 ∪ C ε ∪ l 1 as seen in Fig. C.1. It can be easily shown that the contributions to the integral from the circular contours C R and C ε limit to zero as R → ∞ and ε → 0 [50]. Also, the contribution from the analytical part ofH will cancel due to the opposite directions of the line integrals. Therefore as w → 0 we have We now make the substitution ν = re iθ where along l 1 θ = π, ν = −r and √ ν = √ r e iπ 2 and along l 2 θ = −π, ν = −r and √ ν = √ r e −iπ 2 , taking the limit as R → ∞ we now consider the limits on the integral. In the x-plane l 1 varies from 1 + µ 2 2(1−µ) 2 to ∞ and as such w varies from − µ 2 2(1−µ) 2 to −∞, hence ρ takes values between µ 2 2(1−µ) 2 and ∞, and finally ν varies between 0 to −∞. A similar line of thought allows one to see that along l 2 , ν varies between −∞ to 0. Thus our integral may now be expressed as where the integral is now well-known with solution q n = (1 − µ) 1/2 √ 2π n − 3 2 e −nµ 2 2(1−µ) 2 (C. 8)