Effects of Network Structure, Competition and Memory Time on Social Spreading Phenomena

Online social media has greatly affected the way in which we communicate with each other. However, little is known about what fundamental mechanisms drive dynamical information flow in online social systems. Here, we introduce a generative model for online sharing behavior that is analytically tractable and that can reproduce several characteristics of empirical micro-blogging data on hashtag usage, such as (time-dependent) heavy-tailed distributions of meme popularity. The presented framework constitutes a null model for social spreading phenomena that, in contrast to purely empirical studies or simulation-based models, clearly distinguishes the roles of two distinct factors affecting meme popularity: the memory time of users and the connectivity structure of the social network.

Recent advances in communication technologies and the emergence of social media have made it possible to communicate rapidly on a global scale. However, since we receive pieces of information from multiple sources, this has also made the information ecosystem highly competitive: in fact, users' influence and visibility are highly heterogeneous, and memes or topics strive for users' attention in online social systems. Although several studies have described the dynamics of information flow in popular communication media [1-5], the main factors determining the observed patterns have not been identified and there is no theoretical framework that addresses this challenge. Indeed, given the potential for applications (e.g., having more efficient systems to spread information for safety and preparedness in the face of threats), a better understanding of how memes (ideas, hashtags, etc.) emerge and compete in online social networks is critical.
To address this problem, we develop a theoretical framework that describes how users choose among multiple sources of incoming information and affect the spreading of memes on a directed social network, like Twitter [1-3]. Our probabilistic model, in contrast to other studies [3-7] that are based on intensive computational simulations to fit to data, allows us to get analytical insights into the respective roles of the network degree distribution, the memory-time distribution of users, and the omnipresent competition between memes for the limited resource of user attention. Using this analysis, we fit the model to hashtag popularities extracted from micro-blogging data, and predict features of the time-dependent data.

Figure 1: Schematic of the model. (A) Timeline of users' actions in a typical realization of the model. User A is followed by users B and C; arrows between nodes denote the direction of information transmission. Note that user B also follows many other users, and so his stream contains more memes than the streams of A or C. At time $t^A_R$, user A retweets a previously-seen meme (with probability $1-\mu$, given A is active). She chooses the red meme to retweet, by looking backwards in her stream a distance determined by the memory-time distribution $\Phi$ (only memes that A deemed "interesting" are shown in her stream). Her retweet of the red meme is accepted as "interesting" (and so inserted into their streams) by each follower of A with probability $\lambda$. At time $t^C_R$, user C retweets the red meme to his followers, so further increasing the popularity of the red meme. At time $t^A_I$, user A innovates (a probability $\mu$ event, given A is active) by inventing the new blue meme and broadcasting it to her followers. (B) Branching process representation (Sec. S2) of the popularities of the red meme and of the blue meme. Each retweet generates new branches of the process, as the meme is inserted into the streams of followers of the tweeting user.
Our model is as follows. In online communication platforms like Twitter, users follow (receive the broadcasts or "tweets" of) other users. In graph-theoretical terms, these relationships constitute directed links from the followed node (user) to the follower (Fig. 1), and the network composed of all users is characterized by an out-degree distribution $p_k$ (the probability that a randomly-chosen user has $k$ followers). For simplicity, let us first assume that each user follows exactly $z$ others, where $z = \sum_k k p_k$ is the mean degree of the network; the general case in which this assumption is relaxed is addressed in the supplementary materials (SM). Additionally, each user has a "stream" that records all tweets she receives, time-stamped by their arrival time. We assume that only a fraction $\lambda$ of the tweets received are deemed "interesting" by the user, and only the interesting tweets are considered for possible retweeting by that user. (Here we use the term "retweeting" in a general sense, to include any reuse of a previously-received meme such as a hashtag.) The activity rate of a user (the average number of tweets that she sends per unit time) can depend on how well-connected the user is within the social network [3], but here we describe only the simplest case of homogeneous activity rates (for the general case, see SM), so that each user sends a tweet, on average, once per time unit (i.e., as a Poisson process with rate 1).

When a user decides, at time $t$, to send a tweet, she has two options (see Fig. 1): with probability $\mu$, the user innovates, i.e., invents a new meme, and tweets this new meme to all followers. The new meme appears in the user's own stream (it is automatically interesting to the originating user), and in the streams of all her followers, where it may be deemed interesting by each follower, independently, with probability $\lambda$. If not innovating (with probability $1-\mu$), the user instead chooses a meme from her stream to retweet. The meme for retweeting is chosen by looking backwards in time an amount $t_m$ determined by a draw from the memory-time distribution $\Phi(t_m)$, and finding the first interesting meme in her stream that arrived prior to the time $t - t_m$. The retweeted meme then appears in the streams of the user's followers (time-stamped as time $t$), but because it is a retweet, it does not appear a second time in the stream of the tweeting user. The popularity $n$ of a meme is the number of times it has been tweeted or retweeted; this quantity depends on the age $a$ of the meme (the time since it was first tweeted) [7].
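The rules above can be illustrated with a small event-driven simulation. The following is a minimal sketch, not the simulation code used for the figures: the function name `simulate`, its default parameter values, the exponential choice for the memory-time distribution, and the rule that a retweet attempt is skipped when no meme is found in the stream are all assumptions made for illustration. The network is built by letting every user follow exactly `z` others chosen uniformly at random.

```python
import bisect
import random


def simulate(n_users=200, z=5, mu=0.05, lam=1.0, mean_memory=1.0,
             n_events=20000, seed=1):
    """Toy simulation of the neutral meme model; returns {meme_id: popularity}."""
    rng = random.Random(seed)
    # Assumption: every user follows exactly z distinct, randomly chosen others.
    followers = [[] for _ in range(n_users)]
    for u in range(n_users):
        others = [v for v in range(n_users) if v != u]
        for v in rng.sample(others, z):
            followers[v].append(u)       # u follows v, so v broadcasts to u
    # Each stream is a pair of parallel lists: arrival times and meme ids.
    times = [[] for _ in range(n_users)]
    memes = [[] for _ in range(n_users)]
    popularity = {}
    t = 0.0

    def broadcast(user, meme):
        """Count one (re)tweet and insert the meme into followers' streams."""
        popularity[meme] = popularity.get(meme, 0) + 1
        for f in followers[user]:
            if rng.random() < lam:       # follower deems the meme interesting
                times[f].append(t)
                memes[f].append(meme)

    for _ in range(n_events):
        t += rng.expovariate(n_users)    # n_users merged unit-rate Poisson processes
        user = rng.randrange(n_users)
        if rng.random() < mu:            # innovation: a brand-new meme
            meme = len(popularity)
            times[user].append(t)        # own meme is automatically interesting
            memes[user].append(meme)
            broadcast(user, meme)
        else:                            # retweet: look back a memory time t_m
            t_m = rng.expovariate(1.0 / mean_memory)
            idx = bisect.bisect_right(times[user], t - t_m) - 1
            if idx >= 0:                 # assumption: do nothing if stream is empty there
                broadcast(user, memes[user][idx])
    return popularity
```

A histogram of the values returned by `simulate()` gives an empirical popularity distribution for the chosen parameters; a retweet is not re-inserted into the tweeting user's own stream, as in the model description.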
The model as described is a "neutral model" [8,9] in the sense that all memes have the same "fitness" [10]: no meme has an inherent advantage in terms of its attractiveness to users. Nevertheless, the competition between memes for the limited resource of user attention causes initial random fluctuations in popularities of memes to be amplified, and leads to popularity distributions with very heavy tails [11]: heavier, for example, than can be generated by models of preferential attachment or cumulative advantage type [12-16]. This "competition-induced criticality" was studied for a zero-memory ($\Phi(t_m) = \delta(t_m)$) version of this model in Ref. [17]; numerical simulation results for a closely related model were first reported in Ref. [3].
A branching process approximation for the model (SM) enables us to understand how the network structure (via the out-degree distribution $p_k$) and the users' memory-time distribution ($\Phi(t_m)$) affect the popularity distribution of memes. Defining $q_n(a)$ as the probability that a meme has popularity (total number of (re)tweets) $n$ at age $a$, the branching process provides analytical expressions that determine the probability generating function (PGF) of the popularity distribution, $H(a;x) = \sum_{n=1}^{\infty} q_n(a)\, x^n$. In the small-innovation limit $\mu \to 0$, the model describes a critical branching process [18,19], for which power-law distributions of popularity (avalanche size) are expected [20-22]. Although the analytical expressions are somewhat complicated, we can see the impact of $p_k$ and $\Phi(t_m)$ by focusing on certain features of the distribution. For example, the fraction of memes that are not retweeted at age $a$, i.e., those whose popularity remains at $n = 1$ until at least age $a$, is given by $q_1(a) = \lim_{x \to 0} H(a;x)/x$. The large-age asymptotic behavior of this quantity can be written explicitly in terms of the network out-degree distribution and the memory-time distribution (Sec. S2.3). In particular, $q_1(a)$ limits to a non-zero value as the age $a$ tends to infinity, capturing the observed fact that the competition for user attention results in a substantial fraction $q_1(\infty)$ of memes being ignored (never retweeted) subsequent to their birth [1,2,23], see Fig. 2B.

Figure 2: [...] (B) Fraction $q_1(a)$ of memes that are not retweeted by age $a$, on the scale-free network of (A) and for various memory-time distributions $\Phi(t_m)$ (red = exponential with mean $T$, blue/green = Gamma($0.1$, $10T$)), using Eq. (S2). Dashed lines show the $T = 5$ cases; solid lines represent $T = 1$. (C) Mean popularity of memes of age $a$, for the same cases as in (B), and compared with Eq. (1) (using the numerical Laplace transform inversion described in Sec. S4); inset shows the large-$a$ behavior. All panels have $\mu = 0.02$ and (except for green curves) $\lambda = 1$.
The expected (mean) popularity of memes of age $a$, $m(a) = \sum_n n\, q_n(a) = \partial H/\partial x|_{x=1}$, can also be found explicitly, up to a Laplace transform inversion: [...] where hats denote Laplace transforms. Strikingly, unlike $q_1$, the mean popularity depends on the network degree distribution $p_k$ only through the mean degree $z$, implying that the mean popularity is independent of the finer details of the network structure. The large-age, small-$\mu$ asymptotics of $m(a)$ may be inferred from Eq. (1): this analysis (Sec. S2.5) is valid for ages $a$ with $a \gg T$, where $T = \int_0^{\infty} t_m \Phi(t_m)\, dt_m$ is the mean memory time. We find that $m(a)$ grows approximately linearly with age $a$, $m(a) \approx \frac{\lambda z + 1}{T \lambda z + 1}\, a$, until ages of order $\frac{T \lambda z + 1}{\mu(\lambda z + 1)}$, whereupon $m(a)$ limits to its $a \to \infty$ value of $1/\mu$. If the memory-time distribution has significant probability mass at low values of $t_m$ (e.g., the gamma distribution in Fig. 2) then the mean popularity initially grows faster than linearly with age (see Fig. 2C).
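The linear-growth regime and its end point can be checked numerically. The sketch below assumes that the slope reads $(\lambda z + 1)/(T \lambda z + 1)$ and the crossover age reads $(T \lambda z + 1)/(\mu(\lambda z + 1))$; the function names are hypothetical. With these readings, the linear growth reaches the limiting value $1/\mu$ exactly at the crossover age.

```python
def linear_growth_slope(lam, z, T):
    """Approximate growth rate of the mean popularity m(a) for T << a << crossover."""
    return (lam * z + 1.0) / (T * lam * z + 1.0)


def crossover_age(lam, z, T, mu):
    """Age of order (T*lam*z + 1) / (mu*(lam*z + 1)) at which m(a) saturates at 1/mu."""
    return (T * lam * z + 1.0) / (mu * (lam * z + 1.0))


if __name__ == "__main__":
    lam, z, mu = 1.0, 10.0, 0.02      # illustrative values only
    for T in (1.0, 5.0):
        slope = linear_growth_slope(lam, z, T)
        a_star = crossover_age(lam, z, T, mu)
        # slope * a_star equals the infinite-age mean popularity 1/mu
        print(T, slope, a_star, slope * a_star)
```

A longer mean memory time $T$ lowers the slope and delays the crossover, but leaves the limiting value $1/\mu$ unchanged.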
Notably, the infinite-age limits of both $q_1(a)$ and $m(a)$ are independent of the memory-time distribution $\Phi(t_m)$. This property is, in fact, shared by the entire popularity distribution in the $a \to \infty$ limit, which implies that the large-$n$ asymptotic behavior can be determined using similar methods as for the zero-memory case in Ref. [17]. We find that if the out-degree distribution $p_k$ of the network has a finite second moment, then the popularity distribution is a truncated power law with exponent 3/2: $q_n(\infty) \sim A\, e^{-n/\kappa}\, n^{-3/2}$ as $n \to \infty$ (see Sec. S2.6 for explicit expressions for the constants $A$ and $\kappa$; note the exponential cutoff $\kappa$ limits to $\infty$ as $\mu$ tends to 0). However, if the network has a heavy-tailed degree distribution, i.e., $p_k \propto k^{-\gamma}$ with $2 < \gamma < 3$, then the infinite-age popularity distribution is also power law: $q_n(\infty) \sim B\, n^{-\gamma/(\gamma-1)}$ as $n \to \infty$ if $\mu \approx 0$. Note the exponent of the power law, $\gamma/(\gamma-1)$, lies between 1.5 and 2, indicating that the popularity distribution is extremely heavily skewed.
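As a quick numerical illustration of the two regimes, the helper below (a hypothetical function, not part of the source analysis) returns the infinite-age tail exponent: 3/2 when the out-degree distribution has finite second moment, and $\gamma/(\gamma-1)$ for a scale-free out-degree distribution with $2 < \gamma < 3$.

```python
def popularity_tail_exponent(gamma=None):
    """Tail exponent of q_n(infinity) in the mu -> 0 limit.

    gamma=None stands for an out-degree distribution with finite second
    moment; otherwise gamma is the out-degree exponent, with 2 < gamma < 3.
    """
    if gamma is None:
        return 1.5
    if not 2.0 < gamma < 3.0:
        raise ValueError("the heavy-tailed result is stated for 2 < gamma < 3")
    return gamma / (gamma - 1.0)


if __name__ == "__main__":
    for g in (None, 2.13, 2.5, 2.9):
        print(g, popularity_tail_exponent(g))
```

The exponent approaches 2 as $\gamma \to 2$ and approaches 3/2 as $\gamma \to 3$, so the two cases join continuously.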
The approach of the popularity distribution to its infinite-age limit, i.e., the behavior for finite but large ages, is of particular interest for real-world social networks that have $p_k \sim D\, k^{-\gamma}$ for large $k$. We focus especially on the case where $\gamma$ is close to 2, as found for the out-degree distribution of many real-world social networks. In this case, we show (Sec. S2.7) that for large $n$ and $\mu \approx 0$, the probability $q_n(a)$ grows approximately linearly with age $a$ for large $a$, with a slope that is proportional to the growth rate of the mean popularity $m(a)$. This result implies that the ratio $q_n(a)/m(a)$ is independent of age $a$, at least for values of $a$ within the range of validity of the approximations. As a consequence, we expect that if we rescale the age-$a$ popularity distribution by dividing by its mean $m(a)$, then the rescaled distributions for different ages should collapse onto a single curve.

We tested this prediction on a 1-year dataset comprising the popularities of $1.4 \times 10^5$ hashtags related to the 2011 15M protest movement in Spain [24,25]. We randomly sampled $8.2 \times 10^5$ users on Twitter to determine the distribution $p_k$ of number of followers, and found $p_k \sim D\, k^{-\gamma}$ for large $k$, with $\gamma = 2.13$ (Fig. S2). Although the model parameters cannot be directly calculated from the dataset (Sec. S1), by assuming that the memory-time distribution for users is a gamma distribution [19,26], $\Phi = \mathrm{Gamma}(k_G, \theta)$, Eqs. (1) and (S1) enabled us to find model parameters $\lambda$, $\mu$, $k_G$ and $\theta$ that provide a good fit to the age-dependent mean and the old-age ($a \to \infty$) distribution of hashtag popularities. Figure 3B demonstrates that the model and data both exhibit the predicted collapse of age-dependent distributions when scaled by their mean (as fitted in Fig. 3C). We show in the SM (Sec. S3) that the fit of the model to the observed $q_1(a)$ (Fig. 3D) can be improved by including realistic dependence of user activity rates upon the number of followers [3,27].

Figure 3: [...] Here, our basic model with homogeneous user activity rates does not fit well to the data (red curve), but allowing for heterogeneous user activity rates improves the fit (blue curve), without compromising the other matches with data (see Fig. S5).
In summary, despite its simplicity, the model matches the empirical popularity distribution of hashtags on Twitter remarkably well; this is consistent with random-copying models of human decision-making [28] where the quality of the product-here, the "interestingness" of the meme-is less important than the social influence of peers' decisions [29].The generalization of the model (as shown in the SM) to incorporate (i) heterogeneous user activity rates and (ii) a joint distribution p jk of the number of users followed j and the number of followers k, remains analytically tractable and confirms the robustness of our main finding: that competition between memes for the limited resource of user attention induces criticality in the vanishing-innovation limit, giving power-law popularity distributions and epochs of linear-in-time popularity growth.We believe that the analytical results and potential for fast fitting to data will render this a useful null model for further investigations of the entangled effects of memory, network structure, and competition on information spread through social networks [30].

S1 Further information on Figures 2 and 3
For the numerical simulation results in Fig. 2 of the main text, we generate configuration-model directed networks with prescribed out-degree distribution $p_k$. Each one of $N$ users (nodes) is assigned a random number $k$ (drawn from the distribution $p_k$) of out-links (links to followers). The identities of the $k$ followers are chosen uniformly at random from the set of all users; in the $N \to \infty$ limit, this gives a Poisson in-degree distribution $p_j$ which, for sufficiently large $z$, gives similar results to using the in-degree distribution $p_j = \delta_{j,z}$, i.e., assuming every user follows exactly $z$ others [17].
The large-$a$, large-$n$, $\mu = 0$ asymptotics of the popularity distribution are determined (see derivation in Sec. S2.7) from the Laplace transform of a probability generating function (PGF), which is given (under the assumptions of the main text, i.e., $p_j = \delta_{j,z}$ and homogeneous user activity rates, and for networks with the scale-free out-degree distribution [...] second moment (e.g., a Poisson out-degree distribution, as in the inset of Fig. 2A) is given by Eq. (S60). Under the assumptions of the main text, the fraction $q_1(a)$ of memes that have not been retweeted by age $a$ has the following large-$a$ asymptotic behavior (see Sec. S2.3 for derivation): [...] where $C(a)$ is the cumulative distribution function for memory times: $C(a) = \int_0^a \Phi(t_m)\, dt_m$.

The results in Figs. 2B and 2C are specific to networks with the scale-free out-degree distribution $p_k = D\, k^{-\gamma}$ for $k \ge 4$ (and $p_k = 0$ for $k < 4$), with exponent $\gamma = 2.5$. Figure S1 shows the corresponding results for networks with a Poisson out-degree distribution (e.g., Erdős-Rényi directed graphs) with mean degree $z$ matching that of the scale-free networks in Fig. 2. The age-dependence of $q_1(a)$ in Fig. S1A is qualitatively similar to that of Fig. 2B, but note that the limiting value as $a \to \infty$ is different in the two cases: using Eq. (S2) (with $C(\infty) = 1$) we obtain $q_1(\infty) = 0.50$ for the scale-free case with $\lambda = 1$, whereas $q_1(\infty) = 0.37$ for the Poisson network. In contrast, the mean popularity shown in Fig. S1B is essentially identical to that of Fig. 2C: this is because the mean popularity given by Eq. (1) of the main text depends only on the mean degree $z$ of the network, which is the same in both the scale-free and Poisson networks.
The data plotted in Fig. 3 give the popularity of $1.4 \times 10^5$ hashtags related to the 2011 15M protest movement in Spain that were tracked over the 1-year period from March 2011 to March 2012 [24,25]. In Fig. 3 we use all hashtags for which we have at least 200 days of data; each curve in Fig. 3A shows the popularity distribution for all hashtags which have the same age (to the nearest day). The model parameter $\lambda$ and the memory-time distribution $\Phi(t_m)$ cannot be directly estimated from the data because, in cases where users receive multiple copies of the same meme (hashtag) prior to retweeting it, it is impossible to tell which of the received memes "caused" the retweet. Therefore, we instead use the analytical results of the model (Eqs. (1) and (S1)) to find parameter values that fit the model to the statistical characteristics of the data, see Fig. 3.
The data does, however, provide an upper bound on the value of the innovation probability $\mu$. Recall that $\mu$ is defined as the probability that a tweeted meme (hashtag) is an innovation, i.e., that the hashtag has never before appeared in the system. Each innovation event thus increases by one the number of distinct hashtags that appear in the dataset, whereas a non-innovative (copying) tweet will instead increase the number of copies of a hashtag that is already present in the dataset. We can therefore calculate an upper bound on the empirical innovation probability from the ratio

$$\tilde{\mu} = \frac{\text{number of distinct hashtags used in the dataset}}{\text{total number of hashtags tweeted by users}} = \frac{322799}{5886837} = 0.055. \qquad \text{(S3)}$$

Note this upper bound is consistent with the parameter value of $\mu = 0.033$ used in Fig. 3. The reason why Eq. (S3) gives an upper bound rather than an exact value for $\mu$ is the finite size of the dataset: the data collection started at a specific point in time and so any hashtags that are in fact copied from tweets received prior to the start date will be erroneously counted as "distinct hashtags" in the estimate, thus leading to an overestimate of the true innovation probability.
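The ratio in Eq. (S3) is straightforward to compute from a list of observed hashtag uses. The sketch below is illustrative only: the function names and the toy list of tags are assumptions, while the two counts 322799 and 5886837 are those quoted above.

```python
def innovation_upper_bound(n_distinct, n_total):
    """Upper bound on mu: distinct hashtags divided by all hashtag uses."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return n_distinct / n_total


def innovation_upper_bound_from_tags(tags):
    """Same ratio, computed from a sequence with one entry per hashtag use."""
    return innovation_upper_bound(len(set(tags)), len(tags))


if __name__ == "__main__":
    bound = innovation_upper_bound(322799, 5886837)
    print(round(bound, 3))
    # Toy example (made-up tags): 3 distinct hashtags in 6 uses.
    print(innovation_upper_bound_from_tags(["a", "b", "a", "c", "a", "b"]))
```

Any fitted value of $\mu$ should lie below this bound, as the value $\mu = 0.033$ does.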
To sample the out-degree distribution $p_k$ of the Twitter network, we randomly selected $8.2 \times 10^5$ Twitter user ids and recorded the number of followers $k$ of each user. The measured mean number of followers is $z = 703$, but the distribution $p_k$ is heavy-tailed. The complementary cumulative distribution function (CCDF) of the $k$ values is shown in Fig. S2, along with the line $\frac{D}{\gamma-1}\, k^{1-\gamma}$ with $D = 240$ and $\gamma = 2.13$ that corresponds to an out-degree distribution with tail scaling as $p_k \sim D\, k^{-\gamma}$ as $k \to \infty$ [31]. We use these values of $D$, $\gamma$ and $z$ in Eq. (S1) to produce the model results in Figs. 3A and 3B (using numerical inversion of the Laplace-transformed PGF, see Sec. S4 for details).
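The straight line quoted above follows from integrating the tail $p_k \sim D\, k^{-\gamma}$ over degrees larger than $k$. A minimal sketch of that reference line is given below; the function names are hypothetical, and the default values of $D$ and $\gamma$ are the fitted values quoted in the text.

```python
import math


def tail_ccdf(k, D=240.0, gamma=2.13):
    """Asymptotic CCDF D/(gamma-1) * k**(1-gamma) for a tail p_k ~ D*k**(-gamma)."""
    return D / (gamma - 1.0) * k ** (1.0 - gamma)


def loglog_slope(f, k, factor=2.0):
    """Slope of f between k and factor*k on doubly logarithmic axes."""
    return (math.log(f(factor * k)) - math.log(f(k))) / math.log(factor)


if __name__ == "__main__":
    for k in (1.0e4, 1.0e5, 1.0e6):
        print(k, tail_ccdf(k), loglog_slope(tail_ccdf, k))
```

On log-log axes the line has slope $1-\gamma$, which is the slope to compare against the empirical CCDF in the tail region.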

S2 Derivation and analysis of model equations
In this section we derive the equations for the branching process approximation of the model, and use asymptotic methods to understand how the results depend on the parameters of the model.

S2.1 Derivation of governing equations
We begin by considering a somewhat more general setting for the model than that described in the main text, and we derive the equations for the branching process approximation in this general case; the specialization to the case considered in the main text will then be straightforward. In particular, we allow here for a more general network structure, and for heterogeneous user activity rates, while in the main text we specialize to networks described by only their out-degree distribution $p_k$ and we assume all users have equal activity rates.

Figure S2: [...] The straight line corresponds to an out-degree distribution with tail scaling as $p_k \sim D\, k^{-\gamma}$ as $k \to \infty$, with $D = 240$ and $\gamma = 2.13$ ($x_{\min} = 1.1 \times 10^4$, fitted as described in [31]).

S2.1.1 Description of the generalized model
The network structure is defined by the joint probability $p_{jk}$ that a randomly-chosen node (user) has in-degree $j$ (i.e., follows $j$ other Twitter users) and out-degree $k$ (i.e., has $k$ followers), but the network is otherwise assumed to be maximally random (a configuration model directed network). The mean degree of the network is $z = \sum_{j,k} k\, p_{jk} = \sum_{j,k} j\, p_{jk}$. For the special case of the main text, we assume all users follow $z$ others, so $p_{jk}$ can be replaced with $\delta_{j,z}\, p_k$, where $\delta_{j,z}$ is the Kronecker delta and $p_k$ is the out-degree distribution.
Each user has a "stream" that records all tweets received by the user, time-stamped by their arrival time. We assume that only a fraction $\lambda$ of the tweets received are deemed "interesting" by the user, and only the interesting tweets are considered for possible retweeting by that user. The activity rate of a user (the average number of tweets that she sends per unit time, i.e., the rate of the Poisson process that describes her tweeting activity) is, in the general case, assumed to depend on her in-degree $j$ and out-degree $k$ (her "$(j,k)$-class" for short); this assumption is supported by empirical evidence from Twitter, see Fig. 6 of [27]. The user activity rates $\beta_{jk}$ give the relative activity levels of users in the $(j,k)$ class; the rates are normalized so that $\sum_{j,k} \beta_{jk}\, p_{jk} = 1$. If there are $N$ users in the network, this rate implies that an average of $N$ tweets are sent in each model time unit. For the description in the main text (except in Fig. 3D), we specialize to the case where all user activity rates are equal: $\beta_{jk} \equiv 1$.

When a user decides, at time $t$, to send a tweet, she has two options (refer to Fig. 1 of the main text): with probability $\mu$, the user innovates, i.e., invents a new meme, and tweets this new meme to all followers. The new meme appears in the user's own stream (it is automatically interesting to the originating user), and in the streams of all her followers (where it may be deemed interesting by each follower, independently, with probability $\lambda$). If not innovating (with probability $1-\mu$), the user chooses a meme from her stream to retweet. The meme for retweeting is chosen by looking backwards in time an amount $t_m$ determined by a draw from the memory-time distribution $\Phi(t_m)$, and finding the first interesting meme in her stream that arrived prior to the time $t - t_m$. The retweeted meme then appears in the streams of the user's followers (time-stamped as time $t$), but because it is a retweet, it does not appear a second time in the stream of the tweeting user. The popularity $n(a)$ of a meme is the total number of times it has been tweeted or retweeted by age $a$, i.e., by a time $a$ after its first appearance (when it was tweeted as an innovation).

Figure S3: Schematic for the derivation of the PGF equations, see Sec. S2.1.2. (A) The stream of user A, showing only memes that were deemed interesting by user A; each color represents a different meme. At time $t$, user A decides to retweet a meme from the past, and looks back to time $r$, where she finds meme $M$ (colored red). She sends this meme to her followers (not shown); each follower independently deems the meme interesting with probability $\lambda$. Also shown is a later retweet event, which also copies meme $M$. (B) The retweet tree for meme $M$, seeded at time $\tau$. Each retweet by user A of meme $M$ generates a new branch on this tree; each branch can also generate further retweets by followers of A, and these subtrees are denoted by squares. (C) Schematic depiction of Eqs. (S9) and (S17).

S2.1.2 Derivation of the PGF equations
We define $G_{jk}(\tau, \Omega; x)$ as the probability generating function (PGF) for the size of the "retweet tree", as observed at time $\Omega$, that grows from the retweeting of a meme that entered, at time $\tau \le \Omega$, the stream of a $(j,k)$-class user, see Fig. S3B. To obtain an equation for $G_{jk}$, we consider the stream of a random $(j,k)$-class user (called "user A") with a meme $M$ that entered the stream at time $\tau$ (either by innovation, or because it was received from a followed user and deemed interesting by A), see Fig. S3A.
The likelihood that meme $M$ is retweeted in the future depends on how quickly other tweets enter the stream of user A. In fact, meme $M$ can be considered to "occupy" the stream for a time interval $\ell$ stretching from $\tau$ until the time $\tau + \ell$ when the next interesting meme enters the stream of user A. New memes enter the stream as a Poisson process at the constant rate (see footnote 2)

$$r_{jk} = j \beta \lambda + \mu \beta_{jk}, \qquad \text{(S4)}$$

so the occupation time $\ell$ of meme $M$ (the time it occupies the stream of user A) is an exponentially distributed random variable with PDF $r_{jk}\, e^{-r_{jk} \ell}$. We note in passing that the mean occupation time $1/r_{jk}$ is, for small innovation probabilities $\mu$, inversely proportional to $j$, the number of users followed. Thus, a user who follows many others experiences tweets entering his stream at a higher rate than a lower-$j$ user (compare the streams of users B and C in the schematic Fig. 1). Consequently, the high-$j$ user is less likely to see (and so to retweet) a given meme than a low-$j$ user. This aspect of the model is consistent with empirical data, see Fig. 3 of [32] for example.
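The dependence of the occupation time on the number of users followed can be made concrete with a short sketch of Eq. (S4). The function names and the default values `beta=1.0` and `beta_jk=1.0` (the homogeneous-activity case) are assumptions made for illustration.

```python
import random


def stream_rate(j, lam, mu, beta=1.0, beta_jk=1.0):
    """Rate r_jk = j*beta*lam + mu*beta_jk at which new memes enter a stream (Eq. S4)."""
    return j * beta * lam + mu * beta_jk


def sample_occupation_time(j, lam, mu, rng, beta=1.0, beta_jk=1.0):
    """Draw one exponentially distributed occupation time with rate r_jk."""
    return rng.expovariate(stream_rate(j, lam, mu, beta, beta_jk))


if __name__ == "__main__":
    rng = random.Random(0)
    for j in (5, 50, 500):
        draws = [sample_occupation_time(j, 0.5, 0.02, rng) for _ in range(20000)]
        # empirical mean versus the exact mean 1 / r_jk
        print(j, sum(draws) / len(draws), 1.0 / stream_rate(j, 0.5, 0.02))
```

For small $\mu$ the mean occupation time is close to $1/(j \beta \lambda)$, so doubling the number of users followed roughly halves the time a given meme stays at the top of the stream.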
To determine the size of trees originating from meme $M$, we consider that trees observed at a time $\Omega \ge \tau$ must be created by the retweeting by user A, at some time(s) between $\tau$ and $\Omega$, via looking back in her stream to a time $r$, where $r$ lies between $\tau$ and $\min(\tau + \ell, \Omega)$ (i.e., $r$ lies within the time interval where meme $M$ occupies the stream). Consider a time interval of (small) length $dr$, centered at time $r$, and calculate the size of trees that are seeded by a retweet based on a lookback into this interval, from a time $t$, with $t > r$, see Fig. S3. In each $dt$ interval centered at time $t$, a tree will be seeded with probability [...] and will grow to a tree with size distribution (at observation time $\Omega$) generated by [...] where [...] is the PGF for the sizes of trees originating from the successful insertion at time $t$ of a meme (that is deemed interesting) into the stream of a random follower.

Footnote 2: User A follows $j$ users, each of which is assumed to tweet at the average rate $\beta = \sum_{j,k} \frac{k}{z}\, \beta_{jk}\, p_{jk}$. Each meme sent by these $j$ users is deemed interesting by A with probability $\lambda$, so the rate at which interesting memes enter the stream of user A is $j \beta \lambda$. Moreover, user A innovates at a rate $\mu \beta_{jk}$, which gives the second term of Eq. (S4). If either an incoming tweet or an innovation event occurs, a new meme is inserted into the stream of user A, and the occupation time of meme $M$ is ended.
To calculate the total size of the tree seeded by copying from the $dr$-interval, we must add the sizes of trees that are copied into all times $t$ with $t > r$. Since each copying event is independent, the total tree size is generated by [...]. Taking logarithms of both sides of this equation and expanding to first order in $dt$ gives [...], so $J(r; x)$ can be written as [...]. Recall that $J(r; x)$ is the PGF for trees seeded by copying from time $r$. To obtain the total size of all children trees of meme $M$, we must consider trees seeded at all possible times $r$ from $\tau$ to the time $\min(\tau + \ell, \Omega)$ that marks the end of the occupation of user A's stream by meme $M$. Each $dr$ time interval again independently generates trees with sizes distributed according to Eq. (S12), so the PGF for the total size is found by multiplying together copies of the $J(r; x)$ function for each $dr$ time interval, thus: [...]

S2.2 Criticality of the branching process
A branching process may be classified by the expected (mean) number $\xi$ of "children" of each "parent": if this number (called the "branching number") is less than 1, the process is subcritical, and if $\xi$ is greater than 1 the process is supercritical. Critical branching processes, with an average of exactly one child per parent, give rise to power-law distributions of tree sizes and of durations of growth cascades [20,22]. Here we demonstrate that the general process derived in Sec. S2.1 is a critical branching process in the limit of vanishing innovation $\mu \to 0$. We identify the "parent" in the process as a meme that was accepted into the stream (i.e., deemed interesting) of a $(j,k)$-class user at time $\tau$: see, for example, meme $M$ in the stream of user A, as shown in Fig. S3. The "children" of this meme are the retweets of it that are accepted into the streams of the followers of A at any time $t > \tau$. The PGF for the number of children of meme $M$ is derived by following the same steps as in Sec. S2.1.2, but replacing $R_k$ by $(1 - \lambda + \lambda x)^k$: each power of $x$ then counts a successful insertion of meme $M$ into the stream of one of the $k$ followers of A. The resulting PGF, for a meme of age $a$, is (cf. Eq. (S15)) [...] where $C(t) = \int_0^t \Phi(t_m)\, dt_m$ is the cumulative distribution function for memory times. The expected (mean) number of children for a meme in the $(j,k)$-class stream is determined from the PGF in the usual way, by differentiating with respect to $x$ and evaluating at $x = 1$, thus: [...] In the limit of large ages, $a \to \infty$, we use the fact that $C(\infty) = 1$ to obtain [...] Averaging over all $(j,k)$ classes, the effective branching number $\xi$ of the process is the expected number of children of a meme that is accepted into the stream of a random follower: [...] (recall that $\beta \equiv \sum_{j,k} \frac{k}{z}\, \beta_{jk}\, p_{jk}$). Thus, we have shown that the branching process underlying the model is critical when $\mu = 0$. The finite occupation time of a meme in a user's stream is due to the competition between neutral-fitness memes for the limited resource of user attention; this competition ensures that the mean number of successful retweets (children) generated during the finite occupation time of the meme is precisely one, and so induces the power-law distributions of cascade sizes that are characteristic of critical branching processes [20,22].
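The criticality argument can be checked numerically under an explicit assumption about the infinite-age mean number of children. The sketch below assumes that a meme in the stream of a $(j,k)$-class user is retweeted on average $(1-\mu)\beta_{jk}/r_{jk}$ times (activity rate times mean occupation time, with $r_{jk} = j\beta\lambda + \mu\beta_{jk}$), that each retweet is accepted by each of the $k$ followers with probability $\lambda$, and that a random follower belongs to class $(j,k)$ with probability $j\, p_{jk}/z$. This closed form is a reading of the argument above, not an equation quoted from the source, and the function name is hypothetical.

```python
def branching_number(p_jk, beta_jk, lam, mu):
    """Assumed infinite-age branching number xi for joint degree distribution p_jk.

    p_jk and beta_jk are dicts keyed by (j, k); the mean in-degree and mean
    out-degree of p_jk are both expected to equal z.
    """
    z = sum(k * p for (j, k), p in p_jk.items())
    # average tweeting rate of a followed user: beta = sum_{j,k} (k/z) beta_jk p_jk
    beta = sum((k / z) * beta_jk[(j, k)] * p for (j, k), p in p_jk.items())
    xi = 0.0
    for (j, k), p in p_jk.items():
        if j == 0:
            continue                     # users who follow nobody are never followers
        r = j * beta * lam + mu * beta_jk[(j, k)]          # Eq. (S4)
        mean_children = lam * k * (1.0 - mu) * beta_jk[(j, k)] / r
        xi += (j / z) * p * mean_children
    return xi


if __name__ == "__main__":
    # Made-up two-class example with mean in-degree = mean out-degree = 2.
    p = {(1, 1): 0.5, (3, 3): 0.5}
    b = {(1, 1): 0.5, (3, 3): 1.5}       # normalized: sum of beta_jk * p_jk = 1
    for mu in (0.0, 0.01, 0.1):
        print(mu, branching_number(p, b, lam=0.7, mu=mu))
```

Under these assumptions the value is exactly 1 at $\mu = 0$ for any $\lambda > 0$ and falls below 1 for $\mu > 0$, in line with the statement that the process is critical only in the vanishing-innovation limit.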

S2.3 An explicit expression for q 1 (a)
The value $q_1(a)$ is the probability that a meme, once created via an innovation event, is not retweeted by the time it reaches age $a$. This probability may be calculated explicitly using Eq. (S19): [...] with $G(a; 0)$ given, from Eq. (S16), by [...] and $C(a)$ is the cumulative distribution function for memory times. If we consider the large-age limit, $a \to \infty$, then we can approximate the integral of the cumulative distribution function as [...] and the integral over $\ell$ can be calculated to give the large-$a$ approximation [...] In the simplified case $p_{jk} = \delta_{j,z}\, p_k$ and $\beta_{jk} \equiv 1$, Eqs. (S27) and (S28) reduce to Eq. (S2), as used in Figs. 2B and 3D. The $a = \infty$ limit of $q_1(a)$ gives the fraction of memes that are never retweeted, and so have popularity $n = 1$ forever. The value of $q_1(\infty)$ is obtained from Eqs. (S27) and (S28) by setting $C(a)$ to its $a \to \infty$ limit of 1. The approach of $q_1(a)$ towards the value $q_1(\infty)$ depends, through the CDF $C(a)$, on the tail of the memory-time distribution $\Phi$. If the distribution $\Phi$ is heavy-tailed, there is a non-negligible probability that a meme may be retweeted even if a very long time has elapsed since its birth.

S2.4 Distribution of response times
It is worth noting that all agents in the model have constant activity rates, so that the actions of each individual agent constitute a Poisson process. A Poisson process is characterized by an exponential distribution of inter-event times, where each event corresponds to an innovation or a retweeting action. This assumption differs from that of studies such as [19,33-35], where heavy-tailed distributions of inter-event times are examined. Nevertheless, in our model the memory-time distribution $\Phi(t_m)$ directly influences the waiting times (or "response times") between the receipt of a specific meme and the retweeting of it. Indeed, if $\Phi(t_m)$ is a heavy-tailed distribution, then a meme received by a given user at time $\tau$ will be retweeted by that user at a time $t$ (with $t \gg \tau$) with probability proportional to $\Phi(t - \tau)$. Therefore, a heavy-tailed memory distribution gives rise to a heavy-tailed waiting-time distribution for individual memes, despite the fact that the activity of each individual user is described by a Poisson process (cf. the heavy-tailed waiting-time distributions found in empirical studies of email correspondence [36,37]). It is important to distinguish between the distributions of inter-event times (for actions of users) and of the waiting times experienced by individual memes: the model assumes each user has exponentially distributed inter-event times, but it can nevertheless produce heavy-tailed distributions of waiting times for memes to be retweeted.
In particular, if the memory-time distribution $\Phi(t_m)$ is a Gamma($k_G$, $\theta$) distribution [35] as used in Fig. 3 of the main text, i.e.,
$$\Phi(t_m) = \frac{t_m^{k_G - 1}}{\Gamma(k_G)\,\theta^{k_G}} \exp(-t_m/\theta),$$
then $\Phi(t_m)$ is approximately power-law for memory times $t_m$ with $t_m \ll \theta$, with an exponential cutoff at larger times. The corresponding waiting-time distribution shows a similar scaling in this range; for the parameters used in Fig. 3 ($k_G = 0.25$, $\theta = 80$ days) the waiting-time distribution scales as $t_m^{-0.75}$ for $t_m \ll \theta$, similar to the slow decay noted in empirical response times for Twitter users in Fig. 5 of [32].
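The power-law regime described above can be checked numerically. The following is a minimal sketch, assuming the standard Gamma density with shape $k_G$ and scale $\theta$; the function names and the probe times (0.1 and 800 days) are illustrative choices, not taken from the paper.

```python
import math

def gamma_pdf(t, k, theta):
    """Gamma(k, theta) density with shape k and scale theta."""
    return t ** (k - 1.0) * math.exp(-t / theta) / (math.gamma(k) * theta ** k)

def local_log_slope(t, k, theta, ratio=1.01):
    """Finite-difference estimate of d(log pdf) / d(log t) at time t."""
    t2 = t * ratio
    num = math.log(gamma_pdf(t2, k, theta)) - math.log(gamma_pdf(t, k, theta))
    return num / math.log(ratio)

# Parameters quoted for Fig. 3 (theta in days).
k_G, theta = 0.25, 80.0

# Probe times are hypothetical: one far below theta, one far above it.
slope_short = local_log_slope(0.1, k_G, theta)    # power-law regime
slope_long = local_log_slope(800.0, k_G, theta)   # exponential-cutoff regime

print(f"log-log slope at t_m = 0.1 days: {slope_short:.3f}")
print(f"log-log slope at t_m = 800 days: {slope_long:.3f}")
```

For $t_m \ll \theta$ the local slope stays close to $k_G - 1 = -0.75$, while for $t_m \gg \theta$ it becomes much steeper because the exponential factor dominates.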

S2.5 Mean popularity
The age dependence of the mean popularity (i.e., the expected number of tweets/retweets for a meme of age $a$) is given by
[…] popularity distribution $q_n(\infty)$ may also be applied here: this is based on writing $x = 1 - w$ and $G_\infty = 1 - \phi(w)$ and analyzing the small-$w$, small-$\phi$ asymptotics of Eqs. (S43) and (S44). We refer to [17] for details, and here summarize the main results for the case $\beta_{jk} \equiv 1$, $p_{jk} = \delta_{j,z}\, p_k$ that is considered in the main text.
• Case 1: $p_k$ has finite second moment. The large-$n$ scaling of the popularity distribution is given by a power law with exponential cutoff: where the prefactor $A$ is and the cutoff $\kappa$ is Note that $\kappa$ is proportional to $1/\mu^2$ for small $\mu$, so in the limit of vanishing innovation probability the exponential cutoff tends to infinity and the power-law part of the popularity distribution extends to all $n$.
• Case 2: $p_k \sim D\, k^{-\gamma}$ as $k \to \infty$, with $\gamma$ between 2 and 3. Immediately taking the $\mu \to 0$ limit, we find in this case that the popularity distribution has a power-law form with exponent $\gamma/(\gamma - 1)$ lying between 3/2 and 2: $q_n(\infty) \sim B\, n^{-\gamma/(\gamma - 1)}$ as $n \to \infty$, with prefactor $B$ given by Eq. (S49).
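As a quick illustration of Case 2, the sketch below evaluates the exponent $\gamma/(\gamma - 1)$ for the two out-degree exponents that appear in the figures ($\gamma = 2.5$ for the synthetic network of Fig. 2 and $\gamma = 2.13$ for the Twitter network of Fig. 3). The function name is a hypothetical helper, not part of the paper.

```python
def popularity_exponent(gamma):
    """Exponent of the large-n popularity power law, gamma / (gamma - 1).

    Valid for the scale-free case 2 < gamma < 3 in the mu -> 0 limit.
    """
    if not 2.0 < gamma < 3.0:
        raise ValueError("Case 2 assumes 2 < gamma < 3")
    return gamma / (gamma - 1.0)

for g in (2.5, 2.13):
    print(f"gamma = {g}: popularity exponent = {popularity_exponent(g):.3f}")
```

Both values fall inside the interval (3/2, 2) stated above, with the Twitter value ($\gamma$ close to 2) near the upper end.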

S2.7 Large-a, large-n asymptotics of popularity distribution
Next we consider how the popularity distribution $q_n(a)$ behaves for large, but finite, ages. We are particularly interested in the case where the out-degree distribution of the social network is scale-free, as we seek to understand the "parallel" CCDFs observed at various ages in the empirical data of Fig. 3A of the main text.
large (e.g., $\lambda z = 0.32$ for the model fit in Fig. 3), the values of $F(\lambda z, \gamma)$ can still be close to unity if $\gamma$ is sufficiently close to 2.
To consider small deviations from the steady state, we define $g(a; w)$ by
$$\phi(a; w) = \phi(\infty; w)\,\bigl(1 - g(a; w)\bigr) \tag{S56}$$
with $g(a; w) \to 0$ as $a \to \infty$. Assuming that $g$ is sufficiently small to allow the use of the linearizing approximation Eq. (S53) can be solved for the Laplace transform of $g$: The Laplace transform of $\phi$ then follows from Eq. (S56) and a similar asymptotic analysis of Eq. (S19) yields Substituting from Eqs. (S56) and (S58) results in Eq. (S1). Numerical inversion of the Laplace transform and of the PGF, as described in Sec. S4, gives the results shown in Figs. 2 and 3 of the main text.
A similar analysis can be performed in the case where the out-degree distribution $p_k$ has finite second moment. We again utilize a one-term expansion similar to Eq. (S51), but we can also retain a non-vanishing innovation probability $\mu$ in this case. The one-term expansion can be […] on $n$, and the CCDFs for various ages appear almost parallel in the log-log plot of Fig. 3A of the main text (note $\gamma = 2.13$ in the Twitter network used in Fig. 3 of the main text, see Sec. S1).
As we saw in Sec. S2.5 for the large-age asymptotics of the mean popularity, the long-time behavior of the popularity distribution may be obtained by inserting the small-$s$ approximation $\hat{\Phi}(s) \approx 1 - sT$ in Eq. (S1) and examining the linear (early-age) growth of the inverse transforms. The resulting popularity distributions $q_n(a)$ show (for large $n$) a regime of linear-in-age growth, and in the case where $\gamma \approx 2$, the rate of this growth depends only weakly on $n$. Since the mean popularity $m(a)$ is also growing linearly during this age period (see Eq. (S41)), the division of the CCDFs at various ages by the corresponding mean $m(a)$ leads to the collapse of the data onto the single curve that is seen in Fig. 3B.

S3 Extension to heterogeneous activity rates
To focus on understanding the combined effects of memory and out-degree distribution, most of our results thus far are specialized to the case of uniform user activity rates, $\beta_{jk} \equiv 1$. It is interesting, however, to examine the impact of heterogeneous activity rates upon the results we have obtained. To this end, we extend here to the case where the activity rate of a user depends on its out-degree $k$ while retaining the assumption $p_{jk} = \delta_{j,z}\, p_k$, so that $\beta_{jk} = \beta_k$ (normalized so that $\sum_k \beta_k p_k = 1$ and with $\beta = \sum_k \frac{k}{z} \beta_k p_k$). The mean popularity is given in the general case by Eq. (S34). Repeating the asymptotic analysis of Eq. (S38) through to Eq. (S41) for the $\mu \to 0$ limit, we again find linear growth of $m(a)$ with age $a$, with a slope that generalizes that found in Eq. (S41):
To demonstrate the effect of heterogeneous activity rates, we consider a model for $\beta_k$ inspired by the data analysis shown in Fig. 6(a) of [27]. There, the average activity rate (as measured by the number of tweets by a user in a fixed time period) is found to grow approximately linearly with the number of followers $k$ of that user, for $k$ from 0 to about 100. Then, for $k$ values from about 100 up to the maximum shown in the plot ($k = 10^3$), the activity rate grows as a more slowly increasing linear function of $k$. We model these characteristics (which are also seen in other studies, e.g., [38]), using a piecewise-linear and continuous function of $k$, assuming a saturation of activity at very high $k$, as follows: where the values are chosen to closely match the linear growth rates in Fig. 6(a) of [27]. Using this heterogeneous activity rate (with the constant of proportionality set by the condition $\sum_k \beta_k p_k = 1$), Fig. S5 shows results that correspond closely to the homogeneous-activity example of Fig. 3 of the main text. A comparison of panels D from both figures clearly shows that including heterogeneous activity rates allows a better fitting of the model to data for the fraction $q_1(a)$ of non-retweeted memes. However, the other results of the model (panels A, B and C of Fig. S5 compared to the same panels in Fig. 3) are relatively unaffected by the activity rate, so that the good matches between model and data seen in Fig. 3 are not compromised by including heterogeneity in activity rates.
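The construction of a piecewise-linear, continuous, saturating activity rate with the normalization $\sum_k \beta_k p_k = 1$ can be sketched as follows. The breakpoints at $k = 100$ and $k = 10^3$ follow the description above, but the slopes, the intercept, the saturation point and the truncated power-law $p_k$ are all hypothetical placeholders: the fitted values used by the authors are not recoverable from this text.

```python
def raw_activity(k, k1=100, k2=1000, slope1=1.0, slope2=0.1, base=1.0):
    """Continuous piecewise-linear activity, constant (saturated) beyond k2.

    slope1, slope2 and base are illustrative values, not the fitted ones.
    """
    first = slope1 * min(k, k1)                      # steep growth up to k1
    second = slope2 * min(max(k - k1, 0), k2 - k1)   # slower growth k1..k2
    return base + first + second

def normalized_activity(p_k):
    """Return {k: beta_k} rescaled so that sum_k beta_k * p_k = 1."""
    norm = sum(raw_activity(k) * p for k, p in p_k.items())
    return {k: raw_activity(k) / norm for k in p_k}

# Hypothetical truncated power-law out-degree distribution, p_k ~ k^(-gamma).
gamma_exp, k_max = 2.13, 10_000
weights = {k: k ** (-gamma_exp) for k in range(1, k_max + 1)}
total = sum(weights.values())
p_k = {k: w / total for k, w in weights.items()}

beta_k = normalized_activity(p_k)
mean_activity = sum(beta_k[k] * p_k[k] for k in p_k)
print(f"sum_k beta_k p_k = {mean_activity:.12f}")
```

Dividing by the $p_k$-weighted mean fixes the constant of proportionality, so only the shape of the activity curve (not its overall scale) differs from the homogeneous case $\beta_k \equiv 1$.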

S4 Numerical inversion of Laplace transforms and PGFs
Many of our results for the popularity distribution $q_n(a)$ are expressed in terms of the corresponding PGF $H(a; x)$. As in [17], we use the Fast Fourier Transform method of [39-41] to numerically invert the PGF at a fixed age $a$ to produce, for example, the model distributions in Figs. 2 and 3; see Sec. S2 of [17] for further details and links to Octave/Matlab code for implementing the PGF inversion.
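The idea behind FFT-based PGF inversion can be illustrated with a short sketch: sampling a PGF $H(x) = \sum_n q_n x^n$ at $N$ equally spaced points on the unit circle and applying a discrete Fourier transform returns $N q_n$ (up to aliasing from terms with $n \ge N$). The code below is not the authors' Octave/Matlab implementation; the small radix-2 FFT, the function names and the Poisson test PGF are assumptions made only for illustration.

```python
import cmath
import math

def fft(values):
    """Recursive radix-2 Cooley-Tukey FFT; len(values) must be a power of 2."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2])
    odd = fft(values[1::2])
    out = [0j] * n
    for m in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * m / n) * odd[m]
        out[m] = even[m] + twiddle
        out[m + n // 2] = even[m] - twiddle
    return out

def invert_pgf(pgf, n_points=256):
    """Approximate q_0 .. q_{N-1} from samples of the PGF on the unit circle."""
    samples = [pgf(cmath.exp(2j * cmath.pi * m / n_points))
               for m in range(n_points)]
    return [c.real / n_points for c in fft(samples)]

# Test case (hypothetical): Poisson PGF exp(lam * (x - 1)) with lam = 3.
lam = 3.0
q = invert_pgf(lambda x: cmath.exp(lam * (x - 1.0)))
exact = [math.exp(-lam) * lam ** n / math.factorial(n) for n in range(10)]
max_err = max(abs(q[n] - exact[n]) for n in range(10))
print(f"max error over n = 0..9: {max_err:.2e}")
```

The number of sample points must be large enough that the probability mass beyond $n = N - 1$ is negligible; for heavy-tailed popularity distributions this typically calls for a much larger $N$ than in this light-tailed test case.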
The results of the model for the age-dependence of several quantities are expressed in terms of Laplace transforms. To numerically invert the Laplace transforms we use the efficient Talbot algorithm [42], in its simplified version described in Sec. 6 of [43]. The Talbot algorithm is based on a numerical evaluation of the Bromwich (Laplace inversion) integral, using a cleverly chosen deformation of the contour in the complex-$s$ plane. The Laplace inversion of $\hat{H}(s; x)$ to obtain $H(a; x)$ at a desired age $a$, for example, can be quickly computed using the $2M_L - 1$ weights $\gamma_k$ and nodes $\delta_k$ defined by [44]
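For orientation, a Talbot-type inversion can be sketched in a few lines. The version below uses a commonly used "fixed Talbot" parametrization with $M$ terms (conjugate symmetry of the contour lets one sum over $k = 0, \dots, M-1$ and take real parts). This is an assumption: it may differ in detail from the $2M_L - 1$ weights and nodes of [44], whose explicit formulas are not available in this text. The function name and the test transforms are illustrative.

```python
import cmath
import math

def talbot_invert(F, t, M=32):
    """Approximate f(t) from its Laplace transform F(s), for t > 0.

    Fixed-Talbot rule: f(t) ~ (2 / (5 t)) * sum_k Re[gamma_k * F(delta_k / t)].
    """
    if t <= 0:
        raise ValueError("t must be positive")
    delta0 = 2.0 * M / 5.0
    # k = 0 node sits on the real axis and carries weight exp(delta0) / 2.
    total = (0.5 * cmath.exp(delta0) * F(complex(delta0 / t))).real
    for k in range(1, M):
        x = k * math.pi / M
        cot = math.cos(x) / math.sin(x)
        delta = (2.0 * k * math.pi / 5.0) * complex(cot, 1.0)   # node
        gamma = (1.0 + 1j * x * (1.0 + cot * cot) - 1j * cot) * cmath.exp(delta)
        total += (gamma * F(delta / t)).real
    return 2.0 / (5.0 * t) * total

# Test transforms with known inverses (illustrative only):
#   1 / (s + 1)  <->  exp(-t),     1 / s**2  <->  t
approx_exp = talbot_invert(lambda s: 1.0 / (s + 1.0), 1.0)
approx_lin = talbot_invert(lambda s: 1.0 / (s * s), 2.0)
print(f"exp(-1) approx: {approx_exp:.10f}, exact: {math.exp(-1.0):.10f}")
print(f"t = 2   approx: {approx_lin:.10f}")
```

In double precision the achievable accuracy is limited by round-off once $M$ grows large, since the weights scale like $e^{2M/5}$; moderate values of $M$ are usually a reasonable compromise.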

Figure 2: Numerical simulations of the model, compared with analytical results. (A) Complementary cumulative distribution functions (CCDFs) for meme popularity at age $a$: numerical simulation results (black) on a network with scale-free out-degree distribution ($p_k \propto k^{-\gamma}$ for $k \ge 4$ with $\gamma = 2.5$, mean degree $z = 11$, $N = 10^5$ nodes), compared with asymptotic model result Eq. (S1) (colored curves). The memory-time distribution is $\Phi$ = Gamma($k_G$, $\theta$) with $k_G = 0.1$ and $\theta = 50$, so the mean memory time is $T = k_G \theta = 5$. Inset: As main, but for Poisson out-degree distribution $p_k$ ($z = 11$) and gamma memory-time distribution with $k_G = 0.1$ and $\theta = 0.5$. (B) Fraction $q_1(a)$ of memes that are not retweeted by age $a$, on the scale-free network of (A) and for various memory-time distributions $\Phi(t_m)$ (red = exponential with mean $T$, blue/green = Gamma(0.1, $10T$)), using Eq. (S2). Dashed lines show the $T = 5$ cases; solid lines represent $T = 1$. (C) Mean popularity of memes of age $a$, for the same cases as in (B), and compared with Eq. 1 (using the numerical Laplace transform inversion described in Sec. S4); inset shows the large-$a$ behavior. All panels have $\mu = 0.02$ and (except for green curves) $\lambda = 1$.

Figure 3: Comparison of the model with Twitter hashtags data. (A) CCDFs for popularity of hashtags at age $a$ (at time $a$ after their first appearance in the dataset). The model CCDFs (from Eq. (S1)) are multiplied by 10 for clarity. Model parameters are: $\lambda = 4.5 \times 10^{-4}$, $\mu = 0.033$, $k_G = 0.25$, $\theta = 500$, with one model time unit corresponding to 0.16 days. (B) CCDFs at age $a$, each divided by the mean popularity at age $a$. The data shows a collapse onto a single curve that is closely matched by the model. (C) The mean popularity of hashtags of age $a$. (D) The fraction $q_1(a)$ of hashtags that are not retweeted by age $a$. Here, our basic model with homogeneous user activity rates does not fit well to the data (red curve), but allowing for heterogeneous user activity rates improves the fit (blue curve), without compromising the other matches with data (see Fig. S5).

Figure S1: As panels B and C of Fig. 2 of the main text, but for a network with Poisson out-degree distribution (mean degree $z = 11$). As in Fig. 2, both panels have $\mu = 0.02$ and (except for green curves) $\lambda = 1$.
Figure S2: CCDF for the number of followers $k$ of a random sample of $8.2 \times 10^5$ Twitter users. The straight line corresponds to an out-degree distribution with tail scaling as $p_k \sim D\, k^{-\gamma}$ as $k \to \infty$, with $D = 240$ and $\gamma = 2.13$ ($x_{\min} = 1.1 \times 10^4$, fitted as described in [31]).

Figure S4: The function $F(\zeta, \gamma)$, as defined in Eq. (S55), for values of $\gamma$ close to 2. The highlighted points are the parameter values that are relevant to Fig. 2 ($\lambda z = 11$, $\gamma = 2.5$) and to Fig. 3 ($\lambda z = 0.32$, $\gamma = 2.13$). In all cases of interest the values of $F$ are close to 1, and so we conclude that the one-term expansion used in Eq. (S51) gives a good estimate of the exact steady-state solution.