Neutral Theory for competing attention in social networks

We used an ecological approach based on a neutral model to study the competition for attention in an online social network. This novel approach allows us to analyze some ecological patterns that also have an insightful meaning in the context of information ecosystems. Specifically, we focus on the study of patterns related to the persistence of a meme within the network and the capacity of the system to sustain coexisting memes. Not only are we able to carry out such an analysis in an approximate continuum limit, but we also obtain exact results for the finite-size discrete system.


I. INTRODUCTION
An online social network (OSN) is a virtual social structure made of individuals using the Internet as a communication medium for interacting, sharing contents and opinions. OSNs allow hundreds of millions of Internet users worldwide to produce and consume contents, providing access to a very vast source of information on an unprecedented scale. Therefore, nowadays OSNs constitute mainstream communication channels to interact, exchange opinions, and reach consensus.
In recent years, it has become increasingly evident that competition significantly shapes the structure and the dynamics of these information-driven platforms [1][2][3]: users strive for visibility, while memes can be thought of as entities that compete for users' attention.
Nevertheless, it is hard to disentangle the effects of limited attention from many concurrent factors [4], such as the structure of the underlying social network [5,6], the activity of users [7], the different degrees of influence of information spreaders [8], the intrinsic quality of the information they spread [9], and the persistence of topics [10].
Through the analysis of OSN data, emergent properties such as the emergence of consensus [1], viral spreading [11], power-law distributions of meme popularity [12] and echo-chamber effects [13,14] have been observed across different platforms. Nevertheless, the availability of massive streams of online data does not give per se theoretical insights into the complex interplay among the structure of OSNs, meme popularity and the dynamics of user attention.
The aforementioned ubiquitous properties are signatures of the emergent simplicity characterizing complex systems. They therefore naturally call for a statistical mechanics approach, i.e., the attempt to understand regularities at large scale as collective effects of the dynamics at the individual/meme scale. The bridge between these scales would also allow us to better understand how these macroscopic (system-wide) patterns are affected by the microscopic dynamics of the interacting elements (users and memes) forming the OSN.
One of the first attempts to unveil the emergence of fat-tailed power-law distributions for meme popularity starting from an interacting particle model is the work of Gleeson and collaborators [15]. Therein, they studied a stochastic model with simple microscopic rules to describe the evolution, that is, the spreading of memes competing for the limited resource of users' attention, on a Twitter-like OSN. In real life, each user in the OSN pays attention to a finite number of memes, constrained by the user's finite capacity. In the model, this picture is simplified by assuming that each user can pay attention to just a single meme, although further generalizations have already been considered [12].
The type of stochastic processes introduced in [15] has been used for decades in population genetics [16] and in ecology [17]. In particular, these models, based on simple stochastic rules, which neglect species (genes) fitness, represent "null" neutral models in which no intrinsic advantage is ascribed to a particular type of species (genes). Therefore, the fate of a species depends on its topological role in the network and on random demographic effects.
The concept of neutrality and the so-called neutral models have attracted a lot of interest in the communities of statistical physicists [18][19][20]. Indeed, neutral theory has been proven to be very successful when describing ubiquitous emergent patterns, which do not depend on the specific details of the system under study [21][22][23][24].
The analogy between memes competing for attention in an OSN and species competing for resources in an ecosystem suggests that the approaches introduced in the framework of ecological neutral theories can be exploited to extract novel relevant information about the dynamics of users' attention. Different memes can be thought of as different species and, in addition to the spreading dynamics, new memes are introduced in the system by innovation events (speciation processes).
Such qualitative similarities between natural and "information" ecosystems have already been explored towards more quantitative accounts. Along this direction, Borge-Holthoefer et al. [25] have found evidence of nested structural signatures (a landmark feature in natural mutualistic assemblages [26][27][28][29]) when analyzing time-resolved online communication discussions after external information "shocks", e.g., breaking news. Similarly, Lorenz and collaborators [30], using a mathematical model based on Lotka-Volterra equations, have been able to explain some empirical data patterns in different OSNs. Moreover, they have suggested that the relatively fast loss of attention is driven by increasing production and consumption of content, leading to higher turnover rates and shorter collective attention for individual topics.
Following this line of research, here we propose an analytically tractable neutral theory for competing attention in OSNs. In particular, we derive the equations for the evolution of users' attention to the different memes. The approach starts from a Master Equation describing the stochastic evolution of users' attention to a meme. In the limit of large networks, the equation is well approximated by a Fokker-Planck equation which resembles the dynamics of species abundances in ecological neutral theories [31]. We thus analytically compute several new quantities of interest to characterize communication dynamics, such as the number of different coexisting memes (a measure of the diversity of coexisting ideas in the OSN), the distribution of user attention to these memes, and the average persistence time of users' attention on a meme. Moreover, to investigate how the emergent properties depend on the underlying network of user interactions, we compare our theoretical predictions to numerical simulations of the model and, finally, test our approach against empirical stream data from the Twitter OSN.
The work is organized as follows. In the next section, we introduce the interacting particle model describing meme propagation. In Section III we derive the evolution equation for competing attention in OSNs. Therein, we present both the (exact) discrete and the (approximate) continuous descriptions of the system, computing several relevant patterns of the OSN related to attention. As expected, the evolution of users' attention depends on the OSN structure. Moreover, therein we compare the results of the model with real data collected from Twitter, which successfully validates them. Finally, in Section IV, we summarize and discuss the achievements of the proposed neutral model for the study of competing attention in social networks.

FIG. 1. Sketch of the model. Each node represents a user in the network, with the color denoting the meme on its screen. The directed edges stand for the following relations. Specifically, our convention is that the edge starts in the node which is followed and ends in the follower. The node chosen to spread the meme is highlighted with a dotted circle. A time step is represented with its two possible outcomes, either an innovation event (with probability $\tilde{\mu}$) or a spreading event (with probability $1 - \tilde{\mu}$).

II. A NEUTRAL MODEL FOR USER ATTENTION
Let us consider a directed OSN of $N$ nodes, each of which represents a user. The network is solely characterized by its out-degree distribution $p_k$, that is, a random user has $k$ followers with probability $p_k$. Each node can be thought of as the screen of the individual, displaying the meme of current interest for that user. For the sake of simplicity, we assume that each screen has capacity for only one meme, although some generalizations in this respect are possible [12,15]. Therefore, the state of the system at time $\tilde{t}$ is given by the list of memes appearing in all the nodes.
The dynamics is introduced in discrete time steps representing the successive times at which an action is carried out by any of the users. During each time step one node is picked at random and it either (i) (re)tweets the meme currently on its screen to all its followers, with probability $1 - \tilde{\mu}$, while its own screen remains unchanged; or (ii) innovates, with probability $\tilde{\mu}$, generating a brand-new meme that appears on its screen and is tweeted to all its followers. An illustrative visualization of one time step of the dynamics is shown in Fig. 1.
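The update rule just described can be sketched in a few lines of Python. This is a minimal illustration rather than the simulation algorithm of Appendix E: the helper names (`build_followers`, `step`), the uniformly random assignment of followers, and the parameter values are all assumptions made for the example.

```python
import random

def build_followers(n_users, k_star, rng):
    """Give every user k_star followers chosen uniformly at random (no self-follow).

    Assumption: a homogeneous network, p_k = delta_{k,k*}, wired at random.
    """
    followers = []
    for u in range(n_users):
        others = [v for v in range(n_users) if v != u]
        followers.append(rng.sample(others, k_star))
    return followers

def step(screens, followers, mu_tilde, next_meme_id, rng):
    """One time step: a random user innovates (prob. mu_tilde) or retweets."""
    u = rng.randrange(len(screens))
    if rng.random() < mu_tilde:
        screens[u] = next_meme_id   # brand-new meme appears on the user's own screen
        next_meme_id += 1
    meme = screens[u]               # in a spreading event the own screen is unchanged
    for v in followers[u]:          # broadcast to all followers
        screens[v] = meme
    return next_meme_id

rng = random.Random(0)
N, K_STAR, MU_TILDE = 200, 10, 0.05   # illustrative values only
followers = build_followers(N, K_STAR, rng)
screens = list(range(N))              # start with one distinct meme per user
next_id = N
for _ in range(5000):
    next_id = step(screens, followers, MU_TILDE, next_id, rng)
n_coexisting = len(set(screens))      # number of distinct memes currently on screens
```

With this rule a freshly created meme immediately occupies the innovator's screen plus those of its $k$ followers, i.e., $k + 1$ screens in total.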
The configurational state of the system at a given time is described by the list of memes the users are paying attention to. We define the set of attention variables $x_m$ as the fraction of nodes in the system which are paying attention to meme $m$. Consequently, the normalization constraint $\sum_m x_m = 1$ holds at all times.
Another relevant variable in the simulated OSN, as originally introduced by Gleeson and collaborators [15], is the meme popularity. The popularity $n_m$ of meme $m$ is defined as the number of times that the meme $m$ has been broadcast. Because of this definition, $n_m$ is a non-decreasing function of time, which increases by one every time $m$ is (re)tweeted. In contrast, the attention $x_m$ can either increase or decrease until the meme disappears from the system. Note that, in our model, once a meme disappears from the OSN, there is no mechanism to re-introduce it in the system (innovation always brings brand-new memes). Therefore, once a meme goes extinct, its popularity remains constant for the remaining simulation time.
In the aforementioned work, Gleeson et al. [15] semi-analytically find, and compare to numerical simulations, asymptotic power-law behavior for the popularity distribution. In particular, they have investigated the cases where $p_k = \delta_{k,k^*}$ (constant number of followers for all users) and $p_k \propto k^{-\gamma}$, i.e., a power-law distribution for the number of followers. In both cases, they have found that meme popularity has a fat-tailed distribution, but with different exponents [15]. Popularity can be measured from real Twitter data streams as the number of (re)tweets since a given hashtag appeared in the OSN. Although a systematic analysis of meme popularity distributions is still lacking, there are some results suggesting that they indeed display asymptotic power-law behavior [12], pointing out that the proposed model is a good candidate to elucidate at least this pattern observed in empirical OSNs. In the next section, we show how, exploiting neutral theory, we can go beyond the meme popularity distribution and also study analytically the diversity of coexisting ideas in the OSN, the distribution of users' attention to such coexisting memes, and the average persistence time of users' attention on a meme.
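As a small illustration of the attention variables, the snippet below computes the fractions $x_m$ from a list of screens and lets one check the normalization constraint. The function name `attention_fractions` and the toy screen contents are hypothetical.

```python
from collections import Counter

def attention_fractions(screens):
    """Return {meme: x_m}, the fraction of screens currently showing each meme."""
    counts = Counter(screens)
    n = len(screens)
    return {meme: c / n for meme, c in counts.items()}

# Toy configuration: eight users, three memes.
screens = ["a", "b", "a", "c", "a", "b", "a", "a"]
x = attention_fractions(screens)
total = sum(x.values())   # should equal 1 up to rounding
```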

III. NEUTRAL EMERGENT PATTERNS IN USERS ATTENTION
Because of the neutrality hypothesis, we assume complete symmetry among memes (i.e., we neglect meme identities or labels) and we thus consider the system as an effective two-meme system with large but finite $N$. We call A the meme we are focusing on, whereas B simply represents the collection of memes that are non-A. The variable $x \equiv x_A$ describes the fraction of users that are paying attention to meme A (i.e., the meme is on their screen) and hence $x_B = 1 - x$. It comes in handy to define the total number of users paying attention to meme A as $\nu$, i.e., $\nu \equiv x \cdot N$.
The dynamics, as described in the previous section, is defined in discrete time steps and obeys the Markov property, i.e., the transition probability to the next state depends only on the current state. Consequently, we can formulate the time evolution of the probability $P_{\nu,\tilde{t}}$ of having $\nu$ users paying attention to meme A at time step $\tilde{t}$ by means of a discrete Master Equation, Eq. (1).

In the Master Equation (1), we have taken into account that there are three main classes of transformations with different transition probabilities $W$. Namely, the node selected to act in a particular time step may be either A or B: this is indicated by the first superscript in the transition probabilities $W$. The second superscript denotes the type of event, i.e., {s} if the selected node spreads the current meme, with probability $1 - \tilde{\mu}$, or {i} if it first innovates, with probability $\tilde{\mu}$, and then spreads the new meme. Note that, since B is defined as non-A, innovation of B also generates a B meme, and thus there is no need to differentiate between spreading and innovation when the selected node contains a B meme. The subscripts give information about the nodes connected to the selected node (its followers). We define $k \in \{0, \dots, N-1\}$ as the number of followers of the selected node, whereas $j \in \{0, \dots, k\}$ is the number of followers having on their screen a meme that is different from the meme spreading from the selected node. Thus, the dependence on the properties of the network is carried by the specific form of the transition rates. The detailed form of the transition rates $W$ and the physical meaning of the Master Equation (1) are provided in Appendix A.
Assuming that the innovation rate scales as $\tilde{\mu} = \mu/N$ and defining the continuous time variable $t = \tilde{t}/N^2$, it is possible to perform a diffusive approximation of the Master Equation (1). The Kramers-Moyal expansion [24,32] yields the Fokker-Planck equation (2), with the drift and diffusion coefficients, $A(x)$ and $B(x)$ respectively, given in (3). The drift term is negative and drives the memes to extinction at a rate that increases with the innovation rate and with the mean number of followers in the network, $\langle k \rangle = \sum_k k\, p_k$. The dependence of the diffusion term on $x(1-x)$ [33] indicates that there is a limit (a carrying capacity, in the language of ecology) to the attention that both competing memes can collect, as we are considering a finite number of users in the system. The constant $b$ increases the (demographic) fluctuations with $\langle k \rangle$ and with the variability of the network through the variance $\sigma_k^2 = \langle k^2 \rangle - \langle k \rangle^2$.

Thanks to the Master Equation (1) and the Fokker-Planck equation (2), it is possible to study the emergent patterns in users' attention in the proposed OSN neutral model.
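To get a feel for the diffusive description, one can integrate the corresponding stochastic differential equation numerically. The sketch below is illustrative only. It assumes a linear drift $A(x) = -a x$ and a diffusion coefficient $B(x) = b\,x(1-x)$ in the Itô convention, uses placeholder values for $a$ and $b$ rather than the expressions in (3), and relies on a crude Euler-Maruyama scheme that is not accurate close to the absorbing boundary.

```python
import math
import random

def persistence_time_sde(x0, a, b, dt, rng, t_max=1e3):
    """Integrate dx = -a*x dt + sqrt(b*x*(1-x)) dW until x reaches 0.

    Assumed forms (not taken from the paper's Eq. (3)): A(x) = -a*x,
    B(x) = b*x*(1-x). Returns the time at which the path is absorbed,
    or roughly t_max if it never is.
    """
    x, t = x0, 0.0
    while x > 0.0 and t < t_max:
        noise = rng.gauss(0.0, math.sqrt(dt))
        x += -a * x * dt + math.sqrt(max(b * x * (1.0 - x), 0.0)) * noise
        x = min(x, 1.0)   # keep the attention fraction from exceeding 1
        t += dt
    return t

rng = random.Random(1)
# Placeholder parameters: a meme starting on 1% of the screens.
times = [persistence_time_sde(0.01, a=2.0, b=1.0, dt=1e-4, rng=rng) for _ in range(200)]
mean_tau = sum(times) / len(times)
```

Averaging over many such paths gives a Monte Carlo estimate of the mean persistence time in the continuum description, which can be compared with an analytical expression once $a$ and $b$ are fixed.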

A. Mean persistence time for attention
The dynamics of the users' attention has an absorbing boundary at $x = 0$, i.e., the attention to a given meme will eventually go to zero. The time that a meme persists in the OSN receiving attention is a relevant quantity for understanding communication dynamics, as it gives insights into the virality of that meme: the longer it remains "active", the more likely it is to go viral. In the ecological context, species persistence times [23] or lifetimes [34] have been thoroughly studied in different ecological neutral models, which have been shown to describe, and thus to elucidate the ecological mechanisms behind, the power-law shape (with exponential cut-off) of the species persistence patterns observed in real ecosystems [35]. Due to the stochastic nature of the neutral dynamics of the memes, these persistence times are random variables identically distributed among memes. Herein, we focus on the computation of the mean persistence time of a meme and its variance.
Previously, we have derived the forward evolution equations that govern the probability density function, $P(x, t)$, of the attention. The probability density function $Q(x_0, t)$ of reaching the absorbing boundary at zero at time $t$, departing from initial attention $x_0$, obeys the backward Fokker-Planck equation (4) [36], with the same $A$ and $B$ defined in (3).
Multiplying (4) by $t$ and integrating over all times, one obtains the differential equation that governs the mean persistence time $\tau(x_0)$ to reach zero departing from $x_0$. That equation can be solved analytically with the proper boundary conditions (see Appendix B for details), giving the result (5), where ${}_2F_1$ stands for the ordinary hypergeometric function, $\gamma$ is the Euler-Mascheroni constant, $\psi$ denotes the digamma function, and the pair $\{a, b\}$ is defined in (3).
Having an analytical form for $\tau(x_0)$ allows us to study how the persistence of attention varies with the OSN's innovation rate and structure. We find that $\tau$ is a monotonically decreasing function of $\mu$, $\langle k \rangle$ and $\sigma_k^2$, keeping the remaining parameters fixed.
We find that, as expected, a higher innovation rate favors the demise of living memes, and thus decreases the average persistence time of attention on a meme. Similarly, a higher connectivity of the OSN, measured by the average network degree, also leads, on average, to a faster decay of attention on existing memes. This result can be understood by observing that a large $\langle k \rangle$ increases the diffusivity of the memes spreading in the system, thus shortening the time after which the attention to the meme reaches zero. Finally, we find that, for fixed average connectivity, heterogeneity of the network, as measured by $\sigma_k^2$, disfavors, on average, long persistent attention on a given meme. In other words, we find that the presence of large hubs (influencers) coexisting with many poorly connected nodes (users) drives the communication dynamics towards a state dominated by short-lived tweets, which disappear relatively quickly from the OSN.
Note that (5) is an approximate result, since it follows from the diffusive continuum limit. To go beyond this approximation, we write the backward Master Equation for the discrete system, Eq. (6), describing the evolution of the probability $Q(\nu_0, \tilde{t})$ for a meme to reach zero user attention at time $\tilde{t}$, given that at the initial time $\nu_0$ users are paying attention to it. Multiplying the backward Master Equation (6) by $\tilde{t}$ and summing over all times, we obtain a system of linear equations whose solution is the vector of mean persistence times $\tilde{\boldsymbol{\tau}}$, whose components are $\tilde{\tau}(\nu_0)$ with $\nu_0 \in \{1, \dots, N\}$. In this system, written in matrix form as Eq. (7), the components of the matrix $M$ are appropriate combinations of the transition probabilities $W$ (see Appendix C for the details) and $\mathbf{1}$ is an $N$-dimensional vector of ones. The exact result for the mean persistence times, Eq. (8), is then expressed through $M$ and the identity matrix $I$ in dimension $N$. Furthermore, one can obtain higher-order moments from (6) by multiplying it by powers of $\tilde{t}$ and summing over all times. Specifically, we have done so in order to obtain the variance of the persistence time, $\sigma_\tau^2 = \langle \tilde{t}^2 \rangle - \tilde{\tau}^2$. The resulting mean squared persistence time is given in Eq. (9). Detailed calculations are provided in Appendix C.
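As a rough illustration of the matrix computation described above, the following sketch assumes that the linear system takes the standard first-passage form $(I - M)\tilde{\boldsymbol{\tau}} = \mathbf{1}$, with $M$ the one-step transition matrix restricted to the non-absorbed states, and that the second moment obeys $(I - M)\langle \tilde{t}^2 \rangle = 2\tilde{\boldsymbol{\tau}} - \mathbf{1}$. Both relations hold for any absorbing Markov chain in discrete time, but whether they coincide term by term with Eqs. (7)-(9) is an assumption. The toy matrix below is not built from the transition probabilities $W$ of the model.

```python
def solve_linear(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x

def persistence_moments(M):
    """Mean and variance of the absorption time from each transient state.

    M[i][j] is the one-step probability of moving from transient state i to
    transient state j; the missing probability mass in a row is absorption.
    """
    n = len(M)
    i_minus_m = [[(1.0 if i == j else 0.0) - M[i][j] for j in range(n)]
                 for i in range(n)]
    tau = solve_linear(i_minus_m, [1.0] * n)                     # (I - M) tau = 1
    t2 = solve_linear(i_minus_m, [2.0 * t - 1.0 for t in tau])   # (I - M) <t^2> = 2 tau - 1
    var = [s - t * t for s, t in zip(t2, tau)]
    return tau, var

# Toy chain (hypothetical): index i stands for nu0 = i + 1 users; at each step
# the meme loses one user with probability d and otherwise stays put.
d, n_states = 0.25, 5
M = [[0.0] * n_states for _ in range(n_states)]
for i in range(n_states):
    M[i][i] = 1.0 - d
    if i > 0:
        M[i][i - 1] = d
tau, var = persistence_moments(M)
```

For this toy chain the absorption time is a sum of $\nu_0$ independent geometric waiting times, so the solver can be checked against the closed forms $\nu_0/d$ for the mean and $\nu_0 (1-d)/d^2$ for the variance.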
In Fig. 2 we compare the persistence times of memes obtained from numerical simulations of the model with our theoretical predictions. We have run simulations for a system with $N = 10^3$ users. The two OSN structures chosen throughout this work in order to study the role of the network structure are: (i) homogeneous networks, i.e., $p_k = \delta_{k,k^*}$ with constant $k^*$, so that all users have the same number of followers, and (ii) scale-free networks with $p_k \propto k^{-\alpha}$ for $k \geq k_{min}$. For further details on the simulation algorithm, see Appendix E.
The value $k^* = 10$ has been chosen so as to avoid falling in a regime very far from the diffusive approximation. As seen in panel (a) of Fig. 2, the exact results agree perfectly with the simulations for both the mean value $\tilde{\tau}$ and the standard deviation $\sigma_\tau$. Remarkably, in spite of the diffusive approximation, the prediction given by (5) still gives a very good estimate of the persistence time of the memes. This relative success of the diffusive approximation is reasonable in a system where burst events are unlikely. Nevertheless, we expect the diffusive theory to fail when the degree distribution allows such burst events. In fact, if $k^*$ is very large, then in a single step it is likely that most of the users in the network change their attention, and thus the dynamical regime for the attention is bursty rather than diffusive. This is the case, for example, in scale-free OSNs, where $p_k$ has a long tail, i.e., a few nodes have a very large degree.
Indeed, the most interesting case is when the number of followers is strongly unevenly distributed among users, as in real OSNs. To investigate the impact of such a heterogeneous structure on the dynamics, we have considered a power-law out-degree distribution, which has also been found empirically [12] in some OSNs. More specifically, we assume that $p_k$ is zero for $k < k_{min}$ and $p_k \propto k^{-\alpha}$ for $k \geq k_{min}$ [15]. In particular, in the following we consider $k_{min} = 4$ and $\alpha = 2.5$, so that we have an average number of followers $\langle k \rangle \approx 10$, close to the previous case $k^* = 10$, but with a much higher variance. In panel (b) of Fig. 2 we clearly observe that the result of the exact approach using the backward Master Equation accurately matches the values found in the numerical simulations. The diffusive approximation, instead, fails to quantitatively describe the observed patterns, as expected because of the presence of bursting events.
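The statement that this choice of parameters gives a mean out-degree close to 10 with a much larger variance can be checked numerically. The sketch below assumes that the power law is truncated at $k_{max} = N - 1 = 999$, the largest number of followers possible in a network of $N = 10^3$ users. The function name is hypothetical.

```python
def power_law_moments(alpha, k_min, k_max):
    """Mean and variance of p_k proportional to k**(-alpha) on k_min..k_max."""
    ks = range(k_min, k_max + 1)
    weights = [k ** (-alpha) for k in ks]
    norm = sum(weights)
    mean = sum(k * w for k, w in zip(ks, weights)) / norm
    second = sum(k * k * w for k, w in zip(ks, weights)) / norm
    return mean, second - mean ** 2

# Assumed truncation at k_max = N - 1 with N = 1000.
mean_k, var_k = power_law_moments(alpha=2.5, k_min=4, k_max=999)
```

For comparison, the homogeneous network $p_k = \delta_{k,10}$ has mean 10 and zero variance, so the two cases differ mainly in $\sigma_k^2$.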
B. Diversity patterns in information ecosystems.
Biodiversity in natural ecological systems plays a fundamental role in preserving ecosystem functioning and health. Similarly, diversity of memes is a crucial aspect if we want to maintain a "healthy" information ecosystem in social systems, where a plurality of ideas circulates, as opposed to echo chambers that amplify a few polarized ideologies in the OSN.
In this section, we focus specifically on two different aspects of meme diversity: the number of different memes coexisting in the system and how user attention is distributed among such memes, i.e., how memes are distributed on users' screens.

Number of coexisting memes
The capacity of attention of users in an OSN is finite and affects meme spreading [9,37,38]. Therefore, a given network cannot sustain an arbitrarily large number of memes. Interestingly, due to both the neutrality and the constant innovation rate ($\mu$) assumptions, it is possible to calculate exactly the probability $P_S$ of having $S$ different memes coexisting in the OSN [39]. In fact, the stationary solution for $P_S$, reached in the large-time limit, is a Poisson distribution, Eq. (10), with an average number of memes $\langle S \rangle$ given by Eq. (11), which, for the particular case of a delta-peaked out-degree distribution $p_k = \delta_{k,k^*}$, simply reduces to $\langle S \rangle = \tilde{\mu}\,\tilde{\tau}(k^* + 1)$. Interestingly, the Poisson shape of the distribution of meme diversity is "universal", regardless of the details of the underlying network. The specific effect of the network structure lies in determining the average number $\langle S \rangle$ of coexisting memes in the system. This result was originally derived in the ecological context for a coarse-grained birth-death model for species diversity [39]. However, in our model memes may appear with any initial number $\nu_0$ of users paying attention to them, with a probability given by the out-degree distribution $p_{\nu_0 - 1}$.

FIG. 2. Persistence time of memes in homogeneous (panel (a)) and scale-free (panel (b)) networks. The simulations (symbols) agree perfectly with the exact solution (blue line) given by (8) in both panels. Moreover, the diffusive approximation in (4) (red line) also gives quite a good estimate in the homogeneous network, whereas that is not the case in the scale-free network. We have considered a system with $N = 10^3$, an out-degree distribution $p_k = \delta_{k,10}$ and $\mu = 15$ for panel (a), while we have used $p_k \propto k^{-2.5}$ for $k \geq 4$ and $\mu = 100$ for panel (b). In the inset, the variance of the persistence time shows a perfect agreement with the exact solution (9).
We can thus see how the diversity of ideas circulating in the network is promoted by large mean persistence times. Since network heterogeneity shortens persistence (Section III A), scale-free networks are detrimental to the plurality of ideas in the OSN and promote strong polarization (and visibility) of a few ideologies.
Note that, equivalently, we can formulate equation (11) using the framework of continuous time. It simply reads $\langle S \rangle = N \mu \int_0^1 dx_0\, \tau(x_0)\, p(x_0)$, where we have used that, in the continuous description, innovation is a Poisson process with rate $N\mu$ and $p(x_0)$ is the continuous counterpart of the out-degree distribution, giving the initial fraction of users' attention when the meme appears in the OSN. Consistently, the fraction $\langle S \rangle / N$ depends only on the parameters of the continuous model.
In Fig. 3 we compare the theoretically predicted Poisson distribution with the histogram obtained from long numerical simulations of the systems studied above, that is, we consider $N = 10^3$ and either $\mu = 15$ and $p_k = \delta_{k,10}$, or $\mu = 100$ and $p_k \propto k^{-2.5}$ for $k \geq 4$. Moreover, for the homogeneous network, the inset shows different values of the innovation rate $\mu$ in order to study the dependence of the average number of memes on it. On the one hand, we obtain a perfect agreement when we use the exact calculation for $\tilde{\tau}$ as given by (8), especially for the homogeneous network. The Poisson distribution also explains quite successfully the pattern observed in the scale-free network, although the quality of the agreement is slightly lower. On the other hand, although we did not expect a quantitative agreement (because of the bursty nature of the communication dynamics), the prediction provided by the diffusive approximation (5) remarkably shares the same qualitative behavior. As expected, both theories and the simulations converge to $\langle S \rangle = 1$ in the limit $\mu \to 0$ since, in the absence of innovation, fluctuations eventually drive the system to the absorbing state and thus only one meme survives at the end.
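The equivalence between the discrete and continuous formulations can be made explicit with a short worked step. The discrete expression written below is an assumed form of Eq. (11), inferred from its delta-peaked special case $\tilde{\mu}\,\tilde{\tau}(k^*+1)$ and from the statement that a new meme starts with $\nu_0$ users with probability $p_{\nu_0-1}$. It is a sketch of the bookkeeping, not a quotation of the original equation.

```latex
% Assumed discrete form (inferred from the special case p_k = \delta_{k,k^*}):
\langle S \rangle = \tilde{\mu} \sum_{\nu_0 = 1}^{N} p_{\nu_0 - 1}\, \tilde{\tau}(\nu_0).
% Rescaling: \tilde{\mu} = \mu / N, \quad t = \tilde{t}/N^2 \;\Rightarrow\; \tilde{\tau} = N^2 \tau,
% \quad x_0 = \nu_0 / N:
\tilde{\mu}\, \tilde{\tau}(\nu_0) = \frac{\mu}{N}\, N^2\, \tau(x_0) = N \mu\, \tau(x_0),
\qquad
\langle S \rangle \simeq N \mu \int_0^1 dx_0\, p(x_0)\, \tau(x_0).
```

Under this rescaling the ratio $\langle S \rangle / N = \mu \int_0^1 dx_0\, p(x_0)\, \tau(x_0)$ contains only quantities of the continuous model.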
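A comparison of this kind, between a Poisson law and a histogram of the number of coexisting memes, can be sketched as follows. The helper names and the short list of snapshot counts are hypothetical, and the total-variation distance is just one convenient way to quantify the agreement. It is not claimed to be the measure used for Fig. 3.

```python
import math
from collections import Counter

def poisson_pmf(s, mean):
    """Poisson probability of s coexisting memes for a given mean > 0."""
    return math.exp(-mean + s * math.log(mean) - math.lgamma(s + 1))

def total_variation(samples, mean):
    """Total-variation distance between the histogram of samples and the Poisson law.

    The sum is cut at a point far in the Poisson tail; the neglected mass is tiny.
    """
    n = len(samples)
    counts = Counter(samples)
    s_max = max(max(counts), int(mean + 10 * math.sqrt(mean) + 10))
    diff = sum(abs(counts.get(s, 0) / n - poisson_pmf(s, mean))
               for s in range(s_max + 1))
    return 0.5 * diff

# Hypothetical counts of distinct memes in ten well-separated snapshots.
snapshots = [11, 13, 12, 9, 14, 12, 10, 15, 12, 13]
mean_s = sum(snapshots) / len(snapshots)
tv = total_variation(snapshots, mean_s)
```

In practice `mean_s` would be replaced by the theoretical $\langle S \rangle$, so that the comparison involves no fitted parameter.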
Finally, and motivated by the good agreement between theory and numerical simulations, we decided to test our predictions in a real scenario against data from the OSN Twitter. To do so, we collected samples of Twitter activity freely available from the Twitter Streaming API, spanning a period of 4 years between 2015 and 2018, and comprising more than 135 million tweets (see Appendix F for details on the data collection).
For the comparison, we assume that each meme in the model represents a hashtag in Twitter. Its popularity is traceable through the number of times it has been observed in the data set. However, given the obvious privacy restrictions in the data, which allow us to see when a hashtag has been tweeted but not when it appeared on a screen, some quantities, such as the attention paid to a hashtag at a given time, cannot be directly measured. For these reasons we decided to limit the comparison to observables that can be estimated directly from the data, such as the distribution $P_S$ of different memes present in the system and the mean number of coexisting memes $\langle S \rangle$ as a function of $\tilde{\mu}$ (see Appendix F for the estimation of $\tilde{\mu}$ and $\langle S \rangle$ from the data).
In Fig. 4 we show information analogous to that presented in Fig. 3, but using the data extracted from Twitter. In this case, since each sample in the data has a different $\tilde{\mu}$ and a different size, the theoretical distribution (blue line in Fig. 4) has been obtained as a weighted average of different Poisson distributions given by Eq. (10), one for each value of $\tilde{\mu}$ observed in the data (see Appendix F). In the inset of Fig. 4, instead, we repeated exactly the same analysis done for Fig. 3. Remarkably, despite the correlations and possible biases in the data that could contradict the assumptions of our model, the good agreement demonstrates that, at least at the macroscopic scale, such emergent human communication patterns in OSNs can be explained successfully by the proposed neutral model.

Relative Memes Attention
Equations (10) and (11) characterize the diversity of memes at stationarity, but they give no information on the relative attention that users give to such memes. We are thus interested in a new pattern, described by the probability $P_{RMA}(x)$ that a meme receives attention from a fraction $x$ of users in the OSN. We name this pattern relative meme attention (RMA); it is completely analogous to the relative species abundance (RSA) pattern in ecology [18,21,24]. This quantity is clearly relevant in communication dynamics, as it allows us to understand how users' attention is distributed among memes. For example, when the RMA is peaked around a specific value, each meme attracts the attention of a characteristic number of users. On the other hand, if the RMA is fat-tailed, then users' attention is strongly unevenly distributed among memes, with a few memes attracting the attention of the majority of users.
Let us define $\phi(x)$ as the density function of the average number of memes receiving an attention $x$. Note that the only difference between $\phi(x)$ and $P_{RMA}(x)$ is the normalization constant, the latter being normalized to unity and the former to the total number of different memes $\langle S \rangle$. In the stationary state, the memes that contribute to the RMA are those which were generated by innovation a time $t$ before and are still present in the OSN. They thus contribute to $\phi(x)$ with an amount $\int_0^1 dx_0\, p(x_0)\, P(x, t | x_0)$, which is integrated over all possible initial configurations $x_0$, weighted by the out-degree distribution. Here we have explicitly included the initial condition in the solution of the forward Fokker-Planck equation (2). Since the number of memes (species) generated in a small time interval $dt$ is simply $\mu N\, dt$, the RMA finally reads as in Eq. (12) [18].

The application of Eq. (12) is not straightforward, since we need the full time-dependent solution for $P(x, t | x_0)$. Although the Fokker-Planck equation (2) cannot be solved exactly, an approximate solution can be obtained by neglecting the quadratic term in $x$ in the diffusion coefficient (3b), i.e., by taking the diffusion term simply $\propto x$ [31]. This approximation is particularly good if one can assume that, during the time evolution, the invasion of most of the users' screens by one single meme is unlikely.
Plugging the full transient solution for $P(x, t | x_0)$ [31] into (12) is not immediate, and an explicit analytical integration is not possible. However, if $x_0$ is small enough, a Taylor expansion of $P(x, t | x_0)$ in $x_0$ is suitable and makes the integration possible. Once the integral is carried out, we just need to take into account the normalization of $P_{RMA}$ to finally obtain Eq. (13), where we recall that $\psi$ denotes the digamma function. The normalization has been imposed from $1/N$, the minimum non-vanishing value of the attention fraction, up to infinity. Note that our assumptions justify replacing the upper bound of one by infinity. For a detailed derivation of (13), see Appendix D. We find that the outcome of our prediction for the RMA is a log-series distribution, recovering a classical result for the RSA in ecological systems [40]. Such an RMA displays a power-law behavior $P_{RMA}(x) \sim x^{-\beta}$ with an exponent $\beta = 1$. Therefore, as expected, our model predicts that users' attention is strongly heterogeneously distributed among memes, with most of them attracting a negligible number of users, while a few (the viral ones) catch the attention of almost all of the OSN's users.
We compare the log-series with the relative meme attention obtained from numerical simulations of the interacting particle model in Fig. 5. As before, the results for both the homogeneous and the scale-free networks are shown. On the one hand, in spite of the strong approximation we have carried out, the log-series reproduces the simulated pattern for the homogeneous network very well. On the other hand, as expected, we find that for the scale-free network the approximate diffusive theory cannot reproduce the simulated pattern, which is the result of a bursty dynamics. Interestingly, in this latter case $P_{RMA}$ is compatible with a power law with exponent 1.5 over a broad range of attention values.
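The verbal construction of $\phi(x)$ given above can be written compactly as follows. This is a reconstruction from the description in the text (creation rate $\mu N$, average over the initial attention $x_0$, sum over all ages $t$), offered as a sketch and not as a verbatim copy of Eq. (12).

```latex
% Memes are created at rate \mu N; a meme of age t born with attention x_0
% sits at attention x with density P(x, t | x_0):
\phi(x) = \mu N \int_0^{\infty} dt \int_0^{1} dx_0\, p(x_0)\, P(x, t \mid x_0),
\qquad
P_{RMA}(x) = \frac{\phi(x)}{\int dx'\, \phi(x')}.
% Consistency check: \int dx\, P(x, t | x_0) is the probability that the meme is
% still alive at age t, and its integral over t is the mean persistence time, so
\int dx\, \phi(x) = \mu N \int_0^{1} dx_0\, p(x_0)\, \tau(x_0) = \langle S \rangle .
```

The last line shows that, with this form, $\phi(x)$ is automatically normalized to the mean number of coexisting memes, in agreement with the continuous-time expression for $\langle S \rangle$.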
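When inspecting such distributions, an exponent like $\beta = 1$ or 1.5 is typically read off from a log-binned histogram. The sketch below shows one simple way to do this. It is checked on synthetic samples drawn from a density proportional to $1/x$, not on output of the model, and a least-squares fit in log-log scale is only a rough estimator. All function names are hypothetical.

```python
import math
import random

def log_binned_density(values, x_min, x_max, n_bins):
    """Return (bin centers, densities) using logarithmically spaced bins."""
    ratio = (x_max / x_min) ** (1.0 / n_bins)
    counts = [0] * n_bins
    for v in values:
        if x_min <= v < x_max:
            idx = min(int(math.log(v / x_min) / math.log(ratio)), n_bins - 1)
            counts[idx] += 1
    centers, dens = [], []
    for i, c in enumerate(counts):
        lo, hi = x_min * ratio ** i, x_min * ratio ** (i + 1)
        if c > 0:                                  # skip empty bins (log undefined)
            centers.append(math.sqrt(lo * hi))     # geometric bin center
            dens.append(c / (len(values) * (hi - lo)))
    return centers, dens

def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den

# Synthetic check: inverse-CDF sampling of a density proportional to 1/x on [1/N, 1).
rng = random.Random(2)
N = 1000
samples = [(1.0 / N) ** (1.0 - rng.random()) for _ in range(100_000)]
centers, dens = log_binned_density(samples, 1.0 / N, 1.0, 12)
slope = loglog_slope(centers, dens)   # expected to be close to -1
```

Applied to attention fractions collected from simulations, the same two functions would give a quick estimate of $-\beta$ over the range where the distribution is approximately a power law.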

IV. DISCUSSION AND CONCLUSIONS
Having high diversity in information ecosystems is no less important than having rich biodiversity in natural ecosystems. In fact, the diversity and heterogeneity of ideas in OSNs are crucial for the quality of the deliberative process [41]. Online spaces dominated by one or a few visions, i.e., with very low biodiversity, represent biased information ecosystems where phenomena such as fake news, filter bubbles, and echo chambers crystallize beliefs and annihilate diverse opinions [42]. Therefore, being able to characterize diversity patterns in information ecosystems is not only a theoretically intriguing problem, but also an important step toward measuring their "health".
In this work, we have proposed an analytically tractable neutral theory to describe the dynamics of user attention to competing memes in OSNs. In particular, we have shown that we are able to compute several new quantities of interest, such as the number of coexisting memes, the distribution of user attention among these memes, and the average persistence time of attention on a meme. All these emergent properties have an ecological analogy in natural systems, suggesting that an ecological approach to information ecosystems can open novel paths to understanding the dynamics of memes in OSNs.
By comparing our theoretical predictions to numerical simulations of the model, we have shown that the continuous approximation provides accurate results when the dynamics of the user attention x is diffusive rather than bursty. This, in turn, depends on the underlying architecture of the user-user network. In fact, if the network is characterized by a scale-free degree distribution, then the diffusive approximation fails to quantitatively reproduce the biodiversity patterns, although the qualitative behavior is still well described. Using a backward semianalytical Master Equation approach allows us to overcome this limitation and to correctly predict the mean number of different coexisting memes and their mean persistence attention time. We note that having an analytical theory to calculate such quantities is of paramount importance, as it allows one to easily understand the effect of the system parameters (such as the network connectivity or the innovation rate) on the persistence of active memes in the OSN and on their diversity. Moreover, we note that, because of very strong finite-size effects and fluctuations in the dynamics, simulations may lead to wrong conclusions, especially when considering, as is typically the case, large and heterogeneous systems.
Just as for the numerical simulations, the excellent agreement between the theory and the patterns extracted from the Twitter samples has important implications, both theoretical and practical. First of all, it indicates that our framework can provide information on the status of real information ecosystems. From a few samples, it could be possible to estimate the plurality and "health" of online discourse. Moreover, our results demonstrate that OSNs can be studied through the lens of neutral theory. Such a result draws a direct connection between information and natural ecosystems and allows the use of all the machinery developed in theoretical ecology [24] for the study of online human interactions.
FIG. 5. Relative meme attention for homogeneous (panel (a)) and scale-free (panel (b)) networks. The probability $P_{\mathrm{RMA}}(x)$ of finding a meme that receives a fraction x of attention is plotted. The theoretical prediction (red line) given by (13) reproduces remarkably well the tail of the pattern obtained by numerical simulations (symbols). We have considered the same model parameters as those chosen in Fig. 2.
We would like to stress that the success of our theory in explaining and reproducing real patterns does not translate into the claim that our model captures all the different mechanisms driving social interactions and meme spreading. Instead, our work shows that we have put forward a "simple enough" model to explain and successfully reproduce some real emergent patterns in OSNs, hinting that, for such properties, competition is the dominant driver.
Finally, we note that, unlike popularity measures, user attention to competing memes is not a feature that can be directly measured from data, but can only be evaluated through some proxy. In this regard, a future perspective of this work is to connect the mean persistence attention time and the RMA to actual measurable proxies. Doing so, we could also understand if, and when, the neutrality of the dynamics is broken. For instance, strong external events, such as breaking news, may have an impact on the attention dynamics that cannot be described in terms of demographic stochasticity, but calls for incorporating environmental noise [43] and non-neutral effects [44] into this framework.

Appendix A: Transition probabilities and diffusive approximation
We put forward explicitly, in this appendix, the functional form of the transition probabilities assumed in our model of OSN. As explained in the main text, there are three different classes of transformations, which correspond to:
• Spread by a node that holds meme A and has k followers, j of whom hold meme B. The probability of such a transition in a system with ν nodes carrying meme A is given by (A1).
• Innovation by a node that holds meme A and has k followers, j of whom hold meme A. The probability of such a transition in a system with ν nodes carrying meme A is given by (A2).
• Action of a node that holds meme B and has k followers, j of whom hold meme A. The probability of such a transition in a system with ν nodes carrying meme A is given by (A3).
All expressions are obtained as the product of four probability factors. The first is the probability of choosing the node that acts. The second is the probability of the type of event, either spread or innovation. The third is the probability of having k followers. The last is the hypergeometric probability coming from the distribution of the possible states of the k followers. Note that the "second" factor in (A3) is a factor 1 that is not explicitly written. The Master Equation (1) is simply a balance of the probability gained and lost due to transitions between states. Note that ν − ∆ν appears as the argument of P, with ∆ν being the change in the number of users paying attention to meme A.
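The four-factor structure described above can be illustrated with a short Python sketch. It is not a transcription of (A1)-(A3): the explicit factors below, in particular the hypergeometric arguments (followers drawn without replacement from the other N − 1 nodes), are our reading of the verbal description, and the names `hypergeom` and `transition_probs` are hypothetical.

```python
from math import comb


def hypergeom(j, k, n_marked, n_total):
    """Probability that j of k nodes drawn without replacement are 'marked',
    when n_marked of the n_total available nodes are marked."""
    if j < 0 or j > k or j > n_marked or k - j > n_total - n_marked:
        return 0.0
    return comb(n_marked, j) * comb(n_total - n_marked, k - j) / comb(n_total, k)


def transition_probs(nu, n_nodes, mu, p_k):
    """Return {(event, k, j): probability} for a state with nu nodes on meme A.

    Assumed form (an illustration, not the paper's equations):
    P(acting node) * P(event type) * p_k * hypergeometric(j of k followers).
    p_k is a dict mapping out-degree k to its probability.
    """
    probs = {}
    others = n_nodes - 1  # followers are drawn from the other N - 1 nodes
    for k, pk in p_k.items():
        for j in range(k + 1):
            if nu >= 1:
                # A spreads; j followers held B, so nu increases by j.
                probs[("spread_A", k, j)] = (
                    nu / n_nodes * (1 - mu) * pk
                    * hypergeom(j, k, n_nodes - nu, others)
                )
                # A innovates; the node and j followers leave A (nu drops by j + 1).
                probs[("innovate_A", k, j)] = (
                    nu / n_nodes * mu * pk * hypergeom(j, k, nu - 1, others)
                )
            if nu < n_nodes:
                # A B node acts (spread or innovation, hence a factor 1);
                # j followers held A, so nu decreases by j.
                probs[("act_B", k, j)] = (
                    (n_nodes - nu) / n_nodes * pk * hypergeom(j, k, nu, others)
                )
    return probs
```

A quick consistency check under these assumptions is that the probabilities of all possible events add up to one for any state ν, which is the property the Master Equation balance relies on.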
In order to perform the diffusive approximation, we consider the limit N → ∞, with x = ν/N. In such a limit, we can approximate the hypergeometric distribution with a binomial one, which yields the continuous description of the transition rates given in (A4). Carrying out an expansion of the Master Equation (1) in powers of $N^{-1}$, introducing the transition probabilities as in (A4), and assuming the scaling of time $t = \tilde{t}/N^2$ and of the innovation rate $\mu = N\tilde{\mu}$ introduced in the main text, we arrive at the diffusive approximation given by the Fokker-Planck equation (2) with the coefficients reported in (3).

Appendix B: Backward Fokker-Planck equation and lifetime distribution
We have obtained the forward Fokker-Planck equation (B1). The backward version of this equation governs the probability $Q(x_0, t)$ of reaching the absorbing boundary placed at x = 0, departing from $x_0$, after an evolution of time t. Consequently, the differential equation that holds for the probability $\Pi(x_0)$ of reaching the absorbing point at any time follows by integrating the backward equation over all times, where we have used that $Q(x_0, 0) = \lim_{t\to\infty} Q(x_0, t) = 0$. Since the only absorbing boundary is that at x = 0, the meaningful solution of the previous equation is $\Pi(x_0) = 1$. In other words, all memes will eventually go extinct.
To study the mean lifetime $\tau(x_0)$ of the memes, we multiply the backward equation (B2) by t and then integrate over all times. After assuming regular behavior for the boundary terms and taking into account that $\Pi(x_0) = 1$, we get an ordinary differential equation for $\tau(x_0)$. Imposing the boundary conditions τ(0) = 0 and $\lim_{x_0\to 1} \tau(x_0) < \infty$, we obtain the solution (5) presented in the main text.

Appendix C: Backward Master Equation and lifetime distribution
The backward Master Equation that rules the dynamics of the probability $Q(\nu_0, \tilde{t})$ of reaching extinction, departing from an initial attention $\nu_0$, after $\tilde{t}$ time steps is written in (6). The probability at the next time step is a linear combination of the probabilities at the current one. Therefore, we can write a vector equation of dimension N + 1 using matrix notation, with $\mathbf{Q}(\tilde{t})$ being the vector with components $Q(\nu_0, \tilde{t})$, for $\nu_0 \in \{0, \dots, N\}$, and $\mathbf{M}$ the (N + 1) × (N + 1) matrix whose elements are the transition probabilities. In that expression, to avoid cluttering our formulae, we assume that transition probabilities are equal to zero when the indices involve a transformation with no physical meaning, e.g., when the second subindex does not belong to the interval [0, k]. Probability conservation is reflected in property (C3). The absorbing boundary condition at ν = 0 implies that $Q(0, \tilde{t}) = \delta_{\tilde{t},0}$. Hence, in dimension N, we obtain the reduced backward Master Equation (C4), in which the vector $\mathbf{Q}(\tilde{t})$ is replaced by the vector obtained after removing its first component, corresponding to $\nu_0 = 0$; the matrix $\mathbf{M}$ is replaced by the submatrix obtained after removing the row and column corresponding to the value 0; and $\mathbf{M}_0$ is a column vector that contains the transition probabilities to the extinction state. We define the total probability Π of arriving at 0, departing from $\nu_0$, as the sum of $\mathbf{Q}(\tilde{t})$ over all times, where the components of Π are $\Pi(\nu_0)$. Summing the backward Master Equation (C4) over all times, and taking into account that $\mathbf{Q}(0) = \lim_{\tilde{t}\to\infty} \mathbf{Q}(\tilde{t}) = 0$, we get a linear equation for Π. Since property (C3) holds, the solution of this equation is simply Π = 1, where, as in the main text, 1 denotes the vector of dimension N full of ones. Consistently with our results in the continuous description and with our physical understanding, we find, in this exact framework, that all memes will eventually reach extinction.
The mean first passage time $\tilde{\tau}(\nu_0)$ to reach the absorbing point starting from the state $\nu_0$ is given by the first moment of $Q(\nu_0, \tilde{t})$ with respect to $\tilde{t}$. Multiplying (C4) by $\tilde{t}$ and summing over all times, we obtain the equation for the mean persistence time (7) presented in the main text, with its corresponding solution (8).
It is also possible to compute the average of the squared passage time, which is useful for the variance. We define it as the second moment of $Q(\nu_0, \tilde{t})$ with respect to $\tilde{t}$. Multiplying (C4) by $\tilde{t}^2$ and summing over all times, we obtain an equation whose solution is (9), presented in the main text.
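As an illustration of how such moments can be evaluated numerically, the following Python sketch solves the standard absorbing-chain relations (I − M) τ = 1 and (I − M) τ₂ = 2τ − 1, which follow from multiplying the backward equation by the time (or its square) and summing. We assume, without having checked it term by term, that these coincide with (8) and (9) of the main text. The toy chain used here (jump to extinction with probability q, otherwise stay) is hypothetical and is not the transition matrix of the model; it is chosen only because its passage time is geometric, with mean 1/q and second moment (2 − q)/q².

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [[float(v) for v in A[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]


def passage_time_moments(m_full):
    """First and second moments of the time to absorption at state 0.

    m_full[i][j] is the probability of jumping from state i to state j in one
    time step; state 0 is absorbing. Returns (tau, tau2), indexed by nu0 - 1.
    """
    n = len(m_full) - 1
    # Reduced matrix: drop the row and column of the absorbing state.
    m = [row[1:] for row in m_full[1:]]
    i_minus_m = [
        [(1.0 if i == j else 0.0) - m[i][j] for j in range(n)] for i in range(n)
    ]
    tau = solve(i_minus_m, [1.0] * n)                      # (I - M) tau = 1
    tau2 = solve(i_minus_m, [2.0 * t - 1.0 for t in tau])  # (I - M) tau2 = 2 tau - 1
    return tau, tau2


def toy_chain(n_states, q):
    """Hypothetical test chain: every state nu >= 1 jumps to 0 with
    probability q and stays where it is otherwise."""
    m = [[0.0] * (n_states + 1) for _ in range(n_states + 1)]
    m[0][0] = 1.0
    for nu in range(1, n_states + 1):
        m[nu][0] = q
        m[nu][nu] = 1.0 - q
    return m
```

The variance of the passage time then follows componentwise as τ₂ − τ². For the model itself, `m_full` would have to be filled with the transition probabilities of Appendix A.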

Appendix D: Derivation of the relative meme attention
Our starting point here is the approximated Fokker-Planck equation obtained after neglecting the squared term in the diffusion coefficient, that is, Eq. (D1). From now on, we consider that the equation is defined in the region x > 0. We understand the neglect of the squared term as a rescaling of the attention that moves the boundary from x = 1 to infinity. The solution of (D1) subject to an absorbing boundary at x = 0 and the initial condition $P(x, 0|x_0) = \delta(x - x_0)$ is known in closed form [31]. A Taylor expansion of this solution up to linear order in $x_0$ yields an approximation, valid for small $x_0$, with which we can carry out exactly the integration over all times; in the result, (D4), we have explicitly written the dependence on x. Since the relative meme attention $P_{\mathrm{RMA}}$ is proportional to the integral in (D4), we just need to impose the normalization condition. Taking into account that the minimum non-vanishing value of the attention is 1/N, and assuming, as said above, that the upper bound is at infinity, we end up with the $P_{\mathrm{RMA}}$ reported in equation (13) of the main text.

Appendix E: Simulation details
In this appendix, we cover in detail the algorithm used to carry out the numerical simulations presented in this work. We have explicitly simulated the microscopic dynamics described in the main text. Specifically, we consider a system made of N nodes, each of which carries just one meme. The dynamics is generated by repeating the following recipe at each time step:
• We randomly choose a node with homogeneous probability 1/N.
• With probability $\tilde{\mu}$, the node innovates the meme.
• The degree k of the node is drawn from the out-degree distribution $p_k$.
• From the available N − 1 nodes, we randomly choose the k followers.
• The meme of the first chosen node spreads to all the followers.
Note that the connections in the network are not fixed: we work with a random network that is dynamic. This random dynamic feature guarantees that the Master Equation used in the main text is the correct mathematical description of our system.
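The recipe above can be sketched in a few lines of Python. This is a minimal illustration and not the code used to produce the figures: the function name `simulate`, the integer meme labels, the initial condition (all nodes on meme 0) and the `sample_degree` callback that draws k from $p_k$ are our own assumptions.

```python
import random


def simulate(n_nodes, mu, sample_degree, n_steps, seed=0):
    """Run n_steps of the microscopic dynamics and return the meme of each node.

    sample_degree(rng) must return an out-degree k drawn from p_k
    (hypothetical interface). Memes are labeled by integers.
    """
    rng = random.Random(seed)
    memes = [0] * n_nodes      # assumed initial condition: a single meme
    next_label = 1             # label of the next meme created by innovation
    for _ in range(n_steps):
        # Choose the acting node with homogeneous probability 1/N.
        actor = rng.randrange(n_nodes)
        # With probability mu the node innovates, i.e. adopts a brand new meme.
        if rng.random() < mu:
            memes[actor] = next_label
            next_label += 1
        # Draw the degree and choose k followers among the other N - 1 nodes.
        k = min(sample_degree(rng), n_nodes - 1)
        for i in rng.sample(range(n_nodes - 1), k):
            follower = i if i < actor else i + 1  # skip the actor itself
            # The actor's meme spreads to every follower.
            memes[follower] = memes[actor]
    return memes
```

For instance, `simulate(100, 0.01, lambda rng: 5, 10_000)` would run a homogeneous network with fixed out-degree 5, and `len(set(...))` of the result gives the number of coexisting memes S at the final time. Because the followers are redrawn at every step, the sketch reproduces the dynamic random network described above rather than a fixed graph.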
For each set of parameters, we have run $N_S = 100$ simulations with a total duration of $\tilde{t}_f = 5 \cdot 10^6$ time steps. The measurements of the persistence time come from the average of all registered times for those memes that reach extinction inside the simulation window. By recording the history of all memes, it is easy to recover the times to extinction starting from all possible initial attentions: we just need to subtract the time at which the meme has that initial attention from the extinction time of the corresponding meme. We have increased the statistics of the measurements of biodiversity, which are performed in the stationary state, by taking different observation times. We have taken not only $\tilde{t}_f$, but also $\tilde{t}_f - n\,\Delta t$ with n = 1, . . . , 10 and ∆t = 2.5 · 10^5 time steps. The choice of ∆t must satisfy two requirements. On the one hand, ∆t has to be large enough to prevent correlations between the different observation times. On the other hand, ∆t has to be small enough to ensure that the system is close enough to the stationary state at the first observation time $\tilde{t}_f - 10\Delta t$.
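The two bookkeeping steps described in this paragraph can be made concrete with a small Python sketch. The data layout (a per-meme history stored as a list of `(time, attention)` pairs ending at extinction) and the function names are assumptions made for illustration only.

```python
def observation_times(t_final, dt, n_obs):
    """Observation times t_final - n * dt for n = n_obs, ..., 1, 0,
    listed from the earliest to the latest."""
    return [t_final - n * dt for n in range(n_obs, -1, -1)]


def times_to_extinction(history):
    """Times to extinction for every recorded initial attention.

    history is a list of (time, attention) pairs for one meme, ordered in
    time and ending with the extinction event (attention 0). For each
    earlier record, the time to extinction is the extinction time minus
    the time at which the meme had that attention.
    """
    t_ext, final_attention = history[-1]
    if final_attention != 0:
        raise ValueError("the meme did not go extinct inside the window")
    return [(nu, t_ext - t) for t, nu in history[:-1]]
```

With the values quoted above, `observation_times(5_000_000, 250_000, 10)` starts at 2.5 · 10^6 time steps, so the first half of each run is discarded as a transient.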
In general, in the main text we report all values using the continuous scales, which are related to the discrete dynamics through a factor $N^2$ for the timescale, $t = \tilde{t}/N^2$, and a factor N for the attention, x = ν/N.

Appendix F: Twitter data
In this appendix, we describe the details of the data collection from the OSN Twitter and the analyses used in the comparison with the theoretical predictions. In order to get an unbiased sample of Twitter activity, we collected data from the Twitter Streaming API. The data collection covers a period of 4 years, from January 2015 to December 2018. Only geolocalized tweets originating in a rectangular area covering the UK have been requested. This latter condition allowed us to avoid the 1% total traffic limitation imposed by the Streaming API and to ensure that the majority of tweets and hashtags would be in English. Finally, to protect privacy, users' information and the tweetID have been anonymized through a hash function before their storage and, in any case, for each tweet, only the timestamp and the hashtags contained in the text have been used in the analyses. The complete data set comprises approximately 135 million tweets containing at least one hashtag.
To reduce the possible noise generated by misspelled hashtags or bots, we filtered out all the hashtags that have been tweeted fewer than 5 times, i.e., we required a minimum popularity of 5. We have explicitly checked that our results remain almost unchanged under different filters between 3 and 500. After the filtering, and with the aim of obtaining different samples for the statistical analyses, we divided the final data set into $10^4$ bins/samples, each with a size of around 12000 tweets. Moreover, to test the robustness of our results with respect to the choice of the number and size of the bins, we also repeated the calculations for $10^3$, $2 \times 10^4$ and $5 \times 10^4$ bins, along with one test in which we considered one day in the data as one bin. In all cases, we find robust results that behave qualitatively in the same way as those presented in Fig. 4.
Once we filtered and divided the data, for each bin/sample we estimate the innovation rate $\tilde{\mu}$ as the fraction of new hashtags (i.e., used for the first time) observed in a bin over the total number of unique hashtags observed until that bin. Note that, therefore, the innovation rate does not remain constant, in contrast to our model. Nevertheless, each bin/sample can be thought of as a realization of our model with a different innovation rate. For S/N, we take the number of unique hashtags surviving at the end of each bin, normalized by the maximum value of ⟨S⟩/N given by the theory. The histogram in Fig. 4 is made by simply binning and counting these data along the S/N axis. In order to get better statistics to determine average values, we group the samples by binning in $\tilde{\mu}$, where $n_{\tilde{\mu}}$ denotes the number of samples within the bin centered at $\tilde{\mu}$. In this way, we obtain an averaged number of coexisting memes ⟨S⟩/N and its variance $\sigma^2_{S/N}$ for each considered innovation rate. The inset of Fig. 4 is constructed with these averages, where the size of the error bars is given by $\sigma_{S/N}$.
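The estimate of the innovation rate per bin can be sketched as follows. The sketch assumes that "observed until that bin" includes the current bin (otherwise the first bin would have an empty denominator), and that each bin is available as a plain list of hashtag strings; both are our assumptions, as is the function name.

```python
def innovation_rates(bins):
    """Estimate the innovation rate of each bin.

    bins is a time-ordered list of bins, each a list of hashtag strings.
    The rate of a bin is the number of hashtags seen for the first time in
    that bin divided by the number of unique hashtags seen up to and
    including that bin (assumed convention).
    """
    seen = set()
    rates = []
    for hashtags in bins:
        unique_in_bin = set(hashtags)
        new = unique_in_bin - seen      # hashtags used for the first time
        seen |= unique_in_bin
        rates.append(len(new) / len(seen) if seen else 0.0)
    return rates
```

For example, for the three toy bins `[["a", "b"], ["a", "c"], ["d"]]` the estimated rates are 1, 1/3 and 1/4, which shows how the estimate decreases as the pool of known hashtags grows.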
For the construction of the theoretical prediction in Fig. 4, we have used, in an auxiliary way, the full data set, that is, the full list of corresponding $(\tilde{\mu}, n_{\tilde{\mu}}, \langle S\rangle/N)$. Specifically, we have used a weighted combination of Poisson distributions, one for each bin of $\tilde{\mu}$ with its observed ⟨S⟩/N. In other words, we have defined a weight $w_{\tilde{\mu}} = n_{\tilde{\mu}} / \sum_{\tilde{\mu}} n_{\tilde{\mu}}$ and we have built the distribution as the linear combination of Poisson distributions $P^{(\tilde{\mu})}_S$, each parameterized by the average ⟨S⟩ corresponding to that $\tilde{\mu}$. Note that, in order to combine the Poisson distributions and then rescale from S to S/N, the value of N is required. Therefore, we have exploited the Poisson theoretical prediction to associate an effective size with the data set, which is also averaged over bins.
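The weighted combination of Poisson distributions can be written compactly in Python. The sketch below only implements the mixture in the variable S; the rescaling to S/N and the effective size N are left out, and the input format (a list of `(n_samples, mean_S)` pairs, one per bin of the innovation rate) is a hypothetical choice.

```python
from math import exp, lgamma, log


def poisson_pmf(s, mean):
    """Poisson probability of observing s events given the mean."""
    if mean == 0:
        return 1.0 if s == 0 else 0.0
    # Evaluated in log space to stay stable for large s and mean.
    return exp(s * log(mean) - mean - lgamma(s + 1))


def mixture_pmf(s, components):
    """Weighted combination of Poisson distributions.

    components is a list of (n_samples, mean_S) pairs, one per bin of the
    innovation rate; each weight is n_samples divided by the total number
    of samples.
    """
    total = sum(n for n, _ in components)
    return sum(n / total * poisson_pmf(s, mean) for n, mean in components)
```

Since the weights add up to one, the mixture is itself a normalized distribution, and it reduces to a single Poisson distribution when only one bin is populated.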
For the theoretical prediction in the inset, since the out-degree (followers) distribution of Twitter users is extremely fat-tailed [12,45], in the model we assumed the same power-law degree distribution as that shown in the main text. We have used these parameters, also introduced in [15], in the absence of better knowledge of the real network.