Dynamical approach to Zipf’s law

,


I. INTRODUCTION
Zipf's law [1,2] is an empirical scaling relation that connects the sizes of a set of objects with their ranking when sorted according to the size itself. Being ubiquitous in nature, it represents one of the most studied topics in complex systems: it has been observed in the size distribution of cities [3], of firms [4] and of GDPs [3], but also in natural language [1], in web page visits [5], in scientific citations [6,7] and many natural systems, such as earthquakes [7,8] and lunar craters [7]. There are currently numerous approaches to explain Zipf's law based on different mechanisms including multiplicative processes [9,10], adjacent possible framework [11,12], sample space reducing processes [13], and information theory arguments [14,15]. However, while all these models give insights on the upset of Zipf's scaling, they fail in providing a general explanation of the phenomenon.
In this work we show that the dynamical evolution in space or time of these models and natural systems, when analyzed from the perspective of Zipf's law, provides unprecedented quantitative insights. A first result is that while some systems are dynamically attracted toward Zipf's law, others show Zipf's law only temporarily and, as a consequence, we can label them as spuriously Zipfian. This relatively simple observation has a number of crucial implications. Indeed, spuriously Zipfian systems during their evolution deviate from Zipf's law by sampling the tail of the (power-law) probability density function (PDF) from which their sizes are extracted. More precisely large events are completely sampled. This allows to determine the upper cutoff of the PDF and since we demonstrate that earthquakes follow Zipf's law only spuriously, we can determine * Correspondence email address: andrea.zaccaria@cnr.it the maximum possible magnitude of an earthquake occurring in a given geographical region. Conversely, systems for which Zipf's law is a dynamical attractor are intrinsically under-sampled and, as such, in a sort of permanent out of equilibrium or transient state. These genuine Zipfian systems are characterized by a dynamical constraint, that we call coherence, relating the sampling rate to the lower and upper cutoffs of the generating PDF. Natural language, the first and most famous application of Zipf's law, naturally satisfies such constraint due to the presence of grammar and semantic rules. Moreover, our approach allows to understand why Zipf's law holds for the cities of single countries, but not for larger sets of nations, a phenomenon abundantly debated in the literature [3]. Finally, we applied our dynamical approach also to generative models such as multiplicative processes [10] and the Yule-Simon model [16,17]. We analytically show that both processes are genuinely Zipfian and, as a by-product, that the preferential attachment mechanism reproduces the dynamical evolution of world countries. Using these results we can provide also a novel insight about Heaps' law [18], a likewise ubiquitous scaling law whose connection with Zipf's law has been largely debated. By generalizing previous analyses [19], we demonstrate that Heapsian systems are a subset of the Zipfian ones. This allows us to predict that US cities evolved according to Heaps' scaling, a result confirmed by empirical analysis. From this body of analysis it clearly emerges that our framework is completely general and can be applied to any system or model claimed to show Zipf's law.

A. Dynamical constraint determine Zipfian dynamics
In order to fix the notation, we give a formal definition of Zipf's law. Given a set of N objects and denoting by arXiv:1911.04844v3 [physics.soc-ph] 13 Jul 2020 For instance, N could be regarded as the number of cities in a country and S(k) as the population of the kth most populous urban settlement. More generally, S(k) can be fitted using the Zipf-Mandelbrot relation [15], which accounts for the presence of deviations at low ranks: whereS and Q are generally regarded as free parameters. Zipf's law is recovered forS = S(1) and Q = 0: as such, Q can be regarded as an empirical measure of the deviation from a pure Zipf's law, whose importance is also given by the specific involvement of the largest objects. Zipf's law is usually associated to a power-law probability distribution function (PDF) according to which sizes are distributed [3,7]. Real power laws are always characterized by an upper and a lower cutoff, which correspond to intrinsic physical limits and that will play a fundamental role in determining the Zipfian behavior of the system[20]. Let us suppose to consider N objects whose sizes are distributed according to a truncated power law PDF of the form where s m is the lower cutoff and s M the upper one. As we show in the Methods section, the parameters of the Zipf-Mandelbrot relation (1) describing these objects can be written as simple functions of N and the parameters of the PDF: Note that we also recovered the well known result γ = 1 α−1 which links the exponent of the PDF to the asymptotic exponent of the rank-size plot [3,21]. Moreover Q is explicitly related the level of sampling of the distribution, relating the extension of the PDF to the number of elements in the system, and it is a non-negative parameter. Substituting these expressions into the Zipf-Mandelbrot scaling law, we get the rank-size relation followed by N objects whose sizes are power law distributed: All the systems where Zipf's law is found are, in a certain sense, dynamical: both the number N of objects in the system and the parameters of the generating PDF will in general vary over time. Since the adherence to Zipf's law, as measured by Q, is a function of both N and these countries, the novel Moby Dick, a multiplicative process, the conditioned sampling and a typical trajectory of the Yule-Simon process. Real systems are denoted by circles, while models by triangles. We recall that for earthquakes the magnitude is the logarithm (in base 10) of the size i.e. of the maximal amplitude detected, for cities the size is the population, in the case of words size is the number of occurrences while for countries the size is defined as the GDP per capita. Italian earthquakes and world cities are moved toward high values of Q by dynamics and so evolve incoherently. Differently, Moby Dick, US metropolitan areas and World countries tend to the low Q region and consequently show a Zipfian dynamics. For what concerns models, the conditioned sampling is not Zipfian, while the Yule-Simon model and the multiplicative process show Zipfian dynamics. However while the multiplicative process performs a trajectory very far from the ones of the real systems we considered, the evolution of the Yule-Simon process (blue triangles) is very similarly to the one of world countries (green circles).
parameters, it is then natural to investigate how Q is influenced by the dynamics, so to determine if Zipf's law is dynamically stable. Consider, for example, a very simple situation in which new objects are drawn with the passage of time, keeping the parameters of the PDF fixed. From Eqs. (3) it is clear that in this case the deviation parameter Q is expected to increase with time. As a consequence, a static approach would address the system as Zipfian while N remains relatively small, and would address it as not-Zipfian when N becomes large with respect to s M sm α−1 . Clearly, one should consider the first situation as a spurious manifestation of Zipf's law, because it is only due to a temporary under-sampling of the PDF. In other words, in this case Zipf's law is not stable and consequently it does not represent an attractor of the dynamics. Real systems are characterized by different and non trivial behaviors concerning the dynamics of the rank-size plot. Fig. 1 provides some empirical examples of such dynamics. In the first column we show the rank-size plots of the earthquakes occurred in Italy, collected from INGV historical dataset [22]. We recall that in this case the size S is defined as the exponential of the (moment) magnitude of the earthquake considered. In panel (a) we consider only the earthquakes occurred between 1900 and 2000, that per-fectly adhere to Zipf's law, while in panel (b) those which occurred in the same area, but during a wider time window . In this last case the first ranks clearly deviate from a pure Zipfian scaling being Q relatively large. Note, however, that other systems show a different trend when N is increased, as in the case of the novel Moby Dick, second column of Fig. 1, and US metropolitan areas, third column of the same figure. Here the sizes are, respectively, the number of word occurrences and population. The population of US metro areas comes from the work of Schroeder [23]. In the case of Moby Dick, an increase of the words considered does not result in a increase of Q, which, instead, slightly decreases. The dynamics of US metro areas is even more peculiar and evident. Indeed metro areas were not distributed according to Zipf's law in 1790 (panel (e)), soon after the Declaration of Independence, while this scaling relation is found considering the same system at the beginning of the last century (panel (f )). In both these systems N is increasing and the parameters of the PDF are varying, but, unlike from what happens considering earthquakes, they are varying coherently, that is, in such a way to make Q decrease with the dynamics. The essence of a genuine Zipfian system, as opposed to a simpler set of objects whose sizes are drawn from a power law distribution, is then contained in the dy-namical relations among N and the parameters of the PDF α, s m and s M .

B. Coherence of the "time" evolution
The considerations of the previous paragraph strongly suggest that an indicative study of Zipf's law can be performed only dynamically, by checking the deviation parameter Q as a function of time. Therefore, we propose to focus the attention not on the static identification of systems which show Zipf's law, but on the dynamics which makes N and the parameters of the PDF evolve coherently, that is, in such a way that Q decreases. We call this behavior Zipfian dynamics: • a system shows Zipfian dynamics when the rank-size plot, following the dynamics, does not present increasing deviations from a straight line. In mathematical terms, this is equivalent to requiring the underlying PDF to be a power law and the parameter Q(n) to be not increasing with n, where n is a variable which plays the role of time.
We call genuine Zipfian a system which shows Zipfian dynamics.
In order to visualize the evolution of systems under this perspective, we introduce the Zipf 's plane, defined by the axes N s  Fig. 2 shows how the systems introduced above and other we will study in detail in the following moves in the Zipf's plane. The color gradients from blue to yellow correspond to increasing values of log Q. A trajectory from the yellow to the blue region corresponds to a Zipfian dynamics, while going in the opposite direction indicates that Zipf's law can be followed only temporarily and so spuriously.
The implications of Zipfian dynamics can be better understood considering as temporal variable the sum of sizes at a given point of the evolution, n = N k=1 S(k).
For example, in regard of cities, n coincides with the total urban population, for what concerns books it represents the total number of words used in the sampling, while in the case of earthquakes it is directly related to the total energy released by all the events considered. Using n as a temporal variable, the condition for the onset of a Zipfian dynamics can be written as where we defined R = s m /s M . This yields This dynamical constraint, that we call coherence, relates the growth of the probabilistic space (left side) to the growth of the physical space (right side), regarded as the number of groups or elements N into which n is partitioned. For systems showing Zipfian dynamics, a fast increase of N must be compensated by an even faster growth of the range of the PDF. This implies that Zipfian systems are out of equilibrium driven, because even if the system evolves and the number of elements enlarges, the underlying distribution is never completely sampled. In other words, for Zipfian systems, the empirical frequencies do not coincide with the generating PDF and this is a direct consequence of coherence. Indeed the growth of the probabilistic space is faster than the enlargement of the physical one, making the upper cutoff grow faster than the rate at which the probabilistic space is explored. Conversely, if the dynamics is not Zipfian and the system shows Zipf's law only temporally and so spuriously, the system evolves toward equilibrium: the probabilistic space is fully explored and the empirical frequencies become a good approximation of the inherent PDF. We can then divide the systems showing Zipf's law in two distinct categories, which show radically different dynamical properties: • Spurious Zipfian systems which present Zipf's law only at a certain moment of their evolution due to a temporary and accidental under-sampling. For these systems, Zipf's law does not represent an attractor of the dynamics and therefore this scaling law can not be used for characterizing their behavior. In this case the eventual observation of Zipf's law is entirely attributable to the underlying scale free distribution; • Genuine Zipfian systems which dynamically evolve toward Zipf's law. Such behavior is produced by a fast enlargement of the probabilistic space, which makes these systems be out of equilibrium driven. In this case, the underlying scale free distribution is not sufficient to explain the stability of Zipf's law, indeed there must also be a dynamical constraint, that we call coherence, making the parameters of the PDF evolve in an appropriate way.
These different dynamical behaviors can be both visualized through the Zipf's plane, Fig. 2, and by considering the evolution of Q with respect to n, as shown in Fig. 3. In this last case, red trajectories represent spurious Zipf's law, while blue ones are indicative of a Zipfian dynamics, so of genuine Zipf's law. The figure well demonstrates the need of approaching Zipf's law from a dynamical point of view, because in this way systems which could appear, at first glance, statistically similar, show all their differences.

C. Relation with Heaps' law
The dynamics of N with respect to n is usually described in terms of Heaps' law. Lu et al [19] derived an expression relating the growth of n to N , more precisely This relation holds under the assumption of a stable Zipf's exponent γ and lower cutoff equal to one. In the limit We considered also casualties provoked by volcanic eruptions [24] and a typical trajectory of the Sample Space Reducing process [13].
N → ∞ the scaling of N thus satisfies Proceeding as done by Lu et al, but taking into account the presence of an eventually non-zero Q, we can write n summing over the N sizes of the elements composing the system Recalling thats = N γ s m and using the expression for Q we obtain This expression has to be compared with that derived by Lu et al Eq. (6). First of all, let us consider a system for which the cutoffs s m and s M are fixed and let us suppose s M s m , in this case Eq. (8) The conclusion is that, apart for an initial transient, the growth of N (n) is linear for any γ if the cutoffs are fixed. The crossover N c between the two regimes satisfies Clearly an analogous conclusion holds also if the cutoffs are not fixed but the dynamics is not Zipfian. Indeed in this is growing and therefore the scaling Here we represented these different behaviors for γ = 2, by performing a random sampling from a power law distribution with α = 3 2 and different upper cutoffs, namely s M = 10 2 (panel (a)), s M = 10 4 (panel (b)), s M = 10 6 (panel (c)) and s M = 10 8 (panel (d)). Red dashed lines represent the transition point between the two regimes, which are enlightened by the black dashed lines (N (n) ∼ n 1/2 ) and by the black dotted ones (N (n) ∼ n).
identified by Lu et al holds only transiently for Q 1. This behavior is reported in Fig. 4, where we plotted different Heapsian trajectories. More precisely we performed four random samplings from a power law with exponent α = 3 2 (which corresponds to a Zipf's exponent γ = 2) and we studied the scaling of N (n) considering the following upper cutoffs: 10 2 , 10 4 , 10 6 and 10 8 (from panel (a) to panel (d)). The theoretical crossover is represented by red dashed lines, while the black lines enlighten the two scaling regimes, namely N (n) ∼ n 1/2 and N (n) ∼ n.
We can now turn to genuine Zipfian systems, for which, being the dynamics Zipfian, Q does not increase with n. As a consequence if N is growing and for n sufficiently large, it will hold N (n) Q(n) and therefore Eq. (8) reduces to This expression is analogous to the one derived by Lu et al, but there is an explicit dependence on s m . This implies that the scaling defined by Eq. (7) asymptotically holds, but only if the lower cutoff is constant in time. As a consequence the conclusion that any system showing a stable Zipf's exponent always presents Heaps' law is not correct. Nevertheless Zipf's scaling is more fundamental than Heaps' one, as suggested by Lu et al. [19], even if the relation among these laws is more complicated than previ-ously noticed. In particular • if the system shows spurious Zipf's law and the lower cutoff is fixed, then Heaps' exponent is asymptotically equal to one, independently of the Zipf's exponent. For γ > 1 a transient regime is present, and it lasts up to Q ∼ 1. In this case, Heaps' exponent coincides with that found by Lu et al. [19]; • if the system shows Zipfian dynamics and so a genuine Zipf's scaling, then Heaps' law is found and Heaps' exponent asymptotically (i.e., as soon as N Q) coincides with that derived in [19], provided that the lower cutoff s m does not vary over time; • if the lower cutoff s m is varying in time, the scaling of N (n) can not be easily derived, being explicitly influenced by that of s m . However, moving to the reference frame in which the lower cutoff is held fixed, this case can be traced back to those discussed above.
This explains, for instance, why the Sample Space Reducing process shows, for large n, a stable Zipf's law without the presence of Heaps' scaling [25]. Moreover it also clarifies the findings in [19] concerning the increasing of the Heaps' exponent in presence of an exponential cutoff in the tail of the generating power law.

D. Spurious Zipf 's law and the upper cutoff
We previously noticed that while a system showing Zipfian dynamics is out of equilibrium driven, never sampling the inherent PDF, spurious Zipfian systems completely explore the full range of the distribution during their evolution. As we are going to show, this property can be used to give an estimate of the upper cutoff s M for these systems. We recall that the deviation parameter satisfies while, using Eq.(4), we see that the largest object in the system is given by It then follows where we defined the empirical deviation parameter Q e , whose expression is We can then relate the upper cutoff of the PDF to the rank one object, in particular we obtain For Q e = 1, that is for a perfect Zipf's law, the upper cutoff diverges, meaning that we can not infer it starting from the data. However, if the dynamics is not Zipfian, Q increases with time and, as shown by Eq. (10), Q e does the same. As a consequence, for N sufficiently large, it will hold Q e 1. In this limit we can expand the previous expression, obtaining Using this expression we can estimate the upper cutoff s M for any system showing spurious Zipf's law.

III. PRACTICAL APPLICATIONS
In the following we apply the methodology introduced in the previous section to earthquakes, cities, and language. In all three cases our dynamical approach will provide novel and quantitative insights into the considered systems. Then we discuss two models which notoriously lead to Zipf's scaling: multiplicative processes [10] and the Yule-Simon model [16,17], showing analytically that they produce a truly Zipfian dynamics.

A. Earthquakes
It is well known that earthquakes follow the Gutenberg-Richter law [26] -i.e. the energy released is power-law distributed -and it is reasonable to assume that, in a given seismic zone, the upper cutoff of this PDF can only vary over geological times. By using Eq. (3), we thus deduce that increasing the numerosity N of the set, that is by considering larger and larger time windows (but always much smaller than geological scales), results in higher values of Q. This is confirmed by the corresponding trajectory in the Zipf's plane, Fig. 2, where we plotted the trajectory of Italian earthquakes. From right to left, the points correspond to an interval of 50 years (1950 − 2000), of 100 years (1900 − 2000) and so on, up to 1000 years (1000 − 2000). These points accumulate in correspondence of the maximum possible size for an earthquake occurring in Italy, enlightening the absence of a Zipfian dynamics. This is also confirmed by the growth of Q(n), represented in Fig. 3. The conclusion is that earthquakes can show only spurious Zipf's law. This result directly derives from the fact that earthquakes, neglecting short time correlations, are by a good extent independent over long periods (tens of years), being energy always injected into the system. It is therefore clear that power-law distributed objects cannot tend to Zipf's law if they evolve independently, as long as the cutoffs of the inherent PDF are fixed. Moreover, by looking at 1, it is possible to conclude that no future Italian earthquake will be substantially stronger than the largest event already recorded, which is a rather interesting and non-trivial result, being obtained only from simple statistical considerations. Indeed, the deviation parameter Q is large and we can use Eq. (11) for obtaining an estimate of the maximal magnitude of an earthquake occurring in Italy These considerations also explain the findings of Newman [7] and Sornette et al [8] about Californian earthquakes. Newman analyzes only events recorded in the period 1910-1992, finding a pure Zipf's law with no deviations. Analogously Sornette et al considered earthquakes occurred in Southern California in the period 1930-1990. In both cases no deviations from Zipf's law are observed and this is due to the small dimension of the samples used. Indeed, the time window considered is too small to appreciate the upper cutoff and deviations appears increasing the sampling interval. This is shown in Fig. 5, were we plotted the rank-size plot of Californian earthquakes occurred between 1769 and 2000 with the corresponding fit. Being Q ≈ 9 we conclude that also in this case the largest earthquake registered in the sample we considered, whose magnitude is M (1) ca ≈ 7.9, is a good estimator of the upper cutoff of the earthquakes size distribution.
Our dynamical approach consequently shows that a critical usage of the rank-size plot provides information which goes well beyond the simple identification of a scale free distribution. Indeed the presence of deviations at low ranks and the identification of spurious Zipf's law allow to perform risk assessment and to understand if the available  [7] indicate the lack of Zipfian dynamics. Moreover such deviations imply that the strongest earthquake observed is a good approximation of the upper cutoff. (b) Rank-size plot of world tsunami occurred from 2100 BC to 2020. In this case considering a large time window does not make deviations appear. The conclusion is that the available sample does not allow to infer the upper cutoff of tsunami height, which, as in the case of earthquakes, is expected to vary only over geological scales.
data are a sufficient statistic of the phenomenon considered. For instance, if the rank-size plot of earthquakes occurred in a given region is straight, then the most powerful earthquake observed will not be, in general, a good estimate of the upper cutoff of the distribution. Clearly this has strong implications for the planning of antiseismic measures. As a consequence, when studying the risk connected with natural hazards, an inspection of the ranksize plot can be a very fast procedure to understand the effective reliability of the available data. For example, in respect of world tsunami, considering a very large time window does not make deviations from Zipf's law appear. This is shown in Fig. 5, where we plotted the rank-size plot of tsunami occurred worldwide since 2100BC. For tsunami run-up heights we used NOAA NCEI/WDS Global Historical Tsunami Database, 2100 BC to Present [27]. Also in this case the upper cutoff of the distribution is expected to be fixed over human times, therefore the conclusion is that there is no statistical evidence that the highest tsunami ever observed represents a good estimate for the upper cutoff of tsunami distribution.
B. Cities

US cities are Zipfian
The population of metropolitan areas is a prototype of system usually claimed to follow Zipf's law. Here we show that, while US cities follow a Zipfian dynamics, when we consider world cities the dynamics is not Zipfian, a direct consequence of the non-additivity property of Zipfian dynamics, that we analytically prove below. Let us first focus on the time evolution of US cities. Up to 1776 the US were, by a good extent, independent entities and each of them used its resources to make its own capital grow. As soon as interaction became relevant and the USA turned into a single nation, resources were centralized, flowing into only some of those cities and allowing them to reach a population that would have been impossible to sustain for a single State. As a consequence, the upper cutoff of the size distribution of cities enormously increased. This process, which corresponds to the emergence of New York as the driving city of the USA, is well represented by the corresponding trajectory in the Zipf's plane (Fig. 2, black circles). The size of the largest possible city s 1/γ M increases very fast with respect to N · s 1/γ m , leading to a decrease of Q(n), as shown in Fig. 3. [28] Being the dynamics Zipfian, we can that expect US cities to follow also Heaps' law. Indeed the lower cutoff of the distribution, which coincides with the size of the smallest urban Q(n) grows linearly in n for the random sampling. Remarkably this linear growth is observed also considering world cities. Being the lower cutoff of such system fixed (and equal to 300, 000 inhabitants), this suggest that the largest urban centers are reaching the intrinsic upper cutoff of the distribution.
settlement, remained almost constant during the development of US urban system, being administratively fixed at 2500 inhabitants. The applicability of Heaps' law to urban structures has been widely ignored and only recently this point has been considered [29], even if not from a dynamical point of view. More precisely previous works [29] study the functional form of N (n) by plotting the number of cities as function of the population for many countries at present days. This yield a static and aggregate picture, which lacks in providing any information about the urban development of a given nation. Differently, here we focus on the dynamics of a single country, namely the US, following its urban development over time. We reported in panel (a) of Fig. 6 the growth of the number of US cities N (n) as function of the urban population n, also a fit to Heaps law is drawn. The adherence to this scaling relation is strict and the exponent β of Heaps' scaling satisfies β = 0.8768 (0.8459, 0.9077) This sub-linear growth is what one would expect being the size of the system finite and the average (over the period 1790-1900) Zipf's exponent γ ≈ 0.86 [19]. The fact that also urban systems evolve according to Heaps' law has never been pointed out, but it is a very natural consequence of the dynamical framework we developed.

World cities are not Zipfian
Now we consider an aspect of Zipf's law that, despite its relevance, has been only partially discussed [3]: is the union of two Zipfian set still Zipfian? In this section we prove that Zipfian dynamics is not an additive property. Urban systems constitute a perfect framework to apply our framework to fully understand this phenomenon. Zipf's law is observed for almost any country [30] and therefore, also guided by our findings regarding the US, one could expect that also the system formed by all cities in the world is Zipfian. For each individual country k coherence is respected, consequently we have d log where the apex (k) indicates that all these quantities are referred to the kth nation. Denoting by M the number of countries, the system formed by world cities is obtained summing over the M national urban system, this implies M . Note that this cutoff will not be, in general, the one growing faster. We finally assume that the Zipf's exponent is approximately the same for all countries i.e. γ (k) ≈ γ for any k. This approximation is well supported by empirical studies [30]. We can now write the coherence condition for the world system: the right side of Eq. (5) is Dividing and multiplying by N (k) and introducing the frequencies x (k) = N (k) N we rewrite this expression as  Fig. 6, where we plotted also the trend which results from a random sampling with fixed cutoffs. Note that Q and n have been rescaled for better comparing the two systems and moreover we used only values Q 1 for avoiding the effect of noise on small Q. In the case of the random sampling, Q(n) is expected to be a linear function of n, indeed, recalling Eqs. (3) and the results of Subsec. II C regarding spurious Zipfian systems, it holds Being the cutoffs and the Zipf's exponent fixed, Q grows linearly with n, as also confirmed by panel (b) of Fig. 6. Remarkably this trend is observed also considering world cities, as shown in the same figure. Being s m fixed (and equal to 300, 000), this suggest that also in this case s M is fixed, meaning that the largest cities in the world are getting closer and closer to an upper limit of population. This is consistent with many studies asserting that large urban centers are not efficient due to, for instance, traffic jams, pollution, complexity of water management or vulnerability to natural hazards [32][33][34]. Exploiting Eq. (11) we can then give an estimate of the upper cutoff of urban population, it results s M ≈ 41 · 10 6 inhabitants.
This number has to be compared with the population of Tokyo metro area, that, with a population of ≈ 37 · 10 6 inhabitants, is the largest urban settlement in the world. Clearly, in contrast with earthquakes, the upper cutoff of world cities may vary thanks to, for instance, technological innovations, as happened with the development of skyscrapers in the past century. As a consequence, the value we computed should be considered as a limit of population which is expected not to be overcome in the next few years. In this sense our estimate, obtained only by statistical arguments, is in good agreement with projections [35], according to which this population limit will substantially hold up to 2050.
C. Language

The dynamics of language
Natural language is the first and most prominent application of Zipf's law. We can confirm that this system is Zipfian by looking the evolution of Q(n) Fig. 3 and the trajectory in the Zipf's plane Fig. 2, both referred to the words occurrences in the Moby Dick novel. Here the objects are the different words, the sizes S are their numbers of occurrence, and "time" n is the progressively increasing fraction of words of the novel we consider. In this case, occurrences of different words are not independent events, being them constrained by grammar and semantic rules, which makes the upper limit of the number of occurrences grow faster than the product N γ · s m . We recall that N is the number of different words observed. For instance, in order to increase N , new meaningful sentences, semantically coordinated with the previous text, are to be composed. However these sentences must contain, on average, many occurrences of the most frequent word, which is the article "the". This makes s M grow faster than N γ · s m and thus a coherent, Zipfian dynamics emerge.
In order to prove that Zipfian dynamics emerges as a consequence of the effect of correlations induced by grammar and semantic rules, we consider two different systems in which such rules are adopted to a different extent. In panel (a) of Fig. 7 we show the rank-size plot of the most common words used by children and adults. In particular we used CHILDES database [36] and we analyzed with CLAN program all the American English corpora available to obtain two rank-size lists, one referred to children below six and the other to adults. It is evident that childish language is characterized by considerable deviations from Zipf's law and so by a larger value of Q, in particular performing two fits we obtained Q children ≈ 3.6 and Q adults ≈ 0.80. Language, seen as a system which evolves during its learning, is then characterized by a Zipfian dynamics.

Zipfian dynamics increases language efficiency
A well know argument explaining the onset of Zipf's law in language is due to Mandelbrot [15,37] and it is based on information theory. Let us consider a dictionary of N words, each of them will be characterized by a frequency of occurrence which can be regarded as its normalized size. If we want to efficiently store and transmit sentences the best thing to do is to associate low coding numbers to the most common words. Denoting by C(k) the coding of the Deviations from Zipf's law are much more prominent in the former, a consequence of the imperfect adoption of grammar rules at the early age. This suggests that the dynamics of language is Zipfian. (b) Average cost per information of simulated languages as function of the deviation parameter Q. As it is possible to see, at fixed Zipf's exponent γ, lower is Q, larger is the efficiency of the language (inverse of A). The Zipfian dynamics of language can then be explained in terms of an optimization process.
kth word by frequency the most efficient choice is The average information per word is given by the entropy of the language H, defined as where f (k) is the frequency of the kth most common word. As pointed out by Mandelbrot, an efficient language should maximize the average information, while lowering the average cost C, whose expression is In other words, the quantity A = C/H must be as small as possible. By minimizing A one obtains that f (k) follows Zipf's law [15,37], but the Zipf's exponent γ depends on the vocabulary size N and diverges for large N [38]. This drawback seriously compromise the argument as originally formulated, however Mandelbrot's idea can still be used for understanding Zipfian dynamics in language. Indeed, a large Q implies that the empirical distribution f (k) is curved at low ranks, making the most common words being equally frequent. This is expected to lower the efficiency of the language, as follows from the expression of the cost C.
We have numerically proved this by studying the value of A as function of Q at fixed Zipf's exponent. In particular, we extracted N frequencies using different values of the upper cutoff s M and we repeated the process for various Zipf's coefficients γ. Results are reported in panel (b) of Fig. 7. It clearly emerges that, ceteris paribus, the lower is Q the lower is A. As a consequence the presence of a Zipfian dynamics in natural language has a very natural explanation in the context of linguistic and information theory, because an evolution towards low values of Q increases the efficiency of the language in terms of the ratio between information and cost. In other words, even if Mandelbrot's argument does not explain why a finite Zipf's exponent is observed, it allows to understand why, if Zipf's exponent is hold fixed by some other mechanism, the language evolves with a Zipfian dynamics.

D. Multiplicative process
Multiplicative processes are a well known example of models capable of explaining the onset of power laws [10]; in particular, they have been used to model the evolution of incomes and stock prices [9,37]. Given the variable S t , representing for instance the price of a stock at time t, its evolution is defined as where r t is a random variable drawn at t from a given PDF, usually a normal or a uniform distribution. This stochastic dynamics is equivalent to a random walk in logarithmic space and it is easy to show that, under mild assumptions, the variable S is asymptotically log-normally distributed [39]. If however we constrain S t to be larger than a lower cutoff s m (that is, S t performs a random walk with a reflecting barrier in s m ), then the limiting distribution of S becomes a power law [10]. Making the continuum approximation the process can be described in terms of a Fokker-Planck equation with drift ν and diffusion coefficient D.
In particular it holds [40] ν = log r D = (log r) 2 − log r 2 and the limiting distribution of S is a power law of the form In the transient regime, however, the power law distribution above presents an upper cutoff s M given by [40] s In other words, the probabilistic space enlarges exponentially fast with time. Now we show that this leads to a Zipfian dynamics. Let us consider an ensemble of N objects, which could be, for instance, a set of stock prices, evolving according to the multiplicative process with lower cutoff described above. Suppose that all the prices are initially equal to s m , Recalling the expression of Q Eq. (3) and using Eq. (13), we obtain that the deviation parameter Q evolves in time according to So Q exponentially decreases towards zero and the system shows Zipfian dynamics. The coherence condition Eq. (5) can be written as Using the example of stock prices, here n would be the total value of the stocks considered, being the sum of the N prices. N is fixed, so we rewrite this condition as The first derivative, by virtue of Eq. (13), is positive, and the second is non negative, because all the prices are initially set equal to the lower cutoff s m . We conclude that the multiplicative process is an example of Zipfian dynamics, as also shown by a typical trajectory in the Zipf plane Fig. 2 and the decrease of Q(n) in numerical simulations Fig. 3. Also, we note that even if the dynamics is Zipfian, the systems can not show Heaps' law because N is constant. This could look like a contradiction, because we stated that Heaps' law asymptotically holds whenever the dynamics is Zipfian and the lower cutoff is fixed. A remark clarifies this point, after an initial transient, n fluctuates around a constant value, as shown in Fig. 3, because the ensemble reaches a stationary Zipfian distribution, whose parameters can be derived using Eq (3) As a consequence, being N , s m and γ fixed, there is no growth of n [41] and then Heaps' law can not be defined at all.

E. Yule-Simon model
Yule-Simon process [16,17], based on the concept of preferential attachment, is one of the most famous examples of power laws generating model. We shortly illustrate the process in the contest of urban systems. Consider a set of N 0 initial urban centers, each with unitary population. At each time n: • with probability 1−p a unit of population is added to a random urban center j, selected with a probability proportional to its population S n (j); • with probability p a new urban settlement, with unitary population, is added to the system.
It has been proven that these cities are asymptotically power law distributed, more precisely Now we prove that the Yule-Simon model shows coherence. We start by noting that Eq.(10) implies that if Q e is decreasing also Q decreases. This consideration implies that coherence is found if it holds In the context of urban settlements, S n (1) is the population of the largest city, N is the number of different urban settlements, and n is the total population. Moreover, the lower cutoff s m is equal to one (cities with unitary population are injected into the system at a constant rate). The population of the largest city, S n (1), evolves according to while the growth of N satisfies N (n + 1) = N (n) + p → dN dn = p, N (n) = N 0 + pn.
Plugging these expressions into Eq. (15), and recalling Eq. (14), we get That is always satisfied, since N 0 > 0 and p > 0. We have thus proved that also Yule-Simon model satisfies the coherence condition and, therefore, that it shows Zipfian dynamics. This is also visually confirmed by the corresponding trajectory in the Zipf's plane Fig. 2, which corresponds to a Yule-Simon process with N 0 = 100 and p = 0.5, and by the evolution of Q reported in Fig. 3. Finally, it is interesting to note that such trajectory is very similar to the one performed by world countries, where the size of a country is given by its GDPppp. The rich get richer mechanism seems therefore more appropriate than a multiplicative process for describing the evolution of the world system.

IV. CONCLUSIONS AND DISCUSSION
Zipf's law is a scaling relation present in the rank-size plots of many different natural and socio-economic systems. Despite its ubiquity and the numerous empirical and theoretical investigations, a deep understanding and a unified framework of analysis is still lacking. In this work, we start from the empirical observation that the deviation parameter Q of a given system changes with time, or even considering different subsets of the same database. The importance of this parameter is enhanced by its connection to the first ranks, i.e. the largest objects, the larger is Q, the more the first ranks deviate from Zipf's law. A probabilistic argument permits us to express the deviation Q as a function of the intrinsic natural cut-offs of the underlying power law PDF (s m and s M ), its exponent α, and the number of elements in the system N , Eq. (3): This relation permits to understand why a dynamical approach is crucial: in general, for real systems N , s m , and s M vary during evolution and, as a consequence, deviations from Zipf's law can increase, as in the case of earthquakes, or decrease, as happens with US cities. For this reason, we introduce the concept of Zipfian dynamics, which drives the system to exact Zipf's law, and Zipf 's plane, a visual tool which allows studying deviations dynamically.
In particular, we demonstrate that Zipfian dynamics can not be produced by a simple truncated power law PDF of sizes. Indeed, it is connected to the presence of mechanisms which make the level of sampling and the parameters of the PDF evolve in a peculiar way, such that Q decreases during evolution. More precisely, we find a dynamical constraint, that we name coherence, relating the growth of the probabilistic space s M /s m to the enlargement of the physical space N , meant as the number of elements composing the system, Eq. (5); we call a system or a model Zipfian if This expression implies that Zipfian systems, i.e. those that satisfy the above inequality, are attracted toward Zipf's law and evolve out of equilibrium, never fully sampling their probabilistic space. Moreover, by generalizing the treatment of Lu et al [19], we demonstrate that Heaps' law is a particular case of Zipfian dynamics.
Conversely some systems, such as earthquakes, show Zipf's only temporarily. We call this effect, a consequence of a possibly accidental under-sampling, spurious Zipf's law. In this case, the growth of N is fast enough to ensure a sampling of the large events, this allowing to estimate the upper cutoff of the PDF. For instance, we determine the maximal possible magnitude of an earthquake occurring in Italy and we show that, on the contrary, the largest tsunami database does not provide a sufficient statistic for inferring the upper cutoff of the distribution. This technique can be easily generalized to other natural or manprovoked hazards, such as hurricanes or terrorist attacks, both claimed to be power-law distributed [42,43]. We stress that a spurious Zipf 's law can be identified only by performing a dynamical analysis, and this opens questions about the effective ubiquity of Zipf's scaling. Indeed, many systems where Zipf's law is claimed to be found, such as world cities and earthquakes, are actually evolving towards a high Q configuration, so departing from Zipf's law.
Then we studied a number of concrete applications of our quantitative framework of analysis: • we show that earthquakes, being essentially independent (in the sense specified before) and characterized by a fixed upper limit, can evolve only incoherently and show Zipf's law spuriously. As aforementioned we used this property for computing the maximal magnitude of an earthquake occurring in Italy; • natural language dynamics is intrinsically Zipfian thanks to the inherent cohrence provided by the grammar rules. Zipfian dynamics, moreover, increases the efficiency of the language and can be directly related to the renowned optimization argument proposed by Mandelbrot; • US metropolitan areas evolved Zipfianly from the Declaration of Independence and the number of different US cities grew according to Heaps's law; • we analitically show that Zipfian dynamics is not additive, confirming this finding also empirically by comparing the evolution of US versus world cities. Moreover, the dynamics of world cities suggests that the largest urban settlements are getting closer to an intrinsic upper limit of population, that we also estimated.
Our framework can be directly applied also to theoretical generative models, in particular we considered multiplicative processes and the Yule-Simon model. We find  that both processes are characterized by Zipfian dynamics, but only the latter presents an evolution which, albeit qualitatively, reflects the one of the real systems we considered. In particular, using the Zipf's plane, we show that the Yule-Simon process reproduces the trajectory followed by world countries, suggesting the presence of a rich-getricher mechanism. This dynamical approach thus allows to understand not only if a model reproduces the emergence of Zipf's scaling, but also if it is suitable for describing the evolution of systems toward the Zipf's regime.
Recently, Corominas-Murtra et al. [13] introduced sample-space-reducing processes, showing that they produce Zipf's law. We note that the kind of random walks with space restriction introduced in [13] can be seen as a very particular case in which the space that the walker can visit at each times reduces coherently with the previous dynamics. However we stress the fact that the mechanism behind the generation of genuine Zipfian dynamics is from one hand much more general and and from the other much more interesting when the space limits enlarge coherently with the dynamics, as in the case for instance of natural language. Indeed only in this case Zipf law emerges as stable self-averaging dynamical feature of the system. For instance the recently proposed Urn Model with Triggering [11], based on the concept of adjacent possible [12], goes in this direction. In table I we summarize if the various systems and models we analyzed are Zipfian or not, and the the main specific findings our framework provides. Clearly, our approach can be applied to all systems that are claimed to follow Zipf's law.
In short, the main points of our work are the following: 1. Zipf's law should be studied during its evolution: this is the only way to recognize if the system or, better, its dynamics, is truly Zipfian or not; 2. Zipf's law may appear only temporary and, so, spuriously: in particular, any power-law distributed system can show Zipf's law the underlying PDF is under-sampled; 3. if a system shows a spurious Zipf's, then one can estimate the upper cutoff of the generating distribution; 4. if a system is truly Zipfian, then is inherently out of equilibrium, since its the probabilistic space enlarge faster than the physical one; 5. systems showing Heaps' law are a subset of those characterized by a Zipfian dynamics; 6. studying the dynamics allows to determine if a generative model is capable of explaining not only the emergence of Zipf's law, but also the dynamical evolution of systems toward this scaling regime.
Finally we stress that our study opens also a series of questions which should be deeply investigated. Firstly, most of the analysis concerning Zipf's scaling are performed statically, by checking the straightness of the ranksize plot at a given time. As a consequence the list of systems showing genuine Zipf's law may be drastically smaller than usually claimed. Focusing only on genuine Zipfian systems may allow to find a universal generating mechanism for Zipf's law; clearly such a goal is not achievable without the exclusion of spurious Zipfian systems. For example the Zipfian distribution of Lunar craters, a never explained manifestation of this scaling law, could be addressed as the result of an accidental under-sampling provoked by the low rate of asteroids collisions and so as a spurious manifestation of Zipf's law. The possibility of analyzing systems from a never considered dynamical perspective is another novel aspect made available by our framework. For instance we noticed that while some systems, such as natural language or metro areas, present a regular evolution of Q, other systems, such as that formed by world conflicts, show a dynamics characterized by sudden decreases of Q. This behavior could be explained in terms of a jump of the upper cutoff, as follows from Eq. (3), and therefore our framework can also be used for gathering novel insight on that phenomena usually defined Black Swans. These and other topics will be the object of future studies.
It is possible to express the rank-size relation as a function of the PDF parameters using the fact that given the PDF P (S) of a continuous variable S, the values of its Cumulative Distribution Function (CDF) C(S), associated to the different values of S, are approximately equiprobable. In fact if P (s) is the PDF of the variable S defined in the interval [s m , s M ], then C(S) = S sm ds P (s ). By performing the change of variables from S to C = C(S), and calling f (C) its PDF, we get by definition of PDF and CDF f (C) = dS(C) dC P (S)| S=S(C) = 1 for 0 ≤ C ≤ 1. This implies that, given N values of S independently extracted from P (S), with good approximation they can be taken as uniformly spaced in the corresponding variable C. Thus, the k th size ranked value S(k) approximately corresponds to the CDF value N +1−k N +1 . In formulas A similar computation has been performed by Lu et al. [19] in order to study the relation between Heaps and Zipf's laws. By comparing Eqs. (4) and (1) we can derive the following expressions that relate the number of values/objects and the parameters of the PDF P (S) on one side, and the Zipf-Mandelbrot parameters on the other, that is Eq.( (3)) of the Results section.

B. Databases used
All the databases we used are freely accessible on the web. In the following we shortly describe them.
• Earthquakes For our analysis of the Italian earthquakes we used the INGV Parametric Catalogue of Italian Earthquakes, which "provides homogeneous macroseismic and instrumental data and parameters for Italian earthquakes with maximum intensity ≥ 5 or magnitude geq4.0 in the period 1000-2017" [22]. For what concerns Californian ones we used the California Department of conservation dataset. It ranges from 1769 to 2000 and is an extension of Petersen et al. catalog [44].
• Tsunami The study of world tsunami has been performed using NOAA NCEI/WDS Global Historical Tsunami Database, 2100 BC to Present [27].
• US metropolitan areas Our study of US metro areas is based on the work of Schroeder [23]. The database, which contains historical estimates of US metro areas and counties population, can be accessed here. The number of different US cities and the urban population can be found in [45].
• World cities The historical population of world cities comes from the World Urbanization Prospects 2018 of the Department of Economic and Social Affairs (UN) [31]. In particular we used the "Annual Population of Urban Agglomerations with 300,000 Inhabitants or More in 2018, by country, 1950-2035" (ignoring 2021-2035 data).
• Language In order to study the evolution of language we used CHILDES database [36]. In particular we analyzed with CLAN program all the American English corpora available to obtain two rank-size list, one referred to children below six and the other to adults.
• GDP PPP of countries Maddison database [46], available here, provides GDP PPP of countries from 1 AD to 2008. We integrated it with IMF data to obtain a database which ranges from 1900 to 2019.
• Volcanic eruptions The casualties provoked by volcanic eruptions have been collected from NOAA historical dataset, which ranges from 4300 BC to present.

C. Fitting procedure
If an high level of precision is needed, the fitting of power law distributions is a particularly difficult procedure which has been studied extensively [47,48]. In particular, it has been shown that using least squares techniques give biased estimates of the slope of the PDF. However, in our work we are not interested in obtaining a precise estimate of the parameters of the PDF, but rather in checking the trend they follow. We then adopted a standard non linear least squares fitting procedure, whose accuracy, when applied to the rank-size plot or to the complementary cumulative distribution, is comparable to maximum likelihood estimates [49]. In particular, we used Eq. (4)  The use of a more sophisticated technique would probably remove some noise from the trajectories, but in our opinion the trend is clear also with the procedure we followed. The presence or absence of deviations at first ranks can be easily checked at glance and is definitely less problematic than the computation of an unbiased estimator of the slope.