Zipf’s law is not a consequence of the central limit theorem

G. Troll and P. beim Graben
Phys. Rev. E 57, 1347 – Published 1 February 1998

Abstract

It has been observed that the rank statistics of string frequencies of many symbolic systems (e.g., word frequencies of natural languages) follow Zipf’s law to a good approximation. We show that, contrary to claims in the literature, Zipf’s law cannot be realized by the central limit theorem(s). The observation that a log-normal distribution of string frequencies yields approximately Zipf-like rank statistics is actually misleading. Indeed, Zipf’s law for the rank statistics is strictly equivalent to a power law distribution of frequencies. There are two natural ways to perform the infinite size limit for the vocabulary. The first one is the method of choice in the literature; it makes the upper word length bound tend to infinity and, in the case of a multistate Bernoulli process, leads via a central limit theorem to a log-normal frequency distribution. An alternative, which is actually easier to realize for text samples, is to make the lower frequency bound tend to zero. This limit procedure leads to a power law distribution and hence to Zipf’s law, at least for Bernoulli processes, and to a very good approximation for natural languages, where it passes the χ² test. For the Bernoulli case we will give a heuristic proof.
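
The second limit procedure can be illustrated numerically. The following is a minimal sketch, not the authors' method: it assumes a "multistate Bernoulli process" in which letters and a word separator are drawn independently with fixed probabilities, so that a word's probability is the separator probability times the product of its letter probabilities. The function names, the three-letter alphabet, and the bound `eps` are hypothetical choices made for this illustration. The sketch keeps every word whose probability is at least `eps`, ranks the words, and estimates the slope of log frequency against log rank; a roughly constant negative slope corresponds to Zipf-like rank statistics.

```python
import math


def words_above_bound(letter_probs, space_prob, eps):
    """Return the probabilities of all words with probability >= eps, sorted descending.

    Assumed model (not taken from the paper): a word made of letters i1..in
    has probability space_prob * letter_probs[i1] * ... * letter_probs[in].
    """
    probs = []
    # Start with all one-letter words and extend them depth-first.
    stack = [space_prob * p for p in letter_probs]
    while stack:
        q = stack.pop()
        if q < eps:
            # Every letter probability is < 1, so all extensions are smaller too.
            continue
        probs.append(q)
        stack.extend(q * p for p in letter_probs)
    return sorted(probs, reverse=True)


def loglog_slope(freqs):
    """Ordinary least squares slope of log(frequency) against log(rank)."""
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    # Hypothetical alphabet: three letters plus a separator, probabilities sum to 1.
    letters = [0.4, 0.3, 0.1]
    space = 0.2
    for eps in (1e-3, 1e-4, 1e-5):
        freqs = words_above_bound(letters, space, eps)
        print(eps, len(freqs), loglog_slope(freqs))
```

Lowering `eps` enlarges the vocabulary without imposing a word length cutoff, which mirrors the lower frequency bound limit described in the abstract. The least squares fit is only a rough diagnostic and is no substitute for the χ² test mentioned there.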

  • Received 23 April 1997

DOI:https://doi.org/10.1103/PhysRevE.57.1347

©1998 American Physical Society

Authors & Affiliations

G. Troll and P. beim Graben

  • Nichtlineare Dynamik, Universität Potsdam, D-14415 Potsdam, Germany

Issue

Vol. 57, Iss. 2 — February 1998
