Zipf’s law, the central limit theorem, and the random division of the unit interval

Richard Perline

doi:10.1103/PhysRevE.54.220

Abstract

It is shown that a version of Mandelbrot’s monkey-at-the-typewriter model of Zipf’s inverse power law is directly related to two classical areas in probability theory: the central limit theorem and the “broken stick” problem, i.e., the random division of the unit interval. The connection to the central limit theorem is proved using a theorem on randomly indexed sums of random variables [A. Gut, Stopped Random Walks: Limit Theorems and Applications (Springer, New York, 1987)]. This reveals an underlying log-normal structure of pseudoword probabilities with an inverse power upper tail that clarifies a point of confusion in Mandelbrot’s work. An explicit asymptotic formula for the slope of the log-linear rank-size law in the upper tail of this distribution is also obtained. This formula relates to known asymptotic results concerning the random division of the unit interval that imply a slope value approaching -1 under quite general conditions. The role of size-biased sampling in obscuring the bottom part of the distribution is explained and connections to related work are noted. © 1996 The American Physical Society.
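
The rank-size behavior described in the abstract can be explored numerically. The following is a minimal sketch, not the paper's method. It assumes a typewriter whose letters are struck independently with unequal probabilities and whose space bar ends each pseudoword. All function names and example probabilities are hypothetical. The equation solved in `heuristic_slope` is a standard renewal-type heuristic for this model (find alpha with sum(q_i ** alpha) = 1, then take -1/alpha) and is not claimed to be the explicit formula obtained in the paper.

```python
import math
from itertools import product


def pseudoword_probs(letter_probs, max_len):
    """Exact probabilities of pseudowords of length 1..max_len, sorted descending.

    A pseudoword is a run of letters ended by a space, so its probability is
    (product of its letter probabilities) * (space probability).
    """
    space_prob = 1.0 - sum(letter_probs)
    probs = []
    for n in range(1, max_len + 1):
        for combo in product(letter_probs, repeat=n):
            probs.append(space_prob * math.prod(combo))
    # Every word longer than max_len has probability <= cutoff, so the ranking
    # is exact only for words strictly above it; the rest are dropped.
    cutoff = space_prob * max(letter_probs) ** (max_len + 1)
    return sorted((p for p in probs if p > cutoff), reverse=True)


def loglog_slope(sorted_probs):
    """Least-squares slope of log(probability) against log(rank)."""
    xs = [math.log(r) for r in range(1, len(sorted_probs) + 1)]
    ys = [math.log(p) for p in sorted_probs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def heuristic_slope(letter_probs, tol=1e-12):
    """Return -1/alpha, where alpha in (0, 1) solves sum(q_i ** alpha) == 1.

    Assumes at least two letters and letter probabilities summing to < 1.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        # The sum decreases in alpha, so a value above 1 means alpha is too small.
        if sum(q ** mid for q in letter_probs) > 1.0:
            lo = mid
        else:
            hi = mid
    return -1.0 / hi


if __name__ == "__main__":
    q = [0.30, 0.25, 0.15, 0.10]  # hypothetical letter probabilities; space = 0.20
    ranked = pseudoword_probs(q, max_len=8)
    print("fitted slope:   ", loglog_slope(ranked))
    print("heuristic slope:", heuristic_slope(q))
```

For equally likely letters the heuristic reduces to the closed form -1 + log(1 - s)/log(K), with K letters and space probability s, which moves toward -1 as K grows. The fitted slope from a truncated enumeration is only a rough estimate and should be read with that in mind.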

Received 30 August 1995


Physical Review E
