Zipf-Mandelbrot law

Zipf-Mandelbrot
Parameters: N \in \{1,2,3,\ldots\} (integer), q \in [0,\infty) (real), s > 0 (real)
Support: k \in \{1,2,\ldots,N\}
pmf: \frac{1/(k+q)^s}{H_{N,q,s}}
cdf: \frac{H_{k,q,s}}{H_{N,q,s}}
Mean: \frac{H_{N,q,s-1}}{H_{N,q,s}}-q
Median: N/A
Mode: 1

In probability theory and statistics, the Zipf-Mandelbrot law, also known as the Pareto-Zipf law, is a discrete probability distribution. It is a power-law distribution on ranked data, named after the Harvard linguistics professor George Kingsley Zipf (1902-1950), who proposed a simpler distribution called Zipf's law, and the mathematician Benoit Mandelbrot (1924-2010), who subsequently generalized it.

The probability mass function, for rank k = 1, 2, ..., N, is given by:

f_k(N,q,s)=\frac{1/(k+q)^s}{H_{N,q,s}}

where H_{N,q,s} is given by:

H_{N,q,s}=\sum_{i=1}^N \frac{1}{(i+q)^s}

which may be thought of as a generalization of a harmonic number. In the limit as N approaches infinity (with s > 1, so that the sum converges), this becomes the Hurwitz zeta function ζ(s, q+1). For finite N and q = 0, the Zipf-Mandelbrot law becomes Zipf's law. For infinite N and q = 0, it becomes a zeta distribution.
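
The definitions above translate directly into code. The following is a minimal Python sketch, not a reference implementation: the function names and the parameter values N = 10, q = 2.7, s = 1.1 are illustrative assumptions, and ranks are taken to run from 1 to N, matching the lower limit of the sum in H_{N,q,s}. It checks numerically that the probabilities sum to one and that the mean computed directly agrees with the closed form H_{N,q,s-1}/H_{N,q,s} - q.

```python
from math import isclose


def generalized_harmonic(n: int, q: float, s: float) -> float:
    """H_{n,q,s} = sum_{i=1}^{n} 1 / (i + q)^s (an empty sum, 0.0, for n = 0)."""
    return sum(1.0 / (i + q) ** s for i in range(1, n + 1))


def zipf_mandelbrot_pmf(k: int, N: int, q: float, s: float) -> float:
    """Probability of rank k; zero outside the assumed support 1..N."""
    if not 1 <= k <= N:
        return 0.0
    return (1.0 / (k + q) ** s) / generalized_harmonic(N, q, s)


def zipf_mandelbrot_cdf(k: int, N: int, q: float, s: float) -> float:
    """P(rank <= k) = H_{k,q,s} / H_{N,q,s}, with k clamped to 0..N."""
    k = max(0, min(k, N))
    return generalized_harmonic(k, q, s) / generalized_harmonic(N, q, s)


# Illustrative parameter values (assumptions, not taken from the article).
N, q, s = 10, 2.7, 1.1

probs = [zipf_mandelbrot_pmf(k, N, q, s) for k in range(1, N + 1)]
total = sum(probs)

# Mean computed directly from the pmf, and from the closed form.
mean_direct = sum(k * p for k, p in zip(range(1, N + 1), probs))
mean_closed = generalized_harmonic(N, q, s - 1) / generalized_harmonic(N, q, s) - q

print(isclose(total, 1.0), isclose(mean_direct, mean_closed))
```

Setting q = 0 in these functions gives Zipf's law for finite N. Each call recomputes the normalizing sum, which keeps the sketch short; for large N one would likely compute H_{N,q,s} once and reuse it.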

Applications

The frequencies of words in a corpus of text, ranked from most to least common, generally follow a power-law distribution known as Zipf's law.

If one plots the frequency rank of the words in a large corpus of text against their number of occurrences, one obtains a power-law distribution with an exponent close to one (but see Gelbukh and Sidorov 2001).
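
One rough way to read an exponent off such a rank-frequency plot is to fit a straight line to log(count) against log(rank) and negate the slope. The Python sketch below does this with ordinary least squares. The function name and the synthetic counts are assumptions made for illustration, and least squares on log-log data is only a crude estimator, not the method used in the cited work.

```python
from math import log


def fit_rank_frequency_exponent(counts):
    """Negated least-squares slope of log(count) against log(rank)."""
    # Rank the positive counts from most to least frequent.
    ordered = sorted((c for c in counts if c > 0), reverse=True)
    if len(ordered) < 2:
        raise ValueError("need at least two positive counts")

    xs = [log(rank) for rank in range(1, len(ordered) + 1)]
    ys = [log(c) for c in ordered]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Ordinary least-squares slope: covariance over variance.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx


# Synthetic counts following count = 1000 / rank exactly (an assumption
# for the demonstration, not real corpus data).
synthetic_counts = [1000.0 / rank for rank in range(1, 51)]
exponent = fit_rank_frequency_exponent(synthetic_counts)
print(exponent)  # very close to 1 for this synthetic input
```

With real word counts the points rarely lie on a perfect line, particularly at the highest and lowest ranks. The offset q in the Zipf-Mandelbrot law is one way to accommodate the flattening at the top of the ranking.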
