How to calculate the inverse CDF for the Zipf distribution

1k Views Asked by At

How can I calculate the inverse CDF for Zipf distribution?

This distribution has

$$ \mathrm{pdf}(x) = \frac{1}{x^s \times H_{N,s}} $$ $$ \mathrm{cdf}(x) = \frac{H_{x,s}}{H_{N,s}} $$

where $s \gt 0$ is the scale parameter, $H$ is the Harmonic number $H_{N,s} = \sum_{i=1}^N \frac{1}{i^s}$

1

There are 1 best solutions below

3
On BEST ANSWER

The cdf is given by the formula you wrote only when $x \in \{1,\ldots,N\}$. If $x$ is, for example, $3.6$, then $\Pr(X \le x) = \Pr(X \le 3.6) = \Pr (X \le 3)$, so you have to put the greatest integer less than or equal to $x$ in place of $x$ in your formula.

Since this cdf has jump discontinuities, it doesn't have an inverse function in the usual sense. Maybe what you want is a function $G$ such that if $U$ is a uniformly distributed random variable on the interval $[0,1]$, then $G(U)$ is a random variable with the restricted Zipf distribution you're writing about. If the cdf $F$ is continuous, then $G = F^{-1}$ will serve in that role. But this one is not continuous, so instead you can do something like this:

If $0\le U < \frac{H_{1,s}}{H_{N,s}}$, then $G(U) = 1$

If $\frac{H_{1,s}}{H_{N,s}}\le U < \frac{H_{1,s}+H_{2,s}}{H_{N,s}}$ then $G(U)=2$

If $\frac{H_{1,s}+H_{2,s}}{H_{N,s}}\le U < \frac{H_{1,s}+H_{2,s}+H_{3,s}}{H_{N,s}}$ then $G(U)=3$

If $\frac{H_{1,s}+H_{2,s}+H_{3,s}}{H_{N,s}}\le U < \frac{H_{1,s}+H_{2,s}+H_{3,s}+H_{4,s}}{H_{N,s}}$ then $G(U)=4$

and so on (where $U$ is a uniformly distributed random varibale as above).