How far is a distribution from the uniform distribution?


I have two discrete probability distributions $P$ and $Q$, where $P=(p_1,...,p_n)$ and $Q=(q_1,...,q_n)$; in addition I have the uniform distribution $U=(\frac{1}{n},...,\frac{1}{n})$.

The question is how to measure which of $P$ or $Q$ is closer to the uniform distribution.

I am not sure whether I can use the Kullback–Leibler divergence, because it is not a "true" distance. I also don't know whether entropy can be used.


BEST ANSWER
  • Total variation distance, also known as statistical distance, is a good (very stringent) metric. Note that up to a factor of $2$, it equals the $\ell_1$ distance between the vectors of probabilities. It also has a nice interpretation: it is the maximum difference in probability the two distributions assign to any event.

  • $\ell_2$ distance will be much more forgiving of small differences and will put the emphasis on outliers.

  • Hellinger distance also has some nice properties and a natural interpretation, although it is perhaps less commonly used.

  • Kolmogorov distance (equivalently, the $\ell_\infty$ distance between CDFs) will make sense if your domain $\{1,\dots,n\}$ has a meaningful order on it.

All of these (and more, e.g. Wasserstein/earthmover distance) are valid choices; ultimately, it will depend on your application.
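As a concrete illustration, here is a minimal NumPy sketch of the four metrics above, applied to a hypothetical distribution $P$ and the uniform $U$ (the function names and example values are my own, not from the answer):

```python
import numpy as np

def tv_distance(p, q):
    # Total variation = half the l1 distance between probability vectors
    return 0.5 * np.sum(np.abs(p - q))

def l2_distance(p, q):
    # Euclidean (l2) distance between probability vectors
    return np.sqrt(np.sum((p - q) ** 2))

def hellinger_distance(p, q):
    # Hellinger distance: l2 distance between sqrt-vectors, scaled to [0, 1]
    return np.sqrt(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)) / np.sqrt(2)

def kolmogorov_distance(p, q):
    # l_inf distance between CDFs; assumes the domain {1,...,n} is ordered
    return np.max(np.abs(np.cumsum(p) - np.cumsum(q)))

n = 4
u = np.full(n, 1.0 / n)          # uniform distribution on n points
p = np.array([0.4, 0.3, 0.2, 0.1])

print(tv_distance(p, u))         # ~0.2
print(kolmogorov_distance(p, u)) # ~0.2 (they need not agree in general)
```

Note how the different metrics weight the same deviations differently: TV sums all the discrepancies, while Kolmogorov only looks at the worst cumulative gap.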

A good resource is "Distances and affinities between measures", Chapter 3 of *Asymptopia* by Pollard. "On choosing and bounding probability metrics" by Gibbs and Su is also a recommended read.


What prevents you from using the Kullback–Leibler divergence (KL divergence) as a measure of distance from the uniform distribution? I agree that the KL divergence is not a true "distance" because it satisfies neither (a) symmetry nor (b) the triangle inequality.

Nonetheless, it can serve as a criterion for measuring how far/close a distribution is to the uniform distribution. Suppose $\mathcal{X}=\{x_{1},\ldots,x_{n}\}$ is a finite alphabet, and $P=\{p_{1},\ldots,p_{n}\}$ and $U=\{1/n,\ldots,1/n\}$ are two distributions on $\mathcal{X}$, with $U$ being the uniform distribution. Then, the KL-divergence between $P$ and $U$, denoted as $D(P||U)$, is defined to be the following quantity:

\begin{align} D(P||U)&=\sum\limits_{i=1}^{n}P(x_{i})\log_{2}\left(\frac{P(x_{i})}{U(x_{i})}\right)\\ &=\sum\limits_{i=1}^{n}p_{i}\log_{2}\left(\frac{p_{i}}{1/n}\right)\\ &=\log_{2}\left(n\right)+\sum\limits_{i=1}^{n}p_{i}\log_{2}\left({p_{i}}\right)\\ &=\log_{2}(n)-H(P), \end{align} where $H(P)=\sum\limits_{i=1}^{n}p_{i}\log_{2}\left(\frac{1}{p_{i}}\right)$ is the (Shannon) entropy of the distribution $P$. Since $D(\cdot||\cdot)\geq 0$, we have $H(P)\leq\log_{2}(n)$, so the uniform distribution, whose entropy equals $\log_{2}(n)$ bits, is the most "random" distribution that can be assigned to the alphabet.
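The identity $D(P||U)=\log_{2}(n)-H(P)$ can be checked numerically. Below is a small sketch (function names are my own) that computes both sides for a hypothetical distribution; the convention $0\log 0=0$ is handled by skipping zero entries:

```python
import numpy as np

def kl_from_uniform(p):
    # D(P||U) in bits, with U uniform on n = len(p) points:
    # sum_i p_i * log2(p_i / (1/n)) = sum_i p_i * log2(n * p_i)
    p = np.asarray(p, dtype=float)
    nz = p > 0  # 0 * log(0) is treated as 0
    return np.sum(p[nz] * np.log2(len(p) * p[nz]))

def entropy(p):
    # Shannon entropy H(P) in bits
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log2(p[nz]))

p = np.array([0.5, 0.25, 0.125, 0.125])
n = len(p)

# Both sides of the identity D(P||U) = log2(n) - H(P)
print(kl_from_uniform(p))        # 0.25
print(np.log2(n) - entropy(p))   # 0.25
```

For this $P$, $H(P)=1.75$ bits and $\log_2(4)=2$ bits, so $D(P||U)=0.25$ bits, matching the direct computation.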

If there is another distribution $Q=\{q_{1},\ldots,q_{n}\}$ defined on $\mathcal{X}$, and if $D(P||U)<D(Q||U)$, then $H(P)>H(Q)$, and thus, $P$ is more "random" than $Q$ (which makes sense since $P$ is closer to the uniform distribution than $Q$).

Thus, the closer a distribution is to the uniform distribution (closer in the sense of KL divergence), the more "random" it is.
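The comparison $D(P||U)<D(Q||U)$ can be sketched directly; the example distributions below are my own, chosen so that $P$ is visibly flatter than $Q$:

```python
import numpy as np

def kl_from_uniform(p):
    # D(P||U) in bits for U uniform on len(p) points
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return np.sum(p[nz] * np.log2(len(p) * p[nz]))

# P is close to flat; Q is heavily concentrated on one symbol
p = np.array([0.3, 0.3, 0.2, 0.2])
q = np.array([0.7, 0.1, 0.1, 0.1])

# Smaller divergence from U means closer to uniform, i.e. "more random"
print(kl_from_uniform(p) < kl_from_uniform(q))  # True
```

Here $D(P||U)\approx 0.03$ bits while $D(Q||U)\approx 0.64$ bits, so the criterion correctly ranks $P$ as the closer (and higher-entropy) distribution.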