Application of distance between probability measures


Let $P\sim Q$ be two equivalent probability measures. There seem to exist different notions of how to define a difference between the two probability measures/distributions. For example,

  • Total variation: $$\delta(P,Q)=\sup_{A} |P(A)-Q(A)|$$
  • Kullback–Leibler divergence: $$D_{KL}(P,Q)=\int_\mathbb{R}p(x)\ln\left(\frac{p(x)}{q(x)}\right)\mathrm{d}x$$
  • Hellinger distance: $$H^2(P,Q)=\int_\mathbb{R}\left(\sqrt{p(x)}-\sqrt{q(x)}\right)^2\mathrm{d}x$$
  • Bhattacharyya distance: $$B(P,Q)=-\ln\left(\int_\mathbb{R}\sqrt{p(x)q(x)}\mathrm{d}x\right)$$
  • Jensen–Shannon divergence: $$JSD(P,Q)=\frac{1}{2}D_{KL}\left(P,\frac{P+Q}{2}\right)+\frac{1}{2}D_{KL}\left(Q,\frac{P+Q}{2}\right)$$
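As a quick numerical sketch, here is how these quantities look for two discrete distributions (the function names and the example vectors are my own; for discrete distributions the total variation distance equals half the $\ell_1$ distance, and the Hellinger distance is computed exactly as defined above, without an extra $\tfrac12$ factor):

```python
import numpy as np

def kl(p, q):
    # KL divergence for discrete distributions; assumes q > 0 wherever p > 0
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def distances(p, q):
    m = (p + q) / 2  # mixture used by the Jensen-Shannon divergence
    return {
        "total_variation": 0.5 * float(np.abs(p - q).sum()),
        "kl": kl(p, q),
        "hellinger_sq": float(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)),
        "bhattacharyya": -float(np.log(np.sum(np.sqrt(p * q)))),
        "jsd": 0.5 * kl(p, m) + 0.5 * kl(q, m),
    }

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])
print(distances(p, q))
```

Swapping `p` and `q` changes the KL value but leaves the total variation, Hellinger, Bhattacharyya, and Jensen–Shannon values unchanged, which already hints at the symmetry discussion below.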

I've got two questions.

  1. What is the intuitive meaning? Is it as simple as: if the distance between $P$ and $Q$ is big, then an unlikely event under $P$ may be very likely under $Q$ and vice versa?
  2. Does any of these differences tell me anything about how $E^P[X]$ differs from $E^Q[X]$ for a measurable random variable $X$? What about higher moments of $X$?
Accepted answer:

This answer is not complete, just a piece of useful intuition.

I can speak for the KL divergence, using intuition from a related quantity.

First, note that the KL divergence is not a metric: in general $D_{KL}(P,Q) \neq D_{KL}(Q,P)$. Hence this measure of "distance" does not match our usual intuition of a metric.

To see what it is good for, suppose that $(X,Y)\sim P_{XY}$. If we choose $P=P_{XY}$ and $Q=P_{X}P_{Y}$, then $$D_{KL}(P,Q)=\mathbb{E}\left[\log\frac{P_{XY}}{P_XP_Y}\right].$$ When is $D_{KL}(P,Q)=0$?

This happens exactly when $P_{XY}=P_XP_Y$, in other words, when $X$ and $Y$ are independent random variables. Hence, in this case, $D_{KL}$ measures how far $X$ and $Y$ are from being independent.

(If you are familiar with Information Theory, $D_{KL}(P_{XY},P_XP_Y)=I(X;Y)$ is known as the Mutual Information between $X$ and $Y$.)
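To make this concrete, here is a small numerical sketch (the joint table is a made-up example of my own): the KL divergence between a joint distribution and the product of its marginals is positive exactly when the variables are dependent.

```python
import numpy as np

# Hypothetical joint pmf of (X, Y) on a 2x2 grid
P_XY = np.array([[0.30, 0.20],
                 [0.10, 0.40]])
P_X = P_XY.sum(axis=1)   # marginal of X
P_Y = P_XY.sum(axis=0)   # marginal of Y
Q = np.outer(P_X, P_Y)   # product of marginals

# D_KL(P_XY, P_X P_Y) = I(X;Y), the mutual information
I = float(np.sum(P_XY * np.log(P_XY / Q)))
print(I)  # positive here, so X and Y are dependent

# If the joint *is* the product of its marginals, the divergence is 0
I_indep = float(np.sum(Q * np.log(Q / Q)))
print(I_indep)  # 0.0
```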

The Jensen–Shannon divergence is an extension of the KL divergence that is symmetric in its arguments.
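A short sketch of this symmetry (the example distributions are my own): the KL divergence changes when its arguments are swapped, while the Jensen–Shannon divergence, built from KL divergences against the mixture $(P+Q)/2$, does not.

```python
import numpy as np

def kl(p, q):
    # KL divergence for discrete distributions; assumes q > 0 wherever p > 0
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def jsd(p, q):
    # Jensen-Shannon divergence: symmetrized KL against the mixture (p+q)/2
    m = (p + q) / 2
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p = np.array([0.9, 0.1])
q = np.array([0.5, 0.5])
print(kl(p, q), kl(q, p))    # the two KL values differ
print(jsd(p, q), jsd(q, p))  # the two JSD values coincide
```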