Understanding the relationship of the $L^1$ norm to the total variation distance of probability measures, and the variance bound on it


I am trying to find a bound on the variance of an arbitrary distribution $f_Y$, given a bound on the Kullback–Leibler divergence from a zero-mean Gaussian to $f_Y$, as I've explained in this related question. From page 10 of this article, it seems to me that:

$$\frac{1}{2}\left(\int_{-\infty}^{\infty}|p_Z(x)-p_Y(x)|dx\right)^2 \leq D(p_Z\|p_Y)$$
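As a quick sanity check (my own illustration, not from the article), the inequality can be verified numerically for two example Gaussian densities; the parameter choices below are arbitrary:

```python
import numpy as np

# Numerical sanity check of the inequality
#   (1/2) * (integral of |p_Z - p_Y| dx)^2  <=  D(p_Z || p_Y)
# using simple Riemann sums on a fine grid.

x = np.linspace(-20.0, 20.0, 200_001)
dx = x[1] - x[0]

def gaussian(x, mu, sigma):
    """Density of N(mu, sigma^2)."""
    return np.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

p = gaussian(x, 0.0, 1.0)  # p_Z: the zero-mean Gaussian
q = gaussian(x, 0.5, 1.5)  # p_Y: an arbitrary example density

l1 = np.sum(np.abs(p - q)) * dx       # L^1 distance: integral of |p - q| dx
kl = np.sum(p * np.log(p / q)) * dx   # D(p_Z || p_Y), natural logarithm

print(0.5 * l1 ** 2 <= kl)  # prints: True
```

(Note that this form of the bound assumes the divergence is taken with the natural logarithm; with base-2 logs the constant changes.)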

I have two questions:

1) How does this come about? The LHS is somehow related to the total variation distance, which is $\sup\left\{\left|\int_A p_Z(x)\,dx-\int_A p_Y(x)\,dx\right|:A \subset \mathbb{R}\right\}$ according to the Wikipedia article, but I don't see the connection. Can someone elucidate?

2) Section 6 on page 10 of the same article seems to talk about variation bounds, but I can't understand it. Can someone "translate" it into language that someone with a graduate-level course in probability can follow? (I haven't taken measure theory, unfortunately.)


Best answer:

1) This is Pinsker's inequality; check out Lemma 11.6.1 in Elements of Information Theory by Cover and Thomas.

2) The $L^1$ distance appearing on the LHS is twice the total variation distance between the probability measures $p_Z$ and $p_Y$. I think "variation bounds" quite literally means bounds on the total variation between the probability measures, as given in the Lemma on p. 11.
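To spell out the connection asked about in 1) (a standard argument, not specific to the article): take $A^{*}=\{x: p_Z(x)>p_Y(x)\}$. Since both densities integrate to one,

$$\int_{A^{*}}\bigl(p_Z-p_Y\bigr)\,dx=-\int_{(A^{*})^{c}}\bigl(p_Z-p_Y\bigr)\,dx=\frac{1}{2}\int_{-\infty}^{\infty}\bigl|p_Z(x)-p_Y(x)\bigr|\,dx,$$

and no other set $A$ can do better, so the supremum defining the total variation distance equals half the $L^1$ norm. Pinsker's inequality, $\mathrm{TV}(p_Z,p_Y)\le\sqrt{D(p_Z\|p_Y)/2}$, then rearranges (via $\|p_Z-p_Y\|_1=2\,\mathrm{TV}$) to exactly the displayed bound.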