Some questions about Pinsker inequality, KL divergence and total variation distance


In https://en.wikipedia.org/wiki/Pinsker%27s_inequality#Alternative_version it is stated that $$ D(P\| Q) \ge \frac{1}{2 \ln 2} V^2(p, q), $$ where $D(P\| Q)$ is the KL divergence between $P$ and $Q$ and $V(p,q)$ is the variation distance between $p$ and $q$. As I understand it, $$V(p,q) = \sum_x |p(x)-q(x)|,$$ and Pinsker's inequality states that $$\sqrt{\frac{D(P\| Q)}{2}} \ge \frac{1}{2} \sum_x |p(x)-q(x)|.$$

Therefore, I do not understand where the $\ln 2$ comes from in the first inequality above.

I also have a related question: can I state that the following quantity is equivalent to $V(p,q)/2$ as defined above?

$$\frac{1}{2} V(p,q) = \frac{1}{2}\int_{\mathcal{X}} |p(x)-q(x)| \mu(dx) $$

The expression on the right-hand side above appears in Theorem 13.1.1 of

L. Lehmann, J. P. Romano, and G. Casella, Testing statistical hypotheses. Springer, 2005

I want to make sure I got this right.


There are 1 best solutions below

On BEST ANSWER

For your first question, this simply depends on which base of the logarithm is used in the definition of the KL divergence. $D_{KL}$ is defined using $\ln$ (logarithm in base $e$), whereas the reference for the alternative version defines $D$ with $\log_2$ (logarithm in base $2$). With these notations, $D=D_{KL}/\ln 2$. Squaring the inequality you wrote gives $D_{KL}\ge \frac{1}{2}V^2(p,q)$, and dividing both sides by $\ln 2$ yields the alternative version. I agree that this is not clear from the Wikipedia page.
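As a sanity check on the change of base, here is a minimal Python sketch that tests both forms of the inequality on random distributions. The helper names (`kl_divergence`, `variation`, `random_pmf`) are made up for this illustration, and the distributions are assumed to be strictly positive on a finite set.

```python
import math
import random

def kl_divergence(p, q, base=math.e):
    """KL divergence sum_x p(x) log(p(x)/q(x)), with the log taken in `base`."""
    return sum(pi * math.log(pi / qi, base) for pi, qi in zip(p, q) if pi > 0)

def variation(p, q):
    """V(p, q) = sum_x |p(x) - q(x)|, i.e. twice the total variation distance."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

def random_pmf(n, rng):
    """Random strictly positive probability mass function on n points."""
    w = [rng.random() + 1e-3 for _ in range(n)]
    s = sum(w)
    return [x / s for x in w]

rng = random.Random(0)
all_ok = True
for _ in range(1000):
    p, q = random_pmf(5, rng), random_pmf(5, rng)
    v = variation(p, q)
    d_nats = kl_divergence(p, q)           # natural log: D_KL
    d_bits = kl_divergence(p, q, base=2)   # base 2: the D of the alternative version
    # Changing the base only rescales the divergence by 1 / ln 2.
    same_up_to_scale = math.isclose(d_bits, d_nats / math.log(2),
                                    rel_tol=1e-9, abs_tol=1e-12)
    # Pinsker in nats, and the alternative version in bits.
    pinsker_nats = d_nats >= v ** 2 / 2 - 1e-12
    pinsker_bits = d_bits >= v ** 2 / (2 * math.log(2)) - 1e-12
    all_ok = all_ok and same_up_to_scale and pinsker_nats and pinsker_bits

print(all_ok)
```

A random test like this proves nothing, of course, but it makes it easy to see that the two statements differ only by the factor $\ln 2$.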


For your second question, it is indeed possible to write $V(p,q)$ like this, provided you define $p$, $q$, and $\mu$ appropriately. In general, if $P$ and $Q$ are two probability measures that are absolutely continuous with respect to the same $\sigma$-finite measure $\lambda$, with Radon-Nikodym derivatives $f=dP/d\lambda$ and $g=dQ/d\lambda$ (that is, $P=f\cdot \lambda$ and $Q=g\cdot \lambda$), then $$ d_{TV}(P,Q)=\frac{1}{2}\int |f-g| \,d\lambda, $$ where $d_{TV}$ is defined as $d_{TV}(P,Q)=\sup_{A} |P(A)-Q(A)|$, the supremum being taken over all measurable sets $A$. In particular, if $P$ and $Q$ are probability measures on a countable set $\mathcal X$, one can take $\lambda$ to be the counting measure on $\mathcal X$, and $f$ and $g$ are then simply the probability mass functions of $P$ and $Q$ (what you called $p$ and $q$ above). In that case, we can equivalently write $$ d_{TV}(P,Q)=\frac{1}{2}\int |p-q|\,d\lambda=\frac{1}{2}\sum_x |p(x)-q(x)|, $$ so $\frac{1}{2}V(p,q)$ as you defined it is exactly $d_{TV}(P,Q)$. Some background in measure theory helps to be comfortable with these expressions.
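In the countable case, the identity between the supremum definition and the half-$\ell_1$ formula can be checked by brute force on a small example. The sketch below assumes a three-point space and enumerates every event $A$; the function names `tv_sup` and `tv_l1` and the two distributions are made up for this illustration.

```python
import math
from itertools import combinations

def tv_sup(p, q):
    """d_TV(P, Q) = max over all events A of |P(A) - Q(A)|, by enumeration."""
    n = len(p)
    best = 0.0
    for r in range(n + 1):
        for event in combinations(range(n), r):
            p_a = sum(p[i] for i in event)
            q_a = sum(q[i] for i in event)
            best = max(best, abs(p_a - q_a))
    return best

def tv_l1(p, q):
    """(1/2) * sum_x |p(x) - q(x)|, the integral against counting measure."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

# Hypothetical example distributions on a three-point space.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]

print(math.isclose(tv_sup(p, q), tv_l1(p, q)))
```

The maximum is attained at the event $A=\{x : p(x) > q(x)\}$, which is the usual way the identity is proved. The enumeration is exponential in the size of the space, so it is only meant for small illustrations.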