Intermediate step in bound for Kullback-Leibler (KL) divergence


Question

I am having a hard time deriving the first step in the proof of Theorem 3.1 in McAllester and Stratos's paper ``Formal Limitations on the Measurement of Mutual Information'' (Proceedings of the Twenty-Third International Conference on Artificial Intelligence and Statistics, 2020), the full text of which is available at http://proceedings.mlr.press/v108/mcallester20a.html.

Said theorem reads as follows:

Theorem 3.1. Let $B$ be any distribution-free high-confidence lower bound on $D_\text{KL}(p_X||q_X)$ computed with complete knowledge of $p_X$ but only samples from $q_X$.

More specifically, let $B(p_X, S, \delta)$ be any real-valued function of a distribution $p_X$, a multiset $S$, and a confidence parameter $\delta$ such that, for any $p_X$, $q_X$, and $\delta$, with probability at least $1-\delta$ over a draw of $S$ from $q_X^N$, we have \begin{align*} D_\text{KL}(p_X||q_X) \ge B(p_X, S, \delta). \end{align*} Then, for any such bound, for $N\ge2$, and for any $q_X$, with probability at least $1-4\delta$ over the draw of $S$ from $q_X^N$, we have \begin{align*} B(p_X, S, \delta) \le \ln N. \end{align*}

Note in particular, that $p_X$ and $q_X$ are distributions over a random variable $X$ and that $N$ denotes the number of samples drawn from $q_X$.

The proof follows directly in the manuscript:

Proof. Consider distributions $p_X$ and $q_X$ and $N\ge 2$. Define $\tilde{q}_X$ by \begin{align*} \tilde{q}_X(x) &= \left(1-\frac{1}{N}\right)q_X(x) + \frac{1}{N}p_X(x). \end{align*} We now have $D_\text{KL}(p_X||\tilde{q}_X)\le \ln N$. [...]
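For what it's worth, the claimed inequality is easy to confirm numerically. The following is a small sanity check (my own, not from the paper) on randomly drawn discrete distributions; the support size and the choices of $N$ are arbitrary:

```python
import numpy as np

# Numerical sanity check of the claim D_KL(p || q~) <= ln N, where
# q~ = (1 - 1/N) q + (1/N) p, on random discrete distributions.
# The support size (10) and the values of N are arbitrary choices.

rng = np.random.default_rng(0)

def kl(p, q):
    """KL divergence D_KL(p || q) in nats, for discrete distributions."""
    mask = p > 0  # terms with p(x) = 0 contribute 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

for N in [2, 5, 100]:
    for _ in range(1000):
        p = rng.dirichlet(np.ones(10))
        q = rng.dirichlet(np.ones(10))
        q_tilde = (1 - 1 / N) * q + (1 / N) * p
        assert kl(p, q_tilde) <= np.log(N) + 1e-12

print("D_KL(p || q~) <= ln N held in every trial")
```

So the bound itself appears to hold uniformly over $p_X$ and $q_X$; my question is about how it is derived.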

How is this first step, i.e. $D_\text{KL}(p_X||\tilde{q}_X)\le \ln N$, derived?

My attempt so far

\begin{align*} D_\text{KL}\left(p_X||\tilde{q}_X\right) &= \text{E}_{x\sim p_X}\left[-\ln \frac{\tilde{q}_X(x)}{p_X(x)}\right]\\ &= \text{E}_{x\sim p_X}\left[-\ln \tilde{q}_X(x)\right] - \underbrace{\text{E}_{x \sim p_X}\left[-\ln p_X(x)\right]}_{\ge 0,\ \text{entropy term}} \\ &\le \text{E}_{x \sim p_X}\left[-\ln \left(\left(1-\frac{1}{N}\right)q_X(x) + \frac{1}{N}p_X(x)\right)\right] \\ &\le \text{E}_{x \sim p_X}\left[- \left(1-\frac{1}{N}\right) \ln q_X(x) - \frac{1}{N}\ln p_X(x) \right] \\ &= \left(1-\frac{1}{N}\right) \text{E}_{x \sim p_X}\left[-\ln q_X(x)\right] + \frac{1}{N}\text{E}_{x \sim p_X}\left[-\ln p_X(x) \right], \end{align*} where the second-to-last step exploits the fact that $-\ln$ is a convex function and $\tilde{q}_X$ is a convex combination of $q_X$ and $p_X$. However, this does not seem to be of any help...
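To make sure my chain of inequalities is at least not wrong (even if it is a dead end), here is a quick numerical check of each step on an arbitrary toy example of my own; the distributions and $N$ below are random choices, not from the paper:

```python
import numpy as np

# Check each step of the attempted chain on random discrete distributions:
#   D_KL(p || q~) <= E_p[-ln q~]                              (drop entropy >= 0)
#                 <= (1 - 1/N) E_p[-ln q] + (1/N) E_p[-ln p]  (Jensen, -ln convex)
# The support size (6) and N = 4 are arbitrary choices.

rng = np.random.default_rng(1)
N = 4
p = rng.dirichlet(np.ones(6))
q = rng.dirichlet(np.ones(6))
q_tilde = (1 - 1 / N) * q + (1 / N) * p

kl = np.sum(p * np.log(p / q_tilde))       # D_KL(p || q~)
cross_ent = np.sum(p * -np.log(q_tilde))   # E_p[-ln q~]
jensen_rhs = (1 - 1 / N) * np.sum(p * -np.log(q)) + (1 / N) * np.sum(p * -np.log(p))

assert kl <= cross_ent + 1e-12
assert cross_ent <= jensen_rhs + 1e-12
print(kl, cross_ent, jensen_rhs, np.log(N))
```

The steps do hold, but the final right-hand side involves the cross-entropy $\text{E}_{x\sim p_X}[-\ln q_X(x)]$, which can be arbitrarily large (and exceeds $\ln N$ in the run above), so this chain cannot yield the desired $\ln N$ bound.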