Question
I am having a hard time deriving the first step in the proof of Theorem 3.1 as found in McAllester and Stratos's paper entitled ``Formal Limitations on the Measurement of Mutual Information '' (Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics , 2020), the full text of which is available at the following link http://proceedings.mlr.press/v108/mcallester20a.html.
Said theorem reads as follows:
Theorem 3.1. Let $B$ be any distribution-free high-confidence lower bound on $D_\text{KL}(p_X||q_X)$ computed with complete knowledge of $p_X$ but only samples from $q_X$.
More specifically, let $B(p_X, S, \delta)$ be any real-valued function of a distribution $p_X$, a multiset $S$, and a confidence parameter $\delta$ such that, for any $p_X$, $q_X$ and $\delta$, with probability at least 1-$\delta$ over a draw of $S$ from $q_X^N$ we have \begin{align*} D_\text{KL}(p_X||q_X) \ge B(p_X, S, \delta) \end{align*} for any such bound, and for $N\ge2$, for any $q_X$, with probability at least $1-4\delta$ over the draw of $S$ from $q_X^N$ we have \begin{align*} B(p_X, S, \delta) \le \text{ln}N. \end{align*}
Note in particular, that $p_X$ and $q_X$ are distributions over a random variable $X$ and that $N$ denotes the number of samples drawn from $q_X$.
The proof follows directly in the manuscript:
Proof. Consider distributions $p_X$ and $q_X$ and $N\ge 2$. Define $\tilde{q}_X$ by \begin{align*} \tilde{q}(x) &= \left(1-\frac{1}{N}\right)q_X(x) + \frac{1}{N}p_X(x) \end{align*} We now have $D_\text{KL}(p_X||\tilde{q}_X)\le \ln N$. [...]
How is this first step, i.e. $D_\text{KL}(p_X||\tilde{q}_X)\le \ln N$, derived?
My attempt so far
\begin{align*} D_\text{KL}\left(p_X||\tilde{q}_X\right) &= \text{E}_{x\sim p_X}\left[-\ln \frac{\tilde{q}_X(x)}{p_X(x)}\right]\\ &= \text{E}_{x\sim p_X}\left[-\ln \tilde{q}_X(x)\right] - \underbrace{\text{E}_{x \sim p_X}\left[-\ln p_X(x)\right]}_{\ge 0,\ \text{entropy term}} \\ &\le \text{E}_{x \sim p_X}\left[-\ln \left(\left(1-\frac{1}{N}\right)q_X(x) + \frac{1}{N}p_X(x)\right)\right] \\ &\le \text{E}_{x \sim p_X}\left[- \left(1-\frac{1}{N}\right) \ln q_X(x) + -\frac{1}{N}\ln p_X(x) \right] \\ &= \left(1-\frac{1}{N}\right) \text{E}_{x \sim p_X}\left[-\ln q_X(x)\right] + \frac{1}{N}\text{E}_{x \sim p_X}\left[-\ln p_X(x) \right] \end{align*} where the second to last step exploites the fact that $-\ln$ is a convex function, and $\tilde{q}$ is a convex combination of $q$ and $p$. However this does not seem to be of any help...