Let $X_1,X_2,\dots$ be i.i.d. samples drawn from a discrete space $\mathcal{X}$ according to a probability distribution $P$, and denote by $\hat{P}_n$ the empirical distribution based on the first $n$ samples. Also let $Q$ be an arbitrary distribution on $\mathcal{X}$ with full support (so that the divergences below are finite). It is clear that the KL-divergence satisfies
\begin{equation} KL( \hat{P}_n || Q) \stackrel{n\rightarrow \infty}{\longrightarrow} KL(P || Q) \end{equation}
but I am wondering whether any quantitative rate of convergence is known for it, i.e. whether it can be shown that
\begin{equation} \Pr\Big[ | KL( \hat{P}_n || Q) - KL(P || Q) | \geq \delta\Big] \leq f(\delta, n, |\mathcal{X}|) \end{equation}
and, if so, what the best known expression for the right-hand side is.
Thanks a lot!
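To make the question concrete, here is a small simulation sketch (the particular $P$, $Q$, sample sizes, and number of repetitions are arbitrary choices of mine, not part of the question) that estimates the average error $|KL(\hat{P}_n \| Q) - KL(P \| Q)|$ as $n$ grows:

```python
import numpy as np

rng = np.random.default_rng(0)

def kl(p, q):
    """KL divergence (natural log) between discrete distributions p, q."""
    mask = p > 0          # by convention, 0 * log(0/q) = 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Toy P and Q on a 4-point space (arbitrary choices, for illustration only).
P = np.array([0.4, 0.3, 0.2, 0.1])
Q = np.array([0.25, 0.25, 0.25, 0.25])
true_kl = kl(P, Q)

avg_err = {}
for n in [100, 1000, 10000]:
    errs = []
    for _ in range(200):
        counts = rng.multinomial(n, P)   # n i.i.d. draws from P, as counts
        errs.append(abs(kl(counts / n, Q) - true_kl))
    avg_err[n] = float(np.mean(errs))
    print(n, avg_err[n])
```

The printed average error shrinks as $n$ grows; this is only an empirical illustration, of course, and says nothing about the tail bound $f(\delta, n, |\mathcal{X}|)$ I am asking for.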
Here's a thought. I don't know whether I am in an imaginary world and asking too much. Anyway, let me propose this. I am writing it as an answer since it is a little too long for a comment.
Suppose that, for every fixed $Q$, we can find a linear family $\Pi$ of probability distributions containing $P$ and all the empirical measures $\hat{P}_n$, and such that $P$ is the $I$-projection of $Q$ on $\Pi$. Then the Pythagorean property $$D(\hat{P}_n\| Q)=D(\hat{P}_n\|P)+D(P \| Q)$$ holds, and hence the convergence rate of $D(\hat{P}_n\| Q)\to D(P \| Q)$ is the same as that of $D(\hat{P}_n\| P)\to 0$.
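As a sanity check of the Pythagorean property in one concrete case: take $\Pi = \{R : E_R[X] = \mu\}$ (a linear family), in which case the $I$-projection of $Q$ on $\Pi$ is the exponential tilt of $Q$. The specific $Q$, tilt parameter, and the member $R$ below are arbitrary choices of mine, just to verify the identity numerically:

```python
import numpy as np

def kl(p, q):
    """KL divergence (natural log) between strictly positive distributions."""
    return float(np.sum(p * np.log(p / q)))

x = np.array([0.0, 1.0, 2.0])        # the underlying discrete space
Q = np.array([0.5, 0.3, 0.2])

theta = 0.7                          # arbitrary tilt parameter
w = Q * np.exp(theta * x)
P = w / w.sum()                      # I-projection of Q on {R : E_R[X] = mu}
mu = float(P @ x)

# Pick another member R of the same linear family (same mean mu).
t = 0.3
R = np.array([1.0 - mu + t, mu - 2 * t, t])
assert np.all(R > 0) and np.isclose(R.sum(), 1.0) and np.isclose(R @ x, mu)

lhs = kl(R, Q)
rhs = kl(R, P) + kl(P, Q)
print(lhs, rhs)                      # agree up to floating-point error
```

Here $R$ plays the role of $\hat{P}_n$: the identity $D(R\|Q)=D(R\|P)+D(P\|Q)$ holds for every member of the linear family, which is exactly what the argument above needs.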