Empirical distribution vs. the true one: how fast does $KL( \hat{P}_n \| Q)$ converge to $KL( P \| Q)$?


Let $X_1,X_2,\dots$ be i.i.d. samples drawn from a discrete space $\mathcal{X}$ according to a probability distribution $P$, and denote the empirical distribution based on the first $n$ samples by $\hat{P}_n$. Also let $Q$ be an arbitrary distribution. For the KL divergence, it is clear that

\begin{equation} KL( \hat{P}_n || Q) \stackrel{n\rightarrow \infty}{\longrightarrow} KL(P || Q) \end{equation}

but I am wondering whether any quantitative rate of convergence is known for it, i.e., whether it can be shown that

\begin{equation} \Pr\Big[ | KL( \hat{P}_n || Q) - KL(P || Q) | \geq \delta\Big] \leq f(\delta, n, |\mathcal{X}|) \end{equation}

and, if so, what the best known expression for the right-hand side is.
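For intuition, the convergence being asked about is easy to observe numerically. The following sketch uses illustrative distributions $P$ and $Q$ (not from the question) and a small `kl` helper; $Q$ is chosen with full support so that $KL(\hat{P}_n \| Q)$ stays finite:

```python
import numpy as np

def kl(p, q):
    # Discrete KL divergence in nats; terms with p[i] == 0 contribute 0
    # by the usual 0 * log(0/q) = 0 convention. Requires q > 0 wherever p > 0.
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

rng = np.random.default_rng(0)
P = np.array([0.5, 0.3, 0.2])   # true distribution (illustrative choice)
Q = np.array([0.4, 0.4, 0.2])   # reference distribution, full support

target = kl(P, Q)
samples = rng.choice(len(P), size=100_000, p=P)
for n in (100, 1_000, 10_000, 100_000):
    # empirical distribution from the first n samples
    P_hat = np.bincount(samples[:n], minlength=len(P)) / n
    print(n, abs(kl(P_hat, Q) - target))
```

The printed gap shrinks as $n$ grows, which is the qualitative convergence; the question is about quantifying this decay.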

Thanks a lot!


There are 3 best solutions below


Here's a thought; it may be asking too much, but let me propose it anyway. I am writing it as an answer because it is a little too long for a comment.

Suppose that, for every fixed $Q$, we can find a linear family $\Pi$ of probability distributions containing all empirical measures $\hat{P}_n$ as well as $P$, such that $P$ is the $I$-projection of $Q$ onto $\Pi$. Then the Pythagorean property $$D(\hat{P}_n\| Q)=D(\hat{P}_n\|P)+D(P \| Q)$$ holds, and hence the convergence rate of $D(\hat{P}_n\| Q)\to D(P \| Q)$ is the same as that of $D(\hat{P}_n\| P)\to 0$.
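The Pythagorean identity can be checked numerically in a simple instance. The sketch below (all distributions and the family are illustrative assumptions) takes $\Pi$ to be the linear family of distributions on $\{0,1,2\}$ with a fixed mean $\mu$, computes the $I$-projection $P$ of $Q$ onto $\Pi$ by exponential tilting, and verifies $D(R\|Q)=D(R\|P)+D(P\|Q)$ for another member $R$ of the family:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0])
Q = np.array([0.5, 0.25, 0.25])
mu = 1.2  # the linear family: all distributions with mean mu (illustrative)

def tilt(theta):
    # exponential tilting of Q; the I-projection onto a mean constraint
    # has this form, with theta chosen to match the constraint
    w = Q * np.exp(theta * x)
    return w / w.sum()

# bisection for theta so that the tilted mean equals mu (mean is increasing in theta)
lo, hi = -20.0, 20.0
for _ in range(200):
    mid = (lo + hi) / 2
    if tilt(mid) @ x < mu:
        lo = mid
    else:
        hi = mid
P = tilt((lo + hi) / 2)  # I-projection of Q onto the family

def kl(p, q):
    # discrete KL divergence in nats, with 0 * log(0/q) = 0
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# another member R of the family (its mean is 0.4 + 0.8 = 1.2 = mu)
R = np.array([0.2, 0.4, 0.4])
print(kl(R, Q), kl(R, P) + kl(P, Q))  # the two sides of the Pythagorean identity
```

The two printed numbers agree, as predicted; the open question in this answer is whether a single such family can be arranged to contain all the random measures $\hat{P}_n$.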


In addition to the previous answer: the most popular concentration inequality for the KL divergence applies to finite alphabets. See Theorem 11.2.1 of "Elements of Information Theory" by Thomas Cover and Joy Thomas: $$\mathbf{P}\left(D(\hat{P}_n\|P)\geq\epsilon\right)\leq e^{-n\left(\epsilon-|\mathcal{X}|\frac{\log(n+1)}{n}\right)}$$
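This bound is easy to evaluate and compare against a Monte Carlo estimate. The sketch below uses an illustrative $P$ on a three-letter alphabet and the natural-log form of the bound as written above (Cover & Thomas state it with base-2 logarithms); note that for small $n$ the bound exceeds $1$ and is vacuous:

```python
import numpy as np

def kl(p, q):
    # discrete KL divergence in nats; 0 * log(0/q) = 0 by convention
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

rng = np.random.default_rng(1)
P = np.array([0.5, 0.3, 0.2])          # illustrative P, |X| = 3
n, eps, trials = 10_000, 0.005, 2_000

# finite-n upper bound from the inequality above
bound = np.exp(-n * (eps - len(P) * np.log(n + 1) / n))

# Monte Carlo estimate of P( D(P_hat_n || P) >= eps )
hits = sum(kl(rng.multinomial(n, P) / n, P) >= eps for _ in range(trials))
print(hits / trials, bound)
```

The empirical frequency stays below the bound, as it must; the $|\mathcal{X}|\log(n+1)/n$ term is the price of the method-of-types counting argument.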


This is addressed in

Rohit Agrawal. Multinomial concentration in relative entropy at the ratio of alphabet and sample sizes. CoRR, abs/1904.02291

See, e.g., Theorem 8 in this short note of mine.