Empirical distribution vs. the true one: how fast does $KL( \hat{P}_n \| Q)$ converge to $KL( P \| Q)$?


Let $X_1,X_2,\dots$ be i.i.d. samples drawn from a discrete space $\mathcal{X}$ according to a probability distribution $P$, and denote the empirical distribution based on the first $n$ samples by $\hat{P}_n$. Also let $Q$ be an arbitrary distribution. For the KL divergence, it is clear that

\begin{equation} KL( \hat{P}_n || Q) \stackrel{n\rightarrow \infty}{\longrightarrow} KL(P || Q) \end{equation}

but I am wondering whether any quantitative rate of convergence is known for it, i.e., whether it can be shown that

\begin{equation} \Pr\Big[ | KL( \hat{P}_n || Q) - KL(P || Q) | \geq \delta\Big] \leq f(\delta, n, |\mathcal{X}|) \end{equation}

and, if so, what the best known expression for the right-hand side is.
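For intuition, the convergence being asked about is easy to observe numerically. The following sketch uses illustrative distributions $P$ and $Q$ (not from the question) and a small `kl` helper; $Q$ is chosen with full support so that $KL(\hat{P}_n \| Q)$ stays finite:

```python
import numpy as np

def kl(p, q):
    # Discrete KL divergence in nats; terms with p[i] == 0 contribute 0
    # by the usual 0 * log(0/q) = 0 convention. Requires q > 0 wherever p > 0.
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

rng = np.random.default_rng(0)
P = np.array([0.5, 0.3, 0.2])   # true distribution (illustrative choice)
Q = np.array([0.4, 0.4, 0.2])   # reference distribution, full support

target = kl(P, Q)
samples = rng.choice(len(P), size=100_000, p=P)
for n in (100, 1_000, 10_000, 100_000):
    # empirical distribution from the first n samples
    P_hat = np.bincount(samples[:n], minlength=len(P)) / n
    print(n, abs(kl(P_hat, Q) - target))
```

The printed gap shrinks as $n$ grows, which is the qualitative convergence; the question is about quantifying this decay.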

Thanks a lot!


There are 3 best solutions below


Here's a thought; it may be asking too much, but let me propose it anyway. I am writing it as an answer because it is a little too long for a comment.

Suppose that, for every fixed $Q$, we can find a linear family $\Pi$ of probability distributions containing all empirical measures $\hat{P}_n$ as well as $P$, such that $P$ is the $I$-projection of $Q$ onto $\Pi$. Then the Pythagorean property $$D(\hat{P}_n\| Q)=D(\hat{P}_n\|P)+D(P \| Q)$$ holds, and hence the convergence rate of $D(\hat{P}_n\| Q)\to D(P \| Q)$ is the same as that of $D(\hat{P}_n\| P)\to 0$.
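The Pythagorean identity can be checked numerically in a simple instance. The sketch below (all distributions and the family are illustrative assumptions) takes $\Pi$ to be the linear family of distributions on $\{0,1,2\}$ with a fixed mean $\mu$, computes the $I$-projection $P$ of $Q$ onto $\Pi$ by exponential tilting, and verifies $D(R\|Q)=D(R\|P)+D(P\|Q)$ for another member $R$ of the family:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0])
Q = np.array([0.5, 0.25, 0.25])
mu = 1.2  # the linear family: all distributions with mean mu (illustrative)

def tilt(theta):
    # exponential tilting of Q; the I-projection onto a mean constraint
    # has this form, with theta chosen to match the constraint
    w = Q * np.exp(theta * x)
    return w / w.sum()

# bisection for theta so that the tilted mean equals mu (mean is increasing in theta)
lo, hi = -20.0, 20.0
for _ in range(200):
    mid = (lo + hi) / 2
    if tilt(mid) @ x < mu:
        lo = mid
    else:
        hi = mid
P = tilt((lo + hi) / 2)  # I-projection of Q onto the family

def kl(p, q):
    # discrete KL divergence in nats, with 0 * log(0/q) = 0
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# another member R of the family (its mean is 0.4 + 0.8 = 1.2 = mu)
R = np.array([0.2, 0.4, 0.4])
print(kl(R, Q), kl(R, P) + kl(P, Q))  # the two sides of the Pythagorean identity
```

The two printed numbers agree, as predicted; the open question in this answer is whether a single such family can be arranged to contain all the random measures $\hat{P}_n$.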


In addition to the previous answer: the most popular concentration inequality for the KL divergence applies to finite alphabets. See Theorem 11.2.1 of "Elements of Information Theory" by Thomas Cover and Joy Thomas: $$\mathbf{P}\left(D(\hat{P}_n\|P)\geq\epsilon\right)\leq e^{-n\left(\epsilon-|\mathcal{X}|\frac{\log(n+1)}{n}\right)}$$
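This bound is easy to evaluate and compare against a Monte Carlo estimate. The sketch below uses an illustrative $P$ on a three-letter alphabet and the natural-log form of the bound as written above (Cover & Thomas state it with base-2 logarithms); note that for small $n$ the bound exceeds $1$ and is vacuous:

```python
import numpy as np

def kl(p, q):
    # discrete KL divergence in nats; 0 * log(0/q) = 0 by convention
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

rng = np.random.default_rng(1)
P = np.array([0.5, 0.3, 0.2])          # illustrative P, |X| = 3
n, eps, trials = 10_000, 0.005, 2_000

# finite-n upper bound from the inequality above
bound = np.exp(-n * (eps - len(P) * np.log(n + 1) / n))

# Monte Carlo estimate of P( D(P_hat_n || P) >= eps )
hits = sum(kl(rng.multinomial(n, P) / n, P) >= eps for _ in range(trials))
print(hits / trials, bound)
```

The empirical frequency stays below the bound, as it must; the $|\mathcal{X}|\log(n+1)/n$ term is the price of the method-of-types counting argument.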


This is addressed in

Rohit Agrawal. Multinomial concentration in relative entropy at the ratio of alphabet and sample sizes. CoRR, abs/1904.02291

See, e.g., Theorem 8 in this short note of mine.