May I ask for some reference pointers? I have a classic case of losing my reference, so I'm unsure whether what I wrote is right or wrong. I searched my old references and the internet and didn't find anything (wrong keywords, probably), so I was hoping to try my luck here.
Given a vector $\vec{X}$ of $n$ i.i.d. observations on a finite alphabet $\mathsf{X}$ drawn from a distribution $P$, let $\hat{P}_n(a) = \frac{1}{n}\sum_{i=1}^n \mathbb{1}_{X_i=a}$ denote the empirical distribution.
$$ E\left[D(\hat{P}_n||Q)-D(\hat{P}_n||P)\right] = D(P||Q) $$ where the expectation is taken under $P$, the distribution from which the samples defining $\hat{P}_n$ are drawn.
I hope there aren't any errors above, as it's from a hand-written note. Assuming it is right, can anyone point me to a source for this result?
In general, for any distribution $T$:
$$ D(\hat{P}_n||T) = \sum_i \hat{P}_n(i) [\log \hat{P}_n(i) - \log T(i)] $$
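As a quick numerical sketch of this definition (my own illustration, not from the question; distributions and sample size are arbitrary), the empirical distribution and the divergence $D(\hat{P}_n||T)$ can be computed directly:

```python
import numpy as np

def kl(p, q):
    # D(p || q) = sum_i p(i) log(p(i)/q(i)), with the convention 0*log(0/q) = 0
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

rng = np.random.default_rng(0)
P = np.array([0.4, 0.4, 0.2])   # true distribution (assumed for illustration)
T = np.array([0.5, 0.3, 0.2])   # any reference distribution with full support
n = 100

samples = rng.choice(3, size=n, p=P)
P_hat = np.bincount(samples, minlength=3) / n  # empirical distribution
print(kl(P_hat, T))  # D(P_hat || T), a random quantity
```

Note that $T$ needs full support (or at least $\hat{P}_n \ll T$) for the divergence to be finite.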
Then
$$ D(\hat{P}_n||Q)-D(\hat{P}_n||P) = \sum_i \hat{P}_n(i) \log \frac{P(i)}{Q(i)} $$
The above is random because $\hat{P}_n$ is random. Now assume the samples come from distribution $P$. Taking the expectation under $P$, and using $E_P[\hat{P}_n(i)]=P(i)$ together with linearity of expectation, we get
$$ E\left[D(\hat{P}_n||Q)-D(\hat{P}_n||P)\right] = \sum_i P(i) \log \frac{P(i)}{Q(i)} = D(P||Q). $$
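The identity can be sanity-checked by Monte Carlo: average the random difference over many resamples and compare with $D(P||Q)$. This is a sketch with arbitrary distributions of my choosing, not part of the original derivation:

```python
import numpy as np

def kl(p, q):
    # D(p || q) with the convention 0*log(0/q) = 0
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

rng = np.random.default_rng(1)
P = np.array([0.5, 0.3, 0.2])    # sampling distribution
Q = np.array([0.25, 0.25, 0.5])  # reference distribution (full support)
n, trials = 50, 20000

diffs = []
for _ in range(trials):
    samples = rng.choice(3, size=n, p=P)
    P_hat = np.bincount(samples, minlength=3) / n
    diffs.append(kl(P_hat, Q) - kl(P_hat, P))

# E[D(P_hat||Q) - D(P_hat||P)] should equal D(P||Q) for every n
print(np.mean(diffs), kl(P, Q))
```

Note the identity holds exactly for every finite $n$, since the $\log$ ratio $\log\frac{P(i)}{Q(i)}$ is non-random and only the linear weights $\hat{P}_n(i)$ are averaged.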