I am familiar with several interpretations of the KL divergence. Last week I heard a new one, mentioned in a lecture on probabilistic graphical models. It was stated somewhat offhandedly, so I hope I'm getting the gist, but I remember it as something like:
"The KL divergence between a distribution $\mathcal{D}$ and the empirical distribution $\mathcal{D}_{emp}$ of a sample $\mathcal{X}\sim\mathcal{D}$ is the number of bits required to represent the error of an MLE based on $\mathcal{X}$."
(I know this sounds weird: an MLE for what parameter? Maybe for any parameter? I'm honestly not sure.)
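To make the setup concrete, here is a minimal sketch of the quantity I think the statement is about: draw a sample from a known categorical distribution $\mathcal{D}$ (chosen arbitrarily here for illustration), form the empirical distribution $\mathcal{D}_{emp}$, and compute $D_{KL}(\mathcal{D}_{emp} \| \mathcal{D})$ in bits. This is just my reading of the statement, not necessarily what the lecturer meant.

```python
import numpy as np

rng = np.random.default_rng(0)

# A true distribution D over 4 outcomes (arbitrary choice for illustration).
p = np.array([0.5, 0.25, 0.125, 0.125])

# Draw a sample X ~ D and build the empirical distribution D_emp.
n = 1000
counts = np.bincount(rng.choice(len(p), size=n, p=p), minlength=len(p))
p_emp = counts / n

# KL(D_emp || D) in bits; outcomes with zero empirical mass contribute 0.
mask = p_emp > 0
kl_bits = np.sum(p_emp[mask] * np.log2(p_emp[mask] / p[mask]))
print(kl_bits)  # small non-negative number, shrinking as n grows
```

Empirically this quantity is always non-negative and tends to zero as the sample grows, which is consistent with it measuring some kind of "sampling error", but I don't see how to connect it to the bit-length of an MLE's error.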
Is anyone familiar with such a result?