Entropy and KL divergence $H(\hat{\theta}|D_n) - H(\hat{\theta}|D_n, Z_{n+1})$ where $D_n = \{Z_i\}_{i= 1}^n$


Consider a stream of data $D_n = \{Z_i\}_{i=1}^n$, where the goal is to estimate a parameter $\theta$ from the observed data. In a lecture note I am reading, it says that if $H(X)$ denotes the entropy of the random variable $X$, then

\begin{equation} H(\hat{\theta} \mid D_n) - H(\hat{\theta} \mid D_n, Z_{n+1}) = \mathbb{E}_{\hat{\theta} \sim \mathcal{P}(\cdot \mid D_n)} \left[ \mathrm{KL}\big(P(Z_{n+1} \mid \hat{\theta}) \,\big\|\, P(Z_{n+1} \mid D_n)\big) \right], \end{equation} where $\mathrm{KL}$ denotes the KL divergence. I do not see how the note arrives at this. What is the proof of this relation?

Best Answer

In general, the conditional mutual information can be written as $$I(X;Y \mid Z) = H(Y \mid Z) - H(Y \mid X, Z) = \mathbb{E}_{p(x,z)}\left[ D_{KL}\big(p(y \mid x, z) \,\big\|\, p(y \mid z)\big) \right].$$ Your identity is the case $X = \hat{\theta}$, $Y = Z_{n+1}$, $Z = D_n$: by the symmetry of mutual information, $I(\hat{\theta}; Z_{n+1} \mid D_n) = H(\hat{\theta} \mid D_n) - H(\hat{\theta} \mid D_n, Z_{n+1})$, and provided that (as the note presumably assumes) $Z_{n+1}$ is conditionally independent of $D_n$ given $\hat{\theta}$, we have $p(z_{n+1} \mid \hat{\theta}, D_n) = p(z_{n+1} \mid \hat{\theta})$, which recovers the stated formula.
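Expanding the definition shows why the expectation form holds (a sketch in the same $X, Y, Z$ notation; the key step is factoring $p(x,y,z) = p(x,z)\,p(y \mid x,z)$):

```latex
% Conditional mutual information as an expected KL divergence.
\begin{align*}
I(X;Y \mid Z)
  &= \sum_{x,y,z} p(x,y,z)\,\log\frac{p(x,y \mid z)}{p(x \mid z)\,p(y \mid z)} \\
  &= \sum_{x,z} p(x,z) \sum_{y} p(y \mid x,z)\,\log\frac{p(y \mid x,z)}{p(y \mid z)} \\
  &= \mathbb{E}_{p(x,z)}\left[ D_{KL}\big(p(y \mid x,z)\,\big\|\,p(y \mid z)\big) \right],
\end{align*}
```

where the second line uses $p(x,y \mid z)\big/\big(p(x \mid z)\,p(y \mid z)\big) = p(y \mid x,z)\big/p(y \mid z)$.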

For further details, you may refer to these lecture notes: http://people.lids.mit.edu/yp/homepage/sdpi_course.html
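As a sanity check, the identity can be verified numerically on a small discrete joint distribution (a sketch with NumPy; the alphabet sizes and the random joint are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary joint distribution p(x, y, z) over small finite alphabets,
# with axes (x, y, z) of sizes (3, 4, 2).
p = rng.random((3, 4, 2))
p /= p.sum()

# Marginals and conditionals needed on both sides of the identity.
p_z = p.sum(axis=(0, 1))               # p(z), shape (2,)
p_xz = p.sum(axis=1)                   # p(x, z), shape (3, 2)
p_yz = p.sum(axis=0)                   # p(y, z), shape (4, 2)
p_x_given_z = p_xz / p_z               # p(x | z)
p_y_given_z = p_yz / p_z               # p(y | z)
p_y_given_xz = p / p_xz[:, None, :]    # p(y | x, z)
p_xy_given_z = p / p_z                 # p(x, y | z)

# Left side: I(X; Y | Z) = sum_{x,y,z} p(x,y,z) log[ p(x,y|z) / (p(x|z) p(y|z)) ]
lhs = np.sum(
    p * np.log(p_xy_given_z / (p_x_given_z[:, None, :] * p_y_given_z[None, :, :]))
)

# Right side: E_{p(x,z)}[ KL( p(y|x,z) || p(y|z) ) ]
kl = np.sum(p_y_given_xz * np.log(p_y_given_xz / p_y_given_z[None, :, :]), axis=1)
rhs = np.sum(p_xz * kl)

print(bool(np.isclose(lhs, rhs)))  # prints True
```

Since every entry of the random joint is strictly positive, all conditionals are well defined and no zero-probability guards are needed here.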