I am trying to understand the following equivalence:
$$\mathbb{E}_{p(x, y)}\left[\log \frac{q(x | y)}{p(x)} \frac{p(x | y)}{q(x | y)}\right] =\mathbb{E}_{p(x, y)}\left[\log \frac{q(x | y)}{p(x)}\right]+\mathbb{E}_{p(y)}\left[D_{K L}(p(x | y) \| q(x | y))\right]$$
which is used in [1] at eq. 2. The first term is clear to me, but I don't fully understand how the second term is obtained. As far as I understand, it rests on the following:
$ \mathbb{E}_{p(x,y)}\left[\log \frac{p(x | y)}{q(x | y)}\right] = \mathbb{E}_{p(y)}\left[D_{K L}(p(x | y) \| q(x | y))\right] $
By the definition of the KL divergence, we have:
$D_{K L}(p(x | y) \| q(x | y)) = \mathbb{E}_{p(x | y)}\left[\log \frac{p(x | y)}{q(x | y)}\right]$
So to my understanding this implies the following:
$ \mathbb{E}_{p(x,y)}\left[\log \frac{p(x | y)}{q(x | y)}\right] = \mathbb{E}_{p(y)}\left[ \mathbb{E}_{p(x | y)}\left[\log \frac{p(x | y)}{q(x | y)}\right]\right] $
Unfortunately, it is not clear to me how, or why, $\mathbb{E}_{p(x,y)}$ can be decomposed into these nested expectations under the marginal $p(y)$ and the conditional $p(x|y)$. I think I am missing a very simple point here.
This is just the law of total expectation. Quite generally, for any function $Z=Z(x,y)$ of the two variables,
$$ \mathbb E_{p(x,y)}[Z]=\mathbb E_{p(y)}\left[\mathbb E_{p(x\mid y)}[Z]\right]\;. $$
That is, you can first form the expectation as if you knew $Y$, and then form the expectation of the result using the marginal distribution of $Y$. In the discrete case, this is just
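$$ \sum_{x,y}p(x,y)Z(x,y)=\sum_y\left(\sum_\xi p(\xi,y)\right)\sum_x\frac{p(x,y)}{\sum_\xi p(\xi,y)}Z(x,y)\;. $$
Here $\sum_\xi p(\xi,y)=p(y)$ is the marginal of $Y$ and the fraction is the conditional $p(x\mid y)$; the two factors of $p(y)$ cancel, which recovers the left-hand side.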
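If it helps to see the identity numerically, here is a minimal sketch that checks it for $Z=\log\frac{p(x\mid y)}{q(x\mid y)}$ on a small discrete example. The tables `p_xy` and `q_x_given_y` are made-up numbers chosen only for illustration (they are not taken from [1]), and the helper names are hypothetical.

```python
import math

# Hypothetical joint distribution p(x, y): rows index x, columns index y.
# The six entries sum to 1.
p_xy = [[0.10, 0.20, 0.05],
        [0.30, 0.15, 0.20]]

# Hypothetical approximating conditional q(x | y); each column sums to 1.
q_x_given_y = [[0.3, 0.5, 0.4],
               [0.7, 0.5, 0.6]]

n_x, n_y = len(p_xy), len(p_xy[0])

# Marginal p(y) = sum_x p(x, y), and conditional p(x | y) = p(x, y) / p(y).
p_y = [sum(p_xy[x][y] for x in range(n_x)) for y in range(n_y)]
p_x_given_y = [[p_xy[x][y] / p_y[y] for y in range(n_y)] for x in range(n_x)]


def log_ratio(x, y):
    """Z(x, y) = log p(x | y) / q(x | y)."""
    return math.log(p_x_given_y[x][y] / q_x_given_y[x][y])


def kl_given_y(y):
    """Inner expectation: D_KL(p(. | y) || q(. | y)) for one fixed y."""
    return sum(p_x_given_y[x][y] * log_ratio(x, y) for x in range(n_x))


# Left-hand side: a single expectation under the joint p(x, y).
lhs = sum(p_xy[x][y] * log_ratio(x, y)
          for x in range(n_x) for y in range(n_y))

# Right-hand side: the per-y KL divergences averaged under the marginal p(y).
rhs = sum(p_y[y] * kl_given_y(y) for y in range(n_y))

print(lhs, rhs)  # the two values should agree up to floating-point rounding
```

The only step that matters is that `p_xy[x][y]` equals `p_y[y] * p_x_given_y[x][y]`, so the double sum over the joint splits into an outer sum over $y$ and an inner sum over $x$.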