I am reading some tutorials on coordinate ascent variational inference (CAVI). The following derivation appears many times without detailed explanation:
$$ E_{q}[\log p(z_j \mid z_{-j}, x)] = \int q(z_j)\, E_{q_{-j}}[\log p(z_j \mid z_{-j}, x)]\, dz_j $$
This decomposes the expectation over $q(z)$ into an integral over $z_j$ of an expectation over $q(z_{-j})$. Here $z_j$ is the $j$th unknown parameter in $z$, and $x$ is the observed data.
It appears in
- top of page 6 of this tutorial: https://www.cs.cmu.edu/~epxing/Class/10708-15/notes/10708_scribe_lecture13.pdf
- equation 21-22 in this tutorial: https://www.cs.princeton.edu/courses/archive/fall11/cos597C/lectures/variational-inference-i.pdf
- equation 19 in this paper: Blei, David M., Alp Kucukelbir, and Jon D. McAuliffe. "Variational inference: A review for statisticians." Journal of the American Statistical Association 112.518 (2017): 859-877. https://www.cs.columbia.edu/~blei/fogm/2018F/materials/BleiKucukelbirMcAuliffe2017.pdf
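To make the identity concrete, here is a toy numerical check I put together (my own example, not from any of the tutorials): two discrete variables with a mean-field $q(z) = q(z_1)\,q(z_2)$, and an arbitrary table $f(z_1, z_2)$ standing in for $\log p(z_j \mid z_{-j}, x)$:

```python
import numpy as np

# Hypothetical discrete mean-field example: q(z) = q(z1) q(z2),
# with z1 and z2 each taking 3 values. The table f[z1, z2] stands
# in for log p(z_j | z_{-j}, x); its exact values do not matter.
rng = np.random.default_rng(0)
q1 = rng.random(3); q1 /= q1.sum()   # q(z1)
q2 = rng.random(3); q2 /= q2.sum()   # q(z2)
f = rng.standard_normal((3, 3))      # f[z1, z2]

# Full expectation E_q[f] under the factorized q(z1) q(z2)
full = sum(q1[i] * q2[k] * f[i, k]
           for i in range(3) for k in range(3))

# Iterated form: sum over z1 of q(z1) * E_{q(z2)}[f(z1, .)]
inner = f @ q2        # E_{q(z2)}[f(z1, .)], one value per z1
iterated = q1 @ inner

print(np.isclose(full, iterated))    # prints True
```

The integral over $z_j$ becomes a sum here because the variables are discrete, but the two sides agree exactly as the identity claims.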
My (wrong) understanding is:
Using the chain rule, I have: $$ p(z|x) = \prod_j p(z_j | z_{1:(j-1)}, x) $$
Thus,
$$ \log p(z \mid x) = \log \prod_j p(z_j \mid z_{1:(j-1)}, x) = \sum_j \log p(z_j \mid z_{1:(j-1)}, x) $$
Thus, $$ E_{q(z)}[\log p(z \mid x)] = E_{q(z)}\Big[\sum_j \log p(z_j \mid z_{1:(j-1)}, x)\Big] = \sum_j E_{q(z_j)}[\log p(z_j \mid z_{1:(j-1)}, x)] $$
So for the $j$th unknown (assuming it is the last one), we have $E_{q(z_j)}[\log p(z_j \mid z_{-j}, x)]$.
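The chain-rule step itself checks out numerically (again a toy discrete example of my own, with two variables):

```python
import numpy as np

# Hypothetical check of the chain rule for a discrete joint p(z1, z2 | x):
# log p(z1, z2 | x) = log p(z1 | x) + log p(z2 | z1, x).
rng = np.random.default_rng(1)
p = rng.random((3, 3)); p /= p.sum()      # joint p(z1, z2 | x)

p_z1 = p.sum(axis=1)                      # marginal p(z1 | x)
p_z2_given_z1 = p / p_z1[:, None]         # conditional p(z2 | z1, x)

lhs = np.log(p)
rhs = np.log(p_z1)[:, None] + np.log(p_z2_given_z1)
print(np.allclose(lhs, rhs))              # prints True
```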
Firstly, I don't understand why in those tutorials it is $E_{q(z)}[\log p(z_j \mid z_{-j}, x)]$ instead of $E_{q(z_j)}[\log p(z_j \mid z_{-j}, x)]$. Secondly, I don't know how they get from $E_{q(z)}[\log p(z_j \mid z_{-j}, x)]$ to $\int q(z_j)\, E_{q_{-j}}[\log p(z_j \mid z_{-j}, x)]\, dz_j$.
Thanks!
Carol