I have a small problem with this video https://youtu.be/2pEkWk-LHmU?t=12m59s, on proving that minimizing the KL divergence is the same as maximizing the ELBO.
The equation goes:
KL(q(z) || p(z|x)) = E_q[log(q(z))] - E_q[log(p(z|x))]
I know that p(z|x) = p(z,x)/p(x), so the second term should expand to
E_q[log[p(z,x)/p(x)]]
= E_q[log(p(z,x)) - log(p(x))]
= E_q[log(p(z,x))] - E_q[log(p(x))]
but in the video, the last term is shown as log(p(x)) with the expectation sign dropped... why is it that we can't drop the expectation from the first term, but we can from the second?
As he states in the video,
The expectation $\mathbb{E}_q$ is with respect to the randomness in $z$ which follows the distribution $q$. Since $\log p(x)$ has no $z$, it is deterministic/constant, so the expectation can be dropped.
Edit for clarification: throughout the derivation, $x$ is constant. However, $z$ is a random variable. In the derivation above, the density of $z$ is $q$. [Note that it is important to clarify this because $z$ can follow other distributions. For example, in the latent variable model the distribution of $z$ is $p$, not $q$.]
So, $\mathbb{E}_q[z]$ is just the expectation of $z$ when it follows the distribution $q$. More generally, for any function $f$, $\mathbb{E}_q[f(z)]$ is the expectation of $f(z)$ when $z$ follows the distribution $q$. For example, you begin with $\mathbb{E}_q[\log p(z \mid x)]$ which is a special case where $f(z):= \log p(z \mid x)$.
Now, if $c$ is some constant (deterministic, does not depend on the random variable $z$), then $\mathbb{E}_q[c]=c$. This is the case here with $c=\log p(x)$; since $x$ is constant, $\log p(x)$ is a constant.
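A quick numerical sanity check makes this concrete. The sketch below uses a hypothetical toy model (a 3-state latent $z$ and arbitrary made-up probabilities, not taken from the video) to verify both facts: that $\mathbb{E}_q[\log p(x)] = \log p(x)$, and that the expanded form $\mathbb{E}_q[\log q(z)] - \mathbb{E}_q[\log p(z,x)] + \log p(x)$ equals the KL divergence computed directly.

```python
import numpy as np

# Hypothetical toy model: z takes 3 values, x is fixed (observed).
# The joint values p(z, x) for this fixed x are chosen arbitrarily
# for illustration.
p_zx = np.array([0.10, 0.25, 0.15])   # p(z, x) for the fixed x
p_x = p_zx.sum()                      # p(x) = sum_z p(z, x)
p_z_given_x = p_zx / p_x              # p(z | x) = p(z, x) / p(x)

q = np.array([0.2, 0.5, 0.3])         # some variational distribution q(z)

# E_q[log p(x)]: log p(x) does not depend on z, so the expectation
# is sum_z q(z) * log p(x) = log p(x), because q sums to 1.
E_q_log_px = np.sum(q * np.log(p_x))
assert np.isclose(E_q_log_px, np.log(p_x))

# KL(q || p(z|x)) computed directly from its definition ...
kl_direct = np.sum(q * np.log(q / p_z_given_x))

# ... and via the expansion E_q[log q(z)] - E_q[log p(z,x)] + log p(x)
kl_expanded = np.sum(q * np.log(q)) - np.sum(q * np.log(p_zx)) + np.log(p_x)
assert np.isclose(kl_direct, kl_expanded)
```

Note that the first term, $\mathbb{E}_q[\log p(z,x)]$, genuinely depends on $z$ inside the log, so there the expectation cannot be dropped; only the constant $\log p(x)$ pulls out.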