I am reading section 20.10.3 of the book *Deep Learning*, on variational autoencoders, where the authors write:
To generate a sample from the model, the VAE first draws a sample $z$ from the code distribution $p_{model}(z)$. The sample is then run through a differentiable generator network $g(z)$. Finally, $x$ is sampled from a distribution $p_{model}(x;g(z)) = p_{model}(x | z)$. During training, however, the approximate inference network (or encoder) $q(z | x)$ is used to obtain $z$, and $p_{model}(x | z)$ is then viewed as a decoder network. The key insight behind variational autoencoders is that they can be trained by maximizing the variational lower bound $L(q)$ associated with data point $x$:
$$L(q) = \mathbb{E}_{z\sim q(z|x)}\log p_{model}(z, x) + H(q(z|x))$$
$$= \mathbb{E}_{z\sim q(z|x)}\log p_{model}(x|z) - D_{KL}(q(z|x)\,\|\,p_{model}(z))$$
$$\le \log p_{model}(x)$$
I'm not sure about the last inequality. I know that the KL divergence is always non-negative, but I'm not sure how the remaining term relates to the right-hand side, since the expectation over $z$ is still present. By definition, $\mathbb{E}_{z\sim q(z|x)}\log p_{model}(x | z) = \sum_{z \in Z} q(z|x) \log p_{model}(x | z)$, but how does this relate to $\log p_{model}(x)$?
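As a sanity check, the inequality itself is easy to verify numerically on a tiny discrete model. The distributions below (a three-valued $z$, a binary $x$, and an arbitrary approximate posterior $q(z|x)$) are made up purely for illustration; the point is only that the ELBO never exceeds $\log p_{model}(x)$, whatever $q$ is:

```python
import numpy as np

# Hypothetical discrete model: z takes 3 values, x is binary, and we
# condition on the observation x = 1.
p_z = np.array([0.5, 0.3, 0.2])          # prior p(z)
p_x_given_z = np.array([0.9, 0.4, 0.1])  # likelihood p(x=1 | z)
q_z_given_x = np.array([0.6, 0.3, 0.1])  # arbitrary approximate posterior q(z | x=1)

# Exact log-evidence: log p(x) = log sum_z p(z) p(x|z)
log_px = np.log(np.sum(p_z * p_x_given_z))

# ELBO: E_{z~q}[log p(x|z)] - KL(q(z|x) || p(z))
#     = sum_z q(z|x) [log p(x|z) + log p(z) - log q(z|x)]
elbo = np.sum(q_z_given_x * (np.log(p_x_given_z)
                             + np.log(p_z)
                             - np.log(q_z_given_x)))

assert elbo <= log_px  # the variational lower bound holds
print(elbo, log_px)
```

The gap between the two numbers is exactly $D_{KL}(q(z|x)\,\|\,p_{model}(z|x))$, which vanishes only when $q$ equals the true posterior.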