I'm reading about information criteria and I came across an example where the author tries to approximate a true data-generating normal density $g(x|\mu_0, \sigma^2_0)$ with an approximating model:
$$f(x|\mu,\sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right).$$
The goal is to have $f(x|\mu,\sigma^2)\approx g(x|\mu_0, \sigma^2_0)$. Anyway, in the text I'm reading the author then writes the following:
$$\log f(x|\mu,\sigma^2) = -\frac{1}{2}\log2\pi\sigma^2-\frac{(x-\mu)^2}{2\sigma^2},$$
which is completely clear. But the next step is where I run into trouble; he writes:
$$E_g[\log f(x|\mu,\sigma^2)] = -\frac{1}{2}\log2\pi\sigma^2-\frac{\sigma^2_0+(\mu-\mu_0)^2}{2\sigma^2},$$
where the expectation is taken with respect to the true distribution $g$. It therefore seems to me that:
$$E_g\left[-\frac{(x-\mu)^2}{2\sigma^2}\right] = -\frac{\sigma^2_0+(\mu-\mu_0)^2}{2\sigma^2}.$$
I don't immediately see why this is true, so I tried to show it myself:
$$E_g\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]=-\frac{1}{2\sigma^2}\left(E_g\left[x^2-2x\mu+\mu^2\right]\right) = -\frac{1}{2\sigma^2}\left(E_g\left[x^2\right]-2\mu E_g\left[x\right]+E_g\left[\mu^2\right]\right)$$
$$=-\frac{1}{2\sigma^2}\left(E_g\left[x^2\right]-2\mu\mu_0+\mu^2\right).$$
In the above I have used the assumptions that $E_g\left[x\right]=\mu_0$ and $E_g\left[(x-\mu_0)^2\right]=\sigma^2_0$. So now I need to show that:
$$-\frac{1}{2\sigma^2}\left(E_g\left[x^2\right]-2\mu\mu_0+\mu^2\right)=-\frac{\sigma^2_0+(\mu-\mu_0)^2}{2\sigma^2},$$
and now I start to doubt my derivation.
Question: Am I going in the right direction? Is there some mistake in my reasoning here that I have made?
P.S. If you need more information, please let me know. (Sorry, I had a small mistake in the equation; it is fixed now.)
UPDATE: to see the reference where my question originates, please see page 62 of the book *Information Criteria and Statistical Modeling*.
Note that
$$E_g[x^2] = \sigma_0^2 + \mu_0^2.$$
Therefore,
$$E_g[x^2] - 2 \mu \mu_0 + \mu^2 = \sigma_0^2 + \mu_0^2 - 2 \mu \mu_0 + \mu^2 = \sigma_0^2 + (\mu - \mu_0)^2,$$
and it holds that
$$E_g\left[-\frac{(x-\mu)^2}{2\sigma^2}\right] = - \frac{\sigma^2_0+(\mu-\mu_0)^2}{2\sigma^2}.$$
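As a quick numerical sanity check (my own sketch, not from the book; the parameter values are arbitrary), the closed form above can be compared against a Monte Carlo estimate:

```python
import numpy as np

rng = np.random.default_rng(0)

# True distribution g: N(mu0, sigma0^2); model parameters mu, sigma
# (values chosen arbitrarily for illustration)
mu0, sigma0 = 1.0, 2.0
mu, sigma = 0.5, 1.5

# Draw a large sample from the true distribution g
x = rng.normal(mu0, sigma0, size=2_000_000)

# Monte Carlo estimate of E_g[-(x - mu)^2 / (2 sigma^2)]
mc = np.mean(-(x - mu) ** 2 / (2 * sigma**2))

# Closed form: -(sigma0^2 + (mu - mu0)^2) / (2 sigma^2)
closed = -(sigma0**2 + (mu - mu0) ** 2) / (2 * sigma**2)

print(mc, closed)  # the two values should agree to a few decimal places
```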
This agrees with equation (3.7) in the book you are referring to. However, when there is a relation between $\sigma_0^2$ and $\sigma^2$, the formula can be reduced further.
It seems that in the book the specified model takes $\sigma^2$ to be the empirical variance. Try writing that in terms of the true variance $\sigma_0^2$.
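If I am reading that hint correctly (the following reduction is my own illustration, not taken from the book): in the large-sample limit the empirical estimates converge to $\mu_0$ and $\sigma_0^2$, so substituting $\mu = \mu_0$ and $\sigma^2 = \sigma_0^2$ into the expression above collapses it to

$$E_g[\log f(x|\mu_0,\sigma_0^2)] = -\frac{1}{2}\log 2\pi\sigma_0^2 - \frac{\sigma_0^2}{2\sigma_0^2} = -\frac{1}{2}\log 2\pi\sigma_0^2 - \frac{1}{2},$$

which is the negative entropy of the true normal density.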