I am reading Pattern Recognition and Machine Learning by Christopher Bishop and I am having a hard time tracking down an explanation of this formula.
In chapter 1 he states:

> we can readily find expectations of functions of $x$ under the Gaussian distribution; in particular, the average value of $x$ is given by
> $$ \mathbb{E}[x] = \int_{-\infty}^\infty N(x \mid \mu, \sigma^2)\, x \, dx = \mu $$
How can I explain why this is true?
First consider a discrete variable taking $n$ values $x_1$ to $x_n$. Its expected value is its average over all the values it can take. If each value is equally probable (probability $1/n$), this is simply $\frac{1}{n} \sum_i x_i$. But the values may not be equiprobable. In that case you weight each value by its probability (which was $1/n$ in the earlier case) and sum. For a discrete random variable, the probabilities are given by the probability mass function $f_X(x)$, so

$$ E(X) = \sum_x x \, f_X(x). $$

For a continuous random variable the expression is analogous: the probability mass function is replaced by the probability density function $f(x)$, and the sum by an integral, giving

$$ E(X) = \int x f(x) \, dx. $$

To see why this equals $\mu$ for the Gaussian, substitute $f(x) = N(x \mid \mu, \sigma^2)$ and change variables with $y = x - \mu$. The integral splits into $\int y \, N(y \mid 0, \sigma^2) \, dy + \mu \int N(y \mid 0, \sigma^2) \, dy$; the first term vanishes because the integrand is odd, and the second equals $\mu \cdot 1 = \mu$ since the density integrates to one.
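As a numerical sanity check, here is a short Python sketch that approximates $\int x \, N(x \mid \mu, \sigma^2) \, dx$ with a midpoint Riemann sum (the integration window of $\pm 10\sigma$ and the step count are arbitrary choices; the tails beyond that contribute negligibly):

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def expectation(mu, sigma, n=200_000):
    """Approximate E[x] = integral of x * N(x | mu, sigma^2) dx
    with a midpoint Riemann sum over [mu - 10*sigma, mu + 10*sigma]."""
    lo, hi = mu - 10 * sigma, mu + 10 * sigma
    dx = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * dx  # midpoint of the i-th subinterval
        total += x * gaussian_pdf(x, mu, sigma) * dx
    return total

print(expectation(2.0, 1.5))  # close to mu = 2.0
```

The same loop with the weight dropped (integrating the density alone) returns a value close to 1, which is the normalization fact used in the symmetry argument above.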