I read the following in Stein and Shakarchi's *Fourier Analysis*, where they discuss the notion of the expectation of a probability density.
"Consider the simpler (idealized) situation where we are given that the particle can be found at only finitely many different points, $x_1, x_2, \ldots , x_N$ on the real axis, with $p_i$ the probability that the particle is at $x_i$, and $p_1 +p_2 + \ldots + p_N =1$. Then, if we knew nothing else, and were forced to make one choice as to the position of the particle, we would naturally take $x = \Sigma x_i p_i$, which is the appropriate weighted average of the possible positions."
This makes no sense to me, for the following reason: suppose that $x_1 = -1$ and $x_2 = 1$, with $p_1 = p_2 = 1/2$. Then $x = (-1)\cdot\tfrac{1}{2} + 1\cdot\tfrac{1}{2} = 0$, so their logic dictates that we should pick $0$ as our best guess for where the particle is. But this makes no sense, because the particle cannot appear at $0$... so I certainly wouldn't pick it.
I mean, $x$ is generically not even in the set of possible positions, unless there is only one point... help? What do they mean?
The arithmetic mean of a distribution is precisely the guess that minimizes the expected square of the error. It all makes sense if you imagine being penalized one dollar per unit of squared error. So if you know that tossing 3 fair coins gives the usual distribution for the number of heads, your best guess is 1.5, even though that is an impossible outcome: it minimizes the expected squared error.

If you change the penalty system, you change the optimal guess. For example, if there is a penalty of \$1000 for guessing a number larger than what actually happens, and no penalty for guessing a lower one, then you would guess 0, since that minimizes your expected penalty.

This also shows up in real life, where people bias their estimates of things according to the penalties for getting them wrong in one direction or the other. It is loosely related to game theory.
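To make the squared-error claim concrete (this is the standard computation, not anything specific to the book): for a random position $X$ and a guess $a$,

$$\mathbb{E}\big[(X-a)^2\big] = \mathbb{E}[X^2] - 2a\,\mathbb{E}[X] + a^2,$$

which is a quadratic in $a$ minimized at $a = \mathbb{E}[X] = \sum x_i p_i$. In your two-point example, guessing $0$ gives squared error $1$ with certainty, while guessing $1$ (or $-1$) gives squared error $4$ half the time and $0$ the other half, for an expected squared error of $2$. Under this loss, the "impossible" guess $0$ really is the better one.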
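And if you want to check it numerically, here is a minimal Python sketch (my own illustration, not part of the original argument; the list of candidate guesses is arbitrary) comparing the two penalty schemes on the 3-coin example:

```python
import random

random.seed(0)
trials = 100_000

# Number of heads in 3 fair coin tosses, sampled by simulation.
outcomes = [sum(random.random() < 0.5 for _ in range(3)) for _ in range(trials)]

def mean_squared_error(guess):
    # Average of (outcome - guess)^2: the "one dollar per squared error" penalty.
    return sum((x - guess) ** 2 for x in outcomes) / trials

def mean_asymmetric_penalty(guess):
    # $1000 whenever the guess exceeds the outcome, nothing otherwise.
    return sum(1000 for x in outcomes if guess > x) / trials

guesses = [0, 0.5, 1, 1.5, 2, 3]
print(min(guesses, key=mean_squared_error))       # 1.5 -- the mean, an impossible outcome
print(min(guesses, key=mean_asymmetric_penalty))  # 0   -- never overshoots
```

Running it reproduces the two claims above: under squared error the optimal guess is the mean 1.5, while under the one-sided \$1000 penalty the optimal guess drops to 0.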