When I was introduced to the concept of expected value, I learnt it as an extension of the calculation of arithmetic mean (I will illustrate this process below).
-Start of illustration-
Suppose two fair coins are tossed 6 times and X is the number of heads that occur per toss of the 2 coins - then the values of X are 0, 1 and 2. Suppose that the experiment yields 0 heads, 1 head and 2 heads a total of 1, 3, and 2 times respectively. Then the average number of heads per toss of the two coins is:
$\displaystyle \frac{(0)(1)+(1)(3)+(2)(2)}{6} \approx 1.17$ (3 s.f.)
We can then re-express this average as
$\displaystyle (0)(\frac{1}{6})+(1)(\frac{3}{6})+(2)(\frac{2}{6})$
Recognising that the fractions $\displaystyle \frac{1}{6}, \frac{3}{6}, \frac{2}{6}$ are the relative frequencies for the different values of X in the experiment, this was generalised to lead up to the formula for expectation (at least for discrete random variables):
$E(X)= \sum_{\text{all }x} x \cdot P(X=x)$
-End of illustration-
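To sanity-check the arithmetic in the illustration, here is a short Python sketch (using the example's counts of 1, 3 and 2) showing that "total divided by n" and "sum of value times relative frequency" are the same computation:

```python
# Counts from the illustration: 0 heads occurred 1 time, 1 head 3 times, 2 heads 2 times.
values = [0, 1, 2]
counts = [1, 3, 2]
n = sum(counts)  # 6 tosses of the pair of coins

# Average computed the usual way: total number of heads divided by number of tosses.
mean_direct = sum(v * c for v, c in zip(values, counts)) / n

# The same number, rewritten as a weighted sum of values times relative frequencies.
mean_weighted = sum(v * (c / n) for v, c in zip(values, counts))

assert abs(mean_direct - mean_weighted) < 1e-12
print(round(mean_direct, 2))  # 1.17
```

The two expressions differ only in where the division by $n$ happens: outside the sum, or folded into each weight.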
I could understand the derivation of the formula, but I am slightly uncomfortable with the concept of "adding things up to get an average" - intuitively, we need to add things up AND divide to get an average value. I understand that, in the expectation formula, this division is implicitly embedded, since each value is multiplied by its probability (a fraction, whose denominator carries the "division"). Nevertheless, the counter-intuitive idea of "adding up all the possible values" (albeit with each multiplied by its probability of occurring) just does not sit comfortably. Addition should not give an average! This discomfort is compounded by the fact that expectation is called the "weighted average", not the "weighted sum" (or at least, far less commonly). If the emphasis is on summation, why don't we call it the "weighted sum"?
Would really appreciate the advice of anyone who can help me see why "addition can give average" in an intuitive manner, or at least, comment on my confusion, thank you!
Suppose $X$ is a discrete random variable and the possible values of $X$ are $x_1, \ldots, x_m$. Let $p_i = P(X=x_i)$ for $i = 1, \ldots, m$.
Imagine that we repeat our random experiment $N$ times, each time observing a new value of $X$. We would like to predict the average of all these observed values of $X$. Let $N_i$ be the number of trials for which $X = x_i$. Then the average of the observed values of $X$ is $$ \frac{1}{N} \sum_{i=1}^m N_i x_i = \sum_{i=1}^m x_i \frac{N_i}{N}. $$ But we would predict that $N_i/N$ will be equal to $p_i$. So, it seems reasonable to predict that the average of the observed values of $X$ will be equal to $$ \sum_{i=1}^m x_i p_i. $$ This motivates the definition of the expected value of $X$. (Maybe we should really call it the "long run average value" of $X$, or just the average value of $X$.)