I have been reading blog posts on expectation. I am quite familiar with what it is, but I still don't understand what we infer from it: what does it tell us about the experiment?
For example, the expected number of coin flips to get two consecutive heads is 6. So what does this "6" say? Is it the most probable number of coin flips needed to get two consecutive heads? But then again, the most probable value is different from the expected value...
Can someone please explain it?
PS: Please recommend a good book/blog that covers expectation in depth.
Suppose you repeat your random experiment a large number of times, each time observing the value of a random variable $X$. The expected value of $X$ is your prediction for the average of all these observed values.
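You can see this prediction at work on the example from the question. Here is a minimal simulation sketch (the function name and the number of trials are my own choices) that flips a fair coin until two consecutive heads appear and averages the flip counts over many repetitions:

```python
import random

random.seed(0)

def flips_until_two_heads():
    """Flip a fair coin until two consecutive heads appear; return the flip count."""
    count = 0
    streak = 0  # current run of consecutive heads
    while streak < 2:
        count += 1
        if random.random() < 0.5:  # heads
            streak += 1
        else:                      # tails resets the streak
            streak = 0
    return count

N = 100_000
avg = sum(flips_until_two_heads() for _ in range(N)) / N
print(avg)  # close to the expected value 6
```

No single trial takes "6 flips on average"; the 6 is a prediction for the average over many trials, which is exactly what the simulation reports.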
Here are some more details in the case where $X$ is a discrete random variable whose possible values are $x_1, \ldots, x_m$. Suppose that we repeat our random experiment $N$ times. Let $n_i$ be the number of trials for which $X$ is equal to $x_i$. Then the average of the $N$ observed values of $X$ is $$ A =\frac{1}{N} \sum_{i=1}^m n_i x_i = \sum_{i=1}^m \frac{n_i}{N} x_i. $$ What would be a reasonable prediction for this average value? Our predicted value for $n_i/N$ is $p_i = P(X = x_i)$. So, it makes sense to predict that the value of $A$ will be $\sum_{i=1}^m x_i p_i$.
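The discrete argument above can be checked numerically. The following sketch (using a fair die as an illustrative choice of $X$) compares the observed average $A$, computed from the empirical counts $n_i$, with the prediction $\sum_i x_i p_i$:

```python
import random
from collections import Counter

random.seed(1)

values = [1, 2, 3, 4, 5, 6]        # possible values x_i of a fair die
p = {x: 1 / 6 for x in values}     # p_i = P(X = x_i)

N = 100_000
rolls = [random.choice(values) for _ in range(N)]
counts = Counter(rolls)            # n_i = number of trials with X = x_i

A = sum(counts[x] * x for x in values) / N   # observed average (1/N) * sum n_i x_i
prediction = sum(x * p[x] for x in values)   # sum x_i p_i = 3.5
print(A, prediction)
```

For large $N$ each frequency $n_i/N$ is close to $p_i$, so $A$ lands close to the predicted 3.5.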
What if $X$ is not discrete? We could consider separately the case where $X$ is continuous with density function $f$, and we would discover that $\int_{-\infty}^\infty x f(x) \, dx$ is a reasonable prediction for the value of $A$. But, what about random variables that are neither discrete nor continuous? It would be nice to have one method that works in all cases.
Here is a similar but more general approach to predicting the value of $A$ that makes sense for any random variable $X$. Chop up the real line into extremely tiny intervals $[x_i, x_{i+1})$. Let $n_i$ be the number of trials for which $X \in [x_i, x_{i+1})$. The average value $A$ of the $N$ observed values of $X$ can be approximated as $$ A \approx \sum_{i=-\infty}^\infty \frac{n_i}{N} x_i. $$ Our prediction for the value of $n_i/N$ is $P(X \in [x_i, x_{i+1}))$. So our prediction for the value of $A$ is $$ \sum_{i=-\infty}^\infty x_i P(X \in [x_i, x_{i+1})). $$ If we were to chop up the real line more and more finely, we would obtain better and better predictions for the value of $A$. The limit of these predictions is a number that is denoted $$ \int X dP. $$ This number, also denoted $E(X)$, is our best prediction for the average value of $X$ over a large number of trials.
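To illustrate the chopping-up idea on a random variable that is neither discrete nor continuous, here is a sketch (the mixed distribution is my own example: $X = 0$ with probability $1/2$, otherwise $X \sim \mathrm{Uniform}(0,1)$, so $E(X) = 1/4$). It evaluates the approximating sum $\sum_i x_i \, P(X \in (x_i, x_{i+1}])$ from the CDF, using half-open intervals on the other side for convenience; the limit is the same:

```python
def cdf(x):
    """P(X <= x) for: X = 0 with prob 1/2, else X ~ Uniform(0, 1)."""
    if x < 0:
        return 0.0
    return 0.5 + 0.5 * min(x, 1.0)

width = 1e-4
approx = 0.0
a = -width  # start just below 0 so the point mass at 0 is captured
while a < 1.0:
    b = a + width
    approx += a * (cdf(b) - cdf(a))  # x_i * P(X in (x_i, x_{i+1}])
    a = b
print(approx)  # close to E(X) = 0.25
```

Shrinking `width` makes the sum converge to $\int X \, dP = 0.25$, the Lebesgue integral described above.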
By the way, this number $\int X dP$ is called the (Lebesgue) integral of $X$ with respect to the probability measure $P$. The above thought process is a natural way to discover the Lebesgue integral.