Idea of Mean or Expectation value from statistical to probability theory


For a discrete random variable $X$ taking values $x_1, x_2, \ldots, x_n$ with probabilities $p_1, p_2, \ldots, p_n$, the mean or expectation value of $X$ is defined as $$ \operatorname E(X)=\mu=\sum_{i=1}^n x_i p_i=x_1 p_1+x_2 p_2+\cdots+x_n p_n $$
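For instance (a fair six-sided die, added here only as an illustration and not part of the original question), the definition gives $$ \operatorname E(X)=\sum_{i=1}^{6} i\cdot\frac{1}{6}=\frac{1+2+\cdots+6}{6}=\frac{21}{6}=3.5. $$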

Can I have a derivation of this formula from the statistical definition of the mean?

My Understanding

The mean of statistical data is defined as $$ \bar{x}=\frac{\sum_{i=1}^n x_if_i}{\sum_{i=1}^n f_i} = \frac{\sum_{i=1}^n x_i f_i}{n} $$ The first expression defines the mean of data points that could be obtained if we perform a random experiment, i.e. the mean obtained from theoretical predictions using probability theory, and the latter expresses the mean of the data points obtained after performing a random experiment, i.e. the mean obtained after the random experiment is performed. Please correct me if I am wrong!

My Attempt

Consider a random experiment in which $X$ is a random variable taking values $\{x_1, x_2, \ldots, x_n\}$ with probabilities $p_i=P(x_i)$. Suppose we conduct $n(S)=N$ independent trials, and the outcomes of the experiment are $\{ x^1, x^2, \ldots, x^N\}$. For large $N$, $$ p_i \approx\frac{n(x_i)}{n(S)}=\frac{n(x_i)}{N}\implies n(x_i)\approx N p_i $$ So the frequency of occurrence of $x_i$ is $f_i\approx N p_i$, and $$ \bar{x}=\frac{\sum_{j=1}^N x^j}{N} = \frac{\sum_{i=1}^n f_i x_i}{N} \approx \frac{\sum_{i=1}^n N p_i x_i}{N} = \sum_{i=1}^n p_i x_i = \operatorname E(X) $$
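The heuristic above can also be checked numerically. Here is a minimal Python sketch (the particular values, probabilities, and trial count are illustrative choices, not taken from the question):

```python
import random

# Illustrative discrete random variable: values x_i with probabilities p_i
values = [1, 2, 3, 4]
probs = [0.1, 0.2, 0.3, 0.4]

# Theoretical expectation E(X) = sum_i p_i x_i
expectation = sum(p * x for p, x in zip(probs, values))  # 3.0

# Simulate N independent trials and form the frequency-based sample mean
random.seed(0)
N = 100_000
outcomes = random.choices(values, weights=probs, k=N)
freq = {x: outcomes.count(x) for x in values}          # f_i = n(x_i)
sample_mean = sum(x * f for x, f in freq.items()) / N  # sum_i f_i x_i / N

print(expectation)           # 3.0
print(sample_mean)           # close to 3.0 for large N
```

For large $N$ the frequency-weighted sample mean lands close to $\operatorname E(X)$, which is exactly the approximation step in the derivation.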

Is there a better explanation/derivation of the expression for the expectation value in probability theory, or how else can one show that both expressions mean the same thing?


BEST ANSWER

You can't exactly show how to derive the population mean $\mu$ of a discrete random variable $X$ from the sample mean $\bar X$ of $n$ realizations of the random variable because $\mu$ and $\bar X$ are different things. However, a modification of the displayed equation at the end of your Question can show that the principle behind the two is the same. My approach is to give a slightly better version of your last equation.

First, if the observations $X_1, X_2, \dots, X_n$ are put into 'frequency-value' format, you have a frequency $f_j$ for each value $v_j.$ For example, if the $n = 10$ observations are $X = (1, 3, 2, 3, 4, 5, 2, 1, 1, 5),$ then you have $k = 5$ values $(v_j)$ 1,2,3,4,5 with respective frequencies $(f_j)$ 3, 2, 2, 1, 2. Because there are 10 observations, the $f_j$'s must sum to 10. [Notice that I use different symbols for $X_i$'s and $v_j$'s in order to avoid confusion. And for convenience we assume that our sample of ten values has taken on all possible population values; otherwise we would have to include extra $v_j$'s with frequencies $f_j = 0$.]

There are two ways to compute $\bar X.$ First,

$$\bar X = \frac{1}{n}\sum_{i=1}^{n} X_i = \frac{1}{10}(1+3+2+3+4+5+2+1+1+5) = 27/10 = 2.7.$$

Second,

$$\bar X = \frac{\sum_{j=1}^k f_jv_j}{\sum_{j=1}^k f_j} = \frac{3(1)+2(2)+2(3)+1(4)+2(5)}{3+2+2+1+2} = 27/10 = 2.7.$$

Using the second expression for the sample mean, one can write

$$\bar X = \frac{\sum_{j=1}^k f_jv_j}{\sum_{j=1}^k f_j} = \frac{\sum_{j=1}^k f_jv_j}{n} = \sum_{j=1}^k (f_j/n)v_j = \sum_{j=1}^k r_jv_j,$$ where the last sum uses the notation $r_j = f_j/n$ for relative frequencies.
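The identity $\bar X = \sum_j r_j v_j$ can be verified directly on the ten observations above (sketched in Python for illustration, although the answer's own simulations are in R):

```python
from collections import Counter

obs = [1, 3, 2, 3, 4, 5, 2, 1, 1, 5]   # the n = 10 observations from the example
n = len(obs)

# Plain arithmetic mean: (1/n) * sum of observations
mean_direct = sum(obs) / n             # 2.7

# Frequency-value form: sum_j (f_j/n) * v_j = sum_j r_j * v_j
freq = Counter(obs)                    # f_j for each distinct value v_j
mean_weighted = sum((f / n) * v for v, f in freq.items())

print(mean_direct, mean_weighted)      # both approximately 2.7
```

Both computations give the same number, since regrouping the sum by distinct values changes nothing.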

Now imagine larger and larger samples as the sample size $n$ approaches the population size $N.$ Then the values $v_j$ remain the same and are called $x_j$'s again to represent the various values the random variable $X$ can assume. Also, by the Law of Large Numbers, the $r_j$'s converge to probabilities $p_j= P(X = x_j).$ Then we can say that the sample mean $\bar X_n$ of $n$ observations converges ("in probability") to the population mean $\mu.$ We might write this as $$\text{plim}_{n\rightarrow\infty}\bar X_n = \mu = E(X) = \sum_j p_jx_j = \sum_j x_jP(X=x_j),$$ where the sums are taken over all the possible values in the distribution.


As a practical example, consider $X \sim \mathsf{Binom}(n=4, p = 1/2),$ the distribution of the number of Heads when four fair coins are tossed. A sample of size $m = 10$ might have values $x = (3, 0, 4, 3, 1, 0, 2, 3, 2, 1).$ The frequencies of the five possible values are 2, 2, 2, 3, 1, respectively. The sample mean is $\bar X = 19/10 = 1.9.$ We know that the mean of the binomial distribution is $E(X) = \mu = np = 4(1/2) = 2.$ So $\bar X_{10}$ is close to $E(X),$ but not exactly equal. (Simulation in R statistical software.)

 x = rbinom(10, 4, .5)
 x
 [1] 3 0 4 3 1 0 2 3 2 1
 mean(x)
 [1] 1.9

Now take a very large sample of size $m = 100{,}000$ (that's like playing $m$ four-toss games). Here are the results from one simulation. Notice, in particular, that $\bar X_{100000} = 2.00035 \approx E(X) = 2.$

 x = rbinom(10^5, 4, .5)
 mean(x)
 [1] 2.00035  # sample mean is approx. E(X) = 2
 table(x)
 x
    0     1     2     3     4 
 6151 24994 37696 24987  6172  # frequencies of each value of X 
 table(x)/10^5
 x
       0       1       2       3       4 
 0.06151 0.24994 0.37696 0.24987 0.06172 # relative frequencies

A histogram of the results is shown below [blue bars] along with the exact probabilities from $\mathsf{Binom}(4, .5)$ [red dots]. The relative frequencies of the sample of size 100,000 are not quite exactly the same as the binomial probabilities, but the differences are too small to see clearly at the resolution of the figure. (In this case the Density scale of the histogram is the same as a Relative Frequency scale, because bars are of width $1.$)

[Figure: histogram of the simulated sample (blue bars) with the exact $\mathsf{Binom}(4, .5)$ probabilities overlaid (red dots).]

ANSWER

If I understand you correctly, then you are asking a rather difficult question.

Given a sequence of iid samples $X_1,X_2,\dots$ from a discrete random variable $X$, let $\overline X_n=(X_1+\dots+X_n)/n$ be the sample mean of the first $n$ samples, and $E[X]=\sum_{i} p_i x_i$, where $x_i$ are the possible values of $X$. You are trying to prove that $$ \overline X_n\approx EX $$ This approximation can only hold when $n$ is large. There are two ways I can think to interpret this:

  • $\overline X_n$ is a random variable which is tightly concentrated around $EX$. This is easy to prove, provided $\def\V{\operatorname{Var}}\V X$ exists; straightforward calculations show $E\overline X_n=EX$ and $\V \overline X_n=\frac1n \V X$, and Chebyshev's inequality then gives the concentration. This result is known as the Weak Law of Large Numbers.

  • As $n\to\infty$, the observations $\overline X_n$ will converge to $EX$ almost surely. This is known as the Strong Law of Large Numbers, and it is a bit more difficult to prove.
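The shrinking variance in the first bullet can be illustrated numerically. A minimal Python sketch (the fair-coin variable and the replication counts are illustrative choices):

```python
import random
import statistics

random.seed(1)
values, probs = [0, 1], [0.5, 0.5]   # illustrative fair-coin variable, Var X = 0.25

def sample_mean(n):
    """Mean of n iid draws from the illustrative distribution."""
    return sum(random.choices(values, weights=probs, k=n)) / n

# Empirical variance of the sample mean for several n: shrinks like Var(X)/n
emp = {}
for n in [10, 100, 1000]:
    means = [sample_mean(n) for _ in range(2000)]
    emp[n] = statistics.variance(means)
    print(n, round(emp[n], 5))       # roughly 0.25/n
```

Each tenfold increase in $n$ cuts the variance of $\overline X_n$ by roughly a factor of ten, as $\operatorname{Var}\overline X_n=\frac1n\operatorname{Var}X$ predicts.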

ANSWER

If you look around in mathematics, we actually use the word "mean" for quite a few things: the arithmetic mean of a list of numbers, the geometric mean, the harmonic mean, the Mean Value Theorem, and perhaps a few others I don't know or didn't think of. Now in addition to all these other meanings of "mean" we have the expected value (or mean) of a random variable, the sample mean, and the population mean.

The sample mean is just that, the mean (specifically, the arithmetic mean) of the values you found in a sample.

The population mean is a somewhat different kind of thing from the mean of a random variable, because the idea of a population is that it is finite, so you cannot just keep taking larger and larger samples. Eventually you sample the whole population and you cannot get a larger sample; also, when you do that, your sample mean is (by definition) exactly the same as your population mean, not just an estimator of it.

But you can also have a kind of sample mean relative to a random variable: if you observe $n$ independent random variables with the same distribution, the sample mean will tend to be about the same as the mean of the random distribution. That is, just as you surmised, $\bar x \approx E(X).$ For any reasonably well-behaved random variable (there are some bizarre exceptions), the larger a sample you take, the more assurance you have of getting a close approximation. You can spend a whole semester in an undergraduate probability course working up toward this fact.

So I would say the definition of expectation in probability theory is independent of sample means in a formal mathematical sense; but it does relate to some ideas about the "real life" meaning of random variables, and it is also true that a sample mean is a good estimator of the expected value of the distribution from which the sample is drawn. So the connection you made is no accident.