Understanding the proof of sample mean being unbiased estimator of population mean in SRSWOR.

Question


5.5k Views Asked by user142971 At 01 Jul 2025 - 2:02

I was reading about the proof of the sample mean being the unbiased estimator of population mean. Here is the concerned derivation:

Let us consider the simple arithmetic mean $\bar y = \frac{1}{n}\,\sum_{i=1}^{n} y_i$ as an unbiased estimator of population mean $\overline Y = \frac{1}{N}\,\sum_{i=1}^{N} Y_i$.

Simple Random Sampling Without Replacement

Let $t_i = \sum_{i=1}^n y_i\;.$

\begin{align}\mathrm E(\bar y)&= \frac{1}{n}\,\mathrm E{\left(\sum_{i=1}^n y_i\right)}\\&= \frac{1}{n}\,\mathrm E(t_i)\\ &= \frac{1}{n}\color{red}{\left(\frac{1}{N \choose n}\,\sum_{i=1}^{N \choose n}t_i\right)}\\ &= \frac{1}{n}\left(\frac{1}{N \choose n}\,\sum_{i=1}^{N \choose n}\left(\sum y_i\right)\right)\\ &= \frac{1}{n}\left(\frac{1}{N \choose n}\,\color{red}{{N-1\choose n-1}\sum_{i=1}^N y_i}\right)\\ &=\frac{1}{N}\sum_{i=1}^{N}y_i\\ &= \overline Y\;.\end{align}

I couldn't understand the derivation; I really couldn't conceive how the red coloured terms came from nowhere.

Can anyone please help me explain the derivation by showing how the red coloured terms came in the concerned steps?


There are 2 best solutions below

DomB On 03 Apr 2016 - 8:28

I have been working on a question for this forum. During the process I was able to solve my own problem. However, I thought it might be useful if I shared a proof encountered in William Cochran (1977, p. 22), Sampling Techniques. He proves that the sample mean $\bar y$ is an unbiased estimate of the population mean $\mu$ under simple random sampling (SRS) without replacement. An estimator is unbiased if the average value of the estimate, taken over all possible samples of a given size $n$, is exactly equal to the true population value.

To investigate whether $\bar y$ is unbiased with SRS, he calculates the value of $\bar y$ for all $_N C _n$ samples in order to find the average of the estimates. Here $_N C _n$ is the number of distinct samples of size $n$ that can be drawn from a population of size $N$, i.e. the size of the sample space (see tree diagram). His starting point is:

$$E(\bar y) = \frac{\sum \bar y}{_N C _n} = \frac{\sum (y_1+y_2+y_3+\cdots+y_n)}{n\,[N!/n!(N-n)!]}$$

Here the denominator $\frac{nN!}{n!(N-n)!}$ is $n$ times the number of possible samples, and the sum in the numerator runs over all of those samples. For example, if N = 4 and n = 2, then there are $_4 C _2 = 6$ possible samples and the denominator is $2 \times 6 = 12$.

To evaluate this sum, we find out in how many samples any specific value $y_i$ appears. Since there are (N-1) other units available for the rest of the sample and (n-1) other places left in our sample, the number of samples containing $y_i$ is

$$_{N-1} C _{n-1} = \frac{(N-1)!}{(n-1)!(N-n)!}$$

For example, if N = 4 (i.e. 1,2,3,4) and n = 2 then you have 3 potential outcomes that contain 1 (i.e. 1 & 2 or 1 & 3 or 1 & 4). Hence:

$$\sum (y_1+y_2+y_3+\cdots+y_n) = \frac{(N-1)!}{(n-1)!(N-n)!}\,(y_1+y_2+y_3+\cdots+y_N)$$

Note that the sum on the left runs over all samples, while the sum on the right runs over all $N$ units of the population. Plug this into the first equation and obtain:

$$E(\bar y) = \frac{(N-1)!}{(n-1)!(N-n)!}\, \frac{n!(N-n)!}{nN!}\,(y_1+y_2+y_3+\cdots+y_N)$$

This simplifies to:

$$E(\bar y) = \frac{(y_1+y_2+y_3+\cdots+y_N)}{N} $$

This is the population mean $\mu$, so $\bar y$ is unbiased. I hope this helps.
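For anyone who wants to check the counting argument numerically, here is a minimal Python sketch that enumerates every sample by brute force. The function name `srswor_check` and the population values are made up for illustration and are not from Cochran; the sketch only assumes that every sample of size $n$ is equally likely, as in SRS without replacement.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def srswor_check(population, n):
    """Enumerate every SRSWOR sample of size n and summarise the sample means."""
    N = len(population)
    # Work with unit indices so that repeated values in the population are fine.
    samples = list(combinations(range(N), n))
    # Average of the sample means over all equally likely samples.
    mean_of_means = sum(
        Fraction(sum(population[i] for i in s), n) for s in samples
    ) / len(samples)
    # Number of samples in which each population unit appears.
    appearances = [sum(1 for s in samples if i in s) for i in range(N)]
    return len(samples), appearances, mean_of_means


population = [3, 7, 8, 10]  # hypothetical values, N = 4
count, appearances, mean_of_means = srswor_check(population, 2)

print(count, comb(4, 2))        # number of samples vs. the formula
print(appearances, comb(3, 1))  # appearances per unit vs. the formula
print(mean_of_means, Fraction(sum(population), len(population)))
```

Exact fractions are used instead of floats so that the comparison between the average of the sample means and the population mean is not affected by rounding.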

**joriki** · Accepted Answer

My first advice would be to get a different book, since this one is very badly written – the use of $i$ as both a bound and a free variable in the same equation and as the summation index of both sums in a double sum is pure madness.

The first red step takes the expectation over all possible samples, of which there are $\binom Nn$, the number of ways of choosing $n$ members of the population from $N$. The index $i$ here runs from $1$ to $\binom Nn$, whereas the other, entirely unrelated index $i$ runs from $1$ to $n$. They then substitute the sum over the sample values, exacerbating the notational chaos by explicating the $i$ on one sum and not the other but using the implicit and not the explicit one in the summand, thus completely obscuring which sum is summing over what.

In the second red sum, the sum is now over the population, and they counted in how many possible samples each member of the population occurs. This is $\binom{N-1}{n-1}$, since $n-1$ other members of the sample can be chosen from the $N-1$ other members of the population.
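As a sketch of how the same derivation might read with distinct indices: the symbols $s$ (an index over the $\binom Nn$ possible samples), $S_s$ (the set of population units in sample $s$) and $k$ (an index over population units) are labels introduced here for illustration and are not taken from the book.

\begin{align}
\mathrm E(\bar y) &= \frac{1}{\binom Nn}\sum_{s=1}^{\binom Nn}\frac{1}{n}\sum_{k\in S_s} Y_k\\
&= \frac{1}{n\binom Nn}\sum_{k=1}^{N} Y_k\cdot\#\{s : k\in S_s\}\\
&= \frac{\binom{N-1}{n-1}}{n\binom Nn}\sum_{k=1}^{N} Y_k\\
&= \frac{1}{N}\sum_{k=1}^{N} Y_k = \overline Y\;.
\end{align}

The second line swaps the order of summation, so each $Y_k$ is counted once for every sample that contains unit $k$. The last line uses $\binom{N-1}{n-1}\big/\binom Nn = \frac{n}{N}$.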
