Sampling distribution of the mean confusion


Say we have some population and decide to randomly take a sample of size $n$ from this population. What does it then mean to talk about the distribution of the sample mean?

In other words, what do we mean by the distribution of the sample mean here?


I’m quite confused about this because, as I understood my textbook, it doesn’t make sense to talk about a distribution of a sample mean in this scenario (since we are looking at only one sample). Obviously, I’m wrong, but I’m not seeing why.

Best answer

If you take $n$ i.i.d. samples from a population with a particular distribution, you can then take the mean of the sample. This sample mean will also be a random variable, and will have its own distribution, called the distribution of the sample mean.

In particular, the mean of the distribution of the sample mean will be the mean of the original distribution for the population, while the variance of the distribution of the sample mean will be the variance of the original distribution for the population divided by $n$ (assuming these exist).
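Both facts follow in one line each. As a sketch, write $\mu$ and $\sigma^2$ for the population mean and variance and $\bar X = \frac1n\sum_{i=1}^n X_i$ for the sample mean (this notation is assumed here, not taken from the question); the variance step uses the independence of the $X_i$:

$$E(\bar X) = \frac1n\sum_{i=1}^n E(X_i) = \frac{n\mu}{n} = \mu, \qquad Var(\bar X) = \frac1{n^2}\sum_{i=1}^n Var(X_i) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}.$$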

So the sample mean is an unbiased estimator of the population mean, and its typical error shrinks as the sample size $n$ increases, since its standard deviation is the population standard deviation divided by $\sqrt{n}$.

As an example, suppose the population is exponentially distributed with rate $1$ (so its mean is $1$ and its variance is $1$), and you take a sample of size $n=4$. You might get data like any one of the following rows, each of which is a separate sample of size $4$ together with its sample mean:

          X1          X2          X3         X4    sample mean
 [1,] 0.3929842 0.59443344 0.03191802 0.1188843 0.2845550
 [2,] 0.3873812 0.20919761 1.53515619 1.6538193 0.9463886
 [3,] 0.1229839 0.75581917 0.06593355 0.7986400 0.4358442
 [4,] 2.0959258 0.97214578 0.40010242 0.4257327 0.9734767
 [5,] 0.3621020 0.14805365 0.22586113 3.3202165 1.0140583
 [6,] 2.4047787 0.10890730 1.14527881 0.6846319 1.0858992
 [7,] 0.3713004 0.09458689 2.71174074 1.7904559 1.2420210
 [8,] 0.1483070 0.92459828 0.63327521 3.1271168 1.2083243
 [9,] 0.1239702 0.43675343 2.55238171 0.8339072 0.9867531
[10,] 0.6990532 0.23775550 0.90174834 0.5666429 0.6013000

and you may be able to see that many different values are possible, but the sample mean tends to be closer to $1$ than an individual observation is. In this example, the individual observations have an exponential distribution by construction, as shown by the blue density in the chart below, while the sample mean has a gamma distribution with shape $4$ and rate $4$ (which would get closer in shape to a normal distribution if the sample size were increased), as shown by the red density below, more concentrated around $1$. It is this red density line which shows the distribution of the sample mean.

[Figure: distribution of the sample mean]
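The idea of "many possible samples, each with its own mean" can be checked by simulation. The following is only a sketch in Python using the standard library; the function name `simulate_sample_means`, the number of repetitions, and the seed are illustrative choices, not part of the answer above. It repeats the experiment many times and summarizes the resulting sample means.

```python
import random
import statistics


def simulate_sample_means(n, reps, rate=1.0, seed=0):
    """Draw `reps` independent samples of size `n` from an exponential
    distribution with the given rate and return the mean of each sample."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = []
    for _ in range(reps):
        sample = [rng.expovariate(rate) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means


# Hypothetical settings: 20000 repetitions of the n = 4 experiment.
means = simulate_sample_means(n=4, reps=20000)

mean_of_means = statistics.fmean(means)
var_of_means = statistics.pvariance(means)

# Theory says these should be close to 1 and 1/4 respectively.
print("mean of sample means:    ", round(mean_of_means, 3))
print("variance of sample means:", round(var_of_means, 3))
```

A histogram of `means` would approximate the density of the sample mean described above, but plotting is left out to keep the sketch dependency-free.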

Answer

With the help of @Stephen, it seems you have a good intuitive understanding of the idea of 'standard error.' Here is an elementary statement of some basic facts so that you will know the precise terminology involved.

Estimation of $\mu$ when $\sigma$ is known. If you are starting with a normal population with unknown mean $\mu$ and known SD $\sigma,$ then the sample mean $\bar X$ is the estimator of $\mu.$ In this context $SD(\bar X) = \sigma/\sqrt{n}$ is called the standard error of the mean. Perhaps the most common application is that a 95% confidence interval for $\mu$ is of the form $\bar X \pm 1.96\sigma/\sqrt{n}.$
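As a rough illustration of the known-$\sigma$ interval, here is a minimal Python sketch using only the standard library. The function name `z_interval`, the data values, and the value of `sigma` are all hypothetical, chosen only to show the arithmetic.

```python
from math import sqrt
from statistics import NormalDist, fmean


def z_interval(data, sigma, level=0.95):
    """Confidence interval for the mean of a normal population
    whose standard deviation `sigma` is known."""
    n = len(data)
    xbar = fmean(data)
    # Upper-tail quantile of the standard normal (about 1.96 for 95%).
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / sqrt(n)  # z times the standard error
    return xbar - half_width, xbar + half_width


# Hypothetical data: n = 9 observations, sigma assumed known to be 0.3.
data = [4.8, 5.1, 5.6, 4.9, 5.3, 5.0, 4.7, 5.2, 5.4]
low, high = z_interval(data, sigma=0.3)
print(f"95% CI for mu: ({low:.3f}, {high:.3f})")
```

Here the standard error is $0.3/\sqrt{9} = 0.1$, so the half-width is about $1.96 \times 0.1$.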

Estimation of $\mu$ when $\sigma$ is unknown. If you are starting with a normal population with unknown mean $\mu$ and unknown variance $\sigma^2,$ then $\bar X$ is the estimator of $\mu$ and the sample variance $S^2$ is the estimator of $\sigma^2;$ with $E(\bar X) = \mu,\,$ $E(S^2) = \sigma^2,\,$ $Var(\bar X) = \sigma^2/n,\,$ and $SD(\bar X) = \sigma/\sqrt{n}.$

Then the standard deviation of the mean $SD(\bar X) = \sigma/\sqrt{n}$ is again called the standard error of the mean. However, because $\sigma$ is unknown, the estimated standard error of the mean is $S/\sqrt{n}.$ By abbreviation or sloppiness, the word "estimated" is sometimes dropped and one refers to $S/\sqrt{n}$ as the "standard error of the mean."

In this case, a 95% CI for $\mu$ is of the form $\bar X \pm t^*S/\sqrt{n},$ where $t^*$ (cutting probability 2.5% from the upper tail of Student's t distribution with $n - 1$ degrees of freedom) can be found from printed tables or using software.
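A minimal Python sketch of the unknown-$\sigma$ interval follows. The Python standard library has no Student's t quantile function, so this sketch takes $t^*$ as an input looked up elsewhere; the function name `t_interval` and the data are hypothetical. The value $t^* = 2.262$ is the usual table entry for 9 degrees of freedom at the 95% level.

```python
from math import sqrt
from statistics import fmean, stdev


def t_interval(data, t_star):
    """Confidence interval for the mean of a normal population with
    unknown sigma. `t_star` must be the t quantile for len(data) - 1
    degrees of freedom, supplied by the caller from a table or software."""
    n = len(data)
    xbar = fmean(data)
    s = stdev(data)            # sample SD, using the n - 1 divisor
    est_se = s / sqrt(n)       # estimated standard error of the mean
    return xbar - t_star * est_se, xbar + t_star * est_se


# Hypothetical data: n = 10 observations, so 9 degrees of freedom.
data = [9.8, 10.4, 10.1, 9.6, 10.9, 10.3, 9.9, 10.6, 10.2, 10.0]
low, high = t_interval(data, t_star=2.262)
print(f"95% CI for mu: ({low:.3f}, {high:.3f})")
```

Because $t^*$ exceeds $1.96$ for any finite number of degrees of freedom, this interval is wider than the known-$\sigma$ interval would be for the same standard error, reflecting the extra uncertainty from estimating $\sigma$.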

Answer

Suppose $n=3$ and $X_1,X_2,X_3$ are independent and identically distributed and $$X_1 = \begin{cases} 1 & \text{with probability } 1/2, \\ 2 & \text{with probability } 1/2. \end{cases}$$ The sample mean is $(X_1+X_2 + X_3)/3.$

You have $$ (X_1,X_2,X_3) = \begin{cases} (1,1,1) \\ (1,1,2) \\ (1,2,1) \\ (1,2,2) \\ (2,1,1) \\ (2,1,2) \\ (2,2,1) \\ (2,2,2) \end{cases} $$ each with probability $1/8.$ Therefore $$ \frac{X_1+X_2 + X_3} 3 = \begin{cases} 1 & \text{with probability } 1/8, \\ 4/3 & \text{with probability } 3/8, \\ 5/3 & \text{with probability } 3/8, \\ 2 & \text{with probability } 1/8. \end{cases} $$ That is the probability distribution of the sample mean.
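The enumeration above is small enough to check by brute force. The following Python sketch (standard library only; the variable names are illustrative) lists all eight equally likely outcomes and accumulates the exact probability of each value of the sample mean.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

values = [1, 2]   # each X_i is 1 or 2 with probability 1/2
n = 3             # sample size

# Every outcome (x1, x2, x3) has the same probability, (1/2)^3 = 1/8.
outcome_prob = Fraction(1, len(values) ** n)

# Map each possible sample mean to its total probability.
dist = defaultdict(Fraction)
for outcome in product(values, repeat=n):
    sample_mean = Fraction(sum(outcome), n)
    dist[sample_mean] += outcome_prob

for sample_mean in sorted(dist):
    print(f"P(mean = {sample_mean}) = {dist[sample_mean]}")
```

Using `Fraction` keeps the arithmetic exact, so the result matches the four probabilities $1/8, 3/8, 3/8, 1/8$ listed above rather than floating-point approximations.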