Distribution of the average

If the true population average is 15 and we repeatedly take a random sample and calculate its average, doing this 1,000,000 times, what is the distribution of the estimated average?

My thoughts:

By the CLT, $\frac{\bar{x} - E(\bar{x})}{\sqrt{\operatorname{Var}(\bar{x})}} \sim \mathrm{Normal}(0,1)$ as the number of trials to calculate the mean approaches infinity. So, the distribution of the estimated average should be $\mathrm{Normal}(15, \operatorname{Var}(\bar{x}))$. But $\operatorname{Var}(\bar{x}) = \operatorname{Var}(X)/n$, where $X$ is a random draw from the population and $n$ is the size of the random sample.

Is this right? If so, should the random samples all be of the same size $n$?

There are 2 best solutions below

The Central Limit Theorem states that the distribution of the sample mean approaches a normal distribution as the sample size approaches infinity.
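Stated a little more precisely, and assuming the $n$ sample values $X_1, \dots, X_n$ are independent draws from a population with mean $\mu$ and finite variance $\sigma^2$ (notation introduced here, not taken from the question):

$$
E(\bar X) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \mu, \qquad
\operatorname{Var}(\bar X) = \frac{1}{n^2}\sum_{i=1}^{n} \operatorname{Var}(X_i) = \frac{\sigma^2}{n},
$$

$$
\frac{\bar X - \mu}{\sigma/\sqrt{n}} \xrightarrow{\;d\;} \mathsf{Norm}(0,1) \quad \text{as } n \to \infty.
$$

The first line holds exactly for every $n$. Only the second line is a limit statement, and the limit is taken in the sample size $n$, not in the number of times the sampling is repeated.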

If you have a fixed sample size, then the sampling distribution will not actually be normal in most cases - for a very simple example, if the sample size is 1 then the distribution of the sample mean is the distribution of the population (because all you're doing is measuring one value from the population at a time).

However, if the sample size is "big enough", then you can indeed say that the distribution of the sample mean is approximately normal, with the mean and variance you give, assuming that every sample has the same fixed size $n$. As it stands, the question is missing some key information (the population distribution and the sample size), so it is hard to comment more precisely.
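One way to get a feel for how large "big enough" has to be is to simulate. The following is only a sketch under an assumption the question does not make: the population is taken to be exponential with mean 15. The helper names `skewness` and `sample_means` are made up for this example. It estimates the skewness of the sample mean for several fixed sample sizes, which should be near 0 if the sampling distribution is close to normal.

```r
# Sketch: skewness of the sample mean for several fixed sample sizes n.
# ASSUMPTION (not from the question): exponential population with mean 15.
set.seed(1)

# Sample skewness: third central moment divided by (second central moment)^1.5
skewness <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Simulate `reps` sample means, each computed from a sample of size n
sample_means <- function(n, reps = 20000, mu = 15) {
  replicate(reps, mean(rexp(n, rate = 1 / mu)))
}

sizes <- c(1, 5, 30, 900)
skew_by_n <- sapply(sizes, function(n) skewness(sample_means(n)))
names(skew_by_n) <- paste0("n=", sizes)

# For an exponential population the skewness of the mean is 2 / sqrt(n)
print(rbind(simulated = skew_by_n, theory = 2 / sqrt(sizes)))
```

With $n = 1$ the sample mean simply inherits the population's skewness, while for large $n$ the skewness shrinks toward 0 at rate $1/\sqrt{n}$. How fast it shrinks depends on the population, which is why the population distribution matters for the original question.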

Comment: I'm wondering (a) whether the following shows what you're doing, and (b) what the purpose of this is. A classroom demonstration of the CLT? Or something else?

In R, we simulate a million realizations of $A = \bar X,$ the sample average of $n = 900$ values from $\mathsf{Norm}(\mu = 15, \sigma = 3).$

set.seed(506)
# one million sample means, each from n = 900 draws of Norm(15, 3)
a = replicate(10^6, mean(rnorm(900, 15, 3)))
mean(a)
# [1] 14.99992     approx. E(samp avg) = 15
var(a)
# [1] 0.009992273  approx. Var(samp avg) = 9/900 = 0.01

Histogram of the one million sample averages $A = \bar X.$ The red curve is the density of $\mathsf{Norm}(\mu = 15, \sigma = 1/10).$

hist(a, prob=TRUE, breaks=40, col="skyblue2")
curve(dnorm(x, 15, 1/10), add=TRUE, col="red")

Again, but with data sampled from a (right-skewed) exponential distribution with mean $\mu = 10$ [R uses the rate parameter $\lambda = 1/\mu = 0.1$]:

set.seed(2010)
# one million sample means, each from n = 900 draws of Exp(rate = 0.1)
a = replicate(10^6, mean(rexp(900, .1)))
mean(a)
# [1] 10.00039     approx. E(samp mean) = 10
var(a)
# [1] 0.1112148    approx. Var(samp mean) = 100/900 = 0.1111

This time the distribution of $A = \bar X$ is very nearly normal, but not exactly (there is still a tiny bit of right skewness, hardly visible in the plot). The red curve is the density of $\mathsf{Norm}(\mu = 10, \sigma = 1/3)$; the (almost coincident) black dotted curve is the density of the exact distribution of $A = \bar X$, which is $\mathsf{Gamma}(\mathrm{shape}=900, \mathrm{rate} = 90)$.

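hist(a, prob=TRUE, breaks=40, col="skyblue2")
curve(dnorm(x, 10, 1/3), add=TRUE, col="red")
# exact Gamma density of the sample mean (the black dotted curve)
curve(dgamma(x, shape=900, rate=90), add=TRUE, lty="dotted", lwd=2)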
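The small remaining skewness is easier to see in the tails than in the histogram. As a rough sketch, the exact gamma distribution and the normal approximation can be compared at 3 standard deviations on either side of the mean. The cutoffs 9 and 11 and the variable names are choices made for this illustration only.

```r
# Exact distribution of the mean of 900 Exp(rate = 0.1) draws:
#   Gamma(shape = 900, rate = 90)
# CLT approximation: Norm(mean = 10, sd = 1/3)
cutoffs <- c(lower = 9, upper = 11)   # 3 sd below and above the mean

# Lower-tail probability P(A < 9)
exact_lower  <- pgamma(cutoffs[["lower"]], shape = 900, rate = 90)
approx_lower <- pnorm(cutoffs[["lower"]], mean = 10, sd = 1/3)

# Upper-tail probability P(A > 11)
exact_upper  <- pgamma(cutoffs[["upper"]], shape = 900, rate = 90,
                       lower.tail = FALSE)
approx_upper <- pnorm(cutoffs[["upper"]], mean = 10, sd = 1/3,
                      lower.tail = FALSE)

print(rbind(exact  = c(lower = exact_lower,  upper = exact_upper),
            normal = c(lower = approx_lower, upper = approx_upper)))
```

Because the gamma distribution is right-skewed, its upper tail is somewhat heavier and its lower tail somewhat lighter than the normal approximation suggests, even though the two density curves look almost identical near the center.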
