Samples made with the bootstrap method and the distribution of their means

So, can we state that the means of bootstrap samples are normally distributed? If not, how can we find a confidence interval for the mean of a general distribution? I know that we can calculate the 2.5th and 97.5th percentiles, but why would the mean of the general distribution lie between them with 95% confidence?

Suppose adults in your country have mean height 165cm with standard deviation 10cm and that heights are approximately normal.

Now suppose you have a random sample of size $n=1000$ of adults and that their mean height is $\bar X = 164.725$ with standard deviation $S =10.362,$ as shown below. [Sampling and computation in R.]

```r
set.seed(2020)
x <- round(rnorm(1000, 165, 10))   # 1000 simulated heights (cm), rounded to whole cm
mean(x); sd(x)
## [1] 164.725
## [1] 10.36228
```

A standard 95% confidence interval assuming normal heights has the form $\bar X \pm 1.96\, S/\sqrt{n},$ where 1.96 (more precisely about 1.962) cuts probability 0.025 from the upper tail of Student's t distribution with 999 degrees of freedom, which is very nearly normal. This gives the CI $(164.08, 165.37)$. With $n = 1000$ the interval is quite narrow, perhaps narrower than you need: it locates the average height of adults in the country within about 0.64 cm of 164.7 cm.

```r
pm <- c(-1, 1)
# mean(x) is 164.725; 1.96 * S / sqrt(n) is the half-width of the interval
mean(x) + pm * 1.96 * sd(x) / sqrt(1000)
## [1] 164.0827 165.3673
```
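
The value 1.96 is the normal quantile; the exact Student's t quantile for 999 degrees of freedom is only slightly larger. A minimal sketch, assuming the same simulated sample `x` as above, computes the t-based interval directly with `qt()` and checks it against base R's `t.test()`:

```r
# Re-create the simulated sample used above
set.seed(2020)
x <- round(rnorm(1000, 165, 10))
n <- length(x)

# Exact t quantile with n - 1 = 999 degrees of freedom (about 1.962)
t_crit <- qt(0.975, df = n - 1)

# t-based 95% CI: mean +/- t_crit * S / sqrt(n)
ci_manual <- mean(x) + c(-1, 1) * t_crit * sd(x) / sqrt(n)

# Same interval from t.test(); as.numeric() drops the conf.level attribute
ci_ttest <- as.numeric(t.test(x, conf.level = 0.95)$conf.int)

print(ci_manual)
print(ci_ttest)
```

With a sample this large, the difference from the 1.96 version shows up only in the third decimal place.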

Based on this same sample, a 95% nonparametric bootstrap CI using the quantile method does not specifically assume that heights are normally distributed. (However, the 1000 heights in vector x were sampled from a normal population, so they inevitably carry some information about the normality of the population.) The bootstrap CI is $(164.08, 165.36)$, essentially the same as the CI above from normal theory.

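```r
set.seed(821)
a.obs <- mean(x)   # observed average
# 5000 re-samples of size 1000 drawn with replacement; keep each
# re-sample mean's deviation from the observed mean
d.re <- replicate(5000, mean(sample(x, 1000, replace = TRUE)) - a.obs)
LU <- quantile(d.re, c(.975, .025))   # U (97.5%) and L (2.5%) of the deviations
a.obs - LU                            # CI = (a.obs - U, a.obs - L)
##    97.5%     2.5%
##  164.077  165.357
```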
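
The steps above can be wrapped in a small reusable function. This is a minimal sketch; the name `boot_mean_ci` is hypothetical and not from any package. Its `basic` element reproduces the approach used here (often called the basic or reverse-percentile bootstrap interval). For comparison, its `percentile` element takes the 2.5% and 97.5% quantiles of the re-sample means directly, which is the percentile interval described in the question. The two intervals have the same width and are mirror images of each other about the observed mean, so they should nearly coincide when the bootstrap distribution is roughly symmetric.

```r
# Hypothetical helper: bootstrap CIs for a mean from B re-samples
boot_mean_ci <- function(x, B = 5000, level = 0.95) {
  n <- length(x)
  a_obs <- mean(x)                      # observed sample mean
  alpha <- 1 - level
  # B re-sample means, each from n draws with replacement
  means <- replicate(B, mean(sample(x, n, replace = TRUE)))
  d <- means - a_obs                    # deviations from the observed mean
  q <- quantile(d, c(alpha / 2, 1 - alpha / 2), names = FALSE)  # L, U
  list(
    # basic interval, as in the answer: (a_obs - U, a_obs - L)
    basic = c(a_obs - q[2], a_obs - q[1]),
    # percentile interval: quantiles of the re-sample means themselves
    percentile = quantile(means, c(alpha / 2, 1 - alpha / 2), names = FALSE)
  )
}

set.seed(2020)
x <- round(rnorm(1000, 165, 10))
set.seed(821)
ci <- boot_mean_ci(x)
print(ci)
```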

At each of its 5000 steps, the bootstrap procedure 're-samples' (with replacement) 1000 heights from among the 1000 heights in the sample and records how much the mean of the re-sample differs from the mean of the sample x itself.
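
Why should the population mean $\mu$ land in the resulting interval with about 95% confidence? Here is a sketch of the reasoning, under the key bootstrap assumption that the spread of the re-sample deviations $\bar X^* - \bar X$ (the values in `d.re`) mimics the unknown spread of $\bar X - \mu$. Let $L$ and $U$ be the 2.5% and 97.5% quantiles of the deviations, so the second probability below is 0.95 by the choice of $L$ and $U$:

$$P(L \le \bar X - \mu \le U) \approx P(L \le \bar X^* - \bar X \le U) = 0.95.$$

Rearranging the inequalities inside the first probability gives

$$P(\bar X - U \le \mu \le \bar X - L) \approx 0.95.$$

So $(\bar X - U,\ \bar X - L)$ is an approximate 95% CI for $\mu$. Since `LU` holds $(U, L)$ in that order, this is exactly what `a.obs - LU` computes. Nothing in this argument requires the re-sample means to be normally distributed.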

Re-samples are taken with replacement. Sampling without replacement would make no sense, because a re-sample of all 1000 heights drawn without replacement is just a rearrangement of the original sample and always has the same mean. The point of re-sampling is to show how variable the means of samples of size 1000 from a population like this one might be. The deviations turn out to be very consistent: they average about 0, and their standard deviation is only about $1/3$ of a cm. So it is not surprising that the bootstrap CI is quite narrow, about as narrow as the CI from normal theory.
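
The point about sampling without replacement is easy to check. A short sketch, assuming the same simulated `x`: every full-size re-sample drawn without replacement is a permutation of `x` and so has exactly the same mean, while re-samples drawn with replacement have means that vary.

```r
set.seed(2020)
x <- round(rnorm(1000, 165, 10))

# Drawing all 1000 values WITHOUT replacement is just a permutation of x
perm <- sample(x, 1000, replace = FALSE)
same_values <- identical(sort(perm), sort(x))

# ... so every such "re-sample" has the same mean as x: no variability at all
m_without <- replicate(5, mean(sample(x, 1000, replace = FALSE)))

# WITH replacement, re-sample means vary around mean(x)
m_with <- replicate(5, mean(sample(x, 1000, replace = TRUE)))

print(m_without)
print(m_with)
```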

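```r
mean(d.re)   # average deviation (near 0)
## [1] 0.0030026
sd(d.re)     # spread of the deviations: bootstrap standard error of the mean
## [1] 0.323941
```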
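
The standard deviation of the deviations is a bootstrap estimate of the standard error of $\bar X$, so it should be close to the normal-theory value $S/\sqrt{n} \approx 0.328$. This also bears on the first part of the question. The bootstrap method does not require the re-sample means to be normal, but for a large sample they are often close to normal anyway, by the Central Limit Theorem. A sketch, assuming the same `x` and seeds as above, compares the two standard errors and checks the shape of the bootstrap distribution by comparing its 2.5% and 97.5% quantiles with those of a normal distribution having the same standard deviation:

```r
set.seed(2020)
x <- round(rnorm(1000, 165, 10))
set.seed(821)
a.obs <- mean(x)
d.re <- replicate(5000, mean(sample(x, 1000, replace = TRUE)) - a.obs)

# Bootstrap SE vs. the normal-theory standard error S / sqrt(n)
se_boot <- sd(d.re)
se_theory <- sd(x) / sqrt(length(x))
print(c(bootstrap = se_boot, theory = se_theory))

# Shape check: empirical 2.5% / 97.5% quantiles of d.re vs. those of a
# normal distribution with mean 0 and the same SD
emp_q  <- quantile(d.re, c(0.025, 0.975), names = FALSE)
norm_q <- qnorm(c(0.025, 0.975), mean = 0, sd = se_boot)
print(rbind(empirical = emp_q, normal = norm_q))
```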

It is important to understand that 're-samples' from a sample provide no new information about the population; they only use the sample itself to gauge how much the sample mean varies.
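
Because the population is known in this example, what "95% confidence" means can be checked by simulation. A minimal sketch, assuming a normal population with mean 165 and standard deviation 10 as above, but with smaller samples ($n = 100$) and fewer re-samples ($B = 1000$) to keep the run time short: it repeats the whole procedure many times and records how often the bootstrap interval contains the true mean 165. The proportion should come out roughly 0.95, though not exactly, since the bootstrap interval is only approximate and the simulation has its own random error.

```r
set.seed(1)                 # arbitrary seed for reproducibility
mu <- 165; sigma <- 10      # known population (assumed, as in the example)
n <- 100; B <- 1000; n_sims <- 300

covered <- logical(n_sims)
for (i in seq_len(n_sims)) {
  x <- rnorm(n, mu, sigma)                          # a fresh sample
  a_obs <- mean(x)
  d <- replicate(B, mean(sample(x, n, replace = TRUE)) - a_obs)
  q <- quantile(d, c(0.025, 0.975), names = FALSE)  # L and U
  ci <- c(a_obs - q[2], a_obs - q[1])               # basic bootstrap CI
  covered[i] <- ci[1] <= mu && mu <= ci[2]          # does it contain 165?
}
coverage <- mean(covered)   # proportion of intervals containing mu
print(coverage)
```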