Why does bootstrapping approach the distribution of the estimator, not the mean of the estimator with a normal distribution?


The wiki page (https://en.wikipedia.org/wiki/Bootstrapping_(statistics)) states that bootstrapping allows one to estimate the approximate distribution of an estimator.

But why does bootstrapping approach the distribution of the estimator, rather than the mean of the estimator with a normal distribution?

Suppose one resamples data of size $N$ a total of $100000$ times and computes the estimator each time. Why does the distribution of the computed estimators not approach a normal distribution?

Accepted answer:

For most bootstrapping procedures, the answer to your main question is that the Law of Large Numbers is involved in the convergence to a useful result, but the Central Limit Theorem need not be relevant. As two examples, let's look at a nonparametric bootstrap and a parametric bootstrap based on a known distribution.

Nonparametric bootstrap: Suppose we have $n = 100$ observations, and want to estimate the mean of the population from which they came. We make two assumptions: (a) The population distribution mean $\mu$ exists. (b) Although we make no assumption about the shape of the population distribution, we assume that our $n$ observations are a random sample from that population.

Data description: A summary and stripchart of the $n = 100$ observations are shown below:

summary(x); stripchart(x, pch="|")
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   5.00   68.75   97.00   98.55  111.25  501.00 

[Stripchart of the $n = 100$ observations: the data are strongly right-skewed with a far outlier.]

Rationale for bootstrap: We can use $\bar X_{\text{obs}} = 98.55$ as a point estimate of $\mu.$ In order to make a 95% confidence interval for $\mu,$ we must have some idea of the variability of $\bar X.$ Specifically, if we knew the distribution of $D = \bar X - \mu,$ we could find numbers $L$ and $U$ such that $$P(L < \bar X - \mu < U) = P(\bar X - U <\mu < \bar X - L) = 0.95,$$ so that a 95% CI would be of the form $(\bar X - U, \bar X - L).$

Because we don't know the form of the population distribution, we use bootstrap re-sampling to estimate $L$ by $L^*$ and $U$ by $U^*.$

In the "bootstrap world": We take a large number $B$ of re-samples from the data x. These samples are of size $n = 100$ and are taken with replacement. For each re-sample, we find its mean $\bar X^*$ and the the re-sampled value $D^* = \bar X^* - \bar X_{{obs}}.$ (Temporarily, we use $\bar X_{{obs}}$ as a proxy for the unknown $\mu$ because it is the best available estimate.

Back in the "real world": Then we estimate $L$ by quantile .025 of the $B$ values of $D^*$ and we estimate $U$ by quantile .975 of the $D^*$'s. Then the 95% nonparametric bootstrap CI for $\mu$ is of the form $(\bar X_{{obs}} - U^*, \bar X_{{obs}} - L^*),$ where $\bar X_{{obs}}$ returns to its original role as the observed mean of the sample.

R code and results: The R code below implements the bootstrap procedure; in the code, the asterisks ($*$'s) above are replaced by the suffix .re and $\bar X_{\text{obs}}$ is denoted a.obs. The bootstrap CI is $(85.6, 109.4).$ Because it is based on simulation, the endpoints may change a little from one run to the next, but with $B = 10000$ iterations, not by enough to matter. [A second run gave the interval $(85.8, 109.4).$]

a.obs = mean(x);  n = 100;  B = 10000    # x holds the n = 100 observations
d.re = replicate( B, mean(sample(x, n, rep=T)) - a.obs )
L.re = quantile(d.re, .025);  U.re = quantile(d.re, .975)
c(a.obs - U.re, a.obs - L.re)
    97.5%      2.5% 
 85.58975 109.42050 

Note: The data were simulated from a mixture distribution with $\mu = 100,$ so in this simulated example we know that the CI captures the true value of $\mu.$

Non-normal bootstrap distribution: Furthermore, the simulated distribution of $D^*$ is not normal. As shown by the dotted normal curve, the distribution of $D^*$ is distinctly right-skewed. Also, notice that the observed estimate 98.55 does not lie at the center of the bootstrap CI.
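The skewness can also be checked numerically. The sketch below uses hypothetical, heavily right-skewed lognormal data (not the answer's sample) and computes the sample skewness of the bootstrap $D^*$ values; a clearly positive value confirms the distribution is not normal.

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.lognormal(mean=0.0, sigma=2.0, size=100)  # hypothetical right-skewed sample
n, B = x.size, 10_000

boot_means = np.array([rng.choice(x, n, replace=True).mean() for _ in range(B)])
d = boot_means - x.mean()                         # bootstrap D* values

# Sample skewness of the bootstrap distribution of D*;
# positive means right-skewed, zero would be symmetric
skew = float(np.mean((d - d.mean())**3) / np.std(d)**3)
print(round(skew, 2))
```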

[Histogram of the simulated $D^*$ values with a dotted normal curve overlaid; the distribution is distinctly right-skewed.]

Sample distinctly non-normal, so t CI inappropriate: Intervals based on t distributions are rightly recognized as quite robust against non-normality, but the stripchart of our data at the beginning shows extreme skewness and a far outlier. So many statisticians would hesitate to trust a t confidence interval as valid. [For the record, a 95% t CI for these data is $(96.5, 110.6).$ This interval is shorter than the bootstrap CI; it is misleadingly short because it relies on the incorrect additional assumption that the data are normal.]
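For comparison, a t interval is computed as $\bar X \pm t_{0.975,\,n-1}\, s/\sqrt{n}$, and is always symmetric about $\bar X$. A minimal sketch, again on hypothetical data, with the critical value $t_{0.975,99} \approx 1.984$ hardcoded to avoid a SciPy dependency:

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.exponential(scale=100.0, size=100)  # hypothetical skewed sample

n = x.size
t_crit = 1.984                              # approx 0.975 quantile of t, df = 99
half = t_crit * x.std(ddof=1) / np.sqrt(n)  # half-width: t * s / sqrt(n)
t_ci = (x.mean() - half, x.mean() + half)
print(t_ci)
```

The forced symmetry is exactly what makes the t interval suspect here: it cannot reflect the skewness visible in the data.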

Parametric bootstrap. A 'parametric' bootstrap procedure also uses re-sampling, but in a different way. Even if we know the shape of the population distribution, we may not know how to use that information to make a confidence interval for the population mean $\mu.$

Briefly, at each bootstrap iteration, we use the sample to get a point estimate of the parameter(s) of the population distribution. Then we simulate a re-sample from the distribution specified by those estimates. Unless the population distribution happens to be normal, there is no reason to expect that the bootstrap distribution would be normal.

Specifically, suppose we have a sample of size $n = 100$ from a population distributed $\mathsf{Gamma}(\alpha = 2, \text{rate}=\lambda),$ where the shape $\alpha = 2$ is known and the single parameter $\lambda$ is unknown. The maximum likelihood point estimate is $\hat \lambda = \alpha/\bar X.$ The mathematical statistics required to get a 95% CI for $\lambda$ is not impossible, but it is not trivial, so we find a parametric bootstrap CI instead.
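As a check, the point estimate follows from the gamma log-likelihood with $\alpha$ known: for density $f(x) = \lambda^\alpha x^{\alpha-1}e^{-\lambda x}/\Gamma(\alpha),$ $$\ell(\lambda) = n\alpha \log\lambda - \lambda \sum_{i=1}^n x_i + (\alpha - 1)\sum_{i=1}^n \log x_i - n\log\Gamma(\alpha),$$ $$\ell'(\lambda) = \frac{n\alpha}{\lambda} - \sum_{i=1}^n x_i = 0 \quad\Longrightarrow\quad \hat\lambda = \frac{n\alpha}{\sum_{i=1}^n x_i} = \frac{\alpha}{\bar X}.$$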

Observed values are $\bar X = 0.6777,\, \hat\lambda_{\text{obs}} = 2.9511.$ So at each of the $B$ iterations we take a re-sample of size $n = 100$ from $\mathsf{Gamma}(\alpha = 2, \hat\lambda_{\text{obs}}).$ Then we find $\hat\lambda^*$ and $D^* = \hat\lambda^* - \hat\lambda_{\text{obs}}.$

Finally, we use the simulated distribution of $D^*$ to make the 95% parametric bootstrap CI $(2.50, 3.31).$ A plot of the simulated distribution of $D^*$ (not shown) is about as skewed as the simulated distribution for the nonparametric bootstrap above. Also, notice that the parameter being estimated is not the population mean.

al = 2;  n = 100;  B = 10000             # shape alpha is known; x holds the sample
lam.obs = al/mean(x)                     # MLE of the rate lambda
d.re = replicate( B, al/mean(rgamma(n, al, lam.obs)) - lam.obs )
L.re = quantile(d.re, .025);  U.re = quantile(d.re, .975)
c(lam.obs - U.re, lam.obs - L.re)
   97.5%     2.5% 
2.504464 3.313133 

Note: The data were simulated from $\mathsf{Gamma}(\alpha = 2, \lambda= 3),$ so in this simulated example we know that the parametric bootstrap CI captures the true value of $\lambda.$