To understand what the bootstrap does, I tried it on a population with a known uniform distribution.
We can generate a sample of 50 points from a uniform distribution $U(0, 1)$, which has $\mu=0.5$ and $\sigma=0.2887$. An example result (using Matlab): $$ \bar{x} = 0.5698,\: \: s =0.2952 , \: \: \: (\text{from 50 random points})$$
Using $ \hat{\theta} = \sum_{i=1}^{50} X_{i}/50 $ as the estimator of the population mean $\mu$, the bootstrap result (with $k=1000$ iterations), i.e. the mean and standard deviation of the 1000 bootstrap means, is: $$ \bar{x}_b=0.5707, \: \: s_b =0.043 , $$
So, with these 1000 resamples, the bootstrap mean is close to the sample mean. But this by itself does not tell us anything about the population parameter.
By the CLT, the distribution of the sample mean is approximately normal, $N(\mu=0.5, \: \sigma=\frac{0.2887}{\sqrt{50}}=0.0408)$. The standard deviation of the bootstrap means is close to 0.0408, the standard deviation of the sampling distribution of the mean.
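For reference, the experiment above can be sketched as follows. This is a Python sketch, not the Matlab session that produced the numbers quoted above; the seed, the helper `bootstrap_means`, and the variable names are arbitrary choices for illustration, so the printed values will differ somewhat from those numbers.

```python
import random
import statistics

def bootstrap_means(sample, k, rng):
    """Return k means of resamples drawn with replacement from sample."""
    n = len(sample)
    return [statistics.fmean(rng.choices(sample, k=n)) for _ in range(k)]

rng = random.Random(2024)  # arbitrary seed, used only for reproducibility
n, k = 50, 1000

sample = [rng.random() for _ in range(n)]  # 50 draws from U(0, 1)
x_bar = statistics.fmean(sample)           # sample mean
s = statistics.stdev(sample)               # sample standard deviation

boot = bootstrap_means(sample, k, rng)
xb_bar = statistics.fmean(boot)            # mean of the k bootstrap means
sb = statistics.stdev(boot)                # sd of the k bootstrap means

print(f"sample:    mean = {x_bar:.4f}, sd = {s:.4f}")
print(f"bootstrap: mean = {xb_bar:.4f}, sd = {sb:.4f}")
print(f"s / sqrt(n)     = {s / n ** 0.5:.4f}")
print(f"sigma / sqrt(n) = {(1 / 12) ** 0.5 / n ** 0.5:.4f}")
```

The standard deviation of the bootstrap means tracks $s/\sqrt{n}$, computed from the sample alone, which in turn estimates $\sigma/\sqrt{n}$.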
From this experiment, the only use of bootstrap resampling that I can see is that we can infer the standard deviation of the sampling distribution of the mean. Is this statement true? (Is this the real purpose of bootstrap resampling?)
I have read that the bootstrap method is effective and does not require any assumptions about the population's distribution. But I have not really understood how to use this method properly.
Some insights on this will be appreciated. Thanks. Regards, Arief.
Some of what you have written appears to be a tangle of confusion. But I think I can help dispel some of the confusion. Let me start from one of your statements, and use that as a basis for illustrating a 95% nonparametric bootstrap confidence interval (CI) for the mean $\mu$ of the population from which available data are sampled:
Data. Suppose you have $n = 50$ observations sampled at random from an unknown distribution. Suppose these are as follows (the numbers in brackets show the index of the first observation on each line):
Point estimate. The observed sample mean is $\bar X = 0.4692.$ In my R code I use `a` for this value. It is an estimate of the population mean $\mu.$
Nonparametric bootstrap CI. In order to make a CI for $\mu,$ I need information about the variability of $\bar X.$ Specifically, I would like to know the distribution of the differences $D = \bar X - \mu.$ If I knew this distribution then I could use it to find $L$ and $U$ such that
$$ 0.95 = P(L < D = \bar X -\mu < U) = P(\bar X - U < \mu < \bar X - L),$$
so that a 95% CI for $\mu$ would be of the form $(\bar X - U, \bar X - L).$
Not knowing the distribution of $D,$ I enter the 'bootstrap world', where I repeatedly re-sample from the sample of 50 observations in order to obtain estimates $L^*$ of $L$ and $U^*$ of $U$.
Specifically, one re-sample consists of a sample of size $50$ chosen with replacement from the sample `x`. I find its mean $\bar X^*.$ And then, temporarily using the observed $A = 0.4692$ as proxy for the unknown $\mu$, I find $D^* = \bar X^* - A.$ In this same way, I make a large number $B$ of re-samples, obtaining another value $D^*$ from each re-sample.
Back in the 'real world', I find quantiles .025 and .975 of the $B$ values $D^*,$ which I use as $L^*$ and $U^*,$ respectively. Finally, my 95% nonparametric bootstrap CI for $\mu$ is of the form $(\bar X - U^*, \bar X - L^*),$ which I found to be $(0.396, 0.541),$ based on $B = 100,000$ re-samples of size $n=50.$
R code for bootstrap. Because bootstrapping is a simulation process, you may get slightly different CIs on each run. (With $B = 10^5,$ often only the last digit of the confidence bounds changes.) If you use the same data and R with the same seed shown at the head of the program, you will get exactly the same result I did.
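For readers who want to see the steps above in executable form, here is a minimal Python sketch of the same procedure. It is not the R program referred to in this answer: the seed, the simulated data, the helper names `quantile` and `bootstrap_ci_mean`, and the smaller $B = 10{,}000$ are all assumptions made for illustration, so the interval it prints will not match $(0.396, 0.541).$

```python
import random
import statistics

def quantile(sorted_vals, p):
    """Linearly interpolated empirical quantile of an already sorted list."""
    pos = p * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])

def bootstrap_ci_mean(x, b, rng, level=0.95):
    """Nonparametric bootstrap CI for the mean, using D* = mean(re-sample) - mean(x)."""
    n = len(x)
    a = statistics.fmean(x)  # observed mean, proxy for the unknown mu
    # One D* per re-sample of size n drawn with replacement from x.
    d_re = sorted(statistics.fmean(rng.choices(x, k=n)) - a for _ in range(b))
    alpha = 1 - level
    l_re = quantile(d_re, alpha / 2)      # L*
    u_re = quantile(d_re, 1 - alpha / 2)  # U*
    return a - u_re, a - l_re             # (X-bar - U*, X-bar - L*)

rng = random.Random(1234)  # arbitrary seed, not the one used in the answer
x = [round(rng.random(), 2) for _ in range(50)]  # stand-in data, not the answer's sample

lower, upper = bootstrap_ci_mean(x, b=10_000, rng=rng)
print(f"observed mean    = {statistics.fmean(x):.4f}")
print(f"95% bootstrap CI = ({lower:.3f}, {upper:.3f})")
```

Note that the upper quantile $U^*$ is subtracted to get the lower bound and $L^*$ to get the upper bound, exactly as in the inequality for $D.$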
In the R code below, I use suffixes `.re` instead of $*$ to indicate re-sampling in the bootstrap world. (I have tried to use simplified R code so it will be clear what is going on even if you are not familiar with R.)
Reality check. By way of confession, I simulated the sample `x` as you suggested by taking $n = 50$ values from $\mathsf{Unif}(0,1),$ rounded to two places.
A t confidence interval should come pretty close to finding a valid 95% CI for $\mu = 0.5.$ I obtained the 95% t-interval $(0.394, 0.544),$ which is in excellent agreement with the 95% nonparametric bootstrap CI. (You can generate exactly the same 50 values `x` if you use R with the same seed.)
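A t-interval of this kind can be sketched in Python as follows. This is an illustration under stated assumptions, not the computation that produced $(0.394, 0.544)$: the data are simulated with an arbitrary seed, the helper `t_interval_n50` is hypothetical, and the critical value $2.0096$ (the 0.975 quantile of Student's t with 49 degrees of freedom) is hard-coded from tables because the Python standard library has no t quantile function.

```python
import random
import statistics

T_CRIT_49 = 2.0096  # tabulated 0.975 quantile of Student's t, 49 degrees of freedom

def t_interval_n50(x):
    """95% t confidence interval for the mean of a sample of size 50."""
    if len(x) != 50:
        raise ValueError("T_CRIT_49 is only valid for n = 50")
    mean = statistics.fmean(x)
    half_width = T_CRIT_49 * statistics.stdev(x) / len(x) ** 0.5
    return mean - half_width, mean + half_width

rng = random.Random(1234)  # arbitrary seed, not the one used in the answer
x = [round(rng.random(), 2) for _ in range(50)]  # stand-in data from Unif(0, 1)

lo, hi = t_interval_n50(x)
print(f"95% t-interval = ({lo:.3f}, {hi:.3f})")
```

Unlike the bootstrap interval, the t-interval is symmetric about $\bar X$ by construction.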