Interpretation of the bootstrap method in a simple example, with a uniform population to infer.

Question

1.1k Views Asked by Redsbefall At 26 Apr 2025 - 4:03

To understand what the bootstrap does, I use a population with a uniform distribution as the target of inference.

We can generate a sample of 50 points from a uniform distribution $U(0, 1)$, which has $\mu=0.5$ and $\sigma=1/\sqrt{12}\approx 0.2887$. One example result (using Matlab) is: $$ \bar{x} = 0.5698, \quad s = 0.2952 \quad (\text{from 50 random points}) $$

Using $ \hat{\theta} = \frac{1}{50}\sum_{i=1}^{50} X_{i} $ as the estimator of the population mean $\mu$, the bootstrap result (with $k=1000$ iterations) is: $$ \bar{x}_{b}=0.5707, \quad s_{b} =0.043 $$

So, over these 1000 resamples, the mean of the bootstrap means is close to the sample mean. But this by itself says nothing about the population parameter.

By the CLT, the distribution of the sample mean is approximately normal, $N(\mu=0.5, \: \sigma=\frac{0.2887}{\sqrt{50}}=0.0408)$. The standard deviation of the bootstrap means (0.043) is close to 0.0408, the standard deviation of the sampling distribution of the mean.
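The experiment described above can be sketched in Python as follows. This is only a minimal illustration under my own assumptions: it uses the standard library, an arbitrary seed, and hypothetical variable names (`boot_means`, `boot_se`), so the numbers will differ from the Matlab run quoted above.

```python
import random
import statistics

random.seed(2025)                  # arbitrary seed, chosen only for reproducibility

n, k = 50, 1000                    # sample size and number of bootstrap resamples
sigma = (1 / 12) ** 0.5            # population SD of U(0, 1), about 0.2887

x = [random.random() for _ in range(n)]   # one sample from U(0, 1)
xbar = sum(x) / n
s = statistics.stdev(x)

# Each resample draws n points from x with replacement and records its mean.
boot_means = [sum(random.choices(x, k=n)) / n for _ in range(k)]

boot_mean = statistics.mean(boot_means)   # centres on xbar, not on mu
boot_se = statistics.stdev(boot_means)    # estimates the SD of the sample mean

print(f"sample:    xbar = {xbar:.4f}, s = {s:.4f}")
print(f"bootstrap: mean = {boot_mean:.4f}, sd = {boot_se:.4f}")
print(f"s/sqrt(n) = {s / n ** 0.5:.4f}, sigma/sqrt(n) = {sigma / n ** 0.5:.4f}")
```

The bootstrap mean tracks $\bar{x}$ because the resamples are drawn from the sample rather than from the population, while the bootstrap standard deviation should land near $s/\sqrt{n}$.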

From this experiment, the only use of bootstrap resampling that I can see is to estimate the standard deviation of the sampling distribution of the mean. Is this statement true? (Is this the real purpose of bootstrap resampling?)

I have read some statements about the bootstrap method; they say it is effective and does not require any assumptions about the population's distribution. But I have not really understood how to use this method properly.

Some insights on this will be appreciated. Thanks. Regards, Arief.


Redsbefall On 25 Jul 2017 - 4:39

Following up on @BruceET's answer.

Generating the empirical bootstrap distribution of $B = \frac{\bar{X}_{b} - \bar{x}}{\sigma/\sqrt{n}}$ (population $\sigma$ is known) using Python:

This time, I generate a sample of only $n=10$ points from the standard uniform distribution. The sample mean obtained is $ \bar{x} = 0.3327 $, which is not very close to the population mean $\mu = 0.5$.

The number of bootstrap resamples is $n_{b} = 10,000$.

I plot the result and compare it with the empirical standard $t$ distribution from the uniform population ($9$ degrees of freedom), obtained by generating $30,000$ random samples of the standard $t$ statistic (which should be close to the exact one):

The result above shows that if $\sigma$ is known, a sample of only $n=10$ points already gives a good approximation to the sampling distribution of the standardized mean, even though the sample mean $\bar{x}$ is relatively far from the hidden $\mu = 0.5$.
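For reference, the known-$\sigma$ statistic $B$ can be generated along these lines. This is a sketch under assumptions of my own (standard library only, an arbitrary seed, and a hypothetical helper named `standardized_resample_mean`), so the sample and its mean will not match the $\bar{x} = 0.3327$ quoted above.

```python
import random
import statistics

random.seed(10)                    # arbitrary seed for reproducibility

n, n_b = 10, 10_000                # sample size and number of bootstrap resamples
sigma = (1 / 12) ** 0.5            # known population SD of U(0, 1)

x = [random.random() for _ in range(n)]
xbar = sum(x) / n

def standardized_resample_mean(data):
    """Return (mean of one resample - xbar) / (sigma / sqrt(n))."""
    resample = random.choices(data, k=len(data))   # draw with replacement
    return (sum(resample) / len(data) - xbar) / (sigma / len(data) ** 0.5)

b_values = [standardized_resample_mean(x) for _ in range(n_b)]

print(f"xbar = {xbar:.4f}")
print(f"mean(B) = {statistics.mean(b_values):.3f}, "
      f"sd(B) = {statistics.stdev(b_values):.3f}")
print(f"pstdev(x) / sigma = {statistics.pstdev(x) / sigma:.3f}")
```

The values of $B$ centre on zero by construction, and their spread is set by the spread of the ten observed points relative to $\sigma$, which is why the last two printed numbers should nearly agree.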

Generating the empirical bootstrap distribution of $B = \frac{\bar{X}_{b} - \bar{x}}{\sigma/\sqrt{n}}$ when $\mu$ and $\sigma$ are not known: for this case, I presume we may use the sample standard deviation $S$ in place of $\sigma$. With a different sample of $n=10$ points from the previous case ($ \bar{x}=0.3875$), I get the plots:

In summary: can I say that the results above illustrate one of the main applications of the bootstrap method?

Thanks. All the best.

**BruceET** · Accepted Answer

Some of what you have written appears to be a tangle of confusion. But I think I can help dispel some of the confusion. Let me start from one of your statements, and use that as a basis for illustrating a 95% nonparametric bootstrap confidence interval (CI) for the mean $\mu$ of the population from which available data are sampled:

"I have read some statements about the bootstrap method; they say it is effective and does not require any assumptions about the population's distribution. But I have not really understood how to use this method properly."

Data. Suppose you have $n = 50$ observations sampled at random from an unknown distribution. Suppose these are as follows (the numbers in brackets show the index of the first observation on each line):

x
# [1] 0.11 0.62 0.61 0.62 0.86 0.64 0.01 0.23 0.67 0.51
#[11] 0.69 0.54 0.28 0.92 0.29 0.84 0.29 0.27 0.19 0.23
#[21] 0.32 0.30 0.16 0.04 0.22 0.81 0.53 0.91 0.83 0.05
#[31] 0.46 0.27 0.30 0.51 0.18 0.76 0.20 0.26 0.99 0.81
#[41] 0.55 0.65 0.31 0.62 0.33 0.50 0.68 0.48 0.24 0.77

Point estimate. The observed sample mean is $\bar X = 0.4692.$ In my R code I use `a` for this value. It is an estimate of the population mean $\mu.$

a = mean(x);  a
## 0.4692

Nonparametric bootstrap CI. In order to make a CI for $\mu,$ I need information about the variability of $\bar X.$ Specifically, I would like to know the distribution of the differences $D = \bar X - \mu.$ If I knew this distribution then I could use it to find $L$ and $U$ such that

$$ 0.95 = P(L < D = \bar X -\mu < U) = P(\bar X - U < \mu < \bar X - L),$$

so that a 95% CI for $\mu$ would be of the form $(\bar X - U, \bar X - L).$

Not knowing the distribution of $D,$ I enter the bootstrap world, where I repeatedly re-sample from the sample of 50 observations in order to obtain estimates $L^*$ of $L$ and $U^*$ of $U$.

Specifically, one re-sample consists of a sample of size $50$ chosen with replacement from the sample x. I find its mean $\bar X^*.$ And then, temporarily, using the observed $A = 0.4692$ as proxy for the unknown $\mu$, I find $D^* = \bar X^* - A.$ In this same way, I make a large number $B$ of re-samples, obtaining another value $D^*$ from each re-sample.

Back in the real world I find quantiles .025 and .975 of the $B$ values $D^*,$ which I use as $L^*$ and $U^*,$ respectively. Finally, my 95% nonparametric bootstrap CI for $\mu$ is of the form $(\bar X - U^*, \bar X - L^*),$ which I found to be $(0.396, 0.541),$ based on $B = 100,000$ re-samples of size $n=50.$
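Since the asker mentions working in Python, the steps just described can be sketched there as well. Treat this as an illustration under stated assumptions rather than a definitive implementation: it uses only the standard library, an arbitrary seed, $B = 10^4$ re-samples to keep it quick, and names of my own choosing (`d_re`, `L_re`, `U_re`). Python's random number generator differs from R's, so the bounds should be close to, but not identical with, the interval quoted above.

```python
import random
import statistics

# The 50 observations listed above.
x = [0.11, 0.62, 0.61, 0.62, 0.86, 0.64, 0.01, 0.23, 0.67, 0.51,
     0.69, 0.54, 0.28, 0.92, 0.29, 0.84, 0.29, 0.27, 0.19, 0.23,
     0.32, 0.30, 0.16, 0.04, 0.22, 0.81, 0.53, 0.91, 0.83, 0.05,
     0.46, 0.27, 0.30, 0.51, 0.18, 0.76, 0.20, 0.26, 0.99, 0.81,
     0.55, 0.65, 0.31, 0.62, 0.33, 0.50, 0.68, 0.48, 0.24, 0.77]

random.seed(4321)      # arbitrary seed; results will not match an R run
B = 10_000             # assumption: fewer re-samples than 100,000, for speed
n = len(x)
a = sum(x) / n         # observed sample mean, the proxy for mu

# Bootstrap world: D* = (mean of one re-sample) - (observed mean).
d_re = []
for _ in range(B):
    a_re = sum(random.choices(x, k=n)) / n   # re-sample with replacement
    d_re.append(a_re - a)

# quantiles(..., n=40) returns 39 cut points at 2.5%, 5%, ..., 97.5%.
cuts = statistics.quantiles(d_re, n=40)
L_re, U_re = cuts[0], cuts[-1]

# Real world: the CI is (mean - U*, mean - L*).
ci = (a - U_re, a - L_re)
print(f"mean = {a:.4f}, 95% bootstrap CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

Note that the upper quantile $U^*$ gives the lower confidence bound and the lower quantile $L^*$ gives the upper one, exactly as in the probability statement for $D.$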

R code for bootstrap. Because bootstrapping is a simulation process, you may get slightly different CIs on each run. (With $B = 10^5,$ often only the last digit of the confidence bounds changes.) If you use the same data and R with the same seed shown at the head of the program, you will get exactly the same result I did.

In the R code below, I use suffixes .re instead of $*$ to indicate re-sampling in the bootstrap world. (I have tried to use simplified R code so it will be clear what is going on even if you are not familiar with R.)

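set.seed(4321)                              # fix the seed so the run is reproducible
B = 10^5;  d.re = numeric(B);  n = 50
for (i in 1:B) {
  a.re = mean(sample(x, n, replace=TRUE))   # mean of one re-sample, drawn with replacement
  d.re[i] = a.re - mean(x)  }               # D* = re-sampled mean minus observed mean
L.re = quantile(d.re, .025)                 # L*: quantile .025 of the D* values
U.re = quantile(d.re, .975)                 # U*: quantile .975 of the D* values
mean(x) - U.re; mean(x) - L.re              # lower and upper confidence bounds
##  97.5% 
##  0.396 
##   2.5% 
## 0.5412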

Reality check. By way of confession, I simulated the sample x as you suggested by taking $n = 50$ values from $\mathsf{Unif}(0,1),$ rounded to two places.

set.seed(1234)
x = round(runif(50),2)
a = mean(x); a;  sd(x)
## 0.4692
## 0.2636312

A t confidence interval should come pretty close to finding a valid 95% CI for $\mu = 0.5.$ I obtained the 95% t-interval $(0.394, 0.544),$ which is in excellent agreement with the 95% nonparametric bootstrap CI. (You can generate exactly the same 50 values x, if you use R with the same seed.)
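The t-interval can be checked by hand with a short Python sketch. One assumption to flag: the standard library has no Student's t quantile function, so the critical value for 49 degrees of freedom is hard-coded here from tables (about 2.0096) rather than computed.

```python
import statistics

# The same 50 observations as above.
x = [0.11, 0.62, 0.61, 0.62, 0.86, 0.64, 0.01, 0.23, 0.67, 0.51,
     0.69, 0.54, 0.28, 0.92, 0.29, 0.84, 0.29, 0.27, 0.19, 0.23,
     0.32, 0.30, 0.16, 0.04, 0.22, 0.81, 0.53, 0.91, 0.83, 0.05,
     0.46, 0.27, 0.30, 0.51, 0.18, 0.76, 0.20, 0.26, 0.99, 0.81,
     0.55, 0.65, 0.31, 0.62, 0.33, 0.50, 0.68, 0.48, 0.24, 0.77]

n = len(x)
a = statistics.mean(x)         # sample mean
s = statistics.stdev(x)        # sample SD (n - 1 in the denominator)

T_CRIT = 2.0096                # assumed: 0.975 quantile of t with 49 df, from tables

half_width = T_CRIT * s / n ** 0.5
t_ci = (a - half_width, a + half_width)
print(f"s = {s:.4f}, 95% t interval = ({t_ci[0]:.3f}, {t_ci[1]:.3f})")
```

The interval is symmetric about the sample mean by construction, whereas the bootstrap interval need not be.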
