Bootstrap method failing where blocking works


I'm computing an average of individual samples that are not entirely independent, and I need an estimate of the true standard deviation of that average. According to Newman and Barkema's book, the most reliable method is bootstrap resampling (see section 3.4.3), where you don't have to worry about the samples being independent, and which should give an estimate of the standard deviation of the mean, $\sigma_m\approx\sigma/\sqrt{n}$, where $n$ is the number of samples.

However, when I compute the average many times over to get a brute-force estimate of the actual $\sigma_m$, it turns out that the bootstrap consistently underestimates it.

In itself that is maybe not so strange, since the bootstrap is only an estimate. But the weird thing is that if I use the blocking (or binning) method (see 3.4.2), I get a much better estimate, even though according to Newman and Barkema this should be a much more primitive method.

In fact the bootstrap consistently gives an estimate very close to the naive $\sigma_m\approx\sqrt{\big(\ \overline{x^2}-\overline{x}^2\ \big)\ /\ n}$.
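The comparison described in the question can be reproduced with a minimal sketch. Everything here is an assumption made for illustration, not the poster's actual setup: the correlated data are modeled as an AR(1) series, the bootstrap resamples individual points with replacement, and the function names (`ar1_series`, `naive_sem`, `bootstrap_sem`, `blocking_sem`) are hypothetical.

```python
import numpy as np

def ar1_series(n, phi, rng):
    """Correlated toy data (assumption): x[t] = phi * x[t-1] + Gaussian noise."""
    x = np.empty(n)
    noise = rng.normal(size=n)
    x[0] = noise[0]
    for t in range(1, n):
        x[t] = phi * x[t - 1] + noise[t]
    return x

def naive_sem(x):
    """Naive estimate sqrt((mean(x^2) - mean(x)^2) / n)."""
    return np.sqrt((np.mean(x**2) - np.mean(x) ** 2) / len(x))

def bootstrap_sem(x, n_resamples, rng):
    """Resample individual points with replacement; spread of the resampled means."""
    n = len(x)
    idx = rng.integers(0, n, size=(n_resamples, n))
    return np.std(x[idx].mean(axis=1), ddof=1)

def blocking_sem(x, n_blocks):
    """Average within contiguous blocks, then treat the block means as independent."""
    block_len = len(x) // n_blocks
    blocks = x[: block_len * n_blocks].reshape(n_blocks, block_len)
    return np.std(blocks.mean(axis=1), ddof=1) / np.sqrt(n_blocks)

rng = np.random.default_rng(0)
n, phi = 4000, 0.9

x = ar1_series(n, phi, rng)
naive = naive_sem(x)
boot = bootstrap_sem(x, 300, rng)
block = blocking_sem(x, 40)

# Brute force: repeat the whole run many times and look at the spread of the averages.
brute = np.std([ar1_series(n, phi, rng).mean() for _ in range(100)], ddof=1)

print(f"naive     {naive:.4f}")
print(f"bootstrap {boot:.4f}")
print(f"blocking  {block:.4f}")
print(f"brute     {brute:.4f}")
```

In this toy setting, resampling single points reshuffles the data and so discards the correlations between neighbouring samples, while the block means keep them inside each block.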

Any idea what's going on?


There is 1 best solution below


Step 1

Use "$n-1$". See https://stats.stackexchange.com/questions/3931/intuitive-explanation-for-dividing-in-n-1-when-calculating-sd . If you divide by $n$, you have a biased estimator of the s.d. Since $n > n-1$, the bias is negative (toward s.d.s that are too small).

I.e., use $\sigma_m \approx \sigma/\sqrt{n-1}$, where $\sigma$ is the s.d. computed with $n$ in the denominator, as in your naive formula.
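As a small sketch of what this correction does numerically (the data below are arbitrary random numbers chosen for illustration, and the variable names are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=10)  # arbitrary illustrative sample
n = len(x)

# s.d. with n in the denominator, as in the naive formula from the question
sd_biased = np.sqrt(np.mean(x**2) - np.mean(x) ** 2)

# s.d. with n - 1 in the denominator (Bessel's correction)
sd_corrected = np.std(x, ddof=1)

sem_n = sd_biased / np.sqrt(n)              # naive estimate
sem_n_minus_1 = sd_biased / np.sqrt(n - 1)  # corrected estimate

print(sem_n, sem_n_minus_1, sd_corrected / np.sqrt(n))
```

Dividing the $n$-denominator s.d. by $\sqrt{n-1}$ gives the same number as dividing the $(n-1)$-denominator s.d. by $\sqrt{n}$, and the correction shrinks as $n$ grows.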

Step 2

(Added by edit on 20150715, based on additional information from the poster.)

If the fourth (and/or higher even) moments of the population distribution are non-zero, then the sample s.d.s may be biased estimators of the population s.d. (The biases from the various moments could cancel, but that's rare in practice.)

The sample variance (computed with $n-1$ in the denominator) is distributed with mean equal to the population variance $\mu_2$ and with variance $\frac{\mu_4}{n} - \frac{\mu_2^2(n-3)}{n(n-1)}$, where $\mu_2$ and $\mu_4$ are the second and fourth central moments of the population. That is, how far the sample variance, and hence the sample s.d., typically strays from the population value depends on the fourth central moment of the population. This variance decreases to zero as $1/n$.
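The expression $\frac{\mu_4}{n} - \frac{\mu_2^2(n-3)}{n(n-1)}$ is the standard result for the variance of the sample variance computed with $n-1$ in the denominator. A quick simulation can check it; this is only a sketch, and the uniform population, the sample size and the number of trials are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n, n_trials = 10, 200_000

# Assumed population: uniform on [0, 1], with central moments
# mu_2 = 1/12 and mu_4 = 1/80.
mu2, mu4 = 1.0 / 12.0, 1.0 / 80.0

samples = rng.uniform(size=(n_trials, n))
s2 = samples.var(axis=1, ddof=1)  # sample variance of each trial
s = np.sqrt(s2)                   # sample s.d. of each trial

predicted = mu4 / n - mu2**2 * (n - 3) / (n * (n - 1))

print("mean of s^2:", s2.mean(), "population variance:", mu2)
print("var of s^2: ", s2.var(), "formula:", predicted)
print("mean of s:  ", s.mean(), "population s.d.:", np.sqrt(mu2))
```

The last line also shows the sample s.d. itself: because the square root is concave, its average falls slightly below the population s.d. even when the sample variance is unbiased.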

With regard to bootstrapping, if your population distribution is leptokurtic (has positive excess kurtosis relative to the normal distribution, i.e., a sharper central peak and heavier tails than a normal distribution), and the initial sample is representative, then a subsample is likely to underestimate the population s.d. If the population is platykurtic (negative excess kurtosis, i.e., a flatter center and lighter tails) and the initial sample is representative, then subsamples are likely to overestimate the population s.d.