Quickly estimating the optimal bin size for the Jackknife re-sampling method


I am trying to apply the Jackknife method to compute confidence intervals for data collected from Monte Carlo simulations. My understanding is that, to counter the effect of autocorrelations, I should group my data into bins of $n$ consecutive measurements and jackknife in a "leave $n$ out" fashion, dropping one whole bin at a time. By repeating this for a range of bin sizes, I can look for the bin size at which the resulting errors converge to their maximum (plateau) value. For example, this book shows the following plot (figure 13.27):

[Figure 13.27: Jackknife error estimate as a function of bin size.]

where the lower dotted line is the error calculated as if all measurements were independent, and the upper dotted line, which represents the true error, is larger by a factor of $\sqrt{1 + 2\tau}$, where $\tau$ is the autocorrelation time. The Jackknife errors converge to this value once the bin size reaches roughly 100-1000.
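
To be concrete, the procedure I have in mind looks roughly like the sketch below (the sample mean is used as the estimator, the function names are my own, and the Gaussian data is just a stand-in for my actual measurements):

```python
import numpy as np

def binned_jackknife_error(data, bin_size, estimator=np.mean):
    """Leave-one-bin-out jackknife error estimate for autocorrelated data."""
    n_bins = len(data) // bin_size
    bins = np.asarray(data[:n_bins * bin_size]).reshape(n_bins, bin_size)

    # Jackknife replicates: the estimator evaluated with one whole bin removed.
    replicates = np.array([
        estimator(np.delete(bins, i, axis=0).ravel()) for i in range(n_bins)
    ])

    # Standard jackknife variance, with the (m - 1)/m prefactor for m bins.
    return np.sqrt((n_bins - 1) / n_bins
                   * np.sum((replicates - replicates.mean()) ** 2))

# Scan a few bin sizes and look for the plateau in the error estimate.
rng = np.random.default_rng(0)
data = rng.normal(size=10_000)            # stand-in for correlated MC data
for bin_size in (10, 50, 100, 200, 500, 1000):
    print(bin_size, binned_jackknife_error(data, bin_size))
```

Each call rebuilds every leave-one-bin-out sample explicitly, which is what makes the scan slow.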

However, for large data sets there are many possible bin sizes to test, and this can become extremely expensive computationally. If I have 10,000 data points and use a bin size of 100, then I have to construct 100 jackknife samples, each containing 9,900 data points, and if I am checking many possible bin sizes this takes far too long. So I am trying to find a quick way to approximate the correct bin size, but I can't seem to find any information about this process online.
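
The closest thing to a shortcut I can think of (not from the book, just the $\sqrt{1 + 2\tau}$ relation above turned around) is to estimate the integrated autocorrelation time $\tau$ directly from the data and then only scan bin sizes of order a few times $\tau$. A rough sketch of what I mean, where the FFT-based autocovariance and the simple windowing cutoff are my own choices:

```python
import numpy as np

def autocorr_time(data, window_factor=6):
    """Rough estimate of tau = sum_{t>=1} rho(t), the convention in which
    the true error exceeds the naive one by a factor sqrt(1 + 2*tau)."""
    x = np.asarray(data, dtype=float)
    x = x - x.mean()
    n = len(x)

    # Autocovariance for lags 0..n-1 via FFT, zero-padded to avoid wraparound,
    # then normalized so that rho[0] == 1.
    f = np.fft.rfft(x, n=2 * n)
    acov = np.fft.irfft(f * np.conj(f))[:n] / np.arange(n, 0, -1)
    rho = acov / acov[0]

    tau = 0.0
    for t in range(1, n):
        tau += rho[t]
        # Self-consistent window: stop summing once the lag is several times
        # tau, before the noisy tail of rho(t) dominates.
        if t >= window_factor * max(tau, 0.5):
            break
    return tau

rng = np.random.default_rng(0)
data = rng.normal(size=10_000)        # stand-in for correlated MC measurements
tau = autocorr_time(data)
print("tau ~", tau, "-> scan bin sizes around", max(1, int(np.ceil(5 * tau))), "and above")
```

I don't know whether this is a standard way to pick the bin size, which is essentially what I am asking.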