Bootstrap estimation of the 95% confidence interval for the 95% quantile of a gamma distribution


I can't find any information or a step-by-step algorithm for applying the bootstrap procedure to estimate a 95% confidence interval for the 95% quantile from a random sample. Does anyone know how to do it, and could you write it out? Thanks in advance.


There is 1 best solution below


The simplest possible approach is a nonparametric bootstrap as follows:

You have a large sample from a process, but no idea what the underlying distribution may be (no family such as gamma, no parameters). Maybe it is the length of time until a particular item fails when operated at too high a temperature. You can easily make a histogram of the SAMPLE and see that it has a long tail to the right, and that its 95th percentile is at 92.62 days. Knowing the 95th percentile of this PROCESS is important, and you want a 95% CI for it.

 # Generate fake data (for example using Gamma(shape = 5, rate = .1))
 set.seed(1234)  # so you can reproduce same dataset
 n = 1000;  sh = 5;  rt = .1
 x = rgamma(n, sh, rt)
 q95.obs = quantile(x, .95)
 ##      95% 
 ## 92.62036 
 summary(x)
 ##  Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 ## 8.064  33.830  47.180  50.500  62.440 171.900 

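Here is one method, programmed in R, to find a 95% CI based on the observed 95th percentile of the sample. (There are many styles of nonparametric bootstrap. I have no idea which ones you may have encountered. This one uses the 'quantile method': it takes quantiles of the bootstrap distribution and reflects them around the observed value, a variant sometimes called the basic or reverse-percentile bootstrap.)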

 B = 10000;  q95 = numeric(B)
 for (i in 1:B) {
   x.b = sample(x, length(x), replace=TRUE)  # bootstrap re-sample (with replacement)
   q95[i] = quantile(x.b, .95) }
 2*q95.obs - quantile(q95, c(.975,.025))
 ##  88.25883 98.23680 
 length(unique(q95))
 ## 265

So based on your sample, a bootstrap 95% CI for the 95th percentile of the process is $(88.3, 98.2).$ The final statement shows that, among 10,000 bootstrap iterations, the re-samples produced 265 distinct values of the 95th percentile, which is enough for a reasonably good CI. I suspect that using a very small sample for such a bootstrap would not yield a useful CI.
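To explore the concern about small samples, the calculation can be wrapped in a function and run on a large and a small sample. This is only a sketch: the helper name `boot_q_ci`, the seed, the smaller `B`, and the sample sizes are illustrative assumptions, not part of the original procedure.

```r
# Hypothetical helper (name and defaults are assumptions): reflected
# bootstrap CI for the p-th quantile, plus the number of distinct
# bootstrap quantiles encountered.
boot_q_ci <- function(x, p = 0.95, B = 2000, level = 0.95) {
  q.obs <- unname(quantile(x, p))                  # observed quantile
  q.b <- replicate(B, unname(quantile(
    sample(x, length(x), replace = TRUE), p)))     # bootstrap quantiles
  a <- (1 - level) / 2
  ci <- 2 * q.obs - unname(quantile(q.b, c(1 - a, a)))  # (lower, upper)
  list(estimate = q.obs, ci = ci, n.unique = length(unique(q.b)))
}

set.seed(2024)  # arbitrary seed, only so the run can be repeated
big   <- boot_q_ci(rgamma(1000, 5, 0.1))  # sample size as in the example
small <- boot_q_ci(rgamma(20, 5, 0.1))    # a very small sample

big$ci;   big$n.unique
small$ci; small$n.unique
```

Comparing the two interval widths and the two counts of distinct values gives a rough feel for how much information a small sample carries about an extreme quantile.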

Because pseudorandom simulation is involved in getting a bootstrap CI, you may get a slightly different result if you run the same program again. One additional run gave a slightly different result that agrees with the one above when rounded to one decimal place.
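If an exactly repeatable interval is wanted, one option is to set a seed immediately before the resampling step as well. The sketch below assumes a hypothetical wrapper `boot_ci` and arbitrary seed values; it uses a smaller `B` than above just to run quickly.

```r
set.seed(1234)
x <- rgamma(1000, 5, 0.1)   # fake data, same recipe as in the answer

# Hypothetical wrapper around the bootstrap loop (an assumption for
# illustration; B is reduced to keep the run short).
boot_ci <- function(x, B = 2000) {
  q.obs <- quantile(x, .95)
  q95 <- numeric(B)
  for (i in 1:B) {
    q95[i] <- quantile(sample(x, length(x), replace = TRUE), .95)
  }
  unname(2 * q.obs - quantile(q95, c(.975, .025)))
}

set.seed(101); ci1 <- boot_ci(x)
set.seed(101); ci2 <- boot_ci(x)   # same seed: identical interval
set.seed(202); ci3 <- boot_ci(x)   # different seed: typically a slightly different interval

identical(ci1, ci2)
rbind(ci1, ci3)
```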

I don't know whether this is a drill exercise in a class or a real problem from your work. If you have some kind of parametric bootstrap method in mind, please provide a more specific description of the situation.
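For comparison, here is a minimal sketch of what a parametric bootstrap might look like, assuming you are willing to treat the data as coming from a gamma family. The helper `fit_gamma_mom` is hypothetical, and method-of-moments estimates are used only for simplicity; maximum likelihood is a common alternative. None of this is part of the nonparametric procedure above.

```r
# ASSUMPTION: the data follow a gamma distribution.
# Hypothetical helper: method-of-moments estimates of shape and rate
# (mean = shape/rate, variance = shape/rate^2).
fit_gamma_mom <- function(x) {
  m <- mean(x); v <- var(x)
  c(shape = m^2 / v, rate = m / v)
}

set.seed(1234)
x <- rgamma(1000, 5, 0.1)              # fake data, as in the answer
est <- fit_gamma_mom(x)
q95.par <- qgamma(.95, shape = est[["shape"]], rate = est[["rate"]])

B <- 2000
q95.b <- numeric(B)
for (i in 1:B) {
  # simulate from the FITTED gamma instead of re-sampling the data
  x.b <- rgamma(length(x), shape = est[["shape"]], rate = est[["rate"]])
  e.b <- fit_gamma_mom(x.b)
  q95.b[i] <- qgamma(.95, shape = e.b[["shape"]], rate = e.b[["rate"]])
}

# same reflection step as in the nonparametric version
ci.par <- unname(2 * q95.par - quantile(q95.b, c(.975, .025)))
q95.par; ci.par
qgamma(.95, 5, 0.1)   # true value for the simulated data, about 91.5
```

With real data the true quantile is of course unknown, and the parametric interval is only as trustworthy as the gamma assumption itself.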