After scouring the internet and reference books for a couple of days, I couldn't find an answer to the problem I am trying to solve. Let's say that I want to construct a confidence interval for a mean using the bootstrap method. The mean represents the expected number of trials up to and including the first success (geometric distribution). However, the data I have consists only of the total number of successes and the total number of trials; I don't have access to the individual trials. My current approach to this problem is:
- Generate a random binary set that consists of successes as ones and failures (number of trials - number of successes) as zeros.
- Repeat B times: sample with replacement from the generated binary set to create a bootstrap resample of the same size.
- For each of these B resamples, calculate the probability of success $p_{mle}$ using the Maximum Likelihood Estimate for the Geometric Distribution. Then find the mean using $\frac{1}{p_{mle}}$ to create a bootstrap distribution.
- Then I construct the confidence interval by finding the respective percentiles of the bootstrap distribution of the means.
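For concreteness, the steps above could be sketched roughly as follows in Python with NumPy. The function name `binary_bootstrap_ci`, the defaults for `B` and `alpha`, and the decision to drop resamples that contain no successes (where $\frac{1}{p_{mle}}$ is undefined) are assumptions made for illustration, not part of the procedure as described. The sketch uses $p_{mle} = \text{successes} / \text{trials}$, which is the geometric MLE when only the totals are known.

```python
import numpy as np

def binary_bootstrap_ci(n_success, n_trials, B=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for 1/p from a reconstructed 0/1 vector (illustrative sketch)."""
    rng = np.random.default_rng(seed)

    # Step 1: ones for successes, zeros for failures.
    data = np.concatenate([
        np.ones(n_success, dtype=int),
        np.zeros(n_trials - n_success, dtype=int),
    ])

    # Step 2: B resamples of the same size, drawn with replacement.
    resamples = rng.choice(data, size=(B, n_trials), replace=True)

    # Step 3: p_mle = successes / trials in each resample, then mean = 1 / p_mle.
    p_hat = resamples.mean(axis=1)
    p_hat = p_hat[p_hat > 0]  # assumption: discard resamples with zero successes
    means = 1.0 / p_hat

    # Step 4: percentile interval from the bootstrap distribution of the means.
    lo, hi = np.quantile(means, [alpha / 2, 1 - alpha / 2])
    return lo, hi

lo, hi = binary_bootstrap_ci(n_success=30, n_trials=120)
print(f"{100 * (1 - 0.05):.0f}% CI for the mean: ({lo:.2f}, {hi:.2f})")
```

Note that in this sketch the number of trials is held fixed in every resample while the number of successes varies.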
My concern is that I am not sure whether it is valid to generate a random binary variable and assume that it is a good representation of the original sample. Also, is it okay to transform the bootstrap sample?
Any advice would be appreciated! Thanks in advance.
This seems like a good fit for a parametric bootstrap. You can estimate $p$ in your sample by $\hat{p}$ (for example, the MLE would be a good choice), and you can then sample from a geometric distribution with parameter $\hat{p}$ to generate your bootstrap samples. You know the size of your sample (it is simply the number of successes you have, since each success completes one geometric observation), and this should also be the size of your bootstrap samples.
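A minimal sketch of that idea in Python with NumPy might look like the following. The function name `parametric_bootstrap_ci`, the defaults for `B` and `alpha`, and the use of a percentile interval are assumptions chosen for illustration. It relies on `numpy.random.Generator.geometric`, which counts trials up to and including the first success, so the mean of each bootstrap sample equals $1/\hat{p}$ for that sample.

```python
import numpy as np

def parametric_bootstrap_ci(n_success, n_trials, B=2000, alpha=0.05, seed=0):
    """Parametric bootstrap CI for the geometric mean 1/p (illustrative sketch)."""
    rng = np.random.default_rng(seed)

    # MLE of p from the totals: successes divided by trials.
    p_hat = n_success / n_trials

    # Each bootstrap sample has n_success geometric draws with parameter p_hat.
    samples = rng.geometric(p_hat, size=(B, n_success))

    # For a geometric sample, 1 / p_mle is just the sample mean.
    means = samples.mean(axis=1)

    # Percentile interval from the bootstrap distribution of the means.
    lo, hi = np.quantile(means, [alpha / 2, 1 - alpha / 2])
    return p_hat, lo, hi

p_hat, lo, hi = parametric_bootstrap_ci(n_success=30, n_trials=120)
print(f"p_hat = {p_hat:.3f}, CI for the mean: ({lo:.2f}, {hi:.2f})")
```

Here the number of successes is fixed at the observed value and the total number of trials varies from one bootstrap sample to the next.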
In point 1 of your procedure, it is not clear what you mean by "random binary set".