If I have n ordered samples with f that are "bad", what's the likelihood that all of the f samples are within d distance of both its previous/next neighboring "bad" sample?
In case it helps, in my problem, n=21, f=3, d=5.
Here's an example in which all f bad samples are within d distance of each other (where 1 = "good" and 0 = "bad"). (In fact, the max distance between bad samples is between the 1st sample and the 4th sample, which is a distance of 4-1=3, which is less than my threshold of d=5.)
01101011111111111111
and here's an example in which not all f bad samples are within d distance of each other (since the last bad sample is clearly more than 5 samples away from its left "bad samples" neighbor. (It doesn't have a right "bad sample" neighbor.)
011101111111110111111
(Just to reiterate: a "bad sample" must be within d distance of both its left/right "bad sample" neighbor. But of course, if it's the first or last bad sample, it only has one immediate neighbor. And, the f bad samples are assumed to be randomly distributed among the n total samples)
(Note: I'm not sure which branch of mathematics this would necessarily fall under, so if someone thinks I mis-guessed the tag, please suggest different tags that I ought to use.)
There are $\binom{n}f$ possible orderings. For each of the first $f-1$ bad experiments, we subtract the orderings where that bad experiment is followed by $d$ good experiments (because this would imply the distance to the next is more than $d$). Such a sequence can be chosen by choosing a sequence of $f$ zeroes and $n-d$ ones, and then inserting the remaining $d$ ones right after that particular bad experiment. This can be done in $\binom{n-d}f$ ways. Doing this for each of the $f-1$ bad experiments, you get $$ \binom{n}f-(f-1)\binom{n-d}f $$ However, this is still not correct. The problem is that sequences with two bad experiments whose distance to the next is more than $d$ have been doubly subtracted. These must be added back in. For each pair of bad experiments among the first $f-1$, we can use a similar method to count sequences where these are both followed by $d$ ones; set aside $2d$ ones, arbitrarily arrange the remaining $f$ zeroes and $n-2d$ ones, then insert $d$ ones after each bad experiment in the pair. We are now at $$ \binom{n}f-(f-1)\binom{n-d}f+\binom{f-1}d\binom{n-2d}f $$ However, we are still not done, since it turns out that the experiments with three distance which are more than $d$ have not been correctly counted. To fix this, and all further overlaps, you need to use the principle of inclusion exclusion. The final result is $$ \sum_{k= 0}^{\lfloor(n-f)/d\rfloor}(-1)^k\binom{f-1}k\binom{n-kd}{f} $$