Related problem to covering a circle with random arcs

182 Views Asked by Bumbble Comm At 29 Mar 2026 - 4:53

I have a problem setup wherein we have (the following are all integers) a sequence of length $G$, and $N$ reads of length $L$. I'm interested in the problem where we consider the sequence to be circular and we want to know the probability that the reads entirely cover every position of the sequence (at least once).

I found a nice result from Stevens 1939 that is about covering a circle of circumference 1 with N reads of a given size L.

This gives the probability of coverage as $P(C) = \sum_{k=0}^{\left \lfloor{1/L}\right \rfloor } (-1)^k \binom{N}{k} (1-kL)^{N-1}$

I thought I might be able to apply this result to my problem simply by normalizing $L$, however I have been unable to get this theoretical probability and a simulated probability to match.

This Matlab code shows the monte carlo simulation and the theoretical equation for some sample values and I've normalized $L$ such that $\hat{L} = L/(G-L+1)$ (I also tried simply normalizing via $L/G$ but the results were even further far apart). Any ideas what I'm missing here?

Original Q&A

There are 1 best solutions below

**Bumbble Comm** · Accepted Answer

This is an inclusion-exclusion formula that sums over the number of gaps that the sequences might leave. You can perform the same type of calculation for your case.

A complication arises because unlike in the continuous case, there's a finite probability that reads start at the same point, and this makes it more difficult to count the number of ways of leaving gaps between them.

Let's therefore condition on the number of different starting points of the reads. Again by inclusion-exclusion, the probability for $N$ reads to have exactly $m$ different starting points is

$$ \binom Gm\sum_{j=0}^{m-1}(-1)^j\binom mj\left(\frac{m-j}G\right)^N\;, $$

where $\binom Gm$ is the number of choices for the starting points, $j$ counts how many of these starting points are missed, $\binom mj$ is the number of ways to choose these $j$ points, $(m-j)^N$ is the number of ways to choose starting points for the $N$ reads while missing these $j$ points, and $G^N$ is the total number of choices for the starting points.

Now we can apply inclusion-exclusion as in the continuous case. Fix one of the $m$ starting points as the first and order the remaining ones relative to it. Then the probability that there are gaps at $k$ particular points in the order is

$$ \frac{\binom{G-1-kL}{m-1}}{\binom{G-1}{m-1}}\;, $$

since every admissible choice of $m-1$ points from $G-1$ (where the fixed first point is excluded) yields a unique choice of $m-1$ points from $G-1-kL$ when you cut out $L$ points at each of the $k$ gaps. Thus the probability of coverage is

$$ \sum_{m=1}^N\binom Gm\sum_{j=0}^{m-1}(-1)^j\binom mj\left(\frac{m-j}G\right)^N\sum_{k=0}^{\left\lfloor\frac{G-1}L\right\rfloor}(-1)^k\binom mk\frac{\binom{G-1-kL}{m-1}}{\binom{G-1}{m-1}}\\ =\sum_{m=1}^N\frac Gm\sum_{j=0}^{m-1}(-1)^j\binom mj\left(\frac{m-j}G\right)^N\sum_{k=0}^{\left\lfloor\frac{G-1}L\right\rfloor}(-1)^k\binom mk\binom{G-1-kL}{m-1}\;. $$

Note that the two inner sums can be evaluated separately, the first one has at most $N$ terms and the second one also has at most $N$ terms if there is any chance of coverage, the binomial coefficients can all be built up successively during the summation, using only $O(1)$ operations per term, and the $N$-th power takes only $O(N)$ different values, so the evaluation takes a total of $O\left(N^2\right)$ operations.

Here's code that tests this calculation against a simulation.

Related problem to covering a circle with random arcs

There are 1 best solutions below

Related Questions in PROBABILITY

Related Questions in PROBABILITY-DISTRIBUTIONS

Related Questions in RANDOM-VARIABLES

Related Questions in INCLUSION-EXCLUSION

Related Questions in MONTE-CARLO

Trending Questions

Popular # Hahtags

Popular Questions