Probability of at least n occurrences of a subsequence of length K in an RNA of length N

38 Views Asked by Bumbble Comm At 10 May 2026 - 3:10

Suppose we have an RNA sequence of length N and a subsequence of length K, where

$K \le N$

Is there a method to calculate the probability of the subsequence occurring exactly a given number of times, or at least a given number of times, in the sequence?

The probability of it happening at least once is simply:

$1-\left(1-\frac{1}{4^K}\right)^{N-K+1}$
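To get a feel for the numbers, the expression can be evaluated directly. This is a minimal sketch, assuming the four bases are equally likely and independent, and that the N-K+1 start positions can be treated as independent trials (which is only an approximation when K > 1). The function name `p_at_least_once_approx` is made up for illustration.

```python
def p_at_least_once_approx(N, K, alphabet_size=4):
    """Evaluate 1 - (1 - 1/4^K)^(N-K+1) for a sequence of length N
    and a subsequence of length K (independent-window approximation)."""
    if not 1 <= K <= N:
        raise ValueError("require 1 <= K <= N")
    windows = N - K + 1                    # number of possible start positions
    p_match = 1.0 / alphabet_size ** K     # chance that one window matches
    return 1.0 - (1.0 - p_match) ** windows


if __name__ == "__main__":
    # Example: a subsequence of length 7 in a sequence of length 1000
    print(p_at_least_once_approx(1000, 7))
```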

It is also easy to calculate the probability of it happening at least n times (writing n for the required number of occurrences) if the subsequence consists of a single repeated letter, in other words if its occurrences can overlap independently. It is simply the above probability to the power of n:

$\left(1-\left(1-\frac{1}{4^K}\right)^{N-K+1}\right)^n$

But how do I calculate it for an arbitrary subsequence? In particular, I cannot wrap my head around the fact that some sequences can overlap with themselves while others cannot. For example, if I search for the sequence AAAAAAT and find it at some position, it is clear that it cannot start again at any of the next 6 positions, whereas after observing the sequence CGCGCGCG I can observe it again starting every two positions within its length of 8. Is there a closed-form expression for the probability, or do I have to go through all the combinations manually?
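The overlap behaviour described here can be made concrete in code. The following is a hedged sketch, not a closed-form answer: `overlap_shifts` lists the shifts at which a pattern can start again inside itself, and `prob_at_least` computes the probability of at least n (possibly overlapping) occurrences by dynamic programming over a pattern-matching automaton. It assumes equally likely, independent letters; the function names and the default alphabet `"ACGT"` (chosen so the examples with T work) are assumptions made for illustration.

```python
def overlap_shifts(pattern):
    """Shifts d (0 < d < K) at which the pattern can start again inside itself."""
    K = len(pattern)
    return [d for d in range(1, K) if pattern[d:] == pattern[:K - d]]


def build_automaton(pattern, alphabet):
    """KMP-style automaton: state s = number of pattern letters currently matched."""
    K = len(pattern)
    pi = [0] * K                       # prefix function of the pattern
    for i in range(1, K):
        j = pi[i - 1]
        while j > 0 and pattern[i] != pattern[j]:
            j = pi[j - 1]
        if pattern[i] == pattern[j]:
            j += 1
        pi[i] = j
    nxt = [dict() for _ in range(K + 1)]
    for s in range(K + 1):
        for ch in alphabet:
            if s < K and pattern[s] == ch:
                nxt[s][ch] = s + 1
            elif s == 0:
                nxt[s][ch] = 0
            else:
                nxt[s][ch] = nxt[pi[s - 1]][ch]   # fall back to a shorter match
    return nxt


def prob_at_least(pattern, N, n, alphabet="ACGT"):
    """P(pattern occurs at least n times, overlaps counted, in a random length-N sequence)."""
    if not pattern or any(ch not in alphabet for ch in pattern):
        raise ValueError("pattern must be non-empty and use only alphabet letters")
    if n <= 0:
        return 1.0
    K = len(pattern)
    nxt = build_automaton(pattern, alphabet)
    p = 1.0 / len(alphabet)            # assumption: uniform, independent letters
    # dist[s][c] = probability of automaton state s with c occurrences so far (c capped at n)
    dist = [[0.0] * (n + 1) for _ in range(K + 1)]
    dist[0][0] = 1.0
    for _ in range(N):
        new = [[0.0] * (n + 1) for _ in range(K + 1)]
        for s in range(K + 1):
            for c in range(n + 1):
                mass = dist[s][c]
                if mass == 0.0:
                    continue
                for ch in alphabet:
                    t = nxt[s][ch]
                    c2 = min(n, c + 1) if t == K else c
                    new[t][c2] += mass * p
        dist = new
    return sum(row[n] for row in dist)


if __name__ == "__main__":
    print(overlap_shifts("AAAAAAT"))    # no self-overlap
    print(overlap_shifts("CGCGCGCG"))   # can restart every two positions
    print(prob_at_least("CGCGCGCG", 1000, 2))
```

The cost is on the order of N · K · n · 4 operations, so no enumeration of all 4^N sequences is needed. For a self-overlapping pattern the result generally differs from the independent-window expression; for example, for the pattern AA with N = 3 the exact probability of at least one occurrence is 7/64, while the expression above gives 31/256.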
