How to calculate the exact probability that the sum of a sliding window over a list of boolean values equals to m?

41 Views Asked by At

Let $a_n$ be a sequence of random boolean values of size $n$ with the probability of each value being true $p$: $$ a_n=\left\{\begin{align*} 0, \quad & \text{if } \mathrm{random}(p) < 0.5 \\ 1, \quad & \text{if } \mathrm{random}(p) \ge 0.5 \\ \end{align*}\right. $$

Let a sequence $b_m$ be a sliding window of size $m$ over the sequence $a_n$ that is defined as: $$ b_{m}=\{a_{i},a_{i+1},\dots,a_{i+m-1}\}, i=\overline{1,n-m-1} $$

How can I find the exact probability that at least one of the sliding windows $b_m$ contains exactly $m$ true values?

My own investigation

Usually we calculate probability of "at least one" as $1-P(A)$, where $P(A)$ is the probability of the event not happening. But how can I find $P(A)$?

The problem is that the sliding windows overlap with each other, meaning the events are interdependent. If the events were independent, I could compute PDF of binomial distribution for $m$ successes of $m$ trials and raise it to $n-m-1$: $$P(B)=\left(1-p^m\right)^{n-m-1} \tag{1}$$ But it's not the case for dependent events.

I used Monte-Carlo simulation to obtain approximate probability of $1-P(A)$ with $n=10,m=5,p=\frac{1}{2}$:

import random


def generate_random_boolean_array(size: int, probability: float):
    return [random.random() >= probability for _ in range(size)]


total = 10**5
success = 0

n = 10
m = 5
p = 0.5

for _ in range(total):
    count = 0

    a = generate_random_boolean_array(n, p)
    for i in range(n - m + 1):
        if sum(a[i:i+m]) == m:
            count += 1

    if count > 0:
        success += 1


print(success / total)

The approximate probability that at least one of the sliding windows $b_m$ contains exactly $m$ true values is $0.11$. How can I find the exact probability using probability theory?

I think, the probability of the first sliding window is not dependent on other sliding windows, since it contains all new random boolean values. So I think it can be computed using equation $(1)$. If we denote $P(C)$ to be the probability of the next sliding window not to have $m$ true values, will this probability be the same for all sliding windows except the first one? If it will, then the probability I am looking for would be equal to: $$P(A)=P(B) \cdot P(C)^{m-n-1-1}$$

Intuitively, the next sliding window is dependent on $m-1$ previous boolean values, and $1$ new boolean value. Can I find $P(C)$ in this case?

P.S.: I'm a student and I want to solve this problem for educational purposes, to understand the math, not to just get the approximate probability using simulation, which would be enough for practical purposes.