Probability of sum over a window of binary vector

418 Views Asked by At

I have a vector of one's and zero's of length n with a probability p of observing a one and 1-p of observing a zero. I slide a (overlapping) window of size $k$ across this vector and take the sum (which can take values $0\dots k$). What is the probability of observing a particular sum?

My specific problem uses $n=1000000$, $p=0.995$, $k=3$ and this is what I'm mostly interested in, however a general solution would be nice.

I've enumerated the different possibilities that give rise to each sum for $k=3$, i.e. $$p^3 + (2p+1-p)*3 + (p+2(1-p))*3 + (1-p)^3 = 1$$ but does the fact that the sliding windows are overlapping change the probabilities?

2

There are 2 best solutions below

3
On BEST ANSWER

I suspect your calculation $$p^3 + (2p+1-p)^3 + (p+2(1-p))^3 + (1-p)^3$$ should be instead $$\sum_{i=0}^k \Pr[\mbox{sum}=i]=\sum_{i=0}^k \binom{k}{i} \,p^i (1-p)^{k-i}= p^3 + 3p^2(1-p) +3p(1-p)^2 + (1-p)^3$$

1
On

Suppose your window shows $xyz$. The follow matrix is your probability for shifting to $yzw$:

$$ \begin{array} {c|c|c|c|c|c|c|c} & 000 & 001 & 010 & 011 & 100 & 101 & 110 & 111 \\ \hline 000 & 1-p & p & & & & & & \\ \hline 001 & & & 1-p & p & & & & \\ \hline 010 & & & & & 1-p & p & & \\ \hline 011 & & & & & & & 1-p & p \\ \hline 100 & 1-p & p & & & & & & \\ \hline 101 & & & 1-p & p & & & & \\ \hline 110 & & & & & 1-p & p & & \\ \hline 111 & & & & & & & 1-p & p \\ \hline \end{array} $$

Suppose we are interested in a target $T=2$. Then we can simplify the table to:

$$ \begin{array} {c|c|c|c|c|c|c|c} & 000 & 001 & 010 & 100 & 111 & T=2 \\ \hline 000 & 1-p & p & & & & \\ \hline 001 & & & 1-p & & & p \\ \hline 010 & & & & 1-p & & p \\ \hline 100 & 1-p & p & & & & \\ \hline 111 & & & & & p & 1-p \\ \hline T=2 & & & & & & 1 \\ \hline \end{array} $$

With your problem, $p = 0.995$, so the transition matrix becomes:

$$ M = \begin{bmatrix} 0.005 & 0.995 & 0 & 0 & 0 & 0 \\ 0 & 0 & .005 & 0 & 0 & 0.995 \\ 0 & 0 & 0 & .005 & 0 & 0.995 \\ .005 &0.995 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0.995 & .005 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ \end{bmatrix} $$ Your initial probability of being in a given window state is: $$V = \begin{bmatrix} 1/8 & 1/8 & 1/8 & 1/8 & 1/8 & 3/8 \end{bmatrix}$$

After the initial state, which gives you 3 tape entries, you see $n - 3$ more, so your final probability is: $$VM^{n-3}$$

For n being a million, your probability will be so close to 100% that there is really no point in trying to calculate it...

But for something small, like $n = 10$, you can see:

$$VM^7 = \begin{bmatrix} 7.98 \cdot 10^{-13} & 1.58 \cdot 10^{-10} & 3.48 \cdot 10^{-12} & 1.94 \cdot 10^{-12} & 0.12 & 0.87 \end{bmatrix} $$

So for a shorter $n=10$ tape, you have a 87% chance of having passed through a state with a sum $T=2$.