The probability of a sum of a binary subsequence

199 Views Asked by At

Let a source S generating a binary (and independent) sequence with the following probabilities: $p$ for the symbol '$0$' and $1−p$ for the symbol '$1$'.

The event B occurs when having a block of $n$ consecutive zeros. The B blocks are not redundant, i.e., they have no common zeros. The B blocks are detected while reading S from the beginning. For example, $n=4$:

\begin{equation} S = 0011\underbrace{0000}_{B_1}100101\underbrace{0000}_{B_2}11\underbrace{0000}_{B_3}\underbrace{0000}_{B_4}0010111\underbrace{0000}_{B_5}00011101\underbrace{0000}_{B_6}0011 \cdots \end{equation}

Assuming we have a certain $n$, $p_B$ is the probability of the event B.

E is the event that corresponds to: a subsequence between two adjacent B blocks (e.g., bits between $B_2$ and $B_3$, or $B_3$ and $B_4$) has an even sum. Note that, the sum zero is even, also the null sum (the two B blocks are consecutive, no bits between them like $B_3$ and $B_4$) is considered as even.

The question is: how to express the probability $p_E$ in terms of $p$, $n$, and $p_B$?

1

There are 1 best solutions below

1
On

This is an extended comment in that I do a brute force approach that suggests a potential answer.

First the potential answer: The probability of having an even number of 1's in a random sequence of Bernoulli random variables with probability $1-p$ that ends with $n$ zeros for the first time is $1/(2-p^n)$.

Start by enumerating the possibilities for $n=2$. If we look at all of the sequences that can occur up to a length of 10 we get the following sequences that end in 00:

00
100
0100  1100
01100  10100  11100
010100  011100  101100  110100  111100
0101100  0110100  0111100  1010100  1011100  1101100  1110100  1111100
01010100  01011100  01101100  01110100  01111100  10101100  10110100  10111100
11010100  11011100  11101100  11110100  11111100
010101100  010110100  010111100  011010100  011011100  011101100  011110100  
011111100  101010100  101011100  101101100  101110100  101111100  110101100  
110110100  110111100  111010100  111011100  111101100  111110100  111111100
0101010100  0101011100  0101101100  0101110100  0101111100  0110101100  0110110100
0110111100  0111010100  0111011100  0111101100  0111110100  0111111100  1010101100
1010110100  1010111100  1011010100  1011011100  1011101100  1011110100  1011111100
1101010100  1101011100  1101101100  1101110100  1101111100  1110101100  1110110100
1110111100  1111010100  1111011100  1111101100  1111110100  1111111100

For each sequence the probability will be $p^{\text{number of zeros}}\times(1-p)^{\text{number of ones}}$. Adding up all of probabilities where the sum of the ones is even gets one a polynomial in $p$. The first few coefficients are plugged into OEIS and an expansion of $1/(2-p^2)$ is found.

Doing this again for $n=3$ and $n=4$ finds a pattern.

Running simulations matches well with the relatively simple formula. Using Mathematica:

nsim = 10000;
p0 = 0.6;
nzeros = 2;
zeros = ConstantArray[0, nzeros];
SeedRandom[12345];
g = RandomVariate[BernoulliDistribution[1-p0], {nsim, 5000}];
sp[seq_, z_] := If[Length[seq] == 0, 0, 
  Boole[EvenQ[Total[seq[[1 ;; SequencePosition[seq, z, 1][[1, 1]]]]]]]]

(* Estimated probability of an even number of 1's *)
Total[sp[#, zeros] & /@ g]/nsim // N
(* 0.6096 *)

(* Potential formula for the probability *)
pEven = 1/(2 - p^nzeros) /. p -> p0
(* 0.609756 *)