Probability of a specific binary string sub-sequence occurring

402 Views Asked by At

I've been wracking my brains for a while to try to come up with a non-brute-force solution for this problem.

If you have have a random binary sequence of length $N$, what is the probability that some sub-sequence $X$ is present? My first thought was that it only depends on $N$ and the length of $X$, but from experimentation it's clear that it depends on the contents of $X$.

It seems similar to the problem where you want to know the probability of at least $A$ heads and $B$ tails when flipping a coin $C$ times, but I'm not sure how to then impose the restriction of a specific sequence (Not that I know how to solve that problem either).

I've computed a few results using brute force:

$N=5$ , $X = "111"$, $P(N,X) = 0.25$

$N=5$, $X = "101"$, $P(N,X) = 0.34375$

$N=5$, $X = "100"$, $P(N,X) = 0.375$

2

There are 2 best solutions below

0
On

Here is a way to get at least a recurrence relation:

Let us look at the string "100". We want to find out how many strings of length $n$ do not contain it.

Clearly such a string can only end in ...1, ...10, or ...000.
Now define:
$a_n$: Number of bitstrings of length $n$ that do not contain "100" and end in ...1
$b_n$: Number of bitstrings of length $n$ that do not contain "100" and end in ...10
$c_n$: Number of bitstrings of length $n$ that do not contain "100" and end in ...000

Then we have the recurrence relations $$\begin{align*} a_{n+1} &= a_n + b_n + c_n \\ b_{n+1} &= a_n \\ c_{n+1} &= c_n \end{align*}$$

(because, for example, a string of length $n+1$ that counts towards $a_{n+1}$ can be created by appending "1" to any string belonging to $a_n$, $b_n$ or $c_n$; a string that counts towards $b_n$ can only be created by appending "0" to a string belonging to $a_n$)

Starting with $a_3 = 4$, $b_3 = 2$ and $c_3 = 1$ we can iteratively calculate the values $a_n$, $b_n$ and $c_n$.

The number of binary sequences of length $n$ that do contain "100" is then given by $2^n - (a_n+b_n+c_n)$.

0
On

This is expanding on the solution from @hgmath

For the sequence "100", define the generating function $A(x)=\sum_{n=3}^{\infty}a_n x^{n-3}$. Similarly, define $B(x)=\sum_{n=3}^{\infty}b_n x^{n-3}$ and $C(x)=\sum_{n=3}^{\infty}c_n x^{n-3}$.
Then, these can be solved like this in Mathematica (easy to solve by hand, too):

Solve[{
   A - a3 == x (A + B + C),
   B - b3 == x A,
   C - c3 == x C},
  {A, B, C}] /. {a3 -> 4, b3 -> 2, c3 -> 1}

Simplify[(A + B + C) /. (%[[1]])]

The solution is $A+B+C=\frac{-4 x^2-2 x+1}{x^3-2 x+1}$.

Then, you can find the probabilities by differentiating the generating function.

P[n_] := (2^n - (D[(7 - 2 x - 4 x^2)/(1 - 2 x + x^3), {x, n - 3}] /. x -> 0)/(n - 3)!)/2^n
P[5]

P[5] is 3/8 for example.

Unfortunately, there will be a different generating function for every given binary sequence, but the general approach works for any of them.