The number of binary sequences with given characteristics.

647 Views Asked by At

Given a binary sequence, for example: "00101101", the following properties can be determined:

n: the total length of the sequence
p: the total number of `1`s in the sequence
k: a vector of transition counts [{0->0}, {0->1}, {1->0}, {1->1}]
     reading the sequence from left to right

In the case case of "00101101", they are determined as:

n: 8, p: 4, k: [1 3 2 1]

However, in the case, there in total 9 strings with these characteristics:

 {"01010011" "00110101" "01001101" "01001011" "01011001" 
  "01101001" "00101101" "00101011" "01100101"}

Is there a general equation to find the count of sequences matching a particular set of characteristics?

2

There are 2 best solutions below

3
On BEST ANSWER

Using generating functions is a very generic way to solve this type of problem is of course always a correct approach. For this relatively simple problem we can also solve it directly with some observations and obtain an easy to calculate expression for the desired result.

In the characterisation you like to use, there are essentially 6 numbers: $\{n,p,k_{00},k_{01},k_{10},k_{11}\}$, where $k_{xy}$ stands for the number of transitions $x \to y$. These are however not independent but correlated and also note that the vector $k$ is more conveniently interpreted as a transition matrix rather than a vector. From this characterisation we immediately can conclude that for any corresponding binary sequence of length $n$ what it's first and last digits have to be. This follows from the following more detailed intermezzo.

Let us first discuss what happens if we add one transition to the original arbitrary sequence of $n-1$ transitions, i.e., we add explicitly the transition from the last digit to the first to make the pattern cyclic. In this case the characterisation is slightly modified and has the form $\{n,p,K_{00},K_{01},K_{10},K_{11}\}$, where for three transitions we have $K_{xy}=k_{xy}$ and for exactly one transition we have $K_{xy}=k_{xy}+1$.

With this, it is easy to see that we need to have the following constraints $ p = K_{01} + K_{11}= K_{10} + K_{11}$ and $ n - p = K_{00} + K_{01} = K_{00} + K_{10}$, because we know how many transitions there should be starting with 1 as well as how many that finish with 1, and the same for 0. It is then immediately clear that $K_{01}=K_{10}$ which follows from the equations, but is also evident from the fact that in a cyclic sequence, where we start and finish with the same initial digit, there must be as many transitions $0 \to 1$ as there are $1 \to 0$.

If we use $t$ to denote the number of transitions $0 \to 1$, we find that the characterisation for the cyclic case is $\{n,p,n-p-t,t,t,p-t\}$ for any positive numbers $n,p,t$ provided that $n \geq p+t$ and $p \geq t$ to guarantee non-negative numbers.

We know that in the original problem exactly one of the last four numbers should be one less than in the above characterisation. For a given characterisation $\{n,p,k_{00},k_{01},k_{10},k_{11}\}$, we can immediately determine whether it is possible at all for this pattern to appear by checking whether addition of unity to any one of the last four numbers results in match with the pattern established above. Note that by default this also fixes the first and last digit in the sequence. I.e., if we have $k_{01}>k_{10}$ all sequences have to be of the type $0 \cdots 1$ and for $k_{01}<k_{10}$ all sequences have to be of the type $1 \cdots 0$. In the case $k_{01}=k_{10}$ we get for $k_{00}+k_{01}=n-p$ that the sequence is $1 \cdots 1$ and otherwise $0 \cdots 0$.

So from the characterisation $\{n,p,k_{00},k_{01},k_{10},k_{11}\}$ we know the first and last digit of an allowed sequence, as well as the number of transitions $0 \to 1$ and $1 \to 0$. In particular this means that we know the number of intervals $I_1$ and $I_0$ that contain only 1's and 0's respectively. We also know the number of times $p$ and $q=n-p$ that the digit 1 and 0 appear in the sequence. To find all possible allowed sequences, we therefor have to find the number of ways in which $I_1-p$ digits 1 and $I_0-q$ digits 0 can distributed over the corresponding intervals. This is the same as the product of counting the number of ways in which $p$ 1's and $q$ 0's can be partitioned in $I_1$ and $I_0$ non-empty sets.

This gives as a final result $$\binom{n - p - 1}{ I_0 - 1} \binom{p - 1}{I_1 - 1}$$

In the particular example we have $\{8,4,1,3,2,1\}$. Since $k_{01}=2<k_{10}=3$ all sequences have to start with a 0 and end with a 1. Therefor there are three intervals that contain only 0's and three intervals that contain only 1's in each sequence. And since we have four digits 1 and four digits 0, we only need to find the number of non-empty partitions of 4 elements into three intervals, which is $\binom{4-1}{3-1} = 3$, we get the final number $3 \times 3 = 9$.

3
On

One way to solve counting problems about patterns in strings (whether it's avoiding them or including them) is the Goulden-Jackson cluster method. The paper referenced in the link is a classic by Noonan and Zeilberger. (A basic undergraduate course in combinatorics is the only prerequisite for being able to read the paper.) They give an efficient method for computing the generating function for strings that avoid a collection of patterns. As an added bonus, their website also contains several Maple programs that implement their algorithms.

For instance, you ask to find the number of binary strings of a specified length with a specified number of 1's, 00's, 01's, 10's and 11's. For instance, in their package DAVID_IAN there's a method GJgfDetail that computes the generating function over an alphabet for the number of strings that contains various numbers of patterns. For your example about binary strings with various numbers of 0's, 1's, 00's, 01's, 10's, and 11's, the Maple command

GJgfDetail({0, 1}, {[0, 0], [0, 1], [1, 0], [1, 1]}, x, t)

returns the generating fuction $$\frac {t_{{0,0}}t_{{1,1}}x_{{0}}x_{{1}}-t_{{0,1}}t_{{1,0}}x_{{0}}x_{{ 1}}-t_{{0,0}}x_{{0}}x_{{1}}+t_{{0,1}}x_{{0}}x_{{1}}+t_{{1,0}}x_{{0}}x_ {{1}}-t_{{1,1}}x_{{0}}x_{{1}}-t_{{0,0}}x_{{0}}-t_{{1,1}}x_{{1}}+x_{{0} }+x_{{1}}+1}{t_{{0,0}}t_{{1,1}}x_{{0}}x_{{1}}-t_{{0,1}}t_{{1,0}}x_{{0} }x_{{1}}-t_{{0,0}}x_{{0}}-t_{{1,1}}x_{{1}}+1} $$

In particular, the number of binary strings of length 8 with four 1's (equivalently four 1's and four 0's), one 00, three 01's, two 10's, and one 11 would be the coefficient of $x_0^4\, x_1^4\,t_{0,0}^1 \,t_{0,1}^3\, t_{1,0}^2\, t_{1,1}^1$ in the multi-variable Taylor expansion of the generating function. The Maple command

coeftayl(gf, [x[0], x[1], t[0, 0], t[0, 1], t[1, 0], t[1, 1]] = [0, 0, 0, 0, 0, 0], [4, 4, 1, 3, 2, 1])

returned (in happy agreement with your result)

9.