What is the number of binary strings of length N with exactly R runs of ones, with C total ones?

1.8k Views Asked by At

I'm concerned with the total number of ones, and the total number of runs, but not with the size of any of the runs.

For example, $N=8$, $R=3$, $C=5$ includes 11101010, 01101011 among the 24 total possible strings.

I can compute these for small $N$ easily enough, but I am specifically interested in the distribution for $N=65536$. As this will result in very large integers, the log probability distribution is equally useful.

I found [1] and [2], which includes this:

Let $N_{n;g_k,s_k}$ denote the number of binary strings which contain for given $g_k$ and $s_k$, $g_k=0,1,…,⌊\frac{s_k}{k}⌋$, $s_k=0,k,k+1,…,n$, exactly $g_k$ runs of 1’s of length at least $k$ with total number of 1’s (with sum of lengths of runs of 1’s) exactly equal to $s_k$ in all possible binary strings of length $n$.

An expression for this is given in eq. (24):

$N_{n;g_k,s_k} = \sum_{y=0}^{n-s_k} {y+1 \choose g_k } {s_k-(k-1)g_k-1 \choose g_k-1} \sum_{j=0}^{⌊\frac{n-y-s_k}{k}⌋} (-1)^j {y+1-g_k \choose j} {n-s_k-kj-g_k \choose n-s_k-kj-y} $

for $g_k \in \{1, ..., ⌊\frac{s_k}{k}⌋\}$, $s_k \in \{k, k+1, ..., n\}$.

I think this is exactly what I'm looking for, with $k = 1$, $s_k = C$ and $g_k = R$. However, when I implemented this I did not get the expected results (Python shown below, edge cases omitted), based on comparing to counting all strings for N=8. I am working backwards to try to understand where I might have gone wrong, but not having much luck yet. I wonder if I am misunderstanding the result.

def F(x, y, n):
    # x = C or s_k (cardinality)
    # y = R or g_k (runCount)
    # n = N (total bits)

    a1 = 0
    for z in range(n-x+1):
        b1 = choose(z+1, y) * choose(x-1, y-1)
        a2 = 0
        for j in range(n-z-x+1):
            a2 += (-1) ** j * choose(z+1-y, j) * choose(n-x-j-y, n-x-j-z)
        a1 += b1 * a2

    return a1

Note that the choose function uses factorial, which I realize won't work for larger $N$ - but should be fine for $N=8$.

Edit: corrected a sign error typo in eq. (24) and the equivalent error in the python code.

[1] Counting Runs of Ones and Ones in Runs of Ones in Binary Strings, Frosso S. Makri, Zaharias M. Psillakis, Nikolaos Kollas https://file.scirp.org/pdf/OJAppS_2013011110241057.pdf

[2] On success runs of a fixed length in Bernoulli sequences: Exact and asymptotic results, Frosso S.Makria, Zaharias M.Psillakis http://www.sciencedirect.com/science/article/pii/S0898122110009284

2

There are 2 best solutions below

6
On BEST ANSWER

runs_ones_1

Consider a binary string and let's put an additional (dummy) fixed $0$ at the start of the it. We individuate as a run a $0$ followed by contiguous $1$'s.

So, given a string of length $N$, total of ones $C$, and number of runs $R$, we will have $N-C$ zeros, of which $N-C-R+1$ are "free", that is not tight to mark the runs.

The number of ways to constitute the runs is the number of (standard) compositions of $C$ into $R$ parts, that is $$ {{C-1} \choose {R-1}}$$.
The number of ways to place the "free" zeros will be equal to the number of weak compositions of their number into $R+1$ parts (in front of the first and then after each run), i.e.: $$ {{N-C-R+1+R+1-1} \choose {R}}={{N-C+1} \choose {R}}$$.

Which confirms N.Shales's answer, so that the merit should go to him.

Addendum

Concerning formula (24), for what I can see, it looks like that in the 3rd binomial ${{y+1+g_k} \choose {j}}$ there is a sign typo, and that it should be $\cdots -g_k$.
Then putting for instance $k=1,\; s_k=n-1 \; g_k=2$ it will correctly give $n-2$,
and with $k=1,\; s_k=C \; g_k=R$ it will give the formula above.
But not having the full text, I cannot check that further.

4
On

Perhaps there is some value in solving this with generating functions. In the present case we have the marked generating function using $z$ for ones and $w$ for zeros and $y$ for runs of ones

$$(1+y(z+z^2+z^3+\cdots)) \\ \times \left(\sum_{q\ge 0} (w+w^2+w^3+\cdots)^q y^q (z+z^2+z^3+\cdots)^q\right) \\ \times (1+w+w^2+w^3+\cdots).$$

This is

$$\left(1+\frac{yz}{1-z}\right) \times \left(\sum_{q\ge 0} \frac{w^q}{(1-w)^q} y^q \frac{z^q}{(1-z)^q}\right) \times \frac{1}{1-w} \\ = \left(1+\frac{yz}{1-z}\right) \times \frac{1}{1-ywz/(1-w)/(1-z)} \times \frac{1}{1-w}.$$

Extracting the coefficient on $[y^R]$ for $R$ runs we get

$$\frac{w^R z^R}{(1-w)^R (1-z)^R} \frac{1}{1-w} + \frac{z}{1-z} \frac{w^{R-1} z^{R-1}}{(1-w)^{R-1} (1-z)^{R-1}} \frac{1}{1-w}.$$

Next do the coefficient on $[z^C]$ for $C$ ones

$$[z^{C-R}] \frac{w^R}{(1-w)^R (1-z)^R} \frac{1}{1-w} + [z^{C-R}] \frac{w^{R-1}}{(1-w)^{R-1} (1-z)^{R}} \frac{1}{1-w} \\ = {C-1\choose R-1} \frac{w^R}{(1-w)^{R+1}} + {C-1\choose R-1} \frac{w^{R-1}}{(1-w)^{R}}.$$

Finally extract $[w^{N-C}]$ for the remaining zeros:

$${C-1\choose R-1} [w^{N-C-R}] \frac{1}{(1-w)^{R+1}} + {C-1\choose R-1} [w^{N-C-R+1}] \frac{1}{(1-w)^{R}} \\ = {C-1\choose R-1} {N-C\choose R} + {C-1\choose R-1} {N-C\choose R-1} \\ = {C-1\choose R-1} {N-C+1\choose R}.$$

This confirms the accepted answer, no credit claimed here.