How many n-bit strings have at most m subsequent 0s?

225 Views Asked by At

The title already is the complete question, but I would like to add some details to make clear what I mean.

A $n$-bit string is an element of $\{0,1\}^n$. All possible 3-bit strings are:

  • 0: 000
  • 1: 001
  • 2: 010
  • 3: 011
  • 4: 100
  • 5: 101
  • 6: 110
  • 7: 111

As they can be encoded as binary numbers, there are $2^n$ $n$-bit strings. If I want at most $m=1$ subsequent 0s, the allowed sequences are:

  • 2: 010
  • 3: 011
  • 5: 101
  • 6: 110
  • 7: 111

Hence for $n=3, m=1$ the answer is 5.

What is the answer for arbitrary $n \in \mathbb{N}$ and $m \in \{0, \dots, n\}$? What is the answer for $n=30, m=4$?

Background of the question

This is the background of my question. Is it not necessary to understand it / answer it.

I have developed a symbol recognition application (see http://write-math.com) and I am now thinking about different possibilities to extend it to formula recognition. One part of formula recognition is to recognize the single symbols.

I get on-line data, that means I get the information how the symbol is written (in contrast to off-line data, where you only have pixels). My data is a list of strokes, where each stroke is a point. A point has x/y coordinates and a time:

[[(x=12, y=21, t=0), (x=7, y=13, t=3), ((x=7, y= 21, t=5), ...],
 [(x=0, y=0, t=6)],
 [...], ...]

Now I know that people always segment their symbols perfectly. That means a symbol can have multiple strokes, but one stroke always belongs to exactly one symbol. As I have one recognizer for symbols which works fairly well, it would be possible to simply use it to recognize multiple symbols by trying every possible segmentation. This can be modeled by taking a $n$-bit string for $n+1$ strokes. If a given bit is 0, then the neighboring strokes are connected:

stroke1   stroke2   stroke3
        0         1

Stroke1 and stroke2 are connected, but separated from stroke3
Hence the first symbol has the first 2 strokes,
the second symbol has the last stroke

I know that symbols with more than 4 strokes are very rare. So rare in fact, that I don't use any stroke after the 4th stroke for recognition. Hence I don't need to check $2^n$ segmentation, but less. My question is how many segementations I have to check.

1

There are 1 best solutions below

0
On

Let we fix the value of $m$ just for clarity. For instance, let $m=3$.

Let $E_n$ be the set of binary strings with $n$ bits and without $m$ consecutive zeroes. Let $A_n=|E_n|$.

We have $A_0=1,A_1=2,A_2=4$ since the constraint on the number of zeroes does not affect strings with fewer than three bits. Assume $n\geq 3$ and let $\sigma$ be an element of $E_n$. $\sigma$ can be only:

  • a $1$ followed by an element of $E_{n-1}$;
  • a $01$ followed by an element of $E_{n-2}$;
  • a $001$ followed by an element of $E_{n-3}$

hence we have the recursion: $$ A_{n} = A_{n-1}+A_{n-2}+A_{n-3} $$ with characteristic polynomial $x^3-(x^2+x+1)$. An explicit formula for $A_n$ can hence be derived in the same way we get a closed formula for Fibonacci numbers:

$$ A_n = k_1 \zeta_1^n + k_2 \zeta_2^n + k_3 \zeta_3^n $$ where $\zeta_1,\zeta_2,\zeta_3$ are the roots of the characteristic polynomial and $k_1,k_2,k_3$ depend on the initial conditions $A_0=1,A_1=2,A_2=4$.

It is also interesting to notice that the dominant root $\zeta_1$ can be bounded by multiplying the characteristic polynomial by $x-1$ then doing a step of the Newton's method with starting point $x_0=2$. That leads to:

$$ \zeta_1 \leq 2-\frac{1}{2^m}. $$