Number of binary strings without adjacent char more than $n$

165 Views Asked by At

$N$ is a natural number, let $0\leq\alpha\leq 1$ a fixed real number, I want to estimate the number of length-N binary strings which statisfy: the same character (i.e $0$ or $1$) could not occur repeatedly and subsequently more than $\alpha N$ times.

For example: substrings like $000....000$ (more than $\alpha N$ many $0$) could not exist.

I only need to know how the number $I(N)$ of these strings grows when $N\to \infty$.

2

There are 2 best solutions below

0
On BEST ANSWER

The answer is close to $2^n$. Here's an easy computation which suggests something like this: for a random binary string, the expected number of times $k$ consecutive zeroes or ones appear is $\frac{2(n-k+1)}{2^k}$. Already from this simple calculation we expect that a "typical" binary string has runs of length about $\log_2 n$. It turns out to be unlikely for a binary string to have runs much longer than this, which can be proven as follows.

The number $I(n, k)$ of length-$n$ binary strings not containing $k$ consecutive $0$s or $1$s can be computed as follows. We can encode such a string by recording first its initial digit and second a word of length $n-1$ using letters $D$ and $S$ (for "different" and "same") which record whether each successive digit is the same or different than the previous one. This is readily seen to be a bijection, and moreover the original string does not contain $k$ consecutive $0$s or $1$s iff the new string does not contain $k-1$ consecutive $S$'es. The first digit is irrelevant. It follows that

$$I(n, k) = 2 J(n-1, k-1)$$

where $J(n-1, k-1)$ is the number of length-$(n-1)$ strings of $D$s and $S$'es not containing $k-1$ consecutive $S$'es. The behavior of this sequence is thoroughly analyzed in Example V.4 of Flajolet and Sedgewick's Analytic Combinatorics on page 308, which contains in particular the following estimate: if $L$ denotes the random variable given by the longest run of $S$'es in a random string of length length $n$, then

$$\mathbb{P}(L < \lfloor \log_2 n \rfloor + h) = \exp(-f(n) 2^{-h-1}) + O \left( \frac{\log n}{\sqrt{n}} \right)$$

where $f(n) = 2^{ \{ \log_2 n \} }$ and $\{ x \}$ denotes the fractional part; in particular $1 \le f(n) \le 2$. Replacing $n$ with $n-1$ and setting $h = k - \lfloor \log_2 n \rfloor$ gives

$$\frac{I(n, k)}{2^n} = \frac{J(n-1, k-1)}{2^{n-1}} = \exp \left( -f(n-1) 2^{-k - \lfloor \log_2 n \rfloor - 1} \right) + O \left( \frac{\log n}{\sqrt{n}} \right)$$

from which we conclude that for $\alpha \in (0, 1]$, $I(n, \alpha n)$ is close to $2^n$ as $n \to \infty$; their ratio is asymptotically $1 + O \left( \frac{\log n}{\sqrt{n}} \right)$ (for $k$ this large the exponential term is $1$ plus a term so small that the error term dominates), which in particular approaches $1$.

The weaker conclusion that $\lim_{n \to \infty} \frac{I(n, \alpha n)}{2^n} = 1$ can probably be proven by estimating the variance of the number of times $k$ consecutive $S$'es appear and then applying Chebyshev's inequality. Flajolet and Sedgewick give an asymptotic expansion for the expected length $\mathbb{E}(L)$ of the longest run (which is $\log_2 n + O(1)$ but they give a more precise asymptotic than this) and mention that the variance is $O(1)$.

0
On

Partial answer: You can make (at least) somewhere around $2^{N-2/\alpha}$ different valid bitstrings (there is some rounding going on, but that's roughly the result you get). You do this by inserting $01$ every $\alpha N$ bits to break any potential consecutive substrings, and then choosing the remaining bits freely.

I don't know whether I am missing a significant amount of valid strings this way, though. It depends on what "significant" means.