Number of distinct strings 'covering' all strings of same length, with restricted index selection.

91 Views Asked by At

A string $x \in n^n$ 'covers' another string $y \in n^n$ if $x$ contains all the symbols that constitute $y$. For eg., $x = abcdeb$ covers $y = aaacbc$. The problem is to derive a formula to count the number of $l$-length combinations of the indices $i_1i_2 \dots i_n$ such that $x_{i_1}x_{i_2}\dots x_{i_l}$ covers $y_{i_1}y_{i_2}\dots y_{i_l}$, where $x,y \in n^n$. The indices are selected in a non-decreasing order, i.e., $i_1 \leq i_2 \leq \dots \leq i_l$. The count is to be performed over all possible $x,y \in n^n$.

I took a stab at solving the problem. First I transformed the given strings into binary strings $\in 2^n$, where the bit at index $i$ represents if the $i^{th}$ symbol from the $n$ possible symbols is present in the given string $(1$ represents presence and $0$ absence). For eg., if $x,y \in \{0,1,\dots,8\}^9$, and $y = 014801484$, then $y' = \{110010001\}$. Next, we consider the hamming weight of $y'$ represented as: $w = ||y'||_1$. I came up with the following estimate for the count of all $l$-combinations of the binary strings $x' \in 2^n$, that cover the corresponding $l$-combinations of some other string $y' \in 2^n$:

\begin{equation}\label{eq} \sum\limits_{w=1}^n \dbinom{n}{w} \cdot \sum\limits_{m=0}^w \dbinom{n-m}{l} \cdot 2^{n-w} \qquad \qquad \end{equation} But this seems incorrect!

First let me explain how I got to this equation: initially, I consider the $x's$ with $1's$ at exactly that same indices as another string $y'$ with hamming weight $w$. There are $2^{n-w}$ options to fill the rest of the $n-w$ indices in such $x's$. Hence, we may choose any $l$-out-of-$n$ indices. Next, we consider the $x's$ that disagree with the $y'$ at a single $1$ spot (i.e., where $y'$ is $1)$, hence from such $x's$ we must choose from the remaining $n-1$ indices and so on...But, the $2^{n-w}$ factor remains the same as we're still fixing $w$ spots in the $x's~ (\text{i.e., }n-1 \; 1's \text{ and one } 0)$. To account for the the total number of binary strings of hamming weights $w = 1$ to $w=n$, I added the term $\dbinom{n}{w}$.

But as mentioned earlier, this formula is incorrect as it overcounts the result. For eg., for $y' = 101101$ with $l=3$, all of the following $l$ combinations are equivalent $(= 101)$:

  • $i_1i_2i_3$

  • $i_4i_5i_6$

  • $i_1i_2i_4$

  • $i_1i_2i_6$

  • $i_3i_5i_6$

and so on...but the aforementioned formula counts them as distinct $l$ bits long strings covering the corresponding $l$ bits of $y'$. I have been stuck on this problem for some time, and any help is greatly appreciated. Thanks!

1

There are 1 best solutions below

1
On BEST ANSWER

I looked at your previous question (or this one earlier?) and it seemed a little different, so I might be misunderstanding, but here’s a try.

In your example with $y = 014801484$, it sounds like you want to count the number of strings in $\{0,1,\dots,8\}^9$ (left-padded integers between $000000000$ and $888888888$) that contain all of the digits $0$, $1$, $4$, and $8$.

If so, use inclusion-exclusion as follows.

For a subset $D$ of available digits, let $N(n,d,D)$ be the set of left-padded $n$-digit integers in base $d$ (strings) that are missing all of the digits in $D$. For example, $N(9,9,\{4\})=\vert\{0,1,2,3,5,6,7,8\}^9$, and $N(7,5,\{0,1\})=\{2,3,4\}^7$. In general, $|N(n,d,D)\vert=(d-\vert D\vert)^n$. (Note that $|D|$ can’t be greater than $d$.)

The cardinality of $N(n,d,D)$ doesn’t depend on the specific elements of $D$, only on the size of $D$, so let $m(n,d,k)$ be the number of strings missing every digit in some specific set of $k$ digits.

If you have $d$ available digits, and $D(y)$ is the set of digits that appear in $y$, you are trying to find out how many left-padded $n$-digit base-$d$ integers are not in $\bigcup_{i\in D(y)} N(n,d,\{i\})$, which is $d^n-|\bigcup_{i\in D(y)} N(n,d,\{i\})|$.

Now by inclusion-exclusion, $d^n-|\bigcup_{i\in D(y)} N(n,d,\{i\})|\\=d^n-\sum_{i\in D(y)} |N(n,d,\{i\})|+\sum_{i,j\in D(y),i\neq j} |N(n,d,\{i,j\})|\pm\cdots\\= d^n-|D(y)|m(n,d,1) +{|D(y)|\choose2}m(n,d,2) -{|D(y)|\choose3}m(n,d,3)\cdots \pm {|D(y)|\choose|D(y)|}m(n,d,|D(y)|)$.

This equals ${|D(y)|\choose0}(d-0)^n-{|D(y)|\choose1}(d-1)^n +{|D(y)|\choose2}(d-2)^n \cdots \pm {|D(y)|\choose|D(y)|}(d-|D(y)|)^n$.

To make this easier to read, let $M=|D(y)|$, so the number you want is

${M\choose0}(d-0)^n-{M\choose1}(d-1)^n +{M\choose2}(d-2)^n \cdots \pm {M\choose M}(d-M)^n$.

There may be a closed-form expression for this, but Mathematica didn’t give me one when I asked. On the other hand, it’s not hard to evaluate numerically. For your example of $9$-digit strings with $9$ available digits that “cover” the digit-set $\{0,1,4,8\}$, Mathematica tells me there are ${4\choose0}(9-0)^9-{4\choose1}(9-1)^9 +{4\choose2}(9-2)^9 - {4\choose 3}(9-3)^9+ {4\choose 4}(9-4)^9\\=9^9-4\cdot8^9+6\cdot7^9-4\cdot6^9+5^9= 54313560$

strings.

Does this agree with your calculations?