Length of smallest DFA for the language consisting of string $x$ s.t. the number of $0$'s in $x$ is divisible by $k$ and number of $1$'s in $x$ is odd

1k Views Asked by At

"Let $k \ge 2$. Let $L$ be the set of strings in $\{0,1\}^*$ such that $x \in L$ if and only if the number of $0$'s in $x$ is divisible by $k$ and number of $1$'s in $x$ is odd. What is the minimum number of states in a deterministic finite automaton (DFA) that accepts $L$?"

This is a question from the (discontinued) GRE-CS exam (see question-50 in http://faculty.cse.tamu.edu/walker/Quals/GRE_CompSci_1.pdf). The answer to the question is $2k$. And the answer is quite a plausible one after one does a bit of work and draws a few diagrams.

I have two questions:

(i) How to show rigorously that $2k$ is the correct answer?

(ii) Is there a general known strategy to deal with problems of this pattern (not necessarily this specific problem).

To give an actual specific example, consider the language $L_n \subseteq \{0,1\}^*$ such that: $$L_n=\{s\in \{0,1\}^*\, |\,\, abs(N_0(s)-N_1(s)) \leq n \}$$ where $abs$ denotes the absolute value and $N_0(s),N_1(s)$ denote the number of $0$'s and $1$'s in $s$ respectively. It seems that the problem of smallest DFA (for $L_n$) can also be solved by first determining by inspection the smallest DFA for $L_n$ and then applying the minimisation algorithm for confirmation (as pointed by Rick Decker in comments).

Edit: It was mentioned in one of the answer below that the language $L_n$ just mentioned above isn't regular. The language I had actually in mind was this:

$L_n=\{s\in \{0,1\}^*\, |\,$ for every substring t of s, $abs(N_0(t)-N_1(t)) \leq n \}$

3

There are 3 best solutions below

2
On BEST ANSWER

Let me slightly change your notation by considering the alphabet $A = \{a,b\}$ (this is just to avoid any possible confusion with the states, see below). Let me also use the (standard) notation $|u|_c$ to denote the number of occurrences of a letter $c$ in the word $u$.

You are asking for the number of states of the minimal DFA of the language $$ L_k = \{ u \in A^* \mid |u|_a \equiv 0 \bmod k \space \text{and} \space |u|_b \equiv 1 \bmod 2\}. $$ It is easy to design a DFA $\mathcal{A}_k = (Q, A, \cdot, q_-, F)$ recognising $L_k$ be taking $Q = \mathbb{Z}/{k\mathbb{Z}} \times \mathbb{Z}/{2\mathbb{Z}}$ as set of states , $q_ = (0, 0)$ as initial state, $F = \{(0, 1)\}$ and transitions of the form $$ (p,q)\xrightarrow{a} (p+1,q) \qquad (p,q)\xrightarrow{b} (p, q+1) $$ It remains to prove that $\mathcal{A}_k$ is minimal. Observe that each letter ($a$ and $b$) induce a permutation on $Q$, so that $\mathcal{A}_k$ is a permutation automaton. Furthermore, $\mathcal{A}_k$ is accessible, which means that every state can be reached from the initial state.

The result now follows from two results of independent interest.

Proposition 1. In a finite accessible permutation automaton, any two pairs of states can be connected by a path.

Proof. Let $\mathcal{A} = (Q, A, \cdot, q_-, \{q_f\})$ be an accessible permutation automaton and let $p, q \in Q$. Since $\mathcal{A}$ is accessible, there exists a path from $q_-$ to $p$ labelled by a word $u$ and another path from $q_-$ to $q$ labelled by a word $v$. Let $n = |Q|$. Since $u$ induces a permutation on $Q$, the word $u^{n!-1}$ induces the inverse permutation (indeed, since the symmetric group on $|Q|$ has order $n!$, $u^{n!}$ induces the identity) and hence is the label of a path from $p$ to $q_-$. It follows that the word $u^{n!-1}v$ is the label of a path from $p$ to $q$.

Proposition 2. Every finite accessible permutation automaton with a single final state is minimal.

Proof. Let $\mathcal{A} = (Q, A, \cdot, q_-, \{q_f\})$ be an accessible permutation automaton with a single final state. Let $q_1$ and $q_2$ be two equivalent states: since there is only one final state, this means that, for every word $u$, the conditions $q_1 \cdot u = q_f$ and $q_2 \cdot u = q_f$ are equivalent. But, by Proposition 1, there exists at least one word $u$ such that $q_1 \cdot u = q_f$. Since $u$ induces a permutation on $Q$, one gets $q_1 = q_2$ and thus $\mathcal{A}$ is minimal.

Second question. You are asking for the number of states of the minimal DFA of the language $$ R_n = \{ u \in A^* \mid \left||u|_a - |u|_b \right| \leqslant n\}. $$ Unfortunately, this is not a regular language. For instance, $R_0 = \{ u \in A^* \mid |u|_a = |u|_b\}$ is well-known to be a context-free nonregular language since $R_0 \cap a^*b^* = \{a^nb^n \mid n \geqslant 0\}$.

0
On

For any string $s \in \{0, 1\}^\ast$, let $(x_s, y_s)$ be the tuple such that $x_s$ is the number of $0$'s in $s$ and $y_s$ is the number of $1$'s in $s$.

Now $s \in L$ if and only if $x_s \equiv 0 \pmod k$ and $y_s \equiv 1 \pmod 2$, which means the same as saying that $(x_s, y_s) = (0, 1)$ in $\mathbb{Z}_k \times \mathbb{Z}_2$.

Suppose $s = s_1s_2$ (concatenation) and we have already read $s_1$, which has $x_{s_1}$ number of $0$'s and $y_{s_1}$ number of $1$'s. Then in order for $s$ to be accepted, we need $s_2$ to have $x_{s_2} \equiv -x_{s_1} \pmod k$ and $y_{s_2} \equiv 1 - y_{s_1} \pmod 2$.

This means that after reading $s_1$, the DFA must be in a state in which it can only accept strings $s_2$ which correspond to tuples $(x_{s_2}, y_{s_2}) = (-x_{s_1}, 1 - y_{s_1})$ in $\mathbb{Z}_k \times \mathbb{Z}_2$.

But as $(x_{s_1}, y_{s_1})$ ranges over all tuples in $\mathbb{Z}_k \times \mathbb{Z}_2$, so does $(-x_{s_1}, 1 - y_{s_1})$.

This means that for each pair $(x_{s_2}, y_{s_2}) \in \mathbb{Z}_k \times \mathbb{Z}_2$, the DFA must have a state that accepts a string containing $x_{s_2} \pmod k$ number of $0$'s and $y_{s_2} \pmod 2$ number of $1$'s. But these states must be distinct since they accept different sets of strings.

That means we cannot have less than $2k$ states in the DFA.

A DFA with exactly $2k$ states may be obtained as follows:

Initial state: $state_{0,\text{ }0}$

For $i = 0, ..., k-1$ and $j = 0, 1$:

$state_{i,\text{ }j}:$
$0 \rightarrow state_{i+1 \pmod{k},\text{ }j}$
$1 \rightarrow state_{i,\text{ }j+1 \pmod 2}$

The accepting state is $state_{0,\text{ }1}$.

1
On

The Myhill-Nerode theorem (which generalizes the reasoning in Tob Ernack's answer) provides a general way to prove lower bounds for DFA sizes:

If you have a set $A$ of words such that any two words $x$ and $y$ in $A$ have a distinguishing extension -- that is a $z$ such that one but not both of $xz$ and $yz$ are in $\mathcal L$, then a deterministic automaton for $\mathcal L$ must have at least $|A|$ states.

In your case we can use $A=\{\mathtt 0^n\mathtt 1^m \mid n\in\{1,2,\ldots,k\}, m\in\{0,1\}\}$, which has cardinality $2k$.


(In fact, the size of the minimal DFA for a regular language is exactly the size of a maximal $A$ that satisfies this condition).