Deciding equivalence of regular languages

1.8k Views Asked by At

Given two regular expressions $R$ and $S$ on an alphabet $\Sigma$ it is possible to decide their equivalence as follows:

  1. build two finite automata $M_R$ and $M_S$ such that $L(R) = L(M_R)$ and $L(S) = L(M_S)$
  2. build an automaton $M$ such that $L(M) = (L(M_R) - L(M_S)) \cup (L(M_S) - L(M_R))$
  3. test emptyness of $L(M)$ using a reachability algorithm on $M$

I was wondering if there is another way to decide equivalence. Suppose $M_R$ and $M_S$ are the minimal DFA (without epsilon-moves) such that $L(R) = L(M_R)$ and $L(S) = L(M_S)$. If they have a different number of states, then $R$ and $S$ are not equivalent. Otherwise let $m$ be the number of states of the two automata. Is it true that $L(M_R) = L(M_S)$ iff $\{x \in L(M_R) : |x| \leq m +1 \} = \{x \in L(M_S) : |x| \leq m +1 \}$? How to prove that with the Myhill-Nerode theorem?

3

There are 3 best solutions below

4
On BEST ANSWER

Theorem Let $\mathcal{A}$ and $\mathcal{B}$ be two DFA's with $m$ states and $n$ states, respectively. Then $\mathcal{L}(\mathcal{A}) = \mathcal{L}(\mathcal{B})$ iff $\{x \in \mathcal{L}(\mathcal{A}) : |x| < mn \} = \{x \in \mathcal{L}(\mathcal{B}) : |x| < mn \}$.

Proof We prove the two directions of the double implication.

($\Rightarrow$) Trivial.

($\Leftarrow$) We prove the contrapositive. Suppose $\mathcal{L}(\mathcal{A}) \neq \mathcal{L}(\mathcal{B})$ and let $w$ be a word of minimal lenght such that $w \not\in \mathcal{L}(\mathcal{A}) \cap \mathcal{L}(\mathcal{B}) = \mathcal{L}(\mathcal{A} \times \mathcal{B})$. Suppose, by way of contradiction, that $|w| \geq mn$. We set $X = \{\hat{\delta}_{\mathcal{A} \times \mathcal{B}}(q_0,x) : x \text{ prefix of } w\}$. Since $|X| \geq mn+1$ and $|Q_{\mathcal{A}\times\mathcal{B}}| = mn$, there exist two prefixes $u,u'$ of $w$ such that $\hat{\delta}_{\mathcal{A} \times \mathcal{B}}(q_0,u) = \hat{\delta}_{\mathcal{A} \times \mathcal{B}}(q_0,u')$. We can assume w.l.o.g. that $u$ is a prefix of $u'$. Hence there exist two strings $v,z$ such that $uv = u'$ and $u'z = w$ and hence $uvz = w$. Moreover since $u \neq u'$, the string $v$ is non-trivial, (i.e. $|v| \geq 1$). Now note that $\hat{\delta}_{\mathcal{A} \times \mathcal{B}}(\hat{\delta}_{\mathcal{A} \times \mathcal{B}}(q_0,u),v) = \hat{\delta}_{\mathcal{A} \times \mathcal{B}}(q_0,u)$, i.e. the characters composing the string $v$ lead the automaton $\mathcal{A} \times \mathcal{B}$ through a loop from the state $\hat{\delta}_{\mathcal{A} \times \mathcal{B}}(q_0,u)$ into itself. Therefore the string $uz$ is such that $\hat{\delta}_{\mathcal{A} \times \mathcal{B}}(q_0,uz) = \hat{\delta}_{\mathcal{A} \times \mathcal{B}}(q_0,w)$ and $|uz|<|w|$. This contradicts the minimality of $w$.

1
On

I have not read this paper, "Testing the Equivalence of Regular Languages", by Almeida, Moreira, and Reis, but the abstract sounds quite promising:

Antimirov and Mosses presented a rewrite system for deciding the equivalence of two (extended) regular expressions and argued that this method could lead to a better average-case algorithm than those based on the comparison of the equivalent minimal DFAs. In this paper we present a functional approach of a variant of that method, prove its correctness and give some experimental comparative results. Although being a refutation method, our preliminary results lead to the conclusion that indeed this method is feasible and it is, almost always, faster than the classical methods.

3
On

Here is a concrete counterexample to the conjecture, now that I’m reading it correctly.

Let $R=(a\lor b)^3\Big(a(a\lor b)^5\Big)^*$ and $S=(a\lor b)^3\Big(b(a\lor b)^5\Big)^*$. These have the $7$-state DFAs whose transitions tables are shown below; $s_0$ is the initial state, and $s_3$, shown in red, is the sole acceptor state.

$$\begin{array}{r|cc} M_R&a&b\\ \hline s_0&s_1&s_1\\ s_1&s_2&s_2\\ s_2&\color{red}{s_3}&\color{red}{s_3}\\ \color{red}{s_3}&s_4&s_6\\ s_4&s_5&s_5\\ s_5&s_0&s_0\\ s_6&s_6&s_6 \end{array} \qquad\qquad \begin{array}{r|cc} M_S&a&b\\ \hline s_0&s_1&s_1\\ s_1&s_2&s_2\\ s_2&\color{red}{s_3}&\color{red}{s_3}\\ \color{red}{s_3}&s_6&s_4\\ s_4&s_5&s_5\\ s_5&s_0&s_0\\ s_6&s_6&s_6 \end{array}$$

The eight words of length $3$ are the only words of length at most $8$ in both $L(M_R)$ and in $L(M_S)$, but the languages are not the same: $a^9\in L(M_R)\setminus L(M_S)$.

The problem is that the minimal redundant loop $s_3\to s_4\to s_5\to s_0\to s_1\to s_2\to s_3$ is too long and its starting point too far removed from the initial state, so the difference in the two languages doesn’t show up within the first $8$ transitions.