CFG with even length and not in the form of SS.

432 Views Asked by At

I know that we can construct a CFG of all strings over $\{0, 1\}$ with even length using the following

$$T \mapsto 0T0 \mid 0T1 \mid 1T0 \mid 1T1 \mid \epsilon$$

But I want the string to have non-equal halves, meaning that they are not in the form $SS$, where $S$ is some string.

How I can modify the above CFG to satisfy this constraint?

1

There are 1 best solutions below

0
On BEST ANSWER

The language that you want to prove to be context-free is

$$L = \{uv \in \{0,1\}^* \mid |u| = |v| \text{ and } u \neq v\} $$

where $|w|$ is the length of the string $w$. Actually, it turns out that $L = L'$ where

$$L' = \{u_1 x \, u_2 \, v_1 y \, v_2 \in \{0,1\}^* \mid x,y\in \{0,1\}, \ |u_1| = |u_2|, \ |v_1| = |v_2| \text{ and } x \neq y \} $$

The formal proof that $L = L'$ is below and is quite technical. But intuitively, it is clear that $L = L'$. Indeed, if a string $w$ of even length can be split in two non-identical halves, then some characters from one of the halves can be "lent" to the other in order to decompose $w$ in the way described in the definition of $L'$. And the inverse process is also well defined.

The psychological advantage of considering $L'$ instead of $L$ (although they are the same set) is that the language $L'$ is clearly generated by the following context-free grammar:

\begin{align} S &\ \to \ T_0 T_1 \ \mid \ T_1 T_0 \\ T_0 &\ \to \ 0 \ \mid \ 0 \, T_0 \, 0 \ \mid \ 0 \, T_0 \, 1 \ \mid \ 1 \, T_0 \, 0 \ \mid \ 1 \, T_0 \, 1 \\ T_1 &\ \to \ 1 \ \mid \ 0 \, T_1 \, 0 \ \mid \ 0 \, T_1 \, 1 \ \mid \ 1 \, T_1 \, 0 \ \mid \ 1 \, T_1 \, 1 \end{align}

The intuition is that $T_0$ (the rule in the second line) yields a string of the form $w_1 0 \, w_2$ with $|w_1| = |w_2|$, while $T_1$ (the rule in the third line) yields a string of the form $w_1' 1 \, w_2'$ with $|w_1'| = |w_2'|$. Putting $T_0$ and $T_1$ together (the rule in the first line) yields a string of the form described in the definition of $L'$.


Let us prove that $L = L'$ in a rigorous way.

Proof of $L \subseteq L'$. Let $w \in L$. So, $w = uv$ where $|u| = |v| = n$ for some $n \geq 0$, and $u \neq v$; that is, $u$ and $v$ have the same length and differ at least in their $i^\text{th}$ character, for some position $i$. Therefore, $u = x_1 \dots x_{i-1} x_i \, x_{i+1} \dots x_n$ and $v = y_1 \dots y_{i-1} y_i \, y_{i+1} \dots y_n$ with $x_i \neq y_i$ for some $1 \leq i \leq n$ (all the $x_j$'s and $y_j$'s are in $\{0,1\}$). There are three cases:

  1. either $i-1 = n - i$ (that is, $|x_1 \dots x_{i-1}| = |x_{i+1} \dots x_{n}|$ and $|y_1 \dots y_{i-1}| = |y_{i+1} \dots y_{n}|$); then, we set $u_1 = x_1 \dots x_{i-1}$ and $u_2 = x_{i+1} \dots x_{n}$, and $v_1 = y_1 \dots y_{i-1}$ and $v_2 = y_{i+1} \dots y_{n}$; thus, $w = u_1 x_i u_2 \,v_1 y_i v_2$ with $|u_1| = i-1 = n-1 = |u_2|$, $|v_1| = i-1 = n-i = |v_2|$ and $x_i \neq y_i$; therefore, $w \in L'$;

  2. or $i-1 > n - i$ (that is, $|x_1 \dots x_{i-1}| > |x_{i+1} \dots x_{n}|$ and $|y_1 \dots y_{i-1}| > |y_{i+1} \dots y_{n}|$); then, we set $k = 2i - 1 - n$, and $u_1 = x_1 \dots x_{i-1}$ and $u_2 = x_{i+1} \dots x_{n} \, y_1 \dots y_k$, and $v_1 = y_{k+1} \dots y_{i-1}$ and $v_2 = y_{i+1} \dots y_{n}$; thus, $w = u_1 x_i u_2 \,v_1 y_i v_2$ with $|u_1| = i-1 = n-i + k= |u_2|$, $|v_1| = i - 1- k = n-i = |v_2|$ and $x_i \neq y_i$; therefore, $w \in L'$;

  3. or $i-1 < n - i$ (that is, $|x_1 \dots x_{i-1}| < |x_{i+1} \dots x_{n}|$ and $|y_1 \dots y_{i-1}| < |y_{i+1} \dots y_{n}|$); this case is analogous and dual to the previous one.

Proof of $L' \subseteq L$. Let $w \in L'$. So, $w = u_1 x \, u_2 \, v_1 y \, v_2$ with $x,y \in \{0,1\}$, $|u_1| = |u_2| = n$ for some $n \geq 0$, $|v_1| = |v_2| = m$ for some $m \geq 0$, and $x \neq y$. Therefore, $u_1 = x_1 \dots x_{n}$ and $u_2 = x_{1}' \dots x_n'$ and $v_1 = y_1 \dots y_{m}$ and $v_2 = y_{1}' \dots y_m'$ with $x_i \neq y_i$, where all the $x_j$'s, $x_j'$'s, $y_j$'s and $y_j'$'s are in $\{0,1\}$. There are three cases:

  1. either $n = m$ (that is, $|u_1| = |u_2| = |v_1| = |v_2|$); then, we set $u = u_1 \, x \, u_2$ and $v = v_1 \, y \, v_2$, hence $w = uv$ with $|u| = 2n + 1 = 2m + 1 = |v|$ and $u \neq v$ (because $x \neq y$, where $x$ is the $(n+1)^\text{th}$ character in $u$, and $y$ is the $(n+1)^\text{th}$ character in $v$); therefore, $w \in L$;

  2. or $n > m$ (that is, $|u_1| = |u_2| > |v_1| = |v_2|$); then, we set $k = n-m$ and $u = u_1 \, x \, x_1' \dots x_m'$ and $v = x_{m+1}' \dots x_{n}' v_1 \, y \, v_2$, hence $w = uv$ with $|u| = n + m + 1 = n - m + 2m + 1 = |v|$ and $u \neq v$ (because $x \neq y$, where $x$ is the $(n+1)^\text{th}$ character in $u$, and $y$ is the $(n+1)^\text{th}$ character in $v$); therefore, $w \in L$;

  3. or $n < m$ (that is, $|u_1| = |u_2| < |v_1| = |v_2|$); this case is analogous and dual to the previous one.