The shortest word in context free language

882 Views Asked by At

Let $G=(\Sigma,N,R,S)$ be a context-free grammar. For every production rule A --> w, we say that its length is $r$ if $|w|=r$. In addition $n = |N|$, and $k =$ the maximal length of a production rule in the grammar. We assume here that $L(G)$ is not empty. Find a tight upper bound (an arithmetic expression f(n,k)) for the shortest word in L(G). Namely, find an expression such that the shortest word in L(G) is at most f(n,k), but there is another context-free grammar F such that the shortest word in L(F) is exactly f(n,k). Thanks a lot :)

1

There are 1 best solutions below

3
On

Intuitively, if we're trying to make the word as short as possible, we might want to get rid of the non-terminals as soon as possible and replace them by as few terminals as we can.

More formally, if the language $L(G)$ is non-empty, the grammar must contain at least one rule of the form $A\rightarrow w$ with $w$ containing no non-terminals (otherwise one would never be able to get rid of them all); let's call such rule terminal too. Now, consider the terminal rule with $|w|$ as short as possible. Whenever we encounter the non-terminal $A$, we'd have to eventually apply some terminal rule to get rid of it. However, any such rule would result in at least as many terminals as applying the chosen one straight away; so we can safely apply the chosen rule as soon as we can and forget all other rules applicable to $A$ (as they won't be used ever).

The grammar can be simplified a bit now: Since $A$ is now essentially the same as $w$, we can actually get rid of $A$ completely and replace it by $w$ in all the rules. Doing so reduces the set of non-terminals by one and possibly expands the length of the remaining production rules. This expansion is at most $k$-fold, though, because each rule has at most $k$ non-terminals on the right-hand side (and, very importantly, this stays true after the expansion too). The new grammar has $(n-1)$ non-terminals and the maximal length of production rule in it dos not exceed $k^2$.

Not very surprisingly, we can repeat the same process over and over, reducing the number of non-terminals by $1$ and expanding the production rules at most $k$-fold each time. Repeating it $(n-1)$ times in total yields a grammar with just one non-terminal and production rules of length not exceeding $f(n,k)=k^n$, which gives us the desired upper bound on the length of the shortest word in $G$.

In order to see that it's tight, it's sufficient to consider a very simple grammar $G_{n,k}=(\{a\},\{A_1,A_2,\ldots A_n\}, R, A_1)$ with ruleset $R$: $$\begin{eqnarray*} A_1 & \rightarrow & \overbrace{A_2A_2 \ldots A_2}^{k} \\ A_2 & \rightarrow & A_3A_3 \ldots A_3 \\ & \ldots & \\ A_{n-1} & \rightarrow & A_nA_n \ldots A_n\\ A_n & \rightarrow & \underbrace{aa\ldots a}_{k} \\ \end{eqnarray*}$$