There is a sequence of operations on grammars of a string that strictly decreases the grammar size down to a smallest grammar.


I'm trying to figure out the smallest grammar problem, which, yes, I know is intractable in general since it's such a hard problem, but humor me for a sec.

Let $g$ be a smallest grammar for the string $s$ over the alphabet $\Sigma = \{a,b,c, \dots, z\}$; i.e., for formal language theorists, $s \in \Sigma^*$.

For example $s = abaaabbaba$ and $g$ might be:

$$ g \to AaaAbAa \\ A \to ab $$
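For concreteness, here's a small Python sketch that expands this grammar and measures its size; I'm assuming the size of a grammar is the total length of all right-hand sides (the definition hinted at in the comments), and the start symbol is named `S` rather than $g$:

```python
def expand(grammar, symbol):
    """Recursively expand a nonterminal into the terminal string it derives."""
    return "".join(
        expand(grammar, ch) if ch in grammar else ch
        for ch in grammar[symbol]
    )

def size(grammar):
    """Total length of all right-hand sides."""
    return sum(len(rhs) for rhs in grammar.values())

g = {"S": "AaaAbAa", "A": "ab"}
print(expand(g, "S"))  # abaaabbaba
print(size(g))         # 9 = |AaaAbAa| + |ab| = 7 + 2
```

Compare size 9 here with size 10 for the trivial grammar $g \to s$.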

Well, we could undo the involvement of the variable $A$ in one step by substituting the RHS of $A$ wherever it occurs in the other rules. If this doesn't result in a larger grammar, then the variable was not required, so let's only work with "minimal" smallest grammars.
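As a sketch of this undo step (same conventions as the example grammar above; `inline` is my own name, not a standard one):

```python
def inline(grammar, var):
    """Undo `var` in one step: substitute its RHS wherever it occurs
    in the other rules, then drop the rule for `var`."""
    rhs = grammar[var]
    return {lhs: r.replace(var, rhs)
            for lhs, r in grammar.items() if lhs != var}

g = {"S": "AaaAbAa", "A": "ab"}
print(inline(g, "A"))  # {'S': 'abaaabbaba'} -- size 10 > 9, so A was required
```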

The topic statement

Since there is a sequence of such operations leading all the way to $g \to s = abaaabbaba$, there must be a sequence of reverse operations leading from the grammar $g \to s$ all the way to one of its smallest grammars. And these operations can be chosen so that each one strictly decreases the size of the intermediate grammar.

The reverse operation used above would be taking some set of occurrences of the contents of a variable $A$ within the grammar's rules, replacing them with the symbol $A$, and adding the rule for $A$ if it isn't already present.
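A sketch of this reverse operation (for simplicity this version replaces every occurrence rather than an arbitrary chosen subset; `factor_out` is my own name):

```python
def factor_out(grammar, var, pattern):
    """Replace occurrences of `pattern` in the right-hand sides with the
    nonterminal `var`, adding the rule var -> pattern if it is missing."""
    new = {lhs: r.replace(pattern, var) for lhs, r in grammar.items()}
    new.setdefault(var, pattern)
    return new

g = {"S": "abaaabbaba"}           # the trivial grammar g -> s, size 10
print(factor_out(g, "A", "ab"))   # {'S': 'AaaAbAa', 'A': 'ab'} -- size 9
```

Starting from $g \to s$, this single step already reaches the smallest grammar of the example, with the size strictly decreasing from 10 to 9.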

Best answer

A (somewhat) informal argument, by induction on the number $n$ of symbols in your string:

The statement is clearly true for $n=1$ (the initial production is already minimal). For illustration, note that any string of length $2$ is also already minimal: if

$A \rightarrow ab,$

the only possible replacement grammar is

$A \rightarrow B$

$B \rightarrow ab$

which has greater length.

Assume that the claim holds for strings of length $n-1$, and assume further, wlog, that the added letter $a$ is the first. Using the usual notation, the length-$n$ string is thus of the form $a \alpha \beta$, with $\alpha$ or $\beta$ possibly empty (the notation will hopefully make sense when reading the cases below).

There are now 2 cases:

Case 1:
There is no reduction possible beyond those we can already make for the string $\alpha \beta$ of length $n-1$. However, the reductions possible for that string can equally be made with $a$ prepended to $\alpha$ (or to its leading factor). Assume that starting from the string $\alpha \beta$ it took $k \le n-1$ steps to reach a minimal grammar. Then it also takes $k$ steps for the string of length $n$, and we are done.

Case 2:
It is easy to see (after thinking about it; this is where the proof isn't fully formal, as this claim should be a lemma) that any additional reduction in step $n$ means that the string is of the form

$aA \alpha aA \beta aA \gamma,$

the substring $aA$ occurring at least three times in the string of length $n$, with any of the Greek symbols possibly empty. In this case, we add an additional reduction for a new non-terminal $B$:

$B \rightarrow aA$

This will not interfere with the reductions possible for the old string $\alpha \beta$ (we perform them 'with an $a$ added on the left' as needed; which, again, a fully formal argument would need a lemma for), so in this case too, we have shown the step from $n-1$ to $n$.
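The size accounting behind the "at least three times" condition can be made explicit: factoring out a pattern of length $p$ that occurs $k$ times saves $(p-1)k$ symbols but adds a rule costing $p$, so the net change is $p-(p-1)k$. A quick check for $p=2$, the case of $aA$ above:

```python
def size_change(k, p=2):
    """Net grammar-size change from replacing k occurrences of a
    length-p pattern with a fresh nonterminal plus one new rule."""
    return p - (p - 1) * k

for k in (2, 3, 4):
    print(k, size_change(k))  # 2 -> 0 (no strict gain), 3 -> -1, 4 -> -2
```

So with $p=2$, two occurrences only break even; a strict decrease requires $k \ge 3$.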

Note: The meaning of the term 'length' should be clear from context (it can mean the length of a string, or the size of a grammar as you defined it in the comment below your question).