Shortest string possible

591 Views Asked by At

I was at an interview, and I was asked to give the shortest string generated given this context free grammar. I did not review in years, so I think I got it wrong. What is the answer so I know it for future purposes?

$$\begin{align*} &S\to ABA\mid SS\\ &A\to S0\mid T1T\\ &B\to S1\mid 0\\ &T\to 0 \end{align*}$$

Thanks.

4

There are 4 best solutions below

0
On BEST ANSWER

You can actually do this via a form of abstract interpretation! We're going to interpret this grammar (not the syntax trees, the grammar itself!) into the naturals, namely the minimum length each non-terminal can generate. To do this, we'll replace each production arrow by '=', each terminal by 1, concatenation by '+' and alternation by 'min'. For clarity, I'll write the symbols in the interpretation in lower case (and I'm going to walk roughshod over some details).

Applying this procedure to the given grammar, we get: $$\begin{align} &s = \min(a + b + a, s + s) \\ &a = \min(s + 1, t + 1 + t) \\ &b = \min(s+1,1) \\ &t = 1 \end{align} $$

Substituting $t=1$ (i.e. we've found out that $T$ produces at least one terminal), and using $\min(a+c,b+c)=\min(a,b)+c$ we get: $$\begin{align} &s = \min(a + b + a, s + s) \\ &a = \min(s, 2) + 1 \\ &b = \min(s,0) + 1 \end{align} $$

Now we can deduce (using $x \geq 0 \implies \min(x,0) = 0$) that $b = 1$. Substituting that gives us: $$\begin{align} &s = \min(2a + 1, 2s) \\ &a = \min(s, 2) + 1 \end{align} $$

Now we are seemingly stuck, because of the mutual recursion between $a$ and $s$. And in fact, $s=0, a=1$ would be a solution to this last system. This (loosely) corresponds to using the production $S \to SS$ forever, without producing any terminals. But we could also reason that $S$ can never produce the empty string - by doing a second interpretation, which I'll put at the end in order to keep this story going!

So we also have $s \geq 1$. Plugging that into $s = \min(2a + 1, 2s)$ along with $a \geq 1$ gives us $s \geq 2$, whence $a \geq 3$. From that we get that $s \geq 4$, and we finally arrive at $s \geq 7$.

Therefore we have that $s=7, a=3$ is a solution to the last system of equations shown, subject to the constraint that $s \geq 1$. From that we finally conclude that $S$ produces at least 7 terminals.

This may seem like a lot of work for such a trivial problem, but after the idea of using a second interpreation, it doesn't require big leaps of insight so a computer could do it. In fact, parser generators use a similar procedure to see what the first terminal coming from a non-terminal can be and so on.

One of the details I've omitted is that a simple grammar like $S \to 1S$ (with interpretation $s = s+1$ seemingly doesn't have a solution - but it does, $s=\infty$! There are ways around this of course (you could prove that this new grammar doesn't terminate by yet another abstract interpretation, or you could interpret into $\mathbb{N} \cup \{\infty\}$, but I hope I at least got the basic idea across...

Appendix: Proving $S$ can't produce the empty string

Instead of a numerical interpretation, we're going to interpret the grammar into the booleans, to answer the question "Can this terminal produce the empty string?" A terminal is interpreted by false($\perp$), concatenation by $\wedge$ and alternation by $\vee$ (and the empty string by truth, but that's not necessary here). We get:

$$\begin{align} &s = (a \wedge b \wedge a) \vee (s \wedge s) \\ &a = (s \wedge \perp) \vee (t \wedge \perp \wedge t) \\ &b = (s \wedge \perp) \vee \perp \\ &t = \perp \end{align} $$

With a bit of boolean logic, this rapidly simplifies to $s,a,b,t=\perp$, so none of the non-terminals can produce the empty string. Whew!

0
On

To get a terminal string, you must get rid of the non-terminals $S$ and $A$. Using $S\to SS$ clearly does not help, so you must start with $S\to ABA$. The only way to get rid of the $A$’s is with $A\to T1T$, so the best you can do is $S\to ABA\to^* T1T0T1T\to^* 0100010$.

0
On

Hint:

  1. There are no rules which make strings shorter.
  2. What ways are there to rid of a symbol A?
0
On

Let $m$ be the length of the shortest string produced by $S$. We are trying to find $m$.

$T$ and $B$ clearly can't turn into any string of length less than 1.

$A$ can turn into $S0$ or $T1T$. $S0$ must turn into a string of length at least $m+1$, and by hypothesis we can produce a string of length $m$ starting from $S$, and $m+1$ is not the shortest. So if we ever use $A$ it will be via the $T1T$ production, producing a string of length at least 3.

$S$ can turn into $SS$, which has length at least $2m$. But $2m \ge m$, so there must be a way to produce the shortest possible string without using the $SS$ production. So there must be a way to produce it using only the $ABA$ production. Any string produced by $ABA$ has length at least $3 + 1 + 3 = 7$.

So the answer is $S\to ABA \to T1T\ 0\ T1T \to 010\ 0\ 010$ for a length of 7.