Can we prove that the size of a smallest grammar for $s = a^{2^n}$ is $2n$? (equivalently powering a number $a$)


Let $s = a^{2^n}$. Then clearly we can form a reduced single-string grammar as:

$$ S \to A^2 \\ A \to B^2 \\ B \to C^2 \\ \vdots \\ Z \to a^2 $$ which has size $2n$. Can we prove that this is indeed a smallest grammar for $s$, for every $n$? Since each rule corresponds to one squaring, this is analogous to asking whether it takes at least $n$ multiplications to compute $a^{2^n}$.
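As a sanity check, here is a short sketch (the nonterminal names `N0`, `N1`, ... and helper names are my own, not part of the question) that builds the doubling grammar and confirms it has size $2n$ and derives $a^{2^n}$:

```python
# Doubling grammar: N0 -> N1 N1, N1 -> N2 N2, ..., N(n-1) -> a a,
# with N0 the start symbol. Names are illustrative only.
def build_doubling_grammar(n):
    rules = {}
    for i in range(n):
        rhs = ("a" if i == n - 1 else f"N{i + 1}",) * 2
        rules[f"N{i}"] = rhs
    return rules

def grammar_size(rules):
    # Size = total number of symbols on the right-hand sides.
    return sum(len(rhs) for rhs in rules.values())

def expand(rules, sym):
    if sym not in rules:  # terminal symbol
        return sym
    return "".join(expand(rules, t) for t in rules[sym])

n = 8
g = build_doubling_grammar(n)
assert grammar_size(g) == 2 * n
assert expand(g, "N0") == "a" * 2 ** n
```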



Best answer:

It is not true when $n$ gets very large.

To make long strings of $a$s, it is more efficient to have a grammar of the form $$ A \to BBB \\ B \to CCC \\ C \to DDD \\ \ldots $$ than to have only two repetitions on each right-hand side. Two consecutive triplings multiply the length by $9$ for $6$ right-hand-side symbols, whereas three consecutive doublings multiply it by only $8$ for the same $6$ symbols.

The problem is that this won't give us powers of $2$ directly, and if we need to mention any significant fraction of the $a^{3^k}$ strings once again to combine them into an $a^{2^n}$, the benefit of counting by threes instead of twos will be lost.

On average, each time we spend $6$ symbols to triple twice, we would have had to spend $6 \log_8 9$ symbols to get as high with doublings, so we have saved only $6(\log_8 9 - 1) \approx 0.34$ symbols. So, again on average, we can afford to mention each of the powers of three only once every $2\cdot\frac{1}{0.34}\approx 5.9$ triplings.

Very well, then! $7$ triplings make a factor of $3^7=2187$. That can be done with $21$ symbols, whereas doublings would take $22$ symbols to reach just $2^{11}=2048$. So we have one symbol left over.
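A quick numeric check of these estimates (my own sketch, just restating the arithmetic above in code):

```python
import math

# Two triplings cost 6 rhs symbols and multiply the length by 9; matching
# that factor with doublings would cost 6*log_8(9) symbols, so the saving
# per 6 symbols spent on triplings is:
saving = 6 * (math.log(9, 8) - 1)
assert abs(saving - 0.34) < 0.01

# Seven triplings: 21 symbols for a factor of 3^7 = 2187, versus
# eleven doublings: 22 symbols for a factor of only 2^11 = 2048.
assert (7 * 3, 3 ** 7) == (21, 2187)
assert (11 * 2, 2 ** 11) == (22, 2048)
```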

So write $2^n$ in base 2187.

Use repeated triplings to make all the required powers of $2187$, and then have $2186$ different non-terminals $T_1$ through $T_{2186}$, where the right-hand side of $T_i$ consists of exactly those powers of $2187$ whose digit in the base-$2187$ expansion of $2^n$ equals $i$. When we have done all of that, we still have not used as many right-hand-side symbols as it would take doublings to reach $2^n$.

All we now need is $$ S \to T_1\,T_2T_2\,T_3T_3T_3\ldots \underbrace{T_{2185}\ldots T_{2185}}_{2185} \, \underbrace{T_{2186}\ldots T_{2186}}_{2186} $$ whose right-hand side is a few million symbols long, but does not depend on $n$. So when $n$ gets large enough, the savings we get from multiplying by $2187$ with $21$ symbols instead of by $2048$ with $22$ symbols will eventually grow larger than the cost of that fixed right-hand side.
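Some rough bookkeeping for this construction (my own back-of-the-envelope count, under the assumptions spelled out in the comments) shows just how large $n$ has to get before tripling wins:

```python
import math

# If 2^n has d digits in base 2187, the construction needs roughly:
#   * 7*(d-1) tripling rules (3 symbols each) for 2187^1 .. 2187^(d-1),
#   * at most d symbols spread over the rules for T_1 .. T_2186
#     (each power of 2187 appears in at most one T_i),
#   * the fixed S-rule with 1 + 2 + ... + 2186 symbols.
FIXED = 2186 * 2187 // 2   # about 2.39 million symbols

def tripling_size(d):
    return 21 * (d - 1) + d + FIXED

def doubling_size(d):
    # n is about d * log2(2187), and pure doubling costs 2n symbols.
    return math.ceil(2 * d * math.log2(2187))

# Find (roughly) where the tripling construction overtakes pure doubling.
d = 1
while tripling_size(d) >= doubling_size(d):
    d *= 2
print(d)   # on the order of ten million base-2187 digits
```

So the saving of about $0.19$ symbols per base-$2187$ digit only pays off the fixed $S$-rule for truly enormous $n$, consistent with "when $n$ gets large enough".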


(Repetition counts larger than $3$ will not help us. With $x$ repetitions on each right-hand side, the efficiency per symbol is $\frac{\log x}{x}$, which has its maximum at $x=e\approx 2.718$ and then decreases as $x\to\infty$.)
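A one-liner check of this parenthetical (my own sketch; note that $x=2$ and $x=4$ tie exactly, since $\frac{\log 4}{4} = \frac{\log 2}{2}$):

```python
import math

# Per-symbol efficiency of x repetitions per right-hand side: log(x)/x.
# It peaks at x = e, so among integer repetition counts x = 3 is best.
eff = lambda x: math.log(x) / x
assert eff(3) > eff(2)
assert all(eff(3) > eff(x) for x in range(4, 20))
```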

Second answer:

Thanks to @aid78 in the comments, the proof should be simpler than I thought it would be.


When we minimize the grammar $S$, we're minimizing $\|S\|=\sum\limits_{A \in V(S)} |\gamma_A|$, where $V(S)$ is the set of variables and $\gamma_A$ is the right-hand side of the rule for $A$; that is, $\|S\|$ is just the total number of symbols on the right-hand sides of the production rules. This is the definition used in the smallest grammar problem literature.

But this is equivalent to minimizing the number of concatenations since $\# \text{concats} = \|S\| - 1$. In fact minimizing $\|S\|$ is the same as minimizing $\|S\| + c$ for any constant $c \in \Bbb{R}$.

@aid78 suggests looking at the largest string formable using precisely $k$ steps ($k$ concatenations), for $k \geq 0$.

These are precisely the strings $a^{2^k}$!
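A brute-force sanity check of this claim (my own sketch): track the set of string lengths formable, starting from length $1$ (the string $a$), where each round every pair of already-formed lengths may be concatenated. Since each concatenation at most doubles the largest length, the maximum after $k$ rounds is $2^k$:

```python
# Lengths reachable from "a" by repeated concatenation of already-formed
# strings; the maximum with k concatenations is 2^k, i.e. the string a^(2^k).
def max_length(k):
    lengths = {1}
    for _ in range(k):
        lengths |= {x + y for x in lengths for y in lengths}
    return max(lengths)

assert [max_length(k) for k in range(7)] == [2 ** k for k in range(7)]
```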

Now, if $k' = \|S'\| \lt 2 n = \|S\|$, then there were $k' - 1$ concatenations. The largest string formable is $a^{2^{k' - 1}}$ which is strictly shorter than $a^{2^n}$ since $2^{k'-1} \lt 2^{n} \iff k'-1 \lt n = \|S\|/2$.

Here's where I'm stuck.