Why is it more efficient to compute a modular exponentiation by repeatedly raising to the power of two rather than, say, three?


I learned about modular exponentiation from this website, and in the section on fast modular exponentiation they compute the modulo of the number raised to the power of two and then repeat this step. Why not raise to the power of three instead? https://www.khanacademy.org/computing/computer-science/cryptography/modarithmetic/a/fast-modular-exponentiation

There are 4 answers below.

Answer (3 votes):

You might think exponentiation by cubing would be faster because it triples the exponent in one step, whereas exponentiation by squaring only doubles it. But this comparison is not fair, because exponentiation by cubing takes two multiplications per step (compute $x^2$, then multiply by $x$) while exponentiation by squaring takes only one.

If you instead compare apples to apples, two multiplications with exponentiation by squaring quadruple the exponent, while two multiplications with exponentiation by cubing only triple it. Ignoring any "shortcut" optimizations, multiplications in modular arithmetic take essentially constant time (depending only on the modulus), so exponentiation by squaring increases the exponent more for a given computational cost.

For the overall procedure, the situation is made a little bit more complicated, because in general $k$ is not a power of $2$ or $3$. Consequently these methods have to perform a recursive step: this recursion amounts to the decomposition $x^k=\prod_{i=0}^n x^{b_i a^i}$, where $b_i$ are digits in the base $a$ expansion of $k$.

So there is a question of how long each of the factors in the recursion takes to compute, and also how many of them there are. The number of factors is typically smaller for $a=3$ than for $a=2$, at least if you use the decomposition I wrote above (which requires a squaring step at the end of a given branch whenever $b_i=2$).

Still, none of the issues in the previous paragraph are enough to give exponentiation by cubing the advantage, simply because the factors themselves take significantly longer to compute.
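As an illustration, the $a=2$ case of the decomposition above is the familiar square-and-multiply method; a minimal Python sketch (the function name is my own):

```python
def pow_by_squaring(x, k, m):
    """Compute x**k % m by repeated squaring (right-to-left binary method)."""
    result = 1
    base = x % m
    while k > 0:
        if k & 1:                   # current binary digit b_i of k is 1
            result = result * base % m
        base = base * base % m      # one squaring doubles base's exponent
        k >>= 1
    return result
```

Each loop iteration costs at most two modular multiplications (one squaring plus one conditional multiply), so the whole computation takes $O(\log k)$ multiplications.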

Answer (2 votes):

If we want to calculate $x^N$, the monomials $$x, x^2, x^4, x^8, \ldots, x^{2^{\lfloor \log_2 N \rfloor}}$$ are the only initial powers we need to calculate (this is $\lfloor \log_2 N \rfloor$ multiplications, because we obtain each term by squaring the previous one). Then we write $N$ in binary as $$N = \sum_{i=0}^{\lfloor \log_2 N\rfloor} a_i2^i,$$ $a_i \in \left\{0,1\right\}$. If there are $k$ $1$s among the $a_i$, there are $k$ nonzero summands of $N$, and it takes $k-1$ multiplications to combine the corresponding monomials $x^{2^i}$ into $x^N$. In the worst case, we perform about $2 \log_2 N$ total multiplications of large numbers. In this analysis, we ignore the cost of writing $N$ in binary, because we assume $N$ is small compared to the powers of $x$ we will be dealing with ($N \ll x^N$).

If we try to use base 3 instead of base 2, we now have $\log_3 N$ monomials to calculate, but each one takes 2 multiplications to obtain (2 multiplications to get $x^3$ from $x$, another 2 to get $x^9$ from $x^3$, etc.), so in total $2 \log_3 N$ multiplications to get the monomials. In analogy with the above, the expansion of $N$ now has coefficients in $\{0,1,2\}$, and each $2$ introduces an extra multiplication in the final step: $$x^{24} = x^{2(9) + 2(3)} = x^9x^9x^3x^3,$$ so in the worst case the combining step takes another $2 \log_3 N$ multiplications, for about $4 \log_3 N$ in total.

Since $$\frac{4 \log_3 N}{2 \log_2 N} = \frac{2\ln 2}{\ln 3} \approx 1.26,$$ this is less efficient in the worst case. Moreover, the fact that the coefficients of $N$ in ternary take 3 values instead of 2 makes the algorithm more complicated.
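To make the accounting concrete, here is a small Python sketch (my own, following the worst-case-style counts above): the table costs a fixed number of multiplications per digit position (1 for squaring, 2 for cubing), and each unit in a digit of $N$ costs one multiplication in the combining step.

```python
def mult_count(N, base, cost_per_step):
    """Rough multiplication count for computing x^N via the base-`base`
    digits of N.

    Table: one step per digit position beyond the first, each costing
    `cost_per_step` multiplications (1 for base 2, 2 for base 3).
    Combining: each unit in a digit is one multiplication by a table entry.
    """
    digits = []
    while N:
        N, d = divmod(N, base)
        digits.append(d)
    table_cost = (len(digits) - 1) * cost_per_step
    combine_cost = sum(digits)
    return table_cost + combine_cost
```

For example, `mult_count(1000, 2, 1)` gives 15 while `mult_count(1000, 3, 2)` gives 16: base 3 already loses slightly, and the gap widens for exponents with many ternary digits equal to 2.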

This is not the end of the story, because there are ways to avoid some of the extra multiplications introduced by exponentiation in base 3, by combining terms that have coefficients of 2: $$x^{24} = (x^9x^3)^2,$$ a technique that can be applied broadly (see this question on the CS StackExchange).

So I have not answered why (or whether) base 2 is used in practice in modern algorithms, but I think I have shown why base 2 is optimal for the naive algorithm, and why there is no immediate improvement from moving to a larger base.

Answer (0 votes):

You can indeed speed up a modular exponentiation by working in a different base, and this is routinely done by constrained implementations in real life (e.g. smart cards). But to make it work, the base has to be a power of $2$.

Suppose you have to calculate $x^e \bmod p$, and you decide to work in base $8$. Then you pre-compute a table $T$ containing $1, x, x^2, x^3, x^4, x^5, x^6,$ and $x^7$ (all $\bmod p$).

Now you can perform the exponentiation (roughly) as follows:

y := 1
Loop while bits remain in e:
  y := y^8 mod p        // square y three times
  i := the next three bits of e (from msb to lsb)
  y := y * T[i] mod p   // multiply in the appropriate table entry
End Loop

The reason it has to be a power of $2$ is that otherwise the step y := y^8 would be too slow.

It takes time (and storage space) to build the table $T$, so you have to choose your base carefully to optimise the time saved. (But if the exponentiation is going to be performed many times for the same $x$, then you only need to build the table once, so you can make it as large as possible subject to your space constraints.) You can also effectively double the table size by only storing odd powers of $x$; you might like to work out for yourself how to do this.

Note that the number of squarings is unchanged by this method, so you can't achieve a speed-up of more than about 50% unless squaring is significantly faster than multiplication.
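A Python sketch of the fixed-window method described above (the code and function name are my own; `w=3` gives the base-8 variant):

```python
def modexp_window(x, e, p, w=3):
    """Fixed-window modular exponentiation in base 2**w.

    Precomputes T = [1, x, x^2, ..., x^(2^w - 1)] mod p, then processes
    e in w-bit groups from most significant to least: square w times,
    then multiply by the table entry for the current group.
    """
    T = [1] * (1 << w)
    for i in range(1, 1 << w):
        T[i] = T[i - 1] * x % p
    nwin = max(1, -(-e.bit_length() // w))   # number of w-bit windows
    y = 1
    for j in range(nwin - 1, -1, -1):
        for _ in range(w):
            y = y * y % p                    # y := y^(2^w), i.e. y^8 for w=3
        i = (e >> (j * w)) & ((1 << w) - 1)  # next w bits of e
        y = y * T[i] % p
    return y
```

The squaring count matches the plain binary method (about $\log_2 e$ squarings); the saving is that only one table multiplication per $w$ bits is needed, instead of up to one multiplication per bit.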

Answer (5 votes):

In order to raise $x$ to the power $N$ (modularly or otherwise) by repeated squaring or cubing, one has to add corrective multiplications to make the final exponent of the result equal to $N$ in base 2 or 3, respectively. This means one has to extract the base-2 (resp. base-3) digits of $N$. The advantage of binary is that, because of our hardware design choices, we already have the base-2 digits of $N$: extracting them requires only SHIFT and AND instructions. For base 3 one would have to divide by 3, a significantly more expensive operation. [This assumes $N$ is a variable input; if $N$ is a pre-chosen constant, one can of course precalculate the required steps without performing any divisions.]
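To illustrate the point, here is a Python sketch (my own) of the two digit-extraction loops: the binary one uses only shifts and masks, while the ternary one needs a genuine division at every step.

```python
def base2_digits(N):
    """Binary digits of N, least significant first: SHIFT and AND only."""
    digits = []
    while N:
        digits.append(N & 1)   # AND with 1 extracts the lowest bit
        N >>= 1                # right SHIFT drops it
    return digits or [0]

def base3_digits(N):
    """Ternary digits of N, least significant first: a division by 3 each step."""
    digits = []
    while N:
        N, d = divmod(N, 3)    # genuine division, no shift equivalent
        digits.append(d)
    return digits or [0]
```

On typical hardware the shift/mask loop compiles to a couple of cheap instructions per digit, while each `divmod(N, 3)` is a full integer division.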

If we were living in a world where our design choices favored some form of ternary, maybe this question would have been asking why base-2 isn't preferred over base-3. :-)