Is faster than linear addition possible using parallel computing?


I saw a comment on some YouTube video claiming that it's possible to add two $n$-bit numbers in $o(n)$ time (i.e., faster than linear). At first I took this as correct, since it was written on the internet and everything on the internet must be true, but I couldn't think of any way to do it, so I thought I'd ask here.

Is it possible to add two $n$-bit numbers in less than $O(n)$ operations? Assume you have an unlimited number of cores/processors available for parallel computation.

If it were, for example, XOR instead of addition, it's easy to see how to achieve an $O(\log n)$ algorithm: split the numbers into $n/2$ pairs of 2-bit numbers, compute the XOR of each pair on $n/2$ different cores, and then continue in the same way on the $n/2$-bit numbers we get from the results of the XORs. But can we do something similar for addition? If not, is there a proof that it's impossible?
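(For concreteness, here is a small Python sketch of the divide-and-conquer XOR scheme described above. The function name and the bit-list representation, least significant bit first, are my own choices. The two recursive calls are independent, so on a parallel machine they would run on disjoint cores, giving $O(\log n)$ depth even though the total work is $O(n)$.)

```python
def xor_parallel(x_bits, y_bits):
    # Recursively split both operands in half. The two halves are
    # independent subproblems, so a parallel machine can assign them
    # to disjoint groups of cores: depth O(log n), total work O(n).
    n = len(x_bits)
    if n == 1:
        return [x_bits[0] ^ y_bits[0]]
    h = n // 2
    left = xor_parallel(x_bits[:h], y_bits[:h])    # one group of cores
    right = xor_parallel(x_bits[h:], y_bits[h:])   # another group, concurrently
    return left + right
```

(Of course, for XOR each bit is independent, so depth $O(1)$ with $n$ cores is also possible; the point of the recursive version is just to show the tree-shaped schedule.)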


Accepted answer:

It is not difficult to get $O(\log n)$ time (i.e. gate delays). It is of course wrong to say "$O(\log n)$ operations", because obviously you need $Ω(n)$ work (i.e. gates).

The trick is to define two carryover bits for each block: one for the actual carryover, and one for the carryover if an extra one were added to that block. Specifically, for each pair of positions $a,b$ with $a ≤ b$, let $c_0(a,b)$ be the carryover bit for adding the bits at positions $[a,b)$ in the inputs, and let $c_1(a,b)$ be the carryover bit for adding the same bits plus an extra one. (Here the positions run from $0$ at the least significant bit to $n-1$ at the most significant bit.)

For example, if the bits at positions $[a,b)$ were 1101 and 0010 then adding them yields 1111 (no carryover) but adding an extra one yields 10000 (has carryover), hence $c_0(a,b) = 0$ and $c_1(a,b) = 1$.

The key observation is that for any positions $a,b,c$ with $a ≤ b ≤ c$, you can compute $c_0(a,c)$ and $c_1(a,c)$ from $c_0(a,b)$ and $c_1(a,b)$ and $c_0(b,c)$ and $c_1(b,c)$! (I will leave this as an easy exercise for you.)

Using this recursive relation we can easily construct a log-depth recursive circuit that computes $c_0(a,b)$ and $c_1(a,b)$ for all positions $a$ and $b = a+2^k$ with $k∈ℕ$, using a balanced binary tree with an $O(1)$-sized circuit at each node. This recursive computation takes a single upward pass.

Note that $c_0(0,2^k)$ is the true carryover bit for adding the first $2^k$ bits. All we need now is to compute $c_0(0,b)$ for every $b$ and we would be done! Indeed we can easily do so by a single downward pass on the same binary tree. (I will leave this as an exercise for you as well.)

Second answer:

To expand on a comment to user21820's excellent answer: one can actually show that addition can be implemented in constant depth, with a polynomial-size circuit involving AND, NOT, and OR gates with arbitrary fan-in. That is, in circuit complexity terms, addition can be implemented in the class $\textsf{AC}_0$.

See, for instance, these lecture notes.

Now, the "arbitrary fan-in" part may sound like a big assumption; however, since it is known that $\textsf{AC}_0\subseteq \textsf{NC}_1$ (where $\textsf{NC}_1$ is the class of functions computable by $O(\log n)$-depth, poly-size circuits with fan-in $\leq 2$ gates), this is in fact a stronger statement than saying addition can be implemented in logarithmic depth.
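(To illustrate the constant-depth view, here is a Python sketch of the standard two-level carry-lookahead formula; the function name and bit-list representation, least significant bit first, are my own. The carry into position $i$ is $c_i = \bigvee_{j<i}\bigl(g_j \wedge \bigwedge_{j<k<i} p_k\bigr)$ with generate $g_j = x_j \wedge y_j$ and propagate $p_k = x_k \vee y_k$: one unbounded-fan-in OR of unbounded-fan-in ANDs per output bit, hence constant depth and polynomial size.)

```python
def add_ac0(x, y):
    # Carry-lookahead in two unbounded-fan-in levels (the AC^0 view):
    #   g[j] = x[j] AND y[j]   (position j generates a carry)
    #   p[k] = x[k] OR  y[k]   (position k propagates an incoming carry)
    #   c[i] = OR over j < i of (g[j] AND p[j+1] AND ... AND p[i-1])
    # Each c[i] is a single big OR of big ANDs -> constant circuit depth.
    n = len(x)
    g = [x[j] & y[j] for j in range(n)]
    p = [x[j] | y[j] for j in range(n)]
    c = [0] * (n + 1)
    for i in range(1, n + 1):
        c[i] = int(any(g[j] and all(p[k] for k in range(j + 1, i))
                       for j in range(i)))
    # Each sum bit is a 3-input XOR, which is constant-size on its own.
    return [x[i] ^ y[i] ^ c[i] for i in range(n)] + [c[n]]
```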