I saw a comment on a YouTube video claiming that it's possible to add two n-bit numbers in $o(n)$ time (i.e., faster than linear). At first I assumed it was correct, since it was written on the internet and everything on the internet must be true, but I couldn't think of any way to do it, so I decided to ask here.
Is it possible to add two n-bit numbers in less than $O(n)$ operations? Assume you have an unlimited number of cores/processors available for parallel computation.
If it were, for example, XOR instead of addition, it's easy to see how to achieve an $O(\log n)$ algorithm: split the numbers into $n/2$ 2-bit pieces, compute the XORs of those pieces using $n/2$ different cores, and then continue doing the same with the $n/2$-bit number we get from the results of the XORs. But can we do something similar for addition? If not, is there a proof that it's not possible?
It is not difficult to get $O(\log n)$ time (i.e., gate delays). It would of course be wrong to say "$O(\log n)$ operations", because you obviously need $\Omega(n)$ work (i.e., gates).
The trick is to define two carryover bits for each block: one for the actual carryover, and one for the carryover that would result if an extra one were added to that block. Specifically, for each pair of positions $a,b$ with $a ≤ b$, let $c_0(a,b)$ be the carryover bit for adding the bits at positions $[a,b)$ in the inputs, and let $c_1(a,b)$ be the carryover bit for adding the same bits but plus an extra one. (Here the positions start from $0$ at the least significant bit and go to $n-1$ at the most significant bit.)
For example, if the bits at positions $[a,b)$ were `1101` and `0010`, then adding them yields `1111` (no carryover), but adding an extra one yields `10000` (has carryover); hence $c_0(a,b) = 0$ and $c_1(a,b) = 1$.

The key observation is that for any positions $a,b,c$ with $a ≤ b ≤ c$, you can compute $c_0(a,c)$ and $c_1(a,c)$ from $c_0(a,b)$, $c_1(a,b)$, $c_0(b,c)$ and $c_1(b,c)$! (I will leave this as an easy exercise for you.)
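If you want to check your answer to the exercise, here is a small Python sketch (warning: the `compose` function gives away the solution; the function names are mine, not standard) that verifies the composition rule by brute force:

```python
from itertools import product

def carry(x_bits, y_bits, carry_in):
    """Carry-out of adding two equal-length bit blocks (least significant bit first)."""
    c = carry_in
    for x, y in zip(x_bits, y_bits):
        c = int(x + y + c >= 2)
    return c

def compose(lo, hi):
    """(c0, c1) of a combined block [a,c), given (c0, c1) of the low block [a,b)
    and of the high block [b,c): the carry-out of the low block selects which
    of the high block's two carry-outs applies."""
    return (hi[lo[0]], hi[lo[1]])

# Brute-force check of the composition rule on all pairs of 3+3-bit blocks.
for xs in product((0, 1), repeat=6):
    for ys in product((0, 1), repeat=6):
        lo = (carry(xs[:3], ys[:3], 0), carry(xs[:3], ys[:3], 1))
        hi = (carry(xs[3:], ys[3:], 0), carry(xs[3:], ys[3:], 1))
        whole = (carry(xs, ys, 0), carry(xs, ys, 1))
        assert compose(lo, hi) == whole
```

Note that `compose` is just a pair of 2-to-1 multiplexers, so each node of the tree below really is an $O(1)$-sized circuit.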
Using this recursive relation, we can easily construct a log-depth circuit that computes $c_0(a,b)$ and $c_1(a,b)$ for every block of the form $[a, a+2^k)$ where $k∈ℕ$ and $a$ is a multiple of $2^k$, using a balanced binary tree with an $O(1)$-sized circuit at each node. This recursive computation takes a single upward pass over the tree.
Note that $c_0(0,2^k)$ is the true carryover bit for adding the first $2^k$ bits. All we need now is to compute $c_0(0,b)$ for every $b$, and we would be done, since the bit of the sum at position $b$ is just the XOR of the two input bits at position $b$ with the incoming carry $c_0(0,b)$. Indeed, we can easily do so by a single downward pass on the same binary tree. (I will leave this as an exercise for you as well.)
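Putting the two passes together, here is a hedged Python simulation of the whole adder (all names are mine; the passes run sequentially here, but every node within a tree level is independent, so with one core per node the parallel depth is $O(\log n)$):

```python
import random

def add_bits(x_bits, y_bits):
    """Add two n-bit numbers given as LSB-first bit lists (n a power of two)."""
    n = len(x_bits)

    def compose(lo, hi):
        # Carry pair of the combined block: the low block's carry-out
        # selects which of the high block's carry-outs applies.
        return (hi[lo[0]], hi[lo[1]])

    # Upward pass: (c0, c1) for every dyadic block, leaves first.
    # levels[k][i] holds the pair for the block [i*2**k, (i+1)*2**k).
    levels = [[(int(x + y >= 2), int(x + y + 1 >= 2))
               for x, y in zip(x_bits, y_bits)]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([compose(prev[2 * i], prev[2 * i + 1])
                       for i in range(len(prev) // 2)])

    # Downward pass: carry *into* each block, starting from 0 at the root.
    carry_in = [0]
    for level in reversed(levels[:-1]):
        nxt = []
        for i, c in enumerate(carry_in):
            nxt.append(c)                 # low child inherits the parent's carry-in
            nxt.append(level[2 * i][c])   # high child gets the low child's carry-out
        carry_in = nxt

    # At the leaves, carry_in[i] = c0(0, i); sum bit = XOR with the inputs.
    sum_bits = [x ^ y ^ c for x, y, c in zip(x_bits, y_bits, carry_in)]
    return sum_bits + [levels[-1][0][0]]  # append the final carry-out bit

# Check against ordinary integer addition.
n = 16
to_bits = lambda v: [(v >> i) & 1 for i in range(n)]
for _ in range(100):
    a, b = random.randrange(2 ** n), random.randrange(2 ** n)
    out = add_bits(to_bits(a), to_bits(b))
    assert sum(bit << i for i, bit in enumerate(out)) == a + b
```

The upward pass has $\log n$ levels of independent `compose` calls, and the downward pass mirrors it, so the simulated circuit has $O(\log n)$ depth and $O(n)$ gates in total.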