I would like to implement fast algorithm for multiplication (SSA algorithm) of arbitrarily long integers with a number theoretic transform (e.g. Fermat number transform). But as number of elements (digits in base b) grows it is evident that a single digit (e.g. integer 64-bit) becomes unable to store the result of single NTT step. I am wondering how to overcome such issue. I could use some already existing big integer as a single digit. Or to somehow employ Chinese remainder theorem (but I am not sure how to do it and if the computation would be still quick as it seems multiple convolutions must be computed).
Any hint whether there exist another possible solutions or how Chinese remainder theorem could be used in this context would be much welcome.
Answering the question myself. The trick is to use the SSA algorithm recursively when performing element-wise multiplication. https://en.m.wikipedia.org/wiki/Sch%C3%B6nhage%E2%80%93Strassen_algorithm