I view the main idea behind the Cooley-Tukey algorithm as follows. Suppose we have two $n$-degree polynomials (with coefficients in $\mathbb{Z}$). We will
- Evaluate the polynomials on the $2n + 1$ roots of unity. Concretely, let $\omega = e^{\frac{2 \pi i}{2n+1}}$. We evaluate $\langle p(\omega), p(\omega^2) \cdots p(\omega^{2n +1})\rangle$ and $\langle q(\omega), q(\omega^2) \cdots q(\omega^{2n +1}) \rangle$.
- We multiply these vectors pointwise.
- We interpolate the resulting vector into coefficient form.
The reason we choose roots of unity for this evaulation points is that a high degree of regularity that makes points 1 and 3 easy.
I am not sure how Step 2 above can be done efficiently without any approximations. The quanities $p(\omega^i)$ or $q(\omega^j)$ are polynomials in $\mathbb{Z}[\omega]$. They would look like $x_0 + x_1 \omega + x_2 \omega^2 + \cdots x_{2n+1} \omega^{2n+1}$.
Now, I understand that $\mathbb{Z}[\omega] \subseteq \mathbb{C}$, meaning that the elements can be represnted as $a + bi$ with $a, b \in \mathbb{R}$. The trouble with this representation is that elements of $\mathbb{R}$ cannot be represented nicely. For example, one could use floating point number, but this might introduce rounding issues.
What tricks should be used to represent the elements of $\mathbb{Z}[\omega]$ so that step 2 can be done elegantly and without any lossy approximation?
[See Chapter 2.6 of Dasgupta-Pappadimitrou-Vazirani for clarification]
You can consider $\mathbb{Z}/m\mathbb{Z}$ rather than $\mathbb{C}$. There should exist $w$, that is a privitive $k$-th root of unity modulo $m$ for some $k \ge 2n + 1$. Choosing a good value of $m$ is tricky. Number $m$ is not necessarily prime, however primality could be helpful. Also $m$ should be big enough to fit all coefficients of the product. If $p(x) = a_0 + a_1x + \cdots + a_nx^n$ and $q(x) = b_0 + b_1x + \cdots + b_nx^n$ with $|a_i| \le c$ and $|b_i| \le c$ for some constant $c$, then you need $m \ge 2(n + 1)c^2 + 1$, because $-(n + 1)c^2 \le a_0\cdot b_n + a_1\cdot b_{n - 1} + \cdots + a_n \cdot b_0 \le (n + 1)c^2$.
For Cooley–Tukey algorithm effectiveness you need $k$ to be a power of $2$, so it makes sense to find a prime $m = 2^{\ell}(2s + 1) + 1$ for $2^{\ell} \ge 2n + 1$. The reason for such kind of $m$ is the foloowing. For each prime $m$ there is a primitive $\varphi(m)$-root of unity modulo $m$, while $\varphi(m) = m - 1$ is Euler's totient function. (Moreover, the number of such primitive roots is $\varphi(\varphi(m)) = \Omega(m / \log \log m)$. It is high enough to find such root pretty easy and fast by random selection and test.) So for prime $m = 2^{\ell}(2s + 1) + 1$ we have $\varphi(m) = m - 1 = 2^{\ell}(2s + 1)$ that is divisible by $2^{\ell}$. This means that primitive $\varphi(m)$-root $r$ of unity modulo $m$ implies that $r^{2s + 1}$ is $2^{\ell}$-root of unity modulo $m$.
I used $m = 786433 = 3\cdot 2^{18} + 1$ and some small primitive $2^{18}$-root of unity modulo $m$ for my specific problems. (May be I used other values of $m$ too, but much more seldom.) But you can easily find any other appropriate $m$.