I've recently been working with Number Theoretic Transforms and have a question I can't work out the answer for.
Setup
Let $\mathbb{F}_{q}$ be a finite field of prime order $q$ and $q = 2m +1$ for $m\in\mathbb{Z}$ an odd integer. I have $\vec{a} \in \mathbb{F}_{q}^{k}$, a vector and I wish to Lagrange interpolate this vector to produce a polynomial $A(x)\in \mathbb{F}_{q}[x]$ such that $A(\omega_{n}^{i})=a_{i}$ for some primitive $n$-th root of unity $\omega_{n}\in\mathbb{F}_{q}^{*}$ (with $n \geq k$).
At least that is the general case, I am really interested in a specific case with \begin{align*} q = 21888242871839275222246405745257275088696311157297823662689037894645226208583 \end{align*} then $q-1$ has prime factorisation \begin{align*} 2×3^2×13×29×67×229×311×983×11003×405928799×11465965001×13427688667394608761327070753331941386769. \end{align*}
My Plan
I take my vector $\vec{a}$ and find the smallest number $n \geq k$ such that $n | q-1$. This fact guarantees that there is a primitive $n$-th root of unity in $\mathbb{F}_{q}^{*}$, which we shall denote by $\omega_{n}$. I can then pad my vector to length $n$ with zeroes so that now we are dealing with $\vec{a}\in \mathbb{F}_{q}^{n}$ ( I apologise for overloading notation). Then the coefficients of $A(x)$ as a degree $n-1$ polynomial are given by the Number Theoretic Transform of $\vec{a}$. That is \begin{align*} A_{j} = \sum_{i=0}^{n-1}a_{i}\omega_{n}^{ij}. \end{align*}
My Problem
When reading about NTTs most methods for computing them use the Cooley-Tukey algorithm and the fact that $n = 2^{s}$ for some reasonably large $s$. My problem is that $q-1$ is not highly divisible by two, or in fact any prime. I thought about using a combination of the PFA algorithm and Rader's algorithm. This way I can break $n$ down into its prime factorisation and thus convert the problem from computing an $n$-point NTT into a number of $p_{i}$-point NTTs with each $p_{i}$ prime. Then each $p_{i}$ NTT can be computed using Rader's algorithm. The fact that $n$ is a divisor of $q-1$ means that for each $p_{i}$ we know that $\mathbb{F}_{q}^{*}$ contains $p_{i}$-th primitive root of unity and so each of these $p_{i}$-point NTTs can be computed in $\mathbb{F}_{q}$.
Question 1: Rader's algorithm for a $p$-point NTT with $p$ prime involves computing a circular convolution which can be done using a $p-1$-point NTT. I can't find a good example or explanation of how this is done so would be very appreciative if somebody could explain it or point me in the correct direction.
Question 2: A related question is then should I break down the $p-1$-point NTT further using the same method as before? Or can this even be done?
Any and all help would be much appreciated, or even suggestions of a better method to the one I have described above.
Example case
I know for me sometimes an example helps illustrate the problem. Say we have arrived at $n=2262$ and are computing a $2262$-point NTT (for ease I'll write a $m$-point NTT as $\mathrm{NTT}_{m}$ from now on). We can factorise $2262$ as $2\times 3\times 13\times 29$ and so the PFA algorithm tells us that \begin{align} \mathrm{NTT}_{2262}= \mathrm{NTT}_{2}\otimes\mathrm{NTT}_{3}\otimes\mathrm{NTT}_{13}\otimes\mathrm{NTT}_{29}. \end{align} Both $\mathrm{NTT}_{2}$ and $\mathrm{NTT}_{3}$ are probably small enough to just compute (in fact $\mathrm{NTT}_{13}$ probably is as well if a bit tedious, but for sake of argument we won't). We can then use Rader's algorithm to compute $\mathrm{NTT}_{13}$ and $\mathrm{NTT}_{29}$, which will require us to compute $\mathrm{NTT}_{12}$ and $\mathrm{NTT}_{28}$ respectively. This is where I get stuck and don't know if I should break down $\mathrm{NTT}_{12}$ and $\mathrm{NTT}_{28}$ in the same way, or even if I can as both $12$ and $28$ are divisible by $4$ and there is no primitive $4$-th root of unity in $\mathbb{F}_{q}^{*}$.
EDIT
I have found an alternative solution to my original problem using "ECFFT" as defined in this paper. I still do not know the answer for the best way of solving my problem using traditional NTTs.