Fast convolution with "small" values

97 Views Asked by At

Say we have two sequences of integers $a = \{a_1 \dots a_n\}$ and $b = \{b_1 \dots b_n\}$, where $a_i, b_i \in \mathbb{Z}_q$, but we know some value $p<q$ such that $0 \leq a_i < p$.

We want to compute the cyclic convolution of these sequences over the larger field $\mathbb{Z}_q$ ($a \circledast b$). Naively, we could compute an FFT for each sequence, multiply component-wise, and then compute the inverse FFT. This takes 3 FFT's of length $n$, and $n$ $q$-bit multiplications.

Now, imagine that we keep $a$ fixed but want to compute the cyclic convolution against many different $b$'s. Assume that we are given the $b$'s in FFT form, and can produce a result in FFT form.

Then we can precompute and store the FFT of $a$, and our runtime will be quite fast, since we no longer need to compute any FFT's. Our storage will be $n \log q$ bits, for the FFT form of $a$ over $\mathbb{Z}_q$.

Can we exploit the fact that $p<q$ to reduce our storage costs? Specifically, can we store only $\approx n \log p$ bits about $a$, and still compute the convolutions with only $n$ $q$-bit multiplications? It's ok to impose some requirements of $p$ or $q$ (ex: $p|q$, $q$ is prime), or even drop use of the FFT altogether and use some other evaluation form that takes $O(n^2)$ to convert from or to.

1

There are 1 best solutions below

0
On

Here's a solution that works for some values of $n,p,q$:

Assume $q=\Pi_i q_i$ for several prime $q_i < p$ for $i \in [1,k]$. Choose a modulus $M \ge np^2$. By CRT, we can decompose the convolution between $a$ and $b$ into $k$ convolutions, each between $a$ and $b \bmod q_i$. By our choice of $M$ and because $q_i < p$, we can just do each of these convolutions over $\Bbb Z_M$. The reason this works is that, over the integers, a convolution between $b \bmod q_i$ and $a$ can produce a maximum value of $np^2$, so we will not wrap around $M$ when computing over $\Bbb Z_M$. Then we can reduce the mod $M$ results mod $q_i$, and apply the CRT to recover the value mod $q$.

Say we choose $M$ so that it is 'FFT friendly' - that is, it has a primitive $2n$-th root of unity. Then we can now get away with just storing the $a$ value in FFT form over $\Bbb Z_M$. This is $n \log M = n (2\log p + \log n)$ bits, which could be much less than $n \log q$ bits, depending on the values of $n, p, q$. The server must perform $k = \log q / \log p$ multiplications of $\log M$-bit words now, instead of a single $\log q$-bit multiplication. The technique works best when $q$ is much larger than $p$ and $n$.