Say that we have a set of CRT or RNS moduli $p_1, \dots, p_k$ where $M=\prod_{i=1}^k p_i$ and the CRT representation of two numbers $x,y$, i.e.
- $x_{\mathsf{CRT}} = ([x \bmod p_1], \dots, [x \bmod p_k])$
- $y_{\mathsf{CRT}} = ([y \bmod p_1], \dots, [y \bmod p_k])$.
We'd like to compute the CRT representation of $[x \bmod y]_{\mathsf{CRT}} = ([ (x \bmod y) \bmod p_1], \dots, [(x \bmod y) \bmod p_k])$ by manipulating only numbers in CRT form.
One idea to solve this is by computing $(x - x/y)$ in their CRT form. The main problem which needs to be tackled is the division $x/y$ in CRT form. There are a couple of methods to solve the integer division in CRT which can be found in a few papers such as CL06 or HK95.
In a nutshell those papers compute the reciprocical of $y$ in CRT form and then do a couple of iterations, similar with Newton-Raphson of computing $x/y$ using repeated multiplications. The problem with these methods is that it requires many switches between different CRT representations which are relatively expensive in my model.
I'm thinking that if I restrict $y$ to those that satisfy $\mathsf{gcd}(y, M) = 1$ perhaps the solution might be less expensive but I am unsure how to tackle this. I could also make the primes $p_i$ smooth $\approx 2^n + \epsilon$ but again, usure how this would have better efficiency.
The question I have is whether you are aware of any efficient integer division or modulo reduction which works over CRT representations and has as few CRT conversions as possible.