Assume that I have already calculated the integer division $\frac{p}{q}$ so that I have both integer part and rest a.k.a. "modulus": $$\cases{\left\lfloor\frac{p}{q} \right\rfloor = n\\p \equiv m\,\,\,(\text{mod } q)}$$
Now, say I want to calculate the same for $q+d$, where $d\lt\lt q$. Is there some faster way than just to do the full euclidean algorithm from scratch..?
Consider the equation $$nq = p$$
If we for a second allow all quantities to take any real number, we can, assuming (which is given in the question) that $p$ is constant, rewrite $n$ as a function of $q$:
$$n(q) = \frac{p}{q}$$
Now if we calculate $$\frac{\partial n}{\partial q} = - \frac{p}{q^2}$$
We have an expression for how fast $n$ should change with the change of $q$. Now, this function we can tabulate. Actually might be more practical to tabulate $$-\frac{1}{q^2}$$
Now we we can make an estimate for how large a step we should take in the $n$ direction for each step of length 1 in the $q$ direction. It should be $-\frac{p\Delta p}{q^2}$. Assuming we have made the tabulation above to some precision, this can be done by 3 multiplications. Possibly we will need some interpolation on the table for better precision, but that can once again be done division-free, usually as linear combination of low order polynomials.
For example a small portion of primes: $\left[\begin{array}{cccccccc}83&89&97&101&103&107&109&113\end{array}\right]$ and we are to test the primality of a number the magnitude of $10^6$. $$\frac{\partial n}{\partial q} = - \frac{10^6}{(10^2)^2} \approx -100$$
With some interpolation and deploying central difference estimate for differential we get estimates
$$\begin{array}{cccccccc}[ -811.25 & -924.96 & -408.12 & -192.23 & -362.81 & -171.47 & -324.65 ]\end{array}$$ Which are rather close to the correct $\Delta q$: $$ \begin{array}{cccccccc}[ -812.24 & -926.68 & -408.29 & -192.25 & -362.94 & -171.48 & -324.75]\end{array}$$
Even if we would not get very close, modern CPUs have instructions in hardware both to calculate multiply+add and then $$\min_{i}|a-b_i|$$ for 8 or 16 $i$ at a time. That means in practice an ability to determine 3 or 4 additional bits of precision for a number per clock cycle!