I have a series: $$f(n) = 0^0 + 1^1 + 2^2 + ..... n^n$$ I want to calculate $f(n) \pmod m$ where $n ≤ 10^9$ and $m ≤10^3$. I have tried the approach of the this accepted answer but the complexity is more than acceptance. I have used this approach for calculating modular exponential. However, I am not able to optimize it further. Kindly help. Thanks in advance.
Efficient Method to find funsum$(n) \pmod m$ where funsum$(n)= 0^0 + 1^1 + 2^2 + ..... n^n$
290 Views Asked by user641 https://math.techqa.club/user/user641/detail AtThere are 2 best solutions below
On
$$\sum_{k=1}^n k^k\bmod m\equiv \sum_{k=1}^n(k\bmod m)^k.$$
You can precompute $k^k\bmod m$ for all $k\in[0,m-1]$ (this takes $O(m\log m)$ modular multiplies by squarings). For the next terms, $(k+pm)^{k+pm}\equiv k^k(k^m)^p$, and it suffices to also keep the values of $k^m$.
The total cost for the summation will be
$$\color{green}{O(m\log m+n)}.$$
For example, with $m=5$, and assuming $0^0=0$,
$$\ \ \ \ \ k^5\to0,1,2,3,4,$$ $$\ \ \ \ \ k^k\to 0,1,4,2,1,\\\ k^{k+5}\to0,1,3,1,4,\\k^{k+10}\to0,1,1,3,1,\\k^{k+15}\to0,1,2,4,4,$$ $$k^{k+20}\to0,1,4,2,1,\\k^{k+25}\to0,1,3,1,4,\\k^{k+30}\to0,1,1,3,1,\\k^{k+35}\to0,1,2,4,4,\\\cdots$$
In this table, the columns are the powers of $k$, which show a period of at most $m-1$ (actually $\phi(m)$, Euler's totient). Hence, blocks of $m\cdot(m-1)$ elements bring a constant contribution.
After computing a complete block by the above method, the complexity lowers to
$$\color{green}{O(m^2)},$$
using $$f(n)=f(n\bmod (m-1)m)+\left\lfloor\frac n{(m-1)m}\right\rfloor b$$ where $b$ is the sum over a whole block
Finally, we can yet speed-up the process by accumulating the columns of a block (be them complete or not) with the geometric summation formula
$$k^k+k^{k+m}+k^{k+2m}+\cdots k^{k+pm}=k^k\frac{k^{m(p+1)}-1}{k^m-1}.$$
After precomputing a table of inverses, the last expression can be computed in time $O(\log m)$, hence the whole process takes
$$\color{green}{O(m\log m)}.$$
For an efficient computation, we have to use the well-known fast exponentiation by repeated squaring, based on the simple observations $x^{2k}=(x^k)^2$ and $x^{2k+1}=x^{2k}\cdot x$. There's no need in a recursive function, because it's easy to transform it into an iteration through the binary expansion of the exponent. But that would still give an algorithm of complexity $O(n\log n),$ and that's quite forbidding already for $n=10^8,$ let alone $n=10^9.$ Savings are possible since we need the results only modulo $m$, and that's assumed to be much smaller than $n$.
Obviously, large exponents aren't the problem, so I consider reduction of the exponents modulo $\varphi(m)$ a blind alley: we're summing over all bases up to $n$, many of them having common divisors with $m$ (60% in the obviously allowed case $m=1000$).
But certainly $l=j\pmod m$ implies $l^l=j^l\pmod m$, so we can group our summands accordingly. Let $n=mk+r$ with $0\le r\le m-1.$ Then, $$f(n) = 1+\sum^n_{l=1}l^l=1+\sum^{m-1}_{j=1}j^j\sum^{k-1}_{i=0}j^{mi}+\sum^{r}_{j=1}j^j\cdot j^{mk}\pmod m.$$ With an auxiliary function $$g(x,k)=\sum^{k-1}_{i=0}x^i,$$ we can write this $$f(n) = 1+\sum^{r}_{j=1}j^j\,g(j^m, k+1)+\sum^{m-1}_{j=r+1}j^j\,g(j^m, k)\pmod m.$$ Unfortunately, using the fast formula $$g(x,k)=\frac{x^k-1}{x-1}$$ is not an option, as division modulo a composite $m$ can get very tricky.
But there's an alternative: Obviously, we have $g(x,2k)=(1+x^k)\,g(x,k)$ and $g(x,2k+1)=1+x\,g(x,2k)$, and we don't even have to calculate $x^k,$ since $1+x^k=(x-1)\,g(x,k)+2,$ i.e. $g(x,2k)=((x-1)\,g(x,k)+2)\,g(x,k)$.
As one can see easily, the resulting algorithm has complexity $O(m\log n)$, and that's reasonably fast (milliseconds) for the values of $n,m$ in the question.
The crucial functions are implemented like this (in Java):