Let $A \in \mathbb{R}^{N\times N}$ be a large symmetric matrix that has at most $\frac{1}{8}$ of its elements non-zero.
We have an equation that involves a term $\mathrm{Tr}(A^n)$, that is, the trace of the matrix $A$ raised to some small integer power $n$ (usually $3 < n < 7$).
What is the fastest way to calculate the trace?
For context, this comes from a physical model for many-body dispersion, where you take the trace of a logarithm of a matrix to obtain the dispersion energy:
$E = \int \frac{\mathrm{d}\omega}{2\pi} \mathrm{Tr}\{\ln(I-A(\omega))\}$
and I am expanding the logarithm as a series with terms involving $A(\omega)^n$ up to some maximum power. I have found this to be faster than diagonalizing $A(\omega)$ or using the identity $\mathrm{Tr}\{\ln(A)\} = \ln\{\det(A)\}$. The energy expression also implies that the maximum eigenvalue of $A(\omega)$ cannot be larger than $1$, or the energy becomes complex, which is unphysical. The eigenvalues can, however, be smaller than $-1$, so this does not set a hard constraint on the spectral radius.
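For reference, the expansion in question is the Mercator series applied under the trace, which converges term by term when every eigenvalue of $A(\omega)$ lies in $[-1, 1)$:

$\mathrm{Tr}\{\ln(I-A(\omega))\} = -\sum\limits_{n=1}^{\infty} \frac{1}{n}\mathrm{Tr}(A(\omega)^n)$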
As mentioned, the eigendecomposition of the matrix in question is too slow compared to directly multiplying up to the power $n-1$ and then taking the row-wise dot products:
$\mathrm{Tr}(A^n) = \sum\limits_{i=1}^N [A^{n-1}]_i \cdot [A^T]_i$,
where $[A]_i$ denotes the $i$th row of the matrix $A$. However, this still involves the matrix multiplication up to the $(n-1)$th power.
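As a concrete sketch (assuming SciPy sparse matrices; `trace_power` is a made-up helper name), the row-wise dot products can be folded into one element-wise product, since $\mathrm{Tr}(BA) = \sum_{ij} B_{ij}[A^T]_{ij}$:

```python
import numpy as np
import scipy.sparse as sp

def trace_power(A, n):
    """Tr(A^n) for sparse symmetric A: n-2 sparse matmuls to form
    A^{n-1}, then one element-wise product instead of a final matmul,
    using Tr(B A) = sum_ij B_ij * [A^T]_ij."""
    B = A
    for _ in range(n - 2):
        B = B @ A                    # B = A^{n-1} after the loop
    return B.multiply(A.T).sum()

# sanity check against a dense computation
rng = np.random.default_rng(0)
M = sp.random(50, 50, density=0.1, random_state=rng).tocsr()
A = (M + M.T) * 0.5                  # symmetrize, still sparse
n = 5
ref = np.trace(np.linalg.matrix_power(A.toarray(), n))
assert np.isclose(trace_power(A, n), ref)
```

This saves the final matrix-matrix product but still performs $n-2$ of them.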
Or, for even powers, using the symmetry:
$\mathrm{Tr}(A^n) = \sum\limits_{i=1}^N [A^{n/2}]_i \cdot [A^{n/2}]_i$.
For $n=6$, this still requires two large (sparse) matrix-matrix multiplications.
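For even $n$, this is exactly the squared Frobenius norm of $A^{n/2}$, which a short sketch (again assuming SciPy sparse matrices; the function name is illustrative) makes explicit:

```python
import numpy as np
import scipy.sparse as sp

def trace_even_power(A, n):
    """Tr(A^n) = ||A^{n/2}||_F^2 for symmetric A and even n,
    using n/2 - 1 sparse matmuls (two for n = 6)."""
    B = A
    for _ in range(n // 2 - 1):
        B = B @ A                    # B = A^{n/2} after the loop
    return B.multiply(B).sum()       # squared Frobenius norm

rng = np.random.default_rng(1)
M = sp.random(40, 40, density=0.1, random_state=rng).tocsr()
A = (M + M.T) * 0.5
ref = np.trace(np.linalg.matrix_power(A.toarray(), 6))
assert np.isclose(trace_even_power(A, 6), ref)
```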
I know that for $n=2$ this is just $N$ dot products. I also know that for $n=2$ the trace can be written as the sum of all elements of the Hadamard (element-wise) product, because the matrix is symmetric:
$\mathrm{Tr}(A^2) = \sum\limits_{i=1}^N \sum\limits_{j=1}^N [A\odot A]_{ij}$,
which does not require any matrix multiplications. However, just the case $n=2$ is not very useful for me.
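In SciPy terms, the $n=2$ identity is a single element-wise product with no matrix multiplication at all:

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(2)
M = sp.random(30, 30, density=0.1, random_state=rng).tocsr()
A = (M + M.T) * 0.5                 # symmetric sparse matrix

# Tr(A^2) as the grand sum of the Hadamard product A ∘ A
tr2 = A.multiply(A).sum()
assert np.isclose(tr2, np.trace((A @ A).toarray()))
```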
I also know that taking the row sum of the first $A$ in the $A^n$ and the column sum of the last $A$ and denoting them by $A_r$ and $A_c$ respectively, we have:
$A_r A^{n-2} A_c = \sum\limits_{i=1}^N \sum\limits_{j=1}^N [A^n]_{ij}$,
which reduces the matrix multiplications to vector-matrix products, because $A_r$ is a row vector and $A_c$ a column vector. However, this yields the "grand sum" (sum of all elements of the matrix) and not the trace:
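For comparison, the grand sum of $A^n$ needs only $n$ sparse matrix-vector products, since it equals $\mathbf{1}^T A^n \mathbf{1}$ (a sketch; `grand_sum_power` is a made-up name):

```python
import numpy as np
import scipy.sparse as sp

def grand_sum_power(A, n):
    """Sum of all entries of A^n as 1^T A^n 1, using n sparse
    mat-vec products and no matrix-matrix multiplication."""
    v = np.ones(A.shape[0])
    for _ in range(n):
        v = A @ v                    # v = A^n 1 after the loop
    return v.sum()                   # 1^T (A^n 1)

rng = np.random.default_rng(3)
M = sp.random(40, 40, density=0.1, random_state=rng).tocsr()
A = (M + M.T) * 0.5
ref = np.linalg.matrix_power(A.toarray(), 4).sum()
assert np.isclose(grand_sum_power(A, 4), ref)
```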
$\mathrm{Tr}(A^n) = \sum\limits_{i=1}^N [A^n]_{ii}$.
Perhaps surprisingly, calculating the grand sum is cheaper than calculating the trace in this case. Is there a connection I can make here to also reduce the computational time for the trace when $n>2$?
Furthermore, I need to calculate the gradient of this term (for the forces of course):
$\nabla\mathrm{Tr}(A^n) = n\,\mathrm{Tr}(A^{n-1}\nabla A)$,
where the equality follows from the cyclic property of the trace. Here some rows (or, by symmetry, columns) of the gradient are zero, so one only needs to calculate the corresponding rows of $A^{n-1}$. But is there a faster way to do this, preferably one that avoids matrix-matrix products?
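If only a few rows $i$ of $A^{n-1}$ are needed, each one can be obtained from mat-vecs alone, since by symmetry row $i$ equals $(A^{n-1}e_i)^T$ and one can start from row $i$ of $A$ (a sketch; `rows_of_power` and the row indices are illustrative):

```python
import numpy as np
import scipy.sparse as sp

def rows_of_power(A, n, idx):
    """Selected rows of A^{n-1} for symmetric A: row i equals
    (A^{n-1} e_i)^T, built from row i of A with n-2 mat-vecs each."""
    out = []
    for i in idx:
        v = A.getrow(i).toarray().ravel()   # A e_i (row i, by symmetry)
        for _ in range(n - 2):
            v = A @ v                       # v = A^{n-1} e_i
        out.append(v)
    return np.array(out)

rng = np.random.default_rng(4)
M = sp.random(40, 40, density=0.1, random_state=rng).tocsr()
A = (M + M.T) * 0.5
ref = np.linalg.matrix_power(A.toarray(), 5)   # A^{n-1} for n = 6
assert np.allclose(rows_of_power(A, 6, [0, 7, 13]), ref[[0, 7, 13]])
```

This costs $(n-2)$ sparse mat-vecs per needed row, which wins whenever the number of nonzero rows of $\nabla A$ is small compared to $N$.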
If it turns out that there are no faster ways to do this, I would be interested in fast approximations of the trace with a controllable error.
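One standard family of such approximations is stochastic (Hutchinson-type) trace estimation: for random probe vectors $z$ with $\mathbb{E}[zz^T]=I$ one has $\mathbb{E}[z^T A^n z] = \mathrm{Tr}(A^n)$, each sample costs only $n$ sparse mat-vecs, and the statistical error shrinks as $O(1/\sqrt{s})$ in the number of samples $s$. A minimal sketch (the function name and sample count are illustrative, not a recommendation):

```python
import numpy as np
import scipy.sparse as sp

def hutchinson_trace_power(A, n, n_samples=400, seed=0):
    """Monte Carlo estimate of Tr(A^n) with Rademacher probes:
    E[z^T A^n z] = Tr(A^n); each sample costs n sparse mat-vecs."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_samples):
        z = rng.choice([-1.0, 1.0], size=A.shape[0])
        v = z
        for _ in range(n):
            v = A @ v                # v = A^n z
        total += z @ v               # one sample of z^T A^n z
    return total / n_samples

rng = np.random.default_rng(5)
M = sp.random(60, 60, density=0.1, random_state=rng).tocsr()
A = (M + M.T) * 0.5
exact = np.trace(np.linalg.matrix_power(A.toarray(), 4))
est = hutchinson_trace_power(A, 4)
assert abs(est - exact) < 0.5 * abs(exact) + 1e-8
```

The error is controllable through the sample count (and the variance can be estimated on the fly from the samples themselves), though for tight tolerances the deterministic approaches above may still win.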