I would like an efficient way to solve for $\mathrm{tr}(X)$, where \begin{equation} XA=B \end{equation} and I know the matrices $A$ and $B$, but not $X$. $A$ and $B$ are Hermitian and sparse, and I would like to avoid computing $X$ in its entirety only to take its trace.
There is a detailed answer to a similar question involving the Lyapunov equation, but I hoped that in this simpler case there might be less involved or more efficient options.
I have been impressed recently by the performance of iterative techniques, such as those of Yousef Saad, for solving matrix-vector equations, and I wonder whether there is a similar technique that can be applied here. Sadly, the best I can think of at the moment is to solve row by row and sum the element corresponding to the diagonal. Of course, that would be counterproductive when an LU decomposition approach would be much more efficient (assuming I have to compute the whole matrix).
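To be concrete, the row-by-row idea I have in mind is something like the following sketch (my own illustration; it refactors $A$ at every solve, which is exactly what I would like to avoid):

    % Naive row-by-row approach: row i of X satisfies A.'*x_i = B(i,:).',
    % and only the i-th entry of that row contributes to the trace.
    n = size(A,1);
    tr_naive = 0;
    for i = 1:n
        xi = A.' \ B(i,:).';           % i-th row of X, transposed
        tr_naive = tr_naive + xi(i);   % diagonal entry X(i,i)
    end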
Thanks,
Joly
I've been researching this for the past couple of days and I've found a technique, which I hadn't previously known about, that seems to solve this for me with low computational overhead. I'll just describe it briefly and give the references (Bekas, Kokiopoulou and Saad, and Avron and Toledo) in case it's useful for anyone else. It's much more generally applicable than just to the problem I asked about above.
For some matrix function, $\mathbf{f}(\mathbf{X})$, the trace may be approximated as: $$ \mathrm{tr}(\mathbf{f}(\mathbf{X}))\approx\frac{1}{N}\sum_{i=1}^N \vec{v}^T_i \mathbf{f}(\mathbf{X})\vec{v}_i, $$ where the $\vec{v}_i$ are a set of suitably chosen probe vectors, which may be random or deterministic. For my problem, $\mathbf{f}(\mathbf{X})=\mathbf{X}=\mathbf{B}\mathbf{A}^{-1}$ and so the trace may be approximated as $$ \mathrm{tr}(\mathbf{X})\approx\frac{1}{N}\sum_{i=1}^N \vec{u}^T_i\vec{w}_i, $$ where $\vec{u}_i^T=\vec{v}_i^T\mathbf{B}$ and $\vec{w}_i$ may be obtained by solving $\mathbf{A}\vec{w}_i=\vec{v}_i$. In my case, if the entries of the $\vec{v}_i$ are normally distributed random numbers, the $N$ required for convergence is larger than is practical. Instead, if I take vectors from a Hadamard matrix, as suggested in those papers, then for the problem I want to solve I seem to require at most 512 vectors to converge to $10^{-8}$ in the trace. This is remarkably cheap considering the dimension of the matrices I am dealing with.
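One remark before the mock-up: in the snippet below I just call A\v inside the loop for simplicity, but since $\mathbf{A}$ is the same for every right-hand side, in practice one would factor it once and reuse the factors across all $N$ solves, along these lines (a minimal sketch, assuming a sparse LU factorization of $\mathbf{A}$ is affordable; the variable names are mine):

    % Factor A once; each probe vector then costs two triangular solves.
    n = size(A,1);
    H = hadamard(2^ceil(log2(n)));     % probe vectors, as discussed above
    N = min(512, size(H,1));
    [L,U,P,Q] = lu(A);                 % sparse LU: P*A*Q = L*U
    acc = 0;
    for i = 1:N
        v   = H(i,1:n)';
        w   = Q*(U\(L\(P*v)));         % solves A*w = v using the stored factors
        acc = acc + (v'*B)*w;          % v'*X*v with X = B*inv(A)
    end
    trest = acc/N;                     % trace estimate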
I mocked this up in Matlab:
    acc = 0;
    n = size(A,1);
    H = hadamard(2^ceil(log2(n)));   % Hadamard matrix of dimension >= n
    N = min(2048, size(H,1));        % number of probe vectors (cannot exceed rows of H)
    exactr = trace(B*inv(A));        % exact trace, feasible only for small test matrices
    for i = 1:N
        v = H(i,1:n)';               % i-th probe vector (+/-1 entries)
        acc = acc + (v'*B)*(A\v);    % v'*X*v with X = B*inv(A)
        if i == 2^floor(log2(i))     % report at powers of two
            fprintf('%d %20.16f %20.16f %20.16f\n', i, acc/i, exactr, abs(acc/i - exactr));
        end
    end
    trest = acc/N;                   % trace estimate

This worked for my smallest matrices, where I can test against the exact trace. In practice, one would not generate the full Hadamard matrix, but only the vectors from it as needed.
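For what it's worth, for the Sylvester-type Hadamard matrix that hadamard returns for power-of-two orders, entry $(i,j)$ is $(-1)$ raised to the parity of the bitwise AND of $i-1$ and $j-1$, so a single probe vector can be built on demand along these lines (a rough sketch of my own; the helper name hadamard_row and its arguments are mine, not from those references):

    function v = hadamard_row(i, n, m)
    % i-th row of the 2^m x 2^m Sylvester-Hadamard matrix, truncated to
    % length n (requires 2^m >= n).  Entry j is (-1)^popcount((i-1) & (j-1)).
        j    = (0:2^m-1)';
        bits = bitand(j, i-1);
        par  = zeros(size(j));
        for b = 1:m
            par = par + bitget(bits, b);   % parity of the AND pattern
        end
        v = (-1).^par;
        v = v(1:n);
    end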