In Matrix Computations by Golub and Van Loan (3rd edition, page 433) an algorithm is given for a parallel version of the classical Jacobi algorithm for solving a real symmetric eigenvalue problem. The algorithm overwrites the real symmetric matrix $A$ with $V^TAV$ where $V$ is orthogonal.
The idea is that each set of 2-by-2 symmetric Schur decompositions are non-conflicting and therefore can be done in parallel. From p. 432:
Note that all the rotations within each of the three rotation sets are "non-conflicting". That is, subproblems (1, 2) and (3, 4) can be carried out in parallel.
However, I cannot see how this is the case. Clearly I am missing something (or the book is wrong). Here is the point of confusion:
Within the inner loop, we have the update step: $$ A = J(p, q, \theta)^TAJ(p, q, \theta). $$
This modifies both rows $p$ and $q$ and columns $p$ and $q$ of matrix $A$. Therefore for two subproblems $(p, q)$ in a set such as $\{(1, 2), (3, 4)\}$, the updates are conflicting because, for example, element $A(1, 3)$ is modified by both sets.
Could anyone please explain what I am missing in order to correctly implement this parallel algorithm?
EDIT: Here is the pseudocode. (Sorry I couldn't get the indentations right.) The subroutine 'music' just rotates the set of indices in 'top' and 'bottom' to generate new pairs of indices that are "non-conflicting". If you can see how this would work in the first iteration, i.e. set=1, that would be sufficient.
Algorithm 8.4.4 (Parallel Order Jacobi) Given a symmetric $A \in \mathbf{R}^{n \times n}$ and a tolerance $tol > 0$, this algorithm overwrites $A$ with $V^TAV$ where $V$ is orthogonal and off$(V^TAV) \leq tol \|A\|_F$. It is assumed that $n$ is even.
$V = I_n$
$eps = tol \|A\|_F$
$top = 1:2:n; bot = 2:2:n$
while off$(A) > eps$
for $set=1:n-1$
for $k = 1 : n/2$
$p = min(top(k), bot(k))$
$q = max(top(k), bot(k))$
$(c, s) = $sym.schur2$(A, p, q)$
$A = J(p, q, \theta)^T A J(p, q, \theta)$
$V = VJ(p, q, \theta)$
end
$[top, bot] = $music$(top, bot, n)$
end
end
The text below states:
Notice that the $k$-loop steps through $n/2$ independent, nonconflicting sub-problems.
It seems that the $2x2$ Schur-decompositions are non-conflicting and can be done in parallel. The update of $A$ however is indeed conflicting.