I am taking a course in computational mathematics. In a recent lecture about matrix algorithms, the following "block algorithm" (as they called it) was presented as a good and reasonably fast way of reducing a matrix to upper triangular form. Sadly, the algorithm was not explained in any detail, so I've been left hanging. I've tried to Google it, but I'm having a hard time finding good resources; I don't even know its name (it was never given, and "block algorithm" doesn't yield much).
So here I am, asking: do you know what this algorithm is, and do you have any solid resources for learning more about it? What would be absolutely amazing would be some kind of visualizer where one can step through the code and see in real time what is going on.
```c
/* reduce matrix A to upper triangular form */
/* Min and Max were not defined in the lecture code; presumably they
   are the usual macros, something like:
   #define Min(a, b) ((a) < (b) ? (a) : (b))
   #define Max(a, b) ((a) > (b) ? (a) : (b)) */
void eliminate(double **A, int N, int B)
{
    int i, j, k, I, J;
    /* size of each block is (M*M); assumes B divides N */
    int M = N / B;

    /* for all block rows */
    for (I = 0; I < B; I++)
    {
        /* for all block columns */
        for (J = 0; J < B; J++)
        {
            /* loop over pivot elements */
            for (k = 0; k <= Min((I + 1) * M - 1, (J + 1) * M - 1); k++)
            {
                /* if pivot element within block */
                if (k >= I * M && k <= (I + 1) * M - 1)
                {
                    /* scale the part of the pivot row inside this block */
                    for (j = Max(k + 1, J * M); j <= (J + 1) * M - 1; j++)
                    {
                        A[k][j] = A[k][j] / A[k][k];
                    }
                    /* only in the last block column (j has run up to N) */
                    if (j == N)
                        A[k][k] = 1.0;
                }
                /* for all rows below pivot row within block */
                for (i = Max(k + 1, I * M); i <= (I + 1) * M - 1; i++)
                {
                    /* for all elements in row within block */
                    for (j = Max(k + 1, J * M); j <= (J + 1) * M - 1; j++)
                    {
                        A[i][j] = A[i][j] - A[i][k] * A[k][j];
                    }
                    /* only in the last block column */
                    if (j == N)
                        A[i][k] = 0.0;
                }
            }
        }
    }
} /* end eliminate */
```
One other algorithm that was presented is the following, which is a lot less complicated and easier to visualize, though not as performant as the one above.
```c
/* reduce matrix A to upper triangular form */
void eliminate(double **A, int N)
{
    int i, j, k;
    /* loop over all diagonal (pivot) elements */
    for (k = 0; k < N; k++) {
        /* for all elements in pivot row, right of pivot element */
        for (j = k + 1; j < N; j++) {
            /* divide by pivot element */
            A[k][j] = A[k][j] / A[k][k];
        }
        /* set pivot element to 1.0 */
        A[k][k] = 1.0;
        /* for all elements below pivot row and right of pivot column */
        for (i = k + 1; i < N; i++) {
            for (j = k + 1; j < N; j++) {
                A[i][j] = A[i][j] - A[i][k] * A[k][j];
            }
            A[i][k] = 0.0;
        }
    } /* end pivot loop */
} /* end eliminate */
```
The algorithm is really just performing fairly standard row-reduction, but in a way that (a) allows the calculations to happen "in place" (i.e. by altering the values in the array, rather than creating a new array at each step), and (b) groups calculations within loops in a way that winds up being efficient (rather than needing to iterate back and forth).
The "blocking" involves dividing the (presumably $N \times N$) matrix into $B$ parts horizontally and vertically, giving $B \times B$ total blocks. If we ignore the blocking (equivalent to setting $B = 1$), the algorithm does the following:
For each row $k$ in the matrix:

- Set $A_{k,\cdot} := A_{k,\cdot} / A_{k,k}$, i.e. scale row $k$ so that its leading term is $1$.
- For each row $i > k$ below row $k$: set $A_{i,\cdot} := A_{i,\cdot} - A_{i,k} \, A_{k,\cdot}$, i.e. subtract a multiple of (the now-scaled) row $k$ so that the entry below the pivot becomes $0$.
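As a concrete (made-up) $2 \times 2$ example of these in-place updates:

$$
\begin{pmatrix} 2 & 4 \\ 3 & 5 \end{pmatrix}
\;\xrightarrow{\;k=1\;}\;
\begin{pmatrix} 1 & 2 \\ 0 & -1 \end{pmatrix}
\;\xrightarrow{\;k=2\;}\;
\begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix}
$$

At the first pivot, row $1$ is scaled by $\tfrac{1}{2}$ and then $3$ times the scaled row is subtracted from row $2$, giving $5 - 3 \cdot 2 = -1$ in the corner. At the second pivot there is nothing to the right of $A_{2,2}$, so the code simply overwrites it with $1$ (the pivot value $-1$ is discarded, which is fine if all you want is this normalized triangular form).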
The blocked version performs these same operations, just restricted to one block at a time: it loops over each row of blocks, and within that row over each column of blocks. The trick is that it "forgets" to set $A_{k,k}$ to 1 and $A_{i,k}$ to 0 until it reaches the last column of blocks, so that those coefficients are still available while the intermediate blocks are being updated.
For $B = 2$, the algorithm processes the matrix in four stages:

1. It reduces $B_{1,1}$ to upper triangular form, but doesn't set the diagonal elements to 1 yet.
2. It scales $B_{1,2}$ and sets the diagonal elements in $B_{1,1}$ to 1.
3. It row-reduces $B_{2,1}$ by each row in $B_{1,1}$, making use of the knowledge that $B_{1,1}$ is now upper triangular to skip the entries left of each pivot.
4. Finally, it reduces $B_{2,2}$ to upper triangular form with a unit diagonal, zeroing out the below-pivot coefficients stored in $B_{2,1}$ along the way.
Overall, it performs essentially the same arithmetic as the unblocked version, roughly $\frac{2}{3} N^3$ operations in total, so blocking doesn't reduce the operation count. The benefit is memory locality: if $B$ is chosen so that an $(N/B) \times (N/B)$ block fits in cache, the inner loops work on data that is already resident, which is where the speed-up comes from in practice.
I will note that, if I'm reading it right, the algorithm assumes that every pivot it meets is non-zero (starting with $A_{1,1}$); there is no row swapping, so an unlucky matrix will give you division by zero or nonsense results, but it would be pretty easy to adjust the code to cover those cases.