Simplification of a recursive linear algebra system


I've come across a problem that can be described by the following system: $$C_n=I-B_{n-1}^2$$ $$B_n=A_{n-1}C^{-1}_n B_{n-1} A_{n-1} + B_{n-1}$$ $$A_n=A_{n-1}C^{-1}_n A_{n-1}$$

where $A$ and $B$ are square, dense, non-symmetric matrices. $A_0$ and $B_0$ are given, and the required outputs are $A_n$ and $B_n$ for a specific $n$.

Currently, $A_n$ and $B_n$ are obtained by iterating the system, with the inverse of $C_n$ computed explicitly (no factorization). As expected, the solution takes a long time.
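For concreteness, one step of the recursion as currently described might look like the following minimal NumPy sketch (function and variable names are my own, not from any original code), with the inverse formed explicitly:

```python
import numpy as np

def naive_step(A, B):
    """One step of the recursion, inverting C explicitly (the slow approach)."""
    n = A.shape[0]
    C = np.eye(n) - B @ B          # C_n = I - B_{n-1}^2
    Cinv = np.linalg.inv(C)        # explicit inverse: costly and less stable
    A_next = A @ Cinv @ A          # A_n = A_{n-1} C_n^{-1} A_{n-1}
    B_next = A @ Cinv @ B @ A + B  # B_n = A_{n-1} C_n^{-1} B_{n-1} A_{n-1} + B_{n-1}
    return A_next, B_next
```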

I have two questions:

  1. Is there any obvious algebraic manipulation that I'm missing that could significantly simplify the system?

  2. If, instead of directly calculating the inverse of $C_n$, a factorization of it were used, could that factorization be reused within each step for a faster solution?

Best answer:
  1. Unless further assumptions are added, I don't see any real possibilities. For example, if $A_{n-1}$ and $B_{n-1}$ can be simultaneously diagonalized, then this property carries over inductively to $A_n$ and $B_n$ via $C_n$, and the cost of each subsequent step is linear in the matrix dimension. However, we are unlikely to encounter this situation outside of a textbook. A more realistic assumption is that your matrices are hierarchical matrices, for which fast multiplication and inversion routines are known.
  2. For numerical reasons, we should avoid forming $C_n^{-1}$ explicitly, unless we actually need to examine some of its entries. I do not see a way to reduce the flop count, but we can try to maximize the potential for BLAS3 operations, i.e., matrix-matrix multiplications. You obtain an LU factorization of $C_n$ using, say, DGETRF (real double-precision arithmetic, Gaussian elimination with partial pivoting) from LAPACK. You need $X_n = C_{n}^{-1}A_{n-1}$ and $Y_n = C_n^{-1} B_{n-1}$. You obtain them by solving the system $$C_n \begin{bmatrix} X_n & Y_n \end{bmatrix} = \begin{bmatrix} A_{n-1} & B_{n-1}\end{bmatrix}$$ using, say, DGETRS (forward/backward substitution using the output of DGETRF) from LAPACK. This subroutine overwrites the original right-hand side with the solution, so you must begin each iteration by making a copy, say, $$\begin{bmatrix} Z_1 & Z_2 \end{bmatrix} = \begin{bmatrix} A_{n-1} & B_{n-1}\end{bmatrix}.$$ You then multiply with $Z_1 = A_{n-1}$ from the left to produce $$\begin{bmatrix} A_{n-1} X_n & A_{n-1} Y_n \end{bmatrix} = \begin{bmatrix} A_{n-1} C_n^{-1} A_{n-1} & A_{n-1} C_n^{-1} B_{n-1} \end{bmatrix},$$ using DGEMM from BLAS. You now have $A_n = A_{n-1} X_n$, and $B_n$ can be computed with a single additional DGEMM, multiplying $A_{n-1} Y_n$ by $A_{n-1}$ from the right and accumulating the result into $B_{n-1}$. Organizing the calculations in this manner does not change the flop count, but it increases the opportunity for the machine to use matrix-matrix operations. These have high arithmetic intensity (a large number of flops per memory operation), so they are well suited both for machines with deep memory hierarchies and for SIMD operations.
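As a sketch of this organization in Python (not a definitive implementation), one step might use SciPy's `lu_factor`/`lu_solve`, which wrap LAPACK's DGETRF/DGETRS, with the `@` products dispatching to DGEMM; names here are illustrative:

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def factored_step(A, B):
    """One recursion step reusing a single LU factorization of C_n,
    mirroring the DGETRF / DGETRS / DGEMM organization described above."""
    n = A.shape[0]
    C = np.eye(n) - B @ B        # form C_n = I - B_{n-1}^2
    lu, piv = lu_factor(C)       # DGETRF: P C_n = L U
    # One solve with the stacked right-hand side [A  B] yields
    # X_n = C_n^{-1} A_{n-1} and Y_n = C_n^{-1} B_{n-1} in a single DGETRS call.
    XY = lu_solve((lu, piv), np.hstack((A, B)))
    X, Y = XY[:, :n], XY[:, n:]
    A_next = A @ X               # DGEMM: A_n = A_{n-1} C_n^{-1} A_{n-1}
    B_next = (A @ Y) @ A + B     # two DGEMMs, accumulating into B_{n-1}
    return A_next, B_next
```

Note that `lu_solve` does not overwrite its input by default, so the explicit copy into $Z_1, Z_2$ from the Fortran-level description is handled for you here.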
  3. Comment/question: It seems to me that you must have some assumptions on $B_0$ to preclude the case of, say, $$B_0 = \begin{bmatrix} I_r & 0 \\ 0 & X \end{bmatrix},$$ where $I_r$ is the identity matrix of dimension $r$ and $X$ is any suitably sized dense matrix. Otherwise $C_1 = I - B_0^2$ is singular, since its leading $r \times r$ block vanishes.