Solve many linear systems of similar structure


Let $\Phi \in \mathbb{R}^{N \times n}$, $N > n$, with $\Phi$ having linearly independent columns.

I have to solve many linear systems of the form $$ (\Phi^\top D_i \Phi) x = \Phi^\top b_i, $$ where $D_i \in \mathbb{R}^{N \times N}$ is a different strictly positive definite diagonal matrix and $b_i \in \mathbb{R}^N$ a different right-hand side for every system, while the matrix $\Phi$ stays constant.

Is there any way to precompute some factorization so that each linear system is fast to solve? I stumbled upon this "solution": Prefactoring to solve many similar linear systems, but it doesn't seem to work:

>> Phi = randn(100,10);
>> D = diag(randn(100,1));
>> [Q, R] = qr(Phi, 0);
>> norm(inv(Phi'*D*Phi) - inv(R) * Q' * inv(D) * Q * inv(R)')

ans =

    5.8901
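For what it's worth, the same mismatch shows up in a NumPy translation of the snippet above (a sketch; the seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
Phi = rng.standard_normal((100, 10))
D = np.diag(rng.standard_normal(100))
Q, R = np.linalg.qr(Phi)  # reduced QR: Q is 100x10, R is 10x10

lhs = np.linalg.inv(Phi.T @ D @ Phi)
rhs = np.linalg.inv(R) @ Q.T @ np.linalg.inv(D) @ Q @ np.linalg.inv(R).T
print(np.linalg.norm(lhs - rhs))  # not small: the two expressions differ
```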

There are 2 answers below.

Best answer:

Unfortunately, this answer will be negative: (1) the mismatch is expected, because the approach is mathematically inconsistent; (2) while it may be possible to fix it, I don't know how.


The usual calculation, as in the link you provided, is \begin{align} \Phi^T D \Phi x &= b \tag{*} \\ R^T Q^T D Q R x &= b \\ Q^T D Q R x &= R^{-T} b \\ D Q R x &= Q (R^{-T} b) \tag{**} \\ Q R x &= D^{-1} (Q (R^{-T} b)) \\ R x &= Q^T (D^{-1} (Q (R^{-T} b))) \\ x &= R^{-1} (Q^T (D^{-1} (Q (R^{-T} b)))) \end{align} However, the step leading to (**) is invalid: $Q$ only has a left inverse ($Q^T Q$ is the identity), not a right inverse ($Q Q^T$ is not the identity), when $Q R$ is the reduced factorization of $\Phi \in \mathbb{R}^{N \times n}$ with $N > n$. This can be substantiated by numerical evidence (MATLAB):

>> Phi = randn(100,10);
>> [Q,R] = qr(Phi,0);
>> norm(Q'*Q - eye(10))

ans =

   6.9928e-16

>> norm(Q*Q' - eye(100))

ans =

    1.0000

Okay: Back to (mathematical) square one. Let's investigate the solvability of (*).

  • $\Phi^T z = b$ has infinitely many solutions, since $\Phi^T \in \mathbb{R}^{n \times N}$ and has more columns ($N$) than rows ($n$).
  • $D y = z$ has a unique solution (assuming $D$ is full rank), since $D \in \mathbb{R}^{N \times N}$.
  • $\Phi x = y$ has a unique solution (again, assuming $\Phi$ is full rank) if and only if $y$ is in the column space (span) of $\Phi$. Backtracking, $z = \Phi^{-T} b$ (any one of the infinitely many solutions of $\Phi^T z = b$), and $y = D^{-1} \Phi^{-T} b$. Therefore, there is a unique solution if and only if $D^{-1} \Phi^{-T} b$ is in the column space of $\Phi$.

I don't know how to prove this; however, numerical tests in MATLAB suggest that the probability of $y$ landing in the column space of $\Phi$ is extremely low.

  • Test description: Consider a simplified example, with $\Phi \in \mathbb{R}^{4 \times 2}$, $D \in \mathbb{R}^{4 \times 4}$, and $b \in \mathbb{R}^2$. We compute $y = D^{-1} \Phi^{-T} b$ and then the reduced row echelon form (rref) of the augmented matrix $A = [\Phi ~ | ~ y]$. Note that $A$ (and $\text{rref}(A)$) has $4$ rows and $3$ columns. Recall: if any row of $\text{rref}(A)$ has a non-zero entry in the third column, it must also have a non-zero entry in the first or second column -- otherwise, the system $\Phi x = y$ is inconsistent, and $y$ is not in the column space of $\Phi$.
  • Numerical results:

    >> clear
    any_consistent_Phi = false();
    any_low_rank_PhiDPhi = false();
    for i = 1:10000
        Phi = randn(4,2);
        D = diag(randn(4,1));
        b = randn(2,1);
        y = D\((Phi')\b);
        if ~isequal(rref([Phi, y]), [eye(3); zeros(1,3)])  % no pivot in the y-column => Phi*x = y is consistent
            any_consistent_Phi = true();
        end
        if rank(Phi' * D * Phi) < 2
            any_low_rank_PhiDPhi = true();
        end
    end
    any_consistent_Phi
    any_low_rank_PhiDPhi
    any_consistent_Phi =
      logical
       0
    any_low_rank_PhiDPhi =
      logical
       0
    
  • Discussion: Among $10\,000$ triples $(\Phi, D, b)$, there was not one for which $y = D^{-1} \Phi^{-T} b$ was in the column space of $\Phi$ (reflected by the fact that any_consistent_Phi remained false).
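The same experiment can be sketched in NumPy; as an assumption on my part, I replace the rref test by a least-squares residual check (NumPy has no built-in rref), which also detects whether $\Phi x = y$ is consistent:

```python
import numpy as np

rng = np.random.default_rng(0)
n_consistent = 0
for _ in range(10000):
    Phi = rng.standard_normal((4, 2))
    d = rng.standard_normal(4)   # diagonal of D
    b = rng.standard_normal(2)
    # one particular solution z of Phi^T z = b (min-norm), then y = D^{-1} z
    z = np.linalg.lstsq(Phi.T, b, rcond=None)[0]
    y = z / d
    # Phi x = y is consistent iff the least-squares residual vanishes
    x = np.linalg.lstsq(Phi, y, rcond=None)[0]
    if np.linalg.norm(Phi @ x - y) <= 1e-10 * np.linalg.norm(y):
        n_consistent += 1
print(n_consistent)
```

In this run the count is 0 as well, in line with the MATLAB result.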

Last comment: I suppose we could compute the least-squares solution of $\Phi x = y$ satisfying $\Phi^T \Phi x = \Phi^T y$. However, this seems strange to me -- assuming the matrix in (*) is full rank, why compute a least-squares solution? (The any_low_rank_PhiDPhi variable in the numerical test above remained false, which means that all $10\,000$ matrices $\Phi^T D \Phi$ were full rank... so if rank-deficient matrices $\Phi^T D \Phi$ occur at all, they occur very infrequently.)

Second answer:

When $D$ is a positive diagonal matrix, it is very easy to find $\sqrt{D}$ with $\sqrt{D}^2 = D$ by simply taking the square root of each diagonal element. That leaves you with $\Phi^T \sqrt{D}^T \sqrt{D} \Phi x = \Phi^T b$. Using $\Psi = \sqrt{D} \Phi$ you find $\Psi^T \Psi x = \Psi^T \sqrt{D}^{-1} b$. Lastly you set $\sqrt{D}^{-1} b = r$ and arrive at

$$\Psi^T \Psi x = \Psi^T r,$$ which is known as the normal equations and is used to solve least-squares problems with a tall and thin $\Psi$, as you describe it.

A far better approach to this problem (in terms of numerical stability and computation time) is to solve the least-squares problem $\min_x \|\Psi x - r\|_2$ directly.
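As a NumPy sketch (with an arbitrary strictly positive diagonal for $D$), the weighted least-squares route recovers the same $x$ as solving the normal equations directly:

```python
import numpy as np

rng = np.random.default_rng(0)
N, n = 100, 10
Phi = rng.standard_normal((N, n))
d = rng.uniform(0.5, 2.0, N)   # diagonal of D, strictly positive
b = rng.standard_normal(N)

# direct solve of the normal equations (Phi^T D Phi) x = Phi^T b
x_ne = np.linalg.solve(Phi.T @ (d[:, None] * Phi), Phi.T @ b)

# weighted least squares: Psi = sqrt(D) Phi, r = sqrt(D)^{-1} b
s = np.sqrt(d)
x_ls = np.linalg.lstsq(s[:, None] * Phi, b / s, rcond=None)[0]

print(np.allclose(x_ne, x_ls))
```

Since $\Psi^T \Psi = \Phi^T D \Phi$ and $\Psi^T r = \Phi^T b$, the two solutions agree up to rounding; the least-squares route avoids squaring the condition number.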

If all entries of $D$ are about the same value, you can simply use the QR decomposition of $\Phi$ and solve $$Q_\Phi R_\Phi x = \sqrt{D}^{-1} \sqrt{D}^{-1} b.$$ The error when using this method is scaled by $\|\sqrt{D}^{-1}\|_2$, i.e. the reciprocal of the square root of the smallest entry of $D$. If that is not too large, you could use this $x$ as a starting point for iterative refinement.