For an $m\times n$ matrix $A, m>n,$ how do I show or know that we can't directly apply the LU decomposition to solve the least squares problem?



I've seen another post that touches on this, but it doesn't quite answer what I'm wondering. My intuition is that using the LU decomposition to find the least-squares solution is ill-conditioned: we use the QR decomposition instead of the normal equations because, despite the cost of the QR algorithm, the normal equations are far worse conditioned. Am I headed in the right direction?

2 Answers

Accepted answer:

I don't think you can guarantee an LU decomposition exists without forming the normal equations (see the section for general matrices under Existence and Uniqueness on https://en.m.wikipedia.org/wiki/LU_decomposition). If you do form the normal equations, the LU decomposition can be accomplished as a Cholesky decomposition, since $A^TA$ is symmetric positive definite. However, $\kappa(A^TA) = \kappa(A)^2$, so your new system can have a much worse condition number.
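To make the trade-off concrete, here is a small sketch (with an arbitrary random matrix) of solving the normal equations via Cholesky using SciPy, and of the condition-number squaring $\kappa(A^TA) = \kappa(A)^2$:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 5))   # tall; full column rank with probability 1
b = rng.standard_normal(100)

# Normal equations: (A^T A) x = A^T b. Since A^T A is symmetric positive
# definite when A has full column rank, a Cholesky factorization exists.
G = A.T @ A
x = cho_solve(cho_factor(G), A.T @ b)

# The price: the condition number of the system being solved is squared.
print(np.linalg.cond(A), np.linalg.cond(G))  # cond(G) = cond(A)^2
```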

You can guarantee that a reduced QR factorization of $A$ exists, forgoing the need to form the normal equations. That's the benefit.
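A minimal sketch of this route (again with a random example matrix): compute the reduced QR factorization and solve the resulting triangular system, with no normal equations in sight:

```python
import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(1)
A = rng.standard_normal((100, 5))
b = rng.standard_normal(100)

# Reduced QR: Q is 100x5 with orthonormal columns, R is 5x5 upper triangular.
Q, R = np.linalg.qr(A, mode="reduced")

# Minimizing ||Ax - b||_2 reduces to the triangular system R x = Q^T b.
x = solve_triangular(R, Q.T @ b, lower=False)
```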

Answer:

Given a tall $m\times n$ ($m > n$) matrix $A$ with full column rank, $A$ does indeed possess an $LU$ factorization; pivoting may be needed for existence and is usually needed for stability. This results in a factorization of the form

$$ PA = \begin{bmatrix} L_{11} & 0 \\ L_{21} & L_{22} \end{bmatrix} \begin{bmatrix} U \\ 0 \end{bmatrix}, $$

where $L_{11}$ and $L_{22}$ are unit lower triangular and $U$ is upper triangular. Since $L_{22}$ multiplies only the zero block, it is irrelevant, so this can be reduced to the "economy" $LU$ factorization

$$ P A = \begin{bmatrix} L_{11} \\ L_{21} \end{bmatrix} U. $$

Unfortunately, this $LU$ factorization doesn't help in finding the least-squares solution to $Ax = b$. Intuitively, the problem is that the $L$ factor isn't orthogonal and thus doesn't preserve the norms of vectors.
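The economy factorization above can be computed directly; a small sketch with `scipy.linalg.lu` on a random tall matrix (note SciPy's convention is $A = PLU$, so its $P$ corresponds to $P^\top$ above):

```python
import numpy as np
from scipy.linalg import lu

rng = np.random.default_rng(2)
A = rng.standard_normal((8, 3))   # tall matrix, m=8 > n=3

# For a tall A, SciPy's L is 8x3 lower-trapezoidal (the stacked
# [L11; L21] block, with unit diagonal) and U is 3x3 upper triangular.
P, L, U = lu(A)
print(L.shape, U.shape)  # (8, 3) (3, 3)
```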

There are three solutions to this problem:

  • Form the normal equations $A^\top A x = A^\top b$ and form the $LU$ (in this case, Cholesky) factorization of $A^\top A$. The problem with this approach is that it squares the two-norm condition number of $A$, resulting in a loss of accuracy for the resulting system of equations.

  • Compute a $QR$ factorization of $A$, $A = QR$. Since $Q$ is orthogonal and preserves the norms of vectors, one can reduce minimizing $\| Ax - b\|_2$ to minimizing $\| R x - Q^\top b \|_2$, from which the least-squares solution is readily apparent.

  • You can use $LU$ factorization to solve least squares problems without squaring the condition number, but you have to be more clever. Let $\alpha$ denote the largest absolute value of an entry of $A$. Then, surprisingly enough, the least-squares solution $x_{\rm ls}$ can be found by solving the following square system of linear equations with $LU$ factorization: $$ \begin{bmatrix} -\alpha I & A \\ A^\top & 0 \end{bmatrix}\begin{bmatrix} y \\ x_{\rm ls} \end{bmatrix} = \begin{bmatrix} b \\ 0 \end{bmatrix} $$ (The second block row forces $A^\top y = 0$, and the first gives $y = (Ax_{\rm ls} - b)/\alpha$, so together they recover the normal equations $A^\top A x_{\rm ls} = A^\top b$.) The condition number of this enlarged system is only a modest multiple of the condition number of $A$.
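A sketch of the third, augmented-system strategy on a random example, using SciPy's $LU$ routines to solve the enlarged square system:

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(3)
m, n = 100, 5
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

alpha = np.abs(A).max()   # largest absolute value of an entry of A

# Build the augmented square system from the text:
#   [ -alpha*I   A ] [ y    ]   [ b ]
#   [   A^T      0 ] [ x_ls ] = [ 0 ]
K = np.block([[-alpha * np.eye(m), A],
              [A.T, np.zeros((n, n))]])
rhs = np.concatenate([b, np.zeros(n)])

# Solve the (m+n)x(m+n) system by LU factorization; the last n entries
# of the solution are the least-squares solution x_ls.
sol = lu_solve(lu_factor(K), rhs)
x_ls = sol[m:]
```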

Each of these strategies has situations where it should be preferred over the others, but $QR$ factorization is generally the method of choice: it is both reasonably fast (at least for a direct method) and as accurate as you could hope for, regardless of the input matrix $A$.