Condition number change in Cholesky matrix decomposition


Given a symmetric positive definite matrix $A$ that has an LDLT decomposition $A = L D L^{\top}$, why is the condition number of $A$ not less than that of the matrix $D$, i.e., $\mbox{cond}(A) \geqslant \mbox{cond}(D)$?

I tried to prove this conclusion from the definition, but failed. (03/06/2023)


Since the $n$-by-$n$ matrix $A$ is symmetric, it can be diagonalized using an eigendecomposition $A = V \Lambda V^{\top}$, where $V$ is an orthogonal matrix and $\Lambda$ is a diagonal matrix with real eigenvalues on the diagonal. Since $A$ is also positive definite, the smallest eigenvalue is positive: $\min_i \lambda_i = \lambda_n > 0$.

But the eigendecomposition also admits the following Rayleigh quotient result: $$\max_i \lambda_i = \lambda_1 = \max_{x \neq 0}\frac{x^{\top}Ax}{x^{\top}x} = \max_{x: \text{ }\lVert x \rVert = 1}x^{\top}Ax$$ Likewise: $$\min_i \lambda_i = \lambda_n = \min_{x \neq 0}\frac{x^{\top}Ax}{x^{\top}x} = \min_{x: \text{ }\lVert x \rVert = 1}x^{\top}Ax > 0$$ (See https://en.wikipedia.org/wiki/Rayleigh_quotient)
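These Rayleigh-quotient bounds are easy to verify numerically. A minimal Python/NumPy sketch, where the random SPD matrix (and the seed) is an illustrative assumption:

```python
import numpy as np

# Random symmetric positive definite test matrix (illustrative choice).
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M @ M.T + 5 * np.eye(5)      # shifting by 5*I keeps all eigenvalues positive

eigvals = np.linalg.eigvalsh(A)  # returned in ascending order
lam_min, lam_max = eigvals[0], eigvals[-1]

# The Rayleigh quotient x^T A x / x^T x always lies in [lam_min, lam_max].
for _ in range(1000):
    x = rng.standard_normal(5)
    r = (x @ A @ x) / (x @ x)
    assert lam_min - 1e-10 <= r <= lam_max + 1e-10
```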

So the eigendecomposition provides a leading eigenvector $v_1$ which maximizes the Rayleigh quotient, having value $\lambda_1$, and likewise trailing eigenvector $v_n$ which minimizes the quotient, having value $\lambda_n$.

The condition number of $A$ (in the $2$-norm) will of course be: $$\mbox{cond}(A) = \lambda_1 / \lambda_n$$ Since $D$ is a diagonal matrix with values $d_i$, its condition number will be: $$\mbox{cond}(D) = d_{max}/d_{min}, \quad d_{max} := \max_i d_i, \quad d_{min} := \min_i d_i > 0$$
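For an SPD matrix this eigenvalue ratio coincides with the $2$-norm condition number computed directly from singular values; a quick sanity check (random SPD matrix as an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((6, 6))
A = M @ M.T + np.eye(6)            # symmetric positive definite (illustrative)

eigvals = np.linalg.eigvalsh(A)    # ascending order
ratio = eigvals[-1] / eigvals[0]   # lambda_1 / lambda_n in the text's notation

# np.linalg.cond with p=2 uses singular values; for SPD these equal eigenvalues.
assert np.isclose(np.linalg.cond(A, 2), ratio)
```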

$\mathsf{Statement}$: The matrix $L$ from the LDL decomposition is lower-triangular with ones on the diagonal (also called unit-lower-triangular or unitriangular), therefore: $$d_{max} \le \lambda_1 \quad d_{min} \ge \lambda_n$$

See proof of $\mathsf{Statement}$ at (*) below. Intuitively, the unit-lower-triangular matrix $L$ is not attempting to simultaneously maximize and minimize the Rayleigh quotient, in contrast to the eigendecomposition $V$, so $L$ will result in "less stretched limits".

As a result of the $\mathsf{Statement}$: $$\lambda_1 \cdot d_{min} \ge \lambda_n \cdot d_{max}$$ $$\lambda_1 / \lambda_n \ge d_{max} / d_{min}$$ $$\mbox{cond}(A) \ge \mbox{cond}(D)$$
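The inequality itself can be checked numerically. Below is a sketch using a hand-rolled unpivoted $LDL^{\top}$ routine (`ldl_decompose` is a hypothetical helper written only for this illustration; a library routine such as `scipy.linalg.ldl` could be used instead):

```python
import numpy as np

def ldl_decompose(A):
    """Unpivoted LDL^T factorization of an SPD matrix (a minimal sketch;
    production routines such as scipy.linalg.ldl also apply pivoting)."""
    n = A.shape[0]
    L = np.eye(n)
    d = np.zeros(n)
    for j in range(n):
        d[j] = A[j, j] - L[j, :j] ** 2 @ d[:j]
        for i in range(j + 1, n):
            L[i, j] = (A[i, j] - (L[i, :j] * L[j, :j]) @ d[:j]) / d[j]
    return L, d

rng = np.random.default_rng(3)
M = rng.standard_normal((6, 6))
A = M @ M.T + 6 * np.eye(6)        # random SPD test matrix (illustrative)

L, d = ldl_decompose(A)
assert np.allclose(L @ np.diag(d) @ L.T, A)   # factorization reconstructs A

eigvals = np.linalg.eigvalsh(A)
cond_A = eigvals[-1] / eigvals[0]
cond_D = d.max() / d.min()
assert cond_A >= cond_D            # the inequality being proved
```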


(*) Proof of $\mathsf{Statement}$:

By definition, $d_{max} := \max_i d_i = \max_i e_i^{\top} D e_i$ and $d_{min} := \min_i d_i = \min_i e_i^{\top} D e_i$ where $e_i$ is the selection vector, which has $1$ at position $i$ and zeros elsewhere, and $D$ is the diagonal matrix from the LDL decomposition. Obviously, $\lVert e_i \rVert = 1$.

From the LDL decomposition: $$A = LDL^{\top} \quad \implies \quad D = L^{-1}AL^{-\top}$$ so that $$d_i = e_i^{\top} D e_i = e_i^{\top} L^{-1}AL^{-\top} e_i$$
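This congruence can be verified numerically; in the sketch below, $L$ and $D$ are obtained by rescaling NumPy's Cholesky factor (the test matrix is an illustrative assumption):

```python
import numpy as np
from numpy.linalg import inv

rng = np.random.default_rng(4)
M = rng.standard_normal((5, 5))
A = M @ M.T + 5 * np.eye(5)        # SPD test matrix (illustrative)

# Obtain L and D from a Cholesky factorization: A = C C^T with
# C = L * sqrt(D), so L = C / diag(C) and d_i = C_ii^2.
C = np.linalg.cholesky(A)
L = C / np.diag(C)                 # scale each column to get a unit diagonal
d = np.diag(C) ** 2
D = np.diag(d)

assert np.allclose(L @ D @ L.T, A)
# D is recovered by congruence with L^{-1}:
assert np.allclose(inv(L) @ A @ inv(L).T, D)
```

This also exhibits the link to the Cholesky factorization named in the title: $A = CC^{\top}$ with $C = L\,D^{1/2}$.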

Note: The inverse of $L$ always exists: since $L$ is unit-lower-triangular, all $n$ of its eigenvalues equal $1$, so it is non-singular. (If $A$ were singular, i.e., had zero eigenvalues, the singularity would appear as zero values on the diagonal of $D$, not in $L$.)

As noted, $L$ is unit-lower-triangular, and likewise $L^{\top}$ is unit-upper-triangular. The following facts are known:

  • The inverse of a lower triangular matrix, if it exists, is lower triangular. See p. 9 in the following link. In our case, $L$ is non-singular, so $L^{-1}$ exists.
  • The inverse of an upper triangular matrix, if it exists, is upper triangular. See p. 9 in the following link.
  • The inverse of a unitriangular matrix is unitriangular. See p. 16 in the following link.

Link: http://homepages.warwick.ac.uk/~ecsgaj/matrixAlgSlidesC.pdf by Peter J. Hammond (not affiliated with me). See also https://en.wikipedia.org/wiki/Triangular_matrix#Algebras_of_triangular_matrices.
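These triangular-inverse facts are easy to confirm numerically for a random unit-lower-triangular matrix (an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5
# Random unit-lower-triangular matrix: strictly lower part random, diagonal ones.
L = np.tril(rng.standard_normal((n, n)), k=-1) + np.eye(n)

Linv = np.linalg.inv(L)

# The inverse is again unit-lower-triangular:
assert np.allclose(np.triu(Linv, k=1), 0)   # zero above the diagonal
assert np.allclose(np.diag(Linv), 1)        # ones on the diagonal
```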

Let's denote $x_i := L^{-\top} e_i$ so that $d_i = e_i^{\top} D e_i = x_i^{\top}Ax_i$

Observe that due to the unit-upper-triangular structure of $L^{-\top}$, the vector $x_i = L^{-\top} e_i$ will simply be the $i^{\text{th}}$ column of $L^{-\top}$, which will have a $1$ at position $i$, and can have other non-zero values at positions $1 \text{ to } i-1 \text{ (inclusive)}$. $x_i$ will have zeros at positions $i+1 \text{ to } n \text{ (inclusive)}$.
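The claimed sparsity pattern of $x_i$ can be checked directly (random unit-lower-triangular $L$ as an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 5
# Random unit-lower-triangular L (illustrative), so L^{-T} is unit-upper-triangular.
L = np.tril(rng.standard_normal((n, n)), k=-1) + np.eye(n)
Linv_T = np.linalg.inv(L).T

for i in range(n):
    e_i = np.zeros(n)
    e_i[i] = 1.0
    x_i = Linv_T @ e_i                 # i-th column of L^{-T}
    assert np.isclose(x_i[i], 1.0)     # entry i equals 1
    assert np.allclose(x_i[i + 1:], 0) # zeros at positions i+1 .. n
```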

With $A = V \Lambda V^{\top} = \sum_{j=1}^n \lambda_j v_j v_j^{\top}$ where $v_j$ is the $j^{\text{th}}$ eigenvector (the $j^{\text{th}}$ column of $V$), we have:
$$d_i = e_i^{\top} D e_i = x_i^{\top}Ax_i = \sum_{j=1}^n \lambda_j (x_i^{\top}v_j)(v_j^{\top}x_i) = \sum_{j=1}^n \lambda_j (v_j^{\top}x_i)^2$$

For $i = 1$, $x_1 = (1, 0, \dots, 0)^{\top} = e_1$, so the weights $w_j := (v_j^{\top}x_1)^2$ are the squared entries of the $1^{\text{st}}$ row of $V$. Since $V$ is orthogonal, its rows are orthonormal, so these weights sum to $1$: $\sum_{j=1}^n w_j = 1$. Therefore $$d_1 = \sum_{j=1}^n \lambda_j w_j \le \lambda_1 \sum_{j=1}^n w_j = \lambda_1$$ producing $d_{max} = d_1 \le \lambda_1$.
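The key point in this step is that the weights $w_j$ sum to $1$, which is what caps $d_1$ at $\lambda_1$; a numerical check (random SPD matrix as an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(7)
M = rng.standard_normal((5, 5))
A = M @ M.T + 5 * np.eye(5)       # SPD test matrix (illustrative)

eigvals, V = np.linalg.eigh(A)    # ascending eigenvalues, orthonormal columns
lam_max = eigvals[-1]

x1 = np.zeros(5)
x1[0] = 1.0                       # x_1 = e_1
w = (V.T @ x1) ** 2               # weights w_j = (v_j^T x_1)^2

assert np.isclose(w.sum(), 1.0)   # rows of V are orthonormal, so weights sum to 1
d1 = A[0, 0]                      # d_1 = x_1^T A x_1 = a_11 when x_1 = e_1
assert np.isclose(eigvals @ w, d1)
assert d1 <= lam_max + 1e-9       # hence d_1 <= lambda_1
```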

Note: the assumption $d_{max} = d_1$ is valid since the LDL decomposition, both in the original J.R. Bunch and L. Kaufman (1977) paper and in actual implementations, can use a permutation matrix. In the original paper the permutation is intended for better numerical stability, but for the current proof its availability allows us to assume that the largest diagonal element $d_{max}$ is located at the first position, $d_1$.

Edit 22/3/2023:

For $i = n$, as correctly observed by user @Exodd in the comments, the line of reasoning used for $i = 1$ cannot be used as it is for $i = n$. Instead, the following alternate (and hopefully correct) proof is given.

For $i = n$: since the original matrix $A$ is positive definite, it is invertible, and $A^{-1} = L^{-\top} D^{-1} L^{-1}$ is a decomposition of the same type, with a unit-triangular factor and the diagonal matrix $D^{-1}$. The extreme values are related by reciprocals: $\lambda_{max}(A^{-1}) = 1/\lambda_n(A)$, and the largest diagonal entry of $D^{-1}$ is $1/d_{min}(A)$. Applying the line of reasoning presented above for $i = 1$ to $A^{-1}$ gives $$1/d_{min}(A) \le \lambda_{max}(A^{-1}) = 1/\lambda_n(A)$$ which is equivalent to $d_{min}(A) = d_n(A) \ge \lambda_n(A)$, as required.
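The reciprocal relations between $A$ and $A^{-1}$ used here, and the resulting bound $d_{min} \ge \lambda_n$, can be checked numerically (random SPD matrix as an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(8)
M = rng.standard_normal((5, 5))
A = M @ M.T + 5 * np.eye(5)                 # SPD test matrix (illustrative)

# Diagonal of D in A = L D L^T, via the Cholesky factor: d_i = C_ii^2.
C = np.linalg.cholesky(A)
d = np.diag(C) ** 2

eig_A = np.linalg.eigvalsh(A)               # ascending
eig_Ainv = np.linalg.eigvalsh(np.linalg.inv(A))

# lambda_max(A^{-1}) = 1 / lambda_min(A):
assert np.isclose(eig_Ainv[-1], 1.0 / eig_A[0])
# The conclusion of the i = n argument: d_min >= lambda_n.
assert d.min() >= eig_A[0] - 1e-9
```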