Is the condition number (SVD) a measure of the orthogonality of a matrix?


We say a matrix is well-conditioned when its condition number is close to $1$, and ill-conditioned when the condition number is large. I'm puzzled: does the condition number have a simple interpretation? Does it tell us some basic property of a matrix? It appears to correlate strongly with the "orthogonality" of a matrix, but I'm not sure whether, or to what extent, this is true.

Consider two extremes.

First, suppose $A$ is a maximally well-conditioned matrix (square, for simplicity), with condition number $1$. Then an SVD of $A$ is necessarily of the form $A=U(\sigma I)V^*$, where $U$ and $V$ are unitary and $\sigma>0$. Hence $A=\sigma UV^*$ is itself a scaled unitary matrix, and its columns (and rows) are orthogonal.

Second, suppose $A$ is a maximally ill-conditioned matrix, say $3\times 3$ for simplicity, with singular values $1, 0, 0$. Then $A=U\,\mathrm{diag}(1, 0, 0)\,V^*$, where $U=[u_1,u_2, u_3]$ and $V=[v_1, v_2, v_3]$; in other words, $A=u_1v_1^*$. Write $A=[a_1, a_2, a_3]$ and measure the orthogonality of its columns by $\gamma_{ij}\triangleq \vert a_i^* a_j\vert/(\Vert a_i\Vert \Vert a_j \Vert)$. Then $\gamma_{ij}=1$ for all $i, j$ (whenever the columns involved are nonzero), so $A$ has "the least orthogonal" columns. The same holds for its rows.

Can this be generalized somehow? Intuitively, when $\gamma_{ij}=0$ for all $i\ne j$, one might expect the first case, a most well-conditioned matrix. And when $\gamma_{ij}=1$ for some $i\ne j$, we have the second case. The converse however is not true for the second case, which does not necessarily require $\gamma_{ij}=1$ for some $i\ne j$, but simply requires columns or rows to be linearly dependent. So I'm still puzzled: Is the condition number higher when the columns (rows) of a matrix are "less orthogonal" (or more correlated)?
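To make the two extremes concrete, here is a small numerical sketch (my own illustration, using NumPy; the helper `gamma` implements the $\gamma_{ij}$ measure defined above):

```python
import numpy as np

rng = np.random.default_rng(0)

def gamma(A):
    """Pairwise column measures gamma_ij = |a_i^* a_j| / (||a_i|| ||a_j||)."""
    norms = np.linalg.norm(A, axis=0)
    return np.abs(A.T @ A) / np.outer(norms, norms)

# Extreme 1: a scaled unitary matrix -> condition number 1, gamma_ij = 0 for i != j.
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
A1 = 2.5 * Q
print(np.linalg.cond(A1))        # ~1.0
print(np.round(gamma(A1), 6))    # ~identity: off-diagonal entries ~0

# Extreme 2: a rank-one matrix u1 v1^* -> singular, gamma_ij = 1 for all i, j.
u1 = rng.normal(size=3); u1 /= np.linalg.norm(u1)
v1 = rng.normal(size=3); v1 /= np.linalg.norm(v1)
A2 = np.outer(u1, v1)
print(np.round(gamma(A2), 6))    # all ones: columns maximally correlated
```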

BEST ANSWER

Given any $k \geq 1$, we can always find a square matrix $A$ which has condition number $k$ and uncorrelated columns: $A = \text{diag}(k,1)$, for example. Thus, the columns of a matrix with a large but finite condition number can still be 'arbitrarily uncorrelated'.
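A quick numerical check of this example (a sketch, assuming NumPy; `np.linalg.cond` computes $\sigma_{\max}/\sigma_{\min}$):

```python
import numpy as np

# diag(k, 1) has condition number k yet exactly orthogonal (uncorrelated) columns.
k = 1000.0
A = np.diag([k, 1.0])

cond = np.linalg.cond(A)          # sigma_max / sigma_min
a1, a2 = A[:, 0], A[:, 1]
gamma = abs(a1 @ a2) / (np.linalg.norm(a1) * np.linalg.norm(a2))

print(cond)   # 1000.0
print(gamma)  # 0.0 -- columns remain perfectly uncorrelated
```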

On the other hand, a singular matrix must have linearly dependent columns, and so the columns must in particular be correlated.

A more interesting question is the following: how correlated can the columns of a matrix be if it has finite condition number $k$? To answer this, we work in a more abstract setup.

Consider a linear endomorphism $A : V \to V$, where $V$ is a finite-dimensional inner product space. Then the correlation between the images of vectors $v$ and $w$ under $A$ is:

$\text{corr}(Av,Aw) = \dfrac{\langle Av,Aw \rangle}{\sqrt{\langle Av,Av \rangle \langle Aw,Aw \rangle}} = \dfrac{\langle v,A^{*}A w \rangle}{\sqrt{\langle v,A^{*}A v \rangle \langle w,A^{*}A w \rangle}} = \dfrac{{\langle v, w \rangle}_{A}}{\sqrt{{\langle v, v \rangle}_{A} {\langle w, w \rangle}_{A}}}$

where $\langle v, w \rangle_{A} := \langle v,A^{*}A w \rangle$ defines a positive semi-definite bilinear form. The question we ask is the following:

For $v$ and $w$ in $V$, with $\langle v , w \rangle = 0$ and $\Vert v \Vert = \Vert w \Vert = 1$, how big can $\text{corr}(Av,Aw)$ be? Exploiting the scale invariance of the correlation, this is equivalent to:

$$\underset{\langle v,w \rangle = 0 \ \text{and} \ \langle v,v \rangle _{A} = \langle w,w \rangle _{A} = 1 }{\max}{\text{corr}(Av,Aw)}$$

Note that the normalised eigenvectors of $A^{*} A$ form an orthonormal basis $\mathcal{B} = \{ b_1, b_2, \dots, b_n\}$ which is uncorrelated: $\langle b_{i} , b_{j} \rangle _{A} = \sigma_{i} ^{2} \delta_{ij}$, where $\sigma_{\min} = \sigma_{1} \leq \sigma_{2} \leq \cdots \leq \sigma_{n} = \sigma_{\max}$ are the singular values of $A$. Writing $v = \sum v_{i} b_{i}$ and $w = \sum w_{i} b_{i}$, the problem reduces to:

$$\max {\sum{v_i w_i \sigma_{i}^{2}}} \ $$ subject to $\sum v_{i} w_{i} = 0 \ \text{and} \ \sum v_{i}^{2} \sigma_{i}^{2} = \sum w_{i}^{2} \sigma_{i}^{2} = 1$

A Lagrange multiplier argument shows that the maximum is attained at $ v = \frac{1}{\sqrt2} (b_1 + b_n) $ and $ w = \frac{1}{\sqrt2} (b_n - b_1 )$, and equals $\frac{k^2 - 1}{k^2 + 1}$, where $k = \frac{\sigma_\text{max}}{\sigma_\text{min}}$ is the condition number of $A$.
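The claimed maximiser and bound can be checked numerically (a sketch of my own, not part of the answer; it takes $A=\mathrm{diag}(1,k)$ in $2$ dimensions, so that $b_1, b_n$ are the standard basis vectors):

```python
import numpy as np

rng = np.random.default_rng(0)

def corr(A, v, w):
    """Correlation |<Av, Aw>| / (||Av|| ||Aw||) between the images of v and w."""
    Av, Aw = A @ v, A @ w
    return abs(Av @ Aw) / (np.linalg.norm(Av) * np.linalg.norm(Aw))

k = 5.0
A = np.diag([1.0, k])            # sigma_min = 1, sigma_max = k, so cond(A) = k
b1, bn = np.eye(2)               # eigenvectors of A^*A
v = (b1 + bn) / np.sqrt(2)
w = (bn - b1) / np.sqrt(2)

bound = (k**2 - 1) / (k**2 + 1)
print(np.isclose(corr(A, v, w), bound))   # True: the maximiser attains the bound

# Random orthonormal pairs should never exceed the bound.
best = 0.0
for _ in range(1000):
    Q, _ = np.linalg.qr(rng.normal(size=(2, 2)))
    best = max(best, corr(A, Q[:, 0], Q[:, 1]))
print(best <= bound + 1e-12)              # True
```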

Thus, the more ill-conditioned a matrix is, the more correlated its columns can be.