QR decomposition: From orthogonal vectors to orthonormal ones

731 Views Asked by At

I'm using Strang's Introduction to Linear Algebra, and I'm a bit confused about QR decomposition.

I understand how to turn $\begin{pmatrix} a & b & c\end{pmatrix}$ into $\begin{pmatrix} A & B & C\end{pmatrix}$, in which A, B and C are perpendicular to each other. Then the textbook goes on to talk about $QR$ $decomposition$.

$\begin{pmatrix} a & b & c\end{pmatrix}$ = $\begin{pmatrix} q_1 & q_2 & q_3 \end{pmatrix}$$\begin{pmatrix} q_1^Ta & q_1^Tb & q_1^Tc \\ 0 & q_2^Tb & q_2^Tc \\ 0 & 0 & q_3^Tc\end{pmatrix}$ , in which $q_1$, $q_2$ and $q_3$ are normalized version of $A$, $B$ and $C$.

This is definitely related to the idea that projecting $b$ onto the whole space:

$b=q_1(q_1^Tb)+q_2(q_2^Tb)+...+q_n(q_n^Tb)$,

and here is where I get stuck: Before we obtain $q_2$, we obtain $B$, which is the error vector when we project $b$ onto $a$, and then we normalize $B$ to obtain $q_2$. I don't get why $b = q_1(q_1^Tb)+q_2(q_2^Tb)$. I'm only projecting $b$ onto $a$, not onto the plane spanned by $a$ and $b$.

I'm also not sure how to normalize $B$ to reach $q_2$. I guess these two questions are connected. I can only go as far as $B=b(I-q_1q_1^T)$, but how do I calculate its length and normalize it?

2

There are 2 best solutions below

1
On BEST ANSWER

Let's use slightly different notation, so that our observations will extend to an arbitrary number of vectors.

Start with three vectors $ a_0, a_1, a_2 $. The Gram-Schmidt process starts by taking $ q_0 $ in the direction of $ a_0 $, but of unit length:

  • $\rho_{0,0} = \| a_0 \|_2 $
  • $ q_0 = a_0 / \rho_{0,0} $

Next, you compute the component of $ a_1 $ in the direction of $ q_0 $, $ q_0^T a_1 q_0 $ and then subtract this from $ a_1 $ to be left with the component of $ a_1 $ orthogonal to $ q_0 $. But you do it in steps:

  • $ \rho_{0,1} = q_0^T a_1 $
  • $ a_1^\perp = a_1 - \rho_{0,1} q_0 $ (the component perpendicular to $ q_0 $.)

And then you take that, and make it of unit length to compute $ q_1 $:

  • $ \rho_{1,1} = \| a_1^\perp \|_2 $
  • $ q_1 = a_1^\perp / \rho_{1,1} $

And then you move on: you compute the components of $ a_2 $ in the direction of $ q_0 $, $ q_0^T a_2 q_0 $, and $ q_1 $, $ q_1^T a_2 q_1 $ and you subtract off those components to be left with the component orthogonal to $ q_0 $ and $ q_1 $:

  • $ \rho_{0,2} = q_0^T a_2 $
  • $ \rho_{1,2} = q_1^T a_2 $
  • $ a_2^\perp = a_2 - \rho_{0,2} q_0 - \rho_{1,2} q_1 $ (the component perpendicular to $ q_0 $.)

And then you take that, and make it of unit length to compute $ q_2 $:

  • $ \rho_{2,2} = \| a_2^\perp \|_2 $
  • $ q_2 = a_2^\perp / \rho_{2,2} $

Now, if you look at this carefully, you will find that if you

  • make $ a_0 $, $ a_1 $, and $ a_2 $ the columns of matrix $ A $,
  • make $ q_0 $, $ q_1 $, and $ q_2 $ the columns of matrix $ Q $,
  • $ \rho_{i,j} $ the elements of upper triangular matrix $ R $

then $ A = Q R $.

Here is another way of looking at this as an algorithm.

Consider $ A = Q R $.

Partition $ A $, $ Q $, and $ R $ so that $$ \left( \begin{array}{c | c c} A_0 & a_1 & A_2 \end{array} \right) = \left( \begin{array}{c | c c} Q_0 & q_1 & Q_2 \end{array} \right) \left( \begin{array}{c | c c} R_{00} & r_{01} & R_{02} \\ \hline 0 & \rho_{11} & r_{12}^T \\ 0 & 0 & R_{22} \end{array} \right) $$ Now, assume that the orthonormal columns of $ Q_0 $ have already been computed, as has upper triangular $ R_{00} $ (the coefficients we discussed before). What we want to do is to compute the elements in $ r_{01} $ and $ \rho_{11} $ as well as the next column of $ Q $, $ q_1 $.

How do we do this?

Multiplying out part of the right-hand side of $$\left( \begin{array}{c | c c} A_0 & a_1 & A_2 \end{array} \right) = \left( \begin{array}{c | c c} Q_0 & q_1 & Q_2 \end{array} \right) \left( \begin{array}{c | c c} R_{00} & r_{01} & R_{02} \\ \hline 0 & \rho_{11} & r_{12}^T \\ 0 & 0 & R_{22} \end{array} \right) $$ we find that $$ a_1 = Q_0 r_{01} + q_1 \rho_{11} $$. Here we know $ a_1 $, $ Q_0 $ and we know that $ q_1 $ will be orthogonal to $ Q_0 $.

So, apply $ Q_0^T $ from the left to both side: $$ Q_0^T a_1 = Q_0^T Q_0 r_{01} + Q_0^T q_1 \rho_{11} = r_{01} $$ so, we know how to compute the coefficients in vector $ r_{01} $. When we go back to $$ a_1 = Q_0 r_{01} + q_1 \rho_{11} $$ and compute $$ a_1^\perp = \rho_{11} q_1 = a_1 - Q_0 r_{01} $$ the component of $ a_1 $ orthogonal to the space spanned by the columns of $ Q_0 $. (Notice: $ a_1^\perp = a_1 - Q_0 r_{01} = a_1 - Q_0 ( Q_0^T a_1 ) $ which is the formula for the component orthogonal to...)

All that is left then is to compute the length of $a_1^\perp$ as $ \rho_{11} = \| a_1^\perp \|_2 $ and normalize: $ q_1 = a_1 / \rho_{11} $.

Bingo! we have computed the next columns of $ Q $ and $ R $.

1
On

Hint: write out the formula for Gram-Schmidt process, and try to observe linear relations.