I'm using Strang's Introduction to Linear Algebra, and I'm a bit confused about QR decomposition.
I understand how to turn $\begin{pmatrix} a & b & c\end{pmatrix}$ into $\begin{pmatrix} A & B & C\end{pmatrix}$, in which $A$, $B$ and $C$ are perpendicular to each other. Then the textbook goes on to talk about *QR decomposition*:
$\begin{pmatrix} a & b & c\end{pmatrix} = \begin{pmatrix} q_1 & q_2 & q_3 \end{pmatrix}\begin{pmatrix} q_1^Ta & q_1^Tb & q_1^Tc \\ 0 & q_2^Tb & q_2^Tc \\ 0 & 0 & q_3^Tc\end{pmatrix},$ in which $q_1$, $q_2$ and $q_3$ are the normalized versions of $A$, $B$ and $C$.
This is definitely related to the idea of projecting $b$ onto the whole space:
$b=q_1(q_1^Tb)+q_2(q_2^Tb)+...+q_n(q_n^Tb)$,
and here is where I get stuck: Before we obtain $q_2$, we obtain $B$, which is the error vector when we project $b$ onto $a$, and then we normalize $B$ to obtain $q_2$. I don't get why $b = q_1(q_1^Tb)+q_2(q_2^Tb)$. I'm only projecting $b$ onto $a$, not onto the plane spanned by $a$ and $b$.
I'm also not sure how to normalize $B$ to reach $q_2$. I guess these two questions are connected. I can only go as far as $B=(I-q_1q_1^T)b$, but how do I calculate its length and normalize it?
Let's use slightly different notation, so that our observations will extend to an arbitrary number of vectors.
Start with three vectors $ a_0, a_1, a_2 $. The Gram-Schmidt process starts by taking $ q_0 $ in the direction of $ a_0 $, but of unit length:
$$ \rho_{00} = \| a_0 \|_2, \quad q_0 = a_0 / \rho_{00}. $$
Next, you compute the component of $ a_1 $ in the direction of $ q_0 $, namely $ (q_0^T a_1)\, q_0 $, and then subtract this from $ a_1 $ to be left with the component of $ a_1 $ orthogonal to $ q_0 $. But you do it in steps:
$$ \rho_{01} = q_0^T a_1, \quad a_1^\perp = a_1 - \rho_{01} q_0. $$
And then you take that, and make it of unit length to compute $ q_1 $:
$$ \rho_{11} = \| a_1^\perp \|_2, \quad q_1 = a_1^\perp / \rho_{11}. $$
And then you move on: you compute the components of $ a_2 $ in the direction of $ q_0 $, namely $ (q_0^T a_2)\, q_0 $, and of $ q_1 $, namely $ (q_1^T a_2)\, q_1 $, and you subtract off those components to be left with the component orthogonal to $ q_0 $ and $ q_1 $:
$$ \rho_{02} = q_0^T a_2, \quad \rho_{12} = q_1^T a_2, \quad a_2^\perp = a_2 - \rho_{02} q_0 - \rho_{12} q_1. $$
And then you take that, and make it of unit length to compute $ q_2 $:
$$ \rho_{22} = \| a_2^\perp \|_2, \quad q_2 = a_2^\perp / \rho_{22}. $$
Now, if you look at this carefully, you will find that if you collect
$$ A = \begin{pmatrix} a_0 & a_1 & a_2 \end{pmatrix}, \quad Q = \begin{pmatrix} q_0 & q_1 & q_2 \end{pmatrix}, \quad R = \begin{pmatrix} \rho_{00} & \rho_{01} & \rho_{02} \\ 0 & \rho_{11} & \rho_{12} \\ 0 & 0 & \rho_{22} \end{pmatrix}, $$
then $ A = Q R $.
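The three-vector procedure above extends to any number of columns. Here is a minimal NumPy sketch (the function name `gram_schmidt_qr` is my own; this is the classical Gram-Schmidt process, which assumes the columns of $A$ are linearly independent and can be numerically sensitive for nearly dependent columns):

```python
import numpy as np

def gram_schmidt_qr(A):
    """Classical Gram-Schmidt QR factorization, column by column.

    A is m x n with linearly independent columns; returns Q (m x n,
    orthonormal columns) and R (n x n, upper triangular) with A = Q R.
    """
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        # Coefficients rho_{ij} = q_i^T a_j for the already-computed q_i.
        R[:j, j] = Q[:, :j].T @ A[:, j]
        # Component of a_j orthogonal to q_0, ..., q_{j-1}.
        a_perp = A[:, j] - Q[:, :j] @ R[:j, j]
        # Its length is the diagonal entry rho_{jj} ...
        R[j, j] = np.linalg.norm(a_perp)
        # ... and normalizing gives the next orthonormal column.
        Q[:, j] = a_perp / R[j, j]
    return Q, R
```

You can check that `Q` has orthonormal columns, `R` is upper triangular, and `Q @ R` reproduces `A`.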
Here is another way of looking at this as an algorithm.
Consider $ A = Q R $.
Partition $ A $, $ Q $, and $ R $ so that $$ \left( \begin{array}{c | c c} A_0 & a_1 & A_2 \end{array} \right) = \left( \begin{array}{c | c c} Q_0 & q_1 & Q_2 \end{array} \right) \left( \begin{array}{c | c c} R_{00} & r_{01} & R_{02} \\ \hline 0 & \rho_{11} & r_{12}^T \\ 0 & 0 & R_{22} \end{array} \right) $$ Now, assume that the orthonormal columns of $ Q_0 $ have already been computed, as has upper triangular $ R_{00} $ (the coefficients we discussed before). What we want to do is to compute the elements in $ r_{01} $ and $ \rho_{11} $ as well as the next column of $ Q $, $ q_1 $.
How do we do this?
Multiplying out part of the right-hand side of $$\left( \begin{array}{c | c c} A_0 & a_1 & A_2 \end{array} \right) = \left( \begin{array}{c | c c} Q_0 & q_1 & Q_2 \end{array} \right) \left( \begin{array}{c | c c} R_{00} & r_{01} & R_{02} \\ \hline 0 & \rho_{11} & r_{12}^T \\ 0 & 0 & R_{22} \end{array} \right) $$ we find that $$ a_1 = Q_0 r_{01} + q_1 \rho_{11}. $$ Here we know $ a_1 $ and $ Q_0 $, and we know that $ q_1 $ will be orthogonal to the columns of $ Q_0 $.
So, apply $ Q_0^T $ from the left to both sides: $$ Q_0^T a_1 = Q_0^T Q_0 r_{01} + Q_0^T q_1 \rho_{11} = r_{01}, $$ since the columns of $ Q_0 $ are orthonormal ($ Q_0^T Q_0 = I $) and $ q_1 $ is orthogonal to them ($ Q_0^T q_1 = 0 $). So we know how to compute the coefficients in the vector $ r_{01} $. Going back to $$ a_1 = Q_0 r_{01} + q_1 \rho_{11}, $$ we can compute $$ a_1^\perp = \rho_{11} q_1 = a_1 - Q_0 r_{01}, $$ the component of $ a_1 $ orthogonal to the space spanned by the columns of $ Q_0 $. (Notice: $ a_1^\perp = a_1 - Q_0 r_{01} = a_1 - Q_0 ( Q_0^T a_1 ) $, which is exactly the formula for the component orthogonal to the column space of $ Q_0 $.)
All that is left then is to compute the length of $a_1^\perp$ as $ \rho_{11} = \| a_1^\perp \|_2 $ and normalize: $ q_1 = a_1^\perp / \rho_{11} $.
Bingo! We have computed the next column of $ Q $ and the corresponding entries of $ R $.
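The single-column update described here can also be sketched in NumPy (the helper name `next_column` is mine, not from the text):

```python
import numpy as np

def next_column(Q0, a1):
    """Given Q0 with orthonormal columns and the next column a1 of A,
    compute r01, rho11, and the next orthonormal column q1."""
    r01 = Q0.T @ a1                  # r_{01} = Q_0^T a_1
    a1_perp = a1 - Q0 @ r01          # component of a_1 orthogonal to range(Q_0)
    rho11 = np.linalg.norm(a1_perp)  # rho_{11} = ||a_1^perp||_2
    q1 = a1_perp / rho11             # normalize to unit length
    return r01, rho11, q1
```

Applied column after column, starting from the first column of $A$, this builds up exactly the factorization $A = QR$.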