I'm following the source code of the Knockoff-filter method.
I'm trying to understand why doing a QR decomposition of a matrix with added $0$-columns produces any result for those columns.
Basically, they take a matrix and append a $0$ matrix of the same size to it, e.g.:
$$\begin{pmatrix}1 &5 &0&0\\2&6&0&0\\3&7&0&0\\4&8&0&0 \end{pmatrix} $$
When they run a QR decomposition on this, the $Q$ matrix has nonzero columns in the positions of the zero columns as well. But, at least with the Gram-Schmidt process, I don't see how that can be.

    x = matrix(1:8, ncol=2)
    A = cbind(x, matrix(0,4,2))
    qr.Q(qr(A))

And this is the Q-matrix they produce:
$$\begin{pmatrix}-0.1825742 & -0.8164966 & -0.4000874 & -0.37407225\\ -0.3651484 & -0.4082483 & 0.2546329 & 0.79697056\\ -0.5477226 & -1.665335\times 10^{-16} & 0.6909965 & -0.47172438\\ -0.7302967 & 0.4082483 & -0.5455419 & 0.04882607 \end{pmatrix} $$
The first 2 columns are identical to the $Q$ factor from a QR decomposition of the $x$ matrix alone. How did they get the other 2 columns?
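OK, so it turns out R (through LINPACK/LAPACK) uses Householder reflections, which I didn't know about before. Unlike the Gram-Schmidt process, which orthogonalizes the columns of $A$ one at a time, it builds $Q$ directly as a product of square orthogonal matrices that reduce $A$ to the triangular $R$. Each of those matrices is orthogonal whether or not the column it processes is zero, so it's essentially an algorithm for finding an orthonormal basis with as many vectors as you have columns.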
If you don't know it, in a nutshell: you decompose $A$ by first constructing a series of orthogonal (and therefore invertible) matrices $Q_k$ such that $$Q_n\cdots Q_1A=R $$ where each $Q_k$ has the form $$Q_k = \begin{pmatrix}I_{(k-1)\times (k-1)}&0\\0& F\end{pmatrix} $$
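Here $F=I-2\frac{vv^T}{v^Tv}$ and $v=\|y\|e_1-y$, where $y$ is the part of the current column on and below the diagonal and $e_1=(1,0,\dots,0)^T$. By construction $Fy=\|y\|e_1$, so $F$ zeroes every entry of $y$ below the first.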
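As a quick sanity check of the formula for $F$, here is a minimal sketch in R that builds one reflector for the first column of the example matrix. The variable names (`refl`, `Fy`) are my own, and this is only the textbook formula from above, not what LINPACK/LAPACK do internally (they use a different sign convention and never form the matrix explicitly).

```r
# One Householder reflector for y, built directly from the formula above.
y  <- c(1, 2, 3, 4)                     # first column of the example matrix
e1 <- c(1, rep(0, length(y) - 1))       # first standard basis vector
v  <- sqrt(sum(y^2)) * e1 - y           # v = ||y|| e_1 - y

# F = I - 2 v v^T / (v^T v); called `refl` because F is shorthand for FALSE in R
refl <- diag(length(y)) - 2 * (v %*% t(v)) / sum(v^2)

Fy <- drop(refl %*% y)                  # everything below the first entry is (numerically) zero
print(Fy)
print(crossprod(refl))                  # approximately the 4 x 4 identity, so refl is orthogonal
```

The reflector is a full 4 x 4 orthogonal matrix even though it was built from a single column.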
So for constructing $Q_1$, $y$ is just the first column, and $e_1$ is the first standard basis vector with length equal to #rows. For constructing $Q_2$, you look at $Q_1A$: $y$ is the vector of elements in the 2nd column, starting from the 2nd row, and $e_1$ is the first standard basis vector again, but now of length #rows-1.
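Once you're done, you invert the product, such that $$A = Q_1^{-1}\cdots Q_n^{-1}R := QR $$ Each $Q_k$ is symmetric as well as orthogonal, so $Q_k^{-1}=Q_k$ and no actual inversion is needed.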
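To see where the extra columns come from, here is a rough sketch of the whole procedure in R, applied to the matrix with the appended zero columns. `householder_qr` is a hypothetical helper I wrote for illustration, not an R or LAPACK function. It also assumes one convention the formulas above don't cover: when $y$ is zero (or already a multiple of $e_1$), $v=0$ and $F$ is undefined, so the step is skipped, i.e. $Q_k=I$.

```r
# Textbook Householder QR with explicit Q_k matrices (illustrative, not efficient).
householder_qr <- function(A) {
  m <- nrow(A); n <- ncol(A)
  Q <- diag(m)
  R <- A
  for (k in seq_len(min(m - 1, n))) {
    y <- R[k:m, k]                       # column k, from the diagonal down
    v <- -y
    v[1] <- v[1] + sqrt(sum(y^2))        # v = ||y|| e_1 - y
    if (sum(v^2) < 1e-14) next           # assumed convention: nothing to reflect, Q_k = I
    Fk <- diag(m - k + 1) - 2 * (v %*% t(v)) / sum(v^2)
    Qk <- diag(m)
    Qk[k:m, k:m] <- Fk                   # identity block on top, F in the lower right
    R <- Qk %*% R
    Q <- Q %*% Qk                        # Q_k^{-1} = Q_k, so Q = Q_1 Q_2 ... Q_n
  }
  list(Q = Q, R = R)
}

x <- matrix(1:8, ncol = 2)
A <- cbind(x, matrix(0, 4, 2))
res <- householder_qr(A)

print(res$Q)                             # 4 x 4, including columns for the zero part
print(crossprod(res$Q))                  # approximately the identity
```

In this sketch the zero columns never contribute a reflector of their own, so the last two columns of `res$Q` are just $Q_1Q_2$ applied to the last two standard basis vectors. They are orthonormal and orthogonal to the first two columns, but they are not unique, so they need not match the numbers R prints. The first two columns agree with `qr.Q(qr(x))` only up to sign, because of the sign convention chosen for $v$.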