Stable method of orthogonal projection onto a subspace with the help of the Moore-Penrose inverse


The orthogonal projection of a vector $v$ onto the column space of a matrix $A$ is given by $AA^\dagger v$. From the definition of the Moore-Penrose inverse, we know that $AA^\dagger v = (A^T)^\dagger A^T v$.

Below is code that projects a random vector onto the column space of a random matrix.

I would like to know why there is such a large difference between the two methods of computing the projection.

  % testingprojfrostackexchange
  clear;
  M = 1400;
  N = 1300;
  r = 1;
  A = rand(M, N);
  u = rand(M, r);

  projLN = pinv(A') * (A' * u);  % projection via the least-norm formula (A^T)^+ A^T u
  projLS = A * (pinv(A) * u);    % projection via the least-squares formula A A^+ u

  [Q, R] = qr(A);
  Q = Q(:, 1:N);
  z1 = Q * (Q' * u);             % reference projection Q Q^T u

  disp('(projection through QQT) - projLN');
  norm(z1 - projLN) / norm(projLN)

  disp('(projection through QQT) - projLS');
  norm(z1 - projLS) / norm(projLS)

Output

>> stackexchange
   (projection through QQT) - projLN

   ans =

   2.1569e-13

   (projection through QQT) - projLS

   ans =

   8.3546e-15

The reference projection is $Q(:,1:N)\,Q(:,1:N)^T$, where $Q$ comes from the Householder QR decomposition of $A$. Comparing the projections obtained from $AA^\dagger$ and $(A^T)^\dagger A^T$ against it, we find that $AA^\dagger$ is much more accurate when $A$ is a tall matrix.

Best answer

Assume that $A\in\mathbb{R}^{m\times n}$ has full column rank and that $A=QR$, $Q\in\mathbb{R}^{m\times n}$, $R\in\mathbb{R}^{n\times n}$, is its "economical" QR factorization.

The dense QR factorization in Matlab is (most likely) implemented using a stable Householder orthogonalization. The computed $Q$ is not exactly orthonormal, but it is very close to a matrix with exactly orthonormal columns, in the sense that there is an $m\times n$ matrix $\hat{Q}$ with orthonormal columns such that $$ A+E=\hat{Q}R, \quad \|Q-\hat{Q}\|_2\leq c_1(m,n)\epsilon, \quad \|E\|_2\leq c_2(m,n)\epsilon\|A\|_2, $$ where the $c_i$ are moderate constants possibly depending on $m$ and $n$, and $\epsilon$ is the machine precision ($\approx 10^{-16}$ in double precision).
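This near-orthonormality is easy to observe numerically. Below is a small sketch, written in Python/NumPy rather than Matlab so it is self-contained and runnable (`numpy.linalg.qr` calls the same LAPACK Householder routines); the construction of the ill-conditioned test matrix via a synthetic SVD is my own choice:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 100, 10

# Ill-conditioned test matrix built from a synthetic SVD (kappa ~ 1e12).
U, _ = np.linalg.qr(rng.standard_normal((m, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = U @ np.diag(np.logspace(0, -12, n)) @ V.T

# Economy Householder QR, as in Matlab's qr(A, 0).
Q, R = np.linalg.qr(A)

# Q^T Q equals the identity up to roughly machine precision,
# regardless of how badly conditioned A is.
ortho_err = np.linalg.norm(Q.T @ Q - np.eye(n))
print(ortho_err)
```

The point is that the orthogonality error of the computed factor does not depend on the conditioning of $A$.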

In finite-precision computation we like orthogonal matrices because they do not amplify errors. Indeed, using the assumption above one can show that $$ \|\mathrm{fl}(QQ^Tu)-\hat{Q}\hat{Q}^Tu\|_2\leq c_3(m,n)\epsilon\|u\|_2. $$
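The non-amplification can be illustrated directly (again a NumPy sketch for self-containedness; the perturbation size 1e-10 is arbitrary): perturbing $u$ by $\delta$ changes $QQ^Tu$ by at most $\|\delta\|$, because the projector $QQ^T$ has spectral norm $1$.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 100, 10

Q, _ = np.linalg.qr(rng.standard_normal((m, n)))  # orthonormal columns
u = rng.standard_normal(m)
delta = 1e-10 * rng.standard_normal(m)            # small perturbation of u

# ||Q Q^T (u + delta) - Q Q^T u|| <= ||delta||: the projector Q Q^T
# has spectral norm 1, so it cannot enlarge the perturbation.
err = np.linalg.norm(Q @ (Q.T @ (u + delta)) - Q @ (Q.T @ u))
print(err / np.linalg.norm(delta))  # a ratio of at most about 1
```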

Although Matlab actually computes the pseudo-inverse via the SVD, we can assume it is computed from the QR factorization; the reasoning is essentially the same. We then have $A^+=R^{-1}Q^T$, and with a bit more technical work this gives $$ \|\mathrm{fl}(AA^{+}u)-\hat{Q}\hat{Q}^Tu\|_2= \|\mathrm{fl}(QRR^{-1}Q^Tu)-\hat{Q}\hat{Q}^Tu\|_2\leq c_4(m,n)\epsilon\kappa_2(A)\|u\|_2, $$ where $\kappa_2(A)=\|A\|_2\|A^+\|_2$ is the spectral condition number of $A$. Note that in finite precision $RR^{-1}\neq I$, and the error committed by this operation is proportional to the conditioning of $R$, which is approximately equal to that of $A$ (note that $R$ is not the exact R-factor).
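The statement about $RR^{-1}$ can also be checked empirically (a NumPy sketch with synthetically ill-conditioned matrices; in exact arithmetic both residuals below would be zero):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 100, 10

def rr_inv_residual(kappa):
    """||R R^{-1} - I|| for the R-factor of a matrix whose
    condition number is roughly kappa (synthetic-SVD construction)."""
    U, _ = np.linalg.qr(rng.standard_normal((m, n)))
    V, _ = np.linalg.qr(rng.standard_normal((n, n)))
    A = U @ np.diag(np.logspace(0, -np.log10(kappa), n)) @ V.T
    R = np.linalg.qr(A)[1]
    return np.linalg.norm(R @ np.linalg.inv(R) - np.eye(n))

# The residual grows roughly like eps * kappa(R) ~ eps * kappa(A).
well = rr_inv_residual(1e2)
ill = rr_inv_residual(1e10)
print(well, ill)
```

For the well-conditioned matrix the residual sits near machine precision, while for the ill-conditioned one it is orders of magnitude larger.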

To conclude, both methods you tried are bad in the sense that the error depends on the condition number of $A$. You cannot see much difference now because $A$ is random and quite well conditioned; the errors would be far more visible if $A$ were ill-conditioned.


EDIT: Although it might seem that the second approach gives somewhat more accurate results, this is not true in general. The following snippet eventually finds a counterexample:

m = 100;
n = 10;

kappa = 1e6;  % target condition number of A

while true
    % Random matrix with singular values logspaced from 1 down to 1/kappa.
    U = orth(rand(m, n));
    D = diag(logspace(0, -log10(kappa), n));
    V = orth(rand(n, n));
    A = U * D * V;
    u = rand(m, 1);

    x_1 = pinv(A') * (A' * u);  % least-norm variant
    x_2 = A * (pinv(A) * u);    % least-squares variant

    [Q, ~] = qr(A, 0);
    x = Q * (Q' * u);           % reference projection

    error_1 = norm(x_1 - x) / norm(x);
    error_2 = norm(x_2 - x) / norm(x);

    if error_1 < error_2
        disp('Gotcha!');
        break;
    end
end