Computationally efficient way to compute projection matrix

Is there some computationally efficient way of computing $(A^\top A)^{-1}$? I would like to know this cause I want to compute the projection matrix $$ P = A(A^\top A)^{-1} A^\top $$ efficiently.

BEST ANSWER

I'm going to suggest an indirect method which, while slightly longer, is more accurate.

Suppose $A$ is $n\times m$ with $n > m$ and has full column rank. As a rule of thumb, you can usually avoid forming the matrix $A^\top A$ in matrix computations, and it is advisable to do so whenever possible.
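One reason to avoid forming $A^\top A$ explicitly: it squares the condition number of the problem. A small NumPy sketch (the sizes and singular values here are illustrative choices, not from the answer):

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a 100 x 10 matrix with prescribed singular values 1, ..., 1e-6,
# so its condition number is 1e6 by construction.
U, _ = np.linalg.qr(rng.standard_normal((100, 10)))
V, _ = np.linalg.qr(rng.standard_normal((10, 10)))
s = np.logspace(0, -6, 10)
A = U @ np.diag(s) @ V.T

print(np.linalg.cond(A))        # ~1e6
print(np.linalg.cond(A.T @ A))  # ~1e12: squaring A's condition number
```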

You can reformulate your goal as finding an orthonormal basis $Q\in\mathbb{R}^{n\times m}$ for the column space of $A$. One can then compute matrix-vector products with the projection $\Pi_A = A(A^\top A)^{-1}A^\top = QQ^\top$ in roughly $4mn$ operations by evaluating $\Pi_A x = Q(Q^\top x)$.
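In Python/NumPy terms, this looks like the following sketch (my translation, not from the answer; `np.linalg.qr` returns the economy factorization by default, and $A$ is assumed to have full column rank):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 1000, 20
A = rng.standard_normal((n, m))  # well-conditioned test matrix
x = rng.standard_normal(n)

# Orthonormal basis Q for col(A) via economy QR.
Q, _ = np.linalg.qr(A)  # Q is n x m with orthonormal columns

# Apply the projection in ~4mn operations: never form the n x n matrix Q @ Q.T.
Px = Q @ (Q.T @ x)

# Sanity check against the normal-equations formula A (A^T A)^{-1} A^T x,
# which is fine here because this A is well-conditioned.
Px_ref = A @ np.linalg.solve(A.T @ A, A.T @ x)
print(np.linalg.norm(Px - Px_ref))  # tiny
```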

The most efficient and accurate way to compute such a basis is QR factorization, which under the hood is implemented with Householder reflectors. You want the economy QR factorization $A=QR$, which computes $Q\in \mathbb{R}^{n\times m}$ with orthonormal columns, rather than the full QR factorization $A = Q'R'$, which computes a full $n\times n$ matrix $Q'\in\mathbb{R}^{n\times n}$. In MATLAB, this is done by the call `[Q,~] = qr(A,0)`.

Let's see how this works. Note that $R$ is nonsingular. Thus,

$$ \Pi_A = A(A^\top A)^{-1}A^\top = QR(R^\top R)^{-1}R^\top Q^\top = QQ^\top, $$

as promised.

If $A$ is rank-deficient (or approximately rank-deficient), you will want to use an SVD and threshold out small singular values to compute an orthonormal basis for the (approximate) column space. I can elaborate on this point in the comments if desired.
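A minimal NumPy sketch of that thresholding, using a hypothetical matrix whose last column exactly duplicates the first (the tolerance below mirrors the usual numerical-rank convention):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical rank-deficient 200 x 10 matrix: last column copies the first.
A = rng.standard_normal((200, 10))
A[:, -1] = A[:, 0]

# Economy SVD; keep only directions whose singular values exceed a tolerance.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
tol = max(A.shape) * np.finfo(A.dtype).eps * s[0]
r = np.count_nonzero(s > tol)  # numerical rank
Q = U[:, :r]                   # orthonormal basis for the approximate column space

print(r)  # 9, not 10: the duplicated column is thresholded out
```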

Why is this worth the hassle? Consider the following MATLAB example:

```
>> Q_true = randn(10000,100); [Q_true,~] = qr(Q_true, 0);
>> A = Q_true * gallery('randsvd',100,1e10);
>> x = randn(10000,1);
>> tic; disp(norm(A*((A'*A)\(A'*x)) - Q_true*(Q_true'*x))); toc
Warning: Matrix is close to singular or badly scaled. Results may be inaccurate. RCOND =
1.095993e-20.

     2.299635478974520e+01

Elapsed time is 0.003516 seconds.
>> tic; [Q,~] = qr(A,0); disp(norm(Q*(Q'*x) - Q_true*(Q_true'*x))); toc
     1.177844914088478e-06

Elapsed time is 0.021228 seconds.
```

The method based on QR factorization is about six times slower here, which isn't great. But the error for the QR-based method is over ten million times smaller than for the $(A^\top A)^{-1}$ method!

If you can afford it and $A$ is even moderately ill-conditioned, the QR factorization method pays for its modestly increased cost with dramatically increased accuracy.
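For reference, here is a rough Python/NumPy analogue of the MATLAB experiment above (my translation: `gallery('randsvd',100,1e10)` is mimicked by an explicit construction with geometrically spaced singular values, and the seed and timings are not meaningful):

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 10000, 100

# True orthonormal basis, and A = Q_true * (ill-conditioned m x m factor).
Q_true, _ = np.linalg.qr(rng.standard_normal((n, m)))
U, _ = np.linalg.qr(rng.standard_normal((m, m)))
V, _ = np.linalg.qr(rng.standard_normal((m, m)))
s = np.logspace(0, -10, m)                 # condition number 1e10
A = Q_true @ (U @ np.diag(s) @ V.T)
x = rng.standard_normal(n)

Px_true = Q_true @ (Q_true.T @ x)

# Normal-equations projection: accuracy ruined since cond(A^T A) ~ 1e20.
err_normal = np.linalg.norm(A @ np.linalg.solve(A.T @ A, A.T @ x) - Px_true)

# QR-based projection.
Q, _ = np.linalg.qr(A)
err_qr = np.linalg.norm(Q @ (Q.T @ x) - Px_true)

print(err_normal, err_qr)  # QR error is many orders of magnitude smaller
```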

As Ben Grossmann suggests, iterative methods may be appropriate if $A$ is well-conditioned and $n$ and $m$ are both large.