Projection of a vector $v$ onto the column space of a matrix $A$ is given by $AA^\dagger v$. From the definition of Moore-Penrose Inverse we know that $AA^\dagger v = (A^T)^\dagger A^T v $.
Below is code that implements the projection of a random vector onto the column space of a random matrix. I would like to know why there is a huge difference between the two methods of computing the projection.
% testingprojfrostackexchange
clear;
M = 1400;
N = 1300;
r = 1;
A = rand(M,N);
u = rand(M,r);
projLN = pinv(A')*(A'*u);  % projection via the least-norm formula (A^T)^+ A^T u
projLS = A*(pinv(A)*u);    % projection via the least-squares formula A A^+ u
[Q, R] = qr(A);
Q = Q(:,1:N);
z1 = Q*(Q'*u);             % the "actual" projection Q Q^T u
disp('(projection through QQT) - projLN');
norm(z1-projLN)/norm(projLN)
disp('(projection through QQT) - projLS');
norm(z1-projLS)/norm(projLS)
Output
>> stackexchange
(projection through QQT) - projLN
ans =
2.1569e-13
(projection through QQT) - projLS
ans =
8.3546e-15
The reference ("actual") projection is computed as $Q(:,1:N)\,Q(:,1:N)^T u$, where $Q$ comes from the Householder QR decomposition of $A$. Comparing it with the projections obtained from $AA^\dagger$ and $(A^T)^\dagger A^T$, we find that $AA^\dagger$ is much more accurate when $A$ is a tall matrix.
Assume that $A\in\mathbb{R}^{m\times n}$ has full column rank and that $A=QR$, $Q\in\mathbb{R}^{m\times n}$, $R\in\mathbb{R}^{n\times n}$, is its "economical" QR factorization.
The dense QR factorization in Matlab is (most likely) implemented using stable Householder orthogonalization. This yields a computed $Q$ whose columns are not exactly orthonormal but are very close to being orthonormal, in the sense that there is an $m\times n$ matrix $\hat{Q}$ with exactly orthonormal columns such that $$ A+E=\hat{Q}R, \quad \|Q-\hat{Q}\|_2\leq c_1(m,n)\epsilon, \quad \|E\|_2\leq c_2(m,n)\epsilon\|A\|_2, $$ where the $c_i$ are moderate constants possibly depending on $m$ and $n$, and $\epsilon$ is the machine precision ($\approx 10^{-16}$ in double precision).
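This near-orthonormality of the computed $Q$ is easy to check numerically. Here is a small sketch of that check, written in Python/NumPy rather than Matlab (the dimensions and the random test matrix are my own arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((500, 300))

# Economy-size QR; NumPy/LAPACK use Householder reflections internally.
Q, R = np.linalg.qr(A, mode='reduced')

# The columns of the computed Q are orthonormal to roughly machine
# precision: ||Q^T Q - I||_2 is a small multiple of eps ~ 1e-16.
dev = np.linalg.norm(Q.T @ Q - np.eye(300), 2)
print(dev)
```

The deviation printed is a modest multiple of $\epsilon$, i.e. the computed $Q$ behaves like a matrix with exactly orthonormal columns for all practical purposes.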
In finite-precision computation, we like orthogonal matrices because they do not amplify errors. Indeed, using the assumption above one can show that $$ \|\mathrm{fl}(QQ^Tu)-\hat{Q}\hat{Q}^Tu\|_2\leq c_3(m,n)\epsilon\|u\|_2. $$
Although Matlab actually uses the SVD to compute the pseudo-inverse, we can assume it is computed via the QR factorization; the reasoning is the same in the end. We then have $A^+=R^{-1}Q^T$, but this time, with a bit more technical work, one gets $$ \|\mathrm{fl}(AA^{+}u)-\hat{Q}\hat{Q}^Tu\|_2= \|\mathrm{fl}(QRR^{-1}Q^Tu)-\hat{Q}\hat{Q}^Tu\|_2\leq c_4(m,n)\epsilon\kappa_2(A)\|u\|_2, $$ where $\kappa_2(A)=\|A\|_2\|A^+\|_2$ is the spectral condition number of $A$. Note that in finite precision $RR^{-1}\neq I$, and the error committed by this operation is proportional to the conditioning of $R$, which (even though the computed $R$ is not the exact R-factor) is approximately equal to that of $A$.
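The $RR^{-1}\neq I$ effect can be observed directly. The following NumPy sketch compares the residual $\|RR^{-1}-I\|$ for a well-conditioned and an ill-conditioned $A$; the construction of the test matrices via prescribed singular values is a hypothetical choice of mine, made only to control $\kappa_2(A)$:

```python
import numpy as np

def rr_inv_residual(A):
    """Norm of R @ inv(R) - I for the R-factor of A (exactly 0 in exact arithmetic)."""
    _, R = np.linalg.qr(A)
    n = R.shape[0]
    return np.linalg.norm(R @ np.linalg.inv(R) - np.eye(n))

rng = np.random.default_rng(1)
n = 200
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))

# Prescribed singular values: kappa_2 ~ 1 vs. kappa_2 ~ 1e10.
A_good = U @ V.T
A_bad  = U @ np.diag(np.logspace(0, -10, n)) @ V.T

print(rr_inv_residual(A_good))  # around machine precision
print(rr_inv_residual(A_bad))   # roughly eps * kappa_2(R), many orders larger
```

The second residual is larger than the first by many orders of magnitude, reflecting the $\epsilon\,\kappa_2$ factor in the bound above.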
To conclude, both methods you tried are bad in the sense that the error depends on the condition number of $A$. You cannot see much difference now because $A$ is random and hence quite well conditioned; the errors become much more visible when $A$ is ill-conditioned.
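For instance, repeating your experiment in NumPy with an ill-conditioned tall $A$ shows both pseudo-inverse routes losing many digits relative to the QR projection. Building $A$ from prescribed, geometrically decaying singular values is again an illustrative assumption of mine:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 300, 200
# Ill-conditioned tall A (kappa_2 ~ 1e12) via prescribed singular values.
U, _ = np.linalg.qr(rng.standard_normal((m, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = U @ np.diag(np.logspace(0, -12, n)) @ V.T
u = rng.standard_normal(m)

Q, _ = np.linalg.qr(A, mode='reduced')
z_qr = Q @ (Q.T @ u)                    # backward-stable reference Q Q^T u
z_LS = A @ (np.linalg.pinv(A) @ u)      # A A^+ u
z_LN = np.linalg.pinv(A.T) @ (A.T @ u)  # (A^T)^+ A^T u

# Both pinv-based projections now differ from the QR projection by far
# more than machine precision, consistent with the eps*kappa_2(A) bound.
print(np.linalg.norm(z_qr - z_LS) / np.linalg.norm(u))
print(np.linalg.norm(z_qr - z_LN) / np.linalg.norm(u))
```

With $\kappa_2(A)\approx 10^{12}$ the discrepancies are many orders of magnitude above $\epsilon$, whereas in your original experiment with a well-conditioned random $A$ they stayed near $10^{-13}$.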
EDIT: Although it might seem that the second approach gives somewhat more accurate results, this is not true in general. The following snippet eventually finds a counterexample: