To solve $Ax=b$, the GMRES method finds $x_n$ in the $n$-th Krylov subspace
$$K_n=\text{span}\{b,Ab,\dots,A^{n-1}b\}$$
subject to the condition that $\|r_n\|_2$ is minimized, where $r_n=b-Ax_n$.
Now suppose we replace that condition with a new one: $r_n \perp K_n$.
How do we obtain a least-squares problem like the one in GMRES?
Well, I met this problem in a test a few days ago.

I think there must be something wrong with this problem:
In the 3rd line it says $r_m \perp AK_m$. How could this be? Since $r_m$ is actually in $K_{m+1}$, wouldn't that imply $r_m\parallel b$?
Note that if $x_m\in x_0+K_m$, then $r_m:=b-Ax_m\in r_0+AK_m\subset K_{m+1}$. The GMRES method defines the iterate $x_m$ such that
$$ \|r_m\|_2=\|b-Ax_m\|_2=\min_{y\in x_0+K_m}\|b-Ay\|_2=\min_{c\in K_m}\|r_0-Ac\|_2 =\min_{d\in AK_m}\|r_0-d\|_2. $$

It is well known that an optimization over a subspace is equivalent to a certain orthogonality of the error. The GMRES residual is given by $r_m=r_0-d_m$, so that
$$ \|r_m\|_2=\|r_0-d_m\|_2=\min_{d\in AK_m}\|r_0-d\|_2, $$
which is equivalent to
$$ r_m=r_0-d_m\perp AK_m. $$
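The derivation above can be checked numerically. Below is a minimal sketch (not a production solver): $m$ Arnoldi steps with $x_0=0$, the small least-squares problem solved with `numpy.linalg.lstsq`, and a check that the resulting residual is orthogonal to $AK_m$. The function name `gmres_step` and the test matrix are my own choices for illustration.

```python
import numpy as np

def gmres_step(A, b, m):
    """m Arnoldi steps (x0 = 0, so r0 = b), then the GMRES iterate x_m.

    Returns the minimizer of ||b - A y||_2 over y in
    K_m = span{b, Ab, ..., A^{m-1} b}.
    """
    n = len(b)
    Q = np.zeros((n, m + 1))          # orthonormal basis of K_{m+1}
    H = np.zeros((m + 1, m))          # (m+1) x m upper Hessenberg matrix
    beta = np.linalg.norm(b)
    Q[:, 0] = b / beta
    for j in range(m):
        w = A @ Q[:, j]
        for i in range(j + 1):        # modified Gram-Schmidt
            H[i, j] = Q[:, i] @ w
            w -= H[i, j] * Q[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        Q[:, j + 1] = w / H[j + 1, j]
    # Small least-squares problem: min || beta*e1 - H y ||_2, x_m = Q_m y
    e1 = np.zeros(m + 1)
    e1[0] = beta
    y, *_ = np.linalg.lstsq(H, e1, rcond=None)
    return Q[:, :m] @ y

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 8)) + 8 * np.eye(8)   # well-conditioned test matrix
b = rng.standard_normal(8)
x4 = gmres_step(A, b, 4)
r4 = b - A @ x4
# r_m is orthogonal to A K_m: test against the (unnormalized) basis of K_m
K = np.column_stack([np.linalg.matrix_power(A, k) @ b for k in range(4)])
print(np.max(np.abs(r4 @ (A @ K))))   # ~ 0 up to rounding
```

Since $K_4\subset K_6$, the minimization also guarantees that the residual norm cannot increase with $m$.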
You might replace the condition $r_m\perp AK_m$ (equivalent to minimizing the 2-norm of the residual and leading to GMRES) by $r_m\perp K_m$. This defines another method, usually called FOM, which, if $A$ is SPD, reduces to the conjugate gradient (CG) method, which minimizes the $A$-norm of the error $x-x_m$ (so FOM can be considered as "CG for nonsymmetric systems").
Indeed, if $A$ is SPD, we have that
$$ r_0-d_m\perp K_m \quad \Leftrightarrow \quad A^{-1/2}r_m=A^{-1/2}r_0-f_m\perp A^{1/2}K_m, $$
where $f_m:=A^{-1/2}d_m\in A^{1/2}K_m$ since $d_m\in AK_m$. So $A^{-1/2}r_m$ is $A^{-1/2}r_0$ minus an element of $A^{1/2}K_m$, and this difference is required to be orthogonal to $A^{1/2}K_m$ as well. This is hence equivalent to
$$ \|A^{-1/2}r_m\|_2=\|A^{-1/2}r_0-f_m\|_2=\min_{f\in A^{1/2}K_m}\|A^{-1/2}r_0-f\|_2. $$

Hence FOM (i.e., CG when $A$ is SPD) minimizes the 2-norm of $A^{-1/2}r_m$. Note that if $e_m:=x-x_m$, we have
$$ \|A^{-1/2}r_m\|_2=\|r_m\|_{A^{-1}}=\|A^{1/2}e_m\|_2=\|e_m\|_A. $$
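The identity $\|r_m\|_{A^{-1}}=\|e_m\|_A$ can also be observed numerically. A small sketch with a hand-rolled plain CG iteration (the SPD test matrix and the number of steps are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((10, 10))
A = M @ M.T + 10 * np.eye(10)        # SPD test matrix
b = rng.standard_normal(10)
x_true = np.linalg.solve(A, b)

# A few steps of plain conjugate gradients (x0 = 0, so r0 = b)
x = np.zeros(10)
r = b.copy()
p = r.copy()
for _ in range(3):
    Ap = A @ p
    alpha = (r @ r) / (p @ Ap)
    x += alpha * p
    r_new = r - alpha * Ap
    beta = (r_new @ r_new) / (r @ r)
    p = r_new + beta * p
    r = r_new

e = x_true - x
lhs = np.sqrt(r @ np.linalg.solve(A, r))   # ||r_m||_{A^{-1}}
rhs = np.sqrt(e @ (A @ e))                 # ||e_m||_A
print(lhs, rhs)                            # the two numbers agree up to rounding
```

The CG residual also satisfies the FOM orthogonality condition $r_m\perp K_m$; in particular $r_m\perp r_0=b$.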
Note that in GMRES, the approximate solution is obtained from a small least-squares problem with an $(m+1)\times m$ upper Hessenberg matrix computed by the Arnoldi algorithm, whereas the FOM approximation is determined by solving a linear system with an $m\times m$ (again upper Hessenberg) matrix. There is no real difference in solving the least-squares problem and the square system (except that the latter might occasionally be singular), since the upper Hessenberg matrices are reduced to triangular form "on the fly" at essentially the same cost in both methods.
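To make the last point concrete, here is a sketch of the small solves in isolation, starting from a random $(m+1)\times m$ upper Hessenberg matrix standing in for the Arnoldi output (the matrix and right-hand side are made up for illustration). One sweep of Givens rotations reduces $H$ to triangular form; the GMRES solution then comes from a back-substitution, with the least-squares residual norm available for free as the last entry of the rotated right-hand side, while FOM solves the square top block directly.

```python
import numpy as np

rng = np.random.default_rng(3)
m = 5
# Stand-in for m Arnoldi steps: an (m+1) x m upper Hessenberg matrix
# and the right-hand side beta * e1
H = np.triu(rng.standard_normal((m + 1, m)), k=-1)
g = np.zeros(m + 1)
g[0] = 2.0

# Reduce H to upper triangular form with Givens rotations,
# applying the same rotations to the right-hand side
R = H.copy()
t = g.copy()
for j in range(m):
    a, b_ = R[j, j], R[j + 1, j]
    rho = np.hypot(a, b_)
    c, s = a / rho, b_ / rho
    G = np.array([[c, s], [-s, c]])     # zeroes out R[j+1, j]
    R[j:j + 2, j:] = G @ R[j:j + 2, j:]
    t[j:j + 2] = G @ t[j:j + 2]

# GMRES: back-substitution on the triangular top block of R
y_gmres = np.linalg.solve(R[:m, :m], t[:m])
print(np.linalg.norm(g - H @ y_gmres))  # equals |t[m]| up to rounding

# FOM: solve the square m x m Hessenberg system directly
y_fom = np.linalg.solve(H[:m, :m], g[:m])
```

Both variants cost $O(m^2)$ on top of the Arnoldi process, which is why the choice between GMRES and FOM is not driven by the small solve.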