The original idea of least squares is to minimize the squared error $$\min_x \frac{1}{2}\|Ax-b\|^2$$
Then we look for the extremum by setting the derivative with respect to $x$ to zero: $$A^T(Ax-b)=0$$ $$A^TAx=A^Tb$$ $$x=(A^TA)^{-1}A^Tb$$
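For completeness, the differentiation step can be spelled out as follows. This is only a sketch, assuming the error is measured in the Euclidean norm and that $A$ has linearly independent columns, so that $A^TA$ is invertible: $$f(x)=\frac{1}{2}\|Ax-b\|^2=\frac{1}{2}x^TA^TAx-b^TAx+\frac{1}{2}b^Tb$$ $$\nabla f(x)=A^TAx-A^Tb=A^T(Ax-b)$$
Setting $\nabla f(x)=0$ gives the normal equations $A^TAx=A^Tb$.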
This is the classical interpretation of least squares, with a little picture I drew myself:

Furthermore, there is another interpretation, from Prof. Strang's lecture, in which we look for the projection $p$ of $b$ onto the column space of $A$ and use it in place of $b$: $$p=Ax=A(A^TA)^{-1}A^Tb$$
with the projection matrix: $$A(A^TA)^{-1}A^T$$
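To make the projection concrete, here is a minimal NumPy sketch. The matrix `A` and vector `b` are illustrative values chosen for this example (fitting a line through the points $(1,1)$, $(2,2)$, $(3,2)$), not data from the question. It checks that the matrix above behaves like a projection and that the error $b-p$ is orthogonal to the columns of $A$.

```python
import numpy as np

# Illustrative data (an assumption for this sketch): fit a line C + D*t
# through the points (1, 1), (2, 2), (3, 2).
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])

# Projection matrix onto the column space of A.
# Requires A to have linearly independent columns so that A^T A is invertible.
P = A @ np.linalg.inv(A.T @ A) @ A.T

p = P @ b   # projection of b onto the column space of A
e = b - p   # error vector

print(np.allclose(P @ P, P))     # projecting twice changes nothing
print(np.allclose(P, P.T))       # P is symmetric
print(np.allclose(A.T @ e, 0))   # e is orthogonal to every column of A
```

The last check is the normal equation $A^T(Ax-b)=0$ read geometrically: the error is perpendicular to the column space.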
Then we get another interpretation, with the picture:
These are two different interpretations of least squares, yet they lead to the same formula. So I am confused. Why does this happen? Which interpretation is right, and why?
Both interpretations are correct.
The only thing that is NOT correct is your assumption that the column space of $A$ must be the straight line in your first picture.
EDIT. They cannot give different values of $x$, because $x$ is the same in both interpretations.
First one: $x=(A^TA)^{-1}A^Tb$.
Second one: $x$ is the solution of the linear system $Ax = b$ after you replace $b$ by $p=Ax=A(A^TA)^{-1}A^Tb$.
Ok, do it: $Ax = p =A(A^TA)^{-1}A^Tb $.
Isn't $x = (A^TA)^{-1}A^Tb$ a solution of this system?
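A quick numerical check of this argument, as a sketch with made-up data (the `A` and `b` below are assumptions for illustration): compute $x$ from the normal equations, then separately solve $Ax=p$, and compare.

```python
import numpy as np

# Illustrative data (an assumption for this sketch).
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])

# First interpretation: minimize the squared error via the normal equations.
x_min = np.linalg.solve(A.T @ A, A.T @ b)

# Second interpretation: project b onto the column space, then solve A x = p.
p = A @ np.linalg.inv(A.T @ A) @ A.T @ b
x_proj, _, _, _ = np.linalg.lstsq(A, p, rcond=None)

print(x_min, x_proj)                 # same vector from both routes
print(np.allclose(A @ x_proj, p))    # A x = p is solved exactly
print(np.allclose(A @ x_min, b))     # A x = b itself has no exact solution here
```

Since $p$ lies in the column space of $A$, the system $Ax=p$ is consistent, and with independent columns its unique solution is the same $x$ as in the first interpretation.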