Consider an attempt to find the line $f(t)=C+Gt+Ht$ that best approximates a set of points using least squares.
This is a contrived example meant to illustrate exactly what goes wrong when the matrix $A$ in a linear system $Ax=b$ has linearly dependent columns.
Let's say the points are $(0,1),(1,5),(2,1),(3,1),(4,7)$. Using the proposed line, we have the following system of equations
$$C=1$$ $$C+G+H=5$$ $$C+2G+2H=1$$ $$C+3G+3H=1$$ $$C+4G+4H=7$$
which in matrix form is $Ax=b$
$$\begin{bmatrix} 1&0&0\\ 1&1&1\\ 1&2&2\\ 1&3&3\\ 1&4&4\\ \end{bmatrix}\cdot \begin{bmatrix} C\\ G\\ H\end{bmatrix}=\begin{bmatrix} 1\\ 5\\ 1\\ 1\\ 7\end{bmatrix}$$
This system does not have a solution. We can see this by row reducing the augmented system to
$$\begin{bmatrix} 1&0&0\\ 0&1&1\\ 0&0&0\\ 0&0&0\\ 0&0&0\\ \end{bmatrix}\cdot \begin{bmatrix} C\\ G\\ H\end{bmatrix}=\begin{bmatrix} 1\\ 4\\ -8\\ -12\\ -10\end{bmatrix}$$
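As a quick sanity check (a sketch assuming NumPy; `A` and `b` are the matrix and right-hand side above), we can confirm the inconsistency numerically: $A$ has rank 2, while appending $b$ as an extra column raises the rank to 3, so $b$ does not lie in the column space of $A$.

```python
import numpy as np

A = np.array([[1, 0, 0],
              [1, 1, 1],
              [1, 2, 2],
              [1, 3, 3],
              [1, 4, 4]], dtype=float)
b = np.array([1, 5, 1, 1, 7], dtype=float)

# Rank of A is 2: the second and third columns are identical.
print(np.linalg.matrix_rank(A))                      # 2

# Rank of the augmented matrix [A | b] is 3, so Ax = b is inconsistent.
print(np.linalg.matrix_rank(np.column_stack([A, b])))  # 3
```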
Finding the least squares solution involves solving the system
$$A^TAx=A^Tb$$
We would need to invert $A^TA$ at this point, but we know that since the rank of $A$ is 2, the rank of $A^TA$ can't be larger than 2. Thus, $A^TA$, which is a 3 by 3 matrix, isn't invertible.
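The singularity of $A^TA$ is easy to verify numerically as well (again a NumPy sketch, same `A` as above): its second and third rows and columns coincide, so its rank is 2 and it has no inverse.

```python
import numpy as np

A = np.array([[1, 0, 0],
              [1, 1, 1],
              [1, 2, 2],
              [1, 3, 3],
              [1, 4, 4]], dtype=float)

AtA = A.T @ A   # [[ 5, 10, 10], [10, 30, 30], [10, 30, 30]]

# Rank 2 for a 3x3 matrix means A^T A is singular.
print(np.linalg.matrix_rank(AtA))   # 2
```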
If we think about this problem geometrically, what is it that impedes the projection of $b$ onto the column space of $A$? That is, geometrically, why do we need to start with an $A$ that has independent columns?
We certainly do not need to start with a matrix $A$ whose columns are linearly independent. It just makes it easy to write down a formula for the unique solution. However, of course we do not need independent columns in order for the projection of the vector $b$ onto the column space of $A$ to exist (and be uniquely defined).
First, $A^\top Ax = A^\top b$ is consistent, independent of the rank of $A$. Second, any solution will be of the form $x=x_0+u$ for any particular solution $x_0$ and any $u\in N(A^\top A) = N(A)$. If we choose $x_0\in R(A^\top A)$,$^*$ then $x_0$ will be the shortest possible solution. But it's quite irrelevant: The projection of $b$ onto $C(A)$ is the vector $Ax$ for any solution $x$ of the normal equations. Indeed, $A(x_0+u) = Ax_0 + Au = Ax_0$, since $u\in N(A)$.
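To see this numerically (a sketch assuming NumPy; here `u = (0, 1, -1)` is one obvious null-space vector, since the second and third columns of $A$ are equal): the minimum-norm solution produced by `np.linalg.pinv` satisfies the normal equations, and shifting it by any multiple of `u` leaves the projection $Ax$ unchanged.

```python
import numpy as np

A = np.array([[1, 0, 0],
              [1, 1, 1],
              [1, 2, 2],
              [1, 3, 3],
              [1, 4, 4]], dtype=float)
b = np.array([1, 5, 1, 1, 7], dtype=float)

# Minimum-norm least-squares solution (lies in the row space of A).
x0 = np.linalg.pinv(A) @ b

# The normal equations are consistent even though A^T A is singular.
print(np.allclose(A.T @ A @ x0, A.T @ b))     # True

# u spans N(A): the second and third columns of A are equal.
u = np.array([0.0, 1.0, -1.0])

# Every solution x0 + c*u yields the same projection A x.
print(np.allclose(A @ (x0 + 5 * u), A @ x0))  # True
```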
If you're interested, go investigate the pseudoinverse. It has a number of applications in numerical computation, and is easily computable from the SVD (singular value decomposition). But it's really not needed here.
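For the curious, here is a minimal sketch of building the pseudoinverse from the SVD by hand (assuming NumPy; the tolerance used to decide which singular values count as zero is one common convention, not the only choice), checked against `np.linalg.pinv`:

```python
import numpy as np

A = np.array([[1, 0, 0],
              [1, 1, 1],
              [1, 2, 2],
              [1, 3, 3],
              [1, 4, 4]], dtype=float)

# Thin SVD: A = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Invert only the singular values that are numerically nonzero.
tol = max(A.shape) * np.finfo(float).eps * s.max()
s_inv = np.where(s > tol, 1.0 / s, 0.0)

# Pseudoinverse: A^+ = V @ diag(s^+) @ U^T
A_pinv = Vt.T @ np.diag(s_inv) @ U.T

print(np.allclose(A_pinv, np.linalg.pinv(A)))   # True
```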
$^*$ $R(B)$ denotes the row space of $B$. This is consistent with Strang's texts (and with my own), but for many people it signifies the column space. Isn't notation wonderful?