For $m > n$ and a matrix $A \in \mathbb R^{m \times n}$ of rank $r \leq n$, we want to \begin{equation} \text{minimize } \| A\mathbf x - \mathbf b\| \end{equation} over all $\mathbf x \in \mathbb R^n$. This can be done using the SVD of $A$. Solving the above problem is equivalent to solving the normal equations \begin{equation} A^\top A\mathbf x = A^\top \mathbf b, \end{equation} which can again be done using an SVD, this time of $A^\top A$. That matrix is only $n \times n$, so it has fewer entries than $A$, since $n < m$.
My question is: why would one want to compute the SVD of $A$ instead of the SVD of $A^\top A$?
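As noted in the comment, the (2-norm) condition number of $A^\top A$ is the square of the condition number of $A$, so it is usually much larger. To see this, write $A = U\Sigma V^\top$; then $A^\top A = V\Sigma^\top\Sigma V^\top$, so the singular values of $A^\top A$ are the squares $\sigma_i^2$ of the singular values of $A$.
This squaring amplifies rounding errors, both in computing the SVD of $A^\top A$ and in inverting $A^\top A$ to solve the system $A^\top A\mathbf x = A^\top \mathbf b$.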
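To make the effect concrete, here is a minimal NumPy sketch. It is only an illustration under assumptions that are not part of the question: the sizes `m = 50`, `n = 5`, the random seed, and the chosen singular values (giving a condition number of $10^5$) are all made up, and $A$ is taken to have full column rank so that $A^\top A$ is invertible. The helper names `solve_via_svd` and `solve_via_normal_equations` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed, for reproducibility only
m, n = 50, 5                     # illustrative sizes with m > n

# Build A = U diag(s) V^T with prescribed singular values, so cond(A) = 1e5.
U, _ = np.linalg.qr(rng.standard_normal((m, n)))   # m x n, orthonormal columns
V, _ = np.linalg.qr(rng.standard_normal((n, n)))   # n x n, orthogonal
s = np.logspace(0, -5, n)                          # singular values 1 ... 1e-5
A = (U * s) @ V.T

x_true = np.ones(n)
b = A @ x_true                   # consistent right-hand side, so x_true is the minimizer


def solve_via_svd(A, b):
    """Least-squares solution from the thin SVD of A (assumes full column rank)."""
    U_, s_, Vt_ = np.linalg.svd(A, full_matrices=False)
    return Vt_.T @ ((U_.T @ b) / s_)


def solve_via_normal_equations(A, b):
    """Least-squares solution from A^T A x = A^T b (assumes A^T A is invertible)."""
    return np.linalg.solve(A.T @ A, A.T @ b)


cond_A = np.linalg.cond(A)           # 2-norm condition number of A
cond_AtA = np.linalg.cond(A.T @ A)   # approximately cond_A squared

err_svd = np.linalg.norm(solve_via_svd(A, b) - x_true)
err_normal = np.linalg.norm(solve_via_normal_equations(A, b) - x_true)

print(f"cond(A)     = {cond_A:.3e}")
print(f"cond(A^T A) = {cond_AtA:.3e}")
print(f"error via SVD of A         = {err_svd:.3e}")
print(f"error via normal equations = {err_normal:.3e}")
```

On a typical double-precision run the error of the normal-equations solution should be several orders of magnitude larger than the error obtained from the SVD of $A$, in line with the squared condition number. The exact figures depend on the platform and the underlying LAPACK build.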