For an $m\times n$ matrix $A, m>n,$ how do I show or know that we can't directly apply the LU decomposition to solve the least squares problem?
I've seen another post that discusses something sort of surrounding it, but doesn't quite reach what I'm wondering. My intuition is that using the LU decomposition to find the LS solution is ill-conditioned, since we use the QR decomposition instead of the normal equations because despite the cost of the QR algorithm the normal equations are far worse conditioned. Am I in the right direction?
I don't think you can guarantee an LU decomposition exists without forming the normal equations (see the section for general matrices under Existence and Uniqueness on https://en.m.wikipedia.org/wiki/LU_decomposition). If you do form the normal equations, the LU decomposition can be accomplished as a Cholesky decomposition, since $A^TA$ is symmetric positive definite. However, $\kappa(A^TA) = \kappa(A)^2$, so your new system can have a much worse condition number.
You can guarantee that a reduced QR factorization for A exists, foregoing the need to form the normal equations. That's the benefit.