So I'm trying to prove the following:
Show that a unique solution for linear regression exists iff the features are not linearly dependent. Namely, show that a unique solution $\hat{\theta} = (X^{T}X)^{-1}X^{T}y$ exists iff $X$ has full column rank.
What I did was:
Theorem: For every matrix $A\in \mathbb{R}^{n\times m}$ we have $Null(A) = Null(A^{T}A)$ and $Rank(A) = Rank(A^{T}A)$.
Then I said: if $(X^{T}X)^{-1}X^{T}y$ is the unique solution $\rightarrow$ $X^{T}X$ is invertible $\rightarrow$ $Null(X^{T}X)=\{0\}$ $\rightarrow$ $Null(X) = \{0\}$ $\rightarrow$ $X$ has full column rank.
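As a quick numerical sanity check of the theorem I'm relying on (an illustration, not a proof), one can verify with NumPy that $Rank(A) = Rank(A^{T}A)$ even for a rank-deficient matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
# Build a 6x4 matrix of rank 2 by multiplying two thin random factors
A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 4))

rank_A = np.linalg.matrix_rank(A)
rank_AtA = np.linalg.matrix_rank(A.T @ A)
print(rank_A, rank_AtA)  # both are 2
```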
But I'm not sure this is the right way; also, the opposite direction seems identical (just reversed) to the one I wrote.
Would appreciate some help
Edit
So I derived the following normal equation by differentiating the MSE loss:
$(X^{T}X)\theta = X^{T}y$
Here $\theta$ is the parameter I differentiated with respect to, and $X, y$ are the training set.
So I now want to prove that the $\theta$ minimizing the loss is unique given that $X$ has full column rank.
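The derivation above can be checked numerically: assuming $X$ has full column rank, solving the normal equation $(X^{T}X)\theta = X^{T}y$ directly should recover the same $\theta$ as a generic least-squares solver (a sketch; the variable names are mine):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((20, 3))   # tall random matrix: full column rank almost surely
y = rng.standard_normal(20)

# Solve the normal equation (X^T X) theta = X^T y
theta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Compare with NumPy's least-squares solver
theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(theta_normal, theta_lstsq))  # True
```

(In practice one prefers `lstsq` or a QR factorization over forming $X^{T}X$, which squares the condition number, but for the proof the normal equation is the right object.)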
You have the right basic ideas, but the way you write things doesn't quite make sense. I would argue that the following statements are all equivalent:

- The normal equation $(X^{T}X)\theta = X^{T}y$ has the unique solution $\theta = (X^{T}X)^{-1}X^{T}y$.
- $X^{T}X$ is invertible.
- $Null(X^{T}X) = \{0\}$.
- $Null(X) = \{0\}$.
- $X$ has full column rank.

Indeed, each of these steps holds in both the forward and backward directions, so the chain gives you both implications of the "iff" at once.
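To see what fails without full column rank: the normal equation still has solutions, but infinitely many of them. A sketch with a hypothetical duplicated-feature design matrix, where shifting along $Null(X)$ produces a second minimizer with exactly the same loss:

```python
import numpy as np

rng = np.random.default_rng(2)
col = rng.standard_normal((10, 1))
X = np.hstack([col, col])          # two identical columns: rank 1, not full column rank
y = rng.standard_normal(10)

# X^T X is singular, so (X^T X)^{-1} does not exist
assert np.linalg.matrix_rank(X.T @ X) < X.shape[1]

# One minimizer from the pseudoinverse, another shifted along Null(X)
theta1 = np.linalg.pinv(X) @ y
theta2 = theta1 + np.array([1.0, -1.0])   # [1, -1] is in Null(X) since the columns are equal

# Both achieve the same (minimal) loss
loss1 = np.linalg.norm(X @ theta1 - y)
loss2 = np.linalg.norm(X @ theta2 - y)
print(np.isclose(loss1, loss2))  # True
```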