$w = (X^TX)^{-1}X^Ty$
The equation above gives the weight vector $w$ that solves the quadratic minimization (least-squares) problem for a linear model.
From my understanding, the goal is to find $w$ such that:
$Xw \approx y$
Thus, I need to minimize this function:
$||Xw-y||^2$
According to Wikipedia, this function (the squared Euclidean norm of the residual) is minimized by differentiating it with respect to $w$, which as I understand it uses the multivariable chain rule, and setting the gradient to zero:
$2X^T(Xw - y) = 0$
Dividing by 2 and expanding gives:
$X^TXw-X^Ty=0$
$X^TXw=X^Ty$
$w=(X^TX)^{-1}(X^Ty)$
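As a sanity check on the steps above, here is a minimal NumPy sketch. The data are random and made up for illustration, and it assumes $X$ has full column rank so that $X^TX$ is invertible. It solves $X^TXw=X^Ty$ directly, compares the result with NumPy's own least-squares routine, and evaluates the gradient $2X^T(Xw-y)$ at the solution.

```python
import numpy as np

# Made-up data for illustration: 20 observations, 3 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = rng.normal(size=20)

# Solve the normal equations X^T X w = X^T y.
# np.linalg.solve avoids forming the explicit inverse (X^T X)^{-1}.
w = np.linalg.solve(X.T @ X, X.T @ y)

# Reference solution from NumPy's least-squares routine.
w_ref, *_ = np.linalg.lstsq(X, y, rcond=None)

# Gradient of ||Xw - y||^2 evaluated at w; it should vanish up to rounding.
grad = 2 * X.T @ (X @ w - y)

print("matches lstsq:", np.allclose(w, w_ref))
print("gradient is ~0:", np.allclose(grad, 0))
```

Using `np.linalg.solve` rather than `np.linalg.inv` is only a numerical-stability choice; mathematically it is the same formula.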
Somehow, by using the Euclidean norm, I minimized the function, but I don't understand exactly how this works.
Why is the Euclidean norm used to solve quadratic minimization problems for linear functions?
The best way to explain this, in my opinion, is by projection.
Since $Xw = y$ generally has no exact solution, we instead solve $Xw = \bar y$, where $\bar y$ is the orthogonal projection of $y$ onto $Col(X)$.
The error is $e=y-\bar y=y-Xw$, and it is minimized when $e$ is orthogonal to $Col(X)$, that is,
$$X^Te=X^T(y-Xw)=0\implies X^Ty=X^TXw\implies w=(X^TX)^{-1}X^Ty$$
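The projection picture can also be checked numerically. The sketch below uses made-up random data and assumes $X$ has full column rank. It verifies that the residual $e$ is orthogonal to every column of $X$, and that moving $w$ in any of 100 random directions never reduces the squared error, which is what the Pythagorean identity $\|y-X(w+d)\|^2=\|e\|^2+\|Xd\|^2$ predicts.

```python
import numpy as np

# Made-up data for illustration: 15 observations, 2 features.
rng = np.random.default_rng(1)
X = rng.normal(size=(15, 2))
y = rng.normal(size=15)

# Least-squares weights from the normal equations.
w = np.linalg.solve(X.T @ X, X.T @ y)

y_bar = X @ w      # projection of y onto Col(X)
e = y - y_bar      # residual (error) vector

# The residual is orthogonal to every column of X: X^T e = 0.
orthogonal = np.allclose(X.T @ e, 0)

# Perturbing w by a random vector d never gives a smaller squared error.
base_error = np.sum(e**2)
never_better = all(
    np.sum((y - X @ (w + rng.normal(size=2)))**2) >= base_error
    for _ in range(100)
)

print("residual orthogonal to Col(X):", orthogonal)
print("no perturbation does better:", never_better)
```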