I have a machine learning model, and I want to find the critical point at which the cost function is minimized. So I took its derivative with respect to $w$ and set it equal to zero. Now I want to simplify the result and solve for the optimal $w$.
This is what I've done so far.
The equation is: $$\frac{\partial L}{\partial w} = -\frac{2}{n}\sum_{i=1}^n x_i (y_i - x_i^T w) = 0$$
Solving step by step like this:
Multiply both sides by $-\frac{n}{2}$ to clear the constant factor:
$$\sum_{i=1}^n x_i (y_i - x_i^T w) = 0$$
$$\sum_{i=1}^n x_i y_i - \sum_{i=1}^n x_i x_i^T w= 0$$
$$\sum_{i=1}^n x_i y_i = \sum_{i=1}^n x_i x_i^T w$$
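(To double-check myself: if I stack the rows $x_i^T$ into a matrix $X \in \mathbb{R}^{n \times d}$ and the targets into a vector $y \in \mathbb{R}^n$ — my own notation, not part of the original derivation — the last line is the same as the usual matrix form:)
$$X^T y = X^T X \, w$$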
Transposing both sides turns $w$ into a row vector $w^T$, which preserves the orientation; since each $x_i x_i^T$ is symmetric and $y_i$ is a scalar, this gives:
$$\sum_{i=1}^n y_i x_i^T = w^T\left(\sum_{i=1}^n x_i x_i^T\right)$$
And this is where I stopped. How do I simplify further in order to isolate $w$? Please also explain the steps you provide, so that I can understand and put the pieces together.
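In case it helps, here is a small NumPy sketch (variable names are my own) that builds the sums above on synthetic data and checks that solving $\left(\sum_i x_i x_i^T\right) w = \sum_i x_i y_i$ recovers the true weights:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 3
X = rng.normal(size=(n, d))            # row i is x_i^T
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true                          # noiseless targets y_i = x_i^T w_true

# Build the two sums exactly as they appear in the derivation.
A = sum(np.outer(x, x) for x in X)      # sum_i x_i x_i^T, a d x d matrix
b = sum(x * yi for x, yi in zip(X, y))  # sum_i x_i y_i, a d-vector

# Solve A w = b directly instead of forming the inverse explicitly.
w_hat = np.linalg.solve(A, b)
print(w_hat)                            # should match w_true
```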
EDIT:
I multiplied both sides on the right by $\left(\sum_{i=1}^n x_i x_i^T\right)^{-1}$ (assuming that matrix is invertible), so we get $$\left(\sum_{i=1}^n y_i x_i^T\right)\left(\sum_{i=1}^n x_i x_i^T\right)^{-1} = w^T\left(\sum_{i=1}^n x_i x_i^T\right)\left(\sum_{i=1}^n x_i x_i^T\right)^{-1}$$
So now how do I get $w$? Thanks!