In class we learned how to derive the line that best approximates a given set of points.
There are different ways to approach this. We approached it algebraically, using matrices, and I'm having trouble with one step of the derivation.
We are given points $\{\mathbf{x}_1, \dots, \mathbf{x}_n\} \subset \mathbb{R}^{m}$.
$$ \tag 1 (\mathbf{u},\boldsymbol{\mu}) \leftarrow \underset{\mathbf{u},\boldsymbol{\mu}}{\arg\min} \left[ \frac{1}{n} \sum_{i=1}^{n} \left\| \underbrace{ \boldsymbol{\mu} + \langle \mathbf{x}_i - \boldsymbol{\mu}, \mathbf{u} \rangle \mathbf{u}}_{\widehat{\mathbf{x}}_i} - \mathbf{x}_i \right\|^2 \right] $$ $$ \tag 2 = \underset{\mathbf{u},\boldsymbol{\mu}}{\arg\min} \left[ \frac{1}{n} \sum_{i=1}^{n} \left\| \left( \mathbf{I} - \mathbf{u} \mathbf{u}^{T} \right) (\mathbf{x}_i - \boldsymbol{\mu}) \right\|^2 \right] $$
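Here $\boldsymbol{\mu}$ (a point on the line) and $\mathbf{u}$ (the line's direction, a unit vector with $\|\mathbf{u}\| = 1$) are the variables describing the line, and $\widehat{\mathbf{x}}_i$ is the orthogonal projection of $\mathbf{x}_i$ onto the line.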
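A minimal pure-Python sketch of the projection in $(1)$, assuming $\|\mathbf{u}\| = 1$: it computes $\widehat{\mathbf{x}} = \boldsymbol{\mu} + \langle \mathbf{x} - \boldsymbol{\mu}, \mathbf{u} \rangle \mathbf{u}$ for one random point and checks that the residual $\widehat{\mathbf{x}} - \mathbf{x}$ is orthogonal to $\mathbf{u}$, which is what makes $\widehat{\mathbf{x}}$ an orthogonal projection. The helper names `dot` and `project_onto_line` are hypothetical, chosen only for illustration.

```python
import math
import random

def dot(a, b):
    """Standard inner product <a, b> of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))

def project_onto_line(x, mu, u):
    """Return mu + <x - mu, u> u (assumes u has unit length)."""
    t = dot([xi - mi for xi, mi in zip(x, mu)], u)
    return [mi + t * ui for mi, ui in zip(mu, u)]

random.seed(0)
m = 3
mu = [random.gauss(0, 1) for _ in range(m)]
direction = [random.gauss(0, 1) for _ in range(m)]
length = math.sqrt(dot(direction, direction))
u = [di / length for di in direction]  # normalize so that ||u|| = 1
x = [random.gauss(0, 1) for _ in range(m)]

x_hat = project_onto_line(x, mu, u)
residual = [a - b for a, b in zip(x_hat, x)]  # x_hat - x
print(dot(residual, u))  # zero up to floating-point rounding
```

The orthogonality follows from $\langle \widehat{\mathbf{x}} - \mathbf{x}, \mathbf{u} \rangle = t\,\langle \mathbf{u}, \mathbf{u} \rangle - \langle \mathbf{x} - \boldsymbol{\mu}, \mathbf{u} \rangle = t - t = 0$ with $t = \langle \mathbf{x} - \boldsymbol{\mu}, \mathbf{u} \rangle$, which is exactly where $\|\mathbf{u}\| = 1$ is used.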
So we are looking for the line that minimizes the mean squared distance to all the points. What I don't understand is how to get from $(1)$ to $(2)$.
The hint is to use the identity $\langle \mathbf{v}, \mathbf{u} \rangle \mathbf{u} = (\mathbf{u}\mathbf{u}^{T})\mathbf{v}$.
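The hint holds for any vector $\mathbf{u}$, not only unit ones: the $k$-th component of $(\mathbf{u}\mathbf{u}^T)\mathbf{v}$ is $u_k (\mathbf{u}^T \mathbf{v}) = \langle \mathbf{v}, \mathbf{u} \rangle u_k$. As a quick numerical sanity check, here is a small pure-Python sketch that builds $\mathbf{u}\mathbf{u}^T$ explicitly and compares both sides for random vectors; `outer` and `matvec` are illustrative helpers, not library functions.

```python
import random

def dot(a, b):
    """Standard inner product <a, b>."""
    return sum(x * y for x, y in zip(a, b))

def outer(a, b):
    """Outer product a b^T as a list of rows."""
    return [[ai * bj for bj in b] for ai in a]

def matvec(A, v):
    """Matrix-vector product A v, with A given as a list of rows."""
    return [dot(row, v) for row in A]

random.seed(1)
m = 4
u = [random.gauss(0, 1) for _ in range(m)]  # deliberately NOT normalized
v = [random.gauss(0, 1) for _ in range(m)]

lhs = [dot(v, u) * ui for ui in u]   # <v, u> u
rhs = matvec(outer(u, u), v)         # (u u^T) v
print(max(abs(a - b) for a, b in zip(lhs, rhs)))  # zero up to rounding
```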
Help is appreciated.
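Apply the hint with $\mathbf{v} = \mathbf{x}_i - \boldsymbol{\mu}$ to the vector inside the norm:
$$ \boldsymbol{\mu} + \langle \mathbf{x}_i - \boldsymbol{\mu}, \mathbf{u} \rangle \mathbf{u} - \mathbf{x}_i = \langle \mathbf{x}_i - \boldsymbol{\mu}, \mathbf{u} \rangle \mathbf{u} - (\mathbf{x}_i-\boldsymbol{\mu})= (\mathbf{u}\mathbf{u}^T)(\mathbf{x}_i - \boldsymbol{\mu}) - \mathbf{I} (\mathbf{x}_i-\boldsymbol{\mu}) =(\mathbf{u}\mathbf{u}^T-\mathbf{I})(\mathbf{x}_i - \boldsymbol{\mu}). $$
Since $\|-\mathbf{w}\| = \|\mathbf{w}\|$ for any vector $\mathbf{w}$, we get $\left\|(\mathbf{u}\mathbf{u}^T-\mathbf{I})(\mathbf{x}_i - \boldsymbol{\mu})\right\|^2 = \left\|(\mathbf{I}-\mathbf{u}\mathbf{u}^T)(\mathbf{x}_i - \boldsymbol{\mu})\right\|^2$, and averaging over $i$ turns the objective in $(1)$ into the one in $(2)$.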
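To double-check the algebra numerically, here is a small pure-Python sketch (no NumPy, to stay dependency-free) that evaluates the bracketed objectives of $(1)$ and $(2)$ for random points, a random $\boldsymbol{\mu}$ and a random unit $\mathbf{u}$. The function names `objective_1` and `objective_2` are hypothetical. The derivation never uses $\|\mathbf{u}\| = 1$, so the two values would agree for a non-unit $\mathbf{u}$ as well; unit length only matters for reading $\widehat{\mathbf{x}}_i$ as a projection.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def sq_norm(a):
    return dot(a, a)

def objective_1(points, mu, u):
    """Mean of ||x_hat_i - x_i||^2 with x_hat_i = mu + <x_i - mu, u> u, as in (1)."""
    total = 0.0
    for x in points:
        t = dot(sub(x, mu), u)
        x_hat = [mi + t * ui for mi, ui in zip(mu, u)]
        total += sq_norm(sub(x_hat, x))
    return total / len(points)

def objective_2(points, mu, u):
    """Mean of ||(I - u u^T)(x_i - mu)||^2, as in (2)."""
    m = len(mu)
    # Build P = I - u u^T explicitly as a list of rows.
    P = [[(1.0 if r == c else 0.0) - u[r] * u[c] for c in range(m)] for r in range(m)]
    total = 0.0
    for x in points:
        d = sub(x, mu)
        total += sq_norm([dot(row, d) for row in P])
    return total / len(points)

random.seed(2)
n, m = 10, 3
points = [[random.gauss(0, 1) for _ in range(m)] for _ in range(n)]
mu = [random.gauss(0, 1) for _ in range(m)]
direction = [random.gauss(0, 1) for _ in range(m)]
u = [di / math.sqrt(sq_norm(direction)) for di in direction]  # unit direction

f1 = objective_1(points, mu, u)
f2 = objective_2(points, mu, u)
print(f1, f2)  # the two values agree up to floating-point rounding
```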