The following identity arises in the Classical Normal Linear Regression Model, i.e., $\boldsymbol{y} = \boldsymbol{X}\boldsymbol{\beta}+ \boldsymbol{u}$, where $\boldsymbol{u}$ is an $n \times 1$ matrix and $u_i \overset{iid}{\sim} N(0, \sigma^2)$ for $i = 1, 2, \cdots, n$.
Show that $(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta})'(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta}) = (\boldsymbol{y}-\boldsymbol{X}\boldsymbol{b})'(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{b})+(\boldsymbol{\beta}-\boldsymbol{b})'\boldsymbol{X}'\boldsymbol{X}(\boldsymbol{\beta}-\boldsymbol{b})$
where:
$\boldsymbol{y}$ is an $n \times 1$ matrix
$\boldsymbol{X}$ is an $n \times k$ matrix
$\boldsymbol{\beta}$ is a $k \times 1$ matrix
$\boldsymbol{b}$ is a $k \times 1$ matrix
$rank(\boldsymbol{X}) = k$
$\boldsymbol{b} = (\boldsymbol{X}'\boldsymbol{X})^{-1}\boldsymbol{X}'\boldsymbol{y}$
I can prove this identity by going from the RHS to the LHS. However, I am not sure how to go from the LHS to the RHS (assume that the expression on the RHS is not known in advance). Thanks.
Let $V = \mathrm{Im}(X)$ (a linear subspace of the Euclidean space $\mathbb{R}^{n}$) and let $\Pi_{V}$ be the orthogonal projection onto $V$.
In the following, if $z \in \mathbb{R}^{n}$, ${}^t z$ will denote the transpose of $z$. Let's first notice that :
$$ {}^t (y-X\beta)(y-X\beta) = \Vert y-X\beta \Vert^{2} $$
The right-hand side can also be written as :
$$ \Vert y-\Pi_{V}(y) + \Pi_{V}(y) - X\beta \Vert^{2} $$
By definition, $\Pi_{V}(y) \in \mathrm{Im}(X)$ and $X\beta \in \mathrm{Im}(X)$, so that $\Pi_{V}(y)-X\beta \in \mathrm{Im}(X)$. Moreover, since $\Pi_{V}$ is an orthogonal projection, $y - \Pi_{V}(y) \in \mathrm{Im}(X)^{\perp}$. So, by the Pythagorean theorem, we have :
$$ \Vert y-X\beta \Vert^{2} = \Vert y-\Pi_{V}(y) \Vert^{2} + \Vert \Pi_{V}(y) - X\beta \Vert^{2} \tag{$\star$} $$
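For readers who want the Pythagorean step spelled out, here is a short sketch of the expansion behind ($\star$). The shorthand $r = y - \Pi_{V}(y)$ and $s = \Pi_{V}(y) - X\beta$ is introduced here only for illustration and is not part of the original notation. Since $r \in \mathrm{Im}(X)^{\perp}$ and $s \in \mathrm{Im}(X)$, the cross term ${}^t r s$ vanishes :

$$ \Vert r + s \Vert^{2} = {}^t (r+s)(r+s) = \Vert r \Vert^{2} + 2\, {}^t r s + \Vert s \Vert^{2} = \Vert r \Vert^{2} + \Vert s \Vert^{2} $$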
Now, we need the following result :
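Let $b=({}^t X X)^{-1} {}^t Xy$; then $\Pi_{V}(y) = Xb$. Indeed, $Xb \in \mathrm{Im}(X)$, and ${}^t X (y - Xb) = {}^t X y - {}^t X X ({}^t X X)^{-1} {}^t X y = 0$, so $y - Xb$ is orthogonal to every column of $X$, i.e. $y - Xb \in \mathrm{Im}(X)^{\perp}$. It follows that the right-hand side of ($\star$) becomes :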
$$ \Vert y-Xb \Vert^{2} + \Vert Xb-X\beta \Vert^{2} \tag{$\star\star$} $$
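As a sanity check, the equality between $\Vert y-X\beta \Vert^{2}$ and ($\star\star$) can be verified numerically. The snippet below is only a minimal sketch under stated assumptions: it takes $k = 2$ so that $({}^t X X)^{-1}$ can be written by hand, uses exact rational arithmetic, and the function name `both_sides` and the data are made up for illustration.

```python
from fractions import Fraction as F

def both_sides(X, y, beta):
    """Return (||y - X beta||^2, ||y - X b||^2 + ||X b - X beta||^2) for an n x 2 matrix X."""
    n = len(y)
    # Entries of X'X = [[a, c], [c, d]] and of X'y = [g0, g1].
    a = sum(row[0] * row[0] for row in X)
    c = sum(row[0] * row[1] for row in X)
    d = sum(row[1] * row[1] for row in X)
    g0 = sum(X[i][0] * y[i] for i in range(n))
    g1 = sum(X[i][1] * y[i] for i in range(n))
    det = a * d - c * c  # nonzero because rank(X) = 2 is assumed
    # b = (X'X)^{-1} X'y, using the explicit inverse of a 2 x 2 matrix.
    b = [(d * g0 - c * g1) / det, (a * g1 - c * g0) / det]

    def fitted(coef):
        # The vector X coef.
        return [row[0] * coef[0] + row[1] * coef[1] for row in X]

    def sq_norm(v):
        return sum(t * t for t in v)

    Xb, Xbeta = fitted(b), fitted(beta)
    lhs = sq_norm([y[i] - Xbeta[i] for i in range(n)])
    rhs = sq_norm([y[i] - Xb[i] for i in range(n)]) + sq_norm(
        [Xb[i] - Xbeta[i] for i in range(n)]
    )
    return lhs, rhs

# Illustrative data (hypothetical): an intercept column and one regressor.
X = [[F(1), F(0)], [F(1), F(1)], [F(1), F(2)], [F(1), F(4)]]
y = [F(1), F(3), F(2), F(5)]
beta = [F(2), F(-1)]  # an arbitrary coefficient vector, not the OLS estimate

lhs, rhs = both_sides(X, y, beta)
print(lhs == rhs)  # True
```

Exact fractions are used so that the comparison is an exact equality rather than a floating-point approximation.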
The first term of ($\star\star$) is ${}^t (y-Xb)(y-Xb)$. Since $Xb - X\beta = X(b-\beta)$, the second term is ${}^t (b - \beta) {}^t X X (b - \beta) = {}^t (\beta - b) {}^t X X (\beta - b)$. Finally, we have :