I need to prove that scaling the columns of $X$ does not affect the prediction of the least-squares solution. I think I need to prove the following equation, since scaling the columns of $X$ can be rewritten as multiplying by a diagonal matrix $W$.
$$(X^TX)^{-1}X^Ty \cdot x = ((WX)^T(WX))^{-1}(WX)^Ty \cdot (Wx)$$
Here $X \in \mathbb{R}^{d \times n} $, $W \in \mathbb{R}^{d \times d}$, $y \in \mathbb{R}^n$ and $x \in \mathbb{R}^d$.
I know that the r.h.s. can be rewritten (since it is a scalar) as $$y^T(WX)((WX)^T(WX))^{-1}Wx = y^T(WX)(X^TW^TWX)^{-1}Wx$$
Any idea how I can prove they are equal? It seems really hard to pull $W$ and $X$ out of the brackets, although it seems obvious that they should be equal, and it is immediate when the matrices are square and invertible.
Remark: a correction: $X \in \mathbb{R}^{n \times d}$; otherwise the product $X^Ty$ is not well-defined.
What you want to show (note that scaling the columns of $X$ corresponds to multiplying on the right, i.e. $X \mapsto XW$, not $WX$) is
$$(X^TX)^{-1}X^Ty.x=((XW)^T(XW))^{-1}(XW)^Ty.(Wx)$$
This can be shown as follows, using that $W$ is diagonal (hence $W^T=W$) and invertible:
\begin{align} ((XW)^T(XW))^{-1}(XW)^Ty.(Wx) &=(W^TX^TXW)^{-1}(W^TX^Ty).(Wx)\\ &=(WX^TXW)^{-1}(WX^Ty).(Wx)\\ &=W^{-1}(X^TX)^{-1}W^{-1}WX^Ty.(Wx)\\ &=W^{-1}(X^TX)^{-1}X^Ty.(Wx)\\ &=(W^{-1}(X^TX)^{-1}X^Ty)^T(Wx)\\ &=((X^TX)^{-1}X^Ty)^TW^{-1}(Wx)\\ &=((X^TX)^{-1}X^Ty)^Tx\\ &=((X^TX)^{-1}X^Ty).x \end{align}
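As a quick numerical sanity check (a sketch, not part of the proof), here is a small NumPy snippet that verifies the identity on random data; the names `X`, `W`, `y`, `x` match the derivation above, and the particular sizes and diagonal entries are arbitrary choices:

```python
# Verify that scaling the columns of X by a diagonal, invertible W
# leaves the least-squares prediction at a (correspondingly scaled)
# point unchanged.
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 4
X = rng.standard_normal((n, d))        # design matrix, n x d
y = rng.standard_normal(n)             # targets
x = rng.standard_normal(d)             # point to predict at
W = np.diag([2.0, 0.5, 3.0, 1.5])      # diagonal scaling, invertible

# Original least-squares coefficients: (X^T X)^{-1} X^T y
beta = np.linalg.solve(X.T @ X, X.T @ y)

# Coefficients after column scaling: ((XW)^T (XW))^{-1} (XW)^T y
XW = X @ W
beta_w = np.linalg.solve(XW.T @ XW, XW.T @ y)

pred = beta @ x            # ((X^T X)^{-1} X^T y) . x
pred_w = beta_w @ (W @ x)  # scaled coefficients dotted with Wx

assert np.isclose(pred, pred_w)
```

The coefficients themselves change (`beta_w` equals `W^{-1} beta`), but the prediction is invariant, which is exactly what the algebra above shows.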