Prove scaling does not affect the least-squares solution


I need to prove that scaling $x$ does not affect the prediction of the least-squares solution. I think I need to prove the following equation, since scaling the columns of $X$ can be rewritten as multiplying by a diagonal matrix $W$.

$$(X^TX)^{-1}X^Ty \cdot x = ((WX)^T(WX))^{-1}(WX)^Ty \cdot (Wx)$$

Here $X \in \mathbb{R}^{d \times n} $, $W \in \mathbb{R}^{d \times d}$, $y \in \mathbb{R}^n$ and $x \in \mathbb{R}^d$.

I know that the r.h.s. can be rewritten as $$y^TWX\left((WX)^T(WX)\right)^{-1}Wx = y^TWX\left(X^TW^TWX\right)^{-1}Wx$$

Any idea how I can prove they are equal? It seems really hard to pull $W$ and $X$ out of the brackets, although it seems obvious that they should be equal, and it is easy to see when the matrices are square and invertible.

Answer:

Remark: first, a correction: $X \in \mathbb{R}^{n \times d}$; otherwise the product $X^Ty$ is not compatible. Consequently, scaling the columns of $X$ corresponds to the product $XW$, where $W$ is diagonal (so $W^T = W$) and assumed invertible.

What you want to show is

$$(X^TX)^{-1}X^Ty \cdot x = ((XW)^T(XW))^{-1}(XW)^Ty \cdot (Wx)$$

This can be shown as follows, using $W^T = W$ and $(AB)^{-1} = B^{-1}A^{-1}$ for invertible $A$, $B$:

\begin{align}
((XW)^T(XW))^{-1}(XW)^Ty \cdot (Wx) &= (WX^TXW)^{-1}(WX^Ty) \cdot (Wx)\\
&= W^{-1}(X^TX)^{-1}W^{-1}WX^Ty \cdot (Wx)\\
&= W^{-1}(X^TX)^{-1}X^Ty \cdot (Wx)\\
&= \left(W^{-1}(X^TX)^{-1}X^Ty\right)^T(Wx)\\
&= \left((X^TX)^{-1}X^Ty\right)^TW^{-1}(Wx)\\
&= \left((X^TX)^{-1}X^Ty\right)^Tx\\
&= (X^TX)^{-1}X^Ty \cdot x
\end{align}
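As a quick numerical sanity check (a sketch using NumPy; the variable names and the random instance are mine), one can verify on random data that the prediction is unchanged by column scaling, and that the scaled coefficients satisfy $\hat\beta_W = W^{-1}\hat\beta$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 10, 3
X = rng.standard_normal((n, d))   # design matrix, full column rank a.s.
y = rng.standard_normal(n)
x = rng.standard_normal(d)        # a new point to predict at
W = np.diag([2.0, 0.5, 3.0])      # invertible diagonal column scaling

# Least-squares coefficients via the normal equations
beta = np.linalg.solve(X.T @ X, X.T @ y)

# Refit after scaling the columns of X; the new point becomes W @ x
Xw = X @ W
beta_w = np.linalg.solve(Xw.T @ Xw, Xw.T @ y)

pred = beta @ x
pred_w = beta_w @ (W @ x)

assert np.isclose(pred, pred_w)           # predictions coincide
assert np.allclose(W @ beta_w, beta)      # beta_w = W^{-1} beta
```

The coefficients change (each is divided by its column's scale factor), but the fitted predictions are identical, which is exactly the identity proved above.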