Least Squares Solution with Three Matrices Involved


For the least squares problem $$\arg \min_\mathbf{A} \left\| \mathbf{X}_{c\times n} - \mathbf{A}_{c\times k}\mathbf{W}_{k\times c}\mathbf{X}_{c\times n} \right\|_F^2$$ (subscripts denote matrix dimensions), I am able to find the solution when $\mathbf{W}_{k\times c}$ and $\mathbf{X}_{c\times n}$ are known, by solving an OLS problem.

However, how can I solve the same least squares problem (analytically, i.e. without gradient descent) for $\mathbf{W}_{k\times c}$ (if the other matrices are known)?

I would appreciate any hint on how to rearrange the above formula.
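For reference, the known-$\mathbf{W}$ case reduces to standard OLS with design $Z = WX$: setting the gradient with respect to $A$ to zero gives $A = XZ^T(ZZ^T)^{-1}$. A minimal numpy sketch (dimensions and random data are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
c, k, n = 6, 3, 20                 # illustrative sizes: X is c x n, W is k x c
W = rng.standard_normal((k, c))
X = rng.standard_normal((c, n))
Z = W @ X                          # treat Z = W X as the OLS design matrix

# normal-equations solution for A: A = X Z^T (Z Z^T)^{-1}
A = X @ Z.T @ np.linalg.inv(Z @ Z.T)

# gradient of (1/2)||X - A Z||_F^2 w.r.t. A vanishes at the solution
grad = -(X - A @ Z) @ Z.T
print(np.allclose(grad, 0))        # True
```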

Accepted answer:

Well, here is the gradient:

$$ \frac{\partial }{\partial W} \frac{1}{2} {\left\| X - A W X \right\|}_{2}^{2} = -{A}^{T} \left( X - A W X \right) {X}^{T} $$

Now, equate this to $ 0 $ and see that:

$$ W = {\left( {A}^{T} A \right)}^{-1} {A}^{T} X {X}^{T} {\left( X {X}^{T} \right)}^{-1} = {\left( {A}^{T} A \right)}^{-1} {A}^{T} $$
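This closed form can be sanity-checked numerically; a minimal numpy sketch (random $A$ and $X$ of illustrative sizes, assuming $A$ has full column rank so $A^T A$ is invertible):

```python
import numpy as np

rng = np.random.default_rng(0)
c, k, n = 6, 3, 20                 # illustrative sizes: X is c x n, A is c x k
A = rng.standard_normal((c, k))    # full column rank almost surely
X = rng.standard_normal((c, n))

# closed-form solution: W = (A^T A)^{-1} A^T
W = np.linalg.solve(A.T @ A, A.T)

# gradient of (1/2)||X - A W X||_F^2 w.r.t. W should vanish at this W
grad = -A.T @ (X - A @ W @ X) @ X.T
print(np.allclose(grad, 0))        # True
```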

At first this seems strange, as $ W $ is independent of $ X $.
Yet plugging it back into the objective function yields:

$$ X - A W X = X - A {\left( {A}^{T} A \right)}^{-1} {A}^{T} X $$

Now, the term $ P = A {\left( {A}^{T} A \right)}^{-1} {A}^{T} $ is not approximately $ I $ in general; it is the orthogonal projector onto the column space of $ A $ (it equals $ I $ only when $ k = c $ and $ A $ is invertible). Hence $ X - A W X = \left( I - P \right) X $ is the residual of orthogonally projecting the columns of $ X $ onto $ \operatorname{range} \left( A \right) $, which is exactly the least squares minimum. This also explains why the minimizer does not depend on $ X $: the projection is the best approximation for every $ X $.
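A quick numpy check of the structure of $ P = A {\left( {A}^{T} A \right)}^{-1} {A}^{T} $ (illustrative random data): $ P $ is idempotent and symmetric, the residual is orthogonal to the columns of $ A $, and $ P $ is not the identity when $ k < c $.

```python
import numpy as np

rng = np.random.default_rng(1)
c, k, n = 6, 3, 20                      # illustrative sizes with k < c
A = rng.standard_normal((c, k))
X = rng.standard_normal((c, n))

# P projects onto col(A): compute A (A^T A)^{-1} A^T without explicit inverse
P = A @ np.linalg.solve(A.T @ A, A.T)

print(np.allclose(P @ P, P))            # True: idempotent
print(np.allclose(P.T, P))              # True: symmetric
print(np.allclose(P, np.eye(c)))        # False: not the identity for k < c

# the residual X - P X is orthogonal to the column space of A
print(np.allclose(A.T @ (X - P @ X), 0))  # True
```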