For the least squares problem $$\arg \min_\mathbf{A} \sum_n \left\| \mathbf{x}_n - \mathbf{A}_{c\times k}\mathbf{W}_{k\times c}\mathbf{x}_n \right\|^2,$$ where $\mathbf{x}_n$ is the $n$-th column of $\mathbf{X}_{c\times n}$, I am able to find the solution if $\mathbf{W}_{k\times c}$ and $\mathbf{X}_{c\times n}$ are known (by solving an OLS problem).
However, how can I solve the same least squares problem (analytically, i.e. without gradient descent) for $\mathbf{W}_{k\times c}$ (if the other matrices are known)?
I would appreciate any hint on how to rearrange the above formula.
Well, here is the gradient:
$$ \frac{\partial }{\partial W} \frac{1}{2} {\left\| X - A W X \right\|}_{2}^{2} = -{A}^{T} \left( X - A W X \right) {X}^{T} $$
Now, equate this to $ 0 $ and, assuming $ {A}^{T} A $ and $ X {X}^{T} $ are invertible, see that:
$$ W = {\left( {A}^{T} A \right)}^{-1} {A}^{T} X {X}^{T} {\left( X {X}^{T} \right)}^{-1} = {\left( {A}^{T} A \right)}^{-1} {A}^{T} $$
At first it seems strange as $ W $ is independent of $ X $.
Yet plugging it back into the objective function yields:
$$ X - A W X = X - A {\left( {A}^{T} A \right)}^{-1} {A}^{T} X $$
Now the term $ A {\left( {A}^{T} A \right)}^{-1} {A}^{T} $ is the orthogonal projector onto the column space of $ A $, so $ A W X $ is the projection of $ X $ onto $ \operatorname{range}(A) $: the closest that $ A W X $ can get to $ X $ for any choice of $ W $. It equals $ X $ exactly only when $ k = c $ and $ A $ is invertible; otherwise the residual is the component of $ X $ orthogonal to $ \operatorname{range}(A) $, which is the best any $ W $ can achieve, hence the solution works.
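A quick numerical sanity check of the closed form, sketched in NumPy (the dimensions $ c, k, n $ and the random data are made up for illustration): the gradient vanishes at $ W = {\left( {A}^{T} A \right)}^{-1} {A}^{T} $, and the residual is exactly the projection of $ X $ onto the orthogonal complement of $ \operatorname{range}(A) $.

```python
import numpy as np

# Illustrative sizes with c > k, so A W X cannot reproduce X exactly.
rng = np.random.default_rng(0)
c, k, n = 6, 3, 50
A = rng.standard_normal((c, k))
X = rng.standard_normal((c, n))

# Closed-form solution W = (A^T A)^{-1} A^T, computed via a linear solve
# rather than an explicit inverse for numerical stability.
W = np.linalg.solve(A.T @ A, A.T)

# The gradient -A^T (X - A W X) X^T should vanish at the optimum.
grad = -A.T @ (X - A @ W @ X) @ X.T
print(np.allclose(grad, 0))  # True

# A W is the orthogonal projector onto col(A), so the residual X - A W X
# is the component of X orthogonal to the column space of A.
P = A @ np.linalg.solve(A.T @ A, A.T)
print(np.allclose(X - A @ W @ X, X - P @ X))  # True
```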