I have the following minimization problem:
$$ \min_W \sum_{n=1}^{N} \| l_n - P_n W x_n \|^2_2$$
where $l_n$ is a $C$-dimensional vector, $P_n$ is a $C \times L$ matrix, $W$ is an $L \times D$ matrix and $x_n$ is a $D$-dimensional vector. We also know that $D > C > L$. This is a convex problem, and encouraged by its similarity to ordinary least squares, $\|b - Ax\|_2^2$, I first tried to find a closed-form solution for it, without opting for numerical approaches. Taking the derivative with respect to $W$, I find:
$$\dfrac{d}{dW}\sum_{n=1}^{N} \| l_n - P_nWx_n \|^2_2 = \sum_{n=1}^{N}-2P_n^T(l_n - P_nWx_n)x_n^T$$
But beyond this point, setting the derivative to zero and solving for the matrix $W$ does not seem doable to me. I just wanted to be sure and ask whether the expression $$\sum_{n=1}^{N}-2P_n^T(l_n - P_nWx_n)x_n^T = 0$$ can be solved for $W$ analytically, without making use of numerical optimization tools.
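For reference, the derivative formula can be sanity-checked numerically; here is a quick numpy sketch on random data (the sizes are arbitrary, chosen only so that $D > C > L$) comparing it against a central finite difference:

```python
import numpy as np

rng = np.random.default_rng(0)
N, C, L, D = 5, 4, 3, 6  # arbitrary sizes with D > C > L
P = [rng.standard_normal((C, L)) for _ in range(N)]
x = [rng.standard_normal(D) for _ in range(N)]
l = [rng.standard_normal(C) for _ in range(N)]
W = rng.standard_normal((L, D))

def f(W):
    # objective: sum_n || l_n - P_n W x_n ||^2
    return sum(np.sum((l[n] - P[n] @ W @ x[n]) ** 2) for n in range(N))

# analytic gradient: sum_n -2 P_n^T (l_n - P_n W x_n) x_n^T
grad = sum(-2 * np.outer(P[n].T @ (l[n] - P[n] @ W @ x[n]), x[n]) for n in range(N))

# central finite difference for one entry of W
eps = 1e-6
E = np.zeros_like(W)
E[1, 2] = eps
print(np.isclose((f(W + E) - f(W - E)) / (2 * eps), grad[1, 2]))  # True
```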
You did 95% of the work.
I will write the problem, with $A = P_n$, $x = x_n$, $y = l_n$, as:
$$ \hat{W} = \arg \min_{W} \frac{1}{2} {\left\| A W x - y \right\|}_{2}^{2} $$
This is a convex, smooth problem. Hence:
$$\begin{align*} \hat{W} = \arg \min_{W} \frac{1}{2} {\left\| A W x - y \right\|}_{2}^{2} & \Leftrightarrow \frac{\partial \frac{1}{2} {\left\| A \hat{W} x - y \right\|}_{2}^{2} }{\partial \hat{W}} = 0 \\ & \Leftrightarrow {A}^{T} \left( A \hat{W} x - y \right) {x}^{T} = 0 \\ & \Leftrightarrow {A}^{T} A \hat{W} x {x}^{T} = {A}^{T} y {x}^{T} \\ & \Leftrightarrow \hat{W} = {\left( {A}^{T} A \right)}^{-1} \left( {A}^{T} y {x}^{T} \right) {\left( x {x}^{T} \right)}^{-1} \\ \end{align*}$$
The last step assumes ${A}^{T} A$ and $x {x}^{T}$ are invertible.
By the way, you could set $ z = \hat{W} x $ and then solve classic linear least squares for $ z $ yielding $ \hat{z} $. Then use:
$$ \hat{W} = \hat{z} {x}^{T} {\left( x {x}^{T} \right)}^{-1} $$
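A minimal numpy sketch of this two-step route for a single data point; note that when $D > 1$ the matrix $x x^T$ is rank one, so the sketch substitutes the pseudo-inverse $x^T / (x^T x)$ for $x^T (x x^T)^{-1}$:

```python
import numpy as np

rng = np.random.default_rng(0)
C, L, D = 4, 3, 6                      # arbitrary sizes with D > C > L
A = rng.standard_normal((C, L))        # A = P_n
x = rng.standard_normal(D)
y = rng.standard_normal(C)             # y = l_n

# step 1: ordinary least squares for z = W x
z_hat, *_ = np.linalg.lstsq(A, y, rcond=None)

# step 2: recover W; x^T / (x^T x) plays the role of x^T (x x^T)^{-1}
# (pseudo-inverse, since x x^T is rank one for D > 1)
W_hat = np.outer(z_hat, x) / (x @ x)

print(np.allclose(W_hat @ x, z_hat))   # True: W_hat reproduces z_hat
```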
Dealing with the Sum Form
In my solution above I missed the sum over the data. So let's take care of that.
Since the derivative is linear, we need to find a solution to:
$$ \sum_{n = 1}^{N} {A}^{T}_{n} {A}_{n} \hat{W} {x}_{n} {x}^{T}_{n} = \sum_{n = 1}^{N} {A}^{T}_{n} {y}_{n} {x}^{T}_{n} $$
We can rewrite this in the form:
$$ \sum_{n = 1}^{N} {B}_{n} \hat{W} {C}_{n} = E, \qquad {B}_{n} = {A}^{T}_{n} {A}_{n}, \quad {C}_{n} = {x}_{n} {x}^{T}_{n}, \quad E = \sum_{n = 1}^{N} {A}^{T}_{n} {y}_{n} {x}^{T}_{n} $$
Using the Kronecker product one can see that:
$$ B \hat{W} C = E \Rightarrow \operatorname{Vec} \left( B \hat{W} C \right) = \operatorname{Vec} \left( E \right) \Rightarrow \left( {C}^{T} \otimes B \right) \operatorname{Vec} \left( \hat{W} \right) = \operatorname{Vec} \left( E \right) $$
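Here $\operatorname{Vec}$ stacks columns; the identity is easy to verify numerically (a minimal numpy check):

```python
import numpy as np

rng = np.random.default_rng(0)
B, W, C = rng.standard_normal((3, 3)), rng.standard_normal((3, 6)), rng.standard_normal((6, 6))
vec = lambda M: M.flatten(order="F")                            # column-stacking Vec
print(np.allclose(vec(B @ W @ C), np.kron(C.T, B) @ vec(W)))   # True
```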
Applying this identity term by term, the sum above becomes:
$$ \left( \sum_{n = 1}^{N} \left( {C}^{T}_{n} \otimes {B}_{n} \right) \right) \operatorname{Vec} \left( \hat{W} \right) = \operatorname{Vec} \left( E \right) $$
This is a standard linear system, which can be solved directly for $\operatorname{Vec} \left( \hat{W} \right)$.
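Putting it all together, a sketch that assembles and solves the vectorized system on random data (using the original problem's quantities, i.e. $A_n = P_n$, $y_n = l_n$, so $B_n = P_n^T P_n$, $C_n = x_n x_n^T$, $E = \sum_n P_n^T l_n x_n^T$) and checks that the gradient vanishes at the solution:

```python
import numpy as np

rng = np.random.default_rng(0)
N, C, L, D = 50, 4, 3, 6                               # arbitrary sizes with D > C > L
A = [rng.standard_normal((C, L)) for _ in range(N)]    # A_n = P_n
x = [rng.standard_normal(D) for _ in range(N)]
y = [rng.standard_normal(C) for _ in range(N)]         # y_n = l_n

# assemble  (sum_n C_n^T kron B_n) Vec(W) = Vec(E)
M = np.zeros((L * D, L * D))
E = np.zeros((L, D))
for n in range(N):
    B_n = A[n].T @ A[n]
    C_n = np.outer(x[n], x[n])
    M += np.kron(C_n.T, B_n)
    E += np.outer(A[n].T @ y[n], x[n])

w = np.linalg.solve(M, E.flatten(order="F"))           # use lstsq/pinv if M is singular
W_hat = w.reshape((L, D), order="F")

# the gradient of the objective should vanish at W_hat
grad = sum(-2 * np.outer(A[n].T @ (y[n] - A[n] @ W_hat @ x[n]), x[n]) for n in range(N))
print(np.allclose(grad, 0, atol=1e-7))                 # True (up to numerical precision)
```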