I'd like someone smarter and more experienced than me to check my answer and give advice on how to do it better and derive a closed form for what I'm looking for.
Given known matrices $Y \in \mathbb R^{m \times k}$ and $X \in \mathbb R^{m \times n}$, and a matrix of unknowns $\beta \in \mathbb R^{k \times n}$, we define the function $L(\beta) = \sum_{t = 1}^{m} ||Y_t-X_t\beta ^T||^2$, where $Y_t$ denotes the $t$-th row of $Y$ and $X_t$ denotes the $t$-th row of $X$.
My goal was to find $\frac{\partial L(\beta)}{\partial \beta_{ij}}$.
My answer
The function $L$ is just the summation of the squared norms of the difference between row vectors. So if we write it explicitly it should be:
$L(\beta) = (Y_{11}-\sum_{r = 1}^{n}X_{1r}\beta^T _{r1})^2+\dots+(Y_{1k}-\sum_{r = 1}^{n}X_{1r}\beta^T _{rk})^2+\dots+(Y_{m1}-\sum_{r = 1}^{n}X_{mr}\beta^T _{r1})^2+\dots+(Y_{mk}-\sum_{r = 1}^{n}X_{mr}\beta^T _{rk})^2$
this is because the $d$'th entry of $X_t\beta ^T$ is $\sum_{r = 1}^{n}X_{tr}\beta^T _{rd}$ from matrix multiplication.
To simplify things, let's work with $\beta$ rather than $\beta ^T$, using $\beta^T _{rd} = \beta _{dr}$:
$L(\beta) = (Y_{11}-\sum_{r = 1}^{n}X_{1r}\beta _{1r})^2+\dots+(Y_{1k}-\sum_{r = 1}^{n}X_{1r}\beta _{kr})^2+\dots+(Y_{m1}-\sum_{r = 1}^{n}X_{mr}\beta _{1r})^2+\dots+(Y_{mk}-\sum_{r = 1}^{n}X_{mr}\beta _{kr})^2$
We are only interested in terms where $\beta _{ij}$ appears. Those terms are $(Y_{1i}-\sum_{r = 1}^{n}X_{1r}\beta _{ir})^2$, $(Y_{2i}-\sum_{r = 1}^{n}X_{2r}\beta _{ir})^2$ and so on until $(Y_{mi}-\sum_{r = 1}^{n}X_{mr}\beta _{ir})^2$
So $\frac{\partial L(\beta)}{\partial \beta _{ij}}= \frac{\partial}{\partial \beta _{ij}}\sum_{t=1}^{m}(Y_{ti}-\sum_{r=1}^{n}X_{tr}\beta _{ir})^2 = -2\sum_{t=1}^{m}(Y_{ti}-\sum_{r=1}^{n}X_{tr}\beta _{ir})X_{tj}$.
Is this the correct answer? If it is, is it possible (using matrix multiplication) to find a simple closed form without sigma signs for the entire matrix $\frac{\partial L(\beta)}{\partial \beta}$?
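Your componentwise result is correct. The inner bracket $Y_{ti}-\sum_{r=1}^{n}X_{tr}\beta _{ir}$ is the $(t,i)$ entry of the residual $R = Y - X\beta^T$, so the sum over $t$ is $-2\sum_{t=1}^{m}R_{ti}X_{tj} = -2\,(R^TX)_{ij}$. Stripping away the sigmas therefore yields the matrix formula $$\frac{\partial L}{\partial \beta} = -2\,(Y-X\beta^T)^TX = 2\,(\beta X^TX-Y^TX)$$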
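If you want to convince yourself numerically, here is a minimal sketch (assuming NumPy is available) that compares the matrix formula above against a central finite-difference approximation of $L$ on random data. The function names `loss`, `grad_closed_form` and `grad_finite_diff`, the sizes `m, n, k` and the step `eps` are all illustrative choices of mine, not part of the question.

```python
import numpy as np

def loss(beta, X, Y):
    # L(beta) = sum_t ||Y_t - X_t beta^T||^2, i.e. the squared Frobenius norm of Y - X beta^T
    R = Y - X @ beta.T
    return np.sum(R ** 2)

def grad_closed_form(beta, X, Y):
    # Matrix formula: dL/dbeta = 2 (beta X^T X - Y^T X), a k x n matrix like beta
    return 2.0 * (beta @ X.T @ X - Y.T @ X)

def grad_finite_diff(beta, X, Y, eps=1e-6):
    # Central differences, perturbing one entry beta_ij at a time
    G = np.zeros_like(beta)
    for i in range(beta.shape[0]):
        for j in range(beta.shape[1]):
            E = np.zeros_like(beta)
            E[i, j] = eps
            G[i, j] = (loss(beta + E, X, Y) - loss(beta - E, X, Y)) / (2 * eps)
    return G

# Arbitrary small sizes and random data, chosen only for the check
rng = np.random.default_rng(0)
m, n, k = 7, 4, 3
X = rng.standard_normal((m, n))
Y = rng.standard_normal((m, k))
beta = rng.standard_normal((k, n))

G_closed = grad_closed_form(beta, X, Y)
G_numeric = grad_finite_diff(beta, X, Y)
max_abs_diff = np.max(np.abs(G_closed - G_numeric))
print("max abs difference:", max_abs_diff)
```

Since $L$ is quadratic in $\beta$, the central difference has no truncation error, so the two gradients should agree up to floating-point rounding.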