Folks - Please help. What's the gradient for the cost function below?
$ D(Y||AX)=\frac{1}{2} ||Y-AX||^2_F $
Additional info -
-need to get the derivative of that with respect to A. -Multiplicative NMF Algorithms Based on the Squared Euclidean Distance
Let $$ J = \frac{1}{2}\sum_{ij} \left(\sum_k a_{ik}x_{kj} - y_{ij}\right)^2. $$ Then $$ \frac{\partial J}{\partial A} = \frac{\partial J}{\partial a_{lm}} = \sum_{ij} \left(\sum_k a_{ik}x_{kj} - y_{ij}\right) \frac{\partial \left(\sum_k a_{ik}x_{kj} - y_{ij}\right)}{\partial a_{lm}} =\\ = \sum_{ij} \left(\sum_k a_{ik}x_{kj} - y_{ij}\right) \frac{\partial \left(\sum_k a_{ik}x_{kj}\right)}{\partial a_{lm}} =\\ = \sum_{ij} \left(\sum_k a_{ik}x_{kj} - y_{ij}\right) \delta_{il}x_{mj} =\\ = \sum_{j} \left(\sum_k a_{lk}x_{kj} - y_{lj}\right) x_{mj} =\\ = (AX-Y)X^\top. $$
The following Kronecker delta properties were used $$ \frac{\partial x_i}{\partial x_j} = \delta_{ij} = \begin{cases} 1, & i = j\\ 0, & i \neq j \end{cases}. $$ Similary $$ \frac{\partial a_{ij}}{\partial a_{kl}} = \begin{cases} 1, & i = k \text{ and } j = l\\ 0, & i \neq k \text { or } j \neq l \end{cases} = \delta_{ik} \delta_{jl}. $$ Eliminating $\delta_{ij}$ is done via summation $$ \sum_j \delta_{ij} x_j = \sum_{j \neq i} \delta_{ij} x_j + x_i = x_i. $$ In other words $$ \sum_i \delta_{ij} \bullet \to \bullet \text{ with } i \text{ replaced by }j. $$
Let's consider the tricky part $$ \frac{\partial \left(\sum_k a_{ik}x_{kj}\right)}{\partial a_{lm}} = \sum_k \frac{\partial a_{ik}}{\partial a_{lm}} x_{kj} + \sum_k a_{ik} \frac{\partial x_{kj}}{\partial a_{lm}} = \sum_{k} \delta_{il}\delta_{km} x_{kj} = \delta_{il} x_{mj} $$