Gradient of a Frobenium norm cost Function

512 Views Asked by At

Folks - Please help. What's the gradient for the cost function below?

$ D(Y||AX)=\frac{1}{2} ||Y-AX||^2_F $

Additional info -

-need to get the derivative of that with respect to A. -Multiplicative NMF Algorithms Based on the Squared Euclidean Distance

2

There are 2 best solutions below

1
On BEST ANSWER

Let $$ J = \frac{1}{2}\sum_{ij} \left(\sum_k a_{ik}x_{kj} - y_{ij}\right)^2. $$ Then $$ \frac{\partial J}{\partial A} = \frac{\partial J}{\partial a_{lm}} = \sum_{ij} \left(\sum_k a_{ik}x_{kj} - y_{ij}\right) \frac{\partial \left(\sum_k a_{ik}x_{kj} - y_{ij}\right)}{\partial a_{lm}} =\\ = \sum_{ij} \left(\sum_k a_{ik}x_{kj} - y_{ij}\right) \frac{\partial \left(\sum_k a_{ik}x_{kj}\right)}{\partial a_{lm}} =\\ = \sum_{ij} \left(\sum_k a_{ik}x_{kj} - y_{ij}\right) \delta_{il}x_{mj} =\\ = \sum_{j} \left(\sum_k a_{lk}x_{kj} - y_{lj}\right) x_{mj} =\\ = (AX-Y)X^\top. $$

The following Kronecker delta properties were used $$ \frac{\partial x_i}{\partial x_j} = \delta_{ij} = \begin{cases} 1, & i = j\\ 0, & i \neq j \end{cases}. $$ Similary $$ \frac{\partial a_{ij}}{\partial a_{kl}} = \begin{cases} 1, & i = k \text{ and } j = l\\ 0, & i \neq k \text { or } j \neq l \end{cases} = \delta_{ik} \delta_{jl}. $$ Eliminating $\delta_{ij}$ is done via summation $$ \sum_j \delta_{ij} x_j = \sum_{j \neq i} \delta_{ij} x_j + x_i = x_i. $$ In other words $$ \sum_i \delta_{ij} \bullet \to \bullet \text{ with } i \text{ replaced by }j. $$

Let's consider the tricky part $$ \frac{\partial \left(\sum_k a_{ik}x_{kj}\right)}{\partial a_{lm}} = \sum_k \frac{\partial a_{ik}}{\partial a_{lm}} x_{kj} + \sum_k a_{ik} \frac{\partial x_{kj}}{\partial a_{lm}} = \sum_{k} \delta_{il}\delta_{km} x_{kj} = \delta_{il} x_{mj} $$

4
On

Let $M=(AX-Y)$, then the function and its differential can be expressed in terms of the Frobenius (:) product as $$\eqalign{ f &= \frac{1}{2}\,M:M \cr\cr df &= M:dM \cr &= (AX-Y):dA\,X \cr &= (AX-Y)X^T:dA \cr }$$ Since $df=(\frac{\partial f}{\partial A}):dA\,\,$ the gradient must be $$\eqalign{ \frac{\partial f}{\partial A} &= (AX-Y)X^T \cr }$$ If you dislike Frobenius products, you can replace them with traces, e.g. $\,{\rm tr}(A^TB)=A:B$