I am trying to find
$$\frac{\partial}{\partial A} \left\|Q\circ C\right\|^2_F \quad\text{and}\quad \frac{\partial}{\partial B} \left\|Q\circ C\right\|^2_F$$
where $C = BA^T$; $C$ is a $p\times q$ matrix, $Q$ is $p\times q$, $A$ is $q\times r$, and $B$ is $p\times r$. I'm using $A^T$ for the transpose of $A$.
From the question and answer "Gradient of squared Frobenius norm of a Hadamard product", I see that $\frac{\partial}{\partial C} \left\|Q\circ C\right\|^2_F = 2\, Q\circ C$.
Is this as simple as using the chain rule?
If $f = \left\|Q\circ C\right\|^2_F$, then
$\frac{\partial f}{\partial A } = \frac{\partial C}{\partial A } \frac{\partial f}{\partial C }$
$\frac{\partial C}{\partial A } = B^T$
$\frac{\partial f}{\partial A } = 2 B^T( Q\circ C ) $
and
$\frac{\partial f}{\partial B } = \frac{\partial f}{\partial C } \frac{\partial C}{\partial B} $
$\frac{\partial C}{\partial B } = A$
$\frac{\partial f}{\partial B } = 2 ( Q\circ C ) A$
A follow-up question: can we find the derivatives with respect to $A$ and $B$ of the $\ell_1$ norm of $Q\circ C$, i.e. the sum of the absolute values of the entries of $Q\circ C$? I know that
$$\sum_i \sum_j (Q\circ C)_{ij} = {\rm tr}(Q C^T) = {\rm tr}(C Q^T)$$
that is, the sum of the entries of $Q\circ C$ is the trace of $Q C^T$ or $C Q^T$. But this doesn't take into account the absolute values.
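A quick numerical check of this identity, as a minimal sketch assuming NumPy (not part of the original question); the sizes and seed are arbitrary. The last check illustrates that once the elementwise signs $G={\rm sign}(Q\circ C)$ are fixed, the sum of absolute values can also be written as a trace, ${\rm tr}\big(G^T(Q\circ C)\big)$.

```python
import numpy as np

# Random test matrices; the sizes p, q and the seed are arbitrary choices.
rng = np.random.default_rng(0)
p, q = 4, 3
Q = rng.standard_normal((p, q))
C = rng.standard_normal((p, q))

X = Q * C                      # Hadamard product Q∘C (elementwise)
entry_sum = X.sum()            # sum_i sum_j (Q∘C)_ij
print(np.isclose(entry_sum, np.trace(Q @ C.T)))   # tr(Q C^T); prints True
print(np.isclose(entry_sum, np.trace(C @ Q.T)))   # tr(C Q^T); prints True

# The l1 norm sums |(Q∘C)_ij| instead. With G = sign(Q∘C) taken elementwise,
# sum |X_ij| = sum G_ij X_ij = tr(G^T X).
G = np.sign(X)
l1 = np.abs(X).sum()
print(np.isclose(l1, np.trace(G.T @ X)))          # prints True
```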
Thank you in advance for any help you can provide.
Let's define some variables and the function of interest
$$\eqalign{
C &= BA^T &\implies dC=B\,dA^T+dB\,A^T \cr
X &= Q\circ C &\implies dX=Q\circ dC \cr
\phi &= \|X\|_F^2 = X:X \cr
}$$
where the colon denotes the trace/Frobenius product, i.e.
$$Q:C={\rm tr}(Q^TC)$$
Now find the differential of the function
$$\eqalign{
d\phi &= 2X:dX \cr
&= 2X:Q\circ dC \cr
&= 2Q\circ X:dC \cr
&= 2Q\circ X:(B\,dA^T+dB\,A^T) \cr
&= 2(Q\circ X)^TB:dA + 2(Q\circ X)A:dB \cr
}$$
The third line uses $X:(Q\circ dC) = (Q\circ X):dC$, and the last line uses the cyclic property of the trace, $M:(B\,dA^T) = M^TB:dA$ and $M:(dB\,A^T) = MA:dB$. In particular, the gradient with respect to $C$ is $2\,Q\circ X = 2\,Q\circ Q\circ C$.

To find the gradient wrt $A$, hold $B$ constant so that $dB=0$:
$$\frac{\partial\phi}{\partial A} = 2(Q\circ X)^TB$$
Similarly, to find the gradient wrt $B$, hold $A$ constant:
$$\frac{\partial\phi}{\partial B} = 2(Q\circ X)A$$

For the follow-up question, consider the element-wise functions
$$\eqalign{
G &= {\rm signum}(X) \cr
Y &= {\rm abs}(X) = G\circ X \cr
}$$
The $\ell_1$ (Manhattan) norm is given by
$$\mu = 1:Y = G:X$$
where $1$ denotes the all-ones matrix. Wherever no entry of $X$ is zero, $G$ is locally constant, so the differential is
$$\eqalign{
d\mu &= G:dX \cr
&= G:Q\circ dC \cr
&= Q\circ G:dC \cr
&= Q\circ G:(B\,dA^T+dB\,A^T) \cr
&= (Q\circ G)^TB:dA + (Q\circ G)A:dB \cr
}$$
and the gradients are
$$\frac{\partial\mu}{\partial A} = (Q\circ G)^TB, \qquad \frac{\partial\mu}{\partial B} = (Q\circ G)A$$
If some entry of $X$ is zero, $\mu$ is not differentiable there; taking ${\rm signum}(0)=0$, the same formulas give a subgradient with respect to each factor while the other is held fixed.
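To sanity-check the closed forms, here is a minimal NumPy sketch (not part of the original answer) that compares them against central finite differences on random matrices. The sizes, seed, step `eps`, and the helper names `phi`, `mu`, `grads_phi`, `grads_mu`, and `fd_grad` are illustrative choices. The $\ell_1$ check assumes no entry of $X$ lies within roughly `eps` of zero (true almost surely for random Gaussian data), so that ${\rm sign}(X)$ does not change under the perturbation.

```python
import numpy as np

def phi(A, B, Q):
    """phi = ||Q∘(B A^T)||_F^2."""
    X = Q * (B @ A.T)
    return np.sum(X * X)

def mu(A, B, Q):
    """mu = sum of |Q∘(B A^T)|, the elementwise l1 norm."""
    return np.sum(np.abs(Q * (B @ A.T)))

def grads_phi(A, B, Q):
    """Closed forms 2(Q∘X)^T B (shape q×r) and 2(Q∘X) A (shape p×r)."""
    X = Q * (B @ A.T)
    M = Q * X
    return 2 * M.T @ B, 2 * M @ A

def grads_mu(A, B, Q):
    """Closed forms (Q∘G)^T B and (Q∘G) A with G = sign(X)."""
    X = Q * (B @ A.T)
    M = Q * np.sign(X)
    return M.T @ B, M @ A

def fd_grad(f, Z, eps=1e-6):
    """Central finite-difference gradient of a scalar function f at array Z."""
    g = np.zeros_like(Z)
    for idx in np.ndindex(*Z.shape):
        Zp, Zm = Z.copy(), Z.copy()
        Zp[idx] += eps
        Zm[idx] -= eps
        g[idx] = (f(Zp) - f(Zm)) / (2 * eps)
    return g

rng = np.random.default_rng(1)
p, q, r = 5, 4, 3                      # arbitrary sizes
A = rng.standard_normal((q, r))
B = rng.standard_normal((p, r))
Q = rng.standard_normal((p, q))
C = B @ A.T

# Squared Frobenius norm: gradients wrt A and B
gA, gB = grads_phi(A, B, Q)
fA = fd_grad(lambda Z: phi(Z, B, Q), A)
fB = fd_grad(lambda Z: phi(A, Z, Q), B)

# Gradient wrt C, read off from d(phi) = 2 Q∘X : dC
gC = 2 * Q * Q * C
fC = fd_grad(lambda Z: np.sum((Q * Z) ** 2), C)

# l1 norm: gradients wrt A and B
hA, hB = grads_mu(A, B, Q)
kA = fd_grad(lambda Z: mu(Z, B, Q), A)
kB = fd_grad(lambda Z: mu(A, Z, Q), B)

for name, exact, approx in [("phi/A", gA, fA), ("phi/B", gB, fB),
                            ("phi/C", gC, fC), ("mu/A", hA, kA),
                            ("mu/B", hB, kB)]:
    print(name, exact.shape, np.allclose(exact, approx, atol=1e-6))
```

Each printed line shows the shape of the closed-form gradient, which matches the shape of the variable it is taken with respect to, and whether it agrees with the finite-difference estimate. Central differences are a convenient choice here because $\phi$ is quadratic in each factor and $\mu$ is piecewise linear, so the only error away from kinks is floating-point rounding.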