I have been stuck on this problem for far too long and hope I can get some help. Let's assume that $z$ and $x$ are vectors and $A$ is a matrix. I want to show that $$\nabla_\text{A} f(z) = (\nabla_\text{z}f(z)) \otimes x$$ where $$z=Ax$$
I can write $(\nabla_\text{A})_{ij} = \frac{\partial}{\partial A_{ij}}$ and the outer product as $(u \otimes v)_{ij} = u_i v_j$.
I start with the left-hand side:
\begin{eqnarray} (\nabla_\text{A} f(z))_{ij} &=& \frac{\partial}{\partial A_{ij}} f(z)\\ &=& \frac{\partial}{\partial A_{ij}} f(Ax)\\ &=& \frac{\partial}{\partial A_{ij}} f(Ax)x_j\\ &=& (\frac{\partial}{\partial A_{ij}}f(Ax) \otimes x)_{ij}\\ &=& (\frac{\partial}{\partial A_{ij}}f(z) \otimes x)_{ij}\\ \end{eqnarray}
This is where I get stuck: the derivative is with respect to $A_{ij}$ and not $z_i$. Can someone see my error? I am also not sure whether I handled the inner derivative correctly (going from the 2nd to the 3rd line).
Thanks!
Assume that you know how to calculate a function and its gradient with respect to $z$ $$\eqalign{f &= f(z),\qquad f_z = \nabla_zf}$$ Then someone tells you that $z$ is really a function of some other variables, namely $$\eqalign{z &= Ax}$$ and now you wish to find the gradient with respect to one of the new variables.
So write the differential of the function in terms of the old gradient, perform a change of variables (with $x$ held fixed, $dz = dA\,x$), and then recover the new gradient $$\eqalign{ df &= f_z:dz \cr &= f_z:dA\,x \cr &= f_z\,x^T:dA \cr &= f_A:dA \cr \nabla_Af=f_A &= f_z\,x^T \cr }$$ Here $f_z\,x^T$ is exactly the outer product $f_z \otimes x$, since $(f_z\,x^T)_{ij} = (f_z)_i\,x_j$. In the intermediate steps, a colon denotes the trace/Frobenius product, i.e. $\,\,A:B={\rm tr}(A^TB)$
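The result $\nabla_A f = f_z\,x^T$ can be sanity-checked numerically. The sketch below is only an illustration: the test function `f`, its hand-written gradient `grad_f_z`, the matrix sizes, and the step size `eps` are all assumptions made for the check and are not part of the derivation. It compares the outer-product formula against central finite differences taken entry by entry in $A$.

```python
import numpy as np

def f(z):
    # Hypothetical smooth test function (an assumption, chosen only for the check).
    return np.sum(np.sin(z)) + 0.5 * z @ z

def grad_f_z(z):
    # Gradient of the test function above with respect to z.
    return np.cos(z) + z

def grad_f_A(A, x):
    # Claimed result: grad_A f = f_z x^T, i.e. the outer product of f_z and x.
    z = A @ x
    return np.outer(grad_f_z(z), x)

def numerical_grad_A(A, x, eps=1e-6):
    # Central finite differences, perturbing one entry A_ij at a time.
    G = np.zeros_like(A)
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            E = np.zeros_like(A)
            E[i, j] = eps
            G[i, j] = (f((A + E) @ x) - f((A - E) @ x)) / (2 * eps)
    return G

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
x = rng.standard_normal(4)

analytic = grad_f_A(A, x)
numeric = numerical_grad_A(A, x)
print(np.allclose(analytic, numeric, atol=1e-6))
```

The two arrays should agree up to finite-difference error. Note that the gradient has the same shape as $A$, as the shape rule for the colon product requires.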
The cyclic property of the trace, together with its invariance under transposition, gives rise to some rules for rearranging terms in the Frobenius product, such as $$\eqalign{ A:BC &= AC^T:B \cr A:BC &= B^TA:C \cr A:BC &= BC:A \cr }$$ Also note that the matrices (or vectors) on each side of the colon operator must have the same shape, in precisely the same way that the matrices on each side of an elementwise/Hadamard product must have the same shape.
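As a quick check of these rearrangement rules, here is a minimal sketch using random matrices. The helper name `frob` and the chosen dimensions are assumptions made for illustration; the helper simply evaluates ${\rm tr}(P^TQ)$ and insists that both arguments have the same shape.

```python
import numpy as np

def frob(P, Q):
    # Frobenius product P:Q = tr(P^T Q); both sides must have the same shape.
    assert P.shape == Q.shape, "operands of ':' must have the same shape"
    return np.trace(P.T @ Q)

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((3, 5))
C = rng.standard_normal((5, 4))  # so that B @ C has the same shape as A

lhs = frob(A, B @ C)
rule1 = frob(A @ C.T, B)  # A:BC = AC^T:B, both operands are 3x5
rule2 = frob(B.T @ A, C)  # A:BC = B^T A:C, both operands are 5x4
rule3 = frob(B @ C, A)    # A:BC = BC:A, the product is commutative

print(np.allclose([rule1, rule2, rule3], lhs))
```

Each rule moves a factor across the colon while keeping the shapes on both sides equal. This is the same move used in the derivation to turn $f_z:dA\,x$ into $f_z\,x^T:dA$.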