I'm trying to derive a very simple matrix derivative:
Take the derivative of $\operatorname{Tr}(A' X)$ with respect to $X$.
However, I got two different answers by following different methods.
First method (vec routine): $\operatorname{Tr}(A' X) = \operatorname{vec}(A)' \operatorname{vec}(X)$, so that $d\operatorname{Tr}(A'X) = \operatorname{vec}(A)' \, d\operatorname{vec}(X)$, and therefore the derivative is $\operatorname{vec}(A)'$.
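The identity used in the first method can be checked numerically. The following is only a minimal sketch in plain Python (no libraries); the helper names `vec`, `transpose`, `matmul`, and `trace` and the matrices `A` and `X` are made up for illustration, and `vec` is assumed to stack columns.

```python
# Check Tr(A^T X) == vec(A)^T vec(X) on one small example.
# Matrices are lists of rows; all helper names here are illustrative.

def transpose(M):
    """Return the transpose of M."""
    return [list(col) for col in zip(*M)]

def matmul(P, Q):
    """Return the matrix product P Q."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)]
            for row in P]

def trace(M):
    """Return the sum of the diagonal entries of a square matrix."""
    return sum(M[i][i] for i in range(len(M)))

def vec(M):
    """Stack the columns of M into one flat list (column-major order)."""
    rows, cols = len(M), len(M[0])
    return [M[i][j] for j in range(cols) for i in range(rows)]

A = [[1, 2], [3, 4], [5, 6]]
X = [[7, 8], [9, 10], [11, 12]]

lhs = trace(matmul(transpose(A), X))               # Tr(A^T X)
rhs = sum(a * x for a, x in zip(vec(A), vec(X)))   # vec(A)^T vec(X)
print(lhs, rhs)  # prints: 217 217
```

Both sides are just the sum of the products $A_{ij} X_{ij}$, which is why the identity holds for any pair of matrices of the same shape.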
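Second method (element-wise): writing $E_{ij} = e_i e_j^{T}$ for the matrix with a $1$ in position $(i,j)$ and zeros elsewhere,
$$\frac{\partial \operatorname{Tr}(A^{T}X)}{\partial X_{ij}} = \operatorname{Tr}(A^{T}E_{ij}) = \operatorname{Tr}(e_{j}^{T}A^{T}e_{i}) = A_{ij}$$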
And therefore the derivative is $A$.
The first method follows the idea by Steven W. Nydick in this material.
The second one is introduced in 'Kronecker product and its application'.
Why do these two methods give me different answers? Which one is correct?
P.S. I guess I am a little bit confused by how the first method transforms the matrix derivative into a vector derivative. Steven's material says that the matrix derivative of $f(X)$ w.r.t. $X$ is equal to the derivative of $\operatorname{vec}(f(X))$ w.r.t. $\operatorname{vec}(X)$.
Thanks.
P.P.S. Thanks to the help of user1551, I see that it is a layout issue. But I still cannot see how to transform between $d(f(X))/d(X)$ and $d(\operatorname{vec}(f(X)))/d(\operatorname{vec}(X))$ when $f$ returns a matrix.
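Both are correct. They differ only in the layout of the derivative: the entries are the same partial derivatives $\partial \operatorname{Tr}(A^{T}X)/\partial X_{ij} = A_{ij}$ in either case. In the first method, you group the partial derivatives into a vector, $\operatorname{vec}(A)'$; in the second one, you group them into a matrix of the same shape as $X$, namely $A$. When people do usual multivariate calculus, they tend to use the first form; when they do matrix calculus, they tend to adopt the second form.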