Consider a matrix $A(x)\in \mathbb{R}^{m,p}$, where $x\in \mathbb{R}^{n}$. The partial derivative of $A(x)$ with respect to $x$ is an $m\times pn$ matrix, i.e. $\frac{\partial A(x)}{\partial x} \in \mathbb{R}^{m,pn}$. This is proven in the following paper, where numerator layout is used. The paper also proves the product rule $$\frac{\partial }{\partial x} (A(x)B(x))=\frac{\partial A(x)}{\partial x}(B(x)\otimes I_n) + A(x) \frac{\partial B(x)}{\partial x}$$
where $I_n$ is the identity matrix of size $n$. Following the derivation in the paper, I confirmed that for a constant vector $b\in\mathbb{R}^{p}$ the following also holds: $$\frac{\partial }{\partial x} (A(x)b)=\frac{\partial A(x)}{\partial x}(b\otimes I_n) $$
Now I am trying to use this result, but I'm getting the wrong matrix size. Here is my problem. Let $X=\operatorname{diag}(x) \in \mathbb{R}^{n,n}$ and $A(x) = GX$, where $G\in\mathbb{R}^{n,n}$ and $p=n$, i.e. now $b\in\mathbb{R}^n$. The following should hold: $$\frac{\partial}{\partial x}(A(x)b)=\frac{\partial (GX)}{\partial x}(b\otimes I_n)$$ where $(b\otimes I_n)\in\mathbb{R}^{n^2,n}$.

The issue is taking the partial derivative of $GX$ with respect to $x$. So far I have tried $$\frac{\partial}{\partial X}(GX) = \frac{\partial}{\partial \operatorname{vec}(X)}(I_n \otimes G)\operatorname{vec}(X) = (I_n \otimes G)$$ and, knowing that $X$ is diagonal, I changed the Kronecker product to a Khatri-Rao product, giving $(I_n\odot G)\in\mathbb{R}^{n^2,n}$ (though I'm not sure about this step). The problem is that when I try to combine the resulting terms, the product $(I_n\odot G)(b\otimes I_n)$ cannot be computed because the sizes do not match. If I had the transpose of $(I_n\odot G)$ instead, the product would be defined, and its size, $n\times n^2$, would match what the first derivative result from the paper predicts.
Can anyone please show me where I'm going wrong?
Thanks
UPDATE: In my example, if I use the identity $\operatorname{vec}(ABC)=(C^T\otimes A)\operatorname{vec}(B)$, the derivative becomes $$ \frac{\partial}{\partial \operatorname{vec}(X)}\operatorname{vec}(GXb) = \frac{\partial}{\partial \operatorname{vec}(X)}(b^T\otimes G)\operatorname{vec}(X)=(b^T\otimes G)$$ and, since $X$ is diagonal, we can change to a Khatri-Rao product, leading to $(b^T\odot G)$. In my application, I need to get $G$ out as a separate matrix, thus the preference for the original answer's formulation. After some workarounds, I found that the answer using my original formulation should be $$ GP^T(b\otimes I_n)$$ where $P = [e_1,e_{n+2},e_{2n+3}, \ldots, e_{n^2}] \in \mathbb{R}^{n^2,n}$ and $e_k \in \mathbb{R}^{n^2,1}$ is a column vector with a 1 in position $k$ and zeros elsewhere. Computing the result shows that $$ GP^T(b\otimes I_n)=(b^T\odot G)$$ The problem remains that I cannot manage to show that $$\frac{\partial}{\partial x}(A(x))=\frac{\partial}{\partial x}(GX) = GP^T \in \mathbb{R}^{n,n^{2}}$$ which makes sense size-wise.
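The identity $GP^T(b\otimes I_n)=(b^T\odot G)$ can be sanity-checked numerically. The NumPy sketch below is only an illustration: the size `n = 4`, the random `G`, `b`, `x`, and the helper `f` are assumptions, and the Khatri-Rao product of the row $b^T$ with $G$ is taken to be the column-wise one, whose $j$-th column is $b_j$ times the $j$-th column of $G$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4                                   # illustrative size
G = rng.standard_normal((n, n))
b = rng.standard_normal(n)
x = rng.standard_normal(n)

def f(x):
    """A(x) b with A(x) = G diag(x)."""
    return G @ np.diag(x) @ b

# Central-difference Jacobian: rows index outputs, columns index x_k.
eps = 1e-6
J_fd = np.column_stack(
    [(f(x + eps * e) - f(x - eps * e)) / (2 * eps) for e in np.eye(n)]
)

# P picks the diagonal entries of X out of vec(X): 0-based positions k*n + k,
# i.e. the 1-based positions 1, n+2, 2n+3, ..., n^2.
P = np.eye(n * n)[:, [k * n + k for k in range(n)]]      # shape (n^2, n)

J_formula = G @ P.T @ np.kron(b.reshape(-1, 1), np.eye(n))
J_khatri_rao = G * b                    # column j equals b_j * G[:, j]

print(np.allclose(J_formula, J_khatri_rao))
print(np.allclose(J_fd, J_formula, atol=1e-6))
```

Both comparisons should agree with $G\operatorname{diag}(b)$, since $P^T(b\otimes I_n)=\operatorname{diag}(b)$.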
$ \def\LR#1{\left(#1\right)} \def\vecc#1{\operatorname{vec}\LR{#1}} \def\grad#1#2{\frac{\partial #1}{\partial #2}} \def\m#1{\left[\begin{array}{c}#1\end{array}\right]} \def\v#1{{\bf{#1}}} $The linked paper is terrible. It defines the gradient in terms of the Kronecker product as $$\grad{A}{x} \;=\; \LR{A\otimes\grad{}{x^T}}$$ This introduces either the insane convention of having the differential operator acting on the variable to its left or else a re-definition of the Kronecker product.
Here is a much better paper which covers integration as well as differentiation. It utilizes MacRae's (1974) definition, which in terms of the standard Kronecker product (and a differential operator acting on the variable to its right) is $$\grad{A}{x} \;=\; \LR{\grad{}{x}\otimes A}$$ Magnus and Neudecker recommend vectorizing all matrices, leading to the simple convention $$\grad{A}{x} \;\to\; {\grad{\vecc{A}}{x^T}}$$ However, the best approach is probably plain old index notation $$\LR{\grad{A}{x}}_{ijk} \;=\; \grad{A_{ij}}{x_k}$$ Unless Kang's notation is widespread in your particular field of study, I would recommend reading papers by other researchers. Seriously, any journal article by any author will be better than this paper.
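As a sketch of how index notation handles the example from the question (assuming $A = G\,\operatorname{diag}(x)$ with $G$ and $b$ constant), the components are $A_{ij} = G_{ij}\,x_j$ with no sum over $j$, so $$\frac{\partial A_{ij}}{\partial x_k} \;=\; G_{ij}\,\delta_{jk}$$ and therefore $$\frac{\partial (Ab)_i}{\partial x_k} \;=\; \sum_j \frac{\partial A_{ij}}{\partial x_k}\,b_j \;=\; \sum_j G_{ij}\,\delta_{jk}\,b_j \;=\; G_{ik}\,b_k$$ In other words, the Jacobian of $Ab$ is $G\operatorname{diag}(b)$, obtained without choosing a Kronecker layout at all. Any flattened matrix form can then be read off from the three-index array $G_{ij}\,\delta_{jk}$ by deciding how to order the indices.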
Update
Just to be clear, here are the standard Kronecker products between a matrix and a vector $$\eqalign{ {\v{A}}\otimes{\v{x}} &= \m{ a_{11}\v{x} &\ldots &a_{1p}\v{x} \\ \vdots &\ddots &\vdots \\ a_{m1}\v{x} &\ldots &a_{mp}\v{x} \\ } \qquad\quad \v{x}\otimes\v{A} &= \m{x_1\v{A} \\ \vdots \\ x_n\v{A}} \\ }$$ Notice how the left factor is expanded before the right factor.
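The two block patterns can be reproduced with `np.kron`, which implements the standard Kronecker product. This is a small sketch with made-up values for `A` and `x`, chosen only so that the blocks are easy to spot.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])             # m = 2, p = 2 (illustrative values)
x = np.array([[10], [20], [30]])   # n = 3, stored as a column

A_kron_x = np.kron(A, x)   # each a_ij is replaced by the column a_ij * x
x_kron_A = np.kron(x, A)   # blocks x_1*A, x_2*A, x_3*A stacked vertically

# Both results have shape (6, 2), but the entries are ordered differently.
print(A_kron_x.shape, x_kron_A.shape)
print(A_kron_x)
print(x_kron_A)
```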
Now here is MacRae's definition of a matrix-by-vector gradient $$\eqalign{ \grad{\v{A}}{\v{x}} &= \m{\grad{\v{A}}{x_1} \\ \vdots \\ \grad{\v{A}}{x_n}} \;\equiv\; \LR{\grad{}{\v{x}}\otimes \v{A}} \\ }$$
and here is equation (8) from Kang's paper $$\eqalign{ \grad{\v{A}}{\v{x}} &= \m{ \grad{a_{11}}{\v{x}} &\grad{a_{12}}{\v{x}} &\ldots &\grad{a_{1p}}{\v{x}} \\ \grad{a_{21}}{\v{x}} &\grad{a_{22}}{\v{x}} &\ldots &\grad{a_{2p}}{\v{x}} \\ \vdots & \vdots & \ddots & \vdots \\ \grad{a_{m1}}{\v{x}} &\grad{a_{m2}}{\v{x}} &\ldots &\grad{a_{mp}}{\v{x}} \\ } \;\overset{?}{=}\; \LR{\v{A}\otimes\grad{}{\v{x^T}}} \\ }$$ The Kronecker product (unlike the matrix product) isn't reordered when transposed $$\LR{A\otimes B}^T \;=\; \LR{A^T\otimes B^T} \;\ne\; \LR{B^T\otimes A^T}$$ so my objection is not simply about the layout convention, which involves transposing the gradient as a whole, i.e. $$\LR{\grad{y}{x}} \quad-{\rm vs}-\quad \LR{\grad{y}{x}}^T$$ but rather the fundamental behavior of differential operators and Kronecker products.
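To make the difference concrete, both layouts can be built from the same three-index array of partial derivatives. In this NumPy sketch the array `D` (with `D[i, j, k]` standing in for $\partial a_{ij}/\partial x_k$), its random values, and the sizes `m, p, n` are all illustrative assumptions; the last lines check the transposition rule for the Kronecker product.

```python
import numpy as np

rng = np.random.default_rng(0)
m, p, n = 2, 3, 4
# D[i, j, k] plays the role of d a_ij / d x_k (random, purely illustrative).
D = rng.standard_normal((m, p, n))

# MacRae layout: the blocks dA/dx_1, ..., dA/dx_n stacked vertically.
macrae = np.concatenate([D[:, :, k] for k in range(n)], axis=0)   # (n*m, p)

# Equation (8) layout: each a_ij replaced by the 1 x n row of its partials.
eq8 = D.reshape(m, p * n)                                         # (m, p*n)

print(macrae.shape, eq8.shape)   # one is not the transpose of the other

# Transposing a Kronecker product keeps the order of the factors.
A = rng.standard_normal((m, p))
B = rng.standard_normal((n, 1))
kron_T = np.kron(A, B).T
keeps_order = np.allclose(kron_T, np.kron(A.T, B.T))
swaps_order = np.allclose(kron_T, np.kron(B.T, A.T))
print(keeps_order, swaps_order)
```

With these sizes the two layouts have shapes `(8, 3)` and `(2, 12)`, so no overall transpose converts one into the other.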