I have the following problem:
Find the matrix derivative $\frac{dY}{dX}$, where $Y=(X')^2B$, matrix $X$ is $p \times q$ and $B$ is a given matrix.
I have gotten this far:
By the definition of the matrix derivative we can write:
$$ \frac{dY}{dX} =\frac{d}{dvec'X} \otimes vec(Y)= \frac{d}{dvec'X} \otimes vec((X')^2B) $$
Using the vec property (v)
$$ vec(ABC) = (C'\otimes A)vec(B) $$
We can write the element $vec((X')^2B) = vec(X'X'B)\underbrace{=}_{(v)} (B'\otimes X')vec(X')$
so we get
$$ \frac{dY}{dX}=\frac{d}{dvec'X} \otimes \Big[(B'\otimes X')vec(X')\Big] $$
Using the Kronecker product property
$$ (A\otimes C)\cdot (B\otimes D) = (AB)\otimes(CD) $$
we can write
$$ 1\cdot \frac{d}{dvec'X} \otimes \Big[(B'\otimes X')vec(X')\Big] = \Big[1\otimes (B'\otimes X')\Big]\Big[\frac{d}{dvec'X} \otimes vec(X')\Big] = \Big[B'\otimes X'\Big]\Big[\frac{d}{dvec'X} \otimes vec(X')\Big] $$
So I feel like I am almost there, but I don't understand the concept and notation of matrix differentiation, or the difference between $vec'X$ and $vec(X')$. I feel like we can somehow cancel out the last product...
Any tips appreciated!
Don't apply vectorization too early in the process.
The first step is to calculate the differential of your function.
$$\eqalign{ Y &= X^TX^TB \\ dY &= \color{red}{dX^T}X^TB + X^T\color{red}{dX^T}B \\ }$$
The second step is vectorization.
$$\eqalign{ \operatorname{vec}(dY) &= \left((X^TB)^T\otimes I\right)\operatorname{vec}(dX^T) + \left(B^T\otimes X^T\right)\operatorname{vec}(dX^T) \\ &= \left(B^TX\otimes I + B^T\otimes X^T\right)K\operatorname{vec}(dX) \\ }$$
where $K$ is the commutation matrix associated with Kronecker products, i.e.
$$\eqalign{ \operatorname{vec}(A^T) &= K\operatorname{vec}(A) \\ }$$
Now it's a simple matter to identify the gradient as
$$\eqalign{ \frac{\partial\operatorname{vec}(Y)}{\partial\operatorname{vec}(X)} &= \left(B^TX\otimes I + B^T\otimes X^T\right)K \\ }$$
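If you want to convince yourself of the result, a numerical check is easy to set up. The following is only a sketch under a few assumptions of mine: NumPy is available, $\operatorname{vec}$ stacks columns (column-major order), and $X$ is square, which is needed for $X^TX^T$ to be defined. The helper names `vec`, `commutation_matrix`, `jacobian_formula` and `jacobian_numeric` are made up for illustration.

```python
import numpy as np


def vec(A):
    """Stack the columns of A into one long vector (column-major order)."""
    return A.reshape(-1, order="F")


def commutation_matrix(p, q):
    """Return K such that K @ vec(A) == vec(A.T) for any p x q matrix A."""
    K = np.zeros((p * q, p * q))
    for i in range(p):
        for j in range(q):
            # A[i, j] sits at index i + j*p in vec(A)
            # and at index j + i*q in vec(A.T).
            K[j + i * q, i + j * p] = 1.0
    return K


def jacobian_formula(X, B):
    """Closed form (B^T X kron I + B^T kron X^T) K for Y = X^T X^T B."""
    n = X.shape[0]
    K = commutation_matrix(n, n)
    return (np.kron(B.T @ X, np.eye(n)) + np.kron(B.T, X.T)) @ K


def jacobian_numeric(X, B, eps=1e-6):
    """Central finite differences of vec(Y) with respect to vec(X)."""
    def f(Z):
        return vec(Z.T @ Z.T @ B)

    J = np.zeros((f(X).size, X.size))
    for k in range(X.size):
        step = np.zeros(X.size)
        step[k] = eps
        dX = step.reshape(X.shape, order="F")  # perturb the k-th entry of vec(X)
        J[:, k] = (f(X + dX) - f(X - dX)) / (2 * eps)
    return J


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((3, 3))  # square, so that X^T X^T is defined
    B = rng.standard_normal((3, 2))
    print(np.allclose(jacobian_formula(X, B), jacobian_numeric(X, B), atol=1e-6))
```

Since $Y$ is quadratic in $X$, the central difference has no truncation error beyond rounding, so the two Jacobians should agree to within a loose tolerance.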