My function is $f(W)=\operatorname{trace}(AW^TW)$, where $A$ is an arbitrary matrix of size $r \times r$ and $W$ is the variable matrix of size $m\times r$. I found that the gradient is given by $WA$, but I am a bit stuck on the Hessian matrix. I think it should be $\operatorname{kron}(A^T, I)$, but I am not sure about that.
Could anyone explain?
You can use $\mathrm{vec}(\cdot)$ (column stacking) and obtain
\begin{align}
\mathrm{Tr}(AW^{\top}W) &= \mathrm{Tr}(WAW^{\top}) \\
&= \mathrm{vec}(W^{\top})^{\top}\mathrm{vec}(AW^{\top}) \\
&= \mathrm{vec}(W^{\top})^{\top}(I_m\otimes A)\,\mathrm{vec}(W^{\top}).
\end{align}
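
To finish the computation and check it numerically, here is a minimal NumPy sketch. It relies on the standard fact that the Hessian of a quadratic form $x^{\top}Mx$ is $M+M^{\top}$, which gives $I_m\otimes(A+A^{\top})$ with respect to $\mathrm{vec}(W^{\top})$ and, equivalently, $(A+A^{\top})\otimes I_m$ with respect to $\mathrm{vec}(W)$. The helper names `f`, `vec`, and `numerical_hessian`, the sizes, and the random test matrices are assumptions made only for this illustration.

```python
import numpy as np

def f(W, A):
    """f(W) = trace(A W^T W)."""
    return np.trace(A @ W.T @ W)

def vec(X):
    """Column-stacking vectorization."""
    return X.reshape(-1, order="F")

rng = np.random.default_rng(0)
m, r = 4, 3                       # illustrative sizes
A = rng.standard_normal((r, r))   # generic, non-symmetric A
W = rng.standard_normal((m, r))

# Quadratic-form identity: f(W) = vec(W^T)^T (I_m kron A) vec(W^T)
x = vec(W.T)
quad = x @ np.kron(np.eye(m), A) @ x
identity_holds = np.isclose(f(W, A), quad)

# Hessian of x^T M x is M + M^T
H_wt = np.kron(np.eye(m), A + A.T)   # with respect to vec(W^T)
H_w = np.kron(A + A.T, np.eye(m))    # with respect to vec(W)

def numerical_hessian(W, A, h=1e-2):
    """Central-difference Hessian of f with respect to vec(W)."""
    n = W.size
    E = np.eye(n)
    H = np.zeros((n, n))
    for i in range(n):
        Ei = E[i].reshape(W.shape, order="F")   # i-th unit direction as a matrix
        for j in range(n):
            Ej = E[j].reshape(W.shape, order="F")
            H[i, j] = (f(W + h * Ei + h * Ej, A) - f(W + h * Ei - h * Ej, A)
                       - f(W - h * Ei + h * Ej, A) + f(W - h * Ei - h * Ej, A)) / (4 * h * h)
    return H

print(identity_holds)
print(np.allclose(numerical_hessian(W, A), H_w, atol=1e-6))
```

For a non-symmetric $A$ this differs from $\operatorname{kron}(A^T, I)$; when $A$ is symmetric it reduces to $2\,(A\otimes I_m)$.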