I doubt this is solvable at all, but I thought I will give a try. Essentially I am trying to extend Gauss-Newton algorithm to 2nd Taylor term.
http://en.wikipedia.org/wiki/Levenberg–Marquardt_algorithm
Just ignore lambda.
I used same steps and came to a dead end...
$$\frac{\partial}{\partial X} \| (I \otimes X^\top)AX + BX + C\|^2$$
Is there a solution to this?
Kind regards, Dominykas
For convenience, define $F=(I\otimes X^T)AX+BX+C$
You want to find the derivative of $\|F\|^2_F = F:F$
Start with the differential $$\eqalign{ d(F:F) &= 2\,F:dF \cr &= 2\,F:\Big[(I\otimes dX^T)AX+(I\otimes X^T)A\,dX+B\,dX\Big] \cr &= 2\,F:(I\otimes dX^T)AX+2\,F:(I\otimes X^T)A\,dX+2\,F:B\,dX \cr &= 2\,AXF^T:(I\otimes dX)+2\,A^T(I\otimes X)F:dX+2\,B^TF:dX \cr }$$ The tricky bit is the first term on the RHS. Assume that we know the Kronecker expansion $AXF^T=\sum_k G_k\otimes H_k$, where ($G_k,H_k$) have the same dimensions as ($I,X$) respectively. And use this to simplify the first term $$\eqalign{ (AXF^T):(I\otimes dX) &= \sum_k (G_k\otimes H_k):(I\otimes dX) \cr &= \sum_k (G_k:I)\otimes(H_k:dX) \cr &= \bigg(\sum_k {\rm tr}(G_k)\,H_k\bigg):dX \cr &= Y:dX \cr }$$ Substituting $$\eqalign{ d\,\|F\|^2 &= 2\,Y:dX+2\,A^T(I\otimes X)F:dX+2\,B^TF:dX \cr\cr \frac {\partial\,\|F\|^2} {\partial X} &= 2\,Y+2\,A^T(I\otimes X)F+2\,B^TF \cr }$$ So there is an answer to your question, but finding the Kronecker factors of $AXF^T$ at each step of your optimization routine is going to be expensive.