Derivative of 'inside' matrix in the squared Frobenius norm of a product of three matrices


How does one approach finding $\nabla_X\|AXB\|_F^2$?

Normally I would expand to $\|AXB\|_F^2=\text{Trace}(B^TX^TA^TAXB)$ and rearrange using the cyclic property of traces to make $X$ the first and/or last matrix in the product, but this isn't possible here.

$A$ is wide and $B$ is tall; both are full-rank. $X$ is square. All matrices are real-valued.


There are 2 best solutions below

0
On BEST ANSWER

Look up formula (116) in the Matrix Cookbook:

$$\frac{\partial}{\partial X} \operatorname{tr}( B^T X^T C X B ) = C^T X B B^T + C X B B^T$$

Setting $C = A^TA$, which is symmetric, makes the two terms equal, so the answer is:

$$2 A^T A X B B^T$$
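A quick numerical sanity check of this result, as a minimal sketch in NumPy: it compares $2A^TAXBB^T$ against a central finite-difference approximation of the gradient. The function names (`frob_sq`, `grad_closed_form`, `grad_finite_diff`), the matrix sizes, the seed, and the step size `eps` are all illustrative assumptions, not part of the question.

```python
import numpy as np

def frob_sq(A, X, B):
    """phi(X) = ||A X B||_F^2, computed as the sum of squared entries."""
    return np.sum((A @ X @ B) ** 2)

def grad_closed_form(A, X, B):
    """Closed-form gradient 2 A^T A X B B^T."""
    return 2.0 * A.T @ A @ X @ B @ B.T

def grad_finite_diff(A, X, B, eps=1e-6):
    """Central-difference approximation, perturbing one entry of X at a time."""
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = eps
            G[i, j] = (frob_sq(A, X + E, B) - frob_sq(A, X - E, B)) / (2 * eps)
    return G

# Hypothetical sizes matching the question: A wide, B tall, X square.
rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((2, n))
B = rng.standard_normal((n, 3))
X = rng.standard_normal((n, n))

# Largest entrywise disagreement between the two gradients.
max_err = np.max(np.abs(grad_closed_form(A, X, B) - grad_finite_diff(A, X, B)))
print(max_err)
```

Because the objective is quadratic in $X$, the central difference is exact up to floating-point roundoff, so the printed discrepancy should be tiny compared with the size of the gradient entries.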

0
On

Use a colon as a convenient product notation for the trace $$\eqalign{ A:B &= {\rm Tr}(A^TB) \\ A:A &= \big\|A\big\|^2_F \\ }$$ and, for typing convenience, define the matrix $\;Y = AXB$.

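Write the cost function in terms of this new variable, then calculate its differential and gradient. The last step of the differential moves $A$ and $B$ across the colon using the cyclic property of the trace, $P:(Q\,R\,S) = Q^TPS^T:R$, which isolates $dX$ even though $X$ sits in the middle of the product. $$\eqalign{ \phi &= \big\|Y\big\|^2_F = Y:Y \\ d\phi &= 2Y:dY = 2Y:A\,dX\,B = 2A^TYB^T:dX \\ \frac{\partial \phi}{\partial X} &= 2A^TYB^T = 2A^TAXBB^T \\ }$$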
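The rearrangement $Y:A\,dX\,B = A^TYB^T:dX$ used in the differential can be checked numerically. The following is a small NumPy sketch under assumed shapes; the helper `colon`, the seed, and the dimensions are hypothetical choices for illustration only.

```python
import numpy as np

def colon(P, Q):
    """Frobenius inner product P:Q = Tr(P^T Q)."""
    return np.trace(P.T @ Q)

# Hypothetical sizes: A wide, B tall, X and dX square.
rng = np.random.default_rng(1)
n = 5
A = rng.standard_normal((3, n))
B = rng.standard_normal((n, 2))
X = rng.standard_normal((n, n))
dX = rng.standard_normal((n, n))  # arbitrary perturbation direction

Y = A @ X @ B

# Left side: Y : (A dX B).  Right side: (A^T Y B^T) : dX.
lhs = colon(Y, A @ dX @ B)
rhs = colon(A.T @ Y @ B.T, dX)
gap = abs(lhs - rhs)
print(lhs, rhs, gap)
```

The two sides should agree to within roundoff for any `dX`, which is what allows the gradient $2A^TYB^T$ to be read off directly from the differential.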