I am looking for the gradient of the following cost function
$||T^{-1}AT|| + ||T^{-1}BT||$
with respect to $T$, where $A$ and $B$ are real square matrices and $T$ is an invertible change of coordinates of the same dimension. The norm can be either the Frobenius norm or the 2-norm (spectral norm).
My problem arises because the transformation that diagonalizes $A$, or the Jordan canonical transformation, which minimizes its norm (under certain assumptions), can cause $||B||$ to explode, and vice versa. I believe finding a global minimum is very hard, perhaps impossible, so I am looking for any mathematical insight and algorithms that could help me consistently find similar local minima.
Thank you
Let's use a colon to denote the trace/Frobenius product,
$$\eqalign{ A:B &= {\rm tr\,}(A^TB) \cr \|A\|_F^2 &= {\rm tr\,}(A^TA) = A:A \cr }$$

And let's define the variables
$$\eqalign{ X &= T^{-1}AT &\implies dX = T^{-1}\big(A\,dT-dT\,X\big) \cr Y &= T^{-1}BT &\implies dY = T^{-1}\big(B\,dT-dT\,Y\big) \cr \alpha^2 &= \|T^{-1}AT\|_F^2 &= \|X\|_F^2 = X:X \cr \beta^2 &= \|T^{-1}BT\|_F^2 &= \|Y\|_F^2 = Y:Y \cr }$$

Find the differential and then the gradient of $\alpha$ (assuming $\alpha \ne 0$), using $A^TT^{-T} = T^{-T}X^T$ in the last step of the differential:
$$\eqalign{ 2\alpha\,d\alpha &= 2X:dX \cr &= 2X:T^{-1}\big(A\,dT-dT\,X\big) \cr &= 2T^{-T}X:\big(A\,dT-dT\,X\big) \cr &= 2\big(A^TT^{-T}X-T^{-T}XX^T\big):dT \cr &= 2T^{-T}\big(X^TX-XX^T\big):dT \cr \frac{\partial\alpha}{\partial T} &= \alpha^{-1}T^{-T}\big(X^TX-XX^T\big) \cr }$$

The calculation for $\beta$ is similar and yields
$$\eqalign{ \frac{\partial\beta}{\partial T} &= \beta^{-1}T^{-T}\big(Y^TY-YY^T\big) \cr }$$

So, if we choose the Frobenius norm, then your cost function $(\phi)$ and its gradient are given by
$$\eqalign{ \phi &= \alpha + \beta \cr \frac{\partial\phi}{\partial T} &= \frac{\partial\alpha}{\partial T} + \frac{\partial\beta}{\partial T} \cr &= T^{-T}\Bigg(\frac{X^TX-XX^T}{\|X\|_F} + \frac{Y^TY-YY^T}{\|Y\|_F}\Bigg) \cr }$$

Note that the cyclic property of the trace gives us several ways to rearrange the terms in a Frobenius product. For example, all of the following are equivalent:
$$\eqalign{ A:BC &= A^T:(BC)^T \cr &= BC:A \cr &= AC^T:B \cr &= B^TA:C \cr }$$
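As a sanity check, the Frobenius-norm gradient formula can be compared against central finite differences. The following is only a minimal NumPy sketch: the function names (`cost`, `grad`, `numerical_grad`), the random test matrices, and the step size `h` are illustrative assumptions, not part of the derivation, and it assumes $T$ is invertible and both $X$ and $Y$ are nonzero.

```python
import numpy as np

def cost(T, A, B):
    """phi(T) = ||T^{-1} A T||_F + ||T^{-1} B T||_F."""
    Tinv = np.linalg.inv(T)
    return (np.linalg.norm(Tinv @ A @ T, "fro")
            + np.linalg.norm(Tinv @ B @ T, "fro"))

def grad(T, A, B):
    """Closed-form gradient of phi with respect to T (Frobenius norm)."""
    Tinv = np.linalg.inv(T)
    X = Tinv @ A @ T
    Y = Tinv @ B @ T
    # Commutator-like terms X^T X - X X^T, each scaled by 1/||.||_F
    G = (X.T @ X - X @ X.T) / np.linalg.norm(X, "fro")
    G += (Y.T @ Y - Y @ Y.T) / np.linalg.norm(Y, "fro")
    return Tinv.T @ G  # left-multiply by T^{-T}

def numerical_grad(T, A, B, h=1e-6):
    """Entrywise central finite differences of phi (h is an assumed step)."""
    G = np.zeros_like(T)
    for i in range(T.shape[0]):
        for j in range(T.shape[1]):
            E = np.zeros_like(T)
            E[i, j] = h
            G[i, j] = (cost(T + E, A, B) - cost(T - E, A, B)) / (2 * h)
    return G

# Illustrative random test data (assumed, not from the question)
rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
T = np.eye(n) + 0.1 * rng.standard_normal((n, n))  # close to I, so invertible

max_err = np.max(np.abs(grad(T, A, B) - numerical_grad(T, A, B)))
print("max abs difference:", max_err)
```

If the two gradients agree to within finite-difference accuracy, `grad` can be passed to a generic local optimizer. Note also that the gradient vanishes whenever $X$ and $Y$ are both normal matrices, since then $X^TX = XX^T$ and $Y^TY = YY^T$.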