How to calculate the derivative of $f(M) = \operatorname{Trace}(M^pX)$ with respect to $M$?

132 Views Asked by At

I have been thinking about this problem for a couple of months, and eventually failed. Could someone help me?

Let $M$ and $X$ be two symmetric matrices with $M\succeq 0$ and $X=X^T$, and let $p$ be a nonzero real number $|p|\le 10$. What is the derivative of $f$ with respect to $M$, where

$$f(M) = \operatorname{Trace}(M^pX),$$

and $\operatorname{Trace}(A)$ is the trace operator to calculate the sum of elements in diagonal line of $A$.

2

There are 2 best solutions below

4
On

Use a colon to denote the trace/Frobenius product $$\eqalign{ A:B = {\rm Tr}(A^TB) = {\rm Tr}(AB^T) }$$ From this definition and the cyclic property one can deduce the rearrangement rules $$\eqalign{ A:B &= B:A = B^T:A^T \\ A:BC &= B^TA:C = AC^T:B = I:A^TBC = I:BCA^T \\ }$$ If the matrices are symmetric, one can omit the transposes.

Consider the following function defined for integer $n$ values, for which one can calculate the differential and gradient. $$\eqalign{ f &= X:A^n \\ df &= X:\left(\sum_{k=0}^{n-1} A^{k}\,dA\,A^{n-k-1}\right) \\ &= \left(\sum_{k=0}^{n-1} A^{k}\,X\,A^{n-k-1}\right):dA \\ \frac{\partial f}{\partial A} &= \left(\sum_{k=0}^{n-1} A^{k}\,X\,A^{n-k-1}\right) \;\doteq\; G \qquad\{{\rm the\,gradient}\} \\\\ }$$ To extend this result to the matrix $M$ with a rational exponent $p=\ell/m$
define the matrix $A$ such that $$A^n = M^{\ell/m}\;\implies\; A^{mn}=M^{\ell}$$ Then find an expression for $dA$ in terms of $dM$, i.e. $$\eqalign{ \left(\sum_{k=0}^{mn-1} A^{k}\,dA\,A^{mn-k-1}\right) &= \left(\sum_{j=0}^{\ell-1} M^{j}\,dM\,M^{\ell-j-1}\right) \\ }$$ To avoid tensors, vectorize the matrix expressions $$\eqalign{ &a = {\rm vec}(A),\qquad m = {\rm vec}(M) \\ &\left(\sum_{k=0}^{mn-1} A^{k}\otimes A^{mn-k-1}\right)\,da = \left(\sum_{j=0}^{\ell-1} M^{j}\otimes M^{\ell-j-1}\right)\,dm \\ }$$ which can be abbreviated to $\,\big(B\,da = C\,dm\big).\,$ We'll also need two more vectorizations $$\eqalign{ x &= {\rm vec}(X),\qquad g &= {\rm vec}(G) &= \left(\sum_{k=0}^{n-1} A^{k}\otimes A^{n-k-1}\right)x \;\doteq\; Ex \\ }$$ Substitute all of this into the previous result to obtain $$\eqalign{ df &= g:da \\&= Ex:B^{-1}C\,dm \\&= CB^{-1}Ex:dm \\ \frac{\partial f}{\partial m} &= CB^{-1}Ex \quad\implies\quad \frac{\partial f}{\partial M} = {\rm vec}^{-1}\big(CB^{-1}Ex\big) \\ }$$ For non-rational exponents, it'll be even harder.

Update

The comments pointed out that the above formula only works for $n>0$. To handle negative exponents, you can do the following.

Write the function in terms of the inverse matrix $V=A^{-1},\;$ calculate its differential in terms of $dV.\;$ Finally, substitute $dV=-V\,dA\,V,\;$ i.e. $$\eqalign{ f &= X:A^{-n} \\&= X:V^n \\ df &= \left(\sum_{k=0}^{n-1} V^{k}\,X\,V^{n-k-1}\right):dV \\ &= -\left(\sum_{k=0}^{n-1} V^{k}\,X\,V^{n-k-1}\right):V\,dA\,V \\ &= -\left(\sum_{k=0}^{n-1} V^{k+1}\,X\,V^{n-k}\right):dA \\ \frac{\partial f}{\partial A} &= -\left(\sum_{k=0}^{n-1} V^{k+1}\,X\,V^{n-k}\right) \\ }$$
0
On

Let $f(A) = \operatorname{tr}(A)$ then

$df_A[H]=\operatorname{tr}H=(\operatorname{tr}\circ\operatorname{id})[H]=(\operatorname{tr}\circ\, dA_A)[H]$

for $A=M^pX$, and e.g., $p=2$, $dA_M=dM_M\,M\,X + M\,dM_M\,X$ and the chain rule gives (reusing letter $H$ for the auxiliary variable):

$df_M[H]=(\operatorname{tr}\circ\operatorname{dA_M})[H]=(\operatorname{tr}\circ\,(dM_M\,M\,X + M\,dM_M\,X))[H]=\operatorname{tr}(H\,M\,X + M\,H\,X)$

If products of three symmetric matrices are considered, any permutation is allowed, therefore

$df_M[H]=2\operatorname{tr}(M\,X\,H)=2\operatorname{tr}(M\,X\,\circ\,\operatorname{id})[H]$

More generally, $$d(\operatorname{tr}(M^pX))=p\operatorname{tr}(M^{p-1}X\circ\,\operatorname{id})$$