Matrix derivative of scalar function involving matrix square root


Let $X$ be a positive definite matrix with positive definite matrix square root $X^{1/2}$. Define $$y = \text{trace}(AX^{1/2})$$ for some known matrix $A$. What is ${\partial y}/{\partial X}$? I tried using this set of notes together with the square-root formula from here to evaluate it in the $2 \times 2$ case using index notation, but there must be a better way, especially one that generalizes to the $n \times n$ case. I would guess it is something like $$ \frac{\partial y}{\partial X} = \frac{1}{2} A^T X^{-1/2}$$ where $X^{-1/2}$ is the square root of $X^{-1}$.
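A quick way to test a conjectured gradient is to compare it against central finite differences. The following is a minimal sketch, assuming NumPy, a random SPD test matrix, and a hypothetical helper `sqrtm_principal` that forms the principal square root from an eigendecomposition (adequate near an SPD matrix, not a general-purpose `sqrtm`). It perturbs one entry of $X$ at a time, so the numerical gradient uses the convention $G_{ij}=\partial y/\partial X_{ij}$.

```python
import numpy as np

def sqrtm_principal(X):
    # Hypothetical helper: principal square root via an eigendecomposition.
    # Assumes X is diagonalizable with no eigenvalues on the closed negative
    # real axis, which holds for an SPD X and for small perturbations of it.
    w, V = np.linalg.eig(X)
    R = V @ np.diag(np.sqrt(w.astype(complex))) @ np.linalg.inv(V)
    return R.real  # imaginary parts are rounding noise for real X

def y(A, X):
    return np.trace(A @ sqrtm_principal(X))

def fd_gradient(A, X, h=1e-6):
    # Central differences, one entry at a time: G[i, j] ~ dy/dX[i, j]
    n = X.shape[0]
    G = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            E = np.zeros((n, n))
            E[i, j] = h
            G[i, j] = (y(A, X + E) - y(A, X - E)) / (2 * h)
    return G

def guess(A, X):
    # The conjectured formula (1/2) A^T X^{-1/2}
    return 0.5 * A.T @ np.linalg.inv(sqrtm_principal(X))

rng = np.random.default_rng(1)
n = 3
B = rng.standard_normal((n, n))
X = B @ B.T + n * np.eye(n)          # random SPD test matrix
I = np.eye(n)
A = rng.standard_normal((n, n))      # generic A, which generically does not commute with X

print(np.allclose(fd_gradient(I, X), guess(I, X), atol=1e-6))  # A = I
print(np.allclose(fd_gradient(A, X), guess(A, X), atol=1e-6))  # generic A
```

Under these assumptions the guess matches the finite differences for $A=I$ but not for the random $A$, which suggests it holds only in special cases, such as $A$ commuting with $X^{1/2}$.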

Best answer:

For convenience, define a new matrix $$S=X^{1/2}$$ Let's start by finding the differential of $X$ in terms of $S$. $$\eqalign{ X &= SS \cr dX &= dS\,S + S\,dS \cr dx &= (S^T\otimes I+I\otimes S)\,ds \cr ds &= (S\otimes I+I\otimes S)^{-1}\,dx = M\,dx \cr }$$ In the last two steps I've vectorized the results using the notation $ds={\rm vec}(dS)$ and $dx={\rm vec}(dX)$ together with the identity ${\rm vec}(PBQ)=(Q^T\otimes P)\,{\rm vec}(B)$, and used the fact that $S$ and $I$ are symmetric. The inverse exists because $S$ is positive definite: if $S$ has eigenvalues $\lambda_i>0$, then $S\otimes I+I\otimes S$ has eigenvalues $\lambda_i+\lambda_j>0$.
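A numerical sanity check of $ds=M\,dx$, written as a sketch that assumes NumPy and a random SPD $X$. Here `vec` stacks columns (Fortran order), which is the convention under which ${\rm vec}(dS\,S)=(S^T\otimes I)\,ds$ holds, and `spd_sqrt` is a hypothetical helper that takes the square root through `numpy.linalg.eigh`. The direction $dX$ is kept symmetric so that $X\pm t\,dX$ stays symmetric and `eigh` applies.

```python
import numpy as np

def vec(B):
    # Column-stacking vec(): vec(P B Q) = (Q^T kron P) vec(B) in this convention
    return B.reshape(-1, order="F")

def unvec(b, n):
    return b.reshape(n, n, order="F")

def spd_sqrt(X):
    # Hypothetical helper: square root of a symmetric positive definite matrix
    w, V = np.linalg.eigh(X)
    return (V * np.sqrt(w)) @ V.T

rng = np.random.default_rng(0)
n = 4
B = rng.standard_normal((n, n))
X = B @ B.T + n * np.eye(n)          # random SPD test matrix
S = spd_sqrt(X)
I = np.eye(n)

# M = (S kron I + I kron S)^{-1}
M = np.linalg.inv(np.kron(S, I) + np.kron(I, S))

# Symmetric direction dX, so X +/- t*dX stays symmetric and eigh applies
E = rng.standard_normal((n, n))
dX = (E + E.T) / 2
dS = unvec(M @ vec(dX), n)           # first-order change predicted by ds = M dx

t = 1e-6
fd = (spd_sqrt(X + t * dX) - spd_sqrt(X - t * dX)) / (2 * t)
print(np.abs(dS - fd).max())               # finite-difference error, should be tiny
print(np.abs(dS @ S + S @ dS - dX).max())  # dX = dS S + S dS, up to rounding
```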

Now we need a Kronecker decomposition of the matrix $M$. See the classic paper "Approximation with Kronecker Products" by Van Loan and Pitsianis, or Pitsianis' 1997 dissertation (which contains Matlab code). Despite the paper's title, the decomposition is exact when all of its terms are kept; an approximation only arises if the sum is truncated.

The matrix can be decomposed as $$\eqalign{ M &= \sum_{k=1}^r Y_k\otimes Z_k \cr }$$ where $r={\rm rank}(\widetilde{M})$ is the rank of the so-called tilde matrix of $M$. Forming $\widetilde{M}$ involves no arithmetic: it only reshapes and shuffles the elements of $M$, but it does change the rank. In this case we want the factors $\{Y_k, Z_k\}$ to be square matrices with the same dimensions as the other matrices $\{A, S, X\}$; the desired dimensions of the factors are one of the inputs to the tilde operation.
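For concreteness, here is the tilde operation for $n=2$, using one common convention (each row is a column-stacked block, with blocks taken in column-major order); any ordering works as long as the factors are read back the same way. A single Kronecker product becomes a rank-one matrix, which is why an SVD of $\widetilde{M}$ produces the factors. Here ${\rm unvec}$, a name introduced only for illustration, reshapes a length-$n^2$ vector back into an $n\times n$ matrix, undoing ${\rm vec}$. $$\eqalign{ M &= \pmatrix{M_{11} & M_{12} \cr M_{21} & M_{22}}, \qquad \widetilde{M} = \pmatrix{{\rm vec}(M_{11})^T \cr {\rm vec}(M_{21})^T \cr {\rm vec}(M_{12})^T \cr {\rm vec}(M_{22})^T} \cr M = Y\otimes Z &\;\Rightarrow\; M_{ij} = y_{ij}\,Z \;\Rightarrow\; \widetilde{M} = {\rm vec}(Y)\,{\rm vec}(Z)^T \cr \widetilde{M} = \sum_{k=1}^r \sigma_k\,u_k v_k^T &\;\Rightarrow\; Y_k = \sigma_k\,{\rm unvec}(u_k), \quad Z_k = {\rm unvec}(v_k) \cr }$$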

Substituting this decomposition into the previous expression and un-vectorizing each term with ${\rm vec}(Z\,B\,Y^T)=(Y\otimes Z)\,{\rm vec}(B)$ gives $$\eqalign{ ds &= M\,dx = \sum_{k=1}^r (Y_k\otimes Z_k)\,dx \cr dS &= \sum_{k=1}^r Z_k\,dX\,Y_k^T \cr }$$

Finally, let's write your function in terms of the inner/Frobenius product, i.e. $$A:B={\rm tr}(A^TB)$$ and find its differential. Since ${\rm tr}(AS)={\rm tr}\big((A^T)^TS\big)=A^T:S$, $$\eqalign{ y &= A^T:S \cr dy &= A^T:dS \cr &= A^T:\Bigg(\sum_{k=1}^r Z_k\,dX\,Y_k^T\Bigg) \cr &= \Bigg(\sum_{k=1}^r Z_k^T\,A^T\,Y_k\Bigg): dX \cr }$$ where the last step uses $P:(Z\,dX\,Y^T)={\rm tr}(Y^TP^TZ\,dX)=(Z^TP\,Y):dX$. This means that the gradient is $$\eqalign{ \frac{\partial y}{\partial X} &= \sum_{k=1}^r Z_k^T\,A^T\,Y_k \cr }$$
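Putting the pieces together, here is a minimal NumPy sketch of this recipe under stated assumptions: a random SPD $X$, a hypothetical `spd_sqrt` helper based on `numpy.linalg.eigh`, and hand-written (hypothetical) `rearrange`/`kron_decompose` functions that follow the rearrange-then-SVD idea rather than the Matlab code mentioned above. Because $y={\rm tr}(AS)=A^T:S$, the matrix sandwiched between $Z_k^T$ and $Y_k$ is $A^T$. The sketch checks the result two ways: against the Sylvester equation $SG+GS=A^T$, which is an equivalent characterization (${\rm vec}(G)=M^T{\rm vec}(A^T)=M\,{\rm vec}(A^T)$ since $M$ is symmetric), and against a finite-difference directional derivative.

```python
import numpy as np

def vec(B):
    # Column-stacking vec()
    return B.reshape(-1, order="F")

def spd_sqrt(X):
    # Hypothetical helper: square root of a symmetric positive definite matrix
    w, V = np.linalg.eigh(X)
    return (V * np.sqrt(w)) @ V.T

def rearrange(M, n):
    # Tilde matrix: row i + j*n holds vec(M_ij)^T, where M_ij is the (i, j)
    # n-by-n block of M, so that rearrange(Y kron Z) = vec(Y) vec(Z)^T.
    R = np.empty((n * n, n * n))
    for j in range(n):
        for i in range(n):
            R[i + j * n] = vec(M[i * n:(i + 1) * n, j * n:(j + 1) * n])
    return R

def kron_decompose(M, n, tol=1e-12):
    # SVD of the tilde matrix gives M = sum_k Y_k kron Z_k; singular values
    # below tol * (largest) are treated as zero when choosing r.
    U, sig, Vt = np.linalg.svd(rearrange(M, n))
    r = int(np.sum(sig > tol * sig[0]))
    Ys = [sig[k] * U[:, k].reshape(n, n, order="F") for k in range(r)]
    Zs = [Vt[k].reshape(n, n, order="F") for k in range(r)]
    return Ys, Zs

def gradient(A, X):
    # dy/dX = sum_k Z_k^T A^T Y_k for y = tr(A X^{1/2})
    n = X.shape[0]
    S = spd_sqrt(X)
    I = np.eye(n)
    M = np.linalg.inv(np.kron(S, I) + np.kron(I, S))
    Ys, Zs = kron_decompose(M, n)
    return sum(Z.T @ A.T @ Y for Y, Z in zip(Ys, Zs))

def y(A, X):
    return np.trace(A @ spd_sqrt(X))

rng = np.random.default_rng(2)
n = 3
B = rng.standard_normal((n, n))
X = B @ B.T + n * np.eye(n)          # random SPD test matrix
A = rng.standard_normal((n, n))
G = gradient(A, X)

# Check 1: G should solve the Sylvester equation S G + G S = A^T
S = spd_sqrt(X)
print(np.abs(S @ G + G @ S - A.T).max())   # should be tiny

# Check 2: directional derivative along a symmetric E should equal G : E
E = rng.standard_normal((n, n))
E = (E + E.T) / 2
t = 1e-6
fd = (y(A, X + t * E) - y(A, X - t * E)) / (2 * t)
print(fd, np.sum(G * E))                   # the two numbers should agree closely
```

Forming the $n^2\times n^2$ matrix $M$ is only practical for small $n$; for larger problems it is likely cheaper to solve $SG+GS=A^T$ directly, for example with `scipy.linalg.solve_sylvester(S, S, A.T)`. For this particular $M$ the decomposition has at most $n$ terms, and when $A$ commutes with $X^{1/2}$ (e.g. $A=I$) the gradient reduces to the guess $\tfrac12 A^TX^{-1/2}$ from the question.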