Derivative of trace of inverse of a matrix function

1.1k Views Asked by At

I am trying to derive the derivative of the trace of inverse of a matrix function (of X), i.e. $$f(X)=Tr\left((HXH^{H}+I)^{-1}\right) $$ where $H\in R^{n\times m}, X\in R^{m\times m}$.

So $HXH^{H}+I$ is invertible while $HXH^{H}$ is not. Any one know how to find $\frac{\partial f}{\partial X}$ ?

Thanks in advance!

2

There are 2 best solutions below

9
On

Note: In the below, I used $X$ instead of $H$ by accident. Also, $X^*$ is my notation for $X^H$.


We can find this inverse by the chain rule. Note that $f = f_1 \circ f_2 \circ f_3$, where $$ f_1(A) = Tr(A)\\ f_2(A) = A^{-1}\\ f_3(A) = XAX^* + I $$ The derivatives of these functions are given by $$ [f_1'(A)](B) = Tr(B)\\ [f_2'(A)](B) = -A^{-1}BA^{-1}\\ [f_3'(A)](B) = XBX^*\\ $$ The chain rule tells us that $$ [(f_1 \circ f_2 \circ f_3)'(A)] = \\ [f_1'(f_2(f_3(A)))]\circ [f_2'(f_3(A))] \circ f_3'(A) $$ So, all together, that gives us $$ [f'(A)] = \\ (B \mapsto Tr[(XBX^* + I)^{-1}])\circ (B \mapsto -A^{-1}(XBX^* + I)A^{-1})\circ (B \mapsto XBX^*) =\\ (B \mapsto Tr[(XBX^* + I)^{-1}])\circ (B \mapsto -A^{-1}(X[XBX^*]X^* + I)A^{-1}) =\\ (B \mapsto Tr[(XBX^* + I)^{-1}])\circ (B \mapsto -A^{-1}(X^2BX^{*2} + I)A^{-1}) =\\ (B \mapsto Tr[(X[-A^{-1}(X^2BX^{*2} + I)A^{-1}]X^* + I)^{-1}]) $$ In other words, if $A$ is parameterized as a function of $t \in \Bbb R$, then we can write $$ \frac{d}{dt}f(A(t)) = Tr[(X[-A^{-1}(X^2\frac{dA}{dt}X^{*2} + I)A^{-1}]X^* + I)^{-1}] $$


Clarification of my derivatives:

By $[f'(A)](B)$, I mean that for every $A$, $f'(A)$ is a function on $B$. You are right to be confused. Let's look at an example.

Take $f(X) = Tr(X)$, and find its derivative my way. In fact, $f$ is linear, so this will be easy: we have $$ [f'(A)](B) = Tr(B) $$ Note that, in this case, $A$ doesn't pop up in the definition of $f'$. In other words, $f$ has a constant derivative. This makes sense because $f$ is linear. Another way of writing the above is to say that if $A(t)$ is a matrix-valued function on $t \in \Bbb R$, then $$ \frac{d}{dt}f(A(t)) = Tr \frac{dA}{dt} $$ Now, my understanding of your notation is that $$ \frac{\partial f}{\partial X}[i,j] = \frac{\partial f}{x_{ji}} $$ equivalently, define $A_{ij}(t) = A + t E_{ji}$ (where $E_{ji}$ is the matrix with $0$s everywhere except at the $j,i$ entry, where it has a $1$). Then we should have $$ \frac{\partial f}{\partial X}(A)[i,j] = \frac {d}{dt} f(A_{ij}(t)) $$ By my above statement, that means we can write $$ \frac{\partial f}{\partial X}(A)[i,j] = Tr(E_{ji}) = \begin{cases} 1 & i=j\\ 0 & i \neq j \end{cases} $$ In other words, $\frac {\partial f}{\partial X}$ is $I$, which is exactly what we expected.

1
On

I got an answer myself in a relative simple way: $$\frac{\partial f}{\partial X}=H^H((HXH^H+I)^{-2})^{H}H$$ through the chain rule. That is, let $U=HXH^H$ and $g(U)=Tr((U+I)^{-1})$, then $$\frac{\partial f}{\partial X_{i,j}}=Tr[(\frac{\partial g(U)}{\partial U})^H \frac{\partial U}{\partial X_{i,j}}]$$.Since we can easily get $\frac{\partial g(U)}{\partial U}=(U+I)^{-2}$ and $$U=\sum_i \sum_j X_{i,j} h_i h_j^H$$, we know $$\frac{\partial U}{\partial X_{i,j}}=h_i h_j^H$$.We thus have $$\frac{\partial f}{\partial X_{i,j}}=Tr[((HXH^H+I)^{-2})^{H}h_ih_j^{H}]=Tr[h_j^{H}((HXH^H+I)^{-2})^{H}h_i]=h_j^{H}((HXH^H+I)^{-2})^{H}h_i$$. Reorganizing them in a compact form we have the above result (hopefully there is no error). Actually the $X$ I am dealing with is diagonal, so the derivation is simpler than above. Thank you all for your attention and especially for @Omnomnomnom!