I am trying to derive the derivative of the trace of inverse of a matrix function (of X), i.e. $$f(X)=Tr\left((HXH^{H}+I)^{-1}\right) $$ where $H\in R^{n\times m}, X\in R^{m\times m}$.
So $HXH^{H}+I$ is invertible while $HXH^{H}$ is not. Any one know how to find $\frac{\partial f}{\partial X}$ ?
Thanks in advance!
Note: In the below, I used $X$ instead of $H$ by accident. Also, $X^*$ is my notation for $X^H$.
We can find this inverse by the chain rule. Note that $f = f_1 \circ f_2 \circ f_3$, where $$ f_1(A) = Tr(A)\\ f_2(A) = A^{-1}\\ f_3(A) = XAX^* + I $$ The derivatives of these functions are given by $$ [f_1'(A)](B) = Tr(B)\\ [f_2'(A)](B) = -A^{-1}BA^{-1}\\ [f_3'(A)](B) = XBX^*\\ $$ The chain rule tells us that $$ [(f_1 \circ f_2 \circ f_3)'(A)] = \\ [f_1'(f_2(f_3(A)))]\circ [f_2'(f_3(A))] \circ f_3'(A) $$ So, all together, that gives us $$ [f'(A)] = \\ (B \mapsto Tr[(XBX^* + I)^{-1}])\circ (B \mapsto -A^{-1}(XBX^* + I)A^{-1})\circ (B \mapsto XBX^*) =\\ (B \mapsto Tr[(XBX^* + I)^{-1}])\circ (B \mapsto -A^{-1}(X[XBX^*]X^* + I)A^{-1}) =\\ (B \mapsto Tr[(XBX^* + I)^{-1}])\circ (B \mapsto -A^{-1}(X^2BX^{*2} + I)A^{-1}) =\\ (B \mapsto Tr[(X[-A^{-1}(X^2BX^{*2} + I)A^{-1}]X^* + I)^{-1}]) $$ In other words, if $A$ is parameterized as a function of $t \in \Bbb R$, then we can write $$ \frac{d}{dt}f(A(t)) = Tr[(X[-A^{-1}(X^2\frac{dA}{dt}X^{*2} + I)A^{-1}]X^* + I)^{-1}] $$
Clarification of my derivatives:
By $[f'(A)](B)$, I mean that for every $A$, $f'(A)$ is a function on $B$. You are right to be confused. Let's look at an example.
Take $f(X) = Tr(X)$, and find its derivative my way. In fact, $f$ is linear, so this will be easy: we have $$ [f'(A)](B) = Tr(B) $$ Note that, in this case, $A$ doesn't pop up in the definition of $f'$. In other words, $f$ has a constant derivative. This makes sense because $f$ is linear. Another way of writing the above is to say that if $A(t)$ is a matrix-valued function on $t \in \Bbb R$, then $$ \frac{d}{dt}f(A(t)) = Tr \frac{dA}{dt} $$ Now, my understanding of your notation is that $$ \frac{\partial f}{\partial X}[i,j] = \frac{\partial f}{x_{ji}} $$ equivalently, define $A_{ij}(t) = A + t E_{ji}$ (where $E_{ji}$ is the matrix with $0$s everywhere except at the $j,i$ entry, where it has a $1$). Then we should have $$ \frac{\partial f}{\partial X}(A)[i,j] = \frac {d}{dt} f(A_{ij}(t)) $$ By my above statement, that means we can write $$ \frac{\partial f}{\partial X}(A)[i,j] = Tr(E_{ji}) = \begin{cases} 1 & i=j\\ 0 & i \neq j \end{cases} $$ In other words, $\frac {\partial f}{\partial X}$ is $I$, which is exactly what we expected.