Derivative of a trace

2.6k Views Asked by At

I'm new here, so "Hi" to everyone :D

I got the following problem. I have the matrices $A$, $B$, $C$, $X$ and $Y$. All matrices are square (say n-by-n). In particular: - $A$ is full rank - $B$ is symmetric and (semi)definite positive; - $C$ is diagonal and definite positive; - $Y$ is diagonal and definite positive; - $X$ is diagonal ($X = \operatorname{diag}\{x_1, \ldots,x_n\}$) and it is the unknown matrix;

Then I have the following function: $f(X) = (A(B+X^{T}YX)^{-1}A^{T} + C)^{-1}$ (it may seem dumb to write $X^{T}$ since it is diagonal, but I think this is the best way to write it).

I would like to evaluate the derivative of the trace of $f(X)$ with respect to each $x_i$.

Any idea?

3

There are 3 best solutions below

6
On BEST ANSWER

If we perturb an invertible matrix $M$ by a small $\Delta M$, the first-order change in $M^{-1}$ is given by $\Delta (M^{-1}) := (M+\Delta M)^{-1} - M^{-1} = -M^{-1} (\Delta M) M^{-1} + O(\|\Delta M\|^2)$. Now, consider $f(X) = (A(B+X^{T}YX)^{-1}A^{T} + C)^{-1}$. \begin{align} \Delta f(X) =&\Delta\left((A(B+X^{T}YX)^{-1}A^{T} + C)^{-1}\right)\\ \approx&-f(X)\ \Delta\left(A(B+X^{T}YX)^{-1}A^{T} + C\right) f(X)\\ =&-f(X)A \Delta\left((B+X^{T}YX)^{-1}\right) A^{T}f(X)\\ \approx&f(X)A(B+X^{T}YX)^{-1} \Delta\left(B+X^{T}YX\right) (B+X^{T}YX)^{-1}A^{T}f(X)\\ \approx&f(X)A(B+X^{T}YX)^{-1} \left((\Delta X)^{T}YX + X^TY\Delta X\right) (B+X^{T}YX)^{-1}A^{T}f(X). \end{align} Therefore \begin{align} \Delta\, \mathrm{trace}f(X) \approx&\mathrm{trace}\, f(X)A(B+X^{T}YX)^{-1} \left((\Delta X)^{T}YX + X^TY\Delta X\right) (B+X^{T}YX)^{-1}A^{T}f(X)\\ =&2\,\mathrm{trace}\, (\Delta X)^{T}YX (B+X^{T}YX)^{-1}A^{T}f(X)^2A(B+X^{T}YX)^{-1} \end{align} and in turn $$ \frac{d\mathrm{trace}f(X)}{dX} = 2YX (B+X^{T}YX)^{-1}A^{T}f(X)^2A(B+X^{T}YX)^{-1}. $$ This is the formula for a general square matrix $X$. For a diagonal $X$, simply take the diagonal of the above derivative.

3
On

Just use the chain rule, which states that $$ \frac{\partial g(\bf{U})}{\partial X_{i,j}} = \operatorname{trace}\left(\left(\frac{\partial g(\bf{U})}{\partial \bf{U}}\right)^T\frac{\partial \bf{U}}{\partial X_{i,j}}\right). $$

In your case, let $\bf{U} = A(B+X^{T}YX)^{-1}A^{T} + C$ and then $g(\bf{U}) = \operatorname{trace}(U^{-1})$. First, observe that $$ \frac{\partial g(\bf{U})}{\partial \bf{U}} = -\bf{U}^{-T}U^{-T}. $$ Next, we evaluate $\frac{\partial \bf{U}}{\partial X_{i,j}}$. \begin{align} \frac{\partial \bf{U}}{\partial X_{i,j}} &= \frac{\partial}{\partial X_{i,j}}(A(B+X^{T}YX)^{-1}A^{T} + C)\\ & = \frac{\partial}{\partial X_{i,j}}(A(B+X^{T}YX)^{-1}A^T )\\ & = A\left(\frac{\partial}{\partial X_{i,j}}(B+X^{T}YX\right)^{-1})A^{T}\\ & = -A(B+X^{T}YX)^{-1}\left(\frac{\partial}{\partial X_{i,j}}(B+X^{T}YX)\right)(B+X^{T}YX)^{-1}A^T \\ & = -A(B+X^{T}YX)^{-1}(X^TYJ^{ij} + J^{ji}YX)(B+X^T YX)^{-1}A^T \end{align} in which $(J^{ij})_{kl} = \delta_{ik} \delta_{jl}$, where $\delta$ denotes the Kronecker delta function. Note that in the above calculation I used \begin{align} &\frac{\partial}{\partial X_{i,j}}X^{T}YX = X^TYJ^{ij} + J^{ji}YX\\ &\frac{\partial}{\partial X_{i,j}}(Z(X))^{-1} = Z^{-1}\left(\frac{\partial}{\partial X_{i,j}}Z\right)Z^{-1} \end{align}

0
On

Use a colon to denote the trace/Frobenius product, i.e. $$A:B = {\rm Tr}(A^TB) = {\rm Tr}(B^TA) = B:A$$ For typing convenience, define the matrices $$\eqalign{ G = G^T &= XYX + B \qquad&\implies\quad dG = XY\,dX + dX\,YX \\ H = H^T &= AG^{-1}A^T + C \qquad&\implies\quad dH = -AG^{-1}\,dG\,G^{-1}A^T \\ M = M^T &= G^{-1}A^TH^{-2}AG^{-1}\\ }$$ Write the trace of the function in terms of these new definitions.
Then calculate its gradient. $$\eqalign{ \phi &= {\rm Tr}(H^{-1}) \\&= I:H^{-1} \\ d\phi &= -I:(H^{-1}dH\,H^{-1}) \\ &= -H^{-2}:dH \\ &= H^{-2}:(AG^{-1}\,dG\,G^{-1}A^T) \\ &= (G^{-1}A^TH^{-2}AG^{-1}):dG \\ &= M:dG \\ &= M:(XY\,dX + dX\,YX) \\ &= (YXM + MXY):dX \\ &= (YXM + MXY):{\rm Diag}(dx) \\ &= {\rm diag}(YXM + MXY):dx \\ &= {\rm diag}(2MXY):dx \\ &= 2XY\,{\rm diag}(M):dx \\ \frac{\partial\phi}{\partial x} &= 2XY\,{\rm diag}(M) \\ \frac{\partial\phi}{\partial x_i} &= e_i^T\left(\frac{\partial\phi}{\partial x}\right) \;=\; 2\,x_iy_i\,M_{ii} \\ }$$