derivative of an objective function with trace of the matrix


How can you derive the gradient of $$f_{\mu}(U) = \mu \log (\mathbf{Tr}\exp(A+U)/\mu) -\mu \log n$$

as

$$\nabla f_{\mu}(U) = (\mathbf{Tr}\exp(A+U)/\mu)^{-1} \exp(A+U)/\mu$$ where $A,U$ are symmetric matrices and $\mu$ is a constant?

When taking the derivative of $\mathbf{Tr}\exp(A+U)/\mu$ by the chain rule, why do you get $\exp(A+U)/\mu$?

Best answer

For convenience, let $T = {\rm tr}(\exp(U+A))$.

Then the function and its differential can be written as $$\eqalign{ f &= \mu\log(T) -\mu\log(\mu) - \mu\log(n) \cr\cr df &= \mu\,\,d\log(T) \cr &= \mu\,\frac{dT}{T} = \frac{\mu}{T}\,dT \cr &= \frac{\mu}{T}\,\exp(U+A):dU \cr\cr \frac{\partial f}{\partial U} &= \frac{\mu}{T}\,\exp(U+A) \cr &= \frac{\mu\,\exp(U+A)}{{\rm tr}(\exp(U+A))} \cr }$$ where I have used the following fact about the differential of the trace of a scalar function $g$ applied to a matrix argument $$\eqalign{ d\,{\rm tr}(g(X)) = g'(X^T) : dX }$$ Here the colon denotes the Frobenius product, $A:B = {\rm tr}(A^TB)$. With $g=\exp$ we have $g'=\exp$, and since $U+A$ is symmetric the transpose can be dropped, which gives $dT = \exp(U+A):dU$.
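If you want to sanity-check the result numerically, here is a minimal sketch in NumPy. It assumes the same reading of the objective as the derivation above, i.e. $f = \mu\log(T/\mu) - \mu\log(n)$ with $T = {\rm tr}(\exp(U+A))$, and compares a central finite difference along a random symmetric direction $D$ with the Frobenius product of the claimed gradient and $D$. The helper names (`sym_expm`, `random_symmetric`) and the test values of `n`, `mu`, and `eps` are my own choices for illustration, not part of the question.

```python
import numpy as np

def sym_expm(S):
    """Matrix exponential of a symmetric matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * np.exp(w)) @ V.T  # V diag(exp(w)) V^T

def f(U, A, mu):
    # Assumed reading: f = mu*log(T/mu) - mu*log(n), with T = tr(exp(U+A))
    n = A.shape[0]
    T = np.trace(sym_expm(U + A))
    return mu * np.log(T / mu) - mu * np.log(n)

def grad_f(U, A, mu):
    # Gradient from the derivation: mu * exp(U+A) / tr(exp(U+A))
    E = sym_expm(U + A)
    return mu * E / np.trace(E)

def random_symmetric(n, rng):
    M = rng.standard_normal((n, n))
    return (M + M.T) / 2

rng = np.random.default_rng(0)
n, mu = 4, 0.7                 # arbitrary illustrative values
A = random_symmetric(n, rng)
U = random_symmetric(n, rng)
D = random_symmetric(n, rng)   # symmetric perturbation direction

eps = 1e-6
fd = (f(U + eps * D, A, mu) - f(U - eps * D, A, mu)) / (2 * eps)
analytic = np.sum(grad_f(U, A, mu) * D)  # Frobenius product G : D
gap = abs(fd - analytic)
print(fd, analytic, gap)
```

The two printed directional derivatives should agree to within finite-difference error. A second quick check is that the trace of the gradient equals $\mu$, since the gradient is $\mu$ times a matrix with unit trace.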