Gradient of a function that involves a matrix square root


Let $S$ be a positive definite matrix of size $n$.

Consider the function $f: \mathbb{R}_+^n \longrightarrow \mathbb{R}$ defined by

$$\forall u \in \mathbb{R}_+^n, \; f(u) = \operatorname{tr}\big((\operatorname{diag}(u)\,S)^{1/2}\big).$$

What would be its gradient?


There is 1 best solution below


Given a vector $y$, we need notation for two operations: building the diagonal matrix whose main diagonal is the vector, and the inverse operation of extracting the main diagonal of a matrix into a vector, i.e. $$Y= {\rm Diag}(y) \implies y={\rm diag}(Y)$$

For this particular function, define the auxiliary variable $$X = {\rm Diag}(u)\,S$$

Write the function, then find its differential and gradient: $$\eqalign{ \phi &= {\rm tr}\big(X^{1/2}\big) \cr d\phi &= \tfrac{1}{2}X^{-T/2}:dX \cr &= \tfrac{1}{2}X^{-T/2}:{\rm Diag}(du)\,S \cr &= \tfrac{1}{2}X^{-T/2}S^T:{\rm Diag}(du) \cr &= \tfrac{1}{2}SX^{-1/2}:{\rm Diag}(du) \cr &= \tfrac{1}{2}{\rm diag}\big(SX^{-1/2}\big):du \cr \frac{\partial\phi}{\partial u} &= \tfrac{1}{2}{\rm diag}\big(SX^{-1/2}\big) \cr }$$

In several intermediate steps, a colon denotes the trace/Frobenius product, i.e. $$A:B = {\rm tr}\big(A^TB\big)$$ The rearrangements above use two of its properties: $A:BC = AC^T:B$, and $A:B = A^T:B^T$. The latter, together with the symmetry of ${\rm Diag}(du)$, turns $X^{-T/2}S^T = \big(SX^{-1/2}\big)^T$ into $SX^{-1/2}$.
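As a sanity check, the formula $\tfrac{1}{2}{\rm diag}\big(SX^{-1/2}\big)$ can be compared against central finite differences. The following is only a sketch, under assumptions not stated in the answer: NumPy is available, $u$ is strictly positive, and the square root is the principal one. It relies on the identity $X = D^{1/2} M D^{-1/2}$ with $D = {\rm Diag}(u)$ and $M = D^{1/2} S D^{1/2}$ symmetric positive definite, so that $X^{-1/2} = D^{1/2} M^{-1/2} D^{-1/2}$ and ${\rm tr}\big(X^{1/2}\big)$ is the sum of the square roots of the eigenvalues of $M$. The names `f`, `grad_f`, and `numerical_grad` are illustrative.

```python
import numpy as np

def f(u, S):
    """tr((Diag(u) S)^{1/2}), computed from the eigenvalues of M = D^{1/2} S D^{1/2}."""
    r = np.sqrt(u)
    M = r[:, None] * S * r[None, :]          # symmetric, same eigenvalues as Diag(u) S
    return np.sqrt(np.linalg.eigvalsh(M)).sum()

def grad_f(u, S):
    """Closed-form gradient (1/2) diag(S X^{-1/2}) with X = Diag(u) S."""
    r = np.sqrt(u)
    M = r[:, None] * S * r[None, :]
    lam, Q = np.linalg.eigh(M)
    M_inv_sqrt = (Q / np.sqrt(lam)) @ Q.T    # Q diag(lam^{-1/2}) Q^T
    X_inv_sqrt = r[:, None] * M_inv_sqrt / r[None, :]  # D^{1/2} M^{-1/2} D^{-1/2}
    return 0.5 * np.diag(S @ X_inv_sqrt)

def numerical_grad(u, S, h=1e-6):
    """Central finite-difference approximation of the gradient of f."""
    g = np.zeros_like(u)
    for i in range(u.size):
        e = np.zeros_like(u)
        e[i] = h
        g[i] = (f(u + e, S) - f(u - e, S)) / (2 * h)
    return g

# Random test problem (assumed setup): S positive definite, u strictly positive.
rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
S = A @ A.T + n * np.eye(n)
u = rng.uniform(0.5, 2.0, size=n)

max_err = np.max(np.abs(grad_f(u, S) - numerical_grad(u, S)))
print("max abs difference:", max_err)
```

In the scalar case $n=1$ the formula reduces to $\tfrac{1}{2}\sqrt{s/u}$, the derivative of $\sqrt{us}$, which gives a quick hand check.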