Partial derivative of the trace of a matrix raised to a negative power with respect to its parameters


$\renewcommand{\v}[1]{\mathrm{vec}\left(#1\right)} \renewcommand{\m}[1]{\mathbf{#1}} \renewcommand{\trace}[1]{\mathrm{trace}\left(#1\right)} \renewcommand{\diag}[1]{\mathrm{diag}\left(#1\right)}$

Suppose we have the diagonal matrix $\mathbf{D} = \mathrm{diag}(\mathbf{H}\mathbf{W}\mathbf{1})$, where $\mathbf W$ is a known diagonal matrix and $\mathbf 1$ is a column vector of ones. $\mathbf A$ and $\mathbf B$ are also known matrices.

How can we compute the following partial derivative with respect to $\m H$ using matrix calculus? $$\frac{\partial \trace{\mathbf A \mathbf D^{-1/2} \mathbf B}}{\partial\m H}$$

This can be written as $$\frac{\partial \trace{\mathbf B\mathbf A \mathbf D^{-1/2} }}{\partial\m H} = \frac{\partial \trace{\mathbf{S} \mathbf D^{-1/2} }}{\partial\m H} = \trace {\frac{\partial {\mathbf{S} \mathbf D^{-1/2}} }{\partial\m H}}$$

Apart from working it out element by element, I couldn't find anything in the Matrix Cookbook that shows how to do this.

Best answer

Define the operation $X={\rm Diag}(x)$, which takes a vector and returns a diagonal matrix,
and the operation $x={\rm diag}(X)$, which extracts the diagonal of a matrix into a vector.

Since $W$ is a diagonal matrix we can write it as $W={\rm Diag}(w)$ for some vector $w$.
The result of multiplying this matrix by a column of ones is simply $\,w=W1$.

Define some new variables $$\eqalign{ S &= BA, \quad &s &= {\rm diag}(S) = {\rm diag}(S^T) \cr g &= Hw, \quad &G &= {\rm Diag}(g), \quad dg = dH\,w \cr }$$ Write the function of interest in terms of these new variables.
Then calculate its differential and its gradient. $$\eqalign{ \phi &= {\rm Tr}(SG^{-1/2}) = S^T:G^{-1/2} = s:g^{-1/2} \cr d\phi &= s:dg^{-1/2} \cr &= s:(-\tfrac{1}{2}g^{-3/2}\odot dg) \cr &= -\tfrac{1}{2}s:G^{-3/2}dg \cr &= -\tfrac{1}{2}G^{-3/2}s:dH\,w \cr &= -\tfrac{1}{2}G^{-3/2}sw^T:dH \cr \frac{\partial\phi}{\partial H} &= -\tfrac{1}{2}G^{-3/2}sw^T \cr &= -\tfrac{1}{2}{\rm Diag}(HW1)^{-3/2}\,{\rm diag}(BA)\,1^TW \cr }$$ where functions on vectors are applied elementwise,
and $(\odot)$ represents the elementwise/Hadamard product,
and $(\,:\,)$ represents the trace/Frobenius product, i.e. $\,A:B={\rm Tr}(A^TB)$.
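
The closed-form gradient above can be sanity-checked numerically. The following is a minimal NumPy sketch, not part of the original answer: the function names `phi`, `grad_phi`, and `numerical_grad`, the matrix sizes, and the random test data are all illustrative assumptions. It also assumes every entry of $Hw$ is positive, so that the power $-1/2$ is real.

```python
import numpy as np

def phi(H, A, B, w):
    """trace(A D^{-1/2} B) with D = Diag(H w); assumes H w > 0 elementwise."""
    g = H @ w
    return np.trace(A @ np.diag(g ** -0.5) @ B)

def grad_phi(H, A, B, w):
    """Closed form from the derivation: -1/2 Diag(Hw)^{-3/2} diag(BA) w^T."""
    g = H @ w
    s = np.diag(B @ A)                      # s = diag(S), with S = BA
    return -0.5 * np.outer(g ** -1.5 * s, w)

def numerical_grad(f, H, eps=1e-6):
    """Central finite differences, perturbing one entry of H at a time."""
    G = np.zeros_like(H)
    for idx in np.ndindex(*H.shape):
        E = np.zeros_like(H)
        E[idx] = eps
        G[idx] = (f(H + E) - f(H - E)) / (2 * eps)
    return G

# Hypothetical sizes: H is n x m, w has length m, A is p x n, B is n x p.
rng = np.random.default_rng(0)
n, m, p = 4, 3, 5
H = rng.uniform(0.5, 1.5, size=(n, m))      # positive entries keep H w > 0
w = rng.uniform(0.5, 1.5, size=m)           # w = W 1, the diagonal of W
A = rng.standard_normal((p, n))
B = rng.standard_normal((n, p))

analytic = grad_phi(H, A, B, w)
numeric = numerical_grad(lambda X: phi(X, A, B, w), H)
print("max abs difference:", np.max(np.abs(analytic - numeric)))
```

If the derivation holds, the printed difference should be at the level of finite-difference error. Entry by entry, the gradient is $-\tfrac{1}{2}\,s_i\,g_i^{-3/2}\,w_j$, which is what the outer product in `grad_phi` computes.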

NB:  The cyclic property of the trace allows Frobenius products to be rearranged in numerous ways.
For example $$\eqalign{ A:BC &= B^TA:C \cr&= AC^T:B \cr&= BC:A \cr&= A^T:(BC)^T \cr }$$ are all equivalent.
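
These rearrangements are easy to confirm numerically. The short NumPy sketch below is only an illustration: the helper name `frob` and the matrix shapes are assumptions chosen so that every product is defined, and $A$, $B$, $C$ here are generic matrices, as in the identities above.

```python
import numpy as np

def frob(X, Y):
    """Frobenius product X:Y = Tr(X^T Y); X and Y must have the same shape."""
    return np.trace(X.T @ Y)

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((3, 5))
C = rng.standard_normal((5, 4))

values = [
    frob(A, B @ C),          # A : BC
    frob(B.T @ A, C),        # B^T A : C
    frob(A @ C.T, B),        # A C^T : B
    frob(B @ C, A),          # BC : A
    frob(A.T, (B @ C).T),    # A^T : (BC)^T
]
print(values)  # all five entries should agree up to rounding
```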