Deriving a composed matrix function

I want to differentiate the following function with respect to the matrix $W$:

\begin{align} R\left({G}; H\right) = \log \bigg|{I} +s \cdot \big({G}^H{G}\big)^{-1}{G}^H{H}{H}^H{G}\bigg|. \end{align}

where $G = WA$, and $G,W,A,H \in \mathbb{C}^{M\times M}$ and $s \in \mathbb{C}$.

I used matrixcalculus.org to reach the final answer below, but I want to understand how it is derived:

\begin{align} \nabla_{W} R(W, A) = \bigg( s \cdot A \, \Big(I + s \cdot \big(G^H G\big)^{-1} G^H H H^H G\Big)^{-1} \Big(2 \big(G^H G\big)^{-1} G^H H H^H\Big) \Big(I_M - G \big(G^H G\big)^{-1} G^H\Big) \bigg)^H. \end{align}

To facilitate the calculations I defined:

$Y = G^H G, \quad P = GY^{-1}G^H, \quad Z = PHH^H, \quad Q = I + s\cdot Z$

So, we have $R = \log \det Q$.

I used the differential method, but then I got stuck:

\begin{align} dR &= Q^{-T} : dQ \\ &= Q^{-T} : s\cdot dZ \end{align}

Now I'm not sure how to move forward.

What are $dZ$, $dP$, and $dY$, and how do I substitute them?

Best answer

$ \def\s{\sigma} \def\LR#1{\left(#1\right)} \def\op#1{\operatorname{#1}} \def\trace#1{\op{Tr}\LR{#1}} \def\frob#1{\left\| #1 \right\|_F} \def\qiq{\quad\implies\quad} \def\p{\partial} \def\grad#1#2{\frac{\p #1}{\p #2}} $I'll keep most of your variable names, but I need to rename $H\to F,\:$ since $H^H$ is one keystroke away from disaster.

Also, in the complex domain, one typically uses Wirtinger derivatives (i.e. treating $\,\{dG,\,dG^H\}\,$ as independent variables).

First, note that the pseudoinverse can be used to shorten the expression
$$\eqalign{ G\LR{G^HG}^{-1}G^H = GG^+ \equiv P \\ }$$
The differential of this orthogonal projector is well known
$$\eqalign{ dP &= \LR{I-P} \;dG\; G^+ + \LR{G^+}^H dG^H \LR{I-P} \\ }$$
Applying this to the current problem (and ignoring the $dG^H$ term), with $Z = PFF^H$ and the colon denoting the trace product $X:Y = \trace{X^TY}$, gives
$$\eqalign{
dR &= Q^{-T}:dQ \\
&= s\,Q^{-T}:dZ \\
&= s\,Q^{-T}:dP\;FF^H \\
&= \LR{sFF^HQ^{-1}}^T:dP \\
&= \LR{sFF^HQ^{-1}}^T:\LR{I-P} \;dG\; G^+ \\
&= \LR{s\,G^+FF^HQ^{-1}\LR{I-P}}^T:dG \\
&= \LR{s\,G^+FF^HQ^{-1}\LR{I-P}}^T:dW\,A \\
&= \LR{sAG^+FF^HQ^{-1}\LR{I-P}}^T:dW \\
\grad{R}{W} &= \LR{sAG^+FF^HQ^{-1}\LR{I-P}}^T \\
}$$
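A quick numerical sanity check of this gradient can be sketched in NumPy. This is only an illustration under assumptions that are not in the question: the shapes are chosen so that $G = WA$ is tall with full column rank (for a square invertible $G$ the projector is $P = I$ and the gradient vanishes), $s$ is taken real and positive so that $R$ is real, and the helper names `crandn`, `R` and `wirtinger_grad` are hypothetical. For real-valued $R$ the ignored $dG^H$ term is the complex conjugate of the kept one, so the full directional derivative is $2\,{\rm Re}$ of the Wirtinger gradient contracted with the direction.

```python
import numpy as np

rng = np.random.default_rng(0)

def crandn(*shape):
    """Random complex Gaussian matrix (hypothetical helper)."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

# Assumed shapes: G = W @ A is N x K with N > K, so I - P is nonzero.
N, M, K = 5, 4, 2
W = crandn(N, M)
A = crandn(M, K)
F = crandn(N, N)          # plays the role of H in the question
s = 0.7                   # assumed real and positive so that R is real
FFh = F @ F.conj().T

def R(W):
    """R = log det(I + s P F F^H) with P = G G^+ and G = W A."""
    G = W @ A
    P = G @ np.linalg.pinv(G)
    Q = np.eye(N) + s * P @ FFh
    _, logabsdet = np.linalg.slogdet(Q)
    return logabsdet

def wirtinger_grad(W):
    """(s A G^+ F F^H Q^{-1} (I - P))^T, same shape as W."""
    G = W @ A
    Gp = np.linalg.pinv(G)
    P = G @ Gp
    Q = np.eye(N) + s * P @ FFh
    return (s * A @ Gp @ FFh @ np.linalg.inv(Q) @ (np.eye(N) - P)).T

# Compare a central finite difference along a random direction E
# with the prediction 2 * Re(grad : E).
E = crandn(N, M)
eps = 1e-6
fd = (R(W + eps * E) - R(W - eps * E)) / (2 * eps)
pred = 2 * np.real(np.sum(wirtinger_grad(W) * E))
print(fd, pred)
```

The two printed numbers should agree to several digits; a mismatch by a factor of two would indicate that the conjugate term was dropped or double-counted.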

Where did the factor of $\,2\,$ go?

This is a typical occurrence when dealing with Wirtinger derivatives.

As a simple example, consider the Frobenius norm (squared)
$$\phi = \frob{A}^2$$
When $A$ is $\sf real$, the gradient wrt $A$ is
$$\eqalign{ \phi &= A:A \\ d\phi &= 2A:dA \\ \grad{\phi}{A} &= 2A \\ }$$
But when $A$ is $\sf complex$, the Wirtinger gradient wrt $A$ (holding $A^*$ fixed) is
$$\eqalign{ \phi &= A^* : A \\ d\phi &= A^*:dA \\ \grad{\phi}{A} &= A^* \\ }$$
BTW, this is also the reason why the (outer) Hermitian conjugate in your result has been replaced by a simple transpose in my result.
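The Frobenius example can be checked numerically with a short NumPy sketch. The matrix size and the names `phi`, `wirtinger` and `full` are arbitrary choices for illustration. The Wirtinger term $A^*:dA$ alone is complex, while the actual change in $\phi$ is that term plus its conjugate, which is where the real-case factor of $2$ reappears.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
E = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))  # direction dA

def phi(X):
    """Squared Frobenius norm."""
    return np.linalg.norm(X, "fro") ** 2

eps = 1e-6
fd = (phi(A + eps * E) - phi(A - eps * E)) / (2 * eps)  # real directional derivative

wirtinger = np.sum(A.conj() * E)   # A^* : dA, with A^* treated as fixed
full = 2 * np.real(wirtinger)      # A^* : dA  +  A : dA^*

print(fd, full)
```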

To reorder the matrices, you can take advantage of the identity $$ \det(I+AB) \;=\; \det(I+BA) $$ before differentiating.
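As a sketch of how this identity links the two forms above (using the renaming $H\to F$ and $G^+ = \LR{G^HG}^{-1}G^H$, which assumes $G$ has full column rank), take $sG^+FF^H$ as the first factor and $G$ as the second:
$$\eqalign{ \det\LR{I + s\,G^+FF^HG} &= \det\LR{I + s\,GG^+FF^H} = \det\LR{I + s\,PFF^H} = \det Q \\ }$$
The closely related push-through identity $\LR{I+XY}^{-1}X = X\LR{I+YX}^{-1}$ does the same for the inverse that appears in the question's formula:
$$\eqalign{ \LR{I + s\,G^+FF^HG}^{-1}G^+FF^H &= G^+FF^H\LR{I + s\,GG^+FF^H}^{-1} = G^+FF^HQ^{-1} \\ }$$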