Gradient of $\lVert \mathbf{H}^{\dagger}\mathbf{H} - \mathbf{B}\rVert_F^2$

I'm trying to get the derivative of

$$f(\mathbf{H})=\lVert \mathbf{H}^{\dagger}\mathbf{H} - \mathbf{B}\rVert^2_F$$

with respect to $\mathbf{H}$, where $\mathbf{H}^\dagger$ denotes the pseudo-inverse of $\mathbf{H}$.

I know that $\nabla f = \mathbf{J}_{\mathbf{H}^{\dagger}\mathbf{H}}(\mathbf{H})^T \cdot \text{vec}\left(2(\mathbf{H}^{\dagger}\mathbf{H}-\mathbf{B}) \right)$, where $\mathbf{J}_{\mathbf{H}^{\dagger}\mathbf{H}}(\mathbf{H})$ is the Jacobian matrix of $\mathbf{H}^{\dagger}\mathbf{H}$ with respect to $\mathbf{H}$.

On the other hand, from "The Matrix Cookbook", I know that

  • $d(\mathbf{H}^\dagger\mathbf{H}) = d(\mathbf{H}^\dagger)\mathbf{H} + \mathbf{H}^\dagger d(\mathbf{H}) $

and from the question "Derivative of pseudoinverse with respect to original matrix" I know that

  • $\eqalign{d\mathbf{H}^\dagger &= \mathbf{H}^\dagger (\mathbf{H}^\dagger)^T\,d\mathbf{H}^T\,(\mathbf{I}-\mathbf{H}\mathbf{H}^\dagger) + (\mathbf{I}-\mathbf{H}^\dagger \mathbf{H})\,d\mathbf{H}^T\,(\mathbf{H}^\dagger)^T\mathbf{H}^\dagger -\mathbf{H}^\dagger\,d\mathbf{H}\,\mathbf{H}^\dagger \cr }$

But I don't know how to use these two differentials. That is, how can I obtain $\mathbf{J}_{\mathbf{H}^{\dagger}\mathbf{H}}(\mathbf{H})$ or $\partial (\mathbf{H}^{\dagger}\mathbf{H})_{(i,j)}/\partial \mathbf{H}_{(k,l)}$?

Best answer

Note that $H$ is a poor choice for the name of the matrix, since it is easily confused with the Hermitian conjugate operation (likewise, the ubiquity of the transpose operation makes $T$ a bad choice).

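So let's define the following matrices and compute the differential of $A$, substituting for $dP$ the pseudoinverse differential quoted in the question (with transposes generalized to Hermitian conjugates).

$$\eqalign{ X &= H \quad\; {\rm (independent\,variable)} \cr P &= X^\dagger \quad {\rm (pseudo\,inverse\,of\,X)} \cr A &= PX-B \cr dA &= d\,(PX) \cr &= dP\,X + P\,dX \cr &= -P\,dX\,PX + (I-PX)\,dX^H\,P^H \quad+\quad P\,dX \cr &= P\,dX\,(I-PX) + (I-PX)\,dX^H\,P^H \cr }$$

NB: The relationships $$\eqalign{ &(I-XP)X = X-XPX = 0 \\ &P^HPX = P^H(PX)^H = (PXP)^H = P^H \\ }$$ were used to eliminate some terms. The second one relies on $PX$ being Hermitian.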
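The differential of $PX$ derived above can be sanity-checked numerically. The following is only a minimal sketch in NumPy: it assumes a random complex matrix of full row rank (so the rank stays constant under a small perturbation), and the helper name `predicted_dA` and the sizes `m`, `n` are illustrative choices, not part of the derivation.

```python
import numpy as np

def predicted_dA(X, dX):
    """First-order change of pinv(X) @ X predicted by
    dA = P dX (I - PX) + (I - PX) dX^H P^H."""
    P = np.linalg.pinv(X)
    R = np.eye(X.shape[1]) - P @ X          # R = I - PX
    return P @ dX @ R + R @ dX.conj().T @ P.conj().T

rng = np.random.default_rng(0)
m, n = 3, 5                                  # wide matrix, so PX is not the identity
X = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
dX = 1e-6 * (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n)))

# Actual change of PX under the small perturbation dX
actual = np.linalg.pinv(X + dX) @ (X + dX) - np.linalg.pinv(X) @ X

# Relative mismatch; it should shrink roughly in proportion to the size of dX
rel = np.linalg.norm(actual - predicted_dA(X, dX)) / np.linalg.norm(actual)
print(rel)
```

A wide matrix is used on purpose: for a tall matrix of full column rank, $PX = I$ and both sides vanish, which would make the check uninformative.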

Write the function in terms of these new variables.
Then find its differential and gradient. Here "$+\;conj$" stands for the complex conjugate of the terms written before it.

$$\eqalign{ f &= A^*:A \cr df &= A^*:dA \quad&+\; conj \cr &= A^*:\Big(P\,dX\,(I-PX) + (I-PX)\,dX^H\,P^H\Big) \quad&+\; conj \cr &= A^*:P\,dX\,(I-PX) + A^*:(I-PX)\,dX^H\,P^H \quad&+\; conj \cr &= P^TA^*(I-PX)^T:dX + (I-PX)^TA^*P^*:dX^H \quad&+\; conj \cr &= P^TA^*(I-PX)^*:dX + P^HA^H(I-PX):dX^* \quad&+\; conj \cr \cr &= P^TA^*(I-PX)^*:dX + P^HA^H(I-PX):dX^* \cr &\;+\; P^HA(I-PX):dX^* + P^TA^T(I-PX)^*:dX \cr \cr &= \Big(P^TA^*(I-PX)^* + P^TA^T(I-PX)^*\Big):dX \quad&+\; conj \cr &= P^T\,(A+A^H)^T\,(I-PX)^T:dX \quad&+\; conj \cr &= P^T\,(2PX - B-B^H)^T\,(I-PX)^T:dX \quad&+\; conj \cr \frac{\partial f}{\partial X} &= P^T\,(2PX - B-B^H)^T\,(I-PX)^T \cr \frac{\partial f}{\partial X^*} &= P^H\,(2PX - B-B^H)^H\,(I-PX)^H \cr \cr }$$

If all of the matrices are in fact real $\big($i.e. $X=X^*\big)$, then these Wirtinger derivatives can be combined into a single real result.

$$\eqalign{ \frac{\partial f}{\partial X} &= 2\,P^T\,(2PX - B-B^T)\,(I-PX) \cr }$$

In several places above, a colon is used to denote the trace product, i.e.

$$\eqalign{ &A:B &= {\rm Tr}(AB^T) \cr &A:A^*&= {\rm Tr}(AA^H) = \|A\|_F^2 \cr }$$
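The real-valued gradient can be compared against central finite differences. This is a rough sketch under stated assumptions, not a definitive implementation: the matrices are real, $X$ is a random wide matrix of full row rank, and the names `f`, `grad_f`, and `numerical_grad` as well as the step size `h` are hypothetical choices made for the illustration.

```python
import numpy as np

def f(X, B):
    """f(X) = || pinv(X) X - B ||_F^2 for real matrices."""
    A = np.linalg.pinv(X) @ X - B
    return np.sum(A * A)

def grad_f(X, B):
    """Closed form for the real case: 2 P^T (2PX - B - B^T) (I - PX)."""
    P = np.linalg.pinv(X)
    PX = P @ X
    I = np.eye(X.shape[1])
    return 2 * P.T @ (2 * PX - B - B.T) @ (I - PX)

def numerical_grad(X, B, h=1e-6):
    """Entry-wise central differences of f with respect to X."""
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = h
            G[i, j] = (f(X + E, B) - f(X - E, B)) / (2 * h)
    return G

rng = np.random.default_rng(1)
m, n = 3, 5                       # X is m x n, so pinv(X) X and B are n x n
X = rng.standard_normal((m, n))
B = rng.standard_normal((n, n))

G_formula = grad_f(X, B)
G_numeric = numerical_grad(X, B)
max_abs_err = np.max(np.abs(G_formula - G_numeric))
print(max_abs_err)
```

The gradient has the same shape as $X$. The comparison is only meaningful where the rank of $X$ is locally constant, since the pseudoinverse is not continuous across rank changes.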