Derivative of a column-normalized complex matrix


Note: I have found the answer to this and left my working below.

I am trying to find the derivative $\dfrac{\partial J}{\partial \mathbf{A}^\ast}$, which I have already reduced to \begin{align}\dfrac{\partial J}{\partial \mathbf{A}^\ast_{ij}} &= \dfrac{\operatorname{Tr}(({\partial J}/{\partial \mathbf{B}^\ast})^T\partial\mathbf{B}^\ast)}{\partial \mathbf{A}^\ast_{ij}} + \dfrac{\operatorname{Tr}(({\partial J}/{\partial \mathbf{B}})^T\partial\mathbf{B})}{\partial \mathbf{A}^\ast_{ij}}\\ &=\sum_{k,l}\left(\dfrac{\partial J}{\partial \mathbf{B}^\ast}\right)_{kl}\dfrac{\partial\mathbf{B}^\ast_{kl}}{\partial \mathbf{A}^\ast_{ij}} + \sum_{k,l}\left(\dfrac{\partial J}{\partial \mathbf{B}}\right)_{kl}\dfrac{\partial\mathbf{B}_{kl}}{\partial \mathbf{A}^\ast_{ij}} \end{align} using this. Here $J$ is a real-valued scalar that depends on $\mathbf{A}$, and the matrices $\mathbf{B}$ and $\mathbf{A}$ are related by $$\mathbf{B}_{ij} = \dfrac{\mathbf{A}_{ij}}{\sqrt{\sum_{m}\mathbf{A}_{mj}\mathbf{A}_{mj}^\ast}}$$ i.e. $\mathbf{B}$ is $\mathbf{A}$ with each column normalized to unit $L_2$ norm.
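As a concrete illustration of the relation between $\mathbf{B}$ and $\mathbf{A}$, here is a minimal NumPy sketch, assuming NumPy is available; the function name `column_normalize` is hypothetical. It divides each column by $\sqrt{\sum_m \mathbf{A}_{mj}\mathbf{A}^\ast_{mj}}$, exactly as in the formula above.

```python
import numpy as np

def column_normalize(A):
    """Hypothetical helper: B[i, j] = A[i, j] / sqrt(sum_m A[m, j] * conj(A[m, j]))."""
    s = np.sum(A * A.conj(), axis=0).real  # s[j] = squared L2 length of column j
    return A / np.sqrt(s)                  # broadcasting divides column j by sqrt(s[j])

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))
B = column_normalize(A)

# Every column of B should have unit L2 norm (up to rounding).
print(np.linalg.norm(B, axis=0))
```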

However, I can't quite figure out how to proceed with the differentiation. I would appreciate any help!

My attempt:

We focus on \begin{align} \dfrac{\partial\mathbf{B}^\ast_{kl}}{\partial \mathbf{A}^\ast_{ij}} &=\dfrac{\partial}{\partial \mathbf{A}^\ast_{ij}} \dfrac{\mathbf{A}_{kl}^\ast}{\sqrt{\sum_{m}\mathbf{A}_{ml}\mathbf{A}_{ml}^\ast}} \end{align} which is zero if $l\ne j$, because column $l$ of $\mathbf{B}$ depends only on column $l$ of $\mathbf{A}$. So I now consider \begin{align} \dfrac{\partial\mathbf{B}^\ast_{kj}}{\partial \mathbf{A}^\ast_{ij}} &=\dfrac{\partial}{\partial \mathbf{A}^\ast_{ij}} \dfrac{\mathbf{A}_{kj}^\ast}{\sqrt{\sum_{m}\mathbf{A}_{mj}\mathbf{A}_{mj}^\ast}} \end{align}

For the case where $i=k$, treating $\mathbf{A}_{ij}$ and $\mathbf{A}^\ast_{ij}$ as independent variables (Wirtinger calculus), \begin{align} \dfrac{\partial\mathbf{B}^\ast_{ij}}{\partial \mathbf{A}^\ast_{ij}} &=\dfrac{\partial}{\partial \mathbf{A}^\ast_{ij}} \dfrac{\mathbf{A}_{ij}^\ast}{\sqrt{\mathbf{A}_{ij}\mathbf{A}^\ast_{ij} + \sum_{m\ne i}\mathbf{A}_{mj}\mathbf{A}_{mj}^\ast}}\\ &= \dfrac{1}{\sqrt{\sum_{m}\mathbf{A}_{mj}\mathbf{A}_{mj}^\ast}}\left(1-\dfrac{1}{2}\dfrac{\mathbf{A}^\ast_{ij}\mathbf{A}_{ij}}{\sum_{m}\mathbf{A}_{mj}\mathbf{A}_{mj}^\ast}\right) \end{align} For the case where $i\ne k$, \begin{align} \dfrac{\partial\mathbf{B}^\ast_{kj}}{\partial \mathbf{A}^\ast_{ij}} &=\dfrac{\partial}{\partial \mathbf{A}^\ast_{ij}} \dfrac{\mathbf{A}_{kj}^\ast}{\sqrt{\mathbf{A}_{ij}\mathbf{A}^\ast_{ij} + \sum_{m\ne i}\mathbf{A}_{mj}\mathbf{A}_{mj}^\ast}}\\ &= \dfrac{1}{\sqrt{\sum_{m}\mathbf{A}_{mj}\mathbf{A}_{mj}^\ast}}\left(-\dfrac{1}{2}\dfrac{\mathbf{A}^\ast_{kj}\mathbf{A}_{ij}}{\sum_{m}\mathbf{A}_{mj}\mathbf{A}_{mj}^\ast}\right) \end{align}
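The two cases can be combined into a single expression with Kronecker deltas. Writing $s_j=\sum_m\mathbf{A}_{mj}\mathbf{A}^\ast_{mj}$ as shorthand (notation introduced here for compactness, not used elsewhere in the post) and using $\mathbf{A}_{ij}/\sqrt{s_j}=\mathbf{B}_{ij}$, a compact form of the same result is

\begin{align} \dfrac{\partial\mathbf{B}^\ast_{kl}}{\partial \mathbf{A}^\ast_{ij}} &= \dfrac{\delta_{lj}}{\sqrt{s_j}}\left(\delta_{ik}-\dfrac{1}{2}\dfrac{\mathbf{A}^\ast_{kj}\mathbf{A}_{ij}}{s_j}\right) = \dfrac{\delta_{lj}}{\sqrt{s_j}}\left(\delta_{ik}-\dfrac{1}{2}\mathbf{B}^\ast_{kj}\mathbf{B}_{ij}\right) \end{align}

Setting $k=i$ recovers the first case and $k\ne i$ the second, while $\delta_{lj}$ encodes the observation that only column $j$ contributes.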

So we have \begin{align} \dfrac{\operatorname{Tr}(({\partial J}/{\partial \mathbf{B}^\ast})^T\partial\mathbf{B}^\ast)}{\partial \mathbf{A}^\ast_{ij}} &=\sum_{k}\left(\dfrac{\partial J}{\partial \mathbf{B}^\ast}\right)_{kj}\dfrac{\partial\mathbf{B}^\ast_{kj}}{\partial \mathbf{A}^\ast_{ij}}\\ &=\dfrac{1}{\sqrt{\sum_{m}\mathbf{A}_{mj}\mathbf{A}_{mj}^\ast}}\left(\dfrac{\partial J}{\partial \mathbf{B}^\ast}\right)_{ij}-\dfrac{1}{2\sqrt{\sum_{m}\mathbf{A}_{mj}\mathbf{A}_{mj}^\ast}}\sum_{k}\left(\dfrac{\partial J}{\partial \mathbf{B}^\ast}\right)_{kj}\dfrac{\mathbf{A}^\ast_{kj}\mathbf{A}_{ij}}{\sum_{m}\mathbf{A}_{mj}\mathbf{A}_{mj}^\ast} \end{align}

Again only $l=j$ contributes, so we consider \begin{align} \dfrac{\partial\mathbf{B}_{kj}}{\partial \mathbf{A}^\ast_{ij}} &=\dfrac{\partial}{\partial \mathbf{A}^\ast_{ij}} \dfrac{\mathbf{A}_{kj}}{\sqrt{\mathbf{A}_{ij}\mathbf{A}^\ast_{ij} + \sum_{m\ne i}\mathbf{A}_{mj}\mathbf{A}_{mj}^\ast}} =\dfrac{1}{\sqrt{\sum_{m}\mathbf{A}_{mj}\mathbf{A}_{mj}^\ast}}\left(-\dfrac{1}{2}\dfrac{\mathbf{A}_{kj}\mathbf{A}_{ij}}{\sum_{m}\mathbf{A}_{mj}\mathbf{A}_{mj}^\ast}\right) \end{align} which holds for every $k$ (including $k=i$), since the numerator $\mathbf{A}_{kj}$ does not depend on $\mathbf{A}^\ast_{ij}$.

So we have \begin{align} \dfrac{\operatorname{Tr}(({\partial J}/{\partial \mathbf{B}})^T\partial\mathbf{B})}{\partial \mathbf{A}^\ast_{ij}} &=\sum_{k}\left(\dfrac{\partial J}{\partial \mathbf{B}}\right)_{kj}\dfrac{\partial\mathbf{B}_{kj}}{\partial \mathbf{A}^\ast_{ij}}\\ &=-\dfrac{1}{2\sqrt{\sum_{m}\mathbf{A}_{mj}\mathbf{A}_{mj}^\ast}}\sum_{k}\left(\dfrac{\partial J}{\partial \mathbf{B}}\right)_{kj}\dfrac{\mathbf{A}_{kj}\mathbf{A}_{ij}}{\sum_{m}\mathbf{A}_{mj}\mathbf{A}_{mj}^\ast} \end{align}
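Both partial derivatives of $\mathbf{B}$ with respect to $\mathbf{A}^\ast_{ij}$ can be sanity-checked numerically. The sketch below assumes NumPy; the helper names `column_normalize` and `wirtinger_conj` are hypothetical. It estimates a Wirtinger derivative by central differences, using $\partial f/\partial z^\ast = \tfrac12(\partial f/\partial x + i\,\partial f/\partial y)$ for $z=x+iy$, and compares column $j$ with the closed forms $\partial\mathbf{B}_{kj}/\partial\mathbf{A}^\ast_{ij} = -\tfrac12\mathbf{A}_{kj}\mathbf{A}_{ij}/s_j^{3/2}$ and $\partial\mathbf{B}^\ast_{kj}/\partial\mathbf{A}^\ast_{ij} = (\delta_{ik}-\tfrac12\mathbf{A}^\ast_{kj}\mathbf{A}_{ij}/s_j)/\sqrt{s_j}$, where $s_j=\sum_m\mathbf{A}_{mj}\mathbf{A}^\ast_{mj}$.

```python
import numpy as np

def column_normalize(A):
    """Hypothetical helper: divide each column of A by its L2 length."""
    return A / np.linalg.norm(A, axis=0)

def wirtinger_conj(f, A, i, j, eps=1e-6):
    """Central-difference estimate of d f(A) / d conj(A[i, j]) = (d/dx + 1j d/dy) / 2,
    taken entrywise when f returns an array."""
    E = np.zeros_like(A)
    E[i, j] = 1.0
    dx = (f(A + eps * E) - f(A - eps * E)) / (2 * eps)
    dy = (f(A + 1j * eps * E) - f(A - 1j * eps * E)) / (2 * eps)
    return 0.5 * (dx + 1j * dy)

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))
s = np.sum(np.abs(A) ** 2, axis=0)  # s[j] = sum_m A[m, j] conj(A[m, j])
i, j = 1, 2

dB = wirtinger_conj(column_normalize, A, i, j)                       # d B[k, l] / d A*[i, j]
dBc = wirtinger_conj(lambda X: column_normalize(X).conj(), A, i, j)  # d B*[k, l] / d A*[i, j]

# Closed forms for column l = j; every other column should be zero.
delta = (np.arange(A.shape[0]) == i).astype(float)
dB_expected = -0.5 * A[:, j] * A[i, j] / s[j] ** 1.5
dBc_expected = (delta - 0.5 * A[:, j].conj() * A[i, j] / s[j]) / np.sqrt(s[j])

print(np.max(np.abs(dB[:, j] - dB_expected)), np.max(np.abs(dBc[:, j] - dBc_expected)))
```

Only column $j$ is perturbed, so the other columns of both finite-difference estimates come out exactly zero, matching the $l\ne j$ argument.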

Thus the overall expression evaluates to \begin{align}\dfrac{\partial J}{\partial \mathbf{A}^\ast_{ij}} &= \dfrac{\operatorname{Tr}(({\partial J}/{\partial \mathbf{B}^\ast})^T\partial\mathbf{B}^\ast)}{\partial \mathbf{A}^\ast_{ij}} + \dfrac{\operatorname{Tr}(({\partial J}/{\partial \mathbf{B}})^T\partial\mathbf{B})}{\partial \mathbf{A}^\ast_{ij}}\\ &=\sum_{k,l}\left(\dfrac{\partial J}{\partial \mathbf{B}^\ast}\right)_{kl}\dfrac{\partial\mathbf{B}^\ast_{kl}}{\partial \mathbf{A}^\ast_{ij}} + \sum_{k,l}\left(\dfrac{\partial J}{\partial \mathbf{B}}\right)_{kl}\dfrac{\partial\mathbf{B}_{kl}}{\partial \mathbf{A}^\ast_{ij}}\\ &=\dfrac{1}{\sqrt{\sum_{m}\mathbf{A}_{mj}\mathbf{A}_{mj}^\ast}}\left(\dfrac{\partial J}{\partial \mathbf{B}^\ast}\right)_{ij}-\dfrac{1}{2\sqrt{\sum_{m}\mathbf{A}_{mj}\mathbf{A}_{mj}^\ast}}\sum_{k}\left[\left(\dfrac{\partial J}{\partial \mathbf{B}^\ast}\right)_{kj}\mathbf{A}^\ast_{kj}+\left(\dfrac{\partial J}{\partial \mathbf{B}}\right)_{kj}\mathbf{A}_{kj}\right]\dfrac{\mathbf{A}_{ij}}{\sum_{m}\mathbf{A}_{mj}\mathbf{A}_{mj}^\ast}\\ &=\dfrac{1}{\sqrt{\sum_{m}\mathbf{A}_{mj}\mathbf{A}_{mj}^\ast}}\left[\left(\dfrac{\partial J}{\partial \mathbf{B}^\ast}\right)_{ij}-\mathbf{B}_{ij}\operatorname{Re}\sum_{k}\left(\dfrac{\partial J}{\partial \mathbf{B}^\ast}\right)_{kj}\mathbf{B}^\ast_{kj}\right] \end{align} where the last step uses $\mathbf{A}_{kj}/\sqrt{\sum_{m}\mathbf{A}_{mj}\mathbf{A}_{mj}^\ast}=\mathbf{B}_{kj}$ and the fact that, because $J$ is real, $\left(\partial J/\partial \mathbf{B}\right)_{kj}$ is the complex conjugate of $\left(\partial J/\partial \mathbf{B}^\ast\right)_{kj}$, so the bracketed sum equals twice a real part.
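The closed form $\dfrac{1}{\sqrt{s_j}}\left[\left(\partial J/\partial \mathbf{B}^\ast\right)_{ij}-\mathbf{B}_{ij}\operatorname{Re}\sum_{k}\left(\partial J/\partial \mathbf{B}^\ast\right)_{kj}\mathbf{B}^\ast_{kj}\right]$, with $s_j=\sum_m\mathbf{A}_{mj}\mathbf{A}^\ast_{mj}$, can be checked against finite differences for a concrete objective. The sketch below assumes NumPy and a hypothetical test objective $J(\mathbf{B})=\|\mathbf{B}-\mathbf{C}\|_F^2$ with a random $\mathbf{C}$, for which $\partial J/\partial\mathbf{B}^\ast=\mathbf{B}-\mathbf{C}$; the function names are also hypothetical. The real part matters: it arises because the $\partial J/\partial\mathbf{B}^\ast$ and $\partial J/\partial\mathbf{B}$ sums are complex conjugates of each other when $J$ is real.

```python
import numpy as np

def column_normalize(A):
    """Hypothetical helper: divide each column of A by its L2 length."""
    return A / np.linalg.norm(A, axis=0)

def grad_wrt_conj_A(A, G_conj):
    """Closed form for dJ/dA*[i, j], given G_conj[k, l] = dJ/dB*[k, l] and a real-valued J:
    (1 / h_j) * (G_conj[i, j] - B[i, j] * Re(sum_k G_conj[k, j] * conj(B[k, j])))."""
    h = np.linalg.norm(A, axis=0)              # h[j] = sqrt(s_j)
    B = A / h
    inner = np.sum(G_conj * B.conj(), axis=0)  # one complex number per column
    return (G_conj - B * inner.real) / h

# Hypothetical test objective J(B) = ||B - C||_F^2, for which dJ/dB* = B - C.
rng = np.random.default_rng(2)
A = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))
C = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))

def J(A):
    return np.sum(np.abs(column_normalize(A) - C) ** 2)

analytic = grad_wrt_conj_A(A, column_normalize(A) - C)

# Wirtinger derivative by central differences: d/dz* = (d/dx + 1j d/dy) / 2.
eps = 1e-6
numeric = np.zeros_like(A)
for i in range(A.shape[0]):
    for j in range(A.shape[1]):
        E = np.zeros_like(A)
        E[i, j] = 1.0
        dx = (J(A + eps * E) - J(A - eps * E)) / (2 * eps)
        dy = (J(A + 1j * eps * E) - J(A - 1j * eps * E)) / (2 * eps)
        numeric[i, j] = 0.5 * (dx + 1j * dy)

print(np.max(np.abs(analytic - numeric)))  # should be at finite-difference noise level
```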

Best answer:

Given two matrices $(X,Y)$ with the same shape (the same numbers of rows and columns), denote their Hadamard and Frobenius products as $\;\;X\odot Y,\;\;X:Y\!=\!{\rm Tr}(X^TY),\;$ respectively.
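Because this Frobenius product uses a plain transpose rather than a conjugate transpose, it is easy to confuse with the usual complex inner product. A small NumPy sketch (the helper name `frob` is hypothetical) confirms two identities the derivation below relies on: $X:Y$ equals the sum of the entries of $X\odot Y$, and a right factor can be moved across the colon, $X:(YM)=(XM^T):Y$.

```python
import numpy as np

def frob(X, Y):
    """X : Y = Tr(X^T Y) with a plain transpose (no complex conjugation)."""
    return np.trace(X.T @ Y)

rng = np.random.default_rng(3)
X = rng.standard_normal((3, 2)) + 1j * rng.standard_normal((3, 2))
Y = rng.standard_normal((3, 2)) + 1j * rng.standard_normal((3, 2))
M = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))

# X : Y is the sum of the entries of the Hadamard product X ⊙ Y.
print(np.isclose(frob(X, Y), np.sum(X * Y)))          # True
# A right factor can be moved across the colon: X : (Y M) = (X M^T) : Y.
print(np.isclose(frob(X, Y @ M), frob(X @ M.T, Y)))   # True
```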

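Let's also use $(X^T,X^C,X^*)$ to denote the transpose, complex conjugate, and Hermitian conjugate of $X$. Note that in this answer $X^*$ is the Hermitian conjugate, whereas in the question $\mathbf{A}^\ast$ denotes the elementwise complex conjugate.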

Recall that the elements of the Gram matrix $(A^*A)$ are the inner products of the columns of $A$. In particular, the $k^{th}$ diagonal element is equal to the square of the length of the $k^{th}$ column of $A$.

If we create a vector $(h)$ of the lengths of the columns of $A$, then it must satisfy $$\eqalign{ h\odot h &= {\rm diag}(A^*A) \\ 2h\odot dh &= {\rm diag}(A^*dA+dA^*A) \\ }$$ It will prove convenient to use $h$ to create a diagonal matrix as well as its inverse. $$\eqalign{ H &= {\rm Diag}(h),\quad Y=H^{-1} \\ 2H\odot dH &= {\rm Diag}\Big({\rm diag}(A^*dA+dA^*A)\Big) \\ dH &= \tfrac{1}{2}Y\odot\big(A^*dA+dA^*A\big) \\ }$$ Note that $H$ and $Y$ are diagonal (hence symmetric) and their elements are all real.
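The definitions of $h$ and $dH$ can be checked numerically. The sketch below assumes NumPy; the helper `col_lengths` and the random perturbation direction `dA` are hypothetical. It forms $h$ from the diagonal of $A^*A$ (Hermitian conjugate) and compares $\tfrac12 Y\odot(A^*dA+dA^*A)$ with a central difference of $H$ along $dA$.

```python
import numpy as np

def col_lengths(A):
    """h = sqrt(diag(A^* A)), where A^* is the Hermitian conjugate."""
    return np.sqrt(np.diag(A.conj().T @ A).real)

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))
dA = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))  # arbitrary direction

h = col_lengths(A)
Y = np.diag(1 / h)  # Y = H^{-1}

# h ⊙ h equals the squared column lengths, i.e. diag(A^* A).
print(np.allclose(h * h, np.sum(np.abs(A) ** 2, axis=0)))

# dH = (1/2) Y ⊙ (A^* dA + dA^* A); Y is diagonal, so ⊙ keeps only the diagonal.
dH_formula = 0.5 * Y * (A.conj().T @ dA + dA.conj().T @ A)
eps = 1e-6
dH_fd = (np.diag(col_lengths(A + eps * dA)) - np.diag(col_lengths(A - eps * dA))) / (2 * eps)
print(np.allclose(dH_fd, dH_formula, atol=1e-6))
```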

Using these new variables, we can write $B=AY$.

The stated goal is to find the gradient of a real scalar function $(J)$ with respect to $A$, given its gradient $(G)$ with respect to $B$. We will approach this problem by calculating the differential of $J$, and then performing a change of variables from $B\to A$.

The differential will consist of a set of terms plus their complex conjugates. To save horizontal space, I'll only write the first set of terms and simply refer to the conjugate terms as ${conj}$. The step from $dY$ to $dH$ uses $dY=-Y\,dH\,Y$, which follows from differentiating $HY=I$. $$\eqalign{ dJ &= G:dB + G^*:dB^* \\ &= G:dB &\quad+\quad{conj} \\ &= G:d(AY) &\quad+\quad{conj} \\ &= GY:dA + A^TG:dY &\quad+\quad{conj} \\ &= GY:dA - A^TG:Y\,dH\,Y &\quad+\quad{conj} \\ &= GY:dA - YA^TGY:dH &\quad+\quad{conj} \\ &= GY:dA - \tfrac{1}{2}\big(YA^TGY\big):Y\odot\big(A^*dA+dA^*A\big) &\quad+\quad{conj} \\ &= GY:dA - \tfrac{1}{2}(Y^3\odot A^TG):\big(A^*dA+dA^*A\big) &\quad+\quad{conj} \\ &= \Big(GY-\tfrac{1}{2}A^{C}(Y^3\odot A^TG)\Big):dA - \tfrac{1}{2}(Y^3\odot A^TG)A^T:dA^* &\quad+\quad{conj} \\ }$$

At this point, we notice that a term involving $dA^*$ has appeared, which means that the conjugate contains a corresponding term involving $dA$. Swap these and collect all of the terms involving $dA$. Because $Y^3\odot A^TG$ is diagonal, adding it to its Hermitian conjugate gives twice its real part; call that real part $R$. $$\eqalign{ dJ &= \Big(GY-\tfrac{1}{2}A^{C}(Y^3\odot A^TG + (Y^3\odot A^TG)^*)\Big):dA &\quad+\quad{conj} \\ &= \Big(GY-A^{C}\,{\cal Re}(Y^3\odot A^TG)\Big):dA &\quad+\quad{conj} \\ &= \Big(GY-A^{C}R\Big):dA &\quad+\quad{conj} \\ &= \big(GY-A^{C}R\big):dA \;+\; \big(GY-A^{C}R\big)^*:dA^* }$$

This gives us our expression for the new gradient $$\eqalign{ \frac{\partial J}{\partial A} &= GY-A^{C}R \\ &= \bigg(\frac{\partial J}{\partial B}\bigg)Y-A^{C}\,{\cal Re}\Bigg(Y^3\odot A^T\bigg(\frac{\partial J}{\partial B}\bigg)\Bigg) \\ }$$ and its conjugate (which is what this question actually asks for) $$\eqalign{ \frac{\partial J}{\partial A^*} &= YG^*-RA^T \\ &= Y\bigg(\frac{\partial J}{\partial B^*}\bigg)-{\cal Re}\Bigg(Y^3\odot \bigg(\frac{\partial J}{\partial B^*}\bigg)^CA\Bigg)\,A^T \\ }$$ Since $A^*$ denotes the Hermitian conjugate in this answer, this $\partial J/\partial A^*$ is the transpose of the matrix with entries $\partial J/\partial \mathbf{A}^\ast_{ij}$ in the question's notation.
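The matrix formulas can be compared with the elementwise expression $\frac{1}{h_j}\big[(\partial J/\partial\mathbf{B}^\ast)_{ij}-\mathbf{B}_{ij}\operatorname{Re}\sum_k(\partial J/\partial\mathbf{B}^\ast)_{kj}\mathbf{B}^\ast_{kj}\big]$, where the star means elementwise conjugation. The sketch below assumes NumPy and a hypothetical objective $J(B)=\|B-C\|_F^2$ with random $C$, for which $G=\partial J/\partial B$ is the complex conjugate of $B-C$. Because $A^*$ is the Hermitian conjugate here, $YG^*-RA^T$ should equal the transpose of the elementwise matrix, and $GY-A^CR$ its complex conjugate.

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))
C = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))

h = np.linalg.norm(A, axis=0)  # column lengths
Y = np.diag(1 / h)             # Y = H^{-1}
B = A @ Y                      # B = AY

# Hypothetical objective J(B) = ||B - C||_F^2; its gradient w.r.t. B is G = conj(B - C).
G = (B - C).conj()

R = np.real(Y**3 * (A.T @ G))       # R = Re(Y^3 ⊙ A^T G), a real diagonal matrix
dJ_dA = G @ Y - A.conj() @ R        # dJ/dA = GY - A^C R
dJ_dAH = Y @ G.conj().T - R @ A.T   # dJ/dA^* (Hermitian convention) = Y G^* - R A^T

# Elementwise form with G' = dJ/dB* (elementwise conjugate convention) = conj(G).
Gp = G.conj()
elementwise = (Gp - B * np.sum(Gp * B.conj(), axis=0).real) / h

print(np.allclose(dJ_dAH, elementwise.T), np.allclose(dJ_dA, elementwise.conj()))
```

Since `Y` is diagonal, the elementwise power `Y**3` coincides with the matrix cube, so `Y**3 * M` implements $Y^3\odot M$ directly.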