Derivative of $\Bbb C^{n \times n} \to \Bbb C^{n \times n}$ function


If the function $f : \Bbb C^{n \times n} \to \Bbb C^{n \times n}$ is defined by $$ f(X) := X A X^{H} - X B - E X^{H} + F $$ find the derivative $\frac{\partial f}{\partial X}$. Here, $X^{H}$ denotes the complex conjugate transpose of $X$.

I need this because I want to find the minimizer of $\mbox{Trace}(f(X))$, so I want to differentiate and solve for the optimal $X$, i.e.,

$$\underset{X}{\min} \quad\mbox{Trace} \left( f(X) \right)$$


How do I find the derivative of $f$ with respect to $X$? I consulted the Matrix Cookbook but did not find the relevant results. I only found results for derivatives with respect to vectors, but here the derivative is with respect to a matrix.


There are 2 best solutions below


Since there's some debate about what question you intended, I'm going to address the real case (real matrices, so that $X^H$ is just the transpose), as it is less complicated.

You want to compute $\dfrac{\partial f}{\partial X}Y$, i.e., the directional derivative of $f$ at $X$ in the direction $Y$ (which is another $n \times n$ matrix). Viewing the derivative as a linear map acting on $Y$ is the cleanest way to handle it when the inputs are matrices. I'm going to write $^\top$ for the transpose.

We have \begin{align*} \frac{\partial f}{\partial X}Y &= \lim_{t\to 0} \frac{f(X+tY)-f(X)}t \\ &= \lim_{t\to 0}\frac1t \big((X+tY)A(X+tY)^\top - (X+tY)B - E(X+tY)^\top+F-XAX^\top+XB +EX^\top - F\big) \\ &= \lim_{t\to 0} \frac{tYAX^\top + tXAY^\top + t^2YAY^\top - tYB-tEY^\top}t \\ &= YAX^\top + XAY^\top - YB - EY^\top. \end{align*}

Note that there is no more concise formula, as the answer involves both $Y$ and $Y^\top$.
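As a sanity check, the closed form can be compared against a finite difference. This is only a minimal sketch, assuming NumPy and random real test matrices; the function names `f`, `directional_derivative` and `finite_difference` are made up for this illustration.

```python
import numpy as np

def f(X, A, B, E, F):
    """f(X) = X A X^T - X B - E X^T + F for real square matrices."""
    return X @ A @ X.T - X @ B - E @ X.T + F

def directional_derivative(X, Y, A, B, E):
    """Closed form derived above: Y A X^T + X A Y^T - Y B - E Y^T."""
    return Y @ A @ X.T + X @ A @ Y.T - Y @ B - E @ Y.T

def finite_difference(X, Y, A, B, E, F, t=1e-6):
    """Central-difference approximation of the derivative of f at X along Y."""
    return (f(X + t * Y, A, B, E, F) - f(X - t * Y, A, B, E, F)) / (2 * t)

# Random real test data (an assumption made only for this check).
rng = np.random.default_rng(0)
n = 4
X, Y, A, B, E, F = (rng.standard_normal((n, n)) for _ in range(6))

exact = directional_derivative(X, Y, A, B, E)
approx = finite_difference(X, Y, A, B, E, F)
max_error = np.max(np.abs(exact - approx))
print(max_error)  # should be tiny, limited only by floating-point rounding
```

Because $f$ is quadratic in $X$, the central difference has no truncation error, so any mismatch beyond rounding would point to a mistake in the formula.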


$ \def\l{\left} \def\r{\right} \def\o{{\tt1}} \def\p{\partial} \def\lr#1{\l(#1\r)} \def\trace#1{\operatorname{Tr}\lr{#1}} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\c#1{\color{red}{#1}} $Assume that $A=A^H,E=B^H\;{\rm and}\;F=F^H\,$ which ensures that the trace is real-valued.
$\big($NB: This implies that $\,A^*=A^T\;{\rm and}\;E^*=B^T\big)$
Let's also introduce a colon as a convenient product notation for the trace $$\eqalign{ A:B &= \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij} \;=\; \trace{AB^T} \\ }$$ Use colon products to rewrite the objective function $\phi = \trace{f(X)}$, then calculate its differential and gradient. $$\eqalign{ \phi &= X:\lr{AX^H}^T - X:B^T - B^H:\lr{X^H}^T + \trace{F} \\ d\phi &= dX:\lr{X^*A^*} - dX:B^T \\ \grad{\phi}{X} &= \lr{X^*A^* - B^T} \;=\; \lr{XA-E}^* \\ }$$ Setting the gradient to zero yields the stationary point $\;X=EA^{-1}$, assuming $A$ is invertible; it is a minimizer when $A$ is positive definite.
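Here is a rough numerical check of that solution. It is a sketch under extra assumptions that are not part of the question: NumPy, random test matrices, and a Hermitian positive definite $A$ (so that the stationary point is a minimum). The helper names `random_complex` and `phi` are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4

def random_complex(n):
    """Random n x n complex matrix (test data only)."""
    return rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

M = random_complex(n)
A = M @ M.conj().T + n * np.eye(n)  # Hermitian positive definite (assumption)
B = random_complex(n)
E = B.conj().T                      # E = B^H, as assumed in the answer
G = random_complex(n)
F = G + G.conj().T                  # Hermitian F

def phi(X):
    """phi(X) = Tr(X A X^H - X B - E X^H + F)."""
    return np.trace(X @ A @ X.conj().T - X @ B - E @ X.conj().T + F)

# Solve X A = E without forming A^{-1}: A^T X^T = E^T (plain transposes).
X_opt = np.linalg.solve(A.T, E.T).T

phi_opt = phi(X_opt)
gradient_at_opt = (X_opt @ A - E).conj()

# Random perturbations of X_opt should only increase the (real) objective.
increases = [phi(X_opt + 0.1 * random_complex(n)).real - phi_opt.real
             for _ in range(5)]

print(abs(phi_opt.imag))                # ~0: the trace is real-valued
print(np.max(np.abs(gradient_at_opt)))  # ~0: the gradient vanishes at X_opt
print(min(increases) > 0)
```

For a perturbation $D$ the objective changes by exactly $\operatorname{Tr}(DAD^H)$ at the stationary point, which is why positive definiteness of $A$ is what makes it a minimum.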

Notice that during the differentiation process, $X$ and $X^H$ are treated as independent variables. This is called Wirtinger differentiation (aka the $\mathbb{CR}$-calculus).
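To see what "independent variables" means numerically, the Wirtinger derivative can be approximated entrywise via $\frac{\partial\phi}{\partial X_{ij}} = \frac12\left(\frac{\partial\phi}{\partial\,\mathrm{Re}\,X_{ij}} - i\,\frac{\partial\phi}{\partial\,\mathrm{Im}\,X_{ij}}\right)$ and compared with $(XA-E)^*$. The following is a sketch assuming NumPy and random test data; `wirtinger_gradient_fd` is a hypothetical helper written for this check, not a library function.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3

def random_complex(shape):
    """Random complex matrix (test data only)."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

M = random_complex((n, n))
A = M + M.conj().T            # Hermitian A
B = random_complex((n, n))
E = B.conj().T                # E = B^H
F = np.eye(n, dtype=complex)  # any Hermitian F; it is constant, so it drops out

def phi(X):
    """Real-valued objective Tr(X A X^H - X B - E X^H + F)."""
    return np.trace(X @ A @ X.conj().T - X @ B - E @ X.conj().T + F).real

def wirtinger_gradient_fd(X, h=1e-5):
    """Entrywise 0.5 * (d/dRe - 1j * d/dIm) of phi, by central differences."""
    grad = np.zeros_like(X)
    for i in range(n):
        for j in range(n):
            P = np.zeros_like(X)
            P[i, j] = h
            d_re = (phi(X + P) - phi(X - P)) / (2 * h)            # vary Re X_ij
            d_im = (phi(X + 1j * P) - phi(X - 1j * P)) / (2 * h)  # vary Im X_ij
            grad[i, j] = 0.5 * (d_re - 1j * d_im)
    return grad

X = random_complex((n, n))
closed_form = (X @ A - E).conj()
gradient_error = np.max(np.abs(wirtinger_gradient_fd(X) - closed_form))
print(gradient_error)  # should be tiny, limited only by floating-point rounding
```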

If you calculate the gradient wrt $X^H$ instead, you will find that $$\eqalign{ \grad{\phi}{X^H} &= \lr{XA-E}^T \\ }$$ and the optimal solution is the same.
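For completeness, here is a sketch of that second calculation. It holds $X$ fixed, varies only $X^H$, and uses the identity $\operatorname{Tr}(PQ) = P^T:Q$, which follows from the definition of the colon product.

$$\begin{aligned} \phi &= (XA)^T : X^H - \operatorname{Tr}(XB) - E^T : X^H + \operatorname{Tr}(F) \\ d\phi &= (XA)^T : dX^H - E^T : dX^H \;=\; (XA-E)^T : dX^H \end{aligned}$$

The coefficient of $dX^H$ is the gradient $(XA-E)^T$, and it vanishes exactly when $XA = E$, matching the condition found above.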