I'm struggling to find the directional derivative of the gradient of a function of complex matrices.
As a first setup, consider a function $g:\mathbb{C}^n\to\mathbb{R}$ such that $$g(\mathbf{x})=\frac{1}{4}\big(\mathbf{x}^*\mathbf{A}\mathbf{x}-a\big)^2$$ where $\mathbf{A}=\mathbf{A}^*\in\mathbb{C}^{n\times n}$ and $a\in\mathbb{R}$, $a\neq 0$. Since $g$ is real-valued, its Wirtinger derivatives w.r.t. $\mathbf{x}$ and $\mathbf{x}^*$ are complex conjugates of each other, so either one determines the gradient. Its gradient $\nabla_{\mathbf{x}} g$ (taken as twice the Wirtinger derivative w.r.t. the conjugate variable) is $$\begin{align} \nabla_{\mathbf{x}}g&=\big(\mathbf{x}^*\mathbf{A}\mathbf{x}-a\big)\mathbf{A}\mathbf{x} \end{align}$$ and $\mathrm{D}\big(\nabla_{\mathbf{x}} g\big)[\mathbf{u}]$, the directional derivative of the gradient in the direction $\mathbf{u}$, is given by (using Fréchet derivatives): $$\begin{align} \mathrm{D}\big(\nabla_{\mathbf{x}}g\big)[\mathbf{u}]&=\lim_{t\to0}\frac{\nabla_{\mathbf{x}}g(\mathbf{x}+t\mathbf{u})-\nabla_{\mathbf{x}}g(\mathbf{x})}{t}=\big(\mathbf{x}^*\mathbf{A}\mathbf{x}-a\big)\mathbf{A}\mathbf{u}+\big(\mathbf{x}^*\mathbf{A}\mathbf{u}+\mathbf{u}^*\mathbf{A}\mathbf{x}\big)\mathbf{A}\mathbf{x} \end{align}$$
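A minimal NumPy sketch of how the two vector formulas above can be checked against central finite differences with a real step $t$. The random test data, the seed, the step size, and the helper names `g`, `grad_g`, and `dgrad_g` are illustrative assumptions, not part of the derivation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# Random Hermitian A and an arbitrary nonzero real constant a (assumed test data).
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = (B + B.conj().T) / 2
a = 1.5

def g(x):
    # g(x) = (x^* A x - a)^2 / 4; x^* A x is real because A is Hermitian.
    return 0.25 * (np.vdot(x, A @ x).real - a) ** 2

def grad_g(x):
    # Gradient formula from the text: (x^* A x - a) A x.
    return (np.vdot(x, A @ x).real - a) * (A @ x)

def dgrad_g(x, u):
    # Directional derivative formula from the text.
    s = np.vdot(x, A @ x).real - a
    ds = 2 * np.vdot(u, A @ x).real  # x^* A u + u^* A x = 2 Re(u^* A x)
    return s * (A @ u) + ds * (A @ x)

x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
u = rng.standard_normal(n) + 1j * rng.standard_normal(n)
t = 1e-5  # real step: the gradient is only real-linear in the direction u

# First-order check: d/dt g(x + t u) at t = 0 should equal Re(grad^* u).
fd_g = (g(x + t * u) - g(x - t * u)) / (2 * t)
err_grad = abs(fd_g - np.vdot(grad_g(x), u).real)

# Second-order check: central difference of the gradient along u.
fd_dgrad = (grad_g(x + t * u) - grad_g(x - t * u)) / (2 * t)
err_dgrad = np.linalg.norm(fd_dgrad - dgrad_g(x, u))

print("gradient error:", err_grad)
print("directional-derivative error:", err_dgrad)
```

The step $t$ is kept real on purpose: the expression contains both $\mathbf{u}$ and $\mathbf{u}^*$, so it is linear over the reals only, and a complex step would not reproduce it.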
Now, consider the generalization of $g$ to complex matrices $\mathbf{X}\in\mathbb{C}^{n\times p}$: $$\begin{align} f(\mathbf{X})&=\sum_{i=1}^pg(\mathbf{X}\mathbf{e}_i)=\frac{1}{4}\big\|\mathrm{diag}\left(\mathbf{X}^*\mathbf{A}\mathbf{X}-a\mathbf{I}\right)\big\|^2=\frac{1}{4}\big\|\left(\mathbf{X}^*\mathbf{A}\mathbf{X}\right)\circ\mathbf{I}-a\mathbf{I}\big\|_F^2 \end{align}$$ When computing the gradient $\nabla_{\mathbf{X}} f$, I obtain $$\begin{align} \nabla_{\mathbf{X}}f&=\mathbf{A}\mathbf{X}\big(\left(\mathbf{X}^*\mathbf{A}\mathbf{X}\right)\circ\mathbf{I}-a\mathbf{I}\big) \end{align}$$ which seems to be correct, as it passes all my numerical tests. But when computing the directional derivative of the gradient using Fréchet derivatives, I obtain $$\begin{align} \mathrm{D}\big(\nabla_{\mathbf{X}}f\big)[\mathbf{U}]&=\lim_{t\to0}\frac{\nabla_{\mathbf{X}}f(\mathbf{X}+t\mathbf{U})-\nabla_{\mathbf{X}}f(\mathbf{X})}{t}\\ &=\mathbf{A}\mathbf{U}\big(\left(\mathbf{X}^*\mathbf{A}\mathbf{X}\right)\circ\mathbf{I}-a\mathbf{I}\big)+\mathbf{A}\mathbf{X}\big(\left(\mathbf{X}^*\mathbf{A}\mathbf{U}+\mathbf{U}^*\mathbf{A}\mathbf{X}\right)\circ\mathbf{I}\big) \end{align}$$ which fails my numerical tests. I obtain the same result when working out the gradient and the directional derivative on an element-by-element basis. Is there something that I'm missing, such as how the diagonal structure of the matrix $\mathbf{X}^*\mathbf{AX}$ imposed by the Hadamard product enters the chain rule? I haven't been able to obtain a formulation that passes my numerical evaluations. Thanks.
Assume the matrix $A$ is Hermitian, $$A=A^H \quad\iff\quad A^T=A^C$$ where the superscripts $(T,H,C)$ denote the transpose, Hermitian (conjugate) transpose, and complex conjugate, respectively. Consider the following real scalar functions of the vector $y$ $$\eqalign{ \psi &= \frac{(Ay)^C:y -\alpha}{2} \\ d\psi &= \frac{(Ay)^C:dy+(Ay):dy^C}{2} \\ \\ \phi &= \psi^2 \\ d\phi &= 2\psi\,d\psi \\ &= (\psi Ay)^C:dy + (\psi Ay):dy^C \\ \frac{\partial\phi}{\partial y} &= (\psi Ay)^C \qquad({\rm gradient\,wrt\,}y) \\ \frac{\partial\phi}{\partial y^C} &= (\psi Ay) \\ }$$ Now substitute $\,Xe_k\to y\;$ (where $e_k$ denotes a standard basis vector) and define the component functions $$\eqalign{ \psi_k &\doteq \psi(Xe_k),\qquad \phi_k &\doteq \phi(Xe_k) \\ }$$ Finally, create your generalized function by summing over the components. $$\eqalign{ \Phi &= \sum_k \phi_k \\ d\Phi &= \sum_k(\psi_kAXe_k)^C:dX\,e_k + (\psi_kAXe_k):dX^Ce_k \\ &= \sum_k(AX)^C(\psi_ke_ke_k^T):dX + (AX)\,(\psi_ke_ke_k^T):dX^C \\ }$$ Define the real diagonal matrix $P = {\rm Diag}(\psi_k)$ to write this in a concise form. $$\eqalign{ d\Phi &= (AXP)^C:dX + (AXP):dX^C \\ &= 2\;{\cal Re}\Big((AXP)^C:dX\Big) \\ \frac{\partial\Phi}{\partial X} &= (AXP)^C \qquad({\rm gradient\,wrt\,}X) \\ \frac{\partial\Phi}{\partial X^C} &= (AXP) \\ }$$ Substitute $\,U\to dX\,$ and $\,d\Phi\,$ becomes the directional derivative of $\Phi$ in the direction $U$.
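To tie this back to the notation of the question, here is a short sketch, assuming $\alpha=a$ and assuming the question takes the gradient to be twice the derivative with respect to the conjugate variable (an inference from its formula for $\nabla_{\mathbf{x}}g$, not something stated there): $$\eqalign{ \psi_k &= \tfrac{1}{2}\big(e_k^TX^HAXe_k-\alpha\big) \\ P &= \tfrac{1}{2}\Big((X^HAX)\circ I-\alpha I\Big) \\ 2\,\frac{\partial\Phi}{\partial X^C} &= 2AXP = AX\Big((X^HAX)\circ I-\alpha I\Big) \\ }$$ Under those assumptions, $2AXP$ coincides with the gradient $\nabla_{\mathbf{X}}f$ written in the question.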
NB: In the above, a colon is used to denote the trace/Frobenius product $$A:B = {\rm Tr}(A^TB) = {\rm Tr}(AB^T) = B:A$$
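A minimal NumPy sketch of a numerical check for the result $d\Phi = (AXP)^C:U + (AXP):U^C$, using the colon product exactly as defined above (a plain transpose, no conjugation). The random test data, the seed, the step size, and the helper names `frob`, `psi`, and `Phi` are illustrative assumptions.

```python
import numpy as np

def frob(M, N):
    """Trace/Frobenius product M:N = Tr(M^T N), with no conjugation."""
    return np.trace(M.T @ N)

rng = np.random.default_rng(1)
n, p = 5, 3

# Random Hermitian A and an arbitrary real alpha (assumed test data).
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = (B + B.conj().T) / 2
alpha = 0.7

def psi(X):
    # psi_k = ((A x_k)^C : x_k - alpha) / 2 for every column x_k of X.
    quad = np.einsum("ik,ik->k", (A @ X).conj(), X).real
    return (quad - alpha) / 2

def Phi(X):
    # Phi = sum_k psi_k^2
    return np.sum(psi(X) ** 2)

X = rng.standard_normal((n, p)) + 1j * rng.standard_normal((n, p))
U = rng.standard_normal((n, p)) + 1j * rng.standard_normal((n, p))

P = np.diag(psi(X))   # real diagonal matrix Diag(psi_k)
G = A @ X @ P         # dPhi/dX^C = AXP

# Directional derivative from the formula, with dX replaced by U.
dPhi = frob(G.conj(), U) + frob(G, U.conj())

# Central finite difference of Phi along U with a real step.
t = 1e-5
fd = (Phi(X + t * U) - Phi(X - t * U)) / (2 * t)

# Symmetry of the colon product: A:B = Tr(A^T B) = Tr(A B^T) = B:A.
sym_gap = abs(frob(G, U) - frob(U, G)) + abs(frob(G, U) - np.trace(G @ U.T))

print("formula:", dPhi.real, " finite difference:", fd)
print("imaginary part of formula:", dPhi.imag)
print("symmetry gap:", sym_gap)
```

The two terms of `dPhi` are complex conjugates of each other, so their sum is real up to rounding, which matches the $2\,{\cal Re}(\cdot)$ form above.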