Given a tall matrix $F \in \mathbb{C}^{m \times n}$, where $m > n$, and a non-symmetric matrix $A$ of size $n \times n$, consider the spectral norm
$$ \begin{aligned} \| A - F^* \operatorname{diag}(b) F \|_2 &= \sigma_{\max} \left( A - F^* \operatorname{diag}(b) F \right) \\ &= \sqrt{\lambda_{\max} \left( (A-F^*\operatorname{diag}(b)F)^* (A-F^*\operatorname{diag}(b)F ) \right)}. \end{aligned} $$
How do I compute $\nabla_b \|A-F^*\operatorname{diag}(b)F\|_2$ analytically, where $b \in \mathbb{C}^{m \times 1}$ is a vector and $*$ denotes the conjugate transpose?
I need the gradient because I want to find the $b$ that minimizes $\|A-F^*\operatorname{diag}(b)F\|_2$ using gradient descent. Is this possible?
Let's use $\big\{F^T,\,F^C,\,F^H=(F^C)^T\big\}\,$ to denote the $\big\{$Transpose, Complex, Hermitian$\big\}$ conjugates of $F$, respectively.
Let's also use the Frobenius product (:) notation instead of the Trace function, i.e. $$A:B = {\rm Tr}(A^TB)$$ [NB: The use of $A^T$ (rather than $A^H$) on the RHS is deliberate.]
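As a brief aside (the matrices $M, P, Q, R$ and the vector $x$ are placeholder names, not part of the problem), the cyclic property of the trace gives the rearrangement rules that this notation makes convenient, assuming all products are dimensionally compatible: $$\eqalign{ M:Q &= Q:M \cr M:PQR &= P^TMR^T:Q \cr M:{\rm Diag}(x) &= {\rm diag}(M):x \cr }$$ The second rule follows from ${\rm Tr}(M^TPQR) = {\rm Tr}(RM^TPQ) = {\rm Tr}\big((P^TMR^T)^TQ\big)$, and the third holds because only the diagonal entries of $M$ meet nonzero entries of ${\rm Diag}(x)$.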
Define the variables $$\eqalign{ B &= {\rm Diag}(b) \cr X &= F^HBF-A \cr }$$ Given the SVD of $X$ $$\eqalign{ X &= USV^H \cr U &= \big[\,u_1\,u_2 \ldots u_n\,\big],\,&u_k&\in{\mathbb C}^{n\times 1} \cr S &= {\rm Diag}(\sigma_k),&S&\in{\mathbb R}^{n\times n} \cr V &= \big[\,v_1\,v_2 \ldots v_n\,\big],&v_k&\in{\mathbb C}^{n\times 1} \cr }$$ where the $\sigma_k$ are ordered such that $\,\,\sigma_1>\sigma_2\ge\ldots\ge\sigma_n\ge 0$ (the largest singular value is assumed to be simple, so that the norm is differentiable at $X$).
The gradient of the spectral norm $\phi = \|X\|_2$ can be written as $$G = \frac{\partial\phi}{\partial X} = (u_1v_1^H)^C = u_1^Cv_1^T$$ To find the gradient wrt the vector $b$, write the differential and perform a change of variables. $$\eqalign{ d\phi &= G:dX \cr &= G:F^H\,dB\,F \cr &= F^C GF^T:dB \cr &= F^C GF^T:{\rm Diag}(db) \cr &= {\rm diag}\big(F^CGF^T\big):db \cr \frac{\partial\phi}{\partial b} &= {\rm diag}\big(F^CGF^T\big) \cr &= {\rm diag}\big((Fu_1)^C(Fv_1)^T\big) \cr }$$
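
The final formula can be sanity-checked numerically. The following is only a minimal NumPy sketch under stated assumptions: the helper name `spectral_norm_and_grad`, the sizes, the random seed, and the step size are all made up for illustration. It also assumes that, for a complex perturbation $db$, the real change in $\phi$ is the real part of the formal product, i.e. $d\phi = {\rm Re}\big(g^Tdb\big)$ with $g = {\rm diag}\big((Fu_1)^C(Fv_1)^T\big)$, so that a descent step moves along $-g^C$.

```python
import numpy as np

def spectral_norm_and_grad(b, F, A):
    """Return phi = ||F^H Diag(b) F - A||_2 and g = diag((F u1)^C (F v1)^T).

    Hypothetical helper for illustration; assumes sigma_1 > sigma_2.
    """
    X = F.conj().T @ (b[:, None] * F) - A    # X = F^H Diag(b) F - A
    U, s, Vh = np.linalg.svd(X)              # X = U @ diag(s) @ Vh, with Vh = V^H
    u1 = U[:, 0]                             # leading left singular vector
    v1 = Vh[0, :].conj()                     # leading right singular vector
    g = (F @ u1).conj() * (F @ v1)           # elementwise product = diagonal of the outer product
    return s[0], g

# Illustrative random data (sizes and seed are arbitrary assumptions)
rng = np.random.default_rng(0)
m, n = 7, 4
F = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
b = rng.standard_normal(m) + 1j * rng.standard_normal(m)

phi, g = spectral_norm_and_grad(b, F, A)

# Central finite difference along a random complex direction db
db = rng.standard_normal(m) + 1j * rng.standard_normal(m)
eps = 1e-6
phi_plus, _ = spectral_norm_and_grad(b + eps * db, F, A)
phi_minus, _ = spectral_norm_and_grad(b - eps * db, F, A)
fd = (phi_plus - phi_minus) / (2 * eps)
predicted = np.real(g @ db)                  # Re(g^T db), no conjugation in the product

# One small descent step along -conj(g)
step = 1e-3
phi_new, _ = spectral_norm_and_grad(b - step * g.conj(), F, A)

print(fd, predicted)
print(phi, phi_new)
```

If $b$ is restricted to be real, the same reasoning suggests using ${\rm Re}(g)$ as the gradient. When $\sigma_1 = \sigma_2$ the norm is not differentiable and the formula yields only one element of the subdifferential, so plain gradient descent may stall or oscillate near such points.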