How does one solve the following problem (matrix $2$-norm and diagonal matrix constraint) analytically?
$$\hat b = \arg \min_{b} f \left( b \right)$$ such that $$f \left( b \right) = \left\|A- {F}^{*} \operatorname{diag} \left( b \right) {F} \right\|_{2}= \sigma_1(A- {F}^{*} \operatorname{diag} \left( b \right) {F} )$$ where $A$ is a square matrix (size $m \times m$), $F$ is a tall matrix (size $n \times m$, with $n \ge m$), $b$ is a vector (size $n$), and $\sigma_1(A- {F}^{*} \operatorname{diag} \left( b \right) {F} )$ is the maximum singular value of the matrix $A- {F}^{*} \operatorname{diag} \left( b \right) {F}$.
P.S: $\left\| \cdot \right\|_{2}$ is the matrix-2 norm (operator norm) and $ *$ is the conjugate transpose.
The set of diagonal matrices $\mathcal{B} = \left\{ B\in \mathbb{R}^{n \times n} \mid B = \operatorname{diag}\left( b \right) \right\}$ is a convex set, because any linear combination of diagonal matrices is again a diagonal matrix.
A partial answer on how to go about this numerically (the objective function does not appear to be everywhere differentiable, so I'm not going to attempt an analytic solution).
This is an unconstrained convex optimization problem on $\mathbb{R}^n$: the map $b \mapsto A - F^{*}\operatorname{diag}(b)F$ is affine, and a norm of an affine function is convex, so $f$ is convex. Furthermore, the objective $f$ follows the rules of so-called "Disciplined Convex Programming" (DCP). Hence, convex optimization software such as CVXPY can handle it. Here's a short script in Python:
(Sub)gradient descent approach
The function $f$ is differentiable at any point $b$ where the matrix $A-F^{*}\operatorname{diag}(b)F$ has a unique largest singular value. At such a point, let $u$ and $v$ denote the left and right singular vectors (respectively) corresponding to the (unique) largest singular value. The gradient of $f$ at such a point is $$ \nabla{f}(b)=-Fu\otimes{Fv}, $$ where $\otimes$ denotes component-wise (Hadamard) multiplication. At points where $f$ isn't differentiable, you have to be more careful; look up the subgradient method. I'm not sure what kind of convergence guarantees there are for this, but it will be cheaper than CVXPY.
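A minimal sketch of this scheme for real data, computing the (sub)gradient from the top singular pair and using a diminishing step size $c/\sqrt{k}$ while tracking the best iterate (both standard choices for subgradient methods; the step-size constant `c` and iteration count are arbitrary placeholders):

```python
import numpy as np

def subgrad_f(b, A, F):
    """(Sub)gradient and value of f(b) = sigma_max(A - F* diag(b) F).

    The formula uses the top singular pair and is the gradient wherever
    the largest singular value is simple; otherwise it is a subgradient
    choice for that pair."""
    M = A - F.conj().T @ np.diag(b) @ F
    U, s, Vh = np.linalg.svd(M)
    u, v = U[:, 0], Vh[0, :].conj()
    # Real case: -(F u) o (F v) component-wise; conj covers complex data.
    g = -np.real(np.conj(F @ u) * (F @ v))
    return g, s[0]

def subgradient_descent(A, F, n_iters=500, c=1.0):
    n = F.shape[0]
    b = np.zeros(n)
    best_b, best_f = b.copy(), np.inf
    for k in range(1, n_iters + 1):
        g, fval = subgrad_f(b, A, F)
        if fval < best_f:            # keep the best iterate seen so far
            best_f, best_b = fval, b.copy()
        b = b - (c / np.sqrt(k)) * g  # diminishing step size
    return best_b, best_f
```

Since subgradient steps need not decrease $f$ monotonically, returning the best iterate rather than the last one is important here.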
I implemented a simple version in Python, and my main issue was finding step sizes that led to convergence; with a fixed step I got lots of oscillation. A diminishing step-size schedule (e.g. $c/\sqrt{k}$ at iteration $k$), combined with returning the best iterate seen so far, is the standard remedy for subgradient methods.