How does one solve the following problem (matrix $2$-norm and diagonal matrix constraint) analytically?
$$\hat b = \arg \min_{b} f \left( b \right)$$ such that $$f \left( b \right) = \left\|A- {F}^{*} \operatorname{diag} \left( b \right) {F} \right\|_{2}= \sigma_1(A- {F}^{*} \operatorname{diag} \left( b \right) {F} )$$ where $A$ is a square matrix (size $m \times m$), $F$ is a tall matrix (size $n \times m$, with $n \ge m$), $b$ is a vector (size $n$), and $\sigma_1(A- {F}^{*} \operatorname{diag} \left( b \right) {F} )$ is the maximum singular value of the matrix $A- {F}^{*} \operatorname{diag} \left( b \right) {F}$.
P.S: $\left\| \cdot \right\|_{2}$ is the matrix-2 norm (operator norm) and $ *$ is the conjugate transpose.
The set of diagonal matrices $\mathcal{B} = \left\{ B\in \mathbb{R}^{n \times n} \mid B = \operatorname{diag}\left( b \right) \right\}$ is a convex set, because any linear combination of diagonal matrices is again a diagonal matrix.
A partial answer on how to go about this numerically (the objective function does not appear to be everywhere differentiable, so I'm not going to attempt an analytic solution).
This is an unconstrained convex optimization problem on $\mathbb{R}^n$: the map $b \mapsto A - F^{*}\operatorname{diag}(b)F$ is affine, and a norm of an affine function is convex, so $f$ is convex. Furthermore, the objective $f$ follows the rules of so-called "Disciplined Convex Programming" (DCP). Hence, convex optimization software such as CVXPY can handle it. Here's a short script in Python:
(Sub)gradient descent approach
The function $f$ is differentiable at any point $b$ where the matrix $A-F^{*}\operatorname{diag}(b)F$ has a unique largest singular value. At such a point, let $u$ and $v$ denote the left and right singular vectors (respectively) corresponding to the (unique) largest singular value. The gradient of $f$ at such a point is $$ \nabla{f}(b)=-Fu\otimes{Fv}, $$ where $\otimes$ denotes component-wise (Hadamard) multiplication. At points where $f$ isn't differentiable, you have to be more careful; look up the subgradient method. I'm not sure what kind of convergence guarantees there are for this, but it will be cheaper than CVXPY.
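A minimal sketch of this scheme for real data, computing the (sub)gradient from the top singular pair and using a diminishing step size $c/\sqrt{k}$ while tracking the best iterate (both standard choices for subgradient methods; the step-size constant `c` and iteration count are arbitrary placeholders):

```python
import numpy as np

def subgrad_f(b, A, F):
    """(Sub)gradient and value of f(b) = sigma_max(A - F* diag(b) F).

    The formula uses the top singular pair and is the gradient wherever
    the largest singular value is simple; otherwise it is a subgradient
    choice for that pair."""
    M = A - F.conj().T @ np.diag(b) @ F
    U, s, Vh = np.linalg.svd(M)
    u, v = U[:, 0], Vh[0, :].conj()
    # Real case: -(F u) o (F v) component-wise; conj covers complex data.
    g = -np.real(np.conj(F @ u) * (F @ v))
    return g, s[0]

def subgradient_descent(A, F, n_iters=500, c=1.0):
    n = F.shape[0]
    b = np.zeros(n)
    best_b, best_f = b.copy(), np.inf
    for k in range(1, n_iters + 1):
        g, fval = subgrad_f(b, A, F)
        if fval < best_f:            # keep the best iterate seen so far
            best_f, best_b = fval, b.copy()
        b = b - (c / np.sqrt(k)) * g  # diminishing step size
    return best_b, best_f
```

Since subgradient steps need not decrease $f$ monotonically, returning the best iterate rather than the last one is important here.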
I implemented a simple version in Python, and my main issue was finding step sizes that led to convergence; with a fixed step I got lots of oscillation. A diminishing step-size schedule (e.g. $c/\sqrt{k}$ at iteration $k$), combined with returning the best iterate seen so far, is the standard remedy for subgradient methods.