Ignoring positive (semi)definite condition for optimization

173 Views Asked by At

Suppose I have a smooth scalar-valued objective function $f(\cdot)$ defined on the space of $n \times n$ matrices. I want to solve the optimization problem $\min_{A \geq 0} f(A)$ where $A \geq 0 $ means that $A$ is positive semidefinite. It is easy to calculate the (matrix) derivative of $f(\cdot)$ if I ignore the constraint $A \geq 0 $. Let $f'(A)$ (which is a $n \times n$ matrix) denote this derivative, and suppose I proceed by finding $A \geq 0$ such that $f'(A) = 0$. Is this a valid approach?

In short, will I still get the critical points right even if I calculate the derivative ignoring the constraint that $A \geq 0$?

Thanks in advance!

1

There are 1 best solutions below

11
On

No, you are ignoring the boundary. Your technique works for an unconstrained problem, however in a constrained setting the optimal solution doesn't have to have a zero gradient (it does require that the directional derivative is non-negative in all directions). Moreover, since you did not state if the objective function $f$ is convex or not, this is only a necessary condition, it might not be sufficient.

However, your problem can still be solvable. In case the objective function is linear, then you have a semidefinite program. Even if $f$ is not linear, since your single constraint is convex, then you have a convex domain that even has a closed-form projection solution, so you can use some variant of gradient projection / proximal gradient to solve the problem. If $f$ is convex, you can find the global minimum, else you might have to settle for a stationary point.

In case you take the gradient projection approach, the projection onto the set of PSD matrices is $P_T(A)=U[\Lambda(A)]_+U^T$, where $A=U\Lambda U^T$ is the spectral decomposition of $A$, and $[\Lambda(A)]_+$ means the maximum between the diagonal elements of $\Lambda$ (eigenvalues of $A$) and zero, applied per element.

Notice that in both cases (PSD programming, gradient projection), the solution involves an eigenvalues decomposition at each stet, and is therefore suitable for medium-size problems.