I have a scalar function $f(\rho) = Tr(\rho H) + c\ Tr(\rho\log\rho)$, where $Tr$ denotes the trace, $\rho$ is a positive semidefinite matrix with unit trace, $H$ is a Hermitian matrix, $c$ is a positive constant, and $\log$ is the matrix logarithm.
I'm trying to minimize $f$ over all PSD $\rho$ with unit trace and would like to use gradient descent. Because I know my solution $\rho$ must have trace 1 and be positive semidefinite, I project at each iteration onto the set of PSD matrices with trace 1.
Currently, what I have tried does not work - I'm getting the wrong answer. The possibilities are:
- My gradient is incorrect. I am using $Tr(H) + c\ Tr(\log\rho)$ as the gradient. I derive this by assuming $Tr(d\rho) = 0$, so I can proceed as if $\rho$ and $d\rho$ commute.
- The projection I'm using is flawed. I am symmetrizing $\rho$ at each step using $\rho = \rho + \rho^\dagger$. I'm not sure how to project onto the positive semidefinite matrices, though, since $\rho$ is sometimes singular and MATLAB won't let me compute $\rho = \sqrt{\rho^\dagger\rho}$. To keep the trace equal to 1, I take $\rho = \frac{\rho}{Tr(\rho)}$ at each step.
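For reference, one way to sidestep the per-step symmetrize/clip/renormalize combination described above is to compute the exact Euclidean (Frobenius-norm) projection onto the feasible set in one shot: symmetrize, eigendecompose, and project the eigenvalues onto the probability simplex. This works even when $\rho$ is singular or has negative eigenvalues. Below is a minimal numpy sketch; the function names are my own, and it assumes real symmetric (or Hermitian) input:

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of a real vector v onto the probability simplex
    {x : x >= 0, sum(x) = 1}, via the standard sort-and-threshold method."""
    u = np.sort(v)[::-1]                      # sort descending
    css = np.cumsum(u)
    j = np.arange(1, len(u) + 1)
    k = np.nonzero(u + (1.0 - css) / j > 0)[0][-1]   # last index kept active
    theta = (1.0 - css[k]) / (k + 1)
    return np.maximum(v + theta, 0.0)

def project_spectrahedron(rho):
    """Exact Frobenius-norm projection onto {X PSD, Tr X = 1}:
    symmetrize, then project the eigenvalues onto the simplex."""
    rho = (rho + rho.conj().T) / 2            # note the factor 1/2
    w, V = np.linalg.eigh(rho)
    w = project_simplex(w)
    return (V * w) @ V.conj().T               # V diag(w) V^dagger
```

Note that the projection of the eigenvalues onto the simplex is not the same as clipping negatives and renormalizing by the trace, which is one reason the iteration described above can converge to the wrong point.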
Can anyone point out how to approach this problem? Thank you.
The feasible set in your optimization problem (positive semidefinite matrices of unit trace) is called the spectrahedron.
The most popular projected-gradient-like algorithms for minimizing a function over the spectrahedron are specializations of the Frank-Wolfe algorithm. See for example the paper Sparse Approximate Solutions to Semidefinite Programs by Elad Hazan. The paper is well-cited, and many of the papers that cite it expand upon its original idea.
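To give a concrete flavor: for the spectrahedron, the Frank-Wolfe linear minimization oracle is just the bottom eigenvector of the gradient, and each update is a convex combination of the current iterate with a rank-one matrix, so every iterate stays feasible with no projection at all. Here is a rough numpy sketch for this particular $f$ (function names and the step-size choice $\gamma_k = 2/(k+3)$ are mine; Hazan's actual algorithm uses approximate eigenvector computations, and I use the unconstrained Euclidean gradient $H + c(\log\rho + I)$, valid on the interior where $\rho \succ 0$):

```python
import numpy as np

def free_energy(rho, H, c):
    """f(rho) = Tr(rho H) + c Tr(rho log rho), computed via eigenvalues of rho."""
    w = np.clip(np.linalg.eigvalsh(rho), 1e-300, None)  # 0*log(0) -> 0
    return float(np.trace(rho @ H)) + c * float(np.sum(w * np.log(w)))

def frank_wolfe_spectrahedron(H, c, iters=2000):
    """Frank-Wolfe for min f(rho) over {rho PSD, Tr rho = 1}.
    Assumes real symmetric H. Starting from the interior point I/n and
    keeping gamma < 1 ensures every iterate is positive definite, so the
    matrix logarithm in the gradient is well defined."""
    n = H.shape[0]
    rho = np.eye(n) / n                           # feasible, full-rank start
    for k in range(iters):
        w, V = np.linalg.eigh(rho)
        log_rho = (V * np.log(w)) @ V.T           # matrix log via eigendecomposition
        G = H + c * (log_rho + np.eye(n))         # Euclidean gradient of f at rho
        _, U = np.linalg.eigh(G)
        v = U[:, 0]                               # LMO: bottom eigenvector of G
        gamma = 2.0 / (k + 3)                     # always < 1
        rho = (1 - gamma) * rho + gamma * np.outer(v, v)
    return rho
```

A useful sanity check for this specific objective: it has the closed-form minimizer $\rho^* = e^{-H/c}/Tr(e^{-H/c})$ (the Gibbs state), so any candidate algorithm can be validated against it.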