I could use some help with a matrix optimization problem. I have an $M \times M$ sample covariance matrix $\hat{\mathbf{C}}$ that I'd like to approximate by the decomposition $\hat{\mathbf{C}} \approx \mathbf{LL}^T$, where $\mathbf{L}$ is an $M \times K$ matrix with $K < M$ and the following desired properties.
1. Each column of $\mathbf{L}$ is sparse. That is, every entry of $\mathbf{L}$ is either near zero or large in magnitude. Note that I don't require a maximum number of non-zero entries per column.
2. Each column of $\mathbf{L}$ is smooth. That is, adjacent entries within a column are close to each other.
I thus want $\mathbf{L}$ to be the matrix that minimizes the penalized objective
$$\| \hat{\mathbf{C}} - \mathbf{LL}^T\|_F^2 + \lambda_1 \|\mathbf{L}\|_1 + \lambda_2\,\text{PEN}_2(\mathbf{L})$$
where $\|\cdot\|_F$ is the Frobenius norm, $\|\cdot\|_1$ is the entrywise $L^1$ norm, $\text{PEN}_2(\cdot)$ is a penalty on the discrete second derivative of each column, and the $\lambda_j$ are tuning parameters. (Please request more detail if this isn't clear.)
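To make the objective concrete, here is a sketch of how I would compute it, assuming $\text{PEN}_2$ is the sum of squared second differences down each column (my reading of "penalty on the second derivative"; the function name and argument names are just for illustration):

```python
import numpy as np

def objective(L, C_hat, lam1, lam2):
    """Penalized approximation objective: fit + sparsity + roughness."""
    fit = np.sum((C_hat - L @ L.T) ** 2)         # squared Frobenius norm of the residual
    sparsity = np.sum(np.abs(L))                 # entrywise L^1 norm
    d2 = L[2:, :] - 2 * L[1:-1, :] + L[:-2, :]   # second differences within each column
    roughness = np.sum(d2 ** 2)                  # assumed form of PEN_2
    return fit + lam1 * sparsity + lam2 * roughness
```

With constant columns the second differences vanish, so only the $L^1$ term penalizes a perfect-fit $\mathbf{L}$.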
I had hoped to compute $\mathbf{L}$ via gradient descent, but I believe the roughness penalty (third term) makes it difficult to write down the gradient. So I began researching optimization techniques that require only an objective function and found Simultaneous Perturbation Stochastic Approximation (SPSA). As this seemed like a promising solution, I searched for implementations of this algorithm and found the Python library qiskit, which provides it in its `SPSA` optimizer class. In my application $\mathbf{L}$ will be a tall and skinny matrix (something like 100 by 5), but to test the SPSA optimization with fewer variables I let $\mathbf{L}$ be 10 by 2. Even with the smaller $\mathbf{L}$, however, the algorithm didn't converge.
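For reference, my understanding of the SPSA iteration, stripped to its core (a bare-bones sketch with Spall's standard gain sequences, operating on the flattened $\mathbf{L}$; this is not qiskit's actual implementation, and the default gains here are arbitrary):

```python
import numpy as np

def spsa_minimize(f, x0, a=0.1, c=0.1, alpha=0.602, gamma=0.101,
                  maxiter=2000, seed=0):
    """Minimize f using SPSA: two function evaluations per iteration."""
    rng = np.random.default_rng(seed)
    x = x0.astype(float).copy()
    for k in range(maxiter):
        ak = a / (k + 1) ** alpha                       # step-size gain
        ck = c / (k + 1) ** gamma                       # perturbation gain
        delta = rng.choice([-1.0, 1.0], size=x.shape)   # Rademacher perturbation
        # Simultaneous two-sided difference estimates the full gradient at once.
        ghat = (f(x + ck * delta) - f(x - ck * delta)) / (2 * ck) * (1 / delta)
        x -= ak * ghat
    return x
```

In my experience the gain constants `a` and `c` matter a great deal; a poorly scaled `a` alone can prevent convergence even on small problems.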
I have a few questions:
1. Is it realistic to expect convergence with this objective function?
2. Is SPSA a viable approach to this problem?
3. If no to (2), what other approach might work?