Numerical low-rank approximation of sample covariance matrix

105 Views Asked by At

I could use some help with a matrix optimization problem. I have a $M \times M$ sample covariance matrix $\hat{\mathbf{C}}$ that I'd like to approximate using the decomposition $\hat{\mathbf{C}} = \mathbf{LL}^T$ where $\mathbf{L}$ is a $M\times K$, $K < M$, matrix with the following desired properties.

  1. Each column of $\mathbf{L}$ is sparse. That is, all entries of $\mathbf{L}$ are near-zero or large in magnitude. Note that I don't require some maximum number of non-zero entries per column.

  2. Each column of $\mathbf{L}$ is smooth. That is, adjacent entries in a column are in close to each other.

I thus want $\mathbf{L}$ to be the matrix that minimizes the penalized objective criterion,

$$|| \hat{\mathbf{C}} - \mathbf{LL}^T||_2^2 + \lambda_1 ||\mathbf{L}||_1 + \lambda_2\text{PEN}_2(\mathbf{L})$$

where $||\cdot||_1$ is the $L^1$ norm, $||\cdot||_2$ is the $L^2$ norm, and $\text{PEN}_2(\cdot)$ is a penalty on the second numeric derivative, and the $\lambda_j$ are tuning paramters. (Please request more detail if this isn't clear).

I had hoped to compute $\mathbf{L}$ via gradient descent, but I believe the roughness penalty (third term) makes it difficult to write down the gradient. So I began researching other optimization techniques that require only an objective function and found Simultaneous Perturbation Stochastic Approximation (SPSA). As this seemed like a promising solution, I searched for implementations of this algorithm and found the Python library qiskit which implements the algorithm in this class. In my application $\mathbf{L}$ will be a tall and skinny matrix (something like 100 by 5), but to test this SPSA optimization with fewer variables I let $\mathbf{L}$ be 10 by 2. However, even with the smaller $\mathbf{L}$ the algorithm didn't converge.

I have a few questions:

  1. Is it realistic to expect convergence with this objective function?
  2. Is SPSA a viable approach to this problem?
  3. If no to (2), what other approach might work?