Covariance shrinkage and L2 penalty

188 Views Asked by Bumbble Comm At 10 May 2026 - 8:52

I was reading about covariance shrinkage in the scikit-learn user guide and came across this passage:

Mathematically, this shrinkage consists in reducing the ratio between the smallest and the largest eigenvalues of the empirical covariance matrix. It can be done by simply shifting every eigenvalue according to a given offset, which is equivalent to finding the l2-penalized Maximum Likelihood Estimator of the covariance matrix.

I am unable to understand how $\Sigma_{\text{shrunk}} = (1-\alpha)\hat{\Sigma} + \alpha \frac{\text{tr}(\hat{\Sigma})}{p}I$ corresponds to an L2 penalty on the maximum likelihood estimate of the covariance matrix. Any insights or mathematical proofs would be of great help. Thanks.

For reference: This is the link to the scikit learn guide. (Section 2.6.1)


Bumbble Comm On 13 Jan 2020 - 9:23

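If you want to do this very crudely, consider the eigenvalues of the covariance matrix $\Sigma$, ordered as
$\lambda_1 \gt \lambda_2 \geq ... \geq \lambda_p \gt 0$
(I infer that $\Sigma$ is $p \times p$ and so has $p$ eigenvalues, counted with multiplicity, though you never defined $p$ here. All that matters below is that $\lambda_1 \gt \lambda_p \gt 0$.)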

All that is being done is observing that, for any $c\gt 0$,
$\frac{\lambda_1}{\lambda_p} \gt \frac{\lambda_1 + c}{\lambda_p + c}$
(cross-multiplying, this reduces to $c\lambda_1 \gt c\lambda_p$).

You can easily check that for any eigenvector $\mathbf x$, $\Sigma \mathbf x = \lambda \mathbf x\longrightarrow \big(\Sigma + cI\big)\mathbf x = (\lambda +c) \mathbf x$.

So adding a scaled identity matrix shifts all eigenvalues of your covariance matrix by the same amount (and the matrix necessarily has a full set of eigenvectors, being real symmetric).
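If it helps, the eigenvalue shift can be checked numerically. This is only a minimal sketch, assuming NumPy is available; the matrix `sigma` and the offset `c` are made-up example values, not anything taken from the scikit-learn guide.

```python
import numpy as np

# Made-up example: a random 5x5 symmetric positive-definite matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
sigma = A @ A.T + 0.01 * np.eye(5)
c = 0.5  # arbitrary positive offset

# eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
eig_before = np.linalg.eigvalsh(sigma)
eig_after = np.linalg.eigvalsh(sigma + c * np.eye(5))

# Every eigenvalue moves by exactly c ...
shift_matches = np.allclose(eig_after, eig_before + c)

# ... so the largest-to-smallest ratio gets smaller.
ratio_before = eig_before[-1] / eig_before[0]
ratio_after = eig_after[-1] / eig_after[0]

print(shift_matches, ratio_after < ratio_before)
```

Both checks should hold for any symmetric positive-definite `sigma` with at least two different eigenvalues and any `c > 0`.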

From here there's a small refinement in deciding to take convex combinations, so instead they look at

$\frac{\lambda_1}{\lambda_p} \gt \frac{(1-\alpha) \lambda_1 + \alpha\bar{\lambda}}{(1-\alpha)\lambda_p + \alpha\bar{\lambda}}$

where $0 \lt \alpha \leq 1$ and $\bar{\lambda}$ is the arithmetic mean of the eigenvalues, given by $\frac{1}{p}\big(\lambda_1 + \lambda_2 + ... +\lambda_p\big) = \frac{1}{p}\text{trace}\big(\Sigma\big)$.

The inequality should be intuitively obvious, but in case it's not: since all terms are positive, you can clear the denominators, subtract $(1-\alpha)\lambda_1\lambda_p$ from each side, and the claim reduces to $\alpha \bar{\lambda}\lambda_1 \gt \alpha \bar{\lambda}\lambda_p$, which holds because $\lambda_1 \gt \lambda_p$.
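The convex-combination version can be sanity-checked the same way. Again a rough sketch under assumptions: `shrink` is a hypothetical helper written just for this illustration (it is not a scikit-learn function), and the data `X` and the value of `alpha` are arbitrary.

```python
import numpy as np

def shrink(sigma, alpha):
    """Hypothetical helper: (1 - alpha) * sigma + alpha * (trace(sigma) / p) * I."""
    p = sigma.shape[0]
    mean_eig = np.trace(sigma) / p  # arithmetic mean of the eigenvalues
    return (1.0 - alpha) * sigma + alpha * mean_eig * np.eye(p)

# Made-up data: 20 samples of 4 features, and its empirical covariance.
rng = np.random.default_rng(1)
X = rng.standard_normal((20, 4))
sigma = np.cov(X, rowvar=False)
alpha = 0.3  # arbitrary shrinkage weight in (0, 1]

lam = np.linalg.eigvalsh(sigma)                        # ascending order
lam_shrunk = np.linalg.eigvalsh(shrink(sigma, alpha))

# Each eigenvalue becomes (1 - alpha) * lambda + alpha * lambda_bar.
eigs_match = np.allclose(lam_shrunk, (1 - alpha) * lam + alpha * lam.mean())

# The trace (hence the mean eigenvalue) is unchanged by the shrinkage.
trace_kept = np.isclose(np.trace(shrink(sigma, alpha)), np.trace(sigma))

# The ratio of largest to smallest eigenvalue decreases.
ratio_drops = lam_shrunk[-1] / lam_shrunk[0] < lam[-1] / lam[0]

print(eigs_match, trace_kept, ratio_drops)
```

Unlike the plain shift by $c$, the convex combination leaves the trace alone, so the eigenvalues are pulled toward their mean rather than all pushed upward.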
