I was reading about covariance shrinkage in the scikit-learn user guide and came across this passage:
> Mathematically, this shrinkage consists in reducing the ratio between the smallest and the largest eigenvalues of the empirical covariance matrix. It can be done by simply shifting every eigenvalue according to a given offset, which is equivalent of finding the l2-penalized Maximum Likelihood Estimator of the covariance matrix
I am unable to understand how $\Sigma_{\text{shrunk}} = (1-\alpha)\hat{\Sigma} + \alpha \frac{\operatorname{tr}(\hat{\Sigma})}{p}I$ corresponds to an l2 penalty in the maximum likelihood estimation of the covariance matrix. Any insights or mathematical proofs would be of great help. Thanks.
For reference, this is the link to the scikit-learn guide (Section 2.6.1).
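To make the formula concrete, here is a minimal NumPy sketch of it, assuming `X` is an `(n_samples, p)` data matrix and that $\hat\Sigma$ is the divide-by-$n$ (maximum-likelihood) empirical covariance, with the trace taken of $\hat\Sigma$. The function name `shrunk_covariance_sketch` and the toy data are hypothetical, for illustration only; scikit-learn has its own `sklearn.covariance.shrunk_covariance`, which this is not.

```python
import numpy as np

def shrunk_covariance_sketch(X, alpha):
    """Hypothetical helper: (1 - alpha) * S + alpha * (tr(S) / p) * I.

    S is the maximum-likelihood (divide-by-n) empirical covariance of X,
    where X has shape (n_samples, p).
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    Xc = X - X.mean(axis=0)      # center each feature
    S = Xc.T @ Xc / n            # empirical covariance, p x p
    mu = np.trace(S) / p         # mean eigenvalue of S
    return (1 - alpha) * S + alpha * mu * np.eye(p)

# Toy data with very different feature scales (arbitrary choice).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4)) @ np.diag([3.0, 1.0, 0.5, 0.1])
S_shrunk = shrunk_covariance_sketch(X, alpha=0.1)
print(np.round(S_shrunk, 3))
```

With `alpha=0` this returns the plain empirical covariance, with `alpha=1` it returns $\frac{\operatorname{tr}(\hat\Sigma)}{p}I$, and for any `alpha` the trace is unchanged.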
If you want to do this very crudely, consider
$\lambda_1 > \lambda_2 \geq \dots \geq \lambda_p > 0$
(I infer $\Sigma$ is $p \times p$, so it has $p$ eigenvalues counted with multiplicity, though you never defined $p$ here.)
All that is being done is observing that, for any $c > 0$,
$\frac{\lambda_1}{\lambda_p} \gt \frac{\lambda_1 + c}{\lambda_p + c}$
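Cross-multiplying shows this is equivalent to $c\lambda_1 > c\lambda_p$. To see that adding $cI$ is exactly such a shift, check that for any eigenvector $\mathbf x$ of $\Sigma$, $\Sigma \mathbf x = \lambda \mathbf x \implies \big(\Sigma + cI\big)\mathbf x = (\lambda + c) \mathbf x$.
So adding a scaled identity matrix shifts every eigenvalue of your covariance matrix by the same amount $c$. This accounts for all of them, because a real symmetric matrix has a full set of orthonormal eigenvectors.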
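A quick numerical check of the shift argument, using a random symmetric positive-definite matrix as a stand-in for a covariance matrix (the matrix, the seed and $c = 0.5$ are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(5, 5))
Sigma = A @ A.T + 0.01 * np.eye(5)  # random symmetric positive-definite stand-in
c = 0.5
I = np.eye(5)

# Eigen-decomposition of the symmetric matrix Sigma (eigenvalues ascending).
w, V = np.linalg.eigh(Sigma)

# Each eigenvector of Sigma is also an eigenvector of Sigma + cI, with eigenvalue w + c.
# V * (w + c) scales column j of V by w[j] + c.
same_vectors = np.allclose((Sigma + c * I) @ V, V * (w + c))

lam = w
lam_shifted = np.linalg.eigvalsh(Sigma + c * I)
shifted_by_c = np.allclose(lam_shifted, lam + c)

# The ratio of largest to smallest eigenvalue drops after the shift.
ratio_before = lam[-1] / lam[0]
ratio_after = lam_shifted[-1] / lam_shifted[0]
print(same_vectors, shifted_by_c, ratio_before, ratio_after)
```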
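From here, the only refinement is to take a convex combination instead of a plain shift. By the same eigenvector argument, $(1-\alpha)\Sigma + \alpha\bar{\lambda}I$ has the same eigenvectors as $\Sigma$ and eigenvalues $(1-\alpha)\lambda_i + \alpha\bar{\lambda}$, so for $0 < \alpha \leq 1$ they look at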
$\frac{\lambda_1}{\lambda_p} \gt \frac{(1-\alpha) \lambda_1 + \alpha\bar{\lambda}}{(1-\alpha)\lambda_p + \alpha\bar{\lambda}}$
The inequality should be intuitively obvious, but in case it's not: since all terms are positive, you can clear the denominators and subtract $(1-\alpha)\lambda_1\lambda_p$ from each side, and the claim reduces to $\alpha \bar{\lambda}\lambda_1 > \alpha \bar{\lambda}\lambda_p$, which holds for any $\alpha > 0$ because $\bar{\lambda} > 0$ and $\lambda_1 > \lambda_p$,
where $\bar{\lambda}$ is the arithmetic mean of the eigenvalues, $\bar{\lambda} = \frac{1}{p}\big(\lambda_1 + \lambda_2 + \dots + \lambda_p\big) = \frac{1}{p}\operatorname{tr}\big(\Sigma\big)$.
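The same kind of check for the convex combination, again on an arbitrary random positive-definite matrix (seed and $\alpha$ grid chosen only for illustration): the eigenvalues of the shrunk matrix should come out as $(1-\alpha)\lambda_i + \alpha\bar\lambda$, and the ratio $\lambda_1/\lambda_p$ should fall toward 1 as $\alpha$ goes from 0 to 1.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(6, 6))
Sigma = A @ A.T + 0.01 * np.eye(6)   # random symmetric positive-definite stand-in
p = Sigma.shape[0]
lam_bar = np.trace(Sigma) / p        # mean eigenvalue, tr(Sigma) / p

lam = np.linalg.eigvalsh(Sigma)      # ascending: lam[0] = lambda_p, lam[-1] = lambda_1

alphas = [0.0, 0.1, 0.5, 0.9, 1.0]
ratios = []
for alpha in alphas:
    shrunk = (1 - alpha) * Sigma + alpha * lam_bar * np.eye(p)
    lam_s = np.linalg.eigvalsh(shrunk)
    # Eigenvalues of the shrunk matrix are (1 - alpha) * lambda_i + alpha * lam_bar.
    assert np.allclose(lam_s, (1 - alpha) * lam + alpha * lam_bar)
    ratios.append(lam_s[-1] / lam_s[0])

print(ratios)
```

At $\alpha = 1$ the matrix is just $\bar\lambda I$, so the ratio is exactly 1; in between, it decreases strictly, in line with the inequality above.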