I've been trying to write a Python function to sample covariance matrices. Not to sample *from* them, but to sample the matrices themselves. What I've found is that the positive semi-definite constraint makes this sampling surprisingly challenging.
My intuition was that $|\mathrm{cov}(a,b)| \le \min[\mathrm{var}(a), \mathrm{var}(b)]$. (The Cauchy–Schwarz bound is actually $|\mathrm{cov}(a,b)| \le \sqrt{\mathrm{var}(a)\,\mathrm{var}(b)}$, so the min is an even stricter constraint.) Great, but this pairwise constraint alone doesn't meet the positive semidefinite requirement.
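A toy counterexample of my own, showing that the pairwise constraint is not enough: with unit variances, every off-diagonal entry below satisfies $|\mathrm{cov}| \le \min[\mathrm{var}] = 1$, yet the matrix has a negative eigenvalue.

```python
import numpy as np

# Unit variances; every |cov| = 0.9 <= min[var(a), var(b)] = 1,
# but the matrix is not positive semidefinite.
C = np.array([
    [ 1.0, 0.9, -0.9],
    [ 0.9, 1.0,  0.9],
    [-0.9, 0.9,  1.0],
])

min_eig = np.linalg.eigvalsh(C).min()
print(min_eig)  # negative, so C is not a valid covariance matrix
```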
```python
import numpy as np

def cov_sampler(n, max_var=3):
    # 1. Sample variances randomly
    variances = np.random.uniform(low=0.01, high=max_var, size=n)
    # 2. Initialize cov matrix
    covariances = np.zeros([n, n])
    # 3.0. Assign variance or covariance values respectively
    for i in range(n):
        for j in range(i, n):
            var_i = variances[i]
            var_j = variances[j]
            # 3.1. Assign variance along the diagonal
            if i == j:
                covariances[i, j] = var_i
                continue
            # 3.2. Ensure |cov(a,b)| <= min[var(a), var(b)]
            else:
                max_cov = min(var_i, var_j)
                cov = np.random.uniform(low=-max_cov, high=max_cov)
                covariances[i, j] = cov
                covariances[j, i] = cov
    # 4.0. If min eigenvalue < 0, subtract it from all diagonal elements
    eigenvalues = np.linalg.eigvalsh(covariances)  # real eigenvalues of a symmetric matrix
    min_ev = np.min(eigenvalues)
    diagonal = np.identity(n) * min_ev * int(min_ev < 0)
    # 5.0. Exit
    return covariances - diagonal
```
My question is: why is the eigen dance in step 4 necessary?
Intuitively I can reason that if $\mathrm{corr}(a,b)$ is strongly positive and $\mathrm{corr}(b,c)$ is strongly positive, then we know something about $\mathrm{corr}(a,c)$.
But why would subtracting the (negative) minimum eigenvalue from the diagonal fix the problem?
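For what it's worth, here is a quick numerical sketch (my own, not from the code above) of the identity step 4 relies on: for a symmetric matrix $A$ with eigenvalues $\lambda_i$, the eigenvalues of $A - \lambda_{\min} I$ are $\lambda_i - \lambda_{\min} \ge 0$, so the shifted matrix is always positive semidefinite.

```python
import numpy as np

rng = np.random.default_rng(0)

# A random symmetric matrix, generally indefinite
A = rng.normal(size=(4, 4))
A = (A + A.T) / 2

lam = np.linalg.eigvalsh(A)       # real eigenvalues, ascending order
shifted = A - lam[0] * np.eye(4)  # subtract the min eigenvalue from the diagonal

# The spectrum shifts by exactly -lam[0], so the minimum becomes 0
lam_shifted = np.linalg.eigvalsh(shifted)
assert np.allclose(lam_shifted, lam - lam[0])
assert lam_shifted.min() >= -1e-12  # PSD, up to floating-point error
```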