What does singular value decomposition of covariance matrix represent?


I am reading the paper "Understanding dimensional collapse in contrastive self-supervised learning." The authors identify a dimensional collapse phenomenon, i.e., some dimensions of the embedding collapse to zero. They show this by collecting the embedding vectors on the validation set, each of size $d=128$, and computing the covariance matrix $C\in\mathbb{R}^{d\times d}$ of these embeddings. The singular value decomposition is then applied to the covariance matrix. They state that a number of singular values collapse to zero, which represents collapsed dimensions.
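The procedure described above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the paper's code: the sample count `n`, the effective rank `r`, the random `mixing` matrix, and the tolerance `tol` are all assumptions made up for the example. The embeddings are constructed to lie in an `r`-dimensional subspace, so the remaining `d - r` directions carry no variance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes: n samples, d-dimensional embeddings, effective rank r < d.
n, d, r = 2000, 128, 100

# Synthetic stand-in for validation embeddings: every vector lies in an
# r-dimensional subspace of R^d, mimicking d - r collapsed dimensions.
latent = rng.normal(size=(n, r))
mixing = rng.normal(size=(r, d))
embeddings = latent @ mixing

# Covariance matrix C (d x d); rows are samples, columns are dimensions.
C = np.cov(embeddings, rowvar=False)

# Singular values of C, returned in descending order.
singular_values = np.linalg.svd(C, compute_uv=False)

# Count directions whose variance is zero up to floating-point noise.
tol = 1e-8 * singular_values[0]
num_collapsed = int(np.sum(singular_values < tol))
print(num_collapsed)
```

In this toy setup the count of near-zero singular values should match `d - r`, the number of directions in which the embeddings do not vary at all.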

Thus my questions are:

  1. What does the singular value decomposition of a covariance matrix represent?
  2. Why do singular values of the covariance matrix collapsing to zero indicate that dimensions of the embedding have collapsed?

There is 1 best solution below


I had the same question and found this to be a good answer. It matched my empirical observations from using the square roots of the singular values of the covariance matrix as the scale (standard deviation) along the major axes of variance, a measure that is invariant to how the data is rotated: http://www.cs.utah.edu/~tch/CS4640F2019/resources/A%20geometric%20interpretation%20of%20the%20covariance%20matrix.pdf

Keep in mind that because all covariance matrices are symmetric and positive semi-definite, their singular values are the same as their eigenvalues. So you don't actually need to compute the SVD and can just compute the eigenvalues directly if you are interested in a rotation-invariant measure of scale. A singular value (eigenvalue) of zero means the data has zero variance along the corresponding axis, which is why near-zero singular values indicate collapsed dimensions. As for $U$ and $V$ in $C = U \Sigma V^\top$: you can think of $V^\top$ as rotating into coordinates aligned with the major axes of variance, then $\Sigma$ applies the eigenvalues / singular values to scale along those axes. Because the covariance matrix is symmetric and positive semi-definite, you can take $U = V$. $U$ and $V$ are orthogonal, so their inverse is their transpose ($U^{-1} = U^\top$); they are not involutory (their own inverse) in general, although a $2\times 2$ reflection matrix happens to be both. So $U$ rotates back to the original space, which isn't aligned with the major axes of variance.
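A quick numerical check of these points, as a sketch under assumed toy values (3-D Gaussian data with standard deviations 3, 1, and 0.5, and an arbitrary rotation `R` chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data with standard deviations 3, 1 and 0.5 along the coordinate axes.
X = rng.normal(size=(5000, 3)) * np.array([3.0, 1.0, 0.5])

# An arbitrary rotation: 40 degrees about x, then 30 degrees about z.
a, b = np.deg2rad(30.0), np.deg2rad(40.0)
Rz = np.array([[np.cos(a), -np.sin(a), 0.0],
               [np.sin(a),  np.cos(a), 0.0],
               [0.0,        0.0,       1.0]])
Rx = np.array([[1.0, 0.0,        0.0],
               [0.0, np.cos(b), -np.sin(b)],
               [0.0, np.sin(b),  np.cos(b)]])
R = Rz @ Rx
X_rot = X @ R.T

C = np.cov(X, rowvar=False)
C_rot = np.cov(X_rot, rowvar=False)

# Singular values (descending) versus eigenvalues (eigvalsh is ascending).
U, s, Vt = np.linalg.svd(C_rot)
eigvals = np.linalg.eigvalsh(C_rot)[::-1]
same_spectrum = np.allclose(s, eigvals)

# Rotating the data leaves the singular values unchanged.
rotation_invariant = np.allclose(s, np.linalg.svd(C, compute_uv=False))

# U is orthogonal (U^T U = I) but, for this rotation, not involutory (U U = I).
is_orthogonal = np.allclose(U.T @ U, np.eye(3))
is_involutory = np.allclose(U @ U, np.eye(3))

# Square roots of the singular values estimate the standard deviations
# along the principal axes (roughly 3, 1 and 0.5 here).
axis_scales = np.sqrt(s)
print(same_spectrum, rotation_invariant, is_orthogonal, is_involutory)
```

Using three dimensions here is deliberate: in 2-D the SVD can return a reflection for $U$, which is symmetric and therefore its own inverse, so a 2-D experiment can make $U$ look involutory by coincidence.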

Keep in mind my background is in engineering, so apologies for any lack of rigor here, especially in the second paragraph.