I am trying to verify my solution for a simple problem using numpy.
We are given a data matrix, $\mathbf{X}$, where each row is a datapoint, together with its SVD, $\mathbf{U\Delta V}^T$. We are asked to compute the eigendecomposition of the covariance matrix $\mathbf{\Sigma}=\frac{1}{N}\mathbf{X}^T\mathbf{X}$, where $N$ is the number of datapoints.
What I've done is:
$$ \mathbf{\Sigma} = \frac{1}{N}\mathbf{X}^T\mathbf{X} = \frac{1}{N}\mathbf{V\Delta U}^T\mathbf{U\Delta V}^T = \frac{1}{N}\mathbf{V\Delta}^2\mathbf{V}^T \implies \mathbf{\Sigma V} = \mathbf{V}\frac{\mathbf{\Delta^2}}{N} $$
So, eigenvectors are the columns of $\mathbf{V}$ while eigenvalues are the elements in the diagonal of $\frac{\mathbf{\Delta^2}}{N}$.
The problem arises when I try to verify this in numpy by running this code:
```python
import numpy as np

X = np.random.uniform(1, 20, [10, 10])
U, D, Vt = np.linalg.svd(X)
eigenvalues, eigenvectors = np.linalg.eig(1/10 * X.T @ X)
```
and then comparing `1/10 * D**2` with `eigenvalues` and `Vt.T` with `eigenvectors`. Some (not all!) of the columns of `eigenvectors` differ in sign from the corresponding columns of `Vt.T` (but agree in absolute value). Note that the eigenvalues do have the right sign.
Is there an error in my hand computation, is this some approximation error, or is this a property of eigenbases I am not aware of?
If $v$ is an eigenvector of some operator with the corresponding eigenvalue $\lambda$, then for any $k \neq 0$ the vector $kv$ is also an eigenvector with the same eigenvalue $\lambda$. To put it another way, the eigenvectors corresponding to $\lambda$ (together with the zero vector) form a linear subspace — a point of view that is especially important when considering degenerate eigenvalues (those with geometric multiplicity > 1).
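A quick numeric illustration of this fact, using a hypothetical diagonal $2 \times 2$ matrix:

```python
import numpy as np

# A has eigenvalues 2 and 3 with eigenvectors along the coordinate axes.
A = np.array([[2.0, 0.0],
              [0.0, 3.0]])
v = np.array([1.0, 0.0])

# v is an eigenvector for lambda = 2 ...
assert np.allclose(A @ v, 2 * v)
# ... and so is any nonzero multiple of it, including -v.
assert np.allclose(A @ (-v), 2 * (-v))
assert np.allclose(A @ (5 * v), 2 * (5 * v))
```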
Thus, even if you normalize the eigenvectors, both $v$ and $-v$ are still valid answers for the same $\lambda$. Naturally, you don't have much control over which of the two normalized eigenvectors a numerical method will compute; but you can always multiply the result by $-1$ if you need to.
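Concretely, your verification can be made sign-robust by flipping any eigenvector whose overlap with the corresponding SVD column is negative. A minimal sketch (assuming a $10 \times 10$ random matrix as in the question, with no degenerate eigenvalues; note also that `eig` does not sort its output, so both spectra are put in descending order first):

```python
import numpy as np

X = np.random.uniform(1, 20, [10, 10])
U, D, Vt = np.linalg.svd(X)
eigenvalues, eigenvectors = np.linalg.eig(1/10 * X.T @ X)

# svd returns singular values in descending order; sort eig's output to match
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order].real
eigenvectors = eigenvectors[:, order].real

# the eigenvalues agree directly, as the derivation predicts
assert np.allclose(eigenvalues, 1/10 * D**2)

# column-wise dot products of the two eigenbases are +1 or -1;
# flip the columns where they are -1, then the bases coincide
signs = np.sign(np.sum(Vt.T * eigenvectors, axis=0))
assert np.allclose(eigenvectors * signs, Vt.T)
```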