I've noticed a discrepancy between the eigenvalue spectra returned by different eigendecomposition methods. As an example, the code below performs PCA on the Iris dataset two independent ways: (Fig A) eigendecomposition of the covariance matrix, and (Fig B) SVD of the standardized data.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
# =============================================================================
# ready data:
# =============================================================================
df = pd.read_csv(
filepath_or_buffer='https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data',
header=None,
sep=',')
df.columns=['sepal_len', 'sepal_wid', 'petal_len', 'petal_wid', 'class']
df.dropna(how="all", inplace=True) # drops the empty line at file-end
df.tail()
# split data table into data X and class labels y
X = df.iloc[:,0:4].values
y = df.iloc[:,4].values
X_std = StandardScaler().fit_transform(X)
mean_vec = np.mean(X_std, axis=0)
cov_mat = (X_std - mean_vec).T.dot((X_std - mean_vec)) / (X_std.shape[0]-1)
# =============================================================================
# method 1: np.linalg.eig()
# =============================================================================
eig_vals, eig_vecs = np.linalg.eig(cov_mat)
# np.linalg.eig returns eigenvalues in no particular order; sort descending
# so the spectrum is directly comparable with the SVD's singular values
order = np.argsort(eig_vals)[::-1]
eig_vals, eig_vecs = eig_vals[order], eig_vecs[:, order]
# =============================================================================
# method 2: np.linalg.svd()
# =============================================================================
u, s, vh = np.linalg.svd(X_std.T)  # s holds the singular values, sorted descending
# =============================================================================
# plot eigenvalues
# =============================================================================
fig = plt.figure()
plt.subplot(1,4,1)
plt.scatter(range(len(eig_vals)), eig_vals)
plt.title('A: linalg.eig spectrum')
plt.ylabel('Eigenvalues')
plt.xlabel('Eigenvectors')
plt.xlim(0,4)
plt.subplot(1,4,2)
plt.scatter(range(len(s)),s)
plt.title('B: linalg.svd spectrum (S)')
plt.ylabel('Eigenvalues')
plt.xlabel('Eigenvectors')
plt.xlim(0,4)
plt.subplot(1,4,3)
plt.scatter(range(len(s)),s**2.)
plt.title('C: linalg.svd spectrum (S^2)')
plt.ylabel('Eigenvalues')
plt.xlabel('Eigenvectors')
plt.xlim(0,4)
plt.subplot(1,4,4)
plt.scatter(range(len(s)),s**(1/2.))
plt.title('D: linalg.svd spectrum (S^(1/2))')
plt.ylabel('Eigenvalues')
plt.xlabel('Eigenvectors')
plt.xlim(0,4)
plt.tight_layout()
plt.show()
As seen in the figure, the two spectra (Fig A, B) have similar shapes but are on different scales. When I instead square the singular values from SVD (Fig C; in theory the singular values are the square roots of the eigenvalues), the spectra in A and C have exactly the same shape but are still on completely different scales.
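I suspected the leftover scale factor might just be the 1/(n-1) normalization in the covariance matrix. A quick check on synthetic standardized data (a sketch, not the Iris pipeline above; I use `eigh` here since the covariance matrix is symmetric) seems consistent with that:

```python
import numpy as np

# synthetic stand-in for X_std: 150 samples, 4 standardized features
rng = np.random.default_rng(0)
Z = rng.standard_normal((150, 4))
Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)

n = Z.shape[0]
cov = Z.T @ Z / (n - 1)

# eigh is the routine for symmetric matrices; eigenvalues come back ascending
evals = np.linalg.eigh(cov)[0][::-1]       # reversed to descending order
s = np.linalg.svd(Z, compute_uv=False)     # singular values, descending

# the covariance eigenvalues appear to equal s**2 / (n - 1)
print(np.allclose(evals, s**2 / (n - 1)))  # True on this data
```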
Given all of this, what is the proper convention for reporting these spectra? If I use SVD for a paper, for example, should I square the singular values in the central matrix and then normalize them?
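For example, would normalizing each value by the total, so any constant factor cancels (the convention scikit-learn exposes as `explained_variance_ratio_`), be the right idea? A sketch on synthetic data, again not the Iris pipeline above:

```python
import numpy as np

rng = np.random.default_rng(1)
Z = rng.standard_normal((150, 4))
Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)

n = Z.shape[0]
evals = np.linalg.eigh(Z.T @ Z / (n - 1))[0][::-1]  # covariance eigenvalues, descending
s = np.linalg.svd(Z, compute_uv=False)              # singular values, descending

# as fractions of the total, the constant 1/(n-1) cancels and the two
# spectra should coincide
ratio_eig = evals / evals.sum()
ratio_svd = s**2 / (s**2).sum()
print(np.allclose(ratio_eig, ratio_svd))  # True on this data
```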