PCA - How to calculate the Covariance matrix?


I have the following points: $$ \left\{ (1.1,1.4),(1.5,2.1),(1,1.6),(2,2.1),(2.3,3.2),(3.1,3.5),(1.9,2.7),(2.2,3.4),(0.5,1.2),(2.5,2.9)\right\} $$ I'm trying to figure out what `get_covariance()` does for `sklearnPCA`, imported with `from sklearn.decomposition import PCA as sklearnPCA`. From what I understand, it first centers the data by computing the means: $$ \begin{cases} \overline{x}=\frac{1.1+1.5+1+2+2.3+3.1+1.9+2.2+0.5+2.5}{10}=1.81\\ \overline{y}=\frac{1.4+2.1+1.6+2.1+3.2+3.5+2.7+3.4+1.2+2.9}{10}=2.41 \end{cases} $$ Then you compute $\left(x_{i}^{norm},y_{i}^{norm}\right)=\left(x_{i}-\overline{x},y_{i}-\overline{y}\right)$, which gives: $$ \left\{ (-0.71,-1.01),(-0.31,-0.31),(-0.81,-0.81),(0.19,-0.31),(0.49,0.79),(1.29,1.09),(0.09,0.29),(0.39,0.99),(-1.31,-1.21),(0.69,0.49)\right\} $$ But how do you calculate the covariance matrix from these centered points? It returns:

[Image: the covariance matrix printed by `get_covariance()`]

But how do I calculate it manually?
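For reference, the mean-centering step described above can be reproduced in a few lines of plain Python. This is a standard-library sketch; the variable names are illustrative and not taken from sklearn:

```python
# Points from the question as (x, y) pairs.
POINTS = [(1.1, 1.4), (1.5, 2.1), (1.0, 1.6), (2.0, 2.1), (2.3, 3.2),
          (3.1, 3.5), (1.9, 2.7), (2.2, 3.4), (0.5, 1.2), (2.5, 2.9)]

n = len(POINTS)
x_mean = sum(x for x, _ in POINTS) / n
y_mean = sum(y for _, y in POINTS) / n

# Subtract the column means from every point (mean-centering).
centered = [(x - x_mean, y - y_mean) for x, y in POINTS]

print(f"x_mean = {x_mean:.2f}, y_mean = {y_mean:.2f}")  # x_mean = 1.81, y_mean = 2.41
print([(round(dx, 2), round(dy, 2)) for dx, dy in centered])
```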

EDIT: What I don't understand is how they got $Var(X)=0.616$. I get:
$$ Var(X)=\frac{1.1^{2}+1.5^{2}+1^{2}+2^{2}+2.3^{2}+3.1^{2}+1.9^{2}+2.2^{2}+0.5^{2}+2.5^{2}}{10}-\left(\frac{1.1+1.5+1+2+2.3+3.1+1.9+2.2+0.5+2.5}{10}\right)^2=0.5549 $$ How did they get that number? Similarly, I get: $$ Var(Y)=\frac{1.4^2+2.1^2+1.6^2+2.1^2+3.2^2+3.5^2+2.7^2+3.4^2+1.2^2+2.9^2}{10}-\left(\frac{1.4+2.1+1.6+2.1+3.2+3.5+2.7+3.4+1.2+2.9}{10}\right)^2=0.6449 $$ and not $Var(Y)=0.7165$.
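One quick way to see where the difference comes from is Python's standard `statistics` module, which offers both the divide-by-$n$ population variance (`pvariance`) and the divide-by-$(n-1)$ sample variance (`variance`). This is only a sketch for checking the arithmetic; it assumes, consistent with the numbers quoted above, that sklearn divides by $n-1$:

```python
import statistics

xs = [1.1, 1.5, 1.0, 2.0, 2.3, 3.1, 1.9, 2.2, 0.5, 2.5]
ys = [1.4, 2.1, 1.6, 2.1, 3.2, 3.5, 2.7, 3.4, 1.2, 2.9]
n = len(xs)

# Divide by n, as in the question: population variance.
pop_var_x, pop_var_y = statistics.pvariance(xs), statistics.pvariance(ys)

# Divide by n - 1: unbiased sample variance.
samp_var_x, samp_var_y = statistics.variance(xs), statistics.variance(ys)

print(f"{pop_var_x:.4f} {pop_var_y:.4f}")    # 0.5549 0.6449
print(f"{samp_var_x:.4f} {samp_var_y:.4f}")  # 0.6166 0.7166

# The two estimates differ by exactly the factor n / (n - 1) = 10 / 9.
print(f"{pop_var_x * n / (n - 1):.4f}")      # 0.6166
```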

Here is the Python code if someone wants to try it:

```python
from sklearn.decomposition import PCA as sklearnPCA

POINTS = [[1.1, 1.4], [1.5, 2.1], [1, 1.6], [2, 2.1], [2.3, 3.2],
          [3.1, 3.5], [1.9, 2.7], [2.2, 3.4], [0.5, 1.2], [2.5, 2.9]]
clf = sklearnPCA(n_components=1)
pca_transformed = clf.fit_transform(POINTS)
covariance_matrix = clf.get_covariance()  # 2x2 matrix, one row/column per feature
print(covariance_matrix)
```
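The sketch below is a standard-library illustration, not sklearn's actual code, of the probabilistic-PCA formula that the `get_covariance()` documentation describes: $\text{cov} = W^T(\Lambda - \sigma^2 I)W + \sigma^2 I$, where the rows of $W$ are the kept components and $\sigma^2$ (`noise_variance_`) is the mean of the discarded eigenvalues of the sample covariance matrix. It assumes the default `whiten=False`. With two features and `n_components=1`, the only discarded eigenvalue is $\lambda_2$, so the model reduces to $\lambda_1 v_1 v_1^T + \lambda_2 v_2 v_2^T$, which is just the full sample covariance matrix (denominator $n-1$):

```python
import math

xs = [1.1, 1.5, 1.0, 2.0, 2.3, 3.1, 1.9, 2.2, 0.5, 2.5]
ys = [1.4, 2.1, 1.6, 2.1, 3.2, 3.5, 2.7, 3.4, 1.2, 2.9]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n

# Sample covariance entries (denominator n - 1).
a = sum((x - mx) ** 2 for x in xs) / (n - 1)                     # var(X)
c = sum((y - my) ** 2 for y in ys) / (n - 1)                     # var(Y)
b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)  # cov(X, Y)

# Eigenvalues of the symmetric 2x2 matrix [[a, b], [b, c]], largest first.
half_trace = (a + c) / 2
radius = math.hypot((a - c) / 2, b)
lam1, lam2 = half_trace + radius, half_trace - radius

# Unit eigenvector for lam1 (assumes b != 0, which holds for this data).
vx, vy = b, lam1 - a
norm = math.hypot(vx, vy)
v = (vx / norm, vy / norm)

# One-component probabilistic-PCA model:
#   cov = (lam1 - noise) * v v^T + noise * I, with noise = mean of dropped eigenvalues.
noise = lam2
model = [[(lam1 - noise) * v[i] * v[j] + (noise if i == j else 0.0)
          for j in range(2)] for i in range(2)]

for row in model:
    print(" ".join(f"{val:.4f}" for val in row))
# Expected output:
# 0.6166 0.6154
# 0.6154 0.7166
```

With more features than kept components plus one, the discarded eigenvalues are averaged into a single $\sigma^2$, and the model covariance would then generally differ from the sample covariance.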

---

Each component of the matrix $a_{ij}$ will be: $$a_{ij} = E\left[(x_i - \mu_i)(x_j - \mu_j)\right]$$

In your case, $x_1$ and $x_2$ are $x$ and $y$, and I defined $\mu_i = E(x_i)$.
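The definition above translates directly into code. Below is a minimal sketch with a hypothetical helper `covariance_matrix` (not a library function). The expectation $E[\cdot]$ is replaced by an average over the data points, and the `ddof` parameter (named after NumPy's convention) chooses between dividing by $n$ and by $n-1$:

```python
def covariance_matrix(rows, ddof=1):
    """Covariance matrix of `rows`, a list of equal-length feature vectors.

    Entry (i, j) is the sum of (x_i - mu_i) * (x_j - mu_j) over all rows,
    divided by n - ddof: ddof=0 divides by n (the question's convention),
    ddof=1 divides by n - 1 (the usual sample estimate).
    """
    n, d = len(rows), len(rows[0])
    mu = [sum(row[k] for row in rows) / n for k in range(d)]
    return [
        [sum((row[i] - mu[i]) * (row[j] - mu[j]) for row in rows) / (n - ddof)
         for j in range(d)]
        for i in range(d)
    ]


POINTS = [(1.1, 1.4), (1.5, 2.1), (1.0, 1.6), (2.0, 2.1), (2.3, 3.2),
          (3.1, 3.5), (1.9, 2.7), (2.2, 3.4), (0.5, 1.2), (2.5, 2.9)]

for ddof in (0, 1):
    cov = covariance_matrix(POINTS, ddof=ddof)
    print(f"ddof={ddof}:", [[round(v, 4) for v in r] for r in cov])
```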

---

Compute $\overline{x^2}=\sum x_i^2 /n$, $\overline{y^2}=\sum y_i^2 /n$ and $\overline{xy}=\sum x_iy_i /n$. Then $var(X)=\frac{n}{n-1}(\overline{x^2}-(\bar{x})^2)$, $var(Y)=\frac{n}{n-1}(\overline{y^2}-(\bar{y})^2)$, and $covar(XY)=\frac{n}{n-1}(\overline{xy}-\bar{x}\bar{y})$.
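The factor $\frac{n}{n-1}$ (Bessel's correction) is needed because the deviations are measured from the sample means $\bar{x}$ and $\bar{y}$ rather than the true means; dividing by $n$ alone underestimates the variance. It is exactly the gap between your numbers and sklearn's: $0.5549\cdot\frac{10}{9}\approx 0.6166$ and $0.6449\cdot\frac{10}{9}\approx 0.7166$.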
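These shortcut formulas can be checked numerically. A minimal standard-library sketch (the helper `mean` is only for illustration):

```python
xs = [1.1, 1.5, 1.0, 2.0, 2.3, 3.1, 1.9, 2.2, 0.5, 2.5]
ys = [1.4, 2.1, 1.6, 2.1, 3.2, 3.5, 2.7, 3.4, 1.2, 2.9]
n = len(xs)


def mean(values):
    """Arithmetic mean: sum divided by the number of values."""
    return sum(values) / len(values)


x_bar, y_bar = mean(xs), mean(ys)
x2_bar = mean([x * x for x in xs])              # mean of x^2
y2_bar = mean([y * y for y in ys])              # mean of y^2
xy_bar = mean([x * y for x, y in zip(xs, ys)])  # mean of x*y

factor = n / (n - 1)  # Bessel's correction
var_x = factor * (x2_bar - x_bar ** 2)
var_y = factor * (y2_bar - y_bar ** 2)
cov_xy = factor * (xy_bar - x_bar * y_bar)

print(f"{var_x:.4f} {cov_xy:.4f}")  # 0.6166 0.6154
print(f"{cov_xy:.4f} {var_y:.4f}")  # 0.6154 0.7166
```

The $\overline{x^2}-\bar{x}^2$ form is convenient by hand, but in floating point it can lose precision through cancellation when the values are large relative to their spread; subtracting the means first, as in the question, is numerically safer.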

$\begin{pmatrix} var(X) & covar(XY) \\ covar(XY) & var(Y) \end{pmatrix}$
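Plugging the question's data into these formulas gives the matrix explicitly, with $n=10$ and values rounded to four decimals:

$$ \overline{x^2}=3.831,\quad \overline{y^2}=6.453,\quad \overline{xy}=4.916,\quad \bar{x}=1.81,\quad \bar{y}=2.41 $$
$$ \begin{pmatrix} \frac{10}{9}(3.831-1.81^2) & \frac{10}{9}(4.916-1.81\cdot 2.41) \\ \frac{10}{9}(4.916-1.81\cdot 2.41) & \frac{10}{9}(6.453-2.41^2) \end{pmatrix} = \begin{pmatrix} \frac{10}{9}\cdot 0.5549 & \frac{10}{9}\cdot 0.5539 \\ \frac{10}{9}\cdot 0.5539 & \frac{10}{9}\cdot 0.6449 \end{pmatrix} \approx \begin{pmatrix} 0.6166 & 0.6154 \\ 0.6154 & 0.7166 \end{pmatrix} $$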