Using principal component analysis to reduce dimension, how can we know that distances between data points do not increase and that the new features are uncorrelated?


Using PCA, if we reduce the dimension of a dataset $x_1, \dots, x_n \in \mathbb{R}^d$ of mean zero, then we get a dimensionally reduced dataset $y_1, \dots , y_n \in \mathbb{R}^k$, for some $1\leq k \leq d$.

How can we know that PCA shrinks distances between data points? And how can we know that the new features are uncorrelated?

My idea:
To show the distance claim: we can show that $\|y_i - y_j\| \leq \|x_i - x_j\|$ for all $i, j$.
For the feature vectors $f^{(i)} = (y_{1,i}, y_{2,i}, \dots , y_{n,i}) \in \mathbb{R}^n$, where $i \in \{1,\dots,k\}$, we want to show that $f^{(i)} \perp f^{(j)}$ for all $i \neq j$.

We know that $y_{i,j} = \langle x_i, e_j \rangle$, where $e_1, \dots, e_k$ are the top $k$ eigenvectors of $X^\top X$, and
$$\|y_i - y_j\| = \sqrt{\sum_{m=1}^k (y_{i,m} - y_{j,m})^2}.$$
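The notation above can be made concrete with a small numpy sketch (the setup assumed here, standard for PCA, is that $e_1, \dots, e_k$ are orthonormal eigenvectors of $X^\top X$ belonging to the $k$ largest eigenvalues):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, k = 50, 4, 2

# Mean-zero dataset: rows of X are x_1, ..., x_n in R^d.
X = rng.standard_normal((n, d))
X -= X.mean(axis=0)

# e_1, ..., e_k: orthonormal eigenvectors of X^T X for the k largest
# eigenvalues (eigh returns eigenvalues in ascending order).
eigvals, E = np.linalg.eigh(X.T @ X)
E = E[:, np.argsort(eigvals)[::-1]][:, :k]

# Componentwise definition: y_{i,j} = <x_i, e_j>.
Y = np.array([[X[i] @ E[:, j] for j in range(k)] for i in range(n)])

# This agrees with the matrix form Y = X E.
print(np.allclose(Y, X @ E))  # True

# The feature vectors f^{(1)}, f^{(2)} are the columns of Y.
f1, f2 = Y[:, 0], Y[:, 1]
print(f1 @ f2)  # numerically zero: the features are orthogonal
```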


BEST ANSWER

The $n \times k$ matrix $Y$ (whose rows are $y_1, \ldots, y_n$) is $Y = XV_{1:k}$ where the columns of $V_{1:k} \in \mathbb{R}^{d \times k}$ are the orthonormal eigenvectors of $X^\top X$ corresponding to the largest $k$ eigenvalues.

  • Note that $Y^\top Y = V_{1:k}^\top X^\top X V_{1:k}$. The left-hand side is a matrix whose $(i,j)$ entry is $\langle f^{(i)}, f^{(j)}\rangle$. For the right-hand side, $X^\top X V_{1:k} = V_{1:k} \Lambda_k$, where $\Lambda_k$ is the $k \times k$ diagonal matrix of the largest $k$ eigenvalues of $X^\top X$, and $V_{1:k}^\top V_{1:k} = I_k$ by orthonormality, so $Y^\top Y = \Lambda_k$. Since this matrix is diagonal, we have $f^{(i)} \perp f^{(j)}$ for $i \ne j$.
  • $y_i = x_i^\top V_{1:k}$ (a row vector), so writing $z = x_i - x_j$, $$\|y_i - y_j\|^2 = \|z^\top V_{1:k}\|^2 = z^\top V_{1:k} V_{1:k}^\top z.$$ Here $P = V_{1:k} V_{1:k}^\top$ is the orthogonal projection onto $\operatorname{span}(v_1, \dots, v_k)$, the span of the top $k$ eigenvectors. Extending to a full orthonormal eigenbasis $v_1, \dots, v_d$ of $X^\top X$ and writing $\tilde z_m = \langle z, v_m \rangle$ for the coordinates of $z$ in that basis, $$z^\top P z = \sum_{m=1}^k \tilde z_m^2 \le \sum_{m=1}^d \tilde z_m^2 = \|z\|^2 = \|x_i - x_j\|^2.$$ (In the eigenvector basis, $P$ is diagonal with $k$ ones and $d-k$ zeros; dropping coordinates can only shrink the norm.)
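Both claims can be checked numerically. The sketch below (my own illustration, with a randomly generated correlated dataset) builds $Y = XV_{1:k}$ from the top-$k$ eigenvectors of $X^\top X$, then verifies that $Y^\top Y$ is diagonal and that no pairwise distance grows:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 200, 5, 2

# Mean-zero data matrix X (rows are x_1, ..., x_n); the extra matrix
# product makes the features correlated so the check is non-trivial.
X = rng.standard_normal((n, d)) @ rng.standard_normal((d, d))
X -= X.mean(axis=0)

# Orthonormal eigenvectors of X^T X, sorted by decreasing eigenvalue.
eigvals, V = np.linalg.eigh(X.T @ X)
V = V[:, np.argsort(eigvals)[::-1]]

# Project onto the top-k eigenvectors: Y = X V_{1:k}.
Y = X @ V[:, :k]

# 1) Uncorrelated features: Y^T Y is diagonal (off-diagonal ~ 0).
G = Y.T @ Y
off_diag = G - np.diag(np.diag(G))
print(np.max(np.abs(off_diag)))  # numerically zero

# 2) Distances never grow: ||y_i - y_j|| <= ||x_i - x_j|| for all i, j.
DX = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
DY = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=2)
print(np.all(DY <= DX + 1e-9))  # True (small slack for roundoff)
```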