Using PCA, if we reduce the dimension of a mean-zero dataset $x_1, \dots, x_n \in \mathbb{R}^d$, then we obtain a dimension-reduced dataset $y_1, \dots , y_n \in \mathbb{R}^k$, for some $1\leq k \leq d$.
How can we show that PCA does not increase the distances between data points? And how can we show that the new features are uncorrelated?
My idea:
For the distance claim: we want to show that $\|y_i - y_j\| \leq \|x_i - x_j\|$ for all $i, j$.
And for the feature vectors $f^{(i)} = (y_{1,i}, y_{2,i}, \dots , y_{n,i}) \in \mathbb{R}^n$, where $i \in \{1,\dots,k\}$, we want to show that $f^{(i)} \perp f^{(j)}$ for all $i \neq j$.
We know that
$y_{i,j} = \langle x_i,v_j \rangle$ for $j = 1, \dots, k$, where $v_j$ is the $j$-th column of $V_{1:k}$ defined below,

$\|y_i - y_j\| =\sqrt{\sum_{\ell=1}^k(y_{i,\ell}-y_{j,\ell})^2}$
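One way to make the distance claim precise (a sketch, writing $v_1, \dots, v_d$ for a full orthonormal eigenbasis of $X^\top X$, of which PCA keeps the first $k$):

$$\|y_i - y_j\|^2 = \sum_{\ell=1}^{k} \langle x_i - x_j, v_\ell \rangle^2 \;\leq\; \sum_{\ell=1}^{d} \langle x_i - x_j, v_\ell \rangle^2 = \|x_i - x_j\|^2,$$

where the last equality is Parseval's identity for the orthonormal basis $v_1, \dots, v_d$: dropping $d-k$ nonnegative terms can only shrink the sum.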
The $n \times k$ matrix $Y$ (whose rows are $y_1, \ldots, y_n$) is $Y = XV_{1:k}$ where the columns of $V_{1:k} \in \mathbb{R}^{d \times k}$ are the orthonormal eigenvectors of $X^\top X$ corresponding to the largest $k$ eigenvalues.
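For the orthogonality claim, note that $Y^\top Y = V_{1:k}^\top (X^\top X) V_{1:k}$ is diagonal because the columns of $V_{1:k}$ are orthonormal eigenvectors of $X^\top X$; its off-diagonal entries are exactly the inner products $f^{(i)\top} f^{(j)}$. A quick numerical sanity check of both claims (a sketch with synthetic data; the dimensions and indices are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 200, 5, 2

# Mean-zero data matrix X whose rows are x_1, ..., x_n.
X = rng.normal(size=(n, d))
X -= X.mean(axis=0)

# Orthonormal eigenvectors of X^T X; eigh returns eigenvalues
# in ascending order, so flip to take the largest k.
eigvals, V = np.linalg.eigh(X.T @ X)
V_k = V[:, ::-1][:, :k]
Y = X @ V_k  # reduced dataset, rows are y_1, ..., y_n

# 1) Distances never grow: ||y_i - y_j|| <= ||x_i - x_j||.
i, j = 3, 17
print(np.linalg.norm(Y[i] - Y[j]) <= np.linalg.norm(X[i] - X[j]))

# 2) New features are pairwise orthogonal: Y^T Y is
#    (numerically) diagonal, so f^(i) . f^(j) = 0 for i != j.
G = Y.T @ Y
print(np.allclose(G - np.diag(np.diag(G)), 0.0))
```

Both checks print `True`; since the data is mean-zero, the diagonal $Y^\top Y$ also means the sample covariance of the new features is diagonal, i.e. the features are uncorrelated.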