I would like to get started with practical applications of:
- Principal component analysis
- Low-rank matrix approximation

The problem is that, even after some internet searches, I can't place these two topics in a specific branch of mathematics, which means I also can't find good resources about them.
I work primarily with 3D geometry, so I'm not looking for material on applied statistics or other fields.
These topics are both closely related to the SVD (singular value decomposition) in linear algebra. Trefethen and Bau's book *Numerical Linear Algebra* has a good discussion of the SVD, but I've seen PCA discussed more directly in machine learning textbooks.
Here's my attempt at an explanation of PCA via the SVD:
Let $x_1,\ldots,x_N \in \mathbb R^D$, and assume for simplicity that the mean $\frac{1}{N} \sum_{n=1}^N x_n = 0$.
In order to reduce the dimensionality of our data, we can try to find a matrix $U = \begin{bmatrix} u_1 & \cdots & u_K \end{bmatrix} \in \mathbb R^{D \times K}$ such that each vector $x_n$ belongs (approximately) to the column space of $U$: \begin{equation*} x_n \approx U c_n \end{equation*} for some vector of coefficients $c_n \in \mathbb R^K$. We will then be able to specify each $x_n$ (to a good approximation) using only $K$ numbers rather than $D$ numbers. If $K \ll D$, this can be a big savings.
In other words, if $X = \begin{bmatrix} x_1 & \cdots & x_N \end{bmatrix}$, we want to find a matrix $U$ such that \begin{equation*} X \approx U C \end{equation*} for some matrix of coefficients $C$. If we choose $U$ well, then $UC$ will be a good low rank approximation to $X$. (The rank of $UC$ is at most $K$.)
But we know how to find an optimal low-rank approximation to $X$: the SVD gives it to us. (That's why the SVD can be used for compression.) Suppose that \begin{equation*} X = \sum_{i=1}^r \sigma_i u_i v_i^T \end{equation*} is an SVD of $X$. Keeping only the first $K$ terms of this sum gives us a very good approximation to $X$: \begin{align*} X &\approx \sum_{i=1}^K \sigma_i u_i v_i^T \\ &= U \Sigma V^T \\ &= UC, \end{align*} where $U = \begin{bmatrix} u_1 & \cdots & u_K \end{bmatrix}$, $\Sigma = \operatorname{diag}(\sigma_1, \ldots, \sigma_K)$, $V = \begin{bmatrix} v_1 & \cdots & v_K \end{bmatrix}$, and $C = \Sigma V^T$. The PCA algorithm returns this matrix $U$, computed from the SVD of $X$, as output.
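Here's a small NumPy sketch of the computation above on synthetic data (the data matrix and dimensions are made up for illustration): we build a nearly rank-$K$ data matrix $X$, take its SVD, and form $U$ and $C = \Sigma V^T$ from the first $K$ singular triples.

```python
import numpy as np

rng = np.random.default_rng(0)
D, N, K = 5, 100, 2  # ambient dimension, number of points, target rank

# Toy data: approximately rank K plus small noise, then mean-centered
# (the derivation above assumes the data has zero mean).
U_true = np.linalg.qr(rng.standard_normal((D, K)))[0]
X = U_true @ rng.standard_normal((K, N)) + 0.01 * rng.standard_normal((D, N))
X -= X.mean(axis=1, keepdims=True)

# SVD of X; keep only the first K singular values and vectors.
U_full, s, Vt = np.linalg.svd(X, full_matrices=False)
U = U_full[:, :K]             # D x K basis -- the PCA output
C = np.diag(s[:K]) @ Vt[:K]   # K x N coefficients, C = Sigma V^T

# Each column x_n is now described by K numbers (a column of C)
# instead of D numbers, and X is recovered almost exactly.
print(np.linalg.norm(X - U @ C))  # small compared with norm(X)
```

Note that $C$ never needs to be computed separately in practice: its columns are just the coordinates $c_n = U^T x_n$ of each data point in the basis $U$.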
This matrix $U$ is optimal in the following sense: if $\hat{U} \in \mathbb R^{D \times K}$ and $\hat{C} \in \mathbb R^{K \times N}$, then \begin{equation*} \| X - UC \|_F \leq \| X - \hat{U} \hat{C} \|_F. \end{equation*} Indeed, out of all $D \times N$ matrices of rank at most $K$, $UC$ is an optimal approximation of $X$, with respect to either the Frobenius norm or the 2-norm. (This is the Eckart–Young theorem.)
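This optimality can be checked numerically. The sketch below (synthetic data, nothing special about the dimensions chosen) compares the Frobenius error of the rank-$K$ SVD truncation against many random competitors $\hat{U}\hat{C}$, where for each random orthonormal basis $\hat{U}$ we use the least-squares optimal coefficients $\hat{C} = \hat{U}^T X$:

```python
import numpy as np

rng = np.random.default_rng(1)
D, N, K = 6, 50, 2
X = rng.standard_normal((D, N))

# Optimal rank-K approximation from the truncated SVD.
U_full, s, Vt = np.linalg.svd(X, full_matrices=False)
svd_err = np.linalg.norm(X - U_full[:, :K] @ np.diag(s[:K]) @ Vt[:K])

# Random rank-K competitors: for each random orthonormal U_hat, the
# Frobenius-optimal coefficients are the projection C_hat = U_hat^T X.
for _ in range(200):
    U_hat = np.linalg.qr(rng.standard_normal((D, K)))[0]
    C_hat = U_hat.T @ X
    err = np.linalg.norm(X - U_hat @ C_hat)
    assert svd_err <= err + 1e-9  # the SVD truncation is never beaten

print("SVD truncation beat every random rank-K competitor")
```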