SVD factorizes a matrix $A \in \mathbb{R}^{mxn}$ such that $A = U\Sigma V^T$, quoting from Deisenroth et al:
Instead of doing the full SVD factorization, we will now investigate how the SVD allows us to represent a matrix $A$ as a sum of simpler (low-rank) matrices $A_i$, which lends itself to a matrix approximation scheme that is cheaper to compute than the full SVD. We construct a rank-1 matrix $A_i \in \mathbb{R}^{mxn}$ as: $$A_i := u_iv_i^T,$$ which is formed by the outer product of the ith orthogonal column vector of $U$ and $V$.
Each $A_i$ is then multiplied by its corresponding $\sigma _i$ in $\Sigma$, and all the rank-1 matrices are summed up to the chosen rank approximation.
I need to understand the intuition behind using the outer product, I know that each rank-1 matrix is weighted by its singular value (which is sorted in a descending order, so that the last few matrices in the summation have minimal weight), but I can't grasp how the outer product does the trick. So, any pointers?
Write the singular value decomposition as \begin{equation*} A = U\Sigma V^\top = \begin{bmatrix} u_1 & u_2 & \cdots & u_m\end{bmatrix}\begin{bmatrix}\tilde{\Sigma} & 0_{r\times (n-r)} \\ 0_{(m-r)\times r} & 0_{(m-r)\times(n-r)} \end{bmatrix}\begin{bmatrix} v_1^\top \\ v_2^\top \\ \vdots \\ v_n^\top \end{bmatrix} \end{equation*} where $\tilde{\Sigma} = \begin{bmatrix}\sigma_1 & 0 & \cdots & 0 \\ 0 & \sigma_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_r\end{bmatrix}$. Then multiplying out the above expression yields \begin{equation*} A = \sum_{i=1}^r \sigma_i u_i v_i^\top. \end{equation*} You can now drop terms from the above dyadic sum in order to approximate $A$ by its leading principal components. In fact, by the Eckart–Young–Mirsky theorem, we can show that the best rank-$k$ approximation of $A$ (in terms of the spectral norm and the Frobenius norm) is given by \begin{equation*} \hat{A}^* = \arg\min_{\text{rank}(\hat{A}) \le k}\|\hat{A} - A\| = \sum_{i=1}^k\sigma_i u_iv_i^\top. \end{equation*} See here for a proof of this optimality.