I am following the online Linear Algebra course by Prof. Gilbert Strang from MIT.
In the video lecture on the SVD, Link,
he mentions at the beginning that he wants to transform a vector $\vec{v}$ from the row space of $A$ into a vector $\vec{u}$ in the column space, so that $A\vec{v} = \vec{u}$. These two vectors also belong to orthonormal bases of their corresponding spaces.
This raises two questions that I am deeply confused about:
- For the general equation $A\vec{v} = \vec{u}$, do all the $\vec{v}$'s come from the row space of $A$?
- What is the point of transforming a basis vector of the row space into the column space? And what is the connection between this and PCA in ML, which people use to reduce high-dimensional datasets?
I do not understand this lecture well because I lost track during the first 10 minutes of the video above. I am a beginner in ML and want to learn more linear algebra, so I am very sorry if my question is silly, but for beginners like me an answer could make a huge difference.
Thanks for your time and help :)
I'm not sure why he wants these bases. I think what he's trying to do is give you a geometric idea of what the singular value decomposition gives you. Once you have the decomposition $A = U \Sigma V^\top$, then you can find the bases he wants.
First, let's clear up a potential misconception. We cannot always find orthonormal bases $v_1, \ldots, v_r$ and $u_1, \ldots, u_r$ for the rowspace and columnspace respectively, such that $Av_i = u_i$ for all $i$ (with merely orthogonal bases, this would be possible, since we could absorb the scaling into the basis vectors). What we can do is find such orthonormal bases so that $Av_i = \sigma_i u_i$, for some scalars $\sigma_i > 0$. Note also that both bases have the same length, since the row rank and column rank of a matrix are always equal.
Consider $$AV = U\Sigma.$$ Note that the columns $v_1, \ldots, v_n$ of $V$ are orthonormal, and, as always with matrix multiplication, $$AV = A\left(\begin{array}{c|c|c|c}&&&\\v_1&v_2&\cdots & v_n\\&&&\end{array}\right) = \left(\begin{array}{c|c|c|c}&&&\\Av_1&Av_2&\cdots & Av_n\\&&&\end{array}\right).$$ If $u_1, \ldots, u_n$ are the columns of $U$ and $\sigma_1, \ldots, \sigma_n$ are the diagonal entries of $\Sigma$, then $$U\Sigma = \left(\begin{array}{c|c|c|c}&&&\\\sigma_1 u_1&\sigma_2 u_2&\cdots & \sigma_n u_n\\&&&\end{array}\right).$$ Comparing columns, $Av_i = \sigma_i u_i$, as we'd expect.
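To make this concrete, here's a quick NumPy check of the column-by-column identity $Av_i = \sigma_i u_i$ (the matrix $A$ below is just a made-up example):

```python
import numpy as np

# A hypothetical 3x2 example matrix.
A = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [0.0, 2.0]])

# full_matrices=False gives the "thin" SVD: U is 3x2, Vt is 2x2.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
V = Vt.T

# AV = U * Sigma, so each column satisfies A v_i = sigma_i u_i.
for i in range(len(s)):
    assert np.allclose(A @ V[:, i], s[i] * U[:, i])
```

Note that NumPy returns $V^\top$ (here `Vt`), so we transpose to get the columns $v_i$.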
The question is: why do the $v_i$s come from the rowspace, and the $u_i$s from the columnspace? Well, actually, we can only guarantee that $v_i$ lies in the rowspace when $\sigma_i > 0$, and similarly for the $u_i$s. We also need to address why these vectors form bases for their respective spaces.
Note that, if $\sigma_i > 0$, then $$u_i = \sigma_i^{-1} Av_i,$$ which is an element of the columnspace of $A$! Recall that $Av_i$ is just a linear combination of the columns of $A$, whose coefficients are just the entries of $v_i$. So, when $\sigma_i > 0$, we get $u_i$ is in the columnspace of $A$.
Similarly, $$A = U\Sigma V^\top \implies A^\top = V \Sigma U^\top \implies A^\top U = V \Sigma,$$ so symmetrically we'd expect $v_i = \sigma_i^{-1} A^\top u_i$, which means $v_i$ is in the columnspace of $A^\top$, and hence the rowspace of $A$.
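This transposed identity can be verified numerically as well; the $2 \times 3$ matrix below is again a made-up example:

```python
import numpy as np

# A hypothetical 2x3 example matrix.
A = np.array([[2.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)
V = Vt.T

# A^T U = V * Sigma, so v_i = sigma_i^{-1} A^T u_i whenever sigma_i > 0.
for i, sigma in enumerate(s):
    if sigma > 1e-12:
        assert np.allclose(V[:, i], (A.T @ U[:, i]) / sigma)
```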
Let's suppose that the first $r$ of the $\sigma_i$s are strictly positive. How can we show that $u_1, \ldots, u_r$ is a basis for the columnspace? How can we show $v_1, \ldots, v_r$ is a basis for the rowspace? Note that they are already known to be linearly independent, since orthonormal vectors are automatically linearly independent.
Let's take an arbitrary element $Ax$ of the columnspace. Then $Ax = U\Sigma V^\top x$. Note that $\Sigma V^\top x$ is a column vector, and so $U(\Sigma V^\top x)$ is an element of the columnspace of $U$, i.e. an element of the span of the vectors $u_1, \ldots, u_n$. Moreover, given the factor of $\Sigma$, which has $0$s in all diagonal entries past the $r$th entry, the column vector $\Sigma V^\top x$ contains $0$s past the $r$th entry. This means that $U(\Sigma V^\top x)$ belongs not just to the span of $u_1, \ldots, u_n$, but to the span of $u_1, \ldots, u_r$. Thus, $u_1, \ldots, u_r$ does indeed form an orthonormal basis of the columnspace of $A$.
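The claim that every $Ax$ lies in the span of $u_1, \ldots, u_r$ is easy to test numerically on a rank-deficient matrix (the rank-$1$ matrix below is a made-up example):

```python
import numpy as np

# A hypothetical rank-1 matrix: only sigma_1 > 0, so every vector Ax
# should lie in the span of u_1 alone.
A = np.outer([1.0, 2.0, 3.0], [4.0, 5.0])   # 3x2, rank 1

U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-12))   # numerical rank
assert r == 1

# Take an arbitrary element Ax of the columnspace.
rng = np.random.default_rng(0)
x = rng.standard_normal(2)
y = A @ x

# Projecting y onto span(u_1, ..., u_r) recovers y exactly,
# so y already lies in that span.
Ur = U[:, :r]
assert np.allclose(Ur @ (Ur.T @ y), y)
```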
A symmetric argument works to show that $v_1, \ldots, v_r$ span, and hence form an orthonormal basis for, the rowspace.
I'm sure that, if you somehow managed to stumble across orthonormal bases with these properties, then extending them to bases of the full space would allow you to construct the singular value decomposition, but this answer is long enough.