Understanding low-rank approximation, from the SVD


I've been on a couple of Wikipedia pages today reading up on the SVD and the use of low-rank approximation, and I have a couple of basic questions.

If

$$A = U\Sigma V^*$$

$$= [U_1 \; U_2] \begin{bmatrix} \Sigma_1 & 0 \\ 0 & \Sigma_2 \\ \end{bmatrix}[V_1 \; V_2]^*$$

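then $A'=U_1\Sigma_1V_1^*$, called a "truncated SVD" (sometimes a "reduced SVD"), where $\Sigma_1$ holds the $r$ largest singular values, is a rank $r$ matrix that minimizes the Frobenius norm $\|A-A'\|_F$ over all matrices of rank at most $r$.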
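To make the definition concrete, here is a minimal NumPy sketch. The random matrix, its 6 x 40 shape, and the choice $r = 2$ are illustrative assumptions, not data from the question. It builds $A' = U_1\Sigma_1V_1^*$ and checks the standard fact that the Frobenius error equals the square root of the sum of the squared discarded singular values.

```python
import numpy as np

# Hypothetical data matrix: 6 variables (rows) x 40 patients (columns).
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 40))

# Thin SVD: A = U @ diag(s) @ Vh, with s sorted in descending order.
U, s, Vh = np.linalg.svd(A, full_matrices=False)

r = 2  # target rank (an arbitrary choice for illustration)

# Keep only the r largest singular values and their singular vectors.
A_r = U[:, :r] @ np.diag(s[:r]) @ Vh[:r, :]

# Frobenius error of the truncation vs. the norm of the discarded singular values.
err = np.linalg.norm(A - A_r, "fro")
tail = np.sqrt(np.sum(s[r:] ** 2))
rank_r = np.linalg.matrix_rank(A_r)

print(err, tail, rank_r)
```

Note that `A_r` has the same shape as `A`: the truncation lowers the rank, not the number of rows or columns.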

So, does this mean that for some large data matrix (say the columns of $A$ represent lung cancer patients, and the rows represent variables such as the patients' age, height, weight, marital status, smoker / non-smoker, family history of cancer or not, etc.), the lower rank $r$ matrix essentially "deletes" all of the rows that are insignificant, in the sense that those variables showed little variance and so aren't helpful? E.g. maybe the vast majority of patients are married, and so we delete the row corresponding to marital status. We would then keep only the rows of the matrix that have the most variance.

Intuitively, this seems wrong: based on the above, I could wrongly throw out the row variable of smoker status, if the vast majority of the patients were smokers and so there is little variance. But that would be throwing out pretty essential data that shows that most lung cancer patients were smokers.

So, where have I gone wrong in my thinking of low-rank approximation / the SVD?

Also, concerning the data matrix $A$: does it ever act on vectors via matrix multiplication? That would seem silly...what would its "action" even be? It's just an enormous array of the patients' data. It's not some...rotation...or dilation...or reflection...or projection...

In contrast, a stochastic matrix acting on probability vectors has the effect of 'updating' the probability vector of some Markov chain.

Thanks,

Best answer

If, for example, you want to predict lung cancer, then your data set should contain samples from both categories (i.e. healthy and sick people). If smoking were strongly associated with the sick group only, the smoking attribute would not take the same value in all the data points (it would mostly appear for sick people), and so it would have a high variance.
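A small numeric sketch of this point, using made-up counts (100 sick patients of whom 95 smoke, 100 healthy people of whom 20 smoke; these numbers are purely hypothetical). The smoker indicator has little variance within the sick group alone, but much more once both groups are in the sample.

```python
import numpy as np

# Hypothetical smoker indicators (1 = smoker, 0 = non-smoker).
sick = np.array([1] * 95 + [0] * 5)       # 95% smokers among the sick
healthy = np.array([1] * 20 + [0] * 80)   # 20% smokers among the healthy

# Variance of a 0/1 variable is p * (1 - p), where p is the fraction of ones.
var_sick_only = sick.var()                        # p = 0.95
var_mixed = np.concatenate([sick, healthy]).var() # p = 115 / 200 = 0.575

print(var_sick_only, var_mixed)
```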

Also remember that when you do a low-rank approximation you remove the contribution of the singular vectors that correspond to the smallest singular values. These do not necessarily correspond to individual rows of the matrix $A$; they can correspond to linear combinations of them, so the truncation does not simply delete rows. I hope this helps.
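As an illustration of that last point, here is a sketch with a toy 3 x 50 matrix (the construction is an assumption chosen to make the effect visible: row 1 is roughly twice row 0, and row 2 is small noise). The dominant left singular vector puts weight on both of the first two variables rather than picking out a single row, and the rank-1 approximation still has all three rows.

```python
import numpy as np

rng = np.random.default_rng(0)
t = rng.standard_normal(50)  # a hypothetical underlying factor, one value per patient

# Toy data: 3 variables (rows) x 50 patients (columns).
A = np.vstack([
    t,                                      # variable 0
    2 * t + 0.01 * rng.standard_normal(50), # variable 1, strongly correlated with variable 0
    0.01 * rng.standard_normal(50),         # variable 2, nearly constant
])

U, s, Vh = np.linalg.svd(A, full_matrices=False)

# Dominant left singular vector: a direction in "variable space".
u1 = U[:, 0]

# Rank-1 approximation built from the largest singular value only.
A_1 = s[0] * np.outer(U[:, 0], Vh[0, :])
err = np.linalg.norm(A - A_1, "fro")

print(u1)         # weights on variables 0, 1, 2
print(A_1.shape)  # same shape as A: no row has been removed
print(err)
```

Here the discarded directions are mixtures of the variables with little energy, which is different from dropping the lowest-variance row outright.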