Merging two SVD factorizations but using only $U_1S_1$ and $U_2S_2$ (question about Radim's Phd Thesis [gensim])

785 Views Asked by At

I am reading Radim's Phd Thesis (the creator of gensim), on the chapter about SVD (page 45). There he puts certain assignment, indicating that we can calculate the "merge" of a couple of SVD factorizations $A_1=U_1S_1V_1^{T}$ and $A_2=U_2S_2V_2^{T}$. The purpose of such merge can be thought as an incremental SVD algorithm, which uses a couple of already calculated SVD and combines them into a single SVD factorization (which shall be the same as if we would have started with matrix $\begin{bmatrix}A_1 \mid A_2\end{bmatrix}$, on the first place).

The tricky part about his merge, is that it uses only the matrices $U_1S_1$ and $U_2S_2$; the matrices $V_1$ and $V_2$ are not saved because he cares about an streamed Latent Semantic Indexing application, where the columns of matrices $A_1$ and $A_2$ represent documents which can come continuously in high volumes (hence the resulting matrices $V_1$,$V_2$ from SVD could be quite big):

$USV^{T}$ $\xleftarrow{SVD}$ $\begin{bmatrix} U_1S_1 \mid U_2S_2 \end{bmatrix}$

The thesis uses $SVD_k$ indicating a truncated SVD, and actually the two already calculated SVDs are truncated as well. On the other hand, the one-line algorithm above for the merge is non efficient, as is not really exploiting the fact that we have already two calculated SVDs; nor that we do not really need the matrix $V^{T}$. The thesis also mentions a decay factor $\gamma$ that I am omitting here. All those details are not relevant for my question, but mentioning just as additional context. Actually, this one-line merge is just presented in the thesis, as one step in a sequence of refinements that eventually lead to an optimized merge (which evolves from the simple assignment presented here, to an algorithm with several steps).

Now the actual question: Radim seems to claim that the matrices $U$ and $S$ that we get from the SVD calculation below (using only $U_1S_1$ and $U_2S_2$):

\begin{equation} USV^{T} \xleftarrow{SVD} \begin{bmatrix} U_1S_1 \mid U_2S_2 \end{bmatrix} \end{equation}

are the same that we would have obtained, from calculating SVD against the full decompositions; that is:

\begin{equation} USV^{T} \xleftarrow{SVD} \begin{bmatrix} U_1S_1V_1^{T} \mid U_2S_2V_2^{T} \end{bmatrix} = \begin{bmatrix}A_1 \mid A_2\end{bmatrix} \end{equation}

Such equivalence is not evident to me; that is, the statement that we arrive to the same matrices $U$ and $S$ with either of the two equations above (first equation does not use the matrices $V_1$, $V_2$, while second equation does) ... is there a missing step or an additional theorem that justifies this equivalence? (even more, did I interpret incorrectly the claim?)

Would say "thanks in advance", but seems to be considered bad practice.

1

There are 1 best solutions below

4
On

By definition, $U_1S_1V_1^{T} = A_1$ and $U_2S_2V_2^{T} = A_2$.

When we concatenate $A_1$ and $A_2$, we get $\begin{bmatrix}A_1 \mid A_2\end{bmatrix}$ = $\begin{bmatrix}U_1S_1V_1^{T} \mid U_2S_2V_2^{T}\end{bmatrix}$ (simple substituion).

So it's trivially true that computing SVD from the LHS is equal to computing SVD from the RHS:

$USV^{T}$ $\xleftarrow{SVD}$ $\begin{bmatrix} U_1S_1V_1^{T} \mid U_2S_2V_2^{T} \end{bmatrix} = \begin{bmatrix}A_1 \mid A_2\end{bmatrix}$

There's nothing to prove, this is all from the definition of SVD. My guess is the confusion comes from some notation misunderstanding :)

Or maybe you mean, "how does the equivalence hold in case $U_1$, $S_1$, $U_2$, $S_2$ are already truncated rank-k matrices, i.e. $U_1^k$, $U_2^k$ etc, rather than full rank?". Is it true that

$$SVD_k(A_1 \mid A_2) = SVD_k(SVD_k(A_1) \mid SVD_k(A_2))$$ (eq. 3.8 in the thesis)?

The answer is of course it doesn't: each truncation introduces an approximation. Information is lost at each merge step.

What this online algorithm aims to do is lose as little information as possible = retain as much variance as possible, given we keep only the rank-k truncated matrices $S^k$, $U^k$ from the previous merge steps at any time.