My question is more of a theoretical kind, and concerns how to obtain document embeddings from pre-trained word embeddings and the LSA algorithm. A solution offered here multiplies a document-term co-occurrence matrix by a word-embedding matrix:
document_vecs = dtm %*% vecs
A similar solution is offered by J. Silge and E. Hvitfeldt (using sparse matrices):
doc_matrix <- word_matrix %*% embedding_matrix
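In NumPy terms, the same operation looks like this (a toy sketch with made-up shapes, not the actual data):

```python
import numpy as np

rng = np.random.default_rng(2)

n_docs, n_terms, dim = 5, 8, 3

# Document-term matrix (e.g. word counts per document) ...
dtm = rng.integers(0, 4, size=(n_docs, n_terms)).astype(float)

# ... and a pre-trained word-embedding matrix: one row per term.
embedding_matrix = rng.normal(size=(n_terms, dim))

# Each document vector is the count-weighted sum of its word vectors.
doc_matrix = dtm @ embedding_matrix  # shape: (n_docs, dim)
```

So each row of `doc_matrix` is a weighted sum of the embeddings of the words occurring in that document.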
This solution works fine for my (practical) tasks; I just do not quite understand why.
Suppose I factorize a term-document matrix M as UΣV^t; then my word embeddings are calculated as UΣ, and my document embeddings are calculated as ΣV^t (I found these formulas in an online DA course and would be grateful for literature references).
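To make the setup concrete, here is a small NumPy sketch of that factorization on a random term-document matrix (my own toy example, not the actual data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny term-document matrix: 6 terms x 4 documents (word counts).
M = rng.integers(0, 5, size=(6, 4)).astype(float)

# Thin SVD: M = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(M, full_matrices=False)
Sigma = np.diag(s)

word_embeddings = U @ Sigma   # one row per term
doc_embeddings = Sigma @ Vt   # one column per document

# Sanity check: the factorization reconstructs M.
assert np.allclose(U @ Sigma @ Vt, M)
```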
However, following the solution above, I can also calculate document embeddings by multiplying M^t (transposed in order to get a document-term matrix) by the word embeddings (UΣ).
But (UΣV^t)^t * UΣ = V * Σ^t * U^t * U * Σ = V * Σ^t * Σ (since U^t * U gives an identity matrix) = V * Σ^2 (since Σ is diagonal, so Σ^t = Σ), and not VΣ, the transpose of ΣV^t, as expected.
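The discrepancy can be checked numerically; with a random matrix (again a toy example of mine), M^t * (UΣ) comes out equal to V * Σ^2, i.e. VΣ with each column rescaled by a singular value:

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.integers(0, 5, size=(6, 4)).astype(float)  # terms x documents

U, s, Vt = np.linalg.svd(M, full_matrices=False)
Sigma = np.diag(s)
V = Vt.T

word_embeddings = U @ Sigma                       # U * Sigma
docs_by_multiplication = M.T @ word_embeddings    # M^t * (U * Sigma)

# M^t * U * Sigma = V * Sigma^2 ...
assert np.allclose(docs_by_multiplication, V @ Sigma @ Sigma)

# ... which differs from V * Sigma (the expected document embeddings
# with documents as rows) by an extra factor of Sigma.
assert not np.allclose(docs_by_multiplication, V @ Sigma)
```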
I am not a professional mathematician, and there is probably a simple explanation. Why does this multiplication work? I would be most grateful for an explanation.