I'm planning to use an algorithm called Harmony, designed for data normalization, particularly in single-cell data analysis. Harmony takes principal components (PCs) as input and outputs corrected principal components (PC').
Typically, Harmony is applied to the top k PCs to correct for batch effects or other sources of variability. However, I would like to retain the original data structure after normalization. To do this, I would input all PCs generated from my dataset into Harmony, using singular value decomposition (SVD) to generate all possible principal components.
The SVD of a matrix A can be represented as:
$$ A = U S V^T $$
Here, U and V are orthogonal matrices, and S is a diagonal matrix containing the singular values. To create the PCs, I would compute the product US and keep every component, rather than subsetting to a rank-k approximation as all the methods I've seen that use Harmony do.
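A minimal sketch of this step, assuming NumPy, a toy random matrix in place of real data, and column centering before the SVD (the centering and the matrix sizes are my assumptions for illustration, not something taken from Harmony):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the data: 50 cells x 20 features (hypothetical sizes)
A = rng.normal(size=(50, 20))

# Center each feature, as is usual before PCA (an assumption here)
A_centered = A - A.mean(axis=0)

# Thin SVD: keeps all min(n_cells, n_features) components, no rank-k truncation
U, s, Vt = np.linalg.svd(A_centered, full_matrices=False)

# PCs = U S (cells x components); broadcasting scales each column of U by s
pcs = U * s

# With every component kept, the PCs map back to the centered data
A_back = pcs @ Vt
print(np.allclose(A_back, A_centered))
```

Because nothing is truncated, `pcs @ Vt` recovers the centered matrix up to floating-point error, which is the property the reconstruction step relies on.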
These PCs would be the input to Harmony, which produces the output PC'.
$$ PC \rightarrow Harmony \rightarrow PC' $$
After processing through Harmony and obtaining the normalized PCs (PC'), my assumption is that PC' can equivalently be represented as the product of modified matrices U' and S', implying:
$$ PC' \equiv U'S' \equiv (US)' $$
The ultimate goal is to reconstruct the original dataset in its normalized form, which I assume can be achieved by:
$$ (US)' V^T = A' $$
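To make the intended pipeline concrete, here is a rough sketch assuming NumPy and toy data with two batches. The function `placeholder_correction` is hypothetical and is NOT Harmony: it only subtracts each batch's mean and marks the place where the PC -> PC' step would happen. The batch labels, sizes, and the additive shift are also made up for illustration.

```python
import numpy as np

def placeholder_correction(mat, batch):
    """Hypothetical stand-in for Harmony: subtract each batch's mean row.

    This is not Harmony's algorithm; it only marks where PC -> PC' happens.
    """
    corrected = mat.copy()
    for b in np.unique(batch):
        mask = batch == b
        corrected[mask] -= mat[mask].mean(axis=0)
    return corrected

rng = np.random.default_rng(1)

# Toy data: 60 cells x 15 features, with an additive shift in batch 1
batch = np.repeat([0, 1], 30)
A = rng.normal(size=(60, 15)) + batch[:, None] * 2.0
A_centered = A - A.mean(axis=0)

# All PCs, no truncation
U, s, Vt = np.linalg.svd(A_centered, full_matrices=False)
pcs = U * s

# PC -> PC' (placeholder), then A' = PC' V^T
pcs_corrected = placeholder_correction(pcs, batch)
A_corrected = pcs_corrected @ Vt

# The same placeholder applied directly in feature space, for comparison
A_direct = placeholder_correction(A_centered, batch)
print(np.allclose(A_corrected, A_direct))
```

For this linear placeholder, correcting all PCs and rotating back with $V^T$ gives the same matrix as correcting the features directly, since $V$ is orthogonal. Whether anything comparable holds for Harmony's actual correction is exactly what I am unsure about.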
My questions to the community are:
- Given Harmony normalizes the PCs (PC -> PC'), is it mathematically sound to equate the normalized PCs (PC') to $(US)'$ and then use this to reconstruct the original data matrix in its normalized form, as in $(US)' V^T = A'$?
- Are there any conceptual or mathematical flaws in my approach to using all PCs for normalization with Harmony and subsequently reconstructing the dataset?