Mathematical proof of Randomized SVD algorithm description


I would like help understanding the description of the randomized SVD algorithm as presented here (section "Inside of redsvd").

Let $\bf{A}$ be the matrix to be analyzed, with $n$ rows and $m$ columns, and let $r$ be the target rank of the truncated SVD (if $r = \min(n, m)$ is chosen, the result is the full SVD).

As the first step, a random Gaussian matrix $\bf{O}$ with $m$ rows and $r$ columns is created and a sample matrix $\bf{Y} = \bf{A}^{\mathsf{T}} \bf{O}$ is computed. Then the Gram-Schmidt process is applied to $\bf{Y}$ so that each column of $\bf{Y}$ is orthonormalized. Then matrix $\bf{B} = \bf{A}\bf{Y}$ with $n$ rows and $r$ columns is computed. Although the size of $\bf{Y}$ is much smaller than that of $\bf{A}$, $\bf{Y}$ holds the information of $\bf{A}$, i.e. $\bf{A}\bf{Y}\bf{Y}^{\mathsf{T}} = \bf{A}$. Intuitively, the row information is compressed by $\bf{Y}$ and can be decompressed by $\bf{Y}^{\mathsf{T}}$.

...

Why does the equation $\bf{A}\bf{Y}\bf{Y}^{\mathsf{T}} = \bf{A}$ hold?

If $\bf{Y}$ had orthonormal rows, then $\bf{Y}\bf{Y}^{\mathsf{T}} = I$ would hold. But $\bf{Y}$ is a rectangular matrix with orthonormal columns, so how do they arrive at that equation? The description of the compression is very vague; I would like a mathematical proof.
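To make the distinction in my question concrete, here is a quick numerical check (illustrative only; the dimensions and seed are arbitrary choices of mine). For a tall $\bf{Y}$ with orthonormal columns, $\bf{Y}^{\mathsf{T}}\bf{Y} = I$ but $\bf{Y}\bf{Y}^{\mathsf{T}} \neq I$; instead, $\bf{Y}\bf{Y}^{\mathsf{T}}$ is an orthogonal projector:

```python
import numpy as np

rng = np.random.default_rng(1)
# 6x3 matrix with orthonormal columns, via QR of a random Gaussian matrix
Y, _ = np.linalg.qr(rng.standard_normal((6, 3)))

P = Y @ Y.T
print(np.allclose(Y.T @ Y, np.eye(3)))  # → True: columns are orthonormal
print(np.allclose(P, np.eye(6)))        # → False: Y Y^T is not the identity
print(np.allclose(P @ P, P))            # → True: Y Y^T is a projector
```

So the equation in the quoted description can only hold if projecting onto the column space of $\bf{Y}$ leaves the rows of $\bf{A}$ unchanged, which is exactly what I would like to see proved.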