For computational efficiency in PCA, the covariance matrix is evaluated as $C = AA^T$ instead of the original $C=A^TA$. How can that be justified? What steps are then needed to recover the original information? Although I can appreciate the beauty and the idea of the PCA algorithm, I still have a hard time understanding some of the concepts.
Note: Here $C$ is the covariance matrix and $A$ is the matrix whose columns are the data vectors.
Why use these matrix products at all?
Just compute the SVD, using Golub-Kahan or better,
$$A=USV^T,$$
with $S$ diagonal with non-negative entries and $U,V$ orthogonal; then you automatically also know the eigenvector decompositions
$$AA^T=U(SS^T)U^T\text{ and }A^TA=V(S^TS)V^T.$$
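A minimal NumPy sketch of this identity (the matrix `A` is an arbitrary illustrative example, not data from the question):

```python
import numpy as np

# Small illustrative data matrix: 3 variables, 5 observations.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))

# Thin SVD: A = U @ diag(s) @ Vt, with s the non-negative singular values.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# The SVD immediately yields both eigenvector decompositions:
#   A A^T = U diag(s^2) U^T   and   A^T A = V diag(s^2) V^T
# (the extra zero eigenvalues of A^T A are absorbed by the thin factors).
assert np.allclose(A @ A.T, U @ np.diag(s**2) @ U.T)
assert np.allclose(A.T @ A, Vt.T @ np.diag(s**2) @ Vt)
```

So one SVD of $A$ gives the eigen-decomposition of both Gram matrices without ever forming them explicitly, which is also numerically preferable since squaring $A$ squares its condition number.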
Representing the matrix as $A=\sum_{k=1}^r σ_k u_kv_k^T$ also shows how the conversion you asked about works. Diagonalizing $AA^T$ gives $AA^T=\sum_{k=1}^r σ_k^2 u_ku_k^T$, so the pairs $(λ_k=σ_k^2, u_k)$ are the result of the diagonalization. Since the vectors $u_k$ are orthonormal, one gets $A^Tu_k=\sum_{j=1}^r σ_j v_j(u_j^Tu_k)=σ_k v_k$, so that in turn $v_k=\frac1{\sqrt{λ_k}}A^Tu_k$, where the $v_k$ are the orthonormal eigenvectors of $A^TA$.
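The recovery step $v_k=\frac1{\sqrt{λ_k}}A^Tu_k$ can be sketched as follows (again with a hypothetical small matrix; in the eigenface setting the payoff is that the diagonalized Gram matrix is the smaller of the two):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 5))  # illustrative: 3 x 3 Gram matrix vs 5 x 5

# Diagonalize the smaller symmetric matrix A A^T.
lam, U = np.linalg.eigh(A @ A.T)   # eigh returns ascending eigenvalues
lam, U = lam[::-1], U[:, ::-1]     # reorder to descending

# Recover eigenvectors of the larger matrix A^T A:
#   v_k = A^T u_k / sqrt(lam_k)
V = A.T @ U / np.sqrt(lam)         # broadcasting scales each column k

# Each column of V is a unit eigenvector of A^T A with eigenvalue lam_k.
assert np.allclose((A.T @ A) @ V, V * lam)
assert np.allclose(np.linalg.norm(V, axis=0), 1.0)
```

Note that the factor $1/\sqrt{λ_k}$ is exactly what restores unit length, since $\|A^Tu_k\| = σ_k = \sqrt{λ_k}$.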