The way I understand it, the rank-k approximation of a matrix A is really just a projection of the row vectors of A onto the subspace of the domain spanned by the right singular vectors corresponding to the largest singular values; those projected rows become the rows of the approximation matrix. I've worked out a proof of why projecting the rows onto the span of those top singular vectors gives the best possible approximation of the rows of A, so that part is fully clear to me.
But what I can't understand is why taking the best approximation of the row vectors as the rows of the approximation matrix should be the optimal way to approximate the matrix itself. Why the row vectors, specifically? Why not take the best possible approximation of the column vectors and use those as the columns of the approximation matrix? Or would that give the same approximation? Is there some intuition behind this?
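To make the two constructions I mean concrete, here is a small numpy sketch (the matrix and all variable names are just for illustration): the "row" version projects each row of A onto the top-k right singular vectors, the "column" version projects each column onto the top-k left singular vectors. At least numerically, the two seem to coincide, and both equal the truncated SVD, but I'd like to understand why.

```python
import numpy as np

# A small random matrix, purely for illustration.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 4))

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2  # target rank

# Row construction: project each row of A onto the span of the
# top-k right singular vectors (the first k rows of Vt).
Vk = Vt[:k, :]                 # shape (k, 4)
row_proj = A @ Vk.T @ Vk       # A V_k V_k^T

# Column construction: project each column of A onto the span of the
# top-k left singular vectors (the first k columns of U).
Uk = U[:, :k]                  # shape (5, k)
col_proj = Uk @ Uk.T @ A       # U_k U_k^T A

# Both appear to equal the truncated SVD U_k diag(s_k) V_k^T.
trunc = Uk @ np.diag(s[:k]) @ Vk
print(np.allclose(row_proj, col_proj))  # True
print(np.allclose(row_proj, trunc))     # True
```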
This question was perhaps a bit abstract. I hope people understand what I'm getting at. Let me know if I should elaborate.
Any answer is appreciated! Merry Christmas :)