Conceptually, why does the first singular triple give a good rank one approximation, rather than something like an average over all the singular vectors?
If you have $$A = U\Sigma V^T $$
why isn't
$$\sqrt{\sigma_{avg}}u_{avg}v_{avg}^T$$ a good low rank approximation?
How about weighted average of the singular vectors?
Context of my Question:
An exam with $m$ questions is given to $n$ students. The instructor collects all the grades in an $n \times m$ matrix $G$, with $G_{ij}$ the grade obtained by student $i$ on question $j$. We would like to assign a difficulty score to each question based on the available data.
How would you compute a rank one approximation to $G$?
Solution:
To approximate $G$ by a rank one matrix we simply compute the SVD of $G$ and select the singular vectors corresponding to the largest singular value. Precisely, we set $s = \sqrt{\sigma_1}\,u_1$ and $q = \sqrt{\sigma_1}\,v_1$, where $u_1$ and $v_1$ are the first columns of the matrices $U$ and $V$ in the SVD $G = U\Sigma V^T$ and $\sigma_1$ is the largest singular value, so that $G \approx s\,q^T$.
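As a sketch of this recipe in NumPy (the grade matrix here is made up for illustration; the variable names `s` and `q` follow the solution above):

```python
import numpy as np

# Hypothetical grade matrix: 4 students (rows) x 3 questions (columns)
G = np.array([[9.0, 5.0, 2.0],
              [8.0, 4.0, 2.0],
              [10.0, 6.0, 3.0],
              [7.0, 4.0, 1.0]])

# Thin SVD: G = U @ diag(sigma) @ Vt, singular values sorted descending
U, sigma, Vt = np.linalg.svd(G, full_matrices=False)

# Split the leading singular value between the two factors:
# one score per student (s) and one per question (q)
s = np.sqrt(sigma[0]) * U[:, 0]
q = np.sqrt(sigma[0]) * Vt[0, :]

# Rank one approximation G ~ s q^T
G1 = np.outer(s, q)

# Frobenius error of the rank one approximation
err = np.linalg.norm(G - G1)
```

By the Eckart–Young theorem, `G1` is the best rank one approximation of `G` in the Frobenius norm, so no averaging scheme over the other singular triples can do better. Note that SVD signs are not unique, so `s` and `q` may both come out negated; their outer product is unaffected.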
The leading singular value captures significantly more of the variance of the original matrix than the others. Think of signal analysis: the large singular values carry the variance of the signal, while the small ones can be regarded as the variance of the noise. So the leading singular triple approximates the signal more accurately than using all of them. Keeping all of them reconstructs the original matrix exactly, but when we want to denoise we do not need to recover the original data exactly; a close approximation that retains most of the information can be used in its place.