Conceptually, why does the first singular triple give a good rank one approximation, rather than something like an average over all the singular vectors?
If you have $$A = U\Sigma V^T $$
why isn't
$$\sqrt{\sigma_{avg}}u_{avg}v_{avg}^T$$ a good low rank approximation?
How about weighted average of the singular vectors?
Context of my Question:
An exam with $m$ questions is given to $n$ students. The instructor collects all the grades in an $n \times m$ matrix $G$, with $G_{ij}$ the grade obtained by student $i$ on question $j$. We would like to assign a difficulty score to each question based on the available data.
How would you compute a rank one approximation to $G$?
Solution:
To approximate $G$ by a rank one matrix we simply compute the SVD of $G$ and select the singular vectors corresponding to the largest singular value. Precisely, we set $s = \sqrt{\sigma_1}\,u_1$ and $q = \sqrt{\sigma_1}\,v_1$, where $u_1$ and $v_1$ are the first columns of the matrices $U$ and $V$ in the SVD $G = U\Sigma V^T$ and $\sigma_1$ is the largest singular value, so that $G \approx s\,q^T$.
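As a sketch of this recipe in NumPy (the grade matrix here is made up for illustration; the variable names `s` and `q` follow the solution above):

```python
import numpy as np

# Hypothetical grade matrix: 4 students (rows) x 3 questions (columns)
G = np.array([[9.0, 5.0, 2.0],
              [8.0, 4.0, 2.0],
              [10.0, 6.0, 3.0],
              [7.0, 4.0, 1.0]])

# Thin SVD: G = U @ diag(sigma) @ Vt, singular values sorted descending
U, sigma, Vt = np.linalg.svd(G, full_matrices=False)

# Split the leading singular value between the two factors:
# one score per student (s) and one per question (q)
s = np.sqrt(sigma[0]) * U[:, 0]
q = np.sqrt(sigma[0]) * Vt[0, :]

# Rank one approximation G ~ s q^T
G1 = np.outer(s, q)

# Frobenius error of the rank one approximation
err = np.linalg.norm(G - G1)
```

By the Eckart–Young theorem, `G1` is the best rank one approximation of `G` in the Frobenius norm, so no averaging scheme over the other singular triples can do better. Note that SVD signs are not unique, so `s` and `q` may both come out negated; their outer product is unaffected.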
The leading singular value captures significantly more of the variance of the original matrix than the others. Think of signal analysis: the large singular values carry the variance of the signal, while the small ones can be regarded as the variance of the noise. So the leading singular triple approximates the signal more accurately than using all of them. Keeping all of them reconstructs the original matrix exactly, but when we want to denoise we do not need to recover the original data exactly; a close approximation that retains most of the information can be used in its place.