Understanding SVD Notation


Given any $m \times n$ matrix $M$, one can write $$ M = U\Sigma V^T, $$ the singular value decomposition, where $U$ and $V$ are orthogonal matrices and $\Sigma$ is a diagonal matrix.

Now, the same $M$ can be written as:

$$M = \sum_{i=1}^r u_i c_i v_i^T\,,$$ where $u_i$ is the $i$th column of $U$, $v_i$ is the $i$th column of $V$ and $c_i$ is the $i$th diagonal entry of $\Sigma$.

I don't understand why the second representation is the same as the first one.

In general, how can a matrix product be expressed in terms of columns? I have learned that matrices are multiplied row by column, and that is the only way I have ever seen it done, so how can a matrix multiplication be expressed using only column vectors?

Sorry if the question is too basic, but I am having a lot of trouble understanding how people use column vectors in matrix multiplications.
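As a quick numerical sanity check (a sketch using NumPy, with an arbitrary random matrix), one can verify that the sum of rank-one terms reproduces $M$:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 4, 3
M = rng.standard_normal((m, n))

# Full SVD: U is m x m, Vt is n x n, s holds the r = min(m, n) singular values.
U, s, Vt = np.linalg.svd(M)
r = min(m, n)

# Second representation: the sum of rank-one matrices u_i c_i v_i^T.
M_sum = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(r))

print(np.allclose(M, M_sum))  # True
```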


Best answer:

First, just for clarity: $U$ is $m\times m$, $V$ is $n\times n$, and $\Sigma$ is $m\times n$.

We have, according to the first decomposition, that for any $1\leq i\leq m$ and $1\leq j\leq n$, $$ M_{i,j} = (U\Sigma V^T)_{ij} = \sum_{k=1}^n (U\Sigma)_{ik} (V^T)_{kj} = \sum_{k=1}^n (U\Sigma)_{ik} V_{jk} = \sum_{k=1}^n \sum_{\ell=1}^m U_{i\ell}\Sigma_{\ell k} V_{jk} $$ Now, since $\Sigma$ is an $m\times n$ diagonal matrix, $\Sigma_{\ell k}$ will be $0$ if $k\neq \ell$, and equal to $c_k$ otherwise. (Note also that it will only be non-zero for $k\leq r\stackrel{\rm def}{=} \min(n,m)$, since after that there is no $c_k$). Therefore, the inner sum can be simplified out, and we get $$ M_{i,j} = \sum_{k=1}^r U_{ik} c_k V_{jk} \tag{$\dagger$} $$

Now, let us look at the $(i,j)$-th entry of the other expression: since $u_k v_k^T$ is a matrix and $c_k$ a scalar, we have $$ \left(\sum_{k=1}^r u_k c_k v_k^T\right)_{i,j} = \left(\sum_{k=1}^r c_k (u_k v_k^T)\right)_{i,j} = \sum_{k=1}^r c_k (u_k v_k^T)_{i,j} $$ But what is $(u_k v_k^T)_{i,j}$? It is the product of the $i$-th entry of the vector $u_k$ and the $j$-th entry of the vector $v_k$, i.e. by definition it is $U_{ik} V_{jk}$. So overall, $$ \left(\sum_{k=1}^r u_k c_k v_k^T\right)_{i,j} = \sum_{k=1}^r c_k U_{ik} V_{jk} \tag{$\ddagger$} $$ and we get the same RHS as in $(\dagger)$, showing what we wanted.
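The entrywise identity $(\dagger)$/$(\ddagger)$ can also be checked numerically. A minimal sketch (assuming a random test matrix; `V = Vt.T` recovers $V$ from NumPy's returned $V^T$):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 5, 3
M = rng.standard_normal((m, n))

U, s, Vt = np.linalg.svd(M)
V = Vt.T
r = min(m, n)

# (dagger): M_ij = sum_{k=1}^r U_ik c_k V_jk, computed entry by entry.
for i in range(m):
    for j in range(n):
        entry = sum(U[i, k] * s[k] * V[j, k] for k in range(r))
        assert np.isclose(M[i, j], entry)
print("entrywise formula verified")
```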

Answer:

Hint:

  • Given a matrix $A∈ℝ^{l×m}$ and a matrix $B∈ℝ^{m×n}$, the product $AB$ is an $l×n$ matrix whose entries satisfy $$(AB)_{ik} = \sum_{j=1}^m a_{ij}b_{jk}$$
  • A column vector $u$ is an $m×1$ matrix, and its transpose satisfies $u^\top∈ℝ^{1×m}$. What kind of matrix is $u^\top u$, and what kind of matrix is $uu^\top$?

Can you take it from here?

If not - try to calculate $u^\top u$ and $uu^\top$ for $u=\begin{pmatrix}1\\2\\3\end{pmatrix}$.
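For the suggested exercise, a short sketch with NumPy (treating $u$ explicitly as a $3\times 1$ matrix so the two products have different shapes):

```python
import numpy as np

u = np.array([[1], [2], [3]])  # a 3x1 column vector

print(u.T @ u)  # 1x1 matrix (the inner product): [[14]]
print(u @ u.T)  # 3x3 rank-one matrix (the outer product):
# [[1 2 3]
#  [2 4 6]
#  [3 6 9]]
```

Note that $u^\top u$ collapses to a scalar, while $uu^\top$ is a full (rank-one) matrix, which is exactly the shape of each summand $u_i c_i v_i^T$.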


Hint 2: One straightforward way to see whether two matrices are the same is to calculate and compare their entries.

Let $M=UΣV^\top$ and $\tilde{M}=\sum_{j=1}^mu^{(j)}c_{jj}v^{(j)\top}$.

First, let $r=m$ for simplicity. Since $c_{ii}=0$ for $i>r$, you can later just drop the vanishing summands.

Step 1:
$$(ΣV^\top)_{ik} = \sum_{j=1}^n c_{ij}V^\top_{jk} = c_{ii}V^\top_{ik}.$$

Then it follows: $$M_{νμ}=[U(ΣV^\top)]_{νμ}=...=\sum_{j=1}^mU_{νj}c_{jj}V_{jμ}^\top.$$

Step 2:
I will use the notation $u^{(j)}$ for the column-vector. So we have: $$u^{(j)} = \begin{pmatrix} U_{1j} \\ \vdots \\ U_{mj} \end{pmatrix}$$ and $$v^{(j)} = \begin{pmatrix} V_{1j} \\ \vdots \\ V_{nj} \end{pmatrix} ⇒ v^{(j),\top}=(V_{1j}, …, V_{nj})=(V^\top_{j1},…,V^\top_{jn})$$ Here $V^\top_{j1}$ are again the entries of $V^\top$.

With that we get: $$(u^{(j)}v^{(j)\top})_{ik} = ... = U_{ij}V_{jk}^\top.$$ And then it follows: $$\tilde{M}_{νμ}=\sum_{j=1}^mc_{jj}[u^{(j)}v^{(j),\top}]_{νμ} = …$$


Can you fill in all …'s?
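Step 1 above can be checked directly: multiplying by the diagonal $\Sigma$ on the left just scales the $i$-th row of $V^\top$ by $c_{ii}$. A sketch (assuming $r=m=n$ as in the hint, with an arbitrary orthogonal $V$ and arbitrary diagonal entries):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3  # r = m = n for simplicity, as in the hint
V = np.linalg.qr(rng.standard_normal((n, n)))[0]  # some orthogonal V
c = np.array([3.0, 2.0, 1.0])                     # diagonal entries c_ii
Sigma = np.diag(c)

# Step 1: Sigma @ V.T scales the i-th row of V.T by c_ii.
lhs = Sigma @ V.T
rhs = c[:, None] * V.T
print(np.allclose(lhs, rhs))  # True
```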

Answer:

The identity $U \Sigma V^T=\sum_{i=1}^r u_i c_i v_i^T$ can be seen as an example of block multiplication.

We can write the matrix $U$ as a $1\times r$ block matrix of its columns (keeping only the first $r$ columns, since the remaining singular values are zero): $$ U =\begin{pmatrix} u_1 &u_2 &\cdots &u_r \end{pmatrix}.\tag1 $$ Similarly, we can write $V$ as a block matrix of columns: $$ V =\begin{pmatrix} v_1 &v_2 &\cdots &v_r \end{pmatrix}.\tag2 $$ Notice that the elements of (1) and (2) are themselves matrices (called "blocks"). It is legal to multiply block matrices as long as the block partitions are conformable and the blocks being multiplied together are conformable matrices. They are in this case, since $u_i$ is a column, $c_i$ is a scalar, and $v_i^T$ is a row. We obtain immediately:

$$U\Sigma V^T=\begin{pmatrix} u_1 &u_2 &\cdots &u_r \end{pmatrix} \begin{pmatrix} c_1 & & \\ & c_2 &\\ & & \ddots &\\ & & & c_r \end{pmatrix} \begin{pmatrix} v_1^T \\v_2^T\\ \vdots \\v_r^T \end{pmatrix}=\sum_{i=1}^r u_i c_i v_i^T $$
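This block product can be reproduced numerically. A sketch using NumPy's thin SVD (so that $U$, $\Sigma$, and $V^T$ have exactly $r$ columns and rows, matching the block shapes above):

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 4, 3
M = rng.standard_normal((m, n))

# Thin SVD: U is m x r, s has length r, Vt is r x n, with r = min(m, n).
U, s, Vt = np.linalg.svd(M, full_matrices=False)
Sigma = np.diag(s)

# Block product (u_1 ... u_r) diag(c_1, ..., c_r) (v_1^T; ...; v_r^T) ...
product = U @ Sigma @ Vt

# ... equals the sum of rank-one blocks u_i c_i v_i^T.
rank_one_sum = sum(s[i] * np.outer(U[:, i], Vt[i]) for i in range(len(s)))

print(np.allclose(product, M), np.allclose(product, rank_one_sum))  # True True
```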