Time complexity of some matrix multiplication

I am reading a paper that uses a matrix multiplication of the following form,

[equation image; from the attempts below, the product in question is $\mathbf P = \mathbf B \mathbf A^T \mathbf U \mathbf S^{-1} \mathbf S^{-1} \mathbf U^T$]

where $n \gg d$, $\bf A$ and $\bf B$ are $n \times d$ matrices, $\bf U$ is an $n\times n$ matrix, and $\bf S$ is an $n\times n$ diagonal matrix with only the first $d$ diagonal elements being non-zero (the rank of $\bf A$ is at most $d$, so there are at most $d$ non-zero singular values). All matrices contain only non-negative values.

The multiplication of an $l\times m$ matrix and an $m\times n$ matrix takes $O(lmn)$ time.

I tried different multiplication orders but cannot arrive at an $O(nd^2)$ time complexity. For example:

1) first compute $US^{-1}$; the result is an $n\times n$ matrix with only the first $d$ columns being "non-zero". Since $S$ is diagonal with $d$ non-zero diagonal elements, the time complexity is $O(nd)$.

2) then compute $A^T(US^{-1})$, the result is a $d\times n$ matrix with only the first $d$ columns being "non-zero". The time complexity is $O(nd^2)$.

3) then compute $B(A^TUS^{-1})$, the result is a $n\times n$ matrix with the first $d$ columns being "non-zero". The time complexity is $O(nd^2)$.

4) then compute $S^{-1}U^T$ similar to 1), the result is a $n\times n$ matrix with the first $d$ rows being "non-zero". The time complexity is $O(nd)$.

5) finally compute the multiplication $(BA^TUS^{-1})(S^{-1}U^T)$, but this takes $O(n^2d)$ time.
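The five steps above can be checked numerically. Here is a minimal NumPy sketch, where the sizes, the random seed, and the variable names are mine (chosen small purely for illustration; in the paper $n \gg d$). It carries out the grouping from the steps and compares it against $\mathbf B\mathbf A^T(\mathbf A\mathbf A^T)^\dagger$ computed directly:

```python
import numpy as np

# Illustrative sizes; in the paper n >> d.
n, d = 8, 3
rng = np.random.default_rng(0)
A = rng.random((n, d))
B = rng.random((n, d))

# Full SVD of A: U is n x n, s holds the d singular values.
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# S^{-1} S^{-1} = S^{-2} as an n x n diagonal matrix whose first d
# diagonal entries are 1/s_i^2 and the rest are zero.
S2inv = np.zeros((n, n))
S2inv[:d, :d] = np.diag(1.0 / s**2)

# Steps 1-5 from the question: P = B A^T U S^{-1} S^{-1} U^T.
P_steps = (B @ (A.T @ (U @ S2inv))) @ U.T

# Direct reference: P = B A^T (A A^T)^+.
P_ref = B @ A.T @ np.linalg.pinv(A @ A.T)

print(P_steps.shape)                 # (8, 8)
print(np.allclose(P_steps, P_ref))   # True
```

This confirms the grouping is algebraically correct; the issue in the question is only the cost of the last step, not the result.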

I tried some other orders without success in deriving the $O(nd^2)$ complexity. Am I missing something, or is the paper wrong about this?

Best Answer

We don't know enough about $B$ to prove that computing $BX$ has the stated complexity. Given what we are told, $BX$ is computable in $O(n^2d)$ time.

A bit further in the paper they write the following:

... What is more, we can see from Eq. (5) that $P$ itself has the low-rank structure. In other words, we do not need to store in the form of $n \times n$. Instead, we can represent (compress) $P$ as the multiplication of two low-rank matrices $X$ and $Y$...

From this, I think that by calculating $P$ they mean calculating a representation of $P$ in terms of $X$ and $Y$. After omitting the simpler term ($Y$), this amounts to calculating $X$, which is doable in $O(nd^2)$. One way of doing so follows.

$US^{-2}$ is calculable in $O(d^2n)$ time, because $S^{-2}$ is diagonal with only $d$ non-zero entries: multiplying by it just scales the first $d$ columns of $U$ and zeroes out the rest (in fact this costs only $O(nd)$).

Next is $A^T(US^{-2})$; it will have at most $d$ non-zero columns, and $A^T$ is $d \times n$, so there are $d^2$ numbers to calculate, and you are taking the scalar product of $n$-dim vectors, so the complexity is again $O(d^2 n)$. The result is $d \times n$, where the last $n-d$ columns are $0$.

Finally, $(A^T(US^{-2}))U^T$. The result is $d \times n$, and we are taking the scalar product of $n$-dim vectors. However, the left side gives vectors that have at least $n-d$ zero coordinates, so we are effectively only working with $d$-dim vectors, thus the complexity is again $O(nd^2)$.
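These three steps can be realized with a thin SVD, since only the first $d$ columns of $U$ and the $d$ non-zero singular values ever matter. A minimal NumPy sketch with illustrative sizes (the names `X` and `Ud` are mine):

```python
import numpy as np

# Illustrative sizes; in the paper n >> d.
n, d = 8, 3
A = np.random.default_rng(1).random((n, d))

# Thin SVD: Ud is n x d (the first d columns of the full U),
# s holds the d singular values.
Ud, s, Vt = np.linalg.svd(A, full_matrices=False)

# X = A^T U S^{-2} U^T, evaluated left to right using only the thin factors:
#   A^T Ud            -> d x d, costs O(n d^2)
#   (.) * s^{-2}      -> scales column j by 1/s_j^2, costs O(d^2)
#   (.) @ Ud^T        -> d x d times d x n, costs O(n d^2)
X = ((A.T @ Ud) / s**2) @ Ud.T       # d x n overall, O(n d^2) total

# Reference without exploiting the structure: A^T (A A^T)^+.
X_ref = A.T @ np.linalg.pinv(A @ A.T)

print(X.shape)                 # (3, 8)
print(np.allclose(X, X_ref))   # True
```

The thin factors are just a compact encoding of the zero pattern the answer describes: the last $n-d$ columns and singular values never contribute, which is exactly why every step stays within $O(nd^2)$.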

Allow me a few words about the SVD, because I sense some confusion there. If $A = USV^T$ is $n\times d$, then $U$ is $n\times n$, $S$ is $n \times d$, and $V$ is $d \times d$.
$V$ acts on $\mathbb R^d$ by some rotation or reflection, then $S$ scales some coordinates (some by zero, some by not), and embeds the result in $\mathbb R^n$ the easiest way: by adding a few extra zero coordinates. Finally, it's $U$'s turn to act on $\mathbb R^n$ by rotation or reflection. Next, the pseudo-inverse.

$$(AA^T)^\dagger = (USV^TVSU^T)^\dagger = (US^2U^T)^\dagger = U (S^2)^\dagger U^T $$ where $(S^2)^\dagger = S^{-2}$ is a diagonal $n \times n$ matrix whose diagonal elements are the reciprocals of $S^2$'s diagonal elements (with the convention $0^{-1} = 0$).
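Both the shapes of the full SVD and this pseudo-inverse identity are easy to sanity-check numerically. A small NumPy sketch (sizes and seed are mine, chosen only for illustration):

```python
import numpy as np

# Illustrative sizes; only the shapes and the identity matter here.
n, d = 8, 3
A = np.random.default_rng(0).random((n, d))

# Full SVD: A = U S V^T with U n x n, S n x d, V d x d.
U, s, Vt = np.linalg.svd(A, full_matrices=True)
S = np.zeros((n, d))
S[:d, :d] = np.diag(s)
assert np.allclose(A, U @ S @ Vt)    # reconstruction holds

# (S^2)^+ = S^{-2}: n x n diagonal, reciprocals of the d non-zero
# squared singular values, and zero elsewhere (0^{-1} = 0).
S2inv = np.zeros((n, n))
S2inv[:d, :d] = np.diag(1.0 / s**2)

# (A A^T)^+ = U S^{-2} U^T
print(np.allclose(np.linalg.pinv(A @ A.T), U @ S2inv @ U.T))   # True
```

Note that `np.linalg.pinv` itself truncates the (numerically) zero singular values of $AA^T$, which is precisely the $0^{-1} = 0$ convention above.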

So $US^{-1}$ doesn't make sense, because $S^{-1}$ is $d \times n$.
Thus we know that $S^{-2}$ should be kept as it is, reducing the number of possibilities.